On Tue, Mar 5, 2013 at 2:02 PM, Phil Pennock <pdp@???> wrote:
> Can they run gcore to get the state of their process when it's hung,
> *without* sending ^C to it and causing data_sigterm_sigint_handler() to
> be called? That's causing bad interactions with what's actually
> happening, but I think, in this case, you've mostly pinned it down.
That backtrace is in fact from a gcore on a hung process, sorry that I
wasn't clear about that. It is still hung IIRC (he left it running
for me) so we should be able to get more info if we still need it.
The Ctrl-C thing is how he figured out he could manually trigger it
from the exim -bh commandline.
>> I suspect the fix will be to add a global var that tracks when
>> receive_bomb_out() has been entered and immediately exit from the
>> function if it detects that it has already been entered once. I think
>> it's good that it gets called multiple times when things are really
>> bad, we just need to guard it from actually trying to shut everything
>> down more than once. Does anybody see any flaws or repercussions from
>> this approach?
>
> There are two issues here:
>
> (1) receive_bomb_out is assuming it's about to exit, so some of the code
> was written with that assumption, I think, and predates the ACL
> callout.
>
> data_file is fclose()d but not set to NULL, so the actual bug
> causing the stdio lockup is that we will fclose(data_file) twice.
> Potentially, double-frees.
>
> This:
> ----------------------------8< cut here >8------------------------------
> if (data_file != NULL) (void)fclose(data_file);
> else if (data_fd >= 0) (void)close(data_fd);
> ----------------------------8< cut here >8------------------------------
> needs to be this:
> ----------------------------8< cut here >8------------------------------
> if (data_file != NULL) {
> (void)fclose(data_file);
> data_file = NULL;
> } else if (data_fd >= 0) {
> (void)close(data_fd);
> data_fd = -1;
> }
> ----------------------------8< cut here >8------------------------------
I like that.
> (2) The recursive shutdowns of SMTP; best fixed by not trying to call
> into the ACLs twice. Instead, add a function-level static which is
> set true the first time around, and if true, we don't call those
> ACLs.
> Because the only thing that should be happening at the _top_ level
> after the ACLs are called is to exit, we can short-cut cleanly by
> just exiting immediately.
> We still need to NULL out the data_file/data_fd, to prevent their
> use elsewhere.
> Note that global variables are unnecessary, function static variables
> will work just fine here.
Ok.
> The attached patch is also in git in the main repo, on the
> fix_receive_bombout branch. If it looks good to others, we can merge it
> in.
I should have time to check it out tomorrow.
Thanks Phil.
...Todd
--
The total budget at all receivers for solving senders' problems is $0.
If you want them to accept your mail and manage it the way you want,
send it the way the spec says to. --John Levine