Re: [exim-dev] local_scan timeout with troublesome dns

Góra strony
Delete this message
Reply to this message
Autor: Todd Lyons
Data:  
Dla: Todd Lyons, exim-dev
Temat: Re: [exim-dev] local_scan timeout with troublesome dns
On Tue, Mar 5, 2013 at 2:02 PM, Phil Pennock <pdp@???> wrote:

> Can they run gcore to get the state of their process when it's hung,
> *without* sending ^C to it and causing data_sigterm_sigint_handler() to
> be called? That's causing bad interactions with what's actually
> happening, but I think, in this case, you've mostly pinned it down.


That backtrace is in fact from a gcore on a hung process, sorry that I
wasn't clear about that. It is still hung IIRC (he left it running
for me) so we should be able to get more info if we still need it.
The Ctrl-C thing is how he figured out he could manually trigger it
from the exim -bh commandline.

>> I suspect the fix will be to add a global var that tracks when
>> receive_bomb_out() has been entered and immediately exit from the
>> function if it detects that it has already been entered once. I think
>> it's good that it gets called multiple times when things are really
>> bad, we just need to guard it from actually trying to shut everything
>> down more than once. Does anybody see any flaws or repercussions from
>> this approach?
>
> There are two issues here:
>
> (1) receive_bomb_out is assuming it's about to exit, so some of the code
>     was written with that assumption, I think, and predates the ACL
>     callout.

>
>     data_file is fclose()d but not set to NULL, so the actual bug
>     causing the stdio lockup is that we will fclose(data_file) twice.
>     Potentially, double-frees.

>
>     This:
> ----------------------------8< cut here >8------------------------------
> if (data_file != NULL) (void)fclose(data_file);
>   else if (data_fd >= 0) (void)close(data_fd);
> ----------------------------8< cut here >8------------------------------
>     needs to be this:
> ----------------------------8< cut here >8------------------------------
> if (data_file != NULL) {
>   (void)fclose(data_file);
>   data_file = NULL;
> } else if (data_fd >= 0) {
>   (void)close(data_fd);
>   data_fd = -1;
> }
> ----------------------------8< cut here >8------------------------------


I like that.

> (2) The recursive shutdowns of SMTP; best fixed by not trying to call
>     into the ACLs twice.  Instead, add a function-level static which is
>     set true the first time around, and if true, we don't call those
>     ACLs.
>     Because the only thing that should be happening at the _top_ level
>     after the ACLs are called is to exit, we can short-cut cleanly by
>     just exiting immediately.
>     We still need to NULL out the data_file/data_fd, to prevent their
>     use elsewhere.
> Note that global variables are unnecessary, function static variables
> will work just fine here.


Ok.

> The attached patch is also in git in the main repo, on the
> fix_receive_bombout branch. If it looks good to others, we can merge it
> in.


I should have time to check it out tomorrow.

Thanks Phil.

...Todd
--
The total budget at all receivers for solving senders' problems is $0.
If you want them to accept your mail and manage it the way you want,
send it the way the spec says to. --John Levine