On Wed, 14 Jul 2004, Marcin Owsiany wrote:
> The log contained a reception of a message, but no other activity for that
> message (in case of the first box - for over 2 hours, and in case of the
> second - for over 6 days, even though the final retry cutoff time is 4 days!)
Thanks for the report and the detailed debugging information.
> I don't know why the timeout was 600 seconds, since I have not changed the
> default, which is documented to be 5 minutes (smtp_receive_timeout),
The final timeout at the end of a message has a longer default (10
minutes).
> | /* Wait until the socket is ready */
> |
> | for (;;)
> | {
> | FD_ZERO (&select_inset);
> | FD_SET (sock, &select_inset);
> | tv.tv_sec = timeout;
> | tv.tv_usec = 0;
> | rc = select(sock + 1, (SELECT_ARG2_TYPE *)&select_inset, NULL, NULL, &tv);
> |
> | /* If some interrupt arrived, just retry. We presume this to be rare,
> | but it can happen (e.g. the SIGUSR1 signal sent by exiwhat causes
> | select() to exit). */
> |
> | if (rc < 0 && errno == EINTR)
> | {
> | HDEBUG(D_any) debug_printf("EINTR while selecting for socket read\n");
> | continue;
> | }
>
> Apparently the delivery process has spent 6 days hanging like this, by having
> its select() timeout reset to 10 minutes by exiwhat every two minutes.
Aarrgghh!! Nasty.
> I don't exactly understand why the TCP connection wasn't simply terminated by
> the remote host?
You would think so, but there are oddities in TCP/IP stacks that I do
not pretend to understand.
> Anyway, can this be worked around at least for the systems which update the
> struct timeval in case of EINTR? Or maybe even for all other, by manually
> comparing time() before select() and after it's interrupted?
Good idea. I will do something like that, but it is too late for 4.40,
which was released a few hours ago.
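A minimal sketch of the time() comparison suggested above, not the code
that will go into Exim. The function name wait_readable() and its return
convention are hypothetical. It fixes a deadline once, before the loop,
so that an EINTR can only shorten the remaining wait and never restart
the full timeout:

```c
#include <errno.h>
#include <sys/select.h>
#include <time.h>

/* Hypothetical helper (not Exim's actual function): wait until sock is
readable or until timeout seconds have passed in total.
Returns 1 if readable, 0 on timeout, -1 on any other select() error. */

static int
wait_readable(int sock, int timeout)
{
time_t deadline = time(NULL) + timeout;   /* computed once, never reset */

for (;;)
  {
  fd_set select_inset;
  struct timeval tv;
  time_t now = time(NULL);
  int rc;

  /* Only wait for whatever is left of the original timeout. */

  tv.tv_sec = (now < deadline)? (deadline - now) : 0;
  tv.tv_usec = 0;

  FD_ZERO(&select_inset);
  FD_SET(sock, &select_inset);
  rc = select(sock + 1, &select_inset, NULL, NULL, &tv);

  /* An interrupt (e.g. SIGUSR1 from exiwhat) still causes a retry, but
  the retry gives up once the deadline has passed. */

  if (rc < 0 && errno == EINTR)
    {
    if (time(NULL) >= deadline) return 0;
    continue;
    }

  if (rc < 0) return -1;
  return (rc > 0)? 1 : 0;
  }
}
```

Because time() has one-second resolution, the total wait can be off by
up to a second, which should not matter for timeouts measured in
minutes. This approach does not rely on select() updating the struct
timeval, so it behaves the same on systems that do and do not modify it.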
--
Philip Hazel University of Cambridge Computing Service,
ph10@??? Cambridge, England. Phone: +44 1223 334714.
Get the Exim 4 book: http://www.uit.co.uk/exim-book