Re: [exim] Batch SMTP hangs

Author: Geraint Edwards
Date:
To: John Hall
CC: Exim Users list
Subject: Re: [exim] Batch SMTP hangs
I've been busy of late and my analysis has not been as thorough
as I had hoped, so apologies if the below is brief.

John Hall <john.d.hall@???> said
        (on Wed, Jul 13, 2005 at 04:06:30PM +0100):

> I have a perl script that runs as a pipe transport and injects two
> e-mails back into the system via BSMTP. Up until a recent upgrade from
> Woody to Sarge this all worked fine.


Did exim get bumped to 4.51 at that time? You could
possibly do

    grep 'daemon started' /var/log/exim/mainlog*


to get the version number and start date of each daemon.
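
If the grep output is long, something like the following might help
boil it down to one line per daemon start. This is only a sketch: it
assumes the main log lines have the shape "DATE TIME exim VERSION
daemon started: ...", and the function name daemon_versions is made up
for illustration.

```shell
#!/bin/sh
# Print "date version" for each "daemon started" line read on stdin.
# Assumed (not verified against your logs) line shape:
#   2005-07-13 09:00:01 exim 4.51 daemon started: pid=123, ...
daemon_versions() {
    awk '$3 == "exim" && $5 == "daemon" && $6 ~ /^started/ { print $1, $4 }'
}

# Possible use, with the log path from the grep above:
#   cat /var/log/exim/mainlog* | daemon_versions | sort -u
```

Rotated logs that have been compressed would need to go through zcat
(or similar) first.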

I had a very similar problem from my e-mail list software (which
I have been running for 5+ years, always using exim). It was
also piping from perl into exim for delivery (though not using
BSMTP) - my code was similar to

    open(MAIL, "|exim -oi -f$sender $recipient") or croak "..."


About a week after an upgrade to 4.51, I noticed things not
working properly (mail was often sent, but...).

The close on the pipe just hung. Namely, the perl:

    close(MAIL) or croak "..."


never returned. So the inbound delivery (triggering the above
code to send an outbound mail) *eventually* fell over as follows:

2005-07-22 15:47:25 1DvXs5-000n6s-I9 ** |/usr/sbin/listreq foo <foo-request@???> R=into_list T=address_pipe: pipe delivery process timed out

This happened several times, so I took the opportunity (ahem!) to
rewrite the mail submission part of the code using perl's
Net::SMTP module. This problem seemed to occur elsewhere - not
just for these (list-generated) mails. So, the following may be
related - my hunch is that it is.

I've since upgraded to exim 4.52 - but I still get a lot of ugliness.
Inbound deliveries work OK, locally-generated mail to local sites
is OK, but mail to remote sites (typically using mutt) seems to
follow this pattern:

    - the mail does get delivered
        (if it succeeds on first pass)
    - the exim submission(?) parent then blocks once the
        delivery child process exits
    - the log and queue look incomplete (examples below)
    - the queued message is locked by the hung process,
        so any failed deliveries are thwarted.


Here are some examples/diagnostics (edited):

# mailq
22m  1.6K 1E2REk-0005MQ-Dr <foo@???>
        D bar@???


# ps auxww | grep exim | grep -v '.-bd'
root     18403  0.0  0.3  6728 2996  ??  Is   10:50AM   0:00.00 sendmail -oem -oi -f foo@??? -- bar@??? (exim-4.52-0)
muser    18404  0.0  0.0     0    0  ??  Z    10:50AM   0:00.00  (exim-4.52-0)


It seems the child/zombie process (18404) has "completed" - a
trace says it has called exit(3) - so I can't signal it. The
parent process only responds to a 'kill -9'; the log then says
"Completed", but shows only the incoming ("<=") line and no
delivery ("=>") lines before the "Completed" line. Ugly as sin.
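
For anyone wanting to check whether they are seeing the same thing, a
rough way to pair each zombie with its parent is sketched below. It
assumes your ps accepts "ps axo pid,ppid,stat,command" (column names
can differ between systems), and the function name find_zombies is
hypothetical.

```shell
#!/bin/sh
# Read "PID PPID STAT COMMAND..." rows on stdin (first row is the header)
# and print one line per zombie, i.e. any row whose STAT starts with Z.
find_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print "zombie=" $1, "parent=" $2 }'
}

# Possible use (assumes these ps column names exist on your system):
#   ps axo pid,ppid,stat,command | find_zombies
```

A parent that shows up here and never reaps its child is the process
to look at (and, in my case, the one holding the spool files open).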

# cd /var/spool/exim
# fstat */k/*
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W NAME
root     exim-4.52-0 18403    5 /var     1025612 -rw-r-----     236 rw   input/k/1E2REk-0005MQ-Dr-D
root     exim-4.52-0 18403    8 /var     1025616 -rw-r-----      48  w   input/k/1E2REk-0005MQ-Dr-J
root     exim-4.52-0 18403    7 /var     1025614 -rw-r-----     112  w  msglog/k/1E2REk-0005MQ-Dr


<start indented snippet of -H file>
    <details>
    XX
    1
    bar@???


    <headers>
<end snippet>


The -J file contains all delivered recipients (in the above case,
one line). The msglog file typically contains one "... Received
from ..." line; for expanding addresses, this can also include
further ("... transport succeeded" and "... children all
complete") lines. I don't think the msglog file is complete.

I notice that the original poster (hi, John) hinted they were
using Debian. I'm running the exim port on FreeBSD 4.11 and was
considering posting to a FreeBSD list about it, but am posting
here because the problem may be cross-platform.

That's all I have time for, at the moment, sorry.

Patches/suggestions welcome.

--
Geraint A. Edwards (aka "Gedge")