Hi folks,
I've altered the retry configuration on my mass mailer to give up after
12 hours, because I handle subsequent retries myself.
At the end of the last mailshot, I found messages like this one still in the queue:
27h  19K 16nIEq-000BnI-07 <mailman-handler-XXX-XXXXX@???>
          guest_localpart@???
For each one, exiwhat shows the following:
55584 running queue: waiting for 16nIEq-000BnI-07 (95006)
95006 delivering 16nIEq-000BnI-07 (queue run pid 55584)
and ps shows the delivery process:
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
exim 95006 0.0 0.1 1540 1148 ?? S 3:48PM 0:04.08 /usr/local/sbin/exim -C /usr/local/etc/exim/configure.spool2 -q
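To pair up the queue runner and the delivery process for each stuck message, a small parser along these lines could be run over exiwhat output. It is only a sketch: the patterns are modelled on the two exiwhat lines quoted above and nothing else, other exiwhat states are ignored, and `queue_run_deliveries` is a hypothetical helper, not anything shipped with Exim.

```python
import re

# Patterns based only on the two exiwhat line formats quoted in this post.
RUNNER_RE = re.compile(r"^\s*(\d+) running queue: waiting for (\S+) \((\d+)\)")
DELIVERY_RE = re.compile(r"^\s*(\d+) delivering (\S+) \(queue run pid (\d+)\)")


def queue_run_deliveries(exiwhat_output: str) -> dict[str, tuple[int, int]]:
    """Map message ID -> (delivery pid, queue-runner pid).

    An entry is kept only when the queue runner's "waiting for" line and
    the delivery process's "delivering" line point at each other.
    """
    waiting = {}     # (message ID, delivery pid) -> queue-runner pid
    delivering = {}  # (message ID, queue-runner pid) -> delivery pid
    for line in exiwhat_output.splitlines():
        m = RUNNER_RE.match(line)
        if m:
            runner, msgid, child = m.groups()
            waiting[(msgid, int(child))] = int(runner)
            continue
        m = DELIVERY_RE.match(line)
        if m:
            child, msgid, runner = m.groups()
            delivering[(msgid, int(runner))] = int(child)

    result = {}
    for (msgid, child), runner in waiting.items():
        # Cross-check: the delivery process must name this runner as its parent.
        if delivering.get((msgid, runner)) == child:
            result[msgid] = (child, runner)
    return result
```

Fed the two exiwhat lines above, this yields `{"16nIEq-000BnI-07": (95006, 55584)}`, i.e. the pid to point ktrace or a debugger at, plus the queue runner that is blocked waiting for it.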
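This is at 10:30 AM, which means the process was started almost 19 hours
ago, at 3:48 PM the previous day! It started before the 12-hour cut-off in
my retry configuration, and because it has never finished that delivery
attempt, presumably nothing has had the chance to notice that the retry
time has expired and fail the message. That explains why the message is
still in the queue more than 12 hours after the first delivery attempt
failed.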
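The timing can be checked with a few lines of date arithmetic. This sketch assumes the 10:30 reading was taken the morning after the 3:48 PM start shown by ps (the calendar date itself is arbitrary), and it uses the 27h age reported by mailq as a stand-in for the time of the first failed delivery, which is only an approximation.

```python
from datetime import datetime, timedelta

# Assumed clock readings: ps STARTED 3:48PM, checked at 10:30 AM next morning.
now = datetime(2000, 1, 2, 10, 30)
started = datetime(2000, 1, 1, 15, 48)

# Approximate the first delivery failure by the message age shown by mailq.
first_failure = now - timedelta(hours=27)
# 12-hour give-up point from the retry configuration.
cutoff = first_failure + timedelta(hours=12)

running_for = now - started
print("delivery process running for", running_for)
print("started before cut-off:", started < cutoff)
```

This prints a running time of 18:42:00 and confirms the start was before the cut-off (which falls at 7:30 PM on the first day). Reading 10:30 as the same evening instead gives about 6 hours 42 minutes, but then the process would have started after the cut-off, which doesn't fit the observed behaviour.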
I have the following configuration for the remote_smtp transport:
remote_smtp:
  driver = smtp
  command_timeout = 2m
  connect_timeout = 2m
What's weird is that I can establish a connection to port 25 on
example.com (which doesn't have an MX record, just an A record). So it
looks like this Exim process got stuck a while ago and just isn't coming
back.
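The port 25 check can be repeated from the mail host with a short script such as the one below. It is a sketch that only confirms a TCP handshake completes within a timeout (120 seconds by default, chosen to mirror connect_timeout = 2m); it does not wait for the SMTP greeting, and `can_connect` is a hypothetical helper name.

```python
import socket


def can_connect(host: str, port: int = 25, timeout: float = 120.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures and
        # timeouts (socket.timeout is a subclass of OSError).
        return False
```

For example, `can_connect("example.com")` returns True when the handshake succeeds, which is what a manual telnet to port 25 would show.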
A BSD ktrace(1) on the process shows _no_ system call activity at all (unless
I run exiwhat, in which case I see the expected calls required to dump
process state to exim-process.info).
So, um... what's going on here? I really would like to get to the point
where the mail spool is empty 12 hours after the mailshot starts.
Anyone have any clues, or am I going to have to break out the debugger?
Ciao,
Sheldon.