[Exim] exim-3.35: Stuck messages (hung delivery processes)

Author: Sheldon Hearn
Date:
To: exim-users
Subject: [Exim] exim-3.35: Stuck messages (hung delivery processes)

Hi folks,

I've altered the retry configuration on my mass mailer to give up after
12 hours, because I handle subsequent retries myself.

At the end of the last mailshot, I found several messages like this one still in the queue:

27h   19K 16nIEq-000BnI-07 <mailman-handler-XXX-XXXXX@???>
          guest_localpart@???
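To spot such messages without eyeballing mailq, a small Python filter can pull the age out of each queue entry and compare it with the 12-hour cut-off. This is only a sketch. The helper name overdue_messages is hypothetical, and it assumes mailq prints ages as a number followed by m, h or d, as in the 27h entry above. Lines it does not recognise are skipped.

```python
import re

# Matches the first line of a mailq entry, e.g.
#   "27h   19K 16nIEq-000BnI-07 <sender@example>"
# Assumption: the age is a number followed by m, h or d.
MAILQ_LINE = re.compile(r"^\s*(\d+)([mhd])\s+\S+\s+(\S+)\s+<")

# Conversion factors from each assumed unit to hours.
UNIT_HOURS = {"m": 1 / 60, "h": 1.0, "d": 24.0}


def overdue_messages(mailq_text, cutoff_hours=12.0):
    """Return (message_id, age_in_hours) for entries older than cutoff_hours."""
    overdue = []
    for line in mailq_text.splitlines():
        match = MAILQ_LINE.match(line)
        if not match:
            continue  # recipient lines and blank lines carry no age
        amount, unit, message_id = match.groups()
        age_hours = int(amount) * UNIT_HOURS[unit]
        if age_hours > cutoff_hours:
            overdue.append((message_id, age_hours))
    return overdue
```

Fed the listing above, it reports 16nIEq-000BnI-07 as 27 hours old, well past the cut-off.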

For each one, exiwhat shows the following:

55584 running queue: waiting for 16nIEq-000BnI-07 (95006)
95006 delivering 16nIEq-000BnI-07 (queue run pid 55584)
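The exiwhat output pairs each queue runner with a delivery process: the runner waits for a pid, and that pid names the runner as its queue run. That pairing can be extracted mechanically to get the delivery pids to look up in ps or attach ktrace to. Below is a hypothetical Python sketch. It assumes the two line formats shown above are the only ones that matter and ignores every other process state.

```python
import re

# The two exiwhat line shapes quoted above; other states are ignored.
WAITING = re.compile(r"^\s*(\d+) running queue: waiting for (\S+) \((\d+)\)")
DELIVERING = re.compile(r"^\s*(\d+) delivering (\S+) \(queue run pid (\d+)\)")


def queue_run_deliveries(exiwhat_text):
    """Map message id -> (delivery pid, queue-runner pid).

    Only pairs that both sides agree on are returned: a queue runner
    waiting for a pid, and that pid reporting the same runner.
    """
    waiting = {}     # (runner pid, message id) -> delivery pid it waits for
    delivering = {}  # (runner pid, message id) -> delivery pid reporting it
    for line in exiwhat_text.splitlines():
        m = WAITING.match(line)
        if m:
            runner, msg_id, child = m.groups()
            waiting[(int(runner), msg_id)] = int(child)
            continue
        m = DELIVERING.match(line)
        if m:
            child, msg_id, runner = m.groups()
            delivering[(int(runner), msg_id)] = int(child)
    result = {}
    for (runner, msg_id), child in waiting.items():
        if delivering.get((runner, msg_id)) == child:
            result[msg_id] = (child, runner)
    return result
```

Each pid found this way can then be checked against the STARTED column in ps to see how long that delivery has been running.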

USER      PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
exim    95006  0.0  0.1  1540 1148  ??  S     3:48PM   0:04.08 /usr/local\
    /sbin/exim -C /usr/local/etc/exim/configure.spool2 -q

This is at 10:30, which suggests that the process was started almost 7
hours ago! The process started before the 12-hour cut-off mark in my
retry configuration, which explains why the message is still in the
queue more than 12 hours after the first delivery attempt failed.

I have the following configuration for the remote_smtp transport:

remote_smtp:
driver = smtp
command_timeout = 2m
connect_timeout = 2m

What's weird is that I can establish a connection to port 25 on
example.com (which doesn't have an MX record, just an A record). So it
looks like this Exim process got stuck a while ago and just isn't coming
back.
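A manual check of this kind can be sketched in Python using the same limits as the transport: 2m to connect and 2m to wait for a reply. The function name and the use of plain socket timeouts are assumptions for illustration. This is not how Exim itself implements connect_timeout or command_timeout.

```python
import socket

# Same limits as the remote_smtp transport above, in seconds.
CONNECT_TIMEOUT = 120.0
COMMAND_TIMEOUT = 120.0


def smtp_greeting(host, port=25, connect_timeout=CONNECT_TIMEOUT,
                  command_timeout=COMMAND_TIMEOUT):
    """Connect and return the server's first reply line, e.g. '220 ...'.

    Raises socket.timeout (an alias of TimeoutError on Python 3.10+)
    if either stage takes too long, instead of blocking indefinitely.
    """
    with socket.create_connection((host, port), timeout=connect_timeout) as sock:
        # Separate limit for waiting on the server's response.
        sock.settimeout(command_timeout)
        reply = b""
        while not reply.endswith(b"\n"):
            chunk = sock.recv(512)
            if not chunk:
                break  # server closed the connection
            reply += chunk
        if not reply:
            return ""
        return reply.decode("ascii", "replace").splitlines()[0]
```

Calling smtp_greeting("example.com") should either return the 220 banner line or raise a timeout within about two minutes per stage. That is the bounded behaviour the transport settings are meant to give.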

A BSD ktrace(1) on the process shows _no_ system call activity at all (unless
I run exiwhat, in which case I see the expected calls required to dump
process state to exim-process.info).

So, um... what's going on here? I really would like to get to the point
where the mail spool is empty 12 hours after the mailshot starts.
Anyone have any clues, or am I going to have to break out the debugger?

Ciao,
Sheldon.
