Re: [Exim] messages not timing out

Author: Richard.Hall
Date:  
To: exim-users
Subject: Re: [Exim] messages not timing out
Hi Phil,

Very long time, no see!

On Wed, 2 Jun 2004, Philip Chambers wrote:

> I have just discovered that I have several messages in my queue which are not being
> failed after several days (the oldest is 13 days). They are all addressed to
> xxxx@??? or sina.com.cn, so should go via sinamx.sina.com.cn. However,
> connections to port 25 at that address are accepted but immediately disconnected.
>
> I have the following retry section
>
> *    *  F,2h,15m;F,16h,1h;F,4d,4h

>
> I don't understand why exim has not given up on them. If I manually give up on them
> using eximon then the non-delivery messages will not give the correct reason.


I've got exactly the same problem for exactly the same domains, though I
can only claim 8 days (should have expired at 5 days).
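
For anyone checking the arithmetic, here is a minimal Python sketch of how a
rule such as the one Phil quotes (F,2h,15m;F,16h,1h;F,4d,4h) can be read. It
assumes only fixed-interval (F) items and a simplified model in which each
item applies until its cutoff, measured from the first failure. The function
names are made up for illustration and this is not Exim's own code.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_interval(text):
    """Turn an interval such as '15m' or '1h30m' into seconds."""
    parts = re.findall(r"(\d+)([smhdw])", text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError("unrecognised interval: %r" % text)
    return sum(int(n) * UNIT_SECONDS[u] for n, u in parts)


def parse_rules(spec):
    """Split 'F,2h,15m;F,16h,1h' into (cutoff, interval) pairs in seconds."""
    rules = []
    for item in spec.split(";"):
        letter, cutoff, interval = item.strip().split(",")
        if letter != "F":
            # Only the fixed-interval form is handled in this sketch.
            raise ValueError("only F items are handled here")
        rules.append((parse_interval(cutoff), parse_interval(interval)))
    return rules


def retry_offsets(spec):
    """Seconds after the first failure at which a retry falls due
    (simplified model, see the assumptions above)."""
    offsets = []
    elapsed = 0
    for cutoff, interval in parse_rules(spec):
        while elapsed < cutoff:
            elapsed += interval
            offsets.append(elapsed)
    return offsets


RULE = "F,2h,15m;F,16h,1h;F,4d,4h"
SCHEDULE = retry_offsets(RULE)
FINAL_CUTOFF = parse_rules(RULE)[-1][0]

if __name__ == "__main__":
    print("retries before final cutoff:", len(SCHEDULE))
    print("final cutoff in days:", FINAL_CUTOFF / 86400)
```

Under this reading, the final cutoff is simply the cutoff of the last item in
the rule, counted from the "first failed" time of the host in question.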

My current (incomplete) analysis goes something like this:

a) the messages are NOT frozen (sorry, Nico!)
b) sinamx.sina.com.cn has 16 (count 'em) A records.
c) exinext is reporting errors on 40 (yes, really) different addresses
d) one can only speculate about why their DNS is changing so significantly

e) some of the addresses reported by exinext are very old, eg

Transport: sinamx.sina.com.cn [202.106.182.168] error 131: Connection reset by peer
first failed: 18-May-2004 11:03:49
last tried: 18-May-2004 11:03:49
next try at: 18-May-2004 11:13:49

This one (and hopefully all the similar ones) is not a current A record,
and is presumably waiting for an exim_tidydb run to clear it out.
Hopefully it is not contributing to the problem in the meantime.

f) some are still 'in progress', eg

Transport: sinamx.sina.com.cn [202.106.187.149] error -18: Remote host sinamx.sina.com.cn [202.106.187.149] closed connection in response to initial connection
first failed: 02-Jun-2004 07:02:23
last tried: 02-Jun-2004 12:19:25
next try at: 02-Jun-2004 12:34:25

g) some are 'past final cutoff time' (I never have managed to understand
what that means!!), eg

Transport: sinamx.sina.com.cn [202.106.182.232] error -18: Remote host sinamx.sina.com.cn [202.106.182.232] closed connection in response to initial connection
first failed: 26-May-2004 02:51:45
last tried: 02-Jun-2004 11:33:24
next try at: 02-Jun-2004 12:33:24
past final cutoff time

and yet they are still retried ...

Transport: sinamx.sina.com.cn [202.106.182.232] error -18: Remote host sinamx.sina.com.cn [202.106.182.232] closed connection in response to initial connection
first failed: 26-May-2004 02:51:45
last tried: 02-Jun-2004 12:33:55
next try at: 02-Jun-2004 13:33:55
past final cutoff time

h) some actually manage a connection of sorts, eg

Transport: sinamx.sina.com.cn [202.106.187.181:1BVSHb-0000uv-TU] error -18: Remote host sinamx.sina.com.cn [202.106.187.181] closed connection in response to end of data
first failed: 02-Jun-2004 11:31:48
last tried: 02-Jun-2004 11:31:48
next try at: 02-Jun-2004 11:41:48
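
Sorting through forty of these records by eye is tedious, so here is a
minimal Python sketch that pulls out the fields discussed in points e) to g).
It assumes the layout shown in the excerpts above (a "Transport:" line
carrying the address, then the three timestamps and an optional "past final
cutoff time" line). The sample text is copied from two of those excerpts; the
function names are invented for illustration, and the set of current A
records has to be supplied by the caller.

```python
import re
from datetime import datetime

MONTHS = {name: i for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Two records as they appear in this message, mail wrapping included.
SAMPLE = """\
Transport: sinamx.sina.com.cn [202.106.182.168] error 131: Connection
reset by peer
first failed: 18-May-2004 11:03:49
last tried: 18-May-2004 11:03:49
next try at: 18-May-2004 11:13:49
Transport: sinamx.sina.com.cn [202.106.182.232] error -18: Remote host
sinamx.sina.com.cn [202.106.182.232] closed connection in response to
initial connection
first failed: 26-May-2004 02:51:45
last tried: 02-Jun-2004 11:33:24
next try at: 02-Jun-2004 12:33:24
past final cutoff time
"""


def parse_stamp(text):
    """Parse '18-May-2004 11:03:49' without relying on the locale."""
    date_part, time_part = text.split()
    day, month, year = date_part.split("-")
    hour, minute, second = time_part.split(":")
    return datetime(int(year), MONTHS[month], int(day),
                    int(hour), int(minute), int(second))


def parse_records(text):
    """Return one dict per 'Transport:' record found in the text."""
    records = []
    for line in text.splitlines():
        if line.startswith("Transport:"):
            match = re.search(r"\[(\d+(?:\.\d+){3})", line)
            records.append({"ip": match.group(1) if match else None,
                            "past_cutoff": False})
        elif not records:
            continue  # ignore anything before the first record
        elif line.startswith(("first failed:", "last tried:", "next try at:")):
            key, _, value = line.partition(":")
            records[-1][key] = parse_stamp(value.strip())
        elif line.strip() == "past final cutoff time":
            records[-1]["past_cutoff"] = True
    return records


def not_in_dns(records, current_ips):
    """Records whose address is absent from the caller-supplied A records."""
    return [r for r in records if r["ip"] not in current_ips]


RECORDS = parse_records(SAMPLE)

if __name__ == "__main__":
    for rec in RECORDS:
        failing_for = rec["last tried"] - rec["first failed"]
        print(rec["ip"], failing_for, "past cutoff:", rec["past_cutoff"])
```

Feeding it the full exinext output together with the present list of A
records should separate the leftover entries described in e) from the ones
that are still being retried after their cutoff.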


I can't help feeling that there is some complicated interaction between
the number of A records, the relative 'age' of the various addresses, the
different failure modes, and who knows what else, which is conspiring to
keep these messages 'alive' beyond their sell-by date. But trying to
understand the algorithm does my poor little head in, so I'll have to
leave this to the experts.

I can provide the full exinext output if anyone wants it.

This is on Exim 4.30.

> Phil.
> ---------------------------------------
> Phil Chambers (postmaster@???)
> University of Exeter



HTH,
Richard Hall