Re: [exim] problems with retry logic

Author: Philip Hazel
Date:  
To: Pavel Gulchouck
CC: exim-users
Subject: Re: [exim] problems with retry logic
On Fri, 5 May 2006, Pavel Gulchouck wrote:

> root@hamster:~>exim -d+retry -v -q 1FZbYM-000IDB-Hk 1FZbYM-000IDB-Hk


Please try

exim -d+retry -M 1FZbYM-000IDB-Hk

so that it does actually try a delivery (that's what I intended) - but
first see below.

> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> Considering: frenzy2008@???
> unique = frenzy2008@???
> dbfn_read: key=R:gmail.com
> dbfn_read: key=R:frenzy2008@???
> no domain retry record
> post-process frenzy2008@??? (1)
> LOG: retry_defer MAIN
> == frenzy2008@??? routing defer (-51): retry time not reached
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>


You can find out what it thinks the retry time is by using the exinext
utility. This is a routing retry that will apply just to one address.

You are running 4.60, I seem to recall. This change was made for 4.61:

PH/19 When calculating a retry time, Exim used to measure the "time since
      failure" by looking at the "first failed" field in the retry record. Now
      it does not use this if it is later than the arrival time of the
      message. Instead it uses the arrival time. This makes for better
      behaviour in cases where some deliveries succeed, thus re-setting the
      "first failed" field. An example is a quota failure for a huge message
      when small messages continue to be delivered. Without this change, the
      "time since failure" will always be short, possibly causing more frequent
      delivery attempts for the huge message than are intended.
      [Note: This change was subsequently modified - see PH/04 for 4.62.]


Subsequently modified for 4.62:

PH/04 Change PH/19 for 4.61 was too wide. It should not be applied to host
      errors. Otherwise a message that provokes a temporary error (when other
      messages do not) can cause a whole host to time out.
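
[Editorial illustration, not part of the original message.] The combined
effect of PH/19 and PH/04 on the "time since failure" can be sketched in a
few lines of Python. This is only a reading of the two ChangeLog entries
above, not Exim's actual code: the function name, the parameter names and
the host_error flag are all hypothetical, and times are plain seconds.

```python
def time_since_failure(now, first_failed, arrival_time, host_error=False):
    """Sketch of the 'time since failure' rule described in PH/19 and PH/04.

    All names here are hypothetical; Exim's real implementation may differ.
    """
    start = first_failed
    # PH/19 (4.61): if the "first failed" field is later than the message's
    # arrival time, measure from the arrival time instead.
    # PH/04 (4.62): do not apply that substitution to host errors.
    if not host_error and first_failed > arrival_time:
        start = arrival_time
    return now - start


# Quota example from PH/19: a huge message arrived at t=0, but successful
# small deliveries kept re-setting "first failed", most recently to t=9000.
print(time_since_failure(10000, 9000, 0))                   # measured from arrival
print(time_since_failure(10000, 9000, 0, host_error=True))  # measured from first_failed
```

In this sketch the huge message is measured from its arrival time, so its
"time since failure" keeps growing even though "first failed" was re-set,
while a host error is still measured from the "first failed" field.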


I am not sure if this will affect routing retries, but I think it might.
So perhaps you should start by upgrading to 4.62.

> I do not understand logic...


One of the problems is that *I* no longer fully understand all of it. It
probably needs somebody to draw out a flowchart or something to try to
see what is going on.

> Exim should inspect all queue for old messages to the same destination
> after each delivery attempt of any other message?


Exim always inspects all messages, but if one message has a temporary
error, the remaining messages will see "retry time not reached". BUT, if
they have been on the queue sufficiently long, they should be bounced.
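
[Editorial illustration, not part of the original message.] The behaviour
described in the paragraph above can be sketched as a small decision
function. It assumes that "sufficiently long" means the message is older
than some cutoff taken from the applicable retry rule; that assumption, the
function name and all parameter names are hypothetical, and this is not
Exim's actual logic.

```python
def queue_run_action(now, next_try, arrival_time, cutoff):
    """Sketch of what a queue run does with one address of one message.

    Hypothetical names; 'cutoff' is an assumed maximum retry period in seconds.
    """
    if now >= next_try:
        return "attempt"   # retry time reached: try the delivery
    if now - arrival_time > cutoff:
        return "bounce"    # on the queue too long: give up on the address
    return "defer"         # logged as "retry time not reached"


# A message that arrived at t=0, with the next retry due at t=5000
# and an assumed cutoff of 4 days.
FOUR_DAYS = 4 * 24 * 3600
print(queue_run_action(now=1000, next_try=5000, arrival_time=0, cutoff=FOUR_DAYS))
```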

> AFAIU retry time is relative to the destination, not to the message.


Your error is a recipient error, so it's also relative to the recipient.

> I assume that after 20-minutes delay queue runner was started and it
> decides that this message is too old and removes it, its decision was
> based on retry database (relative to the destination email or to the
> relay used in this attempt). But I'm not sure of this. It's hard to
> reproduce with debug, such behavior is rare in my system.


Exim should never remove a message without logging *something*.

-- 
Philip Hazel            University of Cambridge Computing Service
Get the Exim 4 book:    http://www.uit.co.uk/exim-book