Re: [exim] Valid working host "failing for a long time"

Author: Phil Pennock
Date:  
To: Colin
CC: exim-users
Subject: Re: [exim] Valid working host "failing for a long time"
On 2011-08-25 at 10:30 +0100, Colin wrote:
> On 25/08/2011 10:01, Graeme Fowler wrote:
> > Please try two things: Firstly, run "exim_tidydb /path/to/spool retry"
> > and see what happens, if the problem continues then simply stop Exim
> > and *remove* the retry DB file altogether (it's only a hints file so
> > is safe to delete). It strikes me that there may be some stale data
> > lying around. Alternatively it could be related to a very old bug in
> > Exim's retry handling but that was wrinkled out a long time ago...
> >
> > Graeme
>
> Thanks for the reply,
>
> Shortly after my last email I actually used exim_dumpdb to check if
> there were any entries in any of the databases that related to the
> domains in question. The only entries were in the callout db and none
> seemed relevant to the error.
>
> The only oddity I came across was that there was no misc.lockfile
> present to dump that db though I could create one (removed after) and
> the dump returned nothing.
>
> I am running Exim version 4.69 so I wouldn't think there would be any
> old bugs knocking around, the OS is Centos 5 so it is the latest out of
> the packages.


Resurrecting this thread, because I think that I know part of what's
happening.

Exim honours the retry database for any delivery happening as part of a
queue-run. It's only "immediate" delivery which bypasses the retry
rules and goes straight through, clearing DB problems.

So if you have any mails matching a queue_only directive, then the mail
will not be immediately delivered. When I saw this problem as a
postmaster, I was queuing all mail and kicking off deliveries once per
minute, which let connection re-use work really well.

In your case, if the system load rises above a threshold such as
queue_only_load, then mails received during that time will go into the
queue instead of being delivered immediately.
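
To make the two paths concrete, here is a toy Python model of the arrival
decision as described in this message. It is a sketch, not Exim's code:
queue_only and queue_only_load are real option names, but the function
names, the string return values and the simplified load comparison are
assumptions made purely for illustration.

```python
from typing import Optional


def queued_on_arrival(queue_only: bool,
                      queue_only_load: Optional[float],
                      load_avg: float) -> bool:
    """Return True if a newly received message is left on the queue.

    Toy model (assumption, not Exim source): queue_only always queues,
    and queue_only_load queues mail while the load average is above
    the configured threshold. None means the option is unset.
    """
    if queue_only:
        return True
    if queue_only_load is not None and load_avg > queue_only_load:
        return True
    return False


def first_delivery_path(queue_only: bool,
                        queue_only_load: Optional[float],
                        load_avg: float) -> str:
    """Name the path on which the first delivery attempt happens.

    Per the description in this message, "queue-run" deliveries honour
    the retry database, while "immediate" deliveries bypass it.
    """
    if queued_on_arrival(queue_only, queue_only_load, load_avg):
        return "queue-run"
    return "immediate"
```

With a hypothetical threshold of 8, a message arriving at load 9.5 takes
the "queue-run" path and so meets any stale retry data, while one arriving
at load 2 is delivered immediately. That would account for only some mails
being affected.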

That separately raises the question of why the successful deliveries
aren't clearing the DB state, but it does explain why this would only be
happening to some mails.

It's so obvious in retrospect, with those specific delivery details
called out. *sigh*

It's _possible_ that Exim needs to change from "retry rules honoured for
all messages found via queue-runs" to "retry rules honoured for all
messages found via queue-runs, unless it would be the first delivery
attempt for this mail". There are profound load implications to
changing this, so I'm not going to rush into it. Perhaps a new config
option ...
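
As a rough sketch of the difference between the two rules, here is a toy
Python predicate. It is only a model of the wording above, not Exim's
implementation: the function name and the exempt_first_attempt flag are
hypothetical, with the flag standing in for the "new config option" idea.
No such Exim option exists as far as this message is concerned.

```python
def honours_retry_rules(via_queue_run: bool,
                        first_attempt: bool,
                        exempt_first_attempt: bool = False) -> bool:
    """Return True if the retry rules are consulted for this delivery.

    Toy model (assumption, not Exim source).
    exempt_first_attempt=False models the current rule as described:
    retry rules honoured for all messages found via queue-runs.
    exempt_first_attempt=True models the possible change: the same,
    unless this would be the first delivery attempt for the mail.
    """
    if not via_queue_run:
        # Immediate delivery bypasses the retry rules, as described above.
        return False
    if exempt_first_attempt and first_attempt:
        # Hypothetical new rule: a never-tried message gets one attempt.
        return False
    return True
```

Under the current rule, a freshly queued message picked up by a queue run
is subject to the retry data (True); with the hypothetical flag set, the
same message would get one attempt regardless (False). Every such message
then costs a real delivery attempt, which is where the load concern
comes from.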

-Phil