Re: exim's retry database acting up

Author: Piete Brooks
Date:  
To: Christoph Lameter
CC: Philip Hazel, exim-users
Subject: Re: exim's retry database acting up
> When I brought it up again, a huge number of messages were waiting in the
> queue on the mailserver, but they were not delivered. "Retry time not
> reached"! Initially the user's home directory could not be stat()ed and
> thus the message was deferred.


ZAP the retry database, and do a queue run ...
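
Something like this, off the top of my head (untested; I'm assuming the
default spool directory /var/spool/exim -- adjust to taste):

    # with exim stopped, or at least no queue runners active, remove the
    # retry hints database (and its lock file, if one exists)
    rm /var/spool/exim/db/retry /var/spool/exim/db/retry.lockfile

    # then start a queue run; with no retry data left, every message is tried
    exim -q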

> Then the most amazing thing:
> New messages coming in were queued instead of being delivered!


Sure -- it knows that delivery attempts to that user failed (for *some* reason)
last time, so it has backed off further attempts.

> I finally had enough of this nonsense and stopped exim and erased the
> retry database. On startup exim delivered all messages to their
> destination.


That's the way to do it (not sure if a restart is needed, but then I don't run
exim like that ...)

> What exim needs to do is to deliver ALL messages when the first
> .forward file from those home directories was successfully read.


*YOU* know that, but how is exim supposed to know?
exim doesn't keep track of *why* a delivery failed, and in particular, how would
it know that all the .forward files live on a single server?
(we have multiple home directory servers ....)

> There needs to be some easy way to get rid of those retry times.


ZAP the retry database ...

> exim_fixdb was not documented in a way that would have allowed me to
> use it on a huge number of messages.


Need a perl script .... :-)
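
(Untested sketch: assuming the spool lives under /var/spool/exim and that
exim_fixdb is happy reading its commands from a pipe -- check its docs for
the exact prompt commands -- you could dump every retry key and feed each
one, followed by 'd' to delete that record, back into exim_fixdb:

    exim_dumpdb /var/spool/exim retry | awk 'NF {print $1; print "d"}' | \
        exim_fixdb /var/spool/exim retry

A proper perl script could do the same with some error checking.)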

> The way that exim runs the queue made it take a couple of hours to get
> through it. Messages are delivered one by one, and since the queue also
> contained messages to unreachable hosts, there was a significant delay:
> exim was taking a long time to time out on those problem messages.


I run multiple queue-runner processes: "exim -q & exim -q & exim -q & exim -q &"

> Could exim time out within 10 seconds on those slow messages, fork them
> into a separate process, and continue delivering the rest of the queue
> faster?


What if you have (say) lost your internet connection and have (say) 10K
hosts to which email is to be sent -- do you want 10K processes?