Re: [exim] Weird retry behaviour

Author: Todd Lyons
Date:
To: Dan Carroll
CC: Exim Users, Russell King
Subject: Re: [exim] Weird retry behaviour

On Wed, Jul 30, 2014 at 4:57 PM, Dan Carroll <fbsd@???> wrote:
>
>>> You can also delete the retry and wait* databases to reset everything….
>>> (I’d probably offline exim while I did that but it’s likely not necessary).
>> And using exim_tidydb (as I said I have done) and then I quoted the only
>> two remaining entries in the retry database…
> The fact that you see this in your logs:
> "retry time not reached for any host after a long failure period"
> means that exim considered that the host was down for a long time. That information, as I am sure you realise, comes from the retry/wait dbs.
> You tidied up an old record from the MX of the domain you are trying to send to, and then the retry succeeded.

I did not interpret it that way. This is the flow as I understand it:
1) message A sent at 4:xx...deferred
2) retry..retry..retry
3) message B sent at 6:xx...succeeds
4) immediately decide to fail message A, even though the just
completed message B was to the same recipient as message A

> My guess is that the retry code matched the old DB entry (makes sense, the IP address is the same even if the hostname is different), which for some reason was not removed.
> Another guess, perhaps it went like this:
> entry is added in February when the host is offline.
> host DNS changes (as you have stated it did)
> now caramon.* does not exist, but the mx for the domain is mx0.*
> mx0.* has the same IP address as caramon.
>
> For some reason, caramon never gets cleaned (maybe exim does not clean up hosts when the hostname does not match or won’t resolve?)
> I’m not sure how db entries are cleaned. Perhaps after a successful connection, exim removes the matching entry (host+ip), which in your case would mean that it would never remove caramon from the list.

There is typically a daily cron job that cleans the exim db's:

KVM-CentOS510[root@ivwm01 ~]# more /etc/cron.daily/exim-tidydb
#!/bin/bash
SPOOLDIR=/var/spool/exim
cd $SPOOLDIR/db
for a in retry misc wait-* callout ratelimit; do
    [ -r "$a" ] || continue
    [ "${a%%.lockfile}" = "$a" ] || continue
    /usr/sbin/exim_tidydb $SPOOLDIR $a >/dev/null
done

Since an entry from February was still in the retry hints db, either
this cron job is not firing while the machine is up, or it's failing
in some way, or it's not configured in cron at all.
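
A rough way to tell those three cases apart is sketched below. The
function name check_tidy_job is made up for illustration, and the
default path is simply the one from my box above, so adjust it for
your system. It assumes the usual cron.daily behaviour, where a script
that is not executable gets skipped.

```shell
# check_tidy_job [PATH]
# Hypothetical helper (not part of exim): reports whether the daily
# tidy script exists, is executable, and actually calls exim_tidydb.
check_tidy_job() {
    job=${1:-/etc/cron.daily/exim-tidydb}   # default path is an assumption
    if [ ! -e "$job" ]; then
        echo "missing: $job"
        return 1
    fi
    if [ ! -x "$job" ]; then
        # cron.daily scripts are normally only run if executable
        echo "not executable: $job"
        return 1
    fi
    if ! grep -q exim_tidydb "$job"; then
        echo "no exim_tidydb call: $job"
        return 1
    fi
    echo "ok: $job"
}
```

Calling check_tidy_job with no argument checks the default path. An
"ok" result only says the script is in place, not that cron is really
running it, so the cron log is still worth a look.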

> Then we try and deliver some mail in July.
> retry matches the host caramon in the retry DB because the retry code only looks at the IP address.
> So it does not even bother to retry. It seems all retries have been exhausted.

This does seem logical.

Bottom line from the discussion above: make sure that exim_tidydb is
being run regularly.

...Todd
--
The total budget at all receivers for solving senders' problems is $0.
If you want them to accept your mail and manage it the way you want,
send it the way the spec says to. --John Levine
