Re: [exim] Weird retry behaviour

Author: Dan Carroll
Date:  
To: Russell King
CC: Exim Users
Subject: Re: [exim] Weird retry behaviour
Have a look at the retry and wait entries in the spool hints databases.
If a host has been down long enough, Exim won’t try it again, even for a new message, until the recorded retry time is reached.
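
To make "retry time" concrete, here is a rough Python sketch of the schedule implied by the retry rule quoted further down (F,2h,15m; G,16h,1h,1.5; F,4d,6h). It is a simplified model, not Exim's own code: it assumes F means a fixed interval, G means an interval multiplied by the given factor after each attempt, and each cutoff is measured from the first failure. The names parse_time, parse_rule and retry_offsets are made up for this illustration.

```python
import re

# Seconds per time unit (only single-unit times such as "2h" are handled here).
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

# The retry rule quoted later in this thread.
RULE = "F,2h,15m; G,16h,1h,1.5; F,4d,6h"


def parse_time(text):
    """Convert a time such as '15m' or '4d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhdw])", text.strip())
    if match is None:
        raise ValueError(f"unsupported time: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def parse_rule(rule):
    """Split a retry rule into (letter, cutoff_s, interval_s, factor) tuples."""
    steps = []
    for part in rule.split(";"):
        fields = [field.strip() for field in part.split(",")]
        letter = fields[0]
        factor = float(fields[3]) if letter == "G" else 1
        steps.append((letter, parse_time(fields[1]), parse_time(fields[2]), factor))
    return steps


def retry_offsets(rule):
    """Approximate retry times, in seconds after the first failure.

    Assumption: a step applies while the elapsed time is below its cutoff,
    so the last retry of a step may land slightly past that cutoff.
    """
    offsets = []
    now = 0
    for letter, cutoff, interval, factor in parse_rule(rule):
        step = interval
        while now < cutoff:
            now += step
            offsets.append(now)
            if letter == "G":
                step *= factor  # geometric growth of the interval
    return offsets


if __name__ == "__main__":
    for seconds in retry_offsets(RULE):
        print(f"{seconds / 3600:6.2f}h")
```

Under this model the final cutoff is 4 days (96 hours) after the first failure, whereas the quoted log shows the bounce at 11:41:19, roughly 7h21m after the first deferral at 04:20:17. That gap is why the stored hints are worth inspecting.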

See the man pages for exim_dumpdb and exim_fixdb.
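
As a hedged illustration of what to look for in the dump, the Python sketch below works out how long a retry record has been failing, using one of the timestamp lines quoted further down. It assumes, per my reading of the exim_dumpdb documentation, that the three times are first failure, last attempt and next attempt, and that a trailing asterisk marks a record whose final retry cutoff has passed. The function names are hypothetical, and English month abbreviations are assumed when parsing.

```python
from datetime import datetime, timedelta

# Timestamp layout seen in the quoted dump, e.g. 26-Dec-2013 03:06:32.
# Assumption: month abbreviations are English (C locale).
STAMP = "%d-%b-%Y %H:%M:%S"


def parse_times(line):
    """Parse the timestamp line of a retry record.

    Returns (first_failed, last_try, next_try, expired), where expired is
    True when the line ends with an asterisk.
    """
    fields = line.split()
    expired = fields[-1] == "*"
    stamps = [
        datetime.strptime(f"{fields[i]} {fields[i + 1]}", STAMP)
        for i in (0, 2, 4)
    ]
    return stamps[0], stamps[1], stamps[2], expired


def failing_for(line):
    """Time between the first recorded failure and the last attempt."""
    first_failed, last_try, _next_try, _expired = parse_times(line)
    return last_try - first_failed


if __name__ == "__main__":
    record = "26-Dec-2013 03:06:32 27-Jul-2014 11:41:04 27-Jul-2014 17:41:04 *"
    age = failing_for(record)
    print(age, "past 4d cutoff:", age > timedelta(days=4))
```

For the quoted IPv6 record this gives a failure period of more than 213 days, far beyond the 4 day cutoff in the retry rule. Whether that stale record is what triggered the bounce is a guess, but it is the kind of entry worth checking.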

You can also delete the retry and wait* databases to reset everything.
(I’d probably take Exim offline while doing that, but it’s likely not necessary.)

-D



On 28 Jul 2014, at 1:02 am, Russell King <rmk+exim@???> wrote:

> I know that 4.69 is an old version of exim, but... I'm seeing some
> weird behaviour with it.
>
> The machine in question acts as a backup machine for another computer.
> It's set up such that each night, it powers itself on, transfers the
> data, archives it, sends a mail and powers off. Once a week, it
> remains on for a 24 hour period.
>
> The problem is this - exim behaves itself just fine when it can send
> the message immediately. If it can't (because of the DSL at the site
> being down) then exim gives me a hard failure and bounces the message.
>
> This goes totally against what is in the config file for the retry
> rules:
>
> *                      *           F,2h,15m; G,16h,1h,1.5; F,4d,6h
>
> The config file is pretty much standard Fedora 14 (the retry line
> above is the F14 default), but with this as the transport:
>
> remote_smtp:
> driver = smtp
> headers_rewrite = *@* hidden@??? fs
> return_path = hidden@???
>
>
> So it should take many days before bouncing. However:
>
> 2014-07-26 07:01:42 1XAv38-0000iU-Ln <= backup@??? U=backup P=local S=65027 id=20140726060142.GA2756@shgc-backup
> 2014-07-26 07:01:48 1XAv38-0000iU-Ln => rmk@??? R=dnslookup T=remote_smtp H=mx0.arm.linux.org.uk [78.32.30.218] X=TLSv1:AES256-SHA:256
>
> that one was fine. Then this morning:
>
> 2014-07-27 04:19:35 1XBEzn-0000XA-FM <= root@??? U=root P=local S=3340
> 2014-07-27 04:20:17 1XBEzn-0000XA-FM == rmk@??? <root@???> R=dnslookup defer (-1): host lookup did not complete
> 2014-07-27 04:21:22 1XBEzn-0000XA-FM == rmk@??? <root@???> routing defer (-51): retry time not reached
> 2014-07-27 04:26:21 1XBEzn-0000XA-FM == rmk@??? <root@???> routing defer (-51): retry time not reached
> 2014-07-27 04:31:23 1XBEzn-0000XA-FM == rmk@??? <root@???> routing defer (-51): retry time not reached
> 2014-07-27 04:36:42 1XBEzn-0000XA-FM == rmk@??? <root@???> R=dnslookup defer (-1): host lookup did not complete
> ...
> 2014-07-27 05:16:42 1XBEzn-0000XA-FM == rmk@??? <root@???> R=dnslookup defer (-1): host lookup did not complete
> ...
> 2014-07-27 05:36:42 1XBEzn-0000XA-FM == rmk@??? <root@???> R=dnslookup defer (-1): host lookup did not complete
> ...
> 2014-07-27 05:56:41 1XBEzn-0000XA-FM == rmk@??? <root@???> R=dnslookup defer (-1): host lookup did not complete
> ...
> 2014-07-27 06:16:41 1XBEzn-0000XA-FM == rmk@??? <root@???> R=dnslookup defer (-1): host lookup did not complete
> 2014-07-27 06:36:41 1XBEzn-0000XA-FM == rmk@??? <root@???> R=dnslookup defer (-1): host lookup did not complete
>
> 2014-07-27 06:44:43 1XBHGF-0000iX-Rm <= backup@??? U=backup P=local S=63350 id=20140727054423.GA2759@shgc-backup
> 2014-07-27 06:45:24 1XBHGF-0000iX-Rm == rmk@??? R=dnslookup defer (-1): host lookup did not complete
> 2014-07-27 06:46:19 1XBEzn-0000XA-FM == rmk@??? <root@???> routing defer (-51): retry time not reached
> 2014-07-27 06:46:19 1XBHGF-0000iX-Rm == rmk@??? routing defer (-51): retry time not reached
> ...
> 2014-07-27 07:46:39 1XBEzn-0000XA-FM == rmk@??? <root@???> R=dnslookup defer (-1): host lookup did not complete
> 2014-07-27 07:46:39 1XBHGF-0000iX-Rm == rmk@??? routing defer (-51): retry time not reached
> ...
> 2014-07-27 08:46:19 1XBEzn-0000XA-FM == rmk@??? <root@???> routing defer (-51): retry time not reached
> 2014-07-27 08:46:19 1XBHGF-0000iX-Rm == rmk@??? routing defer (-51): retry time not reached
> ...
> 2014-07-27 09:21:40 1XBEzn-0000XA-FM == rmk@??? <root@???> R=dnslookup defer (-1): host lookup did not complete
> 2014-07-27 09:21:40 1XBHGF-0000iX-Rm == rmk@??? routing defer (-51): retry time not reached
> ...
> 2014-07-27 11:40:59 1XBHGF-0000iX-Rm mx0.arm.linux.org.uk [2002:4e20:1eda:1:214:fdff:fe10:1be6] Network is unreachable
> 2014-07-27 11:40:59 1XBHGF-0000iX-Rm mx0.arm.linux.org.uk [2001:4d48:ad52:3201:214:fdff:fe10:1be6] Network is unreachable
> 2014-07-27 11:41:04 1XBHGF-0000iX-Rm => rmk@??? R=dnslookup T=remote_smtp H=mx0.arm.linux.org.uk [78.32.30.218] X=TLSv1:AES256-SHA:256
> 2014-07-27 11:41:04 1XBHGF-0000iX-Rm Completed
> 2014-07-27 11:41:19 1XBEzn-0000XA-FM ** rmk@??? <root@???> R=dnslookup T=remote_smtp: retry time not reached for any host after a long failure period
> 2014-07-27 11:41:19 1XBLtH-0000qh-O9 <= <> R=1XBEzn-0000XA-FM U=exim P=local S=4383
> 2014-07-27 11:41:19 1XBEzn-0000XA-FM Completed
> 2014-07-27 11:41:21 1XBLtH-0000qh-O9 => rmk@??? <root@???> R=dnslookup T=remote_smtp H=mx0.arm.linux.org.uk [78.32.30.218] X=TLSv1:AES256-SHA:256
>
> So, at 11:41:04, exim found that the destination was now able to be
> delivered to. However, it decided to time out the 1XBEzn-0000XA-FM
> message _before_ the retry rules stated that it should time out, and
> sent a non-delivery report... which it also successfully delivered to
> the same destination!
>
> The wait-remote_smtp database is empty.
>
> The two most recent retry database entries are:
>
> 26-Dec-2013 03:06:32 27-Jul-2014 11:41:04 27-Jul-2014 17:41:04 *
> T:mx0.arm.linux.org.uk:2002:4e20:1eda:1:214:fdff:fe10:1be6 101 77 Network is unreachable
> 24-Dec-2013 03:06:23 27-Jul-2014 11:41:04 27-Jul-2014 17:41:04 *
> T:pandora.arm.linux.org.uk:2002:4e20:1eda:1:214:fdff:fe10:1be6 101 77 Network is unreachable
>
> which are expected as the site running this exim has no IPv6 connectivity
> to be able to use the IPv6 addresses I have here. The only entry for the
> IPv4 address is an old one which should have expired long ago (and the
> DNS changed since then):
>
> 13-Feb-2014 05:26:39 13-Feb-2014 05:26:39 13-Feb-2014 05:41:39
> T:caramon.arm.linux.org.uk:78.32.30.218 110 333 Connection timed out
>
> Indeed, having tidied the retry database, the only two entries which
> remain are the two above.
>
> The DNS for the machine is configured to use Google's DNS servers
> (iow, 8.8.8.8 and 8.8.4.4) as I've had problems with the ISP's DNS
> servers - so DNS would have been unavailable during the loss of
> connectivity too.
>
> So, the question is whether there's something screwed with the config
> file, or whether it's just this old exim version misbehaving (which I
> suspect is the real problem here.) What I don't understand is why the
> successful delivery of 1XBHGF-0000iX-Rm seemed to cause 1XBEzn-0000XA-FM
> to be immediately bounced.
>
> This probably isn't an issue that I can reproduce at will; I've seen it
> a number of times, and it's always triggered by the loss of connectivity
> at the site.
>
> --
> Russell King
>
> --
> ## List details at https://lists.exim.org/mailman/listinfo/exim-users
> ## Exim details at http://www.exim.org/
> ## Please use the Wiki with this list - http://wiki.exim.org/
>