[exim] Retry time not reached for any host after a long fail…

Author: Jens Hoffrichter
Date:  
To: exim-users
Subject: [exim] Retry time not reached for any host after a long failure period
Hello all,

I'm having a pretty weird problem with exim that I can't find a
solution for. Similar things have popped up a couple of times on the
mailing list, but nothing matched the exact symptoms I'm experiencing.

I'm doing a general overhaul of a larger mail system, with several
smtp in servers and 5 cyrus pop3/imap backend servers. Up to now, the
delivery of mail to the backends was handled by a local "deliver"
process, which used a replicated cyrus murder database to determine
the backend to deliver to, and sent the mail via lmtpproxyd to the
responsible backend.

For various reasons, we need to change that and have exim
deliver directly to the backends using lmtp, as we want to
gradually replace the smtp in servers with new hardware and newer
distributions, and the cyrus on the new in server just isn't
compatible with the old backend servers.

When I switch exim to delivering directly via lmtp, everything
seems to work fine, except that some messages are bounced immediately,
without any attempt to deliver them to the backend. Here is a relevant
excerpt from the logfile, with some data anonymized:

2008-08-27 01:00:02 1KY7W6-00049z-6S <= root@??? H=(webserver10.xxxxxx) [xxx.xxx.174.206] P=esmtp S=1249 id=20080826230001.D220E469D70@???
2008-08-27 01:00:02 1KY7W6-00049z-6S ** xx123456@??? <user@???> R=loadbalancer_final T=remote_lmtp_delivery: retry time not reached for any host after a long failure period
2008-08-27 01:00:02 1KY7W6-00049z-6S ** xx456789@??? <user@???> R=loadbalancer_final T=remote_lmtp_delivery: retry time not reached for any host after a long failure period
2008-08-27 01:00:02 1KY7W6-0004A5-F7 <= <> R=1KY7W6-00049z-6S U=exim P=local S=2363
2008-08-27 01:00:02 1KY7W6-00049z-6S Completed
2008-08-27 01:00:02 1KY7W6-0004A5-F7 => root@??? R=outgoing_route T=remote_smtp H=smtp.liwest.at [212.33.55.20] X=TLSv1:AES256-SHA:256
2008-08-27 01:00:02 1KY7W6-0004A5-F7 Completed
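
To find every message hit by this error, the main log can be grouped by message ID. The following is only a rough sketch in Python, not part of the setup described here: it assumes one log entry per line (the excerpt may arrive wrapped by the mail client), and the names `find_retry_bounces` and `SAMPLE` are made up for illustration.

```python
import re
from collections import defaultdict

# Main-log entry: date, time, message ID (6-6-2 characters), rest of the entry.
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} "
    r"([0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}) (.*)$"
)
ERROR_TEXT = "retry time not reached for any host"


def find_retry_bounces(log_lines):
    """Map message ID -> addresses that failed ("**") with ERROR_TEXT."""
    bounced = defaultdict(list)
    for line in log_lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a per-message log entry
        msg_id, rest = match.groups()
        # "**" marks a failed delivery; the address is the next field.
        if rest.startswith("** ") and ERROR_TEXT in rest:
            bounced[msg_id].append(rest.split()[1])
    return dict(bounced)


# Sample taken from the excerpt above, one entry per line.
SAMPLE = [
    "2008-08-27 01:00:02 1KY7W6-00049z-6S ** xx123456@??? <user@???> "
    "R=loadbalancer_final T=remote_lmtp_delivery: retry time not reached "
    "for any host after a long failure period",
    "2008-08-27 01:00:02 1KY7W6-00049z-6S ** xx456789@??? <user@???> "
    "R=loadbalancer_final T=remote_lmtp_delivery: retry time not reached "
    "for any host after a long failure period",
    "2008-08-27 01:00:02 1KY7W6-00049z-6S Completed",
]

for msg_id, addresses in find_retry_bounces(SAMPLE).items():
    print(msg_id, len(addresses), addresses)
```

Counting the addresses per message ID also gives a quick way to check whether only multi-recipient messages are affected.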


The relevant routers and transport from the config file:

address_data is filled from an ldap query in an earlier router, and is correct.

loadbalancer:
driver = redirect
redirect_router = loadbalancer_final
local_part_suffix = +*
local_part_suffix_optional
condition = ${lookup{${extract {mailHost} {$address_data}}}lsearch{/etc/exim/backends}}
data = ${extract {uid} {$address_data}}@${extract {mailHost} {$address_data}}

loadbalancer_final:
driver = accept
condition = ${if def:address_data{yes}{no} }
transport = remote_lmtp_delivery

begin transports

remote_lmtp_delivery:
driver = smtp
protocol = lmtp
port = 2003
hosts = $domain
gethostbyname = true


I've cut out the parts of the config which are not relevant to
this example; there is another router between the two, and there are
more transports, of course.

In the backends file there is a line like "lilzmailbe01.liwest.at :
OK" for each backend, so the condition matches. I wanted to use
${filter on a list, but some of the exim installations are so old that
it isn't implemented there yet, so I had to fall back to the file
variant.
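
For readers less familiar with the expansions, the routing logic described above can be approximated in Python. This is a simplified sketch under assumptions, not how exim implements it: the `address_data` value shown (`uid=... mailHost=...`) is a hypothetical example of the key=value form that `${extract` reads, and the helper names `extract`, `lsearch` and `route` are invented here.

```python
import re
import shlex


def extract(key, data):
    """Rough stand-in for ${extract{key}{data}} on 'k=v k=v' strings."""
    for token in shlex.split(data):
        name, sep, value = token.partition("=")
        if sep and name == key:
            return value
    return ""  # key not present


def lsearch(key, file_lines):
    """Rough stand-in for an lsearch lookup: value for key, or None."""
    for line in file_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Key ends at whitespace or a colon; the colon is optional.
        match = re.match(r"([^\s:]+)\s*:?\s*(.*)", line)
        if match and match.group(1).lower() == key.lower():
            return match.group(2)
    return None


def route(address_data, backend_lines):
    """Return the rewritten address, or None if the condition fails."""
    host = extract("mailHost", address_data)
    if lsearch(host, backend_lines) is None:
        return None  # host not in the backends file: router is skipped
    # Mirrors: data = ${extract {uid} ...}@${extract {mailHost} ...}
    return f"{extract('uid', address_data)}@{host}"


backends = ["lilzmailbe01.liwest.at : OK"]
# Hypothetical address_data value, for illustration only.
print(route("uid=xx123456 mailHost=lilzmailbe01.liwest.at", backends))
```

The domain part of the rewritten address is what the transport then uses as its host, via hosts = $domain.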

The really strange thing is that delivery to both of the hosts
mentioned there worked fine before and after; only that message (and a
couple more) got bounced. One thing I noticed, though, is that it only
happened to mails which had more than one RCPT TO (mainly newsletters
delivered to the mail system).

The exim version on the particular host where that happened is 4.66.

I'm really at a loss as to what is happening here, and I can't figure
out where the problem is. I hope someone has an idea; any input is
greatly appreciated.

Thanks to all in advance,
Jens