[Exim] hosts_max_try/sun.com WAS retry timeout exceeded

Author: Glenn Carver
Date:  
To: exim-users
Subject: [Exim] hosts_max_try/sun.com WAS retry timeout exceeded
There was a recent thread from Marc Merlin about exim timing out
trying to deliver to sun.com. Phil found that hosts_max_try needed to
be increased from its default of 5 because the first 5 MX hosts were
timing out.

I appear to have the reverse problem. Incoming email is being
rejected from sun.com:

2002-08-02 14:52:31 H=sun-9.sjc-colo.bbnplanet.com (news.iplanet.com)
[207.240.115.10] sender verify fail for <bounce@???>: all
relevant MX records point to non-existent hosts or (invalidly) to IP
addresses
2002-08-02 14:52:31 H=sun-9.sjc-colo.bbnplanet.com (news.iplanet.com)
[207.240.115.10] F=<bounce@???> rejected RCPT
<guang.zeng@???>: Sender verify failed

Presumably increasing hosts_max_try in the smtp transport will make no
difference here, since this is incoming mail (assuming I understand
what's going on)? How do I solve this one?

TIA,

      Glenn


--- thread follows

On Thu, 1 Aug 2002, Marc MERLIN wrote:

> So, I have mail for a user that was bounced by exim
>
> 2002-08-01 05:17:40 17aEkW-0000od-00 mx9.sun.com [192.18.98.34]: Connection timed out
> 2002-08-01 05:20:49 17aEkW-0000od-00 mx8.sun.com [192.18.98.36]: Connection timed out
> 2002-08-01 05:23:58 17aEkW-0000od-00 mx2.sun.com [192.18.98.43]: Connection timed out


Notice the IP addresses: 192.18.98.{34,36,43}.

> 2002-08-01 05:23:58 17aEkW-0000od-00 == marco.walther@??? R=lookuphost T=remote_smtp defer (110): Connection timed out
> 2002-08-01 05:23:58 17aEkW-0000od-00 ** marco.walther@???: retry timeout exceeded
>
> Ok, we've all seen this, but mail to him was working less than 24H previous
> to that:
> 2002-07-31 17:33:18 17a3tx-0006Vt-01 => marco.walther@???
> F=<svlug-bounces+marco.walther=sun.com@???> R=lookuphost
> T=remote_smtp S=5049 H=mx1.sun.com [192.18.98.31] C="250 SAA17437
> Message accepted for delivery"

Notice that the IP address there is 192.18.98.31. That is different to
the three above.

> So, I'm trying to find out why exim gave up delivery to him before the 4
> days expired _and_ a delivery worked soon before that?


I think I now understand this. I also think it's an obscure bug in Exim,
so I have made a note to try to find a way of improving things.

The clue is the hosts_max_try option in the smtp transport, whose
default is 5. The sun.com domain currently resolves to 7 hosts:

mx6.sun.com.             A  192.18.42.13
mx8.sun.com.             A  192.18.98.36
mx9.sun.com.             A  192.18.98.34
mx7.sun.com.             A  192.18.100.1
mx1.sun.com.             A  192.18.98.31
mx2.sun.com.             A  192.18.98.43
mx5.sun.com.             A  192.18.42.14


[This shows the usefulness of posting real log data. If you had obscured
the domain, I would never have thought of this.]

So Exim would have picked 5 to try. If all 5 failed, it would have
looked at their retry times, and if they were all expired, it would have
bounced the message, ignoring the other two hosts.
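
To make that concrete, here is a rough Python model of the behaviour just
described. It is only a sketch, not Exim's code: the names (Host, decide),
the order in which the hosts are tried, and the assumption that every host
except mx1 is unreachable and past its retry time are all invented for
illustration. Only the host names come from the list above.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    reachable: bool      # assumption: would a connection succeed right now?
    retry_expired: bool  # assumption: is this host past its retry time?


def decide(hosts, hosts_max_try=5):
    """Toy model of the behaviour described in the thread, not Exim's code."""
    tried = hosts[:hosts_max_try]  # hosts beyond the limit are ignored
    for host in tried:
        if host.reachable:
            return "delivered via " + host.name
    if all(host.retry_expired for host in tried):
        return "bounce"  # the untried hosts are never consulted
    return "defer"


# Hypothetical ordering and state: only mx1 accepts mail, and it comes sixth.
hosts = [
    Host("mx9.sun.com", False, True),
    Host("mx8.sun.com", False, True),
    Host("mx2.sun.com", False, True),
    Host("mx6.sun.com", False, True),
    Host("mx7.sun.com", False, True),
    Host("mx1.sun.com", True, False),
    Host("mx5.sun.com", False, True),
]

print(decide(hosts))                   # bounce
print(decide(hosts, hosts_max_try=7))  # delivered via mx1.sun.com
```

With the limit at 5 the working host is never reached, so the address
bounces; with the limit raised to cover all 7 hosts, the same state results
in a delivery.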

> You'll see that there are some connection timeouts by some MXes, but MX1
> also returned a lot more C="250 GAA24313 Message accepted for delivery"
>
> Could it be that the failure cache is by MX, and that when the delivery
> failed, an MX lookup didn't return MX1, and all the other MXes failed (they
> apparently always do)?


Effectively, yes! It returned MX1, but Exim discarded it by virtue of
the hosts_max_try setting.

<grumble>
What is the point of putting 7 MXs into the DNS if most of them always
reject connections?
</grumble>

Your workaround, of course, is to set hosts_max_try to some larger
number. My task is to make Exim look at the retry time for *all* the
hosts before bouncing an address.
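
The difference between the current check and the intended one can be
sketched in a few lines of Python. This is an illustration of the idea only,
not a description of how Exim does or will implement it; the function names
and the example state are hypothetical.

```python
def bounce_current(retry_expired, hosts_max_try=5):
    # Behaviour described in this thread: only the hosts that were
    # actually tried are considered when deciding to bounce.
    return all(retry_expired[:hosts_max_try])


def bounce_proposed(retry_expired):
    # Stated goal: every host for the domain must be past its retry
    # time before the address is bounced.
    return all(retry_expired)


# Hypothetical state, in try order: the five tried hosts are past their
# retry time, the two untried hosts are not.
state = [True] * 5 + [False] * 2

print(bounce_current(state), bounce_proposed(state))  # True False
```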

Thanks for persisting on this one.

Philip