Re: [exim] Potential logic error in retry handling for IPv4+…

Author: Philip Hazel
Date:  
To: Marc Haber, exim-users, Florian Weimer, Jeroen van Wolffelaar
CC: 
Subject: Re: [exim] Potential logic error in retry handling for IPv4+IPv6 hosts
On Sat, 17 Dec 2005, Marc Haber wrote:

> This is actually an issue with how exim handles DNS answers. Just
> imagine that the A record for a target host name expires in the
> resolver's cache some time earlier than the AAAA record. When exim now
> queries for the MX record, the resolver returns the data which it
> still has cached, which is the AAAA record, in the additional section.
>
> Exim will believe the information from the additional section, and try
> delivering there.


Aarrgghh!! Yes, that does explain it. Thank you for tracking this one
down.
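
To make the failure mode concrete, here is a minimal sketch of a caching resolver whose A entry expires before its AAAA entry. It is an illustration only, not Exim or resolver code: the CachedRecord type, the additional_section helper, the host name and the addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    rtype: str       # "A" or "AAAA"
    address: str
    expires_at: int  # absolute time (seconds) at which the cache entry expires


def additional_section(cache, host, now):
    """Return the address records a caching resolver could still attach
    to an MX answer for `host` at time `now` (hypothetical model)."""
    return [r for r in cache.get(host, []) if r.expires_at > now]


# Hypothetical cache contents for one MX target. The A record has the
# shorter remaining lifetime, so it drops out of the cache first.
cache = {
    "mx.example.org": [
        CachedRecord("A", "192.0.2.1", expires_at=300),
        CachedRecord("AAAA", "2001:db8::1", expires_at=600),
    ]
}

early = additional_section(cache, "mx.example.org", now=100)
late = additional_section(cache, "mx.example.org", now=400)

print([r.rtype for r in early])  # ['A', 'AAAA']
print([r.rtype for r in late])   # ['AAAA']
```

In the second case a mailer that believes the additional section sees only the IPv6 address, even though the host still has an A record in the DNS.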

On Sat, 17 Dec 2005, Marc Sherman wrote:

> What's the fix for this? Have exim always explicitly query DNS for A if
> the additional section only returns AAAA (or vice versa)? Or should the
> additional section not be trusted at all?


I'm beginning to think that the additional section should not be trusted
at all.

On Sat, 17 Dec 2005, Marc Haber wrote:

> I think that if none of the hosts listed in the additional section is
> reachable, exim should explicitly query for AAAA and A to make sure to
> catch even the hosts that are not listed in the additional section.


Given the current design of Exim, this is not possible. It does the DNS
queries at routing time, and doesn't know whether the hosts are
reachable until transport time.

(1) A possibility, I suppose, would be to believe the additional section
for the very first time a message is tried, and to do the additional DNS
lookups for any subsequent delivery attempts. No, even that would be bad
if the additional section hosts had been dead for a long time, because
Exim would then bounce the message.

(2) Try again: Exim could see if any of the additional section hosts
have a retry record in Exim's database, and if so, try the full DNS
lookup. But that would put extra load on the hints databases. It might
not be any better than just doing the full A/AAAA lookup.

(3) I suppose a modified (1) would be never to bounce a message for "all
hosts timed out" on the first delivery attempt. Then always do the full
lookup on the subsequent attempts. That's the best I can come up with
for now.
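
Idea (3) might be sketched roughly as follows. This is only an outline of the decision logic under stated assumptions, not Exim source: choose_hosts, on_all_hosts_timed_out and the full_lookup callable are hypothetical names, and the addresses are documentation examples.

```python
def choose_hosts(first_attempt, additional_hosts, full_lookup):
    """Pick the addresses to hand to the transport at routing time.

    first_attempt    -- True if this message has never been tried before
    additional_hosts -- addresses taken from the MX answer's additional section
    full_lookup      -- callable doing explicit A/AAAA queries (assumption)
    """
    if first_attempt and additional_hosts:
        # First try: believe the additional section and save the queries.
        return list(additional_hosts)
    # Any later try: always do the full lookup.
    return full_lookup()


def on_all_hosts_timed_out(first_attempt):
    """Decide what to do when every host is past its retry cutoff."""
    # Never bounce on the first attempt, because the host list may have
    # been incomplete; defer so that the next attempt does a full lookup.
    return "defer" if first_attempt else "bounce"


# Usage: count how often the expensive lookup actually runs.
lookups = []


def full_lookup():
    lookups.append("full")
    return ["192.0.2.1", "2001:db8::1"]


first = choose_hosts(True, ["2001:db8::1"], full_lookup)
second = choose_hosts(False, ["2001:db8::1"], full_lookup)
```

Here the first attempt uses only the address from the additional section and costs no extra queries; the second attempt sees both addresses.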

> This is, btw, not an ipv6 issue exclusively, it might happen in
> ipv4-only setups as well.


Indeed.

On Sat, 17 Dec 2005, Florian Weimer wrote:

> It would increase the number of DNS queries by quite a bit, though.


Yes. Do we have any idea if this would be a serious problem?
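
As a rough upper bound on the extra load, the arithmetic can be sketched like this. It assumes one MX query per domain, one A and one AAAA query per MX host, no help from the resolver's cache, and an additional section that covers every host in the trusting case; the function names are hypothetical.

```python
def queries_trusting_additional(n_mx_hosts):
    """Only the MX query; addresses come from the additional section."""
    return 1


def queries_full_lookup(n_mx_hosts, families=2):
    """The MX query plus one query per address family per MX host."""
    return 1 + families * n_mx_hosts


for n in (1, 3, 5):
    print(n, queries_trusting_additional(n), queries_full_lookup(n))
# 1 1 3
# 3 1 7
# 5 1 11
```

In practice most of these answers would come from a nearby caching resolver, so the real cost is likely to be lower than this worst case.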

On Mon, 19 Dec 2005, Jeroen van Wolffelaar wrote:

> Not so much if all of the additional section is tried first, and extra
> queries are only done if all of those fail -- a rarer situation, I'd
> say.


That's true, but Exim's design does not lend itself to implementing
that, other than by the first-time/next-time idea I outlined above.

On Mon, 19 Dec 2005, Florian Weimer wrote:

> This would require a significant change in how Exim handles routing
> (part of the routing would run after a delivery attempt has been
> performed). I doubt it's worth the complexity.


I agree.

-- 
Philip Hazel            University of Cambridge Computing Service,
ph10@???      Cambridge, England. Phone: +44 1223 334714.
Get the Exim 4 book:    http://www.uit.co.uk/exim-book