Re: [exim-dev] Re: [exim] Bug in duplicate detection?

Author: Steven A. Reisman
Date:
To: exim-dev
Subject: Re: [exim-dev] Re: [exim] Bug in duplicate detection?

I'm posting this not so much because I have a solution, but because the
problem reminded of a compiler design course I took many, many years ago,
and I thought I'd throw this out for discussion.

On Tue, Mar 08, 2005 at 04:55:17PM +0000, Philip Hazel wrote:

> Suppose a message is addressed to both alice and bob, and after two
> routing cycles, these addresses have been aliased like this:
>
> alice bob <== start > | | > V V > alice-1 bob-1 <== after first cycle > | | > V V > alice-2 alice-2 <== after second cycle

This problem seems to be similar to that which a compiler encounters
when trying to optimize code with common subexpressions.

Consider: a=(b*c)+(b*c)

A compiler constructs a Directed Acyclic Graph merging subtrees when
it sees common subexpressions:

> Now, at this point Exim notices that it has two identical addresses, so
> it drops one as a duplicate. When I first implemented this logic, I
> thought "routing an address will always give the same answer, so it
> does not matter which address is discarded". Unfortunately, my premise
> was utterly wrong.

Instead of throwing away one of the duplicate addresses (and the
associated ancestor tree), could the two trees be merged into a DAG?
That way all common ancestors can be accounted for.

Would a similar data structure and algorithm be applicable here?
Does that save all the information we need?

More importantly, is email routing according to the current "rules"
non-deterministic?

(I'll stop here but keep the remainder of Philip's analysis below.)

Steve

> Suppose that routing alice-2 causes it to be redirected to alice-1 (this
> is in effect what was happening in Harald's situation). If this is the
> alice-2 that came from alice, Exim will treat alice-1 as a "redirection
> to ancestor" because it will find alice-1 further up the ancestor chain.
> That case works: you are allowed to forward to yourself. The router that
> processed the original alice-1 will be skipped when the second alice-1
> is routed.

> However, if it is bob's alice-2 that is retained, once it is redirected
> to alice-1, Exim discards it as a duplicate (because alice-1 has been
> handled in a different ancestor chain). Thus, in this case, no delivery
> at all occurs because all the addresses have been discarded.

> This is of course a total disaster.

> It occurs to me also that the existence of the redirect_router option
> could make this worse. Each alice-2 could have a redirect_router setting
> that caused it to be handled entirely differently, for example, one
> might redirect to adam and the other to eve.

> My reaction to this discovery is .... Aarrgghh!!

> I'm afraid nothing can be done about this in the short term, but I guess
> there should be a long-term plan to redesign this part of Exim from
> scratch. In fact, the duplicate detection logic has become messy over
> time and has caused a number of problems. See, for example, the
> following ChangeLog entries:

> 4.11/17 4.11/85 4.05/31 4.02/17 3.953/55 3.21/7 ...

> So it's clearly a rather fragile area.

> Sorry, folks.

> Philip

----------------------------------------------------------------------
Steven A. Reisman          <sar@???>           P.O. Box 409
Press Enter LLP                                       421 N 2nd Street
715-426-2100 or 651-436-5254                    River Falls, WI  54022
----------------------------------------------------------------------

This message is part of the following thread:
	the complete thread tree sorted by date
	Philip Hazel at
	Phil Chambers at