Re: [exim-dev] Re: [exim] Bug in duplicate detection?

Top Page
Delete this message
Reply to this message
Author: Steven A. Reisman
Date:  
To: exim-dev
Subject: Re: [exim-dev] Re: [exim] Bug in duplicate detection?
I'm posting this not so much because I have a solution, but because the
problem reminded of a compiler design course I took many, many years ago,
and I thought I'd throw this out for discussion.


On Tue, Mar 08, 2005 at 04:55:17PM +0000, Philip Hazel wrote:

> Suppose a message is addressed to both alice and bob, and after two
> routing cycles, these addresses have been aliased like this:
>
>     alice               bob                   <== start
>       |                  |
>       V                  V
>     alice-1             bob-1                 <== after first cycle
>       |                  |
>       V                  V
>     alice-2             alice-2               <== after second cycle



This problem seems to be similar to that which a compiler encounters
when trying to optimize code with common subexpressions.

Consider: a=(b*c)+(b*c)

A compiler constructs a Directed Acyclic Graph merging subtrees when
it sees common subexpressions:

             = 
            / \
           a   +
              / \
             |   |
              \ /
               *
              / \
             b   c


> Now, at this point Exim notices that it has two identical addresses, so
> it drops one as a duplicate. When I first implemented this logic, I
> thought "routing an address will always give the same answer, so it
> does not matter which address is discarded". Unfortunately, my premise
> was utterly wrong.


Instead of throwing away one of the duplicate addresses (and the
associated ancestor tree), could the two trees be merged into a DAG?
That way all common ancestors can be accounted for.

Would a similar data structure and algorithm be applicable here?
Does that save all the information we need?

More importantly, is email routing according to the current "rules"
non-deterministic?

(I'll stop here but keep the remainder of Philip's analysis below.)

Steve



> Suppose that routing alice-2 causes it to be redirected to alice-1 (this
> is in effect what was happening in Harald's situation). If this is the
> alice-2 that came from alice, Exim will treat alice-1 as a "redirection
> to ancestor" because it will find alice-1 further up the ancestor chain.
> That case works: you are allowed to forward to yourself. The router that
> processed the original alice-1 will be skipped when the second alice-1
> is routed.


> However, if it is bob's alice-2 that is retained, once it is redirected
> to alice-1, Exim discards it as a duplicate (because alice-1 has been
> handled in a different ancestor chain). Thus, in this case, no delivery
> at all occurs because all the addresses have been discarded.


> This is of course a total disaster.


> It occurs to me also that the existence of the redirect_router option
> could make this worse. Each alice-2 could have a redirect_router setting
> that caused it to be handled entirely differently, for example, one
> might redirect to adam and the other to eve.


> My reaction to this discovery is .... Aarrgghh!!


> I'm afraid nothing can be done about this in the short term, but I guess
> there should be a long-term plan to redesign this part of Exim from
> scratch. In fact, the duplicate detection logic has become messy over
> time and has caused a number of problems. See, for example, the
> following ChangeLog entries:


> 4.11/17 4.11/85 4.05/31 4.02/17 3.953/55 3.21/7 ...


> So it's clearly a rather fragile area.


> Sorry, folks.


> Philip


----------------------------------------------------------------------
Steven A. Reisman          <sar@???>           P.O. Box 409
Press Enter LLP                                       421 N 2nd Street
715-426-2100 or 651-436-5254                    River Falls, WI  54022
----------------------------------------------------------------------