Re: [exim] Re: Bug in Exim - duplicating messages

Author: Russell Stuart
Date:  
To: Philip Hazel
CC: exim-users
Subject: Re: [exim] Re: Bug in Exim - duplicating messages
On Wed, 2005-04-06 at 01:13, Philip Hazel wrote:
> Hmm. I cannot see that. Maybe I'm going mad, but I can't see anything in
> previously_transported() or the functions that it calls that would make
> any changes to that list.


I can't see it now either.

> > The reason for this long spiel is I have not tested your
> > patch, but it does look like it would exhibit the same
> > faults as my original one. I think my test script did
> > demonstrate the fault - if you go looking at what isn't
> > delivered.
>
> Certainly my patch is essentially the same as your first one. Do you
> still have the remnants of any tests that showed up the problem?


No, but I can re-do it. Here are the results.

Running http://www.lubemobile.com.au/ras/exim/bug.sh with
the working (I hope) version of my patch:

rstuart@master:~/test$ sudo ./bug.sh
Message 1DIwXJ-00030w-Me has been removed
Message 1DIwXJ-000312-RF has been removed
Deliveries done by lmtp transport:
250 2.0.0 <x@www.exim.org> OK

Running same program with the old "previously_transported(next)" patch:

rstuart@master:~/test$ sudo ./bug.sh
Message 1DIwS3-00014p-Ju has been removed
exim: no message ids given after -Mrm option
Deliveries done by lmtp transport:
250 2.0.0 <x@www.exim.org> OK

The message that couldn't be removed in the second test was delayed
because the emulated lmtp delivery agent always returns 421 for it.
In the second test exim thinks it has delivered it - hence the
error when I tried to remove it.


> HOWEVER:
>
> I can see a bug in my patch. Because it leaves the address on the queue,
> it will be picked up by the higher call to previously_transported() on
> the next time round the loop. This will cause child_done() to be called
> a second time on the address, which will probably cause some double
> logging, at the very least. Maybe it also caused your problem.


I will risk making another guess - it looks like child_done()
tracks whether all of the parent's children have been
delivered by decrementing a counter ("addr->child_count -= 1"
in the code). Ergo if child_done() is called too many times,
that counter is going to be wrong - it will be decremented
one too many times, as you point out.

When that counter reaches 0, address_done() is called for the
parent, which adds the parent to the nonrecipient tree.
It turns out that in my test the parent's email address is
identical to the undelivered child's, because the child was
created by an "unseen" flag in a router, so flagging the
parent as delivered also flagged the undelivered child as
delivered.
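
To make the guess above concrete, here is a small stand-alone model of
the counting. It is only a sketch and not Exim's actual code: the
struct, the flat nonrecipient list, the example addresses and the names
child_done_model() and simulate() are all made up for illustration.
The only things taken from the real code are the idea of decrementing
child_count and marking the parent done when it reaches 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much simplified stand-in for an Exim address item. */
typedef struct address {
    const char *email;
    struct address *parent;
    int child_count;            /* children not yet completed */
} address;

/* Flat list standing in for the nonrecipient tree (assumption). */
#define MAX_DONE 8
static const char *nonrecipients[MAX_DONE];
static size_t n_nonrecipients = 0;

static void mark_done(const char *email)
{
    if (n_nonrecipients < MAX_DONE)
        nonrecipients[n_nonrecipients++] = email;
}

static bool is_done(const char *email)
{
    for (size_t i = 0; i < n_nonrecipients; i++)
        if (strcmp(nonrecipients[i], email) == 0)
            return true;
    return false;
}

/* Model of the guessed behaviour: one completed child decrements the
 * parent's counter; at 0 the parent itself is marked as done. */
static void child_done_model(address *child)
{
    assert(child != NULL);
    address *parent = child->parent;
    if (parent == NULL)
        return;
    parent->child_count -= 1;
    if (parent->child_count == 0)
        mark_done(parent->email);
}

/* Parent with two children: one delivered, one deferred. The deferred
 * child has the same address as the parent, as with "unseen".
 * Returns true if the deferred address ends up marked as done. */
static bool simulate(int calls_for_delivered_child)
{
    n_nonrecipients = 0;
    address parent    = { "x@example.org", NULL, 2 };
    address delivered = { "y@example.org", &parent, 0 };
    address deferred  = { "x@example.org", &parent, 0 };

    for (int i = 0; i < calls_for_delivered_child; i++)
        child_done_model(&delivered);

    return is_done(deferred.email);
}
```

In this model simulate(1) leaves the deferred address alone, while
simulate(2), i.e. child_done being called twice for the same delivered
child, drives the counter to 0 and marks the shared address as done
even though the deferred copy was never delivered.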