[exim] Bug in Exim - duplicating messages

Author: Russell Stuart
Date:  
To: exim-users
Subject: [exim] Bug in Exim - duplicating messages
Version of Exim: 4.43, + Exiscan patch
Operating System:      Debian Sarge
URL with full details: http://www.lubemobile.com.au/ras/exim


I am getting duplicated messages with Exim. The above URL has all
the details: logs, cron scripts, -H files, and config files.

The problem started when I took Philip's advice on how to prevent
large emails from hogging the queue. You can find Philip's suggestion
here:

http://www.exim.org/mail-archives/exim-users/Week-of-Mon-20050214/msg00050.html

I hope I interpreted Philip's suggestion correctly. I added
two size controls and a cron job:

  - An ACL ensures that messages above 20K are always queued.

  - The SMTP transport refuses to deliver messages bigger than 100K
    unless the MAX_MESSAGE_SIZE macro is redefined.


  - A cron job that runs every 5 minutes.  It starts an Exim queue
    runner with the requisite -D switch.


It all seemed to work well enough on my test machine, of course.
When I sent a real live large message, this is what happened:

1.  The message arrives.  It hits the ACL condition and is queued.
    A log of what happened, as produced by "exim -d+all", is
    on the web site, file name: 1st-exim4_-d+all_russell@???.


2.  A normal queue runner comes along.  It sees the message and
    attempts to deliver it.  There are two "interesting" steps
    (well interesting in that I feel they may be related to the
    problem) in this delivery process:


    A.  The message is duplicated, i.e. marked as "unseen".


    B.  The message is not delivered via SMTP, but rather is
        delivered to the virus scanner.  There are no size
        restrictions on what is passed to the virus scanner.


    As far as I can tell all these things work.  I ran the
    queue runner manually using "exim -d+all -q".  The log
    file is 2nd-exim4_-d+all_-q.log on the web site.


    The one thing that went wrong is that the original message
    was not considered delivered.  This is the bug.


3.  The virus scanner does its thing, and sends the now scanned
    message back to exim using "exim -bs -oMt virus_scan < message".
    This creates a second copy of the message.  As it is a large 
    message the ACL causes it to be queued immediately.  The
    message produced is in 1D4X5K-0003oV-1M-H on the web site.


4.  I repeated steps 2 & 3, running another queue runner manually,
    thus producing a second duplicate (1D4YFN-0005Rd-Nb-H) and
    another log (3rd-exim4_-d+all_-q.log).  In real life several
    queue runners come along, each one creating a duplicate,
    until the cron job runs.  The cron job delivers the original
    message successfully, so the duplication stops.
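
The sequence in steps 1 to 4 can be pictured with a small toy model.
This is only a sketch of the observed behaviour, not Exim's code and
not the configuration on the web site.  All names here (Message,
normal_queue_run, cron_queue_run) are made up for illustration.  The
model also assumes the cron run simply completes everything on the
queue; whether that run produces one more scanned copy is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Message:
    name: str
    scanned: bool = False     # True for copies re-injected by the virus scanner
    delivered: bool = False   # True once the message is considered complete


def normal_queue_run(queue):
    """Ordinary queue runner, without the raised size limit (hypothetical model)."""
    for msg in list(queue):   # iterate over a snapshot; the queue grows below
        if msg.delivered or msg.scanned:
            continue          # scanned copies are too big for SMTP and just wait
        # The original goes to the scanner, which re-injects a scanned copy...
        queue.append(Message(f"{msg.name}-copy{len(queue)}", scanned=True))
        # ...but the original is NOT marked as delivered, so the next
        # queue runner will pick it up again.


def cron_queue_run(queue):
    """Queue runner started by cron with the larger limit (hypothetical model)."""
    for msg in queue:
        msg.delivered = True  # assumption: everything completes in this run


queue = [Message("original")]
for _ in range(3):            # three ordinary queue runs before cron fires
    normal_queue_run(queue)
cron_queue_run(queue)
normal_queue_run(queue)       # later runs find nothing left to do

duplicates = [m for m in queue if m.scanned]
print(len(duplicates))
```

In this model, each ordinary queue run that happens before the cron
run adds one scanned duplicate, and the duplication stops once the
original is finally marked as delivered.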


The file 1D4X4f-0003oC-3e-H on the web site is the original message
that was queued, after it has been attacked by the queue runners.

I am off to look at the code now: any hints from those who know
the code better than I would be appreciated.