Version of Exim: 4.43, + Exiscan patch
Operating System: Debian Sarge
URL with full details: http://www.lubemobile.com.au/ras/exim
I am getting duplicated messages with Exim. The above URL has all
the details, such as logs, cron scripts, -H files, and config files.
The problem started when I took Philip's advice on how to prevent
large emails from hogging the queue. You can find Philip's suggestion
here:
http://www.exim.org/mail-archives/exim-users/Week-of-Mon-20050214/msg00050.html
I hope I interpreted Philip's suggestion correctly. I added
a couple of size controls:
- An ACL ensures that messages above 20K are always queued.
- The SMTP transport refuses to deliver messages bigger than 100K
unless the MAX_MESSAGE_SIZE macro is redefined.
- A cron job that runs every 5 minutes. It starts an Exim queue
runner with the requisite -D switch.
It all seemed to work well enough on my test machine, of course.
When I sent a real live large message, this is what happened:
1. The message arrives. It hits the ACL condition and is queued.
A log of what happened, as produced by "exim -d+all", is
on the web site, file name: 1st-exim4_-d+all_russell@???.
2. A normal queue runner comes along. It sees the message and
attempts to deliver it. There are two "interesting" steps
(well interesting in that I feel they may be related to the
problem) in this delivery process:
A. The message is duplicated, i.e. marked as "unseen".
B. The message is not delivered via SMTP, but rather is
delivered to the virus scanner. There are no size
restrictions on what is passed to the virus scanner.
As far as I can tell all these things work. I ran the
queue runner manually using "exim -d+all -q". The log
file is 2nd-exim4_-d+all_-q.log on the web site.
The one thing that did not happen is that the original message
was marked as delivered. This is the bug.
3. The virus scanner does its thing, and sends the now scanned
message back to exim using "exim -bs -oMt virus_scan < message".
This creates a second copy of the message. As it is a large
message the ACL causes it to be queued immediately. The
message produced is in 1D4X5K-0003oV-1M-H on the web site.
4. I repeated steps 2 & 3, running another queue runner manually,
producing a second duplicate (1D4YFN-0005Rd-Nb-H) and
another log (3rd-exim4_-d+all_-q.log). In real life several
queue runners come along, each one creating a duplicate,
until the cron job runs. The cron job delivers the original
message successfully, so the duplication stops.
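
To make the loop in steps 2 to 4 concrete, here is a toy Python model
of it. It is only a sketch and not Exim code: the names (Queue, Message,
run), the sizes, and the assumption that every run over the unfinished
original hands one more copy to the scanner (including the final cron
run) are illustrative, not taken from the Exim source or the posted
configuration.

```python
# Toy model of the duplication loop; all names and values are hypothetical.
from dataclasses import dataclass, field

SMTP_LIMIT = 100 * 1024  # assumed size limit seen by a normal queue runner


@dataclass
class Message:
    msg_id: str
    size: int
    scanned: bool = False    # True for copies re-injected by the scanner
    delivered: bool = False  # True once the message is considered done


@dataclass
class Queue:
    messages: list = field(default_factory=list)
    counter: int = 0

    def inject(self, size, scanned=False):
        """Accept a message; in this model every message is queued."""
        self.counter += 1
        msg = Message(f"msg{self.counter}", size, scanned)
        self.messages.append(msg)
        return msg

    def run(self, max_size):
        """One queue run with the given SMTP size limit."""
        # Iterate over a snapshot so copies injected during this run
        # wait for the next run.
        for msg in list(self.messages):
            if msg.delivered:
                continue
            if not msg.scanned:
                # The copy goes to the scanner, which re-injects it as a
                # new message. Nothing records that this already happened.
                self.inject(msg.size, scanned=True)
            if msg.size <= max_size:
                # Only a run with a large enough limit finishes the message.
                msg.delivered = True


queue = Queue()
queue.inject(size=500 * 1024)          # a large message arrives
for _ in range(3):                     # three normal queue runs
    queue.run(max_size=SMTP_LIMIT)
print(sum(m.scanned for m in queue.messages), "scanned copies queued")
queue.run(max_size=10 * 1024 * 1024)   # cron run with a raised limit
print("original delivered:", queue.messages[0].delivered)
```

In this model the number of scanned copies grows by one per normal queue
run and stops growing only after a run whose limit is large enough to
finish the original message, which matches the behaviour described above.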
The file 1D4X4f-0003oC-3e-H on the web site is the original message
that was queued, after it has been attacked by the queue runners.
I am off to look at the code now: any hints from those who know
the code better than I would be appreciated.