[Exim] Duplicate Messages

Author: Marilyn Davis
Date:  
To: exim-users
Subject: [Exim] Duplicate Messages
We were having occasional duplicate messages sent to our lists. It's
all fixed now but I'm reporting the problem and fix in case it helps
someone else. Our system is:

Caldera Linux, Exim, Majordomo

Philip Hazel, Exim's author, worked painstakingly with me for weeks,
combing through process lists to figure this out.

We would occasionally get stuck exim processes, a clump of them for
one message that was to go to a list. The exim_mainlog would report
"Spool File Locked" on the message.

If we rebooted our machine while these processes were stuck, new
processes would start up and resend the message.

By experimenting with messages to a test list, and capturing the
process lists at short intervals, we learned that the process that
delivered the message into the majordomo alias didn't die until the
message had been delivered to each of its recipients, which could be
days. All the processes involved would stay alive.
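
For anyone who wants to repeat the "capture the process lists at short
intervals" step, here is a minimal sketch in Python. It is not the
script we used. The function names, the 5-second interval and the "exim"
filter string are all assumptions made for illustration, and it assumes
a ps that accepts the standard -e and -o options.

```python
import subprocess
import time
from datetime import datetime


def matching_lines(ps_output, needle="exim"):
    """Return the ps header line plus every row that mentions `needle`."""
    lines = ps_output.splitlines()
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    return [header] + [row for row in rows if needle in row]


def snapshot(needle="exim"):
    """Run ps once and keep only the rows of interest."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,etime,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return matching_lines(result.stdout, needle)


def watch(interval_seconds=5.0, count=3):
    """Print `count` timestamped snapshots, `interval_seconds` apart.

    Both defaults are arbitrary illustrative values.
    """
    for _ in range(count):
        print("--- " + datetime.now().isoformat(timespec="seconds"))
        print("\n".join(snapshot()))
        time.sleep(interval_seconds)
```

Calling watch() and comparing successive snapshots shows which PIDs
persist; the ppid and etime columns help tell a long-lived parent from
the children it is waiting on.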

Philip fixed this in exim 3.14 by closing the pipe. This wasn't
exactly a bug because it only produced spurious behavior in
combination with another thing happening.

After the fix, we still got a stuck message, though no longer with a
whole clump of processes: just one stale process, still alive but
stuck, with an accompanying "Spool File Locked".

So next, on Philip's suggestion, I attached my debugger to the stuck
process and viewed the stack to learn that it was stuck on a connect()
call, which is a TCP/IP call that should time out via the OS in some
reasonable length of time -- but it never did, not for weeks.

So Philip suggested a little work-around, using

connect_timeout = 3m

on the smtp transport.
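
To illustrate the general idea behind the work-around (the application
puts its own bound on connect() instead of trusting the OS to give up),
here is a minimal sketch in Python. It is not how Exim implements
connect_timeout. The function name try_connect and its return strings
are made up for this example; the 180-second default just mirrors the
3m setting above.

```python
import socket


def try_connect(host, port, timeout_seconds=180.0):
    """Attempt a TCP connection, giving up after `timeout_seconds`.

    Returns "connected", "timed out" or "failed" (hypothetical labels).
    """
    try:
        # create_connection applies the timeout to the connect attempt,
        # so the caller cannot hang indefinitely on an unresponsive host.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return "connected"
    except socket.timeout:
        return "timed out"
    except OSError:
        # Refused connections, unreachable networks, DNS failures, etc.
        return "failed"
```

A delivery process that gets "timed out" back can release its lock on
the message and leave it for a later retry, rather than sitting on the
spool file for weeks.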

Everything has been super since, and it was April 22, 2000 when we
fixed it.

Marilyn Davis, Ph.D.
eVote(R) - online polling software for email lists
http://www.deliberate.com 
marilyn@???    
+1 650 965-7121  (USA)