Hi Exim Experts,
Thanks for the clues, Philip.
The exim_mainlog is still reporting the locked spool file:
2000-03-27 09:51:42 12W2Qd-0006Nx-00 Spool file is locked
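
To see how many messages are affected and how often each is skipped, the log can be tallied by message id. What follows is only a minimal Python sketch, not anything that ships with Exim: the function name locked_counts is made up for illustration, and it assumes every relevant line has the same "date time id Spool file is locked" layout as the line above.

```python
import re
import sys
from collections import Counter

# Assumed layout, copied from the log line quoted in this post:
#   2000-03-27 09:51:42 12W2Qd-0006Nx-00 Spool file is locked
LOCKED = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (\S+) Spool file is locked"
)


def locked_counts(lines):
    """Return a Counter mapping message id -> number of 'locked' reports."""
    counts = Counter()
    for line in lines:
        match = LOCKED.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Pass the path to the main log as the first argument.
    with open(sys.argv[1]) as log:
        for msg_id, n in locked_counts(log).most_common():
            print(msg_id, n)
```

A message id that turns up in every queue run is a candidate for a lock held by a stuck process rather than by a delivery that is merely in progress.
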
But exiwhat doesn't say anything very useful for me:
[root@deliberate /tmp]# exiwhat
528 3.02 daemon: -q15m, listening on port 25
Would it be one of the [exim] processes? The log has 7 locked spool
files in the latest queue run and there are 6 [exim] processes. The
message we're studying was first sent on the 17th. These [exim]
processes are all older than that.
[root@deliberate /tmp]# ps -ef | grep exim
root 528 1 0 Mar09 ? 00:00:00 /usr/local/bin/exim -bd -q15m
root 3380 1 0 Mar14 ? 00:00:00 [exim]
majordom 3385 3380 0 Mar14 ? 00:00:00 [exim]
majordom 3387 3385 0 Mar14 ? 00:00:00 [exim]
root 19005 1 0 Mar16 ? 00:00:00 [exim]
majordom 19006 19005 0 Mar16 ? 00:00:00 [exim]
majordom 19008 19006 0 Mar16 ? 00:00:00 [exim]
root 9333 649 1 10:00 tty1 00:00:03 emacs exim_mainlog
root 9404 9336 0 10:06 ttyp0 00:00:00 grep exim
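
The six [exim] entries fall into two parent/child chains, which is easier to see when they are grouped by their topmost [exim] ancestor. Here is a rough Python sketch of that grouping. The helper names parse_ps and exim_chains are invented for illustration, and the parsing assumes the eight-column "ps -ef" layout shown above (UID PID PPID C STIME TTY TIME CMD) with no spaces inside the STIME field.

```python
def parse_ps(lines):
    """Parse 'ps -ef' style lines into {pid: {uid, ppid, stime, cmd}}."""
    procs = {}
    for line in lines:
        fields = line.split(None, 7)
        # Skip the header line and anything that is not a process row.
        if len(fields) < 8 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        uid, pid, ppid, _c, stime, _tty, _time, cmd = fields
        procs[int(pid)] = {
            "uid": uid,
            "ppid": int(ppid),
            "stime": stime,
            "cmd": cmd.strip(),
        }
    return procs


def exim_chains(procs, name="[exim]"):
    """Group processes whose CMD equals `name` under their topmost such ancestor."""
    stuck = {pid for pid, p in procs.items() if p["cmd"] == name}
    chains = {}
    for pid in sorted(stuck):
        root, seen = pid, set()
        # Walk up the parent links while the parent is also an [exim] entry.
        while procs[root]["ppid"] in stuck and root not in seen:
            seen.add(root)
            root = procs[root]["ppid"]
        chains.setdefault(root, []).append(pid)
    return chains


if __name__ == "__main__":
    import sys

    # Example: ps -ef | python this_script.py
    procs = parse_ps(sys.stdin)
    for root, members in exim_chains(procs).items():
        print(root, procs[root]["stime"], members)
```

Applied to the listing above, this yields one chain rooted at pid 3380 (started Mar14) and one rooted at pid 19005 (started Mar16), each with a root-owned parent and two majordom-owned descendants.
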
Am I doing something wrong that I have these old [exim] processes and
that some are owned by majordom?
In thinking about how it happens, does it help to know that we run a
daemon that checks every hour to see if our connection has gone down
and restarts it if it has?
Thanks again for thinking about this. Those duplicates are
embarrassing.
Marilyn Davis, Ph.D.
eVote - online polling software for email lists
http://www.deliberate.com
marilyn@???
+1 650 965-7121 (USA)
On Fri, 24 Mar 2000, Philip Hazel wrote:
> On Wed, 22 Mar 2000, Marilyn Davis wrote:
>
> > This last time, I did a thorough study of the history of the duplicate
> > message and read everything in the Exim manual about spool locking and
> > I'm at a loss to figure out what to do.
>
> [snip]
>
> > 2000-03-17 11:28:59 12W2Qd-0006Nx-00 <= owner-mln-chat@??? U=majordom P=local S=3247 id=Pine.LNX.4.10.10003171125340.672-100000@???
>
> [snip]
>
> > Now, apparently this process has given up and never again tries any
> > deliveries but has left the lock on the spool file:
> >
> > 2000-03-17 11:45:49 Start queue run: pid=24659
> > 2000-03-17 11:45:49 12W2Qd-0006Nx-00 Spool file is locked
>
> I have seen stuck processes before, but usually when an *incoming*
> TCP/IP call got dropped, not an outgoing one. It seems to be a problem
> in the TCP/IP stack such that a system call fails to time out. A way to
> get out of this situation is to use exiwhat to find out which process is
> working on the message, and kill that process. Then the message is no
> longer locked, and the next queue run will pick it up again. It *should*
> be proof against duplicates.
>
> > Here we recycled the modem, the computer stayed up, and who generated
> > this? You can tell by the id that it is the same message.
> >
> > 2000-03-18 16:52:26 12WTxC-0000Fn-00 <= owner-mln-chat@??? U=majordom P=local S=3247 id=Pine.LNX.4.10.10003171125340.672-100000@???
>
> By the Pine message id it's the same message, but it looks from the log
> that majordom resubmitted it to Exim. Exim has given it a new Exim id.
> So the problem is why did Majordomo resubmit the message over 24 hours
> later? I think this is unrelated to the stuck delivery.
>
> --
> Philip Hazel University of Cambridge Computing Service,
> ph10@??? Cambridge, England. Phone: +44 1223 334714.