Re: [exim] failed to unlink *-J in queue

Author: Jeremy Harris
Date:  
To: exim-users
Subject: Re: [exim] failed to unlink *-J in queue
On 11/09/14 08:03, Kerstin wrote:
> Am 10.09.2014 um 20:36 schrieb Jeremy Harris:
>
>> Looking at the mainlog around one of the items in question, was
>> the queue-runner running at the time?
>
>
> Looks like there are 2 queue-runners running:
>
> 2014-09-11 03:51:12 Start queue run: pid=13336
> 2014-09-11 03:51:13 Start queue run: pid=13382
> 2014-09-11 03:51:31 End queue run: pid=13336


That's... odd. Fortunately, the inter-queue-run locks
appear to have worked, and one exited fairly quickly;
before the delivery:

> 2014-09-11 03:51:38 1XRtXj-0003UM-OW >> XXXX C="250 2.6.0 <201409-56e09801-e5e9-43f0-92b1-c912fbe7ff02@???> [InternalId=3664] Queued mail for delivery"
> 2014-09-11 03:51:38 1XRtXj-0003UM-OW <= XXXX P=esmtp S=21889 id=201409-56e09801-e5e9-43f0-92b1-c912fbe7ff02@???
> 2014-09-11 03:51:38 1XRtXj-0003UM-OW Completed
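
A rough way to spot overlapping queue runs in a larger mainlog is sketched below. It assumes only the "Start queue run" / "End queue run" line format quoted in this thread; the function name `overlapping_queue_runs` is made up for illustration and is not part of Exim.

```python
import re
from datetime import datetime

# Matches queue-run lines in the format quoted in this thread.
QUEUE_RUN = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (Start|End) queue run: pid=(\d+)"
)

def overlapping_queue_runs(lines):
    """Return (pid, other_active_pids, start_time) for every queue run
    that started while at least one other run had not yet ended."""
    active = {}    # pid -> start time of runs that have not logged "End"
    overlaps = []
    for line in lines:
        m = QUEUE_RUN.match(line)
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        event, pid = m.group(2), int(m.group(3))
        if event == "Start":
            if active:
                overlaps.append((pid, sorted(active), when))
            active[pid] = when
        else:
            active.pop(pid, None)
    return overlaps

# The four queue-run lines quoted in this thread.
sample = [
    "2014-09-11 03:51:12 Start queue run: pid=13336",
    "2014-09-11 03:51:13 Start queue run: pid=13382",
    "2014-09-11 03:51:31 End queue run: pid=13336",
    "2014-09-11 03:51:49 End queue run: pid=13382",
]

for pid, others, when in overlapping_queue_runs(sample):
    print(f"{when} pid={pid} started while {others} still running")
# prints: 2014-09-11 03:51:13 pid=13382 started while [13336] still running
```

To run it over a real log, pass an open file object instead of `sample`; the log location depends on the local configuration.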


It might be useful to know when the *connection* arrived from the
source host, shown here as "<= XXXX" (sigh, obfuscated, which makes
interpreting this harder for us).

This:

> +++ 1XRtXj-0003UM-OW has not completed +++


was presumably exigrep again, on the paniclog not the mainlog.
Please, keep the two separate.


This:
> 2014-09-11 03:51:48 1XRtXj-0003UM-OW => XXXX C="250 2.6.0 <201409-56e09801-e5e9-43f0-92b1-c912fbe7ff02@???> [InternalId=3665] Queued mail for delivery"


... I assume that was from the paniclog also. Two confusing things
here:
- it's apparently a store&forward delivery "=>", not the cutthrough
delivery ">>"
- the InternalId given by the destination end is different (3665
  rather than 3664), implying that we really did make a duplicate
  delivery

Was that line in mainlog as well as paniclog?
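
For checking other messages for the same symptom, here is a minimal sketch that groups delivery lines (">>" and "=>") by message id and flags any message whose deliveries carry more than one InternalId. It assumes the remote server's response contains "[InternalId=N]" as in the lines quoted in this thread, which will not hold for every destination; the function names are hypothetical.

```python
import re

# Timestamp, message id, delivery marker, then the remote server's
# "[InternalId=N]" somewhere in its quoted response.
DELIVERY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+) (=>|>>) "
    r".*\[InternalId=(\d+)\]"
)

def deliveries(lines):
    """Map message id -> list of (timestamp, marker, internal_id)."""
    found = {}
    for line in lines:
        m = DELIVERY.match(line)
        if m:
            ts, msgid, kind, internal = m.groups()
            found.setdefault(msgid, []).append((ts, kind, int(internal)))
    return found

def probable_duplicates(lines):
    """Keep only messages whose deliveries show more than one InternalId."""
    return {
        msgid: recs
        for msgid, recs in deliveries(lines).items()
        if len({internal for _, _, internal in recs}) > 1
    }

# Lines quoted in this thread, each joined back onto a single line.
sample = [
    '2014-09-11 03:51:38 1XRtXj-0003UM-OW >> XXXX C="250 2.6.0 '
    '<201409-56e09801-e5e9-43f0-92b1-c912fbe7ff02@???> '
    '[InternalId=3664] Queued mail for delivery"',
    '2014-09-11 03:51:38 1XRtXj-0003UM-OW <= XXXX P=esmtp S=21889 '
    'id=201409-56e09801-e5e9-43f0-92b1-c912fbe7ff02@???',
    '2014-09-11 03:51:38 1XRtXj-0003UM-OW Completed',
    '2014-09-11 03:51:48 1XRtXj-0003UM-OW => XXXX C="250 2.6.0 '
    '<201409-56e09801-e5e9-43f0-92b1-c912fbe7ff02@???> '
    '[InternalId=3665] Queued mail for delivery"',
]

for msgid, recs in probable_duplicates(sample).items():
    print(msgid, recs)
```

Running it separately over mainlog and paniclog, rather than over a merged file, keeps the two sources apart.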


> 2014-09-11 03:51:48 1XRtXj-0003UM-OW failed to unlink /var/spool/exim4/msglog/j/1XRtXj-0003UM-OW: No such file or directory
> 2014-09-11 03:51:49 End queue run: pid=13382
>
> But only 1 daemon running at the moment:
>
> # ps aux | grep exim
> root      4372  0.0  0.0   3788   784 pts/0    S+   08:23   0:00 grep exim
> 102      24243  0.0  0.0  11008  2752 ?        Ss   Sep03   1:36 /usr/local/bin/exim -bd -q30m

>
> How can this happen? exim is not started from cron. Monit is
> running, but did not restart exim.


I can't think of a way right now... unless - is the system clock
stable? Is time jumping backwards? That would invalidate all sorts
of assumptions if true.
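
One rough check for a backwards-jumping clock is to scan the log for timestamps that decrease from one line to the next. The sketch below assumes every relevant line starts with a "YYYY-MM-DD HH:MM:SS" timestamp, as in the lines quoted in this thread. The `tolerance_seconds` parameter is an assumption added to ignore small reorderings when several processes write to the same log, and the function name is made up. Since the timestamps are local time, a daylight-saving change would also show up as a jump.

```python
import re
from datetime import datetime, timedelta

TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ")

def backward_jumps(lines, tolerance_seconds=2):
    """Return (line_number, previous_time, this_time) wherever a line's
    timestamp is earlier than the preceding timestamped line by more
    than the tolerance."""
    tolerance = timedelta(seconds=tolerance_seconds)
    jumps = []
    prev = None
    for number, line in enumerate(lines, start=1):
        m = TS.match(line)
        if not m:
            continue  # skip lines without a leading timestamp
        now = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if prev is not None and prev - now > tolerance:
            jumps.append((number, prev, now))
        prev = now
    return jumps

# Made-up lines for illustration only (not taken from the thread).
example = [
    "2014-09-11 03:51:48 first line",
    "2014-09-11 03:51:12 second line, 36 seconds earlier",
    "2014-09-11 03:51:13 third line",
]

for number, before, after in backward_jumps(example):
    print(f"line {number}: clock went from {before} back to {after}")
```

An empty result does not prove the clock is stable; it only means the log shows no obvious backwards step.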
--
Cheers,
Jeremy