On 11/09/14 08:03, Kerstin wrote:
> On 10.09.2014 at 20:36, Jeremy Harris wrote:
>
>> Looking at the mainlog around one of the items in question, was
>> the queue-runner running at the time?
>
>
> Looks like there are 2 queue-runners running:
>
> 2014-09-11 03:51:12 Start queue run: pid=13336
> 2014-09-11 03:51:13 Start queue run: pid=13382
> 2014-09-11 03:51:31 End queue run: pid=13336
That's... odd. Fortunately, the inter-queue-run locks
appear to have worked, and one runner exited fairly quickly,
before the delivery:
> 2014-09-11 03:51:38 1XRtXj-0003UM-OW >> XXXX C="250 2.6.0
>   <201409-56e09801-e5e9-43f0-92b1-c912fbe7ff02@???>
>   [InternalId=3664] Queued mail for delivery"
> 2014-09-11 03:51:38 1XRtXj-0003UM-OW <= XXXX P=esmtp S=21889
>   id=201409-56e09801-e5e9-43f0-92b1-c912fbe7ff02@???
> 2014-09-11 03:51:38 1XRtXj-0003UM-OW Completed
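
To see how often queue runs overlap like the two in the quoted log, a
rough Python sketch along these lines could scan the mainlog. It assumes
the default timestamp format shown in the quoted lines (no timezone, no
milliseconds); the function name overlapping_queue_runs is made up for
illustration and is not part of exim or its utilities.

```python
import re
from datetime import datetime

# Matches mainlog lines such as
# "2014-09-11 03:51:12 Start queue run: pid=13336"
# (assumes the default timestamp format seen in the quoted log).
QUEUE_RUN = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (Start|End) queue run: pid=(\d+)"
)

def overlapping_queue_runs(lines):
    """Return (running_pid, new_pid, start_time) for every queue run
    that starts while another one has not yet logged its End line."""
    open_runs = {}   # pid -> start time of a run still in progress
    overlaps = []
    for line in lines:
        m = QUEUE_RUN.match(line)
        if not m:
            continue  # not a queue-run start/end line
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        pid = int(m.group(3))
        if m.group(2) == "Start":
            for other in open_runs:
                overlaps.append((other, pid, stamp))
            open_runs[pid] = stamp
        else:
            open_runs.pop(pid, None)
    return overlaps

# The three queue-run lines quoted in this thread:
sample = [
    "2014-09-11 03:51:12 Start queue run: pid=13336",
    "2014-09-11 03:51:13 Start queue run: pid=13382",
    "2014-09-11 03:51:31 End queue run: pid=13336",
]
for running, new, when in overlapping_queue_runs(sample):
    print(f"pid {new} started at {when} while pid {running} was still running")
```

Feeding it the open mainlog file instead of the sample list works the
same way, since it only iterates over lines.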
It might be useful to know when the *connection* arrived from "<= XXXX"
(sigh, obfuscated, which makes this harder for us to interpret).
This:
> +++ 1XRtXj-0003UM-OW has not completed +++
was presumably exigrep again, on the paniclog not the mainlog.
Please, keep the two separate.
This:
> 2014-09-11 03:51:48 1XRtXj-0003UM-OW => XXXX C="250 2.6.0
> <201409-56e09801-e5e9-43f0-92b1-c912fbe7ff02@???>
> [InternalId=3665] Queued mail for delivery"
... I assume that was from the paniclog also. Two confusing things
here:
- it's apparently a store-and-forward delivery "=>", not the cutthrough
delivery ">>"
- the InternalId given by the destination end is different, implying
that it really was a duplicate delivery that we did
Was that line in mainlog as well as paniclog?
> 2014-09-11 03:51:48 1XRtXj-0003UM-OW failed to unlink
>   /var/spool/exim4/msglog/j/1XRtXj-0003UM-OW: No such file or directory
> 2014-09-11 03:51:49 End queue run: pid=13382
>
> But only 1 daemon running at the moment:
>
> # ps aux | grep exim
> root      4372  0.0  0.0   3788   784 pts/0    S+   08:23   0:00 grep exim
> 102      24243  0.0  0.0  11008  2752 ?        Ss   Sep03   1:36 /usr/local/bin/exim -bd -q30m
>
> How can this happen? exim is not started from cron. Monit is
> running, but did not restart exim.
I can't think of a way right now... unless - is the system clock
stable? Is time jumping backwards? That would invalidate all sorts
of assumptions if true.
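
One rough way to look for a backwards-jumping clock is to scan the
mainlog for a timestamp that is earlier than the one on the line before
it. The Python sketch below assumes the default timestamp format seen in
the quoted log lines; the function name backward_jumps is invented for
illustration.

```python
import re
from datetime import datetime

# Leading "YYYY-MM-DD HH:MM:SS " timestamp, as in the quoted log lines.
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ")

def backward_jumps(lines):
    """Return (line_number, previous_time, current_time) wherever a
    line's timestamp is earlier than that of the preceding stamped line."""
    jumps = []
    previous = None
    for number, line in enumerate(lines, start=1):
        m = STAMP.match(line)
        if not m:
            continue  # continuation line or other unstamped text
        current = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if previous is not None and current < previous:
            jumps.append((number, previous, current))
        previous = current
    return jumps

# Made-up example lines, only to show the output shape:
example = [
    "2014-09-11 03:51:38 first line",
    "2014-09-11 03:51:31 second line, seven seconds earlier",
    "2014-09-11 03:51:48 third line",
]
for number, before, after in backward_jumps(example):
    print(f"line {number}: {before} -> {after}")
```

Treat hits with some care: several processes write to the same log, so
a reordering of a second or so may not mean anything, and with local-time
stamps a daylight-saving change would also show up as a jump. Larger
jumps are the ones worth looking at.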
--
Cheers,
Jeremy