Re: [exim] spurious delay warning message?

Top Page
Delete this message
Reply to this message
Author: Maarten van Baarsel
Date:  
To: exim-users
Subject: Re: [exim] spurious delay warning message?
On 26/5/2013 22:45 , Evgeniy Berdnikov wrote:
> On Sun, May 26, 2013 at 06:15:41PM +0200, Mrten wrote:


>> I'm getting a warning message for a cronjob for a few days now:
>>
>> This message was created automatically by mail delivery software.
>> A message that you sent has not yet been delivered to one or more of its
>> recipients after more than 4 hours on the queue on kimiko.ii.nl.
>>
>> The message identifier is:     1UgMol-0004Xa-Lb
>> The date of the message is:    Sun, 26 May 2013 05:12:01 +0200
>> The subject of the message is: Cron <scripts@kimiko>
>> $HOME/mysql/backup/sql2sshfs.sh

>>
>>
>> The weird part is that all the headers indicate that the message was
>> actually sent on 05:12:19 so was only on the queue for a few seconds:


> These headers do NOT belong to the delayed message 1UgMol-0004Xa-Lb.


Argh, you're right. Man, I read and reread my initial message a few
times, too :)

Here are the headers from the "problematic" cron-mail:

Received: from kimiko.ii.nl ([82.94.191.102]:59351)
    by mail1.ii.nl with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32)
    (Exim 4.76)
    (envelope-from <scripts@???>)
    id 1UgRNa-0001S8-AF
    for scripts@???; Sun, 26 May 2013 05:12:18 +0200
Received: from scripts by kimiko.ii.nl with local authenticated as
scripts with  authenticator
    (Exim 4.71)
    (envelope-from <scripts@???>)
    id 1UgMol-0004Xa-Lb
    for scripts@???; Sun, 26 May 2013 05:12:01 +0200
Date: Sun, 26 May 2013 05:12:01 +0200


As you can see, there's no indication of a four hour delay there either.

> As for the problematic message 1UgMol-0004Xa-Lb, from the first set
> of log records you can see that there is extremely large gap between
> pid=[17458] of initial exim process and pid=[21242] of the second:

[...]
> This difference is anomaly, because typically exim forks forks so quickly
> that 2nd process get pid greater by one then the pid of 1st process.
> But in your case the difference is 3784. It means that something caused
> exim to hang for a long time from the moment it records the message
> submission time to the moment it could fork and write the next log record.


Well, it is a long running cronjob (it's a daily cronjob running a
database backup), so a four hour wait is not unusual.

If I read you right, process 17458 starts when the cronjob starts and
starts the queue-timer for the message at that time, but only logs and
adds headers when the cronjob ends.

An offlist suggestion pointed out that I had the domain in queue_smtp_domains,
I unset that and today I got no delay warning message.


I guess the temporal difference is what triggered my question: is that
expected behaviour for Exim?


thanks!

Maarten.