Re: [exim] Bug#360696: Failed to get write lock for /var/spoo…

Author: Michel Meyers
Date:
To: exim-users, 360696-quiet
CC: 360696-submitter, Andreas Metzler
Old-Topics: Re: [exim] Failed to get write lock for /var/spool/exim4/db/retry.lockfile: timed out
Subject: Re: [exim] Bug#360696: Failed to get write lock for /var/spool/exim4/db/retry.lockfile: timed out
Philip Hazel wrote:
> On Fri, 21 Apr 2006, Andreas Metzler wrote:
>
> I wonder how we can track this down. There must be something different
> about Michel's system, because nobody else is reporting this, and there
> must be many cases of this kind of retrying happening to lots of people.


If there's anything I could run to debug it, please let me know (as I
can reproduce the problem pretty easily here).

>>> The given log had this:
>>>> 2006-04-04 09:13:48 1FQfhL-00035E-Ay Failed to get write lock for
>>>> /var/spool/exim4/db/retry.lockfile: timed out
>>>> 2006-04-04 09:14:48 1FQfhL-00035E-Ay Failed to get write lock for
>>>> /var/spool/exim4/db/retry.lockfile: timed out
>>> which suggests two tries for the same message, one minute apart. How
>>> often was the OP starting queue runners?
>> <the usual -q30m>
>
> Hmm. So why are there those two messages, I wonder?


Don't get too hung up on them. I do not recall the exact circumstances
under which those were generated (I might have called runq manually at
the time).

>> Note that I get those for mails that are not stuck
>
> They should just be getting read locks (and the message is wrong, as per
> the bug I found), but why are they failing? I guess the next question is
> what DBM library is in use?


I guess you mean libdb4.2 (package rev 4.2.52-23.1 is installed)?
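Philip's expectation that these processes only need read locks matters because of how POSIX record locks behave: any number of processes can hold a shared (read) lock on a file at the same time, but a write lock conflicts with every other lock, so one long-running holder makes everyone else wait until their own timeout expires. Below is a minimal Python sketch of a lock attempt with a timeout. It assumes an fcntl-style lock on a separate lockfile such as `retry.lockfile`. The helper `try_lock`, the polling loop and the file mode are hypothetical choices made for illustration, not Exim's code.

```python
import errno
import fcntl
import os
import time


def try_lock(path, exclusive, timeout, interval=0.1):
    """Try to take an fcntl (POSIX record) lock on `path` within `timeout` seconds.

    Hypothetical helper for illustration; not Exim's implementation.
    exclusive=True asks for a write lock, exclusive=False for a read lock.
    Returns the open file descriptor, which holds the lock until it is closed,
    or None if a conflicting lock was held for the whole timeout.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o640)
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    deadline = time.monotonic() + timeout
    while True:
        try:
            # On Linux, lockf() is implemented with fcntl record locks;
            # LOCK_NB makes a conflicting request fail immediately.
            fcntl.lockf(fd, mode | fcntl.LOCK_NB)
            return fd
        except OSError as exc:
            if exc.errno not in (errno.EACCES, errno.EAGAIN):
                os.close(fd)
                raise
        if time.monotonic() >= deadline:
            # Closing any descriptor for the file drops this process's fcntl
            # locks on it, which is fine here because we hold none.
            os.close(fd)
            return None
        time.sleep(interval)
```

Called as `try_lock("/var/spool/exim4/db/retry.lockfile", exclusive=False, timeout=60)`, with the 60 seconds chosen to match the one-minute spacing in the logs, it would return None whenever another process kept a write lock on the file for the whole minute. A C program would more typically block in `fcntl(F_SETLKW)` under an alarm instead of polling.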

> What kind of file system is used for
> /var/spool/exim4? I'm grasping at straws here.


/var is ext3

>> 2006-04-20 21:06:35 1FWeOO-000396-DP Spool file is locked (another
>> process is handling this message)
>
> At least *some* locking is working. :-)
>
> Does the OP have any kind of tool for looking at open files to see what
> process is using them? For example, fuser? The output of
>
> fuser /var/spool/exim4/db/retry.lockfile
>
> might be helpful.
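Where `fuser` is not installed, you can do roughly the same check on Linux by walking `/proc/<pid>/fd`. The minimal sketch below assumes a Linux `/proc` filesystem and enough privilege (normally root) to see other users' descriptors. `pids_using` is a hypothetical helper. Unlike `fuser`, it only looks at open file descriptors, not at memory mappings or working directories.

```python
import os


def pids_using(path):
    """Return the PIDs that have `path` open, similar in spirit to `fuser path`.

    Hypothetical helper: scans /proc/<pid>/fd and compares device and inode
    numbers, so a file opened through a different link still matches.
    """
    target = os.stat(path)
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to look
        for fd in fds:
            try:
                st = os.stat(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were looking
            if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
                pids.append(int(entry))
                break
    return sorted(pids)
```

Run as root, `pids_using("/var/spool/exim4/db/retry.lockfile")` should list the same PIDs that `fuser` prints for that file.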


I had to wait until I got home to reproduce the problem; here's the result:

fuser /var/spool/exim4/db/retry.lockfile
/var/spool/exim4/db/retry.lockfile: 9934 9963

  ps ax | grep 9934
  9934 ?        R      1:32 /usr/sbin/exim4 -Mc 1FYTF5-0002a2-VT
10034 pts/7    R+     0:00 grep 9934


  ps ax | grep 9963
  9963 ?        S      0:00 /usr/sbin/exim4 -Mc 1FYTFD-0002aU-BC
10060 pts/7    S+     0:00 grep 9963
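The `ps ax | grep` pipeline also matches the `grep` itself (the 10034 and 10060 lines). The same details can be read directly from `/proc`, including the accumulated CPU time. Here, CPU time is what separates the stuck 9934 (state R, 1:32 of CPU) from 9963 (state S, 0:00). The minimal, Linux-only sketch below does this. `proc_info` is a hypothetical helper, and the field positions follow the `/proc/<pid>/stat` layout documented in proc(5).

```python
import os


def proc_info(pid):
    """Return (state, cpu_seconds, argv) for a Linux process, or None if it is gone.

    Hypothetical helper. state is the one-letter code ps shows at the start of
    its STAT column (R = running, S = sleeping, ...); cpu_seconds is user plus
    system time.
    """
    try:
        with open(f"/proc/{pid}/stat", "rb") as f:
            stat = f.read().decode(errors="replace")
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            raw = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Field 2 is the command name in parentheses and may contain spaces,
    # so split only what follows the last ')'.
    fields = stat[stat.rindex(")") + 2:].split()
    state = fields[0]                          # field 3
    ticks = int(fields[11]) + int(fields[12])  # fields 14 (utime) + 15 (stime)
    cpu_seconds = ticks / os.sysconf("SC_CLK_TCK")
    # cmdline holds NUL-separated arguments with a trailing NUL.
    argv = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
    return state, cpu_seconds, argv
```

Combined with the `fuser`-style lookup, calling this for each PID that holds the lockfile would show which `exim4 -Mc` process is burning CPU while it keeps the lock.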


a little later:

fuser /var/spool/exim4/db/retry.lockfile
/var/spool/exim4/db/retry.lockfile: 9934

9934 is the stuck process. The other one was a normal message that got
delivered.

2006-04-25 21:30:58 1FYTFD-0002aU-BC <= apache@domain U=Debian-exim P=spam-scanned S=2855 id=fccbb95ffed7a6394c5a8b23b6ed0547@domain
2006-04-25 21:31:58 1FYTFD-0002aU-BC Failed to get write lock for /var/spool/exim4/db/retry.lockfile: timed out
2006-04-25 21:32:58 1FYTFD-0002aU-BC Failed to get write lock for /var/spool/exim4/db/retry.lockfile: timed out
2006-04-25 21:32:58 1FYTFD-0002aU-BC => user <address@domain> R=local_user T=mail_spool

This time I didn't call 'runq', but I did issue several 'mailq's.
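The one-minute spacing noted earlier in the thread can be checked mechanically. The minimal sketch below picks out the log entries for one message ID and reports the time since that message's previous entry. It assumes the usual `date time message-id text` main-log layout, with one entry per line as in Exim's log file itself. `gaps_for_message` is a hypothetical helper. For the 1FYTFD-0002aU-BC entries above it yields gaps of 60, 60 and 0 seconds: each lock attempt gave up after about a minute, and the delivery was logged in the same second as the second timeout.

```python
import re
from datetime import datetime

# Assumed main-log layout: "YYYY-MM-DD HH:MM:SS <message id> <text>".
LINE = re.compile(r"^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) (\S+) (.*)$")


def gaps_for_message(lines, message_id):
    """Return (seconds since the previous entry, text) for each later entry of message_id.

    Hypothetical helper; lines that do not match the assumed layout are skipped.
    """
    result = []
    previous = None
    for line in lines:
        m = LINE.match(line)
        if not m or m.group(2) != message_id:
            continue
        when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if previous is not None:
            result.append(((when - previous).total_seconds(), m.group(3)))
        previous = when
    return result
```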

Greetings,
        Michel