Re: [exim] Failed to get write lock for /var/spool/exim4/db/…

Author: Philip Hazel
Date:  
To: Andreas Metzler
CC: exim-users, 360696-submitter
New-Topics: Re: [exim] Bug#360696: Failed to get write lock for /var/spool/exim4/db/retry.lockfile: timed out
Subject: Re: [exim] Failed to get write lock for /var/spool/exim4/db/retry.lockfile: timed out
On Fri, 21 Apr 2006, Andreas Metzler wrote:

> I've asked the submitter (Michel Meyers), his answers follow, my
> comments are in <angle brackets>


> A little note there: I recently sent a message to a misconfigured domain
> (their MX was pointing to an invalid hostname that didn't resolve) and
> exim got into the same loop, so it doesn't only affect 451 failures but
> also other message deliveries that need to be retried.


I wonder how we can track this down. There must be something different
about Michel's system, because nobody else is reporting this, and there
must be many cases of this kind of retrying happening to lots of people.

> > I suppose we'll have to look at the configuration that was being used.
>
> Certainly, I have attached the config file template to this message


That doesn't help, I'm afraid.

> > The given log had this:
>
> > > 2006-04-04 09:13:48 1FQfhL-00035E-Ay Failed to get write lock for
> > > /var/spool/exim4/db/retry.lockfile: timed out
> > > 2006-04-04 09:14:48 1FQfhL-00035E-Ay Failed to get write lock for
> > > /var/spool/exim4/db/retry.lockfile: timed out
>
> > which suggests two tries for the same message, one minute apart. How
> > often was the OP starting queue runners?
>
> <the usual -q30m>


Hmm. So why are there those two messages, I wonder?

> Note that I get those for mails that are not stuck


They should just be getting read locks (and the message is wrong, as per
the bug I found), but why are they failing? I guess the next question is
what DBM library is in use? What kind of file system is used for
/var/spool/exim4? I'm grasping at straws here.
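
For the file system half of that question, something along these lines might do on a Linux box. It is only a sketch: the helper name filesystem_for is made up for illustration, it assumes the usual /proc/mounts layout (device, mount point, type, ...), and it ignores mount points containing escaped spaces.

```python
import os


def filesystem_for(path, mounts_text):
    """Return (mount_point, fstype) for the mount that contains path.

    mounts_text is expected to look like /proc/mounts. The longest
    matching mount point wins; on a tie the later entry is preferred.
    Returns None if nothing matches.
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # not a mount entry
        mount_point, fstype = fields[1], fields[2]
        # "/var" must match "/var/spool" but not "/variable".
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fstype)
    return best


if __name__ == "__main__":
    with open("/proc/mounts") as handle:
        print(filesystem_for("/var/spool/exim4", handle.read()))
```

A network file system type (nfs, for example) in the result would be worth mentioning, since file locking can behave differently there.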

> 2006-04-20 21:06:35 1FWeOO-000396-DP Spool file is locked (another
> process is handling this message)


At least *some* locking is working. :-)

Does the OP have any kind of tool for looking at open files to see what
process is using them? For example, fuser? The output of

fuser /var/spool/exim4/db/retry.lockfile

might be helpful.
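
If fuser is not available, a rough stand-in is to try taking the lock by hand and see whether it can be had at all. The following is a sketch only, not what Exim does internally: probe_write_lock is a hypothetical helper, the timeout and interval values are arbitrary, and it assumes the lock in question is an fcntl()-style lock. It needs to run as a user that can open the file read-write.

```python
import errno
import fcntl
import os
import sys
import time


def probe_write_lock(path, timeout=5.0, interval=0.5):
    """Try to take an exclusive fcntl lock on an existing file.

    Retries without blocking until timeout seconds have passed.
    Returns True if the lock was obtained (it is released again at
    once), False if another process held a conflicting lock throughout.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        deadline = time.monotonic() + timeout
        while True:
            try:
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError as exc:
                # EACCES or EAGAIN means somebody else holds the lock.
                if exc.errno not in (errno.EACCES, errno.EAGAIN):
                    raise
                if time.monotonic() >= deadline:
                    return False
                time.sleep(interval)
            else:
                fcntl.lockf(fd, fcntl.LOCK_UN)
                return True
    finally:
        os.close(fd)


if __name__ == "__main__":
    target = sys.argv[1]  # e.g. /var/spool/exim4/db/retry.lockfile
    print("lock obtained" if probe_write_lock(target) else "lock is held elsewhere")
```

If this reports that the lock is held while no delivery or queue runner is active, that would point at a stale lock or at a locking problem in the file system rather than in Exim itself.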

-- 
Philip Hazel            University of Cambridge Computing Service
Get the Exim 4 book:    http://www.uit.co.uk/exim-book