Re: [Exim] locking question.

Author: Philip Hazel
Date:  
To: John Jetmore
CC: exim-users
Subject: Re: [Exim] locking question.
On Fri, 7 Jun 2002, John Jetmore wrote:

> So, this got me to wondering about the nature of locking for these hints
> dbs. Almost everything in the docs refers to locking for appendfile
> transport and so forth. Pretty much the only mention of locking and the
> hints dbs is that it does occur. Looking at the source, it appears that
> only fcntl locking is done. The file it locks is called .lockfile, but
> the file doesn't seem to use true lockfile locking (hitching post creation
> and whatnot).


Correct. It is not "lockfile locking" in the sense that that phrase is
used for locking mailboxes. That is why the file is called .lockfile
and not .lock, incidentally.

Exim needs to have a lock before it tries to open a hints database.
Early versions tried to lock on the opened database files themselves,
but this gave problems, because the DBM libraries "do things" between
actually opening the files and returning to the caller. So instead, Exim
just takes out a lock on a separate file that is associated with the
database.
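
To illustrate the idea of locking a separate file before touching the
database, here is a minimal sketch in C. It is not Exim's actual code: the
function names hints_lock_acquire() and hints_lock_release() are hypothetical,
and the caller is assumed to pass the path of a lock file that sits next to
the database. The DBM open would happen only after the lock is held.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not an Exim function): take an exclusive fcntl()
   lock on a separate lock file. Returns the open descriptor on success,
   or -1 on failure. The database itself is not opened here. */
int hints_lock_acquire(const char *lockpath)
{
    assert(lockpath != NULL);

    int fd = open(lockpath, O_RDWR | O_CREAT, 0640);
    if (fd < 0)
        return -1;

    struct flock fl = {0};
    fl.l_type = F_WRLCK;     /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "to end of file", i.e. the whole file */

    /* F_SETLKW blocks until the lock is granted or a signal interrupts it. */
    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Closing the descriptor releases the fcntl() lock held by this process. */
void hints_lock_release(int fd)
{
    close(fd);
}
```

A caller would acquire the lock, open and use the DBM file, close it, and
only then call hints_lock_release(), so that whatever the DBM library does
internally while opening always happens under the lock.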

> This seems like an issue to me as our db directory is on NFS (a netapp
> F760). In fact, the whole spool directory is on NFS, and has for the most
> part worked beautifully. We have experienced in the past w/ flock times
> where files on NFS will become unlockable - that is, not only does locking
> not work, but it sort of permanently "freezes out" all locks on that
> inode. fcntl in this case certainly appears to be doing the same thing,
> which explains why removing the files (and thus generating new inodes)
> works here.


fcntl() and lockf() are supposedly "the same thing", and in some operating
systems flock() is too. So I'm not surprised that problems with one cause
problems with the other.

However, it sounds to me as though this is an NFS problem with the lock
daemon, and is something that should be chased down and fixed.

> I had thought that problems like this were why exim
> supported lockfiles so much, but I see in the book that it actually had
> something to do with the size of the file reported by the NFS server. I
> suppose if this is the reason lockfile wouldn't be viewed as necessary
> because the file that gets locked contains no data itself.


Basically correct.

The reason Exim uses "traditional lockfiles" for mailboxes is indeed to
do with NFS servers, but the problems arise only if the NFS mailbox is
available for access by more than one host. In the case of the hints
files, I assumed that only one host would ever be involved. (Indeed, I
never thought anybody would put them on anything other than local disc,
if the truth be told.)

Your hints files *are* only used by one host, I presume?

> Does anyone experience problems like this? Is there any chance of getting
> lockfiles for hints dbs? I convinced my boss that exim was NFS/lockfile
> clean when we moved to it, I would certainly hate to have to use local
> storage now. Also, are there any other places where fcntl is used only?


"Traditional" lock files are very much less efficient. The reason is
that when the file is locked, a waiting process has to keep sleeping and
trying again. With fcntl() locking, it can issue a locking call with a
timeout which returns (with the lock) as soon as the holding process
releases it. In addition, the OS keeps the queue of waiting processes
and (we assume) arranges for them to get the lock in such a way that
none of them wait for ever. With a "sleep and retry" approach on a busy
system, one process might never get the lock... (Yes, this could happen
for mailboxes, but one assumes that mail for one individual mailbox
doesn't arrive that quickly.)

I'm not, as you can gather, keen to change Exim to work round what
appears to be an NFS/lockd bug for what I think is a rare kind of
configuration.

Philip

--
Philip Hazel            University of Cambridge Computing Service,
ph10@???      Cambridge, England. Phone: +44 1223 334714.