[Exim] Blocking vs non-blocking file locking

Author: Philip Hazel
Date:  
To: exim-users
Subject: [Exim] Blocking vs non-blocking file locking
A while ago on the list there was a discussion about the use of blocking
versus non-blocking calls to fcntl() for the purposes of locking
mailboxes. Exim currently uses non-blocking calls, with sleeps and
retries. In one particular circumstance this did not perform as well as
blocking calls (with timeouts), and it was suggested that Exim change
what it does.

I have now had a deeper look at this, and have been reminded by the
comments in the code as to why Exim uses non-blocking calls. It is all
to do with the NFS shambles. Exim is being ultra cautious. Lock files
must always be used with NFS, but somebody might make a mistake and
configure Exim to use fcntl() locking only on NFS. Exim is trying to
minimize the damage that would ensue, by not holding the mailbox file
open while waiting to get the lock. If a blocking fcntl() were used, the
file would necessarily remain open during the wait.
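
As a rough illustration of the non-blocking approach described above,
the following C sketch opens the mailbox, makes a single F_SETLK
attempt, and closes the file again before sleeping, so the file is never
held open during the wait. This is not Exim's actual code: the function
name lock_nonblocking, its parameters, and the open flags are
assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not taken from Exim.
   Tries up to `retries` times to get a write lock on the whole file,
   sleeping `interval` seconds between attempts. The file is closed
   before each sleep. Returns an open, locked descriptor, or -1. */
int lock_nonblocking(const char *path, int retries, unsigned interval)
{
    for (int i = 0; i < retries; i++) {
        int fd = open(path, O_WRONLY | O_APPEND);  /* flags are assumed */
        if (fd < 0) return -1;

        struct flock fl;
        memset(&fl, 0, sizeof fl);
        fl.l_type = F_WRLCK;      /* exclusive lock */
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 0;             /* 0 means "to end of file" */

        /* F_SETLK returns at once instead of waiting for the lock. */
        if (fcntl(fd, F_SETLK, &fl) == 0) return fd;

        int saved = errno;
        close(fd);                /* do not hold the file open while waiting */
        if (saved != EACCES && saved != EAGAIN) return -1;  /* real error */
        if (i + 1 < retries) sleep(interval);
    }
    return -1;                    /* lock still held by someone else */
}
```

A caller would write to the returned descriptor and then close it, which
also releases the lock.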

The question is: does this actually matter? If somebody delivers over
NFS without requesting the use of lockfiles, perhaps trying to avoid the
disaster is actually a bad thing - they ought to find out sooner rather
than later. OTOH, there may be MUAs in use which use only fcntl() and
the sysadmin might not have any knowledge of what they are actually
doing.

I'm airing this for general information. I think what I am likely to do
is change the default to blocking, but leave the possibility of
non-blocking available, just in case. The existing lock_retries and
lock_interval values are still needed for controlling the lockfile
locking; I propose to add lock_fcntl_timeout. If this is set to 0, a
non-blocking lock will be used; otherwise a blocking lock with that
timeout will be used. I also propose that the number of attempts at the
blocking lock be limited to that number which keeps the total retrying
time less than lock_retries * lock_interval + lock_fcntl_timeout.

For example: if lock_retries = 10 and lock_interval = 3s and
lock_fcntl_timeout = 20s there will be two attempts at the blocking
lock, because lock_retries * lock_interval = 30s. (After the first,
20s < 30s, so it tries again; after the second try, 40s > 30s, so it
stops.)
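
The proposal can be sketched in C as below. This is only one possible
reading, not Exim's code. The names blocking_attempts and
lock_with_timeout are invented for the example. Using alarm() to
interrupt F_SETLKW is an assumption about how the timeout might be
implemented. The case where the elapsed time exactly equals
lock_retries * lock_interval is not covered by the text, so stopping
there is also an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: how many blocking attempts the rule allows.
   Another attempt is made while the time already spent waiting
   (attempts * timeout) is below retries * interval.
   A timeout of 0 selects the non-blocking path, so returns 0. */
int blocking_attempts(int retries, int interval, int timeout)
{
    if (timeout <= 0) return 0;
    int budget = retries * interval;
    int attempts = 1;                       /* always try at least once */
    while (attempts * timeout < budget) attempts++;
    return attempts;
}

/* Empty handler: its only job is to make fcntl() fail with EINTR. */
static void on_alarm(int sig) { (void)sig; }

/* Hypothetical: one lock attempt on an already open descriptor.
   timeout == 0 gives a non-blocking F_SETLK; otherwise F_SETLKW is
   used and interrupted by SIGALRM after `timeout` seconds.
   Returns 0 on success, -1 on failure (errno is EINTR on timeout). */
int lock_with_timeout(int fd, int timeout)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;                 /* l_start = l_len = 0: whole file */

    if (timeout <= 0) return fcntl(fd, F_SETLK, &fl);

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);               /* sa_flags = 0: no SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old) != 0) return -1;

    alarm((unsigned)timeout);
    int rc = fcntl(fd, F_SETLKW, &fl);      /* file stays open while waiting */
    int saved = errno;
    alarm(0);                               /* cancel any pending alarm */
    sigaction(SIGALRM, &old, NULL);
    errno = saved;
    return rc;
}
```

With the figures from the example, blocking_attempts(10, 3, 20) gives 2.
Unlike the non-blocking scheme, lock_with_timeout keeps the descriptor
open for the whole wait, which is the trade-off discussed above.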



-- 
Philip Hazel            University of Cambridge Computing Service,
ph10@???      Cambridge, England. Phone: +44 1223 334714.