On Wed Jan 26 09:18:53 2000, Philip Hazel wrote:
> When I started writing Exim I knew nothing about the details of Unix
> locking. People told me that flock() was obsolete - it doesn't work over
> NFS for a start - so I avoided using it, and used fcntl() for locking.
Yes, I think fcntl() is preferred. However, you can still wait on the
lock by using F_SETLKW rather than F_SETLK.
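To illustrate the difference, here is a minimal sketch (not Exim code; the function name lock_whole_file is made up for this example). The only change between the non-blocking and the blocking form is the command passed to fcntl():

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Take a write lock on the whole file.
   wait == 0: use F_SETLK, which fails at once (errno EACCES or EAGAIN)
              if another process holds a conflicting lock.
   wait != 0: use F_SETLKW, which sleeps in the kernel until the lock
              can be granted or a signal interrupts the call.
   Returns 0 on success, -1 on error. */
static int lock_whole_file(int fd, int wait)
{
  struct flock lock_data;

  memset(&lock_data, 0, sizeof(lock_data));
  lock_data.l_type = F_WRLCK;
  lock_data.l_whence = SEEK_SET;
  lock_data.l_start = 0;
  lock_data.l_len = 0;            /* 0 means "up to end of file" */

  return fcntl(fd, wait ? F_SETLKW : F_SETLK, &lock_data) < 0 ? -1 : 0;
}
```

Note that F_SETLKW on its own waits indefinitely, so in practice it needs some kind of timeout around it.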
> What Exim actually does is to try to get the lock a number of times
> (set by lock_retries - default 10), waiting for the time
> lock_interval (default 3 seconds) between each try. At least this is
> what the code is supposed to do. Bugs are always with us...
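For reference, the retry scheme described above can be sketched roughly as follows. This is an illustration only, not the actual Exim source: the function name lock_with_retries is invented, and the two parameters stand in for lock_retries and lock_interval.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Try a non-blocking write lock on the whole file up to `retries`
   times, sleeping `interval` seconds between attempts.
   Returns 0 once the lock is obtained, -1 if every attempt failed. */
static int lock_with_retries(int fd, int retries, unsigned int interval)
{
  struct flock lock_data;
  int i;

  memset(&lock_data, 0, sizeof(lock_data));
  lock_data.l_type = F_WRLCK;
  lock_data.l_whence = SEEK_SET;  /* l_start = l_len = 0: whole file */

  for (i = 0; i < retries; i++)
  {
    if (fcntl(fd, F_SETLK, &lock_data) >= 0) return 0;
    if (i < retries - 1) sleep(interval);   /* no sleep after last try */
  }
  return -1;
}
```

With the defaults quoted above (10 tries, 3 seconds apart) the process only looks at the lock at ten separate instants, and sleeps in between.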
I don't think there's a bug with this logic. A test (with -d9)
produced lots of messages:
fcntl() or MBX locking failed - retrying
and a few:
failed to lock mailbox /home/test1/Mail/INBOX (fcntl)
indicating that exim tried several times to get the lock before
giving up. (This was a simple test with one script invoking exim to
deliver to this mailbox in a tight loop, and another making an IMAP
connection, opening the mailbox, marking all messages for deletion,
and expunging, again in a tight loop.)
> So if you get a locking error, it has been trying for quite some time.
> Would a blocking lock actually help?
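I think it would, because a process blocked in F_SETLKW is woken as
soon as the lock is released, whereas a process sleeping between
F_SETLK attempts only gets the lock if it happens to be free at the
moment it retries. Also, I cheated and patched exim before repeating
the test described above. This time, exim did not report any errors
obtaining a lock.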
Here's the patch (quick and dirty - not for production use):
--- exim-3.12/src/transports/appendfile.c.ORIG Wed Dec 8 09:57:09 1999
+++ exim-3.12/src/transports/appendfile.c Mon Jan 24 15:07:57 2000
@@ -1692,6 +1692,7 @@
#ifdef SUPPORT_MBX
else if (ob->use_mbx_lock)
{
+ int rc;
lock_data.l_type = F_RDLCK;
lock_data.l_whence = lock_data.l_start = lock_data.l_len = 0;
if (fcntl(fd, F_SETLK, &lock_data) >= 0 && fstat(fd, &statbuf) >= 0)
@@ -1730,7 +1731,13 @@
lock_data.l_type = F_WRLCK;
lock_data.l_whence = lock_data.l_start = lock_data.l_len = 0;
- if (fcntl(mbx_lockfd, F_SETLK, &lock_data) >= 0) break;
+ sigalrm_seen = FALSE;
+ os_non_restarting_signal(SIGALRM, sigalrm_handler);
+ alarm(30);
+ rc = fcntl(mbx_lockfd, F_SETLKW, &lock_data);
+ alarm(0);
+ signal(SIGALRM, SIG_IGN);
+ if (rc >= 0) break;
DEBUG(9) debug_printf("failed to lock %s: %s\n", mbx_lockname,
strerror(errno));
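The same idea as a standalone sketch, for anyone who wants to try it outside Exim. This is not the patched code: the names on_alarm, alarm_fired and lock_with_timeout are invented here, it uses plain sigaction() where the patch uses Exim's os_non_restarting_signal(), and it restores the previous SIGALRM handler instead of setting SIG_IGN.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t alarm_fired = 0;

static void on_alarm(int sig)
{
  (void)sig;
  alarm_fired = 1;
}

/* Wait up to `seconds` for a write lock on the whole file.
   Returns 0 on success, -1 on failure. On timeout, fcntl() is
   interrupted by SIGALRM and errno is EINTR. */
static int lock_with_timeout(int fd, unsigned int seconds)
{
  struct sigaction sa, old_sa;
  struct flock lock_data;
  int rc, saved_errno;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = on_alarm;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;              /* no SA_RESTART: fcntl() must not restart */
  if (sigaction(SIGALRM, &sa, &old_sa) < 0) return -1;

  memset(&lock_data, 0, sizeof(lock_data));
  lock_data.l_type = F_WRLCK;
  lock_data.l_whence = SEEK_SET;  /* l_start = l_len = 0: whole file */

  alarm_fired = 0;
  alarm(seconds);
  rc = fcntl(fd, F_SETLKW, &lock_data);
  saved_errno = errno;          /* alarm()/sigaction() may change errno */
  alarm(0);
  sigaction(SIGALRM, &old_sa, NULL);
  errno = saved_errno;

  return rc < 0 ? -1 : 0;
}
```

The handler has to be installed without SA_RESTART; otherwise the interrupted fcntl() may simply be restarted and the alarm has no effect.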
--
Ray Miller <ray.miller@???>
Unix Systems Programmer
Oxford University Computing Services