Re: [Exim] Re: spool over NFS?

Author: Philip Hazel
Date:
To: Edgar Lovecraft
CC: exim-users
Subject: Re: [Exim] Re: spool over NFS?
On Wed, 2 Jun 2004, Edgar Lovecraft wrote:

> server => exports /nfs_share
> client1 => mounts /nfs_share
> client2 => mounts /nfs_share
>
> client1 => app1 opens and locks /nfs_share/file
> client1 => app2 sees the lock when accessing /nfs_share/file
> client2 => app1 opens and locks /nfs_share/file
>        as it does not see /nfs_share/file as being locked


Unless NFS has changed significantly recently, you cannot just use file
locks from multiple systems (even if they "work"). The following text is
a comment in the source of Exim's appendfile transport. Maybe it will help.

LOCK FILES

Unless no_use_lockfile is set, we attempt to build a lock file in a way that
will work over NFS. Only after that is done do we actually open the mailbox
and apply locks to it (if configured).

Originally, Exim got the file opened before doing anything about locking.
However, a very occasional problem was observed on Solaris 2 when delivering
over NFS. It seems that when a file is opened with O_APPEND, the file size
gets remembered at open time. If another process on another host (that's
important) has the file open and locked and writes to it and then releases
the lock while the first process is waiting to get the lock, the first
process may fail to write at the new end point of the file - despite the very
definite statement about O_APPEND in the man page for write(). Experiments
have reproduced this problem, but I do not know any way of forcing a host to
update its attribute cache for an open NFS file. It would be nice if it did
so when a lock was taken out, but this does not seem to happen. Anyway, to
reduce the risk of this problem happening, we now create the lock file
(if configured) *before* opening the mailbox. That will prevent two different
Exims opening the file simultaneously. It may not prevent clashes with MUAs,
but Pine at least seems to operate in the same way.

Lockfiles should normally be used when NFS is involved, because of the above
problem.
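
The ordering described above (lock file first, then open the mailbox, then
apply a lock to the open file) can be sketched as follows. This is only an
illustration in Python, not Exim's code: append_message, take_lock_file and
release_lock_file are hypothetical names, and the two callbacks stand in for
whatever lock file scheme is in use.

```python
import fcntl
import os


def append_message(mailbox, data, take_lock_file, release_lock_file):
    """Append bytes to a mailbox: lock file first, then open, then fcntl lock.

    take_lock_file(mailbox) -> bool and release_lock_file(mailbox) are
    hypothetical callbacks supplied by the caller.
    """
    # Take the lock file BEFORE opening, so that no file size is remembered
    # at open time while another host may still be writing.
    if not take_lock_file(mailbox):
        return False  # a real MTA would defer the delivery and retry later
    try:
        fd = os.open(mailbox, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
        try:
            # Lock the open file as well, for programs that rely on fcntl.
            fcntl.lockf(fd, fcntl.LOCK_EX)
            os.write(fd, data)
            os.fsync(fd)
        finally:
            os.close(fd)  # closing the descriptor also drops the fcntl lock
    finally:
        release_lock_file(mailbox)
    return True
```

The mailbox is never open while the lock file is absent, which is the
property the comment above relies on.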

The logic for creating the lock file is:

  . The name of the lock file is <mailbox-name>.lock

  . First, create a "hitching post" name by adding the primary host name,
    current time and pid to the lock file name. This should be unique.

  . Create the hitching post file using WRONLY + CREAT + EXCL.

  . If that fails EACCES, we assume it means that the user is unable to create
    files in the mail spool directory. Some installations might operate in this
    manner, so there is a configuration option to allow this state not to be an
    error - we proceed to lock using fcntl only, after the file is open.

  . Otherwise, an error causes a deferment of the address.

  . Hard link the hitching post to the lock file name.

  . If the link succeeds, we have successfully created the lock file. Simply
    close and unlink the hitching post file.

  . If the link does not succeed, proceed as follows:

    o Fstat the hitching post file, and then close and unlink it.

    o Now examine the stat data. If the number of links to the file is exactly
      2, the linking succeeded but for some reason, e.g. an NFS server crash,
      the return never made it back, so the link() function gave a failure
      return.

  . This method allows for the lock file to be created by some other process
    right up to the moment of the attempt to hard link it, and is also robust
    against NFS server crash-reboots, which would probably result in timeouts
    in the middle of link().

  . System crashes may cause lock files to get left lying around, and some means
    of flushing them is required. The approach of writing a pid (used by smail
    and by elm) into the file isn't useful when NFS may be in use. Pine uses a
    timeout, which seems a better approach. Since any program that writes to a
    mailbox using a lock file should complete its task very quickly, Pine
    removes lock files that are older than 5 minutes. We allow the value to be
    configurable on the transport.
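
The steps in the list above can be sketched roughly as follows. This is a
Python illustration of the hitching post technique, not a transcription of
the appendfile transport: the function name create_lock_file, the
hitching post name format, and the parameters stale_after, retries and pause
are all assumptions made for the example. The 300-second default mirrors the
5 minutes mentioned for Pine.

```python
import os
import socket
import time


def create_lock_file(mailbox, stale_after=300.0, retries=5, pause=0.1):
    """Try to create <mailbox>.lock via a uniquely named hitching post.

    Returns True if the lock file was created, False if it stayed held.
    """
    lock_name = mailbox + ".lock"
    for _ in range(retries):
        # Hitching post: lock file name + host name + current time + pid.
        hitch = "%s.%s.%08x.%08x" % (
            lock_name, socket.gethostname(), int(time.time()), os.getpid())
        # WRONLY + CREAT + EXCL. An OSError here (EACCES, for example) is
        # left to the caller, which may defer or fall back to fcntl only.
        fd = os.open(hitch, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            try:
                os.link(hitch, lock_name)
                linked = True
            except OSError:
                # link() reported failure. A link count of exactly 2 means
                # the link was made but the reply never came back.
                linked = os.fstat(fd).st_nlink == 2
        finally:
            os.close(fd)
            os.unlink(hitch)
        if linked:
            return True
        # Someone else holds the lock. Remove it if it is older than the
        # timeout; a pid inside the file would be meaningless across hosts.
        try:
            age = time.time() - os.stat(lock_name).st_mtime
            if age > stale_after:
                os.unlink(lock_name)
                continue
        except FileNotFoundError:
            continue  # the holder released it meanwhile; retry at once
        time.sleep(pause)
    return False
```

Releasing the lock is just an unlink of <mailbox-name>.lock. Note that the
age test compares the local clock with a timestamp that, over NFS, may come
from another machine, so the timeout should be generous relative to any clock
skew between the hosts.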



--
Philip Hazel            University of Cambridge Computing Service,
ph10@???      Cambridge, England. Phone: +44 1223 334714.
Get the Exim 4 book:    http://www.uit.co.uk/exim-book