Solaris 2.5 NFS apparent bug - ideas for workaround?

Author: Philip Hazel
Date:  
To: exim-users
Subject: Solaris 2.5 NFS apparent bug - ideas for workaround?
We have discovered what appears to be a bug in the client implementation
of NFS in Solaris 2.5. If it isn't a bug, it's a worrying incompatibility.
It can affect versions of Exim that are being used to do local
deliveries into NFS-mounted files.

Yes, I know that doing mail delivery over NFS is something all the
experts tell you not to do. Nevertheless, lots of people do it,
including ourselves. I've tried to make Exim as robust as possible in
this situation, but I don't know how to get round this one. Any ideas
welcomed. The problem is as follows:

If two different Exims on two different hosts (don't know if that is
essential) attempt to deliver to the same NFS-mounted mailbox at the
same time, then, under some circumstances (see below) the following
happens:

. Each host opens the file (using O_WRONLY + O_APPEND) and at this point
the file size is cached in the clients.

. One host succeeds in getting the lock first, writes its message to the
file, closes it, and goes away.

. The other host has been waiting for the lock (or maybe even not, if
the first host was blindingly fast) and now gets it, writes its
message and exits. The trouble is that it has NOT updated the file
size before doing the write. Consequently, despite the O_APPEND, it
writes at the wrong place in the file.

The man page for write() says "If the O_APPEND flag of the file status
flags is set, the file pointer is set to the end of the file prior to
each write(). The system guarantees that no intervening file
modification operation will occur between changing the file offset and
the write operation." No ifs or buts about NFS there.

I have a simple test program that exhibits this behaviour, but it does
not reproduce it every time, and I'm not sure exactly what triggers it.
I have tried lseek() to end of file before writing, but that doesn't
help. I also tried stat() and another open(), but they all seem to be
working from the cached file size.

What I need is a way of forcing the client to fetch the file size from
the NFS server _after_ it has got the lock. Any ideas?

--
Philip Hazel                   University Computing Service,
ph10@???             New Museums Site, Cambridge CB2 3QG,
P.Hazel@???          England.  Phone: +44 1223 334714