Re: [exim-dev] mbx locking bug in CYGWIN

Author: Derek Martin
Date:
To: exim-dev
Subject: Re: [exim-dev] mbx locking bug in CYGWIN

On Thu, Mar 17, 2005 at 08:46:45AM +0000, Philip Hazel wrote:
> to be sure of what goes on. Since I implemented it, there have been no
> reports of problems - but then, how many people use MBX mailboxes?

That's a valid point... When I was working on an IMAP implementation,
I never even considered support of MBX, mainly because in all my years
as a sysadmin, and in dealing with various forms of e-mail software,
I've never once encountered anyone who used it (other than perhaps
experimentally).

> Things may have changed in c-client. Exim's code hasn't changed in this
> area for a very long time. ("If it ain't broke" - or perhaps "if it
> ain't been reported as broke" :-)

Indeed...

> > 1. It was written 10 years ago. Much has changed in the Unix world by
> > now.
>
> Absolutely true. However, this thread originated as a CYGWIN thing, and
> (nice though it is to know that Exim works in that environment) my main
> concern is that Exim works in the "real" Unix world.

Understood. OTOH, if it is a legitimate bug, wouldn't you want to know
about it and fix it?

> > 2. All that said, my understanding is that it is STILL the case that,
> > in a heterogeneous environment, NO LOCKING METHOD WORKS RELIABLY
> > OVER NFS.
>
> I wonder why we have had no problems with user mailboxes on NFS (for a
> system with three separate hosts) over the last 8 or so years, then?

It could just be that you've been lucky...

In all honesty, the race conditions involved are (as I understand it)
such that it should be exceedingly rare to encounter a problem. It would
mean that the MDA and the MUA would have to, at the exact same time,
both request an exclusive lock on the same file, and the request which
came over NFS came first but was interrupted before it completed.
Even on busy mail spool files, how likely is that to happen? You'd
probably only ever see a problem on really busy mail servers, with
unusually high loads. Otherwise, since both the MUA and MDA spend
most of their time waiting around for stuff to happen, I imagine once
they are woken up to do some file I/O, it's really unlikely that a
context switch would occur in the middle of a request to obtain a
lock...
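To make the scenario concrete, here is a minimal Python sketch of the
exclusive lock request that both the MDA and the MUA make. It is an
illustration only, not Exim's or c-client's actual MBX locking code, and
it assumes POSIX fcntl() record locks (via Python's fcntl.lockf()); the
helper names try_exclusive_lock and child_can_lock_while_parent_holds
are hypothetical. On a local filesystem the kernel arbitrates both
requests, so the second process is simply refused. The sketch cannot
reproduce the NFS case, where the same request is handled by the remote
lock manager (lockd), which is where the race described above is said
to arise.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Hypothetical helper: try a non-blocking exclusive fcntl() lock on the file.

    Returns the open descriptor on success (the lock is held until that
    descriptor is closed), or None if another process holds a conflicting lock.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:  # EACCES or EAGAIN: the lock is held elsewhere
        os.close(fd)
        return None
    return fd


def child_can_lock_while_parent_holds(path):
    """Hypothetical helper: two processes (say, an MDA and an MUA) contend for one mailbox.

    fcntl() locks belong to a process and are not inherited across fork(),
    so the child is a genuinely separate lock owner.
    """
    parent_fd = try_exclusive_lock(path)  # first requester wins the lock
    assert parent_fd is not None
    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            if try_exclusive_lock(path) is not None:
                code = 0
        finally:
            os._exit(code)  # never return into the parent's code path
    _, status = os.waitpid(pid, 0)
    os.close(parent_fd)  # releases the parent's lock
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as mbox:
        print(child_can_lock_while_parent_holds(mbox.name))  # False on a local filesystem
```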

But that doesn't mean it will never happen... If it did happen, it
might be so infrequent that no one would think to attribute it to a
locking problem. It may even have happened at a site under your
control, and you never heard about it because the user didn't report
it, or something of the sort.

Even with the old, broken nfs-utils on Linux, the occurrence of data
corruption/loss under these circumstances was largely unheard of. But
that doesn't mean it never happened... More likely it did, but so
infrequently that no cause was ever attributed, and the data loss
was written off as "just one of those things..." Obviously I'm
theorizing.

And, again, with the OS vendors constantly doing development to fix
these kinds of problems, it may be that there really isn't a problem
anymore. Perhaps the probability of a loss occurring is now low enough
that it's worth the risk in the name of convenience.

Personally, given that there are other potential (security) problems
with NFS spools in addition to potential locking issues, I would
never consider allowing it in my environment. But environments vary,
and there may be very little risk of either kind of problem in
yours... I still think a strong caution against doing it is warranted.

--
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0x81CFE75D
