Re: [exim] exim dies on the interrupted system call

Author: Robert N. M. Watson
Date:  
To: exim-users
CC: David Woodhouse
Subject: Re: [exim] exim dies on the interrupted system call

On 5 Jan 2011, at 17:29, Richard Clayton wrote:

> -----Original message-----
> Subject:    [exim] exim dies on the interrupted system call
> To:         exim-users@???
> Cc:         Каялайнен <artem@???>,
>            Артем @tahini.csx.cam.ac.uk
> From:       David Woodhouse <dwmw2@???>
> Date:       Sat, 1 Jan 2011 00:43:01 +0000
> Message-ID: <1293842581.28701.70.camel@???>

>
> On Mon, 2010-12-27 at 21:39 -0500, Phil Pennock wrote:
>> This is a bug in Exim. Looking at the code, I'm rather shocked that
>> it has never bitten us before now.
>
> It doesn't bite because most operating systems don't actually return
> short writes on a real file except on EOF, even though POSIX permits
> them to.
>
> (The case you've seen is actually returning -1 / EINTR rather than a
> short write where it writes fewer bytes than you asked, but that's just
> a special case of the same thing.)
>
> In Linux we avoid doing short writes because we *know* a lot of
> userspace will break if we do that. Exim will not be the only program
> which breaks on the FreeBSD system in question.
>
> But yes, strictly speaking it *is* a bug in Exim. There are a bunch of
> write() calls which we should wrap with our own function that loops
> until it's either written all it had to write, or got a *real* error.


Hi David, et al,

As you observe, returning (-1, EINTR) is probably technically permitted by the spec, but it is something you never want the file system to do. The only cases I'm aware of where FreeBSD file systems intentionally return EINTR are soft NFS mounts and, in some rare edge cases, Coda (and maybe AFS by implication). As such, I'd consider it a bug, and a surprising one, if EINTR is being returned from write(2) on a regular file in UFS2.

It would be worth tracking this down a bit further, since if such a bug does exist, we want to fix it. Is there any chance the write(2) is going to a FIFO in the file system, or even a socket, rather than a regular file? Could Exim have its file descriptors mixed up? Is Exim using threads, in which case we could be looking at a threading-library bug?

(Normally, sleeps inside the file system on block I/O are non-interruptible, for all the reasons cited above.)

Robert