Hello,
No doubt this is going to be a "random" problem, difficult to track down
and even harder to solve, but here goes.
I've noticed on a number of machines that since Exim 4.30, zombie/defunct
processes seem to be appearing "randomly". I've been steadfastly ignoring
this until now, putting it down to voodoo, but I've now seen it on at
least 3 machines. All of them run Red Hat 9 with exiscan and SA-Exim
compiled in, all use the build configuration on my website
(http://www.timj.co.uk/linux/exim.php; basically the default, plus a few
odds and ends such as SSL, DSEARCH and AUTH CRAM-MD5 & PLAIN compiled in),
and all are lightly loaded.
Here's an example from "ps axf":

   5290 ?  S  0:00 /usr/sbin/exim -bd -q1h
   6961 ?  Z  0:00  \_ [exim <defunct>]

Now, maybe this is nothing I should worry about, or maybe it has always
happened. I don't think that's the case, though: I noticed it on one
machine straight after putting 4.30 on it, and I'm sure I would have
noticed it earlier if it had been happening before.
Frustratingly, I can't reproduce it reliably, although stopping and
starting the Exim daemon in quick succession (e.g. "service exim restart"
with my init script, which stops and then starts it) seems to *sometimes*
trigger it. Predictably, it refuses to do so while I'm writing this.
A few general observations:
- There is absolutely nothing in the logs. Indeed, a zombie process has
appeared on my desktop machine since lunchtime today, even though Exim
there has done absolutely nothing all afternoon except an hourly queue
run (on an empty queue); it hasn't processed a single mail.
- It doesn't, as far as I can tell, seem to be impacting the normal
operation of Exim.
- Running "exiwhat" seemed to get rid of the defunct process in at least
one instance.
- I'm *pretty* (90%+) sure that I saw this happen on a test machine that
didn't have exiscan on it, so I don't think it's exiscan-related.
- I don't think it's the runtime config, since it has happened on a
"virgin" machine (i.e. one with the default config).
- The only significant things I'm aware of doing differently are running
these machines with user "mailnull" instead of "mail", and using "gid
mail" delivery to local mailboxes instead of relying on the sticky bit.
I know this is very vague, but in the absence of a reliable reproduction
scenario it's hard to say much more or to obtain debugging information.
Any ideas about which changes might have triggered it? Or is it perhaps
something I'm doing?
Whatever the case, I thought I should flag it up, since "something not
right" alarm bells are ringing. If anyone else notices similar behaviour,
it would be helpful if you could post; at the very least it might help us
to look for similarities so we can narrow down the search for the cause,
or find a reliable reproduction scenario. Equally, if absolutely
nobody else has seen this, it would point the finger at some configuration
and/or supporting software versions I'm using.
Thanks,
Tim