On 2008-03-12 at 15:46 +0100, Alexander Nagel wrote:
> My problem is that I have quite a large number of exim4 processes in
> state ZOMBIE during mail delivery. Sometimes, for some reason I don't
> know, the ZOMBIE processes are not getting killed and
> there are more and more of them until NAGIOS alerts.
A zombie process is already dead. The only thing which still exists is
an entry in the process table (and some other kernel house-keeping
structures) which records the process's exit status, etc. A zombie
process is, simply, a dead process which its parent hasn't yet reaped,
i.e. the parent hasn't collected that exit status with wait() or waitpid().
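To see the distinction concretely, here is a minimal Python sketch. It is
Linux-only, since it reads /proc/<pid>/stat, and the function names are
made up for illustration. The child exits immediately, yet it stays in
state Z until the parent calls waitpid(), which is the "reaping" step.

```python
import os
import time

def proc_state(pid):
    """Return the one-letter state of `pid` (Linux-only: parses /proc/<pid>/stat)."""
    with open(f"/proc/{pid}/stat") as f:
        # Format is "pid (comm) state ..."; comm may contain spaces, so split after ')'.
        return f.read().rsplit(")", 1)[1].split()[0]

def zombie_then_reap(timeout=5.0):
    """Fork a child that exits at once; report its state before reaping, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)                      # child: dies immediately with status 7
    # Parent: do NOT call waitpid yet; poll until the kernel marks the child 'Z'.
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    # Reaping: waitpid() collects the exit status and releases the table entry.
    _, status = os.waitpid(pid, 0)
    return state, os.WEXITSTATUS(status)
```

On Linux, zombie_then_reap() should return ('Z', 7): the process was dead
and marked Z, but its exit status was still waiting for the parent.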
There is one way, and only one way, for a zombie process to go away: its
parent process reaps it. If you kill the parent process, the zombie,
like all of that process's children, gets re-parented to become a
child of init (pid 1), which always reaps its children. So even then
the zombie is reaped by its parent; it's just a different parent.
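A sketch of the re-parenting, again hypothetical Python for Linux: a
middle process starts a grandchild and exits without waiting for it, and
the grandchild then sees its parent pid change. On a plain system the new
parent is init (pid 1); on Linux a process registered as a "child
subreaper" (systemd does this in some setups) may take init's place, so
the sketch only relies on the parent having changed.

```python
import os
import time

def parent_after_orphaning(timeout=5.0):
    """Return (middle_pid, new_ppid) as seen by an orphaned grandchild."""
    r, w = os.pipe()
    middle = os.fork()
    if middle == 0:
        try:
            middle_pid = os.getpid()
            if os.fork() == 0:
                # Grandchild: wait until our parent (the middle process) is gone.
                deadline = time.monotonic() + timeout
                while os.getppid() == middle_pid and time.monotonic() < deadline:
                    time.sleep(0.01)
                # Report who our parent is now.
                os.write(w, str(os.getppid()).encode())
        finally:
            # Both the middle process and the grandchild end here without reaping anyone.
            os._exit(0)
    os.close(w)
    os.waitpid(middle, 0)                # reap the middle process ourselves
    new_ppid = int(os.read(r, 32))       # blocks until the grandchild writes
    os.close(r)
    return middle, new_ppid
```

Whatever the new parent turns out to be, it is the one that will
eventually reap the grandchild.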
Exim's model is to have a process handle each delivery, so if that
process has zombie children, it is stuck waiting on something else and
hasn't got round to reaping them. In your case, that something else is
typically the spamc process, which is still running.
When the spamc process exits, Exim cleans up.
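The same mechanism in miniature, as a hypothetical Python sketch (Linux
only): the parent waits specifically for a slow helper, standing in for a
hung spamc, and a helper that finished long before sits as a zombie the
whole time, because waitpid() on one pid reaps only that child. This
illustrates the general Unix behaviour; it is not Exim's code.

```python
import os
import time

def proc_state(pid):
    """Return the one-letter state of `pid` (Linux-only: parses /proc/<pid>/stat)."""
    with open(f"/proc/{pid}/stat") as f:
        return f.read().rsplit(")", 1)[1].split()[0]

def zombie_while_busy(slow_seconds=1.0, timeout=5.0):
    """Wait on a slow helper only; return the state of an already-finished helper."""
    fast = os.fork()
    if fast == 0:
        os._exit(0)                      # finishes at once
    slow = os.fork()
    if slow == 0:
        try:
            time.sleep(slow_seconds)     # stands in for a hung spamc
        finally:
            os._exit(0)
    os.waitpid(slow, 0)                  # parent is blocked here while `slow` runs
    # waitpid(slow) reaped only `slow`; `fast` has been a zombie since it exited.
    deadline = time.monotonic() + timeout
    state = proc_state(fast)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(fast)
    os.waitpid(fast, 0)                  # only now is the zombie reaped
    return state
```

zombie_while_busy(0.5) should return 'Z'.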
The zombie processes themselves are not your problem. The possible
problem is other processes which are hanging and which Exim is waiting
on; the zombie processes are a symptom of that, a side-effect.
There is no need to kill the exim processes with -9 (SIGKILL): that
gives Exim no chance to clean up after itself, and you risk leaving
corrupted DB files around. If you really have to kill Exim, is it
really true that SIGTERM (-15, the default) doesn't work for you?
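The difference is that SIGTERM can be caught and handled, while SIGKILL
can never be caught, blocked or ignored. A minimal Python sketch of the
consequence; the lock file is a hypothetical stand-in for something like
Exim's hints databases. The child removes its lock file in a SIGTERM
handler, which never gets to run under SIGKILL.

```python
import os
import signal
import tempfile

def lock_left_behind(sig, lock_path=None):
    """Start a child holding a hypothetical lock file that it removes on SIGTERM.

    Send it `sig`, reap it, and report whether the lock file was left behind.
    """
    if lock_path is None:
        lock_path = os.path.join(tempfile.mkdtemp(), "demo.lock")
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)

            def cleanup(signum, frame):
                os.remove(lock_path)     # tidy up before exiting
                os._exit(0)

            signal.signal(signal.SIGTERM, cleanup)
            open(lock_path, "w").close()
            os.write(w, b"x")            # tell the parent the handler is in place
            while True:
                signal.pause()           # idle until a signal arrives
        finally:
            os._exit(1)
    os.close(w)
    os.read(r, 1)                        # wait until the child is ready
    os.close(r)
    os.kill(pid, sig)
    os.waitpid(pid, 0)                   # reap it, so no zombie is left either
    left = os.path.exists(lock_path)
    if left:
        os.remove(lock_path)             # clean up after the SIGKILL case
    return left
```

lock_left_behind(signal.SIGTERM) returns False; lock_left_behind(signal.SIGKILL)
returns True.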
> Debian- 16817 0.0 0.0 3264 768 ? S 15:27 0:00 \_ /usr/bin/spamc -t 10 -u XXXXXXXXXX
spamc should be timing out after 10 seconds (-t 10). If it isn't,
that's the problem: figure out why spamassassin is hanging.
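For reference, spamc's -t sets the timeout on its conversation with
spamd. If you want a belt-and-braces guard around a helper you run
yourself, the usual pattern is: wait up to a deadline, then SIGTERM, then
SIGKILL only as a last resort, and always reap the child. A hypothetical
Python sketch of that pattern; it is not how spamc or Exim implement
their timeouts.

```python
import subprocess

def run_with_timeout(argv, timeout, grace=2.0):
    """Run argv with a deadline; returns (returncode, timed_out).

    On timeout: SIGTERM first, SIGKILL only if the child ignores SIGTERM for
    `grace` more seconds. The child is always waited for, so no zombie remains.
    """
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=timeout), False
    except subprocess.TimeoutExpired:
        proc.terminate()                 # SIGTERM: lets the child clean up
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()                  # last resort: SIGKILL
            proc.wait()
        return proc.returncode, True
```

run_with_timeout(["sleep", "30"], 0.5) gives (-15, True); the negative
return code is how subprocess reports death by signal 15.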
-Phil