Re: [Exim] remote delivery process count got out of step

Author: Philip Hazel
Date:
To: Bernhard Erdmann
CC: exim-users, frank
Subject: Re: [Exim] remote delivery process count got out of step
On Wed, 28 Aug 2002, Bernhard Erdmann wrote:

> Now it's clearly reproducable using the same bounce procedure:
> - no strace:        Exim works well as expected
> - strace -p PID -f:    "remote delivery process count got out of step"


The ChangeLog for 4.11 has this entry:

 4. It has been discovered that, under Linux, when a process and its children
    are being traced by "strace -f", the children are stolen from the parent
    while they are being traced. A call to waitpid(-1,&x,WNOHANG), which Exim
    uses to test for the completion of "any of my children" in a non-blocking
    manner, returns as if there are no children in existence. Exim used to
    treat this as a serious unexpected error state. What it does now is to use
    kill(pid,0) to check explicitly for the continued existence of any of its
    children. If it finds any, it assumes it is being traced, and proceeds as
    if the return from waitpid() had been "none of your children have finished
    yet". If it can't find any children, it gives the error as before.
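
For anyone who wants to see the shape of that check, here is a minimal
sketch in C. It is not Exim's actual code: the function name, the enum
values and the array of child pids are all hypothetical, chosen only to
illustrate the logic the ChangeLog entry describes.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical result codes for this sketch; these are not Exim's names. */
enum poll_result { CHILD_DONE, NONE_FINISHED, ASSUME_TRACED, OUT_OF_STEP };

/* Poll without blocking for the completion of any child.
 * pids/count: the children the caller believes it still has (assumed to be
 * tracked by the caller). On CHILD_DONE, *done and *status are filled in. */
enum poll_result poll_children(const pid_t *pids, int count,
                               pid_t *done, int *status)
{
  pid_t rc = waitpid(-1, status, WNOHANG);

  if (rc > 0) { *done = rc; return CHILD_DONE; }  /* a child has finished */
  if (rc == 0) return NONE_FINISHED;              /* children exist, none done */

  /* rc < 0: waitpid() claims there are no children (normally ECHILD).
   * Before treating that as an error, probe each expected child with
   * signal 0, which checks for existence without delivering a signal. */
  for (int i = 0; i < count; i++)
  {
    if (pids[i] <= 0) continue;                   /* empty slot */
    if (kill(pids[i], 0) == 0 || errno == EPERM)  /* EPERM: exists, not ours to signal */
      return ASSUME_TRACED;                       /* carry on as "none finished yet" */
  }

  return OUT_OF_STEP;                             /* no child found: real error */
}
```

A caller would treat ASSUME_TRACED exactly like NONE_FINISHED and try
again later, and only log the "got out of step" error on OUT_OF_STEP.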


... and if debugging, it says

process xxx still exists: assume stolen by strace

This seems to be a Linux "feature". The same thing does not happen under
Solaris "truss", for example. I don't know about other operating systems.

--
Philip Hazel            University of Cambridge Computing Service,
ph10@???      Cambridge, England. Phone: +44 1223 334714.