[EXIM]

Author: Nigel Metheringham
Date:  
To: Philip Hazel
CC: exim-users
Subject: [EXIM]

ph10@??? said:
} > We are also seeing, particularly under these circumstances (huge
} > hits normally from one site) some processes hanging on read -

} Aha! Somebody else sees this.

} > I think the problem is actually kernel related in that the alarm
} > timeout is getting lost but in theory a lot of signal stuff is
} > unreliable.

} We have these from time to time on Solaris, and I keep looking at the
} stuck processes and can't find out what the heck is happening, except
} that it always seems to be related to a dial-in connection that has
} gone away. However, I haven't seen any for a while, and a comment on
} a Sun patch suggested that something might have got fixed.

A chat with someone who is rather better at these low-level Unix internals
than I am suggests that an alarm()/read() setup will fail under some
circumstances unless you also have a setjmp()/longjmp() to catch problems
related to signals arriving within read() (this may be made worse by having
the read buried underneath the stdio layer).
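A minimal sketch of the alarm()/read() plus setjmp()/longjmp() arrangement, in C, assuming a POSIX system. It uses sigsetjmp()/siglongjmp() so the signal mask is restored on the jump. The helper name timed_read() and its ETIMEDOUT convention are hypothetical, not anything taken from Exim. Because the jump is armed before alarm() is called, a SIGALRM that fires before read() has actually blocked, or one after which the system would otherwise restart the interrupted read(), still gets the process out.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* sigjmp_buf/sigsetjmp rather than jmp_buf/setjmp, so the signal mask
   (with SIGALRM blocked inside the handler) is restored on the jump. */
static sigjmp_buf timeout_env;
static volatile sig_atomic_t jump_ok = 0;

static void on_alarm(int sig)
{
    (void)sig;
    if (jump_ok)
        siglongjmp(timeout_env, 1);   /* escape from read(), never return */
}

/* Hypothetical helper: read up to len bytes from fd, giving up after
   `seconds`. Returns bytes read, or -1 with errno = ETIMEDOUT on timeout. */
ssize_t timed_read(int fd, void *buf, size_t len, unsigned seconds)
{
    struct sigaction sa, old_sa;
    ssize_t n;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGALRM, &sa, &old_sa);

    if (sigsetjmp(timeout_env, 1) != 0) {
        /* Reached via siglongjmp from the handler: the timer expired. */
        jump_ok = 0;
        sigaction(SIGALRM, &old_sa, NULL);
        errno = ETIMEDOUT;
        return -1;
    }

    jump_ok = 1;          /* arm the jump BEFORE starting the timer */
    alarm(seconds);
    n = read(fd, buf, len);
    /* A small window remains between read() returning and jump_ok being
       cleared, in which a late alarm would discard the data just read. */
    jump_ok = 0;
    alarm(0);
    sigaction(SIGALRM, &old_sa, NULL);
    return n;
}
```

The remaining window noted in the comment cannot be closed from user space with this pattern, since blocking SIGALRM cannot be made atomic with entering read().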

I really am wondering whether a change to a select()-based series of
timeouts would be worth having here, although it could mean a fair bit of
work converting everything to use this system.
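For comparison, a minimal sketch of a select()-based timeout in C, again assuming POSIX. The helper select_read() is hypothetical. No signal handler is involved, so there is no timeout to lose. The cost is that every blocking read has to be wrapped this way.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to `seconds` for fd to become readable,
   then read. Returns bytes read, 0 at EOF, or -1 (errno = ETIMEDOUT on
   timeout, otherwise whatever select() or read() set). */
ssize_t select_read(int fd, void *buf, size_t len, unsigned seconds)
{
    for (;;) {
        fd_set rfds;
        struct timeval tv;
        int rc;

        /* select() may modify both the set and the timeval,
           so rebuild them on every pass. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = seconds;
        tv.tv_usec = 0;

        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc < 0) {
            if (errno == EINTR)
                continue;   /* unrelated signal; note this restarts the
                               full timeout rather than the remainder */
            return -1;
        }
        if (rc == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        return read(fd, buf, len);   /* readable: data or EOF waiting */
    }
}
```

One caveat relevant to the stdio point above: select() only sees the file descriptor, so data already sitting in a FILE buffer is invisible to it. A select()-based scheme generally means doing the buffering yourself rather than going through stdio.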

Oddly enough, I am seeing a similar problem on an FTP daemon that does use
setjmp/longjmp on timeouts.

    Nigel.
-- 
[ Nigel.Metheringham@???   -  Systems Software Engineer ]
[ Tel : +44 113 207 6112                   Fax : +44 113 234 6065 ]
[      Real life is but a pale imitation of a Dilbert strip       ]

--
*** Exim information can be found at http://www.exim.org/ ***