ph10@??? said:
} > We are also seeing, particularly under these circumstances (huge
} > hits normally from one site) some processes hanging on read -
} Aha! Somebody else sees this.
} > I think the problem is actually kernel related in that the alarm
} > timeout is getting lost but in theory a lot of signal stuff is
} > unreliable.
} We have these from time to time on Solaris, and I keep looking at the
} stuck processes and can't find out what the heck is happening, except
} that it always seems to be related to a dial-in connection that has
} gone away. However, I haven't seen any for a while, and a comment on
} a Sun patch suggested that something might have got fixed.
A chat with someone who is rather better on these low-level Unix internals
than I am suggests that an alarm()/read() setup will fail under some
circumstances unless you also have a setjmp()/longjmp() to catch problems
related to signals within read() (this may be made worse by having the
read buried underneath the stdio layer).
I really am wondering whether a change to a select()-based series of timeouts
would be worth having here, although it could mean a fair bit of work
converting everything to use this system.
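As a rough idea of what the select() approach would look like for a single read, here is a sketch under my own assumptions. The function name read_with_select and the return convention are invented for the example and are not anything in the existing code. No signals are involved, so there is no alarm to get lost.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
 * Returns bytes read, 0 on EOF, -1 on error, -2 on timeout. */
ssize_t read_with_select(int fd, void *buf, size_t len, long secs)
{
    fd_set rfds;
    struct timeval tv;
    int rc;

    assert(fd >= 0 && fd < FD_SETSIZE);   /* fd_set cannot hold more */

    do {
        /* Rebuild the set and the timeout each time round, since
         * select() may modify both. Retrying after EINTR restarts
         * the full timeout, which is a simplification. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = secs;
        tv.tv_usec = 0;
        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;                 /* select() itself failed */
    if (rc == 0)
        return -2;                 /* nothing readable within secs */

    /* The descriptor is readable, so this read() should not block. */
    return read(fd, buf, len);
}
```

The catch is the one mentioned above: this only works on the descriptor itself, so any reads that currently go through stdio would have to be reworked to call something like this instead, which is where the conversion work comes in.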
Oddly enough, I am seeing a similar problem on an FTP daemon which does use
setjmp()/longjmp() on timeouts.
Nigel.
--
[ Nigel.Metheringham@??? - Systems Software Engineer ]
[ Tel : +44 113 207 6112 Fax : +44 113 234 6065 ]
[ Real life is but a pale imitation of a Dilbert strip ]