Re: [EXIM] accept() errors instead of accepting when network…

Author: Greg A. Woods
Date:  
To: exim-users
Subject: Re: [EXIM] accept() errors instead of accepting when network unreachable
[ On November 19, 1998 at 07:13:04 (-0000), D. J. Bernstein wrote: ]
> Subject: Re: [EXIM] accept() errors instead of accepting when network unreachable
>
> Greg A. Woods writes:
> >     errno != ECONNRESET && errno != ENETUNREACH && errno != EHOSTUNREACH
> >     && errno != ENOTCONN && errno != ENETDOWN && errno != EHOSTDOWN
> >     && errno != ECONNREFUSED

>
> tcpserver has always treated all accept() failures as ``try again.''


If you always just try again then you have no way of knowing when you
have to close and re-open the socket! ;-)

Of course this might not be a problem depending on how the daemon
handles the socket....

> > lots of STREAMS based TCP/IP stacks where
> > completely killing and restarting the daemon, or even rebooting
> > sometimes, is still required when accept(2) gets itself tied in a knot.
>
> The sendmail notes say that you sometimes have to close and reopen
> non-blocking sockets under Solaris 2.3. Do you have evidence of a more
> serious problem?


That might be all that's necessary on SunOS-5.3, but I've never had
enough direct experience with such systems to know for sure.

One of my previous clients commonly rebooted their ISC system because
killing and re-starting earlier versions of smail didn't work. (Well,
since smail killed itself, you only had to notice it was dead and
re-start it. A cron script was written to do that, but the script hung
around and waited to see if smail died again, and if it did then it
cried out for help.) This was a commonly known problem in the smail
user community and at one time affected several SysV variants, not all
of which were using the same TCP/IP code so far as I know.

At some point someone (perhaps even Ron Karr or Landon Noll) discovered
the following (this comment goes back to before 1992):

/*
* Interactive UNIX 2.2 has a bug in accept(). If accept() is
* interrupted by an alarm signal, accept() does not return from
* waiting for a connection with errno set to EINTR. Unfortunately
* this is necessary for smail to process its mail queues at regular
* intervals, as specified with the -q option.
*
* Interactive's select() does work correctly, however. Thus,
* we use select() to determine when to call accept(), and catch
* alarm signals out of select(), instead of out of accept().
*/

After this hack was introduced I don't think reboots were needed quite
as often, at least not on ISC 2.2. This made me suspect that the
code which should have set errno to EINTR and returned from accept() was
doing other evil things instead -- evil things which eventually resulted
in enough corruption that the TCP stack was useless.
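
The select()-before-accept() workaround described in that comment can be
sketched roughly as follows. This is only an illustration and not the
smail code: the helper name wait_then_accept() is hypothetical, and it
assumes a blocking listening socket and a SIGALRM handler installed
elsewhere.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not taken from smail): wait in select() until
 * listen_fd has a pending connection, then call accept().
 * Returns the connected descriptor, or -1 with errno set. An errno of
 * EINTR means a signal (for example SIGALRM) interrupted select(), so
 * the caller can run its periodic work and then call this again.
 */
static int wait_then_accept(int listen_fd)
{
    fd_set readable;

    /* FD_SET() is undefined for descriptors outside 0..FD_SETSIZE-1. */
    if (listen_fd < 0 || listen_fd >= FD_SETSIZE) {
        errno = EBADF;
        return -1;
    }

    FD_ZERO(&readable);
    FD_SET(listen_fd, &readable);

    /* NULL timeout: block until a connection is pending or a signal arrives. */
    if (select(listen_fd + 1, &readable, NULL, NULL, NULL) < 0)
        return -1;

    return accept(listen_fd, NULL, NULL);
}
```

Note that a pending connection can disappear between select() and
accept(), in which case accept() may block or fail, so a caller still
has to decide what to do with the error.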

If I ever again trip over anyone who has to run smail on such ancient
systems I'll offer them a patch that simply closes and reopens the
socket and see if that works for them (it would be a lot cleaner than
having a cron script constantly restarting the mailer!).
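
Closing and reopening the listening socket might look something like
the sketch below. It is not a patch for smail: reopen_listener() and its
parameters are hypothetical, and it assumes a plain IPv4 TCP listener
bound to all addresses.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: discard a listening socket and create a fresh
 * one listening on the same TCP port. Returns the new descriptor, or
 * -1 with errno set by the call that failed.
 */
static int reopen_listener(int old_fd, unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int fd, saved_errno, on = 1;

    if (old_fd >= 0)
        close(old_fd);          /* drop the possibly wedged socket */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Allow an immediate re-bind of the port that was just released. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        saved_errno = errno;    /* close() must not hide the real error */
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}
```

A daemon would call this from its accept loop only when accept() fails
with an error it considers a sign of a broken socket, for example
listen_fd = reopen_listener(listen_fd, 25, 5).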

Of course any SMTP daemon that didn't need to send itself SIGALRM while
waiting for connections could probably get away without any of these
hacks and simply ignore all errors from accept(). However, there are at
least some error codes which represent fatal errors in the application,
and running round the loop once more might not fix them. I prefer to be
paranoid and ignore only those errors which I know are meaningless to
the proper execution of the daemon.
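
As a rough illustration of that policy, the errno values from the
condition quoted at the top of this message can be collected into one
predicate. The function name accept_errno_is_harmless() is hypothetical,
and the set of values is only the one quoted above, not a complete or
authoritative list for any particular system.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when an accept() failure only means
 * that one pending connection went away, so the daemon can go round its
 * loop again. Anything else is left for the caller to treat as serious.
 */
static bool accept_errno_is_harmless(int err)
{
    if (err == ECONNRESET || err == ECONNREFUSED || err == ENOTCONN)
        return true;
    if (err == ENETUNREACH || err == EHOSTUNREACH || err == ENETDOWN)
        return true;
#ifdef EHOSTDOWN                /* not defined by POSIX; guard it */
    if (err == EHOSTDOWN)
        return true;
#endif
    return false;
}
```

In an accept loop this would be used as: if accept() returns -1 and
accept_errno_is_harmless(errno) is true, continue; otherwise log the
error and take stronger action, such as reopening the socket or exiting.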

-- 
                            Greg A. Woods


+1 416 218-0098      VE3TCP      <gwoods@???>      <robohack!woods>
Planix, Inc. <woods@???>; Secrets of the Weird <woods@???>


--
*** Exim information can be found at http://www.exim.org/ ***