Re: Paniclog contents?

Author: Philip Hazel
Date:  
To: Christoph Lameter, Tom Samplonius
CC: exim-users
Subject: Re: Paniclog contents?
On Fri, 10 Jan 1997, Christoph Lameter wrote:

> I have a bunch of the following messages in my paniclog:
> 1997-01-09 18:48:04 select() or accept() failed: Network is unreachable

...
> 1997-01-09 18:49:19 too many select() or accept() errors: giving up
> 1997-01-09 19:56:24 select() or accept() failed: Connection reset by peer
> 1997-01-09 21:14:23 select() or accept() failed: No route to host
>
> Why would exim panic on these conditions?


Interesting. I never realized you could get those errors from a select()
or accept() call, which is after all listening for *incoming*
connections. "Network is unreachable" sounds like the sort of problem
you get on an outgoing connection.

Now that I think about it, of course, I remember that TCP/IP is a
two-way independently routed protocol, and it is presumably possible for
an incoming packet to arrive and the response to fail in these ways.
Apologies for not thinking of this when I wrote the code - in fact what
I did was to copy the code that is in smail, which (I've just checked)
does exactly the same thing. It contains the following comment:

            /*
             * for some reason, accept() fails badly (and repeatedly)
             * on some systems.  To prevent the paniclog from filling
             * up, exit if this happens too many times.
             */


I don't know if this is still the case on modern operating systems, but I
copied the logic just in case. There is a pause of 5 seconds after each
such error.
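
One way to separate the errors quoted above from real trouble would be to
test errno after a failed accept(). The following is only a sketch: the
function name and the choice of which errors count as harmless are
assumptions for illustration, and it is not what Exim or smail actually
does.

```c
#include <errno.h>

/*
 * Return 1 if an errno value from accept() most likely reflects a problem
 * with one incoming connection rather than with the listening socket.
 * The list of values is an assumption made for this sketch.
 */
int accept_error_is_transient(int err)
{
    switch (err) {
    case EINTR:         /* interrupted by a signal */
    case ECONNABORTED:  /* connection aborted before accept() returned */
    case ECONNRESET:    /* "Connection reset by peer" */
    case ENETUNREACH:   /* "Network is unreachable" */
    case EHOSTUNREACH:  /* "No route to host" */
    case ENETDOWN:      /* network interface is down */
    case EPROTO:        /* protocol error on the new connection */
        return 1;
    default:
        return 0;       /* e.g. EBADF or ENOTSOCK: a local problem */
    }
}
```

Errors for which this returns 1 could be written to the main log, leaving
the panic log for everything else.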

> We deliver messages all over the world. This is a common occurrence.
> Exim needs to panic on real problems not just because a connection dropped!


So do we, but this is about message reception, not delivery. I have
never seen one of those messages before, and I have a cron job that
tells me whenever there is anything in Exim's panic log. Nobody else has
ever remarked on it, so I'm inclined to believe that it isn't all that
common.

> Those messages belong into the main log.


Yes, I think you are probably right, but this one

> 1997-01-09 18:49:19 too many select() or accept() errors: giving up


indicates that there have been 10 successive such errors (with a pause
of 5 seconds after each one) and is presumably more serious. Perhaps I
should increase the 10 to, say, 30 before causing the daemon to bomb out.
Maybe the 5 second delay is rather long. Maybe smail's paranoia is no
longer necessary.

You can find this code around line 874 in src/daemon.c if you want to
experiment with hacking it yourself.
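
For anyone who wants to experiment, the counting logic described above
amounts to roughly the following. This is a sketch and not the code in
src/daemon.c: the names note_accept_result, daemon_action,
MAX_ACCEPT_ERRORS and ACCEPT_ERROR_PAUSE are all made up for illustration,
and it assumes that a successful call resets the count, which is what
"successive" implies.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACCEPT_ERRORS  10   /* successive failures before giving up */
#define ACCEPT_ERROR_PAUSE  5   /* seconds to wait after each failure */

/* What the daemon should do after one select()/accept() attempt. */
enum daemon_action { DAEMON_CONTINUE, DAEMON_PAUSE, DAEMON_GIVE_UP };

/*
 * Update the count of successive failures and choose the next action.
 * A success resets the count, so only an unbroken run of failures can
 * reach the limit.
 */
enum daemon_action note_accept_result(int succeeded, int *error_count)
{
    assert(error_count != NULL);

    if (succeeded) {
        *error_count = 0;
        return DAEMON_CONTINUE;
    }

    if (++*error_count >= MAX_ACCEPT_ERRORS)
        return DAEMON_GIVE_UP;  /* "too many select() or accept() errors" */

    return DAEMON_PAUSE;        /* caller sleeps ACCEPT_ERROR_PAUSE seconds */
}
```

The daemon loop would call this after every attempt, sleep for
ACCEPT_ERROR_PAUSE seconds on DAEMON_PAUSE, and exit on DAEMON_GIVE_UP.
Raising the limit to 30 or shortening the delay is then a matter of
changing the two constants.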

> How can I increase the limit? I want at least 100 simultaneous connections.


smtp_accept_max = 100

> The problem was that Exim after delivering some messages did not again pick up
> listening to the SMTP port. The "exim -bd" process was simply gone for good.


This would be the case after "too many select() or accept() errors".
Perhaps I should remove the paranoia code.

Views?

--
Philip Hazel                   University Computing Service,
ph10@???             New Museums Site, Cambridge CB2 3QG,
P.Hazel@???          England.  Phone: +44 1223 334714