On Tue, Nov 11, 2003 at 04:10:16PM -0500, Scott Courtney wrote:
> I've got some additional information on our performance problem:
>
> 1. We did the test of disconnecting the incoming network from Exim, so that
> it was only processing outbound messages to the next server. This was done
> by physically disconnecting the cable. (In this situation, it happens that
> all inbound messages come into eth1, and all outbound leave via eth0.) The
> connect time instantaneously sped up from c. 45 seconds to c. 2 seconds.
> When we reconnected the cable, the connect time went back to its longer
> delay.
>
> 2. I happened to recall that we have Exim listening on port 10025 in addition
> to port 25, a configuration that happened to be copied from another server
> that uses amavisd-new in that mode. This router doesn't use that second
> port, but I had never gotten around to removing it from the config file.
> Oddly enough, even when port 25 is under heavy load, port 10025 will give
> me an Exim banner within about 2 seconds. Port 10025 is not accessible from
> the Internet due to firewall rules; it can only be accessed from the local
> host. But port 25 has the slow connection even with telnet from localhost.
>
> This really smells like some kind of a problem with Exim forking itself too
> slowly, or with Linux kernel not being able to create sockets fast enough.

I know others have said this looks like an ident lookup problem, but it
still stinks of problems creating the initial TCP connection (i.e., a
kernel problem). Otherwise, wouldn't the delay always be present?
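
One way to tell the two theories apart is to time the TCP connect separately from the wait for the SMTP banner. The sketch below is only an illustration: the function name `time_connect_and_banner` and its timeout are my own, not anything from Exim. On Linux the kernel completes the handshake before the daemon calls accept(), so a slow connect points at the kernel or the listen queue, while a fast connect followed by a slow banner points at something Exim does after accepting (an ident callout, for example).

```python
import socket
import time


def time_connect_and_banner(host, port, timeout=120.0):
    """Return (connect_seconds, banner_seconds, banner_text) for one connection."""
    start = time.monotonic()
    sock = socket.create_connection((host, port), timeout=timeout)
    connected = time.monotonic()  # TCP handshake finished at this point
    try:
        data = b""
        # Read until we have at least one complete line of the greeting.
        while not data.endswith(b"\n"):
            chunk = sock.recv(1024)
            if not chunk:  # peer closed without finishing a line
                break
            data += chunk
        banner_done = time.monotonic()
    finally:
        sock.close()
    text = data.decode("ascii", "replace").strip()
    return connected - start, banner_done - connected, text


# Example (run on the mail host itself):
#   for port in (25, 10025):
#       print(port, time_connect_and_banner("127.0.0.1", port))
```

Running it against both port 25 and port 10025 from localhost under load would show which of the two phases accounts for the 45 seconds.
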
Do you have syncookies turned on? There might be some tweaking to be
done there. It sounds like the connection is being queued, waiting for the
kernel to be able to complete it.
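
To check the current settings, something like the following reads the relevant entries from /proc/sys. It is a minimal sketch: `read_sysctls` and the `SYSCTLS` tuple are names I made up, and I am assuming the three entries listed are the interesting ones for this box. An entry that does not exist on your kernel comes back as None.

```python
from pathlib import Path

# Linux /proc/sys entries that govern how pending TCP connections are queued.
SYSCTLS = (
    "net/ipv4/tcp_syncookies",
    "net/ipv4/tcp_max_syn_backlog",
    "net/core/somaxconn",
)


def read_sysctls(names=SYSCTLS, root="/proc/sys"):
    """Return {name: value string, or None if the entry cannot be read}."""
    values = {}
    for name in names:
        try:
            values[name] = (Path(root) / name).read_text().strip()
        except OSError:
            values[name] = None  # missing or unreadable on this kernel
    return values


# Example:
#   for name, value in read_sysctls().items():
#       print(name, "=", value)
```
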
You really shouldn't have many connections in a SYN_RECV state, but
from your previous mails I recall you said there were quite a few.
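
If you want a number rather than eyeballing netstat output, the states can be counted from /proc/net/tcp, where the fourth column is the state in hex (03 is SYN_RECV, 0A is LISTEN). This is a rough sketch that assumes the usual column layout of that file; `count_states` is a hypothetical helper name.

```python
from collections import Counter

# Hex state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text, local_port=None):
    """Count sockets per TCP state, optionally only for one local port."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        # local_address looks like "0100007F:0019"; the port is hex.
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port is not None and port != local_port:
            continue
        counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


# Example:
#   with open("/proc/net/tcp") as f:
#       print(count_states(f.read(), local_port=25))
```

Comparing the SYN_RECV count for port 25 against the value of tcp_max_syn_backlog would show whether the queue is actually filling up.
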
Are you running out of network buffers? Running out of sockets?
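
For the socket side of that question, /proc/net/sockstat gives a quick summary of how many sockets are in use and how much memory TCP is holding. A small sketch for parsing it follows; `parse_sockstat` is my own name, and it assumes each line has the form "SECTION: key value key value ...".

```python
def parse_sockstat(text):
    """Parse /proc/net/sockstat text into {section: {key: int}}."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        section, _, rest = line.partition(":")
        tokens = rest.split()
        # After the section name, tokens alternate: key value key value ...
        stats[section.strip()] = {
            key: int(value) for key, value in zip(tokens[0::2], tokens[1::2])
        }
    return stats


# Example:
#   with open("/proc/net/sockstat") as f:
#       print(parse_sockstat(f.read()))
```

Sampling this once while the cable is connected and once while it is disconnected would show whether the socket or memory counts climb along with the connect delay.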