Author: Avleen Vig
Date:
To: Scott Courtney
CC: engineers, exim-users
Subject: Re: [Exim] Severe Exim performance problem, with paradoxical system statistics
On Fri, Nov 07, 2003 at 02:19:33PM -0500, Scott Courtney wrote:
> * If you use telnet to connect to the inbound routers, however, you experience
> a delay of 10 to 45 seconds before getting the banner. THIS IS THE BIGGEST
> PART OF THE PROBLEM, in my opinion, and I think this is what's delaying
> the inbound messages.
>
> * Memory usage is consistently moderate, with Linux having about 900MB of RAM
> left over for disk cache.
>
> * netstat shows tons of connections in various states. Most of the connections
> to the LDAP server seem to be in TIME_WAIT status most of the time. Many of
> the inbound SMTP connections are in SYN_RECV status.
>
> * LDAP response times, even when Exim is at its slowest, are sub-second for
> queries that I've carefully selected so that I know the results are *not*
> cached in RAM. For users who have been recently queried (that is, are in
> RAM cache in slapd), the response is instantaneous. I'm convinced that LDAP
> is not the source of the delay.
PING, you might have a winner here.
Sounds like you *might* be running out of sockets, or network buffers.
What is the output of 'netstat -m' ? There are other netstat options you
should look for too.
Look into tuning RedHat (I'd help, but I'm a BSD guy :). See if you can
increase the max sockets and reduce the time sockets take to close.
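Before changing anything, it may help to see what the box is set to right
now. A rough sketch, assuming a Linux kernel that exposes its tunables under
/proc/sys: the list of knobs below is my guess at the relevant ones, not a
recommendation, and read_tunables() is just a made-up helper name. Note that,
as far as I know, tcp_fin_timeout governs FIN_WAIT_2 rather than TIME_WAIT,
so treat it as one thing to look at rather than the answer.

```python
import os

# Assumption: these are the knobs worth looking at first. Paths are relative
# to /proc/sys on Linux; some may be missing on a given kernel.
TUNABLES = [
    "net/ipv4/tcp_max_syn_backlog",  # pending (half-open) connection limit
    "net/core/somaxconn",            # cap on the listen() backlog
    "net/ipv4/tcp_fin_timeout",      # seconds held in FIN_WAIT_2
    "fs/file-max",                   # system-wide open file handle limit
]


def read_tunables(base="/proc/sys", names=TUNABLES):
    """Return {name: value-as-string, or None if it cannot be read}."""
    values = {}
    for name in names:
        path = os.path.join(base, name)
        try:
            with open(path) as fh:
                values[name] = fh.read().strip()
        except OSError:
            # Tunable not present (or not readable) on this system.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_tunables().items():
        print(name.replace("/", "."), "=", value)
```

Run it on one of the inbound routers and compare the numbers with the
connection counts you see in netstat.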
Also, what is the total number of connections (SYN_RCVD, TIME_WAIT,
ESTABLISHED, everything)? What percentage of these are in SYN_RCVD?
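If you don't want to count by hand, something like the following would do
it. This is only a sketch: it assumes the six-column layout that
'netstat -tan' prints on Linux (no -p column), with the state in the last
field, and the function names are my own. Linux spells the state SYN_RECV
where BSD says SYN_RCVD.

```python
import sys
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP sockets per state from `netstat -tan`-style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Socket rows start with "tcp" (or "tcp6") and carry the state
        # in the sixth column; header lines are skipped.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


def state_percentage(counts, state):
    """Percentage of all counted sockets that are in `state`."""
    total = sum(counts.values())
    return 100.0 * counts[state] / total if total else 0.0


if __name__ == "__main__":
    # Usage: netstat -tan | python tcp_states.py
    counts = count_tcp_states(sys.stdin.read())
    for state, n in counts.most_common():
        print("%-12s %6d" % (state, n))
    print("SYN_RECV share: %.1f%%" % state_percentage(counts, "SYN_RECV"))
```

A large share of sockets sitting in SYN_RECV would point at the handshake
not completing, rather than at Exim being slow to accept.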
Solaris has the wonderful ndd -get /dev/tcp tcp_listen_hash
to tell you how many connections are in each of the q0 and q network
connection queues, so you can quickly tell if you are blocking waiting
for the TCP connection to form, or blocking waiting for the application
to accept. I don't know of anything like this for Linux, I wish I did.