Re: [Exim] Severe Exim performance problem, with paradoxical…

Author: Scott Courtney
Date:  
To: exim-users
Subject: Re: [Exim] Severe Exim performance problem, with paradoxical system statistics
Apologies... I neglected to include the list on this reply the first time
I sent it.

On Friday 07 November 2003 15:59, Avleen Vig wrote:
> On Fri, Nov 07, 2003 at 02:19:33PM -0500, Scott Courtney wrote:
> > * LDAP response times, even when Exim is at its slowest, are sub-second
> > for queries that I've carefully selected so that I know the results are
> > *not* cached in RAM. For users who have been recently queried (that is,
> > are in RAM cache in slapd), the response is instantaneous. I'm convinced
> > that LDAP is not the source of the delay.
>
> PING, you might have a winner here.
> Sounds like you *might* be running out of sockets, or network buffer.
> What is the output of 'netstat -m' ? There are other netstat options you
> should look for too.


"netstat -m" isn't a valid command on RedHat, but here's the output of
"netstat -s" instead:

******* SNIP *******
Ip:
    182831819 total packets received
    0 forwarded
    42639 incoming packets discarded
    181256307 incoming packets delivered
    183167326 requests sent out
    152 reassemblies required
    76 packets reassembled ok
Icmp:
    197413 ICMP messages received
    0 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 197405
        echo requests: 4
        echo replies: 4
    197407 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 197403
        echo replies: 4
Tcp:
    6923157 active connections openings
    6528040 passive connection openings
    7430212 failed connection attempts
    80905 connection resets received
    162 connections established
    154395426 segments received
    154652238 segments send out
    1072200 segments retransmited
    14536 bad segments received.
    2788857 resets sent
Udp:
    27992248 packets received
    197394 packets to unknown port received.
    42644 packet receive errors
    28317609 packets sent
TcpExt:
    3723 invalid SYN cookies received
    7537 resets received for embryonic SYN_RECV sockets
    4981 packets pruned from receive queue because of socket buffer overrun
    1 ICMP packets dropped because socket was locked
    ArpFilter: 0
    5456073 TCP sockets finished time wait in fast timer
    92693 TCP sockets finished time wait in slow timer
    1 passive connections rejected because of time stamp
    1848 packets rejects in established connections because of timestamp
    7514689 delayed acks sent
    1042 delayed acks further delayed because of locked socket
    Quick ack mode was activated 89519 times
    1501915 times the listen queue of a socket overflowed
    1501915 SYNs to LISTEN sockets ignored
    26866984 packets directly queued to recvmsg prequeue.
    19080474 packets directly received from backlog
    3618059509 packets directly received from prequeue
    1026 packets dropped from prequeue
    27790393 packets header predicted
    16524491 packets header predicted and directly queued to user
    TCPPureAcks: 22376447
    TCPHPAcks: 37256076
    TCPRenoRecovery: 3
    TCPSackRecovery: 9
    TCPSACKReneging: 0
    TCPFACKReorder: 0
    TCPSACKReorder: 0
    TCPRenoReorder: 0
    TCPTSReorder: 0
    TCPFullUndo: 0
    TCPPartialUndo: 0
    TCPDSACKUndo: 0
    TCPLossUndo: 184803
    TCPLoss: 0
    TCPLostRetransmit: 0
    TCPRenoFailures: 1008
    TCPSackFailures: 20174
    TCPLossFailures: 1760
    TCPFastRetrans: 13
    TCPForwardRetrans: 0
    TCPSlowStartRetrans: 5092
    TCPTimeouts: 662577
    TCPRenoRecoveryFail: 0
    TCPSackRecoveryFail: 1
    TCPSchedulerFailed: 204355
    TCPRcvCollapsed: 140832
    TCPDSACKOldSent: 78318
    TCPDSACKOfoSent: 1389
    TCPDSACKRecv: 394
    TCPDSACKOfoRecv: 14
    TCPAbortOnSyn: 0
    TCPAbortOnData: 2039278
    TCPAbortOnClose: 18528
    TCPAbortOnMemory: 0
    TCPAbortOnTimeout: 41953
    TCPAbortOnLinger: 0
    TCPAbortFailed: 0
    TCPMemoryPressures: 0
***** SNIP **********



> Look into tuning RedHat (I'd help, but I'm a BSD guy :). See if you can
> increase the max sockets and reduce the time sockets take to close.


I haven't found any way to increase the maximum socket count in Linux, but I
agree that this could be an issue and I'm still looking for such an option.

We did in fact change two sysctl parameters to decrease the time it takes to
close a socket:

           tcp_rfc1337 = 1
           tcp_tw_recycle = 1


These two changes seemed to help, but didn't by any means eliminate the
problem.
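
In case it helps anyone else, something along these lines is how those can be
set (the sysctl names and /proc paths below are the usual ones for 2.4-era
kernels; check your own system before relying on them):

           # one-off, takes effect immediately
           sysctl -w net.ipv4.tcp_rfc1337=1
           sysctl -w net.ipv4.tcp_tw_recycle=1

           # or make them persistent across reboots via /etc/sysctl.conf
           net.ipv4.tcp_rfc1337 = 1
           net.ipv4.tcp_tw_recycle = 1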

>
> Also, what is the total number of connections (SYN_RCVD, TIME_WAIT,
> ESTABLISHED, everything)? What percentage of these are in SYN_RCVD?
> Solaris has the wonderful ndd -get /dev/tcp tcp_listen_hash
> to tell you how many connections are in each of the q0 and q network
> connection queues, so you can quickly tell if you are blocking waiting
> for the TCP connection to form, or blocking waiting for the application
> to accept. I don't know of anything like this for Linux, I wish I did.


Another good question, and one we've been looking at as well. Before the
changes to the TCP parms in the kernel (see above), there were a large number
of TIME_WAIT and SYN_RECV states in the output of "netstat -pn". Now we have
a smaller proportion of these states as compared to the ESTABLISHED state,
where something useful is (presumably) happening.
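
For anyone who wants to watch the same thing, a rough way to tally the states
(assuming the stock netstat and awk on the box) is:

           # count TCP connections grouped by state
           netstat -ant | awk 'NR > 2 {print $6}' | sort | uniq -c | sort -rn

which makes the SYN_RECV / TIME_WAIT / ESTABLISHED proportions easy to see at
a glance.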

But the time between "telnet localhost 25" and "220 Exim....blah blah" is
still anywhere from 15 seconds to infinity (timeout error).
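
A crude way to put a number on that delay (assuming netcat is installed; the
60-second -w is just an arbitrary cap) is something like:

           # time how long the SMTP greeting takes to arrive
           time sh -c 'echo QUIT | nc -w 60 localhost 25 > /dev/null'

That times the whole exchange, but since the QUIT is answered as soon as the
banner finally appears, it's a reasonable proxy for the greeting delay.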

Scott

--
-----------------------+------------------------------------------------------
Scott Courtney         | "I don't mind Microsoft making money. I mind them
courtney@???       | having a bad operating system."    -- Linus Torvalds
http://4th.com/        | ("The Rebel Code," NY Times, 21 February 1999)
                       | PGP Public Key at http://4th.com/keys/courtney.pubkey