Re: [Exim] Severe Exim performance problem, with paradoxical…

Author: Scott Courtney
Date:  
To: exim-users
Subject: Re: [Exim] Severe Exim performance problem, with paradoxical system statistics
On Friday 07 November 2003 23:54, Avleen Vig wrote:
> Remove one of the hosts from accepting mail (firewall it, remove the MX
> record, whatever), and THEN try the test.
> If you don't have a long wait, then you are more likely to be running in
> to performance problems with load (application or system).


The long wait appears to be load sensitive, and in particular relates to the
number of currently-active inbound SMTP sessions on that host. During light
periods, the banner appears in just a second or two, but during heavy usage
periods, it may take 45 seconds or longer. Active outbound sessions don't
seem to have much impact.
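
For anyone who wants to reproduce the measurement, here is a rough sketch in Python of how the banner delay could be timed separately from the TCP connect. The function name and the host "mail.example.com" are placeholders of mine, not anything from our setup, and it only reads the first line of the greeting.

```python
import socket
import time


def time_smtp_banner(host, port=25, timeout=120.0):
    """Return (connect_seconds, banner_seconds, first_banner_line).

    connect_seconds is the time to complete the TCP handshake;
    banner_seconds is the extra wait until the first greeting line arrives.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.monotonic()
        data = b""
        # Read until one full line has arrived or the peer closes.
        while b"\n" not in data:
            chunk = sock.recv(1024)
            if not chunk:
                break
            data += chunk
        banner_at = time.monotonic()
    first_line = data.split(b"\n", 1)[0].decode("ascii", "replace").strip()
    return connected - start, banner_at - connected, first_line


if __name__ == "__main__":
    # Hypothetical host name; substitute one of the inbound relays.
    connect_s, banner_s, line = time_smtp_banner("mail.example.com")
    print(f"connect: {connect_s:.2f}s  banner: {banner_s:.2f}s  {line}")
```

Run in a loop during light and heavy periods, this should show whether the wait is in the connect itself or between the connect and the 220 greeting.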

> If you do still have to wait, it might be a configuration problem. What
> I would do at that point is tcpdump all the traffic on that interface,
> and see what comes out - you might find that something is blocking on a
> DNS lookup somewhere even though you think you've disabled all the
> required lookups ;-)


We did in fact find some problems with the customer's DNS, and got them to fix
the problems. We now have a system where:

* Every Exim host is able to forward and reverse resolve every other Exim host
on the customer's network.
* The Exim hosts that accept inbound mail are able to forward and reverse
resolve addresses from the outside, as long as the authoritative DNS has the
correct entries (which the DNS for many spammers' hosts, of course, does not).
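
The two checks above can be sketched roughly as follows in Python. This is only an illustration of the forward/reverse test, not the tool we actually used; the function names and the host name in the usage lines are made up, and note that socket.gethostbyaddr goes through the system resolver, so /etc/hosts entries can mask what DNS itself says.

```python
import socket


def forward_addresses(name):
    """Return the set of addresses that name resolves to."""
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def check_host(name):
    """Map each address of name to (ptr_name, confirmed).

    confirmed is True when the PTR name resolves forward to the
    same address again. An unresolvable name gives an empty dict.
    """
    results = {}
    try:
        addrs = forward_addresses(name)
    except socket.gaierror:
        return results
    for addr in sorted(addrs):
        try:
            ptr_name = socket.gethostbyaddr(addr)[0]
            confirmed = addr in forward_addresses(ptr_name)
        except (socket.herror, socket.gaierror):
            ptr_name, confirmed = None, False
        results[addr] = (ptr_name, confirmed)
    return results


if __name__ == "__main__":
    # Hypothetical host name; list the real Exim hosts here.
    for host in ["mx1.example.com"]:
        print(host, check_host(host))
```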

Unfortunately, this hasn't helped the performance much. I did a "tcpdump" of
the network, and it appears that the DNS queries being sent out are mostly
for the "smarthost" that we have configured for handling bounces. These are
relatively infrequent, and the replies are coming back sub-second.

I've got another theory that I'm going to test tomorrow. I think that maybe
the DNS queries are somehow going out first on the wrong interface (these
hosts have multiple network connections on different subnets). I'm going
to look into that, because that could indeed lead to timeouts.
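
One quick way to check this theory, sketched in Python under the assumption that the resolvers are the ones listed in /etc/resolv.conf: connecting a UDP socket sends no packet, but it makes the kernel pick a route, so the socket's local address shows which interface address would be the source of a query. The function names are mine, and this only shows the routing decision, not what a firewall does with the packets afterwards.

```python
import socket


def parse_nameservers(resolv_conf_text):
    """Return the addresses on 'nameserver' lines of resolv.conf text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def source_address_for(server, port=53):
    """Return the local address the kernel would use to reach server."""
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket only fixes the route; nothing is sent.
        sock.connect((server, port))
        return sock.getsockname()[0]


if __name__ == "__main__":
    with open("/etc/resolv.conf") as handle:
        for server in parse_nameservers(handle.read()):
            try:
                print(server, "via", source_address_for(server))
            except OSError as exc:
                print(server, "unreachable:", exc)
```

If the address printed belongs to a subnet that cannot actually reach the nameserver, the first query would have to time out before a working path is tried.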

In the meantime, I thought I'd post today's results to see if anyone's got
other suggestions. The DNS changes have helped, but not as much as I had
hoped. It still seems that we have a delay happening at the time of opening
the socket on port 25.

Scott

--
-----------------------+------------------------------------------------------
Scott Courtney         | "I don't mind Microsoft making money. I mind them
courtney@???       | having a bad operating system."    -- Linus Torvalds
http://4th.com/        | ("The Rebel Code," NY Times, 21 February 1999)
                       | PGP Public Key at http://4th.com/keys/courtney.pubkey