Re: [Exim] error response rate limiting vs. overloaded syste…

Author: Exim Users Mailing List
To: Suresh Ramasubramanian
CC: Exim Users Mailing List
Old-Topics: Re: [Exim] Brain Dead ISP's?
Subject: Re: [Exim] error response rate limiting vs. overloaded systems
[[ I've finally remembered to update the subject line! ;-) ]]

[ On Sunday, July 6, 2003 at 07:01:13 (+0530), Suresh Ramasubramanian wrote: ]
> Subject: Re: [Exim] Brain Dead ISP's?
>
> Greg A. Woods wrote:
>
> > I've yet to see a system that can handle normal peak loads without
> > slowing down which can't also handle at least a 60-second error response
> > delay for all SMTP errors. Furthermore note that the tuning needed to
>
> What about an already heavily overloaded system?


Delaying error responses by another 60 seconds still won't hurt that
much more on any OS with decent networking and VM subsystems. There's
no CPU and no I/O required to hold a connection open -- only a callout
slot, a process table slot, a socket, maybe an mbuf, and the few
unsharable pages needed for the process user-land data and stack -- only
a few tens of KB of RAM at maximum (unless the MTA is way over-bloated
and has allocated a huge amount of RAM before it got this far). The
majority of all that RAM can be paged out without penalty if necessary
since 60 seconds is a lifetime when it comes to virtual memory and the
pager I/O is a smaller price to pay than a sleeping process holding the
RAM hostage (but of course the VM policy is defined by the OS, not the
mailer, provided the mailer hasn't called mlock(2) :-).

Also, if part of the "load" is caused by clients reconnecting too soon
after errors (i.e. within 60 seconds), then it'll actually help, and
perhaps by a great deal, depending on the severity of the problem.
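To put rough numbers on that, consider a hypothetical client that makes one connection at a time and reconnects the moment each rejected session ends. It can make at most 3600 / (session time + delay) attempts per hour. The 1-second session time below is an assumption for illustration only. Clients that open many connections in parallel are not bounded this way.

```python
def max_attempts_per_hour(session_seconds: float, error_delay: float) -> float:
    """Upper bound on hourly connections from one client that reconnects
    immediately after each rejected session and never runs two at once."""
    if session_seconds + error_delay <= 0:
        raise ValueError("each attempt must take some time")
    return 3600.0 / (session_seconds + error_delay)


# Assumed 1-second rejected session, with and without a 60-second pause.
without_pause = max_attempts_per_hour(1.0, 0.0)  # 3600.0
with_pause = max_attempts_per_hour(1.0, 60.0)    # about 59.0
print(f"{without_pause:.0f} vs {with_pause:.1f} attempts/hour")
```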

The busiest machine I've monitored closely enough lately to discuss this
issue with any certainty is a little old P-II/300MHz with 512MB RAM and a
few not-terribly-fast SCSI disks. It receives on the order of 100,000
connections a day, rejecting about 75% of them, the rest resulting in
successful messages with about a 1:1.1 delivery ratio. It accepts a
maximum of 200 connections before it returns 421 responses, it starts
queuing without delivery at 170 connections, and it rarely reaches the
200 connection limit. It pauses 60 seconds for every one of those
~75,000 rejects per day, and often several times per connection, since
on average about 5,000 connections are from clients that won't take the
first 5xx at HELO and go on to try other commands, with at least 1,000
of them making it all the way to the DATA command (and presumably
spewing their message).
This machine also runs Cyrus IMAPd for POP & IMAP services for about
12,500 accounts, and runs a web server for user home pages, serving out
at least 1 GB of traffic daily (there are about a million entries in
yesterday's access_log). I figure this machine is running at about 75%
capacity -- i.e. it's good for at least another 5,000 accounts (and
that's without adding any more RAM and maybe no more disk either). Note
too that most of these customers are on high-speed connections, and
they leave their bloody PCs running with their MUAs making POP
connections every five minutes, and lots of these are households with
multiple mailboxes and so that's as many as a half dozen almost
simultaneous POP connections from about, say, 5,000 PCs every five
minutes. I'm far more concerned about POP clients than I am about SMTP
clients. :-)
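Below is a hypothetical Python model of the connection-count policy described above. It uses the 170 and 200 thresholds from the text. Whether each threshold applies at exactly 170 and 200 open connections or one more is an assumption. This models the policy only and is not Smail's or Exim's configuration.

The second function applies Little's law (mean concurrency = arrival rate × time in the system). It estimates how many connections are sitting in the 60-second pause at any moment: 75,000 × 60 / 86,400 ≈ 52. That figure understates the real load somewhat, because clients that keep trying after the first 5xx incur more than one pause, and because peak traffic exceeds the daily average.

```python
from enum import Enum


class Admission(Enum):
    DELIVER = "accept and deliver immediately"
    QUEUE_ONLY = "accept but queue without delivery"
    REFUSE_421 = "refuse with a 421 response"


# Thresholds from the machine described above.
QUEUE_ONLY_AT = 170
REFUSE_AT = 200


def admit(active_connections: int) -> Admission:
    """Classify a new SMTP connection given how many are already open."""
    if active_connections >= REFUSE_AT:
        return Admission.REFUSE_421
    if active_connections >= QUEUE_ONLY_AT:
        return Admission.QUEUE_ONLY
    return Admission.DELIVER


def mean_connections_in_pause(rejects_per_day: float, delay_seconds: float) -> float:
    """Little's law: mean concurrency = arrival rate * time spent pausing."""
    return rejects_per_day / 86_400 * delay_seconds


print(admit(150).value, "|", admit(185).value, "|", admit(200).value)
print(f"{mean_connections_in_pause(75_000, 60):.1f} connections pausing on average")
```

Even with this rough estimate, the pauses alone account for roughly a quarter of the 200-connection cap, which is consistent with the limit being reached only rarely.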

Here's the clincher though. Over the past few months there were a
couple of instances where the mailer on this machine wasn't completely
implementing the 60-second pause. When abusive clients exploited this
problem the machine was brought to its knees and firewall rules had to
be installed to block the offending client(s). Indeed, since ironing out
the last of these bugs, this machine has breezed along without any
SMTP-related problems whatsoever. (Which reminds me -- I should remove
those firewall rules again when I have a chance to watch it closely. :-)

I would say error response rate limiting is _always_ a good idea, even
with a mailer like Postfix that has a much lighter-weight SMTP daemon,
but especially for monolithic mailers of the likes of Exim (and Sendmail
and of course Smail, which the machine I describe above is actually
running :-).

--
                                Greg A. Woods


+1 416 218-0098;            <g.a.woods@???>;           <woods@???>
Planix, Inc. <woods@???>; VE3TCP; Secrets of the Weird <woods@???>