Re: [Exim] error response rate limiting vs. overloaded syste…

Author: Avleen Vig
Date:  
To: Exim Users Mailing List
Subject: Re: [Exim] error response rate limiting vs. overloaded systems
On Sun, Jul 06, 2003 at 04:09:20AM -0400, Greg A. Woods wrote:
> > I speak from the viewpoint of ISP's, because this is where most of my
> > experience is based.
> > When a spammer connects to your MX, he does not open one connection,
> > pipeline as many messages as he can, close it, and then open another
> > connection.
>
> Perhaps, but also irrelevant.


Not really. As I state later, your application can only open a set
number of connections. The number is set by the administrator based on
the load the server is able to handle.

Your argument boils down to "Keep a connection open for an abnormally
long period of time before returning an error, and just leave the
connection idle so it doesn't take up resources".

> The point here is to make sure that any client you've sent an error
> response to can't immediately re-connect and do the same thing again.
> It really doesn't matter why you've rejected the transaction -- it could
> be the recipient is unknown, or it could be because you've used some
> blacklist to reject UCE or whatever. You've already accepted the
> connection. You've already got a process running to process it. Now
> all you do is make that process go idle for what in system terms is a
> very long extended period of time and during that time you let your
> system re-use whatever of that process' resources it knows how to reuse
> until it's time to finally send the last line and wait for the client to
> disconnect. The added cost of doing this is negligible.


But that is exactly my point.
You can only accept a finite number of connections. Regardless of the
resources I have, if I tell Exim to have 'smtp_accept_max' at 10, Exim
cannot have more than 10 concurrent connections, *regardless* of the
state they are in (transmit, receive, or idle). That's it. End.
And if you end up with 10 connections all waiting for their error
replies so that they can disconnect, you're effectively stopping all new
connections.
The added cost of this is significant:
If E - M > 1 you're in trouble
Where E is the number of messages generating errors, counted over a
period of time equal to the length of your rate limit, and M is the
maximum number of connections you can open.

Thus, if your rate limit is 10s, and in 10s you receive 100 connections
which receive one error or another, and the maximum simultaneous
connections you accept is 5, you're choked.
(these numbers are of course hypothetical).
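
To put rough numbers on that, here is a small Python sketch. It is my
own illustration, not anything Exim does: it estimates the average
number of slots held by delayed error replies as arrival rate times
hold time (Little's law, which I am substituting for the looser
"E - M" rule above). The function names and the assumption that every
errored connection is held for the full delay are hypothetical.

```python
def slots_needed(errors_in_window: int, window_s: float, delay_s: float) -> float:
    """Average number of slots tied up by connections waiting on a delayed
    error reply: arrival rate (errors/second) times hold time (seconds)."""
    arrival_rate = errors_in_window / window_s
    return arrival_rate * delay_s


def is_choked(errors_in_window: int, max_connections: int,
              window_s: float, delay_s: float) -> bool:
    """True when delayed error replies alone would fill every slot."""
    return slots_needed(errors_in_window, window_s, delay_s) >= max_connections


# The hypothetical figures from above: 100 errored connections in 10s,
# a 10s delay before each error reply, and at most 5 connections.
print(slots_needed(100, 10.0, 10.0))
print(is_choked(100, 5, 10.0, 10.0))
```

With those figures the delayed replies alone would want about 100
slots against the 5 available, which is the "choked" case.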

You *cannot* accept more than your predefined number of connections. And
if one of those connections gets rate limited, it still takes up a slot
that a new connection cannot.

This is all true unless you're talking about Exim dynamically being able
to adjust the maximum number of connections it can take (e.g., increasing
by one each time an old connection is rate limited), but that has its
own serious issues.

> > Spammers appreciate that most ISP's, and even a number of businesses,
> > now employ the use of connection rate limiting (in one form or another).
> > He finds limits at which he can slam your servers from one of many
> > drones around the world until either his list of recipients is
> > exhausted, or your servers block him. He will open up as many
> > connections as possible.
>
> While this may be true of some spammers it is not generally true and it
> is especially not true of broken client software which has been the most
> noticeable cause of problems in my experience with both ISPs and
> corporate networks.


It most certainly is true of spammers from my vantage point.

> Nope, you've failed to note that adding capacity doesn't mean you also
> give it away under the control of third parties. The goal of DoS
> protection mechanisms is to have control over your resources and to not
> give that control to third parties. Like I said: Pay a little, save a
> lot.


No, by adding more capacity you're pushing back the problem by being
able to accept more connections, until the volume of mails grows again.
Of course you should add more horsepower when your farm actually cannot
accept connections, but artificially forcing it to is not a good idea
:-)

> > Spammers, through the mass availability of open proxies and relays,
> > compromised clients, and other things we wish didn't exist, have far
> > more resources to send our mail, than any one organisation does to
> > receive (to the best of my knowledge).
>
> Sure, but that's more or less irrelevant. This part of the problem is
> solved by identification and denial of authorisation -- i.e. reject
> their transactions before they can further impact your limited
> resources.


I agree wholeheartedly.

> Open proxies and open relays are relatively easily
> mechanically identified in a completely impartial manner and there are
> several well maintained lists of them. The important thing here is to
> realize that when rejecting their connections you really must employ
> error response rate limiting. This is because many open relays, and
> most open proxies, are prone to exactly the kind of problem that we
> started down this thread with -- they will unwittingly hammer on your
> server if you send them a 5xx response that they don't honour. This
> only stands to reason because any server that's an open relay is liable
> to have other implementation bugs as well.


Right, and they'll open up as many simultaneous connections as the
resources on their side permit. If you rate limit one connection, it's
completely possible that the slowdown in traffic is sufficient for
another thread to fire up. And so all your connections will get used up,
regardless of how much horsepower you may or may not have left.
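
A toy simulation of that scenario, as a sketch only. It ticks once per
second, lets the abusive client grab free slots first, and holds each
of its connections for the full error delay. All names and the
scheduling order are my assumptions for illustration; a real server
interleaves arrivals differently.

```python
def simulate(max_slots: int, delay_s: int, abusive_per_s: int,
             legit_per_s: int, duration_s: int) -> tuple[int, int]:
    """Return (legit_accepted, legit_refused) over duration_s one-second ticks."""
    release_times: list[int] = []  # tick at which each held slot frees up
    accepted = refused = 0
    for now in range(duration_s):
        # Free every slot whose hold time has expired.
        release_times = [t for t in release_times if t > now]
        # Assumption: the abusive client gets in first and each of its
        # connections sits idle for delay_s before the error is sent.
        for _ in range(abusive_per_s):
            if len(release_times) < max_slots:
                release_times.append(now + delay_s)
        # Assumption: a legitimate connection is served within one tick.
        for _ in range(legit_per_s):
            if len(release_times) < max_slots:
                release_times.append(now + 1)
                accepted += 1
            else:
                refused += 1
    return accepted, refused


# 10 slots, 30s error delay, 5 abusive connection attempts per second,
# 1 legitimate connection per second, for one minute.
print(simulate(10, 30, 5, 1, 60))
# Same server with no abusive client, for comparison.
print(simulate(10, 30, 0, 1, 60))
```

Under these assumptions the slots fill within two seconds and stay
full, because every slot that is released is immediately retaken by
the abusive client.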

> > So the obvious question on the minds of some readers will be "Well, why
> > not set Exim up to accept 20 connections? Then you can last twice as
> > long!".
>
> If you've actually implemented error response rate limiting fully and
> properly as I have then you'll soon realize that this is almost exactly
> the right solution, though not quite right -- maybe 12 or 15 would be
> the right number given your hypothetical scenario (and assuming you have
> implemented some ACLs which might actually trigger such rate limiting).


Please explain to me then, how error response rate limiting will remedy
the above situation.

> I think you need to go learn more about tuning multi-user systems and
> servers.


Hmmm :-)

> First off, that's really not how most admins tune their servers -- at
> least none who are experienced at tuning servers will do this. The
> problem becomes quite apparent as soon as you've encountered critical
> failures caused by making such a mistake.


Different admins configure servers very differently.

> Secondly if you've paid any attention at all to how processes holding
> idle connections behave in a busy system you'll realize that the number
> of idle connections you can manage is orders of magnitude higher than
> the number of non-idle connections you can manage.


Yes, obviously this is true. But when the number of total connections
you can manage is fixed, this doesn't matter a damn!

> > > Have you got actual stats for any such machine or group of machines?
> > What specifically are you looking for? I have many stats :-)
>
> Well we need to start by identifying the ratio of normal connections
> vs. those which trigger some kind of error response. Then we need to
> look at how many of those errors appear to have been ignored by the
> client -- indicated by a re-connect attempt within the same period
> when an error response rate limiting delay would have prevented any
> normal SMTP client from re-connecting.


Specify a period... 30 seconds? I need something to start with. I'll
assume 30 seconds.
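
For what it's worth, the counting being asked for could be sketched
like this in Python. The record layout (timestamp, host, error flag)
is hypothetical; real Exim logs would have to be parsed into it first,
and that parsing is not shown. An error counts as "ignored" here if
the same host connects again within the window.

```python
from typing import Iterable, NamedTuple


class Conn(NamedTuple):
    ts: float    # connection time in seconds
    host: str    # client address
    error: bool  # True if the session ended in an error response


def reconnect_stats(conns: Iterable[Conn],
                    window_s: float = 30.0) -> tuple[int, int, int]:
    """Return (normal, errored, ignored) connection counts."""
    normal = errored = ignored = 0
    pending: dict[str, float] = {}  # host -> time of its last errored connection
    for c in sorted(conns, key=lambda c: c.ts):
        prev = pending.pop(c.host, None)
        # A re-connect inside the window means the earlier error was ignored.
        if prev is not None and c.ts - prev <= window_s:
            ignored += 1
        if c.error:
            errored += 1
            pending[c.host] = c.ts
        else:
            normal += 1
    return normal, errored, ignored


# Made-up sample data using documentation address ranges.
sample = [
    Conn(0, "192.0.2.1", True),
    Conn(5, "192.0.2.1", True),      # re-connect 5s after an error
    Conn(10, "198.51.100.7", False),
    Conn(12, "203.0.113.9", True),
    Conn(100, "192.0.2.1", False),   # outside the 30s window
]
print(reconnect_stats(sample))
```

The ratio of normal to errored connections and the share of errors
that were ignored both fall out of the three counts.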