Author: Avleen Vig
Date:
To: Exim Users Mailing List
Subject: Re: [Exim] error response rate limiting vs. overloaded systems
On Sat, Jul 05, 2003 at 11:20:28PM -0400, Greg A. Woods wrote:
> > and works on a small scale, but on the scale of
> > accepting 300+ new connections, per machine, per second, they just don't
> > work unless you at least quadruple your horsepower.
>
> I don't think you understand queuing and operational theory very well.
> I'm no expert either, but I've had enough basic training and lots of
> real-world experience in these issues and I can assure you that you
> don't know what you're talking about.
I think I do, and I'll go on to explain why - something I admit I should
have done in my last mail.
> The actual factor depends on how many of those connections you're
> ultimately rejecting with a 4xx or 5xx response.
> I'll bet you can get by with only about 25% more system capacity, at
> most, even if 99% of those connections result in successful message
> deliveries (since of course 99% of the time there would be no delay
> introduced by error rate response limiting).
Let me at least make sure we're on the same page:
We're talking about rate limiting responses to clients who would receive
an error. This conversation started out about "bad" clients, but
seems to have moved on to errors in general.
By far the chief culprits behind these errors are spammers. They
trigger more of them than anyone else: "user unknown", "relay denied",
"blocked because of XYZ BL", etc.
I refer you now to RFC 1925. Humorous as its release date is, point 9
is very relevant here:
(9) For all resources, whatever it is, you need more.
I don't know how accurate your "99%" figure is. I'll be able to tell
soon, once my statistics gathering is complete, in about a week or two.
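Whether that figure really is 99% matters a great deal. One rough way to see why is Little's law: the mean number of connections held open equals the arrival rate times the mean time each connection holds a slot. The sketch below is purely illustrative; it assumes a 2-second normal SMTP session, a 60-second delay on every error response, and the 300 connections per second mentioned above. The function name and every parameter value are hypothetical, and the model ignores everything except slot occupancy.

```python
def slots_needed(conn_per_sec: float, session_secs: float,
                 error_fraction: float, error_delay_secs: float) -> float:
    """Mean simultaneous connections by Little's law (L = lambda * W).

    conn_per_sec     -- arrival rate of new SMTP connections (lambda)
    session_secs     -- assumed mean length of a session with no delay
    error_fraction   -- share of connections that draw an error response
    error_delay_secs -- assumed extra time each error response is held back
    """
    # W: mean time a connection occupies a slot, including any delay.
    mean_hold_secs = session_secs + error_fraction * error_delay_secs
    return conn_per_sec * mean_hold_secs


# Compare a few error fractions against the no-delay baseline.
baseline = slots_needed(300, 2, 0.0, 60)
for f in (0.01, 0.10, 0.50):
    need = slots_needed(300, 2, f, 60)
    print(f"{f:.0%} errors: {need:.0f} slots ({need / baseline:.1f}x baseline)")
```

Under those assumptions the baseline is 600 slots; 1% errors needs about 780 (1.3x), 10% about 2400 (4x), and 50% about 9600 (16x). The extra capacity required grows linearly with the error fraction, so the answer depends almost entirely on what that fraction really is.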
I speak from the viewpoint of ISPs, because that is where most of my
experience lies.
When a spammer connects to your MX, he does not open one connection,
pipeline as many messages as he can, close it, and then open another
connection.
Rather, he goes through a phase of trial and error.
Spammers appreciate that most ISPs, and even a number of businesses,
now use connection rate limiting (in one form or another).
He finds the limits at which he can slam your servers from each of his
many drones around the world, and keeps going until either his list of
recipients is exhausted or your servers block him. He will open as many
connections as possible.
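To see why per-source connection rate limiting alone does not stop this, here is a minimal sketch of a fixed-window, per-host limiter. It is hypothetical (not Exim's implementation or configuration), and the limit, window length, and drone counts are made up for illustration.

```python
from collections import defaultdict


class PerHostConnectionLimiter:
    """Hypothetical fixed-window limiter: at most `limit` new connections
    per source host in each `window_secs`-long window."""

    def __init__(self, limit: int, window_secs: int) -> None:
        self.limit = limit
        self.window_secs = window_secs
        # (host, window number) -> connections accepted in that window.
        # Old windows are never purged: fine for a sketch, not for production.
        self.counts = defaultdict(int)

    def allow(self, host: str, now: float) -> bool:
        key = (host, int(now // self.window_secs))
        if self.counts[key] >= self.limit:
            return False  # this host has used up its allowance for the window
        self.counts[key] += 1
        return True


def accepted_from_botnet(limiter: PerHostConnectionLimiter, drones: int,
                         attempts_per_drone: int, now: float = 0.0) -> int:
    """Count connections accepted when every drone tries `attempts_per_drone` times."""
    accepted = 0
    for d in range(drones):
        host = f"drone-{d}"  # hypothetical source identifiers
        for _ in range(attempts_per_drone):
            if limiter.allow(host, now):
                accepted += 1
    return accepted
```

With a limit of 5 connections per host per minute, one host hammering away gets only 5 through, but 200 drones that each stay exactly at the limit get 1000 through in the same minute. The per-host limit caps each drone, not the total.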
So, what happens when you add 25% more capacity? He opens more
connections to fill it. Yup, you're right back where you started.
Spammers, through the mass availability of open proxies and relays,
compromised clients, and other things we wish didn't exist, have far
more resources for sending us mail than any one organisation has for
receiving it (to the best of my knowledge).
Now, coming back to your point about rate limiting:
An application, any application, can accept only so many
connections. Let's say, for the sake of argument, that Exim is set up to
accept 10 simultaneous connections.
So 10 clients connect to the Exim server. New connections now have to
wait until an older connection drops.
Unfortunately, one of the original 10 connections is getting rate
limited!
So for the next 60 seconds, only 9 slots are available for other clients.
Now, as 9 new clients connect, one of those has to be rate limited for
60 seconds.
Now only 8 slots are available.
And the cycle continues until the server cannot accept any more
connections.
So the obvious question on the minds of some readers will be "Well, why
not set Exim up to accept 20 connections? Then you can last twice as
long!".
The answer, quite simply, is that most admins will set their servers up
to accept the maximum number of connections they can handle.
After that, they simply cannot accept more connections.
If you add capacity, you're back where you started.
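The slot-draining cycle described above can be sketched as a toy simulation. It assumes, as in the example, that free slots are refilled immediately every second, that exactly one of the new clients draws an error and is held for 60 seconds, and that every other client finishes within the second. The function and its parameters are hypothetical; this is not how Exim actually schedules connections.

```python
def simulate_slots(max_slots: int, delay_secs: int, seconds: int) -> list[int]:
    """Return the number of free connection slots at the start of each second."""
    releases: list[int] = []  # time at which each rate-limited client frees its slot
    free_per_second: list[int] = []
    for t in range(seconds):
        # Rate-limited clients whose delay has expired release their slots.
        releases = [r for r in releases if r > t]
        free = max_slots - len(releases)
        free_per_second.append(free)
        if free > 0:
            # New clients take every free slot. Assumption: exactly one of
            # them gets an error and is held for delay_secs; the rest finish
            # within this second and give their slots back.
            releases.append(t + delay_secs)
    return free_per_second


print(simulate_slots(max_slots=10, delay_secs=60, seconds=12))
print(simulate_slots(max_slots=20, delay_secs=60, seconds=30).index(0))
```

With 10 slots and a 60-second delay, free slots fall 10, 9, 8, ... and reach zero at second 10; with 20 slots they reach zero at second 20. In this model, doubling the slots only doubles the time until the server is full of delayed clients, after which it never has more than one free slot at a time.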
My apologies, Mr Woods (really, I am; I had a teacher called Mr Woods whom I
liked very much), but at the scales at which I and several others
work, rate limiting in this fashion does not work. It's simply not a
black-and-white matter.
> Have you got actual stats for any such machine or group of machines?
What specifically are you looking for? I have many stats :-)