Re: [Exim] error response rate limiting vs. overloaded syste…

Author: Exim Users Mailing List
Date:  
To: Avleen Vig
CC: Exim Users Mailing List
Subject: Re: [Exim] error response rate limiting vs. overloaded systems
[ On Sunday, July 6, 2003 at 07:14:51 (-0700), Avleen Vig wrote: ]
> Your argument boils down to "Keep a connection open for an abnormally
> long period of time before returning an error, and just leave the
> connection idle so it doesn't take up resources".


You've missed the key point: "and so that the errant client can't come
right back at you with another attempt to cause the same error."
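A minimal sketch of the delayed-error idea in Python's asyncio, assuming a hypothetical `send` coroutine that writes one SMTP reply line to the client. The 550 code, the reply text and the 0.1 s delay are illustrative only and do not describe how Exim implements its delays. Each held client is just a suspended coroutine waiting on a timer, so several can be held at once for roughly the cost of one delay.

```python
import asyncio


async def reject_slowly(send, code: int, text: str, delay: float) -> None:
    """Hold the connection idle for `delay` seconds, then send the error reply.

    `send` is a hypothetical coroutine that writes one reply line to the client.
    """
    # While awaiting the timer the coroutine is parked and uses no CPU.
    await asyncio.sleep(delay)
    await send(f"{code} {text}\r\n")


async def demo() -> list[str]:
    replies: list[str] = []

    async def send(line: str) -> None:
        # Stand-in for writing to a real client socket.
        replies.append(line)

    # Three errant clients held concurrently; wall time is about one delay.
    await asyncio.gather(
        *(reject_slowly(send, 550, "rejected", 0.1) for _ in range(3))
    )
    return replies


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The client only learns of its error after the delay, which is the point: it cannot immediately reconnect and repeat the same mistake.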

> You can only accept a finite number of connections. Regardless of the
> resources I have, if I tell Exim to have 'smtp_accept_max' at 10, Exim
> cannot have more than 10 concurrent connections, *regardless* of the
> state they are in (transmit, receive, or idle). That's it. End.


Agreed.

> And if you end up with 10 connections all waiting for their error
> replies so that they can disconnect, you're effectively stopping all new
> connections.


Of course.

> The added cost of this is significant if:
> If E - M > 1 you're in trouble
> Where E is the number of messages generating errors, and M is the maximum
> number of connections you can open, and where E is counted over a period
> of time equal to the length of your rate limit.
>
> Thus, if your rate limit is 10s, and in 10s you receive 100 connections
> which receive one error or another, and the maximum simultaneous
> connections you accept is 5, you're choked.
> (these numbers are of course hypothetical).


"choked" has all the wrong connotations -- you've stopped accepting
connections but your machine is completely, i.e. 100%, idle and with
very little VM, process slot, and mbuf requirements.
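Avleen's arithmetic can be restated with Little's law (L = λW): connections that are each held for the full rate-limit delay occupy, on average, arrival rate × delay slots. A minimal sketch using his hypothetical numbers; the function names are illustrative. When the counting window equals the delay, the result reduces to E itself, so the slots fill once E reaches M.

```python
def slots_held(errors_in_window: int, window_s: float, delay_s: float) -> float:
    """Average number of slots occupied by delayed-error connections (Little's law).

    errors_in_window: error-generating connections seen in `window_s` seconds.
    delay_s: how long each one is held before its error reply is sent.
    """
    arrival_rate = errors_in_window / window_s  # connections per second
    return arrival_rate * delay_s


def is_choked(errors_in_window: int, window_s: float, delay_s: float,
              max_conns: int) -> bool:
    """True if delayed-error connections alone would fill every slot."""
    return slots_held(errors_in_window, window_s, delay_s) >= max_conns


# Avleen's example: 10 s rate limit, 100 errors per 10 s, 5 slots.
print(slots_held(100, 10.0, 10.0))    # 100.0
print(is_choked(100, 10.0, 10.0, 5))  # True
```

This says nothing about CPU or memory; it only counts slots, which is exactly the resource both sides agree is finite.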

If on average the majority of your connections result in errors
(e.g. the 75% from my well-studied sample machine), and assuming your
base OS is well designed and implemented, then you could increase the
maximum number of allowed connections by as much as an order of
magnitude and still not suffer at all.
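To put rough numbers on this, the same law can be applied separately to the two classes of connection. A sketch with entirely hypothetical traffic figures (only the 75% error share comes from the text): normal sessions lasting a few seconds, and erroring sessions held idle for the rate-limit delay.

```python
def required_slots(total_rate: float, error_fraction: float,
                   normal_s: float, delay_s: float) -> tuple[float, float]:
    """Return (active_slots, idle_slots) needed on average, per Little's law.

    total_rate: connections per second arriving.
    error_fraction: share of connections that end in a delayed error.
    normal_s: mean duration of a normal, working SMTP session.
    delay_s: how long an erroring connection is held idle.
    """
    active = total_rate * (1 - error_fraction) * normal_s
    idle = total_rate * error_fraction * delay_s
    return active, idle


# Hypothetical: 2 conn/s, 75% errors, 5 s normal sessions, 10 s delay.
active, idle = required_slots(2.0, 0.75, 5.0, 10.0)
print(active, idle)  # 2.5 15.0
```

Only the 2.5 active slots do real work; the 15 idle ones are sleeping processes, which is why a limit several times the active need can still leave the machine mostly idle.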

> You *cannot* accept more than your predefined number of connections. And
> if one of those connections gets rate limited, it still takes up a slot
> that a new connection cannot use.


You are still ignoring what happens to the system when a process and the
socket it holds open are in the stasis of a sleep() call.
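One way to see the "stasis" of sleep(): a sleeping process accumulates wall-clock time but essentially no CPU time. This Python sketch measures both around a short `time.sleep()`; the exact CPU figure depends on the OS, but it should be close to zero.

```python
import time


def cost_of_sleep(seconds: float) -> tuple[float, float]:
    """Return (wall_seconds, cpu_seconds) spent across a sleep of `seconds`."""
    wall0, cpu0 = time.monotonic(), time.process_time()
    time.sleep(seconds)  # process is blocked in the kernel, not scheduled
    return time.monotonic() - wall0, time.process_time() - cpu0


wall, cpu = cost_of_sleep(0.2)
print(f"wall={wall:.3f}s cpu={cpu:.4f}s")
```

This measures CPU only; the process slot, its memory, and the open socket stay allocated while it sleeps, which is the cost on the other side of the argument.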

> This is all true unless you're talking about Exim dynamically being able
> to adjust the maximum number of connections it can take (e.g., increasing
> by one each time an old connection is rate limited), but that has its
> own serious issues.


Dynamic adjustment of the accept limits based on the number of idle
connections being rate-limited would be ideal, and not too difficult to
achieve, and really does NOT have any serious issues of its own, but so
far in my experience it has not been necessary either. Maybe someday
the situation will get so bad (e.g. >95% of connections are rejected) as
to require it, but I don't see that day coming soon.
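No implementation is described here, so the following is only a hypothetical sketch of one way such dynamic adjustment could work: the accept limit grows by one for each connection currently parked in a rate-limit delay, up to a hard ceiling chosen for the machine. When no slot is free, the caller would answer with a 421. The names `AdaptiveLimiter`, `base_limit` and `hard_limit` are invented for illustration and are not Exim options.

```python
class AdaptiveLimiter:
    """Hypothetical accept limit that expands while connections sit idle."""

    def __init__(self, base_limit: int, hard_limit: int) -> None:
        self.base_limit = base_limit  # slots reserved for working connections
        self.hard_limit = hard_limit  # absolute ceiling (OS/memory bound)
        self.active = 0               # connections doing real work
        self.idle = 0                 # connections parked in a delay

    def limit(self) -> int:
        # Each idle (rate-limited) connection lends one extra slot.
        return min(self.base_limit + self.idle, self.hard_limit)

    def try_accept(self) -> bool:
        """Admit a new connection, or return False (caller sends a 421)."""
        if self.active + self.idle >= self.limit():
            return False
        self.active += 1
        return True

    def start_delay(self) -> None:
        """An active connection hit an error and is now being held idle."""
        self.active -= 1
        self.idle += 1

    def finish_delay(self) -> None:
        """The delayed error reply was sent and the connection closed."""
        self.idle -= 1


lim = AdaptiveLimiter(base_limit=2, hard_limit=4)
print(lim.try_accept(), lim.try_accept(), lim.try_accept())  # True True False
lim.start_delay()        # one client is now being held for its error
print(lim.try_accept())  # True: the idle connection's slot was lent out
```

Working connections never exceed `base_limit`, and the `hard_limit` cap is where the "physical limits of your machine" come in.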

Also, PLEASE try to keep in mind that SMTP is a store and forward
protocol with fully defined retry semantics. Rejecting connections with
a 421 is a normal thing to do and it will not cause any serious harm.
You just have to make sure that everyone you reject will be well behaved
and stay away for some time before they try again.

(FYI so far I've not had any clients hammer away at a 421, though I
should set up a honeypot of sorts to test them because that's one place
where it starts to get tricky as to how long you can afford to hold them
hostage. So far the only bad behaviour has been from clients that
ignore 5xx errors.)
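The honeypot mentioned above could start as simple bookkeeping: remember when each host was last sent a 421 and flag any host that reconnects sooner than some minimum retry interval. This is a hypothetical sketch; the class name and the 60-second threshold are assumptions, not values taken from Exim or from the SMTP RFCs. Timestamps are passed in explicitly for testing; a real server would use something like `time.monotonic()`.

```python
class RetryWatcher:
    """Hypothetical tracker for clients that ignore a 421 and reconnect at once."""

    def __init__(self, min_retry_s: float = 60.0) -> None:
        self.min_retry_s = min_retry_s
        self.last_421: dict[str, float] = {}  # host -> time of last 421

    def record_421(self, host: str, now: float) -> None:
        self.last_421[host] = now

    def is_hammering(self, host: str, now: float) -> bool:
        """True if `host` came back sooner than min_retry_s after a 421."""
        t = self.last_421.get(host)
        return t is not None and now - t < self.min_retry_s


w = RetryWatcher(min_retry_s=60.0)
w.record_421("192.0.2.10", now=1000.0)
print(w.is_hammering("192.0.2.10", now=1005.0))    # True
print(w.is_hammering("192.0.2.10", now=1100.0))    # False
print(w.is_hammering("198.51.100.7", now=1005.0))  # False
```

Hosts flagged this way are the ones for which deciding how long to hold them becomes tricky.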

> It most certainly is true of spammers from my vantage point.


I suspect you are ignoring the majority and focusing on those who are
causing you problems.

> Right, and they'll open up as many simultaneous connections as the
> resources on their side permit. If you rate limit one connection, it's
> completely possible that the slowdown in traffic is sufficient for
> another thread to fire up. And so all your connections will get used up,
> regardless of how much horsepower you may or may not have left.


But if half or more of those allotted connections are idle then you can
safely open up a few more and at least some of the time one of those few
more will be a normal legitimate client.

You can't go on forever this way of course, because eventually you will
run out of some OS resource, but in reality you don't have to. I don't
know if there's some formula based on the size and type of your user
base or not, but if you have a good feel for the requirements of your
user base and if you watch your logs and system performance for a trial
period, it's not hard at all to tune this value such that you don't
exceed the physical limits of your machine. Of course if your user
requirements are bigger than your machine then you have to upgrade (or
divide and conquer), regardless.

--
                                Greg A. Woods


+1 416 218-0098;            <g.a.woods@???>;           <woods@???>
Planix, Inc. <woods@???>; VE3TCP; Secrets of the Weird <woods@???>