Author: Alexander Sabourenkov
Date:
To: Suresh Ramasubramanian
CC: exim-users
Subject: Re: 5xx during / after DATA [was Re: [Exim] bouncing viruses]
Suresh Ramasubramanian wrote:
> At 03:17 PM 2/18/2003 +0300, Alexander Sabourenkov wrote:
>
>>> Domains / IPs which are noted doing this escalate from our access.db /
>>> rbldns to our firewall deny lists.
>>
>>
>> Then I suppose you shouldn't have problems with dumb clients retrying
>> after 550 to end of data, as they'll eventually find themselves
>> outright blocked, if I understood you.
>
>
> It is the part before the "eventually" happens that is the problem.
>
> Stupid mailservers (or spammer configured mailservers) can put a huge
> strain on our mailservers before they get noticed, sometimes.
>
> So, we look for spam strings (spamware signature) in the headers, 5xx
> the mail and drop the connection.
Hmm.
So our discussion boils down to:
1. keeping a count of after-DATA rejects per IP address [per last xx minutes]
2. denying relaying for IPs which exceed some rate limit
3. rejecting connections with a 4xx or 5xx SMTP banner for IPs which exceed another rate limit
4. rejecting connections via routing/firewall rules for IPs which exceed yet another rate limit
In step 1 it is possible to log not only the IP address, but also the HELO/EHLO parameter and the sender address.
This allows the blocks in step 2 to be more fine-grained.
new sequence:
1. keeping count of after-data rejects per {IP address,EHLOorHELO, its parameter, sender} tuple [ per last xx minutes ]
Let's call this tuple a 'fingerprint'.
2a. denying relaying for fingerprints that exceed some per-tuple rate limit
2b. denying relaying for IPs which exceed some per-IP rate limit
3. rejecting connections with a 4xx or 5xx SMTP banner [possibly stating the reason] for IPs which exceed another per-IP rate limit
4. rejecting connections via routing/firewall rules for IPs which exceed yet another per-IP rate limit
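Steps 1 and 2a/2b above could be sketched roughly like this -- a sliding-window reject counter keyed by fingerprint and by IP. All names, limits, and the 30-minute window here are illustrative assumptions of mine, not anything from a real deployment:

```python
# Hypothetical sketch: count after-DATA rejects per fingerprint
# (IP, HELO/EHLO verb, its parameter, sender) and per IP, over a window.
import time
from collections import defaultdict, deque

WINDOW = 30 * 60        # "last xx minutes" -- 30 minutes, arbitrarily
TUPLE_LIMIT = 5         # per-fingerprint reject limit within the window
IP_LIMIT = 20           # per-IP reject limit within the window

class RejectTracker:
    def __init__(self):
        self.by_tuple = defaultdict(deque)   # fingerprint -> reject timestamps
        self.by_ip = defaultdict(deque)      # ip -> reject timestamps

    def _prune(self, dq, now):
        # Drop timestamps that have aged out of the window.
        while dq and now - dq[0] > WINDOW:
            dq.popleft()

    def record_reject(self, ip, helo_verb, helo_arg, sender, now=None):
        """Step 1: count one after-DATA reject for this fingerprint."""
        now = time.time() if now is None else now
        fp = (ip, helo_verb, helo_arg, sender)
        self.by_tuple[fp].append(now)
        self.by_ip[ip].append(now)

    def should_deny(self, ip, helo_verb, helo_arg, sender, now=None):
        """Steps 2a/2b: deny relaying if either rate limit is exceeded."""
        now = time.time() if now is None else now
        fp = (ip, helo_verb, helo_arg, sender)
        self._prune(self.by_tuple[fp], now)
        self._prune(self.by_ip[ip], now)
        return (len(self.by_tuple[fp]) > TUPLE_LIMIT
                or len(self.by_ip[ip]) > IP_LIMIT)
```

Purging idle hosts and fingerprints entirely (the timeout mentioned below) would be a separate sweep over the two dicts; the pruning above only trims timestamps inside live entries.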
To save more resources, step 3 can be eliminated.
Add some timeout after which hosts and fingerprints get purged from the records.
Set up some more-or-less harsh rate limits. Make them depend [dynamically] on server load, the rate of incoming
connections, sysadmin availability per time of day / day of week, whatever.
Step 2 requires some type of lookup per connection (precisely, per first RCPT TO).
One can either write a sidekick daemon, or build a dnsdb containing, say, MD5 sums of fingerprints
as keys. Or base64-encoded fingerprints, serialized one way or another. Or whatever.
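The MD5-keyed dnsdb idea might look like this. The NUL-join serialization and the zone name are my own illustrative choices, not a real format:

```python
# Sketch: turn a fingerprint tuple into a DNS name that could serve as a
# lookup key in a dnsdb-style zone.
import hashlib

def fingerprint_key(ip, helo_verb, helo_arg, sender,
                    zone="blocked.example.org"):
    # NUL-join the fields so field boundaries can't be forged by embedding
    # a separator in, say, the HELO argument; MD5 the result and use the
    # hex digest as a DNS label under the (hypothetical) zone.
    raw = "\0".join((ip, helo_verb, helo_arg, sender)).encode()
    digest = hashlib.md5(raw).hexdigest()
    return f"{digest}.{zone}"
```

A resolver query for the returned name would then answer (e.g. with 127.0.0.1) only while that fingerprint is over its rate limit, so the MTA needs nothing beyond an ordinary DNS lookup per first RCPT TO.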
Step 1 requires that a rejection at the DATA stage be reported to the daemon (or whatever) that maintains the database.
You currently do that by scanning logs. I can't comment on systems under your kind of load, but I'd prefer a
UDP datagram sent, the internal count updated immediately, and the block flag for the tuple updated as soon as the rate limit
is reached. That should provide more responsiveness.
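The UDP idea, sketched under assumptions of my own (the port, host, and tab-separated wire format are all invented for illustration):

```python
# Sketch: on an after-DATA reject, the MTA side fires one datagram at the
# tracking daemon, which updates its counters immediately rather than
# waiting for a log-scanning pass.
import socket

TRACKER_ADDR = ("127.0.0.1", 8925)   # hypothetical daemon address

def notify_reject(ip, helo_verb, helo_arg, sender):
    """Fire-and-forget: losing an occasional datagram only delays a block."""
    msg = "\t".join(("REJECT", ip, helo_verb, helo_arg, sender)).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(msg, TRACKER_ADDR)

def parse_reject(datagram):
    """Daemon side: decode one datagram back into a fingerprint tuple."""
    kind, ip, verb, arg, sender = datagram.decode().split("\t")
    assert kind == "REJECT"
    return (ip, verb, arg, sender)
```

Since UDP is lossy, the worst case of a dropped datagram is that a block triggers one reject later than it should -- acceptable for this purpose, and much cheaper than a TCP round trip per rejection.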
I can't think of anything that can be done beyond this. Having no experience in such large-scale operations,
my thoughts may well make no sense at all. I hope they help you, though.