[exim] Reducing load vs seeing all the spam

Author: Peter Bowyer
Date:
To: Exim Users Mailing List
Subject: [exim] Reducing load vs seeing all the spam

Hi all

I'm reviewing how we process incoming mail.

Currently we apply the following checks in the RCPT ACL:

- Sending IP in local blacklist
- Missing HELO
- Blacklisted HELO (e.g. oemcomputer.com)
- HELO syntax (needs a '.', shouldn't be a bare IP)
- HELO forgery (must not be one of our domains or IPs)
- sbl-xbl.spamhaus.org DNSBL check
- Sender verify (no callout)
- Recipient verify with callout to non-local domains
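
For illustration, a few of these checks might look roughly like this in
an Exim 4 RCPT ACL. This is a minimal sketch, not our actual
configuration: the ACL name acl_check_rcpt, the +local_domains list and
the message texts are assumptions, and it presumes an Exim version that
has the isip and match_domain conditions.

```
acl_check_rcpt:
  # Missing HELO
  deny    message   = HELO/EHLO required
          condition = ${if !def:sender_helo_name {yes}{no}}

  # HELO syntax: must contain a '.', must not be a bare IP address
  deny    message   = Invalid HELO name
          condition = ${if or{ {!match{$sender_helo_name}{\N\.\N}} \
                               {isip{$sender_helo_name}} } {yes}{no}}

  # HELO forgery: remote host claims to be one of our domains
  deny    message   = Forged HELO
          condition = ${if match_domain{$sender_helo_name}\
                           {+local_domains} {yes}{no}}

  # DNSBL check
  deny    message   = $sender_host_address is listed in $dnslist_domain
          dnslists  = sbl-xbl.spamhaus.org

  # Sender verify (no callout)
  require verify    = sender

  # ... local blacklist, recipient verification and relay control
  # ... are omitted from this sketch
```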

This deals with a huge percentage of our unwanted mail.

Then in the DATA ACL, we call ClamAV and SpamAssassin via Exiscan,
which deals with another chunk. Several SA thresholds are implemented
for different classes of user, with the 'one class only per
connection' trick.

However, in this architecture, which is pretty common, SA never sees
the mail that is rejected at RCPT time, so the Bayesian learning is
skewed towards learning ham.

I suspect there would be benefit in letting SA 'see' the spam as well,
perhaps not in real time (i.e. in-line with the SMTP transaction), so it
can learn spam as well as ham.

Is anyone else doing this? How are you implementing it? I guess we
could set an ACL variable in the RCPT ACL instead of rejecting, and
then, if the variable is set, do a 'control = fakereject' in the DATA
ACL and bypass the SA scan. The spam could then be delivered to a pipe
to the spamc client, perhaps via queue-only to control the load.
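
A minimal sketch of that idea, using the DNSBL check as the example.
The variable acl_m0 and the ACL names are assumptions, and each of the
other RCPT checks would need the same 'warn ... set' treatment:

```
acl_check_rcpt:
  # Record the reason instead of rejecting. "set" comes after the
  # dnslists condition so that $dnslist_domain is already filled in.
  warn    dnslists  = sbl-xbl.spamhaus.org
          set acl_m0 = listed in $dnslist_domain

  # ... other checks, relay control, final accept ...

acl_check_data:
  # Flagged at RCPT time: answer 550 to the sender but keep the
  # message, hold it on the queue, and skip the inline SA scan.
  # A custom text can be given as fakereject/<message>.
  accept  condition = ${if def:acl_m0 {yes}{no}}
          control   = fakereject
          control   = queue_only

  # ... existing ClamAV and SpamAssassin checks ...
```

Note that fakereject only changes the SMTP response; the message is
still accepted for delivery. A router that tests the same variable
would be needed to divert it to the learning pipe instead of the
recipient's mailbox, and that part is not shown here. Since acl_m0 is
per message, one flagged RCPT marks the whole message.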

Or simply let SA see everything inline, and make sure we reject after
DATA if we would have rejected after RCPT?
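
The inline alternative might look like the sketch below. It assumes the
RCPT ACL has stored its would-be rejection reason in acl_m0 (a
hypothetical choice of variable) rather than denying. Whether SA
actually learns from what it scans depends on its auto-learn settings,
which are outside this sketch.

```
acl_check_data:
  # Scan everything. "nobody:true" makes the spam condition always
  # succeed, so this statement only runs SA and records the score.
  warn    spam      = nobody:true

  # Now apply the rejection that was postponed from RCPT time.
  deny    message   = $acl_m0
          condition = ${if def:acl_m0 {yes}{no}}

  # ... existing ClamAV and per-class SA threshold checks ...
  accept
```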

Clearly our SA load will increase dramatically if we let everything
through to it.

Any suggestions? Or am I worrying about a non-problem?

Thanks

Peter

--
Peter Bowyer
Email: peter@???
Tel: +44 1296 768003
VoIP: sip:peter@???
