Dear ex[im]perts,
I'm in dire need of some advice on implementing a per-user
spam-detection system.
All our site mail passes through our mail hub, and I would like
to implement a per-user spam detector (bogofilter/spamassassin etc)
on it. There are no user accounts on this hub; all mail is routed off to
other hosts via entries in the alias file.
It has a per-user exim filter implemented, along the lines of the
'Per-address filter' in Chapter 39.8 of the Exim-4 docs, and the
users' filtering requirements are entered via a web interface.
I want to augment this.
The use of procmail et al is not appropriate, as mail is read by a wide
variety of MUAs on Macs, Windows and U*X.
I don't want to filter spam on a globally-configured basis.
Users would each supply their own 'learning' data and the spam filter
would be invoked on a per-user basis to use that person's data.
I would also like to offer them the options of a) no spam-analysis,
b) spam-analysis and flagging of the message with a 'Probably-spam' header
and then delivering it to them, or c) just junking it (maybe to a big circular
spam-bucket to permit limited-time emergency retrieval).
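To make the three options concrete, here is a minimal sketch in Python (not Exim configuration) of how the per-user choice might be stored and looked up. The "user: option" file format, the option names and the function names are all assumptions made for illustration; nothing here is an existing Exim feature.

```python
from enum import Enum


class SpamOption(Enum):
    NONE = "none"  # option (a): no spam analysis
    FLAG = "flag"  # option (b): analyse, add a header, deliver
    JUNK = "junk"  # option (c): analyse, divert spam to the bucket


def parse_preferences(text):
    """Parse 'user: option' lines (an assumed format) into a dict.

    Blank lines and lines starting with '#' are skipped. An unknown
    option name raises ValueError rather than being silently ignored.
    """
    prefs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        user, _, value = line.partition(":")
        prefs[user.strip().lower()] = SpamOption(value.strip().lower())
    return prefs


def option_for(user, prefs):
    """Return the user's option, defaulting to no analysis at all."""
    return prefs.get(user.lower(), SpamOption.NONE)
```

Defaulting to "none" means a user who has never visited the web interface gets exactly the behaviour they have today.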
AIUI, the per-user requirement for spam-filtering imposes several
restrictions (apologies if I'm talking thro' my hat from here on,
I'm an exim layman):
1. The spam filtering can't be done at SMTP-receipt time. Even if I
could use exiscan (and I don't think I can configure it on a per-user
basis), I would still have a problem with messages arriving addressed
to a list of recipients. I have to wait until alias-routing is complete
so that I can have one recipient-per-message.
Furthermore, since this box is a hub, it will be receiving SMTP mail
from internal hosts, destined for outside. I don't want to filter these.
2. Similarly, making the spam-filter a 'local_scan' function would mean
that filtering would also take place too early, and it gets run for
*every* message, not just SMTP (not that there *are* many messages other
than SMTP on this box, but it does have one local account).
3. The approach that most of the correspondents in the exim archives
seem to use is the transport_filter method. This does
the filtering at the point of delivery, but I can't see how I would
be able to make per-user decisions on what to do with the message
after it had been processed by the spam-analyzer.
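The reasoning in point 1 (wait until alias-routing is complete, then treat each final recipient separately) can be sketched in Python as follows. This is only an illustration of the idea, not how Exim routes mail; the alias table is a plain dict and all the names are hypothetical.

```python
def expand(address, aliases, seen):
    """Expand one address through the alias table to final addresses.

    'seen' records addresses already visited so that an alias loop
    terminates instead of recursing forever.
    """
    if address in seen:
        return set()
    seen.add(address)
    targets = aliases.get(address)
    if targets is None:
        # Not an alias: this is a final delivery address.
        return {address}
    final = set()
    for target in targets:
        final |= expand(target, aliases, seen)
    return final


def one_copy_per_recipient(recipients, aliases):
    """Return the sorted final addresses for a multi-recipient message.

    Each entry stands for one copy of the message that could then be
    checked against that recipient's own spam settings.
    """
    seen = set()
    final = set()
    for recipient in recipients:
        final |= expand(recipient, aliases, seen)
    return sorted(final)
```

A message addressed to a list and to one of its members ends up as one copy per final address, so each copy can be analysed with that person's data.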
The ideal thing for me would be, for all messages arriving from offsite,
to conditionally pass the message through the filter at some appropriate
point if the user had 'spam-filtering' turned on, and to have the filter
add some sort of 'This looks like spam' header to the message. I would then
like to be able to optionally dispose of this message in some way
(e.g. to the circular bucket) if the user had requested 'junk all spam',
or else deliver it as normal.
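The flow I have in mind looks roughly like this in Python. It is a sketch of the decision logic only: the header name X-Probably-Spam, the "deliver"/"bucket" results and the looks_like_spam callable (a stand-in for bogofilter or spamassassin run with that user's data) are assumptions, not anything Exim provides.

```python
from email import message_from_string

SPAM_HEADER = "X-Probably-Spam"  # assumed header name


def handle(raw, from_offsite, filtering_on, junk_all, looks_like_spam):
    """Decide what to do with one single-recipient message.

    raw             -- the message as a string
    from_offsite    -- True if the message arrived from outside
    filtering_on    -- the recipient's 'spam-filtering' flag
    junk_all        -- the recipient's 'junk all spam' flag
    looks_like_spam -- callable taking the raw text, returning True/False

    Returns (action, message), where action is 'deliver' or 'bucket'.
    """
    msg = message_from_string(raw)
    # Outgoing or internal mail, or a user who opted out: never analysed.
    if not (from_offsite and filtering_on):
        return "deliver", msg
    if not looks_like_spam(raw):
        return "deliver", msg
    # Flag the message, then either junk it or deliver it as normal.
    msg[SPAM_HEADER] = "yes"
    return ("bucket" if junk_all else "deliver"), msg
```

The analyser is only ever called for offsite mail to users who asked for it, which is the part I can't see how to express with a transport_filter alone.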
Maybe I can make the choice of transport conditional on 'this is an incoming message'
and then on a per-user 'filter-my-spam' flag, but that still leaves me with the
problem of what to do with the message after it's been through the transport_filter.
Seems more like a routing job. Is there a router_filter?
Again, from the archives, I get the impression that starting up a new process
(/usr/local/bin/spamassassin or /usr/local/bin/bogofilter or whatever) for
each message is a heavyweight action - spamassassin offers spamc/spamd as a
way round this, and exiscan makes the whole thing part of exim.
Can I avoid this startup penalty by making the thing part of a custom router?
Would this approach enable me to solve *all* these problems, I wonder?
Cheers, and thanks for reading this far!
Terry.
Terry Horsnell (tsh@???)
I.T. Manager
Medical Research Council
Lab of Molecular Biology
Hills Road
CAMBRIDGE CB2 2QH
U.K.
Phone: +44 (0)1223 248011
Fax: +44 (0)1223 213556