Author: Peter Bowyer
To: Exim Users Mailing List
Subject: Re: [exim] Has anyone done this?
On Sun, 06 Mar 2005 06:49:47 -0800, Marc Perkel <marc@???> wrote:
> Hey - if I have a bad idea - don't worry about coming to my defense. ;)
Not something I usually have a problem with :-)
Leaving aside the legality/sensibility of doing portscans on
connecting hosts, any test which can differentiate spam from nonspam
will be of help in a Bayesian system, provided it can be trained.
The training might be an issue here. With content-based tests/patterns
you can push through a huge corpus of pre-classified spam and nonspam
to do the training - but with this test you can't, because it relies on
connection-time information that a stored corpus doesn't contain. So you
have to train one message at a time as mail arrives, and it will take a
while to become effective.
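A minimal sketch of why this matters, assuming a simple naive-Bayes style filter that keeps per-token spam/nonspam counts (not any particular filter's implementation). The connection-time result is modelled as an ordinary token; the name `portscan:open-proxy` and the `TokenCounts`/`tokens_for` helpers are hypothetical. Content tokens can be rebuilt from any stored message, but the port-scan token only exists for messages seen live:

```python
from collections import Counter


class TokenCounts:
    """Per-token spam/nonspam counters for a naive-Bayes style filter (sketch)."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of nonspam messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, tokens, is_spam):
        # Count each distinct token once per message.
        unique = set(tokens)
        if is_spam:
            self.spam.update(unique)
            self.n_spam += 1
        else:
            self.ham.update(unique)
            self.n_ham += 1


def tokens_for(body, connection_tokens):
    # Content tokens can be regenerated from a stored corpus message...
    content = body.lower().split()
    # ...but connection-time tokens exist only if recorded at SMTP time.
    return content + list(connection_tokens)


db = TokenCounts()
# Bulk training from a stored corpus: no connection-time data available.
db.train(tokens_for("cheap pills now", []), is_spam=True)
db.train(tokens_for("meeting notes attached", []), is_spam=False)
# Live traffic: the hypothetical port-scan token is learned one message at a time.
db.train(tokens_for("cheap pills today", ["portscan:open-proxy"]), is_spam=True)
print(db.spam["portscan:open-proxy"], db.n_spam)  # 1 2
```

The corpus run teaches the filter plenty about "cheap" and "pills" but nothing about the port-scan token, which only accumulates evidence from live mail.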
The point about 'you'll block loads of nonspam from big ISPs' won't be
an issue because the training will sort that out - after training, the
weight given to this test's contribution to a spamscore will be
exactly 'right' according to the statistical probability that a message
hitting that test (along with all the others) is spam, versus the
probability that it is nonspam. Strictly, there's no concept of
individual tests at this point (the test result is just another token
to the Bayesian filter), but that's a good way of thinking about it.
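To illustrate how training sets that weight, here is a sketch that estimates a token's spam probability from its per-class hit rates and smooths it toward 0.5 for rarely-seen tokens (one common approach, in the style of Gary Robinson's smoothing; real filters differ in detail). The function name, parameters and the counts below are made up for illustration, not measured data. If big-ISP nonspam triggers the port-scan test almost as often as spam does, the learned probability stays close to neutral:

```python
def token_spamminess(spam_hits, ham_hits, n_spam, n_ham, strength=1.0, prior=0.5):
    """Estimate P(spam | token) from training counts (hypothetical sketch)."""
    n = spam_hits + ham_hits
    if n == 0:
        return prior  # never seen: no evidence either way
    spam_rate = spam_hits / n_spam if n_spam else 0.0
    ham_rate = ham_hits / n_ham if n_ham else 0.0
    if spam_rate + ham_rate == 0:
        return prior
    raw = spam_rate / (spam_rate + ham_rate)
    # Pull the estimate toward `prior`; the pull weakens as n grows.
    return (strength * prior + n * raw) / (strength + n)


# Made-up counts out of 1000 spam and 1000 nonspam training messages.
print(round(token_spamminess(400, 300, 1000, 1000), 2))  # 0.57: big ISPs trip it too
print(round(token_spamminess(400, 5, 1000, 1000), 2))    # 0.99: rarely seen in nonspam
```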
A test that doesn't contribute to the score could be left out - but
leaving it in won't skew your results either.
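A quick way to see why a non-contributing test is harmless, assuming the filter combines per-token probabilities by summing log-odds (plain naive-Bayes combination; real filters may use other combining rules). The `combine` helper is hypothetical. A token whose learned probability is 0.5 contributes log(0.5/0.5) = 0, so adding it leaves the final score unchanged:

```python
import math


def combine(probs, prior=0.5, eps=1e-6):
    """Naive-Bayes combination of per-token spam probabilities (sketch)."""
    log_odds = math.log(prior / (1 - prior))
    for p in probs:
        p = min(max(p, eps), 1 - eps)  # avoid log(0) for extreme estimates
        log_odds += math.log(p / (1 - p))  # p == 0.5 adds exactly 0
    return 1 / (1 + math.exp(-log_odds))


base = combine([0.9, 0.2, 0.7])
with_neutral = combine([0.9, 0.2, 0.7, 0.5])  # add a neutral "test"
print(math.isclose(base, with_neutral))  # True
```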
Peter
--
Peter Bowyer
Email: peter@???
Tel: +44 1296 768003
VoIP: sip:peter@???