RE: [Exim] Re: Spam Assassin vs. Bogofilter

Author: Arnulv Rudland
Date:  
To: exim-users
Subject: RE: [Exim] Re: Spam Assassin vs. Bogofilter
Well, I got curious and managed to install bogofilter after all.

In a very brief test, _on my data_ bogofilter seems to be fast, but has a bad
filter score:

Populations (approx. same as the other tests): 1409 good, 2006 bad

~ 0% false positives (1 of 1409)
~ 9% false negatives (180 of 2009) initially, reduced to
~ 4.5% when only accepting spamicity of 0.0000

This is still way above the others.
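
For reference, the percentages above follow directly from the raw counts. A minimal sketch of the arithmetic in Python; the function name `error_rate` is only illustrative, and the counts are the ones quoted in this message (using the 2009 given on the false-negative line):

```python
def error_rate(errors: int, total: int) -> float:
    """Return `errors` as a percentage of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * errors / total


# Counts as reported in this message.
false_positive_pct = error_rate(1, 1409)    # good mail flagged as spam
false_negative_pct = error_rate(180, 2009)  # spam that slipped through

print(f"false positives: {false_positive_pct:.2f}%")
print(f"false negatives: {false_negative_pct:.2f}%")
```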

In my environment spamprobe is the definitive winner:
+ few false negatives
+ higher flexibility than bogofilter
+ easy and safe relabeling of false positives and false negatives, thanks to
md5 identification of messages

<disclaimer>
This is my result on my data. Speed doesn't count that much.
And, of course, your mileage may vary.
</disclaimer>

Arnulv


and spamprobe still seems to be more flexible, learning fast in the following.
With spamprobe it is especially easy and safe

>
> I didn't get bogofilter installed, due to some obscure library
> incompatibilities, but
> I've just installed
>
>     spamprobe

>
> (http://sourceforge.net/projects/spamprobe/)
> which is another Bayesian spam filter.
> Requirements: BerkeleyDB v3+. The application is written in c++.
> After I installed and compiled the actual BerkeleyDB,
> spamprobe compiled and
> runs under SuSE Linux 7.0 (2.2.16 Kernel).
>
> On a base of 1000-2000 good/bad messages, it delivers
> an astonishing 0 (zero)
> false positives and less than 1.5% false negatives (i.e.
> undetected SPAMs)
> in tests on the base data.
>
> spamprobe works as a filter, generally communicating via
> Header fields.
> It reads the spam message from STDIN or a filename as
> argument and returns
> string output, typically "GOOD <factor> <check-sum>" or
> "SPAM <factor>
> <check-sum>"
> which can be captured by the calling process.
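
A calling process could split that verdict line into its three fields. A minimal sketch in Python, assuming the output really has the "GOOD/SPAM <factor> <check-sum>" shape described above; the names `Verdict` and `parse_verdict` are made up for illustration and are not part of spamprobe:

```python
from typing import NamedTuple


class Verdict(NamedTuple):
    is_spam: bool
    factor: float
    checksum: str


def parse_verdict(line: str) -> Verdict:
    """Parse one 'GOOD <factor> <check-sum>' or 'SPAM <factor> <check-sum>' line."""
    parts = line.split()
    if len(parts) != 3 or parts[0] not in ("GOOD", "SPAM"):
        raise ValueError(f"unexpected spamprobe output: {line!r}")
    label, factor, checksum = parts
    return Verdict(label == "SPAM", float(factor), checksum)
```

The check-sum field is kept as an opaque string here, since it is only needed to refer back to the same message later.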
>
> I also tested
>
>     Bayespam

>
> (http://sourceforge.net/projects/bayespam/)
> Bayespam is a Perl script. It requires MIME::Parser and
> DB_File. Bayespam
> had a false positive / false negative rate of 2.5 to 3% on _my_ data.
>
> I chose spamprobe for several reasons:
>
> 1. spamprobe appeared to be faster
> 2. spamprobe performed better on my data sample
> 3. spamprobe accepts mbox as well as maildir format,
> 4. spamprobe is relatively comfortable in maintenance and use:
>     - is self-learning
>     - can combine personal and group dictionaries
>     - has several command line tuning options
>     - fair diagnostic utilities
>     - good documentation

>
> I, for my part, use it only indirectly via procmail and put a
> spam filter
> into the mail clients.
>
> # procmail recipe:
>     :0
>     SCORE=| /usr/local/bin/spamprobe -8 -c -D /dir/to/group/dict/ receive
>     :0 wf
>     | formail -I "X-SpamProbe: $SCORE"
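
For setups without procmail, the same two steps (score the message, then stamp the result into an X-SpamProbe header) could be done in a small script. A rough sketch in Python, not a tested replacement for the recipe: the command line is copied from the recipe above, while the function names `score_message` and `tag_message` are hypothetical.

```python
import subprocess
from email import message_from_bytes

# Command line taken from the procmail recipe; adjust the dictionary path.
SPAMPROBE_CMD = [
    "/usr/local/bin/spamprobe", "-8", "-c",
    "-D", "/dir/to/group/dict/", "receive",
]


def score_message(raw: bytes) -> str:
    """Feed one message to spamprobe on stdin and return its verdict line."""
    result = subprocess.run(
        SPAMPROBE_CMD, input=raw, stdout=subprocess.PIPE, check=True
    )
    return result.stdout.decode("ascii", "replace").strip()


def tag_message(raw: bytes, score: str) -> bytes:
    """Drop any existing X-SpamProbe header and add one with the new score."""
    msg = message_from_bytes(raw)
    del msg["X-SpamProbe"]  # no error if the header is absent
    msg["X-SpamProbe"] = score
    # Keep the mbox "From " line if the input had one.
    return msg.as_bytes(unixfrom=msg.get_unixfrom() is not None)
```

A filter would then read the message from stdin and write `tag_message(raw, score_message(raw))` to stdout. Removing the old header first is meant to mirror what `formail -I` does in the recipe.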

>
> This might be neither elegant nor fast but it works.
>
>
>
>
> <disclaimer>
> I am no ISP. I have only a few hundred emails a day to process.
> </disclaimer>
>
>
>
> > -----Original Message-----
> > From: exim-users-admin@??? [mailto:exim-users-admin@exim.org]On
> > Behalf Of Dennis Davis
> > Sent: Monday, November 18, 2002 3:36 PM
> > To: exim-users@???
> > Subject: [Exim] Re: Spam Assassin vs. Bogofilter
> >
> >
> > >I am looking into integrating a spam filter with Exim. I was
> > >mainly looking at Spam Assassin,
> >
> > Existing techniques for this are usually combined with virus
> > scanning. I know of the following that can be used with exim. All
> > combine virus scanning with spam detection. I suspect (3) may be
> > the only approach that can just run spam detection.
> >
> > (1) amavisd-new, available from:
> >
> >     http://www.ijs.si/software/amavisd/

> >
> >     (The amavis project is based at:

> >
> >      http://www.amavis.org/

> >
> >      and is just concerned with virus detection.)

> >
> > (2) MailScanner from:
> >
> >     http://www.sng.ecs.soton.ac.uk/mailscanner/

> >
> > (3) Tom Kistner's exiscan from:
> >
> >     http://duncanthrax.net/exiscan/

> >
> > Of the above I expect that MailScanner will make the most efficient
> > use of CPU power etc.
> >
> > Tom Kistner's exiscan uses exim4's local_scan facility and can
> > reject suspect email during the SMTP transaction. I personally
> > like this approach; it gets rid of suspect mail at the earliest
> > opportunity. Note it rejects the suspect email while the SMTP
> > connection is still open. Thus the sending MTA is responsible for
> > generating the error message. The receiving MTA won't have to care
> > if the envelope sender is forged -- this happens a lot with viruses
> > sent by email -- and so run the risk of sending a virus warning
> > message to the wrong person.
> >
> > > but a colleague gave me a magazine article which mentioned
> > >Bogofilter (http://www.tuxedo.org/~esr/bogofilter/). I was
> > >wondering if anybody on the list has used or tried this software
> > >with Exim. If so, do you know how well it compares with others,
> >
> > Bogofilter is Bayesian mail filtering software. For further
> > information you might like to look at Paul Graham's article:
> >
> > http://www.paulgraham.com/spam.html
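
To give a rough idea of what "Bayesian" means here: the article linked above combines per-token spam probabilities into a single score for the whole message. A minimal sketch of that combining step in Python, as I understand it from the article; the function name and any probabilities fed to it are made up, and real filters such as bogofilter or spamprobe may choose tokens and compute the per-token values differently:

```python
from math import prod


def combine(token_probs):
    """Combine per-token spam probabilities into one message score.

    Uses the formula
        P = (p1*...*pn) / (p1*...*pn + (1-p1)*...*(1-pn)).
    """
    spam = prod(token_probs)
    ham = prod(1.0 - p for p in token_probs)
    if spam + ham == 0.0:
        raise ValueError("probabilities must not include both 0 and 1")
    return spam / (spam + ham)
```

Tokens that point in opposite directions cancel out (0.9 and 0.1 combine to 0.5), while several strongly spammy tokens push the score close to 1.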
> >
> > Other Bayesian mail filtering software can be found at:
> >
> > http://sourceforge.net/projects/bmf
> >
> > I've certainly not used bmf. However there is a port of this
> > software in the latest OpenBSD ports tree. And I've seen comments
> > from one OpenBSD user that he's happy with using bmf. As usual,
> > your mileage may vary...
> >
> > --
> >
> > ## List details at
> > http://www.exim.org/mailman/listinfo/exim-users Exim details
> > at http://www.exim.org/ ##
> >
> >
>
>