Re: [exim] SPAM Filtering - Losing the war!

Author: Marlon Cabrera Oliveira
Date:
To: exim-users
Subject: Re: [exim] SPAM Filtering - Losing the war!
Hi,

> To tell you the truth I'm losing ground lately against spammers. Two
> reasons. The Image spam is getting through and because it poisons the
> bayes I've lost much of the effectiveness of bayes filtering. I'm still
> holding on but I've had people who I hosted for over a year who
> never had a single spam who are now getting a few. I am also having a
> few more false positives than I used to.



I'm having success detecting image spam here using the OSBF-Lua filter:

From the OSBF-Lua website:

"OSBF-Lua (Orthogonal Sparse Bigrams with confidence Factor) is a Lua C module
for text classification. It is a port of the OSBF classifier implemented in
the CRM114 project. This implementation attempts to put focus on the
classification task itself by using Lua as the scripting language, a powerful
yet light-weight and fast language, which makes it easier to build and test
more elaborated filters and training methods.

The OSBF algorithm is a typical Bayesian classifier but enhanced with two
techniques that I originally developed for the CRM114 project: Orthogonal
Sparse Bigrams - OSB, for feature extraction, and the Exponential
Differential Document Count - EDDC (a.k.a Confidence Factor) for automatic
feature selection. Combined, these two techniques produce a highly accurate
classifier. OSBF was developed focused on two classes, SPAM and NON-SPAM, so
the performance for more than two classes may not be the same."
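To make the OSB part of that description concrete, here is a minimal Python sketch of orthogonal sparse bigram extraction. It assumes a window of 5 tokens (the size commonly used for OSB): each token is paired with each of up to four preceding tokens, and the number of skipped tokens is kept as part of the feature. The function names are hypothetical, the whitespace tokenizer is a simplification, and the hashing, counting and confidence-factor weighting that OSBF-Lua performs on top of these features are left out.

```python
import re
from typing import Iterator, List, Tuple


def tokenize(text: str) -> List[str]:
    """Split text on whitespace (a simplification; real tokenizers do more)."""
    return re.findall(r"\S+", text)


def osb_features(tokens: List[str], window: int = 5) -> Iterator[Tuple[str, int, str]]:
    """Yield orthogonal sparse bigrams as (earlier_token, skipped, current_token).

    Each token is paired with each of the up to window-1 tokens before it.
    `skipped` counts the tokens lying between the two, so "a b" and "a x b"
    produce different features.
    """
    for i, current in enumerate(tokens):
        # distance 1 is the adjacent token, distance window-1 the farthest one
        for distance in range(1, window):
            j = i - distance
            if j < 0:
                break
            yield (tokens[j], distance - 1, current)


if __name__ == "__main__":
    for feature in osb_features(tokenize("buy cheap pills now")):
        print(feature)
```

For "buy cheap pills now" the script prints six features, ending with ('buy', 2, 'now'): the pair (buy, now) with two tokens skipped between them.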



OSBF-Lua learns very fast. It only requires Lua 5.1, with dynamic loading
enabled, installed on the Exim server.
See the installation instructions: http://osbf-lua.luaforge.net/#installation


In exim.conf I added these statements:

In the ## MAIN CONFIGURATION SETTINGS ## section:

# Set OSBF_LUA_DIR to the directory where spamfilter.lua,
# spamfilter_command.lua, etc. were installed.
OSBF_LUA_DIR=/usr/local/osbf-lua


In the ## TRANSPORTS CONFIGURATION ## section, add a transport_filter option
to the local_delivery transport:

local_delivery:
   driver = appendfile
   check_string = ""
   create_directory
   delivery_date_add
   directory = ${home}/Maildir/
   directory_mode = 700
   envelope_to_add
   return_path_add
   group = mail
   maildir_format
   maildir_tag = ,S=$message_size
   message_prefix = ""
   message_suffix = ""
   mode = 0600
   quota = ${lookup{$local_part}lsearch*{/etc/mail/quota_usr}{$value}{4M}}
   quota_size_regex = S=(\d+)$
   quota_warn_threshold = 75%
   transport_filter = OSBF_LUA_DIR/spamfilter.lua --udir $home/osbf-lua



That's it! :)
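The transport_filter line works because Exim runs the configured command for each delivery, passes it the message on standard input, and delivers whatever the command writes to standard output instead. As a hedged illustration of that contract only (not of what spamfilter.lua does internally), the following toy Python filter prefixes a fixed tag, given as its first argument, to the Subject header. The script name toy_filter.py and its behaviour are hypothetical.

```python
import sys


def tag_subject(message: str, tag: str) -> str:
    """Prefix `tag` to the first Subject: header; the body is left untouched."""
    lines = message.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line in ("\n", "\r\n"):
            break  # blank line: end of the header section, no Subject found
        if line.lower().startswith("subject:"):
            value = line[len("subject:"):].lstrip(" \t")
            lines[i] = "Subject: " + tag + " " + value
            break
    return "".join(lines)


def main() -> None:
    tag = sys.argv[1]
    # Exim writes the message to our stdin and delivers what we write to
    # stdout. latin-1 maps every byte to one character, so 8-bit mail
    # round-trips unchanged.
    raw = sys.stdin.buffer.read().decode("latin-1")
    sys.stdout.buffer.write(tag_subject(raw, tag).encode("latin-1"))


if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Running `printf 'Subject: hi\n\nbody\n' | python3 toy_filter.py '[TEST]'` should print the same message with `Subject: [TEST] hi`. Requiring the tag argument keeps the script from waiting on stdin when it is run without one.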


Verify your setup by sending a message to yourself with the following in the
subject line: help <your password>

You will receive a message containing the spamfilter help.

To verify that the databases were created correctly, use the subject:
stats <your password>

From now on, all messages that you receive will be classified and tagged
according to the score they get:

Tag     Meaning

[--]    almost sure it's spam - score <= -20
[-]     probably spam (reinforcement zone) - score < 0 and > -20
[+]     probably not spam (reinforcement zone) - score >= 0 and < 20
[++]    almost sure it's not spam - score >= 20. This tag exists only for
        symmetry and is not actually used; an empty tag is used in its place
        so as not to pollute the messages.
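The score thresholds in the table can be restated as a small Python function. This is only a restatement of the documented mapping under a hypothetical name, osbf_tag, not code taken from spamfilter.lua, which makes this decision itself.

```python
def osbf_tag(score: float) -> str:
    """Return the subject tag for an OSBF-Lua score, following the tag table."""
    if score <= -20:
        return "[--]"  # almost sure it's spam
    if score < 0:
        return "[-]"   # probably spam (reinforcement zone)
    if score < 20:
        return "[+]"   # probably not spam (reinforcement zone)
    return ""          # almost sure not spam: empty tag instead of [++]


if __name__ == "__main__":
    for s in (-35.0, -4.2, 0.0, 12.5, 48.0):
        print(f"{s:6.1f} -> {osbf_tag(s)!r}")
```

Note that a score of exactly 0 counts as "probably not spam", and exactly -20 as "almost sure it's spam", matching the inequalities in the table.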



If the classification is wrong, you must train the filter by sending the
message back to yourself with the subject replaced by the corresponding
training command:

learn <password> spam or learn <password> nonspam
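If you want to script this instead of doing it by hand in a mail client, the following Python sketch builds the command subjects above and readdresses a misclassified message to yourself with a learn subject, using only the standard email package. The helper names command_subject and training_message are hypothetical; whether spamfilter.lua also checks other headers (for example From) is not covered here, so check its help output.

```python
from email import message_from_string
from email.message import Message


def command_subject(command: str, password: str, *args: str) -> str:
    """Build a command subject such as 'stats <password>' or 'learn <password> spam'."""
    return " ".join((command, password) + args)


def training_message(raw: str, address: str, password: str, label: str) -> Message:
    """Readdress a misclassified message to yourself with a learn command as its subject."""
    if label not in ("spam", "nonspam"):
        raise ValueError("label must be 'spam' or 'nonspam'")
    msg = message_from_string(raw)
    # del removes every occurrence of the header and is a no-op if it is absent
    del msg["To"]
    del msg["Subject"]
    msg["To"] = address
    msg["Subject"] = command_subject("learn", password, label)
    return msg


if __name__ == "__main__":
    original = "From: promo@example.com\nTo: me@example.org\nSubject: Cheap pills\n\nBuy now!\n"
    print(training_message(original, "me@example.org", "secret", "spam").as_string())
```

To actually send the result, something like smtplib.SMTP("localhost").send_message(msg) should work, assuming the Exim host accepts local SMTP submissions.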


After you train it on a few messages, OSBF-Lua's spam detection becomes more
accurate.
If you have a database of pre-classified messages (spam / nonspam) in an IMAP
folder, you can use the toer.lua script to do the training.
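The "reinforcement zone" labels in the tag table suggest training not only on misclassified messages but also on correctly classified ones whose score is close to zero, an approach sometimes called thick-threshold training. The Python sketch below shows that loop over a pre-classified corpus. It is not a port of toer.lua, whose exact algorithm may differ: classify and learn are hypothetical callbacks standing in for the real classifier, and the ±20 limit is assumed from the tag table.

```python
from typing import Callable, Iterable, List, Tuple

REINFORCEMENT_LIMIT = 20.0  # assumed from the +/-20 boundaries in the tag table


def train_corpus(
    corpus: Iterable[Tuple[str, bool]],
    classify: Callable[[str], float],
    learn: Callable[[str, bool], None],
    limit: float = REINFORCEMENT_LIMIT,
) -> int:
    """Train on misclassified or low-confidence messages; return how many were trained.

    corpus yields (message_text, is_spam); classify returns a score where a
    negative value means spam, as in the tag table.
    """
    trained = 0
    for text, is_spam in corpus:
        score = classify(text)
        wrong = (score < 0) != is_spam
        in_reinforcement_zone = -limit < score < limit
        if wrong or in_reinforcement_zone:
            learn(text, is_spam)
            trained += 1
    return trained


if __name__ == "__main__":
    # Hypothetical fixed scores standing in for a real classifier.
    scores = {"a": -30.0, "b": 5.0, "c": 25.0, "d": 40.0}
    learned: List[Tuple[str, bool]] = []
    corpus = [("a", True), ("b", False), ("c", True), ("d", False)]
    n = train_corpus(corpus, scores.__getitem__, lambda t, s: learned.append((t, s)))
    print(n, learned)
```

With these made-up scores, "b" is trained because it is correct but inside the reinforcement zone, "c" because it is misclassified, and "a" and "d" are skipped as confident and correct.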


Regards,

Marlon