Re: [exim] SPAM Filtering - Losing the war!

Author: Odhiambo G. Washington
Date:  
To: exim-users
Subject: Re: [exim] SPAM Filtering - Losing the war!
* On 23/10/06 21:16 -0200, Marlon Cabrera Oliveira wrote:
| Hi,
| 
| > To tell you the truth, I'm losing ground lately against spammers, for two
| > reasons. Image spam is getting through, and because it poisons the
| > Bayes database, I've lost much of the effectiveness of Bayes filtering. I'm
| > still holding on, but some people I have hosted for over a year, who
| > never had a single spam, are now getting a few. I am also having a
| > few more false positives than I used to.
| 
| 
| I'm having success here detecting image spam using the OSBF-Lua filter:
| 
| From the OSBF-Lua website:
| 
| "OSBF-Lua (Orthogonal Sparse Bigrams with confidence Factor) is a Lua C module 
| for text classification. It is a port of the OSBF classifier implemented in 
| the CRM114 project. This implementation attempts to put focus on the 
| classification task itself by using Lua as the scripting language, a powerful 
| yet light-weight and fast language, which makes it easier to build and test 
| more elaborated filters and training methods.
| 
| The OSBF algorithm is a typical Bayesian classifier but enhanced with two 
| techniques that I originally developed for the CRM114 project: Orthogonal 
| Sparse Bigrams - OSB, for feature extraction, and the Exponential 
| Differential Document Count - EDDC (a.k.a Confidence Factor) for automatic 
| feature selection. Combined, these two techniques produce a highly accurate 
| classifier. OSBF was developed focused on two classes, SPAM and NON-SPAM, so 
| the performance for more than two classes may not be the same."
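The Orthogonal Sparse Bigrams (OSB) feature extraction mentioned in the quote can be illustrated with a short sketch. This is not OSBF-Lua's code: it assumes plain whitespace tokenization and a window of 5 tokens, and it returns readable tuples where a real classifier would typically hash features into fixed-size tables. Each token is paired with each of the (up to four) tokens before it, and the distance between the two tokens is part of the feature.

```python
def osb_features(text, window=5):
    """Return Orthogonal Sparse Bigram features for a piece of text.

    Each token is paired with each of the (window - 1) tokens before it.
    The distance between the two tokens is kept in the feature, so
    "cheap pills" (distance 1) and "cheap ... pills" (distance 2) differ.
    """
    tokens = text.split()  # assumption: simple whitespace tokenization
    features = []
    for i, current in enumerate(tokens):
        for distance in range(1, window):
            j = i - distance
            if j < 0:
                break  # no earlier tokens left to pair with
            features.append((tokens[j], distance, current))
    return features
```

For example, osb_features("buy cheap pills now") yields six features, including ("buy", 3, "now"), which a plain bigram model would never see.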
| 
| 
| OSBF-Lua learns very fast. It only requires Lua 5.1, with dynamic loading
| enabled, to be installed on the Exim server.
| See the installation docs: http://osbf-lua.luaforge.net/#installation
| 
| 
| In exim.conf I added these statements:
| 
| In the ## MAIN CONFIGURATION SETTINGS ## section:
| 
| # Set OSBF_LUA_DIR to the directory where spamfilter.lua,
| # spamfilter_command.lua, etc. were installed
| OSBF_LUA_DIR=/usr/local/osbf-lua
| 
| 
| In the ## TRANSPORTS CONFIGURATION ## section, add a transport_filter to the
| local_delivery transport:
| 
| local_delivery:
|    driver = appendfile
|    check_string = ""
|    create_directory
|    delivery_date_add
|    directory = ${home}/Maildir/
|    directory_mode = 700
|    envelope_to_add
|    return_path_add
|    group = mail
|    maildir_format
|    maildir_tag = ,S=$message_size
|    message_prefix = ""
|    message_suffix = ""
|    mode = 0600
|    quota = ${lookup{$local_part}lsearch*{/etc/mail/quota_usr}{$value}{4M}}
|    quota_size_regex = S=(\d+)$
|    quota_warn_threshold = 75%
|    transport_filter = OSBF_LUA_DIR/spamfilter.lua --udir $home/osbf-lua
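Exim runs the transport_filter command, writes the message to its standard input, and delivers whatever the command writes to its standard output instead of the original. The sketch below is a hypothetical stand-in for spamfilter.lua that shows only that plumbing: it does no classification at all and simply prefixes the Subject header with a tag given on the command line. The script and its single argument are assumptions for illustration.

```python
import sys


def tag_subject(message: bytes, tag: bytes) -> bytes:
    """Prefix the first Subject: header with tag; never touch the body."""
    if not tag:
        return message  # empty tag: deliver the message unchanged
    lines = message.split(b"\n")
    for i, line in enumerate(lines):
        if line.rstrip(b"\r") == b"":
            break  # blank line: end of the header section
        if line.lower().startswith(b"subject:"):
            value = line[len(b"subject:"):].lstrip()
            lines[i] = b"Subject: " + tag + b" " + value
            break
    return b"\n".join(lines)


if __name__ == "__main__":
    # Exim writes the message to stdin and delivers what we write to stdout.
    tag = sys.argv[1].encode() if len(sys.argv) > 1 else b""
    sys.stdout.buffer.write(tag_subject(sys.stdin.buffer.read(), tag))
```

It could be wired in with something like: transport_filter = /usr/bin/python3 /usr/local/bin/tag_subject.py [-] (hypothetical path). Working on bytes rather than text avoids decoding errors on 8-bit mail.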
| 
| 
| that's it!! :)
| 
| 
| Verify your setup by sending a message to yourself with the following
| subject line: help <your password>
| 
| You will receive a message with help about spamfilter.
| 
| To verify that the databases were created correctly, send: stats <your password>
| 
| From now on, all messages you receive will be classified and tagged
| according to the score they get:
| 
| Tag    Meaning
| 
| [--]   almost sure it's spam: score <= -20
| [-]    probably spam (reinforcement zone): -20 < score < 0
| [+]    probably not spam (reinforcement zone): 0 <= score < 20
| [++]   almost sure it's not spam: score >= 20. This tag exists only for
|        symmetry and is never used; an empty tag is used instead so as not
|        to pollute the messages.
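The tagging rule in the table is a simple threshold function of the score. A minimal sketch, with the -20, 0 and 20 thresholds hard-coded from the table (the real filter may take them from its configuration):

```python
def osbf_tag(score: float) -> str:
    """Map a classification score to the subject tag from the table."""
    if score <= -20:
        return "[--]"  # almost sure it's spam
    if score < 0:
        return "[-]"   # probably spam (reinforcement zone)
    if score < 20:
        return "[+]"   # probably not spam (reinforcement zone)
    return ""          # [++] is never shown; an empty tag is used instead
```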
| 
| 
| If the classification is wrong, you must train the filter by sending the
| message back to yourself, with the subject replaced by the corresponding
| training command:
| 
| learn <password> spam or learn <password> nonspam 
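The subject-line commands described in this message (help, stats and learn) follow a simple word-based syntax. This hypothetical parser only illustrates that syntax; it is not how spamfilter.lua parses commands, and it does not check the password.

```python
from typing import NamedTuple, Optional


class Command(NamedTuple):
    name: str                     # "help", "stats" or "learn"
    password: str
    target: Optional[str] = None  # "spam" or "nonspam" for "learn"


def parse_command(subject: str) -> Optional[Command]:
    """Return the command in a Subject line, or None for ordinary mail."""
    words = subject.split()
    if len(words) == 2 and words[0] in ("help", "stats"):
        return Command(words[0], words[1])
    if (len(words) == 3 and words[0] == "learn"
            and words[2] in ("spam", "nonspam")):
        return Command("learn", words[1], words[2])
    return None
```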
| 
| 
| After you train it on a few messages, OSBF-Lua's spam detection accuracy
| improves.
| If you have a database of pre-classified messages (spam / nonspam) in an
| IMAP folder, you can use the toer.lua script to do the training.


This doesn't look like a good solution. We simply don't want to accept
the message at all, if that is possible. Of course I know it's possible
with Exim, but this approach still leans towards SpamAssassin-ism: the
message is accepted first and only tagged at delivery time. If this could
be integrated within the Exiscan framework, then I'd rethink my stance.


        cheers
       - wash 
+----------------------------------+-----------------------------------------+
Odhiambo Washington                    . WANANCHI ONLINE LTD (Nairobi, KE)  |
wash () WANANCHI ! com            . 1ere Etage, Loita Hse, Loita St.,  |
GSM: (+254) 722 743 223            . # 10286, 00100 NAIROBI             |
GSM: (+254) 733 744 121            . (+254) 020 313 985 - 9             |
+---------------------------------+------------------------------------------+
"Oh My God! They killed init! You Bastards!"  
                         --from a /. post