Re: [exim] SPAM Filtering - Losing the war!

Author: Odhiambo G. Washington
Date:
To: exim-users
Subject: Re: [exim] SPAM Filtering - Losing the war!

* On 23/10/06 21:16 -0200, Marlon Cabrera Oliveira wrote:
| Hi,
|
| > To tell you the truth I'm losing ground lately against spammers. Two
| > reasons. The image spam is getting through, and because it poisons the
| > Bayes database I've lost much of the effectiveness of Bayes filtering.
| > I'm still holding on, but people I have hosted for over a year who
| > never had a single spam are now getting a few. I am also having a
| > few more false positives than I used to.
|
|
| I'm having success here detecting image spam using the OSBF-Lua filter.
|
| From the OSBF-Lua website:
|
| "OSBF-Lua (Orthogonal Sparse Bigrams with confidence Factor) is a Lua C module
| for text classification. It is a port of the OSBF classifier implemented in
| the CRM114 project. This implementation attempts to put focus on the
| classification task itself by using Lua as the scripting language, a powerful
| yet light-weight and fast language, which makes it easier to build and test
| more elaborated filters and training methods.
|
| The OSBF algorithm is a typical Bayesian classifier but enhanced with two
| techniques that I originally developed for the CRM114 project: Orthogonal
| Sparse Bigrams - OSB, for feature extraction, and the Exponential
| Differential Document Count - EDDC (a.k.a Confidence Factor) for automatic
| feature selection. Combined, these two techniques produce a highly accurate
| classifier. OSBF was developed focused on two classes, SPAM and NON-SPAM, so
| the performance for more than two classes may not be the same."
|
|
| OSBF-Lua learns very fast. It only requires Lua 5.1 installed on the Exim
| server, with dynamic loading enabled. See the installation docs:
| http://osbf-lua.luaforge.net/#installation
|
|
| In exim.conf I added these statements.
|
| In the ## MAIN CONFIGURATION SETTINGS ## section:
|
| # set OSBF_LUA_DIR to where spamfilter.lua, spamfilter_command.lua etc. were
| # installed
| OSBF_LUA_DIR=/usr/local/osbf-lua
|
|
| In the ## TRANSPORTS CONFIGURATION ## section, add a transport_filter to
| the local_delivery transport:
|
| local_delivery:
|   driver = appendfile
|   check_string = ""
|   create_directory
|   delivery_date_add
|   directory = ${home}/Maildir/
|   directory_mode = 700
|   envelope_to_add
|   return_path_add
|   group = mail
|   maildir_format
|   maildir_tag = ,S=$message_size
|   message_prefix = ""
|   message_suffix = ""
|   mode = 0600
|   quota = ${lookup{$local_part}lsearch*{/etc/mail/quota_usr}{$value}{4M}}
|   quota_size_regex = S=(\d+)$
|   quota_warn_threshold = 75%
|   transport_filter = OSBF_LUA_DIR/spamfilter.lua --udir $home/osbf-lua
|
|
| That's it!! :)
|
|
| Verify your setup by sending a message to yourself with the following in
| the subject line: help <your password>
|
| You will receive a message with help about spamfilter.
|
| To verify that the databases were created correctly: stats <your password>
|
| From now on, all messages that you receive will be classified and tagged
| according to the score they get:
|
| Tag    Meaning
|
| [--]   almost sure it's spam - score <= -20
|
| [-]    probably spam (reinforcement zone) - score < 0 and > -20
|
| [+]    probably not spam (reinforcement zone) - score >= 0 and < 20
|
| [++]   almost sure it's not spam - score >= 20. This tag is here just for
|        symmetry; it's not used. An empty tag is used in its place so as
|        not to pollute the messages.
|
|
| If the classification is wrong, you must train the filter by sending the
| message back to yourself, replacing the subject with the corresponding
| training command:
|
| learn <password> spam   or   learn <password> nonspam
|
|
| After a few messages have been trained, OSBF-Lua's spam detection
| accuracy will increase.
|
| If you have a database of pre-classified messages (nonspam / spam) in an
| IMAP folder, you can use the script toer.lua to do the training.
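A minimal sketch of the tagging rules and training subjects quoted above, written in Python purely for illustration. It is not part of OSBF-Lua (which is Lua), and the helper names osbf_tag, in_reinforcement_zone and training_subject are hypothetical. The thresholds are taken directly from the quoted tag table.

```python
# Hypothetical helpers, NOT part of OSBF-Lua: they only restate the tag
# thresholds and training-command format described in the quoted post.


def osbf_tag(score: float) -> str:
    """Return the subject tag for an OSBF-Lua score, per the quoted table."""
    if score <= -20:
        return "[--]"  # almost sure it's spam
    if score < 0:
        return "[-]"   # probably spam (reinforcement zone)
    if score < 20:
        return "[+]"   # probably not spam (reinforcement zone)
    return ""          # [++] exists only for symmetry; an empty tag is used


def in_reinforcement_zone(score: float) -> bool:
    """True for scores strictly between -20 and 20, i.e. the [-] and [+] tags."""
    return -20 < score < 20


def training_subject(password: str, label: str) -> str:
    """Build the 'learn <password> spam|nonspam' subject used for training."""
    if label not in ("spam", "nonspam"):
        raise ValueError("label must be 'spam' or 'nonspam'")
    return f"learn {password} {label}"


if __name__ == "__main__":
    for s in (-35.0, -5.2, 0.0, 12.7, 40.0):
        print(s, repr(osbf_tag(s)), in_reinforcement_zone(s))
    print(training_subject("secret", "spam"))
```

Note that a score of exactly 0 is tagged [+], and a score of exactly -20 is tagged [--], matching the boundaries given in the table.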

This doesn't look like a good solution to me. We simply don't want to
accept the message at all, if that is possible. Of course I know it's
possible with Exim, but this approach still leans towards SpamAssassin-ism ....
If this could be integrated into the Exiscan framework, then I'd rethink
my stand.

        cheers
       - wash 
+----------------------------------+-----------------------------------------+
Odhiambo Washington                    . WANANCHI ONLINE LTD (Nairobi, KE)  |
wash () WANANCHI ! com            . 1ere Etage, Loita Hse, Loita St.,  |
GSM: (+254) 722 743 223            . # 10286, 00100 NAIROBI             |
GSM: (+254) 733 744 121            . (+254) 020 313 985 - 9             |
+---------------------------------+------------------------------------------+
"Oh My God! They killed init! You Bastards!"  
                         --from a /. post
