Hi,
> To tell you the truth I'm losing ground lately against spammers. Two
> reasons. The Image spam is getting through and because it poisons the
> bayes I've lost much of the effectiveness of bayes filtering. I'm still
> holding on but I've had people who I hosted for for over a year who
> never had a single spam who are now getting a few. I am also having a
> few more false positives than I used to.
I'm having success here detecting image spam with the OSBF-Lua filter.
From the OSBF-Lua website:
"OSBF-Lua (Orthogonal Sparse Bigrams with confidence Factor) is a Lua C module
for text classification. It is a port of the OSBF classifier implemented in
the CRM114 project. This implementation attempts to put focus on the
classification task itself by using Lua as the scripting language, a powerful
yet light-weight and fast language, which makes it easier to build and test
more elaborated filters and training methods.
The OSBF algorithm is a typical Bayesian classifier but enhanced with two
techniques that I originally developed for the CRM114 project: Orthogonal
Sparse Bigrams - OSB, for feature extraction, and the Exponential
Differential Document Count - EDDC (a.k.a Confidence Factor) for automatic
feature selection. Combined, these two techniques produce a highly accurate
classifier. OSBF was developed focused on two classes, SPAM and NON-SPAM, so
the performance for more than two classes may not be the same."
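To give a rough idea of what the OSB feature extraction mentioned in the quote does, here is a minimal sketch in Python. It is not the OSBF-Lua code: the whitespace tokenizer, the window of 5 tokens, and the function name osb_features are my assumptions for illustration, and the real module also hashes the features and applies the confidence factor, which is not shown here.

```python
def osb_features(text, window=5):
    """Return orthogonal sparse bigrams for a whitespace-tokenized text.

    Each feature is (earlier_token, gap, current_token), where gap is the
    number of tokens skipped between the two tokens of the pair.
    The window size of 5 is an assumption for this sketch.
    """
    tokens = text.split()
    features = []
    for i, current in enumerate(tokens):
        # Pair the current token with each of the (window - 1) tokens before it.
        for distance in range(1, window):
            j = i - distance
            if j < 0:
                break  # ran off the start of the text
            features.append((tokens[j], distance - 1, current))
    return features


for feature in osb_features("buy cheap pills online now"):
    print(feature)
```

Each token is paired only with earlier tokens inside the window, so a phrase still produces matching features when a spammer inserts or changes a word in the middle of it.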
OSBF-Lua learns very fast. It only requires Lua 5.1, with dynamic loading
enabled, to be installed on the Exim server.
See the installation doc:
http://osbf-lua.luaforge.net/#installation
In exim.conf I added these statements.
Under ## ON CONFIGURATION SETTINGS ##:
# Set OSBF_LUA_DIR to the directory where spamfilter.lua,
# spamfilter_command.lua, etc. were installed.
OSBF_LUA_DIR=/usr/local/osbf-lua
Under ## TRANSPORTS CONFIGURATION ##, add transport_filter to the
local_delivery transport:
local_delivery:
  driver = appendfile
  check_string = ""
  create_directory
  delivery_date_add
  directory = ${home}/Maildir/
  directory_mode = 700
  envelope_to_add
  return_path_add
  group = mail
  maildir_format
  maildir_tag = ,S=$message_size
  message_prefix = ""
  message_suffix = ""
  mode = 0600
  quota = ${lookup{$local_part}lsearch*{/etc/mail/quota_usr}{$value} {4M}}
  quota_size_regex = S=(\d+)$
  quota_warn_threshold = 75%
  transport_filter = OSBF_LUA_DIR/spamfilter.lua --udir $home/osbf-lua
That's it! :)
To verify the setup, send a message to yourself with the following in the
subject line: help <your password>
You will receive a reply with help on spamfilter.
To verify that the databases were created correctly, send: stats <your password>
From now on, all messages that you receive will be classified and tagged
according to the score they get:
Tag    Meaning
[--]   almost sure it's spam - score <= -20
[-]    probably spam (reinforcement zone) - score < 0 and > -20
[+]    probably not spam (reinforcement zone) - score >= 0 and < 20
[++]   almost sure it's not spam - score >= 20. This tag is listed only for
       symmetry and is not used. An empty tag is used in its place so as
       not to pollute the messages.
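The thresholds in the table can be written out as a small function. This is only a Python sketch of the mapping described above, not code taken from spamfilter.lua; the name score_to_tag is made up for the example.

```python
def score_to_tag(score):
    """Map a classifier score to the subject tag described in the table."""
    if score <= -20:
        return "[--]"  # almost sure it's spam
    if score < 0:
        return "[-]"   # probably spam (reinforcement zone)
    if score < 20:
        return "[+]"   # probably not spam (reinforcement zone)
    return ""          # "[++]" is not used; an empty tag takes its place


for score in (-35.2, -4.0, 7.5, 42.0):
    print(score, repr(score_to_tag(score)))
```

Messages tagged [-] or [+] fall in the reinforcement zone, so those are the ones most worth checking and training on.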
If the classification is wrong, you must train the filter by sending the
message back to yourself, replacing the subject with the corresponding
training command:
learn <password> spam or learn <password> nonspam
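For illustration, the subject-line commands mentioned in this message (help, stats, learn) have the following shape. This Python sketch is hypothetical: the real parsing is done inside spamfilter.lua and may differ, and the function name parse_command and its return values are my own.

```python
def parse_command(subject, password):
    """Parse a subject line of the form '<command> <password> [class]'.

    Returns (command, argument) for a recognized command, or None if the
    subject is not a valid command or the password does not match.
    Hypothetical helper, not part of OSBF-Lua.
    """
    parts = subject.split()
    if len(parts) < 2 or parts[1] != password:
        return None
    command, args = parts[0], parts[2:]
    if command in ("help", "stats") and not args:
        return (command, None)
    if command == "learn" and len(args) == 1 and args[0] in ("spam", "nonspam"):
        return ("learn", args[0])
    return None


print(parse_command("learn secret spam", "secret"))
print(parse_command("stats secret", "secret"))
print(parse_command("learn wrong spam", "secret"))
```

The password sits right after the command word, so an ordinary message whose subject happens to start with "learn" or "help" is not treated as a command.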
After you train it on a few messages, OSBF-Lua's spam detection accuracy
will improve.
If you have a database of pre-classified messages (nonspam / spam) in an IMAP
folder, you can use the script toer.lua to do the training.
Regards,
Marlon