[Exim] Spam test to detect misspelled words

Author: Marc Perkel
Date:  
To: exim-users, spamassassin-devel
Subject: [Exim] Spam test to detect misspelled words
This test is a hybrid between Exim and Spam Assassin. SA doesn't have
the ability to derive regular expressions from a list like Exim does. I
wish it did.

The test has been very accurate and effective for me. It catches virtually
all misspelled variants of the words in a list but ignores those words when
they are spelled correctly. So you can use "diploma" in the subject but not
"d1pl.0_ma". I have seen no false positives, and very few spams of this
nature go uncaught. It catches a very wide variety of deliberately
misspelled words.

I also request that some Spam Assassin programmer code this into SA so
that people using other mailers can take advantage of it.

Feedback is appreciated.

###################################################
# This filter tests for misspelled words using punctuation
# y0ung g!rls - but not young girls

# First I separate real words by changing the space after each one into X,
# so that removing spaces later doesn't create prohibited words by joining
# unrelated words. It keeps phrases like "this alert" from
# becoming "thi[sale]rt". Any space that follows 4 or more characters from a-z
# is considered to be a hard space as opposed to gappy text.

headers add "X-Temp1: ${sg{${lc:$h_subject:${substr_0_150:$message_body}}}\
{\\N([a-z]{4,}) \\N}{\\N$1X\\N}}"

# Then we remove all properly spelled words from the subject and store it
# in X-Temp2 leaving only deliberately misspelled words.
# I use Z as a word separator when removing a word so that words
# running together don't form other words in the list.

headers add "X-Temp2: ${sg{$h_X-Temp1:}\
{\x28${sg{${sg{${sg{${readfile{/etc/exim/lists/misspell}{|}}}{\\\\|+}{|}}}{#.*?\\\\|}{}}}{\\\\|\\$}{}}\x29}{Z}}"

# Then we translate characters back the way spammers substitute them
# (0 -> o, 1 -> i, ! -> i) and delete spaces and punctuation, which
# corrects the spelling.

headers add "X-Temp3: ${sg{${tr{$h_X-Temp2:}\
{àáâãäåèéëìíîïòóôõöùúûüýÿñ×@1!03\\$#-:_*,.%^~`;|/}\
{aaaaaaeeeiiiiooooouuuuyynxaiioes              }}}{ }{}}"


# We then test it again to see if the prohibited words reappear after
# character translation and removal of junk characters. If so - it's spam.
# The new header is the flag indicating a positive match which is
# passed on to Spam Assassin for scoring.

if "$h_X-Temp3:" matches \
\x28${sg{${sg{${sg{${readfile{/etc/exim/lists/misspell}{|}}}{\\|+}{|}}}{#.*?\\|}{}}}{\\|\$}{}}\x29
then
headers add "X-Temp-Misspell: YES"
endif

# Finally - we get rid of headers used for temporary variables.

headers remove "X-Temp1:"
headers remove "X-Temp2:"
headers remove "X-Temp3:"
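
For anyone who wants to follow the logic without an Exim install, here is a
rough Python sketch of the same four steps. It is not the filter itself: the
function name, the short word list, and the reduced substitution table are
assumptions made for illustration, and the accented-character mappings are
left out.

```python
import re

# Hypothetical sample list; the real filter reads /etc/exim/lists/misspell.
SAMPLE_WORDS = ["diploma", "young", "girls", "sale"]

# Subset of the filter's tr table: substitute characters map back to
# letters, and the listed punctuation plus spaces are deleted.
TRANSLATE = str.maketrans("@1!03$", "aiioes", "#-:_*,.%^~`;|/ ")


def looks_misspelled(text, words=SAMPLE_WORDS):
    """Return True if a listed word appears only after de-obfuscation."""
    word_re = re.compile("(" + "|".join(re.escape(w) for w in words) + ")")

    # Step 1 (X-Temp1): lowercase, then mark the space after any run of
    # 4 or more letters as a hard word boundary by turning it into X.
    temp1 = re.sub(r"([a-z]{4,}) ", r"\1X", text.lower())

    # Step 2 (X-Temp2): replace correctly spelled listed words with Z.
    temp2 = word_re.sub("Z", temp1)

    # Step 3 (X-Temp3): undo character substitutions, drop junk and spaces.
    temp3 = temp2.translate(TRANSLATE)

    # Step 4: a listed word that shows up now was deliberately disguised.
    return word_re.search(temp3) is not None


print(looks_misspelled("d1pl.0_ma"))     # True
print(looks_misspelled("diploma"))       # False
print(looks_misspelled("y0ung g!rls"))   # True
print(looks_misspelled("young girls"))   # False
print(looks_misspelled("this alert"))    # False
```

The last call shows why the X marker matters: without step 1, "this alert"
would collapse to "thisalert" and match "sale".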

-----------------------

Spam Assassin Rule

header MISSPELL          ALL =~ /X-Temp-Misspell/
describe MISSPELL        Words with gaps, punctuation, substitute chars
score MISSPELL 8


-----------------

Part of my list of words I test for:

adult
adv
assistence
attract
auction
banned
best
bitch
business
cable
cards
cartriges
cash
casino
celeb
cheap
click
credit
cunt
debt
dick
digital
diploma
discount
doctor
dollar
domain
drug
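
The filter turns a list file like this into one alternation such as
(adult|adv|...) by reading the file with "|" in place of each newline and
then cleaning up the result with three substitutions. A small Python sketch
of that cleanup is below. The function name and the sample text are made up
for illustration, and it assumes the file does not start with a blank line
(which would leave an empty alternative that matches everything).

```python
import re


def build_alternation(list_text):
    """Mimic the readfile + nested sg expansion used in the filter."""
    # readfile with an end-of-line string of "|": newlines become "|".
    joined = list_text.replace("\n", "|")
    # Collapse runs of "|" left behind by blank lines.
    joined = re.sub(r"\|+", "|", joined)
    # Drop comments: from "#" up to and including the next "|".
    joined = re.sub(r"#.*?\|", "", joined)
    # Remove the trailing "|" produced by the final newline.
    joined = re.sub(r"\|$", "", joined)
    # Wrap in parentheses, as the filter does with \x28 and \x29.
    return "(" + joined + ")"


sample = "adult\nadv\n\n# comment line\ncash\ncasino\n"
print(build_alternation(sample))  # (adult|adv|cash|casino)
```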