This test is a hybrid of Exim and SpamAssassin. SA can't build a regular
expression from a word list the way Exim can. I wish it could.
The test is extremely accurate and effective. It catches virtually all
disguised spellings of the words in a list but ignores those words when
they are spelled correctly, so you can use "diploma" in the subject but
not "d1pl.0_ma". I have seen no false positives, and very few spams of
this kind get through. It catches a very wide variety of deliberately
misspelled words.
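The core trick is undoing look-alike characters and then joining gappy text. Here is a minimal Python sketch of just that normalization step, for illustration only (it is not part of the Exim setup). The character table is copied from the ${tr} in the filter; the names SRC, DST and normalize are made up for this example, and the input is assumed to be lower-case already, as the filter's ${lc} ensures.

```python
# Illustration only (Python, not Exim): the character table is copied from
# the filter's ${tr}; SRC, DST and normalize are hypothetical names.
SRC = "àáâãäåèéëìíîïòóôõöùúûüýÿñ×@1!03$#-:_*,.%^~`;|/"
DST = "aaaaaaeeeiiiiooooouuuuyynxaiioes "

# Exim's ${tr} repeats the last replacement character when the third
# argument is shorter, so all leftover punctuation also becomes a space.
DST_PADDED = DST + DST[-1] * (len(SRC) - len(DST))
TABLE = str.maketrans(SRC, DST_PADDED)


def normalize(text: str) -> str:
    """Map look-alike characters to letters, then delete every space.

    Assumes the text is already lower-case, as the filter's ${lc} ensures.
    """
    return text.translate(TABLE).replace(" ", "")


print(normalize("d1pl.0_ma"))    # diploma
print(normalize("y0ung g!rls"))  # younggirls
print(normalize("c@sin0"))       # casino
```

On its own this step would also turn a correctly spelled "diploma" into a match, which is why the filter first removes correctly spelled list words before normalizing.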
I'd also like to see a SpamAssassin developer build this into SA so
that people using other mailers can take advantage of it.
Feedback is appreciated.
###################################################
# This filter tests for words disguised with gaps, punctuation or
# substitute characters, e.g. y0ung g!rls - but not young girls.
# The text tested is the subject plus the first 150 characters of the body.
# First, separate real words by changing the space after each one into X,
# so that when spaces are removed later, prohibited words aren't created
# by joining unrelated words. This keeps phrases like "this alert" from
# becoming "thi[sale]rt". Any space immediately preceded by 4 or more
# letters (a-z) is treated as a hard space, as opposed to gappy text.
headers add "X-Temp1: ${sg{${lc:$h_subject:${substr_0_150:$message_body}}}\
{\\N([a-z]{4,}) \\N}{\\N$1X\\N}}"
# Then we replace every correctly spelled list word with Z and store the
# result in X-Temp2, so that only deliberately misspelled words are left
# to match later. I use Z as a separator when removing a word so that the
# words on either side don't run together and form another word in the list.
headers add "X-Temp2: ${sg{$h_X-Temp1:}\
{\x28${sg{${sg{${sg{${readfile{/etc/exim/lists/misspell}{|}}}{\\\\|+}{|}}}{#.*?\\\\|}{}}}{\\\\|\\$}{}}\x29}{Z}}"
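# Then we translate look-alike characters back into the letters they stand
# for (0->o, 1->i, !->i, @->a, 3->e, $->s, accented letters to plain ones),
# turn the remaining punctuation into spaces and delete all spaces, which
# restores the intended spelling.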
headers add "X-Temp3: ${sg{${tr{$h_X-Temp2:}\
{àáâãäåèéëìíîïòóôõöùúûüýÿñ×@1!03\\$#-:_*,.%^~`;|/}\
{aaaaaaeeeiiiiooooouuuuyynxaiioes }}}{ }{}}"
# We then test the result again to see whether prohibited words reappear
# after character translation and removal of junk characters. If so, it's
# spam. The new header is the flag indicating a positive match, which is
# passed on to SpamAssassin for scoring.
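# Capitalised "Matches" is case-sensitive (lower-case "matches" is not),
# so the upper-case X and Z markers can't be read as letters of a word.
if "$h_X-Temp3:" Matches \
\x28${sg{${sg{${sg{${readfile{/etc/exim/lists/misspell}{|}}}{\\|+}{|}}}{#.*?\\|}{}}}{\\|\$}{}}\x29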
then
headers add "X-Temp-Misspell: YES"
endif
# Finally, we remove the headers used as temporary variables.
headers remove "X-Temp1:"
headers remove "X-Temp2:"
headers remove "X-Temp3:"
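For experimenting outside Exim, the four steps of the filter can be sketched in Python as below. This is an illustration under assumptions, not a drop-in replacement: the word list is a small hypothetical sample ("sale" is included only to reproduce the "this alert" case), and is_misspell_spam is a made-up name. Each step is commented with the temporary header it stands in for.

```python
import re

# Hypothetical sample list; the filter reads /etc/exim/lists/misspell.
# "sale" is included only to reproduce the "this alert" case.
WORDS = ["cash", "casino", "diploma", "sale"]
LIST_RE = re.compile("(" + "|".join(WORDS) + ")")

# Same ${tr} table as the filter; padding with spaces mimics Exim
# repeating the last replacement character.
SRC = "àáâãäåèéëìíîïòóôõöùúûüýÿñ×@1!03$#-:_*,.%^~`;|/"
DST = "aaaaaaeeeiiiiooooouuuuyynxaiioes "
TABLE = str.maketrans(SRC, DST.ljust(len(SRC)))


def is_misspell_spam(subject: str, body: str = "") -> bool:
    # X-Temp1: lower-case the subject plus the first 150 body characters,
    # and turn every space that follows 4+ letters into a hard "X".
    temp1 = re.sub(r"([a-z]{4,}) ", r"\1X", (subject + body[:150]).lower())
    # X-Temp2: replace correctly spelled list words with "Z".
    temp2 = LIST_RE.sub("Z", temp1)
    # X-Temp3: map look-alike characters to letters, then drop all spaces.
    temp3 = temp2.translate(TABLE).replace(" ", "")
    # Any list word that shows up now must have been disguised.
    return LIST_RE.search(temp3) is not None


print(is_misspell_spam("Get your d1pl.0_ma today"))  # True
print(is_misspell_spam("Your diploma is ready"))     # False
print(is_misspell_spam("this alert"))                # False
```

Dropping the first re.sub makes "this alert" collapse to "thisalert", which contains "sale"; that is exactly the case the X marker guards against. The final search in this sketch is case-sensitive, so the upper-case X and Z markers can never become letters of a list word.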
-----------------------
SpamAssassin rule:
header MISSPELL ALL =~ /X-Temp-Misspell/
describe MISSPELL Words with gaps punctuation substitute chars
score MISSPELL 8
-----------------
Part of my word list (/etc/exim/lists/misspell, one word per line):
adult
adv
assistence
attract
auction
banned
best
bitch
business
cable
cards
cartriges
cash
casino
celeb
cheap
click
credit
cunt
debt
dick
digital
diploma
discount
doctor
dollar
domain
drug
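The three nested ${sg} calls around ${readfile} in the filter turn this file into a single alternation such as (adult|adv|...). Here is a Python sketch of the same transformation, assuming one word per line with optional comment lines starting with #; build_list_regex is a hypothetical name used only for illustration.

```python
import re


def build_list_regex(file_text: str) -> str:
    """Mirror the filter's ${readfile}/${sg} chain that turns the word
    list into one regular-expression alternation."""
    # ${readfile{...}{|}}: every newline becomes "|".
    joined = file_text.replace("\n", "|")
    # Collapse runs of "|" left by blank lines.
    joined = re.sub(r"\|+", "|", joined)
    # Drop comments: from "#" up to and including the next "|".
    joined = re.sub(r"#.*?\|", "", joined)
    # Drop the trailing "|" left by the final newline.
    joined = re.sub(r"\|$", "", joined)
    return "(" + joined + ")"


pattern = build_list_regex("# drugs and money\ncash\ncasino\n\ndrug\n")
print(pattern)                             # (cash|casino|drug)
print(build_list_regex("\ncash\n"))        # (|cash)
```

Tracing the chain this way shows one thing to watch for: a blank first line in the file leaves an empty alternative, as in (|cash), and an empty alternative matches every message.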