I think I have a really good spam-catching trick, and I'm testing it. I'm
going to throw it out here and see what everyone thinks. It really uses
the power of Exim filters.
This trick catches deliberately misspelled words in the subject line of spam.
Examples: v!agra g1rls disc0unt pen1s b_i_t_c_h
Here's how I do it. I have a list of words in a flat file that people
might try to misspell. The words are spelled correctly in this file. The
file looks like this:
discount
free
girl
incest
lolita
lowest
mortgage
order
porn
rape
removed
Then - I read the file and do text substitution on the subject line and
create a new temporary header with all the correctly spelled words removed.
Remember - we only want to catch deliberately misspelled words.
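
For anyone who wants to try this step outside Exim, here is a rough Python sketch of the same idea. It is only a model of the logic, not Exim syntax. The names WORDS, build_pattern and strip_correct_words are made up for illustration, and the word list is a small sample held in memory rather than read from the file.

```python
import re

# Sample subset of the word list (assumption: the real list lives in a flat file).
WORDS = ["discount", "free", "girl", "order", "porn"]

def build_pattern(words):
    """Join the words into one alternation such as (discount|free|girl)."""
    return "(" + "|".join(re.escape(w) for w in words) + ")"

def strip_correct_words(subject, words):
    """Delete every correctly spelled listed word, leaving everything else."""
    return re.sub(build_pattern(words), "", subject)

print(repr(strip_correct_words("free disc0unt on your order", WORDS)))
```

The properly spelled "free" and "order" disappear, while "disc0unt" survives untouched for the next step to look at.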
Then - I do text substitution again, creating a second temporary header.
This time I read the first header, get rid of spaces, punctuation and
garbage characters, and translate characters the way spammers misspell
words: 0-o, 1-i, !-i, 3-e, foreign characters, etc. In other words - I
correct the spelling by character substitution.
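
The character substitution can be modelled in Python roughly like this. Again, this is a sketch and not the filter itself: TABLE and normalize are hypothetical names, and the character sets are copied from the examples in this post, so a real list would probably be longer.

```python
# Look-alike characters and the letters they stand for (from the post's examples),
# plus spaces and punctuation that are simply deleted.
TABLE = str.maketrans(
    "äè@1!03",          # what the spammer types
    "aeaiioe",          # what was meant
    " -:_*,.%^~`;|/",   # noise to drop entirely
)

def normalize(text):
    """Undo look-alike substitutions and strip spaces and punctuation."""
    return text.translate(TABLE)

print(normalize("v!agra g1rls disc0unt b_i_t_c_h"))
```

Every character in the first string maps to the character at the same position in the second, and everything in the third string is removed.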
Finally I test the words against the list a second time and then if
there's a match - we have found deliberately misspelled words!
What I do then is erase the temporary headers and set a new one
indicating that I found a deliberately misspelled word, and pass the
message to SpamAssassin for scoring.
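
Putting the three steps together, the whole check can be sketched in Python as below. This is a model for experimenting with word lists, under the assumption of a small sample list and the character sets mentioned above. The function names are invented for illustration, and only the X-Temp-Misspell header name comes from the filter.

```python
import re

WORDS = ["discount", "free", "girl", "order", "porn"]  # sample subset (assumption)
PATTERN = re.compile("(" + "|".join(map(re.escape, WORDS)) + ")")
TABLE = str.maketrans("äè@1!03", "aeaiioe", " -:_*,.%^~`;|/")

def has_disguised_word(subject):
    """True if a listed word shows up only after the spelling is repaired."""
    leftover = PATTERN.sub("", subject)            # step 1: drop correct spellings
    repaired = leftover.translate(TABLE)           # step 2: undo the disguise
    return PATTERN.search(repaired) is not None    # step 3: test against the list again

def misspell_header(subject):
    """Return the marker header to add, or an empty dict if nothing was found."""
    return {"X-Temp-Misspell": "YES"} if has_disguised_word(subject) else {}

print(misspell_header("y0ung g!rls"))
print(misspell_header("young girls"))
```

One thing the sketch makes easy to see: because spaces are deleted in step 2, a listed word can also appear across a word boundary ("door derby" collapses to "doorderby", which contains "order"), so it is probably worth testing a word list against some ordinary subjects before trusting it.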
So - how do I do it? Here's the code. It will get better so look for a
newer version soon!
###################################################
# This filter tests for misspelled words using punctuation
# y0ung g!rls - but not young girls
# First - we remove all properly spelled words from the subject and store it
# in X-Temp1 leaving only deliberately misspelled words
headers add "X-Temp1: ${sg{$h_subject:}{\x28${sg{${sg{${sg{${readfile{/etc/exim/lists/misspell}{|}}}{\\\\|+}{|}}}{#.*?\\\\|}{}}}{\\\\|\\$}{}}\x29}{}}"
# Then we translate characters into other characters the way spammers do
# 0-o 1-i !-i - and spaces and punctuation are deleted, correcting the spelling
headers add "X-Temp2: ${sg{${tr{$h_X-Temp1:}\
{äè@1!03-:_*,.%^~`;|/}{aeaiioe }}}{ }{}}"
# We then test it again to see if the prohibited words reappear - if so
# - it's spam
if "$h_X-Temp2:" matches
\x28${sg{${sg{${sg{${readfile{/etc/exim/lists/misspell}{|}}}{\\|+}{|}}}{#.*?\\|}{}}}{\\|\$}{}}\x29
then
headers add "X-Temp-Misspell: YES"
endif
headers remove "X-Temp1:"
headers remove "X-Temp2:"