[exim] [OT] an automated spam filtering technique

Author: Stanislaw Halik
Date:
To: exim-users
Subject: [exim] [OT] an automated spam filtering technique
Hello,

Although this is off-topic for Exim itself, I've decided to post it,
since I know many dedicated spam fighters read this list.

Many spam messages are sent in large quantities with identical content,
and even when they aren't identical, some phrases appear in many of them.

I've just started publishing my spamtrap addresses, and so far I've
collected around 160 text/plain parts from them.

I searched for 5-word phrases (runs of five consecutive words) that occur
multiple times. Here's what I got (occurrence count, then phrase):

  11    view available updated software from
  11    new software for you click
  11    some new software for you
  11    you click here to view
  11    here to view available updated
  11    for you click here to
  11    to view available updated software
  11    software for you click here
  11    has uploaded some new software
  11    uploaded some new software for
  11    click here to view available
  10    in symbols for i386 i386
  10    number lotto ball number lotto
  10    reading in symbols for i386
  10    ball number lotto ball number
  10    lotto ball number lotto ball
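A minimal sketch of that phrase search in Python, assuming the extracted text/plain parts are already loaded as strings. Two details are assumptions, since the post doesn't specify them: words are lowercased runs of ASCII letters, digits and apostrophes, and each 5-word phrase is counted at most once per part, so the count means "number of parts containing the phrase".

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable


def word_ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the distinct n-word phrases in one text part.

    Words are lowercased runs of ASCII letters, digits and apostrophes
    (an assumed tokenisation).
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def common_phrases(parts: Iterable[str], n: int = 5,
                   min_count: int = 2) -> list[tuple[str, int]]:
    """Return (phrase, count) pairs, most common first.

    count is the number of parts containing the phrase; a phrase repeated
    inside one part is only counted once for that part.
    """
    counts: Counter[tuple[str, ...]] = Counter()
    for text in parts:
        counts.update(word_ngrams(text, n))
    return [(" ".join(gram), c) for gram, c in counts.most_common()
            if c >= min_count]


if __name__ == "__main__":
    demo = [
        "Someone has uploaded some new software for you. Click here!",
        "Hi, X has uploaded some new software for you, click here to view.",
        "Nothing in common with the others.",
    ]
    # Same layout as the table above: count, then phrase.
    for phrase, count in common_phrases(demo):
        print(f"{count:4}    {phrase}")
```

Counting once per part keeps a single message that repeats a phrase many times from inflating its score. Ordering among phrases with equal counts isn't guaranteed.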


I run my spamtrap mail through procmail, which pipes it to mimedump to
extract the text/plain parts. I might write a script that automatically
generates SpamAssassin rulesets from popular spam phrases, then updates
the SA configs the same way I do for other rulesets. It would be fully
automated, with no human intervention needed whatsoever.
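The extraction step is done here with procmail and mimedump. As a swapped-in alternative (not the setup described in this post), Python's standard `email` package can pull the decoded text/plain parts out of a single raw message; the function name `text_plain_parts` is hypothetical.

```python
from __future__ import annotations

import email
import email.policy


def text_plain_parts(raw: bytes) -> list[str]:
    """Return the decoded text/plain parts of one raw message."""
    # policy.default yields an EmailMessage, which provides get_content().
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    parts = []
    # walk() visits the message itself and every nested MIME part.
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            # get_content() undoes the transfer encoding and charset.
            parts.append(part.get_content())
    return parts
```

When called from a procmail pipe, a small wrapper could pass `sys.stdin.buffer.read()` to it and write the results wherever the phrase counter expects them.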

Any comments or criticism? The only vulnerability I can think of is
spammers poisoning the results by including non-spammy phrases, but I
doubt they would all bother doing so to defeat a method devised by one
insignificant spam fighter. Nevertheless, I wouldn't assign extremely
high SA scores, since it's a fully automated process.
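A sketch of the rule generator, assuming (phrase, count) pairs like those in the table above. It emits standard SpamAssassin `body`, `score` and `describe` lines; the `AUTO_PHRASE` prefix and the scoring formula (0.2 per hit, capped at 1.0) are hypothetical choices that keep every automated rule's score low, so one poisoned phrase can't push a message over the threshold on its own.

```python
from __future__ import annotations

import re


def perl_escape(word: str) -> str:
    """Backslash every non-word character so Perl treats it literally.

    Escaping "/" matters because the rule uses /.../ as its delimiter.
    """
    return re.sub(r"(\W)", r"\\\1", word)


def sa_rules(phrases: list[tuple[str, int]], prefix: str = "AUTO_PHRASE",
             per_hit: float = 0.2, max_score: float = 1.0) -> str:
    """Build SpamAssassin body rules from (phrase, count) pairs."""
    lines = []
    for i, (phrase, count) in enumerate(phrases, start=1):
        name = f"{prefix}_{i:03d}"
        # Allow any whitespace run between words, in case a line
        # break falls inside the phrase.
        pattern = r"\s+".join(perl_escape(w) for w in phrase.split())
        # Score grows with the count but never exceeds max_score.
        score = min(max_score, per_hit * count)
        lines.append(f"body     {name}  /{pattern}/i")
        lines.append(f"score    {name}  {score:.2f}")
        lines.append(f"describe {name}  Auto-generated spamtrap phrase ({count} hits)")
    return "\n".join(lines) + "\n"
```

The output could be written to a `.cf` file in the local SA configuration directory (the exact path depends on the installation); running `spamassassin --lint` afterwards is a cheap way to catch a malformed rule before it is loaded.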