I don't think it's off topic at all. That's very interesting. I'm going
to experiment with your list. If you have a longer list I'd be
interested in that too.
Stanislaw Halik wrote:
> Hello,
>
> Although the subject is off-topic for Exim itself, I've decided to post
> it, knowing that many inspired spam fighters read the list.
>
> Many spam messages are distributed in large quantities as identical
> copies, and even when they are not, some expressions appear in many of
> them.
>
> I've just started publishing my spamtraps, and so far I have collected
> around 160 text/plain parts from them.
>
> I searched for 5-word sequences that occur multiple times. Here's
> what I got:
>
> 11 view available updated software from
> 11 new software for you click
> 11 some new software for you
> 11 you click here to view
> 11 here to view available updated
> 11 for you click here to
> 11 to view available updated software
> 11 software for you click here
> 11 has uploaded some new software
> 11 uploaded some new software for
> 11 click here to view available
> 10 in symbols for i386 i386
> 10 number lotto ball number lotto
> 10 reading in symbols for i386
> 10 ball number lotto ball number
> 10 lotto ball number lotto ball
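
The counting step behind a list like the one above could be sketched
roughly as follows in Python. This is not the original poster's script,
only an illustration. Assumptions: words are lowercased runs of letters,
digits and apostrophes, and each phrase is counted at most once per
message. The function names and the min_count parameter are made up for
this example.

```python
import re
from collections import Counter

def five_word_phrases(text, n=5):
    """Return the set of distinct n-word phrases in one message body."""
    # Assumed tokenization: lowercase runs of letters, digits, apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def count_phrases(messages, n=5, min_count=2):
    """Count in how many messages each n-word phrase appears."""
    counts = Counter()
    for body in messages:
        # A set per message, so repeats inside one message count once.
        counts.update(five_word_phrases(body, n))
    return [(p, c) for p, c in counts.most_common() if c >= min_count]
```

Calling count_phrases() on a list of extracted text/plain bodies returns
(phrase, count) pairs, most frequent first. Counting per message rather
than per occurrence is a design choice here: it keeps one very
repetitive message from dominating the list.
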
>
> I run my spamtrap mail through procmail, which passes it to mimedump
> to extract the text/plain parts. I might write a script that generates
> SpamAssassin rulesets from popular spam phrases and then updates the SA
> configs the same way I do for other rulesets. It would be fully
> automated, needing no human intervention whatsoever.
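
Turning a phrase into a SpamAssassin rule might look something like the
sketch below. It relies only on the standard "body", "describe" and
"score" rule directives. Everything else is an assumption made for
illustration: the TRAP_PHRASE_ name prefix, the hash-based rule names,
and the default score of 0.5, which is arbitrary and merely kept low in
line with the caution about automated rules.

```python
import hashlib
import re

def phrase_to_rule(phrase, count, score=0.5):
    """Build one SpamAssassin body rule for a phrase (illustrative only)."""
    # Escape each word and allow any whitespace between words.
    words = [re.escape(w).replace("/", r"\/") for w in phrase.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    # Hypothetical naming scheme: a short hash keeps names unique and brief.
    digest = hashlib.sha1(phrase.encode("utf-8")).hexdigest()[:8].upper()
    name = "TRAP_PHRASE_" + digest
    return "\n".join([
        f"body     {name} /{pattern}/i",
        f"describe {name} Phrase seen in {count} spamtrap messages",
        f"score    {name} {score:.1f}",
    ])
```

Writing the output of phrase_to_rule() for each frequent phrase into a
.cf file gives a ruleset that can be checked with "spamassassin --lint"
before it is deployed.
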
>
> Any comments or criticism? The only vulnerability I can think of is
> spammers poisoning the results with non-spammy phrases, but I don't
> think they would all do so because of a method created by one
> insignificant spam fighter. Nevertheless, I wouldn't assign extremely
> high SA scores, as it's a fully automated process.