No question here. Just some information someone might be able to use in
their spam-fighting arsenal:
I use zen.spamhaus.org in the rcpt ACL as my primary spam filter. This
catches at least 99% of the spam directed at my server. I run
SpamAssassin in the data ACL and reject anything with a score greater
than 10.0. This rejects very nearly all of the remainder without
losing any "ham".
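For anyone curious about what the rcpt ACL check actually does, here is how a DNSBL lookup works. It reverses the octets of the connecting IPv4 address, appends the list's zone, and resolves the result as a hostname. For ZEN, listings come back as addresses in 127.0.0.x (RFC 5782 describes the general convention). The Python sketch below illustrates that lookup. It is not how Exim implements it, and both function names are made up for the example. Spamhaus may refuse queries that arrive via large public resolvers and answer with an error code instead. For that reason the sketch only counts 127.0.0.x answers as a listing.

```python
import ipaddress
import socket

ZONE = "zen.spamhaus.org"


def dnsbl_query_name(ip: str, zone: str = ZONE) -> str:
    """Build the DNSBL query name: IPv4 octets reversed, then the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = ZONE) -> bool:
    """Return True if the zone answers with a 127.0.0.x listing code."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:
        return False  # NXDOMAIN or lookup failure: treat as not listed
    # Anything outside 127.0.0.x (e.g. a resolver-refusal code) is ignored.
    return answer.startswith("127.0.0.")
```

For example, checking 192.0.2.1 would resolve the name 1.2.0.192.zen.spamhaus.org.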
HOWEVER, a couple of specific recipient addresses have been repeatedly
and successfully targeted by so-called "snowshoe" spammers. See:
https://blogs.sophos.com/2014/11/21/snowshoe-spam-is-on-the-rise-what-can-be-done-about-it/
These spammers constantly switch to new IP blocks, where they register
numerous domains that spew out various kinds of spam that is very hard
to detect automatically. One of the Spamhaus ZEN lists catches on to
these spammers relatively quickly and lists their whole IP block. The
spam then stops for a couple of days, but then they just switch to a new
IP block with new domains. In the gap between the switch to the new
block and the ZEN listing, your server gets spammed. Rinse and repeat.
The usual techniques for recognizing spam are fairly useless against
these guys. They are very adept at tailoring their spam to avoid
triggering most SpamAssassin checks, which cover pretty much the whole
gamut of ways an email can be analyzed. The emails typically score below
2.0 in SA, far too low to reliably flag as spam. I was pretty much
resigned to living with this level of spam, given how hard it is to
come up with an automated way to recognize it.
After collecting about a month's worth of these for one recipient
address (from two dozen different spam domains over three IP-block
changes), I spent about two hours looking them over in detail and
noticed an interesting pattern. Each spam email contains several encoded
web links (usually both in plain text and in HTML "a" tags). The URLs
follow a pattern something like the following:
http://spamdomain/something/hash-string/word
or
http://spamdomain/hash-string/word
or
http://spamdomain/word/hash-string
NOTE: "word" is always a common English word of 4 or 5 characters,
different for each URL, and several different ones appear in the same
email. "hash-string" is composed of ASCII alphanumerics, both upper and
lower case, in essentially random order, usually about 30-40 characters
long. Each URL contains a different "hash-string".
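A rough Python sketch of pulling those pieces out of a message body follows. The thresholds are assumptions loosely based on the description above: a hash-string is taken to be a path segment of 25 or more ASCII letters and digits, and a word is 4-5 lowercase letters adjacent to it. `parse_snowshoe_urls` is a hypothetical helper, not part of any existing tool. The same regex picks URLs out of plain text and out of HTML href attributes, because it stops at quotes and angle brackets.

```python
import re
from urllib.parse import urlsplit

# Finds http(s) URLs in plain text and inside href="..." attributes alike,
# stopping at whitespace, quotes, or angle brackets.
URL_RE = re.compile(r"""https?://[^\s"'<>]+""", re.IGNORECASE)

# Assumed shapes: hash-string = long run of ASCII letters/digits,
# word = 4-5 lowercase letters.
HASH_RE = re.compile(r"[A-Za-z0-9]{25,}")
WORD_RE = re.compile(r"[a-z]{4,5}")


def parse_snowshoe_urls(body: str):
    """Yield (domain, hash_string, word) for each URL that fits the pattern."""
    for url in URL_RE.findall(body):
        url = url.rstrip(".,;:!?)")  # drop trailing punctuation from prose
        parts = urlsplit(url)
        segments = [s for s in parts.path.split("/") if s]
        for i, seg in enumerate(segments):
            if HASH_RE.fullmatch(seg):
                # The word follows the hash in two of the layouts and precedes
                # it in the third, so check the next segment, then the previous.
                neighbours = segments[i + 1:i + 2] + segments[max(i - 1, 0):i]
                word = next((n for n in neighbours if WORD_RE.fullmatch(n)), None)
                if word is not None:
                    yield parts.hostname, seg, word
                break  # at most one hash-string per URL
```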
HOWEVER, I noticed that the "hash-string" in emails for the same target
recipient ALWAYS contains the exact same embedded substring. In the
specific case of the ones I saved, that substring is exactly 22
characters long. My theory is that those 22 characters are a hash of the
full recipient email address, and that the remainder of the string is a
hash of "word", plus possibly some other keyword in the text (thus my
term "hash-string"). I understand that this sort of coded URL is common
among snowshoe spammers.
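One way to check the shared-substring observation against a saved batch is a brute-force longest-common-substring search over the hash-strings collected for a single recipient. The sketch below is illustrative only, and the function name is hypothetical. Brute force is fine for a few dozen 30-40 character strings. A much larger corpus would call for a suffix-array or suffix-automaton approach instead.

```python
def longest_common_substring(strings: list[str]) -> str:
    """Return the longest substring shared by every string (brute force)."""
    if not strings:
        return ""
    # Any common substring must fit inside the shortest string.
    shortest = min(strings, key=len)
    # Try the longest candidate lengths first and stop at the first hit.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in strings):
                return candidate
    return ""
```

A result of the expected length (22 characters, in the case above) across all of one recipient's hash-strings would support the theory. A much shorter result would suggest that no fixed per-recipient substring exists.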
As a result, I added the following custom SA rule for this specific
recipient address (where "hash-constant" is the known hash substring
that always appears in URIs in the spam addressed to this recipient):
### snowshoe rule for Fred@???
uri LOCAL_SNOWSHOE_RULE_1 /hash-constant/
score LOCAL_SNOWSHOE_RULE_1 20.0
describe LOCAL_SNOWSHOE_RULE_1 reject known snow shoe spammer
(Add a separate rule, with its own unique name, for each known
hash-constant.)
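Once there is more than one known constant, the rules can be generated rather than typed by hand. The hypothetical Python helper below renders one uri/score/describe triple per constant, mirroring the rule above. It numbers the rule names so that each one stays unique. Its output would go into a site config file such as local.cf. The helper refuses anything that is not plain ASCII letters and digits, because a stray "/" or regex metacharacter would break the /.../ pattern.

```python
def snowshoe_rules(constants: list[str], prefix: str = "LOCAL_SNOWSHOE_RULE",
                   score: float = 20.0) -> str:
    """Render SpamAssassin uri/score/describe lines, one rule per constant."""
    lines = []
    for n, const in enumerate(constants, start=1):
        # Hash-constants should be plain ASCII letters/digits; anything else
        # (e.g. a "/") could break the /.../ pattern, so refuse it.
        if not (const.isascii() and const.isalnum()):
            raise ValueError(f"unexpected characters in {const!r}")
        name = f"{prefix}_{n}"
        lines.append(f"uri {name} /{const}/")
        lines.append(f"score {name} {score:.1f}")
        lines.append(f"describe {name} reject known snow shoe spammer")
    return "\n".join(lines) + "\n"
```

Usage would be something like `print(snowshoe_rules(known_constants))`, where `known_constants` is whatever list of collected hash-constants you maintain.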
All of this spam is now rejected, although they keep sending it. The
rule has no effect on legitimate email, since the likelihood of that
same gibberish appearing in a URL in a legitimate email is essentially
zero.
I realize my SpamAssassin rule technique is probably not practical for a
large server. However, I can envision a script being written to
recognize a recurring pattern in embedded URLs of the type described. I
would also love it if there were an automated way for an email server to
report to Spamhaus each IP that sends one of these emails, as soon as it
is received.
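As a possible starting point for such a script, here is a Python sketch that counts, per recipient, which length-k fragments of the hash-strings recur across most of that recipient's messages. Everything in it is an assumption:

* k=22 comes from the single case above.
* The 80% and 5-message thresholds are arbitrary.
* The input is presumed to be hash-strings already extracted from each email.
* `recurring_fragments` is a hypothetical name.

Its output is a list of candidates to review by hand, not rules to deploy blindly. Legitimate bulk mail that carries tracking tokens could also produce recurring fragments.

```python
from collections import Counter, defaultdict


def recurring_fragments(messages, k=22, min_fraction=0.8, min_messages=5):
    """Suggest length-k fragments that recur across one recipient's mail.

    messages: iterable of (recipient, [hash_string, ...]) pairs, one per email.
    Returns {recipient: [fragment, ...]} for fragments seen in at least
    min_fraction of that recipient's messages, skipping recipients with
    fewer than min_messages messages.
    """
    per_recipient = defaultdict(Counter)
    totals = Counter()
    for rcpt, hashes in messages:
        totals[rcpt] += 1
        # Collect each k-gram once per message, so a hash-string repeated in
        # both the text and HTML parts of one email is not double-counted.
        grams = {h[i:i + k] for h in hashes for i in range(len(h) - k + 1)}
        per_recipient[rcpt].update(grams)
    result = {}
    for rcpt, counts in per_recipient.items():
        if totals[rcpt] < min_messages:
            continue
        needed = min_fraction * totals[rcpt]
        hits = sorted(g for g, c in counts.items() if c >= needed)
        if hits:
            result[rcpt] = hits
    return result
```

If the true shared substring were longer than k, several overlapping fragments would be reported for the same recipient. Those would need to be merged by hand, or by a longest-common-substring pass over the matching hash-strings.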