Author: W B Hacker
Date:
To: exim users
Subject: Re: [exim] Need a little unix help
Marc Perkel wrote:
> Here's what I'm doing. I'm using MyDNS which is a DNS server with a
> MySQL backend and so it is close to real time. I think I figured out how
> to do what I want with the client/server reporting. On the server side
> I'm using xinetd to pipe the incoming text into a perl program and then
> into MySQL. On the client side will be a small script that Exim will run
> using netcat.
>
> What I'm thinking is that the reporting client will send short strings:
>
> ham 1.2.3.4
> spam 5.6.7.8
> honeypot 9.8.7.6
Two fields only? Looks ideal for a lite, fast, cdb format...
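
As an illustration of how little there is to a two-field report line, here is a minimal sketch of validating one on the receiving end. It is in Python, not the Perl that Marc mentions, and the function name `parse_report` and the set of accepted verdicts are assumptions taken from the three example lines, not his actual code.

```python
import ipaddress

# Assumption: only the three verdicts shown in the example lines are valid.
VALID_VERDICTS = {"ham", "spam", "honeypot"}


def parse_report(line):
    """Parse one 'verdict ip' line; return (verdict, ip) or None if malformed."""
    parts = line.split()
    if len(parts) != 2:
        return None
    verdict, addr = parts[0].lower(), parts[1]
    if verdict not in VALID_VERDICTS:
        return None
    try:
        # Rejects anything that is not a syntactically valid IPv4/IPv6 address.
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return None
    return verdict, str(ip)
```

Anything that does not parse is simply dropped, which seems the safer default for a service that accepts text from remote reporters.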
>
> These strings will be processed on my end and update my database. Some of
> it is real time. Some of it is calculated every 5 minutes to update the
> lists.
5 minutes is probably 'near real time' enough for smtp use.
> Servers that send only ham make the whitelist, only spam makes
> the blacklists, and mixed makes the yellowlist. It's working for me
> right now and I'm working on being able to let others read it and a
> select few feed data to it. I'll slowly increase the number of people
> that can use it and see how it scales up. At some point others will see
> how it works and want to do it big scale and do it right.
>
> One important thing to think about is that the idea of the blacklist is
> to be really accurate. But it isn't as much to catch spam, which it will
> do, but to identify ham servers and eliminate false positives. I think
> that this system if used widely enough will have its biggest impact in
> allowing good email to pass through and eliminate false positives for
> banks and other commercial sources that never send spam.
>
>
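
For reference, the white/black/yellow rule Marc describes above (only ham, only spam, mixed) can be sketched in a few lines. This is a hedged illustration, not his implementation: the function name `classify` is hypothetical, and counting a honeypot hit as spam evidence is my assumption, since the quoted text does not say how honeypot reports are weighed.

```python
from collections import defaultdict


def classify(reports):
    """Map each IP to 'white', 'black' or 'yellow' from (verdict, ip) pairs."""
    seen = defaultdict(set)
    for verdict, ip in reports:
        # Assumption: a honeypot hit counts as spam evidence.
        seen[ip].add("ham" if verdict == "ham" else "spam")

    lists = {}
    for ip, kinds in seen.items():
        if kinds == {"ham"}:
            lists[ip] = "white"    # only ham ever seen from this IP
        elif kinds == {"spam"}:
            lists[ip] = "black"    # only spam ever seen from this IP
        else:
            lists[ip] = "yellow"   # mixed ham and spam
    return lists
```

Note that a single stray report is enough to move an IP from white or black to yellow under this rule, which is exactly why the large providers discussed below would rarely leave the yellow list.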
Blocking based on protocol misbehaviour - which we rely on more than SA scores -
is at least fairly repeatable from any given IP.
When you are using spam scores, OTOH, I can't see it as either repeatable or
simple enough to 'rate' the source IP as much of anything but 'yellow'.
Here are two (of several) problem areas:
1) Insurance, mortgage, brokerage and online banking accounts, utility bills,
airmiles programs, even video-rental and supermarket chains, not to mention
certain more specialized mailing lists - typically send messages (not
necessarily sensitive information) to customers who have signed-up for them and
(mostly) want to see them.
These are nearly always in html, usually graphics-heavy, designed with not-quite
standard Win or Lin Tools, hence usually get an unfavorable spam score -
sometimes enough to need manual whitelisting. Some of the more careful spammers
actually create "cleaner" messages than the local electric company bothers to
do, 'coz they know they must do so.
2) The largest of ISP mail services may get all or nearly all of the protocol
and DNS steps spot-on, yet suffer waves and waves of compromised WinBoxen, not
to mention a high percentage of chronically broken MUA (missing headers, MIME
encoding, etc.).
In our HKG-based 'corporate' environment, we can safely block all of roadrunner
and comcast - but I dare not do that with msn/hotmail, aol, yahoo or gmail.
- Too much risk of blocking new client inquiries that some of our clients rely on.
All of these providers are *way* better behaved than they were 3-4 years ago
when the worst 3 admitted to trafficking 2 *billion* spam messages - that they
knew about - every 24 hours.
But cleaned-up or not, the very size of their nearly-100% WinWoes customer base
means they will probably *never* be off your 'yellow' list, and may spend a lot
of time on the blacklist.
So - I don't see that one can draw a sufficiently accurate *generalization* of
IP 'goodness' based on spam scores. If one could do so, it would already be a
mainstay of SpamAssassin or similar scanners.
Spam scanning - if a message gets that far - pretty well has to be done one
message at a time, and doesn't necessarily tell you much of lasting value about
the server on the source IP that handed it to you.
OTOH - persistent arrivals from an IP that doesn't resolve, has no PTR or A
record, uses a mismatched or obviously forged HELO, or HELOs as your own box may
very well benefit from a white/yellow/black list IP lookup before making remote
rDNS / forward/reverse lookup calls.
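
The protocol faults listed above are mechanical enough to sketch as a pure function. This is an illustration only, not our Exim ACL configuration: the function name, the fault labels, and the choice to call a HELO "mismatched" when it matches none of the PTR names are all assumptions, and the DNS answers are passed in as plain lists so that no lookup happens inside the function.

```python
def protocol_faults(helo, ptr_names, forward_ips, client_ip, own_names):
    """Return a list of protocol-level faults for one SMTP connection.

    ptr_names:   names returned by the reverse (PTR) lookup of client_ip
    forward_ips: addresses those names resolve back to (A records)
    own_names:   hostnames this mail server itself answers to
    """
    faults = []
    helo = helo.strip().lower().rstrip(".")
    ptrs = [name.lower().rstrip(".") for name in ptr_names]

    if not ptrs:
        faults.append("no_ptr")
    elif client_ip not in forward_ips:
        # PTR exists, but the forward lookup does not confirm it.
        faults.append("ptr_does_not_resolve_back")

    if helo in {name.lower() for name in own_names}:
        faults.append("helo_is_our_own_name")
    elif ptrs and helo not in ptrs:
        # Assumption: "mismatched" means HELO matches none of the PTR names.
        faults.append("helo_mismatch")

    return faults
```

The same IP gives the same answer every time it connects, which is the repeatability that a per-message spam score lacks.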
Our version of that uses lists categorized as white, black, and 'brown' (draw
the obvious inference!) and is manually populated from harvesting the Exim logs,
but seldom has even 100 entries on the largest of the lists.
In fact, if any list became really large, the CPU cycles and fs activity needed
to parse it would probably make the remote lookups 'lighter', even if not as fast.
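
A minimal sketch of the "small local list first, remote lookup only on a miss" idea follows. The names `rate_ip` and `remote_check` are hypothetical, the lists are plain in-memory sets standing in for whatever file or cdb lookup is really used, and the remote check is passed in as a callable, so nothing here touches DNS.

```python
def rate_ip(ip, local_lists, remote_check):
    """Return (rating, source) for an IP, consulting the local lists first.

    local_lists:  dict mapping 'white' / 'black' / 'brown' to sets of IPs
    remote_check: callable doing the expensive remote lookup, called
                  only when the IP is on none of the local lists
    """
    for colour in ("white", "black", "brown"):
        if ip in local_lists.get(colour, ()):
            return colour, "local"
    return remote_check(ip), "remote"
```

With lists of a hundred entries or so, the set membership test costs next to nothing; the trade-off described above only starts to bite when the local data grows large enough that reading it rivals the cost of the remote call.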
The rest is too random, IP-wise. Spammers, who have been caught with as many as
40,000 zombified separate-IP WinBoxen under their control, work very hard to
make it so. Essentially all of those will fail a simple rDNS test, but so, too,
will NetSol's cluster, home to thousands of SME domain virtual mx.
The only 'easy answer' I have is "closer scrutiny, less magic".