[ On Tuesday, December 30, 2003 at 10:46:22 (-0800), Marc Perkel wrote: ]
> Subject: Re: [Exim] Inbound Hosts without valid rDNS
>
> Getting this all back to the original point - which I think was to try
> to detect spam. I have implemented this idea - doing reverse DNS and name
> server lookups and putting this "information" (or lack of information)
> into headers which are then sent to Spam Assassin.
Note that Exim already does exactly that automatically, by default, and
as far as I know it always has (though it doesn't get the syntax and
semantics of the header it places this information into quite right :-).
> I then just let the
> Bayesian filter absorb it in my stream of spam/ham learning and there
> are a lot of messages that can be distinguished by this information -
> overall - making the Bayesian filter more accurate. It learns what hosts
> send only spam - what hosts send only ham and what hosts send both.
For strict Bayesian analysis it's not quite as effective as you might
hope. It can still help, though, if your filter's tokenizer separates
domain names into their components, especially in the header section,
and if there are certain top-level domains and/or ISPs that do have
reverse DNS but that, from your perspective, are only ever spam sources.
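
To illustrate what separating a domain name into its components could
look like, here is a minimal sketch in Python. The function name
domain_tokens and the "rdns:" prefix are made up for this example; this
is not how SpamAssassin or Bogofilter actually tokenize, only one
plausible way to let a filter learn about a TLD or an ISP's domain as
well as about the individual host.

```python
def domain_tokens(hostname, prefix="rdns:"):
    """Return one token per domain suffix, shortest suffix first.

    Hypothetical helper: "mail.example.com" yields tokens for "com",
    "example.com" and "mail.example.com", so a filter can score the
    TLD and the parent domain as well as the full host name.
    """
    # Normalize case, drop a trailing root dot, ignore empty labels.
    labels = [label for label in hostname.lower().rstrip(".").split(".") if label]
    # Build suffixes from the rightmost label outwards.
    return [prefix + ".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


if __name__ == "__main__":
    for token in domain_tokens("Mail.Example.COM."):
        print(token)
```

A host with no reverse DNS gives no such tokens at all, so if the lack
of a name is meant to be learnable, something has to emit an explicit
marker token for that case.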
I'm not sure about SpamAssassin, but Bogofilter does now have the
ability to separately track the wordlists derived from message headers
and message bodies and it should be more accurate at doing what you're
talking about, though I've yet to see proof of it having any concrete
effect with my own daily use of Bogofilter for all my personal e-mail.
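
As a rough sketch of the idea of tracking header and body words
separately (this is not Bogofilter's code or its on-disk format; the
function name count_tokens and the word pattern are assumptions made
for illustration), one could keep two independent counters in Python:

```python
import re
from collections import Counter
from email import message_from_string

# Simplistic word pattern, chosen for this example only; it keeps
# dots and hyphens so host names survive as single tokens.
WORD = re.compile(r"[A-Za-z0-9][A-Za-z0-9.\-]*")


def count_tokens(raw_message):
    """Return (header_counts, body_counts) for one raw message."""
    msg = message_from_string(raw_message)

    header_counts = Counter()
    for _name, value in msg.items():
        # Words from header values go only into the header counter.
        header_counts.update(WORD.findall(value.lower()))

    body_counts = Counter()
    body = msg.get_payload()
    if isinstance(body, str):
        # Only plain single-part bodies are handled in this sketch.
        body_counts.update(WORD.findall(body.lower()))

    return header_counts, body_counts
```

With the counts kept apart, a host name seen in a Received: header is
scored independently of the same string appearing in body text.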
--
Greg A. Woods
+1 416 218-0098 VE3TCP RoboHack <woods@???>
Planix, Inc. <woods@???> Secrets of the Weird <woods@???>