Re: [exim] Exim4 Anti-Image-Spam Program

Author: W B Hacker
Date:  
To: exim users
Subject: Re: [exim] Exim4 Anti-Image-Spam Program
Craig Whitmore wrote:
> Hi
>
> This may be a little off topic, but it's an exim-only program.
>
>
> I have released a new version of my exim-only Anti-Image-Spam program,
> loosely based on FuzzyOCR (thus called nsfo, Not So Fuzzy OCR). I wrote it
> really quickly a few months ago because, although FuzzyOCR works well, it
> requires SpamAssassin, which I wasn't using (I was using dspam instead).
>
> I have been using it in production for quite a while now with no false
> positives, and it picks over 40,000 image spams per day from our email
> servers (though it still misses quite a few tricky image spams, which will be
> fixed in the future). It is not very CPU intensive on the servers I look
> after, but it may be on other people's.
>
> Have a look on:
>
> http://www.spam.co.nz/nsfo
>
> Thanks
> Craig
> http://www.spam.co.nz
>
>
>


Craig,

'Image' spam is an increasing problem for us, but (so far...) not enough of it
gets through to cause us more than minor annoyance.

Looking to the future, however, three questions:

A) Do I understand correctly that your method works with DSpam, or otherwise
does NOT require SA?

B) Do you perchance have any statistics, specifically and only for the '40,000
per day' that your software is nailing, on how many of these were also, or might
have been, rejected by other means, to wit:

- forward/reverse DNS failure or missing PTR record

- dynamic-IP source RBL hit or calculation

- Other RBL hit

- HELO mismatched to DNS for the connecting IP, and/or not a valid FQDN

- failure to tolerate at least one 30-second delay

- violation of smtp sync

- attempting to pipeline when pipelining was not advertised

C) Any evidence of 'False Positive' hits? It is dirt-simple to reject ALL
graphics attachments unless they come from 'whitelisted' sources, but that is a
whole 'nuther issue.

No interest in 'hits' that would merely have failed an SA test (FuzzyOCR or
otherwise), as the goal here is to NOT invoke SA unless we are otherwise
willing to accept the message, which historically is the case for less than 11%
of all offered traffic.

Willing to help analyze if the info has been saved - or could be.

Note that either or both of 'log_selector = +all' and an 'instrumented'
configure file (many 'warn' verbs with copious log_message and logwrite to dump
variables and such) might be required to ID the above 'other reasons',
especially if they were not the cause of rejection before OCR checking but
might have been.
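
For illustration, a minimal sketch of the sort of 'instrumented' configure
fragment meant here. The ACL name, the 'INSTR' log tag, and dnsbl.example.org
are placeholders assumed for the example (they have nothing to do with nsfo),
and the fragment has not been tested against any particular setup. Every
statement is a 'warn', so it only logs and rejects nothing:

```
# Main section: turn on every log selector.
log_selector = +all

# Hypothetical ACL name, run for each RCPT command.
acl_smtp_rcpt = acl_check_rcpt

begin acl

acl_check_rcpt:

  # Forward/reverse DNS failure or missing PTR record.
  warn  !verify      = reverse_host_lookup
        log_message  = INSTR rdns-fail ip=$sender_host_address

  # HELO name does not verify against DNS for the connecting IP.
  warn  !verify      = helo
        log_message  = INSTR helo-mismatch helo=$sender_helo_name \
                       ip=$sender_host_address

  # RBL hit. dnsbl.example.org is a placeholder, not a real list.
  warn  dnslists     = dnsbl.example.org
        log_message  = INSTR rbl-hit list=$dnslist_domain value=$dnslist_value

  # Unconditional dump of a few variables for every recipient.
  warn  logwrite     = INSTR vars host=$sender_host_name \
                       helo=$sender_helo_name from=<$sender_address> \
                       rcpt=<$local_part@$domain>

  # ... the existing accept/deny statements of the RCPT ACL follow here ...
```

Grepping the main log for the 'INSTR' tag, and matching those lines against the
messages the OCR check caught, would then show which of them tripped one of the
'other reasons' as well.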

Thanks & regards,

Bill Hacker