Re: [exim] Estimating spam deliveries

Author: Ian Eiloart
Date:  
To: Keith Edmunds
CC: exim-users, Derrick
Subject: Re: [exim] Estimating spam deliveries


--On 16 October 2007 13:07:58 +0100 Keith Edmunds <kae@???>
wrote:

> On Tue, 16 Oct 2007 12:30:32 +0100, iane@??? said:
>
>> I'm trying to estimate the number of spam messages that have
>> gone undetected. This just tells me about the number that were detected.
>
> I'm not sure what you really want. Earlier you said: "I was hoping someone
> might be able to point me at a statistical technique that I can apply
> through correlating deliveries and rejections, without examining
> individual messages."
>
> That's pretty much what Spamassassin does.


No, SpamAssassin examines messages.

> What you seem to be asking is
> "Is there an automated way of detecting mails that are definitely Spam but
> which Spamassassin has not marked as Spam?". If you find such a
> way, I'm sure a lot of people would be interested!



No, I'm not trying to do that at all. I'm trying to measure the quality of
my spam filtering service. That will tell me how much effort I should
expend in trying to improve it.

The simplest method would be to sample delivered messages, and count the
spam. However, that method has several drawbacks. It can't be automated
(otherwise, I'd have built a spam filter), and I can't do it
retrospectively because my logs don't contain enough information.

Instead, I'm hoping that a statistical correlation between spam rejection
rates and message delivery rates will help me to estimate the quantity of
spam that I actually deliver to users.

Intuitively, if I see peaks in message delivery that correspond with peaks
in spam rejection, then I might assume that the former peaks are at least
partly explained by leaky spam filters. If, on the other hand, message
delivery rates are independent of spam rejection rates, then I might
conclude that my filters are not leaky.
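
As a rough illustration of the comparison described above, here is a
minimal sketch that computes the Pearson correlation between two series of
per-hour counts. The counts are made up for the example, and pearson() is a
hypothetical helper rather than anything provided by Exim; how the counts
are extracted from the logs is left open.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-hour counts (invented for illustration only).
rejected = [120, 340, 900, 410, 150, 130]   # messages rejected as spam
delivered = [200, 230, 310, 240, 205, 198]  # messages delivered to users

r = pearson(rejected, delivered)
print(f"correlation between rejections and deliveries: {r:.3f}")
```

A value near zero would be consistent with filters that are not leaky; a
clearly positive value would only suggest leakage, not prove it.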

Of course, other explanations for a correlation are plausible: perhaps
spammers are active at the same times that my users and their
correspondents are. Perhaps I'm leaking a different type of spam than the
spam that I'm rejecting.
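
One way to reduce the shared time-of-day effect before correlating is
sketched below: subtract, from each count, the average for its slot in the
daily cycle, then correlate what is left. This is only one possible
adjustment, not an established method for this problem. The names
deseasonalise() and pearson(), the period of 3 slots per "day" and the
counts are all assumptions made for the example.

```python
from collections import defaultdict
from math import sqrt


def deseasonalise(counts, period=24):
    """Subtract from each count the mean of its slot within the period."""
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for i, c in enumerate(counts):
        totals[i % period] += c
        sizes[i % period] += 1
    return [c - totals[i % period] / sizes[i % period]
            for i, c in enumerate(counts)]


def pearson(xs, ys):
    """Pearson correlation coefficient (assumes non-constant series)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical counts: two "days" of three slots each, both series
# following the same daily rhythm.
rejected = [100, 300, 200, 110, 290, 230]
delivered = [50, 150, 100, 52, 148, 95]

raw = pearson(rejected, delivered)
adjusted = pearson(deseasonalise(rejected, period=3),
                   deseasonalise(delivered, period=3))
print(f"raw correlation:      {raw:.3f}")
print(f"adjusted correlation: {adjusted:.3f}")
```

If the correlation largely disappears after the adjustment, the shared
daily rhythm is the more likely explanation. The adjustment does nothing
about the second caveat, that leaked spam may differ in kind from rejected
spam.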


--
Ian Eiloart
IT Services, University of Sussex
x3148