Re: [exim] Estimating spam deliveries

Author: Alun
Date:  
To: Ian Eiloart
CC: exim-users
Subject: Re: [exim] Estimating spam deliveries
Ian Eiloart <iane@???> said, in message
3768745AE5793AAFBA73DEE7@???:
>
> Intuitively, if I see peaks in message delivery that correspond with
> peaks in spam rejection, then I might assume that the former peaks
> are at least partly explained by leaky spam filters. If, on the other
> hand, message delivery rates are independent of spam rejection rates,
> then I might conclude that my filters are not leaky.


Hi Ian,

This sounded interesting so I had a little play with my logs. It's an
interesting approach, but I think you've probably just got too little
information to make anything other than vague generalisations.

Depending on your reporting interval you can end up with wildly
different correlation coefficients. Presumably a larger reporting
interval (days rather than hours) gives a coefficient that's better
at removing diurnal trends, but then you end up with fewer
data points with which to generate a coefficient, and you can't track
changes in the performance of your filters so well.

Taking the past 6 weeks, I get the following from our logs:

Correlation coefficient by day = 0.268
Correlation coefficient by hour = 0.179

Neither shows any great correlation, so I guess I can assume my filters
are doing well. Or are they?
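
A minimal sketch of how per-interval coefficients like these could be
computed, assuming the log has already been reduced to (timestamp, kind)
pairs where kind is "delivered" or "rejected". The names pearson and
bucket_counts and that input format are illustrative assumptions only;
parsing the Exim log itself is not shown.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / sqrt(sxx * syy)


def bucket_counts(events, fmt):
    """Count delivered and rejected messages per reporting interval.

    events: iterable of (datetime, kind) pairs (assumed format).
    fmt:    strftime pattern naming the interval, e.g. "%Y-%m-%d %H"
            for hours or "%Y-%m-%d" for days.
    Intervals with no events at all are not represented, so a quiet
    hour is skipped rather than counted as zero.
    """
    delivered, rejected = Counter(), Counter()
    for when, kind in events:
        key = when.strftime(fmt)
        if kind == "delivered":
            delivered[key] += 1
        elif kind == "rejected":
            rejected[key] += 1
    keys = sorted(set(delivered) | set(rejected))
    return [delivered[k] for k in keys], [rejected[k] for k in keys]
```

Calling pearson(*bucket_counts(events, "%Y-%m-%d %H")) would give the
by-hour figure and pearson(*bucket_counts(events, "%Y-%m-%d")) the
by-day one. Six weeks gives only 42 daily points against roughly a
thousand hourly ones, which is the trade-off mentioned above.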

I wonder if it's possible to do something with your (presumably) known
distinction between internally and externally generated e-mails. If
most of your institution's correspondence is with people in the same
timezone and your internal users don't generate spam then is it
reasonable to assume that internally generated e-mails should correlate
strongly with accepted external mail? If you're letting through lots of
spam then would it weaken this correlation?

Doing this by hour, and taking one week in September:

a) Correlation between internal and external accepted = 0.881
b) Correlation between internal and external rejected = 0.455
c) Correlation between internal and all external = 0.517

Complete speculation, but if a) and c) above were close to equal
then it would suggest to me that your spam filter was leaking badly.

Cheers,
Alun.

-- 
Alun Jones                       auj@???
Systems Support,                 (01970) 62 2494
Information Services,
University of Wales, Aberystwyth