Re: [exim] Estimating spam deliveries

Author: Richard Clayton
Date:  
To: Exim Mailing List
Subject: Re: [exim] Estimating spam deliveries
In message <21F45ADA0BD2B14BA8A42E01@???>, Ian
Eiloart <iane@???> writes

>I was hoping someone might be able
>to point me at a statistical technique that I can apply through correlating
>deliveries and rejections, without examining individual messages.


You can only apply statistics if you know a "ground truth" about what is
arriving. Once you have that, you can examine your processing to see how
effective it is.

Unfortunately we don't know such a ground truth about incoming spam
levels (in fact there's little agreement as to how much spam there is,
except that it's somewhere between 60 and 95 percent of incoming email).

You can create your own ground truth by labelling all the email you see
(or getting your users to do it for you) -- as already suggested.
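
If labelling everything is too much work, one option is to hand-label a random sample of delivered messages and scale up. The sketch below is only an illustration under that assumption: it uses a Wilson score interval (my choice, not something prescribed here), the function names are made up, and it is only valid if the sample really is drawn at random from the deliveries.

```python
import math


def wilson_interval(spam_count, sample_size, z=1.96):
    """Wilson score interval for a proportion; z=1.96 is roughly 95% confidence."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = spam_count / sample_size
    denom = 1 + z * z / sample_size
    centre = (p + z * z / (2 * sample_size)) / denom
    half = z * math.sqrt(
        p * (1 - p) / sample_size + z * z / (4 * sample_size ** 2)
    ) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_spam_deliveries(spam_count, sample_size, total_delivered):
    """Scale the interval up to a range of spam deliveries (hypothetical helper)."""
    low, high = wilson_interval(spam_count, sample_size)
    return round(low * total_delivered), round(high * total_delivered)
```

For example, estimate_spam_deliveries(12, 400, 50000) takes 12 spam messages found in a hand-labelled sample of 400 and returns a low and high estimate for the 50000 messages delivered over the same period. Those numbers are invented for illustration.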

You might benchmark your processing against one of the handful of
corpuses that exist (where someone else has labelled all the traffic),
but these corpuses age very quickly because, as we know, spam is
constantly evolving to evade our blocks :(
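
To make the benchmarking idea concrete, here is a minimal sketch of scoring a filter against a labelled corpus. It assumes you already have, for each message, the corpus label and your own accept/reject verdict as booleans. The function name and the result keys are hypothetical, and how you replay a corpus through your own configuration is left out.

```python
def score_filter(labels, verdicts):
    """Compare filter verdicts with corpus labels.

    labels[i] is True if the corpus marks message i as spam;
    verdicts[i] is True if the filter rejected message i.
    """
    if len(labels) != len(verdicts):
        raise ValueError("labels and verdicts must be the same length")
    tp = fp = tn = fn = 0
    for is_spam, rejected in zip(labels, verdicts):
        if is_spam and rejected:
            tp += 1  # spam correctly rejected
        elif is_spam:
            fn += 1  # spam that was delivered
        elif rejected:
            fp += 1  # legitimate mail wrongly rejected
        else:
            tn += 1  # legitimate mail delivered
    spam_total = tp + fn
    ham_total = tn + fp
    delivered = fn + tn
    return {
        "spam_caught_rate": tp / spam_total if spam_total else None,
        "ham_rejected_rate": fp / ham_total if ham_total else None,
        "delivered_spam_fraction": fn / delivered if delivered else None,
    }
```

The last figure is the one closest to the original question, the share of deliveries that are spam, but it only describes the corpus, so an old corpus will flatter a current filter.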

-- 
richard                                              Richard Clayton


They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety.         Benjamin Franklin