Author: William Warren Date: To: exim-users Subject: [exim] [OT] Getting a corpus for Spamassassin tuning
Thanks for reading this. I know it's off-topic, so I'll keep it short.
My customer is pleased with the results of filtering their email
through Exim: I have DNSBLs active and I'm doing rDNS checks, so
they've seen a dramatic reduction in spam. However, SpamAssassin isn't
marking as many of the messages that get through as I'd like, and I'm
frustrated trying to obtain a training corpus from the users.
I've set up a special mailbox that users may forward spam to, but it's
only collected 80 items in three weeks, and the SpamAssassin docs say
that I need at least 1,000 to get a good corpus. Of course, I also
need "ham" emails to make it work.
Here are my questions:
1. Do publicly available corpora provide good results?
2. Is 1,000 a magic number? What's the "knee point" in the curve?
3. What tricks and techniques have list members used to convince
users to contribute to the spam pile?