Hi there,
I'm now happily using the Debian exim4-daemon-heavy package, which
includes the exiscan patch. It runs SpamAssassin and does a pretty
good job of sorting spam and non-spam into their respective categories.
For debugging purposes, I'm adding headers to messages like this:
# SpamAssassin config
# Doc: http://duncanthrax.net/exiscan-acl/exiscan-acl-spec.txt
# put headers in all messages (no matter if spam or not)
warn    message   = X-Spam-Score: $spam_score ($spam_bar)
        spam      = mail:true
warn    message   = X-Spam-Flag: YES
        spam      = mail
warn    message   = X-Spam-Report: $spam_report
        spam      = mail
deny    message   = Spammy message detected
        spam      = mail:true
        condition = ${if >{$spam_score_int}{150}{1}{0}}
That means every message gets its spam score attached. If it's spam,
it also gets an X-Spam-Flag header and an X-Spam-Report header
detailing why it was classified as spam.
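To make the effect of those four ACL statements concrete, here is a small Python model of what they do to a message with a given score. It is only a sketch under stated assumptions. The plain `spam = mail` condition is assumed to be true when the score reaches SpamAssassin's threshold (its default `required_score` of 5.0). The `:true` suffix is assumed to make the condition always true while still running the scan. `$spam_score_int` is taken to be the score multiplied by ten as an integer, so `>{$spam_score_int}{150}` rejects anything scoring above 15.0. The `$spam_bar` part of the header is left out, and `apply_acl` and `REQUIRED_SCORE` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import List

# Assumed SpamAssassin threshold (default required_score); adjust to your setup.
REQUIRED_SCORE = 5.0


@dataclass
class Verdict:
    headers: List[str]
    rejected: bool


def apply_acl(score: float, report: str = "(report text)") -> Verdict:
    """Hypothetical model of the warn/deny ACL statements above."""
    is_spam = score >= REQUIRED_SCORE           # what "spam = mail" tests
    spam_score_int = int(round(score * 10))     # assumed $spam_score_int scaling
    # "spam = mail:true" always matches, so every message gets the score header.
    headers = [f"X-Spam-Score: {score:.1f}"]
    if is_spam:
        headers.append("X-Spam-Flag: YES")
        headers.append(f"X-Spam-Report: {report}")
    # deny: ${if >{$spam_score_int}{150}...} means a score above 15.0 is rejected.
    rejected = spam_score_int > 150
    return Verdict(headers, rejected)


if __name__ == "__main__":
    for s in (2.0, 7.3, 15.0, 15.1):
        print(s, apply_acl(s))
```

Note that a message scoring exactly 15.0 is still accepted (tagged but not rejected), because the comparison is strictly greater than 150.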
I used to run SpamAssassin with report_safe set to 1, which creates a
new message containing the report and attaches the original one to it,
but I understand that doesn't work with the exiscan patch?
Anyhow, my question is this:
If I take these modified messages and use sa-learn to train my
Bayesian filter, will the added headers, including the spam report,
skew the Bayesian algorithm? SpamAssassin's docs say that when the
message is rewritten with 'report_safe 1', sa-learn is smart enough to
disregard the wrapper message and look at the attachment instead, but
what about this case? Does anybody have hints on how to use
SpamAssassin's Bayesian abilities with the exiscan patch?
Ben