On 11/17/2015 5:20 PM, Chris Siebenmann wrote:
>> I have also decided to stop using "deny" at the data ACL and instead
>> either redirect to a webmaster alias or the bit bucket. I have reached
>> the conclusion that denying spam or malware at data time doesn't
>> accomplish anything useful. Using deny with a code at rcpt time has
>> the feature of saving both internet bandwidth and server time, but
>> after data, that damage is already done.
> One theoretical reason I can see to do deny-at-DATA for a small
> personal server is that it potentially sends a signal to large origin
> sources like GMail that something questionable is up with email from a
> particular account. I can imagine GMail looking for signals on outgoing
> email like an increased number of rejections.
>
> (For a large population, deny-at-DATA has the advantage that a sender
> who is the victim of a false positive at least knows about it, and in
> turn this may make adding more aggressive filtering more palatable to
> your users.)
>
> - cks
First, I need to correct a statement I made several times in this
thread. The Spamhaus ZEN lookup in my server's exim.conf is applied at
MAIL time, not RCPT time.
Thanks for bringing up false positives, one of my favorite spam topics.
I have very strong opinions against aggressive spam filtering that
cheerfully tosses tons of ham to avoid delivering a pound of spam. (For
the same reason, I totally refuse to use stuff like captcha or its
equivalent approach on my web page contact form.)
Which is why I was perfectly willing to live with some snowshoe spam
rather than use any aggressive technique that risks loss of legitimate
messages.
SpamAssassin does occasionally give fairly high scores to perfectly
legitimate emails, which could certainly produce false positives. That is
why I am evolving toward a multilevel approach with respect to spam
scoring (with Accept at every level):
1) Very low spam score: Deliver to recipient mailbox without even a
spam score header.
2) Low spam score: Deliver to recipient mailbox with the usual spam headers.
3) Likely spam: Re-route to a spam-dedicated mailbox for human analysis.
4) Certain spam: Accept and throw away.
Level 4 is reserved for email that fails specific test(s). For
instance, the custom SA rule I described in the initial message in this
thread.
Level 3 is for any "spam" that has the slightest possibility of being
ham. My thinking is that the result of analysis will be custom SA rules
that direct future similar emails to either level 1 or level 4.
The recipient is charged with deciding how to treat level 1 or 2 false
negatives.
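
To make the four levels concrete, here is a minimal sketch of the decision
in Python (not Exim configuration). The threshold values 1.0 and 5.0 and all
of the names are made up for illustration; I have not settled on real
cut-offs, and they would come out of the level 3 analysis.

```python
from enum import Enum


class Level(Enum):
    DELIVER_CLEAN = 1    # deliver, no spam headers added
    DELIVER_TAGGED = 2   # deliver with the usual spam headers
    REVIEW_MAILBOX = 3   # re-route to a spam-dedicated mailbox for a human
    DISCARD = 4          # accept and throw away


# Hypothetical thresholds, for illustration only.
TAG_THRESHOLD = 1.0
REVIEW_THRESHOLD = 5.0


def classify(score: float, failed_specific_test: bool = False) -> Level:
    """Map a spam score to one of the four levels.

    Level 4 is reached only when a specific test fails (for example a
    custom SA rule), never from a high score alone, so a legitimate
    message with a high score lands in the review mailbox instead of
    being lost.
    """
    if failed_specific_test:
        return Level.DISCARD
    if score >= REVIEW_THRESHOLD:
        return Level.REVIEW_MAILBOX
    if score >= TAG_THRESHOLD:
        return Level.DELIVER_TAGGED
    return Level.DELIVER_CLEAN
```

The point of the design is that every branch is an Accept; the only
question is where the message ends up.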
As to implementation, I will use this terrific exim-specific guide:
https://github.com/Exim/exim/wiki/ExiscanExamples
SA false positive story: I receive a daily email from a newspaper which
contains a dozen or so top headlines of the day. Each headline comes
with a one-sentence summary, a small graphic, and a link to the story on
the web version of the newspaper. I had to use an SA whitelist rule
because SA was routinely giving these emails spam scores of 8 or more.
(I use the SA 'whitelist_from_rcvd' rule.)
I am not much persuaded by your Gmail example. I think Gmail and other
large sources have better ways of finding abusers of their service.