Author: Peter Bowyer
Date:
To: Exim, Users
Subject: Re: [exim] Greylisting algorithms after end of DATA
On 13/01/07, Magnus Holmgren <holmgren@???> wrote:
> Traditional greylisting combines the remote host, envelope sender, and
> envelope recipient and checks if that triplet has been seen before (not too
> long ago but also at least some time ago) after each RCPT command. (Correct
> me if I'm wrong.) The advantage is that it saves bandwidth.
Saves resources. Bandwidth is one resource, but probably not the
primary one in this context.
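
For reference, the traditional triplet check described in the quoted text might look roughly like the sketch below. It is only an illustration: the class name, the in-memory dictionary, and both time limits (MIN_DELAY, MAX_AGE) are assumptions, not anything Exim or a particular greylisting daemon prescribes.

```python
import time

MIN_DELAY = 5 * 60       # assumed: a retry must come at least 5 minutes later
MAX_AGE = 4 * 60 * 60    # assumed: a first attempt is forgotten after 4 hours


class TripletGreylist:
    """Hypothetical in-memory greylist keyed on (host, sender, recipient)."""

    def __init__(self):
        # triplet -> timestamp of the first deferred attempt
        self.first_seen = {}

    def check(self, host, sender, recipient, now=None):
        """Return 'accept' or 'defer' for one RCPT command."""
        now = time.time() if now is None else now
        key = (host, sender.lower(), recipient.lower())
        first = self.first_seen.get(key)
        if first is None or now - first > MAX_AGE:
            # Unknown triplet, or the earlier attempt was too long ago:
            # record it and ask the client to retry.
            self.first_seen[key] = now
            return "defer"
        if now - first < MIN_DELAY:
            # Retry arrived too soon to count.
            return "defer"
        return "accept"
```

Because the decision is made at RCPT time, the message body is never transferred for a deferred attempt, which is where the resource saving comes from.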
> Running SpamAssassin after end of DATA but before accepting the mail gives the
> advantage that greylisting can be applied only to grey mail - the delaying of
> clearly non-spam mail can be avoided.
But for most people, running SA is the most expensive test they do,
and they move it to last place in the chain for this reason.
Greylisting is seen as a cheap way of turning away likely spam without
having to go to the expense of content-scanning it. If SA is involved
in the greylisting algorithm, the resource saving it delivers is
significantly reduced. That is, unless the resulting improvement in
the algorithm leads to better whitelisting and less SA work later.
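
To make the trade-off concrete, here is a minimal sketch of the post-DATA variant, where only mail in a grey score band is delayed. The function name and both thresholds are hypothetical, and rejecting clearly spammy mail outright is an assumption added for completeness. Note that the score has to be computed for every message before this function can be called, which is exactly the cost discussed above.

```python
def decide_after_data(score, seen_before, ham_below=1.0, spam_above=10.0):
    """Decide what to do after end of DATA, given a content-scan score.

    score       -- spam score from the content scanner (already paid for)
    seen_before -- True if this message was deferred earlier and has now
                   been retried
    ham_below, spam_above -- assumed thresholds bounding the grey band
    """
    if score >= spam_above:
        return "reject"   # assumption: clear spam is refused outright
    if score < ham_below:
        return "accept"   # clearly non-spam mail is never delayed
    # Grey mail: accept only if the sender has already retried.
    return "accept" if seen_before else "defer"
```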
> It also means that e.g. the Message-ID
> can be considered when determining whether we have seen the message before.
Does this have any correlation to whether the message is spam or not?
If not, I'm not sure it helps....
> In fact, nothing prevents us from using an arbitrary set of header fields
> (such as Subject, Message-ID, From) in constructing the key, if it gives
> better confidence in what we want to know: whether the other end retries
> after a temporary failure. (We could even accept delivery and whitelist based
> on a partial match, say 3 of 4, to better cope with the braindead mail
> servers that unfortunately exist.) After we have determined that it does,
> there's no reason to greylist further mail. (Well, there might be a reason to
> delay mail from new senders at large ESPs like Hotmail, if that means that
> URIs in the spam get the time to end up in URIBLs. This is open to
> discussion.)
Hmm. I can't see what aspect of traditional triplet-based greylisting
you're improving with this. It seems to be the automatic whitelisting
after a successful retry - but which aspects of the sender are you
then able to whitelist more accurately as a result? Especially since
the use of SA has added cost, you'd need to be clearly saving cost
somehow.
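
As I read it, the partial-match idea (say 3 of 4) would amount to something like the sketch below. The field names are assumptions: the quoted text names Subject, Message-ID and From, and I have guessed the remote host as the fourth. Both function names are hypothetical.

```python
FIELDS = ("host", "subject", "message_id", "from")  # fourth field is a guess


def matching_fields(a, b, fields=FIELDS):
    """Count the fields that are present in a and equal in a and b."""
    return sum(
        1 for f in fields
        if a.get(f) is not None and a.get(f) == b.get(f)
    )


def is_retry(attempt, earlier_attempts, needed=3):
    """True if attempt matches any earlier deferred attempt on at least
    `needed` of the fields, so a server that changes one field on retry
    (for example, a new Message-ID) is still recognised."""
    return any(matching_fields(attempt, e) >= needed for e in earlier_attempts)
```

Once is_retry returns True, the sending host could be whitelisted so that later mail from it skips greylisting altogether.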
This is probably a really good idea, but I'm not getting it yet....