Re: [exim] Extending greylisting

Author: Alun
Date:
To: exim-users
Subject: Re: [exim] Extending greylisting

Richard Clayton <richard@???> said, in message
DczzBDxtvmzDFAeD@???:

> >I've had an idea that could make greylisting more useful
> >in the presence of spammers that retry. I thought I'd publicise
> >it somewhere to see what people think.
>
> why not on an anti-spam mailing list ? here it's hit and miss whether
> you get good advice or not :(

True. Sorry for the noise. I believe in only submitting stuff to a
list after you've been on it for a while, and I'm not on any anti-spam
lists per se. This list *does* have a lot of (strictly) off-topic stuff
about handling spam, much of which only relates to Exim in the
implementation. My implementation runs from ACLs through "readsocket"
and my "exim_sockd.pl" script and, if my tests show promise, I will
put the sources on my exim anti-spam page (users.aber.ac.uk/auj/spam/).
Are we on topic yet?

> of course they are, too many people have deployed it. It was only
> ever a bodge -- and it has a number of problems when the sender
> doesn't operate in a mainstream manner :(

By "mainstream", you mean "RFC-compliant", yes? In 2 years of running
my (slightly non-standard) greylisting system I've had fewer than 30
queries from people whose legitimate mail was blocked by it. In most
cases I've whitelisted their server and that's been the end of their
worries.

> >Now, when a host retries, you can query its (attempted) submission
> >history to get an idea of its intentions.
>
> What you will do is to block major ISP mail systems, because they have
> insecure customers and are therefore sending you a lot of spam.
>
> Unfortunately they are also sending you a lot of good stuff as well,
> but you will assume that it is spam.

Nope. Since they're sending a lot of good stuff, the ratio bad/(bad+good)
will be low unless they're sending much more bad than good stuff. OK, this
will be coloured by the fact that I'm only looking at new sender/recipient
pairs at any given time, so it's possible that a flood of spam from an
established major ISP would skew the ratio for an hour, but then, during
that hour maybe I *should* be lowering the bar a bit?
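
To make the ratio concrete, here is a minimal sketch in Python. The function
name is hypothetical, and exactly what counts as a "bad" or "good" submission
is left to the caller; this is an illustration of the arithmetic, not the
actual implementation.

```python
def bad_ratio(bad: int, good: int) -> float:
    """Return bad/(bad+good) for one sending host.

    'bad' and 'good' are counts of new sender/recipient pairs judged
    bad or good (how they are judged is an assumption left to the caller).
    A host with no history yet gets 0.0 rather than a division error.
    """
    total = bad + good
    if total == 0:
        return 0.0
    return bad / total


# A busy ISP relay: some spam, but mostly legitimate mail.
print(bad_ratio(50, 950))   # 0.05
# A host that has only ever sent bad submissions.
print(bad_ratio(40, 0))     # 1.0
```

The ISP's ratio stays low even though its absolute spam count is higher than
the second host's, which is the point being argued above.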

Alternatively, I could include some backoff factor based on the "established
relationships" - valid sender/recipient pairs that aren't new for this IP
address - they're in the database, after all.
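
One possible shape for such a backoff, sketched in Python. The formula, the
function name and the "weight" parameter are all assumptions picked for
illustration; nothing here has been tested against real mail.

```python
def damped_ratio(bad: int, good: int, established: int,
                 weight: float = 1.0) -> float:
    """Ratio of bad new pairs, damped by established relationships.

    'established' is the number of known-valid sender/recipient pairs
    already on record for this IP address. Adding it to the denominator
    means a host with a long good history is hurt less by a short burst
    of bad new pairs. 'weight' (hypothetical) scales that effect.
    """
    denominator = bad + good + weight * established
    if denominator <= 0:
        return 0.0
    return bad / denominator


# A burst of 100 bad new pairs and no good ones in the current hour:
print(damped_ratio(100, 0, 0))     # 1.0  (no history, looks awful)
print(damped_ratio(100, 0, 900))   # 0.1  (long history softens it)
```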

Or I could just forget the problem, as I'm only talking about adding a few
SpamAssassin points (automatically), so legitimate mail should still get
through unless it's "almost spam".
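
A sketch of what "a few points" could look like in Python. The cap of 2.0
points and the linear scaling are assumptions for illustration; how the
result is handed to SpamAssassin is deliberately left out.

```python
def extra_points(ratio: float, max_points: float = 2.0) -> float:
    """Map a host's bad ratio (0.0 to 1.0) to a small score adjustment.

    The ratio is clamped first, so a bad input can never add more than
    'max_points' (a hypothetical cap) or subtract anything.
    """
    clamped = min(max(ratio, 0.0), 1.0)
    return round(clamped * max_points, 2)


print(extra_points(0.05))  # 0.1  (barely nudges mail from a mostly good host)
print(extra_points(1.0))   # 2.0  (the most a host's history can ever add)
```

Because the adjustment is capped, it can only tip over mail that was already
close to the spam threshold on its own merits.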

> >In fact, any host with a high count of new addresses in the past
> >few minutes may well be suspect.
>
> or lots of your students have joined the same mailing list :(

... all at once, in the space of an hour? If I saw that, I'd tend to suspect
that they'd been subscribed against their will. It happens, most recently
on 27th December when someone appears to have subscribed most of our
undergrads to a mailing list. We received and accepted > 30,000 unsolicited
mails from an otherwise (apparently) legitimate domain. Either way, it may
be worth a few SpamAssassin points for that hour, just in case, surely?
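
Counting new addresses per host over the past hour could be sketched like
this in Python. The class name, the in-memory storage and the one-hour
default are assumptions for illustration; a real setup would presumably keep
this in the existing database.

```python
from collections import defaultdict, deque


class NewPairCounter:
    """Count new sender/recipient pairs per host in a sliding time window."""

    def __init__(self, window_seconds: float = 3600.0):
        self.window = window_seconds
        # host IP -> timestamps (seconds) at which a new pair was first seen
        self._seen = defaultdict(deque)

    def record(self, host: str, now: float) -> None:
        """Note that 'host' just presented a pair we had not seen before."""
        self._seen[host].append(now)

    def count(self, host: str, now: float) -> int:
        """Return how many new pairs 'host' presented within the window."""
        timestamps = self._seen[host]
        # Drop entries that have aged out of the window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        return len(timestamps)


counter = NewPairCounter()
counter.record("192.0.2.1", now=0.0)
counter.record("192.0.2.1", now=1800.0)
print(counter.count("192.0.2.1", now=3000.0))  # 2
print(counter.count("192.0.2.1", now=4000.0))  # 1
```

A count that is far above a host's usual level for that hour would then be
the trigger for the extra points, whatever the cause turns out to be.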

> or major.isp has moved their email server to a new IP address....

So for the first hour or so we look rather harder at their mail
submissions, yes? I can't see how this is a problem, unless major.isp
is submitting lots of mail that's almost-but-not-quite-spam.

I can see I shouldn't have used the word "greylist" in the original message!
You could use something based on this quite happily without ever deferring
any messages. Without greylisting, you don't get a chance to see their
intentions before accepting messages from them, but you could still use the
generated "reputation" of the host to decide what to do about future stuff.
Or you could defer every new sender/recipient just once to see what happens
- sort of a "thin" greylist - which might let you expose the submitter's
intentions without doing the full greylist thing.
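
A minimal Python sketch of the "thin" greylist idea: the first attempt for a
new combination is deferred, and any later attempt is accepted. The class
name is hypothetical, keying on the host IP as well as the sender/recipient
pair is an assumption, and there is no minimum delay or expiry here, which a
real deployment would probably want.

```python
class ThinGreylist:
    """Defer each new (host, sender, recipient) exactly once."""

    def __init__(self):
        # Combinations that have already been deferred once.
        self._deferred = set()

    def check(self, host: str, sender: str, recipient: str) -> str:
        """Return "defer" on the first sighting, "accept" afterwards."""
        key = (host, sender.lower(), recipient.lower())
        if key in self._deferred:
            return "accept"
        self._deferred.add(key)
        return "defer"


greylist = ThinGreylist()
print(greylist.check("192.0.2.1", "a@example.org", "b@example.net"))  # defer
print(greylist.check("192.0.2.1", "a@example.org", "b@example.net"))  # accept
```

What the host does between the deferral and the retry (or whether it retries
at all) is the information this buys you.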

Oh well, I'll shut up now - the off- and on-list mails I've received do
tend to suggest that it's considered a bad idea. I'm still going to
experiment and see whether useful inferences can be drawn from the data.
If I come up with anything I consider effective I'll release the exim
implementation.

Cheers,
Alun.

-- 
Alun Jones                       auj@???
Systems Support,                 (01970) 62 2494
Information Services,
University of Wales, Aberystwyth
