On Fri, Jul 25, 2008 at 04:23:52PM +0200, Schramm, Dominik said:
> Hi Stephen,
>
> Stephen Gran on Friday, July 25, 2008 4:08 PM:
>
> > [...]
> > warn condition = ${if match {$h_Subject:}{\N[^[:print:]]{8}\N}}
> > set acl_c1 = ${eval:$acl_c1+5}
> > log_message = mailer is not RFC 2047 compliant
> > [...]
> >
> > So, great - it matches unencoded text, but not encoded text.
>
> have a look at the following:
>
> http://docs.exim.org/current/spec_html/ch11.html#SECTexpansionitems
>
> and check out the difference between
>
> $h_<header name>: and $rh_<header name>:
>
> *Maybe* that is the problem here.
Yes, that seems like it - thanks. I've been staring at it long enough
that I was probably no longer thinking clearly.
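For the archives, here is a minimal sketch of the revised statement. It assumes the statement stays in the ACL that runs at DATA time (acl_smtp_data), where the header variables are available, and that acl_c1 is still the running score. The only change is $rh_Subject: in place of $h_Subject:. $h_ applies RFC 2047 decoding before the regex sees the value, so properly encoded and unencoded 8-bit subjects look the same to the match. $rh_ hands over the raw header, so only unencoded 8-bit data should trip the rule. This also assumes PCRE's default (non-UTF) character tables, under which bytes above 0x7F fall outside [:print:]. It is untested beyond the reasoning above.

```
  # Sketch: same rule as before, but matching the raw Subject: header.
  # $rh_ skips RFC 2047 decoding, so an encoded subject stays
  # printable ASCII (=?charset?B?...?=) and should not match.
  warn    condition   = ${if match {$rh_Subject:}{\N[^[:print:]]{8}\N}}
          set acl_c1  = ${eval:$acl_c1+5}
          log_message = mailer is not RFC 2047 compliant
```

The \N...\N brackets keep the regex, including its {8} quantifier, from being treated as expansion syntax.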
> > [...]
> > Any cluebats handy? Is this even a good idea?
>
> Well, I like such things, but what are you trying to accomplish?
> What does this regular expression catch? (Why eight non-printable
> characters at the beginning?)
At the beginning mostly because I wanted to avoid matching the RFC 2047
begin block. I chose 8 as a reasonably sized run to match on, so that
the rule wouldn't penalize someone sending '© 2008' as a subject or
similar (technically not compliant, but a common enough mistake in
legitimate email, and a lone © is only one non-printable byte in
ISO-8859-1, or two in UTF-8). Other than that, 8 is a mostly arbitrary
number.
More generally, I am seeing a lot of foreign-language spam with raw 8-bit
data in the headers, not encoded according to RFC 2047, and I have a fair
number of users who legitimately write email in character sets other
than US-ASCII. So I'm looking for a way to distinguish the junk from the
output of a real MUA - I'm not convinced I've found it, but it's a first
stab at it.
Thanks again,
--
--------------------------------------------------------------------------
| Stephen Gran | : - cut in regexps I don't think we |
| steve@??? | reached consensus on that. We're still |
| http://www.lobefin.net/~steve | backtracking... -- Larry Wall in |
| | <199710291922.LAA07101@???> |
--------------------------------------------------------------------------