Re: [exim] Why does this regex not match?

Author: W B Hacker
Date:
To: exim users
Subject: Re: [exim] Why does this regex not match?
Robert Nicholson wrote:
> So I've got mail that has headers like this
>
> From: "=?windows-1255?B?4+X46fog+ezp5S349Oz38eXs5eLp+g==?="
> <info@???>
>
> Subject:
> =?windows-1255?B?7PHl6/j66e0g5ezu6+Dl4ekg+OLs6entIC0g7uXu7uz1IOHp5fr4?=
>
> and so I have regular expressions like this
>
> $header_from: does not match "\\s*=\\?(ks_c_5601-|big5|euc-|shift-jis|(iso.\{0,4\}639-)|hkscs|sil|koi[78]|iscii|guobiao|gb2312|gb18030|(iso.\{0,4\}2022)|(iso.\{0,4\}8859-[57])|(windows-1251)|(windows-1255))" and
> $header_content-type: does not match "\\s*=\\?(ks_c_5601-|big5|euc-|shift-jis|(iso.\{0,4\}639-)|hkscs|sil|koi[78]|iscii|guobiao|gb2312|gb18030|(iso.\{0,4\}2022)|(iso.\{0,4\}8859-[57])|(windows-1251)|(windows-1255))" and
> $header_subject: does not match "\\s*=\\?(ks_c_5601-|big5|euc-|shift-jis|(iso.\{0,4\}639-)|hkscs|sil|koi[78]|iscii|guobiao|gb2312|gb18030|(iso.\{0,4\}2022)|(iso.\{0,4\}8859-[57])|(windows-1251)|(windows-1255))"
>
> save $home/Maildir/.INBOX.intray.backup/
>
> Now whilst this regular expression is quite convoluted even if I just
> put $header_from: does not match "windows" it still doesn't match
>
> Can anybody tell me why that's the case?


You would need a partial match: the pattern has to be tried at every
position in the header (a sliding window, effectively a 'contains' on
any substring), not compared against the whole header as an exact match.

That's potentially a lot of CPU cycles if you parse all traffic.
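
To illustrate the difference between a 'contains' search and an exact
match, here is a minimal sketch in Python (not Exim filter syntax). It
assumes the raw, undecoded header text is what gets tested; the charset
list is a shortened subset of the one quoted above, and the function
names are made up for this example.

```python
import re

# Shortened subset of the charset list from the quoted filter
# (an assumption made for brevity; extend as needed).
FOREIGN_CHARSET = re.compile(
    r"=\?(big5|euc-|shift-jis|koi[78]|gb2312|gb18030"
    r"|iso.{0,4}2022|iso.{0,4}8859-[57]|windows-125[15])",
    re.IGNORECASE,
)


def contains_foreign_charset(raw_header: str) -> bool:
    """Return True if the pattern occurs anywhere in the raw header."""
    # re.search slides along the string and succeeds on any substring.
    return FOREIGN_CHARSET.search(raw_header) is not None


def is_exactly_foreign_charset(raw_header: str) -> bool:
    """Return True only if the pattern covers the entire header text."""
    # re.fullmatch requires the whole string to be matched.
    return FOREIGN_CHARSET.fullmatch(raw_header) is not None


# Raw From: value taken from the quoted message above.
raw_from = '"=?windows-1255?B?4+X46fog+ezp5S349Oz38eXs5eLp+g==?=" <info@???>'

print(contains_foreign_charset(raw_from))    # substring search
print(is_exactly_foreign_charset(raw_from))  # whole-string match
```

The substring search finds the charset tag even though the header starts
with a quote character; the whole-string match does not, because the
pattern covers only a small part of the header.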

>
> In my case I don't want to copy mail with foreign charsets etc into
> the backup folder which is why copying it is conditional that it
> doesn't match those patterns.
>
>
>
>


If you could post the ENTIRE set of headers, I strongly suspect we could
help you identify a simpler means of selecting such traffic, perhaps
without any need for regex parsing at all.

To date, you haven't even told us if 'on inspection' these messages have
turned out to be totally unwanted, or if all/some are actually
legitimate traffic for members of your user community who just happen to
be multilingual.

Bill