[exim] Advanced regexp question

Author: jori.hamalainen
Date:  
To: exim-users
Subject: [exim] Advanced regexp question
I need to compute the decoded length of RFC 2047 encoded text using PCRE. I know that the newest Exim (4.66) has the ${rfc2047d:[string]} expansion, but upgrading Exim is too much of a hassle for just that purpose.

Is there any way to solve this?

RFC2047 syntax for quoted-printable: =?utf-8?Q?foobar=7C=7C?=

I need the length of this string once decoded, i.e. the equivalent of ${strlen:foobar||}

A partial result can be obtained with nested ${sg}, but it does not work correctly:
${strlen:${sg{${sg{$header_Subject:}{\N\=\?.*?\?[Qq]\?(\S*?)\?\=\N}{\$1}}}{\N\=[a-fA-F0-9]{2}\N}{#}}}

- for the subject line "=?utf-8?Q?foobar=7C=7C?=" it gives the correct length
- but for a plain, unencoded subject "foobar=7C=7C" the second substitution also fires, turning it into "foobar##" and returning a wrong length (4 chars too short: 8 instead of 12).
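
To make the two cases above concrete, here is a rough Python re-creation of the nested ${sg} expression. It is only an illustration: Python's re module stands in for PCRE, and the function name nested_sg_length is made up for this sketch.

```python
import re

def nested_sg_length(subject):
    """Mimic the nested ${sg} attempt: unwrap the encoded-word, then
    replace every =XX escape anywhere in the result with '#'."""
    # Inner ${sg}: strip the =?charset?Q? ... ?= wrapper, keep the text.
    inner = re.sub(r"=\?.*?\?[Qq]\?(\S*?)\?=", r"\1", subject)
    # Outer ${sg}: runs on the whole string, encoded or not.
    outer = re.sub(r"=[a-fA-F0-9]{2}", "#", inner)
    return len(outer)

print(nested_sg_length("=?utf-8?Q?foobar=7C=7C?="))  # encoded subject
print(nested_sg_length("foobar=7C=7C"))              # plain subject
```

The second substitution has no way of knowing whether the text came out of an encoded-word, which is why the plain subject is shortened as well.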

So I need a nested regexp that matches "=?(.*?)?[qQ]? [string with escape chars] ?=" and turns each "=[HEX]" inside that match, and only there, into a single char. I just need the length of the string, not the actual decoded text.

My experiments with subpatterns have failed so far. My aim has been to match the outer RFC 2047 syntax first, then go through the encoded text char by char and replace each =[HEX] with a single char. All chars could even be replaced with '#' or a similar char, as an actual decode is not needed.
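
The two-step idea described above (match the outer syntax, then collapse escapes only inside the match) can be sketched in Python as follows. This is not an Exim expansion; the names ENCODED_WORD_Q, HEX_ESCAPE and subject_length are invented for the sketch, and it assumes that counting decoded bytes is good enough (a multi-byte UTF-8 character counts once per byte) and that whitespace between adjacent encoded-words can be ignored.

```python
import re

# One RFC 2047 encoded-word using Q encoding: =?charset?Q?text?=
ENCODED_WORD_Q = re.compile(r"=\?[^?\s]+\?[Qq]\?([^?\s]*)\?=")
# One =XX escape inside the encoded text.
HEX_ESCAPE = re.compile(r"=[0-9A-Fa-f]{2}")

def collapse_encoded_word(match):
    """Replace the whole encoded-word by its text with each =XX as '#'."""
    return HEX_ESCAPE.sub("#", match.group(1))

def subject_length(subject):
    """Length of the subject with Q encoded-words counted as decoded."""
    # The callback only ever sees text inside an encoded-word, so a
    # literal "=7C" in a plain subject is left untouched.
    return len(ENCODED_WORD_Q.sub(collapse_encoded_word, subject))

print(subject_length("=?utf-8?Q?foobar=7C=7C?="))
print(subject_length("foobar=7C=7C"))
```

The point of the callback is that the inner replacement is scoped to the outer match, which is the part a flat pair of nested substitutions cannot express.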

A small extension of the above would be to handle "?B?" (Base64 encoded RFC 2047) as well, so that every 4 encoded chars are replaced by 3 chars.
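
For the Base64 case the length can be derived arithmetically: every 4 encoded chars give 3 bytes, minus one byte per trailing "=" pad char. A minimal Python sketch of that, again with made-up names (ENCODED_WORD_B, b64_decoded_length, subject_length_b) and assuming well-formed Base64 whose length is a multiple of 4:

```python
import re

# One RFC 2047 encoded-word using B (Base64) encoding: =?charset?B?text?=
ENCODED_WORD_B = re.compile(r"=\?[^?\s]+\?[Bb]\?([A-Za-z0-9+/]*={0,2})\?=")

def b64_decoded_length(text):
    """Decoded byte count of a Base64 string, without decoding it."""
    padding = len(text) - len(text.rstrip("="))
    return (len(text) // 4) * 3 - padding

def subject_length_b(subject):
    """Length of the subject with B encoded-words counted as decoded."""
    return len(ENCODED_WORD_B.sub(
        lambda m: "#" * b64_decoded_length(m.group(1)), subject))

# "Zm9vYmFyfHw=" is the Base64 form of "foobar||".
print(subject_length_b("=?utf-8?B?Zm9vYmFyfHw=?="))
```

As with the Q case, the result is a byte count, so it can differ from the character count for multi-byte charsets.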

Is anyone familiar enough with PCRE to give a hint, a clue, or an actual answer?

Thanks in advance,
Jori