Author: jori.hamalainen Date: To: exim-users Subject: [exim] Advanced regexp question
I need to build a length evaluator for RFC2047 encoded text using PCRE. I know that the newest Exim (4.66) has the ${rfc2047d:[string]} expansion, but upgrading Exim is too much of a hassle just for that purpose.
Is there any way to solve this?
RFC2047 syntax for quoted-printable: =?utf-8?Q?foobar=7C=7C?=
I need the length of this string once decoded, i.e. the same result as ${strlen:foobar||}
A partial result can be obtained with nested ${sg}, but it does not work correctly:
${strlen:${sg{${sg{$header_Subject:}{\N\=\?.*?\?[Qq]\?(\S*?)\?\=\N}{\$1}}}{\N\=[a-fA-F0-9]{2}\N}{#}}}
- For the subject line "=?utf-8?Q?foobar=7C=7C?=" it gives the correct length.
- But it also rewrites a plain, unencoded subject "foobar=7C=7C" to "foobar##", because the inner substitution runs whether or not the outer RFC2047 wrapper matched. The returned length is then wrong (4 chars too short).
So I need a nested regexp that matches "=?(.*?)?[qQ]? [string with escape chars] ?=" and turns every "=[HEX]" inside it into a single char. I only need the length of the string, not the actual decoded text.
My experiments with subpatterns have failed so far. My aim has been to match the outer RFC2047 syntax first, then go through the encoded text char by char and replace each =[HEX] with a single char. All chars could even be replaced with '#' or a similar char, as an actual decode is not needed.
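To make the intended two-stage logic concrete, here is a rough sketch of it in Python rather than in Exim's expansion language. It is only an illustration: Python's re module is not PCRE, ${sg} has no callback mechanism like the one used here, and the function names are made up for this example. Note that it counts one char per =[HEX] escape, so a multi-byte UTF-8 character counts as several chars.

```python
import re

# Outer pattern: one RFC2047 Q-encoded word, e.g. =?utf-8?Q?foobar=7C=7C?=
# Group 1 captures only the encoded text between "?Q?" and "?=".
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[Qq]\?([^?\s]*)\?=")

# Inner pattern: a single =HH escape inside the encoded text.
HEX_ESCAPE = re.compile(r"=[0-9A-Fa-f]{2}")


def collapse_q_words(subject):
    """Replace each Q-encoded word with its text, one '#' per =HH escape.

    Text outside an encoded word is left untouched, so a literal
    "=7C" in a plain subject is not collapsed.
    """
    def collapse(match):
        # Apply the inner substitution to the captured text only.
        return HEX_ESCAPE.sub("#", match.group(1))

    return ENCODED_WORD.sub(collapse, subject)


def decoded_length(subject):
    """Length of the subject after collapsing the Q-encoded words."""
    return len(collapse_q_words(subject))
```

With this, decoded_length("=?utf-8?Q?foobar=7C=7C?=") and decoded_length("foobar=7C=7C") are computed differently, which is the behaviour the nested ${sg} fails to give.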
A small extension of the above would be to handle "?B?" (Base64 encoded RFC2047) as well, so that every 4 encoded chars are replaced by 3 chars.
Is anyone familiar enough with PCRE to give a hint, a clue or an actual answer?
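The 4-to-3 arithmetic for the Base64 case can be sketched the same way, again in Python and only as an illustration with made-up names. One detail the plain "4 chars become 3" rule misses is the trailing "=" padding: each padding char means one byte less in the last group. The sketch assumes the payload is well-formed Base64 whose length is a multiple of 4.

```python
import re

# One RFC2047 B-encoded word, e.g. =?utf-8?B?Zm9vYmFyfHw=?=
# Group 1 captures the Base64 payload including any "=" padding.
B_WORD = re.compile(r"=\?[^?\s]+\?[Bb]\?([A-Za-z0-9+/]*={0,2})\?=")


def b_payload_length(payload):
    """Decoded byte count of a well-formed Base64 payload, without decoding."""
    padding = len(payload) - len(payload.rstrip("="))
    # Every group of 4 chars carries 3 bytes; each "=" removes one byte.
    return len(payload) // 4 * 3 - padding


def b_word_length(word):
    """Decoded length of a single B-encoded word, or len(word) if it is not one."""
    match = B_WORD.fullmatch(word)
    if match is None:
        return len(word)
    return b_payload_length(match.group(1))
```

For example, "foobar||" is "Zm9vYmFyfHw=" in Base64: 12 chars with one padding char, so 12 / 4 * 3 - 1 = 8.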