[pcre-dev] Order Insensitive PCRE regexes for matching multi…

Author: Frank Chang
Date:  
To: pcre-dev
Subject: [pcre-dev] Order Insensitive PCRE regexes for matching multiple UTF-8 codepoints that are far apart does not function precisely
Good evening. We are using C/C++ PCRE 8.30 with PCRE_UTF8 | PCRE_UCP |
PCRE_COLLATE. Here is an order-insensitive
regex: '(?=.*\x{00F6})(?=.*\x{00E4})'. It uses (?=...), i.e. *positive
lookahead*, to make sure both UTF-8 code points are matched in either order.
PCRE_compile() returns OK and PCRE_execute() returns OK on the string "DAS
tausendschöne Jungfräulein". In hex, it is 44 41 53 20 74 61 75 73 65 6E
64 73 63 68 C3 B6 6E 65 20 4A 75 6E 67 66 72 C3 A4 75 6C 65 69 6E.
However, GetMatchStart() returns 0 and GetMatchEnd() returns 0
instead of GetMatchStart() = 14 and GetMatchEnd() = 27, which we obtain when
we use the PCRE regex '\x{00F6}.*\x{00E4}'.
Please advise us whether it is possible to do order-insensitive matching of
multiple UTF-8 code points in a PCRE regex. Thank you.
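
For reference, the reported offsets can be reproduced with a minimal sketch. It uses Python's re module as a stand-in, not PCRE 8.30 or the GetMatchStart()/GetMatchEnd() wrapper from the message, on the assumption that lookahead behaves the same way here: a lookahead is a zero-width assertion, so a pattern made only of lookaheads succeeds at offset 0 and consumes nothing. The helper byte_span() and the alternation pattern are illustrative assumptions, not something taken from the original setup.

```python
import re

text = "DAS tausendschöne Jungfräulein"

def byte_span(s, start, end):
    """Convert code-point offsets into UTF-8 byte offsets (hypothetical helper)."""
    return (len(s[:start].encode("utf-8")), len(s[:end].encode("utf-8")))

# Pattern made only of lookaheads: both assertions hold at position 0,
# nothing is consumed, so the match is empty and starts at 0.
lookahead_only = re.compile(r"(?=.*\u00F6)(?=.*\u00E4)")
lookahead_span = byte_span(text, *lookahead_only.search(text).span())

# Ordered pattern from the message: consumes text from the o-umlaut
# through the a-umlaut.
ordered = re.compile(r"\u00F6.*\u00E4")
ordered_span = byte_span(text, *ordered.search(text).span())

# One possible order-insensitive pattern that still consumes the span:
# an alternation covering both orders.
either_order = re.compile(r"\u00F6.*\u00E4|\u00E4.*\u00F6")
either_span = byte_span(text, *either_order.search(text).span())

# The same pattern on a string where the a-umlaut comes first.
swapped = "Jungfräulein tausendschöne"
swapped_span = byte_span(swapped, *either_order.search(swapped).span())

print(lookahead_span)  # (0, 0)
print(ordered_span)    # (14, 27)
print(either_span)     # (14, 27)
print(swapped_span)    # (6, 26)
```

The alternation grows quickly as more code points are added (n! orderings), so for more than two or three code points it may be simpler to keep the lookahead pattern as a yes/no test and locate each code point with a separate search.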