Re: [pcre-dev] Order Insensitive PCRE regexes for matching m…

Author: Frank Chang
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] Order Insensitive PCRE regexes for matching multiple UTF-8 codepoints that are far apart does not function precisely
Good morning. We wrote an order-insensitive regex,
'(?=.+(\x{00F6})){1}(?=.+(\x{00E4})){1}', that appears to function
correctly. Thank you.
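
For anyone reading the archive, here is a minimal sketch of why this variant can report useful offsets. It uses Python's re module rather than PCRE, purely so the example is self-contained, and that substitution is an assumption on my part. The code points are written as \u00f6 and \u00e4 because re has no \x{...} syntax, and the {1} quantifiers are dropped on the assumption that they are redundant. The helper names find_both and byte_offset are hypothetical. The overall match is still empty, since lookaheads consume nothing, but the capture groups inside them record where each code point was found.

```python
import re

TEXT = "DAS tausendsch\u00f6ne Jungfr\u00e4ulein"

# Two positive lookaheads, each holding one capture group.
# Assumption: this mirrors the PCRE pattern above without the {1} quantifiers.
PATTERN = re.compile("(?=.+(\u00f6))(?=.+(\u00e4))")


def byte_offset(text, char_index):
    """Convert a character index into a UTF-8 byte offset."""
    return len(text[:char_index].encode("utf-8"))


def find_both(text):
    """Return (overall_span, captured_span) as UTF-8 byte offsets, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # The overall match is zero-width because lookaheads consume nothing.
    overall = (byte_offset(text, m.start()), byte_offset(text, m.end()))
    # The capture groups carry the real positions, in whichever order they occur.
    spans = [m.span(1), m.span(2)]
    start = min(s for s, _ in spans)
    end = max(e for _, e in spans)
    return overall, (byte_offset(text, start), byte_offset(text, end))


print(find_both(TEXT))  # ((0, 0), (14, 27))
```

One caveat with this sketch: because each lookahead uses .+ rather than .*, at least one character must precede each code point, so a subject that begins with one of them is not matched here. Using .* inside the lookaheads would avoid that.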

On Mon, Jun 25, 2012 at 6:32 PM, Frank Chang <frankchang91@???> wrote:

> Good evening. We are using C/C++ PCRE 8.30 with PCRE_UTF8 | PCRE_UCP |
> PCRE_COLLATE. Here is an order-insensitive
> regex: '(?=.*\x{00F6})(?=.*\x{00E4})'. It uses ?=, or *positive
> lookahead*, to make sure both UTF-8 code points are matched in either
> order. PCRE_compile() returns OK and PCRE_execute() returns OK on the
> string DAS tausendschöne Jungfräulein. In hex, it is 44 41 53 20 74 61
> 75 73 65 6E 64 73 63 68 C3 B6 6E 65 20 4A 75 6E 67 66 72 C3 A4 75 6C 65 69
> 6E.
>       However, GetMatchStart() returns 0 and GetMatchEnd() returns 0
> instead of GetMatchStart() = 14 and GetMatchEnd() = 27, which we obtain when
> we use the PCRE '\x{00F6}.*\x{00E4}' regex.
>      Please advise us if it is possible to do order-insensitive matching
> of multiple UTF-8 code points in a PCRE regex. Thank you.

>
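
A short illustration of the behaviour described in the quoted message may help. Positive lookaheads are zero-width, so a pattern made only of lookaheads matches the empty string at the first position where both assertions hold, which is offset 0 here. The sketch below uses Python's re module instead of PCRE (an assumption made so it runs without extra libraries), writes the code points as \u00f6 and \u00e4, and converts character indices to UTF-8 byte offsets with a hypothetical helper, utf8_span, so the numbers are comparable to the ones reported above.

```python
import re

TEXT = "DAS tausendsch\u00f6ne Jungfr\u00e4ulein"

# Lookahead-only pattern: order-insensitive, but it consumes no characters.
LOOKAHEAD_ONLY = re.compile("(?=.*\u00f6)(?=.*\u00e4)")

# Ordered pattern: consumes text, but only matches when the first code point
# comes before the second.
ORDERED = re.compile("\u00f6.*\u00e4")


def utf8_span(text, match):
    """Return the match span as UTF-8 byte offsets (start, end)."""
    start = len(text[:match.start()].encode("utf-8"))
    end = len(text[:match.end()].encode("utf-8"))
    return start, end


print(utf8_span(TEXT, LOOKAHEAD_ONLY.search(TEXT)))  # (0, 0)
print(utf8_span(TEXT, ORDERED.search(TEXT)))         # (14, 27)
```

The lookahead-only pattern does succeed on input where the two code points appear in the opposite order, whereas the ordered pattern does not. It just cannot report where the code points are unless something inside the lookaheads captures them.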