Re: [pcre-dev] Is it possible to use UTF-8 literal character…

Author: Giuseppe D'Angelo
Date:
To: Frank Chang
CC: pcre-dev
Subject: Re: [pcre-dev] Is it possible to use UTF-8 literal characters in a C/C++ PCRE regex?
On 28 June 2012 19:00, Frank Chang <frankchang91@???> wrote:
> Good afternoon. We are trying to match the German string "Munich
> tausendschöne Jungfräulein ausendschçne" using a C/C++ PCRE regex with
> the PCRE_UTF8, PCRE_UCP and PCRE_CASELESS options activated, which uses the
> UTF-8 literals ö, ä, ç. Is it possible to construct a valid PCRE regex which
> uses the UTF-8 literals ö or ä or ç without using codepoints?


It *is* possible, but you must ensure that your compiler's execution
character set is configured to emit UTF-8 sequences. Is that the
case? Try getting a hex dump of the string literal you're passing to
pcre_compile (if necessary, look at the assembler output).

Cheers,
--
Giuseppe D'Angelo