Re: [pcre-dev] Is it possible to use UTF-8 literal character…

Author: Giuseppe D'Angelo
Date:  
To: Frank Chang
CC: pcre-dev
Subject: Re: [pcre-dev] Is it possible to use UTF-8 literal characters in a C/C++ PCRE regex?
On 28 June 2012 19:00, Frank Chang <frankchang91@???> wrote:
> Good afternoon, We are trying to match the German string. Munich
> tausendschöne Jungfräulein ausendschçne, using a C/C++ PCRE regex with
> PCRE_UTF8, PCRE_UCP, PCRE_CASELESS options activated which uses the UTF-8
> literals, ö, ä, ç Is it possible to construct a valid PCRE regex which uses
> the UTF-8 literals ö or ä or ç without using codepoints?


It *is* possible, but you must make sure that your compiler's execution
character set is UTF-8, so that the string literal is actually stored as
UTF-8 byte sequences. Is that the case? Try getting a hex dump of the
string literal you're passing to pcre_compile (or, if necessary, look at
the assembler output).
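
A minimal sketch of such a hex dump, in case it helps. The helper names
hex_dump and show_pattern_bytes are made up for this example and are not
part of PCRE. If the execution charset is UTF-8, the literal "ö" should
dump as the two bytes c3 b6; a single byte f6 would suggest Latin-1, which
is not valid UTF-8 for PCRE_UTF8.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a PCRE function): writes the bytes of s into
   out as space-separated lowercase hex, e.g. "c3 b6". Returns the number
   of bytes dumped; the output is truncated if out is too small. */
static size_t hex_dump(const char *s, char *out, size_t out_size)
{
    size_t n = strlen(s);
    size_t pos = 0;
    size_t i;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        /* Worst case per byte: separator + two hex digits + NUL. */
        if (pos + 4 > out_size)
            break;
        pos += (size_t)sprintf(out + pos, i ? " %02x" : "%02x",
                               (unsigned char)s[i]);
    }
    return i;
}

/* Prints the bytes the compiler actually emitted for a pattern literal.
   Patterns longer than the local buffer allows are truncated. */
static void show_pattern_bytes(const char *pattern)
{
    char buf[256];

    hex_dump(pattern, buf, sizeof buf);
    printf("%s\n", buf);
}
```

Call show_pattern_bytes with the very same literal you hand to
pcre_compile, for instance show_pattern_bytes("ö"). Writing the pattern
with explicit byte escapes such as "\xC3\xB6" gives the same UTF-8 bytes
regardless of how the source file is encoded, which makes a handy
cross-check.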

Cheers,
--
Giuseppe D'Angelo