[pcre-dev] Is it possible to use UTF-8 literal characters in a C/C++ PCRE regex?

Author: Frank Chang
Date:
To: pcre-dev
Subject: [pcre-dev] Is it possible to use UTF-8 literal characters in a C/C++ PCRE regex?

Good afternoon. We are trying to match the German string "Munich
tausendschöne Jungfräulein ausendschçne" using a C/C++ PCRE regex with the
PCRE_UTF8, PCRE_UCP, and PCRE_CASELESS options set, where the regex itself
contains the UTF-8 literals ö, ä, ç. Is it possible to construct a valid
PCRE regex that uses the UTF-8 literals ö, ä, or ç directly, without
writing them as code point escapes?
        We are able to match the German string "Munich tausendschöne
Jungfräulein ausendschçne" with a PCRE regex that uses a positive lookahead
and a sequence of Unicode code points written as \x{...} escapes, for
example (?=.+(\x{0068}\x{00F6})){1}. However, when we add any of the UTF-8
literals ö, ä, ç to the PCRE regex, pcre_compile() complains that the
regex is an invalid UTF-8 string. Thank you.
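
One thing that may be worth checking, as an assumption only since the
message does not show how the pattern string is built: with PCRE_UTF8 set,
pcre_compile() validates the pattern bytes as UTF-8. If the source file is
saved in a single-byte encoding such as Latin-1, ö reaches the compiler as
the single byte 0xF6 rather than the UTF-8 pair 0xC3 0xB6, and that byte is
not valid UTF-8 on its own. The sketch below is a minimal, self-contained
check of the pattern bytes. The names is_valid_utf8, pattern_is_utf8,
pattern_utf8 and pattern_latin1 are hypothetical helpers for illustration
and are not part of PCRE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of PCRE): returns 1 if the n bytes at s
   are well-formed UTF-8, 0 otherwise. */
int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t extra;        /* continuation bytes expected after c */
        unsigned long cp;    /* code point being decoded */
        unsigned long min;   /* smallest code point allowed at this length */

        if (c < 0x80) { i++; continue; }                 /* ASCII */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;       /* stray continuation or invalid lead byte */

        if (n - i - 1 < extra) return 0;                 /* truncated */
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;     /* not 10xxxxxx */
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += extra + 1;
    }
    return 1;
}

/* Hypothetical convenience wrapper for NUL-terminated pattern strings. */
int pattern_is_utf8(const char *pattern)
{
    assert(pattern != NULL);
    return is_valid_utf8((const unsigned char *)pattern, strlen(pattern));
}

/* "hö" with ö spelled as explicit UTF-8 bytes, so the result does not
   depend on the encoding of the source file. */
const char pattern_utf8[] = "h\xC3\xB6";

/* "hö" as it would look if the source file were saved as Latin-1
   (an assumed scenario, not confirmed by the message). */
const char pattern_latin1[] = "h\xF6";
```

Under these assumptions, pattern_is_utf8(pattern_utf8) returns 1 and
pattern_is_utf8(pattern_latin1) returns 0. Spelling the literal as byte
escapes in the C string, or saving the source file as UTF-8, would keep the
bytes passed to pcre_compile() well-formed.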
