Author: ph10
Date:
To: Jayaprakasam, Kannan
CC: pcre-dev@exim.org
Subject: Re: [pcre-dev] pcre not matching unicode characters

On Thu, 17 Oct 2013, Jayaprakasam, Kannan wrote:
> Resending my question as I'm still stuck on this.
>
> From: Jayaprakasam, Kannan
> Sent: Tuesday, August 20, 2013 3:27 PM
> To: 'pcre-dev@???'
> Subject: pcre not matching unicode characters
>
>
> I'm compiling a pcre pattern with utf8 flag enabled and am trying to
> match a utf8 char* string against it, but it is not matching and
> pcre_exec returns negative. I'm passing the subject length as 65 to
> pcre_exec which is the number of characters in the string. Please
> help.
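
Has anybody replied to you? I have been offline since October 17th
because of a broken telephone connection (copper cable thieves).

First thing: is 65 the number of *characters*? What you should pass to
pcre_exec is the length of the subject in *bytes*, which is larger than
the character count whenever the string contains non-ASCII UTF-8
characters.
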
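
To make the difference concrete, here is a minimal sketch in plain C (no
PCRE calls) that compares the two counts for a UTF-8 string. The helper
names utf8_char_count and subject_byte_length are hypothetical, made up
for this illustration, and the character counter assumes the input is
valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count UTF-8 characters (code points) by counting
   every byte that is NOT a continuation byte (bit pattern 10xxxxxx).
   Assumes the input is valid, NUL-terminated UTF-8. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Hypothetical helper: the byte length of the subject. This is the kind
   of value the length argument of pcre_exec expects, not the character
   count above. */
int subject_byte_length(const char *s)
{
    return (int)strlen(s);
}
```

For the four-character string "café" encoded as UTF-8 ("caf\xc3\xa9" in
C), utf8_char_count gives 4 but subject_byte_length gives 5, and 5 is the
length to pass.
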
> (If I try without the flag PCRE_UTF8 however, it matches but the
> offset vector[1] is 30 which is index of the character just before a
> unicode character in my input string)
You *must* set PCRE_UTF8 in the pcre_compile options if you are working
with UTF-8 strings.