Re: [pcre-dev] pcre not matching unicode characters

Author: ph10
To: Jayaprakasam, Kannan
CC: pcre-dev@exim.org
Subject: Re: [pcre-dev] pcre not matching unicode characters
On Thu, 17 Oct 2013, Jayaprakasam, Kannan wrote:

> Resending my question as I'm still stuck on this.
>
> From: Jayaprakasam, Kannan
> Sent: Tuesday, August 20, 2013 3:27 PM
> To: 'pcre-dev@???'
> Subject: pcre not matching unicode characters
>
>
> I'm compiling a pcre pattern with utf8 flag enabled and am trying to
> match a utf8 char* string against it, but it is not matching and
> pcre_exec returns negative. I'm passing the subject length as 65 to
> pcre_exec which is the number of characters in the string. Please
> help.


Has anybody replied to you? I have been offline since October 17th
because of a broken telephone connection (copper cable thieves).

First thing: is 65 the number of *characters*? What you should pass is
the number of *bytes*.

> (If I try without the flag PCRE_UTF8 however, it matches but the
> offset vector[1] is 30 which is index of the character just before a
> unicode character in my input string)


You *must* set PCRE_UTF8 in the pcre_compile options if you are working
with UTF-8 strings.

Philip

--
Philip Hazel