Re: [pcre-dev] PCRE with UTF-8

Author: Philip Hazel
Date:  
To: Manohar S
CC: pcre-dev
Subject: Re: [pcre-dev] PCRE with UTF-8
On Thu, 26 Jun 2008, Manohar S wrote:

> I have attached the actual string for which ovector is not filled up
> properly.
> It seems my UTF-8 characters are not shown in the mail properly.
> Please find the attachment with proper UTF-8 text.


When I run your pattern and string through pcretest, it works fine:

PCRE version 7.7 2008-05-07

/[\'\"][\x{80}-\x{ffff}a-zA-Z0-9]+[\'\"];/8
select * from account where a = 'ਠਡਢಉಉವಷಡಢಣತಥವಷಡಢಣತಥ';
0: '\x{a20}\x{a21}\x{a22}\x{c89}\x{c89}\x{cb5}\x{cb7}\x{ca1}\x{ca2}\x{ca3}\x{ca4}\x{ca5}\x{cb5}\x{cb7}\x{ca1}\x{ca2}\x{ca3}\x{ca4}\x{ca5}';

(pcretest always shows the captured strings using escapes to avoid
display problems.) PCRE fills ovector correctly when called from
pcretest, so something in your code is not working properly.
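
As a cross-check outside pcretest, the same pattern and subject can be run
through another regex engine. The sketch below uses Python's re module,
which is a stand-in and not PCRE; the subject is rebuilt from the escapes
in the pcretest output above, and match_offsets is a hypothetical helper
name. It prints both character offsets and UTF-8 byte offsets. In UTF-8
mode PCRE's ovector holds byte offsets into the subject, so code that
treats them as character positions will look wrong. Whether that is the
problem in your code is only a guess; nothing in this thread confirms it.

```python
import re

# Same pattern as the pcretest run, written for Python's re module
# (a stand-in engine; this is not PCRE itself).
PATTERN = re.compile("['\"][\u0080-\uffffa-zA-Z0-9]+['\"];")

# Quoted value rebuilt from the escapes pcretest printed.
NAME = (
    "\u0a20\u0a21\u0a22\u0c89\u0c89"
    "\u0cb5\u0cb7\u0ca1\u0ca2\u0ca3\u0ca4\u0ca5"
    "\u0cb5\u0cb7\u0ca1\u0ca2\u0ca3\u0ca4\u0ca5"
)
SUBJECT = "select * from account where a = '" + NAME + "';"


def match_offsets(subject):
    """Return (char_span, byte_span) for the first match, or None."""
    m = PATTERN.search(subject)
    if m is None:
        return None
    start, end = m.span()
    # Convert character offsets to offsets into the UTF-8 encoding,
    # which is the unit PCRE uses for ovector in UTF-8 mode.
    byte_start = len(subject[:start].encode("utf-8"))
    byte_end = len(subject[:end].encode("utf-8"))
    return (start, end), (byte_start, byte_end)


if __name__ == "__main__":
    result = match_offsets(SUBJECT)
    if result is None:
        print("no match")
    else:
        chars, octets = result
        print("character offsets:", chars)
        print("UTF-8 byte offsets:", octets)
```

For this subject the match starts at offset 32 either way, but it ends at
character 54 and at byte 92, because each of the 19 non-ASCII characters
takes three bytes in UTF-8.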

Philip

--
Philip Hazel