[pcre-dev] [Bug 897] \w and others based on Unicode properti…

Author: Philip Hazel
Date:  
To: pcre-dev
Old-Topics: [pcre-dev] [Bug 897] New: \w and others based on Unicode properties
Subject: [pcre-dev] [Bug 897] \w and others based on Unicode properties

http://bugs.exim.org/show_bug.cgi?id=897




--- Comment #7 from Philip Hazel <ph10@???> 2009-12-16 17:28:04 ---
On Wed, 16 Dec 2009, Pavel Kostromitinov wrote:

> One more difficulty came to my mind: how should I make not only \w,
> but also [\w] work?


A good question! Let me see... does [\p{L}] work? Ah yes, it does. So
the answer is to convert \w inside a character class into the equivalent
list of properties. That requires work in pcre_compile.c.
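
To illustrate the conversion idea, here is a minimal text-level sketch in Python. It is not how pcre_compile.c works (that operates on compiled items, not on pattern text), and the function name and the expansion \p{L}\p{N}_ are assumptions for illustration only. It also ignores corner cases such as a literal ] at the start of a class, POSIX classes like [:alpha:], and \Q...\E quoting.

```python
# Assumed expansion of \w, following the "letters, digits, and underscore"
# reading; this has not been checked against Perl.
WORD_PROPS = r"\p{L}\p{N}_"


def expand_word_escape(pattern: str) -> str:
    # Hypothetical helper: rewrite each \w as property items.
    # Inside [...] the items are spliced in bare; outside, they are
    # wrapped in a new character class.
    out = []
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "w":
                out.append(WORD_PROPS if in_class else "[" + WORD_PROPS + "]")
            else:
                out.append(ch + nxt)  # copy any other escape untouched
            i += 2
            continue
        if ch == "[" and not in_class:
            in_class = True
        elif ch == "]" and in_class:
            in_class = False
        out.append(ch)
        i += 1
    return "".join(out)


print(expand_word_escape(r"[\w]"))
print(expand_word_escape(r"\w+"))
```

The point of the sketch is only that [\w] and a bare \w can share one expansion, with the surrounding brackets supplied or omitted depending on context.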

> Not to mention that \w is a combination of ucp_L, ucp_N, underscore and
> I wonder what else...


It *should* just be "letters, digits, and underscore", but I am not sure
exactly how Perl 5.10 treats it. Some experiments are needed to check
this, because the ideal is to be compatible with Perl.
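
As a rough way to experiment with the "letters, digits, and underscore" definition, the following Python sketch tests single characters against Unicode general categories. It assumes the broad reading (any L* or N* category, matching the ucp_L and ucp_N mentioned above); a stricter reading of "digits" would accept only Nd. Which of these Perl 5.10 follows is exactly what still needs checking, so treat the predicate as a hypothesis, not a specification.

```python
import unicodedata


def is_word_char(ch: str) -> bool:
    # Hypothetical predicate: underscore, or any character whose Unicode
    # general category starts with L (letter) or N (number).
    cat = unicodedata.category(ch)
    return ch == "_" or cat.startswith("L") or cat.startswith("N")


for sample in ["a", "\u00e9", "\u0416", "5", "_", "-", " "]:
    print(repr(sample), unicodedata.category(sample), is_word_char(sample))
```

Comparing such a table with the results of /\w/ in Perl for the same characters would show where the two definitions disagree.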

Philip

