[pcre-dev] [Bug 897] \w and others based on Unicode properties

Author: Philip Hazel
Date:
To: pcre-dev
Old-Topics: [pcre-dev] [Bug 897] New: \w and others based on Unicode properties
Subject: [pcre-dev] [Bug 897] \w and others based on Unicode properties

------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=897

--- Comment #17 from Philip Hazel <ph10@???> 2010-03-22 09:18:28 ---
On Fri, 19 Mar 2010, Pavel Kostromitinov wrote:

> Is there any hope that patches for this feature will make their way into main
> pcre version? I guess there'll be more people who could appreciate this.

Somebody was working on patches a few months ago, but I have heard no
more. The patches were rather large.

I am aware that this is an issue and I will think about it at some
point, but please do not hold your breath. It will not be soon.

> [^\pL\pN] cannot be used, since the set is either inclusive or exclusive - and
> if I have [\Wa-f] as input, I cannot convert it to something
> like [(^\pL\pN)a-f]...

Sadly, no. It would have to become (?:[^\pL\pN]|[a-f]) and I agree this
is not very nice. But it may be the best way to do it ... one of my
thoughts about how to provide this feature automatically is to do
conversions like that internally at compile time rather than having lots
of changes in the exec-time code.
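
To make the conversion above concrete, here is a minimal sketch of that
kind of rewrite done on the pattern text. It is only an illustration, not
how PCRE does or would do it: the function name rewrite_class is
hypothetical, the mapping of \W to [^\pL\pN] is the approximation used in
this thread, and only a simple, non-negated class is handled (an escaped
backslash followed by W, nested POSIX classes, and so on are ignored).

```python
# Approximation of \W taken from this thread; an assumption, not PCRE's definition.
NEGATED_WORD = r"[^\pL\pN]"


def rewrite_class(cls: str) -> str:
    """Rewrite a simple, non-negated class containing \\W as an alternation.

    Hypothetical helper for illustration only: it works on the pattern
    text and does not understand escapes such as a literal backslash
    followed by W.
    """
    if not (cls.startswith("[") and cls.endswith("]")) or cls.startswith("[^"):
        raise ValueError("only simple non-negated classes are handled")
    body = cls[1:-1]
    if r"\W" not in body:
        return cls  # nothing to convert
    rest = body.replace(r"\W", "")
    if not rest:
        return NEGATED_WORD  # the class contained only \W
    # The negated part cannot be merged with the positive items,
    # so the result is an alternation of two classes.
    return "(?:" + NEGATED_WORD + "|[" + rest + "])"


print(rewrite_class(r"[\Wa-f]"))
```

For the input [\Wa-f] this prints (?:[^\pL\pN]|[a-f]), which is the form
given above.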

> And there are problems with pcre_study.c.

What problems?

> Also - there should be some hack for \t to be treated as \s (it is NOT \pZ)

Do you mean that you want \t to match \pZ? Or do you want \t to still
match \s when \w, \d, etc. are using Unicode properties?
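
For reference, the general categories involved can be checked with a
short script. This is only a sketch using Python's standard unicodedata
module rather than PCRE itself; the helper name is_pZ is hypothetical.
It shows that tab is a control character (Cc) and so falls outside the
separator categories that \pZ covers.

```python
import unicodedata


def is_pZ(ch: str) -> bool:
    """Return True if ch is in a Unicode separator category (Zs, Zl, Zp)."""
    return unicodedata.category(ch).startswith("Z")


# Tab, space, line separator, paragraph separator.
for ch in ("\t", " ", "\u2028", "\u2029"):
    print(repr(ch), unicodedata.category(ch), is_pZ(ch))
```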

Philip

--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email
