[pcre-dev] [Bug 822] Word boundary after arabic character doesn't match

Author: Philip Hazel
Date:
To: pcre-dev
Subject: [pcre-dev] [Bug 822] Word boundary after arabic character doesn't match

------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=822

Philip Hazel <ph10@???> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID

--- Comment #1 from Philip Hazel <ph10@???> 2009-03-18 16:34:13 ---
This is as specified in the pcre.3 man page in the section entitled "General
comments about UTF-8 mode":

       6. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
       test characters of any code value, but the characters that PCRE
       recognizes as digits, spaces, or word characters remain the same set
       as before, all with values less than 256. This remains true even when
       PCRE includes Unicode property support, because to do otherwise would
       slow down PCRE in many common cases. If you really want to test for a
       wider sense of, say, "digit", you must use Unicode property tests such
       as \p{Nd}.

I will add extra words to point out that this affects \b as well, because \b is
defined as the boundary between a character that matches \w and one that does
not.
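
The effect described above can be reproduced outside PCRE. The following is
only an illustrative sketch using Python's standard re module, not PCRE
itself: the re.ASCII flag restricts \w to ASCII word characters, which is
assumed here to be a reasonable stand-in for PCRE's "same set as before"
behaviour, while the default mode treats \w as Unicode-aware. The helper name
ends_at_word_boundary is hypothetical.

```python
import re


def ends_at_word_boundary(word: str, ascii_only: bool) -> bool:
    """Report whether a word-boundary assertion matches right after `word`.

    The word is followed by a single space, so a boundary exists only if
    the last character of `word` counts as a word character.
    """
    # re.ASCII narrows the word-character set to [a-zA-Z0-9_]; this is an
    # analogy for the narrow set described in the man page, not PCRE itself.
    flags = re.ASCII if ascii_only else 0
    pattern = re.compile(re.escape(word) + r"\b", flags)
    return pattern.search(word + " ") is not None


arabic = "كتاب"  # an Arabic word, used purely as sample input

# Narrow word set: the Arabic letter and the space are both non-word
# characters, so there is no boundary between them.
print(ends_at_word_boundary(arabic, ascii_only=True))

# Unicode-aware word set: the Arabic letter is a word character, so the
# boundary before the space is found.
print(ends_at_word_boundary(arabic, ascii_only=False))

# An ASCII word has a boundary in either mode.
print(ends_at_word_boundary("book", ascii_only=True))
```

With the narrow word set, neither the Arabic letter nor the following space
counts as a word character, so no boundary is reported between them. This is
the same reasoning that applies to \b in the report.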

--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email
