[pcre-dev] [Bug 865] \b Does not work for non ascii characte…

Author: Philip Hazel
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 865] \b Does not work for non ascii characters in UTF-8
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=865

Philip Hazel <ph10@???> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX

--- Comment #1 from Philip Hazel <ph10@???> 2009-07-20 09:54:05 ---
This behaviour is fully documented. See, for example, this comment in a section
entitled "General comments about UTF-8 mode" in the "pcre" man page:

       6. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
       test characters of any code value, but the characters that PCRE
       recognizes as digits, spaces, or word characters remain the same set
       as before, all with values less than 256. This remains true even when
       PCRE includes Unicode property support, because to do otherwise would
       slow down PCRE in many common cases. If you really want to test for a
       wider sense of, say, "digit", you must use Unicode property tests such
       as \p{Nd}. Note that this also applies to \b, because it is defined in
       terms of \w and \W.
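
To illustrate why \b follows from \w and \W, here is a minimal sketch in
Python rather than PCRE. It is an analogy only: Python's re.ASCII flag stands
in for PCRE's fixed set of word characters, and the helper names
(is_boundary, ascii_word, unicode_word) are hypothetical, not part of any
library. The "wider" test uses Unicode general categories (letters and
numbers plus underscore), which is one possible approximation of a
property-based word test.

```python
import re
import unicodedata


def ascii_word(ch):
    # Narrow test: stands in for a fixed, non-Unicode \w table (assumption:
    # re.ASCII is used here only as an analogy for PCRE's behaviour).
    return re.fullmatch(r"\w", ch, re.ASCII) is not None


def unicode_word(ch):
    # Wider test: any Unicode letter (L*) or number (N*), plus underscore.
    return ch == "_" or unicodedata.category(ch)[0] in ("L", "N")


def is_boundary(text, i, word_test):
    # \b holds at position i when exactly one of the neighbouring
    # characters is a word character under the chosen test.
    before = i > 0 and word_test(text[i - 1])
    after = i < len(text) and word_test(text[i])
    return before != after


text = "\u00e9foo"  # "éfoo": a non-ASCII letter followed by ASCII letters

# Positions where a boundary is reported under each definition.
narrow = [i for i in range(len(text) + 1) if is_boundary(text, i, ascii_word)]
wide = [i for i in range(len(text) + 1) if is_boundary(text, i, unicode_word)]

print("narrow boundaries:", narrow)
print("wide boundaries:", wide)
```

With the narrow test, a boundary is reported between "é" and "f" because "é"
is not in the word set; with the wider test, the whole string is one word.
In PCRE itself the wider behaviour has to be requested explicitly with
property tests such as \p{L} and \p{Nd}, as the man page says.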



--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email