[pcre-dev] [Bug 865] \b Does not work for non ascii characte…

Author: Mark de Does
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 865] \b Does not work for non ascii characters in UTF-8
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=865




--- Comment #2 from Mark de Does <mark@???> 2009-07-20 18:09:01 ---
Sorry for reporting this.

Mark


On Mon, 2009-07-20 at 09:54 +0100, Philip Hazel wrote:
> ------- You are receiving this mail because: -------
> You reported the bug.
>
> http://bugs.exim.org/show_bug.cgi?id=865
>
> Philip Hazel <ph10@???> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|NEW                         |RESOLVED
>          Resolution|                            |WONTFIX

>
>
>
>
> --- Comment #1 from Philip Hazel <ph10@???> 2009-07-20 09:54:05 ---
> This behaviour is fully documented. See, for example, this comment in a section
> entitled "General comments about UTF-8 mode" in the "pcre" man page:
>
>        6. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
>        test characters of any code value, but the characters that PCRE
>        recognizes as digits, spaces, or word characters remain the same set as
>        before, all with values less than 256. This remains true even when PCRE
>        includes Unicode property support, because to do otherwise would slow
>        down PCRE in many common cases. If you really want to test for a wider
>        sense of, say, "digit", you must use Unicode property tests such as
>        \p{Nd}. Note that this also applies to \b, because it is defined in
>        terms of \w and \W.

>
>
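
The effect described in the quoted man page text can be sketched outside PCRE.
The snippet below is only an analogy, not PCRE itself: it uses Python's re
module, where the re.ASCII flag restricts \w (and therefore \b) to ASCII word
characters, as a stand-in for the "same set as before" behaviour. The sample
strings are made up for illustration.

```python
import re

text = "un été chaud"  # hypothetical sample text

# ASCII-only word set (stand-in for the documented behaviour):
# 'é' is not a word character, so there is no boundary between ' ' and 'é'
# and the pattern cannot match.
ascii_match = re.search(r"\bété\b", text, re.ASCII)

# Unicode-aware word set: 'é' counts as a word character, so \b matches
# on both sides of the word.
unicode_match = re.search(r"\bété\b", text)

# The ASCII-only set also produces false positives: in "café" the 'é' is
# treated as a non-word character, so a boundary is seen right after 'f'.
ascii_false_positive = re.search(r"\bcaf\b", "café", re.ASCII)
unicode_no_match = re.search(r"\bcaf\b", "café")

print(ascii_match)                   # None
print(unicode_match.group())         # été
print(ascii_false_positive.group())  # caf
print(unicode_no_match)              # None
```

In PCRE itself, following the man page's advice, the boundary would have to be
spelled out with Unicode property tests instead of \b, for example lookarounds
over a class such as [\p{L}\p{N}_]. That pattern is an untested assumption
here and needs a PCRE build with Unicode property support.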


