[pcre-dev] [Bug 865] \b Does not work for non ascii characters in UTF-8

Author: Mark de Does
Date:
To: pcre-dev
Subject: [pcre-dev] [Bug 865] \b Does not work for non ascii characters in UTF-8

------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=865

--- Comment #2 from Mark de Does <mark@???> 2009-07-20 18:09:01 ---
Sorry for reporting this.

Mark

On Mon, 2009-07-20 at 09:54 +0100, Philip Hazel wrote:
> ------- You are receiving this mail because: -------
> You reported the bug.
>
> http://bugs.exim.org/show_bug.cgi?id=865
>
> Philip Hazel <ph10@???> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|NEW                         |RESOLVED
>          Resolution|                            |WONTFIX
>
> --- Comment #1 from Philip Hazel <ph10@???> 2009-07-20 09:54:05 ---
> This behaviour is fully documented. See, for example, this comment in a section
> entitled "General comments about UTF-8 mode" in the "pcre" man page:
>
> 6. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
> test characters of any code value, but the characters that PCRE
> recognizes as digits, spaces, or word characters remain the same set as
> before, all with values less than 256. This remains true even when PCRE
> includes Unicode property support, because to do otherwise would slow
> down PCRE in many common cases. If you really want to test for a wider
> sense of, say, "digit", you must use Unicode property tests such as
> \p{Nd}. Note that this also applies to \b, because it is defined in
> terms of \w and \W.
>
>
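
For readers who want to see the effect described in the quoted man page text, here is a minimal sketch. It is an analogy only: it uses Python's standard `re` module, not PCRE, and relies on the assumption that `re.ASCII` (which restricts \w and \b to ASCII word characters) is a fair stand-in for PCRE's default behaviour with the default character tables. The sample strings and variable names are made up for illustration.

```python
import re

# Sample text with one non-ASCII letter (U+00E9, precomposed e-acute).
text = "caf\u00e9 au lait"

# ASCII-only \w and \b: a stand-in for the PCRE behaviour quoted above.
# The e-acute is not a word character here, so a boundary appears
# between "f" and the e-acute, and "caf" is reported as a whole word.
ascii_words = re.findall(r"\b\w+\b", text, flags=re.ASCII)

# Unicode-aware \w and \b (Python's default for str patterns):
# the e-acute counts as a word character, so the full word is found.
unicode_words = re.findall(r"\b\w+\b", text)

# \b directly after the non-ASCII letter at the end of the string:
# no boundary in ASCII mode, because the preceding character is non-word.
ascii_boundary = re.search("\u00e9\\b", "caf\u00e9", flags=re.ASCII)
unicode_boundary = re.search("\u00e9\\b", "caf\u00e9")

print(ascii_words)
print(unicode_words)
print(ascii_boundary is None, unicode_boundary is not None)
```

In PCRE itself the wider tests are written with Unicode properties such as \p{Nd}, as the man page says. Python's `re` has no \p{...} support, so the sketch only contrasts the two definitions of "word character".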

--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email
