Re: [pcre-dev] \b bug with extended Unicode characters?

Author: Philip Hazel
Date:
To: Ralf Junker
CC: pcre-dev
Old-Topics: [pcre-dev] \b bug with extended Unicode characters?
Subject: Re: [pcre-dev] \b bug with extended Unicode characters?

On Wed, 21 Jan 2009, Ralf Junker wrote:

> it appears the word boundary anchor fails to work when it bounds a
> word using extended Unicode characters (PCRE 7.8, UTF-8 enabled):
>
> ÅÅ°ÅÅ± -> Matches
> \bÅÅ°ÅÅ±\b -> Fails
> \bNAME\b -> Matches
>
> Can anybody confirm this?
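The reported behaviour can be reproduced outside PCRE. This is a minimal sketch that assumes Python's `re` module with the `re.ASCII` flag is a fair stand-in: that flag restricts `\w`, and therefore `\b`, to `[A-Za-z0-9_]`, which mirrors what PCRE 7.8 does in UTF-8 mode. The word `ÉÜéü` is a hypothetical example, not the garbled word from the original report.

```python
import re

# Hypothetical all-non-ASCII word (the original characters are garbled in the archive).
word = "ÉÜéü"

def found(pattern: str, subject: str, flags: int = 0) -> bool:
    """Return True if pattern matches anywhere in subject."""
    return re.search(pattern, subject, flags) is not None

# With re.ASCII, \w (and so \b) only recognises ASCII letters, digits and '_',
# similar to PCRE 7.8 in UTF-8 mode.
ascii_results = {
    "plain": found(re.escape(word), word, re.ASCII),                    # like "word -> Matches"
    "bounded": found(r"\b" + re.escape(word) + r"\b", word, re.ASCII),  # like "\bword\b -> Fails"
    "ascii word": found(r"\bNAME\b", "NAME", re.ASCII),                 # like "\bNAME\b -> Matches"
}

# Without re.ASCII, str patterns use a Unicode-aware \w, so the boundaries are found.
unicode_results = {
    "bounded": found(r"\b" + re.escape(word) + r"\b", word),
}

print(ascii_results)    # {'plain': True, 'bounded': False, 'ascii word': True}
print(unicode_results)  # {'bounded': True}
```

The three ASCII-mode results line up with the three lines in the report; only the Unicode-aware run finds the boundaries around the non-ASCII word.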

Did this ever get answered? The answer is that it is a limitation of
PCRE. I have upgraded the documentation about \b to make it even
clearer. It now says this:

In UTF-8 mode, characters with values greater than 128 never match
\d, \s, or \w, and always match \D, \S, and \W. This is true
even when Unicode character property support is available. These
sequences retain their original meanings from before UTF-8 support was
available, mainly for efficiency reasons. Note that this also affects
\b, because it is defined in terms of \w and \W.
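Because `\b` is defined in terms of `\w` and `\W`, its behaviour follows directly from whichever word-character test is in force. The sketch below models that definition. The predicate names `ascii_word_char` and `unicode_word_char` are hypothetical; the ASCII one approximates PCRE 7.8's UTF-8 `\w`, and the Unicode one is a rough contrast based on `str.isalnum`, not PCRE's exact property tables.

```python
from typing import Callable, List

def ascii_word_char(ch: str) -> bool:
    # Approximates PCRE 7.8's \w in UTF-8 mode: ASCII letters, digits and '_' only.
    return ch == "_" or (ch.isascii() and ch.isalnum())

def unicode_word_char(ch: str) -> bool:
    # Hypothetical Unicode-aware \w, included only for contrast.
    return ch == "_" or ch.isalnum()

def is_boundary(subject: str, pos: int, is_word: Callable[[str], bool]) -> bool:
    # A word boundary holds at pos when exactly one neighbour is a word character.
    # Positions before the start or after the end count as non-word (\W).
    before = pos > 0 and is_word(subject[pos - 1])
    after = pos < len(subject) and is_word(subject[pos])
    return before != after

def boundaries(subject: str, is_word: Callable[[str], bool]) -> List[int]:
    # Every position (including both ends) where a \b assertion would succeed.
    return [i for i in range(len(subject) + 1) if is_boundary(subject, i, is_word)]

print(boundaries("NAME", ascii_word_char))    # [0, 4]
print(boundaries("ÉÜéü", ascii_word_char))    # []
print(boundaries("ÉÜéü", unicode_word_char))  # [0, 4]
```

Under the ASCII-only test, a word made entirely of characters above 127 has no boundary positions at all, so a pattern of the form `\b...\b` around it can never match.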

Philip

--
Philip Hazel
