Re: [pcre-dev] Property Codes, Character Classes and non-UTF…

Author: Sheri
Date:  
To: pcre-dev
Old-Topics: [pcre-dev] Property Codes, Character Classes and non-UTF8 mode
Subject: Re: [pcre-dev] Property Codes, Character Classes and non-UTF8 mode
Hi Philip,

Thought a reminder on this issue might be helpful, since it's not in the
bug system.

Regards,
Sheri


Sheri wrote:
> We've previously established and documented that Unicode property codes
> do work in non-UTF8 mode for characters up to 255.
>
> But the documentation says that \p and \P can be used in character
> classes. In character classes, they seem to work only up to 128. Bug?
>
> pcretest
> PCRE version 7.8 2008-09-05
>
> re> /(?:\p{Lu}|\x20)+/
> data> \x41\x20\x50\xC2\x54\xC9\x20\x54\x4F\x44\x41\x59
> 0: A P\xc2T\xc9 TODAY
> re> /[\p{Lu}\x20]+/
> data> \x41\x20\x50\xC2\x54\xC9\x20\x54\x4F\x44\x41\x59
> 0: A P
>
> Regards,
> Sheri
>
>
>

Philip Hazel wrote:
> On Sat, 25 Oct 2008, Sheri wrote:
>
>
>> We've previously established and documented that Unicode property codes
>> do work in non-UTF8 mode for characters up to 255.
>>
>
> Presumably using Unicode encoding?
>
>
>> But the documentation says that \p and \P can be used in character
>> classes. In character classes, they seem to work only up to 128. Bug?
>>
>
> It would seem so. The behaviour should obviously be the same in and
> outside character classes.
>
> I've put this on my list to look at when I next work on PCRE. Thanks for
> the report.
>
> Philip
>
>
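The behaviour both messages treat as correct (the class form matching the same text as the alternation form) can be modelled in a few lines. This is a minimal Python sketch, not PCRE code. It assumes, as Philip's question suggests, that in non-UTF8 mode each byte value is taken directly as a Unicode code point, so 0xC2 is "Â" and 0xC9 is "É", both in category Lu. The helper names are hypothetical. in_class_below_128 only reproduces the reported pcretest output; it is not a claim about how PCRE 7.8 implements classes internally.

```python
import unicodedata
from typing import Callable, Optional

# Subject bytes from the pcretest session quoted above.
SUBJECT = bytes([0x41, 0x20, 0x50, 0xC2, 0x54, 0xC9,
                 0x20, 0x54, 0x4F, 0x44, 0x41, 0x59])


def in_class(byte: int) -> bool:
    # Model of [\p{Lu}\x20]: the byte value is used directly as a code point
    # (assumption: Latin-1, i.e. the first 256 Unicode code points).
    return byte == 0x20 or unicodedata.category(chr(byte)) == "Lu"


def in_class_below_128(byte: int) -> bool:
    # Hypothetical model that reproduces the reported class result:
    # the property test only succeeds for bytes below 128.
    return byte < 128 and in_class(byte)


def search_class_plus(subject: bytes,
                      pred: Callable[[int], bool]) -> Optional[bytes]:
    """Leftmost, greedy match of a single-class repeat such as [...]+ ."""
    for start, byte in enumerate(subject):
        if pred(byte):
            end = start
            while end < len(subject) and pred(subject[end]):
                end += 1
            return subject[start:end]
    return None


if __name__ == "__main__":
    print(search_class_plus(SUBJECT, in_class))            # b'A P\xc2T\xc9 TODAY'
    print(search_class_plus(SUBJECT, in_class_below_128))  # b'A P'
```

The first result matches what pcretest reports for /(?:\p{Lu}|\x20)+/; the second matches what it reports for /[\p{Lu}\x20]+/.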