[pcre-dev] [Fwd: Re: Property Codes, Character Classes and…

Author: Sheri
Date:  
To: pcre-dev
Subject: [pcre-dev] [Fwd: Re: Property Codes, Character Classes and non-UTF8 mode]

Zack Weinberg wrote:
> On Sat, Oct 25, 2008 at 8:42 AM, Sheri <silvermoonwoman@???> wrote:
>
>> Source I think is Unicode, but presently editing as Ansi. AFAIK the
>> upper characters are the same.
>>
>
> They aren't. There are 27 characters defined by "ansi"
> ("windows-1252") that are reserved control characters in Unicode.
>
> Also, which encoding of Unicode do you mean? UTF-8, -16, or -32?
>
> zw
>
>


Actually, the user says the original source is OCR output, which he is
editing in a Windows text editor that uses PCRE for processing regular
expressions. The mismatch for those 27 ANSI characters is understood.
However, the ability to match upper-range characters with Unicode
property codes in non-UTF8 mode is still useful (e.g., for accented
letters), and it works fine (except inside character classes).
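
To illustrate the point about the 27 characters, here is a minimal
sketch in Python (not PCRE itself, and only the standard library). It
assumes that Python's "cp1252" codec is a fair stand-in for what the
editor calls Ansi. The helper names classify_c1_range and same_in_both
are made up for this example. It counts how many bytes in 0x80-0x9F
windows-1252 defines, and checks that bytes 0xA0-0xFF mean the same
thing in windows-1252 and in the first 256 Unicode code points, which is
where the accented letters live.

```python
import unicodedata


def classify_c1_range():
    """Split bytes 0x80-0x9F into those cp1252 defines and those it leaves undefined."""
    defined, undefined = {}, []
    for b in range(0x80, 0xA0):
        try:
            defined[b] = bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec has no mapping for this byte.
            undefined.append(b)
    return defined, undefined


def same_in_both(lo=0xA0, hi=0xFF):
    """True if every byte in [lo, hi] decodes in cp1252 to the code point of the same value."""
    return all(bytes([b]).decode("cp1252") == chr(b) for b in range(lo, hi + 1))


defined, undefined = classify_c1_range()

# How many bytes in 0x80-0x9F cp1252 defines, and how many it leaves out.
print(len(defined), len(undefined))

# The same numeric values, read as Unicode code points, are control characters.
print(all(unicodedata.category(chr(b)) == "Cc" for b in defined))

# The upper range 0xA0-0xFF agrees, so accented letters keep their properties.
print(same_in_both())
print(unicodedata.category(chr(0xE9)))  # general category of e-acute
```

Under those assumptions this should report 27 defined and 5 undefined
bytes in the 0x80-0x9F range, and show that a byte such as 0xE9 is a
lowercase letter either way, which fits with property codes being usable
for accented letters even without UTF-8 mode.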

Regards,
Sheri