Re: [pcre-dev] /\p{Arabic}/8

Author: Philip Hazel
Date:  
To: Juergen Leising
CC: pcre-dev
Subject: Re: [pcre-dev] /\p{Arabic}/8
On Wed, 21 Nov 2007, Juergen Leising wrote:

> I wonder, why certain hex codes do not match certain
> UTF-8 script names.


> 0654;ARABIC HAMZA ABOVE;Mn;230;NSM;;;;;N;;;;;
> 0656;ARABIC SUBSCRIPT ALEF;Mn;220;NSM;;;;;N;;;;;
> 0658;ARABIC MARK NOON GHUNNA;Mn;230;NSM;;;;;N;;;;;
>
> But they don't. Do I miss something? Wrong table or version or
> syntax or whatever?


I can reproduce the problem, and I think I can guess its cause. In the
table, there are some ARABIC-INDIC characters in the middle of the
ARABIC characters. And also an AFGHANI character. I suspect that the
program that reads the Unicode table and turns it into a table for PCRE
to use gets confused by this, and "loses" some ARABIC characters.
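
To make the interleaving visible, here is a minimal Python sketch. It is
not the program that builds PCRE's tables, and it does not confirm the
cause suggested above; it only groups the assigned code points of the
U+0600..U+06FF block by the first word of their character names, using
Python's standard unicodedata module. The helper names name_label and
runs are made up for this illustration, and the exact runs printed depend
on the Unicode version bundled with the Python in use.

```python
import unicodedata
from itertools import groupby


def name_label(cp):
    """Return the first word of the character name, or None if unassigned."""
    name = unicodedata.name(chr(cp), "")
    return name.split(" ")[0] if name else None


def runs(first, last):
    """Group the assigned code points in [first, last] into
    (label, start, end) runs that share the same first name word."""
    assigned = [cp for cp in range(first, last + 1)
                if name_label(cp) is not None]
    result = []
    for label, group in groupby(assigned, key=name_label):
        members = list(group)
        result.append((label, members[0], members[-1]))
    return result


# Hypothetical check of the block that holds the reported characters.
arabic_block = runs(0x0600, 0x06FF)
for label, start, end in arabic_block:
    print(f"{label:<14} U+{start:04X}..U+{end:04X}")
```

The ARABIC label shows up in several separate runs, split by ARABIC-INDIC
and AFGHANI entries. A table builder that assumed one contiguous run per
label would drop the later runs, which is the kind of confusion I suspect.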

Thank you for the report. I will try to get this bug fixed for the next
release.

Regards,
Philip

--
Philip Hazel