Re: [pcre-dev] /\p{Arabic}/8

Author: Juergen Leising
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] /\p{Arabic}/8
On Thu, Nov 22, 2007 at 10:22:20AM +0000, Philip Hazel wrote:
> I can reproduce the problem, and I think I can guess its cause. In the
> table, there are some ARABIC-INDIC characters in the middle of the
> ARABIC characters. And also an AFGHANI character. I suspect that the
> program that reads the Unicode table and turns it into a table for PCRE
> to use gets confused by this, and "loses" some ARABIC characters.

(...)
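
To make the guess quoted above concrete, here is a minimal sketch of a range-table builder that copes with one script's characters being interrupted by another's. It is purely illustrative and is not PCRE's actual table generator; the function names, the toy code points and the script labels "A" and "B" are all assumptions made for this example.

```python
from bisect import bisect_right

def build_ranges(entries):
    """Collapse (code point, script) pairs into sorted (first, last, script) runs."""
    ranges = []
    for cp, script in sorted(entries):
        if ranges and ranges[-1][2] == script and ranges[-1][1] + 1 == cp:
            # Same script and directly adjacent: extend the current run.
            ranges[-1] = (ranges[-1][0], cp, script)
        else:
            # A gap or a different script starts a new run, so one script
            # may end up owning several separate runs.
            ranges.append((cp, cp, script))
    return ranges

def script_of(ranges, cp):
    """Binary-search the runs for the one containing cp; None if unassigned."""
    starts = [r[0] for r in ranges]
    i = bisect_right(starts, cp) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return None

# Toy data (hypothetical): script "A" is interrupted by a single "B" character.
toy_entries = [(0x10, "A"), (0x11, "A"), (0x12, "B"), (0x13, "A"), (0x14, "A")]
toy_ranges = build_ranges(toy_entries)
```

With this toy input the builder keeps two separate runs for "A", so a lookup of 0x13 still finds "A". A generator that assumed a single contiguous run per script would drop the characters after the interruption, which is the kind of loss suspected above.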

I see. OK. By the way, in the meantime I have stumbled over further
characters that PCRE does not match as I would have
expected (just random spot checks, not a complete test):

re> /\p{Armenian}/8
data> \x{0589}

No match

UnicodeData.txt:
0589;ARMENIAN FULL STOP;Po;0;L;;;;;N;ARMENIAN PERIOD;;;;

re> /\p{Cyrillic}/8
data> \x{1D2B}

No match

1D2B;CYRILLIC LETTER SMALL CAPITAL EL;Ll;0;L;;;;;N;;;;;

re> /\p{Devanagari}/8
data> \x{0970}

No match

0970;DEVANAGARI ABBREVIATION SIGN;Po;0;L;;;;;N;;;;;

re> /\p{Devanagari}/8
data> \x{0965}

No match

0965;DEVANAGARI DOUBLE DANDA;Po;0;L;;;;;N;;;;;

re> /\p{Devanagari}/8
data> \x{0964}

No match

0964;DEVANAGARI DANDA;Po;0;L;;;;;N;;;;;

re> /\p{Georgian}/8
data> \x{10FB}

No match

10FB;GEORGIAN PARAGRAPH SEPARATOR;Po;0;L;;;;;N;;;;;

re> /\p{Greek}/8
data> \x{1D68}

No match

1D68;GREEK SUBSCRIPT SMALL LETTER RHO;Ll;0;L;<sub> 03C1;;;;N;;;;;

re> /\p{Greek}/8
data> \x{1D29}

No match

1D29;GREEK LETTER SMALL CAPITAL RHO;Ll;0;L;;;;;N;;;;;

re> /\p{Greek}/8
data> \x{0387}

No match

0387;GREEK ANO TELEIA;Po;0;ON;00B7;;;;N;;;;;

re> /\p{Katakana}/8
data> \x{30FE}

No match

30FE;KATAKANA VOICED ITERATION MARK;Lm;0;L;30FD 3099;;;;N;;;;;

re> /\p{Osmanya}/8
data> \x{10486}

No match

10486;OSMANYA LETTER DEEL;Lo;0;L;;;;;N;;;;;

data> \x{1049A}

No match

1049A;OSMANYA LETTER U;Lo;0;L;;;;;N;;;;;

data> \x{10485}

No match

10485;OSMANYA LETTER KHA;Lo;0;L;;;;;N;;;;;

re> /\p{Runic}/8
data> \x{16ED}

No match

16ED;RUNIC CROSS PUNCTUATION;Po;0;L;;;;;N;;;;;

re> /\p{Thai}/8
data> \x{0E3F}

No match

0E3F;THAI CURRENCY SYMBOL BAHT;Sc;0;ET;;;;;N;THAI BAHT SIGN;;;;
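
For anyone who wants to check records like the ones above mechanically, here is a small sketch that splits a UnicodeData.txt line into its fields. The helper name parse_record and the choice of returned keys are assumptions made for this example. Note that UnicodeData.txt has no script field of its own; Unicode publishes script assignments separately in Scripts.txt, so the character name is only a hint at the script.

```python
FIELDS = 15  # a UnicodeData.txt record has 15 semicolon-separated fields

def parse_record(line):
    """Return selected fields of one UnicodeData.txt record as a dict."""
    parts = line.strip().split(";")
    if len(parts) != FIELDS:
        raise ValueError(f"expected {FIELDS} fields, got {len(parts)}: {line!r}")
    return {
        "code_point": int(parts[0], 16),  # field 0: hexadecimal code point
        "name": parts[1],                 # field 1: character name
        "general_category": parts[2],     # field 2: e.g. Po, Ll, Lo, Sc
        "bidi_class": parts[4],           # field 4: e.g. L, ON, ET
    }

record = parse_record("0589;ARMENIAN FULL STOP;Po;0;L;;;;;N;ARMENIAN PERIOD;;;;")
print(f"U+{record['code_point']:04X} {record['name']} ({record['general_category']})")
```

The field-count check also catches records that lost a trailing semicolon when they were copied by hand.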



Bye, bye,

Juergen