Re: [pcre-dev] Locale-aware (Turkish) Unicode caseless match…

Author: ph10
Date:  
To: Giuseppe D'Angelo
CC: Jean-Christophe Deschamps, pcre-dev
Subject: Re: [pcre-dev] Locale-aware (Turkish) Unicode caseless matching
On Thu, 23 May 2013, Giuseppe D'Angelo wrote:

> No, because it hasn't to do with Unicode Properties -- it has to do
> with setting up the case folding tables in a different way depending
> on the user's language (if it's Turkish, "I" shouldn't fold to --
> match caselessly -- "i").


We can take our time discussing this, because it is now too late for any
new changes to the forthcoming 8.33 release.

The problem I see with locale-specific changes is the very problem that
Unicode is supposed to solve: what do you do if a document is written in
more than one language? If part is in English and part is in Turkish,
which is quite possible in an English book that discusses Turkish
literature (or vice versa), how can you have English rules for some
parts and Turkish rules for others?
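
To make the conflict concrete, here is a rough illustration in Python
(not PCRE code). The helper names fold_default and fold_turkic are
made up for this sketch; the two Turkic mappings (I to dotless ı, and
dotted İ to i) are the ones Unicode lists as Turkic-only foldings. It
suggests that whichever rule set is chosen for the whole text, one of
the two languages matches incorrectly.

```python
# Illustrative sketch only; helper names are hypothetical.

# Turkic-specific foldings: I -> dotless ı (U+0131), dotted İ (U+0130) -> i
TURKIC_FOLD = {"I": "\u0131", "\u0130": "i"}


def fold_default(s: str) -> str:
    """Locale-independent Unicode case folding (Python's str.casefold)."""
    return s.casefold()


def fold_turkic(s: str) -> str:
    """Apply the Turkic I/İ mappings first, then the default folding."""
    return "".join(TURKIC_FOLD.get(ch, ch) for ch in s).casefold()


def caseless_equal(a: str, b: str, fold) -> bool:
    """Caseless comparison under a single, document-wide folding rule."""
    return fold(a) == fold(b)


# English word: matches under default rules, fails under Turkish rules.
english_ok = caseless_equal("INDEX", "index", fold_default)
english_broken = caseless_equal("INDEX", "index", fold_turkic)

# Turkish word "kız": matches under Turkish rules, fails under default rules.
turkish_ok = caseless_equal("KIZ", "k\u0131z", fold_turkic)
turkish_broken = caseless_equal("KIZ", "k\u0131z", fold_default)
```

With one folding rule per match, a text containing both "INDEX" and
"KIZ" cannot have both words matched caselessly in the way their
respective readers would expect.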

PCRE currently gets its case-folding rules from the Unicode tables. I
would be very loath to start introducing locale-specific exceptions -
where does this end? I also see these problems:

1. If the tables are modified at PCRE build time you get the best
performance, but you are then restricted to one locale.

2. If the tables are not modified, but special cases are detected at run
time, performance is hit - for all users, not just those in the special
locales.
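
The two options can be sketched as a toy model in Python. This is only
an assumption-laden illustration, not how PCRE's tables are actually
laid out: BASE_FOLD, TURKIC_OVERRIDES, build_table, fold_with_table and
fold_at_runtime are all hypothetical names, and the base table covers
ASCII letters only.

```python
# Toy model of the two options; not PCRE's real data structures.

# Stand-in for the default folding table (ASCII upper -> lower only).
BASE_FOLD = {chr(c): chr(c + 32) for c in range(ord("A"), ord("Z") + 1)}

# Turkic exceptions: I -> dotless ı (U+0131), dotted İ (U+0130) -> i
TURKIC_OVERRIDES = {"I": "\u0131", "\u0130": "i"}


def build_table(turkic: bool) -> dict:
    """Option 1: bake the locale into the table once, at build time."""
    table = dict(BASE_FOLD)
    if turkic:
        table.update(TURKIC_OVERRIDES)
    return table


def fold_with_table(ch: str, table: dict) -> str:
    """Option 1 lookup: a single table access, but the locale is fixed."""
    return table.get(ch, ch)


def fold_at_runtime(ch: str, turkic: bool) -> str:
    """Option 2: shared table plus a locale test on every character.

    The `turkic` test is evaluated for all callers, including those
    who never use the Turkic rules.
    """
    if turkic and ch in TURKIC_OVERRIDES:
        return TURKIC_OVERRIDES[ch]
    return BASE_FOLD.get(ch, ch)
```

In option 1 the lookup itself stays as cheap as before, but every
pattern matched against that table follows the same locale. In option 2
the locale can vary per call, at the price of an extra test on the
per-character path.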

Perhaps whoever represents Turkish at the Unicode Consortium should be
lobbying for new characters I and i that do not fold to each other.

Philip

--
Philip Hazel