Re: [pcre-dev] Locale-aware (Turkish) Unicode caseless matching

Author: ph10
Date:
To: Giuseppe D'Angelo
CC: Jean-Christophe Deschamps, pcre-dev
Subject: Re: [pcre-dev] Locale-aware (Turkish) Unicode caseless matching

On Thu, 23 May 2013, Giuseppe D'Angelo wrote:

> No, because it has nothing to do with Unicode Properties -- it has to do
> with setting up the case folding tables in a different way depending
> on the user's language (if it's Turkish, "I" shouldn't fold to --
> match caselessly -- "i").

We can take our time discussing this, because it is now too late for any
new changes to the forthcoming 8.33 release.

The problem I see with locale-specific changes is precisely the problem
that Unicode is supposed to solve: what do you do if a document is
written in more than one language? If part is in English and part is in
Turkish - which is quite possible in an English book that discusses
Turkish literature (or vice versa) - how can you apply English rules to
some parts and Turkish rules to others?

PCRE currently gets its case-folding rules from the Unicode tables. I
would be very loath to start introducing locale-specific exceptions -
where does this end? I also see these problems:

1. If the tables are modified at PCRE build time you get the best
performance, but you are then restricted to one locale.

2. If the tables are not modified, but special cases are detected at run
time, performance is hit - for all users, not just those in the special
locales.

Perhaps whoever represents Turkish on the Unicode consortium should be
lobbying for new characters I and i that do not fold to each other.

Philip

--
Philip Hazel
