Summary: Case folding in PCRE
I was wondering what the current (or planned) status of case folding is in
PCRE when doing a case-insensitive match on Unicode text.
For instance, in a case-insensitive match, "ß" (U+00DF LATIN SMALL LETTER
SHARP S) should match "ss" (and also "SS"); µ (U+00B5, MICRO SIGN) should
match μ (U+03BC, GREEK SMALL LETTER MU) or Μ (U+039C, GREEK CAPITAL LETTER
MU). The CaseFolding.txt file from Unicode says:
# If all characters are mapped according to the full mapping below, then
# case differences (according to UnicodeData.txt and SpecialCasing.txt)
# are eliminated.
The relevant entries for the examples above are:
0053; C; 0073; # LATIN CAPITAL LETTER S
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
00B5; C; 03BC; # MICRO SIGN
039C; C; 03BC; # GREEK CAPITAL LETTER MU
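
To make the expected behaviour concrete, here is a minimal sketch in Python
(not PCRE code) that applies the four entries quoted above as a full case
folding (statuses C and F) and compares the folded strings. The names
parse_folding, fold and caseless_equal are made up for this illustration, and
the table only contains the entries from this report, not the whole file.

```python
# The four CaseFolding.txt entries quoted in this report (not the full file).
ENTRIES = """\
0053; C; 0073; # LATIN CAPITAL LETTER S
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
00B5; C; 03BC; # MICRO SIGN
039C; C; 03BC; # GREEK CAPITAL LETTER MU
"""


def parse_folding(text, statuses=("C", "F")):
    """Build {character: folded string} from CaseFolding.txt-style lines.

    Only entries whose status is in `statuses` are kept; C + F together
    give the "full" folding described in the file header.
    """
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        code, status, mapping = [f.strip() for f in line.split(";")[:3]]
        if status in statuses:
            folded = "".join(chr(int(cp, 16)) for cp in mapping.split())
            table[chr(int(code, 16))] = folded
    return table


def fold(s, table):
    """Replace every character that has a folding entry; keep the rest."""
    return "".join(table.get(ch, ch) for ch in s)


def caseless_equal(a, b, table):
    """Two strings match caselessly when their foldings are identical."""
    return fold(a, table) == fold(b, table)


TABLE = parse_folding(ENTRIES)

print(caseless_equal("\u00df", "ss", TABLE))      # sharp s vs "ss"
print(caseless_equal("\u00df", "SS", TABLE))      # sharp s vs "SS"
print(caseless_equal("\u00b5", "\u03bc", TABLE))  # micro sign vs small mu
print(caseless_equal("\u00b5", "\u039c", TABLE))  # micro sign vs capital mu
```

With this folding all four comparisons come out equal, which is the behaviour
I would expect from a case-insensitive match. Note that the F entry turns one
character into two, so a matcher that compares character by character cannot
get the "ß" case right without extra work.
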
From what I can see right now, PCRE doesn't seem to do this. For starters, am
I wrong? If not, what's the overall status of such a feature? For instance,
how are the four different Turkish "i" letters considered?
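
For reference, here is a small sketch of what I mean by the Turkish case,
again in Python rather than PCRE. It relies on Python's str.casefold(), which
applies the default full folding, and on the two status-T entries from
CaseFolding.txt (0049 -> 0131 and 0130 -> 0069). The helper fold_turkic is a
hypothetical name for this illustration only.

```python
# The two Turkic (status T) mappings from CaseFolding.txt:
#   0049; T; 0131;  LATIN CAPITAL LETTER I -> LATIN SMALL LETTER DOTLESS I
#   0130; T; 0069;  LATIN CAPITAL LETTER I WITH DOT ABOVE -> LATIN SMALL LETTER I
TURKIC = {"\u0049": "\u0131", "\u0130": "\u0069"}


def fold_turkic(s):
    """Apply the T mappings first, then the default full folding."""
    return "".join(TURKIC.get(ch, ch) for ch in s).casefold()


# The four letters: I, dotless i, I with dot above, i.
for ch in ("\u0049", "\u0131", "\u0130", "\u0069"):
    default = ch.casefold()
    turkic = fold_turkic(ch)
    print(
        "U+%04X" % ord(ch),
        "default:", [hex(ord(c)) for c in default],
        "turkic:", [hex(ord(c)) for c in turkic],
    )
```

Under the default folding "I" and "i" fold together, "ı" stays on its own, and
"İ" becomes "i" followed by U+0307; with the T mappings "I" pairs with "ı" and
"İ" pairs with "i" instead. Which of these PCRE is meant to follow is exactly
my question.
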