http://bugs.exim.org/show_bug.cgi?id=1208
Summary: Case folding in PCRE
Product: PCRE
Version: 8.30
Platform: Other
OS/Version: Linux
Status: NEW
Severity: wishlist
Priority: low
Component: Code
AssignedTo: ph10@???
ReportedBy: dangelog@???
CC: pcre-dev@???
Hi,
I was wondering about the current (or planned) status of case folding in PCRE
when doing a case-insensitive match on Unicode text.
For instance, "ß" (U+00DF LATIN SMALL LETTER SHARP S) should match "ss" (and
also "SS" when matching case-insensitively); "µ" (U+00B5 MICRO SIGN) should
match "μ" (U+03BC GREEK SMALL LETTER MU) or "Μ" (U+039C GREEK CAPITAL LETTER
MU). The CaseFolding.txt file from Unicode says:
# If all characters are mapped according to the full mapping below, then
# case differences (according to UnicodeData.txt and SpecialCasing.txt)
# are eliminated.
The relevant entries for the examples above are:
0053; C; 0073; # LATIN CAPITAL LETTER S
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
00B5; C; 03BC; # MICRO SIGN
039C; C; 03BC; # GREEK CAPITAL LETTER MU
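To make the expected behaviour concrete, here is a minimal sketch in Python
rather than PCRE. Python's str.casefold() applies the full (C + F) mappings from
CaseFolding.txt, so it illustrates what a caseless comparison based on those
entries would give. The helper name caseless_equal is only illustrative, and
the sketch ignores normalization.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings after full Unicode case folding (C + F mappings)."""
    return a.casefold() == b.casefold()


# U+00DF SHARP S folds to "ss" (F mapping), so it equals both "ss" and "SS".
print(caseless_equal("\u00DF", "ss"))
print(caseless_equal("\u00DF", "SS"))

# U+00B5 MICRO SIGN and U+039C GREEK CAPITAL MU both fold to U+03BC (C mappings).
print(caseless_equal("\u00B5", "\u03BC"))
print(caseless_equal("\u00B5", "\u039C"))
```

All four comparisons hold under full case folding; this is the behaviour I am
asking about for a case-insensitive match.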
From what I can see right now, PCRE doesn't seem to do this. For starters -- am
I wrong? If not, what's the overall status of such a feature? For instance, how
are the four different Turkish "i" letters handled?
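For reference, the four letters are I (U+0049), i (U+0069), İ (U+0130) and
ı (U+0131). The sketch below, again in Python rather than PCRE, shows how the
default (locale-independent) full folding treats them, next to the optional
Turkic "T" mappings that CaseFolding.txt lists (0049 -> 0131, 0130 -> 0069).
The turkic_fold helper is hypothetical and written by hand for illustration;
it is not an existing library function.

```python
def default_fold(s: str) -> str:
    """Default full case folding (C + F mappings), as done by str.casefold()."""
    return s.casefold()


def turkic_fold(s: str) -> str:
    """Hypothetical helper: apply the two 'T' mappings, then default folding."""
    s = s.replace("\u0049", "\u0131")  # I -> dotless i
    s = s.replace("\u0130", "\u0069")  # I with dot above -> i
    return s.casefold()


for ch in ("\u0049", "\u0069", "\u0130", "\u0131"):
    # Print the code points of each folding so combining marks stay visible.
    d = " ".join(f"U+{ord(c):04X}" for c in default_fold(ch))
    t = " ".join(f"U+{ord(c):04X}" for c in turkic_fold(ch))
    print(f"U+{ord(ch):04X}: default -> {d}; turkic -> {t}")
```

With the default folding, I and i compare equal, İ folds to i followed by
U+0307 COMBINING DOT ABOVE, and ı folds to itself. With the Turkic mappings,
I pairs with ı and İ pairs with i instead.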
Thanks,
Giuseppe D'Angelo