[pcre-dev] [Bug 1208] New: Case folding in PCRE

Author: Giuseppe D'Angelo
Date:  
To: pcre-dev
New-Topics: [pcre-dev] [Bug 1208] Case folding in PCRE
Subject: [pcre-dev] [Bug 1208] New: Case folding in PCRE
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1208
           Summary: Case folding in PCRE
           Product: PCRE
           Version: 8.30
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: wishlist
          Priority: low
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: dangelog@???
                CC: pcre-dev@???



Hi,

I was wondering what the current (or planned) status of Unicode case folding
is in PCRE when doing a case-insensitive match.

For instance, "ß" (U+00DF, LATIN SMALL LETTER SHARP S) should match "ss" (and
also "SS" when matching case insensitively); "µ" (U+00B5, MICRO SIGN) should
match "μ" (U+03BC, GREEK SMALL LETTER MU) or "Μ" (U+039C, GREEK CAPITAL LETTER
MU). The CaseFolding.txt file from Unicode says:

# If all characters are mapped according to the full mapping below, then
# case differences (according to UnicodeData.txt and SpecialCasing.txt)
# are eliminated.

The relevant entries for the examples above are:

0053; C; 0073; # LATIN CAPITAL LETTER S
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
00B5; C; 03BC; # MICRO SIGN
039C; C; 03BC; # GREEK CAPITAL LETTER MU
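
To make the expected behaviour concrete, here is a minimal sketch in Python
(not PCRE code) that applies only the four entries quoted above. The helper
names parse_folding, fold and caseless_equal are made up for illustration,
and a real implementation would load the complete CaseFolding.txt:

```python
# Entries quoted from CaseFolding.txt: code; status; mapping; # name
ENTRIES = """\
0053; C; 0073; # LATIN CAPITAL LETTER S
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
00B5; C; 03BC; # MICRO SIGN
039C; C; 03BC; # GREEK CAPITAL LETTER MU
"""


def parse_folding(text):
    """Build a full case folding table (statuses C and F) from the text."""
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        code, status, mapping = [f.strip() for f in data.split(";")[:3]]
        if status in ("C", "F"):  # common + full mappings
            folded = "".join(chr(int(cp, 16)) for cp in mapping.split())
            table[chr(int(code, 16))] = folded
    return table


FOLD = parse_folding(ENTRIES)


def fold(s):
    """Fold each character; characters without an entry map to themselves."""
    return "".join(FOLD.get(ch, ch) for ch in s)


def caseless_equal(a, b):
    """Two strings match caselessly when their folded forms are equal."""
    return fold(a) == fold(b)
```

With this table, caseless_equal("ß", "SS") and caseless_equal("µ", "Μ") both
hold. Note that the "F" entry maps one character to two, so a matcher that
only compares character by character cannot express it.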

From what I can see right now, PCRE doesn't seem to do this. For starters: am
I wrong? If not, what's the overall status of such a feature? For instance, how
are the four different Turkish "i" letters considered?
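
As a point of comparison for the Turkish question, this small Python sketch
shows what the default (non-Turkic) full case folding does with the four
letters, using Python's built-in str.casefold(). It says nothing about what
PCRE does; the function name default_folds is made up for illustration:

```python
# The four Turkish "i" letters, keyed by a readable label.
TURKISH_I = {
    "LATIN CAPITAL LETTER I": "\u0049",
    "LATIN SMALL LETTER I": "\u0069",
    "LATIN CAPITAL LETTER I WITH DOT ABOVE": "\u0130",
    "LATIN SMALL LETTER DOTLESS I": "\u0131",
}


def default_folds():
    """Return each letter's default full case folding as a list of code points."""
    return {
        name: [f"U+{ord(c):04X}" for c in ch.casefold()]
        for name, ch in TURKISH_I.items()
    }


if __name__ == "__main__":
    for name, folded in default_folds().items():
        print(f"{name}: {' '.join(folded)}")
```

Under the default folding, "I" and "i" fold together, "İ" folds to "i" plus
U+0307 COMBINING DOT ABOVE, and "ı" folds to itself. The Turkic-specific
mappings are the entries with status "T" in CaseFolding.txt, which the
default folding skips.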

Thanks,
Giuseppe D'Angelo


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email