[pcre-dev] [Bug 1279] New: Support Unicode Extended Grapheme…

Author: Wesley J Landaker
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1279] New: Support Unicode Extended Grapheme Clusters with \X
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1279
           Summary: Support Unicode Extended Grapheme Clusters with \X
           Product: PCRE
           Version: N/A
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: wjl@???
                CC: pcre-dev@???



Please support matching the full standard Unicode definition of Extended
Grapheme Clusters with \X.

Currently, \X doesn't support the Unicode definition of an Extended Grapheme
Cluster, although it is pretty close.

The documentation makes it clear that this is known: "Note that recent
versions of Perl have changed \X to match what Unicode calls an "extended
grapheme cluster", which has a more complicated definition."

Unfortunately, this makes PCRE incompatible with both Perl and ICU regular
expressions in several important situations. Since matching Extended Grapheme
Clusters is one of the most common things to do with Unicode regular
expressions, this situation also reflects somewhat poorly upon PCRE, which
otherwise has excellent Unicode support!

So again, please support matching the full standard Unicode definition of
Extended Grapheme Clusters with \X.

Perhaps this support could be enabled/disabled via an option for some measure
of backwards compatibility. However, I'd suggest that anyone using \X in
patterns currently probably is doing so in an attempt to get real Extended
Grapheme Clusters, but is only currently getting an approximation. It's less
likely (but obviously possible) that someone thought, "I need (?>\PM\pM*) ...
oh! \X in PCRE matches that!".
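
To illustrate the gap, here is a minimal Python sketch (an illustration only,
not PCRE code) that emulates repeatedly matching (?>\PM\pM*) using the
standard unicodedata module. The helper names is_mark and legacy_clusters are
hypothetical, chosen just for this example.

```python
import unicodedata


def is_mark(ch):
    """True if ch is in a Unicode Mark category (Mn, Mc, Me), i.e. \\pM."""
    return unicodedata.category(ch).startswith("M")


def legacy_clusters(text):
    """Split text the way repeated matches of (?>\\PM\\pM*) would.

    Assumption: this mirrors only the approximation quoted above, not the
    full Extended Grapheme Cluster rules.
    """
    clusters = []
    i, n = 0, len(text)
    while i < n:
        if is_mark(text[i]):
            # A mark with no preceding base: \PM cannot match here, so skip.
            i += 1
            continue
        j = i + 1
        # Absorb every following mark into the current cluster (\pM*).
        while j < n and is_mark(text[j]):
            j += 1
        clusters.append(text[i:j])
        i = j
    return clusters


print(legacy_clusters("e\u0301"))       # base + combining acute: 1 cluster
print(legacy_clusters("\r\n"))          # CR LF: split into 2 clusters
print(legacy_clusters("\u1100\u1161"))  # Hangul jamo L + V: split into 2
```

Under the Unicode definition, CR LF and a Hangul L + V jamo pair are each a
single Extended Grapheme Cluster, so the last two cases show where the
approximation diverges.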

Thanks for PCRE and considering this upgrade! =)


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email