http://bugs.exim.org/show_bug.cgi?id=1279
Summary: Support Unicode Extended Grapheme Clusters with \X
Product: PCRE
Version: N/A
Platform: All
OS/Version: All
Status: NEW
Severity: bug
Priority: medium
Component: Code
AssignedTo: ph10@???
ReportedBy: wjl@???
CC: pcre-dev@???
Please support matching the full standard Unicode definition of Extended
Grapheme Clusters with \X.
Currently, \X doesn't implement the Unicode definition of an Extended Grapheme
Cluster, although it comes pretty close.
The documentation makes it clear that this is known: "Note that recent
versions of Perl have changed \X to match what Unicode calls an "extended
grapheme cluster", which has a more complicated definition."
Unfortunately, this makes PCRE incompatible with both Perl and ICU regular
expressions in several important situations. Since matching Extended Grapheme
Clusters is one of the most common things to do with Unicode regular
expressions, this situation also reflects somewhat poorly upon PCRE, which
otherwise has excellent Unicode support!
So again, please support matching the full standard Unicode definition of
Extended Grapheme Clusters with \X.
Perhaps this support could be enabled/disabled via an option for some measure
of backwards compatibility. However, I'd suggest that anyone currently using \X
in patterns is probably trying to match real Extended Grapheme Clusters and is
getting only an approximation. It's less likely (but obviously possible) that
someone thought, "I need (?>\PM\pM*) ... oh! \X in PCRE matches that!".
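To illustrate the gap, here is a rough Python sketch (not PCRE code) that
emulates the (?>\PM\pM*) behaviour using the standard unicodedata module. The
helper names is_mark and legacy_x_matches are made up for this example, and it
assumes \pM corresponds to the Unicode general categories beginning with "M".
It shows two sequences that the Unicode text segmentation rules treat as one
Extended Grapheme Cluster each (CR LF, and a conjoining Hangul L+V+T jamo
sequence) but that the mark-based approximation splits apart.

```python
import unicodedata


def is_mark(ch):
    """True if ch has a general category starting with 'M' (assumed \\pM)."""
    return unicodedata.category(ch).startswith("M")


def legacy_x_matches(text):
    """Emulate successive matches of (?>\\PM\\pM*): one non-mark, then marks.

    Hypothetical helper for illustration only; it is not how PCRE is written.
    """
    matches = []
    i, n = 0, len(text)
    while i < n:
        if is_mark(text[i]):
            # \PM cannot match a mark, so a regex search would move on.
            i += 1
            continue
        j = i + 1
        while j < n and is_mark(text[j]):
            j += 1
        matches.append(text[i:j])
        i = j
    return matches


# "e" + COMBINING ACUTE ACCENT: both definitions agree on a single cluster.
print(len(legacy_x_matches("e\u0301")))             # 1

# CR LF is one Extended Grapheme Cluster, but is split in two here.
print(len(legacy_x_matches("\r\n")))                # 2

# Hangul jamo L + V + T form one syllable cluster, but are split in three.
print(len(legacy_x_matches("\u1100\u1161\u11a8")))  # 3
```

A pattern author who relies on \X to step through user-perceived characters
would get the last two cases wrong under the mark-based approximation.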
Thanks for PCRE and for considering this upgrade! =)