[Pcre-svn] [628] code/trunk/doc: Document difference between…

Author: Subversion repository
Date:
To: pcre-svn
Subject: [Pcre-svn] [628] code/trunk/doc: Document difference between PCRE and Perl's versions of \X.
Revision: 628
          http://vcs.pcre.org/viewvc?view=rev&revision=628
Author:   ph10
Date:     2011-07-20 19:03:20 +0100 (Wed, 20 Jul 2011)


Log Message:
-----------
Document difference between PCRE and Perl's versions of \X.

Modified Paths:
--------------
    code/trunk/doc/pcrecompat.3
    code/trunk/doc/pcrepattern.3


Modified: code/trunk/doc/pcrecompat.3
===================================================================
--- code/trunk/doc/pcrecompat.3    2011-07-20 17:53:09 UTC (rev 627)
+++ code/trunk/doc/pcrecompat.3    2011-07-20 18:03:20 UTC (rev 628)
@@ -50,7 +50,11 @@
 the internal representation of Unicode characters, there is no need to
 implement the somewhat messy concept of surrogates."
 .P
-7. PCRE does support the \eQ...\eE escape for quoting substrings. Characters in
+7. PCRE implements a simpler version of \eX than Perl, which changed to make
+\eX match what Unicode calls an "extended grapheme cluster". This is more 
+complicated than an extended Unicode sequence, which is what PCRE matches.
+.P
+8. PCRE does support the \eQ...\eE escape for quoting substrings. Characters in
 between are treated as literals. This is slightly different from Perl in that $
 and @ are also handled as literals inside the quotes. In Perl, they cause
 variable interpolation (but of course PCRE does not have variables). Note the
@@ -66,7 +70,7 @@
 .sp
 The \eQ...\eE sequence is recognized both inside and outside character classes.
 .P
-8. Fairly obviously, PCRE does not support the (?{code}) and (??{code})
+9. Fairly obviously, PCRE does not support the (?{code}) and (??{code})
 constructions. However, there is support for recursive patterns. This is not
 available in Perl 5.8, but it is in Perl 5.10. Also, the PCRE "callout"
 feature allows an external function to be called during pattern matching. See
@@ -76,7 +80,7 @@
 .\"
 documentation for details.
 .P
-9. Subpatterns that are called recursively or as "subroutines" are always
+10. Subpatterns that are called recursively or as "subroutines" are always
 treated as atomic groups in PCRE. This is like Python, but unlike Perl. There
 is a discussion of an example that explains this in more detail in the
 .\" HTML <a href="pcrepattern.html#recursiondifference">
@@ -89,11 +93,11 @@
 .\"
 page.
 .P
-10. There are some differences that are concerned with the settings of captured
+11. There are some differences that are concerned with the settings of captured
 strings when part of a pattern is repeated. For example, matching "aba" against
 the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
 .P
-11. PCRE's handling of duplicate subpattern numbers and duplicate subpattern
+12. PCRE's handling of duplicate subpattern numbers and duplicate subpattern
 names is not as general as Perl's. This is a consequence of the fact the PCRE
 works internally just with numbers, using an external table to translate
 between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b)B),
@@ -103,10 +107,10 @@
 names map to capturing subpattern number 1. To avoid this confusing situation,
 an error is given at compile time.
 .P
-12. Perl recognizes comments in some places that PCRE doesn't, for example,
+13. Perl recognizes comments in some places that PCRE doesn't, for example,
 between the ( and ? at the start of a subpattern.
 .P
-13. PCRE provides some extensions to the Perl regular expression facilities.
+14. PCRE provides some extensions to the Perl regular expression facilities.
 Perl 5.10 includes new features that are not in earlier versions of Perl, some
 of which (such as named parentheses) have been in PCRE for some time. This list
 is with respect to Perl 5.10:
@@ -163,6 +167,6 @@
 .rs
 .sp
 .nf
-Last updated: 02 May 2011
+Last updated: 20 July 2011
 Copyright (c) 1997-2011 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3    2011-07-20 17:53:09 UTC (rev 627)
+++ code/trunk/doc/pcrepattern.3    2011-07-20 18:03:20 UTC (rev 628)
@@ -757,6 +757,9 @@
 preceding character. None of them have codepoints less than 256, so in
 non-UTF-8 mode \eX matches any one character.
 .P
+Note that recent versions of Perl have changed \eX to match what Unicode calls 
+an "extended grapheme cluster", which has a more complicated definition.
+.P
 Matching characters by Unicode property is not fast, because PCRE has to search
 a structure that contains data for over fifteen thousand characters. That is
 why the traditional escape sequences such as \ed and \ew do not use Unicode
@@ -2752,6 +2755,6 @@
 .rs
 .sp
 .nf
-Last updated: 12 June 2011
+Last updated: 20 July 2011
 Copyright (c) 1997-2011 University of Cambridge.
 .fi
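

Illustration (not part of the commit): the difference documented above can be
sketched in a few lines of Python. The sketch assumes the simpler definition
is "one non-mark character followed by any number of mark characters", which
is how pcrepattern describes \X, as equivalent to (?>\PM\pM*). The helper
names is_mark and simple_sequences are hypothetical, and this is not PCRE's
implementation.

```python
import unicodedata


def is_mark(ch):
    """True if ch has a Unicode general category of M (Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")


def simple_sequences(text):
    """Split text into units of one base character plus trailing marks.

    Assumption: this mirrors the "extended Unicode sequence" idea only
    approximately. A mark at the very start of the text becomes its own
    unit here, whereas (?>\\PM\\pM*) would not match at that position.
    """
    units = []
    for ch in text:
        if units and is_mark(ch):
            units[-1] += ch      # attach the mark to the preceding unit
        else:
            units.append(ch)     # any non-mark character starts a new unit
    return units


# "e" followed by COMBINING ACUTE ACCENT stays together under both definitions.
accented = simple_sequences("e\u0301x")

# Conjoining Hangul jamo are letters, not marks, so the simple rule splits
# them; an extended grapheme cluster would keep the syllable as one unit.
jamo = simple_sequences("\u1100\u1161\u11a8")

# CR LF is a single extended grapheme cluster, but two units here.
crlf = simple_sequences("\r\n")
```

Under these assumptions the Hangul and CR LF inputs come out as several units
each, which is the kind of case where Perl's newer \X and the simpler
definition can disagree.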