[Pcre-svn] [464] code/trunk: Document more clearly capturing…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [464] code/trunk: Document more clearly capturing behaviour for recursion and subroutines.
Revision: 464
          http://vcs.pcre.org/viewvc?view=rev&revision=464
Author:   ph10
Date:     2009-10-18 20:50:34 +0100 (Sun, 18 Oct 2009)


Log Message:
-----------
Document more clearly capturing behaviour for recursion and subroutines.

Modified Paths:
--------------
    code/trunk/configure.ac
    code/trunk/doc/pcrepattern.3
    code/trunk/testdata/testinput11
    code/trunk/testdata/testinput2
    code/trunk/testdata/testinput7
    code/trunk/testdata/testoutput11
    code/trunk/testdata/testoutput2
    code/trunk/testdata/testoutput7


Modified: code/trunk/configure.ac
===================================================================
--- code/trunk/configure.ac    2009-10-18 10:02:46 UTC (rev 463)
+++ code/trunk/configure.ac    2009-10-18 19:50:34 UTC (rev 464)
@@ -8,8 +8,8 @@


m4_define(pcre_major, [8])
m4_define(pcre_minor, [00])
-m4_define(pcre_prerelease, [-RC2])
-m4_define(pcre_date, [2009-10-17])
+m4_define(pcre_prerelease, [])
+m4_define(pcre_date, [2009-10-19])

# Libtool shared library interface versions (current:revision:age)
m4_define(libpcre_version, [0:1:0])

Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3    2009-10-18 10:02:46 UTC (rev 463)
+++ code/trunk/doc/pcrepattern.3    2009-10-18 19:50:34 UTC (rev 464)
@@ -2074,10 +2074,9 @@
 ways the + and * repeats can carve up the subject, and all have to be tested
 before failure can be reported.
 .P
-At the end of a match, the values set for any capturing subpatterns are those
-from the outermost level of the recursion at which the subpattern value is set.
-If you want to obtain intermediate values, a callout function can be used (see
-below and the
+At the end of a match, the values of capturing parentheses are those from
+the outermost level. If you want to obtain intermediate values, a callout
+function can be used (see below and the
 .\" HREF
 \fBpcrecallout\fP
 .\"
@@ -2085,19 +2084,16 @@
 .sp
   (ab(cd)ef)
 .sp
-the value for the capturing parentheses is "ef", which is the last value taken
-on at the top level. If additional parentheses are added, giving
-.sp
-  \e( ( ( [^()]++ | (?R) )* ) \e)
-     ^                        ^
-     ^                        ^
-.sp
-the string they capture is "ab(cd)ef", the contents of the top level
-parentheses. If there are more than 15 capturing parentheses in a pattern, PCRE
-has to obtain extra memory to store data during a recursion, which it does by
-using \fBpcre_malloc\fP, freeing it via \fBpcre_free\fP afterwards. If no
-memory can be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.
+the value for the inner capturing parentheses (numbered 2) is "ef", which is
+the last value taken on at the top level. If a capturing subpattern is not
+matched at the top level, its final value is unset, even if it is (temporarily)
+set at a deeper level.
 .P
+If there are more than 15 capturing parentheses in a pattern, PCRE has to
+obtain extra memory to store data during a recursion, which it does by using
+\fBpcre_malloc\fP, freeing it via \fBpcre_free\fP afterwards. If no memory can
+be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.
+.P
 Do not confuse the (?R) item with the condition (R), which tests for recursion.
 Consider this pattern, which matches text in angle brackets, allowing for
 arbitrary nesting. Only digits are allowed in nested brackets (that is, when
@@ -2207,10 +2203,11 @@
 is used, it does match "sense and responsibility" as well as the other two
 strings. Another example is given in the discussion of DEFINE above.
 .P
-Like recursive subpatterns, a "subroutine" call is always treated as an atomic
+Like recursive subpatterns, a subroutine call is always treated as an atomic
 group. That is, once it has matched some of the subject string, it is never
 re-entered, even if it contains untried alternatives and there is a subsequent
-matching failure.
+matching failure. Any capturing parentheses that are set during the subroutine 
+call revert to their previous values afterwards.
 .P
 When a subpattern is used as a subroutine, processing options such as
 case-independence are fixed when the subpattern is defined. They cannot be
@@ -2294,10 +2291,10 @@
 failing negative assertion, they cause an error if encountered by
 \fBpcre_dfa_exec()\fP.
 .P
-If any of these verbs are used in an assertion subpattern, their effect is
-confined to that subpattern; it does not extend to the surrounding pattern.
-Note that assertion subpatterns are processed as anchored at the point where
-they are tested.
+If any of these verbs are used in an assertion or subroutine subpattern 
+(including recursive subpatterns), their effect is confined to that subpattern;
+it does not extend to the surrounding pattern. Note that such subpatterns are
+processed as anchored at the point where they are tested.
 .P
 The new verbs make use of what was previously invalid syntax: an opening
 parenthesis followed by an asterisk. In Perl, they are generally of the form
@@ -2418,6 +2415,6 @@
 .rs
 .sp
 .nf
-Last updated: 04 October 2009
+Last updated: 18 October 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
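
Editorial illustration, not part of the commit: the capturing rule documented in the pcrepattern.3 change above can be modelled by hand for the balanced-parentheses pattern exercised in testinput11. The sketch below is a minimal Python model, not PCRE code. The function name match_pn is hypothetical, it matches only at a fixed starting offset, and it implements no general backtracking (the pattern's possessive repeat and atomic recursion make that unnecessary here). It assumes the behaviour described in the revised text: a value captured at a deeper level is dropped when the recursion returns.

```python
def match_pn(subject, pos=0):
    """Model one attempt of the balanced-parentheses pattern at offset pos.

    Returns (end_offset, group2) on success or None on failure, where group2
    is the last value that the inner capturing group took at THIS level.
    """
    if pos >= len(subject) or subject[pos] != "(":
        return None
    i = pos + 1
    group2 = None  # unset until the inner group matches at this level
    while i < len(subject) and subject[i] != ")":
        if subject[i] == "(":
            # Recursive call: behaves like an atomic subroutine call.
            inner = match_pn(subject, i)
            if inner is None:
                return None
            # The capture made inside the recursion is deliberately dropped.
            end, _inner_group2 = inner
        else:
            # Possessive run of characters that are not parentheses.
            end = i
            while end < len(subject) and subject[end] not in "()":
                end += 1
        # At this level the inner group captured the whole alternative.
        group2 = subject[i:end]
        i = end
    if i >= len(subject):
        return None  # closing parenthesis is missing
    return i + 1, group2
```

Under these assumptions, match_pn("(ab(cd)ef)") returns (10, "ef"), which agrees with the "2: ef" line expected in testoutput11: the inner call captures "cd" temporarily, but only the top-level values survive.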


Modified: code/trunk/testdata/testinput11
===================================================================
--- code/trunk/testdata/testinput11    2009-10-18 10:02:46 UTC (rev 463)
+++ code/trunk/testdata/testinput11    2009-10-18 19:50:34 UTC (rev 464)
@@ -303,4 +303,37 @@
     ** Failers 
     b\"11111


+/(?:(?1)|B)(A(*F)|C)/
+    ABCD
+    CCD
+    ** Failers
+    CAD   
+
+/^(?:(?1)|B)(A(*F)|C)/
+    CCD
+    BCD 
+    ** Failers
+    ABCD
+    CAD
+    BAD    
+
+/(?:(?1)|B)(A(*ACCEPT)XX|C)D/
+    AAD
+    ACD
+    BAD
+    BCD
+    BAX  
+    ** Failers
+    ACX
+    ABC   
+
+/(?(DEFINE)(A))B(?1)C/
+    BAC
+
+/(?(DEFINE)((A)\2))B(?1)C/
+    BAAC
+
+/(?<pn> \( ( [^()]++ | (?&pn) )* \) )/x
+    (ab(cd)ef)
+
 /-- End of testinput11 --/


Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2009-10-18 10:02:46 UTC (rev 463)
+++ code/trunk/testdata/testinput2    2009-10-18 19:50:34 UTC (rev 464)
@@ -3147,4 +3147,17 @@
     xxxxabcde\P
     xxxxabcde\P\P


+/-- This is not in the Perl 5.10 test because Perl seems currently to be broken
+    and not behaving as specified in that it *does* bumpalong after hitting
+    (*COMMIT). --/ 
+
+/(?1)(A(*COMMIT)|B)D/
+    ABD
+    XABD
+    BAD
+    ABXABD  
+    ** Failers 
+    ABX 
+    BAXBAD  
+
 /-- End of testinput2 --/


Modified: code/trunk/testdata/testinput7
===================================================================
--- code/trunk/testdata/testinput7    2009-10-18 10:02:46 UTC (rev 463)
+++ code/trunk/testdata/testinput7    2009-10-18 19:50:34 UTC (rev 464)
@@ -4528,4 +4528,18 @@
     xxxxabcde\P
     xxxxabcde\P\P


+/(?:(?1)|B)(A(*F)|C)/
+    ABCD
+    CCD
+    ** Failers
+    CAD   
+
+/^(?:(?1)|B)(A(*F)|C)/
+    CCD
+    BCD 
+    ** Failers
+    ABCD
+    CAD
+    BAD    
+
 /-- End of testinput7 --/


Modified: code/trunk/testdata/testoutput11
===================================================================
--- code/trunk/testdata/testoutput11    2009-10-18 10:02:46 UTC (rev 463)
+++ code/trunk/testdata/testoutput11    2009-10-18 19:50:34 UTC (rev 464)
@@ -647,4 +647,69 @@
     b\"11111
 No match


+/(?:(?1)|B)(A(*F)|C)/
+    ABCD
+ 0: BC
+ 1: C
+    CCD
+ 0: CC
+ 1: C
+    ** Failers
+No match
+    CAD   
+No match
+
+/^(?:(?1)|B)(A(*F)|C)/
+    CCD
+ 0: CC
+ 1: C
+    BCD 
+ 0: BC
+ 1: C
+    ** Failers
+No match
+    ABCD
+No match
+    CAD
+No match
+    BAD    
+No match
+
+/(?:(?1)|B)(A(*ACCEPT)XX|C)D/
+    AAD
+ 0: AA
+ 1: A
+    ACD
+ 0: ACD
+ 1: C
+    BAD
+ 0: BA
+ 1: A
+    BCD
+ 0: BCD
+ 1: C
+    BAX  
+ 0: BA
+ 1: A
+    ** Failers
+No match
+    ACX
+No match
+    ABC   
+No match
+
+/(?(DEFINE)(A))B(?1)C/
+    BAC
+ 0: BAC
+
+/(?(DEFINE)((A)\2))B(?1)C/
+    BAAC
+ 0: BAAC
+
+/(?<pn> \( ( [^()]++ | (?&pn) )* \) )/x
+    (ab(cd)ef)
+ 0: (ab(cd)ef)
+ 1: (ab(cd)ef)
+ 2: ef
+
 /-- End of testinput11 --/


Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2009-10-18 10:02:46 UTC (rev 463)
+++ code/trunk/testdata/testoutput2    2009-10-18 19:50:34 UTC (rev 464)
@@ -10407,4 +10407,28 @@
     xxxxabcde\P\P
 Partial match: abcde


+/-- This is not in the Perl 5.10 test because Perl seems currently to be broken
+    and not behaving as specified in that it *does* bumpalong after hitting
+    (*COMMIT). --/ 
+
+/(?1)(A(*COMMIT)|B)D/
+    ABD
+ 0: ABD
+ 1: B
+    XABD
+ 0: ABD
+ 1: B
+    BAD
+ 0: BAD
+ 1: A
+    ABXABD  
+ 0: ABD
+ 1: B
+    ** Failers 
+No match
+    ABX 
+No match
+    BAXBAD  
+No match
+
 /-- End of testinput2 --/


Modified: code/trunk/testdata/testoutput7
===================================================================
--- code/trunk/testdata/testoutput7    2009-10-18 10:02:46 UTC (rev 463)
+++ code/trunk/testdata/testoutput7    2009-10-18 19:50:34 UTC (rev 464)
@@ -7560,4 +7560,28 @@
     xxxxabcde\P\P
 Partial match: abcde


+/(?:(?1)|B)(A(*F)|C)/
+    ABCD
+ 0: BC
+    CCD
+ 0: CC
+    ** Failers
+No match
+    CAD   
+No match
+
+/^(?:(?1)|B)(A(*F)|C)/
+    CCD
+ 0: CC
+    BCD 
+ 0: BC
+    ** Failers
+No match
+    ABCD
+No match
+    CAD
+No match
+    BAD    
+No match
+
 /-- End of testinput7 --/