[Pcre-svn] [177] code/trunk: Improvements for substring handling with partial matches.

Author: Subversion repository
To: pcre-svn

Revision: 177

          http://www.exim.org/viewvc/pcre2?view=rev&revision=177
Author:   ph10
Date:     2014-12-22 17:33:10 +0000 (Mon, 22 Dec 2014)

Log Message:
-----------
Improvements for substring handling with partial matches.

Modified Paths:
--------------
    code/trunk/doc/html/pcre2_substring_length_bynumber.html
    code/trunk/doc/html/pcre2api.html
    code/trunk/doc/html/pcre2partial.html
    code/trunk/doc/pcre2.txt
    code/trunk/doc/pcre2_substring_length_bynumber.3
    code/trunk/doc/pcre2api.3
    code/trunk/doc/pcre2partial.3
    code/trunk/src/pcre2_substring.c
    code/trunk/src/pcre2test.c
    code/trunk/testdata/testinput2
    code/trunk/testdata/testinput6
    code/trunk/testdata/testoutput2
    code/trunk/testdata/testoutput6

Modified: code/trunk/doc/html/pcre2_substring_length_bynumber.html
===================================================================
--- code/trunk/doc/html/pcre2_substring_length_bynumber.html    2014-12-19 09:55:25 UTC (rev 176)
+++ code/trunk/doc/html/pcre2_substring_length_bynumber.html    2014-12-22 17:33:10 UTC (rev 177)
@@ -31,9 +31,11 @@
 <pre>
   <i>match_data</i>   The match data block for the match
   <i>number</i>       The substring number
-  <i>length</i>       Where to return the length
+  <i>length</i>       Where to return the length, or NULL
 </pre>
-The yield is zero on success, or an error code if the substring is not found.
+The third argument may be NULL if all you want to know is whether or not a
+substring is set. The yield is zero on success, or a negative error code
+otherwise. After a partial match, only substring 0 is available.
 </P>
 <P>
 There is a complete description of the PCRE2 native API in the

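The rules documented in the hunk above (length may be NULL, negative
error codes, only substring 0 after a partial match) can be modelled in
a few lines of plain C. This is only a sketch for illustration: it is
not the code in pcre2_substring.c, and the names sketch_substring_length,
SKETCH_UNSET and SKETCH_ERROR_* are hypothetical stand-ins with arbitrary
values, not the PCRE2 constants. It assumes the caller passes the match
return code and a plain array of offset pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch only. They are NOT the PCRE2
   constants, and their numeric values are arbitrary. */
#define SKETCH_UNSET SIZE_MAX
enum {
  SKETCH_ERROR_NOMATCH = -1,
  SKETCH_ERROR_PARTIAL = -2,
  SKETCH_ERROR_UNSET   = -3
};

/* rc        value returned by the match call: positive on success,
             0 on success with a too-small ovector, negative on error
   ovector   array of offset pairs
   oveccount number of pairs the ovector holds
   number    substring number
   length    where to store the length, or NULL */
int sketch_substring_length(int rc, const size_t *ovector,
                            size_t oveccount, size_t number,
                            size_t *length)
{
  size_t available;   /* how many pairs may be inspected */
  size_t start, end;

  assert(ovector != NULL);

  if (rc == SKETCH_ERROR_PARTIAL) {
    /* After a partial match only substring 0 is available. */
    if (number > 0) return SKETCH_ERROR_PARTIAL;
    available = 1;
  } else if (rc < 0) {
    return rc;                /* pass the match failure code back */
  } else if (rc == 0) {
    available = oveccount;    /* ovector was filled as far as possible */
  } else {
    available = (size_t)rc;   /* one more than the highest pair set */
  }

  if (number >= available || number >= oveccount) return SKETCH_ERROR_UNSET;

  start = ovector[2 * number];
  end = ovector[2 * number + 1];
  if (start == SKETCH_UNSET) return SKETCH_ERROR_UNSET;

  /* With \K inside an assertion the start can exceed the end; report
     a zero-length substring in that case. */
  if (length != NULL) *length = (start > end) ? 0 : end - start;
  return 0;
}
```

For the ovector produced by matching "abc" against (a|(z))(bc), that is
{0,3, 0,1, UNSET,UNSET, 1,3} with a return code of 4, this sketch gives
length 2 for substring 3 and the unset error for substring 2. Passing
NULL as the last argument turns the call into a pure "is it set?" test.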
Modified: code/trunk/doc/html/pcre2api.html
===================================================================
--- code/trunk/doc/html/pcre2api.html    2014-12-19 09:55:25 UTC (rev 176)
+++ code/trunk/doc/html/pcre2api.html    2014-12-22 17:33:10 UTC (rev 177)
@@ -1740,6 +1740,12 @@
 below.
 </P>
 <P>
+When a call of <b>pcre2_match()</b> fails, valid data is available in the match
+block only when the error is PCRE2_ERROR_NOMATCH, PCRE2_ERROR_PARTIAL, or one
+of the error codes for an invalid UTF string. Exactly what is available depends
+on the error, and is detailed below.
+</P>
+<P>
 When one of the matching functions is called, pointers to the compiled pattern
 and the subject string are set in the match data block so that they can be
 referenced by the extraction functions. After running a match, you must not
@@ -2018,9 +2024,9 @@
 compiled pattern.
 </P>
 <P>
-The overall matched string and any captured substrings are returned to the
-caller via a vector of PCRE2_SIZE values. This is called the <b>ovector</b>, and
-is contained within the
+A successful match returns the overall matched string and any captured
+substrings to the caller via a vector of PCRE2_SIZE values. This is called the
+<b>ovector</b>, and is contained within the
 <a href="#matchdatablock">match data block.</a>
 You can obtain direct access to the ovector by calling
 <b>pcre2_get_ovector_pointer()</b> to find its address, and
@@ -2041,20 +2047,26 @@
 library.
 </P>
 <P>
-The first pair of offsets (that is, <i>ovector[0]</i> and <i>ovector[1]</i>)
-identifies the portion of the subject string that was matched by the entire
-pattern. The next pair is used for the first capturing subpattern, and so on.
-The value returned by <b>pcre2_match()</b> is one more than the highest numbered
-pair that has been set. For example, if two substrings have been captured, the
-returned value is 3. If there are no capturing subpatterns, the return value
-from a successful match is 1, indicating that just the first pair of offsets
-has been set.
+After a partial match (error return PCRE2_ERROR_PARTIAL), only the first pair
+of offsets (that is, <i>ovector[0]</i> and <i>ovector[1]</i>) are set. They
+identify the part of the subject that was partially matched. See the
+<a href="pcre2partial.html"><b>pcre2partial</b></a>
+documentation for details of partial matching.
 </P>
 <P>
+After a successful match, the first pair of offsets identifies the portion of
+the subject string that was matched by the entire pattern. The next pair is
+used for the first capturing subpattern, and so on. The value returned by
+<b>pcre2_match()</b> is one more than the highest numbered pair that has been
+set. For example, if two substrings have been captured, the returned value is
+3. If there are no capturing subpatterns, the return value from a successful
+match is 1, indicating that just the first pair of offsets has been set.
+</P>
+<P>
 If a pattern uses the \K escape sequence within a positive assertion, the
-reported start of the match can be greater than the end of the match. For
-example, if the pattern (?=ab\K) is matched against "ab", the start and end
-offset values for the match are 2 and 0.
+reported start of a successful match can be greater than the end of the match.
+For example, if the pattern (?=ab\K) is matched against "ab", the start and
+end offset values for the match are 2 and 0.
 </P>
 <P>
 If a capturing subpattern group is matched repeatedly within a single match
@@ -2104,24 +2116,38 @@
 </P>
 <P>
 As well as the offsets in the ovector, other information about a match is
-retained in the match data block and can be retrieved by the above functions.
+retained in the match data block and can be retrieved by the above functions in
+appropriate circumstances. If they are called at other times, the result is
+undefined.
 </P>
 <P>
-When a (*MARK) name is to be passed back, <b>pcre2_get_mark()</b> returns a
-pointer to the zero-terminated name, which is within the compiled pattern.
-Otherwise NULL is returned. A (*MARK) name may be available after a failed
-match or a partial match, as well as after a successful one.
+After a successful match, a partial match (PCRE2_ERROR_PARTIAL), or a failure
+to match (PCRE2_ERROR_NOMATCH), a (*MARK) name may be available, and
+<b>pcre2_get_mark()</b> can be called. It returns a pointer to the
+zero-terminated name, which is within the compiled pattern. Otherwise NULL is
+returned. After a successful match, the (*MARK) name that is returned is the
+last one encountered on the matching path through the pattern. After a "no
+match" or a partial match, the last encountered (*MARK) name is returned. For
+example, consider this pattern:
+<pre>
+  ^(*MARK:A)((*MARK:B)a|b)c
+</pre>
+When it matches "bc", the returned mark is A. The B mark is "seen" in the first
+branch of the group, but it is not on the matching path. On the other hand,
+when this pattern fails to match "bx", the returned mark is B.
 </P>
 <P>
-The code unit offset of the character at which a successful match started is
-returned by <b>pcre2_get_startchar()</b>. For a non-partial match, this can be
+After a successful match, a partial match, or one of the invalid UTF errors
+(for example, PCRE2_ERROR_UTF8_ERR5), <b>pcre2_get_startchar()</b> can be
+called. After a successful or partial match it returns the code unit offset of
+the character at which the match started. For a non-partial match, this can be
 different to the value of <i>ovector[0]</i> if the pattern contains the \K
 escape sequence. After a partial match, however, this value is always the same
 as <i>ovector[0]</i> because \K does not affect the result of a partial match.
 </P>
 <P>
-The <b>startchar</b> field is also used to return the offset of an invalid
-UTF character when UTF checking fails. Details are given in the
+After a UTF check failure, <b>pcre2_get_startchar()</b> can be used to obtain
+the code unit offset of the invalid UTF character. Details are given in the
 <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
 page.
 <a name="errorlist"></a></P>
@@ -2256,19 +2282,23 @@
 Captured substrings can be accessed directly by using the ovector as described
 <a href="#matchedstrings">above.</a>
 For convenience, auxiliary functions are provided for extracting captured
-substrings as new, separate, zero-terminated strings. The functions in this
-section identify substrings by number. The number zero refers to the entire
-matched substring, with higher numbers referring to substrings captured by
-parenthesized groups. The next section describes similar functions for
-extracting captured substrings by name. A substring that contains a binary zero
-is correctly extracted and has a further zero added on the end, but the result
-is not, of course, a C string.
+substrings as new, separate, zero-terminated strings. A substring that contains
+a binary zero is correctly extracted and has a further zero added on the end,
+but the result is not, of course, a C string.
 </P>
 <P>
+The functions in this section identify substrings by number. The number zero
+refers to the entire matched substring, with higher numbers referring to
+substrings captured by parenthesized groups. After a partial match, only
+substring zero is available. An attempt to extract any other substring gives
+the error PCRE2_ERROR_PARTIAL. The next section describes similar functions for
+extracting captured substrings by name.
+</P>
+<P>
 If a pattern uses the \K escape sequence within a positive assertion, the
-reported start of the match can be greater than the end of the match. For
-example, if the pattern (?=ab\K) is matched against "ab", the start and end
-offset values for the match are 2 and 0. In this situation, calling these
+reported start of a successful match can be greater than the end of the match.
+For example, if the pattern (?=ab\K) is matched against "ab", the start and
+end offset values for the match are 2 and 0. In this situation, calling these
 functions with a zero substring number extracts a zero-length empty string.
 </P>
 <P>
@@ -2302,7 +2332,8 @@
 <P>
 The return value from all these functions is zero for success, or a negative
 error code. If the pattern match failed, the match failure code is returned.
-Other possible error codes are:
+If a substring number greater than zero is used after a partial match,
+PCRE2_ERROR_PARTIAL is returned. Other possible error codes are:
 <pre>
   PCRE2_ERROR_NOMEMORY
 </pre>
@@ -2343,6 +2374,10 @@
 the match data block.
 </P>
 <P>
+This function must be called only after a successful match. If called after a
+partial match, the error code PCRE2_ERROR_PARTIAL is returned.
+</P>
+<P>
 The address of the memory block is returned via <i>listptr</i>, which is also
 the start of the list of string pointers. The end of the list is marked by a
 NULL pointer. The address of the list of lengths is returned via
@@ -2757,7 +2792,7 @@
 </P>
 <br><a name="SEC37" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 14 December 2014
+Last updated: 22 December 2014
 <br>
 Copyright &copy; 1997-2014 University of Cambridge.
 <br>

Modified: code/trunk/doc/html/pcre2partial.html
===================================================================
--- code/trunk/doc/html/pcre2partial.html    2014-12-19 09:55:25 UTC (rev 176)
+++ code/trunk/doc/html/pcre2partial.html    2014-12-22 17:33:10 UTC (rev 177)
@@ -89,8 +89,9 @@
 </P>
 <P>
 When a partial match is returned, the first two elements in the ovector point
-to the portion of the subject that was matched. The appearance of \K in the
-pattern has no effect for a partial match. Consider this pattern:
+to the portion of the subject that was matched, but the values in the rest of
+the ovector are undefined. The appearance of \K in the pattern has no effect
+for a partial match. Consider this pattern:
 <pre>
   /abc\K123/
 </pre>
@@ -455,7 +456,7 @@
 </P>
 <br><a name="SEC10" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 14 October 2014
+Last updated: 22 December 2014
 <br>
 Copyright &copy; 1997-2014 University of Cambridge.
 <br>

Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt    2014-12-19 09:55:25 UTC (rev 176)
+++ code/trunk/doc/pcre2.txt    2014-12-22 17:33:10 UTC (rev 177)
@@ -1753,6 +1753,12 @@
        described  in  the  sections  on  matched  strings and other match data
        below.

+       When a call of pcre2_match() fails, valid  data  is  available  in  the
+       match    block    only   when   the   error   is   PCRE2_ERROR_NOMATCH,
+       PCRE2_ERROR_PARTIAL, or one of the  error  codes  for  an  invalid  UTF
+       string. Exactly what is available depends on the error, and is detailed
+       below.
+
        When one of the matching functions is called, pointers to the  compiled
        pattern  and the subject string are set in the match data block so that
        they can be referenced by the extraction  functions.  After  running  a
@@ -2008,14 +2014,14 @@
        be  captured. The pcre2_pattern_info() function can be used to find out
        how many capturing subpatterns there are in a compiled pattern.

-       The overall matched string and any captured substrings are returned  to
-       the  caller via a vector of PCRE2_SIZE values. This is called the ovec-
-       tor, and is contained within the match  data  block.   You  can  obtain
-       direct  access to the ovector by calling pcre2_get_ovector_pointer() to
-       find its address, and pcre2_get_ovector_count() to find the  number  of
-       pairs  of  values it contains. Alternatively, you can use the auxiliary
-       functions for accessing captured substrings by number or by  name  (see
-       below).
+       A successful match returns the overall matched string and any  captured
+       substrings  to  the  caller  via a vector of PCRE2_SIZE values. This is
+       called the ovector, and is contained within the match data block.   You
+       can  obtain  direct  access  to  the ovector by calling pcre2_get_ovec-
+       tor_pointer() to find its  address,  and  pcre2_get_ovector_count()  to
+       find  the number of pairs of values it contains. Alternatively, you can
+       use the auxiliary functions for accessing captured substrings by number
+       or by name (see below).

        Within the ovector, the first in each pair of values is set to the off-
        set of the first code unit of a substring, and the second is set to the
@@ -2024,53 +2030,58 @@
        are  byte  offsets  in  the 8-bit library, 16-bit offsets in the 16-bit
        library, and 32-bit offsets in the 32-bit library.

-       The first pair of offsets (that is, ovector[0] and ovector[1])  identi-
-       fies  the  portion of the subject string that was matched by the entire
-       pattern. The next pair is used for the first capturing subpattern,  and
-       so  on.  The value returned by pcre2_match() is one more than the high-
-       est numbered pair that has been set. For  example,  if  two  substrings
-       have  been captured, the returned value is 3. If there are no capturing
-       subpatterns, the return value from a successful match is 1,  indicating
-       that just the first pair of offsets has been set.
+       After a partial match  (error  return  PCRE2_ERROR_PARTIAL),  only  the
+       first  pair  of  offsets  (that is, ovector[0] and ovector[1]) are set.
+       They identify the part of the subject that was partially  matched.  See
+       the pcre2partial documentation for details of partial matching.

-       If  a  pattern uses the \K escape sequence within a positive assertion,
-       the reported start of the match can be greater  than  the  end  of  the
-       match.  For  example,  if the pattern (?=ab\K) is matched against "ab",
-       the start and end offset values for the match are 2 and 0.
+       After a successful match, the first pair of offsets identifies the por-
+       tion of the subject string that was matched by the entire pattern.  The
+       next  pair  is  used for the first capturing subpattern, and so on. The
+       value returned by pcre2_match() is one more than the  highest  numbered
+       pair  that  has been set. For example, if two substrings have been cap-
+       tured, the returned value is 3. If there are no capturing  subpatterns,
+       the return value from a successful match is 1, indicating that just the
+       first pair of offsets has been set.

-       If a capturing subpattern group is matched repeatedly within  a  single
-       match  operation, it is the last portion of the subject that it matched
+       If a pattern uses the \K escape sequence within a  positive  assertion,
+       the reported start of a successful match can be greater than the end of
+       the match.  For example, if the pattern  (?=ab\K)  is  matched  against
+       "ab", the start and end offset values for the match are 2 and 0.
+
+       If  a  capturing subpattern group is matched repeatedly within a single
+       match operation, it is the last portion of the subject that it  matched
        that is returned.

        If the ovector is too small to hold all the captured substring offsets,
-       as  much  as possible is filled in, and the function returns a value of
-       zero. If captured substrings are not of interest, pcre2_match() may  be
+       as much as possible is filled in, and the function returns a  value  of
+       zero.  If captured substrings are not of interest, pcre2_match() may be
        called with a match data block whose ovector is of minimum length (that
        is, one pair). However, if the pattern contains back references and the
        ovector is not big enough to remember the related substrings, PCRE2 has
-       to get additional memory for use during matching. Thus  it  is  usually
+       to  get  additional  memory for use during matching. Thus it is usually
        advisable to set up a match data block containing an ovector of reason-
        able size.

-       It is possible for capturing subpattern number n+1 to match  some  part
+       It  is  possible for capturing subpattern number n+1 to match some part
        of the subject when subpattern n has not been used at all. For example,
-       if the string "abc" is matched  against  the  pattern  (a|(z))(bc)  the
+       if  the  string  "abc"  is  matched against the pattern (a|(z))(bc) the
        return from the function is 4, and subpatterns 1 and 3 are matched, but
-       2 is not. When this happens, both values in  the  offset  pairs  corre-
+       2  is  not.  When  this happens, both values in the offset pairs corre-
        sponding to unused subpatterns are set to PCRE2_UNSET.

-       Offset  values  that correspond to unused subpatterns at the end of the
-       expression are also set to PCRE2_UNSET.  For  example,  if  the  string
+       Offset values that correspond to unused subpatterns at the end  of  the
+       expression  are  also  set  to  PCRE2_UNSET. For example, if the string
        "abc" is matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3
-       are not matched.  The return from the function is 2, because the  high-
+       are  not matched.  The return from the function is 2, because the high-
        est used capturing subpattern number is 1. The offsets for for the sec-
-       ond and third capturing  subpatterns  (assuming  the  vector  is  large
+       ond  and  third  capturing  subpatterns  (assuming  the vector is large
        enough, of course) are set to PCRE2_UNSET.

        Elements in the ovector that do not correspond to capturing parentheses
        in the pattern are never changed. That is, if a pattern contains n cap-
        turing parentheses, no more than ovector[0] to ovector[2n+1] are set by
-       pcre2_match(). The other elements retain whatever  values  they  previ-
+       pcre2_match().  The  other  elements retain whatever values they previ-
        ously had.

@@ -2080,26 +2091,39 @@

        PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *match_data);

-       As  well as the offsets in the ovector, other information about a match
-       is retained in the match data block and can be retrieved by  the  above
-       functions.
+       As well as the offsets in the ovector, other information about a  match
+       is  retained  in the match data block and can be retrieved by the above
+       functions in appropriate circumstances. If they  are  called  at  other
+       times, the result is undefined.

-       When  a  (*MARK)  name is to be passed back, pcre2_get_mark() returns a
-       pointer to the zero-terminated name, which is within the compiled  pat-
-       tern.   Otherwise  NULL  is  returned.  A (*MARK) name may be available
-       after a failed match or a partial match, as well as after a  successful
-       one.
+       After  a  successful match, a partial match (PCRE2_ERROR_PARTIAL), or a
+       failure to match (PCRE2_ERROR_NOMATCH), a (*MARK) name  may  be  avail-
+       able,  and  pcre2_get_mark() can be called. It returns a pointer to the
+       zero-terminated name, which is within the compiled  pattern.  Otherwise
+       NULL  is  returned.  After a successful match, the (*MARK) name that is
+       returned is the last one encountered on the matching path  through  the
+       pattern.  After  a  "no match" or a partial match, the last encountered
+       (*MARK) name is returned. For example, consider this pattern:

-       The  code  unit  offset  of  the  character at which a successful match
-       started is returned by pcre2_get_startchar(). For a non-partial  match,
-       this  can  be  different to the value of ovector[0] if the pattern con-
-       tains the \K escape sequence. After  a  partial  match,  however,  this
+         ^(*MARK:A)((*MARK:B)a|b)c
+
+       When it matches "bc", the returned mark is A. The B mark is  "seen"  in
+       the  first  branch of the group, but it is not on the matching path. On
+       the other hand, when this pattern fails to  match  "bx",  the  returned
+       mark is B.
+
+       After  a  successful  match, a partial match, or one of the invalid UTF
+       errors (for example, PCRE2_ERROR_UTF8_ERR5), pcre2_get_startchar()  can
+       be called. After a successful or partial match it returns the code unit
+       offset of the character at which the match started. For  a  non-partial
+       match,  this can be different to the value of ovector[0] if the pattern
+       contains the \K escape sequence. After a partial match,  however,  this
        value  is  always the same as ovector[0] because \K does not affect the
        result of a partial match.

-       The startchar field is also used to return the offset of an invalid UTF
-       character  when  UTF checking fails. Details are given in the pcre2uni-
-       code page.
+       After a UTF check failure, pcre2_get_startchar() can be used to  obtain
+       the code unit offset of the invalid UTF character. Details are given in
+       the pcre2unicode page.

 ERROR RETURNS FROM pcre2_match()
@@ -2225,33 +2249,36 @@
        Captured substrings can be accessed directly by using  the  ovector  as
        described above.  For convenience, auxiliary functions are provided for
        extracting  captured  substrings  as  new,  separate,   zero-terminated
-       strings.  The  functions in this section identify substrings by number.
-       The number zero refers to the entire  matched  substring,  with  higher
-       numbers  referring  to substrings captured by parenthesized groups. The
-       next section describes similar functions for extracting  captured  sub-
-       strings  by  name. A substring that contains a binary zero is correctly
-       extracted and has a further zero added on the end, but  the  result  is
-       not, of course, a C string.
+       strings. A substring that contains a binary zero is correctly extracted
+       and has a further zero added on the end, but  the  result  is  not,  of
+       course, a C string.

-       If  a  pattern uses the \K escape sequence within a positive assertion,
-       the reported start of the match can be greater  than  the  end  of  the
-       match.  For  example,  if the pattern (?=ab\K) is matched against "ab",
-       the start and end offset values for the match are 2 and 0. In this sit-
-       uation, calling these functions with a zero substring number extracts a
-       zero-length empty string.
+       The functions in this section identify substrings by number. The number
+       zero refers to the entire matched substring, with higher numbers refer-
+       ring  to  substrings  captured by parenthesized groups. After a partial
+       match, only substring zero is available.  An  attempt  to  extract  any
+       other  substring  gives the error PCRE2_ERROR_PARTIAL. The next section
+       describes similar functions for extracting captured substrings by name.

-       You can find the length in code units of a captured  substring  without
-       extracting  it  by calling pcre2_substring_length_bynumber(). The first
-       argument is a pointer to the match data block, the second is the  group
-       number,  and the third is a pointer to a variable into which the length
-       is placed. If you just want to know whether or not  the  substring  has
+       If a pattern uses the \K escape sequence within a  positive  assertion,
+       the reported start of a successful match can be greater than the end of
+       the match.  For example, if the pattern  (?=ab\K)  is  matched  against
+       "ab",  the  start  and  end offset values for the match are 2 and 0. In
+       this situation, calling these functions with a  zero  substring  number
+       extracts a zero-length empty string.
+
+       You  can  find the length in code units of a captured substring without
+       extracting it by calling pcre2_substring_length_bynumber().  The  first
+       argument  is a pointer to the match data block, the second is the group
+       number, and the third is a pointer to a variable into which the  length
+       is  placed.  If  you just want to know whether or not the substring has
        been captured, you can pass the third argument as NULL.

-       The  pcre2_substring_copy_bynumber()  function  copies  a captured sub-
-       string into a supplied buffer,  whereas  pcre2_substring_get_bynumber()
-       copies  it  into  new memory, obtained using the same memory allocation
-       function that was used for the match data block. The  first  two  argu-
-       ments  of  these  functions are a pointer to the match data block and a
+       The pcre2_substring_copy_bynumber() function  copies  a  captured  sub-
+       string  into  a supplied buffer, whereas pcre2_substring_get_bynumber()
+       copies it into new memory, obtained using the  same  memory  allocation
+       function  that  was  used for the match data block. The first two argu-
+       ments of these functions are a pointer to the match data  block  and  a
        capturing group number.

        The final arguments of pcre2_substring_copy_bynumber() are a pointer to
@@ -2260,23 +2287,25 @@
        for the extracted substring, excluding the terminating zero.

        For pcre2_substring_get_bynumber() the third and fourth arguments point
-       to variables that are updated with a pointer to the new memory and  the
-       number  of  code units that comprise the substring, again excluding the
-       terminating zero. When the substring is no longer  needed,  the  memory
+       to  variables that are updated with a pointer to the new memory and the
+       number of code units that comprise the substring, again  excluding  the
+       terminating  zero.  When  the substring is no longer needed, the memory
        should be freed by calling pcre2_substring_free().

-       The  return  value  from  all these functions is zero for success, or a
-       negative error code. If the pattern match  failed,  the  match  failure
-       code is returned.  Other possible error codes are:
+       The return value from all these functions is zero  for  success,  or  a
+       negative  error  code.  If  the pattern match failed, the match failure
+       code is returned.  If a substring number  greater  than  zero  is  used
+       after  a partial match, PCRE2_ERROR_PARTIAL is returned. Other possible
+       error codes are:

          PCRE2_ERROR_NOMEMORY

-       The  buffer  was  too small for pcre2_substring_copy_bynumber(), or the
+       The buffer was too small for  pcre2_substring_copy_bynumber(),  or  the
        attempt to get memory failed for pcre2_substring_get_bynumber().

          PCRE2_ERROR_NOSUBSTRING

-       There is no substring with that number in the  pattern,  that  is,  the
+       There  is  no  substring  with that number in the pattern, that is, the
        number is greater than the number of capturing parentheses.

          PCRE2_ERROR_UNAVAILABLE
@@ -2287,8 +2316,8 @@

          PCRE2_ERROR_UNSET

-       The  substring  did  not  participate in the match. For example, if the
-       pattern is (abc)|(def) and the subject is "def", and the  ovector  con-
+       The substring did not participate in the match.  For  example,  if  the
+       pattern  is  (abc)|(def) and the subject is "def", and the ovector con-
        tains at least two capturing slots, substring number 1 is unset.

@@ -2299,13 +2328,16 @@

        void pcre2_substring_list_free(PCRE2_SPTR *list);

-       The  pcre2_substring_list_get()  function  extracts  all available sub-
-       strings and builds a list of pointers to  them.  It  also  (optionally)
-       builds  a  second  list  that  contains  their lengths (in code units),
+       The pcre2_substring_list_get() function  extracts  all  available  sub-
+       strings  and  builds  a  list of pointers to them. It also (optionally)
+       builds a second list that  contains  their  lengths  (in  code  units),
        excluding a terminating zero that is added to each of them. All this is
        done in a single block of memory that is obtained using the same memory
        allocation function that was used to get the match data block.

+       This  function  must be called only after a successful match. If called
+       after a partial match, the error code PCRE2_ERROR_PARTIAL is returned.
+
        The address of the memory block is returned via listptr, which is  also
        the start of the list of string pointers. The end of the list is marked
        by a NULL pointer. The address of the list of lengths is  returned  via
@@ -2694,7 +2726,7 @@

REVISION

-       Last updated: 14 December 2014
+       Last updated: 22 December 2014
        Copyright (c) 1997-2014 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -4314,9 +4346,9 @@
        string at the end of the subject.

        When a partial match is returned, the first two elements in the ovector
-       point to the portion of the subject that was matched. The appearance of
-       \K in the pattern has no effect for a partial match. Consider this pat-
-       tern:
+       point to the portion of the subject that was matched, but the values in
+       the rest of the ovector are undefined. The appearance of \K in the pat-
+       tern has no effect for a partial match. Consider this pattern:

          /abc\K123/

@@ -4678,7 +4710,7 @@

REVISION

-       Last updated: 14 October 2014
+       Last updated: 22 December 2014
        Copyright (c) 1997-2014 University of Cambridge.
 ------------------------------------------------------------------------------

Modified: code/trunk/doc/pcre2_substring_length_bynumber.3
===================================================================
--- code/trunk/doc/pcre2_substring_length_bynumber.3    2014-12-19 09:55:25 UTC (rev 176)
+++ code/trunk/doc/pcre2_substring_length_bynumber.3    2014-12-22 17:33:10 UTC (rev 177)
@@ -1,4 +1,4 @@
-.TH PCRE2_SUBSTRING_LENGTH_BYNUMBER 3 "01 December 2014" "PCRE2 10.00"
+.TH PCRE2_SUBSTRING_LENGTH_BYNUMBER 3 "22 December 2014" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -19,9 +19,11 @@
 .sp
   \fImatch_data\fP   The match data block for the match
   \fInumber\fP       The substring number
-  \fIlength\fP       Where to return the length
+  \fIlength\fP       Where to return the length, or NULL
 .sp
-The yield is zero on success, or an error code if the substring is not found.
+The third argument may be NULL if all you want to know is whether or not a
+substring is set. The yield is zero on success, or a negative error code
+otherwise. After a partial match, only substring 0 is available.
 .P
 There is a complete description of the PCRE2 native API in the
 .\" HREF

Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3    2014-12-19 09:55:25 UTC (rev 176)
+++ code/trunk/doc/pcre2api.3    2014-12-22 17:33:10 UTC (rev 177)
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "14 December 2014" "PCRE2 10.00"
+.TH PCRE2API 3 "22 December 2014" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -1736,6 +1736,11 @@
 .\"
 below.
 .P
+When a call of \fBpcre2_match()\fP fails, valid data is available in the match
+block only when the error is PCRE2_ERROR_NOMATCH, PCRE2_ERROR_PARTIAL, or one
+of the error codes for an invalid UTF string. Exactly what is available depends
+on the error, and is detailed below.
+.P
 When one of the matching functions is called, pointers to the compiled pattern
 and the subject string are set in the match data block so that they can be
 referenced by the extraction functions. After running a match, you must not
@@ -2031,9 +2036,9 @@
 function can be used to find out how many capturing subpatterns there are in a
 compiled pattern.
 .P
-The overall matched string and any captured substrings are returned to the
-caller via a vector of PCRE2_SIZE values. This is called the \fBovector\fP, and
-is contained within the
+A successful match returns the overall matched string and any captured
+substrings to the caller via a vector of PCRE2_SIZE values. This is called the
+\fBovector\fP, and is contained within the
 .\" HTML <a href="#matchdatablock">
 .\" </a>
 match data block.
@@ -2061,19 +2066,26 @@
 library, 16-bit offsets in the 16-bit library, and 32-bit offsets in the 32-bit
 library.
 .P
-The first pair of offsets (that is, \fIovector[0]\fP and \fIovector[1]\fP)
-identifies the portion of the subject string that was matched by the entire
-pattern. The next pair is used for the first capturing subpattern, and so on.
-The value returned by \fBpcre2_match()\fP is one more than the highest numbered
-pair that has been set. For example, if two substrings have been captured, the
-returned value is 3. If there are no capturing subpatterns, the return value
-from a successful match is 1, indicating that just the first pair of offsets
-has been set.
+After a partial match (error return PCRE2_ERROR_PARTIAL), only the first pair
+of offsets (that is, \fIovector[0]\fP and \fIovector[1]\fP) are set. They
+identify the part of the subject that was partially matched. See the
+.\" HREF
+\fBpcre2partial\fP
+.\"
+documentation for details of partial matching.
 .P
+After a successful match, the first pair of offsets identifies the portion of
+the subject string that was matched by the entire pattern. The next pair is
+used for the first capturing subpattern, and so on. The value returned by
+\fBpcre2_match()\fP is one more than the highest numbered pair that has been
+set. For example, if two substrings have been captured, the returned value is
+3. If there are no capturing subpatterns, the return value from a successful
+match is 1, indicating that just the first pair of offsets has been set.
+.P
 If a pattern uses the \eK escape sequence within a positive assertion, the
-reported start of the match can be greater than the end of the match. For
-example, if the pattern (?=ab\eK) is matched against "ab", the start and end
-offset values for the match are 2 and 0.
+reported start of a successful match can be greater than the end of the match.
+For example, if the pattern (?=ab\eK) is matched against "ab", the start and
+end offset values for the match are 2 and 0.
 .P
 If a capturing subpattern group is matched repeatedly within a single match
 operation, it is the last portion of the subject that it matched that is
@@ -2121,21 +2133,35 @@
 .fi
 .P
 As well as the offsets in the ovector, other information about a match is
-retained in the match data block and can be retrieved by the above functions.
+retained in the match data block and can be retrieved by the above functions in
+appropriate circumstances. If they are called at other times, the result is
+undefined.
 .P
-When a (*MARK) name is to be passed back, \fBpcre2_get_mark()\fP returns a
-pointer to the zero-terminated name, which is within the compiled pattern.
-Otherwise NULL is returned. A (*MARK) name may be available after a failed
-match or a partial match, as well as after a successful one.
+After a successful match, a partial match (PCRE2_ERROR_PARTIAL), or a failure
+to match (PCRE2_ERROR_NOMATCH), a (*MARK) name may be available, and
+\fBpcre2_get_mark()\fP can be called. It returns a pointer to the
+zero-terminated name, which is within the compiled pattern. Otherwise NULL is
+returned. After a successful match, the (*MARK) name that is returned is the
+last one encountered on the matching path through the pattern. After a "no
+match" or a partial match, the last encountered (*MARK) name is returned. For
+example, consider this pattern:
+.sp
+  ^(*MARK:A)((*MARK:B)a|b)c
+.sp
+When it matches "bc", the returned mark is A. The B mark is "seen" in the first
+branch of the group, but it is not on the matching path. On the other hand,
+when this pattern fails to match "bx", the returned mark is B.
 .P
-The code unit offset of the character at which a successful match started is
-returned by \fBpcre2_get_startchar()\fP. For a non-partial match, this can be
+After a successful match, a partial match, or one of the invalid UTF errors
+(for example, PCRE2_ERROR_UTF8_ERR5), \fBpcre2_get_startchar()\fP can be
+called. After a successful or partial match it returns the code unit offset of
+the character at which the match started. For a non-partial match, this can be
 different to the value of \fIovector[0]\fP if the pattern contains the \eK
 escape sequence. After a partial match, however, this value is always the same
 as \fIovector[0]\fP because \eK does not affect the result of a partial match.
 .P
-The \fBstartchar\fP field is also used to return the offset of an invalid
-UTF character when UTF checking fails. Details are given in the
+After a UTF check failure, \fBpcre2_get_startchar()\fP can be used to obtain
+the code unit offset of the invalid UTF character. Details are given in the
 .\" HREF
 \fBpcre2unicode\fP
 .\"
@@ -2289,18 +2315,21 @@
 above.
 .\"
 For convenience, auxiliary functions are provided for extracting captured
-substrings as new, separate, zero-terminated strings. The functions in this
-section identify substrings by number. The number zero refers to the entire
-matched substring, with higher numbers referring to substrings captured by
-parenthesized groups. The next section describes similar functions for
-extracting captured substrings by name. A substring that contains a binary zero
-is correctly extracted and has a further zero added on the end, but the result
-is not, of course, a C string.
+substrings as new, separate, zero-terminated strings. A substring that contains
+a binary zero is correctly extracted and has a further zero added on the end,
+but the result is not, of course, a C string.
 .P
+The functions in this section identify substrings by number. The number zero
+refers to the entire matched substring, with higher numbers referring to
+substrings captured by parenthesized groups. After a partial match, only
+substring zero is available. An attempt to extract any other substring gives
+the error PCRE2_ERROR_PARTIAL. The next section describes similar functions for
+extracting captured substrings by name.
+.P
 If a pattern uses the \eK escape sequence within a positive assertion, the
-reported start of the match can be greater than the end of the match. For
-example, if the pattern (?=ab\eK) is matched against "ab", the start and end
-offset values for the match are 2 and 0. In this situation, calling these
+reported start of a successful match can be greater than the end of the match.
+For example, if the pattern (?=ab\eK) is matched against "ab", the start and
+end offset values for the match are 2 and 0. In this situation, calling these
 functions with a zero substring number extracts a zero-length empty string.
 .P
 You can find the length in code units of a captured substring without
@@ -2329,7 +2358,8 @@
 .P
 The return value from all these functions is zero for success, or a negative
 error code. If the pattern match failed, the match failure code is returned.
-Other possible error codes are:
+If a substring number greater than zero is used after a partial match,
+PCRE2_ERROR_PARTIAL is returned. Other possible error codes are:
 .sp
   PCRE2_ERROR_NOMEMORY
 .sp
@@ -2371,6 +2401,9 @@
 that is obtained using the same memory allocation function that was used to get
 the match data block.
 .P
+This function must be called only after a successful match. If called after a
+partial match, the error code PCRE2_ERROR_PARTIAL is returned.
+.P
 The address of the memory block is returned via \fIlistptr\fP, which is also
 the start of the list of string pointers. The end of the list is marked by a
 NULL pointer. The address of the list of lengths is returned via
@@ -2802,6 +2835,6 @@
 .rs
 .sp
 .nf
-Last updated: 14 December 2014
+Last updated: 22 December 2014
 Copyright (c) 1997-2014 University of Cambridge.
 .fi

Modified: code/trunk/doc/pcre2partial.3
===================================================================
--- code/trunk/doc/pcre2partial.3    2014-12-19 09:55:25 UTC (rev 176)
+++ code/trunk/doc/pcre2partial.3    2014-12-22 17:33:10 UTC (rev 177)
@@ -1,4 +1,4 @@
-.TH PCRE2PARTIAL 3 "14 October 2014" "PCRE2 10.00"
+.TH PCRE2PARTIAL 3 "22 December 2014" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions
 .SH "PARTIAL MATCHING IN PCRE2"
@@ -64,8 +64,9 @@
 empty string at the end of the subject.
 .P
 When a partial match is returned, the first two elements in the ovector point
-to the portion of the subject that was matched. The appearance of \eK in the
-pattern has no effect for a partial match. Consider this pattern:
+to the portion of the subject that was matched, but the values in the rest of
+the ovector are undefined. The appearance of \eK in the pattern has no effect
+for a partial match. Consider this pattern:
 .sp
   /abc\eK123/
 .sp
@@ -428,6 +429,6 @@
 .rs
 .sp
 .nf
-Last updated: 14 October 2014
+Last updated: 22 December 2014
 Copyright (c) 1997-2014 University of Cambridge.
 .fi

Modified: code/trunk/src/pcre2_substring.c
===================================================================
--- code/trunk/src/pcre2_substring.c    2014-12-19 09:55:25 UTC (rev 176)
+++ code/trunk/src/pcre2_substring.c    2014-12-22 17:33:10 UTC (rev 177)
@@ -312,9 +312,15 @@
 pcre2_substring_length_bynumber(pcre2_match_data *match_data,
   uint32_t stringnumber, PCRE2_SIZE *sizeptr)
 {
-int count;
 PCRE2_SIZE left, right;
-if ((count = match_data->rc) < 0) return count;   /* Match failed */
+int count = match_data->rc;
+if (count == PCRE2_ERROR_PARTIAL)
+  {
+  if (stringnumber > 0) return PCRE2_ERROR_PARTIAL;
+  count = 0;
+  }
+else if (count < 0) return count;            /* Match failed */
+
 if (match_data->matchedby != PCRE2_MATCHEDBY_DFA_INTERPRETER)
   {
   if (stringnumber > match_data->code->top_bracket)
@@ -329,6 +335,7 @@
   if (stringnumber >= match_data->oveccount) return PCRE2_ERROR_UNAVAILABLE;
   if (count != 0 && stringnumber >= (uint32_t)count) return PCRE2_ERROR_UNSET;
   }
+
 left = match_data->ovector[stringnumber*2];
 right = match_data->ovector[stringnumber*2+1];
 if (sizeptr != NULL) *sizeptr = (left > right)? 0 : right - left;

Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c    2014-12-19 09:55:25 UTC (rev 176)
+++ code/trunk/src/pcre2test.c    2014-12-22 17:33:10 UTC (rev 177)
@@ -4234,6 +4234,232 @@

 /*************************************************
+*       Handle *MARK and copy/get tests          *
+*************************************************/
+
+/* This function is called after complete and partial matches. It runs the
+tests for substring extraction.
+
+Arguments:
+  utf       TRUE for utf
+  capcount  return from pcre2_match()
+
+Returns:    nothing
+*/
+
+static void
+copy_and_get(BOOL utf, int capcount)
+{
+int i;
+uint8_t *nptr;
+
+/* Test copy strings by number */
+
+for (i = 0; i < MAXCPYGET && dat_datctl.copy_numbers[i] >= 0; i++)
+  {
+  int rc;
+  PCRE2_SIZE length, length2;
+  uint32_t copybuffer[256];
+  uint32_t n = (uint32_t)(dat_datctl.copy_numbers[i]);
+  length = sizeof(copybuffer)/code_unit_size;
+  PCRE2_SUBSTRING_COPY_BYNUMBER(rc, match_data, n, copybuffer, &length);
+  if (rc < 0)
+    {
+    fprintf(outfile, "Copy substring %d failed (%d): ", n, rc);
+    PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
+    PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
+    fprintf(outfile, "\n");
+    }
+  else
+    {
+    PCRE2_SUBSTRING_LENGTH_BYNUMBER(rc, match_data, n, &length2);
+    if (rc < 0)
+      {
+      fprintf(outfile, "Get substring %d length failed (%d): ", n, rc);
+      PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
+      PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
+      fprintf(outfile, "\n");
+      }
+    else if (length2 != length)
+      {
+      fprintf(outfile, "Mismatched substring lengths: %ld %ld\n",
+        length, length2);
+      }
+    fprintf(outfile, "%2dC ", n);
+    PCHARSV(copybuffer, 0, length, utf, outfile);
+    fprintf(outfile, " (%lu)\n", (unsigned long)length);
+    }
+  }
+
+/* Test copy strings by name */
+
+nptr = dat_datctl.copy_names;
+for (;;)
+  {
+  int rc;
+  int groupnumber;
+  PCRE2_SIZE length, length2;
+  uint32_t copybuffer[256];
+  int namelen = strlen((const char *)nptr);
+#if defined SUPPORT_PCRE2_16 || defined SUPPORT_PCRE2_32
+  PCRE2_SIZE cnl = namelen;
+#endif
+  if (namelen == 0) break;
+
+#ifdef SUPPORT_PCRE2_8
+  if (test_mode == PCRE8_MODE) strcpy((char *)pbuffer8, (char *)nptr);
+#endif
+#ifdef SUPPORT_PCRE2_16
+  if (test_mode == PCRE16_MODE)(void)to16(nptr, utf, &cnl);
+#endif
+#ifdef SUPPORT_PCRE2_32
+  if (test_mode == PCRE32_MODE)(void)to32(nptr, utf, &cnl);
+#endif
+
+  PCRE2_SUBSTRING_NUMBER_FROM_NAME(groupnumber, compiled_code, pbuffer);
+  if (groupnumber < 0 && groupnumber != PCRE2_ERROR_NOUNIQUESUBSTRING)
+    fprintf(outfile, "Number not found for group '%s'\n", nptr);
+
+  length = sizeof(copybuffer)/code_unit_size;
+  PCRE2_SUBSTRING_COPY_BYNAME(rc, match_data, pbuffer, copybuffer, &length);
+  if (rc < 0)
+    {
+    fprintf(outfile, "Copy substring '%s' failed (%d): ", nptr, rc);
+    PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
+    PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
+    fprintf(outfile, "\n");
+    }
+  else
+    {
+    PCRE2_SUBSTRING_LENGTH_BYNAME(rc, match_data, pbuffer, &length2);
+    if (rc < 0)
+      {
+      fprintf(outfile, "Get substring '%s' length failed (%d): ", nptr, rc);
+      PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
+      PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
+      fprintf(outfile, "\n");
+      }
+    else if (length2 != length)
+      {
+      fprintf(outfile, "Mismatched substring lengths: %ld %ld\n",
+        length, length2);
+      }
+    fprintf(outfile, "  C ");
+    PCHARSV(copybuffer, 0, length, utf, outfile);
+    fprintf(outfile, " (%lu) %s", (unsigned long)length, nptr);
+    if (groupnumber >= 0) fprintf(outfile, " (group %d)\n", groupnumber);
+      else fprintf(outfile, " (non-unique)\n");
+    }
+  nptr += namelen + 1;
+  }
+
+/* Test get strings by number */
+
+for (i = 0; i < MAXCPYGET && dat_datctl.get_numbers[i] >= 0; i++)
+  {
+  int rc;
+  PCRE2_SIZE length;
+  void *gotbuffer;
+  uint32_t n = (uint32_t)(dat_datctl.get_numbers[i]);
+  PCRE2_SUBSTRING_GET_BYNUMBER(rc, match_data, n, &gotbuffer, &length);
+  if (rc < 0)
+    {
+    fprintf(outfile, "Get substring %d failed (%d): ", n, rc);
+    PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
+    PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
+    fprintf(outfile, "\n");
+    }
+  else
+    {
+    fprintf(outfile, "%2dG ", n);
+    PCHARSV(gotbuffer, 0, length, utf, outfile);
+    fprintf(outfile, " (%lu)\n", (unsigned long)length);
+    PCRE2_SUBSTRING_FREE(gotbuffer);
+    }
+  }
+
+/* Test get strings by name */
+
+nptr = dat_datctl.get_names;
+for (;;)
+  {
+  PCRE2_SIZE length;
+  void *gotbuffer;
+  int rc;
+  int groupnumber;
+  int namelen = strlen((const char *)nptr);
+#if defined SUPPORT_PCRE2_16 || defined SUPPORT_PCRE2_32
+  PCRE2_SIZE cnl = namelen;
+#endif
+  if (namelen == 0) break;
+
+#ifdef SUPPORT_PCRE2_8
+  if (test_mode == PCRE8_MODE) strcpy((char *)pbuffer8, (char *)nptr);
+#endif
+#ifdef SUPPORT_PCRE2_16
+  if (test_mode == PCRE16_MODE)(void)to16(nptr, utf, &cnl);
+#endif
+#ifdef SUPPORT_PCRE2_32
+  if (test_mode == PCRE32_MODE)(void)to32(nptr, utf, &cnl);
+#endif
+
+  PCRE2_SUBSTRING_NUMBER_FROM_NAME(groupnumber, compiled_code, pbuffer);
+  if (groupnumber < 0 && groupnumber != PCRE2_ERROR_NOUNIQUESUBSTRING)
+    fprintf(outfile, "Number not found for group '%s'\n", nptr);
+
+  PCRE2_SUBSTRING_GET_BYNAME(rc, match_data, pbuffer, &gotbuffer, &length);
+  if (rc < 0)
+    {
+    fprintf(outfile, "Get substring '%s' failed (%d): ", nptr, rc);
+    PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
+    PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
+    fprintf(outfile, "\n");
+    }
+  else
+    {
+    fprintf(outfile, "  G ");
+    PCHARSV(gotbuffer, 0, length, utf, outfile);
+    fprintf(outfile, " (%lu) %s", (unsigned long)length, nptr);
+    if (groupnumber >= 0) fprintf(outfile, " (group %d)\n", groupnumber);
+      else fprintf(outfile, " (non-unique)\n");
+    PCRE2_SUBSTRING_FREE(gotbuffer);
+    }
+  nptr += namelen + 1;
+  }
+
+/* Test getting the complete list of captured strings. */
+
+if ((dat_datctl.control & CTL_GETALL) != 0)
+  {
+  int rc;
+  void **stringlist;
+  PCRE2_SIZE *lengths;
+  PCRE2_SUBSTRING_LIST_GET(rc, match_data, &stringlist, &lengths);
+  if (rc < 0)
+    {
+    fprintf(outfile, "get substring list failed (%d): ", rc);
+    PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
+    PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
+    fprintf(outfile, "\n");
+    }
+  else
+    {
+    for (i = 0; i < capcount; i++)
+      {
+      fprintf(outfile, "%2dL ", i);
+      PCHARSV(stringlist[i], 0, lengths[i], utf, outfile);
+      putc('\n', outfile);
+      }
+    if (stringlist[i] != NULL)
+      fprintf(outfile, "string list not terminated by NULL\n");
+    PCRE2_SUBSTRING_LIST_FREE(stringlist);
+    }
+  }
+}
+
+
+
+/*************************************************
 *               Process a data line              *
 *************************************************/

@@ -5074,7 +5300,6 @@
     {
     int i;
     uint32_t oveccount;
-    uint8_t *nptr;

     /* This is a check against a lunatic return value. */

@@ -5239,7 +5464,7 @@
         }
       }

-    /* Output mark data if requested. */
+    /* Output (*MARK) data if requested */

     if ((dat_datctl.control & CTL_MARK) != 0 &&
          TESTFLD(match_data, mark, !=, NULL))
@@ -5249,208 +5474,10 @@
       fprintf(outfile, "\n");
       }

-    /* Test copy strings by number */
+    /* Process copy/get strings */

-    for (i = 0; i < MAXCPYGET && dat_datctl.copy_numbers[i] >= 0; i++)
-      {
-      int rc;
-      PCRE2_SIZE length, length2;
-      uint32_t copybuffer[256];
-      uint32_t n = (uint32_t)(dat_datctl.copy_numbers[i]);
-      length = sizeof(copybuffer)/code_unit_size;
-      PCRE2_SUBSTRING_COPY_BYNUMBER(rc, match_data, n, copybuffer, &length);
-      if (rc < 0)
-        {
-        fprintf(outfile, "Copy substring %d failed (%d): ", n, rc);
-        PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
-        PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
-        fprintf(outfile, "\n");
-        }
-      else
-        {
-        PCRE2_SUBSTRING_LENGTH_BYNUMBER(rc, match_data, n, &length2);
-        if (rc < 0)
-          {
-          fprintf(outfile, "Get substring %d length failed (%d): ", n, rc);
-          PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
-          PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
-          fprintf(outfile, "\n");
-          }
-        else if (length2 != length)
-          {
-          fprintf(outfile, "Mismatched substring lengths: %ld %ld\n",
-            length, length2);
-          }
-        fprintf(outfile, "%2dC ", n);
-        PCHARSV(copybuffer, 0, length, utf, outfile);
-        fprintf(outfile, " (%lu)\n", (unsigned long)length);
-        }
-      }
+    copy_and_get(utf, capcount);

-    /* Test copy strings by name */
-
-    nptr = dat_datctl.copy_names;
-    for (;;)
-      {
-      int rc;
-      int groupnumber;
-      PCRE2_SIZE length, length2;
-      uint32_t copybuffer[256];
-      int namelen = strlen((const char *)nptr);
-#if defined SUPPORT_PCRE2_16 || defined SUPPORT_PCRE2_32
-      PCRE2_SIZE cnl = namelen;
-#endif
-      if (namelen == 0) break;
-
-#ifdef SUPPORT_PCRE2_8
-      if (test_mode == PCRE8_MODE) strcpy((char *)pbuffer8, (char *)nptr);
-#endif
-#ifdef SUPPORT_PCRE2_16
-      if (test_mode == PCRE16_MODE)(void)to16(nptr, utf, &cnl);
-#endif
-#ifdef SUPPORT_PCRE2_32
-      if (test_mode == PCRE32_MODE)(void)to32(nptr, utf, &cnl);
-#endif
-
-      PCRE2_SUBSTRING_NUMBER_FROM_NAME(groupnumber, compiled_code, pbuffer);
-      if (groupnumber < 0 && groupnumber != PCRE2_ERROR_NOUNIQUESUBSTRING)
-        fprintf(outfile, "Number not found for group '%s'\n", nptr);
-
-      length = sizeof(copybuffer)/code_unit_size;
-      PCRE2_SUBSTRING_COPY_BYNAME(rc, match_data, pbuffer, copybuffer, &length);
-      if (rc < 0)
-        {
-        fprintf(outfile, "Copy substring '%s' failed (%d): ", nptr, rc);
-        PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
-        PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
-        fprintf(outfile, "\n");
-        }
-      else
-        {
-        PCRE2_SUBSTRING_LENGTH_BYNAME(rc, match_data, pbuffer, &length2);
-        if (rc < 0)
-          {
-          fprintf(outfile, "Get substring '%s' length failed (%d): ", nptr, rc);
-          PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
-          PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
-          fprintf(outfile, "\n");
-          }
-        else if (length2 != length)
-          {
-          fprintf(outfile, "Mismatched substring lengths: %ld %ld\n",
-            length, length2);
-          }
-        fprintf(outfile, "  C ");
-        PCHARSV(copybuffer, 0, length, utf, outfile);
-        fprintf(outfile, " (%lu) %s", (unsigned long)length, nptr);
-        if (groupnumber >= 0) fprintf(outfile, " (group %d)\n", groupnumber);
-          else fprintf(outfile, " (non-unique)\n");
-        }
-      nptr += namelen + 1;
-      }
-
-    /* Test get strings by number */
-
-    for (i = 0; i < MAXCPYGET && dat_datctl.get_numbers[i] >= 0; i++)
-      {
-      int rc;
-      PCRE2_SIZE length;
-      void *gotbuffer;
-      uint32_t n = (uint32_t)(dat_datctl.get_numbers[i]);
-      PCRE2_SUBSTRING_GET_BYNUMBER(rc, match_data, n, &gotbuffer, &length);
-      if (rc < 0)
-        {
-        fprintf(outfile, "Get substring %d failed (%d): ", n, rc);
-        PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
-        PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
-        fprintf(outfile, "\n");
-        }
-      else
-        {
-        fprintf(outfile, "%2dG ", n);
-        PCHARSV(gotbuffer, 0, length, utf, outfile);
-        fprintf(outfile, " (%lu)\n", (unsigned long)length);
-        PCRE2_SUBSTRING_FREE(gotbuffer);
-        }
-      }
-
-    /* Test get strings by name */
-
-    nptr = dat_datctl.get_names;
-    for (;;)
-      {
-      PCRE2_SIZE length;
-      void *gotbuffer;
-      int rc;
-      int groupnumber;
-      int namelen = strlen((const char *)nptr);
-#if defined SUPPORT_PCRE2_16 || defined SUPPORT_PCRE2_32
-      PCRE2_SIZE cnl = namelen;
-#endif
-      if (namelen == 0) break;
-
-#ifdef SUPPORT_PCRE2_8
-      if (test_mode == PCRE8_MODE) strcpy((char *)pbuffer8, (char *)nptr);
-#endif
-#ifdef SUPPORT_PCRE2_16
-      if (test_mode == PCRE16_MODE)(void)to16(nptr, utf, &cnl);
-#endif
-#ifdef SUPPORT_PCRE2_32
-      if (test_mode == PCRE32_MODE)(void)to32(nptr, utf, &cnl);
-#endif
-
-      PCRE2_SUBSTRING_NUMBER_FROM_NAME(groupnumber, compiled_code, pbuffer);
-      if (groupnumber < 0 && groupnumber != PCRE2_ERROR_NOUNIQUESUBSTRING)
-        fprintf(outfile, "Number not found for group '%s'\n", nptr);
-
-      PCRE2_SUBSTRING_GET_BYNAME(rc, match_data, pbuffer, &gotbuffer, &length);
-      if (rc < 0)
-        {
-        fprintf(outfile, "Get substring '%s' failed (%d): ", nptr, rc);
-        PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
-        PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
-        fprintf(outfile, "\n");
-        }
-      else
-        {
-        fprintf(outfile, "  G ");
-        PCHARSV(gotbuffer, 0, length, utf, outfile);
-        fprintf(outfile, " (%lu) %s", (unsigned long)length, nptr);
-        if (groupnumber >= 0) fprintf(outfile, " (group %d)\n", groupnumber);
-          else fprintf(outfile, " (non-unique)\n");
-        PCRE2_SUBSTRING_FREE(gotbuffer);
-        }
-      nptr += namelen + 1;
-      }
-
-    /* Test getting the complete list of captured strings. */
-
-    if ((dat_datctl.control & CTL_GETALL) != 0)
-      {
-      int rc;
-      void **stringlist;
-      PCRE2_SIZE *lengths;
-      PCRE2_SUBSTRING_LIST_GET(rc, match_data, &stringlist, &lengths);
-      if (rc < 0)
-        {
-        fprintf(outfile, "get substring list failed (%d): ", rc);
-        PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
-        PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
-        fprintf(outfile, "\n");
-        }
-      else
-        {
-        for (i = 0; i < capcount; i++)
-          {
-          fprintf(outfile, "%2dL ", i);
-          PCHARSV(stringlist[i], 0, lengths[i], utf, outfile);
-          putc('\n', outfile);
-          }
-        if (stringlist[i] != NULL)
-          fprintf(outfile, "string list not terminated by NULL\n");
-        PCRE2_SUBSTRING_LIST_FREE(stringlist);
-        }
-      }
     }    /* End of handling a successful match */

   /* There was a partial match. The value of ovector[0] is the bumpalong point,
@@ -5489,6 +5516,10 @@
       fprintf(outfile, "\n");
       }

+    /* Process copy/get strings */
+
+    copy_and_get(utf, 1);
+
     break;  /* Out of the /g loop */
     }       /* End of handling partial match */

Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2014-12-19 09:55:25 UTC (rev 176)
+++ code/trunk/testdata/testinput2    2014-12-22 17:33:10 UTC (rev 177)
@@ -4097,4 +4097,7 @@
     a\=ovector=2,copy=A,get=A,get=2
     b\=ovector=2,copy=A,get=A,get=2

+/a(b)c(d)/
+    abc\=ph,copy=0,copy=1,getall
+
 # End of testinput2

Modified: code/trunk/testdata/testinput6
===================================================================
(Binary files differ)

Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2014-12-19 09:55:25 UTC (rev 176)
+++ code/trunk/testdata/testoutput2    2014-12-22 17:33:10 UTC (rev 177)
@@ -13762,4 +13762,11 @@
 Get substring 2 failed (-54): requested value is not available
 Get substring 'A' failed (-55): requested value is not set

+/a(b)c(d)/
+    abc\=ph,copy=0,copy=1,getall
+Partial match: abc
+ 0C abc (3)
+Copy substring 1 failed (-2): partial match
+get substring list failed (-2): partial match
+
 # End of testinput2

Modified: code/trunk/testdata/testoutput6
===================================================================
(Binary files differ)
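
For readers who want to trace the new return-code handling in the
pcre2_substring.c hunk above, here is a minimal stand-alone sketch. It is a
model, not the library code: the type model_match_data, the function
model_substring_length(), and the MODEL_ERROR_* names are hypothetical. The
numeric values -2, -54 and -55 are taken from the testoutput2 lines in this
commit, and -1 for "no match" is an assumption of the sketch. Only the bounds
checks visible in the second hunk are kept; the checks that depend on the
compiled pattern are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch; not the real PCRE2 definitions. */
enum {
  MODEL_ERROR_NOMATCH     = -1,   /* assumed value for "no match" */
  MODEL_ERROR_PARTIAL     = -2,   /* "partial match" in testoutput2 */
  MODEL_ERROR_UNAVAILABLE = -54,  /* "requested value is not available" */
  MODEL_ERROR_UNSET       = -55   /* "requested value is not set" */
};

/* A cut-down model of a match data block. */
typedef struct {
  int rc;              /* return code of the last match attempt */
  uint32_t oveccount;  /* number of offset pairs held in ovector */
  size_t ovector[6];   /* start/end offset pairs */
} model_match_data;

/* Models the revised logic: after a partial match only substring 0 can be
queried; any other match failure code is passed straight back. */
static int
model_substring_length(const model_match_data *md, uint32_t stringnumber,
  size_t *sizeptr)
{
size_t left, right;
int count;

assert(md != NULL);
count = md->rc;

if (count == MODEL_ERROR_PARTIAL)
  {
  if (stringnumber > 0) return MODEL_ERROR_PARTIAL;
  count = 0;                        /* Skip the "unset" check below */
  }
else if (count < 0) return count;   /* Match failed */

if (stringnumber >= md->oveccount) return MODEL_ERROR_UNAVAILABLE;
if (count != 0 && stringnumber >= (uint32_t)count) return MODEL_ERROR_UNSET;

left = md->ovector[stringnumber*2];
right = md->ovector[stringnumber*2+1];

/* sizeptr may be NULL when the caller only wants to know if the substring
is set. A start beyond the end (possible with \K) yields length zero. */
if (sizeptr != NULL) *sizeptr = (left > right)? 0 : right - left;
return 0;
}
```

With rc set to MODEL_ERROR_PARTIAL and ovector[0..1] set to 0 and 3, as in the
new testinput2 case for "abc", the model gives a length of 3 for substring 0
and MODEL_ERROR_PARTIAL for substring 1. That corresponds to the "0C abc (3)"
and "Copy substring 1 failed (-2)" lines in testoutput2.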
