[Pcre-svn] [174] code/trunk: Update and improve substring handling and its documentation.

Author: Subversion repository
Date:
To: pcre-svn
Subject: [Pcre-svn] [174] code/trunk: Update and improve substring handling and its documentation.

Revision: 174

          http://www.exim.org/viewvc/pcre2?view=rev&revision=174
Author:   ph10
Date:     2014-12-14 17:17:06 +0000 (Sun, 14 Dec 2014)

Log Message:
-----------
Update and improve substring handling and its documentation.

Modified Paths:
--------------
    code/trunk/doc/pcre2api.3
    code/trunk/src/pcre2.h.in
    code/trunk/src/pcre2_dfa_match.c
    code/trunk/src/pcre2_error.c
    code/trunk/src/pcre2_internal.h
    code/trunk/src/pcre2_intmodedep.h
    code/trunk/src/pcre2_jit_match.c
    code/trunk/src/pcre2_match.c
    code/trunk/src/pcre2_substring.c
    code/trunk/testdata/grepoutput
    code/trunk/testdata/testinput2
    code/trunk/testdata/testinput6
    code/trunk/testdata/testoutput14
    code/trunk/testdata/testoutput16
    code/trunk/testdata/testoutput2
    code/trunk/testdata/testoutput6
    code/trunk/testdata/testoutput7

Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3    2014-12-13 17:43:26 UTC (rev 173)
+++ code/trunk/doc/pcre2api.3    2014-12-14 17:17:06 UTC (rev 174)
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "13 December 2014" "PCRE2 10.00"
+.TH PCRE2API 3 "14 December 2014" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -921,6 +921,16 @@
 contains the compiled pattern and related data. The caller must free the memory
 by calling \fBpcre2_code_free()\fP when it is no longer needed.
 .P
+NOTE: When one of the matching functions is called, pointers to the compiled
+pattern and the subject string are set in the match data block so that they can
+be referenced by the extraction functions. After running a match, you must not
+free a compiled pattern (or a subject string) until after all operations on the
+.\" HTML <a href="#matchdatablock">
+.\" </a>
+match data block 
+.\"
+have taken place.
+.P
 If the compile context argument \fIccontext\fP is NULL, memory for the compiled
 pattern is obtained by calling \fBmalloc()\fP. Otherwise, it is obtained from
 the same memory function that was used for the compile context.
@@ -1683,7 +1693,7 @@
 .B void pcre2_match_data_free(pcre2_match_data *\fImatch_data\fP);
 .fi
 .P
-Information about successful and unsuccessful matches is placed in a match
+Information about a successful or unsuccessful match is placed in a match
 data block, which is an opaque structure that is accessed by function calls. In
 particular, the match data block contains a vector of offsets into the subject
 string that define the matched part of the subject and any substrings that were
@@ -1713,10 +1723,8 @@
 pattern (custom or default).
 .P
 A match data block can be used many times, with the same or different compiled
-patterns. When it is no longer needed, it should be freed by calling
-\fBpcre2_match_data_free()\fP. You can extract information from a match data
-block after a match operation has finished, using functions that are described
-in the sections on
+patterns. You can extract information from a match data block after a match
+operation has finished, using functions that are described in the sections on
 .\" HTML <a href="#matchedstrings">
 .\" </a>
 matched strings
@@ -1727,6 +1735,15 @@
 other match data
 .\"
 below.
+.P
+When one of the matching functions is called, pointers to the compiled pattern
+and the subject string are set in the match data block so that they can be
+referenced by the extraction functions. After running a match, you must not
+free a compiled pattern or a subject string until after all operations on the
+match data block (for that match) have taken place.
+.P
+When a match data block itself is no longer needed, it should be freed by
+calling \fBpcre2_match_data_free()\fP.
 .
 .
 .SH "MATCHING A PATTERN: THE TRADITIONAL FUNCTION"
@@ -2053,8 +2070,13 @@
 from a successful match is 1, indicating that just the first pair of offsets
 has been set.
 .P
-If a capturing subpattern is matched repeatedly within a single match
-operation, it is the last portion of the string that it matched that is
+If a pattern uses the \eK escape sequence within a positive assertion, the 
+reported start of the match can be greater than the end of the match. For 
+example, if the pattern (?=ab\eK) is matched against "ab", the start and end 
+offset values for the match are 2 and 0.
+.P
+If a capturing subpattern group is matched repeatedly within a single match
+operation, it is the last portion of the subject that it matched that is
 returned.
 .P
 If the ovector is too small to hold all the captured substring offsets, as much
@@ -2268,23 +2290,31 @@
 .\"
 For convenience, auxiliary functions are provided for extracting captured
 substrings as new, separate, zero-terminated strings. The functions in this
-section identify substrings by number. The next section describes similar
-functions for extracting substrings by name. A substring that contains a binary
-zero is correctly extracted and has a further zero added on the end, but the
-result is not, of course, a C string.
+section identify substrings by number. The number zero refers to the entire
+matched substring, with higher numbers referring to substrings captured by
+parenthesized groups. The next section describes similar functions for
+extracting captured substrings by name. A substring that contains a binary zero
+is correctly extracted and has a further zero added on the end, but the result
+is not, of course, a C string.
 .P
+If a pattern uses the \eK escape sequence within a positive assertion, the 
+reported start of the match can be greater than the end of the match. For 
+example, if the pattern (?=ab\eK) is matched against "ab", the start and end 
+offset values for the match are 2 and 0. In this situation, calling these 
+functions with a zero substring number extracts a zero-length empty string.
+.P
 You can find the length in code units of a captured substring without
 extracting it by calling \fBpcre2_substring_length_bynumber()\fP. The first
 argument is a pointer to the match data block, the second is the group number,
-and the third is a pointer to a variable into which the length is placed.
+and the third is a pointer to a variable into which the length is placed. If 
+you just want to know whether or not the substring has been captured, you can 
+pass the third argument as NULL.
 .P
-The \fBpcre2_substring_copy_bynumber()\fP function copies one string into a
-supplied buffer, whereas \fBpcre2_substring_get_bynumber()\fP copies it into
-new memory, obtained using the same memory allocation function that was used
-for the match data block. The first two arguments of these functions are a
-pointer to the match data block and a capturing group number. A group number of
-zero extracts the substring that matched the entire pattern, and higher values
-extract the captured substrings.
+The \fBpcre2_substring_copy_bynumber()\fP function copies a captured substring
+into a supplied buffer, whereas \fBpcre2_substring_get_bynumber()\fP copies it
+into new memory, obtained using the same memory allocation function that was
+used for the match data block. The first two arguments of these functions are a
+pointer to the match data block and a capturing group number.
 .P
 The final arguments of \fBpcre2_substring_copy_bynumber()\fP are a pointer to
 the buffer and a pointer to a variable that contains its length in code units.
@@ -2297,8 +2327,9 @@
 zero. When the substring is no longer needed, the memory should be freed by
 calling \fBpcre2_substring_free()\fP.
 .P
-The return value from these functions is zero for success, or one of these
-error codes:
+The return value from all these functions is zero for success, or a negative
+error code. If the pattern match failed, the match failure code is returned.
+Other possible error codes are:
 .sp
   PCRE2_ERROR_NOMEMORY
 .sp
@@ -2319,7 +2350,8 @@
   PCRE2_ERROR_UNSET
 .sp
 The substring did not participate in the match. For example, if the pattern is
-(abc)|(def) and the subject is "def", substring number 1 is unset.  
+(abc)|(def) and the subject is "def", and the ovector contains at least two
+capturing slots, substring number 1 is unset.
 .
 .
 .SH "EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS"
@@ -2388,16 +2420,21 @@
 compiled pattern, and the second is the name. The yield of the function is the
 subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
 name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
-that name.
+that name. Given the number, you can extract the substring directly, or use one
+of the functions described above.
 .P
-Given the number, you can extract the substring directly, or use one of the
-functions described above. For convenience, there are also "byname" functions
-that correspond to the "bynumber" functions, the only difference being that the
-second argument is a name instead of a number. If PCRE2_DUPNAMES is
-set and there are duplicate names, these functions return the first named 
-string that is set. PCRE2_ERROR_UNSET is returned only if all groups of the 
-same name are unset.
+For convenience, there are also "byname" functions that correspond to the
+"bynumber" functions, the only difference being that the second argument is a
+name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate
+names, these functions scan all the groups with the given name, and return the
+first named string that is set.
 .P
+If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is 
+returned. If all groups with the name have numbers that are greater than the 
+number of slots in the ovector, PCRE2_ERROR_UNAVAILABLE is returned. If there 
+is at least one group with a slot in the ovector, but no group is found to be 
+set, PCRE2_ERROR_UNSET is returned.
+.P
 \fBWarning:\fP If the pattern uses the (?| feature to set up multiple
 subpatterns with the same number, as described in the
 .\" HTML <a href="pcre2pattern.html#dupsubpatternnumber">
@@ -2660,18 +2697,37 @@
 .sp
 the three matched strings are
 .sp
+  <something> <something else> <something further>
+  <something> <something else>
   <something>
-  <something> <something else>
-  <something> <something else> <something further>
 .sp
 On success, the yield of the function is a number greater than zero, which is
 the number of matched substrings. The offsets of the substrings are returned in
-the ovector, and can be extracted in the same way as for \fBpcre2_match()\fP.
-They are returned in reverse order of length; that is, the longest
-matching string is given first. If there were too many matches to fit into
-the ovector, the yield of the function is zero, and the vector is filled with
-the longest matches.
+the ovector, and can be extracted by number in the same way as for
+\fBpcre2_match()\fP, but the numbers bear no relation to any capturing groups
+that may exist in the pattern, because DFA matching does not support group
+capture. 
 .P
+Calls to the convenience functions that extract substrings by name
+return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
+DFA match. The convenience functions that extract substrings by number never
+return PCRE2_ERROR_NOSUBSTRING, and the meanings of some other errors are
+slightly different:
+.sp
+  PCRE2_ERROR_UNAVAILABLE
+.sp
+The ovector is not big enough to include a slot for the given substring number.
+.sp
+  PCRE2_ERROR_UNSET
+.sp
+There is a slot in the ovector for this substring, but there were insufficient 
+matches to fill it.
+.P
+The matched strings are stored in the ovector in reverse order of length; that
+is, the longest matching string is first. If there were too many matches to fit
+into the ovector, the yield of the function is zero, and the vector is filled
+with the longest matches.
+.P
 NOTE: PCRE2's "auto-possessification" optimization usually applies to character
 repeats at the end of a pattern (as well as internally). For example, the
 pattern "a\ed+" is compiled as if it were "a\ed++". For DFA matching, this
@@ -2746,6 +2802,6 @@
 .rs
 .sp
 .nf
-Last updated: 13 December 2014
+Last updated: 14 December 2014
 Copyright (c) 1997-2014 University of Cambridge.
 .fi

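The documentation changes above make two points about \K inside a positive assertion:

- The reported start of a match can be greater than its end.
- Extracting substring zero then gives an empty string.

The arithmetic behind this appears later in pcre2_substring.c, where the length is computed as (left > right)? 0 : right - left. The sketch below restates that rule on plain size_t offsets. model_substring_length() and model_copy_substring() are hypothetical helpers written for illustration, not PCRE2 functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model (not the PCRE2 API) of the length rule for one
   ovector pair. With \K inside a positive assertion, left can exceed
   right: (?=ab\K) against "ab" reports start 2 and end 0. The length is
   clamped to zero instead of letting the unsigned subtraction wrap. */
size_t model_substring_length(size_t left, size_t right)
{
  return (left > right) ? 0 : right - left;
}

/* Hypothetical copy helper: writes the substring plus a terminating zero
   into buf. Returns 0 on success, or -1 if buf is too small (a stand-in
   for the PCRE2_ERROR_NOMEMORY code listed in the documentation). */
int model_copy_substring(const char *subject, size_t left, size_t right,
                         char *buf, size_t bufsize)
{
  size_t len = model_substring_length(left, right);
  if (bufsize == 0 || len > bufsize - 1) return -1;
  if (len > 0) memcpy(buf, subject + left, len);
  buf[len] = '\0';
  return 0;
}
```

Because the offsets are unsigned, subtracting first would wrap around to a huge length for a pair such as (2, 0). That is why the comparison comes first. The /x(?=ab\K)/ tests in testinput2 hit the same case: by the reasoning above, matching "xab" gives a start of 3 and an end of 1.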
Modified: code/trunk/src/pcre2.h.in
===================================================================
--- code/trunk/src/pcre2.h.in    2014-12-13 17:43:26 UTC (rev 173)
+++ code/trunk/src/pcre2.h.in    2014-12-14 17:17:06 UTC (rev 174)
@@ -212,20 +212,21 @@
 #define PCRE2_ERROR_DFA_BADRESTART    (-38)
 #define PCRE2_ERROR_DFA_RECURSE       (-39)
 #define PCRE2_ERROR_DFA_UCOND         (-40)
-#define PCRE2_ERROR_DFA_UITEM         (-41)
-#define PCRE2_ERROR_DFA_WSSIZE        (-42)
-#define PCRE2_ERROR_INTERNAL          (-43)
-#define PCRE2_ERROR_JIT_BADOPTION     (-44)
-#define PCRE2_ERROR_JIT_STACKLIMIT    (-45)
-#define PCRE2_ERROR_MATCHLIMIT        (-46)
-#define PCRE2_ERROR_NOMEMORY          (-47)
-#define PCRE2_ERROR_NOSUBSTRING       (-48)
-#define PCRE2_ERROR_NOUNIQUESUBSTRING (-49)
-#define PCRE2_ERROR_NULL              (-50)
-#define PCRE2_ERROR_RECURSELOOP       (-51)
-#define PCRE2_ERROR_RECURSIONLIMIT    (-52)
-#define PCRE2_ERROR_UNAVAILABLE       (-53)
-#define PCRE2_ERROR_UNSET             (-54)
+#define PCRE2_ERROR_DFA_UFUNC         (-41)
+#define PCRE2_ERROR_DFA_UITEM         (-42)
+#define PCRE2_ERROR_DFA_WSSIZE        (-43)
+#define PCRE2_ERROR_INTERNAL          (-44)
+#define PCRE2_ERROR_JIT_BADOPTION     (-45)
+#define PCRE2_ERROR_JIT_STACKLIMIT    (-46)
+#define PCRE2_ERROR_MATCHLIMIT        (-47)
+#define PCRE2_ERROR_NOMEMORY          (-48)
+#define PCRE2_ERROR_NOSUBSTRING       (-49)
+#define PCRE2_ERROR_NOUNIQUESUBSTRING (-50)
+#define PCRE2_ERROR_NULL              (-51)
+#define PCRE2_ERROR_RECURSELOOP       (-52)
+#define PCRE2_ERROR_RECURSIONLIMIT    (-53)
+#define PCRE2_ERROR_UNAVAILABLE       (-54)
+#define PCRE2_ERROR_UNSET             (-55)

/* Request types for pcre2_pattern_info() */

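Inserting PCRE2_ERROR_DFA_UFUNC at -41 moves every later error code down by one. This is why the expected outputs in testoutput14 and testoutput16 change; for example, "match limit exceeded" goes from -46 to -47. Code that compares return values against the symbolic names is unaffected by such a shift, but code that hard-codes the numbers is not.

In the sketch below, a few values are copied from the rev 174 header only so that it compiles standalone; real code would include pcre2.h instead. classify_rc() and its categories are hypothetical.

```c
/* Values copied from pcre2.h.in at revision 174 so this sketch compiles
   on its own. Real code should include pcre2.h, not redefine these. */
#define PCRE2_ERROR_DFA_UFUNC         (-41)
#define PCRE2_ERROR_JIT_STACKLIMIT    (-46)
#define PCRE2_ERROR_MATCHLIMIT        (-47)
#define PCRE2_ERROR_RECURSIONLIMIT    (-53)
#define PCRE2_ERROR_UNAVAILABLE       (-54)
#define PCRE2_ERROR_UNSET             (-55)

/* Hypothetical categories for an application's own error handling. */
enum rc_class { RC_OTHER, RC_DFA_UNSUPPORTED, RC_LIMIT, RC_NO_SLOT, RC_UNSET };

/* Classify by symbolic name, so the mapping survives renumbering. */
enum rc_class classify_rc(int rc)
{
  switch (rc)
    {
    case PCRE2_ERROR_DFA_UFUNC:      return RC_DFA_UNSUPPORTED;
    case PCRE2_ERROR_JIT_STACKLIMIT:
    case PCRE2_ERROR_MATCHLIMIT:
    case PCRE2_ERROR_RECURSIONLIMIT: return RC_LIMIT;
    case PCRE2_ERROR_UNAVAILABLE:    return RC_NO_SLOT;
    case PCRE2_ERROR_UNSET:          return RC_UNSET;
    default:                         return RC_OTHER;
    }
}
```

For instance, a program that treated -54 as "requested value is not set" would, after this revision, see -54 reported for PCRE2_ERROR_UNAVAILABLE instead.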
Modified: code/trunk/src/pcre2_dfa_match.c
===================================================================
--- code/trunk/src/pcre2_dfa_match.c    2014-12-13 17:43:26 UTC (rev 173)
+++ code/trunk/src/pcre2_dfa_match.c    2014-12-14 17:17:06 UTC (rev 174)
@@ -3275,7 +3275,13 @@
     }
   }

+/* Fill in fields that are always returned in the match data. */

+match_data->code = re;
+match_data->subject = subject;
+match_data->mark = NULL;
+match_data->matchedby = PCRE2_MATCHEDBY_DFA_INTERPRETER;
+
/* Call the main matching function, looping for a non-anchored regex after a
failed match. If not restarting, perform certain optimizations at the start of
a match. */

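pcre2_dfa_match() now records the pattern, subject, mark and matcher type in the match data before matching, as pcre2_match() already did. These are pointers rather than copies. For that reason, the note added to pcre2api.3 says the pattern and subject must not be freed until all operations on the match data block are finished.

The toy model below illustrates the consequence. Its types and functions are hypothetical, and strstr() stands in for a real regex matcher.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified stand-in for a match data block. Like
   the real one, it keeps a pointer to the subject, not a copy. */
struct toy_match_data {
  const char *subject;   /* borrowed; the caller still owns the memory */
  size_t start, end;     /* offsets of the matched substring */
  int rc;                /* 1 on match, -1 on no match */
};

/* Literal search standing in for a regex matcher. The subject pointer
   is recorded whether or not the search succeeds. */
int toy_match(struct toy_match_data *md, const char *subject, const char *lit)
{
  const char *p = strstr(subject, lit);
  md->subject = subject;
  if (p == NULL) { md->rc = -1; return md->rc; }
  md->start = (size_t)(p - subject);
  md->end = md->start + strlen(lit);
  md->rc = 1;
  return md->rc;
}

/* Extraction reads through md->subject, so the subject must still be
   alive when this runs. A failed match returns its failure code. */
int toy_copy_match(const struct toy_match_data *md, char *buf, size_t bufsize)
{
  size_t len;
  if (md->rc < 0) return md->rc;
  len = md->end - md->start;
  if (bufsize == 0 || len > bufsize - 1) return -2;
  memcpy(buf, md->subject + md->start, len);
  buf[len] = '\0';
  return 0;
}
```

Changing the subject after the match changes what extraction returns. Freeing it would turn the read into undefined behavior. The same applies to the compiled pattern, which the substring functions in this revision consult through match_data->code (for the name table and for top_bracket).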
Modified: code/trunk/src/pcre2_error.c
===================================================================
--- code/trunk/src/pcre2_error.c    2014-12-13 17:43:26 UTC (rev 173)
+++ code/trunk/src/pcre2_error.c    2014-12-14 17:17:06 UTC (rev 174)
@@ -212,18 +212,19 @@
   "invalid data in workspace for DFA restart\0"
   "too much recursion for DFA matching\0"
   /* 40 */
-  "backreference condition or recursion test not supported for DFA matching\0"
-  "item unsupported for DFA matching\0"
+  "backreference condition or recursion test is not supported for DFA matching\0"
+  "function is not supported for DFA matching\0"
+  "pattern contains an item that is not supported for DFA matching\0"
   "workspace size exceeded in DFA matching\0"
   "internal error - pattern overwritten?\0"
+  /* 45 */
   "bad JIT option\0"
-  /* 45 */
   "JIT stack limit reached\0"
   "match limit exceeded\0"
   "no more memory\0"
   "unknown substring\0"
+  /* 50 */
   "non-unique substring name\0"
-  /* 50 */
   "NULL argument passed\0"
   "nested recursion at the same subject position\0"
   "recursion limit exceeded\0"

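The match-time messages in pcre2_error.c are stored as one long string of zero-terminated entries. The /* 40 */, /* 45 */ and /* 50 */ comments mark every fifth entry.

The markers appear to track the negated error codes. After this change, /* 45 */ precedes "bad JIT option", which the header now defines as PCRE2_ERROR_JIT_BADOPTION (-45). Inserting one message therefore forces the markers to move.

The sketch below shows one way to index such a table by skipping terminators. The table contents and nth_message() are placeholders, not the library's own lookup code.

```c
#include <stddef.h>

/* Placeholder table in the same style: entries separated by "\0".
   The compiler adds one more zero after the last entry. */
static const char demo_messages[] =
  "no error\0"
  "first message\0"
  "second message\0"
  "third message\0";

/* Return a pointer to entry n, or NULL if n is past the last entry.
   Walks over n terminating zeros to reach the start of entry n. */
const char *nth_message(const char *table, size_t table_size, unsigned n)
{
  const char *p = table;
  const char *end = table + table_size;
  while (n > 0)
    {
    while (p < end && *p != '\0') p++;   /* move to this entry's zero */
    if (p >= end) return NULL;
    p++;                                 /* step past it */
    n--;
    }
  /* An empty entry marks the end of the table in this sketch. */
  return (p < end && *p != '\0') ? p : NULL;
}
```

Treating an empty entry as the end of the table is acceptable here only because no real message is empty.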
Modified: code/trunk/src/pcre2_internal.h
===================================================================
--- code/trunk/src/pcre2_internal.h    2014-12-13 17:43:26 UTC (rev 173)
+++ code/trunk/src/pcre2_internal.h    2014-12-14 17:17:06 UTC (rev 174)
@@ -526,15 +526,16 @@

 #define PCRE2_MODE_MASK     (PCRE2_MODE8 | PCRE2_MODE16 | PCRE2_MODE32)

+/* Values for the matchedby field in a match data block. */
+
+enum { PCRE2_MATCHEDBY_INTERPRETER,     /* pcre2_match() */
+       PCRE2_MATCHEDBY_DFA_INTERPRETER, /* pcre2_dfa_match() */
+       PCRE2_MATCHEDBY_JIT };           /* pcre2_jit_match() */ 
+
 /* Magic number to provide a small check against being handed junk. */

#define MAGIC_NUMBER 0x50435245UL /* 'PCRE' */

-/* This value is used to detect a loaded regular expression in different
-endianness. */
-
-#define REVERSED_MAGIC_NUMBER 0x45524350UL /* 'ERCP' */
-
/* The maximum remaining length of subject we are prepared to search for a
req_unit match. */

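The new matchedby field lets the extraction functions know which matcher filled a match data block. In this revision:

- The byname functions return PCRE2_ERROR_DFA_UFUNC after pcre2_dfa_match(), since names refer to capturing groups and DFA matching captures none.
- The bynumber functions switch to a different set of checks.

The reduced model below copies the enum from pcre2_internal.h and the error value from the rev 174 header. The struct and both functions are hypothetical.

```c
#include <stdint.h>

/* Copied from pcre2_internal.h (rev 174) for a self-contained sketch. */
enum { PCRE2_MATCHEDBY_INTERPRETER,     /* pcre2_match() */
       PCRE2_MATCHEDBY_DFA_INTERPRETER, /* pcre2_dfa_match() */
       PCRE2_MATCHEDBY_JIT };           /* pcre2_jit_match() */

#define PCRE2_ERROR_DFA_UFUNC (-41)     /* value from the rev 174 header */

/* Hypothetical cut-down match data holding only what the checks need. */
struct mini_match_data {
  uint16_t matchedby;   /* set by whichever matcher ran last */
};

/* First step of the byname functions in this revision: refuse the call
   after a DFA match. Returns 0 if the name lookup may proceed. */
int check_byname_allowed(const struct mini_match_data *md)
{
  if (md->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER)
    return PCRE2_ERROR_DFA_UFUNC;
  return 0;
}

/* Which checks the bynumber functions apply: group-number checks for
   the interpreter and JIT, slot/count checks for DFA results. */
int uses_group_checks(const struct mini_match_data *md)
{
  return md->matchedby != PCRE2_MATCHEDBY_DFA_INTERPRETER;
}
```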
Modified: code/trunk/src/pcre2_intmodedep.h
===================================================================
--- code/trunk/src/pcre2_intmodedep.h    2014-12-13 17:43:26 UTC (rev 173)
+++ code/trunk/src/pcre2_intmodedep.h    2014-12-14 17:17:06 UTC (rev 174)
@@ -616,12 +616,13 @@
   pcre2_memctl     memctl;
   const pcre2_real_code *code;    /* The pattern used for the match */
   PCRE2_SPTR       subject;       /* The subject that was matched */
-  int              rc;            /* The return code from the match */
+  PCRE2_SPTR       mark;          /* Pointer to last mark */
   PCRE2_SIZE       leftchar;      /* Offset to leftmost code unit */
   PCRE2_SIZE       rightchar;     /* Offset to rightmost code unit */
   PCRE2_SIZE       startchar;     /* Offset to starting code unit */
-  PCRE2_SPTR       mark;          /* Pointer to last mark */
+  uint16_t         matchedby;     /* Type of match (normal, JIT, DFA) */ 
   uint16_t         oveccount;     /* Number of pairs */
+  int              rc;            /* The return code from the match */
   PCRE2_SIZE       ovector[1];    /* The first field */
 } pcre2_real_match_data;

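The match data structure ends with PCRE2_SIZE ovector[1], commented as "The first field". This points to the familiar C idiom of allocating a header followed by a variable number of trailing elements.

The sketch below shows that idiom with the C99 flexible array member spelling (size_t ovector[]) and offsetof(). The struct, its field set and demo_match_data_create() are hypothetical. The sizing is an assumption, not a copy of pcre2_match_data_create().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reduced layout: a fixed header followed by a trailing
   array holding 2 * oveccount offsets (start/end for each pair). */
struct demo_match_data {
  uint16_t matchedby;
  uint16_t oveccount;   /* number of offset pairs */
  int      rc;
  size_t   ovector[];   /* C99 flexible array member */
};

/* Allocate room for the header plus oveccount pairs. Returns NULL if
   oveccount is zero or allocation fails; the caller frees the block. */
struct demo_match_data *demo_match_data_create(uint16_t oveccount)
{
  struct demo_match_data *md;
  if (oveccount == 0) return NULL;
  md = malloc(offsetof(struct demo_match_data, ovector) +
              2 * (size_t)oveccount * sizeof(size_t));
  if (md == NULL) return NULL;
  md->matchedby = 0;
  md->oveccount = oveccount;
  md->rc = 0;
  return md;
}
```

Each pair occupies two elements, so pair i lives at ovector[2*i] and ovector[2*i+1]. This is the same indexing the substring functions use with ovector[n*2].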
Modified: code/trunk/src/pcre2_jit_match.c
===================================================================
--- code/trunk/src/pcre2_jit_match.c    2014-12-13 17:43:26 UTC (rev 173)
+++ code/trunk/src/pcre2_jit_match.c    2014-12-14 17:17:06 UTC (rev 174)
@@ -180,6 +180,7 @@
 match_data->leftchar = 0;
 match_data->rightchar = 0;
 match_data->mark = arguments.mark_ptr;
+match_data->matchedby = PCRE2_MATCHEDBY_JIT;

return match_data->rc;

Modified: code/trunk/src/pcre2_match.c
===================================================================
--- code/trunk/src/pcre2_match.c    2014-12-13 17:43:26 UTC (rev 173)
+++ code/trunk/src/pcre2_match.c    2014-12-14 17:17:06 UTC (rev 174)
@@ -6995,6 +6995,7 @@
 match_data->code = re;
 match_data->subject = subject;
 match_data->mark = mb->mark;
+match_data->matchedby = PCRE2_MATCHEDBY_INTERPRETER;

/* Handle a fully successful match. */

@@ -7026,14 +7027,15 @@
   match_data->rc = ((mb->capture_last & OVFLBIT) != 0)?
     0 : mb->end_offset_top/2;

- /* If there is space in the offset vector, set any unused pairs at the end to
- PCRE2_UNSET for backwards compatibility. It is documented that this happens.
- In earlier versions, the whole set of potential capturing offsets was
- initialized each time round the loop, but this is handled differently now.
- "Gaps" are set to PCRE2_UNSET dynamically instead (this fixes a bug). Thus,
- it is only those at the end that need setting here. We can't just set them
- all at the start of the whole thing because they may get set in one branch
- that is not the final matching branch. */
+ /* If there is space in the offset vector, set any pairs that follow the
+ highest-numbered captured string but are less than the number of capturing
+ groups in the pattern (and are within the ovector) to PCRE2_UNSET. It is
+ documented that this happens. In earlier versions, the whole set of potential
+ capturing offsets was initialized each time round the loop, but this is
+ handled differently now. "Gaps" are set to PCRE2_UNSET dynamically instead
+ (this fixed a bug). Thus, it is only those at the end that need setting here.
+ We can't just mark them all unset at the start of the whole thing because
+ they may get set in one branch that is not the final matching branch. */

   if (mb->end_offset_top/2 <= re->top_bracket)
     {

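The rewritten comment in pcre2_match.c describes what happens after a successful match. Pairs following the highest-numbered captured group are set to PCRE2_UNSET, but only for groups that exist in the pattern and only within the ovector.

The function below models that rule on a plain array. It is written from the comment rather than from the library loop. DEMO_UNSET stands in for PCRE2_UNSET, and the function and parameter names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_UNSET (~(size_t)0)   /* stand-in for PCRE2_UNSET */

/* ovector holds 2 * oveccount offsets. used_pairs is the number of
   pairs filled by the match (group 0 up to the highest group that
   captured). top_bracket is the number of capturing groups in the
   pattern. Pairs after the last used one are marked unset, up to the
   pattern's last group, but never beyond the end of the ovector. */
void demo_unset_trailing(size_t *ovector, uint32_t oveccount,
                         uint32_t used_pairs, uint32_t top_bracket)
{
  uint32_t limit = top_bracket + 1;          /* pairs the pattern can use */
  uint32_t i;
  if (limit > oveccount) limit = oveccount;  /* stay inside the ovector */
  for (i = used_pairs; i < limit; i++)
    ovector[2*i] = ovector[2*i + 1] = DEMO_UNSET;
}
```

Example usage: with three groups, only group 1 captured and a three-pair ovector, pair 2 is reset. With two groups and a four-pair ovector, the model leaves the fourth pair untouched because it lies beyond the pattern's groups.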
Modified: code/trunk/src/pcre2_substring.c
===================================================================
--- code/trunk/src/pcre2_substring.c    2014-12-13 17:43:26 UTC (rev 173)
+++ code/trunk/src/pcre2_substring.c    2014-12-14 17:17:06 UTC (rev 174)
@@ -64,27 +64,34 @@
 Returns:         if successful: zero
                  if not successful, a negative error code:
                    (1) an error from nametable_scan()
-                   (2) an error from copy_bynumber()  
-                   (3) PCRE2_ERROR_UNSET: all named groups are unset
+                   (2) an error from copy_bynumber()
+                   (3) PCRE2_ERROR_UNAVAILABLE: no group is in ovector 
+                   (4) PCRE2_ERROR_UNSET: all named groups in ovector are unset
 */

 PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
 pcre2_substring_copy_byname(pcre2_match_data *match_data, PCRE2_SPTR stringname,
   PCRE2_UCHAR *buffer, PCRE2_SIZE *sizeptr)
 {
-PCRE2_SPTR first;
-PCRE2_SPTR last;
-PCRE2_SPTR entry;
-int entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
+PCRE2_SPTR first, last, entry;
+int failrc, entrysize;
+if (match_data->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER)
+  return PCRE2_ERROR_DFA_UFUNC;
+entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
   &first, &last);
 if (entrysize < 0) return entrysize;
+failrc = PCRE2_ERROR_UNAVAILABLE;
 for (entry = first; entry <= last; entry += entrysize)
   {
   uint32_t n = GET2(entry, 0);
-  if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET)
-    return pcre2_substring_copy_bynumber(match_data, n, buffer, sizeptr);
+  if (n < match_data->oveccount)
+    {
+    if (match_data->ovector[n*2] != PCRE2_UNSET)
+      return pcre2_substring_copy_bynumber(match_data, n, buffer, sizeptr);
+    failrc = PCRE2_ERROR_UNSET;   
+    }   
   }
-return PCRE2_ERROR_UNSET;
+return failrc;
 }

@@ -146,26 +153,33 @@
                  if not successful, a negative value:
                    (1) an error from nametable_scan()
                    (2) an error from get_bynumber()  
-                   (3) PCRE2_ERROR_UNSET: all named groups are unset
+                   (3) PCRE2_ERROR_UNAVAILABLE: no group is in ovector 
+                   (4) PCRE2_ERROR_UNSET: all named groups in ovector are unset
 */

 PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
 pcre2_substring_get_byname(pcre2_match_data *match_data,
   PCRE2_SPTR stringname, PCRE2_UCHAR **stringptr, PCRE2_SIZE *sizeptr)
 {
-PCRE2_SPTR first;
-PCRE2_SPTR last;
-PCRE2_SPTR entry;
-int entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
+PCRE2_SPTR first, last, entry;
+int failrc, entrysize;
+if (match_data->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER)
+  return PCRE2_ERROR_DFA_UFUNC;
+entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
   &first, &last);
 if (entrysize < 0) return entrysize;
+failrc = PCRE2_ERROR_UNAVAILABLE;
 for (entry = first; entry <= last; entry += entrysize)
   {
   uint32_t n = GET2(entry, 0);
-  if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET)
-    return pcre2_substring_get_bynumber(match_data, n, stringptr, sizeptr);
+  if (n < match_data->oveccount)
+    {
+    if (match_data->ovector[n*2] != PCRE2_UNSET)
+      return pcre2_substring_get_bynumber(match_data, n, stringptr, sizeptr);
+    failrc = PCRE2_ERROR_UNSET;
+    }    
   }
-return PCRE2_ERROR_UNSET;
+return failrc;
 }

@@ -251,19 +265,25 @@
 pcre2_substring_length_byname(pcre2_match_data *match_data,
   PCRE2_SPTR stringname, PCRE2_SIZE *sizeptr)
 {
-PCRE2_SPTR first;
-PCRE2_SPTR last;
-PCRE2_SPTR entry;
-int entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
+PCRE2_SPTR first, last, entry;
+int failrc, entrysize;
+if (match_data->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER)
+  return PCRE2_ERROR_DFA_UFUNC;
+entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
   &first, &last);
 if (entrysize < 0) return entrysize;
+failrc = PCRE2_ERROR_UNAVAILABLE;
 for (entry = first; entry <= last; entry += entrysize)
   {
   uint32_t n = GET2(entry, 0);
-  if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET)
-    return pcre2_substring_length_bynumber(match_data, n, sizeptr);
+  if (n < match_data->oveccount)
+    {
+    if (match_data->ovector[n*2] != PCRE2_UNSET)
+      return pcre2_substring_length_bynumber(match_data, n, sizeptr);
+    failrc = PCRE2_ERROR_UNSET;
+    }    
   }
-return PCRE2_ERROR_UNSET;
+return failrc;
 }

@@ -292,13 +312,23 @@
 pcre2_substring_length_bynumber(pcre2_match_data *match_data,
   uint32_t stringnumber, PCRE2_SIZE *sizeptr)
 {
+int count;
 PCRE2_SIZE left, right;
-if (stringnumber > match_data->code->top_bracket) 
-  return PCRE2_ERROR_NOSUBSTRING;
-if (stringnumber >= match_data->oveccount) 
-  return PCRE2_ERROR_UNAVAILABLE;
-if (match_data->ovector[stringnumber*2] == PCRE2_UNSET)
-  return PCRE2_ERROR_UNSET;
+if ((count = match_data->rc) < 0) return count;   /* Match failed */
+if (match_data->matchedby != PCRE2_MATCHEDBY_DFA_INTERPRETER)
+  {
+  if (stringnumber > match_data->code->top_bracket) 
+    return PCRE2_ERROR_NOSUBSTRING;
+  if (stringnumber >= match_data->oveccount) 
+    return PCRE2_ERROR_UNAVAILABLE;
+  if (match_data->ovector[stringnumber*2] == PCRE2_UNSET)
+    return PCRE2_ERROR_UNSET;
+  }
+else  /* Matched using pcre2_dfa_match() */
+  {
+  if (stringnumber >= match_data->oveccount) return PCRE2_ERROR_UNAVAILABLE;
+  if (count != 0 && stringnumber >= (uint32_t)count) return PCRE2_ERROR_UNSET;
+  } 
 left = match_data->ovector[stringnumber*2];
 right = match_data->ovector[stringnumber*2+1];
 if (sizeptr != NULL) *sizeptr = (left > right)? 0 : right - left;

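The revised pcre2_substring_length_bynumber() first passes through any match failure code, then chooses its checks according to the matcher. For pcre2_dfa_match(), the numbers are match indexes rather than group numbers, so only the ovector size and the match count matter. A count of zero means the ovector overflowed and every slot holds a match.

The function below restates that decision logic on a hypothetical cut-down structure, with error values copied from the rev 174 header. It is a reading aid for the diff, not the library source.

```c
#include <stddef.h>
#include <stdint.h>

#define PCRE2_ERROR_NOSUBSTRING (-49)   /* values from the rev 174 header */
#define PCRE2_ERROR_UNAVAILABLE (-54)
#define PCRE2_ERROR_UNSET       (-55)
#define DEMO_UNSET (~(size_t)0)         /* stand-in for PCRE2_UNSET */

/* Hypothetical reduced match data. */
struct len_md {
  int           is_dfa;       /* nonzero if filled by a DFA match */
  int           rc;           /* the matcher's return code */
  uint32_t      top_bracket;  /* capturing groups in the pattern */
  uint32_t      oveccount;    /* pairs in the ovector */
  const size_t *ovector;
};

/* Model of the decision logic; sizeptr may be NULL to test for presence. */
int model_length_bynumber(const struct len_md *md, uint32_t n, size_t *sizeptr)
{
  size_t left, right;
  int count = md->rc;
  if (count < 0) return count;                  /* match failed */
  if (!md->is_dfa)
    {
    if (n > md->top_bracket) return PCRE2_ERROR_NOSUBSTRING;
    if (n >= md->oveccount) return PCRE2_ERROR_UNAVAILABLE;
    if (md->ovector[n*2] == DEMO_UNSET) return PCRE2_ERROR_UNSET;
    }
  else
    {
    if (n >= md->oveccount) return PCRE2_ERROR_UNAVAILABLE;
    /* count == 0 means too many matches: every slot was filled */
    if (count != 0 && n >= (uint32_t)count) return PCRE2_ERROR_UNSET;
    }
  left = md->ovector[n*2];
  right = md->ovector[n*2 + 1];
  if (sizeptr != NULL) *sizeptr = (left > right) ? 0 : right - left;
  return 0;
}
```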
Modified: code/trunk/testdata/grepoutput
===================================================================
(Binary files differ)

Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2014-12-13 17:43:26 UTC (rev 173)
+++ code/trunk/testdata/testinput2    2014-12-14 17:17:06 UTC (rev 174)
@@ -4090,5 +4090,11 @@
 /x(?=ab\K)/
     xab\=get=0 
     xab\=copy=0 
+    xab\=getall

+/(?<A>a)|(?<A>b)/dupnames
+    a\=ovector=1,copy=A,get=A,get=2
+    a\=ovector=2,copy=A,get=A,get=2
+    b\=ovector=2,copy=A,get=A,get=2
+
 # End of testinput2

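The dupnames lines added to testinput2 exercise the new failrc logic in the byname functions. In /(?<A>a)|(?<A>b)/, the name A refers to groups 1 and 2.

The sketch below models the scan over those numbers on plain arrays. model_find_named_group() is hypothetical. The expected outcomes are derived from the C code in this diff, since the matching testoutput2 lines are not shown here. The sketch also assumes the matcher leaves group 1's pair unset when that group does not participate.

```c
#include <stddef.h>
#include <stdint.h>

#define PCRE2_ERROR_UNAVAILABLE (-54)   /* values from the rev 174 header */
#define PCRE2_ERROR_UNSET       (-55)
#define DEMO_UNSET (~(size_t)0)         /* stand-in for PCRE2_UNSET */

/* Scan the group numbers that share one name, as the byname functions
   do after the name table lookup. Returns the first group number whose
   pair is inside the ovector and set, or a negative error code:
     PCRE2_ERROR_UNAVAILABLE  no group with this name has an ovector slot
     PCRE2_ERROR_UNSET        some group has a slot, but none is set */
int model_find_named_group(const uint32_t *groups, size_t ngroups,
                           const size_t *ovector, uint32_t oveccount)
{
  int failrc = PCRE2_ERROR_UNAVAILABLE;
  size_t i;
  for (i = 0; i < ngroups; i++)
    {
    uint32_t n = groups[i];
    if (n < oveccount)
      {
      if (ovector[n*2] != DEMO_UNSET) return (int)n;
      failrc = PCRE2_ERROR_UNSET;   /* had a slot, but it was unset */
      }
    }
  return failrc;
}
```

The third case in the test is the subtle one. Group 2 matched "b", but with a two-pair ovector it has no slot, so the name is reported as unset rather than unavailable.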
Modified: code/trunk/testdata/testinput6
===================================================================
(Binary files differ)

Modified: code/trunk/testdata/testoutput14
===================================================================
--- code/trunk/testdata/testoutput14    2014-12-13 17:43:26 UTC (rev 173)
+++ code/trunk/testdata/testoutput14    2014-12-14 17:17:06 UTC (rev 174)
@@ -114,11 +114,11 @@
     aaaaaaaaaaaaaz
 No match
     aaaaaaaaaaaaaz\=match_limit=3000
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded

 /(a+)*zz/
     aaaaaaaaaaaaaz\=recursion_limit=10
-Failed: error -52: recursion limit exceeded
+Failed: error -53: recursion limit exceeded

 /(*LIMIT_MATCH=3000)(a+)*zz/I
 Capturing subpattern count = 1
@@ -127,9 +127,9 @@
 Last code unit = 'z'
 Subject length lower bound = 2
     aaaaaaaaaaaaaz
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded
     aaaaaaaaaaaaaz\=match_limit=60000
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded

 /(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I
 Capturing subpattern count = 1
@@ -138,7 +138,7 @@
 Last code unit = 'z'
 Subject length lower bound = 2
     aaaaaaaaaaaaaz
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded

 /(*LIMIT_MATCH=60000)(a+)*zz/I
 Capturing subpattern count = 1
@@ -149,7 +149,7 @@
     aaaaaaaaaaaaaz
 No match
     aaaaaaaaaaaaaz\=match_limit=3000
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded

 /(*LIMIT_RECURSION=10)(a+)*zz/I
 Capturing subpattern count = 1
@@ -158,9 +158,9 @@
 Last code unit = 'z'
 Subject length lower bound = 2
     aaaaaaaaaaaaaz
-Failed: error -52: recursion limit exceeded
+Failed: error -53: recursion limit exceeded
     aaaaaaaaaaaaaz\=recursion_limit=1000
-Failed: error -52: recursion limit exceeded
+Failed: error -53: recursion limit exceeded

 /(*LIMIT_RECURSION=10)(*LIMIT_RECURSION=1000)(a+)*zz/I
 Capturing subpattern count = 1
@@ -180,21 +180,21 @@
     aaaaaaaaaaaaaz
 No match
     aaaaaaaaaaaaaz\=recursion_limit=10
-Failed: error -52: recursion limit exceeded
+Failed: error -53: recursion limit exceeded

# These three have infinitely nested recursions.

 /((?2))((?1))/
     abc
-Failed: error -51: nested recursion at the same subject position
+Failed: error -52: nested recursion at the same subject position

 /((?(R2)a+|(?1)b))/
     aaaabcde
-Failed: error -51: nested recursion at the same subject position
+Failed: error -52: nested recursion at the same subject position

 /(?(R)a*(?1)|((?R))b)/
     aaaabcde
-Failed: error -51: nested recursion at the same subject position
+Failed: error -52: nested recursion at the same subject position

# The allusedtext modifier does not work with JIT, which does not maintain
# the leftchar/rightchar data.

Modified: code/trunk/testdata/testoutput16
===================================================================
--- code/trunk/testdata/testoutput16    2014-12-13 17:43:26 UTC (rev 173)
+++ code/trunk/testdata/testoutput16    2014-12-14 17:17:06 UTC (rev 174)
@@ -15,7 +15,7 @@

 /(?(R)a*(?1)|((?R))b)/
     aaaabcde
-Failed: error -45: JIT stack limit reached
+Failed: error -46: JIT stack limit reached

 /abcd/I
 Capturing subpattern count = 0
@@ -64,13 +64,13 @@
     abcd
  0: abcd (JIT)
     ab\=ps
-Failed: error -44: bad JIT option
+Failed: error -45: bad JIT option
     ab\=ph
-Failed: error -44: bad JIT option
+Failed: error -45: bad JIT option
     xyz
 No match (JIT)
     xyz\=ps
-Failed: error -44: bad JIT option
+Failed: error -45: bad JIT option

 /abcd/jit=2
     abcd
@@ -84,13 +84,13 @@

 /abcd/jit=2,jitfast
     abcd
-Failed: error -44: bad JIT option
+Failed: error -45: bad JIT option
     ab\=ps
 Partial match: ab (JIT)
     ab\=ph
-Failed: error -44: bad JIT option
+Failed: error -45: bad JIT option
     xyz
-Failed: error -44: bad JIT option
+Failed: error -45: bad JIT option

 /abcd/jit=3
     abcd
@@ -256,7 +256,7 @@
     aaaaaaaaaaaaaz
 No match (JIT)
     aaaaaaaaaaaaaz\=match_limit=3000
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded

 /(*LIMIT_MATCH=3000)(a+)*zz/I
 Capturing subpattern count = 1
@@ -266,9 +266,9 @@
 Subject length lower bound = 2
 JIT compilation was successful
     aaaaaaaaaaaaaz
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded
     aaaaaaaaaaaaaz\=match_limit=60000
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded

 /(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I
 Capturing subpattern count = 1
@@ -278,7 +278,7 @@
 Subject length lower bound = 2
 JIT compilation was successful
     aaaaaaaaaaaaaz
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded

 /(*LIMIT_MATCH=60000)(a+)*zz/I
 Capturing subpattern count = 1
@@ -290,21 +290,21 @@
     aaaaaaaaaaaaaz
 No match (JIT)
     aaaaaaaaaaaaaz\=match_limit=3000
-Failed: error -46: match limit exceeded
+Failed: error -47: match limit exceeded

# These three have infinitely nested recursions.

 /((?2))((?1))/
     abc
-Failed: error -45: JIT stack limit reached
+Failed: error -46: JIT stack limit reached

 /((?(R2)a+|(?1)b))/
     aaaabcde
-Failed: error -45: JIT stack limit reached
+Failed: error -46: JIT stack limit reached

 /(?(R)a*(?1)|((?R))b)/
     aaaabcde
-Failed: error -45: JIT stack limit reached
+Failed: error -46: JIT stack limit reached

 # Invalid options disable JIT when called via pcre2_match(), causing the
 # match to happen via the interpreter, but for fast JIT invalid options are

Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2014-12-13 17:43:26 UTC (rev 173)
+++ code/trunk/testdata/testoutput2    2014-12-14 17:17:06 UTC (rev 174)
@@ -993,7 +993,7 @@
  0: abcd
  1: a
  2: d
-Copy substring 5 failed (-48): unknown substring
+Copy substring 5 failed (-49): unknown substring

 /(.{20})/I
 Capturing subpattern count = 1
@@ -1047,9 +1047,9 @@
  2: <unset>
  3: f
  1G a (1)
-Get substring 2 failed (-54): requested value is not set
+Get substring 2 failed (-55): requested value is not set
  3G f (1)
-Get substring 4 failed (-48): unknown substring
+Get substring 4 failed (-49): unknown substring
  0L adef
  1L a
  2L
@@ -1062,7 +1062,7 @@
  1G bc (2)
  2G bc (2)
  3G f (1)
-Get substring 4 failed (-48): unknown substring
+Get substring 4 failed (-49): unknown substring
  0L bcdef
  1L bc
  2L bc
@@ -4363,7 +4363,7 @@
  1: cd
  2: gh
 Number not found for group 'three'
-Copy substring 'three' failed (-48): unknown substring
+Copy substring 'three' failed (-49): unknown substring

 /(?P<Tes>)(?P<Test>)/IB
 ------------------------------------------------------------------
@@ -5731,7 +5731,7 @@
  1: a1
  2: a1
 Number not found for group 'Z'
-Copy substring 'Z' failed (-48): unknown substring
+Copy substring 'Z' failed (-49): unknown substring
   C a1 (2) A (non-unique)

 /(?|(?<a>)(?<b>)(?<a>)|(?<a>)(?<b>)(?<a>))/I,dupnames
@@ -5772,7 +5772,7 @@
   C a (1) A (non-unique)
     cd\=copy=A
  0: cd
-Copy substring 'A' failed (-54): requested value is not set
+Copy substring 'A' failed (-55): requested value is not set

 /^(?P<A>a)(?P<A>b)|cd(?P<A>ef)(?P<A>gh)/I,dupnames
 Capturing subpattern count = 4
@@ -5817,7 +5817,7 @@
  1: a1
  2: a1
 Number not found for group 'Z'
-Get substring 'Z' failed (-48): unknown substring
+Get substring 'Z' failed (-49): unknown substring
   G a1 (2) A (non-unique)

 /^(?P<A>a)(?P<A>b)/I,dupnames
@@ -5848,7 +5848,7 @@
   G a (1) A (non-unique)
     cd\=get=A
  0: cd
-Get substring 'A' failed (-54): requested value is not set
+Get substring 'A' failed (-55): requested value is not set

 /^(?P<A>a)(?P<A>b)|cd(?P<A>ef)(?P<A>gh)/I,dupnames
 Capturing subpattern count = 4
@@ -13659,11 +13659,11 @@

 /abc/replace=a$bad
     123abc
-Failed: error -48: unknown substring
+Failed: error -49: unknown substring

 /abc/replace=a${A234567890123456789_123456789012}z
     123abc
-Failed: error -48: unknown substring
+Failed: error -49: unknown substring

 /abc/replace=a${A23456789012345678901234567890123}z
     123abc
@@ -13683,7 +13683,7 @@

 /abc/replace=[9]XYZ
     123abc123
-Failed: error -47: no more memory
+Failed: error -48: no more memory

 /abc/replace=xyz
     1abc2\=partial_hard
@@ -13720,10 +13720,10 @@
 Matched, but too many substrings
  0: c
  1: <unset>
-Get substring 1 failed (-54): requested value is not set
-Get substring 2 failed (-53): requested value is not available
-Get substring 3 failed (-53): requested value is not available
-Get substring 4 failed (-48): unknown substring
+Get substring 1 failed (-55): requested value is not set
+Get substring 2 failed (-54): requested value is not available
+Get substring 3 failed (-54): requested value is not available
+Get substring 4 failed (-49): unknown substring
  0L c
  1L

@@ -13736,5 +13736,30 @@
 Start of matched string is beyond its end - displaying from end to start.
  0: ab
  0C  (0)
+    xab\=getall
+Start of matched string is beyond its end - displaying from end to start.
+ 0: ab
+ 0L

+/(?<A>a)|(?<A>b)/dupnames
+    a\=ovector=1,copy=A,get=A,get=2
+Matched, but too many substrings
+ 0: a
+Copy substring 'A' failed (-54): requested value is not available
+Get substring 2 failed (-54): requested value is not available
+Get substring 'A' failed (-54): requested value is not available
+    a\=ovector=2,copy=A,get=A,get=2
+ 0: a
+ 1: a
+  C a (1) A (non-unique)
+Get substring 2 failed (-54): requested value is not available
+  G a (1) A (non-unique)
+    b\=ovector=2,copy=A,get=A,get=2
+Matched, but too many substrings
+ 0: b
+ 1: <unset>
+Copy substring 'A' failed (-55): requested value is not set
+Get substring 2 failed (-54): requested value is not available
+Get substring 'A' failed (-55): requested value is not set
+
 # End of testinput2

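The new dupnames test in testoutput2 shows the substring functions distinguishing three failures. "unknown substring" (-49 after this commit) appears when the group number exceeds the pattern's capture count. "requested value is not available" (-54) appears when the group exists but the ovector supplied for the match (`ovector=1` or `ovector=2` in pcre2test) had no room for it. "requested value is not set" (-55) appears when the group had room but did not take part in the match. For a duplicated name such as `A`, the outputs suggest that the groups sharing the name are tried in turn and that "not set" wins over "not available" if any of them fitted in the ovector. The C sketch below reproduces those observed outcomes. The struct, the function names and the `SK_*` constants are hypothetical stand-ins (the real library uses its own match data block and the `PCRE2_ERROR_*` names from pcre2.h); this is a model of the behaviour seen in the test output, not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local copies of the values printed by pcre2test after this
   commit. Real code should use the symbolic names from pcre2.h instead. */
enum {
  SK_NOSUBSTRING = -49,  /* "unknown substring" */
  SK_UNAVAILABLE = -54,  /* "requested value is not available" */
  SK_UNSET       = -55   /* "requested value is not set" */
};

/* Hypothetical marker for an unset offset (stands in for PCRE2_UNSET). */
#define SK_UNSET_OFFSET SIZE_MAX

/* Hypothetical, simplified view of a match result. */
struct sketch_match {
  uint32_t capture_count;  /* capturing groups in the pattern */
  uint32_t oveccount;      /* offset pairs the caller made room for */
  const size_t *ovector;   /* 2 * oveccount offsets */
};

/* Classify a request by group number; 0 means the group can be extracted. */
static int sketch_by_number(const struct sketch_match *m, uint32_t n)
{
  if (n > m->capture_count) return SK_NOSUBSTRING;  /* no such group */
  if (n >= m->oveccount) return SK_UNAVAILABLE;     /* no room in ovector */
  if (m->ovector[2 * n] == SK_UNSET_OFFSET)
    return SK_UNSET;                                /* group did not match */
  return 0;
}

/* Classify a request by a (possibly duplicated) name, given the numbers of
   the groups that carry it. The first group that is set is returned via
   *found; otherwise "unset" is reported if any group fitted in the ovector,
   and "unavailable" if none did. */
static int sketch_by_name(const struct sketch_match *m,
                          const uint32_t *groups, size_t ngroups,
                          uint32_t *found)
{
  int failrc = SK_UNAVAILABLE;
  assert(found != NULL);
  for (size_t i = 0; i < ngroups; i++) {
    uint32_t n = groups[i];
    if (n >= m->oveccount) continue;  /* not captured: unavailable */
    if (m->ovector[2 * n] != SK_UNSET_OFFSET) {
      *found = n;
      return 0;
    }
    failrc = SK_UNSET;                /* had room, but is unset */
  }
  return failrc;
}
```

For example, the `b\=ovector=2` case corresponds to two offset pairs with group 1 unset: looking up `A` by name yields -55 and group 2 by number yields -54, as in the test output.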
Modified: code/trunk/testdata/testoutput6
===================================================================
(Binary files differ)

Modified: code/trunk/testdata/testoutput7
===================================================================
--- code/trunk/testdata/testoutput7    2014-12-13 17:43:26 UTC (rev 173)
+++ code/trunk/testdata/testoutput7    2014-12-14 17:17:06 UTC (rev 174)
@@ -1218,7 +1218,7 @@

 /ab\Cde/utf
     abXde
-Failed: error -41: item unsupported for DFA matching
+Failed: error -42: pattern contains an item that is not supported for DFA matching

 /(?<=ab\Cde)X/utf
 Failed: error 136 at offset 10: \C is not allowed in a lookbehind assertion

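Every match-time error number that changed in the test-output diffs of this commit moved down by exactly one. For example, "unknown substring" went from -48 to -49 and "requested value is not set" went from -54 to -55. Any log scraper, test expectation or caller that compares against literal numbers would therefore need updating, which is one reason to prefer the symbolic `PCRE2_ERROR_*` names from pcre2.h. As a hedged illustration only, the hypothetical table below records the old (rev 173) and new (rev 174) values copied from these diffs and translates an old code to its new value. It is not part of PCRE2. The message column uses the rev 174 text; the DFA entry's wording also changed in this commit.

```c
#include <assert.h>
#include <stddef.h>

/* One row per error whose number changed in this commit's test output.
   Values are copied from the diffs; messages are the rev 174 wording. */
struct renumbered_error {
  int rev173;           /* number printed before this commit */
  int rev174;           /* number printed after this commit */
  const char *message;  /* pcre2test message text (rev 174) */
};

static const struct renumbered_error renumbered[] = {
  { -41, -42, "pattern contains an item that is not supported for DFA matching" },
  { -44, -45, "bad JIT option" },
  { -45, -46, "JIT stack limit reached" },
  { -46, -47, "match limit exceeded" },
  { -47, -48, "no more memory" },
  { -48, -49, "unknown substring" },
  { -53, -54, "requested value is not available" },
  { -54, -55, "requested value is not set" },
};

#define N_RENUMBERED (sizeof renumbered / sizeof renumbered[0])

/* Translate a code seen in rev 173 output to its rev 174 value.
   Returns 0 for codes that are not in the table above. */
static int rev174_code(int rev173)
{
  for (size_t i = 0; i < N_RENUMBERED; i++)
    if (renumbered[i].rev173 == rev173) return renumbered[i].rev174;
  return 0;
}

/* Returns 1 if every listed code moved down by exactly one, else 0. */
static int all_shifted_by_one(void)
{
  for (size_t i = 0; i < N_RENUMBERED; i++)
    if (renumbered[i].rev174 != renumbered[i].rev173 - 1) return 0;
  return 1;
}
```

For instance, `rev174_code(-48)` returns -49, and `all_shifted_by_one()` returns 1 for the rows listed.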