[Pcre-svn] [96] code/trunk: Impose a minimum of 1 for the nu…

Author: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [96] code/trunk: Impose a minimum of 1 for the number of pairs in the ovector.
Revision: 96
          http://www.exim.org/viewvc/pcre2?view=rev&revision=96
Author:   ph10
Date:     2014-10-05 18:55:25 +0100 (Sun, 05 Oct 2014)


Log Message:
-----------
Impose a minimum of 1 for the number of pairs in the ovector.

Modified Paths:
--------------
    code/trunk/doc/pcre2api.3
    code/trunk/doc/pcre2test.1
    code/trunk/src/pcre2_match_data.c
    code/trunk/src/pcre2test.c
    code/trunk/testdata/testoutput2
    code/trunk/testdata/testoutput6


Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3    2014-10-05 16:56:11 UTC (rev 95)
+++ code/trunk/doc/pcre2api.3    2014-10-05 17:55:25 UTC (rev 96)
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "01 October 2014" "PCRE2 10.00"
+.TH PCRE2API 3 "05 October 2014" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -1650,13 +1650,15 @@
 string that define the matched part of the subject and any substrings that were
 capured. This is know as the \fIovector\fP. 
 .P
-Before calling \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP you must create a 
+Before calling \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP you must create a
 match data block by calling one of the creation functions above. For
 \fBpcre2_match_data_create()\fP, the first argument is the number of pairs of
 offsets in the \fIovector\fP. One pair of offsets is required to identify the
 string that matched the whole pattern, with another pair for each captured
-substring. For example, a value of 4 creates enough space to record the
-matched portion of the subject plus three captured substrings. 
+substring. For example, a value of 4 creates enough space to record the matched
+portion of the subject plus three captured substrings. A minimum of at least 1
+pair is imposed by \fBpcre2_match_data_create()\fP, so it is always possible to
+return the overall matched string.
 .P
 For \fBpcre2_match_data_create_from_pattern()\fP, the first argument is a
 pointer to a compiled pattern. In this case the ovector is created to be 
@@ -2015,13 +2017,13 @@
 returned.
 .P
 If the ovector is too small to hold all the captured substring offsets, as much
-as possible is filled in, and the function returns a value of zero. If neither
-the actual string matched nor any captured substrings are of interest,
-\fBpcre2_match()\fP may be called with a match data block whose ovector is of
-zero length. However, if the pattern contains back references and the
-\fIovector\fP is not big enough to remember the related substrings, PCRE2 has
-to get additional memory for use during matching. Thus it is usually advisable
-to set up a match data block containing an ovector of reasonable size.
+as possible is filled in, and the function returns a value of zero. If captured
+substrings are not of interest, \fBpcre2_match()\fP may be called with a match
+data block whose ovector is of minimum length (that is, one pair). However, if
+the pattern contains back references and the \fIovector\fP is not big enough to
+remember the related substrings, PCRE2 has to get additional memory for use
+during matching. Thus it is usually advisable to set up a match data block
+containing an ovector of reasonable size.
 .P
 It is possible for capturing subpattern number \fIn+1\fP to match some part of
 the subject when subpattern \fIn\fP has not been used at all. For example, if
@@ -2652,6 +2654,6 @@
 .rs
 .sp
 .nf
-Last updated: 01 October 2014
+Last updated: 05 October 2014
 Copyright (c) 1997-2014 University of Cambridge.
 .fi
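
To illustrate the documented behaviour above (a minimum of one pair, so the overall match can always be returned), here is a minimal sketch in C. It is a simplified model and not the PCRE2 source: demo_match_data, demo_match_data_create() and demo_usage() are hypothetical names, the real function also takes a general context argument, and the allocation below covers only the offset pairs themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a match data block; not PCRE2's real layout. */
typedef struct {
  uint32_t oveccount;   /* number of offset pairs actually allocated */
  size_t ovector[];     /* 2 * oveccount offsets (start, end, start, end, ...) */
} demo_match_data;

/* Create a block with room for at least one pair of offsets.
   Returns NULL if the allocation fails. */
demo_match_data *demo_match_data_create(uint32_t oveccount)
{
  demo_match_data *yield;
  if (oveccount < 1) oveccount = 1;   /* impose the minimum of one pair */
  yield = malloc(sizeof(demo_match_data) +
                 2 * (size_t)oveccount * sizeof(size_t));
  if (yield == NULL) return NULL;
  yield->oveccount = oveccount;
  return yield;
}

/* A request for zero pairs still leaves room for the overall match. */
void demo_usage(void)
{
  demo_match_data *md = demo_match_data_create(0);
  if (md == NULL) return;         /* allocation failed */
  assert(md->oveccount == 1);     /* zero was raised to one */
  md->ovector[0] = 0;             /* start offset of the overall match */
  md->ovector[1] = 4;             /* end offset of the overall match */
  free(md);
}
```

Clamping inside the creation function means callers never need a special case for an empty ovector: slots 0 and 1 are always valid to write and read.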


Modified: code/trunk/doc/pcre2test.1
===================================================================
--- code/trunk/doc/pcre2test.1    2014-10-05 16:56:11 UTC (rev 95)
+++ code/trunk/doc/pcre2test.1    2014-10-05 17:55:25 UTC (rev 96)
@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "19 August 2014" "PCRE 10.00"
+.TH PCRE2TEST 1 "05 October 2014" "PCRE 10.00"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -881,6 +881,12 @@
 appears, though of course it can also be used to set a default in a
 \fB#subject\fP command. It specifies the number of pairs of offsets that are
 available for storing matching information. The default is 15.
+.P
+At least one pair of offsets is always created by 
+\fBpcre2_match_data_create()\fP, for matching with PCRE2's native API, so a 
+value of 0 is the same as 1. However a value of 0 is useful when testing the 
+POSIX API because it causes \fBregexec()\fP to be called with a NULL capture 
+vector.
 .
 .
 .SH "THE ALTERNATIVE MATCHING FUNCTION"
@@ -1145,6 +1151,6 @@
 .rs
 .sp
 .nf
-Last updated: 19 August 2014
+Last updated: 05 October 2014
 Copyright (c) 1997-2014 University of Cambridge.
 .fi
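
The note above about regexec() being called with a NULL capture vector can be sketched with the standard POSIX interface. This is an assumption-laden illustration: it uses the system <regex.h> rather than PCRE2's POSIX wrapper, and demo_matches() is a hypothetical helper, not part of pcre2test.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the pattern matches the subject, 0 if it does not,
   and -1 if the pattern fails to compile. No offsets are requested. */
int demo_matches(const char *pattern, const char *subject)
{
  regex_t re;
  int rc;
  if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;
  rc = regexec(&re, subject, 0, NULL, 0);   /* nmatch = 0, pmatch = NULL */
  regfree(&re);
  return (rc == 0) ? 1 : 0;
}
```

With nmatch set to 0 the caller learns only whether a match occurred, which is the case that an ovector value of 0 is described as exercising when testing the POSIX API.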


Modified: code/trunk/src/pcre2_match_data.c
===================================================================
--- code/trunk/src/pcre2_match_data.c    2014-10-05 16:56:11 UTC (rev 95)
+++ code/trunk/src/pcre2_match_data.c    2014-10-05 17:55:25 UTC (rev 96)
@@ -51,10 +51,14 @@
 *  Create a match data block given ovector size  *
 *************************************************/


+/* A minimum of 1 is imposed on the number of ovector triplets. */
+
 PCRE2_EXP_DEFN pcre2_match_data * PCRE2_CALL_CONVENTION
 pcre2_match_data_create(uint32_t oveccount, pcre2_general_context *gcontext)
 {
-pcre2_match_data *yield = PRIV(memctl_malloc)(
+pcre2_match_data *yield;
+if (oveccount < 1) oveccount = 1;
+yield = PRIV(memctl_malloc)(
   sizeof(pcre2_match_data) + 3*oveccount*sizeof(PCRE2_SIZE),
   (pcre2_memctl *)gcontext);
 yield->oveccount = oveccount;

Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c    2014-10-05 16:56:11 UTC (rev 95)
+++ code/trunk/src/pcre2test.c    2014-10-05 17:55:25 UTC (rev 96)
@@ -2531,7 +2531,7 @@
   case MOD_CTC:  /* Compile context modifier */
   if (ctx == CTX_DEFPAT) field = PTR(default_pat_context);
     else if (ctx == CTX_PAT) field = PTR(pat_context);
-  break;   
+  break;


case MOD_CTM: /* Match context modifier */
if (ctx == CTX_DEFDAT) field = PTR(default_dat_context);
@@ -3705,8 +3705,8 @@
/* Call the JIT compiler if requested. */

if (pat_patctl.jit != 0)
- {
- PCRE2_JIT_COMPILE(compiled_code, pat_patctl.jit);
+ {
+ PCRE2_JIT_COMPILE(compiled_code, pat_patctl.jit);
}

/* Output code size and other information if requested. */
@@ -4385,12 +4385,11 @@
dat_datctl.control &= ~CTL_FINDLIMITS;
}

-if ((dat_datctl.control & CTL_ANYGLOB) != 0 && dat_datctl.oveccount < 1)
- {
- printf("** Global matching requires a non-zero ovector count: ignored\n");
- dat_datctl.control &= ~CTL_ANYGLOB;
- }
+/* As pcre2_match_data_create() imposes a minimum of 1 on the ovector count, we
+must do so too. */

+if (dat_datctl.oveccount < 1) dat_datctl.oveccount = 1;
+
/* Enable display of malloc/free if wanted. */

 show_memory = (dat_datctl.control & CTL_MEMORY) != 0;
@@ -4438,28 +4437,28 @@
   PCRE2_MATCH_DATA_FREE(match_data);
   PCRE2_MATCH_DATA_CREATE(match_data, max_oveccount, NULL);
   }
-     
+
 /* Loop for global matching */


for (gmatched = 0;; gmatched++)
{
int capcount;
PCRE2_SIZE *ovector;
- PCRE2_SIZE ovecsave[2];
+ PCRE2_SIZE ovecsave[2];

jit_was_used = FALSE;
ovector = FLD(match_data, ovector);
-
+
/* After the first time round a global loop, save the current ovector[0,1] so
- that we can check that they do change each time. Otherwise a matching bug
+ that we can check that they do change each time. Otherwise a matching bug
that returns the same string causes an infinite loop. It has happened! */

   if (gmatched > 0)
-    {  
+    {
     ovecsave[0] = ovector[0];
-    ovecsave[1] = ovector[1];  
-    } 
-   
+    ovecsave[1] = ovector[1];
+    }
+
   /* Do timing if required. */


   if (timeitm > 0)
@@ -4564,7 +4563,7 @@
     PCRE2_SIZE rightchar = FLD(match_data, rightchar);


     /* This is a check against a lunatic return value. */
-    
+
     if (capcount > (int)dat_datctl.oveccount)
       {
       fprintf(outfile,
@@ -4577,20 +4576,20 @@
         dat_datctl.control &= ~CTL_ANYGLOB;        /* Break g/G loop */
         }
       }
-      
-    /* If this is not the first time round a global loop, check that the 
-    returned string has changed. If not, there is a bug somewhere and we must 
+
+    /* If this is not the first time round a global loop, check that the
+    returned string has changed. If not, there is a bug somewhere and we must
     break the loop because it will go on for ever. We know that for a global
-    match there must be at least two elements in the ovector. This is checked 
+    match there must be at least two elements in the ovector. This is checked
     above. */
-    
+
     if (gmatched > 0 && ovecsave[0] == ovector[0] && ovecsave[1] == ovector[1])
       {
-      fprintf(outfile,  
+      fprintf(outfile,
         "** PCRE2 error: global repeat returned the same string as previous\n");
       fprintf(outfile, "** Global loop abandoned\n");
       dat_datctl.control &= ~CTL_ANYGLOB;        /* Break g/G loop */
-      }   
+      }


     /* "allcaptures" requests showing of all captures in the pattern, to check
     unset ones at the end. It may be set on the pattern or the data. Implement
@@ -4647,7 +4646,7 @@
         PCHARSV(pp, start, end - start, utf, outfile);
         }


-      if ((pat_patctl.control & CTL_JITVERIFY) != 0 && jit_was_used) 
+      if ((pat_patctl.control & CTL_JITVERIFY) != 0 && jit_was_used)
         fprintf(outfile, " (JIT)");
       fprintf(outfile, "\n");


@@ -4864,7 +4863,7 @@

     fprintf(outfile, ": ");
     PCHARSV(pp, leftchar, ulen - leftchar, utf, outfile);
-    if ((pat_patctl.control & CTL_JITVERIFY) != 0 && jit_was_used) 
+    if ((pat_patctl.control & CTL_JITVERIFY) != 0 && jit_was_used)
       fprintf(outfile, " (JIT)");
     fprintf(outfile, "\n");
     break;  /* Out of the /g loop */
@@ -4875,8 +4874,7 @@
   If that is the case, this is not necessarily the end. We want to advance the
   start offset, and continue. We won't be at the end of the string - that was
   checked before setting g_notempty. We achieve the effect by pretending that a
-  single character was matched. We know that match_data->oveccount is at least
-  1 because that was checked above.
+  single character was matched.


   Complication arises in the case when the newline convention is "any", "crlf",
   or "anycrlf". If the previous match was at the end of a line terminated by
@@ -4936,7 +4934,7 @@
           fprintf(outfile, ", mark = ");
           PCHARSV(CASTFLD(void *, match_data, mark), 0, -1, utf, outfile);
           }
-        if ((pat_patctl.control & CTL_JITVERIFY) != 0 && jit_was_used) 
+        if ((pat_patctl.control & CTL_JITVERIFY) != 0 && jit_was_used)
           fprintf(outfile, " (JIT)");
         fprintf(outfile, "\n");
         }
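
The global-loop guard described in the comments above (save ovector[0,1] after the first round and stop if a later match returns the same string) can be sketched in isolation as follows. Everything here is hypothetical scaffolding for illustration: demo_matcher, demo_good_match(), demo_buggy_match() and demo_global_loop() do not exist in pcre2test, and the matchers are trivial stand-ins for a real matching call.

```c
#include <stddef.h>

/* Hypothetical matcher: fills ovector[0..1] and returns 1 on a match,
   or returns 0 when there is no further match. */
typedef int (*demo_matcher)(size_t start, size_t ovector[2]);

/* Well-behaved stand-in: matches one character at a time in a 3-character subject. */
int demo_good_match(size_t start, size_t ovector[2])
{
  if (start >= 3) return 0;
  ovector[0] = start;
  ovector[1] = start + 1;
  return 1;
}

/* Buggy stand-in: always reports the same match, whatever the start offset. */
int demo_buggy_match(size_t start, size_t ovector[2])
{
  (void)start;
  ovector[0] = 0;
  ovector[1] = 1;
  return 1;
}

/* Global-style loop. Returns the number of distinct matches found and sets
   *stuck to 1 if a repeat of the previous match forced the loop to stop. */
int demo_global_loop(demo_matcher match, int *stuck)
{
  size_t ovector[2] = {0, 0};
  size_t ovecsave[2] = {0, 0};
  size_t start = 0;
  int gmatched;

  *stuck = 0;
  for (gmatched = 0;; gmatched++) {
    if (gmatched > 0) {               /* remember the previous match */
      ovecsave[0] = ovector[0];
      ovecsave[1] = ovector[1];
    }
    if (!match(start, ovector)) break;
    if (gmatched > 0 && ovecsave[0] == ovector[0] && ovecsave[1] == ovector[1]) {
      *stuck = 1;                     /* same string again: abandon the loop */
      break;
    }
    start = ovector[1];               /* continue after the end of this match */
  }
  return gmatched;
}
```

Because at least one pair of offsets is now always present, a guard of this shape can read ovector[0] and ovector[1] without first checking the ovector size.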


Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2014-10-05 16:56:11 UTC (rev 95)
+++ code/trunk/testdata/testoutput2    2014-10-05 17:55:25 UTC (rev 96)
@@ -245,6 +245,7 @@
  3: c
     abcb\=ovector=0
 Matched, but too many substrings
+ 0: abcb
     abcb\=ovector=1
 Matched, but too many substrings
  0: abcb
@@ -273,6 +274,7 @@
  1: a
     abc\=ovector=0
 Matched, but too many substrings
+ 0: abc
     abc\=ovector=1
 Matched, but too many substrings
  0: abc
@@ -286,6 +288,7 @@
  3: b
     aba\=ovector=0
 Matched, but too many substrings
+ 0: aba
     aba\=ovector=1
 Matched, but too many substrings
  0: aba
@@ -7404,6 +7407,7 @@
 No match
   aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4\=ovector=0
 Matched, but too many substrings
+ 0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4


 /^a.b/newline=lf
     a\rb
@@ -10922,6 +10926,7 @@
  3: baz
     bazfooX\=ovector=0
 Matched, but too many substrings
+ 0: fooX
     bazfooX\=ovector=1
 Matched, but too many substrings
  0: fooX
@@ -11970,7 +11975,7 @@


 /(ab)x|ab/
     ab\=ovector=0
-Matched, but too many substrings
+ 0: ab
     ab\=ovector=1
  0: ab



Modified: code/trunk/testdata/testoutput6
===================================================================
(Binary files differ)