[Pcre-svn] [702] code/trunk: Make \=find_limits apply to DFA…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [702] code/trunk: Make \=find_limits apply to DFA matching, to find the minimum depth limit.
Revision: 702
          http://www.exim.org/viewvc/pcre2?view=rev&revision=702
Author:   ph10
Date:     2017-03-24 18:20:34 +0000 (Fri, 24 Mar 2017)
Log Message:
-----------
Make \=find_limits apply to DFA matching, to find the minimum depth limit.


Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/pcre2test.1
    code/trunk/src/pcre2test.c
    code/trunk/testdata/testinput6
    code/trunk/testdata/testoutput6


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2017-03-24 16:53:38 UTC (rev 701)
+++ code/trunk/ChangeLog    2017-03-24 18:20:34 UTC (rev 702)
@@ -84,7 +84,11 @@
 14. The alternative matching function, pcre2_dfa_match() misbehaved if it 
 encountered a character class with a possessive repeat, for example [a-f]{3}+.


+15. The depth (formerly recursion) limit now applies to DFA matching (as
+of 10.23/36); pcre2test has been upgraded so that \=find_limits works with DFA
+matching to find the minimum value for this limit.

+
Version 10.23 14-February-2017
------------------------------


Modified: code/trunk/doc/pcre2test.1
===================================================================
--- code/trunk/doc/pcre2test.1    2017-03-24 16:53:38 UTC (rev 701)
+++ code/trunk/doc/pcre2test.1    2017-03-24 18:20:34 UTC (rev 702)
@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "21 March 2017" "PCRE 10.30"
+.TH PCRE2TEST 1 "24 March 2017" "PCRE 10.30"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -1052,7 +1052,7 @@
       copy=<number or name>      copy captured substring
       depth_limit=<n>            set a depth limit
       dfa                        use \fBpcre2_dfa_match()\fP
-      find_limits                find match and recursion limits
+      find_limits                find match and depth limits
       get=<number or name>       extract captured substring
       getall                     extract all captured substrings
   /g  global                     global matching
@@ -1297,23 +1297,26 @@
 .SS "Finding minimum limits"
 .rs
 .sp
-If the \fBfind_limits\fP modifier is present, \fBpcre2test\fP calls
-\fBpcre2_match()\fP several times, setting different values in the match
-context via \fBpcre2_set_match_limit()\fP and \fBpcre2_set_depth_limit()\fP
-until it finds the minimum values for each parameter that allow
-\fBpcre2_match()\fP to complete without error.
+If the \fBfind_limits\fP modifier is present on a subject line, \fBpcre2test\fP
+calls the relevant matching function several times, setting different values in
+the match context via \fBpcre2_set_match_limit()\fP or
+\fBpcre2_set_depth_limit()\fP until it finds the minimum value for each
+parameter that allows the match to complete without error.
 .P
 If JIT is being used, only the match limit is relevant. If DFA matching is
-being used, only the depth limit is relevant, but at present this modifier is
-ignored (with a warning message).
+being used, only the depth limit is relevant.
 .P
 The \fImatch_limit\fP number is a measure of the amount of backtracking
 that takes place, and learning the minimum value can be instructive. For most
 simple matches, the number is quite small, but for patterns with very large
 numbers of matching possibilities, it can become large very quickly with
-increasing length of subject string. The \fIdepth_limit\fP number is
-a measure of how much memory for recording backtracking points is needed to
-complete the match attempt.
+increasing length of subject string. 
+.P
+For non-DFA matching, the minimum \fIdepth_limit\fP number is a measure of how
+much memory for recording backtracking points is needed to complete the match
+attempt. In the case of DFA matching, \fIdepth_limit\fP controls the depth of 
+recursive calls of the internal function that is used for handling pattern 
+recursion, lookaround assertions, and atomic groups.
 .
 .
 .SS "Showing MARK names"
@@ -1765,6 +1768,6 @@
 .rs
 .sp
 .nf
-Last updated: 21 March 2017
+Last updated: 24 March 2017
 Copyright (c) 1997-2017 University of Cambridge.
 .fi


Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c    2017-03-24 16:53:38 UTC (rev 701)
+++ code/trunk/src/pcre2test.c    2017-03-24 18:20:34 UTC (rev 702)
@@ -5258,8 +5258,20 @@
 *          Check match or depth limit            *
 *************************************************/


+/* This is used for DFA, normal, and JIT fast matching. For DFA matching it
+should only be called with the third argument set to PCRE2_ERROR_DEPTHLIMIT.
+
+Arguments:
+  pp        the subject string
+  ulen      length of subject or PCRE2_ZERO_TERMINATED
+  errnumber defines which limit to test
+  msg       string to include in final message
+  
+Returns:    the return from the final match function call
+*/    
+
 static int
-check_match_limit(uint8_t *pp, size_t ulen, int errnumber, const char *msg)
+check_match_limit(uint8_t *pp, PCRE2_SIZE ulen, int errnumber, const char *msg)
 {
 int capcount;
 uint32_t min = 0;
@@ -5279,10 +5291,22 @@
     {
     PCRE2_SET_DEPTH_LIMIT(dat_context, mid);
     }
-
-  if ((pat_patctl.control & CTL_JITFAST) != 0)
+    
+  if ((dat_datctl.control & CTL_DFA) != 0)
+    {
+    if (dfa_workspace == NULL)
+      dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int));
+    if (dfa_matched++ == 0)
+      dfa_workspace[0] = -1;  /* To catch bad restart */
+    PCRE2_DFA_MATCH(capcount, compiled_code, pp, ulen, dat_datctl.offset, 
+      dat_datctl.options, match_data,
+      PTR(dat_context), dfa_workspace, DFA_WS_DIMENSION);
+    }
+        
+  else if ((pat_patctl.control & CTL_JITFAST) != 0)
     PCRE2_JIT_MATCH(capcount, compiled_code, pp, ulen, dat_datctl.offset,
       dat_datctl.options, match_data, PTR(dat_context));
+       
   else
     PCRE2_MATCH(capcount, compiled_code, pp, ulen, dat_datctl.offset,
       dat_datctl.options, match_data, PTR(dat_context));
@@ -6243,12 +6267,6 @@
  /* Handle matching via the native interface. Check for consistency of
 modifiers. */


-if ((dat_datctl.control & (CTL_DFA|CTL_FINDLIMITS)) == (CTL_DFA|CTL_FINDLIMITS))
- {
- fprintf(outfile, "** Finding match limits is not relevant for DFA matching: ignored\n");
- dat_datctl.control &= ~CTL_FINDLIMITS;
- }
-
/* ALLUSEDTEXT is not supported with JIT, but JIT is not used with DFA
matching, even if the JIT compiler was used. */

@@ -6579,14 +6597,19 @@
         (double)CLOCKS_PER_SEC);
     }


- /* Find the match and depth limits if requested. The depth limit
- is not relevant for JIT. */
+ /* Find the match and depth limits if requested. The match limit is not
+ relevant for DFA matching and the depth limit is not relevant for JIT. */

   if ((dat_datctl.control & CTL_FINDLIMITS) != 0)
     {
-    capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_MATCHLIMIT, "match");
-    if (FLD(compiled_code, executable_jit) == NULL)
-      (void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_DEPTHLIMIT,
+    if ((dat_datctl.control & CTL_DFA) == 0)
+      capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_MATCHLIMIT, 
+        "match");
+    else capcount = 0;     
+    if (FLD(compiled_code, executable_jit) == NULL || 
+        (dat_datctl.options & PCRE2_NO_JIT) != 0 ||
+        (dat_datctl.control & CTL_DFA) != 0)
+      capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_DEPTHLIMIT,
         "depth");
     }
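
The conditions in the hunk above decide which limits find_limits searches for. The match limit is searched for unless DFA matching is in use. The depth limit is searched for whenever JIT code will not actually run: that is, when no JIT code was compiled, when PCRE2_NO_JIT is set for the subject, or when DFA matching is requested. The sketch below restates that logic as a self-contained function. The helper limits_to_check() and the CHECK_* flags are hypothetical names that do not appear in pcre2test.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags naming the two searches that find_limits can run. */
enum
{
  CHECK_MATCH_LIMIT = 1 << 0,
  CHECK_DEPTH_LIMIT = 1 << 1
};

static_assert(CHECK_MATCH_LIMIT != CHECK_DEPTH_LIMIT, "flags must be distinct");

/* Restates the conditions in the patched find_limits block:
   use_dfa       - the dfa modifier is set, so pcre2_dfa_match() is used
   jit_compiled  - the pattern has JIT code (executable_jit != NULL)
   no_jit_option - PCRE2_NO_JIT is set in the subject options */
static unsigned limits_to_check(bool use_dfa, bool jit_compiled, bool no_jit_option)
{
  unsigned which = 0;

  /* The match limit is not relevant for DFA matching. */
  if (!use_dfa)
    which |= CHECK_MATCH_LIMIT;

  /* The depth limit is not relevant when the JIT code actually runs. */
  if (!jit_compiled || no_jit_option || use_dfa)
    which |= CHECK_DEPTH_LIMIT;

  return which;
}
```

For a DFA subject line, only the depth flag is set. This matches the single "Minimum depth limit = 2" line in the new testoutput6 entry.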



Modified: code/trunk/testdata/testinput6
===================================================================
--- code/trunk/testdata/testinput6    2017-03-24 16:53:38 UTC (rev 701)
+++ code/trunk/testdata/testinput6    2017-03-24 18:20:34 UTC (rev 702)
@@ -4889,4 +4889,7 @@
 /(02-)?[0-9]{3}-[0-9]{3}/
     02-123-123


+/^(a(?2))(b)(?1)/
+    abbab\=find_limits 
+
 # End of testinput6


Modified: code/trunk/testdata/testoutput6
===================================================================
--- code/trunk/testdata/testoutput6    2017-03-24 16:53:38 UTC (rev 701)
+++ code/trunk/testdata/testoutput6    2017-03-24 18:20:34 UTC (rev 702)
@@ -7689,4 +7689,9 @@
     02-123-123
  0: 02-123-123


+/^(a(?2))(b)(?1)/
+    abbab\=find_limits 
+Minimum depth limit = 2
+ 0: abbab
+
 # End of testinput6