Revision: 623
http://www.exim.org/viewvc/pcre2?view=rev&revision=623
Author: ph10
Date: 2016-12-23 11:04:51 +0000 (Fri, 23 Dec 2016)
Log Message:
-----------
Make the recursion limit apply to DFA matching.
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/doc/pcre2_dfa_match.3
code/trunk/doc/pcre2api.3
code/trunk/doc/pcre2pattern.3
code/trunk/doc/pcre2stack.3
code/trunk/doc/pcre2syntax.3
code/trunk/src/pcre2_dfa_match.c
code/trunk/src/pcre2_intmodedep.h
code/trunk/testdata/testinput6
code/trunk/testdata/testoutput6
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2016-12-21 17:09:11 UTC (rev 622)
+++ code/trunk/ChangeLog 2016-12-23 11:04:51 UTC (rev 623)
@@ -233,7 +233,11 @@
certain recursive cases it failed to be triggered and an internal error could
be the result.
+36. The pcre2_dfa_match() function now takes note of the recursion limit for
+the internal recursive calls that are used for lookrounds and recursions within
+the pattern.
+
Version 10.22 29-July-2016
--------------------------
Modified: code/trunk/doc/pcre2_dfa_match.3
===================================================================
--- code/trunk/doc/pcre2_dfa_match.3 2016-12-21 17:09:11 UTC (rev 622)
+++ code/trunk/doc/pcre2_dfa_match.3 2016-12-23 11:04:51 UTC (rev 623)
@@ -1,4 +1,4 @@
-.TH PCRE2_DFA_MATCH 3 "12 May 2013" "PCRE2 10.00"
+.TH PCRE2_DFA_MATCH 3 "23 December 2016" "PCRE2 10.23"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@@ -33,8 +33,8 @@
\fIwscount\fP Number of elements in the vector
.sp
For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
-up a callout function. The \fIlength\fP and \fIstartoffset\fP values are code
-units, not characters. The options are:
+up a callout function or specify the recursion limit. The \fIlength\fP and
+\fIstartoffset\fP values are code units, not characters. The options are:
.sp
PCRE2_ANCHORED Match only at the first position
PCRE2_NOTBOL Subject is not the beginning of a line
Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3 2016-12-21 17:09:11 UTC (rev 622)
+++ code/trunk/doc/pcre2api.3 2016-12-23 11:04:51 UTC (rev 623)
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "22 November 2016" "PCRE2 10.23"
+.TH PCRE2API 3 "24 December 2016" "PCRE2 10.23"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@@ -840,20 +840,22 @@
Limiting the recursion depth limits the amount of system stack that can be
used, or, when PCRE2 has been compiled to use memory on the heap instead of the
stack, the amount of heap memory that can be used. This limit is not relevant,
-and is ignored, when matching is done using JIT compiled code or by the
-\fBpcre2_dfa_match()\fP function.
+and is ignored, when matching is done using JIT compiled code. However, it is
+supported by \fBpcre2_dfa_match()\fP, which uses recursive function calls less
+frequently than \fBpcre2_match()\fP, but which can be caused to use a lot of
+stack by a recursive pattern such as /(.)(?1)/ matched to a very long string.
.P
The default value for \fIrecursion_limit\fP can be set when PCRE2 is built; the
default default is the same value as the default for \fImatch_limit\fP. If the
-limit is exceeded, \fBpcre2_match()\fP returns PCRE2_ERROR_RECURSIONLIMIT. A
-value for the recursion limit may also be supplied by an item at the start of a
-pattern of the form
+limit is exceeded, \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP return
+PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be
+supplied by an item at the start of a pattern of the form
.sp
(*LIMIT_RECURSION=ddd)
.sp
where ddd is a decimal number. However, such a setting is ignored unless ddd is
-less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
-limit is set, less than the default.
+less than the limit set by the caller of \fBpcre2_match()\fP or
+\fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
.sp
.nf
.B int pcre2_set_recursion_memory_management(
@@ -3319,6 +3321,6 @@
.rs
.sp
.nf
-Last updated: 22 November 2016
+Last updated: 23 December 2016
Copyright (c) 1997-2016 University of Cambridge.
.fi
Modified: code/trunk/doc/pcre2pattern.3
===================================================================
--- code/trunk/doc/pcre2pattern.3 2016-12-21 17:09:11 UTC (rev 622)
+++ code/trunk/doc/pcre2pattern.3 2016-12-23 11:04:51 UTC (rev 623)
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "23 October 2016" "PCRE2 10.23"
+.TH PCRE2PATTERN 3 "23 December 2016" "PCRE2 10.23"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -158,6 +158,11 @@
for it to have any effect. In other words, the pattern writer can lower the
limits set by the programmer, but not raise them. If there is more than one
setting of one of these limits, the lower value is used.
+.P
+The match limit is used (but in a different way) when JIT is being used, but it
+is not relevant, and is ignored, when matching with \fBpcre2_dfa_match()\fP.
+However, the recursion limit is relevant for DFA matching, which does use some
+function recursion, in particular, for recursions within the pattern.
.
.
.\" HTML <a name="newlines"></a>
@@ -3477,6 +3482,6 @@
.rs
.sp
.nf
-Last updated: 23 October 2016
+Last updated: 23 December 2016
Copyright (c) 1997-2016 University of Cambridge.
.fi
Modified: code/trunk/doc/pcre2stack.3
===================================================================
--- code/trunk/doc/pcre2stack.3 2016-12-21 17:09:11 UTC (rev 622)
+++ code/trunk/doc/pcre2stack.3 2016-12-23 11:04:51 UTC (rev 623)
@@ -1,4 +1,4 @@
-.TH PCRE2STACK 3 "21 November 2014" "PCRE2 10.00"
+.TH PCRE2STACK 3 "23 December 2016" "PCRE2 10.23"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 DISCUSSION OF STACK USAGE"
@@ -43,11 +43,12 @@
Normally, these are never very deep, and the limit on the complexity of
\fBpcre2_dfa_match()\fP is controlled by the amount of workspace it is given.
However, it is possible to write patterns with runaway infinite recursions;
-such patterns will cause \fBpcre2_dfa_match()\fP to run out of stack. At
-present, there is no protection against this.
+such patterns will cause \fBpcre2_dfa_match()\fP to run out of stack unless a
+limit is applied (see below).
.P
-The comments that follow do NOT apply to \fBpcre2_dfa_match()\fP; they are
-relevant only for \fBpcre2_match()\fP without the JIT optimization.
+The comments in the next three sections do not apply to
+\fBpcre2_dfa_match()\fP; they are relevant only for \fBpcre2_match()\fP without
+the JIT optimization.
.
.
.SS "Reducing \fBpcre2_match()\fP's stack usage"
@@ -147,6 +148,15 @@
different limits.
.
.
+.SS "Limiting \fBpcre2_dfa_match()\fP's stack usage"
+.rs
+.sp
+The recursion limit, as described above for \fBpcre2_match()\fP, also applies
+to \fBpcre2_dfa_match()\fP, whose use of recursive function calls for
+recursions in the pattern can lead to runaway stack usage. The non-recursive
+match limit is not relevant for DFA matching, and is ignored.
+.
+.
.SS "Changing stack size in Unix-like systems"
.rs
.sp
@@ -197,6 +207,6 @@
.rs
.sp
.nf
-Last updated: 21 November 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 23 December 2016
+Copyright (c) 1997-2016 University of Cambridge.
.fi
Modified: code/trunk/doc/pcre2syntax.3
===================================================================
--- code/trunk/doc/pcre2syntax.3 2016-12-21 17:09:11 UTC (rev 622)
+++ code/trunk/doc/pcre2syntax.3 2016-12-23 11:04:51 UTC (rev 623)
@@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "28 September 2016" "PCRE2 10.23"
+.TH PCRE2SYNTAX 3 "23 December 2016" "PCRE2 10.23"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -428,9 +428,10 @@
(*UCP) set PCRE2_UCP (use Unicode properties for \ed etc)
.sp
Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
-limits set by the caller of pcre2_match(), not increase them. The application
-can lock out the use of (*UTF) and (*UCP) by setting the PCRE2_NEVER_UTF or
-PCRE2_NEVER_UCP options, respectively, at compile time.
+limits set by the caller of \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP, not
+increase them. The application can lock out the use of (*UTF) and (*UCP) by
+setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at
+compile time.
.
.
.SH "NEWLINE CONVENTION"
@@ -584,6 +585,6 @@
.rs
.sp
.nf
-Last updated: 28 September 2016
+Last updated: 23 December 2016
Copyright (c) 1997-2016 University of Cambridge.
.fi
Modified: code/trunk/src/pcre2_dfa_match.c
===================================================================
--- code/trunk/src/pcre2_dfa_match.c 2016-12-21 17:09:11 UTC (rev 622)
+++ code/trunk/src/pcre2_dfa_match.c 2016-12-23 11:04:51 UTC (rev 623)
@@ -371,7 +371,7 @@
uint32_t offsetcount,
int *workspace,
int wscount,
- int rlevel)
+ uint32_t rlevel)
{
stateblock *active_states, *new_states, *temp_states;
stateblock *next_active_state, *next_new_state;
@@ -400,7 +400,7 @@
BOOL reset_could_continue = FALSE;
-rlevel++;
+if (rlevel++ > mb->match_limit_recursion) return PCRE2_ERROR_RECURSIONLIMIT;
offsetcount &= (uint32_t)(-2); /* Round down */
wscount -= 2;
@@ -2591,7 +2591,7 @@
sizeof(local_workspace)/sizeof(int), /* size of same */
rlevel); /* function recursion level */
- if (rc == PCRE2_ERROR_DFA_UITEM) return rc;
+ if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
if ((rc >= 0) == (codevalue == OP_ASSERT || codevalue == OP_ASSERTBACK))
{ ADD_ACTIVE((int)(endasscode + LINK_SIZE + 1 - start_code), 0); }
}
@@ -2710,7 +2710,7 @@
sizeof(local_workspace)/sizeof(int), /* size of same */
rlevel); /* function recursion level */
- if (rc == PCRE2_ERROR_DFA_UITEM) return rc;
+ if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc;
if ((rc >= 0) ==
(condcode == OP_ASSERT || condcode == OP_ASSERTBACK))
{ ADD_ACTIVE((int)(endasscode + LINK_SIZE + 1 - start_code), 0); }
@@ -3216,6 +3216,7 @@
{
mb->callout = NULL;
mb->memctl = re->memctl;
+ mb->match_limit_recursion = PRIV(default_match_context).recursion_limit;
}
else
{
@@ -3228,7 +3229,10 @@
mb->callout = mcontext->callout;
mb->callout_data = mcontext->callout_data;
mb->memctl = mcontext->memctl;
+ mb->match_limit_recursion = mcontext->recursion_limit;
}
+if (mb->match_limit_recursion > re->limit_recursion)
+ mb->match_limit_recursion = re->limit_recursion;
mb->start_code = (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code)) +
re->name_count * re->name_entry_size;
Modified: code/trunk/src/pcre2_intmodedep.h
===================================================================
--- code/trunk/src/pcre2_intmodedep.h 2016-12-21 17:09:11 UTC (rev 622)
+++ code/trunk/src/pcre2_intmodedep.h 2016-12-23 11:04:51 UTC (rev 623)
@@ -843,6 +843,7 @@
PCRE2_SPTR last_used_ptr; /* Latest consulted character */
const uint8_t *tables; /* Character tables */
PCRE2_SIZE start_offset; /* The start offset value */
+ uint32_t match_limit_recursion; /* As it says */
uint32_t moptions; /* Match options */
uint32_t poptions; /* Pattern options */
uint32_t nltype; /* Newline type */
Modified: code/trunk/testdata/testinput6
===================================================================
--- code/trunk/testdata/testinput6 2016-12-21 17:09:11 UTC (rev 622)
+++ code/trunk/testdata/testinput6 2016-12-23 11:04:51 UTC (rev 623)
@@ -4882,4 +4882,8 @@
aaa\=dfa,allcaptures
a\=dfa,allcaptures
+/(*LIMIT_RECURSION=600)^((.)(?1)|.)$/
+\= Expect recursion limit exceeded
+ a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
+
# End of testinput6
Modified: code/trunk/testdata/testoutput6
===================================================================
--- code/trunk/testdata/testoutput6 2016-12-21 17:09:11 UTC (rev 622)
+++ code/trunk/testdata/testoutput6 2016-12-23 11:04:51 UTC (rev 623)
@@ -7682,4 +7682,9 @@
** Ignored after DFA matching: allcaptures
0: a
+/(*LIMIT_RECURSION=600)^((.)(?1)|.)$/
+\= Expect recursion limit exceeded
+ a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]
+Failed: error -53: recursion limit exceeded
+
# End of testinput6
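
Note on the pcre2_dfa_match.c hunks above: the following is a minimal, self-contained sketch of the three pieces of logic this revision adds. It is a model, not the PCRE2 source. It covers clamping the caller's limit by the pattern's limit, the depth guard on entry to the internal function, and passing any error other than "no match" back to the caller. All names beginning with sketch_ or SKETCH_ are hypothetical. The value -53 is taken from the test output above, and the `needed` parameter is an invented stand-in for how deeply a pattern recurses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PCRE2 error codes used in the diff. */
#define SKETCH_ERROR_NOMATCH        (-1)
#define SKETCH_ERROR_RECURSIONLIMIT (-53)

/* Only the field that this revision adds to the match block is modelled. */
typedef struct {
  uint32_t match_limit_recursion;
} sketch_match_block;

/* A limit given in the pattern, such as (*LIMIT_RECURSION=600), can lower
   the caller's limit but never raise it, so the smaller value wins. */
static uint32_t sketch_effective_limit(uint32_t caller_limit,
                                       uint32_t pattern_limit)
{
  return (caller_limit > pattern_limit) ? pattern_limit : caller_limit;
}

/* Models the recursive internal function. `needed` is the number of further
   nested calls this invented "pattern" requires. Returns 0 on a match or a
   negative error code. */
static int sketch_internal_match(const sketch_match_block *mb,
                                 uint32_t needed, uint32_t rlevel)
{
  int rc;

  assert(mb != NULL);

  /* Same shape as the guard in the diff: compare, then increment. */
  if (rlevel++ > mb->match_limit_recursion)
    return SKETCH_ERROR_RECURSIONLIMIT;

  if (needed == 0) return 0;   /* innermost level: pretend it matched */

  rc = sketch_internal_match(mb, needed - 1, rlevel);

  /* Errors other than "no match" abandon the whole match, so a limit hit
     deep in the recursion reaches the outermost caller unchanged. */
  if (rc < 0 && rc != SKETCH_ERROR_NOMATCH) return rc;

  return rc;
}
```

With a limit of 600 and a top-level call made with rlevel set to 0, this model accepts up to 600 nested calls and reports the recursion limit error on the next one. How many nested calls a real pattern and subject need depends on the matcher and is not modelled here.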