Revision: 1454
http://vcs.pcre.org/viewvc?view=rev&revision=1454
Author: ph10
Date: 2014-02-09 18:55:03 +0000 (Sun, 09 Feb 2014)
Log Message:
-----------
Implement pcre_stack_guard.
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/doc/pcreapi.3
code/trunk/doc/pcretest.1
code/trunk/pcre.h.in
code/trunk/pcre_compile.c
code/trunk/pcre_globals.c
code/trunk/pcre_internal.h
code/trunk/pcreposix.c
code/trunk/pcretest.c
code/trunk/testdata/testinput2
code/trunk/testdata/testoutput2
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2014-01-30 06:10:21 UTC (rev 1453)
+++ code/trunk/ChangeLog 2014-02-09 18:55:03 UTC (rev 1454)
@@ -99,6 +99,12 @@
20. The fast forward newline mechanism could enter an infinite loop on
certain invalid UTF-8 input. Although we don't support these cases
this issue can be fixed by a performance optimization.
+
+21. Change 33 of 8.34 is not sufficient to ensure stack safety because it does
+ not take account of existing stack usage. There is now a new global
+ variable called pcre_stack_guard that can be set to point to an external
+ function to check stack availability. It is called at the start of
+ processing every parenthesized group.
Version 8.34 15-December-2013
Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3 2014-01-30 06:10:21 UTC (rev 1453)
+++ code/trunk/doc/pcreapi.3 2014-02-09 18:55:03 UTC (rev 1454)
@@ -1,4 +1,4 @@
-.TH PCREAPI 3 "03 January 2014" "PCRE 8.35"
+.TH PCREAPI 3 "09 February 2014" "PCRE 8.35"
.SH NAME
PCRE - Perl-compatible regular expressions
.sp
@@ -116,6 +116,8 @@
.B void (*pcre_stack_free)(void *);
.sp
.B int (*pcre_callout)(pcre_callout_block *);
+.sp
+.B int (*pcre_stack_guard)(void);
.fi
.
.
@@ -286,6 +288,14 @@
\fBpcrecallout\fP
.\"
documentation.
+.P
+The global variable \fBpcre_stack_guard\fP initially contains NULL. It can be
+set by the caller to a function that is called by PCRE whenever it starts
+to compile a parenthesized part of a pattern. When parentheses are nested, PCRE
+uses recursive function calls, which use up the system stack. This function is
+provided so that applications with restricted stacks can force a compilation
+error if the stack runs out. The function should return zero if all is well, or
+non-zero to force an error.
.
.
.\" HTML <a name="newlines"></a>
@@ -337,7 +347,8 @@
The PCRE functions can be used in multi-threading applications, with the
proviso that the memory management functions pointed to by \fBpcre_malloc\fP,
\fBpcre_free\fP, \fBpcre_stack_malloc\fP, and \fBpcre_stack_free\fP, and the
-callout function pointed to by \fBpcre_callout\fP, are shared by all threads.
+callout and stack-checking functions pointed to by \fBpcre_callout\fP and
+\fBpcre_stack_guard\fP, are shared by all threads.
.P
The compiled form of a regular expression is not altered during matching, so
the same compiled pattern can safely be used by several threads at once.
@@ -465,7 +476,10 @@
The output is a long integer that gives the maximum depth of nesting of
parentheses (of any kind) in a pattern. This limit is imposed to cap the amount
of system stack used when a pattern is compiled. It is specified when PCRE is
-built; the default is 250.
+built; the default is 250. This limit does not take into account the stack that
+may already be used by the calling application. For finer control over
+compilation stack usage, you can set a pointer to an external checking function
+in \fBpcre_stack_guard\fP.
.sp
PCRE_CONFIG_MATCH_LIMIT
.sp
@@ -991,6 +1005,8 @@
81 missing opening brace after \eo
82 parentheses are too deeply nested
83 invalid range in character class
+ 84 group name must start with a non-digit
+ 85 parentheses are too deeply nested (stack check)
.sp
The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
be used if the limits were changed when PCRE was built.
@@ -2898,6 +2914,6 @@
.rs
.sp
.nf
-Last updated: 03 January 2014
+Last updated: 09 February 2014
Copyright (c) 1997-2014 University of Cambridge.
.fi
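The pcreapi.3 text above says only that the guard function takes no arguments and returns zero if all is well or non-zero to force an error. As a rough sketch of what an application might supply, the following measures stack growth by comparing the addresses of local variables. The names guard_arm, guard_check and guard_check_at_depth are hypothetical and not part of PCRE, and the address comparison assumes a single contiguous stack, which is common but not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application state (not part of PCRE): the stack position
   recorded as a baseline, and the number of bytes the application is
   willing to let pattern compilation consume beyond it. */
uintptr_t guard_base = 0;
size_t guard_limit = 0;

/* Record the current stack position. Intended to be called just before
   the pattern is compiled. */
void guard_arm(size_t limit_bytes)
{
  volatile char marker;
  guard_base = (uintptr_t)&marker;
  guard_limit = limit_bytes;
}

/* Has the int (*)(void) shape of pcre_stack_guard: returns zero if all
   is well, non-zero to force an error. Assumes a contiguous stack; the
   distance is taken in whichever direction the stack grows. */
int guard_check(void)
{
  volatile char marker;
  uintptr_t here = (uintptr_t)&marker;
  size_t used;
  assert(guard_base != 0);   /* guard_arm() must have been called first */
  used = (here > guard_base) ? (size_t)(here - guard_base)
                             : (size_t)(guard_base - here);
  return used > guard_limit;
}

/* Demo helper that mimics nested groups: recurse `depth` levels, each
   holding some stack, then run the check at the bottom. */
int guard_check_at_depth(int depth)
{
  volatile char pad[256];
  int result;
  pad[0] = (char)depth;
  result = (depth <= 0) ? guard_check() : guard_check_at_depth(depth - 1);
  pad[255] = (char)result;   /* touch the frame after the call */
  return result;
}
```

In a real program one would presumably include pcre.h, call guard_arm() with a suitable budget, assign guard_check to pcre_stack_guard, and then call pcre_compile(). A non-zero return from the guard should then surface as compile error 85.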
Modified: code/trunk/doc/pcretest.1
===================================================================
--- code/trunk/doc/pcretest.1 2014-01-30 06:10:21 UTC (rev 1453)
+++ code/trunk/doc/pcretest.1 2014-02-09 18:55:03 UTC (rev 1454)
@@ -1,4 +1,4 @@
-.TH PCRETEST 1 "17 January 2014" "PCRE 8.35"
+.TH PCRETEST 1 "09 February 2014" "PCRE 8.35"
.SH NAME
pcretest - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@@ -333,6 +333,7 @@
\fB/N\fP set PCRE_NO_AUTO_CAPTURE
\fB/O\fP set PCRE_NO_AUTO_POSSESS
\fB/P\fP use the POSIX wrapper
+ \fB/Q\fP test external stack check function
\fB/S\fP study the pattern after compilation
\fB/s\fP set PCRE_DOTALL
\fB/T\fP select character tables
@@ -519,6 +520,15 @@
successfully studied with the PCRE_STUDY_JIT_COMPILE option, the size of the
JIT compiled code is also output.
.P
+The \fB/Q\fP modifier is used to test the use of \fBpcre_stack_guard\fP. It
+must be followed by '0' or '1', specifying the return code to be given from an
+external function that is passed to PCRE and used for stack checking during
+compilation (see the
+.\" HREF
+\fBpcreapi\fP
+.\"
+documentation for details).
+.P
The \fB/S\fP modifier causes \fBpcre[16|32]_study()\fP to be called after the
expression has been compiled, and the results used when the expression is
matched. There are a number of qualifying characters that may follow \fB/S\fP.
@@ -1141,6 +1151,6 @@
.rs
.sp
.nf
-Last updated: 17 January 2014
+Last updated: 09 February 2014
Copyright (c) 1997-2014 University of Cambridge.
.fi
Modified: code/trunk/pcre.h.in
===================================================================
--- code/trunk/pcre.h.in 2014-01-30 06:10:21 UTC (rev 1453)
+++ code/trunk/pcre.h.in 2014-02-09 18:55:03 UTC (rev 1454)
@@ -5,7 +5,7 @@
/* This is the public header file for the PCRE library, to be #included by
applications that call the PCRE functions.
- Copyright (c) 1997-2013 University of Cambridge
+ Copyright (c) 1997-2014 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -491,36 +491,42 @@
PCRE_EXP_DECL void *(*pcre_stack_malloc)(size_t);
PCRE_EXP_DECL void (*pcre_stack_free)(void *);
PCRE_EXP_DECL int (*pcre_callout)(pcre_callout_block *);
+PCRE_EXP_DECL int (*pcre_stack_guard)(void);
PCRE_EXP_DECL void *(*pcre16_malloc)(size_t);
PCRE_EXP_DECL void (*pcre16_free)(void *);
PCRE_EXP_DECL void *(*pcre16_stack_malloc)(size_t);
PCRE_EXP_DECL void (*pcre16_stack_free)(void *);
PCRE_EXP_DECL int (*pcre16_callout)(pcre16_callout_block *);
+PCRE_EXP_DECL int (*pcre16_stack_guard)(void);
PCRE_EXP_DECL void *(*pcre32_malloc)(size_t);
PCRE_EXP_DECL void (*pcre32_free)(void *);
PCRE_EXP_DECL void *(*pcre32_stack_malloc)(size_t);
PCRE_EXP_DECL void (*pcre32_stack_free)(void *);
PCRE_EXP_DECL int (*pcre32_callout)(pcre32_callout_block *);
+PCRE_EXP_DECL int (*pcre32_stack_guard)(void);
#else /* VPCOMPAT */
PCRE_EXP_DECL void *pcre_malloc(size_t);
PCRE_EXP_DECL void pcre_free(void *);
PCRE_EXP_DECL void *pcre_stack_malloc(size_t);
PCRE_EXP_DECL void pcre_stack_free(void *);
PCRE_EXP_DECL int pcre_callout(pcre_callout_block *);
+PCRE_EXP_DECL int pcre_stack_guard(void);
PCRE_EXP_DECL void *pcre16_malloc(size_t);
PCRE_EXP_DECL void pcre16_free(void *);
PCRE_EXP_DECL void *pcre16_stack_malloc(size_t);
PCRE_EXP_DECL void pcre16_stack_free(void *);
PCRE_EXP_DECL int pcre16_callout(pcre16_callout_block *);
+PCRE_EXP_DECL int pcre16_stack_guard(void);
PCRE_EXP_DECL void *pcre32_malloc(size_t);
PCRE_EXP_DECL void pcre32_free(void *);
PCRE_EXP_DECL void *pcre32_stack_malloc(size_t);
PCRE_EXP_DECL void pcre32_stack_free(void *);
PCRE_EXP_DECL int pcre32_callout(pcre32_callout_block *);
+PCRE_EXP_DECL int pcre32_stack_guard(void);
#endif /* VPCOMPAT */
/* User defined callback which provides a stack just before the match starts. */
Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c 2014-01-30 06:10:21 UTC (rev 1453)
+++ code/trunk/pcre_compile.c 2014-02-09 18:55:03 UTC (rev 1454)
@@ -547,6 +547,8 @@
"parentheses are too deeply nested\0"
"invalid range in character class\0"
"group name must start with a non-digit\0"
+ /* 85 */
+ "parentheses are too deeply nested (stack check)\0"
;
/* Table to identify digits and hex digits. This is used when compiling
@@ -8033,6 +8035,16 @@
unsigned int max_bracount;
branch_chain bc;
+/* If set, call the external function that checks for stack availability. */
+
+if (PUBL(stack_guard) != NULL && PUBL(stack_guard)())
+ {
+ *errorcodeptr = ERR85;
+ return FALSE;
+ }
+
+/* Miscellaneous initialization */
+
bc.outer = bcptr;
bc.current_branch = code;
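The hunk above places the guard call at the top of the function that compiles a group, so the hook runs once per nesting level, including the outermost one. The toy below is a self-contained sketch of that control flow only; it is not PCRE code. Every demo_ name is hypothetical, the parser understands nothing but parentheses, and the negative error codes are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ERR_STACK 85   /* same number as ERR85 in this revision */

/* Stand-in for the PCRE global; the real one is declared in pcre.h. */
int (*demo_stack_guard)(void) = NULL;

/* Settable guard, in the spirit of the one pcretest installs for /Q. */
int demo_guard_return = 0;
int demo_guard(void) { return demo_guard_return; }

/* Toy group compiler: *pp points at the start of a group body. It
   consumes input up to the closing ')' or the end of the string.
   Returns 1 on success; on failure returns 0 with *err set. */
int demo_compile_group(const char **pp, int *err)
{
  /* If set, ask the external function whether there is stack to spare. */
  if (demo_stack_guard != NULL && demo_stack_guard())
  {
    *err = DEMO_ERR_STACK;
    return 0;
  }
  while (**pp != '\0' && **pp != ')')
  {
    if (**pp == '(')
    {
      (*pp)++;                                 /* step into the group */
      if (!demo_compile_group(pp, err)) return 0;
      if (**pp != ')') { *err = -1; return 0; } /* invented: missing ) */
    }
    (*pp)++;                                   /* skip ')' or a literal */
  }
  return 1;
}

/* Entry point: the whole pattern is treated as the outermost group, so
   the guard is consulted before any pattern character is consumed. */
int demo_compile(const char *pattern, int *err)
{
  const char *p = pattern;
  assert(pattern != NULL && err != NULL);
  *err = 0;
  if (!demo_compile_group(&p, err)) return 0;
  if (*p == ')') { *err = -2; return 0; }      /* invented: unmatched ) */
  return 1;
}
```

With demo_guard_return set to 1 the failure is raised before the first character is read, which is consistent with the "at offset 0" in the /Q1 test output later in this change.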
Modified: code/trunk/pcre_globals.c
===================================================================
--- code/trunk/pcre_globals.c 2014-01-30 06:10:21 UTC (rev 1453)
+++ code/trunk/pcre_globals.c 2014-02-09 18:55:03 UTC (rev 1454)
@@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
- Copyright (c) 1997-2012 University of Cambridge
+ Copyright (c) 1997-2014 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -72,6 +72,7 @@
PCRE_EXP_DATA_DEFN void *(*PUBL(stack_malloc))(size_t) = LocalPcreMalloc;
PCRE_EXP_DATA_DEFN void (*PUBL(stack_free))(void *) = LocalPcreFree;
PCRE_EXP_DATA_DEFN int (*PUBL(callout))(PUBL(callout_block) *) = NULL;
+PCRE_EXP_DATA_DEFN int (*PUBL(stack_guard))(void) = NULL;
#elif !defined VPCOMPAT
PCRE_EXP_DATA_DEFN void *(*PUBL(malloc))(size_t) = malloc;
@@ -79,6 +80,7 @@
PCRE_EXP_DATA_DEFN void *(*PUBL(stack_malloc))(size_t) = malloc;
PCRE_EXP_DATA_DEFN void (*PUBL(stack_free))(void *) = free;
PCRE_EXP_DATA_DEFN int (*PUBL(callout))(PUBL(callout_block) *) = NULL;
+PCRE_EXP_DATA_DEFN int (*PUBL(stack_guard))(void) = NULL;
#endif
/* End of pcre_globals.c */
Modified: code/trunk/pcre_internal.h
===================================================================
--- code/trunk/pcre_internal.h 2014-01-30 06:10:21 UTC (rev 1453)
+++ code/trunk/pcre_internal.h 2014-02-09 18:55:03 UTC (rev 1454)
@@ -2281,7 +2281,7 @@
ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59,
ERR60, ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69,
ERR70, ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79,
- ERR80, ERR81, ERR82, ERR83, ERR84, ERRCOUNT };
+ ERR80, ERR81, ERR82, ERR83, ERR84, ERR85, ERRCOUNT };
/* JIT compiling modes. The function list is indexed by them. */
Modified: code/trunk/pcreposix.c
===================================================================
--- code/trunk/pcreposix.c 2014-01-30 06:10:21 UTC (rev 1453)
+++ code/trunk/pcreposix.c 2014-02-09 18:55:03 UTC (rev 1454)
@@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
- Copyright (c) 1997-2012 University of Cambridge
+ Copyright (c) 1997-2014 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -170,7 +170,9 @@
REG_BADPAT, /* missing opening brace after \o */
REG_BADPAT, /* parentheses too deeply nested */
REG_BADPAT, /* invalid range in character class */
- REG_BADPAT /* group name must start with a non-digit */
+ REG_BADPAT, /* group name must start with a non-digit */
+ /* 85 */
+ REG_BADPAT /* parentheses too deeply nested (stack check) */
};
/* Table of texts corresponding to POSIX error codes */
Modified: code/trunk/pcretest.c
===================================================================
--- code/trunk/pcretest.c 2014-01-30 06:10:21 UTC (rev 1453)
+++ code/trunk/pcretest.c 2014-02-09 18:55:03 UTC (rev 1454)
@@ -233,6 +233,9 @@
#define SET_PCRE_CALLOUT8(callout) \
pcre_callout = callout
+#define SET_PCRE_STACK_GUARD8(stack_guard) \
+ pcre_stack_guard = stack_guard
+
#define PCRE_ASSIGN_JIT_STACK8(extra, callback, userdata) \
pcre_assign_jit_stack(extra, callback, userdata)
@@ -317,6 +320,9 @@
#define SET_PCRE_CALLOUT16(callout) \
pcre16_callout = (int (*)(pcre16_callout_block *))callout
+#define SET_PCRE_STACK_GUARD16(stack_guard) \
+ pcre16_stack_guard = (int (*)(void))stack_guard
+
#define PCRE_ASSIGN_JIT_STACK16(extra, callback, userdata) \
pcre16_assign_jit_stack((pcre16_extra *)extra, \
(pcre16_jit_callback)callback, userdata)
@@ -406,6 +412,9 @@
#define SET_PCRE_CALLOUT32(callout) \
pcre32_callout = (int (*)(pcre32_callout_block *))callout
+#define SET_PCRE_STACK_GUARD32(stack_guard) \
+ pcre32_stack_guard = (int (*)(void))stack_guard
+
#define PCRE_ASSIGN_JIT_STACK32(extra, callback, userdata) \
pcre32_assign_jit_stack((pcre32_extra *)extra, \
(pcre32_jit_callback)callback, userdata)
@@ -533,6 +542,14 @@
else \
SET_PCRE_CALLOUT8(callout)
+#define SET_PCRE_STACK_GUARD(stack_guard) \
+ if (pcre_mode == PCRE32_MODE) \
+ SET_PCRE_STACK_GUARD32(stack_guard); \
+ else if (pcre_mode == PCRE16_MODE) \
+ SET_PCRE_STACK_GUARD16(stack_guard); \
+ else \
+ SET_PCRE_STACK_GUARD8(stack_guard)
+
#define STRLEN(p) (pcre_mode == PCRE32_MODE ? STRLEN32(p) : pcre_mode == PCRE16_MODE ? STRLEN16(p) : STRLEN8(p))
#define PCRE_ASSIGN_JIT_STACK(extra, callback, userdata) \
@@ -756,6 +773,12 @@
else \
G(SET_PCRE_CALLOUT,BITTWO)(callout)
+#define SET_PCRE_STACK_GUARD(stack_guard) \
+ if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \
+ G(SET_PCRE_STACK_GUARD,BITONE)(stack_guard); \
+ else \
+ G(SET_PCRE_STACK_GUARD,BITTWO)(stack_guard)
+
#define STRLEN(p) ((pcre_mode == G(G(PCRE,BITONE),_MODE)) ? \
G(STRLEN,BITONE)(p) : G(STRLEN,BITTWO)(p))
@@ -897,6 +920,7 @@
#define PCHARSV PCHARSV8
#define READ_CAPTURE_NAME READ_CAPTURE_NAME8
#define SET_PCRE_CALLOUT SET_PCRE_CALLOUT8
+#define SET_PCRE_STACK_GUARD SET_PCRE_STACK_GUARD8
#define STRLEN STRLEN8
#define PCRE_ASSIGN_JIT_STACK PCRE_ASSIGN_JIT_STACK8
#define PCRE_COMPILE PCRE_COMPILE8
@@ -927,6 +951,7 @@
#define PCHARSV PCHARSV16
#define READ_CAPTURE_NAME READ_CAPTURE_NAME16
#define SET_PCRE_CALLOUT SET_PCRE_CALLOUT16
+#define SET_PCRE_STACK_GUARD SET_PCRE_STACK_GUARD16
#define STRLEN STRLEN16
#define PCRE_ASSIGN_JIT_STACK PCRE_ASSIGN_JIT_STACK16
#define PCRE_COMPILE PCRE_COMPILE16
@@ -957,6 +982,7 @@
#define PCHARSV PCHARSV32
#define READ_CAPTURE_NAME READ_CAPTURE_NAME32
#define SET_PCRE_CALLOUT SET_PCRE_CALLOUT32
+#define SET_PCRE_STACK_GUARD SET_PCRE_STACK_GUARD32
#define STRLEN STRLEN32
#define PCRE_ASSIGN_JIT_STACK PCRE_ASSIGN_JIT_STACK32
#define PCRE_COMPILE PCRE_COMPILE32
@@ -1015,6 +1041,7 @@
static int jit_was_used;
static int locale_set = 0;
static int show_malloc;
+static int stack_guard_return;
static int use_utf;
static const unsigned char *last_callout_mark = NULL;
@@ -2201,6 +2228,18 @@
/*************************************************
+* Stack guard function *
+*************************************************/
+
+/* Called from PCRE when set in pcre_stack_guard. For testing, we just return
+the value (0 or 1) that was given after the /Q modifier. */
+
+static int stack_guard(void)
+{
+return stack_guard_return;
+}
+
+/*************************************************
* Callout function *
*************************************************/
@@ -3445,6 +3484,7 @@
use_utf = 0;
debug_lengths = 1;
+ SET_PCRE_STACK_GUARD(NULL);
if (extend_inputline(infile, buffer, " re> ") == NULL) break;
if (infile != stdin) fprintf(outfile, "%s", (char *)buffer);
@@ -3745,6 +3785,21 @@
case 'P': do_posix = 1; break;
#endif
+ case 'Q':
+ switch (*pp)
+ {
+ case '0':
+ case '1':
+ stack_guard_return = *pp++ - '0';
+ break;
+
+ default:
+ fprintf(outfile, "** Missing 0 or 1 after /Q\n");
+ goto SKIP_DATA;
+ }
+ SET_PCRE_STACK_GUARD(stack_guard);
+ break;
+
case 'S':
do_study = 1;
for (;;)
@@ -5198,7 +5253,7 @@
if (count * 2 > use_size_offsets) count = use_size_offsets/2;
}
- /* Output the captured substrings. Note that, for the matched string,
+ /* Output the captured substrings. Note that, for the matched string,
the use of \K in an assertion can make the start later than the end. */
for (i = 0; i < count * 2; i += 2)
@@ -5217,23 +5272,23 @@
{
int start = use_offsets[i];
int end = use_offsets[i+1];
-
+
if (start > end)
{
start = use_offsets[i+1];
end = use_offsets[i];
- fprintf(outfile, "Start of matched string is beyond its end - "
- "displaying from end to start.\n");
- }
-
+ fprintf(outfile, "Start of matched string is beyond its end - "
+ "displaying from end to start.\n");
+ }
+
fprintf(outfile, "%2d: ", i/2);
PCHARSV(bptr, start, end - start, outfile);
if (verify_jit && jit_was_used) fprintf(outfile, " (JIT)");
fprintf(outfile, "\n");
-
+
/* Note: don't use the start/end variables here because we want to
show the text from what is reported as the end. */
-
+
if (do_showcaprest || (i == 0 && do_showrest))
{
fprintf(outfile, "%2d+ ", i/2);
Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2 2014-01-30 06:10:21 UTC (rev 1453)
+++ code/trunk/testdata/testinput2 2014-02-09 18:55:03 UTC (rev 1454)
@@ -4050,5 +4050,13 @@
/abcd/f<lf>
xx\nxabcd
+
+/ -- Test stack check external calls --/
+/(((((a)))))/Q0
+
+/(((((a)))))/Q1
+
+/(((((a)))))/Q
+
/-- End of testinput2 --/
Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2014-01-30 06:10:21 UTC (rev 1453)
+++ code/trunk/testdata/testoutput2 2014-02-09 18:55:03 UTC (rev 1454)
@@ -14134,5 +14134,15 @@
/abcd/f<lf>
xx\nxabcd
No match
+
+/ -- Test stack check external calls --/
+/(((((a)))))/Q0
+
+/(((((a)))))/Q1
+Failed: parentheses are too deeply nested (stack check) at offset 0
+
+/(((((a)))))/Q
+** Missing 0 or 1 after /Q
+
/-- End of testinput2 --/