[Pcre-svn] [1389] code/trunk: Implement compile-time nested …

From: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [1389] code/trunk: Implement compile-time nested parentheses limit, specified at build time.
Revision: 1389
          http://vcs.pcre.org/viewvc?view=rev&revision=1389
Author:   ph10
Date:     2013-11-05 18:05:29 +0000 (Tue, 05 Nov 2013)


Log Message:
-----------
Implement compile-time nested parentheses limit, specified at build time.

Modified Paths:
--------------
    code/trunk/CMakeLists.txt
    code/trunk/ChangeLog
    code/trunk/README
    code/trunk/config-cmake.h.in
    code/trunk/configure.ac
    code/trunk/doc/pcre_config.3
    code/trunk/doc/pcreapi.3
    code/trunk/doc/pcrelimits.3
    code/trunk/pcre.h.in
    code/trunk/pcre_compile.c
    code/trunk/pcre_config.c
    code/trunk/pcre_internal.h
    code/trunk/pcreposix.c
    code/trunk/pcretest.c


Modified: code/trunk/CMakeLists.txt
===================================================================
--- code/trunk/CMakeLists.txt    2013-11-05 15:24:45 UTC (rev 1388)
+++ code/trunk/CMakeLists.txt    2013-11-05 18:05:29 UTC (rev 1389)
@@ -64,6 +64,7 @@
 # 2013-07-01 PH realized that the "support" for GCOV was a total nonsense and
 #            so it has been removed.
 # 2013-10-08 PH got rid of the "source" command, which is a bash-ism (use ".")
+# 2013-11-05 PH added support for PARENS_NEST_LIMIT


PROJECT(PCRE C CXX)

@@ -132,6 +133,9 @@
 SET(PCRE_LINK_SIZE "2" CACHE STRING
     "Internal link size (2, 3 or 4 allowed). See LINK_SIZE in config.h.in for details.")


+SET(PCRE_PARENS_NEST_LIMIT "250" CACHE STRING
+    "Default nested parentheses limit. See PARENS_NEST_LIMIT in config.h.in for details.")
+     
 SET(PCRE_MATCH_LIMIT "10000000" CACHE STRING
     "Default limit on internal looping. See MATCH_LIMIT in config.h.in for details.")


@@ -911,6 +915,7 @@
 MESSAGE(STATUS " No stack recursion .............. : ${PCRE_NO_RECURSE}")
 MESSAGE(STATUS " POSIX mem threshold ............. : ${PCRE_POSIX_MALLOC_THRESHOLD}")
 MESSAGE(STATUS " Internal link size .............. : ${PCRE_LINK_SIZE}")
+MESSAGE(STATUS " Parentheses nest limit .......... : ${PCRE_PARENS_NEST_LIMIT}")
 MESSAGE(STATUS " Match limit ..................... : ${PCRE_MATCH_LIMIT}")
 MESSAGE(STATUS " Match limit recursion ........... : ${PCRE_MATCH_LIMIT_RECURSION}")
 MESSAGE(STATUS " Build shared libs ............... : ${BUILD_SHARED_LIBS}")

Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2013-11-05 15:24:45 UTC (rev 1388)
+++ code/trunk/ChangeLog    2013-11-05 18:05:29 UTC (rev 1389)
@@ -154,6 +154,11 @@


 32. Added the "forbid" facility to pcretest so that putting tests into the 
     wrong test files can sometimes be quickly detected.
+    
+33. There is now a limit (default 250) on the depth of nesting of parentheses. 
+    This limit is imposed to control the amount of system stack used at compile 
+    time. It can be changed at build time by --with-parens-nest-limit=xxx or 
+    the equivalent in CMake. 



Version 8.33 28-May-2013

Modified: code/trunk/README
===================================================================
--- code/trunk/README    2013-11-05 15:24:45 UTC (rev 1388)
+++ code/trunk/README    2013-11-05 18:05:29 UTC (rev 1389)
@@ -268,10 +268,18 @@
   --with-posix-malloc-threshold=20


on the "configure" command.
+
+. PCRE has a counter that limits the depth of nesting of parentheses in a
+ pattern. This limits the amount of system stack that a pattern uses when it
+ is compiled. The default is 250, but you can change it by setting, for
+ example,
+
+ --with-parens-nest-limit=500

-. PCRE has a counter that can be set to limit the amount of resources it uses.
- If the limit is exceeded during a match, the match fails. The default is ten
- million. You can change the default by setting, for example,
+. PCRE has a counter that can be set to limit the amount of resources it uses
+ when matching a pattern. If the limit is exceeded during a match, the match
+ fails. The default is ten million. You can change the default by setting, for
+ example,

--with-match-limit=500000

@@ -979,4 +987,4 @@
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
-Last updated: 02 July 2013
+Last updated: 05 November 2013

Modified: code/trunk/config-cmake.h.in
===================================================================
--- code/trunk/config-cmake.h.in    2013-11-05 15:24:45 UTC (rev 1388)
+++ code/trunk/config-cmake.h.in    2013-11-05 18:05:29 UTC (rev 1389)
@@ -46,6 +46,7 @@
 #define NEWLINE            @NEWLINE@
 #define POSIX_MALLOC_THRESHOLD    @PCRE_POSIX_MALLOC_THRESHOLD@
 #define LINK_SIZE        @PCRE_LINK_SIZE@
+#define PARENS_NEST_LIMIT       @PCRE_PARENS_NEST_LIMIT@
 #define MATCH_LIMIT        @PCRE_MATCH_LIMIT@
 #define MATCH_LIMIT_RECURSION    @PCRE_MATCH_LIMIT_RECURSION@
 #define PCREGREP_BUFSIZE        @PCREGREP_BUFSIZE@


Modified: code/trunk/configure.ac
===================================================================
--- code/trunk/configure.ac    2013-11-05 15:24:45 UTC (rev 1388)
+++ code/trunk/configure.ac    2013-11-05 18:05:29 UTC (rev 1389)
@@ -274,6 +274,12 @@
             AS_HELP_STRING([--with-link-size=N],
                            [internal link size (2, 3, or 4 allowed; default=2)]),
             , with_link_size=2)
+            
+# Handle --with-parens-nest-limit=N
+AC_ARG_WITH(parens-nest-limit,
+            AS_HELP_STRING([--with-parens-nest-limit=N],
+                           [nested parentheses limit (default=250)]),
+            , with_parens_nest_limit=250)                  


# Handle --with-match-limit=N
AC_ARG_WITH(match-limit,
@@ -783,6 +789,11 @@
small, the wrapper function uses space on the stack, because this is
faster than using malloc() for each call. The threshold above which
the stack is no longer used is defined by POSIX_MALLOC_THRESHOLD.])
+
+AC_DEFINE_UNQUOTED([PARENS_NEST_LIMIT], [$with_parens_nest_limit], [
+ The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
+ parentheses (of any kind) in a pattern. This limits the amount of system
+ stack that is used while compiling a pattern.])

 AC_DEFINE_UNQUOTED([MATCH_LIMIT], [$with_match_limit], [
   The value of MATCH_LIMIT determines the default number of times the
@@ -1071,6 +1082,7 @@
     Use stack recursion ............. : ${enable_stack_for_recursion}
     POSIX mem threshold ............. : ${with_posix_malloc_threshold}
     Internal link size .............. : ${with_link_size}
+    Nested parentheses limit ........ : ${with_parens_nest_limit} 
     Match limit ..................... : ${with_match_limit}
     Match limit recursion ........... : ${with_match_limit_recursion}
     Build shared libs ............... : ${enable_shared}


Modified: code/trunk/doc/pcre_config.3
===================================================================
--- code/trunk/doc/pcre_config.3    2013-11-05 15:24:45 UTC (rev 1388)
+++ code/trunk/doc/pcre_config.3    2013-11-05 18:05:29 UTC (rev 1389)
@@ -1,4 +1,4 @@
-.TH PCRE_CONFIG 3 "24 June 2012" "PCRE 8.30"
+.TH PCRE_CONFIG 3 "05 November 2013" "PCRE 8.34"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH SYNOPSIS
@@ -33,6 +33,7 @@
                               target architecture for the JIT compiler,
                               or NULL if there is no JIT support
   PCRE_CONFIG_LINK_SIZE     Internal link size: 2, 3, or 4
+  PCRE_CONFIG_PARENS_LIMIT  Parentheses nesting limit 
   PCRE_CONFIG_MATCH_LIMIT   Internal resource limit
   PCRE_CONFIG_MATCH_LIMIT_RECURSION
                             Internal recursion depth limit


Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3    2013-11-05 15:24:45 UTC (rev 1388)
+++ code/trunk/doc/pcreapi.3    2013-11-05 18:05:29 UTC (rev 1389)
@@ -1,4 +1,4 @@
-.TH PCREAPI 3 "08 October 2013" "PCRE 8.34"
+.TH PCREAPI 3 "05 November 2013" "PCRE 8.34"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .sp
@@ -460,6 +460,13 @@
 .\"
 documentation.
 .sp
+  PCRE_CONFIG_PARENS_LIMIT
+.sp
+The output is a long integer that gives the maximum depth of nesting of 
+parentheses (of any kind) in a pattern. This limit is imposed to cap the amount 
+of system stack used when a pattern is compiled. It is specified when PCRE is
+built; the default is 250.
+.sp
   PCRE_CONFIG_MATCH_LIMIT
 .sp
 The output is a long integer that gives the default limit for the number of
@@ -2870,6 +2877,6 @@
 .rs
 .sp
 .nf
-Last updated: 08 October 2013
+Last updated: 05 November 2013
 Copyright (c) 1997-2013 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcrelimits.3
===================================================================
--- code/trunk/doc/pcrelimits.3    2013-11-05 15:24:45 UTC (rev 1388)
+++ code/trunk/doc/pcrelimits.3    2013-11-05 18:05:29 UTC (rev 1389)
@@ -1,4 +1,4 @@
-.TH PCRELIMITS 3 "15 August 2013" "PCRE 8.34"
+.TH PCRELIMITS 3 "05 November 2013" "PCRE 8.34"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "SIZE AND OTHER LIMITATIONS"
@@ -24,7 +24,10 @@
 All values in repeating quantifiers must be less than 65536.
 .P
 There is no limit to the number of parenthesized subpatterns, but there can be
-no more than 65535 capturing subpatterns.
+no more than 65535 capturing subpatterns. There is, however, a limit to the
+depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
+order to limit the amount of system stack used at compile time. The limit can
+be specified when PCRE is built; the default is 250.
 .P
 There is a limit to the number of forward references to subsequent subpatterns
 of around 200,000. Repeated forward references with fixed upper limits, for
@@ -63,6 +66,6 @@
 .rs
 .sp
 .nf
-Last updated: 15 August 2013
+Last updated: 05 November 2013
 Copyright (c) 1997-2013 University of Cambridge.
 .fi


Modified: code/trunk/pcre.h.in
===================================================================
--- code/trunk/pcre.h.in    2013-11-05 15:24:45 UTC (rev 1388)
+++ code/trunk/pcre.h.in    2013-11-05 18:05:29 UTC (rev 1389)
@@ -298,6 +298,7 @@
 #define PCRE_CONFIG_UTF16                  10
 #define PCRE_CONFIG_JITTARGET              11
 #define PCRE_CONFIG_UTF32                  12
+#define PCRE_CONFIG_PARENS_LIMIT           13


/* Request types for pcre_study(). Do not re-arrange, in order to remain
compatible. */

Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c    2013-11-05 15:24:45 UTC (rev 1388)
+++ code/trunk/pcre_compile.c    2013-11-05 18:05:29 UTC (rev 1389)
@@ -531,6 +531,7 @@
   /* 80 */
   "non-octal character in \\o{} (closing brace missing?)\0"
   "missing opening brace after \\o\0"
+  "parentheses are too deeply nested\0"
   ;


 /* Table to identify digits and hex digits. This is used when compiling
@@ -7290,11 +7291,20 @@
       skipbytes = IMM2_SIZE;
       }


-    /* Process nested bracketed regex. Assertions used not to be repeatable,
-    but this was changed for Perl compatibility, so all kinds can now be
-    repeated. We copy code into a non-register variable (tempcode) in order to
-    be able to pass its address because some compilers complain otherwise. */
+    /* Process nested bracketed regex. First check for parentheses nested too
+    deeply. */


+    if ((cd->parens_depth += 1) > PARENS_NEST_LIMIT)
+      {
+      *errorcodeptr = ERR82;
+      goto FAILED;  
+      }  
+ 
+    /* Assertions used not to be repeatable, but this was changed for Perl
+    compatibility, so all kinds can now be repeated. We copy code into a
+    non-register variable (tempcode) in order to be able to pass its address
+    because some compilers complain otherwise. */
+
     previous = code;                      /* For handling repetition */
     *code = bravalue;
     tempcode = code;
@@ -7323,6 +7333,8 @@
            &length_prevgroup              /* Pre-compile phase */
          ))
       goto FAILED;
+      
+    cd->parens_depth -= 1;


     /* If this was an atomic group and there are no capturing groups within it,
     generate OP_ONCE_NC instead of OP_ONCE. */
@@ -8898,6 +8910,7 @@
 cd->start_pattern = (const pcre_uchar *)pattern;
 cd->end_pattern = (const pcre_uchar *)(pattern + STRLEN_UC((const pcre_uchar *)pattern));
 cd->req_varyopt = 0;
+cd->parens_depth = 0;
 cd->assert_depth = 0;
 cd->max_lookbehind = 0;
 cd->external_options = options;
@@ -8983,6 +8996,7 @@
 */


 cd->final_bracount = cd->bracount;  /* Save for checking forward references */
+cd->parens_depth = 0;
 cd->assert_depth = 0;
 cd->bracount = 0;
 cd->max_lookbehind = 0;
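
The pcre_compile.c hunks above add a depth counter that is bumped on
entry to each nested group, checked against PARENS_NEST_LIMIT, and
decremented once the group has compiled. A minimal, self-contained sketch
of that idea follows. It is not PCRE's compiler: compile_state,
scan_group, check_parens_nesting and the SCAN_* codes are hypothetical
names invented for illustration, and the scanner ignores character
classes and every other piece of regex syntax except backslash escapes.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the build-time setting; 250 is the default from this commit. */
#define PARENS_NEST_LIMIT 250

/* Hypothetical, much-reduced analogue of PCRE's compile data block. */
typedef struct {
  int parens_depth;                 /* Depth of nested parentheses */
} compile_state;

/* Hypothetical result codes for this sketch only. */
enum { SCAN_OK = 0, SCAN_TOO_DEEP = 1, SCAN_UNBALANCED = 2 };

/* Scan one group body. Recurses on '(' and returns on the matching ')'.
The recursion is what consumes system stack, so the depth is checked
before each recursive call. */
static int scan_group(const char **pp, compile_state *cd, int nested)
{
  while (**pp != '\0')
    {
    char c = *(*pp)++;
    if (c == '\\' && **pp != '\0')  /* Skip an escaped character */
      {
      (*pp)++;
      continue;
      }
    if (c == '(')
      {
      int rc;
      if ((cd->parens_depth += 1) > PARENS_NEST_LIMIT) return SCAN_TOO_DEEP;
      rc = scan_group(pp, cd, 1);
      if (rc != SCAN_OK) return rc;
      cd->parens_depth -= 1;        /* Group closed: back up one level */
      }
    else if (c == ')')
      return nested ? SCAN_OK : SCAN_UNBALANCED;
    }
  return nested ? SCAN_UNBALANCED : SCAN_OK;
}

/* Entry point: reset the counter, then scan the whole pattern. */
int check_parens_nesting(const char *pattern)
{
  compile_state cd;
  assert(pattern != NULL);
  cd.parens_depth = 0;
  return scan_group(&pattern, &cd, 0);
}
```

With the default of 250, a pattern with 250 nested groups passes this
check and one with 251 is rejected. Resetting the counter at the start of
each scan mirrors the two "cd->parens_depth = 0" additions in the diff.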

Modified: code/trunk/pcre_config.c
===================================================================
--- code/trunk/pcre_config.c    2013-11-05 15:24:45 UTC (rev 1388)
+++ code/trunk/pcre_config.c    2013-11-05 18:05:29 UTC (rev 1389)
@@ -161,6 +161,10 @@
   *((int *)where) = POSIX_MALLOC_THRESHOLD;
   break;

+  case PCRE_CONFIG_PARENS_LIMIT:
+  *((unsigned long int *)where) = PARENS_NEST_LIMIT;
+  break;
+
   case PCRE_CONFIG_MATCH_LIMIT:
   *((unsigned long int *)where) = MATCH_LIMIT;
   break;
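
Note that the new case stores its result through an
"unsigned long int *", whereas the POSIX malloc threshold case just above
it stores through an "int *". The caller therefore has to pass a pointer
to a variable of the matching type. The sketch below shows that pattern
in isolation. demo_config and the DEMO_CONFIG_* values are hypothetical
stand-ins, not the PCRE API, and the -1 return for an unknown request is
an assumption made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request codes for this sketch (not PCRE's values). */
enum { DEMO_CONFIG_MALLOC_THRESHOLD = 0, DEMO_CONFIG_PARENS_LIMIT = 1 };

/* Hypothetical build-time settings. */
#define DEMO_MALLOC_THRESHOLD 10
#define DEMO_PARENS_NEST_LIMIT 250

/* Write the requested setting through an untyped pointer. Each request
fixes the type that "where" must really point to. */
int demo_config(int what, void *where)
{
  assert(where != NULL);
  switch (what)
    {
    case DEMO_CONFIG_MALLOC_THRESHOLD:
    *((int *)where) = DEMO_MALLOC_THRESHOLD;                 /* int result */
    break;

    case DEMO_CONFIG_PARENS_LIMIT:
    *((unsigned long int *)where) = DEMO_PARENS_NEST_LIMIT;  /* long result */
    break;

    default:
    return -1;                      /* Unknown request (assumed convention) */
    }
  return 0;
}
```

A caller would declare "unsigned long int limit;" and pass "&limit" for
the parentheses request. Passing the address of an int there would let
the callee write past the end of the variable on platforms where long is
wider than int.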

Modified: code/trunk/pcre_internal.h
===================================================================
--- code/trunk/pcre_internal.h    2013-11-05 15:24:45 UTC (rev 1388)
+++ code/trunk/pcre_internal.h    2013-11-05 18:05:29 UTC (rev 1389)
@@ -2335,9 +2335,10 @@
        ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59,
        ERR60, ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69,
        ERR70, ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79,
-       ERR80, ERR81, ERRCOUNT };
+       ERR80, ERR81, ERR82, ERRCOUNT };


 /* JIT compiling modes. The function list is indexed by them. */
+
 enum { JIT_COMPILE, JIT_PARTIAL_SOFT_COMPILE, JIT_PARTIAL_HARD_COMPILE,
        JIT_NUMBER_OF_COMPILE_MODES };


@@ -2490,6 +2491,7 @@
   int  top_backref;                 /* Maximum back reference */
   unsigned int backref_map;         /* Bitmap of low back refs */
   unsigned int namedrefcount;       /* Number of backreferences by name */
+  int  parens_depth;                /* Depth of nested parentheses */ 
   int  assert_depth;                /* Depth of nested assertions */
   pcre_uint32 external_options;     /* External (initial) options */
   pcre_uint32 external_flags;       /* External flag bits to be set */
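
This commit extends two parallel structures: the ERRnn enum gains ERR82
here, and pcre_compile.c gains one more NUL-separated string at the end
of its error text table. The two must stay in the same order, because the
enum value is used as an index into the table. The sketch below shows one
way such a table can be indexed. The lookup routine PCRE itself uses is
not part of this diff, so demo_find_error_text, the three-entry table and
the DEMO_ERR* names are illustrative assumptions.

```c
#include <assert.h>

/* Hypothetical, shortened error enum; DEMO_ERRCOUNT is one past the end. */
enum { DEMO_ERR0, DEMO_ERR1, DEMO_ERR2, DEMO_ERRCOUNT };

/* One string per enum value, in the same order, each ending in \0. The
compiler appends a further NUL, so the table ends with an empty string. */
static const char demo_error_texts[] =
  "no error\0"
  "missing opening brace after \\o\0"
  "parentheses are too deeply nested\0"
  ;

/* Return the n-th text by skipping n NUL-terminated strings. */
const char *demo_find_error_text(int n)
{
  const char *s = demo_error_texts;
  assert(n >= 0);
  for (; n > 0; n--)
    {
    while (*s++ != '\0') {}         /* Step past one string and its NUL */
    if (*s == '\0') return "error text not found";  /* Ran off the table */
    }
  return s;
}
```

If a new enum value were added without a matching string, a lookup for it
would run off the end of the table and fall into the "not found" branch
rather than returning the intended message.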


Modified: code/trunk/pcreposix.c
===================================================================
--- code/trunk/pcreposix.c    2013-11-05 15:24:45 UTC (rev 1388)
+++ code/trunk/pcreposix.c    2013-11-05 18:05:29 UTC (rev 1389)
@@ -167,7 +167,8 @@
   REG_BADPAT,  /* non-hex character in \\x{} (closing brace missing?) */ 
   /* 80 */ 
   REG_BADPAT,  /* non-octal character in \o{} (closing brace missing?) */ 
-  REG_BADPAT   /* missing opening brace after \o */
+  REG_BADPAT,  /* missing opening brace after \o */
+  REG_BADPAT   /* parentheses too deeply nested */
 };


/* Table of texts corresponding to POSIX error codes */

Modified: code/trunk/pcretest.c
===================================================================
--- code/trunk/pcretest.c    2013-11-05 15:24:45 UTC (rev 1388)
+++ code/trunk/pcretest.c    2013-11-05 18:05:29 UTC (rev 1389)
@@ -3303,6 +3303,8 @@
     printf("  Internal link size = %d\n", rc);
     (void)PCRE_CONFIG(PCRE_CONFIG_POSIX_MALLOC_THRESHOLD, &rc);
     printf("  POSIX malloc threshold = %d\n", rc);
+    (void)PCRE_CONFIG(PCRE_CONFIG_PARENS_LIMIT, &lrc);
+    printf("  Parentheses nest limit = %ld\n", lrc);
     (void)PCRE_CONFIG(PCRE_CONFIG_MATCH_LIMIT, &lrc);
     printf("  Default match limit = %ld\n", lrc);
     (void)PCRE_CONFIG(PCRE_CONFIG_MATCH_LIMIT_RECURSION, &lrc);