[Pcre-svn] [787] code/trunk: Fix uninitialized memory use wh…

Author: Subversion repository
Date:
To: pcre-svn
Subject: [Pcre-svn] [787] code/trunk: Fix uninitialized memory use when writing study data to file if no starting
Revision: 787
          http://vcs.pcre.org/viewvc?view=rev&revision=787
Author:   ph10
Date:     2011-12-06 15:37:24 +0000 (Tue, 06 Dec 2011)


Log Message:
-----------
Fix uninitialized memory use when writing study data to file if no starting
byte set exists.
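
The idea behind the fix can be sketched in isolation as follows. This is a
minimal illustration, not the PCRE source: the struct layout, the names
demo_study_data, DEMO_STUDY_MAPPED and demo_fill_study are hypothetical
stand-ins, and only the memcpy/memset pattern mirrors the change to
pcre_study.c shown in the diff below.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the study block; NOT PCRE's real layout. */
typedef struct {
  unsigned long size;
  unsigned long flags;
  unsigned char start_bits[32];   /* 256-bit map of possible first bytes */
} demo_study_data;

#define DEMO_STUDY_MAPPED 0x01    /* hypothetical flag: start_bits is in use */

/* Fill in the block so that every byte of start_bits is defined, whether or
not any starting bytes were found. The flag is set only when the map carries
information, so a matcher can skip the lookup when it is clear. */
static void demo_fill_study(demo_study_data *study,
                            const unsigned char *start_bits, int bits_set)
{
  assert(study != NULL && start_bits != NULL);

  study->size = sizeof(demo_study_data);
  study->flags = 0;

  if (bits_set)
    {
    study->flags |= DEMO_STUDY_MAPPED;
    memcpy(study->start_bits, start_bits, sizeof(study->start_bits));
    }
  else
    {
    /* Without this branch the map would keep whatever the allocator left
    there, and writing the whole block to a file would read undefined bytes. */
    memset(study->start_bits, 0, sizeof(study->start_bits));
    }
}
```

With the else branch in place, a caller that writes the whole block to a file
(for example with fwrite) emits only defined bytes for the map, which is what
silences the valgrind report; the unused map is never consulted because the
flag stays clear.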

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/pcre_study.c


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2011-12-06 11:33:41 UTC (rev 786)
+++ code/trunk/ChangeLog    2011-12-06 15:37:24 UTC (rev 787)
@@ -22,87 +22,92 @@


 6.  Lookbehinds such as (?<=a{2}b) that contained a fixed repetition were
     erroneously being rejected as "not fixed length" if PCRE_CASELESS was set.
-    This bug was probably introduced by change 9 of 8.13. 
-    
+    This bug was probably introduced by change 9 of 8.13.
+
 7.  While fixing 6 above, I noticed that a number of other items were being
-    incorrectly rejected as "not fixed length". This arose partly because newer 
+    incorrectly rejected as "not fixed length". This arose partly because newer
     opcodes had not been added to the fixed-length checking code. I have (a)
     corrected the bug and added tests for these items, and (b) arranged for an
     error to occur if an unknown opcode is encountered while checking for fixed
-    length instead of just assuming "not fixed length". The items that were 
-    rejected were: (*ACCEPT), (*COMMIT), (*FAIL), (*MARK), (*PRUNE), (*SKIP), 
-    (*THEN), \h, \H, \v, \V, and single character negative classes with fixed 
+    length instead of just assuming "not fixed length". The items that were
+    rejected were: (*ACCEPT), (*COMMIT), (*FAIL), (*MARK), (*PRUNE), (*SKIP),
+    (*THEN), \h, \H, \v, \V, and single character negative classes with fixed
     repetitions, e.g. [^a]{3}, with and without PCRE_CASELESS.
-    
+
 8.  A possessively repeated conditional subpattern such as (?(?=c)c|d)++ was
-    being incorrectly compiled and would have given unpredicatble results. 
-    
-9.  A possessively repeated subpattern with minimum repeat count greater than 
+    being incorrectly compiled and would have given unpredicatble results.
+
+9.  A possessively repeated subpattern with minimum repeat count greater than
     one behaved incorrectly. For example, (A){2,}+ behaved as if it was
-    (A)(A)++ which meant that, after a subsequent mismatch, backtracking into 
-    the first (A) could occur when it should not. 
-    
-10. Add a cast and remove a redundant test from the code. 
+    (A)(A)++ which meant that, after a subsequent mismatch, backtracking into
+    the first (A) could occur when it should not.


+10. Add a cast and remove a redundant test from the code.
+
11. JIT should use pcre_malloc/pcre_free for allocation.

 12. Updated pcre-config so that it no longer shows -L/usr/lib, which seems
-    best practice nowadays, and helps with cross-compiling. (If the exec_prefix 
-    is anything other than /usr, -L is still shown). 
-    
+    best practice nowadays, and helps with cross-compiling. (If the exec_prefix
+    is anything other than /usr, -L is still shown).
+
 13. In non-UTF-8 mode, \C is now supported in lookbehinds and DFA matching.


 14. Perl does not support \N without a following name in a [] class; PCRE now
     also gives an error.
-    
+
 15. If a forward reference was repeated with an upper limit of around 2000,
-    it caused the error "internal error: overran compiling workspace". The 
+    it caused the error "internal error: overran compiling workspace". The
     maximum number of forward references (including repeats) was limited by the
-    internal workspace, and dependent on the LINK_SIZE. The code has been 
-    rewritten so that the workspace expands (via pcre_malloc) if necessary, and 
-    the default depends on LINK_SIZE. There is a new upper limit (for safety) 
-    of around 200,000 forward references. While doing this, I also speeded up 
-    the filling in of repeated forward references. 
-    
+    internal workspace, and dependent on the LINK_SIZE. The code has been
+    rewritten so that the workspace expands (via pcre_malloc) if necessary, and
+    the default depends on LINK_SIZE. There is a new upper limit (for safety)
+    of around 200,000 forward references. While doing this, I also speeded up
+    the filling in of repeated forward references.
+
 16. A repeated forward reference in a pattern such as (a)(?2){2}(.) was
     incorrectly expecting the subject to contain another "a" after the start.
-    
-17. When (*SKIP:name) is activated without a corresponding (*MARK:name) earlier 
-    in the match, the SKIP should be ignored. This was not happening; instead 
-    the SKIP was being treated as NOMATCH. For patterns such as 
-    /A(*MARK:A)A+(*SKIP:B)Z|AAC/ this meant that the AAC branch was never 
-    tested. 
-    
+
+17. When (*SKIP:name) is activated without a corresponding (*MARK:name) earlier
+    in the match, the SKIP should be ignored. This was not happening; instead
+    the SKIP was being treated as NOMATCH. For patterns such as
+    /A(*MARK:A)A+(*SKIP:B)Z|AAC/ this meant that the AAC branch was never
+    tested.
+
 18. The behaviour of (*MARK), (*PRUNE), and (*THEN) has been reworked and is
     now much more compatible with Perl, in particular in cases where the result
     is a non-match for a non-anchored pattern. For example, if
     /b(*:m)f|a(*:n)w/ is matched against "abc", the non-match returns the name
-    "m", where previously it did not return a name. A side effect of this 
-    change is that for partial matches, the last encountered mark name is 
+    "m", where previously it did not return a name. A side effect of this
+    change is that for partial matches, the last encountered mark name is
     returned, as for non matches. A number of tests that were previously not
     Perl-compatible have been moved into the Perl-compatible test files. The
     refactoring has had the pleasing side effect of removing one argument from
     the match() function, thus reducing its stack requirements.
-    
-19. If the /S+ option was used in pcretest to study a pattern using JIT, 
+
+19. If the /S+ option was used in pcretest to study a pattern using JIT,
     subsequent uses of /S (without +) incorrectly behaved like /S+.
-    
+
 21. Retrieve executable code size support for the JIT compiler and fixing
     some warnings.
-    
-22. A caseless match of a UTF-8 character whose other case uses fewer bytes did 
-    not work when the shorter character appeared right at the end of the 
+
+22. A caseless match of a UTF-8 character whose other case uses fewer bytes did
+    not work when the shorter character appeared right at the end of the
     subject string.
+
+23. Added some (int) casts to non-JIT modules to reduce warnings on 64-bit
+    systems.
+
+24. Added PCRE_INFO_JITSIZE to pass on the value from (21) above, and also
+    output it when the /M option is used in pcretest.
+
+25. The CheckMan script was not being included in the distribution. Also, added
+    an explicit "perl" to run Perl scripts from the PrepareRelease script
+    because this is reportedly needed in Windows.


-23. Added some (int) casts to non-JIT modules to reduce warnings on 64-bit 
-    systems. 
-    
-24. Added PCRE_INFO_JITSIZE to pass on the value from (21) above, and also 
-    output it when the /M option is used in pcretest. 
-    
-25. The CheckMan script was not being included in the distribution. Also, added
-    an explicit "perl" to run Perl scripts from the PrepareRelease script 
-    because this is reportedly needed in Windows. 
+26. If study data was being save in a file and studying had not found a set of
+    "starts with" bytes for the pattern, the data written to the file (though 
+    never used) was taken from uninitialized memory and so caused valgrind to
+    complain.  



Version 8.20 21-Oct-2011

Modified: code/trunk/pcre_study.c
===================================================================
--- code/trunk/pcre_study.c    2011-12-06 11:33:41 UTC (rev 786)
+++ code/trunk/pcre_study.c    2011-12-06 15:37:24 UTC (rev 787)
@@ -286,8 +286,8 @@
     cc++;
     break;


-    /* The single-byte matcher means we can't proceed in UTF-8 mode. (In 
-    non-UTF-8 mode \C will actually be turned into OP_ALLANY, so won't ever 
+    /* The single-byte matcher means we can't proceed in UTF-8 mode. (In
+    non-UTF-8 mode \C will actually be turned into OP_ALLANY, so won't ever
     appear, but leave the code, just in case.) */


     case OP_ANYBYTE:
@@ -1321,12 +1321,17 @@


study->size = sizeof(pcre_study_data);
study->flags = 0;
+
+ /* Set the start bits always, to avoid unset memory errors if the
+ study data is written to a file, but set the flag only if any of the bits
+ are set, to save time looking when none are. */

-  if (bits_set)
+  if (bits_set) 
     {
     study->flags |= PCRE_STUDY_MAPPED;
     memcpy(study->start_bits, start_bits, sizeof(start_bits));
     }
+  else memset(study->start_bits, 0, 32 * sizeof(uschar));


/* Always set the minlength value in the block, because the JIT compiler
makes use of it. However, don't set the bit unless the length is greater than