Revision: 657
http://vcs.pcre.org/viewvc?view=rev&revision=657
Author: ph10
Date: 2011-08-15 18:39:09 +0100 (Mon, 15 Aug 2011)
Log Message:
-----------
Fix pcre_study() bug with \b at start of branch.
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/pcre_study.c
code/trunk/testdata/testinput2
code/trunk/testdata/testoutput2
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2011-08-13 12:27:51 UTC (rev 656)
+++ code/trunk/ChangeLog 2011-08-15 17:39:09 UTC (rev 657)
@@ -245,6 +245,9 @@
47. The pattern /f.*/8s, when applied to "for" with PCRE_PARTIAL_HARD, gave a
complete match instead of a partial match. This bug was dependent on both
the PCRE_UTF8 and PCRE_DOTALL options being set.
+
+48. For a pattern such as /\babc|\bdef/ pcre_study() was failing to set up the
+ starting byte set, because \b was not being ignored.
Version 8.12 15-Jan-2011
Modified: code/trunk/pcre_study.c
===================================================================
--- code/trunk/pcre_study.c 2011-08-13 12:27:51 UTC (rev 656)
+++ code/trunk/pcre_study.c 2011-08-15 17:39:09 UTC (rev 657)
@@ -773,7 +773,6 @@
case OP_NOTUPTOI:
case OP_NOT_HSPACE:
case OP_NOT_VSPACE:
- case OP_NOT_WORD_BOUNDARY:
case OP_NRREF:
case OP_PROP:
case OP_PRUNE:
@@ -791,10 +790,16 @@
case OP_SOM:
case OP_THEN:
case OP_THEN_ARG:
- case OP_WORD_BOUNDARY:
case OP_XCLASS:
return SSB_FAIL;
+ /* We can ignore word boundary tests. */
+
+ case OP_WORD_BOUNDARY:
+ case OP_NOT_WORD_BOUNDARY:
+ tcode++;
+ break;
+
/* If we hit a bracket or a positive lookahead assertion, recurse to set
bits from within the subpattern. If it can't find anything, we have to
give up. If it finds some mandatory character(s), we are done for this
Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2 2011-08-13 12:27:51 UTC (rev 656)
+++ code/trunk/testdata/testinput2 2011-08-15 17:39:09 UTC (rev 657)
@@ -3837,4 +3837,8 @@
/.(*F)/
\P\Pabc
+/\btype\b\W*?\btext\b\W*?\bjavascript\b/IS
+
+/\btype\b\W*?\btext\b\W*?\bjavascript\b|\burl\b\W*?\bshell:|<input\b.*?\btype\b\W*?\bimage\b|\bonkeyup\b\W*?\=/IS
+
/-- End of testinput2 --/
Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2011-08-13 12:27:51 UTC (rev 656)
+++ code/trunk/testdata/testoutput2 2011-08-15 17:39:09 UTC (rev 657)
@@ -12217,4 +12217,20 @@
\P\Pabc
No match
+/\btype\b\W*?\btext\b\W*?\bjavascript\b/IS
+Capturing subpattern count = 0
+No options
+First char = 't'
+Need char = 't'
+Subject length lower bound = 18
+No set of starting bytes
+
+/\btype\b\W*?\btext\b\W*?\bjavascript\b|\burl\b\W*?\bshell:|<input\b.*?\btype\b\W*?\bimage\b|\bonkeyup\b\W*?\=/IS
+Capturing subpattern count = 0
+No options
+No first char
+No need char
+Subject length lower bound = 8
+Starting byte set: < o t u
+
/-- End of testinput2 --/
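
The effect of the pcre_study.c change can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not PCRE's code: the toy_* names, the five-opcode set and the flat item array are all hypothetical, and the second pattern keeps only the leading items of each branch of the four-branch test pattern. It shows why stepping over a zero-width word boundary test lets the scan reach the first literal of every branch, and why meeting one without skipping it forces the scan to give up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified opcodes; PCRE's real opcode table is larger. */
typedef enum { T_END, T_ALT, T_CHAR, T_WORD_BOUNDARY, T_NOT_WORD_BOUNDARY } toy_op;
typedef struct { toy_op op; unsigned char ch; } toy_item;

enum { TOY_FAIL = 0, TOY_DONE = 1 };

/* Record the first literal byte of every branch in a 256-bit map.
   With ignore_boundaries == 0 a leading \b or \B makes the scan give up
   (the behaviour before the fix); with 1 it is stepped over (after the fix). */
static int toy_set_start_bits(const toy_item *t, unsigned char bits[32],
                              int ignore_boundaries)
{
  memset(bits, 0, 32);
  for (;;) {
    /* t points at the first item of a branch. */
    while (ignore_boundaries &&
           (t->op == T_WORD_BOUNDARY || t->op == T_NOT_WORD_BOUNDARY))
      t++;                               /* zero-width: consumes no subject byte */
    if (t->op != T_CHAR) return TOY_FAIL; /* nothing usable at branch start */
    bits[t->ch / 8] |= (unsigned char)(1u << (t->ch % 8));
    while (t->op != T_ALT && t->op != T_END) t++;  /* skip rest of branch */
    if (t->op == T_END) return TOY_DONE;
    t++;                                 /* step past the alternation */
  }
}

static int toy_bit_is_set(const unsigned char bits[32], unsigned char c)
{
  return (bits[c / 8] >> (c % 8)) & 1;
}

/* Write the set members in ascending byte order, separated by spaces. */
static void toy_format_bits(const unsigned char bits[32], char *out, size_t size)
{
  size_t n = 0;
  int c;
  if (size == 0) return;
  for (c = 0; c < 256 && n + 2 < size; c++) {
    if (!toy_bit_is_set(bits, (unsigned char)c)) continue;
    if (n > 0) out[n++] = ' ';
    out[n++] = (char)c;
  }
  out[n] = '\0';
}

/* Model of /\babc|\bdef/ from ChangeLog entry 48. */
static const toy_item toy_babc_bdef[] = {
  { T_WORD_BOUNDARY, 0 }, { T_CHAR, 'a' }, { T_CHAR, 'b' }, { T_CHAR, 'c' },
  { T_ALT, 0 },
  { T_WORD_BOUNDARY, 0 }, { T_CHAR, 'd' }, { T_CHAR, 'e' }, { T_CHAR, 'f' },
  { T_END, 0 }
};

/* Leading items only of the four branches in the second new test pattern:
   \btype..., \burl..., <input..., \bonkeyup... */
static const toy_item toy_four_branches[] = {
  { T_WORD_BOUNDARY, 0 }, { T_CHAR, 't' }, { T_ALT, 0 },
  { T_WORD_BOUNDARY, 0 }, { T_CHAR, 'u' }, { T_ALT, 0 },
  { T_CHAR, '<' }, { T_ALT, 0 },
  { T_WORD_BOUNDARY, 0 }, { T_CHAR, 'o' }, { T_END, 0 }
};
```

To try it, call toy_set_start_bits() on either array from a main() of your own, once with ignore_boundaries set to 0 and once with 1, and pass the resulting map to toy_format_bits(). In this model the four-branch array yields the members <, o, t and u, the same bytes listed in the "Starting byte set" line of the new test output.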