[Pcre-svn] [1272] code/trunk: Fix Bugzilla #2642: no match b…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [1272] code/trunk: Fix Bugzilla #2642: no match bug in 8-bit mode for caseless invalid utf
Revision: 1272
          http://www.exim.org/viewvc/pcre2?view=rev&revision=1272
Author:   ph10
Date:     2020-09-15 15:36:23 +0100 (Tue, 15 Sep 2020)
Log Message:
-----------
Fix Bugzilla #2642: no match bug in 8-bit mode for caseless invalid utf 
matching.


Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/src/pcre2_match.c
    code/trunk/testdata/testinput10
    code/trunk/testdata/testoutput10


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2020-09-14 15:45:34 UTC (rev 1271)
+++ code/trunk/ChangeLog    2020-09-15 14:36:23 UTC (rev 1272)
@@ -66,7 +66,14 @@
 12. Further to 10 above, pcre2test has been updated to detect and grumble if a 
 delimiter other than / is used after #perltest.


+13. Fixed a bug with PCRE2_MATCH_INVALID_UTF in 8-bit mode when PCRE2_CASELESS
+was set and PCRE2_NO_START_OPTIMIZE was not set. The optimization for finding
+the start of a match was not resetting correctly after a failed match on the
+first valid fragment of the subject, possibly causing incorrect "no match"
+returns on subsequent fragments. For example, the pattern /A/ failed to match
+the subject \xe5A. Fixes Bugzilla #2642.

+
Version 10.35 09-May-2020
---------------------------
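
The ChangeLog entry above describes a "not found" flag that outlives the fragment it was computed for. The following is a minimal, self-contained sketch of that failure mode, not the code in pcre2_match.c: the fragment type, the find_first_cu() helper and the reset_per_fragment switch are all hypothetical names introduced for illustration. It also assumes that the invalid byte \xe5 leaves an empty first valid fragment ahead of the fragment "A".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one valid-UTF fragment of the subject. */
typedef struct {
  const char *start;
  size_t len;
} fragment;

/* Scan the fragments in order for the first occurrence of cu or its
   other-case form cu2. The two flags remember that memchr() has already
   failed, so that later scans can be skipped. With reset_per_fragment
   false, the flags are initialised once before the loop (the behaviour
   the ChangeLog describes as buggy); with it true, they are cleared for
   every fragment (the behaviour after the fix). */
static const char *find_first_cu(const fragment *frags, size_t nfrags,
                                 char cu, char cu2, bool reset_per_fragment)
{
  bool not_found_cu = false;
  bool not_found_cu2 = false;

  for (size_t i = 0; i < nfrags; i++) {
    const char *p1 = NULL;
    const char *p2 = NULL;

    if (reset_per_fragment) {
      not_found_cu = false;
      not_found_cu2 = false;
    }
    if (!not_found_cu) {
      p1 = memchr(frags[i].start, cu, frags[i].len);
      not_found_cu = (p1 == NULL);
    }
    if (!not_found_cu2) {
      p2 = memchr(frags[i].start, cu2, frags[i].len);
      not_found_cu2 = (p2 == NULL);
    }

    /* Report the earliest candidate start within this fragment. */
    if (p1 != NULL && p2 != NULL) return (p1 < p2) ? p1 : p2;
    if (p1 != NULL) return p1;
    if (p2 != NULL) return p2;
  }
  return NULL;
}
```

Given the two fragments (an empty one, then "A") and the code units 'A' and 'a', the call with reset_per_fragment false returns NULL, because both flags are set while scanning the empty fragment and the scan of "A" is then skipped. The call with reset_per_fragment true returns a pointer to the 'A'. In C the subject has to be spelled "\xe5" "A", since "\xe5A" would be read as a single hexadecimal escape.
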


Modified: code/trunk/src/pcre2_match.c
===================================================================
--- code/trunk/src/pcre2_match.c    2020-09-14 15:45:34 UTC (rev 1271)
+++ code/trunk/src/pcre2_match.c    2020-09-15 14:36:23 UTC (rev 1272)
@@ -6115,8 +6115,8 @@
 BOOL startline;


#if PCRE2_CODE_UNIT_WIDTH == 8
-BOOL memchr_not_found_first_cu = FALSE;
-BOOL memchr_not_found_first_cu2 = FALSE;
+BOOL memchr_not_found_first_cu;
+BOOL memchr_not_found_first_cu2;
#endif

PCRE2_UCHAR first_cu = 0;
@@ -6709,6 +6709,11 @@
start_partial = match_partial = NULL;
mb->hitend = FALSE;

+#if PCRE2_CODE_UNIT_WIDTH == 8
+memchr_not_found_first_cu = FALSE;
+memchr_not_found_first_cu2 = FALSE;
+#endif
+
 for(;;)
   {
   PCRE2_SPTR new_start_match;
@@ -7187,6 +7192,7 @@
     starting code units in 8-bit and 16-bit modes. */


     start_match = end_subject + 1;
+    
 #if PCRE2_CODE_UNIT_WIDTH != 32
     while (start_match < true_end_subject && NOT_FIRSTCU(*start_match))
       start_match++;


Modified: code/trunk/testdata/testinput10
===================================================================
--- code/trunk/testdata/testinput10    2020-09-14 15:45:34 UTC (rev 1271)
+++ code/trunk/testdata/testinput10    2020-09-15 14:36:23 UTC (rev 1272)
@@ -610,4 +610,7 @@
 /X(\x{e1})Y/replace=>\U$1<,substitute_extended
     X\x{e1}Y


+/A/utf,match_invalid_utf,caseless
+    \xe5A
+
 # End of testinput10


Modified: code/trunk/testdata/testoutput10
===================================================================
--- code/trunk/testdata/testoutput10    2020-09-14 15:45:34 UTC (rev 1271)
+++ code/trunk/testdata/testoutput10    2020-09-15 14:36:23 UTC (rev 1272)
@@ -1871,4 +1871,8 @@
     X\x{e1}Y
  1: >\xe1<


+/A/utf,match_invalid_utf,caseless
+    \xe5A
+ 0: A
+
 # End of testinput10