[Pcre-svn] [1488] code/trunk: Recognize characters with mult…

From: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [1488] code/trunk: Recognize characters with multiple other cases when creating starting bit map.
Revision: 1488
          http://vcs.pcre.org/viewvc?view=rev&revision=1488
Author:   ph10
Date:     2014-06-18 19:38:00 +0100 (Wed, 18 Jun 2014)


Log Message:
-----------
Recognize characters with multiple other cases when creating starting bit map.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/pcre_study.c
    code/trunk/testdata/testinput16
    code/trunk/testdata/testinput19
    code/trunk/testdata/testoutput16
    code/trunk/testdata/testoutput19


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2014-06-18 17:17:03 UTC (rev 1487)
+++ code/trunk/ChangeLog    2014-06-18 18:38:00 UTC (rev 1488)
@@ -71,6 +71,11 @@


 14. If a character class started [\Qx]... where x is any character, the class
     was incorrectly terminated at the ].
+    
+15. If a pattern that started with a caseless match for a character with more 
+    than one "other case" was studied, PCRE did not set up the starting code 
+    unit bit map for the list of possible characters. Now it does. This is an 
+    optimization improvement, not a bug fix.



Version 8.35 04-April-2014

Modified: code/trunk/pcre_study.c
===================================================================
--- code/trunk/pcre_study.c    2014-06-18 17:17:03 UTC (rev 1487)
+++ code/trunk/pcre_study.c    2014-06-18 18:38:00 UTC (rev 1488)
@@ -863,7 +863,6 @@
       case OP_NOTUPTOI:
       case OP_NOT_HSPACE:
       case OP_NOT_VSPACE:
-      case OP_PROP:
       case OP_PRUNE:
       case OP_PRUNE_ARG:
       case OP_RECURSE:
@@ -880,6 +879,31 @@
       case OP_THEN:
       case OP_THEN_ARG:
       return SSB_FAIL;
+      
+      /* A "real" property test implies no starting bits, but the fake property
+      PT_CLIST identifies a list of characters. These lists are short, as they
+      are used for characters with more than one "other case", so there is no
+      point in recognizing them for OP_NOTPROP. */
+                                                                    
+      case OP_PROP:                                                     
+      if (tcode[1] != PT_CLIST) return SSB_FAIL;                       
+        {                                                              
+        const pcre_uint32 *p = PRIV(ucd_caseless_sets) + tcode[2];            
+        while ((c = *p++) < NOTACHAR)                           
+          {                                                                  
+#if defined SUPPORT_UTF && defined COMPILE_PCRE8         
+          if (utf)                  
+            {                                                        
+            pcre_uchar buff[6];                  
+            (void)PRIV(ord2utf)(c, buff);
+            c = buff[0];
+            }                                      
+#endif                                                          
+          if (c > 0xff) SET_BIT(0xff); else SET_BIT(c);
+          }        
+        }                
+      try_next = FALSE;    
+      break;               


       /* We can ignore word boundary tests. */



Modified: code/trunk/testdata/testinput16
===================================================================
--- code/trunk/testdata/testinput16    2014-06-18 17:17:03 UTC (rev 1487)
+++ code/trunk/testdata/testinput16    2014-06-18 18:38:00 UTC (rev 1488)
@@ -32,4 +32,10 @@


/[[:blank:]]/WBZ

+/\x{212a}+/i8SI
+    KKkk\x{212a}
+
+/s+/i8SI
+    SSss\x{17f}
+
 /-- End of testinput16 --/


Modified: code/trunk/testdata/testinput19
===================================================================
--- code/trunk/testdata/testinput19    2014-06-18 17:17:03 UTC (rev 1487)
+++ code/trunk/testdata/testinput19    2014-06-18 18:38:00 UTC (rev 1488)
@@ -19,4 +19,10 @@


/[[:blank:]]/WBZ

+/\x{212a}+/i8SI
+    KKkk\x{212a}
+
+/s+/i8SI
+    SSss\x{17f}
+
 /-- End of testinput19 --/ 


Modified: code/trunk/testdata/testoutput16
===================================================================
--- code/trunk/testdata/testoutput16    2014-06-18 17:17:03 UTC (rev 1487)
+++ code/trunk/testdata/testoutput16    2014-06-18 18:38:00 UTC (rev 1488)
@@ -118,4 +118,24 @@
         End
 ------------------------------------------------------------------


+/\x{212a}+/i8SI
+Capturing subpattern count = 0
+Options: caseless utf
+No first char
+No need char
+Subject length lower bound = 1
+Starting chars: K k \xe2 
+    KKkk\x{212a}
+ 0: KKkk\x{212a}
+
+/s+/i8SI
+Capturing subpattern count = 0
+Options: caseless utf
+No first char
+No need char
+Subject length lower bound = 1
+Starting chars: S s \xc5 
+    SSss\x{17f}
+ 0: SSss\x{17f}
+
 /-- End of testinput16 --/


Modified: code/trunk/testdata/testoutput19
===================================================================
--- code/trunk/testdata/testoutput19    2014-06-18 17:17:03 UTC (rev 1487)
+++ code/trunk/testdata/testoutput19    2014-06-18 18:38:00 UTC (rev 1488)
@@ -85,4 +85,24 @@
         End
 ------------------------------------------------------------------


+/\x{212a}+/i8SI
+Capturing subpattern count = 0
+Options: caseless utf
+No first char
+No need char
+Subject length lower bound = 1
+Starting chars: K k \xff 
+    KKkk\x{212a}
+ 0: KKkk\x{212a}
+
+/s+/i8SI
+Capturing subpattern count = 0
+Options: caseless utf
+No first char
+No need char
+Subject length lower bound = 1
+Starting chars: S s \xff 
+    SSss\x{17f}
+ 0: SSss\x{17f}
+
 /-- End of testinput19 --/
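
The difference between the two expected outputs above (\xe2 and \xc5 in testoutput16, \xff in testoutput19) follows from how the new OP_PROP case fills the bit map. The following is a minimal standalone sketch of that logic, not the PCRE source. The names start_map, set_bit, test_bit, utf8_first_byte and add_caseless_set are hypothetical, and the character lists are taken from the test data rather than from PCRE's ucd_caseless_sets table. It assumes that in 8-bit UTF mode only the first byte of the UTF-8 encoding is recorded, and that in wider modes every code unit above 0xff shares bit 0xff, as the patch does.

```c
#include <stdint.h>
#include <string.h>

#define NOTACHAR 0xffffffffu   /* list terminator, as used in the patch */

/* Hypothetical stand-in for the 256-bit starting code unit map. */
typedef struct { uint8_t bits[32]; } start_map;

static void start_map_clear(start_map *m) { memset(m->bits, 0, sizeof m->bits); }

static void set_bit(start_map *m, uint32_t c)
{
  m->bits[c / 8] |= (uint8_t)(1u << (c % 8));
}

static int test_bit(const start_map *m, uint32_t c)
{
  return (m->bits[c / 8] >> (c % 8)) & 1;
}

/* First byte of the UTF-8 encoding of code point c (c <= 0x10ffff). */
static uint8_t utf8_first_byte(uint32_t c)
{
  if (c < 0x80)    return (uint8_t)c;
  if (c < 0x800)   return (uint8_t)(0xc0 | (c >> 6));
  if (c < 0x10000) return (uint8_t)(0xe0 | (c >> 12));
  return (uint8_t)(0xf0 | (c >> 18));
}

/* Record every character of a NOTACHAR-terminated caseless set.
   utf8_mode != 0 models the 8-bit library in UTF mode; utf8_mode == 0
   models the wider libraries, where anything above 0xff maps to bit 0xff. */
static void add_caseless_set(start_map *m, const uint32_t *p, int utf8_mode)
{
  uint32_t c;
  while ((c = *p++) < NOTACHAR)
    {
    if (utf8_mode) c = utf8_first_byte(c);
    set_bit(m, c > 0xff ? 0xff : c);
    }
}
```

For the list { 'K', 'k', 0x212a, NOTACHAR } this sets bits 0x4b, 0x6b and 0xe2 when utf8_mode is non-zero, and bits 0x4b, 0x6b and 0xff otherwise, which corresponds to the "Starting chars" lines in the two test outputs.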