[Pcre-svn] [354] code/trunk: Fix caseless backreferences for…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [354] code/trunk: Fix caseless backreferences for non-ASCII characters.
Revision: 354
          http://vcs.pcre.org/viewvc?view=rev&revision=354
Author:   ph10
Date:     2008-07-07 17:30:33 +0100 (Mon, 07 Jul 2008)


Log Message:
-----------
Fix caseless backreferences for non-ASCII characters.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/pcre_exec.c
    code/trunk/testdata/testinput6
    code/trunk/testdata/testoutput6


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2008-07-07 15:44:24 UTC (rev 353)
+++ code/trunk/ChangeLog    2008-07-07 16:30:33 UTC (rev 354)
@@ -17,6 +17,10 @@
 3.  Change 12 for 7.7 introduced a bug in pcre_study() when a pattern contained
     a group with a zero qualifier. The result of the study could be incorrect,
     or the function might crash, depending on the pattern. 
+    
+4.  Caseless matching was not working for non-ASCII characters in back 
+    references. For example, /(\x{de})\1/8i was not matching \x{de}\x{fe}.
+    It now works when Unicode Property Support is available. 



Version 7.7 07-May-08

Modified: code/trunk/pcre_exec.c
===================================================================
--- code/trunk/pcre_exec.c    2008-07-07 15:44:24 UTC (rev 353)
+++ code/trunk/pcre_exec.c    2008-07-07 16:30:33 UTC (rev 354)
@@ -158,13 +158,39 @@


if (length > md->end_subject - eptr) return FALSE;

-/* Separate the caselesss case for speed */
+/* Separate the caseless case for speed. In UTF-8 mode we can only do this
+properly if Unicode properties are supported. Otherwise, we can check only
+ASCII characters. */

 if ((ims & PCRE_CASELESS) != 0)
   {
+#ifdef SUPPORT_UTF8
+#ifdef SUPPORT_UCP
+  if (md->utf8)
+    {
+    USPTR endptr = eptr + length; 
+    while (eptr < endptr)
+      {
+      int c, d; 
+      GETCHARINC(c, eptr);
+      GETCHARINC(d, p);
+      if (c != d && c != UCD_OTHERCASE(d)) return FALSE;
+      }  
+    }  
+  else
+#endif
+#endif
+
+  /* The same code works when not in UTF-8 mode and in UTF-8 mode when there
+  is no UCP support. */
+   
   while (length-- > 0)
-    if (md->lcc[*p++] != md->lcc[*eptr++]) return FALSE;
+    { if (md->lcc[*p++] != md->lcc[*eptr++]) return FALSE; }
   }
+  
+/* In the caseful case, we can just compare the bytes, whether or not we
+are in UTF-8 mode. */
+ 
 else
   { while (length-- > 0) if (*p++ != *eptr++) return FALSE; }



Modified: code/trunk/testdata/testinput6
===================================================================
--- code/trunk/testdata/testinput6    2008-07-07 15:44:24 UTC (rev 353)
+++ code/trunk/testdata/testinput6    2008-07-07 16:30:33 UTC (rev 354)
@@ -925,4 +925,22 @@
     ** Failers 
     \x{1d79}\x{a77d} 


+/(A)\1/8i
+    AA
+    Aa
+    aa
+    aA
+
+/(\x{de})\1/8i
+    \x{de}\x{de}
+    \x{de}\x{fe}
+    \x{fe}\x{fe}
+    \x{fe}\x{de}
+
+/(\x{10a})\1/8i
+    \x{10a}\x{10a}
+    \x{10a}\x{10b}
+    \x{10b}\x{10b}
+    \x{10b}\x{10a}
+
 / End of testinput6 /


Modified: code/trunk/testdata/testoutput6
===================================================================
--- code/trunk/testdata/testoutput6    2008-07-07 15:44:24 UTC (rev 353)
+++ code/trunk/testdata/testoutput6    2008-07-07 16:30:33 UTC (rev 354)
@@ -1705,4 +1705,46 @@
     \x{1d79}\x{a77d} 
 No match


+/(A)\1/8i
+    AA
+ 0: AA
+ 1: A
+    Aa
+ 0: Aa
+ 1: A
+    aa
+ 0: aa
+ 1: a
+    aA
+ 0: aA
+ 1: a
+
+/(\x{de})\1/8i
+    \x{de}\x{de}
+ 0: \x{de}\x{de}
+ 1: \x{de}
+    \x{de}\x{fe}
+ 0: \x{de}\x{fe}
+ 1: \x{de}
+    \x{fe}\x{fe}
+ 0: \x{fe}\x{fe}
+ 1: \x{fe}
+    \x{fe}\x{de}
+ 0: \x{fe}\x{de}
+ 1: \x{fe}
+
+/(\x{10a})\1/8i
+    \x{10a}\x{10a}
+ 0: \x{10a}\x{10a}
+ 1: \x{10a}
+    \x{10a}\x{10b}
+ 0: \x{10a}\x{10b}
+ 1: \x{10a}
+    \x{10b}\x{10b}
+ 0: \x{10b}\x{10b}
+ 1: \x{10b}
+    \x{10b}\x{10a}
+ 0: \x{10b}\x{10a}
+ 1: \x{10b}
+
 / End of testinput6 /