[Pcre-svn] [1364] code/trunk: Add VT to the set of characters recognized as white space.

Author: Subversion repository
To: pcre-svn

Revision: 1364

          http://vcs.pcre.org/viewvc?view=rev&revision=1364
Author:   ph10
Date:     2013-10-05 16:45:11 +0100 (Sat, 05 Oct 2013)

Log Message:
-----------
Add VT to the set of characters recognized as white space.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/pcreapi.3
    code/trunk/doc/pcrepattern.3
    code/trunk/doc/pcresyntax.3
    code/trunk/pcre_chartables.c.dist
    code/trunk/pcre_compile.c
    code/trunk/pcre_dfa_exec.c
    code/trunk/pcre_exec.c
    code/trunk/pcre_maketables.c
    code/trunk/pcre_study.c
    code/trunk/pcre_xclass.c
    code/trunk/testdata/testoutput1
    code/trunk/testdata/testoutput10
    code/trunk/testdata/testoutput15
    code/trunk/testdata/testoutput18-16
    code/trunk/testdata/testoutput18-32
    code/trunk/testdata/testoutput2
    code/trunk/testdata/testoutput6
    code/trunk/testdata/testoutput7
    code/trunk/testdata/testoutput8

Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/ChangeLog    2013-10-05 15:45:11 UTC (rev 1364)
@@ -87,6 +87,10 @@
     compilation. The code is cleaner, and more cases are handled. The option 
     PCRE_NO_AUTO_POSSESSIFY is added for testing purposes, and the -O and /O 
     options in pcretest are provided to set it.
+    
+18. The character VT has been added to the set of characters that match \s and
+    are generally treated as white space, following this same change in Perl 
+    5.18. There is now no difference between "Perl space" and "POSIX space".

Version 8.33 28-May-2013

Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/doc/pcreapi.3    2013-10-05 15:45:11 UTC (rev 1364)
@@ -1,4 +1,4 @@
-.TH PCREAPI 3 "01 October 2013" "PCRE 8.34"
+.TH PCREAPI 3 "05 October 2013" "PCRE 8.34"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .sp
@@ -645,11 +645,14 @@
   PCRE_EXTENDED
 .sp
 If this bit is set, white space data characters in the pattern are totally
-ignored except when escaped or inside a character class. White space does not
-include the VT character (code 11). In addition, characters between an
-unescaped # outside a character class and the next newline, inclusive, are also
-ignored. This is equivalent to Perl's /x option, and it can be changed within a
-pattern by a (?x) option setting.
+ignored except when escaped or inside a character class. White space did not
+used to include the VT character (code 11), because Perl did not treat this 
+character as white space. However, Perl changed at release 5.18, so PCRE
+followed at release 8.34, and VT is now treated as white space. PCRE_EXTENDED
+also causes characters between an unescaped # outside a character class and the
+next newline, inclusive, to be ignored. PCRE_EXTENDED is equivalent to
+Perl's /x option, and it can be changed within a pattern by a (?x) option
+setting.
 .P
 Which characters are interpreted as newlines is controlled by the options
 passed to \fBpcre_compile()\fP or by a special sequence at the start of the
@@ -2863,6 +2866,6 @@
 .rs
 .sp
 .nf
-Last updated: 01 October 2013
+Last updated: 05 October 2013
 Copyright (c) 1997-2013 University of Cambridge.
 .fi

Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/doc/pcrepattern.3    2013-10-05 15:45:11 UTC (rev 1364)
@@ -1,4 +1,4 @@
-.TH PCREPATTERN 3 "06 September 2013" "PCRE 8.34"
+.TH PCREPATTERN 3 "05 October 2013" "PCRE 8.34"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE REGULAR EXPRESSION DETAILS"
@@ -494,11 +494,10 @@
 matching point is at the end of the subject string, all of them fail, because
 there is no character to match.
 .P
-For compatibility with Perl, \es does not match the VT character (code 11).
-This makes it different from the the POSIX "space" class. The \es characters
-are HT (9), LF (10), FF (12), CR (13), and space (32). If "use locale;" is
-included in a Perl script, \es may match the VT character. In PCRE, it never
-does.
+For compatibility with Perl, \es did not used to match the VT character (code
+11), which made it different from the POSIX "space" class. However, Perl
+added VT at release 5.18, and PCRE followed suit at release 8.34. The \es
+characters are now HT (9), LF (10), VT (11), FF (12), CR (13), and space (32).
 .P
 A "word" character is an underscore or any character that is a letter or digit.
 By default, the definition of letters and digits is controlled by PCRE's
@@ -1296,9 +1295,9 @@
   xdigit   hexadecimal digits
 .sp
 The "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13), and
-space (32). Notice that this list includes the VT character (code 11). This
-makes "space" different to \es, which does not include VT (for Perl
-compatibility).
+space (32). "Space" used to be different to \es, which did not include VT, for
+Perl compatibility. However, Perl changed at release 5.18, and PCRE followed at
+release 8.34. "Space" and \es now match the same set of characters.
 .P
 The name "word" is a Perl extension, and "blank" is a GNU extension from Perl
 5.8. Another Perl extension is negation, which is indicated by a ^ character
@@ -3157,6 +3156,6 @@
 .rs
 .sp
 .nf
-Last updated: 06 September 2013
+Last updated: 05 October 2013
 Copyright (c) 1997-2013 University of Cambridge.
 .fi
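
The documentation change above can be illustrated with a small sketch of the two \s definitions. It covers only 8-bit codes with the default tables (no UTF or UCP handling), and the function names are hypothetical; they are not part of PCRE.

```c
#include <assert.h>
#include <stdbool.h>

/* Character codes named in the documentation text. */
enum { HT = 9, LF = 10, VT = 11, FF = 12, CR = 13, SPACE = 32 };

/* \s as documented before PCRE 8.34: VT is excluded. */
static bool is_s_before_834(int c)
{
  return c == HT || c == LF || c == FF || c == CR || c == SPACE;
}

/* \s as documented from PCRE 8.34: the contiguous range HT..CR plus space. */
static bool is_s_from_834(int c)
{
  return (c >= HT && c <= CR) || c == SPACE;
}

/* Debug check: VT is the only code on which the two definitions disagree. */
static void check_only_vt_differs(void)
{
  int c;
  for (c = 0; c < 256; c++)
    assert((is_s_before_834(c) != is_s_from_834(c)) == (c == VT));
}
```

Because the new set is the contiguous range 9 to 13 plus space, the compiled class in testoutput2 prints as [\x09-\x0d ] rather than as a list of separate codes.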

Modified: code/trunk/doc/pcresyntax.3
===================================================================
--- code/trunk/doc/pcresyntax.3    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/doc/pcresyntax.3    2013-10-05 15:45:11 UTC (rev 1364)
@@ -1,4 +1,4 @@
-.TH PCRESYNTAX 3 "26 April 2013" "PCRE 8.33"
+.TH PCRESYNTAX 3 "05 October 2013" "PCRE 8.34"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -115,10 +115,13 @@
 .sp
   Xan        Alphanumeric: union of properties L and N
   Xps        POSIX space: property Z or tab, NL, VT, FF, CR
-  Xsp        Perl space: property Z or tab, NL, FF, CR
+  Xsp        Perl space: property Z or tab, NL, VT, FF, CR
   Xuc        Univerally-named character: one that can be
                represented by a Universal Character Name
   Xwd        Perl word: property Xan or underscore
+.sp  
+Perl and POSIX space are now the same. Perl added VT to its space character set
+at release 5.18 and PCRE changed at release 8.34.
 .
 .
 .SH "SCRIPT NAMES FOR \ep AND \eP"
@@ -495,6 +498,6 @@
 .rs
 .sp
 .nf
-Last updated: 26 April 2013
+Last updated: 05 October 2013
 Copyright (c) 1997-2013 University of Cambridge.
 .fi

Modified: code/trunk/pcre_chartables.c.dist
===================================================================
--- code/trunk/pcre_chartables.c.dist    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/pcre_chartables.c.dist    2013-10-05 15:45:11 UTC (rev 1364)
@@ -163,7 +163,7 @@
 */

   0x80,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /*   0-  7 */
-  0x00,0x01,0x01,0x00,0x01,0x01,0x00,0x00, /*   8- 15 */
+  0x00,0x01,0x01,0x01,0x01,0x01,0x00,0x00, /*   8- 15 */
   0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /*  16- 23 */
   0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /*  24- 31 */
   0x01,0x00,0x00,0x00,0x80,0x00,0x00,0x00, /*    - '  */

Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/pcre_compile.c    2013-10-05 15:45:11 UTC (rev 1364)
@@ -2650,11 +2650,11 @@
   return (PRIV(ucp_gentype)[prop->chartype] == ucp_L ||
           PRIV(ucp_gentype)[prop->chartype] == ucp_N) == negated;

+  /* Perl space used to exclude VT, but from Perl 5.18 it is included, which
+  means that Perl space and POSIX space are now identical. PCRE was changed
+  at release 8.34. */
+    
   case PT_SPACE:    /* Perl space */
-  return (PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
-          c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR)
-          == negated;
-
   case PT_PXSPACE:  /* POSIX space */
   return (PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
           c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
@@ -4627,21 +4627,20 @@
             for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_word];
             continue;

-            /* Perl 5.004 onwards omits VT from \s, but we must preserve it
-            if it was previously set by something earlier in the character
-            class. Luckily, the value of CHAR_VT is 0x0b in both ASCII and
-            EBCDIC, so we lazily just adjust the appropriate bit. */
+            /* Perl 5.004 onwards omitted VT from \s, but restored it at Perl
+            5.18. Before PCRE 8.34, we had to preserve the VT bit if it was
+            previously set by something earlier in the character class.
+            Luckily, the value of CHAR_VT is 0x0b in both ASCII and EBCDIC, so
+            we could just adjust the appropriate bit. From PCRE 8.34 we no 
+            longer treat \s and \S specially. */

             case ESC_s:
-            classbits[0] |= cbits[cbit_space];
-            classbits[1] |= cbits[cbit_space+1] & ~0x08;
-            for (c = 2; c < 32; c++) classbits[c] |= cbits[c+cbit_space];
+            for (c = 0; c < 32; c++) classbits[c] |= cbits[c+cbit_space];
             continue;

             case ESC_S:
             should_flip_negation = TRUE;
             for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_space];
-            classbits[1] |= 0x08;    /* Perl 5.004 onwards omits VT from \s */
             continue;

             /* The rest apply in both UCP and non-UCP cases. */
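
The removed "& ~0x08" and "|= 0x08" adjustments rely on the layout of a 32-byte class bitmap, where character c lives in byte c/8 under mask 1 << (c%8), so VT (11) is byte 1, mask 0x08. The following is a minimal sketch of that layout and of the old and new \s merges, assuming a stand-in space table; all names here are hypothetical and this is not the PCRE source.

```c
#include <assert.h>
#include <stdint.h>

#define CLASS_BYTES 32u  /* 256 bits, one per 8-bit character code */
#define VT_CODE     11u  /* vertical tab: byte 1, mask 0x08 */

/* Set the bit for character c in a class bitmap. */
static void class_set(uint8_t *bits, unsigned c)
{
  assert(c < 256u);
  bits[c / 8u] |= (uint8_t)(1u << (c % 8u));
}

/* Return 1 if the bit for character c is set, else 0. */
static int class_test(const uint8_t *bits, unsigned c)
{
  assert(c < 256u);
  return (bits[c / 8u] >> (c % 8u)) & 1;
}

/* Fill a bitmap with HT, LF, VT, FF, CR and space (stand-in space table). */
static void build_space_table(uint8_t *space)
{
  static const unsigned codes[] = { 9, 10, 11, 12, 13, 32 };
  unsigned i;
  for (i = 0; i < CLASS_BYTES; i++) space[i] = 0;
  for (i = 0; i < sizeof codes / sizeof codes[0]; i++)
    class_set(space, codes[i]);
}

/* Old behaviour: OR in the table but never add the VT bit. A VT bit that is
already set in classbits is left alone. */
static void merge_space_old(uint8_t *classbits, const uint8_t *space)
{
  unsigned i;
  for (i = 0; i < CLASS_BYTES; i++)
    {
    uint8_t s = space[i];
    if (i == VT_CODE / 8u) s &= (uint8_t)~(1u << (VT_CODE % 8u));
    classbits[i] |= s;
    }
}

/* New behaviour: a plain OR of all 32 bytes, VT included. */
static void merge_space_new(uint8_t *classbits, const uint8_t *space)
{
  unsigned i;
  for (i = 0; i < CLASS_BYTES; i++) classbits[i] |= space[i];
}
```

With this layout, byte 1 of the stand-in table is 0x3e (codes 9 to 13), and masking with ~0x08 clears only the VT bit.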

Modified: code/trunk/pcre_dfa_exec.c
===================================================================
--- code/trunk/pcre_dfa_exec.c    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/pcre_dfa_exec.c    2013-10-05 15:45:11 UTC (rev 1364)
@@ -1098,11 +1098,11 @@
                PRIV(ucp_gentype)[prop->chartype] == ucp_N;
           break;

+          /* Perl space used to exclude VT, but from Perl 5.18 it is included,
+          which means that Perl space and POSIX space are now identical. PCRE
+          was changed at release 8.34. */
+    
           case PT_SPACE:    /* Perl space */
-          OK = PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
-               c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR;
-          break;
-
           case PT_PXSPACE:  /* POSIX space */
           OK = PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
                c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
@@ -1348,11 +1348,11 @@
                PRIV(ucp_gentype)[prop->chartype] == ucp_N;
           break;

+          /* Perl space used to exclude VT, but from Perl 5.18 it is included,
+          which means that Perl space and POSIX space are now identical. PCRE
+          was changed at release 8.34. */
+    
           case PT_SPACE:    /* Perl space */
-          OK = PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
-               c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR;
-          break;
-
           case PT_PXSPACE:  /* POSIX space */
           OK = PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
                c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
@@ -1592,11 +1592,11 @@
                PRIV(ucp_gentype)[prop->chartype] == ucp_N;
           break;

+          /* Perl space used to exclude VT, but from Perl 5.18 it is included,
+          which means that Perl space and POSIX space are now identical. PCRE
+          was changed at release 8.34. */
+    
           case PT_SPACE:    /* Perl space */
-          OK = PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
-               c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR;
-          break;
-
           case PT_PXSPACE:  /* POSIX space */
           OK = PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
                c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
@@ -1861,11 +1861,11 @@
                PRIV(ucp_gentype)[prop->chartype] == ucp_N;
           break;

+          /* Perl space used to exclude VT, but from Perl 5.18 it is included,
+          which means that Perl space and POSIX space are now identical. PCRE
+          was changed at release 8.34. */
+    
           case PT_SPACE:    /* Perl space */
-          OK = PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
-               c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR;
-          break;
-
           case PT_PXSPACE:  /* POSIX space */
           OK = PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
                c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||

Modified: code/trunk/pcre_exec.c
===================================================================
--- code/trunk/pcre_exec.c    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/pcre_exec.c    2013-10-05 15:45:11 UTC (rev 1364)
@@ -2656,13 +2656,11 @@
           RRETURN(MATCH_NOMATCH);
         break;

+        /* Perl space used to exclude VT, but from Perl 5.18 it is included,
+        which means that Perl space and POSIX space are now identical. PCRE
+        was changed at release 8.34. */
+    
         case PT_SPACE:    /* Perl space */
-        if ((PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
-             c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR)
-               == (op == OP_NOTPROP))
-          RRETURN(MATCH_NOMATCH);
-        break;
-
         case PT_PXSPACE:  /* POSIX space */
         if ((PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
              c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
@@ -4283,22 +4281,11 @@
             }
           break;

+          /* Perl space used to exclude VT, but from Perl 5.18 it is included,
+          which means that Perl space and POSIX space are now identical. PCRE
+          was changed at release 8.34. */
+    
           case PT_SPACE:    /* Perl space */
-          for (i = 1; i <= min; i++)
-            {
-            if (eptr >= md->end_subject)
-              {
-              SCHECK_PARTIAL();
-              RRETURN(MATCH_NOMATCH);
-              }
-            GETCHARINCTEST(c, eptr);
-            if ((UCD_CATEGORY(c) == ucp_Z || c == CHAR_HT || c == CHAR_NL ||
-                 c == CHAR_FF || c == CHAR_CR)
-                   == prop_fail_result)
-              RRETURN(MATCH_NOMATCH);
-            }
-          break;
-
           case PT_PXSPACE:  /* POSIX space */
           for (i = 1; i <= min; i++)
             {
@@ -5031,25 +5018,11 @@
             }
           /* Control never gets here */

+          /* Perl space used to exclude VT, but from Perl 5.18 it is included,
+          which means that Perl space and POSIX space are now identical. PCRE
+          was changed at release 8.34. */
+    
           case PT_SPACE:    /* Perl space */
-          for (fi = min;; fi++)
-            {
-            RMATCH(eptr, ecode, offset_top, md, eptrb, RM60);
-            if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-            if (fi >= max) RRETURN(MATCH_NOMATCH);
-            if (eptr >= md->end_subject)
-              {
-              SCHECK_PARTIAL();
-              RRETURN(MATCH_NOMATCH);
-              }
-            GETCHARINCTEST(c, eptr);
-            if ((UCD_CATEGORY(c) == ucp_Z || c == CHAR_HT || c == CHAR_NL ||
-                 c == CHAR_FF || c == CHAR_CR)
-                   == prop_fail_result)
-              RRETURN(MATCH_NOMATCH);
-            }
-          /* Control never gets here */
-
           case PT_PXSPACE:  /* POSIX space */
           for (fi = min;; fi++)
             {
@@ -5549,24 +5522,11 @@
             }
           break;

+          /* Perl space used to exclude VT, but from Perl 5.18 it is included,
+          which means that Perl space and POSIX space are now identical. PCRE
+          was changed at release 8.34. */
+    
           case PT_SPACE:    /* Perl space */
-          for (i = min; i < max; i++)
-            {
-            int len = 1;
-            if (eptr >= md->end_subject)
-              {
-              SCHECK_PARTIAL();
-              break;
-              }
-            GETCHARLENTEST(c, eptr, len);
-            if ((UCD_CATEGORY(c) == ucp_Z || c == CHAR_HT || c == CHAR_NL ||
-                 c == CHAR_FF || c == CHAR_CR)
-                 == prop_fail_result)
-              break;
-            eptr+= len;
-            }
-          break;
-
           case PT_PXSPACE:  /* POSIX space */
           for (i = min; i < max; i++)
             {

Modified: code/trunk/pcre_maketables.c
===================================================================
--- code/trunk/pcre_maketables.c    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/pcre_maketables.c    2013-10-05 15:45:11 UTC (rev 1364)
@@ -98,14 +98,18 @@
 for (i = 0; i < 256; i++) *p++ = islower(i)? toupper(i) : tolower(i);

/* Then the character class tables. Don't try to be clever and save effort on
-exclusive ones - in some locales things may be different. Note that the table
-for "space" includes everything "isspace" gives, including VT in the default
-locale. This makes it work for the POSIX class [:space:]. Note also that it is
-possible for a character to be alnum or alpha without being lower or upper,
-such as "male and female ordinals" (\xAA and \xBA) in the fr_FR locale (at
-least under Debian Linux's locales as of 12/2005). So we must test for alnum
-specially. */
+exclusive ones - in some locales things may be different.

+Note that the table for "space" includes everything "isspace" gives, including
+VT in the default locale. This makes it work for the POSIX class [:space:].
+From release 8.34 it is also correct for Perl space, because Perl added VT at
+release 5.18.
+
+Note also that it is possible for a character to be alnum or alpha without
+being lower or upper, such as "male and female ordinals" (\xAA and \xBA) in the
+fr_FR locale (at least under Debian Linux's locales as of 12/2005). So we must
+test for alnum specially. */
+
memset(p, 0, cbit_length);
for (i = 0; i < 256; i++)
{
@@ -123,14 +127,15 @@
}
p += cbit_length;

-/* Finally, the character type table. In this, we exclude VT from the white
-space chars, because Perl doesn't recognize it as such for \s and for comments
-within regexes. */
+/* Finally, the character type table. In this, we used to exclude VT from the
+white space chars, because Perl didn't recognize it as such for \s and for
+comments within regexes. However, Perl changed at release 5.18, so PCRE changed
+at release 8.34. */

for (i = 0; i < 256; i++)
{
int x = 0;
- if (i != CHAR_VT && isspace(i)) x += ctype_space;
+ if (isspace(i)) x += ctype_space;
if (isalpha(i)) x += ctype_letter;
if (isdigit(i)) x += ctype_digit;
if (isxdigit(i)) x += ctype_xdigit;
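
The effect of dropping the "i != CHAR_VT" test can be sketched with a reduced table builder that sets only a space flag. This is an illustration under stated assumptions: the function name and the exclude_vt parameter are hypothetical, the flag value 0x01 is chosen to match the space entries shown in the pcre_chartables.c.dist hunk, and the program is assumed to run in the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define CTYPE_SPACE 0x01  /* illustrative flag value for white space */
#define VT_CODE     11    /* vertical tab */

/* Build a 256-entry type table containing only the space flag. When
exclude_vt is non-zero, VT is skipped, as in the code before this change. */
static void build_ctypes(unsigned char *table, int exclude_vt)
{
  int i;
  assert(table != NULL);
  for (i = 0; i < 256; i++)
    {
    unsigned char x = 0;
    if (isspace(i) && !(exclude_vt && i == VT_CODE)) x |= CTYPE_SPACE;
    table[i] = x;
    }
}
```

In the "C" locale, isspace() is true for codes 9 to 13 and 32, so the only entry that differs between the two modes is the one for VT.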

Modified: code/trunk/pcre_study.c
===================================================================
--- code/trunk/pcre_study.c    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/pcre_study.c    2013-10-05 15:45:11 UTC (rev 1364)
@@ -1219,24 +1219,16 @@
         set_type_bits(start_bits, cbit_digit, table_limit, cd);
         break;

-        /* The cbit_space table has vertical tab as whitespace; we have to
-        ensure it gets set as not whitespace. Luckily, the code value is the
-        same (0x0b) in ASCII and EBCDIC, so we can just adjust the appropriate
-        bit. */
+        /* The cbit_space table has vertical tab as whitespace; we no longer 
+        have to play fancy tricks because Perl added VT to its whitespace at 
+        release 5.18. PCRE added it at release 8.34. */

         case OP_NOT_WHITESPACE:
         set_nottype_bits(start_bits, cbit_space, table_limit, cd);
-        start_bits[1] |= 0x08;
         break;

-        /* The cbit_space table has vertical tab as whitespace; we have to
-        avoid setting it. Luckily, the code value is the same (0x0b) in ASCII
-        and EBCDIC, so we can just adjust the appropriate bit. */
-
         case OP_WHITESPACE:
-        c = start_bits[1];    /* Save in case it was already set */
         set_type_bits(start_bits, cbit_space, table_limit, cd);
-        start_bits[1] = (start_bits[1] & ~0x08) | c;
         break;

         case OP_NOT_WORDCHAR:

Modified: code/trunk/pcre_xclass.c
===================================================================
--- code/trunk/pcre_xclass.c    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/pcre_xclass.c    2013-10-05 15:45:11 UTC (rev 1364)
@@ -159,13 +159,11 @@
         return !negated;
       break;

+      /* Perl space used to exclude VT, but from Perl 5.18 it is included,
+      which means that Perl space and POSIX space are now identical. PCRE
+      was changed at release 8.34. */
+    
       case PT_SPACE:    /* Perl space */
-      if ((PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
-           c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR)
-             == (t == XCL_PROP))
-        return !negated;
-      break;
-
       case PT_PXSPACE:  /* POSIX space */
       if ((PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
            c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||

Modified: code/trunk/testdata/testoutput1
===================================================================
--- code/trunk/testdata/testoutput1    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/testdata/testoutput1    2013-10-05 15:45:11 UTC (rev 1364)
@@ -6006,15 +6006,15 @@

 /[\s]+/
     > \x09\x0a\x0c\x0d\x0b<
- 0:  \x09\x0a\x0c\x0d
+ 0:  \x09\x0a\x0c\x0d\x0b

 /\s+/
     > \x09\x0a\x0c\x0d\x0b<
- 0:  \x09\x0a\x0c\x0d
+ 0:  \x09\x0a\x0c\x0d\x0b

 /a?b/x
     ab
-No match
+ 0: ab

/(?!\A)x/m
a\nxb\n

Modified: code/trunk/testdata/testoutput10
===================================================================
--- code/trunk/testdata/testoutput10    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/testdata/testoutput10    2013-10-05 15:45:11 UTC (rev 1364)
@@ -1717,36 +1717,39 @@

 /^>\p{Xsp}+/8O
     > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
- 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
- 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
- 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
- 3: > \x{09}\x{0a}\x{0c}\x{0d}
- 4: > \x{09}\x{0a}\x{0c}
- 5: > \x{09}\x{0a}
- 6: > \x{09}
- 7: > 
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+ 8: >

 /^>\p{Xsp}*/8O
     > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
- 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
- 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
- 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
- 3: > \x{09}\x{0a}\x{0c}\x{0d}
- 4: > \x{09}\x{0a}\x{0c}
- 5: > \x{09}\x{0a}
- 6: > \x{09}
- 7: > 
- 8: >
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+ 8: > 
+ 9: >

 /^>\p{Xsp}{2,9}/8O
     > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
- 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
- 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
- 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
- 3: > \x{09}\x{0a}\x{0c}\x{0d}
- 4: > \x{09}\x{0a}\x{0c}
- 5: > \x{09}\x{0a}
- 6: > \x{09}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}

 /^>[\p{Xsp}]/8O
     >\x{2028}\x{0b}
@@ -1754,14 +1757,15 @@

 /^>[\p{Xsp}]+/8O
     > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
- 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
- 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
- 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
- 3: > \x{09}\x{0a}\x{0c}\x{0d}
- 4: > \x{09}\x{0a}\x{0c}
- 5: > \x{09}\x{0a}
- 6: > \x{09}
- 7: > 
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+ 8: >

 /^>\p{Xps}/8
     >\x{1680}\x{2028}\x{0b}

Modified: code/trunk/testdata/testoutput15
===================================================================
--- code/trunk/testdata/testoutput15    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/testdata/testoutput15    2013-10-05 15:45:11 UTC (rev 1364)
@@ -861,7 +861,7 @@
 No first char
 Need char = 'x'
 Subject length lower bound = 4
-Starting byte set: \x09 \x0a \x0c \x0d \x20 x 
+Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20 x

/\sxxx\s/I8ST1
Capturing subpattern count = 0

Modified: code/trunk/testdata/testoutput18-16
===================================================================
--- code/trunk/testdata/testoutput18-16    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/testdata/testoutput18-16    2013-10-05 15:45:11 UTC (rev 1364)
@@ -742,7 +742,7 @@
 No first char
 Need char = 'x'
 Subject length lower bound = 4
-Starting byte set: \x09 \x0a \x0c \x0d \x20 x 
+Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20 x

/\sxxx\s/I8ST1
Capturing subpattern count = 0

Modified: code/trunk/testdata/testoutput18-32
===================================================================
--- code/trunk/testdata/testoutput18-32    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/testdata/testoutput18-32    2013-10-05 15:45:11 UTC (rev 1364)
@@ -739,7 +739,7 @@
 No first char
 Need char = 'x'
 Subject length lower bound = 4
-Starting byte set: \x09 \x0a \x0c \x0d \x20 x 
+Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20 x

/\sxxx\s/I8ST1
Capturing subpattern count = 0

Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/testdata/testoutput2    2013-10-05 15:45:11 UTC (rev 1364)
@@ -229,7 +229,7 @@
 No first char
 No need char
 Subject length lower bound = 1
-Starting byte set: \x09 \x0a \x0c \x0d \x20 a b 
+Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20 a b

 /(ab\2)/
 Failed: reference to non-existent subpattern at offset 6
@@ -2653,7 +2653,7 @@
 /[\s]/DZ
 ------------------------------------------------------------------
         Bra
-        [\x09\x0a\x0c\x0d ]
+        [\x09-\x0d ]
         Ket
         End
 ------------------------------------------------------------------
@@ -2665,7 +2665,7 @@
 /[\S]/DZ
 ------------------------------------------------------------------
         Bra
-        [\x00-\x08\x0b\x0e-\x1f!-\xff] (neg)
+        [\x00-\x08\x0e-\x1f!-\xff] (neg)
         Ket
         End
 ------------------------------------------------------------------
@@ -3167,7 +3167,7 @@
 /[\s]/IDZ
 ------------------------------------------------------------------
         Bra
-        [\x09\x0a\x0c\x0d ]
+        [\x09-\x0d ]
         Ket
         End
 ------------------------------------------------------------------
@@ -6418,9 +6418,9 @@
 No first char
 Need char = ','
 Subject length lower bound = 1
-Starting byte set: \x09 \x0a \x0c \x0d \x20 , 
+Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20 , 
     \x0b,\x0b
- 0: ,
+ 0: \x0b,\x0b
     \x0c,\x0d
  0: \x0c,\x0d

Modified: code/trunk/testdata/testoutput6
===================================================================
--- code/trunk/testdata/testoutput6    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/testdata/testoutput6    2013-10-05 15:45:11 UTC (rev 1364)
@@ -1302,7 +1302,7 @@

 /^>\s+/8W
     >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} 
- 0: > \x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{09}
+ 0: > \x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{09}\x{0b}

 /^>\pZ+/8W
     >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b}

Modified: code/trunk/testdata/testoutput7
===================================================================
--- code/trunk/testdata/testoutput7    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/testdata/testoutput7    2013-10-05 15:45:11 UTC (rev 1364)
@@ -540,7 +540,7 @@

 /^>\p{Xsp}+/8
     > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
- 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}

 /^>\p{Xsp}+?/8
     >\x{1680}\x{2028}\x{0b}
@@ -548,11 +548,11 @@

 /^>\p{Xsp}*/8
     > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
- 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}

 /^>\p{Xsp}{2,9}/8
     > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
- 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}

 /^>\p{Xsp}{2,9}?/8
     > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
@@ -564,7 +564,7 @@

 /^>[\p{Xsp}]+/8
     > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
- 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}

 /^>\p{Xps}/8
     >\x{1680}\x{2028}\x{0b}

Modified: code/trunk/testdata/testoutput8
===================================================================
--- code/trunk/testdata/testoutput8    2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/testdata/testoutput8    2013-10-05 15:45:11 UTC (rev 1364)
@@ -6083,19 +6083,20 @@

 /[\s]+/
     > \x09\x0a\x0c\x0d\x0b<
- 0:  \x09\x0a\x0c\x0d
- 1:  \x09\x0a\x0c
- 2:  \x09\x0a
- 3:  \x09
- 4:  
+ 0:  \x09\x0a\x0c\x0d\x0b
+ 1:  \x09\x0a\x0c\x0d
+ 2:  \x09\x0a\x0c
+ 3:  \x09\x0a
+ 4:  \x09
+ 5:

 /\s+/
     > \x09\x0a\x0c\x0d\x0b<
- 0:  \x09\x0a\x0c\x0d
+ 0:  \x09\x0a\x0c\x0d\x0b

 /a?b/x
     ab
-No match
+ 0: ab

/(?!\A)x/m
a\nxb\n
