Revision: 1364
http://vcs.pcre.org/viewvc?view=rev&revision=1364
Author: ph10
Date: 2013-10-05 16:45:11 +0100 (Sat, 05 Oct 2013)
Log Message:
-----------
Add VT to the set of characters recognized as white space.
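As a rough illustration of the bit arithmetic the diffs below rely on, here is a minimal sketch in C. The helper names (build_space_bits, in_bits, drop_vt) are hypothetical and are not part of PCRE. The sketch assumes the default "C" locale and a 32-byte bitmap with one bit per character code, least significant bit first, so that VT (code 11, 0x0b) falls in byte 1 under mask 0x08, which matches the "classbits[1]" and "0x08" adjustments removed by this revision.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not PCRE code: fill a 32-byte (256-bit) bitmap with
   one bit per character code for which isspace() is true. Bit (c % 8) of
   byte (c / 8) represents character c. */
static void build_space_bits(unsigned char bits[32])
{
  int i;
  memset(bits, 0, 32);
  for (i = 0; i < 256; i++)
    if (isspace(i)) bits[i / 8] |= (unsigned char)(1u << (i % 8));
}

/* Hypothetical helper: test whether character code c (0..255) is set. */
static int in_bits(const unsigned char bits[32], int c)
{
  return (bits[c / 8] & (1u << (c % 8))) != 0;
}

/* Hypothetical helper: clear the VT bit, mimicking the "& ~0x08" masking of
   byte 1 that the pre-8.34 code applied for \s. */
static void drop_vt(unsigned char bits[32])
{
  bits[1] &= (unsigned char)~0x08u;
}
```

Under these assumptions, calling build_space_bits() and then in_bits(bits, 0x0b) reports VT as white space, which corresponds to the new behaviour; calling drop_vt() first reproduces the old exclusion while leaving HT, LF, FF, CR and space set.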
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/doc/pcreapi.3
code/trunk/doc/pcrepattern.3
code/trunk/doc/pcresyntax.3
code/trunk/pcre_chartables.c.dist
code/trunk/pcre_compile.c
code/trunk/pcre_dfa_exec.c
code/trunk/pcre_exec.c
code/trunk/pcre_maketables.c
code/trunk/pcre_study.c
code/trunk/pcre_xclass.c
code/trunk/testdata/testoutput1
code/trunk/testdata/testoutput10
code/trunk/testdata/testoutput15
code/trunk/testdata/testoutput18-16
code/trunk/testdata/testoutput18-32
code/trunk/testdata/testoutput2
code/trunk/testdata/testoutput6
code/trunk/testdata/testoutput7
code/trunk/testdata/testoutput8
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/ChangeLog 2013-10-05 15:45:11 UTC (rev 1364)
@@ -87,6 +87,10 @@
compilation. The code is cleaner, and more cases are handled. The option
PCRE_NO_AUTO_POSSESSIFY is added for testing purposes, and the -O and /O
options in pcretest are provided to set it.
+
+18. The character VT has been added to the set of characters that match \s and
+ are generally treated as white space, following this same change in Perl
+ 5.18. There is now no difference between "Perl space" and "POSIX space".
Version 8.33 28-May-2013
Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/doc/pcreapi.3 2013-10-05 15:45:11 UTC (rev 1364)
@@ -1,4 +1,4 @@
-.TH PCREAPI 3 "01 October 2013" "PCRE 8.34"
+.TH PCREAPI 3 "05 October 2013" "PCRE 8.34"
.SH NAME
PCRE - Perl-compatible regular expressions
.sp
@@ -645,11 +645,14 @@
PCRE_EXTENDED
.sp
If this bit is set, white space data characters in the pattern are totally
-ignored except when escaped or inside a character class. White space does not
-include the VT character (code 11). In addition, characters between an
-unescaped # outside a character class and the next newline, inclusive, are also
-ignored. This is equivalent to Perl's /x option, and it can be changed within a
-pattern by a (?x) option setting.
+ignored except when escaped or inside a character class. White space did not
+used to include the VT character (code 11), because Perl did not treat this
+character as white space. However, Perl changed at release 5.18, so PCRE
+followed at release 8.34, and VT is now treated as white space. PCRE_EXTENDED
+also causes characters between an unescaped # outside a character class and the
+next newline, inclusive, to be ignored. PCRE_EXTENDED is equivalent to
+Perl's /x option, and it can be changed within a pattern by a (?x) option
+setting.
.P
Which characters are interpreted as newlines is controlled by the options
passed to \fBpcre_compile()\fP or by a special sequence at the start of the
@@ -2863,6 +2866,6 @@
.rs
.sp
.nf
-Last updated: 01 October 2013
+Last updated: 05 October 2013
Copyright (c) 1997-2013 University of Cambridge.
.fi
Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/doc/pcrepattern.3 2013-10-05 15:45:11 UTC (rev 1364)
@@ -1,4 +1,4 @@
-.TH PCREPATTERN 3 "06 September 2013" "PCRE 8.34"
+.TH PCREPATTERN 3 "05 October 2013" "PCRE 8.34"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION DETAILS"
@@ -494,11 +494,10 @@
matching point is at the end of the subject string, all of them fail, because
there is no character to match.
.P
-For compatibility with Perl, \es does not match the VT character (code 11).
-This makes it different from the the POSIX "space" class. The \es characters
-are HT (9), LF (10), FF (12), CR (13), and space (32). If "use locale;" is
-included in a Perl script, \es may match the VT character. In PCRE, it never
-does.
+For compatibility with Perl, \es did not used to match the VT character (code
11), which made it different from the POSIX "space" class. However, Perl
+added VT at release 5.18, and PCRE followed suit at release 8.34. The \es
+characters are now HT (9), LF (10), VT (11), FF (12), CR (13), and space (32).
.P
A "word" character is an underscore or any character that is a letter or digit.
By default, the definition of letters and digits is controlled by PCRE's
@@ -1296,9 +1295,9 @@
xdigit hexadecimal digits
.sp
The "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13), and
-space (32). Notice that this list includes the VT character (code 11). This
-makes "space" different to \es, which does not include VT (for Perl
-compatibility).
+space (32). "Space" used to be different to \es, which did not include VT, for
+Perl compatibility. However, Perl changed at release 5.18, and PCRE followed at
+release 8.34. "Space" and \es now match the same set of characters.
.P
The name "word" is a Perl extension, and "blank" is a GNU extension from Perl
5.8. Another Perl extension is negation, which is indicated by a ^ character
@@ -3157,6 +3156,6 @@
.rs
.sp
.nf
-Last updated: 06 September 2013
+Last updated: 05 October 2013
Copyright (c) 1997-2013 University of Cambridge.
.fi
Modified: code/trunk/doc/pcresyntax.3
===================================================================
--- code/trunk/doc/pcresyntax.3 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/doc/pcresyntax.3 2013-10-05 15:45:11 UTC (rev 1364)
@@ -1,4 +1,4 @@
-.TH PCRESYNTAX 3 "26 April 2013" "PCRE 8.33"
+.TH PCRESYNTAX 3 "05 October 2013" "PCRE 8.34"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -115,10 +115,13 @@
.sp
Xan Alphanumeric: union of properties L and N
Xps POSIX space: property Z or tab, NL, VT, FF, CR
- Xsp Perl space: property Z or tab, NL, FF, CR
+ Xsp Perl space: property Z or tab, NL, VT, FF, CR
Xuc Univerally-named character: one that can be
represented by a Universal Character Name
Xwd Perl word: property Xan or underscore
+.sp
+Perl and POSIX space are now the same. Perl added VT to its space character set
+at release 5.18 and PCRE changed at release 8.34.
.
.
.SH "SCRIPT NAMES FOR \ep AND \eP"
@@ -495,6 +498,6 @@
.rs
.sp
.nf
-Last updated: 26 April 2013
+Last updated: 05 October 2013
Copyright (c) 1997-2013 University of Cambridge.
.fi
Modified: code/trunk/pcre_chartables.c.dist
===================================================================
--- code/trunk/pcre_chartables.c.dist 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/pcre_chartables.c.dist 2013-10-05 15:45:11 UTC (rev 1364)
@@ -163,7 +163,7 @@
*/
0x80,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 0- 7 */
- 0x00,0x01,0x01,0x00,0x01,0x01,0x00,0x00, /* 8- 15 */
+ 0x00,0x01,0x01,0x01,0x01,0x01,0x00,0x00, /* 8- 15 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 16- 23 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 24- 31 */
0x01,0x00,0x00,0x00,0x80,0x00,0x00,0x00, /* - ' */
Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/pcre_compile.c 2013-10-05 15:45:11 UTC (rev 1364)
@@ -2650,11 +2650,11 @@
return (PRIV(ucp_gentype)[prop->chartype] == ucp_L ||
PRIV(ucp_gentype)[prop->chartype] == ucp_N) == negated;
+ /* Perl space used to exclude VT, but from Perl 5.18 it is included, which
+ means that Perl space and POSIX space are now identical. PCRE was changed
+ at release 8.34. */
+
case PT_SPACE: /* Perl space */
- return (PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
- c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR)
- == negated;
-
case PT_PXSPACE: /* POSIX space */
return (PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
@@ -4627,21 +4627,20 @@
for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_word];
continue;
- /* Perl 5.004 onwards omits VT from \s, but we must preserve it
- if it was previously set by something earlier in the character
- class. Luckily, the value of CHAR_VT is 0x0b in both ASCII and
- EBCDIC, so we lazily just adjust the appropriate bit. */
+ /* Perl 5.004 onwards omitted VT from \s, but restored it at Perl
+ 5.18. Before PCRE 8.34, we had to preserve the VT bit if it was
+ previously set by something earlier in the character class.
+ Luckily, the value of CHAR_VT is 0x0b in both ASCII and EBCDIC, so
+ we could just adjust the appropriate bit. From PCRE 8.34 we no
+ longer treat \s and \S specially. */
case ESC_s:
- classbits[0] |= cbits[cbit_space];
- classbits[1] |= cbits[cbit_space+1] & ~0x08;
- for (c = 2; c < 32; c++) classbits[c] |= cbits[c+cbit_space];
+ for (c = 0; c < 32; c++) classbits[c] |= cbits[c+cbit_space];
continue;
case ESC_S:
should_flip_negation = TRUE;
for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_space];
- classbits[1] |= 0x08; /* Perl 5.004 onwards omits VT from \s */
continue;
/* The rest apply in both UCP and non-UCP cases. */
Modified: code/trunk/pcre_dfa_exec.c
===================================================================
--- code/trunk/pcre_dfa_exec.c 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/pcre_dfa_exec.c 2013-10-05 15:45:11 UTC (rev 1364)
@@ -1098,11 +1098,11 @@
PRIV(ucp_gentype)[prop->chartype] == ucp_N;
break;
+ /* Perl space used to exclude VT, but from Perl 5.18 it is included,
+ which means that Perl space and POSIX space are now identical. PCRE
+ was changed at release 8.34. */
+
case PT_SPACE: /* Perl space */
- OK = PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
- c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR;
- break;
-
case PT_PXSPACE: /* POSIX space */
OK = PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
@@ -1348,11 +1348,11 @@
PRIV(ucp_gentype)[prop->chartype] == ucp_N;
break;
+ /* Perl space used to exclude VT, but from Perl 5.18 it is included,
+ which means that Perl space and POSIX space are now identical. PCRE
+ was changed at release 8.34. */
+
case PT_SPACE: /* Perl space */
- OK = PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
- c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR;
- break;
-
case PT_PXSPACE: /* POSIX space */
OK = PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
@@ -1592,11 +1592,11 @@
PRIV(ucp_gentype)[prop->chartype] == ucp_N;
break;
+ /* Perl space used to exclude VT, but from Perl 5.18 it is included,
+ which means that Perl space and POSIX space are now identical. PCRE
+ was changed at release 8.34. */
+
case PT_SPACE: /* Perl space */
- OK = PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
- c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR;
- break;
-
case PT_PXSPACE: /* POSIX space */
OK = PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
@@ -1861,11 +1861,11 @@
PRIV(ucp_gentype)[prop->chartype] == ucp_N;
break;
+ /* Perl space used to exclude VT, but from Perl 5.18 it is included,
+ which means that Perl space and POSIX space are now identical. PCRE
+ was changed at release 8.34. */
+
case PT_SPACE: /* Perl space */
- OK = PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
- c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR;
- break;
-
case PT_PXSPACE: /* POSIX space */
OK = PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
Modified: code/trunk/pcre_exec.c
===================================================================
--- code/trunk/pcre_exec.c 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/pcre_exec.c 2013-10-05 15:45:11 UTC (rev 1364)
@@ -2656,13 +2656,11 @@
RRETURN(MATCH_NOMATCH);
break;
+ /* Perl space used to exclude VT, but from Perl 5.18 it is included,
+ which means that Perl space and POSIX space are now identical. PCRE
+ was changed at release 8.34. */
+
case PT_SPACE: /* Perl space */
- if ((PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
- c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR)
- == (op == OP_NOTPROP))
- RRETURN(MATCH_NOMATCH);
- break;
-
case PT_PXSPACE: /* POSIX space */
if ((PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
@@ -4283,22 +4281,11 @@
}
break;
+ /* Perl space used to exclude VT, but from Perl 5.18 it is included,
+ which means that Perl space and POSIX space are now identical. PCRE
+ was changed at release 8.34. */
+
case PT_SPACE: /* Perl space */
- for (i = 1; i <= min; i++)
- {
- if (eptr >= md->end_subject)
- {
- SCHECK_PARTIAL();
- RRETURN(MATCH_NOMATCH);
- }
- GETCHARINCTEST(c, eptr);
- if ((UCD_CATEGORY(c) == ucp_Z || c == CHAR_HT || c == CHAR_NL ||
- c == CHAR_FF || c == CHAR_CR)
- == prop_fail_result)
- RRETURN(MATCH_NOMATCH);
- }
- break;
-
case PT_PXSPACE: /* POSIX space */
for (i = 1; i <= min; i++)
{
@@ -5031,25 +5018,11 @@
}
/* Control never gets here */
+ /* Perl space used to exclude VT, but from Perl 5.18 it is included,
+ which means that Perl space and POSIX space are now identical. PCRE
+ was changed at release 8.34. */
+
case PT_SPACE: /* Perl space */
- for (fi = min;; fi++)
- {
- RMATCH(eptr, ecode, offset_top, md, eptrb, RM60);
- if (rrc != MATCH_NOMATCH) RRETURN(rrc);
- if (fi >= max) RRETURN(MATCH_NOMATCH);
- if (eptr >= md->end_subject)
- {
- SCHECK_PARTIAL();
- RRETURN(MATCH_NOMATCH);
- }
- GETCHARINCTEST(c, eptr);
- if ((UCD_CATEGORY(c) == ucp_Z || c == CHAR_HT || c == CHAR_NL ||
- c == CHAR_FF || c == CHAR_CR)
- == prop_fail_result)
- RRETURN(MATCH_NOMATCH);
- }
- /* Control never gets here */
-
case PT_PXSPACE: /* POSIX space */
for (fi = min;; fi++)
{
@@ -5549,24 +5522,11 @@
}
break;
+ /* Perl space used to exclude VT, but from Perl 5.18 it is included,
+ which means that Perl space and POSIX space are now identical. PCRE
+ was changed at release 8.34. */
+
case PT_SPACE: /* Perl space */
- for (i = min; i < max; i++)
- {
- int len = 1;
- if (eptr >= md->end_subject)
- {
- SCHECK_PARTIAL();
- break;
- }
- GETCHARLENTEST(c, eptr, len);
- if ((UCD_CATEGORY(c) == ucp_Z || c == CHAR_HT || c == CHAR_NL ||
- c == CHAR_FF || c == CHAR_CR)
- == prop_fail_result)
- break;
- eptr+= len;
- }
- break;
-
case PT_PXSPACE: /* POSIX space */
for (i = min; i < max; i++)
{
Modified: code/trunk/pcre_maketables.c
===================================================================
--- code/trunk/pcre_maketables.c 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/pcre_maketables.c 2013-10-05 15:45:11 UTC (rev 1364)
@@ -98,14 +98,18 @@
for (i = 0; i < 256; i++) *p++ = islower(i)? toupper(i) : tolower(i);
/* Then the character class tables. Don't try to be clever and save effort on
-exclusive ones - in some locales things may be different. Note that the table
-for "space" includes everything "isspace" gives, including VT in the default
-locale. This makes it work for the POSIX class [:space:]. Note also that it is
-possible for a character to be alnum or alpha without being lower or upper,
-such as "male and female ordinals" (\xAA and \xBA) in the fr_FR locale (at
-least under Debian Linux's locales as of 12/2005). So we must test for alnum
-specially. */
+exclusive ones - in some locales things may be different.
+Note that the table for "space" includes everything "isspace" gives, including
+VT in the default locale. This makes it work for the POSIX class [:space:].
From release 8.34 it is also correct for Perl space, because Perl added VT at
+release 5.18.
+
+Note also that it is possible for a character to be alnum or alpha without
+being lower or upper, such as "male and female ordinals" (\xAA and \xBA) in the
+fr_FR locale (at least under Debian Linux's locales as of 12/2005). So we must
+test for alnum specially. */
+
memset(p, 0, cbit_length);
for (i = 0; i < 256; i++)
{
@@ -123,14 +127,15 @@
}
p += cbit_length;
-/* Finally, the character type table. In this, we exclude VT from the white
-space chars, because Perl doesn't recognize it as such for \s and for comments
-within regexes. */
+/* Finally, the character type table. In this, we used to exclude VT from the
+white space chars, because Perl didn't recognize it as such for \s and for
+comments within regexes. However, Perl changed at release 5.18, so PCRE changed
+at release 8.34. */
for (i = 0; i < 256; i++)
{
int x = 0;
- if (i != CHAR_VT && isspace(i)) x += ctype_space;
+ if (isspace(i)) x += ctype_space;
if (isalpha(i)) x += ctype_letter;
if (isdigit(i)) x += ctype_digit;
if (isxdigit(i)) x += ctype_xdigit;
Modified: code/trunk/pcre_study.c
===================================================================
--- code/trunk/pcre_study.c 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/pcre_study.c 2013-10-05 15:45:11 UTC (rev 1364)
@@ -1219,24 +1219,16 @@
set_type_bits(start_bits, cbit_digit, table_limit, cd);
break;
- /* The cbit_space table has vertical tab as whitespace; we have to
- ensure it gets set as not whitespace. Luckily, the code value is the
- same (0x0b) in ASCII and EBCDIC, so we can just adjust the appropriate
- bit. */
+ /* The cbit_space table has vertical tab as whitespace; we no longer
+ have to play fancy tricks because Perl added VT to its whitespace at
+ release 5.18. PCRE added it at release 8.34. */
case OP_NOT_WHITESPACE:
set_nottype_bits(start_bits, cbit_space, table_limit, cd);
- start_bits[1] |= 0x08;
break;
- /* The cbit_space table has vertical tab as whitespace; we have to
- avoid setting it. Luckily, the code value is the same (0x0b) in ASCII
- and EBCDIC, so we can just adjust the appropriate bit. */
-
case OP_WHITESPACE:
- c = start_bits[1]; /* Save in case it was already set */
set_type_bits(start_bits, cbit_space, table_limit, cd);
- start_bits[1] = (start_bits[1] & ~0x08) | c;
break;
case OP_NOT_WORDCHAR:
Modified: code/trunk/pcre_xclass.c
===================================================================
--- code/trunk/pcre_xclass.c 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/pcre_xclass.c 2013-10-05 15:45:11 UTC (rev 1364)
@@ -159,13 +159,11 @@
return !negated;
break;
+ /* Perl space used to exclude VT, but from Perl 5.18 it is included,
+ which means that Perl space and POSIX space are now identical. PCRE
+ was changed at release 8.34. */
+
case PT_SPACE: /* Perl space */
- if ((PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
- c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR)
- == (t == XCL_PROP))
- return !negated;
- break;
-
case PT_PXSPACE: /* POSIX space */
if ((PRIV(ucp_gentype)[prop->chartype] == ucp_Z ||
c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
Modified: code/trunk/testdata/testoutput1
===================================================================
--- code/trunk/testdata/testoutput1 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/testdata/testoutput1 2013-10-05 15:45:11 UTC (rev 1364)
@@ -6006,15 +6006,15 @@
/[\s]+/
> \x09\x0a\x0c\x0d\x0b<
- 0: \x09\x0a\x0c\x0d
+ 0: \x09\x0a\x0c\x0d\x0b
/\s+/
> \x09\x0a\x0c\x0d\x0b<
- 0: \x09\x0a\x0c\x0d
+ 0: \x09\x0a\x0c\x0d\x0b
/a?b/x
ab
-No match
+ 0: ab
/(?!\A)x/m
a\nxb\n
Modified: code/trunk/testdata/testoutput10
===================================================================
--- code/trunk/testdata/testoutput10 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/testdata/testoutput10 2013-10-05 15:45:11 UTC (rev 1364)
@@ -1717,36 +1717,39 @@
/^>\p{Xsp}+/8O
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
- 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
- 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
- 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
- 3: > \x{09}\x{0a}\x{0c}\x{0d}
- 4: > \x{09}\x{0a}\x{0c}
- 5: > \x{09}\x{0a}
- 6: > \x{09}
- 7: >
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+ 8: >
/^>\p{Xsp}*/8O
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
- 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
- 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
- 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
- 3: > \x{09}\x{0a}\x{0c}\x{0d}
- 4: > \x{09}\x{0a}\x{0c}
- 5: > \x{09}\x{0a}
- 6: > \x{09}
- 7: >
- 8: >
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+ 8: >
+ 9: >
/^>\p{Xsp}{2,9}/8O
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
- 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
- 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
- 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
- 3: > \x{09}\x{0a}\x{0c}\x{0d}
- 4: > \x{09}\x{0a}\x{0c}
- 5: > \x{09}\x{0a}
- 6: > \x{09}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
/^>[\p{Xsp}]/8O
>\x{2028}\x{0b}
@@ -1754,14 +1757,15 @@
/^>[\p{Xsp}]+/8O
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
- 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
- 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
- 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
- 3: > \x{09}\x{0a}\x{0c}\x{0d}
- 4: > \x{09}\x{0a}\x{0c}
- 5: > \x{09}\x{0a}
- 6: > \x{09}
- 7: >
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+ 8: >
/^>\p{Xps}/8
>\x{1680}\x{2028}\x{0b}
Modified: code/trunk/testdata/testoutput15
===================================================================
--- code/trunk/testdata/testoutput15 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/testdata/testoutput15 2013-10-05 15:45:11 UTC (rev 1364)
@@ -861,7 +861,7 @@
No first char
Need char = 'x'
Subject length lower bound = 4
-Starting byte set: \x09 \x0a \x0c \x0d \x20 x
+Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20 x
/\sxxx\s/I8ST1
Capturing subpattern count = 0
Modified: code/trunk/testdata/testoutput18-16
===================================================================
--- code/trunk/testdata/testoutput18-16 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/testdata/testoutput18-16 2013-10-05 15:45:11 UTC (rev 1364)
@@ -742,7 +742,7 @@
No first char
Need char = 'x'
Subject length lower bound = 4
-Starting byte set: \x09 \x0a \x0c \x0d \x20 x
+Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20 x
/\sxxx\s/I8ST1
Capturing subpattern count = 0
Modified: code/trunk/testdata/testoutput18-32
===================================================================
--- code/trunk/testdata/testoutput18-32 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/testdata/testoutput18-32 2013-10-05 15:45:11 UTC (rev 1364)
@@ -739,7 +739,7 @@
No first char
Need char = 'x'
Subject length lower bound = 4
-Starting byte set: \x09 \x0a \x0c \x0d \x20 x
+Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20 x
/\sxxx\s/I8ST1
Capturing subpattern count = 0
Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/testdata/testoutput2 2013-10-05 15:45:11 UTC (rev 1364)
@@ -229,7 +229,7 @@
No first char
No need char
Subject length lower bound = 1
-Starting byte set: \x09 \x0a \x0c \x0d \x20 a b
+Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20 a b
/(ab\2)/
Failed: reference to non-existent subpattern at offset 6
@@ -2653,7 +2653,7 @@
/[\s]/DZ
------------------------------------------------------------------
Bra
- [\x09\x0a\x0c\x0d ]
+ [\x09-\x0d ]
Ket
End
------------------------------------------------------------------
@@ -2665,7 +2665,7 @@
/[\S]/DZ
------------------------------------------------------------------
Bra
- [\x00-\x08\x0b\x0e-\x1f!-\xff] (neg)
+ [\x00-\x08\x0e-\x1f!-\xff] (neg)
Ket
End
------------------------------------------------------------------
@@ -3167,7 +3167,7 @@
/[\s]/IDZ
------------------------------------------------------------------
Bra
- [\x09\x0a\x0c\x0d ]
+ [\x09-\x0d ]
Ket
End
------------------------------------------------------------------
@@ -6418,9 +6418,9 @@
No first char
Need char = ','
Subject length lower bound = 1
-Starting byte set: \x09 \x0a \x0c \x0d \x20 ,
+Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20 ,
\x0b,\x0b
- 0: ,
+ 0: \x0b,\x0b
\x0c,\x0d
0: \x0c,\x0d
Modified: code/trunk/testdata/testoutput6
===================================================================
--- code/trunk/testdata/testoutput6 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/testdata/testoutput6 2013-10-05 15:45:11 UTC (rev 1364)
@@ -1302,7 +1302,7 @@
/^>\s+/8W
>\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b}
- 0: > \x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{09}
+ 0: > \x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{09}\x{0b}
/^>\pZ+/8W
>\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b}
Modified: code/trunk/testdata/testoutput7
===================================================================
--- code/trunk/testdata/testoutput7 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/testdata/testoutput7 2013-10-05 15:45:11 UTC (rev 1364)
@@ -540,7 +540,7 @@
/^>\p{Xsp}+/8
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
- 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
/^>\p{Xsp}+?/8
>\x{1680}\x{2028}\x{0b}
@@ -548,11 +548,11 @@
/^>\p{Xsp}*/8
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
- 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
/^>\p{Xsp}{2,9}/8
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
- 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
/^>\p{Xsp}{2,9}?/8
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
@@ -564,7 +564,7 @@
/^>[\p{Xsp}]+/8
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
- 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
/^>\p{Xps}/8
>\x{1680}\x{2028}\x{0b}
Modified: code/trunk/testdata/testoutput8
===================================================================
--- code/trunk/testdata/testoutput8 2013-10-01 16:54:40 UTC (rev 1363)
+++ code/trunk/testdata/testoutput8 2013-10-05 15:45:11 UTC (rev 1364)
@@ -6083,19 +6083,20 @@
/[\s]+/
> \x09\x0a\x0c\x0d\x0b<
- 0: \x09\x0a\x0c\x0d
- 1: \x09\x0a\x0c
- 2: \x09\x0a
- 3: \x09
- 4:
+ 0: \x09\x0a\x0c\x0d\x0b
+ 1: \x09\x0a\x0c\x0d
+ 2: \x09\x0a\x0c
+ 3: \x09\x0a
+ 4: \x09
+ 5:
/\s+/
> \x09\x0a\x0c\x0d\x0b<
- 0: \x09\x0a\x0c\x0d
+ 0: \x09\x0a\x0c\x0d\x0b
/a?b/x
ab
-No match
+ 0: ab
/(?!\A)x/m
a\nxb\n