Revision: 517
http://vcs.pcre.org/viewvc?view=rev&revision=517
Author: ph10
Date: 2010-05-05 11:44:20 +0100 (Wed, 05 May 2010)
Log Message:
-----------
Add new special properties Xan, Xps, Xsp, Xwd to help with \w etc.
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/doc/pcrepattern.3
code/trunk/doc/pcresyntax.3
code/trunk/maint/GenerateUtt.py
code/trunk/pcre_dfa_exec.c
code/trunk/pcre_exec.c
code/trunk/pcre_internal.h
code/trunk/pcre_tables.c
code/trunk/pcre_xclass.c
code/trunk/testdata/testinput12
code/trunk/testdata/testinput9
code/trunk/testdata/testoutput12
code/trunk/testdata/testoutput9
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/ChangeLog 2010-05-05 10:44:20 UTC (rev 517)
@@ -28,7 +28,11 @@
7. Minor change to pcretest.c to avoid a compiler warning.
+8. Added four artificial Unicode properties to help with an option to make
+ \s etc use properties. The new properties are: Xan (alphanumeric), Xsp
+ (Perl space), Xps (POSIX space), and Xwd (word).
+
Version 8.02 19-Mar-2010
------------------------
Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3 2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/doc/pcrepattern.3 2010-05-05 10:44:20 UTC (rev 517)
@@ -505,10 +505,16 @@
\eX an extended Unicode sequence
.sp
The property names represented by \fIxx\fP above are limited to the Unicode
-script names, the general category properties, and "Any", which matches any
-character (including newline). Other properties such as "InMusicalSymbols" are
-not currently supported by PCRE. Note that \eP{Any} does not match any
-characters, so always causes a match failure.
+script names, the general category properties, "Any", which matches any
+character (including newline), and some special PCRE properties (described
+in the
+.\" HTML <a href="#extraprops">
+.\" </a>
+next section).
+.\"
+Other Perl properties such as "InMusicalSymbols" are not currently supported by
+PCRE. Note that \eP{Any} does not match any characters, so always causes a
+match failure.
.P
Sets of Unicode characters are defined as belonging to certain scripts. A
character from one of these sets can be matched using a script name. For
@@ -613,10 +619,10 @@
Vai,
Yi.
.P
-Each character has exactly one general category property, specified by a
-two-letter abbreviation. For compatibility with Perl, negation can be specified
-by including a circumflex between the opening brace and the property name. For
-example, \ep{^Lu} is the same as \eP{Lu}.
+Each character has exactly one Unicode general category property, specified by
+a two-letter abbreviation. For compatibility with Perl, negation can be
+specified by including a circumflex between the opening brace and the property
+name. For example, \ep{^Lu} is the same as \eP{Lu}.
.P
If only one letter is specified with \ep or \eP, it includes all the general
category properties that start with that letter. In this case, in the absence
@@ -718,6 +724,27 @@
properties in PCRE.
.
.
+.\" HTML <a name="extraprops"></a>
+.SS PCRE's additional properties
+.rs
+.sp
+As well as the standard Unicode properties described in the previous
+section, PCRE supports four more that make it possible to convert traditional
+escape sequences such as \ew and \es and POSIX character classes to use Unicode
+properties. These are:
+.sp
+ Xan Any alphanumeric character
+ Xps Any POSIX space character
+ Xsp Any Perl space character
+ Xwd Any Perl "word" character
+.sp
+Xan matches characters that have either the L (letter) or the N (number)
+property. Xps matches the characters tab, linefeed, vertical tab, formfeed, or
+carriage return, and any other character that has the Z (separator) property.
+Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the
+same characters as Xan, plus underscore.
+.
+.
.\" HTML <a name="resetmatchstart"></a>
.SS "Resetting the match start"
.rs
@@ -2597,6 +2624,6 @@
.rs
.sp
.nf
-Last updated: 03 May 2010
+Last updated: 05 May 2010
Copyright (c) 1997-2010 University of Cambridge.
.fi
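Note on the pcrepattern.3 text above: the four new properties can be read as simple predicates over a character's general category and code point. The following is only a minimal sketch of that description, not the code from this commit. The gen_cat enum and the is_x* function names are hypothetical stand-ins for PCRE's internal ucp_* codes and property lookup.

```c
#include <stdint.h>

/* Hypothetical stand-in for PCRE's general category codes (ucp_L, ucp_N, ...). */
typedef enum { GC_C, GC_L, GC_M, GC_N, GC_P, GC_S, GC_Z } gen_cat;

/* Xan: any character whose general category is L (letter) or N (number). */
static int is_xan(gen_cat gc, uint32_t c)
{
  (void)c;  /* the code point itself is not needed for Xan */
  return gc == GC_L || gc == GC_N;
}

/* Xsp (Perl space): category Z, or tab, linefeed, formfeed, carriage return. */
static int is_xsp(gen_cat gc, uint32_t c)
{
  return gc == GC_Z || c == 0x09 || c == 0x0a || c == 0x0c || c == 0x0d;
}

/* Xps (POSIX space): everything in Xsp, plus vertical tab. */
static int is_xps(gen_cat gc, uint32_t c)
{
  return is_xsp(gc, c) || c == 0x0b;
}

/* Xwd (word): everything in Xan, plus underscore. */
static int is_xwd(gen_cat gc, uint32_t c)
{
  return is_xan(gc, c) || c == 0x5f;
}
```

With these definitions, vertical tab (0x0b) satisfies is_xps but not is_xsp, and underscore satisfies is_xwd but not is_xan, which is the distinction the new tests in testinput9 and testinput12 exercise.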
Modified: code/trunk/doc/pcresyntax.3
===================================================================
--- code/trunk/doc/pcresyntax.3 2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/doc/pcresyntax.3 2010-05-05 10:44:20 UTC (rev 517)
@@ -45,6 +45,7 @@
\eD a character that is not a decimal digit
\eh a horizontal whitespace character
\eH a character that is not a horizontal whitespace character
+ \eN a character that is not a newline
\ep{\fIxx\fP} a character with the \fIxx\fP property
\eP{\fIxx\fP} a character without the \fIxx\fP property
\eR a newline sequence
@@ -59,7 +60,7 @@
In PCRE, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII characters.
.
.
-.SH "GENERAL CATEGORY PROPERTY CODES FOR \ep and \eP"
+.SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
.rs
.sp
C Other
@@ -108,6 +109,15 @@
Zs Space separator
.
.
+.SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep and \eP"
+.rs
+.sp
+ Xan Alphanumeric: union of properties L and N
+ Xps POSIX space: property Z or tab, NL, VT, FF, CR
+ Xsp Perl space: property Z or tab, NL, FF, CR
+ Xwd Perl word: property Xan or underscore
+.
+.
.SH "SCRIPT NAMES FOR \ep AND \eP"
.rs
.sp
@@ -459,6 +469,6 @@
.rs
.sp
.nf
-Last updated: 01 March 2010
+Last updated: 05 May 2010
Copyright (c) 1997-2010 University of Cambridge.
.fi
Modified: code/trunk/maint/GenerateUtt.py
===================================================================
--- code/trunk/maint/GenerateUtt.py 2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/maint/GenerateUtt.py 2010-05-05 10:44:20 UTC (rev 517)
@@ -11,6 +11,7 @@
# Modified by PH 17-March-2009 to generate the more verbose form that works
# for UTF-support in EBCDIC as well as ASCII environments.
# Modified by PH 01-March-2010 to add new scripts from Unicode 5.2.0.
+# Modified by PH 04-May-2010 to add new "X.." special categories.
script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Buginese', 'Buhid', 'Canadian_Aboriginal', \
'Cherokee', 'Common', 'Coptic', 'Cypriot', 'Cyrillic', 'Deseret', 'Devanagari', 'Ethiopic', 'Georgian', \
@@ -36,12 +37,23 @@
general_category_names = ['C', 'L', 'M', 'N', 'P', 'S', 'Z']
+# First add the Unicode script and category names.
+
utt_table = zip(script_names, ['PT_SC'] * len(script_names))
utt_table += zip(category_names, ['PT_PC'] * len(category_names))
utt_table += zip(general_category_names, ['PT_GC'] * len(general_category_names))
-utt_table.append(('L&', 'PT_LAMP'))
+
+# Now add our own specials.
+
utt_table.append(('Any', 'PT_ANY'))
+utt_table.append(('L&', 'PT_LAMP'))
+utt_table.append(('Xan', 'PT_ALNUM'))
+utt_table.append(('Xps', 'PT_PXSPACE'))
+utt_table.append(('Xsp', 'PT_SPACE'))
+utt_table.append(('Xwd', 'PT_WORD'))
+# Sort the table.
+
utt_table.sort()
# We have to use STR_ macros to define the strings so that it all works in
@@ -74,7 +86,8 @@
offset = 0
last = ','
for utt in utt_table:
- if utt[1] in ('PT_ANY', 'PT_LAMP'):
+ if utt[1] in ('PT_ANY', 'PT_LAMP', 'PT_ALNUM', 'PT_PXSPACE',
+ 'PT_SPACE', 'PT_WORD'):
value = '0'
else:
value = 'ucp_' + utt[0]
Modified: code/trunk/pcre_dfa_exec.c
===================================================================
--- code/trunk/pcre_dfa_exec.c 2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/pcre_dfa_exec.c 2010-05-05 10:44:20 UTC (rev 517)
@@ -955,7 +955,8 @@
break;
case PT_LAMP:
- OK = prop->chartype == ucp_Lu || prop->chartype == ucp_Ll || prop->chartype == ucp_Lt;
+ OK = prop->chartype == ucp_Lu || prop->chartype == ucp_Ll ||
+ prop->chartype == ucp_Lt;
break;
case PT_GC:
@@ -969,6 +970,30 @@
case PT_SC:
OK = prop->script == code[2];
break;
+
+ /* These are specials for combination cases. */
+
+ case PT_ALNUM:
+ OK = _pcre_ucp_gentype[prop->chartype] == ucp_L ||
+ _pcre_ucp_gentype[prop->chartype] == ucp_N;
+ break;
+
+ case PT_SPACE: /* Perl space */
+ OK = _pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+ c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR;
+ break;
+
+ case PT_PXSPACE: /* POSIX space */
+ OK = _pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+ c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
+ c == CHAR_FF || c == CHAR_CR;
+ break;
+
+ case PT_WORD:
+ OK = _pcre_ucp_gentype[prop->chartype] == ucp_L ||
+ _pcre_ucp_gentype[prop->chartype] == ucp_N ||
+ c == CHAR_UNDERSCORE;
+ break;
/* Should never occur, but keep compilers from grumbling. */
@@ -1124,7 +1149,8 @@
break;
case PT_LAMP:
- OK = prop->chartype == ucp_Lu || prop->chartype == ucp_Ll || prop->chartype == ucp_Lt;
+ OK = prop->chartype == ucp_Lu || prop->chartype == ucp_Ll ||
+ prop->chartype == ucp_Lt;
break;
case PT_GC:
@@ -1139,6 +1165,30 @@
OK = prop->script == code[3];
break;
+ /* These are specials for combination cases. */
+
+ case PT_ALNUM:
+ OK = _pcre_ucp_gentype[prop->chartype] == ucp_L ||
+ _pcre_ucp_gentype[prop->chartype] == ucp_N;
+ break;
+
+ case PT_SPACE: /* Perl space */
+ OK = _pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+ c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR;
+ break;
+
+ case PT_PXSPACE: /* POSIX space */
+ OK = _pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+ c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
+ c == CHAR_FF || c == CHAR_CR;
+ break;
+
+ case PT_WORD:
+ OK = _pcre_ucp_gentype[prop->chartype] == ucp_L ||
+ _pcre_ucp_gentype[prop->chartype] == ucp_N ||
+ c == CHAR_UNDERSCORE;
+ break;
+
/* Should never occur, but keep compilers from grumbling. */
default:
@@ -1346,7 +1396,8 @@
break;
case PT_LAMP:
- OK = prop->chartype == ucp_Lu || prop->chartype == ucp_Ll || prop->chartype == ucp_Lt;
+ OK = prop->chartype == ucp_Lu || prop->chartype == ucp_Ll ||
+ prop->chartype == ucp_Lt;
break;
case PT_GC:
@@ -1360,6 +1411,30 @@
case PT_SC:
OK = prop->script == code[3];
break;
+
+ /* These are specials for combination cases. */
+
+ case PT_ALNUM:
+ OK = _pcre_ucp_gentype[prop->chartype] == ucp_L ||
+ _pcre_ucp_gentype[prop->chartype] == ucp_N;
+ break;
+
+ case PT_SPACE: /* Perl space */
+ OK = _pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+ c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR;
+ break;
+
+ case PT_PXSPACE: /* POSIX space */
+ OK = _pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+ c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
+ c == CHAR_FF || c == CHAR_CR;
+ break;
+
+ case PT_WORD:
+ OK = _pcre_ucp_gentype[prop->chartype] == ucp_L ||
+ _pcre_ucp_gentype[prop->chartype] == ucp_N ||
+ c == CHAR_UNDERSCORE;
+ break;
/* Should never occur, but keep compilers from grumbling. */
@@ -1593,7 +1668,8 @@
break;
case PT_LAMP:
- OK = prop->chartype == ucp_Lu || prop->chartype == ucp_Ll || prop->chartype == ucp_Lt;
+ OK = prop->chartype == ucp_Lu || prop->chartype == ucp_Ll ||
+ prop->chartype == ucp_Lt;
break;
case PT_GC:
@@ -1607,6 +1683,30 @@
case PT_SC:
OK = prop->script == code[5];
break;
+
+ /* These are specials for combination cases. */
+
+ case PT_ALNUM:
+ OK = _pcre_ucp_gentype[prop->chartype] == ucp_L ||
+ _pcre_ucp_gentype[prop->chartype] == ucp_N;
+ break;
+
+ case PT_SPACE: /* Perl space */
+ OK = _pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+ c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR;
+ break;
+
+ case PT_PXSPACE: /* POSIX space */
+ OK = _pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+ c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
+ c == CHAR_FF || c == CHAR_CR;
+ break;
+
+ case PT_WORD:
+ OK = _pcre_ucp_gentype[prop->chartype] == ucp_L ||
+ _pcre_ucp_gentype[prop->chartype] == ucp_N ||
+ c == CHAR_UNDERSCORE;
+ break;
/* Should never occur, but keep compilers from grumbling. */
Modified: code/trunk/pcre_exec.c
===================================================================
--- code/trunk/pcre_exec.c 2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/pcre_exec.c 2010-05-05 10:44:20 UTC (rev 517)
@@ -2060,7 +2060,7 @@
prop->chartype == ucp_Ll ||
prop->chartype == ucp_Lt) == (op == OP_NOTPROP))
MRRETURN(MATCH_NOMATCH);
- break;
+ break;
case PT_GC:
if ((ecode[2] != _pcre_ucp_gentype[prop->chartype]) == (op == OP_PROP))
@@ -2076,7 +2076,39 @@
if ((ecode[2] != prop->script) == (op == OP_PROP))
MRRETURN(MATCH_NOMATCH);
break;
+
+ /* These are specials */
+
+ case PT_ALNUM:
+ if ((_pcre_ucp_gentype[prop->chartype] == ucp_L ||
+ _pcre_ucp_gentype[prop->chartype] == ucp_N) == (op == OP_NOTPROP))
+ MRRETURN(MATCH_NOMATCH);
+ break;
+
+ case PT_SPACE: /* Perl space */
+ if ((_pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+ c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR)
+ == (op == OP_NOTPROP))
+ MRRETURN(MATCH_NOMATCH);
+ break;
+
+ case PT_PXSPACE: /* POSIX space */
+ if ((_pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+ c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
+ c == CHAR_FF || c == CHAR_CR)
+ == (op == OP_NOTPROP))
+ MRRETURN(MATCH_NOMATCH);
+ break;
+ case PT_WORD:
+ if ((_pcre_ucp_gentype[prop->chartype] == ucp_L ||
+ _pcre_ucp_gentype[prop->chartype] == ucp_N ||
+ c == CHAR_UNDERSCORE) == (op == OP_NOTPROP))
+ MRRETURN(MATCH_NOMATCH);
+ break;
+
+ /* This should never occur */
+
default:
RRETURN(PCRE_ERROR_INTERNAL);
}
@@ -3492,6 +3524,75 @@
MRRETURN(MATCH_NOMATCH);
}
break;
+
+ case PT_ALNUM:
+ for (i = 1; i <= min; i++)
+ {
+ if (eptr >= md->end_subject)
+ {
+ SCHECK_PARTIAL();
+ MRRETURN(MATCH_NOMATCH);
+ }
+ GETCHARINCTEST(c, eptr);
+ prop_category = UCD_CATEGORY(c);
+ if ((prop_category == ucp_L || prop_category == ucp_N)
+ == prop_fail_result)
+ MRRETURN(MATCH_NOMATCH);
+ }
+ break;
+
+ case PT_SPACE: /* Perl space */
+ for (i = 1; i <= min; i++)
+ {
+ if (eptr >= md->end_subject)
+ {
+ SCHECK_PARTIAL();
+ MRRETURN(MATCH_NOMATCH);
+ }
+ GETCHARINCTEST(c, eptr);
+ prop_category = UCD_CATEGORY(c);
+ if ((prop_category == ucp_Z || c == CHAR_HT || c == CHAR_NL ||
+ c == CHAR_FF || c == CHAR_CR)
+ == prop_fail_result)
+ MRRETURN(MATCH_NOMATCH);
+ }
+ break;
+
+ case PT_PXSPACE: /* POSIX space */
+ for (i = 1; i <= min; i++)
+ {
+ if (eptr >= md->end_subject)
+ {
+ SCHECK_PARTIAL();
+ MRRETURN(MATCH_NOMATCH);
+ }
+ GETCHARINCTEST(c, eptr);
+ prop_category = UCD_CATEGORY(c);
+ if ((prop_category == ucp_Z || c == CHAR_HT || c == CHAR_NL ||
+ c == CHAR_VT || c == CHAR_FF || c == CHAR_CR)
+ == prop_fail_result)
+ MRRETURN(MATCH_NOMATCH);
+ }
+ break;
+
+ case PT_WORD:
+ for (i = 1; i <= min; i++)
+ {
+ if (eptr >= md->end_subject)
+ {
+ SCHECK_PARTIAL();
+ MRRETURN(MATCH_NOMATCH);
+ }
+ GETCHARINCTEST(c, eptr);
+ prop_category = UCD_CATEGORY(c);
+ if ((prop_category == ucp_L || prop_category == ucp_N ||
+ c == CHAR_UNDERSCORE)
+ == prop_fail_result)
+ MRRETURN(MATCH_NOMATCH);
+ }
+ break;
+
+ /* This should not occur */
default:
RRETURN(PCRE_ERROR_INTERNAL);
@@ -4132,6 +4233,88 @@
}
/* Control never gets here */
+ case PT_ALNUM:
+ for (fi = min;; fi++)
+ {
+ RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM39);
+ if (rrc != MATCH_NOMATCH) RRETURN(rrc);
+ if (fi >= max) MRRETURN(MATCH_NOMATCH);
+ if (eptr >= md->end_subject)
+ {
+ SCHECK_PARTIAL();
+ MRRETURN(MATCH_NOMATCH);
+ }
+ GETCHARINC(c, eptr);
+ prop_category = UCD_CATEGORY(c);
+ if ((prop_category == ucp_L || prop_category == ucp_N)
+ == prop_fail_result)
+ MRRETURN(MATCH_NOMATCH);
+ }
+ /* Control never gets here */
+
+ case PT_SPACE: /* Perl space */
+ for (fi = min;; fi++)
+ {
+ RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM39);
+ if (rrc != MATCH_NOMATCH) RRETURN(rrc);
+ if (fi >= max) MRRETURN(MATCH_NOMATCH);
+ if (eptr >= md->end_subject)
+ {
+ SCHECK_PARTIAL();
+ MRRETURN(MATCH_NOMATCH);
+ }
+ GETCHARINC(c, eptr);
+ prop_category = UCD_CATEGORY(c);
+ if ((prop_category == ucp_Z || c == CHAR_HT || c == CHAR_NL ||
+ c == CHAR_FF || c == CHAR_CR)
+ == prop_fail_result)
+ MRRETURN(MATCH_NOMATCH);
+ }
+ /* Control never gets here */
+
+ case PT_PXSPACE: /* POSIX space */
+ for (fi = min;; fi++)
+ {
+ RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM39);
+ if (rrc != MATCH_NOMATCH) RRETURN(rrc);
+ if (fi >= max) MRRETURN(MATCH_NOMATCH);
+ if (eptr >= md->end_subject)
+ {
+ SCHECK_PARTIAL();
+ MRRETURN(MATCH_NOMATCH);
+ }
+ GETCHARINC(c, eptr);
+ prop_category = UCD_CATEGORY(c);
+ if ((prop_category == ucp_Z || c == CHAR_HT || c == CHAR_NL ||
+ c == CHAR_VT || c == CHAR_FF || c == CHAR_CR)
+ == prop_fail_result)
+ MRRETURN(MATCH_NOMATCH);
+ }
+ /* Control never gets here */
+
+ case PT_WORD:
+ for (fi = min;; fi++)
+ {
+ RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM39);
+ if (rrc != MATCH_NOMATCH) RRETURN(rrc);
+ if (fi >= max) MRRETURN(MATCH_NOMATCH);
+ if (eptr >= md->end_subject)
+ {
+ SCHECK_PARTIAL();
+ MRRETURN(MATCH_NOMATCH);
+ }
+ GETCHARINC(c, eptr);
+ prop_category = UCD_CATEGORY(c);
+ if ((prop_category == ucp_L ||
+ prop_category == ucp_N ||
+ c == CHAR_UNDERSCORE)
+ == prop_fail_result)
+ MRRETURN(MATCH_NOMATCH);
+ }
+ /* Control never gets here */
+
+ /* This should never occur */
+
default:
RRETURN(PCRE_ERROR_INTERNAL);
}
@@ -4553,6 +4736,83 @@
eptr+= len;
}
break;
+
+ case PT_ALNUM:
+ for (i = min; i < max; i++)
+ {
+ int len = 1;
+ if (eptr >= md->end_subject)
+ {
+ SCHECK_PARTIAL();
+ break;
+ }
+ GETCHARLEN(c, eptr, len);
+ prop_category = UCD_CATEGORY(c);
+ if ((prop_category == ucp_L || prop_category == ucp_N)
+ == prop_fail_result)
+ break;
+ eptr+= len;
+ }
+ break;
+
+ case PT_SPACE: /* Perl space */
+ for (i = min; i < max; i++)
+ {
+ int len = 1;
+ if (eptr >= md->end_subject)
+ {
+ SCHECK_PARTIAL();
+ break;
+ }
+ GETCHARLEN(c, eptr, len);
+ prop_category = UCD_CATEGORY(c);
+ if ((prop_category == ucp_Z || c == CHAR_HT || c == CHAR_NL ||
+ c == CHAR_FF || c == CHAR_CR)
+ == prop_fail_result)
+ break;
+ eptr+= len;
+ }
+ break;
+
+ case PT_PXSPACE: /* POSIX space */
+ for (i = min; i < max; i++)
+ {
+ int len = 1;
+ if (eptr >= md->end_subject)
+ {
+ SCHECK_PARTIAL();
+ break;
+ }
+ GETCHARLEN(c, eptr, len);
+ prop_category = UCD_CATEGORY(c);
+ if ((prop_category == ucp_Z || c == CHAR_HT || c == CHAR_NL ||
+ c == CHAR_VT || c == CHAR_FF || c == CHAR_CR)
+ == prop_fail_result)
+ break;
+ eptr+= len;
+ }
+ break;
+
+ case PT_WORD:
+ for (i = min; i < max; i++)
+ {
+ int len = 1;
+ if (eptr >= md->end_subject)
+ {
+ SCHECK_PARTIAL();
+ break;
+ }
+ GETCHARLEN(c, eptr, len);
+ prop_category = UCD_CATEGORY(c);
+ if ((prop_category == ucp_L || prop_category == ucp_N ||
+ c == CHAR_UNDERSCORE) == prop_fail_result)
+ break;
+ eptr+= len;
+ }
+ break;
+
+ default:
+ RRETURN(PCRE_ERROR_INTERNAL);
}
/* eptr is now past the end of the maximum run */
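Note on the single-character cases in the pcre_exec.c hunk at -2076 above: the new cases compare the property test against (op == OP_NOTPROP), so one expression serves both \p and \P. The sketch below isolates that idiom under stated assumptions. SK_OP_PROP, SK_OP_NOTPROP and prop_test_fails are hypothetical names used only for illustration, not identifiers from PCRE.

```c
/* Hypothetical opcode names standing in for OP_PROP and OP_NOTPROP. */
enum { SK_OP_PROP, SK_OP_NOTPROP };

/* Returns 1 when the match must fail at this character, 0 otherwise.
   has_property is the result of the property test (for example
   "category is L or N" for PT_ALNUM). For \p the match fails when the
   character lacks the property; for \P it fails when the character has it.
   Both cases reduce to a single equality test. */
static int prop_test_fails(int has_property, int op)
{
  return (has_property != 0) == (op == SK_OP_NOTPROP);
}
```

The truth table has four rows: (has, \p) continues, (lacks, \p) fails, (has, \P) fails, (lacks, \P) continues.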
Modified: code/trunk/pcre_internal.h
===================================================================
--- code/trunk/pcre_internal.h 2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/pcre_internal.h 2010-05-05 10:44:20 UTC (rev 517)
@@ -1190,9 +1190,13 @@
#define PT_ANY 0 /* Any property - matches all chars */
#define PT_LAMP 1 /* L& - the union of Lu, Ll, Lt */
-#define PT_GC 2 /* General characteristic (e.g. L) */
-#define PT_PC 3 /* Particular characteristic (e.g. Lu) */
+#define PT_GC 2 /* Specified general characteristic (e.g. L) */
+#define PT_PC 3 /* Specified particular characteristic (e.g. Lu) */
#define PT_SC 4 /* Script (e.g. Han) */
+#define PT_ALNUM 5 /* Alphanumeric - the union of L and N */
+#define PT_SPACE 6 /* Perl space - Z plus 9,10,12,13 */
+#define PT_PXSPACE 7 /* POSIX space - Z plus 9,10,11,12,13 */
+#define PT_WORD 8 /* Word - L plus N plus underscore */
/* Flag bits and data types for the extended class (OP_XCLASS) for classes that
contain UTF-8 characters with values greater than 255. */
Modified: code/trunk/pcre_tables.c
===================================================================
--- code/trunk/pcre_tables.c 2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/pcre_tables.c 2010-05-05 10:44:20 UTC (rev 517)
@@ -243,6 +243,10 @@
#define STRING_Tifinagh0 STR_T STR_i STR_f STR_i STR_n STR_a STR_g STR_h "\0"
#define STRING_Ugaritic0 STR_U STR_g STR_a STR_r STR_i STR_t STR_i STR_c "\0"
#define STRING_Vai0 STR_V STR_a STR_i "\0"
+#define STRING_Xan0 STR_X STR_a STR_n "\0"
+#define STRING_Xps0 STR_X STR_p STR_s "\0"
+#define STRING_Xsp0 STR_X STR_s STR_p "\0"
+#define STRING_Xwd0 STR_X STR_w STR_d "\0"
#define STRING_Yi0 STR_Y STR_i "\0"
#define STRING_Z0 STR_Z "\0"
#define STRING_Zl0 STR_Z STR_l "\0"
@@ -376,6 +380,10 @@
STRING_Tifinagh0
STRING_Ugaritic0
STRING_Vai0
+ STRING_Xan0
+ STRING_Xps0
+ STRING_Xsp0
+ STRING_Xwd0
STRING_Yi0
STRING_Z0
STRING_Zl0
@@ -509,11 +517,15 @@
{ 891, PT_SC, ucp_Tifinagh },
{ 900, PT_SC, ucp_Ugaritic },
{ 909, PT_SC, ucp_Vai },
- { 913, PT_SC, ucp_Yi },
- { 916, PT_GC, ucp_Z },
- { 918, PT_PC, ucp_Zl },
- { 921, PT_PC, ucp_Zp },
- { 924, PT_PC, ucp_Zs }
+ { 913, PT_ALNUM, 0 },
+ { 917, PT_PXSPACE, 0 },
+ { 921, PT_SPACE, 0 },
+ { 925, PT_WORD, 0 },
+ { 929, PT_SC, ucp_Yi },
+ { 932, PT_GC, ucp_Z },
+ { 934, PT_PC, ucp_Zl },
+ { 937, PT_PC, ucp_Zp },
+ { 940, PT_PC, ucp_Zs }
};
const int _pcre_utt_size = sizeof(_pcre_utt)/sizeof(ucp_type_table);
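Note on the renumbered offsets in the _pcre_utt hunk above: each entry's first field is the offset of its name within the concatenated, NUL-terminated name strings, so inserting four three-letter names (4 bytes each with the terminator) moves every later entry by 16. The sketch below recomputes the tail of the table from the names shown in the diff. The tail_names array and the name_offsets function are hypothetical helpers for illustration, and the starting offset 909 for "Vai" is taken from the unchanged context line.

```c
#include <stddef.h>
#include <string.h>

/* Names at the end of the sorted table after this change, as in the diff. */
static const char *const tail_names[] = {
  "Vai", "Xan", "Xps", "Xsp", "Xwd", "Yi", "Z", "Zl", "Zp", "Zs"
};

#define TAIL_COUNT (sizeof(tail_names) / sizeof(tail_names[0]))

/* Fill offsets[0..TAIL_COUNT-1] with the offset of each name, given the
   offset of the first one. Each name occupies strlen(name) + 1 bytes because
   of its "\0" terminator. */
static void name_offsets(int first_offset, int *offsets)
{
  int offset = first_offset;
  size_t i;
  for (i = 0; i < TAIL_COUNT; i++)
    {
    offsets[i] = offset;
    offset += (int)strlen(tail_names[i]) + 1;
    }
}
```

Starting from 909 this yields 909, 913, 917, 921, 925, 929, 932, 934, 937, 940, which agrees with the added lines in the hunk.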
Modified: code/trunk/pcre_xclass.c
===================================================================
--- code/trunk/pcre_xclass.c 2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/pcre_xclass.c 2010-05-05 10:44:20 UTC (rev 517)
@@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
- Copyright (c) 1997-2009 University of Cambridge
+ Copyright (c) 1997-2010 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -112,12 +112,13 @@
break;
case PT_LAMP:
- if ((prop->chartype == ucp_Lu || prop->chartype == ucp_Ll || prop->chartype == ucp_Lt) ==
- (t == XCL_PROP)) return !negated;
+ if ((prop->chartype == ucp_Lu || prop->chartype == ucp_Ll ||
+ prop->chartype == ucp_Lt) == (t == XCL_PROP)) return !negated;
break;
case PT_GC:
- if ((data[1] == _pcre_ucp_gentype[prop->chartype]) == (t == XCL_PROP)) return !negated;
+ if ((data[1] == _pcre_ucp_gentype[prop->chartype]) == (t == XCL_PROP))
+ return !negated;
break;
case PT_PC:
@@ -127,7 +128,34 @@
case PT_SC:
if ((data[1] == prop->script) == (t == XCL_PROP)) return !negated;
break;
+
+ case PT_ALNUM:
+ if ((_pcre_ucp_gentype[prop->chartype] == ucp_L ||
+ _pcre_ucp_gentype[prop->chartype] == ucp_N) == (t == XCL_PROP))
+ return !negated;
+ break;
+
+ case PT_SPACE: /* Perl space */
+ if ((_pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+ c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR)
+ == (t == XCL_PROP))
+ return !negated;
+ break;
+ case PT_PXSPACE: /* POSIX space */
+ if ((_pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+ c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
+ c == CHAR_FF || c == CHAR_CR) == (t == XCL_PROP))
+ return !negated;
+ break;
+
+ case PT_WORD:
+ if ((_pcre_ucp_gentype[prop->chartype] == ucp_L ||
+ _pcre_ucp_gentype[prop->chartype] == ucp_N || c == CHAR_UNDERSCORE)
+ == (t == XCL_PROP))
+ return !negated;
+ break;
+
/* This should never occur, but compilers may mutter if there is no
default. */
Modified: code/trunk/testdata/testinput12
===================================================================
--- code/trunk/testdata/testinput12 2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/testdata/testinput12 2010-05-05 10:44:20 UTC (rev 517)
@@ -211,5 +211,152 @@
A\x{300}\x{301}\x{302}BC
*** Failers
\x{300}
+
+/-- These are PCRE's extra properties to help with Unicodizing \d etc. --/
+/^\p{Xan}/8
+ ABCD
+ 1234
+ \x{6ca}
+ \x{a6c}
+ \x{10a7}
+ ** Failers
+ _ABC
+
+/^\p{Xan}+/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ ** Failers
+ _ABC
+
+/^\p{Xan}+?/8
+ \x{6ca}\x{a6c}\x{10a7}_
+
+/^\p{Xan}*/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/^\p{Xan}{2,9}/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/^\p{Xan}{2,9}?/8
+ \x{6ca}\x{a6c}\x{10a7}_
+
+/^[\p{Xan}]/8
+ ABCD1234_
+ 1234abcd_
+ \x{6ca}
+ \x{a6c}
+ \x{10a7}
+ ** Failers
+ _ABC
+
+/^[\p{Xan}]+/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ ** Failers
+ _ABC
+
+/^>\p{Xsp}/8
+ >\x{1680}\x{2028}\x{0b}
+ >\x{a0}
+ ** Failers
+ \x{0b}
+
+/^>\p{Xsp}+/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xsp}+?/8
+ >\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xsp}*/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xsp}{2,9}/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xsp}{2,9}?/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>[\p{Xsp}]/8
+ >\x{2028}\x{0b}
+
+/^>[\p{Xsp}]+/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}/8
+ >\x{1680}\x{2028}\x{0b}
+ >\x{a0}
+ ** Failers
+ \x{0b}
+
+/^>\p{Xps}+/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}+?/8
+ >\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}*/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}{2,9}/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}{2,9}?/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>[\p{Xps}]/8
+ >\x{2028}\x{0b}
+
+/^>[\p{Xps}]+/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^\p{Xwd}/8
+ ABCD
+ 1234
+ \x{6ca}
+ \x{a6c}
+ \x{10a7}
+ _ABC
+ ** Failers
+ []
+
+/^\p{Xwd}+/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/^\p{Xwd}+?/8
+ \x{6ca}\x{a6c}\x{10a7}_
+
+/^\p{Xwd}*/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/^\p{Xwd}{2,9}/8
+ A_B12\x{6ca}\x{a6c}\x{10a7}
+
+/^\p{Xwd}{2,9}?/8
+ \x{6ca}\x{a6c}\x{10a7}_
+
+/^[\p{Xwd}]/8
+ ABCD1234_
+ 1234abcd_
+ \x{6ca}
+ \x{a6c}
+ \x{10a7}
+ _ABC
+ ** Failers
+ []
+
+/^[\p{Xwd}]+/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/-- A check not in UTF-8 mode --/
+
+/^[\p{Xwd}]+/
+ ABCD1234_
+
+/-- Some negative checks --/
+
+/^[\P{Xwd}]+/8
+ !.+\x{019}\x{35a}AB
+
+/^[\p{^Xwd}]+/8
+ !.+\x{019}\x{35a}AB
+
/-- End of testinput12 --/
Modified: code/trunk/testdata/testinput9
===================================================================
--- code/trunk/testdata/testinput9 2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/testdata/testinput9 2010-05-05 10:44:20 UTC (rev 517)
@@ -847,4 +847,117 @@
** Failers
\x{1d79}\x{a77d}
+/^\p{Xan}/8
+ ABCD
+ 1234
+ \x{6ca}
+ \x{a6c}
+ \x{10a7}
+ ** Failers
+ _ABC
+
+/^\p{Xan}+/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ ** Failers
+ _ABC
+
+/^\p{Xan}*/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/^\p{Xan}{2,9}/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/^[\p{Xan}]/8
+ ABCD1234_
+ 1234abcd_
+ \x{6ca}
+ \x{a6c}
+ \x{10a7}
+ ** Failers
+ _ABC
+
+/^[\p{Xan}]+/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ ** Failers
+ _ABC
+
+/^>\p{Xsp}/8
+ >\x{1680}\x{2028}\x{0b}
+ ** Failers
+ \x{0b}
+
+/^>\p{Xsp}+/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xsp}*/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xsp}{2,9}/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>[\p{Xsp}]/8
+ >\x{2028}\x{0b}
+
+/^>[\p{Xsp}]+/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}/8
+ >\x{1680}\x{2028}\x{0b}
+ >\x{a0}
+ ** Failers
+ \x{0b}
+
+/^>\p{Xps}+/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}+?/8
+ >\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}*/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}{2,9}/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}{2,9}?/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>[\p{Xps}]/8
+ >\x{2028}\x{0b}
+
+/^>[\p{Xps}]+/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^\p{Xwd}/8
+ ABCD
+ 1234
+ \x{6ca}
+ \x{a6c}
+ \x{10a7}
+ _ABC
+ ** Failers
+ []
+
+/^\p{Xwd}+/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/^\p{Xwd}*/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/^\p{Xwd}{2,9}/8
+ A_12\x{6ca}\x{a6c}\x{10a7}
+
+/^[\p{Xwd}]/8
+ ABCD1234_
+ 1234abcd_
+ \x{6ca}
+ \x{a6c}
+ \x{10a7}
+ _ABC
+ ** Failers
+ []
+
+/^[\p{Xwd}]+/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
/-- End of testinput9 --/
Modified: code/trunk/testdata/testoutput12
===================================================================
--- code/trunk/testdata/testoutput12 2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/testdata/testoutput12 2010-05-05 10:44:20 UTC (rev 517)
@@ -484,5 +484,223 @@
0: *
\x{300}
No match
+
+/-- These are PCRE's extra properties to help with Unicodizing \d etc. --/
+/^\p{Xan}/8
+ ABCD
+ 0: A
+ 1234
+ 0: 1
+ \x{6ca}
+ 0: \x{6ca}
+ \x{a6c}
+ 0: \x{a6c}
+ \x{10a7}
+ 0: \x{10a7}
+ ** Failers
+No match
+ _ABC
+No match
+
+/^\p{Xan}+/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ ** Failers
+No match
+ _ABC
+No match
+
+/^\p{Xan}+?/8
+ \x{6ca}\x{a6c}\x{10a7}_
+ 0: \x{6ca}
+
+/^\p{Xan}*/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+
+/^\p{Xan}{2,9}/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}
+
+/^\p{Xan}{2,9}?/8
+ \x{6ca}\x{a6c}\x{10a7}_
+ 0: \x{6ca}\x{a6c}
+
+/^[\p{Xan}]/8
+ ABCD1234_
+ 0: A
+ 1234abcd_
+ 0: 1
+ \x{6ca}
+ 0: \x{6ca}
+ \x{a6c}
+ 0: \x{a6c}
+ \x{10a7}
+ 0: \x{10a7}
+ ** Failers
+No match
+ _ABC
+No match
+
+/^[\p{Xan}]+/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ ** Failers
+No match
+ _ABC
+No match
+
+/^>\p{Xsp}/8
+ >\x{1680}\x{2028}\x{0b}
+ 0: >\x{1680}
+ >\x{a0}
+ 0: >\x{a0}
+ ** Failers
+No match
+ \x{0b}
+No match
+
+/^>\p{Xsp}+/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+
+/^>\p{Xsp}+?/8
+ >\x{1680}\x{2028}\x{0b}
+ 0: >\x{1680}
+
+/^>\p{Xsp}*/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+
+/^>\p{Xsp}{2,9}/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+
+/^>\p{Xsp}{2,9}?/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}
+
+/^>[\p{Xsp}]/8
+ >\x{2028}\x{0b}
+ 0: >\x{2028}
+
+/^>[\p{Xsp}]+/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+
+/^>\p{Xps}/8
+ >\x{1680}\x{2028}\x{0b}
+ 0: >\x{1680}
+ >\x{a0}
+ 0: >\x{a0}
+ ** Failers
+No match
+ \x{0b}
+No match
+
+/^>\p{Xps}+/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}+?/8
+ >\x{1680}\x{2028}\x{0b}
+ 0: >\x{1680}
+
+/^>\p{Xps}*/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}{2,9}/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}{2,9}?/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}
+
+/^>[\p{Xps}]/8
+ >\x{2028}\x{0b}
+ 0: >\x{2028}
+
+/^>[\p{Xps}]+/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^\p{Xwd}/8
+ ABCD
+ 0: A
+ 1234
+ 0: 1
+ \x{6ca}
+ 0: \x{6ca}
+ \x{a6c}
+ 0: \x{a6c}
+ \x{10a7}
+ 0: \x{10a7}
+ _ABC
+ 0: _
+ ** Failers
+No match
+ []
+No match
+
+/^\p{Xwd}+/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/^\p{Xwd}+?/8
+ \x{6ca}\x{a6c}\x{10a7}_
+ 0: \x{6ca}
+
+/^\p{Xwd}*/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/^\p{Xwd}{2,9}/8
+ A_B12\x{6ca}\x{a6c}\x{10a7}
+ 0: A_B12\x{6ca}\x{a6c}\x{10a7}
+
+/^\p{Xwd}{2,9}?/8
+ \x{6ca}\x{a6c}\x{10a7}_
+ 0: \x{6ca}\x{a6c}
+
+/^[\p{Xwd}]/8
+ ABCD1234_
+ 0: A
+ 1234abcd_
+ 0: 1
+ \x{6ca}
+ 0: \x{6ca}
+ \x{a6c}
+ 0: \x{a6c}
+ \x{10a7}
+ 0: \x{10a7}
+ _ABC
+ 0: _
+ ** Failers
+No match
+ []
+No match
+
+/^[\p{Xwd}]+/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/-- A check not in UTF-8 mode --/
+
+/^[\p{Xwd}]+/
+ ABCD1234_
+ 0: ABCD1234_
+
+/-- Some negative checks --/
+
+/^[\P{Xwd}]+/8
+ !.+\x{019}\x{35a}AB
+ 0: !.+\x{19}\x{35a}
+
+/^[\p{^Xwd}]+/8
+ !.+\x{019}\x{35a}AB
+ 0: !.+\x{19}\x{35a}
+
/-- End of testinput12 --/
Modified: code/trunk/testdata/testoutput9
===================================================================
--- code/trunk/testdata/testoutput9 2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/testdata/testoutput9 2010-05-05 10:44:20 UTC (rev 517)
@@ -1674,4 +1674,324 @@
\x{1d79}\x{a77d}
No match
+/^\p{Xan}/8
+ ABCD
+ 0: A
+ 1234
+ 0: 1
+ \x{6ca}
+ 0: \x{6ca}
+ \x{a6c}
+ 0: \x{a6c}
+ \x{10a7}
+ 0: \x{10a7}
+ ** Failers
+No match
+ _ABC
+No match
+
+/^\p{Xan}+/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ 1: ABCD1234\x{6ca}\x{a6c}
+ 2: ABCD1234\x{6ca}
+ 3: ABCD1234
+ 4: ABCD123
+ 5: ABCD12
+ 6: ABCD1
+ 7: ABCD
+ 8: ABC
+ 9: AB
+10: A
+ ** Failers
+No match
+ _ABC
+No match
+
+/^\p{Xan}*/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ 1: ABCD1234\x{6ca}\x{a6c}
+ 2: ABCD1234\x{6ca}
+ 3: ABCD1234
+ 4: ABCD123
+ 5: ABCD12
+ 6: ABCD1
+ 7: ABCD
+ 8: ABC
+ 9: AB
+10: A
+11:
+
+/^\p{Xan}{2,9}/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}
+ 1: ABCD1234
+ 2: ABCD123
+ 3: ABCD12
+ 4: ABCD1
+ 5: ABCD
+ 6: ABC
+ 7: AB
+
+/^[\p{Xan}]/8
+ ABCD1234_
+ 0: A
+ 1234abcd_
+ 0: 1
+ \x{6ca}
+ 0: \x{6ca}
+ \x{a6c}
+ 0: \x{a6c}
+ \x{10a7}
+ 0: \x{10a7}
+ ** Failers
+No match
+ _ABC
+No match
+
+/^[\p{Xan}]+/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ 1: ABCD1234\x{6ca}\x{a6c}
+ 2: ABCD1234\x{6ca}
+ 3: ABCD1234
+ 4: ABCD123
+ 5: ABCD12
+ 6: ABCD1
+ 7: ABCD
+ 8: ABC
+ 9: AB
+10: A
+ ** Failers
+No match
+ _ABC
+No match
+
+/^>\p{Xsp}/8
+ >\x{1680}\x{2028}\x{0b}
+ 0: >\x{1680}
+ ** Failers
+No match
+ \x{0b}
+No match
+
+/^>\p{Xsp}+/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}
+ 4: > \x{09}\x{0a}\x{0c}
+ 5: > \x{09}\x{0a}
+ 6: > \x{09}
+ 7: >
+
+/^>\p{Xsp}*/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}
+ 4: > \x{09}\x{0a}\x{0c}
+ 5: > \x{09}\x{0a}
+ 6: > \x{09}
+ 7: >
+ 8: >
+
+/^>\p{Xsp}{2,9}/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}
+ 4: > \x{09}\x{0a}\x{0c}
+ 5: > \x{09}\x{0a}
+ 6: > \x{09}
+
+/^>[\p{Xsp}]/8
+ >\x{2028}\x{0b}
+ 0: >\x{2028}
+
+/^>[\p{Xsp}]+/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}
+ 4: > \x{09}\x{0a}\x{0c}
+ 5: > \x{09}\x{0a}
+ 6: > \x{09}
+ 7: >
+
+/^>\p{Xps}/8
+ >\x{1680}\x{2028}\x{0b}
+ 0: >\x{1680}
+ >\x{a0}
+ 0: >\x{a0}
+ ** Failers
+No match
+ \x{0b}
+No match
+
+/^>\p{Xps}+/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+ 8: >
+
+/^>\p{Xps}+?/8
+ >\x{1680}\x{2028}\x{0b}
+ 0: >\x{1680}\x{2028}\x{0b}
+ 1: >\x{1680}\x{2028}
+ 2: >\x{1680}
+
+/^>\p{Xps}*/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+ 8: >
+ 9: >
+
+/^>\p{Xps}{2,9}/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+
+/^>\p{Xps}{2,9}?/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+
+/^>[\p{Xps}]/8
+ >\x{2028}\x{0b}
+ 0: >\x{2028}
+
+/^>[\p{Xps}]+/8
+ > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+ 8: >
+
+/^\p{Xwd}/8
+ ABCD
+ 0: A
+ 1234
+ 0: 1
+ \x{6ca}
+ 0: \x{6ca}
+ \x{a6c}
+ 0: \x{a6c}
+ \x{10a7}
+ 0: \x{10a7}
+ _ABC
+ 0: _
+ ** Failers
+No match
+ []
+No match
+
+/^\p{Xwd}+/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 1: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ 2: ABCD1234\x{6ca}\x{a6c}
+ 3: ABCD1234\x{6ca}
+ 4: ABCD1234
+ 5: ABCD123
+ 6: ABCD12
+ 7: ABCD1
+ 8: ABCD
+ 9: ABC
+10: AB
+11: A
+
+/^\p{Xwd}*/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 1: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ 2: ABCD1234\x{6ca}\x{a6c}
+ 3: ABCD1234\x{6ca}
+ 4: ABCD1234
+ 5: ABCD123
+ 6: ABCD12
+ 7: ABCD1
+ 8: ABCD
+ 9: ABC
+10: AB
+11: A
+12:
+
+/^\p{Xwd}{2,9}/8
+ A_12\x{6ca}\x{a6c}\x{10a7}
+ 0: A_12\x{6ca}\x{a6c}\x{10a7}
+ 1: A_12\x{6ca}\x{a6c}
+ 2: A_12\x{6ca}
+ 3: A_12
+ 4: A_1
+ 5: A_
+
+/^[\p{Xwd}]/8
+ ABCD1234_
+ 0: A
+ 1234abcd_
+ 0: 1
+ \x{6ca}
+ 0: \x{6ca}
+ \x{a6c}
+ 0: \x{a6c}
+ \x{10a7}
+ 0: \x{10a7}
+ _ABC
+ 0: _
+ ** Failers
+No match
+ []
+No match
+
+/^[\p{Xwd}]+/8
+ ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 1: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ 2: ABCD1234\x{6ca}\x{a6c}
+ 3: ABCD1234\x{6ca}
+ 4: ABCD1234
+ 5: ABCD123
+ 6: ABCD12
+ 7: ABCD1
+ 8: ABCD
+ 9: ABC
+10: AB
+11: A
+
/-- End of testinput9 --/