[Pcre-svn] [517] code/trunk: Add new special properties Xan,…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [517] code/trunk: Add new special properties Xan, Xps, Xsp, Xwd to help with \w etc.
Revision: 517
          http://vcs.pcre.org/viewvc?view=rev&revision=517
Author:   ph10
Date:     2010-05-05 11:44:20 +0100 (Wed, 05 May 2010)


Log Message:
-----------
Add new special properties Xan, Xps, Xsp, Xwd to help with \w etc.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/pcrepattern.3
    code/trunk/doc/pcresyntax.3
    code/trunk/maint/GenerateUtt.py
    code/trunk/pcre_dfa_exec.c
    code/trunk/pcre_exec.c
    code/trunk/pcre_internal.h
    code/trunk/pcre_tables.c
    code/trunk/pcre_xclass.c
    code/trunk/testdata/testinput12
    code/trunk/testdata/testinput9
    code/trunk/testdata/testoutput12
    code/trunk/testdata/testoutput9


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/ChangeLog    2010-05-05 10:44:20 UTC (rev 517)
@@ -28,7 +28,11 @@


7. Minor change to pcretest.c to avoid a compiler warning.

+8.  Added four artificial Unicode properties to help with an option to make
+    \s etc use properties. The new properties are: Xan (alphanumeric), Xsp 
+    (Perl space), Xps (POSIX space), and Xwd (word).


+
Version 8.02 19-Mar-2010
------------------------


Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3    2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/doc/pcrepattern.3    2010-05-05 10:44:20 UTC (rev 517)
@@ -505,10 +505,16 @@
   \eX       an extended Unicode sequence
 .sp
 The property names represented by \fIxx\fP above are limited to the Unicode
-script names, the general category properties, and "Any", which matches any
-character (including newline). Other properties such as "InMusicalSymbols" are
-not currently supported by PCRE. Note that \eP{Any} does not match any
-characters, so always causes a match failure.
+script names, the general category properties, "Any", which matches any
+character (including newline), and some special PCRE properties (described
+in the 
+.\" HTML <a href="#extraprops">
+.\" </a>
+next section). 
+.\"
+Other Perl properties such as "InMusicalSymbols" are not currently supported by
+PCRE. Note that \eP{Any} does not match any characters, so always causes a
+match failure.
 .P
 Sets of Unicode characters are defined as belonging to certain scripts. A
 character from one of these sets can be matched using a script name. For
@@ -613,10 +619,10 @@
 Vai,
 Yi.
 .P
-Each character has exactly one general category property, specified by a
-two-letter abbreviation. For compatibility with Perl, negation can be specified
-by including a circumflex between the opening brace and the property name. For
-example, \ep{^Lu} is the same as \eP{Lu}.
+Each character has exactly one Unicode general category property, specified by
+a two-letter abbreviation. For compatibility with Perl, negation can be
+specified by including a circumflex between the opening brace and the property
+name. For example, \ep{^Lu} is the same as \eP{Lu}.
 .P
 If only one letter is specified with \ep or \eP, it includes all the general
 category properties that start with that letter. In this case, in the absence
@@ -718,6 +724,27 @@
 properties in PCRE.
 .
 .
+.\" HTML <a name="extraprops"></a>
+.SS PCRE's additional properties
+.rs
+.sp
+As well as the standard Unicode properties described in the previous 
+section, PCRE supports four more that make it possible to convert traditional 
+escape sequences such as \ew and \es and POSIX character classes to use Unicode
+properties. These are:
+.sp
+  Xan   Any alphanumeric character
+  Xps   Any POSIX space character
+  Xsp   Any Perl space character
+  Xwd   Any Perl "word" character
+.sp
+Xan matches characters that have either the L (letter) or the N (number) 
+property. Xps matches the characters tab, linefeed, vertical tab, formfeed, or 
+carriage return, and any other character that has the Z (separator) property.
+Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the 
+same characters as Xan, plus underscore.
+.
+.
 .\" HTML <a name="resetmatchstart"></a>
 .SS "Resetting the match start"
 .rs
@@ -2597,6 +2624,6 @@
 .rs
 .sp
 .nf
-Last updated: 03 May 2010
+Last updated: 05 May 2010
 Copyright (c) 1997-2010 University of Cambridge.
 .fi
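
The four properties described in the pcrepattern.3 change above can be modelled roughly as follows. This is only a sketch in Python using the standard `unicodedata` module, not PCRE's implementation; the helper names (`general_category`, `is_xan`, `is_xps`, `is_xsp`, `is_xwd`) are made up for illustration, and Python's Unicode tables are newer than those PCRE shipped in 2010, so results for recently assigned characters may differ.

```python
import unicodedata

# Control characters named in the man page text: tab, linefeed, formfeed, CR.
PERL_SPACE_CONTROLS = {0x09, 0x0A, 0x0C, 0x0D}
# POSIX space additionally includes vertical tab (0x0B).
POSIX_SPACE_CONTROLS = PERL_SPACE_CONTROLS | {0x0B}


def general_category(ch):
    """Return the one-letter general category, e.g. 'L' for 'Lu' (hypothetical helper)."""
    return unicodedata.category(ch)[0]


def is_xan(ch):
    # Xan: any character with the L (letter) or N (number) property.
    return general_category(ch) in ("L", "N")


def is_xps(ch):
    # Xps: Z (separator) property, or tab, linefeed, vertical tab, formfeed, CR.
    return general_category(ch) == "Z" or ord(ch) in POSIX_SPACE_CONTROLS


def is_xsp(ch):
    # Xsp: the same as Xps, except that vertical tab is excluded.
    return general_category(ch) == "Z" or ord(ch) in PERL_SPACE_CONTROLS


def is_xwd(ch):
    # Xwd: the same characters as Xan, plus underscore.
    return is_xan(ch) or ch == "_"
```

With this model, vertical tab is the only character on which `is_xps` and `is_xsp` disagree, and underscore is the only one on which `is_xwd` and `is_xan` disagree.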


Modified: code/trunk/doc/pcresyntax.3
===================================================================
--- code/trunk/doc/pcresyntax.3    2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/doc/pcresyntax.3    2010-05-05 10:44:20 UTC (rev 517)
@@ -45,6 +45,7 @@
   \eD         a character that is not a decimal digit
   \eh         a horizontal whitespace character
   \eH         a character that is not a horizontal whitespace character
+  \eN         a character that is not a newline 
   \ep{\fIxx\fP}     a character with the \fIxx\fP property
   \eP{\fIxx\fP}     a character without the \fIxx\fP property
   \eR         a newline sequence
@@ -59,7 +60,7 @@
 In PCRE, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII characters.
 .
 .
-.SH "GENERAL CATEGORY PROPERTY CODES FOR \ep and \eP"
+.SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
 .rs
 .sp
   C          Other
@@ -108,6 +109,15 @@
   Zs         Space separator
 .
 .
+.SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep and \eP"
+.rs
+.sp
+  Xan        Alphanumeric: union of properties L and N
+  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
+  Xsp        Perl space: property Z or tab, NL, FF, CR
+  Xwd        Perl word: property Xan or underscore 
+.
+.
 .SH "SCRIPT NAMES FOR \ep AND \eP"
 .rs
 .sp
@@ -459,6 +469,6 @@
 .rs
 .sp
 .nf
-Last updated: 01 March 2010
+Last updated: 05 May 2010
 Copyright (c) 1997-2010 University of Cambridge.
 .fi


Modified: code/trunk/maint/GenerateUtt.py
===================================================================
--- code/trunk/maint/GenerateUtt.py    2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/maint/GenerateUtt.py    2010-05-05 10:44:20 UTC (rev 517)
@@ -11,6 +11,7 @@
 # Modified by PH 17-March-2009 to generate the more verbose form that works
 # for UTF-support in EBCDIC as well as ASCII environments.
 # Modified by PH 01-March-2010 to add new scripts from Unicode 5.2.0.
+# Modified by PH 04-May-2010 to add new "X.." special categories.


script_names = ['Arabic', 'Armenian', 'Bengali', 'Bopomofo', 'Braille', 'Buginese', 'Buhid', 'Canadian_Aboriginal', \
'Cherokee', 'Common', 'Coptic', 'Cypriot', 'Cyrillic', 'Deseret', 'Devanagari', 'Ethiopic', 'Georgian', \
@@ -36,12 +37,23 @@

general_category_names = ['C', 'L', 'M', 'N', 'P', 'S', 'Z']

+# First add the Unicode script and category names.
+
utt_table = zip(script_names, ['PT_SC'] * len(script_names))
utt_table += zip(category_names, ['PT_PC'] * len(category_names))
utt_table += zip(general_category_names, ['PT_GC'] * len(general_category_names))
-utt_table.append(('L&', 'PT_LAMP'))
+
+# Now add our own specials.
+
utt_table.append(('Any', 'PT_ANY'))
+utt_table.append(('L&', 'PT_LAMP'))
+utt_table.append(('Xan', 'PT_ALNUM'))
+utt_table.append(('Xps', 'PT_PXSPACE'))
+utt_table.append(('Xsp', 'PT_SPACE'))
+utt_table.append(('Xwd', 'PT_WORD'))

+# Sort the table.
+
utt_table.sort()

 # We have to use STR_ macros to define the strings so that it all works in
@@ -74,7 +86,8 @@
 offset = 0
 last = ','
 for utt in utt_table:
-    if utt[1] in ('PT_ANY', 'PT_LAMP'):
+    if utt[1] in ('PT_ANY', 'PT_LAMP', 'PT_ALNUM', 'PT_PXSPACE', 
+          'PT_SPACE', 'PT_WORD'):
         value = '0'
     else:
         value = 'ucp_' + utt[0]


Modified: code/trunk/pcre_dfa_exec.c
===================================================================
--- code/trunk/pcre_dfa_exec.c    2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/pcre_dfa_exec.c    2010-05-05 10:44:20 UTC (rev 517)
@@ -955,7 +955,8 @@
           break;


           case PT_LAMP:
-          OK = prop->chartype == ucp_Lu || prop->chartype == ucp_Ll || prop->chartype == ucp_Lt;
+          OK = prop->chartype == ucp_Lu || prop->chartype == ucp_Ll || 
+               prop->chartype == ucp_Lt;
           break;


           case PT_GC:
@@ -969,6 +970,30 @@
           case PT_SC:
           OK = prop->script == code[2];
           break;
+          
+          /* These are specials for combination cases. */
+          
+          case PT_ALNUM:
+          OK = _pcre_ucp_gentype[prop->chartype] == ucp_L ||
+               _pcre_ucp_gentype[prop->chartype] == ucp_N;
+          break;        
+ 
+          case PT_SPACE:    /* Perl space */
+          OK = _pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+               c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR;
+          break;  
+ 
+          case PT_PXSPACE:  /* POSIX space */
+          OK = _pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+               c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
+               c == CHAR_FF || c == CHAR_CR;
+          break;      
+ 
+          case PT_WORD:
+          OK = _pcre_ucp_gentype[prop->chartype] == ucp_L ||
+               _pcre_ucp_gentype[prop->chartype] == ucp_N ||
+               c == CHAR_UNDERSCORE;
+          break;            


           /* Should never occur, but keep compilers from grumbling. */


@@ -1124,7 +1149,8 @@
           break;


           case PT_LAMP:
-          OK = prop->chartype == ucp_Lu || prop->chartype == ucp_Ll || prop->chartype == ucp_Lt;
+          OK = prop->chartype == ucp_Lu || prop->chartype == ucp_Ll || 
+            prop->chartype == ucp_Lt;
           break;


           case PT_GC:
@@ -1139,6 +1165,30 @@
           OK = prop->script == code[3];
           break;


+          /* These are specials for combination cases. */
+          
+          case PT_ALNUM:
+          OK = _pcre_ucp_gentype[prop->chartype] == ucp_L ||
+               _pcre_ucp_gentype[prop->chartype] == ucp_N;
+          break;        
+ 
+          case PT_SPACE:    /* Perl space */
+          OK = _pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+               c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR;
+          break;  
+ 
+          case PT_PXSPACE:  /* POSIX space */
+          OK = _pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+               c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
+               c == CHAR_FF || c == CHAR_CR;
+          break;      
+ 
+          case PT_WORD:
+          OK = _pcre_ucp_gentype[prop->chartype] == ucp_L ||
+               _pcre_ucp_gentype[prop->chartype] == ucp_N ||
+               c == CHAR_UNDERSCORE;
+          break;            
+
           /* Should never occur, but keep compilers from grumbling. */


           default:
@@ -1346,7 +1396,8 @@
           break;


           case PT_LAMP:
-          OK = prop->chartype == ucp_Lu || prop->chartype == ucp_Ll || prop->chartype == ucp_Lt;
+          OK = prop->chartype == ucp_Lu || prop->chartype == ucp_Ll || 
+            prop->chartype == ucp_Lt;
           break;


           case PT_GC:
@@ -1360,6 +1411,30 @@
           case PT_SC:
           OK = prop->script == code[3];
           break;
+          
+          /* These are specials for combination cases. */
+          
+          case PT_ALNUM:
+          OK = _pcre_ucp_gentype[prop->chartype] == ucp_L ||
+               _pcre_ucp_gentype[prop->chartype] == ucp_N;
+          break;        
+ 
+          case PT_SPACE:    /* Perl space */
+          OK = _pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+               c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR;
+          break;  
+ 
+          case PT_PXSPACE:  /* POSIX space */
+          OK = _pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+               c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
+               c == CHAR_FF || c == CHAR_CR;
+          break;      
+ 
+          case PT_WORD:
+          OK = _pcre_ucp_gentype[prop->chartype] == ucp_L ||
+               _pcre_ucp_gentype[prop->chartype] == ucp_N ||
+               c == CHAR_UNDERSCORE;
+          break;            


           /* Should never occur, but keep compilers from grumbling. */


@@ -1593,7 +1668,8 @@
           break;


           case PT_LAMP:
-          OK = prop->chartype == ucp_Lu || prop->chartype == ucp_Ll || prop->chartype == ucp_Lt;
+          OK = prop->chartype == ucp_Lu || prop->chartype == ucp_Ll || 
+            prop->chartype == ucp_Lt;
           break;


           case PT_GC:
@@ -1607,6 +1683,30 @@
           case PT_SC:
           OK = prop->script == code[5];
           break;
+          
+          /* These are specials for combination cases. */
+          
+          case PT_ALNUM:
+          OK = _pcre_ucp_gentype[prop->chartype] == ucp_L ||
+               _pcre_ucp_gentype[prop->chartype] == ucp_N;
+          break;        
+ 
+          case PT_SPACE:    /* Perl space */
+          OK = _pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+               c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR;
+          break;  
+ 
+          case PT_PXSPACE:  /* POSIX space */
+          OK = _pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+               c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
+               c == CHAR_FF || c == CHAR_CR;
+          break;      
+ 
+          case PT_WORD:
+          OK = _pcre_ucp_gentype[prop->chartype] == ucp_L ||
+               _pcre_ucp_gentype[prop->chartype] == ucp_N ||
+               c == CHAR_UNDERSCORE;
+          break;            


           /* Should never occur, but keep compilers from grumbling. */



Modified: code/trunk/pcre_exec.c
===================================================================
--- code/trunk/pcre_exec.c    2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/pcre_exec.c    2010-05-05 10:44:20 UTC (rev 517)
@@ -2060,7 +2060,7 @@
              prop->chartype == ucp_Ll ||
              prop->chartype == ucp_Lt) == (op == OP_NOTPROP))
           MRRETURN(MATCH_NOMATCH);
-         break;
+        break;


         case PT_GC:
         if ((ecode[2] != _pcre_ucp_gentype[prop->chartype]) == (op == OP_PROP))
@@ -2076,7 +2076,39 @@
         if ((ecode[2] != prop->script) == (op == OP_PROP))
           MRRETURN(MATCH_NOMATCH);
         break;
+        
+        /* These are specials */
+        
+        case PT_ALNUM:
+        if ((_pcre_ucp_gentype[prop->chartype] == ucp_L ||
+             _pcre_ucp_gentype[prop->chartype] == ucp_N) == (op == OP_NOTPROP))
+          MRRETURN(MATCH_NOMATCH);
+        break;   
+ 
+        case PT_SPACE:    /* Perl space */
+        if ((_pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+             c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR)
+               == (op == OP_NOTPROP))
+          MRRETURN(MATCH_NOMATCH);
+        break;   
+ 
+        case PT_PXSPACE:  /* POSIX space */
+        if ((_pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+             c == CHAR_HT || c == CHAR_NL || c == CHAR_VT || 
+             c == CHAR_FF || c == CHAR_CR)
+               == (op == OP_NOTPROP))
+          MRRETURN(MATCH_NOMATCH);
+        break;   


+        case PT_WORD:   
+        if ((_pcre_ucp_gentype[prop->chartype] == ucp_L ||
+             _pcre_ucp_gentype[prop->chartype] == ucp_N || 
+             c == CHAR_UNDERSCORE) == (op == OP_NOTPROP))
+          MRRETURN(MATCH_NOMATCH);
+        break;   
+        
+        /* This should never occur */
+
         default:
         RRETURN(PCRE_ERROR_INTERNAL);
         }
@@ -3492,6 +3524,75 @@
               MRRETURN(MATCH_NOMATCH);
             }
           break;
+          
+          case PT_ALNUM:
+          for (i = 1; i <= min; i++)
+            {
+            if (eptr >= md->end_subject)
+              {
+              SCHECK_PARTIAL();
+              MRRETURN(MATCH_NOMATCH);
+              }
+            GETCHARINCTEST(c, eptr);
+            prop_category = UCD_CATEGORY(c); 
+            if ((prop_category == ucp_L || prop_category == ucp_N) 
+                   == prop_fail_result)
+              MRRETURN(MATCH_NOMATCH);
+            }
+          break;
+          
+          case PT_SPACE:    /* Perl space */
+          for (i = 1; i <= min; i++)
+            {
+            if (eptr >= md->end_subject)
+              {
+              SCHECK_PARTIAL();
+              MRRETURN(MATCH_NOMATCH);
+              }
+            GETCHARINCTEST(c, eptr);
+            prop_category = UCD_CATEGORY(c); 
+            if ((prop_category == ucp_Z || c == CHAR_HT || c == CHAR_NL || 
+                 c == CHAR_FF || c == CHAR_CR) 
+                   == prop_fail_result)
+              MRRETURN(MATCH_NOMATCH);
+            }
+          break;
+          
+          case PT_PXSPACE:  /* POSIX space */
+          for (i = 1; i <= min; i++)
+            {
+            if (eptr >= md->end_subject)
+              {
+              SCHECK_PARTIAL();
+              MRRETURN(MATCH_NOMATCH);
+              }
+            GETCHARINCTEST(c, eptr);
+            prop_category = UCD_CATEGORY(c); 
+            if ((prop_category == ucp_Z || c == CHAR_HT || c == CHAR_NL || 
+                 c == CHAR_VT || c == CHAR_FF || c == CHAR_CR) 
+                   == prop_fail_result)
+              MRRETURN(MATCH_NOMATCH);
+            }
+          break;
+          
+          case PT_WORD:   
+          for (i = 1; i <= min; i++)
+            {
+            if (eptr >= md->end_subject)
+              {
+              SCHECK_PARTIAL();
+              MRRETURN(MATCH_NOMATCH);
+              }
+            GETCHARINCTEST(c, eptr);
+            prop_category = UCD_CATEGORY(c); 
+            if ((prop_category == ucp_L || prop_category == ucp_N ||
+                 c == CHAR_UNDERSCORE) 
+                   == prop_fail_result)
+              MRRETURN(MATCH_NOMATCH);
+            }
+          break;
+          
+          /* This should not occur */


           default:
           RRETURN(PCRE_ERROR_INTERNAL);
@@ -4132,6 +4233,88 @@
             }
           /* Control never gets here */


+          case PT_ALNUM:
+          for (fi = min;; fi++)
+            {
+            RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM39);
+            if (rrc != MATCH_NOMATCH) RRETURN(rrc);
+            if (fi >= max) MRRETURN(MATCH_NOMATCH);
+            if (eptr >= md->end_subject)
+              {
+              SCHECK_PARTIAL();
+              MRRETURN(MATCH_NOMATCH);
+              }
+            GETCHARINC(c, eptr);
+            prop_category = UCD_CATEGORY(c); 
+            if ((prop_category == ucp_L || prop_category == ucp_N) 
+                   == prop_fail_result)
+              MRRETURN(MATCH_NOMATCH);
+            }
+          /* Control never gets here */
+          
+          case PT_SPACE:    /* Perl space */ 
+          for (fi = min;; fi++)
+            {
+            RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM39);
+            if (rrc != MATCH_NOMATCH) RRETURN(rrc);
+            if (fi >= max) MRRETURN(MATCH_NOMATCH);
+            if (eptr >= md->end_subject)
+              {
+              SCHECK_PARTIAL();
+              MRRETURN(MATCH_NOMATCH);
+              }
+            GETCHARINC(c, eptr);
+            prop_category = UCD_CATEGORY(c); 
+            if ((prop_category == ucp_Z || c == CHAR_HT || c == CHAR_NL || 
+                 c == CHAR_FF || c == CHAR_CR) 
+                   == prop_fail_result)
+              MRRETURN(MATCH_NOMATCH);
+            }
+          /* Control never gets here */
+           
+          case PT_PXSPACE:  /* POSIX space */
+          for (fi = min;; fi++)
+            {
+            RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM39);
+            if (rrc != MATCH_NOMATCH) RRETURN(rrc);
+            if (fi >= max) MRRETURN(MATCH_NOMATCH);
+            if (eptr >= md->end_subject)
+              {
+              SCHECK_PARTIAL();
+              MRRETURN(MATCH_NOMATCH);
+              }
+            GETCHARINC(c, eptr);
+            prop_category = UCD_CATEGORY(c); 
+            if ((prop_category == ucp_Z || c == CHAR_HT || c == CHAR_NL || 
+                 c == CHAR_VT || c == CHAR_FF || c == CHAR_CR) 
+                   == prop_fail_result)
+              MRRETURN(MATCH_NOMATCH);
+            }
+          /* Control never gets here */
+          
+          case PT_WORD: 
+          for (fi = min;; fi++)
+            {
+            RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM39);
+            if (rrc != MATCH_NOMATCH) RRETURN(rrc);
+            if (fi >= max) MRRETURN(MATCH_NOMATCH);
+            if (eptr >= md->end_subject)
+              {
+              SCHECK_PARTIAL();
+              MRRETURN(MATCH_NOMATCH);
+              }
+            GETCHARINC(c, eptr);
+            prop_category = UCD_CATEGORY(c); 
+            if ((prop_category == ucp_L || 
+                 prop_category == ucp_N ||
+                 c == CHAR_UNDERSCORE) 
+                   == prop_fail_result)
+              MRRETURN(MATCH_NOMATCH);
+            }
+          /* Control never gets here */
+
+          /* This should never occur */
+           
           default:
           RRETURN(PCRE_ERROR_INTERNAL);
           }
@@ -4553,6 +4736,83 @@
             eptr+= len;
             }
           break;
+          
+          case PT_ALNUM:
+          for (i = min; i < max; i++)
+            {
+            int len = 1;
+            if (eptr >= md->end_subject)
+              {
+              SCHECK_PARTIAL();
+              break;
+              }
+            GETCHARLEN(c, eptr, len);
+            prop_category = UCD_CATEGORY(c);
+            if ((prop_category == ucp_L || prop_category == ucp_N) 
+                 == prop_fail_result)
+              break;
+            eptr+= len;
+            }
+          break;
+
+          case PT_SPACE:    /* Perl space */
+          for (i = min; i < max; i++)
+            {
+            int len = 1;
+            if (eptr >= md->end_subject)
+              {
+              SCHECK_PARTIAL();
+              break;
+              }
+            GETCHARLEN(c, eptr, len);
+            prop_category = UCD_CATEGORY(c);
+            if ((prop_category == ucp_Z || c == CHAR_HT || c == CHAR_NL ||
+                 c == CHAR_FF || c == CHAR_CR) 
+                 == prop_fail_result)
+              break;
+            eptr+= len;
+            }
+          break;
+
+          case PT_PXSPACE:  /* POSIX space */
+          for (i = min; i < max; i++)
+            {
+            int len = 1;
+            if (eptr >= md->end_subject)
+              {
+              SCHECK_PARTIAL();
+              break;
+              }
+            GETCHARLEN(c, eptr, len);
+            prop_category = UCD_CATEGORY(c);
+            if ((prop_category == ucp_Z || c == CHAR_HT || c == CHAR_NL ||
+                 c == CHAR_VT || c == CHAR_FF || c == CHAR_CR) 
+                 == prop_fail_result)
+              break;
+            eptr+= len;
+            }
+          break;
+
+          case PT_WORD:
+          for (i = min; i < max; i++)
+            {
+            int len = 1;
+            if (eptr >= md->end_subject)
+              {
+              SCHECK_PARTIAL();
+              break;
+              }
+            GETCHARLEN(c, eptr, len);
+            prop_category = UCD_CATEGORY(c);
+            if ((prop_category == ucp_L || prop_category == ucp_N ||
+                 c == CHAR_UNDERSCORE) == prop_fail_result)
+              break;
+            eptr+= len;
+            }
+          break;
+
+          default:
+          RRETURN(PCRE_ERROR_INTERNAL);
           }


         /* eptr is now past the end of the maximum run */
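
The new cases in pcre_exec.c repeatedly compare the outcome of the property test with a negation flag (`op == OP_NOTPROP` or `prop_fail_result`) and fail when the two are equal. The sketch below models that idiom and the greedy "maximum run" loop for Xwd in Python. It is a simplified illustration, not a translation: `has_xwd`, `single_match_fails` and `greedy_run` are hypothetical names, the separate minimum-count loop is folded into one loop, and backtracking and partial matching are omitted.

```python
import unicodedata


def has_xwd(ch):
    """Model of the PT_WORD test: category L or N, or underscore."""
    return unicodedata.category(ch)[0] in ("L", "N") or ch == "_"


def single_match_fails(ch, negated):
    """Fail when the test result equals the negation flag.

    negated=False models \\p{Xwd}: fail if the character lacks the property.
    negated=True models \\P{Xwd}: fail if the character has the property.
    """
    return has_xwd(ch) == negated


def greedy_run(subject, start, min_count, max_count, negated):
    """Return the index just past the longest acceptable run, or None
    if fewer than min_count characters were accepted."""
    pos = start
    count = 0
    while count < max_count and pos < len(subject):
        if has_xwd(subject[pos]) == negated:
            break  # this character ends the run
        pos += 1
        count += 1
    return pos if count >= min_count else None
```

Comparing the boolean test against the flag lets a single expression serve both the positive and the negated form of the property.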


Modified: code/trunk/pcre_internal.h
===================================================================
--- code/trunk/pcre_internal.h    2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/pcre_internal.h    2010-05-05 10:44:20 UTC (rev 517)
@@ -1190,9 +1190,13 @@


 #define PT_ANY        0    /* Any property - matches all chars */
 #define PT_LAMP       1    /* L& - the union of Lu, Ll, Lt */
-#define PT_GC         2    /* General characteristic (e.g. L) */
-#define PT_PC         3    /* Particular characteristic (e.g. Lu) */
+#define PT_GC         2    /* Specified general characteristic (e.g. L) */
+#define PT_PC         3    /* Specified particular characteristic (e.g. Lu) */
 #define PT_SC         4    /* Script (e.g. Han) */
+#define PT_ALNUM      5    /* Alphanumeric - the union of L and N */
+#define PT_SPACE      6    /* Perl space - Z plus 9,10,12,13 */
+#define PT_PXSPACE    7    /* POSIX space - Z plus 9,10,11,12,13 */
+#define PT_WORD       8    /* Word - L plus N plus underscore */


/* Flag bits and data types for the extended class (OP_XCLASS) for classes that
contain UTF-8 characters with values greater than 255. */

Modified: code/trunk/pcre_tables.c
===================================================================
--- code/trunk/pcre_tables.c    2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/pcre_tables.c    2010-05-05 10:44:20 UTC (rev 517)
@@ -243,6 +243,10 @@
 #define STRING_Tifinagh0 STR_T STR_i STR_f STR_i STR_n STR_a STR_g STR_h "\0"
 #define STRING_Ugaritic0 STR_U STR_g STR_a STR_r STR_i STR_t STR_i STR_c "\0"
 #define STRING_Vai0 STR_V STR_a STR_i "\0"
+#define STRING_Xan0 STR_X STR_a STR_n "\0"
+#define STRING_Xps0 STR_X STR_p STR_s "\0"
+#define STRING_Xsp0 STR_X STR_s STR_p "\0"
+#define STRING_Xwd0 STR_X STR_w STR_d "\0"
 #define STRING_Yi0 STR_Y STR_i "\0"
 #define STRING_Z0 STR_Z "\0"
 #define STRING_Zl0 STR_Z STR_l "\0"
@@ -376,6 +380,10 @@
   STRING_Tifinagh0
   STRING_Ugaritic0
   STRING_Vai0
+  STRING_Xan0
+  STRING_Xps0
+  STRING_Xsp0
+  STRING_Xwd0
   STRING_Yi0
   STRING_Z0
   STRING_Zl0
@@ -509,11 +517,15 @@
   { 891, PT_SC, ucp_Tifinagh },
   { 900, PT_SC, ucp_Ugaritic },
   { 909, PT_SC, ucp_Vai },
-  { 913, PT_SC, ucp_Yi },
-  { 916, PT_GC, ucp_Z },
-  { 918, PT_PC, ucp_Zl },
-  { 921, PT_PC, ucp_Zp },
-  { 924, PT_PC, ucp_Zs }
+  { 913, PT_ALNUM, 0 },
+  { 917, PT_PXSPACE, 0 },
+  { 921, PT_SPACE, 0 },
+  { 925, PT_WORD, 0 },
+  { 929, PT_SC, ucp_Yi },
+  { 932, PT_GC, ucp_Z },
+  { 934, PT_PC, ucp_Zl },
+  { 937, PT_PC, ucp_Zp },
+  { 940, PT_PC, ucp_Zs }
 };


const int _pcre_utt_size = sizeof(_pcre_utt)/sizeof(ucp_type_table);
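
The first field of each table entry above looks like an offset into the concatenated, NUL-terminated name strings, which would explain why every entry after Vai moves by 16 when four 3-letter names are inserted. The following Python sketch checks that arithmetic under that assumption; `name_offsets` is a made-up helper, and the base value 909 is simply the offset of Vai taken from the diff.

```python
def name_offsets(names, base=0):
    """Return (name, offset) pairs for names stored back to back,
    each followed by a single terminating NUL byte."""
    offsets = []
    offset = base
    for name in names:
        offsets.append((name, offset))
        offset += len(name) + 1  # +1 for the "\0" terminator
    return offsets


# Tail of the table before and after this revision, starting at Vai.
old_tail = ["Vai", "Yi", "Z", "Zl", "Zp", "Zs"]
new_tail = ["Vai", "Xan", "Xps", "Xsp", "Xwd", "Yi", "Z", "Zl", "Zp", "Zs"]

old_offsets = dict(name_offsets(old_tail, base=909))
new_offsets = dict(name_offsets(new_tail, base=909))

# Every pre-existing name after Vai shifts by 4 names * (3 chars + NUL) = 16.
shift = {name: new_offsets[name] - old_offsets[name] for name in old_tail}
```

Under this assumption the computed values agree with the diff: Yi moves from 913 to 929 and Zs from 924 to 940, while the new entries occupy 913, 917, 921 and 925.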

Modified: code/trunk/pcre_xclass.c
===================================================================
--- code/trunk/pcre_xclass.c    2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/pcre_xclass.c    2010-05-05 10:44:20 UTC (rev 517)
@@ -6,7 +6,7 @@
 and semantics are as close as possible to those of the Perl 5 language.


                        Written by Philip Hazel
-           Copyright (c) 1997-2009 University of Cambridge
+           Copyright (c) 1997-2010 University of Cambridge


 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -112,12 +112,13 @@
       break;


       case PT_LAMP:
-      if ((prop->chartype == ucp_Lu || prop->chartype == ucp_Ll || prop->chartype == ucp_Lt) ==
-          (t == XCL_PROP)) return !negated;
+      if ((prop->chartype == ucp_Lu || prop->chartype == ucp_Ll || 
+           prop->chartype == ucp_Lt) == (t == XCL_PROP)) return !negated;
       break;


       case PT_GC:
-      if ((data[1] == _pcre_ucp_gentype[prop->chartype]) == (t == XCL_PROP)) return !negated;
+      if ((data[1] == _pcre_ucp_gentype[prop->chartype]) == (t == XCL_PROP)) 
+        return !negated;
       break;


       case PT_PC:
@@ -127,7 +128,34 @@
       case PT_SC:
       if ((data[1] == prop->script) == (t == XCL_PROP)) return !negated;
       break;
+      
+      case PT_ALNUM:
+      if ((_pcre_ucp_gentype[prop->chartype] == ucp_L ||
+           _pcre_ucp_gentype[prop->chartype] == ucp_N) == (t == XCL_PROP))
+        return !negated;
+      break;         
+      
+      case PT_SPACE:    /* Perl space */
+      if ((_pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+           c == CHAR_HT || c == CHAR_NL || c == CHAR_FF || c == CHAR_CR) 
+             == (t == XCL_PROP))
+        return !negated;
+      break;         


+      case PT_PXSPACE:  /* POSIX space */
+      if ((_pcre_ucp_gentype[prop->chartype] == ucp_Z ||
+           c == CHAR_HT || c == CHAR_NL || c == CHAR_VT ||
+           c == CHAR_FF || c == CHAR_CR) == (t == XCL_PROP))
+        return !negated;
+      break;         
+
+      case PT_WORD:    
+      if ((_pcre_ucp_gentype[prop->chartype] == ucp_L ||
+           _pcre_ucp_gentype[prop->chartype] == ucp_N || c == CHAR_UNDERSCORE) 
+             == (t == XCL_PROP))
+        return !negated;
+      break;         
+
       /* This should never occur, but compilers may mutter if there is no
       default. */



Modified: code/trunk/testdata/testinput12
===================================================================
--- code/trunk/testdata/testinput12    2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/testdata/testinput12    2010-05-05 10:44:20 UTC (rev 517)
@@ -211,5 +211,152 @@
     A\x{300}\x{301}\x{302}BC 
     *** Failers
     \x{300}  
+    
+/-- These are PCRE's extra properties to help with Unicodizing \d etc. --/


+/^\p{Xan}/8
+    ABCD
+    1234
+    \x{6ca}
+    \x{a6c}
+    \x{10a7}   
+    ** Failers
+    _ABC   
+
+/^\p{Xan}+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+    ** Failers
+    _ABC   
+
+/^\p{Xan}+?/8
+    \x{6ca}\x{a6c}\x{10a7}_
+
+/^\p{Xan}*/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+    
+/^\p{Xan}{2,9}/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+    
+/^\p{Xan}{2,9}?/8
+    \x{6ca}\x{a6c}\x{10a7}_
+    
+/^[\p{Xan}]/8
+    ABCD1234_
+    1234abcd_
+    \x{6ca}
+    \x{a6c}
+    \x{10a7}   
+    ** Failers
+    _ABC   
+ 
+/^[\p{Xan}]+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+    ** Failers
+    _ABC   
+
+/^>\p{Xsp}/8
+    >\x{1680}\x{2028}\x{0b}
+    >\x{a0} 
+    ** Failers
+    \x{0b} 
+
+/^>\p{Xsp}+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xsp}+?/8
+    >\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xsp}*/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+    
+/^>\p{Xsp}{2,9}/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+    
+/^>\p{Xsp}{2,9}?/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+    
+/^>[\p{Xsp}]/8
+    >\x{2028}\x{0b}
+ 
+/^>[\p{Xsp}]+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}/8
+    >\x{1680}\x{2028}\x{0b}
+    >\x{a0} 
+    ** Failers
+    \x{0b} 
+
+/^>\p{Xps}+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}+?/8
+    >\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}*/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+    
+/^>\p{Xps}{2,9}/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+    
+/^>\p{Xps}{2,9}?/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+    
+/^>[\p{Xps}]/8
+    >\x{2028}\x{0b}
+ 
+/^>[\p{Xps}]+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^\p{Xwd}/8
+    ABCD
+    1234
+    \x{6ca}
+    \x{a6c}
+    \x{10a7}
+    _ABC    
+    ** Failers
+    [] 
+
+/^\p{Xwd}+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/^\p{Xwd}+?/8
+    \x{6ca}\x{a6c}\x{10a7}_
+
+/^\p{Xwd}*/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+    
+/^\p{Xwd}{2,9}/8
+    A_B12\x{6ca}\x{a6c}\x{10a7}
+    
+/^\p{Xwd}{2,9}?/8
+    \x{6ca}\x{a6c}\x{10a7}_
+    
+/^[\p{Xwd}]/8
+    ABCD1234_
+    1234abcd_
+    \x{6ca}
+    \x{a6c}
+    \x{10a7}   
+    _ABC 
+    ** Failers
+    []   
+ 
+/^[\p{Xwd}]+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/-- A check not in UTF-8 mode --/
+
+/^[\p{Xwd}]+/
+    ABCD1234_
+    
+/-- Some negative checks --/
+
+/^[\P{Xwd}]+/8
+    !.+\x{019}\x{35a}AB
+
+/^[\p{^Xwd}]+/8
+    !.+\x{019}\x{35a}AB
+
 /-- End of testinput12 --/


Modified: code/trunk/testdata/testinput9
===================================================================
--- code/trunk/testdata/testinput9    2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/testdata/testinput9    2010-05-05 10:44:20 UTC (rev 517)
@@ -847,4 +847,117 @@
     ** Failers 
     \x{1d79}\x{a77d} 


+/^\p{Xan}/8
+    ABCD
+    1234
+    \x{6ca}
+    \x{a6c}
+    \x{10a7}   
+    ** Failers
+    _ABC   
+
+/^\p{Xan}+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+    ** Failers
+    _ABC   
+
+/^\p{Xan}*/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+    
+/^\p{Xan}{2,9}/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+    
+/^[\p{Xan}]/8
+    ABCD1234_
+    1234abcd_
+    \x{6ca}
+    \x{a6c}
+    \x{10a7}   
+    ** Failers
+    _ABC   
+ 
+/^[\p{Xan}]+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+    ** Failers
+    _ABC   
+
+/^>\p{Xsp}/8
+    >\x{1680}\x{2028}\x{0b}
+    ** Failers
+    \x{0b} 
+
+/^>\p{Xsp}+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xsp}*/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+    
+/^>\p{Xsp}{2,9}/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+    
+/^>[\p{Xsp}]/8
+    >\x{2028}\x{0b}
+ 
+/^>[\p{Xsp}]+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}/8
+    >\x{1680}\x{2028}\x{0b}
+    >\x{a0} 
+    ** Failers
+    \x{0b} 
+
+/^>\p{Xps}+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}+?/8
+    >\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}*/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+    
+/^>\p{Xps}{2,9}/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+    
+/^>\p{Xps}{2,9}?/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+    
+/^>[\p{Xps}]/8
+    >\x{2028}\x{0b}
+ 
+/^>[\p{Xps}]+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^\p{Xwd}/8
+    ABCD
+    1234
+    \x{6ca}
+    \x{a6c}
+    \x{10a7}
+    _ABC    
+    ** Failers
+    [] 
+
+/^\p{Xwd}+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/^\p{Xwd}*/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+    
+/^\p{Xwd}{2,9}/8
+    A_12\x{6ca}\x{a6c}\x{10a7}
+    
+/^[\p{Xwd}]/8
+    ABCD1234_
+    1234abcd_
+    \x{6ca}
+    \x{a6c}
+    \x{10a7}   
+    _ABC 
+    ** Failers
+    []   
+ 
+/^[\p{Xwd}]+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
 /-- End of testinput9 --/ 


Modified: code/trunk/testdata/testoutput12
===================================================================
--- code/trunk/testdata/testoutput12    2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/testdata/testoutput12    2010-05-05 10:44:20 UTC (rev 517)
@@ -484,5 +484,223 @@
  0: *
     \x{300}  
 No match
+    
+/-- These are PCRE's extra properties to help with Unicodizing \d etc. --/


+/^\p{Xan}/8
+    ABCD
+ 0: A
+    1234
+ 0: 1
+    \x{6ca}
+ 0: \x{6ca}
+    \x{a6c}
+ 0: \x{a6c}
+    \x{10a7}   
+ 0: \x{10a7}
+    ** Failers
+No match
+    _ABC   
+No match
+
+/^\p{Xan}+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+    ** Failers
+No match
+    _ABC   
+No match
+
+/^\p{Xan}+?/8
+    \x{6ca}\x{a6c}\x{10a7}_
+ 0: \x{6ca}
+
+/^\p{Xan}*/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+    
+/^\p{Xan}{2,9}/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}
+    
+/^\p{Xan}{2,9}?/8
+    \x{6ca}\x{a6c}\x{10a7}_
+ 0: \x{6ca}\x{a6c}
+    
+/^[\p{Xan}]/8
+    ABCD1234_
+ 0: A
+    1234abcd_
+ 0: 1
+    \x{6ca}
+ 0: \x{6ca}
+    \x{a6c}
+ 0: \x{a6c}
+    \x{10a7}   
+ 0: \x{10a7}
+    ** Failers
+No match
+    _ABC   
+No match
+ 
+/^[\p{Xan}]+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+    ** Failers
+No match
+    _ABC   
+No match
+
+/^>\p{Xsp}/8
+    >\x{1680}\x{2028}\x{0b}
+ 0: >\x{1680}
+    >\x{a0} 
+ 0: >\x{a0}
+    ** Failers
+No match
+    \x{0b} 
+No match
+
+/^>\p{Xsp}+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+
+/^>\p{Xsp}+?/8
+    >\x{1680}\x{2028}\x{0b}
+ 0: >\x{1680}
+
+/^>\p{Xsp}*/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+    
+/^>\p{Xsp}{2,9}/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+    
+/^>\p{Xsp}{2,9}?/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}
+    
+/^>[\p{Xsp}]/8
+    >\x{2028}\x{0b}
+ 0: >\x{2028}
+ 
+/^>[\p{Xsp}]+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+
+/^>\p{Xps}/8
+    >\x{1680}\x{2028}\x{0b}
+ 0: >\x{1680}
+    >\x{a0} 
+ 0: >\x{a0}
+    ** Failers
+No match
+    \x{0b} 
+No match
+
+/^>\p{Xps}+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^>\p{Xps}+?/8
+    >\x{1680}\x{2028}\x{0b}
+ 0: >\x{1680}
+
+/^>\p{Xps}*/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+    
+/^>\p{Xps}{2,9}/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+    
+/^>\p{Xps}{2,9}?/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}
+    
+/^>[\p{Xps}]/8
+    >\x{2028}\x{0b}
+ 0: >\x{2028}
+ 
+/^>[\p{Xps}]+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+
+/^\p{Xwd}/8
+    ABCD
+ 0: A
+    1234
+ 0: 1
+    \x{6ca}
+ 0: \x{6ca}
+    \x{a6c}
+ 0: \x{a6c}
+    \x{10a7}
+ 0: \x{10a7}
+    _ABC    
+ 0: _
+    ** Failers
+No match
+    [] 
+No match
+
+/^\p{Xwd}+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/^\p{Xwd}+?/8
+    \x{6ca}\x{a6c}\x{10a7}_
+ 0: \x{6ca}
+
+/^\p{Xwd}*/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+    
+/^\p{Xwd}{2,9}/8
+    A_B12\x{6ca}\x{a6c}\x{10a7}
+ 0: A_B12\x{6ca}\x{a6c}\x{10a7}
+    
+/^\p{Xwd}{2,9}?/8
+    \x{6ca}\x{a6c}\x{10a7}_
+ 0: \x{6ca}\x{a6c}
+    
+/^[\p{Xwd}]/8
+    ABCD1234_
+ 0: A
+    1234abcd_
+ 0: 1
+    \x{6ca}
+ 0: \x{6ca}
+    \x{a6c}
+ 0: \x{a6c}
+    \x{10a7}   
+ 0: \x{10a7}
+    _ABC 
+ 0: _
+    ** Failers
+No match
+    []   
+No match
+ 
+/^[\p{Xwd}]+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+
+/-- A check not in UTF-8 mode --/
+
+/^[\p{Xwd}]+/
+    ABCD1234_
+ 0: ABCD1234_
+    
+/-- Some negative checks --/
+
+/^[\P{Xwd}]+/8
+    !.+\x{019}\x{35a}AB
+ 0: !.+\x{19}\x{35a}
+
+/^[\p{^Xwd}]+/8
+    !.+\x{019}\x{35a}AB
+ 0: !.+\x{19}\x{35a}
+
 /-- End of testinput12 --/


Modified: code/trunk/testdata/testoutput9
===================================================================
--- code/trunk/testdata/testoutput9    2010-05-04 15:51:35 UTC (rev 516)
+++ code/trunk/testdata/testoutput9    2010-05-05 10:44:20 UTC (rev 517)
@@ -1674,4 +1674,324 @@
     \x{1d79}\x{a77d} 
 No match


+/^\p{Xan}/8
+    ABCD
+ 0: A
+    1234
+ 0: 1
+    \x{6ca}
+ 0: \x{6ca}
+    \x{a6c}
+ 0: \x{a6c}
+    \x{10a7}   
+ 0: \x{10a7}
+    ** Failers
+No match
+    _ABC   
+No match
+
+/^\p{Xan}+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ 1: ABCD1234\x{6ca}\x{a6c}
+ 2: ABCD1234\x{6ca}
+ 3: ABCD1234
+ 4: ABCD123
+ 5: ABCD12
+ 6: ABCD1
+ 7: ABCD
+ 8: ABC
+ 9: AB
+10: A
+    ** Failers
+No match
+    _ABC   
+No match
+
+/^\p{Xan}*/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ 1: ABCD1234\x{6ca}\x{a6c}
+ 2: ABCD1234\x{6ca}
+ 3: ABCD1234
+ 4: ABCD123
+ 5: ABCD12
+ 6: ABCD1
+ 7: ABCD
+ 8: ABC
+ 9: AB
+10: A
+11: 
+    
+/^\p{Xan}{2,9}/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}
+ 1: ABCD1234
+ 2: ABCD123
+ 3: ABCD12
+ 4: ABCD1
+ 5: ABCD
+ 6: ABC
+ 7: AB
+    
+/^[\p{Xan}]/8
+    ABCD1234_
+ 0: A
+    1234abcd_
+ 0: 1
+    \x{6ca}
+ 0: \x{6ca}
+    \x{a6c}
+ 0: \x{a6c}
+    \x{10a7}   
+ 0: \x{10a7}
+    ** Failers
+No match
+    _ABC   
+No match
+ 
+/^[\p{Xan}]+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ 1: ABCD1234\x{6ca}\x{a6c}
+ 2: ABCD1234\x{6ca}
+ 3: ABCD1234
+ 4: ABCD123
+ 5: ABCD12
+ 6: ABCD1
+ 7: ABCD
+ 8: ABC
+ 9: AB
+10: A
+    ** Failers
+No match
+    _ABC   
+No match
+
+/^>\p{Xsp}/8
+    >\x{1680}\x{2028}\x{0b}
+ 0: >\x{1680}
+    ** Failers
+No match
+    \x{0b} 
+No match
+
+/^>\p{Xsp}+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}
+ 4: > \x{09}\x{0a}\x{0c}
+ 5: > \x{09}\x{0a}
+ 6: > \x{09}
+ 7: > 
+
+/^>\p{Xsp}*/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}
+ 4: > \x{09}\x{0a}\x{0c}
+ 5: > \x{09}\x{0a}
+ 6: > \x{09}
+ 7: > 
+ 8: >
+    
+/^>\p{Xsp}{2,9}/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}
+ 4: > \x{09}\x{0a}\x{0c}
+ 5: > \x{09}\x{0a}
+ 6: > \x{09}
+    
+/^>[\p{Xsp}]/8
+    >\x{2028}\x{0b}
+ 0: >\x{2028}
+ 
+/^>[\p{Xsp}]+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}
+ 4: > \x{09}\x{0a}\x{0c}
+ 5: > \x{09}\x{0a}
+ 6: > \x{09}
+ 7: > 
+
+/^>\p{Xps}/8
+    >\x{1680}\x{2028}\x{0b}
+ 0: >\x{1680}
+    >\x{a0} 
+ 0: >\x{a0}
+    ** Failers
+No match
+    \x{0b} 
+No match
+
+/^>\p{Xps}+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+ 8: > 
+
+/^>\p{Xps}+?/8
+    >\x{1680}\x{2028}\x{0b}
+ 0: >\x{1680}\x{2028}\x{0b}
+ 1: >\x{1680}\x{2028}
+ 2: >\x{1680}
+
+/^>\p{Xps}*/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+ 8: > 
+ 9: >
+    
+/^>\p{Xps}{2,9}/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+    
+/^>\p{Xps}{2,9}?/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+    
+/^>[\p{Xps}]/8
+    >\x{2028}\x{0b}
+ 0: >\x{2028}
+ 
+/^>[\p{Xps}]+/8
+    > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
+ 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}
+ 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}
+ 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}
+ 4: > \x{09}\x{0a}\x{0c}\x{0d}
+ 5: > \x{09}\x{0a}\x{0c}
+ 6: > \x{09}\x{0a}
+ 7: > \x{09}
+ 8: > 
+
+/^\p{Xwd}/8
+    ABCD
+ 0: A
+    1234
+ 0: 1
+    \x{6ca}
+ 0: \x{6ca}
+    \x{a6c}
+ 0: \x{a6c}
+    \x{10a7}
+ 0: \x{10a7}
+    _ABC    
+ 0: _
+    ** Failers
+No match
+    [] 
+No match
+
+/^\p{Xwd}+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 1: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ 2: ABCD1234\x{6ca}\x{a6c}
+ 3: ABCD1234\x{6ca}
+ 4: ABCD1234
+ 5: ABCD123
+ 6: ABCD12
+ 7: ABCD1
+ 8: ABCD
+ 9: ABC
+10: AB
+11: A
+
+/^\p{Xwd}*/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 1: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ 2: ABCD1234\x{6ca}\x{a6c}
+ 3: ABCD1234\x{6ca}
+ 4: ABCD1234
+ 5: ABCD123
+ 6: ABCD12
+ 7: ABCD1
+ 8: ABCD
+ 9: ABC
+10: AB
+11: A
+12: 
+    
+/^\p{Xwd}{2,9}/8
+    A_12\x{6ca}\x{a6c}\x{10a7}
+ 0: A_12\x{6ca}\x{a6c}\x{10a7}
+ 1: A_12\x{6ca}\x{a6c}
+ 2: A_12\x{6ca}
+ 3: A_12
+ 4: A_1
+ 5: A_
+    
+/^[\p{Xwd}]/8
+    ABCD1234_
+ 0: A
+    1234abcd_
+ 0: 1
+    \x{6ca}
+ 0: \x{6ca}
+    \x{a6c}
+ 0: \x{a6c}
+    \x{10a7}   
+ 0: \x{10a7}
+    _ABC 
+ 0: _
+    ** Failers
+No match
+    []   
+No match
+ 
+/^[\p{Xwd}]+/8
+    ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_
+ 1: ABCD1234\x{6ca}\x{a6c}\x{10a7}
+ 2: ABCD1234\x{6ca}\x{a6c}
+ 3: ABCD1234\x{6ca}
+ 4: ABCD1234
+ 5: ABCD123
+ 6: ABCD12
+ 7: ABCD1
+ 8: ABCD
+ 9: ABC
+10: AB
+11: A
+
 /-- End of testinput9 --/
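
Reader's note (not part of the commit): the expected outputs above suggest how the four properties relate. Xan stops before "_", Xwd accepts it, and Xsp stops before \x{0b} where Xps does not. The following is a minimal Python sketch of that behaviour, not PCRE's implementation. It assumes Xan means general category L or N, and that both space properties mean category Z plus a short list of control characters. It uses Python's unicodedata tables rather than PCRE's own, and the helper names (is_xan, is_xsp, is_xps, is_xwd, longest_prefix) are hypothetical.

```python
import unicodedata

def is_xan(ch):
    # Assumed definition: any letter (L*) or number (N*) general category.
    return unicodedata.category(ch)[0] in ("L", "N")

def is_xps(ch):
    # Assumed "POSIX space": separators (Z*) plus HT, NL, VT, FF, CR.
    return unicodedata.category(ch)[0] == "Z" or ch in "\t\n\x0b\x0c\r"

def is_xsp(ch):
    # Assumed "Perl space": the same set without VT (U+000B),
    # mirroring the tests above where \x{0b} is left unmatched.
    return ch != "\x0b" and is_xps(ch)

def is_xwd(ch):
    # Assumed "word": Xan plus underscore.
    return ch == "_" or is_xan(ch)

def longest_prefix(pred, s):
    # Greedy prefix match, roughly what ^\p{...}* reports as match 0.
    n = 0
    while n < len(s) and pred(s[n]):
        n += 1
    return s[:n]

# Subjects taken from the test data in this commit.
spaces = " \x09\x0a\x0c\x0d\xa0\u1680\u2028\x0b"
words = "ABCD1234\u06ca\u0a6c\u10a7_"
nonwords = "!.+\x19\u035aAB"

xsp_match = longest_prefix(is_xsp, spaces)   # stops before \x0b
xps_match = longest_prefix(is_xps, spaces)   # consumes \x0b as well
xan_match = longest_prefix(is_xan, words)    # stops before "_"
xwd_match = longest_prefix(is_xwd, words)    # consumes "_" as well
neg_match = longest_prefix(lambda c: not is_xwd(c), nonwords)
```

On these subjects the sketch reproduces the match-0 strings recorded in testoutput12 for the greedy `*` and `+` cases. It does not model backtracking, minimizing quantifiers, or the multiple matches that the DFA tests in testoutput9 report.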