[Pcre-svn] [412] code/trunk: Add support for (*UTF8).

From: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [412] code/trunk: Add support for (*UTF8).
Revision: 412
          http://vcs.pcre.org/viewvc?view=rev&revision=412
Author:   ph10
Date:     2009-04-11 11:34:37 +0100 (Sat, 11 Apr 2009)


Log Message:
-----------
Add support for (*UTF8).

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/pcre.3
    code/trunk/doc/pcreapi.3
    code/trunk/doc/pcrepattern.3
    code/trunk/doc/pcresyntax.3
    code/trunk/pcre_compile.c
    code/trunk/pcre_internal.h
    code/trunk/pcretest.c
    code/trunk/testdata/testinput5
    code/trunk/testdata/testoutput5


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2009-04-10 15:40:21 UTC (rev 411)
+++ code/trunk/ChangeLog    2009-04-11 10:34:37 UTC (rev 412)
@@ -1,7 +1,7 @@
 ChangeLog for PCRE
 ------------------


-Version 7.9 10-Apr-09
+Version 7.9 11-Apr-09
---------------------

 1.  When building with support for bzlib/zlib (pcregrep) and/or readline
@@ -116,6 +116,8 @@
 27. Wrapped the definitions of fileno and isatty for Windows, which appear in
     pcretest.c, inside #ifndefs, because it seems they are sometimes already 
     pre-defined. 
+    
+28. Added support for (*UTF8) at the start of a pattern. 





Modified: code/trunk/doc/pcre.3
===================================================================
--- code/trunk/doc/pcre.3    2009-04-10 15:40:21 UTC (rev 411)
+++ code/trunk/doc/pcre.3    2009-04-11 10:34:37 UTC (rev 412)
@@ -15,7 +15,7 @@
 Perl 5.10, including support for UTF-8 encoded strings and Unicode general
 category properties. However, UTF-8 and Unicode support has to be explicitly
 enabled; it is not the default. The Unicode tables correspond to Unicode
-release 5.0.0.
+release 5.1.
 .P
 In addition to the Perl-compatible matching function, PCRE contains an
 alternative matching function that matches the same compiled patterns in a
@@ -163,9 +163,10 @@
 .\" HREF
 \fBpcre_compile()\fP
 .\"
-with the PCRE_UTF8 option flag. When you do this, both the pattern and any
-subject strings that are matched against it are treated as UTF-8 strings
-instead of just strings of bytes.
+with the PCRE_UTF8 option flag, or the pattern must start with the sequence
+(*UTF8). When either of these is the case, both the pattern and any subject
+strings that are matched against it are treated as UTF-8 strings instead of
+just strings of bytes.
 .P
 If you compile PCRE with UTF-8 support, but do not use it at run time, the
 library will be a bit bigger, but the additional run time overhead is limited
@@ -290,6 +291,6 @@
 .rs
 .sp
 .nf
-Last updated: 18 March 2009
+Last updated: 11 April 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3    2009-04-10 15:40:21 UTC (rev 411)
+++ code/trunk/doc/pcreapi.3    2009-04-11 10:34:37 UTC (rev 412)
@@ -406,16 +406,17 @@
 .P
 The \fIoptions\fP argument contains various bit settings that affect the
 compilation. It should be zero if no options are required. The available
-options are described below. Some of them, in particular, those that are
-compatible with Perl, can also be set and unset from within the pattern (see
-the detailed description in the
+options are described below. Some of them (in particular, those that are
+compatible with Perl, but also some others) can also be set and unset from
+within the pattern (see the detailed description in the
 .\" HREF
 \fBpcrepattern\fP
 .\"
-documentation). For these options, the contents of the \fIoptions\fP argument
-specifies their initial settings at the start of compilation and execution. The
-PCRE_ANCHORED and PCRE_NEWLINE_\fIxxx\fP options can be set at the time of
-matching as well as at compile time.
+documentation). For those options that can be different in different parts of
+the pattern, the contents of the \fIoptions\fP argument specifies their initial
+settings at the start of compilation and execution. The PCRE_ANCHORED and
+PCRE_NEWLINE_\fIxxx\fP options can be set at the time of matching as well as at
+compile time.
 .P
 If \fIerrptr\fP is NULL, \fBpcre_compile()\fP returns NULL immediately.
 Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fP returns
@@ -1995,6 +1996,6 @@
 .rs
 .sp
 .nf
-Last updated: 17 March 2009
+Last updated: 11 April 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3    2009-04-10 15:40:21 UTC (rev 411)
+++ code/trunk/doc/pcrepattern.3    2009-04-11 10:34:37 UTC (rev 412)
@@ -23,8 +23,15 @@
 The original operation of PCRE was on strings of one-byte characters. However,
 there is now also support for UTF-8 character strings. To use this, you must
 build PCRE to include UTF-8 support, and then call \fBpcre_compile()\fP with
-the PCRE_UTF8 option. How this affects pattern matching is mentioned in several
-places below. There is also a summary of UTF-8 features in the
+the PCRE_UTF8 option. There is also a special sequence that can be given at the 
+start of a pattern:
+.sp
+  (*UTF8)
+.sp   
+Starting a pattern with this sequence is equivalent to setting the PCRE_UTF8
+option. This feature is not Perl-compatible. How setting UTF-8 mode affects
+pattern matching is mentioned in several places below. There is also a summary
+of UTF-8 features in the
 .\" HTML <a href="pcre.html#utf8support">
 .\" </a>
 section on UTF-8 support
@@ -1032,11 +1039,11 @@
 changed in the same way as the Perl-compatible options by using the characters
 J, U and X respectively.
 .P
-When an option change occurs at top level (that is, not inside subpattern
-parentheses), the change applies to the remainder of the pattern that follows.
-If the change is placed right at the start of a pattern, PCRE extracts it into
-the global options (and it will therefore show up in data extracted by the
-\fBpcre_fullinfo()\fP function).
+When one of these option changes occurs at top level (that is, not inside
+subpattern parentheses), the change applies to the remainder of the pattern
+that follows. If the change is placed right at the start of a pattern, PCRE
+extracts it into the global options (and it will therefore show up in data
+extracted by the \fBpcre_fullinfo()\fP function).
 .P
 An option change within a subpattern (see below for a description of
 subpatterns) affects only that part of the current pattern that follows it, so
@@ -1057,13 +1064,15 @@
 .P
 \fBNote:\fP There are other PCRE-specific options that can be set by the
 application when the compile or match functions are called. In some cases the
-pattern can contain special leading sequences to override what the application
-has set or what has been defaulted. Details are given in the section entitled
+pattern can contain special leading sequences such as (*CRLF) to override what
+the application has set or what has been defaulted. Details are given in the
+section entitled
 .\" HTML <a href="#newlineseq">
 .\" </a>
 "Newline sequences"
 .\"
-above.
+above. There is also the (*UTF8) leading sequence that can be used to set UTF-8 
+mode; this is equivalent to setting the PCRE_UTF8 option.
 .
 .
 .\" HTML <a name="subpattern"></a>
@@ -2245,6 +2254,6 @@
 .rs
 .sp
 .nf
-Last updated: 18 March 2009
+Last updated: 11 April 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcresyntax.3
===================================================================
--- code/trunk/doc/pcresyntax.3    2009-04-10 15:40:21 UTC (rev 411)
+++ code/trunk/doc/pcresyntax.3    2009-04-11 10:34:37 UTC (rev 412)
@@ -120,6 +120,8 @@
 Buginese,
 Buhid,
 Canadian_Aboriginal,
+Carian,
+Cham,
 Cherokee,
 Common,
 Coptic,
@@ -143,12 +145,16 @@
 Inherited,
 Kannada,
 Katakana,
+Kayah_Li,
 Kharoshthi,
 Khmer,
 Lao,
 Latin,
+Lepcha,
 Limbu,
 Linear_B,
+Lycian,
+Lydian,
 Malayalam,
 Mongolian,
 Myanmar,
@@ -157,13 +163,17 @@
 Ogham,
 Old_Italic,
 Old_Persian,
+Ol_Chiki,
 Oriya,
 Osmanya,
 Phags_Pa,
 Phoenician,
+Rejang,
 Runic,
+Saurashtra,
 Shavian,
 Sinhala,
+Sundanese,
 Syloti_Nagri,
 Syriac,
 Tagalog,
@@ -176,6 +186,7 @@
 Tibetan,
 Tifinagh,
 Ugaritic,
+Vai,
 Yi.
 .
 .
@@ -231,7 +242,7 @@
 .SH "ANCHORS AND SIMPLE ASSERTIONS"
 .rs
 .sp
-  \eb          word boundary
+  \eb          word boundary (only ASCII letters recognized)
   \eB          not a word boundary
   ^           start of subject
                also after internal newline in multiline mode
@@ -260,19 +271,19 @@
 .SH "CAPTURING"
 .rs
 .sp
-  (...)          capturing group
-  (?<name>...)   named capturing group (Perl)
-  (?'name'...)   named capturing group (Perl)
-  (?P<name>...)  named capturing group (Python)
-  (?:...)        non-capturing group
-  (?|...)        non-capturing group; reset group numbers for
-                  capturing groups in each alternative
+  (...)           capturing group
+  (?<name>...)    named capturing group (Perl)
+  (?'name'...)    named capturing group (Perl)
+  (?P<name>...)   named capturing group (Python)
+  (?:...)         non-capturing group
+  (?|...)         non-capturing group; reset group numbers for
+                   capturing groups in each alternative
 .
 .
 .SH "ATOMIC GROUPS"
 .rs
 .sp
-  (?>...)        atomic, non-capturing group
+  (?>...)         atomic, non-capturing group
 .
 .
 .
@@ -280,28 +291,33 @@
 .SH "COMMENT"
 .rs
 .sp
-  (?#....)       comment (not nestable)
+  (?#....)        comment (not nestable)
 .
 .
 .SH "OPTION SETTING"
 .rs
 .sp
-  (?i)           caseless
-  (?J)           allow duplicate names
-  (?m)           multiline
-  (?s)           single line (dotall)
-  (?U)           default ungreedy (lazy)
-  (?x)           extended (ignore white space)
-  (?-...)        unset option(s)
+  (?i)            caseless
+  (?J)            allow duplicate names
+  (?m)            multiline
+  (?s)            single line (dotall)
+  (?U)            default ungreedy (lazy)
+  (?x)            extended (ignore white space)
+  (?-...)         unset option(s)
+.sp
+The following is recognized only at the start of a pattern or after one of the 
+newline-setting options with similar syntax:
+.sp
+  (*UTF8)         set UTF-8 mode   
 .
 .
 .SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
 .rs
 .sp
-  (?=...)        positive look ahead
-  (?!...)        negative look ahead
-  (?<=...)       positive look behind
-  (?<!...)       negative look behind
+  (?=...)         positive look ahead
+  (?!...)         negative look ahead
+  (?<=...)        positive look behind
+  (?<!...)        negative look behind
 .sp
 Each top-level branch of a look behind must be of a fixed length.
 .
@@ -309,34 +325,34 @@
 .SH "BACKREFERENCES"
 .rs
 .sp
-  \en             reference by number (can be ambiguous)
-  \egn            reference by number
-  \eg{n}          reference by number
-  \eg{-n}         relative reference by number
-  \ek<name>       reference by name (Perl)
-  \ek'name'       reference by name (Perl)
-  \eg{name}       reference by name (Perl)
-  \ek{name}       reference by name (.NET)
-  (?P=name)      reference by name (Python)
+  \en              reference by number (can be ambiguous)
+  \egn             reference by number
+  \eg{n}           reference by number
+  \eg{-n}          relative reference by number
+  \ek<name>        reference by name (Perl)
+  \ek'name'        reference by name (Perl)
+  \eg{name}        reference by name (Perl)
+  \ek{name}        reference by name (.NET)
+  (?P=name)       reference by name (Python)
 .
 .
 .SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
 .rs
 .sp
-  (?R)           recurse whole pattern
-  (?n)           call subpattern by absolute number
-  (?+n)          call subpattern by relative number
-  (?-n)          call subpattern by relative number
-  (?&name)       call subpattern by name (Perl)
-  (?P>name)      call subpattern by name (Python)
-  \eg<name>       call subpattern by name (Oniguruma)
-  \eg'name'       call subpattern by name (Oniguruma)
-  \eg<n>          call subpattern by absolute number (Oniguruma)
-  \eg'n'          call subpattern by absolute number (Oniguruma)
-  \eg<+n>         call subpattern by relative number (PCRE extension)
-  \eg'+n'         call subpattern by relative number (PCRE extension)
-  \eg<-n>         call subpattern by relative number (PCRE extension)
-  \eg'-n'         call subpattern by relative number (PCRE extension)
+  (?R)            recurse whole pattern
+  (?n)            call subpattern by absolute number
+  (?+n)           call subpattern by relative number
+  (?-n)           call subpattern by relative number
+  (?&name)        call subpattern by name (Perl)
+  (?P>name)       call subpattern by name (Python)
+  \eg<name>        call subpattern by name (Oniguruma)
+  \eg'name'        call subpattern by name (Oniguruma)
+  \eg<n>           call subpattern by absolute number (Oniguruma)
+  \eg'n'           call subpattern by absolute number (Oniguruma)
+  \eg<+n>          call subpattern by relative number (PCRE extension)
+  \eg'+n'          call subpattern by relative number (PCRE extension)
+  \eg<-n>          call subpattern by relative number (PCRE extension)
+  \eg'-n'          call subpattern by relative number (PCRE extension)
 .
 .
 .SH "CONDITIONAL PATTERNS"
@@ -345,17 +361,17 @@
   (?(condition)yes-pattern)
   (?(condition)yes-pattern|no-pattern)
 .sp
-  (?(n)...       absolute reference condition
-  (?(+n)...      relative reference condition
-  (?(-n)...      relative reference condition
-  (?(<name>)...  named reference condition (Perl)
-  (?('name')...  named reference condition (Perl)
-  (?(name)...    named reference condition (PCRE)
-  (?(R)...       overall recursion condition
-  (?(Rn)...      specific group recursion condition
-  (?(R&name)...  specific recursion condition
-  (?(DEFINE)...  define subpattern for reference
-  (?(assert)...  assertion condition
+  (?(n)...        absolute reference condition
+  (?(+n)...       relative reference condition
+  (?(-n)...       relative reference condition
+  (?(<name>)...   named reference condition (Perl)
+  (?('name')...   named reference condition (Perl)
+  (?(name)...     named reference condition (PCRE)
+  (?(R)...        overall recursion condition
+  (?(Rn)...       specific group recursion condition
+  (?(R&name)...   specific recursion condition
+  (?(DEFINE)...   define subpattern for reference
+  (?(assert)...   assertion condition
 .
 .
 .SH "BACKTRACKING CONTROL"
@@ -363,41 +379,41 @@
 .sp
 The following act immediately they are reached:
 .sp
-  (*ACCEPT)      force successful match
-  (*FAIL)        force backtrack; synonym (*F)
+  (*ACCEPT)       force successful match
+  (*FAIL)         force backtrack; synonym (*F)
 .sp
 The following act only when a subsequent match failure causes a backtrack to
 reach them. They all force a match failure, but they differ in what happens
 afterwards. Those that advance the start-of-match point do so only if the
 pattern is not anchored.
 .sp
-  (*COMMIT)      overall failure, no advance of starting point
-  (*PRUNE)       advance to next starting character
-  (*SKIP)        advance start to current matching position
-  (*THEN)        local failure, backtrack to next alternation
+  (*COMMIT)       overall failure, no advance of starting point
+  (*PRUNE)        advance to next starting character
+  (*SKIP)         advance start to current matching position
+  (*THEN)         local failure, backtrack to next alternation
 .
 .
 .SH "NEWLINE CONVENTIONS"
 .rs
 .sp
 These are recognized only at the very start of the pattern or after a
-(*BSR_...) option.
+(*BSR_...) or (*UTF8) option.
 .sp
-  (*CR)
-  (*LF)
-  (*CRLF)
-  (*ANYCRLF)
-  (*ANY)
+  (*CR)           carriage return only
+  (*LF)           linefeed only
+  (*CRLF)         carriage return followed by linefeed
+  (*ANYCRLF)      all three of the above
+  (*ANY)          any Unicode newline sequence
 .
 .
 .SH "WHAT \eR MATCHES"
 .rs
 .sp
 These are recognized only at the very start of the pattern or after a
-(*...) option that sets the newline convention.
+(*...) option that sets the newline convention or UTF-8 mode.
 .sp
-  (*BSR_ANYCRLF)
-  (*BSR_UNICODE)
+  (*BSR_ANYCRLF)  CR, LF, or CRLF
+  (*BSR_UNICODE)  any Unicode newline sequence
 .
 .
 .SH "CALLOUTS"
@@ -428,6 +444,6 @@
 .rs
 .sp
 .nf
-Last updated: 09 April 2008
-Copyright (c) 1997-2008 University of Cambridge.
+Last updated: 11 April 2009
+Copyright (c) 1997-2009 University of Cambridge.
 .fi


Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c    2009-04-10 15:40:21 UTC (rev 411)
+++ code/trunk/pcre_compile.c    2009-04-11 10:34:37 UTC (rev 412)
@@ -6226,38 +6226,22 @@


 *erroroffset = 0;

-/* Can't support UTF8 unless PCRE has been compiled to include the code. */
+/* Set up pointers to the individual character tables */

-#ifdef SUPPORT_UTF8
-utf8 = (options & PCRE_UTF8) != 0;
-if (utf8 && (options & PCRE_NO_UTF8_CHECK) == 0 &&
-     (*erroroffset = _pcre_valid_utf8((uschar *)pattern, -1)) >= 0)
-  {
-  errorcode = ERR44;
-  goto PCRE_EARLY_ERROR_RETURN2;
-  }
-#else
-if ((options & PCRE_UTF8) != 0)
-  {
-  errorcode = ERR32;
-  goto PCRE_EARLY_ERROR_RETURN;
-  }
-#endif
+if (tables == NULL) tables = _pcre_default_tables;
+cd->lcc = tables + lcc_offset;
+cd->fcc = tables + fcc_offset;
+cd->cbits = tables + cbits_offset;
+cd->ctypes = tables + ctypes_offset;


+/* Check that all undefined public option bits are zero */
+
 if ((options & ~PUBLIC_COMPILE_OPTIONS) != 0)
   {
   errorcode = ERR17;
   goto PCRE_EARLY_ERROR_RETURN;
   }

-/* Set up pointers to the individual character tables */
-
-if (tables == NULL) tables = _pcre_default_tables;
-cd->lcc = tables + lcc_offset;
-cd->fcc = tables + fcc_offset;
-cd->cbits = tables + cbits_offset;
-cd->ctypes = tables + ctypes_offset;
-
 /* Check for global one-time settings at the start of the pattern, and remember
 the offset for later. */
 
@@ -6267,6 +6251,9 @@
   int newnl = 0;
   int newbsr = 0;

+  if (strncmp((char *)(ptr+skipatstart+2), STRING_UTF8_RIGHTPAR, 5) == 0)
+    { skipatstart += 7; options |= PCRE_UTF8; continue; }
+
   if (strncmp((char *)(ptr+skipatstart+2), STRING_CR_RIGHTPAR, 3) == 0)
     { skipatstart += 5; newnl = PCRE_NEWLINE_CR; }
   else if (strncmp((char *)(ptr+skipatstart+2), STRING_LF_RIGHTPAR, 3)  == 0)
@@ -6290,6 +6277,24 @@
   else break;
   }


+/* Can't support UTF8 unless PCRE has been compiled to include the code. */
+
+#ifdef SUPPORT_UTF8
+utf8 = (options & PCRE_UTF8) != 0;
+if (utf8 && (options & PCRE_NO_UTF8_CHECK) == 0 &&
+     (*erroroffset = _pcre_valid_utf8((uschar *)pattern, -1)) >= 0)
+  {
+  errorcode = ERR44;
+  goto PCRE_EARLY_ERROR_RETURN2;
+  }
+#else
+if ((options & PCRE_UTF8) != 0)
+  {
+  errorcode = ERR32;
+  goto PCRE_EARLY_ERROR_RETURN;
+  }
+#endif
+
 /* Check validity of \R options. */


switch (options & (PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE))
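
The reordering in the pcre_compile.c hunks above matters because the scan of leading (*...) items can now switch on PCRE_UTF8, so the UTF-8 validity check has to run after that scan rather than before it. The following is a minimal, self-contained sketch of such a scan, not the PCRE source: the names scan_leading_options, match_item, OPT_UTF8, OPT_BSR_* and NL_* are hypothetical stand-ins chosen for illustration, and only a subset of the recognized sequences is handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical option bits for this sketch; these are not PCRE's real values. */
#define OPT_UTF8        0x01u
#define OPT_BSR_UNICODE 0x02u
#define OPT_BSR_ANYCRLF 0x04u

/* Hypothetical newline settings for this sketch. */
enum { NL_DEFAULT = 0, NL_CR, NL_LF, NL_CRLF };

/* If the text at p begins with the literal item, return the item's length;
   otherwise return 0. */
static size_t match_item(const char *p, const char *item)
{
  size_t len = strlen(item);
  return (strncmp(p, item, len) == 0) ? len : 0;
}

/* Consume "(*NAME)" items at the very start of the pattern, in any order.
   Updates *options and *newline, and returns the number of bytes to skip
   before the real pattern begins. Scanning stops at the first text that is
   not one of the recognized items. */
size_t scan_leading_options(const char *pattern, unsigned *options, int *newline)
{
  size_t skip = 0;
  for (;;)
    {
    const char *p = pattern + skip;
    size_t n;
    if ((n = match_item(p, "(*UTF8)")) != 0)
      *options |= OPT_UTF8;
    else if ((n = match_item(p, "(*CR)")) != 0)
      *newline = NL_CR;
    else if ((n = match_item(p, "(*LF)")) != 0)
      *newline = NL_LF;
    else if ((n = match_item(p, "(*CRLF)")) != 0)
      *newline = NL_CRLF;
    else if ((n = match_item(p, "(*BSR_UNICODE)")) != 0)
      *options = (*options & ~OPT_BSR_ANYCRLF) | OPT_BSR_UNICODE;
    else if ((n = match_item(p, "(*BSR_ANYCRLF)")) != 0)
      *options = (*options & ~OPT_BSR_UNICODE) | OPT_BSR_ANYCRLF;
    else
      break;
    skip += n;
    }
  return skip;
}
```

A caller would run this scan first and only then test the UTF-8 bit in the options to decide whether the pattern needs validating as UTF-8, which is the ordering the commit establishes.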

Modified: code/trunk/pcre_internal.h
===================================================================
--- code/trunk/pcre_internal.h    2009-04-10 15:40:21 UTC (rev 411)
+++ code/trunk/pcre_internal.h    2009-04-11 10:34:37 UTC (rev 412)
@@ -881,6 +881,7 @@
 #define STRING_ANYCRLF_RIGHTPAR     "ANYCRLF)"
 #define STRING_BSR_ANYCRLF_RIGHTPAR "BSR_ANYCRLF)"
 #define STRING_BSR_UNICODE_RIGHTPAR "BSR_UNICODE)"
+#define STRING_UTF8_RIGHTPAR        "UTF8)"


#else /* SUPPORT_UTF8 */

@@ -1132,6 +1133,7 @@
 #define STRING_ANYCRLF_RIGHTPAR     STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
 #define STRING_BSR_ANYCRLF_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS
 #define STRING_BSR_UNICODE_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_U STR_N STR_I STR_C STR_O STR_D STR_E STR_RIGHT_PARENTHESIS
+#define STRING_UTF8_RIGHTPAR        STR_U STR_T STR_F STR_8 STR_RIGHT_PARENTHESIS


#endif /* SUPPORT_UTF8 */


Modified: code/trunk/pcretest.c
===================================================================
--- code/trunk/pcretest.c    2009-04-10 15:40:21 UTC (rev 411)
+++ code/trunk/pcretest.c    2009-04-11 10:34:37 UTC (rev 412)
@@ -1325,6 +1325,8 @@
 #endif  /* !defined NOPOSIX */


     {
+    unsigned long int get_options;
+      
     if (timeit > 0)
       {
       register int i;
@@ -1367,10 +1369,17 @@
         }
       goto CONTINUE;
       }
+      
+    /* Compilation succeeded. It is now possible to set the UTF-8 option from 
+    within the regex; check for this so that we know how to process the data 
+    lines. */
+    
+    new_info(re, NULL, PCRE_INFO_OPTIONS, &get_options);
+    if ((get_options & PCRE_UTF8) != 0) use_utf8 = 1;


-    /* Compilation succeeded; print data if required. There are now two
-    info-returning functions. The old one has a limited interface and
-    returns only limited data. Check that it agrees with the newer one. */
+    /* Print information if required. There are now two info-returning
+    functions. The old one has a limited interface and returns only limited
+    data. Check that it agrees with the newer one. */


     if (log_store)
       fprintf(outfile, "Memory allocation (code space): %d\n",
@@ -1454,10 +1463,12 @@
       fprintf(outfile, "------------------------------------------------------------------\n");
       pcre_printint(re, outfile, debug_lengths);
       }
+      
+    /* We already have the options in get_options (see above) */


     if (do_showinfo)
       {
-      unsigned long int get_options, all_options;
+      unsigned long int all_options;
 #if !defined NOINFOCHECK
       int old_first_char, old_options, old_count;
 #endif
@@ -1466,7 +1477,6 @@
       int nameentrysize, namecount;
       const uschar *nametable;


-      new_info(re, NULL, PCRE_INFO_OPTIONS, &get_options);
       new_info(re, NULL, PCRE_INFO_SIZE, &size);
       new_info(re, NULL, PCRE_INFO_CAPTURECOUNT, &count);
       new_info(re, NULL, PCRE_INFO_BACKREFMAX, &backrefmax);


Modified: code/trunk/testdata/testinput5
===================================================================
--- code/trunk/testdata/testinput5    2009-04-10 15:40:21 UTC (rev 411)
+++ code/trunk/testdata/testinput5    2009-04-11 10:34:37 UTC (rev 412)
@@ -480,4 +480,9 @@
 /X/8f<any> 
     A\x{1ec5}ABCXYZ


+/(*UTF8)\x{1234}/
+ abcd\x{1234}pqr
+
+/(*CRLF)(*UTF8)(*BSR_UNICODE)a\Rb/I
+
/ End of testinput5 /

Modified: code/trunk/testdata/testoutput5
===================================================================
--- code/trunk/testdata/testoutput5    2009-04-10 15:40:21 UTC (rev 411)
+++ code/trunk/testdata/testoutput5    2009-04-11 10:34:37 UTC (rev 412)
@@ -1641,4 +1641,15 @@
     A\x{1ec5}ABCXYZ
  0: X


+/(*UTF8)\x{1234}/
+ abcd\x{1234}pqr
+ 0: \x{1234}
+
+/(*CRLF)(*UTF8)(*BSR_UNICODE)a\Rb/I
+Capturing subpattern count = 0
+Options: bsr_unicode utf8
+Forced newline sequence: CRLF
+First char = 'a'
+Need char = 'b'
+
/ End of testinput5 /
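
The first new test above relies on \x{1234} being treated as a single character, which requires UTF-8 mode, where that code point occupies several bytes in the subject. As a rough illustration of the byte sequence involved, here is a sketch of UTF-8 encoding restricted to code points up to U+FFFF. The helper utf8_encode_bmp is hypothetical and is not part of PCRE or pcretest.

```c
#include <stddef.h>

/* Encode code point cp as UTF-8 into out, which must hold at least 3 bytes.
   Handles only U+0000..U+FFFF. Returns the number of bytes written, or 0 if
   cp is above U+FFFF or is a surrogate (U+D800..U+DFFF). */
size_t utf8_encode_bmp(unsigned long cp, unsigned char *out)
{
  if (cp < 0x80)
    {
    out[0] = (unsigned char)cp;                      /* 0xxxxxxx */
    return 1;
    }
  if (cp < 0x800)
    {
    out[0] = (unsigned char)(0xC0 | (cp >> 6));      /* 110xxxxx */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));    /* 10xxxxxx */
    return 2;
    }
  if (cp >= 0xD800 && cp <= 0xDFFF) return 0;        /* surrogates are invalid */
  if (cp <= 0xFFFF)
    {
    out[0] = (unsigned char)(0xE0 | (cp >> 12));          /* 1110xxxx */
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));  /* 10xxxxxx */
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));         /* 10xxxxxx */
    return 3;
    }
  return 0;                                          /* outside this sketch's range */
}
```

Under this encoding U+1234 becomes the three bytes E1 88 B4, so without UTF-8 mode the subject would be seen as three separate one-byte characters.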