[Pcre-svn] [513] code/trunk: Make \R and \X in a character c…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [513] code/trunk: Make \R and \X in a character class behave more like Perl
Revision: 513
          http://vcs.pcre.org/viewvc?view=rev&revision=513
Author:   ph10
Date:     2010-05-03 12:13:37 +0100 (Mon, 03 May 2010)


Log Message:
-----------
Make \R and \X in a character class behave more like Perl

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/configure.ac
    code/trunk/doc/pcreapi.3
    code/trunk/doc/pcrepattern.3
    code/trunk/pcre_compile.c
    code/trunk/testdata/testinput2
    code/trunk/testdata/testoutput2


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2010-03-30 11:11:52 UTC (rev 512)
+++ code/trunk/ChangeLog    2010-05-03 11:13:37 UTC (rev 513)
@@ -1,7 +1,7 @@
 ChangeLog for PCRE
 ------------------


-Version 8.03 26-Mar-2010
+Version 8.10 03 May-2010
------------------------

1. Added support for (*MARK:ARG) and for ARG additions to PRUNE, SKIP, and
@@ -9,7 +9,15 @@

2. (*ACCEPT) was not working when inside an atomic group.

+3.  Inside a character class, \B is treated as a literal by default, but 
+    faulted if PCRE_EXTRA is set. This mimics Perl's behaviour (the -w option 
+    causes the error). The code is unchanged, but I tidied the documentation.
+    
+4.  Inside a character class, PCRE always treated \R and \X as literals, 
+    whereas Perl faults them if its -w option is set. I have changed PCRE so
+    that it faults them when PCRE_EXTRA is set.


+
Version 8.02 19-Mar-2010
------------------------


Modified: code/trunk/configure.ac
===================================================================
--- code/trunk/configure.ac    2010-03-30 11:11:52 UTC (rev 512)
+++ code/trunk/configure.ac    2010-05-03 11:13:37 UTC (rev 513)
@@ -9,9 +9,9 @@
 dnl be defined as -RC2, for example. For real releases, it should be empty.


m4_define(pcre_major, [8])
-m4_define(pcre_minor, [03])
-m4_define(pcre_prerelease, [-TEST-2])
-m4_define(pcre_date, [2010-03-30])
+m4_define(pcre_minor, [10])
+m4_define(pcre_prerelease, [-RC1])
+m4_define(pcre_date, [2010-05-03])

# Libtool shared library interface versions (current:revision:age)
m4_define(libpcre_version, [0:1:0])

Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3    2010-03-30 11:11:52 UTC (rev 512)
+++ code/trunk/doc/pcreapi.3    2010-05-03 11:13:37 UTC (rev 513)
@@ -553,8 +553,9 @@
 special meaning causes an error, thus reserving these combinations for future
 expansion. By default, as in Perl, a backslash followed by a letter with no
 special meaning is treated as a literal. (Perl can, however, be persuaded to
-give a warning for this.) There are at present no other features controlled by
-this option. It can also be set by a (?X) option setting within a pattern.
+give an error for this, by running it with the -w option.) There are at present
+no other features controlled by this option. It can also be set by a (?X)
+option setting within a pattern.
 .sp
   PCRE_FIRSTLINE
 .sp
@@ -2099,6 +2100,6 @@
 .rs
 .sp
 .nf
-Last updated: 26 March 2010
+Last updated: 03 May 2010
 Copyright (c) 1997-2010 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3    2010-03-30 11:11:52 UTC (rev 512)
+++ code/trunk/doc/pcrepattern.3    2010-05-03 11:13:37 UTC (rev 513)
@@ -295,10 +295,11 @@
 .P
 All the sequences that define a single character value can be used both inside
 and outside character classes. In addition, inside a character class, the
-sequence \eb is interpreted as the backspace character (hex 08), and the
-sequences \eR and \eX are interpreted as the characters "R" and "X",
-respectively. Outside a character class, these sequences have different
-meanings
+sequence \eb is interpreted as the backspace character (hex 08). The sequences
+\eB, \eR, and \eX are not special inside a character class. Like any other
+unrecognized escape sequences, they are treated as the literal characters "B",
+"R", and "X" by default, but cause an error if the PCRE_EXTRA option is set.
+Outside a character class, these sequences have different meanings
 .\" HTML <a href="#uniextseq">
 .\" </a>
 (see below).
@@ -478,7 +479,9 @@
 .sp
   (*ANY)(*BSR_ANYCRLF)
 .sp
-Inside a character class, \eR matches the letter "R".
+Inside a character class, \eR is treated as an unrecognized escape sequence, 
+and so matches the letter "R" by default, but causes an error if PCRE_EXTRA is
+set.
 .
 .
 .\" HTML <a name="uniextseq"></a>
@@ -765,8 +768,11 @@
   \ez     matches only at the end of the subject
   \eG     matches at the first matching position in the subject
 .sp
-These assertions may not appear in character classes (but note that \eb has a
-different meaning, namely the backspace character, inside a character class).
+Inside a character class, \eb has a different meaning; it matches the backspace
+character. If any other of these assertions appears in a character class, by 
+default it matches the corresponding literal character (for example, \eB
+matches the letter B). However, if the PCRE_EXTRA option is set, an "invalid
+escape sequence" error is generated instead.
 .P
 A word boundary is a position in the subject string where the current character
 and the previous character do not both match \ew or \eW (i.e. one matches
@@ -2580,6 +2586,6 @@
 .rs
 .sp
 .nf
-Last updated: 27 March 2010
+Last updated: 03 May 2010
 Copyright (c) 1997-2010 University of Cambridge.
 .fi


Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c    2010-03-30 11:11:52 UTC (rev 512)
+++ code/trunk/pcre_compile.c    2010-05-03 11:13:37 UTC (rev 513)
@@ -3204,19 +3204,18 @@


       /* Backslash may introduce a single character, or it may introduce one
       of the specials, which just set a flag. The sequence \b is a special
-      case. Inside a class (and only there) it is treated as backspace.
-      Elsewhere it marks a word boundary. Other escapes have preset maps ready
-      to 'or' into the one we are building. We assume they have more than one
-      character in them, so set class_charcount bigger than one. */
+      case. Inside a class (and only there) it is treated as backspace. We
+      assume that other escapes have more than one character in them, so set
+      class_charcount bigger than one. Unrecognized escapes fall through and
+      are either treated as literal characters (by default), or are faulted if
+      PCRE_EXTRA is set. */


       if (c == CHAR_BACKSLASH)
         {
         c = check_escape(&ptr, errorcodeptr, cd->bracount, options, TRUE);
         if (*errorcodeptr != 0) goto FAILED;


-        if (-c == ESC_b) c = CHAR_BS;       /* \b is backspace in a class */
-        else if (-c == ESC_X) c = CHAR_X;   /* \X is literal X in a class */
-        else if (-c == ESC_R) c = CHAR_R;   /* \R is literal R in a class */
+        if (-c == ESC_b) c = CHAR_BS;    /* \b is backspace in a class */
         else if (-c == ESC_Q)            /* Handle start of quoted string */
           {
           if (ptr[1] == CHAR_BACKSLASH && ptr[2] == CHAR_E)


Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2010-03-30 11:11:52 UTC (rev 512)
+++ code/trunk/testdata/testinput2    2010-05-03 11:13:37 UTC (rev 513)
@@ -2,12 +2,12 @@
     of PCRE's API, error diagnostics, and the compiled code of some patterns.
     It also checks the non-Perl syntax the PCRE supports (Python, .NET, 
     Oniguruma). Finally, there are some tests where PCRE and Perl differ, 
-    either because PCRE can't be compatible, or there is potential Perl 
+    either because PCRE can't be compatible, or there is a possible Perl 
     bug. --/  


-/-- Originally, the Perl 5.10 things were in here too, but now I have separated
-    many (most?) of them out into test 11. However, there may still be some
-    that were overlooked. --/   
+/-- Originally, the Perl 5.10 and 5.11 things were in here too, but now I have 
+    separated many (most?) of them out into test 11. However, there may still 
+    be some that were overlooked. --/   


/(a)b|/I

@@ -51,6 +51,16 @@

/(?X)[\B]/

+/(?X)[\R]/
+
+/(?X)[\X]/
+
+/[\B]/BZ
+
+/[\R]/BZ
+
+/[\X]/BZ
+
/[z-a]/

/^*/

Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2010-03-30 11:11:52 UTC (rev 512)
+++ code/trunk/testdata/testoutput2    2010-05-03 11:13:37 UTC (rev 513)
@@ -2,12 +2,12 @@
     of PCRE's API, error diagnostics, and the compiled code of some patterns.
     It also checks the non-Perl syntax the PCRE supports (Python, .NET, 
     Oniguruma). Finally, there are some tests where PCRE and Perl differ, 
-    either because PCRE can't be compatible, or there is potential Perl 
+    either because PCRE can't be compatible, or there is a possible Perl 
     bug. --/  


-/-- Originally, the Perl 5.10 things were in here too, but now I have separated
-    many (most?) of them out into test 11. However, there may still be some
-    that were overlooked. --/   
+/-- Originally, the Perl 5.10 and 5.11 things were in here too, but now I have 
+    separated many (most?) of them out into test 11. However, there may still 
+    be some that were overlooked. --/   


/(a)b|/I
Capturing subpattern count = 1
@@ -103,6 +103,36 @@
/(?X)[\B]/
Failed: invalid escape sequence in character class at offset 6

+/(?X)[\R]/
+Failed: invalid escape sequence in character class at offset 6
+
+/(?X)[\X]/
+Failed: invalid escape sequence in character class at offset 6
+
+/[\B]/BZ
+------------------------------------------------------------------
+        Bra
+        B
+        Ket
+        End
+------------------------------------------------------------------
+
+/[\R]/BZ
+------------------------------------------------------------------
+        Bra
+        R
+        Ket
+        End
+------------------------------------------------------------------
+
+/[\X]/BZ
+------------------------------------------------------------------
+        Bra
+        X
+        Ket
+        End
+------------------------------------------------------------------
+
 /[z-a]/
 Failed: range out of order in character class at offset 3
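
The behaviour exercised by the new tests can be summarised in a small, self-contained sketch. This is a toy model written for illustration, not the code in pcre_compile.c: the function name class_escape_model, the two result codes, and the `extra` parameter (standing in for the PCRE_EXTRA option) are all assumptions introduced here. It covers only the four letters discussed in this revision.

```c
/* Hypothetical result codes for this sketch; these are not PCRE error numbers. */
enum {
    CLASS_ESCAPE_INVALID      = -1,  /* "invalid escape sequence in character class" */
    CLASS_ESCAPE_NOT_MODELLED = -2   /* escape is outside the scope of this sketch */
};

/* Toy model of how backslash + letter resolves inside [...] after r513.
   `letter` is the character following the backslash; `extra` is nonzero
   when the (assumed) equivalent of PCRE_EXTRA is set. Real PCRE handles
   many other escapes (\d, \w, \n, ...) that are not modelled here. */
int class_escape_model(char letter, int extra)
{
    if (letter == 'b')
        return 0x08;                 /* \b is backspace in a class, with or without extra */

    if (letter == 'B' || letter == 'R' || letter == 'X')
        /* Not special in a class: literal by default, faulted when extra is set. */
        return extra ? CLASS_ESCAPE_INVALID : (int)letter;

    return CLASS_ESCAPE_NOT_MODELLED;
}
```

Under this model, class_escape_model('R', 0) yields the letter R, matching the /[\R]/BZ output above, while class_escape_model('R', 1) yields CLASS_ESCAPE_INVALID, matching the /(?X)[\R]/ failure.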