Revision: 513
http://vcs.pcre.org/viewvc?view=rev&revision=513
Author: ph10
Date: 2010-05-03 12:13:37 +0100 (Mon, 03 May 2010)
Log Message:
-----------
Make \R and \X in a character class behave more like Perl
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/configure.ac
code/trunk/doc/pcreapi.3
code/trunk/doc/pcrepattern.3
code/trunk/pcre_compile.c
code/trunk/testdata/testinput2
code/trunk/testdata/testoutput2
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2010-03-30 11:11:52 UTC (rev 512)
+++ code/trunk/ChangeLog 2010-05-03 11:13:37 UTC (rev 513)
@@ -1,7 +1,7 @@
ChangeLog for PCRE
------------------
-Version 8.03 26-Mar-2010
+Version 8.10 03-May-2010
------------------------
1. Added support for (*MARK:ARG) and for ARG additions to PRUNE, SKIP, and
@@ -9,7 +9,15 @@
2. (*ACCEPT) was not working when inside an atomic group.
+3. Inside a character class, \B is treated as a literal by default, but
+ faulted if PCRE_EXTRA is set. This mimics Perl's behaviour (the -w option
+ causes the error). The code is unchanged, but I tidied the documentation.
+
+4. Inside a character class, PCRE always treated \R and \X as literals,
+ whereas Perl faults them if its -w option is set. I have changed PCRE so
+ that it faults them when PCRE_EXTRA is set.
+
Version 8.02 19-Mar-2010
------------------------
Modified: code/trunk/configure.ac
===================================================================
--- code/trunk/configure.ac 2010-03-30 11:11:52 UTC (rev 512)
+++ code/trunk/configure.ac 2010-05-03 11:13:37 UTC (rev 513)
@@ -9,9 +9,9 @@
dnl be defined as -RC2, for example. For real releases, it should be empty.
m4_define(pcre_major, [8])
-m4_define(pcre_minor, [03])
-m4_define(pcre_prerelease, [-TEST-2])
-m4_define(pcre_date, [2010-03-30])
+m4_define(pcre_minor, [10])
+m4_define(pcre_prerelease, [-RC1])
+m4_define(pcre_date, [2010-05-03])
# Libtool shared library interface versions (current:revision:age)
m4_define(libpcre_version, [0:1:0])
Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3 2010-03-30 11:11:52 UTC (rev 512)
+++ code/trunk/doc/pcreapi.3 2010-05-03 11:13:37 UTC (rev 513)
@@ -553,8 +553,9 @@
special meaning causes an error, thus reserving these combinations for future
expansion. By default, as in Perl, a backslash followed by a letter with no
special meaning is treated as a literal. (Perl can, however, be persuaded to
-give a warning for this.) There are at present no other features controlled by
-this option. It can also be set by a (?X) option setting within a pattern.
+give an error for this, by running it with the -w option.) There are at present
+no other features controlled by this option. It can also be set by a (?X)
+option setting within a pattern.
.sp
PCRE_FIRSTLINE
.sp
@@ -2099,6 +2100,6 @@
.rs
.sp
.nf
-Last updated: 26 March 2010
+Last updated: 03 May 2010
Copyright (c) 1997-2010 University of Cambridge.
.fi
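The PCRE_EXTRA rule above says that a backslash followed by a letter with no special meaning is a literal by default and an error when the option is set. It also says the option can be switched on by (?X). The C sketch below illustrates both points. It is not PCRE's code. SKETCH_EXTRA, effective_options() and unknown_letter_escape() are hypothetical names, and the option value is arbitrary. Only a (?X) at the very start of the pattern is recognized here.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical option bit standing in for PCRE_EXTRA; the value is
   arbitrary and is not taken from pcre.h. */
#define SKETCH_EXTRA 0x1

/* Return the options in force after an optional leading "(?X)".
   Simplification: real PCRE also honours (?X) later in a pattern. */
static int effective_options(const char *pattern, int options)
{
  if (strncmp(pattern, "(?X)", 4) == 0) options |= SKETCH_EXTRA;
  return options;
}

/* Resolve "\<letter>" for a letter that has no special meaning.
   Returns the literal character (Perl-compatible default), or -1 to
   signal an error when the strict option is set. */
static int unknown_letter_escape(int letter, int options)
{
  assert(isalpha(letter));
  if ((options & SKETCH_EXTRA) != 0) return -1;  /* reserved: fault it */
  return letter;                                  /* treat as literal */
}
```

For example, unknown_letter_escape('j', effective_options("\\j", 0)) yields 'j', while the same call with the pattern "(?X)\\j" yields -1.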
Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3 2010-03-30 11:11:52 UTC (rev 512)
+++ code/trunk/doc/pcrepattern.3 2010-05-03 11:13:37 UTC (rev 513)
@@ -295,10 +295,11 @@
.P
All the sequences that define a single character value can be used both inside
and outside character classes. In addition, inside a character class, the
-sequence \eb is interpreted as the backspace character (hex 08), and the
-sequences \eR and \eX are interpreted as the characters "R" and "X",
-respectively. Outside a character class, these sequences have different
-meanings
+sequence \eb is interpreted as the backspace character (hex 08). The sequences
+\eB, \eR, and \eX are not special inside a character class. Like any other
+unrecognized escape sequences, they are treated as the literal characters "B",
+"R", and "X" by default, but cause an error if the PCRE_EXTRA option is set.
+Outside a character class, these sequences have different meanings
.\" HTML <a href="#uniextseq">
.\" </a>
(see below).
@@ -478,7 +479,9 @@
.sp
(*ANY)(*BSR_ANYCRLF)
.sp
-Inside a character class, \eR matches the letter "R".
+Inside a character class, \eR is treated as an unrecognized escape sequence,
+and so matches the letter "R" by default, but causes an error if PCRE_EXTRA is
+set.
.
.
.\" HTML <a name="uniextseq"></a>
@@ -765,8 +768,11 @@
\ez matches only at the end of the subject
\eG matches at the first matching position in the subject
.sp
-These assertions may not appear in character classes (but note that \eb has a
-different meaning, namely the backspace character, inside a character class).
+Inside a character class, \eb has a different meaning; it matches the backspace
+character. If any of the other assertions appears in a character class, by
+default it matches the corresponding literal character (for example, \eB
+matches the letter B). However, if the PCRE_EXTRA option is set, an "invalid
+escape sequence" error is generated instead.
.P
A word boundary is a position in the subject string where the current character
and the previous character do not both match \ew or \eW (i.e. one matches
@@ -2580,6 +2586,6 @@
.rs
.sp
.nf
-Last updated: 27 March 2010
+Last updated: 03 May 2010
Copyright (c) 1997-2010 University of Cambridge.
.fi
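The in-class rules documented in pcrepattern.3 can be summarized in a small sketch. \b becomes backspace (hex 08). The other assertion escapes, plus \R and \X, are literals by default and errors under PCRE_EXTRA. class_escape(), cls_result and the strict flag are hypothetical, and strict stands in for PCRE_EXTRA. Only the letters named in this change are modelled. Any other letter is returned unchanged, which is not how PCRE treats escapes such as \d or \n.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of one escape inside a character class (sketch only). */
typedef enum { CLS_CHAR, CLS_ERROR } cls_kind;

typedef struct {
  cls_kind kind;
  int ch;                     /* meaningful only when kind == CLS_CHAR */
} cls_result;

/* Interpret "\<letter>" inside [...]; strict mirrors PCRE_EXTRA. */
static cls_result class_escape(char letter, bool strict)
{
  cls_result r = { CLS_CHAR, letter };
  switch (letter)
    {
    case 'b':                 /* word boundary outside, backspace inside */
      r.ch = 0x08;
      break;
    case 'A': case 'B': case 'G': case 'Z': case 'z':   /* assertions */
    case 'R':                 /* newline sequence outside a class */
    case 'X':                 /* extended Unicode sequence outside a class */
      if (strict) r.kind = CLS_ERROR;   /* "invalid escape sequence" */
      break;                  /* otherwise the literal letter */
    default:
      break;                  /* not modelled by this sketch */
    }
  return r;
}
```

Note that \b keeps its backspace meaning whether or not the strict flag is set. Only the escapes that have no meaning inside a class are affected.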
Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c 2010-03-30 11:11:52 UTC (rev 512)
+++ code/trunk/pcre_compile.c 2010-05-03 11:13:37 UTC (rev 513)
@@ -3204,19 +3204,18 @@
/* Backslash may introduce a single character, or it may introduce one
of the specials, which just set a flag. The sequence \b is a special
- case. Inside a class (and only there) it is treated as backspace.
- Elsewhere it marks a word boundary. Other escapes have preset maps ready
- to 'or' into the one we are building. We assume they have more than one
- character in them, so set class_charcount bigger than one. */
+ case. Inside a class (and only there) it is treated as backspace. We
+ assume that other escapes have more than one character in them, so set
+ class_charcount bigger than one. Unrecognized escapes fall through and
+ are either treated as literal characters (by default), or are faulted if
+ PCRE_EXTRA is set. */
if (c == CHAR_BACKSLASH)
{
c = check_escape(&ptr, errorcodeptr, cd->bracount, options, TRUE);
if (*errorcodeptr != 0) goto FAILED;
- if (-c == ESC_b) c = CHAR_BS; /* \b is backspace in a class */
- else if (-c == ESC_X) c = CHAR_X; /* \X is literal X in a class */
- else if (-c == ESC_R) c = CHAR_R; /* \R is literal R in a class */
+ if (-c == ESC_b) c = CHAR_BS; /* \b is backspace in a class */
else if (-c == ESC_Q) /* Handle start of quoted string */
{
if (ptr[1] == CHAR_BACKSLASH && ptr[2] == CHAR_E)
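The effect of deleting the ESC_X and ESC_R special cases can be sketched as follows. This is a simplified reconstruction, not the real pcre_compile.c. The enum values, the CHAR_BS definition, the error number and class_escape_value() are assumptions. Escapes that add whole sets, such as \d and \w, are ignored. The sketch keeps the convention visible in the diff, where check_escape() reports a special escape as the negative of its ESC_ code. Following the updated comment, it assumes that \R and \X now reach the same generic path as other unrecognized escapes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PCRE's internal escape codes. A special
   escape is reported as the NEGATIVE of its code, so ordinary
   characters stay positive. The numeric values are arbitrary. */
enum { ESC_b = 1, ESC_B, ESC_R, ESC_X };

#define CHAR_BS 0x08                  /* backspace; assumed value */
#define ERR_INVALID_CLASS_ESCAPE 7    /* hypothetical error number */

/* c:     value from a check_escape()-like scanner
   last:  the escape's final character (where ptr points)
   extra: nonzero if PCRE_EXTRA is set
   Returns the class character, or -1 with *errorcode set. */
static int class_escape_value(int c, int last, int extra, int *errorcode)
{
  assert(errorcode != NULL);
  *errorcode = 0;
  if (c >= 0) return c;               /* ordinary character */
  if (-c == ESC_b) return CHAR_BS;    /* \b is backspace in a class */

  /* Before this revision, ESC_X and ESC_R were mapped to 'X' and 'R'
     at this point. Now they fall through like \B. */
  if (extra)
    {
    *errorcode = ERR_INVALID_CLASS_ESCAPE;
    return -1;
    }
  return last;                        /* literal by default */
}
```

Routing \R and \X through the generic branch is what lets a single PCRE_EXTRA check fault all three escapes. A per-escape mapping would otherwise bypass that check.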
Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2 2010-03-30 11:11:52 UTC (rev 512)
+++ code/trunk/testdata/testinput2 2010-05-03 11:13:37 UTC (rev 513)
@@ -2,12 +2,12 @@
of PCRE's API, error diagnostics, and the compiled code of some patterns.
It also checks the non-Perl syntax that PCRE supports (Python, .NET,
Oniguruma). Finally, there are some tests where PCRE and Perl differ,
- either because PCRE can't be compatible, or there is potential Perl
+ either because PCRE can't be compatible, or there is a possible Perl
bug. --/
-/-- Originally, the Perl 5.10 things were in here too, but now I have separated
- many (most?) of them out into test 11. However, there may still be some
- that were overlooked. --/
+/-- Originally, the Perl 5.10 and 5.11 things were in here too, but now I have
+ separated many (most?) of them out into test 11. However, there may still
+ be some that were overlooked. --/
/(a)b|/I
@@ -51,6 +51,16 @@
/(?X)[\B]/
+/(?X)[\R]/
+
+/(?X)[\X]/
+
+/[\B]/BZ
+
+/[\R]/BZ
+
+/[\X]/BZ
+
/[z-a]/
/^*/
Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2010-03-30 11:11:52 UTC (rev 512)
+++ code/trunk/testdata/testoutput2 2010-05-03 11:13:37 UTC (rev 513)
@@ -2,12 +2,12 @@
of PCRE's API, error diagnostics, and the compiled code of some patterns.
It also checks the non-Perl syntax that PCRE supports (Python, .NET,
Oniguruma). Finally, there are some tests where PCRE and Perl differ,
- either because PCRE can't be compatible, or there is potential Perl
+ either because PCRE can't be compatible, or there is a possible Perl
bug. --/
-/-- Originally, the Perl 5.10 things were in here too, but now I have separated
- many (most?) of them out into test 11. However, there may still be some
- that were overlooked. --/
+/-- Originally, the Perl 5.10 and 5.11 things were in here too, but now I have
+ separated many (most?) of them out into test 11. However, there may still
+ be some that were overlooked. --/
/(a)b|/I
Capturing subpattern count = 1
@@ -103,6 +103,36 @@
/(?X)[\B]/
Failed: invalid escape sequence in character class at offset 6
+/(?X)[\R]/
+Failed: invalid escape sequence in character class at offset 6
+
+/(?X)[\X]/
+Failed: invalid escape sequence in character class at offset 6
+
+/[\B]/BZ
+------------------------------------------------------------------
+ Bra
+ B
+ Ket
+ End
+------------------------------------------------------------------
+
+/[\R]/BZ
+------------------------------------------------------------------
+ Bra
+ R
+ Ket
+ End
+------------------------------------------------------------------
+
+/[\X]/BZ
+------------------------------------------------------------------
+ Bra
+ X
+ Ket
+ End
+------------------------------------------------------------------
+
/[z-a]/
Failed: range out of order in character class at offset 3
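The new test cases can be mirrored by a tiny stand-alone checker. mini_compile() is a hypothetical helper. It accepts only the shapes used above, [\b], [\B], [\R] and [\X], optionally preceded by (?X). It reproduces the literal characters shown in the /BZ dumps and the error offset of 6 for the (?X) variants. It does not reproduce PCRE's error messages or compiled code.

```c
#include <assert.h>
#include <string.h>

/* Result of compiling a one-item class such as "[\R]" (sketch only). */
typedef struct {
  int ok;          /* 1 if the pattern compiled */
  int ch;          /* the single class character when ok */
  int erroffset;   /* offset of the offending character when !ok */
} mini_result;

/* Hypothetical mini-compiler. The error is reported at the escape's
   final character, matching the offsets in testoutput2. */
static mini_result mini_compile(const char *pattern)
{
  mini_result r = { 0, 0, 0 };
  const char *p = pattern;
  int extra = 0;

  if (strncmp(p, "(?X)", 4) == 0) { extra = 1; p += 4; }  /* sets PCRE_EXTRA */

  /* Accept exactly '[', '\', one of b/B/R/X, ']'; reject anything else
     at the current position. */
  if (p[0] != '[' || p[1] != '\\' || p[2] == '\0' ||
      strchr("bBRX", p[2]) == NULL || p[3] != ']')
    {
    r.erroffset = (int)(p - pattern);
    return r;
    }

  p += 2;                              /* p now points at the escape letter */
  if (*p == 'b')
    {
    r.ok = 1;
    r.ch = 0x08;                       /* backspace inside a class */
    }
  else if (extra)
    {
    r.erroffset = (int)(p - pattern);  /* invalid escape in class */
    }
  else
    {
    r.ok = 1;
    r.ch = (unsigned char)*p;          /* literal B, R or X */
    }
  return r;
}
```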