Revision: 629
http://www.exim.org/viewvc/pcre2?view=rev&revision=629
Author: ph10
Date: 2016-12-27 11:50:28 +0000 (Tue, 27 Dec 2016)
Log Message:
-----------
Fail hyphen after POSIX character class.
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/doc/pcre2pattern.3
code/trunk/src/pcre2_compile.c
code/trunk/testdata/testinput2
code/trunk/testdata/testoutput2
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2016-12-26 17:11:18 UTC (rev 628)
+++ code/trunk/ChangeLog 2016-12-27 11:50:28 UTC (rev 629)
@@ -37,6 +37,10 @@
(f) When testing zero-terminated patterns under valgrind, the terminating
zero is now marked "no access". This catches bugs that would otherwise
show up only with non-zero-terminated patterns.
+
+ (g) A hyphen appearing immediately after a POSIX character class (for example
+ /[[:ascii:]-z]/) now generates an error. Perl does accept this as a
+ literal, but gives a warning, so it seems best to fail it in PCRE.
One effect of the refactoring is that some error numbers and messages have
changed, and the pattern offset given for compiling errors is not always the
Modified: code/trunk/doc/pcre2pattern.3
===================================================================
--- code/trunk/doc/pcre2pattern.3 2016-12-26 17:11:18 UTC (rev 628)
+++ code/trunk/doc/pcre2pattern.3 2016-12-27 11:50:28 UTC (rev 629)
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "23 December 2016" "PCRE2 10.23"
+.TH PCRE2PATTERN 3 "27 December 2016" "PCRE2 10.23"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -1352,10 +1352,10 @@
or immediately after a range. For example, [b-d-z] matches letters in the range
b to d, a hyphen character, or z.
.P
-Perl treats a hyphen as a literal if it appears before a POSIX class (see
-below) or a character type escape such as as \ed, but gives a warning in its
-warning mode, as this is most likely a user error. As PCRE2 has no facility for
-warning, an error is given in these cases.
+Perl treats a hyphen as a literal if it appears before or after a POSIX class
+(see below) or a character type escape such as as \ed, but gives a warning in
+its warning mode, as this is most likely a user error. As PCRE2 has no facility
+for warning, an error is given in these cases.
.P
It is not possible to have the literal character "]" as the end character of a
range. A pattern such as [W-]46] is interpreted as a class of two characters
@@ -3482,6 +3482,6 @@
.rs
.sp
.nf
-Last updated: 23 December 2016
+Last updated: 27 December 2016
Copyright (c) 1997-2016 University of Cambridge.
.fi
Modified: code/trunk/src/pcre2_compile.c
===================================================================
--- code/trunk/src/pcre2_compile.c 2016-12-26 17:11:18 UTC (rev 628)
+++ code/trunk/src/pcre2_compile.c 2016-12-27 11:50:28 UTC (rev 629)
@@ -2992,6 +2992,17 @@
goto FAILED;
}
ptr = tempptr + 2;
+
+ /* Perl treats a hyphen after a POSIX class as a literal, not the
+ start of a range. However, it gives a warning in its warning mode. PCRE
+ does not have a warning mode, so we give an error, because this is
+ likely an error on the user's part. */
+
+ if (ptr < ptrend && *ptr == CHAR_MINUS)
+ {
+ errorcode = ERR50;
+ goto FAILED;
+ }
/* When PCRE2_UCP is set, some of the POSIX classes are converted to
use Unicode properties \p or \P or, in one case, \h or \H. The
@@ -5003,7 +5014,7 @@
{
#ifdef DEBUG_SHOW_PARSED
fprintf(stderr, "** Unrecognized parsed pattern item 0x%.8x "
- "in character class", meta);
+ "in character class\n", meta);
#endif
*errorcodeptr = ERR89; /* Internal error - unrecognized. */
return 0;
Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2 2016-12-26 17:11:18 UTC (rev 628)
+++ code/trunk/testdata/testinput2 2016-12-27 11:50:28 UTC (rev 629)
@@ -4950,4 +4950,6 @@
/.+(?(?C'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'))?!XXXX.=X/
.+(?(?C'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'))?!XXXX.=X
+/[:[:alnum:]-[[a:lnum:]+/
+
# End of testinput2
Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2016-12-26 17:11:18 UTC (rev 628)
+++ code/trunk/testdata/testoutput2 2016-12-27 11:50:28 UTC (rev 629)
@@ -15431,6 +15431,9 @@
Failed: error 128 at offset 63: assertion expected after (?( or (?(?C)
.+(?(?C'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'))?!XXXX.=X
+/[:[:alnum:]-[[a:lnum:]+/
+Failed: error 150 at offset 11: invalid range in character class
+
# End of testinput2
Error -63: PCRE2_ERROR_BADDATA (unknown error number)
Error -62: bad serialized data
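
The check added to pcre2_compile.c above can be illustrated in isolation. What follows is a minimal sketch, not the PCRE2 implementation: the function name hyphen_after_posix_class is hypothetical, and the scanner assumes there are no escapes, does not validate the class name, and treats every ']' as closing the class. It only shows the idea of looking at the character immediately after the closing ":]" of a POSIX class item and reporting the offset of a hyphen found there.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration only; it is not part of PCRE2.
   Returns the offset of a hyphen that immediately follows a POSIX class
   item such as [:alnum:] inside a character class, or -1 if there is none.
   Assumptions: no escape handling, no validation of the class name, and a
   ']' always closes the class. */

static long hyphen_after_posix_class(const char *pattern)
{
size_t len = strlen(pattern);
int in_class = 0;

for (size_t i = 0; i < len; i++)
  {
  if (!in_class)
    {
    if (pattern[i] == '[') in_class = 1;   /* start of a character class */
    continue;
    }

  if (pattern[i] == ']')
    {
    in_class = 0;                          /* end of the character class */
    continue;
    }

  if (pattern[i] == '[' && i + 1 < len && pattern[i + 1] == ':')
    {
    /* Look for the ":]" that terminates the POSIX class item. */
    const char *close = strstr(pattern + i + 2, ":]");
    size_t end;

    if (close == NULL) continue;           /* not a POSIX class item */
    end = (size_t)(close - pattern) + 2;   /* first character after ":]" */

    /* Mirror of the new check: a hyphen right here is reported. */
    if (end < len && pattern[end] == '-') return (long)end;

    i = end - 1;                           /* resume after the item */
    }
  }

return -1;
}
```

Under these assumptions, the sketch returns 11 for the new test pattern [:[:alnum:]-[[a:lnum:]+, which agrees with the offset reported in testoutput2, and 10 for the ChangeLog example [[:ascii:]-z]. A class such as [b-d-z], which contains no POSIX class item, yields -1.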