[Pcre-svn] [629] code/trunk: Fail hyphen after POSIX charact…

Top Page
Delete this message
Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [629] code/trunk: Fail hyphen after POSIX character class.
Revision: 629
          http://www.exim.org/viewvc/pcre2?view=rev&revision=629
Author:   ph10
Date:     2016-12-27 11:50:28 +0000 (Tue, 27 Dec 2016)
Log Message:
-----------
Fail hyphen after POSIX character class.


Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/pcre2pattern.3
    code/trunk/src/pcre2_compile.c
    code/trunk/testdata/testinput2
    code/trunk/testdata/testoutput2


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2016-12-26 17:11:18 UTC (rev 628)
+++ code/trunk/ChangeLog    2016-12-27 11:50:28 UTC (rev 629)
@@ -37,6 +37,10 @@
   (f) When testing zero-terminated patterns under valgrind, the terminating
       zero is now marked "no access". This catches bugs that would otherwise
       show up only with non-zero-terminated patterns.
+      
+  (g) A hyphen appearing immediately after a POSIX character class (for example 
+      /[[:ascii:]-z]/) now generates an error. Perl does accept this as a 
+      literal, but gives a warning, so it seems best to fail it in PCRE. 


One effect of the refactoring is that some error numbers and messages have
changed, and the pattern offset given for compiling errors is not always the

Modified: code/trunk/doc/pcre2pattern.3
===================================================================
--- code/trunk/doc/pcre2pattern.3    2016-12-26 17:11:18 UTC (rev 628)
+++ code/trunk/doc/pcre2pattern.3    2016-12-27 11:50:28 UTC (rev 629)
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "23 December 2016" "PCRE2 10.23"
+.TH PCRE2PATTERN 3 "27 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -1352,10 +1352,10 @@
 or immediately after a range. For example, [b-d-z] matches letters in the range
 b to d, a hyphen character, or z.
 .P
-Perl treats a hyphen as a literal if it appears before a POSIX class (see
-below) or a character type escape such as as \ed, but gives a warning in its 
-warning mode, as this is most likely a user error. As PCRE2 has no facility for
-warning, an error is given in these cases.
+Perl treats a hyphen as a literal if it appears before or after a POSIX class
+(see below) or a character type escape such as as \ed, but gives a warning in
+its warning mode, as this is most likely a user error. As PCRE2 has no facility
+for warning, an error is given in these cases.
 .P
 It is not possible to have the literal character "]" as the end character of a
 range. A pattern such as [W-]46] is interpreted as a class of two characters
@@ -3482,6 +3482,6 @@
 .rs
 .sp
 .nf
-Last updated: 23 December 2016
+Last updated: 27 December 2016
 Copyright (c) 1997-2016 University of Cambridge.
 .fi


Modified: code/trunk/src/pcre2_compile.c
===================================================================
--- code/trunk/src/pcre2_compile.c    2016-12-26 17:11:18 UTC (rev 628)
+++ code/trunk/src/pcre2_compile.c    2016-12-27 11:50:28 UTC (rev 629)
@@ -2992,6 +2992,17 @@
           goto FAILED;
           }
         ptr = tempptr + 2;
+        
+        /* Perl treats a hyphen after a POSIX class as a literal, not the
+        start of a range. However, it gives a warning in its warning mode. PCRE
+        does not have a warning mode, so we give an error, because this is
+        likely an error on the user's part. */
+        
+        if (ptr < ptrend && *ptr == CHAR_MINUS)
+          {
+          errorcode = ERR50;
+          goto FAILED;
+          }     


         /* When PCRE2_UCP is set, some of the POSIX classes are converted to
         use Unicode properties \p or \P or, in one case, \h or \H. The
@@ -5003,7 +5014,7 @@
           {
 #ifdef DEBUG_SHOW_PARSED
           fprintf(stderr, "** Unrecognized parsed pattern item 0x%.8x "
-                          "in character class", meta);
+                          "in character class\n", meta);
 #endif
           *errorcodeptr = ERR89;  /* Internal error - unrecognized. */
           return 0;


Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2016-12-26 17:11:18 UTC (rev 628)
+++ code/trunk/testdata/testinput2    2016-12-27 11:50:28 UTC (rev 629)
@@ -4950,4 +4950,6 @@
 /.+(?(?C'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'))?!XXXX.=X/
     .+(?(?C'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'))?!XXXX.=X


+/[:[:alnum:]-[[a:lnum:]+/
+
# End of testinput2

Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2016-12-26 17:11:18 UTC (rev 628)
+++ code/trunk/testdata/testoutput2    2016-12-27 11:50:28 UTC (rev 629)
@@ -15431,6 +15431,9 @@
 Failed: error 128 at offset 63: assertion expected after (?( or (?(?C)
     .+(?(?C'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'))?!XXXX.=X


+/[:[:alnum:]-[[a:lnum:]+/
+Failed: error 150 at offset 11: invalid range in character class
+
# End of testinput2
Error -63: PCRE2_ERROR_BADDATA (unknown error number)
Error -62: bad serialized data