Revision: 326
http://vcs.pcre.org/viewvc?view=rev&revision=326
Author: ph10
Date: 2008-03-08 17:24:02 +0000 (Sat, 08 Mar 2008)
Log Message:
-----------
Craig's patch to the QuoteMeta function in pcrecpp.cc so that it escapes the
NUL character as backslash + 0 rather than backslash + NUL, because PCRE
doesn't support NULs in patterns.
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/pcrecpp.cc
code/trunk/pcrecpp.h
code/trunk/pcrecpp_unittest.cc
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2008-03-08 17:13:02 UTC (rev 325)
+++ code/trunk/ChangeLog 2008-03-08 17:24:02 UTC (rev 326)
@@ -29,6 +29,10 @@
5. Added the --include_dir and --exclude_dir patterns to pcregrep, and used
--exclude_dir in the tests to avoid scanning .svn directories.
+
+6. Applied Craig's patch to the QuoteMeta function so that it escapes the
+ NUL character as backslash + 0 rather than backslash + NUL, because PCRE
+ doesn't support NULs in patterns.
Version 7.6 28-Jan-08
Modified: code/trunk/pcrecpp.cc
===================================================================
--- code/trunk/pcrecpp.cc 2008-03-08 17:13:02 UTC (rev 325)
+++ code/trunk/pcrecpp.cc 2008-03-08 17:24:02 UTC (rev 326)
@@ -449,21 +449,27 @@
// Note that it's legal to escape a character even if it has no
// special meaning in a regular expression -- so this function does
// that. (This also makes it identical to the perl function of the
- // same name; see `perldoc -f quotemeta`.)
+ // same name; see `perldoc -f quotemeta`.) The one exception is
+ // escaping NUL: rather than doing backslash + NUL, like perl does,
+ // we do '\0', because pcre itself doesn't take embedded NUL chars.
for (int ii = 0; ii < unquoted.size(); ++ii) {
// Note that using 'isalnum' here raises the benchmark time from
// 32ns to 58ns:
- if ((unquoted[ii] < 'a' || unquoted[ii] > 'z') &&
- (unquoted[ii] < 'A' || unquoted[ii] > 'Z') &&
- (unquoted[ii] < '0' || unquoted[ii] > '9') &&
- unquoted[ii] != '_' &&
- // If this is the part of a UTF8 or Latin1 character, we need
- // to copy this byte without escaping. Experimentally this is
- // what works correctly with the regexp library.
- !(unquoted[ii] & 128)) {
+ if (unquoted[ii] == '\0') {
+ result += "\\0";
+ } else if ((unquoted[ii] < 'a' || unquoted[ii] > 'z') &&
+ (unquoted[ii] < 'A' || unquoted[ii] > 'Z') &&
+ (unquoted[ii] < '0' || unquoted[ii] > '9') &&
+ unquoted[ii] != '_' &&
+ // If this is the part of a UTF8 or Latin1 character, we need
+ // to copy this byte without escaping. Experimentally this is
+ // what works correctly with the regexp library.
+ !(unquoted[ii] & 128)) {
result += '\\';
+ result += unquoted[ii];
+ } else {
+ result += unquoted[ii];
}
- result += unquoted[ii];
}
return result;
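To try the new branch order outside the library, here is a minimal stand-alone sketch of the patched loop. It assumes `std::string` in place of pcrecpp's `StringPiece`, and `QuoteMetaSketch` is a hypothetical name. The classification logic mirrors the `+` lines above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-alone version of the patched QuoteMeta loop.
// Uses std::string instead of pcrecpp's StringPiece so it compiles alone.
std::string QuoteMetaSketch(const std::string& unquoted) {
  std::string result;
  result.reserve(2 * unquoted.size());  // worst case: every byte escaped
  // Unsigned index avoids the sign-compare warning of an `int ii` loop.
  for (std::string::size_type ii = 0; ii < unquoted.size(); ++ii) {
    const char c = unquoted[ii];
    if (c == '\0') {
      // NUL becomes the two printable bytes '\' and '0'.
      result += "\\0";
    } else if ((c < 'a' || c > 'z') &&
               (c < 'A' || c > 'Z') &&
               (c < '0' || c > '9') &&
               c != '_' &&
               !(c & 128)) {
      // Any other ASCII byte that is not a word character gets a backslash.
      result += '\\';
      result += c;
    } else {
      // Word characters and high-bit (UTF-8 / Latin-1) bytes are copied as-is.
      result += c;
    }
  }
  return result;
}
```

With this sketch, `QuoteMetaSketch("1.5-2.0?")` yields `1\.5\-2\.0\?`, matching the example in pcrecpp.h, and `QuoteMetaSketch(std::string("foo\0bar", 7))` yields the 8 bytes `foo\0bar`. One consequence worth checking: because digits are never escaped, a NUL followed by a digit comes out as, for example, `\01`, which PCRE's pattern syntax reads as a single octal escape (character 0x01) rather than NUL followed by `1`. Emitting `\x00` for NUL would sidestep that ambiguity.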
Modified: code/trunk/pcrecpp.h
===================================================================
--- code/trunk/pcrecpp.h 2008-03-08 17:13:02 UTC (rev 325)
+++ code/trunk/pcrecpp.h 2008-03-08 17:24:02 UTC (rev 326)
@@ -620,6 +620,9 @@
// 1.5-2.0?
// may become:
// 1\.5\-2\.0\?
+ // Note QuoteMeta behaves the same as perl's QuoteMeta function,
+ // *except* that it escapes the NUL character (\0) as backslash + 0,
+ // rather than backslash + NUL.
static string QuoteMeta(const StringPiece& unquoted);
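The header comment contrasts two ways of escaping NUL. The following is a small illustration of why the Perl form causes trouble. It assumes that the quoted pattern eventually reaches a C interface that reads a NUL-terminated `const char*`, as PCRE's `pcre_compile` does. In that case, the raw NUL ends the pattern early and leaves a dangling backslash. `VisibleToCApi`, `kPerlStyle` and `kPcrecppStyle` are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: how many bytes of a pattern a C API that reads a
// NUL-terminated const char* would actually see.
std::size_t VisibleToCApi(const std::string& pattern) {
  return std::strlen(pattern.c_str());
}

// "foo<NUL>bar" escaped two ways; both are 8 bytes long.
// Perl quotemeta style: a backslash followed by a raw NUL byte.
const std::string kPerlStyle("foo\\\0bar", 8);
// pcrecpp style after this change: a backslash followed by the digit '0'.
const std::string kPcrecppStyle("foo\\0bar", 8);
```

With these definitions, `VisibleToCApi(kPerlStyle)` is 4 (just the bytes `foo\`), while `VisibleToCApi(kPcrecppStyle)` covers all 8 bytes.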
Modified: code/trunk/pcrecpp_unittest.cc
===================================================================
--- code/trunk/pcrecpp_unittest.cc 2008-03-08 17:13:02 UTC (rev 325)
+++ code/trunk/pcrecpp_unittest.cc 2008-03-08 17:24:02 UTC (rev 326)
@@ -497,6 +497,7 @@
TestQuoteMeta("((a|b)c?d*e+[f-h]i)");
TestQuoteMeta("((?!)xxx).*yyy");
TestQuoteMeta("([");
+ TestQuoteMeta(string("foo\0bar", 7));
}
static void TestQuoteMetaSimpleNegative() {
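The new test case builds its input with an explicit length. The following sketch shows why, using only the standard library (`kTruncated` and `kWithNul` are illustrative names). The `const char*` constructor of `std::string` measures the literal only up to its first NUL, so only the two-argument form keeps the embedded NUL that the test is meant to exercise.

```cpp
#include <cassert>
#include <string>

// The const char* constructor stops at the first NUL, so "bar" is lost.
const std::string kTruncated("foo\0bar");
// An explicit byte count keeps the embedded NUL and everything after it,
// which is what the new TestQuoteMeta case relies on.
const std::string kWithNul("foo\0bar", 7);
```

Here `kTruncated` holds just `foo`, while `kWithNul` holds all 7 bytes. In C++14 and later, the `"foo\0bar"s` literal from `std::string_literals` also keeps the embedded NUL.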