[pcre-dev] [Bug 669] New: pcrecpp:: QuoteMeta does not behav…

Author: Yvan Seth
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 669] New: pcrecpp:: QuoteMeta does not behave as expected when there are embedded null bytes

http://bugs.exim.org/show_bug.cgi?id=669
           Summary: pcrecpp::QuoteMeta does not behave as expected when
                    there are embedded null bytes
           Product: PCRE
           Version: 7.2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: bugs.exim.org@???
                CC: pcre-dev@???



Given that the pcrecpp API provides QuoteMeta, one would expect the "quoted"
result for a string with an embedded null byte to produce an expression that
matches null bytes when used to create a pcrecpp::RE. However, this is not
the case. True to its documentation, QuoteMeta does the same as Perl: it just
backwhacks any non-alphanumeric bytes. This leaves each null byte as a null
byte with a '\' in front of it. In Perl this string actually works as intended
when used in an RE; in pcrecpp it doesn't.

The problem is that the Compile method in pcrecpp.cc feeds the .c_str() value
to pcre_compile. As far as I can tell, libpcre only accepts NUL-terminated
strings as regular expressions, so the pattern is cut off at the first
embedded null byte and the null must be escaped more robustly.
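
To illustrate the truncation, here is a minimal sketch that does not need
libpcre at all. The helper names bytes_seen_by_c_api and
make_backslash_nul_pattern are made up for this example; the first one simply
measures how much of a std::string survives once it is treated as a
NUL-terminated C string, which is an assumption about how any C API taking
only a const char* would see the pattern.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: number of bytes visible to a C API that is handed
// only pattern.c_str() and relies on NUL termination to find the end.
std::size_t bytes_seen_by_c_api(const std::string& pattern) {
    return std::strlen(pattern.c_str());
}

// Hypothetical helper: builds the kind of string the report describes,
// i.e. "foo", a backslash, a raw null byte, then "bar".
std::string make_backslash_nul_pattern() {
    std::string pattern("foo\\");  // 'f' 'o' 'o' '\'
    pattern.push_back('\0');       // the embedded null byte
    pattern.append("bar");
    return pattern;                // std::string itself holds all 8 bytes
}
```

With these, make_backslash_nul_pattern().size() is 8, while
bytes_seen_by_c_api() on the same string is 4: everything from the null byte
onwards is invisible to the C side.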

Some approaches to making life easier for API users would be:
   1) Document that std::string instances with embedded null
      bytes are not supported, with a note that QuoteMeta
      does not quote them in a way that works (but this is
      not a QuoteMeta bug since it works as documented:
      exactly like Perl's quotemeta).
   2) Have the pcrecpp interface magically translate embedded
      null bytes to \x00 so that libpcre matches them as expected.
   3) As a sort of fusion of the above: implement a version
      of QuoteMeta that does not work exactly like Perl's quotemeta,
      the difference being that \xHH escapes are used.

I think 3) is the safest and most friendly to users. It also won't break
existing code and should be a binary-compatible change. It documents the
shortcoming of QuoteMeta /and/ provides a solution ;) The diff I have attached
is a possible implementation of 3). The diff was taken as:
diff -ur pcre-7.6.orig/ pcre-7.6/

Changes include:
* New QuoteMetaImpl function, static to pcrecpp.cc, which holds the
  implementation.
* The original QuoteMeta now calls QuoteMetaImpl.
* New QuoteMetaHex method declared and documented in pcrecpp.h.
* QuoteMetaHex also calls QuoteMetaImpl.
* QuoteMetaHex tests added to pcrecpp_unittest.cpp.
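
For readers without the attachment, the structure listed above could look
roughly like the following. This is a standalone sketch, not the attached
diff: it lives in a made-up namespace "sketch" rather than in pcrecpp::RE, the
use_hex flag is an assumed way of sharing the loop, and it assumes that every
byte Perl's quotemeta would backslash (anything outside [A-Za-z0-9_]) is
written as \xHH in hex mode.

```cpp
#include <string>

namespace sketch {

// Shared loop (assumed design): copy word characters through unchanged and
// escape everything else, either Perl-style or as a \xHH hex escape.
static std::string QuoteMetaImpl(const std::string& unquoted, bool use_hex) {
    static const char kHexDigits[] = "0123456789abcdef";
    std::string result;
    result.reserve(unquoted.size() * (use_hex ? 4 : 2));
    for (std::string::size_type i = 0; i < unquoted.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(unquoted[i]);
        const bool is_word = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                             (c >= '0' && c <= '9') || c == '_';
        if (is_word) {
            result.push_back(unquoted[i]);
        } else if (use_hex) {
            // e.g. a null byte becomes the four characters \x00
            result += "\\x";
            result.push_back(kHexDigits[c >> 4]);
            result.push_back(kHexDigits[c & 0x0f]);
        } else {
            // Perl-style: a backslash followed by the raw byte
            result.push_back('\\');
            result.push_back(unquoted[i]);
        }
    }
    return result;
}

// Same behaviour as before: backslash plus the raw byte.
std::string QuoteMeta(const std::string& unquoted) {
    return QuoteMetaImpl(unquoted, false);
}

// New variant: the result contains no raw null bytes, so it survives c_str().
std::string QuoteMetaHex(const std::string& unquoted) {
    return QuoteMetaImpl(unquoted, true);
}

}  // namespace sketch
```

Applied to the "foo", null byte, "bar" string from the example further down,
sketch::QuoteMetaHex yields foo\x00bar, which is the manually quoted form that
the example shows matching.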

The original QuoteMeta seems concerned about performance. This modification
adds a few more cycles to the processing, so it could be a concern in that
sense. (In that case, just keep the old QuoteMeta and make QuoteMetaImpl, minus
the hex conditionals, the body of QuoteMetaHex... I guess.)

The unit tests are a hack on top of the existing QuoteMeta unit tests; that was
the easiest way to go with the least duplicated code.



This is a quick example of the problem:
-----------------------------------------------------------------------
:; cat quotemeta.cc
#include <pcrecpp.h>
#include <string>
#include <iostream>

int main(void)
{
    std::string unquoted("foo");
    unquoted.push_back('\0');
    unquoted.append("bar");
    std::string autoquoted = pcrecpp::RE::QuoteMeta(unquoted);
    std::string manualquoted("foo\\x00bar");
    std::cout << "Auto quoted version is: " << autoquoted << std::endl;
    std::cout << "Auto match result is: "
              << pcrecpp::RE(autoquoted).FullMatch(unquoted) << std::endl;
    std::cout << "Manual quoted version is: " << manualquoted << std::endl;
    std::cout << "Manual match result is: "
              << pcrecpp::RE(manualquoted).FullMatch(unquoted) << std::endl;
    return 0;
}
:; g++ quotemeta.cc -o quotemeta -lpcrecpp
:; ./quotemeta 
Auto quoted version is: foo\bar
Auto match result is: 0
Manual quoted version is: foo\x00bar
Manual match result is: 1
:; ./quotemeta | xxd
0000000: 4175 746f 2071 756f 7465 6420 7665 7273  Auto quoted vers
0000010: 696f 6e20 6973 3a20 666f 6f5c 0062 6172  ion is: foo\.bar
0000020: 0a41 7574 6f20 6d61 7463 6820 7265 7375  .Auto match resu
0000030: 6c74 2069 733a 2030 0a4d 616e 7561 6c20  lt is: 0.Manual 
0000040: 7175 6f74 6564 2076 6572 7369 6f6e 2069  quoted version i
0000050: 733a 2066 6f6f 5c78 3030 6261 720a 4d61  s: foo\x00bar.Ma
0000060: 6e75 616c 206d 6174 6368 2072 6573 756c  nual match resul
0000070: 7420 6973 3a20 310a                      t is: 1.
-----------------------------------------------------------------------


