Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1719] New: Class containing negated POSIX classes with other classes match incorrectly
https://bugs.exim.org/show_bug.cgi?id=1719

            Bug ID: 1719
           Summary: Class containing negated POSIX classes with other
                    classes match incorrectly
           Product: PCRE
           Version: 8.37
          Hardware: x86
                OS: Linux
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
          Assignee: ph10@???
          Reporter: justin.viiret@???
                CC: pcre-dev@???


I have another bug related to POSIX character classes; this one is a little
more involved than the ones I filed yesterday. These were all found with
fuzzer-driven comparison testing against Intel's Hyperscan regex engine.

We have found that, in PCRE_UCP mode, PCRE's behaviour differs from Perl's for
some character classes composed of more than one class (a POSIX class, an escape
such as \d, etc.) where one of those classes is a negated POSIX class that does
not become a Unicode property in UCP mode, for example [:^ascii:] or [:^xdigit:].
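To make that condition concrete, here is a minimal Python reference model of the two ASCII-only POSIX classes named above and of their negations. It is a sketch, not PCRE's implementation, and every function name in it is hypothetical. It checks that each negated class accepts every code point from U+0080 to U+10FFFF, which is why any bracketed class containing one of them should match every non-ASCII character.

```python
# Hypothetical reference model (not PCRE source): [:ascii:] and [:xdigit:],
# which stay ASCII-only under PCRE_UCP, and their negated forms.
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")
MAX_CODE_POINT = 0x10FFFF


def posix_ascii(cp: int) -> bool:
    """[:ascii:] matches U+0000..U+007F."""
    return cp < 0x80


def posix_xdigit(cp: int) -> bool:
    """[:xdigit:] matches only ASCII hex digits, with or without UCP."""
    return cp < 0x80 and chr(cp) in HEX_DIGITS


def negate(pred):
    """[:^name:] is the complement of [:name:] over all code points."""
    return lambda cp: not pred(cp)


not_ascii = negate(posix_ascii)
not_xdigit = negate(posix_xdigit)

# Both positive classes are confined to ASCII, so each negated class must
# accept every code point from U+0080 through U+10FFFF.
first_reject = next(
    (cp for cp in range(0x80, MAX_CODE_POINT + 1)
     if not (not_ascii(cp) and not_xdigit(cp))),
    None,
)
print("first non-ASCII reject:", first_reject)  # prints: first non-ASCII reject: None
```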

As a concrete example, /[[:^ascii:]]/8W appears to behave effectively as
/[\x{80}-\x{10ffff}]/8, whereas /[[:^ascii:]\w]/8W fails to match some non-word
code points. Because a class is the union of its members, the latter class
should match every character the former does, yet it seems to match a smaller
set. Perhaps this is an interaction with the negation at pcre_compile time?
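The union argument can be sketched as a small Python model of the expected (Perl-like) result for the test subjects below. This is an illustration under stated assumptions, not PCRE's matcher: the helper names are hypothetical, and it assumes that \w under PCRE_UCP means Unicode letters (L*), numbers (N*) or underscore, looked up with Python's `unicodedata` tables.

```python
import unicodedata

# Hypothetical model of expected semantics, not PCRE's implementation.
# Assumption: \w under PCRE_UCP is L*, N* or underscore.


def not_ascii(cp: int) -> bool:
    """[:^ascii:]: every code point above U+007F."""
    return cp >= 0x80


def word_ucp(cp: int) -> bool:
    """\\w in UCP mode: a letter, a number, or '_'."""
    ch = chr(cp)
    return ch == "_" or unicodedata.category(ch)[0] in ("L", "N")


def char_class(*members):
    """A bracketed class is a union: it matches if any member matches."""
    return lambda cp: any(member(cp) for member in members)


plain = char_class(not_ascii)                # [[:^ascii:]]
combined = char_class(not_ascii, word_ucp)   # [[:^ascii:]\w]

subjects = [ord("a"), ord("9"), ord("g"), 0x100, 0x200, 0x300, 0x37E]
for cp in subjects:
    print(f"U+{cp:04X} word={word_ucp(cp)} plain={plain(cp)} combined={combined(cp)}")
```

In this model, U+0300 (COMBINING GRAVE ACCENT, category Mn) and U+037E (GREEK QUESTION MARK, category Po) are the only subjects for which \w is false, so the [:^ascii:] member is the only thing that can make them match. Those are the two subjects that pcretest rejects.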

Here is a test case in pcretest format:

/[[:^ascii:]\w]/8W
    a
    9
    g
    \x{100}
    \x{200}
    \x{300}
    \x{37e}


This fails to match the last two cases (\x{300} and \x{37e}) in pcretest, but
returns a match for all test cases in perltest.pl (with 'use utf8; require
Encode;' uncommented). I see similar results for /[[:^xdigit:][:space:]]/8W
with the same test input data.
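A similar sketch gives the expected result of /[[:^xdigit:][:space:]]/8W on the same subjects. Again the names are hypothetical and this is not PCRE code. It assumes that under PCRE_UCP [:space:] becomes \p{Xps} (Unicode separators Z* plus HT, LF, VT, FF and CR) while [:xdigit:] stays ASCII-only.

```python
import unicodedata

# Hypothetical model of expected semantics for [[:^xdigit:][:space:]].
# Assumption: [:space:] is \p{Xps} in UCP mode; [:xdigit:] is ASCII-only.
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")
ASCII_SPACE_CONTROLS = {0x09, 0x0A, 0x0B, 0x0C, 0x0D}  # HT, LF, VT, FF, CR


def not_xdigit(cp: int) -> bool:
    """[:^xdigit:]: anything that is not an ASCII hex digit."""
    return not (cp < 0x80 and chr(cp) in HEX_DIGITS)


def space_ucp(cp: int) -> bool:
    """[:space:] in UCP mode, modelled as Z* plus the ASCII space controls."""
    return (cp in ASCII_SPACE_CONTROLS
            or unicodedata.category(chr(cp)).startswith("Z"))


def char_class(*members):
    """A bracketed class is a union of its members."""
    return lambda cp: any(member(cp) for member in members)


xdigit_space = char_class(not_xdigit, space_ucp)  # [[:^xdigit:][:space:]]

subjects = [ord("a"), ord("9"), ord("g"), 0x100, 0x200, 0x300, 0x37E]
expected = {f"U+{cp:04X}": xdigit_space(cp) for cp in subjects}
print(expected)
```

Unlike the first pattern, "a" and "9" are hex digits and not white space, so this model predicts no match for them. Every non-ASCII subject is covered by [:^xdigit:] alone.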

I have checked with PCRE2 10.20, and it exhibits the same behaviour.

--
You are receiving this mail because:
You are on the CC list for the bug.