Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2625] New: Unexpected caseless matching of ASCII "s" when using "[\x{00FF}-\x{FFEE}]" in UTF-16 text
https://bugs.exim.org/show_bug.cgi?id=2625

            Bug ID: 2625
           Summary: Unexpected caseless matching of ASCII "s" when using
                    "[\x{00FF}-\x{FFEE}]" in UTF-16 text
           Product: PCRE
           Version: 10.35 (PCRE2)
          Hardware: x86-64
                OS: All
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
          Assignee: ph10@???
          Reporter: siegel@???
                CC: pcre-dev@???


Using r1267, macOS 10.14.6 (x86_64):

Given this text represented as UTF-16:

    this is a test


Search using this pattern, which as written should *not* match any ASCII
characters:

    [\x{00FF}-\x{FFEE}]


If the pattern is compiled with PCRE2_CASELESS set, pcre2_match() returns a
match at the first "s" in the subject text, even though that character is
outside the explicit range. (Its uppercase counterpart "S" is outside the
range as well.)

Further testing shows that "k" and "K" also match, presumably from the same
underlying cause.
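
One possible explanation, which this report does not confirm, is Unicode
case equivalence: U+017F (LATIN SMALL LETTER LONG S) is case-equivalent to
"s"/"S" and U+212A (KELVIN SIGN) to "k"/"K", and both code points lie inside
\x{00FF}-\x{FFEE}. The sketch below only models that idea. The table and the
function name matches_caselessly are hypothetical illustrations, not PCRE2's
internal data or API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, illustrative table: the two non-ASCII code points that are
   case-equivalent to an ASCII letter. This is NOT PCRE2's internal data. */
struct fold_pair {
    uint32_t non_ascii;   /* code point outside ASCII */
    uint32_t ascii_lower; /* lowercase ASCII letter it is equivalent to */
};

static const struct fold_pair extra_folds[] = {
    { 0x017F, 's' }, /* LATIN SMALL LETTER LONG S */
    { 0x212A, 'k' }, /* KELVIN SIGN */
};

static uint32_t ascii_lower(uint32_t c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

static uint32_t ascii_upper(uint32_t c)
{
    return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
}

static int in_range(uint32_t c, uint32_t lo, uint32_t hi)
{
    return c >= lo && c <= hi;
}

/* Returns 1 if c, or any character modeled here as case-equivalent to c,
   lies in the inclusive range [lo, hi]; returns 0 otherwise. */
int matches_caselessly(uint32_t c, uint32_t lo, uint32_t hi)
{
    uint32_t lower = ascii_lower(c);
    uint32_t upper = ascii_upper(c);
    size_t i;

    if (in_range(c, lo, hi) || in_range(lower, lo, hi) ||
        in_range(upper, lo, hi))
        return 1;

    /* Check the non-ASCII equivalents of ASCII letters. */
    for (i = 0; i < sizeof extra_folds / sizeof extra_folds[0]; i++) {
        if (lower == extra_folds[i].ascii_lower &&
            in_range(extra_folds[i].non_ascii, lo, hi))
            return 1;
    }
    return 0;
}
```

Under this model, matches_caselessly('s', 0x00FF, 0xFFEE) and
matches_caselessly('k', 0x00FF, 0xFFEE) are nonzero, while the other letters
of "this is a test" give 0, which is consistent with the matches described
above.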

The compile options used in every test are (PCRE2_UCP | PCRE2_MULTILINE |
PCRE2_AUTO_CALLOUT), and PCRE2_EXTRA_ESCAPED_CR_IS_LF is set in the extra
options (via pcre2_set_compile_extra_options()).

I regret I don't have a trivial test program to demonstrate this, but if you
find that you're not able to reproduce with this, please let me know and I'll
see if I can come up with something.
