[pcre-dev] [Bug 1492] New: First char optimization bug for …

Author: Justin Viiret
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1492] New: First char optimization bug for multi-line pattern beginning with clist
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1492
           Summary: First char optimization bug for multi-line pattern
                    beginning with clist
           Product: PCRE
           Version: 8.35
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: justin.viiret@???
                CC: pcre-dev@???



Hi there,

I have encountered a PCRE matching issue that looks like a bug in determining
the characters used for the "first char" optimization. Prior to PCRE 8.32, the
pattern "/^s?c/mi8W" matched the data "sc", but later versions report no
match.

Here's a test case (PCRE compiled with Unicode properties support):

PCRE 8.31:
--------
$ ./pcretest -d
PCRE version 8.31 2012-07-06

re> /^s?c/mi8W

------------------------------------------------------------------
  0   8 Bra
  3  /m ^
  4  /i s?
  6  /i c
  8   8 Ket
 11     End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless multiline utf ucp
First char at start or follows newline
Need char = 'c' (caseless)

data> sc

0: sc
--------


PCRE 8.35:
--------
$ ./pcretest -d
PCRE version 8.35 2014-04-04

re> /^s?c/mi8W

------------------------------------------------------------------
  0  10 Bra
  3  /m ^
  4     clist 0053 0073 017f ?+
  8  /i c
 10  10 Ket
 13     End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless multiline utf ucp
First char = 'c' (caseless)
No need char

data> sc

No match
--------

This looks to be specific to characters that expand into a clist in caseless
UTF-8 mode; the same pattern behaves correctly if the "s" is replaced with "a"
or if the UTF-8 flag is removed.
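
For illustration only, here is a small Python sketch of why a wrong first-char
set can produce "No match" here. It is not PCRE's code: it uses Python's re
module as a stand-in matcher, and search_with_first_char is a hypothetical
helper that only attempts a match at offsets whose character is in the given
set. If the set is just 'c' (caseless), the only offset tried in "sc" is 1,
where the multi-line "^" cannot match.

```python
import re

# Stand-in for /^s?c/mi (Python's re, not PCRE).
PATTERN = re.compile(r"^s?c", re.MULTILINE | re.IGNORECASE)


def search_with_first_char(data, first_chars):
    """Hypothetical helper: try a match only where data[pos] is in first_chars."""
    for pos, ch in enumerate(data):
        if ch in first_chars:
            # match() anchors the attempt at pos; "^" still requires pos to be
            # the real start of the string or to follow a newline.
            m = PATTERN.match(data, pos)
            if m:
                return m.group(0)
    return None


def search_plain(data):
    """Unoptimized search: every offset is a candidate start."""
    m = PATTERN.search(data)
    return m.group(0) if m else None


# First-char set as reported by 8.35: 'c' (caseless) only.
too_narrow = {"c", "C"}
# Set that also covers the optional clist item (0053 0073 017f).
wide_enough = {"c", "C", "\u0053", "\u0073", "\u017f"}

print(search_plain("sc"))
print(search_with_first_char("sc", too_narrow))
print(search_with_first_char("sc", wide_enough))
```

With the narrow set the sketch skips offset 0 and never finds a match, which is
consistent with the 8.35 output above; with the wider set it behaves like the
unoptimized search.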


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email