http://bugs.exim.org/show_bug.cgi?id=1492
Summary: First char optimization bug for multi-line pattern beginning with clist
Product: PCRE
Version: 8.35
Platform: All
OS/Version: All
Status: NEW
Severity: bug
Priority: medium
Component: Code
AssignedTo: ph10@???
ReportedBy: justin.viiret@???
CC: pcre-dev@???
Hi there,
I have encountered a PCRE matching issue that appears to be a bug in how the
characters used for the "first char" optimization are determined. Prior to
PCRE 8.32, the pattern "/^s?c/mi8W" matched the data "sc", but later versions
(8.35 is shown here) do not match.
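For reference, the expected result can be cross-checked with a different regex
engine. The sketch below uses Python's re module, which is not PCRE and is
shown only to illustrate the intended semantics of the pattern (multiline
anchor, optional caseless "s", then caseless "c"). The flag mapping is my
assumption of the closest equivalent to /mi.

```python
import re

# Closest Python equivalent of /^s?c/mi (assumption: re is a different
# engine from PCRE; str patterns are Unicode-aware by default).
pattern = re.compile(r"^s?c", re.MULTILINE | re.IGNORECASE)

match = pattern.search("sc")
# The optional "s" should be consumed, so the whole subject matches.
print(match.group(0) if match else "No match")
```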
Here's a test case (PCRE compiled with Unicode properties support):
PCRE 8.31:
--------
$ ./pcretest -d
PCRE version 8.31 2012-07-06
re> /^s?c/mi8W
------------------------------------------------------------------
0 8 Bra
3 /m ^
4 /i s?
6 /i c
8 8 Ket
11 End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless multiline utf ucp
First char at start or follows newline
Need char = 'c' (caseless)
data> sc
0: sc
--------
PCRE 8.35:
--------
$ ./pcretest -d
PCRE version 8.35 2014-04-04
re> /^s?c/mi8W
------------------------------------------------------------------
0 10 Bra
3 /m ^
4 clist 0053 0073 017f ?+
8 /i c
10 10 Ket
13 End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless multiline utf ucp
First char = 'c' (caseless)
No need char
data> sc
No match
--------
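This looks to be specific to characters that expand into a clist in caseless
UTF-8 mode; the same pattern behaves correctly if the "s" is replaced with "a"
or if the UTF-8 flag is removed. Note that 8.35 reports "First char = 'c'
(caseless)" even though the optional "s" may come first, whereas 8.31 reports
"First char at start or follows newline".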
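The clist in the 8.35 output (0053 0073 017f) lists the code points that are
caseless-equivalent to "s", including U+017F (LATIN SMALL LETTER LONG S). A
rough way to see why "s" differs from "a" is to list the code points that fold
to each letter. The sketch below uses Python's str.casefold() as a stand-in
for PCRE's own case tables, which is an assumption; the helper name fold_set
is hypothetical.

```python
import sys

def fold_set(letter):
    """Return all code points whose case folding equals `letter`."""
    return [cp for cp in range(sys.maxunicode + 1)
            if chr(cp).casefold() == letter]

# "s" has a third caseless equivalent (U+017F); "a" and "c" have only two.
for letter in ("s", "a", "c"):
    print(letter, " ".join(f"{cp:04x}" for cp in fold_set(letter)))
```

A letter with more than two caseless equivalents cannot be represented as a
simple upper/lower pair, which would be consistent with "s" compiling to a
clist while "a" does not.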