http://bugs.exim.org/show_bug.cgi?id=1074
Summary: Incorrect length check in match_ref(...)
Product: PCRE
Version: 8.11
Platform: All
OS/Version: All
Status: NEW
Severity: bug
Priority: medium
Component: Code
AssignedTo: ph10@???
ReportedBy: hzmester@???
CC: pcre-dev@???
I had a discussion with Philip, and it turned out that the two members of some
Unicode uppercase/lowercase pairs have UTF-8 encodings of different lengths.
One such pair is 570 (U+023A) and 11365 (U+2C65). (I have no idea what the
glyphs look like, but it doesn't matter.)
Their UTF-8 representations (as C hex escape sequences):
\xc8\xba = 570 (2 bytes)
\xe2\xb1\xa5 = 11365 (3 bytes)
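The byte sequences above can be checked with a small encoder. This is only a minimal sketch: utf8_encode is a hypothetical helper written for this report, not a PCRE function, and it handles code points up to U+FFFF only (without rejecting surrogates).

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: encode a code point below
   0x10000 as UTF-8. Returns the number of bytes written to out, or 0
   if the code point is outside the range this sketch handles. */
static size_t utf8_encode(unsigned int cp, unsigned char *out)
{
    if (cp < 0x80) {                    /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    return 0;                           /* 4-byte forms omitted in this sketch */
}
```

Encoding 570 gives the two bytes c8 ba, and encoding 11365 gives the three bytes e2 b1 a5, so the two members of the case pair differ in byte length.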
The following regular expression incorrectly reports a match:
const char* pattern = "(\xc8\xba\xc8\xba\xc8\xba)?\\1";
on the string:
const char* input = "\xc8\xba\xc8\xba\xc8\xba\xe2\xb1\xa5\xe2\xb1\xa5";
The input is basically char 570 repeated three times, followed by char 11365
repeated twice. The pattern also contains char 570 repeated three times.
Output: 0, 12, 0, 6
The cause is that match_ref does an early byte-length check, which is invalid
in this case because three copies of char 570 occupy the same number of bytes
(3 x 2 = 6) as two copies of char 11365 (2 x 3 = 6).
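One way to avoid relying on byte lengths is to walk the captured text and the subject character by character, with a separate end check for each side. The following is only a sketch of that idea under stated assumptions, not the actual match_ref code or a proposed patch: utf8_decode, other_case and caseless_ref_match are hypothetical names, the decoder assumes valid UTF-8 of at most 3 bytes per character, and other_case is a stand-in for a real Unicode case table that knows only the pair from this report.

```c
#include <stddef.h>

/* Hypothetical helper: decode one UTF-8 character (1 to 3 bytes, input
   assumed valid) from p into *cp. Returns the number of bytes consumed. */
static size_t utf8_decode(const unsigned char *p, unsigned int *cp)
{
    if (p[0] < 0x80) {
        *cp = p[0];
        return 1;
    }
    if ((p[0] & 0xE0) == 0xC0) {
        *cp = ((p[0] & 0x1Fu) << 6) | (p[1] & 0x3Fu);
        return 2;
    }
    *cp = ((p[0] & 0x0Fu) << 12) | ((p[1] & 0x3Fu) << 6) | (p[2] & 0x3Fu);
    return 3;
}

/* Stand-in for a Unicode case table: knows only the pair 570 / 11365. */
static unsigned int other_case(unsigned int c)
{
    if (c == 570) return 11365;
    if (c == 11365) return 570;
    return c;
}

/* Caselessly compare the captured text ref[0..ref_len) with the subject
   at subj[0..subj_len). Returns the number of subject bytes consumed,
   or -1 if the subject does not match the whole reference. */
static long caseless_ref_match(const unsigned char *ref, size_t ref_len,
                               const unsigned char *subj, size_t subj_len)
{
    size_t r = 0, s = 0;
    while (r < ref_len) {
        unsigned int rc, sc;
        if (s >= subj_len) return -1;   /* subject ran out of characters */
        r += utf8_decode(ref + r, &rc);
        s += utf8_decode(subj + s, &sc);
        if (rc != sc && rc != other_case(sc)) return -1;
    }
    return (long)s;
}
```

With the 6-byte reference (three times char 570) and the 6 remaining subject bytes (two times char 11365), this comparison fails after the second character because the subject is exhausted while the reference still has one character left. A subject holding three copies of char 11365 (9 bytes) would match and consume 9 bytes.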