http://bugs.exim.org/show_bug.cgi?id=1596
Summary: Character classes in caseless utf8 mode missing elements
Product: PCRE
Version: 8.36
Platform: Other
OS/Version: Linux
Status: NEW
Severity: bug
Priority: medium
Component: Code
AssignedTo: ph10@???
ReportedBy: a.coyte@???
CC: pcre-dev@???
It appears that, in caseless UTF-8 mode, a character class can be missing the
lowercase versions of some ASCII characters.
For example, /[A-`]/i8 does not match the letters a-j:
./pcretest -d
PCRE version 8.36 2014-09-26
re> /[A-`]/i8
------------------------------------------------------------------
0 47 Bra
3 [A-`k-z\x{212a}\x{17f}]
47 47 Ket
50 End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf
No first char
No need char
data> abcdefghijklmno
0: k
data>
I believe this may be due to the code in add_to_class() that extends ranges:
it updates the end value after classbits_end has already been initialised from
the original end value, so the extended part of the range (here, a-j) is never
added to the class bitmap.
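To illustrate the suspected ordering problem, here is a minimal C sketch. It is not PCRE's add_to_class(). The function add_range(), the SETBIT/TESTBIT macros and the recompute_end flag are hypothetical, and the model only folds ASCII A-Z to a-z into a single 256-bit map. It does keep the order of operations described above: the bitmap limit is taken from end before end is extended.

```c
#include <assert.h>

/* Hypothetical, simplified model of the suspected bug. This is NOT PCRE's
   add_to_class(): it handles only ASCII A-Z -> a-z folding and one 256-bit
   map, but it computes the bitmap limit before extending the range. */

#define SETBIT(bits, c)  ((bits)[(c) / 8] |= (unsigned char)(1u << ((c) % 8)))
#define TESTBIT(bits, c) (((bits)[(c) / 8] >> ((c) % 8)) & 1u)

/* Add [start, end] to a 32-byte (256-bit) map, caselessly. If recompute_end
   is nonzero, the bitmap limit is recalculated after any range extension. */
static void add_range(unsigned char bits[32], unsigned start, unsigned end,
                      int recompute_end)
{
  unsigned classbits_end;
  unsigned c;

  assert(start <= end);

  /* Limit for the bitmap loop, taken from the ORIGINAL end value. */
  classbits_end = end <= 0xff ? end : 0xff;

  /* Simplified caseless step: upper-case letters inside [start, end] have
     lower-case partners exactly 32 code points higher. */
  unsigned lo = start < 'A' ? 'A' : start;
  unsigned hi = end > 'Z' ? 'Z' : end;
  if (lo <= hi) {
    unsigned oc = lo + 32, od = hi + 32;      /* other-case range */
    if (oc >= start && od <= end) {
      /* Other-case range already inside [start, end]: nothing to do. */
    } else if (od > end && oc <= end + 1) {
      end = od;                                /* Overlaps or abuts: extend upwards. */
    } else {
      for (c = oc; c <= od; c++)               /* Disjoint: add it separately. */
        SETBIT(bits, c);
    }
  }

  if (recompute_end)
    classbits_end = end <= 0xff ? end : 0xff;  /* Use the extended end value. */

  for (c = start; c <= classbits_end; c++)
    SETBIT(bits, c);
}
```

Calling add_range(bits, 'A', '`', 0) on a zeroed map leaves the bit for 'a' clear, because end is raised to 'z' only after the loop limit was fixed at '`'; passing 1 sets every bit from 'A' to 'z'. A disjoint range such as [A-Z] is unaffected, since its other-case range is added separately. In this ASCII-only model all of a-z is lost, whereas the real dump is missing only a-j, presumably because K and S have extra case partners (U+212A and U+017F, visible in the dump) and are handled separately, so only the A-J to a-j step goes through the "extend upwards" path.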
Versions 10.00 and 10.10RC2 appear to have the same behaviour.