Author: Alex Coyte
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1596] New: Character classes in caseless utf8 mode missing elements
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1596
           Summary: Character classes in caseless utf8 mode missing elements
           Product: PCRE
           Version: 8.36
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: a.coyte@???
                CC: pcre-dev@???



It appears that in caseless UTF-8 mode, a character class can end up missing
the lowercase forms of some ASCII characters.

For example, /[A-`]/i8 does not match any of the lowercase letters a-j; the
first match in the subject string below is k:

./pcretest -d
PCRE version 8.36 2014-09-26

re> /[A-`]/i8

------------------------------------------------------------------
  0  47 Bra
  3     [A-`k-z\x{212a}\x{17f}]
 47  47 Ket
 50     End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf
No first char
No need char

data> abcdefghijklmno

0: k
data>



I believe this is caused by the code in add_to_class() that extends ranges:
it updates the end value only after classbits_end has already been initialised
from the original end value, so the extended part of the range never reaches
the class bitmap.
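
To make the suspected ordering problem concrete, here is a minimal sketch in C. It is not the PCRE source: add_range(), in_class(), the oc/od parameters and the recompute flag are hypothetical names for this illustration, and only the "extend upwards" case is modelled. The caller supplies the other-case range (oc..od) directly, where the real code would look it up.

```c
#include <assert.h>
#include <stdint.h>

/* 256-bit class bitmap helpers (hypothetical, for illustration only). */
#define SETBIT(map, c) ((map)[(c) / 8] |= (uint8_t)(1u << ((c) & 7)))

static int in_class(const uint8_t *bits, uint32_t c)
{
    return (bits[c / 8] >> (c & 7)) & 1u;
}

/* Simplified model of adding [start, end] to a class in caseless mode.
   oc..od is the other-case range of some characters in [start, end].
   recompute == 0 models the suspected bug; recompute != 0 models a fix. */
static void add_range(uint8_t *bits, uint32_t start, uint32_t end,
                      uint32_t oc, uint32_t od, int recompute)
{
    assert(start <= end && end <= 0xff && od <= 0xff);

    /* Captured from the original end value, before any extension. */
    uint32_t classbits_end = (end <= 0xff) ? end : 0xff;

    /* The other-case range overlaps or abuts the top of the original
       range, so the original range is extended upwards. */
    if (od > end && oc <= end + 1)
        end = od;

    /* Possible fix: derive the bitmap limit from the extended end. */
    if (recompute)
        classbits_end = (end <= 0xff) ? end : 0xff;

    /* With a stale classbits_end, the extension is never written. */
    for (uint32_t c = start; c <= classbits_end; c++)
        SETBIT(bits, c);
}
```

Calling add_range() on a zeroed 32-byte bitmap with start 'A', end '`', oc 'a' and od 'j' leaves a-j unset when recompute is 0 and sets them when recompute is 1, which mirrors the missing a-j in the pcretest output above.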

PCRE2 10.00 and 10.10RC2 appear to behave the same way.

