[pcre-dev] [Bug 1894] New: In UTF8 Locale Russian Cyrillic …

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1894] New: In UTF8 Locale Russian Cyrillic [а-я] range contains only 32 of 33 letters
https://bugs.exim.org/show_bug.cgi?id=1894

            Bug ID: 1894
           Summary: In UTF8 Locale Russian Cyrillic [а-я] range contains
                    only 32 of 33 letters
           Product: PCRE
           Version: 8.32
          Hardware: x86-64
                OS: Linux
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
          Assignee: ph10@???
          Reporter: ikonta@???
                CC: pcre-dev@???


Originally found on EL7 (with pcre-8.32); the issue seems to be more widespread.

The modern Russian alphabet contains 33 letters.
The standard UTF-8 range covers the 32 most common letters, but misses one ('ё').

Standard range:
U+0410    А
…
U+044F    я


Exceptions:
U+0401    Ё
U+0451    ё
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=1024


The [а-я] range should include the letter 'ё' (and [А-Я] should include 'Ё'), but it does not.
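
The gap can be seen by printing the code points involved. This is a minimal sketch, assuming PHP 7.2 or later with the mbstring extension (for mb_ord); the helper name codepoint_label is hypothetical and not part of the original report:

```php
<?php
// Hypothetical helper: format a single UTF-8 character as "U+XXXX".
function codepoint_label(string $ch): string
{
    return sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
}

// Endpoints of the two ranges, plus the two letters that fall outside them.
foreach (['А', 'Я', 'а', 'я', 'Ё', 'ё'] as $ch) {
    echo $ch, ' ', codepoint_label($ch), "\n";
}

// In UTF mode a character class range is a range of code points,
// so 'ё' (U+0451) is not between 'а' (U+0430) and 'я' (U+044F).
$inside = mb_ord('ё', 'UTF-8') >= mb_ord('а', 'UTF-8')
       && mb_ord('ё', 'UTF-8') <= mb_ord('я', 'UTF-8');
var_dump($inside); // bool(false)
```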

Forwarded here from the downstream tracker; see
https://bugs.php.net/bug.php?id=73251

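$valid_string_expr = '/^[а-я]+$/u';
$str = "еще";  // assumed first test string (no 'ё'); this assignment is missing from the snippet as posted
var_dump(preg_match($valid_string_expr, $str));  // int(1)
$str = "ещё";  // contains 'ё' (U+0451), outside U+0430..U+044F
var_dump(preg_match($valid_string_expr, $str));  // int(0)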

The second match fails, although it should not.
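
Until the behaviour is settled, one possible workaround is to list 'ё' explicitly next to the contiguous range. This is only a sketch of that idea, not a change to PCRE itself; the variable names $pattern and $results are illustrative:

```php
<?php
// Workaround sketch: add 'ё' to the class, since it lies outside [а-я].
$pattern = '/^[а-яё]+$/u';

// Test both spellings: without and with the letter 'ё'.
$results = [];
foreach (['еще', 'ещё'] as $word) {
    $results[$word] = preg_match($pattern, $word);
}

var_dump($results); // both entries are int(1)
```

The uppercase class would need the same treatment, that is [А-ЯЁ].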
