[pcre-dev] [Bug 1894] In UTF8 Locale Russian Cyrillic [а-я] range contains only 32 of 33 letters

From: admin
Date:
To: pcre-dev
Subject: [pcre-dev] [Bug 1894] In UTF8 Locale Russian Cyrillic [а-я] range contains only 32 of 33 letters

https://bugs.exim.org/show_bug.cgi?id=1894

Petr Pisar <ppisar@???> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ppisar@???

--- Comment #1 from Petr Pisar <ppisar@???> ---
The [а-я] range does not cover all Cyrillic letters. It means the Unicode
characters in the range U+0430 to U+044F. As you correctly noted, ё (U+0451)
lies outside that range, so it should not match:

$ printf '/[а-я]*/8\nещё\n' | pcretest
PCRE version 8.39 2016-06-14

re> data> 0: \x{435}\x{449}
data>
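
The arithmetic behind this result can be checked outside pcretest. The
following is a minimal Python sketch, not part of the original comment. It uses
Python's built-in re module rather than PCRE, on the assumption that both treat
a character class range as a plain span of code points. The names WORD and
in_range are illustrative only.

```python
import re

WORD = "\u0435\u0449\u0451"  # the test string "ещё": U+0435, U+0449, U+0451

# Every code point covered by the class [а-я], i.e. U+0430..U+044F.
in_range = [chr(cp) for cp in range(0x0430, 0x044F + 1)]

print(len(in_range))        # 32
print("\u0451" in in_range)  # False: ё (U+0451) is outside the range

# A greedy match from the start stops before ё, as in the pcretest output.
m = re.match("[\u0430-\u044f]*", WORD)
print([f"U+{ord(c):04X}" for c in m.group()])  # ['U+0435', 'U+0449']
```

The range holds 32 letters because ё was encoded separately at U+0451, after
the contiguous run that ends at я (U+044F).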

If you extend the range up to ё, it will match:

$ printf '/[а-ё]*/8\nещё\n' | pcretest
PCRE version 8.39 2016-06-14

re> data> 0: \x{435}\x{449}\x{451}
data>
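
Extending the upper bound to U+0451 also pulls in whatever lies between U+044F
and U+0451. The Python sketch below, again using the built-in re module as a
stand-in for PCRE, lists the code points the wider range adds. The second
pattern, which keeps the original range and names ё explicitly, is a
hypothetical alternative and not something proposed in the comment above.

```python
import re
import unicodedata

WORD = "\u0435\u0449\u0451"  # the test string "ещё"

# Code points gained by widening U+0430..U+044F to U+0430..U+0451.
added = [chr(cp) for cp in range(0x0450, 0x0451 + 1)]
for ch in added:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The wider range [а-ё] now consumes the whole test string.
wide = re.match("[\u0430-\u0451]*", WORD).group()

# Hypothetical alternative: original range plus ё listed on its own, [а-яё].
explicit = re.match("[\u0430-\u044f\u0451]*", WORD).group()

print(wide == WORD, explicit == WORD)  # True True
```

The wider range therefore accepts U+0450 as well as ё, which may or may not be
wanted depending on the text being matched.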

But there is a better way: instead of a Unicode range, you can use a Unicode
script name. This is because Unicode ranges sometimes contain characters from
foreign scripts or unassigned code points:

$ printf '/\p{Cyrillic}*/8\nещё\n' | pcretest
PCRE version 8.39 2016-06-14

re> data> 0: \x{435}\x{449}\x{451}
data>
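
Python's built-in re module has no \p{...} support, so the script test cannot
be reproduced there directly. The sketch below is only a rough approximation:
it assumes that a Unicode character name beginning with "CYRILLIC" is a usable
stand-in for the Cyrillic script property, which is not exactly the same thing.
The helper looks_cyrillic is hypothetical and exists only for this
illustration.

```python
import unicodedata

WORD = "\u0435\u0449\u0451"  # the test string "ещё"


def looks_cyrillic(ch: str) -> bool:
    # Approximation of the script property: inspect the character's name.
    # Unnamed or unassigned code points yield "" and are rejected.
    return unicodedata.name(ch, "").startswith("CYRILLIC")


# Membership by name prefix versus membership in the range U+0430..U+044F.
by_name = [looks_cyrillic(c) for c in WORD]
by_range = [0x0430 <= ord(c) <= 0x044F for c in WORD]

print(by_name)   # [True, True, True]
print(by_range)  # [True, True, False]
```

The property-style test accepts ё without any knowledge of where it sits in
the code chart, which is the advantage the script name has over a range.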

--
You are receiving this mail because:
You are on the CC list for the bug.
