http://bugs.exim.org/show_bug.cgi?id=976
Summary: pcre_exec after pcre_study works incorrectly.
Product: PCRE
Version: N/A
Platform: Other
OS/Version: Windows
Status: NEW
Severity: bug
Priority: medium
Component: Code
AssignedTo: ph10@???
ReportedBy: pretender@???
CC: pcre-dev@???
For 17 letters of the Cyrillic block, encoded in UTF-8, pcre_exec returns a
different result when it is given the data from pcre_study than when it is
not, if the options PCRE_UTF8 and PCRE_CASELESS are used. A simple code
example that demonstrates the problem:
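#include <pcre.h>
#include <cstring>
#include <iostream>

// Compile pattern with PCRE_UTF8 | PCRE_CASELESS, optionally study it, and
// return the result of matching it against str with pcre_exec().
int pcre_test(const char* pattern, const char* str, bool study)
{
    const char* err = NULL;
    int erroffset = 0;
    const int options = PCRE_UTF8 | PCRE_CASELESS;
    pcre* regexp = pcre_compile(pattern, options, &err, &erroffset, NULL);
    if (regexp == NULL)
    {
        std::cerr << "pcre_compile failed at offset " << erroffset
                  << ": " << err << "\n";
        return -1000;  // sentinel for a compile failure
    }
    pcre_extra* extra = study ? pcre_study(regexp, 0, &err) : NULL;
    int vec[30] = {0};  // the ovector size should be a multiple of 3
    const int res = pcre_exec(regexp, extra, str, (int)strlen(str), 0, 0,
                              vec, sizeof(vec) / sizeof(vec[0]));
    pcre_free(regexp);
    pcre_free(extra);  // PCRE 8.20 and later also provide pcre_free_study()
    return res;
}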
int main()
{
    // Uppercase: Ё Р С Т У Ф Х Ц Ч Ш Щ Ь Ы Ъ Э Ю Я
    const char* upc8[] = {
        "\xd0\x81", "\xd0\xa0", "\xd0\xa1", "\xd0\xa2", "\xd0\xa3", "\xd0\xa4",
        "\xd0\xa5", "\xd0\xa6", "\xd0\xa7", "\xd0\xa8", "\xd0\xa9", "\xd0\xac",
        "\xd0\xab", "\xd0\xaa", "\xd0\xad", "\xd0\xae", "\xd0\xaf" };
    // Lowercase counterparts, in the same order
    const char* lowc8[] = {
        "\xd1\x91", "\xd1\x80", "\xd1\x81", "\xd1\x82", "\xd1\x83", "\xd1\x84",
        "\xd1\x85", "\xd1\x86", "\xd1\x87", "\xd1\x88", "\xd1\x89", "\xd1\x8c",
        "\xd1\x8b", "\xd1\x8a", "\xd1\x8d", "\xd1\x8e", "\xd1\x8f" };
    for (size_t ii = 0; ii < sizeof(upc8) / sizeof(upc8[0]); ++ii)
    {
        // Match the lowercase letter caselessly against its uppercase form,
        // with and without study data; the two results should be equal.
        if (pcre_test(lowc8[ii], upc8[ii], true) !=
            pcre_test(lowc8[ii], upc8[ii], false))
        {
            std::cout << "!?\n";
        }
    }
    return 0;
}
"!?" is printed 17 times: the studied and unstudied results differ for every
one of the 17 letter pairs.
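In the arrays above, every uppercase string starts with the byte 0xD0 and every lowercase string with 0xD1. The self-contained sketch below (plain C++, no PCRE calls; the helper names utf8_encode2, russian_uppercase, russian_to_lower and lead_byte_mismatches are hypothetical) checks that these 17 letters are exactly the letters of the Russian alphabet whose two case forms begin with different UTF-8 lead bytes; the remaining 16 letters, А through П, use 0xD0 in both cases. This only shows a pattern in the data. It suggests, but does not establish, that the starting-byte information built by pcre_study is where the studied and unstudied paths diverge.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Encode a code point in U+0080..U+07FF as two UTF-8 bytes (110xxxxx 10xxxxxx).
std::string utf8_encode2(unsigned cp) {
    assert(cp >= 0x80 && cp <= 0x7FF);
    std::string out;
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
    return out;
}

// Russian uppercase letters: U+0401 (Yo) plus U+0410..U+042F (A..Ya), 33 in all.
std::vector<unsigned> russian_uppercase() {
    std::vector<unsigned> letters{0x0401};
    for (unsigned cp = 0x0410; cp <= 0x042F; ++cp) letters.push_back(cp);
    return letters;
}

// Simple case mapping valid only for the letters above (an assumption of this
// sketch): U+0401 -> U+0451, and U+0410..U+042F -> U+0430..U+044F.
unsigned russian_to_lower(unsigned upper) {
    return upper == 0x0401 ? 0x0451 : upper + 0x20;
}

// Uppercase letters whose lowercase form starts with a different UTF-8 lead byte.
std::vector<unsigned> lead_byte_mismatches() {
    std::vector<unsigned> result;
    for (unsigned upper : russian_uppercase()) {
        std::string u = utf8_encode2(upper);
        std::string l = utf8_encode2(russian_to_lower(upper));
        if (u[0] != l[0]) result.push_back(upper);
    }
    return result;
}
```

For example, lead_byte_mismatches() yields U+0401 followed by U+0420 through U+042F, which are the 17 uppercase letters listed in upc8.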