[pcre-dev] [Bug 976] New: pcre_exec after pcre_study work in…

Author: Max Lukashenya
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 976] New: pcre_exec after pcre_study work incorrectly.

http://bugs.exim.org/show_bug.cgi?id=976
           Summary: pcre_exec after pcre_study work incorrectly.
           Product: PCRE
           Version: N/A
          Platform: Other
        OS/Version: Windows
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: pretender@???
                CC: pcre-dev@???



For 17 letters of the Cyrillic block encoded in UTF-8, pcre_exec returns a
different result after pcre_study than without it when the options PCRE_UTF8
and PCRE_CASELESS are used. A simple code example that demonstrates the problem:

#include <pcre.h>
#include <iostream>

int pcre_test(const char* pattern, const char* str, bool study)
{
       const char* err = NULL;
       int num = 0;
       const int options = PCRE_UTF8 | PCRE_CASELESS;


       pcre* regexp = pcre_compile(pattern, options, &err, &num, NULL);
       pcre_extra* extra = study ? pcre_study(regexp, 0, &err) : NULL;


       int vec[256] = {0};
       // Subject length is 2: every test subject is one two-byte UTF-8 character.
       const int res = pcre_exec(regexp, extra, str, 2, 0, 0, vec,
                                 sizeof(vec)/sizeof(int));


       pcre_free(regexp);
       pcre_free(extra);


       return res;
}


int main()
{
       const char* upc8[]  = {"\xd0\x81", "\xd0\xa0", "\xd0\xa1", "\xd0\xa2", "\xd0\xa3",
                              "\xd0\xa4", "\xd0\xa5", "\xd0\xa6", "\xd0\xa7", "\xd0\xa8",
                              "\xd0\xa9", "\xd0\xac", "\xd0\xab", "\xd0\xaa", "\xd0\xad",
                              "\xd0\xae", "\xd0\xaf" };


       const char* lowc8[] = {"\xd1\x91", "\xd1\x80", "\xd1\x81", "\xd1\x82", "\xd1\x83",
                              "\xd1\x84", "\xd1\x85", "\xd1\x86", "\xd1\x87", "\xd1\x88",
                              "\xd1\x89", "\xd1\x8c", "\xd1\x8b", "\xd1\x8a", "\xd1\x8d",
                              "\xd1\x8e", "\xd1\x8f" };


       for (size_t ii = 0; ii < sizeof(upc8)/sizeof(upc8[0]); ++ii)
       {
               if (pcre_test(lowc8[ii], upc8[ii], true) !=
                   pcre_test(lowc8[ii], upc8[ii], false))
               {
                       std::cout << "!?\n";
               }
       }


       return 0;
}


"!?" is printed 17 times, once for each letter pair.
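
For reference, the byte pairs in the test program can be decoded to see which letters are involved. The sketch below is not part of PCRE and does not call it; decode2 and lead_bytes_differ are hypothetical helper names introduced only for illustration. It shows that each upper-case letter in the list starts with byte 0xD0 while its lower-case form starts with 0xD1. That is offered as an observation about the test data, not as a confirmed diagnosis of the bug.

```cpp
#include <cstdio>

// Decode a two-byte UTF-8 sequence (lead byte 0xC0..0xDF) into a code point.
// Assumes well-formed input; no validation is performed.
unsigned decode2(const char* s)
{
    const unsigned char b0 = static_cast<unsigned char>(s[0]);
    const unsigned char b1 = static_cast<unsigned char>(s[1]);
    return ((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu);
}

// True when the two encoded characters start with different bytes.
bool lead_bytes_differ(const char* a, const char* b)
{
    return static_cast<unsigned char>(a[0]) != static_cast<unsigned char>(b[0]);
}

// Print the code points and lead bytes of one upper/lower pair.
void describe_pair(const char* upper, const char* lower)
{
    std::printf("U+%04X / U+%04X  lead bytes %02X / %02X%s\n",
                decode2(upper), decode2(lower),
                static_cast<unsigned>(static_cast<unsigned char>(upper[0])),
                static_cast<unsigned>(static_cast<unsigned char>(lower[0])),
                lead_bytes_differ(upper, lower) ? "  (differ)" : "");
}
```

Calling describe_pair(upc8[ii], lowc8[ii]) inside the loop of the test program lists the affected pairs: U+0401/U+0451 and the range U+0420..U+042F with U+0440..U+044F. For comparison, U+0410 and U+0430 ("\xd0\x90" and "\xd0\xb0") share the lead byte 0xD0 and are not in the list above.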

