[pcre-dev] [Bug 1288] New: Performance issue using C++ inter…

Author: Tóth Tamás
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1288] New: Performance issue using C++ interface with UTF-8 enabled
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1288
           Summary: Performance issue using C++ interface with UTF-8 enabled
           Product: PCRE
           Version: N/A
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: bug
          Priority: high
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: toth.tamas@???
                CC: pcre-dev@???



In the RE::TryMatch method at pcrecpp.cc line 524 (in version 8.31), the exec
options default to 0, and later only PCRE_ANCHORED and PCRE_NOTEMPTY can be
added. As a result, there is no way to set the PCRE_NO_UTF8_CHECK flag in the
exec options, so the subject is validated as UTF-8 on every match call, which
can cause serious performance issues when processing large files.

My suggested patch is to change the line

    int options = 0;


to

    int options = (options_.all_options() & PCRE_NO_UTF8_CHECK);


This way no interface modification is required, and UTF-8 checking is
automatically skipped if the user has set PCRE_NO_UTF8_CHECK in the RE class.
Not setting it in RE causes only a minimal performance loss during regexp
parsing and keeps exec safe.
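
The effect of the mask can be sketched in isolation. The snippet below is only
an illustration, not the pcrecpp source: the k* constants are stand-ins for the
PCRE_* macros (real code would include <pcre.h>), and InitialExecOptions,
ExecOptions and DemoMasking are hypothetical helper names. It assumes the
per-call flags are OR-ed in after the initial value, as described above.

```cpp
#include <cassert>

// Stand-in flag values for illustration only; real code would include
// <pcre.h> and use the PCRE_* macros instead of these constants.
constexpr int kAnchored    = 0x0010;  // stands in for PCRE_ANCHORED
constexpr int kNotEmpty    = 0x0400;  // stands in for PCRE_NOTEMPTY
constexpr int kUtf8        = 0x0800;  // stands in for PCRE_UTF8
constexpr int kNoUtf8Check = 0x2000;  // stands in for PCRE_NO_UTF8_CHECK

// Hypothetical helper mirroring the patched initialisation: keep only the
// no-check bit from the options the RE object was constructed with.
constexpr int InitialExecOptions(int re_options) {
  return re_options & kNoUtf8Check;
}

// Hypothetical helper: the per-call flags are OR-ed in on top of the
// initial value (assumed to match what TryMatch does after the changed line).
constexpr int ExecOptions(int re_options, bool anchored, bool not_empty) {
  return InitialExecOptions(re_options) |
         (anchored ? kAnchored : 0) |
         (not_empty ? kNotEmpty : 0);
}

// Walks through the two cases described in the report.
void DemoMasking() {
  // User did not ask to skip the check: exec options start at 0, as before.
  assert(InitialExecOptions(kUtf8) == 0);
  // User set the no-check flag in RE: it is carried over, and nothing else is.
  assert(InitialExecOptions(kUtf8 | kNoUtf8Check) == kNoUtf8Check);
  // Per-call flags still combine with the carried-over bit.
  assert(ExecOptions(kUtf8 | kNoUtf8Check, true, true) ==
         (kNoUtf8Check | kAnchored | kNotEmpty));
}
```

Because the mask keeps a single bit, no other compile-time option leaks into
the exec call, and callers that never set the flag see unchanged behaviour.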


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email