[pcre-dev] [Bug 2793] Case insensitive search gets exponenti…

Top Page
Delete this message
Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2793] Case insensitive search gets exponentially slower with larger buffers and a specific text file
https://bugs.exim.org/show_bug.cgi?id=2793

--- Comment #6 from Thomas Tempelmann <tempelmann@???> ---
So, here's one way to fix this:

1.

Change

BOOL memchr_not_found_first_cu;
BOOL memchr_not_found_first_cu2;

into

PCRE2_SPTR memchr_found_first_cu;
PCRE2_SPTR memchr_found_first_cu2;

2.

Change

memchr_not_found_first_cu = FALSE;
memchr_not_found_first_cu2 = FALSE;

into

memchr_found_first_cu = NULL;
memchr_found_first_cu2 = NULL;

3.

Change

          if (!memchr_not_found_first_cu)
            {
            pp1 = memchr(start_match, first_cu, end_subject - start_match);
            if (pp1 == NULL) memchr_not_found_first_cu = TRUE;
              else cu2size = pp1 - start_match;
            }


          /* If pp1 is not NULL, we have arranged to search only as far as pp1,
          to see if the other case is earlier, so we can set "not found" only
          when both searches have returned NULL. */


          if (!memchr_not_found_first_cu2)
            {
            pp2 = memchr(start_match, first_cu2, cu2size);
            memchr_not_found_first_cu2 = (pp2 == NULL && pp1 == NULL);
            }


into

          if (start_match <= memchr_found_first_cu) {
            pp1 = memchr_found_first_cu;
            if (pp1 == end_subject) {
              pp1 = NULL;
            }
          } else {
            pp1 = memchr(start_match, first_cu, cu2size);
            if (pp1 == NULL) {
                memchr_found_first_cu = end_subject;
            } else {
                memchr_found_first_cu = pp1;
            }
          }


          /* If pp1 is not NULL, we have arranged to search only as far as pp1,
          to see if the other case is earlier, so we can set "not found" only
          when both searches have returned NULL. */


          if (start_match <= memchr_found_first_cu2) {
            pp2 = memchr_found_first_cu2;
            if (pp2 == end_subject) {
              pp2 = NULL;
            }
          } else {
            pp2 = memchr(start_match, first_cu2, cu2size);
            if (pp2 == NULL) {
              memchr_found_first_cu2 = end_subject;
            } else {
              memchr_found_first_cu2 = pp2;
            }
          }


This means two changes to the algorithm:

1. Instead of using a flag to tell whether memchr() found something, it'll now
store the last found position and re-use that as long as the start_match
pointer is still behind.

2. The size parameter for the second memchr() is not getting reduced any more
(formerly, if the first found a location, the second one would be limited to
search until there). Since we now cache each location, there's no need to
shorten the searches any more.

--
You are receiving this mail because:
You are on the CC list for the bug.