[pcre-dev] [Bug 2430] Severe performance decrease in (8-bit)…

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2430] Severe performance decrease in (8-bit) case-insensitive mode
https://bugs.exim.org/show_bug.cgi?id=2430

--- Comment #1 from Philip Hazel <ph10@???> ---
You missed the leading [ in "aA][bB][cC]" but I assume that's just a typo in
your posting, as it is present in your example. Investigating your patterns
shows that this effect is caused by an optimization that happens in one case
but not the other. Actually, it's an optimization that turns into a
pessimization. For (?i)abc PCRE2 records that a match must start with "a" and
that there must be a "c" later in the subject. For [aA][bB][cC] it records
only that a match must start with "A" or "a". It seems that searching for "c"
(which may be a long way after each "a") is taking up lots of time. (Note,
however, that if you use JIT, the problem doesn't occur.) I will take a look
at this. It occurs to me that searching for a "last fixed character" is a bit
pointless unless there is something variable between it and the first
character. Also, the search should perhaps look only a limited distance past
the initial character.
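
To make the cost concrete, here is a rough toy model in Python. It is not
PCRE2's actual code, and it deliberately ignores any caching the real matcher
may do. The function name count_steps and the test subject are made up for
illustration. It counts character inspections when every candidate "a"
triggers a forward scan for a later "c", compared with just attempting the
match directly.

```python
def count_steps(subject: str, use_required_char: bool) -> int:
    """Count character inspections while searching caselessly for 'abc'.

    Toy model only: a hypothetical stand-in for the start-of-match
    optimizations described above, not PCRE2's implementation.
    """
    pattern = "abc"
    text = subject.lower()  # crude stand-in for caseless comparison
    steps = 0
    for start in range(len(text)):
        steps += 1  # inspect the candidate first character
        if text[start] != "a":
            continue
        if use_required_char:
            # Scan ahead for the required later character "c".
            found = False
            for pos in range(start + 1, len(text)):
                steps += 1
                if text[pos] == "c":
                    found = True
                    break
            if not found:
                break  # no "c" ahead, so no later start can match either
        # Attempt the actual match of the remaining pattern characters.
        matched = True
        for offset in range(1, len(pattern)):
            if start + offset >= len(text):
                matched = False
                break
            steps += 1
            if text[start + offset] != pattern[offset]:
                matched = False
                break
        if matched:
            break
    return steps


# Assumed worst-case subject: many "a"s, with the only "c" at the very end.
subject = "a" * 1000 + "c"
without_check = count_steps(subject, use_required_char=False)
with_check = count_steps(subject, use_required_char=True)
print(without_check, with_check)
```

On a subject like this the direct attempt fails after one extra comparison per
start position, while the version with the look-ahead walks almost the whole
remaining subject for every "a", so its cost grows roughly quadratically.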

If you turn off the start-of-match optimizations with
PCRE2_NO_START_OPTIMIZE, the two patterns behave much the same.

--
You are receiving this mail because:
You are on the CC list for the bug.