[pcre-dev] [Bug 2430] Severe performance decrease in (8-bit)…

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2430] Severe performance decrease in (8-bit) case-insensitive mode
https://bugs.exim.org/show_bug.cgi?id=2430

--- Comment #2 from Andreas Bergmann <andreas.bergmann@???> ---
(In reply to Philip Hazel from comment #1)
> You missed the leading [ in "aA][bB][cC]" but I assume that's just a typo in
> your posting, as it is present in your example. Investigating your patterns
> shows that this is an effect caused by an optimization that happens in one
> case, but not the other. Actually, it's an optimization that turns into a
> pessimization. For (?i)abc PCRE2 records that a match must start with "a"
> and there must be a "c" later in the source. For [aA][bB][cC] it records
> only that a match must start with "A" or "a". It seems that searching for
> "c" (which may be a long way after each "a") is taking up lots of time.
> (Note, however, that if you use JIT, the problem doesn't occur.) I will take
> a look at this - it occurs to me that searching for a "last fixed character"
> is a bit pointless unless there is something variable between it and the
> first character. Also, the search should perhaps only search so far after
> the initial character.
>
> If you turn off the optimizations with NO_START_OPTIMIZE the two patterns
> behave much the same.


Thank you for your feedback - and true, the missing "[" is a typo.

The reasoning makes perfect sense and I'll give the NO_START_OPTIMIZE option a
try.
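
To make the quoted reasoning concrete for myself, here is a toy cost model in Python. It is only a sketch under stated assumptions: it is not PCRE2's actual code, the function names scan_first_only and scan_first_and_last are made up for illustration, and it assumes the most naive behaviour, namely that the search for the later required character restarts after every candidate first character. It counts character comparisons for a subject with many "a" characters and a single "c" far away.

```python
def scan_first_only(subject: str, first: str) -> int:
    """Toy model: count comparisons when only the first character is pre-checked."""
    first = first.lower()
    comparisons = 0
    for ch in subject:
        comparisons += 1  # one look at each potential start position
        if ch.lower() == first:
            pass  # a real matcher would attempt the full match here
    return comparisons


def scan_first_and_last(subject: str, first: str, last: str) -> int:
    """Toy model: for every candidate start, also search forward for the
    required later character (assumed to restart each time)."""
    first, last = first.lower(), last.lower()
    comparisons = 0
    for i, ch in enumerate(subject):
        comparisons += 1
        if ch.lower() != first:
            continue
        # Naive forward search for the required later character.
        for later in subject[i + 1:]:
            comparisons += 1
            if later.lower() == last:
                break
    return comparisons


# Hypothetical worst case: many candidate starts, the "c" a long way off.
subject = "a" * 1000 + "c"
cheap = scan_first_only(subject, "a")
costly = scan_first_and_last(subject, "a", "c")
print(cheap, costly)
```

In this model the extra search makes the work grow roughly with the square of the subject length, which is the kind of pessimization described above. It says nothing about how large the effect is in the real library.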
