[pcre-dev] [Bug 2430] Severe performance decrease in (8-bit)…

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2430] Severe performance decrease in (8-bit) case-insensitive mode
https://bugs.exim.org/show_bug.cgi?id=2430

--- Comment #3 from Andreas Bergmann <andreas.bergmann@???> ---
(In reply to Andreas Bergmann from comment #2)
> (In reply to Philip Hazel from comment #1)
> > You missed the leading [ in "aA][bB][cC]" but I assume that's just a typo in
> > your posting, as it is present in your example. Investigating your patterns
> > shows that this is an effect caused by an optimization that happens in one
> > case, but not the other. Actually, it's an optimization that turns into a
> > pessimization. For (?i)abc PCRE2 records that a match must start with "a"
> > and there must be a "c" later in the source. For [aA][bB][cC] it records
> > only that a match must start with "A" or "a". It seems that searching for
> > "c" (which may be a long way after each "a") is taking up lots of time.
> > (Note, however, that if you use JIT, the problem doesn't occur.) I will take
> > a look at this - it occurs to me that searching for a "last fixed character"
> > is a bit pointless unless there is something variable between it and the
> > first character. Also, the search should perhaps only search so far after
> > the initial character.
> >
> > If you turn off the optimizations with NO_START_OPTIMIZE the two patterns
> > behave much the same.
>
> Thank you for your feedback - and true, the missing "[" is a typo.
>
> The reasoning makes perfect sense and I'll give the NO_START_OPTIMIZE option
> a try.



FYI / for the record:

 1048576 bytes    0.043428 sec   (?i)abc
 1048576 bytes    0.000130 sec   (a|A)(b|B)(c|C)
 1048576 bytes    0.000094 sec   [aA][bB][cC]


PCRE2_NO_START_OPTIMIZE:

 1048576 bytes    0.002174 sec   (?i)abc
 1048576 bytes    0.004067 sec   (a|A)(b|B)(c|C)
 1048576 bytes    0.002128 sec   [aA][bB][cC]
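
The effect described in comment #1 (looking for a required "c" a long way
after each candidate "a") can be illustrated with a small stand-alone model.
This is only a sketch and is not PCRE2 code: the names scan_result,
scan_first_only, scan_first_and_last and make_subject are hypothetical, the
"pattern" is hard-wired to a caseless "abc", and the model counts inspected
bytes instead of measuring time. It also assumes the forward search restarts
at every candidate, which the real matcher may or may not do.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical result type: bytes looked at, and match offset (-1 if none). */
typedef struct {
    size_t inspected;
    long match_at;
} scan_result;

static int eq_nocase(char x, char lower)
{
    return tolower((unsigned char)x) == lower;
}

/* Model 1: only "a match must start with a/A" is recorded. */
static scan_result scan_first_only(const char *s, size_t n)
{
    scan_result r = {0, -1};
    for (size_t i = 0; i + 3 <= n; i++) {
        r.inspected++;                      /* look at the candidate start */
        if (!eq_nocase(s[i], 'a'))
            continue;
        r.inspected += 2;                   /* charge both following bytes */
        if (eq_nocase(s[i + 1], 'b') && eq_nocase(s[i + 2], 'c')) {
            r.match_at = (long)i;
            return r;
        }
    }
    return r;
}

/* Model 2: additionally require a c/C somewhere later, and search for it
   from scratch after every candidate start (the assumed pessimization). */
static scan_result scan_first_and_last(const char *s, size_t n)
{
    scan_result r = {0, -1};
    for (size_t i = 0; i + 3 <= n; i++) {
        r.inspected++;
        if (!eq_nocase(s[i], 'a'))
            continue;
        size_t j = i + 1;
        while (j < n) {                     /* unbounded forward search */
            r.inspected++;
            if (eq_nocase(s[j], 'c'))
                break;
            j++;
        }
        if (j >= n)
            return r;                       /* no c/C left: cannot match */
        r.inspected += 2;
        if (eq_nocase(s[i + 1], 'b') && eq_nocase(s[i + 2], 'c')) {
            r.match_at = (long)i;
            return r;
        }
    }
    return r;
}

/* Worst-case style subject: "axax...ax" with a single 'c' as the last byte,
   so every 'a' has its nearest 'c' far away and "abc" never occurs. */
static char *make_subject(size_t n)
{
    assert(n >= 3);
    char *s = malloc(n);
    if (s == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        s[i] = (i % 2 == 0) ? 'a' : 'x';
    s[n - 1] = 'c';
    return s;
}
```

On a 4096-byte subject from make_subject, both models report no match, but
the second inspects hundreds of times more bytes than the first. Bounding the
forward search to a short window after the initial character, as suggested in
comment #1, would remove most of that extra work in this model.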

