Re: [pcre-dev] JIT regression

Author: ph10
Date:  
To: Zoltán Herczeg
CC: Pcre-dev@exim.org, ND
Subject: Re: [pcre-dev] JIT regression
On Mon, 27 May 2019, Zoltán Herczeg wrote:

> that is strategical difference. You don't know the input from the
> pattern, and your input has no a-d characters. The interpreter only
> searches 'a', while jit searches two characters: 'a' and 'd' which
> distance is two. The latter is more complicated, but works better for
> random input. You can see the difference here:


The interpreter searches for 'a' using memchr(); if it finds 'a', it then
does a second search for 'd' (again using memchr()) before running the
match. If you set no_start_optimize to disable these optimizations,
there is a huge penalty.
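
To make the two-step search concrete, here is a rough sketch in C. It is
not the actual PCRE2 source: the function name prefilter() and its
parameters are made up for illustration, and it assumes an 8-bit subject
with a known first character and a known later required character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not PCRE2 code. Returns the offset of the first
   occurrence of 'first' if 'required' also occurs somewhere after it,
   or -1 if a match attempt cannot succeed. */
static ptrdiff_t prefilter(const char *subject, size_t length,
                           char first, char required)
{
    const char *end = subject + length;

    /* Step 1: find the first character of the pattern. */
    const char *start = memchr(subject, first, length);
    if (start == NULL) return -1;

    /* Step 2: the required character must occur later in the subject. */
    if (memchr(start + 1, required, (size_t)(end - (start + 1))) == NULL)
        return -1;

    return start - subject;
}
```

For /abcd/ the call would be prefilter(subject, length, 'a', 'd'); a
result of -1 means the matching code never has to run at all.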

> ./pcre2test -tm
> PCRE2 version 10.34-RC1 2019-04-22
>   re> /abcd/
> data> \[012345678a]{2000}
> Match time 0.1659 milliseconds
> No match
> data>
>   re> /abcd/jit
> data> \[012345678a]{2000}
> Match time 0.0027 milliseconds
> No match


Thanks for posting that example. I've just noticed that an improvement
may be possible in the interpreter - the search for 'd' happens only if
the subject is quite short, because searching very long strings takes
time - or at least it does when memchr() is not used. memchr() cannot be
used for 16-bit and 32-bit strings, and originally it was not used for
8-bit strings either. I will do some experiments to see if that
restriction can be lifted for 8-bit strings (and if doing so improves
performance).
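
As an illustration of the length restriction, a sketch for the 16-bit
case might look like the following. This is not the PCRE2 source: the
name required_unit_ok16() and the REQ_SEARCH_MAX threshold (and its value)
are invented here, and the plain loop stands in for whatever the
library really does when memchr() is unavailable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit: longer subjects skip the required-unit check. */
#define REQ_SEARCH_MAX 1000

/* Hypothetical helper, not PCRE2 code. Returns 1 if a match attempt
   should go ahead, 0 if it can be skipped because 'required' is absent. */
static int required_unit_ok16(const uint16_t *p, const uint16_t *end,
                              uint16_t required)
{
    /* Subject too long: a linear scan may cost more than it saves. */
    if ((size_t)(end - p) > REQ_SEARCH_MAX) return 1;

    /* memchr() works on bytes only, so scan code unit by code unit. */
    for (; p < end; p++)
        if (*p == required) return 1;

    return 0;
}
```

With memchr() on 8-bit subjects the scan is cheap enough that such a
limit might be unnecessary, which is what the experiments are meant to
find out.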

Philip

--
Philip Hazel