Re: [pcre-dev] Strangely long matching times. Could anyone h…

Top Page
Delete this message
Author: Philip Hazel
Date:  
To: Ralf Junker
CC: pcre-dev@exim.org
Subject: Re: [pcre-dev] Strangely long matching times. Could anyone help to explain?
Atomic grouping prevents backtracking into a successfully matched group
when there is a later match failure. This situation never happens in this
case. The group never matches. It will go on trying with more and more
characters matched by .*? until it hits the end of the subject. Then the
whole match fails; the starting point is moved on by 1 and it all happens
again. Try testing with pcre2test's -ac option (and a much shorter subject)
to get an idea of what is going on.
Regards,
Philip


On Sat, 28 Nov 2020 at 09:33, Ralf Junker <ralfjunker@???> wrote:

> Thank you, Philip, for the explanation. no_start_optimize certainly
> makes sense.
>
> What I still do not understand is why atomic grouping does not speed up
> the search? Theses two patterns are equally fast/slow:
>
> # Failure with extra letter "a" is very slow (> 6000 ms).
>
> /aa.*?bba/
> \[aa                                 bb ]{200}

>
> # Atomic grouping does not help (each > 6000 ms).
>
> /aa(?>.*?bba)/
> \[aa                                 bb ]{200}

>
> Shouldn't the atomic group prevent backtracking and run much faster?
>
> Ralf
>
> On 20.11.2020 17:05, Philip Hazel wrote:
>
> > The explanation is that the fast ones benefit from a start-up
> > optimization. For example, with the added x it knows there must be an
> > x in the subject and it does a preliminary check. Same for y. Not the
> > same for an added a. If you run with no_start_optimize the fast ones
> > will be slow and the slowness in all cases is because it is checking
> > a rather large search tree.
>
> --
> ## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
>