Re: [pcre-dev] Detecting starting code units

Author: ph10
Date:  
To: ND
CC: Pcre-dev
Subject: Re: [pcre-dev] Detecting starting code units
On Sat, 27 Jul 2019, ND via Pcre-dev wrote:

> It seems last code unit "c" is not detected and so start optimization don't
> work:
>
>
> PCRE2 version 10.34-RC1 2019-04-22
> /\Aabc/info,auto_callout
> Capture group count = 0
> Max lookbehind = 1
> Compile options: auto_callout
> Overall options: anchored auto_callout
> First code unit = 'a'
> Subject length lower bound = 3
> abx
> --->abx
> +0 ^       \A
> +2 ^       a
> +3 ^^      b
> +4 ^ ^     c
> No match


For an anchored pattern, the "must be present" code unit value (shown by
pcre2test as "Last code unit") is set only if it follows a variable-length
item in the pattern.

This is a judgement call: skipping the search will probably be faster in
most cases, and it avoids the really bad case. Suppose that, instead of
"abx", you have

abxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...very long string without "c"

It can take a lot of time to search a long string for a code unit that is
not there (and in fact, for that reason, there is a limit to how much is
searched, even when there is a value to search for). It is much quicker to
fail the match after just checking the first three characters.
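
To make the trade-off concrete, here is a rough sketch in C. It is not
PCRE2 code: match_direct() and match_with_prescan() are hypothetical helpers
written only for this illustration, hard-wired to the pattern \Aabc, and
they count how many code units of the subject each strategy looks at. The
sketch also leaves out the search limit mentioned above, so the prescan
figure is the worst case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration only: these helpers are not part of PCRE2.
   Both decide whether the anchored pattern \Aabc matches, and both report
   in *examined how many code units of the subject they looked at. */

/* Strategy 1: just run the anchored match; it stops at the first mismatch. */
static int match_direct(const char *subject, size_t len, size_t *examined)
{
    static const char literal[] = "abc";
    size_t i;

    assert(subject != NULL && examined != NULL);
    *examined = 0;
    for (i = 0; i < 3; i++) {
        if (i >= len) return 0;          /* subject too short */
        (*examined)++;
        if (subject[i] != literal[i]) return 0;
    }
    return 1;
}

/* Strategy 2: first search the whole subject for the required code unit
   'c', and only then run the anchored match. No search limit is modelled. */
static int match_with_prescan(const char *subject, size_t len, size_t *examined)
{
    const char *hit;
    size_t direct = 0;
    int rc;

    assert(subject != NULL && examined != NULL);
    hit = (const char *)memchr(subject, 'c', len);
    if (hit == NULL) {
        *examined = len;                 /* scanned everything, found no 'c' */
        return 0;
    }
    *examined = (size_t)(hit - subject) + 1;
    rc = match_direct(subject, len, &direct);
    *examined += direct;
    return rc;
}
```

For the subject "abx" both strategies look at three code units, so nothing
is lost. For "ab" followed by a long run of "x", match_direct() still stops
after three, while match_with_prescan() walks the entire subject before
giving up.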

Philip

--
Philip Hazel