Re: [pcre-dev] Which limit is hit?

Top Page

Reply to this message
Author: Zoltán Herczeg
Date:  
To: Jean-Christophe Deschamps
CC: pcre-dev
Subject: Re: [pcre-dev] Which limit is hit?
The (?:abc){1234} is a simple case, but sometimes you need to track the status of the subpattern. E.g. when you match /(?:aa|a){6}/ to aaaaaa (this is a badly written, but possible pattern requires exponential matching time). The engine greedily matches aa at the beginning, but eventually it needs to try to second alternative, since there are not enough 'a'-s in the input. The interpreter tracks this on the stack, where each iteration has its own stack frame. These frames only track the subpattern, they are not aware of the subpattern index. Tracking the subpattern index would require dynamic memory allocation, which is not preferred in PCRE, since memory allocation is slow.

Regards,
Zoltan

Jean-Christophe Deschamps <jch.deschamps@???> írta:
>
> Zoltan,
> At 07:17 26/01/2015, you wrote:
>
>     the pattern is always compiled to byte code first, and JIT converts
>     it back, so using JIT alone does not help.

>
> Ah, I didn't knew that point.
>
>      The reason of not using an iterator in the interpreter is
>     practical: PCRE interpreter uses stack recursion, and you cannot
>     easily share variable data across function calls. This is not a
>     problem for single character iterators, but matching brackets would
>     require inspecting the machine stack. Finding the previous call of
>     an iterator on the stack chain and getting local data from it is
>     difficult (in C at least). Instead the byte code of a subpattern is
>     repeated so there is no need for tracking the iterator count.

>
> I don't want to abuse your time and patience but I'd love to
> understand the whole picture.
> In the case of a fixed repetition factor, are there cases which need
> to backtrack in the middle of the iteration? I may be missing
> something obvious but (?:abc){1234} matches as a whole or not at all.
> If it doesn't and if at start of the pattern all is needed is to bump
> the matching point in the subject and backtrack at the beginning of
> the loop.
> Using pcretest -d indeed shows that for instance (?:abc){5,7} expands
> in fixed repetition of 'abc' 5 times then twice an optional 'abc'.
> I understand your point for the variable, optional, part but wouldn't
> it be worth implementing an iterator in the bytecode for the fixed
> part? That wouldn't solve the (?:abc){5,9999} case but still it would
> help in 2/3 of the cases, like (?:abc){9999} or (?:abc){9999,10022}.
>
> --
> [1]jcd@???
>
>References
>
> 1. mailto:jcd@q-e-d.org
>--
>## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
>