Re: [pcre-dev] Which limit is hit?

Author: Zoltán Herczeg
Date:  
To: Jean-Christophe Deschamps
CC: pcre-dev
Subject: Re: [pcre-dev] Which limit is hit?
Hi,

the pattern is always compiled to byte code first, and JIT converts that byte code back into iterators, so using JIT alone does not help. The reason for not using an iterator in the interpreter is practical: the PCRE interpreter uses stack recursion, and you cannot easily share variable data across function calls. This is not a problem for single-character iterators, but repeating bracketed subpatterns this way would require inspecting the machine stack: finding the previous call of an iterator on the stack chain and getting local data from it is difficult (in C, at least). Instead, the byte code of the subpattern is repeated, so there is no need to track the iteration count. JIT does not use the machine stack for recursion, and it has infrastructure for sharing iterator data, so this is not an issue there.
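A toy sketch in C may make the trade-off concrete. It assumes a simplified, hypothetical byte code (`OP_DIGITS`, `OP_CHAR` and `OP_END` are invented here and are not PCRE's real opcodes) and a fixed repeat of the form `(?:\d+=){n}`. `compile_unrolled` emits one copy of the group's code per iteration, in the spirit of the repeated byte code described above, so `match_unrolled` needs no repeat counter. `match_counted` is the iterator alternative: a single copy of the group plus an explicit loop counter.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes, NOT PCRE's real byte code. */
enum { OP_END = 0, OP_DIGITS = 1, OP_CHAR = 2 };

/* "Compile" (?:\d+=){n} by unrolling: emit n copies of the group's code.
   Returns the number of code bytes written, or 0 if cap is too small. */
static size_t compile_unrolled(unsigned n, uint8_t *code, size_t cap)
{
    size_t len = 0;
    for (unsigned i = 0; i < n; i++) {
        if (len + 3 > cap) return 0;
        code[len++] = OP_DIGITS;   /* \d+ */
        code[len++] = OP_CHAR;     /* literal character ... */
        code[len++] = '=';         /* ... '=' */
    }
    if (len + 1 > cap) return 0;
    code[len++] = OP_END;
    return len;
}

/* Walk the unrolled code. No repeat counter is needed because every
   iteration has its own copy of the code. Returns the number of subject
   characters consumed by an anchored match, or -1 on failure. */
static int match_unrolled(const uint8_t *code, const char *s)
{
    const char *p = s;
    for (;;) {
        switch (*code) {
        case OP_END:
            return (int)(p - s);
        case OP_DIGITS:
            if (!isdigit((unsigned char)*p)) return -1;
            /* Greedy without backtracking: safe here because the next
               item is '=', which is never a digit. */
            while (isdigit((unsigned char)*p)) p++;
            code++;
            break;
        case OP_CHAR:
            if (*p != (char)code[1]) return -1;
            p++;
            code += 2;
            break;
        default:
            return -1;
        }
    }
}

/* Iterator alternative: one copy of the group plus an explicit counter.
   Same return convention as match_unrolled. */
static int match_counted(unsigned n, const char *s)
{
    const char *p = s;
    for (unsigned i = 0; i < n; i++) {
        if (!isdigit((unsigned char)*p)) return -1;
        while (isdigit((unsigned char)*p)) p++;
        if (*p != '=') return -1;
        p++;
    }
    return (int)(p - s);
}
```

Both matchers accept the same input (for example, with n = 3 both consume the 9 characters `1=22=333=` of `"1=22=333=rest"`), but the unrolled code grows linearly with n (3n + 1 bytes in this toy), while the counted version stays the same size. Real PCRE byte code for each copy is likely larger than this toy's 3 bytes, and with the default 2-byte link size a compiled pattern is limited to roughly 64K bytes, which is why a larger link size lets such a pattern compile. The toy matcher is a plain loop, so it sidesteps the stack-sharing problem described above; in a recursive interpreter the counter would have to live in data shared across nested calls.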

Regards,
Zoltan

Jean-Christophe Deschamps <jch.deschamps@???> wrote:
>
> At 18:30 25/01/2015, you wrote:
>
>     I think the issue is that the byte code of the pattern is too big.
>     It is basically (?:\d+=) repeated 9999 times. It was easier to implement
>     the interpreter this way (JIT converts the byte code back into an
>     iterator again, because of the code size).
>     To make this work, increase the link size to 3 or 4
>     (--with-link-size=4) when compiling PCRE.

>
> So if I understand you correctly, the only options are to either use a
> larger link size or use JIT, neither of which is under my control, since
> I'm using a scripting language interpreter that embeds PCRE in linked form.
> While I regard PCRE as a superior engine and am grateful for the work
> of the dev team, I find the choice not to implement an internal loop
> structure for fixed repetition of subpatterns unfortunate.
> Thank you for your insight anyway.
>
> --
> [1]jcd@???
>
>References
>
> 1. mailto:jcd@q-e-d.org
>--
>## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
>