Zoltan,
At 07:17 26/01/2015, you wrote:
the pattern is always compiled to byte code first, and JIT converts
it back, so using JIT alone does not help.
Ah, I didn't knew that point.
The reason of not using an iterator in the interpreter is
practical: PCRE interpreter uses stack recursion, and you cannot
easily share variable data across function calls. This is not a
problem for single character iterators, but matching brackets would
require inspecting the machine stack. Finding the previous call of
an iterator on the stack chain and getting local data from it is
difficult (in C at least). Instead the byte code of a subpattern is
repeated so there is no need for tracking the iterator count.
I don't want to abuse your time and patience but I'd love to
understand the whole picture.
In the case of a fixed repetition factor, are there cases which need
to backtrack in the middle of the iteration? I may be missing
something obvious but (?:abc){1234} matches as a whole or not at all.
If it doesn't and if at start of the pattern all is needed is to bump
the matching point in the subject and backtrack at the beginning of
the loop.
Using pcretest -d indeed shows that for instance (?:abc){5,7} expands
in fixed repetition of 'abc' 5 times then twice an optional 'abc'.
I understand your point for the variable, optional, part but wouldn't
it be worth implementing an iterator in the bytecode for the fixed
part? That wouldn't solve the (?:abc){5,9999} case but still it would
help in 2/3 of the cases, like (?:abc){9999} or (?:abc){9999,10022}.
--
[1]jcd@???
References
1.
mailto:jcd@q-e-d.org