Author: ph10
Date:
To: Zoltán Herczeg
CC: Jean-Christophe Deschamps, pcre-dev
Subject: Re: [pcre-dev] Which limit is hit?
On Mon, 26 Jan 2015, Zoltán Herczeg wrote:
> Tracking the subpattern index would require dynamic memory
> allocation, which is not preferred in PCRE, since memory allocation is
> slow.
Yes, that is exactly it. When I first designed PCRE, back in 1997, I
wanted to avoid memory allocation at matching time[*]. By replicating
groups with fixed upper limits in the compiled code, there is no need to
keep track of how many iterations have happened. I did not envisage
people using large quantifiers such as {9999} - I am not very good at
foreseeing how people are going to stretch the limits of the software.
Several other similar issues (such as the maximum number of capturing
groups) have been dealt with, but this one still remains.
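To make the replication concrete, here is a minimal Python sketch. It is a toy model, not PCRE's actual compiler or opcode set, and the names compile_bounded and run are invented for illustration. A bounded repeat such as (?:abc){3} is expanded at compile time into three back-to-back copies of the group, so the matcher carries no iteration count, but the compiled code grows with the quantifier.

```python
# Toy model only: the opcodes and function names are invented for illustration
# and do not correspond to PCRE's real byte code.

def compile_bounded(literal, count):
    """Expand (?:literal){count} by emitting one copy of the group per repeat."""
    code = []
    for _ in range(count):
        code.append(("LIT", literal))   # a replicated copy of the group body
    code.append(("END",))
    return code

def run(code, subject):
    """Execute straight-line code; the only state is the subject position."""
    pos = 0
    for op in code:
        if op[0] == "LIT":
            if not subject.startswith(op[1], pos):
                return None             # this copy failed, so the match fails
            pos += len(op[1])
        elif op[0] == "END":
            return pos                  # number of characters matched
    return None

small = compile_bounded("abc", 3)
large = compile_bounded("abc", 9999)
print(len(small), len(large))           # compiled size grows with the quantifier
print(run(small, "abcabcabc"))
```

In this sketch the cost of avoiding a counter is visible directly: the {9999} version holds 9999 copies of the group.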
The situation is different for groups like (?:abc)+ where there is no
upper limit to the repeat quantifier. In this case, there is no need to
keep track of which iteration you are in. There is therefore just one
copy of the group in the compiled byte code, and whenever it is
successfully matched, control just jumps back to its beginning.
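The unbounded case can be sketched in the same toy style. Again, this is an illustration under invented opcodes (LIT, REPEAT) and hypothetical function names, not PCRE's real byte code. There is one copy of the group, and a backward jump is taken after each successful pass. The only state is the current position plus the end of the last successful pass, with no iteration number.

```python
# Toy model only: the opcodes and function names are invented for illustration
# and do not correspond to PCRE's real byte code.

def compile_unbounded(literal):
    """Compile (?:literal)+ as a single copy of the group plus a backward jump."""
    if not literal:
        raise ValueError("an empty group would loop forever in this toy model")
    return [
        ("LIT", literal),   # index 0: the one and only copy of the group
        ("REPEAT", 0),      # index 1: after a successful pass, jump back to 0
    ]

def run_loop(code, subject):
    """Greedy run of the loop; no iteration counter is kept."""
    pos = 0
    pc = 0
    fallback = None         # end of the last successful pass, if any
    while True:
        op = code[pc]
        if op[0] == "LIT":
            if not subject.startswith(op[1], pos):
                return fallback         # None if the group never matched
            pos += len(op[1])
            pc += 1
        elif op[0] == "REPEAT":
            fallback = pos              # remember where this pass ended
            pc = op[1]                  # control jumps back to the group start

loop_code = compile_unbounded("abc")
print(len(loop_code))                   # constant size, however often it repeats
print(run_loop(loop_code, "abc" * 1000))
```

Here the compiled size stays at two instructions regardless of how many times the group ends up matching.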
Now that you have raised this point, it has started me wondering about
possible ways of not replicating the code but without too much memory
allocation. I have not yet come to any conclusion (and am rather busy
with other things at the moment), but maybe sometime in the future this
can be addressed. However, any such change would only apply to PCRE2.
Philip
[*] Subsequent changes to the code mean that there are now two, probably
rare, situations where memory is allocated at interpretive matching
time. One is if there is a recursive subpattern call and there are more
than 15 capturing groups whose status must be preserved over the
recursion. The other is when a call to match a pattern with back
references does not provide an output vector that is large enough to
hold the references.