Author: Philip Hazel
Date:
To: Titus von der Malsburg
CC: pcre-dev, Tobias Günther
Subject: Re: [pcre-dev] position and match length for all matches of all subpatterns separately
On Tue, 26 Oct 2010, Titus von der Malsburg wrote:
> I have strings like "abcbccd" and patterns like "a(b(c)+)+d". I need
> the positions and match lengths for all matches of all subpatterns.
> Also, I need a way to select all positions and match lengths of a
> particular subpattern, e.g. (c) in the above example. How can I
> achieve this? My current idea is to transform the pattern from
> "a(b(c)+)+d" to "a(?<1>b(?<2>c)+)+d", this is easy and allows me to
> single out matches for a particular subpattern. But as far as I can
> see, pcre_exec returns only information about the last match for every
> subpattern.
That is correct. As far as I can see, the only way you will be able to
extract information for each match of something like (c)+ is to make use
of a callout, which is called each time matching reaches that point in
the pattern. That, of course, means
that you have to modify your pattern and implement the callout interface
(see the pcrecallout documentation page). There will be an inevitable
performance penalty.
Note that an item such as (c)+, where there is just one character inside
parentheses, is very inefficient, both in terms of time and memory
usage. (I realize that perhaps you just gave this as an example, and
your real patterns are more complicated.)