Re: [pcre-dev] position and match length for all matches of …

Author: Philip Hazel
Date:  
To: Titus von der Malsburg
CC: pcre-dev, Tobias Günther
Subject: Re: [pcre-dev] position and match length for all matches of all subpatterns separately
On Tue, 26 Oct 2010, Titus von der Malsburg wrote:

> I have strings like "abcbccd" and patterns like "a(b(c)+)+d".  I need
> the positions and match lengths for all matches of all subpatterns.
> Also, I need a way to select all positions and match lengths of a
> particular subpattern, e.g. (c) in the above example.  How can I
> achieve this?  My current idea is to transform the pattern from
> "a(b(c)+)+d" to "a(?<1>b(?<2>c)+)+d", this is easy and allows me to
> single out matches for a particular subpattern.  But as far as I can
> see, pcre_exec returns only information about the last match for every
> subpattern.


That is correct. As far as I can see, the only way to extract
information for each match of something like (c)+ is to use a callout,
which is called each time matching reaches that point in the pattern.
That, of course, means that you have to modify your pattern to insert
the callout points and implement the callout interface (see the
pcrecallout documentation page). There will be an inevitable
performance penalty.

Note that an item such as (c)+, where there is just one character inside
parentheses, is very inefficient, both in terms of time and memory
usage. (I realize that perhaps you just gave this as an example, and
your real patterns are more complicated.)

Philip

--
Philip Hazel