[pcre-dev] [Bug 617] Several questions

Αρχική Σελίδα
Delete this message
Συντάκτης: Philip Hazel
Ημερομηνία:  
Προς: pcre-dev
Αντικείμενο: [pcre-dev] [Bug 617] Several questions
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=617




--- Comment #6 from Philip Hazel <ph10@???> 2007-11-20 09:45:57 ---
On Mon, 19 Nov 2007, Felipe Pena wrote:

> --- Comment #5 from Felipe Pena <felipensp@???> 2007-11-19 13:19:26 ---
> > 7. When you match /(\D+|<\d+>)*[!?]\Ka+/ with the string
> > aaaaaaaaaaaaaaaaaaaa1!aaaaaaaaaaaaaaaaaaaa
> > it will backtrack a lot. First \D+ matches all the a's once; then, when it does
> > not find ! or ? it will it will match all except the last a with \D+ and then
> > repeat by the * to match again. And so on. Nested unlimited repeats usually
> > have this property.
>
> Yes, but shouldn't make backtracking only at begin until '!' and perfoms 'a+'
> without a backtracking?


No. PCRE and Perl work left to write. Consider just

(\D+|<\d+>)*[!?]

First try:  \D+ matches aaaaaaaaaaaaaaaaaaaa    (20 chars)
The (...)* group has matched one time.
Try again: \D+ does not match; '<' does not match.
Carry on after the loop.
Try to match [!?]
Fail
Are there any backtracking points?
Yes, we can try matching \D+ 19 times instead of 20
Now \D+ matches 19 times
Try again: this time \D+ matches one character
But we then fail again
...
And so on, again and again


Nested repeats work like this. It will be much better if you use \D++ so
that it can't do that backtracking.

Regards,
Philip


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email