Hi,
Is P1 means two literals? 'P' and '1'? PCRE has clever optimizations for quantifiers used on single characters: it can count them, and go back when recursion needed. The story is more difficult for other expessions, since a recursion may happened inside the subexpression.
Anyway, back references also optimized in the same way as characters, so perhaps:
"(P1)*" could be rewritten to "(?:(P1)\1*)?"
Regards,
Zoltan
Alan Lehotsky <alehotsky@???> írta:
>I have a string that's 150,002 characters consisting of>
>
P1P2......P4....P9.....>
>
>
(Thats 'P1', followed by 50k each of 'P2', 'P4', 'P9')>
>
and a simpleminded regex of>
>
(P1(P2)*(P4)*(P9)*)?>
>
With PCRE 7.9, this crashes in match() on the 5672'd recursive call (on a linux box with a moderately large stack)>
>
With the old Spencer regex, we match successfully.>
>
I tried the obvious things to improve this:>
>
1/ replaced capturing parentheses with (?: )>
>
2/ Added explicit ^$ anchors>
>
3/ possessive quantifiers.>
>
4/ atomic grouping>
>
In all cases I still stomp the stack via recursion in match().>
>
I assume that if I configured PCRE to not use the stack that I could make this run (although probably really slowly due to malloc/free overhead). >
>
Any advice?>
-- >
## List details at
http://lists.exim.org/mailman/listinfo/pcre-dev >