[pcre-dev] optimizing matches for large strings

Author: Alan Lehotsky
Date:
To: pcre-dev
Subject: [pcre-dev] optimizing matches for large strings

I have a string that's 150,002 characters consisting of

P1P2......P4....P9.....

(That's 'P1', followed by 50k each of 'P2', 'P4' and 'P9'.)

and a simpleminded regex of

    (P1(P2)*(P4)*(P9)*)?
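Here is a minimal Python sketch of the subject string. It assumes "50k" means 50,000 characters per run, which is 25,000 repetitions of each two-character token. That is the only reading consistent with the stated total: 2 + 3 × 50,000 = 150,002. The helper name `build_subject` is hypothetical.

```python
def build_subject(chars_per_run: int = 50_000) -> str:
    """Build 'P1' followed by runs of 'P2', 'P4' and 'P9'.

    Assumption (hypothetical helper): each run is `chars_per_run` characters
    long, so each two-character token repeats chars_per_run // 2 times.
    """
    reps = chars_per_run // 2
    return "P1" + "P2" * reps + "P4" * reps + "P9" * reps


subject = build_subject()
print(len(subject))  # 150002
```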

With PCRE 7.9, this crashes in match() on the 5,672nd recursive call (on a Linux box with a moderately large stack).
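Why does recursion depth track the subject length? A toy recursive backtracking matcher makes it visible. This is a Python sketch, not PCRE's actual match(). Each greedy iteration of a starred group is a nested call that stays alive as a backtrack point until the overall match finishes, so depth grows by one per repetition. The name `match_recursive` and its `items` and `anchored_end` parameters are hypothetical. Each entry in `items` is treated as `(?:token)*`, and `anchored_end` stands in for a trailing `$`.

```python
def match_recursive(s, items, anchored_end=True):
    """Toy backtracking match of (?:items[0])*(?:items[1])*... at offset 0.

    Returns (end position or None, peak recursion depth).
    """
    peak = 0

    def step(pos, i, depth):
        nonlocal peak
        peak = max(peak, depth)
        if i == len(items):
            # End of pattern: succeed unless a trailing `$` is required and unmet.
            return pos if (not anchored_end or pos == len(s)) else None
        token = items[i]
        if s.startswith(token, pos):
            # Greedy: try one more repetition first. This frame stays on the
            # call stack as a backtrack point until the whole match finishes.
            end = step(pos + len(token), i, depth + 1)
            if end is not None:
                return end
        # Backtrack (or no match here): stop repeating items[i], go to the next.
        return step(pos, i + 1, depth + 1)

    return step(0, 0, 1), peak


tokens = ["P2", "P4", "P9"]
small = "P2" * 100 + "P4" * 100 + "P9" * 100
print(match_recursive(small, tokens))  # (600, 304): depth ~ 3 * repetitions

big = "P2" * 25_000 + "P4" * 25_000 + "P9" * 25_000
try:
    match_recursive(big, tokens)
except RecursionError:
    # Python's recursion limit plays the role of the C stack limit here.
    print("RecursionError")
```

With 25,000 repetitions per run, the toy needs about 75,000 nested calls, which is why it fails in the same way the post describes.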

With the old Spencer regex library, the same match succeeds.

I tried the obvious things to improve this:

1/ Replaced capturing parentheses with (?: )

2/ Added explicit ^ and $ anchors

3/ Used possessive quantifiers

4/ Used atomic grouping
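In principle, possessive quantifiers and atomic groups discard the backtrack points that force a frame per iteration. For this particular pattern, no saved state is needed at all, and it can be matched with plain loops. The Python sketch below uses a hypothetical function `match_possessive` to show that constant-stack behaviour for `(?:P1(?:P2)*+(?:P4)*+(?:P9)*+)?` matched at offset 0. It illustrates what the rewrite makes possible. It is not how PCRE 7.9 executes the pattern, since the post reports that version still recursed.

```python
def match_possessive(subject):
    """End offset of (?:P1(?:P2)*+(?:P4)*+(?:P9)*+)? matched at offset 0.

    Possessive repeats never give characters back, so each one is a simple
    loop: no backtrack points are saved and no recursion is needed.
    """
    if not subject.startswith("P1"):
        return 0  # the whole group is optional, so the empty match succeeds
    pos = 2
    for token in ("P2", "P4", "P9"):
        while subject.startswith(token, pos):  # consume greedily, never back off
            pos += len(token)
    return pos


subject = "P1" + "P2" * 25_000 + "P4" * 25_000 + "P9" * 25_000
print(match_possessive(subject))  # 150002
```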

In all cases, recursion in match() still overflows the stack.

I assume that if I configured PCRE not to use the stack, I could make this run (although probably very slowly, due to malloc/free overhead).
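Building PCRE with `--disable-stack-for-recursion`, which defines `NO_RECURSE`, makes match() keep its frames on the heap instead of the C stack. The Python sketch below applies the same idea to a toy backtracking matcher. The backtrack points move from the call stack into an explicit list, so the depth limit disappears. However, one entry per repetition is still stored, which is the per-iteration state behind the expected allocation overhead. The name `match_explicit` is hypothetical, and the code illustrates the technique rather than PCRE's implementation.

```python
def match_explicit(s, items, anchored_end=True):
    """Backtracking match of (?:items[0])*(?:items[1])*... at offset 0, with
    backtrack points kept in a heap-allocated list instead of the call stack.

    Returns (end position or None, peak number of saved backtrack points).
    """
    pos, i = 0, 0
    saved = []  # each entry (pos, i) is an alternative to resume on failure
    peak = 0
    while True:
        if i == len(items):
            if not anchored_end or pos == len(s):
                return pos, peak
            if not saved:
                return None, peak  # every alternative exhausted
            pos, i = saved.pop()  # backtrack to the most recent alternative
            continue
        token = items[i]
        if s.startswith(token, pos):
            # Greedy: take one more repetition, remembering the alternative
            # "stop repeating items[i] at this position".
            saved.append((pos, i + 1))
            peak = max(peak, len(saved))
            pos += len(token)
        else:
            i += 1  # no further repetition possible; move to the next item


tokens = ["P2", "P4", "P9"]
big = "P2" * 25_000 + "P4" * 25_000 + "P9" * 25_000
print(match_explicit(big, tokens))  # (150000, 75000): no recursion, O(n) memory
```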

Any advice?
