[pcre-dev] optimizing matches for large strings

Lähettäjä: Alan Lehotsky
Päiväys:
Vastaanottaja: pcre-dev
Aihe: [pcre-dev] optimizing matches for large strings

I have a string that's 150,002 characters consisting of

P1P2......P4....P9.....

(Thats 'P1', followed by 50k each of 'P2', 'P4', 'P9')

and a simpleminded regex of

    (P1(P2)*(P4)*(P9)*)?

With PCRE 7.9, this crashes in match() on the 5672'd recursive call (on a linux box with a moderately large stack)

With the old Spencer regex, we match successfully.

I tried the obvious things to improve this:

1/ replaced capturing parentheses with (?: )

2/ Added explicit ^$ anchors

3/ possessive quantifiers.

4/ atomic grouping

In all cases I still stomp the stack via recursion in match().

I assume that if I configured PCRE to not use the stack that I could make this run (although probably really slowly due to malloc/free overhead).

Any advice?

Tämä viesti kuuluu seuraavaan säikeeseen:
	Näytä säikeen viestit aikajärjestyksessä

	Herczeg Zoltán