[pcre-dev] optimizing matches for large strings

Top Page
Delete this message
Author: Alan Lehotsky
Date:  
To: pcre-dev
Subject: [pcre-dev] optimizing matches for large strings
I have a string that's 150,002 characters consisting of

        P1P2......P4....P9.....


(Thats 'P1', followed by 50k each of 'P2', 'P4', 'P9')

and a simpleminded regex of

    (P1(P2)*(P4)*(P9)*)?


With PCRE 7.9, this crashes in match() on the 5672'd recursive call (on a linux box with a moderately large stack)

With the old Spencer regex, we match successfully.

I tried the obvious things to improve this:

1/ replaced capturing parentheses with (?: )

2/ Added explicit ^$ anchors

3/ possessive quantifiers.

4/ atomic grouping

In all cases I still stomp the stack via recursion in match().

I assume that if I configured PCRE to not use the stack that I could make this run (although probably really slowly due to malloc/free overhead).  

Any advice?