Re: [pcre-dev] Why Greedy Recursion Happens?

Kezdőlap
Üzenet törlése
Szerző: Philip Hazel
Dátum:  
Címzett: NN
CC: Pcre-dev
Új témák: [pcre-dev] Proceed from saved state, [pcre-dev] Optimizations when execution order is important
Tárgy: Re: [pcre-dev] Why Greedy Recursion Happens?
On Wed, 3 Dec 2008, NN wrote:

> Why regexp
> ((?!aaa).)+
> eats so much resources?
> Subject string is less then 40 kB length without 'aaa', but i get
> PCRE_ERROR_RECURSIONLIMIT or PCRE_ERROR_MATCHLIMIT


Each time a ()+ group repeats, it must save data for a back-up point, so
each time it recursively calls the matching function. You are matching
only one character per repeat, so it is recursion a lot. I used the \M
feature of pcretest to measure the value needed for the recursion limit
of your regex. It turns out to be equal to the length of the string. So,
for a string of 40K the limit must be set to at least 80K. Try running

pcretest -C

to see what the limit has been set to on your system. I see that on my
Gentoo system, the limit is set to 8192, which is pretty small. If you
build PCRE from source, the default limit is 10000000. (I also had to
increase the stack size to run a 40K test.)

If you re-write your regex to something like this:

([^a]++|(?!aaa)a)+

the number of recursions drops to around 5.

Regards,
Philip

--
Philip Hazel