[pcre-dev] [Bug 572] PHP 5.2.3 with PCRE 6.7: repeated subpa…

Author: Philip Hazel
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 572] PHP 5.2.3 with PCRE 6.7: repeated subpattern is too long at ...
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=572




--- Comment #1 from Philip Hazel <ph10@???> 2007-07-31 10:34:31 ---
On Mon, 30 Jul 2007, Stefan Teleman wrote:

> preg_match_all() [function: preg-match-all]: Compilation failed: repeated
> subpattern is too long at offset ...
>
> Tracked down the origin of this error in pcre_compile.c (pcre_compile2()), and
> in the following #defines in pcre's config.h:


This is not an error. It is a limitation that is documented. This is
from the pcre man page:

     All values in repeating quantifiers must be less than 65536. The
     maximum compiled length of subpattern with an explicit repeat count
     is 30000 bytes. The maximum number of capturing subpatterns is 65535.

     The maximum length of name for a named subpattern is 32 characters, and
     the maximum number of named subpatterns is 10000.


> Increasing these limits in pcre's config.h to:
>
> #ifndef MAX_NAME_SIZE
> #define MAX_NAME_SIZE 64
> #endif
>
> #ifndef MAX_NAME_COUNT
> #define MAX_NAME_COUNT 30000
> #endif
>
> #ifndef MAX_DUPLENGTH
> #define MAX_DUPLENGTH 60000
> #endif
>
> fixes the problem described above. Gallery2 and MediaWiki install and work
> fine.
>
> This is more of a question than a bug: Are overflows possible with these new,
> increased values ? In other words, are these new limits still within what the
> PCRE developers consider to be "safe" ?


For MAX_NAME_SIZE and MAX_NAME_COUNT, the "overflow" in question is
exceeding the range of a 32-bit integer when computing the size of the
block of memory for the compiled pattern.

> I realize that this is probably a very difficult question to answer,
> since answering it completely and correctly would involve evaluating
> all the possible regular expression pattern matches, and determining
> whether or not they would cause overflow.


No, it's not the matches. It's the compiled memory requirements.

The computation of memory size required is:

size = length + sizeof(real_pcre) +
       cd->names_found * (cd->name_entry_size + 3);

where length is the length of the compiled pattern, limited to be less
than 65536 if the link size is 2 (the default). The sizeof(real_pcre) is
relatively small. So I suppose that 64*30000 is still well within the
limit. But, I ask myself, who on earth would want to use names longer
than 32 characters in a pattern? And who would want more than 10000 such
names in a pattern? I don't think you need to increase those values.
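
As a rough illustration of the size formula above, here is a minimal C
sketch. The helper names are hypothetical and are not part of PCRE;
header_size merely stands in for sizeof(real_pcre), and the proposed
MAX_NAME_SIZE of 64 is assumed to be the per-name entry size, as in the
"64*30000" estimate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not PCRE code): evaluates the quoted formula
   size = length + sizeof(real_pcre) + names_found * (name_entry_size + 3)
   in 64-bit arithmetic so the result itself cannot wrap. */
static int64_t compiled_block_size(int64_t length, int64_t header_size,
                                   int64_t names_found,
                                   int64_t name_entry_size)
{
    assert(length >= 0 && header_size >= 0);
    assert(names_found >= 0 && name_entry_size >= 0);
    return length + header_size + names_found * (name_entry_size + 3);
}

/* Returns 1 if the size fits in a signed 32-bit integer, 0 otherwise. */
static int fits_in_int32(int64_t size)
{
    return size >= 0 && size <= INT32_MAX;
}
```

With length = 65535, names_found = 30000 and name_entry_size = 64, the
names table contributes 30000 * 67 = 2010000 bytes, so the total stays
far below the 32-bit limit for any plausible header size.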

Your problem is really MAX_DUPLENGTH. Indeed, that's what the error
message says: "repeated subpattern is too long". The comment in the code
for this check says:

/* This is a paranoid check to stop integer overflow later on */

The maximum repeat count is 65535. Repeated groups with a fixed upper
limit have to be duplicated (so as to provide different backtrack
points). 30000 * 65535 = 0x752f8ad0, which is getting close to
overflowing a signed 32-bit integer (maximum 0x7fffffff). Unfortunately,
60000 * 65535 = 0xea5f15a0, which has overflowed.

The maximum safe value for MAX_DUPLENGTH (according to the current way
of thinking) is 32768.
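
The arithmetic above can be checked with a short C sketch. This is only
an illustration under stated assumptions: the function names and the
MAX_REPEAT_COUNT macro are hypothetical, not taken from pcre_compile.c,
and "overflow" is taken to mean exceeding a signed 32-bit integer.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REPEAT_COUNT 65535  /* maximum repeat count quoted above */

/* Hypothetical helper: worst-case duplicated length, computed in
   64 bits so the product itself cannot wrap. */
static int64_t worst_case_dup_length(int64_t max_duplength)
{
    assert(max_duplength >= 0);
    return max_duplength * MAX_REPEAT_COUNT;
}

/* Returns 1 if the worst case exceeds a signed 32-bit integer. */
static int overflows_int32(int64_t max_duplength)
{
    return worst_case_dup_length(max_duplength) > INT32_MAX;
}

/* Largest MAX_DUPLENGTH whose worst-case product still fits. */
static int64_t largest_safe_duplength(void)
{
    return INT32_MAX / MAX_REPEAT_COUNT;
}
```

Under these assumptions, worst_case_dup_length(30000) is 0x752f8ad0 and
fits, worst_case_dup_length(60000) is 0xea5f15a0 and does not, and
largest_safe_duplength() gives 32768, matching the figure above.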

> Reason for this bug/question: Without an "official" position of the PCRE
> developers on these increased limits, I cannot have this patch integrated in
> Solaris. Without this patch, Gallery2 and MediaWiki with PHP 5.2.3 won't
> install.


So we have a problem. It seems that you are trying to compile really
huge subpatterns that have explicit upper limits on their quantifiers.

If you increase MAX_DUPLENGTH above 32768 you run the risk of integer
overflow. Actually, when I think about it, the correct statement is
probably "more of a risk".

> If these new limits are still within "safe" boundaries, would it be possible to
> add the values for these #defines as --options to PCRE's ./configure ?


The first two are probably safe, but who needs them?
The third, as I've shown, is almost at its upper safe value already.

> Incidentally, the same error occurs with PCRE 7.0.


The DUPLENGTH check was introduced into 6.7 after somebody complained.

What is to be done?

It is a crude check, as you can see. I did not revisit it when I
completely revised the way the length of a compiled pattern was computed
for the 7.0 release. It occurs to me that, with the new way of doing
things, it might now be possible to be more flexible in this limit. I am
also now worried that with LINK_SIZE set greater than 2, the limit may
not work in any case.

I am doing work on PCRE at the moment, and I will put this issue on my
work list. I will let you know what happens.

Philip


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email