Author: Philip Hazel
Date:
To: Sheri
CC: pcre-dev
Subject: Re: [pcre-dev] Compile error "number is too big" ?
On Thu, 24 Jul 2008, Sheri wrote:
> Someone sent me the following -- turns out everything in the pattern
> from the double quote on was erroneously tacked on (in fact the
> "pattern" was even longer). But I don't understand the error. Don't the
> digits after \ddd stand for themselves? If the pattern is one digit
> character shorter there is no error. If longer (more digits), the
> reported offset of the error gets bigger.
>
> re> /.*?[0-9].log_old.log"D:\Temp\WswTest\WswSOL_0000-log\12006021523/i
> Failed: number is too big at offset 63
This is an artefact of the implementation. This is what the
documentation says (in pcrepattern):
    The handling of a backslash followed by a digit other than 0 is
    complicated. Outside a character class, PCRE reads it and any following
    digits as a decimal number. If the number is less than 10, or if there
    have been at least that many previous capturing left parentheses in the
    expression, the entire sequence is taken as a back reference. A
    description of how this works is given later, following the discussion
    of parenthesized subpatterns.

    Inside a character class, or if the decimal number is greater than 9 and
    there have not been that many capturing subpatterns, PCRE re-reads up to
    three octal digits following the backslash, and uses them to generate a
    data character. Any subsequent digits stand for themselves. In non-UTF-8
    mode, the value of a character specified in octal must be less than
    \400. In UTF-8 mode, values up to \777 are permitted.
So it's trying to implement the first of those paragraphs, just in case
there have been 12006021523 previous capturing parentheses. The number
is too large to be read as a decimal, so the error is given before the
octal interpretation in the second paragraph is ever tried. I don't think
there is much I can do about this. (It's the fault of Perl, of course,
for specifying this ambiguity in the first place. More recent versions
of Perl have moved away from this, but compatibility is maintained.)
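
To make the order of the two steps concrete, here is a rough sketch in
Python of the rule as the documentation describes it. It is not PCRE's
code: the function name classify_escape, its arguments, and the 32-bit
limit INT_MAX are all assumptions made for illustration, and the \400
and \777 limits on octal values are not modelled.

```python
INT_MAX = 2**31 - 1  # assumption: the decimal number is held in a 32-bit signed int


def classify_escape(digits, groups_so_far, in_class=False):
    """Classify a backslash followed by `digits` (first digit is not 0).

    Returns (kind, value, rest), where `rest` holds the digits that are
    left over and stand for themselves.
    """
    if not in_class:
        # Step 1: read ALL the digits as one decimal number.
        number = 0
        for ch in digits:
            number = number * 10 + int(ch)
            if number > INT_MAX:
                # Hypothetical stand-in for the "number is too big" error.
                raise OverflowError("number is too big")
        # Less than 10, or enough capturing groups so far: a back reference.
        if number < 10 or number <= groups_so_far:
            return ("backref", number, "")
    # Step 2: re-read up to three octal digits as a data character.
    octal = ""
    for ch in digits[:3]:
        if ch not in "01234567":
            break
        octal += ch
    if not octal:
        # A leading 8 or 9 is not covered by the quoted text; left undecided.
        return ("unspecified", None, digits)
    return ("octal", int(octal, 8), digits[len(octal):])
```

Under these assumptions, classify_escape("12006021523", 0) fails in
step 1 without reaching step 2, whereas the same digits with the last
one removed fit in the integer and fall through to the octal reading
(\120 followed by literal digits). That is consistent with the report
that a pattern one digit shorter gives no error.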