On Mon, 12 Aug 2013, Jean-Christophe Deschamps wrote:
> I suspect that stock Windows compile of both (PCRE8 + UTF) or (PCRE16
> + UTF + UCP) hits some limits trying to compile this really crazy RE:
> https://gist.github.com/noprompt/6106573/raw/
> (It's supposed to match any english word.)
> I'd like to know which limit is hit.
From the "pcrelimits" man page:
The maximum length of a compiled pattern is approximately 64K data
units (bytes for the 8-bit library, 16-bit units for the 16-bit
library, and 32-bit units for the 32-bit library) if PCRE is compiled
with the default internal linkage size of 2 bytes. If you want to
process regular expressions that are truly enormous, you can compile
PCRE with an internal linkage size of 3 or 4 (when building the 16-bit
or 32-bit library, 3 is rounded up to 4). See the README file in
the source distribution and the pcrebuild documentation for details.
In these cases the limit is substantially larger. However, the speed
of execution is slower.
I tried your pattern on my Linux box, using the default internal linkage
size and the pcretest program:
Failed: regular expression is too large at offset 1589395
I re-built PCRE to use a linkage size of 3, and then it compiled OK. My
guess is that the same will be true under Windows.
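As a rough illustration of why one extra byte of linkage makes such a
difference, here is a minimal sketch of the arithmetic. The function
name link_offset_range is hypothetical and is not part of PCRE. It
only computes how many distinct offsets a link field of a given width
can represent. The real ceiling that PCRE enforces for each link size
may be lower than this nominal range, so treat the numbers as upper
bounds, not as documented limits.

```c
#include <assert.h>

/* Hypothetical helper, not part of PCRE: the number of distinct
   offsets that a link field of link_size bytes can represent.
   This is a nominal upper bound only; the limit PCRE actually
   enforces for a given link size may be smaller. */
unsigned long long link_offset_range(int link_size)
{
    /* PCRE accepts link sizes of 2, 3 or 4 bytes. */
    assert(link_size >= 2 && link_size <= 4);

    /* Each byte contributes 8 bits of offset. */
    return 1ULL << (8 * link_size);
}
```

With a link size of 2 this gives 65536, which matches the "approximately
64K data units" in the man page text; with a link size of 3 it gives
16777216, which leaves plenty of room for a pattern of this size.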
Philip
--
Philip Hazel