Author: Philip Hazel
Date:
To: Geoffrey Sneddon
CC: pcre-dev
Subject: Re: [pcre-dev] PCRE < 7.6's buffer overflow with UTF-8 character classes
On Thu, 1 Jan 2009, Geoffrey Sneddon wrote:
> Is there any way of knowing how many UTF-8 codepoints were needed in a
> character class to cause the buffer overflow to happen, or is it
> entirely platform-dependent (and not even constant on, say, all 32-bit
> OSes)? Are codepoint ranges affected? What about when the characters
> are specified as hex escapes?
It shouldn't matter how the characters are specified. It's now almost a
year since I fixed that bug, and I'm afraid I cannot remember the
details at all clearly. The ChangeLog does say "very large number", so
it must be several thousand, I would have thought. I think it would be
the same in all environments. However, the limit would have been a
number of bytes, not a number of codepoints, since each codepoint could
be a different number of bytes. PCRE uses a temporary buffer when it's
scanning to see how much store the compiled pattern needs, and if my
memory is right, it wasn't emptying this buffer often enough when
dealing with this kind of character class.
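Purely as an illustration of that mechanism (this is not the real PCRE
source, and every name in it is hypothetical), here is a sketch in C of a
sizing pass that writes compiled bytes into a fixed temporary workspace,
with the flush check that this kind of pattern depends on:

    /* Hypothetical sketch of the failure mode, not PCRE code. */
    #include <stdio.h>

    #define WORKSPACE_SIZE 4096          /* fixed temporary buffer */

    static unsigned char workspace[WORKSPACE_SIZE];

    /* Emit one compiled byte. The bounds test is the "emptying often
       enough" that, per the description above, was effectively missing
       for this kind of character class in PCRE < 7.6. */
    static void emit_byte(size_t *used, unsigned char b)
    {
        if (*used >= WORKSPACE_SIZE)     /* without this test: overflow */
            *used = 0;                   /* empty (flush) the buffer */
        workspace[(*used)++] = b;
    }

    int main(void)
    {
        size_t used = 0;

        /* A class such as [\x{100}\x{101}...] with thousands of
           codepoints: each one contributes several bytes, so it is the
           byte count, not the codepoint count, that fills the buffer. */
        for (int cp = 0; cp < 10000; cp++) {
            emit_byte(&used, 0xC4);               /* illustrative     */
            emit_byte(&used, 0x80 | (cp & 0x3F)); /* two-byte sequence */
        }

        printf("bytes left in workspace: %zu\n", used);
        return 0;
    }

A fixed workspace with periodic flushing avoids allocating memory during
the sizing pass, which is presumably why the buffer exists at all; it just
needs that bounds check on every write.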
To learn more, one would have to compare the sources.