[pcre-dev] [Bug 632] pcre_compile has no way to give length …

Author: Philip Hazel
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 632] pcre_compile has no way to give length of source

http://bugs.exim.org/show_bug.cgi?id=632




--- Comment #5 from Philip Hazel <ph10@???> 2007-11-20 18:41:59 ---
On Tue, 20 Nov 2007, Sean Middleditch wrote:

> (1) The *ptr == 0 has to be replaced with ptr == end.
> (2) Results from GETCHARINC and friends are checked against 0, so this must be
> modified to check against ptr == end instead.
> (3) Some loops continuously look up the current character in a table, and check
> the return against 0, so these also need a ptr != end check.
> (4) Some parts of the code loop while *ptr == 'x' (where x is some character or
> another), so these need to be replaced with while ptr != end && *ptr == 'x'.
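
The loop rewrites in points (1) and (4) can be sketched as follows. This
is only an illustration, not code from the patch or from PCRE: the helper
names count_run and find_close are hypothetical. The point it shows is
that the bound is tested before the dereference, so a zero byte inside
the buffer is treated as ordinary data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length of the run of character x starting at ptr.
   Never reads at or past end. */
size_t count_run(const char *ptr, const char *end, char x)
{
    const char *start = ptr;
    assert(ptr <= end);
    /* The bound is checked first, so *ptr is only read while ptr < end. */
    while (ptr != end && *ptr == x)
        ptr++;
    return (size_t)(ptr - start);
}

/* Hypothetical helper: advance to the next ')' or to end, whichever comes
   first. A return value equal to end means the pattern ran out. */
const char *find_close(const char *ptr, const char *end)
{
    assert(ptr <= end);
    while (ptr != end && *ptr != ')')
        ptr++;
    return ptr;
}
```

With a buffer holding the six bytes 'a' 'a' 'a' 0 'a' ')' and end set six
bytes in, find_close steps over the embedded zero and stops at the ')'.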


I wonder whether, and by how much, these changes will affect
performance. Hard to say until one has tried it, I suppose.

> Most of the functions just need to take a copy of the end pointer.


The *internal* functions should not need this if they have the "cd"
(compile data) variable passed to them, because cd->pattern_end has the
required value.
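
A minimal sketch of that idea, assuming a cut-down stand-in for the
compile data block: the struct below models only the pattern_end field
mentioned above, and read_number is a hypothetical internal function, not
one taken from the PCRE source. It shows the bound coming from cd rather
than from an extra parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the compile data block; the real structure
   has many more fields. */
typedef struct compile_data {
    const char *pattern_end;   /* one past the last byte of the pattern */
} compile_data;

/* Hypothetical internal function: reads a decimal number at *ptrptr and
   advances the pointer. The end of the pattern is taken from cd.
   Integer overflow is ignored in this sketch. */
int read_number(const char **ptrptr, const compile_data *cd)
{
    const char *ptr = *ptrptr;
    int n = 0;
    assert(ptr <= cd->pattern_end);
    while (ptr != cd->pattern_end && *ptr >= '0' && *ptr <= '9')
        n = n * 10 + (*ptr++ - '0');
    *ptrptr = ptr;
    return n;
}
```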

> I renamed pcre_compile2 to pcre_compile3 which takes a length used to
> calculate the end pointer, and made pcre_compile and pcre_compile2
> call into pcre_compile3 and call strlen to figure out the proper
> length to pass in.


That's exactly what I would have done.
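
The wrapper arrangement described in the quoted text might look roughly
like this. The sketch_* names and their simplified signatures are
assumptions made for illustration; they are not the real pcre_compile
prototypes, and the "compilation" here merely walks the pattern so that
the length handling is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-aware entry point. Returns the number of pattern
   bytes it examined, as a stand-in for real compilation. */
size_t sketch_compile3(const char *pattern, size_t length, int options)
{
    const char *ptr = pattern;
    const char *end;
    (void)options;                /* unused in this sketch */
    assert(pattern != NULL);
    end = pattern + length;
    while (ptr != end)            /* bounded by end, not by a zero byte */
        ptr++;
    return (size_t)(ptr - pattern);
}

/* The older entry points keep their zero-terminated behaviour by
   measuring the string and delegating. */
size_t sketch_compile2(const char *pattern, int options)
{
    return sketch_compile3(pattern, strlen(pattern), options);
}

size_t sketch_compile(const char *pattern, int options)
{
    return sketch_compile3(pattern, strlen(pattern), options);
}
```

Existing callers see no change, while a caller holding a pattern with an
embedded zero byte can pass its true length to the new function.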

> Do any of the unit tests include tests that have incomplete expressions or
> incomplete UTF8 characters that could cause the code to try to walk past the
> end of the string, or do I need to add some tests for that?


They may well do, but without searching the test input I can't be
sure. It would be as well to add some specific tests; that never does
any harm.
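
One shape such a test could take is sketched below, assuming a
hypothetical checker called utf8_truncated. It is not part of PCRE or its
test suite. It judges the expected sequence length from the lead byte
only and does not validate continuation bytes, which is enough to show
inputs that would tempt a scanner to read past the end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check: returns 1 if the buffer ends part-way through a
   UTF-8 sequence, 0 otherwise. Only the lead byte is inspected. */
int utf8_truncated(const unsigned char *ptr, const unsigned char *end)
{
    assert(ptr <= end);
    while (ptr != end) {
        unsigned char c = *ptr;
        size_t need = 1;                  /* ASCII or stray byte */
        if (c >= 0xF0) need = 4;
        else if (c >= 0xE0) need = 3;
        else if (c >= 0xC0) need = 2;
        if ((size_t)(end - ptr) < need)
            return 1;                     /* sequence runs off the end */
        ptr += need;
    }
    return 0;
}
```

Candidate inputs are a pattern ending in the single byte 0xC3, or in the
two bytes 0xE2 0x82, each of which is the start of a longer character.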

Philip

