Re: [pcre-dev] A native pcre exec for JIT

Author: Philip Hazel
Date:  
To: Zoltán Herczeg
CC: pcre-dev
Subject: Re: [pcre-dev] A native pcre exec for JIT
On Sun, 30 Sep 2012, Zoltán Herczeg wrote:

> > Also IMHO for *new* API we shouldn't continue the problems of the old
> > APIs; that means we should use size_t for the length, start_offset and
> > offsetcount parameters and the offsets themselves. (If the current code
> > can't cope, just return an error if length > INT_MAX, but then we can
> > fix that without changing API.) Also, maybe options should be unsigned
> > (it's flags, right?).
>
> I think this would be a major change, which should only happen if all
> pcre API would go to the new form. Probably uint32_t would be the best
> for flags.


Yes indeed. There are a number of issues with the API. If a new one is
to be defined, we should try hard to think about all of them so that the
new API will last at least as long as the current one has. (I designed
it originally in 1997, so it hasn't done too badly.)

> We were thinking about a complete redesign of the API for some time,
> and perhaps we should note these requirements as well. Perhaps we
> should introduce a pcre2.h sometime and some conversion functions
> which translate the arguments from the old format.


It might be a good idea to write a specification for a new API that can
be discussed and modified for a while ... there is, after all, no
immediate urgency. Once we have settled on where we want to get to, then
we can try to figure out a way to get there as compatibly as possible.

If some kind of compatibility wrapper proves impossible to create, then
perhaps we will have to invent NPCRE, though I would prefer not to have
to do this.

The main issues that I am aware of are:

(1) The interface for malloc/free control.

(2) The ugliness of the pcre_extra mechanism (which was done to extend 
    the existing API compatibly).


(3) Standardizing int vs unsigned int vs pcre_uint32 vs size_t and also
    thinking about char vs unsigned char.
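As a rough illustration of item (3) and of the quoted suggestion to
return an error when length > INT_MAX, here is a minimal sketch in C.
Everything in it is hypothetical: sketch_exec, legacy_exec and the
SKETCH_ERROR_* codes are invented names, and legacy_exec is only a
stand-in for an old-style function with int arguments, not a real PCRE
call.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error codes; these are not real PCRE constants. */
#define SKETCH_ERROR_TOO_LONG   (-100)
#define SKETCH_ERROR_BADOFFSET  (-101)
#define SKETCH_ERROR_BADOPTION  (-102)

/* Stand-in for an old-style entry point that takes int lengths, int
   offsets and int options. It only reports how many bytes lie between
   start_offset and the end of the subject. */
static int legacy_exec(const char *subject, int length, int start_offset,
                       int options)
{
    (void)subject;
    (void)options;
    assert(length >= 0 && start_offset >= 0 && start_offset <= length);
    return length - start_offset;
}

/* Hypothetical new-style wrapper: size_t for the length and offset,
   uint32_t for the option flags. Values that the int-based code cannot
   represent are rejected instead of being silently truncated. */
static int sketch_exec(const char *subject, size_t length,
                       size_t start_offset, uint32_t options)
{
    if (length > (size_t)INT_MAX)
        return SKETCH_ERROR_TOO_LONG;
    if (start_offset > length)
        return SKETCH_ERROR_BADOFFSET;
    if (options > (uint32_t)INT_MAX)
        return SKETCH_ERROR_BADOPTION;
    return legacy_exec(subject, (int)length, (int)start_offset,
                       (int)options);
}
```

With this shape the range checks could be dropped later, once the
underlying code handles the wider types, without changing the
signature that callers see.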


My proposal for both (1) and (2) is that there should be a new data
structure called pcre_context that would be the first argument to every
function. It would contain all the user-settable values that currently
live in the pcre_extra block and also pointers to malloc/free functions,
with optional arguments. The existing malloc/free global indirectors
would be abolished. Typical coding style would start with

pcre_context *pc = pcre_init_context(NULL);

where the context block would be obtained by malloc(), or

pcre_context *pc = malloc(sizeof(pcre_context));

(or the structure could be obtained some other way ... it could be on the
stack, for example) followed by explicitly setting the values or

pcre_init_context(pc);

to set default values (or a combination).
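To make the idea more concrete, here is a minimal sketch in C of what
such a context block might look like. Only the names pcre_context and
pcre_init_context come from the text above; the field layout, the
return type, the default limit values, and the helpers context_alloc
and context_free are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layout: memory functions with an optional user argument,
   plus two of the limits that currently live in the pcre_extra block. */
typedef struct pcre_context {
    void *(*ctx_malloc)(size_t size, void *user_data);
    void  (*ctx_free)(void *block, void *user_data);
    void  *user_data;                    /* passed to both functions */
    unsigned long match_limit;
    unsigned long match_limit_recursion;
} pcre_context;

/* Placeholder defaults, chosen only for this sketch. */
#define SKETCH_DEFAULT_LIMIT 10000000UL

static void *default_malloc(size_t size, void *user_data)
{
    (void)user_data;
    return malloc(size);
}

static void default_free(void *block, void *user_data)
{
    (void)user_data;
    free(block);
}

/* With NULL, obtain a block via malloc(); otherwise fill in the caller's
   block. Either way every field receives its default value. */
pcre_context *pcre_init_context(pcre_context *pc)
{
    if (pc == NULL) {
        pc = malloc(sizeof(pcre_context));
        if (pc == NULL)
            return NULL;
    }
    pc->ctx_malloc = default_malloc;
    pc->ctx_free = default_free;
    pc->user_data = NULL;
    pc->match_limit = SKETCH_DEFAULT_LIMIT;
    pc->match_limit_recursion = SKETCH_DEFAULT_LIMIT;
    return pc;
}

/* Hypothetical helpers: library code would allocate through the context
   rather than through global indirectors. */
static void *context_alloc(const pcre_context *pc, size_t size)
{
    assert(pc != NULL && pc->ctx_malloc != NULL);
    return pc->ctx_malloc(size, pc->user_data);
}

static void context_free(const pcre_context *pc, void *block)
{
    assert(pc != NULL && pc->ctx_free != NULL);
    pc->ctx_free(block, pc->user_data);
}
```

A caller that wants its own allocator would call pcre_init_context()
and then overwrite ctx_malloc, ctx_free and user_data, which is the
"combination" case. A block obtained with pcre_init_context(NULL) in
this sketch is released with plain free().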

These are just ideas in my head at the moment. This is a major change
which should probably only happen after a version that is highly stable.

Philip

--
Philip Hazel