Re: [pcre-dev] Proposal for a new API for PCRE

Author: ph10
Date:  
To: Graycode
CC: pcre-dev
Subject: Re: [pcre-dev] Proposal for a new API for PCRE
On Fri, 30 Aug 2013, Graycode wrote:

> I'd like to be able to define the storage for match_data to be on a
> thread's stack, mainly to avoid memory allocation / free for every
> execution. Its content can still be opaque, the values can be
> initialized / assigned by PCRE2. Because that storage is user-defined,
> its size should be passed into PCRE2 to enable detection of an error
> when compiling with one version but later linking with another PCRE2
> version that defines a larger structure.
>
> For the example of 8.1 option A, I'm thinking of something like:
>
> size_t ovector[20];
> pcre2_match_data mymatch;
> pcre2_match_data * match_data =
> pcre2_init_match_data(context, &mymatch, sizeof(mymatch), ovector, 20);


Interesting idea, but note that you don't need to allocate/free for
every execution. You can re-use a match_data block as often as you like.
If we weren't bothered about multithreading, the match data block could
be amalgamated with the context - the reason for splitting them was so
that a single context could often be used by multiple threads, and also
to make a clear distinction between input to the match and output from
the match.
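
To illustrate the re-use point, here is a minimal sketch in C. It does not use any PCRE function: toy_match_data, toy_match_data_create, toy_match and count_matching_subjects are hypothetical names invented for this example, and the "match" is only a literal substring search. It shows one block being allocated once and then overwritten by each match in turn.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a match data block (hypothetical, not the PCRE2
   type): it only records output from the most recent match. */
typedef struct {
    int matched;        /* 1 if the last match succeeded, else 0 */
    size_t start;       /* offset of the match in the subject */
    size_t end;         /* offset just past the match */
} toy_match_data;

/* Allocate one block; nothing needs initializing before first use,
   because every field is written by toy_match(). */
toy_match_data *toy_match_data_create(void)
{
    return malloc(sizeof(toy_match_data));
}

/* Toy "match": find a literal needle and overwrite the block's fields. */
int toy_match(const char *needle, const char *subject, toy_match_data *md)
{
    assert(needle != NULL && subject != NULL && md != NULL);
    const char *hit = strstr(subject, needle);
    md->matched = (hit != NULL);
    md->start = hit ? (size_t)(hit - subject) : 0;
    md->end = hit ? md->start + strlen(needle) : 0;
    return md->matched;
}

/* One allocation and one free, however many subjects are scanned.
   Returns the number of subjects that matched, or -1 if out of memory. */
int count_matching_subjects(const char *needle, const char *subjects[],
                            size_t n)
{
    toy_match_data *md = toy_match_data_create();
    if (md == NULL) return -1;
    int hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += toy_match(needle, subjects[i], md);
    free(md);
    return hits;
}
```

In a multithreaded program each thread would hold its own block of this kind, while the read-only input (the context, in the proposal) could be shared.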

The proposed function is pcre2_create_match_data rather than
pcre2_init_match_data, as there is no explicit initializing - the fields
are all used for remembering and passing back information about the
match.

There is a problem with allocating on the stack. In the current API, 1/3
of the ovector is used as workspace (so if you have an ovector of size
30, you can support up to 10 captures, at 2 slots per capture). I was
proposing to hide this by including the workspace inside the match data,
but that means that the match_data is not of a fixed size.
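
The arithmetic can be sketched in C as follows. The first function follows the split described above (two thirds of the vector for offset pairs, one third for workspace), and it assumes a size that is not a multiple of three is rounded down. The second part is purely hypothetical: toy_match_data and toy_match_data_size are invented names, not a proposed PCRE2 layout, and only show why a block that carries its own workspace has a size that depends on the number of pairs.

```c
#include <assert.h>
#include <stddef.h>

/* How an ovector of a given size is divided up in the current API. */
typedef struct {
    int usable_size;   /* ovecsize rounded down to a multiple of 3 */
    int pair_slots;    /* slots that carry offsets back to the caller */
    int workspace;     /* slots used internally during matching */
    int max_pairs;     /* offset pairs that fit, at 2 slots each */
} ovector_layout;

ovector_layout describe_ovector(int ovecsize)
{
    assert(ovecsize >= 0);
    ovector_layout l;
    l.usable_size = ovecsize - (ovecsize % 3);
    l.workspace = l.usable_size / 3;
    l.pair_slots = l.usable_size - l.workspace;
    l.max_pairs = l.pair_slots / 2;
    return l;
}

/* Hypothetical block that hides the workspace inside itself: a fixed
   header followed by 3 slots per pair (2 for offsets, 1 for workspace). */
typedef struct {
    int pair_count;
    size_t slots[];    /* flexible array member, length 3 * pair_count */
} toy_match_data;

/* Bytes needed for such a block; it grows with the number of pairs,
   so sizeof(toy_match_data) alone cannot size a stack variable. */
size_t toy_match_data_size(int pair_count)
{
    assert(pair_count >= 0);
    return sizeof(toy_match_data) + 3 * (size_t)pair_count * sizeof(size_t);
}
```

For an ovector of size 30, describe_ovector gives 20 slots for offset pairs, 10 for workspace, and room for 10 pairs, which matches the figures above.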

Bottom line: I like the principle, but it won't work straightforwardly
unless we go back to using some of the ovector as workspace (which I
think is confusing), so I'm not too keen, but I have noted the idea to
think about some more.

Philip

--
Philip Hazel