Author: Philip Hazel
To: Graycode
CC: pcre-dev
Subject: Re: [pcre-dev] [Bug 1174] allow passing of pcre_{malloc, free, stack_malloc, stack_free, callout} as parameters
Thanks for taking the time to post a useful discussion.
On Tue, 8 Nov 2011, Graycode wrote:
> I think it would be great if PCRE could invent a pcre_app_config()
> function whereby the application could specify its default
> limitations and configuration options. It should include things like
> the memory allocation / free vectors, match_limit_recursion,
> match_limit, etc. These are all currently present in PCRE, either
> as static variables or as members of the extra structure. All I'm
> suggesting is that a pcre_app_config() could establish default
> handling, and that new function could spread those settings back
> out into static variables that are private in the library.
I am not very knowledgeable about threads, but it seems to me that this
would not work, at least not in a Unix/Linux world (which is where I
operate) because the static variables would be shared by all threads.
Unless I am missing something (and that may well be true!) there is no
concept of "static variables that are private to the library" in
Unix/Linux. (I also suspect that in most Unix/Linux systems PCRE is
installed as a shared library.)
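Here is a minimal sketch of the pcre_app_config() idea that makes the objection concrete. Every name in it (app_config, app_config_set, lib_match_limit and so on) is hypothetical and not part of PCRE, and the default limits are illustrative only. The settings end up in file-scope statics, so the process has exactly one copy of them. Whichever thread calls app_config_set() last therefore decides what every thread sees.

```c
#include <stdlib.h>

/* Hypothetical: the kind of settings a pcre_app_config() might accept.
   None of these names exist in PCRE. */
typedef struct app_config {
  void *(*malloc_fn)(size_t);
  void (*free_fn)(void *);
  unsigned long match_limit;
  unsigned long match_limit_recursion;
} app_config;

/* File-scope statics: one copy per process, shared by every thread
   that calls into the library. Defaults here are illustrative. */
static void *(*lib_malloc)(size_t) = malloc;
static void (*lib_free)(void *) = free;
static unsigned long lib_match_limit = 10000000UL;
static unsigned long lib_match_limit_recursion = 10000000UL;

/* Spread the application's settings out into the library statics.
   A NULL allocator or deallocator means "keep the C library default". */
void app_config_set(const app_config *cfg) {
  lib_malloc = cfg->malloc_fn ? cfg->malloc_fn : malloc;
  lib_free = cfg->free_fn ? cfg->free_fn : free;
  lib_match_limit = cfg->match_limit;
  lib_match_limit_recursion = cfg->match_limit_recursion;
}

/* What any later call, from any thread, would see. */
unsigned long current_match_limit(void) { return lib_match_limit; }
```

If thread A configures a limit of 5000 and thread B then configures 42, both threads subsequently match with 42.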
I do already have an item on the Wish List that reads as follows:
. Write a wrapper to maintain a structure with specified runtime
parameters, such as recurse limit, and pass these to PCRE each time
it is called. Also maybe malloc and free.
In a threaded world, such a wrapper would have to keep the data in
thread-local storage, possibly passed as an argument. Not really sure
how this would work.
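The wish-list wrapper might look roughly like the sketch below. A small settings structure can be passed explicitly on each call, and a per-thread default held in C11 `_Thread_local` storage serves as the fallback. This rests on stated assumptions. run_settings, extra_standin, the X_* flags and both wrapper functions are hypothetical. extra_standin only mimics the match_limit and match_limit_recursion members of pcre_extra. A real wrapper would instead fill in those pcre_extra members, set PCRE_EXTRA_MATCH_LIMIT and PCRE_EXTRA_MATCH_LIMIT_RECURSION, and pass the block to pcre_exec().

```c
#include <stddef.h>

/* Hypothetical per-caller runtime parameters (not a PCRE type). */
typedef struct run_settings {
  unsigned long match_limit;
  unsigned long match_limit_recursion;
} run_settings;

/* Stand-in for the relevant part of pcre_extra; flag values are local
   to this sketch, not PCRE's. */
#define X_MATCH_LIMIT 0x1u
#define X_MATCH_LIMIT_RECURSION 0x2u
typedef struct extra_standin {
  unsigned flags;
  unsigned long match_limit;
  unsigned long match_limit_recursion;
} extra_standin;

/* Each thread has its own default-settings pointer (C11 _Thread_local). */
static _Thread_local const run_settings *thread_defaults = NULL;

void wrapper_set_thread_defaults(const run_settings *s) { thread_defaults = s; }

/* Build the extra block the wrapper would pass on to the matcher.
   An explicit argument wins; otherwise this thread's defaults are used;
   otherwise no flags are set and the library's built-in limits apply. */
extra_standin wrapper_make_extra(const run_settings *explicit_settings) {
  const run_settings *s =
      explicit_settings ? explicit_settings : thread_defaults;
  extra_standin x = {0, 0, 0};
  if (s != NULL) {
    x.flags = X_MATCH_LIMIT | X_MATCH_LIMIT_RECURSION;
    x.match_limit = s->match_limit;
    x.match_limit_recursion = s->match_limit_recursion;
  }
  return x;
}
```

thread_defaults holds only a pointer, so the settings it points to must stay alive for as long as the thread uses them.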
> I suggest not requiring that pcre_callout be set that way. I think
> it's not the same kind of configuration option because it's more
> likely to have a different value for different threads of an
> application. Consider adding a pcre_callout call-back function
> pointer as a member of the extra structure that the application
> can assign, next to the callout_data pointer that's already there.
Yes, I think that is something that I will do.
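One possible way to dispatch the proposed per-call callout is sketched below. In PCRE 8.x the extra structure already carries callout_data, and the callout function itself is the process-wide global pcre_callout. The callout member is the hypothetical addition. The *_standin types are simplified stand-ins, not PCRE's own definitions. The per-call function takes precedence and the global is the fallback. With neither set, the callout is ignored and matching continues.

```c
#include <stddef.h>

/* Stand-in for pcre_callout_block; only the fields this sketch needs. */
typedef struct callout_block_standin {
  int callout_number;
  void *callout_data;
} callout_block_standin;

typedef int (*callout_fn)(callout_block_standin *);

/* Proposed shape: a callout function pointer stored beside callout_data. */
typedef struct extra_with_callout {
  void *callout_data;
  callout_fn callout; /* hypothetical new member; NULL = use the global */
} extra_with_callout;

/* Process-wide default, playing the role of the global pcre_callout. */
static callout_fn global_callout = NULL;

void set_global_callout(callout_fn fn) { global_callout = fn; }

/* Called by the matcher at a (?C) item: per-call function first, then
   the global one. Returning 0 means "carry on matching", which is also
   the result when no callout function is set at all. */
int dispatch_callout(const extra_with_callout *extra, int number) {
  callout_block_standin cb = { number, extra ? extra->callout_data : NULL };
  callout_fn fn = (extra && extra->callout) ? extra->callout : global_callout;
  return fn ? fn(&cb) : 0;
}

/* Example callout: counts its invocations through callout_data. */
int counting_callout(callout_block_standin *cb) {
  int *count = cb->callout_data;
  ++*count;
  return 0;
}
```

Each thread can thus use its own callout and data without touching the global.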
> Trying to carry the memory management vectors through the PCRE code
> by starting with a pcre_compile3() seems difficult and may be more
> trouble than it's worth.
It would not, in fact, be difficult. There is already a local structure
that is carried through the code (it contains "static" variables);
adding one or more fields to it is straightforward. The same is true of
pcre_exec. The difficulty is in how to get the new data into
pcre_compile and other functions. The only way I can see of doing this
compatibly is to invent pcre_compile3 (etc).
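The compile3 approach might be organized as follows. The allocator pair travels in the context structure that is already passed through the compiler. The old entry point becomes a thin wrapper that supplies the defaults. The compiled object records its own deallocator, so it can later be released with the matching free function no matter which thread releases it. All names here are hypothetical, and copying the pattern text stands in for real compilation.

```c
#include <stdlib.h>
#include <string.h>

typedef void *(*malloc_fn)(size_t);
typedef void (*free_fn)(void *);

/* Stand-in for the internal structure already carried through the
   compiler; the two allocator fields are the proposed additions. */
typedef struct compile_ctx {
  malloc_fn alloc;
  free_fn dealloc;
  /* ... existing per-compile fields would live here ... */
} compile_ctx;

/* Stand-in for a compiled pattern. It remembers its deallocator so that
   whichever thread frees it uses the function that matches the allocator. */
typedef struct compiled {
  free_fn dealloc;
  size_t length;
  char text[]; /* flexible array member: a copy of the pattern */
} compiled;

static compiled *compile_internal(const char *pattern, const compile_ctx *ctx) {
  size_t len = strlen(pattern);
  compiled *re = ctx->alloc(sizeof *re + len + 1);
  if (re == NULL) return NULL;
  re->dealloc = ctx->dealloc;
  re->length = len;
  memcpy(re->text, pattern, len + 1);
  return re;
}

/* Hypothetical new entry point: allocator passed explicitly. */
compiled *compile3_sketch(const char *pattern, malloc_fn a, free_fn d) {
  compile_ctx ctx = { a ? a : malloc, d ? d : free };
  return compile_internal(pattern, &ctx);
}

/* Old-style entry point kept for compatibility: uses the defaults. */
compiled *compile2_sketch(const char *pattern) {
  return compile3_sketch(pattern, NULL, NULL);
}

void free_compiled(compiled *re) {
  if (re != NULL) re->dealloc(re);
}
```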
> Keep in mind that the thread that invokes pcre_compile2() may not be
> the same thread that will call pcre_exec() to use it.
Good point.
> In our case all the setups including compile() are done by one thread,
> and later the exec() using the compiled expressions are done by
> multiple other threads. Releasing the compiled expression is also
> done by the same thread that compiled them. That could matter a lot
> depending on whether threaded memory is like fork() or otherwise.
Indeed, and I'd rather not tangle with those issues because they may
differ from OS to OS.
> By the way, we do make use of (and rely upon) the PCRE memory
> management vectors.
I suspected that somebody might; it's good to know that the facility is
used. That is more of an incentive to improve it if possible.
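A common use of the memory management vectors is an allocator that keeps statistics. The following is a minimal sketch. The counting functions are hypothetical. The counter uses C11 atomics because, as described above, matching may run on several threads at once. The commented lines show the PCRE 8.x global function pointers pcre_malloc and pcre_free being set.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Blocks handed out through the vectors and not yet released. Atomic
   because the allocator may be called from several threads. */
static atomic_long outstanding_blocks = 0;

void *counting_malloc(size_t size) {
  void *p = malloc(size);
  if (p != NULL) atomic_fetch_add(&outstanding_blocks, 1);
  return p;
}

void counting_free(void *p) {
  if (p == NULL) return;
  atomic_fetch_sub(&outstanding_blocks, 1);
  free(p);
}

long counting_outstanding(void) {
  return atomic_load(&outstanding_blocks);
}

/* With PCRE 8.x, an application installs these once, before compiling
   any pattern, so the same pair allocates and frees every block:
     #include <pcre.h>
     pcre_malloc = counting_malloc;
     pcre_free   = counting_free;
   These globals are process-wide, which is the limitation discussed
   in this thread. */
```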
Philip
PS: I've just answered a post (Bugzilla 1049) about UTF-16. Taking that
along with this issue, it is almost making a case that the current API
has been pushed to its limits and that a totally new API should be
created. I am not really happy about this for all sorts of reasons (not
least because of the amount of work!). However, the current API has
lasted a long time ... I think the last incompatible change was in 1998
or thereabouts.