Author: Philip Hazel Date: To: Kevin Connor Arpe CC: pcre-dev Subject: Re: [pcre-dev] pcre_compile.c: error_texts
On Mon, 31 Dec 2012, Kevin Connor Arpe wrote:
> Apologies, I should be clearer. By "first" I do not mean multiple errors
> in the same pattern. I mean multiple, sequential calls to pcre_compile().
> Imagine the scenario above where user is entering regex in a GUI. This
> causes continuous recompile -- after each keystroke. Each recompile will
> be different (probably), and may potentially fail. Another case: You have
> a big file of regexes that you want to try to compile (test, etc).
I don't see how that would work with a shared library. With a static
library, yes, you could modify the data in the module. Note that there
are no static variables in PCRE other than those that are data tables
that are never changed.
[A thought: perhaps I don't understand shared libraries. Does each user
get their own static section? If so, what I wrote above is nonsense.]
The pcre_compile2() function returns an error number as well as an error
message; I suppose one could contemplate pcre_compile3() that returns
only the error number, but it would be rather more work for, I suspect,
not much gain.
> So when I say "first", I literally mean the first time error_texts is ever
> scanned (after PCRE lib is loaded into memory). At that point, we build
> the indexer. For subsequent compiles that fail, we will have faster
> error lookup.
I really don't believe you would notice much difference. Especially in
the example you gave of a human interacting. The time taken for a modern
cpu to scan through no more than 75 messages is minuscule. Using
pcretest interactively, for example, gives instant responses, even on
this old desktop computer of mine.
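To make the cost concrete, here is a minimal sketch of the kind of linear scan being discussed. It assumes the messages are stored back to back, each terminated by a zero byte, with an empty string marking the end of the table. The names demo_texts and demo_find_text and the three messages are hypothetical, made up for illustration; this is not a copy of the code in pcre_compile.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table: messages stored back to back, each ending in '\0'.
   The implicit terminator of the whole literal acts as an empty string
   that marks the end of the table. */
static const char demo_texts[] =
  "no error\0"
  "missing )\0"
  "nothing to repeat\0";

/* Return the n-th message (counting from 0), or NULL if n is past the end. */
static const char *demo_find_text(int n)
{
  const char *s = demo_texts;
  assert(n >= 0);
  for (; n > 0; n--) {
    while (*s++ != '\0') { }       /* step over one message and its terminator */
    if (*s == '\0') return NULL;   /* reached the end marker: n is out of range */
  }
  return s;
}
```

With a table of this shape, looking up message n reads every byte of the n messages before it, which for a few dozen short strings is a few thousand byte comparisons at most.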
Another way of speeding it up - though again I do not believe it is
worth doing - would be to store the messages as a concatenated sequence
of "BCPL strings", that is, with a byte containing the length of the
string at the start. Then skipping over them is even faster, and there
would be no time wasted doing indexing in the cases when only one regex
is being compiled.
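The length-prefixed layout could look roughly like the sketch below. It rests on several assumptions: each message is preceded by one byte holding its length, so no message can exceed 255 bytes; a length byte of zero marks the end of the table, so empty messages cannot be stored; and the names demo_bcpl_texts and demo_find_bcpl, like the messages themselves, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table: one length byte (written in octal), then the text.
   The implicit '\0' at the end of the literal serves as a zero length
   byte marking the end of the table. */
static const unsigned char demo_bcpl_texts[] =
  "\010" "no error"             /*  8 bytes */
  "\011" "missing )"            /*  9 bytes */
  "\021" "nothing to repeat";   /* 17 bytes */

/* Return a pointer to the n-th message (counting from 0) and store its
   length in *len. Returns NULL if n is past the end of the table.
   The returned text is NOT zero-terminated. */
static const unsigned char *demo_find_bcpl(int n, size_t *len)
{
  const unsigned char *s = demo_bcpl_texts;
  assert(n >= 0 && len != NULL);
  for (; n > 0; n--) {
    if (*s == 0) return NULL;   /* ran off the end of the table */
    s += 1 + *s;                /* one jump per message: length byte plus text */
  }
  if (*s == 0) return NULL;
  *len = *s;
  return s + 1;
}
```

Skipping a message here is a single addition rather than a byte-by-byte search for the terminator. The price is that the text is not zero-terminated, so a caller has to use the returned length, for example with printf("%.*s", (int)len, p).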