Re: [pcre-dev] pcre_compile.c: error_texts

Author: Giuseppe D'Angelo
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] pcre_compile.c: error_texts
On 1 January 2013 14:12, Philip Hazel <ph10@???> wrote:
> On Mon, 31 Dec 2012, Kevin Connor Arpe wrote:
>
>> Apologies, I should be clearer. By "first" I do not mean multiple errors
>> in the same pattern. I mean multiple, sequential calls to pcre_compile().
>> Imagine the scenario above where user is entering regex in a GUI. This
>> causes continuous recompile -- after each keystroke. Each recompile will
>> be different (probably), and may potentially fail. Another case: You have
>> a big file of regexes that you want to try to compile (test, etc).
>
> I don't see how that would work with a shared library. With a static
> library, yes, you could modify the data in the module. Note that there
> are no static variables in PCRE other than those that are data tables
> that are never changed.
>
> [A thought: perhaps I don't understand shared libraries. Does each user
> get their own static section? If so, what I wrote above is nonsense.]


Each user gets its own copy of the writable data section; read-only
data sections are instead shared between all users. With the proposed
approach, if I understood it correctly, every user would have to build
the table of offsets when it hits its first pattern compilation error.
But all those tables would be identical and effectively read-only (you
build it once, then only ever read from it), so every process would pay
for a private copy of data that could have been shared. That doesn't
sound like a good idea.
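
To make the concern concrete, here is a minimal sketch of the lazy
approach as I understand it. Everything in it is an assumption for
illustration: the demo_/lazy_ names are made up, and the four messages
are only stand-ins for the real error_texts contents. It is not thread
safe as written.

```c
#include <stddef.h>

/* Hypothetical stand-in for error_texts: NUL-separated messages.
   Being const, it can be placed in shared read-only data. */
static const char lazy_texts[] =
  "no error\0"
  "\\ at end of pattern\0"
  "\\c at end of pattern\0"
  "unrecognized character follows \\\0";

#define LAZY_COUNT 4

/* Not const: this lives in writable data, so every process that
   loads the library gets (and has to fill) its own private copy. */
static unsigned short lazy_offsets[LAZY_COUNT];
static int lazy_ready = 0;

/* One O(total length) scan that records where each message starts. */
static void lazy_build_offsets(void)
{
  size_t pos = 0;
  int i;
  for (i = 0; i < LAZY_COUNT; i++) {
    lazy_offsets[i] = (unsigned short)pos;
    while (lazy_texts[pos] != '\0') pos++;
    pos++;  /* step over the terminating NUL */
  }
  lazy_ready = 1;
}

/* Builds the index on the first call, then answers in O(1). */
const char *lazy_lookup(int n)
{
  if (n < 0 || n >= LAZY_COUNT) return "error text not found";
  if (!lazy_ready) lazy_build_offsets();
  return lazy_texts + lazy_offsets[n];
}
```

The point is the pair of non-const statics: their contents end up the
same in every process, yet none of it can be shared.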

>> So when I say "first", I literally mean the first time error_texts is ever
>> scanned (after PCRE lib is loaded into memory). At that point, we build
>> the indexer. For subsequent compiles that fail, we will have faster
>> error lookup.
>
> I really don't believe you would notice much difference. Especially in
> the example you gave of a human interacting. The time taken for a modern
> cpu to scan through no more than 75 messages is minuscule. Using
> pcretest interactively, for example, gives instant responses, even on
> this old desktop computer of mine.


Indeed, for 75 strings I don't think the difference would be
noticeable. If we can figure out how to build the offset table
statically [1], then it becomes a plain space/time tradeoff: O(1)
instead of O(n) lookup time, at an additional O(n) memory cost
(roughly 4n bytes).
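
One possible way to get such a table without any run-time work is to
let the compiler compute the offsets with sizeof and offsetof. This is
only a sketch under my own assumptions: the pool_ names and the
POOL_MESSAGES list are hypothetical, the messages are stand-ins, and
it is a different route from generating the table with an external
script.

```c
#include <stddef.h>

/* Hypothetical message list; each text may be any string literal,
   including one assembled from adjacent literals and macros. */
#define POOL_MESSAGES(X) \
  X(0, "no error") \
  X(1, "\\ at end of pattern") \
  X(2, "\\c at end of pattern") \
  X(3, "unrecognized character follows \\")

/* One char array member per message, sized to fit text plus NUL. */
struct pool_strings {
#define X(id, text) char m##id[sizeof(text)];
  POOL_MESSAGES(X)
#undef X
};

/* The message bytes themselves: const, so shareable read-only data. */
static const struct pool_strings pool_data = {
#define X(id, text) text,
  POOL_MESSAGES(X)
#undef X
};

/* Offsets are compile-time constants: no relocations, no init code.
   unsigned short is enough for this demo; a real table would pick a
   type wide enough for its total size. */
static const unsigned short pool_offsets[] = {
#define X(id, text) (unsigned short)offsetof(struct pool_strings, m##id),
  POOL_MESSAGES(X)
#undef X
};

#define POOL_COUNT (sizeof(pool_offsets) / sizeof(pool_offsets[0]))

/* O(1) lookup by message number. */
const char *pool_lookup(int n)
{
  if (n < 0 || (size_t)n >= POOL_COUNT) return "error text not found";
  return (const char *)&pool_data + pool_offsets[n];
}
```

Both the strings and the offsets are const here, so nothing has to be
built or duplicated per process.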

> Another way of speeding it up - though again I do not believe it is
> worth doing - would be to store the messages as a concatenated sequence
> of "BCPL strings", that is, with a byte containing the length of the
> string at the start. Then skipping over them is even faster, and there
> would be no time wasted doing indexing in the cases when only one regex
> is being compiled.


But again, this requires some sort of preprocessing to deal with the
XSTRING(...) expansions, whose values are fixed at configure time. And
if we accept the preprocessing step, then I think we can just as well
build the array of offsets statically.
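
For reference, a length-prefixed table could look roughly like the
sketch below. The bcpl_ names and the four messages are made up for
illustration, and the length bytes are written by hand, which is
exactly the part that would need generating once a message contains
text whose length is only known at configure time.

```c
#include <stddef.h>

/* Hypothetical table: one length byte (octal escape), then that many
   characters, with no NUL after each message. The implicit NUL at the
   very end acts as a zero-length terminator. */
static const unsigned char bcpl_texts[] =
  "\010" "no error"                           /*  8 characters */
  "\023" "\\ at end of pattern"               /* 19 characters */
  "\024" "\\c at end of pattern"              /* 20 characters */
  "\040" "unrecognized character follows \\"; /* 32 characters */

/* Returns a pointer to message n and stores its length in *len, or
   returns NULL if n is out of range. The text is not NUL-terminated. */
const unsigned char *bcpl_lookup(int n, size_t *len)
{
  const unsigned char *p = bcpl_texts;
  if (n < 0) return NULL;
  while (n-- > 0) {
    if (*p == 0) return NULL;  /* ran past the last message */
    p += 1 + *p;               /* jump over length byte and text */
  }
  if (*p == 0) return NULL;
  *len = *p;
  return p + 1;
}
```

Skipping is one addition per message instead of a scan for the NUL,
but callers have to carry the length around, e.g. printing with
"%.*s" rather than "%s".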

Cheers,
--
Giuseppe D'Angelo

[1] See also
* http://www.macieira.org/blog/2011/07/table-driven-methods-with-no-relocations/
* http://websvn.kde.org/trunk/KDE/kdesdk/scripts/generate_string_table.pl?view=markup