Re: [pcre-dev] pcre_compile.c: error_texts

Top Page
Delete this message
Author: Zoltán Herczeg
Date:  
To: Kevin Connor Arpe
CC: pcre-dev
Subject: Re: [pcre-dev] pcre_compile.c: error_texts
Hi Arpe,

thanks for your suggestion! Just my two cents: the reason of the search is that it shouldn't be a bottleneck. I suspect it takes less than 1% of the total compilation time, although I never really measured it. I think before we jump to coding, we should measure it somehow.

I wouldn't worry about the thread safety of your static array. If you write the same values to the memory from different threads, that is safe. Therefore, if for some reasons the array would be initialized multiple times, that should be thread safe (as long as you write exactly the same values to the array again).

Alternatively, we can simply use something like:
static char* error_texts[] = { "text1", "text2", ... };
error_texts[i] would return with the text corresponding to error code.

Regards,
Zoltan

Kevin Connor Arpe <kevinarpe@???> írta:
>Hello,>
>

This is my first mail to this list. I am doing some hacking on GLib (part>
of GNOME), specifically their GRegex module which is built on top of this>
(incredible) library.>
>

I was digging into pcre_compile.c>
(here<http://vcs.pcre.org/viewvc/code/trunk/pcre_compile.c?revision=1233&view=markup>)>
to understand ownership of 'errorptr'. I know now that ownership is not>
transferred to the caller.>
>

I found this line in pcre_compile2 ():>
>> *errorptr = find_error_text(errorcode);>
>

Following the trail, I found this comment:>
>

/* The texts of compile-time error messages. These are "char *" because they>
are passed to the outside world. Do not ever re-use any error number,>
because>
they are documented. Always add a new error instead. Messages marked DEAD>
below>
are no longer used. This used to be a table of strings, but in order to>
reduce>
the number of relocations needed when a shared library is loaded>
dynamically,>
it is now one long string. We cannot use a table of offsets, because the>
lengths of inserts such as XSTRING(MAX_NAME_SIZE) are not known. Instead, we>
simply count through to the one we want - this isn't a performance issue>
because these strings are used only when there is a compilation error.>
>

Each substring ends with \0 to insert a null character. This includes the>
final>
substring, so that the whole string ends with \0\0, which can be detected>
when>
counting through. */>
>

I have no doubt that this design exists for very good reasons, as explained>
in the comment. (I had never considered relocation impact from a huge list>
of char pointers.) Regarding the note about "performance issue", I have an>
idea to improve.>
>

What if we create a (static) list of offsets that is only initialised at>
first compile error?>
>

Something like:>
>> static int error_text_offsets[78] = { -1 };>
>

I am willing to write the code and submit a patch. However, since I am new>
to this project, I don't know enough about its style (example: cathedral>
vs. bazaar?). Also, I wanted to have my idea considered before writing>
code.>
>

If you think this idea is worthy, there are two areas of concern I can>
think of:>
>

Sizing error_text_offsets>
=================>
I am unsure if error_text_offsets should be sized (1) static/precisely, (2)>
static/liberally, or (3) dynamically ((3a) precisely or (3b) liberally).>
(1) static/precisely: we need to keep error_text_offsets precisely sized to>
error_texts.  Today, it looks to be 78.>
     This uses the least amount of memory, but incurs higher>
risk/maintenance costs.>
     I'm all about low-maintenance ("future proofing") where>
possible/reasonable.>
     This would probably require a test to be written to ensure the precise>
size is correct.>
     Imagine a scenario where error_texts has an addition, but>
error_text_offsets does not grow.>
     I have not yet looked if PCRE includes testing as part of its release>
procedure.  I assume yes.>
(2) static/liberally: something big, but not crazy, like 255.  ex: static>
int error_text_offsets[255]>
     If error_texts grows over time, we have space.  Low maintenance, but>
takes a bit more memory.>
(3) dynamically: static int *error_text_offsets = NULL>
          allocate on first use via malloc() or PRCE's built-in/override>
method>
          Again: do we allocate precisely or liberally?>
          (a) Precisely can be done via static constant or at runtime>
(counting the number of embedded strings).>
          (b) Liberally: Allocate something big enough, like 255, to>
prevent double scan/two-pass over error_texts.>
                        First pass: Calculate size of offsets array.>
 Second pass: Calculate and store offsets.>

>

Multithreading>
==========>
Finally, what about multithreading (MT)? Imagine the scenario where two>
parallel threads call pcre_compile2 () independently. Both their compiles>
fail. The init routine for error_text_offsets needs to be MT-aware.>
>

Generally, (in C) I don't know how to properly handle that issue when>
initialising this list the first time. (I know how to do it in Java/C#,>
but there you get a huge framework library with every install. C is bit>
trickier due to cross platform issues -- pthreads, etc.) Plus, I know>
nothing about how PRCE handles/supports multithreading.>
>

Thanks,>
Arpe>
-- >
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev >