[pcre-dev] pcre_compile.c: error_texts

Top Page
Delete this message
Author: Kevin Connor Arpe
Date:  
To: pcre-dev
Subject: [pcre-dev] pcre_compile.c: error_texts
Hello,

This is my first mail to this list. I am doing some hacking on GLib (part
of GNOME), specifically their GRegex module which is built on top of this
(incredible) library.

I was digging into pcre_compile.c
(here<http://vcs.pcre.org/viewvc/code/trunk/pcre_compile.c?revision=1233&view=markup>)
to understand ownership of 'errorptr'. I know now that ownership is not
transferred to the caller.

I found this line in pcre_compile2 ():
>> *errorptr = find_error_text(errorcode);


Following the trail, I found this comment:

/* The texts of compile-time error messages. These are "char *" because they
are passed to the outside world. Do not ever re-use any error number,
because
they are documented. Always add a new error instead. Messages marked DEAD
below
are no longer used. This used to be a table of strings, but in order to
reduce
the number of relocations needed when a shared library is loaded
dynamically,
it is now one long string. We cannot use a table of offsets, because the
lengths of inserts such as XSTRING(MAX_NAME_SIZE) are not known. Instead, we
simply count through to the one we want - this isn't a performance issue
because these strings are used only when there is a compilation error.

Each substring ends with \0 to insert a null character. This includes the
final
substring, so that the whole string ends with \0\0, which can be detected
when
counting through. */

I have no doubt that this design exists for very good reasons, as explained
in the comment. (I had never considered relocation impact from a huge list
of char pointers.) Regarding the note about "performance issue", I have an
idea to improve.

What if we create a (static) list of offsets that is only initialised at
first compile error?

Something like:
>> static int error_text_offsets[78] = { -1 };


I am willing to write the code and submit a patch. However, since I am new
to this project, I don't know enough about its style (example: cathedral
vs. bazaar?). Also, I wanted to have my idea considered before writing
code.

If you think this idea is worthy, there are two areas of concern I can
think of:

Sizing error_text_offsets
=================
I am unsure if error_text_offsets should be sized (1) static/precisely, (2)
static/liberally, or (3) dynamically ((3a) precisely or (3b) liberally).
(1) static/precisely: we need to keep error_text_offsets precisely sized to
error_texts.  Today, it looks to be 78.
     This uses the least amount of memory, but incurs higher
risk/maintenance costs.
     I'm all about low-maintenance ("future proofing") where
possible/reasonable.
     This would probably require a test to be written to ensure the precise
size is correct.
     Imagine a scenario where error_texts has an addition, but
error_text_offsets does not grow.
     I have not yet looked if PCRE includes testing as part of its release
procedure.  I assume yes.
(2) static/liberally: something big, but not crazy, like 255.  ex: static
int error_text_offsets[255]
     If error_texts grows over time, we have space.  Low maintenance, but
takes a bit more memory.
(3) dynamically: static int *error_text_offsets = NULL
          allocate on first use via malloc() or PRCE's built-in/override
method
          Again: do we allocate precisely or liberally?
          (a) Precisely can be done via static constant or at runtime
(counting the number of embedded strings).
          (b) Liberally: Allocate something big enough, like 255, to
prevent double scan/two-pass over error_texts.
                        First pass: Calculate size of offsets array.
 Second pass: Calculate and store offsets.


Multithreading
==========
Finally, what about multithreading (MT)? Imagine the scenario where two
parallel threads call pcre_compile2 () independently. Both their compiles
fail. The init routine for error_text_offsets needs to be MT-aware.

Generally, (in C) I don't know how to properly handle that issue when
initialising this list the first time. (I know how to do it in Java/C#,
but there you get a huge framework library with every install. C is bit
trickier due to cross platform issues -- pthreads, etc.) Plus, I know
nothing about how PRCE handles/supports multithreading.

Thanks,
Arpe