[pcre-dev] [Bug 2283] RE::error() is nonempty even if compil…

Author: admin
To: pcre-dev
Subject: [pcre-dev] [Bug 2283] RE::error() is nonempty even if compilation succeeds

--- Comment #12 from Philip Hazel <ph10@???> ---
OK, I've done some further testing and studied the code; I hadn't looked at
this C++ code before. The original code always compiles the pattern twice -
once as is and then "anchored" at each end. This seems inefficient, because
you don't know which of the two versions is actually needed, but for small
patterns I guess it's cheap. But see below...

I see there is no provision for using the JIT accelerator from the C++ wrapper.
The addition of JIT happened after the C++ wrapper was contributed; the (then)
maintainer either didn't see the possibility or didn't need the feature.

Anyway, I'm now sure your patch is wrong, because it does not confine its
search for (*UTF8) etc to the very start of the pattern. As it happens, you can
get away with this because the first, unanchored, compile will throw the error,
before this wrapping code is obeyed. However, it could be confused by putting
(*UTF8) in a comment, for example. Also, for a very long pattern, it's a waste
of time searching right along it. (Some people's patterns are thousands of
bytes long.)
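
To illustrate confining the check to the very start of the pattern, here is a
minimal sketch. It is not the wrapper's actual code: the function names, the
shortened item list, and the "(?:...)\z" wrapping form are all assumptions made
for the example. It only ever looks at the front of the string, so a (*UTF8)
inside a comment is left alone and a long pattern is not scanned.

```cpp
#include <cstddef>
#include <string>

// Illustrative subset of start-of-pattern items, in reverse alphabetic
// order. This is NOT the wrapper's real table, only an assumption.
static const char* const kLeadingItems[] = {
    "(*UTF8)", "(*UTF)", "(*UCP)", "(*NO_START_OPT)",
    "(*CRLF)", "(*CR)", "(*ANYCRLF)", "(*ANY)",
};

// Hypothetical helper: length of the run of recognised items that sits
// at the very start of the pattern. Stops at the first non-matching text.
std::size_t leading_items_length(const std::string& pattern) {
    std::size_t pos = 0;
    bool matched = true;
    while (matched) {
        matched = false;
        for (const char* item : kLeadingItems) {
            const std::string s(item);
            if (pattern.compare(pos, s.size(), s) == 0) {
                pos += s.size();   // consume this item, look for another
                matched = true;
                break;
            }
        }
    }
    return pos;
}

// Hypothetical helper: keep the leading items in front and wrap only the
// remainder, assuming an end-anchoring wrapper of the form (?:...)\z.
std::string wrap_end_anchored(const std::string& pattern) {
    const std::size_t n = leading_items_length(pattern);
    return pattern.substr(0, n) + "(?:" + pattern.substr(n) + ")\\z";
}
```

With this sketch, "(*UTF8)abc" would be wrapped as "(*UTF8)(?:abc)\z", while a
pattern such as "a(?#(*UTF8))b" is wrapped whole, because the item is not at
the start.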

Other comments: My code exploited the fact that the list of (*UTF8) etc is in
reverse alphabetic order (I reckoned (*UTF8) was likely to be most common).

LONG TERM: I would suggest that you think about moving to PCRE2, for several
reasons. The 10.xx releases do not have the C++ wrapper, but
you could copy the PCRE1 version and update it; personally I'd be tempted to
re-implement it. If performance is an issue for you, a way of using JIT would
help. The messy code we are currently discussing could be dispensed with,
because PCRE2 supports an ENDANCHORED option. Both ANCHORED and ENDANCHORED can
be specified at match time, so in theory there is no need for the double
compilation that is currently used, though there is a caveat: if specified at
match time, JIT cannot be used, so there's a compile vs match performance
