[pcre-dev] PCRE2 alpha testers wanted

Author: ph10
Date:  
To: pcre-dev
Subject: [pcre-dev] PCRE2 alpha testers wanted
To PCRE2 users:

Over the last few months I have been refactoring the way pcre2_compile()
works. The new code, tests, and updated documentation have just been
committed to the repository. You can check out PCRE2 like this:

svn co svn://vcs.exim.org/pcre2/code/trunk pcre2

I realized that most of the parsing could be moved into the relatively
new pre-pass function (introduced at 10.20) that identifies named and
numbered groups before the "real" compile. This function now creates a
parsed version of the pattern for the actual compile, which simplifies
things because (amongst other things) the handling of escapes and
skipping comments happens in one place only. (More details are in the
HACKING file.) While doing this work, I discovered a few minor bugs and
Perl incompatibilities that I was able to fix, including:

  (a) \Q\E in the middle of a quantifier such as A+\Q\E+ is now ignored
      instead of giving an invalid quantifier error.
  (b) {0} can now be used after a group in a lookbehind assertion;
      previously this caused an "assertion is not fixed length" error.
  (c) Perl always treats (?(DEFINE) as a "define" group, even if a group
      with the name "DEFINE" exists. PCRE2 now does likewise.
  (d) A recursion condition test such as (?(R2)...) must now refer to an
      existing subpattern.
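
For readers who want a feel for the "parse once, compile from the parsed
form" idea, here is a minimal sketch in C. It is not PCRE2's code: the
names tok_kind, token and prepass() are hypothetical, and every escape
other than \Q and \E is simply treated as an escaped literal, which is a
deliberate simplification. It only shows how \Q...\E quoting and (?#...)
comments can be resolved in one place so that a later stage never sees
them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; these are not PCRE2's internal codes. */
enum tok_kind { TOK_LITERAL, TOK_META };

struct token {
    enum tok_kind kind;
    char ch;
};

/* Pre-pass sketch: turn pattern text into tokens. \Q...\E quoting and
   (?#...) comments are resolved here, in one place only.
   Returns the number of tokens written (at most max). */
size_t prepass(const char *pat, struct token *out, size_t max)
{
    size_t n = 0;
    int quoting = 0;                      /* non-zero inside \Q...\E */

    assert(pat != NULL && out != NULL);

    for (size_t i = 0; pat[i] != '\0' && n < max; i++) {
        char c = pat[i];

        if (quoting) {
            if (c == '\\' && pat[i + 1] == 'E') {
                quoting = 0;              /* end of quoted run */
                i++;
            } else {
                out[n].kind = TOK_LITERAL;
                out[n].ch = c;
                n++;
            }
            continue;
        }

        if (c == '\\' && pat[i + 1] == 'Q') {
            quoting = 1;                  /* start of quoted run */
            i++;
            continue;
        }

        if (c == '\\' && pat[i + 1] == 'E') {
            i++;                          /* stray \E: a no-op in this sketch */
            continue;
        }

        if (c == '(' && pat[i + 1] == '?' && pat[i + 2] == '#') {
            i += 3;                       /* skip the comment body */
            while (pat[i] != '\0' && pat[i] != ')')
                i++;
            if (pat[i] == '\0')
                break;                    /* unterminated comment: stop */
            continue;
        }

        if (c == '\\' && pat[i + 1] != '\0') {
            out[n].kind = TOK_LITERAL;    /* simplification: escaped literal */
            out[n].ch = pat[i + 1];
            n++;
            i++;
            continue;
        }

        out[n].kind = strchr("+*?{}()|[].^$\\", c) ? TOK_META : TOK_LITERAL;
        out[n].ch = c;
        n++;
    }
    return n;
}
```

With this sketch, the pattern A+\Q\E+ from item (a) yields three tokens:
a literal A followed by two + metacharacters. The empty \Q\E leaves
nothing behind, so whatever consumes the tokens sees the two + signs as
adjacent. How PCRE2 itself represents the parsed pattern is described in
the HACKING file, not here.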


The result of this work is that the compiled pcre2_compile.c file (the
.o file) is almost 40% smaller on my box. Paradoxically, the source file
is bigger, but most of that is a new debugging function that is not
normally compiled. With that excluded, the source is just a bit bigger,
which is probably because I put in extra comments. I have not done
exhaustive speed tests, but some simple pcre2test timings on the
testinput1 file suggest that the new code runs a few percent faster,
which is pleasing.

One effect of the refactoring is that some error numbers and messages
have changed, and the pattern offset given for compile errors is no
longer always that of the right-most character that has been read. In
particular, for a variable-length lookbehind assertion error it now
points to the start of the assertion. Another change is that when a
callout appears before a group, the "length of next pattern item" that
is passed now gives just the length of the opening parenthesis item, not
the length of the whole group. A length of zero is now given only for a
callout at the end of the pattern. Automatic callouts are no longer
inserted before and after explicit callouts in the pattern.

The new code runs all the tests that I have thrown at it, but I fully
expect that I have broken something that does not show up in the tests.
I would be grateful if someone else could find the time to download from
the repository and try it out in a different environment. There is no
great rush as it is likely to be some months before the next release.

Meanwhile, I will get back to a number of unrelated issues that have
been reported over the summer.

Philip

--
Philip Hazel