[pcre-dev] PCRE2 first Release Candidate

Author: ph10
Date:  
To: pcre-dev
Subject: [pcre-dev] PCRE2 first Release Candidate
A year or so ago there were discussions on this list about a new API for
PCRE. The current one is 17 years old and has been greatly hacked around
to accommodate new features while retaining compatibility. Over most of
this year I have been working on implementing the new API, known as
PCRE2. A first attempt at a release candidate is now available here:

ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Testing/pcre2-10.00-RC1.tar.gz
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Testing/pcre2-10.00-RC1.tar.bz2
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Testing/pcre2-10.00-RC1.zip

This is the first version of a distribution tarball, though people have
been testing from the repository sources.

Please test this in any way you can and report any problems or errors
and typos in the documentation. General comments about the API are also
welcome, though it is too late to make any drastic changes. However, if
anything really serious crops up, there may have to be changes in the
API, but I hope that if this happens the changes will be minimal.

There are no specific release notes, because there has been so much
change, but here are some things to consider:

1. You should treat this as a new project, not just a drastic update to
PCRE1. A lot has changed, though the underlying structure of the code is
much the same. We have started the version numbers from 10.00 so as to
avoid any confusion with PCRE1 versions.

2. Note that --enable-utf and --enable-ucp have been amalgamated into
--enable-unicode, and this is now the default.

3. I have updated the CMake files as well as the configure files. CMake
works for me on Linux, but I have no way of testing it on Windows. There
is as yet no RunTest.bat file for Windows. I'm hoping a Windows user
will provide one by updating the PCRE1 version.

4. Many names have been changed; in particular, pcre_exec() has become
pcre2_match(). The PCRE_JAVASCRIPT_COMPAT option has been split into
independent functional options PCRE2_ALT_BSUX, PCRE2_ALLOW_EMPTY_CLASS,
and PCRE2_MATCH_UNSET_BACKREF.

5. Patterns, subject strings, and replacement strings may all contain
binary zeros and for this reason are always passed as a pointer and a
length. However, the length may be given as PCRE2_ZERO_TERMINATED for
zero-terminated strings.

6. The output vector that holds offsets of matched strings is now a
vector of PCRE2_SIZE elements instead of ints. PCRE2_SIZE is expected to
be an unsigned integer type and is currently defined as size_t. The
special value PCRE2_UNSET is used for unset elements.

7. Error handling has been redesigned and error messages are available
in all code unit widths. The error codes have been redesignated.

8. Explicit "studying" of compiled patterns has been abolished - it now
always happens automatically. JIT compiling is done by calling a new
function, pcre2_jit_compile(), after a successful return from
pcre2_compile().

9. The capture_last field of the callout structure is now an unsigned
integer, set to zero if there have been no captures.

10. The new pcre2test program has been completely re-written. The old
one started as a quick hack, but with so many added options its syntax
became horribly messy. The input format has been redesigned and is
mostly not compatible with the old program.

11. There are as yet no facilities for saving/restoring a compiled
pattern. This was always a hack in PCRE1, added when processors were
slower, and before the existence of JIT support (compiled JIT code
cannot be saved). However, thought is being given to a way of providing
this facility in future.

12. There is no C++ wrapper. The existing PCRE1 wrapper has no
maintainer at the moment, so it is unlikely to be ported to PCRE2. It
now seems to me that in fact it is best NOT to include such a wrapper
with PCRE2, but to encourage somebody to create and maintain a separate
project - or several projects, as I think there are different views on
how best to do the wrapping.

13. There is a new function called pcre2_substitute() that performs
"find and replace" operations.

14. The makevp.* files (for Virtual Pascal) are no longer included; if
anybody needs these for PCRE2, please update the old files and I will put
new versions in a future release.

15. ...and no doubt there are plenty of things I've forgotten.

Philip

--
Philip Hazel