[pcre-dev] PCRE2 is released

Author: ph10
Date:  
To: pcre-dev
Subject: [pcre-dev] PCRE2 is released
The time has come to release the first version of PCRE2 - that is, PCRE with a
revised API, which has been available for testing for some weeks. You can
download it from here:

ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-10.00.tar.gz
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-10.00.tar.bz2
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-10.00.zip

This does not replace the pcre-8.36 release, because the API is not
compatible. Further 8.xx releases will happen if bugs are fixed, but
development work will concentrate on PCRE2. New projects should
therefore use PCRE2 if possible.

There are no specific release notes, because there has been so much change, but
here are some things to consider:

1. You should treat this as a new project, not just a drastic update to PCRE1.
A lot has changed, though the underlying structure of the code is much the
same. We have started the version numbers from 10.00 so as to avoid any
confusion with PCRE1 versions.

2. Note that --enable-utf and --enable-ucp have been amalgamated into
--enable-unicode, and this is now the default.

3. I have updated the CMake files as well as the configure files. CMake works
for me on Linux, but I have no way of testing it on Windows. There is as yet no
RunTest.bat file for Windows. I'm hoping a Windows user will provide one by
updating the PCRE1 version.

4. Many names have been changed; in particular, pcre_exec() has become
pcre2_match(). The PCRE_JAVASCRIPT_COMPAT option has been split into
independent functional options PCRE2_ALT_BSUX, PCRE2_ALLOW_EMPTY_CLASS, and
PCRE2_MATCH_UNSET_BACKREF.

5. Patterns, subject strings, and replacement strings may all contain binary
zeros and for this reason are always passed as a pointer and a length. However,
the length may be given as PCRE2_ZERO_TERMINATED for zero-terminated strings.

6. The output vector that holds offsets of matched strings is now a vector of
PCRE2_SIZE elements instead of ints. PCRE2_SIZE is expected to be an unsigned
integer type and is currently defined as size_t. The special value PCRE2_UNSET
is used for unset elements.

7. Error handling has been redesigned and error messages are available in all
code unit widths. The error codes have been redesignated.

8. Explicit "studying" of compiled patterns has been abolished - it now always
happens automatically. JIT compiling is done by calling a new function,
pcre2_jit_compile(), after a successful return from pcre2_compile().

9. The capture_last field of the callout structure is now an unsigned integer,
set to zero if there have been no captures.

10. The new pcre2test program has been completely re-written. The old one
started as a quick hack, but with so many added options its syntax became
horribly messy. The input format has been redesigned and is mostly not
compatible with the old program.

11. There are as yet no facilities for saving/restoring a compiled pattern, but
this feature is being worked on and should appear in a future release.

12. There is no C++ wrapper. The existing PCRE1 wrapper has no maintainer at
the moment, so it is unlikely to be ported to PCRE2. It now seems to me that in
fact it is best NOT to include such a wrapper with PCRE2, but to encourage
somebody to create and maintain a separate project - or several projects, as I
think there are different views on how best to do the wrapping.

13. There is a new function called pcre2_substitute() that performs "find and
replace" operations.

14. The makevp.* files (for Virtual Pascal) are no longer included; if anybody
needs these for PCRE2, please update the old files and I will put new versions
in a future release.

15. ...and no doubt there are plenty of things I've forgotten.

Happy New Year!
Philip

--
Philip Hazel