Revision: 672
http://vcs.pcre.org/viewvc?view=rev&revision=672
Author: ph10
Date: 2011-08-23 17:45:55 +0100 (Tue, 23 Aug 2011)
Log Message:
-----------
Update non-manpage documentation for JIT.
Modified Paths:
--------------
code/trunk/AUTHORS
code/trunk/ChangeLog
code/trunk/HACKING
code/trunk/LICENCE
code/trunk/NEWS
code/trunk/NON-UNIX-USE
code/trunk/README
Modified: code/trunk/AUTHORS
===================================================================
--- code/trunk/AUTHORS 2011-08-23 11:17:49 UTC (rev 671)
+++ code/trunk/AUTHORS 2011-08-23 16:45:55 UTC (rev 672)
@@ -12,6 +12,28 @@
All rights reserved
+PCRE JUST-IN-TIME COMPILATION SUPPORT
+-------------------------------------
+
+Written by: Zoltan Herczeg
+Email local part: hzmester
+Email domain: freemail.hu
+
+Copyright(c) 2010-2011 Zoltan Herczeg
+All rights reserved.
+
+
+STACK-LESS JUST-IN-TIME COMPILER
+--------------------------------
+
+Written by: Zoltan Herczeg
+Email local part: hzmester
+Email domain: freemail.hu
+
+Copyright(c) 2009-2011 Zoltan Herczeg
+All rights reserved.
+
+
THE C++ WRAPPER LIBRARY
-----------------------
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2011-08-23 11:17:49 UTC (rev 671)
+++ code/trunk/ChangeLog 2011-08-23 16:45:55 UTC (rev 672)
@@ -25,6 +25,9 @@
using the entire output vector, but this conflicts with the specification
that only 2/3 is used for passing back captured substrings. Now it uses only
the first 2/3, for compatibility. This is, of course, another edge case.
+
+4. Zoltan Herczeg's just-in-time compiler support has been integrated into the
+ main code base, and can be used by building with --enable-jit.
Version 8.13 16-Aug-2011
Modified: code/trunk/HACKING
===================================================================
--- code/trunk/HACKING 2011-08-23 11:17:49 UTC (rev 671)
+++ code/trunk/HACKING 2011-08-23 16:45:55 UTC (rev 672)
@@ -89,7 +89,10 @@
it implements an NFA algorithm, similar to the original Henry Spencer algorithm
and the way that Perl works. This is not surprising, since it is intended to be
as compatible with Perl as possible. This is the function most users of PCRE
-will use most of the time.
+will use most of the time. From release 8.20, if PCRE is compiled with
+just-in-time (JIT) support, and studying a compiled pattern with JIT is
+successful, the JIT code is run instead of the normal pcre_exec() code, but the
+result is the same.
Supplementary matching function
@@ -450,4 +453,4 @@
Philip Hazel
-July 2011
+August 2011
Modified: code/trunk/LICENCE
===================================================================
--- code/trunk/LICENCE 2011-08-23 11:17:49 UTC (rev 671)
+++ code/trunk/LICENCE 2011-08-23 16:45:55 UTC (rev 672)
@@ -9,7 +9,9 @@
directory, is distributed under the same terms as the software itself.
The basic library functions are written in C and are freestanding. Also
-included in the distribution is a set of C++ wrapper functions.
+included in the distribution is a set of C++ wrapper functions, and a
+just-in-time compiler that can be used to optimize pattern matching. These
+are both optional features that can be omitted when the library is built.
THE BASIC LIBRARY FUNCTIONS
@@ -26,6 +28,28 @@
All rights reserved.
+PCRE JUST-IN-TIME COMPILATION SUPPORT
+-------------------------------------
+
+Written by: Zoltan Herczeg
+Email local part: hzmester
+Email domain: freemail.hu
+
+Copyright(c) 2010-2011 Zoltan Herczeg
+All rights reserved.
+
+
+STACK-LESS JUST-IN-TIME COMPILER
+--------------------------------
+
+Written by: Zoltan Herczeg
+Email local part: hzmester
+Email domain: freemail.hu
+
+Copyright(c) 2009-2011 Zoltan Herczeg
+All rights reserved.
+
+
THE C++ WRAPPER FUNCTIONS
-------------------------
Modified: code/trunk/NEWS
===================================================================
--- code/trunk/NEWS 2011-08-23 11:17:49 UTC (rev 671)
+++ code/trunk/NEWS 2011-08-23 16:45:55 UTC (rev 672)
@@ -1,6 +1,16 @@
News about PCRE releases
------------------------
+Release 8.20
+------------
+
+The main change in this release is the inclusion of Zoltan Herczeg's
+just-in-time compiler support, which can be accessed by building PCRE with
+--enable-jit. Large performance benefits can be had in many situations. 8.20
+also fixes an unfortunate bug that was introduced in 8.13 as well as tidying up
+a couple of infelicities.
+
+
Release 8.13 16-Aug-2011
------------------------
Modified: code/trunk/NON-UNIX-USE
===================================================================
--- code/trunk/NON-UNIX-USE 2011-08-23 11:17:49 UTC (rev 671)
+++ code/trunk/NON-UNIX-USE 2011-08-23 16:45:55 UTC (rev 672)
@@ -37,15 +37,16 @@
The PCRE distribution includes a "configure" file for use by the Configure/Make
build system, as found in many Unix-like environments. There is also support
-support for CMake, which some users prefer, especially in Windows environments.
-There are some instructions for CMake under Windows in the section entitled
-"Building PCRE with CMake" below. CMake can also be used to build PCRE in
-Unix-like systems.
+for CMake, which some users prefer, especially in Windows environments. There
+are some instructions for CMake under Windows in the section entitled "Building
+PCRE with CMake" below. CMake can also be used to build PCRE in Unix-like
+systems.
GENERIC INSTRUCTIONS FOR THE PCRE C LIBRARY
-The following are generic comments about building the PCRE C library "by hand".
+The following are generic instructions for building the PCRE C library "by
+hand":
(1) Copy or rename the file config.h.generic as config.h, and edit the macro
settings that it contains to whatever is appropriate for your environment.
@@ -121,33 +122,51 @@
an unusual compiler) so that all included PCRE header files are first
sought in the current directory. Otherwise you run the risk of picking up
a previously-installed file from somewhere else.
+
+ (7) If you have defined SUPPORT_JIT in config.h, you must also compile
+
+ pcre_jit_compile.c
+
+ This file #includes sources from the sljit subdirectory, where there
+ should be 16 files, all of whose names begin with "sljit".
- (7) Now link all the compiled code into an object library in whichever form
+ (8) Now link all the compiled code into an object library in whichever form
your system keeps such libraries. This is the basic PCRE C library. If
your system has static and shared libraries, you may have to do this once
for each type.
- (8) Similarly, if you want to build the POSIX wrapper functions, ensure that
+ (9) Similarly, if you want to build the POSIX wrapper functions, ensure that
you have the pcreposix.h file and then compile pcreposix.c (remembering
-DHAVE_CONFIG_H if necessary). Link the result (on its own) as the
pcreposix library.
- (9) Compile the test program pcretest.c (again, don't forget -DHAVE_CONFIG_H).
+(10) Compile the test program pcretest.c (again, don't forget -DHAVE_CONFIG_H).
This needs the functions in the PCRE library when linking. It also needs
the pcreposix wrapper functions unless you compile it with -DNOPOSIX. The
pcretest.c program also needs the pcre_printint.src source file, which it
#includes.
-(10) Run pcretest on the testinput files in the testdata directory, and check
- that the output matches the corresponding testoutput files. Note that the
- supplied files are in Unix format, with just LF characters as line
- terminators. You may need to edit them to change this if your system uses
- a different convention. If you are using Windows, you probably should use
- the wintestinput3 file instead of testinput3 (and the corresponding output
- file). This is a locale test; wintestinput3 sets the locale to "french"
- rather than "fr_FR", and there some minor output differences.
+(11) Run pcretest on the testinput files in the testdata directory, and check
+ that the output matches the corresponding testoutput files. Some tests are
+ relevant only when certain build-time options are selected. For example,
+ test 4 is for UTF-8 support, and will not run if you have built PCRE
+ without it. See the comments at the start of each testinput file. If you
+ have a suitable Unix-like shell, the RunTest script will run the
+ appropriate tests for you.
+
+ Note that the supplied files are in Unix format, with just LF characters
+ as line terminators. You may need to edit them to change this if your
+ system uses a different convention. If you are using Windows, you probably
+ should use the wintestinput3 file instead of testinput3 (and the
+ corresponding output file). This is a locale test; wintestinput3 sets the
+ locale to "french" rather than "fr_FR", and there are some minor output
+ differences.
+
+(12) If you have built PCRE with SUPPORT_JIT, the JIT features will be tested
+ by the testdata files. However, you might also like to build and run
+ the JIT test program, pcre_jit_test.c.
-(11) If you want to use the pcregrep command, compile and link pcregrep.c; it
+(13) If you want to use the pcregrep command, compile and link pcregrep.c; it
uses only the basic PCRE library (it does not need the pcreposix library).
@@ -238,7 +257,7 @@
This should create two libraries called libpcre and libpcreposix, and, if you
have enabled building the C++ wrapper, a third one called libpcrecpp. These are
-independent libraries: when you like with libpcreposix or libpcrecpp you must
+independent libraries: when you link with libpcreposix or libpcrecpp you must
also link with libpcre, which contains the basic functions. (Some earlier
releases of PCRE included the basic libpcre functions in libpcreposix. This no
longer happens.)
@@ -497,5 +516,5 @@
=========================
-Last Updated: 26 May 2010
+Last Updated: 23 August 2011
****
Modified: code/trunk/README
===================================================================
--- code/trunk/README 2011-08-23 11:17:49 UTC (rev 671)
+++ code/trunk/README 2011-08-23 16:45:55 UTC (rev 672)
@@ -173,6 +173,10 @@
--disable-cpp to the "configure" command. Otherwise, when "configure" is run,
it will try to find a C++ compiler and C++ header files, and if it succeeds,
it will try to build the C++ wrapper.
+
+. If you want to include support for just-in-time compiling, which can give
+ large performance improvements on certain platforms, add --enable-jit to the
+ "configure" command.
. If you want to make use of the support for UTF-8 Unicode character strings in
PCRE, you must add --enable-utf8 to the "configure" command. Without it, the
@@ -255,9 +259,10 @@
on the "configure" command. PCRE runs more slowly in this mode, but it may be
necessary in environments with limited stack sizes. This applies only to the
- pcre_exec() function; it does not apply to pcre_dfa_exec(), which does not
- use deeply nested recursion. There is a discussion about stack sizes in the
- pcrestack man page.
+ normal execution of the pcre_exec() function; if JIT support is being
+ successfully used, it is not relevant. Equally, it does not apply to
+ pcre_dfa_exec(), which does not use deeply nested recursion. There is a
+ discussion about stack sizes in the pcrestack man page.
. For speed, PCRE uses four tables for manipulating and identifying characters
whose code point values are less than 256. By default, it uses a set of
@@ -317,14 +322,16 @@
The "configure" script builds the following files for the basic C library:
-. Makefile is the makefile that builds the library
-. config.h contains build-time configuration options for the library
-. pcre.h is the public PCRE header file
-. pcre-config is a script that shows the settings of "configure" options
-. libpcre.pc is data for the pkg-config command
-. libtool is a script that builds shared and/or static libraries
-. RunTest is a script for running tests on the basic C library
-. RunGrepTest is a script for running tests on the pcregrep command
+. Makefile the makefile that builds the library
+. config.h build-time configuration options for the library
+. pcre.h the public PCRE header file
+. pcre-config script that shows the build settings such as CFLAGS
+ that were set for "configure"
+. libpcre.pc ) data for the pkg-config command
+. libpcreposix.pc )
+. libtool script that builds shared and/or static libraries
+. RunTest script for running tests on the basic C library
+. RunGrepTest script for running tests on the pcregrep command
Versions of config.h and pcre.h are distributed in the PCRE tarballs under the
names config.h.generic and pcre.h.generic. These are provided for those who
@@ -333,9 +340,9 @@
If a C++ compiler is found, the following files are also built:
-. libpcrecpp.pc is data for the pkg-config command
-. pcrecpparg.h is a header file for programs that call PCRE via the C++ wrapper
-. pcre_stringpiece.h is the header for the C++ "stringpiece" functions
+. libpcrecpp.pc data for the pkg-config command
+. pcrecpparg.h header file for calling PCRE via the C++ wrapper
+. pcre_stringpiece.h header for the C++ "stringpiece" functions
The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
@@ -343,11 +350,11 @@
Once "configure" has run, you can run "make". It builds two libraries, called
libpcre and libpcreposix, a test program called pcretest, and the pcregrep
-command. If a C++ compiler was found on your system, "make" also builds the C++
-wrapper library, which is called libpcrecpp, and some test programs called
-pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest.
-Building the C++ wrapper can be disabled by adding --disable-cpp to the
-"configure" command.
+command. If a C++ compiler was found on your system, and you did not disable it
+with --disable-cpp, "make" also builds the C++ wrapper library, which is called
+libpcrecpp, and some test programs called pcrecpp_unittest,
+pcre_scanner_unittest, and pcre_stringpiece_unittest. If you enabled JIT
+support with --enable-jit, a test program called pcre_jit_test is also built.
The command "make check" runs all the appropriate tests. Details of the PCRE
tests are given below in a separate section of this document.
@@ -368,6 +375,7 @@
Configuration information (lib/pkgconfig):
libpcre.pc
+ libpcreposix.pc
libpcrecpp.pc (if C++ support is enabled)
Header files (include):
@@ -381,6 +389,7 @@
Man pages (share/man/man{1,3}):
pcregrep.1
pcretest.1
+ pcre-config.1
pcre.3
pcre*.3 (lots more pages, all starting "pcre")
@@ -395,9 +404,10 @@
LICENCE
NEWS
README
- pcre.txt (a concatenation of the man(3) pages)
- pcretest.txt the pcretest man page
- pcregrep.txt the pcregrep man page
+ pcre.txt (a concatenation of the man(3) pages)
+ pcretest.txt the pcretest man page
+ pcregrep.txt the pcregrep man page
+ pcre-config.txt the pcre-config man page
If you want to remove PCRE from your system, you can run "make uninstall".
This removes all the files that "make install" installed. However, it does not
@@ -533,24 +543,34 @@
created by the configuring process. There is also a script called RunGrepTest
that tests the options of the pcregrep command. If the C++ wrapper library is
built, three test programs called pcrecpp_unittest, pcre_scanner_unittest, and
-pcre_stringpiece_unittest are also built.
+pcre_stringpiece_unittest are also built. When JIT support is enabled, another
+test program called pcre_jit_test is built.
Both the scripts and all the program tests are run if you obey "make check" or
"make test". For other systems, see the instructions in NON-UNIX-USE.
The RunTest script runs the pcretest test program (which is documented in its
-own man page) on each of the testinput files in the testdata directory in
-turn, and compares the output with the contents of the corresponding testoutput
-files. A file called testtry is used to hold the main output from pcretest
+own man page) on each of the relevant testinput files in the testdata
+directory, and compares the output with the contents of the corresponding
+testoutput files. Some tests are relevant only when certain build-time options
+were selected. For example, the tests for UTF-8 support are run only if
+--enable-utf8 was used. RunTest outputs a comment when it skips a test.
+
+Many of the tests that are not skipped are run up to three times. The second
+run forces pcre_study() to be called for all patterns except for a few in some
+tests that are marked "never study" (see the pcretest program for how this is
+done). If JIT support is available, the non-DFA tests are run a third time,
+this time with a forced pcre_study() with the PCRE_STUDY_JIT_COMPILE option.
+
+RunTest uses a file called testtry to hold the main output from pcretest
(testsavedregex is also used as a working file). To run pcretest on just one of
the test files, give its number as an argument to RunTest, for example:
RunTest 2
-The first test file can also be fed directly into the perltest.pl script to
-check that Perl gives the same results. The only difference you should see is
-in the first few lines, where the Perl version is given instead of the PCRE
-version.
+The first test file can be fed directly into the perltest.pl script to check
+that Perl gives the same results. The only difference you should see is in the
+first few lines, where the Perl version is given instead of the PCRE version.
The second set of tests check pcre_fullinfo(), pcre_info(), pcre_study(),
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
@@ -587,35 +607,38 @@
Windows versions of test 2. More info on using RunTest.bat is included in the
document entitled NON-UNIX-USE.]
-The fourth test checks the UTF-8 support. It is not run automatically unless
-PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
-running "configure". This file can be also fed directly to the perltest.pl
-script, provided you are running Perl 5.8 or higher.
+The fourth test checks the UTF-8 support. This file can also be fed directly to
+the perltest.pl script, provided you are running Perl 5.8 or higher.
The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
features of PCRE that are not relevant to Perl.
The sixth test (which is Perl-5.10 compatible) checks the support for Unicode
-character properties. It it not run automatically unless PCRE is built with
-Unicode property support. To to this you must set --enable-unicode-properties
-when running "configure".
+character properties. This file can also be fed directly to the perltest.pl
+script, provided you are running Perl 5.10 or higher.
The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
-property support, respectively. The eighth and ninth tests are not run
-automatically unless PCRE is build with the relevant support.
+property support, respectively.
The tenth test checks some internal offsets and code size features; it is run
only when the default "link size" of 2 is set (in other cases the sizes
-change).
+change) and when Unicode property support is enabled.
-The eleventh test checks out features that are new in Perl 5.10, and the
-twelfth test checks a number internals and non-Perl features concerned with
-Unicode property support. It it not run automatically unless PCRE is built with
-Unicode property support. To to this you must set --enable-unicode-properties
-when running "configure".
+The eleventh and twelfth tests check out features that are new in Perl 5.10,
+without and with UTF-8 support, respectively. These files can also be fed
+directly to the perltest.pl script, provided you are running Perl 5.10 or
+higher.
+The thirteenth test checks a number of internals and non-Perl features concerned
+with Unicode property support.
+The fourteenth test is run only when JIT support is available, and the
+fifteenth test is run only when JIT support is not available. They test some
+JIT-specific features such as information output from pcretest about JIT
+compilation.
+
+
Character tables
----------------
@@ -693,6 +716,7 @@
pcre_get.c ) sources for the functions in the library,
pcre_globals.c ) and some internal functions that they use
pcre_info.c )
+ pcre_jit_compile.c )
pcre_maketables.c )
pcre_newline.c )
pcre_ord2utf8.c )
@@ -709,6 +733,7 @@
pcre.h.in template for pcre.h when built by "configure"
pcreposix.h header for the external POSIX wrapper API
pcre_internal.h header for internal use
+ sljit/* 16 files that make up the JIT compiler
ucp.h header for Unicode property handling
config.h.in template for config.h, which is built by "configure"
@@ -775,6 +800,7 @@
mkinstalldirs script for making install directories
perltest.pl Perl test program
pcre-config.in source of script which retains PCRE information
+ pcre_jit_test.c test program for the JIT compiler
pcrecpp_unittest.cc )
pcre_scanner_unittest.cc ) test programs for the C++ wrapper
pcre_stringpiece_unittest.cc )
@@ -811,4 +837,4 @@
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
-Last updated: 02 August 2011
+Last updated: 23 August 2011