Revision: 975
http://vcs.pcre.org/viewvc?view=rev&revision=975
Author: ph10
Date: 2012-06-02 12:03:06 +0100 (Sat, 02 Jun 2012)
Log Message:
-----------
Document update for 8.31-RC1 test release.
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/Makefile.am
code/trunk/NEWS
code/trunk/README
code/trunk/configure.ac
code/trunk/doc/html/index.html
code/trunk/doc/html/pcre16.html
code/trunk/doc/html/pcre_assign_jit_stack.html
code/trunk/doc/html/pcre_compile.html
code/trunk/doc/html/pcre_compile2.html
code/trunk/doc/html/pcre_jit_stack_alloc.html
code/trunk/doc/html/pcreapi.html
code/trunk/doc/html/pcrebuild.html
code/trunk/doc/html/pcrecompat.html
code/trunk/doc/html/pcrecpp.html
code/trunk/doc/html/pcregrep.html
code/trunk/doc/html/pcrejit.html
code/trunk/doc/html/pcrelimits.html
code/trunk/doc/html/pcrepartial.html
code/trunk/doc/html/pcrepattern.html
code/trunk/doc/html/pcresyntax.html
code/trunk/doc/html/pcretest.html
code/trunk/doc/html/pcreunicode.html
code/trunk/doc/pcre.txt
code/trunk/doc/pcre16.3
code/trunk/doc/pcreapi.3
code/trunk/doc/pcregrep.1
code/trunk/doc/pcrejit.3
code/trunk/doc/pcrepartial.3
code/trunk/doc/pcrepattern.3
code/trunk/doc/pcretest.1
code/trunk/pcre_compile.c
code/trunk/pcre_dfa_exec.c
code/trunk/pcre_exec.c
code/trunk/pcre_fullinfo.c
code/trunk/pcre_internal.h
code/trunk/pcre_tables.c
code/trunk/pcregrep.c
code/trunk/pcreposix.c
code/trunk/pcretest.c
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/ChangeLog 2012-06-02 11:03:06 UTC (rev 975)
@@ -1,8 +1,8 @@
ChangeLog for PCRE
------------------
-Version 8.31
------------------------------
+Version 8.31 02-June-2012
+-------------------------
1. Fixing a wrong JIT test case and some compiler warnings.
@@ -95,20 +95,20 @@
\w+ when the character tables indicated that \x{c4} was a word character.
There were several related cases, all because the tests for doing a table
lookup were testing for characters less than 127 instead of 255.
-
+
27. If a pattern contains capturing parentheses that are not used in a match,
- their slots in the ovector are set to -1. For those that are higher than
- any matched groups, this happens at the end of processing. In the case when
- there were back references that the ovector was too small to contain
- (causing temporary malloc'd memory to be used during matching), and the
- highest capturing number was not used, memory off the end of the ovector
- was incorrectly being set to -1. (It was using the size of the temporary
+ their slots in the ovector are set to -1. For those that are higher than
+ any matched groups, this happens at the end of processing. In the case when
+ there were back references that the ovector was too small to contain
+ (causing temporary malloc'd memory to be used during matching), and the
+ highest capturing number was not used, memory off the end of the ovector
+ was incorrectly being set to -1. (It was using the size of the temporary
memory instead of the true size.)
-
+
28. To catch bugs like 27 using valgrind, when pcretest is asked to specify an
ovector size, it uses memory at the end of the block that it has got.
-
-29. Check for an overlong MARK name and give an error at compile time. The
+
+29. Check for an overlong MARK name and give an error at compile time. The
limit is 255 for the 8-bit library and 65535 for the 16-bit library.
30. JIT compiler update.
@@ -120,7 +120,7 @@
33. Variable renamings in the PCRE-JIT compiler. No functionality change.
-34. Fixed typos in pcregrep: in two places there was SUPPORT_LIBZ2 instead of
+34. Fixed typos in pcregrep: in two places there was SUPPORT_LIBZ2 instead of
SUPPORT_LIBBZ2. This caused a build problem when bzip2 but not gzip (zlib)
was enabled.
Modified: code/trunk/Makefile.am
===================================================================
--- code/trunk/Makefile.am 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/Makefile.am 2012-06-02 11:03:06 UTC (rev 975)
@@ -356,6 +356,8 @@
endif # WITH_PCRE8
EXTRA_DIST += \
+ testdata/grepbinary \
+ testdata/grepfilelist \
testdata/grepinput \
testdata/grepinput3 \
testdata/grepinput8 \
Modified: code/trunk/NEWS
===================================================================
--- code/trunk/NEWS 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/NEWS 2012-06-02 11:03:06 UTC (rev 975)
@@ -1,6 +1,32 @@
News about PCRE releases
------------------------
+Release 8.31 02-June-2012
+-------------------------
+
+This is mainly a bug-fixing release, with a small number of developments:
+
+. The JIT compiler now supports partial matching and the (*MARK) and
+ (*COMMIT) verbs.
+
+. PCRE_INFO_MAXLOOKBEHIND can be used to find the longest lookbehind in a
+ pattern.
+
+. There should be a performance improvement when using the heap instead of the
+ stack for recursion.
+
+. pcregrep can now be linked with libedit as an alternative to libreadline.
+
+. pcregrep now has a --file-list option where the list of files to scan is
+ given as a file.
+
+. pcregrep now recognizes binary files and there are related options.
+
+. The Unicode tables have been updated to 6.1.0.
+
+As always, the full list of changes is in the ChangeLog file.
+
+
Release 8.30 04-February-2012
-----------------------------
Modified: code/trunk/README
===================================================================
--- code/trunk/README 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/README 2012-06-02 11:03:06 UTC (rev 975)
@@ -334,7 +334,7 @@
the readline() function. This provides line-editing and history facilities.
Note that libreadline is GPL-licenced, so if you distribute a binary of
pcretest linked in this way, there may be licensing issues. These can be
- avoided by linking with libedit (which has a BSD licence) instead.
+ avoided by linking with libedit (which has a BSD licence) instead.
Enabling libreadline causes the -lreadline option to be added to the pcretest
build. In many operating environments with a system-installed readline
Modified: code/trunk/configure.ac
===================================================================
--- code/trunk/configure.ac 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/configure.ac 2012-06-02 11:03:06 UTC (rev 975)
@@ -568,17 +568,17 @@
fi
fi
fi
-
+
# Check for the availability of libedit. Different distributions put its
# headers in different places. Try to cover the most common ones.
if test "$enable_pcretest_libedit" = "yes"; then
AC_CHECK_HEADERS([editline/readline.h], [HAVE_EDITLINE_READLINE_H=1],
[AC_CHECK_HEADERS([edit/readline/readline.h], [HAVE_READLINE_READLINE_H=1],
- [AC_CHECK_HEADERS([readline/readline.h], [HAVE_READLINE_READLINE_H=1])])])
+ [AC_CHECK_HEADERS([readline/readline.h], [HAVE_READLINE_READLINE_H=1])])])
AC_CHECK_LIB([edit], [readline], [LIBEDIT="-ledit"])
-fi
+fi
# This facilitates -ansi builds under Linux
dnl AC_DEFINE([_GNU_SOURCE], [], [Enable GNU extensions in glibc])
Modified: code/trunk/doc/html/index.html
===================================================================
--- code/trunk/doc/html/index.html 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/index.html 2012-06-02 11:03:06 UTC (rev 975)
@@ -1,10 +1,10 @@
<html>
-<!-- This is a manually maintained file that is the root of the HTML version of
- the PCRE documentation. When the HTML documents are built from the man
- page versions, the entire doc/html directory is emptied, this file is then
- copied into doc/html/index.html, and the remaining files therein are
+<!-- This is a manually maintained file that is the root of the HTML version of
+ the PCRE documentation. When the HTML documents are built from the man
+ page versions, the entire doc/html directory is emptied, this file is then
+ copied into doc/html/index.html, and the remaining files therein are
created by the 132html script.
--->
+-->
<head>
<title>PCRE specification</title>
</head>
@@ -86,11 +86,11 @@
</table>
<p>
-There are also individual pages that summarize the interface for each function
+There are also individual pages that summarize the interface for each function
in the library. There is a single page for each pair of 8-bit/16-bit functions.
</p>
-<table>
+<table>
<tr><td><a href="pcre_assign_jit_stack.html">pcre_assign_jit_stack</a></td>
<td> Assign stack for JIT matching</td></tr>
@@ -153,7 +153,7 @@
<tr><td><a href="pcre_maketables.html">pcre_maketables</a></td>
<td> Build character tables in current locale</td></tr>
-
+
<tr><td><a href="pcre_pattern_to_host_byte_order.html">pcre_pattern_to_host_byte_order</a></td>
<td> Convert compiled pattern to host byte order if necessary</td></tr>
Modified: code/trunk/doc/html/pcre16.html
===================================================================
--- code/trunk/doc/html/pcre16.html 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcre16.html 2012-06-02 11:03:06 UTC (rev 975)
@@ -273,12 +273,12 @@
<P>
There are two new general option names, PCRE_UTF16 and PCRE_NO_UTF16_CHECK,
which correspond to PCRE_UTF8 and PCRE_NO_UTF8_CHECK in the 8-bit library. In
-fact, these new options define the same bits in the options word. There is a
+fact, these new options define the same bits in the options word. There is a
discussion about the
<a href="pcreunicode.html#utf16strings">validity of UTF-16 strings</a>
in the
<a href="pcreunicode.html"><b>pcreunicode</b></a>
-page.
+page.
</P>
<P>
For the <b>pcre16_config()</b> function there is an option PCRE_CONFIG_UTF16
Modified: code/trunk/doc/html/pcre_assign_jit_stack.html
===================================================================
--- code/trunk/doc/html/pcre_assign_jit_stack.html 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcre_assign_jit_stack.html 2012-06-02 11:03:06 UTC (rev 975)
@@ -30,7 +30,7 @@
DESCRIPTION
</b><br>
<P>
-This function provides control over the memory used as a stack at runtime by a
+This function provides control over the memory used as a stack at run-time by a
call to <b>pcre[16]_exec()</b> with a pattern that has been successfully
compiled with JIT optimization. The arguments are:
<pre>
Modified: code/trunk/doc/html/pcre_compile.html
===================================================================
--- code/trunk/doc/html/pcre_compile.html 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcre_compile.html 2012-06-02 11:03:06 UTC (rev 975)
@@ -54,7 +54,7 @@
PCRE_DOLLAR_ENDONLY $ not to match newline at end
PCRE_DOTALL . matches anything including NL
PCRE_DUPNAMES Allow duplicate names for subpatterns
- PCRE_EXTENDED Ignore whitespace and # comments
+ PCRE_EXTENDED Ignore white space and # comments
PCRE_EXTRA PCRE extra features
(not much use currently)
PCRE_FIRSTLINE Force matching to be before newline
Modified: code/trunk/doc/html/pcre_compile2.html
===================================================================
--- code/trunk/doc/html/pcre_compile2.html 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcre_compile2.html 2012-06-02 11:03:06 UTC (rev 975)
@@ -57,7 +57,7 @@
PCRE_DOLLAR_ENDONLY $ not to match newline at end
PCRE_DOTALL . matches anything including NL
PCRE_DUPNAMES Allow duplicate names for subpatterns
- PCRE_EXTENDED Ignore whitespace and # comments
+ PCRE_EXTENDED Ignore white space and # comments
PCRE_EXTRA PCRE extra features
(not much use currently)
PCRE_FIRSTLINE Force matching to be before newline
Modified: code/trunk/doc/html/pcre_jit_stack_alloc.html
===================================================================
--- code/trunk/doc/html/pcre_jit_stack_alloc.html 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcre_jit_stack_alloc.html 2012-06-02 11:03:06 UTC (rev 975)
@@ -33,7 +33,7 @@
This function is used to create a stack for use by the code compiled by the JIT
optimization of <b>pcre[16]_study()</b>. The arguments are a starting size for
the stack, and a maximum size to which it is allowed to grow. The result can be
-passed to the JIT runtime code by <b>pcre[16]_assign_jit_stack()</b>, or that
+passed to the JIT run-time code by <b>pcre[16]_assign_jit_stack()</b>, or that
function can set up a callback for obtaining a stack. A maximum stack size of
512K to 1M should be more than enough for any pattern. For more details, see
the
Modified: code/trunk/doc/html/pcreapi.html
===================================================================
--- code/trunk/doc/html/pcreapi.html 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcreapi.html 2012-06-02 11:03:06 UTC (rev 975)
@@ -317,7 +317,7 @@
strings: a single CR (carriage return) character, a single LF (linefeed)
character, the two-character sequence CRLF, any of the three preceding, or any
Unicode newline sequence. The Unicode newline sequences are the three just
-mentioned, plus the single characters VT (vertical tab, U+000B), FF (formfeed,
+mentioned, plus the single characters VT (vertical tab, U+000B), FF (form feed,
U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
(paragraph separator, U+2029).
</P>
@@ -641,8 +641,8 @@
<pre>
PCRE_EXTENDED
</pre>
-If this bit is set, whitespace data characters in the pattern are totally
-ignored except when escaped or inside a character class. Whitespace does not
+If this bit is set, white space data characters in the pattern are totally
+ignored except when escaped or inside a character class. White space does not
include the VT character (code 11). In addition, characters between an
unescaped # outside a character class and the next newline, inclusive, are also
ignored. This is equivalent to Perl's /x option, and it can be changed within a
@@ -659,7 +659,7 @@
</P>
<P>
This option makes it possible to include comments inside complicated patterns.
-Note, however, that this applies only to data characters. Whitespace characters
+Note, however, that this applies only to data characters. White space characters
may never appear within special character sequences in a pattern, for example
within the sequence (?( that introduces a conditional subpattern.
<pre>
@@ -745,7 +745,7 @@
preceding sequences should be recognized. Setting PCRE_NEWLINE_ANY specifies
that any Unicode newline sequence should be recognized. The Unicode newline
sequences are the three just mentioned, plus the single characters VT (vertical
-tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line
+tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
separator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit
library, the last two are recognized only in UTF-8 mode.
</P>
@@ -759,7 +759,7 @@
</P>
<P>
The only time that a line break in a pattern is specially recognized when
-compiling is when PCRE_EXTENDED is set. CR and LF are whitespace characters,
+compiling is when PCRE_EXTENDED is set. CR and LF are white space characters,
and so are ignored in this mode. Also, an unescaped # outside a character class
indicates a comment that lasts until after the next line break sequence. In
other circumstances, line break sequences in patterns are treated as literal
@@ -916,6 +916,7 @@
72 too many forward references
73 disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
74 invalid UTF-16 string (specifically UTF-16)
+ 75 name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
</pre>
The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
be used if the limits were changed when PCRE was built.
@@ -950,7 +951,7 @@
</P>
<P>
The second argument of <b>pcre_study()</b> contains option bits. There are three
-options:
+options:
<pre>
PCRE_STUDY_JIT_COMPILE
PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
@@ -1231,7 +1232,7 @@
</pre>
Return the number of characters (NB not bytes) in the longest lookbehind
assertion in the pattern. Note that the simple assertions \b and \B require a
-one-character lookbehind. This information is useful when doing multi-segment
+one-character lookbehind. This information is useful when doing multi-segment
matching using the partial matching facilities.
<pre>
PCRE_INFO_MINLENGTH
@@ -1506,7 +1507,7 @@
Limiting the recursion depth limits the amount of machine stack that can be
used, or, when PCRE has been compiled to use memory on the heap instead of the
stack, the amount of heap memory that can be used. This limit is not relevant,
-and is ignored, when matching is done using JIT compiled code.
+and is ignored, when matching is done using JIT compiled code.
</P>
<P>
The default value for <i>match_limit_recursion</i> can be set when PCRE is
@@ -1689,7 +1690,7 @@
"no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK)
are considered at every possible starting position in the subject string. If
PCRE_NO_START_OPTIMIZE is set at compile time, it cannot be unset at matching
-time. The use of PCRE_NO_START_OPTIMIZE disables JIT execution; when it is set,
+time. The use of PCRE_NO_START_OPTIMIZE disables JIT execution; when it is set,
matching is always done interpretively.
</P>
<P>
@@ -2084,12 +2085,12 @@
<a href="pcrejit.html"><b>pcrejit</b></a>
documentation for more details.
<pre>
- PCRE_ERROR_BADMODE (-28)
+ PCRE_ERROR_BADMODE (-28)
</pre>
This error is given if a pattern that was compiled by the 8-bit library is
passed to a 16-bit library function, or vice versa.
<pre>
- PCRE_ERROR_BADENDIANNESS (-29)
+ PCRE_ERROR_BADENDIANNESS (-29)
</pre>
This error is given if a pattern that was compiled and saved is reloaded on a
host with different endianness. The utility function
@@ -2097,7 +2098,7 @@
so that it runs on the new host.
</P>
<P>
-Error numbers -16 to -20 and -22 are not used by <b>pcre_exec()</b>.
+Error numbers -16 to -20, -22, and -30 are not used by <b>pcre_exec()</b>.
<a name="badutf8reasons"></a></P>
<br><b>
Reason codes for invalid UTF-8 strings
@@ -2592,6 +2593,13 @@
recursively, using private vectors for <i>ovector</i> and <i>workspace</i>. This
error is given if the output vector is not large enough. This should be
extremely rare, as a vector of size 1000 is used.
+<pre>
+ PCRE_ERROR_DFA_BADRESTART (-30)
+</pre>
+When <b>pcre_dfa_exec()</b> is called with the <b>PCRE_DFA_RESTART</b> option,
+some plausibility checks are made on the contents of the workspace, which
+should contain data about the previous partial match. If any of these checks
+fail, this error is given.
</P>
<br><a name="SEC24" href="#TOC1">SEE ALSO</a><br>
<P>
@@ -2610,7 +2618,7 @@
</P>
<br><a name="SEC26" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 14 April 2012
+Last updated: 04 May 2012
<br>
Copyright © 1997-2012 University of Cambridge.
<br>
Modified: code/trunk/doc/html/pcrebuild.html
===================================================================
--- code/trunk/doc/html/pcrebuild.html 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcrebuild.html 2012-06-02 11:03:06 UTC (rev 975)
@@ -127,7 +127,7 @@
</P>
<P>
If you set --enable-utf when compiling in an EBCDIC environment, PCRE expects
-its input to be either ASCII or UTF-8 (depending on the runtime option). It is
+its input to be either ASCII or UTF-8 (depending on the run-time option). It is
not possible to support both EBCDIC and UTF-8 codes in the same version of the
library. Consequently, --enable-utf and --enable-ebcdic are mutually
exclusive.
@@ -317,7 +317,7 @@
</pre>
to the <b>configure</b> command, the distributed tables are no longer used.
Instead, a program called <b>dftables</b> is compiled and run. This outputs the
-source for new set of tables, created in the default locale of your C runtime
+source for new set of tables, created in the default locale of your C run-time
system. (This method of replacing the tables does not work if you are cross
compiling, because <b>dftables</b> is run on the local host. If you need to
create alternative tables when cross compiling, you will have to do so "by
Modified: code/trunk/doc/html/pcrecompat.html
===================================================================
--- code/trunk/doc/html/pcrecompat.html 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcrecompat.html 2012-06-02 11:03:06 UTC (rev 975)
@@ -107,8 +107,16 @@
page.
</P>
<P>
-11. If (*THEN) is present in a group that is called as a subroutine, its action
-is limited to that group, even if the group does not contain any | characters.
+11. If any of the backtracking control verbs are used in an assertion or in a
+subpattern that is called as a subroutine (whether or not recursively), their
+effect is confined to that subpattern; it does not extend to the surrounding
+pattern. This is not always the case in Perl. In particular, if (*THEN) is
+present in a group that is called as a subroutine, its action is limited to
+that group, even if the group does not contain any | characters. There is one
+exception to this: the name from a (*MARK), (*PRUNE), or (*THEN) that is
+encountered in a successful positive assertion <i>is</i> passed back when a
+match succeeds (compare capturing parentheses in assertions). Note that such
+subpatterns are processed as anchored at the point where they are tested.
</P>
<P>
12. There are some differences that are concerned with the settings of captured
@@ -129,7 +137,7 @@
<P>
14. Perl recognizes comments in some places that PCRE does not, for example,
between the ( and ? at the start of a subpattern. If the /x modifier is set,
-Perl allows whitespace between ( and ? but PCRE never does, even if the
+Perl allows white space between ( and ? but PCRE never does, even if the
PCRE_EXTENDED option is set.
</P>
<P>
@@ -203,7 +211,7 @@
REVISION
</b><br>
<P>
-Last updated: 08 Januray 2012
+Last updated: 01 June 2012
<br>
Copyright © 1997-2012 University of Cambridge.
<br>
Modified: code/trunk/doc/html/pcrecpp.html
===================================================================
--- code/trunk/doc/html/pcrecpp.html 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcrecpp.html 2012-06-02 11:03:06 UTC (rev 975)
@@ -192,7 +192,7 @@
PCRE_DOTALL dot matches newlines /s
PCRE_DOLLAR_ENDONLY $ matches only at end N/A
PCRE_EXTRA strict escape parsing N/A
- PCRE_EXTENDED ignore whitespaces /x
+ PCRE_EXTENDED ignore white spaces /x
PCRE_UTF8 handles UTF8 chars built-in
PCRE_UNGREEDY reverses * and *? N/A
PCRE_NO_AUTO_CAPTURE disables capturing parens N/A (*)
Modified: code/trunk/doc/html/pcregrep.html
===================================================================
--- code/trunk/doc/html/pcregrep.html 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcregrep.html 2012-06-02 11:03:06 UTC (rev 975)
@@ -128,7 +128,7 @@
</P>
<br><a name="SEC4" href="#TOC1">BINARY FILES</a><br>
<P>
-By default, a file that contains a binary zero byte within the first 1024 bytes
+By default, a file that contains a binary zero byte within the first 1024 bytes
is identified as a binary file, and is processed specially. (GNU grep also
identifies binary files in this manner.) See the <b>--binary-files</b> option
for a means of changing the way binary files are handled.
@@ -172,7 +172,7 @@
</P>
<P>
<b>--binary-files=</b><i>word</i>
-Specify how binary files are to be processed. If the word is "binary" (the
+Specify how binary files are to be processed. If the word is "binary" (the
default), pattern matching is performed on binary files, but the only output is
"Binary file <name> matches" when a match succeeds. If the word is "text",
which is equivalent to the <b>-a</b> or <b>--text</b> option, binary files are
Modified: code/trunk/doc/html/pcrejit.html
===================================================================
--- code/trunk/doc/html/pcrejit.html 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcrejit.html 2012-06-02 11:03:06 UTC (rev 975)
@@ -104,7 +104,7 @@
pcre_free(study_ptr);
#endif
</pre>
-PCRE_STUDY_JIT_COMPILE requests the JIT compiler to generate code for complete
+PCRE_STUDY_JIT_COMPILE requests the JIT compiler to generate code for complete
matches. If you want to run partial matches using the PCRE_PARTIAL_HARD or
PCRE_PARTIAL_SOFT options of <b>pcre_exec()</b>, you should set one or both of
the following options in addition to, or instead of, PCRE_STUDY_JIT_COMPILE
@@ -129,7 +129,7 @@
no JIT data is created. Otherwise, the compiled pattern is passed to the JIT
compiler, which turns it into machine code that executes much faster than the
normal interpretive code. When <b>pcre_exec()</b> is passed a <b>pcre_extra</b>
-block containing a pointer to JIT code of the appropriate mode (normal or
+block containing a pointer to JIT code of the appropriate mode (normal or
hard/soft partial), it obeys that code instead of running the interpreter. The
result is identical, but the compiled JIT code runs much faster.
</P>
@@ -169,10 +169,8 @@
<pre>
\C match a single byte; not supported in UTF-8 mode
(?Cn) callouts
- (*COMMIT) )
- (*MARK) )
- (*PRUNE) ) the backtracking control verbs
- (*SKIP) )
+ (*PRUNE) )
+ (*SKIP) ) backtracking control verbs
(*THEN) )
</pre>
Support for some of these may be added in future.
@@ -250,15 +248,15 @@
(2) If <i>callback</i> is NULL and <i>data</i> is not NULL, <i>data</i> must be
a valid JIT stack, the result of calling <b>pcre_jit_stack_alloc()</b>.
- (3) If <i>callback</i> is not NULL, it must point to a function that is
- called with <i>data</i> as an argument at the start of matching, in
- order to set up a JIT stack. If the return from the callback
- function is NULL, the internal 32K stack is used; otherwise the
- return value must be a valid JIT stack, the result of calling
+ (3) If <i>callback</i> is not NULL, it must point to a function that is
+ called with <i>data</i> as an argument at the start of matching, in
+ order to set up a JIT stack. If the return from the callback
+ function is NULL, the internal 32K stack is used; otherwise the
+ return value must be a valid JIT stack, the result of calling
<b>pcre_jit_stack_alloc()</b>.
</pre>
-A callback function is obeyed whenever JIT code is about to be run; it is not
-obeyed when <b>pcre_exec()</b> is called with options that are incompatible for
+A callback function is obeyed whenever JIT code is about to be run; it is not
+obeyed when <b>pcre_exec()</b> is called with options that are incompatible for
JIT execution. A callback function can therefore be used to determine whether a
match operation was executed by JIT or by the interpreter.
</P>
@@ -266,9 +264,9 @@
You may safely use the same JIT stack for more than one pattern (either by
assigning directly or by callback), as long as the patterns are all matched
sequentially in the same thread. In a multithread application, if you do not
-specify a JIT stack, or if you assign or pass back NULL from a callback, that
-is thread-safe, because each thread has its own machine stack. However, if you
-assign or pass back a non-NULL JIT stack, this must be a different stack for
+specify a JIT stack, or if you assign or pass back NULL from a callback, that
+is thread-safe, because each thread has its own machine stack. However, if you
+assign or pass back a non-NULL JIT stack, this must be a different stack for
each thread so that the application is thread-safe.
</P>
<P>
@@ -415,7 +413,7 @@
</P>
<br><a name="SEC13" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 14 April 2012
+Last updated: 04 May 2012
<br>
Copyright © 1997-2012 University of Cambridge.
<br>
Modified: code/trunk/doc/html/pcrelimits.html
===================================================================
--- code/trunk/doc/html/pcrelimits.html 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcrelimits.html 2012-06-02 11:03:06 UTC (rev 975)
@@ -48,6 +48,10 @@
maximum number of named subpatterns is 10000.
</P>
<P>
+The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
+is 255 for the 8-bit library and 65535 for the 16-bit library.
+</P>
+<P>
The maximum length of a subject string is the largest positive number that an
integer variable can hold. However, when using the traditional matching
function, PCRE uses recursion to handle subpatterns and indefinite repetition.
@@ -72,7 +76,7 @@
REVISION
</b><br>
<P>
-Last updated: 08 January 2012
+Last updated: 04 May 2012
<br>
Copyright © 1997-2012 University of Cambridge.
<br>
Modified: code/trunk/doc/html/pcrepartial.html
===================================================================
--- code/trunk/doc/html/pcrepartial.html 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcrepartial.html 2012-06-02 11:03:06 UTC (rev 975)
@@ -58,14 +58,14 @@
are set, PCRE_PARTIAL_HARD takes precedence.
</P>
<P>
-If you want to use partial matching with just-in-time optimized code, you must
+If you want to use partial matching with just-in-time optimized code, you must
call <b>pcre_study()</b> or <b>pcre16_study()</b> with one or both of these
options:
<pre>
PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
</pre>
-PCRE_STUDY_JIT_COMPILE should also be set if you are going to run non-partial
+PCRE_STUDY_JIT_COMPILE should also be set if you are going to run non-partial
matches on the same pattern. If the appropriate JIT study mode has not been set
for a match, the interpretive matching code is used.
</P>
@@ -354,8 +354,8 @@
<P>
2. Lookbehind assertions that have already been obeyed are catered for in the
offsets that are returned for a partial match. However a lookbehind assertion
-later in the pattern could require even earlier characters to be inspected. You
-can handle this case by using the PCRE_INFO_MAXLOOKBEHIND option of the
+later in the pattern could require even earlier characters to be inspected. You
+can handle this case by using the PCRE_INFO_MAXLOOKBEHIND option of the
<b>pcre_fullinfo()</b> or <b>pcre16_fullinfo()</b> functions to obtain the length
of the largest lookbehind in the pattern. This length is given in characters,
not bytes. If you always retain at least that many characters before the
@@ -372,7 +372,7 @@
data> ab\P
No match
</pre>
-If the next segment begins "cx", a match should be found, but this will only
+If the next segment begins "cx", a match should be found, but this will only
happen if characters from the previous segment are retained. For this reason, a
"no match" result should be interpreted as "partial match of an empty string"
when the pattern contains lookbehinds.
Modified: code/trunk/doc/html/pcrepattern.html
===================================================================
--- code/trunk/doc/html/pcrepattern.html 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcrepattern.html 2012-06-02 11:03:06 UTC (rev 975)
@@ -227,10 +227,10 @@
greater than 127) are treated as literals.
</P>
<P>
-If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the
+If a pattern is compiled with the PCRE_EXTENDED option, white space in the
pattern (other than in a character class) and characters between a # outside
a character class and the next newline are ignored. An escaping backslash can
-be used to include a whitespace or # character as part of the pattern.
+be used to include a white space or # character as part of the pattern.
</P>
<P>
If you want to remove the special meaning from a sequence of characters, you
@@ -264,7 +264,7 @@
\a alarm, that is, the BEL character (hex 07)
\cx "control-x", where x is any ASCII character
\e escape (hex 1B)
- \f formfeed (hex 0C)
+ \f form feed (hex 0C)
\n linefeed (hex 0A)
\r carriage return (hex 0D)
\t tab (hex 09)
@@ -406,12 +406,12 @@
<pre>
\d any decimal digit
\D any character that is not a decimal digit
- \h any horizontal whitespace character
- \H any character that is not a horizontal whitespace character
- \s any whitespace character
- \S any character that is not a whitespace character
- \v any vertical whitespace character
- \V any character that is not a vertical whitespace character
+ \h any horizontal white space character
+ \H any character that is not a horizontal white space character
+ \s any white space character
+ \S any character that is not a white space character
+ \v any vertical white space character
+ \V any character that is not a vertical white space character
\w any "word" character
\W any "non-word" character
</pre>
@@ -497,7 +497,7 @@
<pre>
U+000A Linefeed
U+000B Vertical tab
- U+000C Formfeed
+ U+000C Form feed
U+000D Carriage return
U+0085 Next line
U+2028 Line separator
@@ -520,7 +520,7 @@
<a href="#atomicgroup">below.</a>
This particular group matches either the two-character sequence CR followed by
LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab,
-U+000B), FF (formfeed, U+000C), CR (carriage return, U+000D), or NEL (next
+U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next
line, U+0085). The two-character sequence is treated as a single unit that
cannot be split.
</P>
@@ -822,7 +822,7 @@
Xwd Any Perl "word" character
</pre>
Xan matches characters that have either the L (letter) or the N (number)
-property. Xps matches the characters tab, linefeed, vertical tab, formfeed, or
+property. Xps matches the characters tab, linefeed, vertical tab, form feed, or
carriage return, and any other character that has the Z (separator) property.
Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the
same characters as Xan, plus underscore.
@@ -1829,7 +1829,7 @@
following a backslash are taken as part of a potential back reference number.
If the pattern continues with a digit character, some delimiter must be used to
terminate the back reference. If the PCRE_EXTENDED option is set, this can be
-whitespace. Otherwise, the \g{ syntax or an empty comment (see
+white space. Otherwise, the \g{ syntax or an empty comment (see
<a href="#comments">"Comments"</a>
below) can be used.
</P>
@@ -2171,7 +2171,7 @@
subroutines that can be referenced from elsewhere. (The use of
<a href="#subpatternsassubroutines">subroutines</a>
is described below.) For example, a pattern to match an IPv4 address such as
-"192.168.23.245" could be written like this (ignore whitespace and line
+"192.168.23.245" could be written like this (ignore white space and line
breaks):
<pre>
(?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
@@ -2565,17 +2565,18 @@
a successful positive assertion <i>is</i> passed back when a match succeeds
(compare capturing parentheses in assertions). Note that such subpatterns are
processed as anchored at the point where they are tested. Note also that Perl's
-treatment of subroutines is different in some cases.
+treatment of subroutines and assertions is different in some cases.
</P>
<P>
The new verbs make use of what was previously invalid syntax: an opening
parenthesis followed by an asterisk. They are generally of the form
(*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour,
depending on whether or not an argument is present. A name is any sequence of
-characters that does not include a closing parenthesis. If the name is empty,
-that is, if the closing parenthesis immediately follows the colon, the effect
-is as if the colon were not there. Any number of these verbs may occur in a
-pattern.
+characters that does not include a closing parenthesis. The maximum length of
+name is 255 in the 8-bit library and 65535 in the 16-bit library. If the name
+is empty, that is, if the closing parenthesis immediately follows the colon,
+the effect is as if the colon were not there. Any number of these verbs may
+occur in a pattern.
<a name="nooptimize"></a></P>
<br><b>
Optimizations that affect backtracking verbs
@@ -2593,7 +2594,7 @@
<a href="pcreapi.html#execoptions">"Option bits for <b>pcre_exec()</b>"</a>
in the
<a href="pcreapi.html"><b>pcreapi</b></a>
-documentation.
+documentation.
</P>
<P>
Experiments with Perl suggest that it too has similar optimizations, sometimes
@@ -2687,7 +2688,7 @@
</P>
<P>
If you are interested in (*MARK) values after failed matches, you should
-probably set the PCRE_NO_START_OPTIMIZE option
+probably set the PCRE_NO_START_OPTIMIZE option
<a href="#nooptimize">(see above)</a>
to ensure that the match is always attempted.
</P>
@@ -2868,7 +2869,7 @@
</P>
<br><a name="SEC28" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 14 April 2012
+Last updated: 01 June 2012
<br>
Copyright © 1997-2012 University of Cambridge.
<br>
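Note on the verb-name limit documented above: the rule (a name is any run of characters up to the closing parenthesis, at most 255 units in the 8-bit library and 65535 in the 16-bit library) can be sketched as a small pre-check. This is only an illustration, not PCRE's own code. The helper names are hypothetical, the scan is naive (it ignores escaping and character classes), and it assumes the limit is counted in code units of the library's width.

```python
import re

# Limits quoted in the documentation change above, keyed by library width.
MAX_VERB_NAME = {8: 255, 16: 65535}

# Naive matcher for (*VERB) or (*VERB:NAME); NAME is any run of
# characters other than ")". An empty verb covers the (*:NAME) shorthand.
VERB_RE = re.compile(r"\(\*([A-Z_]*)(?::([^)]*))?\)")


def name_units(name, width):
    """Count code units of name (assumption: the limit is in code units)."""
    if width == 8:
        return len(name.encode("utf-8"))
    return len(name.encode("utf-16-le")) // 2


def check_verb_names(pattern, width=8):
    """Return (verb, name) pairs whose name exceeds the limit for width."""
    limit = MAX_VERB_NAME[width]
    too_long = []
    for verb, name in VERB_RE.findall(pattern):
        if name_units(name, width) > limit:
            too_long.append((verb, name))
    return too_long
```

A pattern flagged by such a check would presumably be rejected at compile time with the new error 75 ("name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)").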
Modified: code/trunk/doc/html/pcresyntax.html
===================================================================
--- code/trunk/doc/html/pcresyntax.html 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcresyntax.html 2012-06-02 11:03:06 UTC (rev 975)
@@ -61,7 +61,7 @@
\a alarm, that is, the BEL character (hex 07)
\cx "control-x", where x is any ASCII character
\e escape (hex 1B)
- \f formfeed (hex 0C)
+ \f form feed (hex 0C)
\n newline (hex 0A)
\r carriage return (hex 0D)
\t tab (hex 09)
@@ -78,16 +78,16 @@
\C one data unit, even in UTF mode (best avoided)
\d a decimal digit
\D a character that is not a decimal digit
- \h a horizontal whitespace character
- \H a character that is not a horizontal whitespace character
+ \h a horizontal white space character
+ \H a character that is not a horizontal white space character
\N a character that is not a newline
\p{<i>xx</i>} a character with the <i>xx</i> property
\P{<i>xx</i>} a character without the <i>xx</i> property
\R a newline sequence
- \s a whitespace character
- \S a character that is not a whitespace character
- \v a vertical whitespace character
- \V a character that is not a vertical whitespace character
+ \s a white space character
+ \S a character that is not a white space character
+ \v a vertical white space character
+ \V a character that is not a vertical white space character
\w a "word" character
\W a "non-word" character
\X an extended Unicode sequence
@@ -278,7 +278,7 @@
lower lower case letter
print printing, including space
punct printing, excluding alphanumeric
- space whitespace
+ space white space
upper upper case letter
word same as \w
xdigit hexadecimal digit
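Note on the newline sequences touched by the hunks above: the \R description (CR followed by LF as one unit, or a single LF, VT, FF, CR, or NEL, plus LS and PS in the wider modes) can be approximated outside PCRE. The following is a minimal sketch using Python's standard re module; the names NEWLINE_SEQ and split_lines are hypothetical. It does not reproduce the atomic-group behaviour inside a larger pattern; it only shows that listing CRLF first keeps the pair from being split.

```python
import re

# CRLF is listed first so the two-character sequence is consumed as one unit;
# the class covers LF, VT, FF, CR, NEL, LS and PS.
NEWLINE_SEQ = re.compile("\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]")


def split_lines(text):
    """Split text at every newline sequence, treating CRLF as a single break."""
    return NEWLINE_SEQ.split(text)
```

With this ordering, "a\r\nb" yields two fields, whereas "a\r\rb" yields three because each lone CR counts as its own line break.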
Modified: code/trunk/doc/html/pcretest.html
===================================================================
--- code/trunk/doc/html/pcretest.html 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcretest.html 2012-06-02 11:03:06 UTC (rev 975)
@@ -166,8 +166,8 @@
Behave as if each pattern has the <b>/S</b> modifier; in other words, force each
pattern to be studied. If <b>-s+</b> is used, all the JIT compile options are
passed to <b>pcre[16]_study()</b>, causing just-in-time optimization to be set
-up if it is available, for both full and partial matching. Specific JIT compile
-options can be selected by following <b>-s+</b> with a digit in the range 1 to
+up if it is available, for both full and partial matching. Specific JIT compile
+options can be selected by following <b>-s+</b> with a digit in the range 1 to
7, which selects the JIT compile modes as follows:
<pre>
1 normal match only
@@ -453,7 +453,7 @@
If the <b>/S</b> modifier is immediately followed by a + character, the call to
<b>pcre[16]_study()</b> is made with all the JIT study options, requesting
just-in-time optimization support if it is available, for both normal and
-partial matching. If you want to restrict the JIT compiling modes, you can
+partial matching. If you want to restrict the JIT compiling modes, you can
follow <b>/S+</b> with a digit in the range 1 to 7:
<pre>
1 normal match only
@@ -469,7 +469,7 @@
</P>
<P>
Note that there is also an independent <b>/+</b> modifier; it must not be given
-immediately after <b>/S</b> or <b>/S+</b> because this will be misinterpreted.
+immediately after <b>/S</b> or <b>/S+</b> because this will be misinterpreted.
</P>
<P>
If JIT studying is successful, the compiled JIT code will automatically be used
Modified: code/trunk/doc/html/pcreunicode.html
===================================================================
--- code/trunk/doc/html/pcreunicode.html 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/html/pcreunicode.html 2012-06-02 11:03:06 UTC (rev 975)
@@ -91,7 +91,7 @@
<P>
If an invalid UTF-8 string is passed to PCRE, an error return is given. At
compile time, the only additional information is the offset to the first byte
-of the failing character. The runtime functions <b>pcre_exec()</b> and
+of the failing character. The run-time functions <b>pcre_exec()</b> and
<b>pcre_dfa_exec()</b> also pass back this information, as well as a more
detailed reason code if the caller has provided memory in which to do this.
</P>
@@ -136,7 +136,7 @@
<P>
If an invalid UTF-16 string is passed to PCRE, an error return is given. At
compile time, the only additional information is the offset to the first data
-unit of the failing character. The runtime functions <b>pcre16_exec()</b> and
+unit of the failing character. The run-time functions <b>pcre16_exec()</b> and
<b>pcre16_dfa_exec()</b> also pass back this information, as well as a more
detailed reason code if the caller has provided memory in which to do this.
</P>
@@ -202,7 +202,7 @@
low-valued characters, unless the PCRE_UCP option is set.
</P>
<P>
-8. However, the horizontal and vertical whitespace matching escapes (\h, \H,
+8. However, the horizontal and vertical white space matching escapes (\h, \H,
\v, and \V) do match all the appropriate Unicode characters, whether or not
PCRE_UCP is set.
</P>
Modified: code/trunk/doc/pcre.txt
===================================================================
--- code/trunk/doc/pcre.txt 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/pcre.txt 2012-06-02 11:03:06 UTC (rev 975)
@@ -138,8 +138,8 @@
Last updated: 10 January 2012
Copyright (c) 1997-2012 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
PCRE(3) PCRE(3)
@@ -464,8 +464,8 @@
Last updated: 14 April 2012
Copyright (c) 1997-2012 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
PCREBUILD(3) PCREBUILD(3)
@@ -568,9 +568,9 @@
tern compiling functions.
If you set --enable-utf when compiling in an EBCDIC environment, PCRE
- expects its input to be either ASCII or UTF-8 (depending on the runtime
- option). It is not possible to support both EBCDIC and UTF-8 codes in
- the same version of the library. Consequently, --enable-utf and
+ expects its input to be either ASCII or UTF-8 (depending on the run-
+ time option). It is not possible to support both EBCDIC and UTF-8 codes
+ in the same version of the library. Consequently, --enable-utf and
--enable-ebcdic are mutually exclusive.
@@ -761,9 +761,9 @@
to the configure command, the distributed tables are no longer used.
Instead, a program called dftables is compiled and run. This outputs
the source for a new set of tables, created in the default locale of your
- C runtime system. (This method of replacing the tables does not work if
- you are cross compiling, because dftables is run on the local host. If
- you need to create alternative tables when cross compiling, you will
+ C run-time system. (This method of replacing the tables does not work
+ if you are cross compiling, because dftables is run on the local host.
+ If you need to create alternative tables when cross compiling, you will
have to do so "by hand".)
@@ -860,8 +860,8 @@
Last updated: 07 January 2012
Copyright (c) 1997-2012 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
PCREMATCHING(3) PCREMATCHING(3)
@@ -1067,8 +1067,8 @@
Last updated: 08 January 2012
Copyright (c) 1997-2012 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
PCREAPI(3) PCREAPI(3)
@@ -1311,7 +1311,7 @@
feed) character, the two-character sequence CRLF, any of the three pre-
ceding, or any Unicode newline sequence. The Unicode newline sequences
are the three just mentioned, plus the single characters VT (vertical
- tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line
+ tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
separator, U+2028), and PS (paragraph separator, U+2029).
Each of the first three conventions is used by at least one operating
@@ -1625,8 +1625,8 @@
PCRE_EXTENDED
- If this bit is set, whitespace data characters in the pattern are
- totally ignored except when escaped or inside a character class. White-
+ If this bit is set, white space data characters in the pattern are
+ totally ignored except when escaped or inside a character class. White
space does not include the VT character (code 11). In addition, charac-
ters between an unescaped # outside a character class and the next new-
line, inclusive, are also ignored. This is equivalent to Perl's /x
@@ -1642,7 +1642,7 @@
This option makes it possible to include comments inside complicated
patterns. Note, however, that this applies only to data characters.
- Whitespace characters may never appear within special character
+ White space characters may never appear within special character
sequences in a pattern, for example within the sequence (?( that intro-
duces a conditional subpattern.
@@ -1727,7 +1727,7 @@
that any of the three preceding sequences should be recognized. Setting
PCRE_NEWLINE_ANY specifies that any Unicode newline sequence should be
recognized. The Unicode newline sequences are the three just mentioned,
- plus the single characters VT (vertical tab, U+000B), FF (formfeed,
+ plus the single characters VT (vertical tab, U+000B), FF (form feed,
U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
(paragraph separator, U+2029). For the 8-bit library, the last two are
recognized only in UTF-8 mode.
@@ -1741,7 +1741,7 @@
cause an error.
The only time that a line break in a pattern is specially recognized
- when compiling is when PCRE_EXTENDED is set. CR and LF are whitespace
+ when compiling is when PCRE_EXTENDED is set. CR and LF are white space
characters, and so are ignored in this mode. Also, an unescaped # out-
side a character class indicates a comment that lasts until after the
next line break sequence. In other circumstances, line break sequences
@@ -1894,6 +1894,7 @@
72 too many forward references
73 disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
74 invalid UTF-16 string (specifically UTF-16)
+ 75 name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
The numbers 32 and 10000 in errors 48 and 49 are defaults; different
values may be used if the limits were changed when PCRE was built.
@@ -2993,19 +2994,19 @@
for the just-in-time processing stack is not large enough. See the
pcrejit documentation for more details.
- PCRE_ERROR_BADMODE (-28)
+ PCRE_ERROR_BADMODE (-28)
This error is given if a pattern that was compiled by the 8-bit library
is passed to a 16-bit library function, or vice versa.
- PCRE_ERROR_BADENDIANNESS (-29)
+ PCRE_ERROR_BADENDIANNESS (-29)
This error is given if a pattern that was compiled and saved is
reloaded on a host with different endianness. The utility function
pcre_pattern_to_host_byte_order() can be used to convert such a pattern
so that it runs on the new host.
- Error numbers -16 to -20 and -22 are not used by pcre_exec().
+ Error numbers -16 to -20, -22, and -30 are not used by pcre_exec().
Reason codes for invalid UTF-8 strings
@@ -3468,10 +3469,17 @@
This error is given if the output vector is not large enough. This
should be extremely rare, as a vector of size 1000 is used.
+ PCRE_ERROR_DFA_BADRESTART (-30)
+ When pcre_dfa_exec() is called with the PCRE_DFA_RESTART option, some
+ plausibility checks are made on the contents of the workspace, which
+ should contain data about the previous partial match. If any of these
+ checks fail, this error is given.
+
+
SEE ALSO
- pcre16(3), pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematch-
+ pcre16(3), pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematch-
ing(3), pcrepartial(3), pcreposix(3), pcreprecompile(3), pcresample(3),
pcrestack(3).
@@ -3485,11 +3493,11 @@
REVISION
- Last updated: 14 April 2012
+ Last updated: 04 May 2012
Copyright (c) 1997-2012 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
PCRECALLOUT(3) PCRECALLOUT(3)
@@ -3687,8 +3695,8 @@
Last updated: 08 January 2012
Copyright (c) 1997-2012 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
PCRECOMPAT(3) PCRECOMPAT(3)
@@ -3777,9 +3785,17 @@
There is a discussion that explains these differences in more detail in
the section on recursion differences from Perl in the pcrepattern page.
- 11. If (*THEN) is present in a group that is called as a subroutine,
- its action is limited to that group, even if the group does not contain
- any | characters.
+ 11. If any of the backtracking control verbs are used in an assertion
+ or in a subpattern that is called as a subroutine (whether or not
+ recursively), their effect is confined to that subpattern; it does not
+ extend to the surrounding pattern. This is not always the case in Perl.
+ In particular, if (*THEN) is present in a group that is called as a
+ subroutine, its action is limited to that group, even if the group does
+ not contain any | characters. There is one exception to this: the name
+ from a (*MARK), (*PRUNE), or (*THEN) that is encountered in a success-
+ ful positive assertion is passed back when a match succeeds (compare
+ capturing parentheses in assertions). Note that such subpatterns are
+ processed as anchored at the point where they are tested.
12. There are some differences that are concerned with the settings of
captured strings when part of a pattern is repeated. For example,
@@ -3799,7 +3815,7 @@
14. Perl recognizes comments in some places that PCRE does not, for
example, between the ( and ? at the start of a subpattern. If the /x
- modifier is set, Perl allows whitespace between ( and ? but PCRE never
+ modifier is set, Perl allows white space between ( and ? but PCRE never
does, even if the PCRE_EXTENDED option is set.
15. PCRE provides some extensions to the Perl regular expression facil-
@@ -3859,11 +3875,11 @@
REVISION
- Last updated: 08 January 2012
+ Last updated: 01 June 2012
Copyright (c) 1997-2012 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
PCREPATTERN(3) PCREPATTERN(3)
@@ -4045,10 +4061,10 @@
after a backslash. All other characters (in particular, those whose
codepoints are greater than 127) are treated as literals.
- If a pattern is compiled with the PCRE_EXTENDED option, whitespace in
+ If a pattern is compiled with the PCRE_EXTENDED option, white space in
the pattern (other than in a character class) and characters between a
# outside a character class and the next newline are ignored. An escap-
- ing backslash can be used to include a whitespace or # character as
+ ing backslash can be used to include a white space or # character as
part of the pattern.
If you want to remove the special meaning from a sequence of charac-
@@ -4083,7 +4099,7 @@
\a alarm, that is, the BEL character (hex 07)
\cx "control-x", where x is any ASCII character
\e escape (hex 1B)
- \f formfeed (hex 0C)
+ \f form feed (hex 0C)
\n linefeed (hex 0A)
\r carriage return (hex 0D)
\t tab (hex 09)
@@ -4212,12 +4228,12 @@
\d any decimal digit
\D any character that is not a decimal digit
- \h any horizontal whitespace character
- \H any character that is not a horizontal whitespace character
- \s any whitespace character
- \S any character that is not a whitespace character
- \v any vertical whitespace character
- \V any character that is not a vertical whitespace character
+ \h any horizontal white space character
+ \H any character that is not a horizontal white space character
+ \s any white space character
+ \S any character that is not a white space character
+ \v any vertical white space character
+ \V any character that is not a vertical white space character
\w any "word" character
\W any "non-word" character
@@ -4297,7 +4313,7 @@
U+000A Linefeed
U+000B Vertical tab
- U+000C Formfeed
+ U+000C Form feed
U+000D Carriage return
U+0085 Next line
U+2028 Line separator
@@ -4317,9 +4333,9 @@
This is an example of an "atomic group", details of which are given
below. This particular group matches either the two-character sequence
CR followed by LF, or one of the single characters LF (linefeed,
- U+000A), VT (vertical tab, U+000B), FF (formfeed, U+000C), CR (carriage
- return, U+000D), or NEL (next line, U+0085). The two-character sequence
- is treated as a single unit that cannot be split.
+ U+000A), VT (vertical tab, U+000B), FF (form feed, U+000C), CR (car-
+ riage return, U+000D), or NEL (next line, U+0085). The two-character
+ sequence is treated as a single unit that cannot be split.
In other modes, two additional characters whose codepoints are greater
than 255 are added: LS (line separator, U+2028) and PS (paragraph sepa-
@@ -4519,7 +4535,7 @@
Xan matches characters that have either the L (letter) or the N (num-
ber) property. Xps matches the characters tab, linefeed, vertical tab,
- formfeed, or carriage return, and any other character that has the Z
+ form feed, or carriage return, and any other character that has the Z
(separator) property. Xsp is the same as Xps, except that vertical tab
is excluded. Xwd matches the same characters as Xan, plus underscore.
@@ -5484,8 +5500,8 @@
its following a backslash are taken as part of a potential back refer-
ence number. If the pattern continues with a digit character, some
delimiter must be used to terminate the back reference. If the
- PCRE_EXTENDED option is set, this can be whitespace. Otherwise, the \g{
- syntax or an empty comment (see "Comments" below) can be used.
+ PCRE_EXTENDED option is set, this can be white space. Otherwise, the
+ \g{ syntax or an empty comment (see "Comments" below) can be used.
Recursive back references
@@ -5797,7 +5813,7 @@
DEFINE is that it can be used to define subroutines that can be refer-
enced from elsewhere. (The use of subroutines is described below.) For
example, a pattern to match an IPv4 address such as "192.168.23.245"
- could be written like this (ignore whitespace and line breaks):
+ could be written like this (ignore white space and line breaks):
(?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
\b (?&byte) (\.(?&byte)){3} \b
@@ -6188,82 +6204,83 @@
that is encountered in a successful positive assertion is passed back
when a match succeeds (compare capturing parentheses in assertions).
Note that such subpatterns are processed as anchored at the point where
- they are tested. Note also that Perl's treatment of subroutines is dif-
- ferent in some cases.
+ they are tested. Note also that Perl's treatment of subroutines and
+ assertions is different in some cases.
The new verbs make use of what was previously invalid syntax: an open-
ing parenthesis followed by an asterisk. They are generally of the form
(*VERB) or (*VERB:NAME). Some may take either form, with differing be-
haviour, depending on whether or not an argument is present. A name is
any sequence of characters that does not include a closing parenthesis.
- If the name is empty, that is, if the closing parenthesis immediately
- follows the colon, the effect is as if the colon were not there. Any
- number of these verbs may occur in a pattern.
+ The maximum length of name is 255 in the 8-bit library and 65535 in the
+ 16-bit library. If the name is empty, that is, if the closing parenthe-
+ sis immediately follows the colon, the effect is as if the colon were
+ not there. Any number of these verbs may occur in a pattern.
Optimizations that affect backtracking verbs
- PCRE contains some optimizations that are used to speed up matching by
+ PCRE contains some optimizations that are used to speed up matching by
running some checks at the start of each match attempt. For example, it
- may know the minimum length of matching subject, or that a particular
- character must be present. When one of these optimizations suppresses
- the running of a match, any included backtracking verbs will not, of
+ may know the minimum length of matching subject, or that a particular
+ character must be present. When one of these optimizations suppresses
+ the running of a match, any included backtracking verbs will not, of
course, be processed. You can suppress the start-of-match optimizations
- by setting the PCRE_NO_START_OPTIMIZE option when calling pcre_com-
+ by setting the PCRE_NO_START_OPTIMIZE option when calling pcre_com-
pile() or pcre_exec(), or by starting the pattern with (*NO_START_OPT).
There is more discussion of this option in the section entitled "Option
bits for pcre_exec()" in the pcreapi documentation.
- Experiments with Perl suggest that it too has similar optimizations,
+ Experiments with Perl suggest that it too has similar optimizations,
sometimes leading to anomalous results.
Verbs that act immediately
- The following verbs act as soon as they are encountered. They may not
+ The following verbs act as soon as they are encountered. They may not
be followed by a name.
(*ACCEPT)
- This verb causes the match to end successfully, skipping the remainder
- of the pattern. However, when it is inside a subpattern that is called
- as a subroutine, only that subpattern is ended successfully. Matching
- then continues at the outer level. If (*ACCEPT) is inside capturing
+ This verb causes the match to end successfully, skipping the remainder
+ of the pattern. However, when it is inside a subpattern that is called
+ as a subroutine, only that subpattern is ended successfully. Matching
+ then continues at the outer level. If (*ACCEPT) is inside capturing
parentheses, the data so far is captured. For example:
A((?:A|B(*ACCEPT)|C)D)
- This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-
+ This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-
tured by the outer parentheses.
(*FAIL) or (*F)
- This verb causes a matching failure, forcing backtracking to occur. It
- is equivalent to (?!) but easier to read. The Perl documentation notes
- that it is probably useful only when combined with (?{}) or (??{}).
- Those are, of course, Perl features that are not present in PCRE. The
- nearest equivalent is the callout feature, as for example in this pat-
+ This verb causes a matching failure, forcing backtracking to occur. It
+ is equivalent to (?!) but easier to read. The Perl documentation notes
+ that it is probably useful only when combined with (?{}) or (??{}).
+ Those are, of course, Perl features that are not present in PCRE. The
+ nearest equivalent is the callout feature, as for example in this pat-
tern:
a+(?C)(*FAIL)
- A match with the string "aaaa" always fails, but the callout is taken
+ A match with the string "aaaa" always fails, but the callout is taken
before each backtrack happens (in this example, 10 times).
Recording which path was taken
- There is one verb whose main purpose is to track how a match was
- arrived at, though it also has a secondary use in conjunction with
+ There is one verb whose main purpose is to track how a match was
+ arrived at, though it also has a secondary use in conjunction with
advancing the match starting point (see (*SKIP) below).
(*MARK:NAME) or (*:NAME)
- A name is always required with this verb. There may be as many
- instances of (*MARK) as you like in a pattern, and their names do not
+ A name is always required with this verb. There may be as many
+ instances of (*MARK) as you like in a pattern, and their names do not
have to be unique.
- When a match succeeds, the name of the last-encountered (*MARK) on the
- matching path is passed back to the caller as described in the section
- entitled "Extra data for pcre_exec()" in the pcreapi documentation.
- Here is an example of pcretest output, where the /K modifier requests
+ When a match succeeds, the name of the last-encountered (*MARK) on the
+ matching path is passed back to the caller as described in the section
+ entitled "Extra data for pcre_exec()" in the pcreapi documentation.
+ Here is an example of pcretest output, where the /K modifier requests
the retrieval and outputting of (*MARK) data:
re> /X(*MARK:A)Y|X(*MARK:B)Z/K
@@ -6275,63 +6292,63 @@
MK: B
The (*MARK) name is tagged with "MK:" in this output, and in this exam-
- ple it indicates which of the two alternatives matched. This is a more
- efficient way of obtaining this information than putting each alterna-
+ ple it indicates which of the two alternatives matched. This is a more
+ efficient way of obtaining this information than putting each alterna-
tive in its own capturing parentheses.
If (*MARK) is encountered in a positive assertion, its name is recorded
and passed back if it is the last-encountered. This does not happen for
negative assertions.
- After a partial match or a failed match, the name of the last encoun-
+ After a partial match or a failed match, the name of the last encoun-
tered (*MARK) in the entire match process is returned. For example:
re> /X(*MARK:A)Y|X(*MARK:B)Z/K
data> XP
No match, mark = B
- Note that in this unanchored example the mark is retained from the
+ Note that in this unanchored example the mark is retained from the
match attempt that started at the letter "X" in the subject. Subsequent
match attempts starting at "P" and then with an empty string do not get
as far as the (*MARK) item, but nevertheless do not reset it.
- If you are interested in (*MARK) values after failed matches, you
- should probably set the PCRE_NO_START_OPTIMIZE option (see above) to
+ If you are interested in (*MARK) values after failed matches, you
+ should probably set the PCRE_NO_START_OPTIMIZE option (see above) to
ensure that the match is always attempted.
Verbs that act after backtracking
The following verbs do nothing when they are encountered. Matching con-
- tinues with what follows, but if there is no subsequent match, causing
- a backtrack to the verb, a failure is forced. That is, backtracking
- cannot pass to the left of the verb. However, when one of these verbs
- appears inside an atomic group, its effect is confined to that group,
- because once the group has been matched, there is never any backtrack-
- ing into it. In this situation, backtracking can "jump back" to the
- left of the entire atomic group. (Remember also, as stated above, that
+ tinues with what follows, but if there is no subsequent match, causing
+ a backtrack to the verb, a failure is forced. That is, backtracking
+ cannot pass to the left of the verb. However, when one of these verbs
+ appears inside an atomic group, its effect is confined to that group,
+ because once the group has been matched, there is never any backtrack-
+ ing into it. In this situation, backtracking can "jump back" to the
+ left of the entire atomic group. (Remember also, as stated above, that
this localization also applies in subroutine calls and assertions.)
- These verbs differ in exactly what kind of failure occurs when back-
+ These verbs differ in exactly what kind of failure occurs when back-
tracking reaches them.
(*COMMIT)
- This verb, which may not be followed by a name, causes the whole match
+ This verb, which may not be followed by a name, causes the whole match
to fail outright if the rest of the pattern does not match. Even if the
pattern is unanchored, no further attempts to find a match by advancing
the starting point take place. Once (*COMMIT) has been passed,
- pcre_exec() is committed to finding a match at the current starting
+ pcre_exec() is committed to finding a match at the current starting
point, or not at all. For example:
a+(*COMMIT)b
- This matches "xxaab" but not "aacaab". It can be thought of as a kind
+ This matches "xxaab" but not "aacaab". It can be thought of as a kind
of dynamic anchor, or "I've started, so I must finish." The name of the
- most recently passed (*MARK) in the path is passed back when (*COMMIT)
+ most recently passed (*MARK) in the path is passed back when (*COMMIT)
forces a match failure.
- Note that (*COMMIT) at the start of a pattern is not the same as an
- anchor, unless PCRE's start-of-match optimizations are turned off, as
+ Note that (*COMMIT) at the start of a pattern is not the same as an
+ anchor, unless PCRE's start-of-match optimizations are turned off, as
shown in this pcretest example:
re> /(*COMMIT)abc/
@@ -6340,111 +6357,111 @@
xyzabc\Y
No match
- PCRE knows that any match must start with "a", so the optimization
- skips along the subject to "a" before running the first match attempt,
- which succeeds. When the optimization is disabled by the \Y escape in
+ PCRE knows that any match must start with "a", so the optimization
+ skips along the subject to "a" before running the first match attempt,
+ which succeeds. When the optimization is disabled by the \Y escape in
the second subject, the match starts at "x" and so the (*COMMIT) causes
it to fail without trying any other starting points.
(*PRUNE) or (*PRUNE:NAME)
- This verb causes the match to fail at the current starting position in
- the subject if the rest of the pattern does not match. If the pattern
- is unanchored, the normal "bumpalong" advance to the next starting
- character then happens. Backtracking can occur as usual to the left of
- (*PRUNE), before it is reached, or when matching to the right of
- (*PRUNE), but if there is no match to the right, backtracking cannot
- cross (*PRUNE). In simple cases, the use of (*PRUNE) is just an alter-
- native to an atomic group or possessive quantifier, but there are some
+ This verb causes the match to fail at the current starting position in
+ the subject if the rest of the pattern does not match. If the pattern
+ is unanchored, the normal "bumpalong" advance to the next starting
+ character then happens. Backtracking can occur as usual to the left of
+ (*PRUNE), before it is reached, or when matching to the right of
+ (*PRUNE), but if there is no match to the right, backtracking cannot
+ cross (*PRUNE). In simple cases, the use of (*PRUNE) is just an alter-
+ native to an atomic group or possessive quantifier, but there are some
uses of (*PRUNE) that cannot be expressed in any other way. The behav-
- iour of (*PRUNE:NAME) is the same as (*MARK:NAME)(*PRUNE). In an
+ iour of (*PRUNE:NAME) is the same as (*MARK:NAME)(*PRUNE). In an
anchored pattern (*PRUNE) has the same effect as (*COMMIT).
(*SKIP)
- This verb, when given without a name, is like (*PRUNE), except that if
- the pattern is unanchored, the "bumpalong" advance is not to the next
+ This verb, when given without a name, is like (*PRUNE), except that if
+ the pattern is unanchored, the "bumpalong" advance is not to the next
character, but to the position in the subject where (*SKIP) was encoun-
- tered. (*SKIP) signifies that whatever text was matched leading up to
+ tered. (*SKIP) signifies that whatever text was matched leading up to
it cannot be part of a successful match. Consider:
a+(*SKIP)b
- If the subject is "aaaac...", after the first match attempt fails
- (starting at the first character in the string), the starting point
+ If the subject is "aaaac...", after the first match attempt fails
+ (starting at the first character in the string), the starting point
skips on to start the next attempt at "c". Note that a possessive quan-
- tifier does not have the same effect as this example; although it would
- suppress backtracking during the first match attempt, the second
- attempt would start at the second character instead of skipping on to
+ tifier does not have the same effect as this example; although it would
+ suppress backtracking during the first match attempt, the second
+ attempt would start at the second character instead of skipping on to
"c".
(*SKIP:NAME)
- When (*SKIP) has an associated name, its behaviour is modified. If the
+ When (*SKIP) has an associated name, its behaviour is modified. If the
following pattern fails to match, the previous path through the pattern
- is searched for the most recent (*MARK) that has the same name. If one
- is found, the "bumpalong" advance is to the subject position that cor-
- responds to that (*MARK) instead of to where (*SKIP) was encountered.
+ is searched for the most recent (*MARK) that has the same name. If one
+ is found, the "bumpalong" advance is to the subject position that cor-
+ responds to that (*MARK) instead of to where (*SKIP) was encountered.
If no (*MARK) with a matching name is found, the (*SKIP) is ignored.
(*THEN) or (*THEN:NAME)
- This verb causes a skip to the next innermost alternative if the rest
- of the pattern does not match. That is, it cancels pending backtrack-
- ing, but only within the current alternative. Its name comes from the
+ This verb causes a skip to the next innermost alternative if the rest
+ of the pattern does not match. That is, it cancels pending backtrack-
+ ing, but only within the current alternative. Its name comes from the
observation that it can be used for a pattern-based if-then-else block:
( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
- If the COND1 pattern matches, FOO is tried (and possibly further items
- after the end of the group if FOO succeeds); on failure, the matcher
- skips to the second alternative and tries COND2, without backtracking
- into COND1. The behaviour of (*THEN:NAME) is exactly the same as
- (*MARK:NAME)(*THEN). If (*THEN) is not inside an alternation, it acts
+ If the COND1 pattern matches, FOO is tried (and possibly further items
+ after the end of the group if FOO succeeds); on failure, the matcher
+ skips to the second alternative and tries COND2, without backtracking
+ into COND1. The behaviour of (*THEN:NAME) is exactly the same as
+ (*MARK:NAME)(*THEN). If (*THEN) is not inside an alternation, it acts
like (*PRUNE).
- Note that a subpattern that does not contain a | character is just a
- part of the enclosing alternative; it is not a nested alternation with
- only one alternative. The effect of (*THEN) extends beyond such a sub-
- pattern to the enclosing alternative. Consider this pattern, where A,
+ Note that a subpattern that does not contain a | character is just a
+ part of the enclosing alternative; it is not a nested alternation with
+ only one alternative. The effect of (*THEN) extends beyond such a sub-
+ pattern to the enclosing alternative. Consider this pattern, where A,
B, etc. are complex pattern fragments that do not contain any | charac-
ters at this level:
A (B(*THEN)C) | D
- If A and B are matched, but there is a failure in C, matching does not
+ If A and B are matched, but there is a failure in C, matching does not
backtrack into A; instead it moves to the next alternative, that is, D.
- However, if the subpattern containing (*THEN) is given an alternative,
+ However, if the subpattern containing (*THEN) is given an alternative,
it behaves differently:
A (B(*THEN)C | (*FAIL)) | D
- The effect of (*THEN) is now confined to the inner subpattern. After a
+ The effect of (*THEN) is now confined to the inner subpattern. After a
failure in C, matching moves to (*FAIL), which causes the whole subpat-
- tern to fail because there are no more alternatives to try. In this
+ tern to fail because there are no more alternatives to try. In this
case, matching does now backtrack into A.
Note also that a conditional subpattern is not considered as having two
- alternatives, because only one is ever used. In other words, the |
+ alternatives, because only one is ever used. In other words, the |
character in a conditional subpattern has a different meaning. Ignoring
white space, consider:
^.*? (?(?=a) a | b(*THEN)c )
- If the subject is "ba", this pattern does not match. Because .*? is
- ungreedy, it initially matches zero characters. The condition (?=a)
- then fails, the character "b" is matched, but "c" is not. At this
- point, matching does not backtrack to .*? as might perhaps be expected
- from the presence of the | character. The conditional subpattern is
+ If the subject is "ba", this pattern does not match. Because .*? is
+ ungreedy, it initially matches zero characters. The condition (?=a)
+ then fails, the character "b" is matched, but "c" is not. At this
+ point, matching does not backtrack to .*? as might perhaps be expected
+ from the presence of the | character. The conditional subpattern is
part of the single alternative that comprises the whole pattern, and so
- the match fails. (If there was a backtrack into .*?, allowing it to
+ the match fails. (If there was a backtrack into .*?, allowing it to
match "b", the match would succeed.)
- The verbs just described provide four different "strengths" of control
+ The verbs just described provide four different "strengths" of control
when subsequent matching fails. (*THEN) is the weakest, carrying on the
- match at the next alternative. (*PRUNE) comes next, failing the match
- at the current starting position, but allowing an advance to the next
- character (for an unanchored pattern). (*SKIP) is similar, except that
+ match at the next alternative. (*PRUNE) comes next, failing the match
+ at the current starting position, but allowing an advance to the next
+ character (for an unanchored pattern). (*SKIP) is similar, except that
the advance may be more than one character. (*COMMIT) is the strongest,
causing the entire match to fail.
@@ -6454,15 +6471,15 @@
(A(*COMMIT)B(*THEN)C|D)
- Once A has matched, PCRE is committed to this match, at the current
- starting position. If subsequently B matches, but C does not, the nor-
+ Once A has matched, PCRE is committed to this match, at the current
+ starting position. If subsequently B matches, but C does not, the nor-
mal (*THEN) action of trying the next alternative (that is, D) does not
happen because (*COMMIT) overrides.
SEE ALSO
- pcreapi(3), pcrecallout(3), pcrematching(3), pcresyntax(3), pcre(3),
+ pcreapi(3), pcrecallout(3), pcrematching(3), pcresyntax(3), pcre(3),
pcre16(3).
@@ -6475,11 +6492,11 @@
REVISION
- Last updated: 14 April 2012
+ Last updated: 01 June 2012
Copyright (c) 1997-2012 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
PCRESYNTAX(3) PCRESYNTAX(3)
@@ -6505,7 +6522,7 @@
\a alarm, that is, the BEL character (hex 07)
\cx "control-x", where x is any ASCII character
\e escape (hex 1B)
- \f formfeed (hex 0C)
+ \f form feed (hex 0C)
\n newline (hex 0A)
\r carriage return (hex 0D)
\t tab (hex 09)
@@ -6521,16 +6538,16 @@
\C one data unit, even in UTF mode (best avoided)
\d a decimal digit
\D a character that is not a decimal digit
- \h a horizontal whitespace character
- \H a character that is not a horizontal whitespace character
+ \h a horizontal white space character
+ \H a character that is not a horizontal white space character
\N a character that is not a newline
\p{xx} a character with the xx property
\P{xx} a character without the xx property
\R a newline sequence
- \s a whitespace character
- \S a character that is not a whitespace character
- \v a vertical whitespace character
- \V a character that is not a vertical whitespace character
+ \s a white space character
+ \S a character that is not a white space character
+ \v a vertical white space character
+ \V a character that is not a vertical white space character
\w a "word" character
\W a "non-word" character
\X an extended Unicode sequence
@@ -6634,7 +6651,7 @@
lower lower case letter
print printing, including space
punct printing, excluding alphanumeric
- space whitespace
+ space white space
upper upper case letter
word same as \w
xdigit hexadecimal digit
@@ -6856,8 +6873,8 @@
Last updated: 10 January 2012
Copyright (c) 1997-2012 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
PCREUNICODE(3) PCREUNICODE(3)
@@ -6935,7 +6952,7 @@
If an invalid UTF-8 string is passed to PCRE, an error return is given.
At compile time, the only additional information is the offset to the
- first byte of the failing character. The runtime functions pcre_exec()
+ first byte of the failing character. The run-time functions pcre_exec()
and pcre_dfa_exec() also pass back this information, as well as a more
detailed reason code if the caller has provided memory in which to do
this.
@@ -6976,7 +6993,7 @@
If an invalid UTF-16 string is passed to PCRE, an error return is
given. At compile time, the only additional information is the offset
- to the first data unit of the failing character. The runtime functions
+ to the first data unit of the failing character. The run-time functions
pcre16_exec() and pcre16_dfa_exec() also pass back this information, as
well as a more detailed reason code if the caller has provided memory
in which to do this.
@@ -7030,7 +7047,7 @@
7. Similarly, characters that match the POSIX named character classes
are all low-valued characters, unless the PCRE_UCP option is set.
- 8. However, the horizontal and vertical whitespace matching escapes
+ 8. However, the horizontal and vertical white space matching escapes
(\h, \H, \v, and \V) do match all the appropriate Unicode characters,
whether or not PCRE_UCP is set.
@@ -7057,8 +7074,8 @@
Last updated: 14 April 2012
Copyright (c) 1997-2012 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
PCREJIT(3) PCREJIT(3)
@@ -7209,10 +7226,8 @@
\C match a single byte; not supported in UTF-8 mode
(?Cn) callouts
- (*COMMIT) )
- (*MARK) )
- (*PRUNE) ) the backtracking control verbs
- (*SKIP) )
+ (*PRUNE) )
+ (*SKIP) ) backtracking control verbs
(*THEN) )
Support for some of these may be added in future.
@@ -7441,11 +7456,11 @@
REVISION
- Last updated: 14 April 2012
+ Last updated: 04 May 2012
Copyright (c) 1997-2012 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
PCREPARTIAL(3) PCREPARTIAL(3)
@@ -7894,8 +7909,8 @@
Last updated: 24 February 2012
Copyright (c) 1997-2012 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
PCREPRECOMPILE(3) PCREPRECOMPILE(3)
@@ -8029,8 +8044,8 @@
Last updated: 10 January 2012
Copyright (c) 1997-2012 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
PCREPERFORM(3) PCREPERFORM(3)
@@ -8199,8 +8214,8 @@
Last updated: 09 January 2012
Copyright (c) 1997-2012 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
PCREPOSIX(3) PCREPOSIX(3)
@@ -8463,8 +8478,8 @@
Last updated: 09 January 2012
Copyright (c) 1997-2012 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
PCRECPP(3) PCRECPP(3)
@@ -8641,7 +8656,7 @@
PCRE_DOTALL dot matches newlines /s
PCRE_DOLLAR_ENDONLY $ matches only at end N/A
PCRE_EXTRA strict escape parsing N/A
- PCRE_EXTENDED ignore whitespaces /x
+ PCRE_EXTENDED ignore white spaces /x
PCRE_UTF8 handles UTF8 chars built-in
PCRE_UNGREEDY reverses * and *? N/A
PCRE_NO_AUTO_CAPTURE disables capturing parens N/A (*)
@@ -8805,8 +8820,8 @@
Last updated: 08 January 2012
------------------------------------------------------------------------------
-
-
+
+
PCRESAMPLE(3) PCRESAMPLE(3)
@@ -8929,6 +8944,10 @@
The maximum length of name for a named subpattern is 32 characters, and
the maximum number of named subpatterns is 10000.
+ The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or
+ (*THEN) verb is 255 for the 8-bit library and 65535 for the 16-bit
+ library.
+
The maximum length of a subject string is the largest positive number
that an integer variable can hold. However, when using the traditional
matching function, PCRE uses recursion to handle subpatterns and indef-
@@ -8946,11 +8965,11 @@
REVISION
- Last updated: 08 January 2012
+ Last updated: 04 May 2012
Copyright (c) 1997-2012 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
PCRESTACK(3) PCRESTACK(3)
@@ -9134,5 +9153,5 @@
Last updated: 21 January 2012
Copyright (c) 1997-2012 University of Cambridge.
------------------------------------------------------------------------------
-
-
+
+
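Note on the new pcrelimits text above: the limit on names in (*MARK), (*PRUNE), (*SKIP), and (*THEN) is 255 for the 8-bit library and 65535 for the 16-bit library. A minimal sketch of that check follows. The function name, the macro names, and the width_bits parameter are hypothetical and exist only for illustration; this is not the code in pcre_compile.c, and it assumes the length is counted in the library's own units.

```c
#include <stddef.h>

/* Limits as stated in the documentation text of this revision. */
#define MAX_VERB_NAME_8BIT   255u
#define MAX_VERB_NAME_16BIT  65535u

/* Hypothetical helper: returns 1 if a verb name of length len is within
   the documented limit for the given library width (8 or 16), else 0.
   Any width other than 16 is treated as the 8-bit library. */
static int verb_name_fits(size_t len, int width_bits)
{
  size_t max = (width_bits == 16) ? MAX_VERB_NAME_16BIT : MAX_VERB_NAME_8BIT;
  return len <= max;
}
```

A name that fails this kind of check is what compile error 75 ("name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)") reports.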
Modified: code/trunk/doc/pcre16.3
===================================================================
--- code/trunk/doc/pcre16.3 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/pcre16.3 2012-06-02 11:03:06 UTC (rev 975)
@@ -264,7 +264,7 @@
.sp
There are two new general option names, PCRE_UTF16 and PCRE_NO_UTF16_CHECK,
which correspond to PCRE_UTF8 and PCRE_NO_UTF8_CHECK in the 8-bit library. In
-fact, these new options define the same bits in the options word. There is a
+fact, these new options define the same bits in the options word. There is a
discussion about the
.\" HTML <a href="pcreunicode.html#utf16strings">
.\" </a>
@@ -274,7 +274,7 @@
.\" HREF
\fBpcreunicode\fP
.\"
-page.
+page.
.P
For the \fBpcre16_config()\fP function there is an option PCRE_CONFIG_UTF16
that returns 1 if UTF-16 support is configured, otherwise 0. If this option is
Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/pcreapi.3 2012-06-02 11:03:06 UTC (rev 975)
@@ -926,7 +926,7 @@
72 too many forward references
73 disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
74 invalid UTF-16 string (specifically UTF-16)
- 75 name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
+ 75 name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
.sp
The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
be used if the limits were changed when PCRE was built.
@@ -964,12 +964,12 @@
\fBpcre_dfa_exec()\fP, it must set up its own \fBpcre_extra\fP block.
.P
The second argument of \fBpcre_study()\fP contains option bits. There are three
-options:
+options:
.sp
PCRE_STUDY_JIT_COMPILE
PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
-.sp
+.sp
If any of these are set, and the just-in-time compiler is available, the
pattern is further compiled into machine code that executes much faster than
the \fBpcre_exec()\fP interpretive matching function. If the just-in-time
@@ -1240,7 +1240,7 @@
.sp
Return the number of characters (NB not bytes) in the longest lookbehind
assertion in the pattern. Note that the simple assertions \eb and \eB require a
-one-character lookbehind. This information is useful when doing multi-segment
+one-character lookbehind. This information is useful when doing multi-segment
matching using the partial matching facilities.
.sp
PCRE_INFO_MINLENGTH
@@ -1524,7 +1524,7 @@
Limiting the recursion depth limits the amount of machine stack that can be
used, or, when PCRE has been compiled to use memory on the heap instead of the
stack, the amount of heap memory that can be used. This limit is not relevant,
-and is ignored, when matching is done using JIT compiled code.
+and is ignored, when matching is done using JIT compiled code.
.P
The default value for \fImatch_limit_recursion\fP can be set when PCRE is
built; the default default is the same value as the default for
@@ -1708,7 +1708,7 @@
"no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK)
are considered at every possible starting position in the subject string. If
PCRE_NO_START_OPTIMIZE is set at compile time, it cannot be unset at matching
-time. The use of PCRE_NO_START_OPTIMIZE disables JIT execution; when it is set,
+time. The use of PCRE_NO_START_OPTIMIZE disables JIT execution; when it is set,
matching is always done interpretively.
.P
Setting PCRE_NO_START_OPTIMIZE can change the outcome of a matching operation.
@@ -2639,9 +2639,9 @@
PCRE_ERROR_DFA_BADRESTART (-30)
.sp
When \fBpcre_dfa_exec()\fP is called with the \fBPCRE_DFA_RESTART\fP option,
-some plausibility checks are made on the contents of the workspace, which
-should contain data about the previous partial match. If any of these checks
-fail, this error is given.
+some plausibility checks are made on the contents of the workspace, which
+should contain data about the previous partial match. If any of these checks
+fail, this error is given.
.
.
.SH "SEE ALSO"
Modified: code/trunk/doc/pcregrep.1
===================================================================
--- code/trunk/doc/pcregrep.1 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/pcregrep.1 2012-06-02 11:03:06 UTC (rev 975)
@@ -98,7 +98,7 @@
.SH "BINARY FILES"
.rs
.sp
-By default, a file that contains a binary zero byte within the first 1024 bytes
+By default, a file that contains a binary zero byte within the first 1024 bytes
is identified as a binary file, and is processed specially. (GNU grep also
identifies binary files in this manner.) See the \fB--binary-files\fP option
for a means of changing the way binary files are handled.
@@ -139,7 +139,7 @@
guarantees to have up to 8K of preceding text available for context output.
.TP
\fB--binary-files=\fP\fIword\fP
-Specify how binary files are to be processed. If the word is "binary" (the
+Specify how binary files are to be processed. If the word is "binary" (the
default), pattern matching is performed on binary files, but the only output is
"Binary file <name> matches" when a match succeeds. If the word is "text",
which is equivalent to the \fB-a\fP or \fB--text\fP option, binary files are
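Note on the BINARY FILES text above: the rule described there (a zero byte within the first 1024 bytes marks a file as binary) can be sketched as below. This is an illustration only, not pcregrep's own code; the function and macro names are hypothetical, and the caller is assumed to have already read the start of the file into a buffer.

```c
#include <stddef.h>

/* Number of leading bytes inspected, per the documentation text. */
#define BINARY_SCAN_LIMIT 1024

/* Hypothetical helper: returns 1 if a zero byte occurs within the first
   BINARY_SCAN_LIMIT bytes of buf (or within len bytes, if shorter),
   otherwise returns 0. */
static int looks_binary(const unsigned char *buf, size_t len)
{
  size_t limit = (len < BINARY_SCAN_LIMIT) ? len : BINARY_SCAN_LIMIT;
  size_t i;
  for (i = 0; i < limit; i++)
    if (buf[i] == 0) return 1;
  return 0;
}
```

A zero byte that first appears after the 1024-byte window does not change the classification under this rule.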
Modified: code/trunk/doc/pcrejit.3
===================================================================
--- code/trunk/doc/pcrejit.3 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/pcrejit.3 2012-06-02 11:03:06 UTC (rev 975)
@@ -82,7 +82,7 @@
pcre_free(study_ptr);
#endif
.sp
-PCRE_STUDY_JIT_COMPILE requests the JIT compiler to generate code for complete
+PCRE_STUDY_JIT_COMPILE requests the JIT compiler to generate code for complete
matches. If you want to run partial matches using the PCRE_PARTIAL_HARD or
PCRE_PARTIAL_SOFT options of \fBpcre_exec()\fP, you should set one or both of
the following options in addition to, or instead of, PCRE_STUDY_JIT_COMPILE
@@ -108,7 +108,7 @@
no JIT data is created. Otherwise, the compiled pattern is passed to the JIT
compiler, which turns it into machine code that executes much faster than the
normal interpretive code. When \fBpcre_exec()\fP is passed a \fBpcre_extra\fP
-block containing a pointer to JIT code of the appropriate mode (normal or
+block containing a pointer to JIT code of the appropriate mode (normal or
hard/soft partial), it obeys that code instead of running the interpreter. The
result is identical, but the compiled JIT code runs much faster.
.P
@@ -149,7 +149,7 @@
.sp
\eC match a single byte; not supported in UTF-8 mode
(?Cn) callouts
- (*PRUNE) )
+ (*PRUNE) )
(*SKIP) ) backtracking control verbs
(*THEN) )
.sp
@@ -239,24 +239,24 @@
(2) If \fIcallback\fP is NULL and \fIdata\fP is not NULL, \fIdata\fP must be
a valid JIT stack, the result of calling \fBpcre_jit_stack_alloc()\fP.
.sp
- (3) If \fIcallback\fP is not NULL, it must point to a function that is
- called with \fIdata\fP as an argument at the start of matching, in
- order to set up a JIT stack. If the return from the callback
- function is NULL, the internal 32K stack is used; otherwise the
- return value must be a valid JIT stack, the result of calling
+ (3) If \fIcallback\fP is not NULL, it must point to a function that is
+ called with \fIdata\fP as an argument at the start of matching, in
+ order to set up a JIT stack. If the return from the callback
+ function is NULL, the internal 32K stack is used; otherwise the
+ return value must be a valid JIT stack, the result of calling
\fBpcre_jit_stack_alloc()\fP.
.sp
-A callback function is obeyed whenever JIT code is about to be run; it is not
-obeyed when \fBpcre_exec()\fP is called with options that are incompatible for
+A callback function is obeyed whenever JIT code is about to be run; it is not
+obeyed when \fBpcre_exec()\fP is called with options that are incompatible for
JIT execution. A callback function can therefore be used to determine whether a
match operation was executed by JIT or by the interpreter.
.P
You may safely use the same JIT stack for more than one pattern (either by
assigning directly or by callback), as long as the patterns are all matched
sequentially in the same thread. In a multithread application, if you do not
-specify a JIT stack, or if you assign or pass back NULL from a callback, that
-is thread-safe, because each thread has its own machine stack. However, if you
-assign or pass back a non-NULL JIT stack, this must be a different stack for
+specify a JIT stack, or if you assign or pass back NULL from a callback, that
+is thread-safe, because each thread has its own machine stack. However, if you
+assign or pass back a non-NULL JIT stack, this must be a different stack for
each thread so that the application is thread-safe.
.P
Strictly speaking, even more is allowed. You can assign the same non-NULL stack
Modified: code/trunk/doc/pcrepartial.3
===================================================================
--- code/trunk/doc/pcrepartial.3 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/pcrepartial.3 2012-06-02 11:03:06 UTC (rev 975)
@@ -32,14 +32,14 @@
the details differ between the two types of matching function. If both options
are set, PCRE_PARTIAL_HARD takes precedence.
.P
-If you want to use partial matching with just-in-time optimized code, you must
+If you want to use partial matching with just-in-time optimized code, you must
call \fBpcre_study()\fP or \fBpcre16_study()\fP with one or both of these
options:
.sp
PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
.sp
-PCRE_STUDY_JIT_COMPILE should also be set if you are going to run non-partial
+PCRE_STUDY_JIT_COMPILE should also be set if you are going to run non-partial
matches on the same pattern. If the appropriate JIT study mode has not been set
for a match, the interpretive matching code is used.
.P
@@ -328,8 +328,8 @@
.P
2. Lookbehind assertions that have already been obeyed are catered for in the
offsets that are returned for a partial match. However a lookbehind assertion
-later in the pattern could require even earlier characters to be inspected. You
-can handle this case by using the PCRE_INFO_MAXLOOKBEHIND option of the
+later in the pattern could require even earlier characters to be inspected. You
+can handle this case by using the PCRE_INFO_MAXLOOKBEHIND option of the
\fBpcre_fullinfo()\fP or \fBpcre16_fullinfo()\fP functions to obtain the length
of the largest lookbehind in the pattern. This length is given in characters,
not bytes. If you always retain at least that many characters before the
@@ -345,7 +345,7 @@
data> ab\eP
No match
.sp
-If the next segment begins "cx", a match should be found, but this will only
+If the next segment begins "cx", a match should be found, but this will only
happen if characters from the previous segment are retained. For this reason, a
"no match" result should be interpreted as "partial match of an empty string"
when the pattern contains lookbehinds.
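Note on the multi-segment advice above: retaining characters from the previous segment can be sketched as moving the tail of the buffer to its front before the next segment is appended. The helper below is hypothetical and assumes the 8-bit library in non-UTF mode, where one character is one byte; the keep count would come from PCRE_INFO_MAXLOOKBEHIND, which is reported in characters.

```c
#include <string.h>

/* Hypothetical helper: buf holds `used` bytes of the segment just
   processed. Keep only the last `keep` bytes by moving them to the
   start of buf, and return the number of bytes now held. If the
   segment is no longer than `keep`, it is retained unchanged. */
static size_t retain_tail(char *buf, size_t used, size_t keep)
{
  if (keep >= used) return used;
  memmove(buf, buf + used - keep, keep);
  return keep;
}
```

The next segment would then be read in at buf plus the returned count, and matching restarted with a start offset that leaves the retained bytes available to lookbehind assertions.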
Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/pcrepattern.3 2012-06-02 11:03:06 UTC (rev 975)
@@ -2633,7 +2633,7 @@
.\" HREF
\fBpcreapi\fP
.\"
-documentation.
+documentation.
.P
Experiments with Perl suggest that it too has similar optimizations, sometimes
leading to anomalous results.
@@ -2727,10 +2727,10 @@
(*MARK) item, but nevertheless do not reset it.
.P
If you are interested in (*MARK) values after failed matches, you should
-probably set the PCRE_NO_START_OPTIMIZE option
+probably set the PCRE_NO_START_OPTIMIZE option
.\" HTML <a href="#nooptimize">
.\" </a>
-(see above)
+(see above)
.\"
to ensure that the match is always attempted.
.
Modified: code/trunk/doc/pcretest.1
===================================================================
--- code/trunk/doc/pcretest.1 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/doc/pcretest.1 2012-06-02 11:03:06 UTC (rev 975)
@@ -131,8 +131,8 @@
Behave as if each pattern has the \fB/S\fP modifier; in other words, force each
pattern to be studied. If \fB-s+\fP is used, all the JIT compile options are
passed to \fBpcre[16]_study()\fP, causing just-in-time optimization to be set
-up if it is available, for both full and partial matching. Specific JIT compile
-options can be selected by following \fB-s+\fP with a digit in the range 1 to
+up if it is available, for both full and partial matching. Specific JIT compile
+options can be selected by following \fB-s+\fP with a digit in the range 1 to
7, which selects the JIT compile modes as follows:
.sp
1 normal match only
@@ -141,7 +141,7 @@
4 hard partial match only
6 soft and hard partial match
7 all three modes (default)
-.sp
+.sp
If \fB-s++\fP is used instead of \fB-s+\fP (with or without a following digit),
the text "(JIT)" is added to the first output line after a match or no match
when JIT-compiled code was actually used.
@@ -402,7 +402,7 @@
If the \fB/S\fP modifier is immediately followed by a + character, the call to
\fBpcre[16]_study()\fP is made with all the JIT study options, requesting
just-in-time optimization support if it is available, for both normal and
-partial matching. If you want to restrict the JIT compiling modes, you can
+partial matching. If you want to restrict the JIT compiling modes, you can
follow \fB/S+\fP with a digit in the range 1 to 7:
.sp
1 normal match only
@@ -411,13 +411,13 @@
4 hard partial match only
6 soft and hard partial match
7 all three modes (default)
-.sp
+.sp
If \fB/S++\fP is used instead of \fB/S+\fP (with or without a following digit),
the text "(JIT)" is added to the first output line after a match or no match
when JIT-compiled code was actually used.
.P
Note that there is also an independent \fB/+\fP modifier; it must not be given
-immediately after \fB/S\fP or \fB/S+\fP because this will be misinterpreted.
+immediately after \fB/S\fP or \fB/S+\fP because this will be misinterpreted.
.P
If JIT studying is successful, the compiled JIT code will automatically be used
when \fBpcre[16]_exec()\fP is run, except when incompatible run-time options
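Note on the /S+ and -s+ digit tables above: the rows shown (1 normal, 4 hard partial, 6 soft and hard partial, 7 all three) are consistent with a three-bit mask, so a digit can be decoded as below. Treating the digit as a bit mask is an assumption of this sketch; the enum values and function name are hypothetical and are not the PCRE_STUDY_JIT_* constants from pcre.h.

```c
#include <stdio.h>

/* Illustrative flag values for this sketch only. */
enum {
  MODE_NORMAL       = 1,
  MODE_SOFT_PARTIAL = 2,
  MODE_HARD_PARTIAL = 4
};

/* Writes the names of the modes selected by digit (1 to 7) into out,
   each followed by a space. Returns 0 on success, or -1 if the digit
   is outside the range 1 to 7 (out is then left untouched). */
static int describe_jit_modes(int digit, char *out, size_t outsize)
{
  if (digit < 1 || digit > 7) return -1;
  snprintf(out, outsize, "%s%s%s",
           (digit & MODE_NORMAL)       ? "normal " : "",
           (digit & MODE_SOFT_PARTIAL) ? "soft-partial " : "",
           (digit & MODE_HARD_PARTIAL) ? "hard-partial " : "");
  return 0;
}
```

Under this reading, /S+6 selects both partial modes and /S+7 is the same as /S+ with no digit.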
Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/pcre_compile.c 2012-06-02 11:03:06 UTC (rev 975)
@@ -490,7 +490,7 @@
"disallowed Unicode code point (>= 0xd800 && <= 0xdfff)\0"
"invalid UTF-16 string\0"
/* 75 */
- "name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)\0"
+ "name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)\0"
;
/* Table to identify digits and hex digits. This is used when compiling
@@ -4518,7 +4518,7 @@
LONE_SINGLE_CHARACTER:
/* Only the value of 1 matters for class_single_char. */
-
+
if (class_single_char < 2) class_single_char++;
/* If class_charcount is 1, we saw precisely one character. As long as
@@ -4813,7 +4813,7 @@
if (*previous == OP_CHAR || *previous == OP_CHARI
|| *previous == OP_NOT || *previous == OP_NOTI)
{
- switch (*previous)
+ switch (*previous)
{
default: /* Make compiler happy. */
case OP_CHAR: op_type = OP_STAR - OP_STAR; break;
@@ -5593,7 +5593,7 @@
ptr++;
while (MAX_255(*ptr) && (cd->ctypes[*ptr] & ctype_letter) != 0) ptr++;
namelen = (int)(ptr - name);
-
+
/* It appears that Perl allows any characters whatsoever, other than
a closing parenthesis, to appear in arguments, so we no longer insist on
letters, digits, and underscores. */
@@ -5607,7 +5607,7 @@
{
*errorcodeptr = ERR75;
goto FAILED;
- }
+ }
}
if (*ptr != CHAR_RIGHT_PARENTHESIS)
@@ -6859,13 +6859,13 @@
/* For the rest (including \X when Unicode properties are supported), we
can obtain the OP value by negating the escape value in the default
situation when PCRE_UCP is not set. When it *is* set, we substitute
- Unicode property tests. Note that \b and \B do a one-character
+ Unicode property tests. Note that \b and \B do a one-character
lookbehind. */
else
{
if ((-c == ESC_b || -c == ESC_B) && cd->max_lookbehind == 0)
- cd->max_lookbehind = 1;
+ cd->max_lookbehind = 1;
#ifdef SUPPORT_UCP
if (-c >= ESC_DU && -c <= ESC_wu)
{
@@ -7173,11 +7173,11 @@
*ptrptr = ptr;
return FALSE;
}
- else
- {
- if (fixed_length > cd->max_lookbehind)
- cd->max_lookbehind = fixed_length;
- PUT(reverse_count, 0, fixed_length);
+ else
+ {
+ if (fixed_length > cd->max_lookbehind)
+ cd->max_lookbehind = fixed_length;
+ PUT(reverse_count, 0, fixed_length);
}
}
}
Modified: code/trunk/pcre_dfa_exec.c
===================================================================
--- code/trunk/pcre_dfa_exec.c 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/pcre_dfa_exec.c 2012-06-02 11:03:06 UTC (rev 975)
@@ -573,10 +573,10 @@
int clen, dlen;
unsigned int c, d;
int forced_fail = 0;
- BOOL partial_newline = FALSE;
+ BOOL partial_newline = FALSE;
BOOL could_continue = reset_could_continue;
- reset_could_continue = FALSE;
-
+ reset_could_continue = FALSE;
+
/* Make the new state list into the active state list and empty the
new state list. */
@@ -645,7 +645,7 @@
/* A negative offset is a special case meaning "hold off going to this
(negated) state until the number of characters in the data field have
- been skipped". If the could_continue flag was passed over from a previous
+ been skipped". If the could_continue flag was passed over from a previous
state, arrange for it to passed on. */
if (state_offset < 0)
@@ -695,7 +695,7 @@
permitted.
We also use this mechanism for opcodes such as OP_TYPEPLUS that take an
- argument that is not a data character - but is always one byte long because
+ argument that is not a data character - but is always one byte long because
the values are small. We have to take special action to deal with \P, \p,
\H, \h, \V, \v and \X in this case. To keep the other cases fast, convert
these ones to new opcodes. */
@@ -894,19 +894,19 @@
/*-----------------------------------------------------------------*/
case OP_ANY:
if (clen > 0 && !IS_NEWLINE(ptr))
- {
+ {
if (ptr + 1 >= md->end_subject &&
(md->moptions & (PCRE_PARTIAL_HARD)) != 0 &&
NLBLOCK->nltype == NLTYPE_FIXED &&
- NLBLOCK->nllen == 2 &&
+ NLBLOCK->nllen == 2 &&
c == NLBLOCK->nl[0])
{
- could_continue = partial_newline = TRUE;
- }
+ could_continue = partial_newline = TRUE;
+ }
else
- {
- ADD_NEW(state_offset + 1, 0);
- }
+ {
+ ADD_NEW(state_offset + 1, 0);
+ }
}
break;
@@ -938,16 +938,16 @@
else if (ptr + 1 >= md->end_subject &&
(md->moptions & (PCRE_PARTIAL_HARD|PCRE_PARTIAL_SOFT)) != 0 &&
NLBLOCK->nltype == NLTYPE_FIXED &&
- NLBLOCK->nllen == 2 &&
+ NLBLOCK->nllen == 2 &&
c == NLBLOCK->nl[0])
{
if ((md->moptions & PCRE_PARTIAL_HARD) != 0)
{
reset_could_continue = TRUE;
- ADD_NEW_DATA(-(state_offset + 1), 0, 1);
- }
- else could_continue = partial_newline = TRUE;
- }
+ ADD_NEW_DATA(-(state_offset + 1), 0, 1);
+ }
+ else could_continue = partial_newline = TRUE;
+ }
}
break;
@@ -963,16 +963,16 @@
else if (ptr + 1 >= md->end_subject &&
(md->moptions & (PCRE_PARTIAL_HARD|PCRE_PARTIAL_SOFT)) != 0 &&
NLBLOCK->nltype == NLTYPE_FIXED &&
- NLBLOCK->nllen == 2 &&
+ NLBLOCK->nllen == 2 &&
c == NLBLOCK->nl[0])
{
if ((md->moptions & PCRE_PARTIAL_HARD) != 0)
{
reset_could_continue = TRUE;
- ADD_NEW_DATA(-(state_offset + 1), 0, 1);
- }
- else could_continue = partial_newline = TRUE;
- }
+ ADD_NEW_DATA(-(state_offset + 1), 0, 1);
+ }
+ else could_continue = partial_newline = TRUE;
+ }
}
else if (IS_NEWLINE(ptr))
{ ADD_ACTIVE(state_offset + 1, 0); }
@@ -1138,11 +1138,11 @@
if (d == OP_ANY && ptr + 1 >= md->end_subject &&
(md->moptions & (PCRE_PARTIAL_HARD)) != 0 &&
NLBLOCK->nltype == NLTYPE_FIXED &&
- NLBLOCK->nllen == 2 &&
+ NLBLOCK->nllen == 2 &&
c == NLBLOCK->nl[0])
{
- could_continue = partial_newline = TRUE;
- }
+ could_continue = partial_newline = TRUE;
+ }
else if ((c >= 256 && d != OP_DIGIT && d != OP_WHITESPACE && d != OP_WORDCHAR) ||
(c < 256 &&
(d != OP_ANY || !IS_NEWLINE(ptr)) &&
@@ -1169,11 +1169,11 @@
if (d == OP_ANY && ptr + 1 >= md->end_subject &&
(md->moptions & (PCRE_PARTIAL_HARD)) != 0 &&
NLBLOCK->nltype == NLTYPE_FIXED &&
- NLBLOCK->nllen == 2 &&
+ NLBLOCK->nllen == 2 &&
c == NLBLOCK->nl[0])
{
- could_continue = partial_newline = TRUE;
- }
+ could_continue = partial_newline = TRUE;
+ }
else if ((c >= 256 && d != OP_DIGIT && d != OP_WHITESPACE && d != OP_WORDCHAR) ||
(c < 256 &&
(d != OP_ANY || !IS_NEWLINE(ptr)) &&
@@ -1199,11 +1199,11 @@
if (d == OP_ANY && ptr + 1 >= md->end_subject &&
(md->moptions & (PCRE_PARTIAL_HARD)) != 0 &&
NLBLOCK->nltype == NLTYPE_FIXED &&
- NLBLOCK->nllen == 2 &&
+ NLBLOCK->nllen == 2 &&
c == NLBLOCK->nl[0])
{
- could_continue = partial_newline = TRUE;
- }
+ could_continue = partial_newline = TRUE;
+ }
else if ((c >= 256 && d != OP_DIGIT && d != OP_WHITESPACE && d != OP_WORDCHAR) ||
(c < 256 &&
(d != OP_ANY || !IS_NEWLINE(ptr)) &&
@@ -1227,11 +1227,11 @@
if (d == OP_ANY && ptr + 1 >= md->end_subject &&
(md->moptions & (PCRE_PARTIAL_HARD)) != 0 &&
NLBLOCK->nltype == NLTYPE_FIXED &&
- NLBLOCK->nllen == 2 &&
+ NLBLOCK->nllen == 2 &&
c == NLBLOCK->nl[0])
{
- could_continue = partial_newline = TRUE;
- }
+ could_continue = partial_newline = TRUE;
+ }
else if ((c >= 256 && d != OP_DIGIT && d != OP_WHITESPACE && d != OP_WORDCHAR) ||
(c < 256 &&
(d != OP_ANY || !IS_NEWLINE(ptr)) &&
@@ -1256,11 +1256,11 @@
if (d == OP_ANY && ptr + 1 >= md->end_subject &&
(md->moptions & (PCRE_PARTIAL_HARD)) != 0 &&
NLBLOCK->nltype == NLTYPE_FIXED &&
- NLBLOCK->nllen == 2 &&
+ NLBLOCK->nllen == 2 &&
c == NLBLOCK->nl[0])
{
- could_continue = partial_newline = TRUE;
- }
+ could_continue = partial_newline = TRUE;
+ }
else if ((c >= 256 && d != OP_DIGIT && d != OP_WHITESPACE && d != OP_WORDCHAR) ||
(c < 256 &&
(d != OP_ANY || !IS_NEWLINE(ptr)) &&
@@ -1909,8 +1909,8 @@
ncount++;
nptr += ndlen;
}
- if (nptr >= end_subject && (md->moptions & PCRE_PARTIAL_HARD) != 0)
- reset_could_continue = TRUE;
+ if (nptr >= end_subject && (md->moptions & PCRE_PARTIAL_HARD) != 0)
+ reset_could_continue = TRUE;
if (++count >= GET2(code, 1))
{ ADD_NEW_DATA(-(state_offset + 2 + IMM2_SIZE), 0, ncount); }
else
@@ -2124,8 +2124,8 @@
ncount++;
nptr += nclen;
}
- if (nptr >= end_subject && (md->moptions & PCRE_PARTIAL_HARD) != 0)
- reset_could_continue = TRUE;
+ if (nptr >= end_subject && (md->moptions & PCRE_PARTIAL_HARD) != 0)
+ reset_could_continue = TRUE;
ADD_NEW_DATA(-(state_offset + 1), 0, ncount);
}
break;
@@ -2151,20 +2151,20 @@
break;
case 0x000d:
- if (ptr + 1 >= end_subject)
+ if (ptr + 1 >= end_subject)
{
- ADD_NEW(state_offset + 1, 0);
- if ((md->moptions & PCRE_PARTIAL_HARD) != 0)
- reset_could_continue = TRUE;
- }
+ ADD_NEW(state_offset + 1, 0);
+ if ((md->moptions & PCRE_PARTIAL_HARD) != 0)
+ reset_could_continue = TRUE;
+ }
else if (ptr[1] == 0x0a)
{
ADD_NEW_DATA(-(state_offset + 1), 0, 1);
}
else
- {
+ {
ADD_NEW(state_offset + 1, 0);
- }
+ }
break;
}
break;
@@ -2277,7 +2277,7 @@
case OP_NOTI:
if (clen > 0)
- {
+ {
unsigned int otherd;
#ifdef SUPPORT_UTF
if (utf && d >= 128)
@@ -2291,7 +2291,7 @@
otherd = TABLE_GET(d, fcc, d);
if (c != d && c != otherd)
{ ADD_NEW(state_offset + dlen + 1, 0); }
- }
+ }
break;
/*-----------------------------------------------------------------*/
@@ -3047,7 +3047,7 @@
The "could_continue" variable is true if a state could have continued but
for the fact that the end of the subject was reached. */
-
+
if (new_count <= 0)
{
if (rlevel == 1 && /* Top level, and */
@@ -3064,8 +3064,8 @@
( /* or ... */
ptr >= end_subject && /* End of subject and */
ptr > md->start_used_ptr) /* Inspected non-empty string */
- )
- )
+ )
+ )
{
if (offsetcount >= 2)
{
@@ -3172,15 +3172,15 @@
PCRE_ERROR_BADENDIANNESS:PCRE_ERROR_BADMAGIC;
if ((re->flags & PCRE_MODE) == 0) return PCRE_ERROR_BADMODE;
-/* If restarting after a partial match, do some sanity checks on the contents
+/* If restarting after a partial match, do some sanity checks on the contents
of the workspace. */
if ((options & PCRE_DFA_RESTART) != 0)
{
- if ((workspace[0] & (-2)) != 0 || workspace[1] < 1 ||
+ if ((workspace[0] & (-2)) != 0 || workspace[1] < 1 ||
workspace[1] > (wscount - 2)/INTS_PER_STATEBLOCK)
- return PCRE_ERROR_DFA_BADRESTART;
- }
+ return PCRE_ERROR_DFA_BADRESTART;
+ }
/* Set up study, callout, and table data */
Modified: code/trunk/pcre_exec.c
===================================================================
--- code/trunk/pcre_exec.c 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/pcre_exec.c 2012-06-02 11:03:06 UTC (rev 975)
@@ -1577,9 +1577,9 @@
}
md->mark = save_mark;
- /* A COMMIT failure must fail the entire assertion, without trying any
+ /* A COMMIT failure must fail the entire assertion, without trying any
subsequent branches. */
-
+
if (rrc == MATCH_COMMIT) RRETURN(MATCH_NOMATCH);
/* PCRE does not allow THEN to escape beyond an assertion; it
Modified: code/trunk/pcre_fullinfo.c
===================================================================
--- code/trunk/pcre_fullinfo.c 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/pcre_fullinfo.c 2012-06-02 11:03:06 UTC (rev 975)
@@ -192,10 +192,10 @@
case PCRE_INFO_HASCRORLF:
*((int *)where) = (re->flags & PCRE_HASCRORLF) != 0;
break;
-
- case PCRE_INFO_MAXLOOKBEHIND:
+
+ case PCRE_INFO_MAXLOOKBEHIND:
*((int *)where) = re->max_lookbehind;
- break;
+ break;
default: return PCRE_ERROR_BADOPTION;
}
Modified: code/trunk/pcre_internal.h
===================================================================
--- code/trunk/pcre_internal.h 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/pcre_internal.h 2012-06-02 11:03:06 UTC (rev 975)
@@ -2137,7 +2137,7 @@
const pcre_uchar *once_target; /* Where to back up to for atomic groups */
#ifdef NO_RECURSE
void *match_frames_base; /* For remembering malloc'd frames */
-#endif
+#endif
} match_data;
/* A similar structure is used for the same purpose by the DFA matching
Modified: code/trunk/pcre_tables.c
===================================================================
--- code/trunk/pcre_tables.c 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/pcre_tables.c 2012-06-02 11:03:06 UTC (rev 975)
@@ -435,151 +435,151 @@
STRING_Zs0;
const ucp_type_table PRIV(utt)[] = {
- { 0, PT_ANY, 0 },
- { 4, PT_SC, ucp_Arabic },
- { 11, PT_SC, ucp_Armenian },
- { 20, PT_SC, ucp_Avestan },
- { 28, PT_SC, ucp_Balinese },
- { 37, PT_SC, ucp_Bamum },
- { 43, PT_SC, ucp_Batak },
- { 49, PT_SC, ucp_Bengali },
- { 57, PT_SC, ucp_Bopomofo },
- { 66, PT_SC, ucp_Brahmi },
- { 73, PT_SC, ucp_Braille },
- { 81, PT_SC, ucp_Buginese },
- { 90, PT_SC, ucp_Buhid },
- { 96, PT_GC, ucp_C },
- { 98, PT_SC, ucp_Canadian_Aboriginal },
- { 118, PT_SC, ucp_Carian },
- { 125, PT_PC, ucp_Cc },
- { 128, PT_PC, ucp_Cf },
- { 131, PT_SC, ucp_Chakma },
- { 138, PT_SC, ucp_Cham },
- { 143, PT_SC, ucp_Cherokee },
- { 152, PT_PC, ucp_Cn },
- { 155, PT_PC, ucp_Co },
- { 158, PT_SC, ucp_Common },
- { 165, PT_SC, ucp_Coptic },
- { 172, PT_PC, ucp_Cs },
- { 175, PT_SC, ucp_Cuneiform },
- { 185, PT_SC, ucp_Cypriot },
- { 193, PT_SC, ucp_Cyrillic },
- { 202, PT_SC, ucp_Deseret },
- { 210, PT_SC, ucp_Devanagari },
- { 221, PT_SC, ucp_Egyptian_Hieroglyphs },
- { 242, PT_SC, ucp_Ethiopic },
- { 251, PT_SC, ucp_Georgian },
- { 260, PT_SC, ucp_Glagolitic },
- { 271, PT_SC, ucp_Gothic },
- { 278, PT_SC, ucp_Greek },
- { 284, PT_SC, ucp_Gujarati },
- { 293, PT_SC, ucp_Gurmukhi },
- { 302, PT_SC, ucp_Han },
- { 306, PT_SC, ucp_Hangul },
- { 313, PT_SC, ucp_Hanunoo },
- { 321, PT_SC, ucp_Hebrew },
- { 328, PT_SC, ucp_Hiragana },
- { 337, PT_SC, ucp_Imperial_Aramaic },
- { 354, PT_SC, ucp_Inherited },
- { 364, PT_SC, ucp_Inscriptional_Pahlavi },
- { 386, PT_SC, ucp_Inscriptional_Parthian },
- { 409, PT_SC, ucp_Javanese },
- { 418, PT_SC, ucp_Kaithi },
- { 425, PT_SC, ucp_Kannada },
- { 433, PT_SC, ucp_Katakana },
- { 442, PT_SC, ucp_Kayah_Li },
- { 451, PT_SC, ucp_Kharoshthi },
- { 462, PT_SC, ucp_Khmer },
- { 468, PT_GC, ucp_L },
- { 470, PT_LAMP, 0 },
- { 473, PT_SC, ucp_Lao },
- { 477, PT_SC, ucp_Latin },
- { 483, PT_SC, ucp_Lepcha },
- { 490, PT_SC, ucp_Limbu },
- { 496, PT_SC, ucp_Linear_B },
- { 505, PT_SC, ucp_Lisu },
- { 510, PT_PC, ucp_Ll },
- { 513, PT_PC, ucp_Lm },
- { 516, PT_PC, ucp_Lo },
- { 519, PT_PC, ucp_Lt },
- { 522, PT_PC, ucp_Lu },
- { 525, PT_SC, ucp_Lycian },
- { 532, PT_SC, ucp_Lydian },
- { 539, PT_GC, ucp_M },
- { 541, PT_SC, ucp_Malayalam },
- { 551, PT_SC, ucp_Mandaic },
- { 559, PT_PC, ucp_Mc },
- { 562, PT_PC, ucp_Me },
- { 565, PT_SC, ucp_Meetei_Mayek },
- { 578, PT_SC, ucp_Meroitic_Cursive },
- { 595, PT_SC, ucp_Meroitic_Hieroglyphs },
- { 616, PT_SC, ucp_Miao },
- { 621, PT_PC, ucp_Mn },
- { 624, PT_SC, ucp_Mongolian },
- { 634, PT_SC, ucp_Myanmar },
- { 642, PT_GC, ucp_N },
- { 644, PT_PC, ucp_Nd },
- { 647, PT_SC, ucp_New_Tai_Lue },
- { 659, PT_SC, ucp_Nko },
- { 663, PT_PC, ucp_Nl },
- { 666, PT_PC, ucp_No },
- { 669, PT_SC, ucp_Ogham },
- { 675, PT_SC, ucp_Ol_Chiki },
- { 684, PT_SC, ucp_Old_Italic },
- { 695, PT_SC, ucp_Old_Persian },
- { 707, PT_SC, ucp_Old_South_Arabian },
- { 725, PT_SC, ucp_Old_Turkic },
- { 736, PT_SC, ucp_Oriya },
- { 742, PT_SC, ucp_Osmanya },
- { 750, PT_GC, ucp_P },
- { 752, PT_PC, ucp_Pc },
- { 755, PT_PC, ucp_Pd },
- { 758, PT_PC, ucp_Pe },
- { 761, PT_PC, ucp_Pf },
- { 764, PT_SC, ucp_Phags_Pa },
- { 773, PT_SC, ucp_Phoenician },
- { 784, PT_PC, ucp_Pi },
- { 787, PT_PC, ucp_Po },
- { 790, PT_PC, ucp_Ps },
- { 793, PT_SC, ucp_Rejang },
- { 800, PT_SC, ucp_Runic },
- { 806, PT_GC, ucp_S },
- { 808, PT_SC, ucp_Samaritan },
- { 818, PT_SC, ucp_Saurashtra },
- { 829, PT_PC, ucp_Sc },
- { 832, PT_SC, ucp_Sharada },
- { 840, PT_SC, ucp_Shavian },
- { 848, PT_SC, ucp_Sinhala },
- { 856, PT_PC, ucp_Sk },
- { 859, PT_PC, ucp_Sm },
- { 862, PT_PC, ucp_So },
- { 865, PT_SC, ucp_Sora_Sompeng },
- { 878, PT_SC, ucp_Sundanese },
- { 888, PT_SC, ucp_Syloti_Nagri },
- { 901, PT_SC, ucp_Syriac },
- { 908, PT_SC, ucp_Tagalog },
- { 916, PT_SC, ucp_Tagbanwa },
- { 925, PT_SC, ucp_Tai_Le },
- { 932, PT_SC, ucp_Tai_Tham },
- { 941, PT_SC, ucp_Tai_Viet },
- { 950, PT_SC, ucp_Takri },
- { 956, PT_SC, ucp_Tamil },
- { 962, PT_SC, ucp_Telugu },
- { 969, PT_SC, ucp_Thaana },
- { 976, PT_SC, ucp_Thai },
- { 981, PT_SC, ucp_Tibetan },
- { 989, PT_SC, ucp_Tifinagh },
- { 998, PT_SC, ucp_Ugaritic },
- { 1007, PT_SC, ucp_Vai },
- { 1011, PT_ALNUM, 0 },
- { 1015, PT_PXSPACE, 0 },
- { 1019, PT_SPACE, 0 },
- { 1023, PT_WORD, 0 },
- { 1027, PT_SC, ucp_Yi },
- { 1030, PT_GC, ucp_Z },
- { 1032, PT_PC, ucp_Zl },
- { 1035, PT_PC, ucp_Zp },
- { 1038, PT_PC, ucp_Zs }
+ { 0, PT_ANY, 0 },
+ { 4, PT_SC, ucp_Arabic },
+ { 11, PT_SC, ucp_Armenian },
+ { 20, PT_SC, ucp_Avestan },
+ { 28, PT_SC, ucp_Balinese },
+ { 37, PT_SC, ucp_Bamum },
+ { 43, PT_SC, ucp_Batak },
+ { 49, PT_SC, ucp_Bengali },
+ { 57, PT_SC, ucp_Bopomofo },
+ { 66, PT_SC, ucp_Brahmi },
+ { 73, PT_SC, ucp_Braille },
+ { 81, PT_SC, ucp_Buginese },
+ { 90, PT_SC, ucp_Buhid },
+ { 96, PT_GC, ucp_C },
+ { 98, PT_SC, ucp_Canadian_Aboriginal },
+ { 118, PT_SC, ucp_Carian },
+ { 125, PT_PC, ucp_Cc },
+ { 128, PT_PC, ucp_Cf },
+ { 131, PT_SC, ucp_Chakma },
+ { 138, PT_SC, ucp_Cham },
+ { 143, PT_SC, ucp_Cherokee },
+ { 152, PT_PC, ucp_Cn },
+ { 155, PT_PC, ucp_Co },
+ { 158, PT_SC, ucp_Common },
+ { 165, PT_SC, ucp_Coptic },
+ { 172, PT_PC, ucp_Cs },
+ { 175, PT_SC, ucp_Cuneiform },
+ { 185, PT_SC, ucp_Cypriot },
+ { 193, PT_SC, ucp_Cyrillic },
+ { 202, PT_SC, ucp_Deseret },
+ { 210, PT_SC, ucp_Devanagari },
+ { 221, PT_SC, ucp_Egyptian_Hieroglyphs },
+ { 242, PT_SC, ucp_Ethiopic },
+ { 251, PT_SC, ucp_Georgian },
+ { 260, PT_SC, ucp_Glagolitic },
+ { 271, PT_SC, ucp_Gothic },
+ { 278, PT_SC, ucp_Greek },
+ { 284, PT_SC, ucp_Gujarati },
+ { 293, PT_SC, ucp_Gurmukhi },
+ { 302, PT_SC, ucp_Han },
+ { 306, PT_SC, ucp_Hangul },
+ { 313, PT_SC, ucp_Hanunoo },
+ { 321, PT_SC, ucp_Hebrew },
+ { 328, PT_SC, ucp_Hiragana },
+ { 337, PT_SC, ucp_Imperial_Aramaic },
+ { 354, PT_SC, ucp_Inherited },
+ { 364, PT_SC, ucp_Inscriptional_Pahlavi },
+ { 386, PT_SC, ucp_Inscriptional_Parthian },
+ { 409, PT_SC, ucp_Javanese },
+ { 418, PT_SC, ucp_Kaithi },
+ { 425, PT_SC, ucp_Kannada },
+ { 433, PT_SC, ucp_Katakana },
+ { 442, PT_SC, ucp_Kayah_Li },
+ { 451, PT_SC, ucp_Kharoshthi },
+ { 462, PT_SC, ucp_Khmer },
+ { 468, PT_GC, ucp_L },
+ { 470, PT_LAMP, 0 },
+ { 473, PT_SC, ucp_Lao },
+ { 477, PT_SC, ucp_Latin },
+ { 483, PT_SC, ucp_Lepcha },
+ { 490, PT_SC, ucp_Limbu },
+ { 496, PT_SC, ucp_Linear_B },
+ { 505, PT_SC, ucp_Lisu },
+ { 510, PT_PC, ucp_Ll },
+ { 513, PT_PC, ucp_Lm },
+ { 516, PT_PC, ucp_Lo },
+ { 519, PT_PC, ucp_Lt },
+ { 522, PT_PC, ucp_Lu },
+ { 525, PT_SC, ucp_Lycian },
+ { 532, PT_SC, ucp_Lydian },
+ { 539, PT_GC, ucp_M },
+ { 541, PT_SC, ucp_Malayalam },
+ { 551, PT_SC, ucp_Mandaic },
+ { 559, PT_PC, ucp_Mc },
+ { 562, PT_PC, ucp_Me },
+ { 565, PT_SC, ucp_Meetei_Mayek },
+ { 578, PT_SC, ucp_Meroitic_Cursive },
+ { 595, PT_SC, ucp_Meroitic_Hieroglyphs },
+ { 616, PT_SC, ucp_Miao },
+ { 621, PT_PC, ucp_Mn },
+ { 624, PT_SC, ucp_Mongolian },
+ { 634, PT_SC, ucp_Myanmar },
+ { 642, PT_GC, ucp_N },
+ { 644, PT_PC, ucp_Nd },
+ { 647, PT_SC, ucp_New_Tai_Lue },
+ { 659, PT_SC, ucp_Nko },
+ { 663, PT_PC, ucp_Nl },
+ { 666, PT_PC, ucp_No },
+ { 669, PT_SC, ucp_Ogham },
+ { 675, PT_SC, ucp_Ol_Chiki },
+ { 684, PT_SC, ucp_Old_Italic },
+ { 695, PT_SC, ucp_Old_Persian },
+ { 707, PT_SC, ucp_Old_South_Arabian },
+ { 725, PT_SC, ucp_Old_Turkic },
+ { 736, PT_SC, ucp_Oriya },
+ { 742, PT_SC, ucp_Osmanya },
+ { 750, PT_GC, ucp_P },
+ { 752, PT_PC, ucp_Pc },
+ { 755, PT_PC, ucp_Pd },
+ { 758, PT_PC, ucp_Pe },
+ { 761, PT_PC, ucp_Pf },
+ { 764, PT_SC, ucp_Phags_Pa },
+ { 773, PT_SC, ucp_Phoenician },
+ { 784, PT_PC, ucp_Pi },
+ { 787, PT_PC, ucp_Po },
+ { 790, PT_PC, ucp_Ps },
+ { 793, PT_SC, ucp_Rejang },
+ { 800, PT_SC, ucp_Runic },
+ { 806, PT_GC, ucp_S },
+ { 808, PT_SC, ucp_Samaritan },
+ { 818, PT_SC, ucp_Saurashtra },
+ { 829, PT_PC, ucp_Sc },
+ { 832, PT_SC, ucp_Sharada },
+ { 840, PT_SC, ucp_Shavian },
+ { 848, PT_SC, ucp_Sinhala },
+ { 856, PT_PC, ucp_Sk },
+ { 859, PT_PC, ucp_Sm },
+ { 862, PT_PC, ucp_So },
+ { 865, PT_SC, ucp_Sora_Sompeng },
+ { 878, PT_SC, ucp_Sundanese },
+ { 888, PT_SC, ucp_Syloti_Nagri },
+ { 901, PT_SC, ucp_Syriac },
+ { 908, PT_SC, ucp_Tagalog },
+ { 916, PT_SC, ucp_Tagbanwa },
+ { 925, PT_SC, ucp_Tai_Le },
+ { 932, PT_SC, ucp_Tai_Tham },
+ { 941, PT_SC, ucp_Tai_Viet },
+ { 950, PT_SC, ucp_Takri },
+ { 956, PT_SC, ucp_Tamil },
+ { 962, PT_SC, ucp_Telugu },
+ { 969, PT_SC, ucp_Thaana },
+ { 976, PT_SC, ucp_Thai },
+ { 981, PT_SC, ucp_Tibetan },
+ { 989, PT_SC, ucp_Tifinagh },
+ { 998, PT_SC, ucp_Ugaritic },
+ { 1007, PT_SC, ucp_Vai },
+ { 1011, PT_ALNUM, 0 },
+ { 1015, PT_PXSPACE, 0 },
+ { 1019, PT_SPACE, 0 },
+ { 1023, PT_WORD, 0 },
+ { 1027, PT_SC, ucp_Yi },
+ { 1030, PT_GC, ucp_Z },
+ { 1032, PT_PC, ucp_Zl },
+ { 1035, PT_PC, ucp_Zp },
+ { 1038, PT_PC, ucp_Zs }
};
const int PRIV(utt_size) = sizeof(PRIV(utt)) / sizeof(ucp_type_table);
Modified: code/trunk/pcregrep.c
===================================================================
--- code/trunk/pcregrep.c 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/pcregrep.c 2012-06-02 11:03:06 UTC (rev 975)
@@ -251,7 +251,7 @@
{ OP_PATLIST, 'e', NULL, "regex(p)=pattern", "specify pattern (may be used more than once)" },
{ OP_NODATA, 'F', NULL, "fixed-strings", "patterns are sets of newline-separated strings" },
{ OP_STRING, 'f', &pattern_filename, "file=path", "read patterns from file" },
- { OP_STRING, N_FILE_LIST, &file_list, "file-list=path","read files to search from file" },
+ { OP_STRING, N_FILE_LIST, &file_list, "file-list=path","read files to search from file" },
{ OP_NODATA, N_FOFFSETS, NULL, "file-offsets", "output file offsets, not text" },
{ OP_NODATA, 'H', NULL, "with-filename", "force the prefixing filename on output" },
{ OP_NODATA, 'h', NULL, "no-filename", "suppress the prefixing filename on output" },
@@ -1105,15 +1105,15 @@
endptr = main_buffer + bufflength;
/* Unless binary-files=text, see if we have a binary file. This uses the same
-rule as GNU grep, namely, a search for a binary zero byte near the start of the
+rule as GNU grep, namely, a search for a binary zero byte near the start of the
file. */
if (binary_files != BIN_TEXT)
{
- binary =
+ binary =
memchr(main_buffer, 0, (bufflength > 1024)? 1024 : bufflength) != NULL;
if (binary && binary_files == BIN_NOMATCH) return 1;
- }
+ }
/* Loop while the current pointer is not at the end of the file. For large
files, endptr will be at the end of the buffer when we are in the middle of the
@@ -1230,16 +1230,16 @@
/* Just count if just counting is wanted. */
if (count_only) count++;
-
- /* When handling a binary file and binary-files==binary, the "binary"
- variable will be set true (it's false in all other cases). In this
+
+ /* When handling a binary file and binary-files==binary, the "binary"
+ variable will be set true (it's false in all other cases). In this
situation we just want to output the file name. No need to scan further. */
-
+
else if (binary)
{
fprintf(stdout, "Binary file %s matches\n", filename);
- return 0;
- }
+ return 0;
+ }
/* If all we want is a file name, there is no need to scan any more lines
in the file. */
@@ -1876,15 +1876,15 @@
contains an underscore. */
if (strchr(op->long_name, '_') != NULL) continue;
-
+
if (op->one_char > 0 && (op->long_name)[0] == 0)
n = 31 - printf(" -%c", op->one_char);
- else
+ else
{
- if (op->one_char > 0) sprintf(s, "-%c,", op->one_char);
+ if (op->one_char > 0) sprintf(s, "-%c,", op->one_char);
else strcpy(s, " ");
n = 31 - printf(" %s --%s", s, op->long_name);
- }
+ }
if (n < 1) n = 1;
printf("%.*s%s\n", n, " ", op->help_text);
@@ -2356,7 +2356,7 @@
/* If the option type is OP_PATLIST, it's the -e option, which can be called
multiple times to create a list of patterns. */
-
+
if (op->type == OP_PATLIST)
{
if (cmd_pattern_count >= MAX_PATTERN_COUNT)
@@ -2367,9 +2367,9 @@
}
patterns[cmd_pattern_count++] = option_data;
}
-
+
/* Handle OP_BINARY_FILES */
-
+
else if (op->type == OP_BINFILES)
{
if (strcmp(option_data, "binary") == 0)
@@ -2380,11 +2380,11 @@
binary_files = BIN_TEXT;
else
{
- fprintf(stderr, "pcregrep: unknown value \"%s\" for binary-files\n",
- option_data);
+ fprintf(stderr, "pcregrep: unknown value \"%s\" for binary-files\n",
+ option_data);
pcregrep_exit(usage(2));
- }
- }
+ }
+ }
/* Otherwise, deal with single string or numeric data values. */
@@ -2755,7 +2755,7 @@
goto EXIT2;
}
}
-
+
/* If a file that contains a list of files to search has been specified, read
it line by line and search the given files. Otherwise, if there are no further
arguments, do the business on stdin and exit. */
@@ -2765,30 +2765,30 @@
char buffer[PATBUFSIZE];
FILE *fl;
if (strcmp(file_list, "-") == 0) fl = stdin; else
- {
+ {
fl = fopen(file_list, "rb");
if (fl == NULL)
{
- fprintf(stderr, "pcregrep: Failed to open %s: %s\n", file_list,
+ fprintf(stderr, "pcregrep: Failed to open %s: %s\n", file_list,
strerror(errno));
goto EXIT2;
- }
- }
+ }
+ }
while (fgets(buffer, PATBUFSIZE, fl) != NULL)
{
int frc;
char *end = buffer + (int)strlen(buffer);
while (end > buffer && isspace(end[-1])) end--;
- *end = 0;
- if (*buffer != 0)
- {
- frc = grep_or_recurse(buffer, dee_action == dee_RECURSE, FALSE);
+ *end = 0;
+ if (*buffer != 0)
+ {
+ frc = grep_or_recurse(buffer, dee_action == dee_RECURSE, FALSE);
if (frc > 1) rc = frc;
- else if (frc == 0 && rc == 1) rc = 0;
- }
- }
- if (fl != stdin) fclose (fl);
- }
+ else if (frc == 0 && rc == 1) rc = 0;
+ }
+ }
+ if (fl != stdin) fclose (fl);
+ }
/* Do this only if there was no file list (and no file arguments). */
@@ -2804,7 +2804,7 @@
at top level - this suppresses the file name if the argument is not a directory
and filenames are not otherwise forced. */
-only_one_at_top = i == argc - 1 && file_list == NULL;
+only_one_at_top = i == argc - 1 && file_list == NULL;
for (; i < argc; i++)
{
Modified: code/trunk/pcreposix.c
===================================================================
--- code/trunk/pcreposix.c 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/pcreposix.c 2012-06-02 11:03:06 UTC (rev 975)
@@ -160,7 +160,7 @@
REG_BADPAT, /* disallowed UTF-8/16 code point (>= 0xd800 && <= 0xdfff) */
REG_BADPAT, /* invalid UTF-16 string (should not occur) */
/* 75 */
- REG_BADPAT /* overlong MARK name */
+ REG_BADPAT /* overlong MARK name */
};
/* Table of texts corresponding to POSIX error codes */
Modified: code/trunk/pcretest.c
===================================================================
--- code/trunk/pcretest.c 2012-06-02 05:56:58 UTC (rev 974)
+++ code/trunk/pcretest.c 2012-06-02 11:03:06 UTC (rev 975)
@@ -737,7 +737,7 @@
"JIT stack limit reached",
"pattern compiled in wrong mode: 8-bit/16-bit error",
"pattern compiled with other endianness",
- "invalid data in workspace for DFA restart"
+ "invalid data in workspace for DFA restart"
};
@@ -2600,10 +2600,10 @@
int do_showcaprest = 0;
int do_flip = 0;
int erroroffset, len, delimiter, poffset;
-
-#if !defined NODFA
+
+#if !defined NODFA
int dfa_matched = 0;
-#endif
+#endif
use_utf = 0;
debug_lengths = 1;
@@ -3946,9 +3946,9 @@
{
fprintf(outfile, "Timing DFA restarts is not supported\n");
break;
- }
- if (dfa_workspace == NULL)
- dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int));
+ }
+ if (dfa_workspace == NULL)
+ dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int));
for (i = 0; i < timeitm; i++)
{
PCRE_DFA_EXEC(count, re, extra, bptr, len, start_offset,
@@ -4019,9 +4019,9 @@
#if !defined NODFA
else if (all_use_dfa || use_dfa)
{
- if (dfa_workspace == NULL)
- dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int));
- if (dfa_matched++ == 0)
+ if (dfa_workspace == NULL)
+ dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int));
+ if (dfa_matched++ == 0)
dfa_workspace[0] = -1; /* To catch bad restart */
PCRE_DFA_EXEC(count, re, extra, bptr, len, start_offset,
(options | g_notempty), use_offsets, use_size_offsets, dfa_workspace,