Revision: 612
http://vcs.pcre.org/viewvc?view=rev&revision=612
Author: ph10
Date: 2011-07-02 16:20:59 +0100 (Sat, 02 Jul 2011)
Log Message:
-----------
Fix two study bugs concerned with minimum subject lengths; add features to
pcretest so that all tests can be run with or without study; adjust tests so
that this happens.
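
The interaction between the -s command line option and the /S and /SS pattern
modifiers, as changed in pcretest.c in this revision, can be sketched as below.
This is only an illustrative model, not the pcretest code: the variable names
do_study, no_force_study, and force_study are taken from the diff, while
study_flags, parse_study_modifiers(), should_study(), and check_examples() are
hypothetical names invented for this sketch.

```c
#include <assert.h>

/* Hypothetical container for the two per-pattern flags used in pcretest.c. */
typedef struct {
  int do_study;        /* set by a single /S on the pattern */
  int no_force_study;  /* set by a second /S; overrides -s */
} study_flags;

/* Scan a modifier string such as "ISS" and apply the same toggle that the
   'S' case in the diff applies: the first S requests study, a second S
   cancels it and records that -s must not force it. Other letters are
   ignored here (an assumption made to keep the sketch small). */
static study_flags parse_study_modifiers(const char *modifiers)
{
  study_flags f = { 0, 0 };
  for (; *modifiers != '\0'; modifiers++)
    {
    if (*modifiers != 'S') continue;
    if (f.do_study == 0) f.do_study = 1;
    else
      {
      f.do_study = 0;
      f.no_force_study = 1;
      }
    }
  return f;
}

/* Mirror of the condition "do_study || (force_study && !no_force_study)".
   force_study is nonzero when -s was given on the command line. */
static int should_study(study_flags f, int force_study)
{
  return f.do_study || (force_study && !f.no_force_study);
}

/* A few cases: no modifier, /S, and /SS, each with and without -s. */
void check_examples(void)
{
  assert(should_study(parse_study_modifiers("I"), 0) == 0);
  assert(should_study(parse_study_modifiers("I"), 1) == 1);
  assert(should_study(parse_study_modifiers("IS"), 0) == 1);
  assert(should_study(parse_study_modifiers("IS"), 1) == 1);
  assert(should_study(parse_study_modifiers("ISS"), 0) == 0);
  assert(should_study(parse_study_modifiers("ISS"), 1) == 0);
}
```

In this model a pattern marked /S is studied whether or not -s is given, and a
pattern marked /SS is never studied, which is why tests marked with either
modifier give the same output in both runs of RunTest.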
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/HACKING
code/trunk/RunTest
code/trunk/doc/pcretest.1
code/trunk/pcre_internal.h
code/trunk/pcre_study.c
code/trunk/pcretest.c
code/trunk/perltest.pl
code/trunk/testdata/testinput11
code/trunk/testdata/testinput2
code/trunk/testdata/testinput5
code/trunk/testdata/testinput7
code/trunk/testdata/testoutput11
code/trunk/testdata/testoutput2
code/trunk/testdata/testoutput5
code/trunk/testdata/testoutput7
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2011-06-29 08:49:21 UTC (rev 611)
+++ code/trunk/ChangeLog 2011-07-02 15:20:59 UTC (rev 612)
@@ -79,8 +79,10 @@
synonym of -m (show memory usage). I have changed it to mean "force study
for every regex", that is, assume /S for every regex. This is similar to -i
and -d etc. It's slightly incompatible, but I'm hoping nobody is still
- using it. It makes it easier to run collection of tests with study enabled,
- and thereby test pcre_study() more easily.
+ using it. It makes it easier to run collections of tests with and without
+ study enabled, and thereby test pcre_study() more easily. All the standard
+ tests are now run with and without -s (but some patterns can be marked as
+ "never study" - see 20 below).
15. When (*ACCEPT) was used in a subpattern that was called recursively, the
restoration of the capturing data to the outer values was not happening
@@ -101,6 +103,13 @@
18. If a pattern containing \R was studied, it was assumed that \R always
matched two bytes, thus causing the minimum subject length to be
incorrectly computed because \R can also match just one byte.
+
+19. If a pattern containing (*ACCEPT) was studied, the minimum subject length
+ was incorrectly computed.
+
+20. If /S is present twice on a test pattern in pcretest input, it *disables*
+ studying, thereby overriding the use of -s on the command line. This is
+ necessary for one or two tests to keep the output identical in both cases.
Version 8.12 15-Jan-2011
Modified: code/trunk/HACKING
===================================================================
--- code/trunk/HACKING 2011-06-29 08:49:21 UTC (rev 611)
+++ code/trunk/HACKING 2011-07-02 15:20:59 UTC (rev 612)
@@ -2,7 +2,8 @@
--------------------------
These are very rough technical notes that record potentially useful information
-about PCRE internals.
+about PCRE internals. For information about testing PCRE, see the pcretest
+documentation and the comment at the head of the RunTest file.
Historical note 1
@@ -449,4 +450,4 @@
Philip Hazel
-May 2011
+July 2011
Modified: code/trunk/RunTest
===================================================================
--- code/trunk/RunTest 2011-06-29 08:49:21 UTC (rev 611)
+++ code/trunk/RunTest 2011-07-02 15:20:59 UTC (rev 612)
@@ -1,6 +1,14 @@
#! /bin/sh
-# Run PCRE tests.
+# Run the PCRE tests using the pcretest program. All tests are now run both
+# with and without -s, to ensure that everything is tested with and without
+# studying. However, there are some tests that produce different output after
+# studying, typically when we are tracing the actual matching process (for
+# example, using auto-callouts). In these few cases, the tests are duplicated
+# in the files, one with /S to force studying always, and one with /SS to force
+# *not* studying always. The use of -s doesn't then make any difference to
+# their output. There is also one test which compiles invalid UTF-8 with the
+# UTF-8 check turned off for which studying is disabled with /SS.
valgrind=
@@ -137,33 +145,37 @@
if [ $do1 = yes ] ; then
echo "Test 1: main functionality (Compatible with Perl >= 5.8)"
- $valgrind ./pcretest -q $testdata/testinput1 testtry
- if [ $? = 0 ] ; then
- $cf $testdata/testoutput1 testtry
- if [ $? != 0 ] ; then exit 1; fi
- else exit 1
- fi
- echo "OK"
+ for opt in "" "-s"; do
+ $valgrind ./pcretest -q $opt $testdata/testinput1 testtry
+ if [ $? = 0 ] ; then
+ $cf $testdata/testoutput1 testtry
+ if [ $? != 0 ] ; then exit 1; fi
+ else exit 1
+ fi
+ if [ "$opt" = "-s" ] ; then echo "OK with study" ; else echo "OK"; fi
+ done
fi
# PCRE tests that are not Perl-compatible - API, errors, internals
if [ $do2 = yes ] ; then
echo "Test 2: API, errors, internals, and non-Perl stuff"
- $valgrind ./pcretest -q $testdata/testinput2 testtry
- if [ $? = 0 ] ; then
- $cf $testdata/testoutput2 testtry
- if [ $? != 0 ] ; then exit 1; fi
- else
- echo " "
- echo "** Test 2 requires a lot of stack. If it has crashed with a"
- echo "** segmentation fault, it may be that you do not have enough"
- echo "** stack available by default. Please see the 'pcrestack' man"
- echo "** page for a discussion of PCRE's stack usage."
- echo " "
- exit 1
- fi
- echo "OK"
+ for opt in "" "-s"; do
+ $valgrind ./pcretest -q $opt $testdata/testinput2 testtry
+ if [ $? = 0 ] ; then
+ $cf $testdata/testoutput2 testtry
+ if [ $? != 0 ] ; then exit 1; fi
+ else
+ echo " "
+ echo "** Test 2 requires a lot of stack. If it has crashed with a"
+ echo "** segmentation fault, it may be that you do not have enough"
+ echo "** stack available by default. Please see the 'pcrestack' man"
+ echo "** page for a discussion of PCRE's stack usage."
+ echo " "
+ exit 1
+ fi
+ if [ "$opt" = "-s" ] ; then echo "OK with study" ; else echo "OK"; fi
+ done
fi
# Locale-specific tests, provided that either the "fr_FR" or the "french"
@@ -191,19 +203,22 @@
if [ "$locale" != "" ] ; then
echo "Test 3: locale-specific features (using '$locale' locale)"
- $valgrind ./pcretest -q $infile testtry
- if [ $? = 0 ] ; then
- $cf $outfile testtry
- if [ $? != 0 ] ; then
- echo " "
- echo "Locale test did not run entirely successfully."
- echo "This usually means that there is a problem with the locale"
- echo "settings rather than a bug in PCRE."
- else
- echo "OK"
+ for opt in "" "-s"; do
+ $valgrind ./pcretest -q $opt $infile testtry
+ if [ $? = 0 ] ; then
+ $cf $outfile testtry
+ if [ $? != 0 ] ; then
+ echo " "
+ echo "Locale test did not run entirely successfully."
+ echo "This usually means that there is a problem with the locale"
+ echo "settings rather than a bug in PCRE."
+ break;
+ else
+ if [ "$opt" = "-s" ] ; then echo "OK with study" ; else echo "OK"; fi
+ fi
+ else exit 1
fi
- else exit 1
- fi
+ done
else
echo "Cannot test locale-specific features - neither the 'fr_FR' nor the"
echo "'french' locale exists, or the \"locale\" command is not available"
@@ -216,70 +231,82 @@
if [ $do4 = yes ] ; then
echo "Test 4: UTF-8 support (Compatible with Perl >= 5.8)"
- $valgrind ./pcretest -q $testdata/testinput4 testtry
- if [ $? = 0 ] ; then
- $cf $testdata/testoutput4 testtry
- if [ $? != 0 ] ; then exit 1; fi
- else exit 1
- fi
- echo "OK"
+ for opt in "" "-s"; do
+ $valgrind ./pcretest -q $opt $testdata/testinput4 testtry
+ if [ $? = 0 ] ; then
+ $cf $testdata/testoutput4 testtry
+ if [ $? != 0 ] ; then exit 1; fi
+ else exit 1
+ fi
+ if [ "$opt" = "-s" ] ; then echo "OK with study" ; else echo "OK"; fi
+ done
fi
if [ $do5 = yes ] ; then
echo "Test 5: API, internals, and non-Perl stuff for UTF-8 support"
- $valgrind ./pcretest -q $testdata/testinput5 testtry
- if [ $? = 0 ] ; then
- $cf $testdata/testoutput5 testtry
- if [ $? != 0 ] ; then exit 1; fi
- else exit 1
- fi
- echo "OK"
+ for opt in "" "-s"; do
+ $valgrind ./pcretest -q $opt $testdata/testinput5 testtry
+ if [ $? = 0 ] ; then
+ $cf $testdata/testoutput5 testtry
+ if [ $? != 0 ] ; then exit 1; fi
+ else exit 1
+ fi
+ if [ "$opt" = "-s" ] ; then echo "OK with study" ; else echo "OK"; fi
+ done
fi
if [ $do6 = yes ] ; then
echo "Test 6: Unicode property support (Compatible with Perl >= 5.10)"
- $valgrind ./pcretest -q $testdata/testinput6 testtry
- if [ $? = 0 ] ; then
- $cf $testdata/testoutput6 testtry
- if [ $? != 0 ] ; then exit 1; fi
- else exit 1
- fi
- echo "OK"
+ for opt in "" "-s"; do
+ $valgrind ./pcretest -q $opt $testdata/testinput6 testtry
+ if [ $? = 0 ] ; then
+ $cf $testdata/testoutput6 testtry
+ if [ $? != 0 ] ; then exit 1; fi
+ else exit 1
+ fi
+ if [ "$opt" = "-s" ] ; then echo "OK with study" ; else echo "OK"; fi
+ done
fi
# Tests for DFA matching support
if [ $do7 = yes ] ; then
echo "Test 7: DFA matching"
- $valgrind ./pcretest -q -dfa $testdata/testinput7 testtry
- if [ $? = 0 ] ; then
- $cf $testdata/testoutput7 testtry
- if [ $? != 0 ] ; then exit 1; fi
- else exit 1
- fi
- echo "OK"
+ for opt in "" "-s"; do
+ $valgrind ./pcretest -q $opt -dfa $testdata/testinput7 testtry
+ if [ $? = 0 ] ; then
+ $cf $testdata/testoutput7 testtry
+ if [ $? != 0 ] ; then exit 1; fi
+ else exit 1
+ fi
+ if [ "$opt" = "-s" ] ; then echo "OK with study" ; else echo "OK"; fi
+ done
fi
if [ $do8 = yes ] ; then
echo "Test 8: DFA matching with UTF-8"
- $valgrind ./pcretest -q -dfa $testdata/testinput8 testtry
- if [ $? = 0 ] ; then
- $cf $testdata/testoutput8 testtry
- if [ $? != 0 ] ; then exit 1; fi
- else exit 1
- fi
- echo "OK"
+ for opt in "" "-s"; do
+ $valgrind ./pcretest -q $opt -dfa $testdata/testinput8 testtry
+ if [ $? = 0 ] ; then
+ $cf $testdata/testoutput8 testtry
+ if [ $? != 0 ] ; then exit 1; fi
+ else exit 1
+ fi
+ if [ "$opt" = "-s" ] ; then echo "OK with study" ; else echo "OK"; fi
+ done
fi
if [ $do9 = yes ] ; then
echo "Test 9: DFA matching with Unicode properties"
- $valgrind ./pcretest -q -dfa $testdata/testinput9 testtry
- if [ $? = 0 ] ; then
- $cf $testdata/testoutput9 testtry
- if [ $? != 0 ] ; then exit 1; fi
- else exit 1
- fi
- echo "OK"
+ for opt in "" "-s"; do
+ $valgrind ./pcretest -q $opt -dfa $testdata/testinput9 testtry
+ if [ $? = 0 ] ; then
+ $cf $testdata/testoutput9 testtry
+ if [ $? != 0 ] ; then exit 1; fi
+ else exit 1
+ fi
+ if [ "$opt" = "-s" ] ; then echo "OK with study" ; else echo "OK"; fi
+ done
fi
# Test of internal offsets and code sizes. This test is run only when there
@@ -290,39 +317,45 @@
if [ $do10 = yes ] ; then
echo "Test 10: Internal offsets and code size tests"
- $valgrind ./pcretest -q $testdata/testinput10 testtry
- if [ $? = 0 ] ; then
- $cf $testdata/testoutput10 testtry
- if [ $? != 0 ] ; then exit 1; fi
- else exit 1
- fi
- echo "OK"
+ for opt in "" "-s"; do
+ $valgrind ./pcretest -q $opt $testdata/testinput10 testtry
+ if [ $? = 0 ] ; then
+ $cf $testdata/testoutput10 testtry
+ if [ $? != 0 ] ; then exit 1; fi
+ else exit 1
+ fi
+ if [ "$opt" = "-s" ] ; then echo "OK with study" ; else echo "OK"; fi
+ done
fi
# Test of Perl >= 5.10 features
if [ $do11 = yes ] ; then
echo "Test 11: Features from Perl >= 5.10"
- $valgrind ./pcretest -q $testdata/testinput11 testtry
- if [ $? = 0 ] ; then
- $cf $testdata/testoutput11 testtry
- if [ $? != 0 ] ; then exit 1; fi
- else exit 1
- fi
- echo "OK"
+ for opt in "" "-s"; do
+ $valgrind ./pcretest -q $opt $testdata/testinput11 testtry
+ if [ $? = 0 ] ; then
+ $cf $testdata/testoutput11 testtry
+ if [ $? != 0 ] ; then exit 1; fi
+ else exit 1
+ fi
+ if [ "$opt" = "-s" ] ; then echo "OK with study" ; else echo "OK"; fi
+ done
fi
# Test non-Perl-compatible Unicode property support
if [ $do12 = yes ] ; then
echo "Test 12: API, internals, and non-Perl stuff for Unicode property support"
- $valgrind ./pcretest -q $testdata/testinput12 testtry
- if [ $? = 0 ] ; then
- $cf $testdata/testoutput12 testtry
- if [ $? != 0 ] ; then exit 1; fi
- else exit 1
- fi
- echo "OK"
+ for opt in "" "-s"; do
+ $valgrind ./pcretest -q $opt $testdata/testinput12 testtry
+ if [ $? = 0 ] ; then
+ $cf $testdata/testoutput12 testtry
+ if [ $? != 0 ] ; then exit 1; fi
+ else exit 1
+ fi
+ if [ "$opt" = "-s" ] ; then echo "OK with study" ; else echo "OK"; fi
+ done
fi
# End
Modified: code/trunk/doc/pcretest.1
===================================================================
--- code/trunk/doc/pcretest.1 2011-06-29 08:49:21 UTC (rev 611)
+++ code/trunk/doc/pcretest.1 2011-07-02 15:20:59 UTC (rev 612)
@@ -4,7 +4,7 @@
.SH SYNOPSIS
.rs
.sp
-.B pcretest "[options] [source] [destination]"
+.B pcretest "[options] [input file [output file]]"
.sp
\fBpcretest\fP was written as a test program for the PCRE regular expression
library itself, but it can also be used for experimenting with regular
@@ -18,14 +18,17 @@
.\" HREF
\fBpcreapi\fP
.\"
-documentation.
+documentation. The input for \fBpcretest\fP is a sequence of regular expression
+patterns and strings to be matched, as described below. The output shows the
+result of each match. Options on the command line and the patterns control PCRE
+options and exactly what is output.
.
.
-.SH OPTIONS
+.SH COMMAND LINE OPTIONS
.rs
.TP 10
\fB-b\fP
-Behave as if each regex has the \fB/B\fP (show byte code) modifier; the
+Behave as if each pattern has the \fB/B\fP (show byte code) modifier; the
internal form is output after compilation.
.TP 10
\fB-C\fP
@@ -33,7 +36,7 @@
about the optional features that are included, and then exit.
.TP 10
\fB-d\fP
-Behave as if each regex has the \fB/D\fP (debug) modifier; the internal
+Behave as if each pattern has the \fB/D\fP (debug) modifier; the internal
form and information about the compiled pattern is output after compilation;
\fB-d\fP is equivalent to \fB-b -i\fP.
.TP 10
@@ -46,7 +49,7 @@
Output a brief summary these options and then exit.
.TP 10
\fB-i\fP
-Behave as if each regex has the \fB/I\fP modifier; information about the
+Behave as if each pattern has the \fB/I\fP modifier; information about the
compiled pattern is given after compilation.
.TP 10
\fB-M\fP
@@ -67,7 +70,7 @@
below).
.TP 10
\fB-p\fP
-Behave as if each regex has the \fB/P\fP modifier; the POSIX wrapper API is
+Behave as if each pattern has the \fB/P\fP modifier; the POSIX wrapper API is
used to call PCRE. None of the other options has any effect when \fB-p\fP is
set.
.TP 10
@@ -79,8 +82,21 @@
megabytes.
.TP 10
\fB-s\fP
-Behave as if each regex has the \fB/S\fP modifier; in other words, force each
-regex to be studied.
+Behave as if each pattern has the \fB/S\fP modifier; in other words, force each
+pattern to be studied. If the \fB/I\fP or \fB/D\fP option is present on a
+pattern (requesting output about the compiled pattern), information about the
+result of studying is not included when studying is caused only by \fB-s\fP and
+neither \fB-i\fP nor \fB-d\fP is present on the command line. This behaviour
+means that the output from tests that are run with and without \fB-s\fP should
+be identical, except when options that output information about the actual
+running of a match are set. The \fB-M\fP, \fB-t\fP, and \fB-tm\fP options,
+which give information about resources used, are likely to produce different
+output with and without \fB-s\fP. Output may also differ if the \fB/C\fP option
+is present on an individual pattern. This uses callouts to trace the
+matching process, and this may be different between studied and non-studied
+patterns. If the pattern contains (*MARK) items there may also be differences,
+for the same reason. The \fB-s\fP command line option can be overridden for
+specific patterns that should never be studied (see the /S option below).
.TP 10
\fB-t\fP
Run each compile, study, and match many times with a timer, and output
@@ -193,10 +209,10 @@
\fB/<bsr_unicode>\fP PCRE_BSR_UNICODE
.sp
The modifiers that are enclosed in angle brackets are literal strings as shown,
-including the angle brackets, but the letters can be in either case. This
-example sets multiline matching with CRLF as the line ending sequence:
+including the angle brackets, but the letters within can be in either case.
+This example sets multiline matching with CRLF as the line ending sequence:
.sp
- /^abc/m<crlf>
+ /^abc/m<CRLF>
.sp
As well as turning on the PCRE_UTF8 option, the \fB/8\fP modifier also causes
any non-printing characters in output strings to be printed using the
@@ -290,9 +306,13 @@
The \fB/M\fP modifier causes the size of memory block used to hold the compiled
pattern to be output.
.P
-The \fB/S\fP modifier causes \fBpcre_study()\fP to be called after the
-expression has been compiled, and the results used when the expression is
-matched.
+If the \fB/S\fP modifier appears once, it causes \fBpcre_study()\fP to be
+called after the expression has been compiled, and the results used when the
+expression is matched. If \fB/S\fP appears twice, it suppresses studying, even
+if it was requested externally by the \fB-s\fP command line option. This makes
+it possible to specify that certain patterns are always studied, and others are
+never studied, independently of \fB-s\fP. This feature is used in the test
+files in a few cases where the output is different when the pattern is studied.
.P
The \fB/T\fP modifier must be followed by a single digit. It causes a specific
set of built-in character tables to be passed to \fBpcre_compile()\fP. It is
@@ -746,7 +766,7 @@
For example:
.sp
re> </some/file
- Compiled regex loaded from /some/file
+ Compiled pattern loaded from /some/file
No study data
.sp
When the pattern has been loaded, \fBpcretest\fP proceeds to read data lines in
@@ -792,6 +812,6 @@
.rs
.sp
.nf
-Last updated: 06 June 2011
+Last updated: 02 July 2011
Copyright (c) 1997-2011 University of Cambridge.
.fi
Modified: code/trunk/pcre_internal.h
===================================================================
--- code/trunk/pcre_internal.h 2011-06-29 08:49:21 UTC (rev 611)
+++ code/trunk/pcre_internal.h 2011-07-02 15:20:59 UTC (rev 612)
@@ -595,10 +595,10 @@
#define PCRE_JCHANGED 0x0010 /* j option used in regex */
#define PCRE_HASCRORLF 0x0020 /* explicit \r or \n in pattern */
-/* Options for the "extra" block produced by pcre_study(). */
+/* Flags for the "extra" block produced by pcre_study(). */
-#define PCRE_STUDY_MAPPED 0x01 /* a map of starting chars exists */
-#define PCRE_STUDY_MINLEN 0x02 /* a minimum length field exists */
+#define PCRE_STUDY_MAPPED 0x0001 /* a map of starting chars exists */
+#define PCRE_STUDY_MINLEN 0x0002 /* a minimum length field exists */
/* Masks for identifying the public options that are permitted at compile
time, run time, or study time, respectively. */
Modified: code/trunk/pcre_study.c
===================================================================
--- code/trunk/pcre_study.c 2011-06-29 08:49:21 UTC (rev 611)
+++ code/trunk/pcre_study.c 2011-07-02 15:20:59 UTC (rev 612)
@@ -66,9 +66,10 @@
rather than bytes.
Arguments:
- code pointer to start of group (the bracket)
- startcode pointer to start of the whole pattern
- options the compiling options
+ code pointer to start of group (the bracket)
+ startcode pointer to start of the whole pattern
+ options the compiling options
+ had_accept pointer to flag for (*ACCEPT) encountered
Returns: the minimum length
-1 if \C was encountered
@@ -77,7 +78,8 @@
*/
static int
-find_minlength(const uschar *code, const uschar *startcode, int options)
+find_minlength(const uschar *code, const uschar *startcode, int options,
+ BOOL *had_accept_ptr)
{
int length = -1;
BOOL utf8 = (options & PCRE_UTF8) != 0;
@@ -125,17 +127,23 @@
case OP_BRAPOS:
case OP_SBRAPOS:
case OP_ONCE:
- d = find_minlength(cc, startcode, options);
+ d = find_minlength(cc, startcode, options, had_accept_ptr);
if (d < 0) return d;
branchlength += d;
+ if (*had_accept_ptr) return branchlength;
do cc += GET(cc, 1); while (*cc == OP_ALT);
cc += 1 + LINK_SIZE;
break;
/* Reached end of a branch; if it's a ket it is the end of a nested
- call. If it's ALT it is an alternation in a nested call. If it is
- END it's the end of the outer call. All can be handled by the same code. */
+ call. If it's ALT it is an alternation in a nested call. If it is END it's
+ the end of the outer call. All can be handled by the same code. If it is
+ ACCEPT, it is essentially the same as END, but we set a flag so that
+ counting stops. */
+ case OP_ACCEPT:
+ *had_accept_ptr = TRUE;
+ /* Fall through */
case OP_ALT:
case OP_KET:
case OP_KETRMAX:
@@ -144,7 +152,7 @@
case OP_END:
if (length < 0 || (!had_recurse && branchlength < length))
length = branchlength;
- if (*cc != OP_ALT) return length;
+ if (op != OP_ALT) return length;
cc += 1 + LINK_SIZE;
branchlength = 0;
had_recurse = FALSE;
@@ -367,7 +375,11 @@
d = 0;
had_recurse = TRUE;
}
- else d = find_minlength(cs, startcode, options);
+ else
+ {
+ d = find_minlength(cs, startcode, options, had_accept_ptr);
+ *had_accept_ptr = FALSE;
+ }
}
else d = 0;
cc += 3;
@@ -411,7 +423,10 @@
if (cc > cs && cc < ce)
had_recurse = TRUE;
else
- branchlength += find_minlength(cs, startcode, options);
+ {
+ branchlength += find_minlength(cs, startcode, options, had_accept_ptr);
+ *had_accept_ptr = FALSE;
+ }
cc += 1 + LINK_SIZE;
break;
@@ -479,10 +494,9 @@
case OP_THEN_ARG:
cc += _pcre_OP_lengths[op] + cc[1+LINK_SIZE];
break;
-
+
/* The remaining opcodes are just skipped over. */
- case OP_ACCEPT:
case OP_CLOSE:
case OP_COMMIT:
case OP_FAIL:
@@ -688,6 +702,7 @@
while (try_next) /* Loop for items in this branch */
{
int rc;
+
switch(*tcode)
{
/* If we reach something we don't understand, it means a new opcode has
@@ -1200,6 +1215,7 @@
{
int min;
BOOL bits_set = FALSE;
+BOOL had_accept = FALSE;
uschar start_bits[32];
pcre_extra *extra;
pcre_study_data *study;
@@ -1257,7 +1273,7 @@
/* Find the minimum length of subject string. */
-switch(min = find_minlength(code, code, re->options))
+switch(min = find_minlength(code, code, re->options, &had_accept))
{
case -2: *errorptr = "internal error: missing capturing bracket"; break;
case -3: *errorptr = "internal error: opcode not recognized"; break;
Modified: code/trunk/pcretest.c
===================================================================
--- code/trunk/pcretest.c 2011-06-29 08:49:21 UTC (rev 611)
+++ code/trunk/pcretest.c 2011-07-02 15:20:59 UTC (rev 612)
@@ -1436,6 +1436,7 @@
size_t size, regex_gotten_store;
int do_mark = 0;
int do_study = 0;
+ int no_force_study = 0;
int do_debug = debug;
int do_G = 0;
int do_g = 0;
@@ -1502,7 +1503,7 @@
}
}
- fprintf(outfile, "Compiled regex%s loaded from %s\n",
+ fprintf(outfile, "Compiled pattern%s loaded from %s\n",
do_flip? " (byte-inverted)" : "", p);
/* Need to know if UTF-8 for printing data strings */
@@ -1510,7 +1511,7 @@
new_info(re, NULL, PCRE_INFO_OPTIONS, &get_options);
use_utf8 = (get_options & PCRE_UTF8) != 0;
- /* Now see if there is any following study data */
+ /* Now see if there is any following study data. */
if (true_study_size != 0)
{
@@ -1624,7 +1625,14 @@
case 'P': do_posix = 1; break;
#endif
- case 'S': do_study = 1; break;
+ case 'S':
+ if (do_study == 0) do_study = 1; else
+ {
+ do_study = 0;
+ no_force_study = 1;
+ }
+ break;
+
case 'U': options |= PCRE_UNGREEDY; break;
case 'W': options |= PCRE_UCP; break;
case 'X': options |= PCRE_EXTRA; break;
@@ -1808,10 +1816,12 @@
true_size = ((real_pcre *)re)->size;
regex_gotten_store = gotten_store;
- /* If -s or /S was present, study the regexp to generate additional info to
- help with the matching. */
+ /* If -s or /S was present, study the regex to generate additional info to
+ help with the matching, unless the pattern has the SS option, which
+ suppresses the effect of /S (used for a few test patterns where studying is
+ never sensible). */
- if (do_study || force_study)
+ if (do_study || (force_study && !no_force_study))
{
if (timeit > 0)
{
@@ -2049,9 +2059,12 @@
/* Don't output study size; at present it is in any case a fixed
value, but it varies, depending on the computer architecture, and
so messes up the test suite. (And with the /F option, it might be
- flipped.) */
+ flipped.) If study was forced by an external -s, don't show this
+ information unless -i or -d was also present. This means that, except
+ when auto-callouts are involved, the output from runs with and without
+ -s should be identical. */
- if (do_study || force_study)
+ if (do_study || (force_study && showinfo && !no_force_study))
{
if (extra == NULL)
fprintf(outfile, "Study returned NULL\n");
@@ -2129,7 +2142,11 @@
}
else
{
- fprintf(outfile, "Compiled regex written to %s\n", to_file);
+ fprintf(outfile, "Compiled pattern written to %s\n", to_file);
+
+ /* If there is study data, write it, but verify the writing only
+ if the studying was requested by /S, not just by -s. */
+
if (extra != NULL)
{
if (fwrite(extra->study_data, 1, true_study_size, f) <
@@ -2139,7 +2156,6 @@
strerror(errno));
}
else fprintf(outfile, "Study data written to %s\n", to_file);
-
}
}
fclose(f);
Modified: code/trunk/perltest.pl
===================================================================
--- code/trunk/perltest.pl 2011-06-29 08:49:21 UTC (rev 611)
+++ code/trunk/perltest.pl 2011-07-02 15:20:59 UTC (rev 612)
@@ -103,6 +103,10 @@
$pattern =~ s/W(?=[a-zA-Z]*$)//;
+ # Remove /S or /SS from a pattern (asks pcretest to study or not to study)
+
+ $pattern =~ s/S(?=[a-zA-Z]*$)//g;
+
# Check that the pattern is valid
eval "\$_ =~ ${pattern}";
Modified: code/trunk/testdata/testinput11
===================================================================
--- code/trunk/testdata/testinput11 2011-06-29 08:49:21 UTC (rev 611)
+++ code/trunk/testdata/testinput11 2011-07-02 15:20:59 UTC (rev 612)
@@ -246,6 +246,7 @@
aaabccc
/(A (A|B(*ACCEPT)|C) D)(E)/x
+ AB
ABX
AADE
ACDE
@@ -403,7 +404,10 @@
AC
CB
-/(*MARK:A)(*SKIP:B)(C|X)/K
+/--- Force no study, otherwise mark is not seen. The studied version is in
+ test 2 because it isn't Perl-compatible. ---/
+
+/(*MARK:A)(*SKIP:B)(C|X)/KSS
C
D
@@ -435,9 +439,9 @@
/A(*:A)A+(*SKIP:A)(B|Z) | AC/xK
AAAC
-/--- Don't loop! ---/
+/--- Don't loop! Force no study, otherwise mark is not seen. ---/
-/(*:A)A+(*SKIP:A)(B|Z)/K
+/(*:A)A+(*SKIP:A)(B|Z)/KSS
AAAC
/--- This should succeed, as a non-existent skip name disables the skip ---/
Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2 2011-06-29 08:49:21 UTC (rev 611)
+++ code/trunk/testdata/testinput2 2011-07-02 15:20:59 UTC (rev 612)
@@ -1061,11 +1061,16 @@
/abc(?C)de(?C1)f/I
123abcdef
-/(?C1)\dabc(?C2)def/I
+/(?C1)\dabc(?C2)def/IS
1234abcdef
*** Failers
abcdef
+/(?C1)\dabc(?C2)def/ISS
+ 1234abcdef
+ *** Failers
+ abcdef
+
/(?C255)ab/I
/(?C256)ab/I
@@ -1310,29 +1315,44 @@
abcde
abcdfe
-/a*b/ICDZ
+/a*b/ICDZS
ab
aaaab
aaaacb
+/a*b/ICDZSS
+ ab
+ aaaab
+ aaaacb
+
/a+b/ICDZ
ab
aaaab
aaaacb
-/(abc|def)x/ICDZ
+/(abc|def)x/ICDZS
abcx
defx
+ ** Failers
abcdefzx
+/(abc|def)x/ICDZSS
+ abcx
+ defx
+ ** Failers
+ abcdefzx
+
/(ab|cd){3,4}/IC
ababab
abcdabcd
abcdcdcdcdcd
-/([ab]{,4}c|xy)/ICDZ
+/([ab]{,4}c|xy)/ICDZS
Note: that { does NOT introduce a quantifier
+/([ab]{,4}c|xy)/ICDZSS
+ Note: that { does NOT introduce a quantifier
+
/([ab]{1,4}c|xy){4,5}?123/ICDZ
aacaacaacaacaac123
@@ -1404,30 +1424,54 @@
1X
123456\P
-/abc/I>testsavedregex
+/abc/IS>testsavedregex
<testsavedregex
abc
** Failers
bca
-/abc/IF>testsavedregex
+/abc/ISS>testsavedregex
<testsavedregex
abc
** Failers
bca
+/abc/IFS>testsavedregex
+<testsavedregex
+ abc
+ ** Failers
+ bca
+
+/abc/IFSS>testsavedregex
+<testsavedregex
+ abc
+ ** Failers
+ bca
+
/(a|b)/IS>testsavedregex
<testsavedregex
abc
** Failers
def
+/(a|b)/ISS>testsavedregex
+<testsavedregex
+ abc
+ ** Failers
+ def
+
/(a|b)/ISF>testsavedregex
<testsavedregex
abc
** Failers
def
+/(a|b)/ISSF>testsavedregex
+<testsavedregex
+ abc
+ ** Failers
+ def
+
~<(\w+)/?>(.)*</(\1)>~smgI
<!DOCTYPE seite SYSTEM "http://www.lco.lineas.de/xmlCms.dtd">\n<seite>\n<dokumenteninformation>\n<seitentitel>Partner der LCO</seitentitel>\n<sprache>de</sprache>\n<seitenbeschreibung>Partner der LINEAS Consulting\nGmbH</seitenbeschreibung>\n<schluesselworte>LINEAS Consulting GmbH Hamburg\nPartnerfirmen</schluesselworte>\n<revisit>30 days</revisit>\n<robots>index,follow</robots>\n<menueinformation>\n<aktiv>ja</aktiv>\n<menueposition>3</menueposition>\n<menuetext>Partner</menuetext>\n</menueinformation>\n<lastedited>\n<autor>LCO</autor>\n<firma>LINEAS Consulting</firma>\n<datum>15.10.2003</datum>\n</lastedited>\n</dokumenteninformation>\n<inhalt>\n\n<absatzueberschrift>Die Partnerfirmen der LINEAS Consulting\nGmbH</absatzueberschrift>\n\n<absatz><link ziel="http://www.ca.com/" zielfenster="_blank">\n<bild name="logo_ca.gif" rahmen="no"/></link> <link\nziel="http://www.ey.com/" zielfenster="_blank"><bild\nname="logo_euy.gif" rahmen="no"/></link>\n</absatz>\n\n<absatz><link ziel="http://www.cisco.de/" zielfenster="_blank">\n<bild name="logo_cisco.gif" rahmen="ja"/></link></absatz>\n\n<absatz><link ziel="http://www.atelion.de/"\nzielfenster="_blank"><bild\nname="logo_atelion.gif" rahmen="no"/></link>\n</absatz>\n\n<absatz><link ziel="http://www.line-information.de/"\nzielfenster="_blank">\n<bild name="logo_line_information.gif" rahmen="no"/></link>\n</absatz>\n\n<absatz><bild name="logo_aw.gif" rahmen="no"/></absatz>\n\n<absatz><link ziel="http://www.incognis.de/"\nzielfenster="_blank"><bild\nname="logo_incognis.gif" rahmen="no"/></link></absatz>\n\n<absatz><link ziel="http://www.addcraft.com/"\nzielfenster="_blank"><bild\nname="logo_addcraft.gif" rahmen="no"/></link></absatz>\n\n<absatz><link ziel="http://www.comendo.com/"\nzielfenster="_blank"><bild\nname="logo_comendo.gif" rahmen="no"/></link></absatz>\n\n</inhalt>\n</seite>
@@ -3312,14 +3356,22 @@
/A(*PRUNE:A)B/K
ACAB
-/(*MARK:A)(*PRUNE:B)(C|X)/K
+/(*MARK:A)(*PRUNE:B)(C|X)/KS
C
D
-/(*MARK:A)(*THEN:B)(C|X)/K
+/(*MARK:A)(*PRUNE:B)(C|X)/KSS
C
D
+/(*MARK:A)(*THEN:B)(C|X)/KS
+ C
+ D
+
+/(*MARK:A)(*THEN:B)(C|X)/KSS
+ C
+ D
+
/--- This should fail, as the skip causes a bump to offset 3 (the skip) ---/
/A(*MARK:A)A+(*SKIP)(B|Z) | AC/xK
@@ -3681,4 +3733,16 @@
/-- --/
+/-- These studied versions are here because they are not Perl-compatible; the
+ studying means the mark is not seen. --/
+
+/(*MARK:A)(*SKIP:B)(C|X)/KS
+ C
+ D
+
+/(*:A)A+(*SKIP:A)(B|Z)/KS
+ AAAC
+
+/-- --/
+
/-- End of testinput2 --/
Modified: code/trunk/testdata/testinput5
===================================================================
--- code/trunk/testdata/testinput5 2011-06-29 08:49:21 UTC (rev 611)
+++ code/trunk/testdata/testinput5 2011-07-02 15:20:59 UTC (rev 612)
@@ -198,7 +198,7 @@
/\xC3\xC3\xC3xxx/8
-/\xC3\xC3\xC3xxx/8?DZ
+/\xC3\xC3\xC3xxx/8?DZSS
/abc/8
\xC3]
Modified: code/trunk/testdata/testinput7
===================================================================
--- code/trunk/testdata/testinput7 2011-06-29 08:49:21 UTC (rev 611)
+++ code/trunk/testdata/testinput7 2011-07-02 15:20:59 UTC (rev 612)
@@ -3973,13 +3973,13 @@
ac
bbbbc
-/abc/>testsavedregex
+/abc/SS>testsavedregex
<testsavedregex
abc
*** Failers
bca
-/abc/F>testsavedregex
+/abc/FSS>testsavedregex
<testsavedregex
abc
*** Failers
Modified: code/trunk/testdata/testoutput11
===================================================================
--- code/trunk/testdata/testoutput11 2011-06-29 08:49:21 UTC (rev 611)
+++ code/trunk/testdata/testoutput11 2011-07-02 15:20:59 UTC (rev 612)
@@ -501,6 +501,10 @@
No match
/(A (A|B(*ACCEPT)|C) D)(E)/x
+ AB
+ 0: AB
+ 1: AB
+ 2: B
ABX
0: AB
1: AB
@@ -821,7 +825,10 @@
CB
No match, mark = B
-/(*MARK:A)(*SKIP:B)(C|X)/K
+/--- Force no study, otherwise mark is not seen. The studied version is in
+ test 2 because it isn't Perl-compatible. ---/
+
+/(*MARK:A)(*SKIP:B)(C|X)/KSS
C
0: C
1: C
@@ -864,9 +871,9 @@
AAAC
0: AC
-/--- Don't loop! ---/
+/--- Don't loop! Force no study, otherwise mark is not seen. ---/
-/(*:A)A+(*SKIP:A)(B|Z)/K
+/(*:A)A+(*SKIP:A)(B|Z)/KSS
AAAC
No match, mark = A
Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2011-06-29 08:49:21 UTC (rev 611)
+++ code/trunk/testdata/testoutput2 2011-07-02 15:20:59 UTC (rev 612)
@@ -3580,11 +3580,13 @@
1 ^ ^ f
0: abcdef
-/(?C1)\dabc(?C2)def/I
+/(?C1)\dabc(?C2)def/IS
Capturing subpattern count = 0
No options
No first char
Need char = 'f'
+Subject length lower bound = 7
+Starting byte set: 0 1 2 3 4 5 6 7 8 9
1234abcdef
--->1234abcdef
1 ^ \d
@@ -3596,6 +3598,24 @@
*** Failers
No match
abcdef
+No match
+
+/(?C1)\dabc(?C2)def/ISS
+Capturing subpattern count = 0
+No options
+No first char
+Need char = 'f'
+ 1234abcdef
+--->1234abcdef
+ 1 ^ \d
+ 1 ^ \d
+ 1 ^ \d
+ 1 ^ \d
+ 2 ^ ^ d
+ 0: 4abcdef
+ *** Failers
+No match
+ abcdef
--->abcdef
1 ^ \d
1 ^ \d
@@ -4778,7 +4798,7 @@
+4 ^ ^ e
No match
-/a*b/ICDZ
+/a*b/ICDZS
------------------------------------------------------------------
Bra
Callout 255 0 2
@@ -4793,6 +4813,8 @@
Options:
No first char
Need char = 'b'
+Subject length lower bound = 1
+Starting byte set: a b
ab
--->ab
+0 ^ a*
@@ -4815,6 +4837,48 @@
+2 ^ ^ b
+0 ^ a*
+2 ^^ b
+ +0 ^ a*
+ +2 ^ b
+ +3 ^^
+ 0: b
+
+/a*b/ICDZSS
+------------------------------------------------------------------
+ Bra
+ Callout 255 0 2
+ a*+
+ Callout 255 2 1
+ b
+ Callout 255 3 0
+ Ket
+ End
+------------------------------------------------------------------
+Capturing subpattern count = 0
+Options:
+No first char
+Need char = 'b'
+ ab
+--->ab
+ +0 ^ a*
+ +2 ^^ b
+ +3 ^ ^
+ 0: ab
+ aaaab
+--->aaaab
+ +0 ^ a*
+ +2 ^ ^ b
+ +3 ^ ^
+ 0: aaaab
+ aaaacb
+--->aaaacb
+ +0 ^ a*
+ +2 ^ ^ b
+ +0 ^ a*
+ +2 ^ ^ b
+ +0 ^ a*
+ +2 ^ ^ b
+ +0 ^ a*
+ +2 ^^ b
+0 ^ a*
+2 ^ b
+0 ^ a*
@@ -4861,7 +4925,7 @@
+2 ^^ b
No match
-/(abc|def)x/ICDZ
+/(abc|def)x/ICDZS
------------------------------------------------------------------
Bra
Callout 255 0 9
@@ -4892,6 +4956,8 @@
Options:
No first char
Need char = 'x'
+Subject length lower bound = 4
+Starting byte set: a d
abcx
--->abcx
+0 ^ (abc|def)
@@ -4915,6 +4981,8 @@
+10 ^ ^
0: defx
1: def
+ ** Failers
+No match
abcdefzx
--->abcdefzx
+0 ^ (abc|def)
@@ -4924,6 +4992,80 @@
+4 ^ ^ |
+9 ^ ^ x
+5 ^ d
+ +0 ^ (abc|def)
+ +1 ^ a
+ +5 ^ d
+ +6 ^^ e
+ +7 ^ ^ f
+ +8 ^ ^ )
+ +9 ^ ^ x
+No match
+
+/(abc|def)x/ICDZSS
+------------------------------------------------------------------
+ Bra
+ Callout 255 0 9
+ CBra 1
+ Callout 255 1 1
+ a
+ Callout 255 2 1
+ b
+ Callout 255 3 1
+ c
+ Callout 255 4 0
+ Alt
+ Callout 255 5 1
+ d
+ Callout 255 6 1
+ e
+ Callout 255 7 1
+ f
+ Callout 255 8 0
+ Ket
+ Callout 255 9 1
+ x
+ Callout 255 10 0
+ Ket
+ End
+------------------------------------------------------------------
+Capturing subpattern count = 1
+Options:
+No first char
+Need char = 'x'
+ abcx
+--->abcx
+ +0 ^ (abc|def)
+ +1 ^ a
+ +2 ^^ b
+ +3 ^ ^ c
+ +4 ^ ^ |
+ +9 ^ ^ x
++10 ^ ^
+ 0: abcx
+ 1: abc
+ defx
+--->defx
+ +0 ^ (abc|def)
+ +1 ^ a
+ +5 ^ d
+ +6 ^^ e
+ +7 ^ ^ f
+ +8 ^ ^ )
+ +9 ^ ^ x
++10 ^ ^
+ 0: defx
+ 1: def
+ ** Failers
+No match
+ abcdefzx
+--->abcdefzx
+ +0 ^ (abc|def)
+ +1 ^ a
+ +2 ^^ b
+ +3 ^ ^ c
+ +4 ^ ^ |
+ +9 ^ ^ x
+ +5 ^ d
+0 ^ (abc|def)
+1 ^ a
+5 ^ d
@@ -5015,7 +5157,7 @@
0: abcdcdcd
1: cd
-/([ab]{,4}c|xy)/ICDZ
+/([ab]{,4}c|xy)/ICDZS
------------------------------------------------------------------
Bra
Callout 255 0 14
@@ -5048,8 +5190,59 @@
Options:
No first char
No need char
+Subject length lower bound = 2
+Starting byte set: a b x
Note: that { does NOT introduce a quantifier
--->Note: that { does NOT introduce a quantifier
+ +0 ^ ([ab]{,4}c|xy)
+ +1 ^ [ab]
+ +5 ^^ {
++11 ^ x
+ +0 ^ ([ab]{,4}c|xy)
+ +1 ^ [ab]
+ +5 ^^ {
++11 ^ x
+ +0 ^ ([ab]{,4}c|xy)
+ +1 ^ [ab]
+ +5 ^^ {
++11 ^ x
+No match
+
+/([ab]{,4}c|xy)/ICDZSS
+------------------------------------------------------------------
+ Bra
+ Callout 255 0 14
+ CBra 1
+ Callout 255 1 4
+ [ab]
+ Callout 255 5 1
+ {
+ Callout 255 6 1
+ ,
+ Callout 255 7 1
+ 4
+ Callout 255 8 1
+ }
+ Callout 255 9 1
+ c
+ Callout 255 10 0
+ Alt
+ Callout 255 11 1
+ x
+ Callout 255 12 1
+ y
+ Callout 255 13 0
+ Ket
+ Callout 255 14 0
+ Ket
+ End
+------------------------------------------------------------------
+Capturing subpattern count = 1
+Options:
+No first char
+No need char
+ Note: that { does NOT introduce a quantifier
+--->Note: that { does NOT introduce a quantifier
+0 ^ ([ab]{,4}c|xy)
+1 ^ [ab]
+11 ^ x
@@ -5467,14 +5660,33 @@
123456\P
No match
-/abc/I>testsavedregex
+/abc/IS>testsavedregex
Capturing subpattern count = 0
No options
First char = 'a'
Need char = 'c'
-Compiled regex written to testsavedregex
+Subject length lower bound = 3
+No set of starting bytes
+Compiled pattern written to testsavedregex
+Study data written to testsavedregex
<testsavedregex
-Compiled regex loaded from testsavedregex
+Compiled pattern loaded from testsavedregex
+Study data loaded from testsavedregex
+ abc
+ 0: abc
+ ** Failers
+No match
+ bca
+No match
+
+/abc/ISS>testsavedregex
+Capturing subpattern count = 0
+No options
+First char = 'a'
+Need char = 'c'
+Compiled pattern written to testsavedregex
+<testsavedregex
+Compiled pattern loaded from testsavedregex
No study data
abc
0: abc
@@ -5483,14 +5695,33 @@
bca
No match
-/abc/IF>testsavedregex
+/abc/IFS>testsavedregex
Capturing subpattern count = 0
No options
First char = 'a'
Need char = 'c'
-Compiled regex written to testsavedregex
+Subject length lower bound = 3
+No set of starting bytes
+Compiled pattern written to testsavedregex
+Study data written to testsavedregex
<testsavedregex
-Compiled regex (byte-inverted) loaded from testsavedregex
+Compiled pattern (byte-inverted) loaded from testsavedregex
+Study data loaded from testsavedregex
+ abc
+ 0: abc
+ ** Failers
+No match
+ bca
+No match
+
+/abc/IFSS>testsavedregex
+Capturing subpattern count = 0
+No options
+First char = 'a'
+Need char = 'c'
+Compiled pattern written to testsavedregex
+<testsavedregex
+Compiled pattern (byte-inverted) loaded from testsavedregex
No study data
abc
0: abc
@@ -5506,10 +5737,10 @@
No need char
Subject length lower bound = 1
Starting byte set: a b
-Compiled regex written to testsavedregex
+Compiled pattern written to testsavedregex
Study data written to testsavedregex
<testsavedregex
-Compiled regex loaded from testsavedregex
+Compiled pattern loaded from testsavedregex
Study data loaded from testsavedregex
abc
0: a
@@ -5520,6 +5751,24 @@
def
No match
+/(a|b)/ISS>testsavedregex
+Capturing subpattern count = 1
+No options
+No first char
+No need char
+Compiled pattern written to testsavedregex
+<testsavedregex
+Compiled pattern loaded from testsavedregex
+No study data
+ abc
+ 0: a
+ 1: a
+ ** Failers
+ 0: a
+ 1: a
+ def
+No match
+
/(a|b)/ISF>testsavedregex
Capturing subpattern count = 1
No options
@@ -5527,10 +5776,10 @@
No need char
Subject length lower bound = 1
Starting byte set: a b
-Compiled regex written to testsavedregex
+Compiled pattern written to testsavedregex
Study data written to testsavedregex
<testsavedregex
-Compiled regex (byte-inverted) loaded from testsavedregex
+Compiled pattern (byte-inverted) loaded from testsavedregex
Study data loaded from testsavedregex
abc
0: a
@@ -5541,6 +5790,24 @@
def
No match
+/(a|b)/ISSF>testsavedregex
+Capturing subpattern count = 1
+No options
+No first char
+No need char
+Compiled pattern written to testsavedregex
+<testsavedregex
+Compiled pattern (byte-inverted) loaded from testsavedregex
+No study data
+ abc
+ 0: a
+ 1: a
+ ** Failers
+ 0: a
+ 1: a
+ def
+No match
+
~<(\w+)/?>(.)*</(\1)>~smgI
Capturing subpattern count = 3
Max back reference = 1
@@ -10805,20 +11072,36 @@
ACAB
0: AB
-/(*MARK:A)(*PRUNE:B)(C|X)/K
+/(*MARK:A)(*PRUNE:B)(C|X)/KS
C
0: C
1: C
MK: A
D
+No match
+
+/(*MARK:A)(*PRUNE:B)(C|X)/KSS
+ C
+ 0: C
+ 1: C
+MK: A
+ D
No match, mark = B
-/(*MARK:A)(*THEN:B)(C|X)/K
+/(*MARK:A)(*THEN:B)(C|X)/KS
C
0: C
1: C
MK: A
D
+No match
+
+/(*MARK:A)(*THEN:B)(C|X)/KSS
+ C
+ 0: C
+ 1: C
+MK: A
+ D
No match, mark = B
/--- This should fail, as the skip causes a bump to offset 3 (the skip) ---/
@@ -11577,4 +11860,21 @@
/-- --/
+/-- These studied versions are here because they are not Perl-compatible; the
+ studying means the mark is not seen. --/
+
+/(*MARK:A)(*SKIP:B)(C|X)/KS
+ C
+ 0: C
+ 1: C
+MK: A
+ D
+No match
+
+/(*:A)A+(*SKIP:A)(B|Z)/KS
+ AAAC
+No match
+
+/-- --/
+
/-- End of testinput2 --/
Modified: code/trunk/testdata/testoutput5
===================================================================
--- code/trunk/testdata/testoutput5 2011-06-29 08:49:21 UTC (rev 611)
+++ code/trunk/testdata/testoutput5 2011-07-02 15:20:59 UTC (rev 612)
@@ -802,7 +802,7 @@
/\xC3\xC3\xC3xxx/8
Failed: invalid UTF-8 string at offset 0
-/\xC3\xC3\xC3xxx/8?DZ
+/\xC3\xC3\xC3xxx/8?DZSS
------------------------------------------------------------------
Bra
\X{c0}\X{c0}\X{c0}xxx
@@ -2184,7 +2184,7 @@
No options
No first char
No need char
-Subject length lower bound = 2
+Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85
/\R/SI8
@@ -2192,7 +2192,7 @@
Options: utf8
No first char
No need char
-Subject length lower bound = 2
+Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \xc2 \xe2
/\h*A/SI8
Modified: code/trunk/testdata/testoutput7
===================================================================
--- code/trunk/testdata/testoutput7 2011-06-29 08:49:21 UTC (rev 611)
+++ code/trunk/testdata/testoutput7 2011-07-02 15:20:59 UTC (rev 612)
@@ -1011,10 +1011,10 @@
0: bbbbbbbbbbbbcdX
/(a|b)/SF>testsavedregex
-Compiled regex written to testsavedregex
+Compiled pattern written to testsavedregex
Study data written to testsavedregex
<testsavedregex
-Compiled regex (byte-inverted) loaded from testsavedregex
+Compiled pattern (byte-inverted) loaded from testsavedregex
Study data loaded from testsavedregex
abc
0: a
@@ -6439,10 +6439,10 @@
bbbbc
0: c
-/abc/>testsavedregex
-Compiled regex written to testsavedregex
+/abc/SS>testsavedregex
+Compiled pattern written to testsavedregex
<testsavedregex
-Compiled regex loaded from testsavedregex
+Compiled pattern loaded from testsavedregex
No study data
abc
0: abc
@@ -6451,10 +6451,10 @@
bca
No match
-/abc/F>testsavedregex
-Compiled regex written to testsavedregex
+/abc/FSS>testsavedregex
+Compiled pattern written to testsavedregex
<testsavedregex
-Compiled regex (byte-inverted) loaded from testsavedregex
+Compiled pattern (byte-inverted) loaded from testsavedregex
No study data
abc
0: abc
@@ -6464,10 +6464,10 @@
No match
/(a|b)/S>testsavedregex
-Compiled regex written to testsavedregex
+Compiled pattern written to testsavedregex
Study data written to testsavedregex
<testsavedregex
-Compiled regex loaded from testsavedregex
+Compiled pattern loaded from testsavedregex
Study data loaded from testsavedregex
abc
0: a
@@ -6477,10 +6477,10 @@
No match
/(a|b)/SF>testsavedregex
-Compiled regex written to testsavedregex
+Compiled pattern written to testsavedregex
Study data written to testsavedregex
<testsavedregex
-Compiled regex (byte-inverted) loaded from testsavedregex
+Compiled pattern (byte-inverted) loaded from testsavedregex
Study data loaded from testsavedregex
abc
0: a