https://bugs.exim.org/show_bug.cgi?id=2641
Bug ID: 2641
Summary: test pattern delimiters
Product: PCRE
Version: 10.35 (PCRE2)
Hardware: x86
OS: Linux
Status: NEW
Severity: bug
Priority: medium
Component: Code
Assignee: ph10@???
Reporter: hv@???
CC: pcre-dev@???
Comments in perltest say:
# Unless
# "subject_literal" is on the pattern, data lines are processed as
# Perl double-quoted strings, so if they contain " $ or @ characters, these
# have to be escaped. For this reason, all such characters in the
# Perl-compatible testinput1 and testinput4 files are escaped so that they can
# be used for perltest as well as for pcre2test.
I assume that by "data lines" it means the strings to match on rather than the
patterns; the patterns are processed by C< eval "\$_ =~ ${pattern}" >, which
will interpret the pattern as a regexp rather than a double-quoted string
_except_ if certain special delimiters such as C<"> or C<'> are used.
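To show roughly what the double-quoted case does to a pattern, here is a simplified Python model of Perl's double-quoted string escapes. It is not perltest's own code, and perl_dq_unescape is a made-up name. It handles only the simple control-character escapes. A backslash before an unrecognised letter or a punctuation character just loses the backslash, which is what Perl does for escapes like \s (with an "Unrecognized escape" warning under warnings). Anything it does not model (\x, \N, \c, octal, case modifiers, $ and @ interpolation) raises NotImplementedError. Single-quoted strings follow different rules and are not covered.

```python
# Simplified model (an assumption, not Perl's full rules) of how Perl
# processes backslashes in the body of a double-quoted string.

# Escapes that produce a control character.
SIMPLE_ESCAPES = {
    "t": "\t", "n": "\n", "r": "\r", "f": "\f",
    "a": "\a", "e": "\x1b", "b": "\b",
}
# Escapes this sketch deliberately does not model: \x, \N, \c, \o,
# octal digits, and the case modifiers \l \u \L \U \F \Q \E.
UNMODELLED = set("xNcoluLUFQE0123456789")


def perl_dq_unescape(text):
    """Approximate the string Perl builds from a double-quoted literal."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt in SIMPLE_ESCAPES:
                out.append(SIMPLE_ESCAPES[nxt])
            elif nxt in UNMODELLED:
                raise NotImplementedError(f"escape \\{nxt} is not modelled")
            else:
                # Unrecognised letters (\s, \S, \k, ...) and punctuation
                # (\", \$, \\) keep the character but lose the backslash.
                out.append(nxt)
            i += 2
        elif ch in "$@":
            raise NotImplementedError("variable interpolation is not modelled")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(perl_dq_unescape(r"(?x)(?-x: \s*#\s*)"))  # prints: (?x)(?-x: s*#s*)
```

Under the same model, [\S\V\H] becomes [SVH] and \k'Pm' becomes k'Pm', while a pattern with no backslashes passes through unchanged.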
For no obvious reason (except in the first case, whose pattern contains a '/'),
some of the patterns in testdata/testinput1 are enclosed in those special
delimiters:
1960:"(?>.*/)foo"
3834:"(?x)(?-x: \s*#\s*)"
3839:"(?x-is)(?:(?-ixs) \s*#\s*) include"
5235:"(?>.*)foo"
5239:"(?>.*?)foo"
5662:'\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++'
5665:'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++'
5668:'\A(?:[^\"]++|\"(?:[^\"]++|\"\")++\")++'
5671:'\A([^\"1]++|[\"2]([^\"3]*+|[\"4][\"5])*+[\"6])++'
5730:"Z*(|d*){216}"
5732:"(?1)(?#?'){8}(a)"
5744:"(?|(\k'Pm')|(?'Pm'))"
5836:'(?>ab|abab){1,5}?M'
5839:'(?>ab|abab){2}?M'
5842:'((?(?=(a))a)+k)'
5845:'((?(?=(a))a|)+k)'
5848:'(?(?!(b))a|b)+k'
6414:"(?<=X(?(DEFINE)(A)))X(*F)"
6418:"(?<=X(?(DEFINE)(A)))."
6421:"(?<=X(?(DEFINE)(.*))Y)."
6424:"(?<=X(?(DEFINE)(Y))(?1))."
6427:"(?(DEFINE)(?<foo>bar))(?<![-a-z0-9])word"
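A list like this can be produced with a simple scan. This sketch assumes the usual pcre2test input layout, where a pattern starts in column 1 and subject lines are indented, and it only looks at the first line of each pattern. find_quoted_patterns is a hypothetical helper, not part of the PCRE2 tree.

```python
# Sketch: list lines of a pcre2test input file whose pattern uses a quote
# character as its delimiter. Assumes patterns start in column 1 and
# subject lines are indented.
SPECIAL_DELIMITERS = ('"', "'")


def find_quoted_patterns(lines):
    """Return (line_number, text) pairs for lines starting with a quote."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if line.startswith(SPECIAL_DELIMITERS):
            hits.append((number, line.rstrip("\n")))
    return hits


sample = [
    "/abc/\n",
    "    abc\n",
    "\n",
    '"(?>.*/)foo"\n',
    "# a comment line\n",
    "'(?>ab|abab){2}?M'\n",
]

for number, text in find_quoted_patterns(sample):
    print(f"{number}:{text}")
```

Passing an open file object for testdata/testinput1 instead of the sample list gives output in the same number:text format; opening it with encoding="latin-1" accepts any byte, which avoids decoding errors if the file is not valid UTF-8.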
This causes problems, for example, with the pattern at line 3834: because the
special delimiter causes it to be interpolated like a double-quoted string,
each "\s" in the pattern is interpolated as "s", so the pattern that actually
gets compiled is (?x)(?-x: s*#s*) rather than the intended one.
I'd suggest changing the delimiter to '/' for all of these except the first,
and for that one using something less special such as '!'.
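The suggested rewrite is mechanical enough to sketch. This version assumes each pattern fits on one line, that its closing delimiter is the last occurrence of the opening quote on that line, and that anything after it is the modifier list (as with "utf" in the testinput4 cases). redelimit is a hypothetical helper. It tries '/' first and falls back to '!' when the pattern itself contains '/'.

```python
# Sketch of the suggested delimiter change for single-line pcre2test patterns.
SPECIAL_DELIMITERS = ('"', "'")
REPLACEMENTS = ("/", "!")  # preferred order, following the suggestion


def redelimit(line):
    """Swap a leading quote delimiter for '/' or, failing that, '!'."""
    if not line or line[0] not in SPECIAL_DELIMITERS:
        return line  # not a quote-delimited pattern; leave it alone
    opener = line[0]
    close = line.rfind(opener)
    if close == 0:
        raise ValueError(f"no closing {opener} on line: {line!r}")
    body, modifiers = line[1:close], line[close + 1:]
    for delimiter in REPLACEMENTS:
        if delimiter not in body:
            return f"{delimiter}{body}{delimiter}{modifiers}"
    raise ValueError(f"pattern contains both '/' and '!': {body!r}")


for old in [r'"(?>.*/)foo"', r'"(?x)(?-x: \s*#\s*)"', r'"[\S\V\H]"utf']:
    print(old, "->", redelimit(old))
```

The escaped \" sequences in the single-quoted testinput1 patterns are copied across unchanged.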
There are two similar cases in testinput4:
479:"(?s)(.{1,5})"utf
2223:"[\S\V\H]"utf
Hope this helps,
Hugo van der Sanden