[pcre-dev] [Bug 2641] New: test pattern delimiters

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2641] New: test pattern delimiters
https://bugs.exim.org/show_bug.cgi?id=2641

            Bug ID: 2641
           Summary: test pattern delimiters
           Product: PCRE
           Version: 10.35 (PCRE2)
          Hardware: x86
                OS: Linux
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
          Assignee: ph10@???
          Reporter: hv@???
                CC: pcre-dev@???


Comments in perltest say:

# Unless
# "subject_literal" is on the pattern, data lines are processed as
# Perl double-quoted strings, so if they contain " $ or @ characters, these
# have to be escaped. For this reason, all such characters in the
# Perl-compatible testinput1 and testinput4 files are escaped so that they can
# be used for perltest as well as for pcre2test.

I assume that "data lines" means the subject strings to match against rather
than the patterns. The patterns are processed by C< eval "\$_ =~ ${pattern}" >,
which interprets the pattern as a regexp rather than as a double-quoted string,
_except_ when certain special delimiters such as C<"> or C<'> are used.

For no obvious reason (except for the first, which contains a '/'), some of the
patterns in testdata/testinput1 are enclosed in those special delimiters:
1960:"(?>.*/)foo"
3834:"(?x)(?-x: \s*#\s*)"
3839:"(?x-is)(?:(?-ixs) \s*#\s*) include"
5235:"(?>.*)foo"
5239:"(?>.*?)foo"
5662:'\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++'
5665:'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++'
5668:'\A(?:[^\"]++|\"(?:[^\"]++|\"\")++\")++'
5671:'\A([^\"1]++|[\"2]([^\"3]*+|[\"4][\"5])*+[\"6])++'
5730:"Z*(|d*){216}"
5732:"(?1)(?#?'){8}(a)"
5744:"(?|(\k'Pm')|(?'Pm'))"
5836:'(?>ab|abab){1,5}?M'
5839:'(?>ab|abab){2}?M'
5842:'((?(?=(a))a)+k)'
5845:'((?(?=(a))a|)+k)'
5848:'(?(?!(b))a|b)+k'
6414:"(?<=X(?(DEFINE)(A)))X(*F)"
6418:"(?<=X(?(DEFINE)(A)))."
6421:"(?<=X(?(DEFINE)(.*))Y)."
6424:"(?<=X(?(DEFINE)(Y))(?1))."
6427:"(?(DEFINE)(?<foo>bar))(?<![-a-z0-9])word"

This causes problems with, for example, the pattern at line 3834: because the
special delimiter causes it to be interpolated like a double-quoted string,
each "\s" in the pattern is interpolated as "s", so the wrong pattern results.

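To illustrate the effect, here is a minimal sketch that runs the same pattern
text through the eval form quoted earlier, once with '"' delimiters and once
with '/'. The helper name match_via_eval and the two subject strings are made
up for this example; they are not taken from perltest or testinput1.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perltest): applies one delimited pattern to
# one subject using the same eval form quoted in the report.
sub match_via_eval {
    my ($pattern, $subject) = @_;
    local $_ = $subject;
    no warnings;    # keep the output free of any "Unrecognized escape" warning
    my $matched = eval "\$_ =~ ${pattern}";
    die $@ if $@;
    return $matched ? 1 : 0;
}

# Pattern text from line 3834, first as it stands, then with '/' delimiters.
my $quoted  = '"(?x)(?-x: \s*#\s*)"';
my $slashed = '/(?x)(?-x: \s*#\s*)/';

for my $subject ('A # B', 'A s#s B') {    # illustrative subjects only
    printf "%-10s quoted=%d slashed=%d\n", "'$subject'",
        match_via_eval($quoted, $subject), match_via_eval($slashed, $subject);
}
```

Both forms match 'A # B', but only the quoted form matches 'A s#s B': after
string interpolation the regexp is effectively ' s*#s*', which accepts the
literal "s" characters that the intended '\s*' would reject.
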
I'd suggest changing the delimiter to '/' for all of these except the first,
and for that one using something less special such as '!'.
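
A rough sketch of how that re-delimiting could be scripted follows. The
function name redelimit is hypothetical, and it assumes each affected line is a
pattern wrapped in '"' or "'" with optional trailing modifiers (such as "utf"),
and that no pattern body contains both '/' and '!'. Any real change to the test
files would still need checking by hand.

```perl
use strict;
use warnings;

# Hypothetical helper: re-delimit one quoted pattern line (already chomped).
# Lines that are not wrapped in "..." or '...' are returned unchanged.
sub redelimit {
    my ($line) = @_;
    my ($quote, $body, $mods) = $line =~ /\A(["'])(.*)\1([^"']*)\z/s
        or return $line;
    # Use '/' by default; fall back to '!' when the body itself contains '/'.
    my $delim = index($body, '/') >= 0 ? '!' : '/';
    return $delim . $body . $delim . $mods;
}

print redelimit($_), "\n"
    for '"(?>.*/)foo"', '"(?x)(?-x: \s*#\s*)"', '"(?s)(.{1,5})"utf';
```
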

There are two similar cases in testinput4:
479:"(?s)(.{1,5})"utf
2223:"[\S\V\H]"utf

Hope this helps,

Hugo van der Sanden
