[pcre-dev] [Bug 2407] New: PCRE2 10.33: pcre2grep: `--only-…

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2407] New: PCRE2 10.33: pcre2grep: `--only-matching' produces no output when it should
https://bugs.exim.org/show_bug.cgi?id=2407

            Bug ID: 2407
           Summary: PCRE2 10.33: pcre2grep: `--only-matching' produces no
                    output when it should
           Product: PCRE
           Version: 10.33 (PCRE2)
          Hardware: x86
                OS: Linux
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
          Assignee: ph10@???
          Reporter: stvar@???
                CC: pcre-dev@???


Dear Philip,

While feeding 'pcre2grep' a fairly large regex that was
generated programmatically out of smaller ones, I ran into
the following issue with this program:

When 'pcre2grep' gets a regex that has a lot of capturing
groups -- to be precise: when it contains more than 32 such
groups -- it does not output the matched text if the program
was invoked with `--only-matching'.

Here is a short example of the behavior I'm describing (the
complete test scenario is attached at the end of this text):

$ gen-regex
(\b1\b)|(\b2\b)|...|(\b32\b)|(\b33\b)

The stock 'pcre2grep' produces the following incorrect output:

$ print 32 33|pcre2grep -oe "$(gen-regex)"
32

$ print 32 33|pcre2grep -noe "$(gen-regex)"
1:32
2:

The source file 'pcre2grep.c' reveals that the magic number
'33' from above is none other than the hard-coded constant
'OFFSET_SIZE'.
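
To make the arithmetic explicit, here is a minimal sketch (it is not
code from 'pcre2grep.c'; the helper name 'group_fits' is hypothetical,
and I am assuming that 'OFFSET_SIZE' counts offset pairs). Pair 0
describes the whole match and pair k describes capturing group k, so
a vector of 33 pairs has room for groups 1 to 32 only:

```c
#include <assert.h>
#include <stdbool.h>

/* Value quoted in this report; assumed to be the number of offset
   pairs available to pcre2grep for one match. */
#define OFFSET_SIZE 33

/* Hypothetical helper: pair 0 holds the whole match and pair k holds
   capturing group k, so group k needs at least k + 1 pairs. */
static bool group_fits(int group, int pair_count)
{
  assert(pair_count > 0);
  return group >= 0 && group < pair_count;
}
```

Under these assumptions 'group_fits(32, OFFSET_SIZE)' is true and
'group_fits(33, OFFSET_SIZE)' is false, which agrees with the line
'32' being printed and the line '33' being dropped in the example
above.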

Looking further into that file, I arrived at an immediate fix:

  --- pcre2-10.33/src/pcre2grep.c
  +++ pcre2-10.33/src/pcre2grep.c
  @@ -2688,7 +2688,7 @@
             for (om = only_matching; om != NULL; om = om->next)
               {
               int n = om->groupnum;
  -            if (n < mrc)
  +            if (n == 0 || n < mrc)
                 {
                 int plen = offsets[2*n + 1] - offsets[2*n];
                 if (plen > 0)


which makes 'pcre2grep' produce the expected output:

$ print 32 33|pcre2grep -oe "$(gen-regex)"
32
33

$ print 32 33|pcre2grep -noe "$(gen-regex)"
1:32
2:33

The variable 'mrc' is '0' when the function 'pcre2_match'
returns '0'. This indicates that the vector of offsets
is too small to hold all the captured substrings of the
match. However, in case the user supplied no argument to
`-o|--only-matching' (leading to 'n' and 'om->groupnum'
being '0'), the program should -- and surely it can --
print out the entire matched text.

In case the program is invoked with `-o|--only-matching'
that has an argument greater than '0', it should somehow
report to the user the error condition indicated by 'mrc'
being '0'.
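
The behavior proposed in the two paragraphs above can be sketched
as a small decision function. This is only an illustration under
stated assumptions, not a patch: the names 'om_action' and
'classify_request' are hypothetical, and I am assuming that the
first offset pair (the whole match) is filled in even when
'pcre2_match' returns '0'.

```c
#include <assert.h>

/* Hypothetical outcome for one `-o' group request. */
typedef enum { EMIT_TEXT, EMIT_NOTHING, REPORT_OVERFLOW } om_action;

/* n   : requested group number (0 means the whole match)
   mrc : non-negative return value of a successful pcre2_match
         call; 0 means the vector of offsets was too small */
static om_action classify_request(int n, int mrc)
{
  assert(n >= 0 && mrc >= 0);
  if (n == 0)
    return EMIT_TEXT;        /* pair 0 assumed valid even if mrc == 0 */
  if (mrc == 0)
    return REPORT_OVERFLOW;  /* group offsets may be missing */
  return (n < mrc) ? EMIT_TEXT : EMIT_NOTHING;
}
```

For 'n == 0' this gives the same result as the one-line fix above;
the 'REPORT_OVERFLOW' case is the part that the fix does not cover.
In the real code, 'EMIT_TEXT' would still be subject to the existing
'plen > 0' test.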

Sincerely,

Stefan Vargyas.


-----------------------
Complete Test Scenario:

# print [ARG...]
$ print() { printf '%s\n' "$@"; }

# gen-regex [NUM]
$ gen-regex() { local n="${1:-33}"; [[ "$n" == +([0-9]) ]] || return 1;
for((i=1;i<=n;i++));do echo -n "(\b$i\b)"; [ "$i" -lt "$n" ] && echo -n '|';
done; echo; }

# gen-cmds [OPT...]
$ gen-cmds() { local o="$@"; for((k=32;k<=34;k++));do
echo "print 32 33 34|pcre2grep ${o:+$o }-oe \"\$(gen-regex $k)\""; done; }

# meta command:
$ gen-cmds

# output is as expected:
$ print 32 33 34|pcre2grep -oe "$(gen-regex 32)"
32

# expected output: $'32\n33\n'
$ print 32 33 34|pcre2grep -oe "$(gen-regex 33)"
32

# expected output: $'32\n33\n34\n'
$ print 32 33 34|pcre2grep -oe "$(gen-regex 34)"
32

# meta command:
$ gen-cmds -n

# output is as expected:
$ print 32 33 34|pcre2grep -n -oe "$(gen-regex 32)"
1:32

# expected output: $'1:32\n2:33\n'
$ print 32 33 34|pcre2grep -n -oe "$(gen-regex 33)"
1:32
2:

# expected output: $'1:32\n2:33\n3:34\n'
$ print 32 33 34|pcre2grep -n -oe "$(gen-regex 34)"
1:32
2:
3:
