https://bugs.exim.org/show_bug.cgi?id=2407
Bug ID: 2407
Summary: PCRE2 10.33: pcre2grep: `--only-matching' produces no
output when it should
Product: PCRE
Version: 10.33 (PCRE2)
Hardware: x86
OS: Linux
Status: NEW
Severity: bug
Priority: medium
Component: Code
Assignee: ph10@???
Reporter: stvar@???
CC: pcre-dev@???
Dear Philip,
While feeding 'pcre2grep' a fairly large regex that was
generated programmatically out of smaller ones, I ran into
the following issue with this program:
When 'pcre2grep' gets a regex that has a lot of capturing
groups -- to be precise: when it contains more than 32 such
groups -- it does not output the matched text if the program
was invoked with `--only-matching'.
Here is a short example of the behavior I'm describing (the
complete test scenario is attached at the end of this text):
$ gen-regex
(\b1\b)|(\b2\b)|...|(\b32\b)|(\b33\b)
The stock 'pcre2grep' produces the following incorrect output:
$ print 32 33|pcre2grep -oe "$(gen-regex)"
32
$ print 32 33|pcre2grep -noe "$(gen-regex)"
1:32
2:
The source file 'pcre2grep.c' reveals that the magic number
'33' from above is none other than the hard-coded constant
'OFFSET_SIZE'.
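To make the arithmetic behind that constant explicit, here is a
minimal model in C. It assumes, as documented for 'pcre2_match',
that the function returns one more than the highest-numbered group
that was set, or '0' when the offsets vector has too few pairs.
The names 'MODEL_OFFSET_SIZE' and 'model_match_return' are
hypothetical and are not taken from 'pcre2grep.c'; the value 33 is
the one quoted in this report.

```c
/* Number of offset pairs, as quoted in this report (assumption). */
#define MODEL_OFFSET_SIZE 33

/*
 * Hypothetical model of the return value of a successful match:
 * pair 0 holds the whole match, pair k holds capturing group k.
 * If the highest group that was set does not fit into the vector,
 * the result is 0; otherwise it is that group number plus one.
 */
int model_match_return(int highest_group_set, int pairs)
{
    int needed = highest_group_set + 1;
    return needed <= pairs ? needed : 0;
}
```

Under this model, group 32 still fits into 33 pairs, while group 33
does not, which agrees with the outputs shown above.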
Further looking into that file I obtained an immediate fix:
--- pcre2-10.33/src/pcre2grep.c
+++ pcre2-10.33/src/pcre2grep.c
@@ -2688,7 +2688,7 @@
for (om = only_matching; om != NULL; om = om->next)
{
int n = om->groupnum;
- if (n < mrc)
+ if (n == 0 || n < mrc)
{
int plen = offsets[2*n + 1] - offsets[2*n];
if (plen > 0)
which makes 'pcre2grep' produce the normally expected
output:
$ print 32 33|pcre2grep -oe "$(gen-regex)"
32
33
$ print 32 33|pcre2grep -noe "$(gen-regex)"
1:32
2:33
The variable 'mrc' is '0' when the function 'pcre2_match'
returned '0'. This indicates that the vector of offsets
is too small to accommodate all the capturing groups of
the input regex. However, in case the user supplied no
argument to `-o|--only-matching' (leading to 'n' and
'om->groupnum' being '0'), the program should -- and
surely it can -- print out the entire matched text.
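The difference between the two guards can be sketched in isolation.
This is only a model of the condition changed by the patch; the
function names are hypothetical and do not exist in 'pcre2grep.c'.

```c
#include <stdbool.h>

/* Guard before the patch: group n is printed only if n < mrc. */
bool prints_group_original(int n, int mrc)
{
    return n < mrc;
}

/*
 * Guard after the patch: group 0 (the whole match) is always
 * printed, even when mrc is 0 because the offsets vector was too
 * small for all the capturing groups.
 */
bool prints_group_patched(int n, int mrc)
{
    return n == 0 || n < mrc;
}
```

With 'mrc' being '0' and no argument given to `-o', the original
guard evaluates '0 < 0' and skips the output, while the patched
guard lets the whole match through.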
In case the program is invoked with `-o|--only-matching'
that has an argument greater than '0', it should somehow
report to the user the error condition indicated by 'mrc'
being '0'.
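One possible shape for such a report is sketched below. The helper
name, its return convention and the wording of the message are all
assumptions made for illustration; none of them come from
'pcre2grep.c'.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns 0 when group n may be looked up in
 * the offsets vector, or -1 after writing a diagnostic when a
 * group greater than 0 was requested but mrc is 0.
 */
int report_if_group_unavailable(FILE *err, int n, int mrc)
{
    if (n > 0 && mrc == 0) {
        fprintf(err,
                "pcre2grep: capture group %d was requested with -o, "
                "but the offsets vector is too small\n", n);
        return -1;
    }
    return 0;
}
```

Group 0 never triggers the diagnostic here, since the whole match
can still be printed in that case.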
Sincerely,
Stefan Vargyas.
-----------------------
Complete Test Scenario:
# print [ARG...]
$ print() { printf '%s\n' "$@"; }
# gen-regex [NUM]
$ gen-regex() { local n="${1:-33}"; [[ "$n" == +([0-9]) ]] || return 1;
for((i=1;i<=n;i++));do echo -n "(\b$i\b)"; [ "$i" -lt "$n" ] && echo -n '|';
done; echo; }
# gen-cmds [OPT...]
$ gen-cmds() { local o="$@"; for((k=32;k<=34;k++));do echo "print 32 33 34|pcre2grep ${o:+$o }-oe \"\$(gen-regex $k)\""; done; }
# meta command:
$ gen-cmds
# output is as expected:
$ print 32 33 34|pcre2grep -oe "$(gen-regex 32)"
32
# expected output: $'32\n33\n'
$ print 32 33 34|pcre2grep -oe "$(gen-regex 33)"
32
# expected output: $'32\n33\n34\n'
$ print 32 33 34|pcre2grep -oe "$(gen-regex 34)"
32
# meta command:
$ gen-cmds -n
# output is as expected:
$ print 32 33 34|pcre2grep -n -oe "$(gen-regex 32)"
1:32
# expected output: $'1:32\n2:33\n'
$ print 32 33 34|pcre2grep -n -oe "$(gen-regex 33)"
1:32
2:
# expected output: $'1:32\n2:33\n3:34\n'
$ print 32 33 34|pcre2grep -n -oe "$(gen-regex 34)"
1:32
2:
3: