https://bugs.exim.org/show_bug.cgi?id=1848
Bug ID: 1848
Summary: pcregrep outputs duplicate matches
Product: PCRE
Version: 8.38
Hardware: x86
OS: Linux
Status: NEW
Severity: bug
Priority: medium
Component: Code
Assignee: ph10@???
Reporter: d.f.fischer@???
CC: pcre-dev@???
Created attachment 895
  --> https://bugs.exim.org/attachment.cgi?id=895&action=edit
test case
Attached is an input file for pcretest. With newlines expanded for better
readability, it searches for the multiline pattern
match (\d+):
(.)
in the text
match 1:
a
match 2:
b
match 3:
c
match 4:
d
match 5:
e
Note that pattern and text both end with a newline. The bug only appears when
this is the case. When the input file is run through pcretest, five matches
are found, as expected.
$ pcretest input
PCRE version 8.38 2015-11-23
~match (\d+):\n (.)\n~Gm
match 1:\n a\nmatch 2:\n b\nmatch 3:\n c\nmatch 4:\n d\nmatch 5:\n e\n
0: match 1:\x0a a\x0a
1: 1
2: a
0: match 2:\x0a b\x0a
1: 2
2: b
0: match 3:\x0a c\x0a
1: 3
2: c
0: match 4:\x0a d\x0a
1: 4
2: d
0: match 5:\x0a e\x0a
1: 5
2: e
But when the same is attempted using pcregrep instead, the second match is
duplicated, the third match appears tripled, the fourth quadrupled, et cetera.
$ tail -n1 input | sed 's/\\n/\n/g' | \
    pcregrep --om-separator / -Mo0 -o1 -o2 \
    "$(pcregrep -o1 '~(.+)~' input)"
match 1:
a
/1/a
match 2:
b
/2/b
match 3:
c
/3/c
match 4:
d
/4/d
match 5:
e
/5/e
match 2:
b
/2/b
match 3:
c
/3/c
match 4:
d
/4/d
match 5:
e
/5/e
match 3:
c
/3/c
match 4:
d
/4/d
match 5:
e
/5/e
match 4:
d
/4/d
match 5:
e
/5/e
match 5:
e
/5/e
Instead, pcregrep should output each correctly found match only a single time.
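For comparison, here is a minimal sketch of the expected output, computed with Python's re module rather than PCRE. It assumes that Python's engine treats this particular pattern the same way PCRE does, and the helper name expected_output is made up for illustration. The pattern and subject are taken from the pcretest transcript above; groups 0, 1 and 2 of each match are joined with "/" to mimic --om-separator.

```python
import re

# Pattern and subject from the pcretest transcript, with \n expanded.
# Assumption: Python's re behaves like PCRE for this simple pattern.
PATTERN = re.compile(r"match (\d+):\n (.)\n", re.MULTILINE)
SUBJECT = "".join(f"match {i}:\n {c}\n" for i, c in enumerate("abcde", start=1))


def expected_output(subject: str, sep: str = "/") -> str:
    """Join groups 0, 1 and 2 of every match with sep, one entry per match."""
    entries = []
    for m in PATTERN.finditer(subject):
        # m.group(0, 1, 2) returns the whole match plus both captures.
        entries.append(sep.join(m.group(0, 1, 2)))
    return "\n".join(entries)


if __name__ == "__main__":
    print(expected_output(SUBJECT))
```

Each of the five matches is reported exactly once here, which is what pcregrep is expected to print as well.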