[pcre-dev] [Bug 1848] New: pcregrep outputs duplicate matche…

Author: admin
Date:  
To: pcre-dev
New-Topics: [pcre-dev] [Bug 1848] pcregrep outputs duplicate matches
Subject: [pcre-dev] [Bug 1848] New: pcregrep outputs duplicate matches
https://bugs.exim.org/show_bug.cgi?id=1848

            Bug ID: 1848
           Summary: pcregrep outputs duplicate matches
           Product: PCRE
           Version: 8.38
          Hardware: x86
                OS: Linux
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
          Assignee: ph10@???
          Reporter: d.f.fischer@???
                CC: pcre-dev@???


Created attachment 895
--> https://bugs.exim.org/attachment.cgi?id=895&action=edit
test case

Attached is an input file for pcretest. With newlines expanded for better
readability, it searches for the multiline pattern

    match (\d+):
     (.)


in the text

    match 1:
     a
    match 2:
     b
    match 3:
     c
    match 4:
     d
    match 5:
     e


Note that pattern and text both end with a newline. The bug only appears when
this is the case. When the input file is run through pcretest, five matches are
found as expected.

    $ pcretest input
    PCRE version 8.38 2015-11-23
    ~match (\d+):\n (.)\n~Gm
    match 1:\n a\nmatch 2:\n b\nmatch 3:\n c\nmatch 4:\n d\nmatch 5:\n e\n
     0: match 1:\x0a a\x0a
     1: 1
     2: a
     0: match 2:\x0a b\x0a
     1: 2
     2: b
     0: match 3:\x0a c\x0a
     1: 3
     2: c
     0: match 4:\x0a d\x0a
     1: 4
     2: d
     0: match 5:\x0a e\x0a
     1: 5
     2: e
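
As an independent cross-check of the expected result, the same pattern and
subject can be run through a different regex engine. The sketch below uses
Python's re module, which is not PCRE, so it only illustrates what a global
search over this subject should return; the helper name find_all is made up
for this example.

```python
import re

# Pattern and subject copied from the pcretest input shown above.
PATTERN = r"match (\d+):\n (.)\n"
SUBJECT = "match 1:\n a\nmatch 2:\n b\nmatch 3:\n c\nmatch 4:\n d\nmatch 5:\n e\n"


def find_all(pattern, subject):
    """Return (whole match, group 1, group 2) for each non-overlapping match."""
    # re.MULTILINE mirrors the /m modifier from the pcretest input;
    # finditer plays the role of the /G (global) modifier.
    regex = re.compile(pattern, re.MULTILINE)
    return [(m.group(0), m.group(1), m.group(2)) for m in regex.finditer(subject)]


matches = find_all(PATTERN, SUBJECT)
for whole, number, letter in matches:
    print(repr(whole), number, letter)
```

Each of the five records is reported exactly once, in order, which agrees
with the pcretest output.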


But when the same is attempted using pcregrep instead, the second match is
duplicated, the third match appears tripled, the fourth quadrupled, et cetera.

    $ tail -n1 input | sed 's/\\n/\n/g' | \
    $   pcregrep --om-separator / -Mo0 -o1 -o2 \
    $   "$(pcregrep -o1 '~(.+)~' input)"
    match 1:
     a
    /1/a
    match 2:
     b
    /2/b
    match 3:
     c
    /3/c
    match 4:
     d
    /4/d
    match 5:
     e
    /5/e
    match 2:
     b
    /2/b
    match 3:
     c
    /3/c
    match 4:
     d
    /4/d
    match 5:
     e
    /5/e
    match 3:
     c
    /3/c
    match 4:
     d
    /4/d
    match 5:
     e
    /5/e
    match 4:
     d
    /4/d
    match 5:
     e
    /5/e
    match 5:
     e
    /5/e


Instead, pcregrep should output each correctly found match only a single time.
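
To make the duplication pattern easier to see, the separator lines from the
pcregrep output above can be tallied. The sketch below is only a description
of the observed output, not a diagnosis of the pcregrep code: it counts how
often each match is printed and checks that the output has the same shape as
the expected list of matches repeated from a later starting point each time.
The names OBSERVED and EXPECTED are made up for this example.

```python
from collections import Counter

# The "/N/x" lines, copied in order from the pcregrep output in this report.
OBSERVED = [
    "/1/a", "/2/b", "/3/c", "/4/d", "/5/e",
    "/2/b", "/3/c", "/4/d", "/5/e",
    "/3/c", "/4/d", "/5/e",
    "/4/d", "/5/e",
    "/5/e",
]

# What a correct run should print: each match once.
EXPECTED = ["/1/a", "/2/b", "/3/c", "/4/d", "/5/e"]

# How many times each match was printed.
counts = Counter(OBSERVED)
for line in EXPECTED:
    print(line, counts[line])

# The expected list followed by each of its shorter suffixes.
suffixes = [line for start in range(len(EXPECTED)) for line in EXPECTED[start:]]
print(suffixes == OBSERVED)
```

Match N is printed N times, so 15 results appear where 5 are expected, and
the order of the output is identical to the expected list followed by each of
its shorter suffixes.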
