[pcre-dev] [Bug 1848] pcregrep outputs duplicate matches

Author: admin
Date:  
To: pcre-dev
Old-Topics: [pcre-dev] [Bug 1848] New: pcregrep outputs duplicate matches
Subject: [pcre-dev] [Bug 1848] pcregrep outputs duplicate matches
https://bugs.exim.org/show_bug.cgi?id=1848

Eric Hoffman <ehoffman@???> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ehoffman@???


--- Comment #2 from Eric Hoffman <ehoffman@???> ---
I'm still seeing this issue with pcre version 8.40.

I created a file with four lines (file.txt):
123<\n>
456<\n>
789<\n>
---<\n>

Then I ran the following commands with a freshly compiled pcregrep:

$ ./pcregrep --version
pcregrep version 8.40 2017-01-11

$ ./pcregrep -Mo "(\n|.)*---" file.txt
123
456
789
---
456
789
---
789
---
---


After investigation, it appears that the fix that was implemented only
advances to the end of the current line, so the bug appears again when
lines 2 and 3 are parsed.

To fix it, you need to skip all the lines covered by the match, up to the
end of the match.
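
To illustrate the idea independently of pcregrep.c, here is a minimal
standalone sketch. It assumes LF-only line endings, and next_line() and
skip_matched_lines() are hypothetical names used only for this example;
pcregrep itself uses end_of_line(), which also handles other newline
conventions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of pcregrep: return a pointer just past the
   newline that ends the line starting at p, or end if there is no newline.
   Assumes LF-only line endings. */
static const char *next_line(const char *p, const char *end)
{
    const char *nl = memchr(p, '\n', (size_t)(end - p));
    return nl ? nl + 1 : end;
}

/* Given a match that starts on the line at line_start and ends at match_end,
   return the start of the first line that begins at or after match_end.
   *lines_skipped receives the number of lines consumed (at least one). */
static const char *skip_matched_lines(const char *line_start,
                                      const char *match_end,
                                      const char *end,
                                      size_t *lines_skipped)
{
    const char *p = line_start;
    size_t n = 0;

    assert(line_start <= match_end && match_end <= end);

    /* Always consume the current line, then keep going while the match
       still reaches into the line that follows. */
    do {
        p = next_line(p, end);
        n++;
    } while (p < match_end);

    *lines_skipped = n;
    return p;
}
```

With the four-line file above, the match starts at offset 0 and ends at
offset 15 (just after "---"), so the sketch consumes all four lines and the
next scan starts at the end of the buffer instead of at "456".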

--- pcregrep.c  (revision 1647)
+++ pcregrep.c  (working copy)
@@ -1865,7 +1865,22 @@
                     /* If the current match ended past the end of the line (only possible
                     in multiline mode), we are done with this line. */
 
-                    if ((unsigned int)offsets[1] > linelength) goto END_ONE_MATCH;
+                    if ((unsigned int)offsets[1] > linelength)
+                    {
+                        char *endmatch = ptr + offsets[1];
+                        t = ptr;
+                        while (t <= endmatch)
+                        {
+                            t = end_of_line(t, endptr, &endlinelength);
+                            if (t < endmatch)
+                                linenumber++;
+                            else
+                                break;
+                        }
+                        linelength = t - ptr - endlinelength;
+
+                        goto END_ONE_MATCH;
+                    }


                     startoffset = offsets[1];    /* Restart after the match */
                     if (startoffset <= oldstartoffset)



What do you think?

Regards,
Eric H.
