[pcre-dev] [Bug 1130] New: pcregrep doesn't copy entire line…

Author: Peter Valdmar Mørch
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1130] New: pcregrep doesn't copy entire lines to output when they are long ( > 25000 chars)
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1130
           Summary: pcregrep doesn't copy entire lines to output when they
                    are long ( > 25000 chars)
           Product: PCRE
           Version: 8.12
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: peter@???
                CC: pcre-dev@???



I create a file containing a single long line: an 'a', many 'x's and a 'b'.
Running pcregrep for a or for b produces output that differs from the original
file, which I didn't expect.

This happens for 25000 'x's, but for 20000 'x's it is fine, as this shell
snippet illustrates:

> perl -e 'print "a", "x"x25000, "b\n"' > long
> perl -e 'print "a", "x"x20000, "b\n"' > short
> for ls in long short ; do for ab in a b ; do pcregrep $ab $ls > ${ls}${ab} ; done ; done
> md5sum long* short*

a02b4cbbb437eaf52997832952a1d052 long
1dfac8b938bfaec4c6bd727ffae356fd longa
1d5d6df30c643aed4a626dd8ab36f2ec longb
27ee48c18be91ac0038ba8d9a3988625 short
27ee48c18be91ac0038ba8d9a3988625 shorta
27ee48c18be91ac0038ba8d9a3988625 shortb
> ls -l long* short*

-rw-r--r-- 1 pvm pvm 25003 2011-07-12 14:33 long
-rw-r--r-- 1 pvm pvm 24576 2011-07-12 14:33 longa
-rw-r--r-- 1 pvm pvm 427 2011-07-12 14:33 longb
-rw-r--r-- 1 pvm pvm 20003 2011-07-12 14:33 short
-rw-r--r-- 1 pvm pvm 20003 2011-07-12 14:33 shorta
-rw-r--r-- 1 pvm pvm 20003 2011-07-12 14:33 shortb
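
A quick arithmetic check on the sizes in the listing above may help narrow this down. It is only a sketch: the variable names are made up for illustration, and the numbers are copied by hand from the ls -l output. It shows that longa and longb together account for exactly the 25003 bytes of long, and that longa is an exact multiple of 8192 bytes. That would be consistent with the line being split at a fixed-size buffer boundary, but that reading is an assumption and is not confirmed by anything in this report.

```shell
# Sizes in bytes, copied from the ls -l listing above.
long_size=25003
longa_size=24576
longb_size=427

# The two outputs together account for every byte of the input.
sum=$((longa_size + longb_size))
echo "longa + longb = $sum"

# longa is an exact multiple of 8192 bytes.
blocks=$((longa_size / 8192))
remainder=$((longa_size % 8192))
echo "longa = $blocks * 8192 + $remainder"
```

In other words, "pcregrep a" appears to emit the first 24576 bytes of the line and "pcregrep b" the remaining 427 bytes, including the final newline.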


I expected the output of
> pcregrep a long

and the output of
> pcregrep b long

both to be identical to long, just as is the case with the short* files.

I was running pcregrep over minified JavaScript files (each of which is a
single line) and noticed that the output doesn't have a terminating newline:

> (cat long ; cat long) > 2long
> pcregrep a 2long | wc -l

0
> grep a 2long | wc -l

2

> pcregrep --version

pcregrep version 8.12 2011-01-15

on Ubuntu natty


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email