[pcre-dev] [Bug 1130] pcregrep doesn't copy entire lines to …

Author: Philip Hazel
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1130] pcregrep doesn't copy entire lines to output when they are long ( > 25000 chars)
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1130

Philip Hazel <ph10@???> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED





--- Comment #1 from Philip Hazel <ph10@???> 2011-07-30 19:24:03 ---
pcregrep wasn't really designed with exceedingly long lines in mind. It scans
files by reading large chunks into an in-memory buffer and works mostly in the
middle of that buffer, so as to have some "before" and "after" lines available
in case they are to be output. There was no check for a line overflowing the
buffer.
[I am aware that on some operating systems, files can be memory-mapped and so
scanned in their entirety more easily, but pcregrep is written (almost entirely)
in Standard C, so as to run in many different environments, not all of which
may be able to do this. Besides, it also has to handle .gz files.]

The buffer size parameter was set at 8K. pcregrep in fact gets a block of
memory three times this size, to allow for before/after lines. I am old enough
to remember when 24K was a lot of memory, but of course today even 24M is
unexceptional.
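
As a rough illustration of the arithmetic and of the kind of check that was
missing, here is a minimal sketch in C. It is not pcregrep's source: the names
total_buffer_bytes, check_line, and the LINE_* codes are hypothetical, and the
sketch assumes a line ends with a single '\n'. With the old 8K parameter the
block is 24576 bytes, which is in line with the roughly 25000-character limit
in the bug title.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch only. */
enum { LINE_COMPLETE, LINE_NEEDS_MORE, LINE_OVERFLOW };

/* The block actually allocated is three times the buffer size parameter,
   leaving room for "before" and "after" lines around the current one. */
size_t total_buffer_bytes(size_t bufsize_param)
{
    return 3 * bufsize_param;
}

/* Look at the line starting at buf[0]. `used` bytes of buf hold data and
   `capacity` is the total size of buf. On LINE_COMPLETE, *line_len is the
   line length including its newline. */
int check_line(const char *buf, size_t used, size_t capacity, size_t *line_len)
{
    const char *nl = memchr(buf, '\n', used);

    if (nl != NULL) {
        *line_len = (size_t)(nl - buf) + 1;
        return LINE_COMPLETE;
    }
    if (used >= capacity)
        return LINE_OVERFLOW;   /* buffer is full and still no newline */
    return LINE_NEEDS_MORE;     /* room remains: read more input first */
}
```

A caller would treat LINE_OVERFLOW as the point at which to report an error
rather than silently emitting a truncated line.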

I have committed a patch that may help. The changes are as follows:

    (a) The default value of the buffer size parameter has been increased from
        8K to 20K. (A buffer three times this size is actually used.)

    (b) The default can be changed by ./configure --with-pcregrep-bufsize when
        PCRE is built.

    (c) A --buffer-size=n option has been added to pcregrep, to allow the size
        to be set at run time.

    (d) Numerical values in pcregrep options can be followed by K or M, for
        example --buffer-size=50K.

    (e) If a line being scanned overflows pcregrep's buffer, an error is now
        given and the return code is set to 2.
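
To make item (d) concrete, the following is a minimal sketch of how a numeric
option value with an optional K or M suffix might be parsed. It is not the
code from the patch: parse_size is a hypothetical helper, and the sketch
assumes that K means 1024 and M means 1024*1024, and that only the uppercase
letters are accepted.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal number optionally followed by K or M.
   Assumption: K multiplies by 1024 and M by 1024*1024.
   Returns 0 and stores the result in *out on success, -1 on any error. */
int parse_size(const char *text, unsigned long *out)
{
    char *end;
    unsigned long value, scale = 1;

    /* Require a leading digit: rejects "", "K", and signed input. */
    if (text == NULL || !isdigit((unsigned char)text[0]))
        return -1;

    errno = 0;
    value = strtoul(text, &end, 10);
    if (errno == ERANGE)
        return -1;

    if (*end == 'K') {
        scale = 1024UL;
        end++;
    } else if (*end == 'M') {
        scale = 1024UL * 1024UL;
        end++;
    }

    if (*end != '\0')
        return -1;              /* trailing characters, e.g. "50KB" */
    if (value > ULONG_MAX / scale)
        return -1;              /* the scaled value would not fit */

    *out = value * scale;
    return 0;
}
```

Under these assumptions, "50K" yields 51200 and a plain "8192" is returned
unchanged, while malformed input such as "50KB" is rejected.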

