[pcre-dev] [Bug 1024] pcregrep crashes with multilines

Author: Philip Hazel
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1024] pcregrep crashes with multilines

http://bugs.exim.org/show_bug.cgi?id=1024
--- Comment #1 from Philip Hazel <ph10@???> 2010-09-24 11:01:06 ---
On Thu, 23 Sep 2010, felix groebert wrote:

> I'm using pcregrep the following way:
>
> gdb --args pcregrep -M 'va_start(.|\R)*va_end' someFileContainingIt.c
> Program received signal SIGSEGV, Segmentation fault.
> 0x00007ffff79bba52 in ?? () from /lib/libpcre.so.3
>
> The backtrace hints at stack recursion, so the impact could be a DoS.


Second time this week.

This is not a bug, though it is frequently mis-reported as one, so much so that
I keep this standard response on file:

-------------------------------------------------------------------------------
1. Matching a regular expression is like finding your way through a forest with
many branching paths. As PCRE passes each junction, it has to remember data so
that it can backtrack to that point if necessary. By default, it uses recursion
to store this data on the process stack, because that is fast. However, it can
alternatively be compiled to use the heap instead (run ./configure with
--disable-stack-for-recursion), but that slows performance.

2. It is very easy to write a regular expression that has a very large number
of branches (unlimited repetition of a group, for example). When PCRE goes deep
into such a tree, it may use a lot of memory.

3. Even in these days of gigabyte main memories, some operating system
environments set small default limits on the maximum size of the process stack,
for example, 8MB. Thus, it is often the case that there is more heap than stack
available (by default). A matching operation that needs a lot of memory may
succeed if the heap is used, but run out of memory if the stack is used.

4. Running out of stack often causes a segfault. Because of this, PCRE contains
the facility to limit the depth of recursion so as to return an error code
instead. However, the default value is large, so it does not normally come into
play unless you explicitly set a smaller value.

5. If you are running into a problem of stack overflow, you have the following
choices:

  (a) Work on your regular expression pattern so that it uses less memory.
  (b) Increase the size of your process stack.
  (c) Compile PCRE to use the heap instead of the stack.
  (d) Set PCRE's recursion limit small enough so that it gives an error
      before the stack overflows.

6. There is more discussion of some of these ideas in the pcrestack.3 man page.
-------------------------------------------------------------------------------
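The trade-off in points 1, 2, 4 and 5(d) can be illustrated with a toy backtracking matcher. This is a hypothetical sketch, not PCRE's code: it supports only literals, `.` (any character, newline included, standing in for `(.|\R)`) and a postfix `*`, and every name in it (`toy_search`, `TOY_ERROR_DEPTHLIMIT`, the depth counter) is invented for illustration. The point it shows is that a greedy `*` costs one nested call, and therefore one stack frame, per repetition, and that counting nesting depth lets the matcher return an error code instead of overflowing the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this toy; PCRE's real codes differ. */
enum { TOY_NOMATCH = 0, TOY_MATCH = 1, TOY_ERROR_DEPTHLIMIT = -1 };

struct toy_ctx {
    unsigned long depth; /* current number of nested calls */
    unsigned long limit; /* give up (with an error) beyond this depth */
};

static int match_here(struct toy_ctx *ctx, const char *re, const char *text);

/* Greedy "c*": each extra repetition is a nested call, i.e. one more
   stack frame remembering the point to backtrack to if the rest fails. */
static int match_star(struct toy_ctx *ctx, char c, const char *re,
                      const char *text)
{
    int rc = TOY_NOMATCH;

    if (++ctx->depth > ctx->limit) {
        ctx->depth--;
        return TOY_ERROR_DEPTHLIMIT;
    }
    if (*text != '\0' && (c == '.' || *text == c))
        rc = match_star(ctx, c, re, text + 1);  /* try one more repetition */
    if (rc == TOY_NOMATCH)
        rc = match_here(ctx, re, text);         /* backtrack: stop repeating */
    ctx->depth--;
    return rc;
}

/* Match re at the start of text; supports literals, '.' and postfix '*'. */
static int match_here(struct toy_ctx *ctx, const char *re, const char *text)
{
    int rc;

    if (re[0] == '\0')
        return TOY_MATCH;
    if (++ctx->depth > ctx->limit) {
        ctx->depth--;
        return TOY_ERROR_DEPTHLIMIT;
    }
    if (re[1] == '*')
        rc = match_star(ctx, re[0], re + 2, text);
    else if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        rc = match_here(ctx, re + 1, text + 1);
    else
        rc = TOY_NOMATCH;
    ctx->depth--;
    return rc;
}

/* Unanchored search, as grep does: try each start position in turn.
   Returns TOY_MATCH, TOY_NOMATCH or TOY_ERROR_DEPTHLIMIT. */
int toy_search(const char *re, const char *text, unsigned long limit)
{
    struct toy_ctx ctx = { 0, limit };

    assert(re != NULL && text != NULL);
    do {
        int rc = match_here(&ctx, re, text);
        if (rc != TOY_NOMATCH)
            return rc;  /* a match, or the depth limit was hit */
    } while (*text++ != '\0');
    return TOY_NOMATCH;
}
```

With a subject of `s`, a thousand `x` characters and a final `e`, `toy_search("s.*e", subject, 100000)` finds the match after nesting roughly a thousand calls deep, while a limit of 100 makes the same call return `TOY_ERROR_DEPTHLIMIT` cleanly. Without the counter, a long enough subject would simply exhaust the process stack, which is the segfault reported above.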


PCRE does have an option you can set (grep the pcreapi doc for
MATCH_LIMIT_RECURSION). Unfortunately, pcregrep does not have any means
by which you can set this value.
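Because pcregrep offers no way to set that limit, option (b) from the list above is the practical workaround when running it. The sketch below uses the POSIX `getrlimit()`/`setrlimit()` calls to raise the soft `RLIMIT_STACK` toward a requested size, capped at the hard limit that an unprivileged process may not exceed. The function name `raise_stack_limit` is hypothetical, and the sketch assumes a POSIX system; how a raised limit affects a process that is already running varies by platform, so it is applied most reliably to a program started afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Raise the soft RLIMIT_STACK of the calling process to `want` bytes,
   capped at the hard limit. An existing larger (or unlimited) soft limit
   is left alone. On success, stores the resulting soft limit in *new_soft
   and returns 0; returns -1 if getrlimit() or setrlimit() fails. */
int raise_stack_limit(rlim_t want, rlim_t *new_soft)
{
    struct rlimit rl;

    assert(new_soft != NULL);
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur != RLIM_INFINITY &&
        (want == RLIM_INFINITY || want > rl.rlim_cur)) {
        rlim_t target = want;

        /* Without privileges, the soft limit can rise only to the hard limit. */
        if (rl.rlim_max != RLIM_INFINITY &&
            (target == RLIM_INFINITY || target > rl.rlim_max))
            target = rl.rlim_max;
        rl.rlim_cur = target;
        if (setrlimit(RLIMIT_STACK, &rl) != 0)
            return -1;
    }
    *new_soft = rl.rlim_cur;
    return 0;
}
```

A small wrapper could call this and then `execvp()` pcregrep, since on Linux, for example, a new program's stack is set up under the limit in force at exec time. From an interactive shell, the `ulimit -s` builtin (which takes a size in kilobytes) does the same job before pcregrep is started.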

Perhaps it should. I have noted the requirement.

> pcregrep version 7.8 2008-09-05


Current PCRE is 8.10.

Regards,
Philip

