[pcre-dev] [Bug 1437] Using PCRE-8.34 on x86-64 Linux with …

Author: Zoltan Herczeg
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1437] Using PCRE-8.34 on x86-64 Linux with --enable-jit and --enable-utf , grep -iP '^S' gets stuck on a binary file consuming a lot of CPU for many seconds

http://bugs.exim.org/show_bug.cgi?id=1437




--- Comment #13 from Zoltan Herczeg <hzmester@???> 2014-01-26 16:18:19 ---
> Can you reproduce this in pcretest? I don't seem to be able to do that.


Yes, you can:

re> /^s/8imS+
data> \n\xbf#\?


re> /^.a/8mS+
data> \n\x80ba\?


Basically, this test exploits the skip_char_back() function during the
fast-forward search for a newline. The JIT code goes back one character and
then reads characters until it finds a newline. Since this character has been
read as well, we can simply start the match from there. In this particular
case, however, we start the match one character earlier than where we finished
last time. This could be fixed by an optimization: instead of moving one
character forward, moving back, and then starting a newline search, we can
simply start the newline search as soon as a match fails.
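
To illustrate why a forward step followed by a backward step can end up earlier
than where it started, here is a minimal C sketch. It is not the JIT's actual
skip_char_back() code: utf8_step_forward(), utf8_step_back() and
forward_then_back() are hypothetical helpers, and the sketch assumes that the
backward step skips UTF-8 continuation bytes (10xxxxxx) while the forward step
trusts the lead byte to give the character length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: advance one character, trusting the lead byte.
   Bytes below 0xc0 (including stray continuation bytes) count as one byte. */
static const unsigned char *
utf8_step_forward(const unsigned char *p, const unsigned char *end)
{
    size_t len = 1;

    assert(p < end);
    if (*p >= 0xf0)
        len = 4;
    else if (*p >= 0xe0)
        len = 3;
    else if (*p >= 0xc0)
        len = 2;
    /* Clamp so that a truncated sequence never runs past the buffer. */
    return ((size_t)(end - p) < len) ? end : p + len;
}

/* Hypothetical helper: go back one character by skipping continuation
   bytes until a non-continuation byte (or the buffer start) is reached. */
static const unsigned char *
utf8_step_back(const unsigned char *start, const unsigned char *p)
{
    assert(p > start);
    do {
        p--;
    } while (p > start && (*p & 0xc0) == 0x80);
    return p;
}

/* Step forward once from offset pos, then back once; return the offset.
   On valid UTF-8 this returns pos again. */
static size_t
forward_then_back(const unsigned char *buf, size_t len, size_t pos)
{
    const unsigned char *p;

    assert(pos < len);
    p = utf8_step_forward(buf + pos, buf + len);
    p = utf8_step_back(buf, p);
    return (size_t)(p - buf);
}
```

With the subject \n\xbf# and a starting offset of 1, the forward step lands on
offset 2, and the backward step then skips the stray \xbf and lands on offset
0, which is earlier than the starting point. On valid input such as a\xc3\xa9b
the same round trip from offset 1 returns to offset 1.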

This function can be exploited in other ways as well:

re> /(\B.)+/8S+
data> #\x80#\?

Error -27 (JIT stack limit reached)

In my opinion,

- what we cannot do: ensure that UTF-8 matching works correctly on invalid
UTF input.

- what we can do: if certain conditions are fulfilled, we can ensure that there
will be no infinite loop or buffer over-read (and segmentation fault).

The conditions would be: when the input might not be valid UTF, but we don't
want to run the very expensive UTF check, the buffer must be preceded by a
character encoded as a single code unit (such as \0), and there should be
max_utf_character_length - 1 of these characters after the buffer. In the case
of UTF-8, that means one \0 before and five \0 after the buffer. In UTF-16, a
single \0 is enough on both sides. Would that satisfy your needs, Shlomi Fish?
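
To make the proposed conditions concrete for UTF-8, a caller could prepare the
subject roughly as in the C sketch below. This is only an illustration of the
padding described above, not an existing PCRE facility: the padded_subject
type, the PAD_BEFORE and PAD_AFTER constants, and both functions are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Padding from the proposal for UTF-8: one \0 before the subject and
   five \0 after it (hypothetical constant names). */
#define PAD_BEFORE 1
#define PAD_AFTER  5

/* Hypothetical container for a subject surrounded by \0 padding. */
typedef struct {
    unsigned char *base;     /* start of the allocation (first pad byte) */
    unsigned char *subject;  /* first byte of the real subject */
    size_t length;           /* subject length, padding excluded */
} padded_subject;

/* Copy data into a zero-padded buffer. Returns 0 on success, -1 on
   allocation failure or if the padded size would overflow. */
static int
padded_subject_init(padded_subject *ps, const void *data, size_t length)
{
    assert(ps != NULL);
    if (length > SIZE_MAX - (PAD_BEFORE + PAD_AFTER))
        return -1;
    /* calloc() zero-fills the block, so both pads are already \0. */
    ps->base = calloc(PAD_BEFORE + length + PAD_AFTER, 1);
    if (ps->base == NULL)
        return -1;
    ps->subject = ps->base + PAD_BEFORE;
    ps->length = length;
    if (length > 0)
        memcpy(ps->subject, data, length);
    return 0;
}

/* Release the buffer and clear the fields. */
static void
padded_subject_free(padded_subject *ps)
{
    free(ps->base);
    ps->base = NULL;
    ps->subject = NULL;
    ps->length = 0;
}
```

The matcher would then be given ps.subject and ps.length, together with
whatever option skips the validity check (PCRE_NO_UTF8_CHECK in PCRE 8.x), so
that the pad bytes lie outside the subject but are safe to read.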

