[pcre-dev] [Bug 969] compile_branch seg fault

Author: Philip Hazel
Date:  
To: pcre-dev
Old-Topics: [pcre-dev] [Bug 969] New: compile_branch seg fault
Subject: [pcre-dev] [Bug 969] compile_branch seg fault
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=969

Philip Hazel <ph10@???> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID





--- Comment #1 from Philip Hazel <ph10@???> 2010-05-22 17:19:21 ---
(In reply to comment #0)
> A seg fault occurs in compile_branch when a string starting with "\0277\0377"
> is passed to pcre_compile2 with compile options PCRE_CASELESS |
> PCRE_NO_UTF8_CHECK | PCRE_UTF8.
>
>
> #include "pcre.h"
>
> int main ()
> {
>     const char needle[] = {0x5B, 0xFF};


1. You have not terminated the pattern string with a binary zero to make it
into a C string. However, adding one makes no difference.

2. 0x5B is not \0277, it is \0133.

3. The sequence 0x5B, 0xFF is not a valid UTF-8 character. By setting
PCRE_NO_UTF8_CHECK you have told PCRE that you are giving it a valid UTF-8
string. This is a lie, so the consequences are unpredictable. If you do not set
PCRE_NO_UTF8_CHECK, you get the "invalid UTF-8 string" error.

4. The documentation (the pcreapi page) says "If you already know that your
pattern is valid, and you want to skip this check for performance reasons, you
can set the PCRE_NO_UTF8_CHECK option. When it is set, the effect of passing an
invalid UTF-8 string as a pattern is undefined. It may cause your program to
crash."

5. Consequently, I am closing this as INVALID.
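
For callers who want to set PCRE_NO_UTF8_CHECK safely, one option is to
validate the pattern themselves first. The following is a minimal sketch of
such a check, assuming strict RFC 3629 rules (no overlong forms, no surrogates,
nothing above U+10FFFF). The name is_valid_utf8 is hypothetical; this is not
PCRE's own validation routine and may differ from it in edge cases.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE.
   Returns 1 if the first len bytes of s are well-formed UTF-8 under
   RFC 3629 rules, and 0 otherwise. */
static int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;       /* continuation bytes expected after c */
        size_t k;
        unsigned long cp;   /* code point being decoded */

        if (c < 0x80) {                         /* plain ASCII byte */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {    /* 2-byte lead */
            extra = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {    /* 3-byte lead */
            extra = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {    /* 4-byte lead */
            extra = 3; cp = c & 0x07;
        } else {
            /* 0x80-0xC1 and 0xF5-0xFF can never start a character */
            return 0;
        }

        if (len - i - 1 < extra)                /* truncated sequence */
            return 0;

        for (k = 1; k <= extra; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)            /* must be 10xxxxxx */
                return 0;
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (extra == 2 && cp < 0x800)   return 0;   /* overlong 3-byte form */
        if (extra == 3 && cp < 0x10000) return 0;   /* overlong 4-byte form */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* UTF-16 surrogate */
        if (cp > 0x10FFFF) return 0;                /* beyond Unicode range */

        i += extra + 1;
    }
    return 1;
}
```

Called on the two bytes from the report (0x5B, 0xFF) with a length of 2, this
returns 0, because 0xFF cannot begin a UTF-8 character. A caller would pass
PCRE_NO_UTF8_CHECK only after a check of this kind has succeeded, and would
otherwise leave the option unset so that PCRE reports the error itself.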


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email