[pcre-dev] [Bug 2483] Out-of-bounds memory read in internal_dfa_match() (internal_dfa

Author: admin
Date:
To: pcre-dev
Subject: [pcre-dev] [Bug 2483] Out-of-bounds memory read in internal_dfa_match() (internal_dfa_match.c)

https://bugs.exim.org/show_bug.cgi?id=2483

Petr Pisar <ppisar@???> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID

--- Comment #4 from Petr Pisar <ppisar@???> ---
The reproducer can be reduced to:

/[^a]*\x{3c2}/i,utf
\x{d10000}\=no_utf_check

It crashes because the subject text \x{d10000} is not valid UTF-8 and, at the
same time, you disable the UTF-8 validity check with the no_utf_check subject
modifier. If you remove the modifier:

/[^a]*\x{3c2}/i,utf
\x{d10000}

then PCRE performs the check and explains what's wrong with the subject text:

$ pcre2test < test
PCRE2 version 10.33 2019-04-16
/[^a]*\x{3c2}/i,utf
\x{d10000}
Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0

This is not a bug; it is documented behavior. From the pcre2api(3) manual:

       If you know that your pattern is a valid UTF string, and you want to
       skip this check for performance reasons, you can set the
       PCRE2_NO_UTF_CHECK option. When it is set, the effect of passing an
       invalid UTF string as a pattern is undefined. It may cause your
       program to crash or loop.

       Note that this option can also be passed to pcre2_match() and
       pcre_dfa_match(), to suppress UTF validity checking of the subject
       string.

