
Author: Ralf Junker
Date:  
To: pcre-dev@exim.org
Subject: [pcre-dev] Result of pcre2_get_startchar() undefined as of SVN 1176

As of SVN revision 1176, pcre2_get_startchar() can return an arbitrary,
undefined value after a UTF-8 check failure.

Here is the pcre2test example input:

/x/utf
     \x80\=startchar


This pcre2test input used to return

Failed: error -22: UTF-8 error: isolated byte with 0x80 bit set at offset 0

but with SVN 1176 it returns an arbitrary value, for example

Failed: error -22: UTF-8 error: isolated byte with 0x80 bit set at offset 2156822670

This happens because pcre2_match() in pcre2_match.c returns early, at
line 6337,

return PCRE2_ERROR_UTF8_ERR20; /* Isolated 0x80 byte */

before match_data->startchar is set further down.

It looks like the problem was introduced with

SVN Revision 1094
Implement support for invalid UTF in the pcre2_match() interpreter.

I have not tested it, but the same problem appears to affect JIT
matching, at around line 6215.

Ralf