http://bugs.exim.org/show_bug.cgi?id=1058
Summary: Feeding bad utf8 strings confuses pcre_exec()
Product: PCRE
Version: 7.9
Platform: Other
OS/Version: Windows
Status: NEW
Severity: bug
Priority: low
Component: Code
AssignedTo: ph10@???
ReportedBy: alehotsky@???
CC: pcre-dev@???, alehotsky@???
If I feed a subject string like "+\001\177" to pcre_exec() with a pattern of
"[^a-z]", it appears to treat the trailing NUL on the string as the second
byte of a UTF-8 character. It then returns a 'match' of {0,4}.
Now, I admit that this is bad Unicode, and it's not hard for me to check that
the returned end offset is past the end of the source string.
But I suspect this might indicate a more serious bug if you are using partial
matching and a UTF-8 code point is split across buffers. Since no UTF-8
multibyte sequence can contain a NUL byte, you might be able to detect this
easily and either report an error or return the correct endpoint.
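
To make the two checks mentioned above concrete, here is a minimal sketch in
C. Both helper names (utf8_first_bad_offset and match_within_subject) are
hypothetical and are not part of PCRE. The first only checks structure (lead
byte followed by the right number of continuation bytes) and does not reject
overlong encodings or surrogates. The second assumes the caller passes the
first two ovector entries filled in by pcre_exec().

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: returns the byte offset of the
 * first malformed or truncated UTF-8 sequence in s[0..len), or -1 if the
 * buffer is structurally well formed. */
static long utf8_first_bad_offset(const unsigned char *s, size_t len)
{
    size_t i = 0;
    size_t k;

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;                 /* continuation bytes expected */

        if (c < 0x80)                extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return (long)i;          /* stray continuation or bad lead byte */

        if (extra > len - i - 1)
            return (long)i;           /* sequence runs off the end of buffer */

        for (k = 1; k <= extra; k++) {
            /* A NUL (or any byte not of the form 10xxxxxx) cannot appear
             * inside a multibyte sequence. */
            if ((s[i + k] & 0xC0) != 0x80)
                return (long)i;
        }
        i += extra + 1;
    }
    return -1;
}

/* Hypothetical helper: returns nonzero if the match span {start, end} in
 * ovector[0..1] lies entirely within a subject of subject_len bytes. */
static int match_within_subject(const int *ovector, int subject_len)
{
    return ovector[0] >= 0
        && ovector[0] <= ovector[1]
        && ovector[1] <= subject_len;
}
```

With these, a result of {0,4} against a 3-byte subject fails
match_within_subject(), and a buffer that ends in the middle of a multibyte
sequence is reported by utf8_first_bad_offset() at the offset of its lead
byte, which a caller doing partial matching could use as the point to resume
from once more data arrives.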