[pcre-dev] [Bug 1468] Segmentation fault for a text starting with 0x80

Author: Zoltan Herczeg
Date:
To: pcre-dev
Subject: [pcre-dev] [Bug 1468] Segmentation fault for a text starting with 0x80 - 0xbf in UTF-8 mode

------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1468

--- Comment #7 from Zoltan Herczeg <hzmester@???> 2014-04-21 16:23:51 ---
> I want to search also binary files recognized as UTF-8 characters as much
> as possible in UTF-8 mode, instead of whether PCRE_NO_UTF8_CHECK is set or not.

I once thought about what we could do with invalid UTF-8 strings. If the random
byte sequence were preceded by a valid character that occupies a single code
unit (\0 for example), and followed by max(1, maximum_UTF_codepoint_length - 1)
such single-unit characters (UTF-8: three, UTF-16/32: one \0), PCRE could be
changed so that no crash or infinite loop happens. However, it is not
guaranteed that all matches will be found (because of certain optimizations).
Because of the second condition, this might be a waste of effort though.
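
To make the layout concrete, here is a rough sketch in C of how a caller might
pad an untrusted buffer for the UTF-8 case. The helper names
(guard_units_after, pad_utf8_subject) are made up for illustration and are not
part of PCRE. The sketch only builds the padded buffer; it does not by itself
make current PCRE safe on invalid input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a PCRE function): number of trailing guard
 * units, computed as max(1, max_units_per_char - 1).
 * UTF-8 (4 units) gives 3, UTF-16 (2 units) gives 1, UTF-32 (1 unit) gives 1. */
static size_t guard_units_after(size_t max_units_per_char)
{
    assert(max_units_per_char >= 1);
    return max_units_per_char > 2 ? max_units_per_char - 1 : 1;
}

/* Hypothetical helper (not a PCRE function): copy len bytes of untrusted
 * data into a freshly allocated buffer laid out as
 *     [\0] [data ...] [\0 \0 \0]
 * The total size is stored in *padded_len. Returns NULL if allocation
 * fails. The caller must free() the result. */
static unsigned char *pad_utf8_subject(const unsigned char *data, size_t len,
                                       size_t *padded_len)
{
    const size_t before = 1;                    /* one leading \0 */
    const size_t after = guard_units_after(4);  /* UTF-8: up to 4 bytes */
    unsigned char *buf = calloc(before + len + after, 1);

    if (buf == NULL)
        return NULL;
    if (len > 0)
        memcpy(buf + before, data, len);        /* guards stay zeroed */
    *padded_len = before + len + after;
    return buf;
}
```

The original data starts at offset 1 of the returned buffer, so any match
offsets reported against the padded buffer would need to be reduced by one.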

--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email
