[pcre-dev] [Bug 1554] support subject strings with invalid UTF-8 sequences

Author: admin
Date:
To: pcre-dev
Old topic: [pcre-dev] [Bug 1554] New: support subject strings with invalid UTF-8 sequences
Subject: [pcre-dev] [Bug 1554] support subject strings with invalid UTF-8 sequences

https://bugs.exim.org/show_bug.cgi?id=1554

Zoltan Herczeg <hzmester@???> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hzmester@???

--- Comment #6 from Zoltan Herczeg <hzmester@???> ---
Is this issue still valid? Did you find any solution (e.g. converting all
invalid bytes to \0)?

On-the-fly validation of UTF characters is costly, especially for a
backtracking engine, where the same character may be validated several times.
This would slow down the interpreter too much. However, PCRE2-JIT could be
extended with invalid UTF parsing. This feature would be enabled by a flag, so
the only extra cost for users who don't need it would be a negligible
compile-time overhead.

But the validation itself is still costly, especially if we need to do it for
the same character several times. Therefore "fixing" the input before calling
pcre2_match would still be faster.
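
As a rough illustration of "fixing" the input in a single pass before matching,
here is a minimal sketch in C. The names utf8_valid_len and sanitize_utf8 are
hypothetical helpers written for this example and are not part of PCRE2. The
sketch assumes the subject is held in a writable buffer and that overwriting
each offending byte with one ASCII replacement byte (for example '?' or \0) is
acceptable for the application.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a PCRE2 function.
 * Returns the length (1..4) of the well-formed UTF-8 sequence that starts
 * at s, or 0 if the bytes there do not form one. `remaining` must be >= 1. */
static size_t utf8_valid_len(const unsigned char *s, size_t remaining)
{
    unsigned char c = s[0];
    size_t need;      /* number of continuation bytes expected */
    uint32_t cp, min; /* decoded code point and smallest non-overlong value */

    if (c < 0x80) return 1;
    else if (c >= 0xC2 && c <= 0xDF) { need = 1; cp = c & 0x1F; min = 0x80; }
    else if (c >= 0xE0 && c <= 0xEF) { need = 2; cp = c & 0x0F; min = 0x800; }
    else if (c >= 0xF0 && c <= 0xF4) { need = 3; cp = c & 0x07; min = 0x10000; }
    else return 0; /* stray continuation byte, 0xC0, 0xC1, or 0xF5..0xFF */

    if (remaining < need + 1) return 0; /* truncated at end of buffer */
    for (size_t i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0; /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (cp < min) return 0;                     /* overlong encoding */
    if (cp > 0x10FFFF) return 0;                /* beyond Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* UTF-16 surrogate */
    return need + 1;
}

/* Hypothetical helper, not a PCRE2 function.
 * Overwrites every byte that is not part of a well-formed UTF-8 sequence
 * with `repl`, which should be an ASCII byte so the result is valid UTF-8.
 * Returns the number of bytes replaced. Each byte is examined a bounded
 * number of times, so the pass is linear in `len`. */
size_t sanitize_utf8(unsigned char *buf, size_t len, unsigned char repl)
{
    size_t i = 0, replaced = 0;
    while (i < len) {
        size_t n = utf8_valid_len(buf + i, len - i);
        if (n == 0) {
            buf[i] = repl;
            replaced++;
            i++;
        } else {
            i += n;
        }
    }
    return replaced;
}
```

The cleaned buffer and its length would then be passed to pcre2_match as the
subject. Because the validation happens once up front, backtracking never
triggers it again, which is the reason pre-fixing is expected to be faster.
The trade-off is that the original invalid bytes are lost, so a match can no
longer distinguish them from genuine occurrences of the replacement byte.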

--
You are receiving this mail because:
You are on the CC list for the bug.
