http://bugs.exim.org/show_bug.cgi?id=1437
--- Comment #12 from Philip Hazel <ph10@???> 2014-01-24 17:13:56 ---
On Fri, 24 Jan 2014, Zoltan Herczeg wrote:
> Thanks. Now I have the answer. Grep puts a \x0a before the start of the string,
> and your string starts with an invalid utf-8 code, 0xbf. You can fully
> reproduce this issue on the following input:
>
> \x0a\xbf#
Can you reproduce this in pcretest? I don't seem to be able to do that.
> It is a good question what to do here; I would like to hear others' opinions
> (especially Philip's). I think the best thing would be for grep to switch back
> to ASCII mode if the input is not a valid UTF string.
Note that pcregrep runs in ASCII mode by default; you have to use the -u
option to make it use UTF-8. If you do that on the 1.dat file, it reports
error -10 (bad UTF-8 string).
I agree with your conclusion, so I guess this is really a grep issue,
not a PCRE issue. It should not be setting PCRE_NO_UTF_CHECK if it
cannot guarantee valid UTF-8 data.
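
A minimal sketch of that fallback idea, written in Python rather than
grep's C, with the standard `re` module standing in for PCRE. This is an
illustration under those assumptions, not grep's or PCRE's actual code;
the function names `is_valid_utf8` and `search_with_fallback` are
hypothetical. It validates the subject first and only matches in UTF
mode when the check passes, using the `\x0a\xbf#` input quoted above:

```python
import re


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def search_with_fallback(pattern: bytes, subject: bytes):
    """Match in UTF mode only when both inputs are valid UTF-8.

    Otherwise fall back to plain byte matching, which is the analogue
    of running the matcher in ASCII (non-UTF) mode.
    """
    if is_valid_utf8(pattern) and is_valid_utf8(subject):
        return "utf8", re.search(pattern.decode("utf-8"), subject.decode("utf-8"))
    return "bytes", re.search(pattern, subject)


# 0xbf is a continuation byte (10xxxxxx), so it cannot start a character.
subject = b"\x0a\xbf#"
mode, match = search_with_fallback(b"#", subject)
print(mode, match.start())  # prints "bytes 2"
```

The point of the design is that the validity check is never skipped
unless the caller has already done it; the equivalent of setting
PCRE_NO_UTF_CHECK would be safe only on the "utf8" branch.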
Philip