Re: [pcre-dev] [Bug 1437] Using PCRE-8.34 on x86-64 Linux wi…

Author: ph10
Date:  
To: 1437
CC: pcre-dev
Subject: Re: [pcre-dev] [Bug 1437] Using PCRE-8.34 on x86-64 Linux with --enable-jit and --enable-utf , grep -iP '^S' gets stuck on a binary file consuming a lot of CPU for many seconds
On Fri, 24 Jan 2014, Zoltan Herczeg wrote:

> Thanks. Now I have the answer. Grep puts a \x0a before the start of the string,
> and your string starts with an invalid utf-8 code, 0xbf. You can fully
> reproduce this issue on the following input:
>
> \x0a\xbf#


Can you reproduce this in pcretest? I don't seem to be able to do that.

> It is a good question what to do here, I would like to hear others opinion
> (especially Philip). I think the best thing would be to switch back to ASCII
> mode in grep, if the input is not a valid UTF string.


Note that pcregrep runs in ASCII mode by default; you have to use the -u
option to make it use UTF-8. If you do that on the 1.dat file, it reports
error -10 (bad UTF-8 string).

I agree with your conclusion, so I guess this is really a grep issue,
not a PCRE issue. It should not be setting PCRE_NO_UTF_CHECK if it
cannot guarantee valid UTF-8 data.
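
As a rough illustration of that last point, here is a minimal sketch in C
of how a caller might validate a buffer itself before deciding to skip
PCRE's own check. It is not taken from grep or from PCRE; the names
is_valid_utf8 and exec_options_for are hypothetical, and the "no check"
option bit is passed in as a parameter rather than taken from pcre.h, so
the sketch stays self-contained.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8
   (no stray continuation bytes, overlong forms, surrogates, or code
   points above U+10FFFF), otherwise 0. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp;
        size_t extra, k;

        if (c < 0x80) {                 /* plain ASCII byte */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            extra = 1; cp = c & 0x1F;   /* 2-byte sequence */
        } else if (c >= 0xE0 && c <= 0xEF) {
            extra = 2; cp = c & 0x0F;   /* 3-byte sequence */
        } else if (c >= 0xF0 && c <= 0xF4) {
            extra = 3; cp = c & 0x07;   /* 4-byte sequence */
        } else {
            return 0;   /* 0x80..0xBF as a lead byte (e.g. 0xbf), 0xC0, 0xC1, 0xF5+ */
        }

        if (len - i - 1 < extra)
            return 0;                   /* sequence truncated at end of buffer */

        for (k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;               /* expected a continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }

        if (extra == 2 && cp < 0x800)   return 0;   /* overlong 3-byte form */
        if (extra == 3 && cp < 0x10000) return 0;   /* overlong 4-byte form */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* UTF-16 surrogate */
        if (cp > 0x10FFFF) return 0;                /* beyond Unicode range */

        i += extra + 1;
    }
    return 1;
}

/* Hypothetical helper: pass the "no UTF check" option bit only when the
   caller has verified the data; otherwise return 0 so that the library
   performs its own check and can report a bad UTF-8 string. */
int exec_options_for(const unsigned char *buf, size_t len, int no_check_flag)
{
    return is_valid_utf8(buf, len) ? no_check_flag : 0;
}
```

With the three-byte input quoted above (\x0a\xbf#), is_valid_utf8 returns
0 because 0xbf is a continuation byte with no lead byte before it, so
exec_options_for would leave the "no check" bit unset.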

Philip

--
Philip Hazel