[pcre-dev] [Bug 1437] Using PCRE-8.34 on x86-64 Linux with …

Author: Shlomi Fish
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1437] Using PCRE-8.34 on x86-64 Linux with --enable-jit and --enable-utf, grep -iP '^S' gets stuck on a binary file consuming a lot of CPU for many seconds

http://bugs.exim.org/show_bug.cgi?id=1437




--- Comment #2 from Shlomi Fish <shlomif@???> 2014-01-23 07:46:07 ---
> --- Comment #1 from Zoltan Herczeg <hzmester@???> 2014-01-23 00:11:06 ---
> Thanks for the bug report. I tried your pattern and input on the
> latest trunk and matched /^S/ against your input 100000 times. The JIT runtime
> was 0.16 sec; the interpreter took 1.14 sec.


OK.

>
> Am I right that you are using PCRE-8.33? That is a somewhat old release, and several
> improvements have been added since then (e.g. 'S' has three lowercases, and
> matching such letters was improved recently).


Well, I also tried it with libpcre-8.34 on x86-64 Debian Testing and was
able to reproduce the bug there. 8.34 is the latest version according to
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/ .

> Furthermore, your input is not a
> valid UTF-8 stream, and the matching behaviour is not defined in such cases.


OK, I realise it is a binary file, but the longer story is that it was part of
a file in a Claws-Mail directory tree, which I searched using grep -iPr. Once
grep got to that file, it got stuck, started consuming a lot of CPU, and
would not proceed any further.
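Whether a file like this is valid UTF-8 can be checked before it reaches a matcher running in UTF-8 mode. PCRE 8.x performs such a check itself unless the caller passes PCRE_NO_UTF8_CHECK; with that flag set, invalid input gives undefined behaviour. The following is a minimal sketch in C, assuming RFC 3629 rules (no overlong forms, no surrogates, nothing above U+10FFFF); `first_invalid_utf8` is a hypothetical helper, not a PCRE or grep function.

```c
#include <stddef.h>

/* Smallest code point allowed for a sequence with n continuation bytes;
   anything below it is an overlong (non-shortest) encoding. */
static const unsigned long utf8_min_cp[4] = {0x0, 0x80, 0x800, 0x10000};

/* Hypothetical helper: return the byte offset of the first invalid UTF-8
   sequence in buf[0..len), or (size_t)-1 if the whole buffer is valid. */
static size_t first_invalid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;            /* number of continuation bytes expected */
        unsigned long cp;    /* code point being decoded */

        if (c < 0x80) {      /* ASCII: one byte, always valid */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07;
        } else {
            return i;        /* stray continuation byte, or 0xF8..0xFF */
        }

        if (len - i <= n)
            return i;        /* sequence truncated by end of buffer */

        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return i;    /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }

        /* Reject overlong forms, UTF-16 surrogates and values past U+10FFFF. */
        if (cp < utf8_min_cp[n] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return i;

        i += n + 1;
    }
    return (size_t)-1;       /* no invalid sequence found */
}
```

Run over a file's contents, any return value other than (size_t)-1 gives the offset of the first offending byte, which is a quick way to confirm that an input is outside what UTF-8 mode defines.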

>
> Would it be possible to dump the input of the pcre_exec calls in grep? I suspect
> something is wrong with some of the input buffers (the 'size' argument, for example).


I think that should be possible using gdb. I can try it.
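Besides gdb, another way to capture exactly what grep hands to pcre_exec() is a small debug routine called just before it, with the same subject, length and start-offset arguments (pcre_exec() takes the latter two as int). A minimal sketch in C; `dump_subject` is a hypothetical name, and where to hook it into grep's source is left open. It also flags the kind of inconsistent 'size' argument suspected above.

```c
#include <stdio.h>

/* Hypothetical debug helper (not part of grep or PCRE): print the arguments
   about to be passed to pcre_exec() and a hex dump of the subject, 16 bytes
   per line. Returns the number of bytes dumped, or -1 if the arguments are
   inconsistent (negative length, or start_offset outside 0..length). */
static int dump_subject(FILE *out, const char *subject, int length, int start_offset)
{
    fprintf(out, "pcre_exec: length=%d start_offset=%d\n", length, start_offset);
    if (length < 0 || start_offset < 0 || start_offset > length) {
        fprintf(out, "  suspicious arguments, not dumping\n");
        return -1;
    }
    for (int i = 0; i < length; i += 16) {
        fprintf(out, "  %08x", (unsigned)i);            /* offset column */
        for (int j = i; j < i + 16 && j < length; j++)
            fprintf(out, " %02x", (unsigned char)subject[j]);
        fputc('\n', out);
    }
    return length;
}
```

Under gdb, the same values can be read at a `break pcre_exec` without rebuilding anything: on x86-64 Linux (System V calling convention) the subject pointer, length and start offset are the third, fourth and fifth integer arguments, i.e. `$rdx`, `$ecx` and `$r8d`, and `x/64xb $rdx` shows the first bytes of the subject.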

Regards,

-- Shlomi Fish

