[pcre-dev] [Bug 1437] New: Using PCRE-8.34 on x86-64 Linux w…

Author: Shlomi Fish
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1437] New: Using PCRE-8.34 on x86-64 Linux with --enable-jit and --enable-utf , grep -iP '^S' gets stuck on a binary file consuming a lot of CPU for many seconds
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1437
           Summary: Using PCRE-8.34 on x86-64 Linux with --enable-jit and
                    --enable-utf , grep -iP '^S' gets stuck on a binary file
                    consuming a lot of CPU for many seconds
           Product: PCRE
           Version: 8.34
          Platform: x86-64
               URL: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16499
        OS/Version: Linux
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: shlomif@???
                CC: pcre-dev@???



Created an attachment (id=683)
--> (http://bugs.exim.org/attachment.cgi?id=683)
Offending file to be given as input.

Hi all,

I originally filed the bug in GNU grep. See:

http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16499

for more investigation.

Quoting from it:

Hi all,

After saving the attached file as 1.dat, I see that grep -iP on '^Subject:'
or on '^S' gets stuck in the en_US.UTF-8 locale. pcregrep and ack handle the
same file fine.

[SHELL]
shlomif <at> telaviv1:~$ time LC_ALL=en_US.UTF-8
~/apps/TEST-grep-from-git-TO-DEL/bin/grep -iP '^Subject:' < 1.dat ^C

real    0m4.199s
user    0m4.195s
sys     0m0.003s
shlomif <at> telaviv1:~$ time LC_ALL=en_US.UTF-8
~/apps/TEST-grep-from-git-TO-DEL/bin/grep -iP '^S' < 1.dat ^C


real    0m3.486s
user    0m3.485s
sys     0m0.001s
shlomif <at> telaviv1:~$ time LC_ALL=en_US.UTF-8
~/apps/TEST-grep-from-git-TO-DEL/bin/grep -iE '^S' < 1.dat


real    0m0.002s
user    0m0.002s
sys     0m0.000s
shlomif <at> telaviv1:~$ time LC_ALL=en_US.UTF-8
~/apps/TEST-grep-from-git-TO-DEL/bin/grep -P '^S' < 1.dat ^C


real    0m1.887s
user    0m1.885s
sys     0m0.000s
shlomif <at> telaviv1:~$ time LC_ALL=en_US.UTF-8
~/apps/TEST-grep-from-git-TO-DEL/bin/grep -P '^Subject:' < 1.dat


real    0m0.003s
user    0m0.000s
sys     0m0.002s
shlomif <at> telaviv1:~$ time LC_ALL=C
~/apps/TEST-grep-from-git-TO-DEL/bin/grep -iP '^Subject:' < 1.dat


real    0m0.003s
user    0m0.001s
sys     0m0.001s
shlomif <at> telaviv1:~$ time LC_ALL=C pcregrep -i '^Subject:' < 1.dat


real    0m0.002s
user    0m0.001s
sys     0m0.000s
shlomif <at> telaviv1:~$ time LC_ALL=C ack -i '^Subject:' 1.dat


real    0m0.066s
user    0m0.059s
sys     0m0.007s
shlomif <at> telaviv1:~$ time LC_ALL=en_US.UTF-8 ack -i '^Subject:' 1.dat


real    0m0.070s
user    0m0.063s
sys     0m0.006s
[/SHELL]


The same thing happens with grep-2.16 built from the sources. I'm on Mageia
Linux x86-64 Cauldron (what will be Mageia 4).

shlomif <at> telaviv1:~$ ldd ~/apps/TEST-grep-from-git-TO-DEL/bin/grep 
        linux-vdso.so.1 (0x00007fff2a7fe000)
        libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f19ed302000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f19ecf4d000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f19ecd30000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f19ed568000)
shlomif <at> telaviv1:~$ rpm -qf /lib64/libpcre.so.1
lib64pcre1-8.33-2.mga4


Regards,

        Shlomi Fish


After some investigation I discovered that the problem manifests on x86-64
systems only with a PCRE-8.x that was built with JIT support (and, naturally,
with --enable-utf as well). The hang occurs inside a JIT-generated function,
which has no debugging symbols.
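
Because the hang depends on how the linked PCRE was built, it helps to confirm
which libpcre a given grep binary actually resolves to. The following is a
minimal sketch, not part of the original report: `pcre_path_from_ldd` is a
hypothetical helper name, and it assumes ldd output in the usual
"name => path (address)" form shown earlier in this message.

```shell
#!/bin/bash
# Hypothetical helper: read `ldd` output on stdin and print the resolved
# path of every libpcre entry (the third whitespace-separated field).
pcre_path_from_ldd() {
    awk '/libpcre/ && $2 == "=>" { print $3 }'
}

# Example use (the grep path is an assumption; adjust it to your build):
#   ldd ~/apps/grep/bin/grep | pcre_path_from_ldd
```

The printed path can then be handed to the package manager (for example
rpm -qf, as in the quoted message) to find the PCRE version really in use.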

If I build PCRE and GNU grep-2.16 like this on a Debian Testing ("jessie")
x86-64 VM, then running LC_ALL=en_US.UTF-8 ~/apps/grep/bin/grep -iP '^S' < 1.dat
hangs:

BUILD_pcre.bash:

«
#!/bin/bash
CFLAGS="-g" ./configure --prefix="$HOME/apps/pcre" --enable-utf --enable-jit
»

BUILD_grep.bash:

«
#!/bin/bash
# Source this file.
export CPATH="/home/shlomif/apps/pcre/include/"
export LD_LIBRARY_PATH="/home/shlomif/apps/pcre/lib"
export LIBRARY_PATH="/home/shlomif/apps/pcre/lib"
CFLAGS="-g" ./configure --prefix="$HOME/apps/grep"
»

(Searching for «-iP '^Su'» was fine.)
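
To re-run the comparisons above without pressing ^C by hand, something like
the following sketch may help. It is not part of the original report: `probe`
is a hypothetical helper, the 2-second limit is arbitrary, and the GREP and
INPUT defaults are assumptions to be pointed at your own grep build and at the
saved attachment. It relies on timeout(1) from GNU coreutils, which exits with
status 124 when the time limit is reached.

```shell
#!/bin/bash
# Hypothetical helper: run a command under a time limit and report whether
# it finished or had to be killed.
# Usage: probe SECONDS COMMAND [ARGS...]
probe() {
    local limit="$1"
    shift
    local status=0
    # Output is discarded; only the exit status matters here.
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 124 ]; then
        echo "hang"
    else
        echo "done (exit $status)"
    fi
}

# Assumed locations; override via the environment.
GREP="${GREP:-grep}"
INPUT="${INPUT:-1.dat}"

# Only run the matrix when the input file is actually present.
if [ -r "$INPUT" ]; then
    for loc in en_US.UTF-8 C; do
        for opts in -iP -P -iE; do
            for pat in '^S' '^Su' '^Subject:'; do
                printf '%-12s %-4s %-12s ' "$loc" "$opts" "$pat"
                probe 2 env LC_ALL="$loc" "$GREP" "$opts" "$pat" < "$INPUT"
            done
        done
    done
fi
```

Each row prints "hang" when the limit was hit and "done (exit N)" otherwise,
where N is grep's own exit status (0 for a match, 1 for no match).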

---------------

I'll attach the 1.dat file here.

Regards,

-- Shlomi Fish


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email