[pcre-dev] [Bug 2540] New: Valgrind errors in PCRE2 JIT code

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2540] New: Valgrind errors in PCRE2 JIT code
https://bugs.exim.org/show_bug.cgi?id=2540

            Bug ID: 2540
           Summary: Valgrind errors in PCRE2 JIT code
           Product: PCRE
           Version: 10.34 (PCRE2)
          Hardware: x86-64
                OS: Linux
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
          Assignee: ph10@???
          Reporter: vesselin@???
                CC: pcre-dev@???


Created attachment 1283
--> https://bugs.exim.org/attachment.cgi?id=1283&action=edit
ZIP file with bugreport files

While investigating a problem in the TeXstudio search function, one of the
users ran it under Valgrind+GDB and came across a few memory errors reported
by Valgrind.

I investigated the problems in more detail, and they seem to be caused by the
PCRE2 JIT code, which uses XMM registers/instructions to access memory after
the end of the allocated string buffer.

TeXstudio uses QString from the Qt framework, which in turn uses PCRE2 for the
regular expression search.

The exact versions are:
Qt 5.13.2 (as shipped with Fedora 31)
pcre2-10.34 (as shipped with Fedora 31)

The string in which the search is done is "wo?wo??" (without the double
quotes).
The string being searched for (the pattern) is "wo", i.e. the pattern does not
contain any special regular expression characters.

The actual Valgrind error is:
---------------------------------------------
==2547== Invalid read of size 16
==2547==    at 0x253C92B5: ???
==2547==    by 0x187A604B: ???
==2547==  Address 0x187a604e is 30 bytes inside a block of size 40 alloc'd
---------------------------------------------
(the full error message is attached as valgrind_error.txt)


A full GDB disassembly of the offending PCRE2 JIT code is attached as
jit_stage_2_disassembly.txt. The actual offending instruction is:
=> 0x00000000253c92b5:    f3 0f 6f 4e fe    movdqu xmm1,XMMWORD PTR [rsi-0x2]
In the full disassembly it is prefixed by "=>", GDB's marker for the current
instruction pointer (RIP on x86-64).


I also captured the first stage of the PCRE2 JIT code, in which the separate
JIT assembly instructions are kept in the compiler buffer but have not yet been
merged into a single code block, and the jump addresses have not been adjusted.
The dump of the first-stage JIT code is attached as jit_stage_1_dump.txt.

The offending instruction's binary code is at offset 0x0155 (341) from the
start of the buffer (address 0x2f86fa75):
0x2f86fa75:     0xf3    0x0f    0x6f    0x4e    0xfe
(see jit_stage_1_dump.txt for the full hex dump)


I ran a GDB backtrace of the code that builds the offending instruction; it is
available as jit_stage_1_bt.txt.
From the stage 1 backtrace, it seems that the offending instruction is
generated in pcre2_jit_simd_inc.h on line 568.
I am also attaching the corresponding source file pcre2_jit_simd_inc.h.
Lines 567-568 are:
--------------------------------------------------------------------
load_from_mem_sse2(compiler, data1_ind, str_ptr_reg_ind, 0);
load_from_mem_sse2(compiler, data2_ind, str_ptr_reg_ind, -(sljit_s8)diff);
--------------------------------------------------------------------
These two calls generate the following adjacent instructions in the stage 2
disassembly:

--------------------------------------------------------------------
   0x00000000253c92b1:    66 0f 6f 06    movdqa xmm0,XMMWORD PTR [rsi]
=> 0x00000000253c92b5:    f3 0f 6f 4e fe    movdqu xmm1,XMMWORD PTR [rsi-0x2]
--------------------------------------------------------------------


The second instruction is the actual offending one: it reads past the end of
the allocated buffer.

Overall, it seems that the JIT code does not check for the end of the string
being searched and therefore simply reads past the end of the allocated string
buffer.

I would appreciate it if any of the PCRE2 developers could tell me:

1. Is this Valgrind warning a serious issue? In this case it seems fairly
harmless, but I would imagine that it could lead to an exception and a crash if
the JIT code tries to read from memory that it has no read access to.
2. Is there an easy workaround for this kind of read past the end of the buffer
when using PCRE2?
