Author: Thomas Tempelmann
Date:  
To: pcre-dev
Subject: [pcre-dev] Getting crash when searching binary data with case-insensitive option
Hi everyone, I hope I can get some help here with using PCRE2.

I'm writing a search program, similar to grep, but with a GUI, and using
multiple threads (it's Find Any File for macOS). I had been using Apple's
own regex lib, which works reliably (no crashes), but it's too limited in
its abilities, so I wanted to switch to PCRE2.

I now run into a reproducible crash, though, and need help resolving this.

My goal is to be able to search for UTF-8 text inside binary data. Maybe I'm
doing this wrong.

My code for this currently looks like this:

uint32_t regexOptions = PCRE2_UTF | PCRE2_NO_UTF_CHECK | PCRE2_CASELESS;

uint32_t matchOptions = PCRE2_NOTBOL | PCRE2_NOTEOL | PCRE2_NOTEMPTY |
                        PCRE2_NO_UTF_CHECK;

int errNum = 0;
PCRE2_SIZE errOfs = 0;

pcre2_code *regEx2 = pcre2_compile_8 ((PCRE2_SPTR)find,
    PCRE2_ZERO_TERMINATED, regexOptions, &errNum, &errOfs, NULL);

pcre2_match_data *regEx2Match =
    pcre2_match_data_create_from_pattern (regEx2, NULL);

pcre2_match_8 (regEx2, (PCRE2_SPTR8)dataPtr, dataLen, 0, matchOptions,
    regEx2Match, NULL);

Without the PCRE2_NO_UTF_CHECK option, it seems it won't find anything in
binary files. I also add PCRE2_CASELESS to be able to match text
case-insensitively, but that's what leads to the crash.
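
A possible explanation for the "finds nothing" behaviour: as far as the PCRE2
documentation describes it, with PCRE2_UTF set and the check not suppressed,
pcre2_match() validates the subject first and returns a negative UTF-8 error
code instead of searching when the subject is not valid UTF-8. With the check
suppressed, passing an invalid string is documented as undefined behaviour.
The sketch below only illustrates the kind of check involved. The helper name
is_valid_utf8 is made up for this example, and this is not PCRE2's own
validator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of PCRE2): returns true if buf[0..len) is
   well-formed UTF-8, i.e. no overlong forms, no surrogates, and nothing
   above U+10FFFF. */
static bool is_valid_utf8(const uint8_t *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        uint8_t b = buf[i];
        size_t extra;      /* number of continuation bytes expected */
        uint32_t cp, min;  /* decoded code point, smallest legal value */

        if (b < 0x80) { i++; continue; }                 /* plain ASCII */
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return false;   /* stray continuation byte, or 0xF8..0xFF */

        if (len - i <= extra) return false;              /* truncated sequence */
        for (size_t k = 1; k <= extra; k++) {
            uint8_t c = buf[i + k];
            if ((c & 0xC0) != 0x80) return false;        /* not 10xxxxxx */
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF) return false;     /* overlong or too big */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  /* surrogate */
        i += extra + 1;
    }
    return true;
}
```

Arbitrary binary data will almost always fail a check like this somewhere,
which would fit with a checked UTF search reporting an error rather than a
match.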

For instance, if I search my local "locate.database" for
"NSURLVolumeNameKey", I get a crash in the "match" function:

const char *find = "NSURLVolumeNameKey";

size_t dataLen = 32 * 1024 * 1024; // 32 MB

void *dataPtr = malloc (dataLen);

int fd = open ("/var/db/locate.database", O_RDONLY);

dataLen = read (fd, dataPtr, dataLen);
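
A side note on the loading code above: read() returns a signed ssize_t, which
is -1 on error and may be less than the requested size, so assigning it
straight to a size_t would turn a failure into a huge length. A more defensive
variant could look like the following sketch. The helper name
read_file_prefix is hypothetical, and this is not claimed to be related to the
crash.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads up to cap bytes of the file at path into a
   malloc'd buffer. Returns the buffer (caller frees it) and stores the
   number of bytes actually read in *out_len, or returns NULL on failure. */
static void *read_file_prefix(const char *path, size_t cap, size_t *out_len)
{
    *out_len = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    unsigned char *buf = malloc(cap ? cap : 1);
    if (buf == NULL) { close(fd); return NULL; }

    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted, try again */
            free(buf);
            close(fd);
            return NULL;                    /* real I/O error */
        }
        if (n == 0) break;                  /* end of file */
        total += (size_t)n;                 /* short read, keep going */
    }

    close(fd);
    *out_len = total;
    return buf;
}
```

The length stored in *out_len is then safe to pass on as the subject length.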


If I remove either the PCRE2_NO_UTF_CHECK or the PCRE2_CASELESS option, I
get no crash. Also, when shortening the search string, I get no crash.


Here are some details on the crash as shown by Xcode:


    0x10001bbae <+37742>: leaq   0x1221b(%rip), %rdi ; _pcre2_ucd_stage1_8
    0x10001bbb5 <+37749>: movzwl (%rdi,%rax,2), %eax
    0x10001bbb9 <+37753>: shlq   $0x7, %rax
    0x10001bbbd <+37757>: movl   %ecx, %edi
    0x10001bbbf <+37759>: subl   %edx, %edi
    0x10001bbc1 <+37761>: movslq %edi, %rdx
    0x10001bbc4 <+37764>: addq   %rax, %rdx
    0x10001bbc7 <+37767>: leaq   0x16602(%rip), %rax ; _pcre2_ucd_stage2_8
->  0x10001bbce <+37774>: movzwl (%rax,%rdx,2), %edx
    0x10001bbd2 <+37778>: leaq   0x75d7(%rip), %rax


I get the msg: Thread 1: EXC_BAD_ACCESS (code=1, address=0x1004fe2b6)

Registers:

Exception State Registers:
trapno unsigned int 0x00000003
err unsigned int 0x00000000
faultvaddr unsigned long 0x00007fff95466230

General Purpose Registers:
rax unsigned long 0x00000001000321d0
rbx unsigned long 0x00007ffeefbfa76e
rcx unsigned long 0x0000000000e97673
rdx unsigned long 0x0000000000266073
rdi unsigned long 0x0000000000000073
rsi unsigned long 0x0000000000000009
rbp unsigned long 0x00007ffeefbfa450
rsp unsigned long 0x00007ffeefbfa330
r8 unsigned long 0x0000000000000000
r9 unsigned long 0x0000000000000080
r10 unsigned long 0x00007ffeefbfa530
r11 unsigned long 0x00000001006815cd
r12 unsigned long 0x0000000000000000
r13 unsigned long 0x00007ffeefbfa780
r14 unsigned long 0x00000001006815cb
r15 unsigned long 0x0000000000000010
rip unsigned long 0x000000010001bbce
rflags unsigned long 0x0000000000000202
cs unsigned long 0x000000000000002b
fs unsigned long 0x0000000000000000
gs unsigned long 0x0000000000000000
eax unsigned int 0x000321d0
ebx unsigned int 0xefbfa76e
ecx unsigned int 0x00e97673
edx unsigned int 0x00266073
edi unsigned int 0x00000073
esi unsigned int 0x00000009
ebp unsigned int 0xefbfa450
esp unsigned int 0xefbfa330
r8d unsigned int 0x00000000
r9d unsigned int 0x00000080
r10d unsigned int 0xefbfa530
r11d unsigned int 0x006815cd
r12d unsigned int 0x00000000
r13d unsigned int 0xefbfa780
r14d unsigned int 0x006815cb
r15d unsigned int 0x00000010
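
The numbers in the dump can be cross-checked with a little arithmetic. The
faulting instruction loads a 16-bit value from rax + rdx * 2, where rax
appears to be the base of _pcre2_ucd_stage2_8. The sketch below redoes that
calculation. The block size of 128 is an assumption read off the
`shlq $0x7`, and the reading of rcx as the character being looked up is a
guess based on rdi (0x73) matching its low seven bits. All function names are
made up for illustration.

```c
#include <stdint.h>

/* Assumption: a block size of 1 << 7 = 128, taken from `shlq $0x7`. */
#define UCD_BLOCK_SHIFT 7

/* Address touched by `movzwl (%rax,%rdx,2), %edx`: base + index * 2. */
static uint64_t stage2_load_address(uint64_t table_base, uint64_t index)
{
    return table_base + index * 2u;
}

/* Position of a character inside its 128-entry block (ch % 128). */
static uint32_t offset_in_block(uint32_t ch)
{
    return ch & ((1u << UCD_BLOCK_SHIFT) - 1u);
}

/* Unicode code points end at U+10FFFF. */
static int is_unicode_code_point(uint32_t ch)
{
    return ch <= 0x10FFFFu;
}
```

With the values from the dump, stage2_load_address(0x1000321d0, 0x266073)
gives 0x1004fe2b6, which is the address in the EXC_BAD_ACCESS message. The
value in rcx, 0xe97673, is above U+10FFFF, which suggests that a character
decoded from bytes that are not valid UTF-8 was used to index the Unicode
property tables out of bounds.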

I am using libpcre2-8.a on macOS 10.13.6. The config.log from my build
shows:

> It was created by PCRE2 configure 10.35, which was
> generated by GNU Autoconf 2.69. Invocation command line was
> $ ./configure --disable-shared --enable-silent-rules CFLAGS=-O2 -mmacosx-version-min=10.11



You can download my locate.database here, along with the Xcode project I
use for testing this (also includes the built libpcre2):
https://files.tempel.org/tmp/PCRE2_Binary_Search.zip (3.6 MB)

--
Thomas Tempelmann, http://apps.tempel.org/
Follow me on Twitter: https://twitter.com/tempelorg
Read my programming blog: http://blog.tempel.org/