[pcre-dev] Crashes in pcre2_match_16 with binary data

Author: Thomas Tempelmann
Date:  
To: pcre-dev
Subject: [pcre-dev] Crashes in pcre2_match_16 with binary data
In 2020 I had asked for help with searching for UTF-8 text in binary data
(i.e. any files on a disk). I got some advice and all worked well.

Now I expanded my code to also search for UTF-16 text in the same data.

So I built the lib with both the 8-bit and 16-bit functions, and I create
separate pcre2_code, match_data, context etc. objects using the _8 and _16
suffixes instead of the default macros.

And it all works fine: it finds both UTF-8 and UTF-16 strings in files.

However, I sometimes get crashes in the pcre2_match_16() function, whereas
I never get them in the _8 function. And it happens both with JIT and
without.

I also use the same options for both versions, of course.

This is with PCRE2 v10.42.

I also ruled out a mix-up between the _8 and _16 structs by only using the
_16 code, and I also don't use concurrent threads.

The crashes are a bit random, i.e. certain files crash often but not always.

But within 5 seconds of scanning random files on my disk, I always get a
crash.

Since I use a built lib, I cannot easily look at the source code where it
crashes.

I wonder if there are cmdline tools I can use for testing in order to rule
out a mistake on my end. But it seems that pcre2grep does not support
UTF-16 search, right? Or do I have to build the tool with special options
first?

--
Thomas Tempelmann, http://apps.tempel.org/