https://bugs.exim.org/show_bug.cgi?id=2642
Bug ID: 2642
Summary: Searching with PCRE2_MATCH_INVALID_UTF and
PCRE2_CASELESS not working in binary files
Product: PCRE
Version: 10.35 (PCRE2)
Hardware: x86-64
OS: All
Status: NEW
Severity: bug
Priority: medium
Component: Code
Assignee: Philip.Hazel@???
Reporter: tempelmann@???
CC: pcre-dev@???
Created attachment 1334
--> https://bugs.exim.org/attachment.cgi?id=1334&action=edit
the binary file with the subject data
(See also my post on the developers mailing list titled "Getting crash when
searching binary data with case-insensitive option")
PCRE2 currently seems unable to find plain ASCII text in binary files when the
case-insensitive option is set.
I have attached a sample binary file for this. Searching it for the string
"AWAVAUATSH", or any case variation of it, fails when I use the PCRE2_CASELESS
option. Without PCRE2_CASELESS, it works.
I see no logical reason why this shouldn't work. Adding the caseless option
simply makes the search tree bigger, with more decision cases. And since it
works when searching plain text files, it should work just as well in files
that contain invalid UTF-8 sequences (i.e. are considered binary). The search
pattern is still in that file and should be found.
Here's the test code.
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

#define PCRE2_CODE_UNIT_WIDTH 8
#include "pcre2.h"

int main(int argc, const char * argv[])
{
    const char *find = "AWAVAUATSH";
    uint32_t regexOptions = PCRE2_MATCH_INVALID_UTF | PCRE2_UTF | PCRE2_CASELESS;
    uint32_t matchOptions = PCRE2_NOTBOL | PCRE2_NOTEOL | PCRE2_NOTEMPTY;

    // Compile the literal search string as a caseless UTF pattern
    int errNum = 0; PCRE2_SIZE errOfs = 0;
    pcre2_code *regEx2 = pcre2_compile_8 ((PCRE2_SPTR)find,
        PCRE2_ZERO_TERMINATED, regexOptions, &errNum, &errOfs, NULL);
    if (regEx2 == NULL) {
        printf("Pattern failed to compile (error %d)\n", errNum);
        return 1;
    }
    pcre2_match_data *regEx2Match =
        pcre2_match_data_create_from_pattern (regEx2, NULL);

    size_t bufLen = 32 * 1024 * 1024; // 32 MB, in case we test larger files
    void *bufPtr = malloc (bufLen);
    if (bufPtr == NULL) {
        printf("Out of memory\n");
        return 1;
    }

    // Load the attached binary sample as the subject
    int fd = open ("pcre2_subject_sample", O_RDONLY);
    if (fd < 0) {
        printf("File not found! Please fix the path in the code.\n");
        return 1;
    }
    ssize_t actualLen = read (fd, bufPtr, bufLen);
    close (fd);
    if (actualLen < 0) {
        printf("Reading the file failed\n");
        return 1;
    }

    // Search the whole buffer; a positive result means a match was found
    int ok = pcre2_match_8 (regEx2, (PCRE2_SPTR8)bufPtr, (PCRE2_SIZE)actualLen,
        0, matchOptions, regEx2Match, NULL);
    if (ok > 0) {
        printf("Pattern found\n");
    } else {
        printf("Pattern NOT found\n");
    }
    return 0;
}
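
To double-check, independently of PCRE2, that the text really is present in
the subject data in some case variation, a naive byte-wise search can be used.
This is only a minimal sketch for cross-checking: the helper name
find_ascii_caseless is made up for illustration and is not part of PCRE2, and
it assumes the default "C" locale, so only ASCII letters are case-folded and
all other bytes are compared as-is.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a PCRE2 function): returns the byte offset of the
   first ASCII case-insensitive occurrence of needle in buf, or -1 if there
   is none. Bytes that are not ASCII letters must match exactly. */
static long find_ascii_caseless(const unsigned char *buf, size_t bufLen,
                                const char *needle)
{
    size_t needleLen = strlen(needle);
    if (needleLen == 0 || needleLen > bufLen)
        return -1;

    /* Try every start position where the needle still fits in the buffer */
    for (size_t i = 0; i + needleLen <= bufLen; i++) {
        size_t j = 0;
        while (j < needleLen &&
               tolower(buf[i + j]) == tolower((unsigned char)needle[j]))
            j++;
        if (j == needleLen)
            return (long)i;
    }
    return -1;
}
```

Calling find_ascii_caseless((const unsigned char *)bufPtr, actualLen, find) on
the buffer read in the test code above would show whether the string is there
when bytes are compared without any UTF-8 interpretation.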