[pcre-dev] [Bug 2793] New: Case insensitive search gets exp…

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2793] New: Case insensitive search gets exponentially slower with larger buffers and a specific text file
https://bugs.exim.org/show_bug.cgi?id=2793

            Bug ID: 2793
           Summary: Case insensitive search gets exponentially slower with
                    larger buffers and a specific text file
           Product: PCRE
           Version: 10.37 (PCRE2)
          Hardware: x86-64
                OS: All
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
          Assignee: Philip.Hazel@???
          Reporter: tempelmann@???
                CC: pcre-dev@???


Created attachment 1395
--> https://bugs.exim.org/attachment.cgi?id=1395&action=edit
main.c, 1.txt, 2.txt

I have two log files. In both, every line is 91 characters long and contains
only ASCII characters. The first repeats the same line throughout. The second
has "real" log lines, with ever-changing time codes. Both are about 10 MB in
size.
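
For readers without the attachment, the shape of the two inputs can be approximated with synthetic data. This is only a sketch under stated assumptions: `fill_log`, `LINE_LEN`, the line layout and the choice to count the newline among the 91 characters are all made up for illustration, and synthetic data like this is not known to reproduce the slowdown. The attached 1.txt and 2.txt remain the actual test case.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_LEN 91  /* assumption: 91 bytes per line, newline included */

/* Fill buf (at least n_lines * LINE_LEN bytes) with fixed-width ASCII lines.
   If vary is nonzero, each line carries a different made-up time code;
   otherwise every line is identical. */
static void fill_log(char *buf, size_t n_lines, int vary)
{
    char line[LINE_LEN + 1];
    for (size_t i = 0; i < n_lines; i++) {
        unsigned long t = vary ? (unsigned long)i : 0UL;  /* milliseconds */
        /* Hypothetical layout: date, time code, level, then filler text. */
        int n = snprintf(line, sizeof line,
                         "2021-01-01 %02lu:%02lu:%02lu.%03lu INFO ",
                         (t / 3600000) % 24, (t / 60000) % 60,
                         (t / 1000) % 60, t % 1000);
        assert(n > 0 && n < LINE_LEN - 1);
        memset(line + n, 'x', (size_t)(LINE_LEN - 1 - n));  /* pad to width */
        line[LINE_LEN - 1] = '\n';
        memcpy(buf + i * LINE_LEN, line, LINE_LEN);
    }
}
```

Calling `fill_log(buf, n, 0)` gives a buffer shaped like the first file and `fill_log(buf, n, 1)` one shaped like the second.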

When I search the first, it takes milliseconds, but searching the second takes
many seconds, which is clearly wrong.

If I double the file and buffer sizes, the time explodes (i.e. it far more
than doubles), but only with the second file.
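
One way to put numbers on this is to time the same search over prefixes of the buffer that double in length. The following is a hypothetical harness, not part of the attached main.c: `search_fn`, `time_search`, `report_scaling` and the stand-in `count_newlines` are illustrative names. In a real measurement the callback would wrap the pcre2_match_8 call instead of the stand-in.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Any search routine that scans the first len bytes of data. */
typedef int (*search_fn)(const char *data, size_t len);

/* Stand-in search so the sketch builds without PCRE2: counts newlines. */
static int count_newlines(const char *data, size_t len)
{
    int n = 0;
    for (size_t i = 0; i < len; i++)
        if (data[i] == '\n')
            n++;
    return n;
}

/* CPU seconds spent in one call of fn over the first len bytes. */
static double time_search(search_fn fn, const char *data, size_t len)
{
    clock_t start = clock();
    (void)fn(data, len);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Time fn at max_len/8, max_len/4, max_len/2 and max_len, printing the
   growth factor between consecutive sizes. */
static void report_scaling(search_fn fn, const char *data, size_t max_len)
{
    double prev = 0.0;
    for (size_t len = max_len / 8; len > 0 && len <= max_len; len *= 2) {
        double t = time_search(fn, data, len);
        printf("%10zu bytes: %.4f s", len, t);
        if (prev > 0.0)
            printf("  (x%.1f)", t / prev);
        printf("\n");
        prev = t;
    }
}
```

A factor near 2 per doubling points to linear time and a factor near 4 to quadratic time, which helps tell a quadratic blow-up apart from a truly exponential one.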

Also, this only happens in non-JIT mode, and only when I choose the
case-insensitive option. If I try the same with the pcre2grep command from my
build, using the options "--buffer-size=32M --no-jit -i", the problem is not
reproducible. It only goes wrong with my own code.

Here's the code I use to read and search each file. It's as simple as it can
get, I think.

const char *find = "EDL";
uint32_t regexOptions = PCRE2_CASELESS; // without this, it's fast as expected
int errNum = 0; PCRE2_SIZE errOfs = 0;
pcre2_code *regEx2 = pcre2_compile_8 ((PCRE2_SPTR)find,
PCRE2_ZERO_TERMINATED, regexOptions, &errNum, &errOfs, NULL);
pcre2_match_data *regEx2Match = pcre2_match_data_create_from_pattern (regEx2,
NULL);

// read from file
size_t dataLen = 10 * 1024 * 1024; // 10 MB
void *dataPtr = malloc (dataLen);
int fd = open ("2.txt", O_RDONLY);
dataLen = read (fd, dataPtr, dataLen);

pcre2_match_8 (regEx2, (PCRE2_SPTR8)dataPtr, dataLen, 0, 0, regEx2Match,
NULL);


Attached is the complete "main.c" plus the two text files, zipped (it
compressed quite well, to about 400 KB)
