[pcre-dev] [Bug 1841] Pcre using a lot of cpu and time to ma…

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1841] Pcre using a lot of cpu and time to match
https://bugs.exim.org/show_bug.cgi?id=1841

Philip Hazel <ph10@???> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|NEW                         |RESOLVED


--- Comment #1 from Philip Hazel <ph10@???> ---
You appear to be using a regex to match a list of fixed strings. This is not
the best way of doing that because there are fast algorithms for doing literal
string searches, for example:

https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm
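
As a rough illustration of the literal-search idea, here is a minimal Python sketch of the Horspool simplification of Boyer-Moore (not the full algorithm described at the link above). The function names and the keyword list are hypothetical, and in practice a language's built-in substring search would normally be used instead.

```python
def build_shift_table(pattern: str) -> dict:
    """Bad-character shifts for the Horspool variant of Boyer-Moore."""
    m = len(pattern)
    # For every character except the last, record its distance from the end.
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    shift = build_shift_table(pattern)
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Skip ahead using the text character aligned with the pattern's end;
        # characters that do not occur in the pattern allow a full-length skip.
        pos += shift.get(text[pos + m - 1], m)
    return -1


def find_any(text: str, words: list) -> list:
    """Return the (hypothetical) keywords that occur somewhere in text."""
    return [w for w in words if horspool_find(text, w) != -1]
```

For example, `find_any(line, ["panic", "oops"])` could be called once per log line. The point of the skip table is that most text characters are never compared against the start of every keyword.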

Regex searches come into their own when the search patterns are not all
literals. However, if you do use one for this kind of search, there are ways to
speed things up. At each character position in your log file, PCRE is going to
laboriously check the initial character of each of your substrings in turn.
Assuming that your log files contain characters other than letters, one way of
speeding this up would be to check for a letter before testing all the
individual ones. A lookahead such as (?=[a-z]) at the start might speed things
up. And/or you could group your strings by initial letter and use lookaheads so
that you search only those that begin with a given letter.
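
The two pattern shapes suggested above can be sketched as follows. This uses Python's re module only to build and sanity-check the patterns; the lookahead syntax is the same in PCRE, but the word list is hypothetical, the words are assumed to be lowercase, and any speed-up has to be measured against PCRE itself.

```python
import re
from collections import defaultdict

WORDS = ["alpha", "apple", "bravo", "banana", "charlie"]  # hypothetical list


def plain_alternation(words):
    """The straightforward pattern: (?:word1|word2|...)."""
    return "(?:" + "|".join(re.escape(w) for w in words) + ")"


def with_letter_lookahead(words):
    """Reject non-letter positions with one check before the alternation."""
    return "(?=[a-z])" + plain_alternation(words)


def grouped_by_initial(words):
    """One branch per initial letter, each guarded by its own lookahead."""
    groups = defaultdict(list)
    for w in words:
        groups[w[0]].append(w)
    branches = [
        "(?=" + re.escape(initial) + ")" + plain_alternation(members)
        for initial, members in sorted(groups.items())
    ]
    return "(?:" + "|".join(branches) + ")"
```

All three patterns should find the same matches; only the amount of work done at non-matching positions differs.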

The pcretest program has facilities for timing matches, and can therefore be
used to compare the performance of different regexes.
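
For real measurements pcretest is the right tool (its -t and -tm options do the timing; check the man page of your version). Purely to illustrate the kind of comparison meant here, a small sketch with Python's timeit and re modules might look like this. It times Python's regex engine, not PCRE, and the function name is hypothetical.

```python
import re
import timeit


def time_pattern(pattern: str, text: str, repeats: int = 20) -> float:
    """Return the total seconds taken to scan text `repeats` times."""
    compiled = re.compile(pattern)  # compile once, outside the timed region
    return timeit.timeit(lambda: compiled.findall(text), number=repeats)
```

Calling `time_pattern` with the same text and two candidate patterns gives two figures that can be compared directly; running each comparison several times helps to smooth out noise.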

Jeffrey Friedl's book "Mastering Regular Expressions" has useful information
about optimizing patterns, though I can't remember if it says much about
literal strings.

Finally, are you using JIT? That can speed up PCRE matches by quite a lot. Oh,
and as you are using PCRE1, are you calling pcre_study()?

I am going to close this item, because I do not think it is a bug.
