[pcre-dev] [Bug 1027] Faster keyword matching

Author: Philip Hazel
Date:  
To: pcre-dev
Old subjects: [pcre-dev] [Bug 1027] New: Faster keyword matching
Subject: [pcre-dev] [Bug 1027] Faster keyword matching
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1027

Philip Hazel <ph10@???> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX





--- Comment #6 from Philip Hazel <ph10@???> 2011-08-08 15:27:56 ---
I have decided to close this item, because I don't think anything will ever be
done. I have had a closer look at Aho-Corasick (A-C) matching, and one problem
is that it is incompatible with Perl. If I match /cater|caterpillar/ against
the string "caterpillar", Perl and pcre_exec() will yield the match "cater",
because Perl's rule is "first found match": the alternatives are tried in
pattern order. The A-C algorithm finds all possible matches, in length order
rather than pattern order. I suppose one might adapt it to remember the order
of the strings, and thereby pick the one that Perl would find, but I think
you'd also have to find a way of stopping it from searching further once it
had found that match.
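
To make the difference concrete, here is a minimal sketch in plain C. It does
not use PCRE or a real A-C automaton; it only compares two selection rules
over a list of fixed strings at one subject position. The function names
first_in_pattern_order() and longest_at() are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustration only: these helpers are hypothetical, not part of PCRE.
   Both assume pos <= strlen(subject). */

/* Perl-style rule: try the alternatives in the order they were written
   and return the index of the first one that matches at pos, or -1. */
static int first_in_pattern_order(const char *subject, size_t pos,
                                  const char *const *alts, size_t n)
{
    assert(subject != NULL && alts != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(alts[i]);
        if (strncmp(subject + pos, alts[i], len) == 0)
            return (int)i;
    }
    return -1;
}

/* Length-based rule: of all alternatives that match at pos, return the
   index of the longest one, or -1 if none matches. */
static int longest_at(const char *subject, size_t pos,
                      const char *const *alts, size_t n)
{
    int best = -1;
    size_t best_len = 0;

    assert(subject != NULL && alts != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(alts[i]);
        if (strncmp(subject + pos, alts[i], len) == 0 &&
            (best < 0 || len > best_len)) {
            best = (int)i;
            best_len = len;
        }
    }
    return best;
}
```

With the alternatives { "cater", "caterpillar" } and the subject "caterpillar"
at position 0, the first function picks "cater" (index 0) and the second picks
"caterpillar" (index 1). An A-C based matcher would need the first behaviour
to stay compatible with Perl.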

Incidentally, when you were running the timing tests you posted here, did you
use pcre_study() before calling pcre_exec()? With fixed string patterns, this
can make a big difference. I've just run a small test and I get approximately
a 30% improvement in pcre_exec() and 50% in pcre_dfa_exec() for very little
study time. (This was with about 20 words, searching a long string in which
none of them appeared.)

Another reason for not pursuing this is the JIT optimisation addition that is
currently being developed (by just a few people so far). It will be released
as part of PCRE in the next couple of months.


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email