[pcre-dev] [Bug 1419] PCRE slower at matching UTF8 character…

Author: Zoltan Herczeg
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1419] PCRE slower at matching UTF8 character classes.
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1419




--- Comment #14 from Zoltan Herczeg <hzmester@???> 2013-12-16 06:53:54 ---
Sorry, I misunderstood you. I thought you had found this bounty and wanted to
fix it.

It is very good news that Facebook is interested in PCRE development! I
suspect you need to do a lot of text processing. May I ask why you chose
to use PCRE? In general, are features or performance more important to you? Do
you use JIT? Of course you don't need to answer if you are not allowed to share
such info.

In the last few years, we have been working a lot on PCRE performance. However,
we continually need use cases to improve it further, so thanks for reporting
this issue. I once thought about improving property-less character ranges,
but no one had complained about them before, so it seemed a low-priority task.
I want to make their bitmask "precise", which could allow certain
optimizations, and to introduce a new XCL_SOMETHING flag that indicates the
bitmask is precise. I expect we will get a speed similar to the [b-z] case in
this test case. I will do this in the next few weeks.
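
To illustrate the idea, here is a rough sketch in C. It is not PCRE code, and
all names in it (XCL_PRECISE_MAP, char_class, cp_range, build_precise_map,
class_match) are made up for this example. It also assumes one possible
meaning of "precise": when the flag is set, the bitmap gives the final answer
for every code point below 256, so the matcher can skip the range list for
those characters.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag name; the message only calls it XCL_SOMETHING. */
#define XCL_PRECISE_MAP 0x01u

/* An inclusive code point range, e.g. {'b', 'z'}. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} cp_range;

/* Simplified stand-in for an extended character class (not the PCRE layout). */
typedef struct {
    unsigned flags;
    uint8_t map[32];        /* one bit per code point 0..255 */
    const cp_range *ranges; /* property-less ranges of the class */
    size_t nranges;
} char_class;

/* Slow path: linear scan over the range list. */
static int in_ranges(const char_class *cc, uint32_t c)
{
    for (size_t i = 0; i < cc->nranges; i++) {
        if (c >= cc->ranges[i].lo && c <= cc->ranges[i].hi)
            return 1;
    }
    return 0;
}

/* Fill the bitmap from the ranges and mark it as precise (assumed meaning:
   the map alone decides membership for every code point below 256). */
static void build_precise_map(char_class *cc)
{
    memset(cc->map, 0, sizeof cc->map);
    for (size_t i = 0; i < cc->nranges; i++) {
        for (uint32_t c = cc->ranges[i].lo;
             c <= cc->ranges[i].hi && c < 256; c++) {
            cc->map[c >> 3] |= (uint8_t)(1u << (c & 7));
        }
    }
    cc->flags |= XCL_PRECISE_MAP;
}

/* Match one code point: a single bit test when the map is precise,
   otherwise fall back to scanning the ranges. */
static int class_match(const char_class *cc, uint32_t c)
{
    if (c < 256 && (cc->flags & XCL_PRECISE_MAP))
        return (cc->map[c >> 3] >> (c & 7)) & 1;
    return in_ranges(cc, c);
}
```

With a class such as [b-z\x{100}-\x{17f}], a character like 'c' would then be
decided by one bit test, which is roughly what a plain [b-z] class costs,
while code points of 256 and above still go through the range list.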

Regarding SSE4.2, according to this table my CPU does not support it:

http://software.intel.com/en-us/articles/performance-tools-for-software-developers-intel-compiler-options-for-sse-generation-and-processor-specific-optimizations

You will likely need to find somebody else to do that. I have only done
ARM NEON optimizations before, and do not know much about x86 SIMD.

