[pcre-dev] [Bug 1419] PCRE slower at matching UTF8 character…

Author: Ben Maurer
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1419] PCRE slower at matching UTF8 character classes.
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1419




--- Comment #15 from Ben Maurer <ben.maurer@???> 2013-12-16 10:04:42 ---
We use PCRE in HHVM (our implementation of PHP). Since PHP uses PCRE, the main
goal in using PCRE was compatibility. PCRE is a great choice of library for us
and serves us very well. We've recently been experimenting with using the JIT
to increase regex performance.

In this particular case, I noticed through some of our internal profiling tools
that a set of regexes which filtered Emoji from UTF8 encoded strings seemed to
be taking an unusually large amount of time. I was able to isolate it down to
the example I attached to this bug.

For us, any work that increases performance is very useful. While PCRE isn't a
huge overall percentage of our runtime, a large infrastructure provides a lot
of leverage -- a gain might be small as a percentage of our infrastructure, but
still meaningful in absolute terms.


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email