Author: Zoltán Herczeg Date: To: pcre-dev Subject: [pcre-dev] Adding SSE2 support to PCRE2-JIT
Hi,
this is just a notification that I recently added x86 SSE2 support to PCRE2-JIT. This improves the performance of the "first character" search. The code is mostly ready, but testing may reveal some issues. I also expect some false positive valgrind reports, since the code performs 16-byte aligned reads, which may cross buffer boundaries. For example, if the input buffer (called the subject) starts at byte offset 3 and ends at offset 9, the SSE2 part reads 16 bytes from offset 0. This is a valid read: memory pages are at least 1K (usually 4K) aligned, so an aligned 16-byte read never crosses a page boundary and cannot touch an unmapped page. Of course the algorithm ignores data outside the buffer boundaries, but valgrind may catch this and report it as an error.
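To make the aligned-read idea concrete, here is a minimal sketch in C with SSE2 intrinsics. It is not the code the JIT compiler emits; the function name find_first_char_sse2 and its signature are made up for illustration, and __builtin_ctz assumes GCC or Clang. It rounds the start pointer down to a 16-byte boundary, compares a whole block at once, and masks off matches that fall before the start or at/after the end of the subject.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 API): returns a pointer to the first
 * occurrence of ch in [start, end), or NULL if there is none.
 * The aligned loads may touch bytes outside [start, end), but only within
 * the 16-byte blocks that contain the first and last subject bytes. */
static const char *find_first_char_sse2(const char *start, const char *end, char ch)
{
    assert(start <= end);
    if (start == end)
        return NULL;

    const __m128i needle = _mm_set1_epi8(ch);

    /* Round the start pointer down to a 16-byte boundary. */
    const char *block = (const char *)((uintptr_t)start & ~(uintptr_t)15);

    /* Bit i of mask is set when block[i] == ch. */
    unsigned int mask = (unsigned int)_mm_movemask_epi8(
        _mm_cmpeq_epi8(_mm_load_si128((const __m128i *)block), needle));

    /* Ignore matches in the bytes that precede the subject. */
    mask &= 0xFFFFu << (unsigned int)(start - block);

    for (;;) {
        if (mask != 0) {
            /* Lowest set bit is the earliest match in this block. */
            const char *hit = block + __builtin_ctz(mask);
            /* A match at or past end lies outside the subject. */
            return hit < end ? hit : NULL;
        }
        block += 16;
        if (block >= end)
            return NULL;
        mask = (unsigned int)_mm_movemask_epi8(
            _mm_cmpeq_epi8(_mm_load_si128((const __m128i *)block), needle));
    }
}
```

With the subject at offsets 3 to 9 of a 16-byte aligned buffer, the first load covers offsets 0 to 15, which is exactly the situation described above. In strict ISO C, reading outside the object is undefined behaviour, which is one reason this kind of code is better generated by the JIT than written in portable C.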
I will probably do some fine tuning later, since the effectiveness depends on the input (and unfortunately not on the pattern). If we search for a character that is frequent in the input, the current Aho–Corasick implementation is usually faster. However, if we search for a character that is very rare, SSE2 is unbeatable.