[pcre-dev] 3x-4x slowdown in pcre_match

Author: enh
Date:  
To: pcre-dev
CC: Martijn Coenen, Stephen Hines
Subject: [pcre-dev] 3x-4x slowdown in pcre_match
clang has an option (-ftrivial-auto-var-init) to force-initialize stack
data that it can't otherwise prove will be initialized before it's read.
various platforms
(amongst them Apple's, Microsoft's, and Android) are turning this on
globally for kernel and/or userspace.

we haven't found many cases where we need to locally disable it to
preserve performance, but "large stack buffer in code that's called in
hotspots" is the common case of this uncommon case.

most recently we found this one in pcre2, which is heavily used by all
the SELinux label stuff on Android:

diff --git a/dist2/src/pcre2_match.c b/dist2/src/pcre2_match.c
index 419561f..e3eea02 100644
--- a/dist2/src/pcre2_match.c
+++ b/dist2/src/pcre2_match.c
@@ -6048,7 +6048,7 @@ proves to be too small, it is replaced by a larger one on the heap. To get a
vector of the size required that is aligned for pointers, allocate it as a
vector of pointers. */

-PCRE2_SPTR stack_frames_vector[START_FRAMES_SIZE/sizeof(PCRE2_SPTR)];
+PCRE2_SPTR stack_frames_vector[START_FRAMES_SIZE/sizeof(PCRE2_SPTR)] __attribute__((uninitialized));
mb->stack_frames = (heapframe *)stack_frames_vector;

/* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated

i'm happy to try to send you a patch that also includes the relevant
configure "does this compiler have this attribute?" stuff, but thought
i'd check first for your opinions about what naming you'd like to see,
and what you'd like this use-site to look like: just an #ifdef
HAVE_ATTRIBUTE_UNINITIALIZED, or a PCRE2_KEEP_UNINITIALIZED macro that
expands to nothing on systems that don't support the attribute?

let me know your thoughts so i can write the fuller patch. (or let me
know if you'd prefer to just do it yourselves rather than play telephone!)

thanks!