Re: [pcre-dev] PCRE2 and thread safety of jit compilation?

Author: Giuseppe D'Angelo
Date:
To: Zoltán Herczeg
CC: pcre-dev

On Tue, Jan 5, 2016 at 5:12 PM, Zoltán Herczeg <hzmester@???> wrote:
> Perhaps we could start by supporting some platforms, and gradually cover more with the community's help. I heard that asm volatile forces GCC (and perhaps clang) not to move instructions across such asm blocks.
>
> E.g:
>
> statement1;
> asm volatile (" ");
> statement2;
>
> Is it true that statement1 is fully completed before statement2 is executed, even if the assembly part is empty?

I don't think so; the GCC documentation even says as much:
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

"Note that the compiler can move even volatile asm instructions
relative to other code"
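
For what it's worth, the usual compiler-level barrier with GCC and clang
is an empty asm with a "memory" clobber rather than a bare asm volatile.
A minimal sketch follows; the macro and function names are made up for
illustration. Note that this only constrains the compiler: it emits no
instruction, so the CPU may still reorder the two stores.

```c
#include <assert.h>
#include <stddef.h>

/* Compiler-only barrier (GCC/clang extension). The "memory" clobber tells
 * the compiler that memory may be read or written here, so it must not
 * move loads or stores across this point. No instruction is emitted. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Hypothetical example: write a payload, then raise a flag. */
void store_then_flag(int *payload, int *flag, int value)
{
    assert(payload != NULL && flag != NULL);
    *payload = value;      /* statement1 */
    COMPILER_BARRIER();    /* compiler keeps the two stores in this order */
    *flag = 1;             /* statement2 */
}
```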

On older GCCs you can use __sync_synchronize() (a full barrier),
assuming pointer assignments are atomic.
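
A rough sketch of what that could look like. "struct pattern" and the
function names are hypothetical stand-ins, not PCRE2's real types, and the
code relies on the assumption above that a plain (here volatile) pointer
store or load is atomic on the target:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiled pattern. */
struct pattern { int byte_code; };

/* Assumption: storing/loading this pointer is atomic on the platform. */
static struct pattern * volatile shared_pattern;

/* Writer: fill in the object, full barrier, then publish the pointer. */
void publish_pattern(struct pattern *p, int code)
{
    assert(p != NULL);
    p->byte_code = code;
    __sync_synchronize();   /* the store above cannot move past this */
    shared_pattern = p;
}

/* Reader: load the pointer, full barrier, then read through it.
 * Returns -1 if nothing has been published yet. */
int read_byte_code(void)
{
    struct pattern *p = shared_pattern;
    if (p == NULL)
        return -1;
    __sync_synchronize();   /* the load below cannot move before this */
    return p->byte_code;
}
```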

Newer GCCs (4.7 and later) have the __atomic_load / __atomic_store
builtins, which take memory models such as __ATOMIC_ACQUIRE and
__ATOMIC_RELEASE; even newer GCCs (4.9 and later) support C11 atomics,
i.e. the _Atomic type qualifier and the <stdatomic.h> operations.
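
With the builtins, the barriers fold into the store and the load
themselves. A minimal sketch, using the _n variants of the builtins
(__atomic_store_n / __atomic_load_n); "struct pattern" and the function
names are again hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiled pattern. */
struct pattern { int byte_code; };

static struct pattern *shared_pattern;

/* Writer: the release store orders the byte_code write before the
 * pointer becomes visible. */
void publish_pattern(struct pattern *p, int code)
{
    assert(p != NULL);
    p->byte_code = code;
    __atomic_store_n(&shared_pattern, p, __ATOMIC_RELEASE);
}

/* Reader: the acquire load orders the pointer read before the
 * byte_code read. Returns -1 if nothing has been published yet. */
int read_byte_code(void)
{
    struct pattern *p = __atomic_load_n(&shared_pattern, __ATOMIC_ACQUIRE);
    return p != NULL ? p->byte_code : -1;
}
```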

... totally open question marks about other platforms / compilers ...

> The CPU can still reorder stores. An x86 CPU does not have (or need) a data write barrier instruction as far as I know. Recent 32-bit ARM CPUs have a data write barrier. Could somebody tell me how I can test at compile time whether this instruction is available? ARM 64 should not be a problem. I have not checked other CPUs yet.

I guess you want to check the exact ARM architecture revision of the
CPU? With some detection code like this?

http://code.woboq.org/qt5/qtbase/src/corelib/global/qprocessordetection.h.html#90
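
Something along these lines, as an untested sketch. It relies on the
predefined macros __ARM_ARCH (ACLE, newer GCC and clang) and
__ARM_ARCH_7A__ and friends (older GCCs); the SKETCH_ARM_ARCH macro and
the function names are made up for illustration. I use "dmb sy", which is
a full barrier and so stronger than a pure write barrier:

```c
#include <assert.h>
#include <stddef.h>

/* SKETCH_ARM_ARCH: ARM architecture revision, or 0 when not building
 * for ARM or when the revision cannot be determined. */
#if defined(__ARM_ARCH)
#  define SKETCH_ARM_ARCH __ARM_ARCH
#elif defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) || \
      defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__)
#  define SKETCH_ARM_ARCH 7
#elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) || \
      defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6T2__)
#  define SKETCH_ARM_ARCH 6
#else
#  define SKETCH_ARM_ARCH 0
#endif

/* Use the DMB instruction where the architecture revision guarantees it,
 * otherwise fall back to the compiler's generic full barrier. */
static inline void sketch_memory_barrier(void)
{
#if SKETCH_ARM_ARCH >= 7
    __asm__ __volatile__("dmb sy" ::: "memory");
#else
    __sync_synchronize();
#endif
}

/* Hypothetical helper: two stores kept in order by the barrier. */
void sketch_ordered_stores(int *first, int *second)
{
    assert(first != NULL && second != NULL);
    *first = 1;
    sketch_memory_barrier();
    *second = 1;
}
```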

> There is one more thing. This theoretically affects everything, not just JIT compilation. If we compile a pattern with pcre2_compile, it is possible that the result pointer has been shared with another thread, but the compiled pattern data is not.
>
> Main thread:
>
> compiled_pcre_pattern->byte_code = something;
> return compiled_pcre_pattern;
>
> shared_pattern = compiled_pcre_pattern;
>
> Another thread:
> match (shared_pattern, subject);
>
> The byte_code part can be garbage on the other thread (since it can be executed by another CPU). People did not complain about these effects before; is there a reason for that? I don't want to solve a non-existent problem.

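Well, this is a problem for PCRE users: it depends on how they share that
"shared_pattern" across threads. The "proper" way is to make it an
atomic pointer, turn the assignment into an atomic store with release
semantics, and turn the read in match into an atomic load with acquire
semantics. A thread that then sees the new pointer is guaranteed to also
see the byte_code written before the pointer was published.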
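
On the user's side, with C11 atomics, that could look roughly like the
sketch below. "struct pattern", share_pattern() and acquire_pattern() are
hypothetical names standing in for whatever the application uses; this is
not PCRE2 API:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiled pattern. */
struct pattern { int byte_code; };

/* The pointer shared between threads; static storage starts out NULL. */
static _Atomic(struct pattern *) shared_pattern;

/* Main thread: call after the pattern has been fully compiled. */
void share_pattern(struct pattern *compiled)
{
    assert(compiled != NULL);
    atomic_store_explicit(&shared_pattern, compiled, memory_order_release);
}

/* Another thread: returns NULL until a pattern has been shared. A
 * non-NULL result can be passed to the match function safely. */
struct pattern *acquire_pattern(void)
{
    return atomic_load_explicit(&shared_pattern, memory_order_acquire);
}
```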

Cheers,
--
Giuseppe D'Angelo