Re: [pcre-dev] Powerpc optimisation

Author: Zoltán Herczeg
Date:  
To: Frederic Bonnard
CC: pcre-dev
Subject: Re: [pcre-dev] Powerpc optimisation
Hi Frederic,

thank you for measuring PCRE on PPC. The results are quite interesting.

It seems to me that the slower patterns are those that require heavy backtracking, that is, patterns where fast-forward (skipping) algorithms cannot be used (or where they match too frequently). /[a-zA-Z]+ing/ is a good example of that. Backtracking engines (PCRE, Oniguruma) suffer much more on PPC than those that read the input once (TRE, RE2). I suspect branch prediction is better on x86, but only statistical profilers can prove that. Oprofile is available everywhere and can profile JIT code. That part is developed by IBM :)

http://oprofile.sourceforge.net/doc/devel/index.html

It needs some extra coding though. If you are interested in working on that, I can help.

Btw the Tom.{10,25}river|river.{10,25}Tom pattern is twice as fast on PPC with JIT if I understand the numbers correctly.
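
To make the backtracking cost of [a-zA-Z]+ing concrete, here is a minimal Python sketch of a deliberately naive backtracking search for that one pattern. It is only an illustration: the function name and the step counting are my own assumptions, and it does not reflect how PCRE or its JIT compiler actually execute the pattern (they apply optimisations this sketch ignores).

```python
import string

LETTERS = set(string.ascii_letters)

def count_backtracking_steps(text):
    """Naively emulate a backtracking search for [a-zA-Z]+ing.

    Returns (matches, steps). A step is one character-class test
    or one attempt to match the literal "ing" at some position.
    Hypothetical helper for illustration only, not PCRE's algorithm.
    """
    steps = 0
    matches = 0
    n = len(text)
    start = 0
    while start < n:
        # Greedy [a-zA-Z]+ : consume as many letters as possible.
        end = start
        while end < n:
            steps += 1
            if text[end] not in LETTERS:
                break
            end += 1
        # Backtrack: give letters back one at a time and retry "ing".
        # At least one letter must remain matched by the class.
        matched_end = -1
        pos = end
        while pos > start:
            steps += 1
            if text.startswith("ing", pos):
                matched_end = pos + 3
                break
            pos -= 1
        if matched_end != -1:
            matches += 1
            start = matched_end  # continue after the match
        else:
            start += 1  # no match here: retry from the next character

    return matches, steps

if __name__ == "__main__":
    for sample in ("sing", "abcd", "sing a song, bring nothing"):
        print(sample, count_backtracking_steps(sample))
```

With this counting, "sing" takes 8 steps to find its one match, and the four-letter non-match "abcd" takes 20 steps, because every start position rescans the rest of the word and then backtracks through it. An engine that reads the input once would look at each character a bounded number of times.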

Regards,
Zoltan

Frederic Bonnard <frediz@???> wrote:
>Thanks Zoltan for the quick reply.
>- Ok I think I got it for SSE2.
>- For SIMD instructions, I fear I don't currently have the knowledge for that, but
>would be willing to learn/help.
>- A good start would be that 3rd point, about current code and performance
> status on PPC vs x86.
> I reused http://sljit.sourceforge.net/regex_perf.html, I hope it is relevant.
> pcre directory has been updated to use latest 8.37 instead of 8.32.
> My VMs were :
> * x86-64 4x2.3GHz 4G memory on a x86-64 host
> * ppc64el 4x3GHz 4G memory on a P8 host
> * ppc64 4x3GHz 4G memory on a P8 host
> All were installed with Ubuntu 14.04 LTS.
> Note on Ubuntu for ppc64, default is to have binary in 32b running on a 64b
> kernel, thus the binary 'runtest' is 32b. Maybe I'd need to try with 64b
> binary.
> Here is attached the results for those 3 environments. The goal is not to
> find who's the best but rather find any odd behaviour. Also let's focus on
> pcre/pcre-jit .
> Any comments from expert eyes are welcome.
> On my side, I see very comparable results between ppc64 and ppc64el, so no major
> issue on ppc64el. Now, between x86 and ppc64el, the results for the latter
> seem weaker overall, all the more so since the x86 VM has a lower frequency.
> The results may need more repetitions, and percentages to compare, but I
> already see some results that are 2x or 3x slower for pcre-jit:
> .{0,3}(Tom|Sawyer|Huckleberry|Finn)
> [a-zA-Z]+ing
> ^[a-zA-Z]{0,4}ing[^a-zA-Z]
> [a-zA-Z]+ing$
> ^[a-zA-Z ]{5,}$
> ^.{16,20}$
> "[^"]{0,30}[?!\.]"
> Tom.{10,25}river|river.{10,25}Tom
>
> Is there any special treatment for these that could make the code generated on Power weaker?
>
> Fred
>
>--
>## List details at https://lists.exim.org/mailman/listinfo/pcre-dev