Re: [pcre-dev] Powerpc optimisation

Author: Zoltán Herczeg
Date:
To: Frederic Bonnard
CC: pcre-dev
Subject: Re: [pcre-dev] Powerpc optimisation
Hi Frederic,

I just realized that the results on that page were two years old, so I updated the engines to their most recent versions and uploaded new results. They are better overall for all engines (partly because of a newer gcc). The JIT has also improved overall; e.g. the time for the third pattern from the end dropped from 190 ms to 27 ms.

Regards,
Zoltan

"Zoltán Herczeg" <hzmester@???> wrote:
>Hi Frederic,
>
>thank you for measuring PCRE on PPC. The results are quite interesting.
>
>It seems to me that the slower patterns are those that require heavy backtracking, i.e. where fast-forward (skipping) algorithms cannot be used (or where they match too frequently). /[a-zA-Z]+ing/ is a good example. Backtracking engines (PCRE, Oniguruma) suffer much more on PPC than engines that read the input only once (TRE, RE2). I suspect branch prediction is better on x86, but only a statistical profiler can prove that. OProfile is available everywhere and can profile JIT code. That part was developed by IBM :)
>
>http://oprofile.sourceforge.net/doc/devel/index.html
>
>It needs some extra coding, though. If you are interested in working on that, I can help.
>
>Btw, the Tom.{10,25}river|river.{10,25}Tom pattern is twice as fast on PPC with JIT, if I understand the numbers correctly.
>
>Regards,
>Zoltan
>
>Frederic Bonnard <frediz@???> wrote:
>>Thanks Zoltan for the quick reply.
>>- OK, I think I understand the SSE2 part.
>>- As for SIMD instructions, I'm afraid I don't currently have the knowledge for
>>that, but I would be willing to learn/help.
>>- A good start would be the 3rd point, about the current code and performance
>> status on PPC vs x86.
>> I reused http://sljit.sourceforge.net/regex_perf.html; I hope it is relevant.
>> The pcre directory has been updated to use the latest 8.37 instead of 8.32.
>> My VMs were:
>> * x86-64 4x2.3GHz 4G memory on a x86-64 host
>> * ppc64el 4x3GHz 4G memory on a P8 host
>> * ppc64 4x3GHz 4G memory on a P8 host
>> All were installed with Ubuntu 14.04 LTS.
>> Note that on Ubuntu for ppc64, the default is 32-bit binaries running on a
>> 64-bit kernel, so the 'runtest' binary is 32-bit. I may need to try a
>> 64-bit binary.
>> Attached are the results for those 3 environments. The goal is not to
>> find out who's the best but rather to find any odd behaviour. Also, let's
>> focus on pcre/pcre-jit.
>> Comments from expert eyes are welcome.
>> On my side, I see very comparable results between ppc64 and ppc64el, so no
>> major issue on ppc64el. Between x86 and ppc64el, however, the results for the
>> latter seem weaker overall, all the more so since the x86 VM has a lower
>> clock frequency. The results may need more repetitions, and percentages for
>> comparison, but I already see some pcre-jit results that are 2x or 3x slower:
>> .{0,3}(Tom|Sawyer|Huckleberry|Finn)
>> [a-zA-Z]+ing
>> ^[a-zA-Z]{0,4}ing[^a-zA-Z]
>> [a-zA-Z]+ing$
>> ^[a-zA-Z ]{5,}$
>> ^.{16,20}$
>> "[^"]{0,30}[?!\.]"
>> Tom.{10,25}river|river.{10,25}Tom
>>
>> Is there any special treatment of these patterns that could make the code generated on POWER weaker?
>>
>> Fred
>>
>>--
>>## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
>