[pcre-dev] pcre 8.12 + sljit

Author: Herczeg Zoltán
Date:  
To: pcre-dev
Subject: [pcre-dev] pcre 8.12 + sljit
Hi all,

I rebased the code base onto PCRE 8.12 and implemented some more opcodes: the newline-related \R, ., ^ and $, and the UTF-8/Unicode-related \P and xclass (extended character class). You can download it from here:

http://sljit.sourceforge.net/pcre-8.12-sljit-all.tgz
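For reference, PCRE documents \R as matching any Unicode newline sequence, (?>\r\n|\n|\x0b|\f|\r|\x85), plus U+2028 (LS) and U+2029 (PS) in UTF-8 mode, assuming the default PCRE_BSR_UNICODE setting. The following is a minimal hand-written sketch of that check on UTF-8 bytes, only to illustrate what the compiled code has to test; it is not the machine code sljit emits, and the function name match_R_utf8 is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: returns the number of bytes matched by \R at s[pos],
   or 0 if there is no newline sequence there. Assumes valid UTF-8 input and
   the default "any Unicode newline" meaning of \R. */
static size_t match_R_utf8(const unsigned char *s, size_t len, size_t pos)
{
    if (pos >= len)
        return 0;
    unsigned char c = s[pos];
    /* CR LF is tried first and is atomic: a CR followed by LF never
       matches as a lone CR. */
    if (c == '\r')
        return (pos + 1 < len && s[pos + 1] == '\n') ? 2 : 1;
    /* LF, VT (\x0b) and FF (\f, 0x0c) are single bytes. */
    if (c == '\n' || c == 0x0b || c == 0x0c)
        return 1;
    /* U+0085 NEL is encoded as C2 85 in UTF-8. */
    if (c == 0xc2 && pos + 1 < len && s[pos + 1] == 0x85)
        return 2;
    /* U+2028 LS is E2 80 A8, U+2029 PS is E2 80 A9. */
    if (c == 0xe2 && pos + 2 < len && s[pos + 1] == 0x80 &&
        (s[pos + 2] == 0xa8 || s[pos + 2] == 0xa9))
        return 3;
    return 0;
}
```

With PCRE_BSR_ANYCRLF, only the CR, LF and CR LF branches would remain.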

Some performance results (this time on an x86-64 machine, with non-English UTF-8 text as input: http://www.gutenberg.org/cache/epub/23756/pg23756.txt):

Pattern: 'die der' Matches: 236
  C:    0 ms, J:    0 ms, Int runtime:     40 ms JIT runtime:     30 ms
Pattern: 'die der' Matches: 236 Caseless
  C:    0 ms, J:    0 ms, Int runtime:     40 ms JIT runtime:     30 ms
Pattern: 'ist|der|die|und' Matches: 115196
  C:    0 ms, J:    0 ms, Int runtime:    210 ms JIT runtime:     50 ms
Pattern: 'ist|der|die|und' Matches: 118816 Caseless
  C:    0 ms, J:    0 ms, Int runtime:    210 ms JIT runtime:     50 ms
Pattern: '\b\w+\b' Matches: 875496
  C:    0 ms, J:    0 ms, Int runtime:    440 ms JIT runtime:    120 ms
Pattern: '^.{4,32}[kg\P{N}].{4,32}$' Matches: 12020
  C:    0 ms, J:    0 ms, Int runtime:   3470 ms JIT runtime:    920 ms
Pattern: '((\w{2,8},?(\P{Z}|\R)){1,2}\.\s?)$' Matches: 4856 Caseless
  C:    0 ms, J:    0 ms, Int runtime:   9260 ms JIT runtime:   1430 ms
Pattern: '(?:da|ge|om)+(?:n|me)*' Matches: 93640 Caseless
  C:    0 ms, J:    0 ms, Int runtime:    170 ms JIT runtime:     50 ms
Pattern: '\P{Lu}\P{L&}{0,12}[\s\-]{1,4}..[\P{L}\P{N}]{4}' Matches: 469383
  C:    0 ms, J:    0 ms, Int runtime:    550 ms JIT runtime:    140 ms
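The JIT speedup for each pattern can be read off the table as interpreter runtime divided by JIT runtime. The short C sketch below just tabulates those two columns (values copied from the table above; the struct and function names are hypothetical) and computes the ratio.

```c
#include <stddef.h>

/* One row of the table above: interpreter and JIT runtimes in ms. */
struct bench {
    const char *pattern;
    int caseless;
    int int_ms;
    int jit_ms;
};

static const struct bench results[] = {
    { "die der", 0, 40, 30 },
    { "die der", 1, 40, 30 },
    { "ist|der|die|und", 0, 210, 50 },
    { "ist|der|die|und", 1, 210, 50 },
    { "\\b\\w+\\b", 0, 440, 120 },
    { "^.{4,32}[kg\\P{N}].{4,32}$", 0, 3470, 920 },
    { "((\\w{2,8},?(\\P{Z}|\\R)){1,2}\\.\\s?)$", 1, 9260, 1430 },
    { "(?:da|ge|om)+(?:n|me)*", 1, 170, 50 },
    { "\\P{Lu}\\P{L&}{0,12}[\\s\\-]{1,4}..[\\P{L}\\P{N}]{4}", 0, 550, 140 },
};

/* Speedup of the JIT over the interpreter for one row. */
static double speedup(const struct bench *b)
{
    return (double)b->int_ms / (double)b->jit_ms;
}
```

For example, the ratio is 40/30, about 1.3, for 'die der' and 9260/1430, about 6.5, for the \P{Z}|\R pattern. Since every timing above is a multiple of 10 ms, the ratios for the shortest runs are coarse.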


Note: since JIT compiling is a heavyweight optimization, it pays off when the input is large, the expression is used frequently, or the matching requires deep recursion. Otherwise the compile time can exceed the runtime of pcre_exec.
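The "used frequently" case can be made concrete with a simple cost model, assuming a fixed one-time JIT compile cost and a fixed per-call matching cost for each engine (a simplification that ignores input size and the deep-recursion case): the JIT wins once compile + n * jit < n * interp. The sketch below returns the smallest such n; the function name and all cost values are hypothetical, and any consistent time unit works.

```c
/* Hypothetical helper: smallest number of calls n for which
   compile_cost + n * jit_cost < n * interp_cost.
   Returns 0 if the JIT is never cheaper (jit_cost >= interp_cost). */
static unsigned long break_even_calls(double compile_cost,
                                      double interp_cost,
                                      double jit_cost)
{
    if (jit_cost >= interp_cost)
        return 0;
    double saving_per_call = interp_cost - jit_cost;
    /* n must be strictly greater than compile_cost / saving_per_call. */
    return (unsigned long)(compile_cost / saving_per_call) + 1;
}
```

For instance, with a made-up 100 µs compile cost, 5 µs per interpreted call and 1 µs per JIT call, break_even_calls(100, 5, 1) returns 26: at 25 calls both paths cost 125 µs.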

Regards,
Zoltan