[pcre-dev] [Bug 1049] Add support for UTF-16

Author: Zoltan Herczeg
Date:
To: pcre-dev
Old threads: [pcre-dev] [Bug 1049] New: Add support for UTF-16
Subject: [pcre-dev] [Bug 1049] Add support for UTF-16
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1049




--- Comment #35 from Zoltan Herczeg <hzmester@???> 2011-12-17 07:11:30 ---
I did some measurements with the 16-bit PCRE on UTF-8 and UTF-16 (same test).
UTF-16 seems to be a bit slower in the interpreter (about 6% on average in the
results below), but the JIT is basically unaffected.

The utf8_input.txt (size: 1392505 x 4) is loaded.
Pattern: 'die der' Matches: 236
  8 bit: Int runtime:     20 ms JIT runtime:     10 ms 2.00 as fast 50.0% save
  16 bit: Int runtime:     20 ms JIT runtime:     10 ms 2.00 as fast 50.0% save
Pattern: 'ist|der|die|und' Matches: 118816 Caseless
  8 bit: Int runtime:    100 ms JIT runtime:     30 ms 3.33 as fast 70.0% save
  16 bit: Int runtime:    120 ms JIT runtime:     40 ms 3.00 as fast 66.7% save
Pattern: '\b\w+\b' Matches: 803324
  8 bit: Int runtime:    260 ms JIT runtime:    120 ms 2.17 as fast 53.8% save
  16 bit: Int runtime:    260 ms JIT runtime:    120 ms 2.17 as fast 53.8% save
Pattern: '(?:da|ge|om)+(?:n|me)*' Matches: 93640 Caseless
  8 bit: Int runtime:     80 ms JIT runtime:     30 ms 2.67 as fast 62.5% save
  16 bit: Int runtime:    100 ms JIT runtime:     30 ms 3.33 as fast 70.0% save
Pattern: '\b(?(?=\w+ro)\w+pa|\w+lle)\w+\b' Matches: 7972
  8 bit: Int runtime:    580 ms JIT runtime:    260 ms 2.23 as fast 55.2% save
  16 bit: Int runtime:    640 ms JIT runtime:    250 ms 2.56 as fast 60.9% save
Pattern: '\b(\W)\1+\b|(^(?=.*kl)(?=.*no).{15,40}$)' Matches: 148
  8 bit: Int runtime:    650 ms JIT runtime:    180 ms 3.61 as fast 72.3% save
  16 bit: Int runtime:    800 ms JIT runtime:    210 ms 3.81 as fast 73.8% save
Pattern: '^.{4,32}(\P{N})\1{2,}.{4,32}(?<![nuk])$' Matches: 264
  8 bit: Int runtime:    220 ms JIT runtime:     50 ms 4.40 as fast 77.3% save
  16 bit: Int runtime:    240 ms JIT runtime:     50 ms 4.80 as fast 79.2% save
Pattern: '^(\w{3,})(?!\1).*\h.*\1$' Matches: 576 Caseless
  8 bit: Int runtime:   4560 ms JIT runtime:   1890 ms 2.41 as fast 58.6% save
  16 bit: Int runtime:   4290 ms JIT runtime:   1890 ms 2.27 as fast 55.9% save
Pattern: '((\w{2,8},?(\P{Z}|\R)){1,2}\.\s?)$' Matches: 5040 Caseless
  8 bit: Int runtime:   4500 ms JIT runtime:   1270 ms 3.54 as fast 71.8% save
  16 bit: Int runtime:   5090 ms JIT runtime:   1220 ms 4.17 as fast 76.0% save
Pattern: '\b\w*?((.){1,3}\w*\2)\w*?(?1)' Matches: 251012
  8 bit: Int runtime:   1710 ms JIT runtime:    620 ms 2.76 as fast 63.7% save
  16 bit: Int runtime:   1900 ms JIT runtime:    610 ms 3.11 as fast 67.9% save
Pattern: '\w*?(b{2,3})\w*?c' Matches: 16
  8 bit: Int runtime:   1240 ms JIT runtime:    320 ms 3.88 as fast 74.2% save
  16 bit: Int runtime:   1250 ms JIT runtime:    320 ms 3.91 as fast 74.4% save
Pattern: '\P{Lu}\P{L&}{0,12}[\s\-]{1,4}..[\P{L}\P{N}]{4}' Matches: 469547
  8 bit: Int runtime:    300 ms JIT runtime:     90 ms 3.33 as fast 70.0% save
  16 bit: Int runtime:    370 ms JIT runtime:    110 ms 3.36 as fast 70.3% save
Pattern: '\b(\B([c-h])\B|[a-z]+?(?1)[a-z])' Matches: 297816
  8 bit: Int runtime:    950 ms JIT runtime:    300 ms 3.17 as fast 68.4% save
  16 bit: Int runtime:   1060 ms JIT runtime:    310 ms 3.42 as fast 70.8% save
Average:
  8 bit: Int runtime:   1166 ms JIT runtime:    397 ms 3.04 as fast 65.2% save
  16 bit: Int runtime:   1241 ms JIT runtime:    397 ms 3.22 as fast 66.9% save
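
For readers checking the numbers: the "as fast" and "% save" columns appear to
be derived from the two runtimes on each line, and the Average line looks like
the mean of the per-pattern values rather than a ratio of the averaged
runtimes. The sketch below reproduces a few rows under that assumption. The
function names are made up for illustration and are not part of the benchmark
tool.

```python
def speedup(int_ms, jit_ms):
    """How many times as fast the JIT run is compared with the interpreter."""
    return int_ms / jit_ms


def saving_percent(int_ms, jit_ms):
    """Share of the interpreter runtime that the JIT saves, in percent."""
    return 100.0 * (int_ms - jit_ms) / int_ms


def slowdown_percent(base_ms, other_ms):
    """How much slower other_ms is relative to base_ms, in percent."""
    return 100.0 * (other_ms - base_ms) / base_ms


# (interpreter ms, JIT ms) copied from three of the 16 bit result lines above.
rows_16bit = {
    "die der": (20, 10),
    "ist|der|die|und": (120, 40),
    r"\b\w+\b": (260, 120),
}

for pattern, (int_ms, jit_ms) in rows_16bit.items():
    print(f"{pattern}: {speedup(int_ms, jit_ms):.2f} as fast "
          f"{saving_percent(int_ms, jit_ms):.1f}% save")

# Interpreter averages from the Average line: 8 bit 1166 ms, 16 bit 1241 ms.
print(f"16 bit interpreter slowdown: {slowdown_percent(1166, 1241):.1f}%")
```

The same two formulas can be applied to any other line of the output to
cross-check it.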


