Foreword
This page was taken from [1]here. The pcre directory has been updated to
use the latest PCRE 8.37 instead of 8.32. The following is the original
author's text, reproduced as-is from the above page for reference. I used
three VMs installed with Ubuntu 14.04 LTS: one x86_64, one ppc64el and
one ppc64. F.
Participants
The following popular engines were chosen:
* [2]PCRE 8.32
* [3]tre 0.8.0
* [4]Oniguruma 5.9.3
* [5]re2 by Google [source tree: 29.10.2012]
* [6]PCRE 8.37 with sljit JIT compiler support
Before anyone jumps to conclusions, I should note the following:
* The engines were not fine-tuned (because of my lack of knowledge
about their internal workings). I just compiled them with the
default options. I know that enabling or disabling some features can
heavily affect the results. If you think you have a better
configuration, just drop me an e-mail and I will update the results
(hzmester(at)freemail(dot)hu).
* The regular expression engines were compiled with -O3 for the
best performance.
* This comparison page was inspired by the work of John Maddock (see
his own regex comparison [7]here). The input is also the same one he
used: [8]mtent12.zip, a text file (e-book) of about 20 MB.
* Only common patterns were selected; they are neither pathological
cases nor do they use any Perl-specific features. All matching was
case-sensitive.
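The page does not describe the measurement loop itself. One plausible way to run such a measurement is sketched below: load the e-book once, compile each pattern outside the timed region, then time a full scan that counts every non-overlapping match. This is a minimal sketch under stated assumptions only. It uses Python's built-in re module as a stand-in, which is not one of the engines compared here; it assumes multiline semantics (^ and $ match at line boundaries), since the original flags are not given; and the helper names load_text and time_pattern, as well as the script name bench.py, are hypothetical.

```python
import re
import sys
import time
import zipfile

# The sixteen patterns from the Results tables, in table order.
PATTERNS = [
    r"Twain",
    r"^Twain",
    r"Twain$",
    r"Huck[a-zA-Z]+|Finn[a-zA-Z]+",
    r"a[^x]{20}b",
    r"Tom|Sawyer|Huckleberry|Finn",
    r".{0,3}(Tom|Sawyer|Huckleberry|Finn)",
    r"[a-zA-Z]+ing",
    r"^[a-zA-Z]{0,4}ing[^a-zA-Z]",
    r"[a-zA-Z]+ing$",
    r"^[a-zA-Z ]{5,}$",
    r"^.{16,20}$",
    r"([a-f](.[d-m].){0,2}[h-n]){2}",
    r"([A-Za-z]awyer|[A-Za-z]inn)[^a-zA-Z]",
    r'"[^"]{0,30}[?!\.]"',
    r"Tom.{10,25}river|river.{10,25}Tom",
]


def load_text(path):
    """Read the corpus from a plain text file or from the first member of a .zip."""
    if path.endswith(".zip"):
        with zipfile.ZipFile(path) as zf:
            # Assumption: the archive's first entry is the e-book text.
            data = zf.read(zf.namelist()[0])
    else:
        with open(path, "rb") as f:
            data = f.read()
    # latin-1 maps every byte to a code point, so decoding cannot fail.
    return data.decode("latin-1")


def time_pattern(pattern, text, flags=re.MULTILINE):
    """Return (match count, elapsed ms) for one scan of text.

    Assumption: ^ and $ match at line boundaries (re.MULTILINE); the original
    benchmark does not state its flags. Matching is case-sensitive, as in the
    comparison, because no IGNORECASE flag is passed.
    """
    regex = re.compile(pattern, flags)  # compilation is kept out of the timing
    start = time.perf_counter()
    count = sum(1 for _ in regex.finditer(text))  # every non-overlapping match
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return count, elapsed_ms


def main(argv):
    # Hypothetical usage: python bench.py mtent12.zip
    if len(argv) > 1:
        text = load_text(argv[1])
    else:
        text = "Mark Twain wrote about Tom Sawyer and Huckleberry Finn.\n"
    for pattern in PATTERNS:
        count, ms = time_pattern(pattern, text)
        print(f"{pattern:<40} {count:>8} matches {ms:9.1f} ms")


if __name__ == "__main__":
    main(sys.argv)
```

Without an argument the script scans a one-line sample, which is only useful for checking that every pattern compiles; absolute times from Python's re are not comparable with the C engines in the tables.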
Results
x86-64 4x2.3GHz 4G (gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2); all times in ms
Regular expression                         PCRE  PCRE-DFA       TRE Oniguruma       RE2  PCRE-JIT
Twain                                        13        13       392        18         2        17
^Twain                                      104       118       193        18        77        24
Twain$                                       12        13       407        18         2        17
Huck[a-zA-Z]+|Finn[a-zA-Z]+                  16        17       603        40        71        22
a[^x]{20}b                                   64       382       649       353       359        56
Tom|Sawyer|Huckleberry|Finn                  22        26      1011        47        73        40
.{0,3}(Tom|Sawyer|Huckleberry|Finn)        3912      5172      3408       111        87       412
[a-zA-Z]+ing                                739      1666       587       872       134       182
^[a-zA-Z]{0,4}ing[^a-zA-Z]                  127       153       300        39        78        25
[a-zA-Z]+ing$                               773      1771       577       883       117       183
^[a-zA-Z ]{5,}$                             165       328       382       289        88        49
^.{16,20}$                                  150       242       343       494        78        30
([a-f](.[d-m].){0,2}[h-n]){2}               511       737       859       493       151       114
([A-Za-z]awyer|[A-Za-z]inn)[^a-zA-Z]        737      1035      1035       178       117        29
"[^"]{0,30}[?!\.]"                           19        39       442        67         8        15
Tom.{10,25}river|river.{10,25}Tom            48        67       674        79        81        40
ppc64el 4x3GHz 4G (gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2); all times in ms
Regular expression                         PCRE  PCRE-DFA       TRE Oniguruma       RE2  PCRE-JIT
Twain                                        26        22       425        17         3        13
^Twain                                      162       189       251        17        58        13
Twain$                                       26        22       439        17         2        13
Huck[a-zA-Z]+|Finn[a-zA-Z]+                  26        27       769       122        58        17
a[^x]{20}b                                  113       442       645       685       718        68
Tom|Sawyer|Huckleberry|Finn                  38        39      1459       133        59        37
.{0,3}(Tom|Sawyer|Huckleberry|Finn)        7382      7761      5049       279        59      1224
[a-zA-Z]+ing                               1412      2097       660      1343        82       431
^[a-zA-Z]{0,4}ing[^a-zA-Z]                  206       236       426        41        59        52
[a-zA-Z]+ing$                              1478      2207       649      1380        59       513
^[a-zA-Z ]{5,}$                             248       429       541       489        68       135
^.{16,20}$                                  237       331       440       713        59        64
([a-f](.[d-m].){0,2}[h-n]){2}               752      1119      1105       708        76       146
([A-Za-z]awyer|[A-Za-z]inn)[^a-zA-Z]       1149      1533      1300       286        58        23
"[^"]{0,30}[?!\.]"                           44        60       527       136         8        43
Tom.{10,25}river|river.{10,25}Tom            73        99       796       164        58        21
ppc64 4x3GHz 4G (gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2); all times in ms
Regular expression                         PCRE  PCRE-DFA       TRE Oniguruma       RE2  PCRE-JIT
Twain                                        27        21       543        17        12        13
^Twain                                      174       189       450        17        58        13
Twain$                                       27        21       565        17        12        13
Huck[a-zA-Z]+|Finn[a-zA-Z]+                  27        28       866        57        58        18
a[^x]{20}b                                  114       438       805       806       567        73
Tom|Sawyer|Huckleberry|Finn                  41        40      1541        74        58        37
.{0,3}(Tom|Sawyer|Huckleberry|Finn)        9107      8621      5184       246        59      1265
[a-zA-Z]+ing                               1825      2348       773      2678        83       456
^[a-zA-Z]{0,4}ing[^a-zA-Z]                  223       238       692        51        59        53
[a-zA-Z]+ing$                              1897      2403       763      2789        59       507
^[a-zA-Z ]{5,}$                             307       500       810       609        69       141
^.{16,20}$                                  284       329       662      1260        59        64
([a-f](.[d-m].){0,2}[h-n]){2}              1206      1290      1191       996        77       147
([A-Za-z]awyer|[A-Za-z]inn)[^a-zA-Z]       2085      1868      1316       250        58        22
"[^"]{0,30}[?!\.]"                           46        59       649       129        16        43
Tom.{10,25}river|river.{10,25}Tom            91        99       893       143        58        22
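The tables are often easier to compare as ratios. As a small worked example, the snippet below divides the PCRE column of the x86-64 table by its PCRE-JIT column, row by row. The timings are copied verbatim from that table; the names X86_64_MS and jit_speedups are only illustrative. On these figures the largest gain is for ([A-Za-z]awyer|[A-Za-z]inn)[^a-zA-Z] (737 ms / 29 ms, roughly 25x), while for several simple patterns such as Twain and Tom|Sawyer|Huckleberry|Finn the ratio is below 1, i.e. the JIT run was the slower one.

```python
# x86-64 timings in ms from the first Results table, as (PCRE, PCRE-JIT).
X86_64_MS = {
    "Twain": (13, 17),
    "^Twain": (104, 24),
    "Twain$": (12, 17),
    "Huck[a-zA-Z]+|Finn[a-zA-Z]+": (16, 22),
    "a[^x]{20}b": (64, 56),
    "Tom|Sawyer|Huckleberry|Finn": (22, 40),
    ".{0,3}(Tom|Sawyer|Huckleberry|Finn)": (3912, 412),
    "[a-zA-Z]+ing": (739, 182),
    "^[a-zA-Z]{0,4}ing[^a-zA-Z]": (127, 25),
    "[a-zA-Z]+ing$": (773, 183),
    "^[a-zA-Z ]{5,}$": (165, 49),
    "^.{16,20}$": (150, 30),
    "([a-f](.[d-m].){0,2}[h-n]){2}": (511, 114),
    "([A-Za-z]awyer|[A-Za-z]inn)[^a-zA-Z]": (737, 29),
    r'"[^"]{0,30}[?!\.]"': (19, 15),
    "Tom.{10,25}river|river.{10,25}Tom": (48, 40),
}


def jit_speedups(table):
    """Map each pattern to PCRE time / PCRE-JIT time; above 1 means the JIT was faster."""
    return {pattern: pcre / jit for pattern, (pcre, jit) in table.items()}


if __name__ == "__main__":
    speedups = jit_speedups(X86_64_MS)
    # Print patterns from the largest JIT gain to the smallest.
    for pattern, ratio in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{pattern:<40} {ratio:6.2f}x")
```

The same function works unchanged on the ppc64el and ppc64 tables once their two columns are copied into a dictionary of the same shape.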
References
1. http://sljit.sourceforge.net/regex_perf.html
2. http://www.pcre.org/
3. http://laurikari.net/tre/
4. http://laurikari.net/tre/
5. http://code.google.com/p/re2/
6. file:///tmp/pcre.html
7. http://www.boost.org/doc/libs/1_41_0/libs/regex/doc/gcc-performance.html
8. http://www.gutenberg.org/files/3200/old/mtent12.zip