Re: [pcre-dev] Strangely long matching times. Could anyone h…

Author: Ralf Junker
Date:  
To: pcre-dev@exim.org
Subject: Re: [pcre-dev] Strangely long matching times. Could anyone help to explain?
Below are my measurements with JIT enabled.

Test file:

/aa.*?bba/
   \[aa                                 bb ]{10000}
   \[aa                                 bb ]{20000}
   \[aa                                 bb ]{30000}
   \[aa                                 bb ]{40000}
   \[aa                                 bb ]{50000}
   \[aa                                 bb ]{60000}
   \[aa                                 bb ]{70000}
   \[aa                                 bb ]{80000}
   \[aa                                 bb ]{90000}
   \[aa                                 bb ]{100000}
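For readers unfamiliar with the subject lines above: pcre2test expands \[text]{count} into count copies of text before matching, so the file does not have to contain the long subjects literally. A minimal Python sketch of that expansion follows. The function name expand_subject is made up for illustration, and the sketch handles only this one notation, none of pcre2test's other escapes.

```python
import re

# Matches pcre2test's replication notation: \[text]{count}
# (no nesting; text may not contain a literal "]").
_REPEAT = re.compile(r"\\\[([^\]]*)\]\{(\d+)\}")

def expand_subject(line: str) -> str:
    # pcre2test strips leading and trailing white space from subject lines.
    line = line.strip()
    # Replace each replication group with the repeated text.
    return _REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), line)

if __name__ == "__main__":
    subject = expand_subject(r"   \[aa bb ]{10000}")
    print(len(subject))  # length of the expanded subject in characters
```

With the unit string used in the test file, each subject therefore grows in proportion to the count given in braces.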


Command line:

pcre2test -tm 10 -jit tests_jit.txt

Results:

/aa.*?bba/
   \[aa                                 bb ]{10000}
Match time 1.6000 milliseconds
No match
   \[aa                                 bb ]{20000}
Match time 4.7000 milliseconds
No match
   \[aa                                 bb ]{30000}
Match time 4.6000 milliseconds
No match
   \[aa                                 bb ]{40000}
Match time 4.7000 milliseconds
No match
   \[aa                                 bb ]{50000}
Match time 7.8000 milliseconds
No match
   \[aa                                 bb ]{60000}
Match time 9.4000 milliseconds
No match
   \[aa                                 bb ]{70000}
Match time 10.9000 milliseconds
No match
   \[aa                                 bb ]{80000}
Match time 14.0000 milliseconds
No match
   \[aa                                 bb ]{90000}
Match time 14.1000 milliseconds
No match
   \[aa                                 bb ]{100000}
Match time 17.2000 milliseconds
No match


As expected, JIT is much faster than the PCRE2 interpreter.

However, it is surprising that the JIT times grow linearly with subject
length, whereas the interpreter's times grow exponentially.
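One rough way to check the growth order of the JIT timings is to fit a straight line to log(time) against log(size). The sketch below does this in Python with the ten JIT figures from the results above. The function name loglog_slope is an illustrative choice, and with only ten noisy timings the result is an estimate, nothing more.

```python
import math

# (repetition count, JIT match time in ms), copied from the results above.
MEASUREMENTS = [
    (10000, 1.6), (20000, 4.7), (30000, 4.6), (40000, 4.7), (50000, 7.8),
    (60000, 9.4), (70000, 10.9), (80000, 14.0), (90000, 14.1), (100000, 17.2),
]

def loglog_slope(points):
    # Least-squares slope of log(time) against log(size).
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

if __name__ == "__main__":
    print(f"estimated growth exponent: {loglog_slope(MEASUREMENTS):.2f}")
```

A slope near 1 points to linear growth and a slope near 2 to quadratic growth. Exponential growth would not fit a straight line in this space at all, so the same fit applied to the interpreter timings would help to tell those cases apart.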

Even more surprisingly, Perl still runs many times faster than the PCRE2
JIT. Here is a test script:

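use strict;
use warnings;
use Time::HiRes qw( time );

# Time 10 match attempts against a subject of $n repeated units.
sub runregex {
   my ($n) = @_;
   my $text = "aa                                 bb " x $n;

   my $begin_time = time();
   for (my $i = 1; $i <= 10; $i++) {
     print "Match\n" if $text =~ /aa.*?bba/;
   }
   my $end_time = time();
   # Columns: repetition count, total seconds for all 10 attempts.
   printf("%8d %.4f\n", $n, $end_time - $begin_time);
}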


runregex (10000);
runregex (20000);
runregex (30000);
runregex (40000);
runregex (50000);
runregex (60000);
runregex (70000);
runregex (80000);
runregex (90000);
runregex (100000);
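One possible explanation for Perl's speed, offered as an assumption and not something measured in this thread: every match of /aa.*?bba/ must contain the literal "bba", and the test subjects never do, so an engine that looks for a required substring first can reject the subject in a single linear scan without ever running the matcher. The Python sketch below illustrates only that idea. The name may_match is hypothetical, and the sketch does not show how Perl or PCRE2 are actually implemented.

```python
# Repeating unit from the test file: "aa", a run of spaces, "bb", one space.
UNIT = "aa                                 bb "

def may_match(subject: str, required: str = "bba") -> bool:
    # Any match of /aa.*?bba/ has to contain the literal "bba", so its
    # absence proves "no match" after one linear substring search.
    return required in subject

if __name__ == "__main__":
    subject = UNIT * 10000
    print(may_match(subject))          # subjects from the test file
    print(may_match(subject + "bba"))  # a subject that passes the pre-check
```

In the repeated subject every "bb" is followed by a space, so the pre-check fails for all ten test sizes.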

Ralf

On 29.11.2020 11:39, Zoltán Herczeg wrote:

> is this measured with JIT enabled? I wrote an introduction about the JIT
> compiler before:
>
> https://zherczeg.github.io/sljit/pcre2_jit.html
>
> The single character optimization described in the paragraph containing
> the (*SKIP) verb should handle it.