Re: [pcre-dev] (no subject)

Author: ph10
Date:  
To: swati upadhyaya
CC: pcre-dev
Subject: Re: [pcre-dev] (no subject)
On Thu, 24 Apr 2014, swati upadhyaya wrote:

> Thanks for your reply. It would be great if you could
> sort out my problem... I have tried many patterns and found that PCRE
> takes less time than any other regex lib, which is why I want to use PCRE, but
> there are some patterns, like the one above, which it is unable to match.


Is this pattern generated by some process? It contains really silly
sequences like \s*(?:(?:(?:\s+)))\s* and similar. I had a further look.
I found it was failing at the \t in the sequence

\s*\s*(?:(?:(?:[\t]+)))\s*\s*

(another crazy sequence) because there were no tab characters in the
data string. So I changed \t to \s (to match a space). The match then
failed with

Error -8 (match limit exceeded)

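In other words, the pattern makes a very large search tree, which takes
a long time to scan. Sequences such as (?:(?:\w+\s?)+) are dangerous
because they contain nested unlimited repeats: the inner \w+ and the
outer + can divide the same run of word characters in many different
ways, and all of them may be tried before the match fails.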
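
To give a feel for the size of that search tree, here is a minimal Python
sketch (not PCRE code; the function name count_splits is made up for
illustration). It counts the ways a nested repeat such as (?:\w+)+ can carve
a run of n word characters into chunks. A backtracking matcher may have to
try every one of them when the rest of the pattern fails, although a real
engine applies optimizations this sketch ignores.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_splits(n: int) -> int:
    r"""Count the ways (?:\w+)+ can split a run of n word characters."""
    if n == 0:
        return 1  # nothing left to consume: the outer repeat stops here
    # The inner \w+ takes k characters (1..n); the outer + handles the rest.
    return sum(count_splits(n - k) for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (5, 10, 20, 30):
        print(n, count_splits(n))
```

The count doubles with every extra character (it equals 2**(n-1)), which is
why even a modest run of word characters can exhaust the match limit.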

This is such a crazy pattern that I really can't mess with it any more. Can
you not find a way of creating a clean pattern without all the
redundancy? It might then be easier to see why it runs for so long. I'm
suspicious of all the .*? items: each of those is going to try the rest
of the pattern after swallowing 0, 1, 2, 3, ... characters. The use of
atomic groups (?>.....) would also stop a lot of the backtracking.
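
As a rough illustration of what an atomic group changes, the Python sketch
below compares a nested repeat with an atomic-style version on a shortened
piece of the log line. This is an assumption-laden sketch, not what was run
in pcretest: because older Python versions lack (?>...), it emulates the
atomic group with the lookahead-plus-backreference idiom (?=(...))\N, and
the names NESTED and ATOMIC_LIKE and the sample strings are illustrative.

```python
import re

# Nested unlimited repeats: \w+ inside (...)+ can be split in many ways.
NESTED = re.compile(r"((?:\w+\s?)+)has been")

# Atomic-style version: the lookahead captures one "word plus optional
# space" into group 2, and \2 then consumes exactly that text. Once the
# lookahead has succeeded it is not re-entered, so a word is never re-split.
ATOMIC_LIKE = re.compile(r"((?:(?=(\w+\s?))\2)+)has been")

sample = "A authentication package has been loaded"

# Both forms find the same text on a subject that matches.
nested_words = NESTED.search(sample).group(1)
atomic_words = ATOMIC_LIKE.search(sample).group(1)

# On a long run of word characters with no "has been", the atomic-style
# pattern fails after trying whole words only. (NESTED is deliberately not
# run on this subject: it could take an impractically long time.)
no_match = ATOMIC_LIKE.search("x" * 2000)

if __name__ == "__main__":
    print(repr(nested_words), repr(atomic_words), no_match)
```

The outer repeat can still give back whole words, which is how the match
stops before "has been"; what it can no longer do is retry every possible
split inside a word. Python 3.11 and later accept (?>...) directly, so the
emulation is only needed for older versions.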

Aha! I changed (?:(?:\w+\s?)+) to (?:(?>\w+\s?)+), that is, made the
inner group atomic, and lo and behold, when I ran pcretest:

PCRE version 8.35 2014-04-04

"MSWinEventLog\s*(?:(?:(?:\s+)))\s*(?:\s*(?:(?:(?:\d\s+)))\s*)?\s*(?:(?P<event_log__string>(?:\S+)))\s*\s*(?:(?:(?:.*?)))\s*\s*(?:(?:(?:\s+)))\s*\s*(?:(?P<event_id__0>(?:4610|4614|4622)))\s*\s*(?:(?:(?:[\s]+)))\s*\s*(?:(?P<event_source__all>(?:.*?)))\s*\s*(?:(?:(?:[\s]+)))\s*\s*(?:(?:(?:.*?)))\s*\s*(?:(?:(?:[\s]+)))\s*\s*(?:(?:(?:.*?)))\s*\s*(?:(?:(?:[\s]+)))\s*\s*(?:(?:(?:.*?)))\s*\s*(?:(?:(?:[\s]+)))\s*\s*(?:(?:(?:.*?)))\s*\s*(?:(?:(?:[\s]+)))\s*\s*(?:(?P<event_category__all>(?:.*?)))\s*\s*(?:(?:(?:[\s]+)))\s*\s*(?:(?:(?:(A|An).*?)))\s*\s*(?:(?P<object__words>(?:(?>\w+\s?)+)))\s*\s*(?:(?:(?:has been)))\s*\s*(?:(?P<action__0>(?:loaded)))\s*\s*(?:(?:(?: by the)))\s*\s*(?:(?:(?:.*?)))\s*Package Name\:\s*(?:(?P<package__0>(?:\S+)))\s*"
<14>Mar 2 11:34:38 89.237.143.23 MSWinEventLog 1 Security 6500 Fri Mar 02 11:34:37 2012 4610 Microsoft-Windows-Security-Auditing    N/A    N/A    Success Audit prabhat.ImmuneAps.com    User Logoff    A authentication package has been loaded by the Local Security Authority. This authentication package will be used to authenticate logon attempts.  Authentication Package Name: C:\\Windows\\system32\\msv1_0.dll : MICROSOFT_AUTHENTICATION_PACKAGE_V1_0
 0: MSWinEventLog 1 Security 6500 Fri Mar 02 11:34:37 2012 4610 Microsoft-Windows-Security-Auditing    N/A    N/A    Success Audit prabhat.ImmuneAps.com    User Logoff    A authentication package has been loaded by the Local Security Authority. This authentication package will be used to authenticate logon attempts.  Authentication Package Name: C:\Windows\system32\msv1_0.dll 
 1: Security
 2: 4610
 3: Microsoft-Windows-Security-Auditing
 4: prabhat.ImmuneAps.com    User Logoff
 5: A
 6: authentication package 
 7: loaded
 8: C:\Windows\system32\msv1_0.dll


... and this was pretty well instantaneous.

Philip

--
Philip Hazel