Re: [pcre-dev] What is the best way to multi match

Author: Ronen Hod
Date:  
To: Andrew Ho
CC: PCRE Developers
Subject: Re: [pcre-dev] What is the best way to multi match
Hi Andrew,

Thanks for the advice and tips.
Originally I wrote my code this way (tokenization...), but I was later asked to support regular expressions in the patterns, so I had to use a regular expression engine. The question is how.
BTW, isn't it more efficient to let the DFA run a single pass on the input string than to scan it several times?

Thanks, Ronen.

-----Original Message-----
From: Andrew Ho [mailto:andrew@zeuscat.com]
Sent: Wednesday, July 29, 2009 8:28 PM
To: Ronen Hod
Cc: PCRE Developers
Subject: Re: [pcre-dev] What is the best way to multi match

Hi Ronen,

>I am parsing an HTTP query-string ("s1&s2&...&sn"), and need to find
>which of the patterns (p1, p2, ..., pm) exist there.
>So far the best way that I found was to use the RegExp
>^(|.+&)p1($|&)(?C0)|^(|.+&)p2($|&)(?C0)|...|^(|.+&)pm($|&)(?C0)
>and remember the position of every "|" that follows the callout so I
>can identify them when I get the callout (using pcre_dfa_exec()). Does
>anybody have any better working solution for this problem?


To be honest, a regular expression is the wrong tool for parsing an
HTTP query string.

I would do the parsing by hand: split on '&' characters, then split
each token on '='. Whether you use a regex or parse manually, you will
need to do URI unescaping (for example, "%61" to "a"). You can do the
manual parsing either with a simple state machine (at any given time
you are parsing either a name or a value), or with multiple calls to
strtok() or strtok_r().
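
For illustration, the manual approach described above might look something like the sketch below. It is not code from this thread. The names url_unescape(), parse_query(), struct lookup and lookup_cb() are hypothetical, and it splits with strchr() rather than strtok()/strtok_r() so that empty tokens are handled explicitly. It also assumes form-style encoding, where '+' means a space, and it modifies the query string in place. Unescaping is done after splitting so that an escaped "%26" or "%3D" is not mistaken for a separator.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Convert one hex digit to its value; caller guarantees isxdigit(c). */
int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return c - 'A' + 10;
}

/* Decode %XX escapes in place. Treating '+' as a space is an
 * assumption (form encoding); drop that branch if it does not apply. */
char *url_unescape(char *s)
{
    char *in = s, *out = s;
    while (*in != '\0') {
        if (in[0] == '%' && isxdigit((unsigned char)in[1])
                         && isxdigit((unsigned char)in[2])) {
            *out++ = (char)(hex_value(in[1]) * 16 + hex_value(in[2]));
            in += 3;
        } else if (*in == '+') {
            *out++ = ' ';
            in++;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return s;
}

/* Hypothetical callback type: called once per name/value pair. */
typedef void (*pair_fn)(const char *name, const char *value, void *ctx);

/* Split query on '&', then each token on the first '='.
 * Modifies query in place; returns the number of non-empty tokens. */
int parse_query(char *query, pair_fn fn, void *ctx)
{
    int count = 0;
    char *token = query;
    while (token != NULL && *token != '\0') {
        char *next = strchr(token, '&');
        if (next != NULL) *next++ = '\0';
        if (*token != '\0') {
            char *value = strchr(token, '=');
            if (value != NULL) *value++ = '\0';
            else value = token + strlen(token);  /* no '=': empty value */
            url_unescape(token);
            url_unescape(value);
            fn(token, value, ctx);
            count++;
        }
        token = next;
    }
    return count;
}

/* Example callback: remember the first value seen for one name. */
struct lookup { const char *name; char value[64]; int found; };

void lookup_cb(const char *name, const char *value, void *ctx)
{
    struct lookup *lk = ctx;
    if (!lk->found && strcmp(name, lk->name) == 0) {
        snprintf(lk->value, sizeof lk->value, "%s", value);
        lk->found = 1;
    }
}
```

To use it, copy the query string into a writable buffer, fill in a struct lookup with the name you want, and pass lookup_cb and the struct to parse_query(). The callback is also the natural place to test each decoded token against your own patterns.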

Humbly,

Andrew

----------------------------------------------------------------------
'Twas brillig, and the slithy toves                         Andrew Ho
  Did gyre and gimble in the wabe.                  andrew@???
  All mimsy were the borogoves,
  And the mome raths outgrabe.          http://www.zeuscat.com/andrew/
----------------------------------------------------------------------