Re: [pcre-dev] What is the best way to multi match

Author: Andrew Ho
Date:  
To: Ronen Hod
CC: PCRE Developers
Subject: Re: [pcre-dev] What is the best way to multi match
Hi Ronen,

>I am parsing an HTTP query-string ("s1&s2&...&sn"), and need to find
>which of the patterns (p1, p2, ..., pm) exist there.
>So far the best way that I found was to use the RegExp
>^(|.+&)p1($|&)(?C0)|^(|.+&)p2($|&)(?C0)|...|^(|.+&)pm($|&)(?C0)
>and remember the position of every "|" that follows the callout so I
>can identify them when I get the callout (using pcre_dfa_exec()). Does
>anybody have any better working solution for this problem?


To be honest, a regular expression is the wrong tool for parsing an
HTTP query string.

I would do the parsing by hand: split on '&' characters, then split
each token on '='. Whether you use a regex or parse by hand, you will
need to do URI unescaping (for example, "%61" to "a"). You can do the
manual parsing either with a simple state machine (at any given time
you are parsing either a name or a value), or with multiple calls to
strtok() or strtok_r().
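
A minimal sketch of the by-hand approach, under a few assumptions: the
function names url_unescape() and query_has_param() are made up for
illustration, only "%XX" escapes are decoded (a '+' is left alone), and
malformed escapes are copied through unchanged. It splits with strchr()
on a private copy instead of strtok(), so the caller's string is not
modified and empty tokens are not skipped.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit; the caller has already checked isxdigit(c). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decode "%XX" escapes in place. The result is never longer than the
 * input. Malformed escapes (e.g. "%zz") are copied through as-is. */
static void url_unescape(char *s)
{
    char *out = s;

    assert(s != NULL);
    while (*s != '\0') {
        if (s[0] == '%' && isxdigit((unsigned char)s[1])
                        && isxdigit((unsigned char)s[2])) {
            *out++ = (char)(hex_value((unsigned char)s[1]) * 16
                            + hex_value((unsigned char)s[2]));
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}

/* Return 1 if the query string has a parameter whose decoded name is
 * exactly `name`, else 0 (also 0 if memory cannot be allocated). */
int query_has_param(const char *query, const char *name)
{
    char *copy, *token;
    int found = 0;

    assert(query != NULL && name != NULL);
    copy = malloc(strlen(query) + 1);
    if (copy == NULL)
        return 0;
    strcpy(copy, query);

    token = copy;
    while (token != NULL && !found) {
        char *next = strchr(token, '&');  /* end of this "name=value" */
        char *eq;

        if (next != NULL)
            *next++ = '\0';
        eq = strchr(token, '=');          /* cut the value off, if any */
        if (eq != NULL)
            *eq = '\0';
        url_unescape(token);              /* unescape after splitting */
        found = (strcmp(token, name) == 0);
        token = next;
    }
    free(copy);
    return found;
}
```

To test for several names, call query_has_param() once per name, or
adapt the loop to check each decoded token against your whole list in a
single pass.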

Humbly,

Andrew

----------------------------------------------------------------------
'Twas brillig, and the slithy toves                         Andrew Ho
  Did gyre and gimble in the wabe.                  andrew@???
  All mimsy were the borogoves,
  And the mome raths outgrabe.          http://www.zeuscat.com/andrew/
----------------------------------------------------------------------