[pcre-dev] Run script during pattern matching in PCRE2

Author: Zoltán Herczeg
Date:
To: pcre-dev
Subject: [pcre-dev] Run script during pattern matching in PCRE2
Hi all,

I would like to announce a new feature in PCRE2: running scripts during pattern matching. Basically, this is an extension of the callout feature that adds string arguments. Imagine being able to run PHP, JavaScript, or QML scripts inside a regex.

In Perl, the regex /ab(?{ print "hello"; })/ matches the string "ab" and also prints hello. In PCRE2 you can do something similar with callouts: /ab(?C1)/. However, a callout takes only a numeric argument in the range 0-255, which is rather inconvenient in a scripting language, since you need an <id, function> map. Maintenance is difficult, especially if an id needs to change, because every pattern that uses it must be updated manually. This is no longer necessary: from now on, strings can be used instead of numbers. The PCRE2 form of the previous example is /ab(?C` print "hello"; `)/. This example uses ` as the delimiter. However, there are many scripting languages, and they assign different roles to different characters, so a large set of delimiters is available:

/ab(?C`code`)/
/ab(?C'code')/
/ab(?C"code")/
/ab(?C$code$)/
/ab(?C@code@)/
/ab(?C[code])/
/ab(?C{code})/

These patterns all represent the same regex. Feel free to use whichever delimiter is most convenient for you. If the code itself contains the delimiter character, just double it: print("hello world") can be encoded as /ab(?C" print(""hello world""); ")/
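
To make the doubling rule concrete, here is a minimal Python sketch of how a host language might build such a callout from a piece of script text. The helper names (encode_callout, decode_callout) and the CLOSERS table are hypothetical and not part of PCRE2. The sketch also assumes that for the bracket-style delimiters only the closing character needs to be doubled.

```python
# Hypothetical helpers, not part of PCRE2: they only build and parse pattern text.
# Maps each opening delimiter from the list above to its closing delimiter.
CLOSERS = {"`": "`", "'": "'", '"': '"', "$": "$", "@": "@", "[": "]", "{": "}"}


def encode_callout(code: str, opener: str = "`") -> str:
    """Wrap code in a string callout, doubling the closing delimiter inside it."""
    closer = CLOSERS[opener]
    # Assumption: only the closing delimiter has to be doubled.
    return "(?C" + opener + code.replace(closer, closer * 2) + closer + ")"


def decode_callout(item: str) -> str:
    """Recover the script text from a callout produced by encode_callout."""
    closer = CLOSERS[item[3]]
    # Strip "(?C" plus the opener at the front, and the closer plus ")" at the end.
    return item[4:-2].replace(closer * 2, closer)


pattern = "ab" + encode_callout(' print("hello world"); ', '"')
print(pattern)  # ab(?C" print(""hello world""); ")
```

Picking a delimiter that does not occur in the script at all avoids the doubling entirely, which is the main reason to have several delimiters to choose from.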

Regarding performance, using these embedded scripts too frequently is not recommended. However, it is usually still faster than splitting a pattern into multiple patterns and running them conditionally (based on a condition that the regex engine cannot evaluate).
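
For comparison, the split-pattern alternative mentioned above might look roughly like the following Python sketch. It uses Python's own re module rather than PCRE2, and the patterns, the is_known_user condition, and the match_split function are all made up for illustration.

```python
import re


# Hypothetical host-side condition that the regex engine cannot evaluate itself.
def is_known_user(name: str) -> bool:
    return name in {"alice", "bob"}


# One logical pattern, split in two at the point where the condition is checked.
HEAD = re.compile(r"user=(\w+);")
TAIL = re.compile(r"action=(\w+)")


def match_split(subject: str):
    """Match HEAD, check the condition in host code, then resume with TAIL."""
    head = HEAD.match(subject)
    if head is None or not is_known_user(head.group(1)):
        return None
    # Continue matching where the first pattern stopped.
    tail = TAIL.match(subject, head.end())
    if tail is None:
        return None
    return head.group(1), tail.group(1)


print(match_split("user=alice;action=login"))    # ('alice', 'login')
print(match_split("user=mallory;action=login"))  # None
```

Besides the extra match calls, a split like this cannot backtrack into the first pattern when the second one fails, whereas a callout keeps everything inside a single match.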

I hope everybody will like it, and that many scripting languages will implement support for this feature.

Regards,
Zoltan