Author: Zoltán Herczeg
Date:  
To: Pcre-dev
Subject: Re: [pcre-dev] Remove some restrictions of lookbehind assertions
 Hi,

> I was faced with a need of nonfixed length lookbehind two times:
> 1. when data came by stream of 24kB blocks and I need to find a last
>numeric in each of it
> /.{24000}(?<=(\d++)\D*+)/g


Even if this worked, the result would always be the last position of the subject, and that is probably not what you want.

/(?s).*(?<!\d)\K(?=(\d++))/

This gives the starting position of the decimal number by working backwards from the end of the input. It is also much faster than scanning [\G,n-1].

> 2. when I have a json-array file and want to find every top-level element
> that have "id" tag at any nested level
> /(\{(?:[^{}]++|(?1))*+\})(?<=\{"id":"(?>.*?").*)/g


Well, I would find the top-level elements first and then scan the nested levels. This is closer to how regexes are meant to work.
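
As a rough sketch of that two-step approach, assuming a programming language is available after all: the Python below uses the standard json module rather than a regex (a swapped-in technique), first taking the top-level elements and then scanning each one's nested levels. It treats any key named "id" as a hit, which is looser than the pattern quoted above. The names contains_id and top_level_with_id are hypothetical.

```python
import json


def contains_id(node) -> bool:
    """Step 2: scan the nested levels of one element for an "id" key."""
    if isinstance(node, dict):
        return "id" in node or any(contains_id(v) for v in node.values())
    if isinstance(node, list):
        return any(contains_id(v) for v in node)
    return False


def top_level_with_id(text: str) -> list:
    """Step 1: take the top-level elements, keep those that pass step 2."""
    return [element for element in json.loads(text) if contains_id(element)]


if __name__ == "__main__":
    sample = '[{"id":"a"}, {"x":{"y":[{"id":"b"}]}}, {"z":1}]'
    print(top_level_with_id(sample))
```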

> 2. There are cases where there is no programming language available for
> user, only regex. And exactly this case is in one of my application.


If you run into such issues frequently, I would recommend supporting some scripting language. There are many small and fairly quick JS engines that are easy to embed. If I remember correctly, Duktape and QuickJS are each just a single-file "application".

But if you really want [\G,n-1] searches, you could consider adding them using callouts. I have a fun project that combines pcre2 with shell commands, and it provides a lot of power:

https://github.com/zherczeg/pcresp

E.g. I can search for a time in the output and convert it to seconds with a Python print command in one step. You can see how callouts can be "exploited" to control matches.

Regards,
Zoltan