Re: [pcre-dev] I'm adding PCRE v2 support to Git. It's a bit…

Author: Zoltán Herczeg
Date:  
To: Ævar Arnfjörð Bjarmason
CC: pcre-dev
Subject: Re: [pcre-dev] I'm adding PCRE v2 support to Git. It's a bit slower than v1
Hi,

>I couldn't find out how to get the compile flags for those, but
>presumably it's some comparable middle-of-the-road value, probably
>-O2. I'll try compiling from svn & report back.


Likely -O2. I am surprised that JIT is enabled in default builds. I am sure it wasn't in the past.

>> This is a difficult question since I don't know the internals of git. Yes, /\b(?:PATTERN)\b/ could be used for checking full words unless the PATTERN has some exotic features like (*ACCEPT) control verb. The PCRE2_NO_AUTO_CAPTURE can be useful if you don't need capturing brackets. Callouts can be used for some extreme text searching.
>
>*Nod* will experiment with that.


PCRE is a generic text matching engine whose backtracking can be controlled, which allows complex text processing. For example:

/#.*(*SKIP)(*F)|abcd/

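This searches for abcd outside of newline-terminated # comments (as in bash or Python). The first alternative consumes a comment up to the end of the line; (*SKIP)(*F) then rejects that match and restarts the search after it, so only the second alternative can succeed. It is easy to extend this to C comments and C strings, so you can search for identifiers outside of comments and strings in C source code, which is sometimes very useful.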
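
For readers without PCRE at hand, here is a rough sketch of the same idea in Python. Python's standard re module has no (*SKIP)(*F), so this swaps in a different technique: the comment is matched by a first alternative that captures nothing, and only hits that fill the capture group are kept. The function name and the sample text are made up for illustration.

```python
import re

# Not (*SKIP)(*F): Python's re lacks backtracking control verbs.
# Instead, the comment alternative comes first and swallows the comment;
# only the second alternative fills capture group 1.
OUTSIDE_COMMENT = re.compile(r"#.*|(abcd)")


def find_outside_comments(text):
    """Return the start offsets of 'abcd' that are not inside a # comment."""
    return [
        m.start(1)
        for m in OUTSIDE_COMMENT.finditer(text)
        if m.group(1) is not None
    ]


if __name__ == "__main__":
    # Hypothetical sample: two real hits, two hits hidden in comments.
    sample = "abcd = 1  # abcd in a comment\nx = abcd\n# abcd\n"
    print(find_outside_comments(sample))
```

Like the PCRE pattern above, this treats every # as the start of a comment, so a # inside a string literal would need an extra alternative.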

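Atomic groups prevent backtracking:

/(?>\r\n|\r|\n){3}/

This searches for three consecutive newlines in a text, where a newline can be \r\n, \r, or \n. The \r\n alternative has to come first so that it is tried before a lone \r. Because the group is atomic, once \r\n has matched as one newline the engine cannot go back and split it into \r followed by \n. For example, this prevents matching \r\n\r, which is just two newlines, not three, but it still matches \n\r\r, which is three newlines.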
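
A small Python sketch that compares a plain group with an atomic one on these inputs. Python's re only gained (?>...) in version 3.11, so to stay portable this uses the usual emulation of an atomic group, a capturing lookahead followed by a backreference. That is an assumption of the sketch, not PCRE syntax. The alternatives are listed with \r\n first so that it is tried before a lone \r.

```python
import re

NEWLINE = r"\r\n|\r|\n"

# Plain non-capturing group: the engine may backtrack and re-split \r\n.
PLAIN = re.compile(r"(?:%s){3}" % NEWLINE)

# Emulated atomic group: the lookahead picks one newline and cannot be
# re-entered on backtracking; \1 then consumes exactly what it picked.
ATOMIC_LIKE = re.compile(r"(?:(?=(%s))\1){3}" % NEWLINE)


def has_three_newlines(text, atomic=True):
    """Report whether text contains three consecutive newlines."""
    pattern = ATOMIC_LIKE if atomic else PLAIN
    return pattern.search(text) is not None


if __name__ == "__main__":
    for sample in ("\r\n\r", "\n\r\r"):
        print(repr(sample),
              "plain:", has_three_newlines(sample, atomic=False),
              "atomic:", has_three_newlines(sample, atomic=True))
```

The plain group finds a match in \r\n\r by splitting \r\n into two newlines, while the atomic variant does not.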

I have a toy project (although I have used it several times), which is basically PCRE in bash:

https://github.com/zherczeg/pcresp

This demonstrates the power of callouts. You can do things like searching a text for paths that exist in the current file system, searching for IP addresses that are reachable, and so on.
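
To give a feel for the path example without PCRE, here is a loose Python approximation. It is not a real callout and not how pcresp works: a callout runs during matching and can make the engine backtrack, whereas this sketch only filters complete matches afterwards with an ordinary function. The candidate pattern and all names are assumptions made for illustration.

```python
import os
import re

# Assumed, deliberately crude pattern for "something that looks like an
# absolute path": a slash followed by non-space, non-quote characters.
PATH_CANDIDATE = re.compile(r"/[^\s'\"]+")


def find_matching(text, pattern, accept):
    """Return the matches of pattern in text that the accept() predicate keeps."""
    return [m.group(0) for m in pattern.finditer(text) if accept(m.group(0))]


def existing_paths(text):
    """Return path-like substrings that exist in the current file system."""
    return find_matching(text, PATH_CANDIDATE, os.path.exists)


if __name__ == "__main__":
    # Hypothetical input; the result depends on the machine this runs on.
    print(existing_paths("config in /etc/hosts, junk in /no/such/file"))
```

Swapping os.path.exists for another predicate gives the other variations, such as a function that checks whether an address answers.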

Regards,
Zoltan