[pcre-dev] [Bug 2182] Lookahead behaving as though a match …

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2182] Lookahead behaving as though a match succeeded in a null-matching repeated group
https://bugs.exim.org/show_bug.cgi?id=2182

--- Comment #6 from John Tattarakis <tattarakis@???> ---
That's great to hear! Just to give you an idea of what you'd be enabling, here
is an example I was tinkering with a while ago:

^(?:(?=(\1.|)(.))){1,1337}?(?!.*\2.*\2)\1\K\2

This matches a character that occurs only once in the subject, sort of like a
hypothetical "(.)(?<!\1.+)(?!.*\1)". I have other, more complex patterns that
employ this (very limited) trick, such as "match the character that appears
with the greatest frequency" and "match the longest/shortest word in the
subject", which it would be great to be able to solve more completely. Yes,
it's ill-advised to perform such tasks with regex in the real world, but I
hardly think it would be the first time you have introduced a feature that
fostered creativity bordering on insanity :)
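
For comparison, here is a rough sketch in plain Python of what I read that
pattern as trying to compute: the first character that occurs exactly once in
the subject. The function name is made up for illustration, and unlike "." in
the pattern it makes no special case for newlines, so treat it as a statement
of intent rather than an exact equivalent.

```python
from collections import Counter


def first_unique_char(subject):
    """Return (index, char) for the first character that occurs exactly once.

    Returns None when every character in the subject is repeated.
    """
    counts = Counter(subject)  # one pass to count every character
    for index, ch in enumerate(subject):
        if counts[ch] == 1:
            return index, ch
    return None
```

This is two linear passes, which is the sort of thing the regex version has to
emulate by growing \1 one character at a time.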

While most people would only use this feature to perform simple iterations
through the subject, mostly to overcome the atomic nature of lookaheads (a task
with time complexity O(n^2), if I'm not mistaken?), some more nefarious users
may try to exploit a basic safeguard such as "make sure at least one
backreference has changed in value". A pattern such as
"(?:(?=((?(?=\1(?<=^.))..|.))))+" alternately captures the same 1- and
2-character substrings, so it would pass that check on every iteration and
never terminate. Perhaps a hard limit on iteration and/or a more sophisticated
safeguard to detect cycles, such as "make sure the overall state of
backreferences is not a duplicate of a past state", is called for.
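
To make the two safeguards concrete, here is a toy model in Python. It has
nothing to do with how PCRE is implemented internally: the "state" is just a
tuple of captured strings, run_with_cycle_guard and alternating_step are names
I made up, and the default limit of 1337 is only borrowed from my pattern
above. The step function mimics the alternating pattern by capturing two
characters when group 1 currently holds one character, and one character
otherwise.

```python
def run_with_cycle_guard(step, initial_state, max_iterations=1337):
    """Iterate step() until a capture state repeats or the hard limit is hit.

    Returns ("cycle", n) if the state after iteration n was seen before,
    or ("limit", max_iterations) if no repeat occurred within the limit.
    """
    seen = {initial_state}
    state = initial_state
    for iteration in range(1, max_iterations + 1):
        state = step(state)
        if state in seen:  # duplicate of a past state: stop iterating
            return "cycle", iteration
        seen.add(state)
    return "limit", max_iterations


def alternating_step(subject):
    """Toy stand-in for the pattern that flips between 1 and 2 characters."""
    def step(state):
        (group1,) = state
        # Capture 2 characters if group 1 currently holds exactly 1, else 1.
        width = 2 if group1 is not None and len(group1) == 1 else 1
        return (subject[:width],)
    return step


result = run_with_cycle_guard(alternating_step("abc"), (None,))
```

Each iteration here does change the value of group 1, so a "something
changed" check alone would let it run forever, whereas the duplicate-state
check stops it as soon as a capture state comes around a second time.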

Anyway, I'm thinking aloud at this point. Thanks again, and thank you for
taking such a hands-on approach to maintaining this brilliant bit of software.

- John
