https://bugs.exim.org/show_bug.cgi?id=2182
--- Comment #6 from John Tattarakis <tattarakis@???> ---
That's great to hear! Just to give you an idea of what you'd be enabling, here
is an example I was tinkering with a while ago:
^(?:(?=(\1.|)(.))){1,1337}?(?!.*\2.*\2)\1\K\2
This matches a character that occurs exactly once in the subject, sort of like
a hypothetical "(.)(?<!\1.+)(?!.*\1)". I have other, more complex patterns that
employ this (very limited) trick, such as "match the character that appears
with the greatest frequency" and "match the longest/shortest word in the
subject", which I would love to be able to solve more completely. Yes, it's
ill-advised to perform such tasks with regex in the real world, but I hardly
think it would
be the first time you have introduced a feature that fostered creativity
bordering on insanity :)
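
For comparison, the task that pattern is described as solving can be sketched
outside of regex. This is only a reference for the expected result, not a model
of how the regex engine gets there. The function name first_unique_char is
made up for illustration, and returning the first such character is an
assumption based on the lazy {1,1337}? quantifier trying the fewest iterations
first.

```python
from collections import Counter


def first_unique_char(subject):
    """Return (index, char) for the first character that occurs exactly once.

    Returns None when every character in the subject is repeated.
    Hypothetical helper: it only mirrors the stated goal of the pattern.
    """
    counts = Counter(subject)  # occurrences of each character in the subject
    for index, char in enumerate(subject):
        if counts[char] == 1:
            return index, char
    return None
```

Calling first_unique_char("aabcb") picks out the "c" at index 3, and a subject
such as "aabb" yields None because no character is unique.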
While most people would only use this feature to perform simple iterations
through the subject, mostly to overcome the atomic nature of lookaheads (a task
with time complexity O(n^2), if I'm not mistaken?), some more nefarious users
may try to exploit a basic safeguard such as "make sure at least one
backreference has changed in value". A pattern such as
"(?:(?=((?(?=\1(?<=^.))..|.))))+" alternately captures the same 1- and
2-character substrings, so it would pass that check on every iteration and
never terminate. Perhaps a hard limit on iteration and/or a more sophisticated
safeguard to detect cycles, such as "make sure the overall state of
backreferences is not a duplicate of a past state", is called for.
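
The two safeguards can be compared with a small simulation. This is a rough
sketch in Python and not a description of how PCRE works internally:
alternating_step is a hand-written model of the capture behaviour of the
pattern quoted above (group 1 alternating between a 1- and a 2-character
prefix), and iterate_with_guards, its parameters, and its return values are
all hypothetical names chosen for illustration.

```python
def alternating_step(subject, captures):
    """Model one iteration of the quoted pattern at offset 0.

    Simplifying assumption: if group 1 currently holds one character,
    the next capture is two characters; otherwise it is one character.
    """
    (previous,) = captures
    if previous is not None and len(previous) == 1:
        return (subject[:2],)
    return (subject[:1],)


def iterate_with_guards(subject, step, max_iterations=1000, detect_cycles=True):
    """Repeat step() until a safeguard fires; return (reason, iteration)."""
    captures = (None,)  # group 1 starts out unset
    seen = {captures}   # every capture state observed so far
    for iteration in range(1, max_iterations + 1):
        new_captures = step(subject, captures)
        # Basic safeguard: stop if no backreference changed in value.
        if new_captures == captures:
            return "unchanged", iteration
        # Stronger safeguard: stop if this overall state was seen before.
        if detect_cycles and new_captures in seen:
            return "cycle", iteration
        seen.add(new_captures)
        captures = new_captures
    # Hard limit on iteration, reached only if no other safeguard fired.
    return "limit", max_iterations
```

With the subject "abc", the state check stops on the third iteration, when the
capture returns to "a". With detect_cycles=False the basic check never fires
and only the hard limit ends the loop. The cost of the stronger check is that
the set of past states grows with the number of iterations.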
Anyway, I'm thinking aloud at this point. Thanks again, and thank you for
taking such a hands-on approach to maintaining this brilliant bit of software.
- John