http://bugs.exim.org/show_bug.cgi?id=1372
Summary: We have OR (alternates). How about AND and NOT?
Product: PCRE
Version: N/A
Platform: All
OS/Version: All
Status: NEW
Severity: wishlist
Priority: low
Component: Code
AssignedTo: ph10@???
ReportedBy: sxn02@???
CC: pcre-dev@???
Hi All,
I work for a company that receives text submitted by its customers, and we try
to make sure it is not illegal, offensive, etc. before posting it on our
website. A good part of this work is done automatically by "filters": regular
expressions checking that all is well before publication.
When I submit a filter, my work environment is just an interface for recording
the regex; I have no access to the code that executes it.
I have often found myself needing to look for several fragments of text before
deciding whether to approve a submission. If the subject text contains
expression 1 AND expression 2 AND expression 3 AND expression 4 (in any order),
then I have to act. Other times, a match is considered positive if two out of
three expressions match.
I usually solve the first case with something like this:
(?=.*expr1)(?=.*expr2)(?=.*expr3).*expr4
and, if possible, anchor the whole thing at the beginning of my regex with ^.
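A minimal sketch of the AND pattern above, using Python's re module (its
lookahead syntax matches PCRE's for these constructs). The words alpha, beta,
gamma and delta are hypothetical stand-ins for expr1..expr4. The (?s) flag is an
addition not in the original pattern: without it "." does not match a newline,
so expressions on different lines of a multi-line submission would be missed.
The last pattern shows the same trick with a negative lookahead, (?!.*expr),
which is one common way to express NOT today.

```python
import re

# Each (?=.*exprN) scans ahead from the start without consuming anything;
# the final .*delta consumes. ^ stops retries at every later start position,
# and (?s) lets .* cross line breaks in multi-line submissions.
ALL_OF = re.compile(r"(?s)^(?=.*alpha)(?=.*beta)(?=.*gamma).*delta")

# NOT via a negative lookahead: requires alpha and beta, forbids "spam"
# ("spam" is a hypothetical placeholder).
AND_NOT = re.compile(r"(?s)^(?=.*alpha)(?!.*spam).*beta")


def needs_action(text: str) -> bool:
    """True when all four placeholder expressions occur, in any order."""
    return ALL_OF.search(text) is not None


print(needs_action("delta then gamma, beta and alpha"))  # True
print(needs_action("alpha beta gamma"))                  # False (no delta)
print(needs_action("alpha\nbeta\ngamma\ndelta"))        # True, thanks to (?s)
print(bool(AND_NOT.search("beta and alpha")))           # True
print(bool(AND_NOT.search("beta, alpha and spam")))     # False
```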
The second case looks similar to this:
(?:(?=.*expr1).*expr2|(?=.*expr2).*expr3|(?=.*expr1).*expr3)
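A minimal sketch of the two-out-of-three pattern, again in Python's re with
placeholder words. k_of_n_pattern and count_matches are hypothetical helpers,
not part of PCRE: the first builds one alternative per k-element subset (each
sub-expression is wrapped in (?:...) so an inner | cannot leak out), and the
second is a plain counting check used only to confirm the regex behaves as
intended. The number of alternatives is C(n, k), which is why this style gets
long and slow as n grows.

```python
import re
from itertools import combinations

# The hand-written pattern from the text, with placeholder words.
TWO_OF_THREE = re.compile(
    r"(?s)^(?:(?=.*alpha).*beta|(?=.*beta).*gamma|(?=.*alpha).*gamma)"
)


def k_of_n_pattern(exprs: list[str], k: int) -> str:
    """Hypothetical helper: one alternative per k-element subset of exprs.

    Lookaheads cover the first k-1 members and the last member is consumed,
    mirroring the hand-written pattern above.
    """
    alternatives = []
    for combo in combinations(exprs, k):
        lookaheads = "".join(f"(?=.*(?:{e}))" for e in combo[:-1])
        alternatives.append(f"{lookaheads}.*(?:{combo[-1]})")
    return r"(?s)^(?:" + "|".join(alternatives) + ")"


def count_matches(text: str, exprs: list[str]) -> int:
    """Reference check: how many of the expressions occur anywhere in text."""
    return sum(1 for e in exprs if re.search(e, text, re.S))


exprs = ["alpha", "beta", "gamma"]
generated = re.compile(k_of_n_pattern(exprs, 2))
for text in ["alpha and gamma", "only beta here", "gamma, beta"]:
    print(bool(TWO_OF_THREE.search(text)),
          bool(generated.search(text)),
          count_matches(text, exprs) >= 2)
# True True True
# False False False
# True True True
```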
The attachment describes an alternative approach. Its potential benefits are
better efficiency (in my second example above, if the subject contains expr1
and expr3, the engine tries all three alternatives before finding that out; in
the proposed solution you'll see that this is not necessary), flexibility in
what is and is not matched, capturing for backreferences, and a much simpler
regex to read.
I hope this will be found useful.
Regards,
Sorin Schwimmer,
in Toronto, Canada