[pcre-dev] [Bug 1372] New: We have OR (alternates). How abou…

Author: Sorin
Date:  
To: pcre-dev
New-Topics: [pcre-dev] [Bug 1372] We have OR (alternates). How about AND and NOT?
Subject: [pcre-dev] [Bug 1372] New: We have OR (alternates). How about AND and NOT?
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1372
           Summary: We have OR (alternates). How about AND and NOT?
           Product: PCRE
           Version: N/A
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: wishlist
          Priority: low
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: sxn02@???
                CC: pcre-dev@???



Hi All,

I work for a company that receives text submitted by its customers, and
we try to make sure it is not illegal, offensive, etc. before posting it on our
website. A good part of this work is done automatically by "filters" - regular
expressions that check that all is well before publication.

When I submit a filter, my work environment is just an interface for recording
the regex; I have no access to the executing code.

I have often had to look for several fragments of text before deciding whether
to approve a submission. If the subject text matches expression 1, AND
also expression 2, AND expression 3, AND expression 4 (in any order), then I
have to act. Sometimes, instead, a match counts as positive if any two out of
three expressions match.

I usually solve the first case with something like this:
(?=.*expr1)(?=.*expr2)(?=.*expr3).*expr4
and, if possible, I anchor the whole thing at the beginning of my regex.
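
For illustration, here is a minimal sketch of that "all of them, in any order" pattern using Python's re module. The fragment strings, the all_of helper, and the choice of re.DOTALL are assumptions made for the example; they are not part of the filters described here.

```python
import re

# Hypothetical literal fragments standing in for expr1..expr4.
fragments = ["alpha", "beta", "gamma", "delta"]

def all_of(exprs):
    """Build ^(?=.*e1)(?=.*e2)...(?=.*e_{n-1}).*e_n from a list of sub-patterns."""
    *heads, last = exprs
    lookaheads = "".join(f"(?=.*{e})" for e in heads)
    # re.DOTALL is an assumption: it lets '.' cross line breaks in the submitted text.
    return re.compile("^" + lookaheads + ".*" + last, re.DOTALL)

AND_FILTER = all_of(fragments)

# All four fragments are present, in a different order: the filter matches.
print(bool(AND_FILTER.search("delta then gamma, beta and alpha")))  # True
# "delta" is missing: the filter does not match.
print(bool(AND_FILTER.search("alpha beta gamma")))  # False
```

Each lookahead scans from the anchored start without consuming text, which is why the order of the fragments in the subject does not matter.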

The second case looks similar to this:
(?:(?=.*expr1).*expr2|(?=.*expr2).*expr3|(?=.*expr1).*expr3)
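
As a rough sketch, the "two out of three" pattern can be generated from the list of expressions, again in Python. The fragment strings and the at_least_two helper are hypothetical, and the leading ^ anchor and re.DOTALL flag are assumptions added for the example. The alternates come out in a different order than in the hand-written regex, but they cover the same three pairs.

```python
import re
from itertools import combinations

# Hypothetical literal fragments standing in for expr1..expr3.
fragments = ["alpha", "beta", "gamma"]

def at_least_two(exprs):
    """Build one alternate '(?=.*a).*b' for every pair (a, b) of sub-patterns."""
    alternates = [f"(?=.*{a}).*{b}" for a, b in combinations(exprs, 2)]
    # The ^ anchor and re.DOTALL are assumptions, not part of the original regex.
    return re.compile("^(?:" + "|".join(alternates) + ")", re.DOTALL)

TWO_OF_THREE = at_least_two(fragments)

print(bool(TWO_OF_THREE.search("gamma comes before alpha")))  # True
print(bool(TWO_OF_THREE.search("only beta here")))  # False
```

The number of alternates grows with the number of pairs, so this construction gets unwieldy quickly for longer lists of expressions.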

The attachment describes an alternative approach that is potentially more
efficient. In my second example above, if the text contains expr1 and expr3, the
engine tries all three alternates before finding that out; in the proposed
solution you'll see that this is not necessary. The proposal also offers
flexibility in what is matched and what is not, backreference capturing, and a
tremendous visual simplification of the regex.

I hope this will be found useful.

Regards,
Sorin Schwimmer,
in Toronto, Canada


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email