[pcre-dev] [Bug 1368] New: Easier to use function for multi-…

Author: Sébastien Wilmet
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1368] New: Easier to use function for multi-segment matching
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1368
           Summary: Easier to use function for multi-segment matching
           Product: PCRE
           Version: N/A
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: wishlist
          Priority: medium
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: swilmet@???
                CC: pcre-dev@???



There are several ways to do multi-segment matching, depending on which of
these are combined:
- the option PCRE_PARTIAL_HARD or PCRE_PARTIAL_SOFT
- the function pcre_exec() or pcre_dfa_exec()

As the pcrepartial manpage explains, there are some issues. Matching on the
full subject string can give a different result than matching segment by
segment. It's difficult to work around all the issues, and there are perhaps
other issues not listed in the manpage.

Therefore it would be nice to have a higher-level function to do multi-segment
matching.

Some thoughts: currently, when doing a partial match, only a few pieces of
information are returned when the end of the subject string is encountered: the
offsets of the partial match (there may be other things returned; I didn't
look at the API in detail). To continue the matching with the next segment, only
the previous offsets are used. These offsets are not sufficient if we want
exactly the same result as matching on the full subject string.

So a general idea is to be able to stop the matching algorithm when the end of
the string is encountered, and return a private data structure that contains all
the information required to resume the algorithm where it stopped. But this may
be really difficult to implement, I don't know.
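
To illustrate the shape of such an API, here is a minimal sketch. It is not
PCRE code and does not handle regular expressions: it only searches for a
literal string (using the Knuth-Morris-Pratt algorithm), and the names
seg_state, seg_init() and seg_feed() are hypothetical, invented for this
example. The point is only that the caller keeps an opaque state between
segments, and that the result is the same as matching on the full subject.

```c
#include <stddef.h>
#include <string.h>

#define SEG_MAX_NEEDLE 64

/* Hypothetical resumable-match state; the caller treats it as opaque. */
typedef struct {
    const char *needle;            /* literal to search for (not a regex) */
    size_t needle_len;
    size_t fail[SEG_MAX_NEEDLE];   /* KMP failure table */
    size_t matched;                /* needle characters matched so far */
    size_t consumed;               /* total bytes seen over all segments */
} seg_state;

/* Prepare the state. Returns 0 on success, or -1 if the needle is empty
 * or longer than SEG_MAX_NEEDLE. The needle must outlive the state. */
int seg_init(seg_state *st, const char *needle)
{
    size_t n = strlen(needle);
    size_t k = 0;

    if (n == 0 || n > SEG_MAX_NEEDLE)
        return -1;

    st->needle = needle;
    st->needle_len = n;
    st->matched = 0;
    st->consumed = 0;

    /* Standard KMP failure table construction. */
    st->fail[0] = 0;
    for (size_t i = 1; i < n; i++) {
        while (k > 0 && needle[i] != needle[k])
            k = st->fail[k - 1];
        if (needle[i] == needle[k])
            k++;
        st->fail[i] = k;
    }
    return 0;
}

/* Feed the next segment. Returns the offset of the first match, counted
 * from the start of the whole subject, or -1 if the end of this segment
 * was reached without a match and more input is needed. This sketch only
 * reports the first match; call seg_init() again to start a new search. */
long seg_feed(seg_state *st, const char *seg, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        /* Fall back through the failure table on a mismatch. */
        while (st->matched > 0 && seg[i] != st->needle[st->matched])
            st->matched = st->fail[st->matched - 1];
        if (seg[i] == st->needle[st->matched])
            st->matched++;
        st->consumed++;
        if (st->matched == st->needle_len)
            return (long)(st->consumed - st->needle_len);
    }
    return -1;
}
```

For example, searching for "abab" in the segments "xxab", "a" and "bab" gives
-1, -1 and then 2, which is the offset found when searching the full subject
"xxababab" in one go. A real implementation for PCRE would need to save much
more than this (backtracking points, capture groups, lookbehind context),
which is probably where the difficulty lies.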


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email