[pcre-dev] Multisegment matching with pcre_exec()

Author: ND
Date:  
To: Pcre-dev
Subject: [pcre-dev] Multisegment matching with pcre_exec()
Hi, Philip!

What do you think about adding the following behavior to PCRE?

pcre_exec() returns a new code, PCRE_ERROR_MULTISEGMENT, and abandons matching immediately if, at any time during the matching process, PCRE needs to check the next character of the subject string (as opposed to merely bumping along to it) but finds the end of the string instead. An extra value, last_bumpalong_offset, is returned as well.

IMHO, this would make true multisegment matching possible, because it respects the expected order of operations. For example, we can be sure that 'a.{0,10}c|b' finds 'a--b--c' first when the string is split across two consecutive segments, 'a--b-' and '-c'. This cannot be achieved with PCRE_PARTIAL.
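
To make the intended caller loop concrete, here is a rough sketch in C. It does not use PCRE at all: toy_exec() is a hand-coded stand-in that matches only the pattern 'a.{0,10}c|b', and every name in it (toy_exec, match_segments, the TOY_* codes, the "more" flag) is hypothetical. The "more" flag is my assumption for how the caller would say whether further segments are expected; with it cleared, the end of the buffer is treated as the real end of the subject.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical return codes; TOY_MULTISEGMENT plays the role of the
   proposed PCRE_ERROR_MULTISEGMENT. None of this is real PCRE API. */
enum { TOY_MATCH = 0, TOY_NOMATCH = -1, TOY_MULTISEGMENT = -2, TOY_OVERFLOW = -3 };

/* Toy matcher for 'a.{0,10}c|b' only. If more != 0 and the matcher has to
   inspect a character beyond the buffer, it abandons the attempt and reports
   the current start position in *last_bumpalong. */
static int toy_exec(const char *s, int len, int more,
                    int *start, int *end, int *last_bumpalong)
{
    for (int i = 0; i < len; i++) {          /* bumpalong loop */
        *last_bumpalong = i;
        if (s[i] == 'a') {
            int k = 0;                       /* characters taken by .{0,10} */
            while (k < 10) {                 /* greedy: take as many as possible */
                if (i + 1 + k == len) {
                    if (more) return TOY_MULTISEGMENT;
                    break;                   /* real end of subject */
                }
                if (s[i + 1 + k] == '\n') break;   /* '.' does not match newline */
                k++;
            }
            for (; k >= 0; k--) {            /* backtrack, looking for 'c' */
                if (i + 1 + k == len) {
                    if (more) return TOY_MULTISEGMENT;
                    continue;
                }
                if (s[i + 1 + k] == 'c') {
                    *start = i;
                    *end = i + 2 + k;
                    return TOY_MATCH;
                }
            }
        }
        if (s[i] == 'b') {                   /* second alternative */
            *start = i;
            *end = i + 1;
            return TOY_MATCH;
        }
    }
    return TOY_NOMATCH;
}

/* Caller side: feed segments one by one, keeping only the tail of the
   buffer that starts at last_bumpalong when the matcher asks for more. */
static int match_segments(const char **segs, int nsegs, char *out, size_t outsz)
{
    char buf[256];
    int len = 0;

    if (outsz == 0) return TOY_OVERFLOW;
    for (int n = 0; n < nsegs; n++) {
        int seglen = (int)strlen(segs[n]);
        if (len + seglen > (int)sizeof buf) return TOY_OVERFLOW;
        memcpy(buf + len, segs[n], (size_t)seglen);
        len += seglen;

        int start = 0, end = 0, bump = 0;
        int more = (n + 1 < nsegs);          /* last segment: no more data */
        int rc = toy_exec(buf, len, more, &start, &end, &bump);

        if (rc == TOY_MATCH) {
            size_t mlen = (size_t)(end - start);
            if (mlen >= outsz) mlen = outsz - 1;
            memcpy(out, buf + start, mlen);
            out[mlen] = '\0';
            return TOY_MATCH;
        }
        if (rc == TOY_MULTISEGMENT) {
            assert(bump >= 0 && bump <= len);
            memmove(buf, buf + bump, (size_t)(len - bump));  /* keep the tail */
            len -= bump;
        } else {
            len = 0;                         /* nothing worth carrying over */
        }
    }
    return TOY_NOMATCH;
}
```

With the segments 'a--b-' and '-c', the first call is abandoned at start offset 0 instead of reporting 'b', so the second call sees the whole of 'a--b--c'. If 'a--b-' is the only segment, the same loop reports 'b'.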

This behavior could be activated by a PCRE_MULTISEGMENT option for pcre_exec(). Using this option would automatically disable some optimizations, but it would give the user control over streamed data (not optimal, but complete) without revolutionary changes to the PCRE engine.

Regards, Michael.