On Mon, 30 Aug 2010, I wrote:
> I am not promising anything, and I do not think I will be working on
> PCRE for a while, but I will bear this in mind.
I have now got round to looking at this issue in depth. At first, I
started to implement a new PCRE option, but after thinking some more
about it, I decided that you were right: it should be the default way of
working when PCRE_PARTIAL_HARD is set. I have just committed patches
that implement the following two changes to partial matching:
4. A partial match never returns an empty string (because you can
always match an empty string at the end of the subject). However, the
check for an empty string used to start at the "start of match"
point. This has been changed to the "earliest inspected character"
point, because the returned data for a partial match starts at this
character. This means that, for example, /(?<=abc)def/ gives a
partial match for the subject "abc" (previously it gave "no match").
5. Changes have been made to the way PCRE_PARTIAL_HARD affects the
matching of $, \z, \Z, \b, and \B. If the match point is at the end
of the string, previously a full match would be given. However,
setting PCRE_PARTIAL_HARD has an implication that the given string
is incomplete (because a partial match is preferred over a full
match). For this reason, these items now give a partial match in
this situation. [Aside: previously, the one case /t\b/ matched
against "cat" with PCRE_PARTIAL_HARD set did return a partial match
rather than a full match, which was wrong by the old rules, but is
now correct.]
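As a rough illustration of the two rules, here is a toy model in C. It
is only a sketch and is not taken from the PCRE source: every name in
it is made up for this example, it models \z only (not $, \Z, \b or
\B), and it assumes that at least one character has been inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome codes for this sketch; not PCRE's return values. */
typedef enum { NO_MATCH, FULL_MATCH, PARTIAL_MATCH } outcome;

/* Change 4, old rule: a partial match candidate counted as empty when
   nothing lay between the "start of match" point and the end. */
int partial_is_empty_old(size_t start_of_match, size_t end)
{
    assert(start_of_match <= end);
    return end == start_of_match;
}

/* Change 4, new rule: measure from the earliest inspected character,
   which can lie before the start of match when a lookbehind is used. */
int partial_is_empty_new(size_t earliest_inspected, size_t end)
{
    assert(earliest_inspected <= end);
    return end == earliest_inspected;
}

/* Change 5, old rule: \z at the end of the subject was a full match. */
outcome end_assertion_old(size_t pos, size_t len)
{
    assert(pos <= len);
    return (pos == len) ? FULL_MATCH : NO_MATCH;
}

/* Change 5, new rule: with the partial-hard flag set, the subject is
   taken to be incomplete, so reaching its end yields a partial match. */
outcome end_assertion_new(size_t pos, size_t len, int partial_hard)
{
    assert(pos <= len);
    if (pos < len)
        return NO_MATCH;   /* \z holds only at the very end */
    return partial_hard ? PARTIAL_MATCH : FULL_MATCH;
}
```

For /(?<=abc)def/ against "abc", the start of match is offset 3, the
earliest inspected character is offset 0, and the end is offset 3. The
old check sees an empty string (3 to 3) and rejects the partial match;
the new check sees "abc" (0 to 3) and allows it.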
I still have a number of other issues to look at before working towards
a new release. It will probably be getting on for Christmas before I
have dealt with all of them. Meanwhile, if you want to try the new code,
you can check out the working files using this command:
    svn co svn://vcs.exim.org/pcre/code/trunk pcre
If you have any comments on these changes, now is a good time to make
them!
Philip
--
Philip Hazel