Re: [pcre-dev] PCRE feature request - "backwards" search

Author: M.Taylor
Date:  
To: pcre-dev
Old-Topics: Re: [pcre-dev] PCRE feature request
Subject: Re: [pcre-dev] PCRE feature request - "backwards" search

hi philip,
sorry about this, but...
within 60 secs of pressing "send" on my previous email, I realised
there might be a problem with your suggestion...
on a standard forwards search, after my editor has found a match, it
uses the end-point of the first match as a starting point for a repeat
search, so that the user does not get multiple matches within the
already matched substring.
I would want the same behaviour when going backwards, so I actually
need two things from pcre_exec() (not one as I originally suggested):
1. the option to return the last (most "right-hand") match instead of
the first; *and*
2. return the left-hand (or "start") point of the matching substring,
in addition to the right-hand (or "end") point of the match.
I am assuming here that I can "lie" about the real length of the
subject string to limit the "right-hand" end-point of the subject
string that pcre_exec() searches.
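
A minimal sketch of the repeat-backwards behaviour described above, written
in Python's re module as a stand-in for PCRE (the function name
search_backwards is hypothetical, not part of any library). It finds the
last non-overlapping match below a right-hand limit by scanning forwards,
reports both its start and end offsets, and then uses the start offset as
the new limit, which is the "lie about the length" idea. In PCRE itself,
pcre_exec() already reports both offsets of the whole match in ovector[0]
and ovector[1].

```python
import re

def search_backwards(pattern, subject):
    """Yield (start, end) spans from right to left, never overlapping.

    Hypothetical helper: the right-hand limit plays the role of a
    shortened subject length passed to the matcher.
    """
    limit = len(subject)
    while limit >= 0:
        last = None
        # Scan forwards within subject[0:limit] and keep the final match.
        for m in pattern.finditer(subject, 0, limit):
            last = m
        if last is None:
            return
        yield last.span()
        # The next search must end where this match started.
        new_limit = last.start()
        if new_limit == limit:
            # Empty match sitting exactly on the limit: force progress.
            new_limit -= 1
        limit = new_limit

if __name__ == "__main__":
    for span in search_backwards(re.compile(r"a\d+b"), "a1b a22b a333b"):
        print(span)
```

One caveat with truncating the subject this way: anchors such as $ and any
lookahead see the limit as the end of the string, so a pattern can match
differently than it would against the full text. Rescanning from the start
on every step is also quadratic in the worst case, which is fine for a
sketch but not for large buffers.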
regards,
mark.
Philip Hazel said the following on 17/09/2010 15:54:

On Thu, 16 Sep 2010, M.Taylor wrote:



I would really like to provide backwards RE searches, and to do this I
would need a change made to PCRE (or at least I think I would - please
correct me if I am wrong here).


...


But all I need is this: given a string to be searched and RE to search
for, then return the last match instead of the first. That's it.


There are two possible interpretations of "backwards searches". As well
as the one you have chosen, some people have thought of a totally
backwards search process. For example, for a regex such as "a\d+b" (that
is, "a" followed by one or more digits, followed by "b"), they think of
scanning the string from right to left, looking for "b", followed by one
or more digits, followed by "a". In many cases, that might be the same
thing, but one can construct examples where it isn't.
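
To make the difference concrete, here is a small sketch in Python's re
module (used as a stand-in, not PCRE). It simulates the right-to-left
interpretation by reversing the subject and using a hand-reversed pattern,
which is an assumption that only works for simple patterns like these. The
helper names are hypothetical.

```python
import re

def last_forward(regex, subject):
    """Last non-overlapping match found by an ordinary left-to-right scan."""
    last = None
    for m in re.finditer(regex, subject):
        last = m
    return last.span() if last else None

def first_right_to_left(reversed_regex, subject):
    """First match when scanning from the right, simulated by reversal.

    reversed_regex must be the pattern written backwards by hand.
    """
    m = re.search(reversed_regex, subject[::-1])
    if m is None:
        return None
    n = len(subject)
    # Map offsets in the reversed string back onto the original.
    return (n - m.end(), n - m.start())

if __name__ == "__main__":
    # Same answer for the regex discussed above.
    print(last_forward(r"a\d+b", "a1b a22b"), first_right_to_left(r"b\d+a", "a1b a22b"))
    # Different answers for "aa" in "aaa".
    print(last_forward(r"aa", "aaa"), first_right_to_left(r"aa", "aaa"))
```

For "aa" in "aaa", the forward scan consumes the first two characters and
finds nothing more, while the right-to-left scan matches the last two.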

That was just a comment. As to your actual question:

I have had the feature "find the last occurrence of x in a line" in *my*
text editor for a long time, and it works with regex (using PCRE,
naturally). I wrote the code so long ago (it's many years since it was
last modified) that I have forgotten how it works, and will now have to
go and read the code...

Hmm... how cunning (though I say so myself). If the regex is <REGEX>, it
transforms it into .*<REGEX> before compiling and matching. What this
does is to make pcre_exec() zip to the end of the string (because of the
.*), and then come backwards along the string, looking for a match.
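
The transformation can be sketched as follows, using Python's re module as a
stand-in for PCRE (the editor's real code is not shown in this thread, so
this is an illustration under assumptions, and find_last is a hypothetical
name). The original regex is wrapped in a capturing group so that its start
and end offsets can be read back separately from the leading .* part, and an
optional limit shortens the subject on the right.

```python
import re

def find_last(regex, subject, limit=None):
    """Return (start, end) of the rightmost-starting match, or None.

    Assumes regex has no leading global inline flags and no numbered
    backreferences, since wrapping it shifts group numbers by one.
    """
    if limit is None:
        limit = len(subject)
    # (?s:.*) lets the greedy prefix run across newlines to the limit;
    # the engine then backtracks leftwards until the group matches.
    wrapped = re.compile(r"(?s:.*)(" + regex + r")")
    m = wrapped.match(subject, 0, limit)
    return m.span(1) if m else None

if __name__ == "__main__":
    text = "a1b xx a22b yy"
    print(find_last(r"a\d+b", text))            # whole subject
    print(find_last(r"a\d+b", text, limit=7))   # stop before the second match
    print(find_last(r"\d+", "abc 123"))         # see note below
```

Note that the greedy .* leaves as little as possible for the wrapped regex,
so what comes back is the match with the rightmost start. For \d+ against
"abc 123" that is just the final "3", not "123", which may or may not be
what an editor user expects from a backwards search.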

Will this solve your problem?

Philip



-- 
Mark Taylor
Corporate Information and Computing Services
University of Sheffield
Email M.Taylor@Sheffield.ac.uk
Tel 0114 222 1145 (direct line)
    0114 222 2000 ext 21145 (via switchboard)