Author: Philip Hazel
Date:
To: 1203
CC: pcre-dev
Subject: Re: [pcre-dev] [Bug 1203] New: desired features: match-and-substitute and permutations

On Thu, 26 Jan 2012, Steve Andrews wrote:
> (1) I need a library function that can perform pattern matching substitutions.
> It would be declared sort of like this:
>
> int matchandsub(pattern1,string1,pattern2,string2);
>
> For example, suppose I gave this function pattern1="a.c", string1="abc", and
> pattern2="j.k"; string2 is only for output. I want the function to see that
> yes, string1 is a correct match for pattern1 and that the letter b is the
> substituted letter. Then, it puts 'b' into pattern2, returning it in string2
> as "jbk". In general, pattern1 and pattern2 would have to have the same
> expression forms (or, more precisely, pattern2 would need to have a subset of
> the expressions used in pattern1).
Thanks for taking the time to post your wishlist item. However, what
you describe is very specific to your own requirements; I cannot see
it being of wide general use.
I have always seen PCRE purely as a matching library, partly because
there is enough work in maintaining that selection of facilities, and
partly because, apart from some very simple cases, most applications
that want to do replacements have some kind of rather special
requirement, as indeed you do.
Having said that, I suppose all you are really asking for is a variant
on a general search-and-replace, where the Perl-like usage would be

    matchandsub("a(.)c", "abc", "j$1k", output)

Perhaps one day somebody who is keen enough will write another library
that sits on top of PCRE, and provides a number of different
general-purpose "search and replace" facilities. But don't hold your
breath.
There is one item in the Contrib directory of the PCRE FTP site that
does substitution. It is called pcre_subst.tar.gz but I have no
knowledge of its internals. It may be somewhere to start if you end up
writing your own code.
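
For anyone who does write their own code, the Perl-like usage above can be sketched in a few lines. This is only an illustration: it uses Python's standard re module rather than PCRE, it returns the result instead of filling an output buffer, and it assumes the whole subject string must match. The function body and the handling of $N references are assumptions, not part of PCRE or of pcre_subst.

```python
import re

def matchandsub(pattern, subject, template):
    """Hypothetical helper: match `subject` against `pattern` and expand
    Perl-style $1, $2, ... references in `template`.
    Returns the expanded string, or None if the subject does not match."""
    # Assumption: the entire subject must match, not just a substring.
    m = re.fullmatch(pattern, subject)
    if m is None:
        return None

    def expand(ref):
        # A group that took no part in the match expands to an empty string.
        return m.group(int(ref.group(1))) or ""

    # Replace each $N in the template with the text captured by group N.
    return re.sub(r"\$(\d+)", expand, template)

result = matchandsub("a(.)c", "abc", "j$1k")
```

With the arguments from the example, result is "jbk"; a subject that does not match gives None.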
> (2) I need regular expression support for pattern permutation. For example,
> it's standard that the pattern "abc|def" matches to either "abc" or "def".
> However, I want to write the pattern "abc&def" and to have this match to either
> "abcdef" or to "defabc". This would be a significant new addition to regular
> expression matching, but I think would be highly compatible with the current
> design philosophy. It would also make PCRE more useful for bioinformatics and
> other applications.
Introducing a new metacharacter such as & is a big step, and of course
incompatible with Perl and other regex engines. Again, this requirement
does strike me as somewhat special-purpose.
You can, in fact, write a pattern using existing syntax that does what
you want in simple cases:

    (abc)(def)|(?2)(?1)

However, subroutine calls such as (?2) are atomic, so if "abc" is some
complicated subpattern, this is not exactly the same as

    (abc)(def)|(def)(abc)

Instead, it is equivalent to

    (abc)(def)|(?>def)(?>abc)

Applications that need to use very complicated patterns often use
programs to write the patterns. For example, you could write a function
such as

    make_permuted_pattern("abc", "def", output)

which returned the pattern "abcdef|defabc". However, if you use
capturing parentheses in either of the parts, there would be issues
about numbering (relative numbering could help).
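
A pattern-writing function of this kind might look like the sketch below. It is written in Python purely for illustration and returns the pattern instead of writing to an output argument. It also makes one choice that goes beyond the description above: each part is wrapped in a non-capturing group (?:...), so that an alternation inside a part stays contained. It does nothing about the numbering issue, so capturing parentheses inside a part still receive different numbers in each alternative.

```python
import re
from itertools import permutations

def make_permuted_pattern(*parts):
    """Hypothetical helper: build a pattern that matches the given
    subpatterns concatenated in any order."""
    # Wrap each part so that a '|' inside it cannot leak into its neighbours.
    wrapped = ["(?:" + part + ")" for part in parts]
    # One alternative per ordering of the parts.
    return "|".join("".join(order) for order in permutations(wrapped))

pattern = make_permuted_pattern("abc", "def")
matches_forward = re.fullmatch(pattern, "abcdef") is not None
matches_reversed = re.fullmatch(pattern, "defabc") is not None
```

For the two parts shown, the generated pattern is (?:abc)(?:def)|(?:def)(?:abc). The number of alternatives grows factorially with the number of parts, so this is only practical for a handful of them.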
As you have probably gathered by now, I am unlikely to work on either of
these things myself - though of course I can't predict whether anyone
else will feel inspired.