Author: admin
To: pcre-dev
Subject: [pcre-dev] [Bug 2472] New: Feature Request: PCRE2_SUBSTITUTE_LITERAL option for pcre2_substitute without processing replacement strings

            Bug ID: 2472
           Summary: Feature Request: PCRE2_SUBSTITUTE_LITERAL option for
                    pcre2_substitute without processing replacement strings
           Product: PCRE
           Version: N/A
          Hardware: x86
                OS: Windows
            Status: NEW
          Severity: wishlist
          Priority: medium
         Component: Code
          Assignee: ph10@???
          Reporter: ew3652@???
                CC: pcre-dev@???


I am following the guidelines on https://pcre.org/ to file a feature request by
opening a bug ticket.
I also tried searching for literal and pcre2_substitute in the closed and open
bug section but was not able to find a similar feature request.

I think an additional option e.g. PCRE2_SUBSTITUTE_LITERAL
which specifies that the replacement string in pcre2_substitute
should not be processed at all would be useful for many programs
that utilize pcre2_substitute.

I believe a common use case is when arbitrary replacement strings
are obtained from an external source and copying replacement strings
for preprocessing/escaping is to be avoided.
One example that should be quite common is a large set of long strings
containing monetary values, such as

            "....amounts to $10 in value...."

(here the $ in the replacement string is the currency symbol of a dollar
amount, not the start of a substitution reference).
Currently this would have to be escaped as

            "....amounts to $$10 in value...."

or with extended syntax

            "\Q....amounts to $10 in value....\E"

according to https://pcre.org/current/doc/html/pcre2api.html#substitutions.
My personal use case is obtaining the replacement strings inside a
user defined function of a database application.

Comparison to other PCRE2 options
A similar option, PCRE2_LITERAL, is available for pcre2_compile, even though
regular expressions are not efficient for its use case.
The proposed option would be the counterpart to PCRE2_SUBSTITUTE_EXTENDED.
While PCRE2_SUBSTITUTE_EXTENDED increases replacement string processing
complexity, PCRE2_SUBSTITUTE_LITERAL would decrease it.

Disadvantages of Alternatives
Escape Replacement String
Replacement strings need to be copied to a new buffer and escaped.
This requires extra memory and knowledge of which characters are to be
escaped ($).
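
To illustrate the cost of this workaround, here is a minimal sketch of such an
escaping step. It assumes the default (non-extended) replacement syntax, in
which $ is the only character that needs escaping. The helper name
escape_dollars is hypothetical and not part of PCRE2.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of PCRE2): returns a newly allocated,
   NUL-terminated copy of src in which every '$' is doubled, so that
   pcre2_substitute() without PCRE2_SUBSTITUTE_EXTENDED would insert a
   literal dollar sign. The caller must free() the result.
   Returns NULL if the allocation fails. */
static char *escape_dollars(const char *src, size_t len, size_t *out_len)
{
    assert(src != NULL);

    /* First pass: count the characters that need escaping. */
    size_t extra = 0;
    for (size_t i = 0; i < len; i++)
        if (src[i] == '$') extra++;

    char *dst = malloc(len + extra + 1);
    if (dst == NULL) return NULL;

    /* Second pass: copy, emitting an extra '$' before each '$'. */
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '$') dst[j++] = '$';
        dst[j++] = src[i];
    }
    dst[j] = '\0';
    if (out_len != NULL) *out_len = j;
    return dst;
}
```

Every replacement string is scanned twice and copied once, which is exactly
the overhead the proposed option would avoid.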

Extended syntax e.g. \Q \E
Extended syntax also requires a new copy, with \Q and \E added around the
string and any \E inside the replacement string escaped.
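
A rough sketch of this second workaround follows. It assumes that, under
PCRE2_SUBSTITUTE_EXTENDED, a quoted run ends at the first \E and that \\
outside a quoted run yields a single backslash. The helper name
quote_replacement and the rewrite of an embedded \E as \E\\E\Q are
illustrative choices, not something taken from PCRE2 itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of PCRE2): wraps src in \Q...\E.
   An embedded "\E" would end the quoted run early, so it is rewritten as
   \E \\ E \Q: close the quote, emit an escaped backslash and a plain E,
   then reopen the quote. The caller must free() the result. */
static char *quote_replacement(const char *src, size_t len, size_t *out_len)
{
    assert(src != NULL);

    /* Count embedded "\E" sequences, scanning left to right. */
    size_t hits = 0;
    for (size_t i = 0; i + 1 < len; i++)
        if (src[i] == '\\' && src[i + 1] == 'E') { hits++; i++; }

    /* Each "\E" (2 chars) becomes 7 chars (+5); the wrapper adds 4. */
    char *dst = malloc(len + 5 * hits + 4 + 1);
    if (dst == NULL) return NULL;

    size_t j = 0;
    memcpy(dst + j, "\\Q", 2);
    j += 2;
    for (size_t i = 0; i < len; i++) {
        if (i + 1 < len && src[i] == '\\' && src[i + 1] == 'E') {
            memcpy(dst + j, "\\E\\\\E\\Q", 7);
            j += 7;
            i++;                      /* skip the 'E' already handled */
        } else {
            dst[j++] = src[i];
        }
    }
    memcpy(dst + j, "\\E", 2);
    j += 2;
    dst[j] = '\0';
    if (out_len != NULL) *out_len = j;
    return dst;
}
```

Compared with doubling $, this variant needs extra care for the embedded \E
case, which is easy to get wrong in application code.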

Substitution callouts
A placeholder replacement string (e.g. an empty string) could be handed to
pcre2_substitute and the literal replacement handled by a callout.
This is not only cumbersome but also makes overflow handling
hard to use, because callouts are not called for overflows.

Implementing a separate routine based on pcre2_substitute
Implementing a correct routine that behaves as pcre2_substitute does
is not trivial and some internal methods that pcre2_substitute uses are
not exported.
(e.g. UTF checks or direct access to the callouts set in the match context
which would require a different parameter set in the separate
implementation to handle callouts).
(Exposing get_callout and get_substitute_callout functionality
through the public headers seems like something that could also be useful,
but it is not part of this feature request.)

Implementation Thoughts
I hope some thoughts on untested code are appropriate here.
I could not find a guideline with respect to that, and I saw some code in other
bug reports.
My first impression is that, since the size of the replacement is known and
constant, one could call the CHECKMEMCPY macro in pcre2_substitute.c before
replacement processing and skip the replacement string processing loop, so
that the next relevant section is the callout section.

     BOOL all_literal = ((options & PCRE2_SUBSTITUTE_LITERAL) != 0);
     if (all_literal) {
        // copy the replacement verbatim (e.g. with CHECKMEMCPY) and
        // skip the replacement processing loop ...
     } else {
        // replacement processing loop ...
     }
     // callout section

The cost of such an implementation would then be one option bit among the
match options and one additional if check within the global loop of
pcre2_substitute.
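
The literal branch can be modelled outside of PCRE2 as a bounds-checked copy.
The following is a simplified stand-alone sketch, not the actual CHECKMEMCPY
macro or any PCRE2 internal; the out_buffer type and append_literal function
are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the output state of a substitution.
   All names here are hypothetical. Invariant: used <= cap. */
typedef struct {
    char  *buf;    /* destination buffer */
    size_t cap;    /* capacity of buf in bytes */
    size_t used;   /* bytes written so far */
} out_buffer;

/* Append len bytes of rep verbatim, with no interpretation of '$' or '\'.
   Returns 0 on success, or -1 if the data does not fit, in which case
   nothing is written. */
static int append_literal(out_buffer *out, const char *rep, size_t len)
{
    assert(out != NULL && rep != NULL);
    if (len > out->cap - out->used) return -1;   /* would overflow */
    memcpy(out->buf + out->used, rep, len);
    out->used += len;
    return 0;
}
```

Because the replacement length is known up front, the fit check is a single
comparison per match, which matches the low cost estimated above.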
