[pcre-dev] [Bug 2283] RE::error() is nonempty even if compil…

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2283] RE::error() is nonempty even if compilation succeeds
https://bugs.exim.org/show_bug.cgi?id=2283

--- Comment #4 from Philip Hazel <ph10@???> ---
Thanks for the patch. Having looked at it and the code, I now understand what
is going on, but I don't think your patch is the right way to fix it. The
relevant bit of pcrecpp.cc code is this:

  // Special treatment for anchoring.  This is needed because at
  // runtime pcre only provides an option for anchoring at the
  // beginning of a string (unless you use offset).
  //
  // There are three types of anchoring we want:
  //    UNANCHORED      Compile the original pattern, and use
  //                    a pcre unanchored match.
  //    ANCHOR_START    Compile the original pattern, and use
  //                    a pcre anchored match.
  //    ANCHOR_BOTH     Tack a "\z" to the end of the original pattern
  //                    and use a pcre anchored match.

  const char* compile_error;
  int eoffset;
  pcre* re;
  if (anchor != ANCHOR_BOTH) {
    re = pcre_compile(pattern_.c_str(), pcre_options,
                      &compile_error, &eoffset, NULL);
  } else {
    // Tack a '\z' at the end of RE.  Parenthesize it first so that
    // the '\z' applies to all top-level alternatives in the regexp.
    string wrapped = "(?:";  // A non-counting grouping operator
    wrapped += pattern_;
    wrapped += ")\\z";
    re = pcre_compile(wrapped.c_str(), pcre_options,
                      &compile_error, &eoffset, NULL);
  }


The problem is that the wrapping turns, for example, (*UCP)xxx into
(?:(*UCP)xxx)\z, which isn't right because (*UCP) and the other
start-of-pattern items must be right at the start of the pattern. It needs to
turn into (*UCP)(?:xxx)\z to do what the code wants. I am not sure how to do
this in C++. The code needs to check for all the start-of-pattern specials,
possibly more than one of them, to find where the pattern really starts. Can
you code that easily? (This problem would not arise in PCRE2, which has an
ENDANCHORED option, but of course there's no C++ wrapper for it.)
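
A minimal sketch of the prefix scan described above, not a tested patch. The
helper names LeadingSpecialsLength and WrapForEndAnchor are hypothetical and
do not exist in pcrecpp.cc. The list of start-of-pattern items is an assumption
taken from the PCRE 8.x pcrepattern documentation and should be checked against
the release in use. Anything else beginning with "(*", such as the backtracking
verb (*FAIL), is deliberately left inside the group. The sketch uses a C++11
range-based for loop.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not part of pcrecpp): returns the length of the
// leading run of start-of-pattern items, e.g. 12 for "(*UTF8)(*UCP)abc".
static std::string::size_type LeadingSpecialsLength(const std::string& pattern) {
  // ASSUMPTION: item names as listed for PCRE 8.x; verify against the
  // pcrepattern documentation. Names ending in '=' take a decimal argument.
  static const char* const kNames[] = {
    "UTF8", "UTF16", "UTF32", "UTF", "UCP",
    "NO_AUTO_POSSESS", "NO_START_OPT",
    "CR", "LF", "CRLF", "ANYCRLF", "ANY",
    "BSR_ANYCRLF", "BSR_UNICODE",
    "LIMIT_MATCH=", "LIMIT_RECURSION="
  };

  std::string::size_type pos = 0;
  for (;;) {
    // Every start-of-pattern item has the form "(*NAME)" or "(*NAME=digits)".
    if (pattern.compare(pos, 2, "(*") != 0) return pos;
    const std::string::size_type close = pattern.find(')', pos + 2);
    if (close == std::string::npos) return pos;
    const std::string item = pattern.substr(pos + 2, close - pos - 2);

    bool known = false;
    for (const char* name : kNames) {
      const std::size_t n = std::strlen(name);
      if (name[n - 1] == '=') {
        // Require the name, then at least one digit and nothing but digits.
        if (item.size() > n && item.compare(0, n, name) == 0 &&
            item.find_first_not_of("0123456789", n) == std::string::npos) {
          known = true;
          break;
        }
      } else if (item == name) {
        known = true;
        break;
      }
    }
    // Unknown "(*...)" (for example a backtracking verb): the real pattern
    // starts here, so stop scanning.
    if (!known) return pos;
    pos = close + 1;
  }
}

// Hypothetical replacement for the ANCHOR_BOTH wrapping: keeps the leading
// specials in front and wraps only the rest of the pattern in (?:...)\z.
std::string WrapForEndAnchor(const std::string& pattern) {
  const std::string::size_type n = LeadingSpecialsLength(pattern);
  return pattern.substr(0, n) + "(?:" + pattern.substr(n) + ")\\z";
}
```

With this, the ANCHOR_BOTH branch could pass WrapForEndAnchor(pattern_).c_str()
to pcre_compile() in place of the hand-built wrapped string, so that (*UCP)xxx
becomes (*UCP)(?:xxx)\z.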

I'm not sure about your --enable issue, but try without --enable-utf, because
--enable-unicode-properties should imply it.
