[pcre-dev] [Bug 1616] New: Line begin anchor fits not at end…

Author: David Gausmann
Date:  
To: pcre-dev
New-Topics: [pcre-dev] [Bug 1616] Line begin anchor fits not at end of text, if the last character is a new line character
Subject: [pcre-dev] [Bug 1616] New: Line begin anchor fits not at end of text, if the last character is a new line character
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1616
           Summary: Line begin anchor fits not at end of text, if the last
                    character is a new line character
           Product: PCRE
           Version: 10.10 (PCRE2)
          Platform: Other
        OS/Version: Windows
            Status: NEW
          Severity: bug
          Priority: low
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: david.gausmann@???
                CC: pcre-dev@???



Hello there,

I've found some behaviour that I cannot explain.

My regular expression has the following pattern: ^
The search option is PCRE2_MULTILINE.
The text I am searching through has the following content: \n\n\n
The new line behaviour is PCRE2_NEWLINE_ANYCRLF.

I would expect to get four results:
- Offset 0
- Offset 1
- Offset 2
- Offset 3

Instead I get only the first three results.
If I add a character to the end of my haystack text, then I get four results.
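
To make the two counting conventions concrete, here is a minimal sketch in plain C++ that does not call PCRE2 at all. The helper line_starts and its flag include_after_final_newline are hypothetical names used only for illustration, and the sketch assumes that '\n' is the only line break (PCRE2_NEWLINE_ANYCRLF would also accept CR and CRLF). With the flag set it yields the four offsets expected above; with the flag cleared it yields the three offsets actually observed, and appending one more character brings back the fourth.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of PCRE2): lists the offsets at which a line
// begins. Assumption: only '\n' counts as a line break.
std::vector<std::size_t> line_starts(const std::u16string& text,
                                     bool include_after_final_newline)
{
    std::vector<std::size_t> offsets{0};  // offset 0 always starts a line
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] != u'\n')
            continue;
        // A newline is "final" when it is the very last code unit.
        const bool is_final = (i + 1 == text.size());
        if (!is_final || include_after_final_newline)
            offsets.push_back(i + 1);
    }
    return offsets;
}
```

Calling line_starts(u"\n\n\n", true) models the expectation, line_starts(u"\n\n\n", false) models the observed result.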

My C++ code looks like this (I've removed some irrelevant parts):

----------------------------------------------------------
while(nStart <= nLength)
{
  // Search for next match
  int nResult = pcre2_match(this->m_pRegEx,
                            reinterpret_cast<PCRE2_SPTR16>(wszTextOffset),
                            static_cast<size_t>(nLength),
                            static_cast<size_t>(nStart),
                            0, pMatchData.get(), nullptr);
  if(nResult < 0)
  {
    switch(nResult)
    {
    case PCRE2_ERROR_NOMATCH:
      goto NoMatch;

    default:
      // Throw RegEx error
      // ...
    }
  }

  // Copy found match
  // ...

  if(!this->m_bGlobal)
    break;
  if(puOVector[0] == puOVector[1])
    // Zero length match (otherwise we get an endless loop)
    nStart = puOVector[1] + 1;
  else
    nStart = puOVector[1];
}
----------------------------------------------------------
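
The advance rule at the bottom of the loop can be isolated as a small function, which makes it easy to check on its own. This is only a sketch: next_start is a hypothetical name, and the two arguments stand in for puOVector[0] and puOVector[1] from the loop above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring the loop's advance rule: after an empty
// match, step one code unit forward so that the search cannot return the
// same empty match again; otherwise continue where the match ended.
std::size_t next_start(std::size_t match_begin, std::size_t match_end)
{
    assert(match_begin <= match_end);  // ovector pairs are ordered
    return match_begin == match_end ? match_end + 1 : match_end;
}
```

For the haystack \n\n\n the empty matches at offsets 0, 1 and 2 give start offsets 1, 2 and 3, so the fourth call starts at offset 3, which equals the subject length. One caveat, stated as an assumption rather than a diagnosis: stepping by a single code unit may land between the CR and LF of a CRLF pair, or inside a UTF-16 surrogate pair, so a production loop would probably need extra care there.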


The loop body is executed four times, but on the fourth iteration pcre2_match
returns PCRE2_ERROR_NOMATCH.

Is this a bug, or do I have to do something differently to get a zero-length
match like this at the end of the text?


Kind Regards
David Gausmann


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email