Re: [pcre-dev] 7.4-RC2 (pcrecpp/Craig)

Author: Craig Silverstein
Date:
To: silvermoonwoman
CC: pcre-dev
Subject: Re: [pcre-dev] 7.4-RC2 (pcrecpp/Craig)
} I saw this in pcrecpp.cc. Since newline can also be configured
} (since version 7.0) to -1 (ANY) or (since version 7.1) to -2
} (ANYCRLF), it looks to me like something is missing

Gotcha! It looks like it to me too. In debug mode, the assert should
trigger, which would warn us that something is wrong -- have you ever
seen that? In any case, it does look like something ought to be
fixed.

Philip, I know you try to stay away from C++ code, but NewlineMode is
a very C-like function. If you know the proper way to fix this, do
you mind making a go of it? I've tried to stay as ignorant as I can
of all the newline issues, since they seem really hairy. :-)

}       // If the current char is CR and we're in CRLF mode, skip LF too.
}       // Note it's better to call pcre_fullinfo() than to examine
}       // all_options(), since options_ could have changed between
}       // compile-time and now, but this is simpler and safe enough.
}       if (start+1 < static_cast<int>(str->length()) &&
}           (*str)[start] == '\r' && (*str)[start+1] == '\n' &&
}           NewlineMode(options_.all_options()) == PCRE_NEWLINE_CRLF) {
}         matchend++;
}       }


Ah, I see -- this code triggers only when we "really" match the empty
string, to make sure we don't go into an infinite loop. We advance
one character in that case. As an optimization, we're willing to
consider \r\n to be one character in the proper mode, but it's not
really "wrong" for us just to eat the \r -- we're doing something
non-canonical in any case, so whatever we do is fine (ignoring UTF-8
issues, which we handle correctly right below this).

So it would be fine to add PCRE_NEWLINE_ANY/etc here, but it's ok to
leave them out, and ok if the config option doesn't match the current
state of the RE.
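
To make that concrete, here is a rough standalone sketch of the
"advance past an empty match" step with the wider newline settings
included. It is not the pcrecpp.cc code: NewlineKind and
EmptyMatchAdvance are hypothetical names made up for illustration, the
enum is not the PCRE_NEWLINE_* constants, and UTF-8 is ignored.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the newline settings discussed above.
// These are NOT the PCRE_NEWLINE_* constants.
enum class NewlineKind { LF, CR, CRLF, AnyCRLF, Any };

// Hypothetical helper: how many bytes to step over after an empty match
// at position pos, so the caller's loop always makes progress.
// "\r\n" counts as one unit when the newline setting can accept CRLF;
// otherwise we advance a single byte. Returns 0 at end of string.
std::size_t EmptyMatchAdvance(const std::string& s, std::size_t pos,
                              NewlineKind nl) {
  if (pos >= s.size()) return 0;
  const bool crlf_ok = nl == NewlineKind::CRLF ||
                       nl == NewlineKind::AnyCRLF ||
                       nl == NewlineKind::Any;
  if (crlf_ok && pos + 1 < s.size() &&
      s[pos] == '\r' && s[pos + 1] == '\n') {
    return 2;  // skip the LF along with the CR
  }
  return 1;    // eating just one byte is still safe, only less tidy
}
```

Either return value keeps the loop from spinning forever; the CRLF
branch only avoids stopping between the \r and the \n.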

} One other thing I noticed, it does not appear the functions support
} retrieval of named subpatterns, e.g., for replacement text. I say
} that from looking at the unittest.

I believe that's true -- we just do positional assignment. You can
read the comments at the top of pcrecpp.h to see examples of the
supported use cases.

craig