Craig Silverstein wrote:
>> I peeked at the pcrecpp source code and it doesn't support all the
>> newline values either.
>>
>
> What do you mean by "doesn't support"? Do you mean in RE_Options?
> That class doesn't support most pcre flags, which is why we allow the
> option_flags constructor. I'm guessing you're referring to something
> else, but I don't know what it might be.
>
> craig
>
>
>
>
Hi Craig,
I saw this in pcrecpp.cc, since newline can also be configured to (since
version 7.0) to -1 (ANY) or (since version 7.1) to -2 (ANYCRLF) it looks
to me like something is missing, and that unexpected returns are likely.
I'm not a programmer, I may be misunderstanding.
========================
// Returns PCRE_NEWLINE_CRLF, PCRE_NEWLINE_CR, or PCRE_NEWLINE_LF.
// Note that PCRE_NEWLINE_CRLF is defined to be P_N_CR | P_N_LF.
static int NewlineMode(int pcre_options) {
// TODO: if we can make it threadsafe, cache this var
int newline_mode = 0;
/* if (newline_mode) return newline_mode; */ // do this once it's cached
if (pcre_options & (PCRE_NEWLINE_CRLF|PCRE_NEWLINE_CR|PCRE_NEWLINE_LF)) {
newline_mode = (pcre_options &
(PCRE_NEWLINE_CRLF|PCRE_NEWLINE_CR|PCRE_NEWLINE_LF));
} else {
int newline;
pcre_config(PCRE_CONFIG_NEWLINE, &newline);
if (newline == 10)
newline_mode = PCRE_NEWLINE_LF;
else if (newline == 13)
newline_mode = PCRE_NEWLINE_CR;
else if (newline == 3338)
newline_mode = PCRE_NEWLINE_CRLF;
else
assert("" == "Unexpected return value from pcre_config(NEWLINE)");
}
return newline_mode;
=========================
Also this in global replace function, from functional point of view, I
think should also be happening if the newline is ANY or ANYCRLF. However
now that the mode can be changed from the beginning of the pattern, its
really not enough to be getting it from config. For example, library
could be configured with CRLF but pattern could have (*LF). OTOH, empty
matches are a peculiarity in any event so the importance of doing this
is small. Could probably do it regardless of newline mode, whenever
empty string just matched at CR and next char is LF.
// If the current char is CR and we're in CRLF mode, skip LF too.
// Note it's better to call pcre_fullinfo() than to examine
// all_options(), since options_ could have changed bewteen
// compile-time and now, but this is simpler and safe enough.
if (start+1 < static_cast<int>(str->length()) &&
(*str)[start] == '\r' && (*str)[start+1] == '\n' &&
NewlineMode(options_.all_options()) == PCRE_NEWLINE_CRLF) {
matchend++;
}
=========================
One other thing I noticed, it does not appear the functions support
retrieval of named subpatterns, e.g., for replacement text. I say that
from looking at the unittest.
Mind you I am speaking from the position of wishing I were in a better
position to evaluate pcrecpp, especially since thanks to Cmake and the
debated changes, it now compiles in Visual Studio.
It gets these warnings (in VS Express 2005)
pcrecpp.cc
..\pcre-7.4-RC2\pcrecpp.cc(708) : warning C4244: '=' : conversion from 'long' to 'short', possible loss of data
..\pcre-7.4-RC2\pcrecpp.cc(719) : warning C4244: '=' : conversion from 'unsigned long' to 'unsigned short', possible loss of data
Regards,
Sheri