Re: [pcre-dev] PCRE2 POSIX newline matching

Author: Ralf Junker
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] PCRE2 POSIX newline matching
On 03.09.2015 18:24, ph10@??? wrote:

>> Put briefly: Are the results in testoutput18 as per PCRE2 SVN 362 correct? In
>> particular, for this section (lines 61-65):
>>
>> /abc.def/
>>     *** Failers
>> No match: POSIX code 17: match failed
>>     abc\ndef
>> No match: POSIX code 17: match failed
>>
>> I get a match instead of a failure.
>>
>> If the failure is the expected result, I must look at my code. If a match is
>> expected, testoutput18 should be corrected.
>
> It should be a failure because \n does not match '.' by default. If you
> set the dotall modifier, it should match.


Curiously enough, \n matches '.' in my implementation. It turned out that compiling with NEWLINE_DEFAULT = 3 (CRLF) is responsible: with CRLF as the newline sequence, a lone \n is not a line ending, so '.' is allowed to match it. After I changed NEWLINE_DEFAULT back to its default value (2, LF), the above test failed as expected.
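
To illustrate why the newline setting changes the result, here is a minimal sketch in plain C. It is a simplified model of the documented dot rule, not PCRE2 source code: the names newline_kind, dot_consumes and match_abc_dot_def are hypothetical and exist only for this example. It assumes the rule that with a single-character newline '.' never matches that character, while with CRLF '.' only refuses a CR that is immediately followed by LF.

```c
#include <stddef.h>
#include <string.h>

/* Simplified model (assumption, not PCRE2 code): the two conventions discussed. */
enum newline_kind { NL_LF, NL_CRLF };

/* Returns 1 if '.' (without dotall) may consume s[i], 0 otherwise. */
int dot_consumes(const char *s, size_t len, size_t i, enum newline_kind nl)
{
    if (i >= len) return 0;
    if (nl == NL_LF) return s[i] != '\n';
    /* CRLF: only a CR immediately followed by LF stops the dot. */
    if (s[i] == '\r' && i + 1 < len && s[i + 1] == '\n') return 0;
    return 1;
}

/* Hypothetical helper: matches the fixed pattern abc.def against the
   whole subject, using dot_consumes() for the '.' position. */
int match_abc_dot_def(const char *s, size_t len, enum newline_kind nl)
{
    if (len != 7) return 0;
    return memcmp(s, "abc", 3) == 0
        && dot_consumes(s, len, 3, nl)
        && memcmp(s + 4, "def", 3) == 0;
}
```

Under this model, match_abc_dot_def("abc\ndef", 7, NL_LF) reports no match, which is what testoutput18 expects, whereas the same call with NL_CRLF reports a match, which is what I observed.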

Now the question is: Does POSIX rely on NEWLINE_DEFAULT == 2 (LF)?

If yes, the change below makes sure that PCRE2_NEWLINE_LF is always applied regardless of NEWLINE_DEFAULT. As I see no option to modify the newline setting via the POSIX API, it modifies regcomp().

If no, maybe the test cases could be adjusted or expanded?

Ralf

----------------------------

PCRE2POSIX_EXP_DEFN int PCRE2_CALL_CONVENTION
regcomp(regex_t *preg, const char *pattern, int cflags)
{
/* Declared unconditionally because pcre2_compile() below always receives it;
NULL selects the default compile context when NEWLINE_DEFAULT is already LF. */
pcre2_compile_context *ccontext = NULL;
PCRE2_SIZE erroffset;
int errorcode;
int options = 0;
int re_nsub = 0;

/* POSIX relies on PCRE2_NEWLINE_LF, adjust if NEWLINE_DEFAULT differs. */
#if NEWLINE_DEFAULT != PCRE2_NEWLINE_LF
ccontext = pcre2_compile_context_create(NULL);
if (ccontext == NULL) return REG_ESPACE;
pcre2_set_newline(ccontext, PCRE2_NEWLINE_LF);
#endif /* NEWLINE_DEFAULT != PCRE2_NEWLINE_LF */

if ((cflags & REG_ICASE) != 0)    options |= PCRE2_CASELESS;
if ((cflags & REG_NEWLINE) != 0)  options |= PCRE2_MULTILINE;
if ((cflags & REG_DOTALL) != 0)   options |= PCRE2_DOTALL;
if ((cflags & REG_NOSUB) != 0)    options |= PCRE2_NO_AUTO_CAPTURE;
if ((cflags & REG_UTF) != 0)      options |= PCRE2_UTF;
if ((cflags & REG_UCP) != 0)      options |= PCRE2_UCP;
if ((cflags & REG_UNGREEDY) != 0) options |= PCRE2_UNGREEDY;


preg->re_pcre2_code = pcre2_compile((PCRE2_SPTR)pattern, PCRE2_ZERO_TERMINATED,
     options, &errorcode, &erroffset, ccontext);
preg->re_erroffset = erroffset;


#if NEWLINE_DEFAULT != PCRE2_NEWLINE_LF
pcre2_compile_context_free(ccontext);
#endif /* NEWLINE_DEFAULT != PCRE2_NEWLINE_LF */