Re: [pcre-dev] /x modifier bug when using # comments in RE ?

Author: GAUTIER Herve
Date:  
To: pcre-dev@exim.org
Subject: Re: [pcre-dev] /x modifier bug when using # comments in RE ?

On 07/06/2014 10:46, ph10@hermes.cam.ac.uk wrote:

> On Fri, 6 Jun 2014, ph10@hermes.cam.ac.uk wrote:
>
> > On Fri, 6 Jun 2014, GAUTIER Herve wrote:
> >
> > > But my remaining problem is that if I have compiled PCRE with
> > > --enable-newline-is-crlf on a system (Linux) that uses LF as newline,
> > > how can I enter the CR LF newline sequence in pcretest when
> > > typing my RE?
> >
> > CTRL-M, Return (maybe?)
>
> I was interrupted before I could try that... it doesn't, of course,
> work, since Linux turns CTRL-M (= Return) into LF.

Yes ^^! It does not work!

> I wondered afterwards whether a better approach for you might be to
> leave PCRE configured at its default (newline == LF) and make your
> application that processes CRLF-terminated files set the appropriate
> PCRE option (PCRE_NEWLINE_CRLF) when calling pcre_compile(). Would
> that work for you?

I'm not sure that will work.
As far as I understand, whichever newline convention is chosen, whether when building the library or at run time, applies to both the RE _and_ the DATA.
From my point of view, the minor remaining problems are:

- It is impossible to enter the sequence CR LF when typing an RE in pcretest.
- It is impossible to choose the newline convention independently for the RE and the DATA.
- I still do not understand why, when PCRE is compiled with --enable-newline-is-crlf on a system (Linux) that uses LF as newline, the PCRE library or pcretest ignores the newlines (LF) in my RE when the /x modifier is used:

--8<------------------------------------------------------------------
$ pcretest
PCRE version 6.6 06-Feb-2006

re> /^
    >               [+-]?
    >               (
    >                       \d+\.\d+
    >                     |\d+\.
    >                    |\.\d+
    >              )
    >              ([eE][+-]?\d+)?
    >              $/xm
data> START\r\n1.2E3\r\nEND
0: 1.2E3\x0d
1: 1.2
2: E3
data>
--8<------------------------------------------------------------------

Here it matches, and I don't know why. In my opinion it should not, because I defined the newline as CR LF when compiling the library; so when I press the Return key, an LF is entered into my RE, yet this LF is ignored (by the PCRE library or by the pcretest program).


Never mind, thanks for the great work; I will test a workaround soon.
Maybe by using --enable-newline-is-anycrlf when building the PCRE library, or by keeping --enable-newline-is-crlf and "automagically" adding a CR before each LF in my regex definition.

Rv


--
Hervé GAUTIER