Re: [pcre-dev] match on multiline strings containing '[ ]'

Author: Mihai Matei
Date: Mon, 19 Nov 2007 02:47:28 -0800 (PST)
To: pcre-dev
CC: pcre-dev
Subject: Re: [pcre-dev] match on multiline strings containing '[ ]'
Excellent,

    Both your suggestions work. 


Thanks!


----- Original Message ----
From: Philip Hazel <ph10@???>
To: Mihai Matei <mihaimilk@???>
Cc: pcre-dev@???
Sent: Friday, November 16, 2007 4:03:29 PM
Subject: Re: [pcre-dev] match on multiline strings containing '[ ]'

On Fri, 16 Nov 2007, Mihai Matei wrote:

> Is the (.*) not the correct pattern to use if I want all the contents
> in between "[CDATA[" and "]]" , even if it's multiple lines?


. does not match newline unless you set the PCRE_DOTALL option (don't
know what it's called in your application). Another way of matching any
character including newlines is to use something like [\s\S].
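
The difference between the two approaches can be sketched as follows. This is a minimal illustration that assumes Python's re module rather than PCRE's C API; re.DOTALL is Python's name for the same dot-matches-newline behaviour, and the sample text is made up for the example.

```python
import re

# Hypothetical sample input: the CDATA section spans two lines.
text = "<![CDATA[first line\nsecond line]]>"

# Without a dot-all option, "." stops at the newline, so "]]" is never reached.
plain = re.search(r"\[CDATA\[(.*)\]\]", text)

# With re.DOTALL (Python's counterpart of PCRE_DOTALL), "." also matches "\n".
dotall = re.search(r"\[CDATA\[(.*)\]\]", text, re.DOTALL)

# [\s\S] matches any character, newline included, with no option set.
charclass = re.search(r"\[CDATA\[([\s\S]*)\]\]", text)

print(plain)                      # no match object
print(repr(dotall.group(1)))      # both lines captured
print(repr(charclass.group(1)))   # same capture as with DOTALL
```

Both working variants capture the same text; the character class is handy when the calling application does not expose the option.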

Philip

--
Philip Hazel


From: Philip Hazel <ph10@???>
To: pcre-dev@???
Date: Mon, 19 Nov 2007 12:42:19 +0000
Subject: [pcre-dev] [Bug 617] Several questions


------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=617

Philip Hazel <ph10@???> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #4 from Philip Hazel <ph10@???> 2007-11-19 12:42:18 ---
I have committed patches for everything that needs changing from this thread.
Some of the issues raised are not bugs:

1. /(?)/ is perfectly OK. It is an option setting (like /(?i)/) but without
actually setting anything. Perl accepts this.

2. /(?){0}/ is not right because you can't repeat an option setting. The size
of the repeat doesn't matter. Perl also faults this with "quantifier follows
nothing in regex".

3. Yes, recursion like /(?(DEFINE)(?<a>(?&b))(?<b>(?&a)))(?&a)/ will run out of
stack, because a and b call each other with no terminating case. This is no
different from unbounded recursion in any programming language.

4. /(?+-a)/ and /(?-+a)/ give different messages because (?+ can only be
followed by digits, whereas (?- can be followed by an option to unset, e.g.
(?-i). However, I have improved the messages.

5. I have made it give an error for /(?(+10))/ even though Perl does not. It
seems right to diagnose references to non-existent subpatterns.

6. /(?<a>(a+(*THEN)b|(?&b)))(?(DEFINE)(?<b>a+b))/ does have 3 groups, but when
you match aaab, only two of them (1 and 2) are set. The pcre_exec() function
does not return unset groups at the end of the list, so pcretest cannot show
them. I have improved the documentation.

7. When you match /(\D+|<\d+>)*[!?]\Ka+/ with the string
aaaaaaaaaaaaaaaaaaaa1!aaaaaaaaaaaaaaaaaaaa
it will backtrack a lot. First \D+ matches all the a's in one go; then, when
it does not find ! or ?, it backtracks so that \D+ matches all except the last
a, and the * repeats the group to match that last a. And so on. Nested
unlimited repeats usually have this property.
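
To give a feel for the scale in item 7, here is a rough model rather than PCRE's actual matching code. It counts the ways a run of n characters can be divided among successive iterations of a repeated group such as (\D+)*. Each division is a path that a backtracking matcher may need to explore before giving up at one starting position. The function name count_splits is made up for this sketch.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_splits(n: int) -> int:
    """Count the ways to cut a run of n characters into consecutive non-empty pieces."""
    if n == 0:
        return 1  # nothing left to divide: exactly one way
    # The first piece takes k characters; the remainder is divided recursively.
    return sum(count_splits(n - k) for k in range(1, n + 1))

for n in (1, 2, 3, 10, 20):
    # For n >= 1 the count equals 2 ** (n - 1).
    print(n, count_splits(n), 2 ** (n - 1))
```

The count doubles with every extra character, which is why a subject with only 20 leading a's is already expensive for this kind of pattern.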

--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email