Re: [pcre-dev] [Bug 1315] \r, \n and $ matching seems to be…

Author: Giuseppe D'Angelo
Date:  
To: Ze'ev Atlas
CC: pcre exim
Subject: Re: [pcre-dev] [Bug 1315] \r, \n and $ matching seems to be illogical or not fully documented.
On 7 November 2012 19:01, Ze'ev Atlas <zatlas1@???> wrote:
>
> Hi all
> 2 questions:
>
> 1. Is this construct:
> /something$[^x]somethingelse/m
> available in Perl as well?


Yes, http://perldoc.perl.org/perlre.html#Modifiers

> 2. Why should this construct match
> something\nsomethingelse


Because with the //m modifier the $ matches immediately before the
internal newline (but doesn't consume it). The \n character itself is
matched by the [^x] character class. [*]
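
The same step-by-step behaviour can be checked outside Perl. This is
only a minimal sketch in Python, on the assumption that its re module
handles $ under re.MULTILINE (its counterpart of //m) the same way for
this case. The variable names are just for illustration:

```python
import re

# Assumption of this sketch: re.MULTILINE stands in for Perl's //m.
pattern = r"something$[^x]somethingelse"
subject = "something\nsomethingelse"

multiline_match = re.search(pattern, subject, re.MULTILINE)
plain_match = re.search(pattern, subject)

# With MULTILINE, $ succeeds at offset 9 (right before the newline)
# without consuming anything, and [^x] then consumes the newline.
print(multiline_match.span() if multiline_match else None)

# Without MULTILINE, $ cannot succeed before an internal newline,
# so the whole pattern fails on this subject.
print(plain_match)
```

The second search is there only to show that it is the modifier, not
the character class, that lets $ succeed in the middle of the string.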

> I ask the question because \n does consume one position. I understand that the $ itself does not consume any position, but in that case why do we need it, since
> /something[^x]somethingelse/m
> should happily match
> something\nsomethingelse
> as well.


In fact it does match; the //m modifier doesn't change anything in
this case. But this last pattern doesn't require the first "something"
to be at the end of a line -- "somethingAsomethingelse" matches this
pattern but not the former.
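
To make the difference concrete, here is a small sketch comparing the
two patterns on both subjects. It uses Python's re module as a
stand-in, assuming re.MULTILINE behaves like //m for these inputs. The
names with_dollar, without_dollar and results are made up for the
example:

```python
import re

# Assumption of this sketch: re.MULTILINE stands in for Perl's //m.
with_dollar = re.compile(r"something$[^x]somethingelse", re.MULTILINE)
without_dollar = re.compile(r"something[^x]somethingelse", re.MULTILINE)

subjects = ["something\nsomethingelse", "somethingAsomethingelse"]

# For each subject, record (matches with $, matches without $).
results = {
    s: (bool(with_dollar.search(s)), bool(without_dollar.search(s)))
    for s in subjects
}

for s, (dollar, no_dollar) in results.items():
    print(repr(s), "with $:", dollar, "without $:", no_dollar)
```

Only the pattern with $ rejects "somethingAsomethingelse", because
there is no line end after the first "something".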

> In essence, I guess that my question is what is the function of the $ assertion when not signaling the start of a string?


The '$' assertion is used for the *end* of a string, not the *start*
(that would be the caret: ^). With the //m modifier in effect, both
also match at line boundaries inside the string: ^ right after an
internal newline, $ right before one.

(Also, pedantically, even without //m, $ matches before a newline
which is at the end of the string.)
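
A quick way to see that last point, again sketched in Python on the
assumption that its $ follows the same rule as Perl's here (the
subject string and variable names are invented for the example):

```python
import re

subject = "A\nB\n"  # ends with a newline

# No MULTILINE flag: $ matches at the very end of the string, or just
# before a newline that is the last character of the string.
final_line = re.search(r"B$", subject)

# Still no flag: the newline after "A" is internal, so $ fails there.
inner_line = re.search(r"A$", subject)

# With MULTILINE (Python's counterpart of //m) the internal one works.
inner_line_m = re.search(r"A$", subject, re.MULTILINE)

print(final_line.span() if final_line else None)
print(inner_line)
print(inner_line_m.span() if inner_line_m else None)
```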

Hope this helps,
--
Giuseppe D'Angelo

[*] Using perl might help you visualize the matching process. I'm
using a simplified example here. Also, I had to use //x to allow
whitespace inside the pattern (to keep Perl from interpolating the $
followed by [^x] as a variable, and to keep the pattern itself clean),
but that doesn't matter for the discussion:

$ perl -Mre=debug -e '"A\nB" =~ /A $ [^x] B/mx'
Compiling REx "A $ [^x] B"
Final program:
   1: EXACT <A> (3)
   3: MEOL (4)
   4: ANYOF[\x00-wy-\xff][{unicode_all}] (15)
  15: EXACT <B> (17)
  17: END (0)
anchored "A"$ at 0 (checking anchored) minlen 3
Guessing start of match in sv for REx "A $ [^x] B" against "A%nB"
Found anchored substr "A"$ at offset 0...
Guessed: match at offset 0
Matching REx "A $ [^x] B" against "A%nB"
   0 <> <A%nB>               |  1:EXACT <A>(3)
   1 <A> <%nB>               |  3:MEOL(4)
   1 <A> <%nB>               |  4:ANYOF[\x00-wy-\xff][{unicode_all}](15)
   2 <A%n> <B>               | 15:EXACT <B>(17)
   3 <A%nB> <>               | 17:END(0)
Match successful!
Freeing REx: "A $ [^x] B"