[pcre-dev] [Bug 1315] \r, \n and $ matching seems to be ill…

Author: Christoph Anton Mitterer
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1315] \r, \n and $ matching seems to be illogical or not fully documented.
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1315




--- Comment #3 from Christoph Anton Mitterer <calestyo@???> 2012-11-08 23:55:43 ---
Hey Zoltan.

(In reply to comment #1)
> next time you should be a little more polite. That could help a lot.

I don't see any particular place that I'd consider impolite?! Perhaps a bit
stiff... well it was very late in the morning and I was already nearly falling
asleep.
Anyway, be assured that I didn't mean anything impolite :)

> What you misunderstand here is that $ is not a character range, such as [a-z],
> it is a zero-width assertion.

So you mean in the sense of "at the end of the line"?!


> e.g: /a$[^x]b/m matches to a\nb, since $ itself just checks a condition, but
> does not change the character position.

Uhm... not yet sure whether I understand...
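
For reference, the quoted example can be tried out with a minimal sketch. It uses Python's re module rather than PCRE itself, on the assumption that re.MULTILINE behaves like the /m flag for this pattern. The point it illustrates is that $ consumes nothing, so the [^x] that follows is still free to match the "\n".

```python
import re

# Assumption: re.MULTILINE stands in for PCRE's /m flag.
pattern = re.compile(r"a$[^x]b", re.MULTILINE)

# "$" succeeds between "a" and "\n" without moving forward,
# then "[^x]" consumes the "\n" and "b" matches the final "b".
m = pattern.search("a\nb")
matched_text = m.group(0) if m else None

# "$" on its own matches an empty string just before the "\n".
dollar_span = re.compile(r"$", re.MULTILINE).search("a\nb").span()

print(repr(matched_text), dollar_span)
```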

> \r[^\n] this pattern matches to two character long strings. E.g, it matches to
> \rx but not to \r or \r\n. Here, [^\n] is a character range, so it does not
> match at the end of the input.

So ok... that means basically a pattern like \r[^\n] needs at least two
characters to match.
Am I right then, that \r\n is _not_ matched here, because \n doesn’t appear
at all as a character (because it’s considered not a character in that sense
but the "mark up" for line separating)?
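
The behaviour described in the quote can be checked with a small sketch, again in Python's re module as a stand-in for PCRE (an assumption, not the PCRE API). It also suggests that "\n" is treated as an ordinary character when the whole string is the subject: \r\n fails here only because the class [^\n] excludes it explicitly.

```python
import re

pattern = re.compile(r"\r[^\n]")

# The class needs one real character after the CR, and that character
# must not be a line feed.
results = {s: bool(pattern.search(s)) for s in ("\rx", "\r", "\r\n")}

# With a class that does not exclude it, "\n" is consumed like any
# other character.
newline_is_char = bool(re.search(r"\r[^x]", "\r\n"))

print(results, newline_is_char)
```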


> \r[^$] also requires two characters, the first one is \r, followed by anything
> which is not $ (here, $ is a plain character), so it matches to \r\n, but not
> to \r$.

Ah.... ok so [...] always means there needs to be a character... and if I put
in ^$ it just says "at that position, there must be a character, but the
end-of-line condition must NOT be met".
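
A short sketch of the quoted \r[^$] case, using Python's re module as an assumed stand-in for PCRE. Inside a class, a leading ^ negates the class and $ is the literal dollar character, as the quote says, so the pattern asks for a CR followed by any one character other than "$".

```python
import re

pattern = re.compile(r"\r[^$]")

matches_crlf = bool(pattern.search("\r\n"))      # LF is "not a dollar sign"
matches_cr_dollar = bool(pattern.search("\r$"))  # a literal "$" is excluded
matches_lone_cr = bool(pattern.search("\r"))     # no second character at all

print(matches_crlf, matches_cr_dollar, matches_lone_cr)
```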


> I hope now everything is clear.

Let me just re-evaluate my cases...

a) It's still not clear why a plain $ doesn't match... I would expect it to
_always_ match... as I would for ^

b) The case:
$ hd $file
00000000  41 0d 0a                                          |A..|
00000003
$ pcregrep '\n' $file ; echo $?
1
Is that because in UNIX (or rather, when the end-of-line is set to \n) \n
will never match, because again, the \n is not considered a char but rather the
condition "end-of-line"?
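
One way to picture case b) is the rough model below. It rests on an assumption about how a line-oriented grep tool works (each line is handed to the regex with its terminator already stripped) and is not taken from the pcregrep source. The helper grep_lines is hypothetical.

```python
import re

def grep_lines(pattern, data, newline="\n"):
    """Hypothetical model: match the pattern against each line separately,
    with the line terminator removed before matching."""
    lines = data.split(newline)
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final terminator
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

data = "A\r\n"  # the bytes 41 0d 0a from the hd output

lf_hits = grep_lines(r"\n", data)  # the subject "A\r" contains no LF
cr_hits = grep_lines(r"\r", data)  # the CR is still part of the line

print(lf_hits, cr_hits)
```

Under this model the LF never reaches the pattern, which would be consistent with the exit status 1 above.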




This goes now rather towards support and less towards (invalid) bug reporting:
Is there a way in PCRE to do what I wanted... e.g. matching a CR that is not
followed by an LF?
Or can I check for e.g. LFCRs that are not actually just part of a CRLFCR, i.e.
checking for LFCRs which are not prefixed by another CR?

So basically a way to check for the end-of-line character by itself and not as
a condition.
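
A possible approach, sketched under assumptions: lookahead and lookbehind assertions check a neighbouring character without consuming it. The sketch uses Python's re module and applies the patterns to the whole buffer rather than line by line. PCRE supports the same (?!...) and (?<!...) syntax, but whether a given tool passes the line terminators through to the pattern is not shown here.

```python
import re

# CR that is not followed by LF (negative lookahead).
lone_cr = re.compile(r"\r(?!\n)")

# LF CR that is not preceded by another CR (negative lookbehind).
lfcr_without_cr = re.compile(r"(?<!\r)\n\r")

cr_checks = {s: bool(lone_cr.search(s)) for s in ("a\rb", "a\r\nb", "a\r")}
lfcr_checks = {s: bool(lfcr_without_cr.search(s)) for s in ("a\n\rb", "a\r\n\rb")}

print(cr_checks, lfcr_checks)
```

Unlike \r[^\n], the lookahead form also accepts a CR at the very end of the input, because it only requires that no LF follows.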


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email