[pcre-dev] [Bug 1315] New: \r, \n and $ matching seems to be illogical or not fully documented.

Author: Christoph Anton Mitterer
Date:
To: pcre-dev
Subject: [pcre-dev] [Bug 1315] New: \r, \n and $ matching seems to be illogical or not fully documented.

http://bugs.exim.org/show_bug.cgi?id=1315
           Summary: \r, \n and $ matching seems to be illogical or not fully
                    documented.
           Product: PCRE
           Version: N/A
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: bug
          Priority: high
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: calestyo@???
                CC: pcre-dev@???

Hi.

I'm a bit confused by the following, which I can't explain from what's
documented in pcrepattern(3)'s "NEWLINE CONVENTIONS" section either.

I'm under UNIX, so $ == LF
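
For anyone who wants to reproduce this, the test files used in the session can be recreated roughly as follows. This is only a sketch: the file names and bytes are taken from the hexdumps in this report, while the scratch directory and the use of printf and od are assumptions for illustration.

```shell
# Recreate the six test files; the bytes match the hexdumps in this report.
# Assumption: a POSIX printf that understands the \r and \n escapes.
dir=$(mktemp -d)    # scratch directory (hypothetical location)

printf 'A\r'    > "$dir/cr_at_file_end"
printf 'A\rA'   > "$dir/cr_not_at_file_end"
printf 'A\n'    > "$dir/lf_at_file_end"
printf 'A\nA'   > "$dir/lf_not_at_file_end"
printf 'A\r\n'  > "$dir/crlf_at_file_end"
printf 'A\r\nA' > "$dir/crlf_not_at_file_end"

# Show the byte content of each file (od is used here because hd may
# not be installed everywhere).
for f in "$dir"/*; do
    printf '%s:' "${f##*/}"
    od -An -tx1 "$f"
done
```

None of the files contains anything besides 'A' (0x41), CR (0x0d) and LF (0x0a).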

$ file=cr_at_file_end
$ hd $file
00000000  41 0d                                             |A.|
00000002
$ pcregrep '\r[^\n]' $file ; echo $?
1
====> WHY? There is no \n after the 0d
$ pcregrep '\r[^$]' $file ; echo $?
1
====> WHY? I guess a line terminated not by the end-of-line character(s) but
rather by the end-of-file is also considered to match $ there. (*)
$ pcregrep '\n' $file ; echo $?
1
=> CLEAR
$ pcregrep '\r' $file ; echo $?
0
=> CLEAR
$ pcregrep '$' $file ; echo $?
1
====> WHY? If the above (*) is true, then this should also match, right?

$ file=cr_not_at_file_end
$ hd $file
00000000  41 0d 41                                          |A.A|
00000003
$ pcregrep '\r[^\n]' $file ; echo $?
A0
=> CLEAR
$ pcregrep '\r[^$]' $file ; echo $?
A0
=> CLEAR
$ pcregrep '\n' $file ; echo $?
1
=> CLEAR
$ pcregrep '\r' $file ; echo $?
A0
=> CLEAR
$ pcregrep '$' $file ; echo $?
1
====> WHY? If the above (*) is true, then this should also match, right?

$ file=lf_at_file_end
$ hd $file
00000000  41 0a                                             |A.|
00000002
$ pcregrep '\r[^\n]' $file ; echo $?
1
=> CLEAR
$ pcregrep '\r[^$]' $file ; echo $?
1
=> CLEAR
$ pcregrep '\n' $file ; echo $?
1
====> WHY? There is a 0a.
$ pcregrep '\r' $file ; echo $?
1
=> CLEAR
$ pcregrep '$' $file ; echo $?
1
====> WHY? There even IS the end-of-line character, not to mention (*)

$ file=lf_not_at_file_end
$ hd $file
00000000  41 0a 41                                          |A.A|
00000003
$ pcregrep '\r[^\n]' $file ; echo $?
1
=> CLEAR
$ pcregrep '\r[^$]' $file ; echo $?
1
=> CLEAR
$ pcregrep '\n' $file ; echo $?
1
====> WHY? There is a 0a.
$ pcregrep '\r' $file ; echo $?
1
=> CLEAR
$ pcregrep '$' $file ; echo $?
1
====> WHY? If the above (*) is true, then this should also match, right?

$ file=crlf_at_file_end
$ hd $file
00000000  41 0d 0a                                          |A..|
00000003
$ pcregrep '\r[^\n]' $file ; echo $?
1
=> CLEAR
$ pcregrep '\r[^$]' $file ; echo $?
1
=> CLEAR
$ pcregrep '\n' $file ; echo $?
1
====> WHY? There is a 0a.
$ pcregrep '\r' $file ; echo $?
A
0
=> CLEAR
$ pcregrep '$' $file ; echo $?
1
====> WHY? There even IS the end-of-line character, not to mention (*)

$ file=crlf_not_at_file_end
$ hd $file
00000000  41 0d 0a 41                                       |A..A|
00000004
$ pcregrep '\r[^\n]' $file ; echo $?
1
=> CLEAR
$ pcregrep '\r[^$]' $file ; echo $?
1
=> CLEAR
$ pcregrep '\n' $file ; echo $?
1
====> WHY? There is a 0a.
$ pcregrep '\r' $file ; echo $?
A
0
=> CLEAR
$ pcregrep '$' $file ; echo $?
1
====> WHY? There even IS the end-of-line character, not to mention (*)

Now as you can see, behaviour is a bit strange:
- (*) It seems that when the last line of the file has no end-of-line
character, one is implicitly appended (which is fine of course).
- sometimes \n behaves like $, sometimes not; IMHO... WTF?!
- a single "$" never matches, regardless of whether there is an end-of-line
character or not

So either this is wrong (or, better said, illogical) behaviour, or something is
missing from the documentation (I hope I haven't overlooked anything).

In the latter case, I'd guess the documentation should at least explain:
I. That the _current_ end-of-line character(s) are implicitly added to the
file's end when there are none
II. Why a single '$' doesn't match (which is especially weird if (I) is true)
III. Why \n sometimes behaves like $ and sometimes not (and of course the same
for the other end-of-line sequences)
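
One hypothesis that might bear on (III), which I have not checked against the pcregrep source: if the input is split into lines first and the pattern only ever sees each line without its terminator, a literal \n would have nothing to match. The sketch below shows that effect with a plain shell read loop; the function name count_lf_in_lines and the scratch file are made up for illustration, and the loop is only an analogy, not pcregrep's actual code.

```shell
# Hypothesis check: a line-oriented reader hands over each line WITHOUT
# its terminating LF, so the LF byte never appears in the text examined.
# Assumption: pcregrep splits its input in a comparable way (unverified).
count_lf_in_lines() {
    # $1 = file; prints how many of the lines read contain an LF byte
    n=0
    lf=$(printf '\nx'); lf=${lf%x}    # a single LF character
    # "|| [ -n "$line" ]" also handles a last line without a terminator
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *"$lf"*) n=$((n + 1)) ;;
        esac
    done < "$1"
    echo "$n"
}

f=$(mktemp)               # scratch file (hypothetical)
printf 'A\nA' > "$f"      # same bytes as lf_not_at_file_end
count_lf_in_lines "$f"    # prints 0 although the file contains 0a
```

If pcregrep does work this way, it would be consistent with '\n' never matching in any of the six files while '\r' matches wherever a 0d is present, but that remains a guess until confirmed.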

Thanks,
Chris.
