http://bugs.exim.org/show_bug.cgi?id=1315
Summary: \r, \n and $ matching seems to be illogical or not fully documented.
Product: PCRE
Version: N/A
Platform: All
OS/Version: All
Status: NEW
Severity: bug
Priority: high
Component: Code
AssignedTo: ph10@???
ReportedBy: calestyo@???
CC: pcre-dev@???
Hi.
I'm a bit confused by the following, which I can't explain from what's
documented in pcrepattern(3)'s "NEWLINE CONVENTIONS" section.
I'm on UNIX, so $ == LF.
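To make the tests below easier to reproduce, here is a small Python sketch that recreates the six test files byte-for-byte from the `hd` dumps. The names `TEST_FILES` and `write_test_files` are made up for this illustration; they are not part of PCRE or pcregrep.

```python
from pathlib import Path

# File name -> exact bytes, copied from the `hd` output in this report.
TEST_FILES = {
    "cr_at_file_end": b"A\r",           # 41 0d
    "cr_not_at_file_end": b"A\rA",      # 41 0d 41
    "lf_at_file_end": b"A\n",           # 41 0a
    "lf_not_at_file_end": b"A\nA",      # 41 0a 41
    "crlf_at_file_end": b"A\r\n",       # 41 0d 0a
    "crlf_not_at_file_end": b"A\r\nA",  # 41 0d 0a 41
}


def write_test_files(directory: Path) -> list[Path]:
    """Write every test file into `directory` and return the paths written."""
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for name, data in TEST_FILES.items():
        path = directory / name
        path.write_bytes(data)  # binary write: no newline translation
        written.append(path)
    return written
```

For example, `write_test_files(Path("."))` creates the files in the current directory, after which `hd cr_at_file_end` should print the first dump shown below.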
$ file=cr_at_file_end
$ hd $file
00000000 41 0d |A.|
00000002
$ pcregrep '\r[^\n]' $file ; echo $?
1
====> WHY? There is no \n after the 0d
$ pcregrep '\r[^$]' $file ; echo $?
1
====> WHY? I guess a line that is terminated not by the end-of-line character(s)
but by the end of the file is also considered to match $ there. (*)
$ pcregrep '\n' $file ; echo $?
1
=> CLEAR
$ pcregrep '\r' $file ; echo $?
0
=> CLEAR
$ pcregrep '$' $file ; echo $?
1
====> WHY? If the above (*) is true, then this should also match, right?
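About the `\r[^$]` test above: in PCRE, as in Python's `re` module, `$` is an ordinary character inside a character class, so `[^$]` means "any character except a dollar sign", not "not end of line". A minimal sketch, using Python's `re` as a stand-in for PCRE (an assumption; the two agree on these simple patterns). `cr_followed_by` is a hypothetical helper name.

```python
import re

# Inside a character class "$" is a literal dollar sign, not an end-of-line
# assertion, so [^$] matches any single character except "$" (LF included).
NOT_DOLLAR = re.compile(rb"[^$]")


def cr_followed_by(subject: bytes) -> tuple[bool, bool]:
    """Report whether the CR-plus-one-character patterns match subject.

    The first value is for the pattern with [^$], the second for [^\\n].
    """
    return (
        re.search(rb"\r[^$]", subject) is not None,
        re.search(rb"\r[^\n]", subject) is not None,
    )


print(NOT_DOLLAR.fullmatch(b"$"))         # None: the "$" itself is excluded
print(bool(NOT_DOLLAR.fullmatch(b"\n")))  # True: LF is not excluded
for subject in (b"A\r", b"A\rA"):
    print(subject, cr_followed_by(subject))
```

Both patterns therefore need some character after the CR, which would explain why they behave identically in every test in this report.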
$ file=cr_not_at_file_end
$ hd $file
00000000 41 0d 41 |A.A|
00000003
$ pcregrep '\r[^\n]' $file ; echo $?
A0
=> CLEAR
$ pcregrep '\r[^$]' $file ; echo $?
A0
=> CLEAR
$ pcregrep '\n' $file ; echo $?
1
=> CLEAR
$ pcregrep '\r' $file ; echo $?
A0
=> CLEAR
$ pcregrep '$' $file ; echo $?
1
====> WHY? If the above (*) is true, then this should also match, right?
$ file=lf_at_file_end
$ hd $file
00000000 41 0a |A.|
00000002
$ pcregrep '\r[^\n]' $file ; echo $?
1
=> CLEAR
$ pcregrep '\r[^$]' $file ; echo $?
1
=> CLEAR
$ pcregrep '\n' $file ; echo $?
1
====> WHY? There is a 0a.
$ pcregrep '\r' $file ; echo $?
1
=> CLEAR
$ pcregrep '$' $file ; echo $?
1
====> WHY? There even IS the end-of-line character, not to mention (*)
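The `\n` results are consistent with pcregrep splitting its input into lines and passing each line to the matcher without its terminator (an assumption here, not something this report verifies). The sketch below models that splitting under the LF convention; `split_lines_lf` and `any_line_matches` are hypothetical names, and Python's `re` stands in for PCRE.

```python
import re


def split_lines_lf(data: bytes) -> list[bytes]:
    """Split data on LF the way a line-oriented grep might.

    The LF terminator is removed, so it is never part of a subject line.
    A final line without a terminator is still returned as a line.
    """
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # data ended with LF: no extra empty line
    return lines


def any_line_matches(pattern: bytes, data: bytes) -> bool:
    """True if the pattern matches at least one terminator-stripped line."""
    return any(re.search(pattern, line) for line in split_lines_lf(data))


for name, data in [("lf_at_file_end", b"A\n"), ("lf_not_at_file_end", b"A\nA")]:
    print(name, split_lines_lf(data), any_line_matches(rb"\n", data))
```

Under this model `\n` can never match, because the LF byte is stripped before matching in every case.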
$ file=lf_not_at_file_end
$ hd $file
00000000 41 0a 41 |A.A|
00000003
$ pcregrep '\r[^\n]' $file ; echo $?
1
=> CLEAR
$ pcregrep '\r[^$]' $file ; echo $?
1
=> CLEAR
$ pcregrep '\n' $file ; echo $?
1
====> WHY? There is a 0a.
$ pcregrep '\r' $file ; echo $?
1
=> CLEAR
$ pcregrep '$' $file ; echo $?
1
====> WHY? If the above (*) is true, then this should also match, right?
$ file=crlf_at_file_end
$ hd $file
00000000 41 0d 0a |A..|
00000003
$ pcregrep '\r[^\n]' $file ; echo $?
1
=> CLEAR
$ pcregrep '\r[^$]' $file ; echo $?
1
=> CLEAR
$ pcregrep '\n' $file ; echo $?
1
====> WHY? There is a 0a.
$ pcregrep '\r' $file ; echo $?
A
0
=> CLEAR
$ pcregrep '$' $file ; echo $?
1
====> WHY? There even IS the end-of-line character, not to mention (*)
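Under the same assumed line-splitting model, a CR is just an ordinary data byte when the newline convention is LF, so for `A\r\n` the subject is `A\r`: `\r` matches, but `\r[^\n]` finds nothing after the CR. The hypothetical sketch below compares the LF and CRLF conventions for that file (`TERMINATORS` and `subjects` are illustrative names, not PCRE code).

```python
import re

# Terminator removed from each line under two newline conventions.
TERMINATORS = {"LF": b"\n", "CRLF": b"\r\n"}


def subjects(data: bytes, convention: str) -> list[bytes]:
    """Return the per-line subjects with the convention's terminator removed."""
    lines = data.split(TERMINATORS[convention])
    if lines and lines[-1] == b"":
        lines.pop()  # data ended with a terminator: no extra empty line
    return lines


for convention in TERMINATORS:
    lines = subjects(b"A\r\n", convention)
    # Under LF the CR survives in the subject; under CRLF it is stripped.
    print(convention, lines, any(re.search(rb"\r", line) for line in lines))
```

With LF the subject keeps its CR; with CRLF the whole `\r\n` pair would be treated as the line ending.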
$ file=crlf_not_at_file_end
$ hd $file
00000000 41 0d 0a 41 |A..A|
00000004
$ pcregrep '\r[^\n]' $file ; echo $?
1
=> CLEAR
$ pcregrep '\r[^$]' $file ; echo $?
1
=> CLEAR
$ pcregrep '\n' $file ; echo $?
1
====> WHY? There is a 0a.
$ pcregrep '\r' $file ; echo $?
A
0
=> CLEAR
$ pcregrep '$' $file ; echo $?
1
====> WHY? There even IS the end-of-line character, not to mention (*)
Now as you can see, behaviour is a bit strange:
- (*) It seems that an end-of-line character is implicitly appended to the
last line of the file when there is none (which is fine, of course).
- sometimes \n behaves like $, sometimes not; IMHO... WTF?!
- a lone "$" never matches, regardless of whether there is an end-of-line
character or not
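To check these observations against a single hypothesis, the sketch below assumes that pcregrep matches each LF-terminated line with the terminator stripped (an unterminated last line still counts as a line), and compares the predicted exit status with every status reported above. Python's `re` again stands in for PCRE, and all names are hypothetical. Under this assumption every reported result is reproduced except the `$` tests: the model predicts a match for `$` on every file, while pcregrep reported none.

```python
import re

# The six test files, byte-for-byte as shown by `hd` in this report.
FILES = {
    "cr_at_file_end": b"A\r",
    "cr_not_at_file_end": b"A\rA",
    "lf_at_file_end": b"A\n",
    "lf_not_at_file_end": b"A\nA",
    "crlf_at_file_end": b"A\r\n",
    "crlf_not_at_file_end": b"A\r\nA",
}
PATTERNS = [rb"\r[^\n]", rb"\r[^$]", rb"\n", rb"\r", rb"$"]

# Exit statuses reported above (0 = some line matched, 1 = none matched),
# in the same order as PATTERNS.
OBSERVED = {
    "cr_at_file_end": [1, 1, 1, 0, 1],
    "cr_not_at_file_end": [0, 0, 1, 0, 1],
    "lf_at_file_end": [1, 1, 1, 1, 1],
    "lf_not_at_file_end": [1, 1, 1, 1, 1],
    "crlf_at_file_end": [1, 1, 1, 0, 1],
    "crlf_not_at_file_end": [1, 1, 1, 0, 1],
}


def lf_lines(data: bytes) -> list[bytes]:
    """Split on LF and drop the terminator; an unterminated last line is kept."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()
    return lines


def predicted_status(pattern: bytes, data: bytes) -> int:
    """grep-style exit status under the assumed line-oriented model."""
    return 0 if any(re.search(pattern, line) for line in lf_lines(data)) else 1


def mismatches() -> list[tuple[str, str]]:
    """(file, pattern) pairs where the model disagrees with the report."""
    result = []
    for name, data in FILES.items():
        for pattern, seen in zip(PATTERNS, OBSERVED[name]):
            if predicted_status(pattern, data) != seen:
                result.append((name, pattern.decode()))
    return result


if __name__ == "__main__":
    for name, pattern in mismatches():
        print(f"{name}: model predicts a match for {pattern!r}, report shows none")
```

The `$` case is therefore the one result that line splitting alone does not account for.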
So either this is wrong (or, better said, illogical) behaviour, or something
is missing from the documentation (I hope I haven't overlooked anything).
In the latter case, I'd guess at least the following is missing:
I. That the _current_ end of line character(s) are implicitly added to the
file's end when there are none
II. Why single '$' doesn't match (which is especially weird if (I) is true)
III. Why \n sometimes behaves like $, sometimes not... (and, of course, the
same for the other end-of-line sequences)
Thanks,
Chris.