Author: Sheri
Date:
To: pcre-dev
Subject: Re: [pcre-dev] Newline feature request -- matching metacharacter?
Hi Philip,

Philip Hazel wrote:
> When PCRE is not in UTF-8 mode, it can still get characters with values
> up to 255, and this includes the Unicode 0x85 character - which is of
> course the same as ISO 8859-1.


OK, let me make my case, speaking as a user.

Perl already has a metacharacter that matches whichever line ending the
(ASCII) file uses: \n. Perl is smart enough to do this based on the data
content (I know there's more to it under the covers, but that is how it
looks to the user).

PCRE can't use \n in the same way.

You invented \R to handle Unicode line breaks. PCRE doesn't process
Unicode except in UTF-8 mode, so the Unicode user loses nothing if its
meaning is changed for ASCII. Remember, on Windows \x85 is a horizontal
ellipsis. Form Feed and Vertical Tab are not valid line breaks either
(though they are not as objectionable as \x85).

It would be convenient to have a metacharacter that matches only ASCII
line breaks (CR, LF and CRLF). It would address a shortcoming PCRE
already has in comparison with Perl. If you have a philosophical
objection to changing \R, maybe you could choose something else.
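To make the difference concrete, here is a rough sketch in Python. Python's re module has no \R, so both behaviours are spelled out as explicit alternations, and the names R_DEFAULT and R_ASCII are hypothetical. R_DEFAULT approximates what \R matches outside UTF-8 mode (CRLF, LF, VT, FF, CR, \x85); R_ASCII is the ASCII-only alternative asked for above. The sample is Windows-1252 bytes, where \x85 is an ellipsis.

```python
import re

# Approximation of PCRE's \R outside UTF-8 mode: CRLF, LF, VT, FF, CR or 0x85.
# (PCRE wraps this in an atomic group; plain alternation with CRLF first
# behaves the same for a simple split like this one.)
R_DEFAULT = re.compile(rb"\r\n|[\n\x0b\x0c\r\x85]")

# Hypothetical ASCII-only line-break metacharacter: CRLF, CR or LF only.
R_ASCII = re.compile(rb"\r\n|\r|\n")

# Windows-1252 text: byte 0x85 is a horizontal ellipsis, not a line break.
data = b"Wait\x85 really?\r\nYes.\nDone."

print(R_DEFAULT.split(data))  # -> [b'Wait', b' really?', b'Yes.', b'Done.']
print(R_ASCII.split(data))    # -> [b'Wait\x85 really?', b'Yes.', b'Done.']
```

The \R-style pattern breaks the first line at the ellipsis, while the ASCII-only pattern splits only at the real CRLF and LF endings.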

Regards,
Sheri