[pcre-dev] [Bug 1554] support subject strings with invalid U…

Author: Philip Hazel
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1554] support subject strings with invalid UTF-8 sequences

http://bugs.exim.org/show_bug.cgi?id=1554




--- Comment #5 from Philip Hazel <ph10@???> 2014-12-20 18:28:52 ---
On Sat, 20 Dec 2014, Vincent Lefevre wrote:

> It's also a pcregrep problem, and not solved there either! Anyway, even
> patterns like 'z[0-9]zzz' are concerned, as seen above.


Thanks for all your comments and tests, which are interesting. Of
course, any program can always be improved, but at this stage I'm not
concentrating on performance; my priority is getting the new API (PCRE2)
released.

Incidentally, do you know if the pcregrep you were using has JIT
support? (pcretest -C will tell you.) However, when a file has many
short lines, a great proportion of the work in a grep will be handling
the lines, not running matches, so I wouldn't expect JIT support to make
much difference.

pcregrep was originally written as a demonstration program for the PCRE
library, but people liked it, so it became a "real" program. I suspect
it works in a different way to GNU grep - not only because it uses a
different matching engine, but also because it supports features like
the -M (multiline) option, which I don't think grep supports. As for
performance issues, it is the underlying library that we concentrate on,
rather than pcregrep.

I have to say that implementing the new library API (PCRE2) is probably
the last big project that I will undertake - I'm getting too old :-( so
any further major development of PCRE/pcregrep will have to be done by
others. Over its lifetime PCRE has often been improved by patches
submitted by users who have been clever enough to spot ways of improving
it, either with new functionality or with performance improvements. I
hope this will continue.

Philip

