Re: [pcre-dev] Proposal for a new API for PCRE

Author: Ze'ev Atlas
Date:  
To: pcre-dev@exim.org
Subject: Re: [pcre-dev] Proposal for a new API for PCRE
Let me clarify what I mean.

In the proposal you have:
pcre2_set_global_options(context,
 uint32_t unset_option_bits,
 uint32_t set_option_bits)

This function sets and unsets on/off options that are to apply to every pattern that is processed using
this context. The second and third arguments are a combination of these bits:

PCRE2_DOLLAR_ENDONLY      $ matches only at the end
PCRE2_DUPNAMES            allow duplicate named subpatterns
PCRE2_JAVASCRIPT_COMPAT   modified pattern interpretation
PCRE2_NEVER_UTF           forbid (*UTF) in patterns
PCRE2_UTF                 patterns and subjects are coded in UTF
PCRE2_UCP                 use Unicode Properties for \d etc.
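
To illustrate the set/unset semantics described above, here is a minimal sketch in C. It is not the proposal's implementation: the context type, the function name, and the bit values are hypothetical stand-ins invented for this example, and the choice to apply the "unset" mask before the "set" mask is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values that mirror the option names listed above.
   The real values are not given in the proposal text quoted here. */
#define OPT_DOLLAR_ENDONLY    0x00000001u
#define OPT_DUPNAMES          0x00000002u
#define OPT_JAVASCRIPT_COMPAT 0x00000004u
#define OPT_NEVER_UTF         0x00000008u
#define OPT_UTF               0x00000010u
#define OPT_UCP               0x00000020u

/* Hypothetical stand-in for the proposal's context object. */
typedef struct {
    uint32_t global_options;
} sketch_context;

/* Clear the bits in unset_option_bits, then turn on set_option_bits.
   Because "unset" is applied first, a bit named in both masks ends up
   set; that ordering is an assumption of this sketch. */
static void sketch_set_global_options(sketch_context *context,
                                      uint32_t unset_option_bits,
                                      uint32_t set_option_bits)
{
    assert(context != NULL);
    context->global_options &= ~unset_option_bits;
    context->global_options |= set_option_bits;
}
```

With a context that starts with the UTF and DUPNAMES bits, unsetting UTF while setting NEVER_UTF and UCP would leave DUPNAMES, NEVER_UTF, and UCP enabled.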

I suggest adding:

PCRE2_CHARSET_8859_1 or PCRE2_CHARSET_LATIN1
PCRE2_CHARSET_EBCDIC_1047
.
.
PCRE2_CHARSET_PRIVATE1
PCRE2_CHARSET_PRIVATE2

.
.

These would be mutually exclusive and, yes, most would imply PCRE2_NEVER_UTF.

More official names would be added once we have the chartables developed and tested.
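
The mutual-exclusion rule could be checked along these lines. This is only a sketch under stated assumptions: the bit values and the function name are hypothetical, and for simplicity every charset option here implies the never-UTF behaviour, whereas the suggestion above says only that most would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values mirroring the option names suggested above. */
#define OPT_NEVER_UTF           0x00000008u
#define OPT_UTF                 0x00000010u
#define OPT_CHARSET_LATIN1      0x00000100u
#define OPT_CHARSET_EBCDIC_1047 0x00000200u
#define OPT_CHARSET_PRIVATE1    0x00000400u
#define OPT_CHARSET_PRIVATE2    0x00000800u
#define OPT_CHARSET_MASK (OPT_CHARSET_LATIN1 | OPT_CHARSET_EBCDIC_1047 | \
                          OPT_CHARSET_PRIVATE1 | OPT_CHARSET_PRIVATE2)

/* Validate a requested option word.
   Returns 0 and stores the normalized options in *out on success,
   or -1 if the request is contradictory (two charsets, or a charset
   combined with UTF). */
static int sketch_normalize_charset(uint32_t options, uint32_t *out)
{
    uint32_t charset = options & OPT_CHARSET_MASK;

    assert(out != NULL);

    /* More than one charset bit set: x & (x - 1) is non-zero. */
    if ((charset & (charset - 1u)) != 0)
        return -1;

    if (charset != 0) {
        if (options & OPT_UTF)
            return -1;              /* charset and UTF conflict */
        options |= OPT_NEVER_UTF;   /* assumed implication */
    }

    *out = options;
    return 0;
}
```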
 
Ze'ev Atlas




________________________________
From: Ze'ev Atlas <zatlas1@???>
To: "pcre-dev@???" <pcre-dev@???>
Sent: Thursday, August 29, 2013 12:50 AM
Subject: Re: [pcre-dev] Proposal for a new API for PCRE


I read the proposal carefully and fell in love with it.  This is actually very good and I will make more comments as I think of them.

The first point I have is that in the old PCRE, UTF8 and EBCDIC were mutually exclusive in some 'magical' way.  I would suggest that this should rather be handled in the context, so that the two code pages could live happily together.  Somehow, I never accepted the fact that the 8-bit library that I compile in my port is deceptively the same as the one that is compiled for UTF8, yet it does not produce the same results.  I know that my port is confined to that strange and obscure parallel world of z/OS, but in real life, EBCDIC data can and does trickle down to Unix and Windows based operations, and ASCII data does come into z/OS servers.

Even fancier, EBCDIC has a few code pages for different countries (Greek, Turkish, Hebrew, Russian and probably some more) and so does ASCII.  Handling all this should be relegated to the context, with the possibility of supplying a library of standard (and customized) code pages (not only EBCDIC), rather than allowing the super sophisticated user to supply his/her rigidly customized character table in an obscure way.

I know that I am mixing two related issues (UTF8 vs. EBCDIC and ASCII vs. EBCDIC), but what I am trying to say is: standardize the chartable handling and manage it via the context.  We do not have to supply every code-page table in the world, only the most popular ones.  I am sure that many people would develop such tables and would be happy to share them.
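
As a rough illustration of per-context character tables, the sketch below builds one classification table per code page and lets a context point at whichever one applies. All type and function names are hypothetical and are not part of PCRE or of the proposal. The EBCDIC ranges are those of IBM-1047 for digits and unaccented letters; the ISO-8859-1 table marks only the ASCII subset to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Character classes stored as bit flags, one byte per code unit. */
enum { CT_DIGIT = 1, CT_UPPER = 2, CT_LOWER = 4 };

/* Hypothetical per-code-page table. */
typedef struct {
    const char   *name;
    unsigned char classes[256];
} sketch_chartable;

/* Hypothetical context that selects one table. */
typedef struct {
    const sketch_chartable *tables;
} sketch_context;

/* Mark every code unit in [first, last] as belonging to class cls. */
static void mark(sketch_chartable *t, int first, int last, unsigned char cls)
{
    for (int c = first; c <= last; c++)
        t->classes[c] |= cls;
}

static void build_latin1(sketch_chartable *t)
{
    memset(t, 0, sizeof *t);
    t->name = "ISO-8859-1";
    mark(t, 0x30, 0x39, CT_DIGIT);   /* 0-9 */
    mark(t, 0x41, 0x5A, CT_UPPER);   /* A-Z (accented letters omitted) */
    mark(t, 0x61, 0x7A, CT_LOWER);   /* a-z (accented letters omitted) */
}

static void build_ebcdic_1047(sketch_chartable *t)
{
    memset(t, 0, sizeof *t);
    t->name = "IBM-1047";
    mark(t, 0xF0, 0xF9, CT_DIGIT);   /* 0-9 */
    mark(t, 0xC1, 0xC9, CT_UPPER);   /* A-I */
    mark(t, 0xD1, 0xD9, CT_UPPER);   /* J-R */
    mark(t, 0xE2, 0xE9, CT_UPPER);   /* S-Z */
    mark(t, 0x81, 0x89, CT_LOWER);   /* a-i */
    mark(t, 0x91, 0x99, CT_LOWER);   /* j-r */
    mark(t, 0xA2, 0xA9, CT_LOWER);   /* s-z */
}

/* Classify a code unit using whichever table the context selects. */
static int sketch_is_digit(const sketch_context *ctx, unsigned char c)
{
    assert(ctx != NULL && ctx->tables != NULL);
    return (ctx->tables->classes[c] & CT_DIGIT) != 0;
}
```

The same byte then classifies differently depending on the context: 0xF1 is a digit under the IBM-1047 table but not under the ISO-8859-1 one, and 0x31 is the reverse.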
 
Ze'ev Atlas




________________________________
From: "ph10@???" <ph10@???>
To: Ze'ev Atlas <zatlas1@???>
Cc: "pcre-dev@???" <pcre-dev@???>
Sent: Wednesday, August 28, 2013 1:48 PM
Subject: Re: [pcre-dev] Proposal for a new API for PCRE


On Tue, 27 Aug 2013, Ze'ev Atlas wrote:

> Seeing it in the context (pun not intended)  of old style programming
> languages, to which my port (PCRE for native z/OS) is catering, I'd
> lament the extra work and complexity, although, I'd probably be able
> to do the port for that API as well.


I would argue that it is not that much more complex. In the "simple
example" section of the document, the old example is 23 lines long, and
the new example is 25 lines long. Not very different.

> I understand that I am a part of a backwards camp of people who are
> not that excited about OO approach.


As I remarked in a previous post, I don't myself think it really is an
OO approach.

>  I would like however to borrow from the OO terminology, is there a
> way to encapsulate all the extra preparations so the poor programmer
> would not need to do all this extra [confusing] coding?  


For "simple" applications, the only extra preparations are one call to
pcre2_init_context() and one call to pcre2_create_match_data(). The
simplest interface of all is of course the POSIX wrapper, and that will
remain unchanged. All the extra coding will be hidden, though the POSIX
wrapper cannot provide all the functionality of the native API.
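
For reference, the POSIX-style interface mentioned above looks roughly like this. The sketch uses the standard regcomp(), regexec() and regfree() calls from the system <regex.h>; PCRE's wrapper provides functions of the same names through its own header (pcreposix.h in the existing library). The helper name posix_first_match is made up for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Find the first match of an extended regular expression in subject.
   Returns 0 on a match (byte offsets stored in *start and *end),
   1 if there is no match or matching fails, and -1 if the pattern
   does not compile. */
static int posix_first_match(const char *pattern, const char *subject,
                             int *start, int *end)
{
    regex_t    re;
    regmatch_t m[1];
    int        rc;

    assert(pattern != NULL && subject != NULL);
    assert(start != NULL && end != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    rc = regexec(&re, subject, 1, m, 0);
    regfree(&re);                     /* always release the compiled pattern */

    if (rc != 0)
        return 1;

    *start = (int)m[0].rm_so;
    *end   = (int)m[0].rm_eo;
    return 0;
}
```

There is no context or match-data object to set up here, which is the trade-off described above: less preparation, but also less of the native API's functionality.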

Philip

--
Philip Hazel
--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev