Re: [pcre-dev] New API proposal revision

Top Page
Delete this message
Author: Rv
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] New API proposal revision
Le 2014/03/08 13:27, ph10@??? a écrit :
> Thank you for taking the time to read it and comment.


My pleasure. I just discover and use your work, I used to use POSIX
regex of GNU libc, it nice a to have so much powerfull library than PCRE
! Thank again !
I am happy you understand "my english" !

>> - For all functions concerning context, I will prefer to always start the name
>> of the function by pcre2_ctx_* or pcre2_context_*
> I guess I used the names that I did because of English word order. For
> example, "free context" is more natural than "context free" because
> English puts the verb before the object.
>
> It is not too late to change (but as I've started - slowly - working on
> a framework, it is almost too late). However, if, for example,
> pcre2_free_context changed to pcre2_context_free, then for consistency
> we would also have to change pcre2_free_match_data to
> pcre2_match_data_free and so on. In other words, group the names by the
> object they work on instead of by the action they take.
>
> I am inclined not to change - does anybody else on this list have views?


I understand. English is not my natural language. But I see two (or
maybe they are the same ^^) advantages of this:
1) It gives an object language side as you know which "object" the
function manipulates (structure is an object with everything public ^^)
just by the start of its name.
2) It is easier when you use the lib because you know that all functions
to interact with context have names starting by prce2_ctx, and the ones
for match data starts by prce2_mdata, ... Easier to learn, to find in
documentation or header files, and then to use ! Ok, this is an opinion
of a non-english natural language speaker... But it seems to me better.
Anyone else ?

>> - I prefer to use the word "alloc" in function name when allocation is done
>> and reserve the word "init" to initialize structures. Thus pcre2_init_context
>> will be pcre2_ctx_alloc (and pcre2_ctx_free)
> pcre2_init_context *does* initialize the structure. It seemed to me that
> it would be more convenient for a user to write
>
> pcre2_context *c = pcre2_init_context(NULL);
>
> instead of
>
> pcre2_context *c = pcre2_init_context(pcre2_alloc_context());


I don't see things like that. In fact, this is not because the function
allocates some memory that it can not initialize it. The way I will do
is something as following:
In file.h
extern void prce2_ctx_init(prce2_ctx * ctx);
extern prce2_ctx * prce2_ctx_alloc(void);
in file.c:
prce2_ctx_init(prce2_ctx * ctx) {
ctx-> = /* do initialization of fields */
}
prce2_ctx * prce2_ctx_alloc(void) {
prce2_ctx * ctx;
if(ctx = malloc(sizeof(prce2_ctx)), ctx != NULL) {
pcre2_ctx_init(ctx);
}
return ctx;
}
To be honest, I think that in this way in the case of PCRE2, the export
of the init function is not necessary (users don't need it, they need of
the alloc function which does the initialization)

>
>> - Are you sure this is necessary to have memory_management functions
>> (private_malloc/private_free) by context ? Same question for
>> recursion_memory_management ?
> Yes! That is one of the main reasons for making a new API. The fact that
> the current private malloc/free functions are global variables means
> that they cannot be different for different threads. The new API has no
> global variables.


Ok. I do like global variables. When I have said generic (gen) or global
(glo), the meaning was not to be global variables in C language, but
global variables for the PCRE2 API in all contexts. All context could
have a pointer to a structure which defines some general features of
PCRE2 if these features are not overridden in the context.
But to be able to define private malloc/free functions in a general way
(and not as global variables) may be sufficient.
My problem is that I am not aware of the problems you could encounter
with threads (why threads will need to have different private
malloc/free functions).

>
>> - Is it necessary to have a pcre2_get_user_data()/pcre2_ctx_get_udata() as
>> this information is given when private_malloc() and private_free() are called
>> ?
> An earlier draft of the proposal was different, but during previous
> discussion people said there should be complete matching groups of
> set/get functions for each field in the context, for consistency. That
> is why these functions are now specified.


Ok, I understand the matching groups of set/get functions concept.
But if the user data are used only when private_malloc() and
private_free() are called, I think there is 2 choices:
- private_malloc() and private_free() have as second parameter the user
data and the get function is not useful.
- private_malloc() and private_free() have as second parameter the
context and the get function is useful. In this case, you can imagine to
not get the user data (because you don't need it) and/or to get user
data using the get function and/or other information (other than user
data) from the context (may be interesting for future enhancement).

>
>> - I will change all returns code of the API to have the following rules:
>> o Allocation functions return as standard allocation functions, that is a
>> pointer, NULL if failure
>> o Other functions always return an int >= 0 if OK, < 0 if NOK (#define
>> PCRE2_ERR_INVAL_XX => return(-PCRE2_ERR_INVAL_XX))
> The error codes are very similar to the existing API. They don't seem to
> have been a problem in the past, so I saw no reason to change them.


There are problems !
1) No defines for the pcre_compile() errrors. I know, there are a lot of
different errors, but they shall be defined !
2) defines for pcre_exec() errors are already negative in header file !
Not good:
I will prefer than the standard "open" function will return a negative
value of the corresponding errno than -1 and have to use errno with
positive value. Another way to say that:
- define error with positive value, but
- in case of error, return the negative value of the error.
3) As some functions are not named with "alloc" (pcre2_compile), then I
will think they return an error code instead of a pointer...

>
>> Thus
>> prce2cc = pcre2_compile_alloc(ctx, pattern, size, options, &errcode, &errpos);
>>
>> - Maybe prce2cc structure will be able to keep errors information and remove
>> errcode et errpos in the call of compile_alloc.
> pcre2_compile returns NULL for a compile-time error - there is no
> structure.


Yes, no structure when there is an error. But my proposal is to return
structure even in case of failure (see below)

>
>> Then the test
>> if(prce2cc == NULL){
>>     pcre2_get_error_message(errorcode, buffer, 120);
>> }
>> will become something like:
>> if(prce2_compile_failure(prce2cc)) {
>>     errcode = prce2_get_errcode(prce2cc);
>>     errpos = prce2_get_errpos(prce2cc);
>>     prce2_get_errpos(prce2cc, buffer, 120);
>> and/or
>>     pcre2_strerror(errcode, buffer, 120) /* like standard strerror() */
>> }
> I think that is unnecessarily complicated - and you still have to free
> the pcre2cc structure.


Good point.
But as you will write the free of the structure when it will be
successful, the same code could be used when an error appears ! Easier
to write (no if statement, just finish by freeing the structure) and if
the error information are contained in the structure itself, there are
less parameters to give when calling the functions and so less
parameters left to the user of the API to manage. Think about it.

> The code at the moment does not even allocate a
> structure until it has made a first-pass check through the regex, which
> is where most compile-time errors are picked up.


Sorry, I am not sure to understand very well (limited english...)

>
>> - I always use the fullinfo COUNTSIZE to get the size of the expected ovector
>> size
>> So I prefer to have something like
>> match_data = pcre2_mdata_alloc(pcre2cc, 0); /* 0 to use fullinfo(COUNTSIZE), >
>> 0 for a specify size*/
> You need also to pass the context (for malloc/free), but I can see that
> something like this could be a useful convenience. Thank you for the
> idea.


^^ ! I will used it for sure !

>
>> rc = pcre2_exec(ctx, prce2cc, array_subject, lower, upper, max, stride,
>> option, match_data);
>> I will use array_subject (a) lower (l), upper (u), max (m) and stride (s) to
>> define slice as follow (if I had time, I will implement it for python slice):
>> 8<------------------------------------------------------------------
>> A bit long, sorry... You can skip if no interest
> Er, yes, I didn't really follow all of that.


Nevermind, I will try to come back again about it
I thing this is the "negative length means ‘zero-terminated string’"
which disturbs me.
I need more time, sorry

Since to exchange idea !

--
Rv