[pcre-dev] New API proposal revision

Author: Rv
Date:
To: pcre-dev
Subject: [pcre-dev] New API proposal revision

Hello!

First of all, please be kind about my poor English. Sorry for that!
Then, as I have just discovered the PCRE library, I would like to
congratulate all of you on this great job.
Finally, I have seen the proposal for a V2 of the API and could not
resist reading it (quickly), without any knowledge of what has already
been discussed here, but I would still like to give you my first
(certainly not correct) impression.
My time is a bit limited, so I will keep it short:

- For all functions concerning contexts, I would prefer the function
name to always start with pcre2_ctx_* or pcre2_context_*

- I prefer to use the word "alloc" in a function name when an allocation
is done, and to reserve the word "init" for initializing structures. Thus
pcre2_init_context would become pcre2_ctx_alloc (with pcre2_ctx_free)

- Are you sure it is necessary to have memory management functions
(private_malloc/private_free) per context? Same question for
recursion memory management?
Maybe I would implement a generic (gen) or global (glo) interface to
specify these functions with pcre2_(gen|glo)_mem_set(malloc, free) and
pcre2_(gen|glo)_recmem_set(malloc, free). By default, a newly created
context inherits these functions.
Then, if necessary, implement per-context functions
pcre2_ctx_mem_set(ctx, malloc, free) and
pcre2_ctx_recmem_set(ctx, malloc, free)
Thus:
pcre2_gen_mem_set(mymalloc, myfree);
pcre2_ctx *ctx = pcre2_ctx_alloc(); /* no need to pass the size of the
context */
pcre2_ctx_mem_set(ctx, myctxmalloc, myctxfree); /* if needed */
pcre2_ctx_recmem_set(ctx, myctxrecmalloc, myctxrecfree); /* if needed */
pcre2_ctx_set_udata(ctx, &mydata);
pcre2_ctx_set_something(ctx, ...);
or pcre2_gen_set_something() if it is a global setting
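
To make the inheritance idea concrete, here is a minimal sketch in C. Every
name in it (pcre2_malloc_fn, the struct fields, self_free, mymalloc, myfree)
is hypothetical and only follows the naming suggested in this message; it is
not an existing PCRE interface. The one detail I added is self_free: the
context remembers which free function matches the allocator that created it,
so that a later pcre2_ctx_mem_set() cannot cause a mismatched free.

```c
#include <assert.h>
#include <stdlib.h>

/* All names below are hypothetical and follow the naming proposed in this
 * message; none of them is an existing PCRE function. */
typedef void *(*pcre2_malloc_fn)(size_t size);
typedef void  (*pcre2_free_fn)(void *ptr);

typedef struct pcre2_ctx {
    pcre2_malloc_fn mem_alloc, rec_alloc;  /* general / recursion allocators */
    pcre2_free_fn   mem_free,  rec_free;
    pcre2_free_fn   self_free;             /* releases the context itself */
    void           *udata;
} pcre2_ctx;

/* Global ("gen") defaults, inherited by every newly created context. */
static pcre2_malloc_fn gen_alloc = malloc, gen_rec_alloc = malloc;
static pcre2_free_fn   gen_free  = free,   gen_rec_free  = free;

void pcre2_gen_mem_set(pcre2_malloc_fn m, pcre2_free_fn f)
{
    assert(m != NULL && f != NULL);
    gen_alloc = m;
    gen_free = f;
}

void pcre2_gen_recmem_set(pcre2_malloc_fn m, pcre2_free_fn f)
{
    assert(m != NULL && f != NULL);
    gen_rec_alloc = m;
    gen_rec_free = f;
}

/* No size argument: the library knows the size of its own context. */
pcre2_ctx *pcre2_ctx_alloc(void)
{
    pcre2_ctx *ctx = gen_alloc(sizeof *ctx);
    if (ctx == NULL)
        return NULL;
    ctx->mem_alloc = gen_alloc;
    ctx->mem_free  = gen_free;
    ctx->rec_alloc = gen_rec_alloc;
    ctx->rec_free  = gen_rec_free;
    ctx->self_free = gen_free;
    ctx->udata     = NULL;
    return ctx;
}

/* Per-context override, only used when the global default does not fit. */
void pcre2_ctx_mem_set(pcre2_ctx *ctx, pcre2_malloc_fn m, pcre2_free_fn f)
{
    assert(ctx != NULL && m != NULL && f != NULL);
    ctx->mem_alloc = m;
    ctx->mem_free = f;
}

void pcre2_ctx_free(pcre2_ctx *ctx)
{
    if (ctx != NULL)
        ctx->self_free(ctx);
}

/* Demo allocators that count how often they are called. */
static int my_allocs, my_frees;
static void *mymalloc(size_t size) { my_allocs++; return malloc(size); }
static void myfree(void *ptr) { my_frees++; free(ptr); }
```

With this, calling pcre2_gen_mem_set(mymalloc, myfree) once is enough for
every context created afterwards to use mymalloc/myfree, and the per-context
setter is only needed for the exceptions.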

- Is it necessary to have pcre2_get_user_data()/pcre2_ctx_get_udata(),
given that this information is already passed when private_malloc() and
private_free() are called?

- I would change all return codes of the API to follow these rules:
o Allocation functions return like standard allocation functions, that is
a pointer, or NULL on failure
o Other functions always return an int: >= 0 if OK, < 0 if not OK (#define
PCRE2_ERR_INVAL_XX => return(-PCRE2_ERR_INVAL_XX))

Thus
pcre2cc = pcre2_compile_alloc(ctx, pattern, size, options, &errcode,
&errpos);
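
The two rules could look like this in C. This is only a sketch: the error
numbers, the match_limit field and pcre2_ctx_set_match_limit() are names I
made up for the illustration, they are not taken from the proposal.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical error numbers: defined positive, returned negated. */
#define PCRE2_ERR_INVAL_CTX   1
#define PCRE2_ERR_INVAL_VALUE 2

typedef struct pcre2_ctx {
    int match_limit;   /* hypothetical setting, 0 = not set */
} pcre2_ctx;

/* Allocation function: behaves like malloc(), NULL on failure. */
pcre2_ctx *pcre2_ctx_alloc(void)
{
    pcre2_ctx *ctx = malloc(sizeof *ctx);
    if (ctx != NULL)
        ctx->match_limit = 0;
    return ctx;
}

void pcre2_ctx_free(pcre2_ctx *ctx) { free(ctx); }

/* Any other function: >= 0 if OK, negated error number otherwise. */
int pcre2_ctx_set_match_limit(pcre2_ctx *ctx, int limit)
{
    if (ctx == NULL)
        return -PCRE2_ERR_INVAL_CTX;
    if (limit <= 0)
        return -PCRE2_ERR_INVAL_VALUE;
    ctx->match_limit = limit;
    assert(ctx->match_limit > 0);   /* postcondition of a successful call */
    return 0;
}
```

The caller then needs only two checks everywhere: "ptr == NULL" after an
allocation and "rc < 0" after anything else.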

- Maybe the pcre2cc structure could keep the error information itself,
so that errcode and errpos can be removed from the call to compile_alloc.
Then the test
if(pcre2cc == NULL){
    pcre2_get_error_message(errcode, buffer, 120);
}
would become something like:
if(pcre2_compile_failure(pcre2cc)) {
    errcode = pcre2_get_errcode(pcre2cc);
    errpos = pcre2_get_errpos(pcre2cc);
    pcre2_get_errmsg(pcre2cc, buffer, 120);
and/or
    pcre2_strerror(errcode, buffer, 120); /* like standard strerror() */
}
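
A small sketch of what "the compiled object keeps its own error" could mean.
Please read it as an illustration only: the error constants are hypothetical,
the signature of pcre2_compile_alloc() is reduced to the pattern, and the
"compiler" merely checks that parentheses are balanced, as a stand-in for
real pattern compilation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error codes for the toy compiler below. */
enum {
    PCRE2_ERR_NONE = 0,
    PCRE2_ERR_UNMATCHED_CLOSE = 1,
    PCRE2_ERR_MISSING_CLOSE = 2
};

typedef struct pcre2cc {
    int    errcode;  /* PCRE2_ERR_NONE when compilation succeeded */
    size_t errpos;   /* offset in the pattern where the error was found */
} pcre2cc;

/* Returns NULL only when memory runs out; pattern errors are stored in
 * the returned object instead of in separate output arguments. */
pcre2cc *pcre2_compile_alloc(const char *pattern)
{
    assert(pattern != NULL);
    pcre2cc *cc = malloc(sizeof *cc);
    if (cc == NULL)
        return NULL;
    cc->errcode = PCRE2_ERR_NONE;
    cc->errpos = 0;

    size_t depth = 0, i;
    for (i = 0; pattern[i] != '\0'; i++) {
        if (pattern[i] == '(') {
            depth++;
        } else if (pattern[i] == ')') {
            if (depth == 0) {
                cc->errcode = PCRE2_ERR_UNMATCHED_CLOSE;
                cc->errpos = i;
                return cc;
            }
            depth--;
        }
    }
    if (depth > 0) {
        cc->errcode = PCRE2_ERR_MISSING_CLOSE;
        cc->errpos = i;   /* detected at the end of the pattern */
    }
    return cc;
}

/* A NULL object (out of memory) also counts as a failure. */
int pcre2_compile_failure(const pcre2cc *cc)
{
    return cc == NULL || cc->errcode != PCRE2_ERR_NONE;
}

int    pcre2_get_errcode(const pcre2cc *cc) { return cc->errcode; }
size_t pcre2_get_errpos(const pcre2cc *cc)  { return cc->errpos; }

/* Like strerror(): copies the message for errcode into buffer.
 * Returns 0, or -1 if the code is unknown. */
int pcre2_strerror(int errcode, char *buffer, size_t size)
{
    static const char *const messages[] = {
        "no error",
        "unmatched closing parenthesis",
        "missing closing parenthesis"
    };
    if (errcode < 0 || errcode > PCRE2_ERR_MISSING_CLOSE)
        return -1;
    snprintf(buffer, size, "%s", messages[errcode]);
    return 0;
}

void pcre2_compile_free(pcre2cc *cc) { free(cc); }
```

One consequence of this design: the caller must free the object even when
compilation failed, because the object is what carries the error.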


- I always use the fullinfo COUNTSIZE to get the expected ovector size.
So I would prefer to have something like
match_data = pcre2_mdata_alloc(pcre2cc, 0); /* 0 to use
fullinfo(COUNTSIZE), > 0 for a specific size */

rc = pcre2_exec(ctx, pcre2cc, array_subject, lower, upper, max, stride,
option, match_data);
I would use array_subject (a), lower (l), upper (u), max (m) and stride
(s) to define a slice as follows (if I had time, I would implement it for
Python slices):
8<------------------------------------------------------------------
A bit long, sorry... You can skip this if it is of no interest
l and u are offsets, starting at 1 (not zero)
A B C D E
1 2 3 4 5
-5 -4 -3 -2 -1

*** l and u:

* If l == 0 => l = 1
* If u == 0 => u = length(a)

* If l > 0 => the slice starts at index x = l-1, i.e. at a[l-1]
* If u > 0 => the slice ends at index y = u-1, i.e. at a[u-1]

* If l < 0 => the slice starts at index x = length(a)+l, so -1 is the
last element
* If u < 0 => the slice ends at index y = length(a)+u

* If l > length(a) (i.e. l > 0 and out of bounds) => x = length(a)-1
* If u > length(a) (i.e. u > 0 and out of bounds) => y = length(a)-1
* If l < -length(a) (i.e. l < 0 and out of bounds) => x = 0
* If u < -length(a) (i.e. u < 0 and out of bounds) => y = 0

*** m

* If m < length(a) => z = m
* If m >= length(a) => z = length(a)

*** s

* cannot be 0

*** S (resulting slice)

* If x < y && s > 0, S is the string from a[x] to a[y] by stride of +s (S
is in the same order as a), up to z elements
* If x < y && s < 0, S is the string from a[y] to a[x] by stride of +s (S
is in the reverse order of a), up to z elements
* If x > y && s > 0, S is the string from a[x] to a[y] by stride of -s (S
is in the reverse order of a), up to z elements
* If x > y && s < 0, S is the string from a[y] to a[x] by stride of -s (S
is in the same order as a), up to z elements
* S[n] = '\0', where n is the number of elements in S

A bit... too much, maybe...
8<------------------------------------------------------------------
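
For those who prefer code to rules, here is how I would sketch the slice
computation in C. It rests on a few assumptions of mine: the subject is a
NUL-terminated char string, a negative bound counts from the end so that -1
is the last element (as in the A..E diagram), and the result is truncated if
the output buffer is too small. The names magnitude(), bound_to_index() and
slice() are hypothetical helpers, not part of any proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* |v| as a size_t, written so that the most negative long cannot overflow. */
static size_t magnitude(long v)
{
    return v > 0 ? (size_t)v : (size_t)(-(v + 1)) + 1;
}

/* Map a 1-based bound (0 = default, negative = counted from the end) to a
 * 0-based index clamped to [0, len-1]. Requires len > 0. */
static size_t bound_to_index(long b, size_t len, int is_upper)
{
    if (b == 0)
        return is_upper ? len - 1 : 0;
    if (b > 0)
        return (size_t)b > len ? len - 1 : (size_t)b - 1;
    size_t back = magnitude(b);
    return back > len ? 0 : len - back;
}

/* Writes the slice of a into out (always NUL-terminated, truncated to fit
 * outsize) and returns the number of elements copied, or -1 if s == 0 or
 * outsize == 0. */
long slice(const char *a, long l, long u, size_t m, long s,
           char *out, size_t outsize)
{
    assert(a != NULL && out != NULL);
    if (s == 0 || outsize == 0)
        return -1;

    size_t len = strlen(a), n = 0;
    if (len > 0) {
        size_t z = m < len ? m : len;          /* element limit */
        size_t x = bound_to_index(l, len, 0);
        size_t y = bound_to_index(u, len, 1);
        size_t lo = x < y ? x : y;
        size_t hi = x < y ? y : x;
        size_t step = magnitude(s);
        /* Same order as a when (x <= y, s > 0) or (x > y, s < 0). */
        int forward = (x <= y) == (s > 0);
        size_t i = forward ? lo : hi;

        while (n < z && n + 1 < outsize) {
            out[n++] = a[i];
            if (forward) {
                if (hi - i < step)
                    break;
                i += step;
            } else {
                if (i - lo < step)
                    break;
                i -= step;
            }
        }
    }
    out[n] = '\0';
    return (long)n;
}
```

For example, with a = "ABCDE", the call slice(a, -1, 1, 5, 2, out, sizeof out)
starts at E, ends at A and walks backwards two by two, so out holds "ECA".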

And to finish:
pcre2_mdata_free(match_data);
pcre2_compile_free(ctx, pcre2cc);

It's late. I will continue if someone is interested in discussing all
of this.
Congratulations if you have read this far!

Regards,


--
Rv