Re: [pcre-dev] New API

Author: Carsten Klein
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] New API
Hi Philip,

>> In one of the previous versions of this document, you said, that the
>> string extraction functions return 1 on success and zero otherwise
>
> Did I? I thought my only comment on these was "Apart from changes to the
> variable types and the addition of a context argument for functions that
> get memory, these are otherwise unchanged." And in the current
> specification they return a length for success and a negative number for
> error. That is certainly my intention.


No, actually you didn't. Sorry, I mixed this up with the pcre2_version
function. Since the size of the buffer needed for the version string is
easy to estimate, that function does not seem to have a problem.

>
>> There are languages out there, that have string types, which store the
>> string's length. For these, I suggest to return the length of the copied
>> string (number of characters, of course, not bytes), if the copy process
>> worked fine.
>
> Number of characters may be tricky; what is known is the number of
> bytes (or 16-bit or 32-bit units). Nowhere else in the API does PCRE
> count in characters.


In fact, that is what I meant when I referred to "number of characters".
Of course, with multi-unit encodings (UTF-8 sequences, UTF-16 surrogate
pairs) the number of "real" characters may differ from the number of
8- or 16-bit units, but that is not the point here.

It seems I am heavily influenced by the Win32 API. There, the number of
characters is either the number of 16-bit WCHARs or the number of 8-bit
CHARs, depending on whether Unicode is used or not.

So, returning the number of bytes, 16-bit or 32-bit units is actually
what I was requesting.
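
To illustrate the difference between units and "real" characters, here
is a minimal sketch for the 8-bit case. The function names are made up
for this mail and are not part of PCRE; the code point count assumes
the input is valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of 8-bit code units (bytes) before the
   terminating null. This is the kind of length I am asking for. */
static size_t utf8_code_units(const char *s)
{
    return strlen(s);
}

/* Hypothetical helper: number of code points in a valid UTF-8 string.
   Every byte that is not a continuation byte (10xxxxxx) starts a new
   code point. */
static size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the string "a" followed by the euro sign (bytes 0xE2 0x82 0xAC), the
first function counts four units and the second counts two code points.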

>
>> Furthermore, if the provided buffer is too small, the string copy
>> functions should return the required buffer size (that is string length
>> in characters plus the terminating null character) to help the caller to
>> provide a large enough buffer (likely after a first call with a
>> typically sized buffer has failed).
>
> At present they return PCRE_ERROR_NOMEMORY. A function could be provided
> to return the length required: int pcre2_get_stringlength(context,
> number) for example.


Of course, a separate function for getting the length of the substring
would be a suitable solution, too.

However, you would in fact need two such functions, one taking a string
number and one taking a name (and the caller needs to pass a
pcre2_match_data, not the context):

  int pcre2_get_substring_length(
    pcre2_match_data *match_data,
    int stringnumber);

  int pcre2_get_named_substring_length(
    pcre2_match_data *match_data,
    PCRE2_SPTR name);

To my mind, having pcre2_copy_named_substring and pcre2_copy_substring
return the substring length (or the required buffer size) directly,
instead of PCRE_ERROR_NOMEMORY, when the provided buffer is too small
is less costly (both for the caller and the implementor).
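
A rough sketch of the calling pattern I have in mind follows. Note that
copy_substring_sketch is a hypothetical stand-in working on a plain
char buffer, not a PCRE function, and the convention "a return value
larger than the buffer size means too small" is only my assumption of
how the two cases could be told apart. Other errors (negative values in
the current specification) are left out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the proposed behaviour; NOT a PCRE function.
   On success, copies src_len units plus a terminating null and returns
   src_len. If the buffer is too small, copies nothing and returns the
   required buffer size (src_len + 1), which is then > buf_size. */
static int copy_substring_sketch(const char *src, int src_len,
                                 char *buffer, int buf_size)
{
    if (buf_size < src_len + 1)
        return src_len + 1;
    memcpy(buffer, src, (size_t)src_len);
    buffer[src_len] = '\0';
    return src_len;
}

/* Caller side: try a typically sized buffer first and retry once with
   the exact size reported by the failed call. Returns a malloc'd copy
   (to be freed by the caller), or NULL if allocation fails. */
static char *copy_with_retry(const char *src, int src_len, int *out_len)
{
    char small[16];
    int rc = copy_substring_sketch(src, src_len, small, (int)sizeof small);
    int too_small = rc > (int)sizeof small;
    int size = too_small ? rc : rc + 1;

    char *result = malloc((size_t)size);
    if (result == NULL)
        return NULL;

    if (too_small)
        rc = copy_substring_sketch(src, src_len, result, size);
    else
        memcpy(result, small, (size_t)rc + 1);

    *out_len = rc;
    return result;
}
```

With this convention the caller needs at most two calls and no separate
length function, which is the cost saving I am referring to.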

Regards,
Carsten