Author: ph10
Date:
To: Ze'ev Atlas
CC: pcre-dev@exim.org
Subject: Re: [pcre-dev] EBCDIC many faces
On Sun, 5 Nov 2017, Ze'ev Atlas via Pcre-dev wrote:
> I need advice. In designing the conversion between EBCDIC code pages
> for PCRE2, I decided to make the API as simple as possible for the user
> and to reduce overhead as much as possible. For example, I ask the user
> to tell me the estimated maximum length of the pattern and of the
> subject string, so that I can allocate them once for the whole run
> (memory is usually not an issue on z/OS systems). The main issue is the
> ovector and converting back to the code page of the locale environment.
> Should I release the memory and reallocate it for every execution of a
> pattern, or is there a better way? Oh, and a silly question: how do I
> know the length of the subject string?
> Ze'ev Atlas
I'm not sure I can help without knowing more about what kind of API you
are defining. The ovector in the standard API is provided by the user as
part of the match data block, created by calling pcre2_match_data_create.
As it contains only offsets, there shouldn't be any need to translate
it.
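As a minimal sketch of why the ovector needs no translation: it is just an array of offset pairs, with ovector[2*n] and ovector[2*n+1] giving the start and end of group n. The C below assumes PCRE2's default PCRE2_SIZE (size_t) and a translation that maps one byte to one byte. `copy_group()` and `SKETCH_UNSET` are hypothetical names for illustration (the real library provides pcre2_substring_copy_bynumber() for this job, and `SKETCH_UNSET` mirrors PCRE2_UNSET). The same offset pair picks out "DEF" from an ASCII buffer and the corresponding bytes from an EBCDIC (CP037) copy of it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors PCRE2_UNSET: an all-ones offset marks a group that did not match. */
#define SKETCH_UNSET (~(size_t)0)

/* Hypothetical helper: copy capture group `group` out of `subject` using an
   ovector-style array, where ovector[2*group] is the start offset and
   ovector[2*group+1] is the end offset (one past the last byte).
   Returns the number of bytes copied, excluding the terminating NUL. */
static size_t copy_group(const unsigned char *subject, const size_t *ovector,
                         int group, unsigned char *out, size_t outsize)
{
    assert(ovector != NULL && out != NULL && outsize > 0);
    size_t start = ovector[2 * group];
    size_t end = ovector[2 * group + 1];
    if (start == SKETCH_UNSET) {            /* group did not participate */
        out[0] = 0;
        return 0;
    }
    size_t len = end - start;
    if (len >= outsize) len = outsize - 1;  /* truncate to fit the buffer */
    memcpy(out, subject + start, len);      /* offsets only; no translation */
    out[len] = 0;
    return len;
}
```

Because only byte positions are stored, the offsets stay meaningful in whichever code page the caller holds the subject, provided the translation is length-preserving.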
In the standard API the subject string is passed to pcre2_match() either
with a length or as a zero-terminated string (in which case
pcre2_match() calculates its length when it starts up). If you are
inventing some new function called, say, pcre2_translate_match() with
identical arguments to pcre2_match(), then it can find the length if
necessary, copy the subject to some new memory while translating it, and
then call pcre2_match() with the known length. The offsets returned in
the ovector should then apply equally to the translated subject and to
the original, since, as I understand it, your translation doesn't alter
the length.
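A minimal sketch of such a wrapper, under stated assumptions: `translate_match()`, the `xlate` table, `init_xlate()` and `literal_match()` are all hypothetical. So that it compiles without PCRE2, a naive literal search stands in for pcre2_match(), and `SKETCH_ZERO_TERMINATED` mirrors PCRE2_ZERO_TERMINATED. The table fills in only a partial ASCII-to-EBCDIC (CP037) mapping for letters, digits and space, and the pattern is assumed to be in the target code page already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors PCRE2_ZERO_TERMINATED, which PCRE2 defines as ~(PCRE2_SIZE)0. */
#define SKETCH_ZERO_TERMINATED (~(size_t)0)

/* Hypothetical one-byte-to-one-byte translation table. Only a partial
   ASCII -> EBCDIC (CP037) mapping is filled in: letters, digits and space.
   Every other byte passes through unchanged. ASCII code points are written
   numerically so the table does not depend on the compiler's charset. */
static unsigned char xlate[256];

static void init_xlate(void)
{
    for (int i = 0; i < 256; i++) xlate[i] = (unsigned char)i;
    for (int i = 0; i < 9; i++) {
        xlate[0x61 + i] = (unsigned char)(0x81 + i);  /* a-i */
        xlate[0x6A + i] = (unsigned char)(0x91 + i);  /* j-r */
        xlate[0x41 + i] = (unsigned char)(0xC1 + i);  /* A-I */
        xlate[0x4A + i] = (unsigned char)(0xD1 + i);  /* J-R */
    }
    for (int i = 0; i < 8; i++) {
        xlate[0x73 + i] = (unsigned char)(0xA2 + i);  /* s-z */
        xlate[0x53 + i] = (unsigned char)(0xE2 + i);  /* S-Z */
    }
    for (int i = 0; i < 10; i++)
        xlate[0x30 + i] = (unsigned char)(0xF0 + i);  /* 0-9 */
    xlate[0x20] = 0x40;                               /* space */
}

/* Stand-in for pcre2_match(): a naive literal search. On success it fills
   ovector[0..1] with the start and end offsets and returns 1; else 0. */
static int literal_match(const unsigned char *pat, size_t patlen,
                         const unsigned char *subj, size_t subjlen,
                         size_t *ovector)
{
    if (patlen > subjlen) return 0;
    for (size_t i = 0; i + patlen <= subjlen; i++) {
        if (memcmp(subj + i, pat, patlen) == 0) {
            ovector[0] = i;
            ovector[1] = i + patlen;
            return 1;
        }
    }
    return 0;
}

/* Hypothetical translate-then-match wrapper. Returns 1 on a match, 0 on no
   match, -1 if the translated copy could not be allocated. */
static int translate_match(const unsigned char *pat, size_t patlen,
                           const unsigned char *subject, size_t length,
                           size_t *ovector)
{
    static int ready = 0;               /* lazy init; not thread-safe */
    if (!ready) { init_xlate(); ready = 1; }
    assert(ovector != NULL);

    if (length == SKETCH_ZERO_TERMINATED)
        length = strlen((const char *)subject);   /* find the length first */

    unsigned char *copy = malloc(length + 1);
    if (copy == NULL) return -1;
    for (size_t i = 0; i < length; i++)
        copy[i] = xlate[subject[i]];              /* one byte in, one out */
    copy[length] = 0;

    /* Pass the now-known length so the matcher never rescans for a NUL. */
    int rc = literal_match(pat, patlen, copy, length, ovector);
    free(copy);
    return rc;   /* offsets index both `copy` and the original `subject` */
}
```

Because each byte maps to exactly one byte, the returned offsets can be applied directly to the caller's original subject. If a malloc() per call is a concern, the copy buffer could instead be kept between calls and grown only when a longer subject arrives.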