[pcre-dev] implementing support for std::wstring in pcrecpp

Author: Thorsten Schöning
Date:  
To: pcre-dev
Subject: [pcre-dev] implementing support for std::wstring in pcrecpp
Hello,

In our programs we use both std::wstring, and therefore Unicode, with
Windows functions, especially for paths, and sometimes plain
std::string encoded as Windows-1252 or UTF-8. As I understand it, PCRE
supports UTF-8 natively but no other Unicode encoding, so we always
convert std::wstring into a UTF-8-encoded std::string ourselves.

Because we only use pcrecpp, which abstracts strings in its own
StringPiece class, I thought of extending it to support std::wstring
by having it re-encode the text into proper UTF-8 itself. It seems
that I would just have to write a new constructor that does the
encoding, maintain a flag indicating that the text is UTF-8, adjust
the options passed to PCRE to enable UTF-8 matching, and perhaps add
or change the Arg::parse_* methods that handle strings.
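A minimal sketch of the constructor-plus-flag idea, under stated assumptions: `SubjectText` and its `Encoder` parameter are hypothetical names, not pcrecpp API. It assumes that pcrecpp's StringPiece only stores a pointer and a length and owns nothing, so the re-encoded UTF-8 bytes need an owner that outlives the match. Whenever `is_utf8()` is true, the pattern would also still have to be compiled in UTF-8 mode (PCRE_UTF8).

```cpp
#include <string>
#include <utility>

// Hypothetical helper, NOT part of pcrecpp. It owns the bytes that a
// non-owning StringPiece-like view would point at, plus a UTF-8 flag.
class SubjectText {
public:
    // Conversion routine is injected (assumption) so the platform choice
    // (Windows API, iconv, hand-written code) stays outside this class.
    using Encoder = std::string (*)(const std::wstring&);

    // Narrow input is stored as-is; the caller states whether it is
    // UTF-8 or something else such as Windows-1252.
    SubjectText(std::string text, bool is_utf8)
        : bytes_(std::move(text)), utf8_(is_utf8) {}

    // Wide input is always re-encoded, so the result is always UTF-8.
    SubjectText(const std::wstring& text, Encoder encode)
        : bytes_(encode(text)), utf8_(true) {}

    // Pointer and length, i.e. what a (pointer, length) view needs.
    const char* data() const { return bytes_.data(); }
    int size() const { return static_cast<int>(bytes_.size()); }

    // Tells the caller whether UTF-8 matching must be enabled.
    bool is_utf8() const { return utf8_; }

private:
    std::string bytes_;  // must outlive any view created from data()
    bool utf8_;
};
```

Injecting the encoder keeps the re-encoding question separate from the string class itself; `data()` and `size()` could then be handed to a StringPiece for the duration of the match.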

The real problem seems to be how to do the re-encoding into UTF-8. We
mainly work on Windows, which provides an API for this. On Linux,
iconv seems to be the commonly used library, but using it would make
PCRE depend on a third-party library, which it currently does not.
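One dependency-free option, sketched here as an assumption rather than anything PCRE provides: UTF-8 encoding is simple enough to write by hand (on Windows, `WideCharToMultiByte` with `CP_UTF8` is the native alternative). The hypothetical `WideToUtf8` below handles both a 16-bit `wchar_t` (UTF-16, as on Windows) and a 32-bit one (UTF-32, as is typical on Linux), and replaces unpaired surrogates or out-of-range values with U+FFFD.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: std::wstring -> UTF-8 without iconv or any other
// third-party library.
std::string WideToUtf8(const std::wstring& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = static_cast<char32_t>(in[i]);
        // With a 16-bit wchar_t, combine a UTF-16 surrogate pair.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF &&
            i + 1 < in.size()) {
            char32_t lo = static_cast<char32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        // Lone surrogates and values beyond U+10FFFF are not valid.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD;
        }
        // Emit 1 to 4 bytes depending on the code point's range.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, `WideToUtf8(L"\u00e9")` yields the two bytes `C3 A9`, on both Windows and Linux.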

What is your general opinion of my approach? Is there any chance it
could be merged, or will I have to maintain my own patches? Could you
live with adding support only for Windows at first? If not, what is
the best way to re-encode on Linux and other platforms? I have no
experience using iconv on Windows and would therefore prefer the
native API there.

Kind regards,

Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Systeme - Hameln | Potsdam | Leipzig

Phone: Potsdam: 0331-743881-0
E-Mail:  tschoening@???
Web:     http://www.am-soft.de


AM-SoFT GmbH IT-Systeme, Konsumhof 1-5, 14482 Potsdam
Amtsgericht Potsdam HRB 21278 P, Managing Director: Andreas Muchow