[exim-dev] [Bug 508] New: PostgreSQL client_encoding broken

Author: bug508
Date:  
To: exim-dev
Subject: [exim-dev] [Bug 508] New: PostgreSQL client_encoding broken
------- You are receiving this mail because: -------
You are the QA contact for the bug, or are watching the QA contact.

http://www.exim.org/bugzilla/show_bug.cgi?id=508

           Summary: PostgreSQL client_encoding broken
           Product: Exim
           Version: 4.67
          Platform: Macintosh
               URL: http://www.exim.org/mail-archives/exim-dev/2007-May/msg00027.html
        OS/Version: All
            Status: NEW
          Keywords: work:tiny
          Severity: bug
          Priority: medium
         Component: General execution
        AssignedTo: ph10@???
        ReportedBy: casey@???
         QAContact: exim-dev@???



(sorry for the extra post to exim-dev - didn't notice there's a bugzilla now
before sending that)

Exim Changelog:
> PH/19 Added PQsetClientEncoding(conn, "SQL_ASCII") to the pgsql code module.
>      This is apparently needed in addition to the PH/07 change above to avoid
>      any possible encoding problems.


SQL_ASCII encoding isn't an encoding at all - rather, it's the lack
thereof. If you have a database initialized with the SQL_ASCII encoding
(sadly the default in <=7.4 [1]), then no encoding checks are done on data
coming into the database. You can throw whatever you want in, but then you
rely on your client software knowing the encoding of the data it pulls
out, which could be anything. So, if you use a single database for numerous
applications, some of which use UTF-8, others ISO-8859-1, and
perhaps another that's BIG5, and you want them all to be able to access
and modify subsets of the same data, you're asking for trouble as you'll
run into encoding issues constantly. It also means you can't cleanly
export your database data to any other form that expects a single encoding
- a single byte in an incompatible encoding will throw a monkey wrench
into things.
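
To make the export problem concrete, here is a minimal sketch in Python. It
only simulates an SQL_ASCII column as a list of unchecked byte strings; the
names stored_rows and export_as are made up for illustration and nothing here
talks to PostgreSQL.

```python
# Simulate a column in an SQL_ASCII database: bytes are stored unchecked,
# so each application can write in whatever encoding it happens to use.
stored_rows = [
    "café".encode("utf-8"),       # written by a UTF-8 application
    "café".encode("iso-8859-1"),  # written by an ISO-8859-1 application
]


def export_as(rows, encoding):
    """Decode every row with one encoding; return decoded rows and the
    indexes of rows that are not valid in that encoding."""
    decoded, failures = [], []
    for index, raw in enumerate(rows):
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            failures.append(index)
    return decoded, failures


# A UTF-8 export trips over the single ISO-8859-1 row.
decoded, failures = export_as(stored_rows, "utf-8")
print(decoded, failures)
```

Exporting as ISO-8859-1 instead would not raise an error, since every byte is
valid there, but the UTF-8 row would come out as mojibake, which is arguably
worse because nothing flags it.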

The solution is to pick an appropriate encoding and go with it. As of
PostgreSQL 8.0, this is the default configuration if you don't explicitly
specify an encoding at database initialization time and your system
encoding is set up [2]. In my case, as I need to support several languages
and applications within one database, that choice was UTF-8. When I
encounter an application, such as Exim <4.67, that doesn't talk in UTF-8
natively but needs to enter incompatible characters in another encoding, I
address that with "alter role blah set client_encoding = 'ISO-8859-1'" or
whatever, which causes PostgreSQL to translate between the client's
encoding and database encoding automatically [3]. This does the same
thing as PQsetClientEncoding, but it does it at connection time, without
executing a set statement. This means that when the application finishes
connecting and then sends a set statement, it overrides any setting I have
placed in the database.
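
A toy model of that precedence, in Python, may help. It is a sketch under
stated assumptions, not PostgreSQL's actual code: effective_client_encoding
and store are hypothetical helpers, SQL_ASCII is modeled simply as "no
conversion", and any validation the server might do on top is ignored.

```python
def effective_client_encoding(role_default, session_override=None):
    """A setting sent by the application after connecting wins over the
    per-role default configured in the database."""
    return session_override if session_override is not None else role_default


def store(raw, client_encoding, database_encoding="utf-8"):
    """Convert bytes from the client encoding to the database encoding.

    SQL_ASCII is modeled as "no conversion": bytes pass through untouched.
    """
    if client_encoding.upper() == "SQL_ASCII":
        return raw
    return raw.decode(client_encoding).encode(database_encoding)


raw = "café".encode("iso-8859-1")  # what an ISO-8859-1 client sends

# Only the role default applies: the bytes are converted to UTF-8.
converted = store(raw, effective_client_encoding("iso-8859-1"))

# The application forces SQL_ASCII afterwards: the role default is ignored
# and the ISO-8859-1 bytes reach the UTF-8 database unconverted.
passed_through = store(raw, effective_client_encoding("iso-8859-1", "SQL_ASCII"))

print(converted, passed_through)
```

In the second case the bytes that arrive are not valid UTF-8, which is the
situation the administrator's role setting was meant to prevent.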

I wouldn't mind Exim calling PQsetClientEncoding if it actually set the
encoding to what it internally supports (which would make it "just work"
on non-ISO-8859-1/SQL_ASCII databases such as mine without an
administrator override). However, as it overrides the database settings,
I'm not sure that it's such a good idea to call it unconditionally, with
no option to control it.


[1] http://www.postgresql.org/docs/7.4/static/app-initdb.html
[2] http://www.postgresql.org/docs/8.0/static/app-initdb.html
[3] http://www.postgresql.org/docs/current/static/multibyte.html#AEN24138

--
Configure bugmail: http://www.exim.org/bugzilla/userprefs.cgi?tab=email