[exim-dev] 4.67 - PH/19 PostgreSQL client encoding change br…

Pàgina inicial
Delete this message
Reply to this message
Autor: SeattleServer Mailing Lists
Data:  
A: exim-dev
Assumpte: [exim-dev] 4.67 - PH/19 PostgreSQL client encoding change breaks things...
Guess I need to pay attention to changelogs. At least my log files still
talk to me ;) :

Me: wtf are all these encoding errors in my database logs?

Log:
2007-05-27 14:40:23.712612500 a.mx-exim: ERROR: invalid byte sequence for
encoding "UTF8": 0x85
2007-05-27 14:51:16.481525500 a.mx-exim: ERROR: invalid byte sequence for
encoding "UTF8": 0x95
2007-05-27 14:57:32.443967500 a.mx-exim: ERROR: invalid byte sequence for
encoding "UTF8": 0x95
2007-05-27 14:59:22.797730500 a.mx-exim: ERROR: invalid byte sequence for
encoding "UTF8": 0xa0
2007-05-27 14:59:28.033711500 a.mx-exim: ERROR: invalid byte sequence for
encoding "UTF8": 0xa0
2007-05-27 15:02:43.401531500 a.mx-exim: ERROR: invalid byte sequence for
encoding "UTF8": 0x95
2007-05-27 15:03:47.620627500 a.mx-exim: ERROR: invalid byte sequence for
encoding "UTF8": 0x95

Me: Hmm...my database is UTF-8, however as Exim doesn't seem to talk to
UTF-8 databases nicely so I placed an override in as I do for other apps
that fixes everything in this case. Let me check if it's still there:

seattleserver=# select rolconfig from pg_roles where rolname = 'a.mx-exim';
                   rolconfig
-----------------------------------------------
 {search_path=exim,client_encoding=ISO-8859-1}
(1 row)


Me:
Ok, so what's the issue?
seattleserver=# alter role 'a.mx-exim' set log_statement = 'all';

Log:
2007-05-28 04:36:20.917780500 a.mx-exim: LOG: statement: set
client_encoding to 'SQL_ASCII'

Me:
Huh?

Exim Changelog:
PH/19 Added PQsetClientEncoding(conn, "SQL_ASCII") to the pgsql code module.
      This is apparently needed in addition to the PH/07 change above to
avoid
      any possible encoding problems.


Nooooooooo!

Okay, I don't like encodings much, but I'll try to simplify how I
understand things work in PostgreSQL:

SQL_ASCII encoding isn't an encoding at all - rather, it's the lack
thereof. If you have a database initialized with the SQL_ASCII encoding
(sadly the default in <=7.4 [1]), then no encoding checks are done on data
coming into the database. You can throw whatever you want in, but then,
you rely on your client software knowing the encoding of the data it's
pulling out as it could be in any old encoding and you don't really have
any idea which one. So, if you use a single database for numerous
applications, some of which are UTF-8, others which are ISO-8859-1, and
perhaps another that's BIG5, and you want them all to be able to access
and modify subsets of the same data, you're asking for trouble as you'll
run into encoding issues constantly. It also means you can't cleanly
export your database data to any other form that expects a single encoding
- a single byte in an incompatible encoding will throw a monkey wrench
into things.

The solution is to pick an appropriate encoding and go with it. This is
the default configuration if you don't explicitely specify at database
initialization time and your system encoding is set up as of PostgreSQL
8.0 [2]. In my case, as I need to support several languages and
applications within one database, that choice was UTF-8. When I encounter
an application, such as Exim <4.67, that doesn't talk in UTF-8 natively
but needs to enter noncompatible characters in another encoding, I address
that with "alter role blah set client_encoding = 'ISO-8859-1'" or
whatever, which causes PostgreSQL to translate between the client's
encoding and database encoding automatically [3]. This does the same
thing as PQSetClientEncoding, but it does it at connection time, without
executing a set statement. This means that when the application does
finish the connection and sends a set statement, it overrides any setting
I have placed in the database.

I wouldn't mind Exim calling PQSetClientEncoding if it actually set the
encoding to what it internally supports (which would make it "just work"
on non-ISO-8859-1/SQL_ANSII databases such as mine without an
administrator override), however as it overrides the database settings I'm
not sure that it's such a good idea to call it unconditionally without an
option to do so.


[1] http://www.postgresql.org/docs/7.4/static/app-initdb.html
[2] http://www.postgresql.org/docs/8.0/static/app-initdb.html
[3] http://www.postgresql.org/docs/current/static/multibyte.html#AEN24138


Cheers,
--
SeattleServer Mailing Lists
lists@???