On Sat, 23 Mar 2024, Exim Bugzilla via Exim-dev wrote:
> https://bugs.exim.org/show_bug.cgi?id=3085
>
> Bug ID: 3085
> Summary: Allow UTF-8 for log output
> Product: Exim
> Version: N/A
> Hardware: All
> OS: Linux
> Status: NEW
> Severity: bug
> Priority: medium
> Component: Logging
> Assignee: unallocated@???
> Reporter: forza@???
> CC: exim-dev@???
>
> This is probably not a bug, but more of a request for comments.
>
> I am logging to syslog instead of files. The syslog is handled by syslog-ng,
> and I parse the logfiles with Fail2Ban.
>
> The exim.conf:
>
> ### Logging
> log_selector = +all
> log_file_path = syslog
> syslog_timestamp = false
> syslog_duplication = false
> syslog_processname = exim
> SYSLOG_LONG_LINES = yes
>
>
> Now, my issue is that sometimes Fail2Ban fails to read some of the lines and
> outputs a warning like this:
>
> 2024-03-17T19:23:33.870+00:00 warning fail2ban.filter[2922]: WARNING Error
> decoding line from '/var/log/exim.log' with 'UTF-8'.
> 2024-03-17T19:23:33.870+00:00 warning fail2ban.filter[2922]: WARNING Consider
> setting logencoding to appropriate encoding for this jail. Continuing to
> process line ignoring invalid characters: b'2024-03-17T19:23:33.698+00:00
> notice exim[5673]: [12\\21] F From: "\xbe\xe7\xb9\xcc\xbc\xf8"
> <msoony@???>\n'
[ So syslog-ng is writing exim's log output to /var/log/exim.log.
I guess there are reasons to go the indirect way. ]
How do the relevant lines look in /var/log/exim.log? You could check with
grep "2024-03-17T19:23:33.698+00:00" /var/log/exim.log
I guess the result would be something like:
2024-03-17T19:23:33.698+00:00 notice exim[5673]: [12\\21] F From: "��̼�" <msoony@???>
?
> So, this leads me to my current question. Can Exim be set to output
> UTF-8 encoded logs to syslog?
> Apparently, the syslog format
> according to RFC 5424 says "MSG SHOULD be UNICODE, encoded using
> UTF-8", but it seems to allow plain US-ASCII too.
[ For a piece of text, if the plain US-ASCII encoding is correct
then that byte stream is automatically valid UTF-8 and represents
that text correctly.
It is impossible to support UTF-8 and not handle
(true 7bit) plain US-ASCII correctly ! ]
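
[ To illustrate the point about US-ASCII being a subset of UTF-8, here is
a small Python sketch. The sample string is made up for the example; it is
not taken from any real log. ]

```python
# Any text made only of 7-bit characters (0x00-0x7f) produces the same
# bytes whether it is encoded as US-ASCII or as UTF-8.
text = 'F From: "Andrew Aitchison"'  # hypothetical 7-bit sample

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# The two encodings agree byte for byte ...
same_bytes = ascii_bytes == utf8_bytes
# ... so a UTF-8 reader recovers the ASCII text unchanged.
round_trip = ascii_bytes.decode("utf-8") == text

print(same_bytes, round_trip)
```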
> https://datatracker.ietf.org/doc/html/rfc5424#section-6.4
>
> I believe syslog-ng could handle non-UTF-8 messages, using flags(sanitize-utf8)
> on the source, however the manual specifies:
>
> "The HEADER part of the message must be in plain ASCII format, the parameter
> values of the STRUCTURED-DATA part must be in UTF-8, while the MSG part should
> be in UTF-8. The different parts of the message are explained in the following
> sections."
>
> Perhaps I am overthinking all of this. I'd appreciate some thoughts on correct
> logging configurations.
I think you are looking in the wrong place for the problem.
It is not that exim is disallowing UTF-8 output in the log,
but that occasionally the output is not valid UTF-8.
The fundamental issue is we have "garbage in",
so will inevitably have "garbage out".
Exim is trying to log some "text" - the display-name of the From: header -
which should be ASCII (unless SMTPUTF8 is enabled, in which case it can be
UTF-8) but in this case is not UTF-8 or ASCII, but some unknown byte-stream.
[ Do you happen to know what language or
character set this sender writes their name in ? ]
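
If it helps with that question, the six bytes from the fail2ban warning can
be tested offline. The following is only a Python sketch: the list of
candidate encodings is my own guess, and a clean decode proves nothing,
since several legacy encodings will accept the same bytes and produce
different text.

```python
# The six display-name bytes, copied from the fail2ban warning.
raw = b"\xbe\xe7\xb9\xcc\xbc\xf8"

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Hypothetical candidates: guesses only, not known facts about the sender.
CANDIDATES = ["euc_kr", "gb2312", "big5", "shift_jis", "iso8859_15"]

def try_decodings(data: bytes, candidates=CANDIDATES) -> dict:
    """Map each candidate encoding to the decoded text, or None on failure."""
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

print("valid UTF-8:", is_valid_utf8(raw))
for name, text in try_decodings(raw).items():
    print(name, repr(text))
```

Someone who can read the candidate languages would still have to judge
which result, if any, looks like a plausible name.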
As I understand it, exim logs this byte-stream as-is and there is nothing that
syslog-ng or fail2ban could reasonably do to interpret it correctly.
I believe that if you reverted to having exim log to a file,
the same issue would be there, probably with exactly the same byte-stream
as the syslog.
The best "fix" might be for exim to log this byte-stream coded as hex,
but in many cases that would be less readable than doing nothing.
For example
From: "André Aitchison" <andrew@???>
where the e-acute was encoded in LATIN-9 is not valid UTF-8,
but it is much clearer left like that than logged as
From: "\x41\x6e\x64\x72\xe9\x20\x41\x69\x74\x63\x68\x69\x73\x6f\x6e" <andrew@???>
- and then exim would have to spend time figuring out when the display-name
was not valid UTF-8.
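
To make the trade-off concrete, here is a Python sketch of the two options:
escaping every byte, or escaping only the bytes that are not valid UTF-8.
This is not how exim works (exim is written in C and, as far as I know,
does neither); the function names are invented for the example.

```python
def hex_escape_all(data: bytes) -> str:
    """Render every byte as \\xNN, readable or not."""
    return "".join(f"\\x{b:02x}" for b in data)

def escape_invalid_only(data: bytes) -> str:
    """Keep valid UTF-8 as text; render only undecodable bytes as \\xNN."""
    return data.decode("utf-8", errors="backslashreplace")

# The display-name from the example above, encoded in Latin-9 (ISO-8859-15),
# where e-acute is the single byte 0xe9.
latin9_name = "André Aitchison".encode("iso8859_15")

print(hex_escape_all(latin9_name))
print(escape_invalid_only(latin9_name))   # prints Andr\xe9 Aitchison
```

The second form stays readable and is always valid UTF-8, but it costs a
validation pass over every logged string, and a literal backslash in the
input is not escaped, so the output can be ambiguous.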
I have not used fail2ban for email logs.
Is the message merely annoying, or is this stopping you from blocking
<msoony@???> because other lines in the log indicate a problem ?
--
Andrew C. Aitchison Kendal, UK
andrew@???