Re: [exim] Queue ID format

Author: Yves Goergen
Date:
To: Jeremy Harris, exim-users
Subject: Re: [exim] Queue ID format

Don't worry, I'm not trying to read any meaning into that ID. I just
want to narrow down the recognised pattern to avoid misinterpreting the
log entries.

Well, if that log is intended for humans, would it be worth writing a
machine-readable log as well? I'm specifically interested in metrics
such as these:

* How many messages are submitted by a specific user/address or for a
specific recipient/domain?
* From how many different hosts/IP addresses are messages submitted for
a specific user/sender?
* How many remote SMTP errors indicating server reputation issues do we
see, and from which remote services?
* How many messages from a specific user/address could not be delivered?
* Does a user have a high sender spam score?

This is all to monitor the quality of the local service and detect
hacked accounts or other kinds of misuse of the service.
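
As a rough illustration of the first, second and fourth items, here is a
minimal Python sketch. It assumes the default mainlog layout (date, time,
message ID, then a flag such as "<=" for an arrival or "**" for a failed
delivery) with no extra fields enabled via log_selector. The names
collect_metrics and SAMPLE and the sample lines themselves are made up
for the example, not taken from a real log.

```python
import re
from collections import Counter, defaultdict

# Assumed arrival line: "<date> <time> <id> <= <sender> ... [<ip>] ..."
ARRIVAL = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<id>[^ ]+) <= (?P<sender>\S+)"
    r"(?:.*? \[(?P<ip>[0-9A-Fa-f.:]+)\])?"
)
# Assumed failure line: "<date> <time> <id> ** <recipient> ..."
FAILURE = re.compile(r"^\S+ \S+ (?P<id>[^ ]+) \*\* (?P<rcpt>\S+)")


def collect_metrics(lines):
    """Count arrivals, distinct source IPs and failed deliveries per sender."""
    submitted = Counter()
    hosts = defaultdict(set)
    failed = Counter()
    sender_of = {}  # message ID -> sender, so failures can be attributed

    for line in lines:
        m = ARRIVAL.match(line)
        if m:
            sender = m["sender"]
            submitted[sender] += 1
            sender_of[m["id"]] = sender
            if m["ip"]:
                hosts[sender].add(m["ip"])
            continue
        m = FAILURE.match(line)
        if m and m["id"] in sender_of:
            failed[sender_of[m["id"]]] += 1
    return submitted, hosts, failed


# Invented sample lines, only to show the expected input shape.
SAMPLE = [
    "2020-12-24 22:17:01 1ksXYZ-0001Ab-2C <= alice@example.org "
    "H=client.example.net [192.0.2.10] P=esmtpsa S=1234",
    "2020-12-24 22:17:03 1ksXYZ-0001Ab-2C ** bob@example.com "
    "R=dnslookup T=remote_smtp: SMTP error from remote mail server",
    "2020-12-24 22:20:44 1ksXZa-0001Bc-3D <= alice@example.org "
    "H=other.example.net [198.51.100.7] P=esmtpsa S=2048",
    "2020-12-24 22:20:45 1ksXZa-0001Bc-3D => carol@example.com "
    "R=dnslookup T=remote_smtp",
]
```

Calling collect_metrics(SAMPLE) returns three mappings keyed by sender.
The message ID is only used as an opaque key to tie a failure back to
the arrival line; nothing is read out of it.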

But my last list item, which relies on a log message from my custom Exim
config, already shows that a more parsing-friendly format (e.g. JSON)
would probably be hard to generate: every custom log message would need
to be annotated for it.

So far, out of 20000 log lines, there are only 30 I can't recognise.
From all the others it seems I can extract sufficient meaning and data
for the necessary metrics. I can live with that, it's just a lot of code
required to get there.
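
For what it's worth, keeping track of the unrecognised remainder can be
sketched roughly like this in Python: try an ordered list of patterns and
collect whatever matches none of them. The pattern list below is a
hypothetical, much reduced stand-in for a real set of regexes, and
classify is a made-up name; the flags "<=", "=>", "->", "==" and "**"
and the "Completed" line are the ones the default mainlog uses.

```python
import re
from collections import Counter

# Hypothetical line classes; a real parser would need more patterns.
PATTERNS = [
    ("arrival", re.compile(r"^\S+ \S+ [^ ]+ <= ")),
    ("delivery", re.compile(r"^\S+ \S+ [^ ]+ [=-]> ")),
    ("deferred", re.compile(r"^\S+ \S+ [^ ]+ == ")),
    ("failed", re.compile(r"^\S+ \S+ [^ ]+ \*\* ")),
    ("completed", re.compile(r"^\S+ \S+ [^ ]+ Completed$")),
]


def classify(lines):
    """Return per-class counts and the list of lines no pattern matched."""
    counts = Counter()
    unrecognised = []
    for line in lines:
        line = line.rstrip("\n")
        for name, pattern in PATTERNS:
            if pattern.match(line):
                counts[name] += 1
                break
        else:
            # No pattern matched: keep the line for later inspection.
            unrecognised.append(line)
    return counts, unrecognised
```

Dividing len(unrecognised) by the total number of lines gives the kind
of ratio mentioned above and makes it easy to notice when a new Exim
version or a config change introduces lines the patterns don't cover.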

-Yves

-------- Original Message --------
From: Jeremy Harris via Exim-users <exim-users@???>
Sent: Thursday, 24 December 2020, 23:35 CET
Subject: [exim] Queue ID format

On 24/12/2020 22:17, Yves Goergen via Exim-users wrote:
I'm parsing Exim log files, specifically the mainlog. Man, that's a
complex structure and it's hard to find all necessary details from the
documentation and by reading my actual log files. I'm using several
regular expressions for different kinds of lines. But a stateful parser
(the kind used to parse programming languages) would probably have been
the better choice here. Apache access logs just require a single regex,
for Exim I already have 8, one of which just covers most of the
meaningless messages I don't care about, and lots of detailed
post-processing.

The logs are really designed for human use, not for machine consumption.

What assumptions can I make about the format of a queue message ID? For
now, I use this regex:

[^ ]+

Though it seems they always match this regex:

[0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}

It may change at any time as a result of future development.
There's a relevant comment in the source:

/* Now build the unique message id. This has changed several times over the
lifetime of Exim. This description was rewritten for Exim 4.14 (February
2003).
...

I *think* that some high-volume sites are at or close to performance
limits [1] that the current format imposes, hence I must reiterate: this
(the message_id format) is not supposed to be an exported interface. Its
only documented behaviour is that it is unique.

It's fairly reasonable to assume it'll never have an embedded space. I
would not recommend trying to extract meaning from it.
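
One way to reconcile the two regexes from the question with this advice,
sketched in Python under the assumption that the ID is the third
space-separated field of the line: accept any token without spaces as
the ID, and use the stricter pattern only as a hint that the token looks
like the IDs seen so far. extract_id is a made-up helper name, and the
stricter pattern is just the one observed in the logs, not a documented
format.

```python
import re

# Loose pattern: the only assumption is "no embedded space".
OPAQUE_ID = re.compile(r"[^ ]+")
# Pattern observed in current logs; may stop matching in future versions.
OBSERVED_ID = re.compile(r"[0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}")


def extract_id(line):
    """Return (token, looks_familiar) for the third field, or None.

    looks_familiar is only a hint: False can mean the line carries no
    message ID at all, or that the ID format has changed.
    """
    parts = line.split(" ", 3)
    if len(parts) < 3:
        return None
    candidate = parts[2]
    if not OPAQUE_ID.fullmatch(candidate):
        return None
    return candidate, OBSERVED_ID.fullmatch(candidate) is not None
```

The token is then used purely as an opaque key for grouping lines that
belong to the same message. Lines where looks_familiar is False can be
logged and reviewed by hand instead of being silently misread.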
