[exim] getting exim metrics? (using TICK?)

Author: Patrick von der Hagen
Date:  
To: exim-users
Subject: [exim] getting exim metrics? (using TICK?)
Hi all,

monitoring exim has always been a priority for me, starting 12 years ago
with nagios and munin, and adding some homegrown code along the way to
perform anomaly detection (like an authenticated sender suddenly causing
dozens of bounces a minute, a great indicator for compromised accounts,
or one server having twice the volume of its peers or processing almost
no messages at all, ...).
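
To illustrate the kind of check meant here, the following is a minimal
Python sketch of a per-sender sliding-window bounce counter. It is not
the homegrown code mentioned above; the class name, the threshold of 24
bounces and the 60-second window are assumptions chosen for
illustration, and it expects events to arrive in time order.

```python
from collections import defaultdict, deque


class BounceRateDetector:
    """Flag senders whose bounces within a sliding window reach a threshold.

    Hypothetical helper for illustration; threshold and window are
    assumed values, not taken from any real setup.
    """

    def __init__(self, threshold=24, window_seconds=60):
        self.threshold = threshold
        self.window_seconds = window_seconds
        # sender -> timestamps (seconds) of that sender's recent bounces
        self._events = defaultdict(deque)

    def record_bounce(self, sender, timestamp):
        """Record one bounce and return True if the sender is over the limit.

        Timestamps are assumed to be non-decreasing per sender.
        """
        window = self._events[sender]
        window.append(timestamp)
        # Drop bounces that have fallen out of the window.
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold
```

Each bounce seen in the log would be fed to record_bounce() with the
authenticated sender and the event time in seconds; a True result is
the point at which an alert could be raised.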

I've been looking at modern time series databases like carbon,
influxdb, prometheus, ... for quite a while, but the database never
seems to be the issue; the problem usually seems to be collecting the
metrics. I have to admit, I loved the simplicity of munin plugins and
just won't learn C in order to write plugins for collectd...

However, telegraf
https://www.influxdata.com/time-series-platform/telegraf/ now seems to
be a capable tool for getting my metrics by parsing logfiles (logparser
plugin), and it supports a wide range of backend data stores. So the
"collecting metrics" challenge might be solved after all. I tried to get
started with a small configuration for exim:
https://github.com/pvdh/telegraf-logparser-exim/
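
As a rough picture of what "metrics by parsing logfiles" amounts to,
here is a small Python sketch that turns exim main log lines into
per-minute event counts. It does not reproduce the telegraf
configuration linked above. It assumes the default main log layout
(timestamp, message id, then one of the flags <=, =>, ->, ** or ==);
extra log_selector options can change that layout, and the names
count_events and EVENT_NAMES are made up for this example.

```python
import re
from collections import Counter

# Assumed line shape: "YYYY-MM-DD HH:MM:SS <message-id> <flag> <rest>"
LINE_RE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2} "
    r"(?P<msgid>\S+) (?P<flag><=|=>|->|\*\*|==) "
)

# Exim main log flags mapped to metric names (the names are illustrative).
EVENT_NAMES = {
    "<=": "received",   # message arrival
    "=>": "delivered",  # normal delivery
    "->": "delivered",  # additional address in the same delivery
    "**": "failed",     # delivery failed, address bounced
    "==": "deferred",   # delivery deferred
}


def count_events(lines):
    """Return a Counter keyed by (minute, event name)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # e.g. "Completed" lines or lines without a message id
        event = EVENT_NAMES[match.group("flag")]
        counts[(match.group("minute"), event)] += 1
    return counts
```

The resulting counts per minute and event type are the sort of values
that would then be written to the time series backend and graphed or
checked for anomalies.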

The TICK stack looks particularly promising, especially Kapacitor to
perform detection and alerts based on the metrics collected by telegraf
and stored in influxdb.

What are your experiences? I suppose I'll be investing quite some time
in my setup and I want the result to last, so does anybody have
different suggestions regarding "how to get metrics from exim into a
modern backend, get graphs, detect issues, raise alerts"?

I simply don't want to start running in the wrong direction, missing
better alternatives. ;-)

And of course, if anyone happens to have a more advanced telegraf
configuration than I do, any contribution would be greatly appreciated. ;-)

Best regards,
Patrick.