Re: [Exim] exim stats?

Author: Odhiambo Washington
Date:  
To: Matthew Daubenspeck
CC: Exim Users
Subject: Re: [Exim] exim stats?
* Matthew Daubenspeck <matt@???> [20011025 20:19]: writing on the subject '[Exim] exim stats?'
| Does Exim have anything built in (or an additional add-on that anyone is
| aware of) that will provide some type of stats on a per-user basis? Items
| such as messages, message size, etc.


If you had searched the list archives first, you would have found good answers.


eximstats is a utility that ships with Exim.

In my crontab, I have something like this:

# Get Daily Statistics for Exim
55      23      *       *       *       root    sh /home/wash/Administration/send-exim-stats 2>&1


and the file send-exim-stats looks like this:
###
#!/bin/sh
# eximstats needs to be run in a way similar to this:
STATS=/usr/local/sbin/eximstats
LOG=/var/log/exim/mainlog
$STATS -ne -nr $LOG | mail -s "Daily Exim Statistics" wash, tole
##

Another variation (that I use for comparison) is one that looks like this:

#!/bin/sh
# eximstats needs to be run in a way similar to this:
STATS2=/home/wash/Administration/eximstats_sten
LOG=/var/log/exim/mainlog
$STATS2 -ne -nr -nt -domain $LOG | mail -s "Daily Exim Stats" wash, tole


This modified version of eximstats was posted to this list. I've attached
the file for you.


Hope that helps.

-Wash

S y s t e m s   A d m i n i s t r a t o r
--
                                              ~\\_                 
 Odhiambo Washington                            \\\\               
 Wananchi Online Ltd.,                          `\\\\\             
 1st Flr Loita Hse, Loita Street                 |\\\\\            
 PO Box 10286,00100-NAIROBI,KE.                   \\\\\|__.--~~\   
 Fax: 254 2 313985-9                           _--~            /   
 Fax: 254 2 313922                           /~ //////  _-~~~~'    
 E-mail: wash@???                  ('-//////-//           
 URL    : http://www.wananchi.com            //////(((-)           
 GSM: 254 72 743 223 / 254 733 744 121     /////"                  
                                        _///"                      


+++
"I'd give my right arm to be ambidextrous."
#!/usr/bin/perl

# Copyright (c) 1999 University of Cambridge.
# See the file NOTICE for conditions of use and distribution.

# Perl script to generate statistics from one or more Exim log files.

# Usage: eximstats [<options>] <log file> <log file> ...

# 1996-05-21: Ignore lines not starting with valid date/time, just in case
#               these get into a log file.
# 1996-11-19: Add the -h option to control the size of the histogram,
#               and optionally turn it off.
#             Use some Perl 5 things; it should be everywhere by now.
#             Add the Perl -w option and rewrite so no warnings are given.
#             Add the -t option to control the length of the "top" listing.
#             Add the -ne, -nt options to turn off errors and transport
#               information.
#             Add information about length of time on queue, and -q<list> to
#               control the intervals and turn it off.
#             Add count and percentage of delayed messages to the Received
#               line.
#             Show total number of errors.
#             Add count and percentage of messages with errors to Received
#               line.
#             Add information about relaying and -nr to suppress it.
# 1997-02-03  Merged in some of the things Nigel Metheringham had done:
#               Re-worded headings
#               Added received histogram as well as delivered
#               Added local senders' league table
#               Added local recipients' league table
# 1997-03-10  Fixed typo "destinationss"
#             Allow for intermediate address between final and original
#               when testing for relaying
#             Give better message when no input
# 1997-04-24  Fixed bug in layout of error listing that was depending on
#               text length (output line got repeated).
# 1997-05-06  Bug in option decoding when only one option.
#             Overflow bug when handling very large volumes.
# 1997-10-28  Updated to handle revised log format that might show
#               HELO name as well as host name before IP number
# 1998-01-26  Bugs in the function for calculating the number of seconds
#               since 1970 from a log date
# 1998-02-02  Delivery to :blackhole: doesn't have a T= entry in the log
#               line; cope with this, thereby avoiding undefined problems
#             Very short log line gave substring error
# 1998-02-03  A routed delivery to a local transport may not have <> in the
#               log line; terminate the address at white space, not <
# 1998-09-07  If first line of input was a => line, $thissize was undefined;
#               ensure it is zero.
# 1998-12-21  Adding of $thissize from => line should have been adding $size.
#             Oops. Should have looked more closely when fixing the previous
#               bug!
# 1999-11-12  Increased the field widths for printed integers; numbers are
#               bigger than originally envisaged.
# 2001-03-21  Converted seconds() routine to use Time::Local, fixing a bug
#               whereby seconds($timestamp) - id_seconds($id) gave an
#               incorrect result.
#             Added POD documentation.
#             Moved usage instructions into help() subroutine.
#             Added 'use strict' and declared all global variables.
#             Added '-html' flag and resultant code.
#             Added '-cache' flag and resultant code.
#             Added add_volume() routine and converted all volume variables
#               to use it, fixing the overflow problems for individual hosts
#               on large sites.
#             Converted all volume output to GB/MB/KB as appropriate.
#             Don't store local user stats if -nfl is specified.
#             Modifications done by: Steve Campbell (<steve@???>)
# 2001-04-02  Added the -t_remote_users flag. Steve Campbell.
# 2001-10-15  Added the -domain flag. Steve Campbell.



=head1 NAME

eximstats - generates statistics from Exim mainlog files.

=head1 SYNOPSIS

eximstats [Options] mainlog1 mainlog2 ... > report.txt

Options:

=over 4

=item B<-h>I<number>

histogram divisions per hour. The default is 1, and
0 suppresses histograms. Valid values are:

0, 1, 2, 3, 5, 10, 15, 20, 30 or 60.

=item B<-ne>

Don't display error information.

=item B<-nr>

Don't display relaying information.

=item B<-nr>I</pattern/>

Don't display relaying information that matches.

=item B<-nt>

Don't display transport information.

=item B<-q>I<list>

List of times for queuing information. A single 0 item suppresses this information.

=item B<-t>I<number>

Display top <number> sources/destinations. The default is 50, and
0 suppresses the top listing.

=item B<-tnl>

Omit local sources/destinations in top listing.

=item B<-t_remote_users>

Include remote users in the top source/destination listings.

=item B<-html>

Output the results in HTML.

=item B<-cache>

Cache results of timegm() lookups. This will result in a significant
speedup when processing hundreds of thousands of messages, at a
cost of increasing the memory utilisation.

=back

=head1 DESCRIPTION

Eximstats parses exim mainlog files and outputs a statistical
analysis of the messages processed. By default, a text
analysis is generated, but you can request an html analysis
by using the B<-html> flag.

=head1 AUTHOR

There is a web site at http://www.exim.org - this contains details of the
mailing list exim-users@???.

=head1 TO DO

This program does not perfectly handle messages whose received
and delivered log lines are in different files, which can happen
when you have multiple mail servers and a message cannot be
immediately delivered. Fixing this could be tricky...

=head1 SUBROUTINES

The following section will only be of interest to the
program maintainers:

=cut

use integer;
use strict;
use Time::Local;


##################################################
#             Static data                        #
##################################################
# Will convert from 'use vars' to 'our' when perl 5.6.0 is out for
# Solaris 2.6 on sunfreeware.com.
use vars qw(@tab62 @days_per_month $gig);
use vars qw($VERSION);



@tab62 =
  (0,1,2,3,4,5,6,7,8,9,0,0,0,0,0,0,     # 0-9
   0,10,11,12,13,14,15,16,17,18,19,20,  # A-K
  21,22,23,24,25,26,27,28,29,30,31,32,  # L-W
  33,34,35, 0, 0, 0, 0, 0,              # X-Z
   0,36,37,38,39,40,41,42,43,44,45,46,  # a-k
  47,48,49,50,51,52,53,54,55,56,57,58,  # l-w
  59,60,61);                            # x-z


@days_per_month = (0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334);
$gig     = 1024 * 1024 * 1024;
$VERSION = '1.14';


# Declare global variables.
use vars qw($received_data_total  $received_data_gigs  $received_count_total);
use vars qw($delivered_data_total $delivered_data_gigs $delivered_count_total);
use vars qw(%arrival_time %size %from_host %from_address);
use vars qw(%timestamp2time);            #Hash of timestamp => time.
use vars qw($i);                #General loop counter.


# The following are parameters whose values are
# set by command line switches:
use vars qw($show_errors $show_relay $show_transport);
use vars qw($topcount $local_league_table $include_remote_users);
use vars qw($hist_opt $hist_interval $hist_number);
use vars qw($relay_pattern @queue_times $html);
use vars qw($cache_id_times);
use vars qw($host_or_domain);


##################################################
#                   Subroutines                  #
##################################################



=head2 volume_rounded();

$rounded_volume = volume_rounded($bytes,$gigabytes);

Given a data size in bytes, round it to KB, MB, or GB
as appropriate.

Eg 12000 => 12KB, 15000000 => 14MB, etc.

Note: I've experimented with Math::BigInt and it results in a 33%
performance degradation as opposed to storing numbers split into
bytes and gigabytes.
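
The KB and MB figures are rounded to the nearest unit using integer
arithmetic: half the divisor is added before dividing. The sketch below
illustrates that step on its own; to_kb is a hypothetical name used only
for this illustration and is not part of eximstats.

```perl
use strict;
use warnings;

# Round-to-nearest with integer division, as in the KB branch:
# add half the divisor (512) before dividing by 1024.
# to_kb is a hypothetical helper, not an eximstats routine.
sub to_kb {
    my ($bytes) = @_;
    return int(($bytes + 512) / 1024);
}

printf "%d bytes is about %dKB\n", $_, to_kb($_) for (12000, 1535, 1536);
```

Here 1535 bytes rounds down to 1KB while 1536 bytes rounds up to 2KB.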

=cut

sub volume_rounded {
my($x,$g) = @_;
my($rounded);

while ($x > $gig)
{
$g++;
$x -= $gig;
}

# Values < 1 GB

if ($g <= 0)
  {
  if ($x < 10000)
    {
    $rounded = sprintf("%6d", $x);
    }
  elsif ($x < 10000000)
    {
    $rounded = sprintf("%4dKB", ($x + 512)/1024);
    }
  else
    {
    $rounded = sprintf("%4dMB", ($x + 512*1024)/(1024*1024));
    }
  }


# Values between 1GB and 10GB are printed in MB

elsif ($g < 10)
{
$rounded = sprintf("%4dMB", ($g * 1024) + ($x + 512*1024)/(1024*1024));
}

# Handle values over 10GB

else
{
$rounded = sprintf("%4dGB", $g + ($x + $gig/2)/$gig);
}

return $rounded;
}


=head2 add_volume();

add_volume(\$bytes,\$gigs,$size);

Add $size to $bytes/$gigs where this is a number split into
bytes ($bytes) and gigabytes ($gigs). This is significantly
faster than using Math::BigInt.
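
A minimal standalone sketch of the same carry idea, assuming nothing
beyond core Perl. The name add_split and the sample sizes are made up
for this illustration.

```perl
use strict;
use warnings;

my $GIG = 1024 * 1024 * 1024;

# Same carry idea as add_volume(): keep the byte counter at or below
# 1 GB and move each full gigabyte into a separate counter.
# add_split is a hypothetical name used only in this sketch.
sub add_split {
    my ($bytes_ref, $gigs_ref, $size) = @_;
    $$bytes_ref += $size;
    while ($$bytes_ref > $GIG) {
        $$gigs_ref++;
        $$bytes_ref -= $GIG;
    }
}

my ($bytes, $gigs) = (0, 0);
add_split(\$bytes, \$gigs, 700 * 1024 * 1024) for 1 .. 3;   # 2100 MB in total
print "gigs=$gigs bytes=$bytes\n";
```

After three additions of 700 MB the counters hold 2 gigabytes plus
52 MB of leftover bytes.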

=cut

sub add_volume {
  my($bytes_ref,$gigs_ref,$size) = @_;
  $$bytes_ref += $size;
  while ($$bytes_ref > $gig) {
    $$gigs_ref++;
    $$bytes_ref -= $gig;
  }
}



=head2 format_time();

$formatted_time = format_time($seconds);

Given a time in seconds, break it down into
weeks, days, hours, minutes, and seconds.

Eg 12005 => 3h20m5s
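
The breakdown can also be written as a loop over unit sizes. This is
only an illustrative restatement under that assumption, not the code
eximstats uses; the name breakdown is hypothetical.

```perl
use strict;
use warnings;

# Illustrative restatement of the weeks/days/hours/minutes/seconds
# breakdown. breakdown is a hypothetical name, not an eximstats routine.
sub breakdown {
    my ($t) = @_;
    my @parts;
    for my $unit ([w => 604800], [d => 86400], [h => 3600], [m => 60], [s => 1]) {
        my ($label, $secs) = @$unit;
        my $n = int($t / $secs);          # whole units of this size
        $t -= $n * $secs;                 # carry the remainder down
        push @parts, "$n$label" if $n > 0;
    }
    return @parts ? join('', @parts) : '0s';
}

print breakdown(12005), "\n";   # prints 3h20m5s
```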

=cut

sub format_time {
my($t) = pop @_;
my($s) = $t % 60;
$t /= 60;
my($m) = $t % 60;
$t /= 60;
my($h) = $t % 24;
$t /= 24;
my($d) = $t % 7;
my($w) = $t/7;
my($p) = "";
$p .= "$w"."w" if $w > 0;
$p .= "$d"."d" if $d > 0;
$p .= "$h"."h" if $h > 0;
$p .= "$m"."m" if $m > 0;
$p .= "$s"."s" if $s > 0 || $p eq "";
$p;
}



=head2 seconds();

$time = seconds($timestamp);

Given a time-of-day timestamp, convert it into a time() value using timegm.
We expect the timestamp to be of the form "$year-$mon-$day $hour:$min:$sec",
with month going from 1 to 12, and the year to be absolute (we
do the necessary conversions).
We use timegm rather than timelocal as the id_seconds
appears to be calculated in GM time.

Should we ever switch to local time for any reason, we should
calculate the localtime offset once at the beginning of the program,
and then use timegm from then on as timelocal() is horribly inefficient.

If the -cache flag is specified, then we cache the results of the
timegm() lookup. This results in a significant performance boost when
processing hundreds of thousands of messages per day
at the cost of maintaining a memory cache.
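
A minimal sketch of the cached conversion, assuming only the core
Time::Local module. The names to_epoch and %cache are hypothetical and
chosen for this illustration.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %cache;   # timestamp string => epoch seconds (hypothetical name)

# Sketch of a memoised timestamp conversion; to_epoch is a
# hypothetical name, not an eximstats routine.
sub to_epoch {
    my ($stamp) = @_;
    return $cache{$stamp} if exists $cache{$stamp};
    my ($y, $mon, $d, $h, $min, $s) =
        $stamp =~ /^(\d{4})-(\d\d)-(\d\d)\s(\d\d):(\d\d):(\d\d)/
        or return 0;
    # timegm() takes sec, min, hour, mday, a 0-based month, and the year.
    return $cache{$stamp} = timegm($s, $min, $h, $d, $mon - 1, $y);
}

print to_epoch('2001-10-25 20:19:00'), "\n";
```

Because log lines for a busy second repeat the same timestamp string,
the hash lookup replaces most of the timegm() calls.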

=cut

sub seconds {
my($timestamp) = @_;

  if ($cache_id_times && $timestamp2time{$timestamp}) {
    return($timestamp2time{$timestamp});
  }


return 0 unless ($timestamp =~ /^(\d{4})\-(\d\d)-(\d\d)\s(\d\d):(\d\d):(\d\d)/);
my(@timestamp) = ($1,$2,$3,$4,$5,$6);


  #Adjust the values, as per gmtime(), and then reverse the list
  #to put it into the order that timegm() expects.
  $timestamp[0] -= 1900;
  $timestamp[1]--;
  my $time = timegm(reverse @timestamp);
  if ($cache_id_times) {
    $timestamp2time{$timestamp} = $time;
  }
  $time;
}



=head2 id_seconds();

$time = id_seconds($message_id);

Given a message ID, convert it into a time() value.
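
The first six characters of the ID are read as a base-62 number, using
the digit order implied by @tab62 (0-9, then A-Z, then a-z). The sketch
below shows the same decoding with a hash lookup instead of the table.
The name decode_id_time and the sample ID are made up for illustration.

```perl
use strict;
use warnings;

# Base-62 alphabet in the order implied by @tab62: 0-9, A-Z, a-z.
my @digits = ('0' .. '9', 'A' .. 'Z', 'a' .. 'z');
my %value;
@value{@digits} = (0 .. $#digits);

# Decode the first six characters of a message ID.
# decode_id_time is a hypothetical helper name.
sub decode_id_time {
    my ($id) = @_;
    my $secs = 0;
    $secs = $secs * 62 + $value{$_} for split //, substr($id, 0, 6);
    return $secs;
}

print decode_id_time('000010-000000-00'), "\n";   # prints 62
```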

=cut

sub id_seconds {
my($sub_id) = substr((pop @_), 0, 6);
my($s) = 0;
my(@c) = split(//, $sub_id);
while($#c >= 0) { $s = $s * 62 + $tab62[ord(shift @c) - ord('0')] }
$s;
}



=head2 calculate_localtime_offset();

$localtime_offset = calculate_localtime_offset();

Calculate the localtime offset from gmtime in seconds.

$localtime = gmtime() + $localtime_offset.

This subroutine is commented out as it is not currently in use.

=cut

#sub calculate_localtime_offset {
#  # Pick an arbitrary date, convert it to localtime & gmtime, and return the difference.
#  my (@sample_date) = (0,0,0,5,5,100);
#  my $localtime = timelocal(@sample_date);
#  my $gmtime    = timegm(@sample_date);
#  my $offset = $localtime - $gmtime;
#  return $offset;
#}



=head2 print_queue_times();

$time = print_queue_times($message_type,\@queue_times,$queue_more_than);

Given the type of messages being output, the array of message queue times,
and the number of messages which exceeded the queue times, print out
a table.
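
The counts passed in are built by placing each message in the first
bin whose limit exceeds its queue time. A rough sketch of that binning,
with a shortened list of limits and a hypothetical helper name
bin_index:

```perl
use strict;
use warnings;

my @limits = (60, 5 * 60, 15 * 60);   # upper bounds in seconds (shortened list)

# Return the index of the first limit the queue time falls under,
# or scalar(@limits) for the "Over" row. bin_index is a hypothetical name.
sub bin_index {
    my ($queued) = @_;
    for my $i (0 .. $#limits) {
        return $i if $queued < $limits[$i];
    }
    return scalar @limits;
}

my @bins = (0) x (@limits + 1);
$bins[ bin_index($_) ]++ for (12, 59, 60, 400, 5000);
print "@bins\n";   # prints 2 1 1 1
```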

=cut

sub print_queue_times {
no integer;
my($string,$array,$queue_more_than) = @_;

my $printed_one = 0;
my $cumulative_percent = 0;
#$queue_unknown += keys %arrival_time;

my $queue_total = $queue_more_than;
for ($i = 0; $i <= $#queue_times; $i++) { $queue_total += $$array[$i] }

my $temp = "Time spent on the queue: $string";

my($format);
if ($html) {
print "<hr><a name=\"$string time\"></a><h2>$temp</h2>\n";
print "<table border>\n";
print "<tr><th>Time</th><th>Messages</th><th>Percentage</th><th>Cumulative Percentage</th>\n";
$format = "<tr><td align=\"right\">%s %s</td><td align=\"right\">%d</td><td align=\"right\">%5.1f%%</td><td align=\"right\">%5.1f%%</td>\n";
}
else
{
printf("%s\n%s\n\n", $temp, "-" x length($temp));
$format = "%5s %4s %6d %5.1f%% %5.1f%%\n";
}

for ($i = 0; $i <= $#queue_times; $i++)
  {
  if ($$array[$i] > 0)
    {
    my $percent = ($$array[$i] * 100)/$queue_total;
    $cumulative_percent += $percent;
    printf($format,
      $printed_one? "     " : "Under",
      &format_time($queue_times[$i]),
      $$array[$i], $percent, $cumulative_percent);
    $printed_one = 1;
    }
  }


if ($queue_more_than > 0)
  {
  my $percent = ($queue_more_than * 100)/$queue_total;
  $cumulative_percent += $percent;
  printf($format,
    "Over ",
    &format_time($queue_times[$#queue_times]),
    $queue_more_than, $percent, $cumulative_percent);
  }


#printf("Unknown %6d\n", $queue_unknown) if $queue_unknown > 0;
print "</table>\n" if $html;
print "\n";
}



=head2 print_histogram();

print_histogram('Deliveries|Messages received',@interval_count);

Print a histogram of the messages delivered/received per time slot
(hour by default).
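
Each dot stands for a fixed number of messages, chosen so that the
busiest slot prints about 50 dots. A small sketch of that scaling,
using the hypothetical names dot_scale and bar and made-up counts:

```perl
use strict;
use warnings;

# One dot stands for $scale messages, chosen so the busiest slot
# prints roughly 50 dots. dot_scale and bar are hypothetical names.
sub dot_scale {
    my ($max) = @_;
    my $scale = int(($max + 25) / 50);
    return $scale || 1;              # never less than one message per dot
}

sub bar {
    my ($count, $scale) = @_;
    return '.' x int($count / $scale);
}

my @counts = (0, 40, 120, 260);      # made-up per-slot counts
my $scale  = dot_scale(260);
printf "%6d %s\n", $_, bar($_, $scale) for @counts;
```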

=cut

sub print_histogram {
my($text) = shift;
my(@interval_count) = @_;
my($maxd) = 0;
for ($i = 0; $i < $hist_number; $i++)
{ $maxd = $interval_count[$i] if $interval_count[$i] > $maxd; }

my $scale = int(($maxd + 25)/50);
$scale = 1 if $scale == 0;

my($type);
if ($text eq "Deliveries")
{
$type = ($scale == 1)? "delivery" : "deliveries";
}
else
{
$type = ($scale == 1)? "message" : "messages";
}

my($temp) = sprintf("$text per %s (each dot is $scale $type)",
($hist_interval == 60)? "hour" :
($hist_interval == 1)? "minute" : "$hist_interval minutes");

if ($html) {
print "<hr><a name=\"$text\"></a><h2>$temp</h2>\n<pre>\n";
}
else {
printf("%s\n%s\n\n", $temp, "-" x length($temp));
}

my $hour = 0;
my $minutes = 0;
for ($i = 0; $i < $hist_number; $i++)
{
my $c = $interval_count[$i];

# If the interval is an hour (the maximum) print the starting and
# ending hours as a label. Otherwise print the starting hour and
# minutes, which take up the same space.

  if ($hist_opt == 1)
    {
    printf("%02d-%02d", $hour, $hour + 1);
    $hour++;
    }
  else
    {
    if ($minutes == 0)
      { printf("%02d:%02d", $hour, $minutes) }
    else
      { printf("  :%02d", $minutes) }
    $minutes += $hist_interval;
    if ($minutes >= 60)
      {
      $minutes = 0;
      $hour++;
      }
    }


printf(" %6d %s\n", $c, "." x ($c/$scale));
}
print "\n";
print "</pre>\n" if $html;
}




=head2 print_league_table();

print_league_table($league_table_type,\%message_count,\%message_data);

Given hashes of message count and message data, which are keyed by
the table type (eg by the sending host), print a league table
showing the top $topcount (defaults to 50).
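
The ordering is a chained comparison: message count first, then
volume, then name as a tie-breaker. A simplified sketch with made-up
data and a hypothetical helper name top_by_count (the gigabyte counter
is left out for brevity):

```perl
use strict;
use warnings;

# Made-up sample data keyed by sending host.
my %count = ('a.example' => 3,   'b.example' => 7,   'c.example' => 3);
my %bytes = ('a.example' => 900, 'b.example' => 100, 'c.example' => 900);

# Order by message count, then by volume, then by name, and keep the
# top N. top_by_count is a hypothetical name.
sub top_by_count {
    my ($n) = @_;
    my @keys = sort {
        $count{$b} <=> $count{$a} || $bytes{$b} <=> $bytes{$a} || $a cmp $b
    } keys %count;
    $#keys = $n - 1 if @keys > $n;   # truncate to the first N entries
    return @keys;
}

print join(' ', top_by_count(2)), "\n";   # prints b.example a.example
```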

=cut


sub print_league_table {
my($text,$m_count,$m_data,$m_data_gigs) = @_;
my($name) = ($topcount == 1)? "$text" : "$topcount ${text}s";
my($temp) = "Top $name by message count";

my($format);
if ($html) {
print "<hr><a name=\"$text count\"></a><h2>$temp</h2>\n";
print "<table border>\n";
print "<tr><th>Messages</th><th>Bytes</th><th>\u$text</th>\n";

# Align non-local addresses to the right (so all the .com's line up).
# Local addresses are aligned on the left as they are userids.
my $align = ($text !~ /local/i) ? 'right' : 'left';
$format = "<tr><td align=\"right\">%d</td><td align=\"right\">%s</td><td align=\"$align\">%s</td>\n";
}
else {
printf("%s\n%s\n\n", $temp, "-" x length($temp));
$format = "%7d %10s %s\n";
}

my $count = 1;
my($key);
foreach $key (sort
               {
               $$m_count{$b}     <=> $$m_count{$a} ||
               $$m_data_gigs{$b} <=> $$m_data_gigs{$a}  ||
               $$m_data{$b}      <=> $$m_data{$a}  ||
               $a cmp $b
               }
             keys %{$m_count})
  {
  printf($format, $$m_count{$key}, volume_rounded($$m_data{$key},$$m_data_gigs{$key}), $key);
  last if $count++ >= $topcount;
  }
print "</table>\n" if $html;
print "\n";


$temp = "Top $name by volume";
if ($html) {
print "<hr><a name=\"$text volume\"></a><h2>$temp</h2>\n";
print "<table border>\n";
print "<tr><th>Messages</th><th>Bytes</th><th>\u$text</th>\n";
}
else {
printf("%s\n%s\n\n", $temp, "-" x length($temp));
}

$count = 1;
foreach $key (sort
               {
               $$m_data_gigs{$b} <=> $$m_data_gigs{$a}  ||
               $$m_data{$b}      <=> $$m_data{$a}  ||
               $$m_count{$b}     <=> $$m_count{$a} ||
               $a cmp $b
               }
             keys %{$m_count})
  {
  printf($format, $$m_count{$key}, volume_rounded($$m_data{$key},$$m_data_gigs{$key}), $key);
  last if $count++ >= $topcount;
  }


print "\n";
print "</table>\n" if $html;
}


=head2 html_header();

$header = html_header($title);

Return our HTML header text, which opens the <body> block.

=cut

sub html_header {
my($title) = @_;
my $text = << "EoText";
<html>
<head>
<title>$title</title>
</head>
<body>
<h1>$title</h1>
EoText
return $text;
}



=head2 help();

help();

Display usage instructions and exit.

=cut

sub help {
print << "EoText";

eximstats Version $VERSION

Usage: eximstats [Options] mainlog1 mainlog2 ... > report.txt

Parses exim mainlog files and generates a statistical analysis of
the messages processed. Valid options are:

-h<number>      histogram divisions per hour. The default is 1, and
                0 suppresses histograms. Other valid values are:
                2, 3, 5, 10, 15, 20, 30 or 60.
-ne             don't display error information
-nr             don't display relaying information
-nr/pattern/    don't display relaying information that matches
-nt             don't display transport information
-q<list>        list of times for queuing information
                single 0 item suppresses
-t<number>      display top <number> sources/destinations
                default is 50, 0 suppresses top listing
-tnl            omit local sources/destinations in top listing
-t_remote_users show top user sources/destinations from non-local domains
-html           output the results in HTML
-cache          increased processing speed, but higher memory utilisation.
-domain         show results by domain instead of by host


EoText

exit 1;
}




##################################################
#                 Main Program                   #
##################################################



$show_errors = 1;
$show_relay = 1;
$show_transport = 1;
$topcount = 50;
$local_league_table = 1;
$include_remote_users = 0;
$hist_opt = 1;
$host_or_domain = 'host';

@queue_times = (60, 5*60, 15*60, 30*60, 60*60, 3*60*60, 6*60*60,
                12*60*60, 24*60*60);



# Decode options

while (@ARGV > 0 && substr($ARGV[0], 0, 1) eq '-')
  {
  if    ($ARGV[0] =~ /^\-h(\d+)$/) { $hist_opt = $1 }
  elsif ($ARGV[0] =~ /^\-ne$/)     { $show_errors = 0 }
  elsif ($ARGV[0] =~ /^\-nr(.?)(.*)\1$/)
    {
    if ($1 eq "") { $show_relay = 0 } else { $relay_pattern = $2 }
    }
  elsif ($ARGV[0] =~ /^\-q([,\d\+\-\*\/]+)$/)
    {
    @queue_times = split(/,/, $1);
    my($q);
    foreach $q (@queue_times) { $q = eval($q) + 0 }
    @queue_times = sort { $a <=> $b } @queue_times;
    @queue_times = () if ($#queue_times == 0 && $queue_times[0] == 0);
    }
  elsif ($ARGV[0] =~ /^\-nt$/)     { $show_transport = 0 }
  elsif ($ARGV[0] =~ /^\-t(\d+)$/) { $topcount = $1 }
  elsif ($ARGV[0] =~ /^\-tnl$/)    { $local_league_table = 0 }
  elsif ($ARGV[0] =~ /^\-html$/)   { $html = 1 }
  elsif ($ARGV[0] =~ /^\-cache$/)  { $cache_id_times = 1 }
  elsif ($ARGV[0] =~ /^\-domain$/) { $host_or_domain = 'domain' }
  elsif ($ARGV[0] =~ /^\-help$/)   { help() }
  elsif ($ARGV[0] =~ /^\-t_remote_users$/) { $include_remote_users = 1 }
  else
    {
    print STDERR "Eximstats: Unknown or malformed option $ARGV[0]\n";
    help();
    }
  shift;
  }



# Initialize slots for queue times
my(@queue_bin,@remote_queue_bin,@received_interval_count,@delivered_interval_count);

for (my $i = 0; $i <= $#queue_times; $i++)
{
$queue_bin[$i] = 0;
$remote_queue_bin[$i] = 0;
}

# Compute the number of slots for the histogram

if ($hist_opt > 0)
  {
  if ($hist_opt > 60 || 60 % $hist_opt != 0)
    {
    print "Eximstats: -h must specify a factor of 60\n";
    exit 1;
    }
  $hist_interval = 60/$hist_opt;
  $hist_number = (24*60)/$hist_interval;
  @received_interval_count = (0) x $hist_number;
  @delivered_interval_count = (0) x $hist_number;
  }


#$queue_unknown = 0;

$received_data_total = 0;
$received_data_gigs = 0;
$received_count_total = 0;

$delivered_data_total = 0;
$delivered_data_gigs = 0;
$delivered_count_total = 0;

my $queue_more_than = 0;
my $delayed_count = 0;
my $relayed_unshown = 0;
my $begin = "9999-99-99 99:99:99";
my $end = "0000-00-00 00:00:00";
my(%received_count,       %received_data,       %received_data_gigs);
my(%delivered_count,      %delivered_data,      %delivered_data_gigs);
my(%received_count_user,  %received_data_user,  %received_data_gigs_user);
my(%delivered_count_user, %delivered_data_user, %delivered_data_gigs_user);
my(%transported_count,    %transported_data,    %transported_data_gigs);
my(%remote_delivered,%relayed,%delayed,%had_error,%errors_count);




# Scan the input files and collect the data
foreach my $file (@ARGV) {
  if ($file =~ /\.gz/) {
    unless (open(FILE,"gunzip -c $file |")) {
      print STDERR "Failed to gunzip -c $file: $!\n";
      next;
    }
  }
  elsif ($file =~ /\.Z/) {
    unless (open(FILE,"uncompress -c $file |")) {
      print STDERR "Failed to uncompress -c $file: $!\n";
      next;
    }
  }
  else {
    unless (open(FILE,$file)) {
      print STDERR "Failed to read $file: $!\n";
      next;
    }
  }


  while (<FILE>)
    {
    next if length($_) < 38;
    next unless /^(\d{4}\-\d\d-\d\d\s(\d\d):(\d\d):\d\d)\s(\S{16}) (..)/;


    my($tod,$m_hour,$m_min,$id,$flag) = ($1,$2,$3,$4,$5);
    my $ip   = (/\sH=\S+(?:(?=\s\()\s\S+)?(\s\[[^]]*\])/) ? $1 : "";
    my $host = (/\sH=(\S+)/) ? $1 : 'local';
    if ($host_or_domain eq 'domain' && $host !~ /^\[/) {
      #Remove the host portion from the DNS name. We ensure that we end up with
      #at least xxx.yyy. $host can be (x.y.z),  x.y.z. [IPAddr] is parsed out
      #above.
      $host =~ s/^(\(?)[^\.]+\.([^\.]+\.[^\.])/$1$2/;
    }


    $begin = $tod if $tod lt $begin;
    $end   = $tod if $tod gt $end;



    if ($flag eq '<=')
      {
      my $thissize = (/\sS=(\d+)( |$)/) ? $1 : 0;
      $size{$id} = $thissize;
      if ($host ne 'local')
        {
        if ($show_relay)                   # Save incoming information
          {                                # in case it becomes interesting
          my($from) = /^.{40}(\S+)/;       # later, when delivery lines are read
          $from_host{$id} = "$host$ip";
          $from_address{$id} = $from;
          }
        }
      if (/\sU=(\S+)/)
        {
        my $user = $1;
        if (($local_league_table   && ($host eq 'local')) ||
            ($include_remote_users && ($host ne 'local'))   )
          {
          $received_count_user{$user}++;
          add_volume(\$received_data_user{$user},\$received_data_gigs_user{$user},$thissize);
          }
        }


      $received_count{$host}++;
      add_volume(\$received_data{$host},\$received_data_gigs{$host},$thissize);
      $received_count_total++;
      add_volume(\$received_data_total,\$received_data_gigs,$thissize);


      $arrival_time{$id} = $tod if $#queue_times >= 0;
      if ($hist_opt > 0)
        {
        $received_interval_count[($m_hour*60 + $m_min)/$hist_interval]++;
        }
      }


    elsif ($flag eq "=>")
      {
      my $size = $size{$id};
      $size = 0 if !defined $size;
      if ($host ne 'local')
        {
        $remote_delivered{$id} = 0 if !defined($remote_delivered{$id});
        $remote_delivered{$id}++;


        # Determine relaying address if either only one address listed,
        # or two the same. If they are different, it implies a forwarding
        # or aliasing, which is not relaying. Note that for multi-aliased
        # addresses, there may be a further address between the first
        # and last.


        if ($show_relay && defined $from_host{$id})
          {
          my($old,$new);
          if (/^.{40}(\S+)(?:\s+\([^)]*\))?\s+<([^>]+)>/)
            { ($old,$new) = ($1,$2); }
          else
            { $old = $new = ""; }


          if ("\L$new" eq "\L$old")
            {
            ($old) = /^.{40}(\S+)/ if $old eq "";
            my $key = "H=\L$from_host{$id}\E A=\L$from_address{$id}\E => " .
              "H=\L$host\E$ip A=\L$old\E";
            if (!defined $relay_pattern || $key !~ /$relay_pattern/o)
              {
              $relayed{$key} = 0 if !defined $relayed{$key};
              $relayed{$key}++;
              }
            else { $relayed_unshown++ }
            }
          }
        }


      if (($local_league_table   && ($host eq 'local')) ||
          ($include_remote_users && ($host ne 'local'))   )
        {
        my($rest) = substr($_,40);
        if (my($user) = split(($rest =~ /</)? ' <' : ' ', $rest))
          {
          if ($user =~ /^[\/|]/)
            {
            my($parent) = $_ =~ /(<[^@]+@?[^>]*>)/;
            $user = "$user $parent" if defined $parent;
            }
          $delivered_count_user{$user}++;
          add_volume(\$delivered_data_user{$user},\$delivered_data_gigs_user{$user},$size);
          }
        }


      $delivered_count{$host}++;
      add_volume(\$delivered_data{$host},\$delivered_data_gigs{$host},$size);
      $delivered_count_total++;
      add_volume(\$delivered_data_total,\$delivered_data_gigs,$size);


      if ($show_transport)
        {
        my $transport = (/\sT=(\S+)/) ? $1 : ':blackhole:';
        $transported_count{$transport}++;
        add_volume(\$transported_data{$transport},\$transported_data_gigs{$transport},$size);
        }
      if ($hist_opt > 0)
        {
        $delivered_interval_count[($m_hour*60 + $m_min)/$hist_interval]++;
        }
      }


    elsif ($flag eq "==" && defined($size{$id}) && !defined($delayed{$id}))
      {
      $delayed_count++;
      $delayed{$id} = 1;
      }


    elsif ($flag eq "**")
      {
      $had_error{$id} = 1 if defined ($size{$id});
      if ($show_errors)
        {
        my($error) = substr($_, 40);
        $errors_count{$error}++;
        }
      }


    elsif (/Completed$/)
      {
      if ($#queue_times >=0)
        {
        #Note: id_seconds() benchmarks as 42% slower than seconds() and computing
        #the time accounts for a significant portion of the run time.
        my($queued);
        if (defined $arrival_time{$id}) {
          $queued = &seconds($tod) - &seconds($arrival_time{$id});
          delete($arrival_time{$id});
        }
        else {
          $queued = &seconds($tod) - &id_seconds($id);
        }


        for ($i = 0; $i <= $#queue_times; $i++)
          {
          if ($queued < $queue_times[$i])
            {
            $queue_bin[$i]++;
            $remote_queue_bin[$i]++ if $remote_delivered{$id};
            last;
            }
          }
        $queue_more_than++ if $i > $#queue_times;
        }


      if ($show_relay)
        {
        delete($from_host{$id});
        delete($from_address{$id});
        }
      }
    }
  }


if ($begin eq "9999-99-99 99:99:99")
{
print "**** No valid log lines read\n";
exit 1;
}

my $title = "Exim statistics from $begin to $end";

if ($html) {
  print html_header($title);
  print "<ul>\n";
  print "<li><a href=\"#grandtotal\">Grand total summary</a>\n";
  print "<li><a href=\"#transport\">Deliveries by Transport</a>\n" if $show_transport;
  if ($hist_opt) {
    print "<li><a href=\"#Messages received\">Messages received per hour</a>\n";
    print "<li><a href=\"#Deliveries\">Deliveries per hour</a>\n";
  }
  if ($#queue_times >= 0) {
    print "<li><a href=\"#all messages time\">Time spent on the queue: all messages</a>\n";
    print "<li><a href=\"#messages with at least one remote delivery time\">Time spent on the queue: messages with at least one remote delivery</a>\n";
  }
  print "<li><a href=\"#Relayed messages\">Relayed messages</a>\n" if $show_relay;
  if ($topcount) {
    print "<li><a href=\"#sending host count\">Top 50 sending ${host_or_domain}s by message count</a>\n";
    print "<li><a href=\"#sending host volume\">Top 50 sending ${host_or_domain}s by volume</a>\n";
    print "<li><a href=\"#local sender count\">Top 50 local senders by message count</a>\n";
    print "<li><a href=\"#local sender volume\">Top 50 local senders by volume</a>\n";
    print "<li><a href=\"#destination count\">Top 50 destinations by message count</a>\n";
    print "<li><a href=\"#destination volume\">Top 50 destinations by volume</a>\n";
    print "<li><a href=\"#local destination count\">Top 50 local destinations by message count</a>\n";
    print "<li><a href=\"#local destination volume\">Top 50 local destinations by volume</a>\n";
  }
  print "<li><a href=\"#errors\">List of errors</a>\n" if %errors_count;
  print "</ul>\n<hr>\n";


}
else {
print "\n$title\n";
}

# Print grand totals

my($format1,$format2);
if ($html) {
print << "EoText";
<a name="grandtotal"></a>
<h2>Grand total summary</h2>
<table border>
<tr><th>TOTAL</th><th>Volume</th><th>Messages</th><th>${host_or_domain}s</th><th colspan=2>At least one addr<br>Delayed</th><th colspan=2>At least one addr<br>Failed</th>
EoText

  $format1 = "<tr><td>%s</td><td align=\"right\">%s</td><td align=\"right\">%d</td><td align=\"right\">%d</td>";
  $format2 = "<td align=\"right\">%d</td><td align=\"right\">%4.1f%%</td><td align=\"right\">%d</td><td align=\"right\">%4.1f%%</td>";
}
else {
print << "EoText";

Grand total summary
-------------------

                                                       At least one address
  TOTAL               Volume    Messages    ${host_or_domain}s      Delayed       Failed
EoText
  $format1 = "  %-16s %9s      %6d     %4d";
  $format2 = "  %6d %4.1f%% %6d %4.1f%%";
}


my $volume = volume_rounded($received_data_total, $received_data_gigs);
my $failed_count = keys %had_error;
  {
  no integer;
  printf("$format1$format2\n",'Received',$volume,$received_count_total,
    scalar(keys %received_data),$delayed_count,
    ($received_count_total) ? ($delayed_count*100/$received_count_total) : 0,
    $failed_count,
    ($received_count_total) ? ($failed_count*100/$received_count_total) : 0);
  }


$volume = volume_rounded($delivered_data_total, $delivered_data_gigs);
printf("$format1\n\n",'Delivered',$volume,$delivered_count_total,scalar(keys %delivered_data));
print "</table>\n" if $html;

# Print totals by transport if required

if ($show_transport)
  {
  if ($html) {
    print "<hr><a name=\"transport\"></a><h2>Deliveries by Transport</h2>\n";
    print "<table border>\n";
    print "<tr><th>&nbsp;</th><th>Volume</th><th>Messages</th>\n";
    $format1 = "<tr><td>%s</td><td align=\"right\">%s</td><td align=\"right\">%d</td>";
  }
  else {
    print "Deliveries by transport\n";
    print "-----------------------";
    print "\n                        Volume    Messages\n";
    $format1 = "  %-18s    %6s      %6d";
  }
  my($key);
  foreach $key (sort keys %transported_data)
    {
    printf("$format1\n",$key,
      volume_rounded($transported_data{$key},$transported_data_gigs{$key}),
      $transported_count{$key});
    }
  print "</table>\n" if $html;
  print "\n";
  }


# Print the messages received and the deliveries per interval as histograms,
# unless configured not to. Each histogram is scaled to the maximum count
# found in any one interval.

if ($hist_opt > 0)
  {
  &print_histogram("Messages received", @received_interval_count);
  &print_histogram("Deliveries", @delivered_interval_count);
  }

# Print times on queue if required

if ($#queue_times >= 0)
  {
  &print_queue_times("all messages", \@queue_bin,$queue_more_than);
  &print_queue_times("messages with at least one remote delivery",
    \@remote_queue_bin,$queue_more_than);
  }


# Print relay information if required

if ($show_relay)
  {
  my $temp = "Relayed messages";
  print "<hr><a name=\"$temp\"></a><h2>$temp</h2>\n" if $html;
  if (scalar(keys %relayed) > 0 || $relayed_unshown > 0)
    {
    my $shown = 0;
    my $spacing = "";
    my($format);


    if ($html) {
      print "<table border>\n";
      print "<tr><th>Count</th><th>From</th><th>To</th>\n";
      $format = "<tr><td align=\"right\">%d</td><td>%s</td><td>%s</td>\n";
    }
    else {
      printf("%s\n%s\n\n", $temp, "-" x length($temp));
      $format = "%7d %s\n      => %s\n";
    }


    my($key);
    foreach $key (sort keys %relayed)
      {
      my $count = $relayed{$key};
      $shown += $count;
      $key =~ s/[HA]=//g;
      my($one,$two) = split(/=> /, $key);
      printf($format, $count, $one, $two);
      $spacing = "\n";
      }
    print "</table>\n<p>\n" if $html;
    print "${spacing}Total: $shown (plus $relayed_unshown unshown)\n";
    }
  else
    {
    print "No relayed messages\n";
    print "-------------------\n" unless $html;
    }
  print "\n";
  }


# If the topcount is zero, print no league tables

if ($topcount > 0)
  {
  &print_league_table("sending ${host_or_domain}", \%received_count, \%received_data,\%received_data_gigs);
  &print_league_table("local sender", \%received_count_user,
    \%received_data_user,\%received_data_gigs_user) if ($local_league_table || $include_remote_users);
  &print_league_table("destination", \%delivered_count, \%delivered_data,\%delivered_data_gigs);
  &print_league_table("local destination", \%delivered_count_user,
    \%delivered_data_user,\%delivered_data_gigs_user) if ($local_league_table || $include_remote_users);
  }


# Omit error statistics if configured out

if ($show_errors)
{
  my $total_errors = 0;

  if (scalar(keys %errors_count) != 0)
    {
    my $temp = "List of errors";
    my($format);
    if ($html) {
      print "<hr><a name=\"errors\"></a><h2>$temp</h2>\n";
      print "<table border>\n";
      print "<tr><th>Count</th><th>Error</th>\n";
      $format = "<tr><td align=\"right\">%d</td><td>%s</td>\n";
    }
    else {
      printf("%s\n%s\n\n", $temp, "-" x length($temp));
    }


    my($key);
    foreach $key (sort keys %errors_count)
      {
      my $text = $key;
      chop($text);
      $total_errors += $errors_count{$key};
      if ($html) {
        printf($format,$errors_count{$key},$text);
      }
      else {
        # Wrap long error texts at whitespace after roughly 50 characters.
        printf("%5d ", $errors_count{$key});
        while (length($text) > 65)
          {
          my($first,$rest) = $text =~ /(.{50}\S*)\s+(.+)/;
          last if !$first;
          printf("%s\n      ", $first);
          $text = $rest;
          }
        printf("%s\n\n", $text);
      }
      }
    print "</table>\n<p>\n" if $html;
    }


  my $temp = "Errors encountered: $total_errors";
  print $temp,"\n";
  print "-" x length($temp),"\n" unless $html;
}

# End of eximstats