[Exim] Performance: Solaris and Linux

Author: Steven Dossett
Date:  
To: exim-users
Subject: [Exim] Performance: Solaris and Linux
Hi folks,

I'm new to the list and this is my first post ;) I'm working on a
deployment of Exim as the standard mail exchanger for our company. We
have a bit of "big iron" (well, modest iron) serving customers today,
without the mail exchanger function split off onto separate hosts.
While evaluating Exim for splitting out this task, I've made use of a
few Sun Netra T1s, which are low cost and readily available around this
office. My intent is to create a cluster of fairly low-cost Exim servers
behind an SLB for inbound and outbound MX service. Nothing special
there. However, my testing indicates that I can get much better
performance for the price out of Dell systems running Linux. I thought
I would share what I've seen; perhaps some of you have ideas on how I
might further tweak the Solaris Netra T1s before I'm 100% convinced
that, for the price and performance in this class of system, Exim
performs better on Dells running Linux. The performance difference I'm
seeing in this price range is startling. I'll begin by sharing some
highlights from my Exim configuration, then describe the components
involved and the test results I've seen.

Here are some of the more interesting snippets from my configuration:

# Continuation lines in the Exim configuration need a trailing backslash
hostlist relay_from_hosts = 127.0.0.1 : \
         net24-dbm;/opt/datasvcs/exim/clients/clients24.db : \
         net16-dbm;/opt/datasvcs/exim/clients/clients16.db : \
         net-dbm;/opt/datasvcs/exim/clients/clients.db
rfc1413_hosts = *
# A zero timeout disables ident (RFC 1413) callbacks
rfc1413_query_timeout = 0s
ignore_bounce_errors_after = 5m
timeout_frozen_after = 4h
smtp_accept_max_per_host = 5
smtp_accept_max = 400
message_size_limit = 10M
recipients_max = 1000
recipients_max_reject
remote_max_parallel = 5
queue_run_max = 20
smtp_accept_queue_per_connection = 20
# Above this load average, new messages are queued without immediate delivery
queue_only_load = 8
# Queue runs are abandoned above this load average
deliver_queue_load_max = 12
split_spool_directory
smtp_connect_backlog = 50
# Zero turns off delay warning messages
delay_warning = 0s
return_size_limit = 10000
dns_retrans = 2s
dns_retry = 1
smtp_enforce_sync = false
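Because the relay list above uses net24-, net16- and plain net- DBM lookups, it may help to see what key each lookup actually searches for. As I understand the Exim spec, a netNN- lookup masks the client's IP address to NN bits and looks up the network with "/NN" appended, while net- with no number uses the full address as the key. The Python sketch below assumes that behavior and handles IPv4 only; exim_net_keys is a hypothetical helper, not part of Exim, and the DB file names are just labels taken from the config.

```python
import ipaddress


def exim_net_keys(client_ip: str) -> dict:
    """Hypothetical helper (not part of Exim): return the key each relay
    lookup would search for, assuming netNN- masks the IPv4 address to NN
    bits and appends "/NN", and plain net- uses the bare address."""
    addr = ipaddress.IPv4Address(client_ip)
    keys = {}
    for bits, db in ((24, "clients24.db"), (16, "clients16.db")):
        # strict=False accepts a host address and returns its enclosing network
        net = ipaddress.IPv4Network(f"{addr}/{bits}", strict=False)
        keys[db] = str(net)  # e.g. "192.0.2.0/24"
    keys["clients.db"] = str(addr)  # net- without a mask length
    return keys


if __name__ == "__main__":
    for db, key in exim_net_keys("192.0.2.77").items():
        print(f"{db:14} {key}")
```

Under that assumption, entries in clients24.db would need to be keyed like 192.0.2.0/24 for a /24 client block to match.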

The first system is a Sun Netra T1 (440 MHz UltraSPARC IIi) running
Solaris 8 with 1 GB of RAM and 2x 18 GB local disks (RAID 1 with
DiskSuite). I have the hints database on a tmpfs (RAM) partition. I/O on
the box is quite responsive, considering the disk configuration, with
most asvc_t times between 8 and 14 ms. Sometimes I see them a little
higher, but that's not too shabby. On this box, as the load average
climbs above 8 (and even more so above 12), the time to answer a
connection and send the banner becomes sporadic. I get quite a few fast
answers in under 1 second, but I also get quite a few slow ones (6, 8,
10-20 seconds, etc.). When this occurs, I see that there are no idle
cycles left on the CPU, so I believe I'm just waiting for CPU time to
fork a process for the new request and to get that new process
scheduled.
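The load averages mentioned above (8 and 12) match queue_only_load and deliver_queue_load_max in the config. Below is a deliberately simplified Python model of how I read those options, plus smtp_accept_queue_per_connection, from the Exim documentation: above queue_only_load, newly received messages are only queued and no delivery is started; a queue run is abandoned once the load exceeds deliver_queue_load_max; and once a single SMTP session has delivered more than 20 messages, further messages in that session are queued without starting deliveries. It does not reproduce Exim's real code paths, and the class and method names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoadPolicy:
    """Hypothetical, simplified model of three Exim main options,
    using the values from the configuration above."""
    queue_only_load: float = 8.0
    deliver_queue_load_max: float = 12.0
    smtp_accept_queue_per_connection: int = 20

    def deliver_on_receipt(self, load: float, msg_number_in_session: int) -> bool:
        """Would a delivery process be started right after this message is received?"""
        if load > self.queue_only_load:
            return False  # queue only, no immediate delivery
        limit = self.smtp_accept_queue_per_connection
        if limit > 0 and msg_number_in_session > limit:
            return False  # this session has exceeded the per-connection limit
        return True

    def queue_run_continues(self, load: float) -> bool:
        """Would a queue runner keep going at this load average?"""
        return load <= self.deliver_queue_load_max


if __name__ == "__main__":
    policy = LoadPolicy()
    for load in (4.0, 10.0, 14.0):
        print(f"load {load:4.1f}: "
              f"msg #1 delivered now={policy.deliver_on_receipt(load, 1)}, "
              f"msg #21 delivered now={policy.deliver_on_receipt(load, 21)}, "
              f"queue run continues={policy.queue_run_continues(load)}")
```

If this reading is right, a postal run that sends many messages per connection will also leave every message after the 20th in a session on the queue, so fewer delivery processes get forked during that test.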

With this server and the configuration above, I ran some tests with
postal. For the first batch of tests I ran:

postal -z test -m5 -p5 127.0.0.1 <file with a single dest addr> -

In this scenario, the destination address was a remote address, and the
Netra achieved approximately 250 messages accepted per minute. This
seemed fairly low to me, so I configured Exim on a spare whitebox PC
that I had available running Red Hat Linux 7.2. This little PC has a
1.4 GHz AMD CPU, 512 MB of RAM, and a generic IDE hard disk. With the
same postal settings, this PC accepted approximately 1,100 messages per
minute. That's quite a difference.

I then wanted to see whether the bulk of the Netra's performance
limitation came from Solaris having to fork and tear down Exim
processes, so I reduced the number of processes the system would have
to fork by using -c 1000 in postal. In this scenario I saw ~1,100
messages accepted per minute on the Netra and ~2,500 messages per
minute on the PC. At this point my curiosity was piqued, and I decided
to get serious about trying Exim on a "likely candidate" Dell server of
the size I would use in production. For just a bit more $$ than the Sun
Netra T1, I can get a Dell with hardware RAID, 2x 2.4 GHz Xeon CPUs,
2x 36 GB SCSI disks, and 1 GB of RAM. Yes, that's quite a bit more
hardware for just a bit more cost, and the bump in performance seems to
far outweigh the cost increase.
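One rough way to read these postal numbers is Little's law (W = L / λ): if each of the 5 postal processes (-p5) always has exactly one message in flight, the average time per message is 5 divided by the per-second rate. The sketch below applies that to the four figures above. Assuming the -c 1000 runs also used -p5 is my guess, and the drop between runs is only an upper bound on what connection setup and forking cost, since other things change too.

```python
def mean_seconds_per_message(concurrency: int, msgs_per_min: float) -> float:
    """Little's law, W = L / lambda, with L = number of postal processes
    (assumes each process always has one message in flight)."""
    return concurrency / (msgs_per_min / 60.0)


# (host, postal mode, messages accepted per minute) from the tests above;
# -p5 is assumed for every run.
RUNS = [
    ("Netra", "default -c", 250),
    ("Netra", "-c 1000", 1100),
    ("PC", "default -c", 1100),
    ("PC", "-c 1000", 2500),
]

if __name__ == "__main__":
    for host, mode, rate in RUNS:
        w = mean_seconds_per_message(5, rate)
        print(f"{host:6}{mode:12}{rate:6} msg/min  ~{w * 1000:6.1f} ms per message")
```

On the Netra that works out to roughly 1.2 s per message falling to about 0.27 s, versus about 0.27 s falling to 0.12 s on the PC.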

For my next round of tests, I used the same Netra configuration and a
Dell configured as above, except that it had 6 GB of RAM instead of
1 GB. This server is being prepared for a different purpose, so it just
happened to have an abundance of memory. However, my tests didn't use
anywhere close to that much memory, so I don't think it gave the Dell
any significant advantage. I also changed the test scenario to deliver
to a local account aliased to /dev/null on both servers, to further
reduce any I/O effects on the results. I ran postal with a varying
number of simultaneous processes; here are the average messages
accepted per minute:

Postal processes        Solaris 8 Netra T1      RedHat 7.2 Dell
1                       214                     1,321
2                       324                     2,302
4                       426                     3,910
8                       595                     6,700
16                      853                     8,021
32                      980                     8,877
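To put the table in relative terms, this short Python snippet computes the Dell/Netra ratio at each concurrency level and how far each box scaled relative to its own single-process figure. The numbers are copied straight from the table; nothing else is assumed.

```python
# (postal processes, Netra T1 msgs/min, Dell msgs/min) from the table above
ROWS = [
    (1, 214, 1321),
    (2, 324, 2302),
    (4, 426, 3910),
    (8, 595, 6700),
    (16, 853, 8021),
    (32, 980, 8877),
]


def summarize(rows):
    """Return (procs, dell_over_netra, netra_scaling, dell_scaling) per row,
    where scaling is relative to each box's 1-process result."""
    _, netra_1, dell_1 = rows[0]
    return [
        (procs, dell / netra, netra / netra_1, dell / dell_1)
        for procs, netra, dell in rows
    ]


if __name__ == "__main__":
    for procs, ratio, netra_x, dell_x in summarize(ROWS):
        print(f"{procs:>2} procs: Dell/Netra {ratio:5.2f}x, "
              f"Netra {netra_x:4.2f}x of 1-proc, Dell {dell_x:4.2f}x of 1-proc")
```

By this measure the gap is widest (about 11x) at 8 processes, which is also where the Netra ran out of idle CPU.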


At 8 postal processes and above, the Netra pretty much ran out of idle
CPU cycles, and I again saw the sporadic mix of slow answer times. The
Dell, on the other hand, was still approximately 50% idle even at 32
postal processes.

I've tried the usual Solaris kernel tuning in /etc/system on the Sun,
including:
set rlim_fd_cur = 4096
set rlim_fd_max = 8192
set tcp:tcp_conn_hash_size=8192
set tcp:tcp_time_wait_interval=30000
set autoup=240
set maxpgio=1024

I've also tested with ncsize at 15,000 and at 1,500, but that doesn't
appear to make a significant difference either. Also, my
tcp_time_wait_interval is 30,000 ms (30 seconds), so I don't have a lot
of connections hanging about in TIME_WAIT. As far as I can tell, the
Netra is simply maxed out in this configuration.
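As a sanity check on the TIME_WAIT point, Little's law again gives the expected number of sockets sitting in TIME_WAIT: connection rate times the TIME_WAIT interval. The sketch assumes one TCP connection per message and that the Netra is the side that closes first (and so holds the TIME_WAIT state); both are assumptions, not measurements. The 4-minute interval is included only for comparison.

```python
def sockets_in_time_wait(conns_per_min: float, time_wait_ms: int) -> float:
    """Steady-state estimate: arrival rate (conn/s) x time spent in TIME_WAIT (s)."""
    return (conns_per_min / 60.0) * (time_wait_ms / 1000.0)


if __name__ == "__main__":
    # 980 msgs/min is the Netra's best figure from the table above;
    # one connection per message is an assumption.
    for interval_ms in (30_000, 240_000):
        print(f"tcp_time_wait_interval={interval_ms}: "
              f"~{sockets_in_time_wait(980, interval_ms):.0f} sockets in TIME_WAIT")
```

Under those assumptions, a few hundred sockets in TIME_WAIT at the Netra's peak rate seems unlikely to be the limiting factor.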

So, my questions are: Are there any additional Exim configuration
options I should consider to squeeze more performance out of the Netra?
With queue_only_load and my other Exim settings tuned to keep answer
times good on the Netra, I'm concerned that I might have trouble keeping
up when the queues get deep. Is anyone out there running Exim 4.24 on a
Sun Netra T1 in a similar configuration? Do you get better performance,
and if so, are there specific Solaris kernel tuning adjustments you can
recommend?

Is anyone out there running Exim 4.24 on Linux in a similar
configuration? How has it performed for you? Do you have any general
suggestions for Exim on Linux that I should keep in mind before moving
forward with a Dell/Linux solution? The Dell I used in the tests above
is pretty much untuned. I would also likely consider Red Hat 8 instead
of 7.2, perhaps updated with a more recent kernel. For the Linux Exim
users out there, which file systems are you using for Exim?

I should also mention that on both platforms I see a high ratio of CPU
time spent in sys vs. user. Is that normal Exim behavior with a lot of
processes?

By the way, I've run the Netra T1s in service for a while, and I've seen
them handle ~170,000 received messages per day, so they aren't
terrible. But on price vs. performance, it appears that I can have a
much better performing solution with a Dell/Linux cluster for slightly
more cost. I hope the performance numbers above are helpful to others
who are also tuning and testing Exim.
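For scale, here is the daily figure converted to an average per-minute rate and compared with the Netra throughput measured earlier. The post doesn't say whether ~170,000/day is per Netra or for all of them combined, so the headroom figures below are the worst case of it all landing on one box. Real mail traffic is bursty, and the peak-to-average ratio (not known here) matters more than the average, so this is only indicative.

```python
MINUTES_PER_DAY = 24 * 60
DAILY_MESSAGES = 170_000  # observed on the Netra T1s in service


def average_per_minute(daily: int) -> float:
    """Average messages per minute, assuming traffic spread evenly over 24 hours."""
    return daily / MINUTES_PER_DAY


if __name__ == "__main__":
    avg = average_per_minute(DAILY_MESSAGES)
    print(f"average: {avg:.1f} msgs/min")
    # Measured Netra figures from the tests above (msgs/min).
    for label, capacity in (("remote dest, -p5", 250), ("/dev/null, 32 procs", 980)):
        print(f"{label:20} headroom over average: {capacity / avg:.1f}x")
```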

Thanks,
Steven