Re: [exim] Finding system bottlenecks to speed up Exim

Author: Patrick von der Hagen
Date:  
To: Exim Mailing List
Subject: Re: [exim] Finding system bottlenecks to speed up Exim
Marc Perkel wrote:
>
> Martin A. Brooks wrote:
>> Marc Perkel wrote:

[...]
>>> Disk IO is not real heavy. It has a reasonably fast SATA II drive
>>> that's not very full. Writes about 3 gigs of log file entries a day.
>>>
>> A server with _one_ disk?
> I have 2 drives in this server.

You should have learned a thing or two by now... like giving precise
problem descriptions.

Anyway, you still believe that your problem might be related to the
number of TCP connections. I suppose everyone else has a different
opinion and suspects disk I/O.

I suppose you don't really understand disk I/O, otherwise you wouldn't
mention a "fast SATA II drive that's not very full": capacity and used
vs. free space are unrelated to mail server performance, which is
limited by how many I/O operations per second the disks can handle.

If you have a RAID-1 configuration, all your writes are limited by
whichever disk responds more slowly. For each message actually passing
through your system you have to write at least two spool files once and
update the inode tables for the creation and deletion of those files.
In addition, a certain percentage of messages need updates to their
state, for example if they have ten recipients which can't all be
delivered at the same time. Oh, and if you use a journaling filesystem,
don't forget the journal.
Logfiles should not be synced to disk immediately, so they cause fewer
transactions, but there are still several I/O transactions to disk for
each message. I don't count the various Exim databases and locking
issues, since they can be handled in RAM. Any swapping, by the way?
Now, the number of transactions per second a hard disk can support is
surprisingly low.
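To put numbers on this, here is a back-of-the-envelope sketch. All inputs are assumptions for illustration: the per-message count follows the list above (two spool files, which in Exim are the -H header file and the -D data file, each created and later deleted, plus occasional state rewrites and a crude factor for the journal), and roughly 100 write IOPS is only a common ballpark for a single 7200 rpm SATA disk, not a measurement of your hardware. The function names are made up.

```python
# Back-of-the-envelope disk I/O budget for a mail relay.
# Every default below is an illustrative assumption, not a measurement.


def raid1_write_iops(*member_iops: float) -> float:
    """Every write must land on all mirror members, so the slowest
    member sets the write rate of a RAID-1 array."""
    return min(member_iops)


def ios_per_message(spool_files: int = 2,
                    state_update_fraction: float = 0.1,
                    journal_factor: float = 2.0) -> float:
    """Estimate synchronous disk operations caused by one relayed message.

    spool_files: files written once per message (Exim's -H and -D files).
    Each spool file is assumed to cost one write on creation and one
    metadata update on deletion. state_update_fraction is the assumed
    share of messages whose spool state is rewritten (e.g. some
    recipients deferred). journal_factor is a crude multiplier for a
    journaling filesystem writing metadata a second time.
    """
    ops = spool_files * 2 + state_update_fraction
    return ops * journal_factor


def max_messages_per_second(disk_iops: float, per_message: float) -> float:
    """Upper bound on throughput when disk I/O is the bottleneck."""
    return disk_iops / per_message


# Hypothetical mirror of two SATA disks, one slightly slower than the other.
array_iops = raid1_write_iops(110.0, 100.0)
print(f"{max_messages_per_second(array_iops, ios_per_message()):.1f} msg/s")
```

With these made-up inputs the mirror tops out at about 12 messages per second, before logs, swapping or queue runs take their share.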

The standard reaction to disk I/O problems would be to use separate disks
at least for the Exim spool and the Exim log, but if you are serious about
that you would probably use one RAID-1 for the system, one RAID-1 for the
spool and one RAID-1 for the log. That way the transactions are spread
across several disks.
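A minimal sketch of why splitting helps, assuming (hypothetically) that each message causes a fixed number of I/O operations on the spool and on the log, and that a RAID-1 pair sustains roughly the write rate of one disk. On a shared disk the demands add up; on separate arrays the busiest array sets the limit. The numbers and function names are invented for illustration.

```python
from typing import Mapping


def shared_disk_bound(disk_iops: float,
                      per_message_ops: Mapping[str, float]) -> float:
    """All I/O lands on one disk (or one mirror): the demands add up."""
    return disk_iops / sum(per_message_ops.values())


def split_disk_bound(array_iops: Mapping[str, float],
                     per_message_ops: Mapping[str, float]) -> float:
    """Each kind of I/O has its own array: the busiest array is the limit."""
    return min(array_iops[name] / ops
               for name, ops in per_message_ops.items() if ops > 0)


# Hypothetical per-message I/O counts and per-array write IOPS.
ops = {"spool": 8.0, "log": 2.0}
print(shared_disk_bound(100.0, ops))                          # 10.0
print(split_disk_bound({"spool": 100.0, "log": 100.0}, ops))  # 12.5
```

In this made-up case moving the log to its own mirror gains 25%, and the spool array remains the bottleneck.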

Even better: decent(!) solid-state disks can perform many more
transactions per second than "normal" disks. Their lack of capacity is
usually no issue for a setup where e-mail is only passed through.

Network I/O might be an issue too, e.g. a slow DNS resolver. But as a
long-time list member you know about the value of a local caching
nameserver...

I can't stress enough how important it is to have long-term statistics
which can help to investigate performance problems. I constantly
advocate using munin to monitor your systems. Currently it looks like
you thought "hey, load is strange", did some counting, found some
unexpected number related to TCP and now can't think of any other
possible explanation for your problems. But if you had some decent
statistics, you might realise that this value has been perfectly normal
for the last 10 months and is unlikely to be the reason for the current
problems, so you would waste neither our time nor yours.
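As an illustration of what long-term history buys you, here is a simple z-score check, assuming you already have past values as a list of numbers (how to export them from munin is not shown). It only flags the current value if it lies far outside the historical spread. This is a generic technique, not what munin itself does; the threshold of 3 and the sample data are made up.

```python
import statistics
from typing import Sequence


def is_anomalous(history: Sequence[float], current: float,
                 threshold: float = 3.0) -> bool:
    """True if `current` is more than `threshold` standard deviations
    away from the mean of `history` (a simple z-score test)."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        return current != mean
    return abs(current - mean) / spread > threshold


# Hypothetical monthly averages of concurrent TCP connections.
history = [400, 420, 390, 410, 405, 415, 398, 402, 411, 407]
print(is_anomalous(history, 412))  # False: in line with the last 10 months
print(is_anomalous(history, 900))  # True: worth investigating
```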

PS: Just as an example regarding disk I/O: if you plan an MS Exchange
installation and call in some consultants to calculate the required
hardware setup, they will usually ask questions like "How many
users?", "How many messages sent/received per time unit?", "Expected
peak volume?", etc.

Then they calculate the number of concurrent I/O transactions required
for decent performance, and based on that number they calculate how many
disks are needed. Then they factor in RAID (either 1 or 5), add some
spare disks and tell you a huge number of disks.
Then they ask you how much storage should be available to each user and
only THEN do they calculate the required size of each disk. It wouldn't
surprise me to end up with 100 disks of 160GB each...
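The sizing order described above (IOPS first, capacity last) can be sketched like this. The write penalties of 2 for RAID-1 and 4 for RAID-5 are common rules of thumb; treating all disks as one RAID-5 array is a simplification, and every input (users, IOPS per user, read/write mix, per-disk IOPS, mailbox size) is hypothetical.

```python
import math
from dataclasses import dataclass

# Rule-of-thumb write penalties: backend I/Os per frontend write.
WRITE_PENALTY = {"raid1": 2, "raid5": 4}


@dataclass
class Sizing:
    data_disks: int     # disks needed to deliver the required IOPS
    total_disks: int    # including hot spares
    min_disk_gb: float  # smallest disk size that still meets capacity


def size_storage(users: int, iops_per_user: float, write_fraction: float,
                 raid: str, iops_per_disk: float, spares: int,
                 mailbox_gb: float) -> Sizing:
    # Step 1: frontend IOPS from the user count and per-user load.
    frontend = users * iops_per_user
    # Step 2: RAID turns each frontend write into several backend I/Os.
    penalty = WRITE_PENALTY[raid]
    backend = (frontend * (1 - write_fraction)
               + frontend * write_fraction * penalty)
    # Step 3: the disk count comes from IOPS, not from capacity.
    disks = math.ceil(backend / iops_per_disk)
    if raid == "raid1":
        disks += disks % 2          # mirrors come in pairs
        usable_disks = disks // 2
    else:
        disks = max(disks, 3)       # one RAID-5 array needs >= 3 disks
        usable_disks = disks - 1    # one disk's worth of parity
    # Step 4: only now does capacity decide the size of each disk.
    capacity_gb = users * mailbox_gb
    return Sizing(disks, disks + spares, capacity_gb / usable_disks)


s = size_storage(users=5000, iops_per_user=0.5, write_fraction=0.5,
                 raid="raid1", iops_per_disk=100.0, spares=4, mailbox_gb=1.0)
print(s.data_disks, s.total_disks, round(s.min_disk_gb))  # 38 42 263
```

Note how modest the resulting disk size is: the disk count is driven by transactions, and capacity ends up almost an afterthought.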

--
CU,
Patrick.