Re: [Exim] NFS or SAMBA & MEMFS or separate HDD

Author: Tamas TEVESZ
Date:  
To: Rob Butler
CC: exim-users
Subject: Re: [Exim] NFS or SAMBA & MEMFS or separate HDD
On Wed, 14 Aug 2002, Rob Butler wrote:

> In that paper Yann recommends using a hard drive for the OS and a second
> pair of drives (raid 1) for the spool as the spool drive will have very high
> IO, and the system will need to access the exim binaries on the OS drive
> often as well.
>
> MEMFS or separate OS HDD questions:
>
> 1) Is the only reason for the separate drive for the OS to allow the exim
> exe's to be read from a disk that is not busy?


i'd argue against that for one simple reason: if the binary is
accessed that often, it will probably be held in the page cache anyway.
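
if you want to see that for yourself, here's a quick sketch (linux-leaning,
using mincore(2); not exim code, and the error handling is minimal) that
reports how much of a given file is sitting in the page cache:

    /* quick sketch: report how much of a file sits in the page cache.
     * linux-leaning; needs a c99-ish cc. error handling minimal. */
    #define _DEFAULT_SOURCE         /* older glibc wants _BSD_SOURCE */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        struct stat st;
        int fd;

        if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0 ||
            fstat(fd, &st) < 0 || st.st_size == 0)
            return 1;

        long pg = sysconf(_SC_PAGESIZE);
        size_t pages = (st.st_size + pg - 1) / pg;
        unsigned char *vec = malloc(pages);
        void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);

        if (vec == NULL || map == MAP_FAILED ||
            mincore(map, st.st_size, vec) < 0)
            return 1;

        size_t resident = 0;
        for (size_t i = 0; i < pages; i++)
            if (vec[i] & 1)     /* low bit set = page is resident */
                resident++;
        printf("%zu of %zu pages resident\n", resident, pages);
        return 0;
    }

run it against your exim binary on a busy box and you'll likely see
most of it resident.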

> 2) What if instead I did the following:
>         Only 2 drives, raid 1 with OS and spool on them (boot, root, spool,
> and swap partitions .. hopefully the swap will never be used).  1.5 GB ram.
> When system is started a memory filesystem is created and the exim binaries
> are copied into it.


i don't think this makes much sense, in the light of the above
argument.

> Also I plan on using Berkeley DB as the database for
> the server, so I would copy the Berkeley DB into the memory file system as
> well. Hints DB, etc are all in memory file system too. Exim is started
> from the memory filesystem. Now anytime the exim binary needs to be used,
> it is really just doing a memory copy and not accessing the OS partitions.


much hassle for little gain. while, agreed, a fast i/o subsystem is
the heart of it, i don't think it's worth going to this extreme. the
os is probably smarter than you (referring to the page cache again).

what you should definitely consider, however, is using tdb instead
of berkeley db for your databases (as long as your dbs do not need to
be rebuilt often - see the list archives for a comparison between the
different db formats). if they do, you're probably better off with
something like sql (even on the same host; if it's accessed every so
often, the tables and the data will probably be in the page cache,
again. more memory!).
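
for what it's worth, the tdb api is about as simple as it gets - a
rough sketch (the file name and the key/value are made up for
illustration; link with -ltdb):

    /* rough sketch of tdb usage; "users.tdb" and the data are
     * made up. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <tdb.h>

    int main(void)
    {
        TDB_CONTEXT *db = tdb_open("users.tdb", 0, 0,
                                   O_RDWR | O_CREAT, 0600);
        TDB_DATA key, val, out;

        if (db == NULL)
            return 1;

        key.dptr = (void *)"jdoe";    key.dsize = 4;
        val.dptr = (void *)"store1";  val.dsize = 6;
        tdb_store(db, key, val, TDB_REPLACE);

        out = tdb_fetch(db, key);     /* caller frees out.dptr */
        if (out.dptr != NULL) {
            printf("jdoe -> %.*s\n", (int)out.dsize, out.dptr);
            free(out.dptr);
        }
        tdb_close(db);
        return 0;
    }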

> 3) if #2 were used, how much of a hit would the exim logs make to the single
> pair of raid 1 drives.... Is this another good reason to have a separate OS
> drive? Or, maybe a separate drive just for the logs?


a separate os drive is good because it's logical to differentiate
between "code" and "data" ("code" here being the os, "data" the spool,
the logs, etc). when (note: "when", not "if" :) your disk fails, you
replace it, reload the os, and you're in business again. if you are
running a stripe that shares the os and the spool, and your disk
fails, you are in deep trouble.

> 4) If instead of writing the exim logs to disk I sent log data to another
> server via network would that (along with setup # 2) eliminate the need for
> a dedicated OS drive in the server?


imho not - see above.

> 5) What do you think of using a setup similar to #2 + #4, but instead of
> putting the OS on the HDD, you used a compact flash card (128MB or whatever)
> to hold the OS and boot from? The idea here that a compact flash card
> probably would be about as safe as a pair of raid 1 drives for the OS since
> compact flash doesn't have any moving parts.


but flash ages with writes... granted, you don't have many writes on
it :) and, for that matter, a 40-60G ide drive is about the same price
as a 256m-1g cf card, with a hell of a lot more space to spare (and
use, even). buy two more than you actually need, for spare parts...

> The Berkeley DB for users, etc
> would be pulled from the network at boot and held in memory. The hints
> databases would be created fresh at boot and held in memory. Basically,
> except for upgrades to software the compact flash would never be written to,
> and only read during boot. Also no swap partitions anywhere... feedback???


much of this depends on what os you are using. linux, for example,
doesn't really like to be without swap (this could have changed with
the gazillion vm subsystems around, but i certainly wouldn't rely on
it...)

> 6) What about network booting the SMTP servers, holding everything in RAM
> like #2, and writing logs to network as in #4, no swap, and using the local
> SCSI raid 1 drives just for spool?


basically, my opinion is that while operating systems are far from
perfect, it isn't necessarily worth the hassle trying to outsmart
them. page cache, page cache, page cache. oh, and keep a single
backup image of your system disk.

> Several SMTP servers running exim for in and outbound mail 10/100 connection
> to internet
> Several pop3 servers 10/100 connection to internet
> Several file servers connected by a separate private 1Gb SWITCHED COPPER
> network to the POP3 and SMTP servers
> All servers are linux.
>
> Which is better / faster NFS or SAMBA?


i myself am not a big fan of network file systems, and try to avoid
using them where possible. i'd rather spend a bit more time designing
a setup/backend so that it doesn't need a network fs at all (might be
my education, though - as an ee, while i admit that all these radio,
wave, laser and whatnot technologies are good, a galvanic connection
still rules them all ;)

> 3) do you think a 1Gb network is overkill?


not necessarily. you might not be able to come even close to
saturating it (think of the pci transfer rate barrier: 32-bit/33 MHz
pci tops out around 133 MB/s, roughly a gigabit, and the nic shares
the bus with the disk controllers), but the same packet travels on
the wire faster...

> Would a 100Mb switched network
> be enough for:
>     10 SMTP / 5 POP3 / 5 file servers?
>     20 SMTP / 10 POP 3 / 10 file servers?
>     Suggestions on numbers?


estimated traffic? both in amount and in characteristics? (is gb
equipment so much more expensive than fe equipment?)

> Basically, if you were going to start from scratch, and wanted to build the
> biggest, meanest, leanest, fastest, most mind blowing mail system going
> while minimizing cost (2 drives in each server just for the OS raided can
> get expensive) what would you do?


i, in fact, *am* doing that as we speak :) overly minimizing the
cost is not a goal, though (i mean, if it's needed, it's needed. and
i'm certainly not trying to skimp by going with low-end [quality]
hardware.)

mirrored disks for the os, striped disks for the spool (i mean
exim's spool), raid5'd disks for the mail spool (as in the final
place for the mails; customers get pretty upset if they lose stuff,
even if it costs a little i/o because of the raid5), no network fs
but logic in the middle tier so that everything knows where to
deliver to (smtp) or serve from (pop/imap) for a particular mailbox.
if you're short on disk, snap in another one. if there's no more
room, snap in another box. this even gives you the advantage of
having an on-site standby machine, should one of them go down.
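
just to illustrate what i mean by logic in the middle tier, a toy
sketch (the host names and the hashing are made up; in real life
you'd rather use a lookup table - tdb, sql - so mailboxes can be
moved around without rehashing everything):

    #include <stdio.h>

    /* toy sketch: map a mailbox's local part to the backend box
     * that stores it. host names and hashing are invented here. */
    static const char *backends[] = {
        "store1.example.com", "store2.example.com"
    };

    static const char *backend_for(const char *localpart)
    {
        unsigned h = 0;
        while (*localpart != '\0')
            h = h * 31 + (unsigned char)*localpart++;
        return backends[h % (sizeof(backends) / sizeof(backends[0]))];
    }

    int main(void)
    {
        /* smtp delivery and pop/imap retrieval both consult the
         * same function, so mail lands where it is served from */
        printf("jdoe -> %s\n", backend_for("jdoe"));
        return 0;
    }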

(just for the record: to start with, i'm planning one single
frontend smtp relay for my setup that routes messages to combined
pop/imap servers. i can't even imagine the kind of traffic where one
single frontend will not be enough. also, i'm only talking about a
couple of hundred thousand users, with unestimated amounts of mail
traffic... that's why my setup is small, but relatively easily
extensible.)

when i/o bandwidth gets tight, i'll probably remove several fsync()s
from exim. this saves a hell of a lot. don't do this unless you're
fully aware of the consequences. you can save tremendous amounts of
i/o with rightly chosen software (anyone ever seen courier-pop3d's
calcsize()? what happened to your dinner afterwards?)
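
to make the fsync() point concrete, a sketch (emphatically not
actual exim code) of the pattern involved - the fsync() is what
makes the data survive a crash once you've said "250 OK", and it's
also what costs you the i/o:

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* sketch only: write a message to a spool file durably. */
    static int spool_write(const char *path, const char *msg)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

        if (fd < 0)
            return -1;
        if (write(fd, msg, strlen(msg)) < 0) {
            close(fd);
            return -1;
        }
        if (fsync(fd) < 0) {    /* <- the call i'd be removing */
            close(fd);
            return -1;
        }
        return close(fd);
    }

    int main(void)
    {
        /* "msg.spool" and the message body are made up */
        return spool_write("msg.spool",
                           "From: a@example.com\n\ntest\n") < 0;
    }

drop the fsync() and you buy back i/o bandwidth at the price of
possibly losing mail you have already accepted.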

summary: as i see it, you're trying to save just a bit too much on
the expense (money) side. memory is dirt cheap. disks are not as
cheap, but you can probably safely skip the mirrored os disks if you
have a good backup system. read up on the relevant characteristics
of the operating systems you could use, and choose accordingly.
don't always and at all costs try to outsmart them. if you pinch
pennies too hard, you'll pay it back several times over in labour.

geez, i can see the biggest thread in existence coming ;) my views
are usually distinct enough from everyone else's that i tend to
generate flamewars :)

--
[-]