Re: [exim] RAID level for mailstore

Author: W B Hacker
Date:  
To: exim users
Subject: Re: [exim] RAID level for mailstore
Graeme Fowler wrote:
> Hi
>
> Look! I'm responding to the original question, not getting involved in
> an argument. Woo!
>
> On Thu, 2008-03-20 at 19:35 +0000, Tom Brown wrote:
>> I wonder what people think for a RAID level for their mail stores?
>
> It depends entirely on circumstance. That circumstance comprises
> hardware, number of drives, OS, network load, peak required throughput,
> what apps are sharing the array, required space, future growth, whether
> the system is read- or write-heavy, the size of the files, and a
> multitude of other factors (many of which Bill Hacker has already
> mentioned in the spiralling "which is best" thread).
>
>> I can only have RAID 1 or RAID 5 and I don't have many users ~500 so what
>> do people think? This will be on a hardware RAID card but the RAID
>> channel will not be dedicated to the mail store ie there will be another
>> RAID disk sharing the channel.


'Only RAID 1 or RAID 5' hints that what is on hand is not a bottom-end
pseudo-controller (most of those cannot do RAID 5, only 0, 1, or 0+1), but
neither is it a high-end card (which can also do a few more
fallen-from-grace levels, plus the current buzzphrase - RAID 6).

Leaving SCSI out for the moment, it is now hard to even find new 3.5"
drives under 200 GB, or 2.5" drives under 80 GB, in stock.

But the 'sweet spot' in bang-for-the-buck is now around 320 to 500 GB for
3.5", and 120 to 160 GB for 2.5".

Even if you go all-IMAP, no POP, 500 typical users are not likely to
fill up a pair - or two sets - of 500 to 750 GB SATA drives fast enough
that you won't have plenty of time to plan as they grow.
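
Rough back-of-envelope, if you want to sanity-check that - the per-user
figures here are just illustrative assumptions, not measurements:

    # Back-of-envelope mailstore sizing; per-user numbers are guesses - adjust to taste.
    users = 500
    avg_mailbox_gb = 0.25            # assumed average IMAP mailbox size today
    growth_per_user_gb_year = 0.25   # assumed annual growth per user

    pair_gb = 750                    # usable space of one RAID1 pair of 750 GB drives
    in_use = users * avg_mailbox_gb
    yearly = users * growth_per_user_gb_year
    years_to_60pct = (0.6 * pair_gb - in_use) / yearly

    print("currently used: %.0f GB" % in_use)               # 125 GB
    print("years until 60%% full: %.1f" % years_to_60pct)   # ~2.6 years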

If you really want to reduce the risk of failure, buy decent but not
stupidly expensive drives, do about 30-60 days of 'burn-in' on another box
(or as a hot standby in the same box), then migrate the data and swap out
the older production drives at no more than 2-year intervals or 60% full -
whichever comes first. You *can* even 'leapfrog' intentionally
mismatched RAID1 (say a 500 GB with a 750 GB, then activate another slice
and go 750 GB with 1 TB) to further shorten the average age of the rig,
but it is more work.

The 'pulls' go into lighter / less-critical duty, and you keep on with the
hand-me-downs - replacing only the drives at the top of the risk chain.

If you slice/partition intelligently, the natural progression of
larger-cheaper drives makes for cheap and cheerful growth with minimal
admin time, plus the opportunity to re-optimize every year or two to cope
with shifting needs.

And RAID1 - pure mirrored data, no parity - fits the 'cheap and
cheerful' model quite well, as it is largely portable across controllers,
and even back and forth between 'hardware' RAID and software RAID.

RAID5 is not so easy to move about, and woe betide you if you have a
controller failure, have no identical spare, and cannot find an exact
match 'Real Soon' - or ever.

YMMV

Bill


>
> You may want to refer to some of the RAID resources on the web. Googling
> for "raid levels" is a good start.
>
> In essence, given that you have a choice of RAID1 or RAID5 you're
> choosing between the following (or not; depending on the number of
> spindles you require):
>
> RAID1
> Mirroring.
> A given pair of disks act as a single drive.
> Writes can be slower than a single disk (all drives in the mirror have
> to report success for a write to complete).


To the underlying OS's VFS layer, ultimately, yes.

To Exim, no - by then it has already gone away to do other things.

Not on a Unix, anyway. Nor, AFAIK, on all Linux filesystems either.
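
A generic illustration of why the writer usually doesn't feel the
mirror-write latency - nothing Exim-specific, just how buffered writes
behave on a Unix-ish box (filename is arbitrary):

    import os

    # write() returns once the data is in the OS buffer cache;
    # the process is free to go do other things.
    fd = os.open("msg.tmp", os.O_WRONLY | os.O_CREAT, 0o600)
    os.write(fd, b"message data\n")

    # Only here does the process actually wait for the platters -
    # and on RAID1 that means waiting for every member of the mirror.
    os.fsync(fd)
    os.close(fd)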

> Reads can be faster than a single disk (reads can come from any disk in
> the mirror; where files exceed a single block size the file can be read
> from multiple disks simultaneously).
> No loss of data with failure of a single disk.
>
> RAID5
> Stripe + Parity
> The data and parity blocks are striped across all spindles within the
> array.
> Poor write performance - each write takes multiple operations; read/read
> parity/calc parity/write data+parity. A good controller does this in RAM
> (cache) and flushes to disk some time later. A better controller has
> battery backup so in the event of a power failure it can flush the cache
> after power is restored, leaving the data intact.
> No loss of data with failure of a single disk.
>

But there is a potential speed hit, especially if it is not distributed
parity and it was the dedicated parity drive that went tits-up. Which -
given parity drives do the most I/O - does happen.

IOW - a RAID5 with a failed component can slow down even if NOT actively
rebuilding. A RAID1 only suffers a performance hit *while* rebuilding.
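
For anyone who has not met the read-modify-write cycle before, the
per-write arithmetic is plain XOR. A toy sketch of a single-block update
on a three-data-plus-parity stripe (purely illustrative - no real
controller is coded like this):

    def raid5_small_write(old_data, old_parity, new_data):
        """The 'calc parity' step of a small write: a real array must first
        read old_data and old_parity, then write new_data and new_parity
        back out - four I/Os for one logical write."""
        new_parity = bytes(d ^ p ^ n
                           for d, p, n in zip(old_data, old_parity, new_data))
        return new_data, new_parity

    # A stripe of three data blocks; parity is the XOR of the data blocks.
    d0, d1, d2 = b"\x01\x01", b"\x02\x02", b"\x04\x04"
    parity = bytes(a ^ b ^ c for a, b, c in zip(d0, d1, d2))

    d1, parity = raid5_small_write(d1, parity, b"\x08\x08")
    assert parity == bytes(a ^ b ^ c for a, b, c in zip(d0, d1, d2))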

>> Just want to lay the spindles out 'correctly'
>
> How many are you intending to have?
>
> In my experience, 500 users is fairly small and will not hit you so
> hard; RAID5 will probably be a good compromise for you since it will
> give you the maximum amount of data for a given set of spindles (above
> 3). RAID1 essentially gives you half of your theoretical capacity; RAID5
> is harder to calculate but, for example, using a 6-spindle array of
> 100GB disks with no spare drive you'd have in the region of 475GB
> available (this will shrink or grow according to vendor parity
> calculations, "right-sizing", and a number of other factors).
>


Price out 6 x ~100 GB SCSI drives (100 GB SATA are history) AND a decent
RAID5 controller, vs 2 x 750 GB SATA. No special controller needed for
the latter.
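
The usable-space arithmetic for the two options, roughly (ignoring
vendor rounding and controller overhead):

    def raid1_usable_gb(n_disks, disk_gb):
        # mirrored pairs: half the raw capacity
        return n_disks // 2 * disk_gb

    def raid5_usable_gb(n_disks, disk_gb):
        # one disk's worth of parity is spread across the set
        return (n_disks - 1) * disk_gb

    print(raid5_usable_gb(6, 100))   # ~500 GB raw; ~475 GB after right-sizing etc.
    print(raid1_usable_gb(2, 750))   # 750 GB from the two big SATA drives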

> It's also worth remembering that vendors sell disks according to "Giga"
> equalling 1000, not 1024 - so your actual disk size is likely to be
> slightly smaller anyway. As an example, a RAID1 mirror of two 750GB
> vendor-sized disks shows me:
>
>      Array Size : 730860992 (697.00 GiB 748.40 GB)

>
> The GiB being the actual gigabytes in base 2 (1024) rather than base 10.
> Anyway, that's a digression.
>


But a useful digression when you price things out.
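
The two figures above are the same number in different units - assuming
that 'Array Size' is the usual md count of 1 KiB blocks:

    blocks_1k = 730860992             # 'Array Size' as reported, in 1 KiB blocks
    size_bytes = blocks_1k * 1024

    print("%.2f GB  (vendor: 10^9 bytes)" % (size_bytes / 1e9))      # 748.40
    print("%.2f GiB (binary: 2^30 bytes)" % (size_bytes / 2.0**30))  # 697.00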

> As you mention the fact that you only have level 1 or 5 available, how
> many spindles can your hardware controller have attached?
>
> Graeme
>


Figure you need 2 SATA channels per RAID1, and Tyan, Serverworks, and
GigaByte boards often have 6 to 8 onboard = 3 to 4 arrays without even an
add-on card or port multiplier. Put boot, the OS, utils, and apps on a
pair of 80-160 GB 2.5" drives - 'read mostly', so it spends most of its
time in fast RAM - and 2 arrays of 2 x 500 GB each for queue, logs, and
mailstore, and you are looking at 1+ TB in a short 2U or long 1U.
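
Spelled out - the drive sizes below are just the examples from above:

    # Sketch of the suggested layout: (purpose, drives, usable GB per RAID1 pair).
    arrays = [
        ("boot / OS / apps", "2 x 120 GB 2.5in", 120),
        ("queue + logs",     "2 x 500 GB 3.5in", 500),
        ("mailstore",        "2 x 500 GB 3.5in", 500),
    ]
    sata_ports = 2 * len(arrays)                 # 6 onboard ports, no add-on card
    usable_gb = sum(gb for _, _, gb in arrays)   # ~1.1 TB across all three pairs

    print("SATA channels needed:", sata_ports)
    print("usable space: %d GB" % usable_gb)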

Take the savings vs a RAID5 setup and build a hot-standby twin to the
server, split the load, and keep the full user database and structures on
both, ready for near-seamless IP takeover or DNS re-pointing of POP & IMAP
as well as SMTP.

Bill