Increasing disk reliability with RAID

It is important to understand the terms reliability and performance as they pertain to disks. Reliability is the ability of the disk system to accommodate a single- or multi-disk failure and still remain available to the users. Performance is the ability of the disks to efficiently provide information to the users.

Adding redundancy almost always increases the reliability of the disk system. The most common way to add redundancy is to implement a Redundant Array of Inexpensive Disks (RAID).

There are two types of RAID:

Hardware — The most commonly used hardware RAID levels are RAID 0, RAID 1, RAID 5, and RAID 10. These levels differ mainly in reliability and performance, as previously defined.

Software — Software RAID can be less expensive. However, it is almost always much slower than hardware RAID, because it places a burden on the main system CPU to manage the extra disk I/O.

The different hardware RAID types are as follows:

RAID 0 (Striping) — RAID 0 has the following characteristics:

High performance — Performance benefit for randomized reads and writes

Low reliability — No failure protection

Increased risk — If one disk fails, the entire set fails

The disks work together to send information to the user. This arrangement helps performance, but it carries a risk: if one disk fails, the entire file system is corrupted.
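
The striping behavior described above can be illustrated with a minimal sketch. It assumes a stripe unit of one block and a simple round-robin layout; the function names locate_block and raid0_survives are hypothetical and do not correspond to any real controller or driver interface.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    Assumes round-robin striping with a stripe unit of one block.
    """
    if num_disks < 1:
        raise ValueError("a stripe set needs at least one disk")
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    return logical_block % num_disks, logical_block // num_disks


def raid0_survives(failed_disks: set[int]) -> bool:
    """RAID 0 keeps no redundant copy, so any single failure loses the set."""
    return len(failed_disks) == 0
```

Because consecutive logical blocks land on different disks, several disks can service a request at the same time, which is where the performance benefit comes from. The same layout is why losing one disk leaves holes throughout every large file.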

RAID 1 (Mirroring) — RAID 1 has the following characteristics:

Medium performance — Superior to conventional disks due to "optimistic read"

Expensive — Requires twice as many disks to achieve the same storage, and also requires twice as many controllers if you want redundancy at that level

High reliability — Can lose a disk without an outage

Good for sequential reads and writes — The layout of the disk and the layout of the data are sequential, promoting a performance benefit, provided you can isolate a sequential file to a mirror pair

In a two-disk RAID 1 system, the first disk is the primary disk and the second disk acts as the mirror disk. (RAID 1 stores a full copy rather than computed parity, although the mirror is sometimes loosely called the parity disk.) The role of the mirror disk is to keep an exact synchronous copy of all the information stored on the primary disk. If the primary disk fails, the information can be retrieved from the mirror disk.

Be sure that your disks are hot-swappable so that repairs can be made without bringing down the system. Remember that there is a performance penalty while the disks resynchronize.

On a read, the disk whose read/write heads are positioned closer to the data retrieves the information. This data retrieval technique is known as an optimistic read. An optimistic read can provide a maximum of 15 percent improvement in performance over a conventional disk. When setting up mirrors, it is important to consider which physical disks are being used for primary and mirror information, and to balance the I/O across physical disks rather than logical disks.
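
The optimistic read can be sketched as follows. This is a simplified model, not how any particular controller is implemented: it assumes head position can be represented as a single block number, and the names MirrorDisk, pick_read_disk, and read_block are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MirrorDisk:
    name: str
    head_position: int  # block the heads are currently over (simplified model)


def pick_read_disk(disks: list[MirrorDisk], target_block: int) -> MirrorDisk:
    """Choose the mirror member whose heads are closest to the target block."""
    return min(disks, key=lambda d: abs(d.head_position - target_block))


def read_block(disks: list[MirrorDisk], target_block: int) -> str:
    """Service a read from the closest disk and move its heads there."""
    disk = pick_read_disk(disks, target_block)
    disk.head_position = target_block
    return disk.name
```

Either disk can satisfy a read because both hold the same data. Writes gain nothing from this technique, since every write must go to both disks.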

RAID 10 or 1+0 — RAID 10 has the following characteristics:

High reliability — Provides mirroring and striping

High performance — Good for randomized reads and writes

Low cost — No more expensive than RAID 1 mirroring

RAID 10 resolves the reliability problem of striping by adding mirroring to the equation.

Note: If you are implementing a RAID solution, Progress Software Corporation recommends RAID 10.
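
How RAID 10 combines the two techniques can be sketched as a stripe laid across mirrored pairs. The sketch assumes that disks 2p and 2p+1 form mirrored pair p and that the stripe unit is one block; the function names are hypothetical.

```python
def raid10_locate(logical_block: int, num_pairs: int) -> tuple[int, int]:
    """Map a logical block to (mirrored pair index, block offset in that pair).

    Both disks in the pair hold a copy of the block.
    """
    if num_pairs < 1:
        raise ValueError("RAID 10 needs at least one mirrored pair")
    return logical_block % num_pairs, logical_block // num_pairs


def raid10_survives(failed_disks: set[int], num_pairs: int) -> bool:
    """The array stays available unless both disks of some pair have failed.

    Assumes disks 2*p and 2*p + 1 make up pair p.
    """
    return all(
        not {2 * p, 2 * p + 1} <= failed_disks
        for p in range(num_pairs)
    )
```

With two pairs, for example, the array tolerates the loss of disks 0 and 2 together, because each pair still has one working member, but not the loss of disks 0 and 1.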

RAID 5 — RAID 5 has the following characteristics:

High reliability — Provides good failure protection

Low performance — Performance is poor for writes because parity must be recalculated and written along with the data

Absorbed state — Running in an absorbed state (that is, after the array has absorbed a disk failure) provides diminished performance throughout the application because the information must be reconstructed from parity

Caution: Progress Software Corporation recommends not using RAID 5 for database systems.
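
The write penalty and the cost of reconstruction both follow from how parity works. The sketch below uses byte-wise XOR parity, which is the usual scheme for RAID 5, on equal-length blocks. The function names are hypothetical, and the rotation of parity across disks is left out for brevity.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks; used to compute a stripe's parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct_missing(surviving_blocks: list[bytes]) -> bytes:
    """Rebuild the one missing block from all surviving blocks, parity included."""
    return xor_blocks(surviving_blocks)


def update_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Parity after overwriting one data block.

    The old data and old parity must be read before the new data and new
    parity can be written, so one logical write costs two reads and two writes.
    """
    return xor_blocks([old_parity, old_data, new_data])
```

When a disk has failed, every read of a block that lived on it requires reading the corresponding block from all the remaining disks and running reconstruct_missing, which is why performance drops for the whole application until the disk is replaced and rebuilt.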

It is possible to have both high reliability and high performance. However, a system that delivers both of these characteristics costs more than a system that delivers only one of the two.
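
Part of the cost difference comes from how much raw disk each level needs for a given amount of usable storage. The following sketch compares usable capacity under stated assumptions: all disks are the same size, RAID 1 is treated as a set of two-disk mirrors, and the function name usable_capacity_gb is hypothetical.

```python
def usable_capacity_gb(level: str, num_disks: int, disk_size_gb: float) -> float:
    """Usable capacity for a RAID level, assuming identical disks."""
    if level == "0":
        if num_disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        return num_disks * disk_size_gb  # no redundancy, all space is usable
    if level in ("1", "10"):
        minimum = 2 if level == "1" else 4
        if num_disks < minimum or num_disks % 2 != 0:
            raise ValueError(f"RAID {level} needs an even number of disks, at least {minimum}")
        return num_disks * disk_size_gb / 2  # every block is stored twice
    if level == "5":
        if num_disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (num_disks - 1) * disk_size_gb  # one disk's worth holds parity
    raise ValueError(f"unsupported RAID level: {level}")
```

For four 500 GB disks, this gives 2000 GB for RAID 0, 1500 GB for RAID 5, and 1000 GB for RAID 10, which shows the storage given up in exchange for mirroring.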