Monday, May 01, 2006

RAID-DP vs RAID 10 protection

On average, disk capacity is doubling every 15 to 18 months. Unfortunately, disk error correction code (ECC) capabilities have not kept pace with that capacity growth. This has a direct impact on data reliability over time. In other words, disks are about as good as they are going to get, but are now storing eight times the amount of data they did just four years ago. All storage system vendors are affected. A double-parity configuration shields customers against multiple drive failures for superior protection in a RAID group.
- Roger Cox, Chief Analyst - Gartner, Inc

With the advent of SATA drives and their proliferation in the enterprise, the above comment is quite significant. To date, most vendors use RAID 1 or RAID 10 protection schemes to address the shortcomings of PATA/SATA drives. What we do know about these drives is that they have low MTBFs and a bit error rate of 1 in 10^14 bits read. That's approximately 1 bit error per 11.3TB (counting in binary terabytes; 12.5TB in decimal). Compare this to FC drives at 1 in 10^15, or 1 bit error per 113TB!
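As a quick check on that arithmetic, here is a minimal sketch that converts a "1 error per N bits" rating into the amount of data read per expected bit error. The function and constant names are my own for illustration, and the only inputs are the 10^14 and 10^15 figures quoted above.

```python
# Convert a bit error rate of "1 error per N bits read" into the
# amount of data read, on average, per bit error.

TB_DECIMAL = 10**12   # decimal terabyte
TB_BINARY = 2**40     # binary terabyte (TiB)


def bytes_per_bit_error(bits_per_error: float) -> float:
    """Bytes read, on average, for each bit error."""
    return bits_per_error / 8


sata_bytes = bytes_per_bit_error(1e14)  # PATA/SATA figure from the post
fc_bytes = bytes_per_bit_error(1e15)    # FC figure from the post

for name, value in (("SATA", sata_bytes), ("FC", fc_bytes)):
    print(
        f"{name}: {value / TB_DECIMAL:.1f} TB decimal, "
        f"{value / TB_BINARY:.1f} TB binary per bit error"
    )
```

The binary-terabyte results line up with the 11.3TB and 113TB figures above, allowing for rounding.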

Drive reliability is a function of two elements: MTBF (Mean Time Between Failures) and BER (Bit Error Rate). Historically, ATA drives have demonstrated lower reliability than SCSI or FC drives. This has nothing to do with the interface type; it is directly related to the components used in the drive (media, actuator, motor, etc.).

As I mentioned above, PATA/SATA drives are being deployed in abundance in the enterprise these days as a lower cost medium to host non-mission-critical apps, as well as to serve as targets/snap areas holding snapshotted data for applications residing on higher performance disks. In addition, they are deployed in tiered storage approaches, either within the same array or across arrays of different costs.

To protect against potential double disk failures in PATA/SATA configurations, several vendors propose RAID 1 or RAID 10. While, seemingly, there's nothing wrong with deploying RAID 1 or RAID 10 configurations, they do add cost to the overall solution by requiring 2x the initial capacity and thus 2x the cost. These types of configurations do protect against a variety of double disk failure scenarios; however, they do not protect against every double disk failure scenario.

So let's look at the probability of a fatal double disk failure using RAID 10. Below we have a 4-disk RAID 10 RAID group:

Here, we have 6 potential dual disk failure scenarios, shown by the various arrows. Two of these failure scenarios are fatal (i.e., both disks of a mirrored pair, which hold the same blocks, fail). So the probability that a double disk failure is fatal is 2/6, or 33%. In general it is 1/(n-1), where n is the number of disks in the RAID 10 group. Yow!!!
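The count above can be reproduced by brute force. The following is a rough sketch, assuming the disks are arranged as simple mirrored pairs (0,1), (2,3), and so on, and that every pair of disks is equally likely to be the one that fails. The function name and the disk numbering are mine, purely for illustration.

```python
from fractions import Fraction
from itertools import combinations


def fatal_double_failure_fraction(num_disks: int) -> Fraction:
    """Fraction of two-disk failure combinations that lose data in RAID 10.

    Assumes disks 2k and 2k+1 mirror each other, so a double failure is
    fatal only when both failed disks belong to the same mirrored pair.
    """
    if num_disks < 2 or num_disks % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks (>= 2)")

    all_pairs = list(combinations(range(num_disks), 2))
    # Disks a and b are mirrors of each other when they share a pair index.
    fatal_pairs = [(a, b) for a, b in all_pairs if a // 2 == b // 2]
    return Fraction(len(fatal_pairs), len(all_pairs))


for n in (4, 8, 16):
    fraction = fatal_double_failure_fraction(n)
    print(f"{n} disks: {fraction} fatal (1/(n-1) = {Fraction(1, n - 1)})")
```

For the 4-disk group this enumerates the same 6 scenarios, 2 of them fatal, and for larger groups it matches the 1/(n-1) rule.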

So let me see... 2x the capacity at 2x the cost so you can, potentially, survive about 67% of double disk failures!!! Clearly there's a winner and a loser here, and you can guess who's who.

With NetApp's patented RAID-DP solution, you are guaranteed protection against a double disk failure at a fraction of the capacity of RAID 1 (2N) or RAID 10 (2N), since RAID-DP needs only N data drives plus 2 parity drives (N+2), and at a fraction of the cost.
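To put the 2N versus N+2 comparison in numbers, here is a small sketch. The group size of 14 data drives is a hypothetical value I picked for illustration, not a recommended or maximum RAID group size, and the calculation ignores hot spares and any per-drive formatting overhead.

```python
def raid10_total_drives(data_drives: int) -> int:
    """RAID 1 / RAID 10 mirrors every data drive: 2N drives in total."""
    return 2 * data_drives


def raid_dp_total_drives(data_drives: int) -> int:
    """RAID-DP adds two parity drives per RAID group: N + 2 drives."""
    return data_drives + 2


def usable_fraction(data_drives: int, total_drives: int) -> float:
    """Share of the purchased drives that holds user data."""
    return data_drives / total_drives


data_drives = 14  # hypothetical group size, for illustration only

for label, total in (
    ("RAID 10", raid10_total_drives(data_drives)),
    ("RAID-DP", raid_dp_total_drives(data_drives)),
):
    print(
        f"{label}: {total} drives for {data_drives} drives of data "
        f"({usable_fraction(data_drives, total):.1%} usable)"
    )
```

Under these assumptions the mirrored layout always sits at 50% usable capacity, while the N+2 layout improves as the group grows.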

Furthermore, RAID-DP is very flexible and allows our customers to non-disruptively change from an existing RAID-4 configuration to a RAID-DP one and vice versa, on-the-fly.

1 comment:

Anonymous said...

Good work. Thanks for sharing the information. I am a student working on storage systems.

--Kiran*