Friday, August 1, 2008

MAID (Massive Array of Idle Disks) - Not all MAID is created equal

In the past week I have been involved in a number of conversations regarding MAID and spin down technologies. I was surprised by the depth of misunderstanding about the two technologies, so the purpose of this posting is to provide some insight, and hopefully some wisdom, on what is and what is not MAID.

The concept of MAID technology is the brainchild of a research team from the University of Colorado, who hypothesized that MAID (massive array of idle disks) would be a storage structure that delivers the density of tape with performance similar to that of disk and a very small power envelope.[1]

SNIA Definition – “MAID, a storage system comprising of a massive array of (idle) disk drives that are powered down individually or in groups when not required.” [2]

The motivation for such an architecture was to deliver a solution that exploited relatively inexpensive SATA disk technology to create a commercially viable, enterprise-class mass storage solution that exhibited many of the access performance and data integrity characteristics of a disk array but with the economics of a tape library. The sweet spot for this technology, and where it will deliver the most benefit, is in the storage and management of persistent data: that is, data that is infrequently accessed (low IOPS) and will rarely if ever be changed, but that serves applications needing faster access to individual files than magnetic tape can deliver. High energy efficiency adds to the sweet spot.

But not all “MAID” labeled solutions are created equal. Do not confuse MAID with spin down features that are added to conventional disk array architectures.

So what are the key characteristics of a MAID solution that drive user benefit, and what are the unique design characteristics that are not found in traditional array designs?

1. Very high drive packaging density.
In a MAID architecture the number of drives that can spin at any one time is limited, which allows extremely dense packaging not possible in conventional architectures. For example, a single COPAN frame can support up to 896 drives, while the new EMC Infiniflex 10000 (aka Hulk) tops out at 300 drives and the DataDirect Networks S2A Storage Scalar at 600 drives per rack. Greater storage density delivers floor space savings, the elimination of aging technology through consolidation, and the opportunity to reallocate more expensive storage to more appropriate use, potentially delaying an expensive purchase.

2. The number of drives that can spin at any one time is limited and will not exceed 50% of the total number of drives installed.
This is the original definition as presented by the University of Colorado researchers. COPAN currently limits this number to 25%. The key is that not all drives spin at any one time, which reduces maximum power requirements and heat generation. Lower heat generation in turn reduces the necessary cooling infrastructure, and fewer spinning drives aid in eliminating rotational vibration issues.

3. The power available in the cabinet will not support all drives spinning at any one time.
A limited power budget drives power efficiencies and prevents any misguided attempt to power up all drives.

4. The component count (power supplies, power converters, fans, etc.) will be significantly lower than in traditional architectures.
A reduced component count should equate to lower cost and improved system reliability. However, only actual reliability data will confirm how well theory performs when reduced to practice.

5. Access to data on drives that are powered down will take 15 seconds or more. This is longer than expected from a traditional disk subsystem but significantly less than off-board tape.
Applications must be MAID aware. A request for data on a powered-down LUN will experience a delayed response back to the application, and if the application is not MAID aware the delay may trigger a timeout or recovery action.

6. To meet data center, enterprise-class expectations, the solution should have embedded data and device integrity checking and self-healing capabilities.
The value and usefulness of data live on long after its creation and its initial period of activity. Corporate governance and government compliance regulations are causing data to be stored for increasingly longer periods of time, and it is this accretive process that is fueling the explosive data growth issue. Any solution that targets the storage of long-term data must be architected to ensure the integrity and availability of the data when requested. The more automated the processes, the better.
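
To make characteristics 2, 3 and 5 concrete, here is a minimal Python sketch of a cabinet whose power budget caps the number of spinning drives. It is a toy model, not any vendor's implementation. The class name MaidShelf, the helper maid_aware_timeout and the least-recently-used spin-down policy are all assumptions made for illustration. The 25% cap, the 50% ceiling and the 15 second spin-up delay are the figures quoted in the list above.

```python
from collections import OrderedDict


class MaidShelf:
    """Toy model of a MAID cabinet whose power budget caps spinning drives."""

    def __init__(self, total_drives, max_spinning_fraction=0.25, spin_up_seconds=15.0):
        # The list above gives 50% as the original ceiling and 25% as COPAN's limit.
        if not 0 < max_spinning_fraction <= 0.5:
            raise ValueError("max_spinning_fraction must be in (0, 0.5]")
        self.total_drives = total_drives
        self.max_spinning = max(1, int(total_drives * max_spinning_fraction))
        self.spin_up_seconds = spin_up_seconds
        # Spinning drive ids, ordered from least to most recently used.
        self._spinning = OrderedDict()

    @property
    def spinning_count(self):
        return len(self._spinning)

    def read(self, drive_id):
        """Return the simulated access delay in seconds for a read on drive_id."""
        if not 0 <= drive_id < self.total_drives:
            raise IndexError("no such drive")
        if drive_id in self._spinning:
            # Drive is already spinning: mark it as recently used, no delay.
            self._spinning.move_to_end(drive_id)
            return 0.0
        if len(self._spinning) >= self.max_spinning:
            # Power budget exhausted: spin down the least recently used drive.
            # (LRU is an assumed policy, chosen only to keep the sketch short.)
            self._spinning.popitem(last=False)
        self._spinning[drive_id] = None
        return self.spin_up_seconds


def maid_aware_timeout(base_timeout_seconds, shelf):
    """Timeout a MAID-aware application might use: normal timeout plus one spin-up."""
    return base_timeout_seconds + shelf.spin_up_seconds


if __name__ == "__main__":
    shelf = MaidShelf(total_drives=896)
    print("drives allowed to spin:", shelf.max_spinning)
    print("first read delay:", shelf.read(0))
    print("repeat read delay:", shelf.read(0))
    print("suggested client timeout:", maid_aware_timeout(5.0, shelf))
```

The point of the sketch is that the cap is enforced by the shelf itself, so a burst of requests can never spin up more drives than the power budget allows. The cost is the spin-up delay, which the application has to tolerate by allowing a longer timeout.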

What is not a characteristic of a MAID solution is the increasingly common feature of drive spin down, occasionally referred to as sleepy drives, which is a software feature added to traditional array architectures. Spin down is a compromise approach that lets vendors add the feature to existing array architectures. It is better classified as a power efficiency feature that will deliver 25% to 60% of the power savings possible from a MAID implementation. Companies that offer a spin down option include Fujitsu, HDS, NEC, DataDirect Networks and Nexsan, with EMC and Pillar Data promising a future deliverable.

A broader discussion of MAID and its relevance in the data center is presented in the white paper “Defining MAID – (Massive Array of Idle Disks). A discussion of its relevance in the data center.” This paper can be accessed on my web site.

[1] The Case for Massive Array of Idle Disks (MAID): Dennis Colarelli, Dirk Grunwald and Michael Neufeld, Dept of Computer Science, University of Colorado, Boulder. January 7th, 2002.
[2] The Dictionary of Storage Networking Technology, Storage Networking Industry Association (SNIA), 2005/2006.
[3] Persistent Data Storage Architecture, COPAN Systems, September 2006.
