Percolating through the many presentations at EMC World was the notion that data reduction is a key weapon in the fight against the ominous challenge of data growth which, according to Mr. Tucci, is still expected to exceed 40% CAGR, even in today’s economic turmoil. EMC’s vision for storage reduction technologies is not simply about storage efficiency; it treats them as a significant component of its data protection strategies. This was nicely encapsulated in Tucci’s introductory comments, which positioned data de-duplication as the technology that makes D2D back-up affordable.
The interesting twist in EMC’s perspective is the blending of their tiering strategy, known as FAST (Fully Automated Storage Tiering), with data reduction technologies including compression, file level (single instance) and sub-file level de-duplication, at both the source and the target side. The EMC vision is a move away from point products, such as Data Domain, and calls for the re-architecting of traditional back-up strategies and methodologies to enable the integration of data reduction technologies across the storage infrastructure. It is ambitious, but if they can pull it off it will be impressive.
Avamar is their lead product and last year it had the distinction of being the fastest growing EMC product. Its IP is appearing in a number of products, with Networker announcing de-duplication on 5/19 and Celerra supporting a no cost, file based de-duplication, available since February, both based on Avamar IP. The question that surfaces for me is whether or not de-duplication is becoming commoditized. This would mean that its future as a point product is limited as it morphs into a standard array feature such as snapshot. This happens to be my perspective and it is one that makes sense of the recently announced acquisition of Data Domain by NetApp. Another thought is whether data reduction will become a service, an idea that is perhaps supported by the recent EMC announcement of their data de-duplication assessment service.
During one of the breakout sessions the following questions were asked:
Who is having bandwidth issues? No one responded.
Who is meeting their back-up (BU) windows? Not one positive response.
When challenged about the incongruity of their responses, the audience had a bit of an “Ah Ha” moment. The bottom line is that there appears to be a gap in the understanding of the full value of data reduction, particularly data de-duplication. The notion that source based data reduction reduces the volume of data that has to move over the wire, and hence significantly reduces back-up windows, is apparently not as obvious as perhaps assumed. This attribute of source based data de-duplication is one of its key advantages.
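The link between data volume and the back-up window can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name, the 36 TB data set, the 250 MB/s link and the 5:1 reduction ratio are all assumed figures, not numbers quoted at EMC World, and it assumes the network link is the only bottleneck.

```python
def backup_window_hours(data_tb, link_mb_per_s, reduction_ratio=1.0):
    """Estimate hours needed to move a back-up over the wire.

    data_tb         -- logical size of the back-up in TB (decimal, 1 TB = 1,000,000 MB)
    link_mb_per_s   -- sustained link throughput in MB/s (assumed constant)
    reduction_ratio -- source side data reduction, e.g. 5.0 means 5:1
    """
    if data_tb < 0 or link_mb_per_s <= 0 or reduction_ratio <= 0:
        raise ValueError("inputs must be positive")
    megabytes_on_wire = data_tb * 1_000_000 / reduction_ratio
    return megabytes_on_wire / link_mb_per_s / 3600


# Hypothetical figures: the same data set with and without source side reduction.
without_reduction = backup_window_hours(36, 250)
with_reduction = backup_window_hours(36, 250, reduction_ratio=5.0)
print(without_reduction, with_reduction)
```

With these assumed inputs the window shrinks in direct proportion to the reduction ratio, which is why a site can miss its back-up window without believing it has a bandwidth problem.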
EMC has a strong vision, but even with proof point successes such as Nationwide (which reduced its back-up window from 48hrs to 8hrs) and Celerra’s file based implementation, it will be some time before the results of their integrated approach are apparent.
Sunday, May 31, 2009
Thursday, May 28, 2009
Do not Confuse Innovation with Technology Harvesting
Having been involved in product development for many years, I have witnessed the myopic thought processes that guide solution development. Short term perspectives focus on today's near term problems without giving appropriate consideration to the flexibility necessary to meet the yet unrecognized challenges of tomorrow. This may reflect a marketing failure, but in many cases it reflects a corporate driven design compromise to enable the use of existing technologies.
I describe these rear view mirror perspectives as “Running backwards into the future.”
Established vendors have a legacy: they have revenue generating product lines that they protect by a continual upgrade process, and this creates a business challenge when looking to introduce new technologies. This desire to protect existing investments and revenue streams drives the propensity of the market place incumbents to compromise and adapt solution development to fit existing technology.
An interesting evolution obvious in some recent product introductions has been the use of standard commodity components. The motivation is the belief that technological evolution can be exploited more rapidly and that solutions based on these components can evolve significantly faster than systems based on custom silicon. Exploiting high volume commodity components that enjoy the benefits of the cost containing commodity curve not only helps to keep cost lower but also avoids the heavy investment commitment associated with custom silicon.
Developing dedicated silicon has advantages and no doubt will continue in many high-end solutions, but the inertia and cost associated with the development of custom silicon found in proprietary solutions mean that they do not adapt or evolve as quickly as those based on standard and commodity components.
The argument in favor of exploiting commodity components is compelling in terms of cost and speed of implementing technology refresh, and I predict it will become more the rule than the exception.
Sunday, May 17, 2009
Data Domain claims to have the fastest global data de-duplication for enterprise DR readiness.
Earlier this month Data Domain announced some interesting enhancements to their Enterprise Replicator Software. The announcement underlined their market performance to justify their bullish exclamation that they are #1 in data de-duplication.
Note: I have updated this posting since its original publishing.
Originally introduced to the market in 2004:
- In 2006 Data Domain had 330 customers; that has grown to over 3000 today.
- In 2006 they had shipped over 1000 systems; today this number is over 8000.
- In 2006 they had over 30PB of data under protection; today the number claimed is greater than 1000PB.
- In 2006 the replication attach rate was 40%, now it is close to 60%.
The key elements of the announcement were:
- Collection Replication. Basically a full system mirror between two systems for DR purposes.
- Global cross-site de-duplication. Simply put, a feature that collects de-duplicated data from multiple remote sites, all managed by a single directory of common objects that is representative of all these disparate data sources. This limits the replication to only unique data objects when viewed across all data sources. The benefit of this aggregated view is to significantly minimize the number of unique data objects replicated, hence minimizing expensive bandwidth requirements. The logical throughput performance quoted was 6GB/sec or 21.6 TB/hr.
- The fan-in ratio has been increased to 90 to 1, meaning that one central instance of a single Data Domain controller (DD660 or DD690) can now support the de-duplication of up to 90 remote data sources, significantly simplifying back-up and recovery actions.
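The idea behind global cross-site de-duplication can be illustrated with a toy model. This is a minimal sketch and not Data Domain's actual algorithm: the `CentralIndex` class, the fixed 8 byte chunk size and the use of SHA-256 fingerprints are all assumptions chosen to keep the example short.

```python
import hashlib


class CentralIndex:
    """Hypothetical single directory of fingerprints shared by all remote sites."""

    def __init__(self):
        self.store = {}  # fingerprint -> chunk bytes held at the central site

    def replicate(self, chunks):
        """Replicate one site's chunks; return the bytes that had to cross the wire."""
        sent = 0
        for chunk in chunks:
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in self.store:  # unique across ALL sites
                self.store[fingerprint] = chunk
                sent += len(chunk)
        return sent


def fixed_chunks(data, size=8):
    """Split data into fixed size chunks (an assumed, simplistic chunking scheme)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


index = CentralIndex()
site_a = b"AAAAAAAABBBBBBBBCCCCCCCC"
site_b = b"BBBBBBBBCCCCCCCCDDDDDDDD"
print(index.replicate(fixed_chunks(site_a)))  # all three chunks are new
print(index.replicate(fixed_chunks(site_b)))  # only the chunk unseen at any site is sent
```

Because the second site shares two of its three chunks with the first, only one chunk crosses the wire, which is the bandwidth saving the aggregated view is meant to deliver.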
Data reduction techniques such as data deduplication and data compression are now well accepted as legitimate techniques in the battle to control today’s explosive data storage growth. These logical representations of data have a significantly smaller physical footprint, requiring not only less physical storage but significantly less bandwidth when replicated. These basic data reduction methodologies continue to evolve, and there is a growing argument that the basic technologies will become commoditized within the next 12 to 18 months and that the vendors who succeed will be the thoughtful innovators who create meaningful, value added extensions to the basic data reduction technology.
Data Domain are certainly showing such innovative tendencies.
Friday, May 15, 2009
EMC World, May 17th through 21st
On Sunday night there is a concert with the Gin Blossoms - who are the Gin Blossoms?
Friday, May 8, 2009
Oracle holding onto SUN Storage
Last week (4/30) in a blog post I posed a few scenarios which added Ellison’s investment in Pillar Data into the mix. Now that the commitment has apparently been made to hold onto the SUN Storage, it remains to be seen what the trickle down effect will be on Pillar Data.
In an interview with Chris Mellor, Bob Maness, VP WW Marketing at Pillar Data, said that “Compared to SUN’s disk storage we have superior products. SUN storage is not competitive. So we can go to SUN resellers and offer Pillar so that their storage attach rate go up compared to current SUN storage.” Such a statement opens Maness up to an accusation of hubris, or perhaps he is simply taking a page out of Donald Trump’s playbook, promoting the outlook for Pillar Data with bravado and a little hyperbole.
Nothing however is likely to happen until the deal is done.
Wednesday, May 6, 2009
Not all MAID (Massive Array of Idle Disks) is created equal
The concept of MAID technology is the brainchild of a research team from the University of Colorado who hypothesized that MAID (massive array of idle disks) would be a storage structure that would deliver the density of tape with performance similar to that of disk and with a very small power envelope. [1]
SNIA Definition – “MAID, a storage system comprising of a massive array of (idle) disk drives that are powered down individually or in groups when not required.” [2]
The motivation for such an architecture was to deliver a solution that exploited relatively inexpensive SATA disk technology to create a commercially viable, enterprise class, mass storage solution that exhibited much of the access performance and data integrity characteristics of a disk array but with the economics of a tape library.
The sweet spot for this technology, and where it will deliver the most benefit, is in the storage and management of persistent data, that is, infrequently accessed data (low IOPS), data that will rarely if ever be changed, but data that is serving applications that need faster access to individual files than magnetic tape can deliver. Adding to the sweet spot characteristics is high energy efficiency.
But not all “MAID” labeled solutions are created equal. Do not confuse MAID with spin down features that are added to conventional disk array architectures.
So what are the key characteristics of a MAID solution that drive user benefit, and what are the unique design characteristics that are not found in traditional array design?
- Very high drive packaging density. In a MAID architecture the number of drives that can spin at any one time is limited, which allows extremely dense packaging not possible in conventional architectures. For example, a single COPAN frame can support up to 896 drives while the EMC Infiniflex 10000 aka Hulk tops out at 300 drives and DataDirect Networks S2A Storage Scalar at 600 drives per rack. Greater storage density delivers floor space savings, the elimination of aging technology through consolidation, and the opportunity to reallocate more expensive storage to more appropriate use, potentially delaying an expensive purchase.
- The number of drives that can spin at any one time is limited and will not exceed 50% of the total number of drives installed. This is the original definition as presented by the University of Colorado researchers. COPAN currently limits this number to 25%. The key is that not all drives spin at any one time, which reduces maximum power requirements and heat generation, which in turn reduces the necessary cooling infrastructure and aids in the elimination of rotational vibration issues.
- The power available in the cabinet will not support all drives spinning at any one time. A limited power budget drives power efficiencies and prevents any misguided attempt to power up all drives.
- The component count (power supplies, power converters, fans etc.) will be significantly less than in traditional architectures. Reduced component count should equate to lower cost and improved system reliability. However, only actual reliability data will confirm how well theory performs when reduced to practice.
- Access to data on drives that are powered down will take 15 seconds or greater: longer than expected from a traditional disk subsystem but significantly less than off board tape. Applications must be MAID aware. A request for data on a powered down LUN will experience a delayed response back to the application, and if the application is not MAID aware the delay may trigger a time out or recovery action.
- To meet data center, enterprise class expectations the solution should have embedded data and device integrity checking and self healing capabilities. The value and usefulness of data lives on long after its creation and its initial period of activity. Corporate governance and government compliance regulations are causing data to be stored for increasingly longer periods of time, and it is this accretive process that is fueling the explosive data growth issue. Any solution that targets the storage of long term data must be architected to ensure the integrity and availability of the data when requested. The more automated the processes the better.
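The spinning drive limit described in the list above can be sketched as a simple budget enforced by the controller. This is a hypothetical model, not COPAN's implementation: the `MaidShelf` class and the least recently used spin down policy are assumptions made for illustration; only the 25% limit and the 896 drive frame come from the text.

```python
from collections import OrderedDict


class MaidShelf:
    """Toy MAID controller that never lets more than a fixed fraction of drives spin."""

    def __init__(self, total_drives, max_spinning_fraction=0.25):
        self.total_drives = total_drives
        # At least one drive must be allowed to spin or nothing can be read.
        self.max_spinning = max(1, int(total_drives * max_spinning_fraction))
        self.spinning = OrderedDict()  # drive id -> None, least recently used first

    def access(self, drive):
        """Touch a drive; return 'hit' if already spinning, else 'spin-up'."""
        if not 0 <= drive < self.total_drives:
            raise ValueError("no such drive")
        if drive in self.spinning:
            self.spinning.move_to_end(drive)  # mark as most recently used
            return "hit"
        if len(self.spinning) >= self.max_spinning:
            self.spinning.popitem(last=False)  # spin down the least recently used drive
        self.spinning[drive] = None
        return "spin-up"  # the caller sees the multi-second spin-up delay here


frame = MaidShelf(total_drives=896)  # frame size quoted above
print(frame.max_spinning)            # 25% of 896 drives
```

A "spin-up" result is where a MAID aware application would have to tolerate the delay of 15 seconds or more rather than time out.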
What is not a characteristic of a MAID solution is the increasingly common feature of drive spin down. This is an approach which does not completely remove power from the drives but does still deliver significant energy savings. Although drive power down has been in the SCSI command set for some time, the practical implementation of drive spin down to manage power efficiency was a result of additional innovation by the drive manufacturers, not the array vendors.
Drive manufacturers such as Hitachi introduced multiple powered down states. These were first introduced for laptops and have now been extended to the enterprise SATA drives used in enterprise storage arrays. For example, Hitachi allows a drive to be in one of four power states:
- Level 0: Normal operation at 7,200 rpm with heads loaded (un-parked).
- Level 1: Heads unloaded (parked, reduces wind resistance on heads); 15% to 20% power savings; sub-second recovery time.
- Level 2: Heads unloaded; slows to 4,000 rpm; 35% to 45% power savings; 15 second recovery time.
- Level 3: Stops spinning (sleep mode; powered on); 60% to 70% savings; 30 to 45 second recovery time.
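The trade-off between power savings and recovery time in the four states above can be expressed as a small lookup. This is a sketch under stated assumptions: the `PowerState` type and `deepest_state` helper are hypothetical names, and the "sub-second" recovery of level 1 is modeled as a worst case of 1.0 second.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerState:
    level: int
    min_savings_pct: int
    max_savings_pct: int
    worst_recovery_s: float  # worst case time to return to normal operation


# Figures taken from the four power states listed above.
STATES = [
    PowerState(0, 0, 0, 0.0),
    PowerState(1, 15, 20, 1.0),   # "sub-second" modeled as 1.0 s (assumption)
    PowerState(2, 35, 45, 15.0),
    PowerState(3, 60, 70, 45.0),
]


def deepest_state(tolerance_s):
    """Return the deepest power state whose worst case recovery fits the tolerance."""
    if tolerance_s < 0:
        raise ValueError("tolerance must not be negative")
    eligible = [s for s in STATES if s.worst_recovery_s <= tolerance_s]
    return max(eligible, key=lambda s: s.level)


# An application that can wait 20 seconds can use level 2 but not level 3.
print(deepest_state(20).level)
```

The application's time out tolerance, not the drive, ends up deciding how much of the available power saving can actually be captured.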
Seagate has a SATA drive that allows the drive just to be powered off (level 3), and Western Digital has a so-called “Green Drive” SATA drive that revolves slower (5,400rpm) and can also park the heads (level 1).
To the purist, spin down is a compromise approach enabling vendors to add this feature to existing array architectures. It is better classified as a power efficiency feature that will deliver 25% to 60% of the power savings possible from a MAID implementation. Companies who offer a spin down option include Fujitsu, HDS, NEC, DataDirect Networks, Xiotech, Xyratex, Nexsan, Greenbytes and EMC, with NetApp, Pillar Data and others promising a future deliverable.
A broader discussion of MAID and its relevance in the data center is presented in the white paper “Defining MAID – (Massive Array of Idle disk)”. This paper can be accessed on my web site http://www.veridictusassociates.com/
[1] The Case for Massive Array of Idle Disks (MAID): Dennis Colarelli, Dirk Grunwald and Michael Neufeld, Dept of Computer Science, University of Colorado, Boulder. January 7th, 2002.
[2] The Dictionary of Storage Networking Technology, Storage Networking Industry Association (SNIA), 2005/2006.
[3] Persistent Data Storage Architecture: COPAN Systems, September, 2006.
Labels: automaid, copan systems, data center energy efficiency, data management, data storage, disk storage, eco mode, maid storage, maid technology, persistent data, spin down
Monday, May 4, 2009
Holographic Storage Steps Closer to Reality.
A differentiator that GE enjoys was explained by Brian Lawrence, Head of GE’s Holographic Storage Program, in a recent LaserFocusWorld article: “Because GE’s micro-holographic discs could essentially be read and played using similar optics to those found in standard Blu-ray players, our technology will pave the way for cost effective, robust and reliable holographic drives that could be in every home.” The inference is that micro-holographic players should be able to read CDs, DVDs and Blu-ray Discs, which questions the wisdom of GE’s market entry strategy.
Still about two years from delivering a practical solution, GE is planning to target commercial archive applications followed by the consumer market. Considering Lawrence’s comments, the GE entry positioning is difficult to understand. Perhaps someone should tell their marketing folks that InPhase Technologies, who are also focused on the commercial archive industry, are already delivering disc capacities of 300GB, with 800GB on the roadmap for 2011.
