Sunday, May 31, 2009

Data Reduction Technologies are key to EMC’s data protection strategies:

Percolating through the many presentations at EMC World was the notion that data reduction is a key weapon in the fight against relentless data growth, which, according to Mr. Tucci, is still expected to exceed 40% CAGR even in today's economic turmoil. EMC's vision for data reduction technologies is not simply storage efficiency; it positions them as a significant component of its data protection strategy. This was nicely encapsulated in Tucci's introductory comments, which positioned data de-duplication as the technology that makes D2D back-up affordable.

The interesting twist in EMC's perspective is the blending of its tiering strategy, known as FAST (Fully Automated Storage Tiering), with data reduction technologies including compression, file-level (single-instance) de-duplication, and sub-file-level de-duplication, at both the source and the target. The EMC vision is a move away from point products, such as Data Domain, and calls for the re-architecting of traditional back-up strategies and methodologies to enable the integration of data reduction technologies across the storage infrastructure. It is ambitious, but if they can pull it off it will be impressive.
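To make the distinction concrete, here is a toy sketch of the difference between file-level (single-instance) and sub-file-level de-duplication: the first keeps one copy per unique whole-file hash, the second splits files into chunks and keeps one copy per unique chunk. The 4-byte chunk size and fixed-size chunking are illustrative simplifications, not how any EMC product actually works.

```python
import hashlib

def file_level_dedup(files):
    """Single-instance storage: keep one copy per unique whole-file hash."""
    store = {}
    for data in files.values():
        store.setdefault(hashlib.sha256(data).hexdigest(), data)
    return store

def subfile_dedup(files, chunk_size=4):
    """Sub-file dedup: split each file into fixed-size chunks and
    keep one copy per unique chunk hash."""
    store = {}
    for data in files.values():
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
    return store

files = {
    "a.txt": b"AAAABBBBCCCC",
    "b.txt": b"AAAABBBBDDDD",  # shares two chunks with a.txt but differs as a whole
}
# File-level dedup sees two distinct files and stores both in full;
# sub-file dedup stores only the four unique chunks (AAAA, BBBB, CCCC, DDDD).
print(len(file_level_dedup(files)))  # 2 unique files
print(len(subfile_dedup(files)))     # 4 unique chunks
```

The payoff of the sub-file approach is that near-duplicate files, the common case in back-up streams, share most of their chunks.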

Avamar is their lead product, and last year it had the distinction of being the fastest growing EMC product. Its IP has appeared in a number of products: NetWorker announced de-duplication on 5/19, and Celerra has supported no-cost, file-based de-duplication since February, both based on Avamar IP. The question that surfaces for me is whether or not de-duplication is becoming commoditized. If so, its future as a point product is limited as it morphs into a standard array feature, much like snapshots. This happens to be my perspective, and it is one that makes sense of NetApp's recent bid to acquire Data Domain. Another thought is whether data reduction will become a service, an idea perhaps supported by EMC's recent announcement of a data de-duplication assessment service.

During one of the breakout sessions, the following questions were asked:

Who is having bandwidth issues? No one responded.
Who is meeting their back-up windows? Not one positive response.

When challenged about the incongruity of their responses, the audience had a bit of an "aha" moment. The bottom line is that there appears to be a gap in understanding the full value of data reduction, particularly data de-duplication. The notion that source-based data reduction shrinks the volume of data that has to move over the wire, and hence significantly reduces back-up windows, is apparently not as obvious as assumed. This attribute of source-based de-duplication is one of its key advantages.
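A minimal sketch of why source-based de-duplication shrinks both bandwidth and the back-up window: the client fingerprints its chunks locally, and only chunks the target has never seen cross the wire. The 4-byte chunks and the simple dictionary index are toy choices for illustration, not Avamar's actual design.

```python
import hashlib

CHUNK = 4  # toy chunk size; real products use KB-scale, often variable-size, chunks

def backup(client_data, server_index):
    """Source-side dedup: the client hashes chunks locally and ships only
    the chunks the server has not already seen. Returns bytes sent."""
    sent = 0
    for i in range(0, len(client_data), CHUNK):
        chunk = client_data[i:i + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in server_index:   # only new chunks cross the wire
            server_index[digest] = chunk
            sent += len(chunk)
    return sent

server_index = {}
monday = b"AAAABBBBCCCCDDDD"
tuesday = b"AAAABBBBCCCCEEEE"  # one chunk changed since Monday

print(backup(monday, server_index))   # 16 bytes sent: everything is new
print(backup(tuesday, server_index))  # 4 bytes sent: only the changed chunk moves
```

Since daily back-up streams are overwhelmingly repeats of yesterday's data, this is where the dramatic window reductions come from.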

EMC has a strong vision, and despite proof points such as Nationwide (which reduced its back-up window from 48 hours to 8 hours), it will be some time before the results of their integrated approach become apparent, Celerra's file-based implementation notwithstanding.

2 comments:

Kevin Crook, Clearpace Software said...

It’s interesting that data reduction and de-duplication are nearly always talked about in the context of unstructured data and files. It shouldn’t be forgotten that there are some interesting technologies that also address the problem of large and growing volumes of structured data in the enterprise from databases, logs, and events. The data compression techniques employed by the columnar database vendors and archive store specialists also enable companies to compress structured data at the source before writing to disk, improving back-up windows. For example, we at Clearpace actually compress structured data by 40:1 before writing to the customer’s storage platform of choice through XAM, http://tinyurl.com/ns74co, including EMC’s Centera device.
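(A toy illustration of why columnar layouts compress structured data so well: a column of repetitive values, such as a status field in a log table, collapses into a handful of runs under run-length encoding, whereas row-at-a-time storage interleaves fields and breaks up the runs. This is a generic columnar-compression sketch, not Clearpace's actual technique or ratio.)

```python
def rle(values):
    """Run-length encode a column as (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((v, 1))              # start a new run
    return runs

# Stored column by column, 20 log rows' worth of status values
# collapse into just three runs.
status_column = ["OK"] * 9 + ["ERROR"] + ["OK"] * 10
print(rle(status_column))  # [('OK', 9), ('ERROR', 1), ('OK', 10)]
```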

Bill said...

Kevin: My apologies for not getting back sooner; the notification went to an old email address, which I need to fix. You are perfectly correct: unstructured data is the topical subject in conversations, and the issue of structured data reduction tends to be ignored. There are a couple of companies offering solutions: yourselves, ZettaPoint, and an interesting compression company, Storwize. I am sure there are more.
I must confess I am not familiar with Clearpace; that I will also fix.
Expect to see a growing interest in primary data reduction.