Sunday, March 28, 2010

Don’t let Virtual Desktop (VDI) “Storage Tax” take you by surprise!

My favorite phrase these days is that “server virtualization breaks traditional storage”, and the more I talk with folks at the sharp end of a virtualization implementation, the more I am convinced of its truth.

There are several reasons we can point to that are responsible for this dilemma. Robin Harris of StorageMojo coined a very descriptive phrase, the “IO Blender”, to describe one of the issues: the mixing of multiple virtual IO streams resulting in the delivery of a single stream of random IO to the storage. This characteristic of virtualization neutralizes much of the IO management smarts that architects of traditional storage have developed over the years. That is a topic I am reserving for a future blog post.

However, what about implementing a Virtual Desktop Infrastructure? What strains does VDI place on the storage infrastructure?

Data growth is the most obvious impact. As each local desktop is replaced with a virtual desktop, the local storage burden transfers to the centralized storage pool. Note that local storage tends to be cheap, consumer-grade storage, while data center storage is much more expensive. So, assuming an average local demand of 10GB, multiply that by 1000 virtual users and it translates to an additional 10TB of storage that the data center storage administrator has to find. With the aggressive growth in unstructured enterprise data, an average user requirement of 10GB may be an understatement. True, there are many techniques such as tiering, thin provisioning, data deduplication and compression to help keep data growth in check, but it remains a challenge.
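The capacity math above is simple enough to sketch. This is a back-of-the-envelope illustration using the figures assumed in the text (10GB per user, 1000 users), not measured data:

```python
# Back-of-the-envelope VDI capacity estimate.
# Figures (10 GB/user, 1000 users) are the illustrative assumptions
# from the text, not measurements.
def vdi_capacity_gb(users: int, gb_per_user: float) -> float:
    """Total centralized capacity a VDI rollout adds, in GB."""
    return users * gb_per_user

total_gb = vdi_capacity_gb(users=1000, gb_per_user=10)
print(f"Additional capacity to find: {total_gb / 1000:.0f} TB")  # 10 TB
```

If the per-user average turns out to be 20GB rather than 10GB, the same arithmetic doubles the bill, which is why planning with a conservative per-user figure is risky.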

However, without diminishing the significance of data growth, perhaps the greater problem is data access, or more accurately, access density.

Access density is a measure of performance; according to Fred Moore, “access density is the ratio of disk drive performance measured in IO/sec to the capacity of the drive”. In the case of a storage system, it would be the ratio of the total IOPS the system can deliver to its total capacity. Another factor that influences IO performance is drive utilization. As drives begin to fill with data, latency increases and eventually the system's ability to deliver its maximum IOPS is impaired. As a side thought, avoiding this problem is what short stroking is all about. Short stroking limits the data stored on a drive to the outer bands of the platters. While this maintains maximum IO performance, it results in only 20% or so of the disk capacity being used. Effective, but very expensive.
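Moore's ratio can be written as a one-liner, and it also shows why short stroking helps: the numerator (IOPS) stays put while the denominator (usable capacity) shrinks. The drive figures below are illustrative assumptions, not specs from the text:

```python
# Access density per Fred Moore: deliverable IOPS / capacity.
def access_density(total_iops: float, total_gb: float) -> float:
    """IOPS available per GB of capacity."""
    return total_iops / total_gb

# Hypothetical 600 GB FC drive delivering ~200 IOPS (assumed figures):
full = access_density(200, 600)        # ~0.33 IOPS/GB
# Short-stroked to 20% of capacity, IOPS are preserved:
stroked = access_density(200, 600 * 0.2)  # ~1.67 IOPS/GB
print(f"full: {full:.2f} IOPS/GB, short-stroked: {stroked:.2f} IOPS/GB")
```

A fivefold improvement in access density, bought by abandoning 80% of the capacity you paid for, which is the expense the text alludes to.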

So what has this to do with VDI?

First, let us assume that a desktop has an average requirement of 20 IOPS. However, when an application such as Visio is accessed, the IO demand goes through the roof. Considering that laptops are designed to exploit the maximum IO their hard drives can deliver, my 20 IOPS assumption may be conservative. In an environment that has 1000 VDI clients, that is an additional 20,000 IOPS of “average” demand placed on the storage. However, a major characteristic of VDI is that at the start of the day the majority of clients boot their virtual systems at close to the same time, which creates a peak IO demand on the storage well above the average state. A great example of an enterprise's requirement for high velocity data.
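The aggregate demand above, and the boot-storm peak on top of it, can be sketched as follows. The per-desktop figure comes from the text; the boot-storm multiplier is a hypothetical illustration, not a measured value:

```python
# Aggregate VDI IO demand. 20 IOPS/desktop and 1000 clients are the
# text's assumptions; the boot-storm multiplier is hypothetical.
clients = 1000
avg_iops_per_desktop = 20
steady_state = clients * avg_iops_per_desktop   # 20,000 IOPS "average"

# Start-of-day boot storm: most clients boot at nearly the same time,
# driving demand well above average. 2.5x is an illustrative guess.
boot_storm_multiplier = 2.5
peak = steady_state * boot_storm_multiplier     # 50,000 IOPS
print(f"steady: {steady_state:,} IOPS, boot-storm peak: {peak:,.0f} IOPS")
```

The real multiplier depends on boot scheduling, image design and caching, but the shape of the problem is the same: storage sized for the average will be swamped at 9am.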

So the question to be determined is: what peak IO performance should the storage be able to deliver? The answer lies in the QoS expectations, but for illustration let us assume a peak of 50,000 IOPS is the requirement. Based on an FC drive with an approximate 200 IOPS capability, this translates to an additional 250 spindles, and if SATA is planned this number doubles. That is a lot of drives, and it does not allow for the impact of the additional capacity demand.
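The spindle arithmetic is worth making explicit. The per-drive IOPS figures below follow the text (~200 IOPS for FC, half that for SATA); both are rough rules of thumb, not vendor specs:

```python
# Spindles required to meet a peak IOPS target, ignoring capacity.
# ~200 IOPS/FC drive and ~100 IOPS/SATA drive are rough assumptions.
import math

def spindles_needed(peak_iops: float, iops_per_drive: float) -> int:
    """Minimum whole drives needed to deliver peak_iops concurrently."""
    return math.ceil(peak_iops / iops_per_drive)

print(spindles_needed(50_000, 200))  # 250 FC spindles
print(spindles_needed(50_000, 100))  # 500 SATA spindles
```

Note this sizes for performance only; the capacity requirement from the data-growth discussion is additional, and whichever constraint demands more drives wins.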

This additional storage requirement, driven by a need for increased access performance as well as increased data center capacity, is what I refer to as virtualization's “Storage Tax”. It can be a rather unpleasant surprise to the unsuspecting.

So what are the potential solutions?

First step: understand performance expectations and plan early for additional resources; chances are you will need them. Possible options:

1. Deploy a large number of traditional storage spindles. Concurrent access to these spindles will deliver high IOPS, and short stroking could be applied. This is an expensive option in both capital and operational expense. Maximum tax, if you will.
2. A solution that delivers high IOPS through dense spindle packaging enabling concurrent access to a large number of spindles. These architectures deliver an economical option without compromising performance.
3. SSD is an option and will most certainly meet the access performance requirements. However, SSD is still expensive and, depending on the capacity requirements, that cost may eliminate pure SSD as a practical option.
4. The most effective solution for larger implementations is probably a tiered solution that combines SSD and spinning disk. While most storage vendors now offer tiered solutions in their portfolios, it is not the storage media (SSD or HDD) that is the key differentiator but the quality of their tiered storage management. Remember, tiered solutions should eliminate problems and complexities, not create them.
