Thursday, October 8, 2009

Evolution of Clustered Storage

Clustered storage has its roots in the high performance computing (HPC) world, where researchers needed to solve the cost/performance dilemma associated with their mainframe resources. With the dramatic evolution of commodity servers in price, performance and size, innovative thinkers realized that aggregating the compute power of these individual resources would solve their problem.

Clustered storage followed an evolutionary path similar to that of clustered computing. Although clustering mitigated the compute performance dilemma, the unintended consequence was to move the performance bottleneck to storage: large monolithic “big iron” storage solutions could no longer deliver the performance needed.

The solution was to logically integrate (network) standard servers (storage servers/controllers) and basic storage units (JBODs), using software that manages the federation of these standard physical components (cluster nodes) and a dedicated file system (the clustered file system) that manages the hosted data.

Cluster Nodes: The physical compute and storage components, normally referred to as nodes, which can be compute nodes (standard servers), storage nodes (JBOD/array) or hybrid nodes. Hybrids have both compute and storage resources in the same physical unit, which simplifies linear growth in performance and capacity.
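To make the linear growth of hybrid nodes concrete, here is a minimal sketch. It assumes ideal linear scaling, and the `HybridNode` class, the `cluster_totals` function and the per-node figures are all hypothetical, chosen only for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridNode:
    """Hypothetical hybrid node: compute and storage in one physical unit."""
    capacity_tb: float      # usable capacity this node contributes (assumed figure)
    throughput_mbps: float  # throughput this node contributes (assumed figure)


def cluster_totals(nodes):
    """Return (capacity, throughput) for the cluster.

    Assumes ideal linear scaling: every node adds its full share,
    with no overhead from networking or coordination.
    """
    total_capacity = sum(n.capacity_tb for n in nodes)
    total_throughput = sum(n.throughput_mbps for n in nodes)
    return total_capacity, total_throughput


# Four identical nodes give four times the capacity and throughput of one.
nodes = [HybridNode(capacity_tb=12.0, throughput_mbps=200.0) for _ in range(4)]
print(cluster_totals(nodes))  # (48.0, 800.0)
```

Real clusters lose some of this to interconnect and coordination overhead, so the numbers above are an upper bound rather than a prediction.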

Clustered File System (CFS): The intelligence that manages the data within the storage cluster. A clustered file system is a distributed file system that is associated not with a single server or a particular group of clients but with a cluster of storage servers, otherwise known as cluster nodes. The CFS services initiator requests regardless of which node in the storage cluster receives them.

Clustered file systems are highly elastic, scaling beyond the architectural capabilities of comparable monolithic options. Clustered storage solutions can scale from the low terabyte (TB) range to very large pools of storage measured in petabytes (PB). The upper bound tends to be set by the physical capability of the storage architecture.

If any node or other component fails in a clustered architecture, access to data is not compromised; there is no single point of failure. However, not all clustered file systems are created equal. The highest performing systems are parallel architectures that support parallel data access while allowing all nodes to access the same files concurrently. Even parallel file systems differ, though, in how they handle concurrency control. There are two disparate approaches: symmetrical, which requires clients to run metadata manager code, and asymmetrical, which uses dedicated metadata managers.

Each node within a symmetrical cluster is a logical peer to all other nodes within the cluster. The file system is maintained across the entire cluster, so by definition this architecture delivers a robust data storage solution optimized for high availability. If any node or other component fails, access to data is not compromised, the node replacement is non-disruptive, and there is no single point of failure. This flexibility delivers improved reliability, accessibility, serviceability and upgradability. However, because the metadata is distributed, a symmetrical design can limit high-end performance, although this is unlikely to be an issue in general purpose commercial computing.

An asymmetrical cluster is similar, but the need for dedicated metadata nodes compromises the symmetry and introduces a single point of failure, diluting the overall robustness of the solution.
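The difference between the two designs can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular clustered file system works: the hash-based placement, the function names (`owner_of`, `symmetric_lookup`, `asymmetric_lookup`) and the node names are all hypothetical, and the symmetrical case assumes metadata is replicated so that surviving peers can take over.

```python
import hashlib


def owner_of(path, nodes):
    """Pick the peer responsible for a path's metadata.

    Hash-based placement is an assumption made for this sketch.
    """
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def symmetric_lookup(path, live_nodes):
    """Symmetrical design: every peer can serve metadata.

    Assumes metadata is replicated, so ownership can be re-mapped
    onto whichever peers are still alive.
    """
    if not live_nodes:
        raise RuntimeError("no nodes left in the cluster")
    return owner_of(path, live_nodes)


def asymmetric_lookup(path, metadata_node, live_nodes):
    """Asymmetrical design: one dedicated metadata node answers every lookup."""
    if metadata_node not in live_nodes:
        # The dedicated metadata node is the single point of failure.
        raise RuntimeError("metadata node unavailable")
    return metadata_node


peers = ["node-a", "node-b", "node-c"]
survivors = ["node-a", "node-c"]  # node-b has failed
print(symmetric_lookup("/data/file.bin", survivors) in survivors)  # True
```

In the symmetrical case the lookup still resolves after a peer is lost, while `asymmetric_lookup` raises as soon as the dedicated metadata node is missing from the live set, which is the robustness trade-off described above.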

Clustered architectures are becoming increasingly popular as general purpose storage engines. Their simplicity, scalability, flexibility and cost effectiveness make them an ideal option for the data-intensive enterprise, whether large, medium or small.