Monday, June 22, 2009

Data classification for structured data enables intelligent management of Oracle database data

With all the current hype about storage tiering as the panacea for efficient data storage management, what tends to be overlooked is the basic but critical step of data classification, beyond a simplistic classification determined by basic usage characteristics.

However, when talking about data classification, beware of semantics. To those interested in data security, data classification means top secret, confidential, proprietary, internal, etc., and these folks worry about access permissions and physical security. To the data storage professional, the term refers to the data's basic activity profile, such as age, size and access frequency, and increasingly to its perceived business value. This post focuses on the latter.

Data classification today tends to be associated with unstructured data, but have you ever thought of using data classification to manage your structured data? Data classification is just as relevant when deciding what data to migrate to higher-performing storage such as SSD as it is when offloading persistent and long-tailed data to less expensive media. Not only can intelligent data migration help with storage performance and economics, but in the case of structured data offloading, users can expect a significant database performance boost.

Recently I was introduced to a small software company, ZettaPoint. With a value proposition that appears to resonate with end users, they have drawn the attention of a couple of key industry players, Oracle and EMC. ZettaPoint has developed a tool that helps end users optimize the performance of their Oracle database by classifying database data not just by its activity profile but by its determined business value. This enables data to be placed on storage that matches its usage and business value, which separates dormant data from active database data and removes the drag on database performance caused by growing volumes of inactive data. Performance can also be turbocharged by migrating the “hot” data identified by the ZettaPoint solution to SSD-class storage. ZettaPoint's usage-based data classification tool for structured data has been embraced by EMC to help with the optimization of their EFD (Enterprise Flash Drive) implementations, and Oracle sees its value in enabling Oracle ILM (Information Lifecycle Management). For those interested, white papers are available for both applications on the ZettaPoint website.

The product is called DBclassify. Through analysis of the actual workload, it differentiates structured data and gives full visibility into how a database is really used. By tracking and ranking every database object (tables, columns and partitions), it delivers data classification based on an analysis of actual database activity rather than on the policy-based approach common in manual classification. DBclassify also tracks all SQL statements running in the database, including local connections and stored procedures. The type of information that DBclassify uses in its analysis and presents includes:

  • Which database objects (including partitions) experience I/O waits the most.
  • Which users experience I/O waits most often, and how much time they wait on I/O relative to the overall response time.
  • What kind of I/O wait users are experiencing.
  • Which tables and columns are most frequently used, and which are seldom used or dormant.
  • Which database objects are used by which applications and users.
  • Who accesses the data.
  • Which ranges of data values are most frequently queried by business users and which are never queried.
  • Which data has the most or least value to business users.
  • Whether access patterns are different from what was expected.
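
To make the idea of usage-based classification concrete, here is a minimal Python sketch of how per-object activity figures like those listed above could be turned into a placement decision. It is an illustration only and not how DBclassify works internally: the `ObjectStats` fields, the tier names, the thresholds and the sample table names are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ObjectStats:
    """Activity observed for one database object (hypothetical fields)."""
    name: str
    reads_per_day: float     # average accesses per day over the sample window
    io_wait_seconds: float   # total I/O wait attributed to the object
    days_since_access: int   # days since the object was last touched


def classify(stats, hot_wait=60.0, hot_reads=1000.0, dormant_days=90):
    """Return 'hot', 'warm' or 'dormant' using illustrative thresholds."""
    # Anything untouched for long enough is a candidate for cheaper media.
    if stats.days_since_access >= dormant_days:
        return "dormant"
    # Heavy I/O wait or heavy read traffic suggests faster storage may help.
    if stats.io_wait_seconds >= hot_wait or stats.reads_per_day >= hot_reads:
        return "hot"
    return "warm"


def rank_by_io_wait(objects):
    """Order objects from most to least I/O wait."""
    return sorted(objects, key=lambda o: o.io_wait_seconds, reverse=True)


# Made-up sample figures, purely for illustration.
sample = [
    ObjectStats("ORDERS_2009_Q2", 5200.0, 340.0, 0),
    ObjectStats("CUSTOMERS", 800.0, 12.0, 1),
    ObjectStats("ORDERS_2005_Q1", 0.0, 0.0, 400),
]
placement = {o.name: classify(o) for o in rank_by_io_wait(sample)}
```

In this sketch the "hot" objects would be the candidates for SSD-class storage and the "dormant" ones the candidates for offloading; a real tool would derive its inputs from observed database activity rather than hand-entered numbers.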

In the case of EMC, this tool can help identify whether the database will benefit from being on EFD and aid the user with data placement (see the white paper on integration with V-Max). It has filtering mechanisms that enable users to focus on data associated with specific applications or users rather than placing the complete database on SSD technology, although that is always an option. When paired with Oracle's ILM Assistant, DBclassify adds usage-based data classification that automates what was previously a manual, policy-based process and enables users of the Oracle ILM solution to realize more of its potential.

ZettaPoint is a small but energetic company, and the features of DBclassify were demonstrated to me at the recent EMC World. The demonstration lived up to the marketing propaganda, and I did see a number of named users, including EMC’s internal IT, express significant interest. However, being a pragmatist, and having no actual customer testimony, I currently take their claims at face value but with caution. I think the phrase is "trust but verify."

The ZettaPoint solution addresses a real-world problem and offers tangible value to the end user. I would tag these guys as worth watching, and for those of you with database performance issues, I suggest you take a really close look.

1 comment:

Kevin Crook, Clearpace Software said...

While I agree that data classification is an important component of the ILM process, it’s equally important to consider the most appropriate data management technology for hot and cold data. For example, at Clearpace we don’t believe that keeping historical data in a database and moving it to a different storage tier is the optimal solution. Specialized archive repositories with extreme data compression, simplified data management and industry-standard access, running on commodity hardware, can really drive the cost and complexity out of managing historical data. It shouldn’t just be about cheaper storage platforms and media.