
Tuesday, August 22, 2006

Disruption and Innovation in Data Storage


The Disruption in Demand
I've been in storage for almost twenty years, and I can't remember a time when customers were looking for solutions to so many new data management and storage problems at once. The demand is driven by the convergence of several needs: consolidating storage resources to increase utilization, providing continuous access to data in the face of any disaster, keeping data secure and compliant with laws and regulations, and managing huge amounts of data based on its content and service-level requirements. While this disruption in demand is creating a market willing to pay a premium for a solution, the storage industry doesn't really know how to provide one, so growth has been fairly anemic and products are rapidly commoditizing. Yes, there has been some growth in storage startups, VC funding, and new products claiming to be compliance solutions, but most of these are point products that provide some small benefit without delivering the solution customers are willing to pay a premium for.


There's a disconnect here of the type described by Christensen and Raynor in chapter 5 of "The Innovator's Solution". In that chapter, titled "Getting the Scope of the Business Right", they discuss when modular, industry-standard architectures, built from components that comply with standard interfaces, have the advantage versus when an integrated architecture using new, enhanced interfaces and components wins. The advantage moves back and forth based on the level of the technology relative to the problems customers are willing to pay to solve.


In the 1970s, computer engineers were still working out how to build a computer system with reasonable performance: an OS that ran well on the processor and could move data to and from disks efficiently. Engineers needed the flexibility to redesign and enhance these internal interfaces quickly to make them better; they couldn't wait for a standards body. Customers were willing to pay for those enhancements, even if it meant paying a premium to get them from a vertically integrated company like IBM or DEC, because the increased ability to solve the IT problems of a 1970s datacenter was worth it.


Then we all know what happened in the 1980s. I was at DEC at the time. VAX/VMS machines were great, but we kept adding Cadillac features that most customers didn't need and forcing them to pay a premium anyway. In other words, we overshot the market. At the same time, what Christensen calls the dominant architecture evolved: applications used standard APIs such as Unix, which ran on Intel or Motorola instruction sets and talked to storage using block SCSI protocols. This shifted the advantage to solutions built from best-of-breed components at each layer of the stack and created the horizontally integrated industry we have today. Over and over I go into datacenters and see this heterogeneous layering: a best-of-breed application such as Oracle, running on Unix, on, say, Dell platforms, with Veritas VM and Emulex HBAs, talking to EMC storage. This is a big change from twenty years ago.


What enabled this to happen was that, twenty years ago, several interfaces froze at the technology level of the early 1980s. The industry agreed that the Unix file system API more or less defined the functionality a file system could provide to applications: they could open, close, read, and write files, and the file system could track a few properties such as read-only flags and revision dates. There was no point in telling the file system anything more about the data because it wouldn't have known what to do with it. Similarly, the industry agreed that below the file system the interface was 512-byte blocks, with even the few properties the file system knew about stripped out, because a 1985 disk drive barely had enough intelligence to control an actuator, let alone help manage the information. Those engineers never imagined today's volume managers or RAID storage servers, which have more compute power and lines of code than the average server of that day.
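
To make that narrowness concrete, here is a minimal C sketch of the frozen contract (the file name is just a placeholder): the application can open, write, and close a file, and stat() hands back a handful of properties, but nothing in the interface lets the application tell the file system what the data actually is.

/* A minimal sketch of the frozen 1980s file interface: open/write/close
   plus the handful of properties stat() can report. Nothing here carries
   any information about what the data means. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("example.dat", O_CREAT | O_RDWR, 0644);  /* placeholder file */
    if (fd < 0) { perror("open"); return 1; }

    const char buf[] = "payroll records";   /* the file system never learns this */
    write(fd, buf, sizeof buf);

    struct stat st;
    fstat(fd, &st);
    /* Everything the file system can tell us about the data: */
    printf("size=%lld bytes, mode=%o, mtime=%lld\n",
           (long long)st.st_size, (unsigned)(st.st_mode & 0777),
           (long long)st.st_mtime);

    close(fd);
    return 0;
}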


Sure, you can innovate within your layer. Most notably, the disk drive has grown into a RAID controller with a variety of data services and reliability and availability features, but it is still limited by the fact that, at the host interface, it has to look and act like a brainless old SCSI disk drive. More than anything, these interfaces into the dominant system architecture define the overall customer value that a particular layer can provide. As Christensen and Raynor describe, this is what ultimately forces the advantage back to a vertically integrated solution, and the shift is triggered by a new disruption in demand. For enterprise storage, that disruption is here in the form of the 'Perfect Storm' caused by the convergence of requirements for storage consolidation, continuous availability, compliance with information laws, and the need to manage vast amounts of data based on its information content.


As I talk to senior engineers and architects across the storage industry, I see them struggling with these 1980s interfaces. A common answer is to attempt to bypass and disable every other layer of the stack, effectively building a proprietary stack with enhanced, but proprietary, interfaces between the layers. I hear this a lot from database engineers: "We bypass the filesystem and go straight to the SCSI passthrough interface with our own volume manager, and we turn off caching in the disk drive, and we asked Seagate for a mode page where we can turn off seek optimization because we need to control that. Oh, and don't use a RAID controller because that just gets in the way...".
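
For the curious, here is a hedged sketch of that bypass-everything pattern on Linux, assuming a hypothetical raw device path: the database opens the block device with O_DIRECT so the file system and page cache stay out of the I/O path, then issues a sector-aligned read itself. This is only an illustration of the pattern, not anyone's actual database code.

/* Sketch of the "bypass every layer" pattern: open a raw block device with
   O_DIRECT so the file system and page cache are out of the way, then do a
   sector-aligned read. The device path is hypothetical; running this for
   real requires privileges and care. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    int fd = open("/dev/sdX", O_RDONLY | O_DIRECT);   /* hypothetical device */
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    /* O_DIRECT requires aligned buffers; 4096 covers common sector sizes. */
    if (posix_memalign(&buf, 4096, 4096) != 0) { close(fd); return 1; }

    ssize_t n = pread(fd, buf, 4096, 0);              /* read the first sectors */
    if (n < 0) perror("pread");
    else printf("read %zd bytes directly from the device\n", n);

    free(buf);
    close(fd);
    return 0;
}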


The right answer, of course, is to let the experts in each layer develop the best products but extend the interfaces so each layer gets the right information to help manage the data. Doing all the data management in the database application is not the right answer. Neither is doing it all in the volume manager, the virtualization engine, the RAID controller, or the disk drive. It's a distributed computing problem, and the optimal solution will let each layer add more value to the stack. This is why I'm such a huge fan of object storage, both OSD and enhanced NFS: they are enhanced, extensible interfaces that let each layer, and the system as a whole, address the data management demand disruption in ways the 1980s dominant architecture cannot.
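
As a purely illustrative contrast (these are not the T10 OSD commands or any shipping API, just hypothetical function names), the sketch below compares a block-style write, where the storage layer sees only an address and anonymous 512-byte blocks, with an object-style write that carries attributes the storage layer could actually act on: a retention period for compliance, a service level for availability, and a content type for information-based management.

/* Illustrative only: a block write carries an address and raw bytes, while an
   object-style write carries attributes the storage layer can act on. The
   function names and attributes here are hypothetical. */
#include <stdio.h>

/* Block interface: the device sees an address and anonymous 512-byte blocks. */
static int block_write(unsigned long long lba, const void *buf, size_t nblocks) {
    (void)buf;  /* payload omitted in this sketch */
    printf("block write: lba=%llu, %zu x 512-byte blocks, no other context\n",
           lba, nblocks);
    return 0;
}

/* Hypothetical object interface: data travels with attributes describing it. */
struct obj_attr {
    const char *name;
    const char *value;
};

static int object_write(unsigned long long object_id, const void *buf, size_t len,
                        const struct obj_attr *attrs, size_t nattrs) {
    (void)buf;  /* payload omitted in this sketch */
    printf("object write: id=%llu, %zu bytes, %zu attributes\n",
           object_id, len, nattrs);
    for (size_t i = 0; i < nattrs; i++)
        printf("  %s = %s\n", attrs[i].name, attrs[i].value);
    return 0;
}

int main(void) {
    const char data[] = "quarterly financial report";

    /* 1980s dominant architecture: address plus bytes, nothing else. */
    block_write(123456, data, 1);

    /* Object-style: enough context for the storage layer to manage the data. */
    struct obj_attr attrs[] = {
        { "retention",     "7-years"    },   /* compliance requirement   */
        { "service-level", "continuous" },   /* availability requirement */
        { "content-type",  "report"     },   /* content-based management */
    };
    object_write(987654, data, sizeof data, attrs, 3);
    return 0;
}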


This sounds like goodness for everyone: engineers unleash a new wave of innovation, storage vendors get to charge a premium for it, customers are happy to pay because they finally get a solution, and investors make a return for the first time in years. As Christensen describes, though, the transition from a modular, horizontally integrated industry to offering a new, unique, top-to-bottom solution is hard. Read the book.


There are signs of hope, though. Open source, particularly Linux, is providing a way to break these limiting interfaces. First, because the whole OS is accessible, any interface can be improved. Second, Linux is increasingly used as the embedded OS for storage devices, so new storage interfaces can be implemented on both the server and storage sides; Lustre is a good example of this. Third, the community development process gives vendors in different layers of the stack a way to work together on new interfaces, as is happening with NFS over RDMA and pNFS. Also, HPTC, which hasn't been a driver of new technology for a while, is innovating through new object storage interfaces with products like Lustre and Panasas storage. Finally, some vendors are trying on their own, including EMC with Centera and Sun with Honeycomb. These products manage data based on content by extending all the way up the stack through proprietary interfaces to a new application-level API. I hope we can agree on an enhanced API and see some real applications.


Summary
To summarize this long post: the storage industry is in the midst of a demand disruption for solutions to data management problems. The industry has solidified around a set of suppliers of modular components that plug into interfaces standardized in the 1980s based on the technology of that time. These modular components are hitting the limits of their ability to add value through those interfaces, causing their products to commoditize. The next wave of innovation, and the ability to charge premiums, will come through changes to these interfaces, either by systems companies that supply proprietary stacks (or portions thereof) or by component suppliers who work together to enhance the interfaces.


In future posts, I will continue to look at new storage products, companies, and technology through this lens defined by Christensen and Raynor.