
Friday, September 01, 2006

Innovation at the Disk Drive Component

Subtitled: My Free Advice to the Disk Drive Vendors

The disk drive industry has been severely restricted by the limitations of the block interface. Because that interface confines a drive to exposing the functionality of a 1980's disk drive, vendors have been limited to innovating primarily along one dimension - capacity. They have, of course, made amazing gains there, but much has been written about the growing imbalance between capacity and the ability to access that data in reasonable time, or to support a consistent performance SLA. I've also seen several articles lately about problems with sensitive data left on old disk drives. All of this points to the need for drive vendors to innovate in more dimensions of performance - or, put differently, to add value in ways other than just increasing capacity and lowering cost.

It's a Distributed Computing Problem
If you talk to engineers who work at the layers above the disk drive (RAID controllers, volume managers, file systems), you'll get answers like "the job of a disk drive is just to hold lots of data cheaply; we'll take care of the rest". The problem is, they can never solve problems like security, optimizing performance, and providing a consistent SLA as well as they could by enlisting the considerable processing power embedded in the disk drive itself.


Back in the 60s and early 70s, most of the low-level functions of a disk drive were controlled by the host CPU. Engineers could have said: "Hey, our CPUs are getting so much faster, it's no problem continuing to control all these low-level functions". Instead, as employees of vertically-integrated companies like IBM and DEC, they were able to take a systemic view of the problem. They realized the advances in silicon technology could be better used to embed a controller in the drive, where it could be more efficient at controlling the actuator and spindle motor. So they completely changed the interface to the disk drive - a radical and foreign concept to many computer engineers today. Now, three decades later, we are dealing with a whole new set of data storage problems, and the processing power embedded in the disk drive has grown right along with silicon technology. As in the 1970s, the right answer is to distribute some of this processing to the disk processor, which has the knowledge and is in the right location to handle it.


The first thing to realize is that these hard drives already have significant processing power built into their controllers and, in many cases, have unused silicon real estate that could be used to add more intelligence. Today that processing power is used for things like fabricating lies about the disk geometry for OSes and drivers that think drive layout is like it was twenty years ago and want to align data on tracks, remapping around bad sections of media, read-ahead and write-back caching, re-ordering I/Os, and so on. The problem with those last three is that they are done without any knowledge of the data, which severely limits their ability to help overall system performance. We need to enable these processors to combine their knowledge of how the drive mechanics really work with some knowledge of the properties of the data being stored.
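
To make this concrete, here is a minimal sketch (my own illustration, not any vendor's firmware) of the kind of elevator-style queue reordering a drive already does today. Note that the only input it has is a block address - it has no idea which requests belong to the same file or object.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// A queued request as the drive sees it today: just an address and a length.
// Nothing says which file or object these blocks belong to.
struct BlockRequest {
    uint64_t lba;     // starting logical block address
    uint32_t blocks;  // transfer length in blocks
};

// Elevator-style reordering: sort by LBA and start the sweep at the first
// request at or beyond the current head position. The drive can only optimize
// on addresses, because addresses are all the block interface gives it.
void reorder_queue(std::vector<BlockRequest>& queue, uint64_t head_lba) {
    std::sort(queue.begin(), queue.end(),
              [](const BlockRequest& a, const BlockRequest& b) {
                  return a.lba < b.lba;
              });
    auto start = std::lower_bound(
        queue.begin(), queue.end(), head_lba,
        [](const BlockRequest& r, uint64_t lba) { return r.lba < lba; });
    std::rotate(queue.begin(), start, queue.end());
}
```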


The first problem to address is the growing disparity between the amount of data stored under a spindle and the time it takes the mechanical components to access it. For example, if an I/O spans from the end of one track to the beginning of the next, it still takes on the order of a millisecond just to re-align the actuator to the beginning of the track on the next platter. Or, if a track has a media defect, it can take many milliseconds to find the data that has been relocated to a good sector. Drives could save many tens of milliseconds if they just knew how data was grouped together: they could keep related data on the same track and avoid spanning defects. This is, of course, one of the key benefits of moving to an object interface.
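
Some back-of-envelope arithmetic shows where those milliseconds come from. The numbers below are assumptions for a nominal 7,200 RPM drive, not any particular product's specifications.

```cpp
#include <cstdio>

int main() {
    // Assumed figures for a nominal 7,200 RPM drive (illustrative only).
    const double full_rev_ms = 60.0 * 1000.0 / 7200.0;    // ~8.3 ms per revolution
    const double avg_rot_latency_ms = full_rev_ms / 2.0;  // ~4.2 ms average
    const double track_switch_ms = 1.0;                   // head switch and settle
    const double short_seek_ms = 3.0;                     // assumed seek to spare area

    // An I/O that spans a track boundary pays the switch time and, if it
    // misses the start of the next track, most of a revolution.
    printf("track switch + missed revolution: ~%.1f ms\n",
           track_switch_ms + full_rev_ms);

    // A sector remapped to a spare area costs a seek plus rotational latency
    // in each direction.
    printf("defect relocation round trip:     ~%.1f ms\n",
           2.0 * (short_seek_ms + avg_rot_latency_ms));
    return 0;
}
```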


The next problem to address is how to support a performance Service Level Agreement (SLA). Tell the drive that an object needs frequent or fast access so it can place that object where seek times are shortest. Tell the drive that an object contains audio or video so it can stream the data on reads without gaps. Allow the OS and drive to track access patterns so the drive can adjust the SLA and the associated access characteristics as the workload changes. This can only be done where the drive's characteristics are actually known - in the drive itself.
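
As an illustration of the kind of decision only the drive can make, here is a sketch of hint-driven placement. The hint names and zone model are hypothetical; the point is that mapping an SLA onto real seek profiles and defect-free runs requires knowledge that only exists inside the drive.

```cpp
#include <cstdint>

// Hypothetical per-object access hint supplied by the host.
enum class AccessHint : uint8_t { Archive, Interactive, Streaming };

// A simplified view of what the drive knows about its own bands of tracks.
struct Zone {
    uint32_t id;
    double avg_seek_ms;        // average seek time to this band
    bool contiguous_free_run;  // large defect-free run suitable for streaming
};

// Pick a placement zone for a new object based on the host-supplied hint.
uint32_t choose_zone(AccessHint hint, const Zone* zones, int zone_count) {
    int best = 0;
    for (int i = 1; i < zone_count; ++i) {
        switch (hint) {
            case AccessHint::Interactive:  // minimize seek time
                if (zones[i].avg_seek_ms < zones[best].avg_seek_ms) best = i;
                break;
            case AccessHint::Streaming:    // need gap-free sequential reads
                if (zones[i].contiguous_free_run && !zones[best].contiguous_free_run)
                    best = i;
                break;
            case AccessHint::Archive:      // capacity first; take the slowest band
                if (zones[i].avg_seek_ms > zones[best].avg_seek_ms) best = i;
                break;
        }
    }
    return zones[best].id;
}
```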

How to Change the Interface
Of course, at this point I'm not telling the drive vendors anything they don't already know. Seagate, in particular, drove creation of the T10 OSD interface and has been a big advocate of the object interface for drives. The problem is, after almost ten years, they have had limited success. As Christensen pointed out, changing a major interface in a horizontally-integrated industry is really hard. No one wants to develop a product to a new interface until there is already an established market. That means not only must there be products that plug into the other side of the interface, but those products must be fully mature and 'baked', with an established market of their own. So the industry sits deadlocked on this chicken-and-egg problem. I think there is hope, though, and here is my advice on how to create a path out of the deadlock.

1. Up-level the discussion and speak to the right audience
The consumers of the features enabled by OSD drives are file system, RAID, and database developers. The T10 spec defines the transport mechanism, but that discussion is highly uninteresting to this audience. They need to know the specific value they get by storing objects, and they need to understand that it's value they can ONLY get by offloading work to the embedded disk processor. In addition, it needs to be expressed in their language - as an object API. This is about storing objects, and it maps into the object-oriented view of development. It's an interface for persisting objects. These objects have public properties the application can set to define the performance, security, and other attributes to be applied when persisting the data. It's basic OO Design 101.
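
To show the shape I have in mind, here is a rough sketch of such an object persistence API. This is my own illustration of the concept, not the T10 or SNIA definition, and every name in it is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Properties the application sets; the drive decides how to honor them.
struct PersistenceAttributes {
    bool encrypt_at_rest = false;          // security handled down in the drive
    bool secure_erase_on_delete = false;   // no sensitive data left on old media
    uint32_t target_read_latency_ms = 0;   // 0 means no specific target
    uint32_t min_stream_rate_kbps = 0;     // required sustained rate, if streaming
};

// The object the application persists.
class StorageObject {
public:
    explicit StorageObject(uint64_t object_id) : id_(object_id) {}
    PersistenceAttributes attributes;      // public, settable properties
    uint64_t id() const { return id_; }
private:
    uint64_t id_;
};

// The core interface the host-side stack programs against. Each drive vendor
// supplies an implementation with standardized behavior for this core set.
class ObjectStore {
public:
    virtual ~ObjectStore() = default;
    virtual StorageObject create(const PersistenceAttributes& attrs) = 0;
    virtual void write(StorageObject& obj, uint64_t offset,
                       const std::vector<uint8_t>& data) = 0;
    virtual std::vector<uint8_t> read(const StorageObject& obj,
                                      uint64_t offset, uint64_t length) = 0;
    virtual void remove(StorageObject& obj) = 0;
};
```

A file system or database developer can reason about an interface like this without ever thinking about CDBs or transport details.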

2. Standardize this higher-level API
Seagate already gets the need for standards and has driven them for the transport protocol. I hope some standardization of the higher-level API is happening in the SNIA OSD Workgroup. For any serious developer to adopt an API built on HW features, the HW must be available from multiple sources, and the different vendors must provide consistent behavior for some core set of functions. Of course, this lets direct competitors in on the game, but it also moves the game to a whole new level of value.

3. Leverage open source and community development
I continue to see open source leading the way in innovating across outdated interfaces. HW vendors who are locked into the limitations of those outdated interfaces have the most to gain by enabling value-add in their layer through open-source software, yet they seem to have a blind spot here. Leverage this opportunity! It's not about traditional market analyses of current revenue opportunities. It's about showing the world whole new levels of value your HW can offer, and about getting that value to early adopters so the features gain maturity.


Many of the pieces are already there. IBM and Intel have OSD drivers for Linux on SourceForge. One is coming from Sun for Solaris. File systems are there from ClusterFS and Panasas. Emulex has demoed an FC driver for Linux. The main pieces still missing are the higher-level object persistence API and the disk firmware itself. Also, the beauty of community development is that you don't have to staff armies of SW developers to do it. A small group focused on evangelizing, creating, leading, and prototyping open development projects is enough. The developers are out there, the customer problems are there, and the start-ups and VC money are out there looking to create these solutions. Finally, although open source leads the way on Linux and OpenSolaris, if the value proposition is compelling enough, developers will find a way to bypass the block stack on Windows too, which will, in turn, force Microsoft to support this interface so it can insert Windows back into the value chain.

4. Make the technology available to developers as cheaply as possible
The open development community is not going to leverage new HW features if they can't get the HW. That sounds fairly obvious, but the FC industry in particular is missing the boat here. Lustre and PanFS have been implemented over IP, and IBM and Intel's OSD drivers on SourceForge are for iSCSI. The irony is that Lustre and PanFS, which focus on HPTC where they could most benefit from FC performance, have been forced onto IP, promoting the misconception that FC has some basic limitation that prevents its use in HPTC compute grids.


Any developer should be able to buy a drive and download OSD FW for it. Ideally, that should include not only a set of expensive FC drives but also a $99 SATA drive available at Fry's. Hopefully the FW development processes at the drive vendors have evolved to the point where the code is modular enough that a small team could take the FW source for a new drive, plug in an OSD front-end, and release it on a download site for developers.
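
To illustrate the kind of modularity that makes this practical, here is a hypothetical sketch of an OSD front-end plugging in on top of an unchanged block back-end. Real firmware interfaces are vendor-specific; these class names are mine.

```cpp
#include <cstdint>
#include <vector>

// Services the existing drive firmware already provides (unchanged).
class BlockBackEnd {
public:
    virtual ~BlockBackEnd() = default;
    virtual void write_blocks(uint64_t lba, const std::vector<uint8_t>& data) = 0;
    virtual std::vector<uint8_t> read_blocks(uint64_t lba, uint32_t blocks) = 0;
};

// A front-end is whatever parses host commands. Today that is the block
// command set; an OSD front-end parses object commands instead.
class FrontEnd {
public:
    virtual ~FrontEnd() = default;
    virtual void handle_command(const std::vector<uint8_t>& cdb,
                                std::vector<uint8_t>& data) = 0;
};

// The OSD front-end maps object IDs and offsets onto block allocations using
// its own on-media layout, then delegates to the unchanged back-end.
class OsdFrontEnd : public FrontEnd {
public:
    explicit OsdFrontEnd(BlockBackEnd& backend) : backend_(backend) {}
    void handle_command(const std::vector<uint8_t>& cdb,
                        std::vector<uint8_t>& data) override {
        // Parse the object command, consult the object-to-block map, then call
        // backend_.read_blocks / write_blocks. Parsing is elided in this sketch.
        (void)cdb;
        (void)data;
    }
private:
    BlockBackEnd& backend_;
};
```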

5. Participate as part of the developer community
Create an open bug database, then monitor and address the issues reported there. As early developers use this FW, they need a way to report problems, track resolution, and generally get the feeling that the drive vendors are committed to supporting this new API. In addition, consider opening the source for the OSD interface portion of the disk FW. The 'secret sauce' for handling properties can still be kept closed. This will accomplish several things. One, it will drive the de-facto standard (one of the primary reasons for open-sourcing anything). Two, it will let drive vendors leverage bug fixes and enhancements from the open-source community. Three, it will help build trust from the database/file system/RAID vendors that this interface really is mature and that they retain some control over the ability to find and fix problems. Four, it will help second-source vendors implement consistent basic functionality.

Conclusions
This will take time, but the ability to innovate along more dimensions than just capacity - and the resulting value-add that customers are willing to pay for - is worth the long-term investment. The keys to driving adoption of this new interface are to communicate the value of the new functionality, in terms they understand, to the developers who will use it; to make that functionality readily available to them and provide as much of the solution as possible; and to build their trust by enabling second-source suppliers and cultivating early adopters such as the HPTC and open-development communities. Finally, if any drive vendor wants help creating a specific plan, send me a note through the contact link on this blog page and we can talk about a consulting arrangement.