
Wednesday, August 09, 2006

Re-post: More on Blocks

Originally posted Aug 31, 2005
A few weeks ago I was blogging about how block protocols like SCSI were designed around the on-disk sector format and limited intelligence of 1980s disk drives. Clearly, if we were starting from a clean sheet of paper today to design storage for modern datacenters, this is NOT the protocol we would create.


The real problem, though, isn't just that the physical sector size doesn't apply to today's disk arrays. The problem today has more to do with the separation of the storage and the application/compute server. Storage in today's datacenters sits in storage servers, typically disk arrays or tape libraries, which are available as services on a network, the SAN. These storage services are used by a number of clients - the application/compute servers. As with any server, you would like some guarantee of the level of service it provides. This includes things like availability of the data, response time, security, failure and disaster tolerance, and a variety of other service levels needed to ensure compliance with data retention laws and to avoid over-provisioning.


The block protocol was not designed with any notion of service levels. When a data client writes a collection of data, there is no way to tell the storage server what service level is required for that particular data. Furthermore, all data gets broken into 512-byte blocks, so there isn't even a way to group the blocks that require a common service level. The workaround today is to use a management interface to apply service levels at the LUN level, which is far too coarse a granularity and leads to over-provisioning. This gets really complicated when you factor in Information Lifecycle Management (ILM), where data migrates and gets replicated to different classes of storage. The result is highly complex management software and administrative processes that must tie together management APIs from a variety of storage servers, operating systems, and database and backup applications.
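
To make that concrete, here is a minimal sketch in C of the WRITE(10) command descriptor block, roughly as the SCSI block command set defines it (simplified; see the SBC standard for the authoritative layout). Notice that the command identifies data only by a starting logical block address and a block count - there is no field in which a data client could express a required service level or name the larger piece of data these blocks belong to.

```c
#include <stdint.h>

/* Sketch of a SCSI WRITE(10) CDB (simplified). The command addresses
 * raw blocks only -- there is no field for service levels or for
 * grouping the blocks that make up one piece of data. */
struct scsi_write10_cdb {
    uint8_t opcode;          /* 0x2A = WRITE(10) */
    uint8_t flags;           /* DPO/FUA and protection bits */
    uint8_t lba[4];          /* 32-bit logical block address, big-endian */
    uint8_t reserved;        /* reserved (later specs carve a group number out of this byte) */
    uint8_t transfer_len[2]; /* number of blocks to write, big-endian */
    uint8_t control;         /* control byte */
};
```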


If we were starting from a clean sheet of paper today to design a storage interconnect, we would do a couple of things. First, we would use the concept of a variable-sized data Object that allows the data client to group related data at a much finer granularity than the LUN. This could be an individual file, a database record, or any unit of data that requires a consistent storage service level. Second, each data object would include metadata - the information about the object that identifies what service levels, access rights, etc. are required for this piece of data. This metadata stays with the data object as it migrates through its lifecycle and gets accessed by multiple data clients.
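
As a rough illustration - the field names here are my own assumptions, not taken from any standard - a self-describing data object might look something like this, with the metadata traveling alongside the variable-sized payload:

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative data object: the metadata stays with the payload as the
 * object migrates, so whichever storage server holds it can honor the
 * required service levels. Field names are assumptions for this sketch. */
struct object_metadata {
    uint64_t object_id;      /* names the object, not a disk location */
    uint32_t service_class;  /* required performance/availability tier */
    uint32_t retention_days; /* minimum retention for compliance */
    uint32_t access_rights;  /* who may read or modify the object */
    uint8_t  remote_copy;    /* does this data need disaster tolerance? */
};

struct data_object {
    struct object_metadata meta;
    size_t   length;         /* a file, a database record, ... any size */
    uint8_t  payload[];      /* variable-sized data follows the metadata */
};
```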


Of course there are some things about today's block protocols we would retain, such as the separation of command and data. This separation allows block storage devices and HBAs to quickly communicate the necessary command information to set up DMA engines and memory buffers, and then move the data itself very efficiently.
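
A rough sketch of the idea from the target's point of view (hypothetical pseudologic, not a real driver): because the small command arrives ahead of the payload, the device can size its buffers and program its DMA engine before any data starts to move.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical target-side logic showing why command/data separation
 * matters: the command tells the device how much data is coming, so
 * buffers and DMA can be staged before the payload arrives. */
struct write_command {
    uint64_t lba;        /* where the data will land */
    uint32_t num_blocks; /* how much data to expect */
};

static void *stage_for_incoming_data(const struct write_command *cmd)
{
    size_t nbytes = (size_t)cmd->num_blocks * 512;
    void *buf = malloc(nbytes);   /* buffer is ready before any data moves */
    /* ...a real target would now program its DMA engine to land the
     * incoming payload directly in buf, then signal "ready"... */
    return buf;
}
```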


Key players in the storage industry have created just such a protocol in the ANSI standards group that governs the SCSI protocol. The new protocol is called OSD, for Object-based Storage Device. OSD is based on variable-sized data objects that include metadata, and it can run over the same physical interconnects as SCSI, including parallel SCSI, Fibre Channel, and Ethernet. With the OSD protocol, we now have huge potential to enable data clients to specify service levels in the metadata of each data object and to design storage servers that support those service level agreements.
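
To show the contrast with the block command above, here is a simplified, illustrative picture of what an object-style write carries. This is not the actual T10 OSD CDB layout (which uses variable-length CDBs and attribute pages) - just the shape of the idea.

```c
#include <stdint.h>

/* Illustrative object-style write request (simplified; not the real
 * T10 OSD CDB). The request names an object rather than a sector
 * range, and can carry attributes -- the metadata that tells the
 * storage server what service levels the data requires. */
struct osd_style_write {
    uint64_t partition_id; /* grouping of objects on the storage server */
    uint64_t object_id;    /* which object to write */
    uint64_t offset;       /* byte offset within the object */
    uint64_t length;       /* byte count -- no fixed 512-byte sectors */
    uint32_t attr_page;    /* attribute page to set along with the write */
    uint32_t attr_value;   /* e.g. a service-class or retention setting */
};
```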


I could go on for many pages about the potential service levels that can be specified for data objects. They cover performance, availability, security (including access rights and access logs), compliance with data retention laws, and any other storage SLAs a storage administrator may have. I'll talk more about these in future blogs.