
Tuesday, September 19, 2006

My Ideal SAN, Part II, Data Services


In part one, I talked about the SAN interconnect and the network and array services that facilitated the use of lots of scalable rack servers. Now I want to talk about how to achieve scalability on the storage side, and about data services that help solve the combined problem of centralizing information, keeping it always available, putting it on the right class of storage, and keeping it secure and compliant with information laws.



Consolidating and Managing the Information with NFS
Early SANs were about consolidating storage HARDWARE, not the information. The storage was partitioned up, zoned in the SAN, and presented exclusively to large servers, giving them the impression they were still talking to direct-attached storage. This allowed each server to continue to run data services in the host stack because it virtually owned its storage. My ideal datacenter uses lots of scalable rack servers with applications that grow and migrate around. Trying to run the data services spread across all these little app servers/data clients is nearly impossible. The INFORMATION, not just the storage hardware, has to be centralized and shared, and most of the data services have to run where the storage lives - on the data servers. This means block storage is out. Block storage servers, which receive disaggregated blocks with no properties of the data, are hopelessly limited in their ability to meaningfully manage and share the information. So, my storage needs to be object-based and, since I'm building this datacenter from scratch, I'm going to use NFS V4++. (If I needed to run this on legacy FC infrastructure, I would use the OSD protocol - but more on that later.) With enhanced NFS, the storage servers keep the information in meaningful groupings, with properties that let them store it properly.
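
To make the contrast concrete, here is a rough sketch (Python, with field names I made up) of what the data server can see in each model. The point is simply that with object storage the name and properties travel with the data, so the server has something meaningful to manage:

    # Hypothetical illustration only - what the storage server "knows" in each model.

    # Block model: the server receives anonymous blocks - no name, no properties.
    block_write = {"lun": 7, "lba": 123456, "length": 8192, "payload": b"..."}

    # Object/file model (NFS V4++): the data arrives as a named object with
    # properties the server can use to place, share, and protect it.
    object_write = {
        "name": "/finance/q3/ledger.db",
        "owner": "erp-app",
        "acl": ["erp-app:rw", "audit:r"],
        "properties": {"class_of_service": "mirrored", "retention_days": 2555},
        "payload": b"...",
    }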

Performance and Availability
For high performance and availability I want NFS V4 plus some enhancements. One enhancement is the RPC-over-RDMA standard being developed by NetApp and Apple. The onboard NIC in the rack servers should be capable of performing RDMA for the RPC ULP as well as for iSCSI. For availability, the host stack must support basic IP multipath as well as NFS volume and iSCSI LUN failover. The latter should use standard symmetric (active-active) access, or ANSI T10 ALUA for asymmetric LUN failover. For NFS volume/path failover, the V4 fs_locations attribute is helpful because it allows a storage server to redirect the client to another controller that has access to the same, or a mirrored, copy of the data. This helps but, to achieve full availability and scalability, we need pNFS, with its ability to completely decouple the information from any particular piece of HW.
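
To show what I mean by volume/path failover from the client's point of view, here is a minimal sketch in Python. The helper names are hypothetical; in a real client the list of alternate servers would come from the fs_locations attribute:

    # Minimal sketch of V4-style location failover (names are hypothetical).

    class PathDown(Exception):
        """Raised when a path or controller stops answering."""

    def read_with_failover(read_fn, path, locations):
        """Try each server that holds this filesystem until one answers."""
        last_error = None
        for server in locations:            # e.g. ["filer-a", "filer-b-mirror"]
            try:
                return read_fn(server, path)
            except PathDown as err:         # this path/controller failed
                last_error = err            # fall through to the mirrored copy
        raise last_error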

pNFS
A few weeks ago I posted a few notes on pNFS. pNFS applies the proven concept of using centralized Name and Location services in networks - the same concept that has allowed the internet to grow to millions of nodes. The pNFS Name/Location server can run on the same inexpensive clustered pair of rack servers as the storage DHCP service. With pNFS, instead of mounting a file from a particular piece of NFS server hardware, clients do a lookup by name, and the pNFS Name/Location server returns a pointer to the storage device(s) where the data currently resides. Now files can move among a variety of small, low-cost, scalable storage arrays, giving high availability. Frequently accessed data can reside on multiple arrays, and an app server can access the nearest copy. For performance, app servers can stripe files across multiple arrays. Finally, with NFS V4 locking semantics, multiple app servers can share common information - something FC/block SANs have never been able to do effectively.
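
Here is a toy model of that control/data split, again in Python with made-up names. It is only meant to show the lookup-then-read-stripes flow, not the real protocol:

    # Toy model of the pNFS idea: ask the name/location service where a file
    # lives, then read its stripes directly from the arrays that hold them.

    LOCATION_SERVICE = {
        "/video/launch.mpg": {"stripe_size": 1 << 20,
                              "devices": ["array-01", "array-02", "array-03"]},
    }

    def lookup(name):
        """Control path: where does this named data currently reside?"""
        return LOCATION_SERVICE[name]

    def read_stripe(device, name, index):
        """Data path: in reality an NFS (or NFS/RDMA) read straight to the array."""
        return "<{} stripe {} of {}>".format(device, index, name)

    def read_file(name, num_stripes):
        layout = lookup(name)
        devices = layout["devices"]
        # Stripes are spread round-robin across the arrays for parallel reads.
        return [read_stripe(devices[i % len(devices)], name, i)
                for i in range(num_stripes)]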

The Storage Server Hardware
Just as I described using small, scalable, rack-mount application servers, pNFS now allows doing the same with the storage. My storage arrays would be scalable - probably 2U/12-drive or 3U/16-drive bricks, some with high-performance SAS disks and others with low-performance SATA; some with high-performance mirrorsets, others with lower-performance RAID 5. The interface is RDMA-capable Ethernet for both iSCSI and NFS/RPC. As I described in part one, the bricks can be configured with iSCSI LUNs that are assigned meaningful names, which the array registers with the central name service on the SAN. They can also be configured with NFS volumes that register with pNFS. This gives the ultimate in scalability, flexibility, low cost, high availability, and automated configuration. Now we can talk about how to seriously help manage the data.
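
Roughly what I mean by automated configuration, as a hypothetical sketch: when a brick boots, it announces its iSCSI LUNs and NFS volumes to the central name service, much like a host registering with DNS:

    # Hypothetical sketch of a storage brick registering itself at boot.
    # REGISTRY stands in for the central name / pNFS location service.

    REGISTRY = {"iscsi_luns": {}, "nfs_volumes": {}}

    def register_brick(brick, luns, volumes):
        """Announce this brick's LUNs and volumes so clients can find them by name."""
        for name, props in luns.items():
            REGISTRY["iscsi_luns"][name] = {"brick": brick, **props}
        for name, props in volumes.items():
            REGISTRY["nfs_volumes"][name] = {"brick": brick, **props}

    # A 2U/12-drive brick announcing a mirrored SAS boot LUN and a RAID 5 SATA volume.
    register_brick(
        "brick-07",
        luns={"boot-web-014": {"raid": "mirror", "media": "SAS", "size_gb": 36}},
        volumes={"/shared/builds": {"raid": "raid5", "media": "SATA"}},
    )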

Managing the Data
Managing the Data means being able to do four things all at the same time: one, keeping the data centralized and shared; two, keeping it always accessible in the face of any failures or disasters; three, putting the right data on the right class of storage; and four, complying with applicable laws and regulations for securing, retaining, auditing, etc. It's when you put all four of these together that it gets tough with today's SANs.


I already talked about how pNFS with NFS V4++ solves the first two - keeping the data centralized, shared, and 100% accessible. With pNFS, arrays can share files among multiple data clients. Both arrays and data clients can replicate data locally and remotely via IP, and the pNFS server allows data clients to find the remote copies in the event of a failure. Similarly, on the application server side, if a server fails, an application can migrate to another server and quickly find the data it needs.


Now I want to talk about how the object nature of NFS helps solve the other two problems. Again, because the data remains in meaningful groupings (files) and has properties - along with the ability to add properties over time - the storage servers can put it on the right class of storage and apply the right compliance steps. NFS today has some basic properties that let the storage server put the data on the right class of storage. Revision dates and read-only properties allow the storage to put mostly-read data on cheaper RAID 5 volumes. With revision dates, the storage can migrate older data to lower-cost SATA/RAID-5 volumes and eventually down to tape archive. With the names of files, the storage can perform single-instancing. These properties are a start, but I would like to see the industry standardize more properties to define the Storage Service Levels data objects require.
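
Here is a sketch of the kind of property-driven placement I have in mind. The property names are ones I invented for illustration - which is exactly the gap: they should be standardized:

    # Hypothetical property-driven placement policy run by the storage server.
    import time

    DAY = 86400  # seconds

    def choose_storage_class(props, now=None):
        """Map a data object's properties to a class of storage."""
        now = now or time.time()
        age = now - props.get("last_modified", now)
        if props.get("archive"):
            return "tape-archive"            # cold, compliance-driven data
        if props.get("read_only") or age > 90 * DAY:
            return "sata-raid5"              # mostly-read or aging data: cheap capacity
        return "sas-mirror"                  # hot, writable data: fast mirrored SAS

    print(choose_storage_class({"read_only": True}))                        # sata-raid5
    print(choose_storage_class({"last_modified": time.time() - 400 * DAY})) # sata-raid5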


Finally, compliance with data laws is where the object nature of NFS can help the most. The problem with these laws is that they apply to the Information, not to particular copies of the data. Availability and consolidation requirements mean the information has to be replicated, archived, and shared on the storage network. With NFS, information can be named, and the name service can keep track of where every copy resides. The properties associated with the data can include an ACL and an audit trail of who accessed each copy. The storage can retain multiple revisions, or the properties can include an 'archive' flag so the storage makes the data read-only. The properties can include retention requirements; then, once the retention period expires, the storage can delete all copies. These are just a few of the possibilities.
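
As a rough sketch of what that could look like: because the name service knows every copy of a named piece of information, a retention policy can be enforced against the information itself rather than against one array's copy (the property and function names are again hypothetical):

    # Hypothetical retention sweep enforced against the information, not one copy.
    import time

    def retention_sweep(catalog, delete_copy, now=None):
        """Delete every copy of any object whose retention period has expired."""
        now = now or time.time()
        for name, record in list(catalog.items()):
            expires = record["properties"].get("retain_until")
            if expires is not None and now > expires:
                for location in record["copies"]:   # every replica and archive copy
                    delete_copy(location, name)
                del catalog[name]

    # Example entry the name service might hold, with copies on two arrays and a DR site.
    catalog = {
        "/hr/exit-interview-1999.doc": {
            "properties": {"retain_until": 946684800},   # expired long ago
            "copies": ["array-02", "array-09", "dr-site-1"],
        },
    }
    retention_sweep(catalog, lambda loc, name: print("purge", name, "from", loc))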

How to Get There?
Some of this development is happening. Enhancements to NFS V4 are being defined and implemented in at least Linux and Solaris. pNFS is being defined and prototyped through open source, with strong participation by Panasas. RDMA for NFS is at least getting defined as a standard; now we need the NICs, from either the storage HBA vendors or the Ethernet NIC vendors. There are a couple of gaps where I don't see enough progress. One is defining more centralized configuration, naming, and lookup services for pNFS storage networks. Panasas and the open development community seem to be focusing on HPTC right now - probably not a bad place to start, since that market needs the parallel access to storage it gets from pNFS and object-based storage - but that leaves an opportunity to define the services that automate large SANs for other markets. The other gap is standardizing properties for data objects, specifically for defining Storage Service Levels and compliance with data laws. (I need to check what the SNIA OSD group is doing here.)

Notes on Transitioning from Legacy FC SANs
One of the nice features of pNFS's separation of control and data flow is that it doesn't care what transport is used to move the data. The typical datacenter, with its large investment in Fibre Channel, will want to leverage that infrastructure. There is no reason the architecture I describe can't use FC (running the T10 OSD protocol) in parallel with Ethernet, provided OS drivers are available that connect the OSD driver to the vnode layer. The same data objects, with the same properties attached, can be transmitted through the OSD protocol over FC. THIS is the value of the T10 OSD spec: it allows an object-based data management architecture like the one I described above to leverage the huge legacy FC infrastructure.
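
To make "doesn't care what transport" concrete, here is a hypothetical sketch: the named object and its properties stay the same whether the data path underneath is NFS over Ethernet or T10 OSD over FC; only the transport binding changes:

    # Hypothetical sketch: the same object write bound to either data path.

    def send_nfs_rdma(obj):     # stand-in for an NFS/RDMA write over Ethernet
        return "nfs-rdma wrote " + obj["name"]

    def send_t10_osd(obj):      # stand-in for a T10 OSD write over Fibre Channel
        return "t10-osd wrote " + obj["name"]

    TRANSPORTS = {"ethernet": send_nfs_rdma, "fibre-channel": send_t10_osd}

    def write_object(obj, fabric):
        """The object (name + properties + data) is transport-independent."""
        return TRANSPORTS[fabric](obj)

    obj = {"name": "/finance/q3/ledger.db",
           "properties": {"archive": True, "retention_days": 2555},
           "data": b"..."}
    print(write_object(obj, "fibre-channel"))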