
Tuesday, September 26, 2006

New Storage Arrays: Part 1

This post is another collection of notes, in this case from reading the websites of several fairly new (to me at least) storage subsystem vendors. I don't have any inside information or access to NDA material on these companies. All my notes and conclusions are the result of reading material on their websites.


Panasas builds a storage array and an associated installable file system that closely align with my vision of an ideal SAN so, needless to say, I like them. Their focus is HPTC (high-performance technical computing), specifically today's supercomputers built from many compute nodes running Linux on commodity processors. For these supercomputers, it's critical that multiple compute nodes can efficiently share access to data. To facilitate this, Panasas uses object storage along with a pNFS MetaData Server (MDS). Benefits include:

    Centralize and offload block space management. Compute nodes don't have to spend a lot of effort comparing free/used block lists between each other. A compute node can simply request creation of a storage object.

    Improved Data Sharing. Compute nodes can open objects for exclusive or shared access and do more caching on the client. This is similar to NFS V4. The MDS helps by providing a call-back mechanism for nodes waiting for access.

    Improved Performance Service Levels. Objects include associated properties, and with the data grouped into objects, the storage can be smarter about how to lay out the data for maximum performance. This is important for HPTC, which may stream large objects.

    Better Security. Objects include ACLs and authentication properties for improved security in these multi-node environments.
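To make the shared/exclusive open and call-back idea from the list above concrete, here is a minimal Python sketch. It is only an illustration of the concept, not Panasas's or pNFS's actual protocol; the `LockManager` class and its method names are invented for this example.

```python
class LockManager:
    """Toy stand-in for the MDS role of granting opens and calling back waiters."""

    def __init__(self):
        self._holders = {}   # object name -> (mode, set of node names)
        self._waiters = {}   # object name -> list of (node, mode, callback)

    def open(self, name, node, mode, callback=None):
        """Try to open `name` in 'shared' or 'exclusive' mode. Returns True if granted."""
        held = self._holders.get(name)
        if held is None:
            self._holders[name] = (mode, {node})
            return True
        held_mode, nodes = held
        if mode == "shared" and held_mode == "shared":
            nodes.add(node)
            return True
        # Conflict: remember the caller so it can be called back later.
        if callback is not None:
            self._waiters.setdefault(name, []).append((node, mode, callback))
        return False

    def close(self, name, node):
        """Release the object; when the last holder leaves, wake waiters in order."""
        mode, nodes = self._holders[name]
        nodes.discard(node)
        if nodes:
            return
        del self._holders[name]
        still_waiting = []
        for w_node, w_mode, w_callback in self._waiters.pop(name, []):
            if self.open(name, w_node, w_mode):
                w_callback(w_node)          # the call-back: access is now granted
            else:
                still_waiting.append((w_node, w_mode, w_callback))
        if still_waiting:
            self._waiters[name] = still_waiting
```

A node that is refused simply goes on with other work until its call-back fires, instead of polling the other compute nodes.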

Panasas uses pNFS concepts but goes beyond pNFS, I think. Compute nodes include the client layout manager so they can stripe data across OSD devices for increased performance (reference my pNFS Notes). They use the MDS server for opens/closes, finding data, and requesting call-backs when waiting for shared data. They get the scalable bandwidth that results from moving the MDS out of the datapath. More importantly, the MDS provides the central point for keeping track of new storage and for new servers to go to find the storage they need. This supports scalability of both storage and servers.
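Here is a small Python sketch of that split between control path and data path, assuming simple round-robin striping. The `Layout`, `MetadataServer`, and `Client` names are hypothetical, and the OSDs are just in-memory dictionaries; the real layout formats are certainly richer than this.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    """Striping map the client gets from the metadata server (hypothetical format)."""
    osd_ids: list        # which storage devices hold the object's stripes
    stripe_unit: int     # bytes sent to one OSD before moving to the next


class MetadataServer:
    """Stands in for the MDS: hands out layouts and never touches file data."""

    def __init__(self, osd_ids, stripe_unit):
        self._osd_ids = list(osd_ids)
        self._stripe_unit = stripe_unit

    def open(self, name):
        # A real MDS would also check ACLs and register call-backs here.
        return Layout(self._osd_ids, self._stripe_unit)


class Client:
    """Client-side layout manager: moves data directly to and from the OSDs."""

    def __init__(self, osds):
        self._osds = osds   # osd_id -> dict of (object name, stripe index) -> bytes

    def write(self, name, data, layout):
        for offset in range(0, len(data), layout.stripe_unit):
            index = offset // layout.stripe_unit
            osd_id = layout.osd_ids[index % len(layout.osd_ids)]   # round-robin
            self._osds[osd_id][(name, index)] = data[offset:offset + layout.stripe_unit]

    def read(self, name, length, layout):
        stripe_count = -(-length // layout.stripe_unit)   # ceiling division
        chunks = []
        for index in range(stripe_count):
            osd_id = layout.osd_ids[index % len(layout.osd_ids)]
            chunks.append(self._osds[osd_id][(name, index)])
        return b"".join(chunks)
```

The only MDS call is `open`; every byte of data flows between the client and the OSDs, which is why adding OSDs adds bandwidth without the MDS becoming a bottleneck.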

Object Properties. Panasas objects use the object-oriented concept of public and private properties. Public properties are visible to the Object Storage Device (OSD) and specify the Object ID, size, and presumably other properties that tell the OSD what SLA the object needs. Private properties are not visible to the OSD and are used by the client and the MDS. They include ACLs, client (layout manager) RAID associations, etc.
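A rough Python sketch of the public/private split follows. The field names, the `service_level` hint, and the JSON encoding of the private part are all my assumptions for illustration; the Panasas website does not describe the actual encoding.

```python
import json
from dataclasses import dataclass


@dataclass
class PublicProps:
    """Properties the OSD can read and act on."""
    object_id: int
    size: int
    service_level: str = "default"   # assumed SLA hint, not a documented field


@dataclass
class StorageObject:
    public: PublicProps
    private: bytes = b""   # opaque blob: the OSD stores it but does not interpret it


def make_object(object_id, size, acl, raid_layout, service_level="default"):
    """Build an object whose ACL and RAID info travel as an opaque private blob."""
    private = json.dumps({"acl": acl, "raid": raid_layout}).encode()
    return StorageObject(PublicProps(object_id, size, service_level), private)


def osd_view(obj):
    """What the storage device is allowed to interpret."""
    return {
        "object_id": obj.public.object_id,
        "size": obj.public.size,
        "service_level": obj.public.service_level,
    }


def client_view(obj):
    """What the client and MDS decode from the private part."""
    return json.loads(obj.private)
```

For example, `osd_view(make_object(7, 4096, ["alice:rw"], "raid5"))` exposes only the ID, size, and service level, while `client_view` on the same object recovers the ACL and RAID association.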

iSCSI. Panasas runs their OSD via iSCSI over IP/Ethernet. I assume they use RDMA NICs in their OSD array and it's up to the client whether or not to use one. For control communications with the MDS, they use standard RPC.

File System. I don't think their filesystem is Lustre. I think they wrote their own client that plugs into the vnode interface on the Linux client. I don't know if their OSDs work with Lustre or not. I would think they would not pass up that revenue opportunity. I think that 30% of the Top-100 supercomputers use Lustre.

Standards. I like that Panasas is pursuing and using standards. They understand that this is necessary to grow their business. They claim their OSD protocol is T10 compliant and they are driving the pNFS standard.

Storage Hardware. Interesting design that uses 'blades'. From the front, a blade looks like a drive CRU, but it is a much deeper card with two SATA HDDs. It fits into a 4U rack-mount tray. Includes adapters for IB and Myrinet, as well as a native Ethernet/iSCSI interface. I don't know what the price is, but it appears to be built from commodity components so it ought to be reasonably inexpensive. I didn't see anything about the FW but I'm certain it must be Linux-based.

Summary. Again, I like it - a lot. They are aligned with the trend to enable high-performance, scalable storage on commodity storage AND server hardware (Ethernet interconnect, x86/x64 servers running Linux, simple storage using SATA disks). They are developing filesystem and MDS server software that enables this scaling and actually works. They are driving it as an open standard, including driving pNFS as a transport-agnostic standard. By using the open-source process they can take advantage of contributions from the development community. Finally, it makes sense to start out in HPTC to get established and mature the technology, but I see a lot of potential in commercial/enterprise datacenters.


BlueArc is an interesting contrast to Panasas. Both are trying to address the same problem - scalable, intelligent, IP-network and object-based storage that can support lots of scalable application servers - but they approach the problem in completely different ways. Panasas, founded by a computer science PhD (Garth Gibson), uses software to combine the power of lots of commodity hardware. BlueArc, on the other hand, founded by an EE with a background developing multi-processor servers, is addressing the problem with custom high-performance hardware.

The BlueArc product is an NFS/CIFS server that can also serve up blocks via iSCSI. Their goal is scalability, but their premise is that new SW standards such as pNFS and NFS V4++ are too new, so they work within the constraints of current, pervasive versions of NFS/CIFS. Their scalability and ease-of-use come from very high performance hardware that can support so many clients that only a few servers are needed.

Hardware Overview. Uses the four basic components of any RAID or NAS controller: Host Interface, Storage Interface, Non-real-time executive/error handling processor, and Real-time data movement and buffer memory control. Each of these is implemented as an independent module that plugs into a common chassis and backplane.

    Chassis/Backplane. Chassis with a high-performance backplane. Website explains that it uses "contention-free pipelines" for many concurrent sessions and low-latency interprocessor communications between I/O and processing modules. Claims this is a key to enabling one rack of storage to scale to support many app servers.

    Network Interface Module. Custom plug-in hardware module providing the interface to the Ethernet-based storage network. Website says it includes HW capability to scale to 64k sessions.

    File System Modules. Plug-in processing modules for running NFS/CIFS/iSCSI. Two types: 'A' modules do higher-level supervisory processing but little data movement. 'B' modules actually move file system data and control buffer memory.

    Storage Interface Module. Back-end FC, SCSI interface and processing. Also does multipathing. Website says it contains much more memory than a typical HBA so it can support more concurrent I/Os.

Software. The software mainly consists of the embedded FW in the server for NFS/CIFS and filesystem processing. It works with standard CIFS/NFS/iSCSI so no special client software is required. The white paper refers to the 'Object Storage' architecture but no OSD interface is supported at this time. Includes volume management (striping, mirroring) for the back-end HW RAID trays.

Summary. Again, the advantage is high performance and scalability due to custom hardware. It uses existing network standards so it can be rolled into a datacenter today and it's ready to go. No special drivers or SW are required on the app servers, which is nice. Also, since you only need one or a few of these, you don't have the problem of managing lots of them. This is similar to the benefits of using a large mainframe vs. rack servers. It is also implemented as a card-cage that lets you start small and grow - sort of like a big Sun E10k SPARC server where you can add CPU and I/O modules.

Keys to success will include three things. One, the ability to keep up with new advances in hardware. Two, the ability to keep it simple to manage. Three, and my biggest concern, the ability to mature the custom, closed firmware and remain competitive with data services. This is custom hardware requiring custom firmware. BlueArc needs to continue staffing enough development resources to keep up. This concerns me because I've been at too many companies that tried this approach and just couldn't keep up with commodity HW and open software.

Pillar Data

Pillar builds an integrated rack of storage that includes RAID trays (almost certainly OEM'd), a FC block SAN head and a NAS head which can both be used at the same time sharing the same disk trays, and a management controller. Each is implemented as a 19" rack-mount module. There's no bleeding-edge technology here. It's basic block and NAS storage with the common, basic data services such as snapshot, replication, and a little bit of CDP. That appears to be by design and supports their tag-line: 'a sensible alternative'. The executive team consists of experienced storage executives who know that most datacenter admins are highly risk-averse and that their data management processes are probably built around just these few basic data services, so this strategy makes sense as a way to break into the datacenter market.

The unique value here is that both NAS and block are integrated under one simple management interface, you can move (oops, I mean provision) storage between both, and the same data services can be applied to both block and NAS. Most of the new invention here is in the management controller, which bundles configuration with wizards, capacity planning, policies for applying data services, and tiered storage management. It allows a user to define three tiers of storage and assign data to those tiers, and presumably the system can track access patterns, at least for the NAS files, and migrate data between tiers of storage.
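As a rough illustration of what access-pattern-driven tiering could look like, here is a minimal Python sketch. The thresholds, function names, and the use of simple access counts are my assumptions; the Pillar website does not say how (or whether) its controller makes these decisions.

```python
def assign_tiers(access_counts, hot_threshold, warm_threshold):
    """Map each file to tier 1 (fastest), 2, or 3 based on how often it was accessed."""
    tiers = {}
    for path, count in access_counts.items():
        if count >= hot_threshold:
            tiers[path] = 1
        elif count >= warm_threshold:
            tiers[path] = 2
        else:
            tiers[path] = 3
    return tiers


def plan_migrations(current, desired):
    """List (path, from_tier, to_tier) for every file whose tier should change."""
    return sorted(
        (path, current.get(path), tier)
        for path, tier in desired.items()
        if current.get(path) != tier
    )
```

With `current = {"/db/log": 3, "/home/old.tar": 1}` and access counts of 500 and 2 (thresholds 100 and 10), the plan would move `/db/log` up to tier 1 and `/home/old.tar` down to tier 3.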

Looking Forward

This looks like a company trying to be the next EMC. It is managed by several mature, experienced executives including several ex-STK VPs. They are building on mature technology and trying to build the trust of enterprise datacenter administrators. The value prop is integration of mature, commonly used technologies - something attractive to many admins who use NAS storage with one management UI from one vendor, block storage from another, and SAN management from yet another.

What's really interesting is when you combine this with their Oracle relationship. They are funded by Larry Ellison. As I described in my post on Disruption and Innovation in Storage, I firmly believe that for enterprise storage, the pendulum has swung back to giving the competitive advantage to companies that can innovate up and down an integrated stack by inventing new interfaces at each layer of the stack. We will never solve today's data management problems with a stack consisting of an application sitting on top of the old POSIX file API, a filesystem that breaks data into meaningless 512-byte blocks for a block volume manager, in turn talking to a block storage subsystem. So, Oracle is doing the integration of the layers starting from the top by bypassing the filesystem, integrating its own volume manager and talking directly to RDMA interfaces. Now we have Pillar integrating things from the bottom up. By getting together to invent a new interface, Oracle and Pillar could create something similar to my vision of an ideal SAN.

In this vision of the future, Oracle provides a bundle of software that can be loaded on bare, commodity hardware platforms. It includes every layer from the DB app, through volume management, down to the RDMA NIC driver and basic OS services which come from bundling Linux. The commodity x64 blades could include RDMA-capable NICs for high-performance SAN interconnect. Then, using NFS V4++, Oracle and Pillar agree on extended properties for the data objects to tell the Pillar storage subsystem what Service Levels and Compliance steps to apply to the data objects as they are stored, replicated, etc. Over time, to implement new data services or add compliance with new data management laws, Oracle and Pillar can quickly add new data properties to the interfaces up and down the stack. They don't have to wait for SNIA or ANSI to update a standard and they don't have to wait for other players to implement their side of the interface. Microsoft can do this with VDS and their database. With Pillar, Oracle can do it as well.
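The following Python sketch shows one way such extensible properties could drive storage behavior. It is purely speculative: the property names (`replicas`, `retain_years`) and the handler registry are invented here and do not correspond to any Oracle, Pillar, or NFS interface.

```python
HANDLERS = {}   # property name -> function that turns its value into storage actions


def handles(prop_name):
    """Decorator that registers a handler for one extended property."""
    def register(func):
        HANDLERS[prop_name] = func
        return func
    return register


@handles("replicas")
def plan_replication(value):
    return [f"keep {value} copies"]


@handles("retain_years")
def plan_retention(value):
    return [f"retain for {value} years"]


def plan_actions(properties):
    """Translate an object's properties into actions, skipping unknown properties."""
    actions = []
    for name in sorted(properties):
        handler = HANDLERS.get(name)
        if handler is not None:   # a property this side doesn't know yet is ignored
            actions.extend(handler(properties[name]))
    return actions
```

Supporting a new data service then means agreeing on one new property name and registering one new handler on the storage side; objects carrying properties the storage doesn't yet understand still work.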