
Tuesday, August 15, 2006

pNFS

Parallel NFS (pNFS) extends NFS V4 to separate communication of control (file opens and lookups) and user data. With pNFS, the NFS server, which previously performed all control operations AND moved the data, now just returns 'pointers' telling a client where the data is actually located. This means the pNFS server can be implemented on a fairly inexpensive (although highly available) server such as a clustered pair of x64 rack servers.
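To make that concrete, here is a minimal sketch of the kind of 'pointer' a pNFS control server might hand back instead of data. The struct and field names are illustrative assumptions, not the NFSv4.1 wire format.

```c
#include <stdint.h>

/* Hypothetical, simplified shape of a pNFS layout reply: instead of the data
 * itself, the control server returns descriptions of where the data lives so
 * the client can fetch it directly from the storage servers. */
struct data_location {
    char     server_addr[64];  /* network endpoint of a storage server */
    uint64_t dev_offset;       /* where this extent starts on that server */
    uint64_t length;           /* bytes covered by this extent */
};

struct layout_reply {
    uint32_t             num_extents;  /* extents making up the requested range */
    struct data_location extents[16];  /* client issues I/O directly to these */
};
```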


Parallel NFS also breaks the restriction that a filesystem must reside on a single server - or as Hildebrand puts it, the 'single server' design which binds one network endpoint to all files in a file system. The pNFS server can return more than just a simple file pointer to a client. It can return a descriptor (a LAYOUT) describing locations on multiple storage servers containing portions, or copies, of the file. Clients in pNFS implement a layer below the Vnode Ops and above the I/O driver called a 'Layout Driver' that interprets the LAYOUT and routes I/Os to the right regions of the right storage devices to do file I/O. The original goal was high-performance parallel access (hence the name parallel NFS); however, as you might have noticed, this Layout Driver sounds a lot like a Volume Manager in a block stack. Just like a Volume Manager, the Layout Driver could implement Mirrors, RAID 5, Remote Replication, Snapshot, Continuous Data Protection (CDP), or anything else a volume manager does.
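As a rough illustration of the volume-manager analogy, here is a small sketch of the mapping decision a striping Layout Driver might make. The struct and the round-robin rule are assumptions for illustration, not the actual layout driver interface.

```c
#include <stdint.h>

/* Sketch of the volume-manager-like decision a layout driver makes: given a
 * file offset and a striped layout, pick the storage device and the offset
 * on that device. Mirroring, RAID 5, etc. would be variations on this hook. */
struct stripe_layout {
    uint32_t num_devices;  /* storage servers the file is spread across */
    uint32_t stripe_size;  /* bytes per stripe unit */
};

static void map_offset(const struct stripe_layout *l, uint64_t file_off,
                       uint32_t *dev_index, uint64_t *dev_off)
{
    uint64_t stripe_no = file_off / l->stripe_size;      /* which stripe unit */
    *dev_index = (uint32_t)(stripe_no % l->num_devices); /* round-robin device */
    *dev_off   = (stripe_no / l->num_devices) * l->stripe_size
               + file_off % l->stripe_size;              /* offset within unit */
}
```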


The third big difference with pNFS is that, as a result of separating control and data, the two can now run over different transports. The relatively low-bandwidth, short bursts of control information between clients and pNFS servers can run over standard, low-cost Ethernet NICs. The data traffic, which is better suited to RDMA-capable adapters, can run over Fibre Channel, SCSI, or Ethernet via a ULP such as iSCSI that an adapter can implement to perform RDMA (see yesterday's post). This also means that the client can use the multipathing driver in the FC or iSCSI stack.


This separation of control and data also means the protocol between the client and the STORAGE server can be Blocks, Objects, or something more like traditional NFS (files via RPC). If it's blocks, you still need much of the traditional filesystem running on the client, and the Layout managed by the pNFS server has to include managing free/used blocks. Objects offer an improvement by moving basic block management to the storage server, as well as allowing properties to be associated with the data objects so the storage server can do a better job storing the data. NFS-like files offer similar benefits with V4 Named Properties, but I still have my question from yesterday's post about whether we can build an RDMA NIC to RDMA this data.
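A tiny sketch of how a client might branch on those three flavors; the enum and the strategy descriptions are illustrative, not the registered NFSv4.1 layout types.

```c
/* Illustrative only: the three storage protocols a layout could point at. */
enum layout_type {
    LAYOUT_BLOCK,   /* client keeps most of the filesystem; layout tracks free/used blocks */
    LAYOUT_OBJECT,  /* storage server manages blocks; objects carry properties */
    LAYOUT_FILE     /* NFS-like file access (RPC) to each storage server */
};

static const char *io_strategy(enum layout_type t)
{
    switch (t) {
    case LAYOUT_BLOCK:  return "map file blocks on the client, issue FC/iSCSI block I/O";
    case LAYOUT_OBJECT: return "read/write objects; server places blocks using object properties";
    case LAYOUT_FILE:   return "issue NFS-style reads/writes to the storage servers";
    }
    return "unknown layout type";
}
```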


At this point, I want to recap the key features of pNFS:

    Separation of Control from Data Traffic and separation of the NFS server from the actual Storage Servers;

    Eliminates the restriction that a file system must reside on one server and allows files to be replicated or spread across multiple Storage Servers under the control of a layer in the client stack similar to today's block volume managers;

    Allows data transfer between clients and Storage Servers over a variety of transports and protocols, including FC SANs with their high-performance RDMA adapters and highly available multipath drivers. This can also include OSD, with its ability to centralize block management and associate useful properties with data at an object granularity.


The Layout Manager is one of my favorite parts of pNFS. Hildebrand et al. describe the Linux framework for pluggable Layout Drivers being developed jointly by IBM, NetApp, and U Michigan. Here's the Link. This is beautiful. All kinds of features needed to manage data can be implemented here, including local and remote sync and async mirrors with seamless failover, COW snapshot, and CDP. This is a start-up opportunity: creating the next Veritas VM, but for NAS, based on pNFS.
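For flavor, here is a guess at the shape such a pluggable interface could take. This is not the actual framework described in the linked paper, just a hedged sketch of drivers registering their mapping and I/O hooks with the client core.

```c
#include <stdint.h>

/* Hypothetical pluggable layout-driver interface: each driver (stripe, mirror,
 * snapshot, CDP, ...) supplies its own callbacks, and the client core selects
 * a driver based on the layout type the pNFS server returned. */
struct layout_driver_ops {
    const char *name;  /* e.g. "mirror", "raid5", "cdp" */
    int (*alloc_layout)(const void *opaque_layout, void **priv);          /* parse LAYOUT */
    int (*read)(void *priv, uint64_t off, void *buf, uint64_t len);       /* route reads */
    int (*write)(void *priv, uint64_t off, const void *buf, uint64_t len);
};

#define MAX_DRIVERS 8
static const struct layout_driver_ops *drivers[MAX_DRIVERS];
static int num_drivers;

/* A mirror or CDP driver would call this at load time to plug itself in. */
static int register_layout_driver(const struct layout_driver_ops *ops)
{
    if (num_drivers >= MAX_DRIVERS)
        return -1;
    drivers[num_drivers++] = ops;
    return 0;
}
```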


The other part I really like is the freedom to choose your transport protocol. You can use OSD and associate properties so the Storage Server can store the data on the right tier, keep it secure, and comply with data management laws and regulations - and do it at the right granularity based on the data. Then you can run it over an efficient transport such as iSCSI via an RDMA NIC, or on your existing FC SAN. Or you can use NFS V4++ over RDMA. It's your choice.