Monday, September 11, 2006

My Ideal SAN, Part I, Boot Support


This is the first of what may be several posts where I describe my idea of an ideal SAN using a combination of products and technology available today, technology still being defined in the standards bodies, and some of my own ideas. My ideal SAN will use reasonably priced components, will use protocols that automate and centralize configuration and management tasks, will scale to thousands of server and storage nodes, and will provide storage service levels that solve real data management problems.

The Interconnect
My SAN will use Ethernet, partly because of cost but mostly because it comes with a true network protocol stack, and because I can get scalable rack-mount servers with Ethernet on the motherboard so I don't need add-on HBAs. The normal progression for an interconnect, as happened with Ethernet and SCSI, is that it starts out as an add-on adapter card costing a couple hundred dollars. Then, as it becomes ubiquitous, it moves to a $20 (or less) chip on the motherboard. Fibre Channel never followed this progression because it's too expensive and complex to use as the interconnect for internal disks, and it never reached wide enough adoption to justify adding the socket to motherboards. I want rack-mount servers that come ready to go right out of the box with two dual-ported NICs, so I have two ports for the LAN and two for the SAN. Sun's x64 rack servers, and probably others, meet this requirement.

Boot
To further simplify configuration and management, these rack servers won't have internal disks. They will load pre-configured boot images from LUNs on centralized arrays on the SAN via iSCSI. In spite of my raves about object storage, I don't see any reason to go beyond the block ULP for the boot LUN. The SAN NIC in these servers will be RDMA-capable under iSCSI and include an iSCSI boot BIOS that can locate and load the OS from the correct boot LUN. It finds the boot LUN using the same IP-based protocols that let you take your notebook into a coffee shop, get connected to the internet, type in a human-readable name like 'google.com', and connect to a remote Google server. These are, of course, DHCP and DNS.
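
To make that concrete, here's a minimal Python sketch of what such a boot BIOS would do with the DHCP reply. It assumes the boot-device option carries an RFC 4173-style root path of the form iscsi:<servername>:<protocol>:<port>:<LUN>:<targetname>; the array and target names below are made up.

    import socket

    def parse_iscsi_root_path(root_path):
        """Split an RFC 4173-style root path into its fields."""
        assert root_path.startswith("iscsi:")
        server, protocol, port, lun, target = root_path[len("iscsi:"):].split(":", 4)
        return {
            "server":   server,
            "protocol": protocol or "6",               # 6 = TCP
            "port":     int(port) if port else 3260,   # default iSCSI port
            "lun":      int(lun) if lun else 0,
            "target":   target,                        # IQN of the boot target
        }

    def locate_boot_lun(root_path):
        """Parse the root path, then resolve the portal name via DNS."""
        boot = parse_iscsi_root_path(root_path)
        try:
            boot["portal_ip"] = socket.gethostbyname(boot["server"])
        except socket.gaierror:
            boot["portal_ip"] = None    # nameserver unreachable in this sketch
        return boot

    print(locate_boot_lun(
        "iscsi:array07.san.local:6:3260:0:iqn.2006-09.local.san:boot.websrv23"))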


I will have a pair of clustered rack-mount servers on the SAN running DHCP and other services. The Internet Engineering Task Force (IETF), the body that defines standards for internet protocols has extended DHCP to add the Boot Device as one of the host configuration options. So replacing or adding a new rack server involves entering its human-readable server name, e.g. websrv23, into its eprom and making sure your central DHCP server has been configured with the boot device/LUN for each of your websrvxx app servers. When the new server powers-on, it broadcasts its name to the DHCP server which replies with websrv23's IP address, boot LUN, and other IP configuration parameters. It can then use a local nameserver to find the boot device by name and then load its operating system. The architect for one very large datacenter who is hoping to move to an IP SAN called these Personality-less Servers.
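
From the DHCP side, the configuration amounts to a table mapping server-name patterns to boot volumes. Here's a small Python sketch of that idea; the names, addresses, and root paths are invented for illustration, not any real DHCP server's interface.

    import fnmatch

    # Hypothetical boot-device table: server-name pattern -> root path of its boot LUN.
    BOOT_TABLE = {
        "websrv*": "iscsi:array07.san.local:::0:iqn.2006-09.local.san:boot.web",
        "dbsrv*":  "iscsi:array03.san.local:::0:iqn.2006-09.local.san:boot.db",
    }

    def dhcp_offer(hostname, next_free_ip):
        """Return the configuration a new, diskless server would receive."""
        for pattern, root_path in BOOT_TABLE.items():
            if fnmatch.fnmatch(hostname, pattern):
                return {"hostname": hostname, "ip": next_free_ip, "root_path": root_path}
        raise LookupError("no boot volume configured for " + hostname)

    # Racking a new app server: set 'websrv23' in its EPROM, power it on,
    # and it picks up its personality from the SAN.
    print(dhcp_offer("websrv23", "10.1.2.23"))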

Array Data Services for Boot Volumes
I want a few data services in my arrays to help manage boot volumes and boot images. First, my Ethernet-based arrays will also use DHCP to get their IP addresses and will register their human-readable array names with the DHCP server. In addition to automating network config, this lets the DHCP application provide an overview of all the devices on the SAN and present them as one namespace using meaningful, human-readable names.
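
A rough sketch of that single-namespace view, assuming every device simply reports its name as it gets its DHCP lease (all entries invented):

    # name -> {"kind": ..., "ip": ...}, filled in as devices get their DHCP leases
    REGISTRY = {}

    def register(name, kind, ip):
        """Called when a device comes up on the SAN and reports its name."""
        REGISTRY[name] = {"kind": kind, "ip": ip}

    register("websrv23", "server", "10.1.2.23")
    register("array07",  "array",  "10.1.9.7")

    # One overview of everything on the SAN, by meaningful name:
    for name, info in sorted(REGISTRY.items()):
        print(name, info["kind"], info["ip"])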


One data service I will use to help manage boot volumes is fast volume replication, so I can quickly replicate a boot volume, add the patches/updates I want to test, and present the result as a new LUN. I'll have app servers for testing these new boot images, and through DHCP I will direct them to boot from the updated boot volumes. Once these are tested, I want to be able to quickly replicate them back to my production boot volumes.
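
Sketched in Python, the cycle looks something like this. The Array class below is a stand-in for a hypothetical array API with fast volume replication, not any vendor's actual interface:

    class Array:
        """Stand-in for an array that supports fast volume replication."""
        def __init__(self):
            self.volumes = {"boot.web": "golden image v1"}

        def clone_volume(self, src, dst):
            """Fast replication: present a copy of src as a LUN named dst."""
            self.volumes[dst] = self.volumes[src]

        def update_volume(self, name, patch):
            self.volumes[name] += " + " + patch

    array = Array()
    dhcp_boot_table = {"websrv*": "boot.web", "testsrv*": "boot.web"}

    # 1. Clone the production boot volume and apply the patch to the clone.
    array.clone_volume("boot.web", "boot.web.candidate")
    array.update_volume("boot.web.candidate", "patch 42")

    # 2. Point the test servers at the candidate through DHCP and reboot them.
    dhcp_boot_table["testsrv*"] = "boot.web.candidate"

    # 3. Once testing passes, replicate the candidate back over production.
    array.clone_volume("boot.web.candidate", "boot.web")
    print(array.volumes["boot.web"])    # golden image v1 + patch 42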


The other array data service I would like is my own invention, and it allows me to minimize the number of boot volumes I have to maintain. Ninety-some percent of every boot volume is the same and is read-only; only a small number of files, including page, swap, and log files, get written to. I would like a variation of snapshot technology that allows me to create one volume in the array and present it as multiple LUNs. Most reads get satisfied out of the one volume. Writes to a LUN, however, get redirected to a small space allocated for each LUN; the array keeps track of which blocks have been written, and any read of an updated block is satisfied from the per-LUN update space. With this feature I can manage one consistent boot image for each type of server on the SAN.
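
A toy Python model of the read/write semantics I have in mind; block granularity and the in-memory structures are simplifications, not a real array implementation:

    class SharedBootVolume:
        """One golden boot image presented as many LUNs with per-LUN redirect-on-write."""
        def __init__(self, golden_blocks):
            self.golden = golden_blocks     # the one read-only boot image
            self.deltas = {}                # per-LUN {block_no: data} update space

        def present_lun(self, lun_id):
            self.deltas[lun_id] = {}        # small per-LUN allocation, initially empty

        def write(self, lun_id, block_no, data):
            # Writes never touch the golden image; they land in the LUN's delta.
            self.deltas[lun_id][block_no] = data

        def read(self, lun_id, block_no):
            # Reads come from the delta if that block was written, else the golden image.
            return self.deltas[lun_id].get(block_no, self.golden[block_no])

    vol = SharedBootVolume(["kernel", "libs", "swap"])
    vol.present_lun("websrv23")
    vol.present_lun("websrv24")

    vol.write("websrv23", 2, "websrv23's swap")   # only block 2 diverges
    print(vol.read("websrv23", 0))   # kernel - shared, from the golden image
    print(vol.read("websrv23", 2))   # websrv23's swap - from its own update space
    print(vol.read("websrv24", 2))   # swap - other LUNs are unaffected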

It's a Real Network
This is why I like iSCSI (for now). You get a real network stack with protocols that let you scale to hundreds or thousands of devices, and you can get servers where the SAN interconnect is already built in. Nothing I've described here (except my common boot volume) is radically new. Ethernet, DHCP, DNS, and even the iSCSI ULP are all mature technologies. Only a few specific new standards and products are needed to actually build this part of my ideal SAN:

    iSCSI BIOS Standard: iSCSI adapters with an embedded boot BIOS are available from vendors such as Emulex and QLogic, but they don't use DHCP to find the boot volume and they're not on the motherboard. We need an agreement on a motherboard-resident BIOS for standard NICs. Intel and Microsoft are the big players here.

    SAN DHCP Server Application: We need a DHCP server with the IETF extension for configuring boot volumes. It would be nice if the GUI were customized for SANs, with menus for configuring and managing boot volumes and features for displaying the servers and storage on the SAN using the single, human-readable namespace. This app should be written to standard Unix APIs so it runs on any Unix.

    The Arrays: Finally, we need arrays that support user-assigned names and use them with DHCP configuration. Maybe iSCSI arrays do this already; I haven't looked. Then, some features to help manage boot volumes would be nice.


If anyone who manages a real SAN is reading this, send me a comment.