The Cloud’s Infrastructure Needs More Than A Silver Lining

The "cloud" has such a nice ring to it. It makes computing seem so simple. For many users, it offers an opportunity to offload their software-based services and applications to someone else’s hardware. Users then can share hardware and dynamically adjust their hardware needs on the fly. It’s also one of the most diverse collections of hardware that has been customized for companies providing cloud services such as Amazon and Google.

Then there’s the issue of “personal” or “private” clouds, which tend to be similar to clouds provided by Amazon and Google but sized and tailored for a company or organization. There may even be linkage and migration of services between the “private” and “public” cloud.

Clouds are possible on a personal level, but only hobbyists and hackers tend to develop them since the configuration and maintenance of even a small cloud cluster is beyond most people and usually unnecessary. Most users will view a “personal” cloud as a set of Internet services provided to individuals via a range of devices from PCs to smart phones.

The cloud is designed to present a consistent, unified Infrastructure as a Service (IaaS). Other “cloud” services can be built using IaaS including Software as a Service (SaaS) and Platform as a Service (PaaS). A quick look under the hood reveals a wide variety of system architectures and implementations because requirements vary significantly and range from companies that need to host a Web site on a single virtual machine (VM) to organizations that need massive amounts of high-performance computing (HPC) power or massive amounts of storage or both.

Many vendors have taken a similar route in their marketing. IBM has a wide range of systems and architectures available from blade servers to rack-mount servers that can host the latest Power architecture processors or x86 processors depending upon customer requirements. Each retains its own SKU, but IBM PureSystems is now the overall brand and targets the cloud (Fig. 1). It also brings the storage and networking components under the same umbrella.

1. The IBM PureSystems brand covers a range of blade and rack-mount servers.

Virtualization is the basis for the cloud, and it effectively hides the storage and networking details. These days the storage access is often through an iSCSI link or a virtual disk controller, so the actual device may be local or across the network.

Cloud Storage

Cloud storage is still based on conventional hard-disk and solid-state disk (SSD) drives, but the mix and interfaces vary significantly. For example, 3.5-in. SATA drives provide high-density, low-cost storage while 2.5-in. single-level cell (SLC) flash drives deliver high performance, especially for read operations. SLC flash drives are found in transaction systems while SATA drives manage large amounts of archival data.

A single SATA/SAS controller can handle the conventional drives. Such controllers can even use on-board memory and SSDs for caching, as LSI’s CacheCade Pro software does when running on LSI’s matching hardware (see “Getting The Most Out Of SSD Arrays”).

Of course, for the “cloud,” performance is sometimes the only thing. That’s why SAS is moving up to 12 Gbits/s versus 6 Gbits/s for SATA (see “12-Gbit/s SAS Pushes Storage Envelope”). Unfortunately, SATA and SAS are single-channel interfaces. Controllers can combine channels to deliver higher throughput, but this can still be a bottleneck for flash drives as their bandwidth grows as well. This is one reason why designers are turning to PCI Express.
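The gap between these interfaces can be estimated from the line rates alone. The following Python sketch is a back-of-the-envelope calculation, not a benchmark. It assumes the 8b/10b line encoding used by 6-Gbit/s SATA, 12-Gbit/s SAS, and PCI Express Gen 2, and the 128b/130b encoding used by PCI Express Gen 3, and it ignores protocol overhead. The table and function names are illustrative.

```python
# Back-of-the-envelope payload bandwidth per link, ignoring protocol overhead.
# Each entry: (line rate per lane in Gbits/s or GT/s, encoding efficiency).
LINKS = {
    "SATA 6 Gbit/s": (6.0, 8 / 10),
    "SAS 12 Gbit/s": (12.0, 8 / 10),
    "PCIe Gen 2 lane": (5.0, 8 / 10),
    "PCIe Gen 3 lane": (8.0, 128 / 130),
}


def payload_mbytes_per_s(link: str, lanes: int = 1) -> float:
    """Return the usable bandwidth in Mbytes/s for `lanes` lanes of `link`."""
    rate_gbits, efficiency = LINKS[link]
    return rate_gbits * 1000 * efficiency / 8 * lanes


if __name__ == "__main__":
    for name in LINKS:
        print(f"{name:16s} {payload_mbytes_per_s(name):7.0f} Mbytes/s")
    x4 = payload_mbytes_per_s("PCIe Gen 3 lane", lanes=4)
    print(f"{'PCIe Gen 3 x4':16s} {x4:7.0f} Mbytes/s")
```

Under these assumptions, a single SATA link carries about 600 Mbytes/s and a 12-Gbit/s SAS link about 1200 Mbytes/s, while PCI Express scales with the number of lanes.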

Dell’s latest R820 system supports PCI Express Flash SSDs that improve the I/O bandwidth (Fig. 2). The SSD Form Factor Working Group (SSD FF WG) defines the form factor, which in turn defines a new connector that exposes PCI Express as well as a pair of SAS/SATA interfaces (see “Storage Still Needs Drives, Flash, SATA, And PCI Express”).

2. Dell’s latest R820 system supports PCI Express Flash SSDs that improve the I/O bandwidth. The SSD Form Factor Working Group defines the form factor.

Interfaces like the one from SSD FF WG change how designers create systems. For example, a system could support this interface but only have PCI Express connections. The kinds of drives that can be plugged in then would be limited because the actual interface used is limited by what the drives support. Currently, only SSDs will support the PCI Express connection.

On the other hand, a SAS or SATA interface on the host could support a range of existing drives. The issue would be whether the controller interface would support only SAS or SATA or dual SAS/SATA connections or whether PCI Express would be part of the mix as well. The dual SAS/SATA only makes sense for a system with at least two compute nodes.

None of this is apparent from the front of the rack. Likewise, virtual machines that are running client operating systems and applications will only see a difference with respect to capacity and performance.

This approach provides flexibility. Cost is often an overriding issue, though, so conventional SAS/SATA support is used. Also, PCI Express support of flash storage is not limited to this interface or hiding behind a SAS or SATA controller. Flash storage on PCI Express cards is common. Standards like NVM Express (see “NVM Express: Flash At PCI Express Speeds”) and SCSI Express (see “The Fundamentals Of Flash Memory Storage”) make this easier to accommodate on the software side. These are the same protocols supported on the SSD FF WG interface.

Cloud System Racks

As the Internet grew, racks of 1U systems with a single processor dominated the market. Blade servers emerged to address modularity and high reliability. Hard drives initially were swappable, but blades allowed the entire system to be swapped. Upgrades were possible by simply swapping blades.

Performance and capacity continued to grow as processor performance increased. Multicore processors made the processing node more formidable. This was key to the success of virtualization since Internet service providers (ISPs) could distribute multiple workloads onto these platforms. However, these higher-performance systems were power-hungry and generated a significant amount of heat.

SeaMicro surprised the world with its SM10000 10U system with 512 Atom cores a number of years ago (see “10U Rack Packs 512 Atoms”). The Atom was a lower-performance chip compared to its multicore Xeon sibling, but it sipped power instead of guzzling it.

The SM10000 incorporated a proprietary, high-speed serial hypercube architecture to connect the Atom chips to each other and to the storage and network interfaces that had an FPGA front end. This was more efficient and compact than the 1U alternatives, but ISPs and cloud providers were already using cooler, low-power solutions. They just took up more room.

SeaMicro’s latest SM10000-64HD includes 64 Intel Xeon-based boards (Fig. 3). The same box can hold 64 Atom-based boards, and the latest board has a multicore Atom. AMD purchased SeaMicro, so look for APU-based processor boards in the future. AMD’s R-Series APU (accelerated processing unit) includes a 384-core GPU (see “APU Blends Quad-Core x86 With 384-Core GPU”). It would have been a challenge to incorporate a conventional GPU into SeaMicro’s board, but the APU is power-efficient and a single chip like the Atom or Xeon.

3. SeaMicro’s SM10000-64HD (a) includes 64 Intel Xeon-based boards (b, bottom). The same rack can hold 64 Atom-based boards (b, top) as a low-power alternative.

GPUs are part of the mix in the cloud, but there are challenges because GPUs lack the virtualization support of CPUs that the cloud is built on. Still, there are advantages to providing software with access to a GPU, especially as clusters of CPUs become the norm for cloud-based services rather than one of many VMs on a processing node. There is little need for security or virtualization if a node has a GPU and only one virtual machine.

The x86 CPU architecture tends to dominate the cloud. The other major architectures include Oracle’s Sparc, IBM’s Power, and Intel’s Itanium, with HP as its major supporter. The possible usurper may be the emerging 64-bit ARM architecture (see “ARM Joins The 64-Bit Club”). Low power has been associated with ARM’s chip architectures, and ARM has dominated the mobile space. It also has a significant presence in the embedded space. There are 32-bit ARM arrays being delivered, but the 64-bit platform is likely needed to make a major dent in the x86 market. Still, there is plenty of room for all these processor architectures.

SeaMicro’s form factor is not the norm. Other approaches are becoming more popular because of the cloud’s need for a range of processing platform and storage combinations.

SuperMicro’s 6027TR-H70RF boasts four X9DRT-HF motherboards with dual Intel E5-2600 Xeon processors. Each processor module has access to three 3.5-in., hot-swap, SATA/SAS drive bays. The processor module is removable from the rear, while the storage is hot-swappable from the front. The power supplies are also hot-swappable.

SuperMicro’s 2027GR-TRF is a more conventional 2U system (Fig. 4). It features an X9DRG-HF (b) motherboard with Intel’s E5-2600 Xeon processor. It also has three x16 PCI Express slots that can accommodate conventional PCI Express GPU cards, along with x8 and x4 slots. Ten 2.5-in. hot-swap drive bays accessible from the front panel provide storage. An 1800-W redundant power supply and five heavy-duty cooling fans reflect the kind of power and cooling that a cloud server demands.

4. SuperMicro’s 2027GR-TRF (a) features an X9DRG-HF (b) motherboard with Intel’s E5-2600 Xeon processor. The system has three x16 PCI Express Gen 3.0 slots suitable for GPUs.

The move from individual servers to an array of servers allows all these approaches to be viable for a range of applications. Blade servers like those from Dell and IBM normally swap the entire system including any storage. They are ideal for high-availability systems where replacements can be added quickly and reliably.

Systems like SuperMicro’s provide a modular capability but at a lower cost. Likewise, conventional 1U rack systems can be logically removed from the cloud and replaced later.

Some applications work best with compute nodes that have their storage elsewhere on the network. Others work best with some local storage, which may need to be fast or slow depending upon the application. NoSQL platforms like Hadoop, with its Hadoop Distributed File System (HDFS), work best when the data node and compute node are the same. Hard drives often work better than SSDs in this instance.
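The preference for keeping data and compute on the same node can be illustrated with a toy placement routine. This is a minimal sketch and not Hadoop’s actual scheduler. The block-to-node map, the node names, and the `pick_node` helper are all hypothetical.

```python
# Toy data-locality placement: prefer an idle node that already stores the block.
from typing import Dict, List, Set


def pick_node(block: str,
              replicas: Dict[str, List[str]],
              idle_nodes: Set[str]) -> str:
    """Choose a node for a task that reads `block`.

    A node holding a local replica is preferred. Otherwise any idle node is
    used and the block must be fetched over the network.
    """
    if not idle_nodes:
        raise ValueError("no idle nodes available")
    for node in replicas.get(block, []):
        if node in idle_nodes:
            return node            # local read, no network transfer
    return sorted(idle_nodes)[0]   # remote read (deterministic fallback)


if __name__ == "__main__":
    replicas = {"blk_1": ["node-a", "node-c"], "blk_2": ["node-b"]}
    # node-c holds a replica of blk_1 and is idle, so it is chosen.
    print(pick_node("blk_1", replicas, {"node-b", "node-c"}))
    # No idle node holds blk_2, so the task falls back to a remote read.
    print(pick_node("blk_2", replicas, {"node-a", "node-c"}))
```

When the local-replica branch is taken, the task reads from the node’s own drives and the rack’s Ethernet switch carries no bulk data for that task.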

Cloud Networking

Networking ties these compute and storage services together. 1G Ethernet is the workhorse for most nodes, with 10G showing up in some high-performance nodes. Systems like SeaMicro’s incorporate 10G switches within the system, but racks normally have an Ethernet switch that ties the nodes within the system together. 10G and now 40G/100G act as the backbone in larger systems.

The challenge lies in switching at these high speeds and handling other network-related chores like load balancing, content filtering, quality-of-service (QoS), virtual local-area network (VLAN) and firewall support. These chores tend to be easier for smaller networks where a single device can handle the bandwidth. Devices that can handle line rates of 40G or more are cutting-edge.

OpenFlow may have an impact in the future. This software-defined network (SDN) approach from the Open Networking Foundation is designed to manage everything from network stacks and routers to switches and virtual switches by providing these devices with a set of forwarding instructions. Not all devices will support OpenFlow, and some vendors have their own approach to managing large cloud networks.
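The idea of handing a device a set of forwarding instructions can be sketched as a match/action table. The model below is deliberately simplified and does not implement the OpenFlow protocol or its message formats. The field names, action strings, and the `FlowEntry` and `FlowTable` classes are illustrative assumptions.

```python
# Simplified match/action flow table in the spirit of software-defined networking.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FlowEntry:
    match: Dict[str, object]   # header fields that must equal the packet's fields
    action: str                # illustrative action string, e.g. "output:2" or "drop"
    priority: int = 0


@dataclass
class FlowTable:
    entries: List[FlowEntry] = field(default_factory=list)

    def install(self, entry: FlowEntry) -> None:
        """Called by a controller to push a forwarding rule to the device."""
        self.entries.append(entry)
        # Keep the highest-priority rules first so lookup can stop at the first hit.
        self.entries.sort(key=lambda e: e.priority, reverse=True)

    def lookup(self, packet: Dict[str, object]) -> Optional[str]:
        """Return the action of the highest-priority matching rule, if any."""
        for entry in self.entries:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                return entry.action
        return None  # table miss: no installed rule covers this packet


if __name__ == "__main__":
    table = FlowTable()
    table.install(FlowEntry({"vlan": 10}, "output:2", priority=1))
    table.install(FlowEntry({"vlan": 10, "tcp_dst": 23}, "drop", priority=10))
    print(table.lookup({"vlan": 10, "tcp_dst": 23}))  # more specific rule wins
    print(table.lookup({"vlan": 10, "tcp_dst": 80}))  # falls through to the VLAN rule
```

The device itself only performs the lookup. Policy such as filtering or VLAN handling lives in whatever software installs the rules.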

InfiniBand usually finds a home in the cloud where HPC resides. Its low latency and overhead make it a better choice than Ethernet. InfiniBand’s Virtual Protocol Interconnect (VPI) can mix Ethernet traffic over an InfiniBand connection.

Part of the challenge when choosing a cloud service provider and deploying large applications on the cloud is how the underlying network, storage, and compute node work together as well as what other applications are using the same cloud. As noted, this mix can vary significantly and therefore radically affect the resulting performance of the system.

So, the “cloud” may be a single term everyone is abusing these days, but under the hood it’s a much different experience. The variety of configurations, chips, storage devices, and networking is mind-boggling and looks to get more complicated in the future.

Storage Still Needs Drives, Flash, SATA, And PCI Express

Solid-state storage radically changed the storage equation in the enterprise. Solid-state disks (SSDs) are significantly faster than hard-disk drives (HDDs), but SSDs are more costly per gigabyte. They also have a limited lifetime, although this varies significantly depending upon whether the SSD is based on single-level cell (SLC), multi-level cell (MLC), or triple-level cell (TLC) memory. Denser flash memory supports a lower number of writes.
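The trade-off between density and endurance can be made concrete with a rough lifetime estimate. The sketch below is illustrative only. The program/erase cycle counts are order-of-magnitude assumptions rather than vendor specifications, and the write-amplification factor and workload figures are hypothetical.

```python
# Rough SSD lifetime estimate from rated program/erase (P/E) cycles.
# Cycle counts are illustrative orders of magnitude, not vendor specifications.
ASSUMED_PE_CYCLES = {"SLC": 100_000, "MLC": 3_000, "TLC": 1_000}


def lifetime_years(capacity_gbytes: float,
                   host_writes_gbytes_per_day: float,
                   cell_type: str,
                   write_amplification: float = 2.0) -> float:
    """Estimate years until the rated P/E cycles are exhausted.

    Assumes ideal wear leveling, so every cell absorbs an equal share of writes.
    """
    total_writable_gbytes = capacity_gbytes * ASSUMED_PE_CYCLES[cell_type]
    nand_writes_per_day = host_writes_gbytes_per_day * write_amplification
    return total_writable_gbytes / nand_writes_per_day / 365


if __name__ == "__main__":
    # Hypothetical workload: a 400-Gbyte drive absorbing 1000 Gbytes of writes a day.
    for cell in ("SLC", "MLC", "TLC"):
        print(f"{cell}: {lifetime_years(400, 1000, cell):.1f} years")
```

With the same capacity and workload, the estimate shrinks in proportion to the assumed cycle count, which is why write-heavy transaction systems favor SLC.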

SSDs make up a major percentage of enterprise and cloud storage solutions. At this point, most of the large-capacity flash storage is in the form of 2.5-in. SATA or SAS flash disk drives. SATA tops out at 6 Gbits/s, and SAS now has a 12-Gbit/s standard. These are fast, but SSDs push these limits.

PCI Express is the alternative used by Micron’s RealSSD P320h and Fusion-io’s ioDrive PCI Express-based flash storage boards (see “The Storage Hierarchy Gets More Complex”). PCI Express provides a fast and scalable interface compared to the single-channel SATA/SAS interface. These devices come with their own device drivers for accessing the onboard storage.

NVM Express is an emerging standard for access to storage using PCI Express (see “NVM Express: Flash At PCI Express Speeds”). It will allow a common driver to access hardware from any vendor. A similar PCI Express interface standard, SCSI Express from the SCSI Trade Association (STA), is in the works. SCSI Express is essentially a standard PCI Express SCSI interface that would be suitable for any type of storage, not just flash memory drives.

The SSD Form Factor Working Group (SSD FF WG) has defined a connector interface and module form factor that takes advantage of PCI Express and SSDs (Fig. 1). It is designed to handle a range of configurations from dual-channel SATA drives to PCI Express-based drives. A device typically will provide only one kind of interface, SATA or PCI Express, but the common connector allows system designers to provide common slots that handle any kind of compatible drive. Power and ground pins are common to either interface. The cost of the connector is on par with those already used on disk drives.

1. The SSD Form Factor Working Group defined a single connector that handles SATA or PCI Express interfaces.

Micron’s 700-Gbyte P320h drive utilizes this new connector (Fig. 2). It has a PCI Express Gen 2 interface. The drive employs SLC flash that targets high-performance storage requirements. Internally, the drive employs a redundant array of independent NAND (RAIN) chips. It consumes 25 W and can deliver 785,000 read input/output operations per second (IOPS).

2. Micron’s P320h drive uses the PCI Express interface found on the SSD Form Factor Working Group’s new connector.
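Converting the drive’s read rate into throughput shows why it uses PCI Express. The 785,000 figure comes from the description above. The 4-Kbyte transfer size is an assumption, since the text does not state a block size. The per-link bandwidths are nominal values after 8b/10b encoding and ignore protocol overhead.

```python
import math

READ_IOPS = 785_000                 # read IOPS quoted for the drive
BLOCK_BYTES = 4096                  # assumed transfer size; not stated in the text
SATA_MBYTES_PER_S = 600             # nominal 6-Gbit/s SATA after 8b/10b encoding
PCIE_GEN2_LANE_MBYTES_PER_S = 500   # nominal 5-GT/s lane after 8b/10b encoding


def throughput_mbytes_per_s(iops: int, block_bytes: int) -> float:
    """Convert an I/O rate into Mbytes/s for a fixed transfer size."""
    return iops * block_bytes / 1e6


def links_needed(mbytes_per_s: float, per_link_mbytes_per_s: float) -> int:
    """Smallest whole number of links (or lanes) that can carry the load."""
    return math.ceil(mbytes_per_s / per_link_mbytes_per_s)


if __name__ == "__main__":
    load = throughput_mbytes_per_s(READ_IOPS, BLOCK_BYTES)
    print(f"Read throughput: {load:.0f} Mbytes/s")
    print(f"SATA links needed: {links_needed(load, SATA_MBYTES_PER_S)}")
    print(f"PCIe Gen 2 lanes needed: {links_needed(load, PCIE_GEN2_LANE_MBYTES_PER_S)}")
```

Under these assumptions, the read load is several times what a single SATA link can carry, while a multi-lane PCI Express Gen 2 connection can absorb it.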

Dell’s latest PowerEdge servers provide sockets for this type of device. As with conventional drives, a frame encloses the drive to provide hot-swap capability. Both PCI Express and SATA support hot-swapping. Dell is one of many companies supporting this new standard. Most disk suppliers will likely offer it as an option alongside existing SATA and SAS interfaces, since the drive form factors are already well established.

Systems will continue to utilize a range of storage form factors from the conventional 2.5-in. SATA/SAS drive to this new, combined interface. Likewise, board-level products will have their place since hot-swap or disk-drive form factors are not always a requirement. Each approach is likely to remain viable for many years to come.