SeaMicro is taking a completely different approach to many-core cluster computing. Rather than follow the pack in chasing large multicore processors, it looked for the processor that delivered the most performance for the least power and settled on the single-core Intel Atom. At 100% CPU utilization, the Atom’s performance per watt is more than three times better than that of multicore Xeon or Opteron solutions.
SeaMicro then packed 512 Atom Z530 processors running at 1.6 GHz, along with 1 Tbyte of DRAM, into a 10U system. The processors are tied together with a supercomputer fabric called a multidimensional torus, similar to the one in IBM’s Blue Gene supercomputer (Fig. 1). The torus interconnect uses PCI Express-style serializer-deserializer (SERDES) links, and the fabric delivers a system bandwidth of 1.28 Tbits/s. Besides connectivity, the torus provides redundancy, since there are multiple paths between any two nodes.
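The article doesn’t give the torus dimensions or its routing scheme. The sketch below assumes a hypothetical 16 × 16 two-dimensional torus (four links per node) purely to show what “wrapping around” means: each node’s neighbors are computed modulo the ring size, so edge nodes have four neighbors like any other, and the hop count between two nodes takes the shorter way around each ring.

```python
# Hypothetical 2D torus; the real SM10000 dimensions are not given in the article.
WIDTH, HEIGHT = 16, 16


def torus_neighbors(x: int, y: int, width: int = WIDTH, height: int = HEIGHT) -> list[tuple[int, int]]:
    """Return the four wraparound neighbors (+x, -x, +y, -y) of node (x, y)."""
    return [
        ((x + 1) % width, y),
        ((x - 1) % width, y),   # Python's % wraps -1 to width - 1
        (x, (y + 1) % height),
        (x, (y - 1) % height),
    ]


def torus_hops(a: tuple[int, int], b: tuple[int, int],
               width: int = WIDTH, height: int = HEIGHT) -> int:
    """Minimum hop count between two nodes, going the shorter way around each ring."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, width - dx) + min(dy, height - dy)


print(torus_neighbors(0, 0))           # [(1, 0), (15, 0), (0, 1), (0, 15)]
print(torus_hops((0, 0), (15, 15)))    # 2
```

With wraparound, nodes (0, 0) and (15, 15) are only two hops apart, and a message can reach the destination through either (15, 0) or (0, 15). Those alternative equal-length routes are where a torus gets its redundancy.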
The $139,000 SM10000 targets Web services, where clusters are common. It incorporates the components traditionally found in a rack of servers, including a load-balancing system, a network switch, and a storage subsystem. This integration reduces space and power requirements by 75%.
SeaMicro calls the entire system Dynamic Compute Allocation Technology (DCAT). It draws only 1 kW on average, with a maximum power requirement of 2 kW. The system is more efficient than competing multicore solutions even at 100% utilization, and its advantage grows significantly when utilization drops to 25%.
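A quick back-of-the-envelope check puts the quoted power figures in per-processor terms. The sketch simply divides system power by the 512 processors; because that total also covers DRAM, ASICs, storage, switches, and fans, the result overstates what each Atom itself draws.

```python
# Figures quoted for the SM10000: 512 Atoms, about 1 kW average, 2 kW maximum.
PROCESSORS = 512
AVERAGE_WATTS = 1000
MAX_WATTS = 2000


def watts_per_processor(system_watts: float, processors: int = PROCESSORS) -> float:
    """Whole-system power divided evenly across processors.

    The total includes memory, ASICs, storage, switches, and fans, so this
    is system power per processor, not the CPU's own power draw.
    """
    return system_watts / processors


print(f"average: {watts_per_processor(AVERAGE_WATTS):.2f} W per processor")  # average: 1.95 W per processor
print(f"maximum: {watts_per_processor(MAX_WATTS):.2f} W per processor")      # maximum: 3.91 W per processor
```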
Four SM10000s fit in a single rack, for a total of 2048 Atoms. The architecture can support any processor, from ARM to PowerPC, but the first incarnation uses Atoms.
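The 2048-Atom rack figure follows from stacking 10U systems of 512 Atoms each. A minimal sketch of the arithmetic, assuming only whole systems can be installed:

```python
ATOMS_PER_SYSTEM = 512
SYSTEM_HEIGHT_U = 10


def rack_units_needed(total_atoms: int) -> tuple[int, int]:
    """Return (systems, rack units) needed to host total_atoms processors."""
    systems = -(-total_atoms // ATOMS_PER_SYSTEM)  # ceiling division
    return systems, systems * SYSTEM_HEIGHT_U


print(rack_units_needed(2048))  # (4, 40)
```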
SeaMicro divided the system into three major components: compute, storage, and communications (Fig. 2). The compute engines are identical boards, each with eight Atoms. Each board also has four SeaMicro ASICs, one for every two Atom processors. Among other tasks, each ASIC connects to four adjacent nodes in the torus.
The system holds 64 boards for a total of 512 Atom processors. The boards are divided into four groups that plug into a pair of backplanes, and they are accessible from the side of the system. Because the system slides out on rails, the boards can be reached even while it is running. All components, including the processor boards, are hot-swappable.
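The board, ASIC, and processor counts for the compute section are consistent with each other, and dividing the 1 Tbyte of DRAM evenly gives the memory per processor. The sketch assumes an even split and treats 1 Tbyte as 1024 Gbytes; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeSection:
    boards: int = 64
    atoms_per_board: int = 8
    atoms_per_asic: int = 2
    board_groups: int = 4
    total_dram_gbytes: int = 1024  # assumption: 1 Tbyte counted as 1024 Gbytes

    @property
    def atoms(self) -> int:
        return self.boards * self.atoms_per_board

    @property
    def asics(self) -> int:
        return self.atoms // self.atoms_per_asic

    @property
    def boards_per_group(self) -> int:
        return self.boards // self.board_groups

    @property
    def dram_per_atom_gbytes(self) -> float:
        # Assumes DRAM is split evenly, one bank per processor.
        return self.total_dram_gbytes / self.atoms


cs = ComputeSection()
print(cs.atoms, cs.asics, cs.boards_per_group, cs.dram_per_atom_gbytes)  # 512 256 16 2.0
```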
The storage section is also board-based. Eight controller boards each handle up to eight drives, which can be any mix of 2.5-in. hard-disk drives (HDDs) and solid-state drives (SSDs). The controllers provide storage-area-network (SAN)-style storage within the system, so disks can be sliced, merged, or otherwise managed. Configurations are even possible in which storage has a single writer and multiple readers.
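The article doesn’t describe how the controllers represent sliced or merged disks. As a hypothetical illustration only, a virtual disk can be modeled as an ordered list of slices from physical drives, with block-address translation and a single-writer, multiple-reader access check. The `Slice` and `VirtualDisk` types are invented for this sketch and are not SeaMicro’s design.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Slice:
    drive: int    # physical drive index (hypothetical numbering)
    start: int    # first block of the slice on that drive
    length: int   # number of blocks in the slice


@dataclass
class VirtualDisk:
    slices: List[Slice]                            # concatenated in order ("merged")
    writer: Optional[int] = None                   # the one processor allowed to write
    readers: Set[int] = field(default_factory=set)

    def size(self) -> int:
        return sum(s.length for s in self.slices)

    def locate(self, block: int) -> Tuple[int, int]:
        """Translate a virtual block number into (drive, physical block)."""
        if block < 0:
            raise IndexError("negative block number")
        for s in self.slices:
            if block < s.length:
                return s.drive, s.start + block
            block -= s.length
        raise IndexError("block beyond end of virtual disk")

    def can_write(self, cpu: int) -> bool:
        return cpu == self.writer

    def can_read(self, cpu: int) -> bool:
        return cpu == self.writer or cpu in self.readers


# A volume merged from slices of two drives, with one writer and two readers.
vd = VirtualDisk([Slice(drive=0, start=1000, length=500),
                  Slice(drive=3, start=0, length=500)],
                 writer=7, readers={8, 9})
print(vd.size(), vd.locate(10), vd.locate(600))
print(vd.can_write(8), vd.can_read(8))
```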
The communications section consists of eight switch boards, each with either eight 1-Gbit/s Ethernet ports or two 10-Gbit/s Ethernet ports. Each switch board also has a pair of FPGA nodes that link it to the torus network.
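The two port options bound the system’s external Ethernet capacity. The sketch below assumes, hypothetically, that each switch board can be configured independently, so mixed configurations are allowed; the option labels are invented for illustration.

```python
SWITCH_BOARDS = 8
# (ports per board, Gbits/s per port) for the two port options
PORT_OPTIONS = {"1GbE": (8, 1), "10GbE": (2, 10)}


def uplink_gbps(config: list[str]) -> int:
    """Aggregate external Ethernet bandwidth for a per-board port configuration."""
    if len(config) != SWITCH_BOARDS:
        raise ValueError(f"expected {SWITCH_BOARDS} boards, got {len(config)}")
    total = 0
    for option in config:
        ports, gbps = PORT_OPTIONS[option]
        total += ports * gbps
    return total


print(uplink_gbps(["1GbE"] * 8))                  # 64
print(uplink_gbps(["10GbE"] * 8))                 # 160
print(uplink_gbps(["1GbE"] * 4 + ["10GbE"] * 4))  # 112
```

For comparison, the fabric’s quoted 1.28 Tbits/s is 1280 Gbits/s, eight times the all-10-Gbit configuration.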
These boards have a T-shaped layout, with a section that plugs into the processor backplanes. Each storage controller has a pair of FPGA nodes that are part of the torus. The board layout also aids cooling: air flows in at the front, across the storage boards, then over the processor boards, and finally past the connectivity boards and power supplies.
The torus essentially wraps around the processor backplanes and extends through the communication and storage nodes on their respective boards (Fig. 3). The node interconnect hardware handles packet routing. This hardware is essentially identical in every node, although the communication and storage nodes are implemented in FPGAs while the processor nodes are implemented in an ASIC.
The system automatically detects connectivity changes as they occur and adjusts routing accordingly, which is what allows a processor board to be replaced while the system is running.
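The article doesn’t say how routes are recomputed. The sketch below swaps in plain breadth-first search over a hypothetical small 2D torus: nodes on a removed board are marked as failed, and the search finds a shortest path through the remaining nodes. It illustrates why a torus can keep traffic flowing when a board is pulled, not how SeaMicro’s hardware actually routes packets.

```python
from collections import deque


def neighbors(node, width, height):
    """Four wraparound neighbors of a node in a width x height torus."""
    x, y = node
    return [((x + 1) % width, y), ((x - 1) % width, y),
            (x, (y + 1) % height), (x, (y - 1) % height)]


def shortest_route(src, dst, width, height, failed=frozenset()):
    """Breadth-first search for a shortest path that avoids failed nodes.

    Returns the list of nodes from src to dst, or None if dst is unreachable.
    """
    if src in failed or dst in failed:
        return None
    previous = {src: None}          # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbors(node, width, height):
            if nxt not in previous and nxt not in failed:
                previous[nxt] = node
                queue.append(nxt)
    return None


print(shortest_route((0, 0), (2, 0), 4, 4))
print(shortest_route((0, 0), (2, 0), 4, 4, failed=frozenset({(1, 0)})))
print(shortest_route((0, 0), (2, 0), 4, 4, failed=frozenset({(1, 0), (3, 0)})))
```

Removing one node only switches traffic to the other way around the ring; removing both direct neighbors forces a longer detour through an adjacent ring.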
Each ASIC handles a pair of Atom processors (Fig. 4). Each processor board holds eight processors and four ASICs, plus DRAM for each processor (Fig. 5). Currently, each Atom requires its matching System Controller Hub (SCH) chip to provide the PCI interface to the ASIC; the SCH’s other peripheral support, such as USB and hard-disk interfaces, goes unused. The processor is actually the smaller of the two chips, so the footprint would shrink further if the CPU vendor eliminated the need for the SCH.
An out-of-band I2C bus, similar to those found in managed platforms such as VPX and AdvancedTCA, handles system configuration and management.
The processor doesn’t need virtualization support, because the ASIC provides virtualized hardware. The ASIC gives the operating system and application software on each processor node access to virtual storage, network, and peer-to-peer communication links. Storage and network connections usually run between a processor and a storage or network node, but this isn’t a requirement: virtual interfaces can also link two processor nodes.
Processor-to-processor communication travels over the fabric, bypassing the network switch. Linux is the initial operating system for the SM10000, though the hardware imposes no operating-system restrictions.
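A hypothetical model of these virtual interfaces: each virtual device the ASIC presents to a processor is a link to another torus node, and the kind of endpoint determines the path its traffic takes. The class names are invented for illustration; only the behavior that storage and network links end at storage or switch nodes, and that peer traffic stays on the fabric and bypasses the Ethernet switch, comes from the article.

```python
from dataclasses import dataclass
from enum import Enum


class NodeKind(Enum):
    PROCESSOR = "processor"
    STORAGE = "storage"   # FPGA node on a storage controller board
    SWITCH = "switch"     # FPGA node on an Ethernet switch board


@dataclass(frozen=True)
class Node:
    kind: NodeKind
    ident: int


@dataclass(frozen=True)
class VirtualLink:
    """A virtual device presented to one processor's OS (hypothetical model)."""
    local: Node    # the processor that sees the virtual device
    remote: Node   # the torus node at the other end

    def describe(self) -> str:
        if self.remote.kind is NodeKind.SWITCH:
            return "virtual NIC: torus to switch board, then external Ethernet"
        if self.remote.kind is NodeKind.STORAGE:
            return "virtual disk: torus to storage controller"
        return "peer link: torus only, bypassing the Ethernet switch"


cpu0 = Node(NodeKind.PROCESSOR, 0)
links = [
    VirtualLink(cpu0, Node(NodeKind.SWITCH, 2)),
    VirtualLink(cpu0, Node(NodeKind.STORAGE, 5)),
    VirtualLink(cpu0, Node(NodeKind.PROCESSOR, 311)),
]
for link in links:
    print(link.describe())
```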
Like any new company, SeaMicro is concentrating on its first product, but success could open up interesting possibilities. The first major change will likely be adopting Intel’s latest Atom, which uses a PCI Express connection to its peripherals.
That is a major change for Intel and a significant advantage for SeaMicro, which could essentially eliminate the eight relatively power-hungry SCH support chips on each processor board. SeaMicro used those chips only to get a PCI interface to its ASIC.
SeaMicro initially targeted clustered Web services, a class of applications to which the architecture lends itself nicely. Other approaches are more efficient for many applications, but this architecture is highly effective in areas such as applications with partitioned workloads.
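In a partitioned workload, each request or data key is owned by exactly one processor. A minimal sketch, assuming simple hash-modulo partitioning across the 512 Atoms (a common technique, not necessarily what SeaMicro’s load balancer does); `node_for_key` is a hypothetical helper.

```python
import hashlib
from collections import Counter

NODES = 512


def node_for_key(key: str, nodes: int = NODES) -> int:
    """Map a key to its owning processor with a stable hash.

    SHA-256 is used instead of Python's built-in hash(), which is
    randomized per process for strings and so not stable across nodes.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nodes


# Spread 100,000 session keys and look at how evenly they land.
counts = Counter(node_for_key(f"session-{i}") for i in range(100_000))
print(len(counts), min(counts.values()), max(counts.values()))
```

Modulo placement remaps most keys whenever the node count changes; consistent hashing is the usual alternative when nodes come and go.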
Supercomputer applications in which data locality is critical and nearest-neighbor communication is useful should work quite well on SeaMicro’s platform. It will be interesting to see how programmers exploit it.
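Nearest-neighbor workloads map naturally onto a torus. The sketch below, an illustration rather than anything from SeaMicro, runs one step of a three-point averaging stencil on data split across a ring of nodes (one dimension of a torus). Each node needs only one edge value from each neighbor, its “halo,” so no data moves beyond adjacent nodes. It assumes every node holds at least one value.

```python
def stencil_step(chunks: list[list[float]]) -> list[list[float]]:
    """One step of a 3-point averaging stencil on data split across a ring of nodes.

    chunks[i] is the data held by node i. Each node reads only the last value
    of its left neighbor and the first value of its right neighbor, and the
    ring wraps around like one dimension of a torus.
    """
    n = len(chunks)
    result = []
    for i, chunk in enumerate(chunks):
        left_halo = chunks[(i - 1) % n][-1]   # edge value from the left neighbor
        right_halo = chunks[(i + 1) % n][0]   # edge value from the right neighbor
        padded = [left_halo] + chunk + [right_halo]
        result.append([(padded[j - 1] + padded[j] + padded[j + 1]) / 3
                       for j in range(1, len(padded) - 1)])
    return result


# Twelve values spread over four nodes, three values per node.
data = [float(v) for v in range(12)]
chunks = [data[k:k + 3] for k in range(0, 12, 3)]
print(stencil_step(chunks))
```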