Keeping cool is imperative in all kinds of applications. That's why designers are increasingly turning to the 64-bit Power architecture, which consistently ranks near the top in performance per watt. Following that trend, PA Semi's single- or dual-core PWRficient PA6T-1682M takes the ratio more than a few steps further, consuming a mere 13 W on average at 2 GHz (Fig. 1).
The PA6T core's very fine-grain clock gating significantly reduces power requirements (Fig. 2). It takes more power to switch a flip-flop than it does to keep it stable. Yet only a small percentage of the flip-flops in a processor change state at the same time. Minimizing unnecessary switching reduces power requirements.
Most advanced processors already use clock gating at a high functional level. This improves efficiency, but even more efficiency is achievable by splitting blocks with a dedicated clock into smaller collections of transistors. PA Semi takes this almost to the extreme with over 25,000 blocks.
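To make the granularity argument concrete, here is a minimal sketch in Python. It is a toy model, not PA Semi's method: it assumes each flip-flop independently needs an update with a fixed probability per cycle, and that a gated block passes the clock to all of its flip-flops whenever any one of them needs an update. The function names, the 2% activity figure, and the block sizes are all hypothetical.

```python
import random


def clocked_flops(needs_update, block_size):
    """Count flip-flops that receive a clock edge in one cycle.

    A gated block is clocked if any flip-flop in it needs to update,
    in which case every flip-flop in that block sees the clock edge.
    """
    total = 0
    for start in range(0, len(needs_update), block_size):
        block = needs_update[start:start + block_size]
        if any(block):
            total += len(block)
    return total


def average_clocked_fraction(num_flops, activity, block_size, cycles, seed=0):
    """Average fraction of flip-flops clocked per cycle (toy model).

    `activity` is the assumed probability that a flip-flop needs an
    update in a given cycle. The seed keeps runs repeatable.
    """
    rng = random.Random(seed)
    clocked = 0
    for _ in range(cycles):
        needs_update = [rng.random() < activity for _ in range(num_flops)]
        clocked += clocked_flops(needs_update, block_size)
    return clocked / (num_flops * cycles)


if __name__ == "__main__":
    # Hypothetical design: 1024 flip-flops, 2% of them active per cycle.
    for block_size in (1024, 256, 16, 1):
        frac = average_clocked_fraction(1024, 0.02, block_size, 100)
        print(f"block size {block_size:4d}: {frac:.1%} of flip-flops clocked")
```

In this model, shrinking the block size never increases the number of clocked flip-flops, because a small block is clocked only when the larger block containing it would have been clocked anyway. Real designs have clustered rather than independent activity, so the actual savings depend heavily on the workload.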
This fine-grain approach increases the number of gates needed to support clock gating, though the increase is small relative to the overall processor. Now that transistor count is relatively unimportant, power management and diagnostics can afford techniques that add such overhead to improve overall system efficiency.
In this case, the payoff is considerable. Coarse clock gating can often reduce power requirements by 40% (Fig. 3). The fine-grain design does better than that at its maximum power, and its average savings are greater still.
The amount of power the system requires will vary depending on the program being executed, which is why it's important to know the limits. PA Semi's designers put together worst-case tests, quaintly named a "thermal virus," to see how well or how poorly the new design would work. Even here, the results were significantly lower than a coarse-grain approach.
The fine-grain approach isn't easily applied to existing designs. PA Semi used a number of techniques to generate gated clock blocks, such as augmented register and logic definitions that incorporate the gated clock architecture. The process also required routing a larger number of clock and control signals throughout the chip.
Rather than significantly altering the design process, the change required greater awareness of the approach throughout the design. The approach partitions the power plane so voltages can be optimized per region. It isn't just a matter of using new design blocks.
PERFORMANCE STILL MATTERS
PA Semi didn't slow down the clocks or skimp on peripherals for its first chip (Fig. 4). The level 1 and 2 caches, as well as the speed, are on par with other Power architecture chips.
The dual DDR2 memory controllers provide access to off-chip memory and deliver data across the Conexium Interchange, a high-speed on-chip switch with a 64-Gbyte/s peak data rate and up to 1 Gtransaction/s. The controllers use active and pre-charge standby to reduce power consumption.
The processor is compatible with the Power architecture, including support for virtualization. It's a superscalar, out-of-order design with a strongly ordered memory model and minimized use of content-addressable memories (CAMs). It supports a host of power-down modes as well.
The SMbus and UART interfaces are low-speed compared to the 24 high-speed serializer-deserializer (SERDES) units in the Envio intelligent I/O subsystem. These SERDES serve eight PCI Express ports supporting one to 16 lanes, dual 10-Gbit Ethernet interfaces, and quad 1-Gbit Ethernet interfaces. The SERDES can't handle all of these interfaces at once, but they will support various combinations, with each interface using one or more SERDES.
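The SERDES budgeting described here can be sketched as a simple lane count. This is an illustrative Python sketch under stated assumptions, not PA Semi's configuration rules: it assumes one SERDES per PCI Express lane, four SERDES per 10-Gbit Ethernet port (as with a XAUI link), and one SERDES per 1-Gbit Ethernet port (as with SGMII). The function names and the list of allowed PCI Express widths are hypothetical; only the totals of 24 SERDES, eight PCI Express ports, two 10-Gbit ports, and four 1-Gbit ports come from the article.

```python
TOTAL_SERDES = 24           # from the article
MAX_PCIE_PORTS = 8          # from the article
MAX_10GBE_PORTS = 2         # from the article
MAX_1GBE_PORTS = 4          # from the article

# Assumptions for illustration only (not from PA Semi documentation):
PCIE_WIDTHS = (1, 2, 4, 8, 16)   # allowed lanes per PCI Express port
SERDES_PER_10GBE = 4             # XAUI-style link uses four lanes
SERDES_PER_1GBE = 1              # SGMII-style link uses one lane


def serdes_required(pcie_widths, num_10gbe=0, num_1gbe=0):
    """Total SERDES consumed by a proposed interface mix."""
    return (sum(pcie_widths)
            + SERDES_PER_10GBE * num_10gbe
            + SERDES_PER_1GBE * num_1gbe)


def fits(pcie_widths, num_10gbe=0, num_1gbe=0):
    """Return True if the mix respects port limits and the SERDES budget."""
    if len(pcie_widths) > MAX_PCIE_PORTS:
        return False
    if any(width not in PCIE_WIDTHS for width in pcie_widths):
        return False
    if num_10gbe > MAX_10GBE_PORTS or num_1gbe > MAX_1GBE_PORTS:
        return False
    return serdes_required(pcie_widths, num_10gbe, num_1gbe) <= TOTAL_SERDES


if __name__ == "__main__":
    # One x16 PCI Express port plus both 10-Gbit ports uses the whole budget.
    print(fits([16], num_10gbe=2))
    # Adding a x4 PCI Express port to that mix exceeds it.
    print(fits([16, 4], num_10gbe=2))
```

Under these assumptions, enabling every interface at full width would need far more than 24 SERDES, which is consistent with the article's point that only certain combinations are possible.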
The subsystem also incorporates offload engines that support RAID, TCP/IP, and encryption, including AES, DES, 3DES, ARC4, and Kasumi, plus SHA-1, SHA-256, and MD5 hashing. Also, on-chip trace support augments the JTAG debugging. There are trace buffers for transactions on the Conexium Interchange and the Envio peripherals.
SMALLER, COOLER, FASTER
The PA6T-1682M has generated quite a bit of interest, especially for compact, high-performance form factors like VPX (VITA 46). The 3U form factor requires high-density layouts. However, cooling tends to be a major issue. This makes the clock-gated chip ideal for a 3U VPX single-board computer (SBC).
Extreme Engineering Solutions' XPedite8070 is one of the first 3U VPX SBCs available. Possibly along with one of its siblings, it fits nicely into the stylish XPand1000 development system (Fig. 5). The XPedite8070 features a single 1.5-GHz dual-core PA6T-1682M chip with 2 Gbytes of DDR2 memory, 32 Mbytes of NOR flash, and 1 Gbyte of NAND flash.
The board needs only 34 W. The fabric interface uses PCI Express or a 10-Gbit Ethernet XAUI fabric interconnect. Also, the board brings out dual SGMII and dual, isolated Gbit Ethernet ports. The serial ports are accessible via USB interfaces. Software support is available for Linux and Wind River's VxWorks.
Also in the 3U VPX arena, Curtiss-Wright Controls Embedded Computing's VPX3-125 uses the same 1.5-GHz single- or dual-core chips with a similar DDR2 complement. It has 128 Mbytes of NOR flash, 1 Gbyte of NAND flash, and 512 kbytes of nonvolatile memory (NVRAM).
The VPX3-125 exposes the serial ports, a pair of x4 PCI Express lanes, and a pair of 1-Gbit Ethernet ports. The board also has a USB 2.0 host port and digital I/O ports. Expansion is possible via an XMC/PMC site. It comes in a VPX-REDI (VITA 48) form factor.
As the Extreme Engineering Solutions and Curtiss-Wright Controls Embedded Computing offerings indicate, PA Semi has struck a chord with its high-performance, low-power solution. Expect to see quite a few additions to this list, especially in other form factors where the Power architecture is already popular.
PA Semi PA6T-1682M
Architecture: 64-bit Power
Speed: 2 GHz
Cores: one or two
Memory: dual DDR2 controllers
Cache: 64-kbyte instruction and 64-kbyte data L1 per core, 2 Mbytes L2 shared
Power: 13 W typical, 25 W max at 2 GHz; 6 W typical, 10 W max at 1 GHz
Peripherals: three SMbus, two UART, boot bus
High-speed peripherals: 24 configurable SERDES for eight PCI Express ports, dual 10-Gbit Ethernet, quad 1-Gbit Ethernet