Only a few companies have invested in an architectural license for the ARM processor. Faraday Technology Corp., one of those elite few, has revamped the ARM processor around the ARMv4 architecture to deliver a synthesizable CPU that runs at up to 500 MHz under worst-case conditions. The FA626 core is closely tied to the UMC foundry's 130-nm CMOS process. It dissipates only 0.6 mW/MHz when active and draws just 0.7 mA in standby mode.
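The power figure turns into an active-power budget with simple arithmetic. The sketch below assumes, as a first-order model, that active power scales linearly with clock frequency at the quoted 0.6 mW/MHz. The function name is illustrative, and real dissipation also varies with workload and supply voltage.

```python
# First-order active-power estimate for the FA626.
# Assumption: power scales linearly with clock frequency at the quoted
# 0.6 mW/MHz; workload and supply-voltage effects are ignored.
MW_PER_MHZ = 0.6  # quoted active-power coefficient


def active_power_mw(freq_mhz: float, mw_per_mhz: float = MW_PER_MHZ) -> float:
    """Return the estimated active power in milliwatts at freq_mhz."""
    if freq_mhz < 0:
        raise ValueError("frequency must be non-negative")
    return mw_per_mhz * freq_mhz


if __name__ == "__main__":
    for freq in (100, 250, 500):
        print(f"{freq} MHz -> {active_power_mw(freq):.0f} mW")
```

At the 500-MHz maximum, this estimate comes to roughly 300 mW.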
To achieve a much higher level of performance than the standard ARM core, Faraday's designers broke away from using the ARM AMBA bus for CPU-to-memory interfaces. Instead, they developed a novel on-chip switch fabric dubbed the M-Hub (see the figure).
The M-Hub provides a 64-bit, 166-MHz synchronous bus interface optimized to connect to DDR333 DRAMs. Because the fabric creates point-to-point connections among the bus masters attached to it, bandwidth scales to support the high-performance network and video interfaces expected in the applications the FA626 targets.
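Two pieces of arithmetic make the bandwidth claim concrete: the peak rate of the DRAM interface, and how many transfer slots a batch of requests needs on a shared bus versus a point-to-point fabric. The sketch assumes a 64-bit path transferring on both edges of a 166-MHz clock, and the master and target names are hypothetical. It models an idealized fabric, not the M-Hub's actual arbitration.

```python
from collections import Counter

# Assumptions: a 64-bit data path clocked at 166 MHz that transfers on both
# clock edges, as a DDR333 DRAM interface does.
BUS_WIDTH_BITS = 64
CLOCK_MHZ = 166
TRANSFERS_PER_CLOCK = 2


def peak_bandwidth_mb_s(width_bits: int = BUS_WIDTH_BITS,
                        clock_mhz: int = CLOCK_MHZ,
                        transfers_per_clock: int = TRANSFERS_PER_CLOCK) -> int:
    """Peak bandwidth in Mbytes/s: bytes per transfer x transfers per second."""
    return (width_bits // 8) * clock_mhz * transfers_per_clock


def shared_bus_slots(requests):
    """A single shared bus carries one transfer per slot, so transfers serialize."""
    return len(requests)


def crossbar_slots(requests):
    """Minimum slots on an ideal point-to-point fabric.

    Each master and each target can take part in one transfer per slot.
    In this bipartite setting the minimum equals the load on the busiest
    port (König's edge-coloring theorem).
    """
    if not requests:
        return 0
    masters = Counter(master for master, _ in requests)
    targets = Counter(target for _, target in requests)
    return max(max(masters.values()), max(targets.values()))


if __name__ == "__main__":
    # Hypothetical masters and targets, for illustration only.
    reqs = [("cpu", "ddr"), ("video", "sram"), ("net", "periph"), ("cpu", "sram")]
    print(peak_bandwidth_mb_s())   # 2656
    print(shared_bus_slots(reqs))  # 4
    print(crossbar_slots(reqs))    # 2
```

Under those assumptions the peak is 8 bytes × 166 MHz × 2 = 2,656 Mbytes/s (about 2.7 Gbytes/s at DDR333's nominal 166.67-MHz clock). In the example, four transfers need four slots on a shared bus but only two on the fabric, because the busiest ports (the CPU and the SRAM) each carry two.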
The M-Hub also uses different bandwidth and latency filters, so each master can request guaranteed performance from the double-data-rate (DDR) DRAM. Designers can add more level 2 cache as well to increase available memory bandwidth. An AHB/APB bus bridge provides access to a wide range of peripheral IP blocks.
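One way to picture per-master guarantees is a frame-based arbiter. Each master reserves a number of slots per frame (its bandwidth share) and declares a latency target, and masters with tighter latency targets are served first while they still hold credit. Faraday hasn't detailed the M-Hub's filter design here, so the `Master` fields and the `arbitrate_frame` policy below are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Master:
    """Hypothetical fabric client with a bandwidth reservation and latency target."""
    name: str
    slots_per_frame: int  # reserved bandwidth: guaranteed grants per frame
    max_latency: int      # latency target; smaller means more urgent
    pending: int = 0      # outstanding requests
    credit: int = 0       # reserved grants left in the current frame


def arbitrate_frame(masters: List[Master], frame_len: int) -> List[Optional[str]]:
    """Grant one master per slot for a single frame and return the grant sequence."""
    if sum(m.slots_per_frame for m in masters) > frame_len:
        raise ValueError("reservations exceed the frame length")
    for m in masters:
        m.credit = m.slots_per_frame  # bandwidth filter: refill reservations
    grants: List[Optional[str]] = []
    for _ in range(frame_len):
        eligible = [m for m in masters if m.pending and m.credit]
        if not eligible:
            # Leftover slots go to any master with work, best effort.
            eligible = [m for m in masters if m.pending]
        if not eligible:
            grants.append(None)  # idle slot
            continue
        # Latency filter: the most urgent eligible master wins; ties go to list order.
        winner = min(eligible, key=lambda m: m.max_latency)
        winner.pending -= 1
        winner.credit = max(0, winner.credit - 1)
        grants.append(winner.name)
    return grants


if __name__ == "__main__":
    ms = [Master("video", 4, 2, pending=10),
          Master("cpu", 2, 1, pending=10),
          Master("net", 2, 8, pending=1)]
    print(arbitrate_frame(ms, 8))
```

In the example, every master receives at least its reservation (or all of its pending work), and the one leftover slot goes to the most latency-sensitive master still waiting.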
Another architectural tweak places the cache coherency engine centrally, near the DDR DRAM controller, rather than at each bus master. As a result, customers who add their own bus master functions needn't implement cache coherency in each of them. The central engine keeps data consistent across the entire memory space without significant design effort.
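The benefit of a central engine is easiest to see in a toy model. Because every access passes through one directory in front of the DRAM controller, a custom master with no coherency logic of its own can still write shared data, and the engine invalidates stale cached copies on its behalf. The class below is a hypothetical sketch that assumes write-through caches and invalidate-on-write; it is not Faraday's published design.

```python
from collections import defaultdict
from typing import Dict, Set


class CentralCoherencyEngine:
    """Toy directory placed in front of the DRAM controller (hypothetical).

    Assumes write-through caches: memory always holds the latest value,
    and the directory invalidates other cached copies on every write.
    """

    def __init__(self) -> None:
        self.memory: Dict[int, int] = {}
        self.sharers: Dict[int, Set[str]] = defaultdict(set)  # addr -> caching masters
        self.caches: Dict[str, Dict[int, int]] = {}           # master -> its cached lines

    def attach_cache(self, master: str) -> None:
        """Register a master that has a cache; others access memory directly."""
        self.caches[master] = {}

    def read(self, master: str, addr: int) -> int:
        cache = self.caches.get(master)
        if cache is not None and addr in cache:
            return cache[addr]              # cache hit
        value = self.memory.get(addr, 0)    # miss: fetch from DRAM
        if cache is not None:
            cache[addr] = value
            self.sharers[addr].add(master)  # directory records the new sharer
        return value

    def write(self, master: str, addr: int, value: int) -> None:
        # Invalidate every other cached copy so no master sees stale data.
        for other in self.sharers[addr] - {master}:
            self.caches[other].pop(addr, None)
        self.sharers[addr] &= {master}
        self.memory[addr] = value           # write-through to DRAM
        cache = self.caches.get(master)
        if cache is not None:
            cache[addr] = value
            self.sharers[addr].add(master)


if __name__ == "__main__":
    engine = CentralCoherencyEngine()
    engine.attach_cache("cpu")
    print(engine.read("cpu", 0x100))  # 0, now cached by the CPU
    engine.write("dma", 0x100, 42)    # hypothetical cacheless custom master
    print(engine.read("cpu", 0x100))  # 42, the stale copy was invalidated
```

The custom "dma" master needs no coherency logic at all; the directory does the invalidation for it.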
When clocked at 500 MHz, the FA626 delivers performance comparable to the MIPS 24K, PowerPC 440, and other high-performance RISC CPUs, but with the ARM instruction-set architecture. The processor core employs an eight-stage pipeline, 32-kbyte instruction and data caches, a branch-target buffer, and nonblocking loads. Together, these features deliver 1.35 Dhrystone MIPS/MHz.
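The per-megahertz rating converts into absolute throughput with one multiplication. The sketch uses the standard Dhrystone convention that 1 DMIPS corresponds to the VAX 11/780's 1,757 Dhrystones per second; the function names are illustrative.

```python
# Dhrystone MIPS are normalized to the VAX 11/780, which ran
# 1,757 Dhrystone iterations per second and is defined as 1 DMIPS.
VAX_11_780_DHRYSTONES_PER_SEC = 1757


def dmips(dmips_per_mhz: float, freq_mhz: float) -> float:
    """Total Dhrystone MIPS at a given clock frequency."""
    return dmips_per_mhz * freq_mhz


def dhrystones_per_sec(dmips_value: float) -> float:
    """Convert DMIPS back into raw Dhrystone iterations per second."""
    return dmips_value * VAX_11_780_DHRYSTONES_PER_SEC


if __name__ == "__main__":
    total = dmips(1.35, 500)
    print(f"{total:.0f} DMIPS")                              # 675 DMIPS
    print(f"{dhrystones_per_sec(total):,.0f} Dhrystones/s")  # 1,185,975 Dhrystones/s
```

So at 500 MHz, 1.35 DMIPS/MHz works out to about 675 DMIPS.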
To minimize power consumption, the designers optimized the circuits for UMC's Fusion multiple-threshold-voltage process. Because the process offers two transistor threshold voltages, fast low-threshold transistors can be used where performance matters and low-leakage high-threshold transistors elsewhere, optimizing for both high performance and low power.
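The performance/leakage trade-off behind a dual-threshold process can be sketched as a toy assignment pass. Every cell starts as a slow, low-leakage high-Vt cell, and cells on paths that miss the clock period are swapped to fast, leaky low-Vt versions. The greedy heuristic, the `Cell` fields, and the delay numbers below are hypothetical and are not taken from UMC's Fusion process or Faraday's design flow.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    """Hypothetical standard cell available in two threshold-voltage flavors."""
    name: str
    delay_hvt: float  # ns, high-Vt: slower but low leakage
    delay_lvt: float  # ns, low-Vt: faster but leakier
    low_vt: bool = False

    @property
    def delay(self) -> float:
        return self.delay_lvt if self.low_vt else self.delay_hvt


def path_delay(path: List[Cell]) -> float:
    return sum(cell.delay for cell in path)


def assign_thresholds(paths: List[List[Cell]], period_ns: float) -> None:
    """Greedy dual-Vt pass: swap cells to low-Vt only where a path misses timing."""
    for path in paths:
        # Try the cells with the largest speed gain first (ties keep path order).
        by_gain = sorted(path, key=lambda c: c.delay_hvt - c.delay_lvt, reverse=True)
        for cell in by_gain:
            if path_delay(path) <= period_ns:
                break
            cell.low_vt = True
        if path_delay(path) > period_ns:
            raise ValueError("path misses timing even with all cells at low Vt")


if __name__ == "__main__":
    critical = [Cell(f"c{i}", 0.6, 0.45) for i in range(4)]  # 2.4 ns at high Vt
    relaxed = [Cell(f"r{i}", 0.6, 0.45) for i in range(2)]   # 1.2 ns at high Vt
    assign_thresholds([critical, relaxed], period_ns=2.0)    # 500-MHz clock
    print([c.name for c in critical + relaxed if c.low_vt])  # ['c0', 'c1', 'c2']
```

Only three of the six cells end up at low Vt, so leakage rises only where the 2-ns (500-MHz) timing target demands it.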
Single-use licenses for the FA626 cost $500,000, including the CPU and level 1 data and instruction caches, the coprocessor interface, and the DDR controller and cache coherency engine (with streaming cache and a phase-locked loop). CPU cache size is configurable from 8 to 32 kbytes. The company also plans to offer a version of the core on UMC's 180-nm process.
Faraday Technology Corp.