Massively Parallel CPU Array Targets High-End Communications

Broadcom’s XLP900 architecture can translate into a massively parallel processing array that has been augmented to handle high-end switching communications and networking chores.

William G. Wong

July 22, 2013

These days, network processing is all about high-speed interconnects and the number of cores designers can bring to bear on the packet-processing problem. Companies like Tilera try to pack everything into a single chip (see “72-Core Platform Targets Networking Chores”), with multiple versions offering different core counts.

Broadcom’s XLP900 series processors also incorporate a lot of cores into a single chip, but they provide a mechanism to build larger arrays to handle more demanding environments. The series packs 80 cores per chip, and up to eight chips can be combined in an array for a total of 640 cores (Fig. 1). This array of 64-bit MIPS-based nxCPU cores can deliver 1.28 Tbits/s of memory-coherent performance. Each chip delivers 160 Gbits/s while executing more than 1 trillion operations/s (TOPS).

Figure 1. Broadcom’s XLP900 has up to 80 cores linked to a fast messaging network and distributed interconnect that supports hardware virtualization across an array with up to 640 cores. Each chip brings four DDR3 controllers and a collection of network accelerators in addition to network and peripheral interfaces.
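The array totals quoted above follow from multiplying the per-chip figures by the chip count. The short Python sketch below reproduces that arithmetic. It assumes throughput scales linearly with the number of chips, which is what the article's numbers imply; the constant and function names are illustrative and do not come from Broadcom.

```python
# Per-chip figures are taken from the article; names are illustrative only.
CORES_PER_CHIP = 80
GBITS_PER_CHIP = 160   # Gbits/s delivered by one chip
MAX_CHIPS = 8          # largest array described in the article


def array_totals(chips):
    """Return (total cores, aggregate Tbits/s) for an array of `chips` chips.

    Assumes linear scaling with chip count (an assumption, not a measured result).
    """
    if not 1 <= chips <= MAX_CHIPS:
        raise ValueError(f"array size must be between 1 and {MAX_CHIPS} chips")
    cores = chips * CORES_PER_CHIP
    tbits = chips * GBITS_PER_CHIP / 1000
    return cores, tbits


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        cores, tbits = array_totals(n)
        print(f"{n} chip(s): {cores} cores, {tbits:.2f} Tbits/s")
```

For the full eight-chip array, this gives the 640 cores and 1.28 Tbits/s cited in the text.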

The XLP900 is built around a three-level coherent cache system that is linked by a 2D distributed interconnect. The quad-issue, quad-threaded cores use a superscalar, out-of-order design with virtualization support that is compatible with the Linux Kernel-based Virtual Machine (KVM) and Quick EMUlator (QEMU). The system also supports I/O virtualization, including PCI Express Gen 3 single-root I/O virtualization (SR-IOV) with 255 virtual functions/port.

Each nxCPU cluster has four cores with L1 and L2 caches for the cluster. Power gating operates on a per core basis. There is a single L3 cache for the chip, but the cache coherency system spans multiple chips. The four DDR3 controllers on each chip deliver a bandwidth of 68.25 Gbytes/s. The memory system is linked to all processors via an inter-chip interface.
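To put the memory figures in perspective, the sketch below splits the quoted 68.25 Gbytes/s across controllers, clusters, and cores. The even split is an assumption made for illustration; real traffic will not divide this neatly. The final comparison against a 64-bit DDR3-2133 channel is also an assumption, since the article does not state the memory speed grade.

```python
def memory_breakdown(chip_gbytes=68.25, controllers=4, cores=80, cores_per_cluster=4):
    """Split the per-chip DDR3 bandwidth evenly (an illustrative assumption)."""
    clusters = cores // cores_per_cluster
    return {
        "clusters": clusters,
        "per_controller": chip_gbytes / controllers,  # Gbytes/s
        "per_cluster": chip_gbytes / clusters,        # Gbytes/s
        "per_core": chip_gbytes / cores,              # Gbytes/s
    }


if __name__ == "__main__":
    for name, value in memory_breakdown().items():
        print(f"{name}: {value}")

    # Assumption: a 64-bit channel at 2133 MT/s moves 8 bytes per transfer.
    ddr3_2133_gbytes = 2133e6 * 8 / 1e9
    print(f"64-bit DDR3-2133 channel (assumed): {ddr3_2133_gbytes:.3f} Gbytes/s")
```

Each controller works out to about 17 Gbytes/s, which is close to the peak rate of a single 64-bit DDR3-2133 channel. The 80 cores group into 20 clusters of four.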

This approach is similar to AMD’s HyperTransport on its Opteron server chips and what Intel incorporates in its Xeon processors. Each chip has three inter-chip interconnects that allow up to eight chips to be combined into a single array (Fig. 2). No additional hardware is needed to build the eight-chip array.

Figure 2. Three bidirectional, high-speed inter-chip interfaces can link up to eight chips into an array that contains 640 cores.
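Three links per chip are enough to connect eight chips directly. One arrangement that satisfies those numbers is a three-dimensional hypercube, in which each chip sits at a corner of a cube. The article does not say which topology Broadcom actually uses, so the Python sketch below is only an illustration of how such an array could be wired; the function names are hypothetical.

```python
from collections import deque


def hypercube_links(dimensions=3):
    """Map each chip ID to its neighbors; IDs that differ in exactly one bit are linked."""
    return {
        chip: [chip ^ (1 << bit) for bit in range(dimensions)]
        for chip in range(1 << dimensions)
    }


def hop_count(links, src, dst):
    """Breadth-first search for the fewest inter-chip hops from src to dst."""
    distance = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return distance[node]
        for neighbor in links[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    raise ValueError("destination unreachable")


if __name__ == "__main__":
    links = hypercube_links()
    for chip, neighbors in links.items():
        print(f"chip {chip} -> {neighbors}")
    worst = max(hop_count(links, 0, dst) for dst in links)
    print("worst-case hops from chip 0:", worst)
```

In this assumed layout, every chip uses exactly three links and any chip can reach any other in at most three hops.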

The inter-chip links handle a range of traffic, including caching information. They also support the fast messaging network (FMN), which links peripherals to cores throughout the interconnect. The packet-processing engine handles network peripherals with interfaces that include XLAUI, KR4, XFI, XAUI, RXAUI, and HiGig2. There is also IEEE 1588-compatible hardware time stamping and support for SyncE, MACSec, and PFC.

The hardware accelerators are specialized processors designed to operate with minimal CPU intervention. They target packet-processing chores and include compression/decompression and regular-expression processors, which enable full-speed deep packet inspection (DPI). The security and encryption support covers RSA1K, RSA2K, RSA4K, RSA8K, and ECC. The RAID engine targets storage applications, which is where the SATA and PCI Express interfaces come into play. Other peripheral interfaces such as USB 3.0 can be used as well.

These network communication processors will likely handle packet-processing chores with data moving on to another host, but the architecture allows this type of processing to stay within the node. Linux virtualization can handle hundreds of virtual machines that are close to the network traffic and protected from each other.

About the Author

William G. Wong

Senior Content Director - Electronic Design and Microwaves & RF

I am Editor of Electronic Design focusing on embedded, software, and systems. As Senior Content Director, I also manage Microwaves & RF and I work with a great team of editors to provide engineers, programmers, developers and technical managers with interesting and useful articles and videos on a regular basis. Check out our free newsletters to see the latest content.

You can send press releases for new products for possible coverage on the website. I am also interested in receiving contributed articles for publishing on our website. Use our template and send to me along with a signed release form.

Check out my blog, AltEmbedded on Electronic Design, as well as my latest articles on this site that are listed below.

I earned a Bachelor of Electrical Engineering at the Georgia Institute of Technology and a Master's in Computer Science from Rutgers University. I still do a bit of programming using everything from C and C++ to Rust and Ada/SPARK. I do a bit of PHP programming for Drupal websites. I have posted a few Drupal modules.

I still get hands-on with software and electronic hardware. Some of this can be found on our Kit Close-Up video series. You can also see me on many of our TechXchange Talk videos. I am interested in a range of projects from robotics to artificial intelligence.