Versal: A New Level of Compute Configurability


Xilinx lifted the veil on Everest, the Adaptive Compute Acceleration Platform (ACAP), which is now known as Versal.

Everest, the Adaptive Compute Acceleration Platform (ACAP) from Xilinx, has officially been unveiled, and it’s now called Versal (Fig. 1). It greatly extends Xilinx’s Zynq UltraScale+ MPSoC, which has multiple hard-core Arm processors as well as complementary hard-core peripherals.

A number of Versal’s features make it quite different from Zynq UltraScale+, though. First, the processing cores, which include Arm Cortex-A72s and real-time Cortex-R5s, are tied to the Platform Management Controller (PMC). Second, the processing complex is fully functional, so it can boot and then use the PMC to configure the rest of the chip. This wasn’t the case with earlier Xilinx platforms. The approach can greatly simplify development and deployment because the software has control over the system and runs on a known configuration.
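To make the boot order concrete, here's a toy model of the sequence described above: the processing complex comes up first, and only then does software ask the PMC to configure the fabric. The class, method names, and image name are hypothetical illustrations, not a Xilinx API, and the model ignores everything the real PMC does internally.

```python
class ToyVersalBoot:
    """Hypothetical model: processors boot first, then use the PMC to load the fabric."""

    def __init__(self) -> None:
        self.log: list[str] = []
        self.processors_up = False

    def boot_processors(self) -> None:
        # Cortex-A72/R5 software starts on a known, fixed configuration.
        self.processors_up = True
        self.log.append("processors booted")

    def pmc_configure_fabric(self, image_name: str) -> None:
        # Software, already in control, asks the PMC to configure the fabric.
        if not self.processors_up:
            raise RuntimeError("processing complex must boot before configuring the fabric")
        self.log.append(f"PMC configured fabric with {image_name}")


boot = ToyVersalBoot()
boot.boot_processors()
boot.pmc_configure_fabric("accel.bin")  # "accel.bin" is a made-up image name
print(boot.log)
```

The point of the guard is the ordering itself: because software runs before the fabric is loaded, it can decide what to configure and when.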

1. Versal is the implementation of Xilinx’s Adaptive Compute Acceleration Platform (ACAP).

The adaptable-hardware (AH) component is essentially the FPGA we have come to know and love, but it has been enhanced and optimized for its new use within a much larger compute environment. It still provides a fully configurable fabric, and migration is relatively straightforward, although there are differences. One major difference is the network-on-a-chip (NoC) that ties everything together.

The NoC has AXI hooks into the AH (I still think we need to call it an FPGA), but it also links the AH to everything else on the chip, including the AI and DSP engines. The movement of DSP support will likely be the most difficult aspect of migrating from existing FPGAs to Versal. However, the NoC provides a consistent interface to all devices. Also, AXI is an interface Xilinx has already standardized on, so even soft devices on existing FPGAs would look the same as any hard or soft device in Versal.

Other changes to the AH include a customizable memory hierarchy and dynamic reconfiguration that’s eight times faster than on existing platforms. This is key for reusing the AH, and it makes reconfiguring a system more attractive. Slow or difficult reconfiguration often eliminates it as an option for a particular application.

2. The Versal family has six incarnations, with AI Core and Prime being the first two out the gate.

The Versal family currently consists of half a dozen configurations, two of which have been detailed so far (Fig. 2): Versal AI Core and Versal Prime.

AI Core and Prime

The Versal AI Core is the midrange artificial-intelligence (AI) platform that includes AI acceleration. The Versal Prime series lacks the AI acceleration (Fig. 3). Other series in the AI group will augment the AI components; for example, the AI RF series adds high-speed analog support. This is similar to the Zynq UltraScale+ RFSoC, with its multi-gigasample/s analog-to-digital and digital-to-analog converters (ADCs and DACs). The Prime, Premium, and HBM (high-bandwidth memory) versions have no or limited hardware-accelerated AI support, but add features such as HBM, which is common in high-performance computing (HPC) platforms.

3. Versal Prime (left) forgoes AI acceleration that’s included in AI Core (right).

The AI engines in the AI Core are 1-GHz VLIW/SIMD vector processing cores, each with its own memory (Fig. 4). They are connected in an array that is separate from the NoC. The tightly coupled memory can be organized into different memory hierarchies. The hardware and software are programmable to support current and emerging AI deep-neural-network (DNN) models.
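One way to picture an array of vector cores with tightly coupled memory is a matrix-vector product split across tiles, where each tile keeps its slice of the weights in local memory and only the input vector is shared. The sketch below is a generic illustration under that assumption, not Xilinx's programming model; `Tile` and `tiled_matvec` are hypothetical names, and the loop runs sequentially where hardware tiles would run in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Tile:
    """Hypothetical stand-in for one AI engine: a core plus its own local memory."""
    rows: list[list[float]] = field(default_factory=list)  # weight rows held locally

    def compute(self, x: list[float]) -> list[float]:
        # One dot product per locally stored row; real hardware would vectorize this.
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.rows]


def tiled_matvec(weights: list[list[float]], x: list[float], n_tiles: int) -> list[float]:
    if not weights:
        return []
    chunk = -(-len(weights) // n_tiles)  # ceiling division: rows per tile
    tiles = [Tile(weights[i:i + chunk]) for i in range(0, len(weights), chunk)]
    result: list[float] = []
    for tile in tiles:  # sequential here; parallel across tiles in hardware
        result.extend(tile.compute(x))
    return result


W = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(tiled_matvec(W, [1, 1], n_tiles=2))
```

Contiguous row blocks keep the output in order, so concatenating tile results reproduces the full product.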

4. Xilinx’s AI accelerator consists of an array of vector-processing cores with tightly coupled memory.

Versal also includes a range of hardware tied to the NoC, such as host interfaces like x16 PCI Express Gen 4, AXI-DMA, and CCIX (“see-six”). Xilinx is part of the CCIX Consortium, which manages the Cache Coherent Interconnect for Accelerators (CCIX) specification. CCIX is a high-speed, cache-coherent interconnect for attaching accelerators.

Memory interfaces include DDR4-3200, LPDDR4-4266, and HBM. Network interfaces include 100G multi-rate Ethernet, 600G Ethernet, and Interlaken, plus 600G cryptographic engines that support AES, IPsec, and MACsec. Chips will be available with high-speed SERDES that support 32G, 58G PAM4, and 112G PAM4.
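The PAM4 labels matter because PAM4 signals with four amplitude levels, carrying two bits per symbol, so the symbol (baud) rate on the wire is half the bit rate. The quick calculation below assumes the 32G rate is conventional two-level NRZ (the article doesn't say) and uses nominal rates, ignoring encoding and FEC overhead.

```python
import math

# Amplitude levels per line code; bits per symbol = log2(levels).
LEVELS = {"NRZ": 2, "PAM4": 4}


def symbol_rate_gbaud(bit_rate_gbps: float, modulation: str) -> float:
    bits_per_symbol = math.log2(LEVELS[modulation])
    return bit_rate_gbps / bits_per_symbol


# 32G is assumed to be NRZ; 58G and 112G are PAM4 as stated in the text.
for rate, mod in [(32, "NRZ"), (58, "PAM4"), (112, "PAM4")]:
    print(f"{rate}G {mod}: {symbol_rate_gbaud(rate, mod):.0f} GBd")
```

So a 112G PAM4 lane toggles at roughly the same symbol rate as a 56G NRZ lane, which is why PAM4 is the usual route to higher per-lane throughput.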

The RF signal chain is specific to the RF Versal incarnation. The support includes the multi-gigasample/s DACs and ADCs as well as integrated digital downconverters/digital upconverters (DDCs/DUCs). Soft-decision forward error correction (SD-FEC) is also part of the mix, which is crucial for high-speed communication systems such as 5G.

Finally, there’s MIPI D-PHY support for sensors at rates up to 3 Gb/s. NAND and other memories are supported directly. LVDS and GPIOs round out the peripheral complement.

The block diagram doesn’t show security features other than crypto acceleration, but there’s more that isn’t enumerated in the illustration. The processors support Arm’s TrustZone, which is the basis of trust since the software comes up first. The NoC can be partitioned so that specific processors or AH blocks are restricted in which devices and services are available to them. Such support is important to partitioned systems like those in automotive or avionics, where certification is only possible if the underlying system configuration can be guaranteed and isolated. The NoC also implements features such as quality-of-service (QoS) that are needed in real-time embedded systems.
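The partitioning idea can be modeled as an access check: a transaction from an initiator (a processor or AH block) to a target device is allowed only if some partition contains both. This is a conceptual sketch only; the `Partition` and `ToyNoC` classes and every device name in it are hypothetical, it doesn't reflect how the Versal NoC is actually configured, and QoS isn't modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical NoC partition: which initiators may reach which targets."""
    name: str
    initiators: frozenset[str]
    targets: frozenset[str]


class ToyNoC:
    def __init__(self, partitions: list[Partition]) -> None:
        self.partitions = partitions

    def allowed(self, initiator: str, target: str) -> bool:
        # Permit the transaction only if one partition contains both endpoints.
        return any(initiator in p.initiators and target in p.targets
                   for p in self.partitions)


# All names below are illustrative placeholders, not real Versal identifiers.
noc = ToyNoC([
    Partition("safety", frozenset({"R5_0"}), frozenset({"CAN_FD", "OCM"})),
    Partition("apps", frozenset({"A72_0", "fabric_accel"}), frozenset({"DDR", "PCIe"})),
])
print(noc.allowed("R5_0", "CAN_FD"))
print(noc.allowed("fabric_accel", "CAN_FD"))
```

Keeping the safety-critical initiator and its devices in their own partition is what lets that part of the system be certified independently of the application side.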

Development is more complex but potentially easier because of the NoC and standard hardware components (Fig. 5). The unified software development environment, which features Vivado, will address the hardware accelerators, including the AI and DSP engines. Interestingly, Xilinx lists Python among its primary languages, alongside C and C++. Python is one of the major languages used in AI work. Furthermore, the development environment takes into account accelerators that are implemented in the AH. This should make third-party logic implementations easier to manage and incorporate into an application.

5. Xilinx will provide a unified development environment to develop applications for Versal.

Versal chip specifications look very similar to those of Zynq. The VM1102 Versal Prime is at the low end of the series and includes 472 DSP engines, 352K system logic cells, over 150K lookup tables (LUTs), and 256 kB of on-chip ECC memory. The Versal Prime family has a pair of Cortex-A72s and a pair of Cortex-R5s, dual Ethernet ports, USB 2.0, and dual CAN-FD. The VM1102 doesn’t incorporate CCIX support, but does sport a x8 PCIe Gen 4 port. And it fits into a 21- × 21-mm package.

The top-end Versal Prime VM2902 has over 3000 DSP engines, 2,154K system logic cells, and almost one million LUTs. In the mix are half a dozen memory controllers, a x16 PCIe Gen 4 port with CCIX support, and a pair of x8 PCIe Gen 4 ports.

The AI series starts with the VC1352, which has 128 AI engines and 928 DSP engines. The high-end VC1902 incorporates 400 AI engines and 1968 DSP engines. Both have compute, I/O, and memory complements comparable to those of the Versal Prime, although the AI platforms have more Ethernet MAC support and the VC1352 has SD-FEC support.
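A quick way to see how the AI series scales is to compare the two parts’ engine counts directly. The snippet uses only the figures quoted above; the dictionary layout and `scale` helper are just for illustration.

```python
# Engine counts as quoted in this article for the two announced AI Core parts.
parts = {
    "VC1352": {"ai_engines": 128, "dsp_engines": 928},
    "VC1902": {"ai_engines": 400, "dsp_engines": 1968},
}


def scale(low: str, high: str, key: str) -> float:
    """Ratio of the high-end part's resource count to the low-end part's."""
    return parts[high][key] / parts[low][key]


for key in ("ai_engines", "dsp_engines"):
    print(f"{key}: {scale('VC1352', 'VC1902', key):.2f}x")
```

The AI-engine count grows by about 3.1x from the VC1352 to the VC1902, while the DSP-engine count only roughly doubles, so the top part is weighted more heavily toward AI acceleration.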

ACAP

Xilinx’s ACAP architecture changes the dynamic between FPGAs and ASICs. An FPGA fabric is still part of the Versal ecosystem, but it’s surrounded by a much larger hard-core array of devices that will be implemented as efficiently as any ASIC. Likewise, large SoCs will always have something like the NoC to connect the ever-growing device complement.

The ACAP approach has a number of benefits, including power and performance efficiency due to the hard cores. For example, Xilinx is expecting to deliver a Versal core that uses as little as 5 W. Having more fixed functions also simplifies software, since more standard targets are available for manipulation. Another key factor is the NoC, because on-chip communication is significantly more efficient than going off chip. This also reduces the number of pins needed for a device. Finally, for similar functionality, costs should be closer to those of ASICs than of FPGAs, since FPGA fabric tends to be expensive compared to hard logic in terms of footprint and overall cost.

Versal devices will obviously be used for design and development purposes with the long-term goal of an ASIC. However, they’re more likely to be applied in the final deployment due to their lower costs, lower power requirements, and higher performance characteristics compared to FPGAs or more basic FPGA SoCs.

Versal looks to hit a sweet spot in terms of design, addressing a wide range of applications that are demanding in terms of performance and communication. There’s no comparable solution at this point, so vendors that want to enter the ACAP sphere will be playing catch-up. Versal offers developers an opportunity to deliver cost-effective solutions that don’t require an ASIC or a more complex collection of chips on a module or PCB.
