SmartNIC Architectures: The Future is a Portable Architecture

One possible future for SmartNICs is a move to a disaggregated architecture in which form and function are separated: the functionality is software-defined, while the underlying architecture is hardware-accelerated.

Scott Schweitzer

May 24, 2021

This article is part of the Communication and System Design Series: Have SmartNIC - Will Compute

What you'll learn:

Development of a programming language for the data plane.
The importance of composability for portable NICs.
Managing programs and data with the control plane.

This is the fifth chapter in our SmartNIC series, and here we’ll show one path that SmartNICs and DPUs may be taking. We opened the series with “What Makes a SmartNIC Smart?,” where we defined the attributes that separate a SmartNIC from an ordinary NIC. Essentially, SmartNICs should be programmable, and the software development kit should be made available to customers. Also, SmartNICs should have extensive onboard computational power, far beyond that of a standard NIC, available for programming.

We then moved on to “Why is a SmartNIC Better than a Regular NIC?” Here, we ventured into the addition of computational elements to a NIC that make it smart. Also discussed were the types of applications that could be run on a SmartNIC, along with the benefits of offloading the host CPU from this additional processing.

In our third piece, “SmartNIC Architectures: A Shift to Accelerators and Why FPGAs are Poised to Dominate,” we reviewed all of the leading-edge SmartNIC products on the market and each of their architectures. We also delved into why FPGAs will eventually dominate in this market—hint, it’s the hardware-accelerated pipeline.

The last entry in the series was “How PCIe5 with CXL, CCIX, and SmartNICs Will Change Solutions Acceleration,” which addressed changes being made to PCIe with the addition of new protocols designed to more closely link SmartNICs and accelerators. In this article, we’ll review one possible future for SmartNICs: one where the architecture is portable. Portability, in turn, builds on programmability and composability, leading us to a software-defined, hardware-accelerated platform.

The Need for One Programming Language

One of the most common deficiencies of nearly all SmartNICs is the shortage of tools available for customers to program them. A fundamental part of this problem has been the lack of a single industry-wide programming language for network components.

As network devices such as routers, switches, and NICs evolved, they separated the data plane from the control plane. The data plane processes units of data called packets and moves them rapidly through the device. Packets that share the same header fields, such as source and destination addresses, ports, and protocol, make up a flow. Over the last few decades, the industry has developed numerous innovations for managing flows through the data plane more efficiently.
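
To make the idea of a flow concrete, here is a minimal Python sketch that groups packets into flows by the classic 5-tuple (source and destination address, source and destination port, protocol). The Packet record and the helper names are hypothetical, chosen only for illustration; real devices extract these fields from packet headers in hardware and may define flow keys differently.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical minimal packet record: only the header fields used for flow matching."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # e.g. "TCP" or "UDP"
    length: int    # bytes on the wire


def flow_key(pkt: Packet) -> tuple:
    """Classic 5-tuple flow key: packets sharing these fields belong to the same flow."""
    return (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.protocol)


def group_into_flows(packets):
    """Aggregate per-flow packet and byte counts, as a data plane might track them."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        stats = flows[flow_key(pkt)]
        stats["packets"] += 1
        stats["bytes"] += pkt.length
    return dict(flows)


traffic = [
    Packet("10.0.0.1", "10.0.0.2", 40000, 443, "TCP", 1500),
    Packet("10.0.0.1", "10.0.0.2", 40000, 443, "TCP", 64),
    Packet("10.0.0.3", "10.0.0.2", 53000, 53, "UDP", 80),
]
flows = group_into_flows(traffic)
print(len(flows))  # 2
```

The first two packets share every key field, so they count as one flow even though their sizes differ.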

In 2014, a standard was proposed that focused on these innovations, and the P4 Language Consortium was launched. P4 is a domain-specific language built around the concept of programming the data plane.

As a language, P4 was designed to be target independent: a P4 program can be recompiled for any P4 target platform that supports the same architecture model, and it should then behave the same way on that platform. This enables code portability within an organization where multiple P4 target platforms might exist, each with vastly different hardware processing backends.

Therefore, an engineer could write a P4 program that enforces a particular corporate security policy, then install this program on all of the company’s routers, switches, and eventually SmartNICs that understand P4. These products could even come from different vendors; they only need to be valid P4 target platforms. P4 enables code portability, which is vital for the highly complex orchestrated systems of cloud service providers (CSPs) and hyperscale data centers.
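
The portability argument can be sketched in Python: one target-independent policy (an ordered list of match-action rules, standing in for a compiled P4 table) is installed on two hypothetical device backends that store it differently but must return the same decisions. The rule fields, class names, and policy are assumptions for illustration, not P4 syntax or any vendor’s API.

```python
from abc import ABC, abstractmethod
from ipaddress import ip_address, ip_network

# Hypothetical target-independent "program": ordered (match, action) rules.
SECURITY_POLICY = [
    ({"dst_port": 23}, "drop"),                  # block Telnet everywhere
    ({"src_net": "192.168.0.0/16"}, "forward"),  # trust the internal network
    ({}, "drop"),                                # default rule: drop
]


def matches(rule_match: dict, pkt: dict) -> bool:
    """Return True if every field named in the rule matches the packet."""
    if "dst_port" in rule_match and pkt["dst_port"] != rule_match["dst_port"]:
        return False
    if "src_net" in rule_match and ip_address(pkt["src_ip"]) not in ip_network(rule_match["src_net"]):
        return False
    return True


class Target(ABC):
    """Hypothetical device backend; each one 'compiles' the same policy its own way."""

    @abstractmethod
    def install(self, policy): ...

    @abstractmethod
    def process(self, pkt: dict) -> str: ...


class SwitchTarget(Target):
    def install(self, policy):
        self.rules = list(policy)  # kept in first-match order

    def process(self, pkt):
        # The empty default rule always matches, so next() always finds an action.
        return next(action for m, action in self.rules if matches(m, pkt))


class SmartNICTarget(Target):
    def install(self, policy):
        # Different internal layout (keyed by priority), same semantics.
        self.rules = {prio: rule for prio, rule in enumerate(policy)}

    def process(self, pkt):
        for prio in sorted(self.rules):
            m, action = self.rules[prio]
            if matches(m, pkt):
                return action
        return "drop"


devices = [SwitchTarget(), SmartNICTarget()]
for d in devices:
    d.install(SECURITY_POLICY)

telnet = {"src_ip": "192.168.1.5", "dst_port": 23}
web = {"src_ip": "192.168.1.5", "dst_port": 443}
print([d.process(telnet) for d in devices])  # ['drop', 'drop']
print([d.process(web) for d in devices])     # ['forward', 'forward']
```

The point of the sketch is that the policy is written once; only the backend-specific install step differs between devices.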

Composability

The next critical aspect of a portable NIC is composability: the ability to take various modules, most written in P4, and assemble them into a coherent data-plane architecture by inserting them into the scaffolding provided by a flexible, portable SmartNIC architecture.

The architecture often has interfaces to two hardware-defined elements: the direct memory access (DMA) and media access control (MAC) engines. The DMA engine communicates with the host CPU over the PCIe bus. It’s often hardware-architecture-specific, in that it leverages circuitry specific to the geometry of that chip. The MAC engine similarly relies on dedicated communications circuitry to interface with the Ethernet network.

As such, both of these data-plane pipeline blocks are hard logic blocks on the chip that drives the SmartNIC. The remaining composable NIC architecture blocks are written mostly as P4 modules. They fall into four main block types: hubs, which are entry points for plug-in modules; virtualized NICs (vNICs); the match-action engine (MAE); and the network (NET) engines. The NIC is composable because the data plane readily accepts additional logic blocks at various points where modules can be plugged in (Fig. 1).

1. The composable NIC streaming subsystem architecture data plane readily accepts additional logic blocks at various points where modules can be plugged in.

Custom blocks can be written in P4, C/C++ (via high-level synthesis, or HLS), or RTL, and then inserted directly into the data plane. These custom blocks connect to hubs that provide multiplexing, buffering, and routing between streaming processors, giving customers a great deal of flexibility in managing network flows.
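
The hub idea can be sketched in Python as a pipeline of fixed stages interleaved with plug-in points. The stage functions, hub names, and packet fields below are hypothetical stand-ins for the hardware blocks in Figure 1; the sketch only shows how a customer module slots into the chain without changing the blocks around it.

```python
from typing import Callable, Dict, List, Optional

# A module transforms a packet (a dict here) or returns None to drop it.
Module = Callable[[dict], Optional[dict]]


class Hub:
    """Hypothetical plug-in point: runs any attached modules in insertion order."""

    def __init__(self, name: str):
        self.name = name
        self.modules: List[Module] = []

    def plug_in(self, module: Module) -> None:
        self.modules.append(module)

    def __call__(self, pkt: dict) -> Optional[dict]:
        for module in self.modules:
            pkt = module(pkt)
            if pkt is None:  # a module dropped the packet
                return None
        return pkt


def net_rx(pkt):  # stand-in for the fixed NET receive block
    pkt["rx_checked"] = True
    return pkt


def vnic(pkt):  # stand-in for the vNIC block handing the packet to the host
    pkt["delivered_to"] = "vnic0"
    return pkt


# Receive pipeline: fixed blocks interleaved with hubs where custom logic can go.
hubs: Dict[str, Hub] = {"after_net": Hub("after_net"), "before_vnic": Hub("before_vnic")}
rx_pipeline = [net_rx, hubs["after_net"], hubs["before_vnic"], vnic]


def run(pipeline, pkt):
    """Pass a packet through each stage in order, stopping if any stage drops it."""
    for stage in pipeline:
        pkt = stage(pkt)
        if pkt is None:
            return None
    return pkt


# Hypothetical customer module: drop packets tagged with a blocked VLAN.
hubs["after_net"].plug_in(lambda p: None if p.get("vlan") == 99 else p)

print(run(rx_pipeline, {"vlan": 10})["delivered_to"])  # vnic0
print(run(rx_pipeline, {"vlan": 99}))                  # None
```

Empty hubs simply pass packets through, so adding a plug-in point costs nothing until a module is attached to it.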

The vNIC blocks are written in P4 and present a traditional NIC interface to operating-system (OS) device drivers. One or more vNICs can exist, and they can be mapped to specific applications via ports, which also enables kernel bypass for applications seeking the best possible performance. The MAE handles Open vSwitch (OVS) functions for kernels supporting containers or for hypervisors managing multiple virtual machines (VMs). The MAE can also execute stateful filters, potentially providing a system firewall within the SmartNIC.
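
A stateful filter differs from a plain match-action rule because it remembers earlier traffic. The minimal Python sketch below, assuming a simple 5-tuple connection table, accepts inbound packets only when they are replies to a connection opened from the host side. The class and method names are hypothetical, and a real firewall would also track TCP state, timeouts, and table capacity.

```python
from typing import Set, Tuple

FiveTuple = Tuple[str, str, int, int, str]  # (src_ip, dst_ip, src_port, dst_port, proto)


class StatefulFilter:
    """Hypothetical stateful firewall, as an MAE-style flow table might implement it.

    Outbound traffic (host -> network) opens a connection entry; inbound traffic
    is accepted only if it is the reverse direction of a tracked connection.
    """

    def __init__(self):
        self.connections: Set[FiveTuple] = set()

    def outbound(self, flow: FiveTuple) -> str:
        self.connections.add(flow)
        return "accept"

    def inbound(self, flow: FiveTuple) -> str:
        src, dst, sport, dport, proto = flow
        reverse = (dst, src, dport, sport, proto)
        return "accept" if reverse in self.connections else "drop"


fw = StatefulFilter()
fw.outbound(("10.0.0.5", "198.51.100.20", 51000, 443, "TCP"))
print(fw.inbound(("198.51.100.20", "10.0.0.5", 443, 51000, "TCP")))  # accept
print(fw.inbound(("203.0.113.9", "10.0.0.5", 4444, 22, "TCP")))      # drop
```

The first inbound packet is a reply to the tracked connection; the second is unsolicited and is dropped.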

Finally, the NET receive and transmit blocks handle basic ingress and egress packet functions, typical NIC features, and stateless offloads. As shown in Figure 1, the scaffolding of the composable, portable SmartNIC provides entry points in both the receive and transmit data-plane pipelines, so customers can insert code wherever they see fit.
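
Checksum calculation is a typical stateless offload: each packet is processed on its own, with no per-flow state. As an illustration, here is the IPv4 header checksum (the RFC 1071 one’s-complement sum) in Python, the kind of computation a NET block can perform in logic instead of on the host CPU. The function name is our own, and the sample header is simply an example.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """RFC 1071 Internet checksum over an IPv4 header.

    IPv4 headers are always a multiple of 4 bytes, so they split cleanly into
    16-bit words. To compute a checksum, zero bytes 10-11 first; summing a header
    that already carries a correct checksum yields 0.
    """
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:  # fold carries back into the low 16 bits (one's complement)
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Example 20-byte header with the checksum field zeroed: total length 115, TTL 64,
# protocol UDP, source 192.168.0.1, destination 192.168.0.199.
hdr = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(ipv4_header_checksum(hdr)))  # 0xb861
```

Because the result depends only on the bytes of the current header, the same logic can run at line rate for every packet with no lookups or shared state.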

All of the composable modules in the data-plane pipelines shown in red in Figure 1 execute in programmable logic. This means that bug fixes and future feature additions are a simple matter of updating the bits that make up the portable NIC. Traditional NICs are almost entirely hard logic—sure, sometimes you can load new firmware into them, but much of what makes the NIC a NIC is set in silicon the day the product ships. As you can see in the figure, only three block types are fixed; all of the others are fluid and can be updated. This means that the portable NIC is software-defined and hardware-accelerated.

Data-Plane Performance

In Figure 1, we focused on the data plane with its receive and transmit pipelines. Data-plane performance is critical because as Ethernet speeds increase, so does the packet rate the data plane must sustain. Figure 2 shows the data plane for an Arm-core-based SmartNIC architecture. In this approach, the data plane becomes congested as it passes packet data back and forth between memory, Arm cores, and logic blocks.

2. The data plane in an Arm core approach will become congested as it passes packet data back and forth between memory.

In this example, an Arm core performs some packet processing in step 1. In step 2, it pushes that packet into DRAM. In step 3, an “accelerator” brings the packet back in from DRAM to perform some operation. In step 4, after operating on the packet, the accelerator copies it directly into the cache of another Arm core. In step 5, that Arm core works on the packet and then copies it back to DRAM. Finally, in step 6, the accelerator pulls the packet from DRAM and pushes it to the host CPU through the DMA engine.

As the packet rate into this Arm-core-based SmartNIC increases, especially into the hundreds of millions of packets per second, the NIC’s overall throughput will suffer, packets will be dropped, and retransmissions will begin, further aggravating the problem. Packets are continually being moved around, most often through memory, which congests the DRAM bus.
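
A back-of-the-envelope calculation shows why the DRAM bus becomes the bottleneck. Assuming minimum-size 64-byte Ethernet frames plus the standard 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, and inter-frame gap), and counting steps 2, 3, 5, and 6 above as the packet’s trips across the DRAM bus, the Python sketch below estimates the packet rate and the resulting DRAM traffic. It ignores descriptors, metadata, and cache effects, so treat the numbers as rough estimates, not measurements.

```python
ETH_OVERHEAD_BYTES = 20  # preamble (7) + start-of-frame delimiter (1) + inter-frame gap (12)


def max_packet_rate(link_bps: float, frame_bytes: int) -> float:
    """Maximum packets per second on an Ethernet link for a given frame size."""
    return link_bps / ((frame_bytes + ETH_OVERHEAD_BYTES) * 8)


def dram_bytes_per_second(pps: float, frame_bytes: int, dram_crossings: int) -> float:
    """DRAM traffic if every packet is written to or read from DRAM `dram_crossings` times."""
    return pps * frame_bytes * dram_crossings


# Assumption: steps 2, 3, 5, and 6 of the Arm-core path each move the packet across DRAM.
CROSSINGS = 4

for gbps in (100, 400):
    pps = max_packet_rate(gbps * 1e9, 64)
    dram = dram_bytes_per_second(pps, 64, CROSSINGS)
    print(f"{gbps} GbE, 64-byte frames: {pps / 1e6:.1f} Mpps -> {dram / 1e9:.1f} GB/s of DRAM traffic")
```

At 100 GbE this works out to roughly 148.8 Mpps and about 38 GB/s of packet traffic on the DRAM bus; at 400 GbE, roughly four times that.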

Figure 1 shows that the programmable-logic architecture is pipelined, meaning it can ingest vast numbers of packets. Packets flow through the pipeline stages, which all operate in parallel, without having to be moved in and out of DDR memory. Operations on the packets are carried out by logic gates rather than by classic fetch-and-execute CPU instructions. On every clock tick, all parallel stages of each pipeline perform logical operations on packets. This is what’s meant by hardware-accelerated.
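
The difference between pipelined logic and a single sequential engine can be shown with a small clock-counting model. Under the simplifying assumptions that every stage takes exactly one clock and the pipeline never stalls, N packets through S stages need S + N - 1 clocks when pipelined, versus N × S clocks when one engine performs every stage in turn. The Python sketch below (function names are our own) simulates the pipeline tick by tick to confirm this.

```python
from collections import deque


def simulate_pipeline(num_packets: int, num_stages: int) -> int:
    """Count clock ticks for a pipeline in which every stage works on a different packet each tick."""
    waiting = deque(range(num_packets))
    stages = [None] * num_stages  # stages[i] holds the packet stage i works on this tick
    clock = done = 0
    while done < num_packets:
        clock += 1
        # All stages advance together: each packet shifts one stage, a new one enters stage 0.
        stages = [waiting.popleft() if waiting else None] + stages[:-1]
        if stages[-1] is not None:  # the last stage finishes this packet on this tick
            done += 1
    return clock


def sequential_cycles(num_packets: int, num_stages: int) -> int:
    """Ticks if a single engine performs all stages on one packet before starting the next."""
    return num_packets * num_stages


print(simulate_pipeline(1000, 8))  # 1007
print(sequential_cycles(1000, 8))  # 8000
```

Once the pipeline fills, one packet completes on every clock regardless of how many stages it has, which is why adding stages adds features without reducing throughput in this model.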

Management via the Control Plane

We haven’t yet touched on the control plane. This is often a collection of Arm cores that manage the programs and data inserted into the logic blocks and lookup tables used by the data plane. The control plane also collects statistical information useful in further managing and measuring key elements’ performance within the NIC. Control plane tasks occur in parallel to the data plane and should not gate the overall performance of the NIC.

The control plane enables software-level management of the data plane, providing the software-defined element of a portable NIC architecture. Exception handling is another part of the control plane’s job. When a packet arrives for which no flow rule exists, an exception event occurs and is handled by the control plane. The exception can also be forwarded to the host CPU to decide whether the packet is valid. If it is, the control-plane Arm core or the host CPU can add a flow rule to the MAE, and all subsequent packets in that flow will be handled entirely within the data plane.
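
The miss-then-install sequence can be sketched as a flow table with a fast path and an exception path. In this minimal Python model, a dictionary keyed by 5-tuple stands in for the MAE’s flow table and a plain function stands in for the control-plane Arm core or host CPU, so only the first packet of an accepted flow takes the exception path. The class, the control_plane policy, and the action strings are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

FlowKey = Tuple[str, str, int, int, str]  # (src_ip, dst_ip, src_port, dst_port, proto)


class MatchActionEngine:
    """Hypothetical fast-path flow table with a miss (exception) path to the control plane."""

    def __init__(self, on_miss: Callable[[FlowKey], Optional[str]]):
        self.table: Dict[FlowKey, str] = {}
        self.on_miss = on_miss  # control-plane handler (Arm core or host CPU)
        self.stats = {"hits": 0, "misses": 0}  # counters the control plane could collect

    def process(self, key: FlowKey) -> str:
        action = self.table.get(key)
        if action is not None:
            self.stats["hits"] += 1  # handled entirely in the data plane
            return action
        self.stats["misses"] += 1  # exception: no flow rule exists yet
        action = self.on_miss(key)
        if action is None:
            return "drop"  # control plane judged the packet invalid
        self.table[key] = action  # install the rule so later packets take the fast path
        return action


def control_plane(key: FlowKey) -> Optional[str]:
    """Hypothetical policy: accept HTTPS flows, reject everything else."""
    _, _, _, dst_port, proto = key
    return "forward:vnic0" if (proto, dst_port) == ("TCP", 443) else None


mae = MatchActionEngine(control_plane)
flow = ("10.0.0.7", "10.0.0.2", 50500, 443, "TCP")
for _ in range(5):
    mae.process(flow)
print(mae.stats)  # {'hits': 4, 'misses': 1}
```

In this sketch a rejected flow is not cached, so each of its packets raises a new exception; a real design might install an explicit drop rule instead, trading table space for control-plane load.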

In addition to the data-plane functions discussed above, compute-intensive applications such as IPsec, kTLS, and DPI/regex, which may consume significant host CPU cycles, can execute on the Arm cores in the control plane. This allows these applications to be offloaded directly into the SmartNIC, where they can be implemented using the plug-in architecture.

Charting the Roadmap

As SmartNICs gain traction with hyperscalers, CSPs, and large data-center customers, these customers will demand to see the SmartNIC vendor’s product roadmap. In technology, roadmaps lay out a timeline, often stretching 18 to 36 months, that demonstrates the vendor’s commitment to a given product line.

In 2007, Intel introduced its Tick-Tock production model, in which one cycle, the tick, is a change in chip geometry (a process shrink), and the next, the tock, is a new microarchitecture on that process. Not every company follows Intel, but the industry often mirrors its cadence. For SmartNICs, it’s often new silicon on the tick and software or board variations on the tock. The time between ticks was typically two years, which ensured an adequate return on the chip investment. A portable SmartNIC architecture is, in essence, the OS for the NIC.

Restructuring classic NIC development as a software project, instead of an application-specific integrated-circuit (ASIC) design project, frees the NIC team to advance at a pace more in keeping with the business’s needs. Chip development can then focus on advances in PCIe and Ethernet input/output logic and on innovations in programmable and control-plane logic. Board development becomes more a matter of supporting the new chip and its heat and power requirements, not of accommodating the ever-increasing list of features and offloads requested by marketing. A genuinely portable NIC enables software and hardware advances to occur in parallel rather than in lockstep.

What does this mean for a roadmap? It shows customers that there’s a consistent NIC OS that evolves, with new features appearing in the OS along the way. Moreover, both the vendor and third parties will make additional plug-in modules available to increase the range of applications running on the NIC. The chip- and board-development teams are freed to execute in parallel with the portable NIC team, and each product on the roadmap becomes a point of convergence for all three groups.

Technology is always marching forward. Today we’re talking about P4-programmable, composable, and portable SmartNICs that install into programmable logic, creating a highly efficient collection of accelerated packet-processing pipelines. Xilinx recently took this approach with its new Alveo SN1000 SmartNIC.

In the future, new portable SmartNIC builds will execute on even more advanced chips. These chips will leverage thousands of machine-learning engines, a high-performance network on chip (NoC), and composable DMA engines. The future is bright, and it will be exciting to see how software-defined, hardware-accelerated technology evolves over time.

About the Author

Scott Schweitzer

Director of SmartNIC Product Planning, Achronix Semiconductor

Since his baptism on the altar of the TRS-80, Scott has been a lifelong technology evangelist. He's written profitable software products, built hardware, and formally managed programs for IBM, NEC, Solarflare, and now Xilinx.

After two decades plugging along with single socket computing platforms, Scott joined NEC in 2003 to manage its new Intel Itanium multi-core 64-bit Super Computing Server, and he's never looked back. In 2005, Scott shifted his focus to clustering for Super Computing and extreme performance networking.

As 10-GbE adoption ramped up, he launched his wildly popular 10GbE.net blog. With market changes in 2017, Scott rolled 10GbE.net into the Technology Evangelist, a blog with thousands of monthly page views, and an accompanying podcast.
