What are the Challenges Facing SmartNIC Designers?

Sept. 22, 2022
Achronix's Scott Schweitzer addresses design issues for SmartNICs ranging from BMC complexes to the need for programmable logic.

This video is part of the TechXchange: SmartNIC Accelerating the Smart Data Center and TechXchange Talks

SmartNICs are forming the front end to the cloud, allowing significant amounts of processing to occur at the communication interface. This offloads the cluster of processors and GPGPUs that make up the main computing unit of a cloud or enterprise server environment. Designing and building a SmartNIC is more than slapping a processor alongside a few Ethernet network interface chips.

I talked with Achronix's Scott Schweitzer about design issues associated with SmartNICs.



Wong: Hello. I'm Bill Wong, Senior Content Director with Electronic Design. And today we're going to be talking about SmartNICs with Achronix's Scott Schweitzer.

So, Scott, could you tell us a little bit about some of the challenges that SmartNIC designers are having today?

Schweitzer: Well, Bill, thank you for having me on today. I think when it comes to designing SmartNICs, there are two levels to it.

The first level is board-level design and the second is chip-level design. We have experience in both areas, and they share a lot of common challenges. There's obviously the functionality you're looking for from the product. You've got to balance that against the space available to put the product in, as well as the power. Then there's the heat that the board's power generates, which can be quite a vexing problem to address.

When you look at a SmartNIC, for example, you try to target a full-height, half-length PCI Express board so that you can get into as many slots as possible. Full height, half length, and single width constrain how much heat you can dissipate. At the end of the day, most of the electricity that goes into the card turns into heat, and you have to be able to get rid of it.

Power and heat are critical at that level. On the chip level, we have the same kind of issues.

If I want to do a chip, I want to fit it in a half reticle or a full reticle, which is the die size or space that you want to consume for the chip. Then you have to look at the functionality you want in there. That turns into intellectual-property blocks that you want to put around the perimeter, plus programmable logic and processors like Arm cores.

You've got to balance all of these things out and fit them in. Not only do you have to fit them in, but you also have to account for all the heat and everything else that these parts generate.

So it's an interesting juggling act.

Wong: Well, most SmartNICs leverage Arm cores for the programmable data plane to analyze, modify, and route packets. Is this the best approach or is it just a reasonable compromise?

Schweitzer: It's a good compromise in that everybody knows how to program an Arm core, and there are already blocks written to do a variety of different functions with Arm. So it's pretty reusable and generally a nice compromise.

I would say the issue is that's fine for maybe a 10- or 25-Gig SmartNIC. As you start to get to 100-Gig Ethernet, you expect that an Arm core is going to have a challenging time decomposing a packet and figuring out what to do with it. It's even more difficult when you want to do some sort of transform before sending the packet on its way. That's a lot of work for an Arm core in the 100-Gig timeframe. So you really want hard logic or programmable logic to address data at those rates.
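The strain that 100-Gig rates put on a general-purpose core can be made concrete with some back-of-envelope arithmetic. The sketch below assumes minimum-size 64-byte Ethernet frames plus 20 bytes of preamble and inter-frame gap, and a hypothetical 3-GHz core clock; the point is simply how few CPU cycles are left per packet at line rate.

```python
# Back-of-envelope per-packet time budget at various Ethernet line rates.
# Assumptions (not from the interview): minimum-size 64-byte frames plus
# 20 bytes of on-the-wire overhead (8-byte preamble + 12-byte inter-frame
# gap), and a hypothetical 3-GHz processor core.
FRAME_BYTES = 64
OVERHEAD_BYTES = 20
CPU_GHZ = 3.0

for gbps in (10, 25, 100, 400):
    pps = gbps * 1e9 / ((FRAME_BYTES + OVERHEAD_BYTES) * 8)  # packets/sec
    ns_per_pkt = 1e9 / pps                                   # time budget
    cycles = ns_per_pkt * CPU_GHZ                            # cycles/packet
    print(f"{gbps:>3} GbE: {pps / 1e6:7.1f} Mpps, "
          f"{ns_per_pkt:6.2f} ns/packet, ~{cycles:.0f} CPU cycles")
```

At 10 Gig the core gets a couple hundred cycles per minimum-size packet; at 100 Gig the budget shrinks to roughly 20 cycles, which is why parsing and transforming packets in software alone becomes impractical at those rates.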

Wong: Well, down the road, should a SmartNIC augment or even replace the BMC complex on a server?

Schweitzer: If we look at the BMC complexes that we've grown accustomed to over the years, like Dell's iDRAC as an example, those have been Gigabit Ethernet interfaces, and they've helped us manage servers very effectively. What we're going to see down the road is what's called a data plane that the data flows through, and then a control plane for managing that data flow. That control plane is analogous to the control plane you would find in an iDRAC or BMC (baseboard management controller) chip.

So there is a tremendous overlap of function. In the next generation, or even the generation after, we're going to see the DPU basically taking over the server, where you're not actually going to put an OS on the server anymore. You're going to put an OS into the DPU, and the DPU will work like a hypervisor or a container manager, dynamically placing VMs or containers onto the x86 cores as they're needed for the workloads you have to do.

As you get down the road to that world, the DPU is going to do a lot of things similar to what the BMC or iDRAC does on a Dell server today. It's a real possibility that those two items become one and move onto the motherboard.

They'd have maybe a 10-Gig management link that's hot all the time. When the server's powered up, the 100- or 400-Gig links come online and everything functions from that point.

Wong: Now, VMware and AMD are both pushing for Project Monterey to become the de facto SmartNIC operating system. Is this a wise approach or are there better options available?

Schweitzer: Windows kind of owns the desktop OS, and Apple has some entrenched space there as well. Linux is kind of the OS for servers. You're going to see an OS battle on the DPU front, and VMware saw that early on. They took ESX, spun off a version of it that would run on Arm, and then said, "Hey, you know what? We can use this as the DPU OS." It's very clever of them. I give them a lot of credit for doing it.

It's a clever way to get entrenched early. The problem is it's going to be proprietary. They're going to want some sort of fee for that OS or hypervisor running on the DPU, and they're going to charge you per whatever down the road. Then there's the Linux crowd out there, which has been used to getting this for free for decades.

Once something is free, they're not going to want to pay for another OS for the server down the road. So Red Hat has OpenShift, which will likely be a direct competitor to VMware's Monterey. I expect OpenShift is getting some attention from NVIDIA and a few others. That's probably going to be the lead horse on the open-software path in the battle for the SmartNIC.

Wong: Delving down a little deeper. What role should programmable logic play in a robust SmartNIC implementation?

Schweitzer: That's a great question. These NICs are getting extremely complex. If you look at the block diagram for Chelsio's newly announced T7 (Terminator 7) Deep Blue chip, for example, which was in their newsletter earlier this week, there are probably a dozen IP blocks on that chip.

These things take the better part of two years to go from concept to fruition and show up as first silicon. Then you have to go through all the testing and bring-up, and sometimes you end up doing a metal spin, so it could take up to three years to develop the chip. In that time, new ideas come along, new problems come along, and new things that need to be addressed come along.

It's always a good idea to have some sort of programmable logic available to address these new problems. You can't really do it in a programmable framework like an Arm core or a RISC core, because the data rates are just so high. I mean, RISC-V maybe has a little better shot at doing it because they can dynamically change some of the opcodes and extend the instruction set. But for the most part, programmable logic is the way to go with this kind of stuff.

You can only do so much with hard logic, so you need to keep an area of programmable logic available. Or you could do a converged adapter, where you take something like Chelsio's T5, NVIDIA's BlueField-3, or Broadcom's chip and put an FPGA next to it. Intel has been famous for doing that. They've done it on the last couple of boards they've produced, where it's a hybrid: they use an ASIC for the line-rate data-plane stuff, but they have the FPGA available for all the stuff they didn't think of at the time they designed the chip.

Wong: Great. There's a lot going on in SmartNICs these days. Thanks for filling us in from your perspective at Achronix.

Schweitzer: No problem. Thanks, Bill. Appreciate your time. 

