Suppose someone asks you what a transistor costs. A quick look at an Arrow Electronics catalog would suggest a price ranging from a few cents to a little over a dollar in single quantity, depending on type and application.
But what if your inquirer asks how much a single transistor embedded in an IC costs? Depending on the type and size of the IC, the cost may be several decimal places below $1. In the latest Intel processors, which include more than 10 billion transistors and sell for a few hundred bucks, the cost of one transistor would be in the neighborhood of one-hundred-millionth of a dollar.
Or it could cost you $1 billion. As striking as this number sounds, it is the estimated cost to repair one defective transistor in an Intel SATA controller, as reported recently by Intel (see “Bad Transistor May Not Cost A Billion Dollars”). But why? Where does this immense amount of money come from?
The Cost Of Bugs
Being late to market is a bad thing. Exactly how bad it can be is harder to pin down because every situation is different and the dynamics are complex. While many factors like economic cycles lie outside the control of those involved in a project, there is one all-too-frequent contributor to market misses that can be controlled: late discovery of bugs.
Finding a bug late in the project is bad for two reasons: it contributes to a delivery slip, and it costs more to fix the bug. By some estimates, each stage of the project that goes by without detecting a bug raises the cost of fixing the bug by 10 times. For example, if a $10,000 register transfer level (RTL) bug gets past RTL verification and layout, you may have a $1 million fix on your hands.
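The tenfold-per-stage rule of thumb can be sketched in a few lines. This is only an illustration of the estimate quoted above: the function name `fix_cost` and the stage labels are hypothetical, and the factor of 10 is the rough estimate from the text, not a measured value.

```python
def fix_cost(initial_cost, stages_escaped, factor=10):
    """Estimated cost of fixing a bug that slipped past `stages_escaped` stages.

    `factor` is the rule-of-thumb multiplier per stage (10x in the text).
    """
    return initial_cost * factor ** stages_escaped


# Hypothetical stage labels for the $10,000 RTL bug in the example.
stages = ["caught during RTL verification",
          "escaped RTL verification",
          "escaped RTL verification and layout"]

for escaped, label in enumerate(stages):
    print(f"{label}: ${fix_cost(10_000, escaped):,}")
```

Two escaped stages turn the $10,000 bug into the $1 million fix mentioned above.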
These numbers get even worse when we add the contribution of the embedded software, considering that the amount of software content for a given chip project is growing dramatically. Systems-on-a-chip (SoCs) are sophisticated embedded systems, and the hardware, complex as it is, is really only the platform on which millions of lines of software will be run. And the software and hardware must be well matched, since subtle hardware problems can cause software to run poorly or not at all. This makes software validation prior to tapeout more important than it has ever been.
There are three ways to validate software before you go to silicon: high-end simulation; emulation on traditional “big-box” emulators based on custom chips, sold by companies like Cadence and Mentor; and emulation on systems built from off-the-shelf FPGAs, such as those sold by EVE. All three of these techniques can theoretically be used to validate software on hardware, helping detect bugs early.
Calculating the cost of fixing a bug, while different for each company or project, is relatively straightforward. Determining lost revenues is much harder, and various consultants and companies have put forward different models for estimating the sales penalty for being late. Using a model that’s “too accurate” actually doesn’t help, since the unknowns swamp out the presumed higher precision.
Back in the 1990s, a company called Logic Automation* proposed a relatively simple model that assumes that sales increase linearly to their peak and then decrease linearly back to zero. Using two triangles to represent what you could have sold and what you actually did sell after a late launch, you can calculate the difference in areas to figure out what you lost (see the figure**).
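One common way to write this triangle model down is sketched here, under assumptions not spelled out above. The symbols are introduced only for illustration: W is half the product lifetime (the time to peak sales), D is the delay, and P is the peak sales rate of the on-time product. The late product is assumed to ramp at the same slope, peak at the same moment, and reach end of life at the same moment as the on-time product.

```latex
\begin{align*}
R_{\text{on time}} &= \tfrac{1}{2}\,(2W)\,P = WP \\
R_{\text{delayed}} &= \tfrac{1}{2}\,(2W - D)\,\frac{(W - D)\,P}{W} \\
\text{fraction of revenue lost} &= \frac{R_{\text{on time}} - R_{\text{delayed}}}{R_{\text{on time}}}
  = \frac{D\,(3W - D)}{2W^{2}}
\end{align*}
```

The peak rate P cancels out, so the fraction lost depends only on the delay and the length of the market window.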
With this and some assumptions, we can look at a couple of scenarios to see how the different verification options play out. We’ll assume a project on the relatively modest 65-nm node, with a mask re-spin cost of $3 million. In addition, let’s assume:
- A product lifecycle of 24 months
- Total expected revenues of $500 million
- 50 engineers making $200,000 each, working 250 days per year for a combined cost of $40,000 per day
- A savings of two months of delay (at 22 working days per month) and one mask spin using one of the three verification approaches above
Cranking through the numbers says that the two months saved will reduce the R&D cost by almost $1.8 million. The avoided mask re-spin means a cost savings of $3 million. Being late to market, however, is the real killer. Using the simplified model above, adding a two-month delay would mean losing 23.6% of your expected revenue, or $118 million. This means that avoiding the two-month delay and mask fix saves a total of about $123 million.
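The arithmetic can be reproduced with a short script. It is a sketch under stated assumptions: the revenue-loss formula D(3W - D)/(2W²), with W equal to half the product lifecycle, is one common form of the triangle model and is assumed here because it reproduces the 23.6% figure. All variable and function names are illustrative.

```python
def revenue_loss_fraction(delay_months, lifecycle_months):
    """Fraction of revenue lost under the assumed triangle model."""
    w = lifecycle_months / 2          # time to peak sales
    d = delay_months
    return d * (3 * w - d) / (2 * w ** 2)


# Assumptions taken from the list above.
ENGINEERS = 50
SALARY = 200_000                      # dollars per engineer per year
WORK_DAYS_PER_YEAR = 250
WORK_DAYS_PER_MONTH = 22
DELAY_MONTHS = 2
LIFECYCLE_MONTHS = 24
TOTAL_REVENUE = 500_000_000
MASK_RESPIN_COST = 3_000_000

daily_cost = ENGINEERS * SALARY / WORK_DAYS_PER_YEAR
rd_savings = daily_cost * WORK_DAYS_PER_MONTH * DELAY_MONTHS
loss_fraction = revenue_loss_fraction(DELAY_MONTHS, LIFECYCLE_MONTHS)
lost_revenue = loss_fraction * TOTAL_REVENUE
total_savings = rd_savings + MASK_RESPIN_COST + lost_revenue

print(f"Team cost per day:   ${daily_cost:,.0f}")
print(f"R&D savings:         ${rd_savings:,.0f}")
print(f"Revenue lost:        {loss_fraction:.1%} (${lost_revenue:,.0f})")
print(f"Total savings:       ${total_savings:,.0f}")
```

The lost revenue dwarfs the other two terms, which is why the result is far more sensitive to the assumed delay and market window than to the engineering or mask costs.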
This, of course, assumes that all three of the verification approaches can be equally effective, both in the amount of time it takes to run the verification and in the cost of acquiring the systems. We need some more information to compare the three methods, so we will assume:
- A 20 million-gate ASIC design
- Weekly regression tests
- A billion clock cycles required to exercise the hardware
- A trillion clock cycles required to exercise the software
Let’s look first at the high-end simulation farm, assuming we can simulate 10,000 cycles each second. This means the hardware can complete its simulation run in a little over a day. The software, however, will require about 1,157 days. This creates a blocking problem right away: that’s far longer than the week allotted for running the regression tests. In fact, a single run would take almost as long as the entire design project and market window combined.
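The run times follow directly from dividing cycle counts by simulation speed. The sketch below assumes a rate of 10,000 cycles per second, which is the rate consistent with the 1,157-day software figure. The helper `run_days` and the constant names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def run_days(cycles, cycles_per_second):
    """Wall-clock days needed to execute `cycles` at the given speed."""
    return cycles / cycles_per_second / SECONDS_PER_DAY


SIM_RATE = 10_000                   # cycles per second (assumed, see above)
HW_CYCLES = 1_000_000_000           # a billion cycles for the hardware
SW_CYCLES = 1_000_000_000_000       # a trillion cycles for the software

hw_days = run_days(HW_CYCLES, SIM_RATE)
sw_days = run_days(SW_CYCLES, SIM_RATE)

print(f"Hardware regression: {hw_days:.2f} days")
print(f"Software regression: {sw_days:.0f} days ({sw_days / 365:.1f} years)")
```

Because the software workload is a thousand times larger than the hardware workload, any platform that needs about a day for the hardware run needs roughly three years for the software run.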
Simulation, then, could never be used to save the two months assumed above. It simply takes too long to run and is realistically a non-starter for what we’re trying to accomplish.
On The Emulator
Next comes the big-box emulator. With emulators, there are two ways to validate your hardware: co-emulation (running the emulator as an accelerator to simulation) at the C++ level using transactors and co-emulation at the RTL level. The higher-abstraction C++ approach will clearly run faster for those blocks that don’t yet have their RTL defined. Once those blocks have been implemented, then RTL-level validation can be run, although more slowly. Once the hardware is in place, you can validate the software using the emulator as a full-up in-circuit emulator (ICE), running the software on the emulated target architecture.
A reasonable performance assumption for a big-box emulator would be 400,000 cycles per second for C++/transactor co-emulation and 2000 cycles per second for RTL co-emulation. Running the software on the ICE implementation should be possible at 1 million cycles per second. This lets you get the hardware validation done in about 40 minutes for C++ and in almost six days for RTL-level, leaving you less than two days for validating the software. The software validation, however, takes more than 11 days to complete.
This is much faster than simulation provides, but it is still longer than the one-week regression cycle. You could accommodate this by switching to a monthly regression cycle for the software. The cost of this lies in the fact that you will run fewer regression sets, and any software-related bugs will take longer to find. While hardware bugs could be identified within a week of their being introduced, software bugs could take up to a month to be caught.
For an FPGA-based emulator, we can achieve 2 million cycles per second for C++/transactors and 3000 cycles per second for RTL level, letting us turn the entire design in eight minutes at the C++ level and four days at the RTL level. For software, ICE verification can be performed at 5 million cycles per second, letting us complete the software run in two-and-a-half days. This gives a total run time for hardware and software combined of six-and-a-half days, squeaking into the week we allocated.
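To see how the two emulator classes compare against the weekly regression budget, the sketch below adds the RTL-level hardware run to the ICE software run for each platform, using the rates assumed in the text. The C++/transactor runs take only minutes and are left out. The dictionary layout and function names are illustrative.

```python
SECONDS_PER_DAY = 86_400
HW_CYCLES = 10**9        # cycles to exercise the hardware
SW_CYCLES = 10**12       # cycles to exercise the software
BUDGET_DAYS = 7          # weekly regression cycle

# Cycles per second, as assumed in the text for each platform.
PLATFORMS = {
    "big-box emulator": {"rtl": 2_000, "ice": 1_000_000},
    "FPGA-based emulator": {"rtl": 3_000, "ice": 5_000_000},
}


def days(cycles, rate):
    """Wall-clock days to run `cycles` at `rate` cycles per second."""
    return cycles / rate / SECONDS_PER_DAY


def regression_days(rates):
    """RTL hardware run followed by the ICE software run."""
    return days(HW_CYCLES, rates["rtl"]) + days(SW_CYCLES, rates["ice"])


results = {name: regression_days(rates) for name, rates in PLATFORMS.items()}

for name, total in results.items():
    verdict = "fits within" if total <= BUDGET_DAYS else "exceeds"
    print(f"{name}: {total:.1f} days ({verdict} the {BUDGET_DAYS}-day cycle)")
```

Unrounded, the FPGA-based total comes to a little over six days, slightly better than the sum of the rounded figures, while the big-box total runs to more than 17 days.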
These are only examples, and designers could repeat the exercise with their own specific numbers. But the results are still likely to underscore the benefit that FPGA-based emulators bring for faster IC verification and higher design quality. Who knows? It could end up saving you a billion dollars.
*Purchased by Logic Modeling in 1992, which was purchased by Synopsys in 1994.
**The author wishes to thank Michiel Ligthart, chief operating officer of Verific Design Automation, for his sharp eye and mathematical prowess in catching an error in the original equation.