Many of the metrics that companies currently use to determine and market their products' performance have become outmoded and need a rethink. The measurements commonly assigned to system performance—CPU clock rate, amount of main memory, storage capacity, theoretical aggregate bandwidth between the CPU and its primary memory subsystem—are no longer as relevant to most applications as they once were. Anyone who believes these easy-to-quantify metrics will accurately predict system performance likely hasn't performed many such measurements on real systems.
Focusing strictly on performance (not ease of use, maintainability, or other broader concerns), customers really care about only one thing: how quickly their applications run. And the vast majority of applications see no further performance gain once the basic subsystems reach a certain minimum threshold, which in most cases we've already achieved.
Take CPU clock rates. A CPU running at some mind-boggling megahertz rating provides little value if it needs a fan so large you can't hear yourself think, especially when that CPU spends most of its time in the idle loop. Focusing too heavily on any one component is like playing the fastest running back on the football team even though he fumbles the ball most of the time: the outcome will disappoint you. Look at how real systems operate today and you'll find that performance is determined not by the individual ratings of the underlying building blocks, but by how well the subsystems work together—by the interconnect. Put another way, today's systems behave much like the networks of yesterday. You could speed up network performance by moving to a higher-speed local-area network connection. Today's systems are similar, only with other, less obvious interconnect issues to resolve.
I'm not suggesting we simply replace tried-and-true measurements such as CPU clock speed with some theoretical aggregate interconnect bandwidth figure. That would merely swap one misleading metric for another. To determine whether a measurement has real value, we need to look more deeply. Very low latency is often just as important to a system as peak throughput. If an amazingly fast CPU must wait for a reply to a query before handling its next operation, it doesn't matter how much extra interconnect bandwidth is available, since none of it is being used. An interconnect's overhead matters just as much, because the usable bandwidth, not the theoretical bandwidth, largely determines system performance. A protocol such as Ethernet, for instance, carries high latency and consumes much of its bandwidth in overhead, making it a poor choice for subsystem interconnection.
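The two effects above can be made concrete with a little arithmetic. The sketch below uses hypothetical numbers and helper names of my own choosing; it simply shows that protocol overhead shrinks usable bandwidth, and that a synchronous request-reply workload is bounded by round-trip latency no matter how much raw bandwidth the link offers.

```python
def effective_bandwidth(raw_gbps, payload_bytes, overhead_bytes):
    """Usable bandwidth after per-packet protocol overhead is paid."""
    return raw_gbps * payload_bytes / (payload_bytes + overhead_bytes)

def synchronous_throughput_gbps(payload_bytes, round_trip_us):
    """Throughput when the CPU waits for each reply before issuing the
    next request -- latency-bound, not bandwidth-bound."""
    bits = payload_bytes * 8
    round_trip_ns = round_trip_us * 1000
    return bits / round_trip_ns  # bits per ns == Gb/s

# A link with generous raw bandwidth but heavy framing overhead:
# 10 Gb/s raw, 64-byte payloads, 64 bytes of overhead per packet.
print(effective_bandwidth(10.0, payload_bytes=64, overhead_bytes=64))  # 5.0

# A request-reply workload: 64-byte payloads, 10-microsecond round trip.
# Only ~0.05 Gb/s moves, regardless of the link's headline rating.
print(synchronous_throughput_gbps(64, round_trip_us=10))
```

The point of the second function is exactly the fast-CPU-waiting-on-a-query scenario: shaving round-trip latency helps such a workload far more than adding raw bandwidth would.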
Looking deeper into the subsystem, we can see the importance of interconnect "blocking." Blocking occurs when one piece of data gets stuck and prevents another from getting through, even when the second piece has a clear channel ahead of it. The likelihood of interconnect blocking depends a great deal on how the protocol is specified and on how the hardware is designed. Blocking isn't difficult to measure, as long as it's identified during the system design process.
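This behavior can be sketched with a toy model (entirely hypothetical, not any real interconnect's protocol): packets share a single first-in, first-out channel feeding two output ports. If the packet at the head of the queue is destined for a busy port, the packet behind it is stuck too, even though its own port is free.

```python
from collections import deque

def drain_fifo(fifo, busy_ports):
    """Deliver packets from the head of a shared FIFO channel.

    Stops at the first packet whose destination port is busy; everything
    behind that packet is blocked, regardless of its own destination.
    """
    delivered = []
    while fifo:
        if fifo[0]["port"] in busy_ports:
            break  # head of line blocks everything behind it
        delivered.append(fifo.popleft())
    return delivered

fifo = deque([{"id": 1, "port": "A"}, {"id": 2, "port": "B"}])

# Port A is busy, so packet 1 blocks packet 2 even though port B is idle.
print(drain_fifo(fifo, busy_ports={"A"}))  # []
```

A protocol or switch design that gives each destination its own queue (or its own flow-control credits) avoids this particular failure mode, which is why the protocol specification matters as much as the raw link speed.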
Many other metrics are better suited to predicting system performance, and each is readily quantified once you determine what to look for. Luckily, there's an easier way to ensure better overall system performance: a state-of-the-art interconnect such as PCI Express. It provides the high-bandwidth, low-latency, low-overhead, blocking-resistant characteristics that modern subsystems demand. Furthermore, PCI Express silicon is readily available now in a wide range of application-friendly configurations.
This new approach to performance measurement doesn’t mean you no longer need to think about system-level performance. But it does give you a significant head start in your design activity.