
Intel Overhauls Architecture in Alder Lake CPUs Due Out in Late 2021

Aug. 25, 2021
Intel said Alder Lake would be based on its Intel 7 node when the first desktop processors in the family come to market later this year. Other new features of the processor include support for PCIe Gen 5, DDR5, and LPDDR5.

Intel said that its new line of Core processors for personal computers, code-named Alder Lake, will feature two different sets of CPU cores when the first products come to the desktop market this year.

As part of its annual Architecture Day, the semiconductor giant shared details on the Alder Lake CPUs and the new x86 microarchitectures inside. Instead of combining as many identical CPU cores as possible, the Alder Lake chips have a hybrid architecture, where less strenuous chores run on smaller “Efficient” cores (code-named Gracemont), while high-priority tasks are reserved for bigger “Performance” cores (code-named Golden Cove). Intel said the combination gives it a large performance uplift over its predecessors.

The Context

Intel has been ceding ground in the semiconductor market in recent years in the wake of severe delays in its most advanced processors. Intel CEO Pat Gelsinger has charted out a strategy to regain its leadership in process technology by 2025. But as the recovery plan plays out, Intel is trying to prevent AMD and other rivals from further eroding its market share by leaning harder on advances in its processors' architecture.

Apple is in the process of replacing Intel in its Macs with its internally designed M1 chip. Qualcomm, long the largest smartphone chip vendor, is also expanding its ambitions in laptops with its Snapdragon 8cx.

Intel is trying to get its act together with the heterogeneous architecture at the heart of Alder Lake. The Efficient core (also called the “E-core”), which is designed for multi-threaded throughput, offers a big generational leap in performance and efficiency over Intel's Skylake core. The Performance core (or “P-core”) plays a more central role in Alder Lake, bringing a boost in single-thread performance to help with hefty chores on personal computers, such as gaming.

Intel said Alder Lake will be based on the Intel 7 node, previously known as its 10-nm Enhanced SuperFin process.

Alder Lake

Intel entered the era of heterogeneous architectures with its Lakefield CPUs last year. But the chips paired only a single high-end “Sunny Cove” core with four high-efficiency “Tremont” cores based on its 10-nm process, with a focus on power efficiency at the expense of performance. Intel is pushing the envelope with Alder Lake CPUs, which will have two sets of cores for the first time in one of its flagship processors.

Intel said Alder Lake would have up to 16 cores, split between as many as eight large performance cores and eight small efficiency-focused cores, handling up to 24 threads all at once, with up to 30 MB of cache. Intel said Golden Cove pumps out 19% more performance on a wide range of general-purpose workloads than Intel's Cypress Cove core, while Gracemont is 40% faster (or 40% more power-efficient) than Skylake.

For Alder Lake, Intel said that it engineered a wide range of interchangeable building blocks, from Golden Cove and Gracemont to graphics, display, memory, I/O, and other components, which can be rearranged depending on the market Alder Lake is targeting. Intel said Alder Lake has a highly scalable architecture that can be used in everything from ultramobile laptops to desktop PCs, with power ratings from about 9 W up to 125 W.

“One of our most important goals when designing Alder Lake was to support all client segments through a single highly scalable SoC architecture,” noted Arik Gihon, Alder Lake's chief processor architect. While the desktop processor has up to eight Golden Cove and eight Gracemont cores, the mobile CPU brings together up to six P-cores and eight E-cores and the ultramobile up to two P-cores and eight E-cores. 
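The 24-thread figure quoted for the 16-core desktop part can be reproduced with a little arithmetic. The sketch below assumes that each P-core runs two hardware threads and each E-core runs one; that split is not stated in Intel's remarks above, and the class and field names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreConfig:
    """Illustrative model of one Alder Lake core configuration."""
    p_cores: int
    e_cores: int
    threads_per_p_core: int = 2  # assumption: two hardware threads per P-core
    threads_per_e_core: int = 1  # assumption: one hardware thread per E-core

    @property
    def total_cores(self) -> int:
        return self.p_cores + self.e_cores

    @property
    def total_threads(self) -> int:
        return (self.p_cores * self.threads_per_p_core
                + self.e_cores * self.threads_per_e_core)


# Maximum configurations per segment, as described in the article.
desktop = CoreConfig(p_cores=8, e_cores=8)
mobile = CoreConfig(p_cores=6, e_cores=8)
ultramobile = CoreConfig(p_cores=2, e_cores=8)

for name, cfg in [("desktop", desktop), ("mobile", mobile), ("ultramobile", ultramobile)]:
    print(f"{name}: {cfg.total_cores} cores, {cfg.total_threads} threads")
```

Under those assumptions, the desktop configuration comes out to 16 cores and 24 threads, matching Intel's figures.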

Intel is bringing PCIe Gen 5 into the fold for the first time in its product lineup. Intel said the Alder Lake CPU supports up to 16 lanes of PCIe Gen 5, which can supply up to double the bandwidth of PCIe Gen 4, or up to 64 GB/s over 16 lanes. For desktop, Alder Lake offers 16 lanes of PCIe Gen 5 along with four lanes of PCIe Gen 4. For the mobile segment, it supports up to 12 PCIe Gen 4 lanes and 16 Gen 3 lanes, Intel said.
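A back-of-the-envelope check of the x16 bandwidth figure: the sketch below uses the published per-lane signaling rates for PCIe (8, 16, and 32 GT/s for Gen 3, 4, and 5) and the 128b/130b line encoding those generations share. It ignores packet and protocol overhead, so treat the results as an upper bound rather than measured throughput; the function name is illustrative.

```python
# Raw per-lane signaling rate in gigatransfers per second (one bit per transfer).
PCIE_GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

# Gen 3 through Gen 5 carry 128 payload bits in every 130 bits on the wire.
ENCODING_EFFICIENCY = 128 / 130


def pcie_bandwidth_gbytes(gen: int, lanes: int) -> float:
    """Approximate one-direction link bandwidth in gigabytes per second."""
    gigabits_per_s = PCIE_GT_PER_S[gen] * ENCODING_EFFICIENCY * lanes
    return gigabits_per_s / 8  # bits to bytes


gen5_x16 = pcie_bandwidth_gbytes(5, 16)
gen4_x16 = pcie_bandwidth_gbytes(4, 16)

print(f"Gen 5 x16: {gen5_x16:.1f} GB/s per direction")
print(f"Gen 4 x16: {gen4_x16:.1f} GB/s per direction")
print(f"Gen 5 / Gen 4 ratio: {gen5_x16 / gen4_x16:.1f}")
```

The Gen 5 x16 result is about 63 gigabytes per second in each direction, in line with the "up to 64" figure Intel quoted for 16 lanes, and exactly twice the Gen 4 number.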

Intel also improved the memory controller to fetch information from DRAM at a faster rate than its previous chip designs. Alder Lake ushers in support for the new DDR5-4800 and LPDDR5-5200 standards with a unified memory controller that also supports DDR4-3200 and LPDDR4x. Intel said Alder Lake can adjust the memory’s frequency in response to the workload being executed to save as much power as possible.
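To put the move from DDR4-3200 to DDR5-4800 in perspective, theoretical peak DRAM bandwidth is the transfer rate multiplied by the width of the memory bus. The sketch below assumes a 128-bit total bus, as in a typical dual-channel desktop setup; Intel did not state the bus width in the remarks above, so that value is an assumption, and the function name is illustrative.

```python
def peak_dram_bandwidth_gbytes(mega_transfers_per_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


BUS_WIDTH_BITS = 128  # assumption: 128-bit total memory bus (dual-channel desktop)

ddr5_4800 = peak_dram_bandwidth_gbytes(4800, BUS_WIDTH_BITS)
ddr4_3200 = peak_dram_bandwidth_gbytes(3200, BUS_WIDTH_BITS)

print(f"DDR5-4800: {ddr5_4800:.1f} GB/s")
print(f"DDR4-3200: {ddr4_3200:.1f} GB/s")
print(f"Uplift: {ddr5_4800 / ddr4_3200:.2f}x")
```

With the same bus width, the higher transfer rate alone accounts for a 1.5x increase in peak bandwidth.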

Alder Lake chips have a wide range of interconnect fabrics to connect the varying components inside. The compute fabric runs through the middle of the processor, serving as a sort of spine ribbed on both sides with x86 cores. The fabric supports up to 1,000 gigabytes per second (GB/s), or 100 GB/s for every core, and it can choose the path of least resistance to optimize for bandwidth or latency, depending on the data load.

The I/O fabric supports up to 64 GB/s, while the memory fabric features data rates of up to 204 GB/s and can change speeds based on whether the system needs higher bandwidth, less latency, or lower power.

Intel's big.LITTLE?

Intel is not alone in combining two types of cores in a CPU. This concept has been at the heart of Arm’s big.LITTLE architecture for more than a decade. Many of the world’s smartphone chips have clusters of larger, power-hungry Arm CPU cores reinforced by clans of smaller, high-efficiency but less potent cores to prolong a device’s battery life. Arm used the same arrangement in its DynamIQ architecture, which is meant to give chip designers more flexibility to mix and match CPU cores in a system on a chip (SoC).

The concept has also played a major part in the long dominance of Qualcomm’s Snapdragon chips for Android smartphones as well as Apple’s A-series chips in the iPhone. Apple also used a heterogeneous architecture in its M1 silicon for Macs, which features four high-end cores code-named “Firestorm,” and four power-efficient “Icestorm” cores on a single silicon die, in a configuration that resembles Alder Lake’s.

With Alder Lake, Intel is focused less on saving battery power than it is on wringing out more performance for workloads with a lot of threads, running operations faster by dividing labor between Golden Cove and Gracemont.

Thread Director

To keep the cores working together seamlessly, Intel rolled out a new workload scheduler called “Thread Director” that tells the operating system when to run threads on Golden Cove-class cores and when to run them on Gracemont-class cores. The new technology, wired directly into the hardware, is used to intelligently assist the operating system in assigning both foreground tasks, such as video games, and background tasks to the right core at the right time.

Intel said Thread Director can adjust scheduling decisions in real time based on the conditions the CPU is experiencing instead of assigning threads based on “first-come, first-served” or other pre-programmed rules in the OS. That inflexibility “leaves a lot of performance on the table and creates overhead with software development,” said Intel’s Rajshree Chabukswar.

The tool adds another dimension to how the OS determines where to schedule a thread. In the event a high-priority workload enters the pipeline while all the performance cores are occupied, Thread Director communicates a hint to the OS about threads that can be relocated without a penalty to performance. The system then offloads the existing thread to an efficiency core, creating an open slot for the new thread.
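The relocation step described above can be illustrated with a toy model. This is only a sketch of the general idea, not Intel's hardware logic or any operating system's scheduler: the `Thread`, `Core`, and `place` names are hypothetical, and the `p_core_benefit` score stands in for whatever hint the hardware actually provides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Thread:
    name: str
    high_priority: bool
    p_core_benefit: float  # hypothetical hint: how much the thread gains from a P-core


@dataclass
class Core:
    kind: str  # "P" or "E"
    running: Optional[Thread] = None


def place(thread: Thread, cores: List[Core]) -> Optional[Core]:
    """Place a thread on a core; return that core, or None if every core is busy."""
    free_p = [c for c in cores if c.kind == "P" and c.running is None]
    free_e = [c for c in cores if c.kind == "E" and c.running is None]

    if not thread.high_priority:
        # Background work prefers an E-core and falls back to a free P-core.
        pool = free_e or free_p
        target = pool[0] if pool else None
    elif free_p:
        target = free_p[0]
    elif free_e:
        # All P-cores are occupied: find the resident thread that benefits
        # least from a P-core and, if the newcomer benefits more, move the
        # resident to a free E-core to open a P-core slot.
        busy_p = [c for c in cores if c.kind == "P"]
        victim = min(busy_p, key=lambda c: c.running.p_core_benefit, default=None)
        if victim is not None and victim.running.p_core_benefit < thread.p_core_benefit:
            free_e[0].running = victim.running
            target = victim
        else:
            target = free_e[0]
    else:
        target = None

    if target is not None:
        target.running = thread
    return target


# Two P-cores, both busy, and two idle E-cores.
cores = [Core("P"), Core("P"), Core("E"), Core("E")]
cores[0].running = Thread("render", high_priority=True, p_core_benefit=0.9)
cores[1].running = Thread("file-sync", high_priority=False, p_core_benefit=0.1)

landed = place(Thread("game", high_priority=True, p_core_benefit=0.95), cores)
for core in cores:
    print(core.kind, core.running.name if core.running else "idle")
```

In this run the low-benefit "file-sync" thread moves to the first E-core and the new "game" thread takes over its P-core, while "render" stays where it was.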

Intel said Thread Director monitors the mix of instructions in each thread and the state of each core, offering feedback while programs are running. The system adapts the advice it gives the OS based on its power limits and the amount of heat the processor can tolerate.

“Nothing is static based on any software,” Chabukswar pointed out. “Everything is dynamic, based on the current context of whatever is running on the system, all augmented by hardware telemetry.”

