Density vs. latency: Hitting the sweet spot

Selecting the best memory solution in terms of system performance often forces designers to trade memory density against performance.

As today's processors continue to speed up, main memory grows larger. Unfortunately, memory access time (latency) has seen little improvement. In fact, with each succeeding generation, main memory access takes longer in terms of processor cycles. Although other variables are involved, memory latency is closely tied to overall system performance. Therefore, the choice of memory technology can dramatically impact system performance as well as cost.

Memory tradeoffs

Traditionally, embedded SRAM has been the designer's choice for fast memory. This choice, however, comes at the expense of cost and silicon area. Alternatives such as embedded DRAM or Z-RAM are much lower-cost options, but they have higher latency and are typically used farther from the processor. In spite of this, having more memory closer to the processor often yields a performance advantage, even when the raw memory latency is higher. With such opportunities for lower cost and higher performance, alternative memories are now replacing SRAM in what was traditionally sacred SRAM territory.

One alternative memory option—Z-RAM technology—is a new player in this market and offers some compelling features:

• Simple process integration: Z-RAM requires no process modification for logic processes built on SOI. By comparison, DRAM is difficult to build on a logic process.

• Minimal standby power: When the Z-RAM array isn't in use, all nodes in each memory cell are tied to ground. As a result, there's no array leakage, only refresh current, and the refresh current is small compared to typical SRAM array leakage. Consequently, Z-RAM can offer orders of magnitude lower standby power.

• Low soft-error rate: Z-RAM soft-error rates are consistent with embedded DRAM, and are an order of magnitude better than SRAM.

• Better value proposition: At a system level, designers can fit 4x as many bits into the same silicon area with Z-RAM as with SRAM. Even if Z-RAM is slower, the larger capacity can improve overall memory performance.

• Cost savings: Cost savings are proportional to the silicon area occupied. Because cost scales with mm², cutting the memory area by a factor of 4 cuts memory cost by at least a factor of 4. The "at least" comes from yield: a smaller area also raises yield, which provides added gains.

• Lithography friendly: Z-RAM memory technology features a logic-based, highly lithography-friendly layout. Its regular, grating-like structures are amenable to reticle enhancement technologies. In addition, the bit lines and word lines in the layout are straight, which helps maintain high yields (Fig. 1).

SOI Process

Z-RAM offers a compelling memory option for system-on-a-chip (SoC) and MPU applications built on a silicon-on-insulator (SOI) process. For applications built on bulk silicon, there can be compelling reasons to convert to SOI. These include lower junction capacitance, speed/power advantages, a lower soft-error rate, improved scalability, and the ease of closely coupling memory (both SRAM and Z-RAM) to logic. These benefits can quickly tilt the scale in favor of a transition to SOI.

System performance model

When evaluating system performance, one of the first parameters to consider is the time it takes to retrieve data from cache (a cache hit). Here, raw access time (latency) comes into play. If the data is not present in the cache (a cache miss), the system must go to the next level of the memory hierarchy (e.g., a larger cache, discrete memory, or a hard drive) to retrieve it. This typically incurs a large time penalty due to the overhead involved.

A common rule of thumb is that for every level of cache you step away from the processor, the number of cycles it takes for memory access jumps by a factor of 10. Obviously, then, it's desirable to have as high a hit rate as possible. Granted, other cache structure variables will impact hit rate, but the one that typically has the most direct impact is the total size of the cache—the more memory, the higher the probability of a hit.
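To see how the 10x rule of thumb interacts with hit rate, the sketch below applies the average-latency formula used later in this article level by level: a miss at one level costs the average latency of the level beneath it, and the last level is assumed to always hit. The latencies (1, 10, and 100 cycles) and the uniform hit rates are hypothetical values chosen only to follow the rule of thumb, and `hierarchy_latency` is an illustrative name, not an existing API.

```python
def hierarchy_latency(level_latencies, hit_rates):
    """Average access time (cycles) for a hypothetical memory hierarchy.

    level_latencies: hit latency at each level, nearest the processor first.
    hit_rates: hit probability for every level except the last, which is
    assumed to always hit (e.g., main memory).
    At each level: hit_latency * P(hit) + (next level's latency) * P(miss).
    """
    if len(hit_rates) != len(level_latencies) - 1:
        raise ValueError("need one hit rate per level except the last")
    # Start at the last level and work back toward the processor.
    latency = level_latencies[-1]
    for hit_latency, hit_rate in zip(reversed(level_latencies[:-1]),
                                     reversed(hit_rates)):
        latency = hit_latency * hit_rate + latency * (1 - hit_rate)
    return latency


# Hypothetical hierarchy: each level 10x slower than the one above it.
latencies = [1, 10, 100]
for rate in (0.80, 0.90, 0.95):
    avg = hierarchy_latency(latencies, [rate, rate])
    print(f"{rate:.0%} hit rate at each cache level -> {avg:.2f} cycles")
```

Because each step down the hierarchy costs roughly 10x more, even a few points of hit rate at the levels nearest the processor move the average substantially.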

In typical applications, hit rates (for a given cache level) can range from 80 to 95%. It's reasonable to expect many applications, particularly those with lower hit rates, to see major hit-rate boosts with larger cache size.
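A quick sweep across the 80 to 95% range shows how much a hit-rate gain is worth. This sketch assumes the SRAM figures from the example that follows (a 10-cycle hit and a 100-cycle miss); the function and constant names are illustrative only.

```python
def average_latency(hit_latency, miss_latency, hit_rate):
    # Hit latency * Hit probability + Miss latency * Miss probability
    return hit_latency * hit_rate + miss_latency * (1 - hit_rate)


HIT_CYCLES = 10    # assumed: 4 raw SRAM cycles + 6 cycles of system overhead
MISS_CYCLES = 100  # assumed cache-miss penalty

for pct in range(80, 100, 5):
    rate = pct / 100
    avg = average_latency(HIT_CYCLES, MISS_CYCLES, rate)
    print(f"{pct}% hit rate -> {avg:.1f} cycles")
```

Under these assumptions, each 5-point rise in hit rate removes (100 - 10) * 0.05 = 4.5 cycles from the average, which is why a larger cache can offset a slower memory cell.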

Consider the following example, which compares a four-clock raw latency for SRAM with a six-clock raw latency for Z-RAM. Assume that, because of typical system overhead, raw memory-access time is only a fraction of the total memory-access time. In both cases, six clock cycles are allocated for system overhead, giving hit latencies of 10 clocks for SRAM and 12 clocks for Z-RAM, and 100 clock cycles are allocated for a cache miss. Due to the smaller Z-RAM cell size relative to SRAM, the Z-RAM cost/bit is assumed to be four times lower than SRAM's. These assumptions create a relatively conservative model that favors the SRAM solution.
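The assumptions above can be captured in a small parameter model. This is a sketch: `MemoryOption` and its fields are hypothetical names, and the numbers are the article's assumed values (raw latency, 6 cycles of overhead, and a 4x cost/bit ratio), not measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryOption:
    name: str
    raw_latency: int              # cycles for the array access itself
    relative_cost_per_bit: float  # normalized so that SRAM = 1.0

    def hit_latency(self, overhead: int) -> int:
        # Total hit time = raw array latency + fixed system overhead
        return self.raw_latency + overhead


OVERHEAD_CYCLES = 6  # system overhead assumed for both technologies

SRAM = MemoryOption("SRAM", raw_latency=4, relative_cost_per_bit=1.0)
ZRAM = MemoryOption("Z-RAM", raw_latency=6, relative_cost_per_bit=0.25)

for option in (SRAM, ZRAM):
    print(option.name, option.hit_latency(OVERHEAD_CYCLES), "cycles per hit")
```

Keeping overhead separate from raw latency makes it easy to see that a 50% slower array (6 vs. 4 cycles) becomes only a 20% slower hit (12 vs. 10 cycles) once overhead is included.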

In Scenario 1, assume that both solutions provide the same memory capacity and therefore the same 75% hit rate. Here, system performance is higher for SRAM, and the primary advantage of Z-RAM is reduced cost. SRAM latency would be 10 clock cycles 75% of the time and 100 clock cycles 25% of the time. This yields an average latency of:

Average latency = (Hit latency * Hit probability) + (Miss latency * Miss probability)

SRAM Average Latency = (10*0.75) + (100*0.25) = 7.5 + 25 = 32.5 clock cycles

Z-RAM Average Latency = (12*0.75) + (100*0.25) = 9.0 + 25 = 34 clock cycles

In this situation, the Z-RAM solution reduces memory cost fourfold, but increases the average latency from 32.5 to 34 clock cycles—4.6% longer than the SRAM solution.
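The Scenario 1 arithmetic can be checked directly. This sketch simply applies the average-latency formula to the article's assumed values; the function name is illustrative.

```python
def average_latency(hit_latency, miss_latency, hit_rate):
    # Hit latency * Hit probability + Miss latency * Miss probability
    return hit_latency * hit_rate + miss_latency * (1 - hit_rate)


MISS_CYCLES = 100
HIT_RATE = 0.75  # same capacity, so the same hit rate for both

sram = average_latency(10, MISS_CYCLES, HIT_RATE)  # 4 raw + 6 overhead
zram = average_latency(12, MISS_CYCLES, HIT_RATE)  # 6 raw + 6 overhead
penalty_pct = (zram - sram) / sram * 100

print(f"SRAM {sram} cycles, Z-RAM {zram} cycles "
      f"(Z-RAM {penalty_pct:.1f}% slower)")
```

With equal hit rates, the only difference is the 2-cycle hit penalty weighted by the 75% hit probability, which adds 1.5 cycles to the average.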

In Scenario 2, assume that the Z-RAM cache size doubles (still occupying half the area of the SRAM cache) and that its hit probability rises from 75 to 87.5%, while the SRAM cache stays at its original size. Here, the average latency comparison yields:

SRAM Average Latency = (10*0.75) + (100*0.25) = 7.5 + 25 = 32.5 clock cycles

Z-RAM Average Latency (2x density) = (12*0.875) + (100*0.125) = 10.5 + 12.5 = 23 clock cycles

In this situation, Z-RAM halves the memory cost (twice the bits at one-quarter the cost per bit) and cuts average latency by 29% relative to SRAM (23 versus 32.5 cycles) (Fig. 2).
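The Scenario 2 figures follow from the same formula plus the assumed 4x cost/bit advantage. In this sketch the 87.5% hit rate for the doubled cache is the article's assumption, not a derived value, and the names are illustrative.

```python
def average_latency(hit_latency, miss_latency, hit_rate):
    # Hit latency * Hit probability + Miss latency * Miss probability
    return hit_latency * hit_rate + miss_latency * (1 - hit_rate)


MISS_CYCLES = 100
ZRAM_COST_PER_BIT = 0.25  # relative to SRAM = 1.0 (assumed 4x lower)

sram = average_latency(10, MISS_CYCLES, 0.75)   # baseline capacity
zram = average_latency(12, MISS_CYCLES, 0.875)  # 2x capacity, higher hit rate

improvement_pct = (sram - zram) / sram * 100
relative_cost = 2 * ZRAM_COST_PER_BIT  # twice the bits at 1/4 the cost per bit

print(f"SRAM {sram} vs Z-RAM {zram} cycles: "
      f"{improvement_pct:.0f}% lower average latency")
print(f"Z-RAM memory cost relative to SRAM: {relative_cost}")
```

The larger cache trades a slightly slower hit (12 vs. 10 cycles) for halving the number of 100-cycle misses, and the miss term dominates the average.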

In conclusion, this system performance model shows that when Z-RAM's density advantage is used to build a larger cache, Z-RAM technology offers a strong opportunity to both reduce memory cost and increase performance.

David Fisch is director of architecture at Innovative Silicon.