Unified DDR3 Memory Channel Design for High Bandwidth Apps with Legacy Components

The demand for DDR3 memory channel bandwidth is growing rapidly in applications such as high-definition video and TV, smartphones, and networking components. To meet these requirements, SoCs usually include a high-bandwidth DDR3 memory channel. However, in a number of SoCs, many legacy components still need to access a normal DDR3 memory channel as well.

Some legacy components may interface with a DDR3 memory channel through a non-split transaction bus protocol (such as AHB). SoC designers often create separate DDR3 memory channels to prevent these non-split transaction legacy components from blocking high-bandwidth applications. This implementation, however, may significantly increase both pin count and design complexity.

To minimize pin count and reduce SoC design complexity, this article introduces a unified single DDR3 memory channel architecture that can simultaneously serve legacy components without creating performance bottlenecks.

Unified Memory Channel Background

Memory channel performance plays an important role in overall SoC system performance. In many applications, several components inside the SoC each require a certain amount of memory bandwidth. For example, a high-definition video component requires memory bandwidth to fetch display data from the memory channel. In virtually every SoC, total memory bandwidth is shared by more than one component. Under such conditions, effective memory bandwidth is more than a matter of memory bus bandwidth or memory controller logic. Sometimes the interface protocol of a component creates a memory channel bandwidth bottleneck.

Figure 1 shows a regular memory channel architecture shared by two different components.

Figure 1. Memory channel design with two components.

Besides the effective memory bus bandwidth and the memory controller efficiency, the component interface also plays a very important role in memory channel bandwidth. The component interface protocol may create a bottleneck through the access latency it adds to read transactions.

There are different kinds of component interface protocols. A modern component interface usually supports a split-transaction protocol. This means that the component can issue read transactions efficiently without blocking other traffic on the channel. A split-transaction read protocol is shown in Figure 2 and Figure 3.

Figure 2. Split-transaction read command is more efficient without causing blocking.

Figure 3. Split-transaction read data timing diagram.

As shown in Figure 2, a split-transaction protocol component can issue read commands in a pipelined fashion, each tagged with a read command ID (RdAddrID), to a memory controller without waiting for the read data to return. After an initial lead-off latency, the first data is returned to the component with the corresponding read command ID (RdDatID). The following read data is returned to the component in a pipelined, back-to-back manner without incurring any subsequent latency.

However, many legacy components use a non-split-transaction interface protocol. Often, a legacy component's read transaction behavior creates a memory channel bottleneck. The non-split transaction read protocol is illustrated in Figure 4.

Figure 4. Non-split transaction read protocol can be a memory channel bottleneck.

In a non-split transaction read protocol, the next read command can be issued only after the data for the prior read transaction has returned. Every read transaction therefore pays one full access latency, and the legacy interface protocol becomes the performance bottleneck of the memory channel.
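The difference between the two read protocols can be illustrated with a simple cycle-count model. The sketch below is an idealized approximation, not a model of any particular controller: it assumes a fixed access latency, a fixed number of data cycles per read, and a memory controller that accepts pipelined commands back to back. The function names and the numbers in the example are hypothetical.

```python
def non_split_cycles(num_reads: int, latency: int, data_cycles: int) -> int:
    """Non-split protocol: every read pays the full access latency,
    because the next command waits for the previous data to return."""
    return num_reads * (latency + data_cycles)


def split_cycles(num_reads: int, latency: int, data_cycles: int) -> int:
    """Split protocol (idealized): commands are pipelined, so only the
    first read pays the lead-off latency and data returns back to back."""
    if num_reads == 0:
        return 0
    return latency + num_reads * data_cycles


if __name__ == "__main__":
    # Illustrative numbers only: 8 reads, 20-cycle latency, 4 data cycles each.
    n, lat, data = 8, 20, 4
    print(non_split_cycles(n, lat, data))  # 192
    print(split_cycles(n, lat, data))      # 52
```

Under these assumptions the non-split component occupies the channel for 192 cycles to move the same data that a split-transaction component moves in 52, which is why the interface protocol itself can become the bottleneck.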

At the same time, the legacy component may hold the memory channel resources and block the split-transaction component. To solve this bottleneck problem, SoC designers usually use a multiple memory channel architecture as shown in Figure 5. This approach, however, adds design complexity and increases pin count.

The answer to this set of trade-offs is a unified memory channel architecture that resolves the performance bottleneck within a single channel.

Figure 5. A multi-channel architecture gives legacy components a separate memory channel.

Unified Memory Channel Summary

Figure 6 shows a unified memory channel architecture that can deliver optimal memory bus bandwidth for both modern components and legacy components. In this architecture, the SoC uses only one unified memory channel block.

Figure 6. A unified memory channel architecture can deliver optimal bandwidth.

For modern components with a split-transaction interface, the split-transaction interface buffering bridge issues pipelined split-transaction commands to the memory controller. The bridge for each component has its own FIFO to store commands and data, so that no component blocks another's command or data flow. This architecture requires independent buffering resources for each component, so that there is no resource conflict among the components.
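A minimal sketch of the per-component buffering idea follows. The class and function names are hypothetical, and the round-robin arbiter is an assumption made for illustration; the article does not specify how the memory controller chooses among bridges. The point shown is that a full FIFO applies back-pressure only to its own component.

```python
from collections import deque


class SplitBridge:
    """Hypothetical per-component bridge with its own bounded command FIFO."""

    def __init__(self, name: str, depth: int) -> None:
        self.name = name
        self.depth = depth
        self.commands = deque()

    def push(self, cmd) -> bool:
        """Accept a command unless this bridge's own FIFO is full."""
        if len(self.commands) >= self.depth:
            return False  # back-pressure affects this component only
        self.commands.append(cmd)
        return True


def arbitrate(bridges, start: int):
    """Assumed round-robin arbiter: scan from `start` and pop one command.

    Returns (bridge_name, cmd, next_start), or None if every FIFO is empty.
    """
    count = len(bridges)
    for offset in range(count):
        idx = (start + offset) % count
        bridge = bridges[idx]
        if bridge.commands:
            return bridge.name, bridge.commands.popleft(), (idx + 1) % count
    return None


if __name__ == "__main__":
    video = SplitBridge("video", depth=2)
    net = SplitBridge("net", depth=2)
    video.push("v0")
    video.push("v1")
    print(video.push("v2"))  # False: video's FIFO is full
    print(net.push("n0"))    # True: net is unaffected
    print(arbitrate([video, net], 0))
```

Because each bridge owns its storage, a stalled or saturated component never consumes buffer entries that another component needs.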

Legacy components have two bridge modules that connect the legacy component to the memory controller. The first is the non-split transaction buffering bridge, which adapts the component's non-split transaction interface protocol for the second module, the pseudo-split transaction buffering bridge. The access latency induced by a non-split transaction is absorbed by the pseudo-split transaction buffering bridge and does not block the modern split-transaction components.

As an example, in a regular architecture, an incremental-burst command of unspecified length from a legacy component occupies the memory controller's read access resources until the source completes the command. In the unified memory channel architecture, however, the pseudo-split transaction interface bridge breaks the incremental burst of unspecified length into pseudo-split transaction commands, so that these commands do not hold memory controller resources for the entire duration of the non-split transaction. The architecture of the pseudo-split transaction interface bridge is described in the next section.

Unified Memory Channel Detail

The pseudo-split transaction buffering bridge includes four modules, as shown in Figure 7: a source command interpreter, a split-command generator, a return data buffer, and a buffer control module.

Figure 7. A pseudo-split transaction interface bridge is designed to handle the differences between protocols.

To break the access latency of legacy non-split transaction commands, the source command interpreter accepts each source command and records its attributes, such as its length (or the fact that it is an incremental burst of unspecified length). It uses this information to handle the differences between the non-split transaction protocol and the pseudo-split transaction protocol and to handshake correctly with the non-split transaction buffering bridge.

The split-command generator is responsible for taking the on-going non-split transaction command and breaking it into split-transaction commands for the memory controller. For example, when the source command interpreter passes on an incremental-burst command of unspecified length, the split-command generator partitions it into fixed-length commands and issues a pre-defined number of these partitioned commands to the memory controller.
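The partitioning step can be sketched as a small generator. This is an illustration under stated assumptions, not the article's implementation: the `SplitCommand` fields, the parameter names, and the choice of consecutive cmd_id values are all hypothetical, and the chunk size and command count stand in for the "fixed length" and "pre-defined number" mentioned above.

```python
from typing import Iterator, NamedTuple


class SplitCommand(NamedTuple):
    """Hypothetical fixed-length read command sent to the memory controller."""
    cmd_id: int
    address: int
    beats: int


def partition_burst(start_addr: int, beat_bytes: int, chunk_beats: int,
                    num_commands: int) -> Iterator[SplitCommand]:
    """Yield `num_commands` fixed-length commands over consecutive addresses.

    The true burst length is unknown, so the generator issues a pre-defined
    number of chunks rather than covering a known range.
    """
    chunk_bytes = chunk_beats * beat_bytes
    for cmd_id in range(num_commands):
        yield SplitCommand(cmd_id, start_addr + cmd_id * chunk_bytes, chunk_beats)


if __name__ == "__main__":
    # Illustrative values: 4-byte beats, 8-beat chunks, 3 commands in flight.
    for cmd in partition_burst(0x1000, beat_bytes=4, chunk_beats=8, num_commands=3):
        print(cmd.cmd_id, hex(cmd.address), cmd.beats)
    # 0 0x1000 8
    # 1 0x1020 8
    # 2 0x1040 8
```

Each yielded command is independent of the others, so the memory controller can schedule it like any other split-transaction read.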

After the pipelined commands are issued, the split-command generator releases the memory controller address path immediately, just like a split-transaction buffering bridge, and lets the return data buffer module track and wait for the return data. After the data is returned from the memory controller, the return data buffer module sends it back to the source command interpreter, which handles the protocol with the non-split transaction buffering bridge. In this way, the source component still interfaces with the memory channel using the non-split transaction protocol.

There is still, however, a flow control problem. Since the length of the on-going non-split transaction is unknown for an incremental burst of unspecified length, the split-command generator does not know how many pre-defined partitioned commands it must issue to the memory controller to satisfy the on-going non-split transaction.

To solve this flow control problem, the buffer control module controls how many commands the split-command generator issues and performs the clean-up at the end of the on-going non-split transaction.

After the on-going non-split transaction starts, the split-command generator issues the transaction's pipelined commands based on its communication with the memory controller. When the on-going non-split transaction command completes, the buffer control module receives an end-of-transaction notice from the source command interpreter. This notice triggers the buffer control module to tell the split-command generator to stop issuing commands and return to an idle state to await a new transaction.

At the same time, the buffer control module tells the return data buffer module to begin a clean-up sequence. First, the buffer control module waits until all return data belonging to the on-going non-split transaction has been sent back to the source command interpreter. Second, it waits until the return data for every pipelined command issued by the split-command generator has been returned from the memory controller. Once both conditions are confirmed, the buffer control module flushes all unnecessary data and status in the return data buffer module so that it is ready to process the next non-split transaction from the split-command generator.
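The stop-and-clean-up sequence can be sketched as a small bookkeeping model. Everything here is hypothetical: the class name, the counters, and the `beats_still_owed` parameter are assumptions introduced to make the two wait conditions explicit, and the model merges the buffer control module and the return data buffer into one object for brevity.

```python
from collections import deque
from typing import Optional


class ReturnDataBuffer:
    """Hypothetical model of the return data buffer with buffer control state."""

    def __init__(self) -> None:
        self.outstanding = 0        # commands issued whose data has not returned
        self.data = deque()         # returned beats not yet delivered to the source
        self.stop_issuing = False   # set by the end-of-transaction notice
        self.owed = 0               # beats the source still expects after the notice

    def command_issued(self) -> None:
        if self.stop_issuing:
            raise RuntimeError("generator must stay idle during clean-up")
        self.outstanding += 1

    def data_returned(self, beats) -> None:
        self.outstanding -= 1
        self.data.extend(beats)

    def deliver(self):
        """Hand one beat to the source command interpreter."""
        if self.stop_issuing and self.owed > 0:
            self.owed -= 1
        return self.data.popleft()

    def end_of_transaction(self, beats_still_owed: int = 0) -> None:
        """Notice from the source command interpreter: stop issuing commands."""
        self.stop_issuing = True
        self.owed = beats_still_owed

    def try_cleanup(self) -> Optional[int]:
        """Flush surplus data once both wait conditions hold.

        Returns the number of discarded beats, or None if clean-up must wait.
        """
        if not self.stop_issuing:
            return None
        if self.owed > 0:           # step 1: source has not received all its data
            return None
        if self.outstanding > 0:    # step 2: commands still in flight
            return None
        discarded = len(self.data)
        self.data.clear()
        self.stop_issuing = False   # ready for the next non-split transaction
        return discarded
```

The flush is deferred until no command is in flight so that late-arriving data from the old transaction cannot be mistaken for data belonging to the next one.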

The description above shows how the pseudo-split transaction buffering bridge handles an incoming non-split transaction without holding memory controller resources for the duration of that transaction.

Conclusion

In this article, we have described how a unified memory channel architecture simultaneously handles split-transaction and non-split-transaction protocols without suffering a shared bandwidth penalty.

With advances in memory technology, high-performance (high-bandwidth) and high-capacity memory components are becoming increasingly popular. Consequently, many SoC design bottlenecks can be resolved by using these ultra-high-performance memory components.