MLC Challenges Mobile-Entry Barriers

Smart phones, personal digital assistants (PDAs), and other devices are luring users with increased functionality and personalization options. These designs are based on standard operating systems. They offer PC-like functionality and a PC-like look and feel. The new devices support multiple software applications and more sophisticated hardware, such as color screens. With a greatly increased storage area, they can hold a mix of audio, video, and text files. To incorporate such impressive features, however, their storage requirements have had to become substantially greater. For example, 2.5G handsets incorporate 128-Mb (16-MB) or even 256-Mb (32-MB) Flash memory. In contrast, 2G handsets housed only 16 to 32 Mb (2 to 4 MB) of memory.

As storage requirements continue to climb, the demand for sleek packaging also must be met. This is particularly true for the smart-phone market, in which small size and low weight are critical design elements. To gain competitive ground, Flash-memory vendors are trying to squeeze more and more capacity into constantly shrinking silicon dies. Only a handful of technologies have found a way to pack more information into a single memory cell. Of these technologies, multi-level cell (MLC) is considered the most mature.

By reducing Flash die size, MLC technology achieves a breakthrough cost structure. Unlike binary, or single-level-cell, Flash technology, it stores two or more bits of data per physical cell instead of the traditional one bit per cell. MLC does face some obstacles, however. The increased density of the MLC-based Flash media affects data reliability and performance. In addition, MLC must program and sense the correct voltage level accurately and quickly.

Various hardware and software solutions on today's market vow to overcome these problems. Their goal is to enable MLC technology for NAND, NOR, and AND Flash media. No single company has managed to live up to this task on its own. Through partnerships, however, a few companies have introduced products that implement MLC technology with varying levels of success: Intel with NOR Flash, Hitachi with AND Flash, and Toshiba with NAND Flash. Intel first mass-produced MLC Flash in 1999 under the name StrataFlash. This product doubles the capacity of NOR Flash while achieving adequate reliability. Unfortunately, it is far slower than standard NOR Flash.

Compared to NOR and AND devices, NAND Flash appeared to be the ideal medium for data storage. It offers high-speed erase and write, high density, and small size. Based on these characteristics, Toshiba chose NAND Flash as the basis upon which it would implement MLC technology. The company's first MLC NAND product was introduced in December 2002. It offers up to a 50% decrease in die size compared to standard NAND. Against competing NOR MLC products, Toshiba claims a size reduction of about 70%.

NAND Flash itself is not a perfect medium, however. It contains a number of randomly scattered bad blocks. It also requires on-the-fly error correction. Because it uses a non-standard I/O interface, it is difficult to integrate. These limitations are dramatically worsened in MLC NAND. Some other problems are also exacerbated. These include the different software interfaces and a slower programming time compared to standard NAND. The combination of these characteristics makes MLC NAND extremely difficult to use as a standalone local-data-storage solution.

Figure 1 shows the basic structure of a Flash memory cell. Though it is similar to a standard MOS transistor, a Flash cell must be able to retain its charge after power removal. Only then can it permanently store data. To enable this charge retention, a layer called the floating gate is added between the substrate and the select gate. Layers of oxide isolate it from the substrate and the select gate.

A transistor can be biased to optionally conduct a current between its source and drain. In other words, voltage can be applied to the source, drain, gate, and substrate. The voltage level at which the transistor conducts is called its threshold voltage (V_Th). The transistor only conducts if the voltage between the select gate and the source (V_GS) is larger than V_Th. Adding/removing charge to/from the floating gate modifies V_Th.

To determine if the floating gate is charged, two conditions must be met. A specific V_GS must be applied to the cell. In addition, the circuit must be capable of sensing if the transistor is conducting. These basic elements are needed to implement Flash data storage.

In Flash devices that implement binary-Flash technology, two possible ranges exist for V_Th. In contrast, MLC technology can have several valid V_Th ranges. The first implementation of MLC uses four voltage levels (Fig. 2). Each state is mapped to one of four combinations of two bits. As a result, the cell can store two bits of data.
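The difference between the two schemes can be sketched in a few lines of Python. The reference voltages and the bit assignments below are illustrative assumptions, not values from any datasheet; the point is only that sensing a four-level cell means locating V_Th among three boundaries instead of one.

```python
from bisect import bisect_right

# Hypothetical reference voltages (volts) that separate the valid V_Th ranges.
BINARY_REFS = [2.0]             # one boundary  -> two ranges
MLC_REFS = [1.0, 2.0, 3.0]      # three boundaries -> four ranges

# Assumed bit assignments; a real device defines its own mapping.
BINARY_BITS = ["1", "0"]                # lowest range (erased cell) reads as 1
MLC_BITS = ["11", "10", "00", "01"]     # Gray code: neighbouring ranges differ in one bit


def sense(v_th, refs, bits):
    """Return the bit pattern of the V_Th range that v_th falls into."""
    return bits[bisect_right(refs, v_th)]


for v in (0.4, 1.5, 2.5, 3.6):
    print(v, sense(v, BINARY_REFS, BINARY_BITS), sense(v, MLC_REFS, MLC_BITS))
```

The Gray-coded assignment is a design choice made for this sketch: if a cell is misread as an adjacent range, only one of its two bits is wrong.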

Figure 3 shows some of the complexity that is spawned by the migration from binary Flash to MLC. Because the circuits must maintain tighter V_Th tolerances, the programming and erase processes become more complicated. The result is longer program and erase times and a more complicated read process.

MLC also offers financial benefits. Its high-density design innovations reduce silicon die size, which is the major contributor to overall device cost. For MLC NAND, this reduction in size and cost is greatest in capacities of 256 Mb (32 MB) and higher. That die can be up to 50% smaller than the die of a same-capacity binary-Flash device. These savings must be measured in both dollars and space.

In the cellular-phone market in particular, every millimeter of real estate can have an impact on the size of the product and—ultimately—market success. When compared with binary Flash, however, these high-density design innovations introduce three major limitations: data reliability, performance, and Flash management.

Data reliability:
As noted, a binary-Flash cell must distinguish between two voltage states while an MLC-Flash cell must distinguish between four. Yet binary- and MLC-based devices both use a voltage window with a similar size. The distance between adjacent voltage levels is therefore much smaller in MLC than it is in binary Flash.
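A short calculation makes the shrinking margin concrete. This is an idealized sketch: it assumes the nominal levels are spread evenly across the window, and the 3-V window is a made-up figure used only for illustration.

```python
def level_spacing(window_volts, levels):
    """Distance between adjacent nominal levels spread evenly over the window."""
    if levels < 2:
        raise ValueError("a cell needs at least two levels")
    return window_volts / (levels - 1)


WINDOW = 3.0  # assumed window size in volts, identical for both technologies

binary_gap = level_spacing(WINDOW, 2)   # two levels: one gap spans the window
mlc_gap = level_spacing(WINDOW, 4)      # four levels: three gaps share the window
ratio = mlc_gap / binary_gap

print(f"binary gap: {binary_gap:.2f} V, MLC gap: {mlc_gap:.2f} V, ratio: {ratio:.2f}")
```

Under these assumptions, a four-level cell leaves one third of the spacing that a two-level cell has, whatever the actual window size.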

This reduced distance affects data reliability. In an MLC-Flash cell, detecting voltage levels is a more precise and complex task than it is in a binary-Flash cell. It is thus subject to a higher probability of error. This risk of error can affect data reliability in both the short and long term. For example, assume that the probability of all types of errors in binary Flash is on the order of 10⁻¹⁰. The overall probability of MLC Flash errors is two orders of magnitude worse.

For instance, take long-term data errors. To function reliably as a nonvolatile memory device, Flash memory cells must provide long-term data-retention capabilities. Here, the long-term stability of voltage levels is critical. Leakage to/from the floating gate may alter the voltage level. This leakage tends to slowly change the cell's voltage from its initial level to a different level after cell programming or erasing. The resulting voltage level may be incorrectly interpreted as a different logical value. Due to the smaller distance between MLC levels versus binary-Flash levels, MLC-Flash cells are more likely to be affected by leakage effects. They are potentially more prone to errors.

Another problematic error that crops up is the program-disturb or over-program effect. This error will cause a programming operation on one page to induce a bit-value change on another unrelated page. In binary-Flash technology based on a 0.16-µm manufacturing process, the typical program-disturb error rate is on the order of one bit error per 10¹⁰ bits programmed. MLC-Flash technology, on the other hand, has an error rate on the order of one bit error per 10⁸ bits programmed.
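To see what two orders of magnitude mean in practice, the following sketch estimates program-disturb errors when an entire 256-Mb device is programmed once. The rates of one error per 10¹⁰ and per 10⁸ bits are the order-of-magnitude figures quoted above; treating bit errors as independent and counting 1 Mb as 2²⁰ bits are simplifying assumptions.

```python
import math

BINARY_RATE = 1e-10   # errors per bit programmed, binary Flash (order of magnitude)
MLC_RATE = 1e-8       # errors per bit programmed, MLC Flash (order of magnitude)

BITS_256_MBIT = 256 * 2**20   # assumption: 1 Mb = 2**20 bits


def expected_errors(bits_programmed, rate):
    """Mean number of bit errors for the given amount of programming."""
    return bits_programmed * rate


def prob_at_least_one(bits_programmed, rate):
    """P(one or more errors), assuming independent errors per bit."""
    # 1 - (1 - rate)**n, written with log1p/expm1 to stay accurate for tiny rates
    return -math.expm1(bits_programmed * math.log1p(-rate))


for name, rate in (("binary", BINARY_RATE), ("MLC", MLC_RATE)):
    mean = expected_errors(BITS_256_MBIT, rate)
    p = prob_at_least_one(BITS_256_MBIT, rate)
    print(f"{name}: {mean:.4f} expected errors, P(at least one) = {p:.3f}")
```

Under these assumptions, a single full programming pass over an MLC device is more likely than not to produce at least one disturbed bit, which is why on-the-fly error correction matters more for MLC.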

Lastly, the read-disturb effect causes a page-read operation to induce a permanent bit-value change in one of the read bits. In binary-Flash technology based on a 0.16-µm manufacturing process, the typical read-disturb error rate is on the order of one bit error per 10⁶ repetitive reads of the page containing the bit. MLC cells are more prone to such errors. In actual measurements, however, the degradation is less severe than it is for program-disturb errors. The measured MLC rate is on the order of one bit error per approximately 10⁵ repetitive reads of the page.

Performance:
Compared to binary-Flash technology, MLC technology needs more time to complete the following basic Flash operations: reading a page into the Flash buffer; writing a Flash buffer into a page; and erasing a Flash unit. For write operations in particular, raw Flash comparisons indicate that MLC performance is only 25% that of binary Flash. Many factors other than raw Flash speed influence performance, however. They include host-CPU-bus timing issues; error detection and correction; software algorithms employed by the device driver; file-system overhead; patterns of file access by the user; and bus cycles.

From the user's point of view, raw read or write times are totally irrelevant. Rather, the user "feels" how long it takes, for example, between when a long write-command sequence is issued to the file system and when those requests are completed. To truly quantify these times, measurements should be performed under scenarios that duplicate the real world as closely as possible. First, fill the disk to almost full capacity. Then, perform the measurements while taking into account the hidden software mechanisms that are interfacing the Flash to the user (file system, device driver, etc.).

When sustained-read performance values are compared in real-world scenarios for binary Flash and MLC, the gap between them lessens considerably. MLC performance is 98% of binary-Flash performance. The gap closes because binary Flash and MLC both require the same supporting operations for a sustained read. Examples include running the driver and file-system code and accumulating bus cycles to support address, command, error-correction code, and control information.

To compare sustained-write performance for both technologies in real-world scenarios, one must consider an additional factor: making room for new data when no free space is available. The time that it takes to erase a Flash unit must be added to the calculation. Then, depending on how long it is, add the time that it takes to manage the Flash. For example, using M-Systems' TrueFFS adds 5% of the time that is required to write a unit.
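The calculation described here can be written as a one-line model. The 5% management overhead is the TrueFFS figure from the text; the write and erase times in the usage lines are placeholder values chosen only to make the arithmetic visible, not measurements of any device.

```python
def unit_replacement_time(write_s, erase_s, mgmt_fraction=0.05):
    """Seconds needed to write one unit when a unit must first be erased.

    mgmt_fraction is the Flash-management overhead, expressed as a fraction
    of the unit write time (0.05 for the TrueFFS figure quoted in the text).
    """
    return erase_s + write_s + mgmt_fraction * write_s


# Placeholder timings (hypothetical): 60 ms to write a unit, 2 ms to erase it.
total = unit_replacement_time(write_s=0.060, erase_s=0.002)
print(f"{total * 1000:.1f} ms per unit")
```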

For binary Flash, these calculations result in a sustained write-performance rate of 250 KBps on a low-MIPS platform for a typical mix of files. This number translates into 4 µs per byte. The corresponding rate for MLC is 172 KBps. (Note that the number of sectors per unit for MLC is twice the corresponding number for binary Flash.) When these figures are translated into percentages, MLC sustained-write performance is approximately 69% of binary-Flash write performance.
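The conversions behind these figures are easy to check. The sketch below assumes decimal kilobytes (1 KB = 1000 bytes), which is the interpretation that reproduces the 4-µs-per-byte figure.

```python
BINARY_KBPS = 250      # sustained write rate for binary Flash, from the text
MLC_KBPS = 172         # sustained write rate for MLC, from the text
BYTES_PER_KB = 1000    # assumption: decimal kilobytes


def microseconds_per_byte(kbps):
    """Convert a throughput in KBps into the time spent per byte."""
    return 1_000_000 / (kbps * BYTES_PER_KB)


binary_us = microseconds_per_byte(BINARY_KBPS)
mlc_us = microseconds_per_byte(MLC_KBPS)
relative = MLC_KBPS / BINARY_KBPS

print(f"binary: {binary_us:.2f} us/byte, MLC: {mlc_us:.2f} us/byte")
print(f"MLC write performance: {relative:.0%} of binary")
```

The ratio 172/250 is 0.688, which rounds to the 69% quoted above.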

Write performance varies greatly according to the user's access patterns. Mainly, it is influenced by the average file size. For large files, the rate goes up to approximately 600 KBps. For very small files, it is much lower, because the time that is required for file-system handling becomes more significant than device-driver time. The bus cycle time for writing is practically the same as it is for reading. All of the remaining time is spent on software overhead.

Flash management:
With MLC's architecture, pages can only be written sequentially. But in binary Flash, they can be written randomly within the erase block. MLC also makes partial page programming impossible, whereas binary-Flash technology enables it. Both the sequential write-only and the lack of partial page programming impose limitations on MLC. These limitations affect reliability as well as performance.
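The two restrictions can be expressed as a toy model of an erase block. Everything here is hypothetical and simplified: the class name, the 16-byte page size, and the rule that a program operation can only clear bits are choices made for this sketch and do not describe any particular device or driver.

```python
PAGE_SIZE = 16  # bytes; deliberately tiny, real pages are far larger


class EraseBlock:
    """Toy erase block that enforces page-programming rules."""

    def __init__(self, num_pages, sequential_only, allow_partial):
        self.pages = [bytearray(b"\xff" * PAGE_SIZE) for _ in range(num_pages)]
        self.sequential_only = sequential_only
        self.allow_partial = allow_partial
        self.next_page = 0  # only meaningful when sequential_only is True

    def program(self, page, offset, data):
        if offset + len(data) > PAGE_SIZE:
            raise ValueError("data does not fit in the page")
        if self.sequential_only and page != self.next_page:
            raise ValueError(f"page {page} out of order, expected {self.next_page}")
        if not self.allow_partial and (offset != 0 or len(data) != PAGE_SIZE):
            raise ValueError("partial page programming is not supported")
        for i, byte in enumerate(data):
            self.pages[page][offset + i] &= byte  # programming only clears bits
        if self.sequential_only:
            self.next_page = page + 1


def binary_block(num_pages=4):
    return EraseBlock(num_pages, sequential_only=False, allow_partial=True)


def mlc_block(num_pages=4):
    return EraseBlock(num_pages, sequential_only=True, allow_partial=False)
```

With this model, `binary_block()` accepts `program(2, 0, ...)` followed by `program(2, 4, ...)` and then `program(0, 0, ...)`, while `mlc_block()` raises `ValueError` for any page written out of order or written with less than a full page. A driver for such a device has to buffer data into whole pages and allocate them in order.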

For the mobile-handset industry, MLC technology can potentially deliver breakthrough cost and size benefits for local data and code storage. For this reason, Flash vendors have chosen to take on the challenge of overcoming MLC reliability, performance, and Flash-management limitations. A result of one such effort can be seen in M-Systems' x2 technology. This combination of algorithms, performance enhancements, and Flash-management capabilities was developed in cooperation with Toshiba.

BENEFITS OF x2

Aside from being integrated into the different modules of M-Systems' Mobile DiskOnChip G3 architecture, the x2 technology boasts full compatibility with the company's TrueFFS technology for Flash management. It includes the reliability and performance improvements that are integrated into TrueFFS along with the thin controller and the Flash media itself. By balancing software and hardware, this technology promises to keep reliability and performance at their peak. At the same time, the x2 technology maintains MLC cost and size benefits.

By virtue of its dense manufacturing process and smallest silicon die, NAND seems to be the best choice for high-capacity Flash memory. Yet that high density further intensifies the complexity of MLC. Despite this problem, MLC NAND-based products are already available in the market today. They promise to maximize MLC benefits while delivering higher reliability and performance rates that almost equal the rates reached by binary NAND (See Table).

In summary, the major improvement derived from MLC technology is a much smaller size per bit. This shrunken bit size leads to a greatly reduced silicon size. Of course, these advantages come with added complexity in the device hardware architecture and device driver software. They also invite reliability and performance tradeoffs. Yet MLC's appeal continues to grow because of its size benefits and cost-effectiveness. As a result, vendors of the most popular Flash technologies used in mobile handsets—NAND, NOR, and AND—have implemented ways to overcome MLC limitations.