Interview: Kent Smith Addresses Error Correction And Flash Storage Technology

Higher densities and packing more bits into a single cell have pushed up the number of errors that occur in flash memory. Designers have used error correcting codes (ECC) to address the problem, but methods used with earlier storage technology have proven inadequate as densities have increased. This is why Low-Density Parity Check (LDPC) error correction has become critical to high-density flash storage (see “Low-Density Parity Check Error Correction for Solid State Storage”). Not all implementations of LDPC are the same: stronger levels can correct more errors, although they take more time to execute (Fig. 1).

Figure 1. LDPC decoding latency can be minimized by using progressively stronger (and slower) forms of soft-decision (SLDPC) decoding only as needed when hard-decision (HLDPC) decoding fails.

LSI recently introduced SHIELD technology with advanced LDPC error correction in its next-generation SandForce flash controller. I spoke with Kent Smith, senior director of marketing for the Flash Components Division of LSI Corporation. He is responsible for all outbound marketing and performance analysis.

Wong: Is SHIELD technology LSI’s first implementation of LDPC error correction?

Smith: Thanks, Bill, for your interest in LDPC error correction and LSI SHIELD technology. SHIELD technology really is an exciting advancement and we are pleased to have this opportunity to share our perspectives with your readers.

LSI’s first implementation of LDPC codes was to correct errors in the magnetic media of hard disk drives. LSI TrueStore read channels with LDPC iterative decoding technology have been shipping in high volume for HDDs since 2010. This experience and engineering expertise are leveraged in SHIELD error correction technology. LSI development techniques and tools were instrumental in adapting LDPC error correction to flash memory, enabling delivery of robust and advanced error correction optimized for the solid state storage market. Leveraging our expertise in the area of ECC evaluation and testing was particularly valuable in accelerating the time-to-market for SHIELD error correction.

Wong: What capabilities does LSI SHIELD technology include?

Smith: That’s a great question because SHIELD technology definitely includes more than just LDPC error correction. The LSI SHIELD implementation of LDPC decoding includes both hard- and soft-decision LDPC decoding with three value-added features that improve performance. The first is a multi-level error correction schema that employs progressively stronger forms of soft-decision decoding only as needed. The second is sophisticated digital signal processing to improve speed and accuracy when reading cells. And the third is the use of parallel LDPC engines with specialized hardware acceleration. We believe these three features give SHIELD technology industry-leading performance for LDPC error correction for flash memory.

SHIELD technology also includes two other advances that provide complementary error correction capabilities. The first is intelligent error mitigation. This technology identifies common causes of errors, such as program/erase cycling and read disturb, and initiates the most expeditious action needed to mitigate the problem. Depending on the cause of the error, SHIELD error correction might perform a simple program/erase cycle and/or relocate the data to a different page or block to prevent future problems.

The second complementary capability involves the way memory is allocated to error correction. SHIELD technology employs an adaptive code rate that is able to provide stronger error correction for weaker flash memory blocks and less error correction for stronger blocks. This enables the amount of space used for error correction to begin small and increase over time as the blocks age and generate more errors. The unused ECC space is automatically used by the flash controller for additional over-provisioning, which improves overall performance while also helping to increase the SSD’s endurance.
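The adaptive code rate idea can be illustrated with a minimal sketch. It is not LSI's algorithm, which is not disclosed here. The function names, the linear scaling rule, and every number below (spare bytes, ECC limits, error threshold) are assumptions chosen only to show the trade-off between ECC space and over-provisioning.

```python
def ecc_bytes_for_block(observed_errors: int, min_ecc: int,
                        max_ecc: int, errors_at_max: int) -> int:
    """Scale the ECC allocation linearly with the block's observed error count.

    Hypothetical policy: a healthy block gets min_ecc bytes, a block showing
    errors_at_max (or more) errors gets max_ecc bytes.
    """
    if errors_at_max <= 0 or max_ecc < min_ecc:
        raise ValueError("invalid ECC configuration")
    clamped = min(max(observed_errors, 0), errors_at_max)
    return min_ecc + (max_ecc - min_ecc) * clamped // errors_at_max


def extra_overprovisioning(spare_bytes: int, ecc_bytes: int) -> int:
    """Spare bytes not needed for ECC are available as extra over-provisioning."""
    if ecc_bytes > spare_bytes:
        raise ValueError("ECC allocation exceeds the spare area")
    return spare_bytes - ecc_bytes


# Illustrative numbers only: 128 spare bytes per page, 64 to 128 bytes of ECC.
for errors in (0, 20, 40):
    ecc = ecc_bytes_for_block(errors, min_ecc=64, max_ecc=128, errors_at_max=40)
    print(errors, ecc, extra_overprovisioning(128, ecc))
```

In this toy model a fresh block leaves half of its spare area free for over-provisioning, and that share shrinks to zero as the block's error count rises.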

Wong: What effect will SHIELD error correction technology have on SSD endurance?

Smith: SHIELD technology will definitely increase endurance in any solid state storage solution using any geometry and density of NAND flash memory. But the technology is too new to predict the precise benefit, especially given the wide range of products being built using chips with different geometries and densities from different manufacturers.

I can at least provide a “ballpark” estimate based on a demonstration LSI did at the 2013 Flash Memory Summit. The demo achieved a six-times improvement in endurance over the flash manufacturer’s rated specification. That means a chip specified to remain under a certain output bit error rate for 3000 program/erase cycles, for example, could function reliably up to 18,000 cycles. But as they say in the auto industry, “Your mileage may vary” because the increase in endurance really does depend on a wide range of design choices that LSI does not control.

Wong: How does SHIELD technology achieve such an impressive increase in endurance?

Smith: There are two basic reasons. The first is inherent to LDPC error correction. Soft-decision decoding simply provides the strongest level of error correction available today for flash memory. LSI’s major value-add has been to implement sophisticated, soft-decision LDPC decoding in a way that minimizes the read latency penalties normally associated with soft-decision decoding of NAND flash memory.

The second reason is how SHIELD technology allocates the code space needed to perform LDPC error correction. As I mentioned previously, it does this dynamically rather than statically, and this dynamic or adaptive code rate creates a better balance between over-provisioning and endurance over the extended life of the SSD. Here’s how. When flash memory is new, errors are few and very little error correction code space is needed, so more of the reserved ECC space can be used to increase over-provisioning. This has the added benefit of reducing the write amplification during garbage collection, which further extends the life of the flash. As the program/erase cycles take their inevitable toll on the cells, more space is needed for error correction information. By taking advantage of the extra over-provisioning space and slowly returning it back to error correction over time, the end of life of the SSD can be extended well beyond what would be possible with a static allocation.

It is interesting to note that when the flash reaches the manufacturer’s specification for endurance, the space allocated to ECC is nominally back to the flash manufacturer’s specs. After that the SHIELD adaptive code rate can work in reverse by taking some of the over-provisioning space and increasing the ECC area to continue correcting errors beyond what would normally be the end-of-life for the SSD. This extended life comes at a slight reduction in performance, of course, but the SSD would otherwise be unusable without this capability. If performance is critical, the SSD could be replaced and moved to a lower tier of storage where it would still significantly out-perform every fast-spinning HDD currently on the market.

It is also important to note here that LSI SandForce flash controllers also include LSI DuraWrite technology, which helps increase endurance even more through wear-leveling and data reduction that minimizes write amplification. We believe the combination of LSI SHIELD and DuraWrite technologies will afford industry-leading flash memory endurance.

Wong: Does SHIELD error correction adversely affect performance?

Smith: Yes and no. Hard-decision LDPC decoding is about as fast as the error correction technologies currently being used in solid state storage solutions, and decoding times are typically a small fraction of the NAND flash read time. Soft-decision decoding provides much stronger error correction, but has a higher latency. SHIELD technology minimizes this performance penalty in two ways. One is by using progressively stronger soft-decision decoding only as needed. The other is to employ parallel engines with hardware acceleration. And as I also mentioned previously, SHIELD error correction incorporates intelligent error handling to minimize the need for even using soft-decision decoding.
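The "only as needed" escalation described above can be sketched as a simple loop over decoding tiers. This is an illustration under stated assumptions, not the SHIELD implementation: the `Tier` class, the tier names, the latencies, and the correction limits are all hypothetical, and a real LDPC decoder does not reduce to a single error-count threshold.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Tier:
    name: str
    latency_us: float   # hypothetical cost of attempting this tier
    max_errors: int     # hypothetical number of bit errors it can correct


def decode_with_escalation(bit_errors: int,
                           tiers: List[Tier]) -> Tuple[Optional[str], float]:
    """Try tiers from fastest to strongest and stop at the first success.

    Returns the name of the tier that succeeded (None if all failed) and
    the total latency spent, including the failed attempts.
    """
    total_us = 0.0
    for tier in tiers:
        total_us += tier.latency_us
        if bit_errors <= tier.max_errors:
            return tier.name, total_us
    return None, total_us  # uncorrectable at every tier


# Made-up numbers for illustration only.
TIERS = [
    Tier("hard-decision", 10.0, 40),
    Tier("soft-decision level 1", 80.0, 80),
    Tier("soft-decision level 2", 160.0, 120),
]

print(decode_with_escalation(5, TIERS))
print(decode_with_escalation(100, TIERS))
```

A read with few errors pays only the hard-decision cost, while a heavily degraded page pays for every tier it passes through. That is why keeping most reads in the first tier matters for average latency.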

I feel the need to put performance into its proper perspective, though. Solid state disks are much faster than hard disk drives, so increasing the read latency by reading NAND flash one or more additional times to get soft-decision information for stronger error correction—and doing so only when absolutely necessary—is not significant enough to create a performance issue. And even the strongest and slowest level of soft-decision LDPC error correction is faster than some of the other provisions used to restore data that contains uncorrectable errors.

Wong: That brings up an important point: What does SHIELD error correction do when it encounters an error it cannot correct?

Smith: There’s nothing needed in SHIELD technology to handle uncorrectable errors because these are already addressed by two other technologies integrated into SandForce flash controllers. One is an end-to-end cyclic redundancy check capable of detecting the so-called silent errors that are not detected by whatever error correcting code is used. The other is LSI RAISE technology, which stands for Redundant Array of Independent Silicon Elements. As the name implies, RAISE provides data redundancy and protection much like RAID does in a Redundant Array of Inexpensive Disks.
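The two safety nets mentioned here, a checksum to detect silent errors and redundancy to rebuild lost data, can be shown with a generic sketch. It uses plain XOR parity in the style of single-parity RAID, plus CRC-32 from Python's standard `zlib` module. The internals of RAISE and the controller's actual CRC are not described in this interview, so the function names and layout below are assumptions for illustration only.

```python
import zlib
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(blocks: List[Optional[bytes]], parity: bytes) -> bytes:
    """Recover the single missing block (marked None) from the survivors and parity."""
    survivors = [b for b in blocks if b is not None]
    if len(survivors) != len(blocks) - 1:
        raise ValueError("XOR parity can rebuild exactly one missing block")
    return xor_blocks(survivors + [parity])


def crc_ok(payload: bytes, stored_crc: int) -> bool:
    """End-to-end check: does the payload still match the CRC stored at write time?"""
    return zlib.crc32(payload) == stored_crc


# Write path: store a CRC per block and one parity block per stripe.
stripe = [b"abcd", b"efgh", b"ijkl"]
crcs = [zlib.crc32(b) for b in stripe]
parity = xor_blocks(stripe)

# Read path: block 1 is uncorrectable, so rebuild it and verify the result.
recovered = rebuild_missing([stripe[0], None, stripe[2]], parity)
print(recovered, crc_ok(recovered, crcs[1]))
```

The CRC only detects a problem; it is the parity block that makes recovery possible, and the CRC then confirms that the rebuilt data is what was originally written.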

Wong: How much does SHIELD technology increase the cost, size and/or power consumption of flash controllers?

Smith: SHIELD technology does require allocation of more die space, but this is a reasonable design trade-off given the ability to use smaller and denser flash memory chips. And the increased space has only a modest impact on production costs. LDPC error correction does have the potential to consume more power, but the increase in power consumption is kept to a minimum because SHIELD technology employs a multi-level error correction schema that utilizes processor-intensive soft-decision decoding only as needed.

Wong: Is LSI applying for any patents related to SHIELD technology?

Smith: We are constantly expanding LSI’s intellectual property assets, and SHIELD technology certainly includes a number of technological advances that we believe would be patentable. While we wait for the status on the many filed patent applications related to SHIELD, we are treating these as trade secrets, which prevents us from going into too much detail on how it works.

Wong: Will SHIELD technology be implemented in all future SandForce flash controllers?

Smith: As flash memory technology continues to shrink in geometry and increase in bit-per-cell density, flash controllers will need to provide enhanced error correction capabilities. So we expect that solid state storage solutions using these smaller and denser chips will choose to use flash controllers equipped with SHIELD technology, such as the SandForce SF3700 flash controller family. But LSI plans to continue providing different versions of SandForce flash controllers to meet different needs as cost-effectively as possible, and those may or may not require LDPC error correction’s advanced capabilities, but may still use other features of SHIELD technology.

Wong: When will SSDs equipped with LSI SHIELD technology become available?

Smith: We’re currently involved in the design phase with several SSD manufacturers, and expect production shipments of disks equipped with SHIELD technology to begin sometime in 2014.

Wong: Is LSI planning any enhancements to the SHIELD technology?

Smith: As a matter of policy, LSI does not discuss future product capabilities except under non-disclosure. Suffice it to say that LSI takes great pride in making continual enhancements to all of the company’s many technologies, and SHIELD error correction will be no exception.


 
