Improve Product Qualification Accuracy with Advanced Solid-State Storage Usage SMART Monitoring Technology

April 24, 2008

For software engineers, system-level hardware engineers, program managers, and product marketing managers alike, the product qualification process can be a challenging one, especially for applications requiring long product lifecycles. Since product life can be tied closely to the storage solution used, establishing the projected life of a product’s storage system solution under various application-specific usage models provides a critical—though previously unavailable—indicator of storage system performance over time.

Many designers have turned to advanced solid-state storage in their applications because of the technology's reputation for ruggedness and reliability in small form factors. However, endurance issues caused by the instability of shrinking process geometries and faster host system interfaces, coupled with heavy usage cycles over long product deployments, have caused many storage systems to fail in the field. The result is costly unscheduled downtime, frequent repairs, or replacement of the storage media in use, all of which directly affect overall product or system usability.

What designers really need is a way to accurately determine the usable life of a given storage solution within a particular application, so that they can make reasonable assumptions, and reasonable claims to customers, about their design's actual lifespan. These designers can benefit from advanced storage technology that continuously monitors and reports the exact amount of usable life a drive has remaining. While a number of products on the market can alert a user when a drive is approximately 95% worn, the ideal is a utility that forecasts, monitors, and reports drive life while the drive is deployed in the field, and that detects issues early enough for someone to act on them.

Predictable Vs. Unpredictable Drive Failures
Before launching into any discussion about reducing the potential for drive failure, it is important to note that some failures can be predicted while others simply cannot. Unpredictable failures in solid-state drives caused by electronic issues can occur quickly and seemingly out of nowhere. They are often the result of quality or power problems that designers must work through. In contrast, predictable failures, which are typically caused by degradation over time, can actually be prevented with monitoring. Such failures can be avoided by modeling drive usage in a given application during prequalification.

Storage Wear-Out
As users store ever-larger amounts of information on solid-state drives, failures have become more of an issue in the systems that incorporate them. The problem affects many industries, and although cases are numerous, its root cause has been difficult to diagnose. Specifically, it has been difficult to determine if and when:

Endurance specifications have been exceeded because a drive has written too many times to one location.
All of the drive's spare blocks have been exhausted.
A power anomaly during a write to the drive has caused an error.

Even though solid-state storage by its nature does not wear out mechanically, failures are still a concern once a given solution exceeds its endurance specification. Traditional flash cards do not incorporate any type of feedback mechanism; consequently, they are allowed to operate until they exceed their endurance specification and fail, often suddenly and without warning. Unexpected storage system failures from drive wear-out, caused by heavy duty cycles, can completely disrupt business, resulting in costly downtime, loss of data, and potential loss of revenue.

In contrast, advanced solid-state storage drives incorporate Self-Monitoring, Analysis and Reporting Technology (SMART), which allows users to monitor the exact amount of usable life a drive has left. SMART was originally developed for hard disk drives, based on the theory that drive failures do not usually occur suddenly. Instead, they result from mechanical problems that generally develop over time.

Though some of the functionality of SMART does not apply to solid-state drives because they have no moving parts, SMART incorporated with advanced solid-state storage does make it possible to accurately forecast when a potential failure will occur, thus avoiding unexpected failures that can lead to a number of critical situations.
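The article does not describe how a SMART-equipped drive turns its readings into a forecast. As a minimal sketch, assuming the host can periodically read a used-life percentage from the drive (the `samples` input and the `forecast_wear_out` function are hypothetical, not a SMART attribute or vendor API), an ordinary least-squares line fit can extrapolate when wear will reach a chosen limit:

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

def forecast_wear_out(samples: List[Tuple[datetime, float]],
                      limit_pct: float = 100.0) -> Optional[datetime]:
    """Extrapolate when a drive's used-life percentage will reach limit_pct.

    samples: (timestamp, used_pct) readings polled from the drive (hypothetical
    source), taken at two or more distinct times. Returns None if wear is not rising.
    """
    t0 = samples[0][0]
    xs = [(t - t0).total_seconds() / 86400.0 for t, _ in samples]  # days since first reading
    ys = [pct for _, pct in samples]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need readings taken at two or more distinct times")
    # Ordinary least-squares slope: percentage points of wear added per day.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    if slope <= 0:
        return None  # no measurable wear trend yet
    intercept = mean_y - slope * mean_x
    days_to_limit = (limit_pct - intercept) / slope
    return t0 + timedelta(days=days_to_limit)
```

A straight-line fit assumes the workload stays roughly constant; if the duty cycle changes in the field, the forecast should be recomputed from recent readings only.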

Drive Usage Improves Prequalification Process
Just as a tool that accurately calculates the percentage of the drive used helps users manage their resources and use the remaining capacity efficiently and effectively, designers can use the same tool to predict a drive's lifespan in any given application in the field. The feedback from monitoring drive usage in this manner makes storage usage modeling possible. It is quite difficult for system designers to fully understand all of the data transactions that occur between a host system and a drive in various applications, especially when operating systems and file systems are involved. By modeling typical drive-usage scenarios in similar applications, designers can make the lifespan of future drives predictable.
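One common first-order way to turn a usage model into a lifespan estimate (not necessarily the method the article has in mind) divides the drive's total write budget, capacity times rated write/erase cycles, by the volume of data actually written to flash each day. The sketch below assumes ideal wear leveling; the `WorkloadEvent` type, the workload entries, and the write-amplification figure are all hypothetical inputs a designer would measure for a specific application:

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class WorkloadEvent:
    """One recurring host write activity observed in a similar application (hypothetical model)."""
    name: str
    bytes_per_event: int
    events_per_day: float

def projected_life_years(events: Iterable[WorkloadEvent], capacity_bytes: int,
                         pe_cycles: int, write_amplification: float) -> float:
    """First-order life estimate assuming writes are spread evenly across all blocks."""
    # Data the host sends to the drive per day, summed over all modeled activities.
    host_bytes_per_day = sum(e.bytes_per_event * e.events_per_day for e in events)
    if host_bytes_per_day <= 0:
        return float("inf")  # a write-free workload never consumes write/erase cycles
    # Data physically written to flash per day, including controller overhead.
    flash_bytes_per_day = host_bytes_per_day * write_amplification
    # Total bytes the drive can absorb before every block reaches its rated cycle count.
    total_write_budget = capacity_bytes * pe_cycles
    return total_write_budget / flash_bytes_per_day / 365.0
```

Write amplification accounts for the extra internal writes the controller performs (for example, during garbage collection). It is better measured on the bench, by comparing host writes with the drive's own per-block counters, than assumed.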

This becomes possible with technology that monitors the write and erase cycles of each block, along with the usage of spare blocks in the drive. Monitoring the number of remaining spares allows the host to set a usage threshold and take preventive action according to an established, application-specific set of criteria.
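As an illustration of a host-set threshold on spare blocks, the sketch below classifies a drive by the fraction of its spare blocks still unused. The function name, the status values, and the 25% and 5% thresholds are hypothetical; the article leaves the criteria to each application, and how the host obtains the spare counts depends on the drive:

```python
from enum import Enum

class SpareStatus(Enum):
    OK = "ok"
    SCHEDULE_REPLACEMENT = "schedule_replacement"
    CRITICAL = "critical"

def check_spares(spares_remaining: int, spares_total: int,
                 warn_fraction: float = 0.25, critical_fraction: float = 0.05) -> SpareStatus:
    """Classify spare-block health against host-chosen thresholds (illustrative defaults)."""
    if spares_total <= 0:
        raise ValueError("spares_total must be positive")
    fraction_left = spares_remaining / spares_total
    if fraction_left <= critical_fraction:
        return SpareStatus.CRITICAL  # act now: the drive is close to running out of spares
    if fraction_left <= warn_fraction:
        return SpareStatus.SCHEDULE_REPLACEMENT  # replace at the next planned maintenance
    return SpareStatus.OK
```

Using two levels lets the host separate routine planning (schedule a replacement) from urgent action, which matches the goal of avoiding unscheduled maintenance.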

By also tracking and tabulating the write and erase transactions for each block in the drive, an accurate calculation of remaining life becomes possible. If the used percentage of the drive rises above the preset usage threshold, the drive can be flagged for replacement during the next scheduled maintenance period. Eliminating unscheduled maintenance calls can save time, money, and unnecessary hassle.
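Assuming the host can read a write/erase count for every block along with the drive's rated endurance per block (both hypothetical inputs here; the article does not give a formula), one plausible way to compute the used percentage and apply the replacement threshold is:

```python
from typing import Sequence

def used_percentage(erase_counts: Sequence[int], endurance_cycles: int) -> float:
    """Share of the drive's total rated write/erase budget already consumed, in percent."""
    if not erase_counts or endurance_cycles <= 0:
        raise ValueError("need at least one block and a positive endurance rating")
    total_budget = len(erase_counts) * endurance_cycles
    # Not clamped: a value above 100 means blocks have run past their rating.
    return 100.0 * sum(erase_counts) / total_budget

def flag_for_replacement(erase_counts: Sequence[int], endurance_cycles: int,
                         threshold_pct: float = 80.0) -> bool:
    """True if the drive should be swapped at the next scheduled maintenance (hypothetical 80% default)."""
    return used_percentage(erase_counts, endurance_cycles) >= threshold_pct
```

Averaging across blocks assumes effective wear leveling. If wear is uneven, the most-worn block (`max(erase_counts)`) may be the better indicator of when the drive will start consuming spares.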

Design teams can take full advantage of this type of usage monitoring if they meet just two requirements: first, the drive must allow polling of its block data; second, the design team must have control over the host system driver.
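The two requirements can be pictured as a host-side polling loop. In this sketch the `read_block_erase_counts` callable stands in for whatever driver-level command a particular drive exposes; it is hypothetical, which is why control over the host driver matters. The interval, threshold, and function names are also assumptions:

```python
import time
from typing import Callable, List

# Hypothetical driver hook: returns the current write/erase count of every block.
BlockReader = Callable[[], List[int]]

def poll_usage(read_block_erase_counts: BlockReader, endurance_cycles: int,
               threshold_pct: float, interval_s: float, max_polls: int,
               sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Poll block data until the used percentage crosses the threshold or max_polls is reached.

    Returns the used-percentage history; the last entry is the most recent reading.
    """
    history: List[float] = []
    for poll in range(max_polls):
        counts = read_block_erase_counts()
        used = 100.0 * sum(counts) / (len(counts) * endurance_cycles)
        history.append(used)
        if used >= threshold_pct:
            break  # threshold crossed: caller should schedule replacement
        if poll < max_polls - 1:
            sleep(interval_s)  # wait before the next poll, but not after the last one
    return history
```

Passing `sleep` in as a parameter keeps the loop testable without real delays, and the returned history can also feed a trend-based forecast.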

Modeling drive usage allows designers to alleviate concerns regarding write and erase endurance, accurately calculate the percentage of a drive used, and determine the location of bad blocks and used spares. Modeling also can aid design teams in the qualification process, helping them determine just how long a product can be expected to last in the field. Plus, it can help designers answer common questions such as:

Is the solid-state drive likely to fail in the application before its scheduled replacement?
Is there a way to measure how long the product will last via bench tests?
Can the endurance of the product be monitored once it is in the field?
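The first two questions can be approached by running an application-representative workload on the bench and measuring how much the used percentage rises. The sketch below assumes the bench run is accelerated relative to the field by a known factor (field hours represented per bench hour) and that wear accumulates linearly; the function name, the acceleration factor, and the 80% safety margin are hypothetical choices, not a procedure the article specifies:

```python
from typing import Tuple

def bench_life_check(used_pct_start: float, used_pct_end: float, bench_hours: float,
                     acceleration_factor: float, replacement_interval_days: float,
                     limit_pct: float = 100.0, margin: float = 0.8) -> Tuple[float, bool]:
    """Project field life from a bench run and compare it with the replacement schedule.

    Returns (projected_life_days, likely_to_fail_first).
    """
    if bench_hours <= 0 or acceleration_factor <= 0:
        raise ValueError("bench_hours and acceleration_factor must be positive")
    wear_gained = used_pct_end - used_pct_start
    if wear_gained <= 0:
        return float("inf"), False  # no measurable wear during the run
    field_days_simulated = bench_hours * acceleration_factor / 24.0
    wear_per_field_day = wear_gained / field_days_simulated
    projected_life_days = limit_pct / wear_per_field_day  # life of a new drive
    # Derate the projection so the drive must outlast the schedule with room to spare.
    likely_to_fail_first = projected_life_days * margin < replacement_interval_days
    return projected_life_days, likely_to_fail_first
```

The third question is covered by the in-field monitoring described earlier; the bench projection then serves as a baseline against which field readings can be compared.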

Create a Win/Win Scenario
Technology is constantly changing, and every field environment comes with its own set of unmanageable variables, leaving electronic designs at the mercy of the proverbial unknown. However, prequalification gives designers a chance to test new designs and to ensure the best possible performance within reason.

Since storage reliability issues are a common and major cause of field failures, this area deserves ample consideration. Enabling designers to provide more accurate estimates of drive lifespan within a given application will help them ultimately offer more realistic product lifespan predictions to prospective customers. This benefit is undeniably significant. After all, when new products perform as expected, everybody wins.