Test Your SSDs For Consistent Performance

Businesses don’t buy specifications. They buy results. When an enterprise buys a storage solution, it wants a specific set of equipment to accomplish tasks that meet one or more specific business needs. If that task happens to be online transaction processing (OLTP), that need likely will be measured in input/output operations per second (IOPS).

However, OLTP workloads sport much different characteristics than, say, archival or media serving workloads. The size of files, frequency of read/write requests, number of simultaneous users, and many other factors will vary according to the workload type. A storage product that seems to be best-of-breed in one application may, in a different environment, fall on its face.

Much of the storage market is preoccupied with only two metrics: dollars per gigabyte and peak transfer rates. Thinking in these terms derives mostly from the consumer/client market, where capacity, cost, and raw speed (usually in that order) are the three bullet points that buyers study. But in data centers, performance is not as much about peak speed as it is about sustained, consistent ability to operate at the highest possible levels under enterprise-specific workloads and conditions.

A Matter Of Metrics

Can you measure an elephant on a bathroom scale? In theory, yes. Is it the most appropriate way to measure the animal, and do you trust the results to be accurate? Probably not.

In the world of enterprise storage, Iometer is a bathroom scale. While use of the free program for benchmarking is ubiquitous, it remains fairly simplistic in how it conducts testing. Being able to set given percentages of random versus sequential data between client and server volumes is fine, but that only addresses a limited set of criteria.

We can borrow an example from the client device world. In 2008, according to SearchStorage, IDC set out to test the widely held assumption that solid-state disks (SSDs) in PCs would deliver faster application launch times because their access times were so much faster than those of hard drives (Fig. 1).

1. Not all apps benchmark equally. IDC testing reveals how the assumed performance advantage seen in HD Tach does not appear when testing Internet Explorer. (courtesy of IDC’s “The Need to Standardize Storage Device Performance Metrics,” sponsored by Seagate)

Initial numbers obtained with storage device benchmarking tool HD Tach confirmed the well-known fact that SSD accesses were virtually instantaneous and many times faster than hard drives. But once the drives were installed into a laptop and put under real-world conditions, it became obvious that other factors within the system were limiting performance.

The SSDs showed essentially no benefit over their famously slower magnetic cousins. In other words, the commonly held assumptions about SSD performance in a given scenario were based on misapplied data from a benchmarking tool inappropriate to the application at hand.

This same misapplication often happens with enterprise drives when IOPS are treated as a blanket benchmarking metric. For example, an enterprise-class multi-level cell (MLC) SSD might obtain a result of 14,008 IOPS in the SPC-1C benchmark while rival SSDs, even consumer-class client drives, will likely approach 20,000 IOPS in Iometer at a significantly lower cost per gigabyte. The key point is that such comparisons often fail to qualify what kind of IOPS are being measured.

Typical enterprise applications might have I/O sizes ranging from 2 kbytes to 256 kbytes, and backup applications can use I/O blocks spanning into the megabytes. Every I/O size will yield a different IOPS result. The manager of an archival application using 1-Mbyte I/Os probably has no concern about 4-kbyte write performance.
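To see why an IOPS figure means little without its I/O size, consider the basic relationship: bandwidth equals IOPS multiplied by I/O size. The short Python sketch below is illustrative only. The 200-Mbyte/s ceiling is a hypothetical figure, not a measurement of any drive, and the function names are made up for this example. It also assumes a device limited purely by bandwidth, which real drives are not at small I/O sizes.

```python
KIB = 1024  # bytes per kibibyte (binary sizes assumed for "4k", "256k")


def throughput_mb_per_s(iops: float, io_size_bytes: int) -> float:
    """Bandwidth, in decimal megabytes per second, implied by an IOPS figure."""
    return iops * io_size_bytes / 1_000_000


def iops_at_bandwidth(mb_per_s: float, io_size_bytes: int) -> float:
    """IOPS a device would post if it were limited purely by bandwidth."""
    return mb_per_s * 1_000_000 / io_size_bytes


if __name__ == "__main__":
    # Hypothetical device that tops out at 200 MB/s regardless of I/O size.
    for size in (4 * KIB, 64 * KIB, 1024 * KIB):
        iops = iops_at_bandwidth(200, size)
        print(f"{size // KIB:>5} KiB I/Os: {iops:>10,.0f} IOPS")
```

Under these assumptions, the same device posts tens of thousands of IOPS at 4-kbyte transfers and only a couple of hundred at 1-Mbyte transfers, so two IOPS numbers quoted at different I/O sizes are not comparable.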

Moreover, one technician is likely to use a particular mix of I/O characteristics in a given test while another tech will use a different set of parameters, making it all too possible that two people are discussing the same metric in the same benchmarking tool and dealing with wildly divergent interpretations of one device’s results. IT managers really need industry-standard benchmarks that use fixed workloads appropriate to key enterprise applications.

SPC: Filling The Need

The non-profit Storage Performance Council (SPC) now boasts a register of nearly 50 of the storage industry’s largest component and systems manufacturers. Its mission is to provide the industry with objective, practical enterprise storage benchmarks to help foster market adoption of compatible storage solutions.

The group offers several tests divided into two broad groups. The SPC-1 series focuses on highly randomized workloads of the sort commonly found in OLTP, mail server, and database applications. The SPC-2 group covers more sequential loads, especially those characteristic of large file apps and streaming video. In particular, SPC-1C examines specific components, such as drives or controllers, and hinges on a different workload model than what is commonly found in traditional storage benchmarking apps (see the table).

As detailed in the test’s specification, once workloads with a carefully balanced mix of data types and I/O loads are ready to run, SPC-1C proceeds through three test phases: primary metrics, repeatability, and data persistence. The most important phase of primary metrics is demonstrating that the storage device can sustain maximum I/O throughput over an extended period. Whereas Iometer is often run for three minutes, the SPC-1C spec calls for one-hour measurement intervals. Some SPC tests call for a four-hour or even eight-hour cycle (Fig. 2).

2. The SPC-1C benchmark measures I/O performance at various workload levels. This is important since it’s common for storage devices not to operate at peak levels constantly, and performance anomalies may appear at lower use levels. (courtesy of the SPC Benchmark 1C Official Specification)

Unlike other benchmarks, SPC-1C does not aim to capture and convey whatever results occur during the sustainability phase. Rather, the benchmark operates on more of a pass/fail model. If the I/O throughput falls below 95% of the reported SPC-1C IOPS throughput, the test is invalidated. This ensures the drive is performing consistently and performance spikes or dips aren’t skewing test results.
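The pass/fail idea can be sketched in a few lines of Python. This is a simplified illustration, not the SPC-1C procedure itself: the specification defines precisely how throughput is sampled and reported, and none of that detail is reproduced here. The function name, the per-interval sample list, and the sample values are all assumptions made for the example.

```python
from typing import Optional, Sequence


def first_failing_interval(
    interval_iops: Sequence[float],
    reported_iops: float,
    floor_fraction: float = 0.95,  # the 95% floor described in the article
) -> Optional[int]:
    """Return the index of the first sample below the floor, or None if all pass."""
    floor = floor_fraction * reported_iops
    for index, sample in enumerate(interval_iops):
        if sample < floor:
            return index
    return None


def passes_sustainability(interval_iops: Sequence[float], reported_iops: float) -> bool:
    """True only if every sampled interval stays at or above the floor."""
    return first_failing_interval(interval_iops, reported_iops) is None


if __name__ == "__main__":
    # Made-up samples: a steady drive and one that sags partway through the run.
    steady = [10_050, 9_980, 10_010, 9_960]
    sagging = [10_050, 9_980, 9_100, 9_960]
    print(passes_sustainability(steady, 10_000))
    print(first_failing_interval(sagging, 10_000))
```

A single sagging interval is enough to fail the run in this sketch, which is the point of the model: an average taken over the whole session would hide exactly the dip that matters.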

Once the sustainability test phase completes, SPC-1C runs a five-minute IOPS test phase at the same load level, essentially measuring performance at the end of a marathon test session. Next, response times are measured at BSU load levels of 95%, 90%, 80%, 50%, and 10%. Finally, the IOPS test runs at the 100% and 10% load levels are repeated for at least 10 minutes each to verify consistent performance even after considerable and heavy use.
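For readers who find a sequence easier to follow as data, the phases described above can be laid out as a simple schedule. The Python sketch below follows only the article’s description, not the specification text. The phase names, the `Phase` type, and the rounding rule in `bsus_at` are assumptions for illustration, and durations the article does not state are left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Phase:
    name: str
    load_percent: int        # percentage of the full BSU load level
    minutes: Optional[int]   # None where the article states no duration


def build_schedule(sustain_minutes: int) -> List[Phase]:
    """Lay out the test sequence in the order the article describes it."""
    phases = [
        Phase("sustainability", 100, sustain_minutes),
        Phase("IOPS", 100, 5),
    ]
    # Response times are then measured at progressively lighter loads.
    phases += [Phase("response-time ramp", pct, None) for pct in (95, 90, 80, 50, 10)]
    # Finally, the heavy and light load levels are repeated (at least 10 minutes each).
    phases += [Phase("repeat", pct, 10) for pct in (100, 10)]
    return phases


def bsus_at(full_load_bsus: int, load_percent: int) -> int:
    """Scale the full-load BSU count; rounding down is an assumption here."""
    return full_load_bsus * load_percent // 100


if __name__ == "__main__":
    for phase in build_schedule(sustain_minutes=60):
        print(phase)
```

With a hypothetical full load of 200 BSUs, for example, the 95% step would run at 190 BSUs under this rounding rule.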

In fact, demonstrating consistency in performance is likely the SPC-1C’s primary accomplishment. For storage solutions to help IT groups meet their service level agreements, the drive needs to perform at the same level throughout its addressable media over time.

Figure 3 shows three enterprise SSDs that did not demonstrate this ability. Aside from all three SSDs showing remarkably slow access times, note in particular how device B, after about an hour and a half of running, saw its performance nosedive under a sustained load. Drives B and C also show a marked difference in performance between ASU-3 and the other two ASUs.

3. SSDs are frequently assumed to offer negligible latency, but SPC tests reveal this not to be the case. Drives that exhibit excessive response delays or vary too much in their response times will fail testing. (courtesy of Seagate, Enterprise Storage, SPC-1C Case Study Consistent Performance Presentation, Flash Memory Summit, 2011)

In comparison, Figure 4 shows a graph from an SPC-1C report for a drive that did pass testing. Whereas the first three drives were prone to changing their results characteristics after an hour or so, the Seagate Pulsar.2 drive shows almost unflinching response times for eight hours straight. Obviously, a traditional three-minute test can’t begin to approach this level of accuracy or insight.

4. A properly functioning SSD capable of passing SPC benchmarking will demonstrate low, flat latency results across hours of testing. (courtesy of SPC Benchmark 1C Full Disclosure Report on the Pulsar.2)

The SPC-1C test provides a much clearer picture about real-world, data center-type performance than more mainstream testing alternatives. An IT manager may want to know if a drive could be trusted to maintain a workload of 10,000 IOPS, servicing many simultaneous users and remaining instantly accessible to all of them. The at-a-glance assessment from an SPC-1C report would be difficult, if not impossible, to approximate with tools such as Iometer (Fig. 5 and Fig. 6).

5. Response times and IOPS measured at various points will increase as workload levels progress from 10% to 100%. SPC testing will reveal where, if ever, a drive’s I/O load will exceed latency thresholds. (courtesy of the SPC Benchmark 1C Full Disclosure Report on the Seagate Pulsar.2 drive)

6. This graph illustrates the very flat (and thus predictable) performance of the drive under test across all three ASUs over a span of eight hours. (courtesy of the SPC Benchmark 1C Full Disclosure Report on the Seagate Pulsar.2 drive)

The Enterprise Difference

Why don’t consumer drives deliver similar consistency even when their IOPS performance may seem (from a cursory glance) to be on par with their enterprise counterparts? Part of the answer has to do with the ASICs within the drives.

While vendors closely guard the intellectual property (IP) specifics of their chips, there is clearly a difference in algorithms, the number and frequency of processing cores, the bandwidth between those cores, and how cores go about balancing workloads. Enterprise-class SSDs also tend to feature nonvolatile cache or power-protected write cache that allows drives to perform as if they are in write cache-enabled mode, but without the data loss concerns typical of enabling write cache on client designs.

Another key differentiator between enterprise and client SSDs is their use of over-provisioning. Over-provisioning sets aside a given amount of storage space to be used for wear leveling, garbage collection, and other background operations that help accelerate and improve drive performance, providing more space in which to move data around and perform write operations more efficiently.

Many consumer drive manufacturers provide tools for users to adjust their drives’ over-provisioning levels, or a drive might ship with 7% to 28% or more over-provisioning from the factory. For example, a 400-Gbyte SSD may contain 512 Gbytes of physical NAND. The missing 112 Gbytes are allocated to over-provisioning. IT buyers should also keep in mind that changing a drive’s level of over-provisioning can significantly alter its capacity versus price metric. Comparing drives without figuring the price per addressable gigabyte after over-provisioning can be very misleading.
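The arithmetic behind that example is easy to check. The Python sketch below computes the over-provisioning percentage relative to addressable capacity, which reproduces the 28% figure from the 400-Gbyte and 512-Gbyte numbers above. The $1,000 drive price is purely hypothetical and is included only to show how far the two price-per-gigabyte calculations can diverge.

```python
def over_provisioning_percent(physical_gb: float, addressable_gb: float) -> float:
    """Spare NAND expressed as a percentage of addressable capacity."""
    return (physical_gb - addressable_gb) / addressable_gb * 100


def price_per_gb(price: float, capacity_gb: float) -> float:
    """Cost per gigabyte for whichever capacity figure is passed in."""
    return price / capacity_gb


if __name__ == "__main__":
    physical_gb, addressable_gb = 512, 400  # figures from the article's example
    price = 1000.0                          # hypothetical price, for illustration only

    op = over_provisioning_percent(physical_gb, addressable_gb)
    print(f"Over-provisioning: {op:.0f}%")
    print(f"Price per physical Gbyte:    ${price_per_gb(price, physical_gb):.2f}")
    print(f"Price per addressable Gbyte: ${price_per_gb(price, addressable_gb):.2f}")
```

The figure that matters to a buyer is the second one, because only addressable capacity can hold data.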

The decision about which SSDs belong in data centers should weigh more than price, capacity, and peak speed. The drive’s holistic performance and an independent, appropriate assessment of that performance must also be considered. Leverage metrics such as those from the SPC and get a more accurate idea of how your prospective drives will behave in your real-world applications.