The growth of the PCI Express (PCIe) interface in compute markets is continuing at a brisk rate, with several analysts forecasting 10% to 40% growth in 2013. The big news is that PCIe solid-state drives (SSDs) are coming from almost every major supplier. However, not all is as it seems. Many of the solutions use a proprietary protocol or a RAID-on-Chip (RoC) device to achieve optimal interface and storage performance. These solutions have served the industry well, but they lack standardization, low latency, and linear performance.
With the release of the Non-Volatile Memory Express (NVM Express or NVMe) specification version 1.0b, the industry now has a standardized, low-latency, open-sourced host driver and command set that enables interface and solid-state memories to achieve a level of performance that will make a difference to all computer systems (see NVM Express: Flash At PCI Express Speeds). The industry also gains compliance testing and a compliance test suite. But the industry has a long history with Serial ATA (SATA), which is by far the most dominant interface for storage in systems today, and the enterprise segment is adopting SAS/SCSI for higher data reliability and performance.
Current performance for SATA uses a serial 6Gbps physical layer (PHY) with an instruction set that is very useful for hard-disk drives (HDDs) and SSDs. However, the enormous growth in PCIe ports for client and server markets, and the growth in caching applications, have shown that it is necessary to upgrade current storage solutions. The ATA command set will also be paired with PCIe under the emerging SATA Express standard by the Serial ATA International Organization, mostly to leverage the performance and scalability of PCIe. Currently, Gen 3 PCIe single-lane performance is 8Gbps.
NVM Express is a scalable host controller interface designed to address the needs of enterprise and client systems using a PCI Express interface. NVM Express provides an easy-to-decode basic command set that is expandable using a defined standardized format. It includes support for parallel operation and can handle up to 64,000 outstanding requests, exceeding most application requirements today. SATA, in comparison, uses a command queue and has some capabilities to support multiple commands. However, there are limitations to the number of outstanding operations. This limitation is variable, based on implementation and whether or not SCSI is implemented.
In this paper we will compare the NVM Express and SATA solutions for solid-state storage and caching applications, and for client and server market applications. We will also examine the difference between these two standards and discuss why the time is right to look at a PCIe storage solution for the future.
The NVM Express (NVMe) specification was developed by more than 80 companies across the industry, and was released on March 1, 2011 by the NVMe Work Group. The NVMe 1.0c specification defines an optimized register interface, command set, and feature set for PCI Express solid-state drives. Although it's not a long history, the concept goes back to the beginning of the NVMHCI working group, which comprised 35 companies and released version 1.0 of the specification in April of 2008. Many of the same companies went on to become part of the NVMe Work Group.
First-generation Serial ATA (SATA) began to ship in mid-2002 with support for data transfer rates of up to 150 MB/s (1.5Gb/s). Designed to be backward- and forward-compatible across SATA generations, SATA has done a fantastic job. SATA/150 provides a maximum net bandwidth of 150 MB/s based on a gross transfer speed of 1.5Gb/s. SATA/300 doubles the transfer speed to 3.0Gb/s or 300 MB/s net. The SATA interface lowers cost and scales simply by raising the link speed.
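The gap between the gross and net figures comes from the 8b/10b line encoding that SATA uses, in which every 8 data bits travel as 10 bits on the wire. As a worked check of the numbers above:

```latex
\[
1.5\ \text{Gb/s} \times \frac{8}{10} \times \frac{1\ \text{byte}}{8\ \text{bits}} = 150\ \text{MB/s},
\qquad
3.0\ \text{Gb/s} \times \frac{8}{10} \times \frac{1\ \text{byte}}{8\ \text{bits}} = 300\ \text{MB/s}
\]
```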
SATA also introduced a feature called native command queuing (known as NCQ), which allows the drive to queue up incoming commands, analyze them, and process them in an efficient order. This means that the drive should not have to reposition read/write heads more than necessary. This way, accelerating and slowing down the actuator can be minimized, and the drive can spend most of its time efficiently reading or writing blocks sequentially with the shortest head movement.
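The benefit of reordering can be illustrated with a small sketch. It is not how any particular drive implements NCQ (firmware algorithms are vendor-specific and also account for rotational position); it assumes a simple elevator-style sweep, and the function names and block addresses are hypothetical.

```python
def reorder_queue(pending_lbas, head_position):
    """Order queued requests as one upward sweep, then one downward sweep."""
    ahead = sorted(lba for lba in pending_lbas if lba >= head_position)
    behind = sorted((lba for lba in pending_lbas if lba < head_position),
                    reverse=True)
    return ahead + behind


def total_travel(order, head_position):
    """Sum the distance the head moves when serving requests in this order."""
    travel = 0
    position = head_position
    for lba in order:
        travel += abs(lba - position)
        position = lba
    return travel


# Hypothetical queue of block addresses in the order the host issued them.
arrival_order = [900, 100, 850, 150, 800]
head = 500

ncq_order = reorder_queue(arrival_order, head)
travel_in_arrival_order = total_travel(arrival_order, head)
travel_after_reordering = total_travel(ncq_order, head)
```

In this toy case the head travels 3,300 address units when serving requests in arrival order and 1,200 after reordering, which is the kind of saving the queue is meant to deliver on spinning media.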
|Cable length||1 meter||1 meter||1 meter|
|Devices per port||1||1||1||More with port multiplier|
|NCQ||Yes||Yes||Yes||Native command queue|
S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology)
The introduction of SMART offered a standard way to monitor certain parameters that could help in identifying possible impending failures for hard-disk drives. The commands were modified for SSDs to address issues pertaining to Flash storage, such as block data, write failures, and general media information. The use and support of SMART varies between manufacturers for HDDs and SSDs.
Dataset Management TRIM Command
The SATA TRIM command was added specifically for solid-state storage and has large implications for performance. The command was necessary for reclaiming blocks of old data (dirty blocks) that were no longer needed. Flash translation-layer functions like wear leveling and erase are applied to SSDs. Operating-system activities such as emptying the recycle bin, deleting temporary files, and paging can produce a considerable amount of deleted data. By treating the deleted file information as invalid data, the device can reduce its internal operations on all invalid data.
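A toy model shows why this matters. Before a flash block can be erased, any pages still considered valid must be copied elsewhere, so pages that the host has trimmed no longer need to be moved. The class below is a deliberately simplified assumption for illustration, not a real flash translation layer, and all names in it are hypothetical.

```python
class TinyFlashBlock:
    """Toy model of one flash erase block holding several pages."""

    def __init__(self, lbas):
        # Map page index -> logical block address stored in that page.
        self.pages = dict(enumerate(lbas))
        # Every written page starts out valid.
        self.valid = set(self.pages)

    def trim(self, lbas):
        """Mark the pages holding the given LBAs as invalid (the TRIM effect)."""
        trimmed = set(lbas)
        self.valid = {i for i in self.valid if self.pages[i] not in trimmed}

    def garbage_collect(self):
        """Return the LBAs that must be copied out before the block is erased."""
        return [self.pages[i] for i in sorted(self.valid)]


block = TinyFlashBlock([10, 11, 12, 13])
copies_without_trim = len(block.garbage_collect())

# The host deletes a file that occupied LBAs 11-13 and tells the drive.
block.trim([11, 12, 13])
copies_with_trim = len(block.garbage_collect())
```

Without TRIM the drive would copy all four pages before erasing the block; after the host trims three of them, only one copy remains.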
SATA was originally designed as a point-to-point solution supporting application enhancements like RAID, just a bunch of disks (JBOD), and other system configurations. However, as systems have evolved, additional solutions became necessary for port expansion. In systems that use port multipliers and/or JBODs, each drive is addressed independently, generally with no communal effect. Other configurations are also possible where several drives are presented as a single volume to increase capacity (Fig. 1).
PCIe Transport Overview
Why PCIe? Isn't it more of an interconnect interface than a storage interface? Typically PCIe is deployed in desktops, laptops, networking, and servers. However, PCIe has some excellent attributes for scalability, latency, and performance. For scalability, PCIe can scale up to 16 lanes with standardized connectors for 1, 2, 4, 8, and 16 lanes. In addition, the PCIe standard has provided higher performing generations with higher bandwidths on a per-lane data basis. The three current generations of PCIe are as follows:
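- Gen 1: 2.5 GT/s per lane (2Gbps of data after 8b/10b encoding)
- Gen 2: 5 GT/s per lane (4Gbps of data after 8b/10b encoding)
- Gen 3: 8 GT/s per lane (just under 8Gbps of data after 128b/130b encoding)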
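The per-lane figures can be checked from the raw signaling rate and the line encoding of each generation. This sketch assumes 2.5, 5, and 8 GT/s signaling, with 8b/10b encoding for Gen 1 and Gen 2 and 128b/130b for Gen 3. The function name is illustrative, and packet and protocol overhead above the physical layer is ignored.

```python
# Raw signaling rate (GT/s) and line-encoding efficiency per generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_gbps(generation, lanes=1):
    """Return usable data bandwidth in Gbps for a link, before protocol overhead."""
    rate, efficiency = PCIE_GENERATIONS[generation]
    return rate * efficiency * lanes


gen1_x1 = link_bandwidth_gbps(1)
gen2_x1 = link_bandwidth_gbps(2)
gen3_x1 = link_bandwidth_gbps(3)
gen3_x4 = link_bandwidth_gbps(3, lanes=4)
```

Under these assumptions a single Gen 3 lane already carries more data than a 6Gbps SATA link, and a four-lane Gen 3 link offers roughly 31.5Gbps.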
PCIe did not just increase the bandwidth; it created additional features that transformed the interconnect to a transport interface. Some of these additional features are compelling for storage applications like hot plug for maintenance, end-to-end data protection, RAID recovery, and other high-end system needs. Other interesting attributes are low latency, point to multi-point, and virtualization. Let's take a closer look at some of these features and how valuable they are to the future of solid-state storage.
PCIe Extensions Applicable to Storage
These extensions are optional but add useful features for specific applications. Here are some of the features that are applicable to storage:
- Multi-cast, broadcasting a single data set or command to multiple end points that can be used for RAID or mirrored storage
- Alternative routing-ID interpretation (ARI) enables I/O Virtualization (IOV), which supports up to 256 functions (physical or virtual)
- TLP processing hints (TPHs) optimize PCIe packet processing with host memory and system cache
One of the key aspects of PCIe is the use of virtualization, with the basic PCIe ARI enabling up to 256 virtual functions; this is natively backward-compatible. But PCIe has additional virtualization abilities:
- SR-IOV single-root complex host
- MR-IOV multi-root complex host
The virtualization feature of PCIe is very well integrated into the NVMe architecture, whereas SATA does not utilize this. However, it is only fair to say that SCSI does have features that can be virtualized. Both of the features listed above can enable new system applications such as micro servers and blade applications. Creating a large shared local resource can reduce search and local data access.
The Role of Virtualization
Virtualization can improve the flexibility and performance of storage platforms by enabling advanced resource sharing. Virtualization allows systems to expand quickly without additional port multipliers or other connectivity hardware. Virtualization can support multiple system images (SIs) over several drives, or create separate streams for dedicated process applications.
NVM Express (NVMe) is a scalable host controller interface that was designed to address the needs of enterprise and client applications that utilize PCI Express-based solid-state storage. The interface provides an optimized command and completion path. It includes support for parallel operation by supporting up to 64,000 commands within each of the 64,000 command queues. Support has been added for many enterprise capabilities like end-to-end data protection (compatible with T10 DIF and DIX standards), enhanced error reporting, and virtualization.
The interface has the following key attributes:
- Support for up to 64,000 I/O queues, with each I/O queue supporting up to 64,000 commands
- Only a small set of major commands (read, write, flush, identify, get features, set features, abort, event report)
- Fixed-size 64-byte commands and 16-byte completions for simple decode and execution
- Priority associated with each I/O queue with well-defined arbitration mechanism
- All information to complete a 4KB read request is included in the 64B command itself, ensuring efficient small random I/O operation
- Support for MSI/MSI-X and interrupt aggregation
- Support for multiple namespaces
- Robust error reporting and management capabilities
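To illustrate how a fixed-size command keeps decoding simple, the sketch below packs a read request into 64 bytes. The field layout is a simplified reading of the NVMe submission queue entry (opcode, command identifier, namespace ID, two data pointers, starting LBA, block count). The offsets, the opcode value, the helper name, and the buffer address are assumptions to check against the specification.

```python
import struct

# Little-endian, no padding: opcode, flags, command ID, namespace ID,
# reserved, metadata pointer, PRP1, PRP2, then six command-specific dwords.
SQE_FORMAT = "<BBHIQQQQ6I"
OPCODE_READ = 0x02  # assumed read opcode in the NVM command set


def build_read_command(command_id, namespace_id, buffer_addr,
                       start_lba, block_count):
    """Pack a simplified 64-byte read command (illustrative layout only)."""
    lba_low = start_lba & 0xFFFFFFFF
    lba_high = start_lba >> 32
    return struct.pack(
        SQE_FORMAT,
        OPCODE_READ, 0, command_id, namespace_id,
        0,                 # reserved
        0,                 # metadata pointer (unused here)
        buffer_addr,       # PRP1: host memory page that receives the data
        0,                 # PRP2: unused for a single-page transfer
        lba_low, lba_high,
        block_count - 1,   # block count is encoded zero-based
        0, 0, 0,
    )


# A 4KB read of eight 512-byte blocks into one hypothetical host page.
command = build_read_command(command_id=1, namespace_id=1,
                             buffer_addr=0x1000_0000,
                             start_lba=2048, block_count=8)
command_size = len(command)
```

Everything the controller needs for this 4KB read, including where to place the data, fits in the one 64-byte entry, so no follow-up fetch is required to start the transfer.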
The NVMe protocol is designed not just for the Flash media of today but also for future non-volatile memory devices. Conceptually it relies on several aspects of the PCIe standard to enable virtualization, redundancy, and data reliability.
In Figure 2, the root complex controls the entire storage array while the switch allows direct communication to the end points. This approach demonstrates how multi-cast and alternative routing-ID interpretation (ARI) can be utilized in a typical system. However, if we wanted to, we could use an NVMe "namespace" command and split NVMe 1-4 into an unlimited number of addressable storage elements, similar to what we can do with volumes today. In comparison, the number of commands using SATA would be limiting.
In Figure 3, by using the flexibility of PCIe we can set up traffic classes through the switch to balance or create different traffic patterns for each drive or partial drive. There are several ways to achieve virtualization in this scheme, such as using virtual channel and traffic classes. Alternatively, going through the switch allows prioritization using three arbitration schemes:
- Hardware-fixed (e.g. round robin)
- Programmable weighted round robin
- Programmable time-based weighted round robin
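As a rough illustration of the weighted round-robin idea, the sketch below serves several queues in turn, taking up to a configured number of commands from each per pass. It is a generic model rather than the arbitration logic of any specific switch or controller, and the queue names and weights are hypothetical.

```python
from collections import deque


def weighted_round_robin(queues, weights):
    """Yield (queue_name, command), serving up to weights[name] per pass."""
    pending = {name: deque(commands) for name, commands in queues.items()}
    # Keep cycling until every queue has been drained.
    while any(pending.values()):
        for name, queue in pending.items():
            for _ in range(weights[name]):
                if not queue:
                    break
                yield name, queue.popleft()


# Hypothetical traffic: a high-priority queue gets twice the share per pass.
queues = {"high": ["h1", "h2", "h3", "h4"], "low": ["l1", "l2"]}
weights = {"high": 2, "low": 1}

service_order = [command for _, command in weighted_round_robin(queues, weights)]
```

With these weights the high-priority queue receives two service slots for every one given to the low-priority queue, yet the low-priority queue is never starved.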
SATA has served the industry very well for a long time, and is a great solution for spinning media solutions. As the industry moves to hybrid and solid-state storage, the SATA transport and protocol are stressed to provide the same value. With SATA Express entering the market, it appears that PCIe will be the dominant transport for storage, computer, and communication markets.
Although SATA and NVMe are very different from an architectural point of view, they do share some basic functionality. Basic read and write commands are the same. The main differences involve latency, command processing, and power management. The table below highlights high-level functions and shows how each solution addresses the function.
|Function||SATA||NVMe|
|Transport||SATA point to point, up to 6Gbps||PCIe point to multi-point, up to 8Gbps plus lane expansion|
|Scalability||Requires muxes and port expanders||ARI up to 256 virtual functions; more with SR-IOV|
|Segmentation||15 partitions limited by OS||Unlimited based on the number of namespaces|
|Power Management||SATA's Aggressive Link Power Management, specified in SATA 1.x (in conjunction with AHCI-compliant controllers), effects power savings at the serial link (this is independent of disk power management). There are two modes, "Partial" and "Slumber"; Slumber takes longer to enter/exit and saves more power than Partial||A controller will support at least one power state and can support up to a total of 32 power states. Uses a table-based approach with static and dynamic power management|
|Command Queue||Typically limited to 32 commands in the processor when used in SSDs||2,000 are possible; however, Windows will limit this to 256. Linux and servers will use more depending on architecture|
For NVMe, the story is still to be written. But for low-latency, virtualized applications found mainly in servers, NVMe seems poised to be quickly adopted given the available resources and support. However, as market adoption has yet to happen, can these standards co-exist? Most definitely. SATA has a very strong following, especially in client solutions. And with the addition of SATA Express, the scalability will improve the performance of SSDs. For NVMe, hybrid drives could potentially be a very good fit. And the advanced power options for NVMe could enable new applications and improve the user experience.