The Joys of SSD Caching

The first part of this project was adding Super Microcomputer’s (SuperMicro) CSE-M28 Mobile Rack (Fig. 1) to my SuperServer 6046T-TUF I system (see “Mobile Racking a Server”). The Mobile Rack fits into a pair of 5.25-in drive bays and adds eight 2.5-in hot-swappable drive bays to the system. These were populated with a mix of high-performance Seagate 15K Savvio drives and Micron M500DC (see “Enterprise SSD Targets Big Data Applications”) enterprise 6 Gbit/s SATA solid-state drives (SSDs). The M500DC is designed for a 5-year life assuming 1 to 3 drive fills per day.

Figure 1. SuperMicro’s CSE-M28 Mobile Rack fits into two 5.25-in drive bays, adding eight 2.5-in hot-swappable drive bays.

I used Avago Technologies’ LSI MegaRAID 9361-8i (Fig. 2) to control the eight drives. The controller has a pair of x4 high-density SFF-8643 mini-SAS connectors, so only two cables are needed to support the Mobile Rack. The MegaRAID 9361-8i is the variant with 8 internal connections. There are also versions with a mix of internal and external connections or with all external connections. All have x8 PCI Express Gen 3 interfaces and can control up to 128 drives via expansion connections.

Figure 2. Avago Technologies’ LSI MegaRAID 9361-8i handles up to 8 SAS/SATA drives. It can handle 12 Gbit/s SAS drives that are now available.

The reason for using the MegaRAID 9361-8i was the CacheCade support. This allows the flash drives to cache the contents of the hard disk drives in the system. The board has 1 Gbyte of 1866 MHz DDR3 RAM that can use a battery backup unit (BBU). This is part of the CacheVault option that includes the BBU and a module that plugs into the controller. The module contains flash storage used to save a copy of the DRAM cache that is restored when the system starts up again. This feature is normally part of a high reliability/high availability cluster.

SSD caching can help most applications perform better, although the amount of improvement can vary significantly. The variance is often based on the amount of SSD storage versus HDD storage as well as on the applications. Some applications that do not usually benefit from SSD caching include large sequential databases plus streaming read or write applications. This is because large sequential accesses rarely revisit the same data, so there is little for a cache to hold onto. On the other hand, features like striping can help.
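To make the difference between access patterns concrete, here is a minimal sketch in Python that replays two synthetic workloads through a simple least-recently-used (LRU) cache and reports the hit rate. The LRU policy, the block counts, and the function names are assumptions chosen for illustration; they are not how CacheCade decides what to cache.

```python
from collections import OrderedDict
import random


def lru_hit_rate(accesses, cache_blocks):
    """Return the fraction of accesses served from an LRU cache of the given size."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for block in accesses:
        total += 1
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / total if total else 0.0


def sequential_scan(total_blocks, passes):
    """Yield block numbers for repeated front-to-back scans (streaming-style access)."""
    for _ in range(passes):
        for block in range(total_blocks):
            yield block


def skewed_random(total_blocks, hot_blocks, hot_fraction, count, seed=0):
    """Yield random block numbers where hot_fraction of accesses hit a small hot set."""
    rng = random.Random(seed)
    for _ in range(count):
        if rng.random() < hot_fraction:
            yield rng.randrange(hot_blocks)
        else:
            yield rng.randrange(hot_blocks, total_blocks)


if __name__ == "__main__":
    # Hypothetical sizes: 10,000 blocks of data behind a 1,000-block cache.
    scan = lru_hit_rate(sequential_scan(10_000, passes=3), cache_blocks=1_000)
    skew = lru_hit_rate(skewed_random(10_000, 500, 0.9, 100_000), cache_blocks=1_000)
    print(f"sequential scan hit rate: {scan:.2%}")
    print(f"skewed random hit rate:   {skew:.2%}")
```

In this toy model the repeated scan never hits the cache, because each block is evicted before the scan comes back around to it, while the skewed workload is served mostly from cache.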

The CacheCade software is enabled with a small key that has to be plugged into the board (Fig. 3). Without CacheCade, the controller works as a high-performance SAS/SATA controller. Caching can be implemented using software on the host, but CacheCade offloads that work from the host.

Figure 3. The CacheCade software is contained in the MegaRAID 9361-8i but it is enabled with the addition of a small module that contains an unlock key.

The first step in testing the system is to install the device drivers. I was using CentOS 7, which is equivalent to Red Hat Enterprise Linux (RHEL) 7. RHEL 7 is one of many operating systems supported by the MegaRAID software.

The next step is to install the MegaRAID Storage Manager (MSM) software, which is used to manage the resources connected to the MegaRAID 9361-8i (Fig. 4). This software can run on any machine, not just the server. It communicates with the device drivers to manage the controller and attached drives. This allows a single instance to manage the controllers and devices throughout a cluster regardless of the operating systems employed. Keep in mind that the functionality of a controller is restricted to the server it is installed in, so the example shown here can only utilize the CacheCade software on the one controller that had it installed.

Figure 4. The MegaRAID Storage Manager can manage multiple controllers on multiple machines.

Finally, we get down to the nuts and bolts. MSM works with arrays of disks that it presents as virtual disks to the operating system. This allows the underlying disk layout to change, which is especially handy for RAID arrays with hot spares. The controller can transparently swap in a hot spare for a bad disk, allowing the bad disk to be replaced later.

The controller supports RAID 0 (striping), 1 (mirror), 10 (mirror stripe), 5, 6, and 60. A collection of spare drives can be associated with a virtual RAID drive. The spares can come from a common pool or a pool dedicated to an array. This flexibility is useful for environments where there are variable requirements in terms of reliability and availability.
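The tradeoff between these RAID levels is easiest to see as usable capacity. The following Python sketch applies the usual textbook formulas for arrays built from identical drives. The function name, the minimum drive counts, and the 600 Gbyte drive size are assumptions for illustration; a given controller may enforce different limits.

```python
def usable_capacity_gb(level, drives, drive_gb, spans=2):
    """Usable capacity of a RAID array built from identical drives.

    level    -- RAID level as an integer: 0, 1, 5, 6, 10, or 60
    drives   -- total number of drives in the array
    drive_gb -- capacity of each drive in Gbytes
    spans    -- number of RAID 6 groups striped together (RAID 60 only)
    """
    if level == 0 and drives >= 1:
        return drives * drive_gb                 # striping, no redundancy
    if level == 1 and drives == 2:
        return drive_gb                          # mirror pair
    if level == 5 and drives >= 3:
        return (drives - 1) * drive_gb           # one drive's worth of parity
    if level == 6 and drives >= 4:
        return (drives - 2) * drive_gb           # two drives' worth of parity
    if level == 10 and drives >= 4 and drives % 2 == 0:
        return (drives // 2) * drive_gb          # striped mirror pairs
    if level == 60 and spans >= 2 and drives % spans == 0 and drives // spans >= 4:
        return (drives - 2 * spans) * drive_gb   # two parity drives per span
    raise ValueError(f"unsupported RAID {level} layout with {drives} drives")


if __name__ == "__main__":
    # Hypothetical example: eight identical 600 Gbyte drives.
    for level in (0, 5, 6, 10, 60):
        print(f"RAID {level:>2}: {usable_capacity_gb(level, 8, 600)} Gbytes usable")
```

Hot spares are not counted here. Each dedicated spare is one fewer drive available to the array.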

Creating a virtual drive with MSM is simply a matter of selecting the type of RAID configuration and populating it with drives. Hot spares can be added or removed at any time. Normally the physical drives within a virtual drive will be of the same type, capacity, and performance. This minimizes underutilization and provides consistent performance. For example, a 500 Gbyte RAID 1 mirrored virtual drive could be created from a 500 Gbyte drive and a 1 Tbyte drive, leaving half of the larger drive unused. Of course, it could also be made into a 1.5 Tbyte RAID 0 virtual drive as well.

CacheCade uses a pair of virtual drives. One usually consists of the HDD virtual disk and the other of an SSD virtual disk. Each virtual disk can contain one or more physical disks. The two virtual disks are combined into one with CacheCade. The first is what the operating system sees, while the other remains hidden, as does the caching functionality. The controller handles accesses and directs them to the fastest array, adjusting the data on the cache drives based on access patterns.
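The general idea of a hidden cache tier can be sketched in a few lines of Python. This is a toy model only: the class name, the promote-after-N-reads rule, and the write-through behavior are assumptions made for illustration and do not describe CacheCade’s actual algorithm.

```python
class TieredVolume:
    """Toy model of one visible volume backed by a slow tier and a hidden fast tier."""

    def __init__(self, cache_blocks, promote_after=2):
        self.hdd = {}                  # backing store: holds every block
        self.ssd = {}                  # hidden cache: holds copies of hot blocks
        self.counts = {}               # read count per block
        self.cache_blocks = cache_blocks
        self.promote_after = promote_after

    def write(self, block, data):
        """Write to the backing store and refresh any cached copy (write-through)."""
        self.hdd[block] = data
        if block in self.ssd:
            self.ssd[block] = data

    def read(self, block):
        """Return (data, tier) where tier names the tier that served the read."""
        self.counts[block] = self.counts.get(block, 0) + 1
        if block in self.ssd:
            return self.ssd[block], "ssd"
        data = self.hdd[block]
        if self.counts[block] >= self.promote_after:
            self._promote(block, data)
        return data, "hdd"

    def _promote(self, block, data):
        """Copy a block into the cache, displacing a colder block if the cache is full."""
        if self.cache_blocks <= 0:
            return
        if len(self.ssd) >= self.cache_blocks:
            coldest = min(self.ssd, key=lambda b: self.counts[b])
            if self.counts[coldest] >= self.counts[block]:
                return                 # nothing colder to displace
            del self.ssd[coldest]
        self.ssd[block] = data


if __name__ == "__main__":
    vol = TieredVolume(cache_blocks=1)
    vol.write(1, "a")
    for _ in range(3):
        print(vol.read(1))             # the third read is served from the cache tier
```

The caller only ever sees one volume. Which tier serves a given read is an internal detail, which mirrors how the operating system sees a single virtual disk.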

The challenge for users will be determining the balance of storage. Not all applications will benefit from massive amounts of flash storage, while others will. Testing the system with real data will be required to determine the optimum mix. Tools like Microsoft’s XPERF and Linux’s block trace (blktrace) can provide feedback on system performance.

In general, the minimum amount of SSD cache should be enough to handle the “hot zone” of data. This is the data most frequently used. Data that is not often used would be in the “cold zone.” In between would be the warm and cool zones. Benchmarks can be used to generate distinct partitions, but this kind of analysis really needs to be done with real data.
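One rough way to size the hot zone from a recorded access trace is to count how many distinct blocks account for most of the accesses. The Python sketch below does that. The function name, the 90 percent coverage target, and the idea of feeding it a flat list of block numbers are assumptions for illustration; real trace tools produce richer output that would need to be parsed first.

```python
from collections import Counter


def hot_zone_blocks(accesses, coverage=0.9):
    """Return the smallest number of distinct blocks that covers the given
    fraction of all accesses, counting the most frequently used blocks first."""
    counts = Counter(accesses)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / total >= coverage:
            return n
    return len(counts)


if __name__ == "__main__":
    # Hypothetical trace: block 1 is hot, blocks 2 and 3 are touched once each.
    trace = [1] * 8 + [2, 3]
    blocks = hot_zone_blocks(trace, coverage=0.9)
    block_size_kb = 64                 # assumed block size
    print(f"hot zone: {blocks} blocks, about {blocks * block_size_kb} Kbytes")
```

Multiplying the block count by the block size gives a lower bound on the cache needed for that coverage level. Running it at several coverage levels gives a feel for where the warm and cool zones begin.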

The single Micron M500DC is an MLC drive with 220 Gbytes of storage. This turned out to be plenty for the smaller applications I was running to test the system, including a web server and a relatively small database application. The M500DC drive was the cache for a 3.3 Tbyte RAID 5 array. Adding a second drive in a RAID 0 configuration did not significantly improve performance because the test data I was using was easily handled by the single drive. Of course, increasing the data set size would have shown a benefit with more SSD storage.

Many applications can justify more storage and even SLC drives. SLC will be needed for applications where the cached data changes often. Enterprise MLC drives like the Micron M500DC will be suitable for a wide range of applications with the benefit of higher capacity for a lower cost. On the plus side, MSM can track use of the drives, so it is possible to determine when a drive will wear out, allowing replacement before this occurs. This feature is called SSD Guard. Of course, having a hot spare handy allows the system to handle the details.
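A back-of-the-envelope wear estimate follows from the drive-fills-per-day rating mentioned earlier. The Python sketch below is a simple linear model with hypothetical write figures; it is not how SSD Guard or MSM tracks wear, and the function names are made up for this example.

```python
def rated_writes_gb(capacity_gb, fills_per_day, years):
    """Total Gbytes the drive is rated to absorb over its design life."""
    return capacity_gb * fills_per_day * 365 * years


def days_remaining(capacity_gb, fills_per_day, years, written_gb, daily_write_gb):
    """Days of rated life left at the current write rate (simple linear model)."""
    if daily_write_gb <= 0:
        raise ValueError("daily_write_gb must be positive")
    remaining_gb = max(rated_writes_gb(capacity_gb, fills_per_day, years) - written_gb, 0)
    return remaining_gb / daily_write_gb


if __name__ == "__main__":
    # 220 Gbyte drive rated for 1 fill per day over 5 years (the low end of the
    # 1-to-3 range). The written-so-far and per-day figures are hypothetical.
    print(rated_writes_gb(220, 1, 5))                  # 401500
    print(days_remaining(220, 1, 5, 100_000, 150))     # 2010.0
```

A cache that sees heavy churn would push the daily write figure up and the remaining days down, which is where SLC drives start to make sense.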

Actually, a hot spare may be used to best effect on the hard disk side, because the system can continue running without the cache, whereas the hard drive RAID array will degrade if a drive is missing. This tradeoff is of more concern with a smaller array like the 8-drive Mobile Rack. Balancing the hot spare slots tends to be less of an issue if the controller is tied to dozens of drives.

In the end, I was only able to stress the system using some artificial benchmarks and platforms like Hadoop. Switching between configurations is simple with MSM. Swapping out drives can cause some lengthy rebuilds but that is more of a disk limitation and one network managers always have to deal with. The important thing is to maintain the data and provide access as quickly as possible.