NewSQL takes advantage of main-memory databases like H-Store

Feb. 25, 2015

How do we store data? Often, in database systems. But traditional database systems might not be a great fit in the big data era, according to MIT Professor Samuel Madden in the online course “Tackling the Challenges of Big Data,” which runs through March 17.

He first commented on two terms: transactions and analytics. The latter might involve reading a large number of historical records—perhaps to understand what types of investments banking customers make. The focus of this lecture, however, is transactions, in which a database operation fetches or updates a small piece of information in the database. You are performing a transaction, for example, when you log into your bank account and transfer money from your checking account to your savings account.

“So in a transactional system,” he said, “you have lots of users concurrently using the system, each running some set of transactions on their behalf in order to fetch or manipulate their data.”

Transactional databases, he said, provide several features: they support record-oriented persistent storage, they require that records conform to a schema (a record structure in which, for instance, a record corresponding to a particular customer might include account balance and interest rate), and they provide a query language—a way to access and manipulate data.
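To make those features concrete, here is a minimal sketch, not taken from the lecture, that uses Python's built-in sqlite3 module; the accounts table and its columns are hypothetical stand-ins for the kind of customer record Madden describes.

```python
import sqlite3

# A minimal sketch (not from the lecture) of a record schema and a query,
# using Python's built-in sqlite3 module. The "accounts" table and its
# columns are hypothetical examples of the record structure described above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        customer_id   INTEGER PRIMARY KEY,
        account_type  TEXT NOT NULL,      -- e.g., 'checking' or 'savings'
        balance       REAL NOT NULL,
        interest_rate REAL NOT NULL
    )
""")
conn.execute("INSERT INTO accounts VALUES (?, ?, ?, ?)",
             (1, "checking", 2500.00, 0.001))

# The query language (SQL here) is the uniform way to access and manipulate records.
row = conn.execute("SELECT balance, interest_rate FROM accounts WHERE customer_id = ?",
                   (1,)).fetchone()
print(row)  # -> (2500.0, 0.001)
```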

They also provide ACID, he said, which stands for atomicity, consistency, isolation, and durability. Atomicity offers what he called an “all or nothing” property: given a group of operations that manipulate a collection of records, either all of those operations complete successfully or none of them do. This prevents a situation in which, for example, you try to transfer money from a checking account to a savings account but the system crashes after the withdrawal from checking and before the deposit to savings.
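The checking-to-savings transfer maps naturally onto a database transaction. The sketch below, again using Python's sqlite3 module with a hypothetical accounts table, shows the all-or-nothing behavior: if anything fails between the withdrawal and the deposit, the whole transaction rolls back.

```python
import sqlite3

# A minimal sketch of the all-or-nothing property, assuming a hypothetical
# two-row "accounts" table; an illustration, not the lecture's code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("checking", 500.0), ("savings", 100.0)])
conn.commit()

def transfer(amount):
    """Move amount from checking to savings; both updates apply or neither does."""
    try:
        with conn:  # one transaction: commits on success, rolls back on any error
            conn.execute("UPDATE accounts SET balance = balance - ? "
                         "WHERE name = 'checking'", (amount,))
            # A crash or error here undoes the withdrawal above as well.
            conn.execute("UPDATE accounts SET balance = balance + ? "
                         "WHERE name = 'savings'", (amount,))
    except sqlite3.Error:
        pass  # the balances are left exactly as they were

transfer(200.0)
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# -> [('checking', 300.0), ('savings', 300.0)]
```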

Conventional transaction-oriented database systems might not be a good fit for the world of big data, which demands very high throughput, he said. In addition, big data might be distributed across multiple machines. And the machines must be highly available, even if they sometimes return inconsistent data, a property he called “eventual consistency.” Finally, the SQL query language might not be optimal for some big data applications, which has led to the development of NoSQL and NewSQL systems.

Madden cited a Google search, in which one of 1,000 nodes might fail. It’s probably better to return the results of the 999 nodes that didn’t fail than no results at all, even if some information is missing. This is fine because we have no concept of “perfect” search results. We might be less accommodating about our bank balances.

He then addressed NoSQL query languages, describing alternatives to the relational data model, including the key-value store and the document store. The key-value store provides a simple interface limited to “gets” and “puts”: processing occurs one operation at a time, with no atomic operations over multiple records and none of the guarantees that traditional database systems provide. With a document store, a key is associated with, for example, a JSON or HTML document.
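A toy key-value store, written here in plain Python purely for illustration and not modeled on any particular product, captures the interface Madden describes: single-record gets and puts, with none of the multi-record guarantees of a traditional database.

```python
import json

# A toy key-value store for illustration only (not any particular product):
# single-record gets and puts, nothing more.
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Store one value under one key; touches exactly one record."""
        self._data[key] = value

    def get(self, key):
        """Fetch one value by key, or None if the key is absent."""
        return self._data.get(key)

store = KeyValueStore()
# A document store looks the same, except the value is a document, e.g. JSON.
store.put("customer:42", json.dumps({"name": "Alice", "balance": 2500.0}))
print(json.loads(store.get("customer:42")))  # -> {'name': 'Alice', 'balance': 2500.0}
```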

Key-value stores, he said, are easy to program and implement, they offer high throughput, and they support distributing data across multiple nodes. He noted that from a data perspective, the key-value, document, and relational models are equivalent; what differs is the query language used to manipulate the data.
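That equivalence is easy to see with a hypothetical customer record written out under each model; the information is the same, and only its representation and query style change.

```python
# One hypothetical customer record expressed under each data model; the
# content is identical in all three, only the representation differs.

# Relational: a row in an "accounts" table with named, typed columns.
relational_row = (42, "Alice", 2500.0, 0.001)  # (id, name, balance, interest_rate)

# Key-value: an opaque value stored under a key; the application interprets it.
key_value_pair = ("customer:42", "Alice|2500.0|0.001")

# Document: the same key mapped to a structured (here JSON-like) document.
document = ("customer:42", {"name": "Alice", "balance": 2500.0, "interest_rate": 0.001})
```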

He then elaborated on eventual consistency, citing an example involving three replicas, in which two respond to a query while the third is offline. This poses the possibility of a user accessing stale data when the third replica comes back online—which can be resolved through the “majority write/majority read” protocol.
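A toy quorum sketch, written for this summary rather than drawn from the lecture, shows why a majority write followed by a majority read cannot return stale data: any two majorities of three replicas must overlap in at least one replica.

```python
# A toy sketch of the majority write/majority read idea with three replicas:
# a write must reach at least two replicas and a read must consult at least
# two, so every read overlaps some replica holding the latest write. Version
# numbers pick the newest copy; real systems also handle retries and conflicts.

class Replica:
    def __init__(self):
        self.value, self.version = None, 0

REPLICAS = [Replica(), Replica(), Replica()]
MAJORITY = 2

def write(value, version, reachable):
    if len(reachable) < MAJORITY:
        raise RuntimeError("write rejected: no majority reachable")
    for r in reachable:                     # update every reachable replica
        r.value, r.version = value, version

def read(reachable):
    if len(reachable) < MAJORITY:
        raise RuntimeError("read rejected: no majority reachable")
    return max(reachable, key=lambda r: r.version).value  # newest version wins

# Replica 2 is offline during the write; a later read that includes it still
# overlaps replica 1, so the stale copy loses and no old data is returned.
write("balance=300", version=1, reachable=REPLICAS[:2])
print(read([REPLICAS[1], REPLICAS[2]]))     # -> balance=300
```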

H-Store, Madden said, is a NewSQL system that provides the transactional properties of traditional systems with the performance of NoSQL systems, which he demonstrated via the voter benchmark, based on the Japanese version of “American Idol.” Traditional systems cannot handle millions of votes registered in a short period of time. They spend about a third of their time on buffer-pool operations (moving data between disk and memory), nearly another third on locking (so that one transaction doesn’t see the intermediate results of another), and almost another third on recovery (writing data to disk so that work can be recovered, replayed, or undone after a crash). Only about 12% of the time is spent doing real work.

A conventional database system can store many petabytes of data on disk. A transactional database, however, may fit in main memory. That’s a reasonable assumption, Madden said, because the number of customers or products a company has isn’t growing exponentially, while the amount of main memory per machine is. H-Store takes advantage of main memory to eliminate the overhead of disk operations.

H-Store also eliminates concurrent execution within each partition. That may not sound like a good way to improve performance, yet it minimizes the overhead of locking. And finally, for recovery purposes H-Store logs to a command log the fact that a particular command has started to execute, rather than performing all the fine-grained logging of a traditional heavyweight recovery system. It instead uses “asynchronous background checkpointing,” which involves writing to disk at a relatively slow and infrequent rate.

And instead of submitting SQL text, users of H-Store submit the name of a particular stored procedure they want to run, such as VoteCount or InsertVote.
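To see how these pieces fit together, here is a deliberately simplified sketch written for this summary; it is not H-Store's actual code or client API. It shows an in-memory partition that executes named stored procedures serially, appends only the command to a log, and checkpoints separately. Only the procedure names InsertVote and VoteCount come from the lecture; everything else is hypothetical.

```python
import json

# A deliberately simplified sketch, not H-Store's actual code or client API:
# one partition keeps its data in main memory, runs named stored procedures
# one at a time (so no locking), appends only the command to a log, and
# checkpoints separately.

class Partition:
    def __init__(self, log_path="command.log"):
        self.data = {}                      # all state lives in main memory
        self.log = open(log_path, "a")      # command log used for recovery

    def call_procedure(self, name, *args):
        # Log *what* ran (procedure name plus arguments), not every byte it
        # changed; replaying the log after a crash rebuilds the in-memory state.
        self.log.write(json.dumps({"proc": name, "args": args}) + "\n")
        self.log.flush()
        # Serial execution: one transaction at a time on this partition.
        return PROCEDURES[name](self.data, *args)

    def checkpoint(self, path="snapshot.json"):
        # In the real system this runs asynchronously in the background,
        # at a relatively slow and infrequent rate.
        with open(path, "w") as f:
            json.dump(self.data, f)

def insert_vote(data, phone_number, contestant):
    data[contestant] = data.get(contestant, 0) + 1

def vote_count(data, contestant):
    return data.get(contestant, 0)

PROCEDURES = {"InsertVote": insert_vote, "VoteCount": vote_count}

# Clients submit a procedure name and parameters rather than SQL text.
partition = Partition()
partition.call_procedure("InsertVote", "555-0100", 3)
print(partition.call_procedure("VoteCount", 3))   # -> 1
```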

H-Store, he said, involved a lot of new ideas, including speculative execution, automatic partitioning, and the ability to run efficiently on multicore machines.

In conclusion, he emphasized “…this idea of using a new kind of database system, one that’s really modern and optimized for main-memory databases like H-Store.”


About the Author

Rick Nelson | Contributing Editor

Rick is currently Contributing Technical Editor. He was Executive Editor for EE from 2011 to 2018. He previously served on several publications, including EDN and Vision Systems Design, and has received awards for signed editorials from the American Society of Business Publication Editors. He began his career as a design engineer at General Electric and Litton Industries and earned a BSEE degree from Penn State.
