Image courtesy of Thinkstock
Many embedded applications require a database of sorts, but the type can vary widely from ISAM (Indexed Sequential Access Method) to SQL (structure query language). While SQL is readily available on most platforms these days, there are other alternatives—including NoSQL and NewSQL—that have arisen to address “big data,” as well as less-structured content. These three actually cover a wide range of architectures and implementations, so what’s the difference?
SQL is a relational database management system (RDBMS) based on relational algebra and tuple relational calculus. Edgar F. Codd's relational model presented in his 1970 paper, “A Relational Model of Data for Large Shared Data Banks,” was a huge influence on SQL. It is the basis for major database server applications from vendors like Oracle, Microsoft, and IBM, and there are embedded versions available from many other software vendors. There are open-source implementations like PostgreSQL and MariaDB; the latter is a spinoff of MySQL. MySQL is now owned by Oracle while MariaDB is managed by the MariaDB foundation.
At their core is a common SQL query language that has made it easy to learn and utilize these systems. There are enough differences in syntax and semantics to make migration between platforms difficult at times, but in general, it is possible to move data between platforms. Applications can often target different SQL implementations.
Data is stored in tables where rows contained the same type of elements. Element types include the usual data bytes—such as integers, strings, and binary blobs—but the details can vary depending upon the implementation. SQL queries are used to retrieve and manipulate data using one or more tables.
SQL implementations typically support ACID (Atomicity, Consistency, Isolation, Durability) database transactions.
NoSQL refers to a class of databases that are not built on SQL and are designed to handle very large data stores with variable contents (such as documents). They are often designed to support clusters of compute and storage nodes that are common in large cloud environments.
Most NoSQL systems lack ACID support. This means that programming techniques need to change because the systems typically provide “eventual consistency” over time. This can lead to data loss depending upon the database server support and how an application is written. Some systems provide journal or logging capabilities to mitigate these issues.
There is little commonality among NoSQL systems, as there are many types. Typical classification of NoSQL systems includes key-value stores, object oriented databases, document stores, tuple stores, multivalue and multimodal databases, and more. Data may often be replicated to improve performance.
Performance, functionality, scalability, flexibility, and complexity vary between implementations, with some putting different emphasis on these attributes. It makes selection of a NoSQL database more difficult, but the payoff can be a more efficient, higher-performing solution than a conventional SQL database.
Some NoSQL databases include Aerospike, ArangoDB, CouchDB, c-treeACE, HyperDex, InfinityDB, InfiniteGraph, LMDB, MarkLogic, Neo4j, Oracle NoSQL Database, OrientDB ,and Redis.
NewSQL tries to bring some of the features and scalability of NoSQL to SQL. Online transaction processing (OLTP) read-write workloads need to maintain the ACID guarantees of SQL, but growing requirements demand the use of clusters to handle the large number of transactions. NewSQL systems often forego or modify recovery and concurrency controls to gain scalability across large clusters.
NewSQL systems typically employ SQL syntax and semantics at their core. Some reviews of NewSQL solutions have divided the types of implementations into new architectures, modified SQL engines, and transparent sharding.
New architectures include platforms like Google Spanner, MemSQL, NuoDB, and Trafodion. They employ clusters of share-nothing nodes where each node has a subset of the database. The systems are designed for distributed concurrency control, flow control, and query processing.
Modified SQL engines include platforms like MySQL Cluster, Infobright, and TokuDB. They are extended versions of their existing platforms, like MySQL for MySQL Cluster.
Sharding is the horizontal partitioning of data into shards. A shard is kept in a node, and it main contains some shared or unique data. This allows tables to be spread across many nodes. Sharding designs can be complex and difficult to implement effectively, but they can provide significant performance and scalability improvements. Examples include Apache HBase, MySQL Cluster, and ScaleBase.
Developers have a lot of choices when it comes to storing and querying data, but no one approach will address the needs of all applications. Often an application may employ more than one database system. SQL is often a safe and easy choice, and one that is applicable to many applications.