(Editor's note: This column is based upon the authors' forthcoming book "Principles of Database Management: The Practical Guide to Storing, Managing and Analyzing Big and Small Data.")

Relational database systems pay a lot of attention to data consistency and compliance with a formal database schema. New data or modifications to existing data are not accepted unless they satisfy constraints represented in this schema in terms of data types, referential integrity etc.

The way in which RDBMSs coordinate their transactions guarantees that the entire database is consistent at all times, the well-known ACID properties: atomicity, consistency, isolation and durability. Consistency is usually a desirable property; one normally wouldn’t want for erroneous data to enter the system, nor for e.g., a money transfer to be aborted halfway, with only one of the two accounts updated.

Yet, sometimes this focus on consistency may become a burden, because it induces (sometimes unnecessary) overhead and hampers scalability and flexibility. RDBMSs are at their best when performing intensive read/write operations on small or medium sized data sets or when executing larger batch processes, but with only a limited number of simultaneous transactions.

As the data volumes or the number of parallel transactions increase, capacity can be increased by vertical scaling (also called scaling up), i.e. by extending storage capacity and/or CPU power of the database server. However, obviously, there are hardware induced limitations to vertical scaling.

Therefore, further capacity increases need to be realized by horizontal scaling (also known as scaling out), with multiple DBMS servers being arranged in a cluster. The respective nodes in the cluster can balance workloads among one another and scaling is achieved by adding nodes to the cluster, rather than extending the capacity of individual nodes.

Such a clustered architecture is an essential prerequisite to cope with the enormous demands of recent evolutions such as big data (analytics), cloud computing and all kinds of responsive web applications. It provides the necessary performance, which cannot be realized by a single server, but also guarantees availability, with data being replicated over multiple nodes and other nodes taking over their neighbor’s workload if one node fails.

However, RDBMSs are not good at extensive horizontal scaling. Their approach towards transaction management and their urge to keep data consistent at all times, induces a large coordination overhead as the number of nodes increases. In addition, the rich querying functionality may be overkill in many big data settings, where applications merely need high capacity to ‘put’ and ‘get’ data items, with no demand for complex data interrelationships nor selection criteria. Also, big data settings often focus on semi-structured data or on data with a very volatile structure (think for instance about sensor data, images, audio data, and so on), where the rigid database schemas of RDBMSs are a source of inflexibility.

None of this means that relational databases will become obsolete soon. However, the ‘one size fits all’ era, where RDBMSs were used in nearly any data and processing context, seems to have come to an end. RDBMSs are still the way to go when storing up to medium-sized volumes of highly structured data, with strong emphasis on consistency and extensive querying facilities. Where massive volumes, flexible data structures, scalability and availability are more important, other systems may be called for. This need resulted in the emergence of NoSQL databases.

The Emergence of the NoSQL Movement

The term “NoSQL” has become overloaded throughout the past decade, so the moniker now relates to many meanings and systems. The modern NoSQL movement describes databases that store and manipulate data in other formats than tabular relations, i.e. non-relational databases.

The movement should have more appropriately been called NoREL, especially since some of these non-relational databases actually provide query language facilities close to SQL. Because of such reasons, people have changed the original meaning of the NoSQL movement to stand for “not only SQL” or “not relational” instead of “not SQL”.

What makes NoSQL databases different from other, legacy, non-relational systems which have existed since as early as the 1970s? The renewed interest in non-relational database systems stems from Web 2.0 companies in the early 2000s. Around this period, up-and-coming web companies, such as Facebook, Google, and Amazon were increasingly being confronted with huge amounts of data to be processed, often under time-sensitive constraints. For example, think about an instantaneous Google search query, or thousands of users accessing Amazon product pages or Facebook profiles simultaneously.

Often rooted in the open source community, the characteristics of the systems developed to deal with these requirements are very diverse. However, their common ground is that they try to avoid, at least to some extent, the shortcomings of RDBMSs in this respect. Many aim at near linear horizontal scalability, which is achieved by distributing data over a cluster of database nodes for the sake of performance (parallelism and load balancing) and availability (replication and failover management).

A certain measure of data consistency is often sacrificed in return. A term frequently used in this respect is eventual consistency; the data, and respective replicas of the same data item, will become consistent in time after each transaction, but continuous consistency is not guaranteed.

The relational data model is cast aside for other modelling paradigms, which are typically less rigid and better able to cope with quickly evolving data structures. Often, the API (Application Programming Interface) and/or query mechanism are much simpler than in a relational setting. The Comparison Box provides a more detailed comparison of the typical characteristics of NoSQL databases against those of relational systems. Note that different categories of NoSQL databases exist and that even the members of a single category can be very diverse. No single NoSQL system will exhibit all of these properties.

COMPARISON BOX


Relational Databases
NoSQL Databases
Data paradigm
Relational tables
Key-value (tuple) based
Document based
Column based
Graph based
XML, object based
Others: time series, probabilistic, etc.
Distribution
Single-node and distributed
Mainly distributed
Scalability
Vertical scaling, harder to scale horizontally
Easy to scale horizontally, easy data replication
Openness
Closed and open source
Mainly open source
Schema role
Schema-driven
Mainly schema-free or flexible schema
Query language
SQL as query language
No or simple querying facilities, or special-purpose languages
Transaction mechanism
ACID: Atomicity, Consistency, Isolation, Durability
BASE: Basically available, Soft state, Eventually consistent
Feature set
Many features (triggers, views, stored procedures, etc.)
Simple API
Data volume
Capable of handling normal-sized data sets
Capable of handling huge amounts of data and/or very high frequencies of read/write requests

Characteristics of NoSQL databases versus relational databases.

We note, however, that the explosion of popularity of NoSQL data storage layers should be put in perspective considering their limitations. Most NoSQL implementations have yet to prove their true worth in the field (most are very young and in development). Most implementations sacrifice ACID concerns in favor of being eventually consistent, and the lack of relational support makes expressing some queries or aggregations particularly difficult, with map-reduce interfaces being offered as a possible, but harder to learn and use, alternative.

Combined with the fact that RDBMSs do provide strong support for transactionality, durability and manageability, quite a few early adopters of NoSQL were confronted with some sour lessons. See for instance the FreeBSD maintainers speaking out against MongoDB’s lack of on-disk consistency support[1], Digg struggling with the NoSQL Cassandra database after switching from MySQL[2] and Twitter facing similar issues as well (which also ended up sticking with a MySQL cluster for a while longer)[3], or the fiasco of HealthCare.gov, where the IT team also went with a badly-suited NoSQL database[4].

It would be an over-simplification to reduce the choice between RDBMSs and NoSQL databases to a choice between consistency and integrity on the one hand, and scalability and flexibility on the other. The market of NoSQL systems is far too diverse for that. Still, this tradeoff will often come into play when deciding on taking the NoSQL route. We see many NoSQL vendors focusing again on robustness and durability.

We also observe traditional RDBMS vendors implementing features that let you build schema-free, scalable data stores inside a traditional RDBMS, capable to store nested, semi-structured documents, as this seems to remain the true selling point of most NoSQL databases, especially those in the document store category. Some vendors have already adopted “NewSQL” as a new term to describe modern relational database management systems that aim to blend the scalable performance and flexibility of NoSQL systems with the robustness guarantees of a traditional DBMS.

Expect the future trend to continue towards adoption of such “blended systems”, except for use cases that require specialized, niche database management systems. In these settings, the NoSQL movement has rightly taught users that the one size fits all mentality of relational systems is no longer applicable and should be replaced by finding the right tool for the job.

For instance, graph databases arise as being “hyper-relational” databases, which makes relations first class citizens next to records themselves rather than doing away with them altogether. Graph databases express complicated queries in a straightforward way, especially where one must deal with many, nested, or hierarchical relations between objects. The below table concludes this article by summarizing the differences between traditional RDBMSs, NoSQL DBMSs and NewSQL DBMSs.

Traditional SQL RDBMSs

NoSQL databases

Blended systems, “NewSQL”

Relational

Yes

No

Yes

SQL

Yes

No, though can come with own query languages

Yes

Column stores

No

Yes

Yes

Scalability

Limited

Yes

Yes

Consistency model

Strong

Eventually consistent, though some efforts to enforce stronger consistency

Strongly consistent for the most part

BASE (Basically Available, Soft state and Eventually consistent)

No

Yes

No

Handles large (big) volumes of data

No

Yes

Yes

Schema-less

No

Yes

No, though can store and query free-structured fields

For more information, we are happy to refer to our book, "Principles of Database Management: The Practical Guide to Storing, Managing and Analyzing Big and Small Data," www.pdbmbook.com and YouTube channel: https://www.youtube.com/watch?v=o36Z_OqC2ac&list=PLdQddgMBv5zHcEN9RrhADq3CBColhY2hl


[1] http://www.ivoras.net/blog/tree/2009-11-05.a-short-time-with-mongodb.html
[2] http://www.forbes.com/2010/09/21/cassandra-mysql-software-technology-cio-network-digg.html
[3] https://techcrunch.com/2010/07/09/twitter-analytics-mysql/
[4] https://gigaom.com/2013/11/25/how-the-use-of-a-nosql-database-played-a-role-in-the-healthcare-gov-snafu/

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access