Mind Matters Natural and Artificial Intelligence News and Analysis

NoSQL Databases are the Problem, Not the Solution

NoSQL means that you will be continually rewriting your code

It’s amazing how much we forget about our own history. Many people think that NoSQL databases are the “next big thing” in technology, and that we should write all of our core applications using them. However, NoSQL databases actually predate relational databases, and common relational databases were established to solve the problems that NoSQL brings.

What are the advantages of NoSQL databases? There are essentially two — they are fast, fast, fast, and they can scale, scale, scale. This much is true. However, if you aren’t building the next Facebook, you probably don’t need that much speed and scale. The fact is, this much speed and scale comes at a cost, and, even in the 1970s, with the limited computers that were available at the time, it was generally decided that the speed and scale of such databases were not worth the complexity needed to manage them.

Prior to the 1970s, two database technologies were prominent: the network model and the hierarchical model. The network model was standardized by a group known as CODASYL (the Conference on Data Systems Languages). The CODASYL model is almost exactly equivalent to modern NoSQL databases such as DynamoDB.

The problems of the network database model are actually very well known:

  • Lack of data normalization means that data is repeated in the database and requires the programmer to remember all of the locations where it is stored and keep them up to date.
  • Data access patterns need to be predefined. Adding new data access patterns often requires major updates to the underlying data structure.
  • Backwards compatibility is difficult to maintain. Database changes require simultaneous changes in code. In a relational database, by contrast, access can go through views, which provide application-specific portals into the data.
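
To make the contrast concrete, here is a minimal sketch of the relational side using Python's built-in sqlite3 module. The table, column, and view names (customers, orders, order_summaries) are hypothetical and chosen only for illustration. It shows a fact stored once and updated once, a view acting as an application-specific portal, and a new access pattern added without touching the underlying structure.

```python
import sqlite3

# In-memory database; all table and view names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Ada', 'ada@old.example');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0);

    -- A view gives one application its own "portal" onto the shared data.
    CREATE VIEW order_summaries AS
        SELECT o.id AS order_id, c.name AS customer_name,
               c.email AS customer_email, o.total
        FROM orders o JOIN customers c ON c.id = o.customer_id;
""")

# Normalized data: the email lives in one row, so one UPDATE is enough.
conn.execute("UPDATE customers SET email = ? WHERE id = ?",
             ("ada@new.example", 1))

# Every order seen through the view reflects the change.
rows = conn.execute(
    "SELECT order_id, customer_email FROM order_summaries ORDER BY order_id"
).fetchall()

# A new access pattern (spend per customer) needs a query, not a redesign.
total_by_customer = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
""").fetchall()

print(rows)
print(total_by_customer)
```

In a store where each order document carries its own copy of the customer's email, the same change would mean finding and rewriting every copy, which is the bookkeeping burden described in the first bullet.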

NoSQL often leads to entirely different databases for different purposes. Because the data has to be stored in a way that is similar to the way that it is used, you often wind up with the same data stored in different databases for transaction processing, operational analytics, and business analytics, while in SQL they all share a common underlying database.

In the early 1970s, E. F. Codd developed the idea of relational databases as a solution to these problems, and, in the 1980s, SQL was standardized as the common language used to access these databases. Not only do relational databases solve these problems, they provide a theoretical framework for understanding, modeling, and manipulating data that works for nearly every situation. NoSQL takes all of those gains and essentially chucks them in the trash.

Additionally, modern NoSQL databases don’t provide the same consistency guarantees as modern relational databases. Modern relational databases provide the ACID properties. Each transaction is atomic (happens all-at-once or not-at-all), consistent (all committed data follows all integrity rules), isolated (concurrent transactions do not interfere with one another’s partial work), and durable (if a transaction says it was successfully committed, even on system failure you won’t lose it). These guarantees are not available in most NoSQL databases.
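
As an illustration of the atomicity and consistency guarantees, here is a small sketch using Python's built-in sqlite3 module. The accounts table and the transfer function are hypothetical names invented for this example. A CHECK constraint plays the role of an integrity rule, and a transfer that would violate it is rolled back in full, including the update that had already succeeded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        name TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- integrity rule
    );
    INSERT INTO accounts VALUES ('alice', 100), ('bob', 50);
""")

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit or neither does."""
    try:
        # The connection as a context manager commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint; the credit was undone too.
        return False

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # overdraws alice, rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits bob before the debit fails, yet the final balances show no trace of it. Without transactional guarantees, undoing that half-finished work is left to application code.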

In short, most of what has transpired in the NoSQL movement is simply a forgetting of the reasons that we switched to relational databases to begin with.

This doesn’t mean that there are no uses for non-relational databases. The point, however, is that NoSQL databases are almost by definition a specialty solution. If you aren’t facing the very specific problems that a NoSQL database fixes, their limitations far outweigh their advantages. You are much more likely to run into a situation in which you need flexibility than one in which you need more raw disk-access speed. Developers (and entrepreneurs) like to imagine themselves as the next Facebook or Google, but the reality is that developing for that target before you get there is a recipe for disaster. You should almost always start with an SQL database, and only move to a NoSQL database if the circumstances dictate.

You might think, “we should just code using NoSQL to begin with, and then not have to rewrite it.” However, the fact is that NoSQL means that you will be continually rewriting it. A plain relational database will give you the most flexibility across several orders of magnitude of scale, while a NoSQL database will have you rewriting large portions of your database and code almost continually.

Jonathan Bartlett

Senior Fellow, Walter Bradley Center for Natural & Artificial Intelligence
Jonathan Bartlett is a senior software R&D engineer at Specialized Bicycle Components, where he focuses on solving problems that span multiple software teams. Previously he was a senior developer at ITX, where he developed applications for companies across the US. He also offers his time as the Director of The Blyth Institute, focusing on the interplay between mathematics, philosophy, engineering, and science. Jonathan is the author of several textbooks and edited volumes which have been used by universities as diverse as Princeton and DeVry.
