NoSQL




Section 0: Module Objectives or Competencies
Course Objective or Competency

The student will learn that NoSQL databases are available to store and process Big Data, optimized for data analytics for developing data-driven intelligent applications from Big Data.

Module Objectives or Competencies

  • The student will be able to list and explain the characteristics of a NoSQL database.
  • The student will be able to list and explain the characteristics of key-value databases.
  • The student will be able to list and explain the characteristics of document databases.
  • The student will be able to list and explain the characteristics of column-oriented databases.
  • The student will be able to list and explain the characteristics of graph databases.
  • The student will be able to explain how NoSQL databases do not replace relational databases, but simply serve a different purpose.
  • The student will be able to explain the BASE concept and its implications on NoSQL databases.

Section 1: Overview

NoSQL - A database management system that is not based on the traditional relational database model.

NoSQL applies to a broad array of nonrelational database technologies that have developed to address the challenges represented by Big Data.

  • The name was originally a Twitter hashtag to flag discussions about the nonrelational database technologies that were being developed by organizations like Google, Amazon, and Facebook to deal with the problems they were encountering as their data sets reached enormous sizes.
  • "NoSQL" does not indicate the lack of a query language, nor is the idea that "NoSQL" stands for "Not Only SQL" correct.
  • Many NoSQL products support query languages that mimic SQL in important ways, although there is currently no NoSQL system that implements standard SQL.
  • "Not Only SQL" is misleading because if the requirement to be considered a NoSQL product were simply that languages beyond SQL are supported, then all of the traditional RDBMS products would qualify.

Every time you search for a product on Amazon, send messages to friends on Facebook, watch a video on YouTube, or search for directions in Google Maps, you are using a NoSQL database.

What is NoSQL?

NoSQL refers to a new generation of databases that have the following general characteristics, and it's important to understand how these will fit together.

Section 2: Key-Value Databases

Key-value (KV) databases are conceptually the simplest of the NoSQL data models.

A key-value database is a simple database that pairs a simple string (the key), which is always unique, with one or more associated, arbitrarily large data fields (the values).

The key-value data model is also referred to as the attribute-value or associative data model.

The database does not attempt to understand the contents of the value component or its meaning – the database simply stores whatever value is provided for the key.

There are no foreign keys; in fact, relationships cannot be tracked among keys at all.


Buckets

Key-value pairs are typically organized into "buckets."

All data operations are based on the bucket plus the key.


Operations

Operations on KV databases are rather simple – only get, store, and delete operations are used.
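
The three operations can be sketched with a small in-memory store. This is only an illustration, assuming a hypothetical `KeyValueStore` class and made-up customer data; it is not the API of any particular product. Each call takes the bucket plus the key, and the store never inspects the value it is given.

```python
class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        # bucket name -> {key: value}
        self._buckets = {}

    def store(self, bucket, key, value):
        # Create the bucket on first use; overwrite any existing value for the key.
        self._buckets.setdefault(bucket, {})[key] = value

    def get(self, bucket, key):
        # Return None when the bucket or the key does not exist.
        return self._buckets.get(bucket, {}).get(key)

    def delete(self, bucket, key):
        # Deleting a missing key is a no-op.
        self._buckets.get(bucket, {}).pop(key, None)


kv = KeyValueStore()

# The values are opaque bytes; the store does not interpret them (made-up data).
kv.store("customer", "10010", b"LName Ramas FName Alfred Balance 0")
kv.store("customer", "10011", b"LName Dunne FName Leona Balance 0")

first = kv.get("customer", "10010")
kv.delete("customer", "10011")
missing = kv.get("customer", "10011")
```

After the delete, `missing` is `None`, while `first` still holds the bytes stored under key 10010.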


Example 1

The figure below shows a customer bucket with three key-value pairs.

key-value example.

Since the KV model does not allow queries based on data in the value component, it is not possible to query for a key-value pair based on customer last name, for example.
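
One consequence is that a search by last name has to happen in the application rather than in the database. The sketch below assumes a hypothetical bucket held as a Python dictionary and an invented `field=value;` encoding for the values; the application fetches every pair and interprets each value itself.

```python
# Hypothetical bucket contents: key -> opaque value (the encoding is an assumption).
customer_bucket = {
    "10010": "LName=Ramas;FName=Alfred;Balance=0",
    "10011": "LName=Dunne;FName=Leona;Balance=0",
    "10014": "LName=Orlando;FName=Myron;Balance=0",
}


def find_keys_by_last_name(bucket, last_name):
    """Scan every pair and decode each value in application code."""
    matches = []
    for key, value in bucket.items():
        # Only the application knows how to interpret the value.
        fields = dict(part.split("=", 1) for part in value.split(";"))
        if fields.get("LName") == last_name:
            matches.append(key)
    return matches


dunne_keys = find_keys_by_last_name(customer_bucket, "Dunne")
```

The full scan is the price of keeping the value opaque: lookups by key stay simple, while anything else is the application's job.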


Example 2

The figure below shows the example of a small truck-driving company called Trucks-R-Us.

Another key-value example.

Each of the three drivers has one or more certifications and other general information. Using this example, we can draw the following important points:


Implementations

Several NoSQL database implementations, such as Google's BigTable and Apache's Cassandra, have extended the key-value data model to group multiple key-value sets into column families or column stores.

Section 3: Document Databases

Document databases are conceptually similar to key-value databases, and they can almost be considered a subtype of KV databases.

A document database is a NoSQL database that stores tagged document-oriented information, also known as semi-structured data, and uses an index to associate "keys" with "documents."

Another important difference is that while KV databases do not attempt to understand the content of the value component, document databases do.

Despite the use of tags in documents, document databases are considered schema-less, that is, they do not impose a predefined structure on the data that is stored.

The tags in a document database are extremely important because they are the basis for most of the additional capabilities that document databases have over KV databases.


Collections

Just as KV databases group key-value pairs into logical groups called buckets, document databases group documents into logical groups called collections.


Example

The figure below represents the same data as Example 1 above, but in a tagged format for a document database.

Document database tagged format.

Document databases even support some aggregate functions such as summing or averaging balances in queries.
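
Because the database understands the tags, it can filter and aggregate on them. The following is a rough sketch that assumes a collection is modeled as a list of Python dictionaries; the documents, the `find` helper, and the `average` helper are all hypothetical and stand in for what a real document database would do internally.

```python
from statistics import mean

# Hypothetical "customer" collection. Documents need not share the same tags.
customers = [
    {"_id": "10010", "LName": "Ramas", "FName": "Alfred", "Balance": 0.0},
    {"_id": "10011", "LName": "Dunne", "FName": "Leona", "Balance": 150.0},
    {"_id": "10014", "LName": "Orlando", "FName": "Myron", "Areacode": "615", "Balance": 50.0},
]


def find(collection, **criteria):
    """Return the documents whose tags match every given tag=value pair."""
    return [
        doc for doc in collection
        if all(doc.get(tag) == value for tag, value in criteria.items())
    ]


def average(collection, tag):
    """Average a numeric tag over the documents that actually have it."""
    return mean(doc[tag] for doc in collection if tag in doc)


dunnes = find(customers, LName="Dunne")
avg_balance = average(customers, "Balance")
```

Here the query on `LName` works directly against the content of each document, which is exactly what the key-value model cannot do.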


Additional Details

Document databases tend to operate on an implied assumption that a document is relatively self-contained, not a fragment of the data about a given topic.

For example, in a relational database data about orders may be decomposed into customer, invoice, line, and product tables.

Document databases do not store relationships as perceived in the relational model and generally have no support for join operations.
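
To illustrate the self-contained style, the sketch below shows one hypothetical order document that embeds the customer and line data a relational design would spread over several tables. The tag names and values are invented for the example.

```python
# One self-contained order document (all names and values are made up).
order = {
    "_id": "INV-1001",
    "date": "2024-01-15",
    "customer": {"id": "10010", "name": "Alfred Ramas"},
    "lines": [
        {"product": "Claw hammer", "units": 2, "price": 9.95},
        {"product": "Rat-tail file", "units": 1, "price": 4.99},
    ],
}


def order_total(doc):
    """Total the embedded lines; no join is needed because they travel with the order."""
    return sum(line["units"] * line["price"] for line in doc["lines"])


total = round(order_total(order), 2)
```

Everything needed to answer "what did this order cost and who placed it" is in the one document, at the cost of repeating customer details in every order.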

Section 4: Column-Oriented Databases

A column-oriented database, also called column family database, is a NoSQL database that organizes data in key-value pairs with keys mapped to a set of columns in the value component.


Groups

As more columns are added, it becomes clear that some columns form natural groups, such as Fname, Lname, and Initial, which would logically group together to form a customer’s name.

Similarly, Street, City, State, and Zip would logically group together to form a customer’s address.


Row Keys

Row keys are created to identify objects in the environment.
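
A minimal sketch of this layout, assuming nested Python dictionaries stand in for the storage engine: each row key maps to column families, and each family maps column names to values. The family names `name` and `address` and all of the data are hypothetical.

```python
# row key -> column family -> column -> value (illustrative data only)
rows = {
    "10010": {
        "name": {"Fname": "Alfred", "Lname": "Ramas", "Initial": "A"},
        "address": {"Street": "123 Main St", "City": "Nashville", "State": "TN", "Zip": "37201"},
    },
    "10011": {
        # This row stores fewer columns and has no address family.
        "name": {"Fname": "Leona", "Lname": "Dunne"},
    },
}


def get_column(table, row_key, family, column):
    """Look up one column value; return None if any level is missing."""
    return table.get(row_key, {}).get(family, {}).get(column)


city = get_column(rows, "10010", "address", "City")
no_city = get_column(rows, "10011", "address", "City")
```

The row key plays the same role as the key in a key-value pair, while the value component is the structured set of grouped columns.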



Section 5: Graph Databases

A graph database is a NoSQL database based on graph theory to store data about relationship-rich environments.

Interest in graph databases originated in the area of social networks.

The primary components of graph databases are nodes, edges, and properties as seen in the figure below.

Graph database representation.
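
The three components can be sketched in plain Python. This is an illustration only, with made-up people and a hypothetical `FRIENDS_WITH` edge type: nodes and edges both carry properties, and a query is a traversal along edges.

```python
# Nodes: id -> properties (hypothetical data).
nodes = {
    1: {"label": "Person", "name": "Ana"},
    2: {"label": "Person", "name": "Ben"},
    3: {"label": "Person", "name": "Cho"},
}

# Edges connect two nodes and can carry their own properties.
edges = [
    {"from": 1, "to": 2, "type": "FRIENDS_WITH", "since": 2019},
    {"from": 2, "to": 3, "type": "FRIENDS_WITH", "since": 2021},
]


def neighbors(node_id, edge_type):
    """Nodes reachable over one edge of the given type, in either direction."""
    found = set()
    for edge in edges:
        if edge["type"] != edge_type:
            continue
        if edge["from"] == node_id:
            found.add(edge["to"])
        elif edge["to"] == node_id:
            found.add(edge["from"])
    return found


def friends_of_friends(node_id):
    """Two-hop traversal, excluding the start node and its direct friends."""
    direct = neighbors(node_id, "FRIENDS_WITH")
    second_hop = set()
    for friend in direct:
        second_hop |= neighbors(friend, "FRIENDS_WITH")
    return second_hop - direct - {node_id}


suggestions = sorted(nodes[n]["name"] for n in friends_of_friends(1))
```

For this data, `suggestions` contains only Cho, who is a friend of Ana's friend Ben.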

Similarities and Differences

Graph databases share some characteristics with other NoSQL databases:

However, other key characteristics do not apply to graph databases.



Section 6: NewSQL Databases

Relational databases are the mainstay of organizational data, and NoSQL databases do not attempt to replace them for supporting line-of-business transactions.

NewSQL databases try to bridge the gap between RDBMS and NoSQL.

Like RDBMS, NewSQL databases support:

Similar to NoSQL, NewSQL databases also support:


Disadvantages

No technology can perfectly provide the advantages of both RDBMS and NoSQL.


Section 7: BASE

If the CAP Theorem accurately states the limitations of distributed databases, how do Google’s BigTable, Amazon’s Dynamo, and Facebook’s Cassandra deal with a loss of consistency and still maintain system reliability?

One answer is BASE (basically available, soft state, eventually consistent).


Basically Available, Soft state, Eventually consistent

The BASE model isn't appropriate for every situation, but it is certainly a flexible alternative to the ACID model for databases that don't require strict adherence to a relational model.


Recap

Databases with a BASE consistency model (NoSQL databases) prefer availability over the consistency of replicated data at write time.
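
The effect of that preference can be simulated in a few lines. The sketch below is a deliberately simplified model, not how any real product replicates data: the hypothetical `EventuallyConsistentStore` accepts a write on one replica immediately and queues the update for the others, so a read from another replica can be stale until the queue is drained.

```python
from collections import deque


class Replica:
    """One copy of the data (toy model)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Hypothetical store that acknowledges writes before replicating them."""

    def __init__(self, replica_names):
        self.replicas = [Replica(name) for name in replica_names]
        self.pending = deque()  # (replica, key, value) updates not yet applied

    def write(self, key, value):
        # Apply to the first replica right away: availability comes first.
        first, *others = self.replicas
        first.data[key] = value
        # The other replicas receive the update later.
        for replica in others:
            self.pending.append((replica, key, value))

    def read(self, replica_index, key):
        # A read is served by whichever replica is asked, stale or not.
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # Drain the queue; afterwards all replicas agree (eventual consistency).
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.data[key] = value


store = EventuallyConsistentStore(["a", "b", "c"])
store.write("balance", 100)

stale = store.read(1, "balance")   # replica "b" has not seen the write yet
store.sync()
fresh = store.read(1, "balance")   # replicas have converged
```

Between the write and the call to `sync`, the replicas disagree (soft state); once propagation finishes, every replica returns the same value.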


Navigating ACID vs. BASE Trade-offs

There is no right answer to whether an application needs an ACID versus BASE consistency model. Developers and data architects should select their data consistency trade-offs on a case-by-case basis – not based just on what’s trending or what model was used previously.

Given BASE’s loose consistency, developers need to be more knowledgeable and rigorous about consistent data if they choose a BASE store for their application. It's essential to be familiar with the BASE behavior of your chosen aggregate store and work within those constraints.

On the other hand, planning around BASE limitations can sometimes be a major disadvantage when compared to the simplicity of ACID transactions. A fully ACID database is the perfect fit for use cases where data reliability and consistency are essential.



Section 8: Summary

There are literally hundreds of products that can be considered as being under the broadly defined term NoSQL.

The table below shows some popular NoSQL databases of each type.

NoSQL databases.
Section 9: Resources

An Introduction to NoSQL Databases

NoSQL Database Tutorial for Beginners

SQL vs NoSQL or MySQL vs MongoDB

What is NoSQL Database?

SQL vs NoSQL: All You Need to Know

In 2000, Eric Brewer presented the CAP Theorem, which states that there are three essential system requirements necessary for the successful design, implementation, and deployment of applications in distributed computing systems – Consistency, Availability, and Partition Tolerance – but in the majority of instances, a distributed system can only guarantee two of the three features.