
NoSQL Gnosis

Prelude
From your smartphone to your tablet, to your laptop or your PC, data is pervasive
and plentiful, and the things it can do for you are amazing. In today's data-intensive
world, much enterprise focus settles on analytics; in other words, the central problem
becomes what to do with all the data you have collected.
It is a significant problem to solve, but we will never get there without an
efficient, long-term data storage solution to provide a stable foundation. After all, you
cannot analyze data if you have nowhere to put it.

NoSQL arrived with its own distinguishing approach to storing and operating on data.

Why NoSQL?
RDBMS
 A Relational Database Management System (RDBMS) is a powerful technology
for storing structured data in web and business applications and is considered to
be the forerunner of NoSQL.
 Since the publication of Codd's paper "A Relational Model of Data for Large
Shared Data Banks" in 1970, these data stores have been widely adopted and are
often thought of as the only option for storing data that must be accessible in a
consistent way.

Drawbacks
What do you think is the reason for the emergence of NoSQL when we had RDBMS?
The following features of RDBMS will answer the question:

 Scalability
 Cost

Why NoSQL?
Scaling

 A way to provide a rich query model is to keep the dataset on a single machine.
 Both vertical scaling and horizontal scaling (multi-node database solutions) of
relational servers have often fallen short.
 Some approaches of Horizontal scaling are:
o Master-Slave
o Sharding

Master-Slave and Sharding

Master-Slave
 All writes go to the master, while reads are performed against replicated slave databases.
 Large datasets may pose a problem because the master needs to duplicate data to the slaves.
 Critical reads may be incorrect because writes may not yet have been propagated to the slaves.

Sharding
 Scales well for both reads and writes.
 Referential integrity is lost across shards.
 Not transparent to the application, which needs to know about the partitions.
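To make the sharding approach concrete, here is a minimal sketch in Python. It assumes hash-based partitioning, which is only one of several possible strategies; the names shard_for and ShardedStore are hypothetical and exist only for this illustration.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy store: each shard is a plain dict standing in for a separate server."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        # The caller's key decides which shard receives the write.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        # Reads go to the same shard the write went to.
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=3)
for user in ("alice", "bob", "carol", "dave"):
    store.put(user, {"name": user})
```

Because every read and write touches only one shard, both scale with the number of shards. The same routing step is also why the application has to be aware of the partitioning.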

Why NoSQL?
Cost

 Horizontal and Vertical Scaling increase cost significantly.

Did you know?

 Vertical scaling is called Scaling up


 Horizontal scaling is called Scaling out

What is NoSQL?
Definition
NoSQL stands for Not Only SQL. It provides a way to store and retrieve data that is
modeled in forms other than the tabular format used in relational databases.

 The term was introduced by Carlo Strozzi in 1998 and later reintroduced by Eric Evans.
 It is a complementary addition to SQL and Relational databases.
 These are flexible database management systems that provide a way to store
and process both structured and semi-structured data.

An epitome of NoSQL
This video gives a brief overview of NoSQL and its characteristics.

What Makes NoSQL Stand Out?

A few notable features of NoSQL databases make them stand out from traditional
databases.
In this topic, let us explore them to realize the real value of NoSQL.

Schema Agnostic
 A NoSQL database does not require a schema, unlike RDBMS databases.
 It provides the flexibility to store information without up-front design.
 We can store and retrieve data without knowing the internal workings of
the database.
 Schema Agnostic is considered to be the most significant difference between
NoSQL and RDBMS databases.
 The development time of the database is reduced.
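As a rough illustration of schema-agnostic storage, the sketch below keeps records with different fields in the same collection. It uses plain Python dicts rather than any real database; the find helper and the sample product fields are assumptions made for this example.

```python
# A "collection" here is just a list of dicts; no schema is declared up front.
products = []

# Two records with different sets of fields live side by side.
products.append({"sku": "B-100", "title": "Notebook", "pages": 120})
products.append({"sku": "P-200", "title": "Pen", "ink": "blue", "refillable": True})


def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion (hypothetical helper)."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


blue_items = find(products, ink="blue")
```

Adding a new field later, such as a colour for notebooks, needs no migration step in this model; older records simply lack the field.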

Fact

Not all NoSQL databases are entirely schema agnostic.
In HBase, for example, altering the column definitions requires making changes in
the database itself.
It is still considered a NoSQL database because, apart from the column families,
not all fields need to be known in advance.

Non-Relational
In RDBMS, the main goal is to normalize data (organizing tables and fields to remove
duplicates), while in NoSQL the same data is often stored multiple times.

Example
Consider an online retail store. You can store the delivery address with each order
a customer places, rather than storing it just once and referring to it
when required.
This raises a question:
It does require extra storage space. So why do it?
The two main reasons are:

1. Easy storage and retrieval
2. Query speed
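The retail example can be sketched in Python as follows. The customer, order, and address values are invented for illustration, and plain dicts stand in for database records.

```python
# Normalized (relational style): the address is stored once and referenced.
customers = {1: {"name": "Asha", "address": "12 Lake Road"}}
orders_normalized = [
    {"order_id": 101, "customer_id": 1, "item": "Kettle"},
    {"order_id": 102, "customer_id": 1, "item": "Toaster"},
]


def delivery_address_normalized(order):
    # Needs a second lookup, the equivalent of a join.
    return customers[order["customer_id"]]["address"]


# Denormalized (NoSQL style): each order carries its own copy of the address.
orders_denormalized = [
    {"order_id": 101, "item": "Kettle", "delivery_address": "12 Lake Road"},
    {"order_id": 102, "item": "Toaster", "delivery_address": "12 Lake Road"},
]


def delivery_address_denormalized(order):
    # A single read returns everything needed to ship the order.
    return order["delivery_address"]
```

The denormalized form repeats the address, which costs storage, but each order can be read and written as one unit without consulting a second record.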

Highly Distributable and Uses Commodity Hardware
 The key design solution of NoSQL databases is to distribute data across
multiple machines for a single database.
 With a huge dataset, even the largest available server cannot process
all of the data, so distributing the data across machines proves
advantageous.
 NoSQL follows a Shared Nothing architecture.

Consider all the messages and tweets on Facebook and Twitter. Though the data is
mostly about what people had for their breakfast or cute pet videos, a  distributed
mechanism to effectively manage all the data is required.

Highly Distributable and Uses Commodity Hardware
Advantages
 Commodity Servers
o Distributing the database provides an option of using cheaper servers
called Commodity Servers.
o Even for a smaller dataset, it is cheaper to buy two or three commodity
servers than purchase an expensive single high powered server to
process them.
 High Availability
o Although the data is distributed, replicating it once or twice across
machines provides high availability: if one server fails, a copy remains accessible.
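The effect of replication on availability can be sketched as below. This is a toy model under simple assumptions: every write is copied synchronously to all live replicas, and the ReplicatedStore class and its alive flags are hypothetical names for this example only.

```python
class ReplicatedStore:
    """Toy store that writes every value to all live replicas."""

    def __init__(self, replica_count: int) -> None:
        self.replicas = [dict() for _ in range(replica_count)]
        self.alive = [True] * replica_count

    def put(self, key, value) -> None:
        for index, replica in enumerate(self.replicas):
            if self.alive[index]:
                replica[key] = value

    def get(self, key):
        # Try each live replica in turn until one has the key.
        for index, replica in enumerate(self.replicas):
            if self.alive[index] and key in replica:
                return replica[key]
        raise KeyError(key)


cluster = ReplicatedStore(replica_count=3)
cluster.put("cart:42", ["book", "lamp"])

cluster.alive[0] = False                      # simulate one server failing
value_after_failure = cluster.get("cart:42")  # served by a surviving replica
```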

Did you Know?


Some open-source databases support high availability only in the supported,
paid-for version sold by the company that develops the database.

Advantages over RDBMS


Explore this video to know about the advantages of NoSQL over RDBMS.

Distinguishable Features
The above image depicts some of the popular NoSQL databases.

A few discernible features that distinguish NoSQL from other databases are:

 Scale-out
 Replication
 Flexible Data Structure

Prelude
The NoSQL database world is brimming with acronyms. Let us unveil a few to appreciate
NoSQL more.
CAP Theorem

 CAP theorem is also known as Brewer's theorem. It states that:

It is impossible for a distributed data store to simultaneously provide more
than two out of Consistency, Availability, Partition Tolerance guarantees.

Consistency

Every read receives the most recent write or an error, so clients never see
half-applied or conflicting data.

Availability
Every request receives a non-error response, even if it may not reflect the most recent write.

Partition tolerance

The system continues to operate even when network failures drop or delay messages between nodes.
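The trade-off can be illustrated with a small Python sketch. It models a single replica that is cut off from the rest of the cluster and must choose between refusing to answer and answering with possibly stale data. The Replica class and the "CP" and "AP" mode labels are assumptions of this sketch, not the behaviour of any particular product.

```python
class Replica:
    """Toy replica; the "CP"/"AP" mode labels are assumptions of this sketch."""

    def __init__(self, mode: str) -> None:
        self.mode = mode
        self.value = "v1"          # last value this replica has seen
        self.partitioned = False   # True while cut off from the other nodes

    def read(self) -> str:
        if self.partitioned and self.mode == "CP":
            # Prefer consistency: refuse rather than risk a stale answer.
            raise RuntimeError("unavailable during partition")
        # Prefer availability: answer with whatever is stored locally.
        return self.value


cp_node, ap_node = Replica("CP"), Replica("AP")
cp_node.partitioned = ap_node.partitioned = True
# Meanwhile the rest of the cluster may move on to a newer value
# that these two replicas cannot see.

ap_answer = ap_node.read()         # still answers, but with the old value
try:
    cp_node.read()
    cp_answered = True
except RuntimeError:
    cp_answered = False
```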

BASE to the rescue!

NoSQL relies on a softer model known as BASE (Basically Available, Soft State,
Eventual Consistency), in contrast to the ACID model followed by RDBMS.

BASE
 Basically Available: The system remains available for requests most of the time, even during partial failures.
 Soft State: The state of the system may change over time, even without new input, as updates propagate between nodes.
 Eventual Consistency: The system will eventually become consistent once it stops receiving inputs.
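Eventual consistency can be sketched with a primary that acknowledges writes immediately and copies them to a replica later. This is a simplified model; the Primary class, its pending queue, and the replicate method are hypothetical names used only here.

```python
from collections import deque


class Primary:
    """Toy primary that replicates asynchronously to one replica dict."""

    def __init__(self, replica: dict) -> None:
        self.data = {}
        self.replica = replica
        self.pending = deque()   # updates not yet applied to the replica

    def write(self, key, value) -> None:
        self.data[key] = value
        self.pending.append((key, value))   # replication is deferred

    def replicate(self) -> None:
        # Drain the queue in order, bringing the replica up to date.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


replica = {}
primary = Primary(replica)

primary.write("status", "shipped")
stale = replica.get("status")    # read before replication has run
primary.replicate()
fresh = replica.get("status")    # read after the replica has caught up
```

Between the write and the call to replicate, the replica is in the "soft" intermediate state: a read against it misses the update. Once no new writes arrive and the queue drains, both copies agree.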

CAP Theorem
Scrutinize your understanding of CAP theorem through this video.

BASE Properties
Know more about the BASE properties of NoSQL databases.

Eventual Consistency
According to Wikipedia:
Eventual consistency is a consistency model used in distributed computing to
achieve  high availability that informally guarantees that, if no new updates are
made to a given data item, eventually all accesses to that item will return the  last
updated value.

Eventual Consistency should not be used in places where there are:

 Frequent updates
 Frequent modifications
 Frequent deletions
 Strong consistency requirements

Eventual vs Strong
Check out this video to know the differences between Eventual and Strong consistency.

Prelude
Getting your head around NoSQL can be a bit hard. If you studied databases in school,
you might have been influenced to think relationally.
Most people think of RDBMS when they hear the word 'database'. This is natural
because RDBMS has dominated for the past 30 years.
Here are some key terms prevalent in NoSQL databases. Learn them to understand the
beauty of NoSQL.

Terms - SQL vs. NoSQL

The following table maps SQL terms to their NoSQL counterparts, using MongoDB's
terminology on the NoSQL side. Explore it!

SQL Terms | NoSQL Terms
Database | Database
Table | Collection
Row | Document or BSON document
Column | Field
Index | Index
Primary key (specify any unique column or column combination as the primary key) | Primary key (automatically set to the _id field)

Operators Mapping
The following table maps SQL clauses to the corresponding aggregation operators in
NoSQL (here, the stages of MongoDB's aggregation pipeline).

SQL Aggregation Operators | NoSQL Aggregation Operators
WHERE | $match
GROUP BY | $group
SELECT | $project
JOIN | $lookup
ORDER BY | $sort
LIMIT | $limit
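To show how the mapped operators fit together, the sketch below writes a pipeline in MongoDB's stage syntax and evaluates it with a tiny pure-Python interpreter. The interpreter (run_pipeline) is a hypothetical stand-in that supports only equality $match, $group with $sum, single-field $sort, and $limit; it is not how MongoDB executes pipelines, and the sample orders are invented.

```python
orders = [
    {"customer": "asha", "status": "paid", "amount": 30},
    {"customer": "ben", "status": "paid", "amount": 50},
    {"customer": "asha", "status": "paid", "amount": 25},
    {"customer": "ben", "status": "cancelled", "amount": 10},
]

# Roughly: SELECT customer, SUM(amount) AS total FROM orders
#          WHERE status = 'paid' GROUP BY customer
#          ORDER BY total DESC LIMIT 1
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
    {"$limit": 1},
]


def run_pipeline(docs, stages):
    """Evaluate a small subset of aggregation stages (illustrative only)."""
    for stage in stages:
        (op, spec), = stage.items()
        if op == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            key_field = spec["_id"].lstrip("$")
            groups = {}
            for d in docs:
                group = groups.setdefault(d[key_field], {"_id": d[key_field]})
                for out_field, accumulator in spec.items():
                    if out_field == "_id":
                        continue
                    source = accumulator["$sum"].lstrip("$")
                    group[out_field] = group.get(out_field, 0) + d[source]
            docs = list(groups.values())
        elif op == "$sort":
            (field, direction), = spec.items()
            docs = sorted(docs, key=lambda d: d[field], reverse=(direction == -1))
        elif op == "$limit":
            docs = docs[:spec]
        else:
            raise ValueError(f"unsupported stage: {op}")
    return docs


result = run_pipeline(orders, pipeline)
```

Each stage consumes the output of the previous one, which is why the order of stages in the list matters in the same way clause order matters in SQL evaluation.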

SQL vs. NoSQL - Concepts


SQL

SQL databases are relational: they organize structured data into tables with defined
columns.

NoSQL

NoSQL databases are non-relational and do not rely on the table model. Instead,
data can be stored in other forms, for example as a single document.

SQL vs. NoSQL


Types of Databases
SQL
Relational Database
NoSQL

 Key-value store
 Document databases
 Graph databases
 Wide column stores

Examples
SQL: MySQL, Postgres, Microsoft SQL Server, Oracle Database
NoSQL: MongoDB, Cassandra, HBase, Neo4j

SQL vs. NoSQL


Data Storage Model
SQL
Each record is stored as a row in a table, with each column storing a specific attribute
of that record, much like a spreadsheet.
NoSQL
The storage model varies for different types of databases. Consider the document
databases in which all relevant data are stored together in a single document in JSON,
XML, or any other format.

SQL vs. NoSQL


Schemas
SQL: Structure and datatypes are fixed
NoSQL: Dynamic

Scaling
SQL: Vertical Scaling
NoSQL: Horizontal Scaling

Consistency
SQL: Strong consistency only
NoSQL: Strong or eventually consistent
Prelude
NoSQL is used to describe a family of databases that are non-relational. While the
technologies, data types, and use cases vary wildly among them, it is generally agreed
that there are four types of NoSQL databases.

Let us explore them in this topic!

History
Brief History of NoSQL Databases

1. 1998- Carlo Strozzi uses the term NoSQL for his lightweight, open-source
relational database
2. 2000- Graph database Neo4j is launched
3. 2004- Google BigTable is launched
4. 2005- CouchDB is launched
5. 2007- The research paper on Amazon Dynamo is released
6. 2008- Facebook open-sources the Cassandra project
7. 2009- The term NoSQL was reintroduced

Types of NoSQL Databases


NoSQL databases are classified into four types:

1. Key-Value Pair database
2. Column-based database
3. Document-based database
4. Graph-based database

Column-based Database
 A column-based database stores data in column families.
 A column family is a group of related data that is accessed together.
 This database is used in Content management systems, blog management,
and log aggregation.
 Examples
o HBase
o Cassandra
o Hypertable
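A column-family layout can be pictured as nested maps: row key, then column family, then column. The sketch below uses plain Python dicts to show the shape of the data only; it is not the API of HBase, Cassandra, or Hypertable, and the put and get_family helpers and the sample rows are assumptions for this example.

```python
# row key -> column family -> column -> value
table = {}


def put(row_key, family, column, value):
    """Store one cell, creating the row and family on demand."""
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value


def get_family(row_key, family):
    """Fetch all columns of one family for a row in a single lookup."""
    return table.get(row_key, {}).get(family, {})


put("user:1", "profile", "name", "Asha")
put("user:1", "profile", "city", "Pune")
put("user:1", "activity", "last_login", "2024-01-05")
put("user:2", "profile", "name", "Ben")   # sparse row: no city, no activity

profile = get_family("user:1", "profile")
```

Related columns are grouped in one family and returned together, and rows are free to omit columns they do not need.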

Document-based Database
 A document-based database stores and retrieves documents.
 It is similar to a key-value store in which the value part holds a document.
 This database is used in Content management, web analytics, and real-time
analytics.
 Examples
o MongoDB
o CouchDB
o MarkLogic
o RavenDB
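The idea of a document as the value in a key-value store, but one the database can look inside, can be sketched as follows. The DocumentStore class, its dotted-path find method, and the sample blog post are hypothetical; real document databases offer far richer query languages.

```python
import json
import uuid


class DocumentStore:
    """Toy document store: _id -> JSON text of the document."""

    def __init__(self) -> None:
        self._docs = {}

    def insert(self, doc: dict) -> str:
        doc_id = doc.get("_id") or uuid.uuid4().hex
        self._docs[doc_id] = json.dumps({**doc, "_id": doc_id})
        return doc_id

    def get(self, doc_id: str) -> dict:
        return json.loads(self._docs[doc_id])

    def find(self, path: str, value) -> list:
        """Match documents on a dotted path such as 'author.name'."""
        results = []
        for raw in self._docs.values():
            doc = json.loads(raw)
            node = doc
            for part in path.split("."):
                node = node.get(part) if isinstance(node, dict) else None
            if node == value:
                results.append(doc)
        return results


blog = DocumentStore()
blog.insert({"_id": "post-1", "title": "Hello", "author": {"name": "Asha"}, "tags": ["intro"]})
blog.insert({"_id": "post-2", "title": "Scaling", "author": {"name": "Ben"}})

by_asha = blog.find("author.name", "Asha")
```

Unlike a pure key-value store, the lookup here is by a field inside the stored value rather than by the key alone.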

Graph-based Database
 A graph-based database stores entities and the relationships between them as
nodes and edges of a graph, respectively.
 Every node and edge has a unique identifier.
 This database is used for social network data, spatial data, and routing
information.
 Examples
o Neo4J
o Infinite Graph
o FlockDB
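A minimal sketch of the node-and-edge model, using plain Python dicts and a breadth-first traversal. The identifiers, the FOLLOWS label, and the within_hops helper are invented for this example and do not reflect the query language of any listed product.

```python
from collections import deque

# Every node and edge has a unique identifier.
nodes = {
    "n1": {"name": "Asha"},
    "n2": {"name": "Ben"},
    "n3": {"name": "Chen"},
    "n4": {"name": "Dana"},
}
edges = {
    "e1": ("n1", "FOLLOWS", "n2"),
    "e2": ("n2", "FOLLOWS", "n3"),
    "e3": ("n3", "FOLLOWS", "n4"),
}


def neighbours(node_id):
    """Nodes reachable from node_id over one outgoing edge."""
    return [dst for (src, _label, dst) in edges.values() if src == node_id]


def within_hops(start, max_hops):
    """Breadth-first search: nodes reachable in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        node_id, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nxt in neighbours(node_id):
            if nxt not in seen:
                seen.add(nxt)
                reached.append(nxt)
                queue.append((nxt, hops + 1))
    return reached


two_hops = within_hops("n1", 2)
```

Questions such as "who is within two hops of this person" are traversals over edges, which is the kind of query this data model is built around.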

Key-Value Pair Database


 Data is stored as key-value pairs.
 This type of database is designed to handle large volumes of data.
 Records are stored and retrieved using a key that uniquely identifies the
record.
 This database is used for storing session information, user profiles,
preferences, and shopping cart data.
 Examples
o Redis
o Amazon DynamoDB
o Oracle NoSQL Database
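The session-storage use case can be sketched with a small in-memory key-value store. The KeyValueStore class is hypothetical, and the optional expiry, passed in as explicit timestamps to keep the example deterministic, is an assumption added for illustration rather than a feature taken from the text above.

```python
class KeyValueStore:
    """Toy key-value store: key -> (value, expires_at or None)."""

    def __init__(self) -> None:
        self._data = {}

    def set(self, key, value, expires_at=None) -> None:
        self._data[key] = (value, expires_at)

    def get(self, key, now=0.0):
        value, expires_at = self._data.get(key, (None, None))
        if expires_at is not None and now >= expires_at:
            # The entry has expired: drop it and report a miss.
            del self._data[key]
            return None
        return value


sessions = KeyValueStore()
sessions.set("session:abc", {"user": "asha", "cart": ["book"]}, expires_at=100.0)
sessions.set("prefs:asha", {"theme": "dark"})

active = sessions.get("session:abc", now=50.0)    # before expiry
expired = sessions.get("session:abc", now=150.0)  # after expiry
```

All access goes through the key; the store never inspects the value, which is what keeps this model simple.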

Know More!
Check out this video to explore more on the types of NoSQL databases!
Summary
In today's world, the word Data by itself carries a great deal of power, yet storing it
and accessing it when required is arduous.
Recent trends in IT flood us with unstructured data, sparse data,
dynamically changing relationships, and globally distributed data, all of which add
to the burden.
NoSQL helps address these challenges. But, hold on! You are still in the ocean.
Explore more to reach the NoSQL shore.
