NoSQL Gnosis. - Resp
Prelude
From your smartphone to your tablet, laptop, or PC, data is pervasive
and plentiful, and the things it can do for you are amazing. In today's data-intensive
world, much enterprise focus settles on analytics; in other words, the central problem
becomes what to do with all the data you have collected.
It is a significant problem to solve, but we will never get there if we do not have an
efficient, long-term data storage solution to provide a stable foundation. After all, you
cannot analyze data if you have nowhere to put it.
Why NoSQL?
RDBMS
Relational Database Management Systems (RDBMS) are a powerful technology
for storing structured data in web and business applications and are considered to
be the forerunner of NoSQL.
Since the publication of Codd's paper "A Relational Model of Data for Large
Shared Data Banks" in 1970, these data stores have been widely adopted and are
often thought of as the only option for storing data that must be accessed in a
consistent way.
Drawbacks
Why did NoSQL emerge when we already had RDBMS?
The following limitations of RDBMS answer the question:
Scalability
Cost
Scaling
The simplest way to provide a rich query model is to keep the whole dataset on a single machine.
As data grows, however, attempts to scale servers vertically (a bigger machine) or
horizontally (multi-node database solutions) have often ended in vain.
Some approaches to horizontal scaling are:
o Master-Slave
o Sharding
Master-Slave: Critical reads may be incorrect, as writes may not yet have propagated down to the slaves.
Sharding: Partitioning is not transparent; the application must know which partition holds the data.
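To make the two drawbacks concrete, here is a minimal in-memory sketch, assuming asynchronous replication for master-slave and simple hash-based partitioning for sharding. The names MasterSlave and shard_for are hypothetical; no real database works exactly like this.

```python
import hashlib

# Illustrative sketch: MasterSlave and shard_for are hypothetical names, not a real library API.


class MasterSlave:
    """Toy master-slave pair with asynchronous replication (assumed for illustration)."""

    def __init__(self):
        self.master = {}
        self.slave = {}
        self.pending = []  # writes accepted by the master but not yet copied to the slave

    def write(self, key, value):
        # Writes always go to the master and are queued for later replication.
        self.master[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Propagate queued writes down to the slave, in order.
        for key, value in self.pending:
            self.slave[key] = value
        self.pending.clear()

    def read_from_slave(self, key):
        # Reads served by the slave can miss writes that have not propagated yet.
        return self.slave.get(key)


def shard_for(key, num_shards):
    """The application itself must work out which partition holds a key."""
    # md5 gives the same result in every process, unlike Python's randomized hash() for str.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


db = MasterSlave()
db.write("balance:42", 100)
print(db.read_from_slave("balance:42"))  # None: the write has not propagated yet
db.replicate()
print(db.read_from_slave("balance:42"))  # 100

shards = [{} for _ in range(3)]
shards[shard_for("user:7", 3)]["user:7"] = {"name": "Asha"}
```

The stale read is the master-slave problem; the fact that every caller must call shard_for before touching a shard is the sharding problem.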
Cost
What is NoSQL?
Definition
NoSQL stands for Not Only SQL. It provides a way to store and retrieve data that is
modeled in forms other than the tabular relations used in relational databases.
Schema Agnostic
NoSQL databases do not require a schema the way RDBMS databases do.
They provide the flexibility to store information without up-front design.
Data can be stored and retrieved without knowledge of how the database works
internally.
Being schema-agnostic is considered the most significant difference between
NoSQL and RDBMS databases.
It reduces database development time.
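A minimal sketch of what schema-agnostic storage means in practice, assuming a collection can be modeled as a plain Python list of dictionaries. The documents below share no fixed set of fields, yet all live in the same collection; the find helper is hypothetical, not any database's query API.

```python
import json

# Illustrative sketch: "products" stands in for a collection; find() is a hypothetical helper.
products = []

# No up-front schema: each document carries whatever fields it needs.
products.append({"name": "Pen", "price": 1.5})
products.append({"name": "Laptop", "price": 900, "specs": {"ram_gb": 16, "cpu": "8-core"}})
products.append({"name": "E-book", "price": 4.0, "format": "PDF"})


def find(collection, **criteria):
    """Return documents that have every given field with the given value."""
    return [
        doc for doc in collection
        if all(field in doc and doc[field] == value for field, value in criteria.items())
    ]


print(json.dumps(find(products, name="Laptop")))
print(find(products, format="PDF")[0]["name"])  # E-book
```

Adding a new field later (say, a discount on one product) needs no migration; only the documents that use it carry it.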
Non-Relational
In an RDBMS, the main goal is to normalize data (organize tables and fields to remove
duplication), while in NoSQL, the same data is often stored multiple times.
Example
Consider an online retail store. You can store the delivery address in every order
a customer places, rather than storing it just once and referring to it when
required.
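The retail example can be sketched with plain Python dictionaries, assuming one customer and two orders. The first layout is normalized, as an RDBMS would store it; the second embeds a copy of the delivery address in every order. All names and values are hypothetical.

```python
# Illustrative sketch: field names and data are hypothetical.

# Normalized (RDBMS-style): the address lives in one place; orders refer to it by id.
customers = {"c1": {"name": "Ravi", "address": "12 MG Road, Pune"}}
orders_normalized = [
    {"order_id": 1, "customer_id": "c1", "item": "Book"},
    {"order_id": 2, "customer_id": "c1", "item": "Lamp"},
]


def address_of_normalized(order):
    # Needs a second lookup, which is a join in SQL terms.
    return customers[order["customer_id"]]["address"]


# Denormalized (NoSQL-style): each order carries its own copy of the delivery address.
orders_denormalized = [
    {"order_id": 1, "item": "Book", "ship_to": {"name": "Ravi", "address": "12 MG Road, Pune"}},
    {"order_id": 2, "item": "Lamp", "ship_to": {"name": "Ravi", "address": "12 MG Road, Pune"}},
]


def address_of_denormalized(order):
    # A single read of one document is enough.
    return order["ship_to"]["address"]
```

The denormalized read needs no second lookup, at the cost of storing the address twice. A side effect is that each order keeps the address it was actually shipped to, even if the customer's address changes later.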
Here the question arises: duplicating data requires extra storage space, so why do it?
The two main reasons are:
Consider all the messages and tweets on Facebook and Twitter. Even though the data is
mostly about what people had for breakfast or cute pet videos, a distributed
mechanism is required to manage all of it effectively.
Distinguishable Features
The above image depicts some of the popular NoSQL databases.
A few features that set NoSQL apart from other databases are:
Scale-out
Replication
Flexible Data Structure
Prelude
The NoSQL database world is brimming with acronyms. Let us unveil a few to appreciate
NoSQL better.
CAP Theorem
Consistency
Every read receives the most recent write, so all nodes see the same data at the same
time. There must be no half-completed transactions.
Availability
Every request receives a response; resources must always be available.
Partition tolerance
The system continues to operate even when the network drops or delays messages between nodes.
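The CAP theorem states that when a network partition occurs, a distributed system has to give up either consistency or availability. The toy two-node store below is a sketch under simplifying assumptions (writes always enter at node A, reads are served by node B); TwoNodeStore and its mode flag are hypothetical names used only to show both choices.

```python
# Illustrative sketch: TwoNodeStore is hypothetical; "CP" and "AP" name the two CAP trade-offs.


class TwoNodeStore:
    """Two replicas; 'mode' picks which property to keep while the network is partitioned."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = {}  # node A: accepts writes
        self.b = {}  # node B: serves reads
        self.partitioned = False

    def write(self, key, value):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse writes that cannot reach every replica.
            raise RuntimeError("unavailable: cannot replicate during partition")
        self.a[key] = value
        if not self.partitioned:
            self.b[key] = value  # replication succeeds only while the link is up

    def read_from_b(self, key):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than risk a stale answer.
            raise RuntimeError("unavailable: node B may be stale")
        # Availability first: answer with whatever node B has, possibly stale.
        return self.b.get(key)


ap = TwoNodeStore("AP")
ap.write("x", 1)
ap.partitioned = True       # the link between node A and node B fails
ap.write("x", 2)            # accepted by node A only
print(ap.read_from_b("x"))  # 1: available, but stale

cp = TwoNodeStore("CP")
cp.write("x", 1)
cp.partitioned = True
try:
    cp.read_from_b("x")
except RuntimeError as err:
    print(err)              # unavailable: node B may be stale
```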
BASE
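Basically Available: The system guarantees availability of data at any time.
Soft state: The state of the system may change over time, even without new input, as replicas synchronize.
Eventual consistency: The system becomes consistent over time, provided no new updates arrive.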
Eventual Consistency
According to Wikipedia:
Eventual consistency is a consistency model used in distributed computing to
achieve high availability that informally guarantees that, if no new updates are
made to a given data item, eventually all accesses to that item will return the last
updated value.
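The quoted definition can be illustrated with a small simulation, assuming last-write-wins conflict resolution with unique timestamps and a simple gossip round (anti_entropy) that copies every entry to every node. These are common techniques but only one possible design; Node and anti_entropy are hypothetical names. Once writes stop and a gossip round completes, every replica returns the last updated value.

```python
# Illustrative sketch: Node and anti_entropy are hypothetical; timestamps are assumed unique.


class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (timestamp, value)

    def merge(self, key, ts, value):
        # Last-write-wins: keep whichever version carries the larger timestamp.
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(nodes):
    """One gossip round: every node shares all of its entries with every other node."""
    for src in nodes:
        for key, (ts, value) in list(src.store.items()):
            for dst in nodes:
                if dst is not src:
                    dst.merge(key, ts, value)


n1, n2, n3 = Node("n1"), Node("n2"), Node("n3")
n1.merge("status", 1, "draft")      # client write accepted by n1
n2.merge("status", 2, "published")  # later client write accepted by n2
print(n3.read("status"))            # None: n3 has not heard of either write yet
anti_entropy([n1, n2, n3])
print([n.read("status") for n in (n1, n2, n3)])  # ['published', 'published', 'published']
```

Under strong consistency, by contrast, a write would not be acknowledged until every later read is guaranteed to see it, so the None read from n3 could not happen.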
Frequent updates
Frequent modifications
Frequent deletions
Consistency requirements
Prelude
Getting your head around NoSQL can be a bit hard. If you studied databases in school,
you might have been influenced to think relationally.
Most people think of RDBMS when they hear the word 'database'. This is natural,
because RDBMS has dominated for the past 30 years.
Here are some key terms prevalent in NoSQL databases. Learn them to understand the
beauty of NoSQL.
RDBMS | NoSQL (MongoDB)
Database | Database
Table | Collection
Column | Field
Index | Index
Primary key: any unique column or column combination can be specified as the primary key. | Primary key: in MongoDB, the primary key is automatically set to the _id field.
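The automatic _id behaviour can be sketched as follows, assuming a collection is a dictionary keyed by primary key. MongoDB assigns a 12-byte ObjectId when _id is missing; this sketch substitutes uuid4 to stay within the standard library, and the Collection class is hypothetical even though its insert_one name echoes MongoDB drivers.

```python
import uuid

# Illustrative sketch: Collection is hypothetical; uuid4 stands in for MongoDB's ObjectId.


class Collection:
    """Toy collection whose primary key is always the '_id' field."""

    def __init__(self):
        self._docs = {}  # _id -> document

    def insert_one(self, doc):
        doc = dict(doc)  # copy, so the caller's dictionary is left untouched
        doc.setdefault("_id", uuid.uuid4().hex)  # auto-assign a key only when none is given
        if doc["_id"] in self._docs:
            raise KeyError(f"duplicate _id: {doc['_id']!r}")
        self._docs[doc["_id"]] = doc
        return doc["_id"]

    def find_by_id(self, _id):
        return self._docs.get(_id)


users = Collection()
generated = users.insert_one({"name": "Asha"})        # _id generated automatically
chosen = users.insert_one({"_id": 7, "name": "Ben"})  # caller supplies its own _id
print(users.find_by_id(chosen))  # {'_id': 7, 'name': 'Ben'}
```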
Operators Mapping
The following table maps SQL clauses to the corresponding MongoDB aggregation pipeline stages.
SQL Clause | MongoDB Aggregation Stage
WHERE | $match
GROUP BY | $group
SELECT | $project
SELECT ... INTO new_table | $out
JOIN | $lookup
ORDER BY | $sort
LIMIT | $limit
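To show how the stages line up with SQL, here is a toy Python interpreter for a small subset of the aggregation pipeline: equality-only $match, $group with $sum over a field, $sort, and $limit. The pipeline list uses MongoDB's stage syntax, but run_pipeline is a hypothetical helper, not the real aggregation engine, and the order data is made up. The pipeline roughly corresponds to SELECT city, SUM(amount) AS total FROM orders WHERE status = 'shipped' GROUP BY city ORDER BY total DESC LIMIT 2.

```python
# Illustrative sketch: run_pipeline is hypothetical and supports only a tiny subset of stages.

orders = [
    {"city": "Pune", "status": "shipped", "amount": 30},
    {"city": "Delhi", "status": "shipped", "amount": 50},
    {"city": "Pune", "status": "pending", "amount": 99},
    {"city": "Pune", "status": "shipped", "amount": 40},
    {"city": "Chennai", "status": "shipped", "amount": 10},
]

pipeline = [
    {"$match": {"status": "shipped"}},                           # WHERE
    {"$group": {"_id": "$city", "total": {"$sum": "$amount"}}},  # GROUP BY + SUM
    {"$sort": {"total": -1}},                                    # ORDER BY ... DESC
    {"$limit": 2},                                               # LIMIT
]


def run_pipeline(docs, stages):
    for stage in stages:
        (op, spec), = stage.items()  # each stage is a one-key dictionary
        if op == "$match":
            # Equality matches only.
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            key_field = spec["_id"].lstrip("$")
            groups = {}
            for d in docs:
                out = groups.setdefault(d[key_field], {"_id": d[key_field]})
                for name, acc in spec.items():
                    if name != "_id":
                        # Only {"$sum": "$field"} accumulators are handled.
                        out[name] = out.get(name, 0) + d[acc["$sum"].lstrip("$")]
            docs = list(groups.values())
        elif op == "$sort":
            # Sort by the last key first so earlier keys take priority (stable sort).
            for field, direction in reversed(list(spec.items())):
                docs = sorted(docs, key=lambda d: d[field], reverse=direction == -1)
        elif op == "$limit":
            docs = docs[:spec]
        else:
            raise ValueError(f"unsupported stage: {op}")
    return docs


print(run_pipeline(orders, pipeline))
# [{'_id': 'Pune', 'total': 70}, {'_id': 'Delhi', 'total': 50}]
```

Each stage consumes the previous stage's output, which is why the order of stages in the list matters just as the logical order of SQL clauses does.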
NoSQL
NoSQL databases are non-relational and do not use the table model. Instead, data can be
stored in other forms, for example as a single document. The main types are:
Key-value store
Document databases
Graph databases
Wide column stores
Examples
SQL: MySQL, Postgres, Microsoft SQL Server, Oracle Database
NoSQL: MongoDB, Cassandra, HBase, Neo4j
Scaling
SQL: Vertical Scaling
NoSQL: Horizontal Scaling
Consistency
SQL: Strong consistency only
NoSQL: Strong or eventual consistency
Prelude
NoSQL is used to describe a family of databases that are non-relational. While the
technologies, data types, and use cases vary wildly among them, it is generally agreed
that there are four types of NoSQL databases.
History
Brief History of NoSQL Databases
1. 1998- Carlo Strozzi uses the term NoSQL for his lightweight, open-source
relational database
2. 2000- Graph database Neo4j is launched
3. 2004- Google BigTable is launched
4. 2005- CouchDB is launched
5. 2007- The research paper on Amazon Dynamo is released
6. 2008- Facebook open-sources the Cassandra project
7. 2009- The term NoSQL is reintroduced
Column-based Database
A column-based database stores data in column families.
A column family is a group of related data that is accessed together.
This type of database is used for content management systems, blog management,
and log aggregation.
Examples
o HBase
o Cassandra
o Hypertable
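A wide-column layout can be approximated with nested dictionaries, assuming the simplified shape row key, then column family, then column. This sketch omits much of what real stores such as HBase add (cell timestamps, sorted on-disk storage, distribution across nodes); put and get_family are hypothetical helpers.

```python
from collections import defaultdict

# Illustrative sketch: row key -> column family -> column -> value.
# put() and get_family() are hypothetical helpers, not the HBase or Cassandra API.
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    table[row_key][family][column] = value


def get_family(row_key, family):
    # Columns of one family are read together; .get avoids creating empty rows on a miss.
    return dict(table.get(row_key, {}).get(family, {}))


# A blog-management example: post text and view statistics live in separate families.
put("post:1", "content", "title", "Hello NoSQL")
put("post:1", "content", "body", "Column families group related data.")
put("post:1", "stats", "views", 120)
put("post:2", "content", "title", "Second post")  # rows need not share the same columns

print(get_family("post:1", "content"))
```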
Document-based Database
The database stores and retrieves documents (for example, JSON or XML).
It stores each document in the value part of a key-value pair.
This type of database is used for content management, web analytics, and real-time
analytics.
Examples
o MongoDB
o CouchDB
o MarkLogic
o RavenDB
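The idea that a document sits in the value part of a key-value pair can be sketched with JSON strings as values, assuming an in-memory dictionary as the store. What distinguishes this from a plain key-value store is that the database can look inside the value, as find_where does; all helper names and data here are hypothetical.

```python
import json

# Illustrative sketch: an in-memory store whose values are JSON documents.
store = {}  # key -> JSON text


def put_document(key, doc):
    store[key] = json.dumps(doc)


def get_document(key):
    return json.loads(store[key])


def find_where(field, value):
    """Return keys of documents whose field equals value (a query on the value's contents)."""
    matches = []
    for key, text in store.items():
        doc = json.loads(text)
        if field in doc and doc[field] == value:
            matches.append(key)
    return matches


put_document("article:1", {"title": "Intro to NoSQL", "author": "Meera", "tags": ["nosql"]})
put_document("article:2", {"title": "CAP in practice", "author": "Meera", "draft": True})
put_document("article:3", {"title": "Graphs", "author": "Arun"})

print(find_where("author", "Meera"))  # ['article:1', 'article:2']
```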
Graph-based Database
This database stores entities and the relationships between them as the
nodes and edges of a graph, respectively.
Every node and edge has a unique identifier.
This type of database is used for social network data, spatial data, and routing
information.
Examples
o Neo4J
o Infinite Graph
o FlockDB
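Because routing information is one of the listed uses, here is a minimal sketch of a graph held as nodes and edges with unique identifiers, plus a breadth-first search for the shortest path between two nodes. The dictionaries and function names are hypothetical; graph databases such as Neo4j expose their own query languages (Cypher, in Neo4j's case) rather than this API.

```python
from collections import deque

# Illustrative sketch: node ids map to entities; edge ids map to (from, to, relationship).
nodes = {1: "Asha", 2: "Ben", 3: "Chen", 4: "Dev"}
edges = {
    "e1": (1, 2, "FRIEND"),
    "e2": (2, 3, "FRIEND"),
    "e3": (3, 4, "FRIEND"),
}


def neighbours(node_id):
    # Edges are treated as undirected for this example.
    for src, dst, _rel in edges.values():
        if src == node_id:
            yield dst
        elif dst == node_id:
            yield src


def shortest_path(start, goal):
    """Breadth-first search; returns the list of node ids on a shortest path, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:  # walk back along the recorded predecessors
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nxt in neighbours(current):
            if nxt not in previous:
                previous[nxt] = current
                queue.append(nxt)
    return None


print(shortest_path(1, 4))  # [1, 2, 3, 4]
```

Breadth-first search suits unweighted relationships; weighted routing (distances, travel times) would call for something like Dijkstra's algorithm instead.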
Summary
In today's world, data by itself carries a great deal of power. Storing it and
accessing it when required, however, is arduous.
Recent trends in IT flood us with unstructured data, sparse data, dynamically changing
relationships, and globally distributed data, which adds more weight to the crown.
NoSQL rescues us from all these menaces. But hold on! You are still in the ocean.
Explore more to reach the NoSQL shore.