Chapter 24: NoSQL DBs


CH24:

NOSQL Databases
and Big Data Storage Systems
This Lecture
1- Introduction to NOSQL Systems
2- The CAP Theorem

3- Document-Based NOSQL Systems and MongoDB
4- NOSQL Key-Value Stores
5- Column-Based or Wide Column NOSQL Systems
6- NOSQL Graph Databases and Neo4j
Introduction to NOSQL Systems
A NoSQL database provides a mechanism for storage and retrieval of data
that is modeled in means other than the tabular relations used in
relational databases.
1.1 Emergence of NOSQL Systems
§ Understand what NoSQL is and what it is not.
§ Explore the relationships between NoSQL and RDBMS.
1.2 Characteristics of NOSQL Systems
1.3 Categories of NOSQL Systems
§ Understand the types of NoSQL Databases.
Storage and Retrieval of Data
§ Flat file systems (for small data, e.g., Notepad text, Excel spreadsheet); no need for a database
§ RDBMS – Structured data
§ OLAP – Cubes
§ NoSQL – Collections
In-Class Exercise (1 minute)
Team exercises: Have fun!!!

•Break into teams of 2


•Complete the following questions
Q- What is the crucial difference between a flat file and a
Relational Database?
A- A flat file keeps its records in a single table structure, which is
simple, portable, and flexible. A relational database organizes data
into multiple tables whose records reference one another, and it adds
services such as concurrency control.

Introduction to NOSQL Systems


Example Database Systems

Neural networks
Data mining
SQL vs. NoSQL:
What is the difference?

§ What kind of data do we store?

§ How many machines do we use?

§ How do we prevent inconsistent data?


SQL vs. NoSQL:
Typical SQL Database
SQL vs. NoSQL:
Typical NoSQL Database
§ NoSQL databases are distributed, non-relational databases designed
for large-scale data storage and massively parallel data processing
across a large number of commodity servers.

§ They use non-SQL languages and mechanisms to interact with data.

§ NoSQL database systems arose alongside major Internet companies,
such as Google, Amazon, and Facebook, which had challenges in
dealing with huge quantities of data.

§ These systems are designed to scale to thousands or millions of users
doing updates as well as reads, in contrast to traditional DBMSs and
data warehouses.
Introduction to NOSQL Systems:
RDBMS vs. NoSQL

RDBMS                          NOSQL (Not Only SQL)
Vertical scaling (scale up)    Horizontal scaling (scale out)
Structured data                Semi-structured data
Relational schema              Schema-free
Relational                     Object-oriented
Consistent                     Eventually consistent
Stable                         Scalable
ORACLE                         Key-value: Riak
SQL SERVER                     Column-store: Cassandra
DB2                            Document: MongoDB
MySQL                          Graph: Neo4j
Why NoSQL?
§ Relational DBMSs have been a successful technology for many
years, providing persistence, concurrency control and integration
mechanisms.

§ The need to process large amounts of data has shifted the direction
from scaling vertically to scaling horizontally on clusters.

§ NoSQL databases focus on analytical processing of large-scale
datasets, offering increased scalability over commodity hardware.

§ Organizations that collect large amounts of unstructured data are
increasingly turning to non-relational databases (NoSQL databases).
Five core features of NoSQL Database
1. Not based on relational database model

2. Supports distributed database architectures

3. Provides high scalability, availability, and fault tolerance

4. Supports large amounts of sparse data

5. Performance is valued over consistency (eventual consistency)


1.1 Emergence of NOSQL
Systems
SQL systems may not be appropriate for some applications, such as
email:
◦ SQL systems offer too many services (powerful query language,
concurrency control, etc.), which such an application may not
need;
◦ a structured data model such as the traditional relational model
may be too restrictive.
◦ NoSQL typically does not use SQL:
◦ SQL = Structured Query Language
◦ Structure exists but is more flexible.
◦ SQL systems require schemas, which are not required by many of
the NOSQL systems.
1.1 Emergence of NOSQL
Systems
Examples of NOSQL systems:
◦ Google – BigTable (Column-store)
◦ Facebook – Cassandra (Column-store)
◦ Amazon – DynamoDB (Key-Value store)
◦ MongoDB (Document-store)
◦ CouchDB (Document-store)
◦ Graph databases like Neo4j and GraphBase
1.2 Characteristics of NOSQL
Systems
1.2.1. NOSQL characteristics related to distributed databases and
distributed systems.

1.2.2. NOSQL characteristics related to data models and query


languages.
1.2.1 NoSQL characteristics related to
distributed databases and distributed
systems
1- Scalability:
◦ Horizontal scalability: adding more nodes for data storage and
processing as the volume of data grows.
◦ Vertical scalability: expanding the storage and computing
power of existing nodes.
◦ In NOSQL systems, horizontal scalability is employed while the
system is operational, so techniques for distributing the
existing data among new nodes without interrupting system
operation are necessary.
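
One well-known technique for redistributing data when a node joins is consistent hashing. The slides do not name a specific technique, so the sketch below is only an illustration under that assumption; the `HashRing` class, the node names, and the key names are all hypothetical. It shows that adding a node moves only the keys that the new node takes over, while every other key stays where it was.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring (hypothetical helper, not a real library API)."""

    def __init__(self, nodes, vnodes=50):
        self.vnodes = vnodes      # virtual ring points per physical node
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Place several virtual points for the node around the ring.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key):
        # The key belongs to the first ring point clockwise from its hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")   # scale out; existing nodes are left untouched
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved; all of them went to node-d:",
      all(after[k] == "node-d" for k in moved))
```

With plain `hash(key) % number_of_nodes` placement, by contrast, changing the number of nodes would relocate most keys, which is why a ring-style scheme is a common choice for systems that grow while running.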
1.2.1 NoSQL characteristics related to
distributed databases and distributed
systems
2- Availability, Replication and Eventual Consistency:
Many applications that use NOSQL systems require
continuous system availability; therefore:
◦ Data is replicated over two or more nodes in a transparent
manner.
◦ An update must be applied to every copy of the replicated data
items.
◦ Eventual consistency is a consistency model used in
distributed computing to achieve high availability. It
informally guarantees that, if no new updates are made to a
given data item, eventually all accesses to that item will return
the last updated value. In other words, an update will eventually
be reflected in all nodes that store the data, resulting in the
same response every time the data is queried.
1.2.1 NoSQL characteristics related to
distributed databases and distributed
systems
3- Replication Models:
3.1 Master-slave replication: requires one copy to be the master
copy.
◦ All write operations must be applied to the master copy and are then
propagated to the slave copies, usually using eventual consistency.
◦ For reads, either all reads go to the master copy, or reads are
allowed at the slave copies, in which case there is no guarantee
that the values returned are the latest writes.
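
The read trade-off in master-slave replication can be seen in a small in-memory sketch. This is a simplified illustration, not how any particular NOSQL product implements replication: the `MasterSlaveStore` class, its method names, and the explicit `propagate()` step (standing in for asynchronous replication) are assumptions made for the example.

```python
class MasterSlaveStore:
    """Toy master-slave store with lazy (eventually consistent) propagation."""

    def __init__(self, n_slaves=2):
        self.master = {}
        self.slaves = [{} for _ in range(n_slaves)]
        self._pending = []   # writes applied at the master but not yet at the slaves

    def write(self, key, value):
        # Every write goes to the master copy first.
        self.master[key] = value
        self._pending.append((key, value))

    def read_master(self, key):
        return self.master.get(key)

    def read_slave(self, key, slave=0):
        # May return a stale value if propagation has not happened yet.
        return self.slaves[slave].get(key)

    def propagate(self):
        # Stand-in for background replication: apply pending writes in order.
        for key, value in self._pending:
            for replica in self.slaves:
                replica[key] = value
        self._pending.clear()


store = MasterSlaveStore()
store.write("cart:42", ["book"])
store.propagate()

store.write("cart:42", ["book", "pen"])
stale = store.read_slave("cart:42")     # slave has not seen the second write
fresh = store.read_master("cart:42")    # master always has the latest write

store.propagate()                       # no new updates arrive; replicas catch up
converged = store.read_slave("cart:42")
print(stale, fresh, converged)
```

Reading from a slave spreads the read load but can return `stale`; once propagation completes and no new updates arrive, every copy returns the same value, which is the eventual consistency guarantee described earlier.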
1.2.1 NoSQL characteristics related to
distributed databases and distributed
systems
3- Replication Models:
3.2 Master-master replication: allows reads and writes at any of the
replicas.
◦ The values of an item will be temporarily inconsistent.
◦ A reconciliation method to resolve conflicting write operations on the
same data item at different nodes must be implemented as part of
the master-master replication scheme.
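
One simple reconciliation method is "last write wins", where each write carries a version stamp and the highest stamp is kept when replicas exchange data. The slides do not say which method a given system uses, so this is only a sketch of one possible policy; the `Replica` class and the `(counter, node_id)` version stamp are assumptions for illustration.

```python
class Replica:
    """Toy master-master replica using last-write-wins reconciliation."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.clock = 0       # local logical counter
        self.data = {}       # key -> (version, value), version = (counter, node_id)

    def write(self, key, value):
        # Any replica accepts writes; each write gets a new version stamp.
        self.clock += 1
        self.data[key] = ((self.clock, self.node_id), value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def merge_from(self, other):
        # Reconciliation: for each key keep the entry with the larger version.
        for key, (version, value) in other.data.items():
            if key not in self.data or version > self.data[key][0]:
                self.data[key] = (version, value)
        self.clock = max(self.clock, other.clock)


a, b = Replica("A"), Replica("B")
a.write("profile:7", "Alice")     # version (1, "A")
b.write("profile:7", "Alicia")    # version (1, "B")
b.write("profile:7", "Ali")       # version (2, "B")

temporarily_inconsistent = a.read("profile:7") != b.read("profile:7")

a.merge_from(b)
b.merge_from(a)
print(temporarily_inconsistent, a.read("profile:7"), b.read("profile:7"))
```

Last write wins is easy to implement but silently discards the losing write ("Alice" here), so it is a design choice rather than a universal answer; other schemes keep both versions and let the application resolve the conflict.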
1.2.1 NoSQL characteristics related to
distributed databases and distributed
systems
4- Sharding of Files:
◦ Files can have many millions of records accessed concurrently by
thousands of users.
◦ Sharding (also known as horizontal partitioning) serves to
distribute the load of accessing the file records to multiple nodes.
◦ Sharding works in tandem with replication of the shards to improve
load balancing as well as data availability.
1.2.1 NoSQL characteristics related to
distributed databases and distributed
systems
5- High-Performance Data Access:
Two techniques to find individual records or objects (data items):
1. Hashing: a hash function h(K) is applied to the key K, and the location of
the object with key K is given by the value of h(K).
2. Range partitioning: the location is determined via a range of key values.
Example: location i would hold the objects whose key values K are in the
range Ki_min ≤ K ≤ Ki_max.
In applications that require range queries, where multiple objects within a
range of key values are retrieved, range partitioning is preferred.
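
The difference between the two placement techniques can be sketched as follows. The node count, the key ranges, and the function names are hypothetical values chosen for the example; a real system would pick its own hash function and range boundaries.

```python
import bisect
import hashlib

NUM_NODES = 4


def hash_location(key: int) -> int:
    """Hashing: the node is given by h(K), here a SHA-256 digest modulo NUM_NODES."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES


# Range partitioning: node i holds keys with Ki_min <= K <= Ki_max.
# Only the upper bounds are stored: node 0 holds 0..249, node 1 holds 250..499, etc.
UPPER_BOUNDS = [249, 499, 749, 999]


def range_location(key: int) -> int:
    if not 0 <= key <= UPPER_BOUNDS[-1]:
        raise KeyError(key)
    # Index of the first range whose upper bound is >= key.
    return bisect.bisect_left(UPPER_BOUNDS, key)


def nodes_for_range(lo: int, hi: int, locate) -> list:
    """Nodes that must be contacted to answer the range query lo <= K <= hi."""
    return sorted({locate(k) for k in range(lo, hi + 1)})


range_nodes = nodes_for_range(300, 320, range_location)
hash_nodes = nodes_for_range(300, 320, hash_location)
print("range partitioning contacts nodes:", range_nodes)
print("hashing contacts nodes:", hash_nodes)
```

With range partitioning, the query for keys 300 to 320 is answered by a single node, because neighbouring keys are stored together. Hashing scatters neighbouring keys, so the same query typically has to contact several nodes, which is why range partitioning is preferred for range queries while hashing suits single-key lookups.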
In-Class Exercise
Team exercises: Have fun!!!

•Break into teams of 2


•Complete the following questions

Q- What is the difference between using Consistency in
RDBMS versus using Consistency in NoSQL?
1.3 Categories of NOSQL
Systems
The most common categories:

1. Document-based NOSQL systems:
◦ Store data in the form of documents using well-known formats such as JSON.
◦ Documents are accessible via their document id, but can also be accessed rapidly using
other indexes.

2. NOSQL key-value stores:
◦ Fast access by the key to the value associated with the key.
◦ The value can be a record, an object, a document, or an even more complex data
structure.

3. Column-based or wide column NOSQL systems:
◦ Partition a table by column into column families.
◦ A form of vertical partitioning.

4. Graph-based NOSQL systems:
◦ Data is represented as graphs; the relationships between nodes matter more than the
values stored in the nodes.
◦ Related nodes can be found by traversing the edges using path expressions.
The CAP Theorem
The CAP theorem: it is impossible to guarantee consistency, availability,
and partition tolerance at the same time in a distributed system with data
replication.
§ Consistency: every read gets the most recent write (all
nodes see the same data at the same time). Typical relational
databases are consistent: SQL Server, MySQL, and PostgreSQL.
§ Availability: every node (if not failed) always executes queries.
Typical relational databases are also available: SQL Server, MySQL,
and PostgreSQL. This means that relational databases exist in the
CA space (consistency and availability).
§ Partition tolerance: even if the connections between nodes are
down, the other two promises (A and C) are kept.
So only two of the three properties can be guaranteed at the same time.
The CAP Theorem
• CAP is a "pick two" theorem: a distributed system cannot
guarantee C, A, and P simultaneously. Instead,
trade-offs must be made at a point in time to achieve
the level of performance and availability required
for a specific task.
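
The trade-off can be made concrete with a toy two-replica cluster that is cut in half by a network partition. This is a deliberately simplified model, not a description of any real system: the `TwoNodeCluster` class, the `"CP"` and `"AP"` mode labels, and the `Unavailable` exception are assumptions made for illustration.

```python
class Unavailable(Exception):
    """Raised when the cluster refuses a request to protect consistency."""


class TwoNodeCluster:
    """Toy cluster with two replicas and a switchable network partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.partitioned = False
        self.replicas = [{}, {}]

    def write(self, node, key, value):
        if not self.partitioned:
            # Healthy network: the write reaches both replicas.
            for replica in self.replicas:
                replica[key] = value
            return
        if self.mode == "CP":
            # Keep consistency by giving up availability during the partition.
            raise Unavailable("cannot reach the other replica")
        # AP: stay available, accept the write locally, let replicas diverge.
        self.replicas[node][key] = value

    def read(self, node, key):
        return self.replicas[node].get(key)


def run(mode):
    cluster = TwoNodeCluster(mode)
    cluster.write(0, "x", 1)
    cluster.partitioned = True          # the link between the nodes goes down
    try:
        cluster.write(0, "x", 2)
        accepted = True
    except Unavailable:
        accepted = False
    consistent = cluster.read(0, "x") == cluster.read(1, "x")
    return accepted, consistent


cp_result = run("CP")
ap_result = run("AP")
print("CP (write accepted, replicas consistent):", cp_result)
print("AP (write accepted, replicas consistent):", ap_result)
```

During the partition the CP configuration rejects the write and stays consistent, while the AP configuration accepts it and the replicas temporarily disagree. An AP system would then rely on eventual consistency to reconcile the replicas once the partition heals.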
NoSQL Databases types
In-Class Exercise (1 minute)
Team exercises: Have fun!!!

•Break into teams of 2


•Complete the following questions

Q- MongoDB is a _________ database that provides high
performance, high availability, and easy scalability.
a) graph
b) key value
c) document
d) All of the mentioned
A- c) document
Documents (objects) map nicely to programming language
data types.
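
A short sketch of what "map nicely" means in practice: a document is a nested structure of fields, lists, and sub-documents, which corresponds directly to a Python dict. The project document below and its field names (`Pname`, `Workers`, and so on) are made-up sample data, and the example uses only the standard `json` module rather than a database driver.

```python
import json

# A document with an embedded list of sub-documents; in a relational design
# this would be split across a project table and a works-on table.
project = {
    "_id": "P1",
    "Pname": "ProductX",
    "Plocation": "Bellaire",
    "Workers": [
        {"Ename": "John Smith", "Hours": 32.5},
        {"Ename": "Joyce English", "Hours": 20.0},
    ],
}

# Serialize to JSON text (the form in which a document would be exchanged)
# and read it back into native Python types.
text = json.dumps(project)
restored = json.loads(text)

# The restored document is an ordinary dict, so application code can
# navigate it without joins or an object-relational mapping layer.
total_hours = sum(worker["Hours"] for worker in restored["Workers"])
print(restored["Pname"], total_hours)
```

Because documents are self-describing, a second project document could carry extra or missing fields without any schema change, which is the schema-free property listed earlier.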
In-Class Exercise (1 minute)
Team exercises: Have fun!!!

•Break into teams of 2


•Complete the following questions

Q- When should a key-value data model be used?

A- When attributes are numerous but actual data values are
rare (sparse data)
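
The sparse-data argument can be illustrated with a tiny in-memory key-value store. The `KeyValueStore` class, the composite `(entity, attribute)` key, and the product attributes are hypothetical and only serve to show the idea; real key-value stores have their own APIs.

```python
class KeyValueStore:
    """Minimal in-memory key-value store with put/get access by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def __len__(self):
        return len(self._data)


# Many possible attributes, but each product has values for only a few.
ATTRIBUTES = ["color", "weight_kg", "voltage", "screen_size", "isbn"]

store = KeyValueStore()
# Composite key: (entity id, attribute name). Only existing values are stored.
store.put(("product:1", "color"), "red")
store.put(("product:1", "weight_kg"), 1.2)
store.put(("product:2", "voltage"), 220)

cells_in_fixed_table = 2 * len(ATTRIBUTES)   # 2 rows x 5 columns, mostly NULL
stored_pairs = len(store)

print(store.get(("product:1", "color")))
print(store.get(("product:2", "color")))     # absent value costs no storage
print(cells_in_fixed_table, stored_pairs)
```

A fixed relational table would reserve a column for every attribute and fill most cells with NULL, whereas the key-value layout stores only the pairs that actually exist.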

NOSQL Graph-based stores
• Graph databases replace relational tables with
structured relational graphs of interconnected
key-value pairings.

• Graph databases are useful when you are more
interested in the relationships between data than
in the data itself; they work well for
social networks.

• They are optimized for traversing relationships, not for
general-purpose querying.

• Examples: Neo4j, InfoGrid, Sones GraphDB,
AllegroGraph, InfiniteGraph
NOSQL Graph-based stores
§ Used for data whose relations are represented well in a graph. Data
is stored in graph structures with nodes (entities), properties
(information about the entities), and edges (connections between the
entities).
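
The node, property, and edge structure can be sketched with plain Python dictionaries. This is not Neo4j's API or query language; the sample people, the "follows" edges, and the `within_hops` helper are assumptions used to show why relationship traversal is the natural operation on this model.

```python
from collections import deque

# Nodes (entities) with their properties.
nodes = {
    "alice": {"age": 30},
    "bob": {"age": 25},
    "carol": {"age": 41},
    "dave": {"age": 35},
    "erin": {"age": 29},
}

# Directed edges (connections): who follows whom.
edges = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
}


def within_hops(start, max_hops):
    """Breadth-first traversal: nodes reachable from start in at most max_hops edges."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in edges.get(current, []):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    del distance[start]
    return distance


reachable = within_hops("alice", 2)
print(reachable)
print([name for name in reachable if nodes[name]["age"] > 30])
```

The traversal follows edges directly from node to node. In a relational schema the same "friends of friends" question would need one self-join per hop, which is the case graph stores are designed to avoid.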
In-Class Exercise (1 minute)
Team exercises: Have fun!!!

•Break into teams of 2


•Complete the following questions

Q- Discuss with your group some of the characteristics of
NoSQL databases.
A-
1. "Schema-less"
2. Do not enforce relationships among entities
3. Often difficult to create indexes on the data
In-Class Exercise (1 minute)
J
Team exercises: Have fun!!!

•Break into teams of 2


•Complete the following questions

Q- ________ stores are used to store information about
networks, such as social connections.
a) Key-value
b) Wide-column
c) Document
d) Graph
A- d) Graph stores
