
Course code : CSE3009

Course title : No SQL Data Bases


Module :1
Topic :1

Introduction to NoSQL Concepts

Dr. Karthika Natarajan 5/23/2022 1


Objectives

This session covers:

• Database revolutions
• First-generation databases
• Second-generation databases
• Third-generation databases
• What is NoSQL?
• Comparison between SQL and NoSQL



History of Database

• Databases are a foundational element of the modern world. We interact with them even without knowing it: any time we buy something online, log in to a service, access our bank accounts, and so on.

• The concept of a database existed long before computers. In those times, data was stored in journals, in libraries, and in hundreds of filing cabinets. Everything was recorded on paper, which meant it took up space, was hard to find, and was difficult to back up.

• Then computers became available, and with them came the opportunity for better data management.



What is a Database?

A database is a collection of data, typically describing the activities of one or more related entities and their attributes.

A database is a collection of information that is organized so that it can be easily accessed, managed and updated.

A database management system, or DBMS, is software designed to assist in maintaining and utilizing large collections of data; the need for such systems, as well as their use, is growing rapidly.



Evolution of Database



First Database Revolution
• The emergence of electronic computers following the Second World War
represented the first revolution in databases.

• Early “databases” used paper tape initially and eventually magnetic tape to
store data sequentially.

• 1955: the spinning magnetic disk. Data on a magnetic disk can easily be modified or deleted, and the disk allows random access to data, i.e., to individual records.

• 1961: ISAM (Indexed Sequential Access Method) made fast record-oriented access feasible and consequently led to OLTP (On-line Transaction Processing) computer systems.
ISAM
• ISAM is an advanced sequential file organization method in which the records are sorted by primary key.
• For each primary key, an index value is generated and mapped to the record. This index is nothing but the address of the record in the file.
• If a record must be retrieved by its index value, the address of its data block is fetched, and the record is retrieved from storage.
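The index lookup described above can be sketched in Python (a simplified illustration with hypothetical records, not a real ISAM implementation):

```python
# ISAM-style sketch: records sorted by primary key, plus an index that maps
# each key to the "address" (here, a list position) of its record.
records = sorted(
    [(103, "JAMES"), (101, "ANU"), (102, "JANE")],
    key=lambda rec: rec[0],            # sort by primary key
)

# Build the index: primary key -> address of the record in the file.
index = {key: addr for addr, (key, _name) in enumerate(records)}

def fetch(key):
    """Fetch a record via the index instead of scanning sequentially."""
    addr = index[key]                  # look up the data-block address
    return records[addr]               # retrieve the record at that address
```

Because the index is keyed and the file is sorted, both exact lookups and range scans stay cheap.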



ISAM-Pros and Cons

Pros of ISAM:
•Since each record's index holds the address of its data block, searching for a record even in a huge database is quick and easy.
•This method supports range retrieval and partial retrieval of records. Since the index is based on primary key values, we can retrieve the data for a given range of values. In the same way, a partial value can easily be searched, e.g., student names starting with 'JA'.

Cons of ISAM:
•This method requires extra disk space to store the index values.
•When new records are inserted, these files must be reconstructed to maintain the sequence.
•When a record is deleted, the space it used must be released; otherwise, the performance of the database will degrade.



First Database Revolution

By the early 1970s, two major models of DBMS were competing for
dominance.
• The network model was formalized by the CODASYL (Conference on Data Systems Languages) standard and implemented in databases such as IDMS (Integrated Database Management System).
• The hierarchical model provided a somewhat simpler approach, found in IBM’s IMS (Information Management System).



A hierarchical database model is a data model in which the data are organized into a tree-like structure. The
data are stored as records which are connected to one another through links.
In order to retrieve data from a hierarchical database, the whole tree needs to be traversed starting from the
root node.



Hierarchical model for electronics gadgets



Network Model

• It allows a record to have more than one parent and child record.

• This model is capable of handling multiple types of relationships, which can help in modeling real-life applications, for example 1:1, 1:M, and M:N relationships.



Hierarchical vs Network model
Second Database Revolution

In the late 1960s, Codd, who was working at an IBM laboratory, identified the following drawbacks in first-generation DBMSs:
• Existing databases were too hard to use.
• Existing databases lacked a theoretical foundation.
• Existing databases mixed logical and physical implementations.

To overcome these, he published the core ideas that defined the relational database model, which became the most significant model for database systems for a generation.
New in Second generation
Key concepts of the relational model include:
1. Attribute: Each column in a table. Attributes are the properties which define a relation, e.g., Student_Rollno, NAME, etc.
2. Table: In the relational model, relations are saved in table format, stored along with their entities. A table has two components, rows and columns. Rows represent records and columns represent attributes.
3. Tuple: A single row of a table, which contains a single record.
4. Degree: The total number of attributes in the relation is called the degree of the relation.
5. Cardinality: The total number of rows present in the table.
6. Column: The column represents the set of values for a specific attribute.
7. Relation instance: A finite set of tuples in the RDBMS system. Relation instances never have duplicate tuples.
8. Relation key: Every row has one or more attributes that can uniquely identify it; these are called the relation key.
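These terms can be illustrated with Python's built-in sqlite3 module and a hypothetical Student relation (the table and data are made up for illustration):

```python
import sqlite3

# Hypothetical Student relation to illustrate the terms above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (Student_Rollno INTEGER PRIMARY KEY, NAME TEXT)")
cur.executemany("INSERT INTO Student VALUES (?, ?)",
                [(1, "JANE"), (2, "JAMES"), (3, "ANU")])

# Degree = number of attributes (columns) in the relation.
degree = len(cur.execute("SELECT * FROM Student LIMIT 1").description)

# Cardinality = number of tuples (rows) in the relation instance.
cardinality = cur.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
```

Here Student_Rollno acts as the relation key, each inserted row is one tuple, and the relation instance is the current set of three tuples.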
Key concepts in relational model



Relational Model: Advantages & Disadvantages
•Simplicity: A relational data model in a DBMS is simpler than the hierarchical and network models.
•Structural independence: The relational database is concerned only with data and not with its physical structure. This can improve the performance of the model.
•Easy to use: The relational model in a DBMS is easy, as tables consisting of rows and columns are quite natural and simple to understand.
•Query capability: It makes it possible for a high-level query language like SQL to avoid complex database navigation.
•Scalable: Regarding the number of records (rows) and the number of fields, a database can be enlarged to enhance its usability.

•Disadvantages:
•Some relational databases have limits on field lengths which can't be exceeded.



Benefits in Second generation

Database normalization is a process in which we modify a complex database into a simpler one.
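As a minimal sketch (with hypothetical data), normalization splits a flat table that repeats customer details into two simpler relations linked by a key:

```python
# Unnormalized: one flat table, customer details repeated on every order.
orders_flat = [
    {"order": 1, "cust": "Anu", "city": "Chennai"},
    {"order": 2, "cust": "Anu", "city": "Chennai"},   # redundant customer data
]

# Normalized: customer facts stored once, orders reference them by key.
customers = {"Anu": {"city": "Chennai"}}
orders = [
    {"order": 1, "cust": "Anu"},
    {"order": 2, "cust": "Anu"},
]
```

The normalized form removes the redundancy, so a change to a customer's city is made in exactly one place.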

New in Second generation
Other important Key concepts of the relational model include:
• Constraints
• Operations
• Normal forms

Popular relational database management systems:

• DB2 and Informix Dynamic Server - IBM
• Oracle and RDB - Oracle
• SQL Server and Access - Microsoft



Transaction Models

Jim Gray defined the most widely accepted transaction model in the late 1970s. It soon became popularized as the ACID transaction model:
• Atomic: The transaction is indivisible - either all the statements in the transaction are applied to the database or none are.
• Consistent: The database remains in a consistent state before and after transaction execution.
• Isolated: While multiple transactions can be executed by one or more users simultaneously, one transaction should not see the effects of other in-progress transactions.
• Durable: Once a transaction is saved to the database, its changes are expected to persist even if the operating system or hardware fails.
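Atomicity can be demonstrated with Python's sqlite3 module: if a simulated failure interrupts a transfer, the whole transaction rolls back (an illustrative sketch with made-up accounts, not production code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 500), ('B', 200)")
conn.commit()

# A transfer is atomic: either both the debit and the credit apply, or neither.
try:
    with conn:                                    # opens a transaction
        conn.execute("UPDATE account SET balance = balance - 100 "
                     "WHERE name = 'A'")          # debit applied...
        raise RuntimeError("crash mid-transfer")  # ...then a simulated failure
except RuntimeError:
    pass  # the 'with' block rolled the transaction back on the exception

balance_a = conn.execute(
    "SELECT balance FROM account WHERE name = 'A'").fetchone()[0]
```

After the simulated crash, account A still holds its original balance: the partial debit was undone rather than left half-applied.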



Atomicity



Consistent

If the value read by B and C is $300, the data is inconsistent, because it does not reflect the result of the debit operation once it executes.



Isolation

Account A is making transactions T1 and T2 to accounts B and C, but both execute independently without affecting each other. This is known as isolation.



2000s: NoSQL

• In 1998, the term NoSQL (not only structured query language) was coined.
• It refers to databases that use query languages other than SQL to store and retrieve data.
• NoSQL databases are useful for unstructured data.
• NoSQL allows faster processing of larger, more varied datasets.
• NoSQL databases are more flexible than traditional relational databases.



Third Database Revolution

By 2005, Google was by far the biggest website in the world. When Google began, the relational database was already well established, but it was inadequate to deal with the volume and velocity of the data confronting Google.

• In 2003, Google revealed details of its distributed file system, GFS (Google File System).
• In 2004, it revealed details of the distributed parallel processing algorithm “MapReduce”.
• In 2006, Google revealed details about its BigTable distributed structured database.
• In 2007, the Hadoop project was developed.



Drawbacks in Second Database Revolution

• Even the most expensive commercial RDBMSs such as Oracle could not provide sufficient scalability to meet the demands of large web sites.

• To overcome this major issue, distributed databases were introduced.

• “Sharding” involves partitioning the data across multiple databases based on a key attribute, such as the customer identifier.

• Sharding at sites like Facebook has allowed a MySQL-based system to scale up to massive levels, but the downsides of doing this are immense: many relational operations and database-level ACID transactions are lost.
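A minimal sketch of hash-based sharding on a key attribute (illustrative; the shard names are hypothetical):

```python
# Route each row to one of several databases based on a key attribute,
# here the customer identifier.
from hashlib import sha1

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical database names

def shard_for(customer_id):
    """Pick a shard deterministically from the customer identifier."""
    digest = sha1(str(customer_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

Every node can compute the same routing without coordination, but a query or transaction that spans customers now touches several databases, which is exactly where the relational guarantees start to erode.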



Cloud Computing
• Between 2006 and 2008, Amazon rolled out Elastic Compute Cloud (EC2).

• EC2 made virtual machine images hosted on Amazon’s hardware infrastructure available and accessible via the Internet.

• Amazon added other services such as storage (S3, EBS), Virtual Private Cloud (VPC), a MapReduce service (EMR), and so on.

• The entire platform was known as Amazon Web Services (AWS) and was the first practical implementation of an Infrastructure as a Service (IaaS) cloud.

• AWS became the inspiration for cloud computing offerings from Google, Microsoft, and others.



Document Databases

• The impedance mismatch between the object-oriented and relational models led to object-relational mapping (ORM) systems.
• This was enabled by the programming style known as AJAX (Asynchronous JavaScript and XML), in which JavaScript within the browser communicates directly with a backend by transferring XML messages.
• XML was soon superseded by JavaScript Object Notation (JSON), a self-describing format similar to XML but more compact and tightly integrated into the JavaScript language.
• Databases that support JSON can create and access documents directly, eliminating the relational middleman. These later became known as “document databases”.
• Couchbase and MongoDB are two popular JSON-oriented databases.
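A sketch of what such a self-describing JSON document might look like, using Python's json module (the order structure is hypothetical):

```python
import json

# A hypothetical order stored as a single self-describing JSON document,
# nesting data that a relational design would spread across several tables.
order = {
    "order_id": 1001,
    "customer": {"name": "Anu", "city": "Chennai"},
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}

text = json.dumps(order)     # what the client sends / the database stores
restored = json.loads(text)  # read back with no object-relational mapping layer
```

The whole order travels and is stored as one document, which is what lets a JSON-native database skip the mapping layer entirely.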



NewSQL

In 2007, Michael Stonebraker and his team proposed a number of variants on the existing RDBMS design:
• H-Store described a pure in-memory distributed database.
• C-Store specified a design for a columnar database.

Both designs were extremely influential in the years to come and are the first examples of what came to be known as NewSQL database systems.

NewSQL databases retain key characteristics of the RDBMS but diverge from the common architecture exhibited by traditional systems such as Oracle and SQL Server.



The Nonrelational Explosion

In the end, dozens of new database systems such as MongoDB, Cassandra, and HBase emerged due to the drawbacks of relational databases.

These new breeds of database systems initially lacked a common name; one candidate was “Distributed Non-Relational Database Management System” (DNRDBMS).

However, in late 2009, the term NoSQL quickly caught on as shorthand for any database system that broke with the traditional SQL database.



The Database technologies



What is NoSQL?

• A NoSQL database, also called “Not Only SQL”, is an approach to data management and database design that is useful for very large sets of distributed data.

• NoSQL is not a relational database.

• A relational database model may not be the best solution for all situations.

• The easiest way to understand NoSQL is as a database that does not adhere to the traditional relational database management system (RDBMS) structure.



What is NoSQL?

• The most popular NoSQL database is Apache Cassandra.

• Cassandra, which was once Facebook’s proprietary database, was released as open source in 2008.

• Other NoSQL implementations include SimpleDB, Google BigTable, Apache Hadoop, MapReduce, MemcacheDB, and Voldemort.

• Companies that use NoSQL include Netflix, LinkedIn and Twitter.



Why should we use NoSQL?
There are several reasons why people consider using a NoSQL database.

• Application development productivity.


• Large data.
• Analytics.
• Scalability.
• Massive write performance.
• Fast key-value access.
• Flexible data model and flexible datatypes.
• Schema migration.
• Write availability.
• Easier maintainability, administration and operations.
• Generally available parallel computing.
• Programmer ease of use.
• Distributed systems and cloud computing support.



SQL vs NoSQL

SQL: Relational databases (RDBMS)
NoSQL: Non-relational or distributed databases

SQL: Table-based
NoSQL: Document-based, key-value pairs, graph databases, or wide-column stores

SQL: Have a predefined schema
NoSQL: Have a dynamic schema for unstructured data

SQL: Vertically scalable
NoSQL: Horizontally scalable

SQL: Scalability is managed by increasing the CPU, RAM, SSD, etc.
NoSQL: Scalability is managed by easily adding a few more servers to your NoSQL database

SQL: Uses SQL (Structured Query Language)
NoSQL: Uses UnQL (Unstructured Query Language); the syntax of UnQL varies from database to database
SQL vs NoSQL

SQL: MySQL, Oracle, SQLite, Postgres, and MS SQL
NoSQL: MongoDB, BigTable, Redis, RavenDB, Cassandra, HBase, Neo4j, and CouchDB

SQL: Good fit for complex queries
NoSQL: Not a good fit for complex queries (NoSQL databases don’t have standard interfaces)

SQL: Not the best fit for hierarchical data storage
NoSQL: Fits better for hierarchical data storage

SQL: Best fit for heavy-duty transactional applications
NoSQL: Not fit for heavy transactional applications



SQL vs NoSQL

SQL: Excellent support is available for all SQL databases
NoSQL: Still have to rely on community support

SQL: Emphasizes ACID properties (Atomicity, Consistency, Isolation and Durability)
NoSQL: Follows Brewer's CAP theorem (Consistency, Availability and Partition tolerance)

SQL: Classified as either open-source or closed-source
NoSQL: Classified by the way of storing data: graph databases, key-value store databases, document store databases, column store databases and XML databases



Summary

This session covered:
• Database revolutions
• First generation Database
• Second generation Database
• Third generation Database
• What is NoSQL?
• Comparison between SQL and NoSQL



Course code : CSE3009
Course title : No SQL Data Bases
Module :1
Topic :2

Managing Transactions and Data Integrity

5/23/2022 Dr. N. Karthika 1


Objectives

This session covers:
• Understanding essentials of ACID transactions
• Applying transactional guarantees in distributed systems
• Understanding Brewer’s CAP Theorem
• Exploring transactional support in NoSQL products



RDBMS and ACID

ACID is the key standard and feature of an RDBMS:

• Atomicity – The entire transaction takes place at once or does not happen at all.
• Consistency – The database must be consistent before and after the transaction.
• Isolation – Multiple transactions occur independently, without interference.
• Durability – Once a transaction is saved to the database, its changes are expected to persist even if the operating system or hardware fails.
Local vs. Distributed Transactions

• Operations that are part of a transaction can all execute in a single participating resource or span multiple participating resources. Hence, transactions can be local or distributed.
• In local transactions, operations execute in the same resource, while in distributed transactions, operations are spread across multiple resources.



ACID in Distributed Systems
In distributed systems, the ACID principles are achieved using a transaction manager (responsible for managing coordinators for many transactions) or a coordinator (responsible for governing the outcome of the transaction) to manage transactions.

A coordinator communicates with enrolled participants to inform them of the desired termination requirements, i.e., whether they should accept (e.g., confirm) or reject (e.g., cancel) the work done within the scope of the given transaction. For example, whether to purchase the (provisionally reserved) flight tickets for the user or to release them.

An application/client may wish to terminate a transaction in a number of different ways (e.g., confirm or cancel). The coordinator will attempt to terminate in a manner consistent with that desired by the client, but it is ultimately the interactions between the coordinator and the participants that determine the actual final outcome.

The initiator of the transaction (e.g., the client) communicates with a transaction manager and asks it to start a new transaction and associate a coordinator with the transaction.
ACID in Distributed Systems

The ACID principles are applied using the concepts laid down by the XA (eXtended Architecture) specification, released in 1991 by the X/Open consortium. The goal of this specification is to provide atomicity in global transactions involving heterogeneous components.

It specifies the need for a transaction manager or coordinator to manage transactions.

Even with a central coordinator, implementing isolation across multiple databases is extremely difficult, because different databases provide isolation guarantees differently.



Techniques to implement ACID in Distributed Systems
Two-phase locking (2PL) is a style of locking in distributed transactions where locks
are only acquired (and not released) in the first phase and locks are only released
(and not acquired) in the second phase.

• Phase 1: Growing
➢ Each txn requests the locks that it needs from the DBMS’s lock manager.
➢ The lock manager grants/denies lock requests.

• Phase 2: Shrinking
➢ The txn is allowed to only release locks that it previously acquired. It cannot
acquire new locks.
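The two phases can be sketched as a toy lock-manager rule in Python (illustrative only; a real DBMS lock manager also distinguishes shared and exclusive locks and handles waiting and deadlocks):

```python
# Toy 2PL rule: a txn acquires all its locks in the growing phase; after its
# first release (the shrinking phase begins), it may not acquire any more.

class TwoPhaseLockingTxn:
    def __init__(self):
        self.locks = set()
        self.shrinking = False  # becomes True after the first release

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("cannot acquire locks in the shrinking phase")
        self.locks.add(item)

    def release(self, item):
        self.shrinking = True   # the growing phase is over
        self.locks.discard(item)
```

The rule alone is what guarantees conflict-serializable schedules; everything else in a real lock manager is about performance and liveness.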



SS2PL (Strong Strict Two-Phase Locking) is a special case of a technique called commitment ordering (the principle of guaranteeing serializability in a heterogeneous environment of multiple autonomous resource managers using atomic commitment).
• The txn is not allowed to acquire or upgrade locks after the growing phase finishes.
• A schedule is strict, i.e., the locking protocol releases both write (exclusive) and read (shared) locks applied by a transaction only after the transaction has ended, i.e., only after it has both completed executing (being ready) and become either committed or aborted.

Two-phase commit (2PC) is a technique where the transaction coordinator verifies with all involved transactional objects in the first phase and actually sends a commit request to all in the second.
Two-phase commit (2PC)

• A two-phase commit protocol is required to guarantee that all of the action participants either commit or abort any changes made.

• During phase 1, the action coordinator, C, attempts to communicate with all of the action participants, A and B, to determine whether they will commit or abort.

• If the action will commit, the coordinator records this decision on stable storage, and the protocol enters phase 2.

• An abort reply from any participant causes the entire action to abort.



• When each participant receives the coordinator’s phase 1 message, it records sufficient information on stable storage to either commit or abort the changes made during the action.

• Each participant that returned a commit response must remain blocked until it has received the coordinator’s phase 2 message. Until it receives this message, its resources are unavailable for use by other actions.

• If the coordinator fails before delivering this message, these resources remain blocked. However, if crashed machines eventually recover, crash-recovery mechanisms can be employed to unblock the protocol and terminate the action.
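The protocol's core decision rule can be sketched as follows (a toy model: participants are plain dicts of callbacks, with no stable storage, timeouts, or failure handling):

```python
# Toy two-phase commit: the coordinator commits only if every participant
# votes to commit in phase 1; a single abort vote aborts the whole action.

def two_phase_commit(participants):
    votes = [p["prepare"]() for p in participants]  # phase 1: collect votes
    decision = "commit" if all(votes) else "abort"
    for p in participants:                          # phase 2: tell everyone
        p[decision]()
    return decision
```

The blocking problem described above lives between the two phases: a participant that has voted "commit" cannot act until the coordinator's phase 2 message arrives.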



Factors to consider

Assuring ACID-like guarantees in distributed systems requires understanding how the following three factors are impacted in such systems:
• Consistency
• Availability
• Partition tolerance

Consistency, Availability, and Partition tolerance (CAP) are the three pillars of Brewer’s theorem, which underlies much of the recent generation of thinking about transactional integrity in large, scalable distributed systems.



Brewer’s Theorem
• Consistency, in a distributed environment, means that all client programs reading data from the cluster see the same data at any given point in time. Two clients fetching data from two nodes should never see different data.

• Availability means you should be able to retrieve the data you stored in the distributed system, no matter what happens inside the cluster. If you make a request, you must get a response from the system, even if a node (or many nodes) in the cluster goes down.

• Partition tolerance means that the cluster (as a whole) continues to function even if there is a “partition” (communications break) between two nodes (both nodes are up, but can’t communicate).



•The CAP theorem states that a distributed system can only meet 2 of the 3 properties. So there can only be CA, AP, or CP systems. We can’t guarantee the third property while the other two are already guaranteed. Consequently, no distributed system provides all three of C, A, and P.



CASE STUDY-Example 1
You are asked to design a distributed cluster of 4 data nodes. The replication factor is 2, i.e., any data written to the cluster must be written on 2 nodes, so when one goes down, the second can serve the data. Now try to apply the CAP theorem to this requirement.

In a distributed system, two things may happen at any time: node failure (hard disk crash) or network failure (the connection between two nodes goes down).

CP [Consistency/Partition Tolerance] Systems

• In a distributed system, at the time of reading the data, consistency is determined by a voting kind of mechanism, in which all nodes that have a copy of the data mutually agree that they have the “same copy” of the requested data.
• Now let’s assume that our requested data is present on two nodes, N1 and N2. A client tries to read the data; our CP system is partition tolerant as well, so an expected network failure occurs and N2 is detected as DOWN. Now the system cannot determine whether N1’s data copy is the latest; it may be stale. So the system decides to send an ERROR event to the client.
• Here the system chose to prefer data consistency over data availability.
• Similarly, at the time of writing the data, if the replication factor is 2, the system may reject write requests until it finds two healthy nodes to write the data fully, in a consistent manner.
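The CP behaviour described above can be sketched as a toy read function (illustrative; replicas are modelled as a dict of node name to value, with None standing for a DOWN node):

```python
# CP read: succeed only when every replica holding the data is reachable
# and all copies agree; otherwise raise an ERROR rather than risk staleness.

def cp_read(replicas):
    """replicas: mapping of node name -> value, or None if the node is DOWN."""
    values = list(replicas.values())
    if any(v is None for v in values):
        raise RuntimeError("ERROR: replica unreachable, cannot verify consistency")
    if len(set(values)) != 1:
        raise RuntimeError("ERROR: replicas disagree")
    return values[0]
```

The function trades availability for consistency: one unreachable replica is enough to turn a read into an error.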



continued

AP [Availability/Partition Tolerance] Systems

• What if, in the above scenario, instead of sending an ERROR (when N2 is down), the system sends the data received from N1? The client got the data, but was it the latest copy stored in the system? You cannot decide. You chose availability over consistency. These are AP systems.
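The AP alternative can be sketched the same way (illustrative; the same dict-of-replicas model, with None for a DOWN node): answer from whichever copy is reachable, even though it may be stale.

```python
# AP read: return any reachable copy; the caller accepts it may be stale.

def ap_read(replicas):
    """replicas: mapping of node name -> value, or None if the node is DOWN."""
    for node, value in replicas.items():
        if value is not None:
            return value  # available, but possibly not the latest copy
    raise RuntimeError("no replica reachable")
```

Compared with the CP sketch, the only failure mode left is every replica being down; consistency is what was given up.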
CA [Consistency/Availability] Systems

• In a distributed environment, we cannot avoid the “P” of CAP. So we have to choose between CP and AP systems. If we desire a consistent and available system, we must forget about partition tolerance, which is possible only in non-distributed systems such as Oracle and SQL Server.

In today’s world, we can achieve all 3 in a distributed system (if not fully, then at least partially), e.g., tunable consistency in Cassandra.



Example-CASE STUDY

Let’s start with a single node. Suppose we work in a call center that provides a logging service.
We receive calls from customers who need to register some activities (write operations).
They can also call back and ask us to recall the things they told us before (read operations).
We decide to write all the information in a single notebook.

Everything works fine, but the number of customers starts growing every day. We have to put the calls in a long queue, and we lose the customers who can’t wait for hours until we answer their call.

We decide to invite a friend to join our logging service. He will also answer the phone, and he writes and reads requests from his own notebook. Notice that now we have two different notebooks. Everything goes much better for a while, until a problem arises.



AP System
We received a call from a customer who asked us to write down his wife’s birthday. After some months he
called back because he knew his wife’s birthday was coming up, but he couldn’t remember the date. This time
our friend answered the call. He didn’t have the information about the birthday in his notebook because we
didn’t share it. The customer got mad:

In the image above, the phone represents the customer. LS1 and LS2 are the two nodes of the logging service: us and our friend.

The situation above is called inconsistency.

For now, our system is an AP system, because it is available and partition tolerant.



CP System

Now we’ve decided to deal with the inconsistency. After each call from a customer to either us or our friend,
we write everything in our notebooks and then call each other to synchronize the information. So now the
customers will always get the true information from the service.
There also might be a situation when we get sick and can’t work for a day or more. In such cases, our friend
first checks if he can call us to synchronize our notebooks. If we are unavailable, then he emails us.
When we get back to work, we’ll first check the email to fill the missing data in our notebook:

As a result, the customers will have to wait for an answer from the service. We’ll have to spend time calling each other or writing down the information from the email to refresh our notebooks. The service becomes a bit unavailable. This is now a CP system.



CA System

• Now our friend decides to quit, but the logging service continues to work.
• We have just one phone and one notebook, just like at the very beginning. As we may notice, the system is
still available. We’ll continue answering the calls and logging them into our notebook:

Thus, our system is consistent. The consistency is reached by having a single source, one notebook. We don’t need
to synchronize with our friend’s notebook anymore. However, the system is not partitioned because we are the
single node in our logging service. This is a CA system. If we take on another friend for the job due to a large
number of customers, then we’ll have to choose between either availability or consistency again.



Conclusion
• We’ll always have to choose between system properties.
• No perfect systems exist.
• We should always build a system depending on the requirements, types, and frequency of operations in it.
• The idea is to find a balance of C, A, and P.



Upholding CAP - Conclusion

The choices could be as follows:

• Option 1 - Availability is compromised, but consistency and partition tolerance are preferred over it.

• Option 2 - The system has little or no partition tolerance; consistency and availability are preferred.

• Option 3 - Consistency is compromised, but the system is always available and can work even when parts of it are partitioned.



Conclusion

Advocates of strong consistency have declined to consider NoSQL databases seriously because of their relaxed consistency configurations.

Though consistency is an important requirement in many transactional systems, the strong-or-nothing approach has created a lot of fear, uncertainty, and doubt among users.

Eventual consistency has its place and should be used where it safely provides high availability under partition conditions.



Summary

This session covered:
• Understanding essentials of ACID transactions
• Applying transactional guarantee in distributed systems
• Understanding Brewer’s CAP Theorem
• Exploring transactional support in NoSQL products

