
Big Data Analytics(BDA)

GTU #3170722

Unit-2

NoSQL
 Outline
• What is NoSQL?
• Uses of NoSQL
• Types of NoSQL DB
• Key-value Oriented
• Graph Oriented
• Column Oriented
• Document Oriented
• Why NoSQL
• Advantages and Features of NoSQL
• Use of NoSQL in Industry
• SQL vs NoSQL
• NewSQL
What is NoSQL?
 NoSQL (commonly expanded as "Not Only SQL") represents a different database
framework that can achieve high-performance, agile processing of large-scale information.
 In other words, it is a database infrastructure, very suitable for the huge needs of big data.
 NoSQL gains this efficiency because, unlike highly structured relational
databases, NoSQL databases do not enforce a fixed structure and typically trade strict
consistency requirements for speed and agility.
 NoSQL focuses on the concept of distributed databases, where unstructured data can be stored
on multiple processing nodes, and usually on multiple servers.
 This distributed architecture allows NoSQL databases to scale horizontally; as the data
continues to grow, just add more hardware to keep up without reducing performance.
 Distributed NoSQL database infrastructure has been a solution for handling some of the
largest data warehouses on the planet, such as those of Google, Amazon, and the Central Intelligence
Agency.
Where is NoSQL used?
 NoSQL databases are widely used in big data and other real-time web applications.
 NoSQL databases are used to store log data, which can then be pulled for analysis. Likewise, they are
used to store social media data and other data that cannot be stored and analyzed
comfortably in an RDBMS.
 Where to use NoSQL?
 Log analysis
 Social networking feeds
 Time-based data
 NoSQL
 Non-relational data storage systems
 No fixed table schema
 No joins
 No multi-document transactions
 Relaxes one or more ACID properties
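As a rough illustration of the log analysis and time-based data uses listed above, the sketch below keeps log records as schema-less dictionaries and runs two simple analyses over them. The records, field names and timestamps are hypothetical sample data, and a plain Python list stands in for the database.

```python
from collections import Counter

# Hypothetical log records: fields differ from record to record,
# which a fixed table schema would not allow without extra columns.
log_records = [
    {"ts": "2024-01-01T10:00:00", "level": "INFO", "msg": "login", "user": "rahul"},
    {"ts": "2024-01-01T10:00:05", "level": "ERROR", "msg": "timeout", "service": "cart"},
    {"ts": "2024-01-01T10:01:00", "level": "INFO", "msg": "logout", "user": "rahul"},
]

# Analysis 1: count records per log level.
level_counts = Counter(record["level"] for record in log_records)

# Analysis 2: time-based range query. ISO-8601 timestamps in the same
# format sort correctly as plain strings, so string comparison suffices.
start, end = "2024-01-01T10:00:05", "2024-01-01T10:02:00"
in_window = [r for r in log_records if start <= r["ts"] < end]

print(level_counts)
print([r["msg"] for r in in_window])
```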


Types of NoSQL
 A traditional RDBMS uses SQL syntax to store and retrieve data from SQL databases.
 NoSQL databases all use a data model whose structure differs from the traditional row-and-column
table model used with relational database management systems (RDBMSs).
 Instead, a NoSQL database system encompasses a wide range of database technologies that
can store structured, semi-structured, unstructured and polymorphic data.
1. Key-Value Pair Oriented
2. Document Oriented
3. Column Oriented
4. Graph Oriented
Key-Value Pair Oriented
 Key-value Stores are the simplest type of NoSQL database.
 Data is stored in key/value pairs.
 It uses keys and values to store the data. The attribute name is stored in ‘key’, whereas the
values corresponding to that key will be held in ‘value’.
 In key-value store databases, the key is usually a string, whereas the value can be a string,
JSON, XML, a BLOB (Binary Large Object), etc. Because the model is so simple, these stores can handle
massive data volumes and loads.
 Typical use cases for key-value stores are user preferences, user profiles, shopping carts,
etc. For example:
Key | Value
First Name | Rahul
Last Name | Patel

 DynamoDB, Riak, Redis are a few famous examples of Key-value store NoSQL databases.
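The key-value model described above can be sketched in a few lines. This is a toy in-memory stand-in, not the API of DynamoDB, Riak or Redis; the class name `KeyValueStore` and its methods are hypothetical.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store: string keys map to opaque values."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # The store never inspects the value; it only files it under the key.
        self._data[key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("First Name", "Rahul")
store.put("Last Name", "Patel")
store.put("cart:42", {"items": ["book", "pen"]})  # the value can be any blob

print(store.get("First Name"))
```

Every operation goes through the key, which is what keeps lookups simple and fast. The price is that there is no way to query by the contents of a value.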
Document Oriented
 Document databases use key-value pairs to store and retrieve documents.
 A document is stored in a format such as XML or JSON (JavaScript Object Notation).

 Data is stored as a value. Its associated key is the unique identifier for that value.
 The difference is that, in a document database, the value contains structured or semi-structured
data.
 This structured/semi-structured value is referred to as a document and can be in XML, JSON or
BSON (Binary JSON) format.
 Examples of Document databases are – MongoDB, OrientDB, Apache CouchDB, IBM Cloudant,
CrateDB, BaseX, and many more.
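To make the difference from a plain key-value store concrete, the sketch below stores JSON-like documents under unique identifiers and queries them by a field inside the value. The sample documents and the helper `find_by_field` are hypothetical. A real document database would use indexes rather than a full scan.

```python
import json
from typing import Any, Dict, List

# Hypothetical sample data: each key is the document's unique identifier,
# and each value is a semi-structured document (fields may differ).
documents: Dict[str, Dict[str, Any]] = {
    "user:1": {"name": "Rahul Patel", "city": "Rajkot", "tags": ["student"]},
    "user:2": {"name": "Asha Shah", "city": "Surat"},
}


def find_by_field(field: str, value: Any) -> List[str]:
    """Return the ids of documents whose `field` equals `value`."""
    return sorted(
        doc_id for doc_id, doc in documents.items() if doc.get(field) == value
    )


# Documents are commonly serialized as JSON text for storage or transport.
as_json = json.dumps(documents["user:1"], sort_keys=True)
restored = json.loads(as_json)

print(find_by_field("city", "Surat"))
print(as_json)
```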
Column-Oriented
 Column-oriented databases work on columns and are
based on the Bigtable paper by Google.
 Every column is treated separately. The values of a single
column are stored contiguously.
 They deliver high performance on aggregation queries
such as SUM, COUNT, AVG, and MIN, as the data is readily
available in a column.
 Column-based NoSQL databases are widely used to
manage data warehouses, business intelligence, CRM,
and library card catalogs.
 HBase, Cassandra, and Hypertable are examples of
column-based NoSQL databases.
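The following sketch shows why aggregation is cheap when the values of a column sit together. It converts a small, hypothetical set of rows into one list per column and aggregates a single column without touching the others. It illustrates the storage idea only, not how HBase or Cassandra lay out data on disk.

```python
from statistics import mean

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "west", "amount": 120},
    {"id": 2, "region": "east", "amount": 80},
    {"id": 3, "region": "west", "amount": 200},
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over "amount" reads only that one list.
amounts = columns["amount"]
summary = {
    "SUM": sum(amounts),
    "COUNT": len(amounts),
    "AVG": mean(amounts),
    "MIN": min(amounts),
}

print(summary)
```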
Graph Oriented
 Graph databases model and store the relationships between data.
 Each data element is stored in a node, and nodes are
linked to other nodes by edges.
 A typical example of a graph database use case is
Facebook.
 It holds the relationship between each user and their further
connections.
 Graph databases help search the connections between data
elements and link one part to various parts directly or
indirectly.
 Graph databases can be used in social media, fraud
detection, and knowledge graphs. Examples of graph
databases include Neo4j, InfiniteGraph, OrientDB, and
FlockDB.
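Searching direct and indirect connections, as described above, amounts to a graph traversal. The sketch below uses a hypothetical friendship graph stored as an adjacency map and a breadth-first search to count how many hops separate two users. Graph databases such as Neo4j have their own query languages, so this only illustrates the underlying idea.

```python
from collections import deque
from typing import Dict, Optional, Set

# Hypothetical social graph: each node maps to the set of nodes it links to.
friends: Dict[str, Set[str]] = {
    "asha": {"rahul", "meera"},
    "rahul": {"asha", "vikram"},
    "meera": {"asha"},
    "vikram": {"rahul", "kiran"},
    "kiran": {"vikram"},
}


def degrees_of_separation(
    graph: Dict[str, Set[str]], start: str, goal: str
) -> Optional[int]:
    """Breadth-first search: fewest hops from start to goal, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # no path connects the two nodes


print(degrees_of_separation(friends, "asha", "kiran"))
```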
Why NoSQL?
 In recent times you can easily capture and access data from various sources, like Facebook,
Google, etc.
 User’s personal information, geographic location data, user generated content, social graphs
and machine logging data are some of the examples where data is increasing rapidly.
 To make use of such data, it is necessary to process large volumes of it.
 Relational databases are not well suited to this. NoSQL databases evolved to handle
this large volume of data properly.
Why NoSQL? (Cont.)
 NoSQL databases are well suited to processing massive volumes of data with distributed processing.
 NoSQL database supports failover mechanisms and ensures high availability.
 NoSQL database provides easy replication along with horizontally scalable capability.
 NoSQL database is capable of handling structured, semi-structured, and unstructured data.
 NoSQL databases can be installed on commodity hardware and can form clusters for
distributed processing.
 NoSQL databases offer a flexible schema that can be changed at runtime without service
downtime.
Features of NoSQL
 A few features of NoSQL databases are as follows:
1. Many are open source.
2. They are non-relational.
3. They are distributed.
4. They are schema-less.
5. They are cluster friendly.
6. They are born out of 21st century web applications.
Advantages of NoSQL
 Schema Agnostic
 NoSQL databases are schema agnostic.
 You do not need to design your schema before you can store data in NoSQL databases.
 You can start coding, and store and retrieve data without knowing how the database stores and works
internally.
 Schema agnosticism may be the most significant difference between NoSQL and relational databases.
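The contrast can be sketched with Python's built-in `sqlite3` module standing in for a relational database and a plain list of dictionaries standing in for a schema-agnostic store. The table, field names and values are hypothetical.

```python
import sqlite3

# Relational side: the schema must be declared before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

try:
    # "phone" was never declared, so the database rejects the insert.
    conn.execute(
        "INSERT INTO users (id, name, phone) VALUES (?, ?, ?)",
        (1, "Rahul", "98765"),
    )
    relational_accepted = True
except sqlite3.OperationalError:
    relational_accepted = False

# Schema-agnostic side: records with new fields are stored without
# declaring anything in advance.
users = []
users.append({"id": 1, "name": "Rahul"})
users.append({"id": 2, "name": "Asha", "phone": "98765"})

print(relational_accepted, users[1])
```

In the relational case the new field needs an `ALTER TABLE` migration first. In the schema-agnostic case the application simply starts writing it, and handling records that lack the field becomes the application's job.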

 Scalability
 Scalability is the measure of a system's ability to increase or decrease in performance and cost in response
to changes in application and system processing demands.
 NoSQL databases support horizontal scaling methodology that makes it easy to add or reduce capacity
quickly without tinkering with commodity hardware.
 This eliminates the tremendous cost and complexity of the manual sharding that is necessary when attempting to
scale an RDBMS.
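Horizontal scaling is commonly achieved by partitioning (sharding) keys across nodes. The sketch below routes each key to one of several in-memory shards by hashing it. The names `shard_for` and `ShardedStore` are hypothetical, and modulo hashing is chosen here only for brevity.

```python
import hashlib
from typing import Any, Dict, List


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy store that spreads keys over several dicts, one per 'server'."""

    def __init__(self, num_shards: int) -> None:
        self.shards: List[Dict[str, Any]] = [{} for _ in range(num_shards)]

    def put(self, key: str, value: Any) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> Any:
        return self.shards[shard_for(key, len(self.shards))].get(key)


sharded = ShardedStore(num_shards=4)
for i in range(100):
    sharded.put(f"user:{i}", {"id": i})

print([len(shard) for shard in sharded.shards])  # keys per shard
```

With plain modulo hashing, changing the number of shards remaps most keys. Distributed stores typically use schemes such as consistent hashing so that adding a node moves only a fraction of the data.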
Advantages of NoSQL Database – Cont.
 Performance
 Some databases are designed to operate best (or only) with specialized storage and processing hardware.
 With a NoSQL database, you can increase performance by simply adding cheaper servers, called commodity
servers.
 This helps organizations continue to deliver reliably fast user experiences with a predictable return on
investment for adding resources, again without the overhead associated with manual sharding.

 High Availability
 NoSQL databases are generally designed to ensure high availability and avoid the complexity that comes with
a typical RDBMS architecture, which relies on primary and secondary nodes.
 Some ‘distributed’ NoSQL databases use a masterless architecture that automatically distributes data
equally among multiple resources so that the application remains available for both read and write
operations, even when one node fails.
 Relationships are less complicated.
 It is used in distributed computing environments.
 Implementation is less costly. It provides storage for semi-structured data and also provides
flexibility in schema.
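The masterless behaviour described under High Availability can be sketched as follows: every write goes to all replicas that are currently alive, and a read is served by any live replica that holds the key. The classes `Replica` and `ReplicatedStore` are hypothetical and deliberately simplified.

```python
from typing import Any, Dict, List, Optional


class Replica:
    """One node holding a full copy of the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, Any] = {}
        self.alive = True


class ReplicatedStore:
    """Toy masterless store: no node is special for reads or writes."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas

    def put(self, key: str, value: Any) -> int:
        # Write to every live replica and report how many accepted it.
        written = 0
        for replica in self.replicas:
            if replica.alive:
                replica.data[key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no replica available")
        return written

    def get(self, key: str) -> Optional[Any]:
        # Any live replica that has the key can answer the read.
        for replica in self.replicas:
            if replica.alive and key in replica.data:
                return replica.data[key]
        return None


nodes = [Replica("a"), Replica("b"), Replica("c")]
cluster = ReplicatedStore(nodes)
cluster.put("session:1", "rahul")

nodes[0].alive = False           # simulate a node failure
print(cluster.get("session:1"))  # still readable from another replica
```

A replica that was down misses the writes made in the meantime, so a real system also needs a repair mechanism to bring it back in sync. That part is omitted here.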
Use of NoSQL in industry
 Session Store
 Managing session data using a relational database is very difficult, especially when
applications have grown very large.
 In such cases the right approach is to use a global session store, which manages session information
for every user who visits the site.
 NoSQL is suitable for storing such web application session information, which is very large in size.
 Since session data is unstructured in form, it is easier to store it in schema-less documents
than in relational database records.
 User Profile Store
 To enable online transactions, user preferences, user authentication and more, web and mobile
applications are required to store user profiles.
 In recent times the number of users of web and mobile applications has grown very rapidly. A relational database
cannot easily handle such a large and rapidly growing volume of user profile data when it is limited to
a single server.
 With NoSQL, capacity can be easily increased by adding servers, which makes scaling cost effective.
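The session store use case above maps naturally onto a key-value design: the session id is the key and the schema-less session document is the value. The sketch below adds a time-to-live so that idle sessions expire. The expiry is an assumption of this example and not something the slides specify, and the class name `SessionStore` is hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class SessionStore:
    """Toy session store keyed by session id, with per-session expiry."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._sessions: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    def save(self, session_id: str, data: Dict[str, Any]) -> None:
        # Store the document together with its expiry time.
        self._sessions[session_id] = (self._clock() + self._ttl, data)

    def load(self, session_id: str) -> Optional[Dict[str, Any]]:
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        expires_at, data = entry
        if self._clock() >= expires_at:
            del self._sessions[session_id]  # drop the expired session
            return None
        return data


sessions = SessionStore(ttl_seconds=1800)
sessions.save("abc123", {"user": "rahul", "cart": ["book"], "theme": "dark"})
print(sessions.load("abc123"))
```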
Use of NoSQL in industry (Cont.)
 Content and Metadata Store
 Many companies, such as publishing houses, require a place where they can store large amounts of data,
including articles, digital content and e-books, in order to merge various learning tools into a single
platform.
 In content-based applications, metadata is very frequently accessed
and needs low response times.
 For building content-based applications, NoSQL provides flexibility, faster access to data, and
the ability to store different types of content.
 Mobile Applications
 Since the number of smartphone users is increasing very rapidly, mobile applications face problems related to
growth and volume.
 With a NoSQL database, mobile application development can start small and be easily
expanded as the number of users increases, which is very difficult with relational databases.
 Since NoSQL databases store data in a schema-less form, the application developer can update the apps
without having to make major modifications to the database.
 Mobile app companies such as Kobo and Playtika use NoSQL to serve millions of users across the
world.
Use of NoSQL in industry (Cont.)
 Third-Party Data Aggregation
 A business frequently needs to access data produced by third parties. For instance, a consumer packaged
goods company may need to get sales data from stores as well as shoppers' purchase histories.
 In such scenarios, NoSQL databases are suitable, since they can manage huge amounts of
data generated at high speed from various data sources.

 Internet of Things
 Today, billions of devices are connected to the internet, such as smartphones, tablets, home appliances,
and systems installed in hospitals, cars and warehouses. These devices generate a large volume and variety of data
and keep on generating it.
 Relational databases struggle to store such data. NoSQL permits organizations to expand
concurrent access to data from billions of connected devices and systems, store huge amounts
of data and meet the required performance.

 E-Commerce
 E-commerce companies use NoSQL to store huge volumes of data and handle large numbers of requests from users.
Use of NoSQL in industry (Cont.)
 Social Gaming
 Data-intensive applications such as social games can grow to millions of users. Such growth
in the number of users as well as the amount of data requires a database system that can store such
data and can be scaled to accommodate the growing number of users. NoSQL is suitable for such
applications.
 NoSQL has been used by some mobile gaming companies, such as Electronic Arts, Zynga and
Tencent.

 Ad Targeting
 Displaying ads or offers on the current web page is a decision with a direct impact on income. To determine which
group of users to target and where on the web page to display ads, the platform gathers behavioral and
demographic characteristics of users.
 A NoSQL database enables ad companies to track user details and place ads very quickly, which
increases the probability of clicks.
 AOL, Mediamind and PayPal are some of the companies that use NoSQL for ad targeting.
SQL Vs. NoSQL
SQL | NoSQL
Relational database | Non-relational, distributed database
Relational model | Model-less approach
Pre-defined schema | Dynamic schema for unstructured data
Table based databases | Document-based or graph-based or wide column store or key–value pairs databases
Vertically scalable (by increasing system resources) | Horizontally scalable (by creating a cluster of commodity machines)
Uses SQL | Uses UnQL (Unstructured Query Language)
Not preferred for large datasets | Largely preferred for large datasets
Not a best fit for hierarchical data | Best fit for hierarchical storage as it follows the key–value pair of storing data similar to JSON (JavaScript Object Notation)
Excellent support from vendors | Relies heavily on community support
Supports complex querying and data keeping needs | Does not have good support for complex querying
Can be configured for strong consistency | Few support strong consistency (e.g., MongoDB), some others can be configured for eventual consistency (e.g., Cassandra)
Examples: Oracle, DB2, MySQL, MS SQL, PostgreSQL, etc. | Examples: MongoDB, HBase, Cassandra, Redis, Neo4j, CouchDB, Couchbase, Riak, etc.
NewSQL
 NewSQL is a class of modern relational database systems that bridges the gap between SQL and
NoSQL.
 NewSQL databases aim to scale and stay consistent.
 NoSQL databases scale while standard SQL databases are consistent.
 NewSQL attempts to provide both features and find a middle ground.
 This typically involves two layers of data: a relational one and a key-value store.
 Advantages of NewSQL :
 It introduces new implementation to traditional relational databases.
 It brings together the advantages of SQL and NoSQL.
 It is easy to migrate between database types according to the needs of the user.

 Disadvantages of NewSQL :
 They offer only partial access to the rich features of traditional systems.
 In-memory architectures may cause problems when data volumes exceed the available memory.
 The core foundation of such databases is relational systems, which can make them tricky to understand.
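The consistency that NewSQL aims to keep from relational systems is the transactional (ACID) guarantee: a group of changes either all apply or none do. The sketch below illustrates only that guarantee, using Python's built-in `sqlite3` module. SQLite is a single-node database and not a NewSQL system, and the `accounts` table and `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("asha", 100), ("rahul", 50)]
)
conn.commit()


def transfer(db: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money atomically: both updates commit, or neither does."""
    try:
        with db:  # commits on success, rolls back if an exception escapes
            db.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            db.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit made
        # earlier in the same transaction was rolled back with it.
        return False


def balances(db: sqlite3.Connection) -> dict:
    return dict(db.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "rahul", "asha", 80), balances(conn))
print(transfer(conn, "asha", "rahul", 30), balances(conn))
```

A NewSQL system aims to offer this same all-or-nothing behaviour while spreading the data across many nodes.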
Difference between NoSQL vs NewSQL
NoSQL | NewSQL
NoSQL is a schema-free database. | NewSQL is a schema-fixed as well as a schema-free database.
It is horizontally scalable. | It is horizontally scalable.
It possesses automatic high availability. | It possesses built-in high availability.
It supports cloud, on-disk, and cache storage. | It fully supports cloud, on-disk, and cache storage.
Online Transactional Processing is not supported. | Online Transactional Processing is fully supported.
There are low security concerns. | There are moderate security concerns.
Use Cases: Big Data, Social Network Applications, and IoT. | Use Cases: E-Commerce, Telecom industry, and Gaming.
Examples: DynamoDB, MongoDB, RavenDB, etc. | Examples: VoltDB, CockroachDB, NuoDB, ClustrixDB, Altibase, MemSQL, c-treeACE, Apache Trafodion, etc.
