
BIG DATA AND ANALYTICS

Subject Code : 18CS72 CIE Marks : 40

Lecture Hours : 50 SEE Marks : 60

Credits : 04

Arun Kumar P
Dept. of ISE, JNNCE, Shivamogga
10 Hours

NoSQL Big Data Management, MongoDB and Cassandra

Introduction

• Big Data uses distributed systems
• A distributed system consists of multiple data nodes organized in clusters
• Tasks execute in parallel on the data nodes in the clusters
• Computing nodes communicate with the application through the network

Introduction

Following are features of distributed computing architecture


1. Increased reliability and fault tolerance
2. Flexibility: easy to install, implement and debug new services in a
distributed environment
3. Sharding: storing different parts of the data on different sets of
data nodes, clusters or servers
4. Speed: computing power increases as shards run in parallel and independently
on individual data nodes in the cluster

Introduction

Following are features of distributed computing architecture


5. Scalability: both horizontal and vertical scaling are possible
6. Resource sharing: sharing memory, machines and the network reduces cost
7. Open systems: services are accessible to all nodes
8. Performance: a collection of processors increases performance

Introduction

Demerits
1. Issues in troubleshooting in a larger networking infrastructure
2. Additional software requirements
3. Security risks for data and resources

Introduction

Following are key terms used in Database Systems


1. Class: an extendable template of program code that defines
objects, initial values for members (states) and member functions
2. Object: an instance of a class
3. Tuple: an ordered set of data that constitutes a record
4. Transaction: execution of instructions between two interrelated entities (e.g., a query)
5. DB transactional model: a model for transactions that follows ACID properties
or BASE properties

Introduction

Following are key terms used in Database Systems


6. MySQL: widely used open-source database that excels as a
content management server
7. Oracle: widely used object-relational DBMS, written in C++
8. DB2: database server from IBM with support for Big Data analytics
9. Sybase: database server based on the relational model, for businesses on UNIX.
Sybase was the first enterprise-level DBMS on Linux.

Introduction

Following are key terms used in Database Systems


10. MS SQL : Microsoft developed RDBMS for enterprise level databases
that supports SQL and NoSQL
11. PostgreSQL: enterprise-level, object-relational DBMS.
PostgreSQL supports procedural languages such as Perl and
Python, in addition to SQL.

NoSQL Data Store

• SQL is a programming language based on relational algebra
• It is a declarative language and it defines the data schema
• SQL creates and manages databases in RDBMSs
• RDBMSs use a tabular data store with relational algebra
• Relations are sets of tuples
• Tuples consist of named attributes
• Each tuple is uniquely identified by a candidate key
• Transactions exhibit ACID properties

NoSQL Data Store

ACID properties in SQL


1. Atomicity: all operations in a transaction must complete;
if interrupted, they must be undone (rolled back)
2. Consistency: a transaction must maintain integrity constraints
and follow the consistency principle
3. Isolation: each transaction must execute separately from the others
4. Durability: the results of a transaction must persist once it completes.
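
Atomicity and consistency can be seen in a small sketch using Python's built-in sqlite3 module. The account table, the CHECK constraint and the transfer helper are illustrative assumptions, not part of the course material. The credit is applied first on purpose, so that a failing debit shows the earlier step being rolled back.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint (integrity rule) failed; the credit above is undone.
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 50)])
conn.commit()

ok = transfer(conn, "a", "b", 30)       # succeeds
failed = transfer(conn, "a", "b", 500)  # would overdraw "a", so nothing changes
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After both calls, the balances reflect only the successful transfer.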

NoSQL Data Store
Triggers, Views and Schedules in SQL Databases
• A trigger is a special stored procedure
• It executes when specific actions occur within the DB, such as a change in table data
through UPDATE, INSERT or DELETE
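
A minimal sketch of a trigger, run through Python's sqlite3 module. The stock and audit tables and the trigger name log_qty_change are hypothetical names chosen for illustration. The trigger fires after an UPDATE and records the old and new values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER);
CREATE TABLE audit (item TEXT, old_qty INTEGER, new_qty INTEGER);

-- Fires automatically after every UPDATE of stock.qty
CREATE TRIGGER log_qty_change AFTER UPDATE OF qty ON stock
BEGIN
    INSERT INTO audit VALUES (OLD.item, OLD.qty, NEW.qty);
END;

INSERT INTO stock VALUES ('KitKat', 10);
UPDATE stock SET qty = 7 WHERE item = 'KitKat';
""")

# The application never wrote to audit directly; the trigger did.
audit_rows = conn.execute("SELECT item, old_qty, new_qty FROM audit").fetchall()
```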

NoSQL Data Store
Triggers, Views and Schedules in SQL Databases
• A view is a logical construct used in query statements
• A view is a virtual or logical table that allows viewing or manipulating
parts of the tables.
• A view stores part of a complex query and so
reduces query complexity
CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition;
NoSQL Data Store
Triggers, Views and Schedules in SQL Databases
• A schedule is a chronological sequence of instructions that execute concurrently
• When a transaction is in a schedule, all instructions of the transaction
are included in the schedule
• A schedule enables execution of multiple transactions in allotted time intervals

NoSQL Data Store
Join in SQL Databases
• SQL DBs facilitate combining rows from two or more tables
based on related columns
• This is done with the JOIN clause during a database transaction
SELECT TransactionsTb1.KitKatSales
FROM TransactionsTb1 INNER JOIN ACVMSalesTb1
ON TransactionsTb1.KitKatSales = ACVMSalesTb1.KitKatSales;
• Joins raise scalability and distributed-design issues
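
The join can be tried end to end with a small sketch using Python's sqlite3 module. The machine and sale tables and their rows are made-up illustrations, not the tables named on the slide. Only rows with a matching machine_id in both tables appear in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE machine (machine_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sale (machine_id INTEGER, kitkat_sold INTEGER);
INSERT INTO machine VALUES (1, 'Shivamogga'), (2, 'Bengaluru'), (3, 'Mysuru');
INSERT INTO sale VALUES (1, 12), (2, 30), (2, 5);
""")

# INNER JOIN keeps only machines that have at least one sale row.
rows = conn.execute("""
    SELECT machine.city, sale.kitkat_sold
    FROM machine INNER JOIN sale ON machine.machine_id = sale.machine_id
    ORDER BY machine.machine_id, sale.kitkat_sold
""").fetchall()
```

Machine 3 has no sales, so it does not appear in rows. When the two tables live on different nodes, matching rows must travel over the network, which is one source of the scalability issue.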

NoSQL Data Store
SQL compliant format means
• Database tables constructed using SQL and they enable processing
queries written using SQL
The term NoSQL conveys two different meanings
1. Does not follow SQL-compliant formats
2. Not Only SQL: uses SQL-compliant formats along with a variety of other
querying and access methods

NoSQL Data Store
NoSQL is a new category of data store that offers many features
• Schema flexibility
• Simple relationships
• Dynamic schemas
• Auto sharding
• Replication
• Integrated caching
• Horizontal scalability of shards
• Distributable tuples
• Semi-structured data and flexibility in approach

NoSQL Data Store
Issues
• Lack of standardization in approaches
• Processing difficulties for complex queries
• Depends on eventually consistent results in place of consistency in all
states

NoSQL Data Store
Big-Data NoSQL
• NoSQL DB does not require specialized storage and HW for processing
• Storage can be on cloud
• NoSQL records are in non-relational data store systems
• They use flexible data models
• The records use multiple schemas
• Data in a NoSQL data store is considered semi-structured

NoSQL Data Store
NoSQL Data Store characteristics
1. NoSQL is a class of non-relational data storage system with flexible
data model
Example: Key-value pairs
Name-value pairs
Column family big data store
Tabular data store
Cassandra (used in Facebook/Apache)
HBase
NoSQL Data Store
NoSQL Data Store characteristics
Example: Hash table [Dynamo (Amazon S3)]
Unordered keys using JSON (CouchDB)
JSON (PNUTS)
JSON (MongoDB)
Graph Store
Object Store
Ordered keys

NoSQL Data Store
NoSQL Data Store characteristics
2. NoSQL does not have a fixed schema, such as a table
It does not use the concept of joins
Data written at one node can be replicated at multiple nodes

NoSQL Data Store
Features in NoSQL Transactions
1. Relax one or more ACID properties
2. Characterized by two out of the three properties of the CAP theorem
At least two are present for the application/service/process
3. Can be characterized by BASE properties

NoSQL Data Store
CAP theorem
Consistency: All copies have the same values or
All nodes observe same data at same time
Therefore, operations in one partition of the DB should be reflected in other
related partitions in a distributed DB.

NoSQL Data Store
CAP theorem
Availability: At least one copy is available in case a partition becomes
inactive or fails
Replication ensures availability
Network failure leads to unavailability

NoSQL Data Store
CAP theorem
Partition: Division of a large database into separate databases without
affecting operations on them, by adopting specific procedures

NoSQL Data Store
Brewer’s Consistency Availability and Partition Tolerance
The CAP theorem states that no distributed system can guarantee
C, A and P together

NoSQL Data Store
Big-Data NoSQL Solutions

NoSQL Data Store
Schema-less Models

- Schema refers to the design of a structure for datasets
- It defines the data structures for storing data in databases
- NoSQL data does not necessarily have a fixed table schema
- The NoSQL data model offers relaxation of one or more ACID properties

NoSQL Data Store
Characteristics of schema-less Models

NoSQL Data Store
Increasing flexibility for Data Manipulation: BASE properties

• Basic Availability: ensured by distributing shards across many data
nodes with a high degree of replication
• Soft state: ensures processing even in the presence of inconsistencies,
while achieving consistency eventually
• Eventual consistency: the consistency requirement in NoSQL databases
is met at some point of time in the future
The BASE model is not necessarily appropriate in all cases, but it is flexible
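
Soft state and eventual consistency can be pictured with a toy model. The Replica class, the version numbers and the anti_entropy function are assumptions made for illustration; real NoSQL stores use more elaborate mechanisms. Replicas may answer differently for a while, then converge once they exchange updates.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, version, value):
        # Keep the entry only if it is newer than what this replica has.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """Push every replica's entries to all others; the highest version wins."""
    for source in replicas:
        for key, (version, value) in list(source.data.items()):
            for target in replicas:
                target.write(key, version, value)


r1, r2, r3 = Replica(), Replica(), Replica()
r1.write("marks", 1, 70)   # older write reached only r1
r2.write("marks", 2, 85)   # newer write reached only r2

stale = [r.read("marks") for r in (r1, r2, r3)]      # soft state: answers differ
anti_entropy([r1, r2, r3])
converged = [r.read("marks") for r in (r1, r2, r3)]  # all replicas now agree
```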

NoSQL Data Store Example: Student Database in NoSQL

NoSQL Data Store
NoSQL Data Architecture Patterns
• Key-Value Store: High performance, Scalable and flexible
Data retrieval is fast in key-value store
- A simple string, called the key, maps to a large data string or BLOB
- Accesses to a key-value store use a primary key to reach the values

NoSQL Data Store
Advantages of Key-Value store
1. Any data type can be in the value field. The store holds a BLOB of data
such as text, hypertext, images, audio or video
2. A query just requests a value and gets it back as a single item;
values can be of any data type.
3. A key-value store is eventually consistent
4. It may be a hierarchical or an ordered key-value store
5. Returned values can be converted into lists, table columns or
data-frame fields
NoSQL Data Store
Advantages of Key-Value store
6. Scalable, reliable, portable and low operational cost
7. Keys can be synthetic or auto-generated. They are flexible and can take many formats
i) Artificially generated strings created from hash of a value
ii) Logical path names to images or files
iii) REST web service calls (request response cycles)
iv) SQL queries

NoSQL Data Store
Limitations
i) No indexes are maintained on values, thus a subset of values is not
searchable
ii) Key-value store does not provide traditional DB capabilities, such as
atomicity, consistency
iii) Maintaining unique values as keys may become more difficult when
the volume of data increases
iv) Queries cannot be performed on individual values. No clause like
WHERE in a relational DB is available to filter a result set.
NoSQL Data Store

A key-value store lets a client read and write values using a key, as
follows
i) Get(key) returns value associated with key
ii) Put(key,value) associates value with key and updates a value if key
exists
iii) Multi-get(key1,key2,…,keyN) returns the list of values associated with
list of keys
iv) Delete(key) removes key and its value from data store
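
The four operations above can be sketched as a thin wrapper around a Python dictionary. The class name KeyValueStore, the method names and the sample keys are illustrative assumptions; a real store adds persistence, distribution and replication.

```python
class KeyValueStore:
    """In-memory sketch of the Get / Put / Multi-get / Delete interface."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        # Associates value with key; overwrites the value if the key exists.
        self._items[key] = value

    def get(self, key):
        # Returns the value for key, or None when the key is absent.
        return self._items.get(key)

    def multi_get(self, *keys):
        # Returns the values for the given keys, in the same order.
        return [self._items.get(k) for k in keys]

    def delete(self, key):
        # Removes the key and its value; a missing key is ignored.
        self._items.pop(key, None)


store = KeyValueStore()
store.put("img/logo.png", b"\x89PNG")                             # value can be a BLOB
store.put("user:42", '{"name": "Asha"}')
store.put("user:42", '{"name": "Asha", "city": "Shivamogga"}')   # update in place

found = store.multi_get("img/logo.png", "user:42", "missing")
store.delete("img/logo.png")
after_delete = store.get("img/logo.png")
```

The store never looks inside a value, which is why values cannot be searched or filtered.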

NoSQL Data Store

Typical uses of key-value store are


1. Image store
2. Document or file store
3. Lookup table
4. Query cache

NoSQL Data Store
Key-Value: Riak
• Open-source data store written in Erlang
• It is a key-value data store system
• Data auto-distributes and replicates in Riak
Other key-value stores: Amazon’s DynamoDB, Redis, Memcached and its flavors,
Berkeley DB, upscaledb, Project Voldemort and Couchbase.

NoSQL Data Store
Document Store: Characteristics
• High performance
• Flexible
• Scalability varies, depends on stored contents
• Complexity is low compared to tabular, object and graph data stores

NoSQL Data Store
Document Store: Features
1. Stores unstructured data
2. Storage has similarity with object store
3. Data is stored in nested hierarchies
Ex: JSON, XML DOM, BLOB, document tree
4. Querying is easy
Ex: Section number, subsection number, figure caption and table headings
are used to retrieve parts of a document

NoSQL Data Store

Document Store: Features


5. No object-relational mapping is needed; search is easy by following paths
from the root of the document tree
6. Transactions on the document data store exhibit ACID properties

NoSQL Data Store

Uses of Document Store


1. Office documents
2. Inventory store
3. Forms data
4. Document exchange
5. Document search

NoSQL Data Store

Demerits of Document Store


- Incompatible with SQL
- Complex for implementation
Tools are
- CouchDB
- MongoDB
- Terrastore
- OrientDB
- RavenDB
NoSQL Data Store

Document JSON Format in CouchDB Database


- Open source
- Provides mapping functions during querying, combining and filtering of
information
- Deploys JSON Data store model for documents. Each document
maintains separate data and metadata
- CouchDB is multi-master application. Write does not require field locking
during concurrency control.

NoSQL Data Store

Document JSON Format in CouchDB Database


- The CouchDB query language is JavaScript.
- CouchDB accesses documents using an HTTP API
- Data replication provides fault tolerance and reliability

NoSQL Data Store
XML Document Architecture Pattern and Discovering Hierarchical Structure

NoSQL Data Store
Document Architecture Pattern and Discovering Hierarchical Structure

- XPath and XQuery are query languages for finding and extracting elements
and attributes of documents
- XPath treats an XML document as a tree of nodes
- XPath queries are expressed in the form of XPath expressions
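
A short sketch of path-based querying with Python's xml.etree.ElementTree, which supports a limited subset of XPath. The student document, the names and the second course code are made up for illustration. Each expression walks the tree from the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical student records; names and the second course are invented.
xml_text = """
<students>
  <student usn="1">
    <name>Asha</name>
    <course code="18CS72">Big Data and Analytics</course>
  </student>
  <student usn="2">
    <name>Ravi</name>
    <course code="XX00">Some Other Course</course>
  </student>
</students>
"""
root = ET.fromstring(xml_text)

# Path expression: every <name> under every <student> child of the root.
names = [node.text for node in root.findall("./student/name")]

# Predicate on an attribute: only courses whose code is 18CS72.
titles = [node.text for node in root.findall("./student/course[@code='18CS72']")]

# Attributes of the matched elements are read from the returned nodes.
usns = [node.get("usn") for node in root.findall("./student")]
```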

NoSQL Data Store

Tabular Data Store


- Uses rows and columns
- Row head field used as key to access multiple values from successive
columns in that row
- OLTP is fast on in-memory row-form data
- A relational DB store is in-memory row-form data
- The key in the first column of a row is at a memory address
- Values in successive columns are at successive addresses, which makes OLTP
easier
NoSQL Data Store

Tabular Data Store


- All fields of row are accessed at a time together
- Makes data searching and accessing faster during transaction
processing

NoSQL Data Store

Tabular Data Store


- In-memory column-based data has the keys (row-head keys) in the first
column of each row at successive memory addresses
- The next column of each row after the key has its values at successive
memory addresses, and so on up to N columns
- Column-based storage makes OLAP easier
- All fields of a column are accessed together
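
The difference between the row and column layouts can be sketched with plain Python containers. The sample records and the names row_store and column_store are illustrative assumptions, and Python objects only approximate the successive memory addresses described above.

```python
# Hypothetical records: (row key, name, marks)
rows = [
    ("r1", "Asha", 82),
    ("r2", "Ravi", 74),
    ("r3", "Meena", 91),
]

# Row-oriented layout: all fields of one record sit together (suits OLTP).
row_store = {key: (name, marks) for key, name, marks in rows}

# Column-oriented layout: all values of one field sit together (suits OLAP).
column_store = {
    "key": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "marks": [r[2] for r in rows],
}

# OLTP-style access: one lookup returns the complete record.
record = row_store["r2"]

# OLAP-style access: an aggregate scans a single column and ignores the rest.
average = sum(column_store["marks"]) / len(column_store["marks"])
```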

NoSQL Data Store

Tabular Data Store : Column-Family Data Store


- Columnar Data Store
- Column families
- Column-family data store
- Sparse column fields
- Grouping column families
- Grouping into rows

NoSQL Data Store

Tabular Data Store : Characteristics of Column-Family Data Store


- Scalability: Interface to field is simple(Row ID and Column name)
- Partitionability: row group, column group
- Availability: Cost of replication is lower
- Tree-like columnar structure: columns, column families,
column-family groups
- Adding new data is easy

NoSQL Data Store
Tabular Data Store : Characteristics of Column-Family Data Store
- Querying all field values: in column, column-family,
column-family group
- Replication of column: default replication factor=3
- No optimization for join: Similar to sparse matrix
Uses
i) Web crawling ii) Large sparsely populated tables
iii) Systems that have high variance
Example: Google's BigTable
NoSQL Data Store

Record Columnar (RC) File Format


Optimized Row Columnar (ORC) File format: row grouped data-stripes

NoSQL Data Store

Parquet File Formats: nested hierarchical columnar storage concept.


Nesting sequence: Table, row group, column chunk and chunk page.

NoSQL Data Store

Object Data Store :


An object data store is a repository that stores
1. Objects (files, images, documents, folders and business reports)
2. System metadata: filename, creation_date, last_modified, language
used (C, C++, C# etc.), access permissions, supported query language
3. Custom metadata: subject, sharing permissions, category

NoSQL Data Store

An object data store consists of 11 functions supporting APIs for


1. Scalability
2. Indexing
3. Large collections
4. Querying language, processing and optimization
5. Transactions
6. Data replication for high availability, distribution model and integration
7. Schema evolution

NoSQL Data Store

An object data store consists of 11 functions supporting APIs for

8. Persistency
9. Persistent object life cycle
10. Adding modules
11. Locking and caching strategy

Amazon S3 and MS Azure BLOB support object store

NoSQL Data Store

Object Relational Mapping

NoSQL Data Store

Graph Database
- Series of interconnected nodes
G = (E,V)
- Edges encode relationship between nodes
- Complexity is high
- Performance is variable with scalability
- Enables fast network search
- Linked data sets, like social media data

NoSQL Data Store

Graph Database
- Querying for data uses Graph traversal along the paths
- Traversal may be single step, path expression or recursion
Characteristics
1. Use a special query language (RDF uses SPARQL)
2. Can have hyperedges.
3. Relationships between nodes are consistent in a graph store
4. Have poor scalability

NoSQL Data Store

Graph Database
Uses
- Link analysis
- Friend of friend queries
- Rules and inference
- Rule induction
- Pattern matching
Ex: Neo4j, AllegroGraph, HyperGraphDB, InfiniteGraph, Titan and FlockDB
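
A friend-of-friend query is a graph traversal limited to two hops. The sketch below uses a plain adjacency dictionary and breadth-first search. The function within_hops and the sample names are hypothetical, and graph databases such as Neo4j expose this through their own query languages rather than through code like this.

```python
from collections import deque

def within_hops(graph, start, max_hops):
    """Return {node: distance} for nodes reachable in at most max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return {n: d for n, d in seen.items() if d > 0}


# Hypothetical social graph: each edge is stored in both directions.
friends = {
    "Asha": ["Ravi", "Meena"],
    "Ravi": ["Asha", "Kiran"],
    "Meena": ["Asha"],
    "Kiran": ["Ravi", "Divya"],
    "Divya": ["Kiran"],
}

reach = within_hops(friends, "Asha", 2)
friends_of_friends = sorted(n for n, d in reach.items() if d == 2)
```

Divya is three hops from Asha, so she is outside the result.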

NoSQL Data Store

Variations of NoSQL Architectural patterns


- The selected pattern may need variation due to business requirements
- Business requirements are ease of using the pattern and long-term
competitive advantage
- Kelly-McCreary, co-founder of ‘NoSQL Now’, suggested that when
selecting a NoSQL pattern, the pattern may need to change and
may require variation toward another pattern

NoSQL Data Store

Variations of NoSQL Architectural patterns


Some reasons are
1. Focus changing from performance to scalability
2. Changing from modifiability to agility
3. Affording capacity, availability support, ability for searching and
monitoring the actions

NoSQL Data Store
Steps for selecting pattern
1. Select an architecture
2. Perform difficulty analysis for each of the six patterns
Difficulty may be low, medium or high in the following processes
i) Ingestion ii) Validation of structure and fields
iii) Updating, using a batch or record-by-record approach
iv) Searching, using full text or by changing the sorting order
v) Exporting results in HTML, XML or JSON
3. Estimate the total effort required for each pattern
NoSQL to Manage Big Data
NoSQL solutions for Big Data and Characteristics
1. High and easy scalability horizontally
2. Support to replication
3. Distributable
4. Use of less expensive servers
5. Use of open-source tools
6. Support for the schema-less model
7. Support integrated caching
8. Flexible
NoSQL to Manage Big Data
Types of Big Data problems
Problems arise due to limitations of NoSQL
1. Solutions must drop support for Database join
2. Open source tools may not have standards
- No stored procedure in MongoDB
- GUI mode tools not available in market
- Some sacrifice ACID compliance

Shared Nothing Architecture
- A Big Data store uses a shared-nothing (SN) architecture
- A node does not share data with any other node
- So shards can be created easily
- Each partition processes different queries on the data of different users at its
node independently
- A coordination protocol controls the processing at all SN nodes.
- An SN architecture optimizes massively parallel data processing

Shared Nothing Architecture
Features
- Independence: No memory sharing, possesses computational
self-sufficiency
- Self healing: Link failure causes creation of another link
- Each node functioning as shard: Each node stores a partition
- No network contention

Shared Nothing Architecture
Choosing Distribution model: 1.Single Server Model
- Simplest, suits well for Graph DB
- Also used by key-value, column-family, BigTable
- An application executes data sequentially in Single server

Shared Nothing Architecture
Choosing Distribution model: 2.Sharding very large Database
- Application process runs on multiple shards in parallel
- Performance improved
- In case of a link failure, the shard DB migrates to
another node

Shared Nothing Architecture
Choosing Distribution model: 3.Master Slave Distribution Model

Shared Nothing Architecture
Choosing Distribution model: 4.Peer-to-peer (Cassandra)

Shared Nothing Architecture

- Master-slave replication provides greater scalability for read
operations
- Replication provides resilience during reads but not during write
operations
- Peer-to-peer replication provides resilience for both reads and writes

Ways of handling Big Data Problems

1. Evenly distribute data on the
cluster using hash rings
2. Use replication to horizontally
distribute client read requests
3. Move queries to the data, not data
to the queries
4. Distribute queries to multiple
nodes
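
The first point, distributing data with a hash ring, can be sketched as follows. The HashRing class, the node names, the use of SHA-256 and the choice of 100 virtual points per node are illustrative assumptions, not how any particular product implements it. A useful property shows up when a node is added: keys move only to the new node.

```python
import bisect
import hashlib

def _hash(text):
    # Map any string to a large integer position on the ring.
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node owns several points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # A key belongs to the first ring point at or after its hash, wrapping around.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user:{i}" for i in range(3000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

# Add a fourth node and count how many keys change owner.
bigger = HashRing(["node-a", "node-b", "node-c", "node-d"])
after = {k: bigger.node_for(k) for k in keys}
moved = [k for k in keys if after[k] != before[k]]
```

Only a fraction of the keys move, whereas a plain hash modulo the node count would reshuffle most of them.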

MongoDB DataBase

- Open-source DBMS, used to create and manage DBs

- Manages collection and document data stores

- Functions: viewing, querying, changing (updating, inserting,
appending or deleting), visualizing, running the transactions

MongoDB DataBase

1. Open source

2. Non-relational

3. NoSQL

4. Distributed

5. Document based (JSON like documents)

6. Cross platform

MongoDB DataBase

7. Scalable

8. Flexible Data Model

9. Indexed

10. Multi-master

11. Fault tolerant

MongoDB

Typical applications
• Content management and delivery system
• Mobile Applications
• User Data management
• Gaming
• E-Commerce
• Analytics
• Archiving and logging

MongoDB
Features
1. Physical container for collections: each DB gets its own
set of files on the file system. A number of DBs can run on a
single MongoDB server.
The DB server in MongoDB is mongod and the client is mongo

2. Collection: stores a number of MongoDB documents
Documents of a collection are schema-less

MongoDB
Features
3. Document Model: Document is unit of data in MongoDB
Uses JSON to store data in documents
JSON data basically has key-value pairs
Have dynamic schema

4. Document Data Store: one collection holds different documents
The number of fields, content and size of a document
differ from one document to another
MongoDB
Features
5. Storing of data is flexible: fields can vary from doc to doc
Data structure can be changed over time

6. Storing of documents on disk is in BSON serialization format.


BSON is binary representation of JSON documents

MongoDB
Features
7. Querying, indexing and real-time aggregation:
allow accessing and analyzing data efficiently

8. Deep query-ability: supports dynamic queries on documents using a
document-based query language
9. No complex joins
10. A distributed DB leads to high availability and provides horizontal scalability

MongoDB
Features
11.Indexes on any field in collection of documents:
support queries and operations
12. Atomic operations on single document
13.Fast in place updates: does not allocate new memory and write full
new copy of the object when update required
14. No configurable cache: Uses all free available memory
15. Conversion/mapping: application objects to data store objects not
needed
MongoDB

Replication in MongoDB
- MongoDB replicates with the help of a replica set
- A replica set is a group of mongod processes that store the same dataset
- A replica set has a minimum of 3 nodes
- Any one of them is called the primary; the remaining are secondaries
- The primary node receives write operations
- Data replicates from the primary to the secondaries
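
The write path described above can be pictured with a toy model. The ReplicaSet class and its methods are hypothetical and do not use or imitate the MongoDB driver API. Real replica sets also handle elections and failover, which this sketch leaves out.

```python
class ReplicaSet:
    """Toy model: one primary accepts writes, secondaries copy its log."""

    def __init__(self, names):
        self.logs = {name: [] for name in names}
        self.primary = names[0]  # assumption: the first member starts as primary

    def write(self, member, doc):
        if member != self.primary:
            raise PermissionError(f"{member} is a secondary; write to {self.primary}")
        self.logs[member].append(doc)

    def replicate(self):
        source = self.logs[self.primary]
        for name, log in self.logs.items():
            if name != self.primary:
                log.extend(source[len(log):])  # copy only the missing tail


rs = ReplicaSet(["node1", "node2", "node3"])
rs.write("node1", {"name": "Nexon"})
rs.write("node1", {"name": "Safari"})

lagging = len(rs.logs["node2"])   # secondaries have not caught up yet
rs.replicate()
in_sync = all(log == rs.logs["node1"] for log in rs.logs.values())

try:
    rs.write("node2", {"name": "Punch"})  # writes to a secondary are refused
    rejected = False
except PermissionError:
    rejected = True
```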

MongoDB

Auto Sharding in MongoDB

- Distributes data across multiple machines
- Sharding provides additional write capacity by distributing the write load
over a number of mongod instances
- Each shard is an independent DB
- Collectively, the shards form a single logical DB
- Example: 1 TB of data -> 20 shards of 50 GB each
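
A minimal Python sketch of the idea, assuming a simple hash-modulo placement. This is a simplification: MongoDB itself splits the shard-key range into chunks and balances them across shards. The helper names `shard_for` and `size_per_shard_gb` are hypothetical, and 1 TB is taken as 1000 GB.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a shard-key value to a shard number using a stable hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def size_per_shard_gb(total_gb, num_shards):
    """Average data held by each shard when data is spread evenly."""
    return total_gb / num_shards


# Spread 1000 document ids over 20 shards.
shards = {i: [] for i in range(20)}
for doc_id in range(1000):
    shards[shard_for(doc_id, 20)].append(doc_id)

per_shard = size_per_shard_gb(1000, 20)  # 1 TB over 20 shards
```

Because each key always hashes to the same shard, reads and writes for that key go to one mongod instance, while different keys spread the load over all of them.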

MongoDB: Data types supported by MongoDB documents

MongoDB: Features of MongoDB with respect to RDBMS

MongoDB: Query Commands

MongoDB: Sample usage of Commands

To create a database (it is actually created when data is first stored in it)
use DATABASE_NAME
>use tata
To check the currently selected database
db
>db
To list the databases
show dbs
>show dbs
To drop a database
db.dropDatabase()
>use tata
>db.dropDatabase()


To create a collection (created implicitly by the first insert)
>db.COLLECTION_NAME.insert(document)

db.tata.insert({
  "type": "car",
  "model": "2018",
  "name": "Nexon"
})


To insert an array of documents into a collection
db.tata.insert([
  {
    "type": "car",
    "model": "2018",
    "name": "Nexon"
  },
  {
    "type": "car",
    "model": "2020",
    "name": "Safari"
  },
  {
    "type": "car",
    "model": "2021",
    "name": "Punch"
  }
])

To view all documents in a collection
db.COLLECTION_NAME.find()

>db.tata.find()
To update a document
db.COLLECTION_NAME.update(SELECTION_CRITERIA, UPDATED_DATA)

db.tata.update({'model':'2020'}, {$set:{'name':'Harrier'}}, {multi:true})

To delete a document
db.COLLECTION_NAME.remove(DELETION_CRITERIA)

db.tata.remove({"model":"2020"})
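
The selection-criteria behaviour of update and remove can be imitated on a plain Python list of dictionaries. This is only an in-memory sketch, not the MongoDB driver: the helpers `matches`, `update` and `remove` are hypothetical and model simple equality criteria and a $set-style change.

```python
def matches(doc, criteria):
    """True when every field in the criteria equals the document's field."""
    return all(doc.get(k) == v for k, v in criteria.items())


def update(collection, criteria, new_fields, multi=False):
    """Apply a $set-style change; returns the number of documents modified."""
    changed = 0
    for doc in collection:
        if matches(doc, criteria):
            doc.update(new_fields)
            changed += 1
            if not multi:
                break  # without multi, only the first match is updated
    return changed


def remove(collection, criteria):
    """Delete every matching document; returns the number removed."""
    kept = [d for d in collection if not matches(d, criteria)]
    removed = len(collection) - len(kept)
    collection[:] = kept
    return removed


tata = [
    {"type": "car", "model": "2018", "name": "Nexon"},
    {"type": "car", "model": "2020", "name": "Safari"},
    {"type": "car", "model": "2021", "name": "Punch"},
]

changed = update(tata, {"model": "2020"}, {"name": "Harrier"}, multi=True)
no_match = update(tata, {"model": 2020}, {"name": "Harrier"}, multi=True)
removed = remove(tata, {"model": "2020"})
```

The second call matches nothing because the model was stored as the string "2020" and the criteria use the number 2020; the type of the value in the criteria has to agree with the stored type.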
CassandraDB

- Developed at Facebook and later released as an open-source Apache project
- Later, IBM released an enhancement of Cassandra as an open-source version
- It includes the IBM Data Engine, which processes NoSQL data stores
- Cassandra is a column-family database
- Has the distributed design of Amazon's Dynamo and is written in Java
- Adopted by Facebook, IBM, Twitter, Cisco, Rackspace, eBay and Netflix


CassandraDB: Characteristics
- Open source
- Scalable
- Non-relational
- NoSQL
- Distributed
- Column based
- Decentralized
- Fault tolerant
- Tunable consistency

CassandraDB: Features

- Optimized for a high volume of writes: writes are not very costly
- Maximizes data duplication (data is stored denormalized)
- Does not support joins or the OR clause; GROUP BY and aggregations are
absent or limited, depending on the version
- Uses classes consisting of ordered keys and semi-structured data
storage systems
- No master node
- Peer-to-peer distribution

- Stores and handles massive data in structured, semi-structured and
unstructured formats

- Contains a set of programs to create and manage databases

- Provides functions for querying, viewing, changing and visualizing data
and for performing transactions


CassandraDB: Data Replication

- Stores data on multiple nodes, so there is no single point of failure
- Data replication uses a replication strategy
- The replication factor determines the number of replicas placed on different nodes
- Cassandra returns the most recent value to the client
- If any node responds with a stale value, Cassandra detects it and performs
a read repair to update the stale values
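
The read-repair step can be sketched in Python as follows. This is a simplified model: each replica is a dictionary mapping a key to a (value, timestamp) pair, and the function `read_with_repair` is a hypothetical name, not a Cassandra API.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and repair replicas that are stale.

    replicas: list of dicts mapping key -> (value, timestamp).
    """
    answers = [r[key] for r in replicas if key in r]
    if not answers:
        return None
    # The copy with the newest timestamp wins.
    latest = max(answers, key=lambda pair: pair[1])
    # Read repair: overwrite every replica that holds an older or missing copy.
    for r in replicas:
        if r.get(key) != latest:
            r[key] = latest
    return latest[0]


replicas = [
    {"name": ("Safari", 1)},
    {"name": ("Harrier", 2)},  # only this replica saw the latest write
    {"name": ("Safari", 1)},
]
value = read_with_repair(replicas, "name")
```

After the call, all three replicas hold the pair with timestamp 2, so a later read from any of them returns the same value.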

CassandraDB: Components

CassandraDB: Features

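Scalability:
- Provides linear scalability, which increases throughput
- Response time decreases as the number of nodes in the cluster increases

Transaction Support:
- Supports atomicity, isolation and durability at the row level, with tunable
consistency rather than full RDBMS-style ACID transactions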

Replication Option: Two strategies

1. Simple strategy
Specify a single replication factor for the whole cluster

2. Network Topology strategy
Set the replication factor for each data center independently

CassandraDB: Data Types

CassandraDB: Data Model


- Based on Google's BigTable
- Each value maps to two strings (row key, column key) and a timestamp
- The DB can be considered a sparse, distributed, multi-dimensional sorted map
- Google File System splits a table into multiple tablets along row boundaries
- Each tablet is called a META1 tablet
- Each META1 tablet has a maximum size of 200 MB
- META0 is the master server
- META1 tablets are retrieved by querying the META0 server
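
The "sparse, multi-dimensional sorted map" view can be illustrated with a small Python sketch. It assumes a single in-memory dictionary keyed by (row key, column key, timestamp); the function names `put`, `get_latest` and `scan` and the sample keys are hypothetical.

```python
# (row key, column key, timestamp) -> value
table = {}


def put(row_key, column_key, timestamp, value):
    table[(row_key, column_key, timestamp)] = value


def get_latest(row_key, column_key):
    """Return the value with the newest timestamp, or None if the cell is empty."""
    versions = [(ts, v) for (r, c, ts), v in table.items()
                if r == row_key and c == column_key]
    if not versions:
        return None  # sparse: a missing cell simply has no entry
    return max(versions, key=lambda pair: pair[0])[1]


def scan():
    """Return all entries in sorted key order (row, then column, then time)."""
    return sorted(table.items())


put("car#1", "name", 1, "Safari")
put("car#1", "name", 2, "Harrier")
put("car#2", "type", 1, "car")
```

Keeping the keys sorted by row is what makes it possible to split the table into contiguous row ranges (tablets).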

The Cassandra data model consists of 4 main components

1. Cluster: made up of multiple nodes and keyspaces

2. Keyspace: namespace to group multiple column families, one per partition

3. Column: consists of a column name, a value and a timestamp

4. Column family: multiple columns referenced by a row key
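
How the four components nest can be sketched with Python dataclasses. This is an illustrative model only; the class names mirror the terms above, and the sample keyspace, column family and node names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    value: str
    timestamp: int


@dataclass
class ColumnFamily:
    name: str
    # row key -> {column name: Column}
    rows: dict = field(default_factory=dict)

    def insert(self, row_key, column):
        self.rows.setdefault(row_key, {})[column.name] = column


@dataclass
class Keyspace:
    name: str
    column_families: dict = field(default_factory=dict)


@dataclass
class Cluster:
    nodes: list
    keyspaces: dict = field(default_factory=dict)


cars = ColumnFamily("cars")
cars.insert("row1", Column("name", "Nexon", 1))
cars.insert("row1", Column("model", "2018", 1))
keyspace = Keyspace("tata", {"cars": cars})
cluster = Cluster(["node-a", "node-b", "node-c"], {"tata": keyspace})
```

A value is therefore reached by walking cluster -> keyspace -> column family -> row key -> column.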


CassandraDB: DDL Commands

CassandraDB: CONSISTENCY command


- The CONSISTENCY command shows the current consistency level

CONSISTENCY <LEVEL> sets a new consistency level


Valid consistency levels are
ALL ANY ONE
TWO THREE QUORUM
LOCAL_ONE LOCAL_QUORUM EACH_QUORUM
SERIAL LOCAL_SERIAL

ALL : Highly consistent. A write must be written to the commit log and memtable
on all replica nodes in the cluster

ANY : A write must be written to at least one node, which need not be a replica
node for that row

ONE : A write must be written to the commit log and memtable of at least one
replica node

TWO, THREE : Same as ONE, but a write must be written to the commit log
and memtable of at least two or three replica nodes respectively

QUORUM : A write must be written to the commit log and memtable on a quorum of
replica nodes across all data centers

LOCAL_ONE : A write must be written to at least one replica node in the local
data center

LOCAL_QUORUM : A write must be written to the commit log and memtable
on a quorum of replica nodes in the same data center as the coordinator

EACH_QUORUM : A write must be written to the commit log and memtable
on a quorum of replica nodes in each data center

SERIAL : Linearizable consistency, which prevents unconditional updates
(used with lightweight transactions)

LOCAL_SERIAL : Same as SERIAL, but restricted to the local data center
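
The number of replica acknowledgements behind these levels can be worked out with a short Python sketch. It assumes a single data center, so the LOCAL_ and EACH_ variants and the SERIAL levels are left out, and the function names are hypothetical. A quorum is taken as floor(replication factor / 2) + 1.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def acks_required(level, replication_factor):
    """Replica acknowledgements needed for a level (single data center only)."""
    fixed = {"ANY": 1, "ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "ALL":
        return replication_factor
    if level == "QUORUM":
        return quorum(replication_factor)
    raise ValueError("level not covered by this sketch: " + level)


def is_strongly_consistent(read_acks, write_acks, replication_factor):
    """A read is guaranteed to overlap the latest write when R + W > RF."""
    return read_acks + write_acks > replication_factor


rf = 3
w = acks_required("QUORUM", rf)
r = acks_required("QUORUM", rf)
strong = is_strongly_consistent(r, w, rf)
weak = is_strongly_consistent(acks_required("ONE", rf), acks_required("ONE", rf), rf)
```

With a replication factor of 3, QUORUM reads plus QUORUM writes always overlap on at least one replica, whereas ONE plus ONE may miss the newest value. This is the trade-off that tunable consistency exposes.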



CassandraDB: Keyspaces

- An object that holds all column families as a bundle

- Generally there is one keyspace per application

- A keyspace in Cassandra is a namespace that defines data replication on
nodes

- Typically, a cluster contains one keyspace per node


CREATE KEYSPACE <keyspace name>
WITH replication = { 'class' : '<Strategy name>',
                     'replication_factor' : '<No. of replicas>' }
AND durable_writes = <TRUE/FALSE>;
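
As an illustration of how the template is filled in for the two replication strategies, the following Python sketch builds the statement text. The helper `create_keyspace_cql`, the keyspace name `tata` and the data center names `dc1` and `dc2` are assumptions made for the example; the function only produces a string and does not talk to a cluster.

```python
def create_keyspace_cql(name, strategy, replication, durable_writes=True):
    """Build a CREATE KEYSPACE statement.

    replication: an int for SimpleStrategy, or a dict mapping data center
    name to replication factor for NetworkTopologyStrategy.
    """
    options = {"class": strategy}
    if strategy == "SimpleStrategy":
        options["replication_factor"] = replication
    elif strategy == "NetworkTopologyStrategy":
        options.update(replication)
    else:
        raise ValueError("unknown strategy: " + strategy)

    def fmt(value):
        # Strings are single-quoted in CQL; numbers are written bare.
        return "'{}'".format(value) if isinstance(value, str) else str(value)

    pairs = ", ".join("'{}': {}".format(k, fmt(v)) for k, v in options.items())
    return ("CREATE KEYSPACE {} WITH replication = {{{}}} "
            "AND durable_writes = {};").format(name, pairs, str(durable_writes).lower())


simple = create_keyspace_cql("tata", "SimpleStrategy", 3)
multi_dc = create_keyspace_cql("tata", "NetworkTopologyStrategy", {"dc1": 3, "dc2": 2})
```

With SimpleStrategy a single factor covers the whole cluster, while NetworkTopologyStrategy lists one factor per data center.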

ALTER KEYSPACE <keyspace name>
WITH replication = { 'class' : '<Strategy name>',
                     'replication_factor' : '<No. of replicas>' };

DESCRIBE KEYSPACES : Displays existing keyspaces
DROP KEYSPACE <keyspace name> : Drops a keyspace
USE <keyspace name> : Connects the client session with a keyspace

CassandraDB: CQL (Cassandra Query Language)
