
Slides and Feedback at: http://joind.in/11012

NoSQL Introduction
• Understand what NoSQL is and what it is not.
• Why would you want to use NoSQL within your project
and which NoSQL database would you utilize?
• Explore the relationships between NoSQL and RDBMS.
• Understand how to select between an RDBMS (MySQL
or PostgreSQL), Document Database (MongoDB),
Key-Value Store, Graph Database, and Columnar Database, or
combinations of the above.
Thursday May 8th 2014, 3:00pm-3:50pm SB 139
NoSQL
• History
• Popular NoSQL Databases
• NoSQL Database Comparisons
• Terminology
• Consistency, Replication, Performance
• NoSQL Implementation CRUD Operations



NoSQL Introduction
• NoSQL is a commonly adopted misnomer
• Typically does not use ANSI SQL
– SQL = Structured Query Language
– Structure exists but is more Flexible
– Queries are performed
– Language is closer to Programming Languages

NoSQL History

http://www.w3resource.com/mongodb/nosql.php
NoSQL History
• 1998 Carlo Strozzi Command Line Database
• June 11, 2009 Meetup
– Open Source, Distributed, Non-Relational DB
– Eric Evans (Rackspace)
– Johan Oskarsson (Last.fm)

NoSQL History
• Bad name, but it stuck!
• Not a definitive term
• Generally, newer databases solving new
and different problems
• Not Only SQL: http://blog.sym-link.com/2009/10/30/nosql_whats_in_a_name.html



NoSQL Origination
• Problems not solved by an RDBMS
• Limitations of the RDBMS, not of SQL



Most Popular Databases
http://db-engines.com/en/ranking
Ranking by: Web Content, Web Searches, Technical Discussion, Jobs, Resumes

Most Popular NoSQL
• MongoDB – Document Store
• Cassandra – Wide Column Store
• Solr – Search Engine
• Redis – Key-value Store
• HBase – Wide Column Store
• Memcached – Key-value Store
• CouchDB – Document Store
• Neo4j – Graph Database
• Riak – Key-value Store
• SimpleDB – Key-value Store within Amazon Cloud



NoSQL vs RDBMS

Image Reference: http://blogs.the451group.com/information_management/2012/11/02/updated-database-landscape-graphic/


Reading Recommendations

Great Overview of NoSQL:


Seven Databases in Seven Weeks
Eric Redmond and Jim Wilson

NoSQL “Bleeding Edge”
• Several solutions are mature and stable
enough to run large scale production
environments
• Not all permutations have been considered
• Several (but not all) optimization strategies
have been published
• Crucial elements such as Security may be a
secondary add-on in favor of performance.
NoSQL “Bleeding Edge”
Sun Microsystems csh man page:
“Although robust enough for
general use, adventures into the
esoteric periphery of the C shell
may reveal unexpected quirks.”



NoSQL Comparison

Take note of patterns:


Recent Release, Open Source, Utilized at High-Volume sites
Variety of Formats:
Key-Value, Wide-Column, Document, Graph
http://db-engines.com/en/ranking
NoSQL Database Types
• Key-Value
• Column Oriented Databases (Columnar)
• Graph
• Document

• Search Database - Solr


• Key-Value Web Optimization - Memcached

Key-Value Stores
code bucket
Key                 Value
code:java           17.316% Lowest rank on Feb 2014
code:C              18.334% Lowest rank on August 2013
code:Objective-C    Lowest rank on Dec 2007 11.341%
code:C++            {"score": "6.892%", "low rank": "Feb 2008"}

drink bucket
Key                 Value
drink:java          coffee
drink:punch         Sprite + pineapple juice
drink:pop           Carbonated Soda
http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
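The bucket and key layout above can be sketched in a few lines of Python. This is only an in-memory illustration, assuming a hypothetical `KeyValueStore` class; it is not the API of Redis, Riak, or any other product. The store treats each value as opaque, which is why a plain string and a JSON-like object can sit side by side in the same bucket.

```python
# Hypothetical in-memory stand-in for a key-value store with buckets.
class KeyValueStore:
    def __init__(self):
        self._buckets = {}  # bucket name -> {key: value}

    def put(self, bucket, key, value):
        # Create the bucket on first use, then store or overwrite the value.
        self._buckets.setdefault(bucket, {})[key] = value

    def get(self, bucket, key, default=None):
        return self._buckets.get(bucket, {}).get(key, default)

    def delete(self, bucket, key):
        self._buckets.get(bucket, {}).pop(key, None)


store = KeyValueStore()
store.put("code", "java", "17.316% Lowest rank on Feb 2014")
store.put("code", "C++", {"score": "6.892%", "low rank": "Feb 2008"})
store.put("drink", "java", "coffee")

# The same key can live in two buckets without colliding.
java_code = store.get("code", "java")
java_drink = store.get("drink", "java")
```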
Column Oriented Database

Neo4j

Document Oriented Database
{
"_id" : 1,
"name" : { "first" : "John", "last" : "Backus" },
"contribs" : [ "Fortran", "ALGOL", "FP" ],
"awards" : [
{ "award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society" },
{ "award" : "Draper Prize",
"year" : 1993,
"by" : "National Academy of Engineering" }
]
}
Document Oriented Database
{ "faculty" :
  [
    {
      "_id" : 1,
      "name" : { "first" : "John", "last" : "Backus" },
      "contribs" : [ "Fortran", "ALGOL", "FP" ],
      "awards" : [
        { "award" : "W.W. McDowell Award",
          "year" : 1967,
          "by" : "IEEE Computer Society" },
        { "award" : "Draper Prize",
          "year" : 1993,
          "by" : "National Academy of Engineering" }
      ]
    },
    {
      "_id" : 2,
      "name" : { "first" : "David", "last" : "Williams" },
      "contribs" : [ "C#", "Java", "PHP" ],
      "awards" : [
        { "award" : "Sherman Peabody Award II",
          "year" : 2095,
          "location" : "Paris",
          "by" : "Intergalactic Continuum" },
        { "award" : "Sherman Peabody Award IX",
          "year" : 2090,
          "location" : "Paris",
          "by" : "Intergalactic Continuum" },
        { "award" : "Sherman Peabody Award IV",
          "year" : 2093,
          "location" : "Paris",
          "by" : "Intergalactic Continuum" }
      ]
    }
  ]
}

Document Oriented Database
http://chris.photobooks.com/json/



http://visualizer.json2html.com/
NoSQL Comparison

No ANSI SQL Standards, No Predefined Schemas, Replication,
Eventual Consistency, Rarely Foreign Keys, Data Types not required
Newer Concepts: Sharding, REST API, JSON, MapReduce
NoSQL Characteristics
No Predefined Schemas
• May insert data without creating a table
• Schema Versions (v1.5, v1.6, v1.7,…)
Rarely Foreign Keys
• No JOIN operations
• Relationships are not automatically maintained
Eventual Consistency
• Old copies being replaced by new records
• Inconsistent data until all replacements are complete
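The Eventual Consistency bullets can be made concrete with a toy simulation. This is a sketch under stated assumptions: the `Replica` and `Cluster` classes are hypothetical, a write lands on one replica first, and propagation to the others is triggered by hand rather than by a real replication protocol.

```python
class Replica:
    def __init__(self):
        self.data = {}


class Cluster:
    """Toy cluster: a write lands on replica 0 and reaches the others later."""

    def __init__(self, size):
        self.replicas = [Replica() for _ in range(size)]
        self.pending = []  # (replica_index, key, value) not yet applied

    def write(self, key, value):
        self.replicas[0].data[key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, replica_index, key):
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Replace old copies on the remaining replicas with the new record.
        for index, key, value in self.pending:
            self.replicas[index].data[key] = value
        self.pending.clear()


cluster = Cluster(3)
cluster.write("salary", 125000)
stale = cluster.read(2, "salary")   # replica 2 has not seen the write yet
cluster.propagate()
fresh = cluster.read(2, "salary")   # all replicas now agree
```

Between `write` and `propagate`, a reader may get a different answer depending on which replica it asks. That window is the "inconsistent data until all replacements are complete" described above.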

Download NoSQL v95141.3

Released 4/1/2014
http://www.nosql.org/downloads/ymbkm.zip

NoSQL Terminology and Concepts



Sharding
Partitions – Data distributed across disks

Sharding – Data distributed across servers
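One common way to decide which server holds a record is to hash its key. The sketch below assumes three hypothetical server names and a simple modulo placement; real databases each use their own placement scheme, so treat this as an illustration of the idea only.

```python
import hashlib

# Hypothetical server names, for illustration only.
SERVERS = ["server-a", "server-b", "server-c"]


def shard_for(key, servers=SERVERS):
    """Pick a server from a stable hash of the key."""
    # hashlib gives the same digest on every run, unlike Python's built-in
    # hash(), which is randomized per process for strings.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


placement = {key: shard_for(key) for key in ["user:jim", "user:krishna", "course:CIS2120"]}
```

Modulo placement moves most keys when the number of servers changes, which is one reason production systems often prefer consistent hashing or range-based sharding.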

Map Reduce
Divides work across distributed systems
Parallel processing of large data sets
Divide – Conquer – Consolidate
Often implemented by defining Map and Reduce classes or functions
Example: 1+2+3+6+7+8+9 = ?
Partition 1: 2 + 6 + 8 = 16
Partition 2: 1 + 7 + 3 + 9 = 20
Consolidate: 16 + 20 = 36

Google’s MapReduce Programming Model – Revisited Ralf Lammel, Microsoft, 2008


http://www.sciencedirect.com/science/article/pii/S0167642307001281
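The Divide – Conquer – Consolidate steps for the sum on this slide can be sketched in plain Python. The two partitions mirror the slide's partial sums of 16 and 20. The function names are illustrative, and the work runs in one process here rather than across distributed machines.

```python
from functools import reduce

# Divide: the input is split into two partitions, as on the slide.
partitions = [[2, 6, 8], [1, 7, 3, 9]]


def map_partition(numbers):
    """Conquer: each worker sums only its own partition."""
    return sum(numbers)


def reduce_totals(left, right):
    """Consolidate: combine two partial results."""
    return left + right


partial_sums = list(map(map_partition, partitions))  # [16, 20]
total = reduce(reduce_totals, partial_sums)          # 36
```

Because addition is associative, the partitions could be summed on different machines in any order and the consolidated total would be the same.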
JSON
JavaScript Object Notation: a subset of JavaScript syntax
Similar to XML as a method for representing data
Syntax
Name : Value pairs
"salary" : "125000"
Values are: number, string, boolean, array, object, or null
Objects can store objects, arrays can store arrays
Separate pairs by commas
"salary" : "125000", "gender" : "male"
Curly braces denote objects
{ "salary" : "125000", "gender" : "male" }
Square brackets denote arrays
"phone" : ["555-1212", "555-3344"]
"phone" : [ {"office" : "555-1212"}, {"mobile" : "555-3344"} ]
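The syntax rules above map directly onto Python's standard `json` module. The sketch below combines the slide's fragments into one object (that combination is an assumption made for illustration) and parses it into Python dictionaries and lists.

```python
import json

# The slide's fragments combined into one JSON object (illustrative).
text = (
    '{ "salary" : "125000", "gender" : "male", '
    '"phone" : [ {"office" : "555-1212"}, {"mobile" : "555-3344"} ] }'
)

record = json.loads(text)               # JSON object -> Python dict
office = record["phone"][0]["office"]   # arrays become lists, objects become dicts
round_trip = json.dumps(record, sort_keys=True)
```

Note that `"125000"` stays a string because it is quoted. An unquoted `125000` would be parsed as a number.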
JSON Example
{
"_id" : 1,
"name" : { "first" : "John", "last" : "Backus" },
"contribs" : [ "Fortran", "ALGOL", "FP" ],
"awards" : [
{ "award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society" },
{ "award" : "Draper Prize",
"year" : 1993,
"by" : "National Academy of Engineering" }
]
}
http://www.mongodb.com/json-and-bson
REST API
CRUD (Create, Read, Update, Delete) operations through the web
HTTP Methods
GET (List/Read)
POST (Create; some APIs also use it to Update)
PUT (Create or Replace at a known URL)
DELETE(Delete)

EXAMPLE API http://www.blinksale.com/api/


List/Read Data via HTTP GET to
http://www.blinksale.com/invoices
http://www.blinksale.com/invoices/invoice_id/payments
http://www.blinksale.com/invoices/?start=2006&end=2008
Returns XML results
REST API
Update data via HTTP POST to
http://www.blinksale.com/invoices/invoice_id/payments
<?xml version="1.0" encoding="UTF-8"?>
<payment xmlns="http://www.blinksale.com/api">
<amount>1000.00</amount>
<date>2006-09-27</date>
</payment>

REST = REpresentational State Transfer

Twitter Example:
https://dev.twitter.com/docs/api/1.1 (GET and POST only)
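A client for such an API mostly assembles a method, a URL, and a body. The sketch below builds the requests with Python's standard `urllib.request` but does not send them. The `build_request` helper is hypothetical, `invoice_id` is the slide's placeholder, and the `Content-Type` value is an assumption; a real client should take the exact media type from the API documentation.

```python
import urllib.request

BASE_URL = "http://www.blinksale.com"
INVOICE_ID = "invoice_id"  # placeholder, as on the slide

payment_xml = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<payment xmlns="http://www.blinksale.com/api">'
    "<amount>1000.00</amount><date>2006-09-27</date>"
    "</payment>"
)


def build_request(method, path, body=None):
    """Assemble an HTTP request object without sending it (hypothetical helper)."""
    data = body.encode("utf-8") if body is not None else None
    # Assumed media type; the real API may require a vendor-specific one.
    headers = {"Content-Type": "application/xml"} if body is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


list_invoices = build_request("GET", "/invoices")
add_payment = build_request("POST", "/invoices/" + INVOICE_ID + "/payments", payment_xml)
# urllib.request.urlopen(add_payment) would actually send it; not called here.
```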
Database SELECT Statements
Oracle
SELECT * FROM relationships

MongoDB
db.relationships.find()

Cassandra (CQL)
SELECT * FROM relationships



Database SELECT Statements
Redis – Key-Value Store
SMEMBERS relationships

Riak – Key-Value Store with REST API (+ proprietary drivers)


http://localhost:8091/riak/relationships/likes

Neo4j (Cypher)
MATCH (n)-[r:LIKES]->(m) RETURN n,r,m



JOINS without Foreign Keys
original_id = ObjectId()

db.employer.insert({
"_id": original_id,
"name": "Broadway Tech",
"url": "bc.example.net" })

db.people.insert({
"name": "Erin",
"employer_id": original_id,
"url": "bc.example.net/Erin" })

“Erin” works at “Broadway Tech”


One of the employees at “Broadway Tech” is “Erin”

http://docs.mongodb.org/manual/reference/database-references/#document-references
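Without JOIN support, the application resolves the reference itself. The sketch below uses plain Python dictionaries as stand-ins for the two collections. The integer `1` stands in for the generated `ObjectId`, and the helper names are hypothetical.

```python
# Stand-ins for the employer and people collections.
employers = [{"_id": 1, "name": "Broadway Tech", "url": "bc.example.net"}]
people = [{"name": "Erin", "employer_id": 1, "url": "bc.example.net/Erin"}]

# Index employers by _id so that each lookup is a single dictionary access.
employers_by_id = {employer["_id"]: employer for employer in employers}


def employer_of(person):
    """Follow the manual reference from a person to their employer."""
    return employers_by_id.get(person["employer_id"])


def employees_of(employer):
    """Reverse direction: scan people for a matching employer_id."""
    return [person for person in people if person["employer_id"] == employer["_id"]]


erin_employer = employer_of(people[0])["name"]
staff = [person["name"] for person in employees_of(employers[0])]
```

Nothing stops a person from pointing at an `employer_id` that no longer exists. The database does not maintain the relationship, so the application has to handle a missing employer.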
Replication Challenge is Write Consistency
ACID, BASE, CAP, CPR
1979 Gray, 1983 Reuter & Härder - ACID
Atomic, Consistent, Isolated, Durable
Rollback: All or Nothing, Follows Rules, Simultaneous, No Drops
1997 Brewer - BASE
Basically Available, Soft-state, Eventually consistent
2000 Brewer – CAP (Pick Two)
Consistency, Availability, Partition Tolerance
CPR (Pick Two)
Consistency, Performance, Replication/Redundancy

Contrived - Stretch Definitions


CPR
Consistency, Performance, Redundancy: Pick Two

CPR
[Diagram: four devices holding A, B, C, D respectively]
Spread data across storage or computer

[Diagram: four devices holding ABCE, ABCE, ABCD, ABCD]
Updates may be inconsistent across devices

[Diagram: four devices each holding ABCD]
One Update Locks all Nodes
CRUD
Create
Read
Update
Delete



SQL CRUD
Create
INSERT INTO table (column1, column2) VALUES (9, 'string');
Read
SELECT column1, column2 FROM table;
Update
UPDATE table SET column2 = 'text' WHERE column1 = 9;
Delete
DELETE FROM table WHERE column2 = 'text';
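The four statements can be run end to end with Python's built-in `sqlite3` module. The sketch uses an in-memory database and a table named `items`, an assumed name chosen because `table` itself is a reserved word in SQL. The column names follow the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE items (column1 INTEGER, column2 TEXT)")

# Create
conn.execute("INSERT INTO items (column1, column2) VALUES (?, ?)", (9, "string"))
after_create = conn.execute("SELECT column1, column2 FROM items").fetchall()

# Update, then Read again
conn.execute("UPDATE items SET column2 = ? WHERE column1 = ?", ("text", 9))
after_update = conn.execute("SELECT column1, column2 FROM items").fetchall()

# Delete
conn.execute("DELETE FROM items WHERE column2 = ?", ("text",))
after_delete = conn.execute("SELECT column1, column2 FROM items").fetchall()

conn.close()
```

Unlike the schemaless stores discussed earlier, the `CREATE TABLE` step is mandatory here. The insert would fail if the table did not already exist.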

Redis CRUD
http://redis.io/commands
http://redis.io/topics/data-types-intro
http://openmymind.net/2011/11/8/Redis-Zero-To-Master-In-30-Minutes-Part-1/

Redis is an in-memory Key-Value Store which stores:
Strings, Hashes, Lists, Sets, or Sorted Sets

Strings: the value is stored as one opaque string; individual fields inside it cannot be altered
SET user:jim "{lastname: 'Mathews', salary: 125000}"
GET user:jim

Hashes: allow modification and retrieval of individual values
HSET user:jim salary 125000
HSET user:jim lastname Mathews
HGET user:jim salary
Redis CRUD
Lists: One-dimensional array with insert, append, pop, and push
redis.lpush('users:employees', 'user:jim')
redis.mget(redis.lrange('users:employees', 0, 5))

Sets: lists with no duplicate values (SADD = Set Add)
SADD users:employees jim
SADD users:employees krishna
SMEMBERS users:employees

Sorted Sets: sets with an added sorting value (score)
ZADD users:employees 125000 jim
ZADD users:employees 157000 krishna
ZRANGEBYSCORE users:employees 100000 180000
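To show what the sorted-set commands do without needing a running server, here is a plain-Python sketch of their semantics. The `SortedSet` class is hypothetical and is not the Redis client API. It only mimics how ZADD keeps one score per member and how ZRANGEBYSCORE returns members ordered by score.

```python
class SortedSet:
    """Toy model of a Redis sorted set: unique members, each with a score."""

    def __init__(self):
        self._scores = {}  # member -> score

    def zadd(self, score, member):
        # Adding an existing member again simply updates its score.
        self._scores[member] = score

    def zrangebyscore(self, low, high):
        matches = [(score, member) for member, score in self._scores.items()
                   if low <= score <= high]
        # Order by score; ties fall back to member name.
        return [member for score, member in sorted(matches)]


employees = SortedSet()
employees.zadd(125000, "jim")
employees.zadd(157000, "krishna")
in_range = employees.zrangebyscore(100000, 180000)
high_earners = employees.zrangebyscore(130000, 180000)
```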
Riak CRUD
Easy to install and configure test cluster
REST Queries

Create/PUT a “course:CIS2120” row


Key                 Value
course:CIS2120      {"name":"Database Coding", "days":"MWF"}

curl -v -X PUT http://localhost:8091/riak/course/CIS2120 \
-H "Content-Type: application/json" \
-d '{"name":"Database Coding", "days":"MWF"}'

Read/GET the value for “course:CIS2120”
curl -X GET http://localhost:8091/riak/course/CIS2120
curl http://localhost:8091/riak/course/CIS2120
Riak Links
Riak can link one key:value to another with a relationship

curl -v -X PUT http://localhost:8091/riak/student/sorensen \
-H "Content-Type: application/json" \
-H "Link: </riak/course/CIS2120>; riaktag=\"enrolled\"" \
-d '{"firstname":"Conner"}'

This does not automatically create a link back from “CIS2120” to “sorensen”

Neo4j

Neo4j – Graph Database
http://www.neo4j.org/learn/try

http://docs.neo4j.org/refcard/2.0/
MATCH (n)-[r:LIKES]->(m) RETURN n,r,m
Matches a person “n” that likes person “m”

https://gist.github.com/peterneubauer/6019125
http://gist.neo4j.org/?6019125

Neo4j CRUD
Must try dragging nodes at: http://www.neo4j.org/learn/try

MATCH (user {name:"Bill"})-[:KNOWS]->(colleague)
WHERE colleague.employer="LinkedIn"
RETURN user,colleague
ORDER BY colleague.name LIMIT 10

http://docs.neo4j.org/refcard/2.0/
MATCH (n)-[r:LIKES]->(m) RETURN n,r,m
Matches a person “n” that likes person “m”
MATCH (n)-[r]->(m) RETURN n,r,m
Matches any relationship between “n” and “m”

http://www.neo4j.org/learn/cypher
Neo4j
(LUKE {name:"Luke Skywalker"}),
(HAN {name:"Han Solo"}),
(LEIA {name:"Princess Leia Organa"}),
(OBI_WAN {name:"Obi Wan Kenobi"}),
(YODA {name : "Yoda"}),
(VADER {name:"Darth Vader"}),
(C3PO {name:"C3PO", droid:true}),
(R2D2 {name:"R2D2", droid:true}),
(CHEWBACCA {name:"Chewbacca"}),
(TATOOINE {name:"Tatooine", distance:13184}),
(DAGOBAH {name:"Dagobah", distance:15407}),
(JEDI {name:"Jedi"}),
(SITH {name:"Sith"}),
(REBELLION {name:"Rebellion"}),
(EMPIRE {name:"Empire"}),
(DARK_SIDE {name:"Dark Side"}),
(LIGHT_SIDE {name:"Light Side"}),
…
(LUKE)-[:FRIENDS_WITH]->(HAN),
(LUKE)-[:FRIENDS_WITH]->(LEIA),
(HAN)-[:FRIENDS_WITH]->(CHEWBACCA),
(YODA)-[:TEACHES]->(OBI_WAN),
(YODA)-[:TEACHES]->(LUKE),
(OBI_WAN)-[:TEACHES]->(LUKE),
(OBI_WAN)-[:KNOWS]->(VADER),
(LUKE)-[:KNOWS]->(R2D2),
(R2D2)-[:KNOWS]->(C3PO),
(LUKE)-[:LIVED_ON]->(TATOOINE),
(HAN)-[:LIVED_ON]->(CORELLIA),
(LEIA)-[:LIVED_ON]->(ALDERAAN),
(YODA)-[:LIVED_ON]->(DAGOBAH),
(LUKE)-[:DEVOTED_TO]->(JEDI),
(LUKE)-[:DEVOTED_TO]->(REBELLION),
(LUKE)-[:DEVOTED_TO]->(LIGHT_SIDE),
(VADER)-[:DEVOTED_TO]->(SITH),
(VADER)-[:DEVOTED_TO]->(EMPIRE),
(VADER)-[:DEVOTED_TO]->(DARK_SIDE),
(LEIA)-[:DEVOTED_TO]->(REBELLION),
(HAN)-[:DEVOTED_TO]->(REBELLION)
…

https://gist.github.com/peterneubauer/6019125
http://gist.neo4j.org/?6019125

MATCH y-[r]-other
WHERE y.name='Yoda'
return y.name, type(r), other.name
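The Yoda query matches relationships in either direction. A rough Python equivalent, assuming the graph is held as a list of (start, type, end) triples copied from a few of the relationships on this slide, looks like this. The function names are illustrative, and node identifiers are used in place of the `name` properties.

```python
# A small subset of the slide's relationships as (start, type, end) triples.
relationships = [
    ("YODA", "LIVED_ON", "DAGOBAH"),
    ("YODA", "TEACHES", "OBI_WAN"),
    ("YODA", "TEACHES", "LUKE"),
    ("OBI_WAN", "TEACHES", "LUKE"),
    ("LUKE", "FRIENDS_WITH", "HAN"),
]


def match_any_direction(node):
    """Like MATCH y-[r]-other: relationships touching node, either direction."""
    rows = []
    for start, rel_type, end in relationships:
        if start == node:
            rows.append((node, rel_type, end))
        elif end == node:
            rows.append((node, rel_type, start))
    return rows


def match_outgoing(node, rel_type):
    """Like MATCH (n)-[:TYPE]->(m): follow one relationship type outward."""
    return [end for start, kind, end in relationships
            if start == node and kind == rel_type]


yoda_rows = match_any_direction("YODA")
yoda_students = match_outgoing("YODA", "TEACHES")
```

A graph database avoids the full scan used here by storing each node's relationships next to the node, so traversals stay cheap as the graph grows.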
Google BigTable
• White Paper published in 2006
• Many databases based upon BigTable
• 13 pages, readable for many non-techies
• Insightful into the early days of NoSQL
http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf

HBase
Large-Scale, Column-oriented database
Consistency, Performance, Fault-Tolerant, ACID via Locking
Tables are created before initial data is added
Tables have:
row keys – indexed row identifier strings
column families – contain one or more columns
timestamps – for version control

HBase
Row key is a unifier for column families.
If a row does not insert values in a column family, no disk space
is utilized within that column family.

Keys are identified by column_family:column_name
text:
revision:author
revision:comment

Write-Ahead Logging (WAL): similar to file system journaling

HBase CRUD
create 'wiki_table', 'text_column_family', 'revision_column_family'
create 'wiki', 'text', 'revision'
put 'wiki', 'first page', 'text:', '…'
put 'wiki', 'first page', 'revision:author', '…'
get 'wiki', 'first page', ['revision:author', 'revision:comment']
delete 'wiki', 'first page', 'revision:author'
scan 'wiki' = SELECT * FROM wiki

Seven Databases in Seven Weeks, Redmond & Wilson 2012
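The data model behind these shell commands (row key, then `family:column`, then timestamped versions) can be sketched with nested dictionaries. The `Table` class below is a hypothetical in-memory model and not the HBase client API. The cell values and timestamps are made up for illustration.

```python
import time


class Table:
    """Toy model: row key -> "family:column" -> list of (timestamp, value)."""

    def __init__(self, *column_families):
        self.families = set(column_families)  # fixed when the table is created
        self.rows = {}

    def put(self, row_key, column, value, timestamp=None):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError("unknown column family: " + family)
        stamp = time.time() if timestamp is None else timestamp
        self.rows.setdefault(row_key, {}).setdefault(column, []).append((stamp, value))

    def get(self, row_key, column):
        versions = self.rows.get(row_key, {}).get(column, [])
        # Return the value with the newest timestamp, or None if never written.
        return max(versions, key=lambda version: version[0])[1] if versions else None


wiki = Table("text", "revision")
wiki.put("first page", "text:", "draft", timestamp=1)
wiki.put("first page", "text:", "final", timestamp=2)
wiki.put("first page", "revision:author", "jim", timestamp=2)

latest_text = wiki.get("first page", "text:")
missing = wiki.get("first page", "revision:comment")  # never written, takes no space
```

Column families must be declared up front, while individual columns inside a family can appear on any row at any time.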


MongoDB Document Store
db.courses.insert({
  name: "CIS2120",
  description: "Database Coding",
  instructor: {
    name: "David Williams",
    email: "david.williams@usu.edu"
  },
  instructor2: {
    name: "John Kerley-Weeks",
    office: "JQL230"
  },
  subjects: ["Python", "MongoDB", "3NF", "ETL", "Star Schema"]
})
MongoDB vs SQL
http://docs.mongodb.org/manual/reference/sql-comparison/
MongoDB -> SQL Terminology
Collection -> Table
Document -> Row
Field -> Column
db.courses.find() = SELECT * FROM courses
db.courses.count() = SELECT COUNT(*) FROM courses
db.courses.find({name: "CIS2120"}) = SELECT * FROM courses WHERE name = 'CIS2120'

MongoDB Simple Database
http://media.mongodb.org/zips.json
{"city": "ACMAR", "loc": [-86.51557, 33.584132], "pop": 6055, "state": "AL", "_id": "35004"}
{"city": "ADAMSVILLE", "loc": [-86.959727, 33.588437], "pop": 10616, "state": "AL", "_id": "35005"}
{"city": "ADGER", "loc": [-87.167455, 33.434277], "pop": 3205, "state": "AL", "_id": "35006"}
{"city": "KEYSTONE", "loc": [-86.812861, 33.236868], "pop": 14218, "state": "AL", "_id": "35007"}
{"city": "NEW SITE", "loc": [-85.951086, 32.941445], "pop": 19942, "state": "AL", "_id": "35010"}
{"city": "ALPINE", "loc": [-86.208934, 33.331165], "pop": 3062, "state": "AL", "_id": "35014"}
{"city": "ARAB", "loc": [-86.489638, 34.328339], "pop": 13650, "state": "AL", "_id": "35016"}
{"city": "BAILEYTON", "loc": [-86.621299, 34.268298], "pop": 1781, "state": "AL", "_id": "35019"}
{"city": "BESSEMER", "loc": [-86.947547, 33.409002], "pop": 40549, "state": "AL", "_id": "35020"}
{"city": "HUEYTOWN", "loc": [-86.999607, 33.414625], "pop": 39677, "state": "AL", "_id": "35023"}
{"city": "BLOUNTSVILLE", "loc": [-86.568628, 34.092937], "pop": 9058, "state": "AL", "_id": "35031"}
{"city": "BREMEN", "loc": [-87.004281, 33.973664], "pop": 3448, "state": "AL", "_id": "35033"}
{"city": "BRENT", "loc": [-87.211387, 32.93567], "pop": 3791, "state": "AL", "_id": "35034"}
{"city": "BRIERFIELD", "loc": [-86.951672, 33.042747], "pop": 1282, "state": "AL", "_id": "35035"}

{"city": "Logan, UT", "additionally": ["Nibley, UT", "River Heights, UT"], "state": "UT", "version": "2.1", "_id": "84321"}
{"city": "Olivehurst, CA", "additionally": ["Arboga, CA", "Plumas Lake, CA", "West Linda, CA"], "state": "CA", "version": "2.1", "_id": "95961"}
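The last two documents use a different shape from the rest (no `pop` or `loc`, plus `additionally` and `version`), yet they can sit in the same collection. The sketch below shows how application code might cope with that. The `find` and `population` helpers are hypothetical stand-ins that work on plain Python dictionaries, not on a MongoDB driver.

```python
# Three documents from the slide, two in the original shape and one "version 2.1".
zips = [
    {"city": "ACMAR", "loc": [-86.51557, 33.584132], "pop": 6055, "state": "AL", "_id": "35004"},
    {"city": "ADAMSVILLE", "loc": [-86.959727, 33.588437], "pop": 10616, "state": "AL", "_id": "35005"},
    {"city": "Logan, UT", "additionally": ["Nibley, UT", "River Heights, UT"],
     "state": "UT", "version": "2.1", "_id": "84321"},
]


def find(documents, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [doc for doc in documents
            if all(doc.get(field) == value for field, value in criteria.items())]


def population(doc):
    """Fields are optional, so read them defensively."""
    return doc.get("pop")  # None for documents that lack the field


alabama = [doc["city"] for doc in find(zips, state="AL")]
populations = [population(doc) for doc in zips]
```

The flexibility moves schema handling into the application. Every reader of the collection has to know which fields each document version may omit.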

Cassandra Characteristics
Scalable, High-availability Wide-columnar datastore
Peer-to-peer rather than master-slave clusters
Tunable consistency: can read/write to a single node,
a quorum of nodes, or all nodes
Recommends static and dynamic column families
Static column families contain pre-defined columns
Contact Info: phone, address, email, web
Dynamic families have variable numbers of similar columns
Students enrolled in a course
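The effect of tunable consistency can be reasoned about with a little arithmetic. A quorum is a majority of the replicas, and a read is guaranteed to see the latest acknowledged write whenever the number of nodes read plus the number of nodes written exceeds the number of replicas. The function names below are illustrative and the replica count of 3 is an assumed example.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def overlaps(read_nodes, write_nodes, replication_factor):
    """True when every read set must share at least one node with every write set."""
    return read_nodes + write_nodes > replication_factor


replicas = 3  # assumed number of copies of each row
q = quorum(replicas)

quorum_both = overlaps(q, q, replicas)             # quorum reads + quorum writes
single_both = overlaps(1, 1, replicas)             # fastest, but reads may be stale
write_all_read_one = overlaps(1, replicas, replicas)
```

Reading and writing a single node gives the best latency but allows stale reads. Quorum on both sides trades some latency for overlap between readers and writers.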

Cassandra CRUD
http://www.datastax.com/docs/0.8/references/cql
http://cassandra.apache.org/doc/cql3/CQL.html#selectStmt

CREATE TABLE course (


name text PRIMARY KEY,
instructor text,
maxstudents int
)

INSERT INTO course (name, instructor, maxstudents) VALUES
('CIS2120', 'Williams', 28)

UPDATE course SET maxstudents=26 WHERE name='CIS2120'

SELECT name, instructor FROM course WHERE maxstudents > 20

Cassandra CRUD
No JOIN operations or FOREIGN KEYS

CREATE TABLE people (


name text PRIMARY KEY,
email text,
phones map<text, text>
)

INSERT INTO people (name, email, phones)
VALUES ('John Weeks', 'john.weeks@usu.edu',
{'mobile' : '555-1212', 'office' : '797-7133', 'fax' : '555-1212'})

UPDATE people SET phones['office'] = '555-1212'
WHERE name = 'John Weeks'
Questions
???
