Chapter 2a: Non-Structured Data


CHAPTER 2 (a)

Non-Structured Data
ISP610 BUSINESS DATA ANALYTICS

Rozianiwati Binti Yusof

References:

Ruhaila Maskat (PhD)-ITS480


Wikipedia
SearchDataManagement
3pillarglobal
April 17 1
By the end of this lesson, you should know:
• What NoSQL databases are.
• How they differ from SQL databases.
• Types of NoSQL databases.

NoSQL databases
• Non SQL, Non relational, or Not only SQL.
• Stores and retrieves data that is not modelled in rows and columns.
• “Not only SQL”: may support SQL-like query languages.

Applications of NoSQL databases
• The NoSQL distributed database infrastructure has been the solution
to handling some of the biggest data warehouses on the planet – i.e.
the likes of Google, Amazon, and the CIA.
• Airbus
http://medianetwork.oracle.com/video/player/4662924811001

NoSQL vs SQL

NoSQL
1. Non-relational model.
2. Stores data in JSON, key/value, graphs, columns.
3. New properties can be added on the fly.
4. Good for semi-structured, complex or nested data.
5. Relationships are captured by denormalizing data and presenting all data for an object in a single record.
6. Dynamic/flexible schema.

SQL
1. Relational model.
2. Stores data in a table.
3. Adding a new property may require altering schemas.
4. Good for structured data.
5. Relationships are captured in normalised model using joins to resolve references across tables.
6. Strict schema.

Case study: Building a social media website
• Users can post articles with related media such as pictures, videos, or even music.
• Users can comment on posts and give points for ratings.
• Users can see a feed of posts.
• Users can interact with the main website.

Relational model

NoSQL model
In general:
• One query.
• No JOINS.
• No schema is maintained.
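
The contrast between the two models can be sketched with plain Python dictionaries standing in for a real database. This is only an illustration: the table names, field names and sample values below are hypothetical and are not taken from the case study.

```python
# Relational style: data is normalised across "tables" and joined at read time.
users = {1: {"name": "Ali"}}
posts = {10: {"user_id": 1, "title": "My first post"}}
comments = [
    {"post_id": 10, "user_id": 1, "text": "Nice!", "points": 5},
]


def load_post_relational(post_id):
    """Resolve references across three tables (the work a JOIN would do)."""
    post = posts[post_id]
    return {
        "title": post["title"],
        "author": users[post["user_id"]]["name"],
        "comments": [
            {
                "author": users[c["user_id"]]["name"],
                "text": c["text"],
                "points": c["points"],
            }
            for c in comments
            if c["post_id"] == post_id
        ],
    }


# NoSQL (document) style: everything about the post lives in a single record.
post_documents = {
    10: {
        "title": "My first post",
        "author": "Ali",
        "comments": [{"author": "Ali", "text": "Nice!", "points": 5}],
    }
}


def load_post_document(post_id):
    """One lookup, no joins."""
    return post_documents[post_id]
```

Both functions return the same post, but the document version pays for its simpler read with duplicated data (the author name is stored in more than one place).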

Types of NoSQL databases

• Key-value
• Column / BigTable
• Document
• Graph

Key-value database
• The most basic type and the backbone of many NoSQL implementations.
• Underneath is a hash table in which a unique key points to a specific item of data.
• Works by matching keys with values, like a dictionary.
• Give a key (e.g. the_answer_to_life) and receive a matching value (e.g. 42).
• The database is a global collection of key-value pairs.
• As the volume of data increases, maintaining unique values as keys may become more difficult.
• Examples: Riak, Amazon DynamoDB, Oracle NoSQL.
KEY
The key in a key-value pair must (or at least, should) be unique. This is the
unique identifier that allows you to access the value associated with that key.

In theory, the key could be anything. But this may depend on the
DBMS. One DBMS may impose limitations while another may impose none.
In Redis for example, the maximum allowed key size is 512 MB. You can use any
binary sequence as a key, from a short string of text, to the contents of an
image file. Even the empty string is a valid key.

However, for performance reasons, you should avoid keys that are too long, while keys that are too
short can hurt readability. In any case, the key should follow an agreed convention in order to
keep things consistent.
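
One possible convention is to join the parts of a key with a colon, as in keys of the form artist:1:name. A minimal sketch, assuming a plain Python dictionary as the store; the helper name make_key is hypothetical:

```python
def make_key(*parts):
    """Build a key such as 'artist:1:name' by joining the parts with ':'."""
    return ":".join(str(part) for part in parts)


store = {}  # a plain dict standing in for the key-value store
store[make_key("artist", 1, "name")] = "AC/DC"
store[make_key("artist", 1, "genre")] = "Hard Rock"
```

Building every key through one helper keeps the convention in a single place, so it can be changed without hunting for hand-written key strings.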
THE VALUE
The value in a key-value store can be anything, such as text (long or short), a number, markup code such as HTML,
programming code such as PHP, an image, etc.

The value could also be a list, or even another key-value pair encapsulated in an object.
Some key-value DBMSs allow you to specify a data type for the value. For example, you could specify that the
value should be an integer. Other DBMSs don’t provide this functionality and therefore, the value could be of any
type.

As an example, the Redis DBMS allows you to specify the following data types:
• Binary-safe strings.
• Lists: collections of string elements sorted according to the order of insertion.
• Sets: collections of unique, unsorted string elements.
• Sorted sets: similar to sets, but every string element is associated with a floating-point number value called a score. This allows you to do things like select the top 10 or the bottom 10.
• Hashes: maps composed of fields associated with values. Both the field and the value are strings.
• Bit arrays (or simply bitmaps).
• HyperLogLogs: a probabilistic data structure used to estimate the cardinality of a set.
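
To get a feel for these value types, here is a sketch using plain Python analogues. It does not use Redis itself; the keys, members and scores are hypothetical, and top_n is an illustrative helper rather than a Redis command.

```python
# Plain-Python stand-ins for some of the value types listed above.
store = {
    "greeting": b"hello",                                # binary-safe string
    "recent_orders": ["o1", "o2", "o3"],                 # list: insertion order kept
    "tags": {"rock", "country"},                         # set: unique, unordered
    "scores": {"ann": 91.5, "bob": 78.0, "cy": 85.25},   # sorted set: member -> score
    "user:1": {"name": "Bob", "phone": "(123) 456-7890"},  # hash: field -> value
}


def top_n(sorted_set, n):
    """Return the n members with the highest scores, best first."""
    return sorted(sorted_set, key=sorted_set.get, reverse=True)[:n]
```

Calling top_n(store["scores"], 2) picks the two highest-scoring members, which is the kind of "top 10" query a sorted set is meant for.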
Example data

This data can be written in key : value form:

India: {B-25, Sector-58, India-201301}
Romania: {…..}
US: {…}
Phone Directory

Key      Value
Bob      (123) 456-7890
Jane     (234) 567-8901
Tara     (345) 678-9012
Tiara    (456) 789-0123

Artist Info

Key               Value
artist:1:name     AC/DC
artist:1:genre    Hard Rock
artist:2:name     Slim Dusty
artist:2:genre    Country

Stock Trading

This example uses a list as the value. The list contains the stock ticker, whether it's a “buy” or “sell” order,
the number of shares, and the price.

Key          Value
123456789    APPL, Buy, 100, 84.47
234567890    CERN, Sell, 50, 52.78
345678901    JAZZ, Buy, 235, 145.06
456789012    AVGO, Buy, 300, 124.50
Storage

• Any reads and writes of values use the key.
• Key can be synthetic or auto-generated.
• Value can be String, JSON, BLOB etc.

Basic reading and writing
• Get(key), returns the value associated with the provided key.
• Put(key, value), associates the value with the key.
• Multi-get(key1, key2, .., keyN), returns the list of values associated
with the list of keys.
• Delete(key), removes the entry for the key from the data store.
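
The four operations above can be sketched as a small in-memory class. This is a toy built on a Python dictionary, not the API of any particular product; the class and method names are assumptions chosen to mirror the list.

```python
class KeyValueStore:
    """A toy in-memory key-value store backed by a hash table (dict)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Associate the value with the key, overwriting any previous value."""
        self._data[key] = value

    def get(self, key):
        """Return the value for the key, or None if the key is absent."""
        return self._data.get(key)

    def multi_get(self, *keys):
        """Return the values for the given keys, in the same order."""
        return [self._data.get(key) for key in keys]

    def delete(self, key):
        """Remove the entry for the key; do nothing if it is absent."""
        self._data.pop(key, None)


phone_book = KeyValueStore()
phone_book.put("Bob", "(123) 456-7890")
phone_book.put("Jane", "(234) 567-8901")
```

Every operation goes through the key; there is no way to ask this store for "the entry whose value starts with (234)" without scanning everything.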

Column/BigTable
• Extends the simple key/value model.
• Does not require a pre-structured table to work with the data.
• Works by creating collections of one or more key/value pairs.
• Can be viewed as a two-dimensional array in which each key has one or more key/value pairs attached to it.
• Two groups: column-store and column-family store.
• Column-family store: Bigtable, HBase, Hypertable, and Cassandra.
• Column-store: Sybase IQ, C-store, Vertica, VectorWise, MonetDB,
ParAccel and Infobright.
Some key benefits of columnar databases include:
• Compression. Column stores are very efficient at data compression and/or partitioning.
• Aggregation queries. Due to their structure, columnar databases perform particularly well with aggregation queries (such as SUM, COUNT, AVG, etc).
• Scalability. Columnar databases are very scalable. They are well suited to massively parallel processing (MPP), which involves having data spread across a large cluster of machines, often thousands of machines.
• Fast to load and query. Columnar stores can be loaded extremely fast. A billion row table could be loaded within a few seconds. You can start querying and analysing almost immediately.
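
Why aggregation suits a columnar layout can be seen in a small sketch. It reuses the stock-trading values from the key-value example; the field names ticker, shares and price are assumptions, and Python lists stand in for on-disk storage.

```python
# Row-oriented layout: one record per row.
rows = [
    {"ticker": "APPL", "shares": 100, "price": 84.47},
    {"ticker": "CERN", "shares": 50, "price": 52.78},
    {"ticker": "JAZZ", "shares": 235, "price": 145.06},
    {"ticker": "AVGO", "shares": 300, "price": 124.50},
]

# Column-oriented layout: one array per column, built from the same data.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_shares_rows(rows):
    """SUM(shares) over rows: every record is visited, including unused fields."""
    return sum(row["shares"] for row in rows)


def total_shares_columns(columns):
    """SUM(shares) over columns: only the 'shares' array is read."""
    return sum(columns["shares"])
```

Both functions give the same total, but the columnar version touches only the one column the query needs, and a column of similar values is also easier to compress.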
Column-store, position-based

Column-store, rowid-based

Column-family

• The outermost keys 3PillarNoida, 3PillarCluj, 3PillarTimisoara and 3PillarFairfax are analogous to rows.
• ‘address’ and ‘details’ are called column families.
• The column family ‘address’ has columns ‘city’ and ‘pincode’.
• The column family ‘details’ has columns ‘strength’ and ‘projects’.
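
The row key, column family and column nesting described above can be sketched as nested Python dictionaries. Only the key, family and column names come from the slide; every value below is a hypothetical placeholder, and get_column is an illustrative helper.

```python
# row key -> column family -> column -> value (all values are placeholders)
offices = {
    "3PillarNoida": {
        "address": {"city": "Noida", "pincode": "201301"},
        "details": {"strength": 250, "projects": 20},
    },
    "3PillarCluj": {
        # Rows need not carry the same columns within a family.
        "address": {"city": "Cluj"},
        "details": {"strength": 200},
    },
}


def get_column(table, row_key, family, column):
    """Look up one cell; return None if the row, family or column is missing."""
    return table.get(row_key, {}).get(family, {}).get(column)
```

A missing column simply is not stored, which is how a column-family store avoids the empty cells a fixed relational table would need.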

Document database

• A collection of key-value pairs, but the values stored (referred to as “documents”) provide some structure and encoding of the managed data, e.g. XML, JSON, BSON. The unique key is a simple identifier (string, URI, path).
• Embeds attribute metadata associated with the content, which provides a way to query data based on its contents. An API is used to retrieve data based on content, and also allows editing of content and metadata.
• While key-value stores require the key to access a data value, a document store has metadata that allows access directly by attribute instead of through a key.
• Examples: CouchDB, MongoDB.
Document database

• Document is the most basic unit of data.
• Documents are ordered sets of key-value pairs.
• Each document contains one or more name-value pairs.

Example (Document 1), where _id is the key and the remaining entries are name-value pairs:
{
_id : 978,
“Title” : “The Linux Command Line”,
“Author” : “William Shotts”
}
Documents are gathered together in collections within the database.

Collections should make sense, e.g. books, webstore, retail store, fruits.
Hence, a document database is unstructured and schemaless.

Since we are so used to relational db…

Relational Databases    Document Databases
Databases               Databases or Buckets
Tables                  Collections or Type Signifiers
Rows                    Documents
Columns                 Attributes/Names
Index                   Index

Document database
• We can store different schemas in different documents and these documents reside in the same collection.
Example (one collection holding two documents):

Document 1
{
_id : 1,
“ISBN” : “978”,
“Title” : “The Linux Command Line”
}

Document 2
{
_id : 2,
“ASIN” : “B00J”,
“Item” : “Cherry Barbeque Sauce”
}
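
A collection of documents with different schemas, queried by attribute rather than by key, can be sketched in plain Python. The find function is a hypothetical helper written for this illustration, not the API of MongoDB or CouchDB.

```python
# One collection holding two documents with different attributes.
collection = [
    {"_id": 1, "ISBN": "978", "Title": "The Linux Command Line"},
    {"_id": 2, "ASIN": "B00J", "Item": "Cherry Barbeque Sauce"},
]


def find(collection, **criteria):
    """Return the documents whose attributes match every name=value criterion."""
    return [
        doc
        for doc in collection
        if all(name in doc and doc[name] == value for name, value in criteria.items())
    ]
```

For example, find(collection, ISBN="978") returns the book document without knowing its _id, and a document that lacks the requested attribute is simply skipped.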

Document database

• We can have a more complicated structure. Example, where “Author” holds a list of values:

{
_id : “978”,
“Title” : “Data Science”,
“Author” : [“William Jackson”, “Ben Ten”]
}

Graph database
• Uses graph structures with nodes, edges and properties.
• Nodes are organised based on their relationships with one another.
• These relationships are represented by edges between the nodes.
• Relationships can define social connections.
• Both nodes and relationships have defined properties.
• Example: Neo4j.

Here is a comparison between the classic relational model and the graph model:

Relational model    Graph model
Tables              Vertices and Edges set
Rows                Vertices
Columns             Key/value pairs
Joins               Edges
Use case
• People who like this product usually like that product.
• Mary is friends with George. George likes pizza. George has visited Japan. Thus, we can ask: which friends of Mary’s friends like the food that Mary’s friend likes but have not visited the place that Mary’s friend has visited?
• You are more likely to be friends with Abu because you know Ali, and Abu is Ali’s friend.
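
The Mary and George question can be sketched as a traversal over a list of edges. This is plain Python rather than a graph database query language. Siti and Ken are hypothetical nodes added so that the query has something to return, and the relationship labels FRIEND, LIKES and VISITED are assumptions.

```python
# Edges stored as (source, relationship, target) triples.
edges = [
    ("Mary", "FRIEND", "George"),
    ("George", "LIKES", "pizza"),
    ("George", "VISITED", "Japan"),
    # Hypothetical extra nodes, not part of the original use case:
    ("George", "FRIEND", "Siti"),
    ("Siti", "LIKES", "pizza"),
    ("George", "FRIEND", "Ken"),
    ("Ken", "LIKES", "pizza"),
    ("Ken", "VISITED", "Japan"),
]


def neighbours(node, relationship):
    """Follow edges of one relationship type; FRIEND is treated as two-way."""
    targets = {t for s, r, t in edges if s == node and r == relationship}
    if relationship == "FRIEND":
        targets |= {s for s, r, t in edges if t == node and r == relationship}
    return targets


def suggest(person):
    """Friends of friends who share a friend's taste in food but not their travels."""
    results = set()
    for friend in neighbours(person, "FRIEND"):
        foods = neighbours(friend, "LIKES")
        places = neighbours(friend, "VISITED")
        for fof in neighbours(friend, "FRIEND") - {person}:
            likes_same_food = bool(neighbours(fof, "LIKES") & foods)
            visited_same_place = bool(neighbours(fof, "VISITED") & places)
            if likes_same_food and not visited_same_place:
                results.add(fof)
    return results
```

With this data, suggest("Mary") keeps Siti (likes pizza, has not visited Japan) and drops Ken (has visited Japan). In a relational model the same question would need several self-joins on a friendship table.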

Graph database

SOCIAL NETWORK
QUESTIONS:
• Convert the XML script into: a key-value database, a column-family store, and a document database.
• Convert the relational database into a document database.
