EUC1502 Module5 Big-Data
● Data storage
● Data processing
● Data mining
● Data visualisation
● Business Intelligence Systems
Outline
Miner Miner
Vis.js D3.js
CartoDB Plot.ly
Tableau QlikView
R HighCharts
A Survey on Tools
- Business Intelligence
Pentaho Actuate
SpagoBI JasperReports
Tableau QlikView
Palo Tactic
NoSQL databases:
Bob’s friends
The graph data model means that data are modelled as a graph.
Property graph
Nodes
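As a minimal sketch of the property graph idea, nodes and relationships can each carry key-value properties, and a query such as "Bob's friends" becomes a traversal over relationships. The names, the `FRIEND_OF` relationship type, and the `since` property below are hypothetical, chosen only for illustration; a real graph database would expose its own query language instead of this in-memory structure.

```python
# Nodes: id -> properties (hypothetical sample data).
nodes = {
    1: {"label": "Person", "name": "Bob"},
    2: {"label": "Person", "name": "Alice"},
    3: {"label": "Person", "name": "Carol"},
}

# Relationships: (source id, type, target id, properties).
edges = [
    (1, "FRIEND_OF", 2, {"since": 2010}),
    (1, "FRIEND_OF", 3, {"since": 2014}),
    (2, "FRIEND_OF", 3, {"since": 2012}),
]


def friends_of(name):
    """Return the names reached by outgoing FRIEND_OF edges from `name`."""
    source_ids = {i for i, props in nodes.items() if props["name"] == name}
    return sorted(
        nodes[dst]["name"]
        for src, rel, dst, _props in edges
        if src in source_ids and rel == "FRIEND_OF"
    )


bobs_friends = friends_of("Bob")
print(bobs_friends)
```

Relationships are treated as directed here, so only edges that start at Bob are followed.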
Processing modes: BATCH (volume), STREAMING (velocity), HYBRID (volume and velocity)
● Batch processing for large volumes of information (e.g. DNA
sequencing)
● Stream processing for rapidly generated data (e.g. Twitter)
● Hybrid processing for large volumes of rapidly generated data (e.g.
in-depth analysis of tweets)
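The three modes above can be contrasted with a small sketch, assuming a simple hashtag-counting task. Everything here (the function and class names, the sample hashtags, and the choice to merge a batch result with a live stream result) is an illustrative assumption rather than the method of any particular framework.

```python
def batch_count(records):
    """Batch: the whole dataset is available and is processed in one pass."""
    counts = {}
    for record in records:
        counts[record] = counts.get(record, 0) + 1
    return counts


class StreamCounter:
    """Streaming: records arrive one at a time and the result is updated incrementally."""

    def __init__(self):
        self.counts = {}

    def update(self, record):
        self.counts[record] = self.counts.get(record, 0) + 1
        return self.counts[record]


def hybrid_view(batch_counts, stream_counts):
    """Hybrid: combine the batch result with the incremental stream result."""
    merged = dict(batch_counts)
    for key, value in stream_counts.items():
        merged[key] = merged.get(key, 0) + value
    return merged


historical = ["#bigdata", "#nosql", "#bigdata"]  # stored data (volume)
live = ["#nosql", "#spark"]                      # arriving data (velocity)

batch = batch_count(historical)
stream = StreamCounter()
for tag in live:
    stream.update(tag)

combined = hybrid_view(batch, stream.counts)
print(combined)
```

The batch function needs all records up front, whereas the stream counter always has a current answer after each record; the hybrid view simply adds the two.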
Data Processing
- Processing steps
Map/Reduce paradigm:
● Map: The Map process divides the data into subsets and sends them to each
processing node in key-value format <K, V>
● Reduce: Each node returns its result in key-list of values format <K, L (V)>
and the results are combined to produce the final result
Word count example:
● Map: A line of text is sent to each node, where the key K is the line number
and the value V is the line of text <nline, text>. The result of the task is a list
of pairs <word, 1>, one for each word in the text.
● Reduce: It collects all the outputs of the Map processes as pairs <key, value>, i.e.
<word, 1>, and groups them into pairs <word, occurrences> by summing the
ones for each word
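The word count described above can be sketched in plain Python, running the phases sequentially in a single process instead of on distributed nodes. The function names and the sample lines are assumptions made for this illustration; the lowercasing of words is also an added simplification.

```python
from collections import defaultdict


def map_phase(line_number, text):
    """Map: take <nline, text> and emit a list of <word, 1> pairs."""
    return [(word.lower(), 1) for word in text.split()]


def shuffle(mapped_pairs):
    """Group the emitted values by key, giving <K, L(V)>."""
    grouped = defaultdict(list)
    for word, one in mapped_pairs:
        grouped[word].append(one)
    return grouped


def reduce_phase(word, ones):
    """Reduce: sum the ones of a word to obtain <word, occurrences>."""
    return word, sum(ones)


def word_count(lines):
    mapped = []
    # Each line would go to a different node; here they are mapped in turn.
    for nline, text in enumerate(lines, start=1):
        mapped.extend(map_phase(nline, text))
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, ones) for word, ones in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

The grouping step between Map and Reduce is made explicit here because it is what turns the individual <word, 1> pairs into the <K, L (V)> form that Reduce consumes.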
Data Processing
- Batch processing
© autoritas Cosmos-intelligence
Data Processing
- Stream processing
KESTREL trident
Data Processing
- Hybrid processing
SUMMINGBIRD
● Graph Databases. Ian Robinson, Jim Webber and Emil Eifrem. O’Reilly.
http://neo4j.com/books/graph-databases/
http://www.springer.com/us/book/9781441984616
https://www.cs.cornell.edu/home/kleinber/networks-book/
References