Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 19

Module 5

Data Analytics for IOT


Syllabus
 Introduction
 Apache Hadoop
 Using Hadoop Map Reduce for Batch Data Analysis
 Apache Oozie
 Apache Spark
 Apache Storm
 Using Apache Storm for Real-time Data Analysis
 Structural Health Monitoring Case Study.
Apache Hadoop

 It is a open source framework for distributed batch


processing of big data.

 MapReduce is a parallel programming model suitable


for analysis of big data

◦ Map Reducing Programming Model


◦ Hadoop MapReduce Job Execution.
◦ MapReduce Job Execution Overflow
Hadoop Cluster Setup
 Hadoop filesystem HDFS is highly fault
tolerant.

 Preferred OS- LINUX


 OTHER OS_ Cygwin Environment.
STEPS INVOVLVED IN SETTTING UP A
HADOOP CLUSTER:

 Install Java.

 Install Hadoop.

 Configure Network.

 Configure Hadoop

 Starting and Stopping Hadoop Cluster


Install Java
 Hadoop requires Java 6 or later version.

 Commands for installing java:

#set the properties

sudo apt-get –q –y install python-software-properties


sudo add-apt-repository –y ppa:webupd8team/java
sudo apt-get –q –y update
#state that you accepted the license

echo debconf shared/accepted-oracle-license-v1-1 select true |


sudo debconf-set-selections
echo debconf shared/accepted-oracle-license-v1-1 seen true |
sudo debconf-set-selections
Using Hadoop Map Reduce for Batch
Data Analysis
Apache Oozie
 Apache Oozie is a scheduler system to manage &
execute Hadoop jobs in a distributed environment.

 Apache Oozie is a workflow scheduler for Hadoop. It


is a system which runs the workflow of dependent jobs.
Here, users are permitted to create Directed Acyclic
Graphs of workflows, which can be run in parallel and
sequentially in Hadoop.
Apache Spark
Apache Storm
Using Apache Storm for Real-time Data
Analysis
Structural Health Monitoring Case
Study.

You might also like