Hadoop - Session 8 Oozie MRV2
www.Bigdatainpractice.com
[Diagram: Hadoop MR V1 cluster. Name Node, Secondary Name Node, and Job Tracker (Map Reduce) over data nodes D2 to D8]
12/19/2015
[Diagram: data from a UNIVERSE of diverse business applications (APP1, APP2, web logs, emails and unstructured documents, EDW, flat files) is loaded into the HADOOP cluster, processed by the MapReduce Framework, and surfaced through reporting tools as extracted knowledge; deployment of that knowledge supports historically justified business decisions]
[Diagram: HDFS. Name Node with data nodes D1, D2, D3, D5, D6, D7, D8]
[Diagram labels: T2 to T8]
MR V1 Other challenges:
1. Cascading Failures
2. Multi-tenancy
[Diagram: HDFS Federation. Name node 2 through Name node n each manage their own Name Space (1 to n); BLOCK POOLS (POOL 1, POOL 2, ..., POOL n) are stored on Data Node 1, Data Node 2, ..., Data Node N]
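The federation layout above can be sketched as an hdfs-site.xml fragment. This is a minimal illustration, not the configuration used in the session: the nameservice IDs (ns1, ns2) and host names are hypothetical, and only the standard Hadoop 2 property names are taken as given.

```xml
<!-- hdfs-site.xml: sketch of two federated Name Nodes (IDs and hosts are hypothetical) -->
<configuration>
  <!-- One logical ID per independent namespace -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <!-- RPC address of the Name Node that owns namespace ns1 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>namenode1.example.com:8020</value>
  </property>
  <!-- RPC address of the Name Node that owns namespace ns2 -->
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>namenode2.example.com:8020</value>
  </property>
</configuration>
```

With a configuration along these lines, every Data Node registers with both Name Nodes and stores blocks for both block pools.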
Active Name Node writes the edit log to the Journal Nodes
Data Nodes send heartbeat messages to both Name Nodes
[Diagram: Active Name Node and Passive Name Node sharing the Journal Nodes, with Data Node 1, Data Node 2, ..., Data Node N reporting to both]
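As a rough sketch of how the active and passive Name Nodes and the Journal Nodes are wired together, an hdfs-site.xml fragment might look like the following. The nameservice ID (mycluster), the Name Node IDs (nn1, nn2), and all host names are assumptions for illustration; three Journal Nodes are shown because a quorum needs an odd number of them, although the slide draws only two.

```xml
<!-- hdfs-site.xml: sketch of Name Node HA with Journal Nodes (IDs and hosts are hypothetical) -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- The two Name Nodes (active and passive) behind the nameservice -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>nn1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>nn2.example.com:8020</value>
  </property>
  <!-- The active Name Node writes its edit log here; the passive one reads it -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
  </property>
  <!-- Lets HDFS clients find whichever Name Node is currently active -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```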
[Diagram: data from a UNIVERSE of diverse business applications (APP1, APP2, web logs, emails and unstructured documents, EDW, flat files) is loaded into the HADOOP cluster, which now hosts the MapReduce Framework alongside HBASE, Apache Spark, GIRAPH, and other frameworks; reporting tools surface the extracted knowledge, and deployment of that knowledge supports historically justified business decisions]
RESOURCE MANAGER
Application Manager:
Starts and monitors Application Masters running on the cluster
Restarts Application Masters on a different node in case of failure
Application Master:
Gets input splits
Requests resources and starts containers on different nodes
Works with the client throughout the life cycle of the job
[Diagram: Client, Resource Manager (Capacity Scheduler and Application Manager), four Node Managers hosting the Application Master and Containers, HDFS]
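The Resource Manager roles described above map onto a handful of yarn-site.xml settings. The fragment below is a minimal sketch with a hypothetical host name; the values are illustrative rather than taken from the session.

```xml
<!-- yarn-site.xml: sketch of the settings behind the diagram (host name is hypothetical) -->
<configuration>
  <!-- Where clients and Node Managers find the Resource Manager -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm.example.com</value>
  </property>
  <!-- Select the Capacity Scheduler shown in the diagram -->
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
  <!-- How many times an Application Master may be attempted before the job fails -->
  <property>
    <name>yarn.resourcemanager.am.max-attempts</name>
    <value>2</value>
  </property>
  <!-- Shuffle service that Node Managers run for MapReduce containers -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```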
Apache Oozie
Introduction to Oozie
Oozie Important Components
Oozie Workflow Components
Oozie Workflow Features
Oozie Operational Details
Apache Oozie
1. Open source Java application for scheduling Hadoop jobs
7. Workflows, coordinators, and bundles are its important components
8. Complex data transformations can be designed and scheduled using Apache Oozie
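To make the relationship between workflows and coordinators concrete, here is a minimal coordinator sketch that runs one workflow once a day. The coordinator name, the start and end timestamps, and the workflowAppUri property are placeholders assumed for illustration; they do not come from the session.

```xml
<!-- coordinator.xml: sketch of a daily schedule for one workflow (all values are placeholders) -->
<coordinator-app name="daily-workflow-coord"
                 frequency="${coord:days(1)}"
                 start="2015-12-19T00:00Z" end="2015-12-31T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
  <action>
    <workflow>
      <!-- HDFS directory that holds workflow.xml; supplied in the coordinator's job.properties -->
      <app-path>${workflowAppUri}</app-path>
    </workflow>
  </action>
</coordinator-app>
```

A bundle, in turn, groups several coordinators so that they can be started and stopped together.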
Start
End
Action
Decision
Fork
Join
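The six node types listed above can be seen together in one small workflow. The following is a sketch under stated assumptions: the workflow name, node names, the run_parallel property, and the /tmp/sketch paths are all hypothetical, and the kill node is added only so that the error transitions have a target.

```xml
<!-- workflow.xml: sketch that uses every node type once (all names are hypothetical) -->
<workflow-app xmlns="uri:oozie:workflow:0.2" name="node-types-sketch">
  <!-- Start: entry point of the workflow -->
  <start to="check_flag"/>
  <!-- Decision: choose the next node from a job property -->
  <decision name="check_flag">
    <switch>
      <case to="split">${wf:conf('run_parallel') eq 'true'}</case>
      <default to="end"/>
    </switch>
  </decision>
  <!-- Fork: run two paths in parallel -->
  <fork name="split">
    <path start="make_dir_a"/>
    <path start="make_dir_b"/>
  </fork>
  <!-- Action: the unit of work; here a simple FS action per path -->
  <action name="make_dir_a">
    <fs>
      <mkdir path="${nameNode}/tmp/sketch/a"/>
    </fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <action name="make_dir_b">
    <fs>
      <mkdir path="${nameNode}/tmp/sketch/b"/>
    </fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <!-- Join: wait for every forked path before continuing -->
  <join name="merge" to="end"/>
  <kill name="fail">
    <message>Failed at ${wf:lastErrorNode()}</message>
  </kill>
  <!-- End: successful completion -->
  <end name="end"/>
</workflow-app>
```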
#*****************cluster settings*******************
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
queueName=default
oozie.wf.application.path=${nameNode}/user/${user.name}/OozieSimple
#****************decision variable*******************
is_mapreduce_simple_run=1
is_python_run=1
#******************Step 1 Python variables***********
#***Set country=NA if it has to be read from file*****
country=IN
FilePath=/user/cloudera/country.txt
#****************mapreduce variables*****************
input=/user/cloudera/INPUT1/SalesData.csv
output=/user/cloudera/OozieDecisionOUT
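The properties above are referenced from workflow.xml as ${...} variables. As a hedged sketch of how a MapReduce action might consume them, consider the fragment below. The action name and the end and fail targets are hypothetical, and the mapper and reducer class properties are left out because the slides do not name them.

```xml
<!-- Fragment of workflow.xml: sketch of a map-reduce action driven by job.properties -->
<action name="mapreduce_simple">
  <map-reduce>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- Remove the previous output so that a rerun does not fail -->
    <prepare>
      <delete path="${nameNode}${output}"/>
    </prepare>
    <configuration>
      <property>
        <name>mapred.job.queue.name</name>
        <value>${queueName}</value>
      </property>
      <property>
        <name>mapred.input.dir</name>
        <value>${input}</value>
      </property>
      <property>
        <name>mapred.output.dir</name>
        <value>${output}</value>
      </property>
    </configuration>
  </map-reduce>
  <!-- "end" and "fail" are assumed node names defined elsewhere in the workflow -->
  <ok to="end"/>
  <error to="fail"/>
</action>
```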
<decision name="check_is_python_action_run">
    <switch>
        <case to="python_action">
            ${wf:conf('is_python_run') eq 1}
        </case>
        <default to="check_is_mapreduce_simple_run" />
    </switch>
</decision>
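The default transition above points at a second decision node that the slides do not show. A plausible sketch of it, mirroring the first one and reading the is_mapreduce_simple_run property from job.properties, is given below; the target names mapreduce_simple and end are assumptions.

```xml
<!-- Fragment of workflow.xml: sketch of the second decision (target names are hypothetical) -->
<decision name="check_is_mapreduce_simple_run">
  <switch>
    <!-- Run the MapReduce step only when the flag in job.properties is 1 -->
    <case to="mapreduce_simple">
      ${wf:conf('is_mapreduce_simple_run') eq 1}
    </case>
    <!-- Otherwise finish the workflow -->
    <default to="end"/>
  </switch>
</decision>
```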
Sample Fork
<workflow-app xmlns="uri:oozie:workflow:0.2"
              name="pscore_datacheck">
    <start to="fork" />
    <fork name="fork">
        <path start="decision_check_action_1" />
        <path start="decision_check_action_2" />
        <path start="action_3" />
    </fork>
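Every path started by a fork has to arrive at the same join node before the workflow can continue. The slide stops at the fork, so the fragment below is only a sketch of how one path might end: the body of action_3, the join name join_checks, and the end and fail targets are all hypothetical.

```xml
<!-- Fragment of workflow.xml: sketch of one forked path ending in a join (names are hypothetical) -->
<action name="action_3">
  <fs>
    <mkdir path="${nameNode}/tmp/pscore_datacheck/action_3"/>
  </fs>
  <!-- Each forked path transitions to the same join on success -->
  <ok to="join_checks"/>
  <error to="fail"/>
</action>
<!-- The join waits for all three paths, then moves on -->
<join name="join_checks" to="end"/>
```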
Sample FS Action
<action name="create_output_directories">
    <fs>
        <mkdir path="${nameNode}/user/hive/warehouse/prep"/>
        <mkdir path="${nameNode}/user/hive/warehouse/scoring"/>
        <mkdir path="${nameNode}/user/hive/warehouse/post"/>
    </fs>
    <ok to="decision_on_next_activity"/>
    <error to="send-email"/>
</action>
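The error transition above targets a send-email node that is not shown on the slide. One way it could be written, using Oozie's email action, is sketched here. The recipient address and the fail node are assumptions, and the email action also needs an SMTP host configured on the Oozie server.

```xml
<!-- Fragment of workflow.xml: sketch of the error handler (address and "fail" node are hypothetical) -->
<action name="send-email">
  <email xmlns="uri:oozie:email-action:0.1">
    <to>ops@example.com</to>
    <subject>Workflow ${wf:id()} failed</subject>
    <body>Node ${wf:lastErrorNode()} failed: ${wf:errorMessage(wf:lastErrorNode())}</body>
  </email>
  <!-- Whether or not the mail is sent, the workflow still ends as failed -->
  <ok to="fail"/>
  <error to="fail"/>
</action>
<kill name="fail">
  <message>Workflow failed at ${wf:lastErrorNode()}</message>
</kill>
```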
Thank You