14 Oozie

Diploma in Big Data and Analytics
Oozie
Agenda

In this session, you will learn about:

• Overview
• What is Oozie?
• Features of Oozie
• Oozie architecture
• Workflows
• Coordinators
• Submitting, monitoring, and managing Oozie jobs

Private and Confidential 2


Overview

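• An Oozie workflow is a collection of actions arranged in a DAG (Directed Acyclic Graph)

• A control dependency links one action to the next: the second action cannot start until the first has completed

• Workflow definitions are written in hPDL (an XML Process Definition Language)

• The definition is saved in a workflow.xml file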
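
To make the control dependency concrete, here is a minimal sketch of a workflow.xml with two actions chained one after the other. The workflow name, the node names, and the paths under ${PREFIX} are assumptions chosen for illustration; ${PREFIX} is expected to hold a full HDFS URI supplied through the job properties.

```xml
<!-- Sketch only: names and paths are hypothetical. -->
<workflow-app name="overview-sketch" xmlns="uri:oozie:workflow:0.1">
  <start to="make-dir"/>

  <!-- First action: create a staging directory on HDFS. -->
  <action name="make-dir">
    <fs>
      <mkdir path="${PREFIX}/${wf:user()}/staging"/>
    </fs>
    <!-- Control dependency: "archive" starts only after "make-dir" succeeds. -->
    <ok to="archive"/>
    <error to="fail"/>
  </action>

  <!-- Second action: move the staging directory to its final location. -->
  <action name="archive">
    <fs>
      <move source="${PREFIX}/${wf:user()}/staging"
            target="${PREFIX}/${wf:user()}/archive"/>
    </fs>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>Workflow failed at node [${wf:lastErrorNode()}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

The `ok` and `error` transitions are the edges of the DAG; every action names exactly one successor for each outcome.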

What is Oozie?

Features of Oozie (1/2)

• Oozie is a server-based workflow engine specialized in running workflow jobs with actions that run Hadoop Map/Reduce and Pig jobs.

• Oozie is a Java web application that runs in a Java servlet container.

• For the purposes of Oozie, a workflow is a collection of actions (i.e. Hadoop Map/Reduce jobs, Pig jobs) arranged in a control dependency DAG (Directed Acyclic Graph). "Control dependency" from one action to another means that the second action can't run until the first action has completed.

• Oozie allows a user to create Directed Acyclic Graphs of workflows, and these can be run in parallel and sequentially in Hadoop.

• Oozie can also run plain Java classes and Pig workflows, and interact with HDFS, which is useful if you need to delete or move files before a job runs.
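
As an illustration of the parallel and sequential execution described above, the following sketch first cleans up an HDFS directory and then runs two plain Java classes in parallel using fork and join nodes. The class names com.example.JobA and com.example.JobB, the node names, and the stale-output path are hypothetical; ${jobtracker}, ${namenode}, and ${PREFIX} are assumed to come from the job properties.

```xml
<!-- Sketch only: class names, node names, and paths are hypothetical. -->
<workflow-app name="parallel-sketch" xmlns="uri:oozie:workflow:0.1">
  <start to="cleanup"/>

  <!-- Sequential step: delete old output before anything else runs. -->
  <action name="cleanup">
    <fs>
      <delete path="${PREFIX}/${wf:user()}/stale-output"/>
    </fs>
    <ok to="split"/>
    <error to="fail"/>
  </action>

  <!-- Fork: both paths start at the same time. -->
  <fork name="split">
    <path start="jobA"/>
    <path start="jobB"/>
  </fork>

  <action name="jobA">
    <java>
      <job-tracker>${jobtracker}</job-tracker>
      <name-node>${namenode}</name-node>
      <main-class>com.example.JobA</main-class>
    </java>
    <ok to="merge"/>
    <error to="fail"/>
  </action>

  <action name="jobB">
    <java>
      <job-tracker>${jobtracker}</job-tracker>
      <name-node>${namenode}</name-node>
      <main-class>com.example.JobB</main-class>
    </java>
    <ok to="merge"/>
    <error to="fail"/>
  </action>

  <!-- Join: waits until every forked path has finished. -->
  <join name="merge" to="end"/>

  <kill name="fail">
    <message>Workflow failed at node [${wf:lastErrorNode()}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Every fork needs a matching join, and the jar containing the Java classes would sit in the workflow's lib directory on HDFS.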

Features of Oozie (2/2)

• Oozie can run jobs sequentially (one after the other) and in parallel
(multiple at a time)

• Java Client API / Command Line Interface – Launch, control, and monitor
jobs from your Java Apps

• Web Service API – You can control jobs from anywhere

• Run Periodic jobs – Have jobs that you need to run every hour, day, week?
Have Oozie run the jobs for you

• Receive an email when a job is complete
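
The email notification can be sketched as an extra action placed at the end of a workflow. This fragment assumes the Oozie server has been configured with an SMTP host, and that the enclosing workflow already defines nodes named "end" and "fail"; the recipient address and the node name "notify" are placeholders.

```xml
<!-- Fragment only: belongs inside a <workflow-app>; address is a placeholder. -->
<action name="notify">
  <email xmlns="uri:oozie:email-action:0.1">
    <to>hduser@example.com</to>
    <subject>Workflow ${wf:id()} finished</subject>
    <body>The workflow ${wf:name()} has completed.</body>
  </email>
  <ok to="end"/>
  <error to="fail"/>
</action>
```

The last regular action of the workflow would then transition to "notify" instead of "end".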

Why use Oozie?

Oozie Architecture

Oozie Workflow

Workflow Engine

How do you make a Workflow

Oozie Start, End and Error Nodes

Oozie Action Node

Oozie Map Reduce Node

Oozie Java Job Tag

Oozie File System Tag

Oozie Sub Workflow Tag

Oozie Fork/Join Nodes

Oozie Decision Nodes

Oozie Parameterization

Oozie Parameterization Example

Re-Running a Failed Job

Example - Workflow

Coordinator

Oozie Coordinator – Coordinator Job

Oozie Coordinator – coordinator.properties

• Contains control flow nodes & action nodes

• Control Flow Nodes:
    • Define the beginning and end of a workflow
    • Control the execution path

• Action Nodes:
    • A variety of actions are supported

• Workflow can be parameterized
    • You can use variables of the form ${variable_name}
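
A minimal sketch of parameterization, assuming two hypothetical variables, stagingDir and finalDir, that are defined in the properties file passed at submission time. Oozie substitutes the values when the job runs, so the same workflow.xml can be reused with different paths.

```xml
<!-- Sketch only. Hypothetical entries in the properties file:
       stagingDir=hdfs://localhost:54310/user/hduser/staging
       finalDir=hdfs://localhost:54310/user/hduser/final
-->
<workflow-app name="param-sketch" xmlns="uri:oozie:workflow:0.1">
  <start to="publish"/>

  <action name="publish">
    <fs>
      <!-- Both paths are resolved from the submitted properties. -->
      <move source="${stagingDir}" target="${finalDir}"/>
    </fs>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>Publish step failed at node [${wf:lastErrorNode()}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

If a referenced variable has no value at submission time, Oozie rejects the job rather than running it with an empty path.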

Oozie Coordinator – coordinator.properties

• Directs Oozie to the location of the coordinator job.

• Assigns values to variables referenced in both coordinator.xml and workflow.xml

freq=1440
startTime=2012-07-12T14:00Z
endTime=2012-08-01T14:00Z
timezone=UTC
workflowPath=/user/hduser/oozieWF
jobtracker=localhost:54311
namenode=localhost:54310
PREFIX=hdfs://localhost:54310/user
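
For orientation, a coordinator.xml that consumes these properties might look like the sketch below. The application name is an assumption, and the original coordinator.xml is not reproduced in these slides, so treat this as one plausible shape rather than the exact file. A plain integer frequency is interpreted in minutes, so freq=1440 triggers the workflow once a day between startTime and endTime.

```xml
<!-- Sketch only: the name "oozieCoord" is hypothetical. -->
<coordinator-app name="oozieCoord"
                 frequency="${freq}"
                 start="${startTime}"
                 end="${endTime}"
                 timezone="${timezone}"
                 xmlns="uri:oozie:coordinator:0.1">
  <action>
    <workflow>
      <!-- HDFS directory that holds the workflow.xml to run each time. -->
      <app-path>${workflowPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
```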

Submitting an Oozie Job – Step by Step

STEP 1: Create a directory in hduser's home directory (this means you are creating a directory on your local machine) and name it OozieWFonLocal.

STEP 2: Go to the OozieWFonLocal directory → Right Click → Create a document → Select "Empty File" → Rename it as "workflow.xml".

STEP 3: Double click on workflow.xml & select "Display" → write the below code into workflow.xml.


<workflow-app name="oozieWF" xmlns="uri:oozie:workflow:0.1">
  <start to="wordcount"/>
  <action name="wordcount">
    <map-reduce>
      <job-tracker>${jobtracker}</job-tracker>
      <name-node>${namenode}</name-node>
      <prepare>
        <delete path="${PREFIX}/${wf:user()}/myoutput"/>
      </prepare>
      <configuration>
        <property>
          <name>mapred.mapper.new-api</name>
          <value>true</value>
        </property>
        <property>
          <name>mapred.reducer.new-api</name>
          <value>true</value>
        </property>

        <property>
          <name>mapreduce.map.class</name>
          <value>org.apache.hadoop.examples.TokenizerMapper</value>
        </property>
        <property>
          <name>mapreduce.reduce.class</name>
          <value>org.apache.hadoop.examples.IntSumReducer</value>
        </property>
        <property>
          <name>mapred.output.key.class</name>
          <value>org.apache.hadoop.io.Text</value>
        </property>
        <property>
          <name>mapred.output.value.class</name>
          <value>org.apache.hadoop.io.IntWritable</value>
        </property>

        <property>
          <name>mapred.input.dir</name>
          <value>/user/${wf:user()}/OozieWFonHDFS/GutenbergDocs</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>/user/${wf:user()}/myoutput</value>
        </property>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>


  <kill name="fail">
    <message>Bummer, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>

Step 4: Create a directory named lib under /home/hduser/OozieWFonLocal → copy the "WordCount.jar" file into the lib directory

Step 5: Copy the "GutenbergDocs" directory to /home/hduser/OozieWFonLocal


Step 6: Create one more Empty File in /home/hduser/OozieWFonLocal & name it as job.properties

Step 7: Copy the below code into the job.properties file

#JobTracker and NameNode
jobtracker=localhost:54311
namenode=localhost:54310
#prefix of the HDFS path for input and output, adapt!
PREFIX=hdfs://localhost:54310/user
#HDFS path where you need to copy workflow.xml and lib/*.jar to
oozie.wf.application.path=hdfs://localhost:54310/user/hduser/OozieWFonHDFS/
#one of the values from Hadoop mapred.queue.names
queueName=default


Step 8: Now, create a directory named OozieWFonHDFS on HDFS under /user/hduser → upload the contents of OozieWFonLocal to OozieWFonHDFS

hadoop fs -mkdir /user/hduser/OozieWFonHDFS
hadoop fs -put /home/hduser/OozieWFonLocal/* /user/hduser/OozieWFonHDFS

Step 9: Next, invoke the Oozie workflow as follows:


cd $OOZIE_HOME/oozie/bin
export OOZIE_URL=http://localhost:11000/oozie
./oozie job -run -config /home/hduser/OozieWFonLocal/job.properties

Summary

• Oozie is a workflow scheduler for Hadoop

• Oozie manages and integrates multiple Hadoop jobs

• It has two main components, Workflow and Coordinator

• Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions

• Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.
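
The data-availability trigger mentioned in the last point can be sketched as a coordinator that waits for a daily input directory before starting the workflow. The dataset name, the logs directory layout, and the application name are hypothetical; ${startTime}, ${endTime}, ${PREFIX}, and ${workflowPath} are assumed to be supplied through the coordinator properties.

```xml
<!-- Sketch only: dataset name and directory layout are hypothetical. -->
<coordinator-app name="dataTriggeredCoord"
                 frequency="${coord:days(1)}"
                 start="${startTime}"
                 end="${endTime}"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
  <datasets>
    <!-- One directory per day, e.g. .../logs/2012/07/12 -->
    <dataset name="logs" frequency="${coord:days(1)}"
             initial-instance="${startTime}" timezone="UTC">
      <uri-template>${PREFIX}/hduser/logs/${YEAR}/${MONTH}/${DAY}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <!-- Each run waits until the current day's directory is available. -->
    <data-in name="input" dataset="logs">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${workflowPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
```

Each coordinator action is created on schedule but stays in a waiting state until its input instance exists, which is how time and data triggers combine.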

Thank you
Mumbai | Bangalore | Pune | Chennai | Jaipur
