14 Oozie


Diploma in Big Data and Analytics

In this session, you will learn about:

• Overview
• What is Oozie?
• Features of Oozie
• Oozie architecture
• Workflows
• Coordinators
• Submitting, Monitoring and Managing Oozie Jobs

Private and Confidential 2


• Collection of Actions arranged in a DAG

• Control dependency from one action to the other

• Definitions are written in hPDL (Hadoop Process Definition Language), an XML-based language

• Stored in a workflow.xml file
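The control flow described in workflow.xml can be illustrated with a minimal Python sketch that uses only the standard library to read a tiny, hypothetical hPDL document and follow its transitions from `<start>` until it reaches an `<end>` or `<kill>` node. The action bodies are omitted, and the `failed` argument is an assumption that simulates which actions fail; on a real cluster the Oozie server performs these transitions itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down workflow definition (action bodies omitted).
WORKFLOW_XML = """
<workflow-app name="demoWF" xmlns="uri:oozie:workflow:0.1">
  <start to="first"/>
  <action name="first">
    <ok to="second"/>
    <error to="fail"/>
  </action>
  <action name="second">
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split("}")[-1]


def execution_path(xml_text, failed=frozenset()):
    """Follow ok/error transitions from <start> until an <end> or <kill> node."""
    root = ET.fromstring(xml_text)
    nodes = {}
    start = None
    for child in root:
        kind = local(child.tag)
        if kind == "start":
            start = child.get("to")
        elif "name" in child.attrib:
            nodes[child.get("name")] = (kind, child)

    path = []
    current = start
    while True:
        if current in path:
            raise ValueError(f"cycle at {current!r}: a workflow must be a DAG")
        kind, elem = nodes[current]
        path.append(current)
        if kind in ("end", "kill"):
            return path
        if kind != "action":
            raise ValueError(f"node type {kind!r} is not handled in this sketch")
        # An action moves to its <ok> target on success, <error> target on failure.
        outcome = "error" if current in failed else "ok"
        transition = next(c for c in elem if local(c.tag) == outcome)
        current = transition.get("to")


print(execution_path(WORKFLOW_XML))                     # ['first', 'second', 'end']
print(execution_path(WORKFLOW_XML, failed={"second"}))  # ['first', 'second', 'fail']
```

Because `second` only starts after `first` transitions through `<ok>`, the sketch also shows the control dependency between consecutive actions.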


What is Oozie?


Features of Oozie (1/2)

• Oozie is a server-based workflow engine specialized in running workflow jobs with actions that run Hadoop Map/Reduce and Pig jobs.

• Oozie is a Java web application that runs in a Java servlet container.

• For the purposes of Oozie, a workflow is a collection of actions (e.g. Hadoop Map/Reduce jobs, Pig jobs) arranged in a control dependency DAG (Directed Acyclic Graph). A "control dependency" from one action to another means that the second action can't run until the first action has completed.

• Oozie allows a user to create Directed Acyclic Graphs of workflows, and these can be run in parallel and sequentially in Hadoop.

• Oozie can also run plain Java classes and Pig workflows, and interact with HDFS, which is useful if you need to delete or move files before a job runs.
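Sequential and parallel execution both follow from the control dependencies. The Python sketch below (Python 3.9+, standard library `graphlib`) groups hypothetical actions into waves: actions in the same wave do not depend on each other and could run in parallel, while the waves themselves run one after another. It only computes an order and submits nothing; note that Oozie itself expresses parallelism explicitly with fork/join nodes rather than by analysing dependencies this way.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Group actions into waves. Every action in a wave has all of its
    upstream actions finished, so actions within one wave may run in
    parallel while the waves run sequentially."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as completed
    return waves


# Hypothetical actions: action name -> set of actions it depends on.
deps = {
    "cleanup": set(),
    "wordcount": {"cleanup"},
    "pig-report": {"cleanup"},
    "merge": {"wordcount", "pig-report"},
}
print(execution_waves(deps))
# [['cleanup'], ['pig-report', 'wordcount'], ['merge']]
```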


Features of Oozie (2/2)

• Oozie can run jobs sequentially (one after the other) and in parallel (several at a time)

• Java Client API / Command Line Interface – launch, control, and monitor jobs from your Java applications or from the shell

• Web Service API – control jobs from anywhere over HTTP

• Run periodic jobs – have jobs that need to run every hour, day, or week? Let Oozie run them for you

• Receive an email when a job completes
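The Web Service API is a REST interface exposed by the Oozie server. Here is a minimal Python sketch, standard library only, that builds a job-information request and reads the job status from the JSON reply. It assumes the endpoint `/v1/job/<job-id>?show=info` and a top-level `status` field in the response; check both against the Web Services API documentation for your Oozie version. The job id and server URL are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def job_info_url(oozie_url, job_id):
    """Build the job-information URL (assumed endpoint: /v1/job/<id>?show=info)."""
    return f"{oozie_url.rstrip('/')}/v1/job/{quote(job_id)}?show=info"


def job_status(info_json):
    """Extract the 'status' field from a job-information JSON document."""
    return json.loads(info_json)["status"]


def fetch_status(oozie_url, job_id, timeout=10):
    """Query a running Oozie server (needs network access to it)."""
    with urlopen(job_info_url(oozie_url, job_id), timeout=timeout) as resp:
        return job_status(resp.read().decode("utf-8"))


# Hypothetical job id; only the URL is built here, no request is sent.
print(job_info_url("http://localhost:11000/oozie",
                   "0000001-240101000000000-oozie-hdus-W"))
```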


Why use Oozie?


Oozie Architecture


Oozie Workflow


Workflow Engine


How Do You Make a Workflow?


Oozie Start, End and Error Nodes


Oozie Action Node


Oozie Map Reduce Node


Oozie Java Job Tag


Oozie File System Tag


Oozie Sub Workflow Tag


Oozie Fork/Join Nodes


Oozie Decision Nodes


Oozie Parameterization


Oozie Parameterization Example


Re-Running a Failed Job


Example - Workflow


Oozie Coordinator – Coordinator Job


Oozie Coordinator – coordinator.properties

• Contains control flow nodes & action nodes

• Control Flow Nodes:

• Define the beginning and end of a workflow
• Control the execution path

• Action Nodes:
• Support a variety of actions (e.g. Map/Reduce, Java, file system and sub-workflow actions)

• A workflow can be parameterized

• Reference variables with the ${variable_name} syntax
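Variable substitution can be sketched in a few lines of Python. This is not Oozie's EL engine: it only replaces plain `${name}` references with values from a dictionary and raises an error for undefined names, while EL function calls such as `${wf:user()}` are deliberately left untouched for the server to evaluate. The value given to `PREFIX` is a hypothetical example.

```python
import re

# Matches ${name} where name is a plain property name (no EL function call).
VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def substitute(text, properties):
    """Replace ${name} with properties[name]; fail loudly on undefined names.
    EL function calls such as ${wf:user()} do not match and are kept as-is."""
    def replace(match):
        name = match.group(1)
        if name not in properties:
            raise KeyError(f"undefined variable: {name}")
        return properties[name]

    return VAR_PATTERN.sub(replace, text)


props = {"PREFIX": "/user"}  # hypothetical value
print(substitute('<delete path="${PREFIX}/${wf:user()}/myoutput"/>', props))
# <delete path="/user/${wf:user()}/myoutput"/>
```

Failing on an undefined name mirrors the practical advice that every variable used in a definition must be supplied, rather than silently producing a broken path.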


Oozie Coordinator – coordinator.properties

• Directs Oozie to the location of the coordinator job.

• Assigns values to variables referenced in both coordinator.xml as well



Submitting an Oozie Job – Step by Step


Step 1: Create a directory in hduser’s home directory (that is, on your local machine) and name it OozieWFonLocal.

Step 2: Go to the OozieWFonLocal directory → Right-click → Create a document → Select “Empty File” → Rename it to “workflow.xml”.

Step 3: Double-click workflow.xml, select “Display”, and write the code below into workflow.xml.


<workflow-app name="oozieWF" xmlns="uri:oozie:workflow:0.1">

<start to="wordcount"/>
<action name="wordcount">
<delete path="${PREFIX}/${wf:user()}/myoutput"/>

<value>/user/${wf:user()}/OozieWFonHDFS/GutenbergDocs</value>
<ok to="end"/>
<error to="fail"/>


<kill name="fail">
    <message>Bummer, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>

Step 4: Create a directory inside /home/hduser/OozieWFonLocal and name it lib → copy the “WordCount.jar” file into the lib directory

Step 5: Copy the “GutenbergDocs” directory to



Step 6: Create one more empty file in /home/hduser/OozieWFonLocal and name it job.properties
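Before uploading, it can help to check that the local application directory has the layout built in the steps above: workflow.xml at the top level, at least one .jar in lib/, and job.properties. A minimal Python sketch using `pathlib`; the function name `check_app_dir` is hypothetical, not part of any Oozie tool.

```python
from pathlib import Path


def check_app_dir(app_dir):
    """Return a list of problems with a local workflow application directory.
    An empty list means the expected files are present."""
    app = Path(app_dir)
    problems = []
    if not (app / "workflow.xml").is_file():
        problems.append("missing workflow.xml")
    lib = app / "lib"
    # The jar(s) used by the workflow's actions are expected under lib/.
    if not (lib.is_dir() and any(lib.glob("*.jar"))):
        problems.append("no .jar files in lib/")
    if not (app / "job.properties").is_file():
        problems.append("missing job.properties")
    return problems


if __name__ == "__main__":
    print(check_app_dir("/home/hduser/OozieWFonLocal"))
```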

Step 7: Copy the code below into the job.properties file:

#JobTracker and
#prefix of the HDFS path for input and output, adapt!
#HDFS path where you need to copy workflow.xml and lib/*.jar to
#one of the values from Hadoop mapred.queue.names
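job.properties uses the Java properties format. Below is a simplified Python sketch of a parser for it: it handles `#` and `!` comments and `key=value` or `key:value` lines, but not line continuations or escape sequences. `oozie.wf.application.path` is the standard Oozie property pointing at the application directory on HDFS; the other names and all values in the sample are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value / key:value lines as found in job.properties.
    Simplified: no line continuations, escapes or whitespace-only separators."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        # Split on whichever separator ('=' or ':') appears first.
        positions = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not positions:
            props[line] = ""
            continue
        cut = min(positions)
        props[line[:cut].strip()] = line[cut + 1:].strip()
    return props


# Hypothetical sample; adapt names and values to your cluster.
SAMPLE = """\
#prefix of the HDFS path for input and output, adapt!
PREFIX=/user
nameNode=hdfs://localhost:9000
oozie.wf.application.path=hdfs://localhost:9000/user/hduser/OozieWFonHDFS
"""
print(parse_properties(SAMPLE)["oozie.wf.application.path"])
```

Splitting on the first separator keeps values such as `hdfs://localhost:9000` intact, since their colons come after the `=`.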


Step 8: Now, create a directory named OozieWFonHDFS under /user/hduser on HDFS and upload the contents of OozieWFonLocal to it:

hadoop fs -mkdir /user/hduser/OozieWFonHDFS
hadoop fs -put /home/hduser/OozieWFonLocal/* /user/hduser/OozieWFonHDFS

Step 9: Next, invoke the Oozie workflow as follows:

cd $OOZIE_HOME/oozie/bin
export OOZIE_URL=http://localhost:11000/oozie
./oozie job -run -config /home/hduser/OozieWFonLocal/job.properties
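The same submission can be scripted. A minimal Python sketch that builds the `oozie job` command line (passing the server with the CLI's `-oozie` option instead of the `OOZIE_URL` environment variable) and extracts the job id, assuming the CLI prints it on a line of the form `job: <id>`. Running `submit` requires the Oozie client on PATH and a reachable server; the sample job id is made up.

```python
import re
import subprocess


def run_command(config_path, oozie_url="http://localhost:11000/oozie", oozie_bin="oozie"):
    """Build the argv for submitting and starting a workflow job."""
    return [oozie_bin, "job", "-oozie", oozie_url, "-run", "-config", config_path]


def parse_job_id(cli_output):
    """Extract the job id from CLI output assumed to look like 'job: <id>'."""
    match = re.search(r"^job:\s*(\S+)", cli_output, re.MULTILINE)
    if match is None:
        raise ValueError(f"no job id in output: {cli_output!r}")
    return match.group(1)


def submit(config_path, **kwargs):
    """Run the CLI and return the new job id (needs a live Oozie server)."""
    result = subprocess.run(run_command(config_path, **kwargs),
                            capture_output=True, text=True, check=True)
    return parse_job_id(result.stdout)


print(run_command("/home/hduser/OozieWFonLocal/job.properties"))
```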



• Oozie is a workflow scheduler for Hadoop

• Oozie manages and integrates multiple Hadoop jobs

• It has two main components: Workflow and Coordinator

• Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions

• Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.
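The two coordinator triggers, time and data, can be illustrated with a small Python sketch: it lists the nominal times between a start and an end at a fixed frequency, and reports which runs could start, meaning their time has arrived and their input instance exists. The path template, the set of available paths, and treating `end` as exclusive are assumptions of this sketch; a real coordinator declares its datasets in coordinator.xml and Oozie checks HDFS itself.

```python
from datetime import datetime, timedelta


def nominal_times(start, end, frequency):
    """Yield start, start + frequency, ... for every time strictly before end."""
    if frequency <= timedelta(0):
        raise ValueError("frequency must be positive")
    t = start
    while t < end:
        yield t
        t += frequency


def ready_runs(start, end, frequency, now, available, path_template):
    """Nominal times whose run could start: the time has arrived (t <= now)
    and the input instance for that time is among the available paths."""
    return [
        t for t in nominal_times(start, end, frequency)
        if t <= now and path_template.format(t) in available
    ]


# Hypothetical hourly coordinator whose input is one directory per hour.
start = datetime(2024, 1, 1, 0, 0)
end = datetime(2024, 1, 1, 3, 0)
available = {"/data/logs/2024-01-01-00", "/data/logs/2024-01-01-02"}
runs = ready_runs(start, end, timedelta(hours=1), now=datetime(2024, 1, 1, 2, 30),
                  available=available, path_template="/data/logs/{:%Y-%m-%d-%H}")
print([t.strftime("%H:%M") for t in runs])  # ['00:00', '02:00']
```

The 01:00 run is held back only because its input is missing, which is the "data availability" half of the trigger.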


Thank you
Mumbai | Bangalore | Pune | Chennai | Jaipur

