Ecosystem Tools

CISC525 – Unit 10
Sangwhan Cha
Phil Grim
Before Unit 10
 Project Draft is due on June 3, 11 pm
-> Each member should submit it individually (same ppt file).

 Final Project is due on June 17, 11 pm
-> Each member should submit it individually (same ppt file).
-> It must include voice annotation of the team members presenting the material.
-> Team evaluation is due on June 17, 11 pm.
   - A team evaluation template is provided.

 Final Exam: June 11 to June 17, 11 pm

 The last assignment of the course is in Unit 10.


FAQ
- The entire solution is in .pptx format, and each team member records their voice while
presenting a part of the solution. We were thinking of making a screen recording and exporting
it as a video. Does that work? Will Moodle support such formats?

: No, it does not. Each of you should record your own voice by inserting audio into all slides
of your ppt.
: To insert audio, click "Insert" -> "Audio" -> "Record Audio" in the ppt.

- For the team evaluation sheet that you will share: each team member uploads his/her updated
copy to Moodle. Will Moodle allow us to upload multiple files?

: Yes, you can upload two files (your final project and the team evaluation).
Learning Goals

 Students will be able to demonstrate the use of Apache Hue to interface
with Big Data Ecosystem components.
 Students will be able to explain the characteristics and uses of Apache Sqoop.
 Students will be able to explain the characteristics and uses of Apache Oozie.
Overview

Hue
Oozie
Sqoop
Hue
 Hadoop User Experience, formerly known as Cloudera Desktop
 Open Source under Apache License v2.0
 Web portal to many Ecosystem components and functions
 Hadoop
 File browsing, upload, download
 MapReduce Job Browsing
 Data Access
 Hive
 HBase
 Impala
 SQL Databases
 Workflows
 Oozie
 Pig
 Sqoop
Hue Examples 1
 User Home Folder
 Familiar interface for file browsing
Hue Examples 2
 HDFS Browser
Hue Examples 3
 Job Browser
Hue Examples 4

 Job Browser
Hue Examples 5

 Hive Queries
Hue Examples 6

 Hive Queries
Hue Examples 7
 HBase Browser
Sqoop
 Tool for efficiently transferring data between Hadoop and traditional data
stores such as RDBMSs.
 Generates a MapReduce job to accomplish transfers
 Can both import and export data; targets include (two imports are sketched after this list):
 Sequence Files
 Hive
 HBase
 Accumulo
 Avro
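
As a minimal sketch of two of these targets (the connection string, database, and table names
are hypothetical; the flags are standard Sqoop 1 import options):

# Import a table directly into Hive
$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
--hive-import

# Import the same table as Avro data files in HDFS
$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
--as-avrodatafile
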
Sqoop Contd-1
 Natively supports many database systems with JDBC drivers
 Oracle
 MySQL
 PostgreSQL
 Microsoft SQL Server
 Provides API for supporting other data sources and file types
 Informatica
 Pentaho
 Couchbase
 Supports full table import/export and incremental updates (sketched below)
 Generates Java code that can be re-used in MapReduce jobs.
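
A hedged sketch of an incremental import and a full export (connection details, table names,
and the last imported key value are hypothetical):

# Incremental import: fetch only rows whose id exceeds the last imported value
$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
--incremental append \
--check-column id \
--last-value 1205

# Export: push the contents of an HDFS directory back into a database table
$ sqoop export \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp_export \
--export-dir /user/hadoop/emp
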
Sqoop Contd-2
$ sqoop help
Running Sqoop version: 1.4.5-mapr-1410
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
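
For example, the list-databases and list-tables commands above can be used to explore a source
database before importing (connection details are hypothetical):

$ sqoop list-databases \
--connect jdbc:mysql://localhost \
--username root

$ sqoop list-tables \
--connect jdbc:mysql://localhost/userdb \
--username root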


Sqoop Contd-3
$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp --m 1

14/12/22 15:24:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
14/12/22 15:24:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/12/22 15:24:56 INFO tool.CodeGenTool: Beginning code generation
14/12/22 15:24:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
14/12/22 15:24:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
14/12/22 15:24:58 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
14/12/22 15:25:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/cebe706d23ebb1fd99c1f063ad51ebd7/emp.jar
-----------------------------------------------------
14/12/22 15:25:40 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1419242001831_0001/
14/12/22 15:26:45 INFO mapreduce.Job: Job job_1419242001831_0001 running in uber mode : false
14/12/22 15:26:45 INFO mapreduce.Job: map 0% reduce 0%
14/12/22 15:28:08 INFO mapreduce.Job: map 100% reduce 0%
14/12/22 15:28:16 INFO mapreduce.Job: Job job_1419242001831_0001 completed successfully
-----------------------------------------------------
14/12/22 15:28:17 INFO mapreduce.ImportJobBase: Transferred 145 bytes in 177.5849 seconds (0.8165 bytes/sec)
14/12/22 15:28:17 INFO mapreduce.ImportJobBase: Retrieved 5 records.
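
By default the imported rows land in a directory named after the table under the user's HDFS
home; a quick way to inspect them (the path is illustrative):

$ hadoop fs -cat /user/hadoop/emp/part-m-*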
Sqoop Contd-4
Oozie
 Workflow scheduler system to control Hadoop jobs
 Workflows are implemented as Directed Acyclic Graphs (DAGs) of actions
 Oozie Coordinator jobs are used to schedule recurring jobs triggered by time
 Supports many ecosystem components out of the box
 MapReduce
 Pig
 Hive
 Sqoop
 Command line and web interfaces, with Hue integration
Oozie Workflow Example
<workflow-app xmlns = "uri:oozie:workflow:0.4" name = "simple-Workflow">
  <start to = "Create_External_Table" />

  <!-- Step 1 -->
  <action name = "Create_External_Table">
    <hive xmlns = "uri:oozie:hive-action:0.4">
      <job-tracker>xyz.com:8088</job-tracker>
      <name-node>hdfs://rootname</name-node>
      <script>hdfs_path_of_script/external.hive</script>
    </hive>
    <ok to = "Create_orc_Table" />
    <error to = "kill_job" />
  </action>

  <!-- Step 2 -->
  <action name = "Create_orc_Table">
    <hive xmlns = "uri:oozie:hive-action:0.4">
      <job-tracker>xyz.com:8088</job-tracker>
      <name-node>hdfs://rootname</name-node>
      <script>hdfs_path_of_script/orc.hive</script>
    </hive>
    <ok to = "Insert_into_Table" />
    <error to = "kill_job" />
  </action>

  <!-- Step 3 -->
  <action name = "Insert_into_Table">
    <hive xmlns = "uri:oozie:hive-action:0.4">
      <job-tracker>xyz.com:8088</job-tracker>
      <name-node>hdfs://rootname</name-node>
      <script>hdfs_path_of_script/Copydata.hive</script>
      <param>database_name</param>
    </hive>
    <ok to = "end" />
    <error to = "kill_job" />
  </action>

  <kill name = "kill_job">
    <message>Job failed</message>
  </kill>

  <end name = "end" />
</workflow-app>
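
Before the workflow definition is uploaded to HDFS, it can be sanity-checked with the Oozie
client; a minimal sketch (depending on the Oozie release, validation may run server-side and
also require the -oozie server URL):

$ oozie validate workflow.xml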
Oozie Workflow Example 2
Oozie Running Job

$ oozie job -oozie http://host_name:8080/oozie \
-D oozie.wf.application.path=hdfs://namenodepath/pathof_workflow_xml/workflow.xml \
-run
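
Submission prints a job ID that can then be used to monitor the run; a quick sketch (the job ID
below is illustrative):

$ oozie job -oozie http://host_name:8080/oozie -info 0000001-190601000000000-oozie-oozi-W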
Oozie Coordinator and Bundle Example

Coordinator application:

<coordinator-app xmlns = "uri:oozie:coordinator:0.2"
    name = "coord_copydata_from_external_orc"
    frequency = "5 * * * *"
    start = "2016-00-18T01:00Z" end = "2025-12-31T00:00Z"
    timezone = "America/Los_Angeles">

  <controls>
    <timeout>1</timeout>
    <concurrency>1</concurrency>
    <execution>FIFO</execution>
    <throttle>1</throttle>
  </controls>

  <action>
    <workflow>
      <app-path>pathof_workflow_xml/workflow.xml</app-path>
    </workflow>
  </action>
</coordinator-app>

Bundle application:

<bundle-app xmlns = 'uri:oozie:bundle:0.1'
    name = 'bundle_copydata_from_external_orc'>

  <controls>
    <kick-off-time>${kickOffTime}</kick-off-time>
  </controls>

  <coordinator name = 'coord_copydata_from_external_orc'>
    <app-path>pathof_coordinator_xml</app-path>
    <configuration>
      <property>
        <name>startTime1</name>
        <value>time to start</value>
      </property>
    </configuration>
  </coordinator>
</bundle-app>
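
A coordinator or bundle is submitted with the same client as a workflow, typically passing a
properties file that supplies values such as kickOffTime and startTime1; a sketch (the
job.properties name and its contents are assumptions):

$ oozie job -oozie http://host_name:8080/oozie -config job.properties -run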
Oozie Hue Integration
Oozie Hue Integration Contd. 1
Continued Reading

Sqoop
http://sqoop.apache.org

Oozie
http://oozie.apache.org

Hue Website
http://gethue.com
