2019 - LSP - Unit 10 - Ecosystem Tools
CISC525 – Unit 10
Sangwhan Cha
Phil Grim
Before Unit 10
The project draft is due June 3 at 11 pm.
-> Each member should submit it individually (the same ppt file).
: No, that does not work. Each of you should record your own voice by inserting audio on every slide of your ppt.
: To insert audio, click "Insert" -> "Audio" -> "Record Audio" in the ppt.
: Yes, you can upload 2 files (your final project and the team evaluation).
Learning Goals
Hue
Oozie
Sqoop
Hue
Hadoop User Experience, formerly known as Cloudera Desktop
Open Source under Apache License v2.0
Web portal to many Ecosystem components and functions
Hadoop
File browsing, upload, download
MapReduce Job Browsing
Data Access
Hive
HBase
Impala
SQL Databases
Workflows
Oozie
Pig
Sqoop
Hue
Examples 1
User Home Folder
Familiar interface for file browsing
Hue
Examples 2
HDFS Browser
Hue
Examples 3
Job Browser
Hue
Examples 4
Job Browser
Hue
Examples 5
Hive Queries
Hue
Examples 6
Hive Queries
Hue
Examples 7
HBase Browser
Sqoop
Tool for efficiently transferring data between Hadoop and traditional data stores such as RDBMSs.
Generates MapReduce job to accomplish transfers
Can both import and export data
Sequence Files
Hive
HBase
Accumulo
Avro
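A typical import looks like the following. The connection string, credentials, table name, and paths are illustrative placeholders, not values from the course environment:

```
# Import the "employees" table from MySQL into HDFS as text files,
# running 4 parallel map tasks (one output file per mapper).
sqoop import \
  --connect jdbc:mysql://dbhost:3306/corp \
  --username dbuser -P \
  --table employees \
  --target-dir /user/hadoop/employees \
  --num-mappers 4
```

Behind the scenes Sqoop turns this command into a map-only MapReduce job, with each mapper pulling a disjoint slice of the table over JDBC.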
Sqoop – Contd. 1
Natively supports many database systems with JDBC drivers
Oracle
MySQL
PostgreSQL
Microsoft SQL Server
Provides API for supporting other data sources and file types
Informatica
Pentaho
Couchbase
Supports full table import/export, incremental updates
Generates Java code that can be re-used in MapReduce jobs.
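Incremental updates and exports follow the same command pattern; again, the connection values and table names below are placeholders:

```
# Pull only rows added since the last run, keyed on an auto-increment column.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/corp \
  --username dbuser -P \
  --table employees \
  --incremental append \
  --check-column id \
  --last-value 10000

# Push results stored in HDFS back out to a database table.
sqoop export \
  --connect jdbc:mysql://dbhost:3306/corp \
  --username dbuser -P \
  --table daily_summary \
  --export-dir /user/hadoop/summary_out
```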
Sqoop – Contd. 2
$ sqoop help
Running Sqoop version: 1.4.5-mapr-1410
usage: sqoop COMMAND [ARGS]
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
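The list-* and eval commands are handy sanity checks before committing to a full import. A sketch with placeholder connection values:

```
# Confirm the connection and see what is available to import:
sqoop list-tables --connect jdbc:mysql://dbhost:3306/corp --username dbuser -P

# Run a quick query on the source database without moving any data:
sqoop eval --connect jdbc:mysql://dbhost:3306/corp --username dbuser -P \
  --query "SELECT COUNT(*) FROM employees"
```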
Oozie
Workflow Example 1
<workflow-app xmlns = "uri:oozie:workflow:0.4" name = "simple-Workflow">
   <start to = "Create_External_Table" />

   <!-- Step 1 -->
   <action name = "Create_External_Table">
      <hive xmlns = "uri:oozie:hive-action:0.4">
         <job-tracker>xyz.com:8088</job-tracker>
         <name-node>hdfs://rootname</name-node>
         <script>hdfs_path_of_script/external.hive</script>
      </hive>
      <ok to = "Create_orc_Table" />
      <error to = "kill_job" />
   </action>

   <!-- Step 2 -->
   <action name = "Create_orc_Table">
      <hive xmlns = "uri:oozie:hive-action:0.4">
         <job-tracker>xyz.com:8088</job-tracker>
         <name-node>hdfs://rootname</name-node>
         <script>hdfs_path_of_script/orc.hive</script>
      </hive>
      <ok to = "Insert_into_Table" />
      <error to = "kill_job" />
   </action>

   <!-- Step 3 -->
   <action name = "Insert_into_Table">
      <hive xmlns = "uri:oozie:hive-action:0.4">
         <job-tracker>xyz.com:8088</job-tracker>
         <name-node>hdfs://rootname</name-node>
         <script>hdfs_path_of_script/Copydata.hive</script>
         <param>database_name</param>
      </hive>
      <ok to = "end" />
      <error to = "kill_job" />
   </action>

   <kill name = "kill_job">
      <message>Job failed</message>
   </kill>

   <end name = "end" />
</workflow-app>
Oozie
Workflow Example 2
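The listing for this example did not survive in the source; judging from the stray closing tags left behind, it showed a coordinator (and bundle) definition. A minimal coordinator-app, which runs a workflow on a schedule, looks roughly like this; the name, dates, frequency, and application path are all illustrative:

```
<coordinator-app name = "daily-coord" frequency = "${coord:days(1)}"
                 start = "2019-06-01T00:00Z" end = "2019-12-31T00:00Z"
                 timezone = "UTC" xmlns = "uri:oozie:coordinator:0.4">
   <action>
      <workflow>
         <app-path>hdfs://rootname/user/hadoop/workflows/simple-Workflow</app-path>
      </workflow>
   </action>
</coordinator-app>
```

A bundle-app, in turn, simply groups several coordinators so they can be started and stopped together.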
Oozie
Running Job
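The command listing for this slide did not survive extraction. A typical way to submit a workflow like the one shown earlier is sketched below; the host, port, and paths are placeholders:

```
# job.properties (placeholder values)
nameNode=hdfs://rootname
jobTracker=xyz.com:8088
oozie.wf.application.path=${nameNode}/user/hadoop/workflows/simple-Workflow

# Submit and start the workflow, then check on it:
oozie job -oozie http://localhost:11000/oozie -config job.properties -run
oozie job -oozie http://localhost:11000/oozie -info <job-id>
```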
Oozie
Hue Integration
Oozie
Hue Integration – Contd. 1
Continued Reading
Sqoop
http://sqoop.apache.org
Oozie
http://oozie.apache.org
Hue Website
http://gethue.com