Unit-5 - Hive
Introduction
• Apache Hive is an open-source data warehouse system built on top of
Hadoop, used for querying and analyzing large datasets stored in
Hadoop's distributed file system (HDFS).
• Without Hive, you have to write complex MapReduce jobs; with Hive,
you only need to submit SQL-like queries.
• Hive is mainly targeted at users who are comfortable with SQL.
• Hive uses a language called HiveQL (HQL), which is similar to SQL.
• Hive automatically translates HiveQL queries into MapReduce jobs.
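For illustration, a single HiveQL statement like the following stands in for a hand-written MapReduce job. This is a minimal sketch that assumes an employee table with salary and location columns, as in the aggregate examples later in this unit; the alias avg_salary is only illustrative.

```sql
-- Average salary per location.
-- Hive turns the grouping and aggregation into MapReduce work,
-- so no mapper or reducer code has to be written by hand.
SELECT location, AVG(salary) AS avg_salary
FROM employee
GROUP BY location;
```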
Hive Architecture
Hive Architecture - Component
• Metastore –
• It stores metadata for each table, such as its schema and location.
• It also stores the partition metadata.
• This helps the driver track the progress of the various data sets distributed
over the cluster.
• Driver –
• It acts like a controller which receives the HiveQL statements.
• The driver starts the execution of the statement by creating sessions.
• It monitors the life cycle and progress of the execution.
• Compiler –
• It compiles the HiveQL query.
• It converts the query into an execution plan.
• The plan contains the tasks, along with the steps MapReduce must perform to
produce the output the query asks for.
• Optimizer –
• It performs various transformations on the execution plan.
• It aggregates the transformations together, such as converting a pipeline of
joins to a single join, for better performance.
• Executor –
• Once compilation and optimization complete, the executor executes the
tasks.
• Executor takes care of pipelining the tasks.
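One way to look at what the compiler and optimizer produce is the EXPLAIN statement, which prints the execution plan instead of running the query. A minimal sketch, assuming an employee table with a location column:

```sql
-- Show the execution plan for the query without executing it
EXPLAIN
SELECT location, COUNT(*) AS employees
FROM employee
GROUP BY location;
```

The output lists the stages of the plan and how they depend on each other; the exact text varies with the Hive version and the execution engine in use.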
Hive shell, Hive services, Hive metastore
Comparison with traditional database
RDBMS | Hive
It is used to maintain a database. | It is used to maintain a data warehouse.
It uses SQL (Structured Query Language). | It uses HQL (Hive Query Language).
• Arithmetic Operators - +, -, *, /, %
• Relational Operators - =, !=, <, <=, >, >=, IS NULL, IS NOT NULL, LIKE, etc.
• Logical Operators - AND, OR, NOT
• Complex Type Operators -
• These operators provide expressions to access the elements of complex
types:
• A[n] - A is an Array and n is an int; returns the element at index n (the first element has index 0)
• M[key] - M is a Map<K, V> and key has type K; returns the value stored for key
• S.x - S is a struct; returns the field x of S
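The operators above can be seen together in one query. This is a minimal sketch: the table employee_profile and all of its columns are hypothetical and exist only for this illustration.

```sql
-- Hypothetical table with one column of each complex type
CREATE TABLE IF NOT EXISTS employee_profile (
  name    STRING,
  salary  DOUBLE,
  skills  ARRAY<STRING>,
  scores  MAP<STRING, INT>,
  address STRUCT<city:STRING, pin:INT>
);

SELECT name,
       salary * 12     AS annual_salary,  -- arithmetic operator
       skills[0]       AS first_skill,    -- A[n]: first array element
       scores['hive']  AS hive_score,     -- M[key]: value stored for the key 'hive'
       address.city    AS city            -- S.x: field of a struct
FROM employee_profile
WHERE salary >= 30000                     -- relational operator
  AND address.city LIKE 'B%'              -- LIKE pattern match
  AND name IS NOT NULL                    -- NULL test
  AND NOT (salary > 100000);              -- logical operators AND, NOT
```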
Hive DDL Commands
• Hive DDL commands are the statements used for defining and changing the
structure of a table or database in Hive.
The main Hive DDL commands are:
• CREATE
• SHOW
• DESCRIBE
• USE
• DROP
• ALTER
• TRUNCATE
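As a sketch, the commands listed above could be used in sequence as follows. The database name demo_db, the table layout, and the comma-delimited text storage are assumptions made for this example.

```sql
-- CREATE: a database and a table inside it
CREATE DATABASE IF NOT EXISTS demo_db;

-- SHOW: list the databases known to Hive
SHOW DATABASES;

-- USE: switch the current database
USE demo_db;

CREATE TABLE IF NOT EXISTS employee (
  id       INT,
  name     STRING,
  salary   DOUBLE,
  location STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

SHOW TABLES;

-- DESCRIBE: show the columns and their types
DESCRIBE employee;

-- ALTER: change the structure or the name of the table
ALTER TABLE employee ADD COLUMNS (dept STRING);
ALTER TABLE employee RENAME TO staff;

-- TRUNCATE: delete all rows but keep the table definition
TRUNCATE TABLE staff;

-- DROP: remove the table, then the database
DROP TABLE IF EXISTS staff;
DROP DATABASE IF EXISTS demo_db;
```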
HiveQL - DML
Hive tables
User Defined Functions
• In Hive, users can define their own functions to meet specific client
requirements.
• These are known as UDFs (User Defined Functions).
• UDFs are written in Java for specific modules.
• There are two different interfaces for writing Apache Hive User Defined
Functions:
• Simple API
• Complex API
Simple API
• With the simple UDF API, building a Hive User Defined Function involves
little more than writing a class with one method, evaluate().
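Once such a class is compiled and packaged into a JAR, it can be registered and called from HiveQL. A minimal sketch: the JAR path, the class name com.example.hive.udf.ToUpper, and the function name to_upper are hypothetical placeholders, and the employee table is assumed to have a name column.

```sql
-- Make the JAR that contains the UDF class available to the session (hypothetical path)
ADD JAR /tmp/hive-udfs.jar;

-- Bind a function name to the Java class (hypothetical class name)
CREATE TEMPORARY FUNCTION to_upper AS 'com.example.hive.udf.ToUpper';

-- Call it like a built-in function
SELECT to_upper(name) FROM employee;

-- Remove the temporary function when it is no longer needed
DROP TEMPORARY FUNCTION IF EXISTS to_upper;
```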
Aggregate Functions
• Sum –
hive> select sum(salary) from employee;
• Count –
hive> select count(*) from employee;
• Average –
hive> select avg(salary) from employee where location='Banglore';
• Minimum –
hive> select min(salary) from employee;
• Maximum –
hive> select max(salary) from employee;
• Variance –
hive> select variance(salary) from employee;
• Standard Deviation –
hive> select stddev_pop(salary) from employee;
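As a sketch, the aggregate functions above can also be combined with GROUP BY to get one result row per group instead of one row for the whole table. The employee table and its salary and location columns come from the queries above; the column aliases are only illustrative.

```sql
SELECT location,
       COUNT(*)           AS employees,
       SUM(salary)        AS total_salary,
       AVG(salary)        AS avg_salary,
       MIN(salary)        AS min_salary,
       MAX(salary)        AS max_salary,
       VARIANCE(salary)   AS salary_variance,  -- population variance
       STDDEV_POP(salary) AS salary_stddev     -- population standard deviation
FROM employee
GROUP BY location;
```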
Joins and Subqueries
• LEFT OUTER JOIN - The HiveQL LEFT OUTER JOIN returns all the
rows from the left table, even if there are no matches in the right
table.
• RIGHT OUTER JOIN - The HiveQL RIGHT OUTER JOIN returns all the
rows from the right table, even if there are no matches in the left
table.
• FULL OUTER JOIN - The HiveQL FULL OUTER JOIN returns all the rows from
both the left and the right tables; where a row has no match on the other
side, the missing columns are filled with NULL.
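The three outer joins can be sketched as follows. The CUSTOMERS table is named in this unit, but its columns, the second table ORDERS, and the join key customer_id are assumptions made for this example.

```sql
-- Assumed columns: CUSTOMERS(id, name), ORDERS(oid, customer_id, amount)

-- LEFT OUTER JOIN: every customer; order columns are NULL for customers without orders
SELECT c.id, c.name, o.amount
FROM CUSTOMERS c
LEFT OUTER JOIN ORDERS o ON (c.id = o.customer_id);

-- RIGHT OUTER JOIN: every order; customer columns are NULL for orders without a matching customer
SELECT c.id, c.name, o.amount
FROM CUSTOMERS c
RIGHT OUTER JOIN ORDERS o ON (c.id = o.customer_id);

-- FULL OUTER JOIN: every customer and every order; NULL fills whichever side has no match
SELECT c.id, c.name, o.amount
FROM CUSTOMERS c
FULL OUTER JOIN ORDERS o ON (c.id = o.customer_id);
```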
CUSTOMERS Table
Join
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
Let’s put your knowledge to the test
Map Reduce scripts