Web Based Data Management of Apache Hive
Presented by:
Krupa Patel(190633116001)
Maitri Patel(190633116003)
Riya Soni(190633116004)
Outlines:
Origin
What is Hive?
How Hive Works?
Hive Architecture
Working of Hive
Execution of Hive
Limitations
Apache Hive vs. Pig
Hive Table
Summary
Origin:
Hive was initially developed by Facebook.
Data was stored in an Oracle database every night.
ETL (Extract, Transform, Load) was performed on the data.
Data growth was exponential:
– By 2006: 1 TB/day
– By 2010: 10 TB/day
– By 2013: about 5,000,000,000 per day
There was a need to find some way to manage the data "effectively".
What is Hive?
Hive is a data warehouse infrastructure built on top of Hadoop that can compile SQL queries into MapReduce jobs and run those jobs on the cluster.
Suitable for structured and semi-structured data.
Capable of dealing with different storage and file formats.
Provides HQL (a SQL-like query language).
What Hive is not:
It does not use complex indexes, so it does not respond in seconds.
But it scales very well; it works with data on the petabyte scale.
It is not independent; its performance is tied to the underlying Hadoop cluster.
How Hive Works?
Hive is built on top of Hadoop – think HDFS and MapReduce.
Hive stores data in HDFS.
Hive compiles SQL queries into MapReduce jobs and runs those jobs on the Hadoop cluster.
It stores the schema in a database and the processed data in HDFS.
It is designed for OLAP: we need reports to make operations better, not to conduct the operations themselves.
We use ETL to populate data in the data warehouse (DW).
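As a sketch of what this compilation means in practice, a simple HiveQL aggregate like the one below (the table and column names here are hypothetical) is translated by Hive into a MapReduce job that scans the table's files in HDFS:

```sql
-- Hypothetical table page_views; Hive compiles this query
-- into a MapReduce job over the files backing the table in HDFS.
SELECT country, COUNT(*) AS views
FROM page_views
GROUP BY country;
```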
Hive Architecture
User Interface: Hive is data warehouse infrastructure software that creates the interaction between the user and HDFS; users submit queries through interfaces such as the command line or the Web UI.
Execution Engine: The Hive execution engine is the conjunction of the HiveQL process engine and MapReduce; it executes the query plan in the manner of MapReduce.
Working of Hive:
Execution of Hive:
Execute Query: The Hive interface, such as the command line or Web UI, sends the query to the driver (any database driver such as JDBC, ODBC, etc.) to execute.
Get Plan: The driver takes the help of the query compiler, which parses the query to check the syntax and the query plan, or the requirements of the query.
Get Metadata: The compiler sends a metadata request to the metastore (any database).
Send Metadata: The metastore sends the metadata as a response to the compiler.
Send Plan: The compiler checks the requirements and resends the plan to the driver. Up to here, the parsing and compiling of the query is complete.
Execute Plan: The driver sends the execute plan to the execution engine.
Execute Job: The execution engine sends the job to the JobTracker, which resides on the NameNode, and it assigns this job to the TaskTracker, which resides on a DataNode. Here, the query executes as a MapReduce job.
Metadata Ops: Meanwhile, during execution, the execution engine can execute metadata operations with the metastore.
Apache Hive vs. Apache Pig
Hive Table:
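A minimal sketch of defining a Hive table; the table and column names are illustrative, not from a real schema. Hive stores the schema in the metastore, while the rows live as files under the table's HDFS directory:

```sql
-- Illustrative table definition: tab-delimited text files in HDFS.
CREATE TABLE page_views (
  view_time  TIMESTAMP,
  user_id    BIGINT,
  page_url   STRING,
  country    STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;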
Managing Tables
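A few common table-management statements, sketched with a hypothetical table name:

```sql
SHOW TABLES;                                     -- list tables in the current database
DESCRIBE page_views;                             -- show columns and types
ALTER TABLE page_views RENAME TO page_views_v2;  -- rename the table
DROP TABLE page_views_v2;                        -- remove the table and its metadata
```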
Loading Data
Use LOAD DATA to import data into a Hive table.
Use the keyword OVERWRITE to replace the table's existing data; without it, the loaded files are added alongside the existing ones.
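A sketch of LOAD DATA; the file path and table name are hypothetical:

```sql
-- Import a local file into the table. OVERWRITE replaces any
-- existing data instead of adding the new file alongside it.
LOAD DATA LOCAL INPATH '/tmp/page_views.tsv'
OVERWRITE INTO TABLE page_views;
```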
Insert Data
Use the INSERT statement to populate a table with data from another Hive table.
OVERWRITE replaces the existing data in the table; otherwise the data is appended.
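A sketch of populating one table from another; both table names are illustrative, and the target table is assumed to have a matching schema:

```sql
-- With OVERWRITE the target table's existing rows are replaced;
-- INSERT INTO ... would append instead.
INSERT OVERWRITE TABLE daily_views
SELECT page_url, COUNT(*)
FROM page_views
GROUP BY page_url;
```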
Performing Queries (HiveQL):
SELECT
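A basic HiveQL SELECT, sketched with hypothetical table and column names:

```sql
-- Filter and limit, just as in standard SQL.
SELECT page_url, user_id
FROM page_views
WHERE country = 'IN'
LIMIT 10;
```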
Summary
Thank You…!