Web Based Data Management of Apache Hive


3161607 – Big Data Analytics

WEB BASED DATA MANAGEMENT OF APACHE HIVE
Submitted To : Prof. Pooja Bhatt

Presented By :
Krupa Patel(190633116001)
Maitri Patel(190633116003)
Riya Soni(190633116004)

1
Outline:
Origin
What is Hive?
How Hive Works?
Hive Architecture
Working of Hive
Execution of Hive
Limitations
Apache Hive vs. Pig
Hive Table
Summary

2
Origin:
Hive was initially developed by Facebook.
 Data was stored in an Oracle database every night
ETL (Extract, Transform, Load) was performed on the data
The data growth was exponential
– By 2006, 1 TB/day
– By 2010, 10 TB/day
– By 2013, about 5,000,000,000 per day
and there was a need to find some way to manage the data
“effectively”.

3
What is Hive?
Hive is a data warehouse infrastructure built on top of
Hadoop that compiles SQL queries into MapReduce
jobs and runs them on the cluster.
Suitable for structured and semi-structured data.
Capable of dealing with different storage and file formats.
 Provides HQL (an SQL-like query language).
What Hive is not:
It does not use complex indexes, so it does not respond in
seconds.
But it scales very well; it works with data of petabyte
order.
It is not independent; its performance is tied to the
underlying Hadoop cluster.

4
How Hive Works?
 Hive is built on top of Hadoop – think HDFS and
MapReduce.
 Hive stores its data in HDFS.
Hive compiles SQL queries into MapReduce jobs and
runs them on the Hadoop cluster.
It stores the schema in a database and the processed data in
HDFS.
It is designed for OLAP.
We need reports to make operations better, not to
conduct the operations themselves.
5
 We use ETL to populate data in the DW (data warehouse).
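
As a sketch of the idea, a query like the following (the table and column names are hypothetical, not from the slides) is compiled by Hive into a MapReduce job: the map phase reads rows from files in HDFS, and the reduce phase performs the aggregation.

```sql
-- Hypothetical page-view log stored in HDFS; Hive turns the
-- GROUP BY into a shuffle-and-reduce over the cluster.
SELECT page_url, COUNT(*) AS views
FROM page_views
GROUP BY page_url;
```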
Hive Architecture

6
Hive Architecture
User Interface – Hive is data warehouse infrastructure
software that creates interaction between the user and HDFS.

Meta Store – Hive chooses respective database servers to
store the schema or metadata of tables, databases, columns in
a table, their data types, and the HDFS mapping.
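
The metadata held in the Metastore can be inspected from HiveQL itself; for example (assuming a table named page_views already exists):

```sql
SHOW DATABASES;                 -- databases registered in the Metastore
SHOW TABLES;                    -- tables in the current database
DESCRIBE FORMATTED page_views;  -- columns, types, and HDFS location
```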

HiveQL Process Engine – HiveQL is similar to SQL, used for
querying schema information in the Metastore. It is one of the
replacements for the traditional approach of writing
MapReduce programs.

7
Hive Architecture
Execution Engine – The conjunction of the HiveQL
Process Engine and MapReduce is the Hive Execution
Engine. It processes the query and generates results using
MapReduce.

HDFS or HBase – The Hadoop Distributed File System or
HBase is the data storage technique used to store data in the
file system. Extreme scalability (up to 100 PB);
self-healing storage.

8
Working of Hive :

9
Execution of Hive :
Execute Query: A Hive interface such as the
Command Line or Web UI sends the query to the Driver (any
database driver such as JDBC, ODBC, etc.) to execute.
Get Plan: The driver takes the help of the query
compiler, which parses the query to check the syntax and
build the query plan, i.e. the requirements of the query.
Get Metadata: The compiler sends a metadata request
to the Metastore (any database).
Send Metadata: The Metastore sends the metadata as a
response to the compiler.

10
Execution of Hive :
Send Plan: The compiler checks the requirements and
resends the plan to the driver. Up to this point, the parsing
and compiling of the query is complete.
Execute Plan: The driver sends the execution plan to
the execution engine.
Execute Job: The execution engine sends the job to the
JobTracker, which runs on the NameNode, and it assigns this
job to TaskTrackers, which run on the DataNodes. Here, the
query executes as a MapReduce job.

11
Execution of Hive :
Metadata Ops: Meanwhile, during execution, the
execution engine can perform metadata operations with
the Metastore.

Fetch Result: The execution engine receives the
results from the DataNodes.

Send Results: The execution engine sends those
resultant values to the driver. The driver sends the
results to the Hive interfaces.
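
The parse-and-plan steps above can be observed directly: Hive's EXPLAIN statement prints the plan the compiler produces, before the execution engine submits any MapReduce job (the table name here is hypothetical):

```sql
EXPLAIN
SELECT page_url, COUNT(*) AS views
FROM page_views
GROUP BY page_url;
-- The output lists the stage graph: typically a map-reduce stage
-- for the aggregation, followed by a fetch stage that returns
-- the results to the driver.
```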
12
Limitations:
The biggest limitation of Hadoop is that one has to
use the M/R model (MapReduce model). Other
limitations are as stated below:
* Not reusable
* Error prone
* Multiple stages of Map/Reduce functions are needed for
complex jobs.
* It is just like asking a developer to write the physical
execution plan in the DB.

13
Apache Hive vs. Apache Pig

14
Hive Table:
A Hive table consists of:

Data: a file or group of files in HDFS.

Schema: metadata stored in a relational database.

You have to define a schema if you have existing data in
HDFS that you want to use in Hive.

Schema and data are separate.


15
Defining a Table
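
A minimal sketch of a table definition (the table name, columns, and delimiter are assumptions for illustration, not from the slides):

```sql
CREATE TABLE page_views (
  user_id   BIGINT,
  page_url  STRING,
  view_time TIMESTAMP
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'  -- tab-separated text files in HDFS
STORED AS TEXTFILE;
```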

16
Managing Table
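
Because schema and data are separate, the usual management distinction is managed vs. external tables: dropping a managed table deletes its HDFS data, while dropping an EXTERNAL table leaves the files in place. A sketch (the path and names are hypothetical):

```sql
-- External table: Hive records only the schema; the data stays
-- where it already is in HDFS.
CREATE EXTERNAL TABLE raw_views (
  user_id  BIGINT,
  page_url STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/raw_views';

DROP TABLE raw_views;  -- removes metadata only; the files remain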

17
Loading Data
 Use LOAD DATA to import data into a Hive table.
 Use the keyword OVERWRITE to replace the table's existing
contents; otherwise the new files are added alongside them.
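
For example (the file path and table name are assumptions):

```sql
-- Move a local file into the table's HDFS directory, replacing
-- whatever the table currently holds.
LOAD DATA LOCAL INPATH '/tmp/page_views.tsv'
OVERWRITE INTO TABLE page_views;
```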

18
Insert Data
 Use the INSERT statement to populate a table with data from
another Hive table.
 OVERWRITE replaces the data in the table; otherwise
the data is appended to the table.
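
A sketch of both forms (table and column names are hypothetical):

```sql
-- Replace the target table's contents with an aggregate of
-- another table.
INSERT OVERWRITE TABLE page_view_counts
SELECT page_url, COUNT(*)
FROM page_views
GROUP BY page_url;

-- INSERT INTO (no OVERWRITE) appends to the existing data instead.
INSERT INTO TABLE page_view_counts
SELECT page_url, COUNT(*)
FROM page_views_today
GROUP BY page_url;
```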

19
Performing Queries (HiveQL):
SELECT
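
A sketch of a typical HiveQL SELECT, combining filtering, aggregation, and ordering (all names and the date literal are hypothetical):

```sql
-- Top 10 most-viewed pages since the start of 2013.
SELECT page_url, COUNT(*) AS views
FROM page_views
WHERE view_time >= '2013-01-01'
GROUP BY page_url
ORDER BY views DESC
LIMIT 10;
```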

20
Summary

21
Thank You…!

22
