© Hortonworks Inc. 2011 - 2018. All Rights Reserved

Overview

Lesson Objectives

⬢ Present an overview of Hive
– Compare/contrast to RDBMS technologies
– Step through the architectural design
⬢ Explain how to perform classic operations
– Create and populate tables
– Utilize views
⬢ Review the performance improvements from the Stinger initiatives
⬢ Observe the demonstration: Data Manipulation with Hive

About Hive
Performing Operations in Hive
Performance Improvements

What is Hive?

⬢ Data warehouse system for Hadoop
⬢ Create schemas/table definitions that point to data in Hadoop
⬢ Treat your data in Hadoop as tables
⬢ SQL-92-style query language (HiveQL)
⬢ Interactive queries at scale

Query Process

Performing Operations in Hive

-- Managed table: Hive owns the data files as well as the metadata.
CREATE TABLE customer (
    customerID INT,
    firstName  STRING,
    lastName   STRING,
    birthday   TIMESTAMP
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

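-- External table: Hive tracks only the metadata. With no LOCATION clause,
-- the data directory defaults to a folder under the Hive warehouse.
CREATE EXTERNAL TABLE salaries (
    gender STRING,
    age    INT,
    salary DOUBLE,
    zip    INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';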

-- External table over files at an explicit HDFS location.
CREATE EXTERNAL TABLE SALARIES (
    gender STRING,
    age    INT,
    salary DOUBLE,
    zip    INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/train/salaries/';
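
Once a table definition like the one above exists, populating and querying it might look like the following sketch. The staging path /user/train/staging/salaries.csv is a hypothetical example, not a path from the course material; the column names come from the salaries definition.

```sql
-- Move a comma-delimited file that is already in HDFS into the table's
-- directory. The source path is a hypothetical example.
LOAD DATA INPATH '/user/train/staging/salaries.csv'
INTO TABLE salaries;

-- The files are parsed against the table schema only when they are read.
SELECT gender, AVG(salary) AS avg_salary
FROM salaries
GROUP BY gender;

-- salaries is EXTERNAL, so dropping it removes only the table definition;
-- the files under the table's location stay in HDFS.
DROP TABLE salaries;
```

Dropping a managed table such as customer, by contrast, deletes its data files along with the metadata.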

Figure: a Hive query running as MapReduce tasks across DataNodes. Shown on the Mapper side: SELECT c.zip, COUNT(*) FROM customers c WHERE…. Shown on the Reducer side: JOIN orders o ON c.cid = o.cid, GROUP BY (c.zip), ORDER BY, DISTINCT.
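
Written out as a single statement, the query that the diagram assembles clause by clause might look like the sketch below. The slide elides the WHERE predicate, so the filter shown here is a placeholder assumption, as is the order_count alias. Prefixing a statement with EXPLAIN asks Hive to print its stage plan instead of running it, which is one way to check which operators land in map stages and which in reduce stages.

```sql
EXPLAIN
SELECT c.zip, COUNT(*) AS order_count
FROM customers c
JOIN orders o ON c.cid = o.cid
WHERE c.zip IS NOT NULL          -- placeholder; the slide shows only "WHERE…"
GROUP BY c.zip
ORDER BY order_count DESC;
```

The exact plan depends on the Hive version, the execution engine, and settings such as map-side join conversion, so the output will not always match the diagram's split.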


CREATE VIEW 2010_visitors AS
SELECT fname, lname, time_of_arrival, info_comment
FROM wh_visits
WHERE CAST(SUBSTRING(time_of_arrival, 6, 4) AS INT) >= 2010
  AND CAST(SUBSTRING(time_of_arrival, 6, 4) AS INT) < 2011;

-- Hive accepts the FROM clause ahead of SELECT.
FROM 2010_visitors
SELECT *
WHERE info_comment LIKE '%CONGRESS%'
ORDER BY lname;
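
For readers more used to standard SQL ordering, the same query can be sketched with SELECT first, followed by two housekeeping statements for the view. This is an illustrative sketch that reuses the 2010_visitors name from the slide; it is not taken from the course material.

```sql
-- Same query with the conventional clause order.
SELECT *
FROM 2010_visitors
WHERE info_comment LIKE '%CONGRESS%'
ORDER BY lname;

-- A view stores only its defining query, not a copy of the rows.
-- DESCRIBE FORMATTED shows that definition along with other metadata.
DESCRIBE FORMATTED 2010_visitors;

-- Dropping the view leaves the underlying wh_visits table untouched.
DROP VIEW IF EXISTS 2010_visitors;
```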

Performance Improvements

SELECT a.state, COUNT(*), AVG(c.price)
FROM a
JOIN b ON (a.id = b.id)
JOIN c ON (a.itemId = c.itemId)
GROUP BY a.state;

Figure: execution of the query above as a graph of map (M) and reduce (R) tasks.

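-- Create a new table stored as ORC.
CREATE TABLE tablename (
...
) STORED AS ORC;

-- Change the declared format of an existing table (metadata only;
-- existing data files are not rewritten).
ALTER TABLE tablename SET FILEFORMAT ORC;

-- Make ORC the default format for tables created in this session.
SET hive.default.fileformat=ORC;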
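
Because changing a table's declared file format does not convert data already written, one common way to move an existing text table to ORC is to copy it with CREATE TABLE ... AS SELECT. The sketch below assumes the salaries table defined earlier; salaries_orc is a hypothetical name chosen for illustration.

```sql
-- Copy the rows of the text-format table into a new ORC-backed table.
-- salaries_orc is a hypothetical table name.
CREATE TABLE salaries_orc
STORED AS ORC
AS
SELECT * FROM salaries;

-- Queries against the copy are written exactly as before.
SELECT zip, AVG(salary) AS avg_salary
FROM salaries_orc
GROUP BY zip;
```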

LLAP (Live Long and Process)
Hive LLAP combines persistent query servers and intelligent in-memory caching to deliver fast SQL queries without sacrificing the scalability Hive and Hadoop are known for.

● LLAP uses persistent query servers to avoid long startup times and deliver fast SQL.
● LLAP shares its in-memory cache among all SQL users, maximizing the use of this scarce resource.
● LLAP has fine-grained resource management and preemption, making it well suited to highly concurrent access across many users.
● LLAP is 100% compatible with existing Hive SQL and Hive tools.
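
Because LLAP runs existing Hive SQL unchanged, switching a session to it is a matter of configuration rather than query rewriting. The following is a rough sketch, assuming a cluster where LLAP daemons are already running and per-session overrides are permitted; on many installations these values are preset on a dedicated interactive HiveServer2 endpoint instead. The query reuses the salaries table from earlier purely as an illustration.

```sql
-- Session settings (assumption: the administrator allows changing them).
SET hive.execution.engine=tez;
SET hive.execution.mode=llap;
SET hive.llap.execution.mode=all;

-- The query itself is ordinary HiveQL.
SELECT gender, COUNT(*) AS people
FROM salaries
GROUP BY gender;
```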

/apps/hive/warehouse
