Hive - Hands On Exercises: Intellipaat Software Solutions Pvt. LTD
Prerequisites
Create Database
Drop Database
Change to Database
Create Table
Load Data
Drop Table
Alter Table
Queries with Where Clause
Group By Queries
Join Tables
Partitioning
External Tables
Sequence Table
Map Join
Storing Data into file
Storing Data into another Table
Prerequisites
Create Database
Drop Database
Change to Database
Create Table
Syntax:
CREATE TABLE <table-name>
( <column-name> <data-type>,
<column-name> <data-type>);
Example:
CREATE TABLE students(age int, name string) ROW FORMAT DELIMITED FIELDS
TERMINATED BY ',';
Load Data
Drop Table
Syntax: DROP TABLE <TABLE_NAME>;
Note: Once a table is dropped, it can no longer be queried. Recreate the table if
you need to use it again.
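As a small illustration of the note above, the statements below drop the students table from the Create Table example and then recreate it. IF EXISTS is optional; it only prevents an error when the table is already gone.

```sql
-- Drop the table; for a managed table this also removes its data.
DROP TABLE IF EXISTS students;

-- Recreate it before using it again.
CREATE TABLE students(age int, name string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```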
Alter Table
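Syntax: ALTER TABLE <TABLE_NAME> ADD COLUMNS (NEW_COL COL_TYPE);
ALTER TABLE <TABLE_NAME> REPLACE COLUMNS (COL_NAME COL_TYPE, …);
Note: Hive has no DROP COLUMN clause. To remove a column, use REPLACE COLUMNS and
list only the columns you want to keep.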
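A minimal sketch against the students table from the Create Table example. The column name city is made up for illustration and does not come from the exercises.

```sql
-- Append a new column (hypothetical name) to the end of the schema.
ALTER TABLE students ADD COLUMNS (city string);

-- Inspect the resulting schema.
DESCRIBE students;

-- Remove the column again by listing only the columns to keep.
ALTER TABLE students REPLACE COLUMNS (age int, name string);
```

Both statements change only the table metadata, so existing rows return NULL for a newly added column.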
Group By Queries
2, 2
Join Tables
Syntax: SELECT <TABLE1_NAME.COLUMN, TABLE2_NAME.COLUMN, …>
FROM <TABLE1> JOIN <TABLE2> ON (TABLE1.KEY = TABLE2.KEY)
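The sketch below applies this syntax to the students table. The second table, marks, and its columns are hypothetical and exist only to give the join something to match on.

```sql
-- Hypothetical second table keyed by student name.
CREATE TABLE marks(name string, score int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Inner join: only students that have a matching row in marks are returned.
SELECT students.name, students.age, marks.score
FROM students JOIN marks ON (students.name = marks.name);
```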
Partitioning
External Tables
If you want to use existing data without copying it into /user/hive/warehouse,
create an EXTERNAL table with the LOCATION clause as shown below.
Note: Don't use LOAD after creating the table.
2. Create an external table and query its rows with a SELECT statement
CREATE EXTERNAL TABLE ext(age int, name string) ROW FORMAT DELIMITED FIELDS
TERMINATED BY ',' LOCATION '/tmp/table_name';
4. Dropping this table does not delete the underlying files, unlike managed tables
(tables created without the EXTERNAL keyword).
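To see the behaviour described above, one possible sequence in the Hive CLI is sketched here. It assumes the ext table from step 2 exists and that delimited files are already present under /tmp/table_name.

```sql
-- Rows are read directly from the files in the LOCATION directory.
SELECT * FROM ext;

-- Removes only the table definition from the metastore.
DROP TABLE ext;

-- Hive CLI command: the data files should still be listed after the drop.
dfs -ls /tmp/table_name;
```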
Sequence Table
Example
CREATE TABLE students_seq (age int, name string) ROW FORMAT DELIMITED FIELDS
TERMINATED BY ',' STORED AS SEQUENCEFILE;
INSERT OVERWRITE TABLE students_seq SELECT * FROM students;
SELECT * FROM students_seq;
Map Join
A map join loads the smaller table into memory so that the join can be performed entirely
within the mappers, with no reduce phase. This improves the performance of joins in Hive.
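One common way to request a map join is the MAPJOIN hint, sketched below. The marks table is hypothetical, and students is assumed to be the small table that fits in memory.

```sql
-- Let Hive convert joins to map joins automatically when one side is small.
SET hive.auto.convert.join=true;

-- Explicit hint: load students into memory and join inside the mappers.
SELECT /*+ MAPJOIN(students) */ students.name, marks.score
FROM marks JOIN students ON (marks.name = students.name);
```

Depending on the Hive version and the hive.ignore.mapjoin.hint setting, the hint may be ignored in favour of automatic conversion.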
Indexing
Hive UDF
1. mkdir -p com/example/hive/udf
2. vim Lower.java
3. Paste this content
package com.example.hive.udf;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;