Big Data
CHAPTER-5
Introduction to HBase
Introduction
• Non-relational
• No joins
• No query language
• Not a drop-in replacement for an RDBMS
Features
• Linear Scalability
• Backup option
PIG
[Figure: Pig architecture, with the labels: Parse statements, Plan, MapReduce, HDFS]
Features of PIG
• Handles all kinds of data: Apache Pig can analyze all kinds of data, both structured and unstructured.
Workflow of PIG
Can Hadoop be used more efficiently?
Let's see…
Ideas
Not a great idea.
Shrink (before / after)
Genetic change (before / after)
Behind the scenes…?
Facebook initially developed Hive.
What is Hive?
1. Column Types
2. Literals
3. Null Values
4. Complex Types
Column Types
Column types are used as the column data types of Hive. They are as follows:
1. Integral Types
2. String Types
3. Timestamp
4. Dates
5. Decimals
6. Union Types
Literals
1. Floating Point Types
Floating point types are numbers with decimal points. Generally, this type of data is represented by the DOUBLE data type.
2. Decimal Type
Decimal type data is a floating point value with a higher range than the DOUBLE data type. The range of the decimal type is approximately -10^-308 to 10^308.
Complex Types
Create Database
Drop Database
Create Table
Example
Syntax of example
Load Data Statement
Example
Alter Table Statement
Change Statement
Drop Table Statement
Example
Partition
Renaming Partition
Dropping a Partition
Operator
There are four types of operators in Hive:
1. Relational Operators
2. Arithmetic Operators
3. Logical Operators
4. Complex Operators
Built-in Functions
Hive provides a set of built-in functions. They look quite similar to SQL functions, except for their usage.
Creating a View
Dropping a View
Creating an Index
Dropping an Index
Select Query
Order By
Group By
Join Table
Join
Left Outer Join
Right Outer Join
Full Outer Join
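The four join types listed above differ only in which unmatched rows they keep. The following is a minimal plain-Python sketch of that behavior, not HiveQL and not how Hive executes joins. The `join` function, the table contents, and the column names are all hypothetical, and the sketch assumes non-null keys and no column names shared between the two tables other than the join key.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def join(left: List[Row], right: List[Row], key: str, how: str = "inner") -> List[Row]:
    """Join two lists of rows on `key`; `how` is inner, left, right, or full."""
    if how not in ("inner", "left", "right", "full"):
        raise ValueError(f"unknown join type: {how}")
    # Non-key columns, used to pad unmatched rows with None (SQL NULL).
    left_cols = [c for c in (left[0] if left else {}) if c != key]
    right_cols = [c for c in (right[0] if right else {}) if c != key]
    result: List[Row] = []
    matched_right = set()
    for l_row in left:
        hits = [i for i, r_row in enumerate(right) if r_row[key] == l_row[key]]
        for i in hits:
            matched_right.add(i)
            result.append({**l_row, **right[i]})
        # Left and full outer joins keep left rows that found no match.
        if not hits and how in ("left", "full"):
            result.append({**l_row, **dict.fromkeys(right_cols)})
    # Right and full outer joins keep right rows that found no match.
    if how in ("right", "full"):
        for i, r_row in enumerate(right):
            if i not in matched_right:
                result.append({**dict.fromkeys(left_cols), **r_row})
    return result


# Hypothetical sample tables.
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
orders = [{"id": 1, "amount": 50}, {"id": 1, "amount": 20}, {"id": 4, "amount": 99}]

for how in ("inner", "left", "right", "full"):
    print(how, len(join(customers, orders, "id", how)))
```

With this sample data, the inner join keeps only customer 1's two orders, the left outer join also keeps customers 2 and 3 with a null amount, the right outer join also keeps the order with id 4, and the full outer join keeps all of them.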
Physical Layout of Hive
• Partitions: subdirectories of the corresponding table directory.
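To make the partition layout concrete, here is a small Python sketch that builds the directory path for one partition. It assumes the default database and the default warehouse location `/user/hive/warehouse`. The helper `partition_path`, the table name `sales`, and the partition columns `year` and `month` are hypothetical. Real Hive also escapes special characters in partition values, which this sketch does not do.

```python
from pathlib import PurePosixPath
from typing import Dict


def partition_path(warehouse: str, table: str, partition_spec: Dict[str, str]) -> str:
    """Return the directory for one partition of a table.

    Each partition column adds one `column=value` subdirectory under
    the table directory, in the order the columns are given.
    """
    path = PurePosixPath(warehouse) / table
    for column, value in partition_spec.items():
        path = path / f"{column}={value}"
    return str(path)


print(partition_path("/user/hive/warehouse", "sales", {"year": "2024", "month": "01"}))
```

Because each partition is its own directory, a query that filters on a partition column only needs to read the matching subdirectories.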
Encapsulation
• The Hive engine translates queries into a directed acyclic graph of MapReduce jobs.
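A directed acyclic graph of jobs means that each job can start only after the jobs it depends on have finished. The Python sketch below orders such a graph with a plain topological sort. It is an illustration only: the function `execution_order` and the stage names are hypothetical and do not come from an actual Hive query plan.

```python
from collections import deque
from typing import Dict, List, Set


def execution_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return the stages in an order that respects their dependencies.

    `dependencies` maps each stage to the set of stages that must
    finish before it can start.
    """
    stages = set(dependencies) | {d for deps in dependencies.values() for d in deps}
    remaining = {s: set(dependencies.get(s, ())) for s in stages}
    # Stages with no prerequisites can run first.
    ready = deque(sorted(s for s, deps in remaining.items() if not deps))
    order: List[str] = []
    while ready:
        stage = ready.popleft()
        order.append(stage)
        # Finishing a stage may unblock the stages that waited on it.
        for other in sorted(remaining):
            if stage in remaining[other]:
                remaining[other].remove(stage)
                if not remaining[other]:
                    ready.append(other)
    if len(order) != len(stages):
        raise ValueError("dependencies contain a cycle, so this is not a DAG")
    return order


# Hypothetical plan: two scans feed a join, whose output is aggregated.
plan = {
    "scan_a": set(),
    "scan_b": set(),
    "join": {"scan_a", "scan_b"},
    "aggregate": {"join"},
}
print(execution_order(plan))
```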
• Normal tables are created under the warehouse directory. External tables read directly from an HDFS file.
• Normal tables are directly visible through HDFS directory browsing. External tables are not visible in the warehouse directory.
• On dropping a normal table, both the source data and the table metadata are deleted. On dropping an external table, only the metadata is deleted.
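The difference in drop behavior can be sketched with a toy model in Python. This is not Hive code: `ToyMetastore`, the table names, and the local temporary directories that stand in for HDFS are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict, Tuple


class ToyMetastore:
    """Toy stand-in for a metastore: table name -> (location, is_external)."""

    def __init__(self, warehouse: Path) -> None:
        self.warehouse = warehouse
        self.tables: Dict[str, Tuple[Path, bool]] = {}

    def create_normal(self, name: str) -> Path:
        # Normal tables live in a directory under the warehouse.
        location = self.warehouse / name
        location.mkdir(parents=True)
        self.tables[name] = (location, False)
        return location

    def create_external(self, name: str, location: Path) -> None:
        # External tables only record where the existing data lives.
        self.tables[name] = (location, True)

    def drop(self, name: str) -> None:
        location, is_external = self.tables.pop(name)  # metadata is always removed
        if not is_external:
            shutil.rmtree(location)  # data is removed only for normal tables


root = Path(tempfile.mkdtemp())
external_dir = root / "landing" / "logs"
external_dir.mkdir(parents=True)
(external_dir / "part-0.txt").write_text("a,b\n")

store = ToyMetastore(root / "warehouse")
normal_dir = store.create_normal("clicks")
(normal_dir / "part-0.txt").write_text("x,y\n")
store.create_external("logs", external_dir)

store.drop("clicks")
store.drop("logs")
print(normal_dir.exists(), (external_dir / "part-0.txt").exists())
shutil.rmtree(root)  # clean up the temporary directories
```

After both drops, the normal table's directory is gone while the external table's file is still in place. Only the metadata entries were removed for both.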
Joins
Apache Hive data file formats
• TEXTFILE
• SEQUENCEFILE
• RCFILE
• ORC
• PARQUET
• AVRO
PIG VS HIVE
• Pig is mainly used for programming.
• Hive is mainly used by data analysts.