Pig


Definition:

Pig represents Big Data processing as data flows.

Pig is a high-level platform for processing large datasets.

It provides a high level of abstraction over MapReduce.

It was developed by Yahoo.

It provides a high-level scripting language, known as Pig Latin, which
is used to write data analysis code.

Pig scripts are internally converted to MapReduce jobs and executed
on data stored in HDFS.

Pig can also execute its jobs on Apache Spark.

Pig can handle any type of data (structured, semi-structured, or
unstructured) and stores the results in the Hadoop Distributed
File System (HDFS).

Every task that can be achieved using Pig can also be achieved
using Java MapReduce.
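
As a rough illustration of how compact a Pig Latin data flow can be, here is a minimal word-count sketch. The input path 'input.txt' and the output directory 'wordcount_out' are hypothetical names chosen for this example.

```pig
-- Hypothetical input: a plain text file, one line of text per record
lines   = LOAD 'input.txt' AS (line:chararray);

-- Split each line into words, one word per record
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- Group identical words together and count each group
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS occurrences;

-- Write the result to a (hypothetical) output directory
STORE counts INTO 'wordcount_out';
```

Each statement names one step of the data flow. Pig turns the whole flow into one or more MapReduce jobs when STORE is reached.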
Architecture:
The Pig architecture consists of a Pig Latin interpreter, which
runs on the client machine.

It takes Pig Latin scripts and converts them into a series of
MapReduce (MR) jobs.
It then executes the MR jobs and saves the results to HDFS.

In between, it performs several steps: it parses, optimizes, and
compiles the script and plans the execution of the data flow.
Parser:
The parser checks the syntax of the script and does type checking
and other miscellaneous checks. Its output is a DAG (directed
acyclic graph) that represents the Pig Latin statements and
logical operators.

Optimizer:
The output of the parser (the logical plan) is passed to the
optimizer, which carries out logical optimizations such as
projection and pushdown.

Compiler:
The compiler compiles the optimized logical plan
into a series of MapReduce jobs.
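
The stages above can be inspected from the Grunt shell. The following is a minimal sketch, assuming a hypothetical tab-separated file 'students.txt' with a name and a mark on each line. DESCRIBE prints the schema of a relation, and EXPLAIN prints the logical, physical, and MapReduce plans Pig builds for it.

```pig
-- Hypothetical input file and field names, used only for illustration
students = LOAD 'students.txt' AS (name:chararray, marks:int);
passed   = FILTER students BY marks >= 40;

-- Show the schema of the relation after parsing and type checking
DESCRIBE passed;

-- Show the logical, physical, and MapReduce plans for the relation
EXPLAIN passed;
```

Neither command runs the job. They only show what Pig would do, which makes them useful for seeing how a script maps onto MapReduce.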
Features of PIG:

It provides a rich set of operations such as filter, join, and sort.

It requires fewer lines of code.

It can handle both structured and unstructured data.

It makes complex programs easier to write.
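
A short sketch of the filter, join, and sort operations mentioned above. The file names, delimiters, and field names are assumptions made for this example only.

```pig
-- Hypothetical comma-separated inputs
students = LOAD 'students.txt' USING PigStorage(',')
           AS (id:int, name:chararray, dept_id:int, marks:int);
depts    = LOAD 'depts.txt' USING PigStorage(',')
           AS (dept_id:int, dept_name:chararray);

-- Filter: keep only students with at least 40 marks
passed = FILTER students BY marks >= 40;

-- Join: attach the department name to each remaining student
joined = JOIN passed BY dept_id, depts BY dept_id;
report = FOREACH joined GENERATE passed::name AS name,
                                 depts::dept_name AS dept_name,
                                 passed::marks AS marks;

-- Sort: highest marks first, then print to the console
ranked = ORDER report BY marks DESC;
DUMP ranked;
```

Writing the same join and sort directly in Java MapReduce would typically take several classes, which is the point of the "fewer lines of code" feature.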


Difference between PIG and Map Reduce:

PIG                                                              | MapReduce
It is a scripting language.                                      | It is written in a compiled programming language (typically Java).
It needs fewer lines of code than MapReduce.                     | It needs more lines of code.
It requires less development effort.                             | It requires more development effort.
It is a high-level data processing tool.                         | It is a low-level data processing tool.
It provides built-in operators for union, sorting, and ordering. | Data operations are difficult to perform in MapReduce.
Difference between PIG and HIVE:

PIG                                               | HIVE
It operates on the client side of a cluster.      | It operates on the server side of a cluster.
It uses Pig Latin, which is used for programming. | It uses HQL (Hive Query Language), which is used for reporting.
It was developed by Yahoo.                        | It was developed by Facebook.
It handles structured and unstructured data.      | It handles structured data.
It loads data quickly.                            | It loads data slowly.
It does not support JDBC and ODBC.                | It supports JDBC and ODBC.
PIG Utility commands:
Grunt is Pig's interactive shell. It lets us enter Pig Latin statements
interactively and provides a shell for interacting with HDFS.

To enter the Grunt shell, use the pig command.

To enter the Grunt shell in local mode, use the
pig -x local command.

Shell Commands:
The fs command is used to execute Hadoop fs shell commands from the Grunt shell or a Pig script.

Example:
grunt> fs -ls

grunt> fs -mkdir dir_name

Utility Commands:
grunt> clear

grunt> help

grunt> run myscript.pig

grunt> quit
File Commands:

cat filename: prints the contents of a file to stdout

copyFromLocal local_file hdfs_file: copies a file from the local file system to HDFS

copyToLocal hdfs_file local_file: copies a file from HDFS to the local file system
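
Put together, a short Grunt session using these commands might look like the sketch below. The directory pig_demo and the files students.txt and students_copy.txt are hypothetical names, and the session assumes Pig is connected to HDFS rather than running in local mode.

```pig
grunt> fs -mkdir pig_demo
grunt> copyFromLocal students.txt pig_demo/students.txt
grunt> cat pig_demo/students.txt
grunt> students = LOAD 'pig_demo/students.txt' AS (name:chararray, marks:int);
grunt> DUMP students;
grunt> copyToLocal pig_demo/students.txt students_copy.txt
grunt> quit
```

The first three commands stage the file in HDFS and check it. LOAD and DUMP then read it with Pig Latin, and copyToLocal brings a copy back to the local file system.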
