Pig Questions

Q1. Consider the student data file (st.txt), with data in the following format: Name, District, Age, Gender.
i. Write a PIG Script to display Names of all male students.
ii. Write a PIG Script to find the number of students from Ghaziabad district.
iii. Write a PIG Script to display district wise count of all female students.

Ans.

1. Load the data from the student data file (st.txt).

students = LOAD 'st.txt' USING PigStorage(',') AS (Name:chararray, District:chararray, Age:int, Gender:chararray);

2. Write a PIG Script to display Names of all male students.

male_students = FILTER students BY Gender == 'Male';
male_student_names = FOREACH male_students GENERATE Name;
DUMP male_student_names;
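
The filter above matches only the exact string 'Male'. If st.txt has spaces after the commas or mixed capitalization (an assumption, since the file contents are not shown), a more tolerant variant might look like the sketch below. It uses Pig's built-in TRIM and LOWER string functions.

```pig
-- Sketch only: assumes Gender may carry stray whitespace or mixed case.
students = LOAD 'st.txt' USING PigStorage(',') AS (Name:chararray, District:chararray, Age:int, Gender:chararray);

-- Normalize Gender so that ' male', 'Male', and 'MALE' all match.
male_students = FILTER students BY LOWER(TRIM(Gender)) == 'male';

-- Trim the name as well so the output has no leading or trailing spaces.
male_student_names = FOREACH male_students GENERATE TRIM(Name) AS Name;
DUMP male_student_names;
```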

3. Write a PIG Script to find the number of students from Ghaziabad district.

Ghaziabad_students = FILTER students BY District == 'Ghaziabad';
grouped_Ghaziabad_students = GROUP Ghaziabad_students ALL;
count_students = FOREACH grouped_Ghaziabad_students GENERATE COUNT(Ghaziabad_students);
DUMP count_students;

4. Write a PIG Script to display district wise count of all female students.

female_students = FILTER students BY Gender == 'Female';
grouped_female_students = GROUP female_students BY District;
count_female_students = FOREACH grouped_female_students GENERATE group AS District, COUNT(female_students) AS Female_Count;
DUMP count_female_students;

Q2. Consider the file (sample.txt) shown below. Write the steps to compute the word count using a Pig Latin script.

This is a hadoop class
hadoop is a bigdata technology

Ans.
Step I: Load the data from HDFS
lines = LOAD '/path/to/file/' AS (line:chararray);
Step II: Convert each sentence into words
tokens = FOREACH lines GENERATE TOKENIZE(line, ' ');
O/P:
({(This),(is),(a),(hadoop),(class)})
({(hadoop),(is),(a),(bigdata),(technology)})
Step III: Convert column into rows
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line, ' ')) AS word;
O/P:
(This)
(is)
(a)
(hadoop)
(class)
(hadoop)
(is)
(a)
(bigdata)
(technology)

Step IV: Apply GROUP BY
Grouped = GROUP words BY word;
Step V: Generate word count
wordcount = FOREACH Grouped GENERATE group, COUNT(words);
DUMP wordcount;
O/P:
(a,2)
(is,2)
(This,1)
(class,1)
(hadoop,2)
(bigdata,1)
(technology,1)
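
For reference, the five steps can be assembled into a single script, roughly as sketched below. The load path is the same placeholder used in Step I. The relation is named `lines` because `input` is a reserved word in Pig Latin, and the field names `word` and `occurrences` are chosen here for illustration.

```pig
-- Word count sketch; '/path/to/file/' is a placeholder path.
-- With no USING clause, Pig loads with PigStorage (tab-delimited),
-- so a line without tabs arrives as a single chararray field.
lines = LOAD '/path/to/file/' AS (line:chararray);

-- Split each line on spaces and flatten the bag into one word per row.
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line, ' ')) AS word;

-- Collect identical words together, then count the rows in each group.
grouped = GROUP words BY word;
wordcount = FOREACH grouped GENERATE group AS word, COUNT(words) AS occurrences;

DUMP wordcount;
```

Note that the grouping is case-sensitive, so "This" and "this" would be counted as different words unless the words are lowercased first.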

Q3. Considering the given information, analyze the Twitter data and explain the steps involved in finding how many tweets are created by a user using Pig Latin.

Ans:
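
The Twitter data itself is not reproduced here, so the following is only a sketch under an assumed layout: a tab-separated file with one tweet per line and the hypothetical columns tweet_id, user_name, and tweet_text. The file name tweets.txt is also hypothetical. The steps mirror the word count pattern: load, group by user, then count each group.

```pig
-- Hypothetical input: tweets.txt, tab-separated (tweet_id, user_name, tweet_text).
-- The file name and all column names are assumptions, not taken from the question.
tweets = LOAD 'tweets.txt' USING PigStorage('\t') AS (tweet_id:long, user_name:chararray, tweet_text:chararray);

-- Step 1: group all tweets that share the same user.
by_user = GROUP tweets BY user_name;

-- Step 2: count the tweets in each user's group.
tweet_counts = FOREACH by_user GENERATE group AS user_name, COUNT(tweets) AS tweet_count;

-- Step 3: display one (user_name, tweet_count) row per user.
DUMP tweet_counts;
```

If the actual data is in JSON, as Twitter exports often are, it would first need to be loaded with a JSON-aware loader and the user field extracted before grouping.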
