
MACHINE EXERCISE:-5

A. Load the file by assuming an appropriate schema.


stud = load 'stud.txt' using PigStorage(',')
as (rollno:int, name:chararray, age:int, sex:chararray, city:chararray, marks:float);

1. Display all the information from the student file in descending order of marks and store it in
the "ACET" directory.
a = order stud by marks desc;
store a into 'ACET';

2. Display the rollno and name of the top 5 students (by marks) who belong to Amritsar.
b = filter stud by city == 'Amritsar';
c = order b by marks desc;
d = limit c 5;
e = foreach d generate rollno, name;

3. Display all females from Amritsar who are more than 20 years old and scored more than 80 marks.
a = filter stud by city == 'Amritsar' and age > 20 and marks > 80 and sex == 'Female';

4. Display the name and age of all the students whose name starts with "S" and who are from Delhi.
a = filter stud by name matches 'S.*' and city == 'Delhi';
b = foreach a generate name, age;

5. Display the count of students city-wise.
a = group stud by city;
b = foreach a generate group, COUNT(stud);

6. Display the average, maximum and minimum marks obtained by the students city-wise.
a = group stud by city;
b = foreach a generate group, AVG(stud.marks), MAX(stud.marks), MIN(stud.marks);

7. Rank all the students in both Normal form and DENSE form in descending order.
a= rank stud by marks desc;
b= rank stud by marks desc DENSE;

8. Generate the rank of all students whose age is more than 21 and who belong to Ludhiana.
a = filter stud by age > 21 and city == 'Ludhiana';
b = rank a by marks desc;

9. Generate a 20% sample of all students.
a = sample stud 0.2;

STUDENT DATASETS:-
1. 121801,Rakesh,20,Male,Amritsar,87

2. 121802,Meenu,21,Female,Delhi,93

3. 121803,Meera,23,Female,Ludhiana,84

4. 121804,Rohan,21,Male,Jaipur,75

5. 121805,Gopal,24,Male,Amritsar,82

6. 121806,jeeva,22,Female,Amritsar,85

7. 121807,Dheeru,23,Male,Jaipur,90

8. 121808,Lagan,22,Male,Delhi,78

9. 121809,Mallik,23,Male,Amritsar,81

10. 1218010,Monika,21,Female,Delhi,85

11. 1218011,Neelu,23,Female,Sonipat,76

12. 1218012,Neelma,22,Female,Ludhiana,86

13. 1218013,Aditya,20,Male,Delhi,75

14. 1218014,priyanshu,22,Male,Panipat,80

15. 1218015,Ritu,20,Female,Nalanda,85
16. 1218016,Bharti,22,Female,Dharmpur,93

MACHINE EXERCISE:-6

Q1. Differentiate between the group and cogroup operators in Apache Pig.

Group operator:-
The GROUP operator organizes the tuples of a relation into groups based on one or more
fields. Its output has two fields: the first, named group, holds the value of the field(s)
on which the grouping was performed, and the second is a bag, named after the input
relation, that contains all the tuples belonging to that group.

Syntax:-
relname = GROUP relname1 { ALL | BY fieldnames } [USING 'collected' | 'merge']
[PARTITION BY partitioner] [PARALLEL n];
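
Cogroup operator:-
• The COGROUP operator works in the same way as the GROUP operator.
• By convention, GROUP is used in statements involving one relation, and COGROUP in statements
involving two or more relations. Its output contains the group key plus one bag per input relation.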
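
The difference can be seen by describing the output of each operator. The following is a minimal sketch; the files customers.txt and orders.txt and their schemas are assumptions made for illustration and are not part of the exercise data.

```pig
-- Assumed inputs (hypothetical files and schemas)
customers = LOAD 'customers.txt' USING PigStorage(',') AS (id:int, name:chararray);
orders = LOAD 'orders.txt' USING PigStorage(',') AS (oid:int, customer_id:int, amount:float);

-- GROUP: one input relation, so each output tuple has the key and one bag
by_customer = GROUP orders BY customer_id;
DESCRIBE by_customer;

-- COGROUP: two input relations, so each output tuple has the key and two bags
both = COGROUP customers BY id, orders BY customer_id;
DESCRIBE both;
```

In the COGROUP result, a key that appears in only one relation still produces a tuple; the bag for the other relation is simply empty.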

Q2. Explain the various joins in Apache Pig.


The various joins in Apache Pig are the following:-
• Self-join
• Inner join
• Outer join − left join, right join, and full join

Self-join:- A self-join is used to join a relation with itself as if it were two relations,
temporarily renaming at least one of them. In Pig, this is done by loading the same data
under two different relation names.
Syntax:-
grunt> Relation3_name = JOIN Relation1_name BY key, Relation2_name BY key;
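
As a rough illustration of a self-join, the sketch below pairs up students from the same city using the stud.txt file and schema from Machine Exercise 5. The relation names stud1, stud2, pairs, uniq and names are illustrative choices.

```pig
-- Load the same file twice under different names to avoid naming conflicts
stud1 = LOAD 'stud.txt' USING PigStorage(',')
        AS (rollno:int, name:chararray, age:int, sex:chararray, city:chararray, marks:float);
stud2 = LOAD 'stud.txt' USING PigStorage(',')
        AS (rollno:int, name:chararray, age:int, sex:chararray, city:chararray, marks:float);

-- Join the relation with itself on city
pairs = JOIN stud1 BY city, stud2 BY city;

-- Keep each pair once and drop pairs of a student with himself or herself
uniq = FILTER pairs BY stud1::rollno < stud2::rollno;

names = FOREACH uniq GENERATE stud1::name AS name1, stud2::name AS name2, stud1::city AS city;
DUMP names;
```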

Inner join:- An inner join returns rows only when there is a match in both relations.

Syntax:-
grunt> result = JOIN relation1 BY columnname, relation2 BY columnname;
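
A minimal sketch of an inner join follows. It reuses the customers and orders relation names from the outer join syntax in this answer; the file names and schemas are assumptions.

```pig
-- Assumed inputs (hypothetical files and schemas)
customers = LOAD 'customers.txt' USING PigStorage(',') AS (id:int, name:chararray);
orders = LOAD 'orders.txt' USING PigStorage(',') AS (oid:int, customer_id:int, amount:float);

-- Only customers that have at least one order appear in the result
joined = JOIN customers BY id, orders BY customer_id;

-- After a join, fields are referred to as relation::field
result = FOREACH joined GENERATE customers::name, orders::oid, orders::amount;
DUMP result;
```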

Outer join:- An outer join returns all the rows from at least one of the relations.

Left Outer Join

The left outer join operation returns all rows from the left relation, even if there are
no matches in the right relation.
Syntax:-
grunt> Relation3_name = JOIN Relation1_name BY id LEFT OUTER,
Relation2_name BY customer_id;

Right Outer Join

The right outer join operation returns all rows from the right relation, even if there
are no matches in the left relation.
Syntax:-

grunt> outer_right = JOIN customers BY id RIGHT, orders BY customer_id;

Full Outer Join

The full outer join operation returns all rows from both relations. Where a row has no
match in the other relation, the fields of the other relation are filled with nulls.
Syntax:-

grunt> outer_full = JOIN customers BY id FULL OUTER, orders BY customer_id;
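
The three outer joins can be compared side by side on the same inputs. This is only a sketch: the files customers.txt and orders.txt and their schemas are assumed for illustration.

```pig
-- Assumed inputs (hypothetical files and schemas)
customers = LOAD 'customers.txt' USING PigStorage(',') AS (id:int, name:chararray);
orders = LOAD 'orders.txt' USING PigStorage(',') AS (oid:int, customer_id:int, amount:float);

-- Every customer, with null order fields where the customer has no order
outer_left = JOIN customers BY id LEFT OUTER, orders BY customer_id;

-- Every order, with null customer fields where no customer matches
outer_right = JOIN customers BY id RIGHT OUTER, orders BY customer_id;

-- Every customer and every order, with nulls on whichever side is missing
outer_full = JOIN customers BY id FULL OUTER, orders BY customer_id;

-- Example use of the nulls: customers who have never placed an order
no_orders = FILTER outer_left BY orders::customer_id IS NULL;
DUMP no_orders;
```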


Q3. Explain the foreach operator in Apache Pig.
The FOREACH operator applies a data transformation to every tuple of a relation and
generates a new relation from the specified columns or expressions.

Syntax:-
grunt> Relation_name2 = FOREACH Relation_name1 GENERATE (required data);
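
For instance, FOREACH can project columns and compute new ones in a single statement. The sketch below uses the stud.txt file and schema from Machine Exercise 5; the 80-mark threshold and the output field names are arbitrary assumptions.

```pig
stud = LOAD 'stud.txt' USING PigStorage(',')
       AS (rollno:int, name:chararray, age:int, sex:chararray, city:chararray, marks:float);

-- Keep rollno, upper-case the name, and derive a grade with a conditional expression
report = FOREACH stud GENERATE
             rollno,
             UPPER(name) AS name_upper,
             (marks >= 80 ? 'A' : 'B') AS grade;
DUMP report;
```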

Q4. Explain the following operators in Apache Pig: (i) limit (ii) split (iii) order by.

• limit:- The limit operator is used to get a limited number of tuples from a
relation.

Syntax:-
grunt> Result = LIMIT Relation_name number_of_tuples;
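
A short sketch using the stud.txt file and schema from Machine Exercise 5. Without a preceding ORDER, LIMIT gives no guarantee about which tuples are returned, so the two results below can differ.

```pig
stud = LOAD 'stud.txt' USING PigStorage(',')
       AS (rollno:int, name:chararray, age:int, sex:chararray, city:chararray, marks:float);

-- Any 3 students (which ones is not guaranteed)
any3 = LIMIT stud 3;

-- The 3 students with the highest marks: sort first, then limit
sorted = ORDER stud BY marks DESC;
top3 = LIMIT sorted 3;
DUMP top3;
```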

• Split:- The split operator is used to split a relation into two or more relations.

Syntax:-
grunt> SPLIT Relation1_name INTO Relation2_name IF (condition1),
Relation3_name IF (condition2);
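
As an illustration, the students from Machine Exercise 5 could be split into three bands by marks. The thresholds and the relation names high, middle and low are assumptions, and the OTHERWISE branch is assumed to be available (it was added in Pig 0.10).

```pig
stud = LOAD 'stud.txt' USING PigStorage(',')
       AS (rollno:int, name:chararray, age:int, sex:chararray, city:chararray, marks:float);

-- A tuple goes to every relation whose condition it satisfies;
-- OTHERWISE catches the tuples that match none of the conditions
SPLIT stud INTO high IF marks >= 85,
                middle IF (marks >= 80 AND marks < 85),
                low OTHERWISE;

DUMP high;
DUMP middle;
DUMP low;
```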

• Order by:- The Order by operator is used to display the contents of a relation
in a sorted order based on one or more fields.

Syntax:-
grunt> Relation_name2 = ORDER Relation_name1 BY field_name (ASC|DESC);
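
ORDER BY also accepts several sort keys, each with its own direction. A small sketch using the stud.txt file and schema from Machine Exercise 5:

```pig
stud = LOAD 'stud.txt' USING PigStorage(',')
       AS (rollno:int, name:chararray, age:int, sex:chararray, city:chararray, marks:float);

-- Cities in alphabetical order; within each city, highest marks first
sorted = ORDER stud BY city ASC, marks DESC;
DUMP sorted;
```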

Q5. Explain the following operators in Apache Pig: (i) distinct (ii) filter.

• Distinct:- The distinct operator is used to remove redundant (duplicate) tuples
from a relation.

Syntax:-
grunt> Relation_name2 = DISTINCT Relation_name1;
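
DISTINCT compares whole tuples, so to obtain the distinct values of a single field the field is projected first. A sketch using the stud.txt file and schema from Machine Exercise 5:

```pig
stud = LOAD 'stud.txt' USING PigStorage(',')
       AS (rollno:int, name:chararray, age:int, sex:chararray, city:chararray, marks:float);

-- Project the city column, then remove the duplicate tuples
cities = FOREACH stud GENERATE city;
uniq_cities = DISTINCT cities;
DUMP uniq_cities;
```
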

• Filter:- The filter operator is used to select the required tuples from a relation
based on a condition.

Syntax:-
grunt> Relation2_name = FILTER Relation1_name BY (condition);
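
Conditions can be combined with AND, OR and NOT, and can test for nulls. The sketch below uses the stud.txt file and schema from Machine Exercise 5; the particular conditions are only examples.

```pig
stud = LOAD 'stud.txt' USING PigStorage(',')
       AS (rollno:int, name:chararray, age:int, sex:chararray, city:chararray, marks:float);

-- Students from Delhi or Jaipur who are not male
a = FILTER stud BY (city == 'Delhi' OR city == 'Jaipur') AND NOT (sex == 'Male');

-- Students whose marks field was loaded successfully (not null)
b = FILTER stud BY marks IS NOT NULL;

DUMP a;
```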
