BDA Lab Record Final (18071A0591)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Practical Record
On
BIG DATA ANALYTICS LABORATORY (18PC2CS06)
Submitted to
VNR Vignana Jyothi Institute of
Engineering & Technology
An autonomous Institute – NAAC ‘A++’ and
NBA Accredited
Bachelor of Technology
In
Computer Science & Engineering
(B Tech IV Year I sem)

Submitted By
Student Name: MD ABDUL RAB
Roll No: 18071A0591
VNR Vignana Jyothi Institute of Engineering & Technology

Bachupally, Nizampet (S.O), Hyderabad-90

BIG DATA ANALYTICS LAB 2022 BATCH PAGE NO: 1



VNR VIGNANA JYOTHI INSTITUTE OF ENGINEERING AND TECHNOLOGY
Bachupally (V), Hyderabad, Telangana, India

Department of Computer Science & Engineering

CERTIFICATE
Certified that this is the bonafide record of the practical work done during the academic year 2021 by the student

Name: MD ABDUL RAB

Hall Ticket No: 18071A0591          Class: IV B.Tech CSE 2

In the laboratory: Big Data Analytics Laboratory

Department of: Computer Science and Engineering

Signature of the HOD          Signature of the Staff Member

Date of Exam……………………..

Signature of the Examiners

Internal examiner          External examiner


VNR VIGNANA JYOTHI INSTITUTE OF ENGINEERING AND TECHNOLOGY
Bachupally (V), Hyderabad, Telangana, India.

NAME: MD ABDUL RAB

DEPARTMENT OF: COMPUTER SCIENCE AND ENGINEERING

ROLL NO: 18071A0591

LABORATORY: BIG DATA ANALYTICS LABORATORY

CLASS: IV B.TECH CSE          SECTION: 2


WEEK No: TITLE OF THE PROGRAM Pg.no Date Signature



WEEK NO:01 DATE: 23/9/2021

PROGRAM TITLE:

Problem statement:

1. To know the version of Hadoop

2. To list all files and directories in HDFS

3. To list all directories in the HDFS root directory


4. To copy a file from local to HDFS

5. To check the copied file in HDFS

6. To see the copied file's contents in HDFS

7. To see the copied file's contents in HDFS – method 2

8. To copy a file from local to HDFS using the put command

9. To copy multiple files from local to HDFS using the put command


10. To copy a file from HDFS to local

11. To check the health of HDFS

12. To create a directory in HDFS

13. To copy a file from HDFS to a destination that is also in HDFS


14. To view the files inside a directory in HDFS

15. To delete a file in HDFS

16. To create an empty file in HDFS

17. To check the file size of any file in HDFS

18. To count the number of files and directories inside a directory in HDFS

19. To remove a directory from HDFS

(method 2)


20. To copy multiple files within HDFS

21. To move a file into a directory in HDFS

22. To check usage for individual commands

23. To get help for HDFS


24. To check the memory status

25. To check cluster balancing in HDFS

(The above command is deprecated; use the command below instead.)


26. To change permission of a file in HDFS to 777

27. To empty trash in HDFS

28. To display the last kilobyte of a file in HDFS

29. To append the contents of a local file to a file in HDFS


WEEK NO:02 DATE: 28/10/2021

PROGRAM TITLE:

Problem statement:

1. Download JAR Files


2. Open Eclipse
3. Create Project named ‘mapreduce’
4. Create a new package named ‘word’
5. Create a java file in it named ‘Classword.java’
6. Copy or write the MapReduce program below into the class file created above and save it.

package word;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class Classword {

    // Mapper: emits (word, 1) for every whitespace-separated token of a line
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                value.set(tokenizer.nextToken());
                context.write(value, new IntWritable(1));
            }
        }
    }

    // Reducer: sums the counts received for each word
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable x : values) {
                sum += x.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job.getInstance replaces the deprecated Job constructor
        Job job = Job.getInstance(conf, "My Word Count Program");
        job.setJarByClass(Classword.class); // here put your class name
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path outputPath = new Path(args[1]);
        // Configuring the input/output path from the filesystem into the job
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, outputPath);
        // Deleting the output path (recursively) from HDFS before the job runs,
        // so that we don't have to delete it explicitly
        outputPath.getFileSystem(conf).delete(outputPath, true);
        // Exit with status 0 if the job succeeds, 1 otherwise
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
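
To see what the Map and Reduce classes above compute without a cluster, the same counting logic can be sketched in plain Java. This is only an illustration under stated assumptions: the class name LocalWordCount, its count method and the sample lines are hypothetical and not part of the lab program. It uses no Hadoop classes, and a TreeMap stands in for the sorted grouping by key that the framework performs between the map and reduce phases.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local stand-in for the Hadoop job; not part of the lab program.
public class LocalWordCount {

    // "Map" step: split each line on whitespace (StringTokenizer's default delimiters).
    // "Reduce" step: add 1 to the running total of each word.
    // A TreeMap keeps the words sorted, similar to the sorted keys a reducer receives.
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input, for illustration only.
        List<String> lines = Arrays.asList("big data lab", "big data analytics");
        // Print one "word<TAB>count" pair per line.
        count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Like the Mapper above, this sketch treats tokens case-sensitively and does not strip punctuation, so "Data" and "data," would be counted as separate words.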


7. Add jar files to the build path


8. Export the program as a jar file named ‘mapreduce-ar.jar’ to the path ‘/home/cloudera’
9. Open the terminal
10. First, create a text file named ‘mapreduce-text-ar.txt’ for MapReduce

11. Move the text file from local to HDFS and check its contents there

12. Now, run the MapReduce job as below:


13. Explore the MapReduce results folder named ‘mapreduce-result-ar’ created by the job.


WEEK NO:04 DATE: 15/11/2021

PROGRAM TITLE: Data Processing Tool – Pig (Pig Latin scripting language)

Problem statement:

1. How to enter the Grunt shell


2. Create 2 datasets in the local file system using the gedit command

3. Copy files to HDFS


4. How to read your files' data in Pig


a. For pigfile-ar.txt

b. For pigfile1-ar.txt

5. Specify schema for above 2 tables.


a. For pigfile-ar.txt

b. For pigfile1-ar.txt

6. Check schema of 2 tables

7. Combine the 2 tables


8. Split the dataset c into 2 relations, d and e.

One relation should contain the rows where $0 has value 1, and the other the rows where $0 has value 4.

9. Filter dataset c where $1 is greater than 6

10. Group dataset c by $2

11. Select columns 1 and 2 from dataset a

12. Store result s1 into HDFS as pigresult-ar


Check in HDFS
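
The relational steps 7 to 10 above (combine, split, filter, group) can be illustrated outside Pig with a small plain-Java sketch. This is Java, not Pig Latin, and it rests on assumptions: the class PigStepsSketch, its method names and the sample rows are hypothetical; each row is taken to have three integer fields ($0, $1, $2); and "combine" in step 7 is assumed to mean a union of the two datasets. It only shows which rows each step would keep, not how Pig executes them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration of what steps 7-10 compute; not Pig Latin.
public class PigStepsSketch {

    // Step 7 (assumed to be a union): all rows of a followed by all rows of b.
    public static List<int[]> union(List<int[]> a, List<int[]> b) {
        List<int[]> c = new ArrayList<>(a);
        c.addAll(b);
        return c;
    }

    // Step 8: one branch of a split keeps the rows whose field equals a value.
    public static List<int[]> whereEquals(List<int[]> rows, int index, int value) {
        return rows.stream().filter(r -> r[index] == value).collect(Collectors.toList());
    }

    // Step 9: a filter keeps the rows whose field is greater than a value.
    public static List<int[]> whereGreater(List<int[]> rows, int index, int value) {
        return rows.stream().filter(r -> r[index] > value).collect(Collectors.toList());
    }

    // Step 10: grouping collects the rows that share the same value of a field.
    public static Map<Integer, List<int[]>> groupBy(List<int[]> rows, int index) {
        Map<Integer, List<int[]>> groups = new TreeMap<>();
        for (int[] r : rows) {
            groups.computeIfAbsent(r[index], k -> new ArrayList<>()).add(r);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up rows, for illustration only.
        List<int[]> a = Arrays.asList(new int[]{1, 5, 10}, new int[]{4, 7, 20});
        List<int[]> b = Arrays.asList(new int[]{1, 9, 20}, new int[]{2, 3, 10});
        List<int[]> c = union(a, b);
        List<int[]> d = whereEquals(c, 0, 1);        // rows with $0 == 1
        List<int[]> e = whereEquals(c, 0, 4);        // rows with $0 == 4
        List<int[]> f = whereGreater(c, 1, 6);       // rows with $1 > 6
        Map<Integer, List<int[]>> g = groupBy(c, 2); // rows grouped by $2
        System.out.println(d.size() + " " + e.size() + " " + f.size() + " " + g.keySet());
    }
}
```

Note that a row matching neither split condition (here the row starting with 2) ends up in neither d nor e.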


WEEK NO:05 DATE: 17/11/2021

PROGRAM TITLE: SQOOP

Problem statement:

1. How to enter the MySQL CLI in Cloudera

2. Create a database

3. Select the database created

4. Create a table inside database

5. Insert records in the table created


6. Check contents in the table

7. Import this table into HDFS

8. Check if data is imported in HDFS

9. Create a file in the local system. Move the file into HDFS

10. Create another table


11. Export the file abc.txt into the abc_table created using SQOOP

12. Check the table contents
