VNR VJIET    Name of the Experiment: HDFS shell commands
Roll no: 17071A0541
Name of laboratory: BDA LAB    Experiment No. 1    Date: 21/9/20

Experiment 1

HDFS SHELL COMMANDS

Q1) What is HDFS?

HDFS commands are used to access the Hadoop File System. HDFS stands for
‘Hadoop Distributed File System’.

HDFS is a sub-project of the Apache Hadoop project. This Apache Software Foundation project provides a fault-tolerant file system designed to run on commodity hardware. HDFS is accessed through a set of shell commands.

To execute the HDFS commands, open the terminal.

Q2) How to check the version of Hadoop framework?

Command: hadoop version

[cloudera@quickstart Desktop]$ hadoop version

Hadoop 2.6.0-cdh5.10.0

Q3) List the files and subdirectories present in HDFS

Command: hadoop fs -ls

This command lists all the available files and subdirectories under the default directory. For instance, in our example the default directory for the Cloudera VM is /user/cloudera.

[cloudera@quickstart Desktop]$ hadoop fs -ls

Q4: List all the files and directories under the root directory

Variations of Hadoop ls Shell Command

[cloudera@quickstart Desktop]$ hadoop fs -ls /


Q5) Copy a file from local to HDFS?

copyFromLocal

HDFS Command to copy the file from a Local file system to HDFS.

Usage: hadoop fs -copyFromLocal <localsrc> <hdfs destination>

First I have created a file in the local file system named students.txt

[cloudera@quickstart Desktop]$ gedit students.txt

[cloudera@quickstart Desktop]$ hadoop fs -copyFromLocal students.txt /user/cloudera

Q6: Check if the file is copied to HDFS?

[cloudera@quickstart Desktop]$ hadoop fs -ls

Q7: Check the contents of file that you copied in HDFS?

cat

HDFS Command that reads a file on HDFS and prints the content of that file to the
standard output.

Usage: hadoop fs -cat /path_to_file_in_hdfs

[cloudera@quickstart Desktop]$ hadoop fs -cat students.txt

Q8: Achieve the same operation as above with put command?

HDFS Command to copy single source or multiple sources from local file system to
the destination file system.

Usage: hadoop fs -put <localsrc> <destination>

[cloudera@quickstart Desktop]$ hadoop fs -put students.txt /user/cloudera/studentscopied.txt

[cloudera@quickstart Desktop]$ hadoop fs -ls


Q9) Copy any file from HDFS to Local File System

copyToLocal

HDFS Command to copy the file from HDFS to Local File System.

Usage: hadoop fs -copyToLocal <hdfs source> <localdst>

[cloudera@quickstart Desktop]$ hadoop fs -copyToLocal students.txt studentscopiedtolocal.txt

[cloudera@quickstart Desktop]$ ls

Q10) Check the health of the Hadoop file system.

fsck

HDFS Command to check the health of the Hadoop file system.

Command: hdfs fsck /


The filesystem under path '/' is HEALTHY


Q11) Create a directory in HDFS

HDFS Command to create the directory in HDFS.

Usage: hadoop fs -mkdir /directory_name

[cloudera@quickstart Desktop]$ hadoop fs -mkdir CSE2020

[cloudera@quickstart Desktop]$ hadoop fs -ls

Q12) Display the contents of a file inside a directory present in HDFS

[cloudera@quickstart Desktop]$ hadoop fs -mkdir CSE2020

[cloudera@quickstart Desktop]$ hadoop fs -ls

[cloudera@quickstart Desktop]$ hadoop fs -cp cse.txt CSE2020

Q13: Create an empty file in HDFS

touchz

HDFS Command to create a file in HDFS with file size 0 bytes.

Usage: hadoop fs -touchz /directory/filename

[cloudera@quickstart Desktop]$ hadoop fs -touchz empty.txt

[cloudera@quickstart Desktop]$ hadoop fs -ls


Q14) Check the file size of any file in HDFS

du

HDFS Command to check the file size.

Usage: hadoop fs -du -s /directory/filename

[cloudera@quickstart Desktop]$ hadoop fs -du students.txt

Q15: Print the contents of a file stored in HDFS

cat

HDFS Command that reads a file on HDFS and prints the content of that file to the
standard output.

Usage: hadoop fs -cat /path/to/file_in_hdfs

[cloudera@quickstart Desktop]$ hadoop fs -cat students.txt

Q16) Count the number of directories and files inside a directory in HDFS?

count

HDFS Command to count the number of directories, files, and bytes under the paths
that match the specified file pattern.

Usage: hadoop fs -count <path>

[cloudera@quickstart Desktop]$ hadoop fs -count CSE2020

Q17) Remove a file from HDFS?

rm

HDFS Command to remove the file from HDFS.

Usage: hadoop fs -rm <path>

[cloudera@quickstart Desktop]$ hadoop fs -rm empty.txt

Deleted empty.txt

Q18: Delete a directory completely in HDFS?

rm -r

HDFS Command to remove the entire directory and all of its content from HDFS.

Usage: hadoop fs -rm -r <path>

[cloudera@quickstart Desktop]$ hadoop fs -rm -r CSE2020

Q19) Copy a file or multiple files in a directory in HDFS

cp


HDFS Command to copy files from source to destination. This command allows
multiple sources as well, in which case the destination must be a directory.

Usage: hadoop fs -cp <src> <dest>

Command: hadoop fs -cp /user/hadoop/file1 /user/hadoop/file2

[cloudera@quickstart Desktop]$ hadoop fs -mkdir /user/cloudera/CSE2020

[cloudera@quickstart Desktop]$ hadoop fs -cp dummy3.txt CSE2020

Q20: Move a file into a directory in HDFS

mv

HDFS Command to move files from source to destination. This command allows
multiple sources as well, in which case the destination needs to be a directory.

Usage: hadoop fs -mv <src> <dest>

[cloudera@quickstart Desktop]$ hadoop fs -mv emptyfile.txt CSE2020

Q21) Find the usage for an individual command

The usage command gives all the options that can be used with a particular HDFS command.

HDFS Command that returns the help for an individual command.

Usage: hadoop fs -usage <command>

Command: hdfs dfs -usage mkdir

[cloudera@quickstart Desktop]$ hdfs dfs -usage mkdir

Usage: hadoop fs [generic options] -mkdir [-p] <path> …

Q22) Find the help for a given or all commands

help

HDFS Command that displays help for given command or all commands if none is
specified.

Command: hadoop fs -help

Q23: Check the memory status

Usage: hadoop fs -df hdfs:/

[cloudera@quickstart Desktop]$ hadoop fs -df

Filesystem                       Size         Used        Available    Use%
hdfs://quickstart.cloudera:8020  58531520512  1245229056  46116413440  2%


Q24) Check for cluster balancing in HDFS

Cluster Balancing

Usage: hadoop balancer

Type the command:

hadoop balancer

Q25) Change permission for a file to 777

chmod: Changes the permissions of files.

[cloudera@quickstart Desktop]$ hadoop fs -ls -r

[cloudera@quickstart Desktop]$ hadoop fs -chmod 777 /user/cloudera/students.txt

[cloudera@quickstart Desktop]$ hadoop fs -ls -r
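The mode 777 above is octal notation: one digit each for owner, group, and others, where each digit is read (4) + write (2) + execute (1), so 777 grants everyone full access. The same notation can be tried out on an ordinary local file first (a local sketch using the Linux chmod and GNU stat, not HDFS commands):

```shell
# Octal permission demo on a local file (not an HDFS command).
touch demo.txt
chmod 777 demo.txt          # 7 = 4(read) + 2(write) + 1(execute), for all three classes
stat -c '%a %A' demo.txt    # prints: 777 -rwxrwxrwx
```

hadoop fs -chmod accepts the same octal modes, as well as symbolic ones such as u+x.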

Q26) Empty the trash in HDFS


expunge: Empties the trash. When you delete a file, it isn’t removed immediately from HDFS; instead it is moved into the .Trash directory under your HDFS home directory (e.g. /user/cloudera/.Trash). As long as the file remains there, you can undelete it if you change your mind, though only the latest copy of the deleted file can be restored.

[cloudera@quickstart Desktop]$ hadoop fs -expunge

Q27) Display the last kilobyte of a particular file

tail

This Hadoop command writes the last kilobyte of the file to stdout.

[cloudera@quickstart Desktop]$ hadoop fs -tail /user/cloudera/n.txt

Q28) Append the contents of a local file to a file present in HDFS

appendToFile: Append a single src, or multiple srcs, from the local file system to the destination file system.

[cloudera@quickstart Desktop]$ gedit first.txt

[cloudera@quickstart Desktop]$ gedit second.txt

[cloudera@quickstart Desktop]$ hadoop fs -copyFromLocal second.txt /user/cloudera/

[cloudera@quickstart Desktop]$ hadoop fs -appendToFile /home/cloudera/Desktop/first.txt /user/cloudera/second.txt

[cloudera@quickstart Desktop]$ hadoop fs -cat /user/cloudera/second.txt

Q29) Merge files in an HDFS directory into a single local file (getmerge)

Usage: hdfs dfs -getmerge <src> <localdst> [addnl]

Takes a source directory and a destination file as input and concatenates files in src
into the destination local file. Optionally -nl can be set to enable adding a newline
character at the end of each file.

[cloudera@quickstart Desktop]$ hadoop fs -mkdir merge

[cloudera@quickstart Desktop]$ hadoop fs -mv pigfile.txt tab.txt /user/cloudera/merge

[cloudera@quickstart Desktop]$ hadoop fs -getmerge -nl merge /home/cloudera/Desktop/mergedfile.txt

[cloudera@quickstart Desktop]$ cat mergedfile.txt
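The purpose of -nl can be seen with a purely local analogy (plain cat on local files, not HDFS commands): if the source files do not end with a newline, plain concatenation runs their contents together, while appending a newline after each file keeps them on separate lines.

```shell
# Create two local files without trailing newlines.
printf 'alpha' > part1.txt
printf 'beta'  > part2.txt

# Plain concatenation: contents run together on one line.
cat part1.txt part2.txt > merged_plain.txt   # file contains: alphabeta

# getmerge -nl behaviour: a newline is appended after each file.
for f in part1.txt part2.txt; do cat "$f"; printf '\n'; done > merged_nl.txt
wc -l merged_nl.txt                          # 2 lines: alpha, then beta
```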

VNR VJIET    Name of the Experiment: Apache Pig commands
Roll no: 17071A0541
Name of laboratory: BDA LAB    Experiment No. 2    Date: 5/10/20

Experiment 2

APACHE PIG COMMANDS

Pig is an open-source, high-level dataflow system. It provides a simple language called Pig Latin for queries and data manipulation, which are then compiled into MapReduce jobs that run on Hadoop.

Q1: How to enter in grunt shell?

[cloudera@quickstart Desktop]$ pig

Q2: Create two data sets using gedit command in local?

[cloudera@quickstart Desktop]$ gedit pigfile1.txt

1,2,3

4,5,6

7,8,9

[cloudera@quickstart Desktop]$ cat pigfile.txt

1,2,3

4,5,6

7,8,9

Q3: Copy the above files in HDFS?

[cloudera@quickstart Desktop]$ hadoop fs -put pigfile.txt /user/cloudera/

[cloudera@quickstart Desktop]$ hadoop fs -put pigfile1.txt /user/cloudera/


Q4: How to read your (pigfile.txt and pigfile1.txt) data in PIG

grunt> a = LOAD '/user/cloudera/pigfile.txt' using PigStorage(',');

grunt> dump a;

OUTPUT

(1,2,3)

(4,5,6)

(7,8,9)

grunt> b = LOAD '/user/cloudera/pigfile1.txt' using PigStorage(',');


grunt> dump b;

NOTE: Pig does not require a schema; columns can be referred to positionally as $0, $1, and so on. However, a schema can still be specified explicitly if we want one.

Q4: Specify the schema for above two tables?

grunt> a = LOAD '/user/cloudera/pigfile.txt' using PigStorage(',') as (a1:int, a2:int, a3:int);

grunt> dump a;


grunt> b = LOAD '/user/cloudera/pigfile1.txt' using PigStorage(',') as (b1:int, b2:int, b3:int);

grunt> dump b;

Q5: Check the schema of the two tables?

grunt> describe a;

OUTPUT

a: {a1: int,a2: int,a3: int}

grunt> describe b;

OUTPUT

b: {b1: int,b2: int,b3: int}

Q6: Combine the two tables

grunt> c = union a, b;

grunt> dump c;


Q7: Split the c data set into two different relations, e.g. d and e. For instance, I want one data set where $0 has the value 1 and another data set where the value of $0 is 4.

grunt> SPLIT c INTO d IF $0 == 1 , e IF $0 == 4;

grunt> dump d;

grunt> dump e;

Q8: Do filtering on data set c where $1 is greater than 3?

grunt> f = FILTER c BY $1 > 3;

grunt> dump f;

Q9: Group data set c by $2?

grunt> g = GROUP c by $2;

grunt> dump g;

Q10: Select columns 1 and 2 from dataset a?

grunt> s1 = foreach a generate $1,$2;

grunt> dump s1;


Q11: Store the above result in HDFS?

grunt> store s1 into '/user/cloudera/pigresult';


Q12: check the file written in HDFS

[cloudera@quickstart Desktop]$ hadoop fs -ls /user/cloudera/pigresult/

Now see what’s inside part-m-00000

[cloudera@quickstart Desktop]$ hadoop fs -cat /user/cloudera/pigresult/part-m-00000

VNR VJIET    Name of the Experiment: Apache Sqoop commands
Roll no: 17071A0541
Name of laboratory: BDA LAB    Experiment No. 3    Date: 19/10/20

EXPERIMENT 3
Sqoop commands

Aim: To understand the concept of data ingestion tool “Sqoop”


Objective:
• To import structured data from MySQL to HDFS
• To export a text file (structured data) from HDFS to MySQL
Key concept:
• Apache Sqoop is a tool in the Hadoop ecosystem designed to
transfer data between HDFS (Hadoop storage) and relational
database servers such as MySQL, Oracle, SQLite, Teradata,
Netezza, Postgres, etc.
• Apache Sqoop imports data from relational databases to HDFS,
and exports data from HDFS to relational databases. It efficiently
transfers bulk data between Hadoop and external datastores such
as enterprise data warehouses and relational databases.
• This is how Sqoop got its name: “SQL to Hadoop & Hadoop to
SQL”.
Q1: How to enter the MySQL CLI in Cloudera
[cloudera@quickstart Desktop]$ mysql -uroot -pcloudera

Q2: Create a database
mysql> create database sqoop_db;
Query OK, 1 row affected (0.27 sec)

Q3: Select the database created


mysql> use sqoop_db;

Q3: Create a table inside the created database


mysql> create table emp(empno int primary key, ename varchar(10),job
varchar(9),mgr int,hiredate date,sal int,deptno int);
Query OK, 0 rows affected (0.08 sec)

Q4: Insert records in the table created


mysql> INSERT INTO emp VALUES (1,'kriti','teacher',100,'2013-1-12',50000,10);

Q5: Check the contents in the table?


mysql> select * from emp;


Sqoop – Import
Q6: Import this table in HDFS

[cloudera@quickstart Desktop]$ sqoop import --connect jdbc:mysql://localhost/sqoop_db --username root --password cloudera --table emp --target-dir /user/cloudera/scoop_data

Q7: Check if the data is imported in HDFS


[cloudera@quickstart Desktop]$ hadoop fs -ls /user/cloudera/scoop_data

Q8: Open the partitions and check for the records
[cloudera@quickstart Desktop]$ hadoop fs -cat /user/cloudera/scoop_data/part-m-00000
[cloudera@quickstart Desktop]$ hadoop fs -cat
/user/cloudera/scoop_data/part-m-00001
[cloudera@quickstart Desktop]$ hadoop fs -cat
/user/cloudera/scoop_data/part-m-00002

Sqoop – Export
Q9: Export data (e.g. a text/CSV file) from HDFS into a MySQL table

Create a file locally on the Desktop, say abc.txt

gedit abc.txt

Next put this abc.txt in HDFS

[cloudera@quickstart Desktop]$ hadoop fs -put abc.txt /user/cloudera/

Go to the MySQL prompt


Create a table, say abc_table.

NOTE: you should first run the use database_name; command, i.e. tell MySQL in which database you wish to create the table; in our case we are using the database sqoop_db.

mysql> use sqoop_db;

mysql> create table abc_table(a int, b int, c int);
Query OK, 0 rows affected (0.13 sec)

In the terminal, type the following command to export abc.txt into the abc_table table:

[cloudera@quickstart Desktop]$ sqoop export --connect jdbc:mysql://localhost/sqoop_db --username root --password cloudera --table abc_table --export-dir /user/cloudera/abc.txt

Q10: Check in MySQL if data is exported from HDFS into abc_table
mysql> select * from abc_table;

VNR VJIET    Name of the Experiment: Apache Hive commands
Roll no: 17071A0541
Name of laboratory: BDA LAB    Experiment No. 4    Date: 4/1/21

EXPERIMENT 4

HIVE commands

Q1: Go to the Hive shell

Q2: Create and use a database

Q3: How to create a managed table in Hive?


>create table emp(empno int, ename string, job string, sal int, deptno int)
row format delimited fields terminated by ',';

A file called emp_details.txt is created on the Desktop.


Q4: How to check where the managed table is created in Hive
hadoop fs -ls /user/hive/warehouse/emp_details.db/emp

Q5:Also check the contents inside emp:
hadoop fs -cat /user/hive/warehouse/emp_details.db/emp/emp_details.txt

Q6:Check the schema of the created table emp?


describe emp;

describe extended emp;

Q7:How to see all the tables present in database


show tables;

Q8:Select all the enames from emp table

Q9:Select ename from emp table where ename=’A’

Q9: Count the total number of records in the created table


Select count(*) from emp;

Q10:Group the sum of salaries as per the deptno

select deptno, sum(sal) from emp group by deptno;

Q11.Get the salary of people between 1000 and 2000

select * from emp where sal between 1000 and 2000;

Q12: Select the name of employees where job has exactly 5 characters

select ename from emp where job LIKE '_____';


Q13: List the employee names where job has l as the second character

select ename from emp where job LIKE '_l%';

Q14: Retrieve the total salary for each department

select deptno, sum(sal) from emp group by deptno;

Q15: Add a column to the table

alter table emp add COLUMNS(lastname string);

Q16:How to Rename a table

alter table emp rename to emp1;

Q17:How to drop table

drop table emp1;


Movies dataset and Hive commands:


Q1.Create a database called movies
$hive
>create database movies;
>use movies;

Q2.Create movies_details table in movies database

>create table movie_details(no int,
name string,
year int,
rating decimal,
views int,
Genres string,
Director string)
row format delimited fields terminated by ',';

Q3. Load the dataset of movies from local to the Hive table

>LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/hive_demo/movies_new' INTO table movie_details;

Q4.Check if table is created.


>show tables;

Q5. Retrieve all the records in movie_details



>select * from movie_details;

Q6.Print all movies between year 2005 and 2017
>select * from movie_details where year between 2005 and 2017;

Q7.Select all records where movie name starts with letter c or C.


>select * from movie_details where name like 'C%' or name like 'c%';


Q8.Select all records where movie name starts with Devil’s


>select * from movie_details where name LIKE 'Devil\'s%';

Q9.What is the maximum rating among all the movies.


>select max(rating) from movie_details;

Q10. count the number of records


>select count(*) from movie_details;

Q11. select rating of the movie Scott Cooper
>SELECT name,rating FROM movie_details WHERE name = 'Scott
Cooper';

Q12. select rating of the movie The Post


> SELECT name,rating FROM movie_details WHERE name = 'The
Post';

Q13. Print all movies under the direction of Daniel Barnz and James Franco?
> select name, rating from movie_details where Director='Daniel Barnz' or Director='James Franco';

Q14. Retrieve the movies which have a rating of more than 6?
> select * from movie_details where rating>6;


Q15. Count the adventure Movies from the given dataset.

Q16. Count the Adventure|Animation|Comedy|Family|Fantasy
Movies from the given dataset.
> SELECT count(*) FROM movie_details where
Genres='Adventure|Animation|Comedy|Family|Fantasy' ;

Q17. Group the movie names as per the Genres?


>

Word count problem in Hive:

Aim: To perform word count on a text file using the Hive Query Language
Objective: To perform word count on a text file using functions like split and explode

1)Create and use database

hive> create database hive_count_table;

hive> use hive_count_table;

2)Create a table

hive> create table hive_count_table(data string);


3) Make a .txt file on the local machine consisting of a few sentences (in my case I made hive_count.txt), then load that data into the table called hive_count_table

hive> LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/hive_word.txt' into table hive_count_table;

hive> show tables;

4)Show data in the table

hive> select * from hive_count_table;

5) The data we have is in sentences; first we have to convert it into words, applying space as the delimiter, using the split function

hive> select split(data,' ') from hive_count_table;

6) explode expands an array in a single row across multiple rows, one row for each value in the array.

hive> select explode( split(data,' ')) as word from hive_count_table;

7)Count words and their frequency

hive> select word, count(1) from (select explode(split(data,' ')) AS word from hive_count_table) tmp group by word;


EXPERIMENT-5

MapReduce Word Count Problem


1. Download the jar files
2. Open eclipse

Press ok on the prompt.


3. Click File -> New Project -> Java Project

Give the project name "mapreduce" and click on the Finish button

VNR VJIET    Name of the Experiment: MapReduce word count problem
Roll no: 17071A0541
Name of laboratory: BDA LAB    Experiment No. 5    Date: 25/1/2020

4. Create a package in this project


Right click on the project "mapreduce" and select New -> Package

Give path as src folder in the project

Give package name as “word” then click finish

5. Create a class: ClassWord


6. Copy the given mapreduce program in the created class and save it

7. Add the jar files by following the steps below to remove the errors from the above code.

Right click on project->Build Path->Configure Build Path


Go to Libraries -> Add External JARs

Select the downloaded jar files.

8. Create a jar file for a given program

Right click on project->Export

Select JAR file as export type.


You can give a name to the jar file.

Click on Finish.

9. Jar file will be created on the desktop

10. Open the terminal


Create a txt file (read.txt) on the desktop and move it to HDFS

hi how are you

hope you are fine

we are all good

hi we will meet

see you soon

sure soon we will meet

(content of read.txt)
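Before running the job, the expected word frequencies can be sanity-checked locally with ordinary shell tools (a local check only, not the MapReduce job itself):

```shell
# Recreate read.txt locally and count word frequencies.
cat > read.txt <<'EOF'
hi how are you
hope you are fine
we are all good
hi we will meet
see you soon
sure soon we will meet
EOF

# One word per line, then count duplicates, most frequent first.
# 'are', 'we' and 'you' each appear 3 times; 'hi', 'will', 'meet' and 'soon' twice.
tr -s ' ' '\n' < read.txt | sort | uniq -c | sort -rn
```

The MapReduce job should produce the same counts, one word per output line.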

Check whether the file is uploaded to HDFS.


Execute the jar file

hadoop jar Untitled.jar word.Classword /user/cloudera/read.txt /user/cloudera/result


Check for the output of mapreduce word count problem.

VNR VJIET    Name of the Experiment: Data visualization using Tableau
Roll no: 17071A0541
Name of laboratory: BDA LAB    Experiment No. 6    Date: 8/2/2020

EXPERIMENT-6

Data visualization using Tableau

Tableau is a fast-growing data visualization and data analytics tool that aims to help people see and understand data. In other words, it converts raw data into a very easily understandable format. It is a powerful visualization tool in the business intelligence industry.

1) Data uploading in Tableau

You can upload data into Tableau using MS Excel, a text file, or by connecting to a database server.

After loading your dataset you can view its content as shown in the figure below. There can be one or more tables in your dataset, and you can apply a union operation to combine the tables for effective data visualization.

Union of two tables.


2) Creating sheets in Tableau: worksheets can be created by clicking on Create New Sheet at the bottom, as shown.

3) Applying a filter in Tableau: we can right-click the particular column or row, select Apply Filter, and give the respective filter.

4) Creating a dashboard: dashboards can be created by clicking on Create Dashboard at the bottom, as shown. Dashboards contain a sequence of worksheets.


5) Creating a storyboard: storyboards can be created by clicking on Create Storyboard at the bottom, as shown. Storyboards contain a sequence of worksheets or dashboards.
