Vnrvjiet

V N RV J I E TName of the Experiment: HDFS shell
commands Roll no:17071A0541

Name of laboratory: BDA LAB Experiment No._1_Date:21/9/20
Experiment 1
HDFS SHELL COMMANDS
. HDFS stands for ‘Hadoop Distributed File System’.

is Apache Software Foundation project is designed to provide a fault-tolerant file system designed to run on commodity ha
$ hadoop version Hadoop 2.6.0-cdh5.10.0
es under default directory. For instance, in our example the default directory for Cloudera VM is
1
Q4: Return all the directories under root directory
Variations of Hadoop ls Shell Command
[cloudera@quickstart Desktop]$ hadoop fs -ls /

m to HDFS. Usage: hadoop fs -copyFromLocal<localsrc><hdfs destination> First I have created a file in local named student.
Local students.txt
Q7: Check the contents of file that you copied in HDFS?
cat
HDFS Command that reads a file on HDFS and prints the content of that file to the
standard output.
Usage: hadoop fs –cat /path_to_file_in_hdfs
[cloudera@quickstart Desktop]$ hadoop fs -cat students.txt
Q8: Achieve the same operation as above with put command?
HDFS Command to copy single source or multiple sources from local file system to
the destination file system.
Usage: hadoop fs -put <localsrc><destination>
[cloudera@quickstart Desktop]$ hadoop fs -put students.txt

/user/cloudera/studentscopied.txt
[cloudera@quickstart Desktop]$ hadoop fs -ls

Q9) Copy any file from HDFS to Local File System
copyToLocal
HDFS Command to copy the file from HDFS to Local File System.
Usage: hadoop fs -copyToLocal<hdfs source><localdst>
[cloudera@quickstart Desktop]$ hadoop fs -copyToLocal students.txt

studentscopiedtolocal.txt
[cloudera@quickstart Desktop]$ ls
Q10) Check the health of the Hadoop file system.
fsck
HDFS Command to check the health of the Hadoop file system.
Command: hdfs fsck /
Hdfs fsck /
The filesystem under path '/' is HEALTHY

Q11) Create a directory in HDFS
HDFS Command to create the directory in HDFS.
Usage: hadoop fs –mkdir /directory_name
[cloudera@quickstart Desktop]$ hadoop fs –mkdir CSE2020

Q12) Display the contents of a file inside a directory present in HDFS
[cloudera@quickstart Desktop]$ hadoop fs –mkdir CSE2020
[cloudera@quickstart Desktop]$ hadoop fs –cp cse.txt CSE2020
Q13: Create an empty file in HDFS
touchz
HDFS Command to create a file in HDFS with file size 0 bytes.
Usage: hadoop fs –touchz /directory/filename
[cloudera@quickstart Desktop]$ hadoop fs –touchz empty.txt

Q14) Check the file size of any file in HDFS
du
HDFS Command to check the file size.
Usage: hadoop fs –du –s /directory/filename [cloudera@quickstart Desktop]$ hadoop fs –du students.txt
Q15: Print the contents of a file stored in HDFS
cat
HDFS Command that reads a file on HDFS and prints the content of that file to the standard output.
Usage: hadoop fs –cat /path/to/file_in_hdfs
[cloudera@quickstart Desktop]$ hadoop fs -cat students.txt
Q16) Count the number of directories and files inside a directory in HDFS?
count
HDFS Command to count the number of directories, files, and bytes under the paths
that match the specified file pattern.
Usage: hadoop fs -count <path>
[cloudera@quickstart Desktop]$ hadoop fs –count CSE2020
Q17) Remove a file from HDFS?
rm
HDFS Command to remove the file from HDFS.
Usage: hadoop fs dfs –rm<path>
[cloudera@quickstart Desktop]$ hadoop fs -rm empty.txt
Deleted empty.txt
Q18: Delete a directory completely in HDFS?
rm -r
HDFS Command to remove the entire directory and all of its content from HDFS.
Usage: hadoop fs -rm -r <path>
[cloudera@quickstart Desktop]$ hadoop fs -rm –r CSE2020
Q19) Copy a file or multiple files in a directory in HDFS
cp
Q22) Find the help for a given or all commands
help
HDFS Command that displays help for given command or all commands if none is
specified.
Command: hadoop fs –help
Q23: Check the memory status
Check memory status:
Usage: hadoop fs -dfhdfs :/
[cloudera@quickstart Desktop]$ hadoop fs -df
Filesystem Size Used Available Use%
hdfs://quickstart.cloudera:8020 58531520512 1245229056 46116413440 2%

Check for cluster balancing in HDFS
er Balancing Usage: hadoop balancer Type command

op balancer
Change permission for a file to 777 chmod: Changes the permissions of files. [cloudera@quickstart Desktop]$ hadoop fs -
dera@quickstart Desktop]$ hadoop fs -chmod 777 /user/cloudera/students.txt [cloudera@quickstart Desktop]$ hadoop fs
Q26) Empty the trash in HDFS
appendToFile : Append single src, or multiple srcs from local file system to the
destination file system
[cloudera@quickstart Desktop]$ gedit first.txt
[cloudera@quickstart Desktop]$ gedit second.txt
[cloudera@quickstart Desktop]$ hadoop fs -copyFromLocal second.txt

/user/cloudera/
[cloudera@quickstart Desktop]$ hadoop fs -appendToFile

/home/cloudera/Desktop/first.txt /user/cloudera/second.txt
[cloudera@quickstart Desktop]$ hadoop fs -cat /user/cloudera/second.txt
29. getmerge
Usage: hdfsdfs -getmerge<src><localdst> [addnl]
Takes a source directory and a destination file as input and concatenates files in src
into the destination local file. Optionally -nl can be set to enable adding a newline
character at the end of each file.
[cloudera@quickstart Desktop]$ hadoop fs -mkdir merge
[cloudera@quickstart Desktop]$ hadoop fs -mv pigfile.txt tab.txt

/user/cloudera/merge
[cloudera@quickstart Desktop]$ hadoop fs -getmerge -nl merge

/home/cloudera/Desktop/mergedfile.txt
[cloudera@quickstart Desktop]$ cat mergedfile.txt

V N RV J I E TName of the Experiment: Apache Pig
DS
ow system. It provides a simple language called Pig Latin, for queries and data manipulation, which are then compiled in to
Q2: Create two data sets using gedit command in local?
[cloudera@quickstart Desktop]$ gedit pigfile1.txt
1,2,3
4,5,6
7,8,9
[cloudera@quickstart Desktop]$ cat pigfile.txt
1,2,3
4,5,6
7,8,9
Q3: Copy the above files in HDFS?
[cloudera@quickstart Desktop]$ hadoop fs -put pigfile.txt /user/cloudera/
[cloudera@quickstart Desktop]$ hadoop fs -put pigfile1.txt /user/cloudera/

Q4: How to read your (pigfile.txt and pigfile1.txt) data in PIG
grunt> a = LOAD '/user/cloudera/pigfile.txt' using PigStorage(',');
grunt> dump a;
OUTPUT
(1,2,3)
(4,5,6)
(7,8,9)
grunt> b = LOAD '/user/cloudera/pigfile1.txt' using PigStorage(',');
grunt> dump b;
NOTE: If we want to specify schema, we can, but pig is flexible in that. The
columns can be referred as $0 , $1 and so on. But even if you want to specify
schema we can.
Q4: Specify the schema for above two tables?
grunt> a = LOAD '/user/cloudera/pigfile.txt' using PigStorage(',') as (a1:int, a2:int,

a3:int);
grunt> dump a;
grunt> b = LOAD '/user/cloudera/pigfile1.txt' using PigStorage(',') as (b1:int, b2:int, b3:int);

grunt> dump b;
Q5: Check the schema of the two tables?
grunt> describe a;
OUTPUT
a: {a1: int,a2: int,a3: int}
grunt> describe b;
OUTPUT
b: {b1: int,b2: int,b3: int}
Q6: Combine the two tables
grunt> c= union a,b;
grunt> dump c
Q8: Do filtering on data set c where $1 is greater than 3?
grunt> f = FILTER c BY $1 > 3;
grunt> dump f
Q9: Group data set c by $2?
grunt> g = GROUP c by $2;
grunt> dump g;
Q: 10: Select column 1 and 2 from dataset a ?
grunt> s1 = foreach a generate $1,$2;
grunt> dump s1;

Q11: Store the above result in HDFS?
grunt> store s1 into '/user/cloudera/pigresult';
To check)
Q12: check the file written in HDFS
[cloudera@quickstart Desktop]$ hadoop fs -ls /user/cloudera/pigresult/
Now see what’s inside part-m-00000
[cloudera@quickstart Desktop]$ hadoop fs -cat

/user/cloudera/pigresult/part-m-00000
V N RV J I E TName of the Experiment: Apache Sqoop
EXPERIMENT 3
Sqoop commands
oop storage) and relational database servers like mysql, Oracle RDB, SQLite, Teradata, Netezza, Postgres
ts data from HDFS to relational databases. It efficiently transfers bulk data between Hadoop and external
Q2: Create a database
mysql> create database sqoop_db;
Query OK, 1 row affected (0.27 sec)
Q3: Select the database created

mysql> use sqoop_db;
Q3: Create a table inside created database

mysql> create table emp(empno int primary key, ename varchar(10),job
varchar(9),mgr int,hiredate date,sal int,deptno int);
Query OK, 0 rows affected (0.08 sec)
Q4: Insert records in the table created

mysql> INSERT INTO emp VALUES (1,'kriti','teacher',100,2013-1-12,
50000,10);
Q5: Check the contents in the table?

mysql> select * from emp;
Q8: Open the partitions and check for the records
[cloudera@quickstart Desktop]$ hadoop fs –cat
/user/cloudera/scoop_data/part-m-00000
Sqoop – Export
Q9: Export data(like csv) in MYSQL in a table?
Create a file in local in the Desktop, suppose abc.txt
gedit abc.txt
Next put this abc.txt in HDFS
[cloudera@quickstart Desktop]$ hadoop fs -put abc.txt /user/cloudera/
Come to MYSQL prompt

Q10: Check in MYSQL if data is exported from HDFS into
abc_table
mysql> select * from abc_table;
VNRVJIET Name of the Experiment:Apache
Hive
commands
Roll no:17071A0541
Name of laboratory: BDA ExperimentNo._4 Date: 4/1/21

LAB
Q5:Also check the contents inside emp:
hadoop fs -cat /user/hive/warehouse/emp_details.db/emp/emp_details.txt
Q6:Check the schema of the created table emp?

describe emp;
describe extended emp;
Q7:How to see all the tables present in database

Hive
commands
Roll no:17071A0541

LAB
Q10:Group the sum of salaries as per the deptno
select deptno, sum(sal) from emp group by deptno;
Q11.Get the salary of people between 1000 and 2000
select * from emp where sal between 1000 and 2000;
Q12:Select the name of employees where job has exactly 5

characters
select ename from emp where job LIKE ' ';

Hive
commands
Roll no:17071A0541

LAB
Q13:List the employee names where job has l as the second character
select ename from emp where job LIKE '_l%';
Q14: Retrieve the total salary for each department

select deptno, sum(sal) from emp group by deptno;
Q15: Add a column to the table
alter table emp add COLUMNS(lastname string);
Q16:How to Rename a table
alter table emp rename to emp1;
Q17:How to drop table
drop table emp1;

Hive
commands
Roll no:17071A0541

LAB
Movies dataset and Hive commands: Q1.Create a database called movies
$hive
>create database movies;
>use movies;
Q2.Create movies_details table in movies database

>create table movie_details(no int,
name string,
year int,
rating decimal,
views int,
Genres string,
Director string)
row format delimited fields terminated by ',';
Q3.Load the dataset of movies from local to hive table

>LOAD DATA LOCAL INPATH
'/home/cloudera/Desktop/hive_demo/movies_new' INTO table
movie_details;
Q4.Check if table is created.

>show tables;
Q5.Retrieve all the records in moive_details

Hive
commands
Roll no:17071A0541

LAB
>select * from movie_details;

Q6.Print all movies between year 2005 and 2017
>select * from movie_details where year between 2005 and 2017;
Q7.Select all records where movie name starts with letter c or C.

>select * from movie_details where name like 'C%' or name like 'c%';
Hive
commands
Roll no:17071A0541

LAB
Q8.Select all records where movie name starts with Devil’s

>select * from movie_details where name LIKE 'Devil's%';
Q9.What is the maximum rating among all the movies.

>select max(rating) from movie_details;
Q10. count the number of records

>select count(*) from movie_details;
Q11. select rating of the movie Scott Cooper
>SELECT name,rating FROM movie_details WHERE name = 'Scott
Cooper';
Q12. select rating of the movie The Post

> SELECT name,rating FROM movie_details WHERE name = 'The
Post';
Q13. Print all movies under the direction of Daniel Barnz and James
Franco?
> select name ,rating from movie_details where Director='Daniel Barnz'
or Director = 'James Franco'
Q14. Retriew the movies

which has rating more than 6?
> select * from movie_details where rating>6;
Hive
commands
Roll no:17071A0541

LAB
Q15. Count the adventure Movies from the given dataset.

Q16. Count the Adventure|Animation|Comedy|Family|Fantasy
Movies from the given dataset.
> SELECT count(*) FROM movie_details where
Genres='Adventure|Animation|Comedy|Family|Fantasy' ;
Q17. Group the movie names as per the Genres?

Hive
commands
Roll no:17071A0541

LAB
>
Word count problem in Hive:
Aim: To perform word count on a text file using Hive query Language
Objective: To perform word count on a text file using functions like split
and explode
1)Create and use database
hive> create database hive_count_table;
hive> use hive_count_table;
2)Create a table
hive> create table hive_count_table(data string);

Hive
commands
Roll no:17071A0541

LAB
6) explode is to expand an array in a single row across multiple rows,
one for each value in the array.
hive> select explode( split(data,' ')) as word from hive_count_table;
7)Count words and their frequency
hive> select word,count(1)from (select explode(split(data,' ')) AS word

from hive_count_table)tmp group by word;
Hive
commands
Roll no:17071A0541

LAB
EXPERIMENT-5
Map Reduce Word count Problem

1. Download the jar files
2. Open eclipse
Press ok on the prompt.

3. Click File->New Project->Java Project
Give project name-- > “mapreduce” and click on finish button

VNRVJIET Name of the Experiment:
MapReduce
word count problem
Roll no:17071A0541
Name of laboratory: BDA LAB ExperimentNo._5_Date:25/1/2020
4. Create a package in this project

Right click on the project “mapreduce” select newpackage
Give path as src folder in the project
Give package name as “word” then click finish
5. Create a class  ClassWord

MapReduce
word count problem
Roll no:17071A0541
6. Copy the given mapreduce program in the created class and save it
7. Add jar files by following below steps to remove the errors from above code.
Right click on project->Build Path->Configure Build Path

MapReduce
word count problem
Roll no:17071A0541
Goto Libraries->Add external JARs
Select the downloaded jar files.

8. Create a jar file for a given program
Right click on project->Export
Select JAR file as export type.

MapReduce
word count problem
Roll no:17071A0541
You can give a name to the jar file.
Click on Finish.
9. Jar file will be created on the desktop
10. Open the terminal

Create a txt file (read.txt) on the desktop and move it to HDFS
hi how are you
hope you are fine
we are all good
hi we will meet
see you soon
sure soon we will meet----->content of read.txt
Check whether the file is uploaded to cloudera.

MapReduce
word count problem
Roll no:17071A0541
Execute the jar file
hadoop jar Untitled.jar word.Classword /user/cloudera/read.txt /user/cloudera/result

MapReduce
word count problem
Roll no:17071A0541
Check for the output of mapreduce word count problem.

VNRVJIET Name of the Experiment: Data
visualization using Tableau
Roll no:17071A0541
EXPERIMENT-6
Data visualization using Tableau
Tableau is the fastest growing data visualization and data

analytics tool that aims to help people see and understand data. In other
words, it simply converts raw data into a very easily understandable
format. Data analysis is great, as it is a powerful visualization tool in the
business intelligence industry.
1) Data uploading in Tableau
You can upload data into tableau using MSExcel or Text file or by
connecting to database server.
After loading your dataset you can view its content as shown in below
figure.There can be one or more tables in your dataset and you can apply
union operation to combine the tables for effective data visualization.
67
Union of two tables.
Roll no:17071A0541
2) Creating sheets in Tableau-worksheets can be created by clicking on

create newsheet in the bottom as shown.
3) Applying Filter in Tableau-we can right click the particular column or
row and select apply filter and give respective filter.
4) Creating Dashboard-dashboards can be created by clicking on create

dashboard in the bottom as shown.Dashboards have sequence of
worksheets.
Roll no:17071A0541
5) Creating Storyboard-storyboards can be created by clicking on create

storyboard in the bottom as shown.Storyboards have sequence of
worksheets or dashboards.

Vnrvjiet

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Vnrvjiet

Uploaded by

Copyright:

Available Formats

V N RV J I E TName of the Experiment: HDFS shell

commands Roll no:17071A0541

HDFS SHELL COMMANDS

. HDFS stands for ‘Hadoop Distributed File System’.

$ hadoop version Hadoop 2.6.0-cdh5.10.0

Variations of Hadoop ls Shell Command

[cloudera@quickstart Desktop]$ hadoop fs -ls /

Usage: hadoop fs –cat /path_to_file_in_hdfs

[cloudera@quickstart Desktop]$ hadoop fs -cat students.txt

Q8: Achieve the same operation as above with put command?

Usage: hadoop fs -put <localsrc><destination>

[cloudera@quickstart Desktop]$ hadoop fs -put students.txt

[cloudera@quickstart Desktop]$ hadoop fs -ls

Q9) Copy any file from HDFS to Local File System

Usage: hadoop fs -copyToLocal<hdfs source><localdst>

[cloudera@quickstart Desktop]$ hadoop fs -copyToLocal students.txt

HDFS Command to check the health of the Hadoop file system.

Command: hdfs fsck /

The filesystem under path '/' is HEALTHY

Q11) Create a directory in HDFS

HDFS Command to create the directory in HDFS.

Usage: hadoop fs –mkdir /directory_name

[cloudera@quickstart Desktop]$ hadoop fs –mkdir CSE2020

[cloudera@quickstart Desktop]$ hadoop fs -ls

[cloudera@quickstart Desktop]$ hadoop fs –mkdir CSE2020

[cloudera@quickstart Desktop]$ hadoop fs -ls

[cloudera@quickstart Desktop]$ hadoop fs –cp cse.txt CSE2020

Q13: Create an empty file in HDFS

HDFS Command to create a file in HDFS with file size 0 bytes.

Usage: hadoop fs –touchz /directory/filename

[cloudera@quickstart Desktop]$ hadoop fs –touchz empty.txt

[cloudera@quickstart Desktop]$ hadoop fs -ls

Q14) Check the file size of any file in HDFS

HDFS Command to check the file size.

Usage: hadoop fs –du –s /directory/filename [cloudera@quickstart Desktop]$ hadoop fs –du students.txt

Q15: Print the contents of a file stored in HDFS

[cloudera@quickstart Desktop]$ hadoop fs -cat students.txt

Usage: hadoop fs -count <path>

[cloudera@quickstart Desktop]$ hadoop fs –count CSE2020

Q17) Remove a file from HDFS?

HDFS Command to remove the file from HDFS.

Usage: hadoop fs dfs –rm<path>

[cloudera@quickstart Desktop]$ hadoop fs -rm empty.txt

Q18: Delete a directory completely in HDFS?

Usage: hadoop fs -rm -r <path>

[cloudera@quickstart Desktop]$ hadoop fs -rm –r CSE2020

Q19) Copy a file or multiple files in a directory in HDFS

Command: hadoop fs –help

Q23: Check the memory status

Check memory status:

Usage: hadoop fs -dfhdfs :/

[cloudera@quickstart Desktop]$ hadoop fs -df

Filesystem Size Used Available Use%

hdfs://quickstart.cloudera:8020 58531520512 1245229056 46116413440 2%

Check for cluster balancing in HDFS

er Balancing Usage: hadoop balancer Type command

[cloudera@quickstart Desktop]$ gedit first.txt

[cloudera@quickstart Desktop]$ gedit second.txt

[cloudera@quickstart Desktop]$ hadoop fs -copyFromLocal second.txt

[cloudera@quickstart Desktop]$ hadoop fs -appendToFile

[cloudera@quickstart Desktop]$ hadoop fs -cat /user/cloudera/second.txt

Usage: hdfsdfs -getmerge<src><localdst> [addnl]