VNR VJIET
Name of the Experiment: HDFS shell commands
Roll no: 17071A0541

Experiment 1
HDFS commands are used to access the Hadoop Distributed File System (HDFS). HDFS is a sub-project of the Apache Hadoop project; this Apache Software Foundation project provides a fault-tolerant file system designed to run on commodity hardware. HDFS is accessed through a set of shell commands.
Hadoop version: Hadoop 2.6.0-cdh5.10.0
The ls command lists all the available files and subdirectories under the default directory. For instance, in our example the default directory for the Cloudera VM is /user/cloudera
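For example, the default directory can be listed with:

```shell
hadoop fs -ls
```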
Q4: Return all the directories under root directory
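A minimal sketch of the command for Q4, listing everything directly under the root directory:

```shell
hadoop fs -ls /
```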
copyFromLocal
HDFS Command to copy the file from a Local file system to HDFS.
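For instance (the file name sample.txt is illustrative):

```shell
hadoop fs -copyFromLocal sample.txt /user/cloudera/
```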
Q7: Check the contents of the file that you copied to HDFS.
cat
HDFS Command that reads a file on HDFS and prints the content of that file to the
standard output.
put
HDFS Command to copy single or multiple sources from the local file system to the destination file system.
copyToLocal
HDFS Command to copy the file from HDFS to Local File System.
[cloudera@quickstart Desktop]$ ls
Q10) Check the health of the Hadoop file system.
fsck
hdfs fsck /
Q12) Display the contents of a file inside a directory present in HDFS
touchz
HDFS Command to create a file of zero length.
du
HDFS Command to display the sizes of files and directories contained in the given directory, or the size of a file.
cat
Usage: hadoop fs -cat /path/to/file_in_hdfs
Q16) Count the number of directories and files inside a directory in HDFS?
count
HDFS Command to count the number of directories, files, and bytes under the paths
that match the specified file pattern.
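For example, to count directories, files, and bytes under the home directory:

```shell
hadoop fs -count /user/cloudera
```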
rm
HDFS Command to remove a file from HDFS.
Deleted empty.txt
rm -r
HDFS Command to remove the entire directory and all of its content from HDFS.
cp
HDFS Command to copy files from source to destination. This command allows
multiple sources as well, in which case the destination must be a directory.
mv
HDFS Command to move files from source to destination. This command allows
multiple sources as well, in which case the destination needs to be a directory.
usage
HDFS Command that gives all the options that can be used with a particular HDFS command.
Q22) Find the help for a given or all commands
help
HDFS Command that displays help for a given command, or for all commands if none is specified.
Cluster Balancing
The balancer redistributes data blocks evenly across the DataNodes of the cluster. Type the command:
hadoop balancer
Q26) Empty the trash in HDFS
expunge: Empties the trash. When you delete a file, it isn’t removed immediately
from HDFS, but is renamed to a file in the /trash directory. As long as the file
remains there, you can undelete it if you change your mind, though only the latest
copy of the deleted file can be restored.
tail
This hadoop command will show the last kilobyte of the file to stdout.
28) Append the contents of a file present in local to a file present in HDFS
appendToFile: Append a single source, or multiple sources, from the local file system to the destination file system.
29. getmerge
Takes a source directory and a destination file as input and concatenates files in src
into the destination local file. Optionally -nl can be set to enable adding a newline
character at the end of each file.
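A sketch of getmerge (the source directory and local file names are assumptions):

```shell
hadoop fs -getmerge -nl /user/cloudera/somedir merged.txt
```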
Experiment 2: Apache Pig commands
Pig is an open-source, high-level data flow system. It provides a simple language called Pig Latin for queries and data manipulation, which are then compiled into MapReduce jobs that run on Hadoop.
Q2: Create two data sets using the gedit command on the local file system.
1,2,3
4,5,6
7,8,9
1,2,3
4,5,6
7,8,9
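Before loading them in Pig, the two files can be copied to HDFS; a sketch assuming the files are named pigfile.txt and pigfile1.txt (the second name matches the LOAD shown later):

```shell
hadoop fs -put pigfile.txt /user/cloudera/
hadoop fs -put pigfile1.txt /user/cloudera/
```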
grunt> dump a;
OUTPUT
(1,2,3)
(4,5,6)
(7,8,9)
grunt> b = LOAD '/user/cloudera/pigfile1.txt' using PigStorage(',');
grunt> dump b;
NOTE: Specifying a schema is optional; Pig is flexible about it. Columns can be referred to positionally as $0, $1, and so on, but a schema can still be specified explicitly if desired.
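A sketch of loading with an explicit schema (the column names c1, c2, c3 are illustrative):

```pig
a = LOAD '/user/cloudera/pigfile.txt' USING PigStorage(',') AS (c1:int, c2:int, c3:int);
```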
grunt> dump a;
grunt> dump b;
Q5: Check the schema of the two tables?
grunt> describe a;
OUTPUT
grunt> describe b;
OUTPUT
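The relation c dumped in the next step is not defined in this transcript; presumably it combines the two relations, for example:

```pig
-- combine relations a and b into c (an assumption; the defining statement is missing)
c = UNION a, b;
```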
grunt> dump c;
Q7: Split the data set c into two different relations, e.g. d and e. For example, one data set where $0 has the value 1 and another data set where $0 has the value 4.
>dump d;
> dump e;
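A sketch of the split for Q7 (assuming $0 is an int, e.g. loaded with a schema):

```pig
SPLIT c INTO d IF $0 == 1, e IF $0 == 4;
```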
Q8: Do filtering on data set c where $1 is greater than 3?
grunt> dump f;
grunt> dump g;
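A sketch of the filter for Q8 (assuming $1 is numeric):

```pig
f = FILTER c BY $1 > 3;
```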
Q12: Check the file written in HDFS.
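Assuming the previous step stored a relation into an HDFS output directory (the path is an assumption), the result can be checked with:

```shell
hadoop fs -cat /user/cloudera/pigoutput/part-*
```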
Experiment 3: Apache Sqoop commands
Sqoop is a tool for transferring bulk data between Hadoop and relational databases such as MySQL.
Q2: Create a database
mysql> create database sqoop_db;
Query OK, 1 row affected (0.27 sec)
Sqoop – Import
Q6: Import this table in HDFS
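A sketch of the import (the table name emp_data and the password are assumptions; the target directory and mapper count match the part-m-0000* files shown below):

```shell
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_db \
  --username root --password cloudera \
  --table emp_data \
  --target-dir /user/cloudera/scoop_data \
  -m 3
```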
Q8: Open the partitions and check for the records
[cloudera@quickstart Desktop]$ hadoop fs -cat
/user/cloudera/scoop_data/part-m-00000
[cloudera@quickstart Desktop]$ hadoop fs -cat
/user/cloudera/scoop_data/part-m-00001
[cloudera@quickstart Desktop]$ hadoop fs -cat
/user/cloudera/scoop_data/part-m-00002
Sqoop – Export
Q9: Export data (e.g. a CSV file) from HDFS into a MySQL table.
gedit abc.txt
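A sketch of the export, assuming abc.txt has been copied to HDFS and that the MySQL table abc_table (queried in Q10) already exists with matching columns:

```shell
hadoop fs -put abc.txt /user/cloudera/abc.txt
sqoop export \
  --connect jdbc:mysql://localhost/sqoop_db \
  --username root --password cloudera \
  --table abc_table \
  --export-dir /user/cloudera/abc.txt \
  --input-fields-terminated-by ','
```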
Q10: Check in MySQL whether the data was exported from HDFS into abc_table.
mysql> select * from abc_table;
Experiment 4: Apache Hive commands
Hive provides a SQL-like query language, HiveQL, for managing and querying data stored in Hadoop.
Q5: Also check the contents inside emp:
hadoop fs -cat /user/hive/warehouse/emp_details.db/emp/emp_details.txt
show tables;
Q10: Group the sum of salaries by deptno.
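A sketch of the query, assuming the emp table has columns sal and deptno:

```sql
SELECT deptno, SUM(sal) FROM emp GROUP BY deptno;
```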
Q15: Add a column to the table
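A sketch of adding a column (the column name and type are illustrative):

```sql
ALTER TABLE emp ADD COLUMNS (address STRING);
```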
>create table movie_details(no int,
name string,
year int,
rating decimal,
views int,
Genres string,
Director string)
row format delimited fields terminated by ',';
Q6. Print all movies between the years 2005 and 2017
>select * from movie_details where year between 2005 and 2017;
Q11. Select the rating of the movie 'Scott Cooper'
>SELECT name, rating FROM movie_details WHERE name = 'Scott Cooper';
Q13. Print all movies under the direction of Daniel Barnz and James Franco.
> select name, rating from movie_details where Director='Daniel Barnz' or Director='James Franco';
Q16. Count the Adventure|Animation|Comedy|Family|Fantasy
Movies from the given dataset.
> SELECT count(*) FROM movie_details where
Genres='Adventure|Animation|Comedy|Family|Fantasy' ;
Word count problem in Hive:
Aim: To perform word count on a text file using Hive Query Language.
Objective: To perform word count on a text file using functions like split and explode.
2) Create a table
6) explode expands an array in a single row across multiple rows, one for each value in the array.
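Putting the steps together, a minimal word-count sketch in HiveQL (the table and file names are assumptions):

```sql
-- table holding one line of the text file per row
CREATE TABLE docs (line STRING);
LOAD DATA LOCAL INPATH '/home/cloudera/input.txt' OVERWRITE INTO TABLE docs;
-- split each line into words, explode into one word per row, then count
SELECT word, COUNT(1) AS count
FROM (SELECT explode(split(line, ' ')) AS word FROM docs) w
GROUP BY word;
```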
Experiment 5: MapReduce word count problem
Give the path as the src folder in the project.
6. Copy the given MapReduce program into the created class and save it.
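The program referred to is not reproduced in this transcript; the classic Hadoop WordCount, sketched below following the standard Apache example, is the usual choice (class names and structure are assumptions):

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one); // emit (word, 1) for every token
      }
    }
  }

  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) sum += val.get(); // sum the counts per word
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input path
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output path
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```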
7. Add JAR files by following the steps below to remove the errors from the above code.
8. Create a JAR file for the given program.
Click on Finish.
9. The JAR file will be created on the desktop.
Input text used for the word count: hi we will meet
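The job can then be run from the terminal; a sketch assuming the JAR is named wordcount.jar and the input file has been copied to HDFS (all paths and names are illustrative):

```shell
hadoop fs -put input.txt /user/cloudera/input.txt
hadoop jar wordcount.jar WordCount /user/cloudera/input.txt /user/cloudera/wc_output
hadoop fs -cat /user/cloudera/wc_output/part-r-00000
```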
Experiment 6: Data visualization using Tableau
You can upload data into Tableau from MS Excel, a text file, or by connecting to a database server.
After loading your dataset you can view its contents as shown in the figure below. There can be one or more tables in your dataset, and you can apply a union operation to combine the tables for effective data visualization.
Union of two tables.
3) Applying a filter in Tableau: right-click the particular column or row, select Apply Filter, and set the respective filter.