
CLOUD COMPUTING

PRACTICAL FILE

Submitted To: Ms. Ravneet Kaur

Submitted By: SAJONAR
Roll No. UE143079
B.E. CSE, 7th Semester
Group 5

PRACTICAL - 1
AIM: WORKING WITH AMAZON WEB SERVICES (AWS).

a) Introduction To AWS Services


In 2006, Amazon Web Services (AWS) began offering IT services to the market in
the form of web services, an approach now known as cloud computing. With the
cloud, we need not plan servers and other IT infrastructure far in advance, a task
that otherwise takes up much time. Instead, these services can spin up hundreds or thousands
of servers in minutes and deliver results faster. We pay only for what we use, with no
up-front expenses and no long-term commitments, which makes AWS cost-efficient.

The following are some of the services provided by AWS:

1. EC2 (Elastic Compute Cloud)


Browse, filter and search instances.
View configuration details.
Check status of CloudWatch metrics and alarms.
Perform operations on instances such as start, stop, reboot and terminate.
Manage security group rules.
Manage Elastic IP Addresses.
View block devices.

2. S3
Browse buckets and view their properties.
View properties of objects.

3. RDS (Relational Database Service)


Browse, filter, search and reboot instances.
View configuration details, security and network settings.

4. Services Dashboard
Provides information about available services and their status.
Shows all information related to the user's billing.
Lets you switch between users to see the resources in multiple accounts.

b) Launching a Windows Virtual Machine with Amazon EC2


Step 1: Enter the EC2 Dashboard

Step 2: Create and Configure Your Virtual Machine

a) Click Launch Instance

b) Find the Microsoft Windows Server 2012 R2 Base image and click Select

c) Now choose an instance type. Instance types comprise varying combinations of
CPU, memory, storage, and networking capacity. Then click Review and Launch

d) Click Launch

Step 3: Create a Key Pair and Launch Your Instance

a) Select Create a new key pair and name it MyFirstKey. Then click Download Key Pair.
b) After you have downloaded and saved your key pair, click Launch Instance

c) Click View Instances to view the instance you have just created

Step 4: Connect to Your Instance

a) Select the Windows Server instance you just created and click Connect

b) To connect to your Windows virtual machine instance, we need a user name and password:
o The User name defaults to Administrator
o To receive your password, click Get Password

c) Click Choose File and browse to the directory where you stored MyFirstKey. Your key pair
will appear in the text box. Click Decrypt Password.

d) These are your Windows Server administrator login credentials.

e) Click Download Remote Desktop File and open the file.

f) When prompted to log in to the instance, use the User Name and Password you generated to
connect to your virtual machine.

Step 5: Terminate Your Windows VM

a) Click the Actions button, navigate to Instance State, and click Terminate.

b) You will be asked to confirm your termination - select Yes, Terminate.
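
The same launch-and-terminate cycle can also be scripted with the AWS CLI. Below is a
minimal sketch, assuming the CLI is already configured with your credentials and region;
the AMI id, instance id and key-file path are placeholders (the real Windows Server 2012 R2
AMI id varies by region):

# launch one instance from a placeholder Windows AMI, using the key pair created above
$ aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type t2.micro --key-name MyFirstKey --count 1

# look up the instance and retrieve the Windows Administrator password
$ aws ec2 describe-instances --filters "Name=key-name,Values=MyFirstKey"
$ aws ec2 get-password-data --instance-id i-0123456789abcdef0 --priv-launch-key MyFirstKey.pem

# terminate the instance when done
$ aws ec2 terminate-instances --instance-ids i-0123456789abcdef0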

c) Store and Retrieve a File with Amazon S3

Step 1. Enter the Amazon S3 Console

Step 2. Create an S3 Bucket

a) In the S3 dashboard, click Create Bucket.


b) Enter a bucket name; the name must be unique across all existing bucket names in Amazon S3.
Then select a region to launch your bucket in. Select Next.

c) Leave these Options disabled and select Next.

d) Leave the default values and select Next.

e) Select Create bucket.

Step 3. Upload a File

a) Click on your bucket's name to navigate to the bucket.

b) To create a folder, click Create Folder

c) Click on your folder's name to navigate to the folder.

d) You are now on your folder's home page. Select Upload

e) Add a sample file to store and select Next.

f) Leave the default values and select Next.

g) Leave the default values and select Next.

h) Review your configurations and select Upload

Step 4: Retrieve the Object

a) Select the checkbox next to the file and select Download.

Step 5: Delete the Object and Bucket

a) Select the checkbox next to the file you want to delete and select More > Delete.

b) Review and confirm the object you want to delete. Select Delete.

c) Select the bucket you created and select Delete. Type in the name of your bucket and select Confirm.
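
All of the S3 steps above have AWS CLI equivalents as well; a short sketch, using a
hypothetical (globally unique) bucket name:

$ aws s3 mb s3://my-practical-bucket-ue143079                       # create the bucket
$ aws s3 cp sample.txt s3://my-practical-bucket-ue143079/folder1/   # upload a file
$ aws s3 cp s3://my-practical-bucket-ue143079/folder1/sample.txt .  # download it back
$ aws s3 rm s3://my-practical-bucket-ue143079/folder1/sample.txt    # delete the object
$ aws s3 rb s3://my-practical-bucket-ue143079 --force               # delete the bucket and its contents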

d) Create and Connect to a MySQL Database with Amazon RDS

Step 1: Create a MySQL DB Instance

a) Select the Region in which you want to create the DB instance.

b) Click Instances. Then click Launch DB Instance.

c) Click the MySQL icon and then click Select.

d) Select the MySQL option under Dev/Test and click Next Step.

e) You will now configure your DB instance

f) You are now on the Configure Advanced Settings page where you can provide additional
information that RDS needs to launch the MySQL DB instance. Click Launch DB Instance

g) Click View Your DB Instance

Step 2: Download a SQL Client

a) Go to the Download MySQL Workbench page to download and install MySQL Workbench.
b) Click No thanks, just start my download for a quick download.

Step 3: Connect to the MySQL Database

a) Launch the MySQL Workbench application and go to Database > Connect to Database

b) A dialogue box appears. Enter the following:


Hostname: You can find your hostname on the Amazon RDS console as shown in
the screenshot to the right.
Port: The default value should be 3306.
Username: Type in the username you created for the Amazon RDS database. Our
example was 'masterUsername'.
Password: Click Store in Vault and enter the password you used while creating the
Amazon RDS database.
Click OK

c) Now you can start creating tables, inserting data, and running queries.

i. Create Table

ii. Insert Values Into Table

iii. Select Values
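
Since the Workbench screenshots are not reproduced here, an equivalent command-line
session is sketched below; the endpoint, database, table and column names are hypothetical,
and the endpoint should be replaced with the one shown on your RDS instance's details page:

$ mysql -h mydbinstance.xxxxxxxxxxxx.us-west-2.rds.amazonaws.com -P 3306 -u masterUsername -p
mysql> CREATE DATABASE school;
mysql> USE school;
mysql> CREATE TABLE students (roll_no INT PRIMARY KEY, name VARCHAR(50));
mysql> INSERT INTO students VALUES (1, 'first student'), (2, 'second student');
mysql> SELECT * FROM students;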

Step 4: Delete the DB Instance

a) Go back to your Amazon RDS Console. Select Instance Actions and click Delete
from the dropdown menu.

b) Check the acknowledgment box, and click Delete.

e) Create and Query a NoSQL Table with Amazon DynamoDB


Step 1: Create a NoSQL Table

a. In the DynamoDB console, click Create Table.

b. In the Table name field, type a name for the table.


c. Type a partition key name in the Partition Key field.
d. You can enable easy sorting with a sort key. Check the Add sort key box, type a
name in the Sort Key field, and click Create.

Step 2: Add Data to the NoSQL Table

a) Click the Items tab. Under the Items tab, click Create item.

b) In the data entry window, add the item's attribute values (tuples)


c) Click Save to save the item.

d) Repeat the process to add a few more items to the table.

Step 3: Query the NoSQL Table

a) Using the drop-down list in the dark gray banner above the items, change Scan to
Query.

b) You can use the console to execute the query
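
The same operations can also be performed with the AWS CLI; a sketch, assuming a
hypothetical table named Music with partition key Artist and sort key SongTitle:

# insert an item
$ aws dynamodb put-item --table-name Music --item '{"Artist": {"S": "No One You Know"}, "SongTitle": {"S": "Call Me Today"}}'

# query all items for one partition key value
$ aws dynamodb query --table-name Music --key-condition-expression "Artist = :a" --expression-attribute-values '{":a": {"S": "No One You Know"}}'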

Step 4: Delete an Existing Item

a) Switch the Query dropdown back to Scan.


b) Click the checkbox next to the item you want to delete; the selected item will become
highlighted. In the Actions dropdown, select Delete. You will be asked whether to
delete the item. Click Delete and your item is deleted.

Step 5: Delete a NoSQL Table

a) In the Amazon DynamoDB console, click the Actions dropdown and click Delete
table

b) A confirmation dialog appears; click the Delete button.

PRACTICAL 2
AIM: WORKING WITH MICROSOFT AZURE.

INTRODUCTION TO MICROSOFT AZURE


Microsoft Azure (formerly Windows Azure) is a cloud computing service created by
Microsoft for building, testing, deploying, and managing applications and services
through a global network of Microsoft-managed data centres. It provides software as a
service (SaaS), platform as a service (PaaS) and infrastructure as a service (IaaS), and
supports many different programming languages, tools and frameworks, including both
Microsoft-specific and third-party software and systems.
Azure was announced in October 2008 and released on February 1, 2010 as
"Windows Azure" before being renamed "Microsoft Azure" on March 25, 2014.
Azure is generally available in 36 regions around the world. Microsoft has announced
an additional four regions. Microsoft is the first hyper-scale cloud provider that has
committed to building facilities on the continent of Africa with two regions located in
South Africa.
Microsoft Azure uses a specialized operating system, also called Microsoft Azure, to run
its "fabric layer": a cluster hosted at Microsoft's data centres that manages the computing
and storage resources of the computers and provisions those resources (or a subset of
them) to applications running on top of Microsoft Azure. Microsoft Azure has been
described as a "cloud layer" on top of a number of Windows Server systems, which
use Windows Server 2008 and a customized version of Hyper-V, known as the
Microsoft Azure Hypervisor, to provide virtualization of services.
Microsoft Azure offers two deployment models for cloud resources: the "classic"
deployment model and the Azure Resource Manager. In the classic model, each Azure
resource (virtual machine, SQL database, etc.) was managed individually. The Azure
Resource Manager, introduced in 2014, enables users to create groups of related
services so that closely coupled resources can be deployed, managed, and monitored
together.

CREATING AND CONNECTING A VIRTUAL MACHINE


Open the Microsoft Azure Portal and log in to it. The dashboard opens after login.

1. Click on Virtual Machines in the Navigation pane on the left. The Virtual Machines
panel will open.

2. Click on the Create Virtual Machine button. This will give you the options to choose the
type of virtual machine you want.

3. Click on Windows Server. A navigation pane with various OS options opens on the
right side. Choose the OS as required.

4. After selecting the OS, it is time to configure your virtual machine. First a basic
configuration page will open. Here, you can set the name of the virtual machine,
select the location, set the username and password, select subscription type, the type
of disk and resource group. Then click OK.

5. On the next screen, you will be asked to select the configuration based on machine size,
available memory, number of CPUs and number of IOPS.

6. Next, configure some optional features for the virtual machine, such as high
availability, managed storage, networking, etc.

7. After clicking OK, the final page opens showing the details of the virtual machine,
asking to agree to the Terms and Conditions and then Purchase the virtual machine.
When Purchase is clicked, the deployment starts.

8. After waiting for some time, the machine gets deployed and is ready for use.

9. Click on the Connect button in the top bar. This downloads a Remote Desktop Connection
file. Open the file; it asks for the Username and Password.

10. After clicking OK, the virtual machine opens.
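
The portal workflow above can also be scripted with the Azure CLI; a minimal sketch,
with hypothetical resource-group, VM and user names (Win2012R2Datacenter is one of the
CLI's built-in image aliases):

$ az group create --name myResourceGroup --location eastus
$ az vm create --resource-group myResourceGroup --name myWindowsVM --image Win2012R2Datacenter --admin-username azureuser --admin-password '<your-password>'

# clean up when finished, to stop incurring charges
$ az vm delete --resource-group myResourceGroup --name myWindowsVM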

PRACTICAL 3

AIM: WORKING WITH GOOGLE CLOUD PLATFORM.

INTRODUCTION:
Google Cloud Platform, offered by Google, is a suite of cloud computing services that runs
on the same infrastructure that Google uses internally for its end-user products, such
as Google Search and YouTube. Alongside a set of management tools, it provides a series of
modular cloud services including computing, data storage, data analytics and machine
learning.
A sample of products is listed below; this is not an exhaustive list.
Google Compute Engine - IaaS providing virtual machines.
Google App Engine - PaaS for application hosting.
Bigtable - IaaS providing a massively scalable NoSQL database.
BigQuery - SaaS providing large-scale database analytics.
Google Cloud Functions - FaaS providing serverless functions to be triggered by cloud
events (in beta testing as of August 2017).
Google Cloud Datastore - DBaaS providing a document-oriented database.
Cloud Pub/Sub - a service for publishing and subscribing to data streams and
messages. Applications can communicate via Pub/Sub without direct integration
between the applications themselves.
Google Storage - IaaS providing RESTful online file and object storage.

(A) LAUNCH A VIRTUAL MACHINE

1. In the Cloud Platform Console, go to the VM Instances page.



2. Click the Create instance button.

3. In the Boot disk section, click Change to begin configuring your boot disk.

4. In the OS images tab, choose Windows Server 2012 R2.


5. Click Select.

6. In the Firewall section, select Allow HTTP traffic.

7. Click the Create button to create the instance.


Allow a short time for the instance to start up. Once ready, it will be listed on the VM
Instances page with a green status icon.
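
The same instance can also be created from the Cloud SDK command line; a sketch, with
a hypothetical instance name and zone (the http-server tag corresponds to the Allow HTTP
traffic checkbox):

$ gcloud compute instances create my-windows-vm --image-family windows-2012-r2 --image-project windows-cloud --zone us-central1-a --tags http-server

# delete it later to avoid charges
$ gcloud compute instances delete my-windows-vm --zone us-central1-a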
CONNECT TO YOUR INSTANCE
1. Go to the VM Instances page in the Google Cloud Platform Console.

2. Under the Name column, click the name of your virtual machine instance.
3. At the top of the VM instance's details page, click the Create or reset Windows
Password button.
4. Specify a username, then click Set to generate a new password for this Windows
instance. Save the username and password so you can log into the instance.

5. Connect to your instance using RDP:

If you're using an RDP client other than the browser-based one (including Windows
Remote Desktop Connection), click the RDP button's overflow menu and download the RDP
file. Open the RDP file with your client.

CLEAN UP

To avoid incurring charges to your Google Cloud Platform account for the resources used in
this quickstart:

1. Go to the VM Instances page in the Google Cloud Platform Console.


2. Click the name of the instance you created.


3. At the top of the instance's details page, click Delete.

(B) DEPLOYING A JAVA WEB APPLICATION ON GOOGLE APP ENGINE

1. Install the Google Plugin for the Eclipse IDE. After installation, click on Create New
Project, select the Google folder and then Web Application Project

2. Create a new project in the Google Cloud Console and note down the project ID.

3. Click Next in Eclipse. Now enter the same project name as created in GCP
and also enter the project ID. Click Finish.

4. Open the index.html file under the war folder and change the text you want to display
in the app engine.
Also, in the Java file in the src folder, enter the message which will be displayed after
clicking the link.

5. In the project properties, change the compiler compliance level from 1.8 to 1.7.

6. Under the Google icon, select the Deploy Project to Google App Engine option. Click
the Deploy option.

7. Your first Google app will be displayed in the browser. Click on the link.

8. The final text will be displayed. Save the app link and open your app any time.
http://1-dot-starry-expanse-137723.appspot.com/my_project
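
The same deployment can also be done from the command line with the App Engine Java
SDK's appcfg tool instead of the Eclipse plugin; a sketch, assuming the SDK's bin directory
is on your PATH, the project uses the standard war/ layout, and the project id from the URL
above:

$ appcfg.sh -A starry-expanse-137723 update war/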

(C) CLOUD STORAGE

CREATE A BUCKET

1. Open the Cloud Storage browser in the Google Cloud Platform Console.

2. Click CREATE BUCKET.

3. Enter a unique Name for your bucket.


Do not include sensitive information in the bucket name, because the bucket
namespace is global and publicly visible.
4. Choose Multi-Regional for Storage class.
5. Choose Asia for Location.
6. Click Create.

That's it: you've just created a Cloud Storage bucket!

UPLOAD AN OBJECT INTO THE BUCKET

1. Click UPLOAD FILES.

2. In the file dialog, navigate to the file that you downloaded and select it.

After the upload completes, you should see the file name, size, type, and last modified date in
the bucket.

DOWNLOAD AND DISPLAY AN OBJECT

Try these ways of interacting with the objects from the Cloud Storage browser:
1. Right-click the file and select the option to save it.

2. Click on the file to view it in your browser.

CREATE FOLDERS

1. Click CREATE FOLDER.


2. Enter folder1 for Name and click Create.

You should see the folder in the bucket with an image of a folder icon to distinguish it
from objects.

DELETE OBJECTS

1. Click the Buckets link to return to the Buckets level.


2. Select the bucket.
3. Select the checkbox next to folder1.
4. Click on the icon of the trash can.
5. Click OK to permanently delete the folder and all objects and subfolders in it.

CLEAN UP
To avoid incurring charges to your Google Cloud Platform account for the resources
used in this quickstart:
1. Open the Cloud Storage browser in the Google Cloud Platform Console.
2. Select the checkbox next to the bucket that you created.
3. Click DELETE.
4. Click Delete to permanently delete the bucket and its contents.
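
The whole bucket lifecycle can also be performed with the gsutil tool that ships with the
Cloud SDK; a sketch, using a hypothetical bucket name and the Multi-Regional class and
Asia location chosen above:

$ gsutil mb -c multi_regional -l asia gs://my-unique-bucket-name   # create the bucket
$ gsutil cp sample.txt gs://my-unique-bucket-name/folder1/         # upload a file
$ gsutil cp gs://my-unique-bucket-name/folder1/sample.txt .        # download it back
$ gsutil rm -r gs://my-unique-bucket-name                          # delete the bucket and its contents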

(D) CLOUD SQL FOR MYSQL

CREATE A CLOUD SQL INSTANCE


1. Go to the Cloud SQL Instances page in the Google Cloud Platform Console.
GO TO THE CLOUD SQL INSTANCES PAGE

2. Select your project and click Continue.
3. Click Create Instance.

4. Click MySQL.

5. Click Choose Second Generation.

6. Enter myinstance for Instance ID.

7. Enter a password for the root user.


Use the default values for the other fields.
8. Click Create.

You are returned to the instances list; your new instance is greyed out while it
initializes and starts.
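
The same instance can be created from the command line as well; a sketch, with a
hypothetical Second Generation tier and region:

$ gcloud sql instances create myinstance --tier=db-n1-standard-1 --region=us-central1

# delete it later to avoid charges (the name is unusable for about 7 days afterwards)
$ gcloud sql instances delete myinstance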

CONNECT TO YOUR INSTANCE USING THE MYSQL CLIENT IN THE CLOUD
SHELL

1. In the Google Cloud Platform Console, click the Cloud Shell icon in the
upper-right corner.
When the Cloud Shell finishes initializing, you should see:
Welcome to Cloud Shell! Type "help" to get started.
username@example-id:~$

2. At the Cloud Shell prompt, connect to your Cloud SQL instance:


gcloud sql connect myinstance --user=root

3. Enter your root password.
You should see the mysql prompt.

CREATE A DATABASE AND UPLOAD DATA

1. Create a SQL database on your Cloud SQL instance:


CREATE DATABASE guestbook;

2. Insert sample data into the guestbook database:


USE guestbook;

CREATE TABLE entries (guestName VARCHAR(255), content VARCHAR(255),
entryID INT NOT NULL AUTO_INCREMENT, PRIMARY KEY(entryID));

INSERT INTO entries (guestName, content) values ("first guest", "I got here!");

INSERT INTO entries (guestName, content) values ("second guest", "Me too!");

3. Retrieve the data:


SELECT * FROM entries;

You should see the two rows you inserted.

CLEAN UP

To avoid incurring charges to your Google Cloud Platform account for the resources
used in this quickstart:

1. Go to the Cloud SQL Instances page in the Google Cloud Platform Console.
2. Select the myinstance instance to open the Instance details page.

3. In the icon bar at the top of the page, click Delete.

4. In the Delete instance window, type myinstance, then click Delete to delete the
instance.

You cannot reuse an instance name for approximately 7 days after an instance is
deleted.

(E) CLOUD DATASTORE

STORE DATA

1. Go to the Datastore Entities page in the Google Cloud Platform Console.


This page allows you to store, query, update, and delete data.

2. Click Create entity.

3. If you see the following page, you need to select a location. (Go to the next step if you
do not see this page.)

The location applies to both Cloud Datastore and Google App Engine for your Google
Cloud Platform project. You cannot change the location after it has been saved.
To save a location, select one of the location values and click Next.

4. On the Create an entity page, use [default] for Namespace.


5. Type Task for Kind.
6. Under Properties, use the Add property button to add properties; for the queries below
to work, include a done property of type Boolean set to False.

Your creation page should now look like this:

7. Click Create. The console displays the Task entity that you just created.

You just stored data in Cloud Datastore!

RUN A QUERY
Cloud Datastore supports querying data by kind or by Google Query Language
(GQL); the instructions below walk you through the steps of doing both.
Run kind queries
1. Click Query by kind.
2. Select Task as the kind.
The query results show the Task entity that you created.

Next, add a query filter to restrict the results to entities that meet specific criteria:
1. Click Filter entities.
2. In the dropdown lists, select done, is a boolean, and that is false.

3. Click Apply filters. The results show the Task entity that you created, since
its done value is false.

4. Now try a query of done, is a boolean, and that is true. The results do not include
the Task entity that you created, because its done value is not true.

RUN GQL QUERIES


1. Click Query by GQL.
2. Enter SELECT * FROM Task as the query. Note that Task is case sensitive.
3. Click Run query.
The query results show the Task entity that you created.

Again, add a query filter to restrict the results to entities that meet specific criteria:

1. Run a query such as SELECT * FROM Task WHERE done=false. Note
that Task and done are case sensitive. The results show the Task entity that you
created, since its done value is false.

2. Now run a query such as SELECT * FROM Task WHERE done=true. The results do
not include the Task entity that you created, because its done value is not true.

CLEAN UP
1. Click Query by kind and ensure Task is the selected kind.
2. Click Clear filters.

3. Select the Task entity that you created.
4. Click Delete, and then confirm you want to delete the Task entity. Once deleted, the
entity is permanently removed from Cloud Datastore.

PRACTICAL 4
AIM: INSTALLATION OF APACHE HADOOP
Installing Hadoop 2.7.1

This method of installing Hadoop works for any version of Hadoop 2.x.x. As we know,
Hadoop requires a JVM to run, so we need to install Java before installing Hadoop. Before
installing Java, let us update our package list; doing this will automatically give us the latest
version of Java from the Linux vendor.

To update the package list, type this command in your terminal:

$ sudo apt-get update

Doing this requires an internet connection. Once you complete the above step you can install
Java by typing the command below in the terminal. (Note: you can use any other method to
install Java.)

$ sudo apt-get install default-jdk

When you are done, check which Java version is installed:

$ java -version

The above result shows that the installed version is 1.7.65; make sure that
you have Java 1.6 or above.

The next step is to install ssh.

ssh stands for Secure Shell. This application allows us to get remote access to any machine
(or the local host) with the user's password, and it also allows us to bypass the password
prompt by setting an empty passphrase. To install ssh use the following command:

$ sudo apt-get install ssh

If we try to connect to the local host through ssh, it will ask for the user's password. To check
this you can type this command in the terminal:

$ ssh localhost

Note: before going further we need to exit ssh; just type exit in the same terminal.
Now we need to set up ssh for passwordless communication. To do that, execute the following
command in the terminal:

$ ssh-keygen -t rsa -P ''

Please note that there are two single quotes after -P in the command, with no space between
them. After entering this command it will ask "Enter file in which to save the key
(/home/abc/.ssh/id_rsa):"; press Enter without typing anything. You will then see an image,
called a randomart image. This image varies from machine to machine, and the generated key
will be used for authentication between any two machines.
This command creates an RSA key pair with an empty passphrase. Generally, using an
empty passphrase is not recommended, but in this case it is needed to unlock the key without
your interaction (you don't want to enter the passphrase every time Hadoop interacts with its
nodes).

Now we need to append the generated public key to the user's authorized keys file on the
local machine. To do this use this command:

$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

To check that we have bypassed the password, we need to again execute:

$ ssh localhost

If this step asks you for a password, that means something has gone wrong; to repair this,
repeat the above steps, starting just after the ssh installation. Note: before going further we
need to exit ssh; just type exit in the same terminal.
Once we have completed this, we need to download Hadoop 2.6 (or any other version) from
its official site http://hadoop.apache.org/ and then extract the Hadoop tar.gz manually or
through the terminal. Now we need to move the Hadoop folder; this step is optional, but it is
recommended. To move the Hadoop folder to its appropriate location use the following
command (note this command only moves the folder to /usr/local; if you are placing it in
another location you can do it manually).

$sudo mv Desktop/hadoop-2.6.0 /usr/local/hadoop

Explanation of this command:

sudo: this keyword temporarily grants the user superuser permissions. It is a native Linux
command and means 'superuser do'.
mv: this is the native Linux command to move any file or directory to any location. It takes
two parameters:
parameter 1: source address
parameter 2: destination address

Note: both of the above addresses should be fully qualified addresses.

In the above command my source address is Desktop/hadoop-2.6.0 (you can change this
according to your source location) and my destination address is /usr/local/hadoop.

Note: I haven't given a '/' after my destination; this means that I am renaming my source
folder from hadoop-2.6.0 to hadoop.

Now we need to set system environment variables so that our system identifies Hadoop. To
do this, open the .bashrc file as root in any text editor (in my case I am using gedit).

$ sudo gedit ~/.bashrc

Note: sometimes you get a blank file; please make sure that the file is ~/.bashrc.

Append the content below to this file.


#Hadoop variables
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
#end of Hadoop variable declaration

Explanation of the above code:

Line 1: export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64. We are setting the Java
installation path so that Hadoop can use this path wherever required.
To find your installation path, go to /usr/lib and search for jvm; open the folder and you will
see many folders. Open one without an arrow on it (arrow-marked folders are symbolic links
in Linux, similar to shortcuts in Windows). That is your installed Java.

Line 2: export HADOOP_INSTALL=/usr/local/hadoop. This line identifies the installed
location of Hadoop in the system. Note: if you have kept this folder in some other location
you need to change the path accordingly.

Lines 3 to 8: these are the locations of the Hadoop components. We define these to reduce
our work later; the use of these lines will be explained in depth later. Save and close
~/.bashrc. As we have successfully added the environment variables, we need to make the
system pick them up. For this you can do one of two things:

1. Close all terminals and reopen them as needed.
2. Use the following command:

$ source ~/.bashrc

Once you have done this, you can check whether Hadoop is installed properly with this
command:

$ hadoop version

If you get version output, it means you have successfully set up Hadoop on your system.
Now the last thing we need is to update JAVA_HOME in Hadoop itself: open
etc/hadoop/hadoop-env.sh under your installed Hadoop path, find the line that sets
JAVA_HOME, and replace it with your installed Java path.

Save it and exit.

In this way we have installed Hadoop 2.6.0 on our Linux machine.

Note: all the above steps are exactly the same for installing any 2.x.x version of Hadoop.

Hadoop can be used in three different modes:

1) Standalone Mode

This mode generally does not require any configuration. It is usually used for debugging
purposes. All of Hadoop's default configuration applies in this mode.

2) Pseudo-Distributed Mode

This mode is also called single-node mode. It needs a little configuration and is used for
development purposes.

3) Distributed Mode

This mode is also called multi-node mode. It needs some changes on top of
pseudo-distributed mode, along with ssh, and is generally used for commercial purposes.

1.2 Configuring Hadoop 2.6.0 in Single Node/Pseudo-Distributed Mode on Linux

Hadoop is configured in standalone mode by default. Standalone mode is used only for
debugging; to develop any application we need to configure Hadoop in pseudo-distributed
mode.
To configure Hadoop in pseudo-distributed mode we need to edit the following files:

1) core-site.xml
2) hdfs-site.xml
3) mapred-site.xml
4) yarn-site.xml

Please note that we first need to carry out the steps explained in the previous section on
setting up Hadoop 2.6.0 on Linux.
All the mentioned files are present in the Hadoop installation directory under etc/hadoop; in
my case, as per the previous section, the path is /usr/local/hadoop/etc/hadoop.

1) Configuring core-site.xml

core-site.xml is a file containing all the core properties of Hadoop: for example, the
namenode URL, the temporary storage directory path, etc. Hadoop has a predefined
configuration which we can override; if we mention any configuration in core-site.xml, then
during startup Hadoop will read it and run using it. To get more details of the default
configuration in Hadoop you can visit
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml.
So let us configure some of our requirements.

Open this file in any text editor and add the following content between the
<configuration></configuration> tags:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/abc/tmp</value>
</property>

Explanation of the above code:

Property 1: fs.defaultFS
This property overrides the default namenode URL. Its syntax is hdfs://<ip-address of
namenode>:<port number>. This property was named fs.default.name in Hadoop 1.x.x
versions. Note: the port number can be any unused port number up to 65535.

Property 2: hadoop.tmp.dir
This property changes the temporary storage directory used during the execution of any job
in Hadoop. By default its location is /tmp/hadoop-${user.name}; in my case I have created a
directory named tmp in my home folder, so it is /home/abc/tmp.

2) Configuring hdfs-site.xml
This file contains all configuration for the Hadoop Distributed File System (HDFS), such as
the storage location for the namenode, the storage location for the datanode, the replication
factor of HDFS, etc. Similar to core-site.xml, we need to place the content below between the
configuration tags; to get more information on this we can visit the above-mentioned link.
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/abc/tmp/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/abc/tmp/datanode</value>
</property>

Explanation of the above properties in detail.

Property 1: dfs.replication

This property overrides the replication factor in Hadoop. By default its value is 3, but in a
single-node cluster it is recommended to be 1.

Property 2: dfs.namenode.name.dir

This property overrides the storage location of the namenode data. By default its storage
location is inside /tmp/hadoop-${user.name}. To change this we set the value to our own
folder location; in my case it is inside the tmp directory created during the core-site.xml step.

Property 3: dfs.datanode.data.dir

This property overrides the storage location of the datanode data. By default its storage
location is inside /tmp/hadoop-${user.name}. To change this we set the value to our own
folder location; in my case it is also inside the tmp directory created during the core-site.xml
step.

Note: for properties 2 and 3

Please make sure that if the location of the namenode and datanode directories is under a
root-owned directory, we change their ownership and read/write access using the chown and
chmod commands in Linux. We can also create these directories manually before setting
them in the configuration; otherwise Hadoop will create them for you.
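
For example, creating the two directories up front and fixing their ownership and
permissions might look like this; a sketch assuming user abc and the paths configured above:

$ mkdir -p /home/abc/tmp/namenode /home/abc/tmp/datanode
$ sudo chown -R abc:abc /home/abc/tmp   # make our user the owner
$ chmod -R 750 /home/abc/tmp            # give the owner read/write/execute access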

3) Configuring mapred-site.xml

This file contains all configuration for the MapReduce component of Hadoop. Please note
that this file doesn't exist by default, but you can copy or rename it from
mapred-site.xml.template. The configuration for this file should be as follows:

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

Explanation of the above property:

From Hadoop 2.x.x onwards, Hadoop has introduced a new layer of technology to improve
the performance of the MapReduce algorithm; this layer is called YARN, that is, Yet Another
Resource Negotiator. So here we are configuring our Hadoop framework to be yarn; if we
don't specify this property then Hadoop will use MapReduce 1, also called MR1.

4) Configuring yarn-site.xml

This file contains all information about YARN. As we will be using MR2, we need to specify
the auxiliary services to be used with it, so add these lines to yarn-site.xml:

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

Now we have successfully configured Hadoop 2.6.0 (or, say, Hadoop 2.x.x) in
pseudo-distributed mode.

Before starting Hadoop we need to format the namenode. Execute this command to format
the namenode:

$ hdfs namenode -format

Now, to start Hadoop, we can use two commands:

$ start-dfs.sh
$ start-yarn.sh

or we can also use the deprecated command:

$ start-all.sh

To check which components are running, you can use the command below:
$ jps

We will get output listing the running daemons: NameNode, DataNode,
SecondaryNameNode, ResourceManager, NodeManager, and the Jps process itself.
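
Once all the daemons are up, a quick way to confirm that HDFS itself is working is to create
a home directory and list it; a short sketch, assuming user abc:

$ hdfs dfs -mkdir -p /user/abc   # create a home directory in HDFS
$ hdfs dfs -ls /user             # it should appear in the listing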

