
V.S.B. COLLEGE OF ENGINEERING TECHNICAL CAMPUS
Coimbatore to Pollachi Road NH-209, Ealur Privu,
Kinathukadavu Taluk, Coimbatore - 642109

CCS334 BIG DATA ANALYTICS LABORATORY

REGULATION 2021

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

B.TECH ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

PREPARED BY: APPROVED BY:

SIGNATURE : SIGNATURE :

DATE: DATE:
VISION AND MISSION STATEMENTS OF THE INSTITUTE

Vision:

We endeavor to impart futuristic technical education of the highest quality to the student community and
to inculcate discipline in them to face the world with self-confidence, and thus we prepare them for life as
responsible citizens who uphold human values and are of service to society at large. We strive to build the
Institution into an institution of academic excellence of international standard.

Mission:

We transform persons into personalities by the state-of-the-art infrastructure, time consciousness, quick
response and the best academic practices through assessment and advice.

VISION AND MISSION STATEMENTS OF THE DEPARTMENT

Name of the Department: Department of Artificial Intelligence and Data Science

Vision:

To offer quality education in Artificial Intelligence and Data Science, encourage life-long learning and make
graduates responsible to society by upholding social values in the field of emerging technology.

Mission:

 To produce graduates with sound technical knowledge and good skills that prepare them for a
rewarding career in prominent industries.
 To promote collaborative learning and research with Industry, Government and International
organizations for continuous knowledge transfer and enhancement.
 To promote entrepreneurship and mould the graduates to be leaders by cultivating the spirit
of social and ethical values.
PEO, PO and PSO Statements

Program Educational Objectives (PEO):

PEO1 - Graduates will have successful careers with high level of technical competency and
problem-solving skills to produce innovative solutions for industrial needs.

PEO2 – Graduates will have good professionalism, team work, effective communication, leadership
qualities and life-long learning for the welfare of mankind.

PEO3 – Graduates will be familiar with recent trends in industry for delivering and implementing
innovative system in collaboration.

Program Outcomes (POs):

1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals,
and an engineering specialization to the solution of complex engineering problems.

2. Problem analysis: Identify, formulate, review research literature, and analyze complex engineering
problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and
engineering sciences.

3. Design/development of solutions: Design solutions for complex engineering problems and design
system components or processes that meet the specified needs with appropriate consideration for the
public health and safety, and the cultural, societal, and environmental considerations.

4. Conduct investigations of complex problems: Use research-based knowledge and research methods
including design of experiments, analysis and interpretation of data, and synthesis of the information to
provide valid conclusions.

5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities with an
understanding of the limitations.

6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal,
health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional
engineering practice.

7. Environment and sustainability: Understand the impact of the professional engineering solutions in
societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable
development.

8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of
the engineering practice.

9. Individual and team work: Function effectively as an individual, and as a member or leader in
diverse teams, and in multidisciplinary settings.

10. Communication: Communicate effectively on complex engineering activities with the engineering
community and with society at large, such as, being able to comprehend and write effective reports and
design documentation, make effective presentations, and give and receive clear instructions.

11. Project management and finance: Demonstrate knowledge and understanding of the engineering
and management principles and apply these to one’s own work, as a member and leader in a team, to
manage projects and in multidisciplinary environments.

12. Life Long Learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.

Program Specific Outcomes (PSOs):

PSO1 - Students will be able to apply programming skills to develop new software with assured quality.

PSO2 - Students will have the ability to demonstrate specific coding skills to improve employability.

Name of the Faculty Member HOD



DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE


CCS334 BIG DATA ANALYTICS LABORATORY

COs    Course Outcomes
CO1    Describe big data and use cases from selected business domains.
CO2    Explain NoSQL big data management.
CO3    Install, configure, and run Hadoop and HDFS.
CO4    Perform map-reduce analytics using Hadoop.
CO5    Use Hadoop-related tools such as HBase, Cassandra, Pig, and Hive for big data analytics.
List of Experiments Mapping with COs, POs & PSOs

Exp. No   Name of the Experiment                                                    COs   POs                     PSOs
1         Study of Big Data Analytics and Hadoop Architecture.                      1     1,2,3,4,9,10,11,12      1,2
2         Downloading and installing Hadoop; understanding different Hadoop         4     1,2,3,4,9,10,11,12      1,2
          modes. Startup scripts, configuration files.
3         Hadoop implementation of file management tasks, such as adding files      4     1,2,3,4,9,10,11,12      1,2
          and directories, retrieving files and deleting files.
4         Implementation of Matrix Multiplication with Hadoop MapReduce.            3     1,2,3,4,5,9,10,11,12    1,2
5         Run a basic Word Count MapReduce program to understand the                3     1,2,3,4,5,9,10,11,12    1,2
          MapReduce paradigm.
6         Installation of Hive along with practice examples.                        5     1,2,3,4,5,9,10,11,12    1,2
7         Installation of HBase, installing Thrift along with practice examples.    5     1,2,3,4,5,9,10,11,12    1,2
8         Practice importing and exporting data from various databases.             2     1,2,3,4,5,9,10,11,12    1,2
Additional Experiments
9         MapReduce program to find the grades of students.                         3     1,2,3,4,5,9,10,11,12    1,2
10        MapReduce program to calculate the frequency of a given word in a         3     1,2,3,4,5,9,10,11,12    1,2
          given file.

PO1  Engineering Knowledge       PO7  Environment & Sustainability      PSO1  Professional Skills
PO2  Problem Analysis            PO8  Ethics                            PSO2  Competency
PO3  Design & Development        PO9  Individual & Team Work            PSO3  Developing Skills
PO4  Investigations              PO10 Communication
PO5  Modern Tools                PO11 Project Management & Finance
PO6  Engineer & Society          PO12 Life Long Learning

JUSTIFICATION FOR MAPPING

S.No (CO)   PO/PSO Mapped   Justification
CO504.1
PO1 - Apply the knowledge acquired to study the introduction to big data and its classifications.
PO2 - Understanding big data helps the students to classify the types of technology and use the convergence of key trends.
PO3 - Design and develop industry-related data and explore it with the latest technology.
PO4 - Understanding big data concepts and classifying the types helps in analyzing the concept of web analytics.
PO5 - Students used modern tools and techniques (crowd sourcing) and identified firewall concepts.
PO9 - Working in teams or individually, students know how to classify and analyze big data technologies.
PO10 - Students were given assignments, seminars, group discussions, and technical quizzes on various topics in order to improve their communicative skills.
PO11 - Students were given a project on a particular topic in order to improve their project skills and develop their creativity.
PO12 - Ability to engage in independent and life-long learning in the broadest context of technological change.
CO504.2
PO1 - Apply the knowledge acquired to classify NoSQL and aggregate data models.
PO2 - Understanding the databases helps the students to classify the types of databases and work on them.
PO3 - Design Cassandra and the Cassandra data models by applying Cassandra techniques.
PO4 - Understanding the distribution model and classifying the types helps in analyzing and interpreting whether the given model is good or not.
PO5 - Students used modern tools and techniques (the Cassandra data model) and identified the Cassandra clients.
PO9 - Working in teams or individually, students know how to classify and analyze master-slave replication.
PO10 - Students were given assignments, seminars, group discussions, and technical quizzes on various topics in order to improve their communicative skills.
PO11 - Students were given a project on a particular topic in order to improve their project skills and develop their creativity.
PO12 - Ability to engage in independent and life-long learning in the broadest context of technological change.
CO504.3
PO1 - Apply the knowledge acquired to classify MapReduce and work with MRUnit.
PO2 - Understanding the data helps the students to classify the types of data, such as test data and local data, and work on them.
PO3 - Design YARN by applying mathematical techniques.
PO4 - Understanding the MapReduce types and classifying them helps in analyzing and interpreting the input formats and output formats.
PO5 - Students used modern tools and techniques (YARN) and identified the job scheduling.
PO9 - Working in teams or individually, students know how to classify task execution and understand its working.
PO10 - Students were given assignments, seminars, group discussions, and technical quizzes on various topics in order to improve their communicative skills.
PO11 - Students were given a project on a particular topic in order to improve their project skills and develop their creativity.
PO12 - Ability to engage in independent and life-long learning in the broadest context of technological change.
CO504.4
PO1 - Apply the knowledge acquired to understand the data formats and analyze them with Hadoop.
PO2 - Understanding the concept of Hadoop helps the students to classify Hadoop Streaming and Hadoop Pipes and work on them.
PO3 - Design Hadoop and the Hadoop Streaming form by applying mathematical techniques.
PO4 - Understand the basics of Hadoop; students will know about file-based data structures.
PO9 - Working in teams or individually, students know how to classify and understand the difference between Avro and Hadoop.
PO10 - Students were given assignments, seminars, group discussions, and technical quizzes on various topics in order to improve their communicative skills.
PO11 - Students were given a project on a particular topic in order to improve their project skills and develop their creativity.
PO12 - Ability to engage in independent and life-long learning in the broadest context of technological change.
CO504.5
PO1 - Apply the knowledge acquired to know about all Hadoop-related tools.
PO2 - Understanding Pig Latin scripts helps the students to classify the types of algorithms and work on them.
PO3 - Design the Hive data models and HiveQL data manipulation by applying mathematical techniques.
PO4 - Understanding recursive and non-recursive classification of the types helps in analyzing and interpreting the conversion of PCM to MPCM.
PO5 - Students used modern tools and techniques (the Grunt shell) and identified the types and new mechanisms.
PO9 - Working in teams or individually, students know how to classify Hive and Hadoop.
PO10 - Students were given assignments, seminars, group discussions, and technical quizzes on various topics in order to improve their communicative skills.
PO11 - Students were given a project on a particular topic in order to improve their project skills and develop their creativity.
PO12 - Ability to engage in independent and life-long learning in the broadest context of technological change.

S.No (CO)   PSO Mapped   Justification

CO504.1
PSO1 - Ability to gain technologies to design big data concepts and know how to develop the tools.
PSO2 - Acquire the knowledge and behaviors in designing applications that lead to success in big data technology.

CO504.2
PSO1 - Ability to apply professional skills in identifying the appropriate database and classifying graph databases and schema-less databases.
PSO2 - Students understand the distribution model, and classifying the types helps in analyzing and interpreting whether the given model is good or not.

CO504.3
PSO1 - Ability to understand the MapReduce types; classifying the types helps in analyzing and interpreting the input formats and output formats.
PSO2 - Acquire the knowledge and behaviors of MapReduce types, which are used to differentiate input formats and output formats.

CO504.4
PSO1 - Apply mathematical functions and identify Hadoop Streaming and Hadoop Pipes.
PSO2 - Ability to apply the knowledge and behaviors in identifying the appropriate design of the Hadoop Distributed File System, which is useful for the career.

CO504.5
PSO1 - Apply mathematical skills and design Pig Latin problems and Grunt by applying mathematical techniques.
PSO2 - Acquire the knowledge and behaviors; working in teams or individually, students know how to classify HBase and its data models.

Name of the Faculty Member HOD



CCS334 BIG DATA ANALYTICS                                          L T P C
                                                                   2 0 2 3
COURSE OBJECTIVES:
 To understand big data.
 To learn and use NoSQL big data management.
 To learn MapReduce analytics using Hadoop and related tools.
 To work with MapReduce applications.
 To understand the usage of Hadoop-related tools for Big Data Analytics.

LIST OF EXPERIMENTS:
1. Downloading and installing Hadoop; understanding different Hadoop modes. Startup
scripts, configuration files.
2. Hadoop implementation of file management tasks, such as adding files and
directories, retrieving files and deleting files.
3. Implementation of Matrix Multiplication with Hadoop MapReduce.
4. Run a basic Word Count MapReduce program to understand the MapReduce paradigm.
5. Installation of Hive along with practice examples.
6. Installation of HBase, installing Thrift along with practice examples.
7. Practice importing and exporting data from various databases.

Software Requirements:
Cassandra, Hadoop, Java, Pig, Hive and HBase.
TOTAL: 30 PERIODS

COURSE OUTCOMES:

After the completion of this course, students will be able to:
CO1:Describe big data and use cases from selected business domains.
CO2:Explain NoSQL big data management.
CO3:Install, configure, and run Hadoop and HDFS.
CO4:Perform map-reduce analytics using Hadoop.
CO5:Use Hadoop-related tools such as HBase, Cassandra, Pig, and Hive for big data analytics.
LIST OF EXPERIMENTS

Exp. No   Name of the Experiment
1         Study of Big Data Analytics and Hadoop Architecture.
2         Downloading and installing Hadoop; understanding different Hadoop modes. Startup scripts, configuration files.
3         Hadoop implementation of file management tasks, such as adding files and directories, retrieving files and deleting files.
4         Implementation of Matrix Multiplication with Hadoop MapReduce.
5         Run a basic Word Count MapReduce program to understand the MapReduce paradigm.
6         Installation of Hive along with practice examples.
7         Installation of HBase, installing Thrift along with practice examples.
8         Practice importing and exporting data from various databases.

Additional Experiments

9         MapReduce program to find the grades of students.
10        MapReduce program to calculate the frequency of a given word in a given file.


EXP NO: 1
Study of Big Data Analytics and Hadoop Architecture
Date:

Aim: To study Big Data Analytics and the Hadoop Architecture.

Introduction to Big Data Architecture:


➢ A big data architecture is designed to handle the ingestion, processing, and
analysis of data that is too large or complex for traditional database
systems.

Components of Big Data Architecture


➢ Data sources. All big data solutions start with one or more data sources.
➢ Data storage. Data for batch processing operations is typically stored in a distributed
file store that can hold high volumes of large files in various formats
➢ Batch processing. Because the data sets are so large, often a big data solution must
process data files using long-running batch jobs to filter, aggregate, and otherwise
prepare the data for analysis. Usually these jobs involve reading source files,
processing them, and writing the output to new files.
➢ Real-time message ingestion. If the solution includes real-time sources, the
architecture must include a way to capture and store real-time messages for stream
processing.
➢ Stream processing. After capturing real-time messages, the solution must process
them by filtering, aggregating, and otherwise preparing the data for analysis
➢ Analytical data store. Many big data solutions prepare data for analysis and then
serve the processed data in a structured format that can be queried using analytical
tools
➢ Analysis and reporting. The goal of most big data solutions is to provide insights
into the data through analysis and reporting.
➢ Orchestration. Most big data solutions consist of repeated data processing
operations, encapsulated in work flows, that transform source data, move data
between multiple sources and sinks, load the processed data into an analytical data
store, or push the results straight to a report or dashboard.
Introduction to Hadoop Architecture:

➢ Apache Hadoop offers a scalable, flexible and reliable distributed computing big
data framework for a cluster of systems with storage capacity and local computing
power by leveraging commodity hardware.
➢ Hadoop follows a Master-Slave architecture for the transformation and analysis of
large datasets using the Hadoop MapReduce paradigm. The following Hadoop
components play a vital role in the Hadoop architecture:

➢ Hadoop Common – the libraries and utilities used by other Hadoop modules.
➢ Hadoop Distributed File System (HDFS) – the Java-based scalable system that stores
data across multiple machines without prior organization.
➢ YARN – (Yet Another Resource Negotiator) provides resource management for the
processes running on Hadoop.
➢ MapReduce – a parallel processing software framework. It is comprised of two
steps. In the Map step, a master node takes the inputs, partitions them into smaller
sub-problems and then distributes them to worker nodes. After the map step has
taken place, the master node takes the answers to all of the sub-problems and
combines them to produce the output.
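To make the Map and Reduce steps concrete, the following skeleton (a minimal illustrative sketch, not part of the prescribed experiments; the class names and key/value types are assumptions) shows how a mapper and a reducer are typically declared with the Hadoop Java API:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: breaks each input line into (key, value) pairs.
class SampleMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit one (token, 1) pair per whitespace-separated token.
        for (String token : value.toString().split("\\s+")) {
            context.write(new Text(token), new IntWritable(1));
        }
    }
}

// Reducer: combines all values that share the same key.
class SampleReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}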

Result: Thus the study of Big Data Analytics and the Hadoop Architecture was completed successfully.
EXP NO: 2
Downloading and installing Hadoop; Understanding
Date: different Hadoop modes. Startup scripts, Configuration
files.

Aim:
To install Apache Hadoop.

Hadoop software can be installed in three modes: stand-alone, pseudo-distributed, and fully distributed.
Hadoop is a Java-based programming framework that supports the processing and storage of
extremely large datasets on a cluster of inexpensive machines. It was the first major open source
project in the big data playing field and is sponsored by the Apache Software Foundation.

Hadoop-2.7.3 is comprised of four main layers:

 Hadoop Common is the collection of utilities and libraries that support other Hadoop
modules.
 HDFS, which stands for Hadoop Distributed File System, is responsible for persisting
data to disk.
 YARN, short for Yet Another Resource Negotiator, is the "operating system" for HDFS.
 MapReduce is the original processing model for Hadoop clusters. It distributes work within
the cluster or map, then organizes and reduces the results from the nodes into a response to
a query. Many other processing models are available for the 2.x version of Hadoop.
Hadoop clusters are relatively complex to set up, so the project includes a stand-alone mode
which is suitable for learning about Hadoop, performing simple operations, and debugging.

Procedure:

We'll install Hadoop in stand-alone mode and run one of the example MapReduce
programs it includes to verify the installation.

Prerequisites:

Step 1: Install Java 8.

openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

This output verifies that OpenJDK has been successfully installed.
Note: Set the JAVA_HOME environment variable to the JDK installation path.

Step 2: Install Hadoop.

With Java in place, we'll visit the Apache Hadoop Releases page to find the most
recent stable release. Follow the link to the binary for the current release.
Download Hadoop from www.hadoop.apache.org

Procedure to Run Hadoop

1. Install Apache Hadoop 2.2.0 in Microsoft Windows OS

If Apache Hadoop 2.2.0 is not already installed, then follow the post
"Build, Install, Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS".

2. Start HDFS (Namenode and Datanode) and YARN (Resource Manager and Node Manager)

Run the following commands at the Command Prompt:

C:\Users\abhijitg>cd c:\hadoop
c:\hadoop>sbin\start-dfs
c:\hadoop>sbin\start-yarn
starting yarn daemons

Namenode, Datanode, Resource Manager and Node Manager will be started in a few
minutes, ready to execute Hadoop MapReduce jobs in the Single Node
(pseudo-distributed mode) cluster.

Resource Manager & Node Manager:

Result: Thus Hadoop was installed successfully.



EXP NO: 3
Hadoop Implementation of file management tasks, such as
Date: Adding files and directories, retrieving files and Deleting
files

Aim:
Implement the following file management tasks in Hadoop:
Adding files and directories
Retrieving files
Deleting Files

Procedure:

HDFS is a scalable distributed filesystem designed to scale to petabytes of data


while running on top of the underlying filesystem of the operating system. HDFS
keeps track of where the data resides in a network by associating the name of its
rack (or network switch) with the dataset. This allows Hadoop to efficiently
schedule tasks to those nodes that contain data, or which are nearest to it, optimizing
bandwidth utilization. Hadoop provides a set of command line utilities that work
similarly to the Linux file commands, and serve as your primary interface with
HDFS. We're going to have a look into HDFS by interacting with it from the
command line.
We will take a look at the most common file management tasks in Hadoop, which
include:
Adding files and directories to HDFS
Retrieving files from HDFS to local filesystem
Deleting files from HDFS

Algorithm:

Syntax And Commands To Add, Retrieve And Delete Data From Hdfs

Step-1: Adding Files and Directories to HDFS

Before you can run Hadoop programs on data stored in HDFS, you'll need to put the
data into HDFS first. Let's create a directory and put a file in it. HDFS has a default
working directory of /user/$USER, where $USER is your login user name. This
directory isn't automatically created for you, though, so let's create it with the mkdir
command. For the purpose of illustration, we use chuck. You should substitute your
user name in the example commands.
hadoop fs -mkdir /user/chuck
hadoop fs -put example.txt .
hadoop fs -put example.txt /user/chuck

Step-2: Retrieving Files from HDFS

The Hadoop get command copies files from HDFS back to the local filesystem. To retrieve
example.txt, we can run the following command:
hadoop fs -get example.txt .
The contents of a file in HDFS can be displayed with the cat command:
hadoop fs -cat example.txt

Step-3: Deleting Files from HDFS

hadoop fs -rm example.txt

The command for creating a directory in HDFS is "hdfs dfs -mkdir /lendicse".

Adding a directory is done through the command "hdfs dfs -put lendi_english /".

Step-4: Copying Data from NFS to HDFS

Copying from a local directory is done with the command
"hdfs dfs -copyFromLocal /home/lendi/Desktop/shakes/glossary /lendicse/".

View the file by using the command "hdfs dfs -cat /lendi_english/glossary"
The command for listing items in Hadoop is "hdfs dfs -ls hdfs://localhost:9000/"
The command for deleting files is "hdfs dfs -rm -r /kartheek"
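The same file management tasks can also be performed programmatically through the HDFS Java API. The following is a minimal illustrative sketch (the paths and the fs.defaultFS URI are assumptions and should be adjusted to match your cluster configuration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileManagement {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; change to match core-site.xml.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);

        // Adding a directory and a file
        fs.mkdirs(new Path("/user/chuck"));
        fs.copyFromLocalFile(new Path("example.txt"), new Path("/user/chuck/example.txt"));

        // Retrieving a file back to the local filesystem
        fs.copyToLocalFile(new Path("/user/chuck/example.txt"), new Path("example_copy.txt"));

        // Deleting a file (the second argument enables recursive delete for directories)
        fs.delete(new Path("/user/chuck/example.txt"), false);

        fs.close();
    }
}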

Result: Thus the file management tasks in Hadoop were implemented successfully.

EXP NO: 4
Implementation of Matrix Multiplication with Hadoop MapReduce
Date:

AIM: To develop a MapReduce program to implement Matrix Multiplication.

In mathematics, matrix multiplication or the matrix product is a binary
operation that produces a matrix from two matrices. The definition is motivated
by linear equations and linear transformations on vectors, which have numerous
applications in applied mathematics, physics, and engineering. In more detail, if
A is an n × m matrix and B is an m × p matrix, their matrix product AB is an n
× p matrix, in which the m entries across a row of A are multiplied with the m
entries down a column of B and summed to produce an entry of AB. When two
linear transformations are represented by matrices, then the matrix product
represents the composition of the two transformations.

Algorithm for the Map Function

a. For each element mij of M, produce (key, value) pairs as ((i,k), (M, j, mij)) for k = 1, 2, 3, ... up to the number of
   columns of N.
b. For each element njk of N, produce (key, value) pairs as ((i,k), (N, j, njk)) for i = 1, 2, 3, ... up to the number
   of rows of M.
c. Return the set of (key, value) pairs in which each key (i,k) has a list with values
   (M, j, mij) and (N, j, njk) for all possible values of j.

Algorithm for the Reduce Function

d. For each key (i,k):
e. Sort the values beginning with M by j into listM and the values beginning with N by j into listN,
   then multiply mij and njk for the j-th value of each list.
f. Sum up mij × njk and return ((i,k), Σj (mij × njk)).
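As an illustrative worked example (not part of the original manual), take M and N to be 2 × 2 matrices. For the element m11 the map function emits ((1,1),(M,1,m11)) and ((1,2),(M,1,m11)); for n11 it emits ((1,1),(N,1,n11)) and ((2,1),(N,1,n11)). The reduce function for key (1,1) therefore receives (M,1,m11), (M,2,m12), (N,1,n11) and (N,2,n21), pairs them by j, and computes m11 × n11 + m12 × n21, which is exactly the (1,1) entry of the product MN.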

Step 1. Download the hadoop jar files with these links.


Download Hadoop Common Jar files: https://goo.gl/G4MyHp
$ wget https://goo.gl/G4MyHp -O hadoop-common-2.2.0.jar
Download Hadoop Mapreduce Jar File: https://goo.gl/KT8yfB
$ wget https://goo.gl/KT8yfB -O hadoop-mapreduce-client-core-2.7.1.jar

Step 2. Creating Mapper file for Matrix Multiplication.


import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;

import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.ReflectionUtils;

class Element implements Writable


{ int tag;
int index;
double value;
Element() {
tag = 0;
index = 0;
value = 0.0;
}
Element(int tag, int index, double value)
{ this.tag = tag;
this.index = index;
this.value = value;
}
@Override
public void readFields(DataInput input) throws IOException
{ tag = input.readInt();
index = input.readInt();
value = input.readDouble();
}
@Override
public void write(DataOutput output) throws IOException {
output.writeInt(tag);
output.writeInt(index);
output.writeDouble(value);
}
}
class Pair implements WritableComparable<Pair> {
int i;
int j;

Pair() {
i = 0;

j = 0;
}
Pair(int i, int j) {
this.i = i;
this.j = j;
}
@Override
public void readFields(DataInput input) throws IOException
{ i = input.readInt();
j = input.readInt();
}
@Override
public void write(DataOutput output) throws IOException {
output.writeInt(i);
output.writeInt(j);
}
@Override
public int compareTo(Pair compare)
{ if (i > compare.i) {
return 1;
} else if ( i < compare.i)
{ return -1;
} else {
if(j > compare.j) {
return 1;
} else if (j < compare.j)
{ return -1;
}
}
return 0;
}
public String toString() {
return i + " " + j + " ";
}
}
public class Multiply {
public static class MatriceMapperM extends Mapper<Object,Text,IntWritable,Element>
{

@Override
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException
{
String readLine = value.toString();
String[] stringTokens =
readLine.split(",");

int index = Integer.parseInt(stringTokens[0]);


double elementValue = Double.parseDouble(stringTokens[2]);
Element e = new Element(0, index, elementValue);
IntWritable keyValue = new
IntWritable(Integer.parseInt(stringTokens[1]));
context.write(keyValue, e);
}
}
public static class MatriceMapperN extends Mapper<Object,Text,IntWritable,Element> {
@Override
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException
{
String readLine = value.toString();
String[] stringTokens =
readLine.split(",");
int index = Integer.parseInt(stringTokens[1]);
double elementValue = Double.parseDouble(stringTokens[2]);
Element e = new Element(1,index, elementValue);
IntWritable keyValue =
new
IntWritable(Integer.parseInt(stringTokens[0]));
context.write(keyValue, e);
}
}
public static class ReducerMxN extends Reducer<IntWritable,Element, Pair,
DoubleWritable> {
@Override
public void reduce(IntWritable key, Iterable<Element> values, Context context) throws
IOException, InterruptedException {
ArrayList<Element> M = new ArrayList<Element>();
ArrayList<Element> N = new ArrayList<Element>();
Configuration conf = context.getConfiguration();
for(Element element : values) {
Element tempElement = ReflectionUtils.newInstance(Element.class, conf);
ReflectionUtils.copy(conf, element, tempElement);

if (tempElement.tag == 0) {
M.add(tempElement);
} else if(tempElement.tag == 1)
{ N.add(tempElement);
}
}
for(int i=0;i<M.size();i++) {
for(int j=0;j<N.size();j++) {

Pair p = new Pair(M.get(i).index,N.get(j).index);


double multiplyOutput = M.get(i).value * N.get(j).value;

context.write(p, new DoubleWritable(multiplyOutput));


}
}
}
}
public static class MapMxN extends Mapper<Object, Text, Pair, DoubleWritable> {
@Override
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException
{
String readLine = value.toString();
String[] pairValue = readLine.split(" ");
Pair p = new Pair(Integer.parseInt(pairValue[0]), Integer.parseInt(pairValue[1]));
DoubleWritable val = new DoubleWritable(Double.parseDouble(pairValue[2]));
context.write(p, val);
}
}
public static class ReduceMxN extends Reducer<Pair, DoubleWritable, Pair,
DoubleWritable> {
@Override
public void reduce(Pair key, Iterable<DoubleWritable> values, Context
context) throws IOException, InterruptedException {
double sum = 0.0;
for(DoubleWritable value : values)
{
lOMoARcPSD|161 847 63

sum += value.get();
}
context.write(key, new DoubleWritable(sum));
}
}
public static void main(String[] args) throws Exception {
Job job = Job.getInstance();
job.setJobName("MapIntermediate");
job.setJarByClass(Multiply.class);
MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class,
MatriceMapperM.class);
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class,
MatriceMapperN.class);
job.setReducerClass(ReducerMxN.class);
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(Element.class);
job.setOutputKeyClass(Pair.class);
job.setOutputValueClass(DoubleWritable.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.waitForCompletion(true);
Job job2 = Job.getInstance();
job2.setJobName("MapFinalOutput");
job2.setJarByClass(Multiply.class);

job2.setMapperClass(MapMxN.class);
job2.setReducerClass(ReduceMxN.class);

job2.setMapOutputKeyClass(Pair.class);
job2.setMapOutputValueClass(DoubleWritable.class);

job2.setOutputKeyClass(Pair.class);
job2.setOutputValueClass(DoubleWritable.class);

job2.setInputFormatClass(TextInputFormat.class);
job2.setOutputFormatClass(TextOutputFormat.class);

FileInputFormat.setInputPaths(job2, new Path(args[2]));


FileOutputFormat.setOutputPath(job2, new Path(args[3]));

job2.waitForCompletion(true);
}
}

Step 5. Compiling the program in particular folder named as operation

#!/bin/bash

rm -rf multiply.jar classes

module load hadoop/2.6.0

mkdir -p classes
javac -d classes -cp classes:`$HADOOP_HOME/bin/hadoop classpath` Multiply.java
jar cf multiply.jar -C classes .

echo "end"

Step 6. Running the program in particular folder named as operation


export HADOOP_CONF_DIR=/home/$USER/cometcluster
module load hadoop/2.6.0
myhadoop-configure.sh
start-dfs.sh
start-yarn.sh

hdfs dfs -mkdir -p /user/$USER


hdfs dfs -put M-matrix-large.txt /user/$USER/M-matrix-large.txt
hdfs dfs -put N-matrix-large.txt /user/$USER/N-matrix-large.txt
hadoop jar multiply.jar Multiply /user/$USER/M-matrix-large.txt \
    /user/$USER/N-matrix-large.txt /user/$USER/intermediate /user/$USER/output

rm -rf output-distr
mkdir output-distr
hdfs dfs -get /user/$USER/output/part* output-distr

stop-yarn.sh
stop-dfs.sh
myhadoop-cleanup.sh

Result: Thus Matrix Multiplication with Hadoop MapReduce was implemented
successfully.

EXP NO: 5
Run a basic Word Count Map Reduce program to
Date: understand Map Reduce Paradigm.

Aim: To run a basic Word Count Map Reduce program.

Procedure:

Create a text file with some content. We'll pass this file as input to the
wordcount MapReduce job for counting words.

C:\file1.txt
Install Hadoop

Run Hadoop Wordcount Mapreduce Example

Create a directory (say 'input') in HDFS to keep all the text files (say 'file1.txt') to be used for counting
words.
C:\Users\abhijitg>cd c:\hadoop
c:\hadoop>bin\hdfs dfs -mkdir input

Copy the text file(say 'file1.txt') from local disk to the newly created 'input' directory in HDFS.

C:\hadoop>bin\hdfs dfs -copyFromLocal c:/file1.txt input

Check content of the copied file.

C:\hadoop>hdfs dfs -ls input
Found 1 items
-rw-r--r-- 1 ABHIJITG supergroup 55 2014-02-03 13:19 input/file1.txt

C:\hadoop>bin\hdfs dfs -cat input/file1.txt


Install Hadoop
Run Hadoop Wordcount Mapreduce Example

Run the wordcount MapReduce job provided in
%HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.2.0.jar

C:\hadoop>bin\yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount input output
14/02/03 13:22:02 INFO client.RMProxy: Connecting to ResourceManager at
/0.0.0.0:8032
14/02/03 13:22:03 INFO input.FileInputFormat: Total input paths to process : 1
14/02/03 13:22:03 INFO mapreduce.JobSubmitter: number of splits:1
:
:
14/02/03 13:22:04 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1391412385921_0002
14/02/03 13:22:04 INFO impl.YarnClientImpl: Submitted application
application_1391412385921_0002 to ResourceManager at /0.0.0.0:8032
14/02/03 13:22:04 INFO mapreduce.Job: The url to track the job:
http://ABHIJITG:8088/proxy/application_1391412385921_0002/
14/02/03 13:22:04 INFO mapreduce.Job: Running job: job_1391412385921_0002
14/02/03 13:22:14 INFO mapreduce.Job: Job job_1391412385921_0002 running
in uber mode : false
14/02/03 13:22:14 INFO mapreduce.Job: map 0% reduce 0%
14/02/03 13:22:22 INFO mapreduce.Job: map 100% reduce 0%
14/02/03 13:22:30 INFO mapreduce.Job: map 100% reduce 100%
14/02/03 13:22:30 INFO mapreduce.Job: Job job_1391412385921_0002 completed
successfully
14/02/03 13:22:31 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=89
FILE: Number of bytes written=160142
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=171
HDFS: Number of bytes written=59
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=5657
Total time spent by all reduces in occupied slots (ms)=6128
Map-Reduce Framework
Map input records=2
Map output records=7
Map output bytes=82
Map output materialized bytes=89
Input split bytes=116
Combine input records=7
Combine output records=6
Reduce input groups=6
Reduce shuffle bytes=89
Reduce input records=6
Reduce output records=6
Spilled Records=12
Shuffled Maps=1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=145
CPU time spent (ms)=1418
Physical memory (bytes) snapshot=368246784
Virtual memory (bytes) snapshot=513716224
Total committed heap usage (bytes)=307757056
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=55
File Output Format Counters
Bytes Written=59
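The actual word counts can be viewed by reading the output file. For the two-line input file created above, the expected result (shown here as an illustration; it is consistent with the counters above, e.g. 6 reduce output records and 59 output bytes) is:

C:\hadoop>bin\hdfs dfs -cat output/part-r-00000
Example   1
Hadoop    2
Install   1
Mapreduce 1
Run       1
Wordcount 1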

Result: Thus the basic Word Count MapReduce program was run successfully to understand
the MapReduce paradigm.

EXP NO: 6
Installation of Hive along with practice examples.
Date:

Aim: To install Hive and create a database.

Prerequisites:

Step 1: Install Java 8.


Openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
This output verifies that OpenJDK has been successfully installed.
Note: Set the JAVA_HOME environment variable to the JDK installation path.

Step 2: Install Hadoop.

With Java in place, we'll visit the Apache Hadoop Releases page to find the most recent stable release.

Procedure to Run Hive:

1. Download Hive zip

You can also use any other stable version of Hive.

2. Unzip and Install Hive

After Downloading the Hive, we need to Unzip the apache-hive-3.1.2-bin.tar.gz file.



Once extracted, we would get a new file apache-hive-3.1.2-bin.tar


Now, once again we need to extract this tar file.
 Now we can organize our Hive installation: we can create a folder and move the final extracted file into
it. For example:

 Please note, while creating folders, DO NOT ADD SPACES IN THE FOLDER NAME (it
can cause issues later).

 I have placed my Hive in the D: drive; you can use C: or any other drive also.

3. Setting Up Environment Variables


Another important step in setting up a work environment is to set your Systems environment variable.

To edit environment variables, go to Control Panel > System > click on the “Advanced system
settings” link
Alternatively, We can Right click on This PC icon and click on Properties and click on the “Advanced
system settings” link
Or, easiest way is to search for Environment Variable in search bar and there you go

3.1 Setting HIVE_HOME

 Open environment Variable and click on “New” in “User Variable”

 On clicking “New”, we get below screen.

 Now as shown, add HIVE_HOME in variable name and path of Hive in Variable Value.
 Click OK and we are half done with setting HIVE_HOME.

3.2 Setting Path Variable

 Last step in setting Environment variable is setting Path in System Variable.

 Select Path variable in the system variables and click on “Edit”.

 Now we need to add these paths to Path Variable :-


* %HIVE_HOME%\bin

 Click OK and OK. & we are done with Setting Environment Variables.
3.4 Verify the Paths

 Now we need to verify that what we have done is correct and reflecting.

 Open a NEW Command Window

 Run following commands


echo %HIVE_HOME%

4. Editing Hive

Once we have configured the environment variables next step is to configure Hive. It has 7 parts:-

4.1 Replacing bins

First step in configuring the hive is to download and replace the bin folder.

* Go to this GitHub Repo and download the bin folder as a zip.

* Extract the zip and replace all the files present under bin folder to %HIVE_HOME%\bin

Note:- If you are using different version of HIVE then please search for its respective bin folder and
download it.

4.2 Creating File Hive-site.xml

Now we need to create the Hive-site.xml file in hive for configuring it :-


(We can find these files in Hive -> conf -> hive-default.xml.template)
We need to copy the hive-default.xml.template file and paste it in the same location and rename it to hive-
site.xml. This will act as our main Config file for Hive.

4.3 Editing Configuration Files

4.3.1 Editing the Properties

Now Open the newly created Hive-site.xml and we need to edit the following properties
<property>
<name>hive.metastore.uris</name>
<value>thrift://<Your IP Address>:9083</value>
</property>

<property>
<name>hive.downloaded.resources.dir</name>
<value><Your drive Folder>/${hive.session.id}_resources</value>
</property>

<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/mydir</value>
</property>

Replace the value for <Your IP Address> with the IP address of your system and replace <Your drive
Folder> with the Hive folder path.

4.3.2 Removing Special Characters

This is a short step and we need to remove all the &#8 character present in the hive-site.xml file.

4.3.3 Adding few More Properties

Now we need to add the following properties as it is in the hive-site.xml File.


<property>
<name>hive.querylog.location</name>
<value>$HIVE_HOME/iotmp</value>
<description>Location of Hive run time structured log file</description>
</property><property>
<name>hive.exec.local.scratchdir</name>
<value>$HIVE_HOME/iotmp</value>
<description>Local scratch space for Hive jobs</description>
</property><property>
<name>hive.downloaded.resources.dir</name>
<value>$HIVE_HOME/iotmp</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
Great..!!! We are almost done with the Hive part, for configuring MySQL database as Metastore for Hive,
we need to follow below steps:-

4.4 Creating Hive User in MySQL

The next important step in configuring Hive is to create users for MySQL.
These Users are used for connecting Hive to MySQL Database for reading and writing data from it.
Note:- You can skip this step if you have created the hive user while SQOOP installation.

 Firstly, we need to open the MySQL Workbench and open the workspace(default or any specific, if
you want). We will be using the default workspace only for now.

 Now Open the Administration option in the Workspace and select Users and privileges option
under Management.

 Now select Add Account option and Create an new user with Login Name as hive and Limit to
Host Mapping as the localhost and Password of your choice.

 Now we have to define the roles for this user under Administrative Roles
and select DBManager ,DBDesigner and BackupAdmin Roles

 Now we need to grant schema privileges for the user by using Add Entry option and
selecting the schemas we need access to.

I am using schema matching pattern as %_bigdata% for all my bigdata related schemas. You can use other
2 options also.

 After clicking OK we need to select All the privileges for this schema.

 Click Apply and we are done with the creating Hive user.
4.5 Granting permission to Users
Once we have created the user hive the next step is to Grant All privileges to this user for all the Tables in
the previously selected Schema.

 Open the MySQL cmd Window. We can open it by using the Window’s Search bar.

 Upon opening it will ask for your root user password(created while setting up MySQL).

 Now we need to run the below command in the cmd window.


grant all privileges on test_bigdata.* to 'hive'@'localhost';
where test_bigdata will be your schema name and hive@localhost will be the user name @ host name.

4.6 Creating Metastore

Now we need to create our own metastore for Hive in MySQL.

Firstly, we need to create a database for the metastore in MySQL, or we can use the one used in the
previous step (test_bigdata in my case).
Now navigate to the path
hive -> scripts -> metastore -> upgrade -> mysql and execute the file hive-schema-3.1.0.mysql.sql in MySQL
in your database.
Note:- If you are using a different database, select the folder for it in the upgrade folder and execute
the hive-schema file.
4.7 Adding Few More Properties(Metastore related Properties)

Finally, we need to open our hive-site.xml file once again and make some changes there. These are related
to the Hive metastore; that is why we did not add them at the start, so as to distinguish between the different
sets of properties.
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/<Your Database>?createDatabaseIfNotExist=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the
connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>

<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://localhost:9000/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value><Hive Password></value>

<description>password to use against metastore database</description>


</property>

<property>
<name>datanucleus.schema.autoCreateSchema</name>
<value>true</value>
</property>
<property>
<name>datanucleus.schema.autoCreateTables</name>
<value>True</value>
</property>

<property>
<name>datanucleus.schema.validateTables</name>
<value>true</value>
<description>validates existing schema against code. turn this on if you want to verify existing
schema</description>
</property>
Replace the value for <Hive Password> with the hive user password that we created in MySQL user
creation. And <Your Database> with the database that we used for metastore in MySQL.

5. Starting Hive

5.1 Starting Hadoop

Now we need to start a new Command Prompt (remember to run it as administrator to avoid permission
issues) and execute the command below:
start-all.cmd
All 4 daemons should be up and running.

5.2 Starting Hive Metastore

Open a cmd window, run below command to start the Hive metastore.
hive --service metastore

5.3 Starting Hive

Now open a new cmd window and run the below command to start Hive
hive

Hive – Create Database from Java Example


Hive Java Dependency

<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>3.1.2</version>
</dependency>
Start HiveServer2

To connect to Hive from Java, you need to start hiveserver2 from $HIVE_HOME/bin

prabha@namenode:~/hive/bin$ ./hiveserver2
2020-10-03 23:17:08: Starting HiveServer2
Below is a complete Java example of how to create a Hive database.

Create a Hive Database from Java Example

package com.sparkbyexamples.hive;

import java.sql.Connection;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveCreateDatabase {


public static void main(String[] args) {
Connection con = null;
try {
String conStr = "jdbc:hive2://192.168.1.148:10000/default";
Class.forName("org.apache.hive.jdbc.HiveDriver");
con = DriverManager.getConnection(conStr, "", "");
Statement stmt = con.createStatement();
stmt.execute("CREATE DATABASE emp");
System.out.println("Database emp created successfully.");
} catch (Exception ex) {
ex.printStackTrace();
} finally {
try {
if (con != null)
con.close();
} catch (Exception ex) {
}
}
}
}
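As an additional practice example (an illustrative sketch only, not part of the original manual; the table name, columns and connection URL are assumptions), a table can be created and queried in the emp database using the same JDBC pattern:

package com.sparkbyexamples.hive;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveCreateTable {

    public static void main(String[] args) throws Exception {
        // Assumed HiveServer2 address; adjust to your environment.
        String conStr = "jdbc:hive2://192.168.1.148:10000/emp";
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(conStr, "", "");
             Statement stmt = con.createStatement()) {

            // Create a simple table in the emp database.
            stmt.execute("CREATE TABLE IF NOT EXISTS employee (id INT, name STRING, salary DOUBLE) "
                    + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");

            // Insert one sample row (this runs as a MapReduce job).
            stmt.execute("INSERT INTO employee VALUES (1, 'Kumar', 45000.0)");

            // Query the table back and print the rows.
            try (ResultSet rs = stmt.executeQuery("SELECT id, name, salary FROM employee")) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1) + "\t" + rs.getString(2) + "\t" + rs.getDouble(3));
                }
            }
        }
    }
}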

Result: Thus Hive was installed successfully and a database was created.



EXP NO: 7
Installation of HBase, Installing thrift along with
Date: Practice examples

Aim: To install HBase (with Thrift) and create a table.

Prerequisites:

Installing Hadoop

Procedure:

Step 1: Download HBase from Apache HBase site.

 Download Link -> https://hbase.apache.org/downloads.html ( I used version hbase-1.4.9-bin.tar.gz)

Step 2: Unzip it to a folder — I used c:\software\hbase.

Step 3: Now we need to change two files, a config file and a cmd file. In order to do that, go to the unzipped location.

Change 1 Edit hbase-config.cmd, located in the bin folder under the unzipped location and add the below line

to set JAVA_HOME [add it below the comments section on the top]

set JAVA_HOME=C:\software\Java\jdk1.8.0_201

Change 2 Edit hbase-site.xml, located in the conf folder under the unzipped location and add the section

below to hbase-site.xml. [inside <configuration> tag]

Note: hbase.rootdir's value, e.g. hdfs://localhost:9000/hbase, must match the fs.defaultFS value in Hadoop's

core-site.xml.

<property>

<name>hbase.rootdir</name>

<value>file:/home/hadoop/HBase/HFiles</value>

</property>

<property>

<name>hbase.zookeeper.property.dataDir</name>

<value>/home/hadoop/zookeeper</value>

</property>

<property>

<name>hbase.cluster.distributed</name>

<value>false</value>

</property>

<property>

<name>hbase.rootdir</name>

<value>hdfs://localhost:9000/hbase</value>

</property>

Step 5: Now we are all set to run HBase. To start HBase, execute the command below from the bin folder.
 Open Command Prompt and cd to HBase's bin directory
 Run start-hbase.cmd
 Look for any errors
Step 6: Test the installation using the HBase shell

 Open Command Prompt and cd to HBase's bin directory

 Run hbase shell [should connect to the HBase server]

 Try creating a table

 create 'emp','p'

 list [Table name should get printed]

 put 'emp','emp01','p:fn','First Name'

 scan 'emp' [The row content should get printed]
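The same table operations can also be performed from Java with the HBase client API. The following is an illustrative sketch only (it assumes the hbase-client dependency matching your HBase version is on the classpath and that hbase-site.xml is available as a resource); it mirrors the create/put/scan shell commands above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseEmpExample {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml / core-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {

            // create 'emp','p'
            TableName name = TableName.valueOf("emp");
            if (!admin.tableExists(name)) {
                HTableDescriptor desc = new HTableDescriptor(name);
                desc.addFamily(new HColumnDescriptor("p"));
                admin.createTable(desc);
            }

            try (Table table = connection.getTable(name)) {
                // put 'emp','emp01','p:fn','First Name'
                Put put = new Put(Bytes.toBytes("emp01"));
                put.addColumn(Bytes.toBytes("p"), Bytes.toBytes("fn"), Bytes.toBytes("First Name"));
                table.put(put);

                // scan 'emp'
                try (ResultScanner scanner = table.getScanner(new Scan())) {
                    for (Result row : scanner) {
                        System.out.println(row);
                    }
                }
            }
        }
    }
}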

Result: Thus HBase was installed successfully and a table was created.



EXP NO: 8
Practice importing and exporting data from various databases.
Date:

Aim: To practice importing and exporting data from various databases.

Procedure:

Step 1: Create a blank dossier or open an existing one.

Step 2: Choose Add Data -> New Data to import data into a new dataset, or,
in the Datasets panel, click More next to the dataset name and choose Edit Dataset to add data to
the dataset. The Preview Dialog opens. Click Add a new table.
The Data Sources dialog opens.

Step 3: To import data from a specific database, select the corresponding logo (Amazon Redshift, Apache
Cassandra, Cloudera Hive, Google BigQuery, Hadoop, etc.). If you select Pig or Web Services, the
Import from Tables dialog opens, bypassing the Select Import Options dialog, allowing you to type a
query to import a table. If you select SAP Hana, you must build or type a query, instead of selecting
tables.

Or, to import data without specifying a database type, click Databases.

The Select Import Options dialog opens.

Step 4: Select Select Tables and click Next. The Import from Tables dialog opens. If you selected a specific
database, only the data source connections that correspond to the selected database appear. If you did not
select a database, all available data source connections appear.

If necessary, you can create a new connection to a data source while importing your data.

The terminology on the Import from Tables dialog varies based on the source of the data.

Step 5: In the Data Sources/Projects pane, click on the data source/project that contains the data to import.

Step 6: If your data source/project supports namespaces, select a namespace from the Namespace drop-down list
in the Available Tables/Datasets pane to display only the tables/datasets within a selected namespace. To
search for a namespace, type its name in Namespace. The choices in the drop-down list are filtered as
you type.

Step 7: Expand a table/dataset to view the columns within it. Each column appears with its corresponding
data type in brackets. To search for a table/dataset, type its name in Table. The tables/datasets are
filtered as you type.

Step 8: MicroStrategy creates a cache of the database’s tables and columns when a data source/project is first
used. Hover over the Information icon at the top of the Available Tables/Datasets pane to view a tooltip
displaying the number of tables and the last time the cache was updated.

Step 9: Click Update namespaces in the Available Tables/Datasets pane to refresh the namespaces.

Step 10: Click Update in the Available Tables/Datasets pane to refresh the tables/datasets.

Step 11: Double-click tables/datasets in the Available Tables/Datasets pane to add them to the list of tables to
import. The tables/datasets appear in the Query Builder pane along with their corresponding columns.

Step 12: Click Prepare Data if you are adding a new dataset and want to preview, modify, and specify import
options.

Or, click Add if you are editing an existing dataset.

Step 13: Click Finish if you are adding a new dataset and go to the next step.

Or, click Update Dataset if you are editing an existing dataset and skip the next step.

Step 14: The Data Access Mode dialog opens.

Click Connect Live to connect to a live database when retrieving data. Connecting live is useful if you
are working with a large amount of data, when importing into the dossier may not be feasible. Go to
the last step.

Or, click Import as an In-memory Dataset to import the data directly into your dossier. Importing the data
leads to faster interaction with the data, but uses more RAM memory. Go to the last step.

Step 15: The Publishing Status dialog opens.

If you are editing a connect-live dataset, the existing dataset is refreshed and updated.

Or, if you are editing an in-memory dataset, you are prompted to refresh the existing dataset first.

Step 16: View the new or updated datasets on the Datasets panel.

Result: Thus importing and exporting data from various databases was completed successfully.

EXP NO: 9
MapReduce program to find the grades of student’s.
Date:

Aim: To develop a MapReduce program to find the grades of students.

Procedure:
Mapper
Assume the input file is parsed as (student, grade) pairs.
Reducer
Perform the average of all values for a given key.

Program:

import java.util.Scanner;

public class JavaExample
{
    public static void main(String args[])
    {
        /* This program assumes that the student has 6 subjects,
         * that's why I have created the array of size 6. You can
         * change this as per the requirement.
         */
        int marks[] = new int[6];
        int i;
        float total = 0, avg;
        Scanner scanner = new Scanner(System.in);
        for (i = 0; i < 6; i++) {
            System.out.print("Enter Marks of Subject" + (i + 1) + ":");
            marks[i] = scanner.nextInt();
            total = total + marks[i];
        }
        scanner.close();
        // Calculating the average here
        avg = total / 6;
        System.out.print("The student Grade is: ");
        if (avg >= 80)
        {
            System.out.print("A");
        }
        else if (avg >= 60 && avg < 80)
        {
            System.out.print("B");
        }
        else if (avg >= 40 && avg < 60)
        {
            System.out.print("C");
        }
        else
        {
            System.out.print("D");
        }
    }
}
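The program above grades a single student from keyboard input. A MapReduce formulation of the same task, following the procedure outlined above (the mapper emits (student, mark) pairs and the reducer averages them and assigns a grade), could look like the sketch below. This is an illustrative example, not part of the original manual; the input format (one "student,mark" pair per line) and the class names are assumptions.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class StudentGrade {

    // Mapper: each input line is assumed to be "studentName,mark".
    public static class GradeMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value, Context con)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split(",");
            if (parts.length == 2) {
                con.write(new Text(parts[0].trim()),
                          new IntWritable(Integer.parseInt(parts[1].trim())));
            }
        }
    }

    // Reducer: averages all marks of a student and maps the average to a grade.
    public static class GradeReducer extends Reducer<Text, IntWritable, Text, Text> {
        public void reduce(Text student, Iterable<IntWritable> marks, Context con)
                throws IOException, InterruptedException {
            int sum = 0, count = 0;
            for (IntWritable m : marks) {
                sum += m.get();
                count++;
            }
            float avg = (float) sum / count;
            String grade = (avg >= 80) ? "A" : (avg >= 60) ? "B" : (avg >= 40) ? "C" : "D";
            con.write(student, new Text(grade));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "studentgrade");
        job.setJarByClass(StudentGrade.class);
        job.setMapperClass(GradeMapper.class);
        job.setReducerClass(GradeReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}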

Result: Thus the MapReduce program to find the grades of students was completed successfully.

EXP NO: 10
MapReduce program to calculate the frequency of a given word in a given file.
Date:

Aim: To develop a MapReduce program to calculate the frequency of a given word in a given file.

Map Function – It takes a set of data and converts it into another set of data, where
individual elements are broken down into tuples (Key-Value pairs).
Example – (Map function in Word Count)

Input

Set of data
Bus, Car, bus, car, train, car, bus, car, train, bus, TRAIN,BUS, buS, caR, CAR, car, BUS, TRAIN

Output

Convert into another set of data (Key, Value)

(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1),
(TRAIN,1),(BUS,1), (buS,1), (caR,1), (CAR,1), (car,1), (BUS,1), (TRAIN,1)
Reduce Function – Takes the output from Map as an input and combines those data tuples
into a smaller set of tuples.

Example – (Reduce function in Word Count)


Input: Set of tuples (output of the Map function)
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1),
(TRAIN,1),(BUS,1),
(buS,1),(caR,1),(CAR,1), (car,1), (BUS,1), (TRAIN,1)

Output: Converted into a smaller set of tuples

(BUS,7), (CAR,7), (TRAIN,4)

Work Flow of Program



Workflow of MapReduce consists of 5 steps


1. Splitting – The splitting parameter can be anything, e.g. splitting by
space, comma, semicolon, or even by a new line (‘\n’).
2. Mapping – as explained above
3. Intermediate splitting – the entire process in parallel on different clusters. In
order to group them in “Reduce Phase” the similar KEY data should be on same
cluster.
4. Reduce – it is nothing but mostly group by phase
5. Combining – The last phase where all the data (individual result set from
each cluster) is combine together to form a Result

Now Let’s See the Word Count Program in Java

Make sure that Hadoop is installed on your system with the Java JDK.

Steps to follow

Step 1. Open Eclipse> File > New > Java Project > (Name it – MRProgramsDemo)
> Finish
Step 2. Right Click > New > Package ( Name it - PackageDemo) > Finish
Step 3. Right Click on Package > New > Class (Name it - WordCount)
Step 4. Add Following Reference Libraries –

Right Click on Project > Build Path> Add External Archivals


 /usr/lib/hadoop-0.20/hadoop-core.jar
 /usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar

Program: Step 5. Type following Program :

package PackageDemo;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {
public static void main(String [] args) throws Exception
{
Configuration c=new Configuration();
String[] files=new GenericOptionsParser(c,args).getRemainingArgs();
Path input=new Path(files[0]);
Path output=new Path(files[1]);
Job j=new Job(c,"wordcount");
j.setJarByClass(WordCount.class);
j.setMapperClass(MapForWordCount.class);
j.setReducerClass(ReduceForWordCount.class);
j.setOutputKeyClass(Text.class);

j.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(j, input);
FileOutputFormat.setOutputPath(j, output);
System.exit(j.waitForCompletion(true)?0:1);
}
public static class MapForWordCount extends Mapper<LongWritable, Text, Text,
IntWritable>{
public void map(LongWritable key, Text value, Context con) throws IOException,
InterruptedException
{
String line = value.toString();

String[] words=line.split(",");
for(String word: words )
{
Text outputKey = new Text(word.toUpperCase().trim());
IntWritable outputValue = new IntWritable(1);
con.write(outputKey, outputValue);
}
}
}
public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text,
IntWritable>
{
public void reduce(Text word, Iterable<IntWritable> values, Context con) throws
IOException,
InterruptedException
{
int sum = 0;
for(IntWritable value : values)
{
sum += value.get();
}
con.write(word, new IntWritable(sum));
}
}
}

Make Jar File


Right Click on Project> Export> Select export destination as Jar File > next> Finish

To Move this into Hadoop directly, open the terminal and enter the following
commands:
[training@localhost ~]$ hadoop fs -put wordcountFile wordCountFile

Run Jar file


(Hadoop jar jarfilename.jar packageName.ClassName PathToInputTextFile
PathToOutputDirectry)

[training@localhost ~]$ hadoop jar MRProgramsDemo.jar PackageDemo.WordCount wordCountFile MRDir1

Output:

[training@localhost ~]$ hadoop fs -ls MRDir1
Found 3 items
-rw-r--r--   1 training supergroup          0 2016-02-23 03:36 /user/training/MRDir1/_SUCCESS
drwxr-xr-x   - training supergroup          0 2016-02-23 03:36 /user/training/MRDir1/_logs
-rw-r--r--   1 training supergroup         20 2016-02-23 03:36 /user/training/MRDir1/part-r-00000
[training@localhost ~]$ hadoop fs -cat MRDir1/part-r-00000
BUS 7
CAR 4
TRAIN 6

Result: Thus the MapReduce program to calculate the frequency of a given word in a given file was completed successfully.
