BDA Lab Manual AI&DS
REGULATION 2021
SIGNATURE : SIGNATURE :
DATE: DATE:
VISION AND MISSION STATEMENTS OF THE INSTITUTE
Vision:
We endeavor to impart futuristic technical education of the highest quality to the student community and
to inculcate discipline in them to face the world with self-confidence. We thus prepare them for life as
responsible citizens who uphold human values and are of service to society at large. We strive to build the
Institution into an institution of academic excellence of international standard.
Mission:
We transform persons into personalities through state-of-the-art infrastructure, time consciousness, quick
response and the best academic practices, supported by assessment and advice.
Vision:
To offer quality education in Artificial Intelligence and Data Science, encourage life-long learning, and make
graduates responsible to society by upholding social values in the field of emerging technology.
Mission:
To produce graduates with sound technical knowledge and good skills that prepare them for
rewarding careers in prominent industries.
To promote collaborative learning and research with industry, government and international
organizations for continuous knowledge transfer and enhancement.
To promote entrepreneurship and mould graduates into leaders by cultivating the spirit
of social and ethical values.
PEO, PO and PSO Statements
PEO1 - Graduates will have successful careers with high level of technical competency and
problem-solving skills to produce innovative solutions for industrial needs.
PEO2 – Graduates will have good professionalism, teamwork, effective communication, leadership
qualities and life-long learning for the welfare of mankind.
PEO3 – Graduates will be familiar with recent trends in industry for delivering and implementing
innovative systems in collaboration.
1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and
an engineering specialization to the solution of complex engineering problems.
2. Problem analysis: Identify, formulate, review research literature, and analyze complex engineering
problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and
engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering problems and design
system components or processes that meet the specified needs with appropriate consideration for the
public health and safety, and the cultural, societal, and environmental considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and research methods
including design of experiments, analysis and interpretation of data, and synthesis of the information to
provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities with an
understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal,
health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional
engineering practice.
7. Environment and sustainability: Understand the impact of the professional engineering solutions in
societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable
development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of
the engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or leader in
diverse teams, and in multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the engineering
community and with society at large, such as, being able to comprehend and write effective reports and
design documentation, make effective presentations, and give and receive clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of the engineering
and management principles and apply these to one’s own work, as a member and leader in a team, to
manage projects and in multidisciplinary environments.
12. Life Long Learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.
PSO1 - Students will apply programming skills to develop new software with assured quality.
PSO2 - Students will be able to demonstrate specific coding skills to improve employability.
CS3311
PO1 Engineering Knowledge      PO7 Environment & Sustainability
PO2 Problem Analysis           PO8 Ethics
PO3 Design & Development       PO9 Individual & Team Work
PO4 Investigations             PO10 Communication Skills
PO5 Modern Tools               PO11 Project Management & Finance
PO6 Engineer & Society         PO12 Life Long Learning
PSO1 Professional Skills       PSO2 Competency       PSO3 Developing
SNO    PO/PSO MAPPED    JUSTIFICATION
CO504.1
PO1 - Apply the knowledge acquired to study the introduction to big data and its classifications.
PO2 - Understanding big data helps students classify the types of technology and use the convergence of key trends.
PO3 - Design and develop industry-related data and explore it with the latest technology.
PO4 - Understanding big data concepts and classifying their types helps in analyzing the concept of web analytics.
PO5 - Students used modern tools and techniques (crowdsourcing) and identified firewall concepts.
PO9 - Working as a team or individually, students learn how to classify and analyze big data technologies.
PO10 - Students were given assignments, seminars, group discussions, and technical quizzes on various topics in order to improve their communication skills.
PO11 - Students were given a project on a particular topic in order to improve their project skills and develop their creativity.
PO12 - Ability to engage in independent and life-long learning in the broadest context of technological change.
CO504.2
PO1 - Apply the knowledge acquired to classify NoSQL and aggregate data models.
PO2 - Understanding databases helps students classify the types of databases and work on them.
PO3 - Design Cassandra and the Cassandra data models by applying Cassandra techniques.
PO4 - Understanding the distribution model and classifying its types helps in analyzing and interpreting whether a given model is good or not.
PO5 - Students used modern tools and techniques (the Cassandra data model) and identified the Cassandra clients.
PO9 - Working as a team or individually, students learn how to classify and analyze master-slave replication.
PO10 - Students were given assignments, seminars, group discussions, and technical quizzes on various topics in order to improve their communication skills.
PO11 - Students were given a project on a particular topic in order to improve their project skills and develop their creativity.
PO12 - Ability to engage in independent and life-long learning in the broadest context of technological change.
CO504.3
PO1 - Apply the knowledge acquired to classify MapReduce and work with MRUnit.
PO2 - Understanding the data helps students classify the types of data, such as test data and local data, and work on them.
PO3 - Design YARN by applying mathematical techniques.
PO4 - Understanding MapReduce types and classifying them helps in analyzing and interpreting the input and output formats.
PO5 - Students used modern tools and techniques (YARN) and identified job scheduling.
PO9 - Working as a team or individually, students learn how to classify task execution and its working.
PO10 - Students were given assignments, seminars, group discussions, and technical quizzes on various topics in order to improve their communication skills.
PO11 - Students were given a project on a particular topic in order to improve their project skills and develop their creativity.
PO12 - Ability to engage in independent and life-long learning in the broadest context of technological change.
CO504.4
PO1 - Apply the knowledge acquired to understand the data format and analyze it with Hadoop.
PO2 - Understanding the concept of Hadoop helps students classify Hadoop Streaming and Hadoop Pipes and work on them.
PO3 - Design Hadoop and the Hadoop streaming form by applying mathematical techniques.
PO4 - Understand the basics of Hadoop; students will know about file-based data structures.
PO9 - Working as a team or individually, students learn how to classify the differences between Avro and Hadoop.
PO10 - Students were given assignments, seminars, group discussions, and technical quizzes on various topics in order to improve their communication skills.
PO11 - Students were given a project on a particular topic in order to improve their project skills and develop their creativity.
PO12 - Ability to engage in independent and life-long learning in the broadest context of technological change.
CO504.5
PO1 - Apply the knowledge acquired to know about all Hadoop-related tools.
PO2 - Understanding Pig Latin scripts helps students classify the types of algorithms and work on them.
PO3 - Design the Hive data models and HiveQL data manipulation by applying mathematical techniques.
PO4 - Understanding recursive and non-recursive classification of the types helps in analyzing and interpreting the given conversion of PCM to MPCM.
PO5 - Students used modern tools and techniques (Grunt) and identified the types and new mechanisms.
PO9 - Working as a team or individually, students learn how to classify Hive and Hadoop.
PO10 - Students were given assignments, seminars, group discussions, and technical quizzes on various topics in order to improve their communication skills.
PO11 - Students were given a project on a particular topic in order to improve their project skills and develop their creativity.
PO12 - Ability to engage in independent and life-long learning in the broadest context of technological change.
SNO    PO/PSO MAPPED    JUSTIFICATION
CO504.1
PSO1 - Ability to gain technologies to design big data concepts and know how to develop the tools.
PSO2 - Acquire the knowledge and behaviors for designing applications that lead to success in big data technology.
CO504.2
PSO1 - Ability to apply professional skills in identifying the appropriate database and classifying graph databases and schema-less databases.
PSO2 - Students understand the distribution model; classifying its types helps in analyzing and interpreting whether a given model is good or not.
CO504.3
PSO1 - Ability to gain and understand MapReduce types; classifying them helps in analyzing and interpreting the input and output formats.
PSO2 - Acquire the knowledge and behaviors of MapReduce types, which are used to differentiate input and output formats.
CO504.4
PSO1 - Apply mathematical functions and identify Hadoop Streaming and Hadoop Pipes.
PSO2 - Ability to apply the knowledge and behaviors in identifying the appropriate design of the Hadoop Distributed File System, which is useful for the career.
CO504.5
PSO1 - Apply mathematical skills and design the Pig Latin problem and Grunt by applying mathematical techniques.
PSO2 - Acquiring the knowledge and behaviors, students working as a team or individually learn how to classify HBase and the data models.
LIST OF EXPERIMENTS:
1. Downloading and installing Hadoop; Understanding different Hadoop modes. Startup
scripts, Configuration files.
2. Hadoop Implementation of file management tasks, such as Adding files and
directories, retrieving files and Deleting files
3. Implementation of Matrix Multiplication with Hadoop MapReduce
4. Run a basic Word Count MapReduce program to understand the MapReduce Paradigm.
5. Installation of Hive along with practice examples.
7. Installation of HBase, Installing thrift along with Practice examples
8. Practice importing and exporting data from various databases.
Software Requirements:
Cassandra, Hadoop, Java, Pig, Hive and HBase.
TOTAL: 30 PERIODS
Additional Experiments
➢ Apache Hadoop offers a scalable, flexible and reliable distributed computing big
data framework for a cluster of systems with storage capacity and local computing
power by leveraging commodity hardware.
➢ Hadoop follows a master/slave architecture for the transformation and analysis of
large datasets using the Hadoop MapReduce paradigm. Three important Hadoop
components play a vital role in the Hadoop architecture:
➢ Hadoop Common – the libraries and utilities used by other Hadoop modules
➢ Hadoop Distributed File System (HDFS) – the Java-based scalable system that stores
data across multiple machines without prior organization.
➢ YARN – (Yet Another Resource Negotiator) provides resource management for the
processes running on Hadoop.
➢ MapReduce – a parallel processing software framework comprising two steps. The
Map step is run by a master node that takes inputs, partitions them into smaller
sub-problems, and distributes them to worker nodes. After the Map step has taken
place, the master node takes the answers to all of the sub-problems and combines
them to produce the output.
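The partition/distribute/combine flow described above can be sketched in plain Java in a single process (no Hadoop required; the master and workers here are ordinary method calls, and the data and worker count are made up for illustration):

```java
import java.util.Arrays;

public class MasterWorkerSketch {
    // Worker node: solve one sub-problem (here, sum a slice of the input).
    static long worker(int[] slice) {
        long sum = 0;
        for (int v : slice) sum += v;
        return sum;
    }

    // Master node: partition the input into smaller sub-problems,
    // distribute them to workers, then combine the answers.
    static long master(int[] data, int workers) {
        int chunk = (data.length + workers - 1) / workers; // ceiling division
        long total = 0;
        for (int start = 0; start < data.length; start += chunk) {
            int end = Math.min(start + chunk, data.length);
            total += worker(Arrays.copyOfRange(data, start, end)); // combine step
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(master(data, 3)); // prints 36
    }
}
```

In a real Hadoop cluster the worker calls run on separate machines and the combine step is the Reduce phase; the control flow, however, is the same.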
Result: Thus the study of Big Data Analytics and the Hadoop architecture was completed successfully.
EXP NO: 2
Date:
Downloading and installing Hadoop; Understanding different Hadoop modes. Startup scripts, Configuration files.
Aim:
To Install Apache Hadoop.
Hadoop software can be installed in three modes: standalone, pseudo-distributed, and fully distributed.
Hadoop is a Java-based programming framework that supports the processing and storage of
extremely large datasets on a cluster of inexpensive machines. It was the first major open source
project in the big data playing field and is sponsored by the Apache Software Foundation.
Hadoop Common is the collection of utilities and libraries that support other Hadoop
modules.
HDFS, which stands for Hadoop Distributed File System, is responsible for persisting
data to disk.
YARN, short for Yet Another Resource Negotiator, is the "operating system" for HDFS.
MapReduce is the original processing model for Hadoop clusters. It distributes work within
the cluster or map, then organizes and reduces the results from the nodes into a response to
a query. Many other processing models are available for the 2.x version of Hadoop.
Hadoop clusters are relatively complex to set up, so the project includes a stand-alone mode
which is suitable for learning about Hadoop, performing simple operations, and debugging.
Procedure:
We'll install Hadoop in stand-alone mode and run one of the example MapReduce
programs it includes to verify the installation.
Prerequisites:
If Apache Hadoop 2.2.0 is not already installed then follow the post
Build, Install,Configure and Run Apache Hadoop 2.2.0 in Microsoft
Windows OS.
Run the following commands.
Command Prompt
C:\Users\abhijitg>cd c:\hadoop
c:\hadoop>sbin\start-dfs
c:\hadoop>sbin\start-yarn
starting yarn daemons
EXP NO: 3
Date:
Hadoop Implementation of file management tasks, such as Adding files and directories, retrieving files and Deleting files
Aim:
Implement the following file management tasks in Hadoop:
Adding files and directories
Retrieving files
Deleting Files
Procedure:
Algorithm:
Syntax And Commands To Add, Retrieve And Delete Data From Hdfs
Before you can run Hadoop programs on data stored in HDFS, you'll need to put the
data into HDFS first. Let's create a directory and put a file in it. HDFS has a default
working directory of /user/$USER, where $USER is your login user name. This
directory isn't automatically created for you, though, so let's create it with the mkdir
command. For the purpose of illustration, we use chuck. You should substitute your
user name in the example commands.
hadoop fs -mkdir /user/chuck
hadoop fs -put example.txt
hadoop fs -put example.txt /user/chuck
The Hadoop command get copies files from HDFS back to the local filesystem. To retrieve
example.txt, we can run the following command.
hadoop fs -get example.txt .
The command for copying from a local directory is "hdfs dfs -copyFromLocal
/home/lendi/Desktop/shakes/glossary /lendicse/"
View the file by using the command "hdfs dfs -cat /lendi_english/glossary"
The command for listing items in Hadoop is "hdfs dfs -ls hdfs://localhost:9000/"
The command for deleting files is "hdfs dfs -rm -r /kartheek"
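The same add/retrieve/delete cycle can be sketched against the local filesystem in plain Java (a rough analogy only; the file names are hypothetical, and real HDFS access goes through hadoop fs or the Hadoop FileSystem API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileTasksSketch {
    public static void main(String[] args) throws IOException {
        // Adding: create a working directory (stands in for /user/chuck)
        Path dir = Files.createTempDirectory("user_chuck");
        Path file = dir.resolve("example.txt");
        Files.write(file, "hello hdfs".getBytes()); // like: hadoop fs -put example.txt

        // Retrieving: copy the file back out, like: hadoop fs -get example.txt
        Path copy = dir.resolve("example_copy.txt");
        Files.copy(file, copy);
        System.out.println(new String(Files.readAllBytes(copy))); // prints hello hdfs

        // Deleting: remove the file, like: hdfs dfs -rm -r /kartheek
        Files.delete(file);
        System.out.println(Files.exists(file)); // prints false
    }
}
```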
Result: Thus the file management tasks in Hadoop were implemented successfully.
EXP NO: 4
Date:
Implementation of Matrix Multiplication with Hadoop MapReduce
produce (key, value) pairs as ((i,k), (M, j, m_ij)), for k = 1, 2, 3, ... up to the number of
columns of N
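Before the full Hadoop listing, the (i,k) keying scheme can be simulated in plain Java in a single process (no Hadoop required; the 2x2 matrices are made up for illustration): the map phase replicates each M and N element under every output key (i,k), and the reduce phase joins matching j indices and sums the products.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class MatMulMapReduceSketch {
    // Multiply M (p x q) by N (q x r) using the (i,k)-keyed map/reduce scheme.
    public static double[][] multiply(double[][] M, double[][] N) {
        int p = M.length, q = N.length, r = N[0].length;
        // Map phase: for each output cell (i,k), collect the j-indexed values.
        Map<String, double[]> fromM = new HashMap<>();
        Map<String, double[]> fromN = new HashMap<>();
        for (int i = 0; i < p; i++)
            for (int j = 0; j < q; j++)
                for (int k = 0; k < r; k++)        // replicate m_ij under every key (i,k)
                    fromM.computeIfAbsent(i + "," + k, x -> new double[q])[j] = M[i][j];
        for (int j = 0; j < q; j++)
            for (int k = 0; k < r; k++)
                for (int i = 0; i < p; i++)        // replicate n_jk under every key (i,k)
                    fromN.computeIfAbsent(i + "," + k, x -> new double[q])[j] = N[j][k];
        // Reduce phase: for each key (i,k), join on j and sum the products.
        double[][] result = new double[p][r];
        for (String key : fromM.keySet()) {
            String[] ik = key.split(",");
            int i = Integer.parseInt(ik[0]), k = Integer.parseInt(ik[1]);
            double sum = 0;
            for (int j = 0; j < q; j++)
                sum += fromM.get(key)[j] * fromN.get(key)[j];
            result[i][k] = sum;
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] M = {{1, 2}, {3, 4}};
        double[][] N = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(M, N))); // prints [[19.0, 22.0], [43.0, 50.0]]
    }
}
```

In the Hadoop program below, the grouping done here with in-memory maps is performed by the shuffle between the Map and Reduce phases.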
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.ReflectionUtils;
// Composite key (i, j) identifying one cell of the output matrix
class Pair implements WritableComparable<Pair> {
int i;
int j;
Pair() {
i = 0;
j = 0;
}
Pair(int i, int j) {
this.i = i;
this.j = j;
}
@Override
public void readFields(DataInput input) throws IOException
{ i = input.readInt();
j = input.readInt();
}
@Override
public void write(DataOutput output) throws IOException {
output.writeInt(i);
output.writeInt(j);
}
@Override
public int compareTo(Pair compare) {
if (i > compare.i) {
return 1;
} else if (i < compare.i) {
return -1;
} else if (j > compare.j) {
return 1;
} else if (j < compare.j) {
return -1;
}
return 0;
}
public String toString() {
return i + " " + j + " ";
}
}
public class Multiply {
public static class MatriceMapperM extends Mapper<Object,Text,IntWritable,Element>
{
@Override
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException
{
String readLine = value.toString();
String[] stringTokens =
readLine.split(",");
conf);
if (tempElement.tag == 0) {
M.add(tempElement);
} else if(tempElement.tag == 1)
{ N.add(tempElement);
}
}
for(int i=0;i<M.size();i++) {
for(int j=0;j<N.size();j++) {
sum += value.get();
}
context.write(key, new DoubleWritable(sum));
}
}
public static void main(String[] args) throws Exception
{ Job job = Job.getInstance();
job.setJobName("MapIntermediate");
job.setJarByClass(Multiply.class);
MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class,
MatriceMapperM.class);
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class,
MatriceMapperN.class);
job.setReducerClass(ReducerMxN.class);
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(Element.class);
job.setOutputKeyClass(Pair.class);
job.setOutputValueClass(DoubleWritable.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.waitForCompletion(true);
Job job2 = Job.getInstance();
job2.setJobName("MapFinalOutput");
job2.setJarByClass(Multiply.class);
job2.setMapperClass(MapMxN.class);
job2.setReducerClass(ReduceMxN.class);
job2.setMapOutputKeyClass(Pair.class);
job2.setMapOutputValueClass(DoubleWritable.class);
job2.setOutputKeyClass(Pair.class);
job2.setOutputValueClass(DoubleWritable.class);
job2.setInputFormatClass(TextInputFormat.class);
job2.setOutputFormatClass(TextOutputFormat.class);
job2.waitForCompletion(true);
}
}
#!/bin/bash
mkdir -p classes
javac -d classes -cp classes:`$HADOOP_HOME/bin/hadoop classpath` Multiply.java
jar cf multiply.jar -C classes .
echo "end"
stop-yarn.sh
stop-dfs.sh
myhadoop-cleanup.sh
Result: Thus Matrix Multiplication with Hadoop MapReduce was implemented
successfully.
EXP NO: 5
Date:
Run a basic Word Count MapReduce program to understand the MapReduce Paradigm.
Procedure:
Create a text file with some content. We'll pass this file as input to the
wordcount MapReduce job for counting words.
C:\file1.txt
Install Hadoop
Create a directory (say 'input') in HDFS to keep all the text files (say 'file1.txt') to be used for counting
words.
C:\Users\abhijitg>cd c:\hadoop
C:\hadoop>bin\hdfs dfs -mkdir input
Copy the text file(say 'file1.txt') from local disk to the newly created 'input' directory in HDFS.
Maps = 1
Failed Shuffles = 0
Merged Map outputs = 1
GC time elapsed (ms) = 145
Result: Thus the basic Word Count MapReduce program was run successfully to
understand the MapReduce paradigm.
EXP NO: 6
Date:
Installation of Hive along with practice examples.
Prerequisites:
With Java in place, we'll visit the Apache Hadoop Releases page to find the most recent stable release.
Please note: while creating folders, DO NOT ADD SPACES IN THE FOLDER NAME (it
can cause issues later).
I have placed my Hive in the D: drive; you can use C: or any other drive.
To edit environment variables, go to Control Panel > System > click on the “Advanced system
settings” link
Alternatively, we can right-click on the This PC icon, click Properties, and click on the "Advanced
system settings" link.
Or, the easiest way is to search for Environment Variables in the search bar.
Now as shown, add HIVE_HOME in variable name and path of Hive in Variable Value.
Click OK and we are half done with setting HIVE_HOME.
Click OK and OK, and we are done with setting the environment variables.
3.4 Verify the Paths
Now we need to verify that what we have done is correct and is reflected.
4. Editing Hive
Once we have configured the environment variables, the next step is to configure Hive. It has seven parts.
The first step in configuring Hive is to download and replace the bin folder.
* Extract the zip and replace all the files present under the bin folder in %HIVE_HOME%\bin
Note:- If you are using a different version of Hive, please search for its respective bin folder and
download it.
Now open the newly created hive-site.xml; we need to edit the following properties.
<property>
<name>hive.metastore.uris</name>
<value>thrift://<Your IP Address>:9083</value>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value><Your drive Folder>/${hive.session.id}_resources</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/mydir</value>
</property>
Replace the value for <Your IP Address> with the IP address of your system and replace <Your drive
Folder> with the Hive folder path.
This is a short step: we need to remove all the stray invisible characters present in the hive-site.xml file.
The next important step in configuring Hive is to create users for MySQL.
These Users are used for connecting Hive to MySQL Database for reading and writing data from it.
Note:- You can skip this step if you created the hive user during Sqoop installation.
Firstly, we need to open MySQL Workbench and open the workspace (default or any specific one, if
you want). We will be using the default workspace for now.
Now Open the Administration option in the Workspace and select Users and privileges option
under Management.
Now select the Add Account option and create a new user with Login Name hive, Limit to
Host Mapping set to localhost, and a password of your choice.
Now we have to define the roles for this user under Administrative Roles
and select the DBManager, DBDesigner and BackupAdmin roles.
Now we need to grant schema privileges for the user by using Add Entry option and
selecting the schemas we need access to.
I am using the schema matching pattern %_bigdata% for all my big data related schemas. You can use
the other two options also.
After clicking OK we need to select all the privileges for this schema.
Click Apply and we are done with creating the Hive user.
4.5 Granting permission to Users
Once we have created the user hive, the next step is to grant all privileges to this user for all the tables in
the previously selected schema.
Open the MySQL cmd Window. We can open it by using the Window’s Search bar.
Upon opening, it will ask for your root user password (created while setting up MySQL).
Finally, we need to open our hive-site.xml file once again and make some changes there. These are related
to the Hive metastore; they were not added at the start so as to distinguish between the different sets of
properties.
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/<Your Database>?createDatabaseIfNotExist=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the
connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://localhost:9000/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value><Hive Password></value>
</property>
<property>
<name>datanucleus.schema.autoCreateSchema</name>
<value>true</value>
</property>
<property>
<name>datanucleus.schema.autoCreateTables</name>
<value>true</value>
</property>
<property>
<name>datanucleus.schema.validateTables</name>
<value>true</value>
<description>validates existing schema against code. turn this on if you want to verify existing
schema</description>
</property>
Replace the value for <Hive Password> with the hive user password that we created in MySQL user
creation. And <Your Database> with the database that we used for metastore in MySQL.
5. Starting Hive
Now we need to start a new Command Prompt (remember to run it as administrator to avoid permission
issues) and execute the commands below.
start-all.cmd
All the 4 daemons should be UP and running.
Open a cmd window, run below command to start the Hive metastore.
hive --service metastore
Now open a new cmd window and run the below command to start Hive
hive
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>3.1.2</version>
</dependency>
Start HiveServer2
To connect to Hive from Java, you need to start hiveserver2 from $HIVE_HOME/bin
prabha@namenode:~/hive/bin$ ./hiveserver2
2020-10-03 23:17:08: Starting HiveServer2
Below is a complete Java example of how to create a Hive database.
Create a Hive Table from Java Example
package com.sparkbyexamples.hive;
import java.sql.Connection;
import java.sql.Statement;
import java.sql.DriverManager;
EXP NO: 7
Date:
Installation of HBase, Installing Thrift along with practice examples
Prerequisites:
Installing Hadoop
Procedure:
Step 3: Now we need to change two files: a config file and a cmd file. In order to do that, go to the unzipped location.
Change 1: Edit hbase-config.cmd, located in the bin folder under the unzipped location, and add the line below:
set JAVA_HOME=C:\software\Java\jdk1.8.0_201
Change 2: Edit hbase-site.xml, located in the conf folder under the unzipped location, and add the properties
below (set hbase.rootdir to match your fs.defaultFS value).
<property>
<name>hbase.rootdir</name>
<value>file:/home/hadoop/HBase/HFiles</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>false</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
Step 5: Now we are all set to run HBase. To start HBase, execute the command below from the bin folder.
Open Command Prompt and cd to HBase's bin directory.
Run start-hbase.cmd
Look for any errors
Step 6: Test the installation using HBase shell
create 'emp','p'
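Conceptually, the emp table created above maps a row key to values stored under column family p. A plain-Java sketch of that data model (the row keys and values are made up for illustration; no HBase client required):

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

public class HBaseModelSketch {
    // HBase-style table: row key -> (family:qualifier -> value)
    static Map<String, Map<String, String>> table = new TreeMap<>();

    // like: put 'emp', '1', 'p:name', 'raju'
    static void put(String row, String column, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // like: get 'emp', '1', 'p:name'
    static String get(String row, String column) {
        return table.getOrDefault(row, Collections.emptyMap()).get(column);
    }

    public static void main(String[] args) {
        put("1", "p:name", "raju");
        put("1", "p:city", "hyderabad");
        System.out.println(get("1", "p:name")); // prints raju
    }
}
```

The real HBase shell put/get commands operate on exactly this shape of data, with versioning and persistence handled by the region servers.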
EXP NO: 8
Date:
Practice importing and exporting data from various databases.
Procedure:
Step 2: Choose Add Data > New Data to import data into a new dataset; or, in the Datasets panel, click
More next to the dataset name and choose Edit Dataset to add data to the dataset. The Preview Dialog
opens. Click Add a new table.
The Data Sources dialog opens.
The Data Sources dialog opens.
Step 3: To import data from a specific database, select the corresponding logo (Amazon Redshift, Apache
Cassandra, Cloudera Hive, Google BigQuery, Hadoop, etc.). If you select Pig or Web Services, the
Import from Tables dialog opens, bypassing the Select Import Options dialog, allowing you to type a
query to import a table. If you select SAP Hana, you must build or type a query instead of selecting
tables.
Step 4: Select Select Tables and click Next. The Import from Tables dialog opens. If you selected a specific
database, only the data source connections that correspond to the selected database appear. If you did not
select a database, all available data source connections appear.
If necessary, you can create a new connection to a data source while importing your data.
The terminology on the Import from Tables dialog varies based on the source of the data.
Step 5: In the Data Sources/Projects pane, click on the data source/project that contains the data to import.
Step 6: If your data source/project supports namespaces, select a namespace from the Namespace drop-down list
in the Available Tables/Datasets pane to display only the tables/datasets within a selected namespace. To
search for a namespace, type its name in Namespace. The choices in the drop-down list are filtered as
you type.
Step 7: Expand a table/dataset to view the columns within it. Each column appears with its corresponding
data type in brackets. To search for a table/dataset, type its name in Table. The tables/datasets are
filtered as you type.
Step 8: MicroStrategy creates a cache of the database’s tables and columns when a data source/project is first
used. Hover over the Information icon at the top of the Available Tables/Datasets pane to view a tooltip
displaying the number of tables and the last time the cache was updated.
Step 9: Click Update namespaces in the Available Tables/Datasets pane to refresh the namespaces.
Step 10: Click Update in the Available Tables/Datasets pane to refresh the tables/datasets.
Step 11: Double-click tables/datasets in the Available Tables/Datasets pane to add them to the list of tables to
import. The tables/datasets appear in the Query Builder pane along with their corresponding columns.
Step 12: Click Prepare Data if you are adding a new dataset and want to preview, modify, and specify import
options.
Step 13: Click Finish if you are adding a new dataset, and go to the next step; or click Update Dataset if you
are editing an existing dataset, and skip the next step.
Click Connect Live to connect to a live database when retrieving data. Connecting live is useful if you
are working with a large amount of data, when importing into the dossier may not be feasible. Go to
the last step. Or click Import as an In-memory Dataset to import the data directly into your dossier.
Importing the data leads to faster interaction with the data, but uses more RAM. Go to the last step.
If you are editing a connect-live dataset, the existing dataset is refreshed and updated.
If you are editing an in-memory dataset, you are prompted to refresh the existing dataset first.
Step 16: View the new or updated datasets on the Datasets panel.
Result: Thus importing and exporting data from various databases completed successfully.
EXP NO: 9
Date:
MapReduce program to find the grades of students.
Procedure:
Mapper
Assume the input file is parsed as (student, grade) pairs.
Reducer
Perform the average of all values for a given key.
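The mapper and reducer described above can be sketched in plain Java, without a Hadoop cluster, to make the per-key averaging concrete. This is an illustrative sketch, not the lab program: the class and method names are assumptions, and grouping by key (done by Hadoop's shuffle phase) is simulated here with a map.

```java
import java.util.*;

public class GradeAverageSketch {
    // "Map" step: parse each "student,grade" line into a (student, grade) pair,
    // grouping the grades by student key (Hadoop's shuffle does this grouping).
    static Map<String, List<Integer>> map(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            String student = parts[0].trim();
            int grade = Integer.parseInt(parts[1].trim());
            grouped.computeIfAbsent(student, k -> new ArrayList<>()).add(grade);
        }
        return grouped;
    }

    // "Reduce" step: average all grade values for one student key
    static double reduce(List<Integer> grades) {
        int total = 0;
        for (int g : grades) {
            total += g;
        }
        return (double) total / grades.size();
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("alice,80", "bob,60", "alice,90");
        Map<String, List<Integer>> grouped = map(input);
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + " -> " + reduce(e.getValue()));
            // prints: alice -> 85.0, then bob -> 60.0
        }
    }
}
```

In a real Hadoop job the same logic would live in a `Mapper` emitting (student, grade) pairs and a `Reducer` averaging the values, as in the word-count program of the next experiment.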
Program:
import java.util.Scanner;
public class JavaExample
{
    public static void main(String args[])
    {
        /* This program assumes that the student has 6 subjects,
         * which is why the array is of size 6. You can
         * change this as per the requirement.
         */
        int marks[] = new int[6];
        int i;
        float total = 0, avg;
        Scanner scanner = new Scanner(System.in);
        for (i = 0; i < 6; i++) {
            System.out.print("Enter Marks of Subject " + (i + 1) + ": ");
            marks[i] = scanner.nextInt();
            total = total + marks[i];
        }
        scanner.close();
        // Calculating the average
        avg = total / 6;
        System.out.print("The student Grade is: ");
        if (avg >= 80)
        {
            System.out.print("A");
        }
        else if (avg >= 60 && avg < 80)
        {
            System.out.print("B");
        }
        else if (avg >= 40 && avg < 60)
        {
            System.out.print("C");
        }
        else
        {
            System.out.print("D");
        }
    }
}
Result: Thus the MapReduce program to find the grades of students was completed successfully.
EXP NO: 10
MapReduce program to calculate the frequency of a given word in a given file.
Date:
Aim: To develop a MapReduce program to calculate the frequency of a given word in a given file.
Map Function – It takes a set of data and converts it into another set of data, where
individual elements are broken down into tuples (key-value pairs).
Example – (Map function in Word Count)
Input
Set of data
Bus, Car, bus, car, train, car, bus, car, train, bus, TRAIN,BUS, buS, caR, CAR, car, BUS, TRAIN
Output
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1),
(TRAIN,1),(BUS,1), (buS,1), (caR,1), (CAR,1), (car,1), (BUS,1), (TRAIN,1)
Reduce Function – Takes the output from Map as an input and combines those data tuples
into a smaller set of tuples.
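The combined effect of the two functions on the sample input above can be checked with a small plain-Java simulation (no Hadoop needed; the class and method names here are illustrative). It applies the same normalization as the WordCount program in this experiment — uppercase and trim — so all case variants of a word collapse to one key:

```java
import java.util.*;

public class WordCountSketch {
    // Simulates Map (emit one count per comma-separated token, normalized
    // to uppercase) followed by Reduce (sum the counts per key).
    static Map<String, Integer> wordCount(String line) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : line.split(",")) {
            String key = word.toUpperCase().trim();
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = "Bus, Car, bus, car, train, car, bus, car, train, bus, "
                     + "TRAIN,BUS, buS, caR, CAR, car, BUS, TRAIN";
        System.out.println(wordCount(input)); // prints {BUS=7, CAR=7, TRAIN=4}
    }
}
```

This matches the expected final output of the job: the 18 intermediate (word, 1) tuples reduce to (BUS, 7), (CAR, 7), (TRAIN, 4).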
Make sure that Hadoop is installed on your system along with the Java JDK.
Steps to follow
Step 1. Open Eclipse> File > New > Java Project > (Name it – MRProgramsDemo)
> Finish
Step 2. Right Click > New > Package ( Name it - PackageDemo) > Finish
Step 3. Right Click on Package > New > Class (Name it - WordCount)
Step 4. Add the required reference libraries (the Hadoop JAR files from your Hadoop installation) –
package PackageDemo;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {
    public static void main(String[] args) throws Exception
    {
        Configuration c = new Configuration();
        String[] files = new GenericOptionsParser(c, args).getRemainingArgs();
        Path input = new Path(files[0]);
        Path output = new Path(files[1]);
        Job j = new Job(c, "wordcount");
        j.setJarByClass(WordCount.class);
        j.setMapperClass(MapForWordCount.class);
        j.setReducerClass(ReduceForWordCount.class);
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(j, input);
        FileOutputFormat.setOutputPath(j, output);
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }
    // Mapper: emits (WORD, 1) for each comma-separated word in the input line
    public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable>
    {
        public void map(LongWritable key, Text value, Context con)
                throws IOException, InterruptedException
        {
            String line = value.toString();
            String[] words = line.split(",");
            for (String word : words)
            {
                Text outputKey = new Text(word.toUpperCase().trim());
                IntWritable outputValue = new IntWritable(1);
                con.write(outputKey, outputValue);
            }
        }
    }
    // Reducer: sums the counts for each word key
    public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        public void reduce(Text word, Iterable<IntWritable> values, Context con)
                throws IOException, InterruptedException
        {
            int sum = 0;
            for (IntWritable value : values)
            {
                sum += value.get();
            }
            con.write(word, new IntWritable(sum));
        }
    }
}
To move the input file into Hadoop (HDFS), open the terminal and enter the following
command:
[training@localhost ~]$ hadoop fs -put wordcountFile wordCountFile
Result: Thus the MapReduce program to calculate the frequency of a given word in a given file was completed successfully.