
BIG DATA ANALYSIS

LAB FILE

Experiment-1

Objective- Design a Word Count application using the MapReduce programming model.

Theory- In the MapReduce word count example, we find the frequency of each word in a text file. The
role of the Mapper is to emit each word as a key paired with the value 1, and the role of the Reducer is to
aggregate the values of each common key into a total count. So, everything is represented in the form of
key-value pairs; for example, for the input line "big data big" the Mapper emits (big,1), (data,1), (big,1)
and the Reducer outputs (big,2) and (data,1).

Procedure-
1. Copy a local text file into HDFS.

2. Run the Word Count program from the MapReduce examples.

3. Run word count on the given text file and specify an output directory for the results.

4. Once the job completes, list the files in HDFS to locate the output directory.

5. Open the output directory to reveal a _SUCCESS marker and the output file.

6. Copy that output file to the local filesystem and give it a new name.

7. Use more to view the results.
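
A minimal sketch of the terminal commands for steps 1 to 7 is given below; the file names input.txt, wcout and wcresult.txt, the user directory, and the examples jar path are assumptions that will differ between installations.

    # step 1: copy the local file into HDFS
    hadoop fs -put input.txt /user/cloudera/input.txt
    # steps 2-3: run the prebuilt word count job, writing to the wcout directory in HDFS
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount /user/cloudera/input.txt /user/cloudera/wcout
    # steps 4-5: list the output directory; it contains _SUCCESS and a part file with the counts
    hadoop fs -ls /user/cloudera/wcout
    # step 6: copy the part file to the local filesystem under a new name
    hadoop fs -get /user/cloudera/wcout/part-r-00000 wcresult.txt
    # step 7: view the word counts
    more wcresult.txt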

Result-
Conclusion- The word count application was successfully run via MapReduce and the output is
available.

Experiment-2
Objective- Create a Hive table and perform alterations on it.

Theory- Apache Hive is a distributed, fault-tolerant data warehouse system that enables
analytics at a massive scale. Hive allows users to read, write, and manage petabytes of data using
SQL. Hive is built on top of Apache Hadoop, which is an open-source framework used to
efficiently store and process large datasets.

Procedure-

1. Type hive in the terminal to initialize the Hive shell.

2. Create a database, view the databases, and set the new one as the current database.

3. Create a separate CSV file in Documents to use as the data source.

4. Open another terminal and open the CSV file there to view its contents.

5. Use pwd to get its path and gedit to edit its contents.

6. Now, in the Hive shell, create the employee table and type out its schema.

7. Use the path of the CSV file from the other terminal and load it into the table.

8. SQL-style operations can now be performed on the table.
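
A minimal sketch of the Hive shell statements for these steps; the database name empdb, the employee schema, and the CSV path are assumptions chosen for illustration.

    CREATE DATABASE empdb;
    SHOW DATABASES;
    USE empdb;
    -- schema assumed to match the columns of the sample CSV file
    CREATE TABLE employee (id INT, name STRING, dept STRING, salary FLOAT)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE;
    -- path taken from pwd in the other terminal
    LOAD DATA LOCAL INPATH '/home/cloudera/Documents/employee.csv' INTO TABLE employee;
    SELECT * FROM employee;
    -- one example alteration, matching the objective
    ALTER TABLE employee ADD COLUMNS (city STRING);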

Results-

Operations on the created table in Hive.

Conclusion- Table successfully created in Hive, data was loaded into it, and queries were performed.

Experiment-3
Objective- Joining two datasets with a common column using Hive.

Procedure-

1. Create 2 CSV files.

2. Open an existing database.

3. Create the 2 table schemas and load data into them from the CSV files.

4. Join the two tables on the common column.

5. Create one table as an external table, so that dropping the table does not delete the underlying data files.
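
A sketch of these steps in HiveQL, assuming two illustrative tables emp and dept that share a deptid column; the names, schemas, and paths are assumptions, not the original files.

    USE empdb;
    CREATE TABLE dept (deptid INT, deptname STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
    -- external table: dropping it removes only the metadata, the data files stay in HDFS
    CREATE EXTERNAL TABLE emp (empid INT, empname STRING, deptid INT)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION '/user/cloudera/empdata';
    LOAD DATA LOCAL INPATH '/home/cloudera/Documents/dept.csv' INTO TABLE dept;
    LOAD DATA LOCAL INPATH '/home/cloudera/Documents/emp.csv' INTO TABLE emp;
    -- join the two tables on the common deptid column
    SELECT e.empid, e.empname, d.deptname
      FROM emp e JOIN dept d ON (e.deptid = d.deptid);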

Results-
The two tables-

The resultant joined table-

Experiment-4
Objective- Performing Partitioning via Hive.
Theory-
Hive partitioning is of two types – 1. Static or manual partitioning
2. Dynamic partitioning.
Static Partitioning – When you know which partition each data file belongs to, you arrange the files
into that partition of the table yourself; this partitioning has to be done manually.
Dynamic Partitioning – Hive performs the partitioning for you, based on one or more columns of your data.
Procedure and Result-
STATIC PARTITIONING-
1. Create two CSV files, one for students studying Python and the other for students studying Hadoop.

2. Create a new database and use it.

3. Create a new table using the PARTITIONED BY clause.

4. Load the two CSV files into the table, specifying the year and course partition values for each.

5. Check the Hue browser for the static partitioning result.
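
A sketch of the static-partitioning statements, where the student schema, the course and year values, and the CSV paths are assumptions used only for illustration.

    CREATE DATABASE studentdb;
    USE studentdb;
    -- partition columns are declared in PARTITIONED BY, not in the main column list
    CREATE TABLE student (id INT, name STRING)
      PARTITIONED BY (course STRING, year INT)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
    -- each LOAD names its partition explicitly, which is what makes this static
    LOAD DATA LOCAL INPATH '/home/cloudera/Documents/python.csv'
      INTO TABLE student PARTITION (course='python', year=2023);
    LOAD DATA LOCAL INPATH '/home/cloudera/Documents/hadoop.csv'
      INTO TABLE student PARTITION (course='hadoop', year=2023);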

DYNAMIC PARTITIONING-
1. Create a new CSV file; Hive will handle the partitioning itself in dynamic mode.

2. Create a new database.

3. Set the Hive properties required for dynamic partitioning.

4. Create a temporary table to store the initial data.

5. Load the local CSV file into the temporary table.

6. Create a table for partitioning.

7. Transfer the data from the temporary table to the partitioned table.

8. Check the partitioned files based on course and year in the Hue browser.
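
A sketch of the dynamic-partitioning statements for these steps; the database, the staging and partitioned table names, and the CSV path are assumptions.

    CREATE DATABASE studentdyn;
    USE studentdyn;
    -- enable dynamic partitioning for this session
    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;
    -- temporary (staging) table holding the raw CSV data
    CREATE TABLE student_stage (id INT, name STRING, course STRING, year INT)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
    LOAD DATA LOCAL INPATH '/home/cloudera/Documents/students.csv' INTO TABLE student_stage;
    -- partitioned target table
    CREATE TABLE student_part (id INT, name STRING)
      PARTITIONED BY (course STRING, year INT);
    -- Hive derives the partition values from the last columns of the SELECT
    INSERT INTO TABLE student_part PARTITION (course, year)
      SELECT id, name, course, year FROM student_stage;
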
Experiment-5

Objective- Running Pig and executing queries.

Procedure and Results-

1.
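
A minimal illustrative sketch of a Pig Grunt-shell session of the kind this objective describes; the student.csv file, its schema, and the query are assumptions, not the commands actually run in the lab.

    # copy the sample data into HDFS and start the Grunt shell
    hdfs dfs -put student.csv /user/cloudera/student.csv
    pig
    grunt> students = LOAD '/user/cloudera/student.csv' USING PigStorage(',') AS (id:int, name:chararray, marks:int);
    grunt> passed = FILTER students BY marks >= 40;
    grunt> DUMP passed;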
