SYNOPSIS

On

SENTIMENT ANALYSIS ON TWITTER DATA USING HADOOP

Submitted in Partial Fulfilment of the Requirements for the Award of the Degree of

BACHELOR OF TECHNOLOGY
(CSE)

Submitted by

Anupam Singh Rawat (160970101012)
Shubham (160970101046)
Aditya Nautiyal (160970101004)
Anjali Saklani (160970101009)

Under the Guidance of

Mr. Azmat Siddiqui

THDC INSTITUTE OF HYDROPOWER ENGINEERING & TECHNOLOGY
TEHRI, UTTARAKHAND, INDIA
(Uttarakhand Technical University, Dehradun)

1. Introduction
Twitter is a widely used platform on which people post comments and express their
views and opinions. Sentiment analysis refers to the use of natural language
processing, text analysis, and computational linguistics to identify and extract
subjective information from source material. The number of tweets posted grows
every year, and processing such a huge volume of data is hard. To analyse this big
data we use Hadoop, a scalable open-source framework that lets us operate on
distributed data efficiently. Hadoop includes a programming model called MapReduce,
together with an associated implementation, for processing and generating big data
sets with a parallel, distributed algorithm on a cluster. In this project, we
collect people's opinions about a well-known person; the views they express let us
classify the comments as positive, negative, or neutral.
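
To make the MapReduce model concrete, the sketch below shows a minimal Hadoop job
that tallies tweets by sentiment label. It is an illustration, not this project's
exact code: it assumes each input line has already been tagged by an upstream
classification step in the form "<label><TAB><tweet text>", and the class names and
paths are hypothetical. The mapper emits (label, 1) pairs and the reducer sums them
per label.

    // Minimal sketch: count tweets per sentiment label with MapReduce.
    // Assumed input layout: "<label>\t<tweet text>" per line, where the
    // label is "positive", "negative", or "neutral".
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SentimentCount {

        public static class SentimentMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text label = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Split off the pre-computed label; emit (label, 1).
                String[] parts = value.toString().split("\t", 2);
                if (parts.length == 2) {
                    label.set(parts[0]);
                    context.write(label, ONE);
                }
            }
        }

        public static class SentimentReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values,
                                  Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();   // add up the 1s for this label
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "sentiment count");
            job.setJarByClass(SentimentCount.class);
            job.setMapperClass(SentimentMapper.class);
            job.setCombinerClass(SentimentReducer.class);  // sums are associative
            job.setReducerClass(SentimentReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Packaged into a jar, it would run with something like:
hadoop jar sentiment.jar SentimentCount /user/flume/tagged /user/flume/counts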

2. Related work

Data analysis is closely tied to databases, so the first task was to learn MySQL
and the various database queries. The next step was to learn Hadoop itself,
followed by the various tools in its ecosystem, such as Flume (for data
extraction), Hive (for database queries), Pig (for analysis), and Sqoop (for
transferring data).

3. Methodology/Proposed Methodology

Real-time Twitter data is extracted using Apache Flume
(https://flume.apache.org/). The extracted data is in JSON (JavaScript Object
Notation) format, in which data is represented as key/value pairs separated by
commas.
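
As an illustration of this extraction step (not the project's exact
configuration), a Flume agent for the pipeline could be defined along the
following lines. The source class shown is the widely used JSON-emitting
TwitterSource from Cloudera's flume-sources package, which is an assumption
(Apache Flume also ships its own org.apache.flume.source.twitter.TwitterSource);
the OAuth placeholders must be replaced with real Twitter API credentials, and
the HDFS path and keyword list are hypothetical.

    # Sketch of a Flume agent: Twitter source -> memory channel -> HDFS sink
    TwitterAgent.sources  = Twitter
    TwitterAgent.channels = MemChannel
    TwitterAgent.sinks    = HDFS

    TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
    TwitterAgent.sources.Twitter.consumerKey       = <consumer-key>
    TwitterAgent.sources.Twitter.consumerSecret    = <consumer-secret>
    TwitterAgent.sources.Twitter.accessToken       = <access-token>
    TwitterAgent.sources.Twitter.accessTokenSecret = <access-token-secret>
    TwitterAgent.sources.Twitter.keywords          = <comma-separated keywords>
    TwitterAgent.sources.Twitter.channels          = MemChannel

    TwitterAgent.channels.MemChannel.type     = memory
    TwitterAgent.channels.MemChannel.capacity = 10000

    TwitterAgent.sinks.HDFS.type          = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path     = hdfs://localhost:9000/user/flume/tweets/
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.channel       = MemChannel

Each record landed in HDFS is one tweet object; trimmed to a few representative
fields, it looks roughly like:

    {"created_at": "Mon May 06 20:01:29 +0000 2019",
     "id": 1125486797094478341,
     "text": "Great speech today, really inspiring!",
     "user": {"screen_name": "someuser", "followers_count": 120}}
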
The data stored in HDFS is then analysed using the data-access components of the
Hadoop ecosystem, Apache Pig (https://pig.apache.org/) and Apache Hive
(https://hive.apache.org/).
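
For the analysis step, a minimal Hive sketch is shown below. The SerDe, HDFS
path, column subset, and dictionary table are assumptions, not the project's
actual schema: it exposes the landed tweets as an external table, then scores
each tweet against a hypothetical word list dict(word STRING, polarity INT)
holding rows such as ('good', 1) and ('bad', -1).

    -- Hypothetical Hive table over the raw tweets landed by Flume.
    -- hive-hcatalog-core.jar must be on the classpath for this SerDe;
    -- real tweet JSON has many more fields, which the SerDe ignores.
    CREATE EXTERNAL TABLE tweets (
      id         BIGINT,
      created_at STRING,
      text       STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    LOCATION '/user/flume/tweets/';

    -- Toy word-list scoring: sum the polarity of each word in a tweet,
    -- then bucket the total into positive / negative / neutral.
    SELECT w.id,
           CASE WHEN SUM(COALESCE(d.polarity, 0)) > 0 THEN 'positive'
                WHEN SUM(COALESCE(d.polarity, 0)) < 0 THEN 'negative'
                ELSE 'neutral'
           END AS sentiment
    FROM (
      SELECT t.id, word
      FROM tweets t
      LATERAL VIEW explode(split(lower(t.text), ' ')) tokens AS word
    ) w
    LEFT JOIN dict d ON d.word = w.word
    GROUP BY w.id;

An equivalent analysis can be expressed in Pig Latin over the same files; Hive is
shown here because the counting reduces naturally to a grouped SQL query.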

4. Plan of Work

Week     Description
Week 1   Introduction to Big Data and Hadoop
Week 2   MapReduce and introduction to HDFS
Week 3   Flume and HBase
Week 4   Hive and Pig
Week 5   Sqoop and introduction to NoSQL
Week 6   Kafka

5. References

1. https://hadoop.apache.org/docs/stable/
2. https://pig.apache.org/
3. https://flume.apache.org/
4. https://hive.apache.org/
