By
S.AISHWARYA
(Register No. 921317621334)
BONAFIDE CERTIFICATE
Certified that this project report titled A STRUCTURE FOR EMOTION EXPLORATION is the bonafide work of Ms. S. AISHWARYA (Reg. No. 921317621334), who carried out the work under my supervision. Certified further that, to the best of my knowledge and belief, the work reported herein does not form part of any other project report or dissertation on the basis of which a degree or an award was conferred on an earlier occasion.
Date:18.09.2020 Supervisor,
Assistant Professor
Department of MCA
COUNTERSIGNED
Dr.P.JAGANATHAN, MCA.,Ph.D.
Professor & Head
ANNA UNIVERSITY - CHENNAI 600 0
Department of MCA
ABSTRACT
This Project “A STRUCTURE FOR EMOTION EXPLORATION WITH OPINION
concern. Social networks enable the affected population to share their experiences. Social media provides unlimited opportunities to share experiences and to obtain the best recommendations. In current scenarios, and with the new technologies available, Twitter can be used for this purpose. Twitter is the most popular online social networking service that enables users to share and gain knowledge. The extracted tweets are stored in a database, and each tweet is identified and classified as to whether it is a user-keyword-related post, using Support Vector Machine classification. The user keywords can be reviews/tweets of the general public posted on social media. This system deals with the challenges that appear in the process of sentiment analysis; real-time tweets are considered, as they are rich sources of information for opinion mining, and sentiment analysis is performed on the tweets extracted from Twitter to give the user a time-based analysis.
ACKNOWLEDGEMENT
I thank our Chairperson, Rtn. MPHF. R.S.K. RAGURAAM, D.A.E., M.Com., Pro-Chairman, for permitting me to carry out this project work.
I express my gratitude to our Principal Dr. D. VASUDEVAN, M.E., Ph.D., for his support, and to the Head, Department of Computer Applications, PSNA College of Engineering and Technology, for his invaluable guidance, support and suggestions throughout the course of this project work.
Technology for her suggestions and constructive criticisms during the course of my project.
I am thankful to all the teaching and non-teaching staff of this Department who helped me in successfully completing my project work. Last but not least, I am grateful to my parents and
TABLE OF CONTENTS
Chapter No.    Title    Page No.
ABSTRACT iv
ACKNOWLEDGEMENT v
LIST OF TABLES ix
LIST OF FIGURES x
LIST OF ABBREVIATIONS xi
1 INTRODUCTION 1
1.1 Problem Statement 3
1.2 Challenges 3
1.3 Project Aim 4
1.4 Chapter Plan 4
2 SYSTEM ANALYSIS 5
2.1 Existing System 5
2.1.1 Disadvantage of Existing System 5
2.2 Proposed System 6
2.2.1 Advantage of Proposed System 6
2.3 Requirement Gathering 7
2.4 Company Profile 8
2.5 Requirement Specification 8
3 SYSTEM DESIGN 9
3.1 Data Flow Diagram 10
3.1.1 User Login 10
3.1.2 Registration process 10
3.1.3 Pre-Processing 10
3.2 Module Description 15
3.3 Database Design 17
4 IMPLEMENTATION 16
4.1 Background Study 18
4.2 Implementation 22
5 SOFTWARE TESTING 32
5.1 Testing 32
5.1.1 White Box Testing 32
5.1.2 Black Box Testing 32
5.2 Unit Testing 33
5.3 Integration Testing 33
5.4 Validation Testing 33
5.5 Test Cases 34
6 CONCLUSION 36
6.1 Conclusion 36
6.2 Future Enhancements 36
LIST OF TABLES
3.3 Company Master 18
3.4 Social Problems Master 19
3.5 Add New User Master 19
5.5 Test Cases 35
LIST OF FIGURES
5.1 Unit testing 31
5.2 Integration testing 32
LIST OF ABBREVIATIONS
Acronym    Expansion
OOPS    Object-Oriented Programming System
HTML    Hypertext Markup Language
CSS    Cascading Style Sheets
Ajax    Asynchronous JavaScript and XML
OS Operating System
CHAPTER 1
INTRODUCTION
Nowadays, social network platforms are becoming popular; millions of users can give their views about any product. Sentiment analysis provides an effective and efficient means to reveal public opinion in a timely manner, which gives vital information for decision making in various domains.
The main application of sentiment analysis is to classify a given text into one or more pre-defined sentiment categories, which can then support decision making in various domains. It is generally difficult to find the exact causes of sentiment variations, since they may involve complicated internal and external factors.
It is very difficult to find the exact reasons behind sentiment variations, as the number of blogs for the target event can run into the thousands. Latent Dirichlet Allocation (LDA) based models are used to analyze blogs in significant variation periods and to infer possible reasons for the variations.
The first LDA-based model, called Foreground and Background LDA (FB-LDA), can filter out background topics and extract foreground topics from blogs in the variation period, with the help of a supplementary set of background blogs generated just before the variation. By removing the interference of longstanding background topics, FB-LDA can address the first aforementioned challenge.
To handle the last two challenges, we propose another generative model called Reason Candidate and Background LDA (RCB-LDA). RCB-LDA first extracts representative tweets for the foreground topics from FB-LDA as reason candidates.
1.1 PROBLEM STATEMENT
The existing system uses the two state-of-the-art sentiment analysis tools for assigning
the sentiment label where the SentiStrength tool is based on the LIWC sentiment lexicon.
According to the sentiment lexicon a sentiment score is assign to each word in the text and then
select the highest positive score and the highest negative score among those of all individual
words in the text. Then calculate the sum of the highest positive score and the highest negative
score and denoted as Final Score. Lastly, a Final Score is used to indicate whether a tweet is
positive, neutral or negative.
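As an illustration, the Final Score computation described above can be sketched in Java. The toy lexicon and its scores below are assumptions for demonstration, not the actual LIWC lexicon.

```java
import java.util.Map;

public class LexiconScorer {
    // Toy sentiment lexicon (illustrative values, not the real LIWC lexicon):
    // positive words score +1..+5, negative words -1..-5.
    static final Map<String, Integer> LEXICON = Map.of(
            "good", 3, "great", 4, "bad", -3, "terrible", -4);

    // Final Score = highest positive score + highest negative score
    // among all individual words in the text.
    static int finalScore(String text) {
        int maxPos = 0, maxNeg = 0;
        for (String w : text.toLowerCase().split("\\W+")) {
            int s = LEXICON.getOrDefault(w, 0);
            if (s > maxPos) maxPos = s;
            if (s < maxNeg) maxNeg = s;
        }
        return maxPos + maxNeg;
    }

    public static void main(String[] args) {
        // "great" (+4) and "terrible" (-4) cancel out: Final Score 0, i.e. neutral.
        System.out.println(finalScore("great phone but terrible battery"));
    }
}
```

A positive Final Score marks the tweet positive, a negative one negative, and a score of zero neutral.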
Another tool used for this purpose is based on a Maximum Entropy classifier. It automatically collects blogs with emoticons as noisy labels to train the classifier, and a sentiment label is assigned based on the classifier. These two tools are popular, but a large proportion of blogs still contain noise after pre-processing; therefore, their performance on real datasets is unsatisfactory.
To overcome the limitations of the existing system with respect to efficiency, the proposed system uses the SVM (Support Vector Machine) algorithm in place of the sentiment analysis tools. SVM is widely used for text categorization and can achieve good performance in high-dimensional feature spaces. An SVM represents the examples as points in space, mapped so that the examples of the different categories are separated by a margin as wide as possible.
The basic idea is to find the hyperplane, represented as a vector, which separates the document vectors in one class from the vectors in the other class. In text classification we have to deal with many features (possibly more than 1000). Since SVM uses overfitting protection that does not depend on the number of features, it can handle a large number of features.
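At prediction time, a trained linear SVM reduces to a weight vector w and a bias b, and classification is just the sign of the decision function w·x + b. The sketch below uses made-up weights, not a trained model.

```java
public class LinearSvm {
    // Decision function of a linear SVM: w·x + b.
    static double decision(double[] w, double b, double[] x) {
        double s = b;
        for (int i = 0; i < w.length; i++) s += w[i] * x[i];
        return s;
    }

    // The predicted class is the side of the hyperplane the point falls on.
    static int classify(double[] w, double b, double[] x) {
        return decision(w, b, x) >= 0 ? 1 : -1;
    }

    public static void main(String[] args) {
        double[] w = {0.8, -0.5};  // illustrative weights for two features
        double[] pos = {2, 0};     // e.g. term counts in one document
        double[] neg = {0, 3};
        System.out.println(classify(w, 0.0, pos)); // falls on the +1 side
        System.out.println(classify(w, 0.0, neg)); // falls on the -1 side
    }
}
```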
1.2 CHALLENGES
Social media scraping and analytics provides a rich source of academic research
challenges for social scientists, computer scientists and funding bodies. Challenges include:
Scraping—although social media data is accessible through APIs, due to the commercial
value of the data, most of the major sources such as Facebook and Google are making it
increasingly difficult for academics to obtain comprehensive access to their ‘raw’ data; very few
social data sources provide affordable data offerings to academia and researchers.
News services such as Thomson Reuters and Bloomberg typically charge a premium for
access to their data. In contrast, Twitter has recently announced the Twitter Data Grants
program, where researchers can apply to get access to Twitter’s public tweets and historical data
in order to get insights from its massive set of data (Twitter has more than 500 million tweets a day).
Social media data is clearly the largest, richest and most dynamic evidence base of
human behavior, bringing new opportunities to understand individuals, groups and society.
Innovative scientists and industry professionals are increasingly finding novel ways of
automatically collecting, combining and analyzing this wealth of data.
Naturally, doing justice to these pioneering social media applications in a few paragraphs
is challenging. Three illustrative areas are: business, bioscience and social science.
The early business adopters of social media analysis were typically companies in retail
and finance. Retail companies use social media to harness their brand awareness,
product/customer service improvement, advertising/marketing strategies, network structure
analysis, news propagation and even fraud detection.
1.3 PROJECT AIM
It has been reported that events in real life indeed have a significant and immediate effect on public sentiment online. However, none of these studies performed further analysis to mine useful insights behind significant sentiment variations, called public sentiment variations.
1.4 CHAPTER PLAN
Chapter 1, "Introduction", gives an overview of the project, comprising the problem statement, the challenges faced in the project and the aim of the project.
Chapter 2, "System Analysis", follows chapter 1 and deals with requirement gathering, the existing system and the proposed system of our project.
Chapter 3, "System Design", follows chapter 2 and depicts the module description, database design and UML diagrams.
Chapter 4, "Implementation", describes the front-end and back-end software and the steps to install the proposed software.
Chapter 5, "Testing", deals with testing and the outcomes of the test cases; the report finally concludes with the scope of the project and future proposals.
CHAPTER 2
SYSTEM ANALYSIS
2.1 EXISTING SYSTEM
In the existing system, relationships between words were not considered at all for sentiment analysis; a sentence was treated simply as a collection of words.
It required users' relationship information and the past messages the users posted, which are not always accessible in social networks because of privacy laws.
This makes negative relation prediction more difficult than positive relation prediction.
It is related to finding sentiment polarity in short sentences; sentence-level analysis is close to subjectivity classification.
The existing system uses the Twitter API to collect data from Twitter. Tweets that contain opinions were filtered out. It also eliminated unwanted features by using Mutual Information. Finally, the approach of predicting tweets as positive or negative improved its accuracy through this strategy.
DISADVANTAGES
2.2 PROPOSED SYSTEM
The proposed system uses Twitter to obtain the data and process it. The data from Twitter is extracted using crawling and the Twitter API.
The Twitter API crawls the tweets from Twitter using twitter4j.
Twitter4j extracts the tweets and displays them to the user in table format.
These extracted tweets are then preprocessed by replacing short-form words with their full forms.
It also removes the stop words from the extracted tweets, which are then stored in the database.
The pre-processed tweets are further classified using SVM classification based on their category.
In this system, user-keyword-related tweets are classified.
Polarity detection is done using keywords such as good, bad, and so on.
Based on the number of positive tweets and the number of negative tweets, the system analyses the best recommendation.
This system is very helpful for users to gain knowledge about the best recommendation.
ADVANTAGES
2.3 REQUIREMENT GATHERING
FUNCTIONAL REQUIREMENTS
Functional requirements describe what the software system should do. They are mainly statements of the services the system should provide, how the system reacts to inputs, and how the system should behave in particular situations.
There are six groups of functionalities of social network which are discussed to
facilitate the modularization and integration of different social network applications.
The six basic functionalities consist of identity management, expert finding, context
awareness, contact management, network awareness, and exchange.
2.4 COMPANY PROFILE
• At uniq, it means achieving real business results that allow you to transform, and not just maintain, your operations.
• Our IT services, industry solutions, and outsourcing bring you a level of certainty that no other competitor can match.
• You'll experience on-time completion of requirements, not only within budget but also with high quality. High efficiency and responsiveness to business on our part give you the unique ability to shift investment to strategic initiatives.
• We are leaders in software technology, with the most comprehensive range of software solutions for industries and schools.
HARDWARE REQUIREMENTS
SOFTWARE REQUIREMENTS
Language :Java
Web server :Apache Tomcat 8.0
CHAPTER 3
SYSTEM DESIGN
[DFD symbol legend: Process, External Entity, Data Flow, Data Store]
DATA FLOW DIAGRAM:
Level 2: Preprocessing Interface
[Level 2 data flow: User and Server, with Registration, Login, Preprocessing, SVM Classification and Polarity Detection steps]
CLASS DIAGRAM
The class diagram shows how the different entities (people, things, and data) relate to
each other; in other words, it shows the static structures of the system. A class diagram can be
used to display logical classes. Class diagrams can also be used to show implementation classes,
which are the things that programmers typically deal with. A class is depicted on the class
diagram as a rectangle with three horizontal sections, as shown in above figure. The upper
section shows the class's name; the middle section contains the class's attributes; and the lower
section contains the class's operations (or "methods"). The diagram has five main classes which
give the attributes and operations used in each class.
[Class diagram (labels only): Registration (+mob no, +register), Login (+enter), Web Crawler, SVM Classification (+drugs, +disease, +classify()), Polarity Detection (+detect())]
SEQUENCE DIAGRAM
Fig 3.3 Overall Sequence Diagram for Different Process
Fig 3.3 describes how the client sends the message to the server for pre-processing the tweets.
COLLABORATION DIAGRAM
Fig 3.4 Overall Collaboration diagram for UML Cases
SYSTEM ARCHITECTURE
[System architecture: Twitter -> Twitter4j API -> extracted tweets stored in database -> sentiment analysis using SVM with trained data sets -> polarity of user comments]
Fig 3.5 Overall System Architecture Diagram for User and Database
Fig 3.5 describes how the tweets and data are processed through the server and database.
3.2 MODULE DESCRIPTION
The user must first create an account through registration; the account keeps usage secure and safe for our users. The user creates a new username, password and e-mail id and clicks Register, and the account is created successfully. People who do not have an account must complete this registration. A login is then provided: people who already have an account can simply log in and easily enter the Twitter application through that login.
TWITTER EXTRACTION
The client interface acts as the bridge between the user and the system. A new user has to create an account by giving a username and password; a registered user can directly log in and enter the system's Twitter search space. In the search space the user gives the input and gets the tweets from Twitter. To extract the tweets, a connection must first be established with a Twitter account using the Twitter API library twitter4j.
Then a Twitter developer application is created on the Twitter developer site. From the created application we get the consumer key, consumer secret, access token and access token secret. Using these keys and tokens, the connection to Twitter is configured and established. The API provides many parameters for extracting and reading tweets from the TwitterFactory using a search query, and the query results are maintained in a QueryResult. Using the getTweets method we can obtain the tweets, from which we can extract the tweet text and username.
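The connection and search steps above can be sketched with the twitter4j library. The four credential strings are placeholders obtained from the Twitter developer application; this fragment needs the twitter4j jar on the classpath, network access and valid credentials to actually run.

```java
import twitter4j.Query;
import twitter4j.QueryResult;
import twitter4j.Status;
import twitter4j.Twitter;
import twitter4j.TwitterFactory;
import twitter4j.conf.ConfigurationBuilder;

public class TweetExtractor {
    public static void main(String[] args) throws Exception {
        // Keys and tokens from the Twitter developer application (placeholders).
        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.setOAuthConsumerKey("YOUR-CONSUMER-KEY")
          .setOAuthConsumerSecret("YOUR-CONSUMER-SECRET")
          .setOAuthAccessToken("YOUR-ACCESS-TOKEN")
          .setOAuthAccessTokenSecret("YOUR-TOKEN-SECRET");

        Twitter twitter = new TwitterFactory(cb.build()).getInstance();

        // Search for the user's keyword and read the query results.
        Query query = new Query("some keyword");
        QueryResult result = twitter.search(query);
        for (Status status : result.getTweets()) {
            System.out.println(status.getUser().getScreenName()
                    + ": " + status.getText());
        }
    }
}
```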
PREPROCESSING
The extracted tweets are then preprocessed by removing stop words and replacing short forms and emoticons. All meaningless words in the tweets, such as stop words, are removed. Every short form is replaced with the full word so that it is understandable to all users. Emoticons are also known as smileys, and there are various sorts of smileys. Each smiley carries an emotional meaning that users employ to communicate more easily, but not every user will know what it means. Therefore, each emoticon is replaced with its particular meaning.
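The three preprocessing steps (emoticon replacement, short-form expansion, stop-word removal) can be sketched as below; the word lists are tiny illustrative samples, not the full tables a real system would load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TweetPreprocessor {
    // Tiny illustrative tables; a real system would load much fuller lists.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "is", "to");
    static final Map<String, String> SHORT_FORMS =
            Map.of("u", "you", "r", "are", "gr8", "great");
    static final Map<String, String> EMOTICONS =
            Map.of(":)", "happy", ":(", "sad");

    static String clean(String tweet) {
        List<String> out = new ArrayList<>();
        for (String tok : tweet.split("\\s+")) {
            String t = EMOTICONS.getOrDefault(tok, tok);            // replace smileys
            t = SHORT_FORMS.getOrDefault(t.toLowerCase(), t);       // expand short forms
            if (!STOP_WORDS.contains(t.toLowerCase())) out.add(t);  // drop stop words
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(clean("u r gr8 :)")); // "you are great happy"
    }
}
```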
CLASSIFICATION
Support vector machines are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. Support Vector Machines are based on the concept of decision planes that define decision boundaries. A decision plane is one that separates between sets of objects having different class memberships.
A schematic case: drugs and diseases. After preprocessing, the tweets are classified into keyword-related tweets. Words are recognized on the basis of the keywords in order to classify the tweets. This lexicon analysis strategy is used to find the preferred class among the large number of tweets.
POLARITY PREDICTION
The classified tweets are analyzed based on the polarity of words like good, bad, not, un-, and so on. Based on the polarity, the number of positive tweets and negative tweets is identified. We use the Naive Bayes classifier in the classification procedure for finding the polarity of the tweets and comments: positive, negative, mixed or neutral.
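A minimal keyword-count illustration of the polarity step is sketched below (the project itself uses a trained classifier; the positive and negative word lists here are illustrative assumptions):

```java
import java.util.Set;

public class PolarityDetector {
    // Illustrative polarity keyword lists.
    static final Set<String> POSITIVE = Set.of("good", "great", "happy");
    static final Set<String> NEGATIVE = Set.of("bad", "sad", "worst");

    // Counts polarity keywords and labels the text positive, negative,
    // mixed or neutral, mirroring the four labels described above.
    static String polarity(String text) {
        int pos = 0, neg = 0;
        for (String w : text.toLowerCase().split("\\W+")) {
            if (POSITIVE.contains(w)) pos++;
            if (NEGATIVE.contains(w)) neg++;
        }
        if (pos > 0 && neg > 0) return "mixed";
        if (pos > neg) return "positive";
        if (neg > pos) return "negative";
        return "neutral";
    }

    public static void main(String[] args) {
        System.out.println(polarity("great camera, bad battery")); // mixed
    }
}
```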
3.3 DATABASE DESIGN
Table 3.1 Database Design: stores the database field names, data types, and constraints for the particular data.
Table 3.2 Database Design: stores the field names, data types, and constraints for the username and password.
CHAPTER 4
IMPLEMENTATION
Implementation is the stage of the project when the theoretical design is turned into a
working system. Thus it can be considered to be the most critical stage in achieving a successful
new system and in giving the user confidence that the new system will work and be effective.
There are various features and benefits of implementing the Java language in the project.
PLATFORM INDEPENDENT
No language is entirely platform independent, but Java comes closest to this ideal. Programs written on one platform can run on any platform, provided the platform has a JVM.
SIMPLE
Several features make Java a simple language. Programs are easy to write and debug because Java does not use pointers explicitly.
It is much harder to write Java programs that can crash the system, which cannot be said of many other programming languages. Java provides a largely bug-free environment thanks to its strong memory management, with automatic memory allocation and deallocation.
ROBUST
Java has strong memory allocation and an automatic garbage collection mechanism. It provides powerful exception handling and type checking compared with other programming languages. The compiler checks the program for errors and the interpreter checks for run-time errors, keeping the system safe from crashes. All of these features make Java robust.
PORTABLE
The write-once-run-anywhere property makes Java portable, provided that the system has a JVM. Java also has standard data sizes irrespective of the operating system or processor. These features make Java a portable language.
DYNAMIC
While executing a Java program, the user can fetch the required files dynamically from a local drive or from a computer thousands of miles away, just by connecting to the Internet.
SECURE
Java does not use memory pointers explicitly. All Java programs run inside an area known as the sandbox. The security manager determines the accessibility options of a class, such as reading and writing a file on the local disk. Java uses public-key encryption so that Java applications can be transmitted over the Internet in secure, encrypted form. The bytecode verifier checks the classes after loading.
PERFORMANCE
Java uses native code and lightweight processes called threads. Initially, interpretation of bytecode made performance slow, but advanced versions of the JVM use adaptive and just-in-time compilation techniques that improve performance.
4.3.1 BENEFITS
The computer world currently has many platforms. This has its pros and cons. On the one
hand it gives more choices to people; on the other hand it becomes more and more difficult to
produce software that runs on all platforms. With its Java Virtual Machine and API, the Java
Platform provides an ideal solution to this. The Java Platform is designed for running highly
interactive, dynamic, and secure applets and applications on networked computer systems.
Being interactive, dynamic and architecture-neutral, the Java Platform has benefits not
only for the developer and support personnel, but also for the end user.
For the end users, the platform provides live, interactive content on the World Wide Web,
with just-in-time software access. Applications are readily available on all operating systems at
once. Users do not have to choose operating systems based on the applications; they can run the applications on their favorite machines.
Developers can develop applications on one platform and deliver to that same platform, the Java Platform, which is available on a wide variety of OS and hardware platforms. This greatly reduces development cost.
For support personnel, version control and upgrades are much simplified, because Java-enabled applications can be kept in a central repository and served from there for each individual use.
GOALS OF JAVA
The Java Virtual Machine (JVM) provides the hardware platform specification to which all Java technology code is compiled.
Garbage collection: allocated memory that is no longer needed is automatically deallocated.
FEATURES OF OOPS
* Encapsulation
* Data abstraction
* Inheritance
* Polymorphism
* Message passing
ENCAPSULATION
It is the process that binds the code and the data together in a single unit, and is used to hide the complications of the program.
DATA ABSTRACTION
It is defined as the collection of the data and the functions in a single program.
INHERITANCE
It allows the extension and the reuse of the existing code without having to rewrite the
code again. It involves the creation of new classes (derived classes) from the already existing
classes.
POLYMORPHISM
It allows the same name or operator to be used in different ways, depending on the type of data being passed to it.
MESSAGE PASSING
It is the process of invoking a function on an object.
MULTIPLE INHERITANCE
Multiple inheritance is achieved in Java through the concept known as an interface.
GENERAL CONCEPTS OF JAVA
METHOD OVERRIDING
When an object needs to respond to the same method call with different behavior, the process of redefining the method in a subclass is called overriding.
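A minimal sketch of overriding: the subclass responds to the same method call with its own behavior, chosen at run time.

```java
class Animal {
    String sound() { return "..."; }
}

class Dog extends Animal {
    @Override
    String sound() { return "woof"; }  // same method name, different behavior
}

public class OverrideDemo {
    public static void main(String[] args) {
        Animal a = new Dog();           // dynamic dispatch picks Dog.sound()
        System.out.println(a.sound());  // prints "woof"
    }
}
```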
VECTOR
Vectors are similar to arrays, but can hold objects of any type and in any number. The Vector class is contained in the java.util package.
Syntax: Vector a = new Vector(3);
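A short usage sketch: unlike a plain array, a Vector grows past its initial capacity automatically.

```java
import java.util.Vector;

public class VectorDemo {
    static Vector<String> build() {
        Vector<String> v = new Vector<>(3);  // initial capacity 3
        v.add("one");
        v.add("two");
        v.add("three");
        v.add("four");                       // beyond capacity 3: the Vector grows
        return v;
    }

    public static void main(String[] args) {
        System.out.println(build().size()); // prints 4
        System.out.println(build().get(1)); // prints "two"
    }
}
```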
Syntax (implementing an interface):
class ClassName implements InterfaceName
{
    variable declarations;
    method bodies();
}
PACKAGES
By using packages we can reuse classes from other programs. Packages also provide hiding of classes, so that other programs cannot access them. There are two types of packages: Java API packages and user-defined packages.
Threads
The ability to execute several parts of a program at the same time is called multithreaded programming. A single flow of control within a program is known as a thread.
The various states of a thread are as follows:
New born state
Runnable state
Running state
Blocked state
Dead state
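The lifecycle above can be observed through Java's Thread.State enum, whose names differ slightly from the list (NEW, RUNNABLE, TERMINATED, and so on).

```java
public class ThreadDemo {
    // Runs a trivial thread to completion and returns its final state.
    static Thread.State endState() {
        Thread t = new Thread(() -> { /* new-born -> runnable -> running */ });
        t.start();
        try {
            t.join();                 // wait until the thread reaches the dead state
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t.getState();          // TERMINATED corresponds to "dead"
    }

    public static void main(String[] args) {
        Thread t = new Thread(() -> {});
        System.out.println(t.getState());   // NEW ("new born")
        System.out.println(endState());     // TERMINATED ("dead")
    }
}
```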
EXCEPTION HANDLING
Errors are common problems in programming; the two types of errors are compile-time errors and run-time errors. An exception is a run-time error, and it should be caught and handled properly.
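A small sketch of catching a run-time error so the program handles it instead of crashing:

```java
public class ExceptionDemo {
    // Division wrapped in try/catch so the run-time error is handled.
    static String safeDivide(int a, int b) {
        try {
            return String.valueOf(a / b);
        } catch (ArithmeticException e) {   // run-time error caught here
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(safeDivide(10, 2)); // prints 5
        System.out.println(safeDivide(1, 0));  // caught and handled, no crash
    }
}
```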
What's most special about Java in relation to other programming languages is that it lets
you write special programs called applets that can be downloaded from the Internet and played
safely within a web browser. Traditional computer programs have far too much access to your
system to be downloaded and executed willy-nilly. Although you generally trust the maintainers
of various ftp archives and bulletin boards to do basic virus checking and not to post destructive
software, a lot still slips through the cracks. Even more dangerous software would be
promulgated if any web page you visited could run programs on your system. You have no way
of checking these programs for bugs or for out-and-out malicious behavior before downloading
and running them.
Java solves this problem by severely restricting what an applet can do. A Java applet
cannot write to your hard disk without your permission. It cannot write to arbitrary addresses in
memory and thereby introduce a virus into your computer. It should not crash your system.
Much of the syntax of Java is the same as C and C++. One major difference is that Java
does not have pointers. However, the biggest difference is that you must write object oriented
code in Java. Procedural pieces of code can only be embedded in objects. In the following we
assume that the reader has some familiarity with a programming language. In particular, some
familiarity with the syntax of C/C++ is useful.
In Java we distinguish between applications, which are programs that perform the same
functions as those written in other programming languages, and applets, which are programs that
can be embedded in a Web page and accessed over the Internet. Our initial focus will be on
writing applications. When a program is compiled, a byte code is produced that can be read and
executed by any platform that can run Java.
Java technology readily harnesses the power of the network because it is both a
programming language and a selection of specialized platforms. As such, it standardizes the
development and deployment of the kind of secure, portable, reliable, and scalable applications
required by the networked economy. Because the Internet and World Wide Web play a major
role in new business development, consistent and widely supported standards are critical to
growth and success.
ECLIPSE
Eclipse is an integrated development environment (IDE) used in computer programming, and is the most widely used Java IDE. It contains a base workspace and an extensible plug-in system for customizing the environment. Eclipse is written mostly in Java and its primary use is for developing Java applications, but it may also be used to develop applications in other programming languages through the use of plug-ins. Eclipse uses plug-ins to provide all the functionality within and on top of the runtime system. Its runtime system is based on Equinox, an implementation of the OSGi core framework specification.
Advantages of Eclipse
1. Using an IDE will cost you less time and effort.
3. Auto-completion: one of the best features; you don't have to remember everything.
4. Refactoring
BENEFITS OF BIG DATA
Big data is really critical to our lives and is emerging as one of the most important technologies in the modern world. The following are just a few benefits that are well known to all of us:
Using the information kept in the social network like Facebook, the marketing agencies
are learning about the response for their campaigns, promotions, and other advertising
mediums.
Using the information in the social media like preferences and product perception of
their consumers, product companies and retail organizations are planning their
production.
Using the data regarding the previous medical history of patients, hospitals are providing
better and quick service.
To harness the power of big data, you require an infrastructure that can manage and process huge volumes of structured and unstructured data in real time, and can protect data privacy and security.
There are various technologies in the market from different vendors including Amazon,
IBM, Microsoft, etc., to handle big data. While looking into the technologies that handle big
data, we examine the following two classes of technology:
NoSQL Big Data systems are designed to take advantage of new cloud computing
architectures that have emerged over the past decade to allow massive computations to be run
inexpensively and efficiently. This makes operational big data workloads much easier to
manage, cheaper, and faster to implement.
Some NoSQL systems can provide insights into patterns and trends based on real-time data with
minimal coding and without the need for data scientists and additional infrastructure.
The major challenges associated with big data include:
Capturing data
Curation
Storage
Searching
Sharing
Transfer
Analysis
Presentation
To fulfill the above challenges, organizations normally take the help of enterprise servers.
HADOOP
Doug Cutting, Mike Cafarella and their team took the solution provided by Google and started an open-source project called Hadoop in 2005; Doug named it after his son's toy elephant. Now Apache Hadoop is a registered trademark of the Apache Software Foundation.
Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel on different CPU nodes. In short, the Hadoop framework is capable of developing applications that run on clusters of computers and perform complete statistical analysis of huge amounts of data.
ADVANTAGES OF HADOOP
The Hadoop framework allows the user to quickly write and test distributed systems. It is efficient, and it automatically distributes the data and work across the machines, in turn utilizing the underlying parallelism of the CPU cores.
Hadoop does not rely on hardware to provide fault-tolerance and high availability
(FTHA), rather Hadoop library itself has been designed to detect and handle failures at
the application layer.
Servers can be added or removed from the cluster dynamically and Hadoop continues to
operate without interruption.
Another big advantage of Hadoop is that apart from being open source, it is compatible
on all the platforms since it is Java based.
MAP REDUCE
MapReduce is a processing technique and a programming model for distributed computing based on Java. The MapReduce algorithm contains two important tasks, namely Map and Reduce. Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). The reduce task then takes the output from a map as its input and combines those data tuples into a smaller set of tuples. As the sequence of the name MapReduce implies, the reduce task is always performed after the map job.
INPUTS AND OUTPUTS ( JAVA PERSPECTIVE)
The MapReduce framework operates on <key, value> pairs, that is, the framework views
the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the
output of the job, conceivably of different types.
The key and value classes must be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework. Input and output types of a MapReduce job: (Input) <k1, v1> -> map -> <k2, v2> -> reduce -> <k3, v3> (Output).
The term MapReduce actually refers to the following two different tasks that Hadoop
programs perform:
The Map task is the first task; it takes input data and converts it into another set of data,
where individual elements are broken down into tuples (key/value pairs).
The Reduce task takes the output from a map task as input and combines those data tuples
into a smaller set of tuples. The reduce task is always performed after the map task.
Typically both the input and the output are stored in a file system.
The framework takes care of scheduling tasks, monitoring them, and re-executing the
failed tasks. The JobTracker is a single point of failure for the Hadoop MapReduce service,
which means that if the JobTracker goes down, all running jobs are halted.
HDFS
HDFS stands for Hadoop Distributed File System. HDFS is one of the core components
of the Hadoop framework and is responsible for the storage aspect.
NAME NODE
The name node is commodity hardware containing the GNU/Linux operating system and
the name node software. It is software that can be run on commodity hardware. The system
having the name node acts as the master server, and it does the following tasks:
It manages the file system namespace and regulates clients' access to files.
It also executes file system operations such as renaming, closing, and opening files and
directories.
DATA NODE
The data node is the slave/worker node and holds the user data in the form of data blocks.
There can be any number of data nodes in a Hadoop cluster.
The data node is commodity hardware having the GNU/Linux operating system and the
data node software.
For every node (commodity hardware/system) in a cluster, there will be a data node.
These nodes manage the data storage of their system.
Data nodes perform read-write operations on the file systems, as per client request.
They also perform operations such as block creation, deletion, and replication according
to the instructions of the name node.
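As a toy illustration of this storage scheme, the following plain-Java sketch (not the project's code; the block size and replication factor are the usual HDFS defaults) splits a file into fixed-size blocks and assigns each block's replicas to distinct data nodes, the way the name node directs block placement:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of HDFS block storage: a file is split into fixed-size blocks,
// and each block is replicated on `REPLICATION` distinct data nodes.
class HdfsPlacementSketch {

    static final long BLOCK_SIZE = 128L * 1024 * 1024; // default HDFS block size (128 MB)
    static final int REPLICATION = 3;                  // default replication factor

    // Number of blocks needed for a file of the given size (ceiling division).
    static long blockCount(long fileSize) {
        return (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Round-robin assignment of each block's replicas to data nodes;
    // the replicas of one block always land on distinct nodes.
    static List<List<Integer>> placeBlocks(long fileSize, int dataNodes) {
        List<List<Integer>> placement = new ArrayList<>();
        for (long b = 0; b < blockCount(fileSize); b++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < REPLICATION; r++) {
                replicas.add((int) ((b + r) % dataNodes));
            }
            placement.add(replicas);
        }
        return placement;
    }
}
```

For example, a 300 MB file needs three 128 MB blocks, and each block is stored on three different data nodes.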
CHAPTER 5
SOFTWARE TESTING
The software testing process commences once the program is created and the
documentation and related data structures are designed. Software testing is essential for
correcting errors; otherwise the program or the project cannot be said to be complete.
Software testing is a critical element of software quality assurance and represents the
ultimate review of specification, design, and coding. Testing is the process of executing the
program with the intent of finding errors. A good test case design is one that has a high
probability of finding a yet undiscovered error.
In this testing, by knowing the specific functions that the product has been designed to
perform, tests can be conducted to demonstrate that each function is fully operational while
at the same time searching for errors in each function. In unit testing, the user and login
modules can be easily verified this way. The testing phase is the development phase that
validates the good comments in tweets against the polarity analysis. Testing is vital to the
achievement of the system goals.
5.1.2 BLACK BOX TESTING
In this testing, tests are conducted knowing only the functions that the product has been
designed to perform, without regard to its internal operation, to ensure that each function
behaves according to its specification. It fundamentally focuses on the functional
requirements of the software. Here it is used to validate each comment against the
emoticons.
5.1.3 INTEGRATION TESTING
Integration testing is a systematic technique for constructing the program structure while
at the same time conducting tests to uncover errors associated with interfacing; i.e.,
integration testing is the complete testing of the set of modules which make up the product.
Critical modules should be tested as early as possible. The data used in the sentiment
analysis can be easily modified with the required fields, and can be easily integrated with
the positive, negative, and neutral comments in tweets. For example, the data in tweets can
be rectified beforehand by using emoticons, and the related data easily identified.
Table 5.5 Test Cases for overall input and output process
Table 5.5 describes the test cases for user login and registration of the required fields;
emoticons are used for the polarity, from which it can be seen whether a comment is
positive, negative, or neutral.
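The black-box test cases described above can be expressed as simple assertion-style checks against a toy emoticon-based polarity function. The class name and emoticon lists below are illustrative only, not the project's actual test code:

```java
// Toy polarity function for illustrating the black-box test cases:
// a comment is classified purely by the emoticons it contains.
class PolarityCheckSketch {

    static String polarity(String comment) {
        boolean positive = comment.contains(":)") || comment.contains(":-)");
        boolean negative = comment.contains(":(") || comment.contains(":-(");
        if (positive && !negative) return "positive";
        if (negative && !positive) return "negative";
        return "neutral"; // no emoticon, or conflicting emoticons
    }
}
```

Each test case feeds in a comment and checks only the observable output class, never the internal logic, which is exactly the black-box discipline.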
CHAPTER 6
CONCLUSION
6.1 CONCLUSION
In this project, we have proposed a framework for classifying drugs based on polarity
analysis of Twitter data. The tweets are extracted with the Twitter API using twitter4j.
From the Twitter developer application, all the keys and tokens are generated; with this
information we can connect to Twitter through the Twitter API. The extracted tweets are
then preprocessed by removing stop words, short forms, and emojis. The preprocessed
tweets are classified using Naïve Bayes classification, and the polarity of the tweets is
predicted for the final classification. With this framework, the social-network-based
analysis parameters can make the prediction more accurate and the response quicker.
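The preprocessing step mentioned above (removing stop words, short forms, and emojis before classification) can be sketched in plain Java. The word lists below are small illustrative samples, not the project's actual resources:

```java
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Illustrative tweet preprocessing: strip emoticons, expand short forms,
// and drop stop words before the tweet reaches the classifier.
class PreprocessSketch {

    static final Set<String> STOP_WORDS = Set.of("the", "is", "a", "an", "of");
    static final Map<String, String> SHORT_FORMS = Map.of("gr8", "great", "u", "you");
    static final Set<String> EMOTICONS = Set.of(":)", ":(", ":-)", ":-(");

    static String preprocess(String tweet) {
        StringJoiner cleaned = new StringJoiner(" ");
        for (String token : tweet.toLowerCase().split("\\s+")) {
            if (EMOTICONS.contains(token)) continue;        // remove emoticons/emojis
            token = SHORT_FORMS.getOrDefault(token, token); // expand short forms
            if (STOP_WORDS.contains(token)) continue;       // remove stop words
            cleaned.add(token);
        }
        return cleaned.toString();
    }
}
```

The cleaned text is what the classifier would actually see, so noisy tokens never influence the predicted polarity.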
6.2 FUTURE ENHANCEMENT
This is useful for consumers who want to research the sentiment of products before
purchase, or for companies that want to monitor the public sentiment of their brands. There
has been little previous research on classifying the sentiment of messages on microblogging
services like Twitter.
The classifier achieves accuracy above 80% when trained with emoticon data. The main
contribution of this work is the idea of using tweets with emoticons for distant supervised
learning. As a future enhancement, data from Facebook and Instagram can be collected
into the dataset in addition to tweets.
APPENDIX: A
SCREENSHOTS
HOME PAGE
REGISTER PAGE
LOGIN PAGE
SEARCH
VIEW TWEETS
EMOTIONS
GOOD WORDS
POLARITY
GRAPH OUTPUT
APPENDIX: B
CODING
EmotionServlet.java
package controller;

import java.io.IOException;
import java.util.ArrayList;
import javax.servlet.RequestDispatcher;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import dao.DFSAccess;
import dao.Emotion;

@WebServlet("/EmotionServlet")
public class EmotionServlet extends HttpServlet {
    private static final long serialVersionUID = 1L;

    public EmotionServlet() {
        super();
    }

    protected void service(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // Detect the emotions in the stored tweets and write them to HDFS
        ArrayList<String> al = Emotion.emotions();
        DFSAccess.writeEmotion("tweet", al);

        // Hand the result to the JSP for display
        request.setAttribute("emotion", al);
        RequestDispatcher rd = request.getRequestDispatcher("emotion.jsp");
        rd.forward(request, response);
    }
}
LoginServlet.java
package controller;

import java.io.IOException;
import javax.servlet.RequestDispatcher;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;
import bean.UserBean;
import dao.LoginDao;

@WebServlet("/LoginServlet")
public class LoginServlet extends HttpServlet {
    private static final long serialVersionUID = 1L;

    public LoginServlet() {
        super();
    }

    protected void service(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // (reconstructed) read the submitted credentials and validate them via the DAO
        UserBean bean = LoginDao.validate(request.getParameter("username"),
                request.getParameter("password"));

        if (bean.isValid()) {
            // Valid user: keep the bean in the session and go to the search page
            HttpSession hs = request.getSession();
            hs.setAttribute("user", bean);
            RequestDispatcher rd = request.getRequestDispatcher("search.jsp");
            rd.forward(request, response);
        } else {
            // Invalid user: return to the login page
            RequestDispatcher rd = request.getRequestDispatcher("login.jsp");
            rd.forward(request, response);
        }
    }
}
PolarityServlet.java
package controller;

import java.io.IOException;
import java.util.HashMap;
import javax.servlet.RequestDispatcher;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import polarity.PolarityDriver;
import polarity.PolarityRead;

@WebServlet("/PolarityServlet")
public class PolarityServlet extends HttpServlet {
    private static final long serialVersionUID = 1L;

    public PolarityServlet() {
        super();
    }

    protected void service(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        try {
            // Run the MapReduce polarity job over the stored tweets
            PolarityDriver.main(null);
        } catch (Exception e) {
            e.printStackTrace();
        }

        // Read the computed polarity values and hand them to the JSP
        HashMap<String, String> hm = PolarityRead.readPolarity();
        request.setAttribute("pol", hm);
        RequestDispatcher rd = request.getRequestDispatcher("polarity.jsp");
        rd.forward(request, response);
    }
}