Chapter 1
Introduction
With the successful growth of the Internet, many social media websites have been developed for
people to communicate and share content. Every day, a vast amount of data is generated by users
on these social media websites, and some of this generated content is
Social media [22] websites such as Twitter have become essential providers of public data
among social websites. Twitter is a messaging tool that allows registered users to post short
messages, called tweets. Twitter members can broadcast tweets, follow other users, and read other
users' tweets on multiple platforms and devices. The amount of data on Twitter is increasing day
by day, and users' tweets are becoming increasingly valuable as a resource for identifying different
features of online social users and groups of people. As the amount of data on Twitter grows,
people find it difficult to manage the data and keep it secure. Extracting useful data or
information is a concern for every individual and organization, and several techniques and
technologies have been introduced to extract meaningful data from the vast quantity of data
generated on Twitter.
Data mining [2,16], also known as knowledge discovery in databases, is the process of
analyzing data from different perspectives and turning it into meaningful information. Data mining
is a powerful new technology that helps companies focus on the most critical information stored in
their data warehouses. Many data mining tools are available that predict future trends and
behaviors, and data mining techniques are used in many research areas, such as banking, social
media data, web mining, genetics, and marketing.
CONTEXTUAL SOCIAL NETWORK ANALYSIS AND MINING USING RAPIDMINER 2
A type of data mining is used on Twitter data to take advantage of the considerable quantity of
data generated by users to
(Application Programming Interface) to extract tweets posted on the server. Different types of
Twitter APIs are available that can be used to download and modify tweets. Twitter APIs are free
to use, but various issues and problems arise when using them. Twitter APIs are used to search the
timeline of a specific user and tweets on a specific topic. Some Twitter APIs return a large
quantity of data that is not easily understandable by a person, so most people use programming
languages such as C#, VB.Net, Java, and Python to convert data returned from
with a document collection. As in data mining [25,26], text mining seeks to extract useful
information from data sources through the identification and exploration of interesting patterns.
A key element of text mining is its focus on the document collection, which can be any grouping
of text-based documents. Most text mining solutions are aimed at discovering
With the rising popularity of social networking sites such as Twitter, people are
becoming more familiar with using Twitter. Twitter has millions of registered users, and
thousands of them sign in every day to express their opinions through short messages called
tweets. Tweets are becoming increasingly important and are a powerful source of knowledge
to support the identification of different features of online social users and groups of people.
With the increasing quantity of tweets on Twitter, people face problems extracting
and analyzing tweets because tweets are unstructured data. Therefore, there is a pressing need for
Twitter is one of the most popular social networking sites, where users can post brief online
messages known as tweets. Tweets were limited to a maximum of 140 characters (Twitter raised the
limit to 280 characters in November 2017) and are available to other Twitter users around the
world, which makes them a potent source for exploring and expanding business. Because tweets are
unstructured data, a consistent information system is needed to convert them into structured data
and to mine data that can be further analyzed. Despite the increasing availability of data on
Twitter, there is no such information system to extract contextual tweets from Twitter and identify
user interests for further exploration. The
Owing to the massive amount of data available on social networking sites in recent years,
data mining, also known as knowledge discovery in databases, has attracted great interest
because of the need to convert such data into useful information. The popularity of Twitter and
its vast data availability have motivated research on collecting tweets from Twitter to explore
personal opinions. Therefore, there is rising demand for and interest in tweet mining. Generally,
tweets are the
data are embedded in the tweets and need to be extracted. Thus, there is a practical need for
effective techniques to process tweets, extract knowledge, and explore different aspects of
online social users' opinions for further decision making and forecasting. Commonly, data
mining techniques are applied to massive databases of highly structured data to discover new
knowledge.
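Because these mining techniques expect structured data, raw tweets returned by an API first have to be flattened into rows and columns. The Python sketch below illustrates that conversion under several stated assumptions. The JSON layout and its field names (`id_str`, `created_at`, `text`, `user.screen_name`) are modeled loosely on Twitter's v1.1 tweet objects. The sample tweets are invented. The helper names `tweets_to_rows` and `load_into_table` are hypothetical. An in-memory SQLite database stands in for Microsoft SQL Server. The sketch does not call any Twitter API.

```python
import json
import sqlite3

# Hypothetical raw API response (invented sample data). The field names are an
# assumption, modeled loosely on Twitter's v1.1 tweet objects.
RAW_RESPONSE = """
[
  {"id_str": "1", "created_at": "Mon Jan 01 10:00:00 +0000 2018",
   "text": "Loving the new phone #tech", "user": {"screen_name": "alice"}},
  {"id_str": "2", "created_at": "Mon Jan 01 11:30:00 +0000 2018",
   "text": "Traffic is terrible today", "user": {"screen_name": "bob"}}
]
"""


def tweets_to_rows(raw_json):
    """Flatten nested tweet objects into (id, screen_name, created_at, text) tuples."""
    rows = []
    for tweet in json.loads(raw_json):
        rows.append((
            tweet["id_str"],
            tweet["user"]["screen_name"],  # nested field pulled up into a column
            tweet["created_at"],
            tweet["text"],
        ))
    return rows


def load_into_table(rows):
    """Store the rows in an in-memory SQLite table (a stand-in for SQL Server)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tweets ("
        "id TEXT PRIMARY KEY, screen_name TEXT, created_at TEXT, tweet_text TEXT)"
    )
    conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = load_into_table(tweets_to_rows(RAW_RESPONSE))
    for name, text in conn.execute("SELECT screen_name, tweet_text FROM tweets ORDER BY id"):
        print(name, "|", text)
```

After the tweets are in a table like this, they can be queried with SQL or handed to a mining tool. The main step that makes the data "structured" is that nested fields, such as the author's screen name, become ordinary columns.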
Twitter is a social networking site where thousands of users sign in every day and express their
opinions through short messages called tweets, which are unstructured data. Therefore, there is a
pressing need for efficient approaches to mine information from unstructured data. The mining
process will not be valid if the sampled tweets are not a good representation of the vast body of
data; therefore, an essential part of the process is the validation and confirmation of the
data.
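One simple way to check whether a sample is a good representation of the full collection is to compute the same statistic on both and compare the results. The sketch below shows one such check. It is an assumption, not the validation procedure used in this research. It compares the share of tweets that mention a keyword in a random sample with the share in the whole collection. The function names, the 0.05 tolerance, the fixed seed, and the sample data are all hypothetical.

```python
import random


def keyword_share(tweets, keyword):
    """Return the fraction of tweets whose text contains the keyword (case-insensitive)."""
    if not tweets:
        return 0.0
    keyword = keyword.lower()
    hits = sum(1 for tweet in tweets if keyword in tweet.lower())
    return hits / len(tweets)


def sample_is_representative(population, sample_size, keyword, tolerance=0.05, seed=42):
    """Draw a random sample and compare its keyword share with the full collection.

    Returns (population_share, sample_share, within_tolerance). The tolerance and
    seed are arbitrary illustrative choices.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(population, sample_size)
    population_share = keyword_share(population, keyword)
    sample_share = keyword_share(sample, keyword)
    return population_share, sample_share, abs(population_share - sample_share) <= tolerance


if __name__ == "__main__":
    # Invented collection of 300 tweets; every third one mentions "phone".
    population = ["new phone battery" if i % 3 == 0 else "weather update" for i in range(300)]
    print(sample_is_representative(population, sample_size=100, keyword="phone"))
```

A single keyword is only a crude check. A fuller check could compare several attributes in the same way, for example several keywords or time periods.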
An information system can be used effectively to extract data from tweets, which can then be
analyzed with data mining techniques to discover more meaningful information. Many data mining
methods have been developed to retrieve meaningful information for users. A keyword-based text
mining approach can be used to mine users' opinions. Others have used a phrase-based approach,
which is believed to be better than a term-based approach because a phrase is considered to carry
more information than a single term.
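You can see the difference between a term-based view and a phrase-based view by counting single words versus two-word phrases in a few tweets. This minimal Python sketch assumes simple regular-expression tokenization and treats bigrams as the "phrases". The function names and example tweets are hypothetical and are not taken from the research.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens; hashtags and mentions keep their symbol."""
    return re.findall(r"[#@]?\w+", text.lower())


def term_counts(tweets):
    """Term-based view: count single words across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts


def phrase_counts(tweets, n=2):
    """Phrase-based view: count contiguous n-word phrases (bigrams by default)."""
    counts = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


if __name__ == "__main__":
    tweets = [
        "Battery life is not good",
        "Screen quality is good",
        "Battery life is great",
    ]
    print(term_counts(tweets).most_common(3))
    print(phrase_counts(tweets).most_common(3))
```

On these three tweets, the term view counts "good" twice and cannot tell its two contexts apart. The phrase view keeps "not good" and "is good" as separate phrases, which is the extra information the phrase-based approach is credited with.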
implementation of this research. Data can be extracted from Twitter based on keywords and then
further analyzed and mined using Microsoft SQL Server and the open source application RapidMiner.
The mined data may provide interesting information for marketing