Chapter 1
With the rapid growth of the internet, many social media websites have been developed
so far for people to communicate and share content. Every day, users generate a vast
amount of data on every social media website, and some of this generated content is
significant for exploration, forecasting, and decision making.

Social media [22] websites such as Twitter have become essential public data providers.
Twitter is a messaging tool that allows registered users to transmit short posts, usually
called tweets. Twitter members can broadcast tweets, follow other users, and read those
users' tweets on multiple platforms and devices. The amount of data on Twitter is
increasing day by day. Users' tweets are correspondingly becoming more valuable and are
becoming an authoritative resource of knowledge to support the investigation of different
features of online social users and groups of people. With the increasing amount of data
on Twitter, people are having a difficult time managing the data and keeping it secure.
Useful data or information is a concern for everyone and every organization; several
techniques and technologies have been introduced to extract meaningful data from the vast
quantity of data generated on Twitter.

Data mining [2,16], also known as knowledge discovery in databases, is the process of
analyzing data from different perspectives and turning it into meaningful information.
Data mining is a powerful and new technology with great functionality to help companies
focus on the most critical information stored in their data warehouses. Many data mining
tools are available that predict future trends and behavior. Data mining techniques are
used in many research areas, including banking, social media data, web mining, genetics,
and marketing. Data mining is applied to Twitter data to take advantage of the
considerable quantity of data generated by users and to explore user interests and
behavior.

Twitter [8,14,15] is an internet-based application that has developed several APIs
(Application Programming Interfaces) for extracting tweets posted on its servers.
Different types of Twitter API are available, which can be used to download and modify
tweets. The Twitter APIs are free to use, but various issues and problems arise when
using them. The APIs are used to search the timeline of a specific user and to search
tweets on a specific topic. Some of the Twitter APIs return a large quantity of data that
is not easily understandable by a person, so most people use a programming language such
as C#, VB.NET, Java, or Python to convert the data returned from the Twitter APIs into a
readable format.
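
As a rough illustration of converting API output into a readable format, the following Python sketch flattens a JSON array of tweet objects into one line per tweet. It makes no API call. The sample payload is hand-written for illustration, the helper name to_readable_rows is hypothetical, and the field names (created_at, text, user.screen_name) are assumed to follow the classic tweet JSON layout.

```python
import json

# Hypothetical sample payload written by hand for illustration only.
# Field names are assumed to follow the classic tweet JSON layout;
# all values are invented and nothing is downloaded.
SAMPLE_RESPONSE = """
[
  {"created_at": "Mon Jan 01 10:00:00 +0000 2018",
   "text": "Learning data mining today",
   "user": {"screen_name": "example_user_a"}},
  {"created_at": "Mon Jan 01 10:05:00 +0000 2018",
   "text": "Text mining finds patterns\\nin documents",
   "user": {"screen_name": "example_user_b"}}
]
"""


def to_readable_rows(raw_json):
    """Flatten a JSON array of tweet objects into (user, time, text) tuples."""
    rows = []
    for tweet in json.loads(raw_json):
        user = tweet.get("user", {}).get("screen_name", "")
        created = tweet.get("created_at", "")
        # Collapse newlines and repeated spaces so each tweet fits on one line.
        text = " ".join(tweet.get("text", "").split())
        rows.append((user, created, text))
    return rows


if __name__ == "__main__":
    for user, created, text in to_readable_rows(SAMPLE_RESPONSE):
        print(f"{created} | @{user} | {text}")
```

Using .get with defaults keeps the sketch tolerant of missing fields, which matters when the returned objects are large and not every field is present.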

Text mining [23,1] is defined as a knowledge-intensive process in which a user interacts

with a document collection. As in data mining [25,26], text mining seeks to extract useful

information from the data source through the identification and exploration of interesting patterns.

A key element of text mining is its focus on the document collection. A document collection can

be any grouping of text-based documents. Most text mining solutions are aimed at discovering

patterns across substantial document collections.

1.1 Problem Background

With the rising popularity of social media networking sites such as Twitter, people are
becoming more familiar with using Twitter. Twitter has millions of registered users, and
thousands of them sign in every day to express their opinions through short messages
called tweets. Tweets are becoming increasingly important and are a powerful resource of
knowledge to support the identification of different features of online social users and
groups of people.

With the increasing quantity of tweets on Twitter, people are facing problems extracting
and analyzing the tweets because tweets are unstructured data. Therefore, there is a
pressing need for an efficient approach to analyze the tweets and extract information.

1.2 Problem Statement

Twitter is one of the most famous social media networking sites, where users can post
brief online messages known as tweets. Tweets can consist of a maximum of 140 characters
and are available to other Twitter users around the world, which makes them potent
sources for exploring and expanding business. Tweets are unstructured data; therefore,
there should be a consistent information system to convert the unstructured data to
structured data and mine the data so that it can be further analyzed. Despite the
increasing availability of data on Twitter, there is no such information system to
extract contextual tweets from Twitter and identify user interests for further
exploration. The proposed solution can overcome the mentioned issues.
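
To make the idea of converting unstructured tweets to structured data concrete, here is a minimal Python sketch that turns raw tweet text into a record with fixed fields. It is not the proposed system. The function name structure_tweet, the choice of fields, and the simple regular expressions are all assumptions made for illustration.

```python
import re

# Simple illustrative patterns. Real tweets may need more careful handling,
# for example punctuation attached to the end of a URL.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")


def structure_tweet(text):
    """Turn raw tweet text into a dictionary with fixed fields."""
    urls = URL.findall(text)
    # Remove URLs first so parts of a link are not read as hashtags or mentions.
    without_urls = URL.sub(" ", text)
    hashtags = [h.lower() for h in HASHTAG.findall(without_urls)]
    mentions = [m.lower() for m in MENTION.findall(without_urls)]
    # Whatever remains after removing the markers is treated as plain words.
    plain = MENTION.sub(" ", HASHTAG.sub(" ", without_urls))
    words = re.findall(r"[a-z']+", plain.lower())
    return {
        "hashtags": hashtags,
        "mentions": mentions,
        "urls": urls,
        "words": words,
        "length": len(text),
    }


if __name__ == "__main__":
    # Invented example tweet.
    sample = "Loving the new phone from @ExampleCo #Tech #gadgets http://example.com/p1"
    print(structure_tweet(sample))
```

Each record has the same keys regardless of the tweet's content, so a collection of such records can be loaded into a table and queried like any other structured data.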

1.3 Motivation of the Research

Due to the massive data available on social networking sites in recent years, data
mining, which is also known as knowledge discovery in databases, has attracted great
interest because of the need to convert such data into useful information. The popularity
of Twitter and the vast availability of data motivated this research on collecting tweets
from Twitter to explore personal opinion. Therefore, there is rising demand for and
interest in tweet mining. Generally, tweets are a collection of unstructured
communication text or personal opinion, and a massive amount of data is embedded in the
tweets that needs to be extracted. Thus, there is a functional need for effective
techniques to process tweets to extract knowledge and explore different aspects of online
social users' opinions for further decision making and forecasting. Commonly, data
mining techniques are applied to massive databases of highly structured data to find out new


Twitter is a social networking site; thousands of users sign in every day and express
their opinions through short messages called tweets, which are unstructured data.
Therefore, there is a pressing need for efficient approaches to mine information from
unstructured data. The mining process will not be valid if the sample tweets are not a
good representation of the larger body of data. Therefore, an essential part of the
process is the validation and confirmation of


An information system can be effectively used to extract data from tweets, which can be
further analyzed with data mining techniques to discover more meaningful information.
Many data mining methods have been developed so far to achieve the goal of retrieving
meaningful information for users. A keyword-based text mining approach can be used for
mining the opinions of users. Others have used a phrase-based approach, which is believed
to be a better approach than the keyword-based one because a phrase is considered to
carry more information than a single word or term.
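
The difference between counting single terms and counting phrases can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The sample tweets are invented, the helper names are hypothetical, and word pairs (bigrams) stand in for whatever phrase definition a phrase-based method actually uses.

```python
from collections import Counter

# Invented sample tweets for illustration.
TWEETS = [
    "The battery life is not good",
    "Battery life could be better",
    "Good camera, bad battery life",
]


def tokenize(text):
    """Lowercase the text, strip surrounding punctuation, keep alphabetic tokens."""
    stripped = (w.strip(".,!?;:\"'()") for w in text.lower().split())
    return [t for t in stripped if t.isalpha()]


def count_terms(tweets):
    """Count how often each single word occurs across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts


def count_phrases(tweets, n=2):
    """Count runs of n consecutive words (bigrams by default) within each tweet."""
    counts = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


if __name__ == "__main__":
    print(count_terms(TWEETS).most_common(3))
    print(count_phrases(TWEETS).most_common(3))
```

In this small sample the single term "good" is counted twice, although one of those uses is negated. The phrase counts keep "not good" as its own entry, which is the kind of extra information a phrase can carry over a single term.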

1.4 Benefits of Research

Useful information is a significant asset to any organization. After the successful
implementation of this research, data can be extracted from Twitter based on keywords and
then further analyzed and mined using Microsoft SQL Server and the open-source
application RapidMiner. The mined data may provide interesting information for marketing
purposes and for the identification of user interests.
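
The keyword-based extraction and database step can be sketched as follows. This is not the research implementation. Python's built-in sqlite3 module is used here purely as a stand-in for Microsoft SQL Server, and the keywords, sample tweets, table name, and function names are all hypothetical.

```python
import sqlite3

# Hypothetical keywords of interest and invented (username, text) sample tweets.
KEYWORDS = ("camera", "battery")
TWEETS = [
    ("user_a", "The camera on this phone is great"),
    ("user_b", "Traffic is terrible this morning"),
    ("user_c", "Battery drains fast but the camera is fine"),
]


def matched_keywords(text, keywords=KEYWORDS):
    """Return the keywords that appear as whole words in the tweet text."""
    words = set(text.lower().split())
    return [k for k in keywords if k in words]


def load_matches(conn, tweets):
    """Store one row per (tweet, matched keyword) pair in a relational table."""
    conn.execute(
        "CREATE TABLE tweet_keyword (username TEXT, keyword TEXT, text TEXT)"
    )
    for username, text in tweets:
        for keyword in matched_keywords(text):
            conn.execute(
                "INSERT INTO tweet_keyword VALUES (?, ?, ?)",
                (username, keyword, text),
            )
    conn.commit()


def keyword_counts(conn):
    """Count matching tweets per keyword with an ordinary SQL aggregate."""
    query = (
        "SELECT keyword, COUNT(*) FROM tweet_keyword "
        "GROUP BY keyword ORDER BY keyword"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # in-memory stand-in database
    load_matches(connection, TWEETS)
    print(keyword_counts(connection))
```

Once the matched tweets are in a table, the same kind of aggregate query could be run in any SQL database, and the table could be exported for further mining in a separate tool.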
