
Why TWIG?

Gives a broad view of public opinion in a given region
Provides a unified dataset drawn from both social media and news agencies, giving a more complete picture
Allows the spatial analysis of human behavior in a given region
Identifies hotspots of social unrest, or of any specified topic, around the world

RSS-ALGORITHM

1. Create a data structure to store relevant feeds
2. Enter the keyword and the list of URLs for the leading news agencies
3. Retrieve the RSS feeds using an API; each feed entry contains a title, an author, a URL, and a summary of the article
4. Store the RSS feeds in the list
5. Run the web crawler to retrieve each article using the URLs thus obtained
6. Write all the data obtained to a text file
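
The six RSS-ALGORITHM steps can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the TWIG implementation: the names `FeedItem`, `parse_rss`, `collect`, and `fetch_fn` are hypothetical, the feeds are assumed to be RSS 2.0, relevance is assumed to mean a simple keyword match on title and summary, and the "crawler" only downloads the raw page without cleaning the HTML.

```python
import urllib.request
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Dublin Core author tag that some RSS feeds use instead of <author>.
DC_CREATOR = "{http://purl.org/dc/elements/1.1/}creator"


@dataclass
class FeedItem:
    """Step 1: data structure for one relevant feed entry (hypothetical layout)."""
    title: str
    author: str
    url: str
    summary: str
    article: str = ""


def fetch(url):
    """Download a URL and return its body as text (requires network access)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_rss(xml_text, keyword):
    """Steps 3-4: parse RSS 2.0 XML and keep the items that mention the keyword."""
    items = []
    for node in ET.fromstring(xml_text).iter("item"):
        title = node.findtext("title", default="")
        summary = node.findtext("description", default="")
        author = node.findtext("author") or node.findtext(DC_CREATOR) or ""
        link = node.findtext("link", default="").strip()
        # Assumed relevance test: case-insensitive keyword match.
        if keyword.lower() in (title + " " + summary).lower():
            items.append(FeedItem(title, author, link, summary))
    return items


def collect(keyword, feed_urls, out_path, fetch_fn=fetch):
    """Steps 2-6: read the feeds, fetch each article, write everything to a text file."""
    feeds = []
    for feed_url in feed_urls:
        feeds.extend(parse_rss(fetch_fn(feed_url), keyword))
    with open(out_path, "w", encoding="utf-8") as out:
        for item in feeds:
            if item.url:
                item.article = fetch_fn(item.url)  # Step 5: raw page only
            out.write(f"{item.title}\t{item.author}\t{item.url}\n")
            out.write(f"{item.summary}\n{item.article}\n\n")
    return feeds
```

Passing the download function as `fetch_fn` is a design choice made here so the sketch can be tried against canned XML without network access; a call such as `collect("protest", feed_urls, "feeds.txt")` would use the real downloader.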

Objectives
The objectives of this work are to:
1. Implement APIs to collect data from multiple social media sites
2. Develop a web crawler to collect RSS feeds and their associated news articles
3. Develop and implement a GeoCoder module that associates each piece of data with a location, covering not only text-based sources but images and videos as well
4. Develop and implement a Topic Detector module, for all types of datasets, that categorizes each piece of data gathered from the web
5. Provide a fully functioning user interface that allows one to gather information from multiple sources on the web
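
To make the GeoCoder and Topic Detector objectives concrete, here is a toy Python sketch for text sources only. The poster does not say how either module works, so the techniques shown (a gazetteer lookup and keyword counting) are assumptions chosen for illustration, and `GAZETTEER`, `TOPICS`, `geocode`, and `detect_topic` are hypothetical names with hand-picked sample entries.

```python
import re

# Hypothetical mini-gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {"cairo": (30.04, 31.24), "paris": (48.86, 2.35)}

# Hypothetical topic keyword sets; a real detector would need far richer data.
TOPICS = {
    "social unrest": {"protest", "riot", "strike"},
    "natural disaster": {"earthquake", "flood", "hurricane"},
}


def tokens(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def geocode(text):
    """Return (place, (lat, lon)) for the first gazetteer place named, else None."""
    for word in tokens(text):
        if word in GAZETTEER:
            return word, GAZETTEER[word]
    return None


def detect_topic(text):
    """Return the topic whose keywords occur most often, or None if none match."""
    words = tokens(text)
    scores = {topic: sum(w in kws for w in words) for topic, kws in TOPICS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, `geocode("Protest erupts in Cairo")` finds the gazetteer entry for Cairo, and `detect_topic` on the same sentence selects "social unrest". Images and videos would need a different approach, which this sketch does not attempt.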

Figure 1: TWIG User Interface

Anticipated Uses of the Data Collected

1. Social unrest anticipation
2. Natural disaster relief
3. Trend analysis for a certain topic of interest
4. Regional analysis
5. Story building

Acknowledgement
We would like to thank Dr. Mei Chen, Dr. Lok Lew-Yan Voon, and The Citadel Foundation for their support in this work.

[Diagram: TWIG components]
GUI
TWIG Engine
Social Media (Twitter, Reddit, Tumblr)
RSS Feeds (Leading newspapers worldwide)
Images and Videos (Flickr, Instagram, YouTube)
GeoCoder and Topic Detector
Filtered Data
MetaData Extractor
Location, Date/Time, Topic, Description, Author
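
The five metadata fields listed above (Location, Date/Time, Topic, Description, Author) could be held in a record like the one below. This is a hedged sketch rather than TWIG's actual schema: the class name `MetaData`, the field types, and the helper `filter_by_topic` are assumptions made for illustration.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class MetaData:
    """One collected item, described by the five fields named on the poster."""
    location: Optional[Tuple[float, float]]  # assumed (lat, lon); None if unknown
    timestamp: datetime                      # the poster's "Date/Time"
    topic: str
    description: str
    author: str


def filter_by_topic(records: List[MetaData], topic: str) -> List[MetaData]:
    """Keep only the records tagged with the requested topic."""
    return [r for r in records if r.topic == topic]


if __name__ == "__main__":
    sample = MetaData((30.04, 31.24), datetime(2015, 1, 1, 12, 0),
                      "social unrest", "Example description", "Example author")
    print(asdict(sample))  # plain dict, convenient for writing to a file
```

Keeping every source (social media, RSS, images and videos) in one record type is one way to obtain the unified dataset the poster describes, since downstream filtering then needs no source-specific code.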
