Keerthana Subramani
ASU ID: 1203845157
Source: Huffington Post
News website that publishes news articles and user comments.
Data type: Text data
Structure of pages: HTML
Seed URL and search query are specified.
Crawl URLs using a depth-first approach and store them in the database.
Pages are visited at depth k, then k+1, k+2, and so on.
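The crawl step above can be sketched as follows. This is a minimal illustration, not the crawler used in the project: the `get_links` callback, the `urls` table, the `max_depth` limit, and the toy link graph are all assumptions made for the example, and SQLite stands in for whatever database was actually used.

```python
import sqlite3
from typing import Callable, Iterable, List


def crawl_depth_first(
    seed_url: str,
    get_links: Callable[[str], Iterable[str]],
    conn: sqlite3.Connection,
    max_depth: int = 3,
) -> List[str]:
    """Visit URLs depth first from seed_url and record each one in SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY, depth INTEGER)"
    )
    stack = [(seed_url, 0)]  # an explicit stack gives depth-first order
    seen = set()
    visited = []
    while stack:
        url, depth = stack.pop()
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        visited.append(url)
        conn.execute("INSERT OR IGNORE INTO urls VALUES (?, ?)", (url, depth))
        # Push links in reverse so the first link on a page is explored first.
        for link in reversed(list(get_links(url))):
            if link not in seen:
                stack.append((link, depth + 1))
    conn.commit()
    return visited


# Toy link graph standing in for real pages (hypothetical URLs).
LINKS = {"seed": ["a", "b"], "a": ["c"], "b": [], "c": []}
conn = sqlite3.connect(":memory:")
print(crawl_depth_first("seed", lambda u: LINKS.get(u, []), conn))
```

In a real crawl, `get_links` would download the page and return the URLs found in it; passing it in as a function keeps the traversal logic testable without network access.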
Eyeball sample pages to identify anchors.
Access the page source using the URL.
Run regular expressions (RE) over the HTML data.
Get the title, timestamp, tags, and article data, and store them in the database.
User comments are also extracted.
Comments are identified by string matching.
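The extraction step can be illustrated with a few regular expressions run over a page's HTML. This is only a sketch: the anchors used here (`<title>`, a `<time datetime=...>` element, `class="tag"`, `class="article-body"`, `class="comment"`) and the sample page are hypothetical, and the real patterns would come from inspecting the site's actual markup.

```python
import re

# Hypothetical anchors; the real ones come from inspecting the page source.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.I | re.S)
TIME_RE = re.compile(r'<time\b[^>]*\bdatetime="([^"]+)"', re.I)
TAG_RE = re.compile(r'<a\b[^>]*\bclass="tag"[^>]*>(.*?)</a>', re.I | re.S)
BODY_RE = re.compile(
    r'<div\b[^>]*\bclass="article-body"[^>]*>(.*?)</div>', re.I | re.S
)
COMMENT_RE = re.compile(r'<p\b[^>]*\bclass="comment"[^>]*>(.*?)</p>', re.I | re.S)


def _first(pattern, html):
    """Return the first captured group, stripped, or None if nothing matches."""
    match = pattern.search(html)
    return match.group(1).strip() if match else None


def extract_article(html):
    """Pull title, timestamp, tags, body text, and comments out of one page."""
    return {
        "title": _first(TITLE_RE, html),
        "timestamp": _first(TIME_RE, html),
        "tags": [t.strip() for t in TAG_RE.findall(html)],
        "body": _first(BODY_RE, html),
        "comments": [c.strip() for c in COMMENT_RE.findall(html)],
    }


# Made-up page used only to exercise the patterns.
SAMPLE = """<html><head><title>Example headline</title></head>
<body><time datetime="2016-08-10T09:30:00">Aug 10</time>
<a class="tag" href="/t/politics">Politics</a>
<a class="tag" href="/t/email">Email</a>
<div class="article-body">Body text of the article.</div>
<p class="comment">First comment</p>
</body></html>"""

print(extract_article(SAMPLE))
```

Regular expressions like these are brittle when the markup nests (for example, a `<div>` inside the article body would end the match early), so the patterns have to be rechecked whenever the page layout changes.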
Query the dataset by timestamp.
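A timestamp query over the stored articles might look like the sketch below. The `articles` table, its column names, and the sample rows are assumptions for illustration; the example also assumes timestamps are stored as ISO 8601 strings, which sort correctly as plain text.

```python
import sqlite3


def articles_between(conn, start, end):
    """Return (timestamp, title) rows with start <= timestamp <= end, oldest first."""
    cur = conn.execute(
        "SELECT timestamp, title FROM articles "
        "WHERE timestamp BETWEEN ? AND ? ORDER BY timestamp",
        (start, end),
    )
    return cur.fetchall()


# Made-up rows standing in for crawled articles.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (timestamp TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [
        ("2016-07-01T08:00:00", "Earlier story"),
        ("2016-08-10T09:30:00", "Example headline"),
        ("2016-09-05T12:00:00", "Later story"),
    ],
)
print(articles_between(conn, "2016-08-01T00:00:00", "2016-08-31T23:59:59"))
```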
Create a timeline of events from the timestamp of each record.
Software used: Timeline creator
Shows the date and title of each event and the occurrence of events over time.
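The idea of turning timestamps into a timeline can be shown with a plain-text sketch. This is not the Timeline creator software named above; it only groups hypothetical `(timestamp, title)` records by date and prints them in chronological order, assuming ISO 8601 timestamps.

```python
from collections import defaultdict
from datetime import datetime


def build_timeline(records):
    """Group (timestamp, title) pairs by calendar date, in chronological order."""
    by_date = defaultdict(list)
    for timestamp, title in sorted(records):
        by_date[datetime.fromisoformat(timestamp).date()].append(title)
    return sorted(by_date.items())


def render_timeline(timeline):
    """Format the grouped events as indented text, one block per date."""
    lines = []
    for day, titles in timeline:
        lines.append(f"{day.isoformat()}  {len(titles)} event(s)")
        lines.extend(f"    - {title}" for title in titles)
    return "\n".join(lines)


# Made-up records standing in for rows read from the database.
RECORDS = [
    ("2016-08-10T15:00:00", "Follow-up story"),
    ("2016-07-01T08:00:00", "Earlier story"),
    ("2016-08-10T09:30:00", "Example headline"),
]
print(render_timeline(build_timeline(RECORDS)))
```

The per-date event count gives a rough view of how activity is distributed over time, which is the same information a graphical timeline conveys.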