Submitted by Keerthana Subramani ASU ID: 1203845157
Source: the Huffington Post news website, which carries news articles and user comments. Data type: text. Page structure: HTML. A seed URL and a search query are specified. URLs are crawled using a depth-first approach and stored in a database: the k-th, (k+1)-th, (k+2)-th, and so on.
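The depth-first crawl described above can be sketched as follows. This is a minimal illustration, not the crawler used in the project: the function name `crawl_depth_first`, the `get_links` callback, the `max_depth` limit, and the toy link graph are all assumptions made so the example runs without network access. A real crawler would fetch each page and write the visited URLs to the database.

```python
from typing import Callable, Iterable, List

def crawl_depth_first(
    seed: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 3,
) -> List[str]:
    """Return URLs in the order a depth-first crawl visits them."""
    visited: List[str] = []
    seen = set()
    stack = [(seed, 0)]  # (url, depth) pairs still to visit
    while stack:
        url, depth = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        # Push links in reverse so the first link on the page is explored first.
        for link in reversed(list(get_links(url))):
            if link not in seen:
                stack.append((link, depth + 1))
    return visited

# Hypothetical link graph standing in for real pages.
TOY_GRAPH = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": [],
    "c": ["seed"],  # a cycle back to the seed, skipped via `seen`
}

order = crawl_depth_first("seed", lambda url: TOY_GRAPH.get(url, []))
print(order)
```

The explicit stack avoids recursion limits on deep link chains, and the `seen` set prevents revisiting pages that link back to each other.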
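Anchors in the page markup are identified by eyeballing the HTML. The page source is accessed using the URL, and regular expressions (RE) are run over the HTML data to get the title, timestamp, tags, and article data, which are stored in the database. User comments are also extracted; they are identified by string matching.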
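The extraction step above might look like the sketch below. The sample HTML, the class names (`tag`, `article-body`, `comment`), the table layout, and the example URL are all hypothetical; the real anchors would be whatever was found by inspecting the actual pages. SQLite is used here only because it ships with Python, and the slides do not say which database was used.

```python
import re
import sqlite3

# Hypothetical page markup; the real anchors come from inspecting the site.
SAMPLE_HTML = """
<html><head><title>Sample headline</title></head>
<body>
<time datetime="2013-04-02T10:15:00">April 2, 2013</time>
<a class="tag" href="/tag/politics">politics</a>
<a class="tag" href="/tag/economy">economy</a>
<div class="article-body">First paragraph. Second paragraph.</div>
<div class="comment">Great article</div>
<div class="comment">I disagree</div>
</body></html>
"""

TITLE_RE = re.compile(r"<title>(.*?)</title>", re.S)
TIME_RE = re.compile(r'<time datetime="([^"]+)"')
TAG_RE = re.compile(r'<a class="tag"[^>]*>(.*?)</a>', re.S)
BODY_RE = re.compile(r'<div class="article-body">(.*?)</div>', re.S)
COMMENT_RE = re.compile(r'<div class="comment">(.*?)</div>', re.S)

def extract_article(html):
    """Pull title, timestamp, tags, body, and comments out of one page."""
    def first(pattern):
        match = pattern.search(html)
        return match.group(1).strip() if match else None
    return {
        "title": first(TITLE_RE),
        "timestamp": first(TIME_RE),
        "tags": [t.strip() for t in TAG_RE.findall(html)],
        "body": first(BODY_RE),
        "comments": [c.strip() for c in COMMENT_RE.findall(html)],
    }

def store_article(conn, url, article):
    """Write one extracted article and its comments to the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles "
        "(url TEXT PRIMARY KEY, title TEXT, timestamp TEXT, tags TEXT, body TEXT)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS comments (url TEXT, comment TEXT)")
    conn.execute(
        "INSERT OR REPLACE INTO articles VALUES (?, ?, ?, ?, ?)",
        (url, article["title"], article["timestamp"],
         ",".join(article["tags"]), article["body"]),
    )
    conn.executemany(
        "INSERT INTO comments VALUES (?, ?)",
        [(url, c) for c in article["comments"]],
    )
    conn.commit()

article = extract_article(SAMPLE_HTML)
conn = sqlite3.connect(":memory:")
store_article(conn, "http://example.com/sample", article)
print(article["title"], article["timestamp"], article["tags"])
```

Regular expressions work here because the anchors are fixed strings around the fields of interest; they break if the site changes its markup, so the patterns need rechecking whenever extraction starts returning empty fields.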
The dataset is queried using the timestamp. A timeline of events is created from the timestamp of each data item. Software used: Timeline creator. The timeline shows the date and title of each event and the occurrence of events over time.
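A timestamp query that yields the date and title pairs for a timeline could be sketched as below. The table layout, the sample rows, and the function name `build_timeline` are assumptions for illustration; the output is a plain list that a timeline tool could take as input, not the format of any particular timeline software.

```python
import sqlite3
from datetime import datetime

def build_timeline(conn, start, end):
    """Return (date, title) pairs for articles between start and end, oldest first."""
    rows = conn.execute(
        "SELECT timestamp, title FROM articles "
        "WHERE timestamp BETWEEN ? AND ? ORDER BY timestamp",
        (start, end),
    ).fetchall()
    # ISO 8601 strings sort chronologically, so ordering by text is sufficient.
    return [(datetime.fromisoformat(ts).date().isoformat(), title)
            for ts, title in rows]

# Hypothetical rows standing in for the crawled dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (timestamp TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [
        ("2013-04-15T09:00:00", "Event B"),
        ("2013-04-02T10:15:00", "Event A"),
        ("2013-05-01T08:00:00", "Event C"),
    ],
)

timeline = build_timeline(conn, "2013-04-01T00:00:00", "2013-04-30T23:59:59")
for date, title in timeline:
    print(date, title)
```

Storing timestamps as ISO 8601 text keeps the range query and the ordering simple, since string comparison matches chronological order for that format.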