Real Talk Princeton Historical Data Analysis

Real Talk Princeton

Subscription Service
Real-Time Monitoring & SMS
Overview
● Used stdlib to run a cron job that crawls realtalk-princeton for updates.
● Used the stdlib key-value store as the subscription database.
● On each update from realtalk-princeton, parse the post into tags/categories and send an SMS notification to subscribers.
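The parse-and-notify step above could look roughly like the following. This is a minimal sketch under stated assumptions: the keyword table, the function names, and the in-memory dict standing in for the key-value store are all hypothetical, and the actual SMS delivery call is left out because the slides do not describe it.

```python
from collections import defaultdict

# Hypothetical keyword-to-tag table; the real tagging rules are not in the slides.
TAG_KEYWORDS = {
    "academics": ["class", "exam", "professor"],
    "jobs": ["internship", "recruiting", "resume"],
    "social": ["friends", "party", "roommate"],
}


def tag_post(text):
    """Return the sorted tags whose keywords appear as words in the post."""
    words = set(text.lower().split())
    return sorted(tag for tag, kws in TAG_KEYWORDS.items() if words & set(kws))


def notifications_for(post_text, subscriptions):
    """Map each subscribed phone number to the tags that matched the post.

    `subscriptions` stands in for the key-value store: tag -> list of numbers.
    """
    out = defaultdict(list)
    for tag in tag_post(post_text):
        for number in subscriptions.get(tag, []):
            out[number].append(tag)
    return dict(out)


# Made-up subscribers and post text, for illustration only.
subscriptions = {
    "jobs": ["+15550001"],
    "academics": ["+15550001", "+15550002"],
}
result = notifications_for(
    "Any tips for my internship resume before the exam", subscriptions
)
```

Grouping matched tags per phone number means a subscriber who follows several matching tags would receive one message per post rather than one per tag.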
Real Talk Princeton Historical Data Analysis
Frequency, Categorical Data, Sentiment Analysis
Methodology

- Used GCP's Natural Language API to help understand a dataset with significant amounts of text (Q&A)
- Categorized each Q&A, then identified temporal trends within categories
- Performed sentiment analysis to identify the writer's emotions and attitude (positive, negative, or neutral)
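The Natural Language API reports sentiment as a score between -1.0 and 1.0, so the positive/negative/neutral labels above need some cutoff. The sketch below shows one way to do that mapping; the `neutral_band` value of 0.25 and the function name are assumptions, not taken from the slides.

```python
def sentiment_label(score, neutral_band=0.25):
    """Map a sentiment score in [-1.0, 1.0] to a coarse label.

    neutral_band is an assumed cutoff, not a value from the slides.
    """
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


# Made-up scores, for illustration only.
labels = [sentiment_label(s) for s in (0.8, -0.6, 0.1)]
```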
Trends in Publish Time of Posts
Weekly Post Frequency

● Recurring peaks when school resumes:
○ Beginning of fall term (early September)
○ Beginning of spring term (early February)
● Troughs at the beginning of longer breaks:
○ Summer break (early June)
○ Spring break (mid-March)
○ Winter break (late December)
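A weekly post frequency series like the one summarized above can be built by binning post timestamps into ISO weeks. The following is a minimal sketch; the function name and the sample timestamps are illustrative, and the slides do not say how weeks were actually delimited.

```python
from collections import Counter
from datetime import datetime


def weekly_counts(timestamps):
    """Count posts per (ISO year, ISO week)."""
    counts = Counter()
    for ts in timestamps:
        iso = ts.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(counts)


# Made-up timestamps, for illustration only.
posts = [
    datetime(2018, 9, 3, 10, 0),
    datetime(2018, 9, 5, 23, 15),
    datetime(2018, 9, 10, 8, 30),
]
per_week = weekly_counts(posts)
```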
Posts Made Between 2-4 am
Number of posts made between 2-4 am each week, normalized by the total number of posts that week.

● Steady increase in the proportion of late-night posts over the school year, from Sept. 2017 to July 2018.
○ I guess stress is cumulative.
● Steady decrease in late-night posts from mid-summer 2018 to early Sept. 2018.
● Irregular fluctuations in the first four months of the website.
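The normalization described for this chart (posts between 2 and 4 am divided by all posts that week) can be sketched as follows. The function name, the half-open hour window, and the sample timestamps are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def late_night_share(timestamps, start_hour=2, end_hour=4):
    """Return, per ISO week, the fraction of posts with start_hour <= hour < end_hour."""
    totals, late = Counter(), Counter()
    for ts in timestamps:
        iso = ts.isocalendar()
        week = (iso[0], iso[1])
        totals[week] += 1
        if start_hour <= ts.hour < end_hour:
            late[week] += 1
    # Counter returns 0 for weeks that have no late-night posts.
    return {week: late[week] / totals[week] for week in totals}


# Made-up timestamps, for illustration only.
posts = [
    datetime(2018, 9, 3, 2, 30),
    datetime(2018, 9, 4, 14, 0),
    datetime(2018, 9, 5, 3, 59),
    datetime(2018, 9, 6, 4, 0),
    datetime(2018, 9, 10, 12, 0),
]
share = late_night_share(posts)
```

Dividing by the weekly total keeps a busy week from looking like a late-night spike just because more posts were made overall.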
Trends in Question Categories
Frequency of Questions in Jobs & Education Category
● Interest peaks just before school starts
● Interest peaks before school ends
● Minimum value during Winter/Summer Break
Frequency of Questions in People & Society Category

● Minimum value during Winter/Summer Break
● More questions asked during fall semester, as students are potentially adjusting to the new social sphere
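Peaks and minimum values like those noted for each category can be read off a per-category weekly count series. This is a minimal sketch that assumes the counts are already binned by week; the week labels and numbers are made up, and weeks with zero posts have to be present in the mapping to be found as a minimum.

```python
def peak_and_trough(weekly_counts):
    """Return (peak_week, trough_week) for one category's weekly counts.

    On ties, the first week in the mapping's order wins.
    """
    peak = max(weekly_counts, key=weekly_counts.get)
    trough = min(weekly_counts, key=weekly_counts.get)
    return peak, trough


# Made-up counts for a single category, for illustration only.
category_counts = {
    "2018-W35": 40,
    "2018-W36": 25,
    "2018-W52": 3,
    "2019-W05": 30,
}
peak, trough = peak_and_trough(category_counts)
```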
Top 4 Categories:
Sentiment Analysis
Averaged Sentiment Scores for Q&As
Weekly average weighted sentiment scores for question and answer texts. A positive number indicates a positive attitude.

● Question sentiments dip substantially immediately before school resumes in September or February.
● Answer sentiments follow question sentiments closely while staying consistently above them.
○ Questions have a more negative attitude.
● Sentiments have decreased in general over the site's history.
● Sentiments peak in mid-July and around New Year's.
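The slides do not say what the weekly averages were weighted by. One plausible choice, shown here purely as an assumption, is to weight each text's score by the magnitude value that the Natural Language API returns alongside it, so that strongly emotional texts count for more. Function and variable names are hypothetical.

```python
from collections import defaultdict


def weekly_weighted_sentiment(records):
    """Average sentiment per week, weighting each score by its magnitude.

    records: iterable of (week_label, score, magnitude).
    Weighting by magnitude is an assumption, not confirmed by the slides.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # week -> [sum(score * w), sum(w)]
    for week, score, magnitude in records:
        sums[week][0] += score * magnitude
        sums[week][1] += magnitude
    # A week whose total weight is zero is reported as 0.0 (neutral).
    return {week: (ws / w if w else 0.0) for week, (ws, w) in sums.items()}


# Made-up records, for illustration only.
records = [
    ("2018-W36", 0.5, 3.0),
    ("2018-W36", -0.5, 1.0),
    ("2018-W37", -0.2, 0.0),
]
averages = weekly_weighted_sentiment(records)
```

Running the same function separately on question texts and answer texts would give the two series compared in the bullets above.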
