Reflection

At the beginning of the Data Science for Smart Environments course, I set the three learning goals
mentioned above. To be honest, I did not set clear expectations or targets for this course, as I took it as
an extra course to fill my free time in the final month of my master’s study. In the end, I quite enjoyed
the course, revisited my interest in data science, and learned and applied something new about the topic.
Working in a group also made the coursework easier in such a short amount of time.

My first learning goal concerned data acquisition from social media. Since Indonesia’s presidential
election in 2019, I have been interested in the role social media plays in influencing how people decide
to vote for a presidential candidate. Through this interest, I was introduced to a project called
‘Drone Emprit’, which tried to find the relations between actors and certain issues during that election.
I read their work and found it interesting how they could map and cluster the actors involved in an
issue.

I think I came back to this data science and social media topic at the wrong time, because of a change in
regulations at X (formerly Twitter). Since Elon Musk bought Twitter, its API is no longer free of charge and
places more restrictions on collecting Twitter data. This is a pity, as Twitter data could serve as an
alternative data source that is interesting and useful for social science studies. It became the main
problem when I tried to collect data from Twitter through its API using a Python script. We spent almost
two weeks looking for an alternative source of social media data. I tried Mastodon, but it did not have
enough data about our issue compared to Twitter. Another alternative was Facebook, but it turned out that
the regulations and the access to its API are more complicated than Twitter’s. After that, we looked for
other ways to scrape data and found a scraping platform called Apify. On this platform, I found several
Twitter scrapers, and after trying three of them, I finally found one that fit our needs.

My reflection on this process is to always look for the simplest method first; in this case, looking for
a scraper tool at the start would probably have been useful. This avoids wasting too much time on data
collection, although that step sometimes does take most of the project time. We would then have more time
to explore different ways to learn from and analyze the data. One thing that needs to be noted is the
legality of such a scraper tool, especially if I conduct a more formal project on social media. It might
be best to simply include the API cost in the project budget.

My second learning goal was to increase my knowledge of data analysis. In our group discussion, we
agreed to do a sentiment analysis of the Twitter and Facebook data. Besides that, because of my goal, I
tried to add one more analysis, namely Social Network Analysis (SNA). This is the method I know from
‘Drone Emprit’, and I had always wondered how it is done. After several attempts with a Python script, I
did not succeed: the graph was still unclear and did not show the connections between users. I also tried
a software tool I found called ‘Gephi’, but it turned out that I first needed to process the Twitter data
into a structure that fits this software, and I did not have time to do that. As an alternative, I
compared two different datasets about ‘stikstofcrisis’, one from Google Trends and one from Twitter. After
that, I made a clustering analysis of the Twitter data using k-means clustering.
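
As an illustration of the kind of restructuring mentioned above, the following is a minimal sketch, not the script used in the project. It assumes each tweet is available as an (author, text) pair and treats every @mention as a directed edge from the author to the mentioned user. The sample tweets, user names, and function names are made up for the example. The output uses the column names Source, Target, and Weight, which Gephi recognizes when importing an edge table from CSV.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical pattern: any "@name" in the text counts as a mention.
# Note that it would also match the part after "@" in an e-mail address.
MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Count directed (author, mentioned_user) pairs across all tweets."""
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            if target.lower() != author.lower():  # skip self-mentions
                edges[(author.lower(), target.lower())] += 1
    return edges


def to_edge_csv(edges):
    """Serialize the edge counts as CSV text with Source, Target, Weight columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buf.getvalue()


# Made-up sample data, for illustration only.
sample_tweets = [
    ("alice", "@bob what do you think about the stikstofcrisis?"),
    ("bob", "@alice @carol it is complicated"),
    ("carol", "@bob agreed"),
    ("alice", "@bob thanks"),
]

edges = mention_edges(sample_tweets)
edge_csv = to_edge_csv(edges)
print(edge_csv)
```

Writing `edge_csv` to a file gives an edge list in which repeated mentions between the same two users are merged into one weighted edge, which tends to make the resulting graph easier to read.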

One of the most important things I learned during this text analysis is the importance of vectorizing the
text, especially when you want to build an SNA based on the text in the tweets. Learning by doing really
works for me, but a note to myself is to be patient at the beginning and read the basic theory behind a
method, so that I can find more effective ways to learn something new. More discussion with the group
would probably also help when I am stuck on one thing. For future work, I think SNA is important for
Twitter analysis, as it can map the opinions of users on certain issues, much like sentiment analysis but
with more presentable visualizations.
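
To make the idea of vectorizing text concrete, here is a minimal sketch using only the Python standard library. It is an illustrative bag-of-words example and not the method used in the project, which may well have relied on a library vectorizer. The example texts and function names are assumptions. Each text becomes a vector of word counts over a shared vocabulary, and cosine similarity then gives a number for how alike two texts are, which is the kind of input a clustering method such as k-means works on.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of the letters a-z as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(texts):
    """Turn each text into a word-count vector over a shared, sorted vocabulary."""
    vocab = sorted({token for text in texts for token in tokenize(text)})
    vectors = []
    for text in texts:
        counts = Counter(tokenize(text))
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up example texts, for illustration only.
texts = [
    "farmers protest nitrogen rules",
    "nitrogen rules anger farmers",
    "football match tonight",
]

vocab, vectors = vectorize(texts)
sim_related = cosine(vectors[0], vectors[1])
sim_unrelated = cosine(vectors[0], vectors[2])
print(sim_related, sim_unrelated)
```

The first two texts share three of their four words and so score much higher than the unrelated pair, which shares none. Raw counts are the simplest choice; weighting schemes such as TF-IDF follow the same pattern but down-weight words that appear in almost every text.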

My final goal was to develop a story from data analysis and learn how to communicate it. I did not do much
to achieve this goal, as my focus was more on my second goal. I only helped to write a description of the
analysis I had done, and I missed the opportunity to present the results to an audience. In the future, I
need more courage to do this.
