

TikTok and Education: Discovering Knowledge through Learning Videos

Angel Fiallos, IEEE Ecuador Section, Guayaquil, Ecuador, angel.fiallos@ieee.org
Carlos Fiallos, Escuela Superior Politecnica del Litoral (ESPOL), Guayaquil, Ecuador, cafiallos@espol.edu.ec
Stalin Figueroa, Universidad de las Fuerzas Armadas (ESPE), Latacunga, Ecuador, sgfigueroa@espe.edu.ec

Abstract— TikTok is a video-sharing social networking service that is rapidly growing in popularity. It was the second most downloaded app in the world in 2020. While the platform is known for having users post videos of themselves dancing, lip-syncing, or showing off other talents, videos of users sharing specific knowledge have increased because of initiatives such as #learnontiktok. This study aims to assess the types of knowledge and learning shared on TikTok and the profile of its authors. We collected a set of videos and, through an innovative framework implementation using computer vision, natural language processing, and machine learning techniques, we show the main teaching topics published in the #learnontiktok campaign and the disciplines with the highest audience engagement.

Keywords—TikTok, education, learning videos, computer vision, natural language processing, machine learning

I. INTRODUCTION

TikTok is a social network launched in the Chinese market in 2016 (as Douyin) and internationally in 2017 (as TikTok). In 2018, it was the most downloaded mobile app in the United States; it is currently available in more than 150 countries and has more than 800 million monthly active users [1]. Of these, 41% are between 16 and 24 years old [2], a younger demographic than we find on other social networks. TikTok allows users to create and share short videos (15 to 60 seconds) that are quick and easy to edit with various effects and sounds included in its gallery.

In May 2020, TikTok launched the LearnOnTikTok program, which consists of educational videos to facilitate learning during COVID-19 lockdowns [3]. These videos are authored by professionals from different disciplines, students, and other users, who have shared their knowledge with this social network's audiences [4]. The videos linked to the hashtag #learnontiktok cover varied topics, from chemistry experiments, cooking recipes, health tips, and learning other languages to creating origami figures, all created by its users.

Previous studies confirm that the use of video mini-lectures improves participants' satisfaction [5]. Storytelling becomes vital to connect these learning elements together, and videos provide a natural way to tell stories. The motivation of educational video is to engage and drive learners to feel ownership of their own learning and to associate themselves with the course's stories [6].

TikTok has the ideal format and tools for authors to create short educational videos. Currently, LearnOnTikTok videos have 72 billion views, and hundreds are uploaded every day. However, inferring the knowledge areas from the learning content available in posts can be challenging, since the message conveyed is also supported by a combination of music, audio, and text snippets included in the video. To achieve this goal, we propose a framework that includes steps such as gathering the TikTok videos and processing the text and audio metadata that is part of the videos using computer vision and audio recognition techniques. It also contemplates processes for classifying the collected information into knowledge categories using natural language processing and machine learning techniques.

Also, we researched the demographics of the authors and calculated statistics on the areas of knowledge most viewed by users. We explored the idea that the text metadata and audio available in the videos are the best predictors of the knowledge areas, much more so than the description of each video written by its author.

The rest of this work is structured as follows: Section 2 describes the related work, Section 3 presents the proposed methodology, Section 4 describes the results of the case study analysis, and Section 5 presents the conclusions and future work.

II. PREVIOUS WORKS

Medina et al. [7] explore TikTok videos related to US politics, evaluate the textual, aural, and visual information extracted from them, and analyze the different levels of communication made possible by the platform design, concentrating especially on TikTok's unique duet feature. Escamilla-Fajardo et al. [8] explore TikTok as a teaching-learning tool in the body expression courses of a sports science degree because of the ease with which the platform adapts to the subject's expressive and creative content through music and movement.

Dyer and Gottlieb [9] analyze TikTok's public health impact during the COVID-19 pandemic and how the medical community uses the unique characteristics of TikTok to deliver information to users, as well as for targeted training in medical education. Medical professionals have used the platform to share videos with the public on topics ranging from vaping-associated lung injury and the importance of a primary care doctor to videos explaining the symptoms of COVID-19. Ponzanelli et al. [10] developed a recommender system to predict relevant YouTube videos. In addition to the audio transcripts, they used an OCR tool to transfer the actual video information (e.g., slides or subtitles) into text. They focus on showing relevant StackOverflow posts for random YouTube videos and tutorials.



III. METHODOLOGY

A. TikTok Platform

TikTok offers users a unique way to share creative videos of themselves, their surroundings, or a collection of external audiovisual content. The most straightforward videos consist only of text superimposed on a colored background. Videos can be more complex by including images, video clips, and sounds. Images and video sequences can be modified using the application's voice effects, picture filters, and video speed controllers. The maximum length of a video post is 60 seconds, and a post can consist of a collection of videos that, combined, tell a story [11]. When users post videos, they can add a caption with hashtags to describe their clips. As on Twitter, the most commonly used hashtags represent topics that are trending on the platform, while on Instagram, videos are categorized according to their hashtags [12].

TikTok is considered a social media platform because, like Twitter and Instagram, its users have a social group of followers and other users they follow. However, the main feature that differentiates TikTok from other social networks is the videos' background music, which represents the core message that users want to convey. Users can choose the background music for their videos from a wide variety of music genres and can even create original sound clips. Any sound clip, including users' voice messages, can be selected by other users for their videos. In many videos, the sound is dance music, a lip-sync battle, or the backdrop to a comedy sketch. However, sound can also function as a story builder and can convey a specific message.

Users access content by viewing a feed of videos generated by an algorithm on the 'For You' page. Although it is not described how the algorithm works, the videos that appear to the user are primarily based on a central recommendation algorithm rather than on the user's social network activities.

B. Proposed Framework

Figure 1 represents the phases followed in our work. The pipeline starts with data collection from the TikTok platform. It then continues with processes for obtaining audio and text metadata from the video files. Finally, trained multi-label classification models are used to predict the knowledge areas of the educational videos.

Fig. 1. Framework for processing TikTok posts

C. Data Collection

We used scraping algorithms developed for this purpose. First, we selected a sample of 1495 TikTok posts using the hashtag #learnontiktok. The hashtag search yields a limited number of videos, and it is not clear how this limit is defined. Popularity may play a role, as a hashtag search showcases the most popular videos.

Next, we collected post metadata, such as the video file, post description, count of likes, date, number of views, author information, and profile picture. The period of the posts was from June 2020 to January 2021.
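For illustration only, the sketch below shows one way the collected post metadata could be organized after scraping. The record layout mirrors the fields listed above, while the scrape_hashtag helper is a hypothetical placeholder and not the exact script used in this study.

# Illustrative only: field names mirror the metadata described above;
# scrape_hashtag() is a hypothetical placeholder for the custom scraper.
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class TikTokPost:
    video_file: str           # local path of the downloaded video
    description: str          # caption written by the author
    likes: int
    views: int
    posted_at: datetime
    author_name: str
    profile_picture_url: str

def scrape_hashtag(tag: str, limit: int = 1500) -> List[TikTokPost]:
    """Placeholder for the custom scraping routine used to collect posts."""
    raise NotImplementedError

# Usage sketch: posts = scrape_hashtag("learnontiktok")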
D. Video Indexer Process

To get the audio and text information from the video files, we use the Microsoft Video Indexer platform [13]. Video Indexer uses AI to automate the extraction of relevant metadata from videos to make these processes more efficient. It can pull audio transcripts, detect faces within videos, and analyze text. Once the list of videos has been sent, the processed information is stored in a database together with the previous metadata. Fig. 2 shows an example of the text snippets present in the videos. This information and the transcription of the audio are acquired for further processing.

Fig. 2. TikTok video screenshots with text snippets
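As an illustration of this step, the following sketch outlines how a video could be submitted to Video Indexer and its index (transcript and on-screen text) retrieved through the service's REST interface. The endpoint paths, parameters, and the LOCATION, ACCOUNT_ID, and API_KEY values are assumptions to be checked against the current Video Indexer documentation; this is not the exact code used in the study.

# Hedged sketch of Video Indexer calls; endpoints and parameters are
# assumptions based on the public REST API and may differ in practice.
import time
import requests

LOCATION = "trial"            # placeholder service region
ACCOUNT_ID = "<account-id>"   # placeholder account identifier
API_KEY = "<subscription-key>"

def get_access_token() -> str:
    url = (f"https://api.videoindexer.ai/Auth/{LOCATION}/Accounts/"
           f"{ACCOUNT_ID}/AccessToken")
    resp = requests.get(url, params={"allowEdit": "true"},
                        headers={"Ocp-Apim-Subscription-Key": API_KEY})
    resp.raise_for_status()
    return resp.json()

def index_video(video_url: str, name: str, token: str) -> dict:
    base = f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}"
    # Submit the video for indexing.
    upload = requests.post(f"{base}/Videos",
                           params={"accessToken": token, "name": name,
                                   "videoUrl": video_url})
    upload.raise_for_status()
    video_id = upload.json()["id"]
    # Poll until processing finishes, then fetch the index (transcript + OCR).
    while True:
        index = requests.get(f"{base}/Videos/{video_id}/Index",
                             params={"accessToken": token}).json()
        if index.get("state") == "Processed":
            return index
        time.sleep(30)

# Usage sketch (placeholder URL):
# token = get_access_token()
# index = index_video("https://.../video.mp4", "post_001", token)
# transcript = " ".join(b["text"] for b in
#                       index["videos"][0]["insights"].get("transcript", []))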
E. Multi-Label Categorization

Following the Kim approach [14], we designed a simple CNN composed of an input layer with five different n-gram window sizes and one layer of convolution on top of word vectors obtained from the Word2Vec unsupervised neural language model [15]. The training dataset was composed of text sentences tagged in 20 specific areas from Wikipedia, such as medicine, food and drink, legal, physics, and chemistry, among others.

To run the experiment, we first trained the text sentence dataset using 100-dimensional word2vec embeddings. Next, we used the Gensim library [16] and the Keras framework [17] to build the convolutional neural network. During training, the model was configured to divide the data into training and test sets in a ratio of 80/20. The model evaluates itself after every epoch and adjusts its parameters according to its loss function. The result is a set of parameters with a particular ability to classify new values, and the validation accuracy measures this ability. The network tries to predict 0 or 1 values for every label, and the model uses the confidence values to produce a ranking. Finally, we used a sigmoid activation function to treat the labels independently. The trained multi-label text classification model was then applied to the keywords identified by the Video Indexer process.
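A minimal sketch of this kind of architecture is shown below, assuming pre-tokenized training sentences and multi-hot label vectors for the 20 areas. The tiny example data, filter count, and other hyperparameters are illustrative assumptions rather than the exact configuration used in this study; only the overall structure (word2vec embeddings, five kernel sizes, sigmoid outputs, 80/20 split) follows the description above.

# Illustrative Kim-style CNN for multi-label text classification
# (Gensim word2vec embeddings + Keras); sizes are assumptions.
import numpy as np
from gensim.models import Word2Vec
from tensorflow.keras import layers, models

sentences = [["heart", "attack", "symptoms"], ["baking", "bread", "recipe"]]
labels = np.array([[1] + [0] * 19, [0, 1] + [0] * 18])  # 20 multi-hot labels

EMB_DIM, MAX_LEN, NUM_LABELS = 100, 50, 20
w2v = Word2Vec(sentences, vector_size=EMB_DIM, window=5, min_count=1)

vocab = {w: i + 1 for i, w in enumerate(w2v.wv.index_to_key)}  # 0 = padding
emb_matrix = np.zeros((len(vocab) + 1, EMB_DIM))
for word, idx in vocab.items():
    emb_matrix[idx] = w2v.wv[word]

def encode(tokens):
    ids = [vocab.get(t, 0) for t in tokens][:MAX_LEN]
    return ids + [0] * (MAX_LEN - len(ids))

X = np.array([encode(s) for s in sentences])

inp = layers.Input(shape=(MAX_LEN,))
emb = layers.Embedding(len(vocab) + 1, EMB_DIM,
                       weights=[emb_matrix], trainable=False)(inp)
# One convolution per n-gram window size, then max-over-time pooling.
pooled = [layers.GlobalMaxPooling1D()(
              layers.Conv1D(64, k, activation="relu")(emb))
          for k in (1, 2, 3, 4, 5)]
merged = layers.Concatenate()(pooled)
out = layers.Dense(NUM_LABELS, activation="sigmoid")(merged)  # independent labels

model = models.Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, labels, epochs=5, validation_split=0.2)  # 80/20 split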
F. Exploratory and Demographic Analysis

TikTok videos are rich in features, and extra pre-processing steps were required to extract the information for analysis. From the original videos whose profiles included the user's face, we took the profile images and processed them via Microsoft's Azure Face API [18], which allows gender and age to be extracted. Once the process finished, we selected the photos in which the exposure value was greater than 0.5 and the gender and age properties could be detected. Fig. 3 shows a response from the Face API.

Fig. 3. Face detection algorithm applied to a user profile photo
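The sketch below illustrates this step with a detection call requesting age, gender, and exposure attributes. The endpoint form and attribute names are assumptions based on the public Face API v1.0 documentation and should be verified against the current service; FACE_ENDPOINT and FACE_KEY are placeholders.

# Hedged sketch of an Azure Face API detection call; endpoint path and
# returnFaceAttributes values are assumptions to verify against the docs.
import requests

FACE_ENDPOINT = "https://<resource-name>.cognitiveservices.azure.com"  # placeholder
FACE_KEY = "<subscription-key>"                                        # placeholder

def detect_age_gender(photo_url: str) -> list:
    resp = requests.post(
        f"{FACE_ENDPOINT}/face/v1.0/detect",
        params={"returnFaceAttributes": "age,gender,exposure"},
        headers={"Ocp-Apim-Subscription-Key": FACE_KEY},
        json={"url": photo_url},
    )
    resp.raise_for_status()
    return resp.json()  # one entry per detected face

# Usage sketch: keep only profiles with a detected face and good exposure.
# faces = detect_age_gender("https://.../profile.jpg")
# usable = [f["faceAttributes"] for f in faces
#           if f["faceAttributes"]["exposure"]["value"] > 0.5]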

We also explored the most relevant terms in users' video descriptions. For this purpose, we used natural language processing and data mining techniques.

IV. RESULTS AND DISCUSSION

Figure 4 shows a word cloud with the most relevant terms related to the video descriptions registered by the authors. Only terms identified as nouns, through part-of-speech tagging algorithms [19], were selected for analysis. Some words such as "psychology", "food", "life", "amazon", and "fun" could be identified, which give a weak idea about the topics related to the educational videos.

Fig. 4. Word cloud based on the videos' descriptions
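As an illustration of this filtering step, the sketch below keeps only noun tokens from example descriptions before counting term frequencies; NLTK's tagger is used here as a convenient stand-in for the tagger cited in [19], and the sample descriptions are invented.

# Illustrative noun filtering for the word cloud; NLTK is a stand-in
# for the part-of-speech tagger cited in [19].
from collections import Counter
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

descriptions = [
    "Quick psychology facts about your brain #learnontiktok",
    "Easy food hacks for a fun life",
]

noun_counts = Counter()
for text in descriptions:
    tokens = nltk.word_tokenize(text.lower())
    tagged = nltk.pos_tag(tokens)
    noun_counts.update(tok for tok, tag in tagged if tag.startswith("NN"))

print(noun_counts.most_common(10))  # candidate terms for the word cloud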

Next, a total of 542 unique photos of user profiles were retrieved from the TikTok post collection. The Face API process was applied to the profile photos for the recognition of facial properties. For the rest of the profile photos, the gender and age properties could not be identified because, among other reasons, they did not show the user's face, belonged to business profiles, or had low quality. Table 1 shows the percentages belonging to the user gender groups, and Table 2 shows the percentages belonging to the user groups by age range.

TABLE 1. PERCENTAGES OF DETECTED GENDER

Gender    Count    Percent
Female    249      45.95%
Male      292      54.05%

TABLE 2. PERCENTAGES OF DETECTED AGE RANGES

Age        Count    Percent
Under 17   27       5.16%
18 – 34    420      77.49%
35 – 55    89       16.42%
56 – 90    6        0.01%

Then, the tags belonging to the elements identified by the video indexer process were selected for each of the videos. Terms such as "person," "text," "indoor," "clothing," and "hair" can be identified, which relate to characteristics of the author and the background but give only a weak idea about the topics and areas of knowledge covered in the video. Figure 5 shows a histogram with the most relevant keywords.

Fig. 5. Histogram with the most relevant labels from the videos

Next, we applied the multi-label classification model to the keywords obtained from the identification of the text snippets and the text transcription of the audio to assign the knowledge areas to each video. The model returns a set of label probabilities in JSON format, and the following knowledge areas were identified as having the highest counts. It can be seen in Figure 6 that Medicine, Food and Drink, Health, Cooking, Biology, and Chemistry, among others, are the most relevant categories.

Fig. 6. Histogram with the most relevant knowledge areas
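For illustration, the sketch below shows one way the classifier's sigmoid outputs could be turned into ranked knowledge areas per video. The 0.5 threshold, the label subset, and the example scores are assumptions for the sake of the example, not the exact post-processing used in this study.

# Illustrative post-processing of sigmoid outputs into ranked areas;
# threshold, labels, and example scores are assumptions.
import json
import numpy as np

AREA_LABELS = ["Medicine", "Food and Drink", "Health", "Chemistry"]  # subset

def rank_areas(probabilities, threshold=0.5):
    """Return areas sorted by confidence, keeping those above the threshold."""
    ranked = sorted(zip(AREA_LABELS, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return [(area, float(p)) for area, p in ranked if p >= threshold]

# probs = model.predict(encoded_keywords)[0]  # sigmoid score per label
probs = np.array([0.91, 0.12, 0.67, 0.05])    # example output
print(json.dumps(rank_areas(probs), indent=2))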
Finally, we associated the knowledge areas with the number of likes for each video in order to establish the categories with the highest user engagement. The results are shown in Table 3. Medicine, Food and Drink, Health, Chemistry, and Technology are the areas with the highest engagement from the TikTok audience.

TABLE 3. PERCENTAGES OF KNOWLEDGE AREAS WITH THE HIGHEST ENGAGEMENT

Knowledge Area              Count Likes    Percentage
Medicine and Healthcare     92734900       16.99%
Food & Drink                73824900       13.53%
Health                      58686800       10.75%
Science/Chemistry           41561300       7.62%
Technology/Engineering      35218500       6.45%
Cooking/Recipes             34245200       6.28%
Science/Physics             31638100       5.80%
Dating and Relationships    29066800       5.33%
Human Biology               29032700       5.32%
Science/Chemistry           26269400       4.81%
Science/Food Science        24111400       4.42%
Fashion and Style           23957100       4.39%
Environment                 22829600       4.18%
Education                   22558900       4.13%
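A minimal sketch of this aggregation is shown below, assuming a table of per-video predictions with a dominant knowledge area and a like count; the column names and example values are hypothetical.

# Illustrative engagement aggregation; the DataFrame columns are
# hypothetical stand-ins for the per-video predictions and like counts.
import pandas as pd

videos = pd.DataFrame({
    "knowledge_area": ["Medicine and Healthcare", "Food & Drink",
                       "Medicine and Healthcare", "Health"],
    "likes": [1_200_000, 800_000, 450_000, 300_000],
})

likes_by_area = (videos.groupby("knowledge_area")["likes"].sum()
                       .sort_values(ascending=False))
share = (likes_by_area / likes_by_area.sum() * 100).round(2)

print(pd.DataFrame({"Count Likes": likes_by_area, "Percentage": share}))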
This study is not without limitations. It was conducted on a limited number of videos, and validation of the videos by a multidisciplinary group of experts, as ground truth, would have been desirable. However, as an exploratory study, it allows us to get adequate insight into the areas of science to which each video refers.

V. CONCLUSIONS

TikTok is a platform that, in addition to entertaining videos, gives a young and global audience access to a new format of short educational videos created by expert authors. This perspective presents several opportunities for the dissemination of knowledge in various fields of science concisely and effectively.

The proposed framework allows us to automatically identify the main areas of knowledge associated with educational videos on the TikTok platform and which areas are most preferred by users. This information would allow us to direct efforts toward important knowledge areas that are not yet widely covered or have few content creators. In our sample, we found a more extensive collection of videos related to the health sciences and STEM areas, even higher than social sciences such as law and education, which gives an impression of the potential of this type of video for science learning. The study also confirmed that most authors are people under the age of 34, who also represent the largest audience on the social network.

This study also supports the idea that the audio and text metadata available in short TikTok videos contain concepts that give a better understanding of the videos' learning topics than even the descriptions registered by the authors. In future work, we will explore how to measure the understanding of the videos' message from both approaches.

VI. REFERENCES

[1] A. Meola, "Analyzing Tik Tok user growth and usage patterns in 2020," 2020. [Online]. Available: https://www.businessinsider.com/tiktok-marketing-trends-predictions-2020.
[2] E. Wang, A. Alper, G. Roumeliotis and Y. Yang, "U.S. opens national security investigation into TikTok," Reuters, 2019.
[3] A. Hutchinson, "TikTok Announces #LearnOnTikTok Initiative to Encourage Education During Lockdowns," Social Media Today, 2020.
[4] TikTok, "Refreshing our policies to support community well-being," [Online]. Available: https://newsroom.tiktok.com/en-us/refreshing-our-policies-to-support-community-well-being. [Accessed 2021].
[5] W.-J. Hsin, "Short videos improve student learning in online education," Journal of Computing Sciences in Colleges, 2013.
[6] Y. Guseva and T. Kauppinen, "Learning in the Era of Online Videos: How to Improve Teachers' Competencies of Producing Educational Videos," in Fourth International Conference on Higher Education Advances, 2018.
[7] J. Medina, O. Papakyriakopoulos and S. Hegelich, "Dancing to the Partisan Beat: A First Analysis of Political Communication on TikTok," in 12th ACM Conference on Web Science (WebSci '20), 2020.
[8] P. Escamilla-Fajardo, "Incorporating TikTok in higher education: Pedagogical perspectives from a corporal expression sport sciences course," Journal of Hospitality, Leisure, Sport & Tourism Education, 2021.
[9] G. Dyer and M. Gottlieb, "Is TikTok The Next Social Media Frontier for Medicine?," AEM Education and Training, 2020.
[10] L. Ponzanelli, G. Bavota and A. Mocci, "CodeTube: Extracting relevant fragments from software development video tutorials," in Proceedings of the 38th International Conference on Software Engineering Companion (ICSE '16), New York, NY, USA, 2016.
[11] R. Broderick, "Forget The Trade War. TikTok Is China's Most Important Export Right Now," BuzzFeed News, 2019.
[12] B. Chandlee, "Understanding our policies around paid ads," TikTok Newsroom, 2019. [Online]. Available: https://newsroom.tiktok.com/en-us/understanding-our-policies-around-paid-ads. [Accessed 2020].
[13] Microsoft Azure, "Video Indexer," [Online]. Available: https://azure.microsoft.com/es-es/services/media-services/video-indexer/. [Accessed 2021].
[14] Y. Kim, "Convolutional Neural Networks for Sentence Classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 2014.
[15] N. Azam and J. Yao, "Comparison of term frequency and document frequency based feature selection metrics in text categorization," Expert Systems with Applications, vol. 39, pp. 4760-4768, 2012.
[16] R. Řehůřek and P. Sojka, "Software Framework for Topic Modelling with Large Corpora," in Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, Malta, 2010.
[17] F. Chollet, "Keras," 2015. [Online]. Available: https://github.com/fchollet/keras.
[18] Microsoft, "Microsoft Cognitive Services," [Online]. Available: https://azure.microsoft.com/es-es/services/cognitive-services/. [Accessed 2019].
[19] K. Toutanova, D. Klein, C. Manning and Y. Singer, "Feature-rich part-of-speech tagging with a cyclic dependency network," in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pp. 173-180, 2003.
