
Capturing social media data with Zeeschuimer and 4CAT
New Media Reference Worksheet
Department of Media Studies
University of Amsterdam
tinyurl.com/nmrw-zeeschuimer-tiktok
Version: October 2023

www.uva.nl/en/disciplines/media-studies
www.mediastudies.nl/

By Stijn Peeters
This document is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
License (CC BY-NC-ND 4.0)
In this worksheet you will learn:
● How to design a 'query' with which to capture the right social media items;
● What TikTok's affordances for demarcating a dataset are;
● How to use the Zeeschuimer browser extension to capture data from the web interface of TikTok and other
platforms;
● How to export the data to the 4CAT research tool for further analysis and processing.

Preface
TikTok has rapidly become one of the most important social media platforms of the 2010s and
2020s. Though commentators took their time adjusting to the idea that TikTok could be more than
'just a dancing app', the platform has been used in many other ways virtually since its inception,
as a place to comment on current events, share news and generally do what people do on social
media.

This makes it, and platforms like it (such as its China-focused cousin Douyin), an attractive site for
digital methods research. However, TikTok and many other contemporary platforms actively
resist data capture: they provide no public API, block tools that automate browsing of the
platform, and their terms of service usually forbid any form of automated data capture for any purpose.

Nevertheless, TikTok does have a fully functional website on which the platform can be browsed
and explored. It is therefore always possible to use the website to collect data manually,
copying information from your browser and downloading relevant pictures or videos. Or… you
can have a computer do this for you. It is possible to make your browser 'look over your shoulder'
while you are browsing the website, and automatically record the metadata of posts you see,
and then export it as a dataset you can analyse in any number of ways.

The goal of this worksheet is to explain how to do this. At the end, you will know how to set up a
browser to 'record' TikTok data and export it in such a way that you can analyse it with other research
tools or open it as a spreadsheet.

The tools discussed in this worksheet are not just compatible with TikTok; they can also be used to
capture data from other platforms (including e.g. Instagram and Twitter). Although this worksheet
focuses on TikTok, the methods and tools it covers apply to those platforms as well. See the
website of the relevant tool, Zeeschuimer, for a list of platforms you can use it with as well as a
list of known limitations.
1. Design your query
You may have done social media research before; in many cases, the tools for this allow you to
enter some sort of query - for example a username, keyword or hashtag - and then produce
relevant results. That approach, which is in many ways similar to how a search engine works, does
not apply in this case. TikTok offers no such functionality to build a tool around. Instead, in the
method discussed here, you manually navigate to a relevant TikTok page, after which the
displayed items will be 'recorded' by your browser, and you can then download or export the
metadata of the posts you've seen for analysis. In other words, you could say that in this case, you
are the search engine.

Nevertheless, you still need a 'query': a method for making TikTok show you the posts you are
interested in, so they can be captured. The query is then not (only) a text phrase you enter in a
search engine, but rather a strategy for making TikTok show you these posts. For example, if you
want to find out how TikTok responds to a particular politician, you need a strategy for finding
videos related to that politician. For this, you can make use of the various 'views' TikTok offers that
allow you to browse posts on the platform.

There is of course the 'feed', or the famous For You Page, the infinitely scrolling list of videos you
see on TikTok's front page or when you open the app. If you are interested in auditing the TikTok
algorithm, simply collecting videos from the For You Page could be enough. You can then analyse
what type of video gets recommended to you and why. But in many cases, you want to demarcate
your dataset a bit more specifically. There are a couple of other views TikTok offers on its content:

● A user's profile page, which lists their videos in reverse chronological order;
● A hashtag's page, which lists videos using that hashtag in some sort of algorithmic order.
Videos near the top are usually, but not always, more recent and/or have more views;
● The page for a particular sound, which lists videos using that sound in the same 'semi-
popular' order as hashtags;
● Search results for a particular keyword query. You can search for an arbitrary phrase on
TikTok, which will show you videos that use the phrase in their description or hashtags.

Note that some of these require you to log in at https://www.tiktok.com - so do this before starting
your data collection. Then think about how you can use these views to find relevant posts. For
example, if you're interested in the spread of a particular meme, maybe the page for the sound
associated with it is a good place to start; possibly in combination with a related hashtag. You can
start by compiling a list of relevant users, hashtags or sounds, which you will then browse later to
capture the relevant data.
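
A query design like this can be written down as a simple list of pages to visit in your research browser. The Python sketch below is illustrative only: `build_capture_urls` is a hypothetical helper, and while the hashtag URL pattern matches the #ukraine example later in this worksheet, the profile (`/@username`) and search (`/search?q=`) patterns are assumptions about TikTok's website at the time of writing. Sound pages are left out, because their URLs contain an ID you would copy from the browser.

```python
from urllib.parse import quote

BASE = "https://www.tiktok.com"


def build_capture_urls(users=(), hashtags=(), keywords=()):
    """Turn a query design into a list of TikTok pages to browse manually.

    Hypothetical helper: the profile and search URL patterns are assumptions
    and may change whenever TikTok changes its website.
    """
    urls = []
    for user in users:
        # Profile page: the user's videos in reverse chronological order
        urls.append(f"{BASE}/@{user.lstrip('@')}")
    for tag in hashtags:
        # Hashtag page: videos in a 'semi-popular' algorithmic order;
        # non-ASCII tags such as 'україна' are percent-encoded
        urls.append(f"{BASE}/tag/{quote(tag.lstrip('#'))}")
    for phrase in keywords:
        # Keyword search (assumed URL pattern)
        urls.append(f"{BASE}/search?q={quote(phrase)}")
    return urls
```

For instance, `build_capture_urls(hashtags=["#ukraine", "україна"])` gives two hashtag pages you can then open one by one while Zeeschuimer is recording.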

It may seem attractive to use the keyword search feature to find relevant videos, but consider that
TikTok posts are primarily video and sound content, and a keyword search only looks at the
video's description and similar textual content. These often contain relatively little information. So
instead, try to think about how you can use the affordances of TikTok's other features to find
relevant videos; or for other platforms, take stock of the views the platform offers first, and then
consider which are useful.

2. Preparing

Installing Zeeschuimer
Once you know (roughly) what you will be looking for, you can start capturing some data to test if
your query design works. To capture the posts you view on TikTok, you can use Zeeschuimer, a
browser extension made for that purpose. Zeeschuimer is currently only compatible with the
Firefox browser. Firefox can be downloaded for free from https://getfirefox.com. It is often a good
idea to set up a separate 'research browser', so that e.g. your browsing history or existing
cookies do not interfere with data collection. If you are already using Firefox as your main browser and
want to set up a research browser, you can also install Firefox Developer Edition instead, to have a
fully separate browser just for research.

Install Zeeschuimer from https://tools.digitalmethods.net/zeeschuimer. In the pop-up that appears,
click 'Continue to Installation' and then 'Add' in the next screen, where it asks for
permissions. If you want to know what Zeeschuimer does with these permissions, and what it does
with the data it captures, you can read a technical description on GitHub (where you can also find
the full source code of the extension).

After installing Zeeschuimer, there will be a new button in your browser toolbar you can click to
open the interface.

Note that Zeeschuimer does not work in Private Browsing/Incognito mode (consider using a
Research Browser instead). Clicking the Zeeschuimer button opens an interface where you can
keep track of how many items have been captured, and do something with the captured items.
When you first install Zeeschuimer, data capture is disabled. Once you're ready to start browsing
TikTok and capturing the posts you see, enable it with the toggle switches at the top of the
interface in the 'active' column.

Connecting Zeeschuimer to 4CAT


After capturing data with Zeeschuimer, you can export it directly from the interface as an NDJSON
file, but this is not an easy format to work with. If you want to download the captured data as a
CSV file, or process it in other ways (for example to download thumbnails of all videos), you can
upload the captured data to 4CAT, the capture & analysis toolkit to which Zeeschuimer is a
companion. If you do not have access yet, ask your instructor for guidance.
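
To see why NDJSON is awkward: each line of the file is a separate JSON object, usually with nested fields, rather than a row in a flat table. The sketch below flattens such a file into a CSV using only the Python standard library. It assumes that each line wraps the original post in a `data` key and that TikTok posts have fields such as `id`, `desc`, `createTime` and `author.uniqueId`; these key names are assumptions about Zeeschuimer's and TikTok's internal formats, which can change, so check them against your own export. Uploading to 4CAT remains the more robust route.

```python
import csv
import json

# Hypothetical mapping from CSV column to a dotted path inside each post;
# the key names are assumptions about TikTok's data format.
FIELDS = {
    "id": "id",
    "author": "author.uniqueId",
    "description": "desc",
    "created": "createTime",
}


def get_path(obj, dotted):
    """Follow a dotted path such as 'author.uniqueId'; return '' if absent."""
    for key in dotted.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return ""
        obj = obj[key]
    return obj


def ndjson_to_csv(ndjson_lines, out):
    """Write one CSV row per non-empty NDJSON line to the file-like `out`."""
    writer = csv.DictWriter(out, fieldnames=list(FIELDS))
    writer.writeheader()
    for line in ndjson_lines:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        record = json.loads(line)
        post = record.get("data", record)  # assumed wrapper key
        writer.writerow({col: get_path(post, path) for col, path in FIELDS.items()})
```

To convert an export, open the `.ndjson` file for reading and a new CSV file for writing (with `newline=""`), and pass both to `ndjson_to_csv`.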

If you do have access to 4CAT, make sure you are logged in to it in the same browser you are
capturing your data in. To connect Zeeschuimer to 4CAT, you then need to enter the URL of the
4CAT interface you are using in the field at the top of the extension interface.

If you're not sure what to enter here, simply copy the URL of the 'Create Dataset' page in 4CAT into
this field. You are now ready to start capturing and analysing your data.
3. Capturing data
To start collecting data, simply browse TikTok in such a way that it shows you relevant videos
according to your query design. For example, go to the page for a particular hashtag, and wait for
it to finish loading. If you check the Zeeschuimer interface, you will see that the TikTok posts listed
on the website have been captured.

If you want to start over, simply use the 'Delete' button to remove all data captured so far.

Many pages allow you to load more items, either by scrolling to the bottom of the page or by
clicking a 'Load More' button. Use this to e.g. capture all relevant videos for a hashtag, by simply
scrolling down until no further videos can be loaded. If you are tired of scrolling (tip: your space
bar can be held down to scroll), take a look at FoxScroller, another browser extension that adds a
button to your browser toolbar with which you can automatically keep scrolling down on a
page.

Once you are satisfied with the posts you've seen, you can (as discussed) download the 'raw' data
you captured with the '.ndjson' button; or you can upload the dataset to 4CAT for further
processing and analysis. In the latter case, make sure that the relevant URL is entered in the '4CAT
server URL' field, and click the 'to 4CAT' button to upload the data.

The interface will keep you updated on the status of the upload. When finished, click the 'View
dataset' link to open the data in the 4CAT interface. Links are also available in the 'Uploaded
datasets' panel at the bottom of Zeeschuimer's interface.

Note that posts are captured in the order you see them in your browser. That can be relevant
particularly if you are interested in TikTok's algorithm; you can, for example, capture posts from
the For You Page and see what the algorithm ranks highest. The order of posts is also important if
you are analysing how TikTok engages with e.g. a political topic or current event. It is tempting to
assume, as is the case with many other tools, that the further down a post is in your dataset, the
more recent it is. On TikTok, this is usually not the case (see section 1 for some guidance on how items are
sorted in different views). You may want to sort the posts manually after capturing by whatever
metric is relevant to you (e.g. date), as part of your analysis.
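
Sorting afterwards can be done in any spreadsheet program; as a scripted alternative, the sketch below re-orders the rows of a 4CAT CSV export chronologically while keeping each post's original capture position in a new `capture_position` column, so you can compare TikTok's ordering with the chronological one. It assumes a column named `timestamp` holding dates formatted as `YYYY-MM-DD HH:MM:SS`; both the column name and the format are assumptions, so check the header of your own file first.

```python
from datetime import datetime


def sort_by_date(rows, column="timestamp", fmt="%Y-%m-%d %H:%M:%S"):
    """Return rows sorted oldest-first, recording their original capture order.

    `column` and `fmt` are assumptions about the 4CAT export; adjust them
    to match your file. The input rows are not modified.
    """
    # Position 1 is the first post you saw in the browser
    numbered = [dict(row, capture_position=i) for i, row in enumerate(rows, start=1)]
    return sorted(numbered, key=lambda row: datetime.strptime(row[column], fmt))
```

For an open CSV file `f`, call `sort_by_date(csv.DictReader(f))` after `import csv`.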
4. Analysis with 4CAT
Once you've created a dataset, there are a couple of ways you can start making sense of it. If
you've uploaded the data to 4CAT, you can go to the result page for the uploaded dataset to
inspect the data and perform relevant analyses. You can get there via the 'View dataset' link from
Zeeschuimer, or the 'Datasets' page in the 4CAT interface.

A good first step is to just take a look at the data; either with the 'Preview' button or by
downloading the data as a CSV file and opening it in e.g. Numbers, Excel or Google Sheets. This
will show you the metadata of each captured post. This metadata can be useful to get a sense of
what is interesting in your dataset. For example, you can take a look at the 'author' column to see
which accounts show up more often than others; or take a look at 'music_name' to see if there are
particular sounds that are used often and may be worth looking into further; or sort the data by
the 'likes' column to see which posts got the most engagement. The URL for each post is
furthermore recorded in the 'tiktok_url' column, and you can go there to review a particular post
from the data in its natural habitat.
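
The same first look can be scripted, which helps when you re-export the dataset several times. This minimal sketch reads a 4CAT CSV export and reports the most frequent authors and sounds and the most-liked posts, using the `author`, `music_name`, `likes` and `tiktok_url` columns mentioned above. It assumes `likes` holds whole numbers, and the function name `first_look` is hypothetical.

```python
import csv
from collections import Counter


def first_look(csv_file, top=5):
    """Summarise a 4CAT TikTok CSV export (an open file or a list of lines)."""
    rows = list(csv.DictReader(csv_file))
    authors = Counter(row["author"] for row in rows).most_common(top)
    # Ignore posts without a named sound
    sounds = Counter(row["music_name"] for row in rows if row["music_name"]).most_common(top)
    # Sort by engagement; assumes 'likes' contains whole numbers (empty = 0)
    most_liked = sorted(rows, key=lambda row: int(row["likes"] or 0), reverse=True)[:top]
    return {
        "authors": authors,
        "sounds": sounds,
        "most_liked": [(row["tiktok_url"], int(row["likes"] or 0)) for row in most_liked],
    }
```

Open the exported file with `newline=""` and pass the file object to `first_look`.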

Based on these first impressions, you may want to extend your query, and visit a couple of extra
pages on TikTok or scroll a bit further before you export the dataset again. Or conversely, maybe
some of the posts are not relevant, in which case you can use the 'Clear' button in Zeeschuimer to
start capturing a new dataset which you can then also upload to 4CAT.

If you are happy with your dataset, you can run a number of analyses on the data with 4CAT to get
further insights. While the goal of this worksheet is not to give a comprehensive overview of all of
4CAT's features, the following processors may be of special interest when dealing with TikTok
data:

TikTok datasets contain links to e.g. thumbnails and video files. Due to how
TikTok works, these expire; they will work for a couple of hours after capture, but
may become unavailable soon after. If you plan to use these URLs to e.g. download
images or videos, do so as soon as possible after capturing the data.
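
If you want to fetch the files yourself rather than through 4CAT, do it in one go right after capture. The sketch below downloads the URL in each row's `thumbnail_url` column and returns the IDs that failed (for example because the link has already expired). The `id` column, the `.jpg` file naming and the injectable `fetch` parameter (by default a thin wrapper around `urllib.request.urlopen`) are assumptions made for illustration.

```python
import urllib.request
from pathlib import Path


def default_fetch(url, timeout=30):
    """Return the bytes at `url` using the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_thumbnails(rows, folder, fetch=default_fetch):
    """Save each row's thumbnail as <id>.jpg in `folder`; return the ids that failed.

    Assumes 'id' and 'thumbnail_url' columns; the .jpg extension is a guess.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    failed = []
    for row in rows:
        url = row.get("thumbnail_url")
        if not url:
            failed.append(row.get("id"))
            continue
        try:
            data = fetch(url)
        except (OSError, ValueError):
            # URLError/HTTPError (e.g. an expired link) are subclasses of OSError;
            # ValueError covers malformed URLs
            failed.append(row.get("id"))
            continue
        (folder / f"{row['id']}.jpg").write_bytes(data)
    return failed
```

A non-empty return value soon after capture usually means the links have started to expire, which is a signal to re-capture rather than retry later.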

● Download Images can download thumbnails of each post, i.e. the first frame of the video.
The thumbnails (linked in the 'thumbnail_url' column of the dataset) can then be
downloaded as separate images in an archive file, or you can process them further, for
example by creating an Image Wall of the results or (for more advanced users) sorting
them with PixPlot or annotating them automatically with the Google Vision API.
● A Co-tag network can reveal which hashtags are used in the dataset, and whether there are
clusters of hashtags that are used together more often (which can indicate that there are
multiple 'communities' in your data that may be worth investigating on their own).
The resulting .gexf file can be opened in Gephi for thorough analysis or visualised directly in
4CAT. This works best with larger datasets of a couple of hundred posts or more.
● You can analyse the post descriptions as text with e.g. the Word Tree processor, to see in
what contexts a particular word is used; or Merge texts to combine all descriptions into
one long text string you can enter in e.g. a word cloud generator or Jason Davies' word
trees tool.
● You can Filter by date if you are interested in TikTok responses in relation to a particular
event, or to compare between data from different points in time.
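
To make the idea behind a co-tag network concrete: two hashtags are connected when they appear on the same post, and the edge weight counts how many posts they share. The sketch below computes those weighted edges from the values of a `hashtags` column. It assumes the column holds comma-separated tags (check your export), and it is a simplified illustration rather than 4CAT's own implementation.

```python
from collections import Counter
from itertools import combinations


def cotag_edges(hashtag_cells):
    """Count how often each pair of hashtags occurs on the same post.

    `hashtag_cells` is an iterable of strings such as "ukraine,news,war";
    the comma-separated format is an assumption about the export.
    """
    edges = Counter()
    for cell in hashtag_cells:
        # Normalise case, strip '#', drop empty values and duplicates within a post
        tags = sorted({t for t in (raw.strip().lstrip("#").lower() for raw in cell.split(",")) if t})
        for pair in combinations(tags, 2):
            edges[pair] += 1  # undirected edge, stored in alphabetical order
    return edges
```

To open the result in Gephi, write one row per pair to a CSV file with `Source`, `Target` and `Weight` columns and load it with Gephi's spreadsheet import.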

Don't be afraid to start over multiple times. It often takes a couple of tries to find the right query
and know e.g. what hashtags to browse. Use your initial dataset to fine-tune your approach and
demarcate the case study.
5. A step-by-step example
Here's a step-by-step example of how you can put the above into practice. It assumes you've
followed the instructions in section 2, 'Preparing', so Zeeschuimer is installed in Firefox and linked
to 4CAT.

Our object of study for this example is the hashtag #ukraine. At the time of writing, the war in
Ukraine is one of the most prominent current events, and it is worth examining how people engage with this
topic on TikTok. Who is talking about this? What are they saying? Such questions can be answered
in a data-driven way by capturing the posts from TikTok and then analysing them.

1. Go to https://www.tiktok.com, and log in.

2. Open the Zeeschuimer interface with the button in the browser toolbar. Use the
'Delete all items' button to make sure you start with a blank slate, and enable data
collection by toggling the switch next to 'tiktok.com' at the top.

3. Navigate to the page you want to capture the data from, which is the hashtag page for
#ukraine in this case, at https://www.tiktok.com/tag/ukraine.

4. If you keep the Zeeschuimer interface open in a separate tab, you should see items coming
in. This is just for convenience - items will be captured whether you have the interface
open or not.

5. Scroll down on the TikTok page until you have a couple of hundred items. You can hold the
spacebar to scroll quickly, or use the FoxScroller browser extension.

6. In the Zeeschuimer interface, click the 'to 4CAT' button. The dataset will upload to 4CAT.
When it's done, click the 'View dataset' link in Zeeschuimer.

7. Click 'Preview' in the 4CAT interface to take a quick first look at the data, to get a sense of
what you are dealing with. For example, look at the 'author' column to see if there are
particular users that come up often, or the 'body' column to see what languages are
represented in the dataset. Are there other hashtags or particular users that would
perhaps be interesting to add to the dataset for a more complete picture of the discourse?

8. Let's take a closer look at the hashtags. Scroll down on the 4CAT result page until you see
the 'Count values' processor listed. Open the 'Options' and choose 'Hashtags' as the value
to aggregate, and limit the analysis to the top 25 items. This will give you the 25 most-used
hashtags in the dataset.

9. Inspect the result, for example by downloading the resulting CSV file. What do you
see? At the time of writing, the top hashtags all refer to American media: e.g. #cbsnews,
#viceworldnews, and #nbcnews. This could indicate that your query needs refinement,
because it seems to be biased towards American content; consider for example that large
parts of the world might refer to the country as #україна instead. Or maybe the TikTok
sorting algorithm is biased towards American accounts?

10. Scroll down again, until you see the 'Download TikTok Images' processor. Open the
options and configure the number of images you want to download and which type of
image. For this sample, keep the number of downloaded images at its default or lower.
Run the processor. 4CAT will now download thumbnails for all videos.

11. When this is done, click 'More' to see what you can do with the downloaded images. Create
an Image wall. The result will give you a quick impression of the type of content in your
dataset. Is a particular style of video especially prominent? Are people
sharing footage of the war, or memes, or recording themselves? What does this tell you
about how people are responding to the war in Ukraine on TikTok?
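
Steps 8 and 9 can be cross-checked outside 4CAT. The sketch below approximates the 'Count values' result for hashtags by counting each hashtag at most once per post and returning the most frequent ones; 4CAT's processor may count differently. As above, it assumes a `hashtags` column with comma-separated values, and `top_hashtags` is a hypothetical name, not part of 4CAT.

```python
import csv
from collections import Counter


def top_hashtags(csv_file, n=25):
    """Return the n most frequent hashtags in a 4CAT CSV export.

    Each hashtag is counted at most once per post; assumes a 'hashtags'
    column with comma-separated values.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        tags = {t.strip().lstrip("#").lower() for t in (row.get("hashtags") or "").split(",")}
        tags.discard("")  # posts without hashtags
        counts.update(tags)
    return counts.most_common(n)
```

Running it on datasets captured from different hashtag pages (e.g. #ukraine and #україна) is one way to test whether the American-media bias noticed in step 9 comes from the query design.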

From here on, you can branch out in several directions. Maybe you want to create a new dataset
that also includes other related hashtags that you found. Or perhaps you saw an interesting type
of video in the thumbnails, and want to look into that in more detail. Is a particular sound being
used often in this context? Then maybe 'following' that lead is promising. Iterate until you have
something that seems representative and worth drawing conclusions from!
Further notes & reading
As mentioned, Zeeschuimer can also be used to capture data from Instagram, LinkedIn, Twitter,
Douyin, Imgur, and 9Gag. That is not the focus of this worksheet, but the method in that case is
largely identical to the method for capturing TikTok data described here - you would just visit
instagram.com, linkedin.com, or twitter.com, et cetera, instead of tiktok.com. The main difference
is that other platforms offer different "views" on their content; for example, Instagram does not
offer a 'true' search function, but you can view all posts tagged with a particular location.
LinkedIn, meanwhile, has both an algorithmic feed and a search function that allows searching for
posts by particular people or about specific topics, from which Zeeschuimer can then capture the
metadata of the included posts.

The following reports discuss a number of interesting ways to build a case study around data
captured from TikTok:

● Lucia Bainotti, et al. "Tracing the genealogy and change of TikTok audio memes". Digital
Methods Winter School 2022.
● Salvatore Romano, Marc Faddoul, et al. "Mapping Ban and Shadow-Ban on TikTok".
Digital Methods Winter School 2022.

There are some short tutorial videos about 4CAT if you want to know more about how the tool
works:

● 4CAT Tutorials on YouTube

For a refresher on how to work with spreadsheets and networks, you can take a look at the
relevant other Media Studies worksheets:

● Reference Worksheet I: Data Management
● Reference Worksheet III: Data Visualisation, section 5, 'Gephi'
