
THE SPOTIFY DATASET

Anna Nguyen
2023-03-06
Music plays an essential role in people’s lives. It shapes feelings and moods: a song with a fast tempo can cheer you up, while one with a slow
pace and sad lyrics may bring your mood down. Because music matters so much, it is interesting to explore what makes it popular, for example
the genres, artists, energy, or tempo that dominate in different periods. The main purpose of this analysis is to identify music trends by
looking at the genres and other shared characteristics of the most popular pieces of music.

Overview

## Warning in !is.null(rmarkdown::metadata$output) && rmarkdown::metadata$output
## %in% : 'length(x) = 2 > 1' in coercion to 'logical(1)'

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──


## ✔ dplyr 1.1.0 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.1 ✔ tibble 3.1.8
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::group_rows() masks kableExtra::group_rows()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Loading required package: usethis
##
## Skipping install of 'MetBrewer' from a github remote, the SHA1 (3482617e) has not changed since last install.
## Use `force = TRUE` to force installation

The Spotify data consists of over 5000 songs from the late 1950s to the 2020s. It records not only each song’s name and artist but also deeper
attributes such as genre, tempo, mode, and loudness. This gives viewers and researchers an overview of the world’s musical taste and
how it has changed over the decades.

The dataset has 23 variables, ranging from general descriptors of a song, such as artist and album, to more technical audio features, such as
instrumentalness and liveness.

The table below previews how the variables appear in the raw Spotify dataset.

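# Read the raw data (the path is specific to the author's machine).
uncleaned_spotify <- read.csv("/Users/Anna/Desktop/Math329/spotify_songs.csv")

# Preview the first five rows as a styled table (kbl() and kable_styling() are from kableExtra).
uncleaned_spotify %>% head(5) %>% kbl() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = FALSE, font_size = 12)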

X | track_id | track_name | track_artist | track_popularity | track_album_id | track_album_name | track_album_release_date | playlist_name | playlist_id | playlist_genre | playlist_subgenre | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms
1 | 6f807x0ima9a1j3VPbc7VN | I Don’t Care (with Justin Bieber) - Loud Luxury Remix | Ed Sheeran | 66 | 2oCs0DGTsRO98Gh5ZSl2Cx | I Don’t Care (with Justin Bieber) [Loud Luxury Remix] | 2019-06-14 | Pop Remix | 37i9dQZF1DXcZDD7cfEKhW | pop | dance pop | 0.748 | 0.916 | 6 | -2.634 | 1 | 0.0583 | 0.1020 | 0.00e+00 | 0.0653 | 0.518 | 122.036 | 194754
2 | 0r7CVbZTWZgbTCYdfa2P31 | Memories - Dillon Francis Remix | Maroon 5 | 67 | 63rPSO264uRjW1X5E6cWv6 | Memories (Dillon Francis Remix) | 2019-12-13 | Pop Remix | 37i9dQZF1DXcZDD7cfEKhW | pop | dance pop | 0.726 | 0.815 | 11 | -4.969 | 1 | 0.0373 | 0.0724 | 4.21e-03 | 0.3570 | 0.693 | 99.972 | 162600
3 | 1z1Hg7Vb0AhHDiEmnDE79l | All the Time - Don Diablo Remix | Zara Larsson | 70 | 1HoSmj2eLcsrR0vE9gThr4 | All the Time (Don Diablo Remix) | 2019-07-05 | Pop Remix | 37i9dQZF1DXcZDD7cfEKhW | pop | dance pop | 0.675 | 0.931 | 1 | -3.432 | 0 | 0.0742 | 0.0794 | 2.33e-05 | 0.1100 | 0.613 | 124.008 | 176616
4 | 75FpbthrwQmzHlBJLuGdC7 | Call You Mine - Keanu Silva Remix | The Chainsmokers | 60 | 1nqYsOef1yKKuGOVchbsk6 | Call You Mine - The Remixes | 2019-07-19 | Pop Remix | 37i9dQZF1DXcZDD7cfEKhW | pop | dance pop | 0.718 | 0.930 | 7 | -3.778 | 1 | 0.1020 | 0.0287 | 9.40e-06 | 0.2040 | 0.277 | 121.956 | 169093
5 | 1e8PAfcKUYoKkxPhrHqw4x | Someone You Loved - Future Humans Remix | Lewis Capaldi | 69 | 7m7vv9wlQ4i0LFuJiE2zsQ | Someone You Loved (Future Humans Remix) | 2019-03-05 | Pop Remix | 37i9dQZF1DXcZDD7cfEKhW | pop | dance pop | 0.650 | 0.833 | 1 | -4.672 | 1 | 0.0359 | 0.0803 | 0.00e+00 | 0.0833 | 0.725 | 123.976 | 189052

As the preview shows, the track_name column is not clean: it contains not only the song’s name but also the featured artist and the remixer.
In some rows it even includes the album name and the remastered version. I therefore clean track_name by separating out the parts that are
not the song title. Since I only need the year in which each track was released, I also drop the month and day from the release date.
Finally, I remove all of the id columns.
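The original cleaning code is not shown, so the following is only a minimal base-R sketch of the three steps described above, run on two toy rows copied from the preview. The regular expressions are assumptions about how remix and featuring notes are written; unlike the approach described above, this sketch discards the extra parts of the title rather than moving them into new columns, and it would also truncate any real title that contains " - ".

```r
# Two toy rows with the raw column names; values are taken from the preview table.
raw <- data.frame(
  track_id = c("6f807x0ima9a1j3VPbc7VN", "75FpbthrwQmzHlBJLuGdC7"),
  track_name = c("I Don't Care (with Justin Bieber) - Loud Luxury Remix",
                 "Call You Mine - Keanu Silva Remix"),
  track_album_release_date = c("2019-06-14", "2019-07-19"),
  stringsAsFactors = FALSE
)

clean <- raw

# Step 1: drop everything from the first " - " onward (remix / remaster notes),
# then drop a trailing "(with ...)" or "(feat. ...)" note. Both patterns are assumptions.
clean$track_name <- sub(" - .*$", "", clean$track_name)
clean$track_name <- sub(" *\\((with|feat\\.?) [^)]*\\)$", "", clean$track_name)

# Step 2: keep only the four-digit year of the release date.
clean$year <- as.integer(substr(clean$track_album_release_date, 1, 4))

# Step 3: remove every column ending in "_id" and the full release date.
id_cols <- grep("_id$", names(clean), value = TRUE)
keep <- setdiff(names(clean), c(id_cols, "track_album_release_date"))
clean <- clean[, keep, drop = FALSE]

print(clean)
```

On the real data the same three steps would be applied to uncleaned_spotify instead of the toy data frame.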

After cleaning, the dataset looks like this.

X | track_name | track_artist | track_popularity | track_album_name | year | playlist_name | playlist_genre | playlist_subgenre | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms
1 | I Don’t Care | Ed Sheeran | 66 | I Don’t Care (with Justin Bieber) [Loud Luxury Remix] | 2019 | Pop Remix | pop | dance pop | 0.748 | 0.916 | 6 | -2.634 | 1 | 0.0583 | 0.1020 | 0.00e+00 | 0.0653 | 0.518 | 122.036 | 194754
2 | Memories | Maroon 5 | 67 | Memories (Dillon Francis Remix) | 2019 | Pop Remix | pop | dance pop | 0.726 | 0.815 | 11 | -4.969 | 1 | 0.0373 | 0.0724 | 4.21e-03 | 0.3570 | 0.693 | 99.972 | 162600
3 | All the Time | Zara Larsson | 70 | All the Time (Don Diablo Remix) | 2019 | Pop Remix | pop | dance pop | 0.675 | 0.931 | 1 | -3.432 | 0 | 0.0742 | 0.0794 | 2.33e-05 | 0.1100 | 0.613 | 124.008 | 176616
4 | Call You Mine | The Chainsmokers | 60 | Call You Mine - The Remixes | 2019 | Pop Remix | pop | dance pop | 0.718 | 0.930 | 7 | -3.778 | 1 | 0.1020 | 0.0287 | 9.40e-06 | 0.2040 | 0.277 | 121.956 | 169093
5 | Someone You Loved | Lewis Capaldi | 69 | Someone You Loved (Future Humans Remix) | 2019 | Pop Remix | pop | dance pop | 0.650 | 0.833 | 1 | -4.672 | 1 | 0.0359 | 0.0803 | 0.00e+00 | 0.0833 | 0.725 | 123.976 | 189052

Rankings and Number of Songs


When looking at a list of music, people tend to focus on the songs with the highest ratings. Doing so tells them which type of music is most
popular at the moment, and it also sparks curiosity, so they may want to try listening to the high-ranking pieces.

The following scatterplot shows the 10 artists with the highest average popularity over the whole period. The higher the popularity score, the
more well-known that artist’s songs are.

Now look at the 10 artists with the highest number of songs.

From the two plots above, we can see that the more songs an artist has, the lower the average ranking tends to be. If we focus only on rankings,
many of the top artists are not widely known, since they have published only a few songs over a long period. In contrast, the “Top 10 Artists
with the most songs” list consists of well-known artists, and their average rankings are still reasonably high given how many pieces they have
released. I will therefore go deeper into the artists with the most songs, because I believe viewers will be more familiar with these names.
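The code behind the two rankings is not included, so here is a hedged base-R sketch of how they could be computed. The data frame and its values are invented for illustration; only the column names track_artist and track_popularity come from the dataset.

```r
# Toy data: three hypothetical artists with different numbers of songs.
songs <- data.frame(
  track_artist = c("A", "A", "A", "B", "B", "C"),
  track_popularity = c(60, 70, 80, 80, 50, 95),
  stringsAsFactors = FALSE
)

# Per-artist mean popularity and song count.
avg <- tapply(songs$track_popularity, songs$track_artist, mean)
n <- table(songs$track_artist)
by_artist <- data.frame(
  track_artist = names(avg),
  avg_popularity = as.numeric(avg),
  n_songs = as.integer(n[names(avg)]),
  stringsAsFactors = FALSE
)

# Top 10 by average popularity, and top 10 by number of songs.
top_by_popularity <- head(by_artist[order(-by_artist$avg_popularity), ], 10)
top_by_songs <- head(by_artist[order(-by_artist$n_songs), ], 10)

print(top_by_popularity)
print(top_by_songs)
```

In this toy example the artist with a single song ranks first on average popularity, while the artist with the most songs ranks lower, which mirrors the pattern described above.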

Popular Genres and Subgenres


To recall, the purpose of this analysis is to explore popular music taste. We start with the most basic characteristic of a song: its genre. If
the top artists all work in the same genres, we can infer what the music trends are.

Let’s look at the share of each genre among the songs of the 100 artists with the most songs.
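As an illustration only, the genre shares behind such a pie chart could be computed along these lines in base R. The toy data and the cutoff of 2 artists (standing in for 100) are assumptions made to keep the example small.

```r
# Toy data: hypothetical artists and the genre of each of their songs.
songs <- data.frame(
  track_artist = c("A", "A", "A", "B", "B", "C"),
  playlist_genre = c("edm", "edm", "pop", "latin", "latin", "rock"),
  stringsAsFactors = FALSE
)

top_n_artists <- 2  # the analysis uses 100; 2 keeps the toy example readable

# Artists ordered by number of songs, most prolific first.
song_counts <- sort(table(songs$track_artist), decreasing = TRUE)
top_artists <- names(head(song_counts, top_n_artists))

# Keep only songs by those artists, then compute each genre's share.
top_songs <- songs[songs$track_artist %in% top_artists, ]
genre_share <- prop.table(table(top_songs$playlist_genre))

print(round(genre_share, 2))
```

The resulting proportions sum to 1 and could be passed to a pie or bar chart.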

The pie chart shows that the three genres with the most published songs are EDM, Latin, and Pop. In short, people appear to prefer these
genres over the others. To find out when each genre became popular with the audience, I break the counts down by year and look for the first
year in which each genre appears. The table below lists the number of songs in each genre per year. We will focus mainly on the top 3: EDM,
Latin, and Pop.
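A year-by-genre count table of this shape can be sketched in base R as follows. The toy rows are invented; the console message shown just below suggests the original used dplyr's summarise() instead, so treat this as an equivalent illustration, not the author's code. One visible difference is that table() reports 0 where the table below shows NA.

```r
# Toy data: release year and genre for a handful of hypothetical songs.
songs <- data.frame(
  year = c(2018, 2018, 2018, 2019, 2019, 2019, 2019),
  playlist_genre = c("edm", "pop", "pop", "edm", "edm", "latin", "pop"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per year, one column per genre.
counts <- table(songs$year, songs$playlist_genre)
wide <- as.data.frame.matrix(counts)

# Add the per-year total across all genres.
wide$total <- rowSums(wide)

print(wide)
```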

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

year edm latin pop r&b rap rock total

1965 NA NA NA NA NA 4 4

1966 NA NA NA NA NA 6 6

1968 NA NA NA NA NA 1 1

1969 NA NA 1 NA NA 28 29

1970 NA NA 1 NA NA 8 9

1971 NA NA NA NA 1 33 34

1972 NA NA NA 1 NA 2 3

1973 NA NA NA NA NA 22 22

1974 NA NA 1 NA 1 31 33

1975 NA NA NA NA NA 28 28

1976 NA NA NA NA NA 23 23

1977 NA NA NA NA NA 17 17

1978 NA NA 2 NA NA 34 36

1979 NA NA NA NA NA 8 8

1980 NA NA NA NA NA 19 19

1981 NA NA 3 NA NA 6 9

1982 NA NA 1 NA NA 18 19

1983 NA NA 1 2 NA 11 14

1984 NA 2 2 2 NA 42 48

1985 NA 3 NA 3 NA 18 24

1986 NA 1 2 12 NA 20 35

1987 NA 2 5 NA NA 38 45

1988 NA NA NA 27 2 9 38

1989 NA 5 NA 14 NA 2 21

1990 NA 5 1 8 NA 7 21

1991 NA 3 NA 1 4 27 35

1992 NA NA NA 9 NA 2 11

1993 NA 9 NA 11 18 20 58

1994 NA 3 NA 1 29 16 49

1995 NA 1 NA 4 2 NA 7

1996 NA 1 1 3 28 8 41

1997 NA NA NA 22 12 1 35

1998 NA 2 NA 3 26 13 44

1999 NA 15 NA NA 12 8 35

2000 1 3 1 11 9 6 31

2001 NA 2 2 NA 4 7 15

2002 1 1 6 11 4 2 25

2003 1 11 NA 9 33 NA 54

2004 1 21 3 16 20 19 80

2005 NA 15 1 4 50 39 109

2006 NA 21 15 10 20 3 69

2007 4 19 1 2 20 7 53

2008 4 53 8 15 16 29 125

2009 7 28 2 3 17 12 69

2010 5 49 29 8 9 14 114

2011 22 17 14 7 14 18 92

2012 58 40 55 27 37 NA 217

2013 81 29 33 5 14 NA 162

2014 181 28 64 4 35 18 330

2015 135 53 88 26 53 NA 355

2016 160 75 106 66 56 1 464

2017 100 83 125 54 66 4 432

2018 107 102 158 67 70 6 510

2019 287 220 232 127 95 16 977

2020 19 17 15 20 5 NA 76

Interestingly, EDM is the last genre to appear (its first song in the table is from 2000), yet it has the highest number of songs overall. The
remaining genres suggest the opposite pattern: the earlier a genre appears, the less popular it becomes over time. The plot below gives a
clearer view of when each genre appeared and how it developed.

## Warning: Removed 34 rows containing missing values (`geom_line()`).

## Warning: Removed 18 rows containing missing values (`geom_line()`).

## Warning: Removed 3 rows containing missing values (`geom_line()`).

## Warning: Removed 6 rows containing missing values (`geom_line()`).

## Warning: Removed 5 rows containing missing values (`geom_line()`).

## Warning: Removed 1 row containing missing values (`geom_line()`).

[Figure: Development of Genres over Decades. Line graph with Decade on the x-axis and Total songs on the y-axis, one line per genre: EDM, Latin, Pop, R&B, Rap, Rock.]

The line graph shows that the shares of the 6 genres have changed rapidly. For example, Rock was in first place in the 1960s but is now the
least common genre, with few new releases. In contrast, EDM appeared much later but now holds the top place among them all.

Conclusion
In summary, music tastes and trends have changed rapidly over the decades. People now seem more likely to prefer modern styles of
music with positive vibes and a fast tempo. This may be because such pieces help cheer people up, and because more complex music may
fascinate today’s younger generations.

Limitations
Although we can see the popular genres and from them infer the world’s music tastes over time, the data we used is still limited. Firstly,
we used only the artists with the most songs as the foundation, so from the first stage of the analysis we could not examine the group of
artists with the highest ratings. Secondly, the comparison of the 6 genres was based on the whole dataset instead of the top-rated songs. This
can introduce bias, because the popularity of each song in society also matters. However, we could base our conclusions only on a group of 100
artists because of the limited scope of this project.
