Regression
COMSATS UNIVERSITY ISLAMABAD
Question:
Acquire a dataset (Kaggle, GitHub, etc.) and apply linear regression to make predictions. Also
use ggplot for data visualization from multiple aspects.
Your assignment must contain the following:
Data set Description
Regression analysis with and without R libraries
Multiple pairs of variables forming predictor-target pairs for regression.
Various fittings of the model
A paragraph explaining the results by interpreting the R output for regression
Visualization through graphs
a. Data set Description
This is a dataset of Spotify tracks spanning 125 different genres. Each track has a set of
audio features associated with it. The data is in CSV format, which is tabular and can be loaded
quickly. A Spotify tracks dataset typically includes the following types of information:
1. Track Information:
Track ID
Track Name
Artist(s)
Album Name
Release Date
Duration
2. Audio Features:
Danceability
Energy
Loudness
Speechiness
Acousticness
Instrumentalness
Liveness
Valence (Positiveness)
Tempo
3. Popularity Metrics:
Popularity Score
Number of Plays/Streams
Chart Rankings
4. Album Information:
Album ID
Album Release Date
Total Tracks in the Album
Album Popularity
5. Genre Information:
Genre(s) associated with the track
6. User-Related Information:
User Ratings/Reviews
Playlists the track is featured in
7. Metadata:
Data related to playlists, user interactions, etc.
8. External Links:
Links to external resources (e.g., Spotify URLs)
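Loading the CSV described above can be sketched in base R as follows. This is a minimal sketch, not the code used in this assignment: the function name `load_tracks` is hypothetical, and the column names (`popularity`, `danceability`, `energy`, `loudness`, `valence`, `tempo`) are assumed to match the downloaded file. A small temporary CSV with made-up values stands in for the real data.

```r
# Minimal sketch: read a Spotify tracks CSV and keep the numeric columns used for regression.
# ASSUMPTION: these column names are assumed to exist in the downloaded file; adjust as needed.
load_tracks <- function(path,
                        columns = c("popularity", "danceability", "energy",
                                    "loudness", "valence", "tempo")) {
  tracks <- read.csv(path, stringsAsFactors = FALSE)
  missing_cols <- setdiff(columns, names(tracks))
  if (length(missing_cols) > 0) {
    stop("Missing expected columns: ", paste(missing_cols, collapse = ", "))
  }
  # Keep only the selected columns, then drop rows with missing values in them
  tracks <- tracks[, columns]
  tracks[complete.cases(tracks), , drop = FALSE]
}

# Usage: a tiny temporary CSV with made-up values stands in for the real file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(
  track_name   = c("a", "b", "c"),
  popularity   = c(50, NA, 70),
  danceability = c(0.52, 0.61, 0.73),
  energy       = c(0.40, 0.81, 0.90),
  loudness     = c(-8.1, -5.2, -4.3),
  valence      = c(0.30, 0.55, 0.91),
  tempo        = c(120, 98, 130)
), tmp, row.names = FALSE)

tracks <- load_tracks(tmp)
str(tracks)            # column types after loading
print(summary(tracks)) # quick descriptive statistics
```

The row with a missing popularity value is dropped, so only complete rows reach the regression step.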
Regression Analysis without R libraries (Base R):
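A minimal base R sketch of simple linear regression computed by hand, then checked against `lm()` from the `stats` package that ships with R. It uses the closed form slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The data are synthetic (an assumption standing in for the Spotify file), generated so that popularity rises with danceability; this is one reading of "without R libraries", not necessarily the approach taken in the assignment.

```r
# ASSUMPTION: synthetic stand-in data, not the real Spotify tracks
set.seed(42)
n <- 500
danceability <- runif(n, 0, 1)
popularity <- 20 + 30 * danceability + rnorm(n, sd = 10)

# Closed-form ordinary least squares for one predictor
slope <- cov(danceability, popularity) / var(danceability)
intercept <- mean(popularity) - slope * mean(danceability)

# R-squared by hand: 1 - residual sum of squares / total sum of squares
fitted_vals <- intercept + slope * danceability
ss_res <- sum((popularity - fitted_vals)^2)
ss_tot <- sum((popularity - mean(popularity))^2)
r_squared <- 1 - ss_res / ss_tot

# Manual prediction for a new track with danceability 0.8
manual_pred <- intercept + slope * 0.8

# The same model fitted with lm() for comparison
fit <- lm(popularity ~ danceability)
lm_pred <- predict(fit, newdata = data.frame(danceability = 0.8))

cat("manual intercept/slope/R2:", intercept, slope, r_squared, "\n")
print(coef(fit))
cat("manual vs lm prediction:", manual_pred, unname(lm_pred), "\n")
```

Both routes give the same coefficients and prediction up to floating-point error; `lm()` additionally reports standard errors and p-values.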
Using three different predictor-target pairs:
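One way to fit several predictor-target pairs at once, assuming a data frame named `tracks` with columns `popularity`, `danceability`, `energy`, and `loudness`. The three pairs below are hypothetical examples, not necessarily the pairs chosen in this assignment, and the data are synthetic.

```r
# ASSUMPTION: synthetic data with column names taken from the dataset description
set.seed(42)
n <- 500
tracks <- data.frame(danceability = runif(n), energy = runif(n))
tracks$loudness <- -30 + 25 * tracks$energy + rnorm(n, sd = 3)
tracks$popularity <- 20 + 30 * tracks$danceability + 10 * tracks$energy + rnorm(n, sd = 10)

# Hypothetical predictor-target pairs written as model formulas (target ~ predictor)
pairs <- list(
  popularity ~ danceability,
  popularity ~ energy,
  loudness ~ energy
)

# Fit one simple linear regression per pair and collect its key statistics
results <- do.call(rbind, lapply(pairs, function(f) {
  fit <- lm(f, data = tracks)
  s <- summary(fit)
  data.frame(
    model     = deparse(f),
    intercept = coef(fit)[[1]],
    slope     = coef(fit)[[2]],
    p_value   = s$coefficients[2, "Pr(>|t|)"],
    r_squared = s$r.squared
  )
}))
print(results)
```

Collecting the results in one table makes it easy to see which predictor explains the most variance in each target.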
(You have not taught model fitting yet, but it was required in the assignment, so we worked
through it and learned the concept.)
1. Polynomial Regression:
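Polynomial regression is still a linear model, because it is linear in the coefficients; only the predictor is transformed. A minimal sketch, assuming synthetic data with a curved energy-popularity relationship (hypothetical, for illustration only):

```r
# ASSUMPTION: synthetic data with a built-in curve so the squared term has something to find
set.seed(42)
n <- 500
energy <- runif(n)
popularity <- 30 + 60 * energy - 50 * energy^2 + rnorm(n, sd = 5)
tracks <- data.frame(energy, popularity)

linear_fit <- lm(popularity ~ energy, data = tracks)
# I() makes ^ mean arithmetic squaring instead of formula crossing
quad_fit <- lm(popularity ~ energy + I(energy^2), data = tracks)
# Orthogonal polynomials: same fitted values, differently parameterised coefficients
poly_fit <- lm(popularity ~ poly(energy, 2), data = tracks)

print(summary(quad_fit))
# Predictions from the quadratic model for a few energy values
print(predict(quad_fit, newdata = data.frame(energy = c(0.2, 0.5, 0.8))))
```

`I(energy^2)` keeps the coefficients easy to read, while `poly()` avoids correlation between the linear and squared terms.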
2. Interaction Terms:
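An interaction term lets the effect of one predictor depend on the value of another. In R's formula syntax, `danceability * energy` expands to `danceability + energy + danceability:energy`. A minimal sketch on synthetic data (an assumption, with an interaction built in); `slope_at` is a hypothetical helper:

```r
# ASSUMPTION: synthetic data in which danceability matters more for high-energy tracks
set.seed(42)
n <- 500
tracks <- data.frame(danceability = runif(n), energy = runif(n))
tracks$popularity <- 20 + 10 * tracks$danceability + 5 * tracks$energy +
  25 * tracks$danceability * tracks$energy + rnorm(n, sd = 5)

# Main effects plus their product term
interaction_fit <- lm(popularity ~ danceability * energy, data = tracks)
print(summary(interaction_fit))

# With an interaction, the slope of danceability depends on energy:
# d(popularity) / d(danceability) = b_danceability + b_interaction * energy
b <- coef(interaction_fit)
slope_at <- function(e) b[["danceability"]] + b[["danceability:energy"]] * e
print(slope_at(c(0.2, 0.8)))
```

Because of the product term, the `danceability` coefficient alone is the slope only when energy equals zero.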
3. Compare Models:
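Nested models can be compared with `anova()`, which runs an F-test for each added set of terms, while adjusted R-squared and `AIC()` penalise extra parameters. A minimal sketch on synthetic data; the models `m1` to `m3` are hypothetical examples:

```r
# ASSUMPTION: synthetic stand-in data
set.seed(42)
n <- 500
tracks <- data.frame(danceability = runif(n), energy = runif(n))
tracks$popularity <- 20 + 30 * tracks$danceability + 10 * tracks$energy + rnorm(n, sd = 10)

# Three nested models of increasing complexity
m1 <- lm(popularity ~ danceability, data = tracks)
m2 <- lm(popularity ~ danceability + energy, data = tracks)
m3 <- lm(popularity ~ danceability * energy, data = tracks)

# F-tests: does each added term significantly reduce the residual sum of squares?
print(anova(m1, m2, m3))

# Adjusted R-squared (higher is better) and AIC (lower is better)
comparison <- data.frame(
  model  = c("m1", "m2", "m3"),
  adj_r2 = sapply(list(m1, m2, m3), function(m) summary(m)$adj.r.squared),
  AIC    = AIC(m1, m2, m3)$AIC
)
print(comparison)
```

A model with more terms always fits the training data at least as well, so these penalised criteria are what justify keeping an extra term.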
The regression output from R, usually read through summary(), describes the relationship between
the predictor variables and the target variable. For instance, in a model predicting Spotify track
popularity from danceability, the danceability coefficient is the expected change in popularity
for a one-unit increase in danceability. Because danceability ranges from 0 to 1, a one-unit
increase spans its entire range, so a 0.1 increase corresponds to one tenth of the coefficient. A
low p-value (conventionally below 0.05) indicates that the coefficient is statistically
distinguishable from zero, and the R-squared value gives the proportion of the variance in
popularity that the model explains. Interpreting these results means weighing both statistical and
practical significance: in a large dataset, even a small effect can be statistically significant
while explaining little of the variation in popularity. Exploring alternative model
specifications, such as polynomial terms or interactions, gives a fuller picture of the
predictive relationships in the data.
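The quantities discussed above can be pulled directly out of `summary()`. A minimal sketch on synthetic data (an assumption, not the assignment's results), which also converts the one-unit coefficient into the expected change for a 0.1 step in danceability:

```r
# ASSUMPTION: synthetic stand-in data
set.seed(42)
n <- 500
tracks <- data.frame(danceability = runif(n))
tracks$popularity <- 20 + 30 * tracks$danceability + rnorm(n, sd = 10)

fit <- lm(popularity ~ danceability, data = tracks)
s <- summary(fit)

# Coefficient table columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- s$coefficients
slope   <- coef_table["danceability", "Estimate"]
p_value <- coef_table["danceability", "Pr(>|t|)"]

# 95% confidence interval for the slope
ci <- confint(fit, "danceability", level = 0.95)

cat(sprintf("Slope: %.2f popularity points per unit of danceability\n", slope))
cat(sprintf("95%% CI: [%.2f, %.2f]\n", ci[1], ci[2]))
cat(sprintf("p-value: %.3g, R-squared: %.3f\n", p_value, s$r.squared))
# danceability lies in [0, 1], so a 0.1 step changes popularity by slope * 0.1
cat(sprintf("Expected change for +0.1 danceability: %.2f\n", slope * 0.1))
```

The confidence interval conveys practical significance better than the p-value alone, because it shows the plausible size of the effect.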
e. Visualization through graphs:
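A sketch of ggplot2 visualizations from several aspects: a scatter plot with a fitted regression line, the same relationship split by a grouping variable, and the distribution of the target. It assumes the ggplot2 package is installed; the data are synthetic, and the `energy_level` grouping is a hypothetical variable created for illustration.

```r
library(ggplot2)

# ASSUMPTION: synthetic stand-in data
set.seed(42)
n <- 500
tracks <- data.frame(danceability = runif(n), energy = runif(n))
tracks$popularity <- 20 + 30 * tracks$danceability + 10 * tracks$energy + rnorm(n, sd = 10)
# Hypothetical grouping so a second aspect can be shown with colour
tracks$energy_level <- cut(tracks$energy, breaks = c(0, 0.5, 1),
                           labels = c("low", "high"), include.lowest = TRUE)

# Scatter plot with a fitted linear regression line and its 95% confidence band
p_scatter <- ggplot(tracks, aes(x = danceability, y = popularity)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(title = "Popularity vs danceability", x = "Danceability", y = "Popularity")

# Same relationship split by energy level, one regression line per group
p_grouped <- ggplot(tracks, aes(x = danceability, y = popularity, colour = energy_level)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Popularity vs danceability by energy", colour = "Energy")

# Distribution of the target variable
p_hist <- ggplot(tracks, aes(x = popularity)) +
  geom_histogram(bins = 30) +
  labs(title = "Distribution of popularity", x = "Popularity", y = "Count")

print(p_scatter)
print(p_grouped)
print(p_hist)
```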
Residual plot:
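A residuals-versus-fitted plot checks the linear model's assumptions: the residuals should scatter randomly around zero with roughly constant spread, and a visible curve suggests a missing term such as a polynomial. A minimal sketch, assuming ggplot2 is installed and using synthetic data; `plot(fit, which = 1)` is the base R equivalent.

```r
library(ggplot2)

# ASSUMPTION: synthetic stand-in data
set.seed(42)
n <- 500
tracks <- data.frame(danceability = runif(n))
tracks$popularity <- 20 + 30 * tracks$danceability + rnorm(n, sd = 10)
fit <- lm(popularity ~ danceability, data = tracks)

# Collect fitted values and residuals from the model
diag_df <- data.frame(fitted = fitted(fit), residual = resid(fit))

p_resid <- ggplot(diag_df, aes(x = fitted, y = residual)) +
  geom_point(alpha = 0.4) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  # A smooth trend line makes any systematic curvature easier to spot
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  labs(title = "Residuals vs fitted", x = "Fitted popularity", y = "Residual")
print(p_resid)

# Base R equivalent without ggplot2
plot(fit, which = 1)
```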