
COMSATS UNIVERSITY ISLAMABAD

DATA SCIENCE FUNDAMENTALS


(THEORY)
CLASS ASSIGNMENT: 2
Name (Registration No.):
Nayab Khalid (SP23-BDS-039)
Quraisha Azam (SP23-BDS-042)
Tayyaba Ali (SP23-BDS-060)
Class: BSDS-2A

Instructor’s name: Dr. Hufsa Mohsin

Date of submission: 28th November, 2023

Question:
Acquire a dataset (Kaggle, GitHub, etc.) and apply linear regression to make predictions. Also use ggplot for data visualization from multiple aspects.
Your assignment must contain the following:
• Dataset description
• Regression analysis with and without R libraries
• Multiple pairs of variables to form predictor-target pairs for regression
• Various fits of the model
• A paragraph explaining the results by interpreting the R output for regression
• Visualization through graphs

a. Data set Description
This is a dataset of Spotify tracks spanning 125 different genres. Each track has a set of audio
features associated with it. The data is stored in CSV format, which is tabular and can be loaded
quickly. A Spotify tracks dataset can include the following types of information:

1. Track Information:
• Track ID
• Track Name
• Artist(s)
• Album Name
• Release Date
• Duration

2. Audio Features:
• Danceability
• Energy
• Loudness
• Speechiness
• Acousticness
• Instrumentalness
• Liveness
• Valence (Positiveness)
• Tempo

3. Popularity Metrics:
• Popularity Score
• Number of Plays/Streams
• Chart Rankings

4. Album Information:
• Album ID
• Album Release Date
• Total Tracks in the Album
• Album Popularity

5. Genre Information:
• Genre(s) associated with the track

6. User-Related Information:
• User Ratings/Reviews
• Playlists the track is featured in

7. Metadata:
• Data related to playlists, user interactions, etc.

8. External Links:
• Links to external resources (e.g., Spotify URLs)
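
Because the data is plain CSV, any CSV parser can read it. The assignment itself was done in R, where `read.csv()` is the usual call; what follows is only a minimal Python sketch of the same loading step using the standard-library `csv` module. The column names (`track_name`, `danceability`, `energy`, `popularity`) are assumptions based on the feature list above, and the three rows are made-up toy values, not real tracks.

```python
import csv
import io

# Hypothetical toy rows; column names are assumed, not copied from the real file.
TOY_CSV = """track_name,danceability,energy,popularity
Song A,0.80,0.70,65
Song B,0.45,0.90,40
Song C,0.60,0.30,52
"""


def load_numeric_columns(text, columns):
    """Parse CSV text and return {column: [float, ...]} for the requested columns."""
    reader = csv.DictReader(io.StringIO(text))
    data = {col: [] for col in columns}
    for row in reader:
        for col in columns:
            data[col].append(float(row[col]))  # CSV fields arrive as strings
    return data


cols = load_numeric_columns(TOY_CSV, ["danceability", "energy", "popularity"])
print(cols["popularity"])  # [65.0, 40.0, 52.0]
```

For the real file, the contents of the downloaded CSV (for example `open(path, newline="").read()`, with `path` a hypothetical file location) would be passed instead of `TOY_CSV`.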

b. Regression analysis with and without R libraries

• Regression Analysis without R libraries (Base R):
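
The Base R code for this step was not preserved in this copy of the assignment. As a hedged illustration of what fitting a regression line without a modelling library involves, the sketch below computes the ordinary least-squares slope and intercept directly from their closed-form formulas, b1 = Sxy / Sxx and b0 = ȳ - b1·x̄, where Sxy is the sum of cross-deviations and Sxx the sum of squared x-deviations. It is written in Python rather than R, and the data are toy values lying exactly on y = 10 + 50x so the result is easy to check.

```python
def fit_simple_ols(x, y):
    """Ordinary least squares for y = b0 + b1 * x using closed-form formulas."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0, b1, x_new):
    """Predicted values of the fitted line at new predictor values."""
    return [b0 + b1 * xi for xi in x_new]


# Hypothetical toy data lying exactly on y = 10 + 50 * x (not Spotify data)
x = [0.2, 0.4, 0.6, 0.8]
y = [20.0, 30.0, 40.0, 50.0]
b0, b1 = fit_simple_ols(x, y)
print(round(b0, 6), round(b1, 6))  # 10.0 50.0
```

For real data, `coef(lm(popularity ~ danceability, data = tracks))` in R returns the same two quantities; `tracks` is a hypothetical name for the loaded data frame.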

• Regression Analysis with R libraries (using the ggplot2 library):

c. Multiple pairs of variables to form predictor-target pairs for regression.

Using three different predictor-target pairs:

1. Danceability as a predictor of Popularity:

2. Energy as a predictor of Popularity:

3. Acousticness and Valence as predictors of Popularity:
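
The third pair uses two predictors at once, i.e. a multiple regression popularity = b0 + b1·acousticness + b2·valence. A minimal Python sketch of that fit, assuming no library: centring each variable removes the intercept from the normal equations, leaving a 2x2 system that Cramer's rule solves directly. The variables are called `x1` and `x2` here, and the data are synthetic values generated from y = 5 + 2·x1 - 3·x2, not Spotify measurements.

```python
def fit_two_predictors(x1, x2, y):
    """OLS for y = b0 + b1*x1 + b2*x2 via the centred 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    if abs(det) < 1e-12:
        raise ValueError("predictors are (nearly) collinear")
    b1 = (s22 * s1y - s12 * s2y) / det  # Cramer's rule
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2  # intercept recovered from the means
    return b0, b1, b2


# Hypothetical synthetic data from y = 5 + 2*x1 - 3*x2
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [2.0, 7.0, 3.0, 8.0, 4.0]
print([round(c, 6) for c in fit_two_predictors(x1, x2, y)])  # [5.0, 2.0, -3.0]
```

The R equivalent would be `lm(popularity ~ acousticness + valence, data = tracks)`, again with `tracks` as a hypothetical data-frame name. Each coefficient is then read as the change in popularity per unit of that predictor with the other predictor held fixed.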


d. Various fits of the model:

(You had not yet taught model fitting, but it was required in the assignment, so we studied the
concept on our own and applied it.)

1. Polynomial Regression:
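
Polynomial regression is still linear regression: the model is linear in its coefficients, and x² simply enters as an extra column of the design matrix. The Python sketch below, which is illustrative rather than the code used in this assignment, builds the columns 1, x, x² and solves the least-squares normal equations (XᵀX)β = Xᵀy by Gaussian elimination. The toy data come from y = 1 + 2x + 0.5x².

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - tail) / m[r][r]
    return sol


def least_squares(rows, y):
    """Solve the normal equations (X^T X) beta = X^T y for a list of design-matrix rows."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def fit_polynomial(x, y, degree=2):
    """Raw polynomial regression with columns 1, x, x^2, ..., x^degree."""
    rows = [[xi ** k for k in range(degree + 1)] for xi in x]
    return least_squares(rows, y)


# Hypothetical toy data from y = 1 + 2x + 0.5x^2
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.5, 7.0, 11.5, 17.0]
print([round(c, 6) for c in fit_polynomial(x, y)])  # [1.0, 2.0, 0.5]
```

In R, `lm(y ~ x + I(x^2))` or `lm(y ~ poly(x, 2, raw = TRUE))` gives these raw coefficients; plain `poly(x, 2)` uses orthogonal polynomials, so its coefficients differ even though the fitted values are the same. The normal equations are used here only for brevity; `lm()` itself uses a QR decomposition, which is numerically more stable.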

2. Interaction Terms:
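
In R, an interaction is written with `*`; for example, `popularity ~ danceability * energy` expands to both main effects plus their product term `danceability:energy`. Which pair of variables was interacted in this assignment is not recoverable from this copy, so the sketch below uses generic `x1` and `x2`. It is a Python illustration, not the original code: it builds the product column explicitly and solves the least-squares normal equations by Gaussian elimination. The data are synthetic values from y = 1 + 2·x1 + 3·x2 + 4·x1·x2.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a square system a @ x = b."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        sol[r] = (m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))) / m[r][r]
    return sol


def fit_with_interaction(x1, x2, y):
    """OLS for y = b0 + b1*x1 + b2*x2 + b3*(x1*x2), like R's y ~ x1 * x2."""
    rows = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]  # product column = interaction
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical synthetic data from y = 1 + 2*x1 + 3*x2 + 4*x1*x2
x1 = [0.0, 1.0, 0.0, 1.0, 2.0, 2.0]
x2 = [0.0, 0.0, 1.0, 1.0, 1.0, 2.0]
y = [1.0, 3.0, 4.0, 10.0, 16.0, 27.0]
b0, b1, b2, b3 = fit_with_interaction(x1, x2, y)
print(round(b1, 6), round(b3, 6))  # 2.0 4.0
# With an interaction, the slope of x1 depends on x2: it equals b1 + b3 * x2.
print(round(b1 + b3 * 1.0, 6))  # 6.0 (slope of x1 when x2 = 1)
```

Because of the `b3` term, the effect of `x1` is no longer a single number: it changes with the level of `x2`, which is what the last line prints.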

3. Compare Models:
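
Models can be compared with R-squared, adjusted R-squared (which penalises extra terms), and, for nested models such as a linear fit versus the same fit plus a squared term, an F test. In R these come from `summary(model)$r.squared`, `summary(model)$adj.r.squared` and `anova(model_small, model_big)`, where `model`, `model_small` and `model_big` are hypothetical names for fitted `lm` objects. The Python sketch below only spells out the arithmetic behind those numbers, using made-up inputs; the F test's p-value, which R obtains from the F distribution (e.g. with `pf()`), is not computed because the standard library has no F distribution.

```python
def r_squared(y, y_hat):
    """Share of the variance in y explained by the fitted values."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalise R^2 for the number of predictors p (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def nested_f(rss_small, df_small, rss_big, df_big):
    """F statistic for a smaller model nested inside a bigger one.

    df_* are residual degrees of freedom (observations minus coefficients).
    """
    return ((rss_small - rss_big) / (df_small - df_big)) / (rss_big / df_big)


# Hypothetical numbers for illustration only
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 1.5, 3.5, 3.5]
r2 = r_squared(y, y_hat)  # 1 - 1/5
print(round(r2, 6), round(adjusted_r_squared(r2, n=4, p=1), 6))  # 0.8 0.7
print(round(nested_f(rss_small=10.0, df_small=20, rss_big=4.0, df_big=18), 6))  # 13.5
```

A large F value means the extra terms in the bigger model reduce the residual sum of squares by more than would be expected from chance alone.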

e. A paragraph explaining the results by interpreting the R output for regression:

The regression output from R, as printed by summary() on a fitted lm model, describes the
relationship between the predictor variables and the target variable. For instance, in a model
predicting Spotify track popularity from danceability, the coefficient for danceability is the
expected change in popularity for a one-unit increase in danceability; because danceability is
measured on a 0 to 1 scale, one unit corresponds to its entire range. The coefficient's p-value tests
whether that slope differs from zero, so a low p-value (conventionally below 0.05) indicates a
statistically significant relationship, while the R-squared value gives the proportion of the
variability in popularity that the model explains. In interpreting these results, one assesses both
statistical and practical significance in the context of the specific dataset: with many tracks, even a
small slope can be statistically significant while R-squared stays low. Exploring alternative model
specifications, such as polynomial terms or interactions, contributes to a fuller understanding of the
predictive relationships within the data.
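
To make the paragraph concrete, the sketch below computes, for a one-predictor model, the quantities it refers to: the slope estimate, its standard error and t value (the `Estimate`, `Std. Error` and `t value` columns of R's `summary(lm(...))` coefficient table), and R-squared. It is an illustrative Python reimplementation on toy data, not output from the Spotify model; the p-value is left out because it requires the t distribution with n - 2 degrees of freedom, which Python's standard library does not provide.

```python
import math


def simple_lm_summary(x, y):
    """Slope estimate, standard error, t value and R-squared for y ~ x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    rss = sum(e ** 2 for e in residuals)
    sigma2 = rss / (n - 2)               # residual variance with n - 2 df
    se_slope = math.sqrt(sigma2 / s_xx)  # standard error of the slope
    tss = sum((b - y_bar) ** 2 for b in y)
    return {
        "estimate": slope,
        "std_error": se_slope,
        "t_value": slope / se_slope,
        "r_squared": 1.0 - rss / tss,
    }


# Hypothetical toy data, not Spotify values
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
for name, value in simple_lm_summary(x, y).items():
    print(name, round(value, 4))
# prints: estimate 0.6 / std_error 0.2828 / t_value 2.1213 / r_squared 0.6
```

Here a t value of about 2.12 on only 3 residual degrees of freedom would not reach the conventional 0.05 level, which illustrates why the p-value must be read together with the sample size.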

f. Visualization through graphs:

• Scatter Plot with Regression Line:

• Residual Plot:
