
<TABLEAU> PROJECT REPORT

(Project Semester January-April 2024)

ONLINE SHOPPING TRENDS


Submitted by
Shaik Suheal Ahamed
Registration No: 12114304

Program
DATA SCIENCE
Section
KMO55
Course code: INTB233

Under the Guidance of


TANIMA THAKUR (23532)

Discipline of CSE/IT
Lovely School of computer sciences
Lovely Professional University, Phagwara
CERTIFICATE

This is to certify that SK SUHEAL AHAMED, bearing Registration No. 12114304,
has completed the INTB233 project titled “NETFLIX_TITLES”
under my guidance and supervision. To the best of my knowledge, the present
work is the result of his/her original development, effort, and study.

Signature and Name of the Supervisor


Designation of the Supervisor
School of Computer Sciences
Lovely Professional University,
Phagwara, Punjab.

Date: 18-04-2024

DECLARATION

I, SK. SUHEAL AHAMED, a student of DATA SCIENCE under the CSE/IT Discipline at
Lovely Professional University, Punjab, hereby declare that
all the information furnished in this project report is based on my own intensive
work and is genuine.

Date: 18-04-2024 Signature


Registration No. 12114304 Name of the student: SK SUHEAL AHAMED

Table of Contents

1. Introduction
2. Objectives/Scope of the Analysis
3. Source of dataset
4. ETL process
5. Analysis on dataset (for each analysis)
   i. Introduction
   ii. General Description
   iii. Specific Requirements, functions and formulas
   iv. Analysis results
   v. Visualization
6. List of Analysis with results
7. References
8. Bibliography

INTRODUCTION:

Welcome, everyone! Today, we embark on an exciting journey into the
world of entertainment analytics. Our focus? The Netflix_titles dataset and the
insights it unveils. In this report, we explore the significance
of this dataset, its key features, and how we used it to build an
insightful dashboard.

Netflix has revolutionized the way we consume entertainment. With thousands of
titles at our fingertips, understanding the dynamics of its content library is
paramount. The Netflix_titles dataset provides a comprehensive look into this
vast repository, including information on titles, release dates, genres, and much
more.

Our dataset isn't just about the titles themselves; it's a treasure trove of information
waiting to be uncovered. From categorical data like type (movie or TV show) and rating to
numerical insights such as duration, each attribute offers a unique perspective on Netflix's
content landscape.

Objectives/Scope of the Analysis

1. Total movies & TV shows by year

2. Total movies & TV shows by country

3. Top 10 Genre

4. Movies & TV shows distribution

5. Ratings

6. Number of directors from different countries

7. Directors

Source of Dataset:

The dataset comes from GitHub, a code-hosting platform where many public datasets are
shared and can be downloaded for free.
Link of the dataset: https://github.com/DataScienceRoadMapDSRM/Tableau-Dashboards-info/blob/main/netflix_titles.csv

ETL PROCESS:

Extract:

Retrieve the Dataset:

Download the dataset from github.com, ensuring it includes the relevant fields: the
title type (movie or TV show), date added, description, genres, rating, and the name of
the country that released each title.

Transform:

Data Cleaning: Identify and handle missing values, duplicates, and inconsistencies in the dataset.

Data Formatting: Standardize data formats and units to ensure consistency across variables.

Feature Engineering: Create new variables or features as needed, such as deriving seasonal
indicators from timestamps or aggregating data for analysis.
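
The three transform steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the exact procedure used for the dashboard: the sample rows are hypothetical, and the column names (show_id, type, country, date_added) and the "Month day, year" date format are assumptions based on the fields named in this report.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows; column names and date format are assumptions.
SAMPLE_CSV = """show_id,type,country,date_added
s1,Movie,United States,"September 25, 2021"
s2,TV Show,,"September 24, 2021"
s2,TV Show,,"September 24, 2021"
s3,Movie,India,
"""

def clean_rows(rows):
    """Drop duplicate show_ids, label missing countries, and derive year_added."""
    seen = set()
    cleaned = []
    for row in rows:
        show_id = row["show_id"].strip()
        if show_id in seen:  # data cleaning: skip duplicate records
            continue
        seen.add(show_id)
        row = dict(row)
        # Data cleaning: replace a missing country with an explicit label.
        row["country"] = row["country"].strip() or "Unknown"
        date_text = row["date_added"].strip()
        # Feature engineering: year the title was added, or None if missing.
        row["year_added"] = (
            datetime.strptime(date_text, "%B %d, %Y").year if date_text else None
        )
        cleaned.append(row)
    return cleaned

cleaned = clean_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in cleaned:
    print(row["show_id"], row["country"], row["year_added"])
```

Labelling missing countries as "Unknown" rather than dropping the rows keeps every title in the counts; whether that is the right choice depends on the analysis.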

Load:

Prepare Data for Tableau:

Export the cleaned and transformed dataset into a format compatible with Tableau, such
as CSV, Excel, or a Tableau Data Extract (TDE) file.

Connect Data to Tableau:

Import the prepared dataset into Tableau Desktop or connect it directly to the original data source.

Optimize Performance:

If working with large datasets, consider optimizing data extracts
or applying filters to improve Tableau performance during visualization.
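
As an illustration of the export step, the sketch below writes cleaned rows to a CSV file that Tableau can open as a text-file data source. The rows, the field names, and the file name netflix_titles_clean.csv are hypothetical, chosen only for this example.

```python
import csv
import os
import tempfile

# Hypothetical cleaned rows; field names are assumptions for illustration.
rows = [
    {"show_id": "s1", "type": "Movie", "country": "United States", "year_added": 2021},
    {"show_id": "s2", "type": "TV Show", "country": "Unknown", "year_added": 2021},
]

def export_for_tableau(rows, path):
    """Write rows to a UTF-8 CSV file with a header row and return the path."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path

# Write into a temporary directory so the example leaves no stray files behind.
out_path = export_for_tableau(
    rows, os.path.join(tempfile.mkdtemp(), "netflix_titles_clean.csv")
)

# Read the file back to confirm what was written.
with open(out_path, newline="", encoding="utf-8") as handle:
    exported = list(csv.DictReader(handle))
print(len(exported), "rows written to", out_path)
```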

Analysis of the dataset (for each analysis)


1. Introduction:
Total movies & TV shows by year

2. General Description:
This analysis shows how many distinct show IDs were added to Netflix in each year,
counting both movies and TV shows.

3. Specific Requirements, functions, and formulas:
Aggregation: Use the distinct-count aggregation (COUNT(Distinct) of show IDs) so that each
show ID is counted only once.
Year (date added) is placed on the horizontal axis and the distinct count of show IDs on the
vertical axis.

4. Analysis results
The number of movies and TV shows added to Netflix in each year.

5. Visualization:
We used an area chart for this visualization.
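
To make the distinct-count aggregation concrete, here is a small Python sketch of the same idea outside Tableau. The (show_id, year_added) pairs are hypothetical, and the helper name distinct_count_by is introduced only for this example.

```python
from collections import defaultdict

# Hypothetical (show_id, year_added) pairs, including one repeated show_id.
records = [
    ("s1", 2019), ("s2", 2019), ("s2", 2019),
    ("s3", 2020), ("s4", 2021), ("s5", 2021),
]

def distinct_count_by(records):
    """Return {key: number of distinct ids} for an iterable of (id, key) pairs."""
    ids_per_key = defaultdict(set)
    for item_id, key in records:
        ids_per_key[key].add(item_id)  # a set keeps each id only once
    return {key: len(ids) for key, ids in sorted(ids_per_key.items())}

titles_per_year = distinct_count_by(records)
print(titles_per_year)  # {2019: 2, 2020: 1, 2021: 2}
```

The repeated "s2" row is counted once for 2019, which is the behaviour a distinct count is meant to give.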

1. Introduction:
Total movies & TV shows by country
2. General Description:
This visualization shows the total number of distinct show IDs (both movies and TV shows)
released by each country.
3. Specific Requirements, functions, and formulas:
Aggregation: Use the distinct-count aggregation (COUNT(Distinct) of show IDs) to calculate how
many distinct shows were released by each country.
4. Analysis results
Highest number of shows released: United States (565)
Lowest number of shows released: Ghana and Peru (1)
5. Visualization:
We used a map for this visualization.
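
Because the lowest count is shared by two countries, picking the extremes needs to keep ties. A minimal sketch of that step follows; only the figures for the United States, Ghana, and Peru come from this report, while the India entry is a hypothetical filler value.

```python
# Distinct-title counts per country. The India value is hypothetical;
# the other three figures are the ones reported in the analysis results.
counts = {"United States": 565, "India": 120, "Ghana": 1, "Peru": 1}

def extremes(counts):
    """Return (top countries, top count, bottom countries, bottom count), keeping ties."""
    high = max(counts.values())
    low = min(counts.values())
    top = sorted(name for name, n in counts.items() if n == high)
    bottom = sorted(name for name, n in counts.items() if n == low)
    return top, high, bottom, low

top, high, bottom, low = extremes(counts)
print("Highest:", top, high)
print("Lowest:", bottom, low)
```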

1. Introduction:
Top 10 Genre
2. General Description:
As we navigate through the vast landscape of Netflix's content library, one of the key filters we
applied to streamline our analysis is the "Top 10 Genres" filter. Genres serve as a fundamental
categorization system, offering insights into the diverse array of content available on the
platform.
3. Specific Requirements, functions, and formulas:
Aggregation: Use the distinct count of show IDs (COUNT(Distinct)) together with a Top 10 filter
on the "listed in" field to find the top 10 genres on Netflix.
4. Analysis results
Documentaries are in first place with 299 distinct show IDs.
Dramas, International Movies and Romantic Movies are in 10th place with 108 distinct show
IDs.
5. Visualization:
We used a horizontal bar chart for this visualization.
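
The Top 10 filter amounts to ranking genres by distinct show IDs and keeping the first ten. The sketch below shows that logic on hypothetical data; the field name listed_in is an assumption based on the "listed in" field mentioned above, and n is set to 2 only to keep the example short.

```python
from collections import Counter

# Hypothetical (show_id, listed_in) pairs; listed_in stands for the genre field.
titles = [
    ("s1", "Documentaries"), ("s2", "Documentaries"), ("s3", "Documentaries"),
    ("s4", "Stand-Up Comedy"), ("s5", "Stand-Up Comedy"),
    ("s6", "Kids' TV"), ("s6", "Kids' TV"),  # repeated show_id counts once
]

def top_genres(titles, n=10):
    """Rank genres by number of distinct show ids and keep the first n."""
    distinct_pairs = set(titles)  # removes repeated (show_id, genre) rows
    counts = Counter(genre for _show_id, genre in distinct_pairs)
    # Sort by count (descending), then by name, so ties are ordered predictably.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]

top_two = top_genres(titles, n=2)
print(top_two)  # [('Documentaries', 3), ('Stand-Up Comedy', 2)]
```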

1. Introduction:
Movies & TV shows distribution
2. General Description:
The percentage aggregate function lets us calculate the proportion of movies and TV shows
relative to the total number of titles in the dataset. By aggregating this information, we can create
visualizations that illustrate the distribution of content types and their respective shares
of the Netflix library.
3. Specific Requirements, functions, and formulas:
Aggregation: Use the percent of total of the distinct count of show IDs to calculate the
proportion of movies and TV shows relative to the total number of titles in the dataset.
4. Analysis results
Movies added: 91.24% (1656)
TV shows added: 8.76% (159)
5. Visualization:
We used a heat map for this visualization.
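
The percentages above follow directly from the two counts: 1656 + 159 = 1815 titles in total, so each share is the count divided by 1815. A short check in Python, using only the counts reported here (the helper name percent_of_total is introduced for this example):

```python
# Counts taken from the analysis results in this report.
counts = {"Movie": 1656, "TV Show": 159}

def percent_of_total(counts):
    """Return each count as a percentage of the overall total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}

shares = percent_of_total(counts)
print(shares)  # {'Movie': 91.24, 'TV Show': 8.76}
```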

1. Introduction:
Ratings
2. General Description:
This visualization shows the different ratings assigned to movies and TV
shows on Netflix and how many show IDs fall under each rating.
3. Specific Requirements, functions, and formulas:
Aggregation: Use the distinct-count aggregation (COUNT(Distinct) of show IDs) so that each show
is counted only once.
4. Analysis results
TV-MA has the highest count, with 593 distinct show IDs.
Only one show ID falls under the NC-17 rating.
5. Visualization:
We used a bar graph for this visualization.

1. Introduction:
Number of directors from different countries

2. General Description:
A map is used to show how many distinct directors come from each country.

3. Specific Requirements, functions, and formulas:
Aggregation: Use the distinct-count aggregation (COUNT(Distinct) of directors) to calculate how
many distinct directors come from each country.
4. Analysis results
The highest number of directors is from the United States: 1175.
5. Visualization:
We used a map for this visualization.

1. Introduction:
Directors
2. General Description:
This analysis shows how many distinct shows were directed by each of the top 10 directors.

3. Specific Requirements, functions, and formulas:
Aggregation: Use the distinct-count aggregation (COUNT(Distinct) of show IDs) so that each show
is counted only once.

4. Visualization:
We used a bubble chart for this visualization.

List of Analysis With results:


REFERENCES:
COURSERA MOOC
https://www.coursera.org/learn/introduction-to-tableau

BIBLIOGRAPHY:
1. GitHub, online dataset available at
https://github.com/DataScienceRoadMapDSRM/TableauDashboardsinfo/blob/main/netflix_titles.csv
