Tableau CA Suheal
Program: DATA SCIENCE
Section: KMO55
Course code: INTB233
Discipline of CSE/IT
Lovely School of Computer Sciences
Lovely Professional University, Phagwara
CERTIFICATE
Date: 18-04-2024
DECLARATION
Table of Contents
1. Introduction
2. Objectives/Scope of the Analysis
3. Source of dataset
4. ETL process
5. Analysis on dataset (for each analysis)
   i. Introduction
   ii. General Description
   iii. Specific Requirements, functions and formulas
   iv. Analysis results
   v. Visualization
6. List of Analysis with results
7. References
8. Bibliography
INTRODUCTION:
Our dataset is a catalog of Netflix titles, and it holds much more than the titles themselves. It
combines categorical attributes such as type (movie or TV show) and rating with quantitative ones
such as duration, and each attribute offers a different perspective on Netflix's content landscape.
3. Top 10 Genres
5. Ratings
7. Directors
Source of Dataset:
The dataset was obtained from GitHub, a code-hosting platform where many public datasets can be
downloaded for free.
Link of the dataset: https://github.com/DataScienceRoadMapDSRM/Tableau-Dashboards-
Extract:
Download the dataset from github.com, making sure it includes the relevant fields: type (movie or
TV show), date added, description, genre, rating, and the country that released each title.
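As a hedged illustration of the extract step (the report performs it by downloading the file and connecting it to Tableau, not with code), the Python sketch below reads the CSV with the standard library and checks that the expected fields are present. The column names (show_id, type, date_added, listed_in, rating, country, director, description) are assumed from the netflix_titles.csv header, and the two sample rows are made up.

```python
import csv
import io

# Column names assumed from the public netflix_titles.csv header; adjust if the file differs.
REQUIRED_COLUMNS = {"show_id", "type", "date_added", "listed_in",
                    "rating", "country", "director", "description"}


def load_titles(file_obj):
    """Read the CSV into a list of dicts, failing early if expected columns are missing."""
    reader = csv.DictReader(file_obj)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


# Hypothetical two-row sample standing in for the real download.
SAMPLE = (
    "show_id,type,title,director,country,date_added,rating,listed_in,description\n"
    's1,Movie,Title A,Director X,United States,"September 25, 2021",PG-13,Documentaries,Sample\n'
    's2,TV Show,Title B,,India,"September 24, 2021",TV-MA,"International TV Shows, TV Dramas",Sample\n'
)

rows = load_titles(io.StringIO(SAMPLE))
print(len(rows), rows[1]["type"])  # 2 TV Show
```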
Transform:
Data Cleaning: Identify and handle missing values, duplicates, and inconsistencies in the dataset.
Data Formatting: Standardize data formats and units to ensure consistency across variables.
Feature Engineering: Create new variables or features as needed, such as deriving seasonal
indicators from timestamps or aggregating data for analysis.
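The cleaning and feature-engineering steps can be sketched in plain Python. This is an illustration under assumptions, not the report's actual pipeline: show_id is assumed to be the unique key, date_added is assumed to use the "September 25, 2021" format, and year_added is a hypothetical derived feature.

```python
from datetime import datetime


def clean_titles(rows):
    """Trim whitespace, drop rows with missing or repeated show_id, derive year_added."""
    seen = set()
    cleaned = []
    for row in rows:
        # Strip surrounding spaces from every text value; non-text values pass through.
        row = {k: (v.strip() if isinstance(v, str) else v) for k, v in row.items()}
        show_id = row.get("show_id")
        if not show_id or show_id in seen:
            continue  # skip rows without an ID and later duplicates of an ID
        seen.add(show_id)
        # Hypothetical feature: the year the title was added; None when the date is blank or unparseable.
        try:
            row["year_added"] = datetime.strptime(row.get("date_added") or "", "%B %d, %Y").year
        except ValueError:
            row["year_added"] = None
        cleaned.append(row)
    return cleaned


raw = [
    {"show_id": "s1", "type": "Movie", "date_added": " September 25, 2021 "},
    {"show_id": "s1", "type": "Movie", "date_added": "September 25, 2021"},  # duplicate ID
    {"show_id": "s2", "type": "TV Show", "date_added": ""},  # missing date
]
result = clean_titles(raw)
print([(r["show_id"], r["year_added"]) for r in result])  # [('s1', 2021), ('s2', None)]
```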
Load:
Export Data: Export the cleaned and transformed dataset into a format compatible with Tableau, such
as CSV, Excel, or a Tableau Data Extract (TDE) file (saved as .hyper in Tableau 10.5 and later).
Connect Data to Tableau: Import the prepared dataset into Tableau Desktop or connect it directly to
the original data source.
Optimize Performance: If working with large datasets, consider optimizing data extracts or applying
filters to improve Tableau performance during visualization.
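For the CSV route, a minimal Python sketch writes cleaned rows with csv.DictWriter so Tableau can import them. The column list and the output file name netflix_clean.csv in the comment are hypothetical; the example writes to an in-memory buffer so it runs anywhere.

```python
import csv
import io


def export_csv(rows, file_obj, columns):
    """Write rows as CSV with a fixed column order; keys not in `columns` are dropped."""
    writer = csv.DictWriter(file_obj, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


# In a real run this could be open("netflix_clean.csv", "w", newline="") (hypothetical name).
buffer = io.StringIO()
export_csv(
    [{"show_id": "s1", "type": "Movie", "year_added": 2021, "title": "Title A"}],
    buffer,
    columns=["show_id", "type", "year_added"],
)
print(buffer.getvalue())
```

Using extrasaction="ignore" keeps the exported file limited to the columns the dashboards need.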
2. General Description:
This analysis counts how many distinct show IDs were added to Netflix in each year, covering both
movies and TV shows.
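A minimal Python sketch of this aggregation (distinct show IDs grouped by the year they were added) is shown below as an illustration, not as the report's Tableau workbook. It assumes each row already carries a hypothetical year_added value derived from date_added, and the rows are invented samples.

```python
from collections import defaultdict


def titles_added_per_year(rows):
    """Count distinct show_ids per year_added (a distinct count grouped by year)."""
    ids_by_year = defaultdict(set)
    for row in rows:
        if row.get("year_added") is not None:
            ids_by_year[row["year_added"]].add(row["show_id"])
    return {year: len(ids) for year, ids in sorted(ids_by_year.items())}


sample = [
    {"show_id": "s1", "type": "Movie", "year_added": 2019},
    {"show_id": "s2", "type": "TV Show", "year_added": 2019},
    {"show_id": "s2", "type": "TV Show", "year_added": 2019},  # repeated row, counted once
    {"show_id": "s3", "type": "Movie", "year_added": 2020},
]
print(titles_added_per_year(sample))  # {2019: 2, 2020: 1}
```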
4. Analysis results
The chart shows the number of movies and TV shows added to Netflix in each year.
5. Visualization:
We used an area chart to show the visualization.
1. Introduction:
Total movies & TV shows by country
2. General Description:
This visualization shows the total number of distinct show IDs (covering both movies and TV shows)
released by each country.
3. Specific Requirements, functions, and formulas:
Aggregation: Use the Count (Distinct) aggregation of show IDs (COUNTD in Tableau's calculation
syntax) to calculate how many distinct shows were released by each country.
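The per-country distinct count can be sketched in Python as follows. The country column name is assumed from netflix_titles.csv, rows are grouped by the country value exactly as stored (comma-separated multi-country values are not split in this sketch), and the sample rows are invented for illustration. Ties are kept, so several countries can share the highest or lowest count.

```python
from collections import defaultdict


def titles_by_country(rows):
    """Distinct show_ids per country value, skipping rows with no country."""
    ids = defaultdict(set)
    for row in rows:
        country = row["country"].strip()
        if country:
            ids[country].add(row["show_id"])
    return {country: len(show_ids) for country, show_ids in ids.items()}


counts = titles_by_country([
    {"show_id": "s1", "country": "United States"},
    {"show_id": "s2", "country": "United States"},
    {"show_id": "s3", "country": "Ghana"},
    {"show_id": "s4", "country": "Peru"},
    {"show_id": "s5", "country": ""},
])
top, bottom = max(counts.values()), min(counts.values())
highest = sorted(c for c, n in counts.items() if n == top)
lowest = sorted(c for c, n in counts.items() if n == bottom)
print(highest, lowest)  # ['United States'] ['Ghana', 'Peru']
```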
4. Analysis results
Highest number of shows released: United States (565)
Lowest number of shows released: Ghana and Peru (1 each)
5. Visualization:
We used a map to show the visualization.
1. Introduction:
Top 10 Genres
2. General Description:
To keep the analysis of Netflix's large content library focused, we applied a "Top 10 Genres"
filter, which keeps only the ten genres with the most titles. Genres are a fundamental
categorization system and give insight into the diverse range of content available on the
platform.
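A rough Python equivalent of the Top 10 filter ranks each listed_in value by its distinct show-ID count and keeps the first n. The field name listed_in and the sample rows are assumed; the sketch treats each listed_in string as one genre value without splitting it on commas, and breaks ties alphabetically. Both are choices of this sketch, not a description of how Tableau orders ties.

```python
from collections import defaultdict


def top_n_genres(rows, n=10):
    """Rank listed_in values by distinct show_id count and keep the first n."""
    ids = defaultdict(set)
    for row in rows:
        ids[row["listed_in"]].add(row["show_id"])
    # Highest count first; equal counts ordered alphabetically by genre text.
    ranked = sorted(ids.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(genre, len(show_ids)) for genre, show_ids in ranked[:n]]


sample = [
    {"show_id": "s1", "listed_in": "Documentaries"},
    {"show_id": "s2", "listed_in": "Documentaries"},
    {"show_id": "s3", "listed_in": "Stand-Up Comedy"},
    {"show_id": "s4", "listed_in": "Dramas, International Movies"},
]
print(top_n_genres(sample, n=2))  # [('Documentaries', 2), ('Dramas, International Movies', 1)]
```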
3. Specific Requirements, functions, and formulas:
Aggregation: Use Count (Distinct) of show IDs together with a Top 10 filter on the "listed in"
field to find the top 10 genres on Netflix.
4. Analysis results
Documentaries are in first place with 299 distinct show IDs.
Dramas, International Movies and Romantic Movies are in 10th place with 108 distinct show IDs.
5. Visualization:
We used a horizontal bar chart to show the visualization.
1. Introduction:
Movies & TV shows distribution
2. General Description:
A percent-of-total calculation gives the proportion of movies and TV shows relative to the total
number of titles in the dataset. Aggregating the data this way lets us build visualizations that
show how the Netflix library is split between the two content types.
3. Specific Requirements, functions, and formulas:
Aggregation: Use the percent of total of Count (Distinct) of show IDs to calculate the share of
movies and TV shows in the total number of titles in the dataset.
4. Analysis results
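The percent-of-total share is the number of distinct show IDs of each type divided by the number of all distinct show IDs, multiplied by 100. The Python sketch below computes it on invented rows; the column names type and show_id are assumed from netflix_titles.csv, and rounding to one decimal place is an arbitrary choice.

```python
from collections import defaultdict


def type_share(rows):
    """Percent of total distinct show_ids for each value of the type field."""
    ids = defaultdict(set)
    for row in rows:
        ids[row["type"]].add(row["show_id"])
    total = len({row["show_id"] for row in rows})
    if total == 0:
        return {}  # avoid dividing by zero on an empty dataset
    return {kind: round(100 * len(show_ids) / total, 1) for kind, show_ids in ids.items()}


shares = type_share([
    {"show_id": "s1", "type": "Movie"},
    {"show_id": "s2", "type": "Movie"},
    {"show_id": "s3", "type": "Movie"},
    {"show_id": "s4", "type": "TV Show"},
])
print(shares)  # {'Movie': 75.0, 'TV Show': 25.0}
```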
1. Introduction:
Ratings
2. General Description:
This visualization shows the content ratings assigned to movies and TV shows on Netflix and how
many distinct show IDs fall under each rating.
3. Specific Requirements, functions, and formulas:
Aggregation: Use Count (Distinct) of show IDs so that each show is counted only once per rating.
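The per-rating distinct count can be sketched in Python as below, with the most frequent rating first. The rating column name is assumed, grouping blank ratings under a hypothetical "Unrated" label is a choice made only for this sketch, and the sample rows are invented.

```python
from collections import defaultdict


def titles_per_rating(rows):
    """Distinct show_ids per rating, sorted by count (descending), then by rating name."""
    ids = defaultdict(set)
    for row in rows:
        rating = row["rating"].strip() or "Unrated"  # hypothetical label for blank ratings
        ids[rating].add(row["show_id"])
    return sorted(((r, len(s)) for r, s in ids.items()), key=lambda pair: (-pair[1], pair[0]))


ranking = titles_per_rating([
    {"show_id": "s1", "rating": "TV-MA"},
    {"show_id": "s2", "rating": "TV-MA"},
    {"show_id": "s3", "rating": "NC-17"},
    {"show_id": "s4", "rating": ""},
])
print(ranking)  # [('TV-MA', 2), ('NC-17', 1), ('Unrated', 1)]
```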
4. Analysis results
TV-MA has the highest count, with 593 distinct show IDs.
Only one show ID is rated NC-17.
5. Visualization:
We used a bar chart to show the visualization.
1. Introduction:
Number of directors from different countries
2. General Description:
We used a map to show how many distinct directors came from each country.
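Counting distinct directors per country amounts to keeping one set of names per country. The sketch below assumes country and director columns as in netflix_titles.csv, skips rows where either field is blank, and uses invented rows; it illustrates the aggregation rather than reproducing the report's map.

```python
from collections import defaultdict


def directors_per_country(rows):
    """Count distinct director names per country, ignoring rows missing either field."""
    directors = defaultdict(set)
    for row in rows:
        country, director = row["country"].strip(), row["director"].strip()
        if country and director:
            directors[country].add(director)
    return {country: len(names) for country, names in directors.items()}


result = directors_per_country([
    {"country": "India", "director": "Director X"},
    {"country": "India", "director": "Director X"},  # same director, counted once
    {"country": "India", "director": "Director Y"},
    {"country": "Japan", "director": ""},  # no director, skipped
])
print(result)  # {'India': 2}
```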
1. Introduction:
Directors
2. General Description:
This analysis shows how many distinct shows were directed by each of the top 10 directors.
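A Python sketch of the top-directors ranking follows. It assumes the director field can list several co-directors separated by commas and credits each of them with the show; that splitting rule, the alphabetical tie-break, and the sample rows are assumptions of this sketch, not taken from the report.

```python
from collections import defaultdict


def top_directors(rows, n=10):
    """Rank directors by the number of distinct show_ids they directed; keep the first n."""
    shows = defaultdict(set)
    for row in rows:
        # Assumed format: co-directors separated by commas; blank entries are skipped.
        for name in row["director"].split(","):
            name = name.strip()
            if name:
                shows[name].add(row["show_id"])
    ranked = sorted(shows.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(name, len(ids)) for name, ids in ranked[:n]]


ranking = top_directors([
    {"show_id": "s1", "director": "Director X"},
    {"show_id": "s2", "director": "Director X, Director Y"},
    {"show_id": "s3", "director": ""},
], n=2)
print(ranking)  # [('Director X', 2), ('Director Y', 1)]
```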
BIBLIOGRAPHY:
1. GitHub, online dataset available at
https://github.com/DataScienceRoadMapDSRM/TableauDashboardsinfo/blob/main/netflix_titles.csv