CIA 1 Component 1: Introduction To Data Science


CIA 1 COMPONENT 1

INTRODUCTION TO DATA SCIENCE

EXPLORATORY DATA ANALYSIS USING ORANGE TOOL

KRITIKA INGOLE
19111023
5Bcoma
EXPLORATORY DATA ANALYSIS

• DATA TAKEN: Super sales Data
• DATA TYPE: Business Sales Data
ABOUT THE DATA
This dataset contains various details of products sold at a store. These types
of datasets are studied to find out the patterns in the selling structure and
profit earned from them.
CONTENTS OF DATA
• Order ID: A specific ID given to each product
• Order Priority: Priority of the product
• Order Quantity: Number of product items sold
• Sales
• Ship Mode: Divided into two categories, Express Air and Regular Air
• Profit: Profit earned from the sale
• Customer Name: Name of the customer purchasing the products
• Region: Region to which the customer belongs
• Customer Segment: Divided as per the size of business
• Product Category: Divided according to the usage of the product
• Product Sub-Category: Divided according to the usage of the product
• Product Name: Name of the product
• Product Container: Type of container in which the product is shipped
POINTS TO COVER IN THE DATA ANALYSIS:
• Data Preparation: Explore the data
• Data Cleaning: Handle missing values and duplicate rows
• Interactive Data Visualization
• Interpretation of the Data
DATA PREPARATION:
Here I imported the CSV file into Orange and labelled the Profit column as the
target and the other columns as features. I then linked the File widget to the Data
Table widget, displayed the variable labels, and visualized the numeric values.
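The same preparation step can be sketched outside Orange in plain Python. This is only an illustration, not the workflow used in this report: the rows below are made up, the helper name `split_target` is hypothetical, and only the column names come from the dataset description above.

```python
import csv
import io

# Hypothetical sample rows for illustration only; not taken from the real dataset.
SAMPLE_CSV = """Order ID,Order Priority,Order Quantity,Region,Profit
1,High,6,West,12.5
2,Low,2,Atlantic,-3.0
3,Medium,9,West,40.25
"""


def split_target(rows, target="Profit"):
    """Separate the target column from the remaining feature columns."""
    features, targets = [], []
    for row in rows:
        row = dict(row)  # copy so the caller's row is left unchanged
        targets.append(float(row.pop(target)))
        features.append(row)
    return features, targets


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
features, targets = split_target(rows)

print(targets)            # the Profit values, used as the target
print(list(features[0]))  # the remaining column names, used as features
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened CSV file.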
DATA CLEANING:
o In this step I cleaned the data and checked whether any wrong,
missing, or null values were present. I found missing values in the
second column, Order Priority, which I imputed with the average or
most frequent value in the data.
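As a rough sketch of what "average or most frequent value" imputation does, the plain-Python example below fills numeric gaps with the mean and categorical gaps with the most common value, after dropping duplicate rows. The rows and the helper names `drop_duplicates` and `impute` are assumptions made for illustration; this is not Orange's internal code.

```python
from collections import Counter
from statistics import mean

# Hypothetical rows for illustration only; None marks a missing value.
rows = [
    {"Order ID": 1, "Order Priority": "High", "Order Quantity": 6},
    {"Order ID": 2, "Order Priority": None, "Order Quantity": 2},
    {"Order ID": 3, "Order Priority": "High", "Order Quantity": None},
    {"Order ID": 4, "Order Priority": "Low", "Order Quantity": 10},
    {"Order ID": 4, "Order Priority": "Low", "Order Quantity": 10},  # duplicate row
]


def drop_duplicates(rows):
    """Keep only the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute(rows, column):
    """Fill gaps with the mean (numeric column) or the most frequent value (otherwise)."""
    present = [r[column] for r in rows if r[column] is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


clean = drop_duplicates(rows)
clean = impute(clean, "Order Priority")
clean = impute(clean, "Order Quantity")

for row in clean:
    print(row)
```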
DATA AFTER IMPUTING:
DATA VISUALIZATION (with interpretation)

• Visualized with the Distributions widget, which showed which Region ordered more products.

• FREEVIZ GRAPH: Here I have shown the order quantity against the region. FreeViz is an
extremely powerful widget that helps you extract important information from the dataset.
From this we see that the quantities ordered are higher in the Atlantic and Northwestern
Territories regions.
SIEVE VISUALIZATIONS:
A Sieve Diagram is a graphical method for visualizing frequencies in a two-way
contingency table and comparing them to expected frequencies under
assumption of independence.
Here I have shown sales against product sub-category. I can see that the frequencies for office
furnishings and appliances show positive sales.
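The "expected frequencies under the assumption of independence" that a sieve diagram compares against can be computed from a two-way table as row total × column total ÷ grand total. The sketch below works through this on a made-up 2×2 table; the counts and the function name `expected_frequencies` are hypothetical and are not taken from the sales data.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence: row_total * col_total / grand_total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2x2 contingency table (rows: two sub-categories, columns: two sales bins).
observed = [[30, 10],
            [20, 40]]

expected = expected_frequencies(observed)

# Positive deviation: the cell occurs more often than independence would predict.
deviation = [[o - e for o, e in zip(obs_row, exp_row)]
             for obs_row, exp_row in zip(observed, expected)]

print(expected)
print(deviation)
```

Cells with a positive deviation are the ones a sieve diagram highlights as more frequent than expected.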
SILHOUETTE PLOT:

Silhouette widget offers a graphical representation of consistency within clusters of data and provides
the user with the means to visually assess cluster quality. The silhouette score is a measure of how
similar an object is to its own cluster in comparison to other clusters and is crucial in the creation of a
silhouette plot. The silhouette score close to 1 indicates that the data instance is close to the center of
the cluster and instances possessing the silhouette scores close to 0 are on the border between two
clusters. Here I have shown two clusters, customer segments and customer name; this shows which
customer segment each customer falls into.
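To make the score concrete, here is a minimal sketch of the standard silhouette formula, s = (b − a) / max(a, b), where a is the mean distance from a point to the other members of its own cluster and b is the smallest mean distance to any other cluster. The points are invented for illustration and Euclidean distance is an assumption; the Orange widget may be configured with a different distance.

```python
from math import dist
from statistics import mean


def silhouette(point, own_cluster, other_clusters):
    """Silhouette score of `point`: (b - a) / max(a, b)."""
    # Note: this simple filter also drops exact duplicates of `point`.
    neighbours = [p for p in own_cluster if p != point]
    if not neighbours:
        return 0.0  # common convention for a single-member cluster
    a = mean([dist(point, p) for p in neighbours])
    b = min(mean([dist(point, p) for p in cluster]) for cluster in other_clusters)
    return (b - a) / max(a, b)


# Hypothetical 2-D points for illustration only.
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(5.0, 0.0), (5.0, 1.0)]

# A point deep inside its own cluster scores close to 1.
well_placed = silhouette((0.0, 0.0), cluster_a, [cluster_b])

# A point equally far from both clusters scores 0 (on the border).
on_border = silhouette((2.5, 0.0), [(0.0, 0.0), (2.5, 0.0)], [[(5.0, 0.0)]])

print(round(well_placed, 3), on_border)
```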
BAR PLOT

This graph shows that the West region ranks first for sales of products, the Northern Territories
second, and Atlantic third.
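One way a regional ranking like this can be derived is to total sales per region and sort the totals. The sketch below assumes that approach; the report does not state whether the bar plot shows totals or individual values. The region labels and amounts are placeholders, so the printed order says nothing about the real data.

```python
from collections import defaultdict

# Hypothetical (region, sales) pairs for illustration only.
orders = [
    ("Region A", 120.0),
    ("Region B", 80.0),
    ("Region A", 60.0),
    ("Region C", 150.0),
    ("Region B", 30.0),
]


def rank_by_total(pairs):
    """Sum the values per key and return (key, total) pairs, largest total first."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_total(orders)
for position, (region, total) in enumerate(ranking, start=1):
    print(position, region, total)
```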
• The following graph shows that the product sub-category office furnishings has a higher
priority.
The following visualization shows that the West region gives more profit.
CONCLUSION

THE FINAL TREE OF THE EXPLORATORY DATA ANALYSIS

LINK REFERENCES

https://towardsdatascience.com/data-science-made-easy-interactive-data-visualization-using-orange-de8d5f6b7f2b
