DV Lab Manual 2022-23


USHA MITTAL INSTITUTE OF TECHNOLOGY

SNDT WOMEN’S UNIVERSITY


Department of Data Science

Name of the Student :

Subject :

Title :

Experiment No. :

Date Performed :

Evaluation Rubric
Variables 1 2 3 4 5 Points

Individual Performance

Attendance

Prelab

Post Lab

Report/Submission

Overall Points (25)

Lab In-charge Signature and Date


Lab Journal 2022-23

INDEX

No  Lab Experiments  Date  Remarks  Signature

1  Introduction to Matplotlib
2  To create and customize Bar chart and Histogram using Matplotlib for given dataset
3  To create and customize Pie chart using Matplotlib for given dataset
4  To create and customize Line chart using Matplotlib for given dataset
5  To create dual axis charts using Tableau Public for given dataset
6  To create and customize Cartogram map for given dataset
7  To create and customize Cartogram map using Tableau Public for given dataset
8  To create and customize Word Cloud using Matplotlib for a given webpage

Experiment 1
Title: Introduction to Matplotlib.
Apparatus: Simulation tools used for experiment…………………………………………………………………………..
Dataset used:……………………………………………………………………………………………………………………………….

Theory:
Data (or information) visualization is used to interpret and gain insight into large amounts of data.
This is achieved through visual representations, often interactive, of raw data. For data visualization
in Python, we can use various Python data visualization libraries such as Matplotlib, Seaborn, and
Plotly.
o Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations
in Python. Matplotlib makes easy things easy and hard things possible. It is primarily a 2-D plotting
library for Python. It was initially released in 2003 and is one of the most popular and widely used
plotting libraries in the Python community. It offers an interactive environment across multiple
platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, Jupyter notebooks,
web application servers, etc. It can also embed plots into applications using GUI toolkits such as
Tkinter, GTK+, wxPython, and Qt. Matplotlib can therefore be used to create line plots, bar charts, pie
charts, histograms, scatterplots, error charts, power spectra, stemplots, and many other kinds of
visualization. Its Pyplot module also provides a MATLAB-like interface that is just as versatile as
MATLAB while being free and open source.
o Pyplot
Pyplot is a Matplotlib module that provides a MATLAB-like interface. It is designed to be as usable
as MATLAB, with the added benefits of working in Python and being free and open source. Each pyplot
function makes some change to a figure: e.g., it creates a figure, creates a plotting area in a
figure, plots some lines in a plotting area, or decorates the plot with labels. Plot types available
through Pyplot include line plots, histograms, scatter plots, 3D plots, images, contour plots, and
polar plots.
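The following is a minimal sketch of the pyplot sequence described above. The data, labels, and title are made up for illustration. Each call acts on the current figure or axes, in the order the paragraph lists them: create a figure, create a plotting area, plot a line, then add labels.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 4))   # creates a figure
ax = plt.subplot(1, 1, 1)          # creates a plotting area (axes) in that figure
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], marker="o", label="y = x squared")  # plots a line in the axes
plt.xlabel("x")                    # decorates the plot with labels
plt.ylabel("y")
plt.title("A first pyplot figure")
plt.legend()
# In an interactive session call plt.show(); to save instead, call plt.savefig("first_plot.png").
```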

Procedure:

1. Install and import the necessary libraries
2. Load the dataset
3. Convert the dataset into a useful format by cleaning and formatting it
4. Represent the data using suitable plot methods
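The four procedure steps can be sketched in Python with pandas and Matplotlib. This is an illustration under assumptions, not the prescribed solution. The inline `RAW_CSV` data and the column names `month` and `sales` are hypothetical stand-ins for the experiment's dataset. The cleaning rules (drop duplicates, coerce non-numeric entries, drop missing values) are only one reasonable choice.

```python
# Step 1 (install once): pip install pandas matplotlib
import io

import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical inline data standing in for the experiment's dataset file.
RAW_CSV = """month,sales
Jan,120
Feb,
Mar,150
Mar,150
Apr,unknown
May,170
"""


def load_dataset(text: str) -> pd.DataFrame:
    """Step 2: load the dataset (pd.read_csv also accepts a file path or URL)."""
    return pd.read_csv(io.StringIO(text))


def clean_dataset(df: pd.DataFrame) -> pd.DataFrame:
    """Step 3: remove duplicate rows, coerce non-numeric entries to NaN, drop missing values."""
    df = df.drop_duplicates().copy()
    df["sales"] = pd.to_numeric(df["sales"], errors="coerce")  # "unknown" becomes NaN
    return df.dropna(subset=["sales"]).reset_index(drop=True)


def plot_dataset(df: pd.DataFrame):
    """Step 4: represent the cleaned data with a suitable plot."""
    fig, ax = plt.subplots()
    ax.plot(df["month"], df["sales"], marker="o")
    ax.set_xlabel("Month")
    ax.set_ylabel("Sales")
    ax.set_title("Monthly sales (hypothetical data)")
    return fig, ax


cleaned = clean_dataset(load_dataset(RAW_CSV))
fig, ax = plot_dataset(cleaned)
```

With a real file, replace `io.StringIO(text)` with the file path. The cleaning rules should also be adapted to the issues actually present in the dataset.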

Conclusion/Result:

We have studied and used the Matplotlib library to represent the given data with a suitable chart.
Experiment 2

Title: To create and customize Bar chart and Histogram using Matplotlib for given dataset.
Apparatus: Simulation tools used for experiment…………………………………………………………………
Dataset used:……………………………………………………………………………………………………………………….

Theory:
A] Bar Chart
Data variables: 1 x categorical, 1 x quantitative-ratio.
Visual variables: Length/height, color-hue.
Description: Bar charts convey data through the length or height of a bar, allowing us
to draw accurate comparisons between categories for both relative and absolute values.
When using length as the visual variable to represent a quantitative value, it is
important to show the full extent of this property, so always start the bar from the zero
point on the axis. The use of color can help draw attention to the values of specific
categories in accordance with your narrative.
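The following is a minimal Matplotlib sketch of the two guidelines above, using hypothetical categories A to D. The y axis is pinned to start at zero, and a single accent color draws attention to one category while the rest stay neutral. `highlighted_bar_chart` is an illustrative helper, not a Matplotlib function. `Axes.bar_label` needs Matplotlib 3.4 or later.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt


def highlighted_bar_chart(categories, values, highlight,
                          base_color="lightgray", accent_color="tab:red"):
    """Bar chart with one category picked out in an accent color."""
    colors = [accent_color if c == highlight else base_color for c in categories]
    fig, ax = plt.subplots()
    bars = ax.bar(categories, values, color=colors, edgecolor="black")
    ax.set_ylim(bottom=0)   # bar length encodes the value, so keep the zero baseline
    ax.bar_label(bars)      # value labels on top of each bar (Matplotlib 3.4+)
    ax.set_xlabel("Category")
    ax.set_ylabel("Value")
    return fig, ax, bars


# Hypothetical data for illustration.
fig, ax, bars = highlighted_bar_chart(["A", "B", "C", "D"], [23, 17, 35, 29], highlight="C")
```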

B] Histogram
Data variables: 1 x quantitative-interval, 1 x quantitative-ratio.
Visual variables: Height, width.
Description: Histograms are often mistaken for bar charts, but there are important
differences. Histograms show distribution through the frequency of quantitative values
(y axis) against defined intervals of quantitative values (x axis). By contrast, bar charts
facilitate comparison of categorical values. One of the distinguishing features of a
histogram is the lack of gaps between the bars.
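The sketch below uses hypothetical exam scores generated with NumPy. It bins a quantitative variable into defined 5-point intervals with `Axes.hist`. Only one dataset is plotted and `rwidth` is left at its default, so each bar spans its full interval width. As a result, adjacent bars touch with no gaps, unlike a bar chart. `score_histogram` is an illustrative helper name.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np


def score_histogram(scores, bin_width=5):
    """Histogram of scores in fixed-width intervals covering 0 to 100."""
    bins = np.arange(0, 100 + bin_width, bin_width)   # interval edges 0, 5, ..., 100
    fig, ax = plt.subplots()
    counts, edges, patches = ax.hist(scores, bins=bins, edgecolor="black")
    ax.set_xlabel("Score interval")
    ax.set_ylabel("Frequency")
    return counts, edges, patches


# Hypothetical exam scores, clipped to the 0-100 range.
rng = np.random.default_rng(seed=0)
scores = np.clip(rng.normal(loc=65, scale=10, size=500), 0, 100)
counts, edges, patches = score_histogram(scores)
```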

Procedure:

1. Install and import the necessary libraries
2. Load the dataset
3. Convert the dataset into a useful format by cleaning and formatting it
4. Represent the data using a bar chart and a histogram

Conclusion/Result:

We have studied and created a bar chart and a histogram for the given dataset using the
Matplotlib library.
Experiment 3

Title: To create and customize Pie chart using Matplotlib for given dataset.
Apparatus: Simulation tools used for experiment…………………………………………………………………
Dataset used:……………………………………………………………………………………………………………………….

Theory:
Pie Chart
Data variables: 1 x categorical, 1 x quantitative-ratio.
Visual variables: Angle, area, color-hue.
Description: Pie charts are probably the most contentious chart type and attract much
negative sentiment. While we know it is harder to accurately interpret angles and judge the
area of segments compared to other visual variables, the negativity is arguably more a
reflection of their relentless misuse. The inclusion of too many categories and colors, 3D
decoration, and poorly executed arrangement are often to blame for this. Usually, a simple
bar chart will suffice to demonstrate the part-to-whole relationship. However, if you are
determined to use a pie chart, always start the first slice from the vertical position (to
establish a sense of baseline), minimize the number of categories being displayed (ideally
no more than three), and arrange the segments as logically as possible. Variations include the
donut chart, which is essentially the same chart but with the center removed (to
accommodate labels or nested donut charts).
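The sketch below shows one way to follow these recommendations in Matplotlib:

* Slices are sorted largest first.
* The first slice starts at the vertical (`startangle=90`) and runs clockwise (`counterclock=False`).
* Setting a wedge `width` removes the center and turns the pie into a donut.

The labels and values are hypothetical, and `part_to_whole_chart` is an illustrative helper name.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt


def part_to_whole_chart(labels, values, donut=False):
    """Pie (or donut) chart: largest slice first, starting at 12 o'clock, running clockwise."""
    ordered = sorted(zip(values, labels), reverse=True)   # arrange segments logically
    values = [v for v, _ in ordered]
    labels = [lab for _, lab in ordered]
    wedgeprops = {"edgecolor": "white"}
    if donut:
        wedgeprops["width"] = 0.4   # ring thickness; removes the center
    fig, ax = plt.subplots()
    wedges, texts, autotexts = ax.pie(
        values,
        labels=labels,
        autopct="%1.0f%%",
        startangle=90,            # first slice starts at the vertical
        counterclock=False,       # slices run clockwise
        pctdistance=0.8 if donut else 0.6,
        wedgeprops=wedgeprops,
    )
    ax.set_aspect("equal")
    return fig, ax, wedges


# Hypothetical shares for three categories.
fig, ax, wedges = part_to_whole_chart(["Online", "Retail", "Wholesale"], [30, 50, 20], donut=True)
```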

Procedure:

1. Install and import the necessary libraries
2. Load the dataset
3. Convert the dataset into a useful format by cleaning and formatting it
4. Represent the data using a pie chart with the necessary customization.

Conclusion/Result:

We have studied and created a pie chart with the necessary customization for the given dataset
using the Matplotlib library.
Experiment 4

Title: To create and customize Line chart using Matplotlib for given dataset.
Apparatus: Simulation tools used for experiment…………………………………………………………………
Dataset used:……………………………………………………………………………………………………………………….

Theory
Line chart
Data variables: 1 x quantitative-interval, 1 x quantitative-ratio, 1 x categorical.
Visual variables: Position, slope, color-hue.
Description: Line charts are something we should all be familiar with. They plot a continuous
quantitative variable (often time) on the x axis against the size of values on the y axis.
The data points are joined up using lines to show the shifting trajectory through the
resulting slopes. Line charts can help tell powerful stories about the relative, and possibly
related, changes in the values of different categories. Unlike bar charts, the y axis doesn't
need to start from zero, because we are looking at the relative pattern of the data's journey.
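The following is a short sketch of a line chart for two hypothetical product categories over five years. By default, Matplotlib fits the y axis to the data range rather than starting it at zero. This matches the point above about focusing on the relative pattern.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical yearly sales (units) for two product categories.
years = [2018, 2019, 2020, 2021, 2022]
series = {
    "Product A": [410, 425, 398, 440, 462],
    "Product B": [380, 402, 415, 409, 431],
}

fig, ax = plt.subplots()
for name, values in series.items():
    ax.plot(years, values, marker="o", label=name)   # one line per category
ax.set_xticks(years)        # one tick per year instead of fractional ticks
ax.set_xlabel("Year")
ax.set_ylabel("Units sold")
ax.legend()
# No ax.set_ylim(bottom=0): Matplotlib autoscales the y axis to the data range,
# which keeps the relative pattern of the lines visible.
```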

Procedure:

1. Install and import the necessary libraries.
2. Load the dataset.
3. Convert the dataset into a useful format by cleaning and formatting it.
4. Represent the data using a line chart with the necessary customization.

Conclusion/Result:

We have studied and created a line chart with the necessary customization for the given dataset
using the Matplotlib library.
Experiment 5

Title: To create dual axis charts using Tableau Public for given dataset.
Apparatus: Simulation tools used for experiment…………………………………………………………………
Dataset used:……………………………………………………………………………………………………………………….

Theory

Tableau Public is a free platform to explore, create, and publicly share data visualizations online.
With a very large public repository of data visualizations to learn from, Tableau Public makes it
easier to develop data skills, and studying other users' work helps advance analytics skills.

Dual axis charts:

A dual axis chart (also called a multiple axes chart) uses two vertical axes to illustrate the
relationship between two variables that have different magnitudes and scales of measurement.
The degree to which two variables move together is referred to as correlation. A dual axis chart
conveys a lot of information in limited space, which helps reveal trends that might otherwise
be missed.
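This experiment builds the chart in Tableau Public. Purely to illustrate the dual axis idea in code, the sketch below draws a comparable chart in Matplotlib instead. It uses `Axes.twinx`, which adds a second y axis that shares the same x axis. The monthly revenue and margin figures are hypothetical. `np.corrcoef` quantifies how strongly the two series move together.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical monthly figures on two very different scales.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [12000, 13500, 12800, 15200, 16100, 17400]   # currency units
margin = [18.0, 19.5, 17.2, 21.0, 22.4, 23.1]           # percent

fig, ax_left = plt.subplots()
ax_left.bar(months, revenue, color="tab:blue", alpha=0.7)
ax_left.set_ylabel("Revenue", color="tab:blue")

ax_right = ax_left.twinx()   # second y axis sharing the same x axis
ax_right.plot(months, margin, color="tab:orange", marker="o")
ax_right.set_ylabel("Profit margin (%)", color="tab:orange")
ax_left.set_title("Revenue vs. profit margin (hypothetical data)")

# Pearson correlation coefficient between the two series.
r = np.corrcoef(revenue, margin)[0, 1]
print(f"correlation r = {r:.2f}")
```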

Procedure:

1. Install and import the necessary libraries
2. Load the dataset
3. Convert the dataset into a useful format by cleaning and formatting it
4. Represent the data with dual axis charts using Tableau Public

Conclusion/Result:

We have studied and created dual axis charts with the necessary customization for the given
dataset using Tableau Public.
Experiment 6

Title: To create and customize Cartogram map for given dataset.


Apparatus: Simulation tools used for experiment…………………………………………………………………
Dataset used:……………………………………………………………………………………………………………………….

Theory:

Cartogram map
Data variables: 2 x quantitative-interval, 1 x quantitative-ratio.
Visual variables: Position, size.
Description: Where a choropleth map takes a location and gives it a shade of color to
represent a value, a cartogram takes a location and resizes (distorts) its geographic shape to
represent a value. The result is a distorted and skewed view of reality in the form of a
reconfigured atlas. As with many other chart types, the purpose is not to enable exact readings
but rather to highlight the highly inflated, deflated, and unchanged shapes and sizes. Cartograms
rely on the reader's prior familiarity with (for example) a country's position, shape, and size.
Such charts tend to be most effective when they are interactive, so that all the benefits of
exploratory analysis can be unlocked.

Geopandas
GeoPandas is an open source project that makes working with geospatial data in Python easier.
It currently implements GeoSeries and GeoDataFrame types, which are subclasses
of pandas.Series and pandas.DataFrame respectively. GeoPandas objects can act
on shapely geometry objects and perform geometric operations.
It combines the capabilities of pandas and shapely, providing geospatial operations in pandas
and a high-level interface to multiple geometries to shapely. GeoPandas makes it easy to do
operations in Python that would otherwise require a spatial database such as PostGIS.
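Cartograms come in several forms. The sketch below builds a simple non-contiguous cartogram with GeoPandas and shapely. This choice of technique is an assumption, not the method prescribed for this experiment. Each region is shrunk about its centroid with `shapely.affinity.scale` so that its area becomes proportional to its value. The densest region keeps its full size. Contiguous cartograms keep neighboring regions joined and need dedicated algorithms, which are not shown here. The square regions and values are hypothetical, and `noncontiguous_cartogram` is an illustrative helper name.

```python
import geopandas as gpd
import numpy as np
from shapely import affinity
from shapely.geometry import box


def noncontiguous_cartogram(gdf, value_col):
    """Shrink each polygon about its centroid so its area becomes proportional to value_col.

    The region with the highest value per unit area keeps its original size,
    so every other region shrinks and no shape grows into its neighbors.
    """
    density = gdf[value_col] / gdf.geometry.area
    factors = np.sqrt(density / density.max())   # linear factor; area scales by factor**2
    scaled = [
        affinity.scale(geom, xfact=f, yfact=f, origin="centroid")
        for geom, f in zip(gdf.geometry, factors)
    ]
    return gpd.GeoDataFrame(gdf.drop(columns="geometry"), geometry=scaled)


# Hypothetical square "regions" and values; real work would load a boundary
# file with gpd.read_file(...) and join the dataset's values onto it.
regions = gpd.GeoDataFrame(
    {"region": ["A", "B", "C"], "value": [8, 1, 3]},
    geometry=[box(0, 0, 2, 2), box(2, 0, 3, 2), box(0, 2, 3, 3)],
)
cartogram = noncontiguous_cartogram(regions, "value")
```

To display the result, call `cartogram.plot(column="value", legend=True, edgecolor="black")`. This draws the shrunken regions shaded by value. Plotting `regions.boundary` on the same axes first shows the original outlines for comparison.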

Procedure:

1. Install and import the necessary libraries
2. Load the dataset
3. Convert the dataset into a useful format by cleaning and formatting it
4. Represent the data using a cartogram map with the necessary customization

Conclusion/Result:

We have studied and created a cartogram map with the necessary customization for the given dataset.

Experiment 7

Title: To create and customize Cartogram map using Tableau Public for given dataset.
Apparatus: Simulation tools used for experiment…………………………………………………………………
Dataset used:……………………………………………………………………………………………………………………….

Theory:

A cartogram is a map in which some thematic mapping variable is substituted for land area or
distance. The geometry or space of the map is distorted in order to convey the information of this
alternate variable. In this experiment, Tableau is used to plot data as a cartogram using the Polygon
Map approach. Although cartograms are not a native feature of Tableau, Tableau is a good and fast
option for visualizing cartogram polygon data.
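Tableau's polygon map approach expects one row per vertex. The sketch below assumes the cartogram shapes have already been computed elsewhere. It flattens those shapes into that layout with pandas. The helper name `polygon_rows` and the column names are hypothetical. A typical Tableau Public setup then places X on Columns and Y on Rows, sets the Mark type to Polygon, and puts Shape on Detail and Point Order on Path.

```python
import pandas as pd


def polygon_rows(shapes):
    """One row per vertex: shape name, drawing order, and x/y coordinates.

    shapes maps a shape name to its list of (x, y) vertices in drawing order.
    """
    rows = []
    for name, vertices in shapes.items():
        for order, (x, y) in enumerate(vertices, start=1):
            rows.append({"Shape": name, "Point Order": order, "X": x, "Y": y})
    return pd.DataFrame(rows, columns=["Shape", "Point Order", "X", "Y"])


# Hypothetical cartogram shapes (already distorted by some other tool).
shapes = {
    "Region A": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "Region B": [(2.2, 0.5), (2.7, 0.5), (2.7, 1.5), (2.2, 1.5)],
}
polygons = polygon_rows(shapes)
# polygons.to_csv("cartogram_polygons.csv", index=False)  # file to connect to in Tableau Public
```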

Hierarchy in Tableau

A hierarchy is an arrangement in which entities are presented at various levels. In common terms,
a hierarchy is a system or organization that has many levels from highest to lowest; similarly,
in Tableau, hierarchies can be created by bringing one dimension as a level under the
principal dimension. For example, in the Sample Superstore dataset, the hierarchy for
geographical dimensions is:
• Country (USA)
• State
• City
• Postal Code
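Tableau handles hierarchy drill-down in its own interface. As a rough pandas analogue of expanding the Country > State > City > Postal Code hierarchy, the sketch below aggregates a measure at each successive level. The order rows and sales values are hypothetical, loosely modeled on Sample Superstore's columns. `drill_down` is an illustrative helper name.

```python
import pandas as pd

# Geographic hierarchy from highest to lowest level, as listed above.
HIERARCHY = ["Country", "State", "City", "Postal Code"]

# Hypothetical order rows in the spirit of Sample Superstore (values made up).
orders = pd.DataFrame({
    "Country": ["USA"] * 5,
    "State": ["California", "California", "California", "Texas", "Texas"],
    "City": ["Los Angeles", "Los Angeles", "San Francisco", "Houston", "Dallas"],
    "Postal Code": ["90001", "90002", "94103", "77002", "75201"],
    "Sales": [250.0, 120.0, 310.0, 90.0, 140.0],
})


def drill_down(df, levels, measure):
    """Aggregate `measure` at each level, like expanding a hierarchy one step at a time."""
    return {
        level: df.groupby(levels[: i + 1])[measure].sum()
        for i, level in enumerate(levels)
    }


totals = drill_down(orders, HIERARCHY, "Sales")
print(totals["State"])
```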

Procedure:

1. Install and import the necessary libraries
2. Load the dataset
3. Convert the dataset into a useful format by cleaning and formatting it
4. Represent the data using a cartogram map with the necessary customization in Tableau Public

Conclusion/Result:

We have created a cartogram map with the necessary customization in Tableau Public for the given dataset.

Experiment 8

Title: To create and customize Word Cloud using Matplotlib for a given webpage.
Apparatus: Simulation tools used for experiment…………………………………………………………………
Dataset used:……………………………………………………………………………………………………………………….

Theory:

A text visualization chart is a graphical representation of the frequency of qualitative data, such as
keywords or customer feedback. The chart gives greater prominence to words that appear more
frequently in a source text: the larger the word, the higher its frequency. Text charts can be used
to perform exploratory textual analysis by identifying words that frequently appear in a set of
interviews, documents, or other text. They can also be used to communicate the most salient points
or themes in the reporting stage. A word cloud is one of the methods used for text visualization.
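Before any text chart is drawn, the word frequencies have to be computed. The following is a minimal standard-library sketch for a webpage, under these assumptions:

* Visible text is everything outside `<script>` and `<style>` tags.
* Words are lowercase runs of letters and apostrophes.
* The small stop-word set is illustrative rather than complete.

`fetch_html` needs network access and is not called here. The sample HTML is hypothetical, and all names are illustrative.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen

# Illustrative stop-word set; a real analysis would use a fuller list.
STOPWORDS = {"the", "and", "for", "are", "with", "this", "that", "from", "was", "were"}


class VisibleTextParser(HTMLParser):
    """Collects text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def fetch_html(url):
    """Download a webpage's HTML (requires network access)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def word_frequencies(html, stopwords=STOPWORDS, min_len=3):
    """Count lowercase words of at least min_len letters that are not stop words."""
    parser = VisibleTextParser()
    parser.feed(html)
    words = re.findall(r"[a-z']+", " ".join(parser.parts).lower())
    return Counter(w for w in words if len(w) >= min_len and w not in stopwords)


SAMPLE_HTML = """<html><head><style>p { color: red; }</style></head>
<body><p>Data visualization makes data easier to read. Data wins.</p>
<script>var data = 1;</script></body></html>"""
freqs = word_frequencies(SAMPLE_HTML)
print(freqs.most_common(3))
```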

Word Cloud

Data variables: 1 x categorical, 1 x quantitative-ratio.
Visual variables: Size.
Description: Word clouds depict the frequency of words used in a given set of text. The font
size indicates the quantity of each word's usage. Color is often just used as decoration (which
you'll notice actually distorts the visual prominence). While it's fair to say they are becoming
something of a ubiquitous visual commodity, they can be useful for exploring datasets for the
first time in order to identify key terms being used. If you feel compelled to use word clouds,
the best advice is to ensure the underlying text being used is carefully prepared in advance to
reduce the noise.
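The third-party `wordcloud` package is a common choice for this chart, and its output is usually shown with `plt.imshow`. The sketch below stays with plain Matplotlib instead. It places words on a simple grid, scales font size linearly with frequency, and uses a single color, following the caution above about decorative color. The word counts are hypothetical, and `font_sizes` and `simple_word_cloud` are illustrative helper names. The grid is a simplification of the packed layouts that dedicated word cloud tools produce.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt


def font_sizes(freqs, min_size=10, max_size=48):
    """Map each word's count linearly onto a font size range."""
    lo, hi = min(freqs.values()), max(freqs.values())
    span = (hi - lo) or 1   # avoid dividing by zero when all counts are equal
    return {w: min_size + (c - lo) / span * (max_size - min_size) for w, c in freqs.items()}


def simple_word_cloud(freqs, ncols=4, color="#1f4e79"):
    """Grid-based word cloud: font size encodes frequency; a single color avoids
    decorative hue that could distort visual prominence."""
    sizes = font_sizes(freqs)
    words = sorted(freqs, key=freqs.get, reverse=True)   # most frequent words first
    nrows = -(-len(words) // ncols)                       # ceiling division
    fig, ax = plt.subplots(figsize=(8, 4))
    for i, word in enumerate(words):
        row, col = divmod(i, ncols)
        ax.text((col + 0.5) / ncols, 1 - (row + 0.5) / nrows, word,
                fontsize=sizes[word], color=color, ha="center", va="center",
                transform=ax.transAxes)
    ax.axis("off")
    return fig, ax


# Hypothetical word counts, e.g. from a frequency-counting step.
counts = {"data": 12, "chart": 7, "python": 5, "matplotlib": 4, "plot": 2, "axis": 2, "legend": 1}
fig, ax = simple_word_cloud(counts)
```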

Procedure:

1. Install and import the necessary libraries
2. Load the dataset
3. Convert the dataset into a useful format by cleaning and formatting it
4. Represent the text data using a word cloud with the necessary customization.

Conclusion/Result:

We have created a word cloud with the necessary customization using Matplotlib for the given webpage.
