
21AI602 - DATA VISUALIZATION

UNIT 2 – DATA VISUALIZATION METHODS

LP6 – Indicator - Area Chart - Pivot Table - Scatter Charts, Scatter Maps - Tree Maps

1. Indicator:

 An indicator is a visual representation of a single metric or KPI (Key Performance
Indicator).
 It is often a single number or a simple visualization (such as a gauge) that provides a quick
overview of the performance of a specific metric.
2. Area Chart:
 An area chart is a graphical representation of data where the area under the curve is filled
with color.
 It is commonly used to show how a particular dataset changes over time.

Fig 2.1 Area Chart

When to use area charts:

 When you want to display the volume of the data you have.
 When comparing data across more than one time period

When to avoid area charts:

 Avoid area charts when you need to compare many categories or read off specific data
values.

3. Pivot Table:
 A pivot table is a data processing tool used in spreadsheet programs or business intelligence
software.
 It allows users to summarize and analyze data interactively by reorganizing and
summarizing selected columns and rows.
 Pivot tables can also be enhanced with conditional formatting to provide color scales that
make performance trends more visible.
 Data bars can also be added to cells, colored green for positive values and red for
negative values.

Fig 3.1 Pivot Table
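
The summarizing step of a pivot table can be sketched in plain Python. This is only a minimal illustration, assuming hypothetical sales records with a region, a quarter, and an amount; the function name pivot_sum and the data are made up for the example, and spreadsheet or BI tools do the same grouping interactively.

```python
from collections import defaultdict

# Hypothetical sales records: (region, quarter, amount)
records = [
    ("North", "Q1", 120),
    ("North", "Q2", 150),
    ("South", "Q1", 90),
    ("South", "Q2", -30),   # negative value, e.g. a net refund
    ("North", "Q1", 80),
]

def pivot_sum(rows):
    """Summarize rows into {row_key: {column_key: total}}."""
    table = defaultdict(lambda: defaultdict(int))
    for region, quarter, amount in rows:
        table[region][quarter] += amount
    # Convert to plain dicts so the result prints cleanly
    return {r: dict(cols) for r, cols in table.items()}

pivot = pivot_sum(records)
for region, cols in pivot.items():
    print(region, cols)
```

Regions become the rows, quarters become the columns, and each cell holds the sum of the matching amounts. A mix of positive and negative cells like this one is where conditional formatting is most helpful.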

When to Use Pivot Tables:


 Cohort analysis, performance trends, or portfolio analysis with a mix of positive and
negative values

When to Avoid Pivot Tables:


 When your dataset is too large to get a good understanding of the whole
 When data can easily be summarized with a bar chart instead
4. Scatter charts:
 A scatter chart is a type of plot that displays individual data points on a two-dimensional
graph.
 It is useful for visualizing the relationship between two continuous variables.
 Scatter charts help reveal patterns and trends in data, and show how strong the
relationship between two variables is and in which direction it runs.
 They are also effective for identifying outliers, that is, data points that deviate
significantly from the pattern displayed by the other data points.
 They are widely used in fields such as statistics, engineering, and the social sciences
to analyze and visualize complex data sets.
 In business, they are often used to identify correlations between variables, for
instance the relationship between marketing spend and the resulting sales revenue.

Fig 4.1 Scatter Chart
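
The strength and direction of the relationship seen in a scatter chart is commonly summarized by the Pearson correlation coefficient. The sketch below computes it by hand, assuming hypothetical marketing spend and revenue figures; the helper name pearson_r and the numbers are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: marketing spend vs. sales revenue
spend = [10, 20, 30, 40, 50]
revenue = [25, 41, 62, 79, 105]

r = pearson_r(spend, revenue)
print(round(r, 3))
```

A value close to +1 indicates a strong positive relationship, a value close to -1 a strong negative one, and a value near 0 little linear relationship.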

When to use scatter plots:


 Highlight correlations within your data
 They are useful tools for statistical investigations
 Consider scatter plots to reveal underlying patterns or trends

When to avoid scatter plots:


 For smaller datasets, scatter plots may not be optimal
 Avoid scatter plots for excessively large datasets to prevent unintelligible data clustering
 If your data lacks correlations, scatter plots may not be the best choice

5. Scatter Maps:
o Similar to scatter charts, but data points are plotted on a map, using geographic
coordinates.
o Each point represents a location on the map.
Fig 5.1 Scatter Map
6. Tree Maps:
 A tree map is a hierarchical visualization that uses nested rectangles to represent
hierarchical data structures.
 Each branch of the tree is given a colored rectangle, and the size of the rectangle
corresponds to a quantitative value.
 The size and the color of a rectangle can each represent a different variable.
 Tree maps are best used when analyzing data that has a hierarchical structure.

Fig 6.1 Tree Maps


When to use tree maps:
 When you want to visualize hierarchical data
 When you need to illustrate the proportion of different categories within a whole

When to avoid tree maps:


 When exact values are important
 When there are too many categories

LP7 – Space-Filling and Non-Space-Filling Methods - Hierarchies and Recursion

1. Space-Filling Methods:
 In space-filling methods, visual elements occupy the entire display area, and the
visualization effectively fills the space available.

Fig: Space Filling Graph

i. Treemaps:
 Treemaps represent hierarchical data as nested rectangles.
 Each level of the hierarchy is depicted by a different color or shading, and the size of
each rectangle corresponds to a quantitative value, such as the number of observations
or a numerical attribute.
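
One simple way to turn a hierarchy into nested rectangles is the slice-and-dice layout, sketched below. It is only one of several treemap layouts; the tree, the node names, and the 100 x 100 canvas are assumptions made for the example.

```python
def leaf_total(node):
    """Sum of leaf values under a node (recursive)."""
    if "children" not in node:
        return node["value"]
    return sum(leaf_total(child) for child in node["children"])

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Assign each leaf a rectangle (x, y, w, h) inside the parent rectangle.

    The split direction alternates with depth: vertical strips at even
    depths, horizontal strips at odd depths.
    """
    if out is None:
        out = {}
    if "children" not in node:
        out[node["name"]] = (x, y, w, h)
        return out
    total = leaf_total(node)
    offset = 0.0
    for child in node["children"]:
        share = leaf_total(child) / total
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out

# Hypothetical hierarchy: leaves carry the quantitative value
tree = {"name": "root", "children": [
    {"name": "A", "value": 6},
    {"name": "B", "children": [
        {"name": "B1", "value": 3},
        {"name": "B2", "value": 1},
    ]},
]}

rects = slice_and_dice(tree, 0, 0, 100, 100)
for name, rect in rects.items():
    print(name, rect)
```

Each leaf's area is proportional to its value, and the leaf rectangles together cover the whole canvas, which is what makes the method space-filling.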

ii. Sunburst Charts:


 Sunburst charts are radial representations of hierarchical data, similar to treemaps.
 The innermost ring represents the root node, and each subsequent ring represents a
level in the hierarchy.
 The arcs' lengths correspond to the proportion of data at each level.

iii. Circle Packing:


 Circle packing is another way to represent hierarchical data using nested circles.
 Each circle represents a group or category, and the size of the circle corresponds to
the quantity or proportion of data within that group.

iv. Chord Diagrams:


 Chord diagrams display relationships between data points in a circular layout.
 The arcs connect the data points, and their thickness or color intensity can represent
the strength or frequency of connections.

v. Hexagonal Binning:
 Hexagonal binning is a method used in scatter plots where the 2D space is divided
into hexagons, and the color or intensity of each hexagon represents the density of
data points within that region.
 This helps in visualizing the distribution of points in a dense scatter plot.

vi. Bubble Charts:


 Bubble charts use circles to represent data points in a scatter plot.
 The size of each bubble corresponds to a third variable, providing a way to visualize
three dimensions of data.

vii. Grid-based Visualization:


 Grid-based methods divide the visualization space into a grid, and each grid cell
represents a specific data point or category.
 Heatmaps are a common example where color intensity indicates the magnitude of a
variable.

viii. Cartograms:
 Cartograms distort the geographic space to represent statistical information.
 The size or shape of regions is altered based on a particular variable, allowing for
the visualization of spatial patterns.

2. Non-Space-Filling Methods:
 In non-space-filling methods, visual elements may not necessarily fill the entire display
space, and the focus is often on the arrangement and relationships between individual
elements.

i. Scatter Plots:
 Scatter plots are simple yet powerful visualizations where individual data points are
plotted on a two-dimensional plane.
 Each point's position represents the values of two variables, and additional dimensions
can be encoded using color, size, or shape of the markers.
ii. Line Charts:
 Line charts connect data points with lines, making them useful for visualizing trends or
patterns over a continuous variable (e.g., time).
 They are effective for showing the relationship between two variables.

iii. Bar Charts:


 Bar charts use rectangular bars to represent data values.
 They are suitable for comparing the quantities of different categories or groups.
 Bar length or height encodes the data, and additional dimensions can be represented
through color or patterns.

iv. Histograms:
 Histograms display the distribution of a single variable by dividing the data into
bins and representing the frequency of observations in each bin.
 They are useful for understanding the shape of the data distribution.
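
The binning behind a histogram can be sketched as follows, assuming equal-width bins and hypothetical exam scores. The function name histogram is illustrative, and the sketch does not handle the case where all values are identical (the bin width would be zero).

```python
def histogram(values, num_bins):
    """Count values in equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value belongs to the last bin, not a new one
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts

# Hypothetical exam scores
scores = [52, 55, 61, 64, 67, 68, 70, 71, 73, 75, 78, 82, 85, 91, 100]
edges, counts = histogram(scores, 4)
for left, count in zip(edges, counts):
    print(f"{left:6.1f} | {'#' * count}")
```

The printed rows form a rough text histogram: the length of each row of # marks is the frequency of observations in that bin.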

v. Box Plots (Box-and-Whisker Plots):


 Box plots provide a summary of the distribution of a dataset.
 The box represents the interquartile range, the line inside the box is the median,
and the whiskers extend to the smallest and largest values that are not treated as outliers.
 Outliers can also be displayed.
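
The numbers a box plot draws can be computed with the standard library. The sketch below assumes the common 1.5 x IQR rule for whiskers and outliers (other conventions exist) and uses made-up data; box_plot_stats is a hypothetical helper name.

```python
import statistics

def box_plot_stats(values):
    """Box plot summary using the common 1.5 * IQR whisker rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme values that lie inside the fences
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

# Hypothetical measurements with one unusually large value
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]
stats = box_plot_stats(data)
print(stats)
```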

vi. Parallel Coordinates:


 Parallel coordinates visualize multivariate data by representing each data point
as a line crossing parallel axes.
 The position of the line on each axis corresponds to the value of a particular
variable.

vii. Radar Charts:


 Radar charts (or spider charts) use a circular grid with multiple axes radiating
from the center.
 Each axis represents a different variable, and a polygon is drawn connecting
points on each axis to visualize the values of multiple variables for a single data
point.

viii. Network Diagrams:


 Network diagrams illustrate relationships between entities in a network. Nodes
represent entities, and edges represent connections or relationships.
 The layout and attributes of nodes and edges convey information about the
structure and attributes of the network.

ix. Pie Charts:


 Pie charts represent parts of a whole by dividing a circular area into slices.
 The size of each slice corresponds to the proportion of the whole it represents.
While pie charts are common, caution is advised in their usage, especially for
comparing multiple categories.

Fig: Non Space Filling Graph

3. Hierarchies in Data Visualization:


 Hierarchies play a crucial role in data visualization, allowing users to explore and
understand the relationships and structures within complex datasets.
 Common visualizations of hierarchies include:
o Tree Maps
o Sunburst Charts
o Org Charts
o Nested Pie Charts
o Hierarchy Diagrams

Fig: Visualization Hierarchy

4. Recursion in Data Visualization:


 Recursion, a concept from computer science and mathematics, can be applied to
data visualization to represent hierarchical or nested structures.
 Recursive techniques involve breaking down a complex problem into simpler
instances of the same problem.
 In the context of data visualization, recursion can be used to create visual
representations of hierarchical data in a repetitive and structured manner.
 Here are some examples:
o Fractals
o Recursive Visualization Algorithms
o Self-Referential Diagrams
o Recursive Graphs
o Recursive Visual Elements

Fig: Recursion Tree
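
As an illustration of a recursive visualization algorithm, the sketch below generates the points of a Koch curve, a classic fractal. Each segment is replaced by four smaller segments, and the same function is applied to each of them. The function name koch and the chosen depth are assumptions for the example; plotting the points is left to any charting library.

```python
import math

def koch(p, q, depth):
    """Return the points of a Koch curve from p to q, excluding q itself."""
    if depth == 0:
        return [p]
    (x1, y1), (x2, y2) = p, q
    dx, dy = (x2 - x1) / 3, (y2 - y1) / 3
    a = (x1 + dx, y1 + dy)              # one third of the way
    b = (x1 + 2 * dx, y1 + 2 * dy)      # two thirds of the way
    # Peak: rotate the middle third by 60 degrees around a
    cos60, sin60 = 0.5, math.sqrt(3) / 2
    peak = (a[0] + dx * cos60 - dy * sin60,
            a[1] + dx * sin60 + dy * cos60)
    points = []
    # Simpler instances of the same problem: four shorter segments
    for start, end in ((p, a), (a, peak), (peak, b), (b, q)):
        points += koch(start, end, depth - 1)
    return points

curve = koch((0.0, 0.0), (1.0, 0.0), 3) + [(1.0, 0.0)]
print(len(curve))
```

The base case (depth 0) returns a single point, and every extra level multiplies the number of segments by four, which is why the curve looks similar at every scale.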



LP8 – Networks and Graphs - Matrix Representation for Graphs - Infographics - EDA using Python

1. Networks and Graphs:


 Graphs: A graph is a mathematical representation of a set of objects where some pairs of the
objects are connected by links.
 The objects are represented by nodes, and the links between them are represented by edges.
 Networks, in the context of data visualization, refer to visual representations of graphs,
often with nodes and edges, to show relationships and connections between entities.

Graph Theory Basics:


 Nodes (Vertices): Represent entities or objects.
 Edges (Links): Represent relationships or connections between nodes.
 Weighted Edges: Assigning a numerical value to edges to indicate strength or
importance.
 Directed vs. Undirected Graphs: In directed graphs, edges have a direction, while in
undirected graphs, edges have no direction.

Network Analysis:
 Centrality Measures: Identify the most central nodes in a network, such as degree
centrality, betweenness centrality, and closeness centrality.
 Community Detection: Identify groups of nodes with strong internal connections using
algorithms like modularity-based methods or spectral clustering.
 Path Analysis: Understand the shortest paths or routes between nodes.
Fig: Graph
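
Two of the analyses above, degree centrality and shortest paths, can be sketched without any graph library. The graph below is hypothetical, and the function names are illustrative; dedicated libraries offer these and the other measures ready-made.

```python
from collections import deque

# Hypothetical undirected graph as an adjacency list
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B", "E"],
    "E": ["D"],
}

def degree_centrality(g):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(g)
    return {node: len(neighbors) / (n - 1) for node, neighbors in g.items()}

def shortest_path(g, start, goal):
    """Breadth-first search; returns one shortest path in an unweighted graph."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in g[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # goal not reachable

centrality = degree_centrality(graph)
path = shortest_path(graph, "A", "E")
print(centrality)
print(path)
```

In a network diagram, the centrality values could drive node size or color, and the path could be highlighted along its edges.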
2. Matrix Representation for Graphs:
 The matrix representation is one way to represent the adjacency structure of a graph using a
matrix, typically referred to as an adjacency matrix.
 For an undirected graph, the matrix is symmetric; for a directed graph, it may not be
symmetric.

Adjacency Matrix:
o In an adjacency matrix, rows and columns represent nodes in the graph.
o The presence or absence of an edge between two nodes is indicated by a 1 or 0 in
the corresponding matrix cell.
o For a directed graph, cell (i, j) records an edge from node i to node j, so the matrix
may not be symmetric.
o Weighted graphs can have numerical values in the matrix cells to represent edge
weights.

Adjacency List:
o In an adjacency list, each node has a list of its neighbors.
o It's often implemented using a dictionary or an array of linked lists in programming.
o Suitable for sparse graphs (graphs with relatively few edges).
Edge List:
o Another representation is the edge list, where each row represents an edge in the
graph.
o Each row contains the nodes involved in the edge, and possibly a weight.
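
The three representations describe the same graph and can be converted into one another. The sketch below starts from a hypothetical weighted, directed edge list and builds the adjacency matrix and adjacency list from it; all names and weights are made up for illustration.

```python
# Hypothetical weighted, directed graph given as an edge list:
# each row is (source, target, weight)
edge_list = [("A", "B", 2), ("A", "C", 5), ("B", "C", 1), ("C", "A", 4)]

nodes = sorted({n for src, dst, _ in edge_list for n in (src, dst)})
index = {name: i for i, name in enumerate(nodes)}

# Adjacency matrix: cell [i][j] holds the weight of edge i -> j, 0 if absent
matrix = [[0] * len(nodes) for _ in nodes]
for src, dst, weight in edge_list:
    matrix[index[src]][index[dst]] = weight

# Adjacency list: each node maps to its outgoing (neighbor, weight) pairs
adjacency = {name: [] for name in nodes}
for src, dst, weight in edge_list:
    adjacency[src].append((dst, weight))

def is_symmetric(m):
    """True when m[i][j] == m[j][i] for all i, j (as in an undirected graph)."""
    return all(m[i][j] == m[j][i] for i in range(len(m)) for j in range(len(m)))

for row in matrix:
    print(row)
print(adjacency)
print(is_symmetric(matrix))
```

The matrix needs one cell for every pair of nodes, while the adjacency list and edge list only store the edges that exist, which is why they suit sparse graphs.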

3. Infographics:
 Infographics, short for information graphics, are visual representations of information, data,
or knowledge designed to present complex information quickly and clearly.
 Infographics combine text, images, and graphics to convey information in a visually
engaging and digestible format.
 They are widely used in data visualization to make data-driven insights more accessible to a
broader audience.

Fig: Infographics Tools


4. EDA using Python:
 Exploratory Data Analysis (EDA) is a critical step in the data analysis process, and Python
provides a rich ecosystem of libraries and tools for conducting EDA through data
visualization.
Steps:
1. Import Libraries and Load Data
2. Understanding the Data
3. Data Cleaning
4. Data Visualization
5. Categorical Variables
6. Feature Engineering
7. Outlier Detection and Handling
8. Data Distribution
9. Statistical Testing
10. Interactive Visualizations
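
The first few steps can be sketched with the Python standard library alone. This is a simplified illustration: the inline CSV, column names, and values are invented, and in practice dedicated data analysis and plotting libraries are typically used for these steps.

```python
import csv
import io
import statistics
from collections import Counter

# Step 1. Load data (an inline CSV stands in for a real file)
raw = """city,category,price
Chennai,A,120
Chennai,B,95
Madurai,A,
Salem,B,110
Salem,A,400
"""
rows = list(csv.DictReader(io.StringIO(raw)))

# Step 2. Understand the data: column names and row count
columns = list(rows[0].keys())
print(columns, len(rows))

# Step 3. Clean: drop rows with a missing price and convert to float
clean = [dict(r, price=float(r["price"])) for r in rows if r["price"]]

# Steps 4-5. Summarize a numeric column and a categorical column
prices = [r["price"] for r in clean]
summary = {
    "count": len(prices),
    "mean": statistics.mean(prices),
    "median": statistics.median(prices),
}
category_counts = Counter(r["category"] for r in clean)
print(summary)
print(category_counts)
```

The gap between the mean and the median hints at a skewed distribution or an outlier, which is the kind of question the later steps (outlier detection, data distribution) follow up on.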

Fig: Exploratory Data Analysis



LP5 – Mapping – Time series – Connections and correlations.

1. Mapping:

 In data visualization, the term "mapping" refers to the representation of data points or
locations on a map.
 It involves the use of graphical elements to convey spatial information in a visual format.
Mapping is a powerful technique in data visualization as it allows users to understand
patterns, relationships, and trends in geographic data.
 Here are some key concepts related to mapping in data visualization:

Geospatial Data: Mapping typically involves the use of geospatial data, which includes
information related to the Earth's surface such as latitude, longitude, and elevation. Geospatial
data can represent physical features, locations of events, or any other information with a spatial
context.

Coordinate Systems: Maps use coordinate systems to represent locations on the Earth's surface.
The most common coordinate system is the latitude and longitude system, but other systems like
UTM (Universal Transverse Mercator) may also be used.

Markers and Symbols: Data points on a map are often represented using markers or symbols.
These markers can vary in size, color, or shape to convey additional information about the data,
such as the magnitude of a variable or the category of a location.

Choropleth Maps: Choropleth maps use color variations to represent spatial variations in a
particular variable. Different shades or colors are used to indicate different levels or categories of
the variable across regions.
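
Before a choropleth map can be colored, each region's value has to be assigned to a class. The sketch below assumes equal-interval classification, which is only one of several schemes, with hypothetical regions, values, and shade names.

```python
def equal_interval_classes(values, num_classes):
    """Map each region to a class index 0..num_classes-1 by equal-width bins."""
    lo, hi = min(values.values()), max(values.values())
    width = (hi - lo) / num_classes
    return {
        # The maximum value falls into the last class
        region: min(int((v - lo) / width), num_classes - 1)
        for region, v in values.items()
    }

# Hypothetical values per region (e.g. population density)
density = {"R1": 100, "R2": 250, "R3": 400, "R4": 550, "R5": 700}
shades = ["light", "medium", "dark"]   # one shade per class

classes = equal_interval_classes(density, len(shades))
colors = {region: shades[c] for region, c in classes.items()}
print(colors)
```

A mapping library would then fill each region's polygon with its assigned shade.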

Heatmaps: Heatmaps visualize the density or intensity of data points in a particular area.
Hotspots with a higher concentration of data are represented with warmer colors, while cooler
colors indicate lower density.
Interactive Maps: With advancements in technology, interactive maps have become popular.
Users can interact with the map, zoom in, pan, and click on specific data points to access
additional information.

GIS (Geographic Information System): GIS is a powerful tool that integrates mapping and
spatial analysis capabilities. It allows users to overlay different layers of information and perform
complex spatial analyses.

Fig Mapping

2. Time Series:
 Time series refers to a sequence of data points collected or recorded over a period
of time. Time series data is characterized by the temporal ordering of observations,
where each observation is associated with a specific timestamp.
 Analyzing and visualizing time series data is essential for understanding patterns,
trends, and changes over time.
 Here are some key aspects of time series in data visualization:

Temporal Axis: Time series data is plotted on a temporal axis, typically along the x-axis.
The x-axis represents time, and the y-axis represents the variable being measured. This
allows viewers to observe how the variable changes over different time intervals.

Line Charts: Line charts are commonly used to visualize time series data. In a line chart,
consecutive data points are connected by straight line segments, giving a continuous view of
how the variable evolves over time. Line charts are effective for displaying trends, cycles,
and fluctuations.

Time Series Plots: A time series plot is a specific type of graph that displays data points in
the order in which they occur. It helps in understanding the behavior of a variable over
time and is particularly useful for identifying seasonality, trends, and outliers.

Seasonal Patterns: Time series data often exhibits seasonal patterns, where certain
patterns repeat at regular intervals. Seasonal decomposition splits a series into trend,
seasonal, and residual components (e.g., using moving averages), which helps to visualize
and analyze these patterns.
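
A centered moving average is one simple way to estimate the trend component. The sketch below assumes an odd window equal to the length of the seasonal cycle and uses an invented series with a 3-period pattern; the function name and data are illustrative only.

```python
def moving_average(series, window):
    """Centered moving average; positions without a full window are None."""
    if window % 2 == 0:
        raise ValueError("this sketch assumes an odd window")
    half = window // 2
    result = [None] * len(series)
    for i in range(half, len(series) - half):
        chunk = series[i - half:i + half + 1]
        result[i] = sum(chunk) / window
    return result

# Hypothetical series: steady growth plus a pattern that repeats every 3 periods
sales = [12, 11, 10, 15, 14, 13, 18, 17, 16]

trend = moving_average(sales, 3)
# Subtracting the trend leaves the seasonal pattern (plus any residual)
detrended = [None if t is None else s - t for s, t in zip(sales, trend)]
print(trend)
print(detrended)
```

Plotting the original series, the trend, and the detrended values as separate line charts makes the growth and the repeating pattern visible individually.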

Bar Charts and Histograms: Bar charts can be used to represent time series data when
discrete observations are recorded at specific time points, with each bar representing the
value of the variable at a particular time. Histograms complement them by showing how the
observed values are distributed, independent of their order in time.

Heatmaps: Heatmaps are useful for visualizing time series data when there are multiple
variables or dimensions involved. Time can be represented on one axis, and different
variables on the other, with color indicating the intensity or magnitude of the values.

Interactive Time Series Visualizations: Interactive visualizations, such as dynamic charts
or dashboards, enable users to explore time series data interactively. Users can zoom in,
pan, and filter data based on specific time ranges.

Annotations and Events: Annotations on time series charts can highlight significant
events or milestones, aiding in the interpretation of the data. Events like product launches,
policy changes, or external factors impacting the data can be visually marked.

Forecasting and Prediction: Time series visualizations are crucial for forecasting and
predicting future trends. Techniques like trend lines, moving averages, or advanced time
series models can be employed to make predictions.

Fig Time Series Graphs


3. Connections and Correlations:
 Connections and correlations are fundamental aspects of data visualization, helping
to reveal relationships and patterns within the data.
 Here are some key ways in which connections and correlations are explored in data
visualization:
3.1 Scatter Plots:
Purpose: Visualize the relationship between two continuous variables.
Representation: Each data point is plotted on a two-dimensional graph, with one variable
on the x-axis and the other on the y-axis.

3.2 Line Charts:


Purpose: Illustrate trends and patterns over time.
Representation: Data points are connected with lines, making it easy to observe overall
trends and identify correlations.

3.3 Correlation Matrix:


Purpose: Explore relationships among multiple variables.
Representation: A matrix where each cell represents the correlation between two
variables, often visualized using a heatmap for quick interpretation.
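
A correlation matrix can be built by computing the Pearson coefficient for every pair of variables. The sketch below assumes a small, invented dataset with three variables; the helper pearson_r and the variable names are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical dataset: three variables measured on the same five items
data = {
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 58, 65, 71, 80],
    "errors": [9, 8, 6, 4, 3],
}
names = list(data)

# One cell per pair of variables; the diagonal compares a variable with itself
corr = {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}

for a in names:
    print(a.ljust(7), " ".join(f"{corr[a][b]:+.2f}" for b in names))
```

The matrix is symmetric with ones on the diagonal. In a heatmap, each cell would be colored by its value, for example one hue for positive and another for negative correlations.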

3.4 Network Diagrams:


Purpose: Visualize connections and relationships in complex datasets.
Representation: Nodes represent variables or entities, and edges depict connections or
correlations between them.

3.5 Heatmaps:
Purpose: Display the intensity of relationships or correlations.
Representation: A grid of colored cells where colors indicate the strength and direction of
correlations, often used for correlation matrices.
3.6 Bubble Charts:
Purpose: Extend scatter plots to include a third variable.
Representation: Similar to scatter plots, but the size of each point represents a third
variable, adding another layer of information.

3.7 Parallel Coordinates:


Purpose: Visualize relationships among multiple variables simultaneously.
Representation: Variables are represented by parallel axes, and lines connecting points
across axes reveal correlations.

3.8 Chord Diagrams:


Purpose: Show relationships and connections in a circular layout.
Representation: Arcs connect different categories or variables, and the thickness of the
arcs indicates the strength of connections.

3.9 Sankey Diagrams:


Purpose: Illustrate flow and connections between entities.
Representation: Entities are represented by nodes, and the flow of connections between
them is visualized using directed links.

3.10 Correlation Circles (PCA):


Purpose: Visualize relationships and correlations in reduced dimensions.
Representation: Variables are plotted in a two-dimensional space, and the distance and
angle between variables reflect their relationships.

3.11 Geospatial Visualizations:


Purpose: Explore spatial correlations and connections.
Representation: Maps display data points or regions, allowing users to identify spatial
patterns and relationships.

3.12 Interactive Dashboards:


Purpose: Allow users to interactively explore connections and correlations.
Representation: Users can filter, zoom, and highlight specific data points to investigate
relationships dynamically.

Fig Correlation Maps
