
Data Visualisation

Why Does Data Matter?


• In today's world, data plays a crucial role in nearly every
aspect of our lives.
• From business and technology to healthcare and social
sciences, data has become a valuable resource that drives
decision-making, innovation, and progress.
• The availability of vast amounts of data has opened new
opportunities and challenges for individuals and
organizations alike.
Why data is important
• Informed Decision-Making: Data provides insights and
evidence to support decision-making processes. By
analyzing data, organizations can identify patterns,
trends, and correlations, enabling them to make more
informed and effective decisions.
• Performance Optimization: Data allows organizations to
monitor and optimize their operations, products, and
services. By analyzing data, they can identify areas of
improvement, streamline processes, and enhance
performance.
Why data is important

• Innovation and Research: Data fuels innovation and research across various
fields. It enables scientists, engineers, and researchers to make new
discoveries, uncover insights, and develop groundbreaking solutions.
• Personalization and Customer Experience: Data helps organizations
understand their customers' preferences, behaviors, and needs. By leveraging
this data, businesses can personalize their offerings, enhance customer
experiences, and build long-lasting relationships.
How Data Helps
• Objective Information
• Identifying Patterns and Trends
• Uncovering Insights and Opportunities
• Predictive Analysis
• Data-driven Decision Making
• Performance Monitoring and Evaluation


Correlation
• Correlation refers to the statistical relationship or association between two
variables. It measures how changes in one variable are related to changes in
another variable. However, correlation does not imply a cause-and-effect
relationship. It simply shows that the variables tend to vary together, either
positively (both increase or decrease) or negatively (one increases while the
other decreases).
• Let's consider the relationship between ice cream sales and temperature. There
may be a positive correlation between the two variables, meaning that as
temperature increases, ice cream sales also tend to increase. However, this
correlation does not imply that temperature causes increased ice cream sales.
Other factors, such as the season, marketing efforts, or consumer behavior, may
influence both variables independently.
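One common way to quantify the correlation described above is the Pearson coefficient, which ranges from -1 (perfectly negative) to +1 (perfectly positive). The sketch below is a minimal illustration only: the temperature and sales figures are made up, and the function name `pearson_r` is an assumption, not something from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical observations, invented purely for illustration.
temperature_c = [18, 21, 24, 27, 30, 33]
ice_cream_sales = [120, 135, 160, 190, 230, 260]

r = pearson_r(temperature_c, ice_cream_sales)
print(f"r = {r:.3f}")
```

A value of r close to +1 says only that the two series rise together. It says nothing about which one, if either, drives the other.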
Causality
• Causality refers to a cause-and-effect relationship between two variables, where changes
in one variable directly cause changes in another variable. It involves demonstrating that a
change in the independent variable leads to a predictable and consistent change in the
dependent variable, while ruling out other potential explanations.
• In a drug efficacy study, a group of participants is randomly assigned to receive either the
experimental drug or a placebo. If the participants receiving the drug show a significant
improvement in symptoms compared to those receiving the placebo, it suggests a causal
relationship between the drug and symptom improvement.
Storks & Babies
• The "storks and babies" example is a
classic illustration used to differentiate
between correlation and causality. It
goes as follows:
• Suppose we observe a positive
correlation between the number of
storks nesting in an area and the
number of babies born in that area. In
other words, as the number of storks
increases, so does the number of
babies born.
• The correlation between storks and
babies suggests that these variables are
associated or vary together. It could be
due to a common underlying factor,
such as population density. Areas with
more people tend to have both more
storks (as they prefer open areas) and
more births (due to a larger
population). Therefore, the correlation
between storks and babies reflects a
shared factor, not a causal link.
Freakonomics
• Freakonomics is a book written by
economist Steven D. Levitt and
journalist Stephen J. Dubner.
• It gained widespread popularity for its
unconventional approach to exploring
economic principles and their impact
on various aspects of society.
• The book delves into a wide range of
topics, presenting economic analysis
and insights in an engaging and
accessible manner.
Freakonomics
• What do schoolteachers and sumo wrestlers have in common?
• Why do drug dealers still live with their moms?
• Where have all the criminals gone?
• What makes a perfect parent?
• Would a Roshanda by Any Other Name Smell as Sweet?
Challenges of Understanding Large Amounts
of Data
• Data Overload: The volume of data generated daily is overwhelming,
making it challenging to process and make sense of. Handling massive
datasets requires efficient storage, processing, and analysis techniques.
• Data Variety: Data comes in various formats, including structured,
semi-structured, and unstructured data. Integrating and analyzing data
from diverse sources poses challenges in terms of data integration and
data quality assurance.
Challenges of Understanding Large Amounts of Data (continued)
• Data Quality and Reliability: Ensuring data accuracy, reliability, and
consistency is crucial. Inaccurate or incomplete data can lead to
erroneous analysis and flawed decision-making.
• Data Complexity: Some datasets are inherently complex, with multiple
variables, interdependencies, and high-dimensional characteristics.
Extracting meaningful insights from complex data requires advanced
analytical techniques and specialized skills.
Big Data and Its Analytics
• Big data refers to extremely large and complex datasets that are
beyond the capabilities of traditional data processing and management
methods.
• Traditional systems could not handle the massive volumes of data that
became available for analysis. This data also needed rapid ingestion
and processing, which traditional systems could not provide.
• Further, traditional databases could handle only structured data,
while the newer datasets contained structured, semi-structured, and
unstructured data.
Big Data Analytics
Five V’s
• Volume: Volume refers to the vast amount of data generated and
collected from various sources. With the advent of digital
technologies, organizations are accumulating massive volumes of data
that need to be processed and analyzed. Dealing with large volumes of
data requires scalable storage and processing capabilities.
• Velocity: Velocity represents the speed at which data is generated,
captured, and processed. In today's real-time and interconnected
world, data is generated at high speeds, often in continuous streams.
Analyzing data in real-time or near real-time allows organizations to
make timely decisions and take immediate actions based on the
insights derived from the data.
Five V’s
• Variety: Variety refers to the diverse types and formats of data. Data comes in structured,
semi-structured, and unstructured forms. It includes text, numbers, images, videos, audio,
social media posts, sensor data, and more. Analyzing and integrating data from multiple
sources and formats requires flexible data processing techniques and tools.
• Veracity: Veracity focuses on the quality and trustworthiness of data. With the increasing
volume and variety of data, ensuring data accuracy, consistency, and reliability becomes
crucial. Dealing with data from different sources, which may contain errors, inconsistencies,
or biases, poses challenges in maintaining data quality and making accurate analyses.
• Value: Value represents the ultimate goal of data analytics, which is to derive meaningful
insights and value from the data. Extracting actionable insights from large and complex
datasets can help organizations make informed decisions, optimize processes, identify
opportunities, and achieve competitive advantages. However, extracting value from data
requires advanced analytical techniques, effective data visualization, and interpretation skills.
Traditional Database Table

location: Vancouver

                 item (type)
time (quarter)   Home Entertainment   Computer   Phone   Security
Q1               605                  825        14      400
Q2               680                  952        31      512
Q3               812                  1023       30      501
Q4               927                  1038       38      580


Data Cube
A Lattice of Cuboids

0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: time,item; time,location; time,supplier; item,location; item,supplier; location,supplier
3-D cuboids: time,item,location; time,item,supplier; time,location,supplier; item,location,supplier
4-D (base) cuboid: time,item,location,supplier
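The lattice above is simply every subset of the four dimensions, grouped by how many dimensions each subset keeps. As a rough sketch (the function name `cuboid_lattice` is an assumption for illustration), it can be enumerated like this:

```python
from itertools import combinations

# The four dimensions named on the slide.
dimensions = ("time", "item", "location", "supplier")

def cuboid_lattice(dims):
    """Map k -> list of k-dimensional cuboids (each a tuple of dimension names)."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}

lattice = cuboid_lattice(dimensions)
for k, cuboids in lattice.items():
    # The empty tuple is the apex cuboid, conventionally labelled "all".
    names = [",".join(c) if c else "all" for c in cuboids]
    print(f"{k}-D: {names}")
```

With n dimensions the lattice holds 2^n cuboids, so four dimensions give 16. This count ignores concept hierarchies within a dimension, which would add more.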
What is data visualization?

• Data visualization is the graphical representation of data and information using
visual elements such as charts, graphs, maps, and infographics.
• It is a technique that transforms complex and often large datasets into visual
representations that are more understandable, intuitive, and accessible to a
wide range of audiences.
• Data visualization plays a crucial role in various domains, including business
intelligence, marketing, finance, healthcare, education, and research. It helps in
uncovering insights, identifying trends, supporting decision-making, and
fostering data-driven discussions.
Why data visualization is important
• Enhances Data Comprehension: Data visualization presents complex
data in a visual format that is easy to understand and interpret.
• Facilitates Data Exploration: Data visualization enables users to
explore and interact with data in a dynamic and intuitive manner.
Interactive features like filtering, drilling down, and zooming allow
users to delve into specific aspects of the data, uncover hidden
insights, and gain a deeper understanding of the underlying patterns
and correlations.
Why data visualization is important
• Supports Decision-Making: Visual representations of data provide
decision-makers with a clear and concise overview of relevant
information. By presenting data in a visual format, decision-makers
can easily identify key trends, outliers, and anomalies, aiding in
informed decision-making.
• Enables Effective Communication: Data visualization is an effective
means of communicating data-driven insights to diverse audiences.
Visualizations help tell a story by presenting data in a visually
compelling and engaging manner.
Why is data visualization effective?
• The human brain possesses a remarkable ability to process and interpret
visual information rapidly and efficiently.
• Research suggests that a significant portion of the brain is dedicated to
visual processing, making vision one of our most dominant senses.
• Visual Perception: The brain is highly adept at recognizing patterns, shapes,
colors, and spatial relationships. It can quickly process visual stimuli and
extract meaningful information from complex visual scenes.
• Cognitive Efficiency: Visual information is often processed faster than text
or numerical data. The brain can process and comprehend visual cues
in parallel, enabling efficient analysis and understanding
of complex information.
Why is data visualization effective?
• Emotional Impact: Visual information has a powerful emotional
impact on the brain. Data visualization can evoke emotions, engage
the viewer, and create a memorable and persuasive narrative.
• Pattern Recognition: The brain excels at identifying patterns, trends,
and outliers in visual information. Data visualization enhances pattern
recognition by visually encoding data, allowing users to quickly
perceive and understand relationships, correlations, and anomalies
within the data.
Benefits of Data Visualization
• Enabling faster and more accurate understanding of data.
• Simplifying complex information and making it accessible to a wide
audience.
• Facilitating data-driven decision-making and problem-solving.
John Snow’s Cholera Map
• In 1854, John Snow mapped cholera cases in Soho, London, overlaying
them with the locations of water pumps.
• By visually representing the data, Snow was able to identify a cluster
of cases around the Broad Street pump, ultimately leading to the discovery
that contaminated water was the source of the cholera outbreak.
• This visualization revolutionized the understanding of disease
transmission and paved the way for modern epidemiology.
Florence Nightingale’s Coxcomb Chart
• In the 19th century, Florence Nightingale used a visualization known
as the Coxcomb Chart to highlight the impact of unsanitary conditions
on mortality rates during the Crimean War.
• The visualization effectively conveyed the stark contrast between
deaths caused by preventable diseases and deaths due to wounds,
influencing policymakers and sparking improvements in sanitation
practices and healthcare.
Run Progression in a T20 Match

Over Number   India   Australia
1             7       11
2             13      15
3             20      25
4             26      27
5             29      36
6             31      39
7             42      45
9             56      55
10            61      63
11            72      69
12            85      78
13            93      85
14            103     93
15            118     110
16            125     116
17            137     127
18            145     132
19            161     145
20            180     165
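A table like this is hard to read at a glance, which is the motivation for charting it. As a small sketch, the snippet below derives India's lead after each over and prints it as a crude text chart. The function names are assumptions for illustration, and over 8 is left out because the table does not list it.

```python
# Cumulative runs after each over as (India, Australia), copied from the table.
# Over 8 does not appear in the table, so it is not included here.
runs = {
    1: (7, 11), 2: (13, 15), 3: (20, 25), 4: (26, 27), 5: (29, 36),
    6: (31, 39), 7: (42, 45), 9: (56, 55), 10: (61, 63),
    11: (72, 69), 12: (85, 78), 13: (93, 85), 14: (103, 93),
    15: (118, 110), 16: (125, 116), 17: (137, 127), 18: (145, 132),
    19: (161, 145), 20: (180, 165),
}

def lead_by_over(table):
    """Return {over: India's cumulative runs minus Australia's}."""
    return {over: india - aus for over, (india, aus) in sorted(table.items())}

def text_chart(leads):
    """One row per over: '+' marks when India leads, '-' marks when it trails."""
    rows = []
    for over, lead in leads.items():
        mark = "+" if lead >= 0 else "-"
        rows.append(f"{over:>2} {lead:>+4} {mark * abs(lead)}")
    return "\n".join(rows)

leads = lead_by_over(runs)
print(text_chart(leads))
```

Even this rough rendering shows the shape of the match: Australia ahead through the early overs, India pulling away from over 11 onward.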
Types of Data Visualizations
Data visualizations can be broadly categorized into several types based
on their purpose and the data they represent. Here are four primary
categories for classifying different types of data visualization.
• Comparison Visualizations
• Composition Visualizations
• Distribution Visualizations
• Relationship Visualizations
Comparison Visualizations
• Comparison visualizations are used to compare and analyze data
across different categories, groups, or time periods. They allow for the
examination of variations, trends, and relationships between different
data points.
Bar Charts and Column Charts
• Bar Charts: Bar charts use vertical or horizontal bars to represent data
values. They are effective for comparing discrete categories and
displaying numerical data. Bar charts can be simple, grouped, stacked,
or clustered, depending on the complexity of the data.
• Column Charts: Like bar charts, column charts use vertical columns to
represent data values. They are often used to compare data over time
or display data for different subcategories within a larger category.
Bar Chart & Column Chart
Line Charts
• Line Charts: Line charts display data points connected by lines. They
are particularly useful for showing trends and changes over time. Line
charts can compare multiple variables or categories on the same chart,
allowing for easy comparisons and identifying patterns.
Line Chart
Area Charts
• Area Charts: Area charts
are similar to line charts
but filled with color
between the line and the
x-axis.
• They are effective for
comparing cumulative
values or visualizing the
composition of data over
time.
Scatter Plots
• Scatter Plots: Scatter plots use dots to represent individual data points,
with each dot representing values of two variables. They are suitable
for comparing the relationship and correlation between two variables.
Scatter plots help identify clusters, outliers, and patterns in the data.
Scatter Plot
Bullet Graphs
• Bullet Graphs: Bullet graphs are designed to show progress toward a
goal or target. They include a horizontal bar that represents the current
value, with additional markers or bands indicating the target,
comparative values, or ranges of performance.
Radar Charts
• Radar Charts: Radar charts, also known as spider charts or star plots,
display data on a radial grid with multiple axes originating from a
central point.
• They are useful for comparing multiple variables across different
categories and visualizing the relative strengths and weaknesses of
each category.
Radar Chart
Composition Visualizations
• Composition visualizations are used to represent the composition or
distribution of data, showcasing the parts of a whole or the relative
proportions of different categories. These visualizations provide
insights into the composition, contributions, or distribution of data
elements.
Pie Chart
• Pie charts divide a circular
area into slices to represent
the proportion of different
categories or parts of a
whole. The size of each slice
corresponds to the relative
magnitude of the category or
its contribution to the total.
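The slice sizes follow directly from each category's share of the total: a share of p occupies p × 360 degrees. A minimal sketch, using made-up market-share figures and an assumed helper name `pie_slices`:

```python
def pie_slices(values):
    """Return each category's share and its start/end angle in degrees."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("values must sum to a positive total")
    slices = {}
    start = 0.0
    for label, value in values.items():
        share = value / total
        sweep = share * 360.0  # angle occupied by this slice
        slices[label] = {"share": share, "start_deg": start, "end_deg": start + sweep}
        start += sweep
    return slices

# Hypothetical figures, invented for illustration.
market_share = {"Brand A": 50, "Brand B": 30, "Brand C": 20}

slices = pie_slices(market_share)
for label, s in slices.items():
    print(f"{label}: {s['share']:.0%} from {s['start_deg']:.0f} to {s['end_deg']:.0f} degrees")
```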
Donut Charts
• Donut charts are similar to pie
charts but have a hole in the
center. They are used to display
the same type of information as
pie charts while providing space
for additional labels or
annotations.
Stacked Bar Charts
• Stacked bar charts present
different categories as
stacked bars, where the
length of each bar segment
represents the proportion
of that category within the
total. They show the
composition of multiple
variables or subcategories
in relation to the whole.
Tree maps
• Tree maps use nested
rectangles to represent
hierarchical data, with each
rectangle's size proportional
to a specific attribute or
value. They display the
composition of categories at
different levels, allowing for
a visual understanding of the
hierarchical structure and the
relative contributions of each
category.
Sunburst Charts
• Sunburst charts also
represent hierarchical data
using concentric circles or
rings. Each ring represents a
level of the hierarchy, with
the width of the arcs
indicating the proportion of
the category. Sunburst charts
provide a visually appealing
way to showcase the
composition and proportions
of hierarchical data.
Waffle Charts

• Waffle Charts: Waffle charts
divide a square or rectangle into
smaller cells, representing the
proportion or distribution of data
categories through the number of
cells filled or colored. They are
often used to depict percentages or
relative proportions in an intuitive
manner.
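The only non-obvious step in building a waffle chart is deciding how many cells each category gets, since rounded shares must still fill the grid exactly. The sketch below uses largest-remainder rounding, which is one reasonable choice and not something the slides prescribe. The survey counts, the 10 × 10 grid, and the function names are assumptions for illustration.

```python
def waffle_cells(counts, total_cells=100):
    """Allocate grid cells to categories so the allocations sum to total_cells.

    Uses largest-remainder rounding on integer counts.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive total")
    cells, remainders = {}, {}
    for label, count in counts.items():
        # Whole cells earned, plus the leftover fraction (kept as an integer).
        cells[label], remainders[label] = divmod(count * total_cells, total)
    leftover = total_cells - sum(cells.values())
    # Hand the unassigned cells to the categories with the largest remainders.
    for label in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        cells[label] += 1
    return cells

def render(cells, columns=10):
    """Draw the grid as text, using each label's first letter as its cell symbol."""
    flat = "".join(label[0] * n for label, n in cells.items())
    return "\n".join(flat[i:i + columns] for i in range(0, len(flat), columns))

# Hypothetical survey responses, invented for illustration.
responses = {"Yes": 47, "No": 38, "Unsure": 15}

cells = waffle_cells(responses)
print(render(cells))
```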
Distribution Visualizations
• Distribution visualizations are used to
understand the distribution, spread, and
shape of data values. They provide insights
into the frequency, variability, and patterns
within a dataset. Here are some examples
of distribution visualizations:
Histograms
• Histograms display the distribution of continuous data by
dividing it into bins or intervals along the x-axis and
representing the frequency or count of data points falling into
each bin with the height of bars. Histograms provide a visual
representation of the data's shape, central tendency, and
spread.
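The binning step behind a histogram can be sketched in a few lines. This is an illustration under stated assumptions: the exam scores are made up, the bins are equal-width, and a value equal to the upper limit is placed in the last bin (a common convention, but a choice).

```python
def histogram(data, bins, low, high):
    """Count how many values fall into each of `bins` equal-width intervals.

    Intervals are [low, low+w), [low+w, low+2w), ..., with the last one
    closed on the right so that a value equal to `high` is counted.
    Values outside [low, high] are ignored.
    """
    if bins < 1 or high <= low:
        raise ValueError("need bins >= 1 and high > low")
    width = (high - low) / bins
    counts = [0] * bins
    for x in data:
        if low <= x <= high:
            index = min(int((x - low) / width), bins - 1)
            counts[index] += 1
    edges = [low + i * width for i in range(bins + 1)]
    return edges, counts

# Hypothetical exam scores, invented for illustration.
scores = [52, 55, 61, 64, 67, 68, 70, 71, 73, 75, 78, 82, 85, 91, 98]

edges, counts = histogram(scores, bins=5, low=50, high=100)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:>5.0f}-{right:<5.0f} {'#' * count}")
```

Changing the number of bins changes the apparent shape of the distribution, so it is worth trying a few bin widths before drawing conclusions.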
Box Plots
• Box plots, also known as box-and-whisker plots,
illustrate the distribution of data through five
summary statistics: minimum, first quartile (Q1),
median (Q2), third quartile (Q3), and maximum.
They display the range, central tendency, and
variability of the data, along with any outliers or
extreme values.
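The five summary statistics can be computed with Python's standard library, as sketched below. Two things here are assumptions rather than part of the slides: the quartiles use the "inclusive" interpolation method (other methods give slightly different values), and outliers are flagged with the widely used 1.5 × IQR rule. The data is invented.

```python
from statistics import median, quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(data)
    # n=4 yields the three quartile cut points; "inclusive" treats the data
    # as the whole population rather than a sample.
    q1, _, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median(ordered), q3, ordered[-1]

def iqr_outliers(data):
    """Values beyond 1.5 * IQR from the quartiles (a common convention)."""
    _, q1, _, q3, _ = five_number_summary(data)
    spread = q3 - q1
    lower, upper = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [x for x in data if x < lower or x > upper]

# Hypothetical measurements, invented for illustration.
values = [12, 15, 17, 19, 20, 22, 23, 25, 28, 60]

summary = five_number_summary(values)
outliers = iqr_outliers(values)
print("min, Q1, median, Q3, max:", summary)
print("outliers:", outliers)
```

Many plotting tools draw the whiskers only as far as the most extreme non-outlier values and plot outliers as separate points, so the whisker ends do not always coincide with the minimum and maximum.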
Box Plots
Box Plot in Stock Trading
Density Plots
• Density Plots: Density plots estimate the underlying probability
density function of a continuous variable. They display the distribution
as a smooth line or curve, providing insights into the shape, peaks, and
areas of high or low density. Density plots are particularly useful for
understanding the overall distribution and identifying multiple modes
or clusters.
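One standard way to obtain such a smooth curve is kernel density estimation: place a small bell curve on every data point and average them. The sketch below uses a Gaussian kernel with a hand-picked bandwidth. The data, the bandwidth of 0.5, and the function name are all assumptions for illustration, and plotting libraries usually choose the bandwidth automatically.

```python
from math import exp, pi, sqrt

def gaussian_kde(data, bandwidth):
    """Return a function estimating the probability density at any point x."""
    if not data or bandwidth <= 0:
        raise ValueError("need at least one data point and a positive bandwidth")
    norm = len(data) * bandwidth * sqrt(2 * pi)

    def density(x):
        # Average of Gaussian bumps centred on each observation.
        return sum(exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data) / norm

    return density

# Hypothetical sample with two clusters, invented for illustration.
samples = [1.0, 1.2, 1.5, 1.7, 2.0, 5.0, 5.3, 5.5, 5.8, 6.0]

density = gaussian_kde(samples, bandwidth=0.5)
for x in (1.5, 3.5, 5.5):
    print(f"density({x}) = {density(x):.3f}")
```

The estimate is high near each cluster and low in the gap between them, which is how a density plot reveals multiple modes. A larger bandwidth would smooth the two peaks together, while a smaller one would make the curve spiky.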
Density Plot
Violin Plots
• Violin plots combine the
features of box plots and
density plots. They display
the distribution as a
combination of mirrored
density plots on either side of
a central box plot. Violin
plots provide a visual
representation of both the
summary statistics and the
density of the data.
Violin Plot
Heatmaps
• Heatmaps use color gradients to represent values in a matrix or table.
They are often used to visualize the distribution or patterns in large
datasets, such as correlation matrices, gene expression data, or
geographic data. Heatmaps allow for the identification of clusters,
similarities, or variations across different variables or categories.
Heat Maps
Where to use what?
Let's consider some examples of when to use different visualizations
based on the type of data and the insights you want to convey:
• Line Chart: Use line charts to show trends over time, such as stock
prices, temperature changes, or website traffic fluctuations.
• Bar Chart: Bar charts are suitable for comparing discrete categories or
groups, such as sales by region, product sales, or survey responses by
category.
• Scatter Plot: Scatter plots are ideal for visualizing the relationship
between two continuous variables, such as examining the correlation
between height and weight, or income and education level.
• Pie Chart: Use pie charts to represent the composition of a whole or
the distribution of categories, such as market share, budget allocation,
or survey responses by category.
• Heatmap: Heatmaps are effective for visualizing large datasets or
matrices, such as gene expression data, geographic data, or website
user behavior across multiple dimensions.
• Tree Map: Tree maps are useful for displaying hierarchical data and
comparing the proportions of different categories at multiple levels,
such as market share by product category and subcategory.
Tools for Data Visualization
• Tableau, Power BI, QlikView.
• Python libraries: Matplotlib, Seaborn, Plotly.
• R packages: ggplot2, Shiny.
• Excel and Google Sheets.
Power BI
• Power BI is a powerful business intelligence and data visualization
tool developed by Microsoft. It allows users to connect to various data
sources, transform and model the data, and create interactive
visualizations and reports. Here are some key features and benefits of
Power BI:
• Data Connectivity
• Data Transformation & Modelling
• Interactive Visualisations
• Dashboard Creation
• Collaboration & Sharing
Dashboard
• A Dashboard is a visual display of data that provides an overview of key
metrics, trends, and insights in a concise and interactive format. It
typically consists of multiple visualizations, charts, tables, and other
components that present data in a way that is easy to understand and
analyze.
• Dashboards are designed to present relevant information at a glance,
allowing users to monitor performance, track progress, and make
data-driven decisions.
• Dashboards are commonly used in business, finance, marketing,
operations, and other domains where data analysis and decision-making
are crucial.
