
INTRODUCTION TO ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

Analyzing and Visualizing Data

Tien-Lam Pham, PhD
Van-Quyen Nguyen, MSc
Faculty of Computer Science
Considering factors that influence tool selection

The simplified iterative data pipeline

[Figure: data pipeline with data sources, ingestion, storage, processing, and analysis & visualization stages]

Factors to consider when selecting tools

• Business needs: Fully understand the business needs to be able to determine which data analyses and visualizations are needed to help develop insights.
• Data characteristics: Understand the type and quality of the data, and how often it is updated and processed.
• Access to data: Consider the data pipeline and who needs access to analyze and visualize data.

Business needs

• Which analyses are needed to help develop insights?


• What insights can be pulled from the data?
• What visualization would illustrate the insights?
• Does the consumer need to generate a report or interact with a
dashboard?

Business needs: Granularity of insight

Finance
• Detailed level: A finance manager wants information (such as revenue, costs, and profit margins) about their line of business.
• Aggregate level: A CFO wants similar metrics at an aggregate level across all lines of business. They want to be able to drill down to any line of business.

Marketing
• Detailed level: A marketing manager wants to know about the number of leads, opportunities, and closed deals within an area (such as a postal code or city).
• Aggregate level: A CMO is interested in related metrics but at a broader level (such as a state or region).

Sales
• Detailed level: A sales manager is focused on their sales pipeline and wants to know how long it takes to close an opportunity. They want to assess how many opportunities are needed to achieve quota targets.
• Aggregate level: A VP of sales wants similar information at an aggregate level. They want to be able to drill down to a sales representative or sales territory.
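
The difference between the detailed and aggregate levels can be sketched as the same records rolled up at two granularities. This is only an illustration: the field names (`region`, `city`, `closed_deals`), the helper functions, and the sample values are hypothetical and do not come from the course material.

```python
from collections import defaultdict

# Hypothetical sample records: closed deals per city (the detailed level).
deals = [
    {"region": "North", "city": "Hanoi", "closed_deals": 12},
    {"region": "North", "city": "Haiphong", "closed_deals": 5},
    {"region": "South", "city": "Ho Chi Minh City", "closed_deals": 20},
    {"region": "South", "city": "Can Tho", "closed_deals": 3},
]


def roll_up(records, key):
    """Sum closed deals grouped by the given field name."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record["closed_deals"]
    return dict(totals)


def drill_down(records, region):
    """Return the city-level breakdown for a single region."""
    return roll_up([r for r in records if r["region"] == region], "city")


by_city = roll_up(deals, "city")        # detailed view, e.g. for a marketing manager
by_region = roll_up(deals, "region")    # aggregate view, e.g. for a CMO
north_detail = drill_down(deals, "North")
```

A dashboard that supports drill-down would typically show `by_region` first and compute something like `north_detail` only when the user selects a region.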

Business needs: Visualizing insights

• KPIs: Show performance in a particular area or function.
• Relationships: Establish or prove whether a relationship exists between two or more variables.
• Comparisons: Show or examine how different variables change over time, or provide a static snapshot of how different variables compare.
• Distributions: Show how your data is distributed over certain intervals (based on clustering or grouping of data).
• Compositions: Highlight the various elements that make up your data, that is, its composition.

Visualize Relationship
• Bar/Column chart
• Scatter Plot
• Connected Scatter Plot
• Bubble Chart
• Wordcloud Chart

Bar chart
• Useful for displaying absolute values, including negative ones.
• One axis contains categories, and the other axis represents values.

Use cases
• Volume of Google searches by region
• Market share in revenue by product

Scatter plot
• Shows the relationship between two variables, using dots whose positions represent the values of each variable.

Use cases
• Display the relationship between salary and years spent at company
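
Before drawing a scatter plot, it can help to quantify how strongly two variables move together. The sketch below computes the Pearson correlation coefficient by hand. It is a swapped-in numeric companion to the chart, not something the slides prescribe, and the sample figures are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: years spent at the company and salary (in thousands).
years = [1, 2, 3, 5, 8]
salary = [30, 34, 39, 47, 60]
r = pearson_r(years, salary)
```

A value of `r` close to 1 or -1 suggests a strong linear relationship that a scatter plot would show as a tight band of dots; a value near 0 suggests no linear relationship.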

Connected scatter plot
• A hybrid between a scatter plot and a line plot: the scatter dots are connected with a line.

Use cases
• Cryptocurrency price index
• Timelines and events when analyzing two variables

Bubble chart
• Bubble charts show data as circles. The positions on the x-axis and y-axis represent the values of two variables, and the size of the circle represents a third measure.

Use cases
• Relationship between life expectancy, GDP per capita, and population size

Wordcloud chart
• Visualizes the most prevalent words that appear in a text. This can be used to visualize the relationship between different words that appear together or to capture a trend in the most common words.

Use cases
• Top 100 words used by customers in customer service tickets
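
The data behind a word cloud is a word-frequency table. A minimal sketch of building one with the standard library follows; the stop-word list, the `top_words` helper, and the sample tickets are hypothetical, and a real pipeline would likely use a fuller stop-word list and better tokenization.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "is", "my", "to", "and", "not", "i"}


def top_words(texts, n=100):
    """Return the n most frequent non-stop words across all texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Hypothetical customer service tickets.
tickets = [
    "My order is late and the tracking link is broken",
    "Refund not received, order cancelled",
    "The tracking page is not loading",
]
top = top_words(tickets, n=3)
```

Each `(word, count)` pair in `top` would map to one word in the cloud, with the count driving the font size.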

Visualize a Trend
• Line chart
• Multi-line chart
• Area chart
• Stacked Area chart
• Spline Chart

Line chart
• The most explicit way to capture trends over a period of time.

Use cases
• Revenue in $ over time
• Energy consumption in kWh over time
• Google searches over time

Multi-line chart
• Captures multiple numeric variables over time.

Use cases
• Apple vs. Amazon stocks over time

Area chart
• Shows how a trend changes over time and draws the audience's attention to the total change across the trend.

Use cases
• Total sales over time
• Active users over time

Visualize Part of a Whole
• Pie Chart
• Donut Pie Chart
• Heat maps
• Stacked Column chart
• Treemap chart

Pie chart
• One of the most common ways to show part-to-whole data. It is also commonly used with percentages.

Use cases
• Voting preference by age group
• Market share of cloud providers
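
Part-to-whole charts are usually fed percentages rather than raw values. A minimal sketch of that conversion is shown here; the `shares` helper, the provider names, and the figures are hypothetical.

```python
def shares(values):
    """Convert absolute values into percentages of the total."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("total must be positive to compute shares")
    return {label: 100 * value / total for label, value in values.items()}


# Hypothetical revenue figures per provider (arbitrary units).
revenue = {"Provider A": 50, "Provider B": 30, "Provider C": 20}
percentages = shares(revenue)
```

Each value in `percentages` corresponds to one slice of the chart, and the values sum to 100.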

Donut chart
• A variant of the pie chart with a hollow center.

Use cases
• Android OS market share
• Monthly sales by channel

Heatmap
• Heatmaps are two-dimensional charts that use color shading to present data trends.

Use cases
• Departments with the highest amount of attrition over time

Stacked column chart
• Best for comparing subcategories within categorical data. Can also be used to compare percentages.

Use cases
• Quarterly sales per region
• Total car sales by producer

Treemap chart
• 2D rectangles whose size is proportional to the value being measured. Can be used to display hierarchically structured data.

Use cases
• Grocery sales count with categories
• Stock price comparison by industry and company

Visualize Distribution
• Histogram
• Box plot
• Violin plot
• Density plot

Histogram
• Shows the distribution of a variable. It groups numerical data into bins, drawn as columns.

Use cases
• Distribution of salaries in an organization
• Distribution of height in one cohort
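
The binning step behind a histogram can be sketched in a few lines. This assumes fixed-width bins starting at a chosen origin. The `histogram` helper and the salary figures are hypothetical, and plotting libraries offer their own binning options.

```python
def histogram(values, bin_width, start=0):
    """Count how many values fall into each [low, low + bin_width) bin."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = {}
    for value in values:
        index = int((value - start) // bin_width)
        low = start + index * bin_width
        counts[low] = counts.get(low, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical monthly salaries.
salaries = [820, 950, 1100, 1150, 1400, 1900, 2050, 2100]
bins = histogram(salaries, bin_width=500)

# Rough text rendering: one row per bin, one '#' per value in that bin.
for low, count in bins.items():
    print(f"{low:>5}-{low + 499:<5} {'#' * count}")
```

The choice of `bin_width` changes the picture noticeably: wide bins hide detail, while narrow bins produce a noisy chart.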

Boxplot
• Shows the distribution of a variable using five key summary statistics: minimum, first quartile, median, third quartile, and maximum.

Use cases
• Time spent reading across readers
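
The five summary statistics behind a box plot can be computed with Python's `statistics` module. This is a sketch under one assumption: it uses the "inclusive" quartile method, and other tools may use a different quartile convention and give slightly different values. The helper name and the sample data are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # quantiles() with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical minutes spent reading, one value per reader.
minutes = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(minutes)
```

Many plotting tools draw the whiskers at 1.5 times the interquartile range rather than at the minimum and maximum, and show points beyond them as outliers.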

Violinplot
• Shows the distribution of a variable by combining a box plot with a mirrored density plot, which makes it convenient for comparing distributions across groups.

Use cases
• Time spent in restaurants across age groups

Density plot
• Visualizes a distribution with a smoothed curve to better capture the shape of the data.

Use cases
• Distribution of price of hotel listings

Visualize a Flow
• Sankey chart
• Chord chart
• Network chart

Sankey chart
• Useful for representing flows in systems. This flow can be any measurable quantity.

Use cases
• Energy flow between countries
• Supply chain volumes between warehouses

Chord chart
• Useful for presenting weighted relationships or flows between nodes. Especially useful for highlighting the dominant or important flows.

Use cases
• Exports between countries, to showcase the biggest export partners
• Supply chain volumes between the largest warehouses

Network chart
• Similar to a graph, it consists of nodes and interconnected edges. It illustrates how different items have relationships with each other.

Use cases
• How different airports are connected worldwide
• Social media friend group analysis

Data characteristics

• How much data is there?


• At what speed and volume does it arrive?
• How frequently is it updated?
• How quickly is it processed?
• What type of data is it?

Data characteristics: Examples of data types

Volume and velocity
• Historical analysis: Visualize a year’s worth of sales data. Users can drill down by region and salesperson.
• Streaming Internet of Things (IoT) data: Visualize the real-time error rates of sensors in a factory.

Variety and veracity
• Structured data: A relational database is queried to report on customer service tickets that were submitted in a specific period.
• Unstructured data: Sentiment analysis is performed on customer service emails.

Value
• A business analyst uses periodic reports to showcase and report results to leadership.
• A DevOps engineer uses self-service dashboards to monitor and analyze performance in real time.

Data characteristics: Two fraud detection use cases

Use case 1: Rule-based (batch pipeline)
• How much data is there? Millions of transactions (kilobytes to terabytes)
• At what speed and volume is it arriving? In predefined intervals (minutes to multiple days)
• How quickly is it processed? Minutes to hours
• What type of data is it? Structured and semistructured
• What value do insights from the data provide? Historical reporting of fraud cases; reactive approach

Use case 2: ML in real time (streaming pipeline)
• How much data is there? Millions of transactions (bytes to megabytes)
• At what speed and volume is it arriving? In real time (milliseconds to seconds)
• How quickly is it processed? Milliseconds to seconds
• What type of data is it? Unstructured and semistructured data
• What value do insights from the data provide? Ability to detect fraud in real time; proactive approach

Access to data

• Where does the data come from?


• Will the data need to be combined from multiple sources?
• Who needs access to the data and at what level? Who can access
the tools?

Access to data: Consider authorization level based on role

• A user’s authorization to access data depends on their role in the organization.
• A business analyst or manager might be authorized to read
the output that data engineers or data analysts create but
not delete or update it.
• Follow the principle of least privilege. Give users the least
amount of access and responsibility that are needed to
complete their duties.
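
Role-based authorization with least privilege can be sketched as a deny-by-default lookup. The role names and permission sets below are hypothetical. In particular, the permissions granted to data engineers and data analysts are an assumption made for illustration only.

```python
# Hypothetical roles and permissions; real systems define their own.
ROLE_PERMISSIONS = {
    "data_engineer": {"read", "update", "delete"},
    "data_analyst": {"read", "update"},
    "business_analyst": {"read"},
    "manager": {"read"},
}


def is_allowed(role: str, action: str) -> bool:
    """Deny by default: unknown roles and unlisted actions are refused."""
    return action in ROLE_PERMISSIONS.get(role, set())


can_read = is_allowed("business_analyst", "read")
can_delete = is_allowed("business_analyst", "delete")
unknown_role = is_allowed("intern", "read")
```

Starting from an empty permission set and adding only what each role needs keeps the default outcome a refusal, which is the safer failure mode.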
