
Final Exam, MA544 Data Visualization

Krysh Rajendran

Problem not done in test:


5. (10 points) High dimensional plots
(a) Given a data matrix with 10 variables and 200 observations, what is the definition of
the correlational matrix?
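It is a 10x10 matrix in which entry (i, j) is the correlation between variable i and
variable j, computed over the 200 observations. Values range from -1 to 1: 1.00 indicates
perfect positive correlation, -1.00 perfect negative correlation, and 0 means no linear
correlation. The diagonal is all 1s and the matrix is symmetric.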
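
The definition in (a) can be checked with a minimal sketch, assuming NumPy is available. The data here is random and purely hypothetical; only the shapes (200 observations, 10 variables) come from the question.

```python
import numpy as np

# Hypothetical data: 200 observations (rows) of 10 variables (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# rowvar=False tells NumPy that each column is a variable.
R = np.corrcoef(X, rowvar=False)

print(R.shape)  # one row and one column per variable
```

Each entry of R lies between -1 and 1, the diagonal is 1, and R equals its own transpose.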
(b) What type of plot can most effectively observe the correlation of the 10 variables?
Scatter matrix
(c) When we have a dataset with more than 20 dimensions (e.g., defect records of
airplane electronic components, computer vision, natural languages, etc.)
Parallel Coordinates

(d) Explain the curse of dimensionality


Having more dimensions than observations can cause a model to overfit badly.
Observations also become harder to cluster, because in high dimensions the points may
all appear nearly equidistant from one another. In such a case we must choose features
carefully and reduce the number of features.
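
The "equidistant" effect can be illustrated with a small sketch, assuming NumPy and uniformly random points (an assumption made only for illustration). The helper name relative_contrast is hypothetical. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import numpy as np

def relative_contrast(n_points, n_dims, seed=0):
    """(max distance - min distance) / min distance from one query point."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))  # uniform points in the unit cube
    query = rng.random(n_dims)
    distances = np.linalg.norm(points - query, axis=1)
    return (distances.max() - distances.min()) / distances.min()

low = relative_contrast(200, 2)      # 2 dimensions
high = relative_contrast(200, 1000)  # 1000 dimensions
print(low, high)
```

In this setup the contrast is large in 2 dimensions and small in 1000 dimensions, which is why nearest-neighbor and clustering methods lose discriminating power as dimensions are added.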
(e) Give two visualization tools that can represent a high dimension problem and explain
the pros and cons of the two tools.
Scatter Plots:
1. Pros: easy to make, highly customizable, fit most situations
2. Cons: not attractive to look at
Mosaic Plots:
1. Pros: can support a high number of variables
2. Cons: hard to interpret accurately
Take-Home Component (50 points, Due on Friday)
II (30 points) Evaluate and redesign the graph
a. Use data visualization principles to evaluate the graph design below.
b. Provide your design that captures the key data-related components,
either in the Tableau platform or in Python plot libraries (you do not have to
actually plot with the tools, since the data is blurry; sketch the plots to show your ideas).
1. (10 points) This graph plots three time series: the genealogy of royalty at the top,
the price of a quart of wheat as the bar chart in the middle, and the weekly salary
of good mechanics as the curve at the bottom, from 1565 to 1821, about 250 years in
total.

Things I noticed about the graph:


1. Uses too much ink
2. Not coherent
3. Does not tell a good story; it does not make me focus on the message it is trying to show
4. Not clear at all
5. Does not follow Shneiderman's mantra: it only has an overview, with no filter and no
details on demand
6. Lots of bloat and unwanted information
My graph: I would also add a filter to choose the beginning and end of the time period.
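
The time-period filter could be sketched as below. The (year, value) pairs are hypothetical placeholders, not values read from the original chart, and filter_years is an illustrative helper name.

```python
# Hypothetical (year, value) pairs; placeholder values only.
series = [(1565, 1.0), (1600, 2.0), (1650, 3.0), (1700, 4.0), (1750, 5.0), (1821, 6.0)]

def filter_years(rows, start, end):
    """Keep only the rows whose year falls in [start, end], inclusive."""
    return [(year, value) for year, value in rows if start <= year <= end]

selected = filter_years(series, 1600, 1700)
print(selected)
```

In a dashboard, the start and end arguments would be driven by a range slider, which supplies the "zoom and filter" step of the mantra.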

2. (10 Points) Distribution of ALL TFBS Regions.

1. It is a pie chart, so comparisons can be hard to make
2. Message is clear after reading the text
3. Very little substance
4. The pie chart's angle makes comparisons very difficult
5. The only useful information is the percentage numbers we see
My graph
3. (10 points) DNA fingerprinting: A review of the controversy (with discussion).
Statistical Science 9:222-278
1. Does show a lot of data
2. But the data is very distorted
3. 3D is not efficient, but otherwise it seems fine
4. Encourages comparison, but is not easy to compare
5. Not very clear
6. Cannot find the message that the author is trying to convey
7. Cannot understand the context
My Graph:
I would use easy-to-understand bins and perhaps normalize by each category's sample
size first: since the number of observations differs between races, raw counts can give
skewed comparisons. I would also use easy-to-distinguish colors to make the races easier
to compare (perhaps related to their flag colors: China red, Japan white, Korea blue,
Vietnam yellow).
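
The normalization step could look like the sketch below. The group names and bin counts are made up for illustration and are not taken from the paper's figure.

```python
# Hypothetical bin counts per group; the numbers are made up for illustration.
counts = {
    "Group A": [5, 20, 50, 20, 5],  # 100 observations in total
    "Group B": [2, 10, 20, 6, 2],   # 40 observations in total
}

def to_proportions(bin_counts):
    """Convert raw bin counts to proportions of the group's own total."""
    total = sum(bin_counts)
    return [count / total for count in bin_counts]

proportions = {group: to_proportions(c) for group, c in counts.items()}
print(proportions)
```

After normalization both groups show 0.5 in the middle bin even though their raw counts (50 and 20) differ, so the bars can be compared directly.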
4. (10 points) Use Shneiderman’s Mantra to design your visualization, including a
dashboard with other details for your client whose portfolio of 401K includes some
tradable mutual funds and stock as well as some untradable equities. Your goal is
that the client can spend a few minutes daily checking the assets and
understanding if they need to take action. More specifically, the portfolio with
historical data of the past 20 years includes:
(1) Seven stocks for about 30% of total assets, Tesla (TSLA), Apple (AAPL), Walmart
(WMT), Google (GOOG), Amazon (AMZN), FedEx (FDX), and Pfizer (PFE),
(2) Three Mutual Funds for about 50% of assets, Vanguard (VFTSX), Wellington
fund admiral (WFADX), and NASDAQ,
(3) Two Equities for about 20% of the total assets: Vanguard Equity-Income Fund
(VEIPX) and BlackRock Advantage ESG U.S. Equity Fund (BIRIX). (Hint: the
difference between equity funds and other funds or stocks is that they cannot
be traded publicly before retirement.)
5. (10 points) Complete the other problems that you have not done in the in-class test.
I did problem number 5 at the start of this document.

6. (bonus 10 points) Correct the possible wrong answers or incomplete answers to the
problems you did in class.
1. Question 1: the lie factor could be 2 if we used surface area instead of volume.
In the moment, though, I felt that the cylinder's volume would be a better
measure of its displayed strength.
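
For reference, Tufte's lie factor is the size of the effect shown in the graphic divided by the size of the effect in the data. The sketch below shows how the choice between surface area and volume changes the result. The numbers are hypothetical and are not the ones from the exam question, so they do not reproduce the value 2 mentioned above.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return (second - first) / first

def lie_factor(shown_first, shown_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect in the data."""
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)

# Hypothetical example: the data doubles (10 -> 20), and the cylinder is
# drawn with both radius and height doubled (linear scale factor k = 2).
k = 2.0
by_area = lie_factor(1.0, k ** 2, 10.0, 20.0)    # surface area grows as k^2
by_volume = lie_factor(1.0, k ** 3, 10.0, 20.0)  # volume grows as k^3
print(by_area, by_volume)
```

Under these assumptions the volume reading gives a larger lie factor than the surface-area reading, because volume grows with the cube of the linear scale.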
