BUSO 758L: Data Analysis: Week 3: Visualization Using Tableau Homework Assignment Guide
1
Getting started
• Before we work on the visualization questions, we need to first
understand that each observation in this dataset is a trip. The
important variables are:
• Starting and ending station information
• Duration of the trip (in seconds)
• After reading in the .csv file, we then need to move categorical
variables into the “dimensions” section and move the continuous
variables into the “measures” section
• By the end of this step, your interface should look like the one on page 3
• Then, we set the filter as instructed
• Screenshot attached on page 4
2
3
Setting a filter: Select the variable “Tripduration” from the
Measures section and drag and drop it into the “Filters” shelf
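
For readers who want to check the filter outside Tableau, here is a minimal Python sketch of the same idea. The inline sample rows and the exact column name "tripduration" are assumptions for illustration; the maximum of 2700 seconds is the value entered in the later filter steps, and the range is assumed to be inclusive.

```python
import csv
import io

# Hypothetical sample standing in for the real .csv file; the column
# names and all values are assumptions made for illustration only.
SAMPLE_CSV = """tripduration,start station id,end station id
300,327,426
2700,426,3002
5400,3002,327
"""

MAX_DURATION = 2700  # maximum value entered in the Tableau filter dialog


def filter_trips(rows, max_duration=MAX_DURATION):
    """Keep only trips whose duration (in seconds) is at most max_duration."""
    return [row for row in rows if int(row["tripduration"]) <= max_duration]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
kept = filter_trips(rows)
print(len(rows), "trips read,", len(kept), "trips kept")
```

Each row is one trip, so the filter drops whole observations, not stations.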
4
Question 1
5
Q1: Generate a map that shows each station. For each station, it should also easily
and intuitively show how many trips start at that station and how long the average
trip that starts at that station is. Attach your figure, with a short description as
Figure 1.
• To create a map, hold down the Control key and select two
variables from the list: “Start Station Latitude” and “Start Station
Longitude”
• Go to “Show Me” and select “symbol maps”
6
Adding more information to the map
With the map created, we can now add more info:
• Drag variable “Start Station ID” into Marks → Detail
  • This tells Tableau to display the start station latitude and longitude for
    each Start Station ID
  • Of course, we need to make sure that Start Station ID is set up as a
    dimension (not a measure) in Tableau (see slide 2)
• Drag variable “TripDuration” into Marks → Color and then change the
  Measure from “sum” to “average”
• Drag variable “TripDuration” into Marks → Size and then change the
  Measure from “sum” to “count”
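
The two encodings above amount to a per-station aggregation: color shows the average trip duration and size shows the number of trips. A rough Python sketch of that calculation follows; the trip tuples and the helper name `summarize_by_station` are hypothetical and only illustrate what Tableau computes for each mark.

```python
from collections import defaultdict

# Hypothetical trips: (start station id, latitude, longitude, duration in seconds).
trips = [
    (327, 40.715, -74.016, 600),
    (327, 40.715, -74.016, 900),
    (426, 40.717, -74.013, 300),
]


def summarize_by_station(trips):
    """Return {station_id: (latitude, longitude, trip_count, average_duration)}."""
    groups = defaultdict(list)
    for station_id, lat, lon, duration in trips:
        groups[station_id].append((lat, lon, duration))

    summary = {}
    for station_id, records in groups.items():
        durations = [d for _, _, d in records]
        lat, lon = records[0][0], records[0][1]  # assumes one fixed location per station
        summary[station_id] = (lat, lon, len(durations), sum(durations) / len(durations))
    return summary


summary = summarize_by_station(trips)
for station_id, (lat, lon, count, avg) in summary.items():
    print(station_id, lat, lon, "count (size):", count, "average (color):", avg)
```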
7
Your screen should look like this:
8
Question 2
9
We know that the average Citibike trip is 805 seconds. Now, use filter to select the
trips that start from station ID 327, 426, and 3002. Is the average trip that starts
from these three stations shorter or longer than 805 seconds? Is this difference
statistically significant? Give the p-value and interpret it. (You may want to generate
the data you need in Tableau, and then use Excel to calculate the statistics.)
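
The comparison against 805 seconds can be sketched as a one-sample test. The durations below are invented for illustration, and the sketch uses a normal (z) approximation for the p-value, which is an assumption that is only reasonable when the filtered sample is large; in Excel, a t-based p-value could be obtained instead with T.DIST.2T.

```python
import math
from statistics import NormalDist, mean, stdev

POPULATION_MEAN = 805  # average Citibike trip in seconds, as stated in the question


def one_sample_z_test(durations, mu=POPULATION_MEAN):
    """Return (sample mean, z statistic, two-sided p-value) against mu.

    Uses the sample standard deviation and a normal approximation,
    which assumes a reasonably large sample.
    """
    n = len(durations)
    sample_mean = mean(durations)
    standard_error = stdev(durations) / math.sqrt(n)
    z = (sample_mean - mu) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return sample_mean, z, p_value


# Hypothetical trip durations (seconds) for the three selected stations.
durations = [620, 710, 845, 530, 990, 760, 680, 905, 575, 640]
sample_mean, z, p_value = one_sample_z_test(durations)
print(f"mean={sample_mean:.1f}  z={z:.3f}  p={p_value:.4f}")
```

A small p-value would suggest that the difference from 805 seconds is unlikely to be due to chance alone; a negative z means the selected trips are shorter on average.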
11
• Go to Menu → Analysis and de-select Aggregate Measures
• Then, go to Menu → Analysis → View Data
12
Your data should look like this:
13
Question 3
Set a filter to only include trips that End at station ID# 2006 (it is at the South end of
Central Park). Show a figure that shows where the trips that end at station #2006
start, and how long those trips usually take. Try to get this figure to show as much
information about the trips as possible. Attach this figure as Figure 2. Describe the
figure in a short paragraph. What does this figure tell us? Which station has the most?
What areas have the most? Which areas have the least?
14
1. Set the filter for Tripduration to a maximum of 2700
(drag Tripduration into Filters, choose All Values,
enter 2700 as the max value, and click OK; see slide 4)
15
3. Select “Start Station Latitude” and drag it to
Rows, and drag “Start Station Longitude” to Columns.
Note that the latitude and longitude shown are
aggregated averages, so we need to
de-aggregate them by specifying Marks → Detail.
Make sure “Start Station ID” is under
Dimensions and drag it onto Marks → Detail.
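
To see why the Detail step matters, compare the single averaged coordinate with the per-station coordinates. The following is only a sketch with made-up coordinates; it mimics the effect of placing Start Station ID on Detail and is not Tableau's internal logic.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trips: (start station id, latitude, longitude).
trips = [
    (327, 40.70, -74.00),
    (327, 40.70, -74.00),
    (426, 40.80, -73.90),
]

# Aggregated view: one mark at the average of all latitudes and longitudes.
overall = (
    mean(lat for _, lat, _ in trips),
    mean(lon for _, _, lon in trips),
)

# De-aggregated by station: one mark per Start Station ID.
by_station = defaultdict(list)
for station_id, lat, lon in trips:
    by_station[station_id].append((lat, lon))

marks = {
    station_id: (mean(p[0] for p in points), mean(p[1] for p in points))
    for station_id, points in by_station.items()
}

print("single aggregated mark:", overall)
print("marks per station:", marks)
```

Without the station ID, the map collapses to one point that does not correspond to any real station.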
16
6. Go to the Worksheet menu and choose Summary.
The Summary will show up on the side panel.
17
The default Summary you get when you choose the Summary option under Worksheet
may not include the Standard Deviation.
To add it, go to the side panel where the Summary shows up, click the arrow button,
and select the Standard Deviation option.
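
As a cross-check on the numbers read off the Summary panel, the same statistics can be computed by hand. The durations below are invented, and the sketch uses the sample standard deviation (n - 1 in the denominator), which is an assumption about the definition in use; `statistics.pstdev` gives the population version.

```python
from statistics import mean, median, stdev

# Hypothetical trip durations in seconds.
durations = [300, 600, 900, 1200]

summary = {
    "count": len(durations),
    "sum": sum(durations),
    "average": mean(durations),
    "minimum": min(durations),
    "maximum": max(durations),
    "median": median(durations),
    "standard deviation": stdev(durations),  # sample standard deviation (assumed)
}

for name, value in summary.items():
    print(f"{name}: {value}")
```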
19
1. Set the filter for Tripduration to a maximum of 2700
(drag Tripduration into Filters, choose All Values,
enter 2700 as the max value, and click OK)
6. Go to the Worksheet menu and choose Summary. The Summary will show up on the side panel. Read off the details you need
for the analysis. The Count at the top of the Summary panel is the number of distinct end stations (distinct averages of Tripduration)
where the bikes are dropped off (the answer to 4d).
7. To answer 4a through 4c, you need the data to be disaggregated, so go to Analysis and de-select Aggregate Measures (slide 12).
The Summary of Tripduration will now be computed over all individual trips, which is what you need for 4a through 4c. Read off
the required numbers.
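
The difference between the aggregated and disaggregated summaries can be illustrated with a small sketch. The (end station id, duration) pairs are hypothetical; the point is only that the count of per-station averages differs from the count of trips, and that the mean of station averages generally differs from the mean over all trips.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trips: (end station id, duration in seconds).
trips = [(2006, 600), (2006, 1200), (327, 300)]

by_end_station = defaultdict(list)
for station_id, duration in trips:
    by_end_station[station_id].append(duration)

# Aggregated view: one average per end station.
station_averages = {sid: mean(ds) for sid, ds in by_end_station.items()}
distinct_end_stations = len(station_averages)
mean_of_station_averages = mean(list(station_averages.values()))

# Disaggregated view: every trip counts once.
trip_count = len(trips)
mean_of_all_trips = mean(duration for _, duration in trips)

print("distinct end stations:", distinct_end_stations)
print("mean of station averages:", mean_of_station_averages)
print("trips:", trip_count)
print("mean of all trips:", mean_of_all_trips)
```

In this toy data the station with more trips pulls the disaggregated mean upward, which is why the two summaries answer different questions.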