
BUSO 758L: Data Analysis

Week 3: Visualization using Tableau

Homework Assignment Guide

1
Getting started
• Before we work on the visualization questions, we first need to
understand that each observation in this dataset is a trip. The
important variables are:
• Starting and ending station information
• Duration of the trip (in seconds)
• After reading in the .csv file, we need to move the categorical
variables into the “Dimensions” section and the continuous
variables into the “Measures” section
• By the end of this step, your interface should look like the screenshot on page 3
• Then, we set the filter as instructed
• Screenshot attached on page 4

2
3
Setting a filter: Select the variable “Tripduration” from the
Measures section and drag and drop it into the “Filters” shelf
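If you want to cross-check the filtered trip count outside Tableau, the duration filter can be sketched in Python. This is a minimal sketch, not part of the assignment workflow. The column name `tripduration` and the sample rows are hypothetical. The guide writes this filter both as "TripDuration < 2700" and as "2700 as the max value". The sketch keeps a trip of exactly 2700 seconds, so change `<=` to `<` if your counts differ from Tableau's.

```python
import csv
import io

# Hypothetical excerpt of the trip file (made-up rows). Only the column
# name "tripduration" is assumed, mirroring the Tripduration field in Tableau.
SAMPLE_CSV = """tripduration,start station id,end station id
420,327,2006
3100,426,2006
905,3002,519
2700,327,519
"""


def load_trips(text):
    """Parse CSV text into a list of dicts, converting durations to int seconds."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["tripduration"] = int(row["tripduration"])
    return rows


def filter_duration(trips, max_seconds=2700):
    """Keep trips no longer than max_seconds (the Tripduration filter's maximum)."""
    return [t for t in trips if t["tripduration"] <= max_seconds]


trips = filter_duration(load_trips(SAMPLE_CSV))
print(len(trips))  # 3
```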

4
Question 1

5
Q1: Generate a map that shows each station. For each station, it should also easily
and intuitively show how many trips start at that station and how long the average
trip that starts at that station is. Attach your figure, with a short description, as
Figure 1.

• To create a map, hold down the Control key and select two
variables from the list: “Start Station Latitude” and “Start Station
Longitude”
• Go to “Show Me” and select “symbol maps”

6
Adding more information to the map
With the map created, we can now add more info:
• Drag variable “Start station ID” into MarksDetail
• this is to tell Tableau to display the start station Latitude and Longitude by
each Start station ID
• Of course, we need to make sure that your Start Station ID is set up as a
dimension (not a measure) in Tableau (see slide 2)
• Drag variable “TripDuration” into MarksColor and then change the
Measure from “sum” to “average”
• Drag variable “TripDuration” into MarksSize and then change the
Measure from “sum” to “count”
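Each mark on the map combines two aggregations per station. The steps above compute both, and they can be sketched in plain Python: COUNT of trips (Size) and AVG of Tripduration (Color), grouped by Start Station ID (Detail). The trip pairs below are hypothetical and are only meant to show the grouping logic, not Tableau's internals.

```python
from collections import defaultdict

# Hypothetical (start_station_id, tripduration_seconds) pairs, already filtered.
TRIPS = [(327, 400), (327, 800), (426, 600), (3002, 900), (3002, 1100), (3002, 1000)]


def summarize_by_station(trips):
    """Return {station_id: (trip_count, average_duration_seconds)}.

    trip_count mirrors COUNT(Tripduration) on Marks > Size;
    average_duration mirrors AVG(Tripduration) on Marks > Color.
    """
    totals = defaultdict(lambda: [0, 0])  # station -> [count, total seconds]
    for station, seconds in trips:
        totals[station][0] += 1
        totals[station][1] += seconds
    return {s: (n, total / n) for s, (n, total) in totals.items()}


for station, (n, avg) in sorted(summarize_by_station(TRIPS).items()):
    print(station, n, round(avg, 1))
```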
7
Your screen should look like this:

8
Question 2

9
We know that the average Citibike trip is 805 seconds. Now, use a filter to select the
trips that start from station IDs 327, 426, and 3002. Is the average trip that starts
from these three stations shorter or longer than 805 seconds? Is this difference
statistically significant? Give the p-value and interpret it. (You may want to generate
the data you need in Tableau, and then use Excel to calculate the statistics.)

• First, let's set the filters:
• TripDuration < 2700
• Start Station ID: 327, 426, 3002
(make sure that Start Station ID is recognized as a dimension!)
• Select “Start Station Latitude” and “Start Station Longitude”, and map
them by using Show Me > symbol maps
• Drag variable “Start Station ID” into Marks > Detail
• Drag variable “TripDuration” into Marks > Size and then change the
measure from “Sum” to “Average”
10
By this step, Tableau shows three big dots, one per Start Station ID. But in this question, we want the information by trip, instead of by station.

Let’s de-aggregate the data.

11
• Go to MenuAnalysis

Uncheck the option


“Aggregate Measures”

• Then, go to
menuAnalysis
view data

12
Your data should look like this:

Double-check your sample size; it should be 28,845 trips.

Now, you can export it to a .csv file and use StatTools to run a
hypothesis test of the mean (Hypothesis Testing > Means).

13
Question 3
Set a filter to include only trips that end at station ID #2006 (it is at the south end of
Central Park). Show a figure that shows where the trips that end at station #2006
start, and how long those trips usually take. Try to get this figure to show as much
information about the trips as possible. Attach this figure as Figure 2. Describe the
figure in a short paragraph. What does this figure tell us? Which station has the most trips?
What areas have the most? Which areas have the least?

14
1. Set a filter on Tripduration with 2700 as the maximum
(drag Tripduration into Filters, choose All Values,
enter 2700 as the max value, and click OK; see slide 4)

2. Set a filter on End Station Id by dragging it
into Filters, checking only the
2006 option, and clicking OK.
(Please ensure End Station Id is in
Dimensions before you do this.)

15
3. Drag “Start Station Latitude” to
Rows and “Start Station Longitude” to Columns.
Note that the latitude and longitude shown are
the aggregated averages, and we need to
de-aggregate them by specifying Marks > Detail,
so make sure “Start Station ID” is under
Dimensions and drag it to Marks > Detail.

4. Drag Tripduration to Color under Marks.
Change Tripduration from “Sum” to “Average”.

5. Drag Tripduration to Size under Marks.
Change Tripduration from “Sum” to “Count”.
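The question asks which start station sends the most trips to #2006. Steps 1 to 5 amount to a filter followed by a ranking, and the sketch below shows that idea in Python. The trip tuples are hypothetical. The function keeps trips that end at the given station and stay within the duration limit, then lists each start station's trip count (the Size encoding) and average duration (the Color encoding), most trips first.

```python
from collections import Counter, defaultdict

# Hypothetical trip records: (start_station_id, end_station_id, tripduration_seconds).
TRIPS = [
    (519, 2006, 900), (519, 2006, 1100), (3002, 2006, 1500),
    (327, 519, 600), (519, 2006, 1000), (3002, 2006, 1300),
]


def trips_into(trips, end_station, max_seconds=2700):
    """Rank start stations for trips ending at end_station.

    Returns a list of (start_station, trip_count, average_seconds),
    ordered from most trips to fewest.
    """
    kept = [(s, d) for s, e, d in trips if e == end_station and d <= max_seconds]
    counts = Counter(s for s, _ in kept)  # like COUNT(Tripduration) on Size
    sums = defaultdict(int)
    for s, d in kept:
        sums[s] += d
    return [(s, n, sums[s] / n) for s, n in counts.most_common()]


for row in trips_into(TRIPS, 2006):
    print(row)
```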

16
6. Go to the Worksheet menu and choose Summary.
The Summary will show up in the side panel.

Read off the details you need for the analysis.

17
By default, the Summary you get when you choose Summary under Worksheet
may not include the Standard Deviation.

To add it, go to the side panel where the Summary shows up, click the arrow button,
and select the Standard Deviation option.

The Summary will then show the Standard Deviation.
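To sanity-check the numbers read off the Summary card, you can recompute the common statistics from the exported durations. This is a sketch using Python's `statistics` module. It assumes the card reports the sample standard deviation (n − 1 denominator), so confirm against your own Tableau output before relying on it.

```python
import statistics


def summary_card(values):
    """Compute several statistics that the Tableau Summary card can display."""
    return {
        "count": len(values),
        "average": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),  # sample SD, n - 1 denominator (assumption)
        "min": min(values),
        "max": max(values),
    }


print(summary_card([600, 900, 1200]))  # hypothetical durations in seconds
```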


Question 4

19
1. Set a filter on Tripduration with 2700 as the maximum
(drag Tripduration into Filters, choose All Values,
enter 2700 as the max value, and click OK)

2. Set a filter on Start Station Id by dragging it
into Filters and checking only the
323 option (ensure Start Station Id is in
Dimensions before you do this)

3. Drag “End Station Latitude” to
Rows and “End Station Longitude” to Columns.
Note that the latitude and longitude shown are
the aggregated averages, and we need to
de-aggregate them by specifying Marks > Detail,
so make sure “End Station ID” is under
Dimensions and drag it to Marks > Detail.

4. Drag Tripduration to Color under Marks. Change Tripduration from “Sum” to “Average”.

5. Go to the Worksheet menu and choose Summary. The Summary will show up in the side panel. Read off the details you need
for the analysis. The Count at the top of the Summary panel is the number of distinct end stations (one average Tripduration per station) where
the bikes are dropped off (answer to 4d).

6. To answer 4a through 4c, you need the data to be disaggregated, so go to Analysis and deselect Aggregate Measures (slide 12).
The Summary of Tripduration will then report statistics over all individual trips, which is what you need for 4a through 4c. Read off
the required numbers.
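The aggregated and disaggregated summaries answer different questions. The Python sketch below contrasts them on hypothetical trips from station 323. With Aggregate Measures on, the Summary sees one average per end station, so its Count is the number of distinct end stations. With Aggregate Measures off, it sees every trip. The average of the station averages generally differs from the trip-level average, because stations with many trips get the same weight as stations with one.

```python
import statistics
from collections import defaultdict

# Hypothetical trips starting at station 323: (end_station_id, tripduration_seconds).
TRIPS = [(519, 600), (519, 700), (519, 800), (2006, 1500)]


def aggregated_vs_trip_level(trips):
    """Compare station-level (aggregated) and trip-level (disaggregated) summaries."""
    by_station = defaultdict(list)
    for end_station, seconds in trips:
        by_station[end_station].append(seconds)
    station_averages = [statistics.mean(v) for v in by_station.values()]
    return {
        # Aggregate Measures on: one mark (one average) per end station
        "distinct_end_stations": len(station_averages),
        "mean_of_station_averages": statistics.mean(station_averages),
        # Aggregate Measures off: one value per trip
        "trip_count": len(trips),
        "trip_level_mean": statistics.mean(s for _, s in trips),
    }


print(aggregated_vs_trip_level(TRIPS))
```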
