Francis 2018
To cite this article: Amanda Francis & Eric Sullivan (2018) Exploring Real Data: A Look at Airbnb, Math Horizons, 25:3, 14-17, DOI: 10.1080/10724117.2018.1424459
“Data is the sword of the 21st century,” wrote Jonathan Rosenberg, former senior vice president of products at Google, “those who wield it well, the samurai.” The world of data and data analysis is growing in importance, and those with the interest and appropriate skills are uniquely positioned to solve intriguing problems and answer vital questions.

In this article, we look at a freely available, rich, and complex data set that you might use to ask and answer data-driven questions. We propose a collection of explorations that you can try out along with us.

Airbnb

We have all experienced the traveler’s dilemma: I want to travel cheaply so that my trip can last longer, but the cost of lodging is prohibitively high.

In 2008 the San Francisco–based company Airbnb changed the nature of finding lodging by providing an alternative to traditional hotels.

Airbnb describes itself as “a trusted community marketplace for people to list, discover, and book unique accommodations around the world. . . . Whether an apartment for a night, a castle for a week, or a villa for a month, Airbnb connects people to unique travel experiences, at any price point, in more than 65,000 cities and 191 countries” (airbnb.com).

As Airbnb has grown in popularity, a wealth of data has been accumulated about many aspects of the Airbnb experience. Insideairbnb.com, an independent website, has developed “a set of tools and data that allows you to explore how Airbnb is really being used in cities around the world.” This site has gathered public data from Airbnb sites around the world, providing us with a treasure trove of data to explore!
Figure 1. A graphical visualization for Seattle on the website Inside Airbnb (insideairbnb.com/seattle).
Figure 4. A jittered scatterplot of the price of rentals with given cleanliness ratings.
Figure 6. Cancellation policies in Belltown and Alki beach.
a scatter plot of cleanliness score versus price. An immediate concern, however, is that many of the points will overlap and make the plot difficult to read. One option is to add a “jitter” to our graphic. Jittering adds a small amount of random noise to each point’s position so that overlapping points separate, which lets us see more points and gives us a sense of where the clusters are.

Notice that the sample sizes are wildly different: 80 rentals with lower cleanliness ratings and 3,165 with higher ratings. It looks like the rentals rated higher for cleanliness command higher prices, perhaps with a few outliers. However, our scatterplot dots are still so clustered that it is difficult to see what’s going on.

Let’s try side-by-side violin plots (see Figure 5). In this case, we’ve added a log scale to the price variable to make the shapes easier to see.

We see that the bulk of the rental prices for both high and low cleanliness ratings is centered near $100, but the clean rentals have some outliers with much higher prices. On the whole, the average rental prices in the two categories aren’t drastically different. When we conducted a two-sample t-test for a difference in means, we came to roughly the same conclusion (with a p-value of about 0.23), so there isn’t strong evidence of a difference in price between the two groups.

Exploration 4: You want to have an out in case your travel plans fall through. Is there a difference between the proportion of rentals with a strict cancellation policy in one neighborhood and the corresponding proportion in another neighborhood?

Let’s say that we want to stay on the waterfront in Seattle. Therefore, we’ll compare Alki beach and Belltown, both of which are on the water.

Figure 6 shows the distribution of cancellation policies in the two neighborhoods, and a statistical test on the difference between the proportions of strict policies in each neighborhood gives us a p-value much less than 0.01. So there is strong evidence that rentals in Belltown are more likely than those in Alki to have a strict cancellation policy. The safe bet seems to be Alki, but if we want to spend our social time downtown, near Belltown, we need to consider the transportation costs.

Exploration 5: You are considering inviting some friends on your trip. You want to know how much more to expect to pay for each additional bed.

Let’s start by visualizing our data in two ways: a jittered scatterplot and a contour plot showing which combinations of number of beds and rental price occur most frequently.
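The two views described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ code. The `listings` values are made-up toy numbers, not Inside Airbnb data. The helper names `jitter` and `combination_counts` and the $50 price bins are our own hypothetical choices. Jittering adds small uniform noise to the discrete bed counts so that stacked points separate. Binning prices and counting (beds, price bin) pairs gives the frequencies that a contour plot summarizes.

```python
import random
from collections import Counter

# Hypothetical toy data: (number of beds, nightly price in dollars).
# These numbers are invented for illustration and are NOT from Inside Airbnb.
listings = [
    (1, 85), (1, 95), (1, 85), (1, 70), (2, 120),
    (2, 150), (2, 120), (3, 180), (3, 210), (4, 250),
]

PRICE_BIN = 50  # assumed bin width in dollars for the frequency table


def jitter(values, width=0.2, seed=0):
    """Return each value plus uniform noise drawn from [-width, width].

    The noise separates points that would otherwise sit exactly on top of
    each other (e.g., many listings with 1 bed). A fixed seed makes the
    plot reproducible.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]


def combination_counts(pairs, price_bin=PRICE_BIN):
    """Count how often each (beds, price bin) combination occurs.

    Each price is mapped to the lower edge of its bin, so with $50 bins a
    price of $85 falls in the $50 bin. These counts are the frequencies
    that a contour plot summarizes with level curves.
    """
    return Counter((beds, (price // price_bin) * price_bin) for beds, price in pairs)


beds = [b for b, _ in listings]
jittered_beds = jitter(beds)  # x-coordinates for a jittered scatterplot
counts = combination_counts(listings)

for (b, low), n in sorted(counts.items()):
    print(f"{b} bed(s), ${low} up to ${low + PRICE_BIN}: {n} listing(s)")
```

The jittered values are meant only for display. Summary statistics, such as a regression line, should still be computed from the original, unjittered values.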
A simple linear regression tells us that the best-fit
line for our data is approximately