09B Rnorm WalmartTarget

09B-rnorm-WalmartTarget

May 4, 2021

1 9B: Walmart versus Target


[3]: # This code will load the R packages we will use

suppressPackageStartupMessages({
library(mosaic)
library(supernova)
library(Lock5withR)
})

font_size = function (size) {
  theme(text = element_text(size = size))
}

1.1 1.0 - The Data


1.1 - Walmart and Target would both love to be your one-stop shop. Which do you typically prefer?
Why?
I personally prefer shopping at Target because I’m more familiar with it so I know if
the items I need can be found there and where in the store they are, they still have a
mask mandate, and the nearest Target is closer to my home than a Walmart location
is.
1.2 - Let’s take a look at some data that compares items sold at each store. What are the cases in
the data frame?
[4]: shopping <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vRkOQ74WHgK-ZRwSRwei6Ddt02PCOvvvGNIlL2xCXmQx4w0dnhUPJeWZwJLv9Qm9akan3HNls6fF_WI/pub?gid=0&single=true&output=csv", header = TRUE)

# Take a look at the data frame
shopping

A data.frame: 174 × 10 (only the ItemDetail column and part of the rows appear in this export)
ItemDetail
<chr>
Tide Original Liquid Laundry Detergent, 100 oz
Angel Soft Toilet Paper, 18 mega rolls
Crest 3D White, Whitening Toothpaste Radiant Mint, 4.1 oz, Pack of 3
Dove Men+Care Clean Comfort Deodorant Stick, 3 oz
Downy April Fresh Fabric Softener Dryer Sheets, 240 count
Listerine Cool Mint Antiseptic Mouthwash Oral Care And Breath Freshener, 1.5 L
Neutrogena Ultra Sheer Sunscreen Lotion, SPF 55, 3 oz
OFF! Smooth & Dry Aerosol Twin Pack, 8 fl oz
Pantene Pro-V Classic Clean Shampoo, 20.1 fl oz
Philips Sonicare Essence Rechargeable Sonic Toothbrush
Secret pH Balanced Powder Fresh Invisible Solid Antiperspirant and Deodorant Twin Pa
Olay Moisturizing Face Lotion for Sensitive Skin, 6.0 fl oz
Opti-Free Pure Moist Contact Solution, 20 fl oz
Windex Original Glass Cleaner Spray, 26 fl oz
Febreze Fabric Refresher with Gain Original, 27 oz
Pledge Multi Surface Antibacterial Everyday Cleaner, 9.7 oz
Shark Navigator Freestyle Cordless Stick Vacuum
Magic Bullet NutriBullet Pro 900 Series
Keurig K-Select Single-Serve Coffee Brewer
Energizer Max AAA Batteries, 4 count
Larabar Fruit & Nut Food Bar, 5 count
Oreo Cookies, Chocolate, 19.1 oz
Coca-Cola, 12 fl oz, 12 count
Folgers Gourmet Supreme Dark Roast Ground Coffee, 24.2 oz
Jif Creamy Peanut Butter, 40 oz
Doritos Nacho Cheese Tortilla Chips Party Size!, 15 oz.
Rold Gold Tiny Twists Pretzels, 16 oz
LEGO Creator Mighty Dinosaurs 31058
Monopoly
Radio Flyer Classic Red Wagon
Brita Large 10 Cup Grand Water Pitcher with Filter - BPA Free - White
Keurig K-Classic Coffee Maker K-Cup Pod, Single Serve, Programmable, Black
Rachel Ray Cucina Porcelain Aluminum 12 Piece Cookware Set
LG NeoChef 0..9 Cu. Ft. 1000W Countertop Microwave
Cuisinart Mini-Prep Food Processor
Char-Broil Classic 2 Burner Gas Grill
Frigidaire 50-Pint Dehumidifier
Crock-Pot 6 Qt. Programmable Cook & Carry Slow Cooker with Digital Timer, Stainles
Pilot G2 Bold Point Retractable Gel Pens, 1 Dozen
S’ip by S’well Vacuum Insulated Stainless Steel Water Bottle
Neutrogena Clear Face Liquid Lotion Sunscreen with SPF 5, 3 fl. Oz
Hasbro Connect 4 Game
Valor Fitness 20lb Soft Kettlebell
Olay Regenerist Whip Facial Moisturizer - SPF 25 - 1.7oz
Neutrogena Ultra Sheer Non-Greasy Sunscreen Stick, SPF 70, 15 oz
L’Oreal Paris True Match Lumi Glotion Highlighter, Fair*
Biotrue for Soft Contact Lenses Multi-Purpose Solution, 10 oz
Fiskars PowerGear 2 Hedge Shears (23”)
JBL Flip 4 Bluetooth Portable Stereo Speaker
Playstation 4 Pro 1TB Gaming Console
The cases are the individual items, which are all sold at both stores.
One reason people may prefer one store over the other might be the prices. Many people believe
Walmart to be cheaper.
Does Target have higher prices relative to Walmart? How much more expensive is it?
1.3 - It would be helpful to know how much more expensive some item is at Target relative to
Walmart. Is there a variable like that in this data frame? How could we make one? Explain the
steps below.
Not yet. The data frame has the variables “WalmartPrice” and “TargetPrice,” but no
variable that compares them. We can make one by subtracting TargetPrice from
WalmartPrice for each item. The sign of the new variable tells us which store is cheaper
for that item (negative means Walmart is cheaper), and its size tells us the difference
between the prices in dollars.
1.4 - Follow the steps from 1.3 to create a new variable called PriceDiff within the same data
frame.
[5]: shopping$PriceDiff <- (shopping$WalmartPrice - shopping$TargetPrice)
shopping$PriceDiff

shopping

1. -0.0199999999999996 2. -0.0199999999999996 3. -0.0199999999999996 4. -0.0600000000000005


5. -0.35 6. -0.0199999999999996 7. -0.0200000000000005 8. 0 9. -0.0200000000000005
10. -0.0199999999999996 11. -0.0200000000000005 12. 0 13. -0.0299999999999976 14. -0.17
15. -0.0499999999999998 16. -0.21 17. -22.12 18. -0.989999999999995 19. 0 20. -0.38 21. -0.52
22. -0.13 23. -0.00999999999999979 24. -0.00999999999999979 25. -0.0499999999999998 26. -0.81
27. -0.51 28. 0 29. 0 30. 19.91 31. 0.05 32. 0.04 33. -0.12 34. -2.05 35. -1.61 36. -0.0800000000000001
37. -1.01 38. -0.22 39. -1.19 40. 0.02 41. -0.35 42. -1.01 43. -0.0100000000000002 44. -0.46
45. -1.07 46. -0.51 47. -0.57 48. -0.47 49. -1.11 50. -0.0800000000000001 51. -0.49 52. -0.59
53. -0.0100000000000002 54. 0 55. -1.36 56. -0.00999999999999979 57. 0.0500000000000007
58. -0.0200000000000005 59. -0.109999999999999 60. -0.0500000000000007 61. -0.359999999999999
62. -0.109999999999999 63. -0.469999999999999 64. -0.0200000000000005 65. -0.0199999999999996
66. -0.0199999999999996 67. -0.0700000000000003 68. -0.00999999999999979 69. -0.15 70. -0.11
71. -0.620000000000005 72. -0.109999999999999 73. -0.520000000000003 74. -0.0500000000000007
75. -0.520000000000003 76. -0.0500000000000007 77. -0.02 78. -0.02 79. -0.02 80. 1.09 81. -0.13
82. -2.06 83. -0.0100000000000002 84. -0.13 85. 0.0499999999999998 86. -0.0300000000000002
87. -0.0699999999999998 88. -0.15 89. -0.49 90. -0.51 91. -0.17 92. 0.01 93. -0.01
94. -0.31 95. 0.29 96. 0.45 97. -0.0100000000000002 98. -0.05 99. -0.11 100. -0.51
101. -0.41 102. -0.0600000000000001 103. -0.01 104. -0.15 105. -0.13 106. -0.13 107. -0.56
108. -0.0200000000000005 109. -3.37 110. -0.0500000000000003 111. -0.15 112. 1.76 113. -0.05
114. -0.02 115. -0.0499999999999998 116. -0.0500000000000007 117. -0.0500000000000007
118. -0.0499999999999998 119. -0.0699999999999998 120. -0.00999999999999979 121. 6.04 122. -0.77
123. 7.82 124. -0.21 125. -0.350000000000001 126. 0.43 127. -0.17 128. 0.39 129. -0.0700000000000003
130. -0.0500000000000003 131. -0.76 132. 6 133. -0.05 134. -0.0500000000000003
135. -0.0199999999999996 136. -0.0499999999999998 137. 0 138. -0.0700000000000003
139. -0.319999999999999 140. -0.549999999999997 141. -0.0199999999999996 142. 0.43 143. -0.62
144. -0.0199999999999996 145. 0 146. 0 147. 30.99 148. 2.00999999999999 149. -0.0700000000000003

150. 20 151. -0.990000000000009 152. 10 153. -1 154. -3 155. 6.49 156. 0 157. 0 158. -4.02
159. -0.0199999999999996 160. 2.78 161. -0.00999999999999979 162. -0.170000000000002
163. -0.039999999999992 164. 7.95999999999998 165. -2 166. 0 167. -33.43 168. -42.38 169. 0
170. -0.00999999999999979 171. 0 172. 2.77 173. 5.59 174. -0.709999999999999
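Values such as -0.0199999999999996 in the output above are floating-point artifacts of subtracting decimal prices; the intended difference is -0.02. A minimal sketch of rounding the differences to cents, using made-up prices in a hypothetical `toy` data frame (these are not rows from `shopping`, and `PriceDiffCents` is a name chosen only for this illustration):

```r
# Hypothetical prices for illustration only; not taken from the shopping data
toy <- data.frame(
  WalmartPrice = c(11.97, 4.44, 8.94),
  TargetPrice  = c(11.99, 4.49, 8.99)
)

# Raw subtraction can carry tiny binary floating-point errors
toy$PriceDiff <- toy$WalmartPrice - toy$TargetPrice

# Rounding to 2 decimal places expresses each difference in whole cents
toy$PriceDiffCents <- round(toy$PriceDiff, 2)

print(toy$PriceDiffCents)
```

Rounding only changes how the values display and compare; the mean and standard deviation used later would be essentially the same either way.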

A data.frame: 174 × 11 (the same 174 items as in the listing above, now with the added PriceDiff column)
1.5 - Make some predictions:
If you think Walmart is cheaper than Target, around what number do you think would be the
average for PriceDiff?
If you think Target is cheaper than Walmart, around what number do you think would be the
average for PriceDiff?
(Hint: One of the above has to be a negative number. Figure out which one and why.)
Because PriceDiff is WalmartPrice minus TargetPrice, the average would be negative if
Walmart is cheaper and positive if Target is cheaper. Since Walmart is generally cheaper
than Target, I predict that the average PriceDiff is around -5.00, because several items
have a large price difference even though most of them have a small one.

1.2 2.0 - Explore Variation


2.1 - Explore the variation in the price differences using a histogram as your visualization. What
do you notice?
[6]: gf_histogram(~PriceDiff, data = shopping)

According to the visualization, the vast majority of the items have no price difference
at all, or a difference of less than one dollar, so they land in the bin at zero. A few
other items have price differences between about -5 and +11 dollars, and a handful fall
even further out. The largest negative difference is about -42 dollars, and the largest
positive difference is about +31 dollars.
2.2 - From this data does it seem like Walmart is fundamentally cheaper than Target? Explain why
or why not.
I think Walmart is at least a bit cheaper than Target based on the data from these
specific items because the vast majority of their price differences show that Walmart
items are often a few cents cheaper than the same items that can also be found at
Target.

1.3 3.0 - Modeling Variation


Every day, prices are changing at Walmart and Target. Every day they are trying to beat each other.
There are also so many more products at each of these stores than what is included in this small
data set of 174 things.
What we want to know is not just about the price differences of these particular items, but the
DGP that produced these price differences: What is it like? That’s the question we will be focusing
on.
3.1 - Hm, is it possible that the two stores do NOT really differ in their prices? Could that general
idea be true even if some items are more expensive at Walmart and some items are more expensive
at Target?
I think this is possible. The data show that some of the items are actually cheaper
at Target than they are at Walmart, so the differences in each direction could
offset each other and leave very little difference between the two stores’ average
prices.
3.2 - If we assumed that the two stores do not really differ in price, what should we expect the
average difference between the prices to be?
If this was true, what would be the best β0 to represent the Data Generating Process? Write that
number in the GLM below.
Yi = 0 + ϵi
(Why are we using β0 and ϵi here?)
I think we are using β0 to represent our population mean and ϵi for our error.
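To make the roles of β0 and ϵi concrete, here is a small sketch that splits each value into a model part and an error part. The five price differences are invented for illustration, and `b0`, `error`, and `reconstructed` are names chosen only for this example:

```r
# Hypothetical price differences, for illustration only
y <- c(-0.02, -0.35, 0.05, -1.01, 0.00)

# The value proposed for beta_0 under "the stores do not really differ"
b0 <- 0

# Whatever the model does not explain for each item (the epsilon_i part)
error <- y - b0

# Each observation is exactly the model value plus its own error
reconstructed <- b0 + error

print(data.frame(y, b0, error, reconstructed))
```

With β0 set to 0, all of each price difference is treated as error, which is what "no real difference between the stores" amounts to in this model.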
3.3 - Take a look at the first few rows of shopping in the code cell below. Notice that Walmart
prices usually end in .97 and Target prices end in .99. Could the true difference in prices be just
two cents?

If that was true, what would be the best β0 to represent the Data Generating Process? Write that
number in the GLM below.
Yi = −0.02 + ϵi
(Why are we using β0 and ϵi here?)

[7]: head(select(shopping, ItemDetail, WalmartPrice, TargetPrice, PriceDiff))

A data.frame: 6 × 4 (the columns after ItemDetail are cut off in this export)
ItemDetail Wa
<chr> <d
1 Tide Original Liquid Laundry Detergent, 100 oz 11.
2 Angel Soft Toilet Paper, 18 mega rolls 14.
3 Crest 3D White, Whitening Toothpaste Radiant Mint, 4.1 oz, Pack of 3 9.9
4 Dove Men+Care Clean Comfort Deodorant Stick, 3 oz 4.4
5 Downy April Fresh Fabric Softener Dryer Sheets, 240 count 8.9
6 Listerine Cool Mint Antiseptic Mouthwash Oral Care And Breath Freshener, 1.5 L 6.5
3.4 - Does this model of the DGP mean that everything sold at Walmart will be exactly
two cents off of Target’s price? Which part of the GLM shows you that there is still some variation
in the price differences?
The error term, ϵi, shows us that there is still variation among the price differences.
The model does not mean that everything at Walmart will always be two cents
cheaper. Instead, it means that -0.02 is the average price difference in the DGP, and
each item deviates from that average by its own ϵi.
3.5 - Take a look at this histogram from 2.1 again – is it possible that these price differences came
from a DGP of normally distributed price differences?
I think these price differences were not generated by a normal distribution because most
of the price differences are not too far off from zero, while there are also a few outliers.

1.4 4.0 - Modeling Variation


4.1 - We cannot ever really know β0 . But we can estimate it at least. It’s better than guessing,
right? What is our best estimate of the DGP from our sample?
Yi = −0.075 + ϵi
[8]: favstats(~PriceDiff, data = shopping)

A data.frame: 1 × 9
min     Q1       median  Q3     max    mean         sd        n      missing
<dbl>   <dbl>    <dbl>   <dbl>  <dbl>  <dbl>        <dbl>     <int>  <int>
-42.38  -0.3575  -0.05   -0.01  30.99  -0.07534483  5.702052  174    0
4.2 - What does this number mean?
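This number is our estimate b0: the average price difference (WalmartPrice minus
TargetPrice) in our sample of 174 items. On average, an item in the sample was about
7.5 cents cheaper at Walmart.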
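The estimate of β0 in this simple model is just the sample mean. A short sketch showing that fitting an intercept-only model with base R's `lm()` gives the same number as `mean()`. The values in `y` are invented, and `empty_model` is a name chosen for this example:

```r
# Hypothetical price differences, for illustration only
y <- c(-0.02, -0.35, 0.05, -1.01, 0.00)

# Fit a model with only an intercept (no explanatory variables)
empty_model <- lm(y ~ 1)

# The single coefficient is the estimate b0
b0 <- unname(coef(empty_model)[1])

print(b0)
print(mean(y))
```

On the real data the same idea would apply to `shopping$PriceDiff`, where the mean reported by `favstats()` is about -0.075.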

1.5 5.0 - Simulating the DGP


Since statistics and data science is all about what’s possible, let’s consider this:

We are trying to imagine that the world of all the stuff you could buy at Target or
Walmart (“the population”) is similar to the price differences reflected in our sample
of 174 items. How much could our b0 vary if we had taken a bunch of different samples,
over and over again?
5.1 - Run the code below a few times and observe how it works. Why does one number stay the
same and the other number change? What are these two numbers?
[9]: mean(shopping$PriceDiff)

PriceDiff.stats <- favstats(~ PriceDiff, data = shopping)


mean(rnorm(174, PriceDiff.stats$mean, PriceDiff.stats$sd))

-0.0753448275862069
0.215123946496186
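The top number is the mean of PriceDiff in our actual sample. It stays the same because
the data do not change. The bottom number is the mean of 174 values randomly generated
by rnorm() from a normal distribution with our sample’s mean and standard deviation. It
changes on every run because rnorm() draws new random values each time.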
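A small sketch of why one number is fixed and the other moves. It uses invented values in a hypothetical vector `fixed_values` in place of `shopping$PriceDiff`, and the seed is set only so the sketch is repeatable:

```r
# Hypothetical data standing in for shopping$PriceDiff
fixed_values <- c(-0.02, -0.35, 0.05, -1.01, 0.00)
m <- mean(fixed_values)
s <- sd(fixed_values)

# The mean of fixed data is the same no matter how often we compute it
first_run  <- mean(fixed_values)
second_run <- mean(fixed_values)

# Each call to rnorm() draws a fresh random sample, so the means differ
set.seed(42)  # arbitrary seed, used only for repeatability
sim1 <- mean(rnorm(174, m, s))
sim2 <- mean(rnorm(174, m, s))

print(c(first_run, second_run))
print(c(sim1, sim2))
```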
5.2 - The code below will generate 10 b0s. What do you notice about the generated b0s? What are
they typically like? How big can they get? How small can they get? If you generated 1000 of
these b0s, what would the resulting distribution look like?
[10]: do(10) * mean(rnorm(174, PriceDiff.stats$mean, PriceDiff.stats$sd))

A do.data.frame: 10 × 1
mean
<dbl>
-0.2240535
0.1031303
0.7385611
0.3638840
-0.6222328
0.3527681
0.1093849
0.2985917
0.1484727
-0.2785639
We notice that the b0s are all decimal values between -1 and 1, some negative and some
positive. If we generated 1,000 of them, the resulting distribution would have a roughly
normal shape, centered near the mean we simulated from (about -0.075, which is close
to zero).
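How far the simulated b0s spread can be estimated with the standard error, sd / sqrt(n). A sketch that compares that formula with the spread of 1,000 simulated means, using the mean and sd reported by `favstats()` above. It uses base R's `replicate()` instead of mosaic's `do()` so that it runs on its own; the seed and the name `sim_means` are choices made for this example:

```r
# Summary values copied from the favstats() output for PriceDiff
n           <- 174
sample_mean <- -0.07534483
sample_sd   <- 5.702052

# Standard error of the mean: the expected spread of sample means
se <- sample_sd / sqrt(n)

# Simulate 1000 sample means from a normal DGP with those parameters
set.seed(1)  # arbitrary seed, used only for repeatability
sim_means <- replicate(1000, mean(rnorm(n, sample_mean, sample_sd)))

print(se)             # spread predicted by the formula
print(sd(sim_means))  # spread actually seen in the simulation
```

The two numbers should be close, which is why almost all simulated b0s fall within about one dollar of the center.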
5.3 - Modify the code below to generate 1000 means. Then create a visualization of the resulting
object sdob.
[11]: # Modify this code to simulate 1000 samples, and save it into an object called sdob
sdob <- do(1000) * mean(rnorm(174, PriceDiff.stats$mean, PriceDiff.stats$sd))

# This gives us a peek at what is inside sdob
head(sdob)

# Create a visualization of sdob
gf_histogram(~ mean, data = sdob)

A do.data.frame: 6 × 1
mean
<dbl>
1 -0.14535312
2 -0.04590286
3 0.01131302
4 -0.62639779
5 -0.13451124
6 0.22618730

5.4 - This is a sampling distribution created from the assumption that the DGP is basically
like our sample. What is the DGP represented by rnorm(174, PriceDiff.stats$mean,
PriceDiff.stats$sd)? Put it in the GLM equation below.
Yi = −0.075 + ϵi

5.5 - We have saved our sample b0 in an R object called sampleb0. In the histogram below we
have colored in the middle 95% of means in one color to show that they are “likely” to come from
this DGP. Is our sample (it’s going to appear as a green line) going to be in the “likely”
zone?
Make a prediction, then run the code below.
We predict that our sample b0 will be in the “likely” zone, because this sampling
distribution is centered at our own sample mean and most of the simulated means are
in between -1 and 1.
[12]: # This code will help us color the middle "likely" .95 of samples in
# a different color than the "unlikely" .05 of samples
sampleb0 <- mean(shopping$PriceDiff)
sdob <- do(1000) * mean(rnorm(174, PriceDiff.stats$mean, PriceDiff.stats$sd))
sdob <- arrange(sdob, mean)
sdob$middle.95 <- c(rep("unlikely", 25),rep("likely", 950), rep("unlikely", 25))

# This will create a visualization that includes our sampleb0


gf_histogram(~ mean, data = sdob, binwidth = .05, fill = ~middle.95) %>%
gf_vline(xintercept = sampleb0, color = "green4") %>%
gf_refine(scale_fill_manual(values = c("dodgerblue", "coral")))

Warning message:
“geom_vline(): Ignoring `mapping` because `xintercept` was provided.”
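The cell above marks the "likely" zone by sorting the 1,000 means and labeling the lowest 25 and highest 25 as unlikely. An equivalent way to look at it, sketched here in base R, is to compute the 2.5% and 97.5% cutoffs directly with `quantile()` and check whether the sample b0 falls between them. The summary values come from the `favstats()` output; the seed and the names `cutoffs`, `zone`, and `in_likely` are choices made for this example:

```r
# Summary values copied from the favstats() output for PriceDiff
n           <- 174
sample_mean <- -0.07534483
sample_sd   <- 5.702052

# Simulate 1000 sample means from a DGP that looks like our sample
set.seed(2)  # arbitrary seed, used only for repeatability
sim_means <- replicate(1000, mean(rnorm(n, sample_mean, sample_sd)))

# Cutoffs that leave 2.5% of simulated means in each tail
cutoffs <- unname(quantile(sim_means, c(0.025, 0.975)))

# Label each simulated mean by the zone it falls in
zone <- ifelse(sim_means >= cutoffs[1] & sim_means <= cutoffs[2],
               "likely", "unlikely")

# Check whether our own sample b0 lands between the cutoffs
in_likely <- sample_mean >= cutoffs[1] && sample_mean <= cutoffs[2]

print(cutoffs)
print(table(zone))
print(in_likely)
```

Because this sampling distribution is centered at the sample mean itself, the sample b0 sits in the middle of the likely zone.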

1.6 6.0 - What about other DGPs?
That’s nice and all but we aren’t so sure we should assume the DGP is just like our sample! We
want to investigate other possible DGPs (like maybe the true difference is just two cents or even
one dollar).
6.1 - How might we modify the code below to simulate this DGP: Yi = .02 + ϵi ? How about
Yi = 0 + ϵi ? Or Yi = 1 + ϵi ?
Will our sample b0 end up in the “likely” zone from these DGPs? Which ones?
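Our sample b0 (about -0.075) should end up in the “likely” zone for the DGPs with a β0 of
.02 or 0, because both are very close to it. It should end up in the “unlikely” zone for
a β0 of 1.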
[18]: my.sdob <- do(1000) * mean(rnorm(174, .02, PriceDiff.stats$sd))

# This will help us color the sdob's likely and unlikely zones
my.sdob <- arrange(my.sdob, mean)
my.sdob$middle.95 <- c(rep("unlikely", 25),rep("likely", 950), rep("unlikely", 25))

# This will create a visualization that includes our sampleb0
gf_histogram(~ mean, data = my.sdob, binwidth = .05, fill = ~middle.95) %>%
gf_vline(xintercept = sampleb0, color = "green4") %>%
gf_refine(scale_fill_manual(values = c("dodgerblue", "coral")))

Warning message:
“geom_vline(): Ignoring `mapping` because `xintercept` was provided.”

6.2 - Why is our sample b0 in the unlikely zone when the true DGP has a β0 = 1? Why is it on the
extreme low end of this sampling distribution?
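The sampling distribution is roughly normal and centered at the β0 we simulated from,
which is 1 here. The simulated means mostly fall within about 0.85 of that center,
because the standard error is about 5.70 / √174 ≈ 0.43. Our sample b0 of about -0.075 is
more than 1 below the center, so it lands below the low end of the “likely” zone.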
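A quick check of where the sample b0 sits relative to a DGP with β0 = 1, without running a simulation. This is a sketch that assumes the sampling distribution is approximately normal and uses 1.96 standard errors as the edge of the middle 95%; the names `z`, `lower`, and `upper` are chosen for this example:

```r
# Summary values copied from the favstats() output for PriceDiff
n         <- 174
b0_sample <- -0.07534483
sample_sd <- 5.702052

# The DGP being considered
beta0 <- 1

# Standard error of the mean under this DGP
se <- sample_sd / sqrt(n)

# How many standard errors our sample b0 is from beta0
z <- (b0_sample - beta0) / se

# Approximate edges of the "likely" middle 95% (normal approximation)
lower <- beta0 - 1.96 * se
upper <- beta0 + 1.96 * se

print(z)
print(c(lower, upper))
```

A `z` beyond -1.96 means the sample b0 is below `lower`, which matches the green line falling on the low side of the histogram.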
6.3 - Is it possible that our sample b0 happens to be a bit high? Imagine a DGP where our sample
would end up on the extreme high end of the sampling distribution. What are some β0 s where our
sample would end up unlikely because it is too high?

Write your example here in GLM form.
Yi = −1 + ϵi
Then test it out by modifying the code below.
[19]: my.sdob <- do(1000) * mean(rnorm(174, -1, PriceDiff.stats$sd))

# This will help us color the sdob's likely and unlikely zones
my.sdob <- arrange(my.sdob, mean)
my.sdob$middle.95 <- c(rep("unlikely", 25),rep("likely", 950), rep("unlikely", 25))

# This will create a visualization that includes our sampleb0


gf_histogram(~ mean, data = my.sdob, binwidth = .05, fill = ~middle.95) %>%
gf_vline(xintercept = sampleb0, color = "green4") %>%
gf_refine(scale_fill_manual(values = c("dodgerblue", "coral")))

Warning message:
“geom_vline(): Ignoring `mapping` because `xintercept` was provided.”

6.4 - If we simulated a DGP where the true β0 was actually 0, is it possible to generate a sample
that would have the same mean as ours? Is it likely?
Yes, it is possible, and it is likely. Our sample b0 of about -0.075 is very close to
zero, so it would fall near the middle of the “likely” zone of a sampling distribution
centered at 0.
6.5 - What does this data say about the true price differences at Walmart and Target? What could
the true price difference actually be?
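These simulations suggest that the true average price difference is small. DGPs with a
β0 of 0 or .02 could easily have produced our sample, while a β0 as far away as 1 is
unlikely. So the true average difference is probably less than a dollar in either
direction, and we cannot rule out that there is no difference at all.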
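One way to summarize which β0s are plausible is to ask which values would keep our sample b0 inside their "likely" zone. Under a normal approximation, that range is roughly the sample b0 plus or minus 1.96 standard errors. A sketch using the `favstats()` summary values, with `lower` and `upper` as names chosen for this example:

```r
# Summary values copied from the favstats() output for PriceDiff
n         <- 174
b0_sample <- -0.07534483
sample_sd <- 5.702052

# Standard error of the mean
se <- sample_sd / sqrt(n)

# Range of beta_0 values for which our sample b0 would count as "likely"
lower <- b0_sample - 1.96 * se
upper <- b0_sample + 1.96 * se

print(c(lower, upper))

# Check a few candidate DGPs against that range
candidates <- c(-1, 0, 0.02, 1)
print(data.frame(beta0 = candidates,
                 plausible = candidates >= lower & candidates <= upper))
```

The range includes 0, so these data alone do not show that either store is cheaper on average.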
6.6 - Out of curiosity, how many items are cheaper at Walmart? How many are cheaper at Target?
Does that change your view about which store is “cheaper”? How is this measure of “cheapness”
different from PriceDiff?
[25]: # PriceDiff is a column of shopping, not a data frame of its own
# sign() gives -1 if the item is cheaper at Walmart, 0 if the prices match,
# and 1 if the item is cheaper at Target
tally(~ sign(PriceDiff), data = shopping)

1.7 Don’t forget: If you “Close and Halt” before you go, the server won’t be
so slow!
