09B Rnorm WalmartTarget
May 4, 2021
suppressPackageStartupMessages({
library(mosaic)
library(supernova)
library(Lock5withR)
})
# Take a look at the data frame
shopping
A data.frame: 174 × 10
ItemDetail
<chr>
Tide Original Liquid Laundry Detergent, 100 oz
Angel Soft Toilet Paper, 18 mega rolls
Crest 3D White, Whitening Toothpaste Radiant Mint, 4.1 oz, Pack of 3
Dove Men+Care Clean Comfort Deodorant Stick, 3 oz
Downy April Fresh Fabric Softener Dryer Sheets, 240 count
Listerine Cool Mint Antiseptic Mouthwash Oral Care And Breath Freshener, 1.5 L
Neutrogena Ultra Sheer Sunscreen Lotion, SPF 55, 3 oz
OFF! Smooth & Dry Aerosol Twin Pack, 8 fl oz
Pantene Pro-V Classic Clean Shampoo, 20.1 fl oz
Philips Sonicare Essence Rechargeable Sonic Toothbrush
Secret pH Balanced Powder Fresh Invisible Solid Antiperspirant and Deodorant Twin Pa
Olay Moisturizing Face Lotion for Sensitive Skin, 6.0 fl oz
Opti-Free Pure Moist Contact Solution, 20 fl oz
Windex Original Glass Cleaner Spray, 26 fl oz
Febreze Fabric Refresher with Gain Original, 27 oz
Pledge Multi Surface Antibacterial Everyday Cleaner, 9.7 oz
Shark Navigator Freestyle Cordless Stick Vacuum
Magic Bullet NutriBullet Pro 900 Series
Keurig K-Select Single-Serve Coffee Brewer
Energizer Max AAA Batteries, 4 count
Larabar Fruit & Nut Food Bar, 5 count
Oreo Cookies, Chocolate, 19.1 oz
Coca-Cola, 12 fl oz, 12 count
Folgers Gourmet Supreme Dark Roast Ground Coffee, 24.2 oz
Jif Creamy Peanut Butter, 40 oz
Doritos Nacho Cheese Tortilla Chips Party Size!, 15 oz.
Rold Gold Tiny Twists Pretzels, 16 oz
LEGO Creator Mighty Dinosaurs 31058
Monopoly
Radio Flyer Classic Red Wagon
⋮
Brita Large 10 Cup Grand Water Pitcher with Filter - BPA Free - White
Keurig K-Classic Coffee Maker K-Cup Pod, Single Serve, Programmable, Black
Rachel Ray Cucina Porcelain Aluminum 12 Piece Cookware Set
LG NeoChef 0..9 Cu. Ft. 1000W Countertop Microwave
Cuisinart Mini-Prep Food Processor
Char-Broil Classic 2 Burner Gas Grill
Frigidaire 50-Pint Dehumidifier
Crock-Pot 6 Qt. Programmable Cook & Carry Slow Cooker with Digital Timer, Stainles
Pilot G2 Bold Point Retractable Gel Pens, 1 Dozen
S’ip by S’well Vacuum Insulated Stainless Steel Water Bottle
Neutrogena Clear Face Liquid Lotion Sunscreen with SPF 5, 3 fl. Oz
Hasbro Connect 4 Game
Valor Fitness 20lb Soft Kettlebell
Olay Regenerist Whip Facial Moisturizer - SPF 25 - 1.7oz
Neutrogena Ultra Sheer Non-Greasy Sunscreen Stick, SPF 70, 15 oz
L’Oreal Paris True Match Lumi Glotion Highlighter, Fair*
Biotrue for Soft Contact Lenses Multi-Purpose Solution, 10 oz
Fiskars PowerGear 2 Hedge Shears (23”)
JBL Flip 4 Bluetooth Portable Stereo Speaker
Playstation 4 Pro 1TB Gaming Console
The cases are the individual items, which are all sold at both stores.
One reason people may prefer one store over the other is price. Many people believe
Walmart is cheaper.
Does Target have higher prices relative to Walmart? How much more expensive is it?
1.3 - It would be helpful to know how much more expensive some item is at Target relative to
Walmart. Is there a variable like that in this data frame? How could we make one? Explain the
steps below.
Since we already have the variables “WalmartPrice” and “TargetPrice,” we can combine
them for each item by subtracting one price from the other. The result is a new variable
whose sign tells us which store is cheaper for that item and whose size tells us how large
the price difference is.
1.4 - Follow the steps from 1.3 to create a new variable called PriceDiff within the same data
frame.
[5]: # PriceDiff > 0: Walmart costs more; PriceDiff < 0: Walmart is cheaper
shopping$PriceDiff <- shopping$WalmartPrice - shopping$TargetPrice
shopping$PriceDiff
shopping
150. 20 151. -0.990000000000009 152. 10 153. -1 154. -3 155. 6.49 156. 0 157. 0 158. -4.02
159. -0.0199999999999996 160. 2.78 161. -0.00999999999999979 162. -0.170000000000002
163. -0.039999999999992 164. 7.95999999999998 165. -2 166. 0 167. -33.43 168. -42.38 169. 0
170. -0.00999999999999979 171. 0 172. 2.77 173. 5.59 174. -0.709999999999999
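Long values such as -0.990000000000009 are floating-point noise: decimal prices cannot all be stored exactly in binary, so subtracting them can leave a tiny error. The sketch below uses made-up prices (not the shopping data) to show how base R's round() trims the differences to whole cents.

```r
# Made-up prices for three items; these are NOT values from the shopping data
walmart <- c(4.00, 23.99, 11.97)
target  <- c(4.99, 23.99, 11.99)

# Raw differences can carry tiny floating-point errors
price_diff <- walmart - target

# Rounding to 2 decimal places keeps whole cents only
price_diff_cents <- round(price_diff, 2)
print(price_diff_cents)
```

Applied to the notebook's data, the same idea would be shopping$PriceDiff <- round(shopping$WalmartPrice - shopping$TargetPrice, 2); rounding removes only the display noise, not any cent-level information.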
A data.frame: 174 × 11 (same ItemDetail listing as above; the eleventh column is the new PriceDiff)
1.5 - Make some predictions:
If you think Walmart is cheaper than Target, around what number do you think would be the
average for PriceDiff?
If you think Target is cheaper than Walmart, around what number do you think would be the
average for PriceDiff?
(Hint: One of the above has to be a negative number. Figure out which one and why.)
Because PriceDiff is WalmartPrice minus TargetPrice, the negative average belongs to the
first case: if Walmart is cheaper, Target's higher prices are subtracted from Walmart's
lower ones, giving negative differences. Since Walmart is generally considered cheaper
than Target, I predict that the average PriceDiff is around -5.00, because several items
have large price differences even though most of the differences are small.
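The sign convention can be checked with a tiny worked example. The three items and their prices below are invented for illustration; only the subtraction order, Walmart minus Target, matches PriceDiff.

```r
# Invented prices for three items (illustration only)
walmart <- c(10.00, 5.00, 20.00)
target  <- c(12.00, 5.00, 21.00)

# Same order as PriceDiff: Walmart price minus Target price
price_diff <- walmart - target   # -2, 0, -1

# Walmart is cheaper or tied on every item, so the mean is negative
mean(price_diff)                 # -1
```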
According to the visualization, the vast majority of the items have no price difference at
all, or a difference of less than one dollar, which places them in the bar at zero. A few
other items have price differences between about -5 and +11 dollars, and some fall even
further out. The largest negative difference is about -42 dollars, and the largest positive
difference is around +30 dollars.
2.2 - From this data does it seem like Walmart is fundamentally cheaper than Target? Explain why
or why not.
I think Walmart is at least slightly cheaper than Target for these specific items, because
many of the price differences show Walmart items costing a few cents less than the same
items at Target.
If that was true, what would be the best β0 to represent the Data Generating Process? Write that
number in the GLM below.
$Y_i = 0.02 + \epsilon_i$
(Why are we using β0 and ϵi here?)
A data.frame: 6 × 4
  ItemDetail Wa
  <chr> <d
1 Tide Original Liquid Laundry Detergent, 100 oz 11.
2 Angel Soft Toilet Paper, 18 mega rolls 14.
3 Crest 3D White, Whitening Toothpaste Radiant Mint, 4.1 oz, Pack of 3 9.9
4 Dove Men+Care Clean Comfort Deodorant Stick, 3 oz 4.4
5 Downy April Fresh Fabric Softener Dryer Sheets, 240 count 8.9
6 Listerine Cool Mint Antiseptic Mouthwash Oral Care And Breath Freshener, 1.5 L 6.5
3.4 - Does this model of the DGP mean that everything sold at Walmart will be exactly two
cents off of Target’s price? Which part of the GLM shows you that there is still some variation
in the price differences?
The error term, ϵi, shows that there is still variation among the price differences: the
model does not say that every item at Walmart will differ from Target's price by exactly
two cents. Instead, β0 = 0.02 is the average price difference in the DGP, and each item's
ϵi is how far its own price difference falls from that average.
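A short simulation can make the roles of β0 and ϵi concrete. This sketch assumes an error standard deviation of 4 dollars, a made-up value chosen only for illustration, and draws 174 price differences from Yi = 0.02 + ϵi.

```r
set.seed(1)  # fixed seed so the example gives the same numbers each run

b0    <- 0.02  # beta_0 in Y_i = 0.02 + e_i
sigma <- 4     # ASSUMED error standard deviation (illustration only)
n     <- 174   # same number of items as the shopping data

errors <- rnorm(n, mean = 0, sd = sigma)  # e_i: each item's own deviation
y      <- b0 + errors                     # Y_i = b0 + e_i

head(round(y, 2))  # individual differences are spread out, not all 0.02
mean(y)            # the average is near b0 but not exactly equal to it
```

The mean of y equals b0 plus the mean of the errors, which is why the average wanders a little around 0.02 from one simulated sample to the next.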
3.5 - Take a look at this histogram from 2.1 again – is it possible that these price differences came
from a DGP of normally distributed price differences?
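I think these price differences were probably not generated by a normal distribution: most
of the price differences are piled up very close to zero, while a few items are extreme
outliers, so the shape is more peaked in the middle and more spread out in the tails than a
normal distribution.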
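One rough way to compare a distribution's shape with a normal one: in normal data, about 68% of values lie within one standard deviation of the mean, and data piled up at zero with a few large outliers usually exceed that share. The sketch below uses a made-up vector, not the real PriceDiff values, and is an informal check rather than a formal normality test; prop_within_1sd() is a hypothetical helper.

```r
# Hypothetical helper: share of values within one standard deviation of the mean
prop_within_1sd <- function(x) {
  mean(abs(x - mean(x)) <= sd(x))
}

# Made-up price differences: mostly zeros plus two large outliers
spiky <- c(0, 0, 0, 0, 0, 0, 0, 0, 10, -10)
prop_within_1sd(spiky)  # 0.8: more concentrated near the center than a normal

# Simulated normal data for comparison
set.seed(2)
prop_within_1sd(rnorm(10000))  # close to 0.68
```

With the notebook's data, the same check would be prop_within_1sd(shopping$PriceDiff).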
We are trying to imagine that the world of all the stuff you could buy at Target or
Walmart (“the population”) is similar to the price differences reflected in our sample
of 174 items. How much could our b0 vary if we had taken a bunch of different samples,
over and over again?
5.1 - Run the code below a few times and observe how it works. Why does one number stay the
same and the other number change? What are these two numbers?
[9]: mean(shopping$PriceDiff)
-0.0753448275862069
0.215123946496186
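The top number is the mean of PriceDiff in our sample (our b0): on average, Walmart prices
are about 7.5 cents lower than Target prices for these items. It stays the same because our
data never change. The bottom number is the mean of a randomly simulated sample, so it
changes every time the code runs; it is one possible b0 from a DGP like our sample, not the
mean of the population.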
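Why one number repeats and the other does not can be seen with a small made-up sample: rnorm() draws new random numbers on every run, while the mean of fixed data never changes, and set.seed() makes the random draw repeatable. The vector observed is hypothetical and stands in for shopping$PriceDiff.

```r
# Hypothetical sample standing in for shopping$PriceDiff
observed <- c(-2, 0, 0, 1.5, -0.5)

mean(observed)  # -0.2 on every run, because the data never change

# One simulated sample of 174 from a normal DGP shaped like `observed`;
# this mean is different each time the line runs
mean(rnorm(174, mean = mean(observed), sd = sd(observed)))

# Resetting the seed reproduces the exact same "random" mean
set.seed(42); first  <- mean(rnorm(174, mean = mean(observed), sd = sd(observed)))
set.seed(42); second <- mean(rnorm(174, mean = mean(observed), sd = sd(observed)))
identical(first, second)  # TRUE
```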
5.2 - The code below will generate 10 b0s. What do you notice about the generated b0s? What are
they typically like? How big can they get? How small can they get? If you generated 1000 of
these b0s, what would the resulting distribution look like?
[10]: do(10) * mean(rnorm(174, PriceDiff.stats$mean, PriceDiff.stats$sd))
A do.data.frame: 10 × 1
mean
<dbl>
-0.2240535
0.1031303
0.7385611
0.3638840
-0.6222328
0.3527681
0.1093849
0.2985917
0.1484727
-0.2785639
We notice that the b0s are all decimals between -1 and 1, some positive and some negative.
If we generated 1,000 of them, the resulting distribution would have a roughly normal shape,
centered near our sample mean (about -0.075), which is close to zero.
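How far the simulated b0s spread is predictable: for samples of n independent values with standard deviation σ, the sample means have standard deviation σ/√n (the standard error). This base-R sketch uses replicate() in place of mosaic's do() and assumes σ = 4, an illustrative value rather than the real PriceDiff.stats$sd; the center of -0.075 is roughly the sample mean printed above.

```r
set.seed(3)

n     <- 174     # items per sample, as in the shopping data
b0    <- -0.075  # roughly the sample mean of PriceDiff printed earlier
sigma <- 4       # ASSUMED standard deviation (stands in for PriceDiff.stats$sd)

# 1000 simulated sample means, analogous to do(1000) * mean(rnorm(...))
sim_means <- replicate(1000, mean(rnorm(n, mean = b0, sd = sigma)))

mean(sim_means)   # center of the sampling distribution: close to b0
sd(sim_means)     # spread of the simulated b0s
sigma / sqrt(n)   # standard error formula, about 0.30 here
```

A larger σ or a smaller n widens the sampling distribution; with the real data, sigma would be PriceDiff.stats$sd.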
5.3 - Modify the code below to generate 1000 means. Then create a visualization of the resulting
object sdob.
[11]: # Modify this code to simulate 1000 samples, and save it into an object called sdob
sdob <- do(1000) * mean(rnorm(174, PriceDiff.stats$mean, PriceDiff.stats$sd))
head(sdob)
A do.data.frame: 6 × 1
  mean
  <dbl>
1 -0.14535312
2 -0.04590286
3 0.01131302
4 -0.62639779
5 -0.13451124
6 0.22618730
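5.4 - This is a sampling distribution created from the assumption that the DGP is basically
like our sample. What is the DGP represented by rnorm(174, PriceDiff.stats$mean,
PriceDiff.stats$sd)? Put it in the GLM equation below.
$Y_i = -0.075 + \epsilon_i$ (β0 is PriceDiff.stats$mean, our sample mean of about -0.075; ϵi is
normally distributed with mean 0 and standard deviation PriceDiff.stats$sd)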
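The rnorm() call and the GLM describe the same DGP: drawing with rnorm(n, mean = b0, sd = sigma) gives the same numbers as adding b0 to errors drawn from rnorm(n, mean = 0, sd = sigma), as long as the seed is the same. In this sketch b0 is the sample mean printed earlier, rounded, and sigma = 4 is an assumed placeholder for PriceDiff.stats$sd.

```r
b0    <- -0.075  # beta_0: roughly the sample mean of PriceDiff
sigma <- 4       # ASSUMED error SD; a placeholder for PriceDiff.stats$sd

set.seed(10)
direct <- rnorm(174, mean = b0, sd = sigma)        # the notebook's style of call

set.seed(10)
glm_form <- b0 + rnorm(174, mean = 0, sd = sigma)  # Y_i = b0 + e_i, e_i ~ N(0, sigma)

all.equal(direct, glm_form)  # TRUE: same DGP written two ways
```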
5.5 - We have saved our sample b0 in an R object called sampleb0. In the histogram below we
have colored in the middle 95% of means in one color to show that they are “likely” to come from
this DGP. Is our sample (it’s going to appear as a green line) going to be in the “likely”
zone?
Make a prediction, then run the code below.
We predict that our sample b0 will be in the “likely” zone, because this DGP is centered on
our own sample mean and most of the simulated means fall between -1 and 1.
[12]: # This code will help us color the middle "likely" .95 of samples in
# a different color than the "unlikely" .05 of samples
sampleb0 <- mean(shopping$PriceDiff)
sdob <- do(1000) * mean(rnorm(174, PriceDiff.stats$mean, PriceDiff.stats$sd))
sdob <- arrange(sdob, mean)
sdob$middle.95 <- c(rep("unlikely", 25),rep("likely", 950), rep("unlikely", 25))
Warning message:
“geom_vline(): Ignoring `mapping` because `xintercept` was provided.”
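The rep() labels work because arrange() has already sorted the 1000 means: positions 1 to 25 and 976 to 1000 are the extreme 2.5% on each side. A base-R sketch of the same labeling, using standard normal draws as an assumed stand-in for sdob, with quantile() cutoffs for comparison:

```r
set.seed(4)
sim_means <- sort(rnorm(1000))  # stand-in for the sorted means in sdob

# Label by position after sorting, as the notebook does
middle_95 <- c(rep("unlikely", 25), rep("likely", 950), rep("unlikely", 25))
table(middle_95)   # 950 likely, 50 unlikely

# The "likely" zone runs from the 26th to the 975th smallest mean
range(sim_means[middle_95 == "likely"])

# quantile() gives nearly the same cutoffs
quantile(sim_means, c(0.025, 0.975))
```

If the sorting step were skipped, the positional labels would no longer line up with the smallest and largest means.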
6.0 - What about other DGPs?
That’s nice and all but we aren’t so sure we should assume the DGP is just like our sample! We
want to investigate other possible DGPs (like maybe the true difference is just two cents or even
one dollar).
6.1 - How might we modify the code below to simulate this DGP: $Y_i = .02 + \epsilon_i$? How about
$Y_i = 0 + \epsilon_i$? Or $Y_i = 1 + \epsilon_i$?
Will our sample b0 end up in the “likely” zone from these DGPs? Which ones?
Our sample b0 should end up in the “likely” zone for the DGPs with β0 = .02 and β0 = 0, but not for the DGP with β0 = 1.
[18]: my.sdob <- do(1000) * mean(rnorm(174, .02, PriceDiff.stats$sd))
# This will help us color the sdob's likely and unlikely zones
my.sdob <- arrange(my.sdob, mean)
my.sdob$middle.95 <- c(rep("unlikely", 25), rep("likely", 950), rep("unlikely", 25))
# This will create a visualization that includes our sampleb0
gf_histogram(~ mean, data = my.sdob, binwidth = .05, fill = ~middle.95) %>%
  gf_vline(xintercept = sampleb0, color = "green4") %>%
  gf_refine(scale_fill_manual(values = c("dodgerblue", "coral")))
Warning message:
“geom_vline(): Ignoring `mapping` because `xintercept` was provided.”
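Rather than editing the mean in rnorm() by hand for each DGP in 6.1, a small helper can try several β0 values at once. This is a base-R sketch, not the notebook's code: in_likely_zone() is a hypothetical helper, sample_b0 is the printed sample mean rounded to four places, and sigma = 4 is an assumed stand-in for PriceDiff.stats$sd, so the TRUE/FALSE results hold only under that assumption.

```r
set.seed(5)

sample_b0 <- -0.0753  # sample mean of PriceDiff printed earlier, rounded
sigma     <- 4        # ASSUMED SD; stands in for PriceDiff.stats$sd
n         <- 174

# Hypothetical helper: simulate 1000 sample means from Y_i = beta0 + e_i and
# report whether sample_b0 falls inside the middle 950 of the sorted means
in_likely_zone <- function(beta0) {
  sims <- sort(replicate(1000, mean(rnorm(n, mean = beta0, sd = sigma))))
  sample_b0 >= sims[26] && sample_b0 <= sims[975]
}

candidates <- c(0.02, 0, 1)
setNames(sapply(candidates, in_likely_zone), candidates)
```

Under the assumed sigma, the sample lands in the likely zone for β0 = .02 and β0 = 0 but not for β0 = 1, consistent with question 6.2.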
6.2 - Why is our sample b0 in the unlikely zone when the true DGP has a β0 = 1? Why is it on the
extreme low end of this sampling distribution?
When β0 = 1, the simulated sample means are centered at 1. Our sample b0 (about -0.075) is
more than a dollar below that center, below almost all of the simulated means, so it lands in
the lowest 2.5%, the “unlikely” tail on the low end.
6.3 - Is it possible that our sample b0 happens to be a bit high? Imagine a DGP where our sample
would end up on the extreme high end of the sampling distribution. What are some β0 s where our
sample would end up unlikely because it is too high?
Write your example here in GLM form.
$Y_i = -1.5 + \epsilon_i$
Then test it out by modifying the code below.
[19]: my.sdob <- do(1000) * mean(rnorm(174, -1.5, PriceDiff.stats$sd))
# This will help us color the sdob's likely and unlikely zones
my.sdob <- arrange(my.sdob, mean)
my.sdob$middle.95 <- c(rep("unlikely", 25), rep("likely", 950), rep("unlikely", 25))
Warning message:
“geom_vline(): Ignoring `mapping` because `xintercept` was provided.”
6.4 - If we simulated a DGP where the true β0 was actually 0, is it possible to generate a sample
that would have the same mean as ours? Is it likely?
Yes, it is possible, and it is likely: a DGP with β0 = 0 produces sample means centered at 0,
and our sample b0 (about -0.075) is close to 0, well inside the middle 95% “likely” zone.
6.5 - What does this data say about the true price differences at Walmart and Target? What could
the true price difference actually be?
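This suggests that the true average price difference is only a matter of cents: DGPs with β0
near zero, such as 0 or .02, easily produce a sample like ours, while a DGP with β0 = 1 does
not. On average, items at Walmart are probably less than a dollar cheaper than the same items
at Target, and the true difference could even be zero.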
6.6 - Out of curiosity, how many items are cheaper at Walmart? How many are cheaper at Target?
Does that change your view about which store is “cheaper”? How is this measure of “cheapness”
different from PriceDiff?
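[25]: # Count items cheaper at Walmart (PriceDiff < 0) and cheaper at Target (PriceDiff > 0)
tally(~ (PriceDiff < 0), data = shopping)
tally(~ (PriceDiff > 0), data = shopping)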
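A count records only which store is cheaper for each item, not by how much, whereas the mean of PriceDiff weighs large differences more heavily. This base-R sketch tallies the sign of a made-up PriceDiff vector (hypothetical values, Walmart minus Target as in the notebook) into the three possible outcomes.

```r
# Hypothetical PriceDiff values (Walmart minus Target); not the real data
price_diff <- c(-1.50, 0, 0, 2.25, -0.02, 0, -3.00)

# sign() returns -1 (cheaper at Walmart), 0 (same price), or 1 (cheaper at Target)
outcome <- factor(sign(price_diff),
                  levels = c(-1, 0, 1),
                  labels = c("cheaper at Walmart", "same price", "cheaper at Target"))
counts <- table(outcome)
counts  # 3 cheaper at Walmart, 3 same price, 1 cheaper at Target
```

With the notebook's data, table(sign(shopping$PriceDiff)) would give the same kind of breakdown.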
Don’t forget: If you “Close and Halt” before you go, the server won’t be so slow!