How Does An Outlier Affect The Least Squares

How does an outlier affect the least squares line?
BY DIVIT RAJPUT
Introduction
• Hockey games are profitable as we can sell drinks to the fans.

• The temperature can greatly affect the sales, which is what this report explores.

(Figure 2) (Figure 1)
Dependent and Independent Variable

The dependent variable here is the number of drinks sold, assuming that if the temperature is higher, more drinks will be bought as people are more likely to get thirsty in warmer temperatures. The independent variable is the temperature.

The dependent variable goes on the y axis while the independent variable goes on the x axis.
(Scatter plot: Effect of temperature on drinks sold. x axis: Temperature, 0 to 45; y axis: Drinks sold, 0 to 900.)
The pattern
• The trend – In most cases, when the temperature increases, so
do the sales of the drinks.
Pearson’s r
• r = 0.431

• This value tells us that there is a weak positive linear relationship between the number of drinks sold and temperature.
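As a rough sketch of how a Pearson r value like the one above is computed, the following uses the standard formula with deviations from the means. The function name `pearson_r` and the sample values are assumptions for illustration only; they are made up and are not the report's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up values for illustration only (NOT the report's data).
temps = [5, 10, 15, 20, 25]
drinks = [520, 560, 610, 640, 700]

r = pearson_r(temps, drinks)
print(f"r = {r:.3f}")
```

Values of r close to 1 suggest a strong positive linear relationship, while values closer to 0 suggest a weak one.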
Least squares regression equation
• y = 6.326x + 506.62

• The gradient is 6.326, which indicates that for every one-degree rise in temperature, about 6 more drinks are sold.
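A minimal sketch of how the gradient and y-intercept of a least squares line can be obtained is shown here. The function name `least_squares` and the sample values are assumptions for illustration; the values are made up and are not the report's data, so the output will not match the equation above.

```python
def least_squares(xs, ys):
    """Return (gradient, intercept) of the least squares line y = gradient*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y).
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

# Made-up values for illustration only (NOT the report's data).
temps = [5, 10, 15, 20, 25]
drinks = [520, 560, 610, 640, 700]

gradient, intercept = least_squares(temps, drinks)
print(f"y = {gradient:.3f}x + {intercept:.3f}")
```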
Testing the equation

We can test the equation by substituting the x value of a known point into it and comparing the result with the actual y value.

For instance, let's use the point (30, 750).

If we substitute the x value into y = 6.326x + 506.62, we get

y = 6.326 * 30 + 506.62

Evaluating this gives y = 696.40, while the actual value is 750, so the line underestimates this point by about 54 drinks (roughly 7%).

We can assume that the equation is reasonable, though not exact, for this point.
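The substitution can be checked in a few lines. The coefficients and the point (30, 750) are taken from the report; the function name `predict_drinks` is an assumption for illustration. The printed prediction is the figure to compare against the actual value of 750.

```python
# Coefficients of the report's least squares line y = 6.326x + 506.62.
GRADIENT = 6.326
INTERCEPT = 506.62

def predict_drinks(temperature):
    """Predicted number of drinks sold at the given temperature."""
    return GRADIENT * temperature + INTERCEPT

# The test point used in the report.
x_actual, y_actual = 30, 750

y_predicted = predict_drinks(x_actual)
residual = y_actual - y_predicted  # actual minus predicted

print(f"predicted: {y_predicted:.2f}")
print(f"residual:  {residual:.2f}")
```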


Outliers

An outlier is a discrepancy or an unusual value in data that does not follow the general pattern of the graph.

An example of an outlier in this data set is the point (30, 200).
What if we discard the outlier?

If we were to discard this outlier, the new r value would be 0.964 and the new equation of the least squares line would be y = 9.250x + 481.298.

The new r value now indicates a strong positive linear relationship between the temperature and the number of drinks sold, compared with the weak relationship at r = 0.431.

The new least squares line has a larger positive gradient (9.250 compared with 6.326) and a smaller y-intercept (481.298 compared with 506.62).
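The effect of discarding a single outlier can be sketched by fitting the same data twice. The helper `fit` and all data values below are assumptions for illustration: the points are made up, with only the outlier position (30, 200) borrowed from the report, so the numbers will differ from the report's.

```python
import math

def fit(xs, ys):
    """Return (r, gradient, intercept) for the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return r, gradient, intercept

# Made-up points that follow a clear upward trend (NOT the report's data).
temps = [5, 10, 15, 20, 25, 30, 35, 40]
drinks = [520, 560, 610, 640, 700, 750, 800, 850]
outlier = (30, 200)  # same position as the outlier named in the report

# Fit once with the outlier included, once without it.
r_with, grad_with, icpt_with = fit(temps + [outlier[0]], drinks + [outlier[1]])
r_without, grad_without, icpt_without = fit(temps, drinks)

print(f"with outlier:    r = {r_with:.3f}, y = {grad_with:.3f}x + {icpt_with:.3f}")
print(f"without outlier: r = {r_without:.3f}, y = {grad_without:.3f}x + {icpt_without:.3f}")
```

In this sketch, one point far below the trend is enough to weaken the correlation and flatten the gradient, which is the same direction of change the report describes.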


Conclusion

What effect does temperature have on the number of drinks sold and what impact did outliers have on the results?
• A rise in temperature is generally accompanied by an increase in the number of drinks sold. This trend can be found in the graph: for example, at 6 degrees we sold 500 drinks, but when the temperature went all the way up to 33, we sold 800 drinks.
• Outliers typically reduce the accuracy and validity of the results. In this example, when we took the outlier into consideration our Pearson r value was 0.431, but after removing the outlier the r value increased to r = 0.964.
• This indicates that outliers distort the mean of the data, and the Pearson formula relies heavily on means. The median and mode of a data set are much less affected by a single outlier.
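The point about means and medians can be illustrated with a small sketch using Python's standard `statistics` module. The sales figures are made up for illustration and are not the report's data.

```python
from statistics import mean, median

# Made-up drink sales for illustration only (NOT the report's data).
sales = [700, 720, 750, 760, 780]
sales_with_outlier = sales + [200]  # one unusually low value

# How far each summary moves when the outlier is added.
mean_shift = abs(mean(sales_with_outlier) - mean(sales))
median_shift = abs(median(sales_with_outlier) - median(sales))

print(f"mean:   {mean(sales):.1f} -> {mean(sales_with_outlier):.1f}")
print(f"median: {median(sales):.1f} -> {median(sales_with_outlier):.1f}")
```

In this sketch the mean moves far more than the median when the single low value is added.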
Resources
https://www.nutritionwarehouse.com.au/products/hydration-drink-by-prime (Figure 1) Accessed 2/8/2022

https://www.nhl.com/news/nhl-centennial-classic-alumni-game-fan-reminders/c-285166780 (Figure 2) Accessed 3/8/2022

https://www.itl.nist.gov/div898/handbook/prc/section1/prc16.htm Accessed 4/8/2022
