Chapter 3


Descriptive statistics: Numerical Measures

Part 1: Measures of Location
a) Mean (sample, weighted and geometric mean)
b) Mode
c) Median
d) Percentiles
e) Quartiles

Part 2: Measures of Variability
a) Range
b) Interquartile Range
c) Variance
d) Standard Deviation
e) Coefficient of Variation

Part 3:
a) Empirical Rule
b) Boxplot and Outliers
c) Skewness
d) Measures of association between two variables
Part 1
Measures of Location
1. Mean
   a) Sample mean
   b) Weighted mean
   c) Geometric mean
2. Mode
3. Median
4. Percentiles
5. Quartiles

If the measures are computed for data from a sample, they are called sample statistics.
If the measures are computed for data from a population, they are called population parameters.
A sample statistic is referred to as the point estimator of the corresponding population parameter.
1. Mean (Sample, Weighted and Geometric Mean)
a) The sample mean ($\bar{x}$), also called the "average" or "arithmetic mean":
The mean of a data set is the average of all the data values:
$$\bar{x} = \frac{\sum x_i}{n}$$
where $\sum x_i$ is the sum of all observations and $n$ is the total number of observations.
The sample mean is a point estimate of the population mean $\mu$.
Example A: Apartment Rents
Seventy apartments were randomly
sampled in a small university town.
The monthly rent prices (in €) for
these apartments are listed below:
445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440

Find the sample mean for the monthly rent prices.

The sample mean $\bar{x}$:
$$\bar{x} = \frac{\sum x_i}{n} = \frac{445 + 615 + \dots + 440}{70} = \frac{34356}{70} = 490.80$$
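The calculation for Example A can be checked with a minimal Python sketch. The names `rents` and `sample_mean` are illustrative choices, not part of the original slides.

```python
# Monthly rents (in euros) for the 70 sampled apartments of Example A.
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]


def sample_mean(values):
    """Return the sum of the observations divided by their count."""
    return sum(values) / len(values)


total = sum(rents)
mean_rent = sample_mean(rents)
print(f"n = {len(rents)}, sum = {total}, mean = {mean_rent:.2f}")
```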
b) The weighted mean ($x_w$)

• When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.

• When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.

$$x_w = \frac{\sum w_i x_i}{\sum w_i}$$

where $x_i$ is the value of observation $i$ and $w_i$ is the weight of observation $i$.
Example B: Mutual Funds
Morningstar tracks the total return for a large number of mutual funds. The
following table shows the total return and the number of funds for four
categories of mutual funds (Morningstar Funds500, 2008).
Purchase               Number of Funds   Total Return (%)
Domestic Equity        9191              4.65
International Equity   2621              18.15
Specialty Stock        1419              11.36
Hybrid                 2900              6.75

Using the number of funds as weights, compute the weighted average total
return for the mutual funds covered by Morningstar.

Purchase               Number of Funds ($w_i$)   Total Return (%) ($x_i$)   $w_i \times x_i$
Domestic Equity        9191                      4.65                       42738.15
International Equity   2621                      18.15                      47571.15
Specialty Stock        1419                      11.36                      16119.84
Hybrid                 2900                      6.75                       19575.00
Total                  16131                     40.91                      126004.14

$$x_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{(9191 \times 4.65) + \dots + (2900 \times 6.75)}{9191 + 2621 + 1419 + 2900} = \frac{126004.14}{16131} = 7.81$$
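As a rough check of Example B, the weighted mean can be sketched in Python as follows. The function name `weighted_mean` and the variable names are assumptions made for illustration.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Example B: total return (%) per category, weighted by the number of funds.
total_returns = [4.65, 18.15, 11.36, 6.75]
number_of_funds = [9191, 2621, 1419, 2900]

weighted_return = weighted_mean(total_returns, number_of_funds)
print(f"Weighted mean total return: {weighted_return:.2f}%")
```

Note that the unweighted average of the four returns (40.91 / 4, about 10.23%) would overstate the result, because the largest category has the lowest return.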

c) The geometric mean ($x_g$)

• The geometric mean is most appropriate in situations where the data items to be summarised result from a ratio-type calculation, such as with growth rates or index numbers.

$$x_g = \sqrt[n]{x_1 \times x_2 \times \dots \times x_n}$$

where $x_i$ is the value of observation $i$ and $n$ is the total number of observations.
Example C: Share Price
Consider the following five data values, which represent the share
price of a company at the beginning of five successive years,
relative to the price at the start of the previous year.

1.11 1.35 0.80 1.40 1.05

Compute the geometric mean of the share price relatives.

$$x_g = \sqrt[5]{1.11 \times 1.35 \times 0.80 \times 1.40 \times 1.05} = \sqrt[5]{1.762236} = 1.120$$
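The geometric mean of Example C can be reproduced with a short Python sketch. It assumes Python 3.8 or later for `math.prod`; the function name `geometric_mean` is an illustrative choice.

```python
import math


def geometric_mean(values):
    """Return the n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs positive values")
    return math.prod(values) ** (1 / len(values))


# Example C: share price relative to the price one year earlier.
price_relatives = [1.11, 1.35, 0.80, 1.40, 1.05]

product = math.prod(price_relatives)
g_mean = geometric_mean(price_relatives)
print(f"product = {product:.6f}, geometric mean = {g_mean:.3f}")
```

A geometric mean of about 1.120 corresponds to an average growth of roughly 12% per year over the five years.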
2. Mode

• The mode of a data set is the value that occurs with greatest frequency.

• The greatest frequency can occur at two or more different values.

• If the data have exactly two modes, the data are bimodal.

• If the data have more than two modes, the data are multimodal.
Example:

• The Mode in Example A: Apartment Rents


Sorted Data:
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

The value 450 occurs 7 times, more often than any other value, so the mode of the data set is 450.
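Counting frequencies by hand is error-prone for 70 values, so a small Python sketch may help. The helper `modes` is a hypothetical name; it returns every value that reaches the highest frequency, which also covers the bimodal and multimodal cases.

```python
from collections import Counter

# Monthly rents (in euros) from Example A.
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]


def modes(values):
    """Return (sorted list of most frequent values, their frequency)."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest), highest


mode_values, frequency = modes(rents)
print(f"mode(s): {mode_values}, each occurring {frequency} times")
```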
3. Median, also called the "Second Quartile" ($Q_2$)
• The median is a measure of central location provided by the value in the middle when the data are arranged in ascending order.

• Whenever a data set has extreme values, the median is the preferred measure of central location.

To get the median, first arrange the data in ascending order (smallest value to largest value), then:
a) For an odd number of observations, the median is the middle value.
b) For an even number of observations, the median is the average of the two middle values.
Example:

• The Median in Example C: Share Price

Data: 1.11 1.35 0.80 1.40 1.05

Sorted Data: 0.80 1.05 1.11 1.35 1.40

n = 5 is odd, so the median is the middle (3rd) value.

Therefore, the median is 1.11.
Example:

• The Median in Example A: Apartment Rents


Data:
445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440
Sorted Data:
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

n = 70 is even, so the median is the average of the two middle values (positions 35 and 36).
$$\text{Median} = \frac{475 + 475}{2} = 475$$
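The odd/even rule for the median can be sketched in Python as below. The function name `median` and the variable names are illustrative; the standard library's `statistics.median` applies the same rule.

```python
def median(values):
    """Middle value for odd n, average of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Example C (n = 5, odd) and Example A (n = 70, even).
price_relatives = [1.11, 1.35, 0.80, 1.40, 1.05]
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

print("median share price relative:", median(price_relatives))
print("median rent:", median(rents))
```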
4. Percentiles

• A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.

• The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
Steps to Calculate the pth percentile:

1. Arrange the data in ascending order.

2. Compute the index $i$, the position of the $p$th percentile:
$$i = \frac{p}{100}\, n$$

3. If $i$ is not an integer, round it up to the next integer. The $p$th percentile is the value in that position.
4. If $i$ is an integer, the $p$th percentile is the average of the values in positions $i$ and $i + 1$.
Example:

1. Find the 20th Percentile in Example A: Apartment Rents


Sorted Data:
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

We need to find the 20th percentile, therefore $p = 20$ and $n = 70$.
$$i = \frac{p}{100}\, n = \frac{20}{100} \times 70 = 14$$
Since $i$ is an integer, use positions 14 and 15. Therefore, the 20th percentile is the average of the two values in positions 14 and 15 of the sorted data set:
$$\text{20th percentile} = \frac{445 + 445}{2} = 445$$
Example:

2. Find the 35th Percentile in Example A: Apartment Rents


Sorted Data:
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

We need to find the 35th percentile, therefore $p = 35$ and $n = 70$.
$$i = \frac{p}{100}\, n = \frac{35}{100} \times 70 = 24.5$$
Since $i$ is not an integer, round it up to 25.
Therefore, the 35th percentile is the value in position 25 of the sorted data, which is 450.
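The percentile steps can be sketched in Python as follows. This is a minimal sketch of the round-up / average rule described in this chapter, assuming an integer `p` with 0 < p < 100; statistical packages often use interpolation rules that give slightly different values. The name `percentile` is an illustrative choice.

```python
def percentile(values, p):
    """p-th percentile by the chapter's rule: i = (p / 100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)                      # step 1: ascending order
    whole, rest = divmod(p * len(ordered), 100)   # i = whole + rest / 100
    if rest:
        # i is not an integer: round up to position whole + 1 (1-based).
        return ordered[whole]
    # i is an integer: average the values in positions i and i + 1 (1-based).
    return (ordered[whole - 1] + ordered[whole]) / 2


rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

p20 = percentile(rents, 20)
p35 = percentile(rents, 35)
print("20th percentile:", p20)
print("35th percentile:", p35)
```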
5. Quartiles
• Quartiles are specific percentiles:
  - First Quartile ($Q_1$) = 25th percentile (25% of the data are less than or equal to $Q_1$)
  - Second Quartile ($Q_2$) = 50th percentile (50% of the data are less than or equal to $Q_2$)
  - Third Quartile ($Q_3$) = 75th percentile (75% of the data are less than or equal to $Q_3$)

• Quartile positions can also be approximated with the following formulas:
$$Q_1 = \left(\frac{n+1}{4}\right)^{th} \qquad Q_2 = \left(\frac{2(n+1)}{4}\right)^{th} \qquad Q_3 = \left(\frac{3(n+1)}{4}\right)^{th}$$

Note: These formulas give the positions of the quartiles in the sorted data set. If the position is an integer, take the value at that position directly; if it is not an integer, take the average of the two values between whose positions it falls.

[Figure: Quartiles on a number line]
Example: Back to Example A: Apartment Rents
Sorted Data: 425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Compute the following:


a) First Quartile (𝑄1 )
b) Second Quartile (𝑄2 )
c) Third Quartile (𝑄3 )
a) First Quartile ($Q_1$):
• 1st method: Using the quartile rule with n = 70
$$Q_1 = \left(\frac{n+1}{4}\right)^{th} = \left(\frac{70+1}{4}\right)^{th} = 17.75^{th} \text{ position}$$
$$Q_1 = \frac{17^{th} + 18^{th}}{2} = \frac{445 + 445}{2} = 445$$
Therefore $Q_1 = 445$.

• 2nd method: Using the percentile rule: $Q_1$ = 25th percentile, n = 70 and p = 25. Then:
$$i = \frac{p}{100}\, n = \frac{25}{100} \times 70 = 17.5 \cong 18$$
The value in position 18 is 445, therefore $Q_1 \cong 445$.
b) Second Quartile ($Q_2$):
• 1st method: Using the quartile rule with n = 70
$$Q_2 = \left(\frac{2(n+1)}{4}\right)^{th} = \left(\frac{2(70+1)}{4}\right)^{th} = (2 \times 17.75)^{th} = 35.5^{th} \text{ position}$$
$$Q_2 = \frac{35^{th} + 36^{th}}{2} = \frac{475 + 475}{2} = 475$$
Therefore $Q_2 = 475$.

• 2nd method: Using the percentile rule: $Q_2$ = 50th percentile, n = 70 and p = 50. Then:
$$i = \frac{p}{100}\, n = \frac{50}{100} \times 70 = 35$$
Since $i$ is an integer, average the values in positions 35 and 36:
$$Q_2 = \frac{35^{th} + 36^{th}}{2} = \frac{475 + 475}{2} = 475$$
c) Third Quartile ($Q_3$):
• 1st method: Using the quartile rule with n = 70
$$Q_3 = \left(\frac{3(n+1)}{4}\right)^{th} = \left(\frac{3(70+1)}{4}\right)^{th} = (3 \times 17.75)^{th} = 53.25^{th} \text{ position}$$
$$Q_3 = \frac{53^{th} + 54^{th}}{2} = \frac{525 + 525}{2} = 525$$
Therefore $Q_3 = 525$.

• 2nd method: Using the percentile rule: $Q_3$ = 75th percentile, n = 70 and p = 75. Then:
$$i = \frac{p}{100}\, n = \frac{75}{100} \times 70 = 52.5 \cong 53$$
The value in position 53 is 525, therefore $Q_3 \cong 525$.
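The first method (the quartile position rule) can be sketched in Python as below. The function name `quartile_by_position` is hypothetical, and averaging the two neighbouring values follows the note in this chapter; other textbooks interpolate instead.

```python
import math


def quartile_by_position(values, k):
    """k-th quartile (k = 1, 2, 3) from the position k * (n + 1) / 4."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2 or 3")
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / 4        # 1-based, may be fractional
    below = math.floor(position)
    above = math.ceil(position)
    if below < 1 or above > n:
        raise ValueError("data set too small for this rule")
    # An integer position gives below == above, so the value is returned as is;
    # otherwise the two neighbouring values are averaged.
    return (ordered[below - 1] + ordered[above - 1]) / 2


rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

q1, q2, q3 = (quartile_by_position(rents, k) for k in (1, 2, 3))
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```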
Part 2
Measures of Variability
It is often desirable to consider measures of variability (dispersion), as well as
measures of location.
For example, in choosing supplier A or supplier B we might consider not only the
average delivery time for each, but also the variability in delivery time for each.
1. Range
2. Interquartile Range
3. Variance
4. Standard Deviation
5. Coefficient of Variation
1. Range

• The range is the simplest measure of variability.

• The range of a data set is the difference between the largest and smallest data values.
$$\text{Range} = \text{largest value (Max)} - \text{smallest value (Min)}$$

Note: The range is very sensitive to the smallest and largest data values.
Example:

• Find The Range of the data in Example A (Apartment Rents).


Sorted Data
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

$$\text{Range} = \text{largest value (Max)} - \text{smallest value (Min)} = 615 - 425 = 190$$
2. Interquartile Range (IQR)

• The interquartile range (IQR) is the range for the middle 50% of the data.

• The interquartile range of a data set is the difference between the third quartile and the first quartile.
$$IQR = Q_3 - Q_1$$
Note: The IQR overcomes the sensitivity to extreme data values.
Example:

• Find The IQR of the data in Example A (Apartment Rents).


Sorted Data
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: $Q_3 = 525$ and $Q_1 = 445$

$$IQR = Q_3 - Q_1 = 525 - 445 = 80$$
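Both measures can be computed together in a short Python sketch. The helper `percentile` is an illustrative implementation of this chapter's percentile rule (integer `p` between 0 and 100 assumed), not a library function.

```python
def percentile(values, p):
    """p-th percentile by the chapter's round-up / average rule."""
    ordered = sorted(values)
    whole, rest = divmod(p * len(ordered), 100)
    if rest:
        return ordered[whole]                         # rounded-up position
    return (ordered[whole - 1] + ordered[whole]) / 2  # positions i and i + 1


rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

data_range = max(rents) - min(rents)
iqr = percentile(rents, 75) - percentile(rents, 25)
print(f"Range = {data_range}, IQR = {iqr}")
```

The range uses only the two most extreme values, while the IQR ignores the lowest and highest quarters of the data, which is why it is less sensitive to extreme values.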
3. Variance
• The variance is the average of the squared differences between each data value and the mean.
• The variance is computed as follows:
  - Population variance:
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
  - Sample variance:
$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

Note: To compute the variance, you first need to obtain the mean.
Example D: Class Size
Consider the following class size data for a sample of five college classes:
46 54 42 46 32

Compute the sample variance for the class size data.

The sample mean $\bar{x}$:
$$\bar{x} = \frac{\sum x_i}{n} = \frac{46 + 54 + 42 + 46 + 32}{5} = \frac{220}{5} = 44$$

Number of Students   Sample Mean   Deviation about the Mean   Squared Deviation about the Mean
in Class ($x_i$)     ($\bar{x}$)   ($x_i - \bar{x}$)          ($(x_i - \bar{x})^2$)
46                   44            2                          4
54                   44            10                         100
42                   44            -2                         4
46                   44            2                          4
32                   44            -12                        144
Total                              0                          256

$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{256}{5 - 1} = 64$$
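The deviation table for Example D can be rebuilt step by step with the following Python sketch. The function name `sample_variance` is an illustrative choice; the standard library's `statistics.variance` computes the same quantity.

```python
def sample_variance(values):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


class_sizes = [46, 54, 42, 46, 32]
mean_size = sum(class_sizes) / len(class_sizes)

# Print one row per observation: value, deviation, squared deviation.
for x in class_sizes:
    deviation = x - mean_size
    print(f"{x:>4} {deviation:>6.0f} {deviation ** 2:>6.0f}")

variance = sample_variance(class_sizes)
print("sample variance:", variance)
```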
4. Standard Deviation
• The standard deviation of a data set is the positive square root of the variance.
• The standard deviation is computed as follows:
  - Population standard deviation: $\sigma = \sqrt{\sigma^2}$
  - Sample standard deviation: $S = \sqrt{S^2}$

Note: To compute the standard deviation, you first need to obtain the variance.
Example:
Using the data in Example D (Class Size), compute the sample standard deviation for the class size data.
Notes: $\bar{x} = 44$ and sample variance $S^2 = 64$.

Therefore, the sample standard deviation is $S = \sqrt{S^2} = \sqrt{64} = 8$.
5. Coefficient of Variation (C.V)
• The coefficient of variation indicates how large the standard deviation is in relation to the mean.
• The coefficient of variation is computed as follows:
  - Population coefficient of variation:
$$\left(\frac{\sigma}{\mu} \times 100\right)\%$$
  - Sample coefficient of variation:
$$\left(\frac{S}{\bar{x}} \times 100\right)\%$$
Note: To compute the coefficient of variation, you first need to obtain the mean and the standard deviation.
Example:
Using the data in Example D (Class Size), compute the sample coefficient of variation for the class size data.
Notes: $\bar{x} = 44$ and sample standard deviation $S = 8$.

Therefore,
$$\text{Sample coefficient of variation} = \left(\frac{S}{\bar{x}} \times 100\right)\% = \left(\frac{8}{44} \times 100\right)\% = 18.2\%$$
Extra Exercise:
Try to find the variance, standard deviation and coefficient of variation of the Example A data set (Apartment Rents).

• Variance
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2996.16$$

• Standard Deviation
$$s = \sqrt{s^2} = \sqrt{2996.16} = 54.74$$

• Coefficient of Variation
$$\left(\frac{s}{\bar{x}} \times 100\right)\% = \left(\frac{54.74}{490.80} \times 100\right)\% = 11.15\%$$
The standard deviation is about 11% of the mean.
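The three measures can be checked for both data sets with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n - 1) formulas. The wrapper name `summarize` is an illustrative assumption.

```python
import statistics


def summarize(values):
    """Return (mean, sample variance, sample std dev, CV in percent)."""
    mean = statistics.mean(values)
    variance = statistics.variance(values)   # divides by n - 1
    std_dev = statistics.stdev(values)       # square root of the variance
    cv = std_dev / mean * 100
    return mean, variance, std_dev, cv


class_sizes = [46, 54, 42, 46, 32]
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

class_summary = summarize(class_sizes)
rent_summary = summarize(rents)
for name, (mean, var, sd, cv) in (("class sizes", class_summary),
                                  ("rents", rent_summary)):
    print(f"{name}: mean={mean:.2f} variance={var:.2f} sd={sd:.2f} cv={cv:.2f}%")
```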
Part 3
We have described several measures of location and variability for data. In addition, it is often important to describe the shape of a distribution, using the empirical rule, the boxplot, outlier detection and skewness.
A manager or decision maker is typically also interested in the link between two variables, so examining measures of association between two variables is essential as well.

1. Empirical Rule
2. Boxplot and Outliers
3. Skewness
4. Measures of association between two variables
1. Empirical Rule
For data having a bell-shaped distribution:

[Figure: bell-shaped curve centered at $\mu$, with the intervals $\mu \pm 1\sigma$, $\mu \pm 2\sigma$ and $\mu \pm 3\sigma$ containing 68.26%, 95.44% and 99.72% of the values]

• 68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean: $(\bar{x} - S \,;\, \bar{x} + S)$
• 95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean: $(\bar{x} - 2S \,;\, \bar{x} + 2S)$
• 99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean: $(\bar{x} - 3S \,;\, \bar{x} + 3S)$
Example:
Using the data in Example D (Class Size: number of students in class). Data: 46 54 42 46 32

We have the sample mean $\bar{x} = 44$ and the sample standard deviation $S = 8$. Answer the following questions:
a) Find the interval to which 99.72% of the classes belong. Are there any possible outliers?
b) What percentage corresponds to the interval (28 ; 60)?
Note: use the empirical rule to get the answer.

Part a:
Sorted data: 32 42 46 46 54 with $\bar{x} = 44$ and $S = 8$
Using the empirical rule for 99.72%:
$$(\bar{x} - 3S \,;\, \bar{x} + 3S) = (44 - 3 \times 8 \,;\, 44 + 3 \times 8) = (44 - 24 \,;\, 44 + 24) = (20 \,;\, 68)$$
There are no outliers, since every value in the data set lies within the 99.72% interval.

Part b:
Sorted data: 32 42 46 46 54 with $\bar{x} = 44$ and $S = 8$
$$(28 \,;\, 60) = (44 - 16 \,;\, 44 + 16) = (44 - 2 \times 8 \,;\, 44 + 2 \times 8) = (\bar{x} - 2S \,;\, \bar{x} + 2S)$$
Therefore the interval (28 ; 60) corresponds to 95.44% of the values.
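The empirical-rule intervals for Example D can be generated with a small Python sketch. The function name `empirical_intervals` is hypothetical, and the percentages are the ones quoted in this chapter; they only apply when the distribution is approximately bell-shaped.

```python
def empirical_intervals(mean, std_dev):
    """Map k = 1, 2, 3 to (lower, upper, percentage quoted in the chapter)."""
    coverage = {1: 68.26, 2: 95.44, 3: 99.72}
    return {k: (mean - k * std_dev, mean + k * std_dev, pct)
            for k, pct in coverage.items()}


class_sizes = [46, 54, 42, 46, 32]
intervals = empirical_intervals(mean=44, std_dev=8)

for k, (lower, upper, pct) in intervals.items():
    print(f"within {k} std dev: ({lower} ; {upper}) covers about {pct}%")

# Part a: values outside the 3-standard-deviation interval are possible outliers.
low3, high3, _ = intervals[3]
possible_outliers = [x for x in class_sizes if x < low3 or x > high3]
print("possible outliers:", possible_outliers)
```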
2. Boxplot & Outliers
• A box plot is a graphical summary of data that is used to study
the shape of the distribution of data as well as to detect
potential outliers.

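• Outliers are extreme values in your data set.

[Figure: boxplot of "Variable Name" showing the box from $Q_1$ to $Q_3$ with $Q_2$ inside, whiskers running from the Start point to the End point, the Lower Limit and Upper Limit, and potential outliers beyond each limit]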
Drawing the boxplot requires the following steps:
1. Find $Q_1$ (first quartile), $Q_2$ (median) and $Q_3$ (third quartile).

2. Find the Lower Limit $= Q_1 - 1.5 \times IQR$ and then determine whether you have outliers at the bottom (any value in the data below the lower limit is considered an outlier).

3. Find the Upper Limit $= Q_3 + 1.5 \times IQR$ and then determine whether you have outliers at the top (any value in the data above the upper limit is considered an outlier).

4. The Start and End points of the boxplot are the lowest and highest values in your data set that lie between the lower and upper limits.
Note: You may also draw an approximate boxplot (without the lower and upper limits, assuming no outliers in the data set) using the five-number summary (Minimum, $Q_1$, $Q_2$, $Q_3$, Maximum), but this will not detect the outliers in your data set.

[Figure: boxplot of "Variable Name" drawn from Minimum, $Q_1$, $Q_2$, $Q_3$ and Maximum]

Skewed data show an uneven boxplot, where the median cuts the box into two unequal parts:
• If the longer part of the box is to the right of (or above) the median, the data are said to be skewed right.
• If the longer part is to the left of (or below) the median, the data are skewed left.
Example: Draw the boxplot of the data in Example A (Apartment Rents). Are there any potential outliers?
Sorted Data
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: $Q_1 = 445$, $Q_2 = 475$, $Q_3 = 525$ and $IQR = 80$

1. Lower Limit $= Q_1 - 1.5 \times IQR = 445 - 1.5 \times 80 = 325$ (no outliers at the bottom, since the minimum, 425, is above 325)

2. Upper Limit $= Q_3 + 1.5 \times IQR = 525 + 1.5 \times 80 = 645$ (no outliers at the top, since the maximum, 615, is below 645)

3. The Start point is the minimum (425), and the End point is the maximum (615).

[Figure: boxplot of "Apartment rents" with Minimum = 425, $Q_1$ = 445, $Q_2$ = 475, $Q_3$ = 525, Maximum = 615]

The data have no potential outliers, and the distribution is right skewed (positively skewed), since the right part of the box ($Q_2$ to $Q_3$) is longer than the left part ($Q_1$ to $Q_2$).
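The numbers needed to draw this boxplot can be derived with the Python sketch below. The helper `percentile` is an illustrative implementation of this chapter's percentile rule, and `boxplot_summary` is a hypothetical name; plotting libraries may compute quartiles slightly differently.

```python
def percentile(values, p):
    """p-th percentile by the chapter's round-up / average rule."""
    ordered = sorted(values)
    whole, rest = divmod(p * len(ordered), 100)
    if rest:
        return ordered[whole]
    return (ordered[whole - 1] + ordered[whole]) / 2


def boxplot_summary(values):
    """Return quartiles, 1.5 * IQR limits, whisker ends and outliers."""
    q1, q2, q3 = (percentile(values, p) for p in (25, 50, 75))
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [x for x in values if lower_limit <= x <= upper_limit]
    outliers = sorted(x for x in values if x < lower_limit or x > upper_limit)
    return {"q1": q1, "q2": q2, "q3": q3, "iqr": iqr,
            "lower_limit": lower_limit, "upper_limit": upper_limit,
            "start": min(inside), "end": max(inside), "outliers": outliers}


rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

summary = boxplot_summary(rents)
for key, value in summary.items():
    print(f"{key}: {value}")
```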
3. Skewness
• The skewness of the data set describes the shape of the distribution: whether it is symmetric or skewed.

• Symmetric data are centered evenly around the mean (average). Right skew means that the bulk of the data lies in the lower range, with a tail toward the higher values; left skew means that the bulk of the data lies in the upper range, with a tail toward the lower values.

• Skewness can be studied using the histogram (covered in part 1), the boxplot, and the comparison between the mean and the median:
  - If $\bar{x} = Q_2$, the data are symmetric
  - If $\bar{x} < Q_2$, the data are left skewed (negatively skewed)
  - If $\bar{x} > Q_2$, the data are right skewed (positively skewed)
Example: What can you say about the skewness of the data in Example A (Apartment Rents)?
Note: We have $Q_2 = 475$, $\bar{x} = 490.80$ and the boxplot as follows:

[Figure: boxplot of "Apartment rents" with Minimum = 425, $Q_1$ = 445, $Q_2$ = 475, $Q_3$ = 525, Maximum = 615]

Answer:
• Comparing the mean with the median ($Q_2$), we have $\bar{x} > Q_2$, so the distribution is right skewed.
• In the boxplot, the right side of the box is longer than the left side, so the distribution is right skewed.
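The mean-versus-median comparison is simple enough to express as a tiny Python sketch. The function name `skew_from_mean_median` is hypothetical, and the exact-equality test for symmetry is a simplification: real data rarely give a mean exactly equal to the median.

```python
import statistics


def skew_from_mean_median(mean, median):
    """Classify the shape by comparing the mean with the median (Q2)."""
    if mean > median:
        return "right skewed"
    if mean < median:
        return "left skewed"
    return "symmetric"


# Example A, using the summary values quoted in the text.
rent_shape = skew_from_mean_median(mean=490.80, median=475)

# Example D, computed from the raw class sizes.
class_sizes = [46, 54, 42, 46, 32]
class_shape = skew_from_mean_median(statistics.mean(class_sizes),
                                    statistics.median(class_sizes))

print("apartment rents:", rent_shape)
print("class sizes:", class_shape)
```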
4. Measures of association between two variables

So far, we've looked at numerical approaches for summarizing data for a single variable at a time. In this section, we will use covariance and correlation as descriptive measures of the relationship between two variables.

The scatterplot graphically depicts the relationship between the two variables. When the relationship is linear (the dots in the scatterplot form a straight pattern), the covariance and correlation coefficient are used to determine the type and strength of the relationship.
a) Covariance:
• The covariance is a measure of the linear association between two variables (X and Y).
• Positive values indicate a positive relationship, and negative values indicate a negative relationship.
• The covariance is computed as follows:
  - Population covariance:
$$\sigma_{XY} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$
  - Sample covariance:
$$S_{XY} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
Example E: Sales
The store's manager wants to determine the relationship between the number of weekend television commercials shown and the sales at the store during the following week. Sample data with sales expressed in hundreds of dollars are provided in the following table:

Number of Commercials (x)   Sales Volume ($100s) (y)
2                           50
5                           57
1                           41
3                           54
4                           54
1                           38
5                           63
3                           48
4                           59
2                           46

Compute the sample covariance.
$n = 10$, $\bar{x} = \frac{30}{10} = 3$, $\bar{y} = \frac{510}{10} = 51$

x       y       $x_i - \bar{x}$   $y_i - \bar{y}$   $(x_i - \bar{x})(y_i - \bar{y})$
2       50      -1                -1                1
5       57      2                 6                 12
1       41      -2                -10               20
3       54      0                 3                 0
4       54      1                 3                 3
1       38      -2                -13               26
5       63      2                 12                24
3       48      0                 -3                0
4       59      1                 8                 8
2       46      -1                -5                5
Total   30      510 (y)           0 ; 0             99

$$S_{XY} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{99}{10 - 1} = 11 > 0$$
Therefore we have a positive relationship.
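The covariance table for Example E can be reproduced with a short Python sketch. The function name `sample_covariance` and the variable names are illustrative choices.

```python
def sample_covariance(xs, ys):
    """Sum of the products of paired deviations, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


# Example E: weekend commercials (x) and sales in $100s (y).
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

s_xy = sample_covariance(commercials, sales)
direction = "positive" if s_xy > 0 else "negative" if s_xy < 0 else "no linear"
print(f"sample covariance = {s_xy}, so the relationship is {direction}")
```

The size of the covariance depends on the units of both variables, so its sign is informative but its magnitude is hard to interpret on its own.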
b) Correlation Coefficient:
• Correlation is a measure of linear association and not necessarily causation.

• Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.

• The correlation coefficient is computed as follows:

  - Population correlation coefficient:
$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$
where $\sigma_{XY}$ is the population covariance, $\sigma_X$ is the population standard deviation of x, and $\sigma_Y$ is the population standard deviation of y.

  - Sample correlation coefficient:
$$r_{XY} = \frac{S_{XY}}{S_X S_Y}$$
where $S_{XY}$ is the sample covariance, $S_X$ is the sample standard deviation of x, and $S_Y$ is the sample standard deviation of y.

The correlation coefficient can take only values between -1 and +1.

Correlation coefficient   Relationship
-1 to -0.7                Strong negative
-0.7 to -0.3              Moderate negative
-0.3 to 0                 Weak negative
0 to +0.3                 Weak positive
+0.3 to +0.7              Moderate positive
+0.7 to +1                Strong positive
Example:
Using the data in Example E (Sales), compute the sample correlation coefficient.
Notes: $S_{XY} = 11$, $\bar{x} = 3$ and $\bar{y} = 51$, with $S = \sqrt{S^2}$ and $S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

x       y       $x_i - \bar{x}$   $y_i - \bar{y}$   $(x_i - \bar{x})^2$   $(y_i - \bar{y})^2$
2       50      -1                -1                1                     1
5       57      2                 6                 4                     36
1       41      -2                -10               4                     100
3       54      0                 3                 0                     9
4       54      1                 3                 1                     9
1       38      -2                -13               4                     169
5       63      2                 12                4                     144
3       48      0                 -3                0                     9
4       59      1                 8                 1                     64
2       46      -1                -5                1                     25
Total   30      510 (y)           0 ; 0             20                    566

$$S_X = \sqrt{S_X^2} = \sqrt{\frac{20}{10 - 1}} = 1.49 \quad \text{and} \quad S_Y = \sqrt{S_Y^2} = \sqrt{\frac{566}{10 - 1}} = 7.93$$

$$r_{XY} = \frac{S_{XY}}{S_X S_Y} = \frac{11}{1.49 \times 7.93} = 0.93 > 0.7$$
Therefore, we have a strong positive linear relationship between "Number of Commercials" and "Sales Volume".
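The correlation calculation can be sketched in Python as below. All function names are illustrative, and `describe_strength` encodes the -0.7 / -0.3 / +0.3 / +0.7 scale from this chapter; assigning the boundary values themselves to the stronger category is an assumption.

```python
import math


def sample_std(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def sample_correlation(xs, ys):
    """r = S_XY / (S_X * S_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return s_xy / (sample_std(xs) * sample_std(ys))


def describe_strength(r):
    """Label r using the chapter's scale (boundary handling is assumed)."""
    if r == 0:
        return "no linear relationship"
    size = abs(r)
    strength = "strong" if size >= 0.7 else "moderate" if size >= 0.3 else "weak"
    return f"{strength} {'positive' if r > 0 else 'negative'}"


commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

s_x = sample_std(commercials)
s_y = sample_std(sales)
r = sample_correlation(commercials, sales)
print(f"S_X = {s_x:.2f}, S_Y = {s_y:.2f}, r = {r:.2f} ({describe_strength(r)})")
```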
Extra Exercise:
Given the following dataset:
2 4 6 8 9 10 12 14 16
Answer the following questions:
1. Find the Mode and Median of this data set
2. Calculate the sample mean and deduce the shape of the distribution
3. Find the 23rd percentile of the data
4. Compute the Inter Quartile Range (IQR)
5. Find the sample variance, sample standard deviation and deduce the coefficient of
Variation.
6. Use the empirical rule to compute the 68.26% interval of this dataset.
7. Which of the following graphs (A or B) is the boxplot of our data set?
[Figure: Graph A and Graph B, two candidate boxplots, each drawn on an axis marked 2, 5, 9, 13, 16]
End of Session
