Math2101Stat 2 2


COURSE TITLE: Linear Algebra, Statistics and Probability
COURSE CODE: MATH-2101

Instructor: M. Ershadul Haque

Associate Professor

Department of Statistics, DU
Mode
• Mode: The mode is the value that occurs with the highest frequency.

• Example: The scores obtained by 5 students in a statistics test are 10, 7, 7, 7, and 0. The value 7
has the highest frequency, therefore the mode is 7.

• Find an appropriate measure of central tendency for the following frequency distribution showing the
religion of DU students.

Religion        Frequency
Islam           400
Hinduism        36
Buddhism        6
Christianity    5
Others          3

The category "Islam" has the highest frequency, therefore the mode is "Islam". Because religion is a
qualitative variable, the mode is the only measure of central tendency available here.
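The two mode examples above can be checked with a short Python sketch. The helper name `mode_of` and the variable names are illustrative assumptions, not part of the course material.

```python
from collections import Counter

def mode_of(values):
    """Return the most frequent value (the first one encountered if there is a tie)."""
    counts = Counter(values)
    return counts.most_common(1)[0][0]

# Raw scores of the 5 students.
scores = [10, 7, 7, 7, 0]
score_mode = mode_of(scores)

# Frequency table for a qualitative variable: the mode is the category
# with the largest frequency.
religion_counts = {"Islam": 400, "Hinduism": 36, "Buddhism": 6,
                   "Christianity": 5, "Others": 3}
religion_mode = max(religion_counts, key=religion_counts.get)

print(score_mode, religion_mode)
```

For a frequency table, no arithmetic on the categories is needed; only the frequencies are compared.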
Mode for grouped data
• For grouped data, the mode is obtained by using the following formula:

$$M_o = L + \frac{f_0 - f_{-1}}{(f_0 - f_{-1}) + (f_0 - f_1)} \times h$$

✓ $L$ = lower boundary of the modal class.

✓ $f_{-1}$ = frequency of the pre-modal class.

✓ $f_0$ = frequency of the modal class.

✓ $f_1$ = frequency of the post-modal class.

✓ $h$ = width of the modal class.

✓ The class that contains the highest frequency is the modal class.


Mode for grouped data (cont…)
• Calculate the modal age (in years) of workers from the following data
Age Frequency
11-20 5
21-30 15
31-40 50
41-50 45
51-60 35

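• The highest frequency (50) belongs to the class 31-40, so this is the modal class. Its class
boundaries are 30.5-40.5, the pre-modal frequency is 15, the post-modal frequency is 45, and the class width is 10.

• $\text{Mode} = 30.5 + \frac{50 - 15}{(50 - 15) + (50 - 45)} \times 10 = 30.5 + 8.75 = 39.25$ years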
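As a check on the arithmetic, the grouped-data mode formula can be written as a small Python function. This is a minimal sketch; the function and parameter names are assumptions chosen for illustration.

```python
def grouped_mode(lower_boundary, width, f_prev, f_modal, f_next):
    """Mode of grouped data by interpolation within the modal class.

    lower_boundary: lower class boundary of the modal class (L)
    width:          width of the modal class (h)
    f_prev:         frequency of the pre-modal class
    f_modal:        frequency of the modal class
    f_next:         frequency of the post-modal class
    """
    d_prev = f_modal - f_prev
    d_next = f_modal - f_next
    return lower_boundary + d_prev / (d_prev + d_next) * width

# Workers' age data: the modal class 31-40 has boundaries 30.5-40.5.
modal_age = grouped_mode(lower_boundary=30.5, width=10,
                         f_prev=15, f_modal=50, f_next=45)
print(modal_age)  # 39.25
```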
Choosing measures of central tendency
• The mean is suitable only for quantitative data. For this type of data, the median is
used instead as the measure of central tendency if some unusual (extreme) values arise.

• The mode may be the only measure available where it is not possible to perform arithmetic
operations on the data, as in the case of a qualitative variable.

• In the following cases the arithmetic mean should not be used:

✓ When the observations include very large or very small values (the median can be used).

✓ In distributions with an open-ended class (the median can be used).

✓ When the distribution is unevenly spread, with small or large concentrations at
irregular points (see Figure 2) (the median can be used).

✓ When the variable under study is qualitative.


Choosing measures of central tendency (cont…)
• The mean is suitable only for ratio or interval data. For this type of data, the median is
used instead as the measure of central tendency if some unusual (extreme) values arise.

Figure 1: Bell-shaped distribution

Figure 2: Skewed distribution


Shape Characteristics
• The shape characteristics of a distribution describe the extent of its asymmetry and peakedness
relative to an agreed-upon standard.

• The shape characteristics of a distribution are of crucial importance in comparing it with
other distributions.

Skewness

• The term skewness refers to the lack of symmetry. A distribution is symmetric when it has the same
shape on either side of the center.

• The lack of symmetry in a distribution is always determined with reference to a normal (bell
shaped) distribution.

• Any departure of a distribution from symmetry leads to an asymmetric distribution, and in such a case
we call the distribution skewed.

• The skewness may be either positive or negative.


Skewness (cont…)

• Shape of a normal distribution


Positively skewed distribution

• Skewness is said to be positive if the right side tail is longer than the left side tail.

• When the skewness is positive, the associated distribution is positively skewed. For a
positively skewed distribution, the following relationship usually holds:
Mode < Median < Mean.
Negatively skewed distribution

• Skewness is said to be negative if the left side tail is longer than the right side tail.

• When the skewness is negative, we call the distribution negatively skewed. For a
negatively skewed distribution, usually Mean < Median < Mode.
How to Detect the Shape of a Distribution
• The shape of a distribution can be detected graphically by plotting a histogram, a frequency
polygon or frequency curve, a box plot, or a stem-and-leaf plot.

• The shape of a distribution can also be detected numerically by computing the mean, median,
mode, quartiles, deciles, and percentiles, or by some measures of skewness.
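The numerical approach can be sketched in Python by comparing the three averages. The sketch below uses the IQ scores from the mean deviation example later in these slides; the function name `shape_from_averages` is an assumption, and the ordering of mean, median and mode is only a rule of thumb for the direction of skewness, not a proof.

```python
import statistics

def shape_from_averages(data):
    """Classify the shape using the rule-of-thumb ordering of mean, median and mode."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)
    if mode < median < mean:
        return "positively skewed"
    if mean < median < mode:
        return "negatively skewed"
    return "no clear direction from this rule"

iq_scores = [95, 103, 105, 110, 104, 105, 112, 90]
# Here mean = 103, median = 104.5 and mode = 105.
shape = shape_from_averages(iq_scores)
print(shape)
```

With only 8 observations the result is a rough indication; a histogram or box plot of the same data is a useful cross-check.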
Measures of Dispersion

• A measure of the scatter of the values of a dataset among themselves is called a
measure of dispersion or measure of variation.

• Necessity: A measure of central value (average) alone cannot adequately describe a set of
observations. It gives no idea of how the observations are spread around that value. For this
reason, it is necessary to study the dispersion (variability) along with the average when describing
a dataset.

• Purpose of measure of dispersion or variation: A measure of dispersion is important for
the following purposes:

✓ To determine the reliability of an average.

✓ To compare two or more series with regard to their variability.

✓ To characterize a frequency distribution, for which it is one of the most important quantities.


Different measures of dispersion

• Absolute measures of dispersion: When dispersion is measured in the original units, it is
called absolute dispersion. The four important absolute measures of dispersion are as
follows:

✓ Range

✓ Quartile deviation

✓ Mean deviation

✓ Standard deviation
Different measures of dispersion (cont…)

• Relative measures of dispersion: A relative measure is a unit-free measure of dispersion. When
various sets of data are expressed in different units, the absolute measures are not
comparable. Even with identical units of measurement, when the average values of two
or more sets are very different, the absolute measures give misleading conclusions. In
such cases relative measures of dispersion should be used. The four important relative
measures of dispersion are as follows:

✓ Coefficient of range

✓ Coefficient of quartile deviation

✓ Coefficient of mean deviation

✓ Coefficient of variation
Measure of Dispersion: Mean deviation

• Mean deviation: The arithmetic mean of the absolute values of the deviations from a typical
value of a distribution is called the mean deviation. The typical value may be the arithmetic
mean or the median. If the typical value is the mean, the mean deviation is called the mean
deviation about the mean. It is denoted by $MD(\bar{x})$. By definition,

$$MD(\bar{x}) = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$

• Example: Consider the following sampled data of IQ scores: 95, 103, 105, 110, 104, 105,
112, 90. Compute the mean deviation.
Measure of Dispersion: Mean deviation (cont…)

• To compute the mean deviation, we can construct the following table, where $\bar{x} = 824/8 = 103$:

IQ score ($x_i$)    $\bar{x}$    $x_i - \bar{x}$    $|x_i - \bar{x}|$
95                  103          -8                 8
103                 103          0                  0
105                 103          2                  2
110                 103          7                  7
104                 103          1                  1
105                 103          2                  2
112                 103          9                  9
90                  103          -13                13
$\sum x_i = 824$                                    $\sum |x_i - \bar{x}| = 42$

• Here $n = 8$, so we have

$$MD(\bar{x}) = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} = \frac{42}{8} = 5.25$$

• The mean deviation of the scores is 5.25. The IQ scores deviate, on average, by 5.25 from the
mean.
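The table calculation can be reproduced with a few lines of Python. This is a minimal sketch; the function name `mean_deviation` is an assumption used for illustration.

```python
def mean_deviation(values):
    """Mean deviation about the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

iq_scores = [95, 103, 105, 110, 104, 105, 112, 90]
md = mean_deviation(iq_scores)
print(md)  # 5.25
```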
Different Measures of Dispersion: Standard deviation

• The arithmetic mean of the squared deviations from the mean is called variance. The
positive square root of the variance is known as the standard deviation.

• The sample variance is denoted by $S^2$. The sample variance of $n$ values $x_1, x_2, \ldots, x_n$ is defined as

$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

We can write $\sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2$. Therefore the sample variance of
$n$ values $x_1, x_2, \ldots, x_n$ can be calculated by

$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left(\sum x_i^2 - n\bar{x}^2\right)$$

• As the sample variance is denoted by $S^2$, the sample standard deviation is denoted by $S$
and is defined as $S = +\sqrt{S^2}$.

• Example: Consider the following sampled data of IQ scores: 95, 103, 105, 110, 104, 105,
112, 90. Compute the standard deviation.
Standard deviation (cont…)

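• To compute the standard deviation, we can construct the following table, where $\bar{x} = 824/8 = 103$:

IQ score ($x_i$)    $\bar{x}$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
95                  103          -8                 64
103                 103          0                  0
105                 103          2                  4
110                 103          7                  49
104                 103          1                  1
105                 103          2                  4
112                 103          9                  81
90                  103          -13                169
$\sum x_i = 824$                                    $\sum (x_i - \bar{x})^2 = 372$

• Here $n = 8$, so we have

$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{372}{7} = 53.14286$$

• Equivalently,

$$S^2 = \frac{1}{n - 1}\left(\sum x_i^2 - n\bar{x}^2\right) = \frac{1}{7}\left(95^2 + \cdots + 90^2 - 8 \times 103^2\right) = \frac{1}{7}(85244 - 84872) = \frac{372}{7} = 53.14286$$

• Therefore the sample standard deviation is $S = \sqrt{53.14286} = 7.289915$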
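Both routes to the sample variance can be verified in Python. The sketch below is illustrative only; the function names are assumptions, and the divisor is $n - 1$ as in the definition above.

```python
import math

def sample_variance(values):
    """Sample variance from squared deviations about the mean (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Same quantity via the identity sum((x - mean)^2) = sum(x^2) - n * mean^2."""
    n = len(values)
    mean = sum(values) / n
    return (sum(x * x for x in values) - n * mean ** 2) / (n - 1)

iq_scores = [95, 103, 105, 110, 104, 105, 112, 90]
s2 = sample_variance(iq_scores)                    # 372 / 7
s2_shortcut = sample_variance_shortcut(iq_scores)  # (85244 - 84872) / 7
s = math.sqrt(s2)                                  # about 7.29
print(round(s2, 5), round(s2_shortcut, 5), round(s, 4))
```

The shortcut form is convenient for hand calculation, but on a computer the deviation form is usually preferred because subtracting two large, nearly equal sums can lose precision.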


Properties of Variance

• Every set of interval or ratio level data has a variance. That is, the variance can be computed only
for a quantitative variable.

• The variance is unique, and all the values are included in computing it.

• Variance is independent of the origin but depends on the scale of measurement. If
$y_i = a + b x_i$, $i = 1, 2, \ldots, n$, for some constants $a$ and $b$, then $S_y^2 = b^2 S_x^2$. Similarly, if
$y_i = a - b x_i$, $i = 1, 2, \ldots, n$, for some constants $a$ and $b$, then $S_y^2 = b^2 S_x^2$.

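• If a set consists of $n_1$ observations $x_{11}, x_{12}, \ldots, x_{1n_1}$ with variance $S_{x_1}^2$ and a
second set consists of $n_2$ observations $x_{21}, x_{22}, \ldots, x_{2n_2}$ with variance $S_{x_2}^2$, then
the combined (pooled) variance of the $n_1 + n_2$ observations is given by

$$S_c^2 = \frac{(n_1 - 1) S_{x_1}^2 + (n_2 - 1) S_{x_2}^2}{n_1 + n_2 - 2}$$

Note that this formula pools the two within-set variances; it does not account for any difference
between the means of the two sets.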
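The change-of-origin-and-scale property and the combined variance formula can both be illustrated in Python. In this sketch the constants `a` and `b` and the summary values passed to `pooled_variance` are hypothetical numbers chosen only for illustration; the IQ scores are reused from the earlier examples.

```python
import math
from statistics import variance  # sample variance, divisor n - 1

# Property: if y = a + b*x then S_y^2 = b^2 * S_x^2.
x = [95, 103, 105, 110, 104, 105, 112, 90]
a, b = 10, -2                     # hypothetical constants
y = [a + b * xi for xi in x]
var_x = variance(x)
var_y = variance(y)
print(math.isclose(var_y, b ** 2 * var_x))  # True

def pooled_variance(n1, var1, n2, var2):
    """Combined (pooled) variance of two sets from their sizes and variances."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# Hypothetical summaries: 10 observations with variance 4, 15 with variance 9.
s2_c = pooled_variance(10, 4.0, 15, 9.0)    # (9*4 + 14*9) / 23 = 162 / 23
print(round(s2_c, 4))
```

Adding the constant `a` shifts every value equally and leaves the spread unchanged, while multiplying by `b` stretches every deviation by `|b|`, which is why only `b ** 2` appears in the result.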
Relative Measures of Dispersion

• All relative measures of dispersion are computed using an absolute measure along with
a measure of central tendency (location). Relative measures are usually expressed as
percentages. We will discuss two measures, one based on the mean deviation and one based on the
standard deviation.

• Coefficient of Mean Deviation: The coefficient of mean deviation ($C_{MD}$) is
computed by the following formula:

$$C_{MD} = \frac{\text{mean deviation}}{\text{mean}} \times 100$$

• Coefficient of Variation (CV): The coefficient of variation (CV) is computed as the ratio of
the standard deviation of a dataset to the mean of the same dataset, expressed as a percentage. That is,

$$CV = \frac{\text{standard deviation}}{\text{mean}} \times 100$$
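As an illustration, both coefficients can be computed for the IQ scores used in the earlier examples. This is a sketch under the definitions above; the function names are assumptions.

```python
import statistics

def coefficient_of_mean_deviation(values):
    """Mean deviation about the mean, as a percentage of the mean."""
    mean = statistics.mean(values)
    md = sum(abs(x - mean) for x in values) / len(values)
    return md / mean * 100

def coefficient_of_variation(values):
    """Sample standard deviation (divisor n - 1), as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

iq_scores = [95, 103, 105, 110, 104, 105, 112, 90]
c_md = coefficient_of_mean_deviation(iq_scores)  # 5.25 / 103 * 100
cv = coefficient_of_variation(iq_scores)         # 7.289915 / 103 * 100
print(round(c_md, 2), round(cv, 2))
```

Because both coefficients are unit free, they can be compared across datasets measured in different units or with very different means.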
Exercise
• (1) The following data give the nutritional status (actual weight expressed as a percentage of
expected weight for actual height) of the 20 cases studied:
73.3, 54.6, 82.4, 76.5, 72.2, 73.6, 74.0, 80.5, 71.0, 56.8, 80.6, 100.0, 79.6, 67.3,
50.4, 66.0, 83.0, 72.3, 55.7, 64.1.
For these data compute the following descriptive measures: mean, median, mode.
• (2) The following table gives the age distribution for the number of deaths in New York State due
to accidents for residents age 25 and older.

Compute the following descriptive measures: mean, median, mode.


Exercise (cont…)
• (3) Suppose we want to determine the vocabulary comprehension of first-grade students in
a particular elementary class. The following is a sample of scores that might be found:

100, 87, 84, 100, 53, 54, 98, 89, 67, 115, 80, 76, 72, 70, 91, 110, 94, 79, 86, 91, 93, 105, 83,
89, 92, 84, 100, 81, 105, 86, 95, 80, 69, 77, 74, 79, 64, 61

✓ (a) Find the mean deviation.

✓ (b) Compute the standard deviation.


Thank You
