Lecture 1


Sampling Terminologies

i) Population
A group of individual persons, objects or items from which samples are taken.
ii) Sample: A sample is a subset of the population. A sample is a finite part of the
population whose properties are studied to gain information about the whole
population.

• Assume we have a population of 100 pebble stones.
• We are interested in the weight of the stones.
• We shall examine different methods of obtaining a sample of size 10.

iii) Survey/study population: This is the finite population from which we will select
our samples (the 100 stones).
iv) Population characteristic: this is the aspect of the population we wish to measure.
In this case, it is the weight of the pebbles.
v) Sampling unit: the individual unit we are sampling. In this case, it is an individual
pebble.
vi) Sampling frame: A list of all sampling units in the survey/study population. In this
case, it is a list containing the stone numbers 1 to 100.
vii) Census: A survey consisting of every member of the population. A census would
involve weighing all 100 stones

Reasons for sampling
• Cost: sampling a fraction of the population is cheaper (more cost-effective) than
conducting a census.
• Time: sampling rather than taking a census saves time.
• Accessibility: some populations are only partly accessible, so a census is not possible.
• Size: some populations are too large to measure in full.
• Accuracy: a sample can sometimes be measured more carefully than a whole population.

Sampling Methods
Sampling is the act, process or technique of selecting a suitable sample, that is, a
representative part of the population, for the purpose of determining parameters or
characteristics of the whole population.
i) Accessibility sampling: the most easily obtained observations are chosen.
ii) Judgment sampling: the experimenter chooses the sample based on what he
or she thinks is a representative sample
iii) Quota sampling: this typically combines accessibility and judgment sampling.

iv) Random sampling: members of the sample are chosen at random. There are
two types of basic random sampling.
• Simple random sampling: this is random sampling without
replacement. Each population member is either not in the sample or in it
exactly once. Simple random sampling gives equal probability of selection
to every permitted (unordered) sample of a given size.
• Unrestricted random sampling: this is random sampling with
replacement. All population members are available for each
random selection, so a population member may be in the sample more
than once.

v) Stratified sampling:
Sometimes subpopulations within an entire population vary considerably. In this
case, it is advantageous to divide the population into subpopulations called strata
and then perform simple random sampling within each stratum. This is known
as stratified sampling.

vi) Cluster and multi-stage sampling
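The random and stratified schemes above can be sketched in a few lines of Python. This is only an illustration, assuming the sampling frame is the list of pebble numbers 1 to 100; the seed, the two strata and their names are made up for the example and are not part of the notes.

```python
import random

# Hypothetical sampling frame: pebble numbers 1..100.
frame = list(range(1, 101))
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simple random sampling: without replacement, so no pebble appears twice.
srs = rng.sample(frame, 10)

# Unrestricted random sampling: with replacement, so repeats are possible.
urs = rng.choices(frame, k=10)

# Stratified sampling: split the frame into strata, then take a simple
# random sample inside each stratum (two made-up strata of 50 pebbles each).
strata = {"first_half": frame[:50], "second_half": frame[50:]}
stratified = {name: rng.sample(units, 5) for name, units in strata.items()}

print(srs)
print(urs)
print(stratified)
```

`random.sample` never repeats a unit, while `random.choices` may, which is exactly the difference between the two basic random schemes.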

Types of data

Quantitative and qualitative data

Descriptive statistics
Presentation of data:
Tables: frequency tables, cumulative frequency tables and stem-and-leaf tables
Graphs: histograms, frequency polygons, cumulative frequency polygons

Steps involved in making the frequency table


i) Compute the range $r$ of the data.

If $X_{\max}$ is the largest observation and $X_{\min}$ is the lowest observation, then

$$r = X_{\max} - X_{\min}$$


ii) Find the number of classes $c$ and the class width (class length) $w$.

The number of classes is the smallest $c$ such that $2^c \ge n$, where $n$ is the number of observations (sample size).

Example 1

i) $n = 100$, find $c$.

We look for the first $c$ with $2^c \ge 100$:

$2^1 = 2 < 100$

$2^2 = 4 < 100$

$2^3 = 8 < 100$

$2^4 = 16 < 100$

$2^5 = 32 < 100$

$2^6 = 64 < 100$

$2^7 = 128 \ge 100$

So $c$ = number of classes = 7.

ii) If $n = 250$, find $c$.

$c = 8$ classes, since $2^8 = 256 \ge 250$.

The width is given by

$$w = \frac{r}{c} = \frac{\text{range}}{\text{number of classes}}$$

(round off to the nearest whole number).

iii) Define the class limits (lower limit $L_i$ and upper limit $U_i$).

$L_1$, the lower limit of the first class, should be less than or equal to $X_{\min}$.

The upper limit of each class is

$$U_i = L_i + w$$

The class mark is

$$X_i = \frac{L_i + U_i}{2}, \quad i = 1, 2, 3, \dots, c$$

iv) Get the frequencies $n_i$, $i = 1, 2, \dots, c$.

We can also get the relative frequencies, given by

$$F_i = \frac{n_i}{n}$$

The class notation [10 − 20[ means 10 is included but 20 is not.

Frequency table:

Class #   Class limit    Class mark (X_i)   CF_i                    Relative freq (F_i)   Freq (n_i)

1         [L_1 − U_1[    X_1                n_1                     F_1                   n_1

2         [L_2 − U_2[    X_2                n_1 + n_2               F_2                   n_2

⋮         ⋮              ⋮                  ⋮                       ⋮                     ⋮

c         [L_c − U_c[    X_c                n_1 + n_2 + ⋯ + n_c     F_c                   n_c
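Steps i) to iii) can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the function names are made up, rounding is done half-up, and the inputs 40, 74 and 100 are the extremes and sample size of the exam-score example that appears later in these notes.

```python
def number_of_classes(n):
    """Smallest c with 2**c >= n, starting the search at c = 1."""
    c = 1
    while 2 ** c < n:
        c += 1
    return c


def class_limits(x_min, x_max, n, first_lower=None):
    """Return a list of (lower, upper) limits for classes [lower, upper[."""
    r = x_max - x_min                  # step i: range
    c = number_of_classes(n)           # step ii: number of classes
    w = int(r / c + 0.5)               # class width, rounded half-up
    # Step iii: the first lower limit must not exceed x_min.
    lower = x_min if first_lower is None else first_lower
    return [(lower + i * w, lower + (i + 1) * w) for i in range(c)]


limits = class_limits(40, 74, 100)
print(number_of_classes(100), number_of_classes(250))
print(limits)
```

Each pair in `limits` is read as a half-open class, so 45 belongs to the second class, not the first.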

Stem-and-leaf table

Example

Take marks obtained in MA 110

50 , 55 , 58 , 60 , 62 , 63 , 64 , 69 , 70 , 72 , 77 , 80 , 84

Stems Leaves

5 0 5 8

6 0 2 3 4 9

7 0 2 7

8 0 4
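The same table can be built programmatically. The sketch below assumes two-digit marks, so that the tens digit is the stem and the units digit is the leaf; the function name is illustrative.

```python
from collections import defaultdict

# MA 110 marks from the example above.
marks = [50, 55, 58, 60, 62, 63, 64, 69, 70, 72, 77, 80, 84]


def stem_and_leaf(values):
    """Group sorted values by tens digit (stem); units digits are the leaves."""
    table = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        table[stem].append(leaf)
    return dict(table)


table = stem_and_leaf(marks)
for stem, leaves in table.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```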

Numerical or Descriptive Measures

Measures of location: Mean, mode, median

Measures of spread: variance, standard deviation

Measures of skewness: coefficient of skewness

Measures of relative spread: Coefficient of variation

Ungrouped Data (raw data):

Mean, $\bar{X}$ ($X$ bar):

For sample data $X_1, X_2, \dots, X_n$,

$$\bar{X} = \frac{\text{sum of all observations}}{\text{size of sample}} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Median

Median observation refers to the middle observation or value when the data are
arranged in increasing sequence.

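Ordered sample data: $X_{(1)}, X_{(2)}, X_{(3)}, \dots, X_{(n)}$, where $X_{(1)} = X_{\min}$ and $X_{(n)} = X_{\max}$.

$$M_d = \begin{cases} X_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\[4pt] \dfrac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even} \end{cases}$$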

Example

Find the median in the following sequence

50, 85, 47, 62, 58, 80 (sample size 𝑛 = 6)
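The notes leave this example unsolved. A minimal sketch of the median rule, applied to the six values above, is given here; the function name is illustrative. Sorting gives 47, 50, 58, 62, 80, 85, and since $n = 6$ is even the median is the average of the 3rd and 4th ordered values.

```python
def median(values):
    """Median of raw data using the odd/even rule on the ordered sample."""
    x = sorted(values)
    n = len(x)
    mid = n // 2
    if n % 2 == 1:
        return x[mid]                 # X_((n+1)/2) in 1-based notation
    return (x[mid - 1] + x[mid]) / 2  # average of X_(n/2) and X_(n/2 + 1)


scores = [50, 85, 47, 62, 58, 80]
md = median(scores)
print(md)  # (58 + 62) / 2 = 60.0
```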

Example

Find the variance for the sequence

2, 4, 7, 11, 15

$$\bar{X} = \frac{39}{5} = 7.8$$

$$Var(X) = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$

Variance

$$S^2 = \frac{(2 - 7.8)^2 + (4 - 7.8)^2 + (7 - 7.8)^2 + (11 - 7.8)^2 + (15 - 7.8)^2}{5 - 1} = \frac{110.8}{4} = 27.7$$

Standard deviation $S$

$$S = \sqrt{S^2}$$

For the sequence above, $S = \sqrt{27.7} \approx 5.26$.
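The variance example can be checked with a short Python sketch. The function name is illustrative; the divisor $n - 1$ follows the formula used in these notes.

```python
import math


def sample_variance(values):
    """Sample variance with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [2, 4, 7, 11, 15]
s2 = sample_variance(data)   # about 27.7
s = math.sqrt(s2)            # about 5.26
print(round(s2, 4), round(s, 4))
```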

Measure of skewness

$$S_k = \text{coefficient of skewness} = \frac{3(\bar{X} - M_d)}{S}$$

i) If 𝑆𝑘 = 0 (meaning 𝑋̅ = median), then the data set is symmetrical

ii) If 𝑆𝑘 is less than 0 (𝑋̅ < 𝑚𝑒𝑑𝑖𝑎𝑛), then the data is skewed to the left

iii) If 𝑆𝑘 is greater than 0(𝑋̅ > 𝑚𝑒𝑑𝑖𝑎𝑛), then the data is skewed to the right.

COEFFICIENT OF VARIATION

$$CV = \frac{S}{\bar{X}}$$
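As an illustration of both measures, the sketch below reuses the sequence 2, 4, 7, 11, 15 from the variance example. Applying the skewness and variation formulas to that sequence is a choice made here for illustration; the notes do not work this case.

```python
import statistics

data = [2, 4, 7, 11, 15]

mean = statistics.mean(data)     # 7.8
md = statistics.median(data)     # 7
s = statistics.stdev(data)       # sample standard deviation (n - 1 divisor)

sk = 3 * (mean - md) / s         # coefficient of skewness
cv = s / mean                    # coefficient of variation

print(round(sk, 3), round(cv, 3))
```

Here the mean exceeds the median, so the coefficient of skewness is positive and the data are skewed to the right.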

GROUPED DATA

To group the data, we will arrange the data in a table as follows:

ID #    Class limit            Class mark (x_i)   Freq (n_i)   CF_i        n_i x_i              R. freq (f_i = n_i / n)   n_i (x_i − x̄)^2

1       [L_1 − U_1[            x_1                n_1          CF_1        n_1 x_1              n_1 / n                   n_1 (x_1 − x̄)^2

2       [L_2 − U_2[            x_2                n_2          CF_2        n_2 x_2              n_2 / n                   n_2 (x_2 − x̄)^2

⋮       ⋮                      ⋮                  ⋮            ⋮           ⋮                    ⋮                         ⋮

k − 1   [L_{k−1} − U_{k−1}[    x_{k−1}            n_{k−1}      CF_{k−1}    n_{k−1} x_{k−1}      n_{k−1} / n               ⋮

k       [L_k − U_k[            x_k                n_k          CF_k        n_k x_k              n_k / n                   ⋮

k + 1   [L_{k+1} − U_{k+1}[    x_{k+1}            n_{k+1}      CF_{k+1}    n_{k+1} x_{k+1}      n_{k+1} / n               ⋮

⋮       ⋮                      ⋮                  ⋮            ⋮           ⋮                    ⋮                         ⋮

c       [L_c − U_c[            x_c                n_c          CF_c        n_c x_c              n_c / n                   n_c (x_c − x̄)^2

Mean (sample mean):

$$\bar{X} = \frac{n_1 X_1 + n_2 X_2 + \dots + n_c X_c}{n} = \frac{1}{n}\sum_{i=1}^{c} n_i X_i$$

Variance ($S^2$) (sample variance):

$$S^2 = \frac{n_1 (X_1 - \bar{X})^2 + n_2 (X_2 - \bar{X})^2 + \dots + n_c (X_c - \bar{X})^2}{n - 1} = \frac{1}{n-1}\sum_{i=1}^{c} n_i (X_i - \bar{X})^2$$

Median ($M_d$)

Locate the median class $k$, that is, the first class such that

$$CF_k \ge \frac{n}{2}$$

where $CF_k$ is the cumulative frequency for the $k$-th class and $n$ is the sample size.

Then use the formula

$$M_d = L_k + \frac{w}{n_k}\left(\frac{n}{2} - CF_{k-1}\right)$$

to compute the median.

Mode (𝑀𝑜 )

Locate the modal class (the class having the highest frequency). If we let $k$ be the modal
class, then

$$M_o = L_k + w\left(\frac{d_1}{d_1 + d_2}\right)$$

where

$$d_1 = n_k - n_{k-1}, \qquad d_2 = n_k - n_{k+1}$$

Example

The following data represent the examination scores obtained by 100 students in
the MA110 course.

40 41 42 44 45 46 46 46 47 47 47 47
48 48 49 50 50 50 51 51 52 52 52 52
52 52 53 53 53 53 53 53 54 54 54 55
55 55 55 56 56 56 56 56 57 57 57 57
57 57 57 57 57 57 57 58 58 58 58 58
58 58 59 59 59 59 60 60 60 60 61 61
61 61 61 62 62 62 63 63 63 63 64 64
64 65 65 66 66 67 67 67 67 68 68 69
70 71 72 74

a. Construct a stem-and-leaf plot and use it to determine the mode and the
median for the above ungrouped data.

Stem   Leaves

4      0 1 2 4 5 6 6 6 7 7 7 7 8 8 9
5      0 0 0 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 5 5 5 5 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 9 9 9 9
6      0 0 0 0 1 1 1 1 1 2 2 2 3 3 3 3 4 4 4 5 5 6 6 7 7 7 7 8 8 9
7      0 1 2 4

Summary (number of leaves per stem)

Stem     4    5    6    7    Total

Leaves   15   51   30   4    100

Mode ($M_o$) = 57%

This mark has the highest frequency: more students (11) got 57% than any other mark.


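$$\text{Median} = \frac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2}, \quad n = 100 \text{ (sample size)}$$

$$= \frac{X_{(50)} + X_{(51)}}{2} = \frac{57 + 57}{2} = 57$$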

Roughly, 50% of the students got above 57% mark and 50% of the students got below
or equal to 57%.

b. Construct an absolute frequency distribution and an absolute cumulative
frequency distribution.
c. Construct a histogram and a frequency polygon, and use the histogram to estimate
the mode. Use the cumulative frequency polygon to estimate the median.
d. Determine the mean, the mode, the median, the variance, the standard deviation
and the coefficient of skewness for the grouped data. Interpret these numbers
correctly.

Answers

b. Absolute frequency and absolute cumulative frequency distribution

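i. First get the range.

$$r = X_{\max} - X_{\min} = 74 - 40 = 34$$

ii. Find the number of classes.

The number of classes is the smallest $c$ with $2^c \ge n$. With $n = 100$, $2^7 = 128 \ge 100$.

∴ $c = 7$ classes

iii. Determine the class width.

$$w = \frac{r}{c} = \frac{34}{7} \approx 4.86$$

which rounds off to $w = 5$.

iv. Determine the first lower limit (it should be less than or equal to $X_{\min}$).

NB: The frequency polygon starts and ends with a frequency of zero.

$L_1 \le 40$, thus we can take 35 as our first lower limit, which implies that

$$U_1 = L_1 + w = 35 + 5 = 40$$

This means that the first class is [35 − 40[. It is an empty class, listed as class 0 in the table below, so that the frequency polygon can start at zero.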

Note that the table below is the absolute cumulative frequency table because it has
both the frequency and the cumulative frequency columns.

ID #    Class limit [L_i − U_i[   Class mark (x_i)   Frequency (n_i)   n_i x_i   CF_i   n_i (x_i − x̄)^2

0       [35 − 40[                 37.5               0                 0         0      0

1       [40 − 45[                 42.5               4                 170       4      894.01

2       [45 − 50[                 47.5               11                522.5     15     1089.0275

3       [50 − 55[                 52.5               20                1050      35     490.05

4       [55 − 60[                 57.5               31                1782.5    66     0.0775

5       [60 − 65[                 62.5               19                1187.5    85     484.5475

6       [65 − 70[                 67.5               11                742.5     96     1111.0275

7       [70 − 75[                 72.5               4                 290       100    906.01

8       [75 − 80[                 77.5               0                 0         100    0

Total                                                100 = n           5745             4974.75
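The frequency and cumulative frequency columns can be reproduced from the raw scores. The sketch below is one way to do it, assuming the seven data classes start at 40 with width 5 (the two empty end classes are left out); the variable names are illustrative.

```python
from itertools import accumulate

# The 100 examination scores from the example.
scores = [
    40, 41, 42, 44, 45, 46, 46, 46, 47, 47, 47, 47,
    48, 48, 49, 50, 50, 50, 51, 51, 52, 52, 52, 52,
    52, 52, 53, 53, 53, 53, 53, 53, 54, 54, 54, 55,
    55, 55, 55, 56, 56, 56, 56, 56, 57, 57, 57, 57,
    57, 57, 57, 57, 57, 57, 57, 58, 58, 58, 58, 58,
    58, 58, 59, 59, 59, 59, 60, 60, 60, 60, 61, 61,
    61, 61, 61, 62, 62, 62, 63, 63, 63, 63, 64, 64,
    64, 65, 65, 66, 66, 67, 67, 67, 67, 68, 68, 69,
    70, 71, 72, 74,
]

first_lower, w, c = 40, 5, 7   # classes [40-45[, [45-50[, ..., [70-75[

# Each score s falls in class number (s - first_lower) // w (0-based).
freq = [0] * c
for s in scores:
    freq[(s - first_lower) // w] += 1

cum_freq = list(accumulate(freq))   # running totals CF_i

print(freq)
print(cum_freq)
```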

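For grouped data

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{c} n_i x_i = \frac{1}{100} \times 5745 = 57.45$$

The last column of the table is then computed class by class. For class 1, for example,

$$n_1 (x_1 - \bar{x})^2 = 4(42.5 - 57.45)^2 = 894.01$$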


Construct the histogram

Histogram: plot lower limit against frequency, giving the points (35, 0), (40, 4), (45, 11), (50, 20), (55, 31), and so on.

Frequency polygon: plot class mark against frequency.

[Figure: histogram with the frequency polygon. Horizontal axis: lower limit, 35 to 80. Vertical axis: frequency, 0 to 35. Estimate read from the histogram: mode ≈ 57.]

CF polygon: plot upper limit against $CF_i$, giving the points (40, 0), (45, 4), (50, 15), and so on.

[Figure: cumulative frequency polygon. Horizontal axis: upper limit, 40 to 80. Vertical axis: cumulative frequency, 0 to 100. Estimate read from the polygon: median ≈ 57.]

Mean:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{c} n_i x_i = \frac{1}{100}(5745) = 57.45$$

Mode ($M_o$)

The modal class is class 4, so $k = 4$.

$d_1 = n_4 - n_3 = 31 - 20 = 11$

$d_2 = n_4 - n_5 = 31 - 19 = 12$

$$M_o = L_k + w\left(\frac{d_1}{d_1 + d_2}\right) = 55 + 5\left(\frac{11}{11 + 12}\right) \approx 57.39$$
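The grouped-mode calculation can be sketched in Python. The function name is illustrative, and treating a missing neighbour class as having frequency zero is an assumption made here so that the sketch also works when the modal class is the first or last one.

```python
def grouped_mode(lower_limits, freqs, w):
    """Mode estimate L_k + w * d1 / (d1 + d2) for the modal class k."""
    k = freqs.index(max(freqs))                          # modal class (0-based)
    prev_f = freqs[k - 1] if k > 0 else 0                # assumed 0 if no class before
    next_f = freqs[k + 1] if k < len(freqs) - 1 else 0   # assumed 0 if no class after
    d1 = freqs[k] - prev_f
    d2 = freqs[k] - next_f
    return lower_limits[k] + w * d1 / (d1 + d2)


lower_limits = [40, 45, 50, 55, 60, 65, 70]
freqs = [4, 11, 20, 31, 19, 11, 4]
mo = grouped_mode(lower_limits, freqs, 5)
print(round(mo, 2))  # 55 + 5 * 11 / 23
```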

Median ($M_d$)

Find the median class from the table using the following procedure.

*The modal class will not always be the median class; in our example, it is just a
coincidence.

We find the median class by comparing $\frac{n}{2}$ with the cumulative frequencies $CF_i$. Since $n = 100$, $\frac{n}{2} = 50$. So the median
class is the class that reaches 50 (for the first time) in the cumulative frequency column,
that is

Class 1: $CF_1 = 4$

Class 2: $CF_2 = 15$

Class 3: $CF_3 = 35$

Class 4: $CF_4 = 66$

Thus class 4 is the median class because it reaches 50 for the first time.

∴ $k = 4$

$$M_d = L_k + \frac{w}{n_k}\left(\frac{n}{2} - CF_{k-1}\right) = L_4 + \frac{w}{n_4}\left(\frac{n}{2} - CF_3\right)$$

$$= 55 + \frac{5}{31}\left(\frac{100}{2} - 35\right) \approx 57.42$$
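A minimal Python sketch of the same procedure is given below: it walks down the classes, keeps the cumulative frequency of the classes before the current one, and stops at the first class whose cumulative frequency reaches $n/2$. The function name is illustrative.

```python
def grouped_median(lower_limits, freqs, w):
    """Median estimate L_k + (w / n_k) * (n/2 - CF_{k-1})."""
    n = sum(freqs)
    cf_prev = 0                          # cumulative frequency before class k
    for lower, f in zip(lower_limits, freqs):
        if cf_prev + f >= n / 2:         # first class with CF_k >= n/2
            return lower + (w / f) * (n / 2 - cf_prev)
        cf_prev += f
    raise ValueError("no classes given")


lower_limits = [40, 45, 50, 55, 60, 65, 70]
freqs = [4, 11, 20, 31, 19, 11, 4]
md = grouped_median(lower_limits, freqs, 5)
print(round(md, 2))  # 55 + (5 / 31) * (50 - 35)
```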
Variance ($S^2$)

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{c} n_i (X_i - \bar{X})^2 = \frac{1}{100 - 1}(4974.75) = \frac{4974.75}{99} = 50.25$$

Standard deviation

$$s = \sqrt{S^2} = \sqrt{50.25} \approx 7.09$$

Coefficient of skewness

$$S_k = \frac{3(\bar{X} - M_d)}{S} = \frac{3(57.45 - 57.42)}{7.09} \approx 0.013$$

Rounded off to the nearest whole number, $S_k = 0$.

The data are approximately symmetrical since $S_k \approx 0$.
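The grouped mean, variance, standard deviation and skewness above can be checked together. The sketch assumes the class marks and frequencies of the seven data classes in the table and plugs in the rounded median 57.42 from the notes; variable names are illustrative.

```python
import math

class_marks = [42.5, 47.5, 52.5, 57.5, 62.5, 67.5, 72.5]
freqs = [4, 11, 20, 31, 19, 11, 4]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, class_marks)) / n
s2 = sum(f * (x - mean) ** 2 for f, x in zip(freqs, class_marks)) / (n - 1)
s = math.sqrt(s2)

md = 57.42                   # grouped median, rounded value taken from the notes
sk = 3 * (mean - md) / s     # coefficient of skewness

print(round(mean, 2), round(s2, 2), round(s, 2), round(sk, 3))
```

The skewness comes out slightly above zero, which agrees with reading the distribution as approximately symmetrical.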

Q11 (worksheet)

Class i   Class limit [L_i − U_i[   Tally                             Frequency (n_i)   Cumulative frequency (relative)   Class mark (x_i)   n_i x_i   n_i (x_i − x̄)^2

1         [10 − 12[                 ||||| ||||                        9                 9 (9/50)                          11                 99        53.5824

2         [12 − 14[                 ||||| ||||| ||||| ||||| ||||| |   26                35 (35/50)                        13                 338       5.0336

3         [14 − 16[                 ||||| |||||                       10                45 (45/50)                        15                 150       24.336

4         [16 − 18[                 |||||                             5                 50 (50/50)                        17                 85        63.368

Total                                                                 50                                                                     672       146.32

Mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{c} n_i x_i = \frac{1}{50}(672) = 13.44 \text{ (dollars)}$$

Variance

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{c} n_i (x_i - \bar{x})^2 = \frac{1}{50 - 1}(146.32) = \frac{146.32}{49} \approx 2.986$$

Mode

$$M_o = L_k + w\left(\frac{d_1}{d_1 + d_2}\right)$$

The modal class is class 2, thus $k = 2$ and $L_2 = 12$.

$d_1 = n_2 - n_1 = 26 - 9 = 17$

$d_2 = n_2 - n_3 = 26 - 10 = 16$

$$M_o = 12 + 2\left(\frac{17}{17 + 16}\right) \approx 13.03 \text{ (dollars)}$$

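Median

$$M_d = L_k + \frac{w}{n_k}\left(\frac{n}{2} - CF_{k-1}\right)$$

$CF \ge \frac{n}{2}$ for the first time in class 2, thus $k = 2$.

$L_k = 12$, $w = 2$, $n_k = 26$, $CF_{k-1} = CF_1 = 9$

$$M_d = 12 + \frac{2}{26}\left(\frac{50}{2} - 9\right) = 12 + \frac{1}{13}(25 - 9) \approx 13.23$$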
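The Q11 answers can be verified in one short Python sketch. It assumes the four classes and frequencies from the worksheet table, and that the modal class has a neighbour on each side (true here); the variable names are illustrative.

```python
lower = [10, 12, 14, 16]       # lower class limits
freqs = [9, 26, 10, 5]         # class frequencies
w = 2                          # class width

class_marks = [lo + w / 2 for lo in lower]
n = sum(freqs)

mean = sum(f * x for f, x in zip(freqs, class_marks)) / n
var = sum(f * (x - mean) ** 2 for f, x in zip(freqs, class_marks)) / (n - 1)

# Mode: modal class k (0-based); assumes k is an interior class.
k = freqs.index(max(freqs))
d1 = freqs[k] - freqs[k - 1]
d2 = freqs[k] - freqs[k + 1]
mode = lower[k] + w * d1 / (d1 + d2)

# Median: first class whose cumulative frequency reaches n / 2.
cf_prev = 0
for lo, f in zip(lower, freqs):
    if cf_prev + f >= n / 2:
        median = lo + (w / f) * (n / 2 - cf_prev)
        break
    cf_prev += f

print(round(mean, 2), round(var, 3), round(mode, 2), round(median, 2))
```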

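13. Find the sample size $n$: $n = 300$.

To find a sample proportion, divide the number of sample members in the category by the sample size. For example,

$$\text{sample proportion (SOT)} = \frac{n(\text{SOT})}{\text{sample size}} = \frac{100}{300} = \frac{1}{3} = 0.\overline{3}$$

$$\frac{n(\text{female})}{\text{sample size}} = \frac{82}{300} = \frac{41}{150} = 0.27\overline{3}$$

$$\frac{n(\text{degree})}{\text{sample size}} = \frac{235}{300} = 0.78\overline{3}$$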

A probability should lie between 0 and 1

Probability is 1 when you are certain

When there is no chance of something happening, the probability is zero.
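The three sample proportions can be checked with exact fractions. The counts are the ones used in question 13 above; the dictionary keys are just labels chosen here.

```python
from fractions import Fraction

n = 300                                            # sample size
counts = {"SOT": 100, "female": 82, "degree": 235}

# Sample proportion = count in the category / sample size.
proportions = {name: Fraction(count, n) for name, count in counts.items()}

for name, p in proportions.items():
    print(name, p, round(float(p), 3))
```

Each proportion lies between 0 and 1, as any probability estimate must.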
