TABLE OF CONTENTS

1.0 FREQUENCY DISTRIBUTION AND GRAPH
1.1 Frequency Distribution
1.2 Frequency Distribution Table
1.3 Types of Graphs That Determine Frequency Distribution
1.4 Pattern of Distribution of Data
1.5 Workings

2.0 CENTRAL MEASURES OF LOCATION AND STATISTICAL MEASURES OF SPREAD
2.1 Central Measures of Location
2.2 Statistical Measures of Spread
2.3 Variability of the Athlete Results

3.0 CONFIDENCE INTERVAL AND LEVEL
3.1 Confidence Interval
3.2 Computation of Confidence Interval
3.3 Summary on Confidence Interval

4.0 CORRELATION

5.0 REGRESSION
5.1 Regression Analysis
5.2 Computation on Regression Analysis
5.3 Conclusion

References

1.0 FREQUENCY DISTRIBUTION AND GRAPH

1.1 FREQUENCY DISTRIBUTION


The next step after the completion of data collection is to organize the data into a meaningful form
so that a trend, if any, emerging out of the data can be seen easily. One of the common methods
for organizing data is to construct a frequency distribution.
According to Manikandan (2011), a frequency distribution is an organized tabulation and graphical
representation of the number of individuals in each category on the scale of measurement. It
allows the researcher to have a glance at the entire data conveniently. It shows whether the
observations are high or low and also whether they are concentrated in one area or spread out
across the entire scale.
Thus, a frequency distribution can be defined as the organization of raw data in table form, using
classes and frequencies in order to present a picture of how the individual observations are
distributed in the measurement scale (Bluman 2013).
The various components of the frequency distribution include the class interval, class boundaries,
midpoint or class mark, width or size of class interval, class frequency, frequency density, relative
frequency, etc.

1.2 FREQUENCY DISTRIBUTION TABLE


Given the results of the athletes' sprints in the women's training camp, a frequency distribution
table is hereby prepared to summarise and analyse the data.

Class          Class            Frequency  Midpoint  Cumulative  Relative   Cumulative
interval       boundary                              frequency   frequency  relative frequency
29.1 - 29.4    29.05 - 29.45        2       29.25         2        0.02        0.02
29.5 - 29.8    29.45 - 29.85       18       29.65        20        0.17        0.19
29.9 - 30.2    29.85 - 30.25       69       30.05        89        0.66        0.85
30.3 - 30.6    30.25 - 30.65       15       30.45       104        0.14        0.99
30.7 - 31      30.65 - 31.05        1       30.85       105        0.01        1.00
Total                             105      150.25                  1.00
Table 1 - Frequency distribution table

1.3 TYPES OF GRAPHS THAT DETERMINE FREQUENCY DISTRIBUTION


Frequency distributions can be represented graphically in a way that appeals to the human power of
visualization. Some of these graphical tools, as highlighted by Fowler (2017), are:
a). Histogram: This is a graph for continuous, quantitative data that uses contiguous vertical
bars of various heights to represent the frequencies of the classes. It can be used to plot
more than one distribution. A histogram helps us estimate the highs, the lows, the places
where the values are concentrated and where they are scarce. The touching bars produce a
continuous figure, which emphasizes the continuity of the variable.
b). Bar Graph: This is a graph that represents data by using vertical or horizontal bars whose
heights or lengths represent the frequencies of the data. Bar graphs are often used for
qualitative data.
c). Frequency Polygon: This is a line graph that is used for displaying the frequency
distribution of a continuous variable. It displays the data by using lines that
connect points plotted for the frequencies at the midpoints of the classes. The frequencies
are represented by the heights of the points. This graph is appropriate where there is a
large number of classes.
d). Pie Chart: This is a graph that shows the relationship between the parts and the whole. It is
used to display the relative frequency distribution of a nominal or categorical variable by
visually comparing the sizes of the sections. Percentages and proportions are used.
e). Ogive: This is a frequency polygon that represents the cumulative or relative cumulative
frequencies for the classes in a frequency distribution. Ogives are used to visually
represent how many values are below a certain upper class boundary.
f). Pareto Chart: This is a quality control tool that is used to represent a frequency
distribution for a categorical variable. The frequencies are displayed by the heights of
vertical bars, which are arranged in order from highest to lowest. It is a bar chart of
qualitative data with no gaps between the bars, and the bars are arranged by frequency.
g). Relative frequency graph: This is a distribution graph that uses proportions instead of
frequencies.
h). Time series graph: This tool can be used for forecasting and predicting. It represents data
that occur over a specific period of time. It can be used to look for patterns or trends that
occur over time.
i). Stem and leaf plot: This is a graph that displays numbers in a visual, histogram-like form. It
is a combination of sorting and graphing. Data are represented by using part of the data
value as the stem and part of the data value as the leaf to form groups or classes. This
graph has the advantage of retaining the actual data while showing them in graphical form.

Thus, the best graph for showing the frequency distribution of the time spent by each of the
105 athletes that participated in the 200m sprint, as seen in Table 1 above, is the histogram.
The histogram is suitable because the collated sprint results are quantitative, continuous data
and have a normal distribution.
Using the extract from Table 1, the table below will be used to prepare our graph for better
understanding.
Bin     Frequency   Cumulative %
29.4        2           1.90%
29.8       18          19.05%
30.2       69          84.76%
30.6       15          99.05%
31          1         100.00%

Table 2 - Frequency histogram table


[Figure: Histogram for Athletes 200m Training. Frequency (left axis) and cumulative % (right axis)
by class interval; bar heights 2, 18, 69, 15 and 1.]
Figure 1 - Graphical representation of frequency distribution

1.4 PATTERN OF DISTRIBUTION OF DATA


An overview of the set of results collated in the women's 200m sprint training reveals that the
data are quantitative in nature and have a continuous distribution.
A continuous distribution describes a variable whose set of possible values is infinite and
uncountable. This can be seen in the recorded times of the athletes (30.2, 30.1, 29.8, 29.9, ...).
That is, the variable measures something rather than just counting it.
The distribution of the athletes' results can also be described as a normal distribution because
it possesses the characteristics of a Gaussian (normal) distribution: the mean, median and mode
are approximately equal, as seen in the subsequent calculations which produced a mean of 30.03
and a median and mode of approximately 30, all located at the centre. Moreover, there is symmetry
around the centre and the distribution curve is "bell shaped" (Frost, 2018).
This has been graphically represented below:

[Figure: Normal Distribution Chart. Distribution plotted against the data set (time), with the
mean marked at 30.03.]
Figure 2 - Bell curve / normal distribution of data
1.5 WORKINGS
The results of women’s 200m training camp (in seconds) taken on a single 200m lap by each
athlete.
30.2 30.1 29.8 29.9 30.1 30 30.2
30 30.1 30 29.7 30.4 30.2 29.8
30.2 30.5 29.9 30 29.9 29.7 30.1
30 30.2 30.2 30.2 30.3 29.3 29.7
30.7 29.6 30 29.8 29.4 30.2 29.9
29.9 30 30 29.9 30 29.9 29.8
29.8 30.1 29.9 30.6 30.2 29.7 29.7
30.1 30 29.9 30.5 30.3 29.9 30.3
30.1 29.9 30.2 30 30 30.1 30.1
29.9 30 29.6 30.2 30.1 30 30
30 29.9 30 30 29.9 30.1 30.4
30.1 29.8 29.9 30.1 29.8 30.5 30.3
30.5 29.9 30.1 30.1 29.8 29.9 30.2
30.2 29.9 30.2 30.3 29.9 30.4 30.4
30.1 29.7 29.7 30 29.7 30.3 30

Summarised as:
Result Frequency Tally
29.3 1 |
29.4 1 |
29.6 2 ||
29.7 8 |||| |||
29.8 8 |||| |||
29.9 19 |||| |||| |||| ||||
30 20 |||| |||| |||| ||||
30.1 16 |||| |||| |||| |
30.2 14 |||| |||| ||||
30.3 6 |||| |
30.4 4 ||||
30.5 4 ||||
30.6 1 |
30.7 1 |

a). Number of Classes

$C = \frac{\log n}{\log 2} = \frac{\log 17}{\log 2} = 4.08746$

Rounded up to 5.
b). Class Width - This is the size of each class.

$h = \frac{\text{Range}}{\text{Number of classes}} = \frac{30.7 - 29.3}{5} = 0.28$

Rounded up to 0.3.

c). Class Intervals


The upper class limit is determined by adding the class size of 0.3 to the lower class limit.
Lower class Upper Class
29.1 29.4
29.5 29.8
29.9 30.2
30.3 30.6
30.7 31

d). Class Boundary


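To determine the starting and ending points of our class boundaries, we halve the gap between the
upper limit of one class and the lower limit of the next:

$\frac{29.5 - 29.4}{2} = 0.05$

Thus, the class boundaries are obtained by subtracting 0.05 from the lower class limit and adding
0.05 to the upper class limit.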
Lower class Upper Class
29.05 29.45
29.45 29.85
29.85 30.25
30.25 30.65
30.65 31.05

e). Relative frequency

This is the percentage or proportion of an individual class frequency to the total frequency.

$\text{Relative frequency} = \frac{\text{class frequency}}{\text{total frequency}}$

f). Midpoint
This is the average of the upper and lower class limits (or boundaries) of each class of data.

$X_m = \frac{\text{Lower boundary} + \text{Upper boundary}}{2}$
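
The workings in steps (c) to (f) can be sketched in code. The following is a minimal illustration rather than the method actually used to produce Table 1: it assumes the tally and class limits listed above, and the names `TALLY`, `CLASS_LIMITS` and `frequency_table` are hypothetical.

```python
# Tally from the "Summarised as" table: result in seconds -> frequency.
TALLY = {29.3: 1, 29.4: 1, 29.6: 2, 29.7: 8, 29.8: 8, 29.9: 19, 30.0: 20,
         30.1: 16, 30.2: 14, 30.3: 6, 30.4: 4, 30.5: 4, 30.6: 1, 30.7: 1}

# Class limits from step (c).
CLASS_LIMITS = [(29.1, 29.4), (29.5, 29.8), (29.9, 30.2), (30.3, 30.6), (30.7, 31.0)]


def frequency_table(tally, class_limits, half_gap=0.05):
    """Build one row per class: boundaries, frequency, midpoint and cumulative values."""
    total = sum(tally.values())
    rows, cumulative = [], 0
    for lower, upper in class_limits:
        # Step (d): boundaries sit half a gap outside the class limits.
        low_b, high_b = lower - half_gap, upper + half_gap
        freq = sum(f for value, f in tally.items() if low_b < value < high_b)
        cumulative += freq
        rows.append({
            "interval": (lower, upper),
            "boundaries": (round(low_b, 2), round(high_b, 2)),
            "frequency": freq,
            "midpoint": round((low_b + high_b) / 2, 2),   # step (f)
            "cumulative": cumulative,
            "relative": freq / total,                     # step (e)
            "cumulative_relative": cumulative / total,
        })
    return rows


if __name__ == "__main__":
    for row in frequency_table(TALLY, CLASS_LIMITS):
        print(row)
```

Building the table from the tally rather than typing the frequencies by hand makes it easy to confirm that the class frequencies add up to the 105 recorded results.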

2.0 CENTRAL MEASURES OF LOCATION AND STATISTICAL MEASURES OF SPREAD
2.1 CENTRAL MEASURES OF LOCATION
These are statistical tools that are aimed at identifying the centre of a set of data. There are
three common ways of measuring the central location of the results from the women's training:
the mean, the mode and the median.

Using the table below, the value of each central measure of location has been determined.

Class          Class            Frequency  Midpoint    fx        Cumulative
interval       boundary            (f)        (x)                frequency
29.1 - 29.4    29.05 - 29.45        2       29.25       58.5          2
29.5 - 29.8    29.45 - 29.85       18       29.65      533.7         20
29.9 - 30.2    29.85 - 30.25       69       30.05     2073.45        89
30.3 - 30.6    30.25 - 30.65       15       30.45      456.75       104
30.7 - 31      30.65 - 31.05        1       30.85       30.85       105
Total                             105      150.25     3153.25
Table 3 - Cumulative frequency distribution table

2.1.1 Mean
The mean is the average of the set of data given. The mean of grouped data is obtained
by:

$\mu = \frac{\sum fx}{N}$

$= \frac{3153.25}{105}$
= 30.03
2.1.2 Mode
This is the central measure of location that identifies the most frequent value; for grouped
data it is estimated from the class with the highest frequency. It can be obtained by:

$x = LCB_{mod} + CW \left( \frac{F_{mod} - F_{bmod}}{(F_{mod} - F_{bmod}) + (F_{mod} - F_{amod})} \right)$

where:
$LCB_{mod}$ = Lower class boundary of the modal class
CW = Class width
$F_{mod}$ = Frequency of the modal class
$F_{bmod}$ = Frequency before the modal class
$F_{amod}$ = Frequency after the modal class

$= 29.85 + 0.3 \left( \frac{69 - 18}{(69 - 18) + (69 - 15)} \right)$

$= 29.85 + 0.3 \left( \frac{51}{51 + 54} \right)$

= 29.85 + 0.3(0.4857)
= 30.00

2.1.3 Median
The median is the central location tool that measures the midpoint value of the set of data. For
the grouped data of the results of the women's athletic training, we can determine the median
by:

$x = LCB_{med} + \frac{CW}{F_{med}} \left( \frac{\sum f}{2} - CF_{bmed} \right)$

where:
$LCB_{med}$ = Lower class boundary of the median class
CW = Class width
$F_{med}$ = Frequency of the median class
$CF_{bmed}$ = Cumulative frequency before the median class
$\sum f$ = Total frequency

$= 29.85 + \frac{0.3}{69} \left( \frac{105}{2} - 20 \right)$

= 29.85 + 0.004348 (32.5)

= 29.99
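
The three grouped-data formulas above can be checked with a short script. This is a sketch only: it assumes the class boundaries, midpoints and frequencies of Table 3 and the class width of 0.3 used in the text, and the function names are hypothetical.

```python
# Grouped data from Table 3: (lower class boundary, midpoint, frequency).
CLASSES = [(29.05, 29.25, 2), (29.45, 29.65, 18), (29.85, 30.05, 69),
           (30.25, 30.45, 15), (30.65, 30.85, 1)]
CLASS_WIDTH = 0.3  # assumption: the class width adopted in Section 1.5


def grouped_mean(classes):
    """Sum of f*x divided by the total frequency."""
    n = sum(f for _, _, f in classes)
    return sum(x * f for _, x, f in classes) / n


def grouped_mode(classes, width):
    """Interpolate inside the modal class using the neighbouring frequencies."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1, d2 = freqs[i] - before, freqs[i] - after
    return classes[i][0] + width * d1 / (d1 + d2)


def grouped_median(classes, width):
    """Interpolate inside the class that contains the N/2-th observation."""
    n = sum(f for _, _, f in classes)
    cumulative = 0
    for lower_boundary, _, f in classes:
        if cumulative + f >= n / 2:
            return lower_boundary + (width / f) * (n / 2 - cumulative)
        cumulative += f
    raise ValueError("empty frequency table")


if __name__ == "__main__":
    print(round(grouped_mean(CLASSES), 2))
    print(round(grouped_mode(CLASSES, CLASS_WIDTH), 2))
    print(round(grouped_median(CLASSES, CLASS_WIDTH), 2))
```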

From the results of the measures of central location, with a mean of 30.03 and a mode and median
of approximately 30, we can say that this training group is on par with the world standard, which
is to reach 30 seconds within an acceptance range of 29.5 to 30.5 seconds.

2.2 STATISTICAL MEASURES OF SPREAD


These are the statistical tools that help us to understand variation and diversity in a set of
data, that is, how far the values are spread from the mean.
The components of this measure include the range, interquartile range, variance and
standard deviation, which will be computed with the aid of Table 4 below:

Class          Class            Frequency  Midpoint    fX       (X - x̄)   (X - x̄)²   f(X - x̄)²
interval       boundary            (f)        (X)
29.1 - 29.4    29.05 - 29.45        2       29.25      58.5      -0.78      0.61       1.22
29.5 - 29.8    29.45 - 29.85       18       29.65     533.7      -0.38      0.15       2.612
29.9 - 30.2    29.85 - 30.25       69       30.05    2073.45      0.02      0.00       0.025
30.3 - 30.6    30.25 - 30.65       15       30.45     456.75      0.42      0.18       2.634
30.7 - 31      30.65 - 31.05        1       30.85      30.85      0.82      0.67       0.671
Total                             105      150.25    3153.25      0.10      1.60       7.16
Table 4 - Frequency distribution (variance) table

2.2.1 Range
The range shows variability by considering the difference between the highest and the
lowest value in a set of given data. In the case of the women's training, the least time
used in the 200m sprint was 29.3s while the most time spent by an athlete for the same
sprint was 30.7s.
Thus, our range can be calculated as:

$R_x = R_{maximum} - R_{minimum}$

$R_x$ = 30.7 - 29.3
= 1.4

Though the range is a weak tool for determining the spread of data because it only considers
two values amongst all that are available in the set of data, a low range indicates there is
small variability while a high range shows there is wide variability.
Thus, with our range at 1.4 seconds, we can say that there is little variability in the time
taken by the various athletes to complete the 200m sprint.

2.2.2 Variance
Variance is the method of determining how the individual numbers relate to each other
within a given set of data by computing the average squared difference of the values from
the mean. Unlike the range, the variance includes all values in the calculation by
comparing each value to the mean.
This can be seen here:

$\sigma^2 = \frac{\sum f(X - \bar{x})^2}{N}$

Where $f(X - \bar{x})^2$ = product of the frequency and the squared deviation from the mean
f = frequency
N = total number of items

$\sigma^2 = \frac{7.16}{105}$
= 0.06821

2.2.3 Standard Deviation
Standard deviation is another tool that is used to estimate how spread out a set of data is
from its mean. It is the square root of the variance. Thus, the standard deviation of the
athletes' results can be computed by:

$\sigma = \sqrt{\sigma^2} = \sqrt{0.06821}$

= 0.26117

A high standard deviation indicates that the data are far from the average, while a low
standard deviation implies that they are close to the average/mean.
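
As a cross-check on Table 4, the variance and standard deviation of the grouped data can be computed directly. The sketch below assumes the midpoints and frequencies from Table 4 and uses the population formula (division by N) as in the text; the function name is hypothetical.

```python
import math

# Midpoints and frequencies from Table 4.
MIDPOINTS = [29.25, 29.65, 30.05, 30.45, 30.85]
FREQUENCIES = [2, 18, 69, 15, 1]


def grouped_variance(midpoints, frequencies):
    """Population variance of grouped data: sum of f*(X - mean)^2 divided by N."""
    n = sum(frequencies)
    mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / n


if __name__ == "__main__":
    variance = grouped_variance(MIDPOINTS, FREQUENCIES)
    print(round(variance, 5), round(math.sqrt(variance), 5))
```

Because the script keeps the mean unrounded, its intermediate products can differ in the last digit from the rounded entries printed in Table 4.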

2.2.4 Interquartile Range


The interquartile range represents the central portion of the distribution, and is calculated
as the difference between the third quartile and the first quartile. This range includes about
one-half of the observations in the set, leaving one-quarter of the observations on each
side. (Jin, 2018)

Computing with Table 3,


Interquartile range = 𝑄3 − 𝑄1

Thus, we would start by finding the value of the third and first quartile using the table
below:
Data Order Data Order Data Order Data Order Data Order
29.3 1st 29.9 22nd 30 43rd 30.1 64th 30.2 85th
29.4 2nd 29.9 23rd 30 44th 30.1 65th 30.2 86th
29.6 3rd 29.9 24th 30 45th 30.1 66th 30.2 87th
29.6 4th 29.9 25th 30 46th 30.1 67th 30.2 88th
29.7 5th 29.9 26th 30 47th 30.1 68th 30.2 89th
29.7 6th 29.9 27th 30 48th 30.1 69th 30.3 90th
29.7 7th 29.9 28th 30 49th 30.1 70th 30.3 91st
29.7 8th 29.9 29th 30 50th 30.1 71st 30.3 92nd
29.7 9th 29.9 30th 30 51st 30.1 72nd 30.3 93rd
29.7 10th 29.9 31st 30 52nd 30.1 73rd 30.3 94th
29.7 11th 29.9 32nd 30 53rd 30.1 74th 30.3 95th
29.7 12th 29.9 33rd 30 54th 30.1 75th 30.4 96th
29.8 13th 29.9 34th 30 55th 30.2 76th 30.4 97th
29.8 14th 29.9 35th 30 56th 30.2 77th 30.4 98th
29.8 15th 29.9 36th 30 57th 30.2 78th 30.4 99th
29.8 16th 29.9 37th 30 58th 30.2 79th 30.5 100th
29.8 17th 29.9 38th 30 59th 30.2 80th 30.5 101st
29.8 18th 29.9 39th 30.1 60th 30.2 81st 30.5 102nd
29.8 19th 30 40th 30.1 61st 30.2 82nd 30.5 103rd
29.8 20th 30 41st 30.1 62nd 30.2 83rd 30.6 104th
29.9 21st 30 42nd 30.1 63rd 30.2 84th 30.7 105th
Table 5 - Quartile distribution table

$Q_1 = L_{Q1} + CW \left( \frac{\frac{n}{4} - CF_b}{F_{Q1}} \right)$

where CW is the class width of 0.3 used for the mode and median above.

The first quartile position is:

$\frac{n}{4} = \frac{105}{4} = 26.25$

In Table 5 this position corresponds to a value of 29.9, which falls under the 29.9 - 30.2 class
interval.

The lower class boundary = 29.85
The cumulative frequency before = 20
The frequency of the class = 69

$Q_1 = 29.85 + 0.3 \left( \frac{26.25 - 20}{69} \right)$

= 29.85 + 0.0272
= 29.8772

$Q_3 = L_{Q3} + CW \left( \frac{\frac{3n}{4} - CF_b}{F_{Q3}} \right)$

The third quartile position is:

$\frac{3n}{4} = 3 \left( \frac{105}{4} \right) = 78.75$

In Table 5 this position corresponds to a value of 30.2, which also falls under the 29.9 - 30.2
class interval.

The lower class boundary = 29.85
The cumulative frequency before = 20
The frequency of the class = 69

$Q_3 = 29.85 + 0.3 \left( \frac{78.75 - 20}{69} \right)$

= 29.85 + 0.2554
= 30.1054

Therefore, the interquartile range for the distribution of the athletes' results is:
IQR = $Q_3 - Q_1$
= 30.1054 - 29.8772
= 0.2282
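
The quartile interpolation can be written once and reused for any fraction of the data. The sketch below is an illustration under stated assumptions, not a definitive implementation: it applies the same class-width factor as the median formula in Section 2.1.3, takes the width of 0.3 from the text, and uses the hypothetical name `grouped_quantile`.

```python
# (lower class boundary, frequency) for each class, from Table 3.
CLASSES = [(29.05, 2), (29.45, 18), (29.85, 69), (30.25, 15), (30.65, 1)]
CLASS_WIDTH = 0.3  # assumption: the class width adopted in Section 1.5


def grouped_quantile(classes, width, fraction):
    """Interpolate the value below which `fraction` of the observations fall."""
    n = sum(f for _, f in classes)
    target = fraction * n          # e.g. n/4 for Q1, 3n/4 for Q3
    cumulative = 0
    for lower_boundary, f in classes:
        if cumulative + f >= target:
            return lower_boundary + width * (target - cumulative) / f
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")


if __name__ == "__main__":
    q1 = grouped_quantile(CLASSES, CLASS_WIDTH, 0.25)
    q3 = grouped_quantile(CLASSES, CLASS_WIDTH, 0.75)
    print(round(q1, 4), round(q3, 4), round(q3 - q1, 4))
```

Calling the same function with a fraction of 0.5 reproduces the grouped median, which is a quick consistency check between the two sections.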

2.3 CONCLUSION ON THE VARIABILITY OF THE ATHLETE RESULTS
The conclusion about the variability of the athlete results can be drawn from the coefficient of
variation which, according to Sundar (2017), is a measure of the relative variability of a given
set of data. It is the ratio of the standard deviation to the mean, expressed as a percentage.
It is obtained by:

$CV = \frac{\sigma}{\mu} \times 100$

$= \frac{0.26}{30.03} \times 100$

= 0.87%

Based on the decision rule of the coefficient of variation, it can be concluded that the athlete
results have low variability, because the ratio σ/µ of about 0.009 (a CV of about 0.87%) is far
less than 1.
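
The arithmetic for the coefficient of variation is small enough to verify in a few lines. This sketch assumes the rounded standard deviation (0.26) and mean (30.03) quoted in the text; the function name is hypothetical.

```python
def coefficient_of_variation(std_dev, mean):
    """Relative variability: standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


if __name__ == "__main__":
    cv = coefficient_of_variation(0.26, 30.03)
    # As a plain ratio (without the factor of 100) this is cv / 100.
    print(round(cv, 2), round(cv / 100, 3))
```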

3.0 CONFIDENCE INTERVAL AND LEVEL
3.1 CONFIDENCE INTERVAL
Glen describes a confidence interval as a measure of how much uncertainty there is with any
particular statistic. The confidence interval from sample data is a range of values that is
likely to include the population parameter at some specified confidence level.
The confidence interval can be computed by:
CI = µ ± E

Where µ is the point estimate (the sample mean) and E is the margin of error.

The confidence level is the probability that the value of a parameter falls within a specified
range of values. It is expressed as a percentage.
The confidence interval is computed with the margin of error, which is the statistical tool that
measures by how much the results may differ from the real population value.

Margin of error is derived by:

$E = Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

where,
α = significance level, which is equal to 1 - confidence level
$Z_{\alpha/2}$ = critical value, which is the equal balance of the alpha on both the negative and
positive sides of the z-distribution chart
σ = Standard deviation
µ = Mean
n = Number of observations (105)

3.2 COMPUTATION OF CONFIDENCE INTERVAL


a) At 68% level of confidence,

Confidence level = 68%
Significance level (α) = 1 - 0.68 = 0.32
α/2 = 0.32/2 = 0.16
Critical value $Z_{0.16}$ = 0.994458
σ = 0.26
µ = 30.03

Margin of error: $E = 0.994458 \times \frac{0.26}{\sqrt{105}} = 0.03$

Limits of the interval of confidence (computed from the unrounded values µ = 30.031 and
E ≈ 0.025):
Lower limit = µ - E = 30.01
Upper limit = µ + E = 30.06
b) At 95.5% level of confidence,

Confidence level = 95.5%
Significance level (α) = 1 - 0.955 = 0.045
Rounding α to 0.05 (the conventional 95% level), α/2 = 0.05/2 = 0.0250
Critical value $Z_{0.025}$ = 1.959964
σ = 0.26
µ = 30.03

Margin of error: $E = 1.959964 \times \frac{0.26}{\sqrt{105}} = 0.05$

Limits of the interval of confidence (computed from the unrounded values µ = 30.031 and
E ≈ 0.050):
Lower limit = µ - E = 29.98
Upper limit = µ + E = 30.08
c) At 99.7% level of confidence,

Confidence level = 99.7%
Significance level (α) = 1 - 0.997 = 0.003
α/2 = 0.003/2 = 0.0015
Critical value $Z_{0.0015}$ = 2.967738
σ = 0.26
µ = 30.03

Margin of error: $E = 2.967738 \times \frac{0.26}{\sqrt{105}} = 0.08$

Limits of the interval of confidence (computed from the unrounded values µ = 30.031 and
E ≈ 0.075):
Lower limit = µ - E = 29.96
Upper limit = µ + E = 30.11
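
The three computations above follow the same steps and can be sketched as one function. This is an illustration rather than the spreadsheet procedure behind the figures: it assumes the unrounded mean (30.031) and standard deviation (0.26117), takes n = 105, and obtains the critical value from the standard normal distribution in Python's standard library. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def confidence_interval(mean, std_dev, n, level):
    """Return (critical value, margin of error, (lower, upper)) for a z-based interval."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    margin = z * std_dev / math.sqrt(n)
    return z, margin, (mean - margin, mean + margin)


if __name__ == "__main__":
    # The middle case uses 0.95 because the text rounds alpha = 0.045 to 0.05.
    for level in (0.68, 0.95, 0.997):
        z, margin, (lower, upper) = confidence_interval(30.031, 0.26117, 105, level)
        print(level, round(z, 6), round(margin, 2), round(lower, 2), round(upper, 2))
```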
3.3 SUMMARY ON CONFIDENCE INTERVAL
In summary, the intervals have been calculated as:

Confidence level       68%             95.5%           99.7%
Critical value         0.99            1.96            2.97
Margin of error        0.03            0.05            0.08
Confidence interval    30.01, 30.06    29.98, 30.08    29.96, 30.11

The confidence interval can be expressed as:
lower limit < µ < upper limit
@ 68%: 30.01 < µ < 30.06
@ 95.5%: 29.98 < µ < 30.08
@ 99.7%: 29.96 < µ < 30.11
That is, we are
* 68% confident that the mean time of the population is between 30.01 and 30.06
* 95.5% confident that the mean time of the population is between 29.98 and 30.08
* 99.7% confident that the mean time of the population is between 29.96 and 30.11
There is an empirical rule for the normal distribution which uses the standard deviation to
describe how the scores are distributed around the mean. It states that:

* About 68% of all scores lie within one standard deviation of the mean. In other words,
roughly two thirds of the scores lie within one standard deviation on either side of the
mean.
* About 95% of all scores lie within two standard deviations of the mean (normal scores:
close to the mean).
* About 99.7% of all scores lie within three standard deviations of the mean.
This can be graphically represented as:

[Figure: Confidence Interval. Normal curve plotted against time taken (29 to 31 seconds), with the
legend: 1 standard deviation = 68% of data, 2 standard deviations = 95% of data, 3 standard
deviations = 99.7% of data.]
Figure 3 - Graphical representation of normal distribution of confidence intervals and levels
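
The empirical rule stated above can be compared with the recorded times themselves. The sketch below assumes the tally from Section 1.5 and the rounded mean (30.03) and standard deviation (0.26) from Section 2; `share_within` is a hypothetical helper.

```python
# Tally from Section 1.5: result in seconds -> frequency.
TALLY = {29.3: 1, 29.4: 1, 29.6: 2, 29.7: 8, 29.8: 8, 29.9: 19, 30.0: 20,
         30.1: 16, 30.2: 14, 30.3: 6, 30.4: 4, 30.5: 4, 30.6: 1, 30.7: 1}


def share_within(tally, mean, std_dev, k):
    """Proportion of observations lying within k standard deviations of the mean."""
    total = sum(tally.values())
    inside = sum(f for value, f in tally.items() if abs(value - mean) <= k * std_dev)
    return inside / total


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, round(share_within(TALLY, 30.03, 0.26, k) * 100, 1))
```

With only 105 observations recorded to one decimal place, the observed shares will not match 68%, 95% and 99.7% exactly; the rule describes an ideal normal curve.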
4.0 CORRELATION
According to Kundu, correlation is a measure of association between two or more variables. When
two or more variables vary in sympathy, so that movement in one tends to be accompanied by
corresponding movements in the other variable(s), they are said to be correlated.

Therefore, correlation in statistics is used to test the relationship between quantitative or
categorical variables.
The main result of a correlation is called the correlation coefficient (r), which ranges from -1.0
to +1.0. The closer r is to +1 or -1, the more closely the two variables (bivariate data) are
related.

Correlation can be examined using a scatter plot, where the two variables are plotted on graph
paper with one variable on the x-axis and the other on the y-axis, so that we get points on the
graph which are generally scattered. Another method of finding the correlation coefficient is
Pearson's correlation (r) method, which gives a numerical expression for the strength of the
linear relationship between two variables. The other method of correlation is the Spearman rank
correlation, which involves rank randomization.

Using data provided on the twelve random samples obtained on the time and the age of the
athletes,
Age 20 22 23 27 28 27 29 29 27 26 27 28
Time 29.7 29.6 29.9 30.1 30 30 30.1 30 30.2 30.1 30.2 30.3

Correlation will be determined using the Pearson's correlation method.


$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$

Age (x)   Time (y)   (x - x̄)   (y - ȳ)   (x - x̄)(y - ȳ)   (x - x̄)²   (y - ȳ)²
20        29.7       -6.08      -0.32       1.93             37.01       0.1003
22        29.6       -4.08      -0.42       1.70             16.67       0.1736
23        29.9       -3.08      -0.12       0.36              9.51       0.0136
27        30.1        0.92       0.08       0.08              0.84       0.0069
28        30.0        1.92      -0.02      -0.03              3.67       0.0003
27        30.0        0.92      -0.02      -0.02              0.84       0.0003
29        30.1        2.92       0.08       0.24              8.51       0.0069
29        30.0        2.92      -0.02      -0.05              8.51       0.0003
27        30.2        0.92       0.18       0.17              0.84       0.0336
26        30.1       -0.08       0.08      -0.01              0.01       0.0069
27        30.2        0.92       0.18       0.17              0.84       0.0336
28        30.3        1.92       0.28       0.54              3.67       0.0803
313       360.2       0.00       0.00       5.08             90.92       0.4567
Table 6 - Correlation Data Table
First finding the means used in Table 6,

Age: $\bar{x} = \frac{\sum x_i}{n} = \frac{313}{12} = 26.0833$

Time: $\bar{y} = \frac{\sum y_i}{n} = \frac{360.2}{12} = 30.0167$

$r = \frac{5.08}{\sqrt{90.92 \times 0.4567}}$
= 0.78891

The correlation coefficient ranges from -1 to +1. Our result is 0.789, which means that there is
a strong positive relationship between the time and the age of the athletes: older athletes in
the sample tend to record longer times. This is in line with the thoughts of the trainer.

            Age (x)       Time (y)
Age (x)     1
Time (y)    0.78890949    1
Table 7 - Correlation summary by Excel
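
The Pearson coefficient can also be reproduced from the twelve sampled pairs without a spreadsheet. The sketch below assumes the age and time values given above and uses the deviation form of the formula; `pearson_r` is a hypothetical name.

```python
import math

# The twelve sampled athletes: age in years and 200m time in seconds.
AGES = [20, 22, 23, 27, 28, 27, 29, 29, 27, 26, 27, 28]
TIMES = [29.7, 29.6, 29.9, 30.1, 30.0, 30.0, 30.1, 30.0, 30.2, 30.1, 30.2, 30.3]


def pearson_r(xs, ys):
    """Pearson correlation from sums of products of deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    print(round(pearson_r(AGES, TIMES), 4))
```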

5.0 REGRESSION

5.1 REGRESSION ANALYSIS

CFI describes regression analysis as a set of statistical methods used for the estimation of
relationships between a dependent variable and one or more independent variables. It can be
utilized to assess the strength of the relationship between variables and for modeling the future
relationship between them.

Regression analysis includes several variations, such as linear, multiple linear, and nonlinear.
Simple linear regression is a model that assesses the relationship between a dependent variable
and an independent variable. It is based on six assumptions, which state that:

* The dependent and independent variables show a linear relationship between the slope
and the intercept - Linearity.
* The independent variable is not random.
* The expected value of the residual (error) is zero.
* The variance of the residual (error) is constant across all observations - Homoscedasticity.
* X variables and residuals are uncorrelated.
* The residual (error) values follow the normal distribution.

The simple linear model is expressed using the following equation:


Y = a + bX + ϵ

Where:
Y – Dependent variable

X – Independent (explanatory) variable

a – Intercept

b – Slope

ϵ – Residual (error)

5.2 COMPUTATION ON REGRESSION ANALYSIS

Using Table 8 below, we will carry out a regression analysis on the sample data collected on the
age and time taken by the athletes on the 200m sprint, in order to determine the best age to
produce a time of 30 seconds on a 200m sprint.

Age (x)   Time (y)   xy         x²         y²
20        29.7        594.0      400        882.1
22        29.6        651.2      484        876.2
23        29.9        687.7      529        894.0
27        30.1        812.7      729        906.0
28        30.0        840.0      784        900.0
27        30.0        810.0      729        900.0
29        30.1        872.9      841        906.0
29        30.0        870.0      841        900.0
27        30.2        815.4      729        912.0
26        30.1        782.6      676        906.0
27        30.2        815.4      729        912.0
28        30.3        848.4      784        918.1
313       360.2      9400.30    8255.00    10,812.5
Table 8 - Regression Data Table
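If Y = a + bX,

$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n \sum x^2 - (\sum x)^2}$

$= \frac{(360.2)(8255) - (313)(9400.30)}{12(8255) - (313)^2}$

$= \frac{2973451 - 2942293.9}{99060 - 97969}$

$= \frac{31157.1}{1091}$
= 28.5583

$b = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}$

$= \frac{12(9400.30) - (313)(360.2)}{12(8255) - (313)^2}$

$= \frac{112803.6 - 112742.6}{99060 - 97969}$

$= \frac{61}{1091}$
= 0.05591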
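
The intercept and slope follow directly from the column totals of the regression data table. The sketch below assumes the twelve age and time pairs listed there and applies the same sum-based formulas; `least_squares` is a hypothetical name.

```python
AGES = [20, 22, 23, 27, 28, 27, 29, 29, 27, 26, 27, 28]
TIMES = [29.7, 29.6, 29.9, 30.1, 30.0, 30.0, 30.1, 30.0, 30.2, 30.1, 30.2, 30.3]


def least_squares(xs, ys):
    """Return (intercept a, slope b) of y = a + b*x from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denominator = n * sum_xx - sum_x ** 2
    a = (sum_y * sum_xx - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


if __name__ == "__main__":
    a, b = least_squares(AGES, TIMES)
    print(round(a, 4), round(b, 5))
    print(round(a + b * 26, 2))  # predicted time for a 26-year-old
```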

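Having determined a and b, we insert the values of x individually into
y = 28.56 + 0.056x
(the predicted values below are computed with the unrounded coefficients a = 28.5583 and
b = 0.05591, so some differ in the last digit from the rounded arithmetic shown).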
When x is 20,
y = a + bX
= 28.56 + 0.056(20)
= 29.68

When x is 22,
y = a + bX
= 28.56 + 0.056(22)
= 29.79

When x is 23,
y = a + bX
= 28.56 + 0.056(23)
= 29.84

When x is 27,
y = a + bX
= 28.56 + 0.056(27)
= 30.07

When x is 28,
y = a + bX
= 28.56 + 0.056(28)
= 30.12

When x is 27,
y = a + bX
= 28.56 + 0.056(27)
= 30.07

When x is 29,
y = a + bX
= 28.56 + 0.056(29)
= 30.18
When x is 29,
y = a + bX
= 28.56 + 0.056(29)
= 30.18
When x is 27,
y = a + bX
= 28.56 + 0.056(27)
= 30.07
When x is 26,
y = a + bX
= 28.56 + 0.056(26)
= 30.01

When x is 27,
y = a + bX
= 28.56 + 0.056(27)
= 30.07

When x is 28,
y = a + bX
= 28.56 + 0.056(28)
= 30.12

SUMMARY OUTPUT
Regression Statistics
Multiple R 0.7889
R Square 0.6224
Adjusted R Square 0.5846
Standard Error 0.1313
Observations 12

Residual Output
Observation  Predicted Y  Residuals  Standard Residuals
1 29.676535 0.0235 0.1874
2 29.788359 -0.1884 -1.5044
3 29.844271 0.0557 0.4451
4 30.067919 0.0321 0.2562
5 30.123831 -0.1238 -0.9890
6 30.067919 -0.0679 -0.5425
7 30.179743 -0.0797 -0.6369
8 30.179743 -0.1797 -1.4356
9 30.067919 0.1321 1.0549
10 30.012007 0.0880 0.7028
11 30.067919 0.1321 1.0549
12 30.123831 0.1762 1.4070
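
The regression statistics and residuals in the summary output can be reproduced from the same twelve pairs. The sketch below is an assumption-laden illustration, not a re-implementation of the spreadsheet tool: it assumes ordinary least squares with one predictor, so that the standard error uses n - 2 degrees of freedom, and `fit_and_diagnose` is a hypothetical name.

```python
import math

AGES = [20, 22, 23, 27, 28, 27, 29, 29, 27, 26, 27, 28]
TIMES = [29.7, 29.6, 29.9, 30.1, 30.0, 30.0, 30.1, 30.0, 30.2, 30.1, 30.2, 30.3]


def fit_and_diagnose(xs, ys):
    """Fit y = a + b*x and return predictions, residuals and summary statistics."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    predicted = [a + b * x for x in xs]
    residuals = [y - p for y, p in zip(ys, predicted)]
    sse = sum(e * e for e in residuals)            # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in ys)       # total variation
    r_squared = 1 - sse / sst
    return {
        "predicted": predicted,
        "residuals": residuals,
        "r_squared": r_squared,
        "adjusted_r_squared": 1 - (1 - r_squared) * (n - 1) / (n - 2),
        "standard_error": math.sqrt(sse / (n - 2)),
    }


if __name__ == "__main__":
    result = fit_and_diagnose(AGES, TIMES)
    print(round(result["r_squared"], 4), round(result["standard_error"], 4))
```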

[Figure: Line Plot of Athletes Time and Age. Observed time (Y) and predicted time (Predicted Y),
on a scale of 29.5 to 30.4 seconds, plotted against age on a scale of 0 to 35.]

Figure 4 - Graphical representation of regression analysis

5.3 CONCLUSION
Based on the above computation, we can see that athletes below age 26 are predicted to complete
the race in less than 30 seconds, with age 20 completing in 29.68s, age 22 in 29.79s and
age 23 in 29.84s. On the other hand, athletes aged 26 and above are predicted to spend more
than 30 seconds completing the race: age 26 in 30.01s, age 27 in 30.07s, age 28 in 30.12s
and age 29 in 30.18s.
Bearing this in mind, we can conclude that athletes less than 26 years of age stand a
better chance of completing the race within the standard time. However, if considering the
best single age to produce a time of 30 seconds on a 200m sprint, that will be age 26,
because its predicted time of 30.01s is the closest to 30 seconds.

REFERENCES

Agresti, A. & Finlay, B. (1997). Statistical methods for the social sciences . Upper Saddle River, NJ.
Prentice Hall, Inc.
Frost, J. (2018). Normal Distribution in Statistics (Online). Available at:
https://statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode/
Manikandan, S. (2011). Frequency Distribution. Journal of Pharmacology & Pharmacotherapeutics,
Medknow Publications (online). Available at: https://ncbl.nlm.nih.govt/pmc/articles/PMC3117575.
[Accessed 03 Aug. 2020]
Jin, G. (2018). Summary Measure of Quantitative Data [online] by SoftChalk LessonBuilder. Available
at: https://my.ilstu.edu/~gjin/hsc204-hed/ Module-5-Summary-Measure-2'. [Accessed 06 Aug. 2020]

Data Handling - Frequency Distribution and Data: Types, Tables, and Graphs (online). Available at
https://www.toppr.com/guides/maths/data-handling/data-and-its-frequency-distribution/ [Accessed 03
Aug. 2020]

Bluman, A.G. (2013). Elementary Statistics: A Step-by-Step Approach with Formula Card. McGraw-Hill
Science Engineering, New York , US

Fowler, E. (2017) Chapter 2 – Frequency Distributions and Graphs or Making Pretty Tables and Pretty
Pictures [online]. Available at: https://silo.tips/download/chapter-2-frequency-distributions-and-graphs-or-
making-pretty-tables-and-pretty-pictures/ [Accessed 06 Aug. 2020]

Glen, S. "Frequency Distribution Table: Examples, How to Make One" From StatisticsHowTo.com:
Elementary Statistics for the rest of us! (online). Available at:
https://www.statisticshowto.com/probability-and-statistics/descriptive-statistics/frequency-distribution-
table/ [Accessed 02 Aug. 2020]

Privitera, G. J. (2012). Statistics for the behavioral sciences . Thousand Oaks, CA. SAGE Publications,
Inc.
Sundar, K. (2017). "Explain the Use of Coefficient of Variation with examples" From Benchmark 6ix
sigma.(online). Available at: https:/www.benchmarksixsigma/forum/topic/34927-cv-coefficient-of-
variation/ [Accessed 08 Aug. 2020]
Douglas, S. & Zhiyi, Z. (2012). Beginning Statistics. (online) Available at:
http://2012books.lardbucket.org/ [Accessed 06 Aug. 2020]
Watkins, J. (2012). An Introduction to the Science of Statistics: From Theory to Implementation . Arizona
Maths, University of Arizona.
Larson & Farber (2012). Elementary Statistics : Picturing the World . Pearson Education, Inc.
Glen, S. "Confidence Interval: How to find a confidence interval: The Easy way!"
From StatisticsHowTo.com: Elementary Statistics for the rest of us! (online). Available at:
https://www.statisticshowto.com/probability-and-statistics/confidence-interval/ [Accessed 08 Aug. 2020]

Gonick, L. (1993). The Cartoon Guide to Statistics. HarperPerennial.
Edwards, A. L. (1976). An Introduction to Linear Regression and Correlation. San Francisco, CA.

Neutens, J. J., & Rubinson, L. (1997). Research techniques for the health sciences . Needham Heights,
MA. Allyn & Bacon.

CFI Education Inc. "Regression Analysis: Formulas, Explanation, Examples and Definition". (online).
Available at: https://www.corporatefinanceinstitute.com/resources/knowledge/finance/regression-
analysis/

Kundu, S. An introduction to Business Statistics - Directorate of Distance Education -Guru. (online).
Available at: https://www.ddegjust.ac.in/studymaterial/mcom/mc-106.pdf

Gravetter FJ, Wallnau LB (2000). Statistics for the behavioral sciences . 5th ed. Belmont: Wadsworth –
Thomson Learning; 2000. [Google Scholar] [Accessed 08 Aug. 2020]
