S & Punit 1
Descriptive Measures
Unit: 1
Sr. No  Course Outcome  PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
1 CO 1 3 3 3 3 1 1 2
2 CO 2 3 3 3 2 1 1 2 2
3 CO 3 3 2 3 2 1 1 1
4 CO 4 3 2 2 3 1 1 1
5 CO.5 3 3 2 2 1 1 1 2 2
• Introduction
• Measures of central tendency – mean, median, mode
• Measures of dispersion – mean deviation, standard deviation,
quartile deviation, variance
• Moment
• Skewness and kurtosis
• Least squares principles of curve fitting, Covariance
• Correlation and Regression analysis
• Correlation coefficient: Karl Pearson coefficient, rank correlation
coefficient
• Uni-variate and multivariate linear regression
• Application of regression analysis, Logistic Regression
• Time series analysis- Trend analysis (Least square method).
9/22/2022 Faculty Name Aakansha Vyas Unit Number 1 12
Unit Objective(CO 1)
• The objective of this course is to familiarize engineers with the concepts of statistical techniques.
• It aims to equip students with standard concepts and tools from B. Tech to deal with advanced-level mathematics and the applications that are essential for their disciplines.
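Solution:
Computation of mean
$\bar{x} = \dfrac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \dfrac{1}{N}\sum_{i=1}^{n} f_i x_i$, where $N = \sum_{i=1}^{n} f_i$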
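The mean formula for a frequency distribution can be sketched in a few lines of Python. The values and frequencies below are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
def weighted_mean(values, freqs):
    """Arithmetic mean of a frequency distribution: sum(f*x) / sum(f)."""
    n_total = sum(freqs)  # N = f1 + f2 + ... + fn
    return sum(f * x for x, f in zip(values, freqs)) / n_total


# Hypothetical data, for illustration only.
x = [1, 2, 3]
f = [2, 3, 5]
mean_value = weighted_mean(x, f)
print(mean_value)
```

Here the numerator is 2·1 + 3·2 + 5·3 = 23 and N = 10, so the mean is 2.3.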
Example
1. Median of Values 25, 20, 15, 35, 18. Median: 20
2. Median of Values 8, 20, 50, 25, 15, 30. Median: 22.5
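Both median examples can be checked with Python's standard `statistics.median`, which sorts the data and averages the two middle values when the count is even. This is only a verification sketch.

```python
from statistics import median

# Odd number of values: the middle value of the sorted list.
odd_case = median([25, 20, 15, 35, 18])

# Even number of values: the average of the two middle values.
even_case = median([8, 20, 50, 25, 15, 30])

print(odd_case, even_case)
```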
Solution:
i. Find $\frac{N}{2} = \frac{8+10+11+16+20+25+15+9+6}{2} = \frac{120}{2} = 60$, where $N = \sum_{i=1}^{n} f_i$.
ii. See the cumulative frequency (c.f.) just greater than $\frac{N}{2}$.
Here N = 120. The cumulative frequency just greater than $\frac{N}{2}$ is 65 and the value of x corresponding to 65 is 5. Therefore, the median is 5.
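The two steps above can be sketched as follows. The frequencies are the ones summed in step i; the x values 1 to 9 are an assumption, since the data table itself is not shown (they are consistent with x = 5 corresponding to the cumulative frequency 65).

```python
from itertools import accumulate


def discrete_median(values, freqs):
    """Return the value whose cumulative frequency first exceeds N/2.

    `values` must be in ascending order.
    """
    half = sum(freqs) / 2
    for x, cum_freq in zip(values, accumulate(freqs)):
        if cum_freq > half:
            return x
    raise ValueError("empty distribution")


freqs = [8, 10, 11, 16, 20, 25, 15, 9, 6]
values = list(range(1, 10))  # assumed x values 1..9 (not given in the text)
med = discrete_median(values, freqs)
print(med)
```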
➢ Mode:
• Mode is the value which occurs most frequently in a set of
observations and around which the other items of the set cluster
densely.
• It is the point of maximum frequency or the point of greatest
density.
• In other words the mode or modal value of the distribution is that
value of the variate for which frequency is maximum.
Calculation of Mode
❖ In case of discrete distribution: Mode is the value of x
corresponding to maximum frequency but in any one (or more)of
the following cases.
where $l$ is the lower limit, $h$ the width and $f_m$ the frequency of the modal class, and $f_1$ and $f_2$ are the frequencies of the classes preceding and succeeding the modal class respectively. While applying the above formula it is necessary to see that the class intervals are of the same size.
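Size (x) | 1 | 2 | 3 | 4 | 5 | 6
4 | 2 | 7 | | | |
5 | 5 | | 13 | | |
6 | 8 | 17 | | 15 | |
7 | 9 | | 21 | | 22 | 29
8 | 12 | 26 | | 35 | |
9 | 14 | | 28 | | 40 | 43
10 | 14 | 29 | | 40 | |
11 | 15 | | 26 | | 39 |
12 | 11 | 24 | | | |
13 | 13 | | | | |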
Mode $= l + \dfrac{f_m - f_1}{2f_m - f_1 - f_2} \times h$
$= 15.5 + \dfrac{32 - 16}{64 - 16 - 24} \times 5$
$= 15.5 + \dfrac{16}{24} \times 5$
$= 15.5 + \dfrac{10}{3}$
$= 18.83$ marks
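The grouped-data mode formula can be written as a small helper. The sketch below reuses the numbers from the worked calculation (l = 15.5, f_m = 32, f_1 = 16, f_2 = 24, h = 5); the function name is illustrative.

```python
def grouped_mode(l, h, f_m, f_1, f_2):
    """Mode of a grouped distribution with equal class widths.

    l   : lower limit of the modal class
    h   : class width
    f_m : frequency of the modal class
    f_1 : frequency of the preceding class
    f_2 : frequency of the succeeding class
    """
    return l + (f_m - f_1) / (2 * f_m - f_1 - f_2) * h


mode_marks = grouped_mode(l=15.5, h=5, f_m=32, f_1=16, f_2=24)
print(round(mode_marks, 2))
```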
Q.1 Calculate the mean, median and mode of the following data-
Wages (in Rs) 0-20 20-40 40-60 60-80 80-100 100-120 120-140
No. of Workers 6 8 10 12 6 5 3
Definition
Absolute: expressed in the same units in which the data is expressed.
Relative: in the form of a ratio or percentage, so it is independent of units.
RANGE:-
It is the simplest measure of dispersion.
It is defined as the difference between the largest and smallest values in the series.
R = L − S
R = Range
L = Largest Value
S = Smallest Value
Coefficient of Range $= \dfrac{L - S}{L + S}$
❖ Individual Series:-
Q1: Find the range & Coefficient of Range for the following data: 20,
35, 25, 30, 15
Solution:-
L = Largest Value=35
S = Smallest Value=15
(Range)R = L – S=35-15=20
Coefficient of Range $= \dfrac{L - S}{L + S} = \dfrac{35 - 15}{35 + 15} = \dfrac{20}{50} = 0.4$
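As a quick check of Q1, the range and its coefficient can be computed directly from the data. This is a minimal sketch; the function name is illustrative.

```python
def range_and_coefficient(data):
    """Return (R, coefficient of range) with R = L - S."""
    largest, smallest = max(data), min(data)
    rng = largest - smallest
    return rng, rng / (largest + smallest)


rng, coeff = range_and_coefficient([20, 35, 25, 30, 15])
print(rng, coeff)
```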
❖ Discrete Frequency Distribution:
Q2: Find the range & Coefficient of
(Range)R = L – S=70-10=60
Coefficient of Range $= \dfrac{L - S}{L + S} = \dfrac{70 - 10}{70 + 10} = \dfrac{60}{80} = 0.75$
Continuous Frequency Distribution
Q3: Find the range & Coefficient of Range:
Size 5-10 10-15 15-20 20-25 25-30
F 4 9 15 30 40
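$Q_3$ = size of the $3\left(\dfrac{n+1}{4}\right)$th item = size of the $3\left(\dfrac{7+1}{4}\right)$th item = 28
Symbolically, Interquartile Range $= Q_3 - Q_1 = 28 - 18 = 10$
Quartile Deviation $= \dfrac{Q_3 - Q_1}{2} = \dfrac{28 - 18}{2} = 5$
Coefficient of Q.D. $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{28 - 18}{28 + 18} = 0.217$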
Example:
X 10 20 30 40 50 60
F 2 8 20 35 42 20
Solution:
X F C.F.
10 2 2
20 8 10
30 20 30
40 35 65
50 42 107
60 20 127
N=127
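Median = size of the $\left(\dfrac{9+1}{2}\right)$th term = 40
M.D. from mean: $M.D._{\bar{x}} = \dfrac{\sum |d_{\bar{x}}|}{n} = \dfrac{160}{9} = 17.78$
Coefficient of $M.D._{\bar{x}} = \dfrac{M.D._{\bar{x}}}{\bar{x}} = \dfrac{17.78}{45} = 0.395$
M.D. from median: $M.D._{M} = \dfrac{\sum |d_{m}|}{n} = \dfrac{155}{9} = 17.22$
Coefficient of $M.D._{M} = \dfrac{M.D._{M}}{M} = \dfrac{17.22}{40} = 0.43$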
MEAN DEVIATION(CO1)
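Marks X | Deviation from mean 45, $|d_{\bar{x}}| = |X - 45|$ | Deviation from median 40, $|d_m| = |X - 40|$
20 | 25 | 20
22 | 23 | 18
25 | 20 | 15
38 | 7 | 2
40 | 5 | 0
50 | 5 | 10
65 | 20 | 25
70 | 25 | 30
75 | 30 | 35
N = 9, $\sum X = 405$ | $\sum |d_{\bar{x}}| = 160$ | $\sum |d_m| = 155$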
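The mean-deviation figures for the nine marks can be reproduced with a short script. It is a verification sketch that uses the marks from the table; the helper name is illustrative.

```python
from statistics import mean, median


def mean_deviation(data, centre):
    """Average absolute deviation of the data from a chosen centre."""
    return sum(abs(x - centre) for x in data) / len(data)


marks = [20, 22, 25, 38, 40, 50, 65, 70, 75]
x_bar = mean(marks)     # 45
med = median(marks)     # 40

md_mean = mean_deviation(marks, x_bar)      # 160 / 9
md_median = mean_deviation(marks, med)      # 155 / 9
coeff_mean = md_mean / x_bar
coeff_median = md_median / med
print(round(md_mean, 2), round(md_median, 2))
```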
MEAN DEVIATION(CO1)
Solution:
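M.D. from median $= \dfrac{\sum f |d_m|}{N} = \dfrac{620}{60} = 10.33$
Coefficient of $M.D._{M} = \dfrac{M.D._{M}}{M} = \dfrac{10.33}{40} = 0.258$
Mean $\bar{x} = \dfrac{\sum f x}{N} = \dfrac{2460}{60} = 41$
M.D. from mean $= \dfrac{\sum f |d_{\bar{x}}|}{N} = \dfrac{640}{60} = 10.67$
Coefficient of $M.D._{\bar{x}} = \dfrac{M.D._{\bar{x}}}{\bar{x}} = \dfrac{10.67}{41} = 0.26$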
❖ For an Individual Series: If $x_1, x_2, \ldots, x_n$ are the values of the variable under consideration, $\bar{x}$ is defined as
where $N = \sum_{i=1}^{n} f_i$
Note. In case of a frequency distribution with class intervals, the values
of 𝑥 are the midpoints of the intervals.
Example 1.Find the Variance and standard deviation for the following
individual series.
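x: 3, 6, 8, 10, 18
Solution:
$x$ | $x - \bar{x}$ | $(x - \bar{x})^2$
3 | -6 | 36
6 | -3 | 9
8 | -1 | 1
10 | 1 | 1
18 | 9 | 81
$\sum x = 45$ | | $\sum (x - \bar{x})^2 = 128$
$n = 5$, $\sum x = 45$, $\bar{x} = \dfrac{\sum x}{n} = \dfrac{45}{5} = 9$
$\sigma^2 = \dfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \dfrac{128}{5} = 25.6$
Standard deviation $= \sqrt{\text{variance}} = \sqrt{25.6} = 5.06$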
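The variance and standard deviation of this individual series can be checked with the standard library. Note that the slides divide by n, which corresponds to the population functions `pvariance` and `pstdev` (not the sample versions, which divide by n − 1).

```python
from statistics import pstdev, pvariance

data = [3, 6, 8, 10, 18]

variance = pvariance(data)   # (1/n) * sum((x - mean)^2)
std_dev = pstdev(data)       # square root of the population variance
print(variance, std_dev)
```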
• Example Find the variance and standard deviation for the following
frequency distribution.
• Sol.
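Class | No. of students ($f$) | Mid-point ($x$) | $fx$ | $x - \bar{x} = x - 34$ | $f(x - \bar{x})^2$
5-15 | 10 | 10 | 100 | -24 | 5760
$\bar{x} = \dfrac{\sum f x}{N} = \dfrac{3400}{100} = 34$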
Moments
• In mathematical statistics, moments are basic calculations that can be used to find a probability distribution's mean, variance, and skewness.
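Moment about mean: $\mu_r = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^r}{n}$; $r = 0, 1, 2, \ldots$
For a frequency distribution: $\mu_r = \dfrac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^r}{N}$; $r = 0, 1, 2, \ldots$
where $N = \sum_{i=1}^{n} f_i$
In particular, $\mu_0 = \dfrac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^0 = \dfrac{1}{N}\sum_{i=1}^{n} f_i = \dfrac{N}{N} = 1$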
Note. In case of a frequency distribution with class intervals, the values
of 𝑥 are the midpoints of the intervals.
Example 1.Find the first four moments for the following individual series.
Solution: Calculation of Moments
𝒙 3 6 8 10 18
Now $\bar{x} = \dfrac{\sum x}{n} = \dfrac{45}{5} = 9$
$\mu_1 = \dfrac{\sum (x - \bar{x})}{n} = \dfrac{0}{5} = 0$,
$\mu_2 = \dfrac{\sum (x - \bar{x})^2}{n} = \dfrac{128}{5} = 25.6$,
$\mu_3 = \dfrac{\sum (x - \bar{x})^3}{n} = \dfrac{486}{5} = 97.2$,
$\mu_4 = \dfrac{\sum (x - \bar{x})^4}{n} = \dfrac{7940}{5} = 1588$.
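The four central moments of the series 3, 6, 8, 10, 18 can be verified with a direct implementation of the definition. This is a minimal sketch; the function name is illustrative.

```python
def central_moment(data, r):
    """r-th moment about the mean: (1/n) * sum((x - mean)^r)."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** r for x in data) / n


series = [3, 6, 8, 10, 18]
mu = {r: central_moment(series, r) for r in range(1, 5)}
print(mu)
```

The second moment equals the variance computed for the same series.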
Therefore, for any distribution, $\mu_2$ coincides with the variance of the distribution.
Similarly, $\mu_3 = \dfrac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^3$,
$\mu_4 = \dfrac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^4$, and so on.
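Class | No. of students ($f$) | Mid-point ($x$) | $fx$ | $x - \bar{x} = x - 34$ | $f(x - \bar{x})$ | $f(x - \bar{x})^2$ | $f(x - \bar{x})^3$ | $f(x - \bar{x})^4$
5-15 | 10 | 10 | 100 | -24 | -240 | 5760 | -138240 | 3317760
$\bar{x} = \dfrac{\sum f x}{N} = \dfrac{3400}{100} = 34$
$\mu_1 = \dfrac{\sum f (x - \bar{x})}{N} = \dfrac{0}{100} = 0$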
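$\mu'_r = \dfrac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^r$; $r = 0, 1, 2, \ldots$
where $N = \sum_{i=1}^{n} f_i$
For $r = 0$, $\mu'_0 = \dfrac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^0 = 1$
For $r = 1$, $\mu'_1 = \dfrac{1}{N}\sum_{i=1}^{n} f_i (x_i - A) = \dfrac{1}{N}\sum_{i=1}^{n} f_i x_i - \dfrac{A}{N}\sum_{i=1}^{n} f_i = \bar{x} - A$
For $r = 2$, $\mu'_2 = \dfrac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^2$
For $r = 3$, $\mu'_3 = \dfrac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^3$, and so on.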
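In calculation work, if there is some common factor $h\,(>1)$ in the values of $x - A$, we can ease the calculation by defining $u = \dfrac{x - A}{h}$.
In that case, we have
$\mu'_r = h^r \cdot \dfrac{1}{N}\sum_{i=1}^{n} f_i u_i^r$; $r = 0, 1, 2, \ldots$
Moments about the origin:
$v_r = \dfrac{1}{N}\sum_{i=1}^{n} f_i x_i^r$; $r = 0, 1, 2, \ldots$
where $N = \sum_{i=1}^{n} f_i$
For $r = 0$, $v_0 = \dfrac{1}{N}\sum_{i=1}^{n} f_i x_i^0 = \dfrac{N}{N} = 1$
For $r = 1$, $v_1 = \dfrac{1}{N}\sum_{i=1}^{n} f_i x_i = \bar{x}$
For $r = 2$, $v_2 = \dfrac{1}{N}\sum_{i=1}^{n} f_i x_i^2$, and so on.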
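relations:
$\mu_1 = 0$
$\mu_2 = \mu'_2 - \mu'^2_1$
$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu'^3_1$
$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2\mu'^2_1 - 3\mu'^4_1$
$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$, $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$ ($\beta$-coefficients)
$\gamma_1 = +\sqrt{\beta_1}$, $\gamma_2 = \beta_2 - 3$ ($\gamma$-coefficients)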
= −40 − 48 + 2 = −86.
= 141290.11.
Example 3:Calculate the variance and third central moment from the
following data.
𝒙𝒊 0 1 2 3 4 5 6 7 8
𝐹𝑖 1 9 26 59 72 52 29 7 1
Solution: Calculation of Moments
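$x$ | $f$ | $u = \frac{x - A}{h}$, $A = 4$, $h = 1$ | $fu$ | $fu^2$ | $fu^3$
0 | 1 | -4 | -4 | 16 | -64
1 | 9 | -3 | -27 | 81 | -243
2 | 26 | -2 | -52 | 104 | -208
3 | 59 | -1 | -59 | 59 | -59
4 | 72 | 0 | 0 | 0 | 0
5 | 52 | 1 | 52 | 52 | 52
6 | 29 | 2 | 58 | 116 | 232
7 | 7 | 3 | 21 | 63 | 189
8 | 1 | 4 | 4 | 16 | 64
Total | N = 256 | | $\sum fu = -7$ | $\sum fu^2 = 507$ | $\sum fu^3 = -37$
Moments about the point $A = 4$:
$\mu'_1 = h\,\dfrac{\sum fu}{N} = \dfrac{-7}{256} = -0.02734$
$\mu'_2 = h^2\,\dfrac{\sum fu^2}{N} = \dfrac{507}{256} = 1.9805$
$\mu'_3 = h^3\,\dfrac{\sum fu^3}{N} = \dfrac{-37}{256} = -0.1445$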
Moments about Mean:
$\mu_1 = 0$
$\mu_2 = \mu'_2 - \mu'^2_1 = 1.9805 - (-0.02734)^2 = 1.97975$
Variance = 1.97975
Also $\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu'^3_1$
$= -0.1445 - 3(1.9805)(-0.02734) + 2(-0.02734)^3$
$= 0.0178997$
Third central moment = 0.0178997.
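The whole of Example 3 can be reproduced from the raw data with the short-cut method (A = 4, h = 1). This is a verification sketch; because it keeps full floating-point precision, the last digits differ slightly from the hand calculation, which rounds the intermediate moments.

```python
x = list(range(9))                        # 0, 1, ..., 8
f = [1, 9, 26, 59, 72, 52, 29, 7, 1]
A, h = 4, 1

N = sum(f)
u = [(xi - A) / h for xi in x]


def raw_moment(r):
    """r-th moment about A: h^r * sum(f * u^r) / N."""
    return h ** r * sum(fi * ui ** r for fi, ui in zip(f, u)) / N


m1, m2, m3 = raw_moment(1), raw_moment(2), raw_moment(3)

# Convert moments about A into moments about the mean.
mu2 = m2 - m1 ** 2
mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
print(N, round(mu2, 4), round(mu3, 4))
```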
Skewness
• It tells us whether the distribution is normal or not
• It gives us an idea about the nature and degree of concentration of
observations about the mean
• The empirical relation of mean, median and mode are based on a
moderately skewed distribution
❑ Skewness:
• It means lack of symmetry.
• It gives us an idea about the shape of the curve which we can draw with the help of the given data.
• A distribution is said to be skewed if:
• Mean, median and mode fall at different points, i.e., Mean ≠ Median ≠ Mode;
• Quartiles are not equidistant from the median; and
• The curve drawn with the help of the given data is not symmetrical but stretched more to one side than to the other.
Symmetrical Distribution
A symmetric distribution is a type of distribution where the left side of
the distribution mirrors the right side. In a symmetric distribution, the
mean, mode and median all fall at the same point.
Measures of Skewness:
The measures of skewness are:
• $S_k = M - M_d$,
• $S_k = M - M_o$,
• $S_k = (Q_3 - M_d) - (M_d - Q_1)$,
where $M$ is the mean, $M_d$ the median, $M_o$ the mode, $Q_1$ the first quartile and $Q_3$ the third quartile of the distribution. These are the absolute measures of skewness.
• Coefficients of Skewness: For comparing two series we do not calculate these absolute measures; instead we calculate the relative measures, called the coefficients of skewness, which are pure numbers independent of units of measurement.
Coefficient of Skewness based upon Moments
Definition
It is defined as: $\gamma_1 = \dfrac{\mu_3}{\sqrt{\mu_2^3}}$
Pearson's $\beta_1$ and $\gamma_1$ Coefficients:
$\gamma_1 = \sqrt{\beta_1} = \pm\dfrac{\mu_3}{\sqrt{\mu_2^3}}$
Q1. Karl Pearson coefficient of skewness of a distribution is 0.32, its
standard deviation is 6.5 and mean is 29.6. find the mode of the
distribution.
Solution: Given that $SK_p = 0.32$, $\sigma = 6.5$, mean = 29.6
$SK_p = \dfrac{A.M. - Mode}{S.D.} = \dfrac{3(M - M_d)}{\sigma}$
$0.32 = \dfrac{29.6 - Mode}{6.5} \Longrightarrow Mode = 29.6 - 0.32 \times 6.5 = 27.52$
Kurtosis
• Describe the concepts of kurtosis
• Explain the different measures of kurtosis
• Explain how kurtosis describe the shape of a distribution.
❑ Kurtosis
• If we know the measures of central tendency, dispersion and skewness, we still cannot form a complete idea about the distribution. Let us consider the figure in which all three curves A, B, and C are symmetrical about the mean and have the same range.
A curve of type C, which is more peaked than the normal curve, is called a leptokurtic curve, and for such a curve $\beta_2 > 3$, i.e., $\gamma_2 > 0$.
Q2. For a distribution, the mean is 10, variance is 16, γ1 is +1 and 𝛽2 is 4. Comment
about the nature of distribution. Also find third central moment.
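Solution: $\mu_2 = 16$, so $\gamma_1 = 1 = \pm\dfrac{\mu_3}{\sqrt{\mu_2^3}} = \dfrac{\mu_3}{\sqrt{4096}} \Rightarrow \mu_3 = 64$,
$\beta_2 = 4 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{\mu_4}{256} \Rightarrow \mu_4 = 1024$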
Example 3 The first four moment about the working mean 28.5 of a
distribution are 0.294,7.144,42.409 and 454.98. Calculate the first four
moment about mean. Also evaluate 𝛽1 and 𝛽2 and comment upon
the skewness and kurtosis of the distribution.
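Solution: $\mu'_1 = 0.294$, $\mu'_2 = 7.144$, $\mu'_3 = 42.409$, $\mu'_4 = 454.98$
Moments about mean:
$\mu_1 = 0$,
$\mu_2 = \mu'_2 - \mu'^2_1 = 7.0576$,
$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu'^3_1 = 36.1588$,
$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2\mu'^2_1 - 3\mu'^4_1 = 408.7896$
$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = 3.719$
$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = 8.207$
Skewness: $\mu_3$ is positive, so $\gamma_1 = +\sqrt{\beta_1} = 1.9285$ and the distribution is positively skewed.
Kurtosis: $\beta_2 = 8.207 > 3$, so the distribution is leptokurtic.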
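The conversion from moments about a working mean to central moments, followed by the β and γ coefficients, can be sketched as below. The inputs are the four raw moments of Example 3; the function name is illustrative.

```python
import math


def central_from_raw(m1, m2, m3, m4):
    """Convert moments about an arbitrary point into moments about the mean."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4


mu2, mu3, mu4 = central_from_raw(0.294, 7.144, 42.409, 454.98)

beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
# gamma1 takes the sign of mu3; gamma2 is the excess over the normal value 3.
gamma1 = math.copysign(math.sqrt(beta1), mu3)
gamma2 = beta2 - 3
print(round(mu2, 4), round(mu3, 4), round(mu4, 4), round(beta2, 3))
```

A positive `gamma1` indicates positive skewness and a positive `gamma2` indicates a leptokurtic curve, which matches the comments in the solution.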
Q1. Find all four central moments and Discuss Skewness and Kurtosis
for the following distribution-
No.of Library 5 10 8 16 14 12
✓ Moments
✓ Relation between 𝑣𝑟 𝑎𝑛𝑑 𝜇𝑟
✓ Relation between 𝜇𝑟 𝑎𝑛𝑑 𝜇′𝑟
✓ Moment generating function.
✓ Skewness
✓ Kurtosis
Curve Fitting
• The objective of curve fitting is to find the parameters of a
mathematical model that describes a set of data in a way that
minimizes the difference between the model and the data.
Sol. Let the straight line obtained from the given data be
$y = a + bx$ (1)
then the normal equations are
$\sum y = ma + b\sum x$ (2)
$\sum xy = a\sum x + b\sum x^2$ (3)
where $m = 5$ is the number of data points.
Example 5 Use the method of least squares to fit the curve $y = \dfrac{c_0}{x} + c_1\sqrt{x}$ to the following table of values:
$x$ | $y$ | $\frac{y}{x}$ | $y\sqrt{x}$ | $\frac{1}{\sqrt{x}}$ | $\frac{1}{x^2}$
0.1 | 21 | 210 | 6.64078 | 3.16228 | 100
0.2 | 11 | 55 | 4.91935 | 2.23607 | 25
0.4 | 7 | 17.5 | 4.42719 | 1.58114 | 6.25
0.5 | 6 | 12 | 4.24264 | 1.41421 | 4
1 | 5 | 5 | 5 | 1 | 1
2 | 6 | 3 | 8.48528 | 0.70711 | 0.25
Total: 4.2 | | 302.5 | 33.7152 | 10.1008 | 136.5
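The column totals above are exactly the sums needed by the normal equations for a curve of the form y = c0/x + c1·√x, namely Σ(y/x) = c0·Σ(1/x²) + c1·Σ(1/√x) and Σ(y√x) = c0·Σ(1/√x) + c1·Σx. The sketch below assumes that form of the curve (inferred from the table columns) and solves the 2×2 system by Cramer's rule.

```python
import math

xs = [0.1, 0.2, 0.4, 0.5, 1.0, 2.0]
ys = [21, 11, 7, 6, 5, 6]


def fit_c0_c1(xs, ys):
    """Least-squares fit of y = c0/x + c1*sqrt(x) via the normal equations."""
    s_inv_x2 = sum(1 / x ** 2 for x in xs)            # sum of 1/x^2
    s_inv_sqrt = sum(1 / math.sqrt(x) for x in xs)    # sum of 1/sqrt(x)
    s_x = sum(xs)                                     # sum of x
    s_y_over_x = sum(y / x for x, y in zip(xs, ys))   # sum of y/x
    s_y_sqrt = sum(y * math.sqrt(x) for x, y in zip(xs, ys))  # sum of y*sqrt(x)

    det = s_inv_x2 * s_x - s_inv_sqrt ** 2
    c0 = (s_y_over_x * s_x - s_inv_sqrt * s_y_sqrt) / det
    c1 = (s_inv_x2 * s_y_sqrt - s_inv_sqrt * s_y_over_x) / det
    return c0, c1


c0, c1 = fit_c0_c1(xs, ys)
print(round(c0, 3), round(c1, 3))
```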
4 1
𝑥 0 1 2 3 4
𝑓 1 0 3 10 21
Time series
1. It helps to understand the concept of Time-series for future
prediction of data values.
2. Understand the different basic concept / fundamentals of Time
Series Analysis.
3. Understand the importance of Time Series Analysis.
INTRODUCTION OF TIME SERIES(CO1)
We know that planning for the future is necessary for every business firm, every government institution, every individual and every country. Every family plans its income and expenditure. Likewise, every business plans for its financial resources and sales, and for the maximization of its profit.
Definition: “A time series is a set of observation taken at specified
times, usually at equal intervals”.
“A time series may be defined as a collection of reading belonging to
different time periods of some economic or composite variables”.
By –Ya-Lun-Chau
▪ A time series establishes a relation between “cause” and “effect”.
▪ One variable is “Time”, which is the independent variable, and the second is “Data”, which is the dependent variable.
TIME SERIES ANALYSIS(CO1)
Example(CO1)
We explain it from the following example:
Example 1:
Day | No. of Packets of milk sold
Monday | 90
Tuesday | 88
Wednesday | 85
Thursday | 75
Friday | 72
Saturday | 90
Sunday | 102
Example 2:
Year | Population (in Million)
1921 | 251
1931 | 279
1941 | 319
1951 | 361
1961 | 439
1971 | 548
1981 | 685
• From example 1 it is clear that the sale of milk packets decreases from Monday to Friday and then starts to increase again.
• Similarly, in example 2 the population is continuously increasing.
Time Series (CO1)
Examples
• Stock price, Sensex
• Exchange rate, interest rate, inflation rate, national GDP
• Retail sales
• Electric power consumption
• Number of accident fatalities
Time Series (CO1)
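The method of least squares can be used to fit either a straight-line trend or a parabolic trend.
The straight-line trend is represented by the equation:
$Y_c = a + bX$
$N = 5$, $\sum X = 15$, $\sum Y = 50$, $\sum XY = 166$, $\sum X^2 = 55$
The normal equations are
$\sum Y = Na + b\sum X$
$\sum XY = a\sum X + b\sum X^2$
Substituting the sums gives $50 = 5a + 15b$ and $166 = 15a + 55b$. Multiplying the first equation by 3 and subtracting the second from it:
$-10b = -16$
$b = 1.6$
Now we put the value of $b$ into the equation $\sum Y = Na + b\sum X$: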
Time Series (CO1)
$5a + 15(1.6) = 50$
$5a = 26$
$a = 5.2$
With these values of $a$ and $b$, the trend line is:
$Y_c = a + bX$
$Y_c = 5.2 + 1.6X$
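The trend-line coefficients can be checked by solving the two normal equations directly from the sums used in this example (N = 5, ΣX = 15, ΣY = 50, ΣXY = 166, ΣX² = 55). This is a minimal sketch; the function name is illustrative.

```python
def solve_trend(n, sum_x, sum_y, sum_xy, sum_x2):
    """Solve  sum_y = n*a + b*sum_x  and  sum_xy = a*sum_x + b*sum_x2."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


a, b = solve_trend(n=5, sum_x=15, sum_y=50, sum_xy=166, sum_x2=55)


def trend(x):
    """Trend value Yc = a + b*X for a given X."""
    return a + b * x


print(round(a, 2), round(b, 2))
```

Evaluating `trend(x)` at a future value of X gives the corresponding trend estimate.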
Daily Quiz (CO1)
Q1. Fit a straight line trend by the method of least square (taking 1980 as
year of origin) to the following data:
Year 1980 1981 1982 1983 1984 1985 1986
Recap(CO1)
✓ Moments
✓ Relation between 𝑣𝑟 𝑎𝑛𝑑 𝜇𝑟
✓ Relation between 𝜇𝑟 𝑎𝑛𝑑 𝜇′𝑟
✓ Moment generating function.
✓ Skewness & kurtosis
✓ Curve fitting
✓ Time Series
Correlation
• Identify the direction and strength of a correlation between two factors.
• Compute and interpret the Pearson correlation coefficient and test for
significance.
• Compute and interpret the coefficient of determination.
• Compute and interpret the Spearman correlation coefficient and test
for significance.
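$r_{xy} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{n\,\sigma_x\,\sigma_y}$
or $r(x, y) = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\;\sqrt{n\sum y^2 - (\sum y)^2}}$
Here $n$ is the number of pairs of values of $x$ and $y$.
Note: The correlation coefficient is independent of change of origin and scale.
Let us define two new variables $u$ and $v$ as
$u = \dfrac{x - a}{h}$, $v = \dfrac{y - b}{k}$, where $a, b, h, k$ are constants; then $r_{xy} = r_{uv}$.
Then $r(u, v) = \dfrac{n\sum uv - \sum u \sum v}{\sqrt{n\sum u^2 - (\sum u)^2}\;\sqrt{n\sum v^2 - (\sum v)^2}}$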
$\sum x = 34$, $\sum y = 90$, $\sum x^2 = 248$, $\sum y^2 = 1446$, $\sum xy = 582$
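Solution: Let $u = \dfrac{x - 22}{4}$, $v = \dfrac{y - 24}{6}$
$x$ | $y$ | $u$ | $v$ | $u^2$ | $v^2$ | $uv$
10 | 18 | -3 | -1 | 9 | 1 | 3
14 | 12 | -2 | -2 | 4 | 4 | 4
18 | 24 | -1 | 0 | 1 | 0 | 0
22 | 6 | 0 | -3 | 0 | 9 | 0
26 | 30 | 1 | 1 | 1 | 1 | 1
30 | 36 | 2 | 2 | 4 | 4 | 4
Total | | $\sum u = -3$ | $\sum v = -3$ | $\sum u^2 = 19$ | $\sum v^2 = 19$ | $\sum uv = 12$
Hence, $n = 6$, $\bar{u} = \dfrac{1}{n}\sum u = \dfrac{1}{6}(-3) = -\dfrac{1}{2}$; $\bar{v} = \dfrac{1}{n}\sum v = \dfrac{1}{6}(-3) = -\dfrac{1}{2}$
Then $r_{uv} = \dfrac{n\sum uv - \sum u \sum v}{\sqrt{n\sum u^2 - (\sum u)^2}\;\sqrt{n\sum v^2 - (\sum v)^2}}$
$= \dfrac{6 \times 12 - (-3)(-3)}{\sqrt{6 \times 19 - (-3)^2}\;\sqrt{6 \times 19 - (-3)^2}} = \dfrac{63}{\sqrt{105}\,\sqrt{105}} = 0.6$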
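The result, and the claim that the coefficient does not change under a shift of origin and change of scale, can be checked numerically. The sketch below applies the computational formula to both the original (x, y) pairs and the transformed (u, v) pairs from the table.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson correlation coefficient via the computational formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx ** 2) * (n * syy - sy ** 2)
    )


x = [10, 14, 18, 22, 26, 30]
y = [18, 12, 24, 6, 30, 36]

# Change of origin and scale used in the worked solution.
u = [(xi - 22) / 4 for xi in x]
v = [(yi - 24) / 6 for yi in y]

r_xy = pearson_r(x, y)
r_uv = pearson_r(u, v)
print(round(r_xy, 4), round(r_uv, 4))
```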
Marks | 18 | 19 | 20 | 21 | Total
10-20 | 4 | 2 | 2 | - | 8
20-30 | 5 | 4 | 6 | 4 | 19
30-40 | 6 | 8 | 10 | 11 | 35
40-50 | 4 | 4 | 6 | 8 | 22
50-60 | - | 2 | 4 | 4 | 10
60-70 | - | 2 | 3 | 1 | 6
Total | 19 | 22 | 31 | 28 | 100
Let us define two new variables $u$ and $v$ as $u = \dfrac{y - 45}{10}$, $v = x - 20$
= 0.25
RANK CORRELATION:
Definition: Assuming that no two individuals are bracketed equal in either classification, each of the variables X and Y takes the values 1, 2, ..., n.
Hence, the rank correlation coefficient between A and B is denoted by r, and is given as:
$r = 1 - \dfrac{6\sum D^2}{n(n^2 - 1)}$, where $D$ is the difference between the two ranks of an individual.
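Person | A | B | C | D | E | F | G | H | I | J
Rank in maths | 9 | 10 | 6 | 5 | 7 | 2 | 4 | 8 | 1 | 3
Rank in physics | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10

Person | $R_1$ | $R_2$ | $D = R_1 - R_2$ | $D^2$
A | 9 | 1 | 8 | 64
B | 10 | 2 | 8 | 64
C | 6 | 3 | 3 | 9
D | 5 | 4 | 1 | 1
E | 7 | 5 | 2 | 4
F | 2 | 6 | -4 | 16
G | 4 | 7 | -3 | 9
H | 8 | 8 | 0 | 0
I | 1 | 9 | -8 | 64
J | 3 | 10 | -7 | 49
$\sum D^2 = 280$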
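With ΣD² = 280 and n = 10, the rank correlation follows from Spearman's formula r = 1 − 6ΣD²/(n(n² − 1)). The sketch below computes it from the two rank lists in the table; the function name is illustrative.

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Spearman rank correlation for untied ranks."""
    n = len(ranks_1)
    d_squared = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks_1, ranks_2))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1)), d_squared


maths = [9, 10, 6, 5, 7, 2, 4, 8, 1, 3]
physics = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

r_rank, sum_d2 = spearman_from_ranks(maths, physics)
print(sum_d2, round(r_rank, 3))
```

The coefficient comes out negative (about −0.697), so under these ranks a high position in maths tends to go with a low position in physics.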
Uses:
• It is used for finding correlation coefficient if we are dealing with
qualitative characteristics which cannot be measured quantitatively but
can be arranged serially.
• It can also be used where actual data are given.
• In case of extreme observations, Spearman’s formula is preferred to
Pearson’s formula.
Limitations
• It is not applicable in the case of bivariate frequency distribution.
• For n > 30, this formula should not be used unless the ranks are given,
since in the contrary case the calculations are quite time-consuming.
𝒙 68 64 75 50 64 80 75 40 55 64
𝑦 62 58 68 45 81 60 68 48 50 70
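$X$ | 68 | 64 | 75 | 50 | 64 | 80 | 75 | 40 | 55 | 64 | Total
$Y$ | 62 | 58 | 68 | 45 | 81 | 60 | 68 | 48 | 50 | 70 |
Ranks in $X$ ($x$) | 4 | 6 | 2.5 | 9 | 6 | 1 | 2.5 | 10 | 8 | 6 |
Ranks in $Y$ ($y$) | 5 | 7 | 3.5 | 10 | 1 | 6 | 3.5 | 9 | 8 | 2 |
$D = x - y$ | -1 | -1 | -1 | -1 | 5 | -5 | -1 | 1 | 0 | 4 | 0
$D^2$ | 1 | 1 | 1 | 1 | 25 | 25 | 1 | 1 | 0 | 16 | 72
Repeated values: in $X$, 75 occurs 2 times and 64 occurs 3 times; in $Y$, 68 occurs 2 times.
Adding the correction $\dfrac{m(m^2 - 1)}{12}$ for each value repeated $m$ times gives $72 + 0.5 + 2 + 0.5 = 75$.
$r = 1 - \dfrac{6 \times 75}{990} = \dfrac{6}{11} = 0.545$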
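The tied-rank calculation can be sketched as follows: each repeated value receives the average of the ranks it occupies, and a correction m(m² − 1)/12 is added to ΣD² for every value repeated m times. The helper names are illustrative, and rank 1 is given to the largest value, as in the table.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 = largest value; tied values share the average of their ranks."""
    positions = {}
    for pos, val in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(val, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value repeated m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)


def spearman_tied(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = d_squared + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n * (n * n - 1)), d_squared, adjusted


X = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
Y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]

r_tied, d2, d2_adjusted = spearman_tied(X, Y)
print(d2, d2_adjusted, round(r_tied, 3))
```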
Q1. Find the rank correlation coefficient for the following data:
𝑥 23 27 28 28 29 30 31 33 35 36
𝑦 18 20 22 27 21 29 27 29 28 29
✓ Correlation
✓ Karl Pearson coefficient of correlation
✓ Rank Correlation
✓ Tied Rank
Regression
• Explanation of the variation in the dependent variable, based on
the variation in independent variables and Predict the values of the
dependent variable.
❑ REGRESSION ANALYSIS:
• Regression measures the nature and extent of correlation. Regression is the estimation or prediction of unknown values of one variable from known values of another variable.
Difference between curve fitting and regression analysis: The only
fundamental difference, if any between problems of curve fitting and
regression is that in regression, any of the variables may be considered
as independent or dependent while in curve fitting, one variable
cannot be dependent.
Curve of regression and regression equation:
• If two variates 𝑥 𝑎𝑛𝑑 𝑦 are correlated i.e., there exists an
association or relationship between them, then the scatter diagram
➢ LINEAR REGRESSION:
• When the points of the scatter diagram concentrate around a straight line, the regression is called linear and this straight line is known as the line of regression.
• Regression will be called non-linear if there exists a relationship
other than a straight line between the variables under
consideration.
LINES OF REGRESSION:
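Let $y = a + bx$ ... (1)
be the equation of the regression line of $y$ on $x$. The normal equations are
$\sum y = na + b\sum x$ ... (2)
$\sum xy = a\sum x + b\sum x^2$ ... (3)
Solving (2) and (3) for $a$ and $b$, we get
$b = \dfrac{\sum xy - \frac{1}{n}\sum x \sum y}{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2} = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$ ... (4)
$a = \dfrac{\sum y}{n} - b\,\dfrac{\sum x}{n} = \bar{y} - b\bar{x}$ ... (5)
Eq. (5) gives $\bar{y} = a + b\bar{x}$.
Hence the line $y = a + bx$ passes through the point $(\bar{x}, \bar{y})$.
Putting $a = \bar{y} - b\bar{x}$ in the equation $y = a + bx$, we get
$y - \bar{y} = b(x - \bar{x})$ ... (6)
Eq. (6) is called the regression line of $y$ on $x$. $b$ is called the regression coefficient of $y$ on $x$ and is usually denoted by $b_{yx}$.
$y - \bar{y} = b_{yx}(x - \bar{x})$
$b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$
To see this, take the origin at $(\bar{x}, \bar{y})$, so that $\sum x = 0$, $\sum xy = n r \sigma_x \sigma_y$ and $\sum x^2 = n\sigma_x^2$. Equation (3) then gives
$\Rightarrow n r \sigma_x \sigma_y = a(0) + b\,n\sigma_x^2$
$\Rightarrow b = r\,\dfrac{\sigma_y}{\sigma_x}$
where $r$ is the coefficient of correlation and $\sigma_x$ and $\sigma_y$ are the standard deviations of the $x$ and $y$ series respectively.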
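The derivation above can be sketched in code: the slope from the normal equations should agree with r·σ_y/σ_x, and the fitted line should pass through (x̄, ȳ). The five data points are illustrative values chosen for this sketch, not a worked example from the slides.

```python
from statistics import mean, pstdev


def regression_y_on_x(xs, ys):
    """Return (a, b) for the line y = a + b*x from the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # equation (4)
    a = sy / n - b * sx / n                          # equation (5)
    return a, b


# Illustrative data (an assumption for this sketch).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 3, 6]

a, b_yx = regression_y_on_x(xs, ys)

# Cross-check: b_yx = r * sigma_y / sigma_x, using population deviations.
cov = mean(x * y for x, y in zip(xs, ys)) - mean(xs) * mean(ys)
r = cov / (pstdev(xs) * pstdev(ys))
b_alt = r * pstdev(ys) / pstdev(xs)
print(round(a, 3), round(b_yx, 3), round(b_alt, 3))
```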
G.M. between them $= \sqrt{\dfrac{r\sigma_y}{\sigma_x} \times \dfrac{r\sigma_x}{\sigma_y}} = \sqrt{r^2} = r$ = coefficient of correlation.
Property 2.If one of the regression coefficients is greater than unity,
the other must be less than unity.
Proof. The two regression coefficients are $b_{yx} = \dfrac{r\sigma_y}{\sigma_x}$ and $b_{xy} = \dfrac{r\sigma_x}{\sigma_y}$.
Let $b_{yx} > 1$, then $\dfrac{1}{b_{yx}} < 1$
$\sigma_x^2 + \sigma_y^2 > 2\sigma_x\sigma_y$
$(\sigma_x - \sigma_y)^2 > 0$, which is true.
Property 4:Regression coefficients are independent of the origin but
not of scale.
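Proof. Let $u = \dfrac{x - a}{h}$, $v = \dfrac{y - b}{k}$, where $a$, $b$, $h$ and $k$ are constants.
Then $\sigma_x = h\sigma_u$, $\sigma_y = k\sigma_v$ and $r_{xy} = r_{uv}$, so $b_{yx} = \dfrac{r\sigma_y}{\sigma_x} = \dfrac{k}{h}\,b_{vu}$.
Similarly, $b_{xy} = \dfrac{h}{k}\,b_{uv}$.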
Thus 𝑏𝑦𝑥 and 𝑏𝑥𝑦 are both independent of a and b but not of ℎ 𝑎𝑛𝑑 𝑘.
$\tan\theta = \dfrac{1 - r^2}{r} \cdot \dfrac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$, where $r$, $\sigma_x$, $\sigma_y$ have their usual meanings.
➢ NON-LINEAR REGRESSION:
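Let $y = a + bx + cx^2$
be a second-degree parabolic curve of regression of $y$ on $x$.
$\Rightarrow \sum y = na + b\sum x + c\sum x^2$
$\Rightarrow \sum xy = a\sum x + b\sum x^2 + c\sum x^3$
$\Rightarrow \sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$
$\sum xy = a\sum x + b\sum x^2 + c\sum xz$
$\sum yz = a\sum z + b\sum xz + c\sum z^2$
$x$ | 1 | 2 | 3 | 4
$y$ | 12 | 18 | 24 | 30
$z$ | 0 | 1 | 2 | 3
$\sum xy = a\sum x + b\sum x^2 + c\sum xz$
$\sum yz = a\sum z + b\sum xz + c\sum z^2$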
Q1. Fit a straight line trend by the method of least square (taking 1978
as year of origin) to the following data:
Year 1979 1980 1981 1982 1983 1984
Production 5 7 9 10 12 17
X 1 2 3 4 5
Y 2 4 5 3 6
Q4. Fit a straight line trend by the method of least squares to the
following data: -
Year 2012 2013 2014 2015 2016 2017
Sales of T.V. sets (in '000) 7 10 12 14 17 24
Q3. Sum of squares of items 2430, mean is 7 N=12, find the variance.
i. 176.5
ii. 12.38
iii. 153.26
iv. 14
Q4. Calculate the standard deviation of the following
9, 8, 6,5,8,6
i. 2
ii. 3
iii. 1.414
iv. 2.414