Name: Hassan Raza
Roll:20-10755
Math-107-Project
Section: C
Data Set:3
1: Introduction: The data provided to me consists of 51 observations of the gross and budget of different released Bollywood movies. The highest gross is 500.75 (Baahubali), while the highest budget is 130. The data has no outliers, is continuous, and is quantitative because it is numeric. In what follows, Data A refers to the gross and Data B to the budget, both in crores.
2: Methodology:
Finding Number of classes
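No. of classes: choose the smallest $k$ such that $2^k > n$.
$2^5 = 32 < 51$ and $2^6 = 64 > 51$, so $k = 6$.
No. of classes (for both variables) = 6
Finding class interval: $h = \dfrac{\text{highest value} - \text{lowest value}}{k}$
For Data A: $h = \dfrac{500.75 - 2}{6} = 83.125$, rounded up to a class interval of 85
For Data B: $h = \dfrac{130 - 6}{6} \approx 20.67$, rounded up to a class interval of 22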
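The class-count rule and the class-interval formula above can be checked with a short Python sketch. It assumes n = 51 and takes the highest and lowest values from the report; the function names are illustrative. Rounding the width up to 85 and 22 was the author's choice, so the code only returns the unrounded width.

```python
def number_of_classes(n: int) -> int:
    """Return the smallest k with 2**k > n (the 2^k > n rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def raw_class_width(highest: float, lowest: float, k: int) -> float:
    """Return (highest - lowest) / k before any rounding."""
    return (highest - lowest) / k


k = number_of_classes(51)
print(k)                              # 6, because 2**5 = 32 <= 51 < 64 = 2**6
print(raw_class_width(500.75, 2, k))  # 83.125 for Data A
print(raw_class_width(130, 6, k))     # about 20.67 for Data B
```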
1: Frequency Distribution
[Figure: frequency polygons of Data A and Data B. Vertical axis: Frequency (F). Horizontal axis: class midpoints, 44, 129, 214, 299, 384, 469 for Data A and 16.5, 38.5, 60.5, 82.5, 104.5, 126.5 for Data B.]
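A minimal sketch of how the class limits and the midpoints plotted on the horizontal axes follow from the lowest value and the class interval. It assumes integer class limits, each class spanning `width` whole numbers (as in 2-86, 87-171); `class_table` is a hypothetical helper, not something from the report. The lower class boundary used as L in the later formulas is the lower limit minus 0.5 (for example 1.5 and 86.5).

```python
def class_table(lowest: int, width: int, k: int) -> list[tuple[int, int, float]]:
    """Return (lower limit, upper limit, midpoint) for k consecutive classes."""
    rows = []
    for i in range(k):
        lower = lowest + i * width
        upper = lower + width - 1          # integer limits, e.g. 2-86
        rows.append((lower, upper, (lower + upper) / 2))
    return rows


for lower, upper, mid in class_table(2, 85, 6):   # Data A
    print(f"{lower}-{upper}: midpoint {mid}")
for lower, upper, mid in class_table(6, 22, 6):   # Data B
    print(f"{lower}-{upper}: midpoint {mid}")
```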
Interpretation: The two variables have very different frequency polygons. The curve for Data A is high in the first class and then flattens out, while the curve for Data B rises and falls in a zig-zag shape.
Key: In both graphs, the median is marked with the blue line and the mode with the red line.
Data: A
Mean: $\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{3859}{51} \approx 75.67$
Median: $\text{Median} = L + \dfrac{h}{f}\left(\dfrac{n}{2} - c\right)$
Median class: $n/2 = 51/2 = 25.5$, which lies in the 1st class (2-86), so $L = 1.5$, $h = 85$, $f = 36$ and $c = 0$ (no class precedes it).
$\text{Median} = 1.5 + \dfrac{85}{36}(25.5 - 0) = 1.5 + 60.21 = 61.71$
Mode: $\text{Mode} = L + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$
Modal class = class with the highest frequency ($f_m = 36$): 2-86, with $f_1 = 0$ (no preceding class) and $f_2 = 14$.
$\text{Mode} = 1.5 + \dfrac{36 - 0}{(36 - 0) + (36 - 14)} \times 85 = 1.5 + 52.76 = 54.26$
The data is positively skewed, since the mean is greater than the median, which in turn is greater than the mode, i.e. (75.67 > 61.71 > 54.26).
The mean is the most appropriate measure of central tendency here, because the data is continuous and has no outliers.
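The grouped-data formulas for the mean, median and mode can be written as small Python functions. This sketch simply plugs in the values of L, h, f, c, f_m, f_1 and f_2 read off the frequency table; the function names are illustrative, not part of the report.

```python
def grouped_mean(sum_fx: float, n: int) -> float:
    """Mean = sum(f*x) / sum(f)."""
    return sum_fx / n


def grouped_median(L: float, h: float, f: int, c: int, n: int) -> float:
    """Median = L + (h / f) * (n/2 - c), using the class containing the n/2-th value."""
    return L + (h / f) * (n / 2 - c)


def grouped_mode(L: float, h: float, fm: int, f1: int, f2: int) -> float:
    """Mode = L + (fm - f1) / ((fm - f1) + (fm - f2)) * h."""
    return L + (fm - f1) / ((fm - f1) + (fm - f2)) * h


print(round(grouped_mean(3859, 51), 2))                 # 75.67 (Data A)
print(round(grouped_median(1.5, 85, 36, 0, 51), 2))     # Data A, class 2-86
print(round(grouped_mode(1.5, 85, 36, 0, 14), 2))       # Data A, class 2-86
print(round(grouped_median(27.5, 22, 15, 15, 51), 2))   # 42.9 (Data B)
```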
Measures of Dispersion:
1: Range = largest value - lowest value
Range = 500.75 - 2 = 498.75
Coefficient of range = $\dfrac{L - S}{L + S} = \dfrac{500.75 - 2}{500.75 + 2} = \dfrac{498.75}{502.75} \approx 0.992$
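The range and its coefficient need only the largest (L) and smallest (S) values. A minimal Python sketch with an illustrative function name:

```python
def range_and_coefficient(largest: float, smallest: float) -> tuple[float, float]:
    """Return (range, coefficient of range) = (L - S, (L - S) / (L + S))."""
    spread = largest - smallest
    return spread, spread / (largest + smallest)


print(range_and_coefficient(500.75, 2))   # Data A
print(range_and_coefficient(130, 6))      # Data B
```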
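Quartile deviation
Quartile deviation = $\dfrac{Q_3 - Q_1}{2}$, where $Q_3 = L + \dfrac{h}{f}\left(\dfrac{3n}{4} - C\right)$ and $Q_1 = L + \dfrac{h}{f}\left(\dfrac{n}{4} - C\right)$
$Q_3$ class: $3n/4 = 3(51)/4 = 38.25$, which lies in the 2nd class (87-171), so $L = 86.5$, $f = 14$ and $C = 36$.
$Q_3 = 86.5 + \dfrac{85}{14}(38.25 - 36) = 86.5 + 13.66 = 100.16$
$Q_1$ class: $n/4 = 51/4 = 12.75$, which lies in the 1st class (2-86), so $L = 1.5$, $f = 36$ and $C = 0$.
$Q_1 = 1.5 + \dfrac{85}{36}(12.75 - 0) = 1.5 + 30.10 = 31.60$
Q.D $= \dfrac{Q_3 - Q_1}{2} = \dfrac{100.16 - 31.60}{2} = \dfrac{68.56}{2} = 34.28$
Coefficient of quartile deviation $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{100.16 - 31.60}{100.16 + 31.60} = \dfrac{68.56}{131.76} \approx 0.520$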
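The grouped quartile formula has the same shape as the median formula, with the position n/4 or 3n/4 in place of n/2. This sketch assumes n = 51 and the class values (L, f, C) stated for Data A; the function names are illustrative.

```python
def grouped_quartile(L: float, h: float, f: int, C: int, position: float) -> float:
    """Q = L + (h / f) * (position - C); position is n/4 for Q1 and 3n/4 for Q3."""
    return L + (h / f) * (position - C)


def quartile_deviation(q1: float, q3: float) -> tuple[float, float]:
    """Return (Q.D, coefficient of Q.D) = ((Q3 - Q1)/2, (Q3 - Q1)/(Q3 + Q1))."""
    return (q3 - q1) / 2, (q3 - q1) / (q3 + q1)


n = 51
q1 = grouped_quartile(1.5, 85, 36, 0, n / 4)         # Data A, class 2-86
q3 = grouped_quartile(86.5, 85, 14, 36, 3 * n / 4)   # Data A, class 87-171
qd, coeff = quartile_deviation(q1, q3)
print(round(q1, 2), round(q3, 2), round(qd, 2), round(coeff, 3))
```

Note that the coefficient divides the full difference Q3 - Q1 (not Q.D) by Q3 + Q1.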
Mean deviation (about mean) = $\dfrac{\sum f\,|X - \bar{x}|}{\sum f} = \dfrac{2280}{51} \approx 44.71$
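A sketch of the mean deviation for grouped data, using absolute deviations of the class midpoints from the mean. The Data A frequencies here are partly inferred: 36 and 14 for the first two classes come from the report, and, since n = 51, the one remaining movie (the 500.75 maximum) is assumed to fall in the top class 427-511. With these frequencies the sketch reproduces the report's totals of 3859 and 2280. The function names are illustrative.

```python
def mean_deviation(midpoints: list[float], freqs: list[int]) -> float:
    """Mean deviation about the mean: sum(f * |x - mean|) / sum(f)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(midpoints, freqs)) / n


def coefficient_of_mean_deviation(md: float, mean: float) -> float:
    """Coefficient of mean deviation = mean deviation / mean."""
    return md / mean


# Data A midpoints; frequencies 36 and 14 are from the report, and the single
# remaining movie is assumed to lie in the top class (427-511).
midpoints_a = [44, 129, 214, 299, 384, 469]
freqs_a = [36, 14, 0, 0, 0, 1]
print(round(mean_deviation(midpoints_a, freqs_a), 2))   # 44.71 (= 2280 / 51)
```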
The first four moments about the mean show that the data is positively skewed (the third moment $m_3$ is positive), which agrees with the ordering of the mean, median and mode.
$b_1 = \dfrac{m_3^2}{m_2^3} = \dfrac{26665.14^2}{69231.20^3} = \dfrac{711029691.21}{331820389639663} \approx 2.1 \times 10^{-6} \approx 0$
$b_2 = \dfrac{m_4}{m_2^2} = \dfrac{10481160254}{69231.20^2} = \dfrac{10481160254}{4792959053.4} \approx 2.187$
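The moment ratios can be sketched in Python. `central_moments` computes the first four moments about the mean from grouped data, `beta_coefficients` applies $b_1 = m_3^2/m_2^3$ and $b_2 = m_4/m_2^2$, and `kurtosis_type` compares $b_2$ with 3, the value for a normal curve. All three names are illustrative; the printed examples reuse the moments reported for Data A and Data B.

```python
def central_moments(midpoints: list[float], freqs: list[int]) -> list[float]:
    """Return [m1, m2, m3, m4], the first four moments about the mean."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return [sum(f * (x - mean) ** r for x, f in zip(midpoints, freqs)) / n
            for r in range(1, 5)]


def beta_coefficients(m2: float, m3: float, m4: float) -> tuple[float, float]:
    """Return (b1, b2) = (m3**2 / m2**3, m4 / m2**2)."""
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2


def kurtosis_type(b2: float) -> str:
    """Classify the peak relative to the normal curve, for which b2 = 3."""
    if b2 < 3:
        return "platykurtic"
    if b2 > 3:
        return "leptokurtic"
    return "mesokurtic"


# Moments (m2, m3, m4) as reported for Data A and Data B.
print(beta_coefficients(69231.20, 26665.14, 10481160254))
print(beta_coefficients(924.238, 18429.81, 2180372.7))
```

Because $b_1$ squares $m_3$, it measures the size of the skew only; the direction comes from the sign of $m_3$.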
Data: B
Mean: $\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{2535.5}{51} \approx 49.72$ crores
Median class: $n/2 = 51/2 = 25.5$, which lies in the 2nd class (28-49)
$\text{Median} = 27.5 + \dfrac{22}{15}(25.5 - 15) = 42.90$ crores
Mode: $\text{Mode} = L + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$
Modal class = class with the highest frequency ($f_m = 15$): 28-49
Measures of Dispersion:
Range = largest value - lowest value
Range = 130 - 6 = 124
Coefficient of range = $\dfrac{L - S}{L + S} = \dfrac{130 - 6}{130 + 6} = \dfrac{124}{136} \approx 0.912$
Quartile deviation = $\dfrac{Q_3 - Q_1}{2}$
$Q_3$ class: $3n/4 = 3(51)/4 = 38.25$, which lies in the 4th class (72-93), so $L = 71.5$, $f = 12$ and $C = 36$ (cumulative frequency of the first three classes).
$Q_3 = 71.5 + \dfrac{22}{12}(38.25 - 36) = 71.5 + 4.125 = 75.625$
$Q_1$ class: $n/4 = 51/4 = 12.75$, which lies in the 1st class (6-27), so $L = 5.5$, $f = 15$ and $C = 0$.
$Q_1 = 5.5 + \dfrac{22}{15}(12.75 - 0) = 5.5 + 18.7 = 24.2$
Q.D $= \dfrac{75.625 - 24.2}{2} = \dfrac{51.425}{2} \approx 25.71$
Coefficient of Q.D $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{75.625 - 24.2}{75.625 + 24.2} = \dfrac{51.425}{99.825} \approx 0.515$
Mean deviation (about mean) = $\dfrac{\sum f\,|X - \bar{x}|}{\sum f} = \dfrac{1735.05}{51} \approx 34.02$
Coefficient of mean deviation = $\dfrac{\text{mean deviation}}{\text{mean}} = \dfrac{34.02}{49.715} \approx 0.6843$
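Standard deviation: $S = \sqrt{\dfrac{\sum f(X - \bar{x})^2}{\sum f}} = \sqrt{\dfrac{47136.15}{51}} = \sqrt{924.238} \approx 30.40$
Coefficient of variation = $\dfrac{S}{\bar{x}} \times 100 = \dfrac{30.40}{49.715} \times 100 \approx 61.15\%$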
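Since the standard deviation is the square root of the second moment about the mean, it can be computed directly from $\sum f(X - \bar{x})^2$ and $\sum f$. A minimal sketch, assuming the Data B sum of squared deviations given on the M2 line (47136.15) and the mean 49.715; the function names are illustrative.

```python
import math


def standard_deviation(sum_sq_dev: float, n: int) -> float:
    """S = sqrt(sum(f * (x - mean)**2) / sum(f)), i.e. the square root of m2."""
    return math.sqrt(sum_sq_dev / n)


def coefficient_of_variation(sd: float, mean: float) -> float:
    """C.V. = S / mean * 100, expressed as a percentage."""
    return sd / mean * 100


# Sum of squared deviations for Data B, taken from the M2 line of the report.
s = standard_deviation(47136.15, 51)
print(round(s, 2), round(coefficient_of_variation(s, 49.715), 2))
```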
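Skewness
$m_1 = \dfrac{\sum f(X - \bar{x})}{\sum f} = \dfrac{0}{51} = 0$
$m_2 = \dfrac{\sum f(X - \bar{x})^2}{\sum f} = \dfrac{47136.15}{51} = 924.238$
$m_3 = \dfrac{\sum f(X - \bar{x})^3}{\sum f} = \dfrac{939920.33}{51} = 18429.81$
$m_4 = \dfrac{\sum f(X - \bar{x})^4}{\sum f} = \dfrac{111199011.81}{51} = 2180372.7$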
The first four moments about the mean show that the data is positively skewed ($m_3 > 0$), which agrees with the mean being greater than the median.
$b_1 = \dfrac{m_3^2}{m_2^3} = \dfrac{18429.81^2}{924.238^3} = \dfrac{339657896.63}{789498777.094} \approx 0.43$
$b_2 = \dfrac{m_4}{m_2^2} = \dfrac{2180372.7}{924.238^2} = \dfrac{2180372.7}{854215.88} \approx 2.55$
As $b_2 < 3$, the distribution is flat-topped and the curve is platykurtic.
Variable B is closer to a normal distribution than Variable A, since its kurtosis $b_2 = 2.55$ is closer to 3 (Variable A has $b_2 \approx 2.19$); a normal distribution has $b_2 = 3$ and skewness close to zero.
Conclusion: I studied the data and applied various measures of central tendency (mean, median, mode). The mean of Set A was about 75.7 crores and that of Set B was about 49.7 crores; the mean of Set A is high because of the large value of 500.75 crores. In both sets the median was lower than the mean (for Set B it was 42.90 crores). Measures of dispersion, such as the quartile deviation, were also calculated. Finally, the skewness was found to be positive for both Set A and Set B. The two data sets are related and give broadly similar results, but Data A differs somewhat because of the large value of 500.75 crores.