2: Methodology:: Name: Hassan Raza Roll:20-10755 Math-107-Project Section: C


Name: Hassan Raza

Roll:20-10755
Math-107-Project
Section: C
Data Set:3
1: Introduction: The data set provided to me has 51 observations and gives the
gross and budget of different Bollywood movies. The highest value of gross is
500.75 (Baahubali), while the highest value of budget is 130. The data has no outliers and
is continuous, and it is quantitative because the values are numeric.

2: Methodology:
Finding Number of classes

No of classes: 2^k > n
2^k > 51
2^6 = 64 > 51
k = 6
No of classes (for both the variables) = 6
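The 2^k > n rule can be checked with a short Python sketch. The helper name `number_of_classes` is hypothetical and used only for illustration; n = 51 is the total frequency of the tables in this report.

```python
def number_of_classes(n: int) -> int:
    """Smallest k such that 2**k > n (the rule used in this report)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


# 2**5 = 32 is not greater than 51, but 2**6 = 64 is.
print(number_of_classes(51))  # 6
```

The same answer (k = 6) is obtained for n = 50, so the choice of 6 classes does not depend on which of the two counts is used.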
Finding class interval: h = (highest value - lowest value) / number of classes
For Data A: h = (500.75 - 2) / 6 = 83.13, taken as 85
For Data B: h = (130 - 6) / 6 = 20.67, taken as 22
For Data A class interval = 85; for Data B class interval = 22
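A minimal sketch of the class-interval calculation follows. It assumes the range is divided by the number of classes (k = 6), since that is the divisor that gives widths close to the 85 and 22 used in the tables; the function name `class_width` is hypothetical.

```python
def class_width(highest: float, lowest: float, k: int) -> float:
    """Raw class width: range divided by the number of classes."""
    return (highest - lowest) / k


width_a = class_width(500.75, 2, 6)  # Data A (gross)
width_b = class_width(130, 6, 6)     # Data B (budget)

# The raw widths are then rounded up to convenient values (85 and 22).
print(width_a, width_b)
```

Rounding the raw width up rather than down makes sure that six classes cover the whole range of each variable.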

1: Frequency Distribution

For Data A
Class intervals   Midpoint (x)   Frequency (F)
2-86              44             36
87-171            129            14
172-256           214            0
257-341           299            0
342-426           384            0
427-511           469            1
                                 ∑F = 51

For Data B
Class intervals   Midpoint (x)   Frequency (F)
6-27              16.5           15
28-49             38.5           15
50-71             60.5           6
72-93             82.5           12
94-115            104.5          1
116-137           126.5          2
                                 ∑F = 51
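The midpoints and totals of the two frequency distributions can be reproduced with the sketch below. The variable and function names are illustrative only; the class limits and frequencies are copied from the tables.

```python
classes_a = [(2, 86), (87, 171), (172, 256), (257, 341), (342, 426), (427, 511)]
freq_a = [36, 14, 0, 0, 0, 1]
classes_b = [(6, 27), (28, 49), (50, 71), (72, 93), (94, 115), (116, 137)]
freq_b = [15, 15, 6, 12, 1, 2]


def midpoints(classes):
    """Midpoint of each class: (lower limit + upper limit) / 2."""
    return [(low + high) / 2 for low, high in classes]


print(midpoints(classes_a))      # [44.0, 129.0, 214.0, 299.0, 384.0, 469.0]
print(midpoints(classes_b))      # [16.5, 38.5, 60.5, 82.5, 104.5, 126.5]
print(sum(freq_a), sum(freq_b))  # 51 51
```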
Frequency Polygons:

[Figure: frequency polygons of Data A and Data B; x-axis: class midpoints (x), y-axis: frequency (F)]

Interpretation: The two variables have very different frequency polygons. The
polygon of Data A starts high and then stays flat, while the polygon of Data B has a zigzag shape.
Key: In both graphs, the median is indicated with the blue line and the mode with the red line.

2: Measures of Central Tendency

Class      Class        Midpoint  F    Fx     C.F  |X-mean|  F|X-mean|  (X-mean)²  F(X-mean)²
intervals  limits       (x)
2-86       1.5-86.5     44        36   1584   36   31.66     1139.76    1002.355   36084.78
87-171     86.5-171.5   129       14   1806   50   53.34     746.76     2845.155   39832
172-256    171.5-256.5  214       0    0      50   138.34    0          19137.95   0
257-341    256.5-341.5  299       0    0      50   223.34    0          49880      0
342-426    341.5-426.5  384       0    0      50   308.34    0          95073      0
427-511    426.5-511.5  469       1    469    51   393.34    393.34     154716     154716
Total                             51   3859                  2280                  230,632.95
Mean, Median and Mode
For Data A

Mean = ∑Fx / ∑F
Mean = 3859/51 = 75.66
Median = L + (h/f)(n/2 - c)
Median class: n/2 = 51/2 = 25.5, which lies in the 1st row (2-86), so c = 0
Median = 1.5 + (85/36)(25.5 - 0)
Median = 1.5 + 60.21
Median = 61.71
Mode = L + [(fm - f1) / ((fm - f1) + (fm - f2))] * h
Modal class = highest frequency class (F = 36) = 2-86, so f1 = 0 and f2 = 14
Mode = 1.5 + [(36 - 0) / ((36 - 0) + (36 - 14))] * 85
Mode = 1.5 + 52.76
Mode = 54.26

The data is positively skewed, as the mean is higher than the median, which is higher
than the mode, i.e. (75.66 > 61.71 > 54.26).
Mean is the most appropriate measure of central tendency, as this data is continuous and
has no outliers.
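The grouped-data formulas for the mean, median and mode can be sketched in Python as below. The function names are hypothetical; the sketch takes c as the cumulative frequency of the classes before the median class (0 when the median class is the first one) and treats a missing neighbour of the modal class as frequency 0.

```python
def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum(F*x) / sum(F)."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)


def grouped_median(lower_bounds, width, freqs):
    """Median = L + (h/f) * (n/2 - c), c = cumulative frequency before the class."""
    half = sum(freqs) / 2
    c = 0
    for lower, f in zip(lower_bounds, freqs):
        if f > 0 and c + f >= half:
            return lower + (width / f) * (half - c)
        c += f
    raise ValueError("no class reaches n/2")


def grouped_mode(lower_bounds, width, freqs):
    """Mode = L + (fm - f1) / ((fm - f1) + (fm - f2)) * h for the first modal class."""
    i = freqs.index(max(freqs))
    fm = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0
    return lower_bounds[i] + (fm - f1) / ((fm - f1) + (fm - f2)) * width


# Data A: midpoints, lower class boundaries and frequencies from the tables
midpoints_a = [44, 129, 214, 299, 384, 469]
lower_a = [1.5, 86.5, 171.5, 256.5, 341.5, 426.5]
freq_a = [36, 14, 0, 0, 0, 1]

mean_a = grouped_mean(midpoints_a, freq_a)
median_a = grouped_median(lower_a, 85, freq_a)
mode_a = grouped_mode(lower_a, 85, freq_a)
print(round(mean_a, 2), round(median_a, 2), round(mode_a, 2))
```

Comparing the three printed values shows the direction of skewness: mean > median > mode indicates a positive skew.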

Measures of Dispersion:
1: Range = largest value - lowest value
Range = 500.75 - 2
Range = 498.75
Coefficient of range = (L - S) / (L + S)
Coefficient of range = (500.75 - 2) / (500.75 + 2)
Coefficient of range = 498.75 / 502.75
Coefficient of range = 0.992
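The range and its coefficient for Data A can be verified with a few lines of Python. The function name `coefficient_of_range` is illustrative only.

```python
def coefficient_of_range(largest: float, smallest: float) -> float:
    """(L - S) / (L + S) for the largest and smallest observations."""
    return (largest - smallest) / (largest + smallest)


range_a = 500.75 - 2
coef_a = coefficient_of_range(500.75, 2)
print(range_a, round(coef_a, 3))  # 498.75 0.992
```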
Quartile Deviation
Quartile deviation = (Q3 - Q1) / 2
Q3 = L + (h/f)(3n/4 - c)
Q3 class: 3n/4 = 3(51)/4 = 38.25, which lies in the 2nd class (87-171), so c = 36
Q3 = 86.5 + (85/14)(38.25 - 36)
Q3 = 86.5 + 6.07 * 2.25
Q3 = 100.16
Q1 = L + (h/f)(n/4 - c)
Q1 class: n/4 = 51/4 = 12.75, which lies in the 1st row (2-86), so c = 0
Q1 = 1.5 + (85/36)(12.75 - 0)
Q1 = 31.60
Q.D = (100.16 - 31.60) / 2
Q.D = 34.28
Coefficient of quartile deviation = (Q3 - Q1) / (Q3 + Q1)
= (100.16 - 31.60) / (100.16 + 31.60)
= 68.56 / 131.76
= 0.520
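The quartile formula is the same as the median formula with n/2 replaced by n/4 or 3n/4, so one helper covers both quartiles. The sketch below is only an illustration; `grouped_quantile` is a hypothetical name, and c is taken as the cumulative frequency of the classes before the quartile class.

```python
def grouped_quantile(lower_bounds, width, freqs, p):
    """L + (h/f) * (p*n - c) for the class that contains the p*n-th value."""
    target = p * sum(freqs)
    c = 0  # cumulative frequency of the classes already passed
    for lower, f in zip(lower_bounds, freqs):
        if f > 0 and c + f >= target:
            return lower + (width / f) * (target - c)
        c += f
    raise ValueError("p must be between 0 and 1")


# Data A: lower class boundaries and frequencies
lower_a = [1.5, 86.5, 171.5, 256.5, 341.5, 426.5]
freq_a = [36, 14, 0, 0, 0, 1]

q1 = grouped_quantile(lower_a, 85, freq_a, 0.25)
q3 = grouped_quantile(lower_a, 85, freq_a, 0.75)
quartile_deviation = (q3 - q1) / 2
coef_qd = (q3 - q1) / (q3 + q1)
print(round(q1, 2), round(q3, 2), round(quartile_deviation, 2), round(coef_qd, 3))
```

Note that the coefficient uses Q3 - Q1 in the numerator, not the quartile deviation itself.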
Mean deviation (about mean) = ∑F|X - mean| / ∑F = 2280/51 = 44.71

Coefficient of mean deviation = mean deviation / mean = 44.71/75.66 = 0.59
Standard deviation = √(∑F(X - mean)² / ∑F) = √(230632.95/51) = √4522.21 = 67.25
Coefficient of variation = (S / mean) * 100 = (67.25/75.66) * 100 = 88.88

Sk = (mean - mode) / standard deviation = (75.66 - 54.26) / 67.25 = 0.318

The value of skewness indicates that the data is positively skewed.

Standard deviation is the best measure of dispersion for this data, as it gives a clear view
of the spread of the data around the mean.
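The mean deviation, standard deviation, coefficient of variation and Pearson's skewness for Data A can be cross-checked with the sketch below. It assumes the standard deviation divides by ∑F = 51, as in the written formula, and it takes the mode from the grouped-mode formula with f1 = 0 and f2 = 14; all variable names are illustrative.

```python
from math import sqrt

# Data A: class midpoints and frequencies
midpoints_a = [44, 129, 214, 299, 384, 469]
freq_a = [36, 14, 0, 0, 0, 1]

n = sum(freq_a)
mean = sum(f * x for x, f in zip(midpoints_a, freq_a)) / n

# Mean deviation about the mean uses absolute deviations.
mean_dev = sum(f * abs(x - mean) for x, f in zip(midpoints_a, freq_a)) / n

# Standard deviation, dividing by n (assumption: population form).
sd = sqrt(sum(f * (x - mean) ** 2 for x, f in zip(midpoints_a, freq_a)) / n)

cv = sd / mean * 100

# Grouped mode of the modal class 2-86: L = 1.5, fm = 36, f1 = 0, f2 = 14, h = 85
mode = 1.5 + (36 - 0) / ((36 - 0) + (36 - 14)) * 85
skewness = (mean - mode) / sd  # Pearson's first coefficient of skewness

print(round(mean_dev, 2), round(sd, 2), round(cv, 2), round(skewness, 3))
```

Dividing by n - 1 = 50 instead of 51 would give a slightly larger standard deviation, so the divisor should be stated whenever the value is reported.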
Moments
X - Mean   F(X - Mean)   F(X - Mean)²   F(X - Mean)³    F(X - Mean)⁴
-31.66     -1139.76      36084.80       -1142444.76     36169801.10
53.34      746.76        39832.17       2124647.94      113328721.11
138.34     0             0              0               0
223.34     0             0              0               0
308.34     0             0              0               0
393.34     393.34        154716.36      60856131.31     23937150690.15
Total      0 (approx.)   230633.33      61838334.49     24086649212.36

M1 = ∑F(X - mean) / ∑F = 0/51 = 0
M2 = ∑F(X - mean)² / ∑F = 230633.33/51 = 4522.22
M3 = ∑F(X - mean)³ / ∑F = 61838334.49/51 = 1212516.36
M4 = ∑F(X - mean)⁴ / ∑F = 24086649212.36/51 = 472287239.46

b1 = M3² / M2³ = (1212516.36)² / (4522.22)³ = 15.90
b2 = M4 / M2² = 472287239.46 / (4522.22)² = 23.09

The first four moments about the mean show that the data is positively skewed (M3 > 0),
and this agrees with the mean, median and mode of the data.
As b2 > 3, the distribution is sharply peaked and the curve is leptokurtic.

Data: B

Class      Class        Midpoint  F    Fx      C.F  |X-mean|  F|X-mean|  (X-mean)²  F(X-mean)²
intervals  limits       (x)
6-27       5.5-27.5     16.5      15   247.5   15   33.21     498.15     1102.9     16543.5
28-49      27.5-49.5    38.5      15   577.5   30   11.215    168.225    125.77     1886.55
50-71      49.5-71.5    60.5      6    363     36   10.785    64.71      116.31     697.86
72-93      71.5-93.5    82.5      12   990     48   32.785    393.42     1074.85    12898.2
94-115     93.5-115.5   104.5     1    104.5   49   54.785    54.785     3001.39    3001.39
116-137    115.5-137.5  126.5     2    253     51   76.785    153.57     5895.93    11791.86
Total                             51   2535.5                 1332.86               46819.36

Mean = ∑Fx / ∑F
Mean = 2535.5/51 = 49.71 Crores
Median class: n/2 = 51/2 = 25.5, which lies in the 2nd row (28-49), so c = 15
Median = 27.5 + (22/15)(25.5 - 15)
Median = 42.90 Crores
Mode = L + [(fm - f1) / ((fm - f1) + (fm - f2))] * h
Modal class = highest frequency class (F = 15) = 28-49; the class 6-27 has the same frequency, so f1 = 15 and f2 = 6
Mode = 27.5 + [(15 - 15) / ((15 - 15) + (15 - 6))] * 22
Mode = 27.5 + 0
Mode = 27.5 Crores
The data is skewed to the right (positively skewed), as the mean is greater than the median, which is greater than the
mode, i.e. (49.715 > 42.90 > 27.5).
Mean is the most appropriate measure of central tendency, as this data is continuous and
has no outliers.
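In Data B two classes share the highest frequency, so the choice of modal class affects the result. The sketch below lists every class with the top frequency and then evaluates the mode formula for the class 28-49 used in this report; `modal_classes` is a hypothetical helper name.

```python
classes_b = ["6-27", "28-49", "50-71", "72-93", "94-115", "116-137"]
freq_b = [15, 15, 6, 12, 1, 2]


def modal_classes(classes, freqs):
    """All classes whose frequency equals the highest frequency."""
    top = max(freqs)
    return [c for c, f in zip(classes, freqs) if f == top]


print(modal_classes(classes_b, freq_b))  # ['6-27', '28-49']

# Mode for the class 28-49: L = 27.5, fm = 15, f1 = 15, f2 = 6, h = 22
mode_b = 27.5 + (15 - 15) / ((15 - 15) + (15 - 6)) * 22
print(mode_b)  # 27.5
```

Because fm = f1, the fraction is zero and the mode falls on the lower boundary of the class, which is the boundary shared by the two tied classes.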

Measures of Dispersion:
Range = largest value - lowest value
Range = 130 - 6
Range = 124
Coefficient of range = (L - S) / (L + S)
Coefficient of range = (130 - 6) / (130 + 6)
Coefficient of range = 124 / 136
Coefficient of range = 0.912

Quartile deviation = (Q3 - Q1) / 2
Q3 = L + (h/f)(3n/4 - c)
Q3 class: 3n/4 = 3(51)/4 = 38.25, which lies in the 4th class (72-93), so c = 36
Q3 = 71.5 + (22/12)(38.25 - 36)
Q3 = 71.5 + 4.125
Q3 = 75.625
Q1 = L + (h/f)(n/4 - c)
Q1 class: n/4 = 51/4 = 12.75, which lies in the 1st class (6-27), so c = 0
Q1 = 5.5 + (22/15)(12.75 - 0)
Q1 = 24.2
Q.D = (75.625 - 24.2) / 2
Q.D = 25.71
Coefficient of quartile deviation = (Q3 - Q1) / (Q3 + Q1)
Coefficient of Q.D = (75.625 - 24.2) / (75.625 + 24.2)
Coefficient of Q.D = 0.515
Mean deviation (about mean) = ∑F|X - mean| / ∑F = 1332.86/51 = 26.13,
where ∑F|X - mean| = 498.15 + 168.225 + 64.71 + 393.42 + 54.785 + 153.57 = 1332.86
Coefficient of mean deviation = mean deviation / mean = 26.13/49.715 = 0.526
Standard deviation = √(∑F(X - mean)² / ∑F) = √(46819.36/51) = √918.03 = 30.30
Coefficient of variation = (S / mean) * 100 = (30.30/49.715) * 100 = 60.95

Skewness

Sk = (mean - mode) / standard deviation = (49.715 - 27.5) / 30.30 = 0.733

The value of skewness indicates that the data is positively skewed.
Standard deviation is the best measure of dispersion for this data, as it gives a clear view
of the spread of the data around the mean.
Moments
(deviations are taken from the unrounded mean 2535.5/51)
X - Mean   F(X - Mean)   F(X - Mean)²   F(X - Mean)³   F(X - Mean)⁴
-33.22     -498.24       16549.23       -549693.94     18258461
-11.22     -168.24       1886.87        -21162.59      237353
10.78      64.71         697.81         7525.39        81156
32.78      393.41        12897.73       422843.38      13862630
54.78      54.78         3001.32        164425.31      9007928
76.78      153.57        11791.66       905414.65      69521642
Total      0             46824.63       929352.20      110969171

M1 = ∑F(X - mean) / ∑F = 0/51 = 0
M2 = ∑F(X - mean)² / ∑F = 46824.63/51 = 918.13
M3 = ∑F(X - mean)³ / ∑F = 929352.20/51 = 18222.59
M4 = ∑F(X - mean)⁴ / ∑F = 110969171/51 = 2175866.10

The first four moments about the mean show that the data is
positively skewed (M3 > 0), and the results agree with the mean, median and mode of the
data.
b1 = M3² / M2³ = (18222.59)² / (918.13)³ = 0.43
b2 = M4 / M2² = 2175866.10 / (918.13)² = 2.58

As b2 < 3, the distribution is flat-topped and the curve is platykurtic.
Variable B is closer to a normal distribution, as its kurtosis (2.58) is closer to 3; a
normal distribution has kurtosis 3 and skewness near zero.
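The moment calculations for both variables can be cross-checked with the sketch below. It uses the unrounded mean, so small differences from hand-worked values that round the mean first are expected; the function names `central_moments` and `beta_coefficients` are hypothetical.

```python
def central_moments(midpoints, freqs):
    """First four moments about the mean for grouped data."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return [
        sum(f * (x - mean) ** r for x, f in zip(midpoints, freqs)) / n
        for r in (1, 2, 3, 4)
    ]


def beta_coefficients(midpoints, freqs):
    """b1 = M3^2 / M2^3 (skewness) and b2 = M4 / M2^2 (kurtosis)."""
    m1, m2, m3, m4 = central_moments(midpoints, freqs)
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2


midpoints_a = [44, 129, 214, 299, 384, 469]
freq_a = [36, 14, 0, 0, 0, 1]
midpoints_b = [16.5, 38.5, 60.5, 82.5, 104.5, 126.5]
freq_b = [15, 15, 6, 12, 1, 2]

b1_a, b2_a = beta_coefficients(midpoints_a, freq_a)
b1_b, b2_b = beta_coefficients(midpoints_b, freq_b)
print(round(b1_a, 2), round(b2_a, 2))
print(round(b1_b, 2), round(b2_b, 2))
```

A value of b2 below 3 points to a platykurtic curve and a value above 3 to a leptokurtic one, so the two printed b2 values can be compared directly with 3.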
Conclusion: I studied the data and applied various measures, such as measures of central tendency
(mean, median, mode). The mean of Set A was 75.66 crores and for Set B it was 49.71
crores; for Set A it was high due to the high value of 500.75 crores. The median was
61.71 crores for Set A and 42.90 crores for Set B. Measures of dispersion (quartile
deviation etc.) were also calculated. Further, the skewness was calculated and found to be positive for both Set A and
Set B. Both the data sets are related to each other and give about the same results, but Data A
differs slightly due to the large value of 500.75 crores.
