Lab 1 - 2021197285 - Siti Raziatul
LAB 1
Prepared By:
2021197285
Prepared For:
b) Mode
c) Median
Variability
a) Range
c) Variance
d) Kurtosis
e) Skewness
We were given a task by our lecturer, Dr. Nabilah, to be completed on November 16th,
2021. The task is an introduction to statistics and consists of four exercises.
Exercise 1 covers descriptive statistics, Exercise 2 inferential statistics, Exercise 3
the correlation coefficient and Exercise 4 linear regression. All data and instructions
were clearly provided.
In Exercise 1, we have to calculate the central tendency and variability and plot graphs
based on the data obtained. In Exercise 2, we have to calculate the significance of the
data using a t-test. In Exercise 3, we have to calculate the correlation coefficient using
a suitable analysis, create graphs and determine which variable has the highest
correlation with the Call data. Lastly, in Exercise 4, we have to plot graphs to visualize
the linear regression between the Call data and each of the selected variables.
This is an individual task, given to build a better understanding of statistics and to
refresh previous study. For this task, I have chosen the Call data together with four
other variables: Jobs, Low Education, Renters and Unemployed. These five data sets
are used from Exercise 1 through Exercise 4.
PROCEDURE
Central Tendency
a) Mean
Mean is the average of the recorded data. To calculate the mean, 𝜇, the total sum of
the data values is divided by the number of data values. The equation is shown below:

$$\mu = \frac{\sum x}{n}$$

$\sum x$ = total sum of the data values
$n$ = total number of data values
b) Mode
Mode is the value that occurs most frequently in the data. We can identify it either by
arranging and inspecting the data or by using a graph. For example:
• Given data set: 1, 1, 2, 2, 3, 3, 3, 4, 4, 5
• Answer: 3
c) Median
Median is found by arranging the values from lowest to highest. The midpoint value
of the arrangement is then the median. When the number of values is even, the median
is the average of the two middle values, as in the example below.
Example 2:
• Given in a data set: {1, 2, 3, 4, 5, 6}
• Solution: (3 + 4) / 2 = 3.5
• Median is 3.5
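The three measures of central tendency above can be checked with a short Python sketch. It uses the standard `statistics` module and the two example data sets from this section; the variable names are illustrative only.

```python
import statistics

# Example data sets taken from the Mode and Median examples above.
mode_data = [1, 1, 2, 2, 3, 3, 3, 4, 4, 5]
median_data = [1, 2, 3, 4, 5, 6]

# Mean: total sum of the values divided by the number of values.
mean_value = sum(mode_data) / len(mode_data)

# Mode: the value that occurs most often.
mode_value = statistics.mode(mode_data)

# Median: the midpoint of the sorted values. With an even count,
# it is the average of the two middle values.
median_value = statistics.median(median_data)

print(mean_value, mode_value, median_value)
```

For these inputs the mode is 3 and the median is 3.5, matching the worked examples.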
Variability
a) Range
Range is the difference between the highest value and the lowest value. For
example:
• Given data: 4, 5, 1, 9, 7
• Solution: The highest value is 9 and the lowest value is 1.
• Answer: 9 - 1 = 8
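As a quick check, the range of the example data set can be computed in Python as the largest value minus the smallest value. This is a minimal sketch with an illustrative variable name.

```python
# Example data set from the Range example above.
data = [4, 5, 1, 9, 7]

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

print(data_range)  # 9 - 1 = 8
```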
b) Standard deviation
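Standard deviation measures how widely a data set is spread relative to its mean.
It is calculated as the square root of the variance:

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

This is the population form, which divides by $N$, the number of values.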
c) Variance
Variance measures how far the data values spread from the mean. The formula for
the sample variance is

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

$s^2$ = sample variance
$x_i$ = value of one observation
$\bar{x}$ = mean value of the observations
$n$ = number of observations
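The two formulas above differ in their divisor: the standard deviation formula divides by N (population form), while the variance formula divides by n - 1 (sample form). The sketch below implements both directly from the definitions. The function names and the small data set are illustrative assumptions, not part of the lab data.

```python
import math

def population_std(values):
    """Square root of the mean squared deviation (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def sample_variance(values):
    """Sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# Illustrative data set (the same values as the Range example).
data = [4, 5, 1, 9, 7]

print(sample_variance(data))   # squared deviations sum to 36.8, divided by 4
print(population_std(data))    # square root of 36.8 / 5
```

Taking the square root of the sample variance gives the sample standard deviation, which is the form spreadsheet descriptive-statistics tools usually report.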
d) Kurtosis
Kurtosis is a measure of the combined weight of a distribution's tails relative to
the centre of the distribution. It is usually analysed with a graph (histogram)
to see the peak. There are three types of kurtosis: leptokurtic (positive excess
kurtosis), mesokurtic (the normal distribution) and platykurtic (negative excess
kurtosis). The formula for kurtosis is:

$$\mathrm{Kurt} = \frac{\mu_4}{\sigma^4}$$

$\mathrm{Kurt}$ = kurtosis
$\mu_4$ = fourth central moment
$\sigma$ = standard deviation
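A minimal sketch of the kurtosis formula above, computed from the fourth central moment and the population standard deviation. The function name and data are illustrative. Spreadsheet tools such as Excel typically report excess kurtosis (this ratio minus 3) with a small-sample correction, so their output will generally differ from this plain ratio.

```python
def kurtosis(values):
    """Fourth central moment divided by the squared variance (sigma^4)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance (sigma^2)
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2

# Illustrative data: m2 = 2 and m4 = 6.8, so the ratio is 6.8 / 4 = 1.7.
example = [1, 2, 3, 4, 5]
print(kurtosis(example))
```

A normal distribution gives a ratio of 3 under this definition, which is why values above 3 are described as leptokurtic and values below 3 as platykurtic.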
e) Skewness
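Skewness measures the asymmetry of a distribution and shows the direction of its
outliers. If the graph peaks on the left side with a longer tail to the right, the
distribution has positive skew. If the peak is on the right side with a longer tail
to the left, it has negative skew.
$\tilde{\mu}_3$ = skewness
$X_i$ = random variable
$\sigma$ = standard deviation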
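The sign convention for skewness can be illustrated with a short sketch. It uses the population moment form (third central moment divided by the cube of the standard deviation), which is an assumption: the exact formula used in the lab is not shown here, and spreadsheet functions apply a sample correction. The function name and data are illustrative.

```python
def skewness(values):
    """Third central moment divided by sigma cubed (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 10]            # peak on the left, long tail to the right
left_tailed = [-x for x in right_tailed]   # mirror image
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0)   # positive skew
print(skewness(left_tailed) < 0)    # negative skew
print(skewness(symmetric))          # zero for symmetric data
```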
The t-test is also known as Student's t-test. It tells how significant the difference
between groups is. In short, it tests for a difference by comparing the group means.
It is a procedure used to test the hypothesis of no difference between two variables
and to see whether or not the samples are related to each other.
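The sketch below shows one common form of the test, the two-sample t-test with unequal variances (Welch's test), computed from summary statistics. Choosing this variant is an assumption, since the report does not name the variant used. The function name is illustrative, and the inputs are the Call and LowEduc means, variances and counts reported in Exercise 2.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return the t statistic and Welch-Satterthwaite degrees of freedom."""
    se1 = var1 / n1  # squared standard error of the first mean
    se2 = var2 / n2  # squared standard error of the second mean
    t_stat = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t_stat, df

# Summary statistics for Call and LowEduc as reported in Exercise 2.
t_stat, df = welch_t(24.73563218, 816.9874365, 87,
                     125.1149425, 16750.33547, 87)

print(round(t_stat, 3), round(df, 1))  # approximately -7.064 and 94.4
```

The magnitude of the t statistic and the degrees of freedom (about 94) agree with the values in the Call and LowEduc table. The result is significant when the absolute t statistic exceeds the two-tail critical value.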
Figure 4 : T-test graph
The formula given:
Correlation Coefficient
Correlation coefficient formulas are used to find the strength of the relationship
between data. The formula returns a value between -1 and 1.
Explanation:
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$
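The correlation formula divides the sample covariance of the two variables by the product of their sample standard deviations. A minimal sketch follows, with an illustrative function name and made-up data chosen so that the result is exactly at the two extremes of the range.

```python
import math

def pearson_r(xs, ys):
    """Sample covariance divided by the product of sample standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)

xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases exactly with xs
falling = [10, 8, 6, 4, 2]   # decreases exactly with xs

print(pearson_r(xs, rising))    # close to 1
print(pearson_r(xs, falling))   # close to -1
```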
Linear Regression
ANALYSIS AND DISCUSSION
1. Central Tendency of the Call data and any four of the independent variables in the
data
          CALL DATA   JOBS       LOW EDUCATION   RENTERS    UNEMPLOYED
MEAN      24.7356     838.5172   125.1149        317.5057   46.5747
MODE      1           112        38              23         23
MEDIAN    16          182        98              191        35
2. Variability of the Call data and any four of the independent variables in the data
                     CALL DATA    JOBS            LOW EDUCATION   RENTERS       UNEMPLOYED
STANDARD ERROR       3.0644       250.1634        13.8756         36.5496       4.5556
STANDARD DEVIATION   28.58        2,333.37        129.42          340.91        42.49
VARIANCE             816.99       5,258,491.51    16,750.34       115,729.42    1,724.22
KURTOSIS             10.2380      31.2044         15.0944         5.3054        4.1831
SKEWNESS             2.8143       5.2179          3.2588          2.1393        1.8876
RANGE                176.0000     16,976.0000     898.0000        1,851.0000    209.0000
MINIMUM              0.0000       0.0000          0.0000          15.0000       0.0000
MAXIMUM              176.0000     16,976.0000     898.0000        1,866.0000    209.0000
SUM                  2,152.0000   72,951.0000     10,885.0000     27,623.0000   4,052.0000
COUNT                87.0000      87.0000         87.0000         87.0000       87.0000
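The table lists a standard error without defining it. Assuming it is the standard error of the mean (the sample standard deviation divided by the square root of the count), the Call data entry can be reproduced from the variance and count in the same table. The variable names are illustrative.

```python
import math

# Call data values taken from the variability table above.
call_variance = 816.99
call_count = 87

# Standard error of the mean: s / sqrt(n), with s = sqrt(variance).
call_std = math.sqrt(call_variance)
call_standard_error = call_std / math.sqrt(call_count)

print(round(call_std, 2), round(call_standard_error, 4))
```

The result agrees with the reported Call data values of 28.58 and 3.0644 to within rounding.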
3. Using appropriate graph, plot the Call data as dependent variables and the other 4
selected variables in the data as independent variables in 4 different graphs.
Figure 7.3.3 : Bar Chart Call Data and Renters Variables
Exercise 2 – Inferential Statistics
1. Calculate the statistical significance of the Call data and any four of the
independent variables in the data by using a t-test.
                               CALL        JOB
Mean                           24.73563    838.5172
Variance                       816.9874    5258492
Observations                   87          87
Hypothesized Mean Difference   0
df                             86
t Stat                         -3.30981
P(T<=t) one-tail               0.000683
t Critical one-tail            1.662765
P(T<=t) two-tail               0.001365
t Critical two-tail            1.987934
Figure 7.4.1 : Significance of Call Data and Job Variables Using T-Test
                               CALL           LOWEDUC
Mean                           24.73563218    125.1149425
Variance                       816.9874365    16750.33547
Observations                   87             87
Hypothesized Mean Difference   0
df                             94
t Stat                         -7.064005712
P(T<=t) one-tail               1.39043E-10
t Critical one-tail            1.661225855
P(T<=t) two-tail               2.78086E-10
t Critical two-tail            1.985523442
Figure 7.4.2 : Significance of Call Data and Loweduc Variables Using T-Test
                               CALL           RENTERS
Mean                           24.73563218    317.5057
Variance                       816.9874365    115729.4
Observations                   87             87
Hypothesized Mean Difference   0
df                             87
t Stat                         -7.999022754
P(T<=t) one-tail               2.48584E-12
t Critical one-tail            1.662557349
P(T<=t) two-tail               4.97169E-12
t Critical two-tail            1.987608282
Figure 7.4.3 : Significance of Call Data and Renters Variables Using T-Test
                               CALL        UNEMPLOYED
Mean                           24.73563    46.57471264
Variance                       816.9874    1724.224004
Observations                   87          87
Hypothesized Mean Difference   0
df                             153
t Stat                         -4.04086
P(T<=t) one-tail               4.2E-05
t Critical one-tail            1.654874
P(T<=t) two-tail               8.41E-05
t Critical two-tail            1.97559
Figure 7.4.4 : Significance of Call Data and Unemployed Variables Using T-Test
Exercise 3 – Correlation Coefficient
1. You need to calculate the correlation coefficient using suitable correlation analysis
technique.
VARIABLES       CALL DATA   JOBS     LOW EDUCATION   RENTERS   UNEMPLOYED
LOW EDUCATION   0.7529      0.2338   1.0000          0.7382    0.7486
2. Create appropriate graph to visualize the correlation between Call and the
independent variables.
[Figure: plot of Calls against Jobs]
[Figure: Correlation Call and Unemployed Variables (CALL against UNEMPLOYED)]
[Figure: plot of CALLS against LOWEDUC]
[Figure: Correlation Call Data and Renters Variables (CALLS against RENTERS)]
3. Determine which among the variables has the highest correlation with Call data.
Based on Figure 8.1 (correlation coefficient for each independent variable), the
correlation with Call data is 0.583 for Jobs, 0.753 for LowEduc, 0.745 for Renters and
0.737 for Unemployed. LowEduc therefore has the highest correlation with Call data.
The chart of highest to lowest correlation is shown below.
Exercise 4 – Linear Regression
1. You need to create an appropriate graph to visualize the linear regression between
Call and the independent variables. Show all calculations involved.
[Figure: plot of Calls against Jobs]
Regression Statistics
Multiple R          0.583207288
R Square            0.34013074
Adjusted R Square   0.332367573
Standard Error      23.35481332
Observations        87

ANOVA
             df   SS          MS          F        Significance F
Regression   1    23897.899   23897.899   43.813   3.07527E-09
Residual     85   46363.021   545.447
Total        86   70260.920
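The regression statistics can be derived from the ANOVA sums of squares. The sketch below recomputes them for the Call and Jobs regression using the standard relationships for simple linear regression with one predictor. The variable names are illustrative, and the inputs are the SS values and observation count from the table above.

```python
import math

# Values taken from the ANOVA table for Call against Jobs.
n = 87
ss_regression = 23897.899
ss_residual = 46363.021

ss_total = ss_regression + ss_residual
df_regression = 1          # one independent variable
df_residual = n - 2        # observations minus slope and intercept

# R Square: share of the total variation explained by the regression.
r_square = ss_regression / ss_total
multiple_r = math.sqrt(r_square)

# Adjusted R Square penalises the number of predictors.
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual

# Mean squares, F statistic and standard error of the regression.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual
standard_error = math.sqrt(ms_residual)

print(round(multiple_r, 4), round(r_square, 4), round(adjusted_r_square, 4))
print(round(f_statistic, 3), round(standard_error, 4))
```

The recomputed values agree with the table: Multiple R of about 0.5832, R Square of about 0.3401, Adjusted R Square of about 0.3324, F of about 43.813 and Standard Error of about 23.3548.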
[Figure: Correlation Call and LowEduc Variables (Calls against LowEduc)]
Regression Statistics
Multiple R          0.752942
R Square            0.566921
Adjusted R Square   0.561826
Standard Error      18.92043
Observations        87

ANOVA
             df   SS         MS          F          Significance F
Regression   1    39832.4    39832.399   111.2691   4.07E-17
Residual     85   30428.52   357.98259
Total        86   70260.92
[Figure: Correlation Call Data and Renters Variables (Calls against Renters)]
Regression Statistics
Multiple R          0.7847
R Square            0.6157
Adjusted R Square   0.6112
Standard Error      17.8234
Observations        87.0000

ANOVA
             df   SS           MS           F          Significance F
Regression   1    43258.6373   43258.6373   136.1731   2.44002E-19
Residual     85   27002.2823   317.6739
Total        86   70260.9195
[Figure: Correlation Call and Unemployed Variables (CALL against UNEMPLOYED)]
Regression Statistics
Multiple R          0.5832
R Square            0.3401
Adjusted R Square   0.3324
Standard Error      23.3548
Observations        87

ANOVA
             df   SS           MS           F         Significance F
Regression   1    23897.8986   23897.8986   43.8134   3.0753E-09
Residual     85   46363.0210   545.4473
Total        86   70260.9195
CONCLUSION
REFERENCES
Measures of Variability | Real Statistics Using Excel. (2022). Retrieved 31 January 2022,
from https://www.real-statistics.com/descriptive-statistics/measures-variability/