Lab 1 - 2021197285 - Siti Raziatul


UNIVERSITI TEKNOLOGI MARA

FACULTY OF ARCHITECTURE, PLANNING AND SURVEYING


CENTRE OF STUDIES FOR QUANTITY SURVEYING

MASTER OF SCIENCE IN GEOGRAPHICAL INFORMATION SCIENCE


(AP720)

GIS 721 GEOSPATIAL STATISTIC

LAB 1

Prepared By:

SITI RAZIATUL ASYIQIN BINTI ROSLI

2021197285

Prepared For:

DR NABILAH BINTI NAHARUDDIN


TABLE OF CONTENTS

INTRODUCTION TO TASK
PROCEDURE
    Central Tendency
        a) Mean
        b) Mode
        c) Median
    Variability
        a) Range
        b) Standard deviation
        c) Variance
        d) Kurtosis
        e) Skewness
    Correlated or paired T-Test
    Correlation Coefficient
    Linear Regression
ANALYSIS AND DISCUSSION
    Exercise 1 – Descriptive Statistics
    Exercise 2 – Inferential Statistics
    Exercise 3 – Correlation Coefficient
    Exercise 4 – Linear Regression
CONCLUSION
REFERENCES

INTRODUCTION TO TASK

Our lecturer, Dr. Nabilah, gave us a task to be completed on November 16th, 2021. The
task is an introduction to statistics and consists of four exercises: Exercise 1 on
Descriptive Statistics, Exercise 2 on Inferential Statistics, Exercise 3 on Correlation
Coefficient and Exercise 4 on Linear Regression. All data and instructions were clearly
provided.

In Exercise 1, we calculate the central tendency and variability and plot graphs based
on the data obtained. In Exercise 2, we calculate the significance of the data using a
t-test. In Exercise 3, we calculate the correlation coefficient using a suitable
analysis, create graphs and determine which variable has the highest correlation with
the Call data. Lastly, in Exercise 4, we plot graphs to visualize the linear regression
between the Call data and each of the selected variables.

This is an individual task, given to build a better understanding of statistics and to
refresh previous studies. For this task, I have chosen the Call data and four other
variables: Jobs, Low Education, Renters and Unemployed. These five data sets are used
from Exercise 1 through Exercise 4.

PROCEDURE

Central Tendency

a) Mean
The mean is the average of the recorded data. To calculate the mean, $\mu$, the sum of
all data values is divided by the number of data values. The equation is shown below:

$$\mu = \frac{\sum x}{n}$$

$\sum x$ = sum of all data values
$n$ = number of data values

b) Mode
The mode is the value that occurs most frequently in the data. We can find it either by
sorting the data and counting each value, or by using a graph of the data. For example:
• Given data set: 1, 1, 2, 2, 3, 3, 3, 4, 4, 5
• Answer: 3

c) Median
The median is found by arranging the values from lowest to highest; the midpoint of the
ordered values is the median. When the number of values is even, the median is the mean
of the two middle values. The formula for the median is:

Figure 1 : Median Formula


For example:
Example 1:
• Given in a data set: {1, 2, 3, 4, 5}
• Median is 3.

Example 2:
• Given in a data set: {1, 2, 3, 4, 5, 6}
• Solution: (3 + 4) / 2 = 3.5
• Median is 3.5
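
The three measures of central tendency can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module and the small example data sets above; the variable names are illustrative and are not part of the lab data.

```python
from statistics import mean, median, mode

# Data set from the mode example above
data = [1, 1, 2, 2, 3, 3, 3, 4, 4, 5]

data_mean = mean(data)   # sum of the values divided by the number of values
data_mode = mode(data)   # the most frequently occurring value

# Median examples: odd and even number of values
odd_median = median([1, 2, 3, 4, 5])       # the single middle value
even_median = median([1, 2, 3, 4, 5, 6])   # mean of the two middle values

print(data_mean, data_mode, odd_median, even_median)
```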

Variability

a) Range
Range is the difference between the highest value and the lowest value. For
example:
• Given data: 4, 5, 1, 9, 7
• Solution: The highest value is 9 and the lowest value is 1.
• Answer: 9 - 1 = 8

b) Standard deviation
Standard deviation measures how widely a dataset is spread relative to its mean. It is
calculated as the square root of the variance.

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

$\sigma$ = population standard deviation
$N$ = size of the population
$x_i$ = each value in the population
$\mu$ = population mean

c) Variance
Variance measures how far the values vary from the mean. The formula is

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

$s^2$ = sample variance
$x_i$ = value of one observation
$\bar{x}$ = mean value of the observations
$n$ = number of observations
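
As a rough illustration of the three measures above, the sketch below computes the range, the population standard deviation (dividing by N) and the sample variance (dividing by n - 1) for the small data set from the range example. It assumes Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import pstdev, variance

# Data set from the range example above
data = [4, 5, 1, 9, 7]

data_range = max(data) - min(data)   # highest value minus lowest value
pop_sd = pstdev(data)                # population standard deviation (divides by N)
sample_var = variance(data)          # sample variance (divides by n - 1)

print(data_range, pop_sd, sample_var)
```

Note that the two formulas use different divisors, so the square of the population standard deviation is slightly smaller than the sample variance for the same data.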

d) Kurtosis
Kurtosis measures the combined weight of a distribution's tails relative to the
centre of the distribution. It is usually assessed with a graph (histogram) by
looking at how peaked the distribution is. There are three types of kurtosis:
leptokurtic (positive kurtosis), mesokurtic (similar to the normal distribution)
and platykurtic (negative kurtosis). Positive and negative are measured relative
to the normal distribution, for which the ratio below equals 3. Kurtosis is
defined as:

$$\mathrm{Kurt} = \frac{\mu_4}{\sigma^4}$$

$\mathrm{Kurt}$ = kurtosis
$\mu_4$ = fourth central moment
$\sigma$ = standard deviation (raised to the fourth power in the formula)

Figure 2 : Kurtosis graph
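
The moment ratio above can be sketched in plain Python as follows. This is a minimal illustration using population moments (dividing by the number of values); spreadsheet functions such as Excel's KURT apply a sample correction and report excess kurtosis, so their output will generally differ from this ratio. The function name and the sample data are hypothetical.

```python
def kurtosis(values):
    """Return the fourth central moment divided by sigma to the fourth power."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n   # population variance (sigma squared)
    m4 = sum((x - mu) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2

sample = [1, 2, 3, 4, 5]      # illustrative data, not the lab data
kurt = kurtosis(sample)
excess = kurt - 3             # kurtosis relative to the normal distribution

print(kurt, excess)
```

A negative excess value, as for this flat sample, corresponds to a platykurtic distribution.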

e) Skewness
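Skewness measures the asymmetry of a distribution and indicates the direction of
its longer tail, where the outliers lie. If the peak of the graph is on the left
side with the tail extending to the right, the skew is positive. If the peak is
on the right side with the tail extending to the left, the skew is negative.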

Figure 3: Skewness graph

Hence, the formula can be defined as:

$$\tilde{\mu}_3 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^3}{(N - 1)\,\sigma^3}$$

$\tilde{\mu}_3$ = skewness
$N$ = number of observations in the distribution
$X_i$ = the i-th observation
$\bar{X}$ = mean of the distribution
$\sigma$ = standard deviation
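
The skewness formula can be sketched directly in Python. The formula does not say which standard deviation is meant, so this sketch assumes the sample standard deviation (`statistics.stdev`); the function name and the two small data sets are hypothetical.

```python
from statistics import mean, stdev

def skewness(values):
    """Sum of cubed deviations from the mean, divided by (N - 1) * sigma cubed."""
    n = len(values)
    x_bar = mean(values)
    sigma = stdev(values)   # assumption: sample standard deviation
    return sum((x - x_bar) ** 3 for x in values) / ((n - 1) * sigma ** 3)

symmetric = [1, 2, 3, 4, 5]        # no longer tail on either side
right_tailed = [1, 2, 3, 4, 10]    # one large value stretches the right tail

skew_symmetric = skewness(symmetric)
skew_right = skewness(right_tailed)

print(skew_symmetric, skew_right)
```

The symmetric data give a skewness of zero, while the data with a long right tail give a positive value, matching the description of positive skew.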

Correlated or paired T-Test

The t-test is also known as Student's t-test. It tells us how significant the difference
between groups is. In short, it tests for a difference by comparing means. It is a
procedure used to test the hypothesis of no difference between two variables when the
samples are related (paired) to each other.

Figure 4 : T-test graph
The formula given:

Figure 5 : Paired T test formula
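
For reference, the usual paired t statistic is the mean of the pairwise differences divided by its standard error, with n - 1 degrees of freedom. The sketch below implements that standard definition in plain Python; it is an illustration under that assumption and is not taken from the figure. The function name and the small data set are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for paired samples x and y."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]   # difference within each pair
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1

# Illustrative paired measurements, not the lab data
first = [12, 15, 10, 14]
second = [10, 12, 9, 11]

t_stat, df = paired_t(first, second)
print(t_stat, df)
```

The absolute value of the t statistic is then compared with the critical value for the given degrees of freedom, or converted to a p-value.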

Correlation Coefficient

Correlation coefficient formulas are used to find the strength of the relationship
between two sets of data. The formulas return a value between -1 and 1, where

• 1 shows a strong positive relationship
• -1 shows a strong negative relationship
• 0 shows no relationship at all

Figure 6 : Correlation Coefficient graph

Explanation:

• A correlation coefficient of 1 means that for every increase in one variable,
the other increases in a fixed proportion. For example, as body size increases,
shirt size also increases.
• A correlation coefficient of -1 means that for every increase in one variable,
the other decreases in a fixed proportion. For example, the amount of helium gas
in a balloon decreases as time increases.
• A correlation coefficient of zero means that an increase in one variable is
accompanied by neither an increase nor a decrease in the other; that is, the two
variables are not linearly related.

Types of correlation coefficient formula:

i. Pearson's correlation coefficient

$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$

ii. Sample correlation coefficient

$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$

iii. Population correlation coefficient

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
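
Pearson's formula can be translated term by term into Python, as in the sketch below. The function name and the small data sets are hypothetical and serve only to show the two extreme cases.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient computed from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

x = [1, 2, 3, 4]
r_positive = pearson_r(x, [2, 4, 6, 8])   # y rises in fixed proportion to x
r_negative = pearson_r(x, [8, 6, 4, 2])   # y falls in fixed proportion to x

print(r_positive, r_negative)
```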

Linear Regression

Linear regression is a linear approach for modelling the relationship between a scalar
response and one or more explanatory variables. It is used to predict the value of one
variable based on the value of another. The variable being predicted is the dependent
variable, also known as the outcome variable; the variable used to predict it is the
explanatory (independent) variable. The formula is:

Y = a + bX, where X is the explanatory variable, Y is the dependent variable, b is the
slope of the line and a is the intercept.
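
The intercept a and slope b are usually estimated by ordinary least squares. The sketch below shows that estimate in plain Python, assuming a single explanatory variable; the function name and data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for the line Y = a + bX."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Points that lie exactly on Y = 1 + 2X
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
predicted = a + b * 4        # predicted Y for X = 4

print(a, b, predicted)
```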

ANALYSIS AND DISCUSSION

Exercise 1 – Descriptive Statistics

1. Central Tendency of the Call data and any four of the independent variables in the
data

          CALL DATA   JOBS       LOW EDUCATION   RENTERS    UNEMPLOYED
MEAN      24.7356     838.5172   125.1149        317.5057   46.5747
MODE      1           112        38              23         23
MEDIAN    16          182        98              191        35

Figure 7.1 : Central Tendency Table

2. Variability of the Call data and any four of the independent variables in the data

                     CALL DATA    JOBS           LOW EDUCATION   RENTERS       UNEMPLOYED
STANDARD ERROR       3.0644       250.1634       13.8756         36.5496       4.5556
STANDARD DEVIATION   28.58        2,333.37       129.42          340.91        42.49
VARIANCE             816.99       5,258,491.51   16,750.34       115,729.42    1,724.22
KURTOSIS             10.2380      31.2044        15.0944         5.3054        4.1831
SKEWNESS             2.8143       5.2179         3.2588          2.1393        1.8876
RANGE                176.0000     16,976.0000    898.0000        1,851.0000    209.0000
MINIMUM              0.0000       0.0000         0.0000          15.0000       0.0000
MAXIMUM              176.0000     16,976.0000    898.0000        1,866.0000    209.0000
SUM                  2,152.0000   72,951.0000    10,885.0000     27,623.0000   4,052.0000
COUNT                87           87             87              87            87

Figure 7.2 : Variability Table
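
As a quick consistency check on the variability table, the standard error of the mean is normally the standard deviation divided by the square root of the count. The sketch below recomputes it from the rounded standard deviations in the table, assuming that relationship holds for this output; the dictionary names are only illustrative.

```python
from math import sqrt

n = 87  # COUNT from the variability table

# STANDARD DEVIATION row of the variability table (rounded values)
sd = {
    "Call": 28.58,
    "Jobs": 2333.37,
    "Low Education": 129.42,
    "Renters": 340.91,
    "Unemployed": 42.49,
}

# Standard error of the mean = standard deviation / sqrt(count)
se = {name: value / sqrt(n) for name, value in sd.items()}

for name, value in se.items():
    print(f"{name}: {value:.4f}")
```

The results agree with the STANDARD ERROR row to within the rounding of the inputs.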

3. Using appropriate graph, plot the Call data as dependent variables and the other 4
selected variables in the data as independent variables in 4 different graphs.

Figure 7.3.1 : Bar Chart Call Data and Job Variables

Figure 7.3.2 : Bar Chart Call Data and Educ Variables

Figure 7.3.3 : Bar Chart Call Data and Renters Variables

Figure 7.3.4 : Bar Chart Call Data and Unemployed Variables

Exercise 2 – Inferential Statistics

1. Calculate statistics significance of the Call data and any four of the independent
variables in the data by using t-test.

Significance of the Call data and Job Variables by using T-Test

                               CALL       JOB
Mean                           24.73563   838.5172
Variance                       816.9874   5258492
Observations                   87         87
Hypothesized Mean Difference   0
df                             86
t Stat                         -3.30981
P(T<=t) one-tail               0.000683
t Critical one-tail            1.662765
P(T<=t) two-tail               0.001365
t Critical two-tail            1.987934

Figure 7.4.1 : Significance of Call Data and Job Variables by using T-Test

                               CALL          LOWEDUC
Mean                           24.73563218   125.1149425
Variance                       816.9874365   16750.33547
Observations                   87            87
Hypothesized Mean Difference   0
df                             94
t Stat                         7.064005712
P(T<=t) one-tail               1.39043E-10
t Critical one-tail            1.661225855
P(T<=t) two-tail               2.78086E-10
t Critical two-tail            1.985523442

Figure 7.4.2 : Significance of Call Data and Loweduc Variables by using T-Test

                               CALL           RENTERS
Mean                           24.73563218    317.5057
Variance                       816.9874365    115729.4
Observations                   87             87
Hypothesized Mean Difference   0
df                             87
t Stat                         -7.999022754
P(T<=t) one-tail               2.48584E-12
t Critical one-tail            1.662557349
P(T<=t) two-tail               4.97169E-12
t Critical two-tail            1.987608282

Figure 7.4.3 : Significance of Call Data and Renters Variables by using T-Test

                               CALL       UNEMPLOYED
Mean                           24.73563   46.57471264
Variance                       816.9874   1724.224004
Observations                   87         87
Hypothesized Mean Difference   0
df                             153
t Stat                         -4.04086
P(T<=t) one-tail               4.2E-05
t Critical one-tail            1.654874
P(T<=t) two-tail               8.41E-05
t Critical two-tail            1.97559

Figure 7.4.4 : Significance of Call Data and Unemployed Variables by using T-Test

Exercise 3 – Correlation Coefficient

1. You need to calculate the correlation coefficient using suitable correlation analysis
technique.

VARIABLES       CALL DATA   JOBS     LOW EDUCATION   RENTERS   UNEMPLOYED
CALL DATA       1.0000      0.5832   0.7529          0.7847    0.7371
JOBS            0.5832      1.0000   0.2338          0.4651    0.4992
LOW EDUCATION   0.7529      0.2338   1.0000          0.7382    0.7486
RENTERS         0.7847      0.4651   0.7382          1.0000    0.7639
UNEMPLOYED      0.7371      0.4992   0.7486          0.7639    1.0000

Figure 8.1 : Correlation Coefficient for each independent variable

2. Create appropriate graph to visualize the correlation between Call and the
independent variables.

[Chart "Correlation Call and Jobs Variables": Calls (y-axis, 0 to 200) against Jobs (x-axis, 0 to 20000)]

Figure 8.2.1 : Correlation Call and Job Variables

[Chart "Correlation Call and Unemployed Variables": Call (y-axis, 0 to 200) against Unemployed (x-axis, 0 to 250)]

Figure 8.2.2 : Correlation Call and Unemployed Variables

[Chart "Correlation Call and LowEduc Variables": Calls (y-axis, 0 to 200) against LowEduc (x-axis, 0 to 1000)]

Figure 8.2.3 : Correlation Call and LowEduc Variables

[Chart "Correlation Call Data and Renters Variables": Calls (y-axis, 0 to 200) against Renters (x-axis, 0 to 2000)]

Figure 8.2.4 : Correlation Call and Renters Variables

3. Determine which among the variables has the highest correlation with Call data.
Based on Figure 8.1 : Correlation Coefficient for each independent variable, the
correlation with Call data is 0.583 for Jobs, 0.753 for LowEduc, 0.785 for Renters
(see the Renters row of Figure 8.1) and 0.737 for Unemployed. Renters therefore has
the highest correlation with Call data. The order from highest to lowest correlation
is shown below.

Renters   LowEduc   Unemployed   Job

Figure 8.3.1 : Correlation from highest to lowest

Exercise 4 – Linear Regression

1. You need to create an appropriate graph to visualize the linear regression between
Call and the independent variables. Show all calculations involved.

[Chart "Correlation Call and Jobs Variables": Calls (y-axis, 0 to 200) against Jobs (x-axis, 0 to 20000)]

Regression Statistics
Multiple R 0.583207288
R Square 0.34013074
Adjusted R Square 0.332367573
Standard Error 23.35481332
Observations 87

ANOVA
df SS MS F Significance F
Regression 1 23897.899 23897.899 43.813 3.07527E-09
Residual 85 46363.021 545.447
Total 86 70260.920

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   18.6401        2.6679           6.9869   0.0000    13.3356     23.9445     13.3356       23.9445
Jobs        0.0073         0.0011           6.6192   0.0000    0.0051      0.0095      0.0051        0.0095
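
The regression statistics can be cross-checked against the ANOVA table: R Square is the regression sum of squares divided by the total sum of squares, F is the ratio of the two mean squares, and Adjusted R Square corrects R Square for the number of predictors. The sketch below applies these standard relationships to the Jobs figures above; the variable names are illustrative.

```python
# Values from the ANOVA table for Call against Jobs
ss_regression = 23897.899
ss_total = 70260.920
ms_regression = 23897.899
ms_residual = 545.447
n = 87            # observations
predictors = 1    # one explanatory variable (Jobs)

r_square = ss_regression / ss_total
f_stat = ms_regression / ms_residual
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - predictors - 1)

print(round(r_square, 6), round(f_stat, 3), round(adjusted_r_square, 6))
```

The three results match the reported R Square, F and Adjusted R Square to within rounding.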

[Chart "Correlation Call and LowEduc Variables": Calls (y-axis, 0 to 200) against LowEduc (x-axis, 0 to 1000)]

Regression Statistics
Multiple R 0.752942
R Square 0.566921
Adjusted R Square 0.561826
Standard Error 18.92043
Observations 87

ANOVA
             df   SS         MS          F          Significance F
Regression   1    39832.4    39832.399   111.2691   4.07E-17
Residual     85   30428.52   357.98259
Total        86   70260.92

            Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   3.9307         2.8293           1.3893    0.1684    -1.6947     9.5561      -1.6947       9.5561
LowEduc     0.1663         0.0158           10.5484   0.0000    0.1349      0.1976      0.1349        0.1976

[Chart "Correlation Call Data and Renters Variables": Calls (y-axis, 0 to 200) against Renters (x-axis, 0 to 2000)]

Regression Statistics
Multiple R 0.7847
R Square 0.6157
Adjusted R Square 0.6112
Standard Error 17.8234
Observations 87.0000

ANOVA
df SS MS F Significance F
Regression 1 43258.6373 43258.6373 136.1731 2.44002E-19
Residual 85 27002.2823 317.6739
Total 86 70260.9195

            Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   3.8033         2.6209           1.4512    0.1504    -1.4077     9.0144      -1.4077       9.0144
Renters     0.0659         0.0056           11.6693   0.0000    0.0547      0.0772      0.0547        0.0772

[Chart "Correlation Call and Unemployed Variables": Call (y-axis, 0 to 200) against Unemployed (x-axis, 0 to 250)]

Regression Statistics
Multiple R 0.5832
R Square 0.3401
Adjusted R Square 0.3324
Standard Error 23.3548
Observations 87

ANOVA
             df   SS           MS           F         Significance F
Regression   1    23897.8986   23897.8986   43.8134   3.0753E-09
Residual     85   46363.0210   545.4473
Total        86   70260.9195

               Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      18.6401        2.6679           6.9869   0.0000    13.3356     23.9445     13.3356       23.9445
X Variable 1   0.0073         0.0011           6.6192   0.0000    0.0051      0.0095      0.0051        0.0095

CONCLUSION

In conclusion, the descriptive statistics gave us the mode, mean and median, which led
to the next step, the variability of the data. These data were then plotted so that we
could see the graphs of the Call data against the other variables. Next, we calculated
the significance of the data using the t-test, and the correlation coefficient to see
whether the data are correlated. Lastly, we analysed the regression of the data and
showed all the calculations involved in preparing the graphs.

REFERENCES

Correlation in Excel. (2022). Retrieved 30 January 2022, from
https://www.excel-easy.com/examples/correlation.html

Descriptive Statistics in Excel. (2022). Retrieved 30 January 2022, from
https://www.excel-easy.com/examples/descriptive-statistics.html

Measures of Variability | Real Statistics Using Excel. (2022). Retrieved 31 January 2022,
from https://www.real-statistics.com/descriptive-statistics/measures-variability/

Regression Analysis in Excel. (2022). Retrieved 30 January 2022, from
https://www.excel-easy.com/examples/regression.html

t-Test in Excel. (2022). Retrieved 30 January 2022, from
https://www.excel-easy.com/examples/t-test.html
