
DATA ANALYSIS USING SPSS

MBA 112 – Business Research Methods

Lab Report submitted to


DEPARTMENT OF MANAGEMENT
CENTRAL UNIVERSITY OF TAMIL NADU
THIRUVARUR

In partial fulfillment of the requirements for the
Award of the Degree of
MASTER OF BUSINESS ADMINISTRATION
(Batch: 2022-2024)
By

LAVANYA M, P221826
MAHESHPRIYA L, P221827
MALAVIKA K, P221828
MD ALTAMASH AYUB, P221829
MD GUFRAN KHAN, P221830
August 2023

Submitted to
Dr.S.VISALAKSHMI
Assistant Professor
Department of Management
CENTRAL UNIVERSITY OF TAMIL NADU
Thiruvarur – 610 005
TABLE OF CONTENTS

SL.no NAME OF THE EXPERIMENT

1. FREQUENCIES
2. DESCRIPTIVE STATISTICS 1
3. DESCRIPTIVE STATISTICS 2
4. SIMPLE CORRELATION
5. MULTIPLE CORRELATION
6. SIMPLE REGRESSION
7. MULTIPLE REGRESSION
8. CHI SQUARE TEST
9. ONE SAMPLE T TEST
10. PAIRED SAMPLE T TEST
11. GROUP PHOTO

1. FREQUENCIES

STEP 1: IMPORT DATA FROM EXCEL

STEP 2: SELECT ANALYZE > DESCRIPTIVE STATISTICS > FREQUENCIES

STEP 3: ANALYZE FREQUENCY DATA

OUTPUT OF FREQUENCY DATA


Notes

Output Created 16-Aug-2023 21:32:04

Comments

Input Active Dataset DataSet1

Filter <none>

Weight <none>

Split File <none>

N of Rows in Working Data File 10

Missing Value Handling Definition of Missing User-defined missing values are treated as missing.

Cases Used Statistics are based on all cases with valid data.

Syntax
FREQUENCIES VARIABLES=ID Gender Height Weight
  /NTILES=4
  /NTILES=10
  /STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SUM SKEWNESS SESKEW KURTOSIS SEKURT
  /ORDER=ANALYSIS.

Resources Processor Time 00:00:00.000

Elapsed Time 00:00:00.000

[DataSet1]

Statistics

                            ID       Gender   Height   Weight
N            Valid          10       10       10       10
             Missing        0        0        0        0
Mean                        5.50              65.80    133.00
Std. Error of Mean          .957              .757     4.243
Median                      5.50              66.00    132.00
Mode                        1a                64a      116a
Std. Deviation              3.028             2.394    13.416
Variance                    9.167             5.733    180.000
Skewness                    .000              .233     .278
Std. Error of Skewness      .687              .687     .687
Kurtosis                    -1.200            -.369    -1.400
Std. Error of Kurtosis      1.334             1.334    1.334
Range                       9                 8        37
Minimum                     1                 62       116
Maximum                     10                70       153
Sum                         55                658      1330
Percentiles  10             1.10              62.20    116.20
             20             2.20              64.00    118.80
             25             2.75              64.00    121.00
             30             3.30              64.00    122.60
             40             4.40              64.80    125.60
             50             5.50              66.00    132.00
             60             6.60              66.00    137.20
             70             7.70              67.40    142.20
             75             8.25              68.00    145.75
             80             8.80              68.00    149.60
             90             9.90              69.80    152.80

a. Multiple modes exist. The smallest value is shown.

Frequency Table

ID

Frequency Percent Valid Percent Cumulative Percent

Valid 1 1 10.0 10.0 10.0

2 1 10.0 10.0 20.0

3 1 10.0 10.0 30.0

4 1 10.0 10.0 40.0

5 1 10.0 10.0 50.0

6 1 10.0 10.0 60.0

7 1 10.0 10.0 70.0

8 1 10.0 10.0 80.0

9 1 10.0 10.0 90.0

10 1 10.0 10.0 100.0

Total 10 100.0 100.0

Gender

Frequency Percent Valid Percent Cumulative Percent

Valid Female 5 50.0 50.0 50.0

Male 5 50.0 50.0 100.0

Total 10 100.0 100.0

Height

Frequency Percent Valid Percent Cumulative Percent

Valid 62 1 10.0 10.0 10.0

64 3 30.0 30.0 40.0

66 3 30.0 30.0 70.0

68 2 20.0 20.0 90.0

70 1 10.0 10.0 100.0

Total 10 100.0 100.0

Weight

Frequency Percent Valid Percent Cumulative Percent

Valid 116 1 10.0 10.0 10.0

118 1 10.0 10.0 20.0

122 1 10.0 10.0 30.0

124 1 10.0 10.0 40.0

128 1 10.0 10.0 50.0

136 1 10.0 10.0 60.0

138 1 10.0 10.0 70.0

144 1 10.0 10.0 80.0

151 1 10.0 10.0 90.0

153 1 10.0 10.0 100.0

Total 10 100.0 100.0

INTERPRETATION:

The output is the result of running the SPSS FREQUENCIES procedure on a
dataset named "DataSet1." The analysis covers four variables: "ID," "Gender,"
"Height," and "Weight." The dataset contains 10 rows and no variable has
missing values; user-defined missing values would have been treated as missing.
The procedure computes the standard deviation, variance, range, minimum,
maximum, standard error of the mean, mean, median, mode, sum, skewness,
standard error of skewness, kurtosis, and standard error of kurtosis. No
numeric statistics are reported for "Gender," which is categorical, and the
statistics for "ID" have no substantive meaning because it is only a case
identifier. The two NTILES subcommands request the cut points that divide
the data into four and ten equal groups (quartiles and deciles). The
/ORDER=ANALYSIS subcommand only controls how the output is organized: the
statistics for all variables appear together in one table.
"Height" has a mean of 65.80 (standard deviation 2.394) and "Weight" a mean
of 133.00 (standard deviation 13.416); both are slightly positively skewed
(.233 and .278). The processor and elapsed times of 00:00:00.000 mean only
that the run was too short to register. Overall, this analysis provides an
overview of the statistical characteristics and distribution of the
specified variables in the dataset.
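
As a cross-check, the "Weight" statistics can be reproduced from the ten values listed in the Weight frequency table. The sketch below uses only Python's standard library (3.8 or later); the variable names are illustrative, and the default "exclusive" method of `statistics.quantiles` is assumed to correspond to the percentile definition SPSS applied in this run.

```python
import math
import statistics

# Weight values copied from the Weight frequency table (each occurs once).
weights = [116, 118, 122, 124, 128, 136, 138, 144, 151, 153]

mean = statistics.mean(weights)          # arithmetic mean
median = statistics.median(weights)      # middle of the sorted values
variance = statistics.variance(weights)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(weights)      # sample standard deviation
std_error = std_dev / math.sqrt(len(weights))  # standard error of the mean

# Quartile cut points, comparable to the /NTILES=4 subcommand.
quartiles = statistics.quantiles(weights, n=4)

print(mean, median, variance, round(std_dev, 3), round(std_error, 3), quartiles)
```

The printed values can be compared with the Weight column of the Statistics table (mean, median, variance, standard deviation, standard error, and the 25th, 50th and 75th percentiles).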

2. DESCRIPTIVES 1

STEP 1: IMPORT DATA FROM EXCEL

STEP 2: ANALYZE DESCRIPTIVE DATA

STEP 3: SELECT THE VARIABLE

OUTPUT
Notes

Output Created 16-Aug-2023 22:08:40

Comments

Input Active Dataset DataSet1

Filter <none>

Weight <none>

Split File <none>

N of Rows in Working Data File 169

Missing Value Handling Definition of Missing User-defined missing values are treated as missing.

Cases Used All non-missing data are used.

Syntax
DESCRIPTIVES VARIABLES=DATE GDP
  /STATISTICS=MEAN STDDEV VARIANCE RANGE MIN MAX SEMEAN KURTOSIS SKEWNESS.

Resources Processor Time 00:00:00.000

Elapsed Time 00:00:00.000

Descriptive Statistics

Statistic                  DATE       GDP
N                          169        169
Range                      42.0       9747.5
Minimum                    1959.1     496.1
Maximum                    2001.1     10243.6
Mean                       1.980E3    3.573E3
Std. Error of Mean         .9409      221.0122
Std. Deviation             12.2322    2873.1581
Variance                   149.626    8.255E6
Skewness                   .001       .701
Std. Error of Skewness     .187       .187
Kurtosis                   -1.200     -.751
Std. Error of Kurtosis     .371       .371

Valid N (listwise)         169

INTERPRETATION:

The statistics describe two variables: "DATE," representing time in years,
and "GDP," denoting Gross Domestic Product values. Over 169 observations,
"DATE" ranges from 1959.1 to 2001.1, with a mean of about 1980 and a
standard deviation of 12.23 years. "GDP" varies between 496.1 and 10243.6,
with a mean of about 3573 and a standard deviation of 2873.16 (variance
about 8.25 million). The value 221.01 is the standard error of the mean,
not the standard deviation. A standard deviation this large relative to the
mean indicates wide dispersion in "GDP." The "GDP" distribution is
positively skewed (0.701), so its longer tail extends towards higher
values. Its negative kurtosis (-0.751) indicates a flatter distribution with
lighter tails than a normal distribution. "DATE" is almost perfectly
symmetric (skewness 0.001) and its kurtosis of -1.200 is what evenly spaced
time points produce. In summary, "DATE" spans 1959 to 2001, while "GDP"
shows widely dispersed, right-skewed values, consistent with economic
growth over the period.
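
The standard errors in the table follow directly from N and the standard deviation, which helps keep the standard deviation and the standard error of the mean apart. The sketch below is a minimal illustration using the standard large-sample formulas for the standard errors of skewness and kurtosis; the function names are hypothetical, and the inputs are the rounded values reported in the table.

```python
import math

def se_mean(std_dev, n):
    """Standard error of the mean: SD divided by the square root of N."""
    return std_dev / math.sqrt(n)

def se_skewness(n):
    """Standard error of sample skewness (depends only on N)."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of sample excess kurtosis (depends only on N)."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

n = 169
gdp_se = se_mean(2873.1581, n)   # GDP standard deviation from the table
date_se = se_mean(12.2322, n)    # DATE standard deviation from the table

print(round(gdp_se, 4), round(date_se, 4))
print(round(se_skewness(n), 3), round(se_kurtosis(n), 3))
```

Because the skewness and kurtosis standard errors depend only on N, they are identical for DATE and GDP, as the table shows.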

3 . DESCRIPTIVE 2

STEP1: IMPORT DATA FROM EXCEL

STEP 2 : ANALYSE DESCRIPTIVE DATA

STEP 3: OUTPUT OF DESCRIPTIVE DATA

Descriptive Statistics

Statistic                  Age
N                          60
Range                      47
Minimum                    17
Maximum                    64
Sum                        2261
Mean                       37.68
Std. Error of Mean         1.423
Std. Deviation             11.026
Variance                   121.576
Skewness                   -.441
Std. Error of Skewness     .309
Kurtosis                   .087
Std. Error of Kurtosis     .608

Valid N (listwise)         60

INTERPRETATION:
The descriptive statistics pertain to the variable "Age," derived
from a dataset of 60 individuals. Ages span a range of 47 years, from a
minimum of 17 to a maximum of 64. The sum of ages is 2261, giving a
mean age of approximately 37.68 years with a standard error of 1.423.
The standard deviation of 11.026 indicates the typical dispersion of
ages around the mean, and the variance is 121.576.
The distribution of ages exhibits a slight negative skewness of -0.441,
suggesting that the tail of the distribution extends towards younger ages.
The kurtosis value of 0.087 (standard error 0.608) is close to zero,
indicating that the peak and tails of the distribution are close to those
of a normal distribution.
The dataset comprises a valid sample of 60 cases. In summary, the
analysis shows a distribution centered near 38 years with a wide range
and a moderate standard deviation. The skewness and kurtosis values
suggest only small deviations from a normal distribution.
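
The entries of the Age table are tied together by simple arithmetic, which makes it easy to confirm which figure is the standard deviation and which is the standard error. The following is a small sketch of that check, using the rounded values reported in the output; small differences from the table are rounding effects.

```python
import math

# Values reported for Age in the Descriptive Statistics output.
n = 60
total = 2261
std_dev = 11.026

mean = total / n                   # mean is the sum divided by N
variance = std_dev ** 2            # variance is the squared standard deviation
std_error = std_dev / math.sqrt(n) # standard error of the mean

print(round(mean, 2), round(variance, 3), round(std_error, 3))
```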

4. SIMPLE CORRELATION

1. Click Analyze > Correlate > Bivariate... on the main menu. You will be
presented with the Bivariate Correlations dialogue box.
2. Transfer the variables Height and Jump_Dist into the Variables: box by
dragging-and-dropping them or by clicking on them and then clicking on
the arrow button.
3. Make sure that the Pearson checkbox is selected under the –Correlation
Coefficients– area (it is selected by default in SPSS Statistics).
4. Click on the Options... button and you will be presented with the Bivariate
Correlations: Options dialogue box. If you wish to generate some
descriptives, you can do it here by clicking on the relevant checkbox in the –
Statistics– area.
5. Click on the Continue button. You will be returned to the Bivariate
Correlations dialogue box.
6. Click on the OK button. This will generate the results of Pearson's
correlation.

SIMPLE CORRELATION:

STEP 1: IMPORT THE DATA INTO SPSS.

STEP 2: VIEW THE IMPORTED DATA IN DATA VIEW.

STEP 3: VIEW THE IMPORTED DATA IN VARIABLE VIEW.

STEP 4: CLICK ON THE ANALYZE TAB AND SELECT THE METHOD OF YOUR CHOICE.

STEP 5: SELECT THE FILTER OF YOUR CHOICE IN FILTERS AND SELECT THE
METHODS AND DATA VIEWS REQUIRED.

STEP 6: THE DATA WILL BE ANALYZED AND THE OUTPUT WILL BE DISPLAYED.

Correlations:
Correlation is a statistical measure that expresses the extent to which two variables are
linearly related (meaning they change together at a constant rate). It’s a common tool for
describing simple relationships without making a statement about cause and effect.

Simple correlation:
Simple linear correlation is a measure of the degree to which two variables vary
together, or a measure of the intensity of the association between two variables.
Correlation is often misused: a high correlation does not by itself show that one
variable actually affects another.
[DataSet1]

Descriptive Statistics

               Mean     Std. Deviation   N
Income         56.00    11.121           7
Expenditure    53.57    8.886            7

Correlations

                                                     Income     Expenditure
Income        Pearson Correlation                    1          .830*
              Sig. (2-tailed)                                   .021
              Sum of Squares and Cross-products      742.000    492.000
              Covariance                             123.667    82.000
              N                                      7          7
Expenditure   Pearson Correlation                    .830*      1
              Sig. (2-tailed)                        .021
              Sum of Squares and Cross-products      492.000    473.714
              Covariance                             82.000     78.952
              N                                      7          7

*. Correlation is significant at the 0.05 level (2-tailed).

INTERPRETATION:
The descriptive statistics and correlation coefficients:

Descriptive Statistics:
Income: Mean = 56.00, Std. Deviation = 11.121, N = 7
Expenditure: Mean = 53.57, Std. Deviation = 8.886, N = 7
These statistics provide information about the central tendency (mean), spread (standard
deviation), and sample size (N) of both the "Income" and "Expenditure" variables.

Correlations:
The Pearson correlation coefficient measures the strength and direction of a linear
relationship between two variables. In this case, it is used to assess the relationship
between "Income" and "Expenditure."

Correlation between Income and Expenditure:
Pearson Correlation Coefficient: 0.830
Significance (p-value): 0.021 (significant at the 0.05 level)
The correlation coefficient of 0.830 indicates a relatively strong positive linear
relationship between Income and Expenditure. As Income increases, Expenditure tends
to increase as well. Since the p-value (0.021) is less than 0.05, the correlation is
considered statistically significant, suggesting that the observed correlation is unlikely to
have occurred by random chance.

The Sum of Squares and Cross-products and the Covariance are additional statistics
related to the correlation calculation, but they don't directly affect the interpretation of
the correlation coefficient.
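
The Pearson coefficient and the covariances in the table can be recovered from the sums of squares and cross-products. The sketch below shows that arithmetic with the values from the Correlations table; the variable names are illustrative, and the two-tailed p-value is left to SPSS because it needs the t distribution, which Python's standard library does not provide.

```python
import math

n = 7
ss_income = 742.000        # sum of squares for Income
ss_expenditure = 473.714   # sum of squares for Expenditure
sp_cross = 492.000         # sum of cross-products, Income x Expenditure

# Pearson r: cross-products scaled by the square root of both sums of squares.
r = sp_cross / math.sqrt(ss_income * ss_expenditure)

# Covariance and variances divide the same sums by n - 1.
covariance = sp_cross / (n - 1)
var_income = ss_income / (n - 1)
var_expenditure = ss_expenditure / (n - 1)

# t statistic behind the significance test of r, with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

print(round(r, 3), round(covariance, 3), round(var_income, 3),
      round(var_expenditure, 3), round(t_stat, 2))
```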

Nonparametric Correlations

Correlations

                                                           Income    Expenditure
Kendall's tau_b   Income        Correlation Coefficient    1.000     .619
                                Sig. (2-tailed)            .         .051
                                N                          7         7
                  Expenditure   Correlation Coefficient    .619      1.000
                                Sig. (2-tailed)            .051      .
                                N                          7         7
Spearman's rho    Income        Correlation Coefficient    1.000     .750
                                Sig. (2-tailed)            .         .052
                                N                          7         7
                  Expenditure   Correlation Coefficient    .750      1.000
                                Sig. (2-tailed)            .052      .
                                N                          7         7

Nonparametric correlations are used when the data do not meet the
assumptions required for parametric correlations (such as Pearson's
correlation), often because of non-normality or a non-linear
relationship. The interpretations below cover Kendall's tau_b and
Spearman's rho for the variables "Income" and "Expenditure."
For Kendall's tau_b: The correlation coefficient between "Income" and
"Expenditure" is 0.619. This suggests a moderate positive correlation.
However, with a p-value of 0.051, the correlation isn't statistically
significant at the 0.05 level. This means that there isn't enough evidence
to conclude that the observed correlation is more than what could be
expected by random chance.
For Spearman's rho: The correlation coefficient between "Income" and
"Expenditure" is 0.750, indicating a relatively strong positive
monotonic relationship. This means that as one variable increases, the
other generally tends to increase, although the relationship might not be
strictly linear. Similar to Kendall's tau_b, the p-value of 0.052 is slightly
above the 0.05 level, so the correlation is not statistically significant.
Both nonparametric correlation coefficients suggest positive
associations between "Income" and "Expenditure," but these
associations are not statistically significant at the 0.05 level, although
both p-values are only just above it. On these rank-based tests, random
fluctuation cannot be ruled out, even though Pearson's correlation for
the same data is significant (p = 0.021). With only 7 cases, these results
should be read with caution.
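
The raw Income and Expenditure values are not listed in the report, so the two coefficients cannot be recomputed from data here. As a rough illustration of how they are built, the sketch below applies the textbook no-ties formulas. The inputs (a sum of squared rank differences of 14, and 17 concordant against 4 discordant pairs) are not taken from the dataset: they are assumptions, back-calculated as the values that would yield the reported coefficients for 7 cases if there were no tied ranks.

```python
def spearman_rho(sum_sq_rank_diff, n):
    """Spearman's rho from the sum of squared rank differences (no ties)."""
    return 1 - 6 * sum_sq_rank_diff / (n * (n * n - 1))

def kendall_tau(concordant, discordant):
    """Kendall's tau from concordant and discordant pair counts (no ties)."""
    return (concordant - discordant) / (concordant + discordant)

n = 7                         # number of cases, from the output
pairs = n * (n - 1) // 2      # 21 distinct pairs of cases

# Assumed (back-calculated) inputs, see the note above.
rho = spearman_rho(14, n)
tau = kendall_tau(17, 4)

print(pairs, round(rho, 3), round(tau, 3))
```

The two measures differ in size because rho works with squared rank differences while tau only counts whether each pair of cases is ordered the same way on both variables.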

5. MULTIPLE CORRELATIONS

STEP 1: IMPORT DATA FROM EXCEL

STEP 2: ANALYZE → CORRELATE → BIVARIATE

STEP 3: MOVE THE VARIABLES INTO THE VARIABLES BOX

OUTPUT:
Output Created 15-Aug-2023 23:08:55

Comments

Input Active Dataset DataSet1

Filter <none>

Weight <none>

Split File <none>

N of Rows in Working Data File 37

Missing Value Handling Definition of Missing User-defined missing values are treated as missing.

Cases Used Statistics for each pair of variables are based on all the cases with valid data for that pair.

Syntax
CORRELATIONS
  /VARIABLES=GDP GOVTEXP CONS INV
  /PRINT=TWOTAIL NOSIG
  /STATISTICS DESCRIPTIVES XPROD
  /MISSING=PAIRWISE.

Resources Processor Time 00:00:00.047

Elapsed Time 00:00:00.027

[DataSet1]

Descriptive Statistics

Mean Std. Deviation N

GDP 1.14E6 233846.634 37

GOVTEXP 1.26E5 32913.541 37

CONS 6.75E5 136232.151 37

INV 3.74E5 88811.183 37

Correlations

                                                 GDP        GOVTEXP    CONS       INV
GDP       Pearson Correlation                    1          .928**     .964**     .977**
          Sig. (2-tailed)                                   .000       .000       .000
          Sum of Squares and Cross-products      1.969E12   2.571E11   1.106E12   7.303E11
          Covariance                             5.468E10   7.143E9    3.072E10   2.029E10
          N                                      37         37         37         37
GOVTEXP   Pearson Correlation                    .928**     1          .885**     .872**
          Sig. (2-tailed)                        .000                  .000       .000
          Sum of Squares and Cross-products      2.571E11   3.900E10   1.429E11   9.175E10
          Covariance                             7.143E9    1.083E9    3.969E9    2.549E9
          N                                      37         37         37         37
CONS      Pearson Correlation                    .964**     .885**     1          .952**
          Sig. (2-tailed)                        .000       .000                  .000
          Sum of Squares and Cross-products      1.106E12   1.429E11   6.681E11   4.146E11
          Covariance                             3.072E10   3.969E9    1.856E10   1.152E10
          N                                      37         37         37         37
INV       Pearson Correlation                    .977**     .872**     .952**     1
          Sig. (2-tailed)                        .000       .000       .000
          Sum of Squares and Cross-products      7.303E11   9.175E10   4.146E11   2.839E11
          Covariance                             2.029E10   2.549E9    1.152E10   7.887E9
          N                                      37         37         37         37

**. Correlation is significant at the 0.01 level (2-tailed).

INTERPRETATION:
The presented output is the result of a correlation analysis performed on four variables: "GDP,"
"GOVTEXP," "CONS," and "INV" using the dataset "DataSet1." Here's the interpretation of the key
statistics:
Descriptive Statistics: The variables' means, standard deviations, and the number of cases are
as follows:
GDP: Mean = 1.14E6, Std. Deviation = 233846.634, N = 37
GOVTEXP: Mean = 1.26E5, Std. Deviation = 32913.541, N = 37
CONS: Mean = 6.75E5, Std. Deviation = 136232.151, N = 37
INV: Mean = 3.74E5, Std. Deviation = 88811.183, N = 37
Correlation Analysis: Pearson correlation coefficients were calculated between all pairs of variables.
The results are as follows:
GDP and GOVTEXP: Pearson correlation coefficient = 0.928, p < 0.001
GDP and CONS: Pearson correlation coefficient = 0.964, p < 0.001
GDP and INV: Pearson correlation coefficient = 0.977, p < 0.001
GOVTEXP and CONS: Pearson correlation coefficient = 0.885, p < 0.001
GOVTEXP and INV: Pearson correlation coefficient = 0.872, p < 0.001
CONS and INV: Pearson correlation coefficient = 0.952, p < 0.001
All correlation coefficients are statistically significant (p < 0.001) and large (0.872 to 0.977),
indicating strong positive linear relationships between the variables.
GDP has strong positive correlations with GOVTEXP, CONS, and INV, and the other variable pairs
are also strongly positively correlated. The correlation analysis therefore shows that these
variables move together in the dataset.
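
Every Pearson coefficient in the table is the cross-product entry divided by the square root of the two matching diagonal entries, so the whole correlation matrix can be rebuilt from the "Sum of Squares and Cross-products" rows. The sketch below does this with the rounded values from the output; the function name is hypothetical, and agreement is only expected to about three decimals because the inputs are rounded to four significant figures.

```python
import math

names = ["GDP", "GOVTEXP", "CONS", "INV"]

# Sums of squares and cross-products copied from the Correlations table.
sscp = [
    [1.969e12, 2.571e11, 1.106e12, 7.303e11],
    [2.571e11, 3.900e10, 1.429e11, 9.175e10],
    [1.106e12, 1.429e11, 6.681e11, 4.146e11],
    [7.303e11, 9.175e10, 4.146e11, 2.839e11],
]
n = 37

def correlation_matrix(matrix):
    """Scale each cross-product by the square roots of the two diagonals."""
    size = len(matrix)
    return [
        [matrix[i][j] / math.sqrt(matrix[i][i] * matrix[j][j]) for j in range(size)]
        for i in range(size)
    ]

corr = correlation_matrix(sscp)
cov_gdp = sscp[0][0] / (n - 1)   # variance of GDP: sum of squares over n - 1

for name, row in zip(names, corr):
    print(name, [round(value, 3) for value in row])
```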

6. SIMPLE REGRESSION

THE PROBLEM: To identify the relationship between the age of a car and its
maintenance cost. Using SPSS, we will analyze the given data and then draw
a conclusion.

STEP 1: IMPORT THE DATA FROM EXCEL

STEP 2: ANALYSE THE DATA

ANALYZE → REGRESSION → LINEAR

STEP 3: PLACE THE DEPENDENT AND INDEPENDENT VARIABLES IN THEIR
RESPECTIVE BOXES

STEP 4 : THE OUTPUT WILL BE GENERATED

Variables Entered/Removed(a)

Model   Variables Entered      Variables Removed   Method
1       Maintenance cost(b)    .                   Enter

a. Dependent Variable: Age of a car
b. All requested variables entered.

Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .991(a)   .982       .977                .284                         1.669

Change Statistics
Model   R Square Change   F Change   df1   df2   Sig. F Change
1       .982              212.761    1     4     .000

a. Predictors: (Constant), Maintenance cost
b. Dependent Variable: Age of a car

ANOVA(a)

Model            Sum of Squares   df   Mean Square   F         Sig.
1   Regression   17.177           1    17.177        212.761   .000(b)
    Residual     .323             4    .081
    Total        17.500           5

a. Dependent Variable: Age of a car
b. Predictors: (Constant), Maintenance cost

Coefficients(a)

                                                                   95.0% Confidence Interval for B
Model                  B      Std. Error   Beta   t        Sig.    Lower Bound   Upper Bound
1   (Constant)         .528   .234                2.254    .087    -.122         1.179
    Maintenance cost   .435   .030         .991   14.586   .000    .352          .518

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent Variable: Age of a car

Residuals Statistics(a)

                       Minimum   Maximum   Mean   Std. Deviation   N
Predicted Value        1.40      6.18      3.50   1.853            6
Residual               -.398     .297      .000   .254             6
Std. Predicted Value   -1.134    1.447     .000   1.000            6
Std. Residual          -1.401    1.046     .000   .894             6

a. Dependent Variable: Age of a car

INTERPRETATION:

The output represents the results of a simple linear regression
analysis involving the variables "Maintenance cost" and "Age of a
car." Here's the interpretation of the key statistics:
Variables Entered/Removed: "Maintenance cost" is the only
predictor; it was entered in the model and no variables were
removed. The dependent variable is "Age of a car," so the model
as run predicts age from maintenance cost.
Model Summary: The model shows a strong relationship with
an R-squared value of 0.982, indicating that around 98.2% of
the variation in the "Age of a car" can be explained by the
"Maintenance cost." The adjusted R-squared is 0.977.
ANOVA: The ANOVA table shows that the regression
model is highly significant (F = 212.761, p < 0.001) in
explaining the variance in the "Age of a car."
Coefficients: The coefficient for "Maintenance cost" is 0.435
(95% confidence interval .352 to .518), indicating a positive
relationship between maintenance cost and the age of a car.
For each unit increase in maintenance cost, the age of a car is
predicted to increase by approximately 0.435 units. The
constant is 0.528 and is not significantly different from zero
(p = .087).
Residuals Statistics: The predicted age of a car varies
between 1.40 and 6.18, with a mean of 3.50 and a standard
deviation of 1.853. The residuals have a mean of 0, which is
always the case for a least-squares model with a constant and
so says nothing about fit; their small spread (standard
deviation .254, ranging from -.398 to .297) is what indicates a
close fit. Standardized residuals range from -1.401 to 1.046.
In summary, the regression analysis indicates that
"Maintenance cost" is a statistically significant positive
predictor of the "Age of a car." The model's strong R-squared
and significant ANOVA support the reliability of this
relationship, although it rests on only 6 cases. The
coefficients describe the magnitude and direction of the
relationship, and the residuals statistics suggest that the
model provides a good fit to the data.

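
The Model Summary figures can be derived from the ANOVA sums of squares, and the Coefficients table gives the fitted line. The sketch below works through both using the rounded values in the output. The function name `predict_age` and the example maintenance cost of 10 are illustrative assumptions, not part of the report, and the F value differs slightly from 212.761 because the inputs are rounded.

```python
import math

# Sums of squares from the ANOVA table; n cases and k predictors.
ss_regression = 17.177
ss_residual = 0.323
n, k = 6, 1

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

ms_residual = ss_residual / (n - k - 1)           # residual mean square
f_stat = (ss_regression / k) / ms_residual        # F = MS regression / MS residual
se_estimate = math.sqrt(ms_residual)              # standard error of the estimate

def predict_age(maintenance_cost):
    """Fitted line from the Coefficients table: constant plus slope times cost."""
    return 0.528 + 0.435 * maintenance_cost

print(round(r_squared, 3), round(adjusted_r_squared, 3),
      round(se_estimate, 3), round(f_stat, 1))
print(round(predict_age(10), 3))  # hypothetical maintenance cost of 10 units
```
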
7. MULTIPLE REGRESSION

Multiple regression is a statistical technique that can be used to analyze the relationship
between a single dependent variable and several independent variables.

THE PROBLEM:
To investigate whether Government Expenditure, Consumption and Investment have a
significant impact on GDP. Using SPSS, we will analyse the given data and then draw
a conclusion.

The steps involved are as follows:

STEP 1: IMPORT EXCEL DATA

STEP 2: ANALYSE THE DATA
ANALYZE → REGRESSION → LINEAR

STEP 3: PLACE THE DEPENDENT AND INDEPENDENT VARIABLES
Dependent variable: GDP
Independent variables: Government Expenditure, Consumption, Investment
Then click OK.

STEP 4: THE OUTPUT WILL BE GENERATED.
OUTPUT

Regression

Descriptive Statistics
Mean Std. Deviation N
GDP 1.14E6 233846.634 37
GOVTEXP 1.26E5 32913.541 37
CONS 6.75E5 136232.151 37
INV 3.74E5 88811.183 37
Correlations
GDP GOVTEXP CONS INV
Pearson Correlation GDP 1.000 .928 .964 .977
GOVTEXP .928 1.000 .885 .872
CONS .964 .885 1.000 .952
INV .977 .872 .952 1.000
Sig. (1-tailed) GDP . .000 .000 .000
GOVTEXP .000 . .000 .000
CONS .000 .000 . .000
INV .000 .000 .000 .
N GDP 37 37 37 37
GOVTEXP 37 37 37 37
CONS 37 37 37 37
INV 37 37 37 37

Variables Entered/Removed(b)

Model   Variables Entered        Variables Removed   Method
1       INV, GOVTEXP, CONS(a)    .                   Enter

a. All requested variables entered.
b. Dependent Variable: GDP

Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .991(a)   .982       .980                32780.380                    2.176

Change Statistics
Model   R Square Change   F Change   df1   df2   Sig. F Change
1       .982              599.683    3     33    .000

a. Predictors: (Constant), INV, GOVTEXP, CONS
b. Dependent Variable: GDP

ANOVAb

Model Sum of Squares df Mean Square F Sig.


1 Regression 1.933E12 3 6.444E11 599.683 .000a
Residual 3.546E10 33 1.075E9
Total 1.969E12 36
a. Predictors: (Constant), INV, GOVTEXP, CONS
b. Dependent Variable: GDP

Coefficients(a)

                                                                       95% Confidence Interval for B
Model              B            Std. Error   Beta   t       Sig.   Lower Bound   Upper Bound
1   (Constant)     123496.030   29386.413           4.202   .000   63708.923     183283.136
    GOVTEXP        1.918        .365         .270   5.260   .000   1.176         2.660
    CONS           .360         .141         .210   2.561   .015   .074          .646
    INV            1.427        .205         .542   6.963   .000   1.010         1.843

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent Variable: GDP

Residuals Statisticsa
Minimum Maximum Mean Std. Deviation N
Predicted Value 7.28E5 1.51E6 1.14E6 231730.967 37
Residual -5.809E4 7.023E4 .000 31384.824 37
Std. Predicted Value -1.789 1.602 .000 1.000 37
Std. Residual -1.772 2.142 .000 .957 37

INTERPRETATION:

The provided output represents the results of a regression analysis involving the
variables "GDP" (dependent variable), "GOVTEXP," "CONS," and "INV"
(independent variables). Here is the interpretation of the key statistics:
Descriptive Statistics: The mean and standard deviation of the variables are as
follows:
GDP: Mean = 1.14E6, Std. Deviation = 233846.634, N = 37
GOVTEXP: Mean = 1.26E5, Std. Deviation = 32913.541, N = 37
CONS: Mean = 6.75E5, Std. Deviation = 136232.151, N = 37
INV: Mean = 3.74E5, Std. Deviation = 88811.183, N = 37
Correlations: There are strong positive correlations among the variables:
GDP and GOVTEXP: 0.928
GDP and CONS: 0.964
GDP and INV: 0.977
GOVTEXP and CONS: 0.885
GOVTEXP and INV: 0.872
CONS and INV: 0.952
All correlations have p-values less than 0.001, indicating statistical significance.
Model Summary: The regression model has a high R-squared value of 0.982,
suggesting that around 98.2% of the variation in GDP can be explained by the
independent variables (INV, GOVTEXP, CONS) included in the model.
ANOVA: The ANOVA table indicates that the regression model is statistically
significant (p < 0.001) in explaining the variance in GDP.
Coefficients: For the coefficients, all three independent variables (GOVTEXP,
CONS, INV) show significant positive relationships with GDP. Each unit increase
in GOVTEXP is associated with an increase of approximately 1.918 units in GDP,
while CONS and INV have respective coefficients of 0.360 and 1.427.
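Residuals Statistics: The residuals (differences between actual and predicted
values) have a mean of 0, as they always do in a least-squares regression with a
constant, so this alone says nothing about fit. Their standard deviation
(31384.824) is small relative to that of GDP (233846.634), which is consistent
with the high R-squared. The predicted GDP values have a mean of 1.14E6 and a
standard deviation of 231730.967.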
In summary, the regression analysis suggests that the model with GOVTEXP,
CONS, and INV as independent variables is a good fit for explaining the variation
in GDP. The model's strong R-squared, significant ANOVA, and meaningful
coefficients indicate a substantial relationship between the variables.
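
The coefficients define a prediction equation, and a quick sanity check is that a least-squares model with a constant predicts the mean of the dependent variable when every predictor is at its mean. The sketch below applies the equation to the rounded means from the Descriptive Statistics table and recomputes F from the ANOVA sums of squares. The function name is hypothetical, and results match the output only approximately because the inputs are rounded.

```python
# Unstandardized coefficients from the Coefficients table.
intercept = 123496.030
coefficients = {"GOVTEXP": 1.918, "CONS": 0.360, "INV": 1.427}

# Variable means from the Descriptive Statistics table (rounded in the output).
means = {"GOVTEXP": 1.26e5, "CONS": 6.75e5, "INV": 3.74e5}

def predict_gdp(values):
    """Constant plus each coefficient times the matching predictor value."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)

gdp_at_means = predict_gdp(means)   # should be close to the mean GDP of 1.14E6

# F statistic from the ANOVA table: regression mean square over residual mean square.
ss_regression, ss_residual = 1.933e12, 3.546e10
n, k = 37, 3
f_stat = (ss_regression / k) / (ss_residual / (n - k - 1))

print(round(gdp_at_means), round(f_stat, 1))
```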

8. CHI SQUARE TEST

The Chi-Square test is a statistical procedure for comparing observed data
with the data expected under a hypothesis. It can be used to determine
whether two categorical variables in our data are associated, that is,
whether an observed difference between their categories reflects a
relationship between them or is due to chance.
The null and alternative hypotheses are:

H0: There is no association between smoking and cancer.

H1: There is an association between smoking and cancer.

STEP 1: IMPORT THE DATA FROM EXCEL

STEP 2: ANALYSE THE DATA

ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS (SELECT CHI-SQUARE UNDER STATISTICS)

STEP 3 : OUTPUT
Crosstabs
[DataSet1]

Case Processing Summary

Cases
Valid Missing Total
N Percent N Percent N Percent
Smoke * Cancer 50 100.0% 0 0.0% 50 100.0%

Smoke * Cancer Crosstabulation

Count
Cancer
1 2 Total
Smoke 1 14 12 26
2 13 11 24
Total 27 23 50

Chi-Square Tests

                               Value     df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                              (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             .001(a)   1    .982
Continuity Correction(b)       .000      1    1.000
Likelihood Ratio               .001      1    .982
Fisher's Exact Test                                         1.000        .603
Linear-by-Linear Association   .001      1    .982
N of Valid Cases               50

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 11.04.
b. Computed only for a 2x2 table

INTERPRETATION:

The data involve a cross-tabulation analysis between smoking (Smoke) and
the occurrence of cancer (Cancer), with a sample size of 50 cases. Among
these cases, 27 individuals had cancer (1) and 23 did not (2). The
Chi-Square analysis tests the association between smoking and cancer
occurrence.
The statistical results do not indicate a significant association between
smoking and cancer occurrence. The Pearson Chi-Square value is .001 with
1 degree of freedom and a p-value of .982, far above the 0.05 level. The
figure .001 is the test statistic, not the p-value. The Continuity
Correction (p = 1.000), Likelihood Ratio (p = .982), Fisher's Exact Test
(p = 1.000, two-sided) and Linear-by-Linear Association (p = .982) all
lead to the same conclusion.
The observed counts (14, 12, 13, 11) are almost identical to the counts
expected if smoking and cancer were independent. No cell has an expected
count below 5 (the minimum is 11.04), so the chi-square approximation is
appropriate. We therefore fail to reject H0: in this sample there is no
evidence of an association between smoking and cancer. This conclusion
applies only to these 50 cases, and further research would be required to
say anything about the relationship between smoking and cancer in general.
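
The Pearson Chi-Square value can be worked out by hand from the Smoke * Cancer crosstabulation, which makes clear which figure is the test statistic and which is the p-value. The sketch below is a minimal standard-library version; the p-value shortcut through `math.erfc` is valid only for 1 degree of freedom, as in this 2x2 table.

```python
import math

# Observed counts from the Smoke * Cancer crosstabulation (rows: Smoke 1, 2).
observed = [[14, 12], [13, 11]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence: row total x column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper-tail probability of chi-square with 1 df (not valid for other df).
p_value = math.erfc(math.sqrt(chi_square / 2))
min_expected = min(min(row) for row in expected)

print(round(chi_square, 3), df, round(p_value, 3), round(min_expected, 2))
```

A statistic this close to zero means the observed counts sit almost exactly on the expected counts, which is why the p-value is close to 1.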

9. ONE SAMPLE T-TEST

The one sample t-test is a statistical procedure used to determine whether a sample of
observations could have been generated by a process with a specific mean. Suppose you are
interested in determining whether an assembly line produces laptop computers that weigh
five pounds. To test this hypothesis, you could collect a sample of laptop computers from the
assembly line, measure their weights, and compare the sample with a value of five using a
one-sample t-test.

STEP 1: IMPORT DATA FROM EXCEL

STEP 2: ANALYZE → COMPARE MEANS → ONE-SAMPLE T TEST

One-Sample Statistics

N Mean Std. Deviation Std. Error Mean

Scores 10 31.30 1.767 .559

One-Sample Test

Test Value = 0

                                                         95% Confidence Interval of the Difference
         t        df   Sig. (2-tailed)   Mean Difference   Lower     Upper
Scores   56.016   9    .000              31.300            30.04     32.56

INTERPRETATION:

The provided data presents the results of a one-sample statistical test conducted on a variable
called "Scores." The sample consists of 10 data points. The mean of the "Scores" is 31.30,
with a standard deviation of 1.767 and a standard error mean of 0.559.
The one-sample test was performed with a test value of 0. The t-statistic computed is 56.016,
with 9 degrees of freedom, and the two-tailed p-value is reported as .000 (that is, p < 0.001),
which is less than the conventional significance level of 0.05. This indicates that there is a
highly significant difference between the sample mean and the test value of 0.
The mean difference is 31.300, and the 95% confidence interval for this difference ranges
from 30.04 to 32.56. Since the confidence interval does not include the test value of 0, this
further supports the conclusion that the sample mean is significantly different from 0.
In summary, the "Scores" variable shows a substantial positive difference from the test value
of 0, as evidenced by the high t-statistic and extremely low p-value. This suggests that the
data sample represents a population with a mean significantly different from 0
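
The t-statistic and confidence interval follow from the One-Sample Statistics row. The sketch below reproduces them from the rounded values in the output. The constant 2.262 is the two-tailed 5% critical value of t with 9 degrees of freedom, taken from standard t tables rather than from the report, and the small gap from the reported t of 56.016 comes from rounding of the inputs.

```python
import math

# One-Sample Statistics for Scores, and the test value used in the run.
n = 10
mean = 31.30
std_dev = 1.767
test_value = 0

T_CRIT_DF9 = 2.262   # two-tailed 5% critical t for 9 df (from t tables)

std_error = std_dev / math.sqrt(n)
mean_difference = mean - test_value
t_stat = mean_difference / std_error

# 95% confidence interval of the difference from the test value.
margin = T_CRIT_DF9 * std_error
ci_lower = mean_difference - margin
ci_upper = mean_difference + margin

print(round(std_error, 3), round(t_stat, 2), round(ci_lower, 2), round(ci_upper, 2))
```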

10. PAIRED SAMPLE T-TEST

T-Test

STEP 1: IMPORT DATA FROM EXCEL

STEP 2: SELECT THE VARIABLES

STEP 3: ANALYZE → COMPARE MEANS → PAIRED-SAMPLES T TEST

Paired Samples Statistics

                        Mean    N    Std. Deviation   Std. Error Mean
Pair 1   After Sales    17.50   10   3.567            1.128
         Before Sales   14.50   10   5.836            1.845

Paired Samples Correlations

                                      N    Correlation   Sig.
Pair 1   After Sales & Before Sales   10   .841          .002

Paired Samples Test

Paired Differences, Pair 1: After Sales - Before Sales

                                          95% Confidence Interval
                                          of the Difference
Mean    Std. Deviation   Std. Error Mean  Lower    Upper    t       df   Sig. (2-tailed)
3.000   3.432            1.085            .545     5.455    2.764   9    .022

INTERPRETATION:

The provided data presents the results of a paired samples analysis comparing
"After Sales" and "Before Sales" data points. For a sample size of 10 pairs, the
mean "After Sales" value is 17.50, with a standard deviation of 3.567 and a
standard error mean of 1.128. The mean "Before Sales" value is 14.50, with a
standard deviation of 5.836 and a standard error mean of 1.845.
The correlation between the "After Sales" and "Before Sales" values is strong
and positive (0.841), indicating a significant relationship between the two
variables (p = 0.002).
In the paired samples test, the mean difference between "After Sales" and
"Before Sales" is 3.000 units, with a standard deviation of 3.432 and a standard
error mean of 1.085. The 95% confidence interval for this difference is between
0.545 and 5.455 units. The t-statistic is 2.764 with 9 degrees of freedom, and the
two-tailed p-value is 0.022, which is less than the conventional significance
level of 0.05. This indicates that the difference between "After Sales" and
"Before Sales" is statistically significant.
In summary, the mean rises significantly from "Before Sales" (14.50) to
"After Sales" (17.50), suggesting a positive effect of the change. The strong
correlation shows that the paired values move together, which makes the paired
comparison more precise; the t-test itself establishes the significance of
the difference.
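
The role of the correlation in a paired test can be seen in the standard deviation of the differences, which shrinks as the two measurements become more strongly correlated. The sketch below derives that standard deviation from the two sample standard deviations and the correlation, then reproduces the t-statistic and confidence interval from the reported paired differences. The critical value 2.262 (two-tailed, 5%, 9 degrees of freedom) comes from standard t tables, not from the report.

```python
import math

n = 10
mean_after, mean_before = 17.50, 14.50
sd_after, sd_before = 3.567, 5.836
correlation = 0.841
sd_diff_reported = 3.432   # Std. Deviation in the Paired Samples Test table

T_CRIT_DF9 = 2.262         # two-tailed 5% critical t for 9 df (from t tables)

# SD of the differences: variances add, minus twice the covariance term.
sd_diff_from_r = math.sqrt(
    sd_after ** 2 + sd_before ** 2 - 2 * correlation * sd_after * sd_before
)

mean_diff = mean_after - mean_before
se_diff = sd_diff_reported / math.sqrt(n)
t_stat = mean_diff / se_diff

ci_lower = mean_diff - T_CRIT_DF9 * se_diff
ci_upper = mean_diff + T_CRIT_DF9 * se_diff

print(round(sd_diff_from_r, 2), round(se_diff, 3), round(t_stat, 3),
      round(ci_lower, 3), round(ci_upper, 3))
```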
