C11 - Quantitative Data Analysis and Interpretation

CHAPTER 11
QUANTITATIVE DATA ANALYSIS AND INTERPRETATION
Research Methodology: Tools, Methods and Techniques
Sundram, V.P.K., Chandran, V.G.R., Atikah, S.B., Rohani, M., Nazura, M.S., Akmal, A.O., & Krishnasamy, T.
Learning Objectives
After completing this chapter, you should be able to:
 Understand the importance of editing the collected raw data to
detect errors and omissions
 Set up the coding key for the data set and code the data
 Categorize data and create data files
 Get a ‘feel’ for the data
 Test the goodness of data
 Understand the use of content analysis to interpret and summarize
open questions
 Understand the problems and solutions for “don’t know” responses
 Understand the options for data entry and manipulation
 Interpret the computer results and prepare recommendations based on
the quantitative data analysis
Table of Contents
11.1 DATA DIAGNOSIS AND TREATMENT
11.2 APPROPRIATE STATISTICAL ANALYSIS
11.3 INTERPRETING SELECTED DATA ANALYSIS

11.1 DATA DIAGNOSIS AND TREATMENT

11.1.1 Missing Data
 Missing data are a serious problem in multivariate analysis,
and it is important to have some tools for dealing with them.
 A single missing value for a variable can cause either the
variable or the case to be excluded.
 When dealing with missing data, you may leave the cell blank
or assign value codes. If you choose the latter, then a number
of rules apply:
 Missing value codes must be of the same data type as the data they
represent.
 Missing value codes cannot occur as data in the data set.
 By convention, the digit chosen is usually 9.
Dealing With Missing Data
Case substitution – An example of this is to replace a sampled country
with another country not yet included in the sample.
Mean substitution – Another way of dealing with missing data is to
replace missing data points with the mean value of the variable. The
variable’s mean value, computed from the available cases, is
substituted for the missing data values on the remaining cases.
Cold deck substitution – This method replaces the missing value with a
constant value from an external source (for example, from a previous
survey).
Regression substitution – This is the best method if you have strong
relationships and a moderate amount of missing data.
Multiple methods – This is a composite estimation based on several
methods. For example, if you have multiple linear relationships, you
could estimate the variable value from regression of many different
variables and take the mean of ...
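A minimal sketch of mean substitution, assuming missing responses have been coded as 9 as described above. The ratings and the function name mean_substitute are hypothetical and serve only as an illustration.

```python
from statistics import mean

MISSING = 9  # missing-value code, following the convention of using the digit 9


def mean_substitute(values, missing_code=MISSING):
    """Replace entries equal to missing_code with the mean of the available cases."""
    available = [v for v in values if v != missing_code]
    if not available:
        raise ValueError("no available cases to compute a mean from")
    fill = mean(available)
    return [fill if v == missing_code else v for v in values]


# Hypothetical 5-point satisfaction ratings; 9 marks a missing response.
ratings = [4, 5, 9, 3, 4, 9, 2]
filled = mean_substitute(ratings)
print(filled)
```

The code 9 works here only because it cannot occur as a real rating on a 5-point scale, which is the second rule listed for missing value codes.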
11.1.2 Outliers
 An outlier is a value that lies outside the normal
range of data.
 In a box-and-whisker plot, the data values of the outliers
are added, and identifiers may be provided for interesting
values.
 Box-and-whisker plots are particularly useful for
comparing group categories (e.g., men versus
women) or several variables (e.g., relative
importance levels of product attributes).
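One common way to flag outside values is sketched below. It assumes the widely used convention of treating anything more than 1.5 interquartile ranges beyond the quartiles as an outlier. That threshold, the use of quartiles in place of hinges, and the scores are assumptions for illustration and do not come from the chapter.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical test scores with one unusually large value.
scores = [12, 14, 15, 15, 16, 17, 18, 19, 45]
print(iqr_outliers(scores))
```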

Boxplot Components: smallest observed value of lower hinge, median, largest observed value of upper hinge, whiskers, and outside values or outliers.

11.1.3 Normality Tests
 The assumption of normality is a prerequisite for many
inferential statistical techniques.
 There are a number of different ways to explore this
assumption, both graphically and statistically:
 Histogram
 Stem-and-leaf plot
 Box plot
 Normal probability plot
 Detrended normal plot
 Skewness
 Kurtosis
 Kolmogorov-Smirnov statistic, with a Lilliefors significance
level, and the Shapiro-Wilk statistic
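Skewness and kurtosis can be computed directly from the data. The sketch below uses the simple moment-based formulas, which is an assumption: statistical packages often apply small-sample corrections, so their output may differ slightly. The function name and data are hypothetical.

```python
from statistics import fmean


def skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) from the central moments of the data."""
    n = len(values)
    m = fmean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skew, excess_kurtosis


# A small symmetric data set: skewness is zero, and the distribution
# is flatter than the normal curve (negative excess kurtosis).
skew, kurt = skewness_kurtosis([1, 2, 3, 4, 5])
print(skew, kurt)
```

Values of both statistics close to zero are consistent with normality, while large positive or negative values suggest a departure from it.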
11.1.4 Feel of Data
Mean
• The mean of a quantitative variable is defined as the sum of all
entries divided by their number.
Median
• This is another measure of central tendency for quantitative
variables.
• It is defined as the value that sits right in the middle of all data
entries when they are listed in ascending order.
Standard deviation
• This is the most powerful measure of dispersion for quantitative
data.
• It permits very sophisticated descriptions of various
distributions.
Variance
• The square of the standard deviation.
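The four measures can be checked on a small data set with Python's standard statistics module. The entries are hypothetical, and the sample (n - 1) versions of the variance and standard deviation are assumed here.

```python
from statistics import mean, median, stdev, variance

# Hypothetical data entries, already listed in ascending order.
entries = [2, 4, 4, 5, 7, 8]

avg = mean(entries)      # sum of all entries divided by their number
mid = median(entries)    # middle value (average of the two middle entries here)
var = variance(entries)  # sample variance
sd = stdev(entries)      # sample standard deviation, the square root of the variance

print(avg, mid, var, sd)
```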
11.1.5 Goodness of Fit
 Reliability – established by testing for both
consistency and stability.
 Consistency indicates how well the items measuring a
concept hang together as a set. Another measure of
consistency reliability used in specific situations is the split-
half reliability coefficient.
 The stability measure can be assessed through:
 parallel-form reliability – when a high correlation between two
similar forms of a measure is obtained
 test-retest reliability – a group of people (preferably 30 or more)
complete the questionnaire twice, with a reasonable time period
(e.g. a week) between the completions.
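Test-retest reliability is usually summarized as the correlation between the two sets of scores. The sketch below computes Pearson's r by hand. The scores are hypothetical, and only five respondents are used to keep the example short, whereas the text recommends 30 or more.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical questionnaire totals for the same five people, one week apart.
first_completion = [10, 12, 15, 18, 20]
second_completion = [11, 12, 16, 17, 21]

r = pearson_r(first_completion, second_completion)
print(round(r, 3))
```

A coefficient close to 1 indicates that the measure is stable over time.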
 Validity
 Factorial validity – established by submitting the data
for factor analysis. The results of factor analysis (a
multivariate technique) will confirm whether or not the
theorized dimensions emerge.
 Criterion-related validity – established by testing for
the power of the measure to differentiate individuals
who are known to be different.

 Convergent validity – established when there is a high
degree of correlation between two different sources
responding to the same measure.
Example
Both supervisors and subordinates respond in a similar way to a perceived reward
system measure administered to them.
 Discriminant validity – established when two distinctly
different concepts are not correlated with each other.
Example
Courage and honesty; leadership and motivation; attitudes and behaviour.

11.2 APPROPRIATE STATISTICAL ANALYSIS

11.2.1 Parametric
 A t-test is used to determine whether a set or sets
of scores are from the same population.
 Three main types of t-test may be applied:
 One sample
 Independent groups
 Repeated measures
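As an illustration of the first type, the one-sample t statistic compares a sample mean with a hypothesized population mean. The sketch below is a minimal version with hypothetical scores and a hypothetical population mean of 5. It returns the statistic and its degrees of freedom but does not look up the p-value.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(scores, population_mean):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    n = len(scores)
    standard_error = stdev(scores) / sqrt(n)
    t = (mean(scores) - population_mean) / standard_error
    return t, n - 1


# Hypothetical scores tested against a hypothesized population mean of 5.
t, df = one_sample_t([5, 6, 7, 8, 9], population_mean=5)
print(round(t, 3), df)
```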

11.2.2 Assumption Testing
 Each statistical test has certain assumptions that must be met
prior to analysis.
 These assumptions need to be evaluated, because the accuracy of
test interpretation depends on whether assumptions have been
violated.
 The generic assumptions underlying all types of t-test are:
1. Scale of Measurement – the data should be at the interval or ratio level
of measurement.
2. Random sampling – the scores should be randomly sampled from the
population of interest.
3. Normality – the scores should be normally distributed in the population.

Diagram: One-sample t-test; One-way ANOVA; Independence of groups; Homogeneity of variance.

11.2.3 Non-parametric Test
Wilcoxon – The test is used when you would use a repeated measures or
paired t-test, that is, when the same participants perform under each
level of the independent variable.
Friedman – The test is used to compare two or more related samples, and
is the non-parametric equivalent of the repeated measures or
within-subjects ANOVA.
Mann-Whitney – It tests the hypothesis that two independent samples come
from populations having the same distribution. This test is the
non-parametric equivalent of the independent groups t-test.
Kruskal-Wallis – The test is the non-parametric equivalent of the one-way
between-groups ANOVA and thus allows us to examine possible differences
between two or more groups.
Spearman rho – A non-parametric alternative to the parametric bivariate
correlation (Pearson’s r) is Spearman’s rho.
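Spearman's rho can be obtained by ranking each variable and then computing Pearson's r on the ranks, with tied values sharing their average rank. The sketch below follows that approach. The function names and data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# A monotonic but non-linear relationship still gives rho = 1.
rho = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
print(rho)
```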

11.3 INTERPRETING SELECTED DATA ANALYSIS

11.3.1 Interpretation of Descriptive Analysis
 Descriptive statistics are used to describe, examine
and summarize the main features of collected data
quantitatively.
 Model case 1
The table below shows the study of the relationship
between economic fundamentals and the money
supply.
Table 1: Results of Descriptive Analysis (Money Supply)
Variable                 Mean        Std Dev   Max         Min
Money Supply             38180.5     1580      43235.5     33862.7
Inflation (CPI)          101.2       3.5       112.5       88.9
Government Debt          11264.58    1871      11578.86    9435.17
National Income (GDP)    20359.8     2002      21802.6     19189.3

(i) Mean
 The mean measures central tendency as the arithmetic average of
the scores. To compute the mean, all the values are added up and
divided by the number of values.
 Money supply has a maximum of RM43,235.50, a minimum of
RM33,862.70 and a mean of RM38,180.50.
 Inflation (CPI) has a maximum score of 112.5%, a minimum score
of 88.9% and an average score of 101.2%.
 Government debt ranges from RM9,435.17 to RM11,578.86, with a
mean of RM11,264.58.
 National income has a minimum of RM19,189.30, a maximum of
RM21,802.60 and a mean of RM20,359.80.
(ii) Standard deviation
 Standard deviation, the square root of the variance, provides an
index of variability in the distribution of scores.
 The standard deviations for the variables money supply, inflation,
government debt and national income are RM1,580, 3.5%, RM1,871
and RM2,002 respectively.
 For the inflation variable, the standard deviation is 3.46% of the
mean (3.5/101.2), which can be considered small. On the other hand,
for the government debt variable, the standard deviation is 16.61%
of the mean (1871/11264.58), which is perceived as a large
deviation.
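The comparison above expresses each standard deviation as a percentage of its mean, a ratio commonly called the coefficient of variation. The sketch below repeats the arithmetic for all four variables in Table 1. The function name relative_sd is hypothetical.

```python
# (mean, standard deviation) pairs taken from Table 1.
table1 = {
    "Money Supply": (38180.5, 1580),
    "Inflation (CPI)": (101.2, 3.5),
    "Government Debt": (11264.58, 1871),
    "National Income (GDP)": (20359.8, 2002),
}


def relative_sd(mean_value, sd_value):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * sd_value / mean_value


for variable, (mean_value, sd_value) in table1.items():
    print(f"{variable}: {relative_sd(mean_value, sd_value):.2f}% of the mean")
```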

11.3.2 Interpretation of Correlation Analysis
 The correlation analysis determines whether and to
what degree a relationship exists between two or
more quantifiable variables.
 For example, it is used to measure the relationship
strength between the dependent and independent
variables.
 Model case 1
The table below shows the correlation coefficients for the
variables average employee salary, total expenditure and number
of employees.
Table 2: Results of Correlation Analysis (Firm’s Employees in SME Malaysia)
                                                Average Employee Salary   Total expenditure   Number of Employee
Average Employee Salary   Pearson Correlation   1                         0.539**             0.293**
                          Sig (2 tailed)                                  0.000               0.034
Total expenditure         Pearson Correlation   0.539**                   1                   0.373**
                          Sig (2 tailed)        0.000                                         0.000
Number of Employee        Pearson Correlation   0.293**                   0.373**             1
                          Sig (2 tailed)        0.034                     0.000
** correlations are significant
(i) Definition of correlation coefficient

 Correlation is used to look at the ‘net strength’ of the relationship
between two continuous variables (Sweet and Martin, 2008).
 A correlation coefficient shows the direction, strength, and
significance of the bivariate relationship among all the variables
that were measured at an interval or ratio level.
 There could be a perfect positive correlation between two
variables, represented by 1.0 (plus 1) or a perfect negative
correlation, which would be -1.0 (minus 1).
 It does not tell us which variable causes which, but it tells us that
the two variables are associated with each other.

(ii) Explanation of the study’s correlation analysis

 There is a positive, moderate correlation (r = 0.539), or
substantial relationship, between average employee salary and
total expenditure. This relationship is significant at the 0.01
level.
 Average employee salary has a low correlation (r = 0.293), a
definite but small relationship, with number of employees. This
relationship is significant at the 0.05 level (p = 0.034).
 Total expenditure and number of employees also have a low
correlation (r = 0.373), a definite but small relationship, which
is significant at the 0.01 level.

Correlation Strength Based on Guilford’s Law
r              Strength of relationship
< 0.20         Almost negligible relationship
0.20 – 0.40    Low correlation; definite but small relationship
0.40 – 0.70    Moderate correlation; substantial relationship
0.70 – 0.90    High correlation; marked relationship
> 0.90         Very high correlation; very dependable relationship
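The table can be applied mechanically, as in the sketch below. Because adjacent bands share their end points, the handling of boundary values is an assumption: each shared boundary is assigned to the higher band, except 0.90, which stays in the "high" band because the top band is strictly greater than 0.90. The function name is hypothetical.

```python
def guilford_strength(r):
    """Classify a correlation coefficient using the bands in the table above."""
    size = abs(r)  # strength ignores the direction of the relationship
    if size > 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if size < 0.20:
        return "Almost negligible relationship"
    if size < 0.40:
        return "Low correlation; definite but small relationship"
    if size < 0.70:
        return "Moderate correlation; substantial relationship"
    if size <= 0.90:
        return "High correlation; marked relationship"
    return "Very high correlation; very dependable relationship"


# The three coefficients reported in Table 2.
for r in (0.539, 0.293, 0.373):
    print(r, "->", guilford_strength(r))
```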

11.3.3 Interpretation of Regression Analysis
 Regression analysis is used to measure what percentage of the
variance in the dependent variable can be explained by the
independent variables.
 Model case 1
The table below shows the result of regression analysis of
four independent variables regressed against customer
satisfaction.
Table 3: Results of Regression Analysis (Customer Satisfaction)
                   Unstandardized Coefficients   Standardized Coefficients
Model              B        Std. Error           Beta                        t       Sig
(Constant)         1.483    .290                                             5.114   .000
Product Quality    .235     .069                 .277                        3.432   .001
Customer Service   .024     .082                 .026                        .285    .776
Pricing            .198     .076                 .223                        2.620   .010
Promotion          .351     .080                 .161                        1.977   .025
F value            9.349
Sig                .000
Adjusted R2        .181
R2                 .203
(i) Model fit / Coefficient of determination (R2)
 R2 indicates the percentage variance in the dependent variable
that is explained by the variation in the independent variables.
 The R2 of 0.203 implies that the independent variables together
explain 20.3 percent of the variance in the dependent variable.
 The remaining 79.7 percent of the variance in the dependent
variable is not explained by the independent variables in this
study. This indicates that there are other independent variables,
not included in this study, which could further strengthen the
regression equation.

(ii) Adjusted R2
 Adjusted R2 is an adjustment of R2 that penalizes the addition of
independent variables (IVs) to the model.
 The adjusted R2 of 0.181 means that, after this penalty, the
independent variables explain 18.1 percent of the variance in the
dependent variable.
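The penalty can be made concrete with the usual formula, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), where n is the sample size and k the number of independent variables. Table 3 does not report the sample size, so the value n = 150 in the sketch below is purely hypothetical and chosen for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


R2 = 0.203        # from Table 3
K = 4             # four independent variables
N_ASSUMED = 150   # hypothetical sample size, not reported in the table

unexplained = 1 - R2
print(f"Unexplained variance: {unexplained:.3f}")
print(f"Adjusted R2 with n = {N_ASSUMED}: {adjusted_r2(R2, N_ASSUMED, K):.3f}")
```

For a fixed R2, adding independent variables (a larger k) or using a smaller sample lowers the adjusted value.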
(iii) Model significance

 The F-test is significant, based on the p-value of 0.000
(F = 9.349). Hence, the independent variables jointly explain the
dependent variable significantly.

(iv) Parameter significance (t-test)
 The p-value for the product quality variable is 0.001 (0.1%), which is below the
5% significance level. Therefore, product quality is significant and is
positively related to the dependent variable.
 The customer service variable is not significant because its p-value is 0.776
(77.6%), which is above the 5% significance level. Hence, customer service is
not significantly related to the dependent variable.
 The pricing variable has a p-value of 0.010 (1%), which is below the 5%
significance level. Therefore, pricing is significant and is positively related
to the dependent variable.
 The promotion variable has a p-value of 0.025 (2.5%), which is below the 5%
significance level. Therefore, promotion is significant and is positively
related to the dependent variable.
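The decision rule used in each bullet is the same: a coefficient is significant when its p-value falls below the chosen significance level. A short sketch with the p-values from Table 3:

```python
ALPHA = 0.05  # the 5% significance level used in the text

# p-values (Sig column) from Table 3.
p_values = {
    "Product Quality": 0.001,
    "Customer Service": 0.776,
    "Pricing": 0.010,
    "Promotion": 0.025,
}

significant = {name: p < ALPHA for name, p in p_values.items()}

for name, is_significant in significant.items():
    verdict = "significant" if is_significant else "not significant"
    print(f"{name}: p = {p_values[name]:.3f} -> {verdict}")
```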
(v) Unstandardized Beta Coefficients
 They are the values in the regression equation for predicting the dependent variable
from the independent variables.
 The column of estimates (B) provides the values of β0, β1, β2, ... for this equation.
 Customer Satisfaction = 1.483 + 0.235 Product Quality + 0.024 Customer Service +
0.198 Pricing + 0.351 Promotion
 For each one-unit increase in product quality, customer satisfaction will increase by
0.235 units, holding the other independent variables constant.
 For each one-unit increase in customer service, customer satisfaction will increase by
0.024 units, holding the other independent variables constant.
 For each one-unit increase in pricing, customer satisfaction will increase by 0.198 units,
holding the other independent variables constant.
 For each one-unit increase in promotion, customer satisfaction will increase by 0.351
units, holding the other independent variables constant.
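The regression equation can be used directly for prediction. The sketch below plugs a hypothetical respondent's scores into the equation and shows that raising product quality by one unit changes the prediction by exactly its coefficient. The respondent's scores and the function name are assumptions for illustration.

```python
INTERCEPT = 1.483  # (Constant) from Table 3

# Unstandardized coefficients (B) from Table 3.
COEFFICIENTS = {
    "Product Quality": 0.235,
    "Customer Service": 0.024,
    "Pricing": 0.198,
    "Promotion": 0.351,
}


def predict_satisfaction(scores):
    """Predicted customer satisfaction for a dict of independent-variable scores."""
    return INTERCEPT + sum(COEFFICIENTS[name] * scores[name] for name in COEFFICIENTS)


# Hypothetical respondent who rates every independent variable as 3.
respondent = {name: 3 for name in COEFFICIENTS}
baseline = predict_satisfaction(respondent)

# One-unit increase in product quality, other variables held constant.
improved = predict_satisfaction({**respondent, "Product Quality": 4})

print(round(baseline, 3), round(improved - baseline, 3))
```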

(vi) Standardized Beta Coefficients
 The beta uses a standard unit that is the same for all variables in the equation.
 It tells the same thing as the unstandardized coefficient but is expressed in standard
deviations.
 As product quality increases by one standard deviation, customer satisfaction increases by
0.277 of a standard deviation.
 As customer service increases by one standard deviation, customer satisfaction increases by
0.026 of a standard deviation.
 As pricing increases by one standard deviation, customer satisfaction increases by 0.223 of a
standard deviation.
 As promotion increases by one standard deviation, customer satisfaction increases by 0.161
of a standard deviation.
 Therefore, the strongest predictor is product quality, with a beta weight of 0.277. The
second is pricing, with a beta weight of 0.223. The weakest significant predictor is
promotion, with a beta weight of 0.161. The customer service variable does not explain
the variance in customer satisfaction significantly.

(vii) Recommendation
 The company should ensure that employees continuously
produce high-quality products to ensure customer satisfaction.
 The company needs to set competitive prices and use promotional
advertising to attract customers.
(viii) Future research

 Future studies should use other variables that may
contribute to customer satisfaction.
 Future studies could also include moderating and mediating variables
that influence the relationship between the independent variables and
the dependent variable.
