Data Analysis and Inference - Mba - UPD DR DENARTO DENNIS


Data Analysis and Inference

“Data has become a more important resource than oil in the modern
(World Bank, 2017)

Dr. Denarto Dennis

Structure of Presentation
 A review of the research writing process/structure

 An exploration of the branches of statistics and data analysis

 An exploration of descriptive statistical techniques

 An exploration of inferential statistical techniques

 Practical application of descriptive and inferential statistical techniques

Stages in the Research Process

1. Define the problem

2. Plan a research design
3. Decide on a sampling procedure
4. Collect data
5. Analyze data
6. Formulate conclusions and prepare the report
Preparation of your Research Report
The major planks of your research report are:

 Introduction
 Literature Review/Review of Materials
 Methodology
 Analysis of Data
 Discussion and Conclusions
 References
 Appendix
 The Methodology is one of the most important chapters in a typical research
paper. It includes a specification of the following:

1. What methods will be used in the study, for example qualitative,
quantitative, or mixed methods.

2. The research design and instruments

3. The methods of sampling

4. The methods of data analysis

Qualitative, Quantitative and Mixed Methods
 In general, quantitative research seeks to understand the causal or
correlational relationship between variables through testing hypotheses,
whereas qualitative research seeks to understand a phenomenon within a
real-world context through the use of interviews and observation.

 Mixed methods research is a research method that integrates qualitative
and quantitative research methods in a single research study. It involves
collecting and analyzing qualitative and quantitative data to understand
a phenomenon better and answer the research questions.
Analysis of Results
 In this chapter, the focus is on the application of statistical and other tools
to analyze the data that has emerged from your study. Such analyses may
involve both descriptive and inferential statistical tools and techniques.
 Examples of descriptive statistical tools which may be utilized are
measures of central tendency and variation, measures of shape,
frequency distributions, and crosstabulations.
 Examples of inferential statistical tools include regression and correlation, and
hypothesis testing techniques such as chi-square tests, paired samples
t-tests, independent samples t-tests, and ANOVA (Analysis of Variance) tests.
 The focus here is on summarizing and interpreting the results, not on offering
any detailed discussion.
Examples of Analytical Software

 If you are doing quantitative research, the following software can
be considered:
2. SAS
3. R
4. Stata
Examples of Analytical Software

 If you are doing qualitative research, the following software can
be considered:
1. Qualtrics
2. HubSpot
3. Quirkos
4. NVivo
Types of Statistics
 Descriptive Statistics – The process of collecting, analyzing and
presenting data. This is typically done in the form of tables, charts,
frequency distributions, measures of central tendency, measures
of variation and measures of shape.

 Descriptive statistics forms the basis for inferential statistics.

Types of Statistics
 Inferential Statistics – This is the practice of using sampled data to draw
conclusions or make predictions about the larger population from which the
sample was drawn. This is usually achieved using a range of statistical
techniques such as hypothesis testing and estimation. Common tests
of hypotheses include the Z and t tests for single means, chi-square
tests of association, independent samples t-test, paired samples t-test
and Analysis of Variance (ANOVA). Regression and correlation analyses
are also classified under inferential statistical techniques.
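As a minimal sketch of the regression and correlation techniques mentioned above, the following Python snippet computes Pearson's correlation coefficient and a least-squares line by hand. The house sizes and prices are hypothetical values invented for illustration; they are not taken from the course dataset.

```python
from math import sqrt

# Hypothetical data (NOT from the course dataset):
# house size in thousands of square feet, price in hundreds of thousands.
size = [1, 2, 3, 4, 5]
price = [2, 4, 5, 4, 5]

n = len(size)
mean_x = sum(size) / n
mean_y = sum(price) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in size)
syy = sum((y - mean_y) ** 2 for y in price)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(size, price))

r = sxy / sqrt(sxx * syy)            # Pearson correlation coefficient
slope = sxy / sxx                    # least-squares slope
intercept = mean_y - slope * mean_x  # least-squares intercept

print(f"r = {r:.3f}; fitted line: price = {intercept:.2f} + {slope:.2f} * size")
```

A value of r close to +1 or -1 suggests a strong linear association, while a value near 0 suggests a weak one. Testing whether r or the slope is statistically significant is a further inferential step not shown here.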
Discussion of Descriptive Statistics
Measures of central tendency are used to summarize a dataset and give
an indication of the typical or central value of the data points. The primary
measures of central tendency are:
1. The mean or arithmetic average
2. The mode
3. The median

Measures of variation or dispersion are used to examine how widely
spread the values in a dataset are. These include:
1. Variance and Standard Deviation
2. Range
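The measures of central tendency and variation listed above can be sketched with Python's standard `statistics` module. The data values are hypothetical, and the variance and standard deviation shown are the sample versions (dividing by n - 1), which is an assumption about which version is wanted.

```python
import statistics

# Hypothetical data values, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)          # arithmetic average
median = statistics.median(data)      # middle value of the sorted data
mode = statistics.mode(data)          # most frequent value
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # sample standard deviation
data_range = max(data) - min(data)    # largest value minus smallest value

print(mean, median, mode, variance, round(std_dev, 3), data_range)
```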
Discussion of Descriptive Statistics
Measures of shape provide an indication of how the data points in a dataset are
distributed. For example, whether the points are concentrated around some
central value or whether a larger proportion of the data points are to the right or
left of the overall series mean. Examples of measures of shape include:
1. Skewness – The degree of asymmetry in the distribution of the data points
around the mean. The types of
skewness are positive skewness, negative skewness and zero skewness.
Skewness can be inferred from various algebraic and graphical representations
such as box and whisker plots, stem and leaf diagrams, histograms and
frequency polygons, and comparisons of the measures of central tendency.
2. Kurtosis – The flatness or ‘peakedness’ of the points in a dataset as reflected by
the frequency polygon. A frequency polygon that is more peaked than a typical
symmetric curve is said to be leptokurtic and one that is flatter is said to be
platykurtic.
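The two measures of shape can be sketched in Python using moment-based formulas. This is one of several definitions in use: packages such as SPSS and Excel apply sample-size adjustments, so their reported values will generally differ slightly. The helper names and datasets below are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment, dividing by n (population form)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    """Moment-based skewness: positive, negative, or zero."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal curve scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

# Hypothetical datasets, chosen only for illustration.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 10]

print(skewness(symmetric))         # zero skewness
print(skewness(right_skewed) > 0)  # positive skewness (long right tail)
print(excess_kurtosis(symmetric))  # negative, i.e. flatter than normal
```

Under this convention a positive excess kurtosis indicates a leptokurtic shape and a negative one indicates a flatter shape.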
Discussion of Descriptive Statistics
Some of the measures highlighted may be affected by outliers. In simple terms,
an outlier is an extremely high or extremely low data point relative to the
other values in the dataset or graph you are working with. Outliers stand
out greatly from the overall pattern of values.
For example, let’s spot the outliers in the dataset: 5, 30, 34, 35, 37, 101.

By way of information, Tukey's method is a mathematical method for finding
outliers. Under Tukey's method, the outliers are the points lying above
the upper boundary of Q3 + 1.5 × IQR or below the lower boundary of
Q1 - 1.5 × IQR, where Q1 and Q3 are the first and third quartiles and
IQR = Q3 - Q1 is the interquartile range.
These boundaries are referred to as outlier fences.
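The outlier fences for the example dataset can be sketched in Python as follows. Quartiles can be computed in several ways; this sketch assumes the "median of each half" convention, and other conventions (including software defaults) can give different fences. The function name `tukey_fences` is hypothetical.

```python
from statistics import median

def tukey_fences(values, k=1.5):
    """Return (lower fence, upper fence) using median-of-halves quartiles."""
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]
    # For an odd number of points, leave the middle value out of both halves.
    upper_half = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    q1, q3 = median(lower_half), median(upper_half)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [5, 30, 34, 35, 37, 101]
low, high = tukey_fences(data)
outliers = [x for x in data if x < low or x > high]

print(low, high)   # 19.5 47.5
print(outliers)    # [5, 101]
```

Here Q1 = 30 and Q3 = 37, so IQR = 7 and the fences are 19.5 and 47.5. Both 5 and 101 fall outside them.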
Discussion of Descriptive Statistics
We will now illustrate the descriptive statistical techniques outlined in
the previous slides by using the SPSS/Excel software to process and
discuss the dataset labeled “Data Analysis and Inference” which explores
the association between the selling price of a house and its physical size
and area of lot. We will also use the dataset labeled as “measures of
shape” to graphically aid in illustrating the concept of skewness.

Note that descriptive statistical analyses are necessary to establish a
comprehensive understanding of a dataset before conducting inferential
statistical analyses.
Discussion of Inferential Statistics
Descriptive statistical analysis helps a researcher to establish an
understanding of a dataset and therefore makes it easier to analyze.
Inferential statistical analysis goes deeper and is focused on using sample
analyses, such as hypothesis testing, to establish generalizations about the
population from which such samples are drawn.
Hypothesis tests normally have a null hypothesis (H0) and an alternate
hypothesis (H1). The
forms that these take will vary depending on the type of hypothesis test
being carried out, but in general, the null hypothesis will reflect a default
position, or a known or established claim or condition that is being challenged.
The alternate hypothesis, on the other hand, will, in general, reflect the new
proposition or a “challenge” to the null hypothesis.
Discussion of Inferential Statistics
Let us discuss a few commonly applied hypothesis testing techniques.
 Hypothesis testing for single means using Z and t-tests – These are used to
determine if there is statistically significant evidence to conclude that a known
or established mean has changed in some direction (i.e. increased, decreased
or simply become different). Here the null hypothesis will generally state that the
mean is equal to a specific value while the alternate hypothesis will
represent some contradiction of the null.
 Example – A researcher wishing to carry out a statistical test to determine if
the mean sales volume of a competitor’s product has changed could take a
sample and conduct a hypothesis test of single means.
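A single-mean t-test of this kind can be sketched in Python. The sales figures and the established mean of 50 are hypothetical. The critical value 2.262 is the standard two-tailed t-table value for 9 degrees of freedom at the 5% significance level, which is an assumed choice of significance level.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical weekly sales volumes for the competitor's product.
sample = [52, 48, 55, 51, 49, 53, 50, 54, 47, 51]
mu0 = 50  # hypothetical established mean under H0

n = len(sample)
standard_error = stdev(sample) / sqrt(n)     # stdev uses n - 1
t_stat = (mean(sample) - mu0) / standard_error
df = n - 1

t_crit = 2.262  # two-tailed, alpha = 0.05, df = 9 (from a t table)
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

With these made-up figures the t statistic is about 1.22, which is below the critical value, so the null hypothesis that the mean is still 50 would not be rejected.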
Discussion of Inferential Statistics
 The Independent Samples t-test: The independent samples t-test, also called
the two-sample t-test or Student's t-test,
is an inferential statistical test that determines whether there is a
statistically significant difference between the means of two unrelated
groups.

 Example – A researcher wishing to carry out a test to determine if the
mean outputs of two unrelated machines differ to a statistically significant
degree may carry out an independent samples t-test.
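The machine example can be sketched in Python using the pooled-variance (Student's) form of the test, which assumes the two groups have equal variances. The output figures are hypothetical. The critical value 2.306 is the standard two-tailed t-table value for 8 degrees of freedom at the 5% level.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical hourly outputs of two unrelated machines.
machine_a = [20, 22, 19, 21, 23]
machine_b = [25, 27, 24, 26, 28]

n1, n2 = len(machine_a), len(machine_b)
df = n1 + n2 - 2

# Pooled sample variance (assumes equal population variances).
pooled_var = ((n1 - 1) * variance(machine_a)
              + (n2 - 1) * variance(machine_b)) / df
standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean(machine_a) - mean(machine_b)) / standard_error

t_crit = 2.306  # two-tailed, alpha = 0.05, df = 8 (from a t table)
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

Here the t statistic is about -5, well beyond the critical value, so the null hypothesis of equal mean outputs would be rejected for these made-up figures.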
Discussion of Inferential Statistics (Cont’d)
 Paired Samples t-test – The paired samples t-test compares the means of
two measurements taken from the same individual, object, or related units.
These "paired" measurements can represent, for example, a measurement
taken at two different times (e.g., pre-test and post-test scores with an
intervention administered between the two time points).

 Example – A researcher who wishes to find out if there is a statistically
significant difference in the IQ of twins would likely carry out this test.
 Example – A researcher wishing to find out if there is a statistically
significant difference in students’ grades before and after an academic
intervention would likely carry out this test.
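The before-and-after example can be sketched in Python by working on the differences within each pair. The grades are hypothetical. The critical value 2.776 is the standard two-tailed t-table value for 4 degrees of freedom at the 5% level.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical grades for the same five students, before and after.
before = [60, 65, 70, 58, 62]
after = [64, 68, 75, 60, 66]

# The paired test analyzes the differences within each pair.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
df = n - 1

t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))

t_crit = 2.776  # two-tailed, alpha = 0.05, df = 4 (from a t table)
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

The null hypothesis here is that the mean difference is zero. With these made-up grades the t statistic is about 7.06, so it would be rejected.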
Discussion of Inferential Statistics
 Analysis of Variance - One-way analysis of variance (ANOVA) is a
statistical method for testing for differences in the means of three or
more groups.

 Example – A researcher who wishes to find out if there is a statistically
significant difference in the mean sales performance of three or more
brands would likely carry out a one-way ANOVA.
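The F statistic behind a one-way ANOVA can be sketched in Python. The sales figures for the three brands are hypothetical. The critical value 5.14 is the standard F-table value for (2, 6) degrees of freedom at the 5% level.

```python
# Hypothetical sales figures for three brands.
groups = {
    "brand_a": [4, 5, 6],
    "brand_b": [7, 8, 9],
    "brand_c": [10, 11, 12],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)

# Between-groups and within-groups sums of squares.
ss_between = 0.0
ss_within = 0.0
for values in groups.values():
    group_mean = sum(values) / len(values)
    ss_between += len(values) * (group_mean - grand_mean) ** 2
    ss_within += sum((x - group_mean) ** 2 for x in values)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)

f_crit = 5.14  # alpha = 0.05, df = (2, 6) (from an F table)
reject_h0 = f_stat > f_crit

print(f"F = {f_stat:.2f}, df = ({df_between}, {df_within}), reject H0: {reject_h0}")
```

A large F means the variation between the group means is large relative to the variation within groups. For these made-up figures F = 27, so the null hypothesis of equal means would be rejected.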
Discussion of Inferential Statistics (Cont’d)
 Chi-Square tests – These may be thought of as statistical tests of
association between two categorical variables. The Chi-Square test is a
statistical procedure used by researchers to examine the differences
between categorical variables in the same population.
 For example, imagine that a research group is interested in whether or not
education level and gender are related for all people in Jamaica.
 After collecting a simple random sample of 500 Jamaican citizens, and
administering a survey to this sample, the researchers could first manually
observe the frequency distribution of gender and education category
within their sample.
 The researchers could then perform a Chi-Square test to determine whether the
association suggested by these observed frequencies is statistically significant.
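The gender and education example can be sketched in Python as a chi-square test of association on a crosstabulation. The counts for the 500 respondents and the three education categories are hypothetical, not survey results. The critical value 5.991 is the standard chi-square table value for 2 degrees of freedom at the 5% level.

```python
# Hypothetical crosstab of 500 respondents:
# rows = gender, columns = education level (primary, secondary, tertiary).
observed = [
    [60, 100, 90],   # male
    [40, 100, 110],  # female
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence = row total * column total / n.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

chi2_crit = 5.991  # alpha = 0.05, df = 2 (from a chi-square table)
reject_h0 = chi2 > chi2_crit

print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```

The null hypothesis is that gender and education level are not associated. With these made-up counts the statistic is 6.0, just above the critical value, so it would be rejected at the 5% level.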
"What we find changes who we become” (Moville, 2006)
