
Chapter 5

DATA PREPARATION, ANALYSES, AND INTERPRETATION

https://sites.google.com/site/iotictcourses

• In most types of research studies, the process of data analysis involves the following three steps:
1. Preparing the data for analysis,
2. Analyzing the data, and
3. Interpreting the data
→ testing the research hypotheses and drawing valid inferences.

DATA PREPARATION
• All studies require some form of data collection and
entry mechanisms.
• Data must be treated with great respect and care.
• It is important to:
– ensure the confidentiality and security of personal data;
– carefully plan how the data will be logged, entered, transformed, and organized to facilitate accurate and efficient statistical analysis.
• We can use computer applications for logging, tracking, and analyzing research data
– e.g., MS Access, MS Excel, SPSS, STATA, SAS, LISREL, Amos, PLS Beta Testing, etc.
Summarizing Data
• Data are a bunch of values of one or more
variables.
• A variable is something that has different
values.
– Values can be numbers or names, depending
on the variable:
• Numeric, e.g. weight
• Counting, e.g. number of injuries
• Ordinal, e.g. competitive level (values are
numbers/names)
• Nominal, e.g. sex (values are names)
• A statistic is a number summarizing a bunch of values.
– Simple or univariate statistics summarize values
of one variable.
– Effect or outcome statistics summarize the
relationship between values of two or more
variables.
Descriptive Statistics
• Describes the frequency and/or percentage
distribution of a single variable
• Tells how many and what percent
• Example: 33% of the respondents are male and 67% are female
• Measures of Central Tendency: The 3-M’s
– mode: most frequent response
– median: mid-point of the distribution
– mean: arithmetic average
• Which to use depends on the type of data
you have
– nominal, ordinal, interval or ratio
• The 3 M's describe what is typical of the data:
– Example: What is the typical age of this group of people?
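The three measures can be sketched with Python's standard `statistics` module. The list `ages` is hypothetical illustration data, not taken from any study.

```python
import statistics

# Hypothetical ages for a small group of respondents (illustration only).
ages = [21, 22, 22, 23, 25, 29, 40]

mode_age = statistics.mode(ages)      # most frequent response
median_age = statistics.median(ages)  # mid-point of the sorted values
mean_age = statistics.mean(ages)      # arithmetic average

print("mode:", mode_age)
print("median:", median_age)
print("mean:", mean_age)
```

In this made-up group the single 40-year-old pulls the mean above the median, which is one reason the choice of measure depends on the data.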

• Measures of Dispersion
– Show how dissimilar the data are
– Example: How much variation is there in the ages?
– Common measures: range and standard deviation/variance
• Range
– difference between the highest and lowest value
– simple to calculate, but not very valuable
• Standard deviation
– measure of the spread of the scores around the mean
– superior measure: it allows every case to have an impact on its value
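A minimal sketch of both dispersion measures, again using a hypothetical list of ages. It assumes the data are a sample, so `statistics.stdev` (which divides by n - 1) is used rather than the population version.

```python
import statistics

# Hypothetical ages for a small group of respondents (illustration only).
ages = [21, 22, 22, 23, 25, 29, 40]

# Range: depends only on the two most extreme cases.
age_range = max(ages) - min(ages)

# Sample standard deviation: every case contributes to the result.
sample_sd = statistics.stdev(ages)

print("range:", age_range)
print("standard deviation:", round(sample_sd, 2))
```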
Hypothesis testing
• The most common kind of statistical inference is
hypothesis testing.
• For example, suppose our hypothesis is that a coin is fair (the null hypothesis).
• If the probability of getting our sample results from a fair coin is very low, we feel confident in rejecting the null hypothesis (that the coin is fair).
– So the hypothesis that the coin is fair is not supported.
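The coin example can be sketched as an exact binomial test. The counts (65 heads in 100 flips) are hypothetical, and the function name `binomial_two_sided_p` is an illustrative helper, not a library call. The two-sided rule used here (adding up every outcome that is no more likely than the observed one) is one common convention among several.

```python
from math import comb

def binomial_two_sided_p(heads: int, flips: int, p_heads: float = 0.5) -> float:
    """Exact two-sided p-value: total probability, under the null hypothesis,
    of all outcomes that are no more likely than the observed one."""
    pmf = [comb(flips, k) * p_heads**k * (1 - p_heads)**(flips - k)
           for k in range(flips + 1)]
    observed = pmf[heads]
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in pmf if p <= observed * (1 + 1e-9))

# Hypothetical experiment: 65 heads in 100 flips of a supposedly fair coin.
p_value = binomial_two_sided_p(heads=65, flips=100)
reject_null = p_value < 0.05

print("p-value:", p_value)
print("reject the null hypothesis (fair coin)?", reject_null)
```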

Statistical Data Analysis: p-value
• In statistical hypothesis testing we use a p-value
(probability value) to decide whether we have
enough evidence to reject the null hypothesis and
say our research hypothesis is supported by the
data.
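• The p-value is the probability of obtaining results at least as extreme as those observed if the null hypothesis were true; it is a numerical measure of the statistical significance of a hypothesis test.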
• By convention, if the p-value is less than 0.05 (p <
0.05), we conclude that the null hypothesis can be
rejected (i.e., the coin is not fair).
• In other words, when p < 0.05 we say that the
results are statistically significant.
• We study a sample to find out something new about the
population.
• The value of a statistic for a sample is only an estimate of the true (population) value.
• Express the precision or uncertainty of the true value using confidence limits (e.g., 95% or 99% confidence limits)
– Confidence limits represent likely range of the true
value.
– They do NOT represent a range of values in different
subjects.
– There's a 5% chance that the true value is outside the
95% confidence interval: the Type I error rate.
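As a rough sketch, 95% confidence limits for a mean can be computed with a normal approximation. The scores are hypothetical, and the normal approximation is an assumption of this example: for small samples a t-distribution multiplier would normally replace the 1.96.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical questionnaire scores from a sample of 20 respondents.
scores = [62, 70, 68, 75, 59, 66, 71, 64, 69, 73,
          61, 67, 72, 65, 70, 63, 68, 74, 66, 69]

n = len(scores)
sample_mean = statistics.mean(scores)
standard_error = statistics.stdev(scores) / math.sqrt(n)

# z multiplier for a 95% interval (about 1.96 under the normal approximation).
z = NormalDist().inv_cdf(0.975)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print("sample mean:", sample_mean)
print("95% confidence limits:", round(lower, 2), "to", round(upper, 2))
```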

T-Test
• Most commonly used Statistical Data Analysis procedure
for hypothesis testing.
• There are several kinds of T-tests, but the most common is
the "two-sample T-test" also known as "Student's T-test" or
the "independent samples T-test".
• The two sample T-test simply tests whether or not two
independent populations have different mean values on
some measure.
• For example, we might have a research hypothesis that
rich people have a different quality of life than poor people.
We give a questionnaire that measures quality of life to a
random sample of rich people and a random sample of
poor people. The null hypothesis, which is assumed to be true until proven wrong, is that there is really no difference between these two populations.
• We gather some sample data and observe that the
two groups have different average scores. But
does this represent a real difference between the
two populations, or just a chance difference in our
samples?
• The T-test allows us to answer this question by using the T-test statistic to determine a p-value that indicates how likely it is that we would have gotten these results by chance.
• By convention, if there is a less than 5% chance of
getting the observed differences by chance, we
reject the null hypothesis and say we found a
statistically significant difference between the two
groups.
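The two-sample comparison can be sketched by computing the pooled (Student's) t statistic by hand. The scores for both groups are hypothetical. Instead of a p-value, the sketch compares the statistic with the tabled two-tailed critical value 2.228 for df = 10 at α = 0.05; the equal-variance assumption is part of this particular form of the test.

```python
import math
import statistics

# Hypothetical quality-of-life scores for two independent random samples.
rich = [72, 68, 75, 80, 77, 70]
poor = [65, 60, 70, 62, 68, 59]

n1, n2 = len(rich), len(poor)
mean_diff = statistics.mean(rich) - statistics.mean(poor)

# Pooled variance: assumes both populations have the same variance.
pooled_var = ((n1 - 1) * statistics.variance(rich)
              + (n2 - 1) * statistics.variance(poor)) / (n1 + n2 - 2)
standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_statistic = mean_diff / standard_error
df = n1 + n2 - 2

# Tabled two-tailed critical value of t for df = 10 at alpha = 0.05.
CRITICAL_T_DF10 = 2.228
significant = abs(t_statistic) > CRITICAL_T_DF10

print("t =", round(t_statistic, 2), "with df =", df)
print("statistically significant at 0.05?", significant)
```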
Measures of Association
• Correlation Coefficient, r : The quantity r, called
the linear correlation coefficient, measures the
strength and the direction of a linear relationship
between two variables.
• The linear correlation coefficient is sometimes
referred to as the Pearson product moment
correlation coefficient in honor of its developer Karl
Pearson.

• The value of r is such that -1 ≤ r ≤ +1.
– The + and – signs are used for positive linear
correlations and negative linear correlations,
respectively. They indicate direction.
• Positive correlation: If x and y have a strong
positive linear correlation, r is close to +1.
– An r value of exactly +1 indicates a perfect
positive fit.
– Positive values indicate a relationship between x and y such that as values for x increase, values for y also increase.
• Negative correlation: If x and y have a strong
negative linear correlation, r is close to -1.
– An r value of exactly -1 indicates a perfect
negative fit.
– Negative values indicate a relationship between
x and y such that as values for x increase,
values for y decrease.
• No correlation: If there is no linear correlation or
a weak linear correlation, r is close to 0.
– A value near zero means that there is no linear relationship between the two variables; they may be unrelated or related in a nonlinear way
• Interpreting the magnitude of correlations:
0–0.1 trivial
0.1–0.3 small
0.3–0.5 moderate
0.5–0.7 large
0.7–0.9 very large
0.9–1 !!!
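A small sketch of the Pearson correlation coefficient computed from its definition, with a helper that maps the magnitude of r onto the scale above. The data and the names `pearson_r` and `magnitude_label` are hypothetical; how values that fall exactly on a boundary are labeled is an assumption of this sketch.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(ss_x * ss_y)

def magnitude_label(r):
    """Label |r| using the cut-offs 0.1, 0.3, 0.5, 0.7, 0.9 from the slide."""
    size = abs(r)
    for upper, label in [(0.1, "trivial"), (0.3, "small"), (0.5, "moderate"),
                         (0.7, "large"), (0.9, "very large")]:
        if size < upper:
            return label
    return "!!!"

# Hypothetical data: hours studied and exam score for six students.
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 60, 68, 72]

r = pearson_r(hours, score)
print("r =", round(r, 3), "->", magnitude_label(r))
```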

Crosstabs/Cross tabulation
– presented in a matrix format
– displays two or more variables
simultaneously
– each cell shows number of
respondents
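A cross tabulation can be sketched with a `Counter` keyed on pairs of values. The respondents and their answers are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses: (sex, answer) for each respondent.
responses = [("male", "yes"), ("female", "yes"), ("female", "no"),
             ("male", "no"), ("female", "yes"), ("female", "yes")]

counts = Counter(responses)
row_levels = sorted({sex for sex, _ in responses})
col_levels = sorted({answer for _, answer in responses})

# Each cell holds the number of respondents with that combination of values.
table = {sex: {answer: counts[(sex, answer)] for answer in col_levels}
         for sex in row_levels}

for sex in row_levels:
    print(sex, table[sex])
```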

Chi-Square (χ²) and Frequency Data
• Up to this point, the inference to the population has been concerned with “scores” on one or more variables, such as CAT scores, mathematics achievement, and hours spent on the computer.
• We used these scores to make inferences about population means. To be sure, not all research questions involve score data.
• Today the data that we analyze consist of frequencies; that is, the number of individuals falling into categories.
• In other words, the variables are measured on a nominal scale.
• The test statistic for frequency data is the Pearson Chi-Square (χ²) Test.
• The magnitude of the Pearson Chi-Square statistic reflects the amount of discrepancy between observed frequencies and expected frequencies.
Steps in Test of Hypothesis
1. Determine the appropriate test
2. Establish the level of significance: α
3. Formulate the statistical hypothesis
4. Calculate the test statistic
5. Determine the degrees of freedom
6. Compare the computed test statistic against a tabled/critical value

1. Determine Appropriate Test
• Chi Square is used when both variables are
measured on a nominal scale.
• It can be applied to interval or ratio data that have
been categorized into a small number of groups.
• All observations are independent (an individual
can appear only once in a table and there are no
overlapping categories).
• It does not make any assumptions about the
shape of the distribution nor about the
homogeneity of variances.
• Otherwise, consider a T-test or another type of test
2. Establish Level of
Significance
• α is a predetermined value
• By convention, one of:
– α = 0.05
– α = 0.01
– α = 0.001

3. Determine The Hypothesis:
Whether There is an
Association or Not
• Ho : The two variables are independent
• Ha : The two variables are associated

4. Calculating Test Statistics
• Contrasts observed frequencies in each cell
of a contingency table with expected
frequencies.
• The expected frequency of a cell, if the two variables are unrelated, is the product of its row frequency and column frequency divided by the total number of cases:
$$F_e = \frac{F_r F_c}{N}$$

• The test statistic sums, over all cells, the squared discrepancy between observed and expected frequency divided by the expected frequency:
$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$
where F_o is the observed frequency and F_e is the expected frequency of a cell.

5. Determine Degrees of Freedom
df = (R-1)(C-1)
where R is the number of levels in the row variable and C is the number of levels in the column variable.

6. Compare computed test statistic
against a tabled/critical value
• The computed value is compared with
the critical tabled value to determine if
the computed value is improbable.
• The critical tabled values are based on
sampling distributions of the Pearson
chi-square statistic
• If the calculated χ² is greater than the tabled χ² value, reject Ho.
Example
• Suppose a researcher is interested in
voting preferences on gun control issues.
• A questionnaire was developed and sent
to a random sample of 90 voters.
• The researcher also collects information
about the political party membership of the
sample of 90 respondents.

Bivariate Frequency Table or Contingency Table

              Favor   Neutral   Oppose   f row
Democrat        10       10       30       50
Republican      15       15       10       40
f column        25       25       40     n = 90

Each cell holds an observed frequency; f row is the row frequency and f column is the column frequency.

1. Determine Appropriate Test
1. Party membership (2 levels): nominal
2. Voting preference (3 levels): nominal
Both variables are nominal, so the Chi-Square test is appropriate.

2. Establish Level of
Significance
α = 0.05

3. Determine The Hypothesis
• Ho: There is no difference between Democrats and Republicans in their opinions on the gun control issue (party membership and opinion are independent).
• Ha: There is an association between responses to the gun control survey and party membership in the population.

4. Calculating Test Statistics

              Favor        Neutral      Oppose       f row
Democrat      fo = 10      fo = 10      fo = 30      50
              fe = 13.9    fe = 13.9    fe = 22.2
Republican    fo = 15      fo = 15      fo = 10      40
              fe = 11.1    fe = 11.1    fe = 17.8
f column      25           25           40           n = 90

For example, fe for Democrat/Favor = 50*25/90 = 13.9, and fe for Republican/Favor = 40*25/90 = 11.1.

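The expected frequencies for the gun control example can be reproduced with a short sketch. The observed counts come from the contingency table in the example; the variable names are illustrative.

```python
# Observed frequencies from the gun control example.
observed = {
    "Democrat":   {"Favor": 10, "Neutral": 10, "Oppose": 30},
    "Republican": {"Favor": 15, "Neutral": 15, "Oppose": 10},
}

opinions = ["Favor", "Neutral", "Oppose"]
row_totals = {party: sum(cells.values()) for party, cells in observed.items()}
col_totals = {op: sum(cells[op] for cells in observed.values()) for op in opinions}
n = sum(row_totals.values())

# Expected frequency of a cell = row frequency * column frequency / n.
expected = {party: {op: row_totals[party] * col_totals[op] / n for op in opinions}
            for party in observed}

for party in observed:
    print(party, {op: round(expected[party][op], 1) for op in opinions})
```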
4. Calculating Test Statistics
$$\chi^2 = \frac{(10 - 13.89)^2}{13.89} + \frac{(10 - 13.89)^2}{13.89} + \frac{(30 - 22.2)^2}{22.2} + \frac{(15 - 11.11)^2}{11.11} + \frac{(15 - 11.11)^2}{11.11} + \frac{(10 - 17.8)^2}{17.8} = 11.03$$

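The same calculation can be sketched as a small function that works on any table of observed counts. The function name `pearson_chi_square` is illustrative. Because it uses unrounded expected frequencies, the result (about 11.025) agrees with the slide's 11.03 only up to rounding.

```python
def pearson_chi_square(observed):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (f_o - f_e) ** 2 / f_e
    return chi2

# Rows: Democrat, Republican. Columns: Favor, Neutral, Oppose.
table = [[10, 10, 30],
         [15, 15, 10]]

chi2 = pearson_chi_square(table)
print("chi-square =", chi2)
```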
5. Determine Degrees of
Freedom
df = (R-1)(C-1) = (2-1)(3-1) = 2

6. Compare computed test statistic
against a tabled/critical value
• α = 0.05
• df = 2
• Critical tabled value = 5.991
• Test statistic, 11.03, exceeds critical value
• Null hypothesis is rejected
• Democrats & Republicans differ
significantly in their opinions on gun
control issues
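The decision step can be sketched as a lookup against tabled critical values. The small table below holds standard χ² critical values at α = 0.05 for df 1 to 4. The p-value line relies on a special case, stated here as an assumption of the sketch: for df = 2 only, the upper-tail probability of the chi-square distribution equals exp(-χ²/2).

```python
import math

# Tabled chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_AT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

chi2 = 11.03        # computed test statistic from the example
rows, cols = 2, 3   # party membership (2 levels) x voting preference (3 levels)

df = (rows - 1) * (cols - 1)
critical_value = CRITICAL_AT_05[df]
reject_null = chi2 > critical_value

# Special case: with df = 2 the upper-tail probability is exactly exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2) if df == 2 else None

print("df =", df, "critical value =", critical_value)
print("reject Ho?", reject_null)
print("p-value (df = 2 only):", p_value)
```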

How to report Findings
• Once you have completed your research
(analyzed your data), there are three
ways of reporting the findings
1. Written reports
2. Journal articles
3. Oral presentation

Written report format
Traditional written reports tend to be produced in the
following format.
1. Title page
– This contains the title of the report,
– the name of the researcher, and
– the date of publication.
– If the report is a dissertation or thesis, the title page will include details about the purpose of the report
• for example, ‘a thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Information Systems’.
– If the research has been funded by a particular organization (a sponsor), details of this may be included on the title page.
2. Contents page – lists the contents of the report, either by chapter or by section headings, with subheadings if relevant, and page numbers
3. List of illustrations – this section includes the title and page number of all graphs, tables, illustrations, charts, etc.
4. Acknowledgements – some researchers may wish to acknowledge the help of their research participants, tutors, employers and/or funding bodies
– Political/religious affiliations/acknowledgements
– There is no need to criticize others who caused problems
5. Abstract/Summary – This tends to be a one-page summary of the research: its purpose, scope, methods, main findings/results, and discussion/conclusions.
6. Introduction – This section introduces the research, setting out the aims and objectives, terms and definitions. It includes the rationale for the research and a summary of the report structure.
7. Background – This section includes all your background research, which may be obtained from literature, personal experience or both.
• Citation → acknowledge your sources
– You must indicate where all the information to which you refer has come from, so remember to keep a complete record of everything you read.
– If you do not do this, you could be accused of plagiarism, which is a form of intellectual theft.
– When you are referring to a particular book or journal article, find out the accepted standard for referencing from your institution.

8. Methodology and Methods – This section sets out a description of, and justification for, the chosen methodology and research methods.
• The length and depth of this section will depend upon whether you are a student or an employee.
• If you are an undergraduate student, you will need to raise some of the methodological and theoretical issues pertinent to your work.
• If you are a postgraduate student, you will also need to be aware of the epistemological and ontological issues involved.

9. Findings/Analysis – This section should include your main findings.
• The content of this section will depend on your chosen methodology and methods.
• If you have conducted a large quantitative survey, this section may contain tables, graphs, pie charts and associated statistics.
• If you have conducted a qualitative piece of research, this section may be descriptive prose containing lengthy quotations.

10. Conclusion – In this section you sum up your findings and draw conclusions from them, perhaps in relation to other research or literature.
11. Recommendations – Some academic reports will not need this section.
• If you are an employee researcher, this section could be the most important part of the report.
• This section sets out a list of clear recommendations which have been developed from your research.

12. Further research – It is useful in both academic reports and work-related reports to include a section which shows how the research can be continued.
• Perhaps some results are inconclusive, or perhaps the research has thrown up many more research questions which need to be addressed.
• It is useful to include this section because it shows that you are aware of the wider picture and that you are not trying to cover up something which you feel may be lacking from your own work.
13. References – Small research projects will need only a reference section.
– Harvard system or Chicago system
• This includes all the literature to which you have referred in your project.
• A popular method is the Harvard system, which lists the authors' surnames alphabetically, followed by their initials, date of publication, title of the book in italics, place of publication and publisher.
– If the reference is a journal article, the title of the article appears in inverted commas and the name of the journal appears in italics, followed by the volume number and pages of the article.
14. Bibliography – If you have read other works in relation to your research but have not actually referred to them when writing up your report, you might need to include them in a bibliography.

Journal Articles
• If you want your research findings to reach a wider audience, it might be worth considering producing an article for a journal.
• Most academic journals do not pay for the articles they publish, but many professional or trade publications do pay for contributions, if published.
• However, the competition can be fierce and your article will have to stand out from the crowd if you want to be successful.

Oral presentations
• Another method of presenting your research findings is through an oral presentation. This may be at a university or college to other students or tutors, at a conference to other researchers or work colleagues, or in the workplace to colleagues, employers or funding bodies.
• It helps a wider audience to find out about the research.
Considerations in Slide preparation
– Your slides should be clear, visible and legible to the audience
• Layout: consider background and text colors
– e.g., blue background with white/bright yellow text
• Background: consistency
– use the same background color and style for all slides
• Font: size and style (capital, small letters)

Delivering the oral presentation
– Have hard copies as a reserve in case there is no electric power or LCD projector
– Begin the presentation on time
– Familiarize yourself with the room: locations, switches, microphones
– Arrive early at the presentation hall so that you can talk with the organizers
– Ensure that your voice is comfortably audible to the audience
– Do not be nervous
– Make eye contact with your audience
– Draw the attention of your audience to important points
– Make sure that you do not block the view of the audience
– Credit the authors of the works you cite
– Finish on time
– Start the question/answer session and welcome questions from the audience
