
Ph.D. in Nursing

RESEARCH METHODS IN NURSING

Module No: 7 DATA PREPARATION

Name of the subtopic


7.3 Data Verification and Qualitative vs.
Quantitative Data Analysis Techniques
Faculty Name
Date:
Subject Code:
School of Nursing
Learning Objectives
 At the end of this module, the students will be able to:
 Understand the meaning of data verification
 List the errors in data
 Explain the steps in data verification
 Describe the importance of data cleaning and maintaining
consistency
 Discuss the data verification procedures using WinDEM
 Prepare data for analysis
 Select a data analysis strategy
 Explain the quantitative and qualitative analytical techniques

2
List of contents
 Introduction
 Meaning of data verification
 Errors in data
 Steps in data verification
 Importance of data cleaning and maintaining consistency
 Data verification procedures using WinDEM
 Getting ready for analysis
 Selecting a Data Analysis Strategy
 Quantitative and qualitative analytical techniques
 Summary
 References
3
Introduction
Nursing science is the body of knowledge that supports
evidence-based practice.
After data entry and data verification have been completed, the
data are often not yet in the best format for use in data analyses.
In order to manipulate, analyze and report the information
collected in a convenient and efficient way, the data need to be
organized in a database system. Such a database system is a
structured aggregation of data elements which can be related and
linked to each other through specified relationships. The data
elements can then be accessed through specified criteria and
through a set of transaction operations, which are usually
implemented through a data-retrieval language.
4
Meaning of Data verification
 A critical step in the management of survey data is the
verification of the data. It must be ensured that the data are
consistent, conform to the definitions in the codebook, and
are ready for analytical use.

5
Causes of data errors
 Whenever data are collected, errors are almost inevitably
introduced from various sources. Substantial delays occur when
these errors are not anticipated and safeguards are not built into
the procedures.
 Errors in the data may be caused:
 (a) by faulty preparation of the field monitoring and survey
tracking instruments,
 (b) by the assignment of incorrect identifications to the
respondents,
 (c) during the field administration,
 (d) during the preparation of the instruments including the
coding of the responses, and
 (e) during transcription of the data.
6
Causes of data errors-contd
 Questions may have been skipped or answered in a way not
intended because of misunderstandings during translation and/or
ambiguities in the question. Such deviations can result in:
 (a) variables which have been skipped but which should have
been included; or, variables which have been included when
they should not have been (e.g. when they were misprinted);
(b) incorrectly coded variables;
 (c) variables which have a content different from that specified
by the codebook.

7
Causes of data errors-contd

 The amount of work involved in resolving these problems,
often called "data cleaning", can be greatly reduced by using
well-designed instruments, qualified field administration and
coding personnel, and appropriate transcription mechanisms.

8
Data verification steps
 The steps that must be undertaken to verify the data are
implied in the quality standards that have been defined for the
corresponding survey. Procedures must be implemented for
checking invalid, incorrect and inconsistent data, which may
range from simple deterministic univariate range checks to
multivariate contingency tests between different variables and
different respondents. (A minimal range-check sketch follows.)

9
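A minimal Python sketch of such a univariate range check, assuming a
pandas DataFrame and a codebook-style dictionary of valid ranges; the
variable names and ranges below are invented for illustration.

import pandas as pd

# Hypothetical codebook: variable -> (minimum, maximum) valid values
codebook = {"AGE": (18, 99), "PAIN_VAS": (0, 10)}

df = pd.DataFrame({"AGE": [34, 150, 27], "PAIN_VAS": [3, 8, 12]})

for var, (low, high) in codebook.items():
    # Flag every record whose value falls outside the valid range
    invalid = df[(df[var] < low) | (df[var] > high)]
    if not invalid.empty:
        print(f"{var}: {len(invalid)} value(s) outside [{low}, {high}]")

Values flagged this way would then be inspected and either corrected or
recoded to a special missing code.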
Data verification steps
 The criteria on which these checks are based depend, on the
one hand, on variable type (i.e. different checks may apply to
data variables, control variables, and identification variables)
and, on the other hand, on the manner and sequence in which
questions are asked.
 For some questions a certain number of responses are required,
or responses must be given in a special way, due to a
dependency or logical relationship between questions.

11
Data verification steps
 Depending on the data collection procedures used, it must be
defined when and at what stage data verification steps are
implemented.
 For example, some range validation checks may be applied
during data entry, whereas more complex checks for the
internal consistency of data or for outliers may be more
appropriately applied after the completion of data entry.
Problems that have been detected through verification
procedures need to be resolved efficiently and quickly.

12
Data verification steps-(validation check)

Substruction: Operational System

Pain Intensity
 Instrument: VAS, 10 cm scale (low to high pain)
 Scale: continuous or discrete?

Functional Status
 Instrument: 1-5 Likert scale, 1 = low & 5 = high function
 Scale: continuous or discrete?
13
Data verification steps-(validation
check)
Scaling
Discrete: non-parametric tests (e.g. Chi-square)
 Nominal: gender
 Ordinal: low, medium, high income
Continuous: parametric tests (t or F tests)
 Interval: Likert scale, 1-5 functionality
 Ratio: money, age, blood pressure
(A sketch of matching the test to the scale follows.)

14
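A small sketch, using scipy, of how the measurement scale drives the
choice of test: a chi-square test for discrete (nominal) counts and a
t test for continuous values. All numbers are invented for illustration.

from scipy import stats

# Nominal data, e.g. gender by outcome: a hypothetical 2x2 table
observed = [[30, 20], [25, 25]]
chi2, p_chi, dof, expected = stats.chi2_contingency(observed)

# Continuous (ratio) data, e.g. blood pressure in two groups
group_a = [120, 125, 130, 118, 122]
group_b = [135, 140, 128, 138, 133]
t_stat, p_t = stats.ttest_ind(group_a, group_b)

print(f"chi-square p = {p_chi:.3f}, t-test p = {p_t:.3f}")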
Rigor in Quantitative Research

 Theoretical grounding: axioms & postulates;
substruction; validity of hypothesized relationships
 Design validity (internal & external) of the research design;
instrument validity and reliability
 Statistical assumptions met (scaling, normal curve, linear
relationship, etc.)

(Note: Polit & Beck: reliability, validity, generalizability,
objectivity)

15
Data verification steps-(validation
check)

Design Validity
 Statistical conclusion validity
 Construct validity of Cause & Effect (X & Y)
 Internal validity
 External validity

16
Data verification steps-(validation
check)

Design Validity

 Statistical Conclusion Validity (rxy: is the observed X-Y relationship real?)
 Type I error (alpha = 0.05)
 Type II error (beta); Power = 1 - beta; inadequate power,
i.e. low sample size, undermines conclusions
 Reliability of measures

Can you trust the statistical findings? (A brief power sketch follows.)
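A sketch of the Power = 1 - beta relationship using the statsmodels
power module; the effect size of 0.5 and the group size of 20 are
assumed values, not drawn from any particular study.

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed for 80% power at alpha = 0.05
n_needed = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)

# Power actually achieved with only 20 subjects per group
achieved = analysis.power(effect_size=0.5, nobs1=20, alpha=0.05)

print(f"n per group for 80% power: {n_needed:.0f}")
print(f"power achieved at n = 20: {achieved:.2f}")

With too few subjects, power falls and the risk of a Type II error
rises, which is exactly the threat to statistical conclusion validity
described above.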


Data verification steps-(validation
check)

Design Validity

 Construct Validity of Putative Cause & Effect (X → Y?)
 Theoretical basis linking constructs and concepts
(substruction)
 Outcomes sensitive to nursing care
 Link intervention with outcome theoretically

Is there any theoretical rationale for why X and Y should be
related?
Data verification steps-(validation
check)

Design Validity

Internal Validity
 Threat of history (intervening event)
 Threat of maturation (developmental change)
 Threat of testing (instrument causes an effect)
 Threat of instrumentation (reliability of measure)
 Threat of mortality (subject drop out)
 Threat of selection bias (poor selection of subjects)

Are any Z variables causing the observed changes in Y?


Data verification steps-(validation
check)

Design Validity

External Validity
 Threat of low generalizability to people, places, & time

 Can we generalize to others?


Data verification steps-(validation
check)

Building Knowledge

 The goal is to have confidence in our descriptive, correlational, and
causal data.
 Rigor means following the required techniques and strategies
for increasing our trust and confidence in the research findings.
Data verification -contd

 Some problems can, under certain assumptions, be resolved
automatically on the basis of cross-checks in the datafiles.
Other problems will require further inspection and manual
resolution. Where problems cannot be resolved, the problematic
data values must be recoded to special missing codes.
 The criteria on which the checking is based depend, on the
one hand, on the type of variables used to code the
information (for example, different criteria apply to data
variables,

22
Data verification -contd

 identification variables, control variables, filter variables, and
dependent variables) and, on the other hand, on the way and
sequence in which questions were asked (for example, for
some questions a certain number of responses are required, or
responses must be given in a special way, or there is a
dependency or logical relationship between certain
questions).

23
Data verification steps

 Usually, data verification is undertaken through a series of
steps which, for each survey, need to be established and
sequenced in accordance with the quality requirements, the
type of data collected, and the field operations and data entry
procedures applied.

24
Data verification steps-contd
Common data verification steps are:
 • the verification of returned instruments for completeness and
correctness;
 • the verification of identification codes against field
monitoring and survey tracking instruments;
 • a check for the file integrity, in particular, the verification of
the structure of the datafiles against the specifications in the
codebook;

25
Data verification steps-contd
 the verification of the identification codes for internal
consistency;
 • the verification of the data variables for each student or
teacher against the validation criteria specified in the
codebook;
 • the verification of the data variables for each student and
teacher against certain control variables in the datafiles;

26
Data verification steps-contd

 the verification of the data obtained for each respondent for
internal consistency (for example, where questions were
coded through split variables, the answers to these can be
cross-validated);
 • the cross-validation of related data between respondents;
 • the verification of linkages between related datafiles,
especially in the case of hierarchically structured data; and
 • the verification of the handling of missing data.

27
Data verification steps-contd
 1. Verification of file integrity
 As a first step it needs to be ensured that the overall structure
of the datafiles conforms to the specifications in the codebook.
For example, if raw datafiles are used, then each record should
have the same length in correspondence with the codebook.
Often it is useful to introduce column-control variables into the
codebook at regular intervals, which the coder should code with
blanks.

28
Data verification steps-contd
Special recodings
 Sometimes it is necessary to recode certain variables before
they can be used in the data analyses. Examples of such
situations are:
 • The sequence of questions or items has been changed for
some reason and is no longer in accordance with the
codebook;
 • A question or item may have been asked of some respondents
in a different way or format than was intended;
 • A question or item has not been asked of some respondents,
but the missing information could be derived from other
variables.

29
Data verification steps-contd

 Value validation
 Background questions and test items to which a fixed set of
codes rather than open-ended values applies need to be
checked against the range validation criteria defined in the
codebook. Variables with open-ended values (e.g. "Student
age") need to be checked against theoretical ranges.

30
Data verification steps-contd
Treatment of duplicate identification codes
 Each datafile should be checked for duplicate identification
codes. It is often useful to distinguish between the
following two cases (a sketch follows):
 • Respondents who have identical identification codes but
different values for a number of key data variables. These
respondents have probably been assigned invalid identification
codes.
 • Respondents who have identical identification codes and, at
the same time, identical data values for the key data
variables. These respondents are most likely duplicate entries,
in which case the second occurrence of the respondent's data
should be removed from the datafile.

31
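A minimal pandas sketch (not WinDEM itself) distinguishing the two
duplicate-ID cases above; the ID values and key data variables are
invented for illustration.

import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2, 2],
                   "AGE": [30, 30, 41, 25],
                   "SEX": ["F", "F", "M", "M"]})
key_vars = ["AGE", "SEX"]  # hypothetical key data variables

# Same ID and same key values -> true duplicate, drop the second
true_dupes = df.duplicated(subset=["ID"] + key_vars, keep="first")
df_clean = df[~true_dupes]

# Same ID but different key values -> probably an invalid ID,
# to be resolved manually against the tracking instruments
id_clashes = df_clean[df_clean.duplicated(subset=["ID"], keep=False)]
print(id_clashes)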
Data verification steps-contd
Internal validation of a hierarchical identification system
 If a hierarchical identification system is used for identifying
respondents at different levels, then the structure of this system
can be verified for internal consistency. The number of errors
in identification codes, which are often a serious threat to the
use of the data, can thus be dramatically reduced. Often
inconsistencies can then be resolved automatically or even
avoided during the entry of data. (An illustrative sketch follows.)

32
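An illustrative sketch of such an internal-consistency check, assuming
a purely hypothetical scheme in which a six-digit student ID is
composed of a two-digit school code, a two-digit class code, and a
two-digit student number.

student_ids = ["010203", "010204", "020101", "01X205"]

for sid in student_ids:
    if len(sid) != 6 or not sid.isdigit():
        print(f"Malformed ID: {sid}")
        continue
    school, cls, num = sid[:2], sid[2:4], sid[4:6]
    # Each component could now be checked against the field monitoring
    # and survey tracking instruments for that school and class.
    print(f"ID {sid}: school {school}, class {cls}, student {num}")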
Data verification steps-contd

Verification of the linkages between datafiles
 Often data are collected at different levels; for example, data
are collected for students and for the teachers who teach the
classes in which the students are enrolled. It is then important
to verify the linkages between such levels of data collection.

33
Data verification steps-contd
 There may be cases where, for a teacher in the teacher datafile,
there are no students linked to him or her in the student datafile
(that is, for a certain Teacher ID in the teacher datafile there is no
matching Teacher ID in the student datafile);
 • There may be cases in the student datafile which do not have a
match in the teacher datafile even though they are associated
with a Teacher ID; and
 • There may be cases where the Class IDs of students who
are linked to a teacher differ from the Class ID of that
teacher in the teacher datafile. (A linkage-check sketch follows.)

34
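A hedged pandas sketch of these linkage checks, using an outer merge
with an indicator column; the file contents below are invented.

import pandas as pd

teachers = pd.DataFrame({"TEACHER_ID": [10, 11], "CLASS_ID": [1, 2]})
students = pd.DataFrame({"STUDENT_ID": [100, 101, 102],
                         "TEACHER_ID": [10, 12, 11],
                         "CLASS_ID": [1, 3, 9]})

merged = students.merge(teachers, on="TEACHER_ID", how="outer",
                        indicator=True, suffixes=("_stu", "_tch"))

# Students whose Teacher ID has no match in the teacher datafile
print(merged[merged["_merge"] == "left_only"])

# Teachers with no students linked to them in the student datafile
print(merged[merged["_merge"] == "right_only"])

# Linked records whose Class IDs disagree between the two files
linked = merged[merged["_merge"] == "both"]
print(linked[linked["CLASS_ID_stu"] != linked["CLASS_ID_tch"]])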
Data verification steps-contd

 Verification of participation indicator variables against data
variables. For example, in the first situation we would base the
student score only on the answers given in the first testing
session and exclude the items in the second testing session
from scoring, whereas in the second situation we would
score all items in the second testing session as wrong.
 To allow this to be verified, the datafile should, besides the
variables with the questions and test items, also contain
information about the participation status of the respondents in
the different

35
Data verification steps-contd

 testing sessions. It is often useful to group the variables in the
codebook into blocks. Each of these blocks can then begin with
a variable whose code indicates the participation of the
student in the respective testing session. It can then be verified
whether the participation indicator variables match the data
in the corresponding data variables.

36
Data verification steps-contd

 Verification of exclusions of respondents
 Checking for inconsistencies in the data

37
Data verification procedures using
WinDEM

 WinDEM offers some simple data verification procedures. All
of these procedures are found under the "Verify" menu. The
following section describes some of the fundamental ones:

38
Data verification procedures using
WinDEM
1. Unique ID check
 There must be only one record within a file for each unit of
analysis surveyed. This verification procedure checks whether
each record has been allocated a unique identification code.
2. Column check
 When a series of similar variables exists in a file, it is possible
that the data enterer skips a variable or enters a variable twice, and
consequently a column shift occurs. This can be avoided if you
introduce variables into the datafile at regular positions in the
codebook, into which the data entry personnel must enter a
blank value.

39
Data verification procedures using
WinDEM-contd

2. Column check-contd
 In order to be recognized by the automatic checking routines
of the WinDEM program, the names of these variables must
have the prefix "CHECK". A column shift should not occur if
the data enterers followed these directions for entering the blank
values. You can also verify that data entry proceeded correctly
by looking at the "Table entry" from the "View" menu.
(An illustrative sketch of this idea follows.)

40
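This is not WinDEM code, but the same idea can be sketched in Python:
any variable whose name carries the "CHECK" prefix must contain only
blank values, otherwise a column shift has probably occurred.

import pandas as pd

df = pd.DataFrame({"Q1": [1, 2], "CHECK1": ["", "3"], "Q2": [4, 5]})

for col in df.columns:
    if col.startswith("CHECK"):
        # Any non-blank value in a CHECK column signals a likely shift
        bad = df[df[col].astype(str).str.strip() != ""]
        if not bad.empty:
            print(f"Possible column shift: {col} is non-blank in rows "
                  f"{list(bad.index)}")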
Data verification procedures using
WinDEM-contd
3. Validation check
 As mentioned before, WinDEM ensures that values are
within the range specified in the structure file unless the data
enterer explicitly confirms the out-of-range values entered.
This validation check will show all the variables of all
the cases that have been "confirmed" to contain out-of-range
values. This can be especially useful when many data enterers
are involved in the survey study.

41
Data verification procedures using
WinDEM-contd

4. Merge check
 WinDEM allows you to check the consistency between
variables. This check detects records in a datafile that do not
have matches in a related datafile for a higher level of data
aggregation.

42
Data verification procedures using
WinDEM-contd

 Using “Merge check” from the “Verify” menu, you can select
the variables (or variable combinations) by which the records
in the selected data file are matched against the records in the
higher-level aggregated data file. The software will ask you to
specify the datafile against which to check the merge of the
current datafile in the “File Open Dialog”.
 The program will notify you if errors are found, and the
software will ask whether you want to open the data verification
report.

43
Data verification procedures using
WinDEM-contd

5. Double coding check
 In order to produce high-quality data, it is sometimes
recommended to enter the data on two different computers
(requiring two different data enterers). This allows you to
examine whether the two files have exactly the same structure
and the same values on all records. In order to check this, you
will have to have the two files under different names, together
with the two corresponding codebooks, on one computer.
(A comparison sketch follows.)

44
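A sketch of such a double-entry comparison in pandas, assuming the two
independently keyed files share the same structure and an ID variable;
the file names are hypothetical.

import pandas as pd

entry1 = pd.read_csv("entry1.csv").set_index("ID").sort_index()
entry2 = pd.read_csv("entry2.csv").set_index("ID").sort_index()

# DataFrame.compare lists every cell where the two entries disagree,
# pointing the data enterers to the records that need rechecking
differences = entry1.compare(entry2)
print(differences if not differences.empty else "Files match exactly.")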
GETTING DATA READY FOR
ANALYSIS

 After data are obtained through questionnaires, interviews,
observation, or secondary sources, they need to be
edited. The blank responses, if any, have to be handled in some
way, the data coded, and a categorization scheme set
up. The data will then have to be keyed in, and a software
program used to analyze them. (A minimal coding sketch follows.)

45
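A minimal sketch of editing blank responses and applying a coding
scheme; the column names and the missing-data codes (99 and 9) are
assumptions for illustration only.

import pandas as pd

df = pd.DataFrame({"Q1": [1, None, 3], "Q2": ["yes", "no", None]})

df["Q1"] = df["Q1"].fillna(99)         # numeric missing-data code
df["Q2"] = df["Q2"].fillna("missing")  # label the blank response first
df["Q2"] = df["Q2"].map({"yes": 1, "no": 2, "missing": 9})  # coding scheme

print(df)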
Selecting a Data Analysis Strategy
Earlier Steps (1, 2, 3) of the Marketing Research Process
↓
Known Characteristics of Data
↓
Properties of Statistical Techniques
↓
Background & Philosophy of the Researcher
↓
Data Analysis Strategy


Quantitative techniques of data analysis
A Classification of Univariate Techniques

Univariate Techniques
 Metric Data
  One Sample: t test; Z test
  Two or More Samples
   Independent: two-group t test; Z test; one-way ANOVA
   Related: paired t test
 Non-numeric Data
  One Sample: frequency; Chi-square; K-S; runs; binomial
  Two or More Samples
   Independent: Chi-square; Mann-Whitney; median; K-S; K-W ANOVA
   Related: sign; Wilcoxon; McNemar; Chi-square
A Classification of Multivariate Techniques

Multivariate Techniques
 Dependence Techniques
  One Dependent Variable: cross-tabulation; analysis of variance
   and covariance; multiple regression; conjoint analysis
  More Than One Dependent Variable: multivariate analysis of
   variance and covariance; canonical correlation; multiple
   discriminant analysis
 Interdependence Techniques
  Variable Interdependence: factor analysis
  Interobject Similarity: cluster analysis; multidimensional scaling
Types of Statistical Analysis Used in Research
 Descriptive Analysis:
 Mean, Mode, Median and Standard deviation.
 Inferential Analysis:
 Hypothesis testing and estimation of true population values.
 Differences Analysis:
 Determination of whether significant differences exist in the
population.
 Associative Analysis:
 Investigation of how two or more variables are related.
 Predictive Analysis:
 Used to enhance the prediction capabilities of the marketing
researcher, e.g. regression analysis.
Understanding Data Via Descriptive Analysis

 Measures of Central Tendency:
 Mode
 The most frequently occurring value in a set of values.
 Median
 The middle value in an ordered set of values.
 Mean
 The arithmetic average of a set of numbers.
Understanding Data Via Descriptive Analysis (cont.)

 Measures of Variability:
 Frequency Distribution:
 The number of times that each different value appears.
 Range:
 The distance between the lowest and the highest
value in an ordered set of values.
 Standard Deviation:
 The degree of variation or diversity in the values,
interpreted relative to a normal bell-shaped distribution.
Understanding the Data Via Descriptive
Statistics (cont.)
 Other Descriptive Measures:
 Measure of Skewness:
 Reveals the degree and direction of asymmetry in a
distribution. A value of 0 indicates a symmetric distribution,
a negative value indicates a distribution with a tail to the left,
and a positive value indicates a distribution with a tail to the right.
 Kurtosis:
 How pointed and peaked a distribution appears. A value of 0
indicates the distribution is bell-shaped, a negative value
indicates a flatter distribution, and a positive value
indicates a distribution more peaked than the bell-shaped
curve. (A combined sketch of these measures follows.)
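A combined sketch of the descriptive measures above, using the Python
standard library and scipy; the data values are invented.

import statistics
from scipy import stats

values = [120, 125, 125, 130, 118, 122, 160]

print("mean:", statistics.mean(values))
print("median:", statistics.median(values))
print("mode:", statistics.mode(values))
print("range:", max(values) - min(values))
print("std dev:", statistics.stdev(values))
print("skewness:", stats.skew(values))      # > 0: tail to the right
print("kurtosis:", stats.kurtosis(values))  # 0 is bell-shaped (excess)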
Other Quantitative techniques of data analysis

Programming techniques
a) Linear programming
b) Inventory control
c) Game theory
d) Network analysis
e) Non-linear programming
Limitation of quantitative technique

 The inherent limitations of mathematical expressions.
 High costs are involved in the use of quantitative techniques.
 Quantitative techniques do not take into consideration
intangible factors, i.e. non-measurable human factors.
 Quantitative techniques are just tools of analysis, not
the complete decision-making process.
Qualitative data analysis techniques

 The qualitative researcher, however, has no system for pre-
coding; therefore a method of identifying and labelling or
coding data needs to be developed that is bespoke to each
research project. This method is called content analysis.
 Content analysis can be used when qualitative data has been
collected through:
 Interviews
 Focus groups
 Observation
 Documentary analysis

56
Qualitative data analysis techniques

 The analysis of qualitative research aims to uncover
and/or understand the big picture, by using the data to
describe the phenomenon and what it means. Both
qualitative and quantitative analysis involve labelling and
coding all of the data so that similarities and differences
can be recognised. Responses from even an unstructured
qualitative interview can be entered into a computer so that
they can be coded, counted and analysed.

57
Qualitative data analysis techniques

 Content analysis is '...a procedure for the categorisation of
verbal or behavioural data, for purposes of classification,
summarisation and tabulation.'
 The content can be analysed on two levels:
 Basic or manifest level: a descriptive account of the
data, i.e. this is what was said, with no comments or theories as
to why or how.
 Higher or latent level of analysis: a more interpretive
analysis that is concerned with the response as well as what
may have been inferred or implied.

58
Qualitative data analysis techniques

 Content analysis involves coding and classifying data, also
referred to as categorising and indexing. The aim of content
analysis is to make sense of the data collected and to highlight
the important messages, features or findings. (A small tallying
sketch follows.)

59
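An illustrative sketch of tallying content-analysis codes once segments
of a transcript have been labelled; the segments and code labels below
are invented.

from collections import Counter

coded_segments = [
    ("I felt ignored by staff", "communication"),
    ("Nobody explained the medication", "communication"),
    ("The ward was very clean", "environment"),
]

code_counts = Counter(code for _, code in coded_segments)
for code, n in code_counts.most_common():
    print(f"{code}: {n} segment(s)")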
Six Steps in Qualitative Data Analysis

1. Assign codes to the notes.

2. Note personal reflections in the margin.

3. Sort and sift the notes to identify similar phrases, patterns, and
relationships.


Six Steps in Qualitative Data Analysis

4. Identify these patterns, similarities and differences.

5. Elaborate a small set of generalizations that cover the

consistencies.

6. Examine those generalizations and form grounded theory.


Grounded Theory Analysis
Strategies
 Grounded theory:
 A process of constructing theory from various data.
 An inductive process of collecting, analyzing and comparing data
systematically.
 Theory is grounded in the data to explain the phenomena.
 The main purpose is to develop theory through understanding
concepts that are related by means of statements of
relationships.
Grounded Theory Analysis
Strategies-contd

 Recursive: the researcher moves back and forth through the data,
analyzing, collecting more data, and analyzing some more until
reaching conclusions.

 An interactional method of theory building by comparing and
analyzing the data.


Grounded Theory Analysis
Strategies-contd

 Three steps in the grounded theory analytic process:

1. Open coding:

Break the data into small parts, compare them for similarities and
differences, and explain the meanings of the data by focusing on
"who, when, where, what, how much, why" (ask questions to
get a clear story).
Grounded Theory Analysis
Strategies-contd

2. Axial coding:

After open coding, make connections (sort) between categories
and confirm or disconfirm your hypotheses.

3. Selective coding:

Select the core category (matching the hypotheses) and explain the
minor categories (against the hypotheses) with additional supporting
data.
 Coding process:

 Open coding

 Axial coding

 Selective coding
Interpretation Issues in
Qualitative Data Analysis

A. Triangulating Data
 Use multiple methods and data sources to support the
strength of interpretations and conclusions.

Ex) semi-structured interviews, consent forms, grounded
theory
Interpretation Issues in
Qualitative Data Analysis-contd

B. Audits
 Questions to examine the data for interpretations and conclusions:

1. Is the sampling appropriate to ground the findings?

2. Are coding strategies applied correctly?

3. Is the categorization process appropriate?

4. Do the results link to the hypotheses? (examine the literature review)

5. Are the negative cases explained? (the minority's voice)


Suggestions

 Four steps of negative case testing

1. Make a rough hypothesis

2. Conduct a thorough search

3. Discard or reformulate the hypothesis

4. Examine all relevant cases


Interpretation Issues in
Qualitative Data Analysis-contd
C. Cultural bias
 Discuss cultural differences with different groups of
participants
 To see whether divergence is based on culturally different
interpretations
Interpretation Issues in
Qualitative Data Analysis-contd
D. Generalization
 Not appropriate for qualitative research
 Two perspectives for generalization

1. Case-to-case translation (transferability):

providing thick description so the findings can apply to another setting

2. Analytic generalization:

generalizing from a particular set of results to a broader theory

Ex) use deviant cases


Summary

 This module explained data verification and the quantitative and
qualitative analysis techniques in detail, including their validity,
strengths and weaknesses. In the next module we will examine data
analyses and parametric tests in detail.

72
References
 Macnee, C. L. (2008). Understanding Nursing Research: Using
Research in Evidence-Based Practice. Lippincott Williams &
Wilkins. ISBN 9780781775588.
 Polit, D. F., et al. (2013). Nursing Research: Principles and
Methods (revised edition). Philadelphia: Lippincott.
 https://researchrundowns.wordpress.com/quantitative-
methods/instrument-validity-reliability/
 http://www.nationaltechcenter.org/index.php/products/at-
research-matters/validity/
 http://ocw.jhsph.edu/courses/hsre/PDFs/HSRE_lect7_weiner.pdf
 http://ccnmtl.columbia.edu/projects/qmss/measurement/validity_
and_reliability.html
Thanks

Next Topic>>
PARAMETRIC
TEST

74
