VETERINARY EPIDEMIOLOGY II

11-Feb-21 1
Chapter 1

Epidemiological data collection and data management

I. Methods of data collection
II. Data collection tools
III. Data management
  • Data editing and cleaning
  • Data coding and checking completeness of the data
  • Data entry
Methods of data collection

Data are collected in three main ways:

1. Observation (e.g., clinical examination, diagnostic imaging and post-mortem examination);

2. Completing questionnaires (either directly or by interview);

3. Use of documentary sources (e.g., clinical records and records of diagnostic laboratory results), with use of data sets generated by other workers.


Methods of data collection…
• Data derived from observation and questionnaires are primary data.

• Data obtained from documentary sources are secondary data.

• Observation of disease relies on four diagnostic criteria:

  • Clinical signs and symptoms

  • Detection of specific agents

  • Reactions to diagnostic tests

  • Identification of lesions

Methods of data collection…
• Observation is central to the practice of clinical veterinary medicine.

• It is also important in many epidemiological investigations (e.g., outbreak investigation).

• Additionally, surveillance, surveys and observational studies may use secondary data.

• However, there are occasions when the appropriate information is not readily available, in which case it must be collected using questionnaires.


Collection involving measurement
• If a high degree of precision is required in the study, the variable being investigated will normally have to be measured in some way.

• This may involve taking a biological specimen from an animal for a diagnostic test, weighing the animals, measuring milk yield, or measuring climatic variables such as rainfall and temperature.

• Before measuring begins, it is important to understand exactly what is being measured and what the advantages and disadvantages of the method used are.

• If the procedure for a diagnostic test involves complex equipment, the person using it must master all its aspects before the survey begins, to ensure that an acceptable level of consistency in the measurements is obtained.

• The equipment used during a field investigation should be calibrated and checked for accuracy before the start of each series of measurements, and should be regularly maintained.

• Errors in observations and measurements arise from variations between observers and from the measurement procedures used.

A. Errors due to variations between observers
• Many epidemiological studies are conducted with the help of enumerators, usually field services staff.

• Variations between different observers may occur when some degree of subjective judgment is involved, e.g., in making a tentative diagnosis.

• Criteria by which a diagnosis is arrived at need to be established and adhered to by all those engaged in the study.

• Such considerations are of particular importance in retrospective studies.

• An additional problem frequently encountered is that of bias on the part of the observer.

• This can be avoided by the use of a "blind" technique, whereby the observer is kept ignorant of the distribution of the determinant in the groups being studied.

B. Errors due to measurement
• Errors inherent in the procedures by which a variable is measured are common in epidemiological studies.

• For example, if two weighing scales are being used in a study, one scale may consistently give a higher reading than the other.

• Careful checking and monitoring of such apparatus before and during the study will reduce errors of this kind.

• Further errors may occur when diagnostic tests are used to determine the presence or absence of an infectious agent.
The terms used to describe the reliability of diagnostic procedures are:

• Repeatability: the ability of a diagnostic test to give consistent results.

• Accuracy: the ability of a diagnostic test to give a true measure.

Accuracy is normally measured by two criteria:

• Sensitivity: the capability of a test to identify an individual as infected with a disease agent when that individual is truly infected.

• Specificity: the capability of a test to identify an individual as uninfected with a disease agent when that individual is truly not infected.
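
As a rough illustration of the two accuracy criteria, the sketch below computes sensitivity and specificity from the four cells of a test-versus-true-status table. The counts are invented for the example and the function names are hypothetical; nothing here comes from a real study.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of truly infected animals that the test calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of truly uninfected animals that the test calls negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts: 100 truly infected and 200 truly uninfected animals.
tp, fn = 90, 10    # infected animals testing positive / negative
tn, fp = 190, 10   # uninfected animals testing negative / positive

se = sensitivity(tp, fn)
sp = specificity(tn, fp)
print(f"Sensitivity: {se:.2f}")
print(f"Specificity: {sp:.2f}")
```

With these assumed counts the test detects 90 of 100 infected animals and correctly clears 190 of 200 uninfected ones.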

Questionnaires
A questionnaire is a set of written questions, which may be structured in different ways.

• The person who answers the questionnaire is termed the respondent.

• Types of questionnaire: structured vs semi-structured

• Questions may be either open-ended or closed.

Open-ended questions
• These allow the respondent freedom to answer in his or her own words.

• Advantage:

  • Freedom of expression: the respondent is allowed to comment, pass opinions and discuss other events that are related to the question's topic.

• Disadvantages:

  • They can increase the time taken to complete a questionnaire.

  • The answers cannot be coded when the questionnaire is designed, because the full range of answers is not known.

  • A wide range of answers may be difficult to categorize and code.
Closed questions:
Closed questions have a fixed number of possible answers.

• The questions may be dichotomous; that is, with two possible answers (e.g., yes or no).

• Alternatively, the questions may be multiple choice.

• Advantages:

  • Ease of analysis and coding, because only a limited, fixed set of responses is allowed.

  • Easy to answer.

• Disadvantage:

  • Because the possible answers are fixed, the answers may not reveal related events that may be significant.
Completing questionnaires

• Questionnaires can be completed:

  • By mail (self-completed by the respondent)

  • By an interviewer who presents the questions verbally to the respondent, either in person or by telephone

Mailed and self-completed questionnaires
The main requirements for a mailed or self-completed questionnaire are great clarity and a covering letter that politely explains the reason for sending the questionnaire.

Advantages:

• Relatively cheap, with potential for wide coverage

• Quick and easy to organize

• Avoids interviewer bias

• Allows a highly motivated respondent to 'check the facts' over a period of time

Disadvantages:

• The questions must be very clear.

• The response rate is often low: 50% is not uncommon, and the value can be as low as 10%.
Interviews
• Interviews can overcome some of the disadvantages of mailed and self-completed questionnaires.

• They are particularly useful if many of the questions are open-ended, and where illiteracy of the respondent is a problem.

• Questionnaires can be longer than self-completed ones, and response rates of 90% can sometimes be achieved.

• However, personal interviews can be costly to organize, involving training, payment and travelling expenses of interviewers.

• Telephone interviews have high response rates, and can produce results more quickly and cheaply than personal interviews and mailed questionnaires.

• However, questions need to be short to reduce conversation time to a minimum.


Designing a questionnaire
• The success of a questionnaire depends on careful design.

• Ideally, everyone who is issued with a questionnaire should complete it.

• The proportion of those who respond is the response rate (expressed as a percentage).

• The non-response rate is therefore 100 - response rate (%); e.g., a response rate of 70% represents a non-response rate of 30%.

• Good questionnaire design decreases non-response.
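
The response-rate arithmetic above can be sketched in a few lines. The function names and the figures (200 questionnaires issued, 140 returned) are assumptions chosen only to reproduce the 70% / 30% example.

```python
def response_rate(returned, issued):
    """Percentage of issued questionnaires that were completed and returned."""
    if issued <= 0:
        raise ValueError("at least one questionnaire must have been issued")
    return 100.0 * returned / issued


def non_response_rate(returned, issued):
    """Non-response rate (%) = 100 - response rate (%)."""
    return 100.0 - response_rate(returned, issued)


# Hypothetical survey: 200 questionnaires issued, 140 returned.
rate = response_rate(140, 200)
non_rate = non_response_rate(140, 200)
print(f"Response rate: {rate:.0f}%, non-response rate: {non_rate:.0f}%")
```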


Initial presentation: The title of the questionnaire should be brief and accurate.

• A polite letter, explaining the reason for producing the questionnaire, and the value
of the results deriving from its completion should be enclosed.

Wording: should be unambiguous, brief, polite, unemotional and non-technical.

• If technical terms are used, then they should be defined simply.

• Double negatives should be avoided.

• Each question should contain only one idea.

• Sensitive and emotive questions should be avoided.

• The questionnaire should be as short as possible.


Question sequence
• Related questions may need to be separated, because the answer given to one question may influence the answer given to the succeeding question, producing the phenomenon termed 'carry-over'.

• General questions should be presented first, and specific ones later.

• The questionnaire can be made more interesting by 'branching out' from one question to another.
• Generally, when drafting questions, you must keep in mind:

• Who is responding, and whether or not the data are readily available to them

• The response burden (the length and complexity of the questionnaire)

• Confidentiality and sensitivity of the data being collected

• The reliability of the data (validity of question)

• Ultimately how the data will be processed (coding and computer entry)

Testing questionnaires
• Several drafts of a questionnaire are usually required following testing.

• There are normally two stages to testing:

• Informal testing is carried out on colleagues, who can detect trivia, ambiguities and defects in questionnaire design.

• Formal testing is undertaken on a small random sample of the population on which the full survey will be conducted.

• This formal testing is called a pilot survey.

• The size of the sample is chosen using the guidelines for sample-size determination in surveys.

• The pilot survey exposes further defects in questionnaire design.

• Results from the pilot survey should never be used as part of the full survey, and respondents used in the pilot survey should never be used again in the full one.
Coding and editing of questionnaires
• Before administering any questionnaire, procedures for coding of responses and computer data entry should be considered.

• When coding responses, it is wise to have a single value to represent missing values.

• Do not simply leave these blank as, subsequently, it will be impossible to differentiate items that were not answered on the questionnaire from those that were missed in coding or data entry.

• Consistency of coding is important and, because it is convenient to analyse no/yes (dichotomous) variables coded as 0/1, it is advisable to use this coding from the start.
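
A minimal sketch of this coding scheme follows. The missing-value code -9 and the function name are assumptions made for illustration; any single value that cannot occur as a real answer would serve.

```python
MISSING = -9  # assumed code for "question not answered"
YES_NO_CODES = {"no": 0, "yes": 1}


def code_yes_no(raw):
    """Code a no/yes answer as 0/1; unanswered items get the MISSING code."""
    if raw is None or raw.strip() == "":
        return MISSING
    key = raw.strip().lower()
    if key not in YES_NO_CODES:
        # Anything other than yes/no is a coding problem, not a missing value.
        raise ValueError(f"unexpected answer: {raw!r}")
    return YES_NO_CODES[key]


# Hypothetical answers as written on five questionnaires.
answers = ["Yes", "no", "", "YES", None]
coded = [code_yes_no(a) for a in answers]
print(coded)  # [1, 0, -9, 1, -9]
```

Because unanswered items carry an explicit code, a blank cell in the data file can only mean that the coder or data-entry operator skipped it.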

• Coding of responses is best accomplished directly on the paper forms.

• Do not attempt to combine coding and data entry into a single step.

• It is a good idea to use a distinctive colour of ink for recording all codes on the forms, so it is easy to differentiate writing done by the coder from that done by the respondent or interviewer.

• Computer data entry can be done using specialised software or general-purpose programs such as spreadsheets and database managers.

• The advantage of specialised software is that it allows you to set validation criteria easily that preclude entry of illogical values.
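
To show what validation criteria at entry might look like, here is a small sketch in plain Python. It is not how any particular data-entry package works; the field names and limits are hypothetical and would be set per study.

```python
# Hypothetical validation rules: allowed codes or plausible ranges per field.
RULES = {
    "sex": {"allowed": {1, 2}},
    "age_years": {"min": 0, "max": 25},
    "milk_kg_day": {"min": 0, "max": 80},
}


def validate(record):
    """Return a list of problems; an empty list means the record is accepted."""
    problems = []
    for field, rule in RULES.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: no value entered")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{field}: {value} is not an allowed code")
        if "min" in rule and value < rule["min"]:
            problems.append(f"{field}: {value} is below {rule['min']}")
        if "max" in rule and value > rule["max"]:
            problems.append(f"{field}: {value} is above {rule['max']}")
    return problems


good = {"sex": 2, "age_years": 4, "milk_kg_day": 25}
bad = {"sex": 3, "age_years": 4, "milk_kg_day": 250}
print(validate(good))
print(validate(bad))
```

The second record is rejected twice: once for a sex code that does not exist and once for an implausible milk yield.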
• One useful, freely available program for data entry is EpiData (freeware, http://www.epidata.dk).

• Spreadsheets must be used with caution.

• While they are convenient and easy to set up for data entry, the ability to sort individual columns in the spreadsheet makes it possible to completely destroy the data.

• General-purpose database managers are useful and allow greater manipulation of the data.

• However, because most data will ultimately be transferred to a statistical package for verification and analysis, it is advisable to perform all data manipulations in that statistical package, where it is easier to document and record all procedures carried out.
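
The danger of sorting a single column can be imitated in a few lines. The records are invented; the point of the sketch is only that sorting one column on its own detaches each value from the animal it belongs to, whereas sorting whole records keeps them together.

```python
# Hypothetical records: each row links a cow to its milk yield.
rows = [
    {"cow_id": 3, "milk_kg": 18.0},
    {"cow_id": 1, "milk_kg": 31.5},
    {"cow_id": 2, "milk_kg": 24.0},
]

# Unsafe: sort the ID column alone, leaving the yield column as it was.
ids_sorted_alone = sorted(r["cow_id"] for r in rows)
milk_unsorted = [r["milk_kg"] for r in rows]
broken = list(zip(ids_sorted_alone, milk_unsorted))

# Safe: sort complete records so every value travels with its ID.
safe = [(r["cow_id"], r["milk_kg"])
        for r in sorted(rows, key=lambda r: r["cow_id"])]

print("column sorted alone:", broken)
print("records sorted whole:", safe)
```

In the first result cow 1 has been given cow 3's yield, and nothing in the file reveals that the link was lost.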

Data management
A. Data collection sheets

• These are either survey forms or other data-collection forms.

• It is important to establish a permanent storage system for all original data collection sheets, in case they are needed during the analysis.

• Some things to consider when dealing with the file are as follows:

• Do not remove originals from this file.

• If you need to take a specific sheet for use at another location, make a photocopy of the sheet.

• Never ship the original to another location without first making copies of it.

• Set up a system for recording the insertion of data collection sheets into the file, so that you know how many remain to be collected before further work begins.

• Once all of the forms have been collected, before you do anything else, scan through all sheets to get an impression of their completeness.

• If there are omissions in a data-collection sheet (e.g., the last page of a questionnaire was not completed), returning to the data source to complete these data will more likely be successful if it is done soon after the data were initially collected rather than weeks or months later (after data analysis has begun).
B. Data coding
• It is advisable to have a space to allow for coding directly on the data collection sheet.

• Other issues to consider when coding your data are as follows:

• Assign a specific number to all missing values.

• If you have 'open' questions, scan the responses and develop a list of needed codes before starting coding.

• Maintain a master list of all codes assigned.

• Use numeric codes.

• In general, avoid the use of string variables except for rare instances where you need to capture some textual information (e.g., a comment field).

• Only code one piece of information in a single variable.

• Never make compound codes, e.g., 1 = male domestic shorthair, 2 = female domestic shorthair, 3 = male Siamese, etc.
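
The following sketch illustrates these coding rules with one variable per piece of information and a master list of codes. The codebook contents, the missing code -9 and the function name are all assumptions for the example.

```python
MISSING = -9  # assumed single code for every missing value

# Master list of all codes assigned: one numeric variable per piece of information.
CODEBOOK = {
    "sex": {1: "male", 2: "female"},
    "breed": {1: "domestic shorthair", 2: "Siamese"},
}


def code_value(variable, text):
    """Look up the numeric code for a written answer; blanks become MISSING."""
    if text is None or not text.strip():
        return MISSING
    wanted = text.strip().lower()
    for code, label in CODEBOOK[variable].items():
        if label.lower() == wanted:
            return code
    raise ValueError(f"{variable}: no code defined for {text!r}")


# A female Siamese is stored as two variables, not one compound code.
cat = {
    "sex": code_value("sex", "Female"),
    "breed": code_value("breed", "Siamese"),
}
print(cat)  # {'sex': 2, 'breed': 2}
```

Keeping sex and breed separate means either can be tabulated on its own, which a compound code such as "3 = male Siamese" does not allow without recoding.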
• For all types of data, note any obvious outlier responses (e.g., an individual cow's milk production reported as 250 kg/day) and correct them on the datasheet.

• Use a different coloured pen so your coding notations can clearly be differentiated from anything previously recorded on the data collection sheets.

C. Data entry
Issues to consider when entering your data into a computer file are as follows:

• Double-data entry, followed by comparison of the two files to detect any inconsistencies, is preferable to single-data entry.

• Spreadsheets are a convenient tool for initial data entry, but these must be used with extreme caution; because it is possible to sort individual columns, it is possible to destroy your entire dataset with one inappropriate 'sort' command.

• Custom data entry software programs provide a greater margin of safety and allow more data verification at the time of entry, e.g., EpiData (http://www.epidata.dk/).

• Using hierarchical database software can make data entry and retrieval more efficient for large quantities of multi-level data (e.g., every lactation for each dairy cow from several herds over several years).

• Alternatively, it is possible to set up separate files for data at each level (e.g., a herd file, a cow file, etc.) and merge the files after data entry.
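
Two of the points above lend themselves to a short sketch: comparing the two files from double-data entry, and merging separate herd-level and cow-level files. The field names (`cow_id`, `herd_id`, `milk_kg`, `region`) and the records are hypothetical.

```python
def compare_entries(first, second, key="cow_id"):
    """List every disagreement between two independently entered files."""
    second_by_key = {row[key]: row for row in second}
    first_keys = {row[key] for row in first}
    differences = []
    for row in first:
        other = second_by_key.get(row[key])
        if other is None:
            differences.append((row[key], "<record>", "present", "absent"))
            continue
        for field, value in row.items():
            if other.get(field) != value:
                differences.append((row[key], field, value, other.get(field)))
    for row in second:
        if row[key] not in first_keys:
            differences.append((row[key], "<record>", "absent", "present"))
    return differences


def merge_levels(cows, herds):
    """Attach herd-level fields to each cow record (KeyError if a herd is absent)."""
    herd_by_id = {h["herd_id"]: h for h in herds}
    return [{**herd_by_id[c["herd_id"]], **c} for c in cows]


entry_a = [{"cow_id": 1, "herd_id": 10, "milk_kg": 25.0},
           {"cow_id": 2, "herd_id": 10, "milk_kg": 31.0}]
entry_b = [{"cow_id": 1, "herd_id": 10, "milk_kg": 25.0},
           {"cow_id": 2, "herd_id": 10, "milk_kg": 13.0}]  # digits transposed
herds = [{"herd_id": 10, "region": "north"}]

diffs = compare_entries(entry_a, entry_b)
merged = merge_levels(entry_a, herds)
print(diffs)
print(merged)
```

Each reported difference should be resolved by going back to the original data collection sheet, not by guessing which entry is right.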
• As soon as the data-entry process has been completed, save the original data files in a safe
location.

• In large, expensive trials it might be best to have a copy of all originals stored in another
location.

• If the data entry program which you use does not have the ability to save your data in the format of the statistical package that you are going to use, there are a number of commercially available software programs geared specifically to converting data from one format to another.

• If you use a general-purpose program (e.g., a spreadsheet) to enter your data, as soon as the data are entered, convert them to files usable by the statistical program that you are going to use for the analysis.
D. Data editing
• Before beginning any analyses, it is very helpful to spend some time editing your data.

• The most important components of this process are labelling variables and values within variables, formatting variables and correctly coding missing values.

• All variables should have a label attached to them which more fully describes the contents of the variable.

• While variable names are often quite short (e.g., fewer than 8 or fewer than 16 characters), labels can be much longer.

• Note: with some computer programs, the labels are stored in a separate file.

• Categorical variables should have meaningful labels attached to each of the categories.

• For example, sex could be coded as 1 or 2, but should have labels for 'male' and 'female' attached to those values.

• The number that was assigned to all missing values needs to be converted into the code used by your statistics program for missing values.

• Some programs will allow you to attach 'notes' directly to the dataset (or to individual variables within the dataset).

• These explanatory notes can be invaluable in documenting the contents of files.
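
The editing steps above can be sketched without any statistics package. Here Python's `None` stands in for a package's own missing-value code, and the variable names, labels and the study's missing code -9 are assumptions for illustration.

```python
MISSING_CODE = -9  # assumed number assigned to missing values during coding

# Short variable names with longer descriptive labels.
VARIABLE_LABELS = {
    "sex": "Sex of the animal",
    "milk_kg": "Milk yield (kg/day)",
}

# Labels attached to each category of a categorical variable.
VALUE_LABELS = {"sex": {1: "male", 2: "female"}}


def to_analysis_value(value):
    """Convert the study's missing code to None, the stand-in for 'missing'."""
    return None if value == MISSING_CODE else value


def describe(variable, value):
    """Render one stored value using its variable label and value label."""
    value = to_analysis_value(value)
    if value is None:
        shown = "missing"
    else:
        shown = VALUE_LABELS.get(variable, {}).get(value, value)
    return f"{VARIABLE_LABELS[variable]}: {shown}"


print(describe("sex", 2))
print(describe("milk_kg", -9))
print(describe("milk_kg", 25.0))
```

Converting the missing code before analysis matters because a value such as -9 left in a yield column would otherwise be averaged in as if it were a real measurement.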


E. Data verification
• Before starting any analyses, verify that the data are correct.

• If you have a very small dataset, you can print the entire dataset and check it.

• For continuous variables:

  • determine the number of valid observations and the number of missing values

  • check the maximum and minimum values

  • prepare a histogram of the data to get an idea of the distribution

• For categorical variables:

  • determine the number of valid observations and the number of missing values

  • obtain a frequency distribution to see if the counts in each category look reasonable (and to make sure there are no unexpected categories).
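
The verification checks listed above might be sketched as follows, using only the standard library. The data are invented, `None` marks a missing value, and the 10 kg bin width and expected sex codes are assumptions; a statistics package would normally provide these summaries directly.

```python
from collections import Counter


def summarise_continuous(values, bin_width):
    """Count valid/missing values, find min and max, and bin a text histogram."""
    valid = [v for v in values if v is not None]
    summary = {
        "n_valid": len(valid),
        "n_missing": len(values) - len(valid),
        "min": min(valid) if valid else None,
        "max": max(valid) if valid else None,
    }
    bins = Counter(int(v // bin_width) * bin_width for v in valid)
    histogram = {start: bins[start] for start in sorted(bins)}
    return summary, histogram


def summarise_categorical(values, expected):
    """Count valid/missing values and tabulate frequencies per category."""
    valid = [v for v in values if v is not None]
    freq = Counter(valid)
    return {
        "n_valid": len(valid),
        "n_missing": len(values) - len(valid),
        "frequencies": dict(freq),
        "unexpected": sorted(set(freq) - set(expected)),
    }


# Hypothetical data: milk yield (kg/day) and sex coded 1/2.
milk = [22.5, 31.0, None, 18.0, 250.0, 27.5]
sex = [1, 2, 2, None, 3, 1]

milk_summary, milk_hist = summarise_continuous(milk, bin_width=10)
sex_summary = summarise_categorical(sex, expected={1, 2})

print(milk_summary)
for start, count in milk_hist.items():
    print(f"{start:>4}-{start + 10:<4} {'#' * count}")
print(sex_summary)
```

In this made-up dataset the maximum of 250 kg/day and the unexpected sex code 3 are exactly the kinds of entries the checks are meant to surface.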