DATA COLLECTION

Data collection is the process of gathering and measuring information on
variables of interest, in an established systematic fashion that enables one
to answer stated research questions, test hypotheses, and evaluate
outcomes.
• Primary data is data that is collected by a researcher from first-hand
sources, using methods like surveys, interviews, or experiments. It is
collected with the research project in mind, directly from primary sources.
• The term is used in contrast with the term secondary data. Secondary
data is data gathered from studies, surveys, or experiments that have
been run by other people or for other research.
PRIMARY DATA COLLECTION METHODS
1. Direct personal interviews/observation
• The data is collected by the investigator personally.
• The investigator must be a keen observer, tactful and courteous in behaviour.
• E.g. to study the living conditions of the people in a village, the
investigator goes to the village and gets the needed information.
Merits
• Response is more encouraging: people are willing to answer.
• Information is more accurate.
• It is possible to obtain the respondent's personal characteristics and
environment.
• The interviewer can adjust suitably depending upon the respondent's
reaction.

The interviewer should adopt the language of communication according to
the respondent's education level.
Demerits
• Costly when the number of respondents is large.
• Chance of personal prejudice and bias is greater.
• The interviewer must be properly trained; otherwise the entire work will be
affected.
• Time required is more.

This method is suitable for intensive rather than extensive surveys.


2. Indirect oral investigation
• This method is adopted when the respondent is reluctant to supply
information.
• The investigator contacts a third party to find out about the
respondents.
• E.g. a police enquiry.
Merits
• It is simple and convenient
• Saves time, money and labour
• Can be used in the investigation of a large area
• Information is unbiased
• Adequate information is obtained
• True information, as it comes from different people


Demerits
• The information cannot be fully relied upon, due to the absence of direct
contact.
• To get the real position, a sufficient number of persons have to be
interviewed.
• A careless attitude of the informant affects the accuracy.
• Witnesses may change the details.


3. Information from correspondents
• The investigator appoints local agents in different places to collect
information.
• They collect the information and send it to the central office.
• E.g. newspaper agencies reporting accidents, strikes, etc.


Merits
• Extensive information can be collected
• Cheapest and most economical method
• Speedy information is possible
• Useful where information is needed regularly.
Demerits

• Information may be biased

• Degree of accuracy cannot be maintained

• Uniformity cannot be maintained

• Data may not be original.


4. Mailed questionnaire method
• A list of questions is prepared and sent to various respondents by
post.
• The questionnaire contains the questions and space for the answers.
• Respondents are requested to fill it in and send it back within a specified time.
Merits
• Most economical / less costly
• Useful when the area of investigation is large
• Saves money, labour, and time.
• Error is less, as the information is collected directly from the
respondents.


Demerits
• Can be adopted only for literate respondents
• Less accuracy
• Uncertainty about the responses, which can lead to incorrect answers.


5. Schedules sent through enumerators
• Enumerators are selected, trained and provided with a standardized
questionnaire.
• They contact the respondents, get replies to the questions and fill them in
in their own handwriting.
• The information is honest and original.


Merits
• Can be adopted even when respondents are illiterate
• Lower non-response rate, because the contact is personal
• Accuracy of statements is maintained

Demerits
• Costly, since enumerators are paid persons
• Time taken is longer
• Success mainly depends on the training given to enumerators.


STATISTICAL ENQUIRY
A STATISTICAL ENQUIRY IS A
SEARCH FOR KNOWLEDGE
THROUGH STATISTICAL DEVICES.
IT INVOLVES GATHERING OF
DATA AND ANALYSIS OF DATA TO
COME TO A CONCLUSION.
STATISTICAL SURVEY

I.PLANNING A SURVEY

II.EXECUTION OF A SURVEY
Statistical Enquiry/Investigation
• A statistical enquiry is a search for knowledge.
• A decision can be taken on the basis of a statistical enquiry.
• General-purpose enquiry, e.g. population census
• Special-purpose enquiry: to analyse a specific
problem

Stages of conducting a survey:
1. Planning the survey
2. Executing the survey
PLANNING THE SURVEY

1. Specification of purpose: the objective
2. Scope of enquiry: the limits of the enquiry
3. Statistical units: unit of collection, units of analysis and interpretation
4. Sources of data
5. Techniques of data collection: census or sample
6. Frame: the list of units
7. Desired degree of accuracy
8. Miscellaneous considerations
EXECUTING THE SURVEY

1. Setting up of administrative team

2. Design of forms and questionnaire

3. Selection and training of field investigators

4. Supervision of field work

5. Follow up of non-response

6. Processing and analysing of data

7. Preparation of Report
1.1. PURPOSE OF ENQUIRY
• THE OBJECTIVE MUST BE CLEARLY SET: THE TYPE OF INFORMATION AND ITS USES.
• E.G. TO COLLECT INFORMATION RELATING TO A PROBLEM OR TO TEST A HYPOTHESIS.
• A CLEAR OBJECTIVE AVOIDS CONFUSION AND WASTAGE OF RESOURCES.
• THE INFORMATION MAY BE OF USE TO GOVERNMENT DEPARTMENTS.
1.2. SCOPE OF ENQUIRY
• COVERAGE: THE GEOGRAPHICAL AREA TO BE COVERED, THE TYPE OF INFORMATION
AND THE SUBJECT MATTER.
• DEPENDS ON THE AVAILABILITY OF TIME AND RESOURCES AND ON THE OBJECT OF
ENQUIRY. NO DELAY.
• LARGER COVERAGE LEADS TO MORE REPRESENTATION.
• SCOPE FIXES THE LIMITS OF ENQUIRY: INCLUSION OR OMISSION.
1.3. UNIT OF DATA COLLECTION
• STATISTICAL UNITS: THE UNITS IN TERMS OF WHICH THE INVESTIGATOR COUNTS
THE VARIABLES SELECTED FOR ANALYSIS AND INTERPRETATION.
• E.G. A STUDY ON THE SIZE OF SUGAR MILLS.
• THE UNIT MUST SUIT THE PURPOSE AND BE CLEAR, SPECIFIC AND UNIFORM
THROUGHOUT THE STUDY.
UNITS OF ANALYSIS AND INTERPRETATION
• UNITS WHICH HELP IN COMPARISON
• RATES, PERCENTAGES, RATIOS ETC.

1.4.SOURCES OF DATA
DEPENDS ON OBJECT AND SCOPE.
PRIMARY AND SECONDARY DATA.
1.5. TECHNIQUES OF DATA COLLECTION
• CENSUS AND SAMPLE TECHNIQUES
• CENSUS: COMPLETE ENUMERATION OF ALL UNITS OF THE UNIVERSE
• SAMPLE: STUDY OF A PART OF THE UNIVERSE
• THE CHOICE DEPENDS ON COST, TIME, RESOURCES AND SCOPE OF THE PROBLEM
• A CENSUS IS MORE TIME CONSUMING AND EXPENSIVE
1.6.THE FRAME
• A LIST OR MAP OF THE UNITS
• PLANNING DEPENDS ON ACCURACY AND NATURE OF THE FRAME
• NEEDS DETAILED FIELD WORK; OTHERWISE IT MAY BE INACCURATE, INCOMPLETE
OR INADEQUATE.
• WHOLE STRUCTURE OF ENQUIRY DETERMINED BY THE FRAME.
1.7. DEGREE OF ACCURACY
• DEPENDS ON THE OBJECT OF ENQUIRY, E.G. RICE VERSUS GOLD
• PERFECT ACCURACY IS NOT POSSIBLE: BIAS, IMPERFECT TOOLS OF MEASUREMENT,
STATISTICS BASED ON ESTIMATES
• CLERICAL ERRORS ARE TO BE REDUCED
• AN APPROXIMATE RESULT IS WHAT THE INVESTIGATOR DESIRES
1.8.OTHER FACTORS
• TO BE CONSIDERED
• OFFICIAL OR NON OFFICIAL
• CONFIDENTIAL OR NON CONFIDENTIAL
• REGULAR OR ADHOC
• INITIAL OR REPETITIVE
• DIRECT OR INDIRECT
II.EXECUTING THE SURVEY
• SETTING UP ADMINISTRATIVE ORGANISATION: DEPENDS ON NATURE AND SCOPE.
CENTRAL OR REGIONAL
• DESIGN OF FORMS: FRAMING OF QUESTIONNAIRE AND OTHER SCHEDULES
• SELECTION, TRAINING AND SUPERVISION OF FIELD INVESTIGATORS: EXISTING
STAFF OR SPECIALLY APPOINTED, VOLUNTARY OR HONORARIUM
Execution of survey

• PRELIMINARY TESTS
• TRAINING AND FIELD SUPERVISION
• FOLLOW UP OF NON RESPONSE
• CONTROL OVER THE ACCURACY OF THE FIELD WORK: FIELD CHECKS AT
RANDOM TO CHECK THE PROGRESS OF WORK
• ANALYSIS AND REPORTING.
Sample and Population
CENSUS AND SAMPLE SURVEY

• All items in any field of inquiry constitute a “Universe” or
“Population”.
• A complete enumeration of all items in the ‘population’ is known as a
census inquiry.
• This type of inquiry involves a great deal of time, money and energy.
• In practice, governments are usually the only institutions that use
this method of complete enumeration.
• Under the sampling method, information is collected about only a part of
the population (called a sample), and on the basis of this information
conclusions are drawn for the whole population.
• The selected respondents constitute what is technically called a ‘sample’,
and the selection process is called the ‘sampling technique’.
• The survey so conducted is known as a ‘sample survey’.
• Refer sample questionnaire 1.28 in stat book
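The difference between a census inquiry and a sample survey can be sketched in a few lines of Python. This is only an illustration: the population of household sizes is made up, and the sample size of 50 is an arbitrary assumption, not a recommendation.

```python
import random
import statistics

# Hypothetical population: household sizes for 1,000 households (made-up data).
random.seed(42)
population = [random.randint(1, 8) for _ in range(1000)]

# Census inquiry: every unit of the population is enumerated.
census_mean = statistics.mean(population)

# Sample survey: only a part of the population is studied,
# and the result is used as an estimate for the whole population.
sample = random.sample(population, k=50)
sample_mean = statistics.mean(sample)

print(f"census mean = {census_mean:.2f} (from {len(population)} units)")
print(f"sample mean = {sample_mean:.2f} (from {len(sample)} units)")
```

The sample mean will usually be close to, but not exactly equal to, the census mean; that gap is the price paid for the saving in time, money and energy.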
DATA
• Data is the base for all operations in Statistics
• Data is a collection of facts, such as numbers, words, measurements,
observations or even just descriptions of things.
Example :
The data shown below are marks scored by a person in Five Math
tests.
45, 23, 67, 82, 71

This data can be used to compare the scores and track the person's progress.
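As a small illustration of working with the five marks above, the following Python sketch computes a few summary values. The choice of summaries (average, lowest, highest, test-to-test change) is an assumption about what comparing the scores might involve.

```python
import statistics

marks = [45, 23, 67, 82, 71]  # the five test scores from the example

average = statistics.mean(marks)          # 288 / 5 = 57.6
lowest, highest = min(marks), max(marks)  # 23 and 82

# Change from each test to the next, as a simple view of progress.
changes = [later - earlier for earlier, later in zip(marks, marks[1:])]

print(f"average = {average}, lowest = {lowest}, highest = {highest}")
print(f"test-to-test changes = {changes}")  # [-22, 44, 15, -11]
```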
• Qualitative data is descriptive information
(it describes something)
• Quantitative data is numerical information (numbers)
What do we know about the Dog?
• Qualitative:
- He is brown and black
- He has long hair
- He has lots of energy
• Quantitative:
• Discrete:
• He has 4 legs
• He has 2 brothers
• Continuous:
• He weighs 25.5 kg
• He is 565 mm tall
DATA TYPES
I. Categorical Data (Nominal, Ordinal)
II. Numerical Data (Discrete, Continuous, Interval, Ratio)
• Categorical data represents characteristics. It can therefore represent
things like a person's gender, language etc. Categorical data can also
take on numerical values (example: 1 for female and 0 for male), but
these numbers are only codes and have no mathematical meaning.
NOMINAL DATA
Nominal values represent discrete units and are used to label variables
that have no quantitative value. Just think of them as “labels”. Note that
nominal data has no order; therefore, if you change the order of its
values, the meaning does not change.
ORDINAL DATA
Ordinal values represent discrete (counted) and ordered units. Ordinal data
is therefore nearly the same as nominal data, except that its ordering
matters.
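The contrast between nominal and ordinal data can be sketched in Python. The survey answers and the three satisfaction levels below are made-up values used only for illustration.

```python
# Hypothetical survey answers (made-up values for illustration).
languages = ["Tamil", "Hindi", "English", "Tamil"]   # nominal: labels only
satisfaction = ["high", "low", "medium", "high"]     # ordinal: labels with an order

# For ordinal data the order is part of the meaning, so it is stated explicitly.
LEVELS = ["low", "medium", "high"]
rank = {level: position for position, level in enumerate(LEVELS)}

# Sorting by rank makes sense for ordinal data...
ordered = sorted(satisfaction, key=rank.get)
upper_middle = ordered[len(ordered) // 2]  # upper middle value of the sorted answers

# ...while nominal data supports only counting per label.
counts = {label: languages.count(label) for label in set(languages)}

print(ordered)        # ['low', 'medium', 'high', 'high']
print(upper_middle)   # high
print(counts)
```

Reordering `LEVELS` would change the result for the ordinal variable, whereas the nominal counts do not depend on any ordering of the labels.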
Interval & Ratio
‘Interval’ indicates ‘distance between two entities’, which is what an
interval scale helps in achieving. Likert's scale, Net Promoter Score,
Semantic Differential Scale and Bipolar Matrix Table are the most-used
interval scale examples.

A ratio scale is a variable measurement scale that not only gives the order
of the values but also makes the difference between values known. It
assumes that the variable has a true zero point.
II. Numerical data
Discrete data can only take on certain values; it can't be measured, but it can be counted. It
basically represents information that can be categorized into a classification.

Continuous data is quantitative data that can be measured. It has an infinite number of
possible values within a selected range. Continuous data can be measured and broken down
into smaller parts and still have meaning.

Example:
Money, temperature, volume and time
PARAMETRIC AND NON PARAMETRIC TEST
• Parametric statistics are based on assumptions about the distribution of the
population from which the sample was taken. Nonparametric statistics do not
rely on such distributional assumptions; that is, the data can come from a
sample that does not follow a specific distribution.
• Common parametric statistics are, for example, the Student's t-tests. Common
nonparametric statistics are, for example, the Mann-Whitney-Wilcoxon
(MWW) test or the Wilcoxon signed-rank test.
• Background of parametric and nonparametric statistics
In parametric statistics, the form of the population distribution is assumed to
be known and is described by a fixed set of parameters. In nonparametric
statistics, the distribution of the population is unknown and the parameters
are not fixed, which makes it necessary to test the hypothesis for the
population.
• Usage of parametric and nonparametric statistics
To decide whether to use parametric or nonparametric statistics, you should
consider several criteria about the sample data and the assumptions, and
carefully evaluate the validity of those assumptions.
Parametric tests
• T-test
- An independent samples t-test compares the means for two groups.
- A paired sample t-test compares means from the same group at
different times (say, one year apart).
- A one sample t-test tests the mean of a single group against a known
mean.
• Correlation
- Identifies whether two or more variables are significantly related to
each other.
• Regression
- A technique for determining the statistical relationship between two or
more variables, where a change in a dependent variable is associated
with, and depends on, a change in one or more independent variables.
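As a rough sketch of the independent samples t-test listed above, the t statistic can be computed by hand in Python. The function name `independent_t` and the two groups of scores are hypothetical, and the pooled-variance (Student's) form is assumed, which requires roughly equal variances in the two groups.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Student's t statistic for two independent groups (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 in the denominator)
    var_b = statistics.variance(group_b)

    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

    t = (mean_a - mean_b) / std_error
    degrees_of_freedom = n_a + n_b - 2
    return t, degrees_of_freedom

# Made-up test scores for two independent classes.
class_a = [62, 70, 68, 75, 65]
class_b = [58, 64, 60, 66, 57]

t, df = independent_t(class_a, class_b)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The resulting t value is then compared with the critical value of the t distribution for the same degrees of freedom to decide whether the difference in means is significant.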
Parametric tests (contd.)

ANOVA test
• The only difference between one-way and two-way ANOVA is the number of
independent variables. A one-way ANOVA has one independent variable, while a two-way
ANOVA has two.
• One-way ANOVA: Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka)
and race finish times in a marathon.
• Two-way ANOVA: Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka),
runner age group (junior, senior, master's), and race finish times in a marathon.
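A minimal sketch of the one-way ANOVA calculation in Python follows. The function name `one_way_anova_f` is hypothetical, and the finish times are made-up numbers for three unnamed brands; they are not real data about any shoe brand.

```python
import statistics

def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA over a list of groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((value - statistics.mean(g)) ** 2 for g in groups for value in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)

# Made-up marathon finish times in minutes, one list per (unnamed) shoe brand.
brand_1 = [210, 215, 220]
brand_2 = [225, 230, 235]
brand_3 = [240, 245, 250]

f_value, dof = one_way_anova_f([brand_1, brand_2, brand_3])
print(f"F = {f_value:.1f} with degrees of freedom {dof}")  # F = 27.0 with (2, 6)
```

A large F value means the group means differ by more than the spread within the groups would explain; the final decision comes from comparing F with the critical value of the F distribution.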
Non Parametric Test
Mann – Whitney Rank Sum ‘U’ test
The Mann-Whitney U test is used to compare whether there is a difference in the
dependent variable for two independent groups.
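The U statistic itself is based on ranks rather than on the raw values. The sketch below shows one way to compute it in Python; the function name `mann_whitney_u` and the two samples are hypothetical.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return the U statistics (U_a, U_b) for two independent samples."""
    pooled = sorted(sample_a + sample_b)

    # Average rank for each distinct value, so tied values share a rank.
    rank_of = {}
    for value in set(pooled):
        first = pooled.index(value) + 1   # 1-based rank of the first occurrence
        count = pooled.count(value)
        rank_of[value] = first + (count - 1) / 2

    rank_sum_a = sum(rank_of[value] for value in sample_a)
    n_a, n_b = len(sample_a), len(sample_b)

    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b

# Made-up measurements for two independent groups.
group_a = [12, 15, 11, 18]
group_b = [22, 19, 25, 17]

u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b)  # 1.0 15.0
```

The smaller of the two U values is compared with a table of critical values (or converted to a p-value) to judge whether the two groups differ.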

The Wilcoxon signed-rank test is used to compare two related samples or matched samples, or
to conduct a paired difference test of repeated measurements on a single sample, to
assess whether their population mean ranks differ.

The Kruskal-Wallis test is a nonparametric alternative to the one-way ANOVA test. It
will tell you if there is a significant difference between groups.

The chi-square test is designed to test for a statistically significant relationship between
nominal or ordinal variables organized in a bivariate table.
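A minimal sketch of the chi-square calculation for a bivariate table is given below. The function name `chi_square` and the 2 x 2 table of counts are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Made-up counts: rows are two groups of respondents, columns are two answers.
observed_counts = [
    [30, 20],
    [20, 30],
]

chi2, dof = chi_square(observed_counts)
print(f"chi-square = {chi2:.1f} with {dof} degree(s) of freedom")  # 4.0 with 1
```

Here every expected count is 25, so each cell contributes (5 × 5) / 25 = 1 and the statistic is 4.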
