DATA COLLECTION
• Data collection is the process of gathering and measuring information on
variables of interest, in an established systematic fashion that enables one
to answer stated research questions, test hypotheses, and evaluate
outcomes.
• Primary data is data that is collected by a researcher from first-hand
sources, using methods like surveys, interviews, or experiments. It is
collected with the research project in mind, directly from primary sources.
• The term is used in contrast with the term secondary data. Secondary
data is data gathered from studies, surveys, or experiments that have
been run by other people or for other research.
PRIMARY DATA COLLECTION METHOD
1. Direct Personal interviews/observation
• The data is collected by the investigator personally.
• Information is unbiased
• Less accuracy
I.PLANNING A SURVEY
II.EXECUTION OF A SURVEY
Statistical Enquiry/Investigation
• Statistical enquiry – search for knowledge
• A statistical enquiry provides the basis on which a decision can be taken
• General purpose enquiry e.g. population census
• Special purpose enquiry – to analyse a specific
problem
Stages of conducting a survey
I. PLANNING THE SURVEY
4. Sources of data
6. Frame (list)
8. Miscellaneous considerations
II. EXECUTING THE SURVEY
5. Follow up of non-response
7. Preparation of report
1.1.PURPOSE OF ENQUIRY
OBJECTIVE TO BE CLEARLY SET. TYPE OF
INFORMATION AND ITS USES, E.G. TO
COLLECT INFORMATION RELATING TO A
PROBLEM OR TO TEST A HYPOTHESIS.
AVOIDS CONFUSION AND WASTAGE OF
RESOURCES.
INFORMATION MAY BE OF USE TO GOVT
DEPARTMENTS.
1.2.SCOPE OF ENQUIRY
COVERAGE REGARDING THE GEOGRAPHICAL
AREA TO BE COVERED, TYPE OF INFORMATION
AND SUBJECT MATTER. EG
1.4.SOURCES OF DATA
DEPENDS ON OBJECT AND SCOPE.
PRIMARY AND SECONDARY DATA.
1.5.TECHNIQUES OF DATA
COLLECTION
• CENSUS AND SAMPLE TECHNIQUE
• COMPLETE ENUMERATION OF ALL UNITS OF THE UNIVERSE
• STUDY OF PART OF UNIVERSE
• TYPE DEPENDS ON COST, TIME, RESOURCES AND SCOPE OF PROBLEM
• CENSUS MORE TIME CONSUMING AND EXPENSIVE
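The difference between the two techniques can be sketched in a few lines of Python. This is only an illustration: the universe of 1,000 household sizes is made up, and the sample size of 50 is an arbitrary assumption.

```python
import random
import statistics

# Hypothetical universe: household sizes for 1,000 units (made-up data).
rng = random.Random(42)
universe = [rng.randint(1, 8) for _ in range(1000)]

# Census: complete enumeration of all units of the universe.
census_mean = statistics.mean(universe)

# Sample: study only a part of the universe (here 50 units drawn at random).
sample = rng.sample(universe, 50)
sample_mean = statistics.mean(sample)

print(f"census mean = {census_mean:.2f} ({len(universe)} units visited)")
print(f"sample mean = {sample_mean:.2f} ({len(sample)} units visited)")
```

The sample gives an estimate of the census figure while visiting far fewer units, which is where the saving in cost and time comes from.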
1.6.THE FRAME
• A LIST OR MAP OF THE UNITS
• PLANNING DEPENDS ON ACCURACY AND NATURE OF THE FRAME
• NEEDS DETAILED FIELD WORK OR MAY BE INACCURATE, INCOMPLETE
OR INADEQUATE.
• WHOLE STRUCTURE OF ENQUIRY DETERMINED BY THE FRAME.
1.7.DEGREE OF ACCURACY
• DEPENDS ON OBJECT OF ENQUIRY – E.G. WEIGHING RICE VS GOLD
• PERFECT ACCURACY NOT POSSIBLE – BIAS, IMPERFECT TOOLS OF MEASUREMENT,
STATISTICS BASED ON ESTIMATES
• CLERICAL ERROR TO BE REDUCED
• APPROXIMATE RESULT DESIRED BY INVESTIGATOR
1.8.OTHER FACTORS
• TO BE CONSIDERED
• OFFICIAL OR NON OFFICIAL
• CONFIDENTIAL OR NON CONFIDENTIAL
• REGULAR OR ADHOC
• INITIAL OR REPETITIVE
• DIRECT OR INDIRECT
II.EXECUTING THE SURVEY
• SETTING UP ADMINISTRATIVE ORGANISATION –
DEPENDS ON NATURE AND SCOPE. CENTRAL OR REGIONAL
• DESIGN OF FORMS – FRAMING OF QUESTIONNAIRE AND OTHER
SCHEDULES
• SELECTION, TRAINING AND SUPERVISION OF FIELD INVESTIGATORS –
EXISTING STAFF OR SPECIALLY APPOINTED, VOLUNTARY OR
HONORARIUM
Execution of survey
• PRELIMINARY TESTS
• TRAINING AND FIELD SUPERVISION
• FOLLOW UP OF NON RESPONSE
• CONTROL OVER THE ACCURACY OF THE FIELD WORK – FIELD CHECK AT
RANDOM TO CHECK THE PROGRESS OF WORK
• ANALYSIS AND REPORTING.
Sample and Population
CENSUS AND SAMPLE SURVEY
DATA
• Data is the base for all operations in Statistics
• Data is a collection of facts, such as numbers, words, measurements,
observations or even just descriptions of things.
Example :
The data shown below are marks scored by a person in Five Math
tests.
45, 23, 67, 82, 71
This data can be used to compare the scores and track progress.
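As a small illustration, the five marks above can be compared in Python. The choice of "change from one test to the next" as the measure of progress is an assumption made for this sketch.

```python
marks = [45, 23, 67, 82, 71]  # marks from the five Math tests above

# Change from each test to the next: a positive value means improvement.
changes = [later - earlier for earlier, later in zip(marks, marks[1:])]
average = sum(marks) / len(marks)

print(changes)  # [-22, 44, 15, -11]
print(average)  # 57.6
```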
• Qualitative data is descriptive information
(it describes something)
• Quantitative data is numerical information (numbers)
What do we know about the Dog?
DATA…
• Qualitative:
- He is brown and black
- He has long hair
- He has lots of energy
• Quantitative:
• Discrete:
• He has 4 legs
• He has 2 brothers
• Continuous:
• He weighs 25.5 kg
• He is 565 mm tall
DATA TYPES
I. Categorical Data (Nominal, Ordinal)
II. Numerical Data (Discrete, Continuous, Interval, Ratio)
• Categorical Data represents characteristics. Therefore it can represent
things like a person’s gender, language etc. Categorical data can also
take on numerical values (Example: 1 for female and 0 for male), but
these numbers are only labels and have no mathematical meaning.
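A short Python sketch of why such codes are labels rather than quantities. The list of responses is hypothetical; only the 1/0 coding comes from the example above.

```python
from collections import Counter

# Coding from the example above: 1 for female, 0 for male.
labels = {1: "female", 0: "male"}
responses = [1, 0, 0, 1, 1, 1]  # hypothetical survey answers

# Counting how often each category occurs is meaningful.
counts = Counter(labels[code] for code in responses)
print(counts["female"], counts["male"])  # 4 2

# Arithmetic on the codes is not: swapping the arbitrary coding
# (1 for male, 0 for female) changes the result for the same answers.
recoded = [1 - code for code in responses]
print(sum(responses), sum(recoded))  # 4 2
```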
NOMINAL DATA
Nominal values represent discrete units and are used to label variables
that have no quantitative value. Just think of them as “labels”. Note that
nominal data has no order. Therefore, if you changed the order of its
values, the meaning would not change.
ORDINAL DATA
Ordinal values represent discrete (counted) and ordered units. Ordinal data
is therefore nearly the same as nominal data, except that its ordering
matters.
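The role of ordering can be sketched in Python. The three-level scale and the responses are hypothetical, chosen only to show that the order has to be stated explicitly.

```python
# Hypothetical ordinal scale: the order of the levels carries meaning.
levels = ["low", "medium", "high"]
rank = {level: position for position, level in enumerate(levels)}

responses = ["high", "low", "medium", "low", "high"]  # made-up answers

# Sorting by rank respects the scale; plain alphabetical sorting does not.
by_rank = sorted(responses, key=rank.__getitem__)
alphabetical = sorted(responses)

print(by_rank)       # ['low', 'low', 'medium', 'high', 'high']
print(alphabetical)  # ['high', 'high', 'low', 'low', 'medium']
```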
Interval & Ratio
‘Interval’ indicates ‘distance Ratio is defined as a variable
between two entities’, which is measurement scale that not only
what Interval scale helps in produces the order of variables but
achieving. also makes the difference between
variables. It is calculated by
Likert's scale, Net Promoter assuming that the variables have an
Score, Semantic Differential option for zero.
Scale, Bipolar Matrix Table are
the most-used interval scale
examples.
ii.Numerical data
Discrete data can take only certain values; it cannot be measured, but it can be counted. It
basically represents information that can be categorized into a classification.
Continuous data, by contrast, is measured rather than counted.
Example:
Money, temperature, volume and time
PARAMETRIC AND NON PARAMETRIC TEST
• Parametric statistics are based on assumptions about the distribution of the
population from which the sample was taken. Nonparametric statistics are not
based on such assumptions, that is, the data can be collected from a sample that
does not follow a specific distribution.
• Common parametric statistics are, for example, the Student's t-tests. Common
nonparametric statistics are, for example, the Mann-Whitney-Wilcoxon
(MWW) test or the Wilcoxon test.
• Background of parametric and nonparametric statistics
In parametric statistics, the information about the distribution of the population
is known and is based on a fixed set of parameters. In nonparametric statistics,
the information about the distribution of a population is unknown, and the
parameters are not fixed, which makes it necessary to test the hypothesis for the
population.
• Usage of parametric and nonparametric statistics
To decide whether to use parametric or nonparametric statistics, you should
consider several criteria about the sample data and the assumptions, and
carefully evaluate the validity of those assumptions.
Parametric tests
• T-test
An Independent Samples t-test compares the means for two groups.
A Paired sample t-test compares means from the same group at
different times (say, one year apart).
A One sample t-test tests the mean of a single group against a known
mean.
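The three variants can be sketched with Python's standard library. This is a minimal illustration that computes only the t statistic (no p-value); the function names and the scores are hypothetical, and the independent-samples version assumes equal variances in the two groups.

```python
import math
import statistics

def one_sample_t(sample, known_mean):
    """t statistic for a single group against a known mean."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - known_mean) / standard_error

def paired_t(before, after):
    """t statistic for the same group measured at two different times."""
    differences = [a - b for b, a in zip(before, after)]
    return one_sample_t(differences, 0)

def independent_t(group_a, group_b):
    """Student's t statistic for two groups, assuming equal variances."""
    na, nb = len(group_a), len(group_b)
    pooled_variance = ((na - 1) * statistics.variance(group_a)
                       + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled_variance * (1 / na + 1 / nb))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error

# Hypothetical scores, made up for illustration.
year_1 = [10, 12, 14, 16, 18]
year_2 = [12, 15, 15, 19, 20]
print(one_sample_t(year_1, 12))       # single group vs a known mean of 12
print(paired_t(year_1, year_2))       # same group, one year apart
print(independent_t(year_1, year_2))  # treating the lists as two groups
```

In practice the statistic is compared with the t distribution for the appropriate degrees of freedom to decide significance; that step is left out here.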
• Correlation
- identify whether two or more variables are significantly related to
each other
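As a sketch, the strength of a linear relationship between two variables can be measured with Pearson's correlation coefficient. The function name and the data are hypothetical, and the code computes only the coefficient; judging whether it is significant requires a further test that is not shown.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(variation_x * variation_y)

# Hypothetical data: hours studied and test scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
print(pearson_r(hours, scores))  # close to +1: strong positive relationship
```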
• Regression
A technique for determining the statistical relationship between two or
more variables where a change in a dependent variable is associated
with, and depends on, a change in one or more independent variables
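For the simplest case, one independent variable and a straight-line relationship, the fit can be sketched as ordinary least squares. The function name and the data are assumptions made for illustration.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4, 5]   # independent variable
y = [3, 5, 7, 9, 11]  # dependent variable
slope, intercept = fit_line(x, y)
print(slope, intercept)  # 2.0 1.0
```

The slope states how much the dependent variable changes for a one-unit change in the independent variable.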
Parametric tests (contd.)
ANOVA test
• The only difference between one-way and two-way ANOVA is the number of
independent variables. A one-way ANOVA has one independent variable, while a two-way
ANOVA has two.
• One-way ANOVA: Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka)
and race finish times in a marathon.
• Two-way ANOVA: Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka),
runner age group (junior, senior, master’s), and race finishing times in a marathon.
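The one-way case can be sketched as follows. The finish times are invented for illustration and the function name is hypothetical; the code computes only the F statistic, which would then be compared with the F distribution to judge significance.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(group) / len(group) for group in groups]

    # Variation between the group means and the grand mean.
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((value - mean) ** 2
                    for group, mean in zip(groups, means) for value in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical marathon finish times in minutes, grouped by shoe brand.
finish_times = {
    "Nike": [240, 250, 245],
    "Adidas": [255, 260, 250],
    "Saucony": [245, 255, 250],
    "Hoka": [235, 240, 245],
}
print(one_way_anova_f(list(finish_times.values())))
```

A two-way ANOVA would additionally split the variation by the second independent variable (age group) and by the interaction between the two.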
Non Parametric Test
Mann – Whitney Rank Sum ‘U’ test
The Mann-Whitney U test is used to compare whether there is a difference in the
dependent variable for two independent groups.
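A minimal sketch of the U statistic, counted pair by pair. The function name and the two groups are hypothetical; ties are counted as one half, and the smaller of the two U values is returned, which is one common convention.

```python
def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U statistic for two independent groups."""
    # For every pair (a, b): 1 if a ranks above b, 0.5 for a tie, else 0.
    u_a = sum(1.0 if a > b else 0.5 if a == b else 0.0
              for a in group_a for b in group_b)
    u_b = len(group_a) * len(group_b) - u_a
    return min(u_a, u_b)

# Hypothetical scores for two independent groups.
group_a = [1, 4, 5]
group_b = [2, 3, 6]
print(mann_whitney_u(group_a, group_b))  # 4.0
```

A U of 0 means the two groups do not overlap at all; values near half the number of pairs mean the groups are thoroughly mixed.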
The Wilcoxon signed-rank test is used to compare two related samples, matched
samples, or to conduct a paired difference test of repeated measurements on a
single sample to assess whether their population mean ranks differ.
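The signed-rank statistic can be sketched as below. The function name and the paired measurements are hypothetical; zero differences are dropped and tied absolute differences share their average rank, which is a common convention but an assumption here.

```python
def wilcoxon_w(before, after):
    """Wilcoxon signed-rank statistic: the smaller of the two rank sums."""
    # Paired differences, dropping pairs that did not change.
    differences = [a - b for b, a in zip(before, after) if a != b]
    ordered = sorted(differences, key=abs)

    # Rank by absolute size; tied values share their average rank.
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return min(w_plus, w_minus)

# Hypothetical repeated measurements on the same four subjects.
before = [10, 12, 14, 16]
after = [11, 14, 13, 20]
print(wilcoxon_w(before, after))  # 1.5
```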
The chi-square test is designed to test for a statistically significant relationship
between nominal or ordinal variables organized in a bivariate table.
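A sketch of the chi-square statistic for a bivariate table, computed from observed and expected counts. The function name and the counts are hypothetical; the resulting statistic would be compared with the chi-square distribution for the given degrees of freedom.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * column_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom

# Hypothetical counts: rows are two groups, columns are two answers.
observed = [[10, 20],
            [20, 10]]
statistic, degrees_of_freedom = chi_square(observed)
print(statistic, degrees_of_freedom)
```

When the rows are exactly proportional to each other, observed and expected counts coincide and the statistic is 0.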