RM Unit 2
1. Type of universe
• The first step in developing any sample design is to clearly define
the set of objects, technically called the Universe, to be studied.
• The universe can be finite or infinite.
2. Sampling unit
• A decision has to be taken concerning a sampling unit before
selecting the sample.
• Sampling unit may be a geographical one such as state, district,
village, etc. or a construction unit such as house, flat, etc., or it
may be a social unit such as family, club, school, etc., or it may be
an individual.
• The researcher has to decide which one or more of such units
to select for the study.
Steps in Sample Design Cont…
3. Source list
• The source list, also known as the ‘sampling frame’, is the list
from which the sample is to be drawn.
• It should be comprehensive, correct, reliable and
appropriate.
• It is extremely important for the source list to be as
representative of the population as possible.
4. Size of sample
• This refers to the number of items to be selected from the
universe to constitute a sample.
• The size of sample should neither be excessively large, nor
too small. It should be optimum.
• An optimum sample is one which fulfills the requirements of
efficiency, representativeness, reliability, and flexibility.
Steps in Sample Design Cont…
5. Parameters of interest
• In determining the sample design, one must consider the
question of the specific population parameters which are
of interest.
6. Budgetary constraint
• Cost considerations have a major impact upon decisions relating
to both the size and the type of sample.
7. Sampling procedure
• Finally, the researcher must decide the type of sample he
will use i.e., he must decide about the technique to be
used in selecting the items for the sample.
Types of sampling
Sampling is broadly divided into two categories: Probability Sampling
and Non-Probability Sampling.
Probability Sampling
Probability sampling is also known as ‘random sampling’ or
‘chance sampling’. Its main types are: Simple Random Sampling,
Systematic Sampling, Stratified Sampling, Cluster Sampling, and
Multi-Stage Sampling.
SIMPLE RANDOM SAMPLING
In simple random sampling, every item of the universe has an equal
chance of being included in the sample.
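As a minimal sketch of simple random sampling, the draw below uses Python's standard library on a hypothetical universe of 1000 numbered households (the population and sample size are illustrative, not from the slides):

```python
import random

# Hypothetical sampling frame: household IDs 1..1000 (assumed for illustration).
random.seed(42)  # fixed seed so the draw is reproducible

universe = list(range(1, 1001))
sample = random.sample(universe, k=50)   # each item has an equal chance

print(len(sample))          # 50 items drawn
print(len(set(sample)))     # 50 -> no duplicates: sampling without replacement
```

`random.sample` draws without replacement, which matches the usual textbook definition of a simple random sample from a finite universe.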
Stratified Sampling: Optimum Allocation
Let σ1, σ2, …, σk denote the standard deviations of the k strata,
N1, N2, …, Nk the sizes of the k strata, and n1, n2, …, nk the
sample sizes of the k strata. In the context of disproportionate
sampling, ‘optimum allocation’ assigns the total sample n to the
strata in proportion to Ni·σi, which results in the following formula
for determining the sample sizes of the different strata:

ni = n · (Ni·σi) / (N1·σ1 + N2·σ2 + … + Nk·σk)
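The optimum-allocation formula can be sketched numerically. The stratum sizes, standard deviations, and total sample size below are hypothetical figures chosen for illustration, not taken from the original slides:

```python
# Hedged numerical sketch of optimum allocation (all figures assumed).
# The total sample n is allocated to stratum i in proportion to Ni * sigma_i.
n = 100                      # total sample size (assumed)
N = [400, 300, 300]          # stratum sizes N1..N3 (assumed)
sigma = [5.0, 15.0, 10.0]    # stratum standard deviations (assumed)

weights = [Ni * si for Ni, si in zip(N, sigma)]        # Ni * sigma_i per stratum
total = sum(weights)                                   # denominator of the formula
allocation = [round(n * w / total) for w in weights]   # ni for each stratum

print(allocation)  # [21, 47, 32] -> larger and more variable strata get more
```

Note how the second stratum, though smaller than the first, receives the largest sample because its variability (σ2 = 15) is highest.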
Cluster Sampling
The population is divided into subgroups (clusters) like families. A
simple random sample is taken of the subgroups and then all
members of the cluster selected are surveyed.
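The two-step procedure above can be sketched in Python. The families and member names are hypothetical placeholders:

```python
import random

# Hypothetical population of 12 families (clusters), 4 members each.
random.seed(7)
clusters = {f"family_{i}": [f"member_{i}_{j}" for j in range(4)]
            for i in range(12)}

# Step 1: take a simple random sample of the clusters themselves.
chosen = random.sample(sorted(clusters), k=3)

# Step 2: survey ALL members of each selected cluster.
surveyed = [m for c in chosen for m in clusters[c]]

print(len(surveyed))  # 12 -> 3 clusters x 4 members each
```

The contrast with stratified sampling is visible in step 2: no within-cluster selection takes place; every member of a chosen cluster is surveyed.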
Stratified Sampling Vs Cluster Sampling
In stratified sampling, some items are drawn from every stratum,
whereas in cluster sampling entire clusters are selected and all their
members are surveyed. Strata are formed to be internally homogeneous,
while clusters are ideally internally heterogeneous.
Multistage Sampling
In multistage sampling, the sample is selected in successive stages:
large primary units (e.g., states) are sampled first, then progressively
smaller units (districts, villages, households) are sampled within the
units chosen at the previous stage.
Judgment (Purposive) Sampling
In this type of sampling, items for the sample are selected
deliberately by the researcher; his choice concerning the items
remains supreme.
Quota Sampling
The population is first segmented into mutually exclusive subgroups.
Then judgment is used to select subjects or units from each segment
based on a specified proportion.
For example, an interviewer may be told to sample 200 females and 300
males between the ages of 45 and 60.
Convenience Sampling / Snowball Sampling
Measurement Scales: Nominal Scale, Ordinal Scale, Interval Scale,
Ratio Scale
Nominal Scale
• Nominal scale is simply a system of assigning number symbols to
events in order to label them.
Interval scale
• In this case, 100° > 70° or 95° < 135°, which simply means that
100° is warmer than 70° and that 95° is cooler than 135°.
• And since 95° – 70° = 135° – 110°, it makes sense to say that the
same amount of heat is required to raise the temperature of an
object from 70° to 95° as from 110° to 135°.
• On the other hand, it would not mean much if we said that 126°F is
twice as hot as 63°F, even though 126°/63° = 2.
• To see why, we have only to change to the centigrade scale, where
the first temperature becomes 5/9 (126 – 32) = 52°, the second
temperature becomes 5/9 (63 – 32) = 17°, and the first figure is
now more than three times the second.
• This difficulty arises from the fact that Fahrenheit and Centigrade
scales both have artificial origins (zeros) i.e., the number 0 of neither
scale is indicative of the absence of whatever quantity we are trying to
measure.
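The point above can be checked with a few lines of arithmetic: converting the same two temperatures to the centigrade scale changes their ratio but not (up to the 5/9 factor) their differences:

```python
# Sketch of why ratios are meaningless on an interval scale:
# both Fahrenheit and Celsius have arbitrary (artificial) zeros.
def f_to_c(f):
    return 5 / 9 * (f - 32)

ratio_f = 126 / 63                      # 2.0 on the Fahrenheit scale
ratio_c = f_to_c(126) / f_to_c(63)      # ~3.03 on the Celsius scale

print(round(ratio_f, 2), round(ratio_c, 2))  # 2.0 3.03

# Differences, by contrast, survive the conversion:
# 95 - 70 = 135 - 110 on both scales.
assert abs((f_to_c(95) - f_to_c(70)) - (f_to_c(135) - f_to_c(110))) < 1e-9
```

This is exactly the slide's argument: the ratio depends on where the zero sits, so only differences (and order) are meaningful on an interval scale.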
Ratio scale
• When in addition to setting up inequalities and forming differences we
can also form quotients (i.e., when we can perform all the customary
operations of mathematics), we refer to such data as ratio data.
• Ratio data includes all the usual measurement (or determinations) of
length, height, money amounts, weight, volume, area, pressures, etc.
Methods of Data Collection
Collection of data through schedules:
This method of data collection, in which enumerators fill in schedules through
personal contact with informants, is very useful in extensive enquiries and can
lead to fairly reliable results. It is, however, very expensive and is usually
adopted in investigations conducted by governmental agencies or by some big
organisations. Population censuses all over the world are conducted through
this method.
DIFFERENCE BETWEEN QUESTIONNAIRES
AND SCHEDULES
Both questionnaire and schedule are popularly used methods of collecting
data in research surveys. There is much resemblance in the nature of these
two methods. But from the technical point of view there is difference between
the two. The important points of difference are as under:
The questionnaire is generally sent through mail to informants to be
answered as specified in a covering letter without further assistance from
the sender. The schedule is generally filled out by the research worker or
the enumerator, who can interpret questions when necessary.
Secondary Data Collection
Before using secondary data, one must check that they possess the
following characteristics: reliability, suitability, and adequacy.
Only if the answer is yes should the researcher move further with
the data obtained.
Reliability
The reliability can be tested by finding out such things about the said data:
Who collected the data?
What were the sources of data?
Were they collected by using proper methods?
At what time were they collected?
Was there any bias of the compiler?
What level of accuracy was desired?
Was it achieved?
Suitability
The data that are suitable for one enquiry may not necessarily be found suitable in
another enquiry. The researcher must very carefully scrutinize the definition of various
terms and units of collection used at the time of collecting the data from the primary
source. Similarly, the object, scope and nature of the original enquiry must also be
studied. If the researcher finds differences in these, the data will remain unsuitable for
the present enquiry and should not be used.
Adequacy
If the level of accuracy achieved in data is found inadequate for the purpose
of the present enquiry, they will be considered as inadequate and should not
be used by the researcher.
The data will also be considered inadequate, if they are related to an area
which may be either narrower or wider than the area of the present enquiry.
Guidelines for Constructing Questionnaire /
Schedule
There are no hard-and-fast rules about how to design a questionnaire,
but there are a number of points that can be borne in mind:
1. Decide on the information required.
2. Define the target respondents.
3. Choose the method(s) of reaching your target respondents.
4. Decide on the question content.
5. Develop the question wording.
6. Put questions into a meaningful order and format.
7. Check the length of the questionnaire.
8. Pre-test the questionnaire.
9. Develop the final survey form.
Steps in Data Pre Processing
Data Pre-processing refers to the cleaning, transforming, and integrating of data
in order to make it ready for analysis. The goal of data preprocessing is to
improve the quality of the data and make it more suitable for the specific data
analysis.
Steps Involved in Data Preprocessing:
1. Data Cleaning:
The data can have many irrelevant and missing parts. To handle this part, data
cleaning is done. It involves handling of missing data, noisy data etc.
(a). Missing Data:
This situation arises when some values are missing from the dataset. It can
be handled in various ways.
Some of them are:
Ignore the tuples:
This approach is suitable only when the dataset we have is quite large and
multiple values are missing within a tuple.
Fill the Missing values:
There are various ways to do this task. You can choose to fill the missing values
manually, by attribute mean or the most probable value.
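Filling by attribute mean can be sketched as follows; the ages and the `None` markers for missing entries are hypothetical:

```python
# A minimal sketch of filling missing values with the attribute mean.
# None marks a missing entry (dataset values are hypothetical).
ages = [25, None, 30, 35, None, 40]

known = [a for a in ages if a is not None]
mean_age = sum(known) / len(known)          # mean of the observed values: 32.5

filled = [a if a is not None else mean_age for a in ages]
print(filled)  # [25, 32.5, 30, 35, 32.5, 40]
```

Mean imputation keeps the column mean unchanged, which is why it is a common default; filling with the most probable value would instead use the mode or a predictive model.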
(b). Noisy Data:
Noisy data is meaningless data that cannot be interpreted by algorithms.
It can be generated by faulty data collection, data entry errors, etc. It
can be handled in the following ways:
Binning Method:
This method works on sorted data in order to smooth it. The whole data is
divided into segments (bins) of equal size and each segment is handled
separately: one can replace all data in a segment by its mean, or use the
boundary values of the segment to complete the task.
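Smoothing by bin means can be sketched in a few lines; the nine data values are illustrative:

```python
# Sketch of smoothing by bin means: sort the data, split it into
# equal-sized bins, and replace every value in a bin with the bin mean.
data = sorted([4, 8, 9, 15, 21, 21, 24, 25, 26])  # illustrative values

bin_size = 3
bins = [data[i:i + bin_size] for i in range(0, len(data), bin_size)]

smoothed = [[sum(b) / len(b)] * len(b) for b in bins]
print(smoothed)  # [[7.0, 7.0, 7.0], [19.0, 19.0, 19.0], [25.0, 25.0, 25.0]]
```

Smoothing by bin boundaries would instead replace each value with the nearer of the bin's minimum and maximum.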
Regression:
Here data can be made smooth by fitting it to a regression function. The
regression used may be linear (having one independent variable) or
multiple (having multiple independent variables).
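A linear (one independent variable) regression smoother can be sketched with a plain least-squares fit; the noisy readings below are hypothetical:

```python
# Sketch of regression smoothing: fit y = a + b*x by least squares,
# then replace each noisy y with its fitted value on the line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical noisy readings

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
    sum((x - mx) ** 2 for x in xs)        # slope
a = my - b * mx                           # intercept

smoothed = [a + b * x for x in xs]
print([round(v, 2) for v in smoothed])  # [2.04, 4.03, 6.02, 8.01, 10.0]
```

The fitted values lie exactly on the line, so the random jitter in the original readings is smoothed away while the overall trend is preserved.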
Clustering:
This approach groups the similar data in a cluster. The outliers may fall
outside the clusters.
2. Data Transformation:
This step is taken in order to transform the data into forms
suitable for the analysis process. It involves the following:
Normalization:
It is done in order to scale the data values into a specified range,
such as -1.0 to 1.0 or 0.0 to 1.0.
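Min-max normalization into [0.0, 1.0] can be sketched as follows (the raw values are hypothetical):

```python
# Sketch of min-max normalization: map each value v to
# (v - min) / (max - min), so the result lies in [0.0, 1.0].
values = [200, 300, 400, 600, 1000]

lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

print(normalized)  # [0.0, 0.125, 0.25, 0.5, 1.0]
```

Scaling into [-1.0, 1.0] instead would use `2 * (v - lo) / (hi - lo) - 1`.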
Attribute Selection:
In this strategy, new attributes are constructed from the given set of
attributes to help the analysis.
3. Data Reduction:
Data reduction is a crucial step in the data mining process that involves
reducing the size of the dataset while preserving the important
information. This is done to improve the efficiency of data analysis and to
avoid overfitting of the model. Some common steps involved in data
reduction are:
Feature Selection: This involves selecting a subset of relevant features
from the dataset. Feature selection is often performed to remove
irrelevant or redundant features from the dataset. It can be done using
various techniques such as correlation analysis, mutual information, and
principal component analysis (PCA).
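Correlation-based feature selection, one of the techniques named above, can be sketched without any library: compute the Pearson correlation between two features and drop one of any pair whose |r| exceeds a threshold. The features (height in cm vs. inches) and the 0.9 threshold are hypothetical choices:

```python
import math

# Sketch of correlation-based feature selection (threshold and data assumed).
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

height_cm = [150, 160, 170, 180, 190]
height_in = [59.1, 63.0, 66.9, 70.9, 74.8]   # nearly the same information

r = pearson(height_cm, height_in)
keep_height_in = abs(r) < 0.9    # redundant feature -> dropped
print(round(r, 3), keep_height_in)  # 1.0 False
```

Because the two features are (near-)perfectly correlated, one carries almost no information beyond the other, so the selection rule discards it.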
• Feature Extraction: This involves transforming the data into a lower-
dimensional space while preserving the important information. Feature
extraction is often used when the original features are high dimensional
and complex. It can be done using techniques such as PCA, linear
discriminant analysis (LDA), and non-negative matrix factorization
(NMF).
• Sampling: This involves selecting a subset of data points from the
dataset. Sampling is often used to reduce the size of the dataset while
preserving the important information. It can be done using techniques
such as random sampling, stratified sampling, and systematic
sampling.
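Stratified sampling as a reduction technique can be sketched as drawing the same fraction from each stratum, so the class balance of the original dataset is preserved (the labels and sizes are hypothetical):

```python
import random

# Sketch of dataset reduction by stratified sampling (data assumed):
# 80 rows of class "A" and 20 rows of class "B", reduced by 75%.
random.seed(0)

data = [("A", i) for i in range(80)] + [("B", i) for i in range(20)]
fraction = 0.25

strata = {}
for label, row in data:
    strata.setdefault(label, []).append(row)

reduced = {label: random.sample(rows, int(len(rows) * fraction))
           for label, rows in strata.items()}

print(len(reduced["A"]), len(reduced["B"]))  # 20 5 -> the 80/20 ratio is kept
```

Plain random sampling of 25 rows could, by chance, under-represent class "B"; sampling per stratum rules that out.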
• Clustering: This involves grouping similar data points together into
clusters. Clustering is often used to reduce the size of the dataset by
replacing similar data points with a representative centroid. It can be
done using techniques such as k-means, hierarchical clustering, and
density-based clustering.
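Replacing points by centroids can be sketched with a tiny one-dimensional k-means (k = 2); the data points and initial guesses are hypothetical:

```python
# Sketch of data reduction by clustering: a 1-D k-means with k = 2 that
# reduces six points to two representative centroids (values assumed).
points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centroids = [0.0, 10.0]                 # naive initial guesses

for _ in range(10):                     # a few refinement iterations
    groups = [[], []]
    for p in points:
        nearest = min(range(2), key=lambda i: abs(p - centroids[i]))
        groups[nearest].append(p)       # assign each point to nearest centroid
    centroids = [sum(g) / len(g) if g else c
                 for g, c in zip(groups, centroids)]  # recompute centroids

print([round(c, 2) for c in centroids])  # [1.0, 8.0]
```

After convergence, the six original points can be stored as two centroids (plus counts), which is exactly the size reduction the bullet describes.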