Statistical Techniques in Scientific Research: Statistics
STATISTICS
Descriptive Statistics refers to the analysis and synthesis of data so that a better description of the situation can be made, thereby promoting a better understanding of the facts.
Inductive Statistics is concerned with procedures in which the values of a group are estimated by examining a small portion of that group. The group is known as the ‘population’ or ‘universe’ and the portion is known as the ‘sample’. The values of interest in the sample are known as statistics and the corresponding values in the population are known as parameters. Thus, inductive statistics is concerned with estimating universe parameters from sample statistics. Inductive statistics is also known as ‘Inferential Statistics’ or ‘Sampling Statistics’.
Deductive Statistics is concerned with the establishment of rules and procedures for choosing one course of action from among alternatives under conditions of uncertainty. It uses probability theory and provides a rational basis for dealing with situations influenced by chance-related factors.
Research comprises
Purpose of Research:
1. The purpose of research is to discover answers to questions through the application of scientific procedures. It is to find out the truth which is hidden and which has not been discovered yet.
2. To gain familiarity with a phenomenon or to achieve new insights into it. – Exploratory or formulative research
3. To portray accurately the characteristics of a particular individual, situation or group. – Descriptive research
4. To determine the frequency with which something occurs or with which it is associated with something else. – Diagnostic research
5. To test a hypothesis of a causal relationship between variables. – Hypothesis-testing research
Formulate Hypothesis
(a) The sampling design which deals with the method of selecting items to be observed for the given study;
(b) The observational design which relates to the conditions under which observations are to be made;
(c) The statistical design, which concerns the question of how many items are to be observed and how the information and data gathered are to be analyzed;
(d) The operational design, which deals with the techniques by which the procedures specified in the sampling, statistical and observational designs can be carried out.
We focus our attention on the employment of statistics in data analysis and sampling part of any research.
A measure of central tendency can represent the values of a variable by a single value, but it cannot reveal every feature of the data. In particular, it gives no idea of the scatter of the values of the variable around the average. To measure this scatter, we use measures of dispersion.
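As a small illustration of why an average alone is not enough, the following Python sketch compares two data sets that share the same mean but differ widely in scatter. The data values and the helper name `describe` are hypothetical, chosen only for this example.

```python
import statistics

# Two hypothetical data sets with the same mean but different scatter.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

def describe(values):
    """Return (mean, range, sample standard deviation) of a list of numbers."""
    mean = statistics.mean(values)
    value_range = max(values) - min(values)
    std_dev = statistics.stdev(values)  # sample SD, divides by n - 1
    return mean, value_range, std_dev

for name, data in (("tight", tight), ("wide", wide)):
    mean, value_range, std_dev = describe(data)
    print(f"{name}: mean={mean}, range={value_range}, sd={std_dev:.2f}")
```

Both sets report the same mean, while the range and the standard deviation separate them clearly.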
Measure of Dispersion
Measures of Relationship: So far, we have dealt with statistical measures that are useful in the context of a univariate population. When for every measurement of a variable X we have a corresponding value of a second variable Y, the resulting pairs of values (X, Y) form a bivariate population. Sometimes we may in addition have a corresponding value of a third variable Z, or of even more variables, in which case the tuples of values form a multivariate population.
In the case of bivariate or multivariate populations, we are interested in knowing how the two or more variables in the data relate to one another. For example, one may be interested in knowing whether the number of hours students devote to studies (H) is somehow related to their family income (I), age (A), sex (S) or any other similar factor. In other words, the question is whether there exists a function ‘F’ such that H = F(I, A, S, …). There are several methods of determining the relationship between variables, but no method can tell us with certainty that a correlation is indicative of a causal relationship. So, we have the following basic types of questions in the case of a bivariate or multivariate population:
The first question is answered through the correlation technique and the second through the regression technique.
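A minimal sketch of the correlation technique, assuming Pearson's coefficient of correlation is the measure intended (the text does not name one). The income and study-hour figures are hypothetical, and a non-zero r here says nothing about causation.

```python
import math

def pearson_r(xs, ys):
    """Pearson's coefficient of correlation between paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: family income (in Rs'000) and hours of study per week.
income = [10, 20, 30, 40, 50]
hours = [12, 15, 11, 18, 14]

r = pearson_r(income, hours)
print(f"r = {r:.3f}")
```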
Measure of Relation
Bivariate Multivariate
Simple regression, which deals with a bivariate population, explains to what extent an independent variable (X) influences changes in the dependent variable (Y); the relationship is expressed by the regression equation of Y on X. In this case the regression coefficient is the change in Y corresponding to a unit change in X. Similarly, multiple regression expresses the dependent variable in terms of several independent variables.
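The sketch below fits a simple regression of Y on X by least squares. It assumes the usual linear form Y = a + bX; the equation itself did not survive in the text, so this form, the function name and the data are assumptions made for illustration.

```python
def fit_simple_regression(xs, ys):
    """Least-squares estimates (a, b) for the assumed line Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # regression coefficient: change in Y per unit change in X
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical observations that lie exactly on Y = 1 + 2X.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

a, b = fit_simple_regression(x, y)
print(f"Y = {a} + {b} X")
```

Because the hypothetical points lie exactly on a line, the fitted coefficient b recovers the change in Y per unit change in X.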
Estimation of Parameter:
In most research, conducting a census study is not practically possible. The usual approach is to generalize or draw inferences about the characteristics of the population from samples taken from it. The characteristics of the population that concern the researcher are known as parameters (often called parameters of interest). The researcher selects a few items from the population for study, and this collection of selected items is known as the sample. Sampling is done on the assumption that the sample data enable the researcher to estimate the parameter. The sample should be truly representative of the population, without any bias, so that the conclusions derived from it are valid and reliable. But the fact is that, irrespective of the method adopted by the researcher, it is nearly impossible to make such an estimate free from error unless a census study is made. In the following diagram, the occurrence of sampling error is explained.
Precision is the range within which the population parameter is expected to lie, at the reliability specified by the confidence level; it may be expressed as a percentage of the estimate or as a numerical quantity. The confidence level, or reliability, is the expected percentage of times that the actual value will fall within the stated precision limits.
The estimate of a population parameter may be a single value or a range of values. If the estimate is a single value, it is referred to as a point estimate; if it is a range of values, it is termed an interval estimate.
(i) An estimator should, on average, be equal to the value of the parameter being estimated. (Property of Unbiasedness)
(ii) An estimator should have a relatively small variance. (Property of Efficiency)
(iii) An estimator should use as much as possible of the information available from the sample. (Property of Sufficiency)
(iv) An estimator should approach the value of the parameter as the sample size becomes larger and larger. (Property of Consistency)
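The property of unbiasedness can be checked exactly on a toy case. The sketch below enumerates every sample of size 2 drawn with replacement from a small hypothetical population and averages the estimates over all samples; the population values are an assumption made for illustration.

```python
from itertools import product
from statistics import mean, pvariance, variance

# Hypothetical small population, used only for illustration.
population = [1, 2, 3, 4, 5]
mu = mean(population)             # population mean
sigma_sq = pvariance(population)  # population variance (divides by N)

# Every ordered sample of size 2 drawn with replacement (25 samples in all).
samples = list(product(population, repeat=2))

avg_of_sample_means = mean(mean(s) for s in samples)
avg_of_unbiased_var = mean(variance(s) for s in samples)  # divides by n - 1
avg_of_biased_var = mean(pvariance(s) for s in samples)   # divides by n

print(f"population mean {mu}, average of sample means {avg_of_sample_means}")
print(f"population variance {sigma_sq}, average of (n - 1) variances {avg_of_unbiased_var}")
print(f"average of divide-by-n variances {avg_of_biased_var}")
```

The sample mean and the (n − 1) variance average out to the population values, while the divide-by-n variance falls short, which is what unbiasedness means in (i).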
If the population mean is the parameter of interest, then the point estimator of the population mean (µ) is $\bar{X}$, the sample mean. The interval estimator for the mean µ is given by an interval around $\bar{X}$, for a given degree of confidence, constructed with the help of the standard error.
The standard error is determined by the sampling distribution, that is, by the variation of the statistic concerned across all possible samples of the same size. In the case of the population mean, the standard error is obtained by the expression
$$SE = \sigma_{\bar{X}} = \frac{s}{\sqrt{n}} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n(n-1)}}.$$
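A minimal sketch of the standard error of the mean, computed directly from the sum of squared deviations divided by n(n − 1). The sample values are hypothetical.

```python
import math

def standard_error_of_mean(sample):
    """SE of the sample mean: sqrt(sum((x_i - x_bar)^2) / (n * (n - 1)))."""
    n = len(sample)
    x_bar = sum(sample) / n
    sum_sq_dev = sum((x - x_bar) ** 2 for x in sample)
    return math.sqrt(sum_sq_dev / (n * (n - 1)))

# Hypothetical sample of five measurements.
sample = [4, 8, 6, 5, 7]
se = standard_error_of_mean(sample)
print(f"SE = {se:.4f}")
```

This is the same quantity as s/√n, where s is the sample standard deviation computed with the n − 1 divisor.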
Similarly, we can discuss estimation of all other parameters associated with the population.
In the context of estimating the population mean above, it has been noted that the limits of the interval estimate are strongly influenced by the number of items chosen in the sample. So, determining an appropriate sample size in the sample design is very important for the reliability of the conclusion. The size of the sample should be determined by the researcher keeping the following points in view:
(ii) Number of classes proposed: If many class-groups are to be formed, a large sample would be required, because a small sample may not be able to give a reasonable number of items in each class-group.
(iii) Nature of study: If items are to be intensively and continuously studied, the sample should be small. For a general survey the size of the sample should be large, but a small sample is considered appropriate in a technical survey.
(iv) Type of sampling: Sampling technique plays an important part in determining the size of the sample. A small
random sample is apt to be much superior to a larger but badly selected sample.
(vi) Availability of finance: In practice, the size of the sample depends upon the amount of money available for the study. This factor should be kept in view while determining the size of the sample, since a larger sample increases the cost of the sampling estimates.
(vii) Other considerations: The nature of the units, the size of the population, the size of the questionnaire, the availability of trained investigators, the conditions under which the sampling is being conducted, and the time available for completion of the study are a few other considerations to which a researcher must pay attention while selecting the size of the sample.
In general, we follow the following diagram while deciding the sample size.
Objectives of Research -> Identify the Parameters of Interest -> Determine the sample size subject to the confidence level and precision expected
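Note that the limits of the confidence interval for the population mean are given by
$$\bar{X} \pm z \cdot SE = \bar{X} \pm z\,\frac{\sigma}{\sqrt{n}}.$$
If the researcher would like to estimate the population mean within a desired precision $\pm e$, then
$$e = z\,\frac{\sigma}{\sqrt{n}} \quad\text{and therefore}\quad n = \frac{z^2\sigma^2}{e^2}.$$
In the case of a finite population of size N,
$$e = z \cdot SE = z\,\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \quad\text{and therefore}\quad n = \frac{z^2\sigma^2 N}{(N-1)e^2 + z^2\sigma^2}.$$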
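The two sample-size expressions for the mean can be sketched in Python as follows. The function name, the default confidence level and the numerical inputs (σ = 5, e = 1, N = 500) are hypothetical; z is taken from the standard normal distribution for a two-sided interval.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, e, confidence=0.95, population_size=None):
    """Sample size needed to estimate a mean within +/- e.

    Infinite population: n = z^2 sigma^2 / e^2
    Finite population:   n = z^2 sigma^2 N / ((N - 1) e^2 + z^2 sigma^2)
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided z value
    if population_size is None:
        n = (z * sigma) ** 2 / e ** 2
    else:
        N = population_size
        n = (z * sigma) ** 2 * N / ((N - 1) * e ** 2 + (z * sigma) ** 2)
    return math.ceil(n)  # round up so the precision is at least as good as requested

# Hypothetical inputs: sigma = 5, desired precision e = 1, 95% confidence.
n_infinite = sample_size_for_mean(sigma=5, e=1)
n_finite = sample_size_for_mean(sigma=5, e=1, population_size=500)
print(n_infinite, n_finite)
```

The finite-population figure is smaller because the correction factor shrinks the standard error when the sample is a sizeable fraction of N.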
Many a time, the standard deviation of the population is not known and the sample has not yet been taken. In that case a rough estimate of the population standard deviation is worked out from the range, which may have to be obtained from past records or through a pilot survey of a large number of items.
If we are to find the sample size for estimating a proportion of population, our reasoning remains similar to what we
have said in the context of population mean. It is required to specify the precision and the confidence level and then
estimate the sample size as under:
$$SE = \sigma_p = \sqrt{\frac{pq}{n}}$$ (in case of infinite population)
$$SE = \sigma_p = \sqrt{\frac{pq}{n}}\,\sqrt{\frac{N-n}{N-1}}$$ (in case of finite population of size N)
where p is the sample proportion, q = 1 − p, z is the standard variate for the appropriate confidence level and n is the sample size. Further, the confidence interval for the population proportion is given by
p ± z.SE
If e is the precision, i.e. the acceptable error, then the sample size can be expressed as
$$n = \frac{z^2 pq}{e^2}$$ (in case of infinite population)
$$n = \frac{z^2 pq N}{e^2 (N-1) + z^2 pq}$$ (in case of finite population)
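A sketch of the sample-size calculation for a proportion, under the same assumptions as for the mean: a two-sided z value from the standard normal distribution, with a hypothetical function name and hypothetical inputs (p = 0.5, e = 0.05, N = 2000).

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(p, e, confidence=0.95, population_size=None):
    """Sample size needed to estimate a proportion within +/- e.

    Infinite population: n = z^2 p q / e^2
    Finite population:   n = z^2 p q N / (e^2 (N - 1) + z^2 p q)
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided z value
    q = 1 - p
    if population_size is None:
        n = z ** 2 * p * q / e ** 2
    else:
        N = population_size
        n = z ** 2 * p * q * N / (e ** 2 * (N - 1) + z ** 2 * p * q)
    return math.ceil(n)

# Hypothetical inputs: p = 0.5, acceptable error e = 0.05, 95% confidence.
n_infinite = sample_size_for_proportion(p=0.5, e=0.05)
n_finite = sample_size_for_proportion(p=0.5, e=0.05, population_size=2000)
print(n_infinite, n_finite)
```

When no prior value of p is available, p = 0.5 is the cautious choice, because pq is largest there and so is the resulting n.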
So, depending on the objectives and the parameter of interest, the method of identifying the sample size varies.
In most research, it is the practice to make some conjecture about the possible conclusion while designing the research and to test it after obtaining the sample data. In fact, a hypothesis is an assumption or supposition to be proved or rejected.
Definition: Hypothesis is a proposition or a set of propositions set forth as an explanation for the occurrence of some
specified group of phenomena either asserted merely as a provisional conjecture to guide some investigation or
accepted as highly probable in the light of established facts.
Answer
Depends on what hypothesis a researcher framed and how the variable is measured
With reference to the same objective, we may have several hypotheses, and the methods to be employed may differ in each case. In the following we have several hypotheses, where (i)-(v) deal with the objective of verifying the impact of counseling on students showing good performance. In each case, the research design is different.
Examples:
Characteristic of Hypothesis:
Type I error: The error of rejecting the hypothesis when it should have been accepted is known as a Type I error.
Type II error: The error of accepting the hypothesis when it should have been rejected is known as a Type II error.
The probability of a Type I error is usually determined in advance and is understood as the level of significance of the test. If the Type I error is fixed at 5%, it means that there are about 5 chances in 100 that we reject $H_0$ when $H_0$ is true. But with a fixed sample size n, when we try to reduce the Type I error, the probability of committing a Type II error increases. Both types of error cannot be reduced simultaneously. In testing a hypothesis, subject to the level of significance, we identify the rejection region for the test statistic in order to reject or accept the null hypothesis.
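The trade-off between the two errors at a fixed sample size can be seen numerically. The sketch below assumes a right-tailed z-test with known σ and a specific true mean under the alternative; all numbers (means of 50 and 52, σ = 5, n = 25) and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def type_ii_error(alpha, mu0, mu1, sigma, n):
    """P(accept H0 | true mean is mu1) for a right-tailed z-test of H0: mu = mu0."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)       # boundary of the rejection region
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))  # gap between the means in SE units
    return std_normal.cdf(z_alpha - shift)

# Hypothetical setting: H0: mu = 50, true mean 52, sigma = 5, n = 25.
betas = []
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, mu0=50, mu1=52, sigma=5, n=25)
    betas.append(beta)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")
```

As the level of significance is tightened from 10% to 1%, the computed probability of a Type II error rises, with n held fixed throughout.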
State $H_0$ as well as $H_1$
-> Calculate the probability that the sample result would diverge as widely as it has from expectations, if the null hypothesis were true (find the z-value or t-value for the purpose)
-> Compare this probability with the significance level (α/2 in case of a two-tailed test; α in case of a one-tailed test), i.e. find whether the calculated z or t value is in the rejection region
-> Yes: Reject $H_0$; No: Accept $H_0$
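The testing procedure can be sketched for the simplest case, a one-sample z-test with known population standard deviation. The function name and the figures used (a benchmark of 7.5, a sample mean of 7.8, σ = 1.0, n = 36) are hypothetical and serve only to show the steps.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05, two_tailed=True):
    """Return (z, critical value, reject H0?) for H0: mu = mu0 with known sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    tail_area = alpha / 2 if two_tailed else alpha
    critical = NormalDist().inv_cdf(1 - tail_area)
    if two_tailed:
        reject = abs(z) > critical  # H1: mu != mu0
    else:
        reject = z > critical       # right-tailed alternative, H1: mu > mu0
    return z, critical, reject

# Hypothetical figures: benchmark 7.5, sample mean 7.8, sigma 1.0, n = 36.
z_one, crit_one, reject_one = one_sample_z_test(7.8, 7.5, 1.0, 36, two_tailed=False)
z_two, crit_two, reject_two = one_sample_z_test(7.8, 7.5, 1.0, 36, two_tailed=True)
print(f"one-tailed: z = {z_one:.2f}, critical = {crit_one:.3f}, reject = {reject_one}")
print(f"two-tailed: z = {z_two:.2f}, critical = {crit_two:.3f}, reject = {reject_two}")
```

With these figures the same z value falls inside the one-tailed rejection region but outside the two-tailed one, which is why the alternative hypothesis has to be stated before the test is run.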
Example 1.
In this case a benchmark of good performance has to be defined. If obtaining a CGPA of more than 7.5 is a good performance, then the null and alternative hypotheses are framed in terms of that benchmark.
Example 2.
In this case we compare the performance of a student after the counseling with the same student’s performance before the counseling.
Example 3.
Hypothesis: Students who receive counseling seriously perform better than those who are casual about receiving it.
Here we may have three categories of students: those not receiving the counseling, those receiving it casually, and those who are serious in taking it. In this case we compare the performance of two groups of students.
Alternative hypothesis: Performance of students receiving the counseling seriously is better than that of group of
students receiving the counseling casually.
Example 4.
Example 5.
Null Hypothesis: No correlation between the variables – No. of hours of counseling received & Performance (Correlation test)
Example 6.
Hypothesis: Family income status (Poor, Middle, Rich) has no association with performance (Poor, Mediocre, Excellent)
of student.
In this case we are dealing with attributes rather than variables, so we use a nonparametric test for the purpose. (Chi-Square test)
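A sketch of the chi-square test of association for a 3 × 3 table of attributes. The observed counts are hypothetical, and the critical value 9.488 is the tabulated chi-square value for 4 degrees of freedom at the 5% level.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if the two attributes were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_sq += (observed_count - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_sq, df

# Hypothetical counts: rows = income status (Poor, Middle, Rich),
# columns = performance (Poor, Mediocre, Excellent).
observed = [
    [20, 15, 5],
    [15, 25, 20],
    [5, 20, 25],
]
chi_sq, df = chi_square_statistic(observed)
CRITICAL_5_PERCENT_DF4 = 9.488  # tabulated value for df = 4, alpha = 0.05
reject_h0 = chi_sq > CRITICAL_5_PERCENT_DF4
print(f"chi-square = {chi_sq:.3f}, df = {df}, reject H0: {reject_h0}")
```

For these made-up counts the statistic exceeds the critical value, so the hypothesis of no association would be rejected; real data may of course lead to the opposite decision.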
When more than one variable influences a dependent variable, we may employ multiple regression techniques. Similarly, to reduce the number of factors to a few, we employ factor analysis.
(i) The tests should not be used in a mechanical fashion. It should be kept in view that testing is not decision-making itself; the tests are only useful aids for decision-making. Hence proper interpretation of statistical evidence is important to intelligent decisions.
(ii) Tests do not explain the reasons why a difference exists. They simply indicate whether the difference is due to fluctuations of sampling or to other reasons, but they do not tell us which other reason(s) cause the difference.
Ref: C. R. Kothari
Data :
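Gender (frequency table and bar chart of Frequency by Gender: Male, Female)
Female: Frequency 12, Percent 60.0, Valid Percent 60.0, Cumulative Percent 100.0
Total: Frequency 20, Percent 100.0, Valid Percent 100.0
Age class (frequency table and bar chart of Frequency by Age class: <=22, 23 - 26, > 26)
Total: Frequency 20, Percent 100.0, Valid Percent 100.0
Sales class (frequency table and bar chart of Frequency by Sales class)
Region (frequency table and bar chart of Frequency by Region: State 1, State 2, State 3)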
Descriptive Statistics
One-Sample Test (Test Value = 24)
        t      df    Sig. (2-tailed)    Mean Difference    95% Confidence Interval of the Difference
                                                           Lower      Upper
Age     .946   19    .356               .60                -.73       1.93
[Scatter plot: Sales (in Rs'000) against Age, with the 20 cases labeled by number]
Correlations
                                          Age     Sales (in Rs'000)
Age                 Pearson Correlation   1       .118
                    Sig. (2-tailed)       .       .619
                    N                     20      20
Sales (in Rs'000)   Pearson Correlation   .118    1
                    Sig. (2-tailed)       .619    .
                    N                     20      20
No correlation between age and sales, and the analysis continues.
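The conclusion of no correlation can be cross-checked from the reported figures alone. The sketch below converts r = .118 with N = 20 into a t statistic with N − 2 degrees of freedom and compares it with 2.101, the tabulated two-tailed 5% value of t for 18 degrees of freedom; the function name is hypothetical.

```python
import math

def t_statistic_for_r(r, n):
    """t statistic for testing H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.118, 20                  # values reported in the Correlations table
t_value = t_statistic_for_r(r, n)
T_CRITICAL_DF18 = 2.101           # tabulated two-tailed 5% value of t for 18 df
reject_h0 = abs(t_value) > T_CRITICAL_DF18
print(f"t = {t_value:.3f}, reject H0: {reject_h0}")
```

The t value is far below the critical value, so the null hypothesis of no correlation between age and sales is not rejected, in line with the reported two-tailed significance of .619.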