STATISTICS READY RECKONER

* Dr. Manit Mishra

 Population: A collection of all the elements we are studying and about which we are
trying to draw conclusions.
 Census: Examination of every person or item in the population.
 Sample: A collection of some, but not all, of the elements of the population.
 Advantages of sample:
 Costs less.
 Takes less time.
 Reduces risk.
 Can raise the quality level.
 Parameter: It describes the characteristics of a population.
 Statistic: It describes the characteristics of a sample.
 Probability sampling: All items in the population have a chance of being chosen in the sample.
 Simple random sampling: Each possible sample has an equal probability of being
picked and each item in the entire population has an equal chance of being included in
the sample.
 Systematic sampling: Elements are selected from the population at a uniform
interval that is measured in time, order or space.
 Stratified sampling: The population is divided into relatively homogeneous groups
and then a specified number of elements corresponding to the proportion of that
stratum in the population is selected.
 Cluster sampling: The population is divided into clusters and then a random sample of
these clusters is selected.
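The four probability-sampling schemes above can be sketched with Python's standard library; the population, sampling interval, strata, and clusters below are illustrative assumptions, not from any real study.

```python
import random

random.seed(42)  # fixed seed so the draws are reproducible
population = list(range(1, 101))  # hypothetical population of 100 elements

# Simple random sampling: every element has an equal chance of inclusion
simple = random.sample(population, 10)

# Systematic sampling: pick every k-th element after a random start
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: sample each homogeneous group in proportion to its size
strata = {"low": population[:60], "high": population[60:]}  # assumed 60/40 split
stratified = [x for group in strata.values()
              for x in random.sample(group, len(group) // 10)]

# Cluster sampling: randomly choose whole clusters, then keep every element in them
clusters = [population[i:i + 20] for i in range(0, 100, 20)]  # five clusters of 20
chosen = random.sample(clusters, 2)
cluster_sample = [x for c in chosen for x in c]

print(len(simple), len(systematic), len(stratified), len(cluster_sample))
```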
 Data: The collection of any number of related observations.
 Variable: Any aspect that takes different values as a result of the outcomes of a random
experiment.
 Frequency distribution: Describes the number of observations from the data set against
the different outputs of a variable. It is a listing of the observed frequencies of all the outcomes
of an experiment that actually occurred when the experiment was done.
 Central tendency: It is the middle point of a distribution. The measures of central tendency
are arithmetic mean, geometric mean, median and mode.
 Dispersion: It describes the spread of the data in a distribution, that is, the extent to which the
observations are scattered. The measures of dispersion are:
 Variance: The average of the sum of the squared distances between the mean and each
item in the population. Population variance is denoted by σ2 whereas the sample
variance is denoted by s2.

*Dr. Manit Mishra is Associate Professor (Marketing & QT) at International Management
Institute, Bhubaneswar. He can be contacted at: manit.mishra@imibh.edu.in.
 Standard deviation: It is the square root of the variance. The population
standard deviation is denoted by σ whereas the sample standard deviation is denoted by s.
 Skewness: It describes asymmetry in a frequency distribution: values are
concentrated at the lower end with a long tail to the right (positively skewed) or at the
higher end with a long tail to the left (negatively skewed).
 Kurtosis: It is a measure of the peakedness of the frequency distribution.
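The variance and standard deviation definitions above map directly onto Python's statistics module; the data below is a small illustrative sample.

```python
import statistics

data = [4, 8, 6, 5, 3, 7, 9, 5]  # hypothetical observations

# Population variance (sigma^2) and standard deviation (sigma): divide by n
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Sample variance (s^2) and standard deviation (s): divide by n - 1
samp_var = statistics.variance(data)
samp_sd = statistics.stdev(data)

print(pop_var, pop_sd, samp_var, samp_sd)
```

The sample variance is always a little larger than the population variance for the same data, because dividing by n - 1 corrects the bias of estimating the mean from the sample itself.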
 Probability distribution: It is a theoretical frequency distribution. It is a listing of
the probabilities of all the possible outcomes that could result if the experiment
were done.
 Discrete probability distribution: It can take only a limited number of
values.
 Continuous probability distribution: The variable under consideration
can take any value within a given range.
 Normal probability distribution: It is unimodal and bell-shaped. The mean lies at
the center of its normal curve, and the values of the mean, median and mode are the same.
The two tails extend indefinitely and never touch the horizontal axis.
 Standard normal probability distribution: It is a normal distribution with mean
(µ=0) and standard deviation (σ=1).
 Sampling distribution of mean: A probability distribution of all the possible means
of the samples.
 Standard error of mean: The standard deviation of the sampling distribution of the
sample mean. It indicates not only the size of the chance error that has been made, but
also the accuracy we are likely to get if we use a sample statistic to estimate a population
parameter.
 Central limit theorem: It states that the sampling distribution of the mean
approaches normality as the sample size increases. It permits us to use sample
statistics to make inferences about population parameters without knowing anything
about the shape of the frequency distribution. It also conveys that the mean of the
sampling distribution of mean will equal the population mean, regardless of the
sample size, even if the population is not normal.
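A small simulation illustrates the central limit theorem: means of samples drawn from a decidedly non-normal (uniform) population cluster tightly around the population mean, with spread close to σ/√n. The sample size of 30 and the 2,000 replications are arbitrary choices for the sketch.

```python
import random
import statistics

random.seed(0)  # reproducible simulation

# Non-normal population: uniform on [0, 1], population mean 0.5, sigma ~ 0.2887
def sample_mean(n):
    return statistics.fmean(random.random() for _ in range(n))

# Draw 2000 sample means, each from a sample of size 30
means = [sample_mean(30) for _ in range(2000)]

# The sampling distribution of the mean centres on the population mean,
# with standard error sigma / sqrt(30) ~ 0.0527
print(statistics.fmean(means), statistics.stdev(means))
```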
 Estimates: We make inferences about characteristics of populations from
information contained in samples. There are two types of estimates. A point estimate
is a single number that is used to estimate an unknown population parameter. An
interval estimate is a range of values used to estimate a population parameter.
 Confidence interval and confidence level: The confidence interval is the range of the
estimate we are making. The probability that we associate with an interval
estimate is called the confidence level. The most commonly used confidence
levels are 90%, 95%, and 99%, but we are free to apply any confidence level. A higher
confidence level produces a wider confidence interval, and such wide intervals
are less precise. A 95% confidence level means that if we select many random
samples of the same size and calculate a confidence interval for each of these
samples, then in about 95% of these cases the population mean will lie within that
interval.
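A 95% confidence interval for a population mean with known σ is x̄ ± z·σ/√n; the sample mean, σ, and n below are made-up figures for illustration.

```python
import math

# Hypothetical sample: mean 50, known population sigma 10, n = 100
x_bar, sigma, n = 50.0, 10.0, 100
z = 1.96  # standard normal value for 95% confidence

std_error = sigma / math.sqrt(n)  # standard error of the mean = 1.0
margin = z * std_error            # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # 95% CI: (48.04, 51.96)
```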
 Sample size: The sample size depends upon confidence level (standard normal
variate), standard deviation and confidence interval (precision required). The sample
size has to be optimum since there is a diminishing return in increasing sample size.
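The sample-size relationship above can be written as n = (z·σ/E)², where E is the required precision (margin of error); the z, σ, and E values below are illustrative.

```python
import math

def sample_size(z, sigma, margin):
    """Minimum n such that z * sigma / sqrt(n) <= margin."""
    return math.ceil((z * sigma / margin) ** 2)

# 95% confidence (z = 1.96), sigma = 15, desired precision +/- 2
print(sample_size(1.96, 15, 2))

# Diminishing returns: halving the margin of error quadruples the required n
print(sample_size(1.96, 15, 1))
```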
 Hypothesis: It is an assumption about a population parameter.
 Level of significance: It is the percentage of sample means that is outside certain
limits, assuming that the null hypothesis is correct. The higher the significance level
we use for testing a hypothesis, the higher the probability of rejecting a null
hypothesis when it is true.
 Type I and Type II error: Rejecting a null hypothesis when it is true is called a Type I
error and its probability is denoted by α, whereas accepting a null hypothesis when it is
false is called a Type II error and its probability is denoted by β. In social science
research we generally prefer making a Type II error to making a Type I error.
 Two-tailed test of hypothesis: It will reject the null hypothesis if the sample mean
is significantly higher than or lower than the hypothesised population mean.
 One-tailed test of hypothesis: It will reject the null hypothesis if the sample
mean is either significantly higher than (Right tailed) or lower than (Left tailed) the
hypothesised population mean.
 P-value: The p-value is the probability of getting a value of the sample statistic this
far or farther from the hypothesised population mean, assuming the null hypothesis is
true. It is the smallest significance level at which we would reject the null hypothesis.
Whenever the p-value is less than the level of significance, reject the null hypothesis.
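A minimal two-tailed z-test sketch (population σ assumed known) shows the p-value decision rule in practice; the hypothesised mean, sample figures, and α are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# H0: mu = 100. Hypothetical sample: mean 103, sigma = 12, n = 64
z = (103 - 100) / (12 / math.sqrt(64))  # z = 2.0
p_value = 2 * (1 - normal_cdf(abs(z)))  # two-tailed p-value

alpha = 0.05
print(f"z = {z:.2f}, p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```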
 t-test: It can be used whenever:
 The sample size is 30 or less.
 The population standard deviation is not known.
 The population can be assumed to be normal.
The t-test can be used for the following purposes:
 For hypothesis testing of the mean when the population standard deviation is not
known.
 To test for the difference between means for small sample sizes in the case of
independent samples (i.e. when the respondents of the two groups are different).
 To test for the difference between means for small sample sizes in the case of
dependent samples (i.e. when the respondents of the two groups are the same but
separated by time or any other variable).
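An independent-samples t statistic can be computed by hand with the standard library; the two small groups are hypothetical, and the p-value would come from the t distribution with n1 + n2 - 2 degrees of freedom (e.g. via a statistical table or scipy.stats, not shown here).

```python
import math
import statistics

# Hypothetical independent samples (different respondents in each group)
group_a = [23, 20, 25, 22, 24, 21]
group_b = [19, 18, 22, 20, 17, 18]

n1, n2 = len(group_a), len(group_b)
m1, m2 = statistics.fmean(group_a), statistics.fmean(group_b)

# Pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * statistics.variance(group_a)
       + (n2 - 1) * statistics.variance(group_b)) / (n1 + n2 - 2)

t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
print(f"t = {t:.3f} with {n1 + n2 - 2} degrees of freedom")
```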
 Chi-square test: It is a very useful non-parametric test. It can be used to fulfil the
following objectives:
 To test the independence of two attributes (the null hypothesis states that the
attributes are independent of each other).
 As a test of goodness of fit, to decide whether a particular probability distribution is
the appropriate distribution.
 To compare more than two sample proportions.
 To draw inferences about a population variance.
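For a test of independence, the chi-square statistic is Σ(O − E)²/E, with expected counts computed from the row and column totals; the 2×2 table below is made up for illustration.

```python
# Hypothetical 2x2 contingency table, e.g. rows = gender, columns = preference
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected count under independence: (row total * column total) / grand total
chi_sq = sum((observed[i][j] - row_totals[i] * col_totals[j] / grand) ** 2
             / (row_totals[i] * col_totals[j] / grand)
             for i in range(2) for j in range(2))

df = (2 - 1) * (2 - 1)  # degrees of freedom = (rows - 1) * (columns - 1)
print(f"chi-square = {chi_sq:.2f} with {df} degree of freedom")
```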
 ANOVA (analysis of variance): It uses the F-ratio (the ratio of the variance between
samples to the variance within samples) to:
 Test for the significance of the differences among more than two sample means.
 Test for the significance of the difference between two sample variances.
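The one-way ANOVA F-ratio can be computed directly as the between-samples mean square divided by the within-samples mean square; the three small groups below are hypothetical.

```python
import statistics

# Hypothetical observations from three groups
groups = [[5, 7, 6, 8], [9, 10, 8, 9], [4, 5, 6, 5]]

k = len(groups)                  # number of groups
n = sum(len(g) for g in groups)  # total observations
grand_mean = statistics.fmean(x for g in groups for x in g)

# Between-samples sum of squares and mean square
ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (k - 1)

# Within-samples sum of squares and mean square
ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
ms_within = ss_within / (n - k)

f_ratio = ms_between / ms_within
print(f"F = {f_ratio:.2f} with ({k - 1}, {n - k}) degrees of freedom")
```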
 Correlation analysis: It is the statistical tool used to describe the degree to
which one variable is linearly related to another. The coefficient of correlation (r)
varies between -1 and +1, its sign indicating the direction of the relationship. The
coefficient of determination (r2) is the square of the coefficient of correlation; it varies
between 0 and +1 and measures the strength of the linear relationship between two
variables. For example, a coefficient of determination of 0.25 indicates that 25% of the
variation in the dependent variable is explained by the independent variable.
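Pearson's r and r² can be computed from sums of deviations about the means; the paired data below is illustrative.

```python
import math
import statistics

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = statistics.fmean(x), statistics.fmean(y)

# r = sum of cross-deviations over the product of the deviation norms
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2  # share of variation in y explained by x
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```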
 Regression analysis: It shows us how to determine both the nature and strength of a
relationship between two variables. We develop an estimating equation that relates the
known variables to the unknown variable. The known variable is called the
independent variable and the variable we are trying to predict is the dependent
variable. We should try to find the estimating equation that minimises the sum of the
squares of the errors. The standard error of estimate measures the variability of the
observed values around the regression equation.
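The least-squares estimating equation minimises the sum of squared errors, giving slope b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept a = ȳ − b·x̄; the data below is illustrative.

```python
import statistics

# Hypothetical known (independent) x and unknown (dependent) y
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = statistics.fmean(x), statistics.fmean(y)

# Least-squares slope and intercept
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))
a = my - b * mx

# Standard error of estimate: spread of observed y around the fitted line
n = len(x)
residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_estimate = (residual_ss / (n - 2)) ** 0.5

print(f"y-hat = {a:.2f} + {b:.2f} x, standard error of estimate = {se_estimate:.3f}")
```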
 Non-parametric tests : They do not make restrictive assumptions about the shape of
population distributions. These are also known as distribution-free tests. The sign test
is based on the direction (or signs for pluses or minuses) of a pair of observations, not
on their numerical magnitude. It is used to test for the difference between paired
observations where + and – signs are substituted for quantitative values. The Mann-
Whitney U test is used to compare two independent populations. The run test is used to
determine the randomness with which the items in a sample have been selected.
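The sign test reduces each pair of observations to a plus or minus sign; under the null hypothesis the number of pluses follows a binomial(n, 0.5) distribution. The before/after scores below are hypothetical.

```python
import math

# Hypothetical paired scores (before, after) for ten respondents
before = [72, 65, 80, 70, 68, 75, 71, 66, 74, 69]
after = [75, 70, 78, 74, 72, 80, 73, 70, 79, 68]

# Keep only the direction of each difference; ties are dropped
signs = ["+" if b < a else "-" for b, a in zip(before, after) if b != a]
plus, n = signs.count("+"), len(signs)

# Two-tailed p-value from the binomial(n, 0.5) distribution
k = min(plus, n - plus)
tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
p_value = min(1.0, 2 * tail)

print(f"{plus} pluses out of {n}, p = {p_value:.4f}")
```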
