
STATISTICS is the science of classifying, organizing, and analyzing data. In this sense, the word is singular: it names a science, a specialization within the field of mathematics.
In today's fast-paced world, statistics plays a major role in research, helping in the collection, analysis and presentation of data in a measurable form. This includes all types of research, psychological research in particular.
PSYCHOLOGICAL STATISTICS is the application of formulae, theorems, numbers and laws to psychology. Statistical methods for psychology include the development and application of statistical theory and methods for modelling psychological data. These methods include psychometrics, factor analysis, experimental design and multivariate behavioural research.

Statistics allows us to make sense of and interpret a great deal of information. For example,
consider the sheer volume of data you encounter in a given day. How many hours did you
sleep? How many students in your class ate breakfast this morning? How many people live
within a one-kilometre radius of your home? By using statistics, we can organize and interpret
all of this information in a meaningful way.

Statistics allows psychologists to:

1. ORGANISE DATA - When dealing with an enormous amount of information, it is all too
easy to become overwhelmed. Statistics allow psychologists to present data in ways that
are easier to comprehend, for example through visual displays such as charts and
diagrams.
2. DESCRIBE DATA - Using statistics, we can accurately describe the information that has
been gathered in a way that is easy to understand. Descriptive statistics provide a way
to summarize what already exists in a given population, such as how many men and
women there are, how many children there are, or how many people are currently
employed.
3. MAKE INFERENCES BASED ON DATA - Psychologists use the data they have
collected to test a hypothesis or a guess about what they predict will happen. Using this
type of statistical analysis, researchers can determine the likelihood that a hypothesis
should be either accepted or rejected.

There are two types of statistics: descriptive and inferential.

DESCRIPTIVE STATISTICS is the branch of statistics concerned with describing the
population under study. It quantitatively describes the important characteristics of a dataset.
For this purpose, it uses measures of central tendency, i.e. mean, median and mode, and
measures of dispersion, i.e. range, standard deviation, quartile deviation, variance, etc.
The researcher summarises the data in a useful way with the help of numerical and
graphical tools such as charts, tables and graphs, to represent the data accurately.
For example, if one wants to find the average of a data set, they will find the mean of the data
using the following formula:

μ = (x1 + x2 + x3 + … + xN) / N

where the numerator is the sum of all observations and the denominator is the number of observations.
If you have the following data: 2, 4, 6, 8, 10, 12, 14
The mean = (2 + 4 + 6 + 8 + 10 + 12 + 14) / 7 = 56 / 7 = 8
Thus, the average of our data set is 8.
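The calculation above can be checked with a short sketch in Python; the standard library's statistics module provides these descriptive measures directly:

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 14]

# Mean: sum of all observations divided by the number of observations
mean = sum(data) / len(data)        # (2 + 4 + ... + 14) / 7 = 56 / 7
print(mean)                         # 8.0

# The statistics module gives the same mean, plus other descriptive measures
print(statistics.mean(data))        # 8
print(statistics.median(data))      # 8
print(statistics.pstdev(data))      # population standard deviation
```

Here pstdev divides by N (the whole dataset is treated as the population); stdev would divide by N − 1 for a sample.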

INFERENTIAL STATISTICS is used to make generalisations about the population based on
samples. It thus focuses on drawing conclusions about the population on the basis of
sample analysis and observation. The results of analysing the sample can be generalised to the
larger population from which the sample was taken. It is a convenient way to draw conclusions
about the population when it is not possible to query each and every member of the universe.
The sample chosen is a representative of the entire population; therefore, it should contain the
important features of the population. The major inferential statistics are based on statistical
models such as analysis of variance, the chi-square test, Student's t-distribution, regression
analysis, etc. The methods of inferential statistics are estimation of parameters and testing of
hypotheses.
For example, the mean intelligence of a population is 100 and the standard deviation (a quantity
expressing by how much the members of a group differ from the mean value for the group) is
15. Our sample size is 900. How many people’s IQ is below 112?

z = (x − μ) / σ = (112 − 100) / 15 = 12 / 15 = 0.8

(A z-score is the measure of how many standard deviations below or above the population mean a raw score is.) From the normal table, the area between the mean and z = 0.8 is .2881.
Thus, the proportion falling below 112 = .5 + .2881 = .7881, so the number of people = n × .7881 = 900 × .7881 ≈ 709 people.
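As a quick check, the same computation can be sketched in Python. Instead of looking up the normal table, the standard normal CDF is computed from the error function in the standard library:

```python
import math

def normal_cdf(z):
    # Proportion of a standard normal distribution falling below z
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 100, 15, 900
x = 112

z = (x - mu) / sigma              # (112 - 100) / 15 = 0.8
below = normal_cdf(z)             # ~ .7881, matching the table value
people = n * below                # ~ 709 people
print(z, round(below, 4), round(people))
```

The exact CDF value agrees with the .5 + .2881 table lookup to four decimal places.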

So there is a big difference between descriptive and inferential statistics, i.e. in what you do
with your data.

BASIS OF COMPARISON    DESCRIPTIVE STATISTICS              INFERENTIAL STATISTICS

WHAT IT DOES           Organizes, analyzes and presents    Compares, tests and predicts
                       data in a meaningful way.           data.

FORM OF FINAL RESULT   Charts, graphs and tables.          Probability.

USAGE                  To describe a situation.            To generalise from the sample
                                                           to the population.

FUNCTION               It explains the data, which is      It attempts to reach conclusions
                       already known, to summarize the     about the population that extend
                       sample.                             beyond the data available.

TESTING A HYPOTHESIS AND THE TYPES OF TESTS:

In inferential statistics, the topic of interest is estimating a population parameter from a sample
statistic. In most cases, the statistic is used to test some hypothesis. A hypothesis is a precise,
testable statement of what the researcher predicts will be the outcome of the study. This usually
involves proposing a possible relationship between the two variables being studied.
For example, a. Do groups differ on some outcome variable?
b. Is the difference more than would be expected by chance?
c. Can one factor predict another?
In research, there is a convention that a hypothesis is written in two forms: null hypothesis
and alternative hypothesis.

The hypothesis that a researcher tests is called the null hypothesis, symbolised by H0. It is the
hypothesis that he/she will decide to retain or reject. It states that there is no difference between
the two variables being studied: the results are due to chance and are not significant in terms of
supporting the idea being investigated.

For every null hypothesis there is also an alternative hypothesis, symbolised by Ha. It states
that there is a difference between the two variables being studied: one variable has an effect on
the other, and the results are not due to chance but are significant in terms of supporting the
idea being investigated.

They are both statements about the population parameter, not the sample statistic. The
decision to retain or reject always refers to H0 and never to Ha; it is H0 that is the subject of
the test. The reason we test the null hypothesis, rather than the alternative, is that we can never
prove an alternative hypothesis with 100% certainty. What we can do is see whether we can
reject the null hypothesis.

In order to understand the two hypotheses, it is critical to understand the two types of tests:
one-tailed and two-tailed.
ONE-TAILED TEST: The alternative hypothesis states that the population parameter differs
from the value stated in H0 in one particular direction and the critical region is located in only
one tail of the sampling distribution.

TWO-TAILED TEST: The alternative hypothesis states that the population parameter may be
either less than or greater than the value stated in H0 and the critical region is divided between
both tails of the sampling distribution.

For example, earlier, a pharmaceutical company stated that a particular drug X manufactured by
them took 10 minutes to show an effect (reaction time). After a few years, they challenge their
own statement by saying that it actually took less than 10 minutes. This would be a one-tailed
test because the direction of the tail of the distribution is stated (negative/left). The same goes
for if they stated that it actually took more time (positive/right direction). But if they had said
the reaction time is actually different (without specifying whether it was less or more), it would
have been a two-tailed test, because the difference could lie in either the negative/left or
positive/right direction.

(Diagram: rejection region in the LEFT (-VE) tail versus the RIGHT (+VE) tail of the distribution.)

THUS, H0: μ = 10
Ha: μ < 10 (one-tailed)
Ha: μ > 10 (one-tailed)
Ha: μ ≠ 10 (two-tailed)

Suppose the answer turns out to be 8 minutes. It is thus a lower reaction time than what was
earlier claimed by the company (10 minutes). But is it significant enough for generalising over
the population?
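That question can be sketched numerically as a one-tailed z-test. The sample size and population standard deviation below are made-up figures for illustration only; they are not given in the example:

```python
import math

# Hypothetical figures: suppose reaction time is measured on n = 36 patients,
# the sample mean comes out to 8 minutes, and the population standard
# deviation is assumed known to be 3 minutes.
mu0 = 10     # H0: mu = 10 (the company's original claim)
x_bar = 8    # observed sample mean
sigma = 3
n = 36

# Standard error of the mean, then the z statistic
z = (x_bar - mu0) / (sigma / math.sqrt(n))   # (8 - 10) / 0.5 = -4.0

# One-tailed (left) test at the .05 level: reject H0 if z < -1.65
reject_h0 = z < -1.65
print(z, reject_h0)
```

With these assumed figures, z falls deep in the left rejection region, so H0 (10 minutes) would be rejected in favour of the "less than 10 minutes" alternative.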

LEVEL OF SIGNIFICANCE: The level of significance is the probability value that is used as a
criterion to decide that an obtained sample statistic has a low probability of occurring by chance
if the null hypothesis is true. In simple terms, it is the cut-off at which a null hypothesis is rejected.
If the sample mean is so different from what is expected when H0 is true that its appearance
would be unlikely to have occurred by chance, H0 should be rejected. Common research
practice is to reject H0 if the sample mean is so deviant that its probability of occurrence by
chance in random sampling is .05 (5%) or less. Such a criterion is called the level of significance
and is symbolized by the Greek letter 𝛼 (alpha). In some cases, the researcher may wish to be
even more stringent and use a level of significance of .01 or less, but it is rare to choose one
greater than .05.

REGION OF RETENTION & REJECTION: pg. 199.

(MARK regions, std. dev., %)

PROCEDURE OF HYPOTHESIS TESTING: (pg. 199)

CRITICAL VALUE (z): the value that separates the regions of retention and rejection. One-tailed = ±1.65; two-tailed = ±1.96 (at the .05 level of significance).

p-value: It is the probability, when H0 is true, of observing a sample mean as deviant as, or
more deviant than, the obtained value of X̄.
This value gives the probability of obtaining the result by chance. It is not established in advance
and is not a statement of risk; it simply describes the rarity of a sample outcome if H0 is true.
The smaller the p-value, the greater the evidence against H0.
If the p-value is less than or equal to the level of significance, the sample result is considered
sufficiently rare to call for rejecting H0.
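The decision rule can be sketched as follows. The obtained z of 2.0 is an illustrative value, not taken from the text; the p-value is computed from the standard normal CDF via the error function:

```python
import math

def normal_cdf(z):
    # Proportion of a standard normal distribution falling below z
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

alpha = 0.05        # level of significance, fixed in advance
z_obtained = 2.0    # an illustrative sample outcome

# Two-tailed p-value: probability of a result at least this deviant,
# in either direction, if H0 is true
p_value = 2 * (1 - normal_cdf(z_obtained))

# Reject H0 when the p-value is less than or equal to alpha
print(round(p_value, 4), p_value <= alpha)
```

Note the two distinct roles: alpha is chosen before the study; the p-value describes the data after the fact.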

TYPES OF ERROR: TYPE I AND II

There are, so to speak, two “states of nature”: Either the null hypothesis, H0, is true or it is false.
Similarly, there are two possible decisions: we can reject the null hypothesis or we can retain
it. Taken in combination, there are four possibilities. TYPE I ERROR: If we reject H0 and in fact
it is true. TYPE II ERROR: If we retain H0 and in fact it is false.
             H0 FALSE                          H0 TRUE

RETAIN       TYPE II ERROR (denoted by beta)   CORRECT DECISION

REJECT       CORRECT DECISION                  TYPE I ERROR (denoted by alpha)
z-test AND t-test:

There are two hypothesis-testing procedures, the parametric test and the non-parametric test.
A parametric test assumes that the variables are measured on an interval scale, whereas a
non-parametric test assumes they are measured on an ordinal scale.

The t-test is a hypothesis test used by the researcher to compare population means for a
variable measured on an interval scale. More precisely, a t-test is used to examine how the
means taken from two independent samples differ, or how a sample mean differs from a
hypothesised population mean.
The t-test follows the t-distribution, which is appropriate when the sample size is small and the
population standard deviation is not known. The shape of a t-distribution is strongly affected by
the degrees of freedom, i.e. the number of independent observations in a given set of
observations.

ASSUMPTIONS OF A t-TEST:
1. All data points are independent.
2. The sample size is small. Generally, a sample size exceeding 30 units is regarded as
large, otherwise small; but it should not be less than 5 for the t-test to apply.
3. Sample values are to be taken and recorded accurately.

The test statistic is:

t = (x̄ − μ) / (s / √n)

where
x̄ is the sample mean,
s is the sample standard deviation,
n is the sample size, and
μ is the population mean.
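A minimal one-sample t computation, assuming a hypothetical small sample of ten measurements and a hypothesised population mean of 10 (both made up for illustration):

```python
import math
import statistics

# Hypothetical small sample (n = 10), for illustration only
sample = [8, 9, 7, 10, 8, 9, 7, 8, 9, 10]
mu = 10                          # hypothesised population mean

x_bar = statistics.mean(sample)  # sample mean
s = statistics.stdev(sample)     # sample SD (n - 1 in the denominator)
n = len(sample)

# One-sample t statistic: deviation of the sample mean from mu,
# in units of the estimated standard error
t = (x_bar - mu) / (s / math.sqrt(n))
df = n - 1                       # degrees of freedom
print(round(t, 2), df)
```

Note that statistics.stdev uses n − 1, the sample standard deviation the t-test requires, whereas statistics.pstdev would divide by n.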

On the other hand, the z-test is also a univariate test, but it is based on the standard normal
distribution. The z-test is used to test hypotheses about a population mean (or the difference
between the means of two independent samples) when the population variance is known. It
determines how far a data point lies from the mean of the data set, in standard deviation units.
The researcher adopts the z-test when the population variance is known; in essence, when the
sample size is large, the sample variance is deemed approximately equal to the population
variance. In this way the variance is treated as known, despite the fact that only sample data
are available, and so the normal test can be applied.

Assumptions of Z-test:

● All sample observations are independent.
● Sample size should be more than 30.
● The distribution of Z is normal, with mean zero and variance one.

The test statistic is:

z = (x̄ − μ) / (σ / √n)

where
x̄ is the sample mean,
σ is the population standard deviation,
n is the sample size, and
μ is the population mean.
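The z statistic follows the same pattern, with the known population standard deviation in place of the sample estimate. The figures below are hypothetical, chosen for illustration:

```python
import math

# Hypothetical figures: a sample of n = 36 with mean 105, drawn from a
# population with known mean 100 and known standard deviation 15
x_bar, mu, sigma, n = 105, 100, 15, 36

# z statistic: deviation of the sample mean from mu,
# in units of the (known) standard error
z = (x_bar - mu) / (sigma / math.sqrt(n))   # 5 / 2.5 = 2.0

# Two-tailed test at the .05 level: reject H0 if |z| > 1.96
print(z, abs(z) > 1.96)
```

Because sigma enters directly, no degrees-of-freedom correction is needed; the statistic is referred to the standard normal distribution.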

BASIS FOR COMPARISON   T-TEST                             Z-TEST

MEANING                A parametric test applied to       A hypothesis test which
                       identify how the means of two      ascertains whether the means
                       sets of data differ from one       of two datasets differ from
                       another when variance is not       each other when variance is
                       given.                             given.

BASED ON               Student's t-distribution           Normal distribution

POPULATION VARIANCE    Unknown                            Known

SAMPLE SIZE            Small                              Large
