Statistics in Psychology
NOTE: All questions are compulsory
Answer the following questions in about 1000 words (wherever applicable) each.
Q. 1. Define Statistics. Discuss types, scope and use of statistics.
Ans. The word statistics is derived from the Latin word status or the Italian word statista, which means statesman. It was
used in the 18th century by Professor Gottfried Achenwall. These words were used for the political state during the
early period. Statista was used for keeping census records or data on a state's wealth. Its meaning and usage
have gradually changed.
Statistics conveys different meanings in singular and plural sense.
Statistics in Singular Sense
In singular sense, it is a branch of science that deals with classification, tabulation and analysis of numerical facts
and makes decision on that basis. It includes statistical methods for collection, classification, analysis and interpretations
of data.
Statistics in Plural Sense
In plural sense, statistics means quantitative information or available data. For instance, information on
the population or demographic features of a country, or the enrolment of students in a college, are statistics.
Webster's defines statistics as the classified facts on the conditions of the people in a State, where those facts can be
presented in numbers, in tables of numbers or in a classified arrangement.
In plural sense, Horace Secrist describes statistics as aggregates of facts affected to a marked extent by
multiplicity of causes, numerically expressed, enumerated or estimated as per the reasonable standard of accuracy,
collected in a systematic manner for a pre-determined purpose and placed in relation to each other. Thus, statistics
should have the following characteristics:
1. They must be aggregates of facts, i.e., no individual figure is regarded as statistics.
2. They are affected by a multiplicity of factors. For example, the yield of a crop is affected by
various circumstances, i.e., soil, seed, rainfall, temperature, etc.
3. They must be enumerated or estimated according to reasonable standards of accuracy. However, the degree of
accuracy depends on the nature of the data. Again, whatever standard of accuracy is once adopted, it should be
maintained throughout the whole study.
4. They must be collected in a systematic manner for a predetermined purpose i.e., the data must be properly
5. They must be placed in relation to each other i.e., the facts should be comparable regarding time, space or
Thus, we can say that all statistics are numerical statements of facts, but all numerical statements of facts cannot
be called statistics.

Definition of Statistics
Statistics deals with classification, tabulation and analysis of numerical facts. Statisticians have defined these
aspects of statistics in different ways.
A.L. Bowley defines statistics in the following ways:
(i) Statistics is the science of counting. Its focus is on enumeration aspect.
(ii) Statistics is the science of average.
(iii) It is the science of measurement of social organism as a whole in all its manifestations.
Yet Bowley did not cover all aspects of statistics with these definitions.
Seligman defines statistics as the science that deals with the methods of collecting, classifying, presenting,
comparing and interpreting numerical data collected to throw some light on any sphere of enquiry.
According to Croxton and Cowden, statistics is the collection, presentation, analysis and interpretation of numerical
data. This definition covers all aspects of statistics.
Given below are different aspects of statistics:
Collection of Data: This is the first basic step in a study. Collection of data may be from primary or secondary
source or from both the sources as per the requirement.
Classification and Presentation: This process comes after data collection. It means arranging data in a format
to draw some conclusions. Classification of data means arrangement of data in groups as per their similarities.
Tabulation: It is the presentation of data in a table. Classified and tabulated data can be presented in diagrams
and graphs.
Analysis of Data: Analysis of data is conducted to process the observed data and transform it in such a manner
as to make it suitable for decision-making.
Interpretation of Data: Data is interpreted to make that data useful in real life. The quality of interpretation
depends on the experience of the researcher.
Types of Statistics
Statistics can be classified in two ways: on the basis of functions and on the basis of distribution.
On the Basis of Functions
On this basis, statistics can be classified into three types:
(i) Descriptive statistics,
(ii) Correlational statistics, and
(iii) Inferential statistics.
(i) Descriptive Statistics: Descriptive statistics describes the main features of a collection of information. It
aims to summarize a sample, rather than use the data to learn about the population that the sample of data is thought
to represent. Some measures that are commonly used to describe a data set are measures of central tendency and
measures of variability or dispersion. Measures of central tendency include the mean, median and mode, while
measures of variability include the standard deviation and the range (the minimum and maximum values of a variable).
(ii) Correlational Statistics: Correlational statistics show whether and how strongly pairs of variables are
related. An intelligent correlation analysis can lead to a greater understanding of the data. Correlation can tell whether
the relationship is positive or negative and the strength of relationship. Correlation is a powerful tool that provides
these vital pieces of information.
(iii) Inferential Statistics: Statistical inference is the process of drawing conclusions from data that are subject
to random variation. Inferential statistics are used to test hypotheses and make estimations using sample data. The
outcome of statistical inference may be an answer to the question what should be done next, where this might be a
decision about making further experiments or surveys, or about drawing a conclusion before implementing some
organizational or governmental policy.
On the Basis of Distribution of Data
On the basis of the distribution of data, statistics can be classified as parametric and non-parametric. Both
deal with a population or a sample. A population means the total number of items in a sphere of enquiry. In statistics, a
population is often, though not always, finite. Kerlinger defines the terms population and universe as all the members of any well-defined class of
people, events or objects. A statistical population may have one of three types of properties:
(a) a finite number of knowable items,
(b) a finite number of unknowable items, and
(c) an infinite number of items.
Sample is a set of data collected and/or selected from a statistical population by a defined procedure. Since the
population is very large, making a census or a complete enumeration of all the values in the population impractical or
impossible, sample represents a subset of manageable size. Samples are collected and statistics are calculated from
the samples so that one can make inferences or extrapolations from the sample to the population. This process of
collecting information from a sample is referred to as sampling.
Parametric statistics assumes that the data has come from a type of probability distribution and makes inferences
about the parameters of the distribution. Most well-known elementary statistical methods are parametric.
Generally speaking, parametric methods make more assumptions than non-parametric methods. If those extra
assumptions are correct, parametric methods can produce more accurate and precise estimates. They are said to
have more statistical power. However, if the assumptions are incorrect, parametric methods can be very misleading. For
that reason they are often not considered robust. On the other hand, parametric formulae are often simpler to write
down and faster to compute. In some, but definitely not all, cases their simplicity makes up for their non-robustness,
especially if care is taken to examine diagnostic statistics. Parametric statistics proceeds only after confirming that the
population is normally distributed.
Advantages of parametric statistics are that it is more reliable and authentic relative to the non-parametric
statistics. It is also more powerful to establish the statistical significance of effects and differences among variables. It
is also more appropriate in research applications.
Disadvantages of parametric statistics are that it rests on the rigid assumption of normal distribution, which narrows the
scope of its usage. With small samples, parametric statistics often cannot be used, since a normal distribution cannot be
assumed. Besides, in parametric statistics computation is lengthy and complex because of large samples and numerical
calculations. Some of the major parametric statistics used for data analysis include the t-test, F-test and r-test.
In non-parametric statistics, we do not have to make any assumption of normality for the population we are
studying. Indeed, the methods do not depend on the distribution of the population of interest. It is for this reason that
non-parametric methods are also referred to as distribution-free methods. Non-parametric methods are growing in popularity
and influence for a number of reasons. The main reason is that we are not constrained to make as many assumptions
about the population that we are working with as we have to make with a parametric method. Many of these
non-parametric methods are easy to apply and to understand. Chi-square, Spearman's rank difference method of
correlation, Kendall's rank difference method and the Mann-Whitney U-test are some of the non-parametric methods.
Scope and Use of Statistics
The scope of statistics is wide. Some of its uses are given below:
Policy Planning: Statistics is used in analyzing data for policy planning. For instance, companies use previous
sales data to make future strategies to achieve maximum benefit in sales and profit.
Management: Statistics is used for effective management. Organisations use data on various aspects of the work
and well-being of employees and on the management of the business.
Behavioural and Social Sciences: In social sciences, statistics is used to understand patterns of behaviour and
trends. Parametric statistics are used to explain the pattern of activities when the
characteristics of the population being studied are normally distributed; non-parametric statistics are used when they are not.
Education: The use of statistics has become indispensable for educational research. Statistics is used to describe
and analyse the groups surveyed by means of quantitative treatment. In using statistical techniques various problems
are involved.
Commerce and Accounts: Statistics has become indispensable in commerce and accounts for estimating money
matters so that funds can be managed properly across various sectors. It is used in cost and benefit analysis for
maximum benefit at minimum cost.
Industries: Statistics is used to handle daily matters at various levels in big as well as small organisations. It is
used to manage data on expenditure and staff.
Statistical tools are also used to differentiate among employees.

Pure Sciences and Mathematics: Statistical tools are used in pure sciences and to see differences on different
occasions in various conditions. Statistics is a branch of mathematics. It helps in understanding differences among
properties of various applications in mathematics.
Problem Solving: It helps in finding out the best applicable solution in a problem situation. It is used to analyse
the pattern of response and the correct solution and thus minimizes the error factor.
Theoretical Researches: Statistical measures are used to decide on the facts and data whether a particular
theory can be maintained or challenged. The significance of facts for a particular paradigm or phenomena is established
by statistical analyses.
Q. 2. Find out whether correlation exists between the following sets of scores using Spearman's rank correlation.
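Data X: 43 23 34 32 34 34 45 32 23 21
Data Y: 23 34 35 34 44 65 65 34 43 43
Scores are ranked in ascending order, and tied scores are given the average of the ranks they occupy.
X     Y     RX     RY     D = RX - RY     D^2
43    23    9      1      8               64
23    34    2.5    3      -0.5            0.25
34    35    7      5      2               4
32    34    4.5    3      1.5             2.25
34    44    7      8      -1              1
34    65    7      9.5    -2.5            6.25
45    65    10     9.5    0.5             0.25
32    34    4.5    3      1.5             2.25
23    43    2.5    6.5    -4              16
21    43    1      6.5    -5.5            30.25
Sum of D^2 = 126.5
$$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 126.5}{10(100 - 1)} = 1 - \frac{759}{990} \approx 0.23$$
Hence a weak positive correlation (r_s of about 0.23) exists between the two sets of data.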
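The rank-difference formula can be checked with a short Python sketch. It assumes that tied scores receive the average of the ranks they span (the usual convention), so its result can differ from a hand computation that ranks ties another way. The function names are illustrative, not part of any library.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1           # first 1-based position of v
        count = ordered.count(v)               # number of scores tied at v
        ranks.append(first + (count - 1) / 2)  # mean of the tied positions
    return ranks


def spearman_rs(x, y):
    """Spearman's r_s = 1 - 6 * sum(D^2) / (n * (n^2 - 1))."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


x = [43, 23, 34, 32, 34, 34, 45, 32, 23, 21]
y = [23, 34, 35, 34, 44, 65, 65, 34, 43, 43]
rs = spearman_rs(x, y)
print(round(rs, 2))
```

With many ties the simple formula is only an approximation; computing Pearson's r on the averaged ranks is the stricter alternative.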
Q. 3. With the help of the t-test, find whether a significant difference exists in the scores obtained by the
two groups of employees on a self-esteem scale.
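Group A: 46 34 23 12 23 21 33 23 12 23
Group B: 32 34 34 32 34 45 43 45 43 5
Mean of Group A = 250 / 10 = 25. Mean of Group B = 347 / 10 = 34.7 (about 35).
In the table, a = A - 25 and b = B - 34.7 are the deviations of the scores from their group means.
A      B      a      a^2     b        b^2
46     32     21     441     -2.7     7.29
34     34     9      81      -0.7     0.49
23     34     -2     4       -0.7     0.49
12     32     -13    169     -2.7     7.29
23     34     -2     4       -0.7     0.49
21     45     -4     16      10.3     106.09
33     43     8      64      8.3      68.89
23     45     -2     4       10.3     106.09
12     43     -13    169     8.3      68.89
23     5      -2     4       -29.7    882.09
250    347           956              1248.10
$$SD = \sqrt{\frac{\sum a^2 + \sum b^2}{(N_1 - 1) + (N_2 - 1)}} = \sqrt{\frac{956 + 1248.1}{9 + 9}} = \sqrt{122.45} \approx 11.07$$
$$SE_D = SD \sqrt{\frac{N_1 + N_2}{N_1 N_2}} = 11.07 \times \sqrt{\frac{20}{100}} = 11.07 \times \sqrt{\frac{1}{5}} \approx 4.95$$
$$t = \frac{M_A - M_B}{SE_D} = \frac{25 - 34.7}{4.95} \approx -1.96$$
df = (10 - 1) + (10 - 1)
= 9 + 9 = 18
For 18 df, the table value of t is 2.10 at the .05 level and 2.88 at the .01 level. The obtained value (1.96, ignoring sign) is
smaller than 2.10. Therefore, the difference between the scores of the two groups falls just short of significance at the .05 level.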
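The computation can be verified with a minimal Python sketch of the independent-samples t-test with a pooled standard deviation. The function name `pooled_t` is illustrative, and equal population variances are assumed, as the pooled formula requires.

```python
import math


def pooled_t(a, b):
    """Return (t, df) for two independent groups using a pooled SD."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((v - m1) ** 2 for v in a)   # sum of squared deviations, group A
    ss2 = sum((v - m2) ** 2 for v in b)   # sum of squared deviations, group B
    sd = math.sqrt((ss1 + ss2) / ((n1 - 1) + (n2 - 1)))
    se_d = sd * math.sqrt((n1 + n2) / (n1 * n2))   # standard error of the difference
    return (m1 - m2) / se_d, (n1 - 1) + (n2 - 1)


group_a = [46, 34, 23, 12, 23, 21, 33, 23, 12, 23]
group_b = [32, 34, 34, 32, 34, 45, 43, 45, 43, 5]
t, df = pooled_t(group_a, group_b)
print(round(t, 2), df)
```

The sign of t only shows which group has the higher mean; the absolute value is what gets compared with the table value for the given df.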
Answer the following questions in about 400 words (wherever applicable) each.
Q. 4. Compute mean, median and mode for the following data.
34, 34, 45, 71, 34, 45, 69, 34, 43, 82, 45, 84, 59, 71, 32, 74, 65, 77, 65, 78, 87, 76
Ans. Data: 34, 34, 45, 71, 34, 45, 69, 34, 43, 82, 45, 84, 59, 71, 32, 74, 65, 77, 65, 78, 87, 76.
Ascending order: 32, 34, 34, 34, 34, 43, 45, 45, 45, 59, 65, 65, 69, 71, 71, 74, 76, 77, 78, 82, 84, 87
Mode = 34 (as it occurs most often)
Median: n = 22 is even, so
$$\text{Median} = \frac{\left(\frac{n}{2}\right)\text{th score} + \left(\frac{n}{2} + 1\right)\text{th score}}{2} = \frac{11\text{th score} + 12\text{th score}}{2}$$
$$= \frac{65 + 65}{2} = \frac{130}{2} = 65$$
Median = 65
$$\text{Mean} = \frac{32 + 34 + 34 + 34 + 34 + 43 + 45 + 45 + 45 + 59 + 65 + 65 + 69 + 71 + 71 + 74 + 76 + 77 + 78 + 82 + 84 + 87}{22} = \frac{1304}{22}$$
= 59.3 approx.
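These three values can be cross-checked with Python's standard `statistics` module. This is only a verification sketch; the variable names are illustrative.

```python
import statistics

scores = [34, 34, 45, 71, 34, 45, 69, 34, 43, 82, 45,
          84, 59, 71, 32, 74, 65, 77, 65, 78, 87, 76]

mean = statistics.mean(scores)      # sum of the scores divided by their number
median = statistics.median(scores)  # mean of the two middle scores when n is even
mode = statistics.mode(scores)      # the most frequent score

print(len(scores), round(mean, 1), median, mode)
```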
Q. 5. Tabulate the following scores in cumulative frequency distribution with the use of an appropriate
Class Interval    Frequency    Cf
30-40             2            2
40-50             4            6
50-60             16           22
60-70             17           39
70-80             15           54
80-90             14           68
90-100            5            73
Q. 6. Discuss graphical presentation of data with suitable examples and diagrams.
Ans. Data may also be presented graphically on a pictorial platform formed of horizontal and vertical lines,
known as graph. Two mutually perpendicular lines called the X and Y-axes on which appropriate scales are indicated
are used to plot a graph. The horizontal line is called the abscissa and the vertical line the ordinate. There are different types of
graphs such as bar graphs, line graphs, pie charts, pictographs, etc. Some of these are discussed below.
Histogram
This type of graph is made by using the frequencies of the data. A histogram consists of a series of rectangles, each with its
width equal to the class-interval of the variable on the horizontal axis and its height equal to the corresponding frequency
on the vertical axis. The upper limit of a class is the lower limit of the following class. Given below are the steps for
constructing a histogram:
Step 1: Present the frequency distribution data in a table.
Step 2: Plan a suitable scale for horizontal axis and the number of squares needed as per the width of the graph.
Step 3: Draw bars. The width and the height of each bar should correspond to the class-interval and the frequency in
that particular interval. The edge of a bar stands for the lower real limit of the higher interval and the upper real limit
of the lower interval.
Step 4: Use either the real limits or the midpoints of the class-intervals to identify the class-intervals along the horizontal axis.
Real limits are placed under the edges of each bar. If we use the midpoint of a class-interval, it is put under the middle
of the bar.
Step 5: Label both axes and give an appropriate title to the histogram.
Table 8: Results of 200 students in Academic Achievement Test.
Class-interval    Frequency
10-20             15
20-30             7
30-40             45
40-50             35
50-60             55
60-70             18
70-80             25
We will build a histogram on the basis of the data given above.
[Figure: Histogram of the data in Table 8. X-axis: Achievement Scores (10 to 80); Y-axis: Frequency]

Frequency Polygon
Frequency polygons are graphical devices for understanding the shapes of distributions. They serve the same
purpose as histograms, but are especially helpful for comparing sets of data. Frequency polygons are also a good
choice for displaying cumulative frequency distributions. To create a frequency polygon, start just as for histograms,
by choosing a class-interval. Then draw an X-axis representing the values of the scores in your data. Mark the middle
of each class-interval with a tick mark, and label it with the middle value represented by the class. Draw the Y-axis to
indicate the frequency of each class. Place a point in the middle of each class interval at the height corresponding to
its frequency. Finally, connect the points. You should include one class-interval below the lowest value in your data
and one above the highest value. The graph will then touch the X-axis on both sides. The following is the frequency
polygon of the data given in Table 8.
[Figure: Frequency polygon of the data in Table 8. X-axis: Achievement Scores (0 to 90); Y-axis: Frequency]
Frequency Curve
To draw the frequency curve it is first necessary to draw the polygon. The polygon is then smoothed out,
keeping in view the fact that the area under the curve should be equal to that of the histogram. The objective is to remove
erratic fluctuations in the data presented.
The frequency curve here is based on the data given in Table 8.
Cumulative Frequency Curve or Ogive
Cumulative frequency curve or ogive refers to the graph of a cumulative frequency distribution. There are two
types of ogives since there are two types of cumulative frequency distribution: less than and more than cumulative
frequency distributions.
(i) Less than Ogive: It is plotted against the upper class boundaries of the respective classes. It is a rising
curve that slopes upwards from left to right.
(ii) More than Ogive: It is plotted against the lower class boundaries of the respective classes. It is a falling
curve that slopes downwards from left to right.
Examples of both ogives are based on the data given in the following table:
Class-interval    Frequency    Less than c.f.    More than c.f.
10-20             12           12                200
20-30             11           23                188
30-40             34           57                177
40-50             56           113               143
50-60             44           157               87
60-70             26           183               43
70-80             17           200               17
The figure given below has both the ogives:
[Figure: Less than and more than type ogives]
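The two cumulative frequency columns, and the points at which each ogive is plotted, can be derived from the class frequencies with a short Python sketch. The variable names are illustrative; the frequencies are those of the table above.

```python
from itertools import accumulate

classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [12, 11, 34, 56, 44, 26, 17]

# Less-than c.f.: running total from the lowest class upwards.
less_than = list(accumulate(freqs))
# More-than c.f.: running total from the highest class downwards.
more_than = list(accumulate(reversed(freqs)))[::-1]

# Less-than ogive: plotted against the upper class boundaries.
less_points = [(upper, cf) for (_, upper), cf in zip(classes, less_than)]
# More-than ogive: plotted against the lower class boundaries.
more_points = [(lower, cf) for (lower, _), cf in zip(classes, more_than)]

print(less_points)
print(more_points)
```

Passing either list of points to any plotting tool and joining them gives the corresponding ogive.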

Misuse of Graphical Presentations
Utmost care should be taken while presenting data graphically, because manipulation of the vertical (ordinate
or Y-axis) and horizontal (abscissa or X-axis) scales of a graph can distort the data. By eliminating the zero point on the
ordinate, differences among bars or the ups and downs of a curve can be exaggerated as one wants.
Q. 7. Describe the significance of measures of Dispersion. Discuss the properties and limitations of
average deviation and standard deviation.
Ans. The measures of variability are important because of the following reasons:
(i) Measures of variability show to what extent an average represents the given data. Small variation indicates
high uniformity of values in the distribution and the average stands for the characteristics of the data. Large
variation indicates lower degree of uniformity and unreliable average.
(ii) Measures of variation show the nature and cause of variation. This information is useful to control the

(iii) Measures of variation help in understanding uniformity or consistency in data when comparing the spread
in two or more sets of data.
(iv) Measures of variation increase the scope of using statistical techniques like correlation and regression.
The Average Deviation
Deviation Scores: Deviation scores express the differences or deviations of scores from some value, usually the
mean. To convert data to deviation scores typically means to subtract the mean from each score. The
average deviation is one of several indices of variability that characterize the dispersion among the measures in a given
population. To calculate the average deviation of a set of scores it is first necessary to compute their mean and then
find the distance between each score and that mean without regard to whether the score is above or below the
mean. The average deviation is defined as the mean of these absolute values. Garrett says, "The average deviation is
the mean of the deviations of all of the separate scores in a series taken from their mean." Guilford defines it thus: "The
average deviation is the arithmetic mean of all the deviations when we disregard the algebraic signs."
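The definition translates directly into a few lines of Python. This is a minimal sketch; the function name and the scores are illustrative only.

```python
def average_deviation(scores):
    """Mean of the absolute deviations of the scores from their mean."""
    mean = sum(scores) / len(scores)
    return sum(abs(x - mean) for x in scores) / len(scores)


scores = [13, 18, 13, 14, 13, 16, 14, 21, 13]   # mean = 15
ad = average_deviation(scores)
print(round(ad, 2))
```

Here the absolute deviations are 2, 3, 2, 1, 2, 1, 1, 6 and 2, which sum to 20, so the average deviation is 20 / 9.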
Properties of the Average Deviation
It is easy to calculate the average deviation. Equal weight is given to each observed value in the calculation of
the average deviation. Thus, it shows how far each observation lies from the mean.
Limitation of the Average Deviation
In calculating the average deviation, we treat all deviations as positive, ignoring their algebraic signs. Because of this
limitation, it is not used in inferential statistics.
The Standard Deviation
The standard deviation, represented by the Greek letter sigma (σ), measures the amount of variation or dispersion
from the average. Standard deviation is also known as root-mean-square deviation. Karl Pearson first
used the term standard deviation in 1894. Guilford defines it thus: "Standard deviation is the square root of the arithmetic
mean of the squared deviations of measurements from their mean."
Properties of the Standard Deviation
A low standard deviation indicates that the data points tend to be very close to the mean. A high standard
deviation indicates that the data points are spread out over a large range of values. A mean with a smaller standard deviation
is more reliable than one with a larger standard deviation. A smaller standard deviation indicates the homogeneity of the data. If
there is large difference between mean and standard deviation, the theory being tested probably needs to be revised.
Limitation of the Standard Deviation
Extreme values get more importance and numbers closer to the mean get less importance when the standard deviation is
calculated. Squaring the deviations from the mean (X - M) makes large deviations larger still.
For example, the deviations 3 and 12 are in the ratio 1:4, but their squares 9 and 144 are in the ratio 1:16.
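The effect of squaring can be seen in a small Python sketch that computes the standard deviation step by step and compares how much one extreme score contributes before and after squaring. The scores are illustrative; N (not N - 1) is used as the divisor, matching the root-mean-square definition above.

```python
import math
import statistics

scores = [13, 18, 13, 14, 13, 16, 14, 21, 13]
mean = sum(scores) / len(scores)               # 15.0

squared = [(x - mean) ** 2 for x in scores]    # squared deviations from the mean
variance = sum(squared) / len(scores)          # mean of the squared deviations
sd = math.sqrt(variance)                       # root-mean-square deviation

# Contribution of the most extreme score (21) to each total:
share_abs = abs(21 - mean) / sum(abs(x - mean) for x in scores)   # 6 out of 20
share_sq = (21 - mean) ** 2 / sum(squared)                        # 36 out of 64

print(round(sd, 2), share_abs, share_sq)
print(math.isclose(sd, statistics.pstdev(scores)))   # agrees with the library value
```

The single score 21 supplies 30% of the total absolute deviation but more than half of the total squared deviation, which is the limitation described above.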
Q. 8. Discuss Chi-Square as a test of Goodness of Fit. Describe the steps involved in Chi-square with
the help of suitable example.
Ans. Chi-square as a Test of Goodness of Fit: We use chi-square test to know how the observed value of a
given phenomenon is different from the expected value. We compare the observed sample distribution with the
expected probability distribution. The test determines how well theoretical distribution like normal, binomial and Poisson

fits the empirical distribution. We divide the sample data into intervals in chi-square goodness of fit test. We compare
the numbers of points that fall into the interval with the expected numbers of points in each interval.
In the chi-square goodness of fit test, the null and alternative hypotheses are stated as follows:
(a) Null Hypothesis: It states that there is no significant difference between the observed and the expected value.
(b) Alternative Hypothesis: It states that there is a significant difference between the observed and the
expected value.
We will calculate chi-square using the formula and find out whether the obtained chi-square value is significant at
.05 or .01 levels.
Steps for Chi-square Testing
First, we formulate the null hypothesis.
Second, we collect the data and find the observed frequencies.
Third, we get the expected frequencies. When all categories are expected to be equally likely, the expected frequency is
the sum of all the observed frequencies divided by the number of categories.
Fourth, we get the difference between the observed and the expected frequency for each category.
Fifth, we square each of these differences.
Sixth, we divide each squared difference by the expected frequency and get a quotient.
Seventh, we get the sum of these quotients, which is the value of χ².
Eighth, we find the degrees of freedom and get the critical value of χ² from the table.
Finally, we compare the calculated and table values of χ² and use the following decision rule. We accept
the null hypothesis if the obtained value is less than the critical value given in the table. We reject the null hypothesis if the
obtained value is more than the table value at the .05 or .01 significance level.
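As a worked illustration of these steps, the sketch below tests whether 100 respondents choose equally among four options. The observed counts are hypothetical, invented only for this example, and the function name is illustrative. The critical values 7.815 (.05 level) and 11.345 (.01 level) are the standard table values for 3 degrees of freedom.

```python
def chi_square_equal_expected(observed):
    """Chi-square goodness of fit against equal expected frequencies."""
    k = len(observed)
    expected = sum(observed) / k                      # third step
    quotients = [(o - expected) ** 2 / expected       # fourth to sixth steps
                 for o in observed]
    return sum(quotients), k - 1                      # seventh step, and df


observed = [30, 20, 25, 25]          # hypothetical observed frequencies
chi2, df = chi_square_equal_expected(observed)

critical_05 = 7.815                  # table value of chi-square for df = 3 at .05
reject_null = chi2 > critical_05

print(chi2, df, reject_null)
```

Here each expected frequency is 25, so χ² = (25 + 25 + 0 + 0) / 25 = 2.0 with 3 df. Since 2.0 is less than 7.815, the null hypothesis of equal preference is accepted for these hypothetical data.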
Answer the following in about 50 words each.
Q. 9. Level of Significance
Ans. Researchers decide on the level of significance at which they want to test their hypothesis.
In social sciences, researchers most often use the .05 and .01 levels of significance. When we use the .05 or 5% level of
significance to reject a null hypothesis, it implies that there are only 5 chances out of 100 of rejecting the null hypothesis
when it is actually true, i.e., of taking a chance difference for a true difference.
If we use the .01 or 1% level of significance to reject a hypothesis, there is only 1 chance out of 100 of rejecting a
null hypothesis that is actually true. The level of significance a researcher will accept should
be set by the researcher before collecting the data.
Q. 10. Parameter
Ans. A parameter is any numerical quantity that characterizes a given population or some aspect of it. This
means the parameter tells us something about the whole population. The most common statistical parameters are the
measures of central tendency. These tell us how the data behave on average. For example, mean, median
and mode are measures of central tendency that give us an idea about where the data concentrates. Standard
deviation tells us how the data is spread from the central tendency, i.e., whether the distribution is wide or narrow.
Such parameters are often very useful in analysis.
Q. 11. Pie Diagram
Ans. A pie diagram consists of a circle divided into as many sectors as there are components of a given
variable. The area of each sector is in proportion to the angle it subtends at the centre of the circle, and hence to the
relative frequency of the class.
The sum of the angles of the sectors is taken as 360°. So, in order to get the angles of the desired sectors, we divide
360° in the proportion of the various relative frequencies.
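The division of 360° among the sectors can be sketched in Python. The frequencies used are those of Table 8 (the achievement test results of 200 students); the function name is illustrative.

```python
def sector_angles(frequencies):
    """Angle of each pie sector: relative frequency times 360 degrees."""
    total = sum(frequencies)
    return [f / total * 360 for f in frequencies]


frequencies = [15, 7, 45, 35, 55, 18, 25]   # class frequencies from Table 8
angles = sector_angles(frequencies)

print([round(a, 1) for a in angles])
```

A class with 15 of the 200 students, for example, gets a sector of 15 / 200 × 360° = 27°.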
Q. 12. Variance
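Ans. The variance refers to the sum of the squares of the deviations of each score of a variable from the mean of
that variable, divided by the number of observations.
Thus, $V = \sum x^2 / N$, where $x = X - M$ is the deviation of a score from the mean.
Standard deviation of variable X: $\sigma = \sqrt{\sum f x^2 / N}$, where $f$ is the frequency of each score.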
Q. 13. Skewness
Ans. Skewness: Skewness means lack of symmetry. A normal curve has a balance between the right and left
halves of the curve. It is always symmetrical. The mean, median and mode fall at the same point. In skewed curve,

the mean and median fall at different points and the balance is shifted to the left or to the right. Look at the figure
given below:

The following are the properties of a normal curve:

(i) The normal curve is a possible model of probability distribution.
(ii) The normal curve is not a single curve but an infinite number of possible curves. The same algebraic expression describes all of them.
Normal curves are similar in shape and symmetry. In normal curves tails never touch the X-axis. They are
bilaterally symmetrical. Most of the area under normal curve falls within a limited range of the number line. The total
area of normal curves is 1.00. So the area in each half of the distribution is 0.5.
Q. 14. Mean
Ans. Mean: The mean of a variable X is calculated by dividing the sum of scores (ΣX) by the number of observations
(N). Thus the formula is $\text{Mean} = \sum X / N$.
The mean is the usual average. We add up all the numbers and then divide by the number of numbers. For the
following list of values
13, 18, 13, 14, 13, 16, 14, 21, 13
the mean is
$$\text{Mean} = \frac{13 + 18 + 13 + 14 + 13 + 16 + 14 + 21 + 13}{9} = \frac{135}{9} = 15$$
Q. 15. Negative correlation
Ans. Negative Correlation
In the case of negative correlation, the two variables move in opposite directions. It means that if the values of one
variable decrease, the values of the other variable increase. If the values of one variable increase, the values of the other
variable decrease. For example, as the temperature decreases, the sales of woollen clothes increase.
The figure given below presents scatterplot of the negative relationship. Here the higher scores on X-axis are
associated with lower scores on Y-axis and lower scores on X-axis are associated with higher scores on Y-axis. Here
higher scores on temperature are linked with the lower score on sales of woollen clothes. Lower temperatures are
associated with higher sales of the woollen clothes.


[Figure: Negative relationship between temperature and sales of woollen clothes]

Q. 16. Nominal Scale
Ans. Nominal scale refers to numbers assigned for the purpose of identification. For example, numbers are
written on football players' jerseys. The numbers do not indicate that one player is better or worse than another. The numbers
only serve the purpose of identification. There is no mathematical procedure followed for giving these numbers.
Q. 17. Multiple Bar Diagram
Ans. Multiple Bar Diagram: For representing different variables, multiple bar diagrams are used by constructing
the bars adjoining each other. The method for constructing a multiple bar diagram is quite similar to that of the simple bar diagram; the
only difference is that it shows the relationship among two or more components of the same variable. It facilitates
the comparison between the different values of same variables or different variables. The bars are shaded differently
or may be shown in different colours.
Q. 18. Phi Coefficient

Ans. Pearson's Product Moment Correlation is called the Phi coefficient (phi) when both the
variables (X and Y) are dichotomous. If we investigate the relationship between employment status and marital status,
both the variables are dichotomous. Employment status is dichotomous, having two levels: employed and unemployed.
Marital status is dichotomous with two levels: married and unmarried. Here the Phi coefficient (phi) will be used.
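For a 2 × 2 table with cell counts a, b (first row) and c, d (second row), the Phi coefficient reduces to (ad - bc) divided by the square root of the product of the four marginal totals. The Python sketch below applies this to the employment and marital status example. The counts are hypothetical, invented only for illustration, and the function name is illustrative.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table [[a, b], [c, d]] of frequencies."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# Hypothetical counts:        married  unmarried
#   employed                    30        20
#   unemployed                  10        40
phi = phi_coefficient(30, 20, 10, 40)
print(round(phi, 3))
```

With these counts, ad - bc = 1000 and the product of the marginal totals is 50 × 50 × 40 × 60 = 6,000,000, giving a phi of about 0.41.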


