Assignment – 1

Course Code: Educational Statistics (8614)

Spring 2021
Question. 1
What do you understand by statistics? What are the characteristics of statistics? Explain in
1. What is Statistics
Statistics is a broad subject with applications in a vast variety of fields. The word “statistics” is
derived from the Latin word “Status”, which means a political state. Statistics is a branch of
knowledge that deals with facts and figures. The term statistics refers to a set of methods and
rules for organizing, summarizing, and interpreting information. It is a way of getting information
from data. We can say that Statistics is a science of collecting, organizing, interpreting and
reporting data. It is a group of methods which are used for collecting, displaying, analyzing, and
drawing conclusions from the data. In other words, statistics is a methodology which a researcher
uses for collecting and interpreting data and drawing conclusions from the collected data.
Statistics can be used to answer questions like:
∙ What kind of data, and how much, do we need to collect?
∙ How should we organize and summarize the data?
∙ How can we analyze the data and draw conclusions from it?
∙ How can we assess the strength of the conclusions and evaluate their uncertainty?
The above discussion leads us to the conclusion that statistics provides methods for:
○ Design: Planning and carrying out research studies.
○ Description: Summarizing and exploring data.
○ Inferences: Making predictions and generalizations about phenomena represented by
the data.
2. Characteristics of Statistics
Following are the characteristics of statistics.
a. Statistics consists of aggregate facts
The facts which can be studied in relation to time, place or frequency can be called
statistics. A single isolated and unconnected fact or figure is not statistics because we
cannot study it in relation to other facts and figures. Only aggregate of facts e.g. academic
achievement of the students, I.Q. of a group of students, weight of students in a class,
profit of a firm etc. are called statistics.
b. Multiple causes affect Statistics
A phenomenon may be affected by many factors. We cannot study the effect of one
factor on the phenomenon by ignoring the others. To have a true picture we will have to
study the effects of all factors on the phenomenon separately as well as collectively,
because the effects of the factors can change with a change of place, time or situation. For
example, we can say that the result of class X in the board examination does not depend on any
single factor but collectively on standard of teachers, teaching methods, teaching aids,
practical performance of students, standard of question papers, environment of the
examination hall, exam supervisory staff and standard of evaluation of answers after the
c. Data should be numerically expressed, enumerated or estimated
Data to be called statistics should be numerically expressed so that counting or
measurement of data can be made possible. It means that the data or the fact must be in
quantitative form as achievement scores 60, 50, 85, 78, and 91 out of 100. If it is not in
quantitative form it should be quantified.
d. Statistics are enumerated or estimated according to reasonable standard of accuracy
For a clear picture of the phenomena under investigation, it should be researched using
reasonable standard of accuracy depending upon the nature and purpose of collection of
data. Data collection should be free from personal prejudices and biases. Biased and
personally prejudiced data lead to inaccurate conclusions.
e. Statistics are collected in a Systematic Manner
In order to have a reasonable standard of accuracy, statistics/data must be collected in a
very systematic manner. Any rough and haphazard method of collection is
undesirable, for it may lead to improper and wrong conclusions.
f. Statistics for a Pre-determined Purpose
Before collection of data, the investigator/researcher must have a purpose and then should
collect data accordingly. Data collected without any purpose is of no use. Suppose we
want to know the intelligence of a section of people; we must collect data relating to I.Q. level,
and data relating to income, attitude and interest level of that group of people will be of no
use. Without having a clear idea about the purpose we will not be in a position to
distinguish between necessary data and unnecessary data or relevant data and irrelevant
g. Statistics are Capable of being placed in Relation to each other
Statistics are collected for purposes such as comparison. They must be capable of being
compared; otherwise, they lose much of their significance. Comparison can be made only
if the data are homogeneous. For example, data on a memory test can be compared with I.Q. scores. It is with
the use of comparison only that we can illustrate changes which may relate to time, place,
frequency or any other character, and statistical devices are used for this purpose.

3. Importance and Scope of Statistics

Statistics is important in our daily life. We live in the information world and much of this information
is determined mathematically with the help of statistics. It means statistics keeps us informed
about day-to-day happenings. The importance of statistics in our daily life is discussed under the following
○ Every day we watch weather forecasts. They are possible due to computer models
based on statistical concepts. These models compare prior weather with the current
weather and predict future weather.
○ Statistics is frequently used by the researchers. They use statistical techniques to collect
relevant data. Otherwise there may be loss of money, time and other resources.
○ In the business market statistics plays a great role. Statistical techniques are the key to how
traders and businessmen invest and make money. Also, in industry, these tools are used
in quality testing. Production managers are always interested in finding out whether the
product conforms to the specification or not. They use statistical tools like inspection plans,
control charts etc.
○ Statistics also has a big role in the medical field. Before any drug is prescribed, pharmacists
show a statistically valid rate of effectiveness. Similarly, statistics is behind all other medical
studies. Doctors predict diseases on the basis of statistical concepts.
○ Print and electronic media use statistical tools to make predictions about the winners of elections
and the incoming government.
○ Statistics has widely been used in psychology and education to determine the reliability
and validity of a test, factor analysis etc.
○ Apart from the above, statistics has a wide application in marketing, production, finance,
banking, investment, purchase, accounting and management control.
Limitations of Statistics
The science of Statistics has the following limitations:
a. The use of statistics is limited to numerical studies
We cannot apply statistical techniques to all types of phenomena. These techniques can
only be applied to phenomena that are capable of being quantitatively measured and
numerically expressed. For example, health, intelligence, honesty, efficacy etc. cannot be
measured directly in quantitative terms, and thus are unsuitable for statistical study as they stand. In order to apply statistical
techniques to these constructs, we first have to quantify them.
b. Statistical techniques deal with population or aggregate of individuals rather than
with individuals
For example, when we say that the average height of a Pakistani is 1 meter and 80
centimeters, we mean the height not of any one individual but the average found by studying
all individuals living in Pakistan.
c. Statistics relies on estimation and approximations
Statistical techniques are not exact laws like mathematical or chemical laws. They are
derived by taking a majority of cases and are not true for every individual. Thus the
statistical inferences are uncertain.
d. Statistical results might lead to fallacious conclusions
Statistical results are represented by figures, which are liable to be manipulated. Also, the
data placed in the hands of an expert may lead to fallacious results because figures may
be stated without their context or may be applied to a fact other than the one to which they
really relate. An interesting example is a survey made some years ago which reported that
33% of all the girl students at Johns Hopkins University had married University teachers.
In fact, the University had only three girl students at that time, and one of them had married
a teacher.
Question. 2
What do you understand by the term “data”? Write in detail the types of data.
What is Data
The term “data” refers to the kind of information a researcher obtains to achieve objectives of his
research. All research processes start with collection of data, which plays a significant role in the
statistical analysis. This term is used in different contexts. In general, it indicates facts or figures
from which conclusions can be drawn. It is also the raw material from which information is obtained.
Data are the actual pieces of information that you collect through your study. In other words, data
can be defined as a collection of facts and details like text, figures, observations, symbols, or simply
descriptions of things, events or entities gathered with a view to drawing inferences. It is a raw fact
which should be processed to get information.
Data can be defined as a systematic record of a particular quantity. It is the different values of
that quantity represented together in a set. It is a collection of facts and figures to be used for a
specific purpose such as a survey or analysis. When arranged in an organized form, data can be called
information. The source of data (primary data, secondary data) is also an important factor.
Types of Data
In research, different methods are used to collect data, all of which fall into two categories, i.e.
primary data and secondary data. It is a common classification based upon who collected the
Primary data
Primary data, as the name suggests, is data collected for the first time by the researcher himself.
It is originated by the researcher for the purpose of addressing his research problem.
It is also known as first-hand raw data. The data can be collected using various methods like
survey, observations, physical testing, mailed questionnaire, questionnaire filled and sent by
enumerators, personal interviews, telephonic interviews, focus groups discussion, case studies,
Secondary data
Secondary data points to second-hand information already collected and recorded by another person
for a purpose not related to the current research problem. It is a readily available form of data and
saves the researcher time and cost. But as the data was gathered for a purpose other than the
problem under investigation, its usefulness may be limited in a number of ways,
such as relevance and accuracy. Also, the objectives and methods adopted to collect the data may not
be suitable to the current situation. Therefore, the researcher should be careful when using
secondary data. Examples of secondary data are census data, publications, internal records of
the organizations, reports, books, journal articles, websites etc.
Data may be qualitative or quantitative. Once you know the difference between them, you can
know how to use them.

Qualitative Data: They represent some characteristics or attributes. They depict descriptions that
may be observed but cannot be computed or calculated. For example, data on attributes such as
intelligence, honesty, wisdom, cleanliness, and creativity collected using the students of your
class as a sample would be classified as qualitative. They are more exploratory than conclusive in
Qualitative data is information that cannot be measured in the form of numbers. It is
also known as categorical data. It normally comprises words and narratives, which we label
with names.
It delivers information about the qualities of things. The outcome of qualitative data
analysis can take the form of highlighted key words, extracted information, and elaborated ideas.
For example:
Hair colour- black, brown, red
Opinion- agree, disagree, neutral
Quantitative Data: These can be measured and not simply observed. They can be numerically
represented and calculations can be performed on them. For example, data on the number of
students playing different sports from your class gives an estimate of how many of the total
students play which sport. This information is numerical and can be classified as quantitative.
Quantitative data is information gathered from a group of individuals that lends itself to
statistical analysis. Numerical data is another name for quantitative data. Simply put, it gives
information about the quantities of items in the data, which can be measured or estimated and
expressed in terms of numbers.
For example:
We can measure height (1.70 meters) or distance (1.35 miles) with the help of a ruler or tape.
We can measure water (1.5 litres) with a jug.
Under a further subdivision, nominal data and ordinal data come under qualitative data. Interval data and
ratio data come under quantitative data. Here we will read in detail about these data types.
Nominal Data
Nominal data are used to label variables that have no quantitative value and no order.
So, if you change the order of the values, the meaning remains the same.
Thus, nominal data are observed but not measured, are unordered and non-equidistant, and have
no meaningful zero.
The only comparison you can make on nominal data is to state that one observation is (or
is not) equal to another (equality or inequality), and you can use this to group the observations.
You cannot order nominal data, so you cannot sort them.
Nor can you perform arithmetic operations, as these are reserved for numerical data. With
nominal data, you can calculate frequencies, proportions, percentages, and a central point (the mode).
Examples of Nominal data:
What languages do you speak?
What’s your nationality?
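The calculations that are allowed on nominal data can be sketched in a few lines of Python. The list of answers below is hypothetical and is used only for illustration; the point is that categories are counted, never added or averaged.

```python
from collections import Counter

# Hypothetical answers to "What languages do you speak?" (illustrative data only)
responses = ["Urdu", "English", "Urdu", "Punjabi", "Urdu", "English"]

counts = Counter(responses)                  # frequency of each category
total = len(responses)
proportions = {k: v / total for k, v in counts.items()}
percentages = {k: 100 * p for k, p in proportions.items()}
mode = counts.most_common(1)[0][0]           # most frequent category

print(counts)
print(percentages)
print("Mode:", mode)
```

The mode is the only measure of central tendency used here, because a mean or median would require numbers or an order, which nominal data do not have.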
Ordinal Data
Ordinal data is similar to nominal data except that its categories
can be ordered, like 1st, 2nd, etc. However, the relative distances between
adjacent categories are not necessarily equal.
Ordinal data is observed but not measured, is ordered but non-equidistant, and has no
meaningful zero. Ordinal scales are often used for measuring happiness, satisfaction, etc.
Discrete Data: These are data that can take only certain specific values rather than a range of
values. For example, data on the blood group of a certain population or on their genders is termed
as discrete data. A usual way to represent this is by using bar charts.

Continuous Data: These are data that can take any value within a certain range between the lowest
and highest values. The difference between the highest and lowest value is called the range of
data. For example, the age of a person can take decimal values, and so can the
heights and weights of the students of your school. These are classified as continuous data.
Continuous data can be tabulated in what is called a frequency distribution. They can be
graphically represented using histograms.
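As a rough illustration of the range and of a frequency distribution, the following Python sketch groups some heights into classes. The heights (recorded to the nearest centimetre) and the class width of 10 cm are assumptions made for this example only.

```python
# Hypothetical heights of ten students in centimetres (illustrative data only)
heights = [152, 158, 161, 163, 165, 166, 170, 172, 175, 181]

lowest, highest = min(heights), max(heights)
data_range = highest - lowest        # range = highest value - lowest value

def frequency_distribution(values, start, width, n_classes):
    """Count how many values fall in each class [lower, upper)."""
    table = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        count = sum(1 for v in values if lower <= v < upper)
        table.append((lower, upper, count))
    return table

table = frequency_distribution(heights, start=150, width=10, n_classes=4)
print("Range:", data_range)
for lower, upper, count in table:
    print(f"{lower} to under {upper}: {count}")
```

Each row of the table corresponds to one bar of a histogram drawn from the same data.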
Key Differences Between Primary and Secondary Data
Some key differences between primary and secondary data are given in the following lines.
∙ Primary data refers to the data originated by the researcher for the first time. Secondary
data is already existing data, collected by other researchers, agencies, and organizations.
∙ Primary data is real-time data whereas secondary data is one which relates to the past.
∙ Primary data is collected to address the problem in hand while the purpose behind
collection of secondary data is different from the problem in hand.
∙ Collection of primary data is a laborious process. On the other hand collection of
secondary data is easy and rapid.
∙ Sources of primary data are survey, observations, physical testing, mailed questionnaire,
questionnaire filled and sent by enumerators, personal interviews, telephonic interviews,
focus groups discussion, case studies, etc. On the other hand, sources of secondary data are
census data, publications, internal records of the organizations, reports, books, journal
articles, websites etc.
∙ Collection of primary data requires a large amount of resources like time, cost, and human
resources. On the other hand, collection of secondary data is inexpensive and easily
∙ Primary data is specific to the researcher’s needs. He can control the quality of research.
On the other hand, secondary data is neither specific to researcher needs nor has he
control over the quality of data.
∙ Primary data is available in the raw form while secondary data has undergone some
statistical procedures and is refined from primary data.
∙ Data collected from primary sources are more reliable and accurate than the secondary
Question. 3
What types of characteristics a pictogram should have to successfully convey the
meaning? Write down the advantages and drawbacks of using pictograms.
A pictogram is a graphical symbol that conveys its meaning through its pictorial resemblance to
a physical object. A pictogram may include a symbol plus graphic elements such as border, back
pattern, or color that is intended to convey specific information. We can also say that a pictogram
is a kind of graph that uses pictures instead of bars to represent data under analysis. A pictogram
is also called “pictograph”, or simply “picto”. A pictogram or pictograph represents the frequency
of data as pictures or symbols. Each picture or symbol may represent one or more units of data.
Pictograms form a part of our daily lives. They are used in transport, medication, education,
computers etc. They indicate, in iconic form, places, directions, actions or constraints on actions
in either the real world (a road, a town, etc.) or the virtual world (computer, internet etc.).
The pictograph is a method to represent the data using images. Each image in the pictograph
represents certain things. In other words, pictographs define the frequency of the data using
images or symbols, which are relevant to the data. The pictograph is extremely easy to
understand, and it is one of the simplest ways to represent the statistical data. In the pictograph,
we use a key, which denotes the value of the symbol. While using symbols or images, all the
symbols should be of the same size.
Detailed Explanation of Pictogram
What does a pictogram mean? It is an image that represents a word or an idea by illustration.
Presenting information in picture charts is an important skill in science. Pictograms are a visual way of showing statistical information. They
are also called pictorial unit charts, pictographs, and pictorial unit bar graphs. For instance,
I will draw a pictogram to compare the nations that have the most tanks. First I need a table of
statistical data showing the nations with the most tanks, and I will give it a proper title. I will
then use a suitable symbol to represent the tanks. Each tank pictured in the chart will
represent 1,000 tanks. Now let us draw the pictograph. Russia has 22,950 tanks, so I
will round the figure and draw 23 tanks. For the United States, I draw nine tanks. For China,
I draw seven. Pakistan and North Korea complete the chart, and that gives us the finished pictograph.
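The rounding step in this example can be sketched in Python. The function name symbols_needed and the letter "T" used as a stand-in for the tank picture are assumptions for illustration; the figure of 22,950 tanks and the key of 1,000 tanks per symbol come from the text.

```python
def symbols_needed(value, key):
    """Number of whole symbols when each symbol stands for `key` units."""
    return int(value / key + 0.5)    # round to the nearest whole symbol

KEY = 1000                           # one tank symbol = 1,000 tanks (from the text)
russia = symbols_needed(22950, KEY)

# "T" is a hypothetical stand-in for the tank picture
print("Russia:", "T " * russia)
print("Symbols drawn:", russia)      # 23
```

Rounding is what makes the chart quick to read, but it also means the pictograph shows approximate rather than exact figures.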
A common misleading graph uses a vertical scale that starts at some value greater
than zero to exaggerate differences between groups. Here is an example of two graphs
that depict the same data. The graph on the left does not have a zero starting point; it actually starts
at 10%. The graph on the right has a zero starting point. The graphs represent the same data,
although the graph on the left would lead you to believe that there is a greater difference in the
experience of nausea between those using OxyContin and those given the placebo, when in reality
there is not that large a difference. So always examine a graph carefully to see whether the
vertical axis starts at some point other than zero so that differences are exaggerated.
Pictographs. Drawings of objects, called pictographs, are often misleading. Data that are one-
dimensional in nature, such as budget amounts, are often depicted with two-dimensional
objects, such as dollar bills, or three-dimensional objects like stacks of coins, homes, or
barrels. By using pictographs, artists can create false impressions that grossly distort
differences by using these simple principles of basic geometry. When you
double each side of a square, its area does not merely double; it increases by a factor of four. When
you double each side of a cube, its volume does not merely double; it increases
by a factor of eight. Here is an example of a pictogram depiction of the decrease in smoking
from 1970 to 2013. Now, these are three-dimensional objects, basically cylindrical in shape.
It looks as though the cigarette on the right is much smaller than the
cigarette on the left, less than half the size for sure.
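The geometry behind this distortion can be checked with a short Python sketch. The function name visual_scale is a hypothetical helper introduced here for illustration only.

```python
def visual_scale(linear_factor, dimensions):
    """How much the drawn size grows when every side is multiplied by linear_factor."""
    return linear_factor ** dimensions

print(visual_scale(2, 1))   # height of a bar only: 2
print(visual_scale(2, 2))   # area of a square: 4
print(visual_scale(2, 3))   # volume of a cube: 8
```

So a quantity that has only doubled can look four or eight times larger when it is drawn as a two-dimensional or three-dimensional object.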
Characteristics a pictogram should have to successfully convey the meaning
To successfully convey the meaning, a pictogram:
∙ Should be self-explanatory.
∙ Should be recognizable by all people.
∙ Must represent a general concept.
∙ Should be clear, concise and interesting.
∙ Should be identifiable as a set, through uniform treatment of scale, style and
subject.
∙ Should be highly visible, easy to reproduce in any scale and in positive or negative
form.
∙ Should not be dependent upon a border and should work equally well in positive or
negative form.
∙ Should avoid stylistic fads or a commercial appearance and should appeal to a wide
audience that has a sophisticated, creative culture.
∙ Should be attractive when used with other design elements and typestyles.
Advantages and Drawbacks of Pictograms
Following are the advantages of pictograms:
∙ Pictograms can make warnings more eye-catching.
∙ They can serve as an “instant reminder” of a hazard or an established message.
∙ They may improve warning comprehension for those with visual or literacy
difficulties.
∙ They have the potential to be interpreted more accurately and more quickly than words.
∙ They can be recognized and recalled far better than words.
∙ They can improve the legibility of warnings.
∙ They may be better when undertaking familiar routine tasks.
∙ Pictographs can express a large amount of information in a simple manner.
∙ They are easy to read, as all the information is provided at one glance.
∙ They do not require much explanation, as they are universally used.
∙ They attract the attention of viewers or readers, as they contain attractive images.
Disadvantages of Pictograms. There are a number of disadvantages of relying on pictograms.
∙ Very few pictograms are universally understood.
∙ Even well-understood pictograms will not be interpreted equally by all groups of
people and across all cultures, and it takes years for any pictogram to reach maximum
∙ They have the potential to be interpreted with the opposite or an otherwise undesired meaning, which
can create additional confusion.
How to Make a Pictograph?
The different steps to make a pictograph are given below:

∙ Step 1: Collect the Data

The first step in making a pictograph is the collection of relevant information, which we
want to represent. Once the data is collected, make a table or a list of data.
∙ Step 2: Select the Symbol or Images
To represent the data, pick any images/pictures or symbols. For example, if the data
represents the rainfall for different cities, make use of cloud images or some other images
which are relevant to the data.
∙ Step 3: Assign a Key
While representing the data using images, use a key, which denotes the value of the
image. If the frequency of the data is too high, one image per unit is not practical to
represent the data. Thus, a numerical value called the “key” is used, which should be written
along with the pictograph.
∙ Step 4: Draw the Pictograph
While making a pictograph, use two columns that represent the category and the data. Finally,
draw the pictograph using symbols/images, which represent the frequency. If the
frequency is not a whole multiple of the key, the symbols can be drawn as fractions.
∙ Step 5: Review the Data and Pictograph
Once the pictograph is drawn, make sure that the images represent the data exactly and
that the pictograph is labelled correctly.
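The five steps above can be sketched as a small text pictograph in Python. The city names, rainfall figures, symbols and key are all hypothetical values chosen for illustration, and the half symbol is just one possible way of drawing a fraction.

```python
# Step 1: collect the data (hypothetical rainfall in mm; illustrative only)
rainfall = {"City A": 40, "City B": 25, "City C": 10}

# Step 2: select a symbol; Step 3: assign a key
SYMBOL = "*"
HALF = "'"        # hypothetical marker for half a symbol
KEY = 10          # one symbol = 10 mm

def pictograph_row(value, key=KEY):
    """Whole symbols, plus a half symbol when the remainder is at least key / 2."""
    whole, remainder = divmod(value, key)
    return SYMBOL * whole + (HALF if remainder * 2 >= key else "")

# Step 4: draw the pictograph in two columns (category, symbols)
rows = {city: pictograph_row(mm) for city, mm in rainfall.items()}
for city, symbols in rows.items():
    print(f"{city:8} {symbols}")
print(f"Key: {SYMBOL} = {KEY} mm, {HALF} = {KEY // 2} mm")

# Step 5: review by converting the symbols back to numbers
# (exact here because every value is a multiple of half the key)
for city, symbols in rows.items():
    drawn = symbols.count(SYMBOL) * KEY + symbols.count(HALF) * KEY / 2
    assert drawn == rainfall[city], city
```

The review loop in Step 5 only passes because the sample values are multiples of 5 mm; with other data the pictograph would show rounded values, which should then be stated next to the key.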

Question. 4
Define normal curve. Write down the properties of normal curve.
Normal Curve
One way of finding out how data are distributed is to plot them in a graph. If the data are symmetrically
distributed about the mean, the graph forms a smooth bell-shaped curve. In statistics this curve is called a normal curve
and in social sciences, it is called the bell curve. Normally distributed data may
naturally occur in several possible ways, with a number of possibilities for the standard deviation
(which can be any positive value). A standard normal curve has a mean of 0 and a standard deviation of 1.
The larger the standard deviation, the flatter the curve will be and vice versa.
A normal curve has the following properties.
∙ The mean, median and mode are equal.
∙ The curve is symmetric about the center (i.e. around the mean).
∙ Exactly half of the values are to the left of the center and half to the right.
∙ The total area under the curve is 1.
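These properties can be checked numerically. The Python sketch below uses the standard formula for the normal curve and a simple trapezoidal rule; the function names and the choice of integrating over ten standard deviations on each side of the mean are assumptions made for illustration.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Height of the normal curve at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def area_under_curve(mean=0.0, sd=1.0, lo=None, hi=None, steps=20000):
    """Approximate area by the trapezoidal rule (default limits: mean +/- 10 sd)."""
    lo = mean - 10 * sd if lo is None else lo
    hi = mean + 10 * sd if hi is None else hi
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mean, sd) + normal_pdf(hi, mean, sd))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mean, sd)
    return total * h

total_area = area_under_curve()            # close to 1
left_half = area_under_curve(hi=0.0)       # close to 0.5
print(round(total_area, 4), round(left_half, 4))
print(normal_pdf(-1.3) == normal_pdf(1.3))          # symmetry about the mean
print(normal_pdf(0, sd=1) > normal_pdf(0, sd=2))    # larger sd gives a flatter curve
```

The last line also illustrates the earlier remark that a larger standard deviation produces a flatter curve.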
The normal distribution is the most important probability distribution in statistics because it fits
many natural phenomena. For example, heights, blood pressure, measurement error, and IQ
scores approximately follow the normal distribution. It is also known as the Gaussian distribution and the bell
The normal distribution is a probability function that describes how the values of a variable are
distributed. It is a symmetric distribution where most of the observations cluster around the central
peak and the probabilities for values further away from the mean taper off equally in both
directions. Extreme values in both tails of the distribution are similarly unlikely.
Despite the different shapes, all forms of the normal distribution have the following characteristic
They’re all symmetric. The normal distribution cannot model skewed distributions.
The mean, median, and mode are all equal. Half of the population is less than the mean and half
is greater than the mean. The Empirical Rule allows you to determine the proportion of values
that fall within certain distances from the mean. More on this below!
While the normal distribution is essential in statistics, it is just one of many probability distributions,
and it does not fit all populations.
If you have continuous data that are skewed, you’ll need to use a different distribution, such as
the Weibull distribution, exponential distribution, or the gamma distribution.
The Empirical Rule for the Normal Distribution
When you have normally distributed data, the standard deviation becomes particularly valuable.
You can use it to determine the proportion of the values that fall within a specified number of
standard deviations from the mean. For example, in a normal distribution, 68% of the observations
fall within +/- 1 standard deviation from the mean. This property is part of the Empirical Rule,
which describes the percentage of the data that fall within specific numbers of standard deviations
from the mean for bell-shaped curves.
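The percentages of the Empirical Rule can be computed from the error function in Python's standard math module. The function name proportion_within is a hypothetical helper; the 68% figure is the one quoted above, and the values for two and three standard deviations are the usual companions of that rule.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * proportion_within(k):.1f}%")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```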
Numerical Measures of Shape
One of the fundamental tasks in any statistical analysis is to characterize the location and
variability of a data set. Two important measures of shape, skewness and kurtosis, give us a more
precise evaluation of the data. Measures of dispersion tell us about the variation of the data set,
while skewness tells us about the direction of variation and kurtosis tells us about the shape of the variation.
Let us have a brief review of these measures of shape.
Skewness tells us about the amount and direction of the variation of the data set. It is a measure
of symmetry. A distribution or data set is symmetric if it looks the same to the left and right of the
central point. If the bulk of the data is at the left, i.e. the peak is towards the left and the right tail is longer, we
say that the distribution is skewed right or positively skewed. On the other hand, if the bulk of the data
is towards the right or, in other words, the peak is towards the right and the left tail is longer, we say that
the distribution is skewed left or negatively skewed. If the skewness is equal to zero, the data are
perfectly symmetrical. But this is quite unlikely in the real world.
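The sign of skewness can be illustrated with a short Python sketch. It uses the simple moment-based formula (third central moment divided by the second central moment raised to the power 1.5), which is one of several formulas in use, and the three small data sets are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]                 # looks the same on both sides
right_skewed = [1, 1, 2, 2, 3, 10]          # bulk at the left, long right tail
left_skewed = [-10, -3, -2, -2, -1, -1]     # bulk at the right, long left tail

print(skewness(symmetric))        # zero for symmetric data
print(skewness(right_skewed))     # positive
print(skewness(left_skewed))      # negative
```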
Kurtosis is a parameter that describes the shape of variation. It is a measurement that tells us
how peaked the graph of the data set is and how high the graph is around the mean. In other
words, we can say that kurtosis measures the shape of the distribution, i.e. the fatness of the tails;
it focuses on how values are arranged around the mean. A negative value means that too little
data is in the tails and a positive value means that too much data is in the tails. This heaviness or
lightness in the tails means that the data looks more peaked or less peaked. Kurtosis is measured
against the standard normal distribution. A standard normal distribution has a kurtosis of 3, so
kurtosis is usually reported as excess kurtosis (kurtosis minus 3), which is zero for a normal curve.
Kurtosis has three types: mesokurtic, platykurtic, and leptokurtic. If the distribution has an excess kurtosis of
zero, then the graph is nearly normal. This nearly normal distribution is called mesokurtic. If the
distribution has negative excess kurtosis, it is called platykurtic. An example of a platykurtic distribution is
a uniform distribution, which has as much data in each tail as it does in the peak. If the distribution
has positive excess kurtosis, it is called leptokurtic. Such a distribution has the bulk of its data in the peak.
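The three types can be illustrated with a small Python sketch. It computes the moment-based kurtosis and subtracts 3, so that zero corresponds to the normal curve and the zero, negative and positive cases above apply directly. The function name and both data sets are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3

flat = list(range(1, 11))            # evenly spread values, like a uniform distribution
peaked = [0] * 20 + [-10, 10]        # most data at the centre, two extreme values

print(excess_kurtosis(flat))         # negative: platykurtic
print(excess_kurtosis(peaked))       # positive: leptokurtic
```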
Question - 5
Explain procedure for determining median, with one example each at least, if:
i. The number of scores is even
ii. The number of scores is odd.
Median is the middle value of rank-ordered data. It divides the distribution into two halves (i.e. 50% of
scores or observations on either side of median value). It means that this value separates higher
half of the data set from the lower half. The goal of the median is to determine the precise midpoint
of the distribution. Median is appropriate for describing ordinal data.
The median is the middle number in a sorted, ascending or descending, list of numbers and can
be more descriptive of that data set than the average.
Median is a statistical measure that determines the middle value of a dataset listed in ascending
order (i.e., from smallest to largest value). The measure divides the lower half from the higher half
of the dataset. Along with mean and mode, median is a measure of central tendency.
In statistics and probability theory, the median is the value separating the higher half from the
lower half of a data sample, a population, or a probability distribution. For a data set, it may be
thought of as "the middle" value. The basic feature of the median in describing data compared to
the mean (often simply described as the "average") is that it is not skewed by a small proportion
of extremely large or small values, and therefore provides a better representation of a "typical"
value. Median income, for example, may be a better way to suggest what a "typical" income is,
because income distribution can be very skewed. The median is of central importance in robust
statistics, as it is the most resistant statistic, having a breakdown point of 50%: so long as no more
than half the data are contaminated, the median will not give an arbitrarily large or small result.
∙ The median is the middle number in a sorted, ascending or descending, list of numbers
and can be more descriptive of that data set than the average.
∙ The median is sometimes used as opposed to the mean when there are outliers in the
sequence that might skew the average of the values.
∙ If there is an odd amount of numbers, the median value is the number that is in the middle,
with the same amount of numbers below and above.
∙ If there is an even amount of numbers in the list, the middle pair must be determined,
added together, and divided by two to find the median value.
Understanding the Median
Median is the middle number in a sorted list of numbers. To determine the median value in a
sequence of numbers, the numbers must first be sorted, or arranged, in value order from lowest
to highest or highest to lowest. The median can be used to determine an approximate average,
or mean, but is not to be confused with the actual mean.

The median of a sequence is generally less affected by outliers than the mean.

Median Example
To find the median value in a list with an odd amount of numbers, one would find the number that
is in the middle with an equal amount of numbers on either side of the median. To find the median,
first arrange the numbers in order, usually from lowest to highest.
For example, in a data set of {3, 13, 2, 34, 11, 26, 47}, the sorted order becomes {2, 3, 11, 13,
26, 34, 47}. The median is the number in the middle of the sorted list, which in this instance
is 13, since there are three numbers on either side.
To find the median value in a list with an even amount of numbers, one must determine the middle
pair, add them, and divide by two. Again, arrange the numbers in order from lowest to highest.
For example, in a data set of {3, 13, 2, 34, 11, 17, 27, 47}, the sorted order becomes {2, 3, 11,
13, 17, 27, 34, 47}. The median is the average of the two numbers in the middle, 13 and 17,
which in this case is fifteen {(13 + 17) ÷ 2 = 15}.
Procedure for Determining Median
When the number of scores is odd, simply arrange the scores in order (from lower to higher or
from higher to lower). The median will be the middle score in the list. Consider the set of scores
2, 5, 7, 10, 12. The score “7” lies in the middle of the scores, so it is the median.
When there is an even number of scores in the distribution, arrange the scores in order (from
lower to higher or from higher to lower). The median will be the average of the middle two scores
in the list. Consider the set of scores 4, 6, 9, 14, 16, 20.
The average of the middle two scores, 11.5 (i.e. (9 + 14)/2 = 23/2 = 11.5), is the median of the
distribution. The median is less affected by outliers and skewed data and is usually the preferred
measure of central tendency when the distribution is not symmetrical. The median cannot be
determined for nominal (unordered categorical) data.
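The procedure for both cases can be sketched in Python as follows. This is a minimal illustration, not a prescribed method. The function name median is chosen here for convenience, and the two calls reuse the score sets from the paragraphs above.

```python
def median(scores):
    """Return the median of a non-empty list of numeric scores."""
    if not scores:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(scores)  # Step 1: arrange the scores in ascending order.
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of scores: the single middle score is the median.
        return ordered[mid]
    # Even number of scores: average the two middle scores.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median([2, 5, 7, 10, 12])        # five scores (odd)
even_result = median([4, 6, 9, 14, 16, 20])   # six scores (even)
print(odd_result, even_result)
```

The scores are sorted inside the function, so they may be supplied in any order.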
Merits of Median
∙ It is rigidly defined.
∙ It is easy to understand and calculate.
∙ It is not affected by extreme values.
∙ It can be calculated even if the extreme values are not known.
∙ It can be located just by inspection in many cases.
∙ It can be located graphically.
∙ It is not much affected by sampling fluctuations.
∙ It can be calculated for data measured on an ordinal scale.
∙ It is suitable for skewed distributions.
∙ It is easily located in individual and discrete classes.
Demerits of Median
∙ It is not based on all values of the given data.
∙ For larger data sets, arranging the data in increasing order is a somewhat difficult process.
∙ It is not capable of further mathematical treatment.
∙ It is not sensitive to some changes in the data values.
Uses of Median
The median can be used as a measure of location when one attaches reduced importance to
extreme values, typically because a distribution is skewed, extreme values are not known, or
outliers are untrustworthy, i.e., may be measurement/transcription errors.

For example, consider the multiset

1, 2, 2, 2, 3, 14.
The median is 2 in this case (as is the mode), and it might be seen as a better indication of the
center than the arithmetic mean of 4, which is larger than all but one of the values. However, the
widely cited empirical relationship that the mean is shifted "further into the tail" of a distribution
than the median is not generally true. At most, one can say that the two statistics cannot be "too
far" apart.
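The figures quoted for the multiset 1, 2, 2, 2, 3, 14 can be checked with a short Python sketch that uses the standard statistics module. The variable names are chosen only for this illustration.

```python
import statistics

values = [1, 2, 2, 2, 3, 14]

mean_value = statistics.mean(values)      # sum 24 divided by 6 values
median_value = statistics.median(values)  # average of the middle pair (2, 2)
mode_value = statistics.mode(values)      # most frequent value

# Count how many values lie below the mean.
below_mean = sum(1 for v in values if v < mean_value)

print(mean_value, median_value, mode_value, below_mean)
```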

As a median is based on the middle data in a set, it is not necessary to know the value of extreme
results in order to calculate it. For example, in a psychology test investigating the time needed to
solve a problem, if a small number of people failed to solve the problem at all in the given time a
median can still be calculated.
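A minimal Python sketch of this situation is given below. The solving times are hypothetical, and recording an unsolved attempt as infinity (math.inf) is an assumption made only for this illustration. It requires that fewer than half of the participants failed.

```python
import math
import statistics

# Hypothetical solving times in seconds; math.inf marks a participant
# who did not solve the problem within the time limit.
times = [42, 55, 61, 70, 88, math.inf, math.inf]

median_time = statistics.median(times)  # middle of the 7 sorted values
mean_time = sum(times) / len(times)     # becomes infinite, so it is not usable

print(median_time, mean_time)
```

Here the median is still a finite time, because the unknown extreme values lie above the middle position, whereas the mean cannot be computed meaningfully.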

Because the median is simple to understand and easy to calculate, while also a robust
approximation to the mean, the median is a popular summary statistic in descriptive statistics. In
this context, there are several choices for a measure of variability: the range, the interquartile
range, the mean absolute deviation, and the median absolute deviation.

For practical purposes, different measures of location and dispersion are often compared on the
basis of how well the corresponding population values can be estimated from a sample of data.
The median, estimated using the sample median, has good properties in this regard. While it is
not usually optimal if a given population distribution is assumed, its properties are always
reasonably good. For example, a comparison of the efficiency of candidate estimators shows that
the sample mean is more statistically efficient when, and only when, the data are uncontaminated
by data from heavy-tailed distributions or from mixtures of distributions. Even
then, the median has a 64% efficiency compared to the minimum-variance mean (for large normal
samples), which is to say the variance of the median will be about 57% greater than the variance of
the mean.
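The efficiency comparison for normal data can be explored with a small simulation. The sketch below is only an illustration under assumed settings: the sample size of 101, the 2000 repetitions and the random seed are arbitrary choices, and the function name variance_ratio is hypothetical. For large normal samples the theoretical ratio of the two variances is π/2 (about 1.57), which corresponds to an efficiency of roughly 64%.

```python
import random
import statistics


def variance_ratio(sample_size=101, trials=2000, seed=0):
    """Estimate Var(sample median) / Var(sample mean) for standard normal data."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(medians) / statistics.variance(means)


ratio = variance_ratio()
print("estimated variance ratio (median / mean):", round(ratio, 2))
```

Because the result comes from random sampling, the estimated ratio will only be close to the theoretical value, not exactly equal to it.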

