
Module 001 Basic Statistical Concept

At the end of this module you are expected to:


1. Explain basic statistical terms and concepts;
2. Compute the mean, median, and mode; and
3. Identify the importance of statistics to your area of specialization.

Introduction
In today’s technologically advanced world, we have access to large volumes of data. The first
step of data analysis is to accurately summarize all of this data, both graphically and
numerically, so that we can understand what the data reveals. To be able to use and interpret
the data correctly is essential in making informed decisions. For instance, when you see a
survey of opinion about a certain TV program, you may be interested in the proportion of
those people who indeed like the program.
This is one example of an application of statistics.

What is Statistics?
According to the International Encyclopedia of Statistical Science, Statistics is the
study of how to collect, organize, analyze, and interpret numerical information from
data. It is both the science of uncertainty and the technology of extracting information
from data.
It is a particularly useful branch of mathematics that is not only studied theoretically
by advanced mathematicians but one that is used by researchers in many fields to
organize, analyze and summarize data. Statistical methods and analyses are often used
to communicate research findings, to support hypotheses, and to give credibility to
research methodology and conclusions.
Statistics is a branch of science that deals with the collection, organization, and analysis of
data and the drawing of inferences from samples to the whole population. This
requires a proper design of the study, an appropriate selection of the study sample and
the choice of a suitable statistical test. An adequate knowledge of statistics is necessary for
properly designing an epidemiological study or a clinical trial. Improper statistical
methods may result in erroneous conclusions, which may lead to unethical practice.

Variables
A variable is a characteristic that varies from one individual member of a population to
another. Variables such as height and weight are measured on some type
of scale, convey quantitative information, and are called quantitative variables.
Variables such as sex and eye color convey qualitative information and are called
qualitative variables.
Course Module
Quantitative Variables
Quantitative or numerical data are numerical measurements that arise from a
natural numerical scale. They represent measurable quantities. The values that these
variables can take can be ordered in a logical or natural way. Examples are:
• Size of shoes,
• Price of houses,
• Number of semesters studied, and
• Weight of a person
Quantitative variables are subdivided into discrete and continuous
measurements:
1. Discrete numerical data are recorded as whole numbers such as 0, 1, 2, 3, …
(integers). Observations that can be counted constitute discrete data.
Example:
The number of episodes of respiratory arrest or the number of re-intubations in
an intensive care unit.

2. Continuous data can assume any value within a range. Observations that are
measured rather than counted constitute continuous data.
Example:
Serial serum glucose levels, the partial pressure of oxygen in arterial blood,
and the esophageal temperature.

Qualitative Variables

Qualitative data are measurements for which there is no natural numerical scale,
but which consist of attributes, labels, or other non-numerical characteristics. These
are variables that cannot be ordered in a logical or natural way. For example:
• The color of the eye,
• The name of a political party, and
• The type of transport used to travel to work.
These are all qualitative variables. Neither is there any reason to list blue eyes before
brown eyes (or vice versa) nor does it make sense to list buses before trains (or vice
versa).
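
As a rough illustration of the distinction, the Python sketch below uses a few made-up records (the values are hypothetical and not taken from this module). Quantitative values can be sorted and averaged, while qualitative values can only be tallied by category.

```python
from collections import Counter
from statistics import mean

# Hypothetical records: (semesters studied, weight in kg, eye color)
records = [
    (4, 55.2, "brown"),
    (8, 70.8, "blue"),
    (6, 63.5, "brown"),
]

semesters = [r[0] for r in records]    # quantitative, discrete (counted)
weights = [r[1] for r in records]      # quantitative, continuous (measured)
eye_colors = [r[2] for r in records]   # qualitative (labels)

# Quantitative data have a natural order and support arithmetic.
sorted_semesters = sorted(semesters)
average_weight = mean(weights)

# Qualitative data have no natural order, so we only count categories.
color_counts = Counter(eye_colors)

print(sorted_semesters)
print(round(average_weight, 2))
print(color_counts)
```

Sorting the eye colors alphabetically would be possible in code, but the resulting order would carry no statistical meaning.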
Descriptive and Inferential Statistics

There are two main branches of statistics: descriptive and inferential. Descriptive
statistics is used to say something only about the set of data that has been collected.
It describes the relationship between variables in a sample or population
and provides a summary of the data in the form of measures such as the mean, median
and mode. Inferential statistics is used to make predictions or comparisons about a
larger group (a population) using information gathered from a small part of that
population: it uses a random sample of data taken from the population to describe and
make inferences about the whole population. It is valuable when it is not possible to
examine each member of an entire population. Thus, inferential statistics involves
generalizing beyond the data, something that descriptive statistics does not do.

For deeper understanding:


Descriptive Statistics is the branch of statistics that involves organizing, displaying
and describing data. It aims to describe various aspects of the data obtained in the
study. Its usual forms are:
• Listings,
• Summary Statistics,
• Graphics

The extent to which the observations cluster around a central location is described
by the central tendency, and the spread towards the extremes is described by the
degree of dispersion.

Measures of Central Tendency


The measures of central tendency are the mean, median and mode.
Mean (or the arithmetic average) is the sum of all the values in a set divided by the
number of values. The mean may be influenced profoundly by extreme values. For example,
the average stay of organophosphorus poisoning patients in the ICU may be influenced by
a single patient who stays in the ICU for around 5 months because of septicemia. Such
extreme values are called outliers. The mean of a whole population is usually
denoted by µ, while the mean of a sample is usually denoted by x̄.
Thus the mean of a set {a1, a2, …, an} is given by

$$\mu = \frac{a_1 + a_2 + \cdots + a_n}{n}$$

where
µ is the population mean, or
𝑥̅ is the sample mean
n is the total number of items in the set
ai (i = 1, …, n) is each element of the set
Example:
Given the set of values {1, 2, 4, 7}, we substitute the values into the formula:

$$\mu = \frac{1 + 2 + 4 + 7}{4} = \frac{14}{4} = 3.5$$

The mean of the given set {1, 2, 4, 7} is 3.5.

Median is defined as the middle value of a distribution in ranked data (with half of the
values in the sample above and half below the median). If the number of
values in a set is odd, the median is the single middle value; if it is even, the median
is the sum of the two middle values divided by two. For example, the median of
{1, 2, 4, 7} is (2 + 4) / 2 = 3.
Mode is the most frequently occurring value in a distribution. A set can have more
than one mode.
• Unimodal – A set that has only one mode
• Bimodal – A set with two modes
• Multimodal – A set with three or more modes
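
The three measures can be checked with a short Python sketch using the standard `statistics` module. The first list is the set {1, 2, 4, 7} from the mean example; the ICU stays are hypothetical numbers chosen only to show how a single outlier moves the mean much more than the median.

```python
from statistics import mean, median, multimode

values = [1, 2, 4, 7]              # the set from the mean example
print(mean(values))                # 3.5
print(median(values))              # (2 + 4) / 2 = 3.0

# Hypothetical ICU stays in days; one patient stays about 5 months (150 days).
stays = [3, 4, 4, 5, 6, 150]
print(mean(stays))                 # pulled far upward by the outlier
print(median(stays))               # 4.5, barely affected by the outlier
print(multimode(stays))            # [4]: one mode, so the set is unimodal
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]: two modes, so the set is bimodal
```

`multimode` returns every most-frequent value, which makes it convenient for telling unimodal, bimodal and multimodal sets apart.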
Variance is a measure of how spread out the distribution is. It gives an indication of
how closely the individual observations cluster around the mean value. The variance of a
population is defined by the following formula:

$$\sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N}$$

where:
σ² is the population variance,
X̄ is the population mean,
Xi is the i-th element from the population, and
N is the number of elements in the population.
The variance of a sample is defined by a slightly different formula:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

where:
s² is the sample variance,
x̄ is the sample mean,
xi is the i-th element from the sample, and
n is the number of elements in the sample.
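
To make the difference between the two denominators concrete, here is a minimal Python sketch that computes both variances by hand for the set {1, 2, 4, 7} and compares them with the standard library's `pvariance` and `variance`. Treating the same four values first as a whole population and then as a sample is an assumption made purely for illustration.

```python
from statistics import pvariance, variance

data = [1, 2, 4, 7]
n = len(data)
m = sum(data) / n                             # mean = 3.5

# Squared deviations from the mean: 6.25, 2.25, 0.25, 12.25
squared_devs = [(x - m) ** 2 for x in data]
sum_sq = sum(squared_devs)                    # 21.0

population_var = sum_sq / n                   # divide by N     -> 5.25
sample_var = sum_sq / (n - 1)                 # divide by n - 1 -> 7.0

print(population_var, pvariance(data))
print(sample_var, variance(data))
```

The sample variance is always the larger of the two for the same data, because the same sum of squared deviations is divided by a smaller number.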

The formula for the variance of a population has N as the denominator, while the
formula for a sample uses n − 1. The expression n − 1 is known as the degrees of
freedom and is one less than the number of observations in the sample: once the sample
mean is fixed, each observation is free to vary except the last one, which must take a
defined value. The variance is measured in squared units. To make the interpretation
of the data simple and to retain the basic unit of observation, the square root of the
variance is used. The square root of the variance is the standard deviation (SD). The
SD of a population is defined by the following formula:

$$\sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}$$

where:
σ is the population SD,
X̄ is the population mean,
Xi is the i-th element from the population, and
N is the number of elements in the population.

The SD of a sample is defined by a slightly different formula:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

where:
s is the sample SD,
x̄ is the sample mean,
xi is the i-th element from the sample, and
n is the number of elements in the sample.
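
As a quick check that the SD is simply the square root of the variance, the following Python sketch uses the standard `statistics` module on the set {1, 2, 4, 7}. As before, treating this set as both a population and a sample is only an illustrative assumption.

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev, variance

data = [1, 2, 4, 7]

population_sd = pstdev(data)   # square root of the population variance
sample_sd = stdev(data)        # square root of the sample variance

print(round(population_sd, 4), round(sqrt(pvariance(data)), 4))
print(round(sample_sd, 4), round(sqrt(variance(data)), 4))
```

Unlike the variance, both SD values are in the same unit as the original observations.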

Inferential Statistics is the branch of statistics that involves drawing conclusions
about a population based on the information contained in a sample taken from that
population. It forms a basis for a conclusion regarding a prespecified objective
concerning the underlying population.

Population vs Sample

We begin with a simple example. There are millions of passenger automobiles in the
United States. What is their average value? It is obviously impractical to attempt to
solve this problem directly by assessing the value of every single car in the country,
adding up all those numbers, and then dividing by however many numbers there are.
Instead, the best we can do would be to estimate the average. One natural way to do
so would be to randomly select some cars, say 200 of them, ascertain the value of each
of those cars, and find the average of those 200 numbers.
The set of all those millions of vehicles is called the population of interest. The
number attached to each one is a measurement, and the average value is a parameter. The
set of 200 cars selected from the population is called a sample, and the 200 numbers,
the monetary values of the cars we selected, are the sample data. The average of
the sample data is a statistic.

For better understanding, a population is any specific collection of objects of interest,
while a sample is any subset or sub-collection of the population, including the case
in which the sample consists of the whole population. A measurement is a number or
attribute computed for each member of a population or a sample, while a parameter
is a number that summarizes some aspect of the population as a whole.
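
The car example can be simulated in a few lines of Python. Everything numeric here is an assumption made for illustration: the population of 100,000 cars, the uniform range of values, and the fixed random seed are all invented, and only the sample size of 200 comes from the text.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch gives repeatable numbers

# Hypothetical population: 100,000 car values in dollars (made-up distribution).
population = [random.uniform(1_000, 40_000) for _ in range(100_000)]
parameter = mean(population)              # population mean, normally unknown

sample = random.sample(population, 200)   # random sample of 200 cars
statistic = mean(sample)                  # sample mean, used as the estimate

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

In practice only the statistic can be computed, since the whole population is never observed; the simulation can show both only because the population was generated artificially.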

Basis for Comparison | Population | Sample
Meaning | The collection of all elements possessing common characteristics that comprises the universe | A subgroup of the members of the population chosen for participation in the study
Includes | Each and every unit of the group | Only a handful of units of the population
Characteristic | Parameter | Statistic
Data Collection | Complete enumeration or census | Sample survey or sampling
Focus On | Identifying the characteristics | Making inferences about the population

Table 1: Comparison of Population and Sample


Population
In simple terms, population means the aggregate of all elements under study having
one or more common characteristics; for example, all people living in India constitute
a population. A population is not confined to people only; it may also include
animals, events, objects, buildings, etc. It can be of any size, and the number of
elements or members in a population is known as the population size, i.e. if there are
one hundred million people in India, then the population size (N) is 100 million. The
different types of population are described below:

1. Finite Population: When the number of elements of the population is fixed,
making it possible to enumerate it in totality, the population is said to be
finite.

2. Infinite Population: When the number of units in a population is uncountable,
so that it is impossible to observe all the items of the universe, the
population is considered infinite.

3. Existent Population: A population that comprises objects that exist in
reality is called an existent population.

4. Hypothetical Population: A hypothetical or imaginary population is a population
that exists only hypothetically.

Examples
• The population of all workers working in the sugar factory
• The population of motorcycles produced by a particular company
• The population of mosquitoes in a town
• The population of tax payers in India

Sample

By the term sample, we mean a part of the population chosen at random for participation
in the study. The sample so selected should represent the population
in all its characteristics and should be free from bias, so as to produce a miniature
cross-section, because the sample observations are used to make generalizations about the
population.
In other words, the respondents selected out of the population constitute a ‘sample’, and
the process of selecting respondents is known as ‘sampling.’ The units under study
are called sampling units, and the number of units in a sample is called the sample size.
In statistical testing, samples are mainly used when the population is
too large for all of its members to be included in the study.

Key Differences Between Population and Sample


The difference between population and sample can be drawn clearly on the following
grounds:

The collection of all elements possessing common characteristics that comprises the
universe is known as the population. A subgroup of the members of the population
chosen for participation in the study is called a sample.

The population consists of each and every element of the entire group. On the other
hand, only a handful of items of the population are included in a sample.

A characteristic of the population based on all units is called a parameter, while a
measure computed from the sample observations is called a statistic.

When information is collected from all units of the population, the process is known as
a census or complete enumeration. Conversely, a sample survey is conducted to
gather information from the sample using a sampling method.

With a population, the focus is on identifying the characteristics of the elements, whereas
with a sample, the focus is on making generalizations about the
characteristics of the population from which the sample came.

Importance of Statistics
In general, statistics can be defined as a branch of applied research which is
concerned with the development and application of methods for collecting, organizing,
presenting, analyzing and interpreting quantitative data in such a way that the
reliability of conclusions based on the data may be evaluated in terms of probability
statements. It can be used in diverse fields of study. Some of the purposes of
statistics are as follows:

1. To Present Facts in Definite Form

We can represent things in their true form with the help of figures. Without
a statistical study, our ideas would be vague and indefinite. Facts should be
given in a definite form: results given in numbers are more
convincing than results expressed in qualitative terms.
Statements like “there is a lot of unemployment in India” or “the population is
increasing at a faster rate” are not in a definite form. A statement in a
definite form would be “the population in 2004 would be 15% more
compared to 1990.”
2. Precision to the Facts

Because statistics are presented in a definite form, they also help to condense
data into a few important figures, so that statistical methods present meaningful
information. In other words, statistics helps to simplify complex data so
that they can be understood. The data may be presented in the form
of a graph, a diagram, an average, or coefficients. For example, we
cannot know the price position from the individual prices of all goods, but we can
know it from an index of the general level of prices.

3. Comparisons

After simplifying the data, it can be correlated as well as compared. The
relationship between two groups is best represented by certain
mathematical quantities such as averages or coefficients. Comparison is one of
the main functions of statistics, as absolute figures alone convey very little
meaning.

4. Formulation and Testing of Hypothesis

Statistical methods help us in formulating and testing a hypothesis or
a new theory. With the help of statistical techniques, we can know the effect of
imposing a tax on the exports of tea on the consumption of tea in other
countries. Another example could be to study whether a credit squeeze is
effective in checking inflation or not.

5. Forecasting

Statistics is not only concerned with the above functions; it is also used to predict
the future course of a phenomenon. We can make future policies on
the basis of estimates made with the help of statistics. We can predict the
demand for goods in 2005 if we know the population in 2004 and the
growth rate of the population in the past. Similarly, a businessman can exploit the
market situation successfully if he knows about the trends in the
market. Statistics helps in shaping future policies.

6. Policy Making

With the help of statistics we can frame favourable policies. How much food is
required to be imported in 2007? It depends on the food production in 2007
and the demand for food in 2007. Without knowing these factors we cannot
estimate the amount of imports. On the basis of forecasts the government forms
policies about food grains, housing, etc. But if the forecasting is not correct,
then the whole setup will be affected.
7. It Enlarges Knowledge

Whipple rightly remarks that “Statistics enables one to enlarge his horizon”.
When a person goes through the various procedures of statistics, it widens his
knowledge. It also widens his thinking and reasoning power and
helps him to reach a rational conclusion.

8. To Measure Uncertainty

The future is uncertain, but statistics helps authorities in every field
to make sound estimates by collecting and analyzing
data from the past, so that uncertainty can be reduced. To make a
forecast we also have to model the trend behavior of the past, for which
we use techniques like regression, interpolation and time series analysis.
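
As one possible illustration of forecasting from a past trend, the sketch below fits a straight line by ordinary least squares to a short series and extends it by one year. The yearly population figures are hypothetical, and a straight-line trend is an assumption made only to keep the example small; it is not a method prescribed by this module.

```python
# Hypothetical yearly population figures in millions (illustrative only).
years = [2000, 2001, 2002, 2003, 2004]
population = [100.0, 102.0, 104.1, 105.9, 108.0]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(population) / n

# Least-squares slope and intercept for the line y = intercept + slope * x.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, population))
sxx = sum((x - mean_x) ** 2 for x in years)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Extend the fitted trend one year beyond the observed data.
forecast_2005 = intercept + slope * 2005

print(f"estimated yearly growth: {slope:.2f} million")
print(f"forecast for 2005: {forecast_2005:.2f} million")
```

A forecast of this kind is only as reliable as the assumption that the past trend continues, which is why the reliability of such estimates is expressed in terms of probability.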

Some of the major purposes of statistics are to help us understand and describe
phenomena in our world and to help us draw reliable conclusions about those
phenomena. It plays an important role in every field of human activity.

1. Business – It helps a businessman to plan his production according to the tastes
and preferences of customers. It also helps to determine the quality of the
product. A businessman can make correct decisions regarding the location of the
business, marketing of products, finance and resources through statistics.

2. Economics – Economics depends largely upon statistics. In economics,
statistical methods are used for collecting and analyzing data. The
relationship between supply and demand is also studied by statistical
methods. Imports and exports, the inflation rate, and per capita income are
some of the problems which require a good knowledge of statistics.

3. Mathematics – A large number of statistical methods, like probability,
averages, dispersion, and estimation, are used in mathematics, and different
techniques of pure mathematics, like integration, differentiation and algebra,
are used in statistics. Thus statistics and mathematics are interrelated with
each other.

4. Banking – Banks make use of statistics for a number of purposes.
Bankers use statistical approaches to estimate the number of depositors and
their claims for a certain day.
References and Supplementary Materials
Books and Journals
1. Jones, Burton W.; 2015; Enhanced Edition of Understandable Statistics; USA; Cengage
Learning
2. Huemann, Christian, Schomaker, Michael; 2016; Introduction to Statistics and Data
Analysis; Switzerland; Springer

Online Supplementary Reading Materials


1. MA121: Introduction to Statistics;
https://learn.saylor.org/course/view.php?id=28&sectionid=273; July 29, 2018
2. The Role of Statistics in Research;
https://www.bcps.org/offices/lis/researchcourse/statistics_role.html; July 29, 2018
3. The Importance of Statistics;
https://slministryofplanning.org/index.php/news/english/72-the-importance-of-statistics; July 29, 2018
4. Basic Definitions and Concepts;
https://saylordotorg.github.io/text_introductory-statistics/s05-01-basic-definitions-and-concepts.html; July 29, 2018
5. 8 Functions of Statistics;
www.economicdiscussion.net/statistics/8-functions-of-statistics-scope-and-importance/2325; July 29, 2018
6. Basic Statistical Tools in Research and Data Analysis;
https://www.researchgate.net/publication/308133810_Basic_statistical_tools_in_research_and_data_analysis; August 2, 2018

Online Instructional Videos


1. Discuss the definition of statistics, difference of inferential and descriptive and
population and sample. It is a brief overview about statistics and common vocabulary
used in the field of statistics.; https://youtu.be/MXaJ7sa7q-8; July 29, 2018
2. It is a breakdown of statistics into three areas namely Sampling Method, Descriptive
Statistics and Inferential Statistics; https://youtu.be/SFPGVTThJNk; July 29, 2018
3. Discuss the definition of statistics and its importance. It uses graphics, pictures, and
interesting stories to illustrate the relevance of statistics and how many different
things can be learned about the nation and its communities through the study of
statistics; https://youtu.be/yxXsPcobphQ; July 29, 2018
