Statistics: Introduction

Statistics
Collection of methods for planning experiments, obtaining data, and then organizing,
summarizing, presenting, analyzing, interpreting, and drawing conclusions.
Variable
Characteristic or attribute that can assume different values.
Random Variable
A variable whose values are determined by chance.
Population
All subjects possessing a common characteristic that is being studied.
Sample
A subgroup or subset of the population.
Parameter
Characteristic or measure obtained from a population.
Statistic (not to be confused with Statistics)
Characteristic or measure obtained from a sample.

The two main branches of statistics:

1. Descriptive Statistics
Collection, organization, summarization, and presentation of data.
2. Inferential Statistics
Generalizing from samples to populations using probabilities. Performing hypothesis
testing, determining relationships between variables, and making predictions.

Qualitative Variables
Variables which assume non-numerical values.
Quantitative Variables
Variables which assume numerical values.
Discrete Variables
Variables which assume a finite or countable number of possible values. Usually
obtained by counting.
Continuous Variables
Variables which assume an infinite number of possible values. Usually obtained by
measuring.
Nominal Level
Level of measurement which classifies data into mutually exclusive, all inclusive
categories in which no order or ranking can be imposed on the data.
Ordinal Level
Level of measurement which classifies data into categories that can be ranked.
Precise differences between the ranks do not exist.
Interval Level
Level of measurement which classifies data that can be ranked and differences are
meaningful. However, there is no meaningful zero, so ratios are meaningless.
Ratio Level
Level of measurement which classifies data that can be ranked, differences are
meaningful, and there is a true zero. True ratios exist between the different units of
measure.
Random Sampling
Sampling in which the data is collected using chance methods or random numbers.
Systematic Sampling
Sampling in which data is obtained by selecting every kth object.
Convenience Sampling
Sampling in which data that is readily available is used.
Stratified Sampling
Sampling in which the population is divided into groups (called strata) according to
some characteristic. Each of these strata is then sampled using one of the other
sampling techniques.
Cluster Sampling
Sampling in which the population is divided into groups (usually geographically). Some
of these groups are randomly selected, and then all of the elements in those groups
are selected.

Statistics is a branch of mathematics that deals with the theory and methods of
collecting, organizing, presenting, analyzing, and interpreting data. Statistical data may
be qualitative or quantitative. Data gathering includes collecting
information through interviews, questionnaires, objective observation, experimentation,
psychological tests, and other methods.
In the field of education, statistical tools are used to gather data and information on
enrolment, finance, and physical facilities that are essentially needed to have an effective
administration and management. When an individual analyzes achievement grades, prepares
tests, or provides solutions for the teaching-learning process, statistical tools or techniques are
used.
In the field of business and economics, statistics plays an important role in the exploration of
new markets for a product, forecasting of business trends, control and maintenance of high-
quality products, improvement of employee-employer relationship and analysis of data
concerning insurance, investment, sales, employment, transportation, communications,
auditing and accounting procedures and the like.
In the field of science and technology, discoveries as well as inventions are made
possible through scientific experiments. The cause and effect of different variables affecting
experiments are best analysed by means of statistical techniques.
In psychology, statistical tools are used to organize data on intelligence scores,
attitudes, personality traits, ratings, aptitudes, values, behaviour patterns, etc.
In the government, various records are collected, organized and analysed statistically
for intelligent policy-making. Some examples of these records are taxes, natural resources,
movement of population, income, expenditure, budgets, and many more.
Statistics is a very important tool in research and studies. Statistical designs and
experiments are utilized to gather more information from a limited body of observation.
Various statistical techniques are used in the laboratories, experimental fields, or under
controlled conditions. The utilization of these tools in statistics is needed so that accurate
and reliable results are determined.
Thus, the study of statistics requires primarily the understanding of basic concepts,
symbols, and mathematical notations.

Descriptive Statistics and Inferential Statistics

The two main branches of statistics are descriptive statistics and inferential statistics.
Both of these are employed in scientific analysis of data and both are equally important for
the student of statistics.

Descriptive Statistics
Descriptive statistics deals with the collection and presentation of data. This is
usually the first part of a statistical analysis. It is usually not as simple as it sounds, and the
statistician needs to take care in designing experiments, choosing the right focus group, and
avoiding the biases that can so easily creep into the experiment.

3 main types of descriptive statistics

The 3 main types of descriptive statistics concern the frequency distribution, central
tendency, and variability of a dataset.
• Distribution refers to the frequencies of different responses.
• Measures of central tendency give you a typical value for the responses, such as the
  mean, median, or mode.
• Measures of variability show you the spread or dispersion of your dataset.
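
The three types above can be sketched in a few lines of Python. This is only an illustration: the
response data is made up, and the standard-library statistics module is one of several ways to
compute these measures.

```python
from collections import Counter
import statistics

# Hypothetical survey responses on a 1-5 scale (illustrative data only).
responses = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]

# Distribution: how often each response occurs.
frequency = Counter(responses)

# Central tendency: mean, median, and mode.
mean = statistics.mean(responses)
median = statistics.median(responses)
mode = statistics.mode(responses)

# Variability: range and sample standard deviation.
value_range = max(responses) - min(responses)
std_dev = statistics.stdev(responses)

print(dict(sorted(frequency.items())))  # {1: 1, 2: 1, 3: 2, 4: 4, 5: 2}
print(mean, median, mode, value_range, round(std_dev, 2))  # 3.5 4.0 4 4 1.27
```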

Inferential Statistics
Inferential statistics, as the name suggests, involves drawing the right conclusions
from the statistical analysis that has been performed using descriptive statistics. In the end,
it is the inferences that make studies important and this aspect is dealt with in inferential
statistics.
Most predictions of the future and generalizations about a population by studying a
smaller sample come under the purview of inferential statistics. Most social sciences
experiments deal with studying a small sample population that helps determine how the
population in general behaves. By designing the right experiment, the researcher is able
to draw conclusions relevant to his study.
While drawing conclusions, one needs to be very careful so as not to draw
the wrong or biased conclusions. Even though this appears like a science, there are ways in
which one can manipulate studies and results through various means.

Population vs Sample

The population includes all objects of interest whereas the sample is only a portion of
the population. Parameters are associated with populations and statistics with samples.
Parameters are usually denoted using Greek letters (μ, σ) while statistics are usually
denoted using Roman letters (x̄, s).
There are several reasons why we don't work with populations. They are usually large,
and it is often impossible to get data for every object we're studying. Sampling does not
usually occur without cost, and the more items surveyed, the larger the cost.
We compute statistics, and use them to estimate parameters. The computation is the
first part of the statistics course (Descriptive Statistics) and the estimation is the second
part (Inferential Statistics).

• Parameter
  o Characteristic or measure obtained from a population.
• Statistic (not to be confused with Statistics)
  o Characteristic or measure obtained from a sample.
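
As a rough illustration of the difference, the sketch below builds a synthetic population, computes
a parameter from all of it, and then computes statistics from a random sample. The heights, the
population size of 1,000, the sample size of 50, and the seed are all assumptions made for the
example.

```python
import random
import statistics

# Hypothetical population: heights in cm of 1,000 people (synthetic data).
rng = random.Random(42)
population = [rng.gauss(165, 8) for _ in range(1000)]

# Parameter: computed from every member of the population.
mu = statistics.mean(population)

# Statistics: computed from a simple random sample of 50 members.
sample = rng.sample(population, 50)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)

print(f"parameter mu = {mu:.2f}")
print(f"statistics x_bar = {x_bar:.2f}, s = {s:.2f}")
```

The sample mean will usually land close to, but not exactly on, the population mean. Estimating
how close is the job of inferential statistics.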

Discrete vs Continuous (Categorical and Continuous Variables)

Categorical variables are also known as discrete or qualitative variables. Categorical
variables can be further categorized as either nominal, ordinal or dichotomous.
Discrete variables are usually obtained by counting. There are a finite or countable
number of choices available with discrete data. You can't have 2.63 people in the room.
Continuous variables are usually obtained by measuring. Length, weight, and time are all
examples of continuous variables. Since continuous variables are real numbers, we usually
round them. This implies a boundary depending on the number of decimal places. For
example: 64 is really anything 63.5 ≤ x < 64.5. Likewise, if there are two decimal places,
then 64.03 is really anything 64.025 ≤ x < 64.035. Boundaries always have one more decimal
place than the data and end in a 5.
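
The boundary rule can be checked with a small helper. This is a sketch, not a standard routine:
the function name boundaries is hypothetical, and it assumes the recorded value is passed as a
string so the number of decimal places is preserved.

```python
from decimal import Decimal

def boundaries(recorded: str) -> tuple:
    """Return (lower, upper) so that lower <= x < upper is recorded as `recorded`."""
    value = Decimal(recorded)
    # The exponent is minus the number of decimal places: 0 for "64", -2 for "64.03".
    exponent = value.as_tuple().exponent
    # Half of one unit in the last recorded decimal place.
    half_unit = Decimal(5) * Decimal(10) ** (exponent - 1)
    return value - half_unit, value + half_unit

print(boundaries("64"))     # (Decimal('63.5'), Decimal('64.5'))
print(boundaries("64.03"))  # (Decimal('64.025'), Decimal('64.035'))
```

Decimal is used instead of float so that results such as 64.025 are exact rather than binary
approximations.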

Levels of Measurement
There are four levels of measurement: Nominal, Ordinal, Interval, and Ratio. These go
from lowest level to highest level. Data is classified according to the highest level which it
fits. Each additional level adds something the previous level didn't have.
• Nominal is the lowest level. Only names are meaningful here.
  Nominal variables (also called categorical variables) can be placed into
  categories. They don't have a numeric value and so cannot be added,
  subtracted, divided or multiplied. They have two or more categories but no
  intrinsic order; if they appear to have an order then you probably have
  ordinal variables instead. For example, a real estate agent could classify
  their types of property into distinct categories such as houses, condos, co-ops
  or bungalows. Other examples are the brand names of motorcycles in the
  Philippine market and the names of public and private universities in Negros
  Island.
  Dichotomous variables are nominal variables which have only two categories or
  levels. For example, if we were looking at gender, we would most probably
  categorize somebody as either "male" or "female". This is an example of a
  dichotomous variable (and also a nominal variable). Another example might be
  if we asked a person if they owned a mobile phone. Here, we may categorize
  mobile phone ownership as either "Yes" or "No".
• Ordinal adds an order to the names.
  Ordinal variables are variables that have two or more categories, just like
  nominal variables, only the categories can also be ordered or ranked. So if you
  asked someone if they liked the policies of the Duterte Administration and they
  could answer "Not very much", "They are OK", "Yes, a lot", and so on, then you
  have an ordinal variable. Why? Because the categories can be put in order.

Thus, the result can be ranked, you can rank them from the most positive (Yes,
a lot), to the middle response (They are OK), to the least positive (Not very much).
However, while we can rank the levels, we cannot place a "value" to them; we cannot
say that "They are OK" is twice as positive as "Not very much" for example.

Another example:
Response  Strongly disagree  Disagree  Undecided  Agree  Strongly agree
Rating    1                  2         3          4      5

Characteristics of the Ordinal Scale

• The ordinal scale shows the relative ranking of the variables
• It identifies and describes the magnitude of a variable
• Along with the information provided by the nominal scale, ordinal scales give the
  rankings of those variables
• The interval properties are not known
• The surveyors can quickly analyze the degree of agreement concerning the identified
  order of variables

Examples of the ordinal scale:
• Ranking of school students – 1st, 2nd, 3rd, etc.
• Ratings in restaurants
• Evaluating the frequency of occurrences
  o Very often
  o Not often
  o Not at all
• Assessing the degree of agreement
  o Totally agree
  o Totally disagree

• Interval adds meaningful differences.
  Interval variables are variables whose central characteristic is that they can
  be measured along a continuum and they have a numerical value (for example,
  temperature measured in degrees Celsius or Fahrenheit). So the difference between
  20°C and 30°C is the same as 30°C to 40°C. However, temperature measured in
  degrees Celsius or Fahrenheit is NOT a ratio variable.

Characteristics of Interval Scale:

• The interval scale is quantitative as it can quantify the difference between the values
• It allows calculating the mean and median of the variables
• To understand the difference between the variables, you can subtract the values
  between the variables
• The interval scale is the preferred scale in Statistics as it helps to assign
  numerical values to arbitrary assessments such as feelings, calendar types, etc.

Examples of scales often treated as interval:
• Likert Scale
• Net Promoter Score (NPS)
  How likely is it that you would recommend this company to a friend or colleague?

• Ratio adds a zero so that ratios are meaningful.

Note: The ratio scale is exactly the same as the interval scale with one major
difference: zero is meaningful.

Ratio variables are interval variables, but with the added condition that 0 (zero) of
the measurement indicates that there is none of that variable. So, temperature
measured in degrees Celsius or Fahrenheit is not a ratio variable because 0°C does not
mean there is no temperature. However, temperature measured in Kelvin is a ratio
variable as 0 Kelvin (often called absolute zero) indicates that there is no
temperature whatsoever. Other examples of ratio variables include height, mass,
distance and many more.
The name "ratio" reflects the fact that you can use the ratio of measurements.
So, for example, a distance of ten metres is twice the distance of 5 metres.
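
A short numeric check makes the interval versus ratio distinction concrete. The two temperatures
below are arbitrary example values, and celsius_to_kelvin is a helper written for this sketch.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert degrees Celsius to kelvins."""
    return celsius + 273.15

cold_c, warm_c = 10.0, 20.0

# Differences are meaningful on both scales: the gap is 10 degrees either way.
difference_c = warm_c - cold_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios are meaningful only where zero means "none": 20 °C is not twice as hot as 10 °C.
naive_ratio = warm_c / cold_c
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(difference_c, round(difference_k, 2))  # 10.0 10.0
print(naive_ratio, round(true_ratio, 3))     # 2.0 1.035
```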

Ambiguities in classifying a type of variable

In some cases, the measurement scale for data is ordinal, but the variable is treated
as continuous. For example, a Likert scale that contains five values - strongly agree, agree,
neither agree nor disagree, disagree, and strongly disagree - is ordinal. However, where a
Likert scale contains seven or more values - strongly agree, moderately agree, agree, neither
agree nor disagree, disagree, moderately disagree, and strongly disagree - the underlying
scale is sometimes treated as continuous (although whether you should do this is a cause of
great dispute).
It is worth noting that how we categorize variables is somewhat of a choice. Whilst
we categorized gender as a dichotomous variable (you are either male or female), social
scientists may disagree with this, arguing that gender is a more complex variable involving
more than two distinctions, and also including categories such as genderqueer,
intersex and transgender. At the same time, some researchers would argue that a Likert
scale, even with seven values, should never be treated as a continuous variable.

Population vs Sample – What is the difference?

Usually, a sample of the population is used in research, as it is easier and more
cost-effective to process a smaller subset of the population rather than the entire group.
In this table, we can take a closer look at the difference between sample and population:

Population                                      Sample

The measurable characteristic of the            The measurable characteristic of the sample is
population like the mean or standard            called a statistic.
deviation is known as the parameter.

Population data is a whole and complete         The sample is a subset of the population that is
set.                                            derived using sampling.

A survey done of an entire population is        A survey done using a sample of the population
accurate and more precise with no margin of     bears accurate results, only after further
error except human inaccuracy in responses.     factoring the margin of error and confidence
However, this may not be possible always.       interval.

The parameter of the population is a            The statistic is the descriptive component of
numerical or measurable element that            the sample found by using sample mean or
defines the system of the set.                  sample proportion.

Although population and sample are two different terms, they are related to
each other. Samples are drawn from the population, and the primary purpose of a sample is
to make statistical inferences about the population. Without the population, samples can’t
exist. The better the quality of the sample, the higher the accuracy of those inferences.

Sample rather than the population

Population vs Sample – top seven reasons to choose a sample from a given population
Sampling is a must to conduct any research study. Here are the top seven reasons to use a
sample:
• Practicality: In most cases, a population is too large for it to be practical to collect
  accurate data from every member. Samples offer a representation of the whole population if
  sampled accordingly. Samples allow researchers to collect data that can be analyzed to provide
  insights into the entire population.
• It offers urgent data: When it comes to research, the amount of time available can be a
  defining factor for a study. A sample provides a smaller set of the population for review
  that delivers data that is useful to represent the whole population. Surveying a smaller
  sample, as opposed to the entire population, can save precious time for researchers and
  offer urgent data.
• Cost-effective: The cost of conducting research is often a constraint for the study.
  Researchers must do the best with the resources they have at hand to carry out a survey
  and gain accurate insights. Surveying a representative sample of a population is
  cost-effective as it requires fewer resources, like computers, researchers, interviewers,
  servers, and data collection centers.
• Accuracy of representation: Depending on the method of sampling, research conducted
  on a sample can be accurate, with less non-response bias than if performed by
  census. A sample that is selected using a probability method is an accurate
  representation of the population. The data collected can be used to gather insight into
  the whole community.
• Inferential statistics: Inferential statistics is a process by which representative data is
  used to infer insights about the entire population. Data collected from a sample
  represents the whole population. Inferential statistics can only be obtained using data
  from a sample.
• At times, a sample is more accurate than a census: A census of an entire population
  does not always offer accurate data due to errors such as inconsistency in responses, or
  non-response bias. A carefully obtained sample, however, reduces this bias and
  provides more accurate data that adequately represents the population.
• Manageable: Sometimes, collecting an entire population of data is near impossible as
  some populations are too challenging to come by. In this case, a sample can be used to
  represent the study as it is feasible, manageable, and accessible.

Types of Sampling (probabilistic and non-probabilistic sampling)

Random sampling simply describes when every element in a population has an
equal chance of being chosen for the sample.
Probability sampling means that every member of the target population has a
known chance of being included in the sample. Probability sampling methods include
simple random sampling, systematic sampling, stratified sampling, and cluster sampling.

• Random sampling is analogous to putting everyone's name into a hat and drawing out
  several names. Each element in the population has an equal chance of being selected.
  While this is the preferred way of sampling, it is often difficult to do. It requires that
  a complete list of every element in the population be obtained. Computer-generated
  lists are often used with random sampling.
• Systematic sampling is easier to do than random sampling. In systematic sampling, the
  list of elements is "counted off". That is, every kth element is taken. This is similar to
  lining everyone up and numbering off "1,2,3,4; 1,2,3,4; etc". When done numbering,
  all people numbered 4 would be used.
• Stratified sampling divides the population into groups called strata according to
  some characteristic rather than geography. For instance, the population
  might be separated into males and females. A sample is taken from each of these
  strata using either random, systematic, or convenience sampling.
• Cluster sampling is accomplished by dividing the population into groups, usually
  geographically. These groups are called clusters or blocks. Some clusters are randomly
  selected, and every element in the selected clusters is used.
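
The four probability sampling methods described above can be sketched as follows. Everything here
is illustrative: the 40-person sampling frame, the field names sex and district, and the function
names are assumptions made for the example. The systematic version also uses a random starting
point, a common variant of the fixed "everyone numbered 4" rule.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 40 people, each with a sex and a district.
population = [
    {"id": i, "sex": "F" if i % 2 else "M", "district": f"D{i % 4}"}
    for i in range(40)
]

def simple_random(frame, n):
    """Draw n elements, each with an equal chance (names from a hat)."""
    return rng.sample(frame, n)

def systematic(frame, k):
    """Take every kth element after a random start within the first k."""
    start = rng.randrange(k)
    return frame[start::k]

def stratified(frame, key, n_per_stratum):
    """Split by a characteristic, then sample randomly within each stratum."""
    strata = {}
    for person in frame:
        strata.setdefault(person[key], []).append(person)
    return [p for group in strata.values() for p in rng.sample(group, n_per_stratum)]

def cluster(frame, key, n_clusters):
    """Randomly pick whole groups and keep every element in them."""
    names = sorted({person[key] for person in frame})
    chosen = set(rng.sample(names, n_clusters))
    return [p for p in frame if p[key] in chosen]

print(len(simple_random(population, 8)))           # 8
print(len(systematic(population, 4)))              # 10
print(len(stratified(population, "sex", 5)))       # 10
print(len(cluster(population, "district", 2)))     # 20
```

Note the structural difference between the last two: stratified sampling takes some elements
from every group, while cluster sampling takes every element from some groups.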

Cluster sampling starts by dividing a population into groups, or clusters. What
makes this different from stratified sampling is that each cluster must be representative of
the population. Then, you randomly select entire clusters to sample.

Advantages:
• Cluster sampling: convenience and ease of use.
• Simple random sampling: creates samples that are highly representative of the
  population.
• Stratified random sampling: creates strata or layers that are highly representative of
  strata or layers in the population.
• Systematic sampling: creates samples that are highly representative of the population,
  without the need for a random number generator.

Disadvantages:
• Cluster sampling: might not work well if unit members are not homogeneous (i.e. if
  they are different from each other).
• Simple random sampling: tedious and time consuming, especially when creating larger
  samples.
• Stratified random sampling: tedious and time consuming, especially when creating
  larger samples.
• Systematic sampling: not as random as simple random sampling.

Non-probability sampling

Non-probability sampling, on the other hand, does not involve “random”
processes for selecting participants. In non-probability sampling, the members of
the population will not have an equal chance of being selected, and in many
cases, there will be members of the population who have no chance of being
selected. For example, if your population of interest is college professors but you
only invite professors from your school to participate, this would be a non-
probability sample because professors from other colleges have no chance to
participate. This method of convenience sampling, which involves selecting only
participants who are readily accessible, is one of the most common types of non-
probability sampling.
Non-probability sampling is not ideal for quantitative research because
results from non-probability samples cannot be generalized to the larger
population as confidently as results from probability samples. However,
non-probability sampling is often used in quantitative research because probability
sampling is not always feasible. Going back to the college professor example, it
may not be possible for you to select a random sample from all possible college
professors in the general population. You likely would not be able to compile a
list of every single college professor in the population with their contact
information. In these cases, quantitative researchers may resort to convenience
sampling. On the other hand, non-probability sampling is well-suited for many
types of qualitative research. This is because qualitative research is not always
concerned with generalizing the results to a larger population.
Qualitative researchers often use purposive sampling, a non-probability
sampling technique in which the researcher chooses participants because they
have specific expertise or insight regarding the phenomenon of interest.
• Convenience sampling is very easy to do, but it's probably the worst technique to use.
  In convenience sampling, readily available data is used, such as the first people the
  surveyor runs into.
• Quota sampling is a non-probabilistic sampling method where we divide the survey
  population into mutually exclusive subgroups. These subgroups are selected with
  respect to certain known (and thus non-random) features, traits, or interests. People
  in each subgroup are selected by the researcher or interviewer who is conducting the
  survey.
How to get quota sampling right

Unlike random sampling or stratified sampling, quota sampling has no formal rules or
proportions. Follow the steps below to get quota sampling right.
1. Divide the sample population into subgroups
These should be mutually exclusive. For example, you might divide a certain student
population by their professional degree courses, such as engineering, arts, humanities, and
medicine.
2. Figure out the weightages of subgroups
The weightage is how much of your sample a given subgroup will be. For example, you can
assign a weightage of 25% for engineering students, 30% for humanities students, 15% for arts
students, and 30% for students specializing in medicine.
3. Select an appropriate sample size
The quota size should be representative of the collective subgroup population. For example,
you can select a total sample of 500 students from a population of 50,000 students.
4. Survey while adhering to the subgroup population proportions
Choose survey respondents based on the weightages allotted in Step 2. For our example,
survey engineering students until you reach the specified weightage — 25% of the 500-
student sample, which is 125 students. Continue with this process until all the quotas are
filled and 500 students have been surveyed.
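
The arithmetic in the steps above can be sketched as below, using the weightages and sample size
from the example. The helper try_add is a hypothetical name, and the rule of turning away
respondents once a quota is full is one simple way an interviewer might enforce the quotas.

```python
# Weightages from Step 2, expressed as whole percentages of the sample.
weightages = {"engineering": 25, "humanities": 30, "arts": 15, "medicine": 30}
sample_size = 500  # total sample from Step 3

# The weightages must cover the whole sample.
assert sum(weightages.values()) == 100

# Quota for each subgroup = weightage x total sample size.
quotas = {group: sample_size * pct // 100 for group, pct in weightages.items()}

def try_add(respondent_group, counts):
    """Count a respondent only if their subgroup's quota is still open."""
    if counts.get(respondent_group, 0) < quotas[respondent_group]:
        counts[respondent_group] = counts.get(respondent_group, 0) + 1
        return True
    return False

print(quotas)
# {'engineering': 125, 'humanities': 150, 'arts': 75, 'medicine': 150}
```

Integer division is exact here because every quota is a whole number. With other sizes the
quotas would need rounding so that they still add up to the total sample.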

Snowball Sampling: Definition, Advantages and Disadvantages

Snowball sampling is where research participants recruit other participants for a test
or study. It is used where potential participants are hard to find. It’s called snowball
sampling because (in theory) once you have the ball rolling, it picks up more “snow” along
the way and becomes larger and larger. Snowball sampling is a non-probability sampling
method. It doesn’t involve the probability found in, say, simple random sampling (where
the odds are the same for any particular participant being chosen). Rather, the researchers
use their own judgment to choose participants.
Snowball sampling consists of two steps:
1. Identify potential subjects in the population. Often, only one or two subjects can be
found initially.
2. Ask those subjects to recruit other people (and then ask those people to recruit others).
Participants should be made aware that they do not have to provide any other names.
These steps are repeated until the needed sample size is found. Ethically, the study
participants should not be asked to identify other potential participants. Rather, they should
be asked to encourage others to come forward. When individuals are named, it’s sometimes
called “cold-calling”, as you are calling out of the blue. Cold-calling is usually reserved for
snowball sampling where there’s no risk of potential embarrassment or other ethical
concerns.
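
The two steps can be simulated on a made-up referral network. This is only a sketch: the network,
the participant labels, and the function name snowball are assumptions, and real recruitment
depends on people choosing to come forward.

```python
from collections import deque

# Hypothetical referral network: who each participant could encourage to come forward.
referrals = {
    "seed_1": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": [],
    "d": ["e"],
    "e": [],
}

def snowball(seeds, network, target_size):
    """Recruit wave by wave from the seeds until the target size is reached."""
    recruited = []
    seen = set()
    queue = deque(seeds)
    while queue and len(recruited) < target_size:
        person = queue.popleft()
        if person in seen:
            continue  # already recruited through another referral
        seen.add(person)
        recruited.append(person)
        queue.extend(network.get(person, []))
    return recruited

print(snowball(["seed_1"], referrals, 4))  # ['seed_1', 'a', 'b', 'c']
```

Because everyone in the sample is reachable from the initial seeds, people outside that network
can never be selected, which is why the method is non-probabilistic.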

Why is Snowball Sampling Used?

Some people may not want to be found. For example, if a study was investigating
cheating on exams, shoplifting, drug use, or any other “unacceptable” societal behavior,
potential participants would be wary of coming forward because of possible ramifications.
However, other study participants would likely know other people in the same situation as
themselves and could inform others about the benefits of the study and reassure them of
confidentiality.

Advantages and Disadvantages of Snowball Sampling

Advantages:
• It allows for studies to take place where otherwise it might be impossible to conduct
  them because of a lack of participants.
• Snowball sampling may help you discover characteristics about a population that you
  weren’t aware existed. For example, the casual illegal downloader vs. the for-profit
  downloader.

Disadvantages:
• It is usually impossible to determine the sampling error or make inferences about
  populations based on the obtained sample.
Snowball sampling is also known as cold-calling, chain sampling, chain-referral sampling,
and referral sampling.

Judgmental sampling is a non-probability sampling technique where the researcher
selects units to be sampled based on their knowledge and professional judgment.
This type of sampling technique is also known as purposive sampling and
authoritative sampling.
Purposive sampling is used in cases where the specialty of an authority can select a
more representative sample that can bring more accurate results than by using
probability sampling techniques. The process involves nothing but purposely handpicking
individuals from the population based on the authority's or the researcher's knowledge and
judgment.
Example of Judgmental Sampling

In a study wherein a researcher wants to know what it takes to graduate summa cum
laude in college, the only people who can give the researcher first-hand advice are the
individuals who graduated summa cum laude. With this very specific and very limited pool of
individuals that can be considered as subjects, the researcher must use judgmental
sampling.

When to Use Judgmental Sampling

Judgmental sampling design is usually used when a limited number of individuals
possess the trait of interest. It is the only viable sampling technique in obtaining information
from a very specific group of people. It is also possible to use judgmental sampling if the
researcher knows a reliable professional or authority that he thinks is capable of assembling
a representative sample.

Setbacks of Judgmental Sampling

The two main weaknesses of authoritative sampling are with the authority and in the
sampling process; both of which pertain to the reliability and the bias that accompany the
sampling technique.
Unfortunately, there is usually no way to evaluate the reliability of the expert or the
authority. The best way to avoid sampling error brought by the expert is to choose the best
and most experienced authority in the field of interest.
When it comes to the sampling process, it is usually biased since
no randomization was used in obtaining the sample. It is also worth noting that the members
of the population did not have equal chances of being selected. The consequence of this is
the misrepresentation of the entire population which will then limit generalizations of the
results of the st

