Statistics and Probability Handouts - Basic Terms in Statistics
Introducing Statistics
Statistics as a Tool in Decision-Making
Statistics is defined as the science that studies data in order to make a decision.
Hence, it is a tool in the decision-making process. As a science, Statistics involves the
methods of collecting, processing, summarizing, and analyzing data in order to provide answers
or solutions to an inquiry. One also needs to interpret and communicate the results of these
methods to support a decision made when faced with a problem or an inquiry.
Trivia: The word “statistics” comes from the word “state”, because
governments have long been involved in statistical activities, especially the conduct of censuses
for military or taxation purposes. The need for and conduct of censuses are recorded in
the pages of holy texts. In the Christian Bible, particularly the Book of Numbers, God is reported
to have instructed Moses to carry out a census. Another census mentioned in the Bible is the
census ordered by Caesar Augustus throughout the entire Roman Empire before the birth of
Christ.
KEY POINTS
The difference between questions that could be answered using Statistics and those that
could not.
Statistics is a science that studies data.
There are many uses of Statistics but its main use is in decision-making.
Logical decisions or solutions to a problem could be attained through a statistical process.
Basic Terms in Statistics
Definition of Basic Terms
The collection of respondents from whom one obtains the data is called the universe of
the study. The universe is not necessarily composed of people, since there are studies where the
observations are taken from plants, animals, or even non-living things like buildings,
vehicles, farms, etc. So formally, we define the universe as the collection or set of units or
entities from which we obtain the data.
A variable is a characteristic that is observable or measurable in every unit of the
universe.
The set of all possible values of a variable is referred to as a population. Thus, for each
variable we observe, we have a population of values. The number of populations in a study will
be equal to the number of variables observed.
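The relationship among universe, variable, and population can be sketched in a few lines of Python. This is only an illustration: the households and their values are hypothetical, and the population of each variable is taken here to be its values across all units of the universe.

```python
# Hypothetical universe: each dict is one unit (here, a household).
universe = [
    {"household_size": 4, "owns_car": "yes"},
    {"household_size": 2, "owns_car": "no"},
    {"household_size": 5, "owns_car": "yes"},
]

# Variables are the characteristics observed on every unit.
variables = sorted(universe[0].keys())

# One population of values per variable.
populations = {var: [unit[var] for unit in universe] for var in variables}

for var in variables:
    print(var, populations[var])
```

Two variables were observed, so the sketch produces two populations, one list of values per variable.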
A subgroup of a universe or of a population is a sample. There are several ways to take
a sample from a universe or a population and the way we draw the sample dictates the kind of
analysis we do with our data.
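As one possible way of drawing a sample, the sketch below takes a simple random sample without replacement using Python's standard `random` module. The universe of 100 numbered units, the sample size of 10, and the seed are all assumptions made for illustration; other sampling schemes would call for different code and a different analysis.

```python
import random

# Hypothetical universe: 100 units identified by the numbers 1 to 100.
universe_ids = list(range(1, 101))

# A seeded generator so that the same sample can be reproduced.
rng = random.Random(42)

# Simple random sample of 10 distinct units, drawn without replacement.
sample = rng.sample(universe_ids, k=10)
print(sorted(sample))
```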
ii. Quantitative (otherwise called numerical) data, whose sizes are meaningful, answer
questions such as “how much” or “how many”. Quantitative variables have actual units of
measure. Examples of quantitative variables include the height, weight, number of
registered cars, household size, and total household expenditures/income of survey
respondents. Quantitative data may be further classified into:
a. Discrete data are those that can be counted, e.g., the number of days for
cellphones to fail, the ages of survey respondents measured to the nearest year,
and the number of patients in a hospital. These data assume only a finite or
countably infinite number of values.
b. Continuous data are those that can be measured, e.g. the exact height of a survey
respondent and the exact volume of some liquid substance. The possible values are
uncountably infinite.
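The distinction can be illustrated with the age example above: an exact age is a continuous measurement, while the same age recorded to the nearest year is discrete. The values below are hypothetical and the sketch is only meant to show the idea.

```python
# Hypothetical exact ages in years (continuous measurements).
exact_ages = [23.71, 35.02, 41.55]

# Recording each age to the nearest year gives discrete (countable) values.
ages_nearest_year = [round(age) for age in exact_ages]

print(ages_nearest_year)  # [24, 35, 42]
```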
KEY POINTS
Levels of Measurement
a. Nominal level of measurement arises when we have variables that are categorical and
non-numeric or where the numbers have no sense of ordering. Examples of the
variables measured at the nominal level include sex, marital status, religious affiliation.
The numbers used at the nominal level serve only as codes and cannot be used
for ordering or for any mathematical computation.
b. Ordinal level also deals with categorical variables, like the nominal level, but at this level
ordering is important; that is, the values of the variable can be ranked. Using the codes,
the responses can be ranked. Examples of the ordinal scale include socioeconomic
status (A to E, where A is wealthy and E is poor), difficulty of questions in an exam (easy,
medium, difficult), rank in a contest (first place, second place, etc.), and perceptions in
Likert scales.
c. Interval level tells us that one unit differs by a certain amount or degree from another
unit. Knowing how much one unit differs from another is an additional property of the
interval level, on top of the properties possessed by the ordinal level. The Celsius scale
is at the interval level. Another example of a variable measured at the interval level is the
Intelligence Quotient (IQ) of a person.
d. Ratio level also tells us that one unit has so many times as much of the property as
does another unit. The ratio level possesses a meaningful (unique and non-arbitrary)
absolute, fixed zero point and allows all arithmetic operations. The existence of the zero
point is the only difference between the ratio and interval levels of measurement. Examples of
the ratio scale include mass, height, weight, energy, and electric charge.
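The four levels differ in which operations give meaningful results, and a short Python sketch can make this concrete. All codes and values below are hypothetical. The Kelvin conversion is not part of this handout; it is brought in only to show that a ratio of Celsius readings changes when the arbitrary zero point changes.

```python
from collections import Counter

# Nominal: codes are labels only, so counting is fine but averaging is not.
marital_status_codes = {1: "single", 2: "married", 3: "widowed"}  # hypothetical coding
responses = [1, 2, 2, 3, 2]
most_common_code, _ = Counter(responses).most_common(1)[0]
most_common_status = marital_status_codes[most_common_code]  # "married"

# Ordinal: categories can be ranked, but gaps between ranks are not defined.
difficulty_rank = {"easy": 1, "medium": 2, "difficult": 3}
questions = ["difficult", "easy", "medium"]
ranked_questions = sorted(questions, key=difficulty_rank.get)

# Interval: differences are meaningful, ratios are not.
def celsius_to_kelvin(celsius):
    return celsius + 273.15

c_low, c_high = 10.0, 20.0
diff_celsius = c_high - c_low                                       # 10.0
diff_kelvin = celsius_to_kelvin(c_high) - celsius_to_kelvin(c_low)  # close to 10.0
ratio_celsius = c_high / c_low                                      # 2.0, not meaningful
ratio_kelvin = celsius_to_kelvin(c_high) / celsius_to_kelvin(c_low) # about 1.035

# Ratio: a true zero makes "twice as much" hold in any unit.
mass_kg = (1.0, 2.0)
mass_g = tuple(m * 1000 for m in mass_kg)
ratio_kg = mass_kg[1] / mass_kg[0]  # 2.0
ratio_g = mass_g[1] / mass_g[0]     # 2.0
```

The temperature difference is the same on both scales, but the ratio is not, which is why 20 degrees Celsius cannot be called "twice as hot" as 10 degrees Celsius. The mass ratio, in contrast, is unchanged by the change of unit.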
Variables are observed or measured using any of three methods of data collection,
namely: objective, subjective, and use of existing records. The objective and subjective
methods obtain the data directly from the source. The former uses any one or a combination of
the five senses (sight, touch, hearing, taste, and smell) to measure the variable, while the
latter obtains data by getting responses through a questionnaire. The resulting data from these
two methods of data collection are referred to as primary data.
On the other hand, secondary data are obtained through the use of existing records or
data collected by other entities for certain purposes. For example, when we use data gathered
by the Philippine Statistics Authority, we are using secondary data and the method we employ
to get the data is the use of existing records. Other data sources include administrative records,
news articles, the internet, and the like. However, when we use existing data, we must be
confident of the quality of the data by knowing how the data were gathered. Also, we must
remember to request permission and acknowledge the source of the data when using data
gathered by other agencies or people.
KEY POINTS