Statistics 1

Chapter IV

The Science of Statistics


When Steve Harvey mistakenly announced Miss Colombia as the winner of the Miss Universe contest
in 2015, instead of Pia Wurtzbach of the Philippines, he was able to convince the people of the
world by showing statistics that it was indeed Pia who won.
Without statistics, Steve would have been dead by now.
History of Statistics
The word “statistics” is derived from the Latin word “status” or the Italian word “statista”, which
means “political state” or government.
Early applications of statistics were very limited, but rulers and kings needed information about the lands,
agriculture, commerce, and population of their states to assess their military potential, wealth, taxation,
and other aspects of government.
• In 1771, W. Hooper (an Englishman) defined statistics as the science that teaches the political
arrangement of all the modern states of the known world.
• In the 20th century, William S. Gosset developed methods for decision making based on
small sets of data.
Meaning of Statistics
• Statistics - is the study of the collection, organization, analysis, interpretation, and
presentation of data. It deals with all aspects of data, including the planning of data collection
in terms of the design of surveys and experiments.
The field of statistics is divided into mathematical statistics and applied statistics.
• Mathematical statistics - deals with the development and exposition of theories that serve as
the bases of statistical methods.
• Applied statistics - refers to the application of statistical methods to solve real problems
involving randomly generated data, as well as the development of new statistical methods
motivated by real problems.
There are two branches of statistics and these are inferential statistics and descriptive
statistics.
Inferential statistics - refers to the process of drawing conclusions and making decisions about the population based
on evidence obtained from a sample. Inferential statistics includes estimation and hypothesis testing.
Descriptive statistics - summarizes the population data by describing what was observed in the
sample, numerically or graphically. Numerical descriptors include the mean and standard
deviation for continuous data types (like heights or weights), while frequency and percentage are more
useful for describing categorical data (like race).
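The distinction between the two kinds of numerical descriptors can be sketched in a few lines of Python. This is only an illustration: the function names and the sample data are invented for the example, and the standard deviation shown is the sample (n - 1) version, which is one of two common conventions.

```python
from collections import Counter
from statistics import mean, stdev


def describe_continuous(values):
    """Return the mean and the sample standard deviation of numeric data."""
    return mean(values), stdev(values)


def describe_categorical(labels):
    """Return {category: (frequency, percentage)} for categorical data."""
    counts = Counter(labels)
    total = len(labels)
    return {cat: (freq, 100 * freq / total) for cat, freq in counts.items()}


# Hypothetical sample data, invented for illustration only.
heights_cm = [150, 160, 170, 180]
civil_status = ["single", "married", "single", "single"]

avg, sd = describe_continuous(heights_cm)
table = describe_categorical(civil_status)

print(f"mean height = {avg}, standard deviation = {sd:.2f}")
for category, (freq, pct) in table.items():
    print(f"{category}: frequency = {freq}, percentage = {pct:.1f}%")
```

Continuous measurements are summarized by a center and a spread, while categorical labels are summarized by counting how often each label appears.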
Data (Asaad, 2004) are the quantities or qualities, measured or observed, that are to be collected
and/or analyzed.
• A collection of data is called a data set.
The two categories of data are categorical and continuous data.
• Categorical data are measured on nominal and ordinal scales.
• Continuous data are measured on ratio and interval scales.
Nominal scale consists of a finite set of possible values having no particular order.
Example: Gender, Mode of transportation, Nationality, Occupation, Civil status
Ordinal Scale is a set of possible values having a specific order.
Examples: Pain level, Social status, Attitude toward a subject
Continuous data have interval and ratio scales.
Interval scales are measured on a continuum, and the differences between any two numbers on the scale
are of known size.
Examples: temperature, tons of garbage, number of arrests, income, and age.
A variable refers to a property that can take on different values or categories which cannot be
predicted with certainty.
The three common types of variables are:
(1) Independent variables, or X variables, which are also called
explanatory variables; these may be continuous, nominal, or ordinal.
(2) Dependent variables, or Y variables, which are also called response variables.
(3) Control variables, or Z variables.
Variables can also be classified as quantitative or qualitative.
A quantitative variable is one that can be measured and ordered according to quantity.
A qualitative variable is one simply used as a label to distinguish one group from another.
A quantitative variable may be discrete or continuous.
• A discrete variable takes a finite or countably infinite number of values.
• A continuous variable can take any value in an interval of the real number line.
The data gathered shall be presented, analyzed, and interpreted in textual, tabular, or graphical form, or a
combination of these.
Textual presentation uses statements with numerals to describe the data and the
concrete information and interpretation they carry.
Tabular presentation uses a statistical table to directly display the quantities or values collected
as data.
Graphical presentation illustrates data in the form of graphs, helping readers understand the
text easily.
Population and Sample
• In statistics, population refers to the totality of the individuals in the research study. The
population is denoted by capital N. The population is classified into target population and
sampled population.
• The target population is the entire set of individuals about which we require information.
• The sampled population is the basic finite set of individuals from which a sample is drawn.
This may be the target population, or it may be a more limited, more accessible population whose
properties we hope can be extrapolated to the larger target population.
• A sample is a representative portion of the population under study, that is, a
portion of the population chosen in such a manner that its characteristics and variations are
reflected.
Formula for solving the sample size:

$$n = \frac{N}{1 + N e^{2}}$$

Where:
n is the sample size
N is the population
e = 0.05
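As a worked illustration, the sample size formula above (commonly known as Slovin's formula) can be sketched in Python. The function name is hypothetical, e is treated as the margin of error with the default of 0.05 given in the text, and rounding the result up to a whole number is an assumption made here rather than something the formula itself states.

```python
import math


def sample_size(population, margin_of_error=0.05):
    """Compute n = N / (1 + N * e**2), rounded up to a whole individual.

    Rounding up is an assumption of this sketch, since a sample
    cannot contain a fraction of a person.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    n = population / (1 + population * margin_of_error ** 2)
    return math.ceil(n)


# For a hypothetical population of N = 1000 and e = 0.05:
# n = 1000 / (1 + 1000 * 0.0025) = 1000 / 3.5, about 285.7, rounded up to 286.
print(sample_size(1000))
```

Note that as N grows, the sample size levels off near 1 / e², which is 400 when e = 0.05.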
There are four (4) basic reasons for the use of samples, and these are as follows:
1. A sample allows us to obtain information with greater speed.
2. A sample allows us to obtain information with reduced cost.
3. A sample allows us to obtain information over a greater scope.
4. A sample allows us to obtain information with greater accuracy.
SUMMATION NOTATION
One important symbol that we will encounter is the Greek letter ∑ (sigma), which means to take the total
or sum, and x is the variable which represents a set of measurements from the first to the last.
General Form of Summation Notation
1. Summation of n Variables:

$$\sum_{i=1}^{n} x_i$$

(read as “the summation of x, where i runs from 1 to n”)

Thus, the sum of n observations is represented as

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \dots + x_n$$
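Summation notation maps directly onto a loop that adds the observations one at a time. The sketch below is only an illustration, with a hypothetical function name and made-up measurements; Python's built-in `sum` does the same job.

```python
def summation(x):
    """Add up x_1 + x_2 + ... + x_n, as the sigma notation describes."""
    total = 0
    for x_i in x:  # i runs from the first observation to the last
        total += x_i
    return total


# Hypothetical measurements: 4 + 7 + 9 + 10 = 30
measurements = [4, 7, 9, 10]
print(summation(measurements))
print(summation(measurements) == sum(measurements))
```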
Measures of Central Tendency
(Mean and Weighted Mean)
Central Tendency
A measure of central tendency determines a numerical value in the central region of a distribution of scores. It refers to the center
of a distribution of observations. There are three measures of central tendency: the mean, the median, and the
mode.
• The measures of central tendency give the statistician the average figure that may represent
the data set.
• The measures of central tendency also tell the statistician the middlemost measure of the data
set.
• Finally, they tell the statistician the most frequent score of the data set at hand.
Examples
1. After computing for the mean, the statistician may say that the average height of Filipinos is
5’5” whereas the average height of the Americans is 6’2”. Conclusion: Americans are taller
than the Filipinos in general.
2. After computing for the median of the scores in the recent exam, the value computed was 74.
Since the passing score is 75, the statistician concludes that at least 50% of the class
failed.
3. After the worship service, the church treasurer gathered the collection and found out that
99% of the paper bills are P20. The mode is P20. Conclusion: People can afford only P20
for church offering.
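The three measures can be compared side by side on a small data set. The sketch below uses Python's standard `statistics` module; the exam scores are hypothetical, chosen so that the median is 74 as in the second example, and the passing score of 75 is taken from that example.

```python
from statistics import mean, median, mode

# Hypothetical exam scores, invented for illustration only.
scores = [70, 72, 74, 74, 90]
passing_score = 75

average = mean(scores)      # (70 + 72 + 74 + 74 + 90) / 5 = 76
middle = median(scores)     # the middlemost score when sorted: 74
most_common = mode(scores)  # the most frequent score: 74

# Share of the class scoring below the passing score.
share_failed = sum(score < passing_score for score in scores) / len(scores)

print(average, middle, most_common, share_failed)
```

Here the mean (76) is above the passing score even though most of the class failed, because the single high score of 90 pulls the mean upward while the median stays at 74.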
Mean
The mean, Mn, is also called the arithmetic mean or average. It can be affected by extreme
scores. It is stable: it varies less from sample to sample than the median or the mode.
It is used if the most reliable measure is desired and when there are a few very low values. The
mean is the balance point of a score distribution.
How to compute the mean?
A. Ungrouped Data
a. Mean: $\mathrm{Mn} = \dfrac{\text{sum of the values}}{\text{the number of values}}$

B. Weighted Mean: $\mathrm{WMn} = \dfrac{\sum fX}{N}$

Where:
WMn = weighted mean, f = frequency, X = score
∑fX = sum of the product of frequency and score
N = total frequency
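Both formulas can be sketched in Python as follows. The function names and the data are hypothetical, and the sketch assumes that the scores and frequencies are given as two lists of equal length.

```python
def mean_ungrouped(values):
    """Mn = sum of the values / the number of values."""
    return sum(values) / len(values)


def weighted_mean(scores, frequencies):
    """WMn = (sum of f * X) / N, where N is the total frequency."""
    if len(scores) != len(frequencies):
        raise ValueError("scores and frequencies must have the same length")
    total_frequency = sum(frequencies)  # N
    sum_fx = sum(f * x for f, x in zip(frequencies, scores))  # sum of fX
    return sum_fx / total_frequency


# Hypothetical ungrouped data: (80 + 85 + 90) / 3 = 85
print(mean_ungrouped([80, 85, 90]))

# Hypothetical ratings 1 to 5 with their frequencies:
# sum of fX = 2 + 6 + 15 + 24 + 20 = 67, N = 20, WMn = 67 / 20 = 3.35
print(weighted_mean([1, 2, 3, 4, 5], [2, 3, 5, 6, 4]))
```

The weighted mean gives the same result as listing every score as many times as its frequency and taking the ordinary mean of that longer list.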
