(Math 01) Basic Statistics
STATISTICS
Statistics - the art and science of collecting, presenting/organizing, analyzing, and interpreting data
Collection of data: the process of gathering relevant information from the population.
Presentation/Organization of data: the systematic arrangement of data into tables, graphs, or charts so that logical and
statistical conclusions can easily be derived from the collected information.
Analysis of data: the process of deducing relevant information from the given data so that numerical description can be
formulated.
Interpretation of data: deriving conclusions from the data that have been analyzed. It also involves making predictions or
forecasts about large groups based on data gathered from small groups.
FIELDS OF STATISTICS
1. Descriptive Statistics - methods concerned with collecting, describing, and analyzing a set of data without drawing
conclusions (or inferences) about a large group. Here, the statistician tries to describe a given situation.
Examples:
▪ Total number of Statistics students weighing at least 50 kilograms.
▪ The University registrar cited statistics showing an increase in the number of students during the past five years.
2. Inferential Statistics - methods concerned with the analysis of a subset of data leading to predictions or inferences
about the entire set of data. Here, the statistician tries to make inferences from samples to population. This area also
makes use of the concept of probability.
▪ Example: A new teaching strategy designed to improve the academic performance of college students was
tested on randomly selected college students. Based on the results, it was concluded that the new teaching
strategy is effective in improving the academic performance of college students.
TERMINOLOGIES
• Population - is the set of all possible values of the variable; it refers to the entire group being studied.
• Parameter – is a value calculated using the data from the population.
• Sample – is a subset of the population.
• Sampling – the process of obtaining information from only a part of the population.
• Statistic – is a value (average, percentage, etc.) calculated using the data from the sample.
• Sampling bias – a sampling method is biased if not all members of the population have an equal likelihood of being in the
sample.
• Variable – is a characteristic observed or measured on every unit of the universe.
a. Quantitative – provide information in which a count or quantity is most important (e.g. weight, height, systolic
blood pressure)
b. Qualitative – variables that yield observations by which individuals can be categorized according to some
characteristic or quality. (e.g., gender, marital status and blood type)
HISTORY
• 3800 B.C. – records of population in Babylonia
• 3000 B.C. – records of population in China
• 5000 years ago – Sumerian census for taxation
• Egyptians – occupation of their people
• Bible – census
• Athenians, Romans – census (male citizens) for war; census (general population) for food
• 2000 years ago – each Roman male had to return to the city for census and taxation
• Middle Ages – registrations on land ownership and manpower for wars
• 13th century – tax lists in Paris; earlier, in 1086, William the Conqueror required the compilation of information on
population and resources (The Domesday Book)
• Achenwall (1719 – 1772) - STATISTIKS
• Zimmermann and Sinclair – STATISTICS
• 16th century – gambling followed certain laws
• Girolamo Cardano – Liber de Ludo Aleae – 1st known study of probability
• Chevalier de Mere and Blaise Pascal – mathematics of probability
• Laplace – Théorie analytique des probabilités (1812)
• 18th century – statistics was used in the study “Political Arrangement of the Modern States of the Known World”
• 19th century – Quetelet (Belgian) applied the theory of probability to anthropological measurements, established a
central commission for Statistics
• Francis Galton – use of percentiles
• Galton and Pearson – correlation theory
• Sir Ronald Fisher – Fisher’s Test or F test
POPULATION
➢ the totality of observations (human or things) under study
➢ a measurable characteristic of a population is called parameter
SAMPLE
➢ part of the population that best represents the group
➢ have the same characteristics as the population it is representing
➢ a measurable characteristic of a sample is called a statistic
COLLECTION OF DATA
In every study, statisticians or researchers collect information for variables which describe the event to obtain data.
Variable - any characteristics, number, or quantity that can be measured or counted, can take on different values, and
may also be called a data item.
▪ Examples: age, gender, income, place of birth, skills, religion, eye color, and height
Quantitative
1. Discrete variables - consist of values obtained by counting, or by ratings such as those on a 5-point scale, for which no
intermediate values are possible.
▪ Example: The number of customers going to Jollibee in a day. There is no such thing as 105.5 customers.
2. Continuous variables - obtained by measurement and can take any value in some interval of real numbers.
▪ Example: the height of a person.
Types of Data
1. Primary Data - data that are collected personally by the researcher. This is information that
has never been gathered before, whether in a particular way or at a certain period of time.
▪ Examples: interviews and other ways that require face to face or personal interaction by the researcher
2. Secondary Data - data that come from other studies done by other individuals, institutions, or organizations.
▪ Examples: books, business periodicals, newspapers, government reports, or from the internet
Measurement Scales - Variables are classified by how they are categorized, counted, or measured. The scale of a variable
helps determine which statistical inference test will be used to analyze the data.
- Stanley Smith Stevens, 1946
1. Nominal Scale - classifies elements into two or more categories or classes and is used for labeling variables, without
any quantitative value. It possesses identity, which is the property that allows a person to distinguish one
number from another by the way each is written.
Examples:
▪ Status of employment of workers (measured as permanent, probationary, contractual)
▪ Marital status (measured as married, single, widowed, separated/annulled, divorced)
▪ Brand of automobile (measured as Toyota, Honda, Ferrari, and so forth); hair color (black, chestnut brown, etc.)
2. Ordinal Scale - classifies a variable according to rank or order of categories, but the differences between categories are
not really known. It possesses order, which is the property that tells us that the numbers follow an arrangement.
Examples:
▪ Performance of students in class (measured as outstanding, very satisfactory, satisfactory, poor, needs
improvement)
▪ Socio-economic status (measured as lower class, lower-middle class, middle class, upper-middle
class, upper class)
▪ Rank of winners in a pageant (queen/king, first runner-up, second runner-up, and so on)
▪ Likert-type scales (on a scale of 1 to 5…)
3. Interval Scale - numeric scale in which we know not only the order but also the exact differences between the values,
though it does not have a true zero. It carries with it the additive property, which allows numbers to be added.
Examples:
▪ Fahrenheit scale to measure the temperature
▪ A household’s socio-economic status based on the income level and age bracket to which it belongs
4. Ratio Scale - measurement scale that, in addition to being an interval scale, also has an absolute zero,
which means that none of the characteristic being measured is present.
Examples:
▪ All these can have a value of zero but they can’t have a negative value: Income (measured in Philippine peso, with
0 equal to no income at all), years of formal education, weight of an object, enzyme activity, dose amount, reaction
rate, flow rate, concentration, pulse, length, temperature in Kelvin (0.0 Kelvin really does mean “no heat”),
survival time.
▪ Note: When working with ratio variables, but not interval variables, the ratio of two measurements has a
meaningful interpretation.
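As a rough illustration of the note above, the sketch below compares two Fahrenheit readings (an interval scale) with the same readings converted to Kelvin (a ratio scale). The function name and the two sample temperatures are assumptions chosen only for illustration.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert a Fahrenheit reading to Kelvin."""
    return (deg_f - 32) * 5 / 9 + 273.15

# Hypothetical readings, chosen only for illustration.
cool_f, warm_f = 40.0, 80.0

# On the interval scale the ratio is 2.0, but it has no physical meaning
# because 0 degrees Fahrenheit does not mean "no heat".
ratio_fahrenheit = warm_f / cool_f

# On the ratio scale (Kelvin) the ratio of two measurements is meaningful.
ratio_kelvin = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

print(f"Fahrenheit ratio: {ratio_fahrenheit:.2f}")
print(f"Kelvin ratio:     {ratio_kelvin:.2f}")
```

The Kelvin ratio comes out at about 1.08, so 80 °F is not "twice as hot" as 40 °F even though the Fahrenheit numbers have a ratio of 2.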
SAMPLING
Sampling - process of choosing a representative portion from a statistical population to estimate attributes of the entire
population. This representative portion of the population is referred to as the sample
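To determine the sample size if the population is very large, we will use Slovin’s Formula:
n = N / (1 + Ne^2)
where n is the sample size, N is the population size, and e is the margin of error (expressed as a decimal).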
Margin of error – A statistic expressing the amount of random sampling error in a survey's results. It tells you how reliable
your surveys are. The larger the margin of error, the less confidence one should have that the survey's reported outcomes
are close to the "exact" figures; that is, the data for the entire population.
Examples:
1. Suppose that you want to know the status of financial wellness of the 1000 employees in SLC. Since 1000 is too large a
number of respondents to consider, you decide to take a sample of employees at a 5%
margin of error. What will be your sample size?
Given: N = 1000 ; e = 5% = 0.05
Required: n
Solution: n = N / (1 + Ne^2) = 1000 / (1 + 1000(0.05)^2) = 1000 / 3.5 = 285.71, rounded up to 286
Therefore, 286 employees will be taken as sample from the 1000 employees of SLC.
2. Gina wants to know the average income of the 3,000 families living in Barangay Carlatan, San Fernando, La Union.
Calculate the sample size Gina will need if a 3% margin of error is allowed.
Given: N= 3,000 ; e= 3% = 0.03
Required: n
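Solution: n = N / (1 + Ne^2) = 3000 / (1 + 3000(0.03)^2) = 3000 / 3.7 = 810.81, rounded up to 811
Therefore, Gina will need a sample of 811 families from the 3,000 families residing at Barangay Carlatan at a 3% margin of error.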
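3. Find the margin of error used if a sample of 800 is taken from a population of 2,400.
Given: N = 2,400 ; n = 800
Required: e
Solution: e = sqrt((N - n) / (Nn)) = sqrt(1,600 / 1,920,000) = 0.0289
Therefore, the margin of error used to get 800 samples from a population of 2,400 is 0.0289, or about 2.89%
(approximately 3%).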
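4. Find the population needed in order to have a sample size of 325 at 3.5% margin of error.
Given: n = 325 ; e = 3.5% = 0.035
Required: N
Solution: N = n / (1 - ne^2) = 325 / (1 - 325(0.035)^2) = 325 / 0.601875 = 539.98, or about 540
Therefore, the population needed to get 325 samples at a 3.5% margin of error is 540.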
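The four examples above can be checked with a short Python sketch. It assumes the usual form of Slovin's formula, n = N / (1 + Ne^2), and its two rearrangements for e and N. The function names are hypothetical and not part of the handout.

```python
import math

def slovin_sample_size(population, margin):
    """Sample size n = N / (1 + N e^2), rounded up to a whole respondent."""
    return math.ceil(population / (1 + population * margin ** 2))

def slovin_margin_of_error(population, sample):
    """Margin of error e = sqrt((N - n) / (N n)), from rearranging the formula."""
    return math.sqrt((population - sample) / (population * sample))

def slovin_population(sample, margin):
    """Population N = n / (1 - n e^2); only defined when n e^2 < 1."""
    denominator = 1 - sample * margin ** 2
    if denominator <= 0:
        raise ValueError("no finite population gives this sample size at this margin")
    return sample / denominator

print(slovin_sample_size(1000, 0.05))               # Example 1
print(slovin_sample_size(3000, 0.03))               # Example 2
print(round(slovin_margin_of_error(2400, 800), 4))  # Example 3
print(round(slovin_population(325, 0.035)))         # Example 4
```

Rounding the sample size up (rather than to the nearest whole number) is a design choice here, so that the achieved margin of error is never larger than the one requested.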
SAMPLING TECHNIQUES
Statistical sampling techniques - methods applied by researchers during the statistical sampling process. Sampling may
be done on either a probability (random sampling) or a non-probability (non-random sampling) basis.
A. Random Sampling
Each member of the population has an equal chance of being selected as subject in the study. The entire process of
sampling is done in a single step with each subject selected independently of the other members of the population.
1. Simple Random Sampling - most basic and common type of sampling method used in quantitative social science
research and in scientific research generally. The main benefit of the simple random sample is that each member
of the population has an equal chance of being chosen for the study. This helps make the sample
representative of the population and ensures that it is selected in an unbiased way.
a. Lottery/Fish bowl Technique - most common and easiest method of random sampling. This is done by
randomly picking a piece of paper, with numbers/names on it, which are placed in a bowl, a box, or a jar
in order to create the sample. To create a sample this way, the researcher must ensure that the numbers
are well mixed before selecting the sample population.
b. Random Number Table - One of the most convenient ways of creating a simple random sample is to use
a random number table. These will be composed of integers between zero and nine which are randomly
generated and are arranged in groups of five. These tables are carefully created to ensure that each
number is equally probable, so using it is a way to produce a random sample required for valid research
outcomes.
2. Systematic Sampling - process of selecting sample members from a larger population according to a random
starting point and a fixed, periodic interval. Here, every "kth" member is selected from the total population for
inclusion in the sample population.
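A minimal sketch of simple random sampling and systematic sampling follows. It uses Python's standard `random` module in place of a fish bowl or a random number table. The function names, the numbered population of 100 members, and the choice of k = N // n as the sampling interval are assumptions made for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members, each equally likely (like a well-mixed fish bowl)."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Take every k-th member after a random start, with k = N // n."""
    k = len(population) // n          # fixed sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    start = rng.randrange(k)          # random starting point in the first interval
    return [population[start + i * k] for i in range(n)]

# Hypothetical population: members numbered 1 to 100.
members = list(range(1, 101))

srs = simple_random_sample(members, 10, seed=1)
sys_sample = systematic_sample(members, 10, seed=1)
print(srs)
print(sys_sample)
```

Passing a seed only makes the draw repeatable for checking; in an actual study the seed would normally be left unset.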
3. Stratified Random Sampling - applied when the population is divided into different classes/strata wherein each
class/stratum must be represented in the study.
How to use stratified random sampling:
a. Define the population in the study.
b. Choose the stratification relevant to the study.
c. Make a list of the population in the study.
▪ Example:
Suppose that in SLC there are 380 employees consisting of the management team, the academic personnel,
academic support personnel, and the non–academic personnel. The table below shows the distribution of
the employees by strata:
Distribution of Employees in SLC
Management 20
Academic Personnel 235
Academic Support Personnel 35
Non – Academic Personnel 90
TOTAL NUMBER OF EMPLOYEES 380
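Suppose we are to take a sample of 195 employees, stratified according to the above categories. First, we
need to find the total number of staff (380) and calculate the percentage in each group. Then we multiply each
percentage by the sample size to get the number of employees to be drawn from each group.
Classification of Employees      N      %        n
Management                       20     5.26     10
Academic Personnel               235    61.84    121
Academic Support Personnel       35     9.21     18
Non – Academic Personnel         90     23.68    46
TOTAL                            380    100      195
This means that we have to get 10 employees from the management, 121 from the academic personnel, 18 from
the Academic Support Personnel, and 46 from the Non – Academic Personnel to be our sample.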
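The proportional allocation in this example can be sketched in Python as follows. The strata counts come from the table of SLC employees. The function name and the largest-remainder rounding rule (used so that the allocations always add up to the requested sample size) are assumptions, not the handout's stated method. The sample size is passed as a parameter, so other totals can be tried.

```python
def proportional_allocation(strata, sample_size):
    """Split sample_size across strata in proportion to each stratum's size."""
    total = sum(strata.values())
    # Exact (fractional) share of the sample for each stratum.
    exact = {name: sample_size * count / total for name, count in strata.items()}
    # Start from the whole-number part of each share.
    allocation = {name: int(share) for name, share in exact.items()}
    # Hand out the remaining units to the largest fractional remainders.
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation

strata = {
    "Management": 20,
    "Academic Personnel": 235,
    "Academic Support Personnel": 35,
    "Non-Academic Personnel": 90,
}

for name, n in proportional_allocation(strata, 195).items():
    share = 100 * strata[name] / sum(strata.values())
    print(f"{name:28s} N={strata[name]:3d}  {share:5.2f}%  n={n}")
```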
4. Cluster Sampling - used when the geographical area where the study will be conducted is too big and the target
population is too large. The selection of sample units is not by individuals but by groups called clusters. The area
will be divided into clusters, and then a desired number of clusters will be selected at random.
▪ Example:
Dra. Rica Rodriguez wants to make a province-wide study on the correlation between liquor drinking and death
rate. She decided to focus on the 19 municipalities and 1 city of La Union, which can be considered as clusters.
If five of the 20 municipalities and city are the desired sample units, the names of the 19 municipalities and 1
city will be written on small pieces of paper, then five will be picked at random using the lottery method. All
the residents of the selected five clusters will be included in the study.
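The cluster selection in this example could be sketched as below. The cluster labels and resident lists are made-up placeholders standing in for the 19 municipalities and 1 city, and `cluster_sample` is a hypothetical helper. Python's `random` module stands in for the lottery method.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    respondents = [member for name in chosen for member in clusters[name]]
    return chosen, respondents

# Hypothetical stand-ins for the 19 municipalities and 1 city,
# each with a tiny made-up list of residents.
clusters = {
    f"Cluster {i}": [f"resident {i}-{j}" for j in range(1, 4)]
    for i in range(1, 21)
}

chosen, respondents = cluster_sample(clusters, n_clusters=5, seed=7)
print(chosen)
print(len(respondents), "respondents in total")
```

Note that randomness enters only at the cluster level: once a cluster is drawn, all of its members are included.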
5. Multistage sampling - large populations are divided into stages to make the sampling process more practical. It
is usually a combination of stratified sampling or cluster sampling and simple random sampling.
▪ Example:
You want to find out which courses senior high school students in the Philippines prefer to take in
college. A list of all senior high school students in the Philippines would be nearly impossible to
obtain, so you cannot sample directly from the population. Instead, you divide the population into provinces and take
a simple random sample of the provinces. Then divide again the chosen provinces into municipalities and take
again a simple random sample. For the next stage, you might take a simple random sample of schools from
within those municipalities. Finally, you could do simple random sampling on the students within the schools
to get your sample.
B. Non-Random Sampling
1. Purposive sampling - the sample or respondents of the study will be chosen based on their characteristics or
knowledge of the information required by the researcher. This is also known as judgmental, selective, or
subjective sampling.
▪ Example:
Suppose Estrella wants to make a historical study about La Union. The target population will be the senior
citizens of La Union since they are the most reliable persons who know the history of the province.
3. Convenience Sampling - used by researchers who need the information the fastest way possible. The telephone
is one of the means that can be used to interview the respondents about their opinions on a certain issue. This
method may be fast but it is also biased because those who have no telephones do not have a chance to be
included in the study.
▪ Example:
Chrisjoel wants to know the favorite FM radio station of the college students in all the colleges and universities
in the City of San Fernando, La Union. To do this, he just interviewed his classmates, and called his friends and
asked them about their favorite FM radio station.
Methods of Collecting Data
1. Direct or Interview Method - the interviewer will have a face to face contact with the interviewee. The interviewer
will personally ask the needed information from the interviewee. This method provides consistent and more precise
information since clarifications can be made and questions may be repeated or reconstructed for better
understanding by the interviewee. However, this method is time-consuming, expensive and has limited coverage.
2. Indirect or Questionnaire Method - written answers are given to questions listed in a questionnaire. A questionnaire
is a list of queries meant to draw answers to the problems of the study. It is a popular means of collecting
data, but it is difficult to design and often requires many revisions before an acceptable questionnaire is produced.
3. Registration Method or Records Review - entails perusal of existing records of an agency or person.
Examples: data obtained from the National Statistics Office, Land Transportation Office, Department of Education,
CHED, SEC, Supreme Court, and other government agencies. Certain laws enforce this method.
4. Observation Method - commonly used in psychological and anthropological studies. It is a method of obtaining data
by observing the behavior of persons or organizations through seeing, hearing, tasting, touching, and smelling,
whichever is necessary to use. The observer can become part of the group being studied to have an authentic
experience of what he/she is studying.
5. Experiment Method - a controlled study in which the researcher attempts to understand cause-and-effect
relationships of certain events. This method is usually used in scientific studies like those conducted in the field of
medicine.
PRESENTATION OF DATA
1. Textual presentation shows the data in paragraph or narrative form. It combines words and numerical facts in a
statistical report. It includes listing essential characteristics, stressing significant statistics, and recognizing important
features of data.
Example:
Luzon, which is composed of eight regions, comprised more than half (56.9 percent) of the country’s total population.
It was followed by Mindanao (23.9 percent), which has six regions and Visayas (19.2 percent), which has four regions.
2. Graphical presentation is a pictorial representation of numerical data and is
considered the most effective method of visually presenting statistical results or findings. The different kinds of
graphs/charts are the following: line graph, bar graph, circle graph/pie chart, pictograph, map graph, scatter point
diagram, and box plot.
Examples:
a. Bar Graph - a technique of showing visually the differences in frequencies or percentages among categories
of a nominal or ordinal variable. The categories are displayed as rectangles of equal width with their height
proportional to the frequency or percentage of the category. Bar graphs are very useful for comparing
categories of a variable among different groups.
b. Line graph - a diagram that shows a line joining several points, or a line that shows the relationship between
the points.
e. Map Graph - an information graphic in the field of Statistics used to visualize quantitative data. It is especially
designed to show a particular theme connected with a specific geographic area.
f. Scatter Plot - or scatter graph, is a type of graph drawn on Cartesian coordinates to visually represent
the values of two variables for a set of data. It is a graphical representation that shows how one variable is
related to the other. The data are presented as a collection of points, each of which has the value
of one variable positioned on the horizontal or x-axis, also called the explanatory variable, and the value of the other
variable positioned on the vertical or y-axis, also called the response variable.
3. Tabular presentation - displays data in a more concise, systematic, and logical arrangement of rows and
columns called a statistical table.
Example:
Statistical tables - show the values of the cumulative distribution functions, probability functions, or
probability density functions of certain common distributions for different values of their parameters and are
used especially to determine whether or not a particular statistical result exceeds the required significance
level. They are also a useful means of displaying both qualitative and quantitative data.
Parts of a Frequency Distribution Table
1. Classes or class intervals (C.I.) are the range of each group of data in terms of numerical intervals.
Example: 29 – 34
2. Class limits refer to the smallest and highest values or observations in each class. The smallest value or observation in
each class is known as the lower-class limit and the highest value or observation is called the upper-class limit.
Example: Consider the class interval 29 – 34, the lower-class limit is 29 and the upper-class limit is 34.
3. Class size (i) is the length of a class interval. It is the difference between the upper and the lower
class boundaries. Thus, for the class interval 39-41, whose class boundaries are 38.5 and 41.5, the class size is 41.5 −
38.5 = 3. The class size is also obtained by getting the difference between two successive upper or lower limits.
Consider the successive lower limits 39 and 42; the difference is also 3. The size of the class intervals must be constant
throughout the distribution.
4. Frequency (f) shows the number of times a value falls under each class interval. This is obtained after making a tally
of all the given values or scores.
5. Class Mark (x) is the midpoint of a class interval. To obtain this, add the lower-class limit and upper-class limit, then
divide the sum by two. Example: for the class interval 39-41, x = (39 + 41) / 2 = 40.
7. Cumulative Frequency is the sum of the frequency of a class and the frequencies of all classes below it in a frequency
distribution. To find the cumulative frequency for a class, take the frequency of the current class and add the
cumulative frequency of the class below.
8. Relative Frequency is the fraction of times an answer occurs. To find the relative frequencies, divide each frequency
by the total number of observations (here, students) in the sample. All the relative frequencies add up to 1 (except for any
rounding error). The formula used in finding the relative frequencies is rf = f / n, where f is the class frequency and n is
the total number of observations.
Steps in Constructing a Frequency Distribution Table
1. Find the range using the formula R = highest value/score – lowest value/score.
2. Determine the class size (i) using the formula
3. Construct the class intervals. The lowest score in the data set will be the lower limit of the first class interval. Then
add the class size (i) minus 1 to the lower limit to get the upper limit of the interval. Do the same for the succeeding
intervals in the distribution until the interval containing the highest value/score is found. The lower limit of
each succeeding interval is always the upper limit of the previous interval plus 1.
4. Determine the class frequency (f) for each class interval by counting the elements (tallying). This is facilitated by the
tally column.
5. Complete the table by providing the rest of the columns – class mark, class boundary, cumulative frequency, and
relative frequency.
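The steps above can be sketched in Python as follows. The scores are hypothetical, the function name is an assumption, and the class size is passed in directly rather than computed, since the formula for it is not reproduced here. The sketch also assumes whole-number scores, so class boundaries sit 0.5 below and above the class limits.

```python
def frequency_table(data, class_size):
    """Build class intervals, boundaries, class marks, f, cf, and rf."""
    lowest, highest = min(data), max(data)
    rows = []
    cumulative = 0
    lower = lowest                      # lowest score starts the first interval
    while lower <= highest:
        upper = lower + class_size - 1  # upper limit = lower limit + i - 1
        freq = sum(1 for x in data if lower <= x <= upper)
        cumulative += freq
        rows.append({
            "interval": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "class_mark": (lower + upper) / 2,
            "f": freq,
            "cf": cumulative,
            "rf": freq / len(data),
        })
        lower = upper + 1               # next lower limit = previous upper limit + 1
    return rows

# Hypothetical scores, for illustration only.
scores = [39, 40, 41, 42, 42, 44, 45, 47, 47, 47]
table = frequency_table(scores, class_size=3)

for row in table:
    lo, hi = row["interval"]
    print(f"{lo}-{hi}  x={row['class_mark']}  f={row['f']}  "
          f"cf={row['cf']}  rf={row['rf']:.2f}")
```

The range R = highest score − lowest score is not computed separately here because the loop simply runs until the interval containing the highest score has been built.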
Example:
Construct a frequency distribution table for the data below: