Module 1: Nature of Statistics
Introduction
Statistical Thinking will one day be as necessary for efficient citizenship as the ability to read
and write (H. G. Wells).
In 2017, The Economist reported a striking change in the world economy: it claimed
that the world’s most valuable resource is no longer oil, but data. The five biggest tech giants
(Google, Amazon, Apple, Facebook, and Microsoft) had been profiting from the use of
consumer data. This phenomenon prompted professionals to turn to statistics and later
popularized the concept of data science.
To date, many companies all over the world are hiring statisticians and data
scientists to strengthen their competitive advantage. Since data is everywhere and everyone needs it
analyzed, you need to learn the essential knowledge and skills of statistics. As Florence Nightingale
puts it,
“Statistics… is the most important science in the whole world: for upon it depends the practical
application of every other science and of every art; the one science essential to all political and social
administration, all education, all organization based upon experience, for it only gives the results of
our experience.”
Levels of Measurement
Levels of measurement classify variables into types, and the type determines how a variable can be analyzed.
1. Nominal. It is a variable whose values have no undisputed order. It may have two or
more exhaustive, non-overlapping categories, but there is no intrinsic ordering of the
categories. Examples: sex, socioeconomic status, civil status, school division, religious
affiliation, mother tongue
2. Ordinal. It holds values that have an undisputed order but no fixed unit of measurement. An
ordinal variable is similar to a nominal variable except that there is a clear ordering of
the categories, although the difference between adjacent categories cannot be stated with
certainty. Examples: rating scales (Likert scales), shoe/shirt sizes, rankings, monthly
income (in ranges)
3. Interval. An interval variable is similar to an ordinal variable except that its values are equally
spaced. It has a fixed unit of measurement, but zero is arbitrary and does not indicate the absence of the quantity.
Examples: temperature (in °C or °F), pressure, IQ score, mental ability ratings
4. Ratio. A ratio variable is an interval variable with a true zero. It has a fixed unit of
measurement, and zero means the absence of the quantity being measured. Examples: weight, height, age, income
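The four levels can be told apart by three yes/no properties mentioned in the list: whether the values have an undisputed order, whether there is a fixed unit of measurement, and whether zero is a true zero. The Python sketch below is only an illustration of that decision logic; the function name `level_of_measurement` and the way each example variable is described are assumptions made here, not part of the module.

```python
def level_of_measurement(has_order: bool, fixed_unit: bool, true_zero: bool) -> str:
    """Classify a variable from the three properties used in the list above."""
    if not has_order:
        return "nominal"   # categories only, no intrinsic order
    if not fixed_unit:
        return "ordinal"   # ordered, but distances between values are unknown
    if not true_zero:
        return "interval"  # equally spaced, but zero is arbitrary
    return "ratio"         # equally spaced with a true zero


# Illustrative variables described as (has_order, fixed_unit, true_zero)
examples = {
    "civil status": (False, False, False),
    "Likert rating": (True, False, False),
    "temperature in Celsius": (True, True, False),
    "weight": (True, True, True),
}

for name, properties in examples.items():
    print(f"{name}: {level_of_measurement(*properties)}")
```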
Introduction
After identifying your research problem, the next step is to collect appropriate and relevant
data. Data collection is crucial to the success of any investigation or study. If the investigator is
unable to collect enough relevant data, the findings and results of the study will be affected; thus, the
conclusions, generalizations, or implications derived from the available data may not be reliable or
valid. Becoming an expert in data collection methods and techniques requires time and effort.
Guidance from an experienced researcher or statistician may help you in planning your data collection
and sampling design.
Data collection is a methodical process of gathering and analyzing specific information to give
solutions to relevant research questions.
Types of Data
1. Primary Data. These are data collected by the investigator himself/herself for a
specific purpose. For instance, the data collected by an investigator for their own research
project is an example of primary data.
2. Secondary Data. These are data collected by someone else for some other purpose, but
utilized by the current investigator for another purpose. For instance,
census data used to analyze the impact of education on career choice and earnings
is an example of secondary data.
Sampling Techniques
1. Probability Sampling. It is a sampling technique wherein the members of the population are
given an (almost) equal chance to be included in the sample.
Simple Random Sampling. All members of the population have an equal chance of being included in the sample.
Example: lottery method, random numbers
Systematic Random Sampling (with a random start). It selects every kth member of the
population with a starting point determined at random. Example: Selecting every 5th member
of N = 1000 to get a sample of 200. For instance, starting at the 7th member, we have the 12th, 17th,
22nd, and so on.
Stratified Random Sampling. This is used when the population can be divided into several smaller non-
overlapping groups (strata); the sample is then randomly selected from each group.
Cluster Sampling. Also called area sampling, in which groups or clusters, instead of individuals,
are selected randomly as the sample.
Multi-stage Sampling. If the population is too big, two or more sampling techniques may be
used until the desired sample is obtained.
2. Non-probability Sampling. It is a sampling technique wherein the sample is determined by set
criteria, purpose, or personal judgment.
1. Purposive or Judgment Sampling. The sample is selected based on predetermined criteria set by
the researcher. Example: To determine the difficulties encountered by students in the
2017 national achievement test, only the Grade 6 pupils of the said school
will be included in the sample.
2. Convenience or Accidental Sampling. It relies on data collection from population members who
are conveniently available to participate in the study. Facebook polls or questions
are a popular example of convenience sampling.
3. Quota Sampling. It is a non-probability sampling technique in which researchers look for a
specific characteristic in their respondents, and then take a tailored sample that is in
proportion to a population of interest.
4. Snowball Sampling. The sample is determined by referrals made by previous members of the sample.
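The first three probability sampling techniques can be sketched in a few lines of Python using the standard `random` module. This is a minimal illustration, not a full sampling design: the population is a hypothetical list of members numbered 1 to 1000, the two strata are invented, and the systematic sampler assumes the random start falls among the first k members so that every 5th member of 1000 yields exactly 200.

```python
import random


def simple_random_sample(population, n, rng):
    """Draw n members without replacement; every member is equally likely."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Take every k-th member after a random start among the first k members."""
    start = rng.randrange(k)  # 0-based position of the random start
    return population[start::k]


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of the same size from each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


rng = random.Random(2024)          # fixed seed so the sketch is repeatable
population = list(range(1, 1001))  # hypothetical members numbered 1..1000

srs = simple_random_sample(population, 200, rng)
systematic = systematic_sample(population, 5, rng)
strata = {"A": population[:600], "B": population[600:]}  # hypothetical strata
stratified = stratified_sample(strata, 50, rng)

print(len(srs), len(systematic), {name: len(s) for name, s in stratified.items()})
```

Cluster and multi-stage sampling follow the same pattern: the units passed to `rng.sample` are whole groups rather than individuals, and the draw is repeated at each stage.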
MODULE 3: DATA PRESENTATION AND VISUALIZATION
Introduction
Data visualization is a graphical representation of information and data. The different data
visualization tools provide an accessible way to see and understand trends, outliers, and patterns in
data. Being another form of visual art, data visualization grabs the interest and attention of the
audience on the message. It helps to tell the important stories by curating data into a form easier to
understand, highlighting the most important aspect of the data set. However, data presentation and
visualization are not as simple as creating graphs and tables. Effective presentation and visualization
of data involve a balance between form (aesthetics) and function.
A statistical graph (or chart) is a tool that helps readers understand the characteristics of the
distribution of a sample or a population. Effective data presentation follows these principles.
Five Essential Elements of Data Visualization (Data Craze, 2020)
1. Consistent Style and Colors. Carefully choose and maintain the same style across your
visualizations. Remember that the true meaning and value of data are not just in
2. Select Right Visualization. A bar or pie chart is not the only visualization method in your
arsenal. Adjust what you want to present based on the purpose and type of data you have.
3. Less is More. Focus on the quality of what you want to present. The excessive number of charts
or indicators is distracting. Simplicity comes at a price – the less information to analyze the
4. Effective Visualization. The difference between effective and impressive visualization can be
huge. The data presented in the application should foremost give a value – effect in the form of
specific
5. Data Quality. The trust of users is difficult to build, but it is easy to lose. Unexpected
information is desirable; errors are not. Try to detect errors at an early stage.
Other useful graphs and charts, with their descriptions, uses, and other important features, may be
found in The Data Visualization Catalogue at datavizcatalogue.com.
Here are some tips on improving your charts and graphs (Visme, 2020).
1. Our eyes do not follow a specific order, so you need to create that order. Create a visualization
that deliberately takes viewers on a predefined visual
2. Our eyes first focus on what stands out, so be intentional with your focal point. Create charts
and graphs with one clear message that can be effortlessly
3. Our eyes can only handle a few things at once, so do not overcrowd your design. Simplify your
charts so that they highlight one main point you want you
4. Our brains are designed to immediately look for connections and try to find meaning in the data.
Assign colors deliberately to improve the functionality of your
5. We are guided by cultural
Lesson 2: Tabular Presentation of Data
Almost all research and technical reports use tables to present data. Tabular presentation of
data is a systematic and logical arrangement of data into rows and columns with respect to the
characteristics of data.
Components of Tables
1. Table Number and Title. It is included for easy reference and identification. It should indicate
the nature of the information that is included in the table.
2. Stub (Row Labels). It is placed on the left side of the table, indicating the specific issues in the rows.
3. Captions (Column Headings). These are placed at the top of the columns of a table to explain the figures of
the columns.
4. Body. The most important part of the table, which comprises the numerical contents and reveals the
whole story of the investigated facts.
5. Footnote. It provides further explanation that may be needed for any item that is included in a table.
6. Source note. It is placed at the bottom of the table to indicate the sources of the data.
Tabular Presentation of Nominal and Ordinal Data
Nominal or ordinal data are presented using a frequency table or frequency distribution
table. The table displays frequency count and percentages for each value of a variable.
Example: Suppose your research objective is to determine the profile of the respondents. The data
may be presented as follows.
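As a minimal sketch of how such a frequency table can be built, the Python code below counts each category and converts the counts to percentages. The civil-status responses are hypothetical and serve only to illustrate the layout; `frequency_table` is a helper name introduced here.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, frequency, percentage) rows for categorical data."""
    counts = Counter(values)
    n = len(values)
    return [(category, f, round(100 * f / n, 1))
            for category, f in sorted(counts.items())]


# Hypothetical civil-status responses from 10 respondents
responses = ["Single", "Married", "Single", "Single", "Widowed",
             "Married", "Single", "Married", "Single", "Single"]

print(f"{'Civil Status':<14}{'f':>4}{'%':>8}")
for category, f, pct in frequency_table(responses):
    print(f"{category:<14}{f:>4}{pct:>8.1f}")
print(f"{'Total':<14}{len(responses):>4}{100.0:>8.1f}")
```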
A contingency table or crosstabulation can also be used to display the relationship between
categorical variables. This type of presentation allows us to examine a hypothesis regarding the
independence or dependence between variables.
Example: Suppose your research objective is to determine the profile of the respondents. The data
may be presented in crosstabulation as follows.
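A crosstabulation counts how often each pair of categories occurs together. The sketch below builds one with the standard library only; the paired responses (sex and civil status of 8 respondents) are hypothetical, and `crosstab` is a helper name introduced here for illustration.

```python
from collections import Counter


def crosstab(rows, cols):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(zip(rows, cols))
    row_cats = sorted(set(rows))
    col_cats = sorted(set(cols))
    table = {r: {c: counts[(r, c)] for c in col_cats} for r in row_cats}
    return table, row_cats, col_cats


# Hypothetical paired responses (sex, civil status) from 8 respondents
sex = ["F", "M", "F", "F", "M", "M", "F", "M"]
status = ["Single", "Single", "Married", "Single",
          "Married", "Single", "Single", "Married"]

table, row_cats, col_cats = crosstab(sex, status)

print(f"{'':<6}" + "".join(f"{c:>9}" for c in col_cats) + f"{'Total':>9}")
for r in row_cats:
    cells = [table[r][c] for c in col_cats]
    print(f"{r:<6}" + "".join(f"{v:>9}" for v in cells) + f"{sum(cells):>9}")
```

The row and column totals of such a table are the starting point for a test of independence between the two variables.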
Tabular Presentation of Interval and Ratio Data
The data on the interval or ratio scale are organized using a frequency distribution table.
These are the steps in constructing a frequency distribution table.
1. Determine the number of class intervals, $k = 1 + 3.322 \log_{10} n$, the range $R = \text{maximum} - \text{minimum}$, and the class size $c = R/k$.
2. Construct the class intervals based on the class size. The first and last class intervals should contain
the minimum and maximum value, respectively. It is advisable to start the first class interval
with the minimum value.
3. Arrange the data in either ascending or descending order. Then tally the scores based on the
class intervals in step 2.
4. Add columns for class boundaries, class mark or class midpoint, relative frequency, and
cumulative frequencies.
The class interval contains the lower (L) and upper (U) limits. (e.g. In the class interval 46
– 65, the lower limit is 46 and the upper limit is 65)
The class mark or class midpoint (X) is the value in the middle of the class interval. (e.g. In the
class interval 46 – 65, the class mark is 55.5; that is, $X = (46 + 65)/2 = 55.5$)
The class boundaries are the true class limits of the class intervals. They lie half a unit below the
lower limit and half a unit above the upper limit. (e.g. In the class interval 46 – 65, the class
boundaries are 45.5 – 65.5)
The relative frequency (also known as percentage frequency) is computed using the formula
$rf = \frac{f}{n} \times 100\%$, where f is the frequency of the class interval and n is the total of the frequencies.
The less than cumulative frequency (<cf) and greater than cumulative frequency (>cf) are
obtained by adding the frequencies from top to bottom and from bottom to top,
respectively.
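The quantities defined above are simple arithmetic, and a short Python sketch makes the rules explicit. The function names are introduced here for illustration, the `unit` parameter assumes the data are recorded to the nearest whole number, and the three-class frequency list is hypothetical.

```python
from itertools import accumulate


def class_mark(lower, upper):
    """Midpoint of the class interval."""
    return (lower + upper) / 2


def class_boundaries(lower, upper, unit=1):
    """True limits: half a unit below the lower and above the upper limit."""
    return lower - unit / 2, upper + unit / 2


def relative_frequency(f, n):
    """Class frequency as a percentage of the total frequency."""
    return 100 * f / n


def cumulative_frequencies(freqs):
    """Return (<cf, >cf): running totals from the top and from the bottom."""
    less_than = list(accumulate(freqs))
    greater_than = list(accumulate(reversed(freqs)))[::-1]
    return less_than, greater_than


print(class_mark(46, 65))         # 55.5
print(class_boundaries(46, 65))   # (45.5, 65.5)
print(relative_frequency(5, 10))  # 50.0
print(cumulative_frequencies([2, 5, 3]))  # ([2, 7, 10], [10, 8, 3])
```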
Example: Using the scores of 50 students in a 55-item Mathematics test, construct a frequency
distribution table.
43 30 35 37 42 19 26 48 34 15
35 18 46 41 27 18 13 40 29 14
40 17 10 21 28 13 14 39 30 5
19 50 36 20 31 28 48 32 20 38
25 12 33 31 28 16 40 32 26 35
Solution:
Step 1: Determine the number of class intervals, the range, and the class size.
$k = 1 + 3.322 \log_{10} n = 1 + 3.322 \log_{10}(50) = 6.643978 \approx 7$
$R = \text{maximum} - \text{minimum} = 50 - 5 = 45$
$c = R/k = 45/7 = 6.43 \approx 7$
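The Step 1 arithmetic can be checked with a few lines of Python. This sketch assumes a base-10 logarithm and assumes that both k and c are rounded up to the next whole number, which matches the values obtained in the solution.

```python
import math

# Scores of the 50 students, as listed in the example
scores = [
    43, 30, 35, 37, 42, 19, 26, 48, 34, 15,
    35, 18, 46, 41, 27, 18, 13, 40, 29, 14,
    40, 17, 10, 21, 28, 13, 14, 39, 30, 5,
    19, 50, 36, 20, 31, 28, 48, 32, 20, 38,
    25, 12, 33, 31, 28, 16, 40, 32, 26, 35,
]

n = len(scores)
k = math.ceil(1 + 3.322 * math.log10(n))  # number of class intervals
R = max(scores) - min(scores)             # range
c = math.ceil(R / k)                      # class size, rounded up

print(n, k, R, c)  # 50 7 45 7
```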
Step 2: Construct the class intervals based on the class size.
Since our minimum value is 5 and the class size is 7, the first class interval is 5 – 11. Note
that this class interval contains 7 values – 5, 6, 7, 8, 9, 10, 11.
To construct the succeeding intervals, add the class size to the lower and upper limits.
Class Intervals
5 – 11
12 – 18
19 – 25
26 – 32
33 – 39
40 – 46
47 – 53
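The rule for building the intervals (start at the minimum, then keep adding the class size to both limits) can be sketched as follows. The helper name `class_intervals` is an assumption made here, and the code assumes whole-number scores, so that an interval of size 7 starting at 5 ends at 11.

```python
def class_intervals(minimum, class_size, k):
    """Return k (lower, upper) limits, starting at the minimum value."""
    return [(minimum + i * class_size, minimum + (i + 1) * class_size - 1)
            for i in range(k)]


intervals = class_intervals(minimum=5, class_size=7, k=7)
for lower, upper in intervals:
    print(f"{lower} - {upper}")
```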
Step 3: Arrange the data in either ascending or descending order. Then tally the scores based on the
class intervals in step 2.
5 14 18 20 27 30 32 35 40 43
10 14 18 21 28 30 33 36 40 46
12 15 19 25 28 31 34 37 40 48
13 16 19 26 28 31 35 38 41 48
13 17 20 26 29 32 35 39 42 50
This data set can be organized or sorted using a stem-and-leaf plot. A stem-and-leaf plot is a special
table where each data value is split into a stem (the first digit or digits) and a leaf (the last digit).
Stem Leaf
0 5
1 0233445678899
2 00156678889
3 001122345556789
4 000123688
5 0
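A stem-and-leaf plot like the one above can be generated by splitting each score into its tens digit (stem) and units digit (leaf). The sketch below is one simple way to do it in Python; `stem_and_leaf` is a helper name introduced here, and stems with no scores are simply omitted.

```python
from collections import defaultdict

# Scores of the 50 students, as listed in the example
scores = [
    43, 30, 35, 37, 42, 19, 26, 48, 34, 15,
    35, 18, 46, 41, 27, 18, 13, 40, 29, 14,
    40, 17, 10, 21, 28, 13, 14, 39, 30, 5,
    19, 50, 36, 20, 31, 28, 48, 32, 20, 38,
    25, 12, 33, 31, 28, 16, 40, 32, 26, 35,
]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits) as a string."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in sorted(plot.items())}


plot = stem_and_leaf(scores)
print("Stem Leaf")
for stem, leaves in plot.items():
    print(f"{stem:<4} {leaves}")
```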
Step 4. Add columns for class boundaries, class mark or class midpoint, relative frequency, and
cumulative frequencies.
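As a minimal sketch of Steps 3 and 4 together, the Python code below tallies the 50 scores into the seven class intervals and adds the class boundaries, class mark (X), relative frequency (rf, in percent), and the two cumulative frequencies. It assumes whole-number scores, so the boundaries lie 0.5 below and above the class limits; the variable names are chosen here for illustration.

```python
from itertools import accumulate

# Scores of the 50 students, as listed in the example
scores = [
    43, 30, 35, 37, 42, 19, 26, 48, 34, 15,
    35, 18, 46, 41, 27, 18, 13, 40, 29, 14,
    40, 17, 10, 21, 28, 13, 14, 39, 30, 5,
    19, 50, 36, 20, 31, 28, 48, 32, 20, 38,
    25, 12, 33, 31, 28, 16, 40, 32, 26, 35,
]
intervals = [(5, 11), (12, 18), (19, 25), (26, 32), (33, 39), (40, 46), (47, 53)]

n = len(scores)
# Step 3: tally the scores that fall inside each class interval
freqs = [sum(lo <= x <= hi for x in scores) for lo, hi in intervals]
# Step 4: cumulative frequencies from top to bottom and from bottom to top
less_cf = list(accumulate(freqs))
greater_cf = list(accumulate(freqs[::-1]))[::-1]

print("Class     Boundaries      X     f     rf   <cf  >cf")
for (lo, hi), f, lcf, gcf in zip(intervals, freqs, less_cf, greater_cf):
    boundaries = f"{lo - 0.5} - {hi + 0.5}"
    mark = (lo + hi) / 2
    print(f"{lo:>2} - {hi:<4} {boundaries:<13} {mark:>5} {f:>4} "
          f"{100 * f / n:>6.1f} {lcf:>5} {gcf:>4}")
```

The frequencies should add up to n = 50, and the last <cf and the first >cf should both equal 50, which is a quick check that no score was missed or counted twice.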