
STATAPP

Reasons for using statistics
• aid in summarization
• aid in “getting at what’s going on”
• aid in extracting “information” from the data
• aid in communication

Origin of statistics
❖ Statistics were obtained long before Christ was born in order to prepare available manpower and resources in wars.
❖ The word Statistics was derived from the Italian word statista, meaning statesman.

Uses of statistics:
The main function of statistics is to enlarge our knowledge of complex phenomena.

The following are some uses of statistics:
1. Presenting facts in a definite and precise form.
2. Data reduction.
3. Measuring the magnitude of variations in data.
4. Furnishing a technique of comparison.
5. Estimating unknown population characteristics.
6. Testing and formulating hypotheses.
7. Studying the relationship between two or more variables.
8. Forecasting future events.

What is/are Statistics?

In Singular sense
Statistics means the art or science concerned with the collection, presentation, and analysis of quantitative data so that intelligent judgments may be formed upon them.

In Plural sense
Statistics means information useful to decision makers, such as statistics on employment and unemployment, or vital statistics on births, deaths, and marriages.

Classifications:
Descriptive Statistics - Gives numerical and graphic procedures to summarize a collection of data in a clear and understandable way.
Inferential Statistics - Provides procedures to draw inferences about a population from a sample.

Populations & Samples
Population – The entire set of individuals or objects of interest, or the measurements obtained from all individuals or objects of interest.
Sample – A portion, or part, of the population of interest.

A Good Sample
o The sample needs to be large enough to accurately represent the entire population.
o Every unit in the population has an equal and independent probability of being chosen; individuals are chosen at random.
o If data are not collected randomly, they cannot be used in any meaningful way to draw conclusions about the population. Random (or probability) sampling lowers bias and sampling error.

Statistical data
The collection of data that are relevant to the problem being studied is commonly the most difficult, expensive, and time-consuming part of the entire research project. Statistical data are usually obtained by counting or measuring items.
• Primary data are collected specifically for the analysis desired.
• Secondary data have already been compiled and are available for statistical analysis.

A variable is an item of interest that can take on many different numerical values.
A constant has a fixed numerical value.

Data
Most data can be put into the following categories:
• Qualitative - data are measurements that each fall into one of several categories (hair color, ethnic groups, and other attributes of the population).
• Quantitative - data are observations that are measured on a numerical scale (distance traveled to college, number of children in a family, etc.).

Basic Concepts and Definitions
Variable: A characteristic about each individual element of a population or sample.
Data (singular): The value of the variable associated with one element of a population or sample. This value may be a number, a word, or a symbol.
Data (plural): The set of values collected for the variable from each of the elements belonging to the sample.
Experiment: A planned activity whose results yield a set of data.
Parameter: A numerical value summarizing all the data of an entire population.
Statistic: A numerical value summarizing the sample data.

Types of Variables
• Variables can be classified as discrete or continuous.
• Discrete variables (such as class size) consist of indivisible categories, and continuous variables (such as time or weight) are infinitely divisible into whatever units a researcher may choose. For example, time can be measured to the nearest minute, second, half-second, etc.

Example: A college dean is interested in learning about the average age of faculty. Identify the basic terms in this situation.
The population is the age of all faculty members at the college.
A sample is any subset of that population. For example, we might select 10 faculty members and determine their age.
The variable is the “age” of each faculty member.
One data value (data, singular) would be the age of a specific faculty member.
The data would be the set of values in the sample.
The experiment would be the method used to select the ages forming the sample and determining the actual age of each faculty member in the sample.
The parameter of interest is the “average” age of all faculty at the college.
The statistic is the “average” age for all faculty in the sample.

The variables in a study of a cause-and-effect relationship are called the independent and dependent variables.
The independent variable is the cause. Its value is independent of other variables.
The dependent variable is the effect. Its value depends on changes in the independent variable.

Research question | Independent variable(s) | Dependent variable(s)
--- | --- | ---
What is the effect of diet soda and regular soda on blood sugar levels? | The type of soda you drink (diet or regular) | Your blood sugar levels
How does phone use before bedtime affect sleep? | The amount of phone use before bed | Number of hours of sleep; quality of sleep
Do tomatoes grow fastest under fluorescent, incandescent, or natural light? | The type of light the tomato plant is grown under | The rate of growth of the tomato plant

2.1 Measuring Variables
To specify and apply the appropriate statistical approach for analysis and inference, one must have a thorough understanding of the nature and type of the data that will be used.
• To establish relationships between variables, researchers must observe the variables and record their observations. This requires that the variables be measured.
• The process of measuring a variable requires a set of categories called a scale of measurement and a process that classifies each individual into one category.

4 Types of Measurement Scales
1. A nominal scale is an unordered set of categories identified only by name. Nominal measurements only permit you to determine whether two individuals are the same or different.
2. An ordinal scale is an ordered set of categories. Ordinal measurements tell you the direction of difference between two individuals.
3. An interval scale is an ordered series of equal-sized categories. Interval measurements identify the direction and magnitude of a difference. The zero point is located arbitrarily on an interval scale.
4. A ratio scale is an interval scale where a value of zero indicates none of the variable. Ratio measurements identify the direction and magnitude of differences and allow ratio comparisons of measurements.
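To make the four scales concrete, the sketch below records which comparisons each scale supports and uses temperature as an illustration. The operation names and the `is_meaningful` helper are hypothetical, and the temperature case is a common textbook illustration rather than one taken from these notes: Celsius readings behave like an interval scale (0 °C is an arbitrary zero), whereas Kelvin is a ratio scale (0 K is an absolute zero).

```python
# Comparisons that are meaningful on each measurement scale, summarizing
# the four scales described above (the operation names are illustrative).
MEANINGFUL = {
    "nominal": {"same_or_different"},
    "ordinal": {"same_or_different", "direction"},
    "interval": {"same_or_different", "direction", "magnitude"},
    "ratio": {"same_or_different", "direction", "magnitude", "ratio"},
}


def is_meaningful(scale: str, comparison: str) -> bool:
    """Return True if `comparison` is meaningful for data measured on `scale`."""
    return comparison in MEANINGFUL[scale]


def celsius_to_kelvin(c: float) -> float:
    """Shift an interval-scale Celsius reading onto the ratio-scale Kelvin scale."""
    return c + 273.15


cool, warm = 10.0, 20.0  # two readings in degrees Celsius

print(is_meaningful("interval", "ratio"))  # False
print(warm / cool)  # 2.0, yet 20 °C is not "twice as hot" as 10 °C
print(round(celsius_to_kelvin(warm) / celsius_to_kelvin(cool), 3))  # 1.035 on the ratio scale
```

The naive ratio of Celsius readings is an artifact of where the arbitrary zero was placed; only after moving to a scale with a true zero does a ratio comparison carry meaning.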
Applications of Statistics
• describe the characteristics of the elements in the population under study through the computation or estimation of a parameter such as the proportion, total, or average
• compare the characteristics of the elements in the different subgroups in the population through contrasts of their respective summary measures
• justify an assertion made by the researcher about a particular characteristic of the population or subgroups in the population
• determine the nature and strength of relationships among the different variables of interest, such as the relationship of a person’s grade when he graduated from college with his current income, or the relationship of a person’s calorie intake and aerobic exercise with his weight
• identify the different groups of interrelated variables under study
• reveal the natural groupings of the elements in the population based on the values of a set of variables
• determine the effects of one or more variables on a response variable
• clarify patterns and trends in the values of a variable over time or space
• predict future outcomes such as next year’s GNP or next quarter’s demand for various agricultural products

2.2 Summation and Product Notation

2.2.1 Summation Notation
Summation notation is necessary in computing descriptive statistics like the total, average, and standard deviation. It uses a mathematical symbol to denote sums of numerical values.

For example, we would like to get the total amount spent for groceries for the last 6 months, or we would like to compute the total sales of a certain product for the week.

A. Definition

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n$$

Where:
$\sum$ is the summation operator (Greek capital letter sigma)
$i$ is the index of summation
$1$ is the lower limit of the summation
$n$ is the upper limit of the summation
$x_i$ is the value of the variable for the $i$th observation

The Greek capital letter sigma, $\sum$, is the shorthand used for the operation of addition. For instance, if $x_1, x_2, \ldots, x_6$ are the monthly grocery expenses for the last 6 months, the total amount spent is $\sum_{i=1}^{6} x_i = x_1 + x_2 + \cdots + x_6$.

NON-PROBABILITY SAMPLING

WHEN TO USE NON-PROBABILITY SAMPLING?
• THIS TYPE OF SAMPLING CAN BE USED WHEN DEMONSTRATING THAT A PARTICULAR TRAIT EXISTS IN THE POPULATION.
• IT CAN ALSO BE USED WHEN THE RESEARCHER AIMS TO DO A QUALITATIVE, PILOT OR EXPLORATORY STUDY.
• IT CAN BE USED WHEN RANDOMIZATION IS IMPOSSIBLE, LIKE WHEN THE POPULATION IS ALMOST LIMITLESS.
• THIS KIND OF SAMPLING IS USEFUL WHEN THE RESEARCHER HAS A LIMITED BUDGET, TIME AND WORKFORCE.

DIFFERENT TYPES OF NON-PROBABILITY SAMPLING
1. JUDGEMENTAL/PURPOSIVE
In this method, researchers select the samples based purely on the researcher’s knowledge and credibility. Researchers choose only those people whom they consider more fit to participate in the research study compared to other individuals. It’s commonly used in qualitative research, especially when considering specific issues with unique cases.
2. ACCIDENTAL/CONVENIENCE
Samples are selected from the population only because they are conveniently available to the researcher. Researchers choose these samples just because they are easy to recruit and accessible.
3. CONSECUTIVE
This sampling method is very similar to convenience sampling, with a slight variation. The consecutive sampling technique gives the researcher a chance to work with many topics and fine-tune his/her research by collecting results that have vital insights.
4. QUOTA
Quota sampling is a non-probability sampling method in which researchers create a convenience sample involving individuals that represent a population. Researchers choose these individuals according to specific traits or qualities. They decide and create quotas so that the market research samples can be useful in collecting data.
5. SNOWBALL
It is a method where new units are recruited by other units to form part of the sample; that is, an initial sample is asked to identify other members of the population.
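The snowball procedure described in item 5 can be sketched as a breadth-first walk over referrals. The referral network, the participant labels, and the `max_size` cap below are hypothetical; they only illustrate how an initial sample recruits further units. In a real study the referrals come from the participants themselves.

```python
from collections import deque


def snowball_sample(seeds, referrals, max_size):
    """Grow a sample from initial seeds by following referrals (breadth-first).

    seeds:     participants recruited directly by the researcher
    referrals: dict mapping each participant to the people they name
    max_size:  stop once the sample reaches this many units
    """
    sample = []
    seen = set()
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        if person in seen:
            continue  # already recruited through another referral
        seen.add(person)
        sample.append(person)
        # Each new participant is asked to identify other members of the population.
        for referred in referrals.get(person, []):
            if referred not in seen:
                queue.append(referred)
    return sample


# Hypothetical referral network (who names whom).
referrals = {
    "A": ["C", "D"],
    "B": ["D", "E"],
    "C": ["F"],
    "E": ["F", "G"],
}
print(snowball_sample(["A", "B"], referrals, max_size=5))  # ['A', 'B', 'C', 'D', 'E']
```

Because the sample grows only through the participants' own contacts, units with no connection to the seeds (or to anyone they name) can never be reached, which is one reason this method is classed as non-probability sampling.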
Advantages of Non-probability Sampling
• does not require a sampling frame
• allows researchers to target particular groups within the population
• more convenient and practical method
• gets responses more quickly
• more cost-effective

Disadvantages of Non-probability Sampling
• samples are unlikely to represent the population
• undermines the generalizability and validity of the results
• risk of several kinds of research bias, such as:
 - sampling bias
 - observer bias
 - undercoverage bias

SAMPLING

Definition:
Sampling is the act of studying only a segment or subset of the population which represents the whole.
It is basically concerned with the selection of a subset of individuals from within a statistical population to estimate the characteristics of the whole population.

Steps involved in sampling:
1. Identify and define the target population
2. Select the sampling frame
3. Choose sampling methods
4. Determine sample size
5. Collect the needed data

SAMPLING FRAME
It refers to a list or an organized representation of all the items or individuals that could potentially be selected for a research study or survey. It serves as the basis for selecting a representative sample from a larger population. The population could be people, objects, events, or any other units of interest depending on the research context.

QUALITIES OF A GOOD SAMPLING FRAME
• Includes all individuals in the target population.
• Excludes all individuals not in the target population.
• Includes accurate information that can be used to contact selected individuals.
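The three qualities of a good sampling frame can be checked mechanically once the target population and the frame are both available as lists. The sketch below is illustrative only: the `check_frame` helper, the record layout (an ID mapped to contact information), and the example IDs and addresses are hypothetical.

```python
def check_frame(target_ids, frame):
    """Compare a sampling frame against the target population.

    target_ids: set of IDs that belong to the target population
    frame:      dict mapping each listed ID to its contact info (None if missing)
    Returns the three problems a good frame should not have.
    """
    listed = set(frame)
    return {
        # Quality 1: every member of the target population should be listed.
        "missing_from_frame": sorted(target_ids - listed),
        # Quality 2: nobody outside the target population should be listed.
        "not_in_target": sorted(listed - target_ids),
        # Quality 3: every listed unit should be contactable.
        "no_contact_info": sorted(i for i, contact in frame.items() if not contact),
    }


# Hypothetical example: a class list used as the frame for a student survey.
target = {"S01", "S02", "S03", "S04"}
frame = {
    "S01": "s01@example.edu",
    "S02": None,               # listed, but cannot be contacted
    "S03": "s03@example.edu",
    "T99": "t99@example.edu",  # a teacher, not part of the target population
}
print(check_frame(target, frame))
# {'missing_from_frame': ['S04'], 'not_in_target': ['T99'], 'no_contact_info': ['S02']}
```

A frame for which all three lists come back empty satisfies the qualities listed above; a non-empty `missing_from_frame` list is exactly the situation that leads to undercoverage bias.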
POPULATION VS. SAMPLE
The population is the entirety of the group from whom you intend to derive conclusions. The sample, meanwhile, refers to the specific group of individuals from whom you will obtain your data.

Importance
• reducing cost
• improving accuracy
• creating more scope
• gathering data with greater speed

SAMPLING DESIGN: PROBABILITY SAMPLING

Simple Random Sampling – Uses randomly generated numbers to choose elements to sample.
Example: In our recitation in Tax 2, our professor uses shuffled index cards with our names written on them to choose someone to recite or to answer some questions. Each of us has an equal chance of being chosen and asked some questions.

Stratified Sampling – Divides a population into categorized groups and samples elements from each group (in the example below, one element per group).
Example: A person wanted to know which department in the University of Batangas is the best in math. In stratified sampling, one way to determine that is to choose one person in each department to represent it in a math competition. In that example, the population is the University of Batangas, the groups are the departments (e.g., Accounting department, Engineering department), and the samples are the representatives.

Systematic Sampling – Samples every nth element in a population, for example, every 3rd person.
Example: Selecting every 20th person to enter the movie room in a line of 200 moviegoers.

Cluster Random Sampling – Divides a population into representative clusters and samples one or more whole clusters.
Example: A group of researchers is conducting a study on the consumption of soda in Batangas City. The city is divided into areas called clusters, and then certain clusters are selected to be part of the sample group.

Sampling Design: Non-Probability Sampling

Convenience Sampling – Attempts to obtain samples without preplanning the selection of samples. It depends on the convenience of the researcher.
Example: Students in grade 12 are assigned to conduct research at their school. They are only permitted to conduct surveys during the week. During the week, the researcher selected samples or respondents who were present at the time.

Consecutive Sampling – A sampling method where the first subject that meets the inclusion criteria will be selected for the study. If the second subject also meets those criteria, he or she will also be included, and so forth.
Example: Online and social media surveys, asking acquaintances, and surveying people in a mall, on the street, and in other crowded locations regarding the quality of a make-up brand. After that, the researchers move from one place to another. Samples or respondents are selected because they happened to be in the right place at the right time.

Judgmental Sampling – The sample is selected based on the researcher’s judgment. It involves the researcher making a judgment on which elements of the population should be included in the sample.
Example: A researcher wants to conduct a study regarding the effectiveness of distance learning or online classes in Batangas City. The researcher believes that online-class students in Batangas should be the respondents of the research.

Quota Sampling – Relies on the non-random selection of a predetermined number or portion of units. This is called a quota.
Example: A cigarette company wants to find out which age group prefers which brand of cigarettes in Batangas City. They apply a survey quota on respondents aged 20 years old and above, with a total of 50 respondents, both male and female.

Snowball Sampling – A non-probability sampling method where new units are recruited by the other units to form part of the sample.
Example: A researcher is studying the experiences of undocumented immigrants in a particular city. This population might be difficult to reach through traditional sampling methods due to fear of legal repercussions, lack of formal records, and other barriers. The researcher might start by contacting a local organization that provides services to immigrants. Through this organization, the researcher could connect with a few willing individuals to participate in the study, who would then be asked to refer other members of the population.

Indicators
• An indicator is a specific, observable, and measurable characteristic that is used to demonstrate changes or progress in achieving a certain goal.
• For each outcome, there must be at least one indicator. This indicator must be focused, clear, and specific. The change measured by the indicator should represent the progress that the plan aims to make.
• An indicator should be defined in precise terms that state clearly and exactly what is being measured. The indicator should also give a good idea of the data needed and the population among whom the indicator is measured.
• An indicator does not define a certain level of achievement. Words like “improved”, “increased”, or “decreased” are not included in an indicator.

Characteristics of Indicators:
• Valid - it should accurately measure the behavior, practice, or task that is the expected output of the intervention.
• Reliable - it should be consistently measurable over time and in the same way by different observers.
• Precise - it should be operationally defined in clear terms.
• Measurable - it should be quantifiable with the use of the available tools or methods.
• Timely - a good indicator provides a measurement at time intervals that are relevant and appropriate in terms of the goals and activities of the plan/project.
• Programmatically important - it should be linked to the plan/project or to achieving the objectives of the plan/project.

Considerations
• Availability of data: Some data may be considered ‘privileged’ information by agencies, projects, or government officials.
• Data may be available only on aggregated levels or already calculated into indicators that may not be the ideal indicators for your programme or activities.
• Resources: Ideal indicators might require collecting data to calculate an unknown denominator, or national data to compare with project area data, or tracking lifetime statistics for an affected and/or control population, etc.
• The cost of collecting appropriate data for ideal indicators may be prohibitive.
• Human resources and technical skills may be a constraint as well.
• Programmatic and external requirements: Indicators may be imposed from above by those not trained in monitoring and evaluation techniques.
• Reporting schedules may not be synchronized (e.g. fiscal vs. reporting year).
• Different stakeholders’ priorities may diverge.
• Standardized indicators should be used if available.

How many indicators are enough?
Some guidelines to follow when selecting indicators:
• At least one or two indicators per result (ideally, from different sources)
• At least one indicator for every core activity (e.g. training, airing of TV spot)
• No more than 8-10 indicators per area of significant programme focus
• Use a mix of data collection strategies and sources

There are 2 different kinds of indicators, and knowing the difference between them is important.

Process Indicators are used to monitor the number and types of activities carried out. Examples include:
• The number and types of services provided
• The number of people trained

Results Indicators are used to evaluate whether or not the activity achieved the intended objectives or results. Examples include:
• The perceptions of survivors about the quality and benefits of services provided by an organization or institution, as measured by individual interviews
• Results indicators can be developed at the output, outcome and impact levels (Bott, Guedes and Claramunt, 2004).
• Output indicators illustrate change directly related to the activities undertaken in the program (e.g. the percentage of traditional leaders in community x who completed the training on international human rights standards related to violence against women and girls and whose knowledge improved).
• Outcome indicators refer to changes demonstrated as a result of program interventions over the medium and long term (e.g. the number of decisions in the informal justice system of community x related to violence against women that reflect a human rights-based approach).
• Impact indicators measure the long-term effect of programme interventions (e.g. the prevalence of violence against women and girls in community x).
• An important question that needs to be answered in order to track project progress is how to define success. Typically, organizations can track the number of events they have organized and the number of people who attended (outputs), but cannot track how people change their attitudes or behavior (outcomes), especially over time.
• The key impact indicator is a reduction in violence rates, but this will take years to achieve and measure. As a result, more indicators are needed to gauge whether programs are on track.
• Monitoring and evaluation frameworks and plans should incorporate both process and results indicators.

Examples:
Employment Survey:
Indicators: Localized unemployment rates (by region, department, and employment area)

Motivation needed by teachers:
Indicators:
1. Performance level in proportion to intrinsic motivation
2. Performance level in proportion to extrinsic motivation

Register of Enterprises:
Indicators: Business start-ups

Performance Level:
Indicators:
1. Students’ evaluation
2. Attendance rates

Probability Sampling
Probability sampling is a method in statistics that involves selecting a sample from a larger population using a random process, where each member of the population has a known, nonzero chance of being chosen (in simple random sampling, an equal chance). This approach is essential for ensuring that the sample is representative of the entire population, which allows for accurate generalizations and statistical inferences.

Step 1: Choose your population of interest
Step 2: Determine a suitable sampling frame
Step 3: Select your sample and start your survey

WHEN TO USE?
• When you want to reduce sampling bias
• When the population is diverse
• To create an accurate sample
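The probability sampling designs described in these notes (simple random, systematic, stratified, and cluster) can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions, not a prescribed procedure: the 200 numbered moviegoers, the department lists, and the city areas are hypothetical, the random starting point for systematic sampling is a common convention assumed here, and the fixed seed only makes the run repeatable.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable


def simple_random_sample(population, n):
    """Every unit has the same chance; draw n units without replacement."""
    return rng.sample(population, n)


def systematic_sample(population, k):
    """Pick a random start among the first k units, then take every k-th unit."""
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(strata, n_per_stratum=1):
    """Draw a simple random sample of n_per_stratum units from each group."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


def cluster_sample(clusters, n_clusters):
    """Randomly choose whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical population: 200 moviegoers numbered 1..200 in line order.
moviegoers = list(range(1, 201))
print(len(simple_random_sample(moviegoers, 10)))  # 10
print(len(systematic_sample(moviegoers, 20)))     # 10 (every 20th of 200)

# Hypothetical strata: students grouped by department.
departments = {
    "Accountancy": ["A1", "A2", "A3"],
    "Engineering": ["E1", "E2", "E3", "E4"],
}
print(stratified_sample(departments))  # one representative per department

# Hypothetical clusters: households grouped by area of the city.
areas = {"Area 1": ["h1", "h2"], "Area 2": ["h3"], "Area 3": ["h4", "h5", "h6"]}
print(cluster_sample(areas, 2))  # two whole areas, all households included
```

The designs differ only in what the random draw is applied to: individual units (simple random), a starting position (systematic), units within each group (stratified), or whole groups (cluster).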
Advantages of Probability Sampling:

It’s cost-effective
- A larger sample can also be chosen based on numbers assigned to the samples. Then you can choose random numbers from the larger sample.

It's simple and straightforward
Probability sampling is an easy method, as it does not involve a complicated process. It’s quick and saves time.

It’s non-technical
This sampling method doesn’t require any technical knowledge because of its simplicity.

DIFFERENCE BETWEEN PROBABILITY AND NON-PROBABILITY SAMPLING

Probability Sampling | Non-probability Sampling
--- | ---
The samples are randomly selected. | Samples are selected based on the researcher’s subjective judgment.
Everyone in the population has an equal chance of getting selected. | Not everyone has an equal chance to participate.
Researchers use this technique when they want to keep a tab on sampling bias. | Sampling bias is not a concern for the researcher.
Useful in an environment having a diverse population. | Useful in an environment that shares similar traits.
Used when the researcher wants to create accurate samples. | This method does not help in representing the population accurately.
Finding the correct audience is complex. | Finding an audience is very simple.

CHARACTERISTICS OF GOOD DATA
Data is information, information is knowledge, and knowledge is power.

Data is a collection of discrete or continuous values that convey information, describing quantity, quality, facts, statistics, or other fundamental units of meaning, or just a series of symbols that can be more formally interpreted.

Good Data Defined
• Data quality tells us how reliable a particular set of data is and whether or not it will be good enough for a user to employ in decision-making.
• Data quality management is a core component of the overall data management process, and data quality improvement efforts are often closely tied to data governance programs that aim to ensure data is formatted and used consistently throughout an organization.

Why data quality is important
• Bad data can have significant business consequences for companies.
• Poor-quality data is often pegged as the source of operational snafus, inaccurate analytics and ill-conceived business strategies.

Characteristics of good data
• Accuracy – ensures the data being gathered and used for various purposes are trustworthy and reliable.
• Reliability - the extent to which the data is consistent and dependable for its intended use.
• Timeliness – refers to the availability, accessibility, and up-to-date nature of the information.
• Relevance – refers to the degree or level of importance of the data to the subject matter.
• Completeness – How comprehensive is the information?
 a) All necessary data are present and accurate.
 b) Complete data can help the organization make better decisions.
How to Improve Data Quality
• Data profiling
• Data standardization
• Geocoding
• Matching or linking
• Data quality monitoring
• Batch and real-time
A good data quality service should provide a data quality dashboard that delivers a flexible user experience and can be tailored to the specific needs of the data quality stewards and data scientists running data quality oversight.
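Three of the techniques listed above (profiling, standardization, and matching) can be illustrated in a few lines of plain Python. This is a simplified sketch: the customer records, the name-normalization rule, and the choice of standardized name plus city as the matching key are hypothetical; production data quality tools use far more robust rules, including fuzzy matching.

```python
from collections import Counter

records = [
    {"name": "Juan  dela Cruz", "city": "batangas city"},
    {"name": "JUAN DELA CRUZ", "city": "Batangas City"},
    {"name": "Maria Santos", "city": None},
]


def profile(rows):
    """Data profiling: count missing values per field."""
    missing = Counter()
    for row in rows:
        for field, value in row.items():
            if value in (None, ""):
                missing[field] += 1
    return dict(missing)


def standardize(value):
    """Data standardization: trim, collapse repeated spaces, use title case."""
    if value is None:
        return None
    return " ".join(value.split()).title()


def match_duplicates(rows):
    """Matching: group records that share the same standardized name and city."""
    groups = {}
    for i, row in enumerate(rows):
        key = (standardize(row["name"]), standardize(row["city"]))
        groups.setdefault(key, []).append(i)
    return [ids for ids in groups.values() if len(ids) > 1]


print(profile(records))                  # {'city': 1}
print(standardize(records[0]["name"]))   # Juan Dela Cruz
print(match_duplicates(records))         # [[0, 1]]
```

Standardizing before matching is the design choice that matters here: the first two records differ only in spacing and capitalization, so they are recognized as the same person only after both are normalized the same way.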
