INTRODUCTION
Statistics is a branch of mathematics that deals with the processes of gathering,
describing, organizing, analyzing and interpreting numerical data as well as
drawing valid conclusions and making reasonable decisions on the basis of such
analysis.

Scope of Statistics:

Statistics is used in almost all fields of human endeavor like:


 Agriculture  Government
 Anthropology  Health
 Biology  Insurance
 Business  Literature
 Economics  Manufacturing
 Education  Marketing
 Engineering  Medicine
 Entertainment  Meteorology
 Environmental studies  Physics
 Fisheries  Politics
 Forestry  Psychology
 Genetics  Sociology
 Geography  Sports

It is said that statistics is the “tool” of all sciences. It is also called the “language of
research”.

Statistics is important to research since it:

 permits the most exact kind of description
 forces the researcher to be definite and exact in his/her procedures and
in his/her thinking
 enables the researcher to summarize results in a meaningful and
convenient form
 allows the researcher to draw general conclusions; the process of
extracting conclusions is carried out according to accepted rules
 enables the researcher to predict “how much” of a thing will happen
under conditions he/she knows and has measured

Accordingly, statistics allows a person to see the significance of data, the
relationships between and among occurrences and forecast what may happen in the
future or determine what may have happened in the past.

However, statistical figures can be used to present the truth, or they can be used to distort
the facts and hide the truth. This implies that statistics can also be misused and
abused.

Abuses in the Use of Statistics


 Bad Samples
o Unrepresentative Samples
o Sampling Errors
 Small Samples
 Loaded Questions
 Misleading Graphs
 Partial Pictures
 Deliberate Distortions
 Scale breaks and axis scaling
 Distorted Percentages

Definition of Terms

 Data- are the raw materials with which the statistician works

 Ungrouped or raw data - are data which are not organized in any specific way

 Grouped data - are data organized into groups or categories with corresponding
frequencies

 Population- refers to the groups or aggregates of people, objects, materials, events or
things of any form. Measures are called parameters.

 Sample - is a subgroup of the population; taken from the population so as to represent the
population characteristics or traits. Measures are called estimates or statistics

 Variable-the characteristic that is being studied. Examples are age, gender, intelligence,
personality type, attitudes, political or religious affiliation, height, weight, marital status
etc…

 Constant- is a quantity that does not change its value

 Measurement - is the process of assigning a numerical value to a variable

Two Types of Variables

 Qualitative Variables-have values that are described by words rather than numbers;
represent differences in quality, character or kind, e.g. gender, birthplace, geographic
locations, religious preference, marital status

 Quantitative Variables-arise from counting, measuring something or from some kind of
mathematical operation; are numerical in nature and can be ordered or ranked, e.g. weight,
height, age, test scores, speed and body temperatures

Note: there are variables that can be expressed both qualitatively and quantitatively like
grades in school in percent or in letters

Quantitative variables may be classified as:

Discrete - values can be counted using integral values such as the number of enrollees, drop-outs,
graduates, deaths, employees, cars, math subjects, calls (integral values)
Continuous - can assume any numerical value over an interval or intervals like height, weight,
temperature, time, pressure in a tire, number of kilometers driven (decimals or fractions)

A variable can be:


 Dependent (that whose value is being predicted)
 Independent (predictor)

Scales of Measurement

 Nominal data use numbers for the purpose of identifying name or membership in a group
or category. All qualitative variables use this scale. Examples are gender, zip codes,
religion, marital status, major field of study, names of schools attended, brands of soaps
purchased, diagnosis

 Ordinal data connote ranking or inequalities. One category is higher than the other one.
Examples are social class or incomes, responses to an item on an instrument (always,
sometimes, never), contest results (first, second, third), birth order study (first or second-
born)

 Interval scales indicate an actual amount, and there is an equal unit of measurement
separating each score (equal intervals). The interval between 50 °F and 60 °F is
the same as the interval between 20 °F and 30 °F. Since intervals between numbers
represent distances, mathematical operations such as getting the average can be done.
However, since the zero point is arbitrary (0 °F does not mean there is no
temperature), we can’t say that 60 °F is twice as warm as 30 °F. Examples are Fahrenheit
temperature, Likert scale scores (5 or 7 scale points), scores on tests as a measure of
knowledge, and aptitude test scores

 Ratio data are the strongest level of measurement; they have all the properties of the other
three levels and also have a meaningful zero that represents the absence of the quantity being
measured. Because of this, ratios are meaningful. Thus, you can say that the income of
one person is twice or thrice that of another. Examples are election votes, speed of a production line,
average delivery and measurements of length, weight, area, volume, density, velocity,
money and duration.
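
To make the difference between the interval and ratio scales concrete, here is a minimal Python sketch. The function name `fahrenheit_to_kelvin` and the income figures are illustrative assumptions, not part of the handout. It converts the two Fahrenheit readings to Kelvin, a temperature scale with a true zero, to show that 60 °F is not twice as warm as 30 °F, while a ratio-scale quantity such as income can be compared by ratio directly.

```python
def fahrenheit_to_kelvin(deg_f: float) -> float:
    """Convert Fahrenheit (interval scale) to Kelvin (ratio scale, true zero)."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15

# Differences are meaningful on an interval scale: both gaps are 10 degrees.
gap_high = 60 - 50
gap_low = 30 - 20

# Ratios are not: dividing the raw Fahrenheit readings suggests "twice as warm".
naive_ratio = 60 / 30

# On a scale with a true zero, the same two temperatures are much closer in ratio.
true_ratio = fahrenheit_to_kelvin(60) / fahrenheit_to_kelvin(30)

# Hypothetical incomes (ratio scale): zero means no income, so "twice" is valid.
income_a, income_b = 40_000, 20_000
income_ratio = income_a / income_b

print("equal gaps:", gap_high == gap_low)
print("naive ratio:", naive_ratio)
print("ratio on a true-zero scale:", round(true_ratio, 2))
print("income ratio:", income_ratio)
```

The naive ratio and the true-zero ratio disagree because the Fahrenheit zero point is arbitrary; the income ratio needs no such correction.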

Kinds of Statistics
 Descriptive Statistics-the collection and organization of data where the statistician tries to
describe a situation; masses of unorganized numerical data are of little value unless
statistical techniques are available to organize this type of data into a meaningful form

 Inferential Statistics-consists of generalizing from samples to populations, performing
hypothesis testing, determining relationships among the variables and making
predictions. Its main concern is to analyze the organized data leading to prediction or
inferences. Statisticians make predictions or inferences based on the conditions of past
and present data.

Types of inferential statistics:

 Parametric-appropriate when data represent an interval or ratio scale of
measurement and the distribution approximates a normal curve

 Non-parametric-appropriate when the data represent an ordinal or nominal scale
or when the nature of the distribution is not known

Types of Data

Primary – refers to information which is gathered directly from an original source or which is
based on direct or first-hand experience.

Secondary – refers to information which is taken from published or unpublished data which
were previously gathered by other individuals or agencies

Methods Used in the Collection of Data

 Direct or Interview Method – person-to-person exchange; provides consistent and more
precise information since the interviewee may give clarification. Questions may be
repeated or modified to suit each interviewee’s level of understanding. However, this
method is time-consuming, expensive and has limited field coverage.

 Indirect or questionnaire method – written responses are given to prepared questions.
This method is inexpensive and can cover a wide area in a shorter span of time. Respondents
may feel a greater sense of freedom to express views and opinions because their
anonymity is maintained. There is a strong probability, however, of non-response,
especially if questionnaires are mailed. Questions not easily understood will also
probably not be answered.

 Registration Method – information is available due to certain laws.

 Observation Method – used when the subjects cannot talk or write. This makes possible
the recording of behavior at the appropriate time and situation.

 Experiment Method – used when the objective is to determine the cause and effect
relationship of certain phenomena under controlled conditions.

What is Qualitative Data?

Qualitative data consists of information that, in general, cannot be reduced to a specific
number. Ways of obtaining qualitative information include observation, interviews, or focus
groups that concentrate on some aspect of the participants’ experiences. Unlike with
quantitative data, there is less emphasis on, for example, counting the numbers of people who
think or behave in certain ways, and more emphasis on explaining why people think and
behave in these ways. The use of qualitative data generally involves smaller numbers of
respondents and open-ended questions.

Strengths

 Useful for expanding the meaningfulness of quantitative information because it
allows more in-depth data gathering.

 In focus groups or group interviews, questions are directed at the group rather than
specific individuals; highly sensitive subjects can therefore be explored without the
individuals feeling pressured to respond or disclose.

Weaknesses

 Qualitative data collection is often time consuming and resource intensive.

 Results may not be fully generalized to the entire study population or community
because the group is not a representative sub-sample in a strict or formal sense.

 Data can be more difficult to analyze.


How do you get it?

a. Observation:

 Involves observation rather than asking questions; therefore, you are not dependent
on respondent self-report.

 Used to better understand behaviors, the social context in which they arise, and the
meanings that individuals attach to them.

 Observers compile field notes describing what they observe. Analysis focuses on
what happened and why.

 May be the most feasible way of collecting information from certain populations (e.g.,
infants and children, or individuals who frequent public sex environments or drug
shooting galleries).

 The protocol lists what the observer will be looking for.

b. One-on-One (In-depth) interviews:

 An alternative to focus group interviews when you want to avoid group influences on
the responses people give.

 Can include questions on demographics, knowledge level, attitudes, and opinions,
which are all useful for planning purposes.

 Questions are mostly open-ended questions; they are usually related to an open-ended
question.

c. Group interviews:

 Typically composed of 8 to 12 key participants selected through non-random means
who are brought together with a facilitator to respond to the research questions.

 Role of the interviewer is to ask questions and record the responses of each
participant, stimulating conversation directed at him/her.

 Involves little interaction between participants.


 Participants need not share a particular experience; in other words, the group may be
diverse.

 Uses only open-ended questions.

d. Focus Groups:

 Typically composed of 8 to 12 participants selected through non-random means who
are brought together with a facilitator to respond to the research questions.

 Role of the facilitator is to stimulate conversation among the group.

 Useful for generating ideas and suggesting strategies; not very useful for finalizing
choices and definitively settling issues.

 Group members must have some relevant common experience or characteristic;
ideally, they should not know each other.

 Conduct focus groups at neutral and convenient sites.

 Questions or topics are open-ended and broader than questions used in group or
one-on-one interviews.

All of the above methods for collecting qualitative data include the use of a protocol with
pre-determined topics. A protocol that provides consistency in the areas of inquiry allows the
development and analysis of findings that are cross-cutting and not just anecdotal.

How do you develop a protocol?

 Create a list of questions that you would like answered.

 Organize these questions into a set of summary topics. For focus groups and group
interviews, prepare four to six open-ended questions with pre-planned probes for
each. For one-on-one interviews, prepare from 5 to 10 questions with pre-planned
probes.

 Remember, a good facilitator creates a natural progression across topics with some
overlap between the topics.

 Questions should start broad and become more specific.


 Check to make sure that the questions or topics are related to your original research
question.

 Always pre-test any protocol you develop.

What is Quantitative Data?

Quantitative information is usually gathered by surveys and elicits data in a form that permits
exact counting. These types of data deal with percentages, averages, and other mathematical
operations as a way of summarizing how a population thinks, feels, or acts. Quantitative data
is typically gathered from large numbers of respondents selected either at random or through
a convenience selection process. The data collected is generally analyzed by statistical
methods. Quantitative data answer the what, when, and who but are limited in their ability to
answer the why and how.

Strengths

 Quantitative data can be very consistent, precise and reliable.

 There is important contextual information.

 Surveys can usually be conducted economically and can allow you to question large
numbers of people.

 If the participant selection process is well designed and the sample is representative
of the population being studied, the responses can be generalized.
 Relatively easy to analyze data and flexible enough to allow an array of analytical
methods to extract inferences.

Weaknesses

 Surveys tend to address issues in a somewhat narrow and superficial way.

 It is difficult to gain a full sense of the context in which activities take place.

 Access to and reliability of secondary data can be problematic.

 Some secondary data may not be related to your research question.


How do you get it?

a. Quantitative Secondary Data

Secondary data are commonly used in program evaluation and needs assessment
research. The following are some of the types of secondary data typically available:

 Epidemiological data
 Census data
 Knowledge, attitude, belief, and behavior (KABB) studies
 Other needs assessments
 Program client records
 Agency progress reports
 Clinic data
 Evaluation data

b. Quantitative Primary Data

Surveys

Surveys are the most commonly used form of quantitative primary data. Survey research
involves the use of a questionnaire given to a sample of respondents selected from some
population. This method of collecting data is typically used with a population too large to
observe directly.

 Surveys can be self-administered, administered face to face or over the telephone.

 It is essential that the interviewer be neutral and avoid having an effect on the
participant’s responses.

 The advantages of self-administered questionnaires are economy, speed, lack of
interviewer bias, and possibility of anonymity and privacy to encourage candid
responses on sensitive issues.

 The advantages of surveys administered by an interviewer are fewer incomplete
questionnaires, fewer misunderstood questions, a higher return rate, and the
opportunity for special observations.

Survey Tips
 Surveys should in general use closed-ended questions (e.g., those that can be
answered “yes” or “no” or with a set number of choices).

 Response categories should be mutually exclusive.

 For continuous variables (e.g., age, income), leave response categories open-ended.

 Always pilot-test any survey you develop.

 Avoid asking two questions in one; don’t use “and” or “or” in your questions.

 Give clear instructions.

 Use language and cultural references that are appropriate to your anticipated
participants.

 Make sure you ask questions your respondents can answer.

 Always provide a “don’t know”, “other” or “not applicable” response category.

 Make your survey attractive and leave plenty of white space between questions.

 Colored paper increases response rate.

 If you conduct a mail survey, remember to provide return postage.

Key Principles of Data Collection

Confidentiality

This is a particularly important issue if collecting data about sexual behavior, substance
use, HIV status, or any other topic that may be considered “sensitive”. Remember “sensitive”
issues can be different for each individual and within each community. At minimum the
following steps should be taken to protect confidentiality of key informants:

 Names, addresses, or other identifying information should not be collected unless
necessary for collecting follow-up information at a later date.
 If identifying information is collected, no one should have access to it except for key
staff persons in charge of data collection activities or those that have a genuine need
to know. Store data in locked files.

 After data have been collected, identifying information should be destroyed as soon
as possible.

 All individuals involved in data collection and/or analysis should be carefully trained
or briefed in the importance of confidentiality.

Informed Consent

This criterion emphasizes the importance of both accurately advising any key informants
as to the nature of the research process and obtaining his or her verbal or written consent to
participate.

Cultural Sensitivity

Every community has its own unique characteristics and compositions. It is important
that your research process understands and addresses these issues of cultural difference.
Cultural sensitivity needs to move beyond the polemic and influence everyday practice in all
research activities. A process that is truly culturally sensitive has the following
characteristics:

 Awareness that racial, ethnic, and cultural minority groups have different needs or
have underutilized services. Diversity can include differences on the basis of sexual
orientation, socio-economic status and gender.

 Actively seeks input and expertise from minority communities.

 Is characterized by acceptance and respect for difference, continuing self-assessment


regarding culture, careful attention to the dynamics of difference, and continuous
expansion of cultural knowledge and resources.

 Partnership replaces paternalism.

 Staff, volunteers and advisory committee members reflect the racial and cultural mix
of the local population.

Reliability

Reliability refers to the extent to which collected data are free of unpredictable kinds of error. In
other words, “Does this questionnaire or protocol yield consistent results?”

Sources of error that affect reliability, and suggestions to overcome them:

 Source of error: Variations in how the instrument is administered or oversights in giving
directions
Suggestion: When pilot testing, include issues related to survey administration. Train
facilitators and staff. Occasionally supervise or follow up with staff.

 Source of error: Fluctuation in mood or alertness of respondents because of illness, fatigue,
or recent good or bad experiences
Suggestion: Make sure that all respondents are comfortable, and reschedule if they are not
well or indicate having had a recent experience that may change the way they would
normally respond.

 Source of error: Differences in scoring or interpreting results
Suggestion: For quantitative data, establish rules for missing or unclear data and have only
one or two people responsible for data entry.

 Source of error: Random effects caused by respondents who guess or check alternatives
without trying to understand them
Suggestion: Make sure to pilot test your surveys. Make sure that respondents are provided
with clear instructions. Don’t ask respondents questions that you are not sure they can answer.

Validity

Validity refers to the ability of your instruments to help you produce an accurate, relevant,
representative and complete description or picture of your community or program. For example,
your IQ would seem a more valid measure of your intelligence than would the number of hours
you spend in the library.

When determining validity, the following questions must be answered:

 Is the instrument appropriate for what needs to be measured?


 How worthwhile is a measure likely to be for telling you what you need to know?
 Does the instrument give you the true story, or an approximation of the truth?

Reliability and validity refer to different aspects of an instrument’s credibility. Assessment of
the validity and reliability of an instrument and the data it produces helps to determine the
amount of faith people should place in the results. Early attention to the validity and, to a lesser
degree, the reliability of measures will help ensure that information gathered during your research
will enable you to answer the questions that are important to you and your community.

Compensation/Incentives

For some communities or populations, providing some form of compensation or incentive is
necessary to guarantee their participation. If you use incentives, please keep the following in
mind:

 The type of incentive you use needs to be appropriate for the population. For example, if
conducting interviews with mothers of young children, choosing diapers is more
appropriate than giving out theater tickets; or giving substance users gift certificates for
food is more appropriate than cash incentives.

 It is important to communicate to participants that they will receive compensation
regardless of how they answer questions. This is to ensure honesty in participants’
responses.

Ten Commandments in Data Collection

1. Begin thinking about the type of data you will have to collect.

2. As you think about the type of data that you will be collecting, think about where you
will be getting the data

3. Make sure that the data collection form you will be using is clear and easy to use

4. Make a duplicate copy of the data file

5. Do not rely on other people to collect or transfer your data unless they have been trained

6. Plan a detailed schedule of where and when you will be collecting your data

7. Cultivate possible sources of your subject pool


8. Follow up with subjects who missed a testing session or interview

9. Never discard original data

10. Follow the previous 9 and pray. This is most important!

Ways to Acquire Research Instruments

 Find and administer a previously existing instrument


 Design one’s own instrument

Tips in Developing Instruments

 Be sure you are clear as to what variables are to be assessed


 Review existing instruments that measure similar variables
 Decide on a format for each variable
 Begin compiling and/or writing items
 Have colleagues review items
 Revise items based on colleagues’ feedback
 Locate a group of people with experience appropriate to your study and have them review
your items.
 Try out your instruments with a group of respondents who are as similar as possible to
your study respondents
 If feasible, conduct a statistical item analysis with your tryout data
 Select and revise items as necessary until you have the number you want.

Steps in Planning a Study/Research

1. Make an estimate of the number of items in the population.

2. Assess resources such as time and money factors that are available to pursue the research.

3. When the amount of resources does not warrant the study of the entire population, samples
are used instead of the population.

4. Determine the sample size by the formula

$$n = \frac{N}{1 + N e^{2}}$$

where n = sample size
N = population size
e = desired margin of error (percent allowance for non-precision
because of the use of a sample instead of the population)

Note: Advisable only if the population is equal to or more than 100

***One may also use the sample size calculator, search the web and use either the survey
monkey or that of Raosoft Inc. Just enter the population, the margin of error and the level of
confidence.
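
As a rough sketch of the formula in step 4 (often referred to as Slovin's formula), the calculation can be written in Python as follows. The function name `sample_size`, the input checks, and the choice to round up to the next whole respondent are assumptions made for illustration; the population of 1,000 is hypothetical.

```python
import math

def sample_size(population: int, margin_of_error: float) -> int:
    """Sample size n = N / (1 + N * e**2), rounded up to a whole respondent."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be a proportion, e.g. 0.05 for 5%")
    n = population / (1 + population * margin_of_error ** 2)
    # Assumption: round up so the achieved margin of error is not larger than e.
    return math.ceil(n)

# Hypothetical population of 1,000 with a 5% margin of error.
print(sample_size(1000, 0.05))
```

Note that the margin of error is entered as a proportion (0.05), not as a percentage (5).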

Find the sample size of each population using the indicated margin of error.

i. N=4215 e = 5% ii. N=6718 e = 10%

5. Pick the samples by using the appropriate sampling technique. Sept 25 am class

a. Random Sampling –all individuals in the defined population have an equal and
independent chance of being selected as samples

i. Lottery Sampling-drawing of lots; advisable when the population is small

ii. Table of Random Numbers

a. Direct Selection Method


b. Remainder Method

b. Systematic Sampling

i. Stratified Proportionate Sampling – population is divided into groups based on
homogeneity in order to avoid the possibility of drawing samples whose
members come only from one stratum and then units are selected randomly
from each stratum.
Question: is the basis for stratification logically related to the items of
information sought?

Suppose we wish to select a sample of employees from a company to determine their attitudes
toward the new employee development program. We decide that one factor which affects their
views would be their age bracket. The company records show that the workforce has the following
breakdown. Using a margin of error of 1) 10% and 2) 5%, how many samples shall we get from each
group?

Age bracket      Population
30 and below     425
31-45            843
46-60            578
Above 60         187
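
One possible way to work through this example is sketched below in Python. It assumes the overall sample size comes from the formula n = N / (1 + N e²) applied to the whole workforce, and that each age bracket then receives a share of the sample proportional to its share of the population. The helper names `sample_size` and `proportional_allocation`, and the rounding rules, are illustrative assumptions.

```python
import math

# Workforce by age bracket, taken from the table above.
strata = {
    "30 and below": 425,
    "31-45": 843,
    "46-60": 578,
    "Above 60": 187,
}

def sample_size(population: int, margin_of_error: float) -> int:
    """n = N / (1 + N * e**2), rounded up (the rounding rule is an assumption)."""
    return math.ceil(population / (1 + population * margin_of_error ** 2))

def proportional_allocation(strata: dict, n: int) -> dict:
    """Give each stratum a share of n proportional to its share of the population."""
    total = sum(strata.values())
    return {name: round(n * size / total) for name, size in strata.items()}

total_population = sum(strata.values())
for e in (0.10, 0.05):
    n = sample_size(total_population, e)
    print(f"e = {e:.0%}: n = {n}", proportional_allocation(strata, n))
```

Because each stratum's share is rounded separately, the shares can in general add up to one more or one less than n; in that case the difference is usually absorbed by the largest stratum.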

ii. Cluster Sampling – the population is grouped into clusters (commonly based
on geographic areas/districts) or small units like blocks or districts in a
municipality or city. This is advantageous when individuals in the districts or
block are heterogeneous and when the clusters are approximately the same
size.

iii. Multi-Stage Sampling – useful in surveys involving a large universe. The
population is grouped into a hierarchy of units and sampling is done
consecutively.

c. Non-Random Sampling

i. Purposive – individuals are chosen as samples according to the purpose of the
researcher; this is popular in qualitative research

ii. Quota –popular in the field of opinion research because it is done by merely
looking for individuals with the requisite characteristics. You select people
according to some fixed quota.

iii. Convenience (Accidental or Incidental) - applied to those samples taken
because they are most available; the selection of units from the population is
based on easy availability and/or accessibility.

iv. Snowball sampling-each research participant is asked to identify other
potential research participants who have a certain characteristic. You start
with one or a few participants, ask them for more, find those, ask them for
some, and continue until you have a sufficient sample size. This technique
might be used for a hard-to-find population.

6. Prepare the questions to be asked in the interview or questionnaire. In case a questionnaire is
used, it is best to evaluate its clarity and validity by subjecting it to a pretest by 5-10% of the
desired sample size.
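
The lottery method described under random sampling in step 5 can be imitated on a computer. The Python sketch below is only an illustration: the function name `lottery_sample`, the sampling frame of ID numbers 1 to 50, and the seed value are hypothetical. It draws units without replacement, so every unit in the frame has an equal chance of being selected.

```python
import random

def lottery_sample(population: list, n: int, seed=None) -> list:
    """Draw n distinct units so that every unit has an equal chance of selection."""
    rng = random.Random(seed)         # a fixed seed makes the draw reproducible
    return rng.sample(population, n)  # sampling without replacement

# Hypothetical sampling frame: ID numbers 1 to 50, as if written on slips of paper.
frame = list(range(1, 51))
chosen = lottery_sample(frame, 10, seed=2024)
print(sorted(chosen))
```

Recording the seed is a design choice that lets another researcher reproduce the same draw; leaving it as `None` gives a fresh draw each time.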

Advantages of Sampling:
 It saves time, money and effort

 It gives more comprehensive information since a more thorough investigation of the
study is possible

Disadvantages of Sampling

 If sampling plan is not correctly designed and followed, the results may be misleading

 Complicated sampling plans are laborious to prepare

 If the characteristics to be observed occur rarely in a population, then the sample is in fact
misleading

Types of Questions

Structured – leaves only one way or few alternative ways of answering it. The questions are
clear, simple and objective. These questions are easy to tabulate.

Unstructured or open-ended –probing questions that want to elicit reasons.

Features of a Good Questionnaire Sept 25 pm class

1. Make the questions short and clear

2. Avoid leading questions

3. Always state the precise units in which you require the answer in order to facilitate tabulation
later on

4. As much as possible ask questions which can be answered by just checking slots or stating
simple names or brands

5. Limit questions to essential information – all questions must have a reason for being included
in the questionnaire

6. Arrangement of questions should be carefully planned-questions should be asked in correct
sequence so that answers obtained will have a logical flow of thought

7. Words used should be familiar and relevant to the respondents


Exercises:

I. Find the number of samples needed if

a. N = 4,367
e = 1%

b. N = 3,798
e = 10%

II. Find the number of samples needed per department given a margin of error of 5%

Departments First Semester


CON/GM 376
CCS 126
CAS 115
CBA/HRM 258
CAC 341
CED 100
CPT 39

III. State whether each is quantitative or qualitative

_______________a. marital status of nurses in a hospital


_______________b. time it takes to travel from the Philippines to Spain
_______________c. amount of time a transaction is processed at LTO
_______________d. colors of cars in the school’s parking lot
_______________e. flavors of ice cream in a supermarket
_______________f. seating capacity of SPC Presidents’ Hall
_______________g. brand of cellphones
_______________h. number of ISO certified hospitals in San Pablo

IV. Identify each item as discrete or continuous

_______________a. number of cups of hot coffee sold by Starbucks San Pablo in a day
_______________b. lifetimes of cellphone batteries
_______________c. weight of school bags of Grade School pupils
_______________d. amount of time spent by a 4-year-old child playing games on a tablet
_______________e. number of students in the Senior High School Arts & Design strand
_______________f. amount of money spent by a graduate school fellow in school fees per
semester

V. Classify each as nominal, ordinal, interval or ratio level data

________________a. social security number


________________b. Ranking of triathletes in Ironman Philippines
________________c. License plate numbers
________________d. Salaries of coaches in the PBA
________________e. Temperature inside 10 pizza ovens
________________f. Time required to complete a game of chess
