Advance Statistics
INTRODUCTION
Statistics is a branch of mathematics that deals with the processes of gathering,
describing, organizing, analyzing and interpreting numerical data as well as
drawing valid conclusions and making reasonable decisions on the basis of such
analysis.
Scope of Statistics:
It is said that statistics is the “tool” of all sciences. It is also called the “language of
research”.
However, statistical figures can be used to present the truth, or they can be used to distort
the facts and hide the truth. This implies that statistics can also be misused and
abused.
Data- are the raw materials with which the statistician works
Ungrouped or raw data - are data which are not organized in any specific way
Grouped data - are data organized into groups or categories with corresponding
frequencies
Sample - is a subgroup of the population, taken from the population so as to represent the
population characteristics or traits. Measures computed from a sample are called estimates or statistics
Variable-the characteristic that is being studied. Examples are age, gender, intelligence,
personality type, attitudes, political or religious affiliation, height, weight, marital status
etc…
Qualitative Variables - have values that are described by words rather than numbers;
represent differences in quality, character or kind, e.g. gender, birthplace, geographic
locations, religious preference, marital status
Note: there are variables that can be expressed both qualitatively and quantitatively like
grades in school in percent or in letters
Discrete - values can be counted using whole numbers, such as the number of enrollees, drop-outs,
graduates, deaths, employees, cars, math subjects, or calls
Continuous - can assume any numerical value, including decimals or fractions, over an interval or
intervals, like height, weight, temperature, time, pressure in a tire, or number of kilometers driven
Scales of Measurement
Nominal data use numbers for the purpose of identifying name or membership in a group
or category. All qualitative variables use this scale. Examples are gender, zip codes,
religion, marital status, major field of study, names of schools attended, brands of soaps
purchased, diagnosis
Ordinal data connote ranking or inequalities. One category is higher than the other one.
Examples are social class or incomes, responses to an item on an instrument (always,
sometimes, never), contest results (first, second, third), birth order study (first or second-
born)
Interval scales indicate an actual amount, and there is an equal unit of measurement
separating each score, that is, equal intervals. The interval between 50 °F and 60 °F is
the same as the interval between 20 °F and 30 °F. Since intervals between numbers
represent distances, mathematical operations such as getting the average can be done.
However, since the zero point is arbitrary, meaning 0 °F does not mean there is no
temperature, we can’t say that 60 °F is twice as warm as 30 °F. Examples are Fahrenheit
temperature, Likert scale scores (5 or 7 scale points), scores on tests as a measure of
knowledge, and aptitude test scores
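The point about the arbitrary zero can be checked with a little arithmetic. The sketch below is only an illustration: it converts the two Fahrenheit readings to kelvins (a scale with a true zero, which these notes do not otherwise discuss) and compares the ratios. The function name is hypothetical.

```python
def fahrenheit_to_kelvin(temp_f):
    """Convert a Fahrenheit reading to kelvins, a scale with a true zero."""
    return (temp_f - 32) * 5 / 9 + 273.15

# Equal intervals hold on the Fahrenheit scale itself.
same_interval = (60 - 50) == (30 - 20)

# Ratio of the raw Fahrenheit numbers: suggests "twice as warm".
naive_ratio = 60 / 30

# Ratio of the same two temperatures on a scale with a meaningful zero.
true_ratio = fahrenheit_to_kelvin(60) / fahrenheit_to_kelvin(30)

print(same_interval)          # True
print(naive_ratio)            # 2.0
print(round(true_ratio, 3))   # 1.061
```

On a scale with a true zero the warmer reading is only about 6% higher, so the "twice as warm" reading of the raw Fahrenheit numbers is an artifact of where the zero was placed.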
Ratio data are the strongest level of measurement; they have all the properties of the other
three but also have a meaningful zero that represents the absence of the quantity being
measured. Because of this, ratios are meaningful. Thus, you can say that the income of
one person is twice or thrice that of another. Examples are election votes, speed of a
production line, average delivery and measurements of length, weight, area, volume, density,
velocity, money and duration.
Kinds of Statistics
Descriptive Statistics-the collection and organization of data where the statistician tries to
describe a situation; masses of unorganized numerical data are of little value unless
statistical techniques are available to organize this type of data into a meaningful form
Types of Data
Primary – refers to information which is gathered directly from an original source or which is
based on direct or first-hand experience.
Secondary – refers to information which is taken from published or unpublished data which
were previously gathered by other individuals or agencies
Observation Method – used when the subjects cannot talk or write. This makes possible
the recording of behavior at the appropriate time and situation.
Experiment Method – used when the objective is to determine the cause and effect
relationship of certain phenomena under controlled conditions.
Strengths
In focus groups or group interviews, questions are directed at the group rather than
specific individuals; highly sensitive subjects can therefore be explored without the
individuals feeling pressured to respond or disclose.
Weaknesses
Results may not be fully generalized to the entire study population or community
because the group is not a representative sub-sample in a strict or formal sense.
a. Observation:
Involves observation rather than asking questions; therefore, you are not dependent
on respondent self-report.
Used to better understand behaviors, the social context in which they arise, and the
meanings that individuals attach to them.
Observers compile field notes describing what they observe. Analysis focuses on
what happened and why.
May be the most feasible way of collecting information from certain populations (e.g.,
infants and children, or individuals who frequent public sex environments or drug
shooting galleries).
An alternative to focus group interviews when you want to avoid group influences on
the responses people give.
Questions are mostly open-ended questions; they are usually related to an open-ended
question.
c. Group interviews:
Role of the interviewer is to ask questions and record the responses of each
participant, stimulating conversation directed at him/her.
d. Focus Groups:
Useful for generating ideas and suggesting strategies; not very useful for finalizing
choices and definitively settling issues.
Questions or topics are open-ended and broader than questions used in group or
one-on-one interviews.
All of the above methods for collecting qualitative data include the use of a protocol with
pre-determined topics. A protocol that provides consistency in the areas of inquiry allows the
development and analysis of findings that are cross-cutting and not just anecdotal.
Organize these questions into a set of summary topics. For focus groups and group
interviews, prepare four to six open-ended questions with pre-planned probes for
each. For one-on-one interviews, prepare from 5 to 10 questions with pre-planned
probes.
Remember, a good facilitator creates a natural progression across topics with some
overlap between the topics.
Quantitative information is usually gathered by surveys and elicits data in a form that permits
exact counting. These types of data deal with percentages, averages, and other mathematical
operations as a way of summarizing how a population thinks, feels, or acts. Quantitative data
is typically gathered from large numbers of respondents selected either at random or through
a convenience selection process. The data collected is generally analyzed by statistical
methods. Quantitative data answer the what, when, and who but are limited in their ability to
answer the why and how.
Strengths
Surveys can usually be conducted economically and can allow you to question large
numbers of people.
If the participant selection process is well designed and the sample is representative
of the population being studied, the responses can be generalized.
Relatively easy to analyze data and flexible enough to allow an array of analytical
methods to extract inferences.
Weaknesses
It is difficult to gain a full sense of the context in which activities take place.
Secondary data are commonly used in program evaluation and needs assessment
research. The following are some of the types of secondary data typically available:
Epidemiological data
Census data
Knowledge, attitude, belief, and behavior (KABB) studies
Other needs assessments
Program client records
Agency progress reports
Clinic data
Evaluation data
Surveys
Surveys are the most commonly used form of quantitative primary data. Survey research
involves the use of a questionnaire given to a sample of respondents selected from some
population. This method of collecting data is typically used with a population too large to
observe directly.
It is essential that the interviewer be neutral and avoid having an effect on the
participant’s responses.
Survey Tips
Surveys should in general use closed-ended questions (e.g., those that can be
answered “yes” or “no” or with a set number of choices).
For continuous variables (e.g., age, income), leave response categories open-ended.
Avoid asking two questions in one; don’t use “and” or “or” in your questions.
Use language and cultural references that are appropriate to your anticipated
participants.
Make your survey attractive and leave plenty of white space between questions.
Confidentiality
This is a particularly important issue if collecting data about sexual behavior, substance
use, HIV status, or any other topic that may be considered “sensitive”. Remember “sensitive”
issues can be different for each individual and within each community. At a minimum, the
following steps should be taken to protect confidentiality of key informants:
After data have been collected, identifying information should be destroyed as soon
as possible.
All individuals involved in data collection and/or analysis should be carefully trained
or briefed in the importance of confidentiality.
Informed Consent
This criterion emphasizes the importance of both accurately advising any key informants
as to the nature of the research process and obtaining their verbal or written consent to
participate.
Cultural Sensitivity
Every community has its own unique characteristics and compositions. It is important
that your research process understands and addresses these issues of cultural difference.
Cultural sensitivity needs to move beyond the polemic and influence everyday practice in all
research activities. A process that is truly culturally sensitive has the following
characteristics:
Awareness that racial, ethnic, and cultural minority groups have different needs or
have underutilized services. Diversity can include differences on the basis of sexual
orientation, socio-economic status and gender.
Staff, volunteers and advisory committee members reflect the racial and cultural mix
of the local population.
Reliability refers to the extent to which collected data is free of unpredictable kinds of error. In
other words, “Does this questionnaire or protocol yield consistent results?”
Variations in how the instrument is administered, or oversights in giving directions:
When pilot testing, include issues related to survey administration. Train facilitators
and staff. Occasionally supervise or follow up with staff.
Random effects caused by respondents who guess or check alternatives without trying to
understand them: Make sure to pilot test your surveys. Make sure that respondents are
provided with clear instructions. Don’t ask respondents questions that you are not sure
they can answer.
Validity
Validity refers to the ability of your instruments to help you produce an accurate, relevant,
representative and complete description or picture of your community or program. For example,
your IQ would seem a more valid measure of your intelligence than would the number of hours
you spend in the library.
Compensation/Incentives
The type of incentive you use needs to be appropriate for the population. For example, if
conducting interviews with mothers of young children, choosing diapers is more
appropriate than giving out theater tickets; or giving substance users gift certificates for
food is more appropriate than cash incentives.
1. Begin thinking about the type of data you will have to collect.
2. As you think about the type of data that you will be collecting, think about where you
will be getting the data
3. Make sure that the data collection form you will be using is clear and easy to use
5. Do not rely on other people to collect or transfer your data unless they are trained
6. Plan a detailed schedule of where and when you will be collecting your data
2. Assess resources such as time and money factors that are available to pursue the research.
3. When the amount of resources does not warrant the study of the entire population, samples
are used instead of the population.
4. Determine the sample size by the formula
\[ n = \frac{N}{1 + N e^{2}} \]
Where n = sample size
N = population size
e = desired margin of error, expressed as a decimal (allowance for non-precision
because of the use of a sample instead of the population)
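The formula can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name is hypothetical, the margin of error is assumed to be entered as a decimal (0.05 for 5%), and rounding the result up to the next whole respondent is an assumption, since the notes do not state a rounding rule.

```python
import math

def sample_size(population_size, margin_of_error):
    """Return n = N / (1 + N * e**2), rounded up to a whole number.

    population_size -- N, the number of individuals in the population
    margin_of_error -- e, as a decimal (e.g. 0.05 for 5%)
    """
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be a decimal between 0 and 1")
    n = population_size / (1 + population_size * margin_of_error ** 2)
    # Assumption: round up so the sample is never smaller than the formula asks.
    return math.ceil(n)

# Hypothetical population of 1,000 with a 5% margin of error:
# 1000 / (1 + 1000 * 0.0025) = 1000 / 3.5, about 285.7, rounded up.
print(sample_size(1000, 0.05))  # 286
```

Note how quickly the required sample grows as the margin of error shrinks, because e is squared in the denominator.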
***One may also use an online sample size calculator, such as that of SurveyMonkey or
Raosoft Inc. Just enter the population, the margin of error and the level of
confidence.
Find the sample size of each population using the indicated margin of error.
5. Pick the samples by using the appropriate sampling technique.
a. Random Sampling –all individuals in the defined population have an equal and
independent chance of being selected as samples
b. Systematic Sampling
Suppose we wish to select a sample of employees from a company to determine their attitude toward
the new employee development program. We decide that one factor which affects their views would
be their age bracket. The company records show that the workforce has the following
breakdown. Using a margin of error of 1) 10% and 2) 5%, how many samples shall we get from each
group?
Age bracket      Population
30 and below     425
31-45            843
46-60            578
Above 60         187
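One way to work this example is sketched below. It assumes proportional allocation (each age bracket receives a share of the sample equal to its share of the workforce), which the notes do not spell out, and it assumes the overall sample size comes from n = N / (1 + N e^2), rounded up. Function and variable names are hypothetical.

```python
import math

def sample_size(population_size, margin_of_error):
    """n = N / (1 + N * e**2), rounded up (rounding rule is an assumption)."""
    return math.ceil(population_size / (1 + population_size * margin_of_error ** 2))

def proportional_allocation(strata, margin_of_error):
    """Split the overall sample across groups in proportion to group size."""
    total = sum(strata.values())
    n = sample_size(total, margin_of_error)
    # Each group gets n * (group size / total), rounded to a whole person.
    return {name: round(n * size / total) for name, size in strata.items()}

workforce = {"30 and below": 425, "31-45": 843, "46-60": 578, "Above 60": 187}

for e in (0.10, 0.05):
    allocation = proportional_allocation(workforce, e)
    print(f"e = {e:.0%}: {allocation} (total {sum(allocation.values())})")
```

With these figures the rounded group sizes happen to add up to the overall sample size. In general, rounding each group separately can leave the total off by one or two, in which case the largest groups are usually adjusted.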
ii. Cluster Sampling – the population is grouped into clusters (commonly based
on geographic areas) or small units like blocks or districts in a
municipality or city. This is advantageous when individuals in the districts or
blocks are heterogeneous and when the clusters are approximately the same
size.
c. Non-Random Sampling
ii. Quota –popular in the field of opinion research because it is done by merely
looking for individuals with the requisite characteristics. You select people
according to some fixed quota.
Advantages of Sampling:
It saves time, money and effort
Disadvantages of Sampling
If the sampling plan is not correctly designed and followed, the results may be misleading
If the characteristics to be observed occur rarely in a population, then the sample may be
misleading
Types of Questions
Structured – leaves only one way or few alternative ways of answering it. The questions are
clear, simple and objective. These questions are easy to tabulate.
3. Always state the precise units in which you require the answer in order to facilitate tabulation
later on
4. As much as possible ask questions which can be answered by just checking slots or stating
simple names or brands
5. Limit questions to essential information – all questions must have a reason for being included
in the questionnaire
a. N = 4,367
e = 1%
b. N = 3,798
e = 10%
II. Find the number of samples needed per department given a margin of error of 5%
_______________a. number of cups of hot coffee sold by Starbucks San Pablo in a day
_______________b. lifetimes of cellphone batteries
_______________c. weight of school bags of Grade School pupils
_______________d. amount of time spent by a 4-year-old child playing games on a tablet
_______________e. number of students in the Senior High School Arts & Design strand
_______________f. amount of money spent by a graduate school fellow in school fees per
semester