
STATISTICS: THE LANGUAGE OF FACTS
GROUP 6
INTRODUCTION

• STATISTICS is needed in all research, especially in quantitative studies.


• STATISTICS measures the data of the real world on a “how much” and “how many” scale.
• According to LEEDY, statistics is a language which, through its own special symbols and
grammar, takes the numerical facts of life and translates them meaningfully.
• According to FERGUSON AND TAKANE, statistics is viewed as the study of variation,
because it provides a technology for the exploration of variation in the events of
nature and for the making of inferences about the causal circumstances which underlie that
variation.
USES OF STATISTICS
• STATISTICS IS IMPORTANT IN PRESENTATION, ANALYSIS, AND
INTERPRETATION.

• DATA ARE USELESS UNLESS THEY ARE ANALYSED.

• IT IS A LANGUAGE OF FACTS. WITHOUT STATISTICS,
RESEARCH IS USELESS.

• ALLOWS US TO CRITICALLY ANALYSE THE RESULTS.

• PROVIDES ORGANIZATION AND MEANING TO DATA.
ESTABLISHES THE LEVEL OF RELIABILITY AND THE BASIS FOR
INFERENCES ABOUT THE POPULATION.

• GUIDES THE RESEARCHER IN DRAWING OUT THE CONCLUSION
FOR THE STUDY. MAKES OBJECTIVE, DEFINITE, AND PRECISE
DESCRIPTIONS OF DATA.
KINDS OF STATISTICS

• There are two kinds of statistics.

A. DESCRIPTIVE STATISTICS
B. INFERENTIAL STATISTICS
DESCRIPTIVE STATISTICS
• Descriptive statistics are used to describe the basic features of the data.
They provide simple summaries about the sample and the measures.
• Measures of central tendency (arithmetic mean, mode, median).
• Measures of dispersion or variability (variance, standard deviation from the
mean, interquartile range, range).
• Percentage.
• Frequency distribution (histogram or bar graph, frequency polygon or
curve).
• Ratios and ranking.
• Measures of noncentral location (percentile, decile, quartile).
• Measures of symmetry/asymmetry (positively and negatively skewed).
• Measures of flatness or kurtosis (leptokurtic, platykurtic).
INFERENTIAL STATISTICS
• Inferential statistics are used when you want to move beyond simple
description or characterization of your data.
1. PARAMETRIC TESTS
A) Z-test
B) t-test
C) F-test (analysis of variance, ANOVA)
D) Correlation techniques (Pearson product-moment correlation
or Pearson r, biserial correlation, tetrachoric correlation)
2. NONPARAMETRIC TESTS
A) Chi-square test
B) Spearman rho rank correlation
C) Friedman’s analysis of variance
D) Kruskal-Wallis test
E) Phi coefficient
F) Contingency coefficient
G) Kendall rank correlation or Kendall coefficient
of concordance
DESCRIPTIVE STATISTICS
• Descriptive Statistics are Used by Researchers to Report
on Populations and Samples

• In Sociology:
Summary descriptions of measurements (variables) taken
about a group of people

• By Summarizing Information, Descriptive Statistics Speed
Up and Simplify Comprehension of a Group’s
Characteristics
DESCRIPTIVE STATISTICS

Types of descriptive statistics:


• Organize Data
– Tables
– Graphs

• Summarize Data
– Central Tendency
– Variation
DESCRIPTIVE STATISTICS

Types of descriptive statistics:


• Organize Data
– Tables
• Frequency Distributions
• Relative Frequency Distributions
– Graphs
• Bar Chart or Histogram
• Stem and Leaf Plot
• Frequency Polygon
FREQUENCY DISTRIBUTION
Frequency Distribution of IQ for Two Classes

IQ Frequency

82.00 1
87.00 1
89.00 1
93.00 2
96.00 1
97.00 1
98.00 1
102.00 1
103.00 1
105.00 1
106.00 1
107.00 1
109.00 1
111.00 1
115.00 1
119.00 1
120.00 1
127.00 1
128.00 1
131.00 2
140.00 1
162.00 1

Total 24
RELATIVE FREQUENCY
DISTRIBUTION
Relative Frequency Distribution of IQ for Two Classes

IQ Frequency Percent Valid Percent Cumulative Percent

82.00 1 4.2 4.2 4.2


87.00 1 4.2 4.2 8.3
89.00 1 4.2 4.2 12.5
93.00 2 8.3 8.3 20.8
96.00 1 4.2 4.2 25.0
97.00 1 4.2 4.2 29.2
98.00 1 4.2 4.2 33.3
102.00 1 4.2 4.2 37.5
103.00 1 4.2 4.2 41.7
105.00 1 4.2 4.2 45.8
106.00 1 4.2 4.2 50.0
107.00 1 4.2 4.2 54.2
109.00 1 4.2 4.2 58.3
111.00 1 4.2 4.2 62.5
115.00 1 4.2 4.2 66.7
119.00 1 4.2 4.2 70.8
120.00 1 4.2 4.2 75.0
127.00 1 4.2 4.2 79.2
128.00 1 4.2 4.2 83.3
131.00 2 8.3 8.3 91.7
140.00 1 4.2 4.2 95.8
162.00 1 4.2 4.2 100.0

Total 24 100.0 100.0
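
The Percent and Cumulative Percent columns above can be reproduced with a few lines of code. This is a minimal sketch in Python, not the SPSS procedure that produced the table; the variable names are illustrative, and the scores are the 24 values listed in the frequency table.

```python
from collections import Counter

# The 24 IQ scores summarized in the frequency table above.
scores = [82, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106,
          107, 109, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162]

counts = Counter(scores)  # maps each IQ value to its frequency
n = len(scores)

rows = []
cumulative = 0
for value in sorted(counts):
    freq = counts[value]
    cumulative += freq
    percent = round(100 * freq / n, 1)                   # relative frequency
    cumulative_percent = round(100 * cumulative / n, 1)  # running total
    rows.append((value, freq, percent, cumulative_percent))

for row in rows:
    print(*row)
```

Each printed row holds the IQ value, its frequency, its percent, and the cumulative percent, in the same order as the table columns.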


SPSS OUTPUT FOR HISTOGRAM
BAR GRAPH
STEM AND LEAF PLOT
Stem and Leaf Plot of IQ for Two Classes

Stem Leaf
8 279
9 33678
10 235679
11 159
12 078
13 11
14 0
15
16 2

Note: SPSS does not do a good job of producing these.


SPSS OUTPUT OF A FREQUENCY
POLYGON
DESCRIPTIVE STATISTICS
Summarizing Data:

– Central Tendency (or Groups’ “Middle Values”)


• Mean
• Median
• Mode

– Variation (or Summary of Differences Within Groups)


• Range
• Interquartile Range
• Variance
• Standard Deviation
MEAN

Most commonly called the “average.”

Add up the values for each case and divide by the total number of
cases.

Y-bar = (Y1 + Y2 + . . . + Yn) / n

Y-bar = (Σ Yi) / n
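
As a quick illustration of the formula, here is a minimal Python sketch that applies it to the combined IQ scores for Classes A and B listed elsewhere in these slides. The function name is illustrative only.

```python
# Combined IQ scores for Classes A and B (26 students).
iq_scores = [80, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106, 109,
             109, 109, 110, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162]

def mean(values):
    """Y-bar: add up the values and divide by the number of cases."""
    return sum(values) / len(values)

y_bar = mean(iq_scores)
print(round(y_bar, 2))  # 110.38
```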
MEDIAN

The middle value when a variable’s values are ranked in
order; the point that divides a distribution into two equal
halves.

When data are listed in order, the median is the point at
which 50% of the cases are above and 50% below it.

The 50th percentile.
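
A minimal Python sketch of this definition, using the Class A IQ scores given elsewhere in these slides. It assumes the common convention of averaging the two middle values when the number of cases is even; the function name is illustrative.

```python
def median(values):
    """Middle value of the ranked data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]
print(median(class_a))        # 109 (7th of 13 ranked scores)
print(median([1, 2, 3, 4]))   # 2.5 (even number of cases)
```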


MODE

The most common data point is called the mode.

The combined IQ scores for Classes A & B:


80 87 89 93 93 96 97 98 102 103 105 106 109 109 109 110 111 115 119 120
127 128 131 131 140 162

BTW, It is possible to have more than one mode!


A la mode!!
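
Because a data set can have more than one mode, a small helper that returns every most-frequent value is a natural sketch. The function name `modes` is an illustrative choice; the first data set is the combined IQ list above, and the second is a made-up example with two modes.

```python
from collections import Counter

def modes(values):
    """Return all values that share the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

iq_scores = [80, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106, 109,
             109, 109, 110, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162]

print(modes(iq_scores))        # [109] (109 occurs three times)
print(modes([1, 1, 2, 2, 3]))  # [1, 2] (two modes)
```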
RANGE
The spread, or the distance, between the lowest and highest values of a
variable.

To get the range for a variable, you subtract its lowest value from its
highest value.

Class A--IQs of 13 Students:
102 128 131 98 140 93 110 115 109 89 106 119 97

Class B--IQs of 13 Students:
127 131 96 80 93 120 109 162 103 111 109 87 105

Class A Range = 140 - 89 = 51
Class B Range = 162 - 80 = 82
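
The two ranges can be checked with a short Python sketch. The helper name `value_range` is illustrative.

```python
class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]
class_b = [127, 131, 96, 80, 93, 120, 109, 162, 103, 111, 109, 87, 105]

def value_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)

print(value_range(class_a))  # 51
print(value_range(class_b))  # 82
```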
INTERQUARTILE RANGE
A quartile is the value that marks one of the divisions that breaks a series of values into four equal
parts.

The median is a quartile and divides the cases in half.

25th percentile is a quartile that divides the first ¼ of cases from the latter ¾.
75th percentile is a quartile that divides the first ¾ of cases from the latter ¼.

The interquartile range is the distance or range between the 25th percentile and the 75th percentile.
Below, what is the interquartile range?

[Figure: a scale running from 0 to 1000, marked at 250, 500, and 750, with 25% of cases in each of the four segments.]
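
For the scale pictured, the 25th percentile sits at 250 and the 75th at 750. For raw data, the quartiles have to be estimated first, and software packages use slightly different interpolation rules. The sketch below uses Python's `statistics.quantiles` with `method="inclusive"`, which is one convention among several and an assumption here, applied to the combined IQ scores.

```python
from statistics import quantiles

# Interquartile range for the scale in the figure.
figure_iqr = 750 - 250
print(figure_iqr)  # 500

# Interquartile range for the combined IQ scores of Classes A and B.
iq_scores = [80, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106, 109,
             109, 109, 110, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(iq_scores, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 97.25 109.0 119.75 22.5
```

Other quartile conventions (for example the default `method="exclusive"`) can give slightly different cut points on the same data.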


VARIANCE
Variance is a number that at first seems complex to calculate.

Calculating variance starts with a “deviation.”

A deviation is the distance away from the mean of a case’s score.

Yi – Y-bar

If the average person’s car costs $20,000,
my deviation from the mean is -$14,000!
6K - 20K = -14K
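
To show how deviations lead to a variance, here is a minimal Python sketch using the Class A IQ scores. It assumes the usual definition, the mean of the squared deviations: dividing by n gives the population form, and dividing by n - 1 gives the sample form. The slides do not say which form is intended, so both are shown.

```python
class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]

n = len(class_a)
mean = sum(class_a) / n

# Deviation of each case from the mean: Yi - Y-bar.
deviations = [y - mean for y in class_a]

# Deviations always sum to zero (up to rounding), which is why they are squared.
print(round(sum(deviations), 9))

sum_of_squares = sum(d ** 2 for d in deviations)
variance = sum_of_squares / n               # population form
sample_variance = sum_of_squares / (n - 1)  # sample form

print(round(variance, 2), round(sample_variance, 2))  # 222.4 240.94
```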
STANDARD DEVIATION

To convert variance into something of meaning, let’s create
standard deviation.

The square root of the variance reveals the average deviation of
the observations from the mean.
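
A minimal Python sketch of this step, again on the Class A IQ scores and assuming the population form of the variance (divide by n). The result is cross-checked against the standard library's `statistics.pstdev`.

```python
from math import sqrt, isclose
from statistics import pstdev

class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]

n = len(class_a)
mean = sum(class_a) / n
variance = sum((y - mean) ** 2 for y in class_a) / n  # population variance

# The standard deviation is the square root of the variance,
# so it is expressed in the same units as the data (IQ points).
sd = sqrt(variance)
print(round(sd, 2))  # 14.91

# Cross-check against the standard library's population standard deviation.
assert isclose(sd, pstdev(class_a))
```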
INFERENTIAL STATISTICS
TYPES OF DESCRIPTIVE STATISTICS
MEASURES OF CENTRAL TENDENCY
OR AVERAGES
• A measure of central tendency is
either a midpoint, an average, or the most
frequent measure in a distribution of
measures. These are:
• The median
• The mean
• The mode
• The mode: it is not a stable measure of central
tendency. It represents the measure or value
with the greatest frequency. The mode can be
computed from ungrouped or grouped data.

• The median: it is the midpoint of a scale or the
middle score in a distribution, where one half of the
cases are above the median while the other half are
below the median.
• The arithmetic mean: it may simply be called the
mean. It refers to the average of a group of
measures. It is the most reliable measure of
central tendency because it is always the
center of gravity of any group of measures.
METHODS OF COMPUTING THE MEAN

• When the number of measures is not too big (30 or fewer),
simple averaging is used to compute the mean. This method is
called the absolute method or long method. The formula is:

• When the measures are grouped into a class
frequency distribution, the mean may be
computed by the midpoint method. The
formula is:
• When computing the mean by the
lower limit method, which uses the lower limit of each class
instead of its midpoint, the formula is:
• When computing the mean using the deviation or
short method, the formula is:
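
The formulas themselves did not survive on these slides, so the sketch below rests on assumptions: it uses the common textbook forms of the midpoint method (mean = Σ f·M / N, with M the class midpoint) and the deviation or short method (mean = AM + (Σ f·d / N)·i, with AM an assumed mean, d the deviation in class steps, and i the class width). The class intervals are an illustrative grouping of the combined IQ scores into classes of width 10, not a table from the slides. The lower limit method is not sketched.

```python
# (lower limit, upper limit, frequency) for each class; an illustrative grouping.
classes = [(80, 89, 3), (90, 99, 5), (100, 109, 7), (110, 119, 4),
           (120, 129, 3), (130, 139, 2), (140, 149, 1), (150, 159, 0),
           (160, 169, 1)]
n = sum(f for _, _, f in classes)

# Midpoint method: weight each class midpoint by its frequency.
midpoint_mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

# Deviation (short) method: count class steps away from an assumed mean.
width = 10            # class width i
assumed_mean = 104.5  # midpoint of the 100-109 class, chosen arbitrarily
sum_fd = sum(f * round(((lo + hi) / 2 - assumed_mean) / width)
             for lo, hi, f in classes)
short_mean = assumed_mean + (sum_fd / n) * width

print(round(midpoint_mean, 2), round(short_mean, 2))  # 110.27 110.27
```

Both methods give the same grouped mean, which is close to, but not identical with, the mean of the ungrouped scores, because grouping replaces each score with its class midpoint.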
MEASURES OF PEAKEDNESS
OR FLATNESS
AND
TYPES OF INFERENTIAL
STATISTICS
Measures of peakedness or
flatness
• This measure is sometimes referred to as kurtosis.
• The peakedness of a distribution can be leptokurtic, platykurtic, or
mesokurtic.
• It is leptokurtic if the distribution is more peaked than the normal curve.
• If it is less peaked (flatter), it is platykurtic.
• If it is normal, it is mesokurtic, i.e. it falls between the two.
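
One common way to put a number on peakedness is the moment-based excess kurtosis, m4 / m2² - 3, which is zero for a normal curve. The Python sketch below assumes that definition (the slides do not give a formula) and uses two made-up data sets. Real samples almost never land on exactly zero, so the mesokurtic label is best read as "close to zero".

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

def describe(values):
    k = excess_kurtosis(values)
    if k > 0:
        return "leptokurtic"  # more peaked than the normal curve
    if k < 0:
        return "platykurtic"  # flatter than the normal curve
    return "mesokurtic"

peaked = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]  # hypothetical: most cases at the center
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # hypothetical: cases spread evenly

print(describe(peaked))  # leptokurtic
print(describe(flat))    # platykurtic
```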
TYPES OF INFERENTIAL STATISTICS

• Inferential statistics
– Allow researchers to generalize to a population of individuals based on
information obtained from a sample of those individuals
– Assess whether the results obtained from a sample are the same as those
that would have been calculated for the entire population
• Two types
– Parametric
– Nonparametric
• Four assumptions of parametric tests
– Normal distribution of the dependent variable
– Interval or ratio data
– Independence of subjects
– Homogeneity of variance
• Advantages of parametric tests
– More statistically powerful
– More versatile
• Assumptions of nonparametric tests
– No assumptions about the shape of the distribution of the dependent variable
– Ordinal or categorical data
• Disadvantages of nonparametric tests
– Less statistically powerful
– Require large samples
– Cannot answer some research questions
• Multiple comparisons
– Omnibus ANOVA results
• A significant result indicates that a difference exists somewhere among the groups, not which pairs differ
• Need to know which specific pairs are different
– Types of tests
• A priori contrasts
• Post-hoc comparisons
– Scheffe
– Tukey HSD
– Duncan’s Multiple Range
• Conservative or liberal control of alpha
• Two-factor ANOVA
– Also known as factorial ANOVA
– Comparison of means when two independent variables are being examined
– Effects
• Two main effects – one for each independent variable
• One interaction effect for the simultaneous interaction of the two independent variables
• Two-factor ANOVA (continued)
– Example – examining the mean score differences for male and female students
in an experimental or control group
– Computation of the test statistic
– SPSS-Windows syntax
• Chi Square
– A nonparametric test in which observed proportions are compared to expected proportions
– Types
• One-dimensional – comparing frequencies occurring in different categories for a single group
• Two-dimensional – comparing frequencies occurring in different categories for two or more
groups
– Examples
• Is there a difference between the proportions of parents in favor of or opposed to an extended
school year?
• Is there a difference between the proportions of husbands and wives who are in favor of or
opposed to an extended school year?
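
The chi-square statistic for both types can be sketched in a few lines of Python. All counts below are hypothetical and chosen only to make the arithmetic easy to follow. The statistic is the sum of (observed - expected)² / expected, and for one degree of freedom the critical value at the 0.05 level is 3.841.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# One-dimensional: 100 parents, in favor vs. opposed (hypothetical counts),
# compared with an even 50/50 split.
chi_one = chi_square([60, 40], [50, 50])

# Two-dimensional: husbands and wives, in favor vs. opposed (hypothetical counts).
table = [[30, 20],   # husbands: in favor, opposed
         [20, 30]]   # wives:    in favor, opposed
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Expected count for each cell = row total * column total / grand total.
expected_cells = [r * c / grand_total for r in row_totals for c in col_totals]
observed_cells = [cell for row in table for cell in row]
chi_two = chi_square(observed_cells, expected_cells)

critical_value = 3.841  # chi-square critical value, df = 1, alpha = 0.05
print(chi_one, chi_one > critical_value)  # 4.0 True
print(chi_two, chi_two > critical_value)  # 4.0 True
```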
CORRELATION

• The correlation coefficient is related to other types of measures of association:

– The partial correlation, which measures the degree of association between two variables
when the effect on them of a third variable is removed: what is the relationship between
student achievement and dollars per student spent by the school district when the effect of
parents’ SES is removed?
– The multiple correlation, which measures the degree to which one variable is correlated
with two or more other variables: how well can I predict student achievement knowing
mean school district expenditure per pupil and parent SES?
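
The two measures can be sketched from ordinary Pearson correlations. The Python code below uses the standard first-order partial correlation formula and the standard two-predictor multiple correlation formula. The achievement, spending, and SES figures are invented for illustration and are not data from the slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with the effect of z removed from both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def multiple_r(r_y1, r_y2, r_12):
    """Correlation of y with the best linear combination of predictors 1 and 2."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)

# Hypothetical data for eight school districts.
achievement = [62, 70, 75, 81, 90, 68, 77, 85]
spending = [5.1, 6.0, 5.8, 7.2, 8.0, 6.5, 6.1, 7.0]  # thousand dollars per pupil
ses = [2, 3, 4, 3, 5, 2, 4, 4]                        # parent SES rating

r_as = pearson_r(achievement, spending)
r_ae = pearson_r(achievement, ses)
r_se = pearson_r(spending, ses)

print(round(partial_r(r_as, r_ae, r_se), 3))   # achievement vs. spending, SES removed
print(round(multiple_r(r_as, r_ae, r_se), 3))  # achievement vs. spending and SES together
```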
REGRESSION ANALYSES

• Regression: a technique concerned with predicting some
variables by knowing others
• The process of predicting variable Y using variable X
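
A minimal Python sketch of predicting Y from X with a least-squares line. The slides do not name an estimation method, so ordinary least squares is an assumption here, and the hours-studied and test-score figures are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied (X) and test score (Y) for five students.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, score)

def predict(x):
    """Predicted Y for a given X on the fitted line."""
    return intercept + slope * x

print(round(slope, 2), round(intercept, 2))  # 4.1 47.7
print(round(predict(6), 1))                  # 72.3
```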
MULTIPLE REGRESSION ANALYSIS
(MRA)
• Method for studying the relationship between a dependent variable
and two or more independent variables.
• Purposes:
– Prediction
– Explanation
– Theory building
FORMULATING HYPOTHESIS

Research hypothesis:
A hypothesis is a suggested answer to the problem. It is formulated & provisionally adopted to
explain observed facts or conditions & to guide further investigation. It is a logical
supposition, a reasonable guess, an educated conjecture which may give direction to thinking
with respect to the problem & thus aid in solving it. It is an expectation about events based on
a generalization of the assumed relationship between variables.
Characteristics of hypothesis:
• It should conjecture upon a relationship between two or more variables.
• It should be stated clearly & unambiguously in the form of a declarative
sentence.
• It should be testable, that is, it should be possible to restate it in an
operational form which can then be evaluated based on data.
Common classifications of hypothesis:
• Research hypothesis
• Statistical hypothesis
RESEARCH HYPOTHESIS:
It is a possible solution to the research problem posited. It is a prediction. It
is a statement in declarative form. It is also known as an alternative
hypothesis.
For example:
Sub-problem:
What is the relationship between IQ and achievement?
Hypothesis:
IQ and achievement are positively related.
STATISTICAL HYPOTHESIS:
It may be either a null hypothesis or an alternative hypothesis. The null
hypothesis is stated in a null form, which means that there is no significant
relationship between the independent & dependent variables. This type of
hypothesis is very common in psychological, social and educational
research. The null form is used because it is easier to disprove. It is a
hypothesis of no relationship or no difference.
EXAMPLES OF A NULL HYPOTHESIS:
• There is no significant relationship between IQ and achievement.
• There is no significant difference between educational attainment and
teaching effectiveness.
The alternative hypothesis asserts that there is a significant difference or
significant relationship between the independent variable and the dependent
variable. It takes three forms:
• Non-directional hypothesis
• Positive directional hypothesis
• Negative directional hypothesis
Non-directional hypothesis:
It is one in which the researcher is not interested in the direction of the
difference, whether one is lesser than or greater than the other. The interest
lies in the difference, not in its direction; the direction of the
difference is of no consequence.
POSITIVE DIRECTIONAL HYPOTHESIS:
It uses the positive tail or the upper tail of the curve. It is a one-tailed test,
which is a less rigid test.
NEGATIVE DIRECTIONAL HYPOTHESIS:
It uses the lower tail of the curve. It is a one-tailed test.
In summary, where a relationship or difference exists, a hypothesis also
exists. Some descriptive studies do not require a hypothesis if the study’s
purpose is only to describe & explain the phenomenon.
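
The difference between the three forms shows up in how the p-value is read off the curve. The Python sketch below assumes a z-test and a hypothetical test statistic of z = 1.8. It uses the standard library's `statistics.NormalDist` to get the upper-tail, lower-tail, and two-tailed probabilities.

```python
from statistics import NormalDist

z = 1.8       # hypothetical test statistic, for illustration only
alpha = 0.05  # conventional significance level

standard_normal = NormalDist()  # mean 0, standard deviation 1

p_upper = 1 - standard_normal.cdf(z)     # positive directional: upper tail
p_lower = standard_normal.cdf(z)         # negative directional: lower tail
p_two_sided = 2 * min(p_upper, p_lower)  # non-directional: both tails

print(round(p_upper, 4), p_upper < alpha)          # 0.0359 True
print(round(p_two_sided, 4), p_two_sided < alpha)  # 0.0719 False
print(round(p_lower, 4), p_lower < alpha)          # 0.9641 False
```

With the same z value, the upper-tail test rejects the null hypothesis at the 0.05 level while the two-tailed test does not, which is the sense in which a one-tailed test is less rigid.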
SOURCES OF HYPOTHESIS:
• Problem statement.
• Theoretical framework.
• Related literature.
• Observation & experiences.
CHARACTERISTICS OF A GOOD HYPOTHESIS:
• It clearly states what variables are used.
• It clearly states how the variables are used.
• It determines the purpose of the study.
• It is testable.
• It clearly determines the significance of the relationship or difference between
sets of variables.
• In comparative analysis, the intervening variables are clear and isolated.
• In a problem of relationship, the independent & the dependent variables
are clear, specific, and isolated.
HYPOTHESIS AND ASSUMPTION:
An assumption is a statement that the researcher takes as fact but that cannot be
verified, whereas a hypothesis is testable and verifiable and can thus confirm or
disconfirm a theory or theories.
CAUTIONS IN USING STATISTICS:
Statistics is an important tool of the researcher. There are limitations,
however, that should be recognized in using statistical processes as well as in
drawing conclusions from statistical evidence. The precautions are:
• Statistics is a companion of research which clarifies, verifies & measures
relationships that have been established by clear & logical analysis.
• A statistical analysis should be utilized in the analysis of data only if it adds
clarity or meaning to the analysis of data.
• The statistical tools used to analyze the data do not yield significant truths
if the data are invalid & unreliable.
• All the treated data must be double-checked frequently so as to minimize the
likelihood of errors in measuring, recording, tabulating & analyzing.
• Some biased statisticians may use inappropriate
statistical tools and procedures or omit relevant data to suit the statistical
process.
THANK YOU …
