Measures: Mean, Median


CONTENT

A. SIMPLE NUMERICAL COMPUTATIONS


1. Frequency & Frequency Distributions
2. Simple frequency
3. Relative Frequency
4. Cumulative Frequency
B. DISPLAYING THE DATA  
1. Table forms
2. Visual forms
2.1 Graphs
2.2 Polygons
DEFINITION
Numerical computation
• the basic principles of arithmetic, like addition, subtraction, multiplication, and division
• mathematical terms and methods such as percentages, ratios, fractions, and decimals...
Frequency
• The frequency of an event i is the number n_i of times the event occurred in an experiment or study.
• There are 3 types: simple frequency, relative frequency, and cumulative frequency.
What is a frequency distribution?
An organized tabulation of the number of individuals
located in each category on a scale of measurement
• A method for simplifying and organizing
data
• Presents an organized picture of the
entire set of scores
• Are scores generally high or low?
• Are the scores clustered together or
spread out?
• Shows where each individual is located
relative to others in a distribution
• Where does one score fall relative to
all others?
Frequency Distribution Tables
• Consists of at least two columns
• Categories on the scale of measurement (X), ordered from lowest
to highest
• Frequency for each category (how often each category was
reported)

Original scores: 1, 2, 3, 5, 4, 4, 2, 3, 1, 3, 2, 3, 2, 2

Frequency table:
X   f
1   2
2   5
3   4
4   2
5   1
• 1 was the lowest reported score. Two people had a score of 1.
• 2 was the most commonly reported score. Five people had a score of 2.
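The frequency table above is simply a tally of the original scores. A minimal Python sketch (standard library only; the function name `frequency_table` is our own, not from the source) that rebuilds it from the scores on this slide:

```python
from collections import Counter

# Original scores from the slide
scores = [1, 2, 3, 5, 4, 4, 2, 3, 1, 3, 2, 3, 2, 2]

def frequency_table(data: list[int]) -> list[tuple[int, int]]:
    """Return (X, f) pairs, ordered from the lowest to the highest category."""
    counts = Counter(data)          # f for each category X
    return sorted(counts.items())   # order the categories from low to high

for x, f in frequency_table(scores):
    print(f"X = {x}   f = {f}")
```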
Section A-1
Simple Frequency

Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
Example: counting Chicano graduate students at San Jose State
• Step 1: check the number of each sex in the population of Chicano graduate students.
• Step 2: code them as 1 (male) and 2 (female).
• Step 3: tally them.
The number of males: 255 (coded 1)
The number of females: 60 (coded 2)
• The total number of Chicano graduate students (hypothetically): 255 + 60 = 315

• Ratio
$$\text{Ratio} = \frac{n_{\text{female}}}{n_{\text{male}}} = \frac{60}{255} \approx 0.24$$
$$0.24 \times 100 = 24$$
=> there are about 24 women enrolled at the university for every 100 male Chicanos.
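The ratio arithmetic above, together with the proportion and percentage that can be derived from the same simple frequencies, can be checked in a few lines. This is a sketch using the hypothetical counts from the example; the variable names are illustrative only.

```python
# Hypothetical counts from the San Jose State example
males = 255    # coded 1
females = 60   # coded 2
total = males + females

# Ratio of women to men, then rescaled to "per 100 men"
ratio = females / males
per_100_men = ratio * 100

# Proportion and percentage of women in the whole group
proportion_female = females / total
percent_female = proportion_female * 100

print(f"ratio = {ratio:.2f}, women per 100 men = {per_100_men:.0f}")
print(f"proportion female = {proportion_female:.3f} ({percent_female:.1f}%)")
```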
 

 
[Diagram: Simple frequencies -> Ratio, Proportion, Percentage]

=> Simple frequencies are a useful first way of reducing the data, but such counts don't always give us a precise picture of the data.
Section A-2
Relative Frequency
(Vietnamese: tần số tương đối)

 

Example: Report the frequency of high scores on reading tests for two different schools.
• School A: 2 of 2 high marks; School B: 2 of 2 high marks => Similar
• School A: 2 of 10 high marks; School B: 2 of 80 high marks => Different

To show this difference, we would compute the relative frequency.

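$$\text{Relative frequency} = \frac{n_i}{N}$$
where n_i is the frequency of the category and N is the total number of observations (School A: 2 / 10 = 0.20; School B: 2 / 80 = 0.025).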

Score   Frequency (f)   Relative frequency   Result
80      5               5 / 60               0.08
70      15              15 / 60              0.25
60      20              20 / 60              0.33
50      15              15 / 60              0.25
40      5               5 / 60               0.08
N = 60
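Relative frequency is each category's count divided by N. A minimal sketch over the score table above (the helper name `relative_frequencies` is our own):

```python
# Score -> frequency (f), from the table above
freqs = {80: 5, 70: 15, 60: 20, 50: 15, 40: 5}

def relative_frequencies(table: dict[int, int]) -> dict[int, float]:
    """Divide each category's frequency by the total number of observations N."""
    n_total = sum(table.values())
    return {score: f / n_total for score, f in table.items()}

rel = relative_frequencies(freqs)
for score, r in rel.items():
    print(score, round(r, 2))
```

The unrounded relative frequencies always sum to exactly 1; after rounding to two decimals they can fall slightly short (here 0.08 + 0.25 + 0.33 + 0.25 + 0.08 = 0.99).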
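RATE
• Rate is used to show how often an event happens compared with how often it might happen.
• For example, we might use rate to show the number of people who do learn a language compared with the number of people who might learn the language.

$$\text{Rate} = 1000 \times (\text{relative frequency}) = 1000 \times \frac{n_i}{N}$$
where n_i is the number of people in the group with the characteristic and N is the group's total population.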
Galaxy people who speak English as a second language (ESL)

Age      Total ESL speakers   Rate per 1,000 population
0-5      2,905,000            189.5
6-16     11,915,000           314.9
17-24    10,436,000           227.9
25-44    14,905,000           277.8
45-64    7,856,000            218.5
65+      1,400,000            201.4

- There are more ESL speakers in the 25-44 age group.
- However, there is a greater proportion of ESL speakers in the 6-16 age group.

=> your chances of finding someone to talk to in English are best if you walk up to someone between 6 and 16 years of age.
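The contrast above (largest raw count versus largest rate) can be read straight off the table. A sketch, assuming the table values are hard-coded; since the table does not give each age group's population, `rate_per_1000` is shown only with hypothetical numbers.

```python
# Age group -> (ESL speakers, rate per 1,000 population), from the table above
esl = {
    "0-5": (2_905_000, 189.5),
    "6-16": (11_915_000, 314.9),
    "17-24": (10_436_000, 227.9),
    "25-44": (14_905_000, 277.8),
    "45-64": (7_856_000, 218.5),
    "65+": (1_400_000, 201.4),
}

def rate_per_1000(n_event: int, n_population: int) -> float:
    """Rate = 1000 * relative frequency = 1000 * n / N."""
    return 1000 * n_event / n_population

most_speakers = max(esl, key=lambda group: esl[group][0])  # largest simple frequency
highest_rate = max(esl, key=lambda group: esl[group][1])   # largest rate
print("most ESL speakers:", most_speakers)
print("highest rate per 1,000:", highest_rate)

# Hypothetical group: 30 ESL speakers in a population of 200
print(rate_per_1000(30, 200))
```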
RATE
Rates are useful for 2 reasons:
• to compare different populations with respect to the frequency of some variable
• to compare the same population at different times

Example: twenty years after we collected the above data, we conducted another survey of Galaxy people.

$$\text{Percent change} = 100 \times \frac{n_2 - n_1}{n_1}$$

n2: the new frequency
n1: the beginning frequency

Children in the 6-16 age group: n1 = 11,915,000; n2 = 19,742,000
$$\text{Percent change} = 100 \times \frac{19{,}742{,}000 - 11{,}915{,}000}{11{,}915{,}000} = 100 \times 0.66 = 66\%$$

• WHY? The "why" answer is not in the data; all these data show is that a change took place.
• If we wish to show a relationship between these variables, we must form hypotheses and test them.
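The percent-change formula above as a small function, applied to the 6-16 age group (a sketch; the function name is our own):

```python
def percent_change(n1: float, n2: float) -> float:
    """Percent change from the beginning frequency n1 to the new frequency n2."""
    return 100 * (n2 - n1) / n1

n1 = 11_915_000  # children aged 6-16, first survey
n2 = 19_742_000  # same age group, twenty years later
print(f"{percent_change(n1, n2):.1f}%")
```

Unrounded, the change is about 65.7%, which the slide reports as 66%. A negative result would indicate a decrease.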
Section A-3
Cumulative Frequency

Cumulative Frequency
(Vietnamese: tần số tích lũy)
• Used when you want to show the standing of a particular score in a group of scores
• This will show us how many scores fall below that
particular point in the distribution.
• It is also the basis for calculating percentile scores.
Percentiles & Percentile Ranks
• Percentile rank (Vietnamese: thứ hạng phần trăm): the percentage of individuals in the distribution with scores at or below a particular value
• Percentile: a score identified by its percentile rank

For example:
o Your exam score is X = 43
o 60% of the class had scores of 43 or lower
o Your score has a percentile rank of 60% and is called the 60th percentile.
Example: student placement in English classes at the university

Placement requirement   f     F     Percentile
0 quarter term          102   392   100
1 quarter term          130   290   74
2 quarter terms         77    160   41
3 quarter terms         45    83    21
4 quarter terms         38    38    10

f: simple frequency; F: cumulative frequency

$$\text{Percentile} = 100 \times \frac{F}{N} = 100 \times \frac{290}{392} = 100 \times 0.74 = 74\text{th}$$

=> 74% of the students who took the test scored at or below that level (1 quarter term).
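Cumulative frequency is a running total of f, counted up from the bottom row, and each percentile is 100 × F / N. A sketch that recomputes the F and Percentile columns of the placement table (standard library only):

```python
from itertools import accumulate

# (placement requirement, f), listed from the bottom row of the table upward
placement = [
    ("4 quarter terms", 38),
    ("3 quarter terms", 45),
    ("2 quarter terms", 77),
    ("1 quarter term", 130),
    ("0 quarter term", 102),
]

labels = [label for label, _ in placement]
f = [count for _, count in placement]
F = list(accumulate(f))                          # running total of f
N = F[-1]                                        # the top row's F equals N
percentiles = [round(100 * cf / N) for cf in F]  # Percentile = 100 * F / N

for label, fi, Fi, p in zip(labels, f, F, percentiles):
    print(f"{label:<16} f={fi:<4} F={Fi:<4} percentile={p}")
```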
Score   Frequency (f)   Relative frequency   Cumulative frequency (F)   Percentile
80      5               0.08                 60                         100
70      15              0.25                 55                         92
60      20              0.33                 40                         67
50      15              0.25                 20                         33
40      5               0.08                 5                          8

$$\text{Relative frequency} = \frac{f}{N}$$

Cumulative frequency (for a score of 50) = 5 + 15 = 20

$$\text{Percentile} = 100 \times \frac{F}{N} = 100 \times \frac{20}{60} \approx 33$$
Section B-1
Table forms

Distribution of ESL students in the entering foreign student population

Groups             f
ESL students       290
Non-ESL students   102
Total              392

World literacy estimates*

Illiterate     Literate        Total
810 million    1,525 million   2,335 million

*Literacy is equated with 4th-grade graduation.

• The most conventional way to display simple frequency data is in table form.
• Nominal data are easily presented in this way.
• Ribbonlike tables (which spread across a page) are usually considered unattractive, as are long, thin tables. They count as "bad style."
Section B-2
Visual forms

Two basic types of graphic displays:
• Graphs (either histograms or bar graphs)
• Polygons (line drawings).

The techniques for constructing them are almost the same:  


1. Draw two axes (a vertical and a horizontal line).  
2. On the horizontal line, enter the scores for the variable.  
3. On the vertical line, enter the frequency of each of these scores.  
4. Construct the graph or polygon around these frequency points.  
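The four construction steps above can be mimicked in plain text. This is a rough sketch that assumes a character-based display instead of a plotting library (the helper `text_histogram` is hypothetical). It uses the frequencies of the original scores from Section A-1 (1: 2, 2: 5, 3: 4, 4: 2, 5: 1):

```python
def text_histogram(freqs: dict[int, int]) -> list[str]:
    """Vertical text histogram: scores on the horizontal axis, frequency on the vertical axis."""
    scores = sorted(freqs)
    top = max(freqs.values())
    lines = []
    # One row per frequency level, from the highest down to 1
    for level in range(top, 0, -1):
        bars = "".join(" # " if freqs[s] >= level else "   " for s in scores)
        lines.append(f"{level:>2} |{bars}")
    # Horizontal axis, then the score labels under each column
    lines.append("   +" + "---" * len(scores))
    lines.append("    " + "".join(f"{s:^3}" for s in scores))
    return lines

print("\n".join(text_histogram({1: 2, 2: 5, 3: 4, 4: 2, 5: 1})))
```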
Histogram
Basically a graphic version of a frequency distribution.
Multiple Bar Graph
[Figure: Median Income of Males and Females]
Polygon
Uses line segments connected to points directly
above class midpoint values
Bell-shaped curve

Features
• One half of the curve looks like the other half (symmetric distribution).
• Most of the scores fall in the middle of the curve, at its highest part.
• There is a gradual slope of scores below and above that one midpoint.
The Shape of a Frequency Distribution
Skewed Distributions

Positively skewed
• Scores tend to pile up on the left side
• Tail "points" to the right
• The "skew" is on the "positive" side of the curve

Negatively skewed
• Scores tend to pile up on the right side
• Tail "points" to the left
• The "skew" is on the "negative" side of the curve
Name That Distribution!
[Figure panels: Symmetric (no skew), Positive skew, Negative skew]


DESCRIBING THE DATA

The purpose of this chapter


1. Show how you can arrive at these most typical scores, and the reservations you must keep in mind when you interpret them.
• The typical score is also important, for it allows us to compare different groups.
• For example, suppose you use a pretest-posttest control group design.
2. You have four different distributions:
• two for the pretest (experimental and control)
• two for the posttest (experimental and control)
Suppose the representative typical scores of each distribution, on a scale of 50, were the following:

Group          Pretest   Posttest
Control        25        27
Experimental   26        40

It tells you:
• At the time of the pretest, there appears to be no difference between the groups.
• At the time of the posttest (after the treatment), there appears to be a large difference between the groups (your treatment worked).
• The difference is in favor of your experimental group, not the control. Thus, the most typical score is both useful and crucial to your research.
Descriptive statistics
• Only with ordinal and interval variables
MEAN
The formula for obtaining the mean is
$$\bar{X} = \frac{\sum X}{N}$$
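The mean formula translates directly into code. A minimal sketch, using the cloze scores that appear later in this chapter as sample data:

```python
def mean(scores: list[float]) -> float:
    """X-bar = (sum of X) / N."""
    return sum(scores) / len(scores)

cloze = [2, 3, 3, 4, 5, 5, 5, 6, 6, 8]  # scores of ten Ss on a short cloze passage
print(mean(cloze))
```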
MEDIAN
• The median score is also easy to find. Arrange your scores
in rank order. The median is the score which is at the center
of the distribution.
• Half of the scores are above the median and half are
below.
If the number of scores is odd, the median is the middle score: 4 4 5 7 9 10 11 (median = 7).
If the number of scores is even, use the midpoint between the two middle scores as the median:
4 5 7 9 10 12 ((7 + 9) ÷ 2 = 8)
Ex: There were 20 items, and the scores you obtained were:
16 10 5 6 8 15 20 14 16 10
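The odd and even rules above can be written out explicitly. Python's `statistics.median` follows the same convention, so the sketch checks its hand-written version against it. The sketch also works the exercise (sorted: 5 6 8 10 10 14 15 16 16 20, so the median is (10 + 14) ÷ 2 = 12).

```python
import statistics

odd = [4, 4, 5, 7, 9, 10, 11]
even = [4, 5, 7, 9, 10, 12]
exercise = [16, 10, 5, 6, 8, 15, 20, 14, 16, 10]  # the 20-item test scores above

def median(scores: list[float]) -> float:
    """Sort the scores; take the middle one, or the midpoint of the two middle ones."""
    s = sorted(scores)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

for data in (odd, even, exercise):
    assert median(data) == statistics.median(data)  # same rule as the standard library
    print(sorted(data), "median =", median(data))
```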
MODE
• The mode is the most frequently obtained score in the
data. For example, in the following data the mode is
25:
20 22 23 23 25 25 25 25 27 29 30
In bimodal distributions, there are two values that are obtained most often, e.g.:
2 3 4 4 4 4 5 7 7 9
10 10 10 10 12 12 13 15
This distribution has two modes, 4 and 10.
The mode is the easiest measure of central tendency to identify.
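For both examples above, the standard library's `statistics.multimode` (Python 3.8+) returns every value tied for the highest frequency, so it handles unimodal and bimodal data alike:

```python
import statistics

unimodal = [20, 22, 23, 23, 25, 25, 25, 25, 27, 29, 30]
bimodal = [2, 3, 4, 4, 4, 4, 5, 7, 7, 9, 10, 10, 10, 10, 12, 12, 13, 15]

# multimode lists all most-frequent values, in order of first appearance
print(statistics.multimode(unimodal))
print(statistics.multimode(bimodal))
```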
MEASURES OF VARIABILITY
• Just as there are three ways of talking about the most typical score in your data, there are three major ways to show how the data are spread out from that point: the range, the variance, and the standard deviation.
Range
• The easiest, most informal way to talk about the
spread of the distribution of scores is the range.
• Arrange the scores from the highest to the lowest.
• Subtract the lowest score from the highest score.

$$\text{Range} = X_{\text{highest}} - X_{\text{lowest}}$$


Standard Deviation
• The most frequently used measure of variability
is the standard deviation
• The larger the standard deviation, the more
variability from the central point in the
distribution. The smaller the standard deviation,
the closer the distribution is to the central point.
• Consider, for example, the scores of ten Ss on a short cloze passage: 2, 3, 3, 4, 5, 5, 5, 6, 6, 8.
• The mean ($\bar{X} = \sum X / N$) is 47 ÷ 10 = 4.7. The individual deviation of each score is:

Sample cloze data
X (score)   X - X̄      x
2           2 - 4.7    -2.7
3           3 - 4.7    -1.7
3           3 - 4.7    -1.7
4           4 - 4.7    -0.7
5           5 - 4.7    0.3
5           5 - 4.7    0.3
5           5 - 4.7    0.3
6           6 - 4.7    1.3
6           6 - 4.7    1.3
8           8 - 4.7    3.3
Variance
Standard Deviation
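Starting from the deviations in the cloze table, the range, variance, and standard deviation follow mechanically. This sketch assumes the sample convention of dividing the sum of squared deviations by N - 1, which is what `statistics.variance` and `statistics.stdev` use. Some texts divide by N instead (`statistics.pvariance`), so check which convention your course expects.

```python
import math
import statistics

# Cloze scores of the ten Ss from the table above
cloze = [2, 3, 3, 4, 5, 5, 5, 6, 6, 8]

def spread(scores: list[float]) -> dict[str, float]:
    """Range, sum of squared deviations, sample variance, and standard deviation."""
    n = len(scores)
    x_bar = sum(scores) / n
    deviations = [x - x_bar for x in scores]  # x = X - X-bar, as in the table
    ss = sum(d * d for d in deviations)       # sum of squared deviations
    variance = ss / (n - 1)                   # assumption: sample variance (N - 1)
    return {
        "range": max(scores) - min(scores),
        "sum_sq": ss,
        "variance": variance,
        "sd": math.sqrt(variance),
    }

result = spread(cloze)
print(result)
print(statistics.variance(cloze), statistics.stdev(cloze))  # same N - 1 convention
```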
Definition of some analyses
• Independent-samples t-test: compare ONE variable between TWO independent groups. Ex. compare HAPPINESS between males and females.
• Paired-samples t-test: compare TWO measurements taken from the SAME subjects. Ex. compare pre-test scores and post-test scores within the same group.
Definition of some analyses
• The researcher wants to find out whether there is a difference in HAPPINESS between males and females. Write a report based on the following data.
Independent-samples t-tests
Group Statistics
         sex   N     Mean   Std. Deviation   Std. Error Mean
Happy    1     633   1.76   .593             .024
         2     871   1.83   .632             .021

Independent Samples Test (Happy)
Levene's Test for Equality of Variances: F, Sig.
t-test for Equality of Means: t, df, Sig. (2-tailed), Mean Difference, Std. Error Difference, 95% Confidence Interval of the Difference (Lower, Upper)

                              F      Sig.   t        df         Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   Lower   Upper
Equal variances assumed       .029   .865   -2.196   1502       .028              -.071        .032               -.134   -.008
Equal variances not assumed                 -2.219   1409.089   .027              -.071        .032               -.133   -.008
• If Levene's Sig. < 0.05 -> read the "Equal variances not assumed" row
• If Levene's Sig. >= 0.05 -> read the "Equal variances assumed" row
* If Sig. (2-tailed) < 0.05 -> there is a significant difference between the two groups
* If Sig. (2-tailed) >= 0.05 -> there is no significant difference between the two groups
Conclusion
Levene's Sig. = .865 >= 0.05, so equal variances are assumed. As Sig. (2-tailed) = 0.028 < 0.05, there is a significant difference in HAPPINESS between males and females.
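The t value in the "Equal variances assumed" row comes from the pooled-variance formula. This sketch recomputes it from the Group Statistics table and encodes the slide's rule for choosing a row. Two caveats: the function names are our own, and because the means are rounded to two decimals, the result (about -2.18) differs slightly from SPSS's -2.196. The p-value uses a normal approximation via `statistics.NormalDist`, which is close to the exact t distribution only because df = 1502 is large.

```python
import math
from statistics import NormalDist

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's t for two independent groups, equal variances assumed."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))           # std. error of the difference
    return (m1 - m2) / se, df

def choose_row(levene_sig: float) -> str:
    """The slide's decision rule for which SPSS row to read."""
    return "equal variances not assumed" if levene_sig < 0.05 else "equal variances assumed"

t, df = pooled_t(1.76, 0.593, 633, 1.83, 0.632, 871)
p_approx = 2 * NormalDist().cdf(-abs(t))  # two-tailed, normal approximation
print(choose_row(0.865), round(t, 3), df, round(p_approx, 3))
```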
Paired-samples T-test
• Used to determine whether there is any difference between TWO VALUES measured on the same subjects before and after a treatment has been applied.
• Ex.
- Determine whether there is any difference between students' scores before and after TOEIC TEST PREPARATION has been applied.
Paired-samples T-test
No.   Pre-test   Post-test      No.   Pre-test   Post-test
1     7          8              11    7          9
2     8          9              12    7          5
3     6          5              13    8          9
4     8          9              14    9          10
5     7          8              15    7          7
6     7          9              16    7          9
7     7          7              17    8          7
8     6          7              18    7          9
9     8          7              19    6          6
10    6          8              20    8          8
Paired Samples Statistics
                          Mean   N    Std. Deviation   Std. Error Mean
Pair 1  Pre-test scores   7.20   20   .834             .186
        Post-test scores  7.80   20   1.399            .313

Paired Samples Correlations
                                            N    Correlation   Sig.
Pair 1  Pre-test scores & Post-test scores  20   .533          .016
Paired Samples Test
Pair 1: Pre-test scores - Post-test scores
Paired differences
  Mean                                        -.60
  Std. Deviation                              1.188
  Std. Error Mean                             .266
  95% Confidence Interval of the Difference   Lower -1.16, Upper -.04
t                                             -2.259
df                                            19
Sig. (2-tailed)                               .036
Conclusion
As Sig. (2-tailed) = .036 < 0.05, there is a significant difference between pre-test and post-test scores. Specifically, the post-test mean (7.80) is 0.60 points higher than the pre-test mean (7.20).
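The paired-samples output can be reproduced from the 20 score pairs: the test works on the differences d = pre - post (SPSS's order), and t = mean(d) / (SD(d) / √N). In this sketch, the critical value 2.093 (two-tailed, α = .05, df = 19) comes from a standard t table rather than being computed, and the function name is our own.

```python
import math

pre  = [7, 8, 6, 8, 7, 7, 7, 6, 8, 6, 7, 7, 8, 9, 7, 7, 8, 7, 6, 8]
post = [8, 9, 5, 9, 8, 9, 7, 7, 7, 8, 9, 5, 9, 10, 7, 9, 7, 9, 6, 8]

def paired_t(before, after):
    """Paired-samples t on the differences d = before - after."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    mean_d = sum(d) / n
    sd_d = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
    se_d = sd_d / math.sqrt(n)
    return mean_d, sd_d, se_d, mean_d / se_d, n - 1

mean_d, sd_d, se_d, t, df = paired_t(pre, post)

T_CRIT_DF19 = 2.093  # two-tailed critical t, alpha = .05, df = 19 (t table)
ci = (mean_d - T_CRIT_DF19 * se_d, mean_d + T_CRIT_DF19 * se_d)
print(f"mean = {mean_d:.2f}, SD = {sd_d:.3f}, SE = {se_d:.3f}, t = {t:.3f}, df = {df}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print("significant" if abs(t) > T_CRIT_DF19 else "not significant")
```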
Analysis of Variance (ANOVA)
ANOVA: compare ONE variable among THREE or more groups. Ex. compare HAPPINESS among people aged 10-15, 15-20, 20-25, etc.
ONE WAY ANOVA
1= white people
2= black people
3= other people
Descriptives
                                                     95% Confidence Interval for Mean
Race    N      Mean   Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
1       1256   1.77   .604             .017         1.73          1.80          1         3
2       201    1.97   .651             .046         1.87          2.06          1         3
3       47     1.94   .673             .098         1.74          2.13          1         3
Total   1504   1.80   .617             .016         1.77          1.83          1         3


ANOVA
                 Sum of Squares   df     Mean Square   F        Sig.
Between Groups   7.680            2      3.840         10.225   .000
Within Groups    563.679          1501   .376
Total            571.359          1503


• If Sig. < 0.01, there is a significant difference among the group means.
• If Sig. >= 0.01, there is no significant difference among the group means.
Conclusion
As Sig. = .000 < 0.01, there is a significant difference among the 3 groups; white people are the happiest group.
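In the ANOVA table, F is the ratio of the two mean squares, where each MS = SS / df. This sketch checks that ratio against the SPSS values and also shows the raw-score version. The three small groups passed to `one_way_anova` are hypothetical toy data, not from the survey.

```python
def f_from_table(ss_between, df_between, ss_within, df_within):
    """F = MS_between / MS_within, where MS = SS / df."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

def one_way_anova(groups):
    """One-way ANOVA from raw scores in two or more groups."""
    all_scores = [x for g in groups for x in g]
    grand = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_w = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return f_from_table(ss_b, len(groups) - 1, ss_w, len(all_scores) - len(groups))

# Check against the SPSS ANOVA table above
ms_b, ms_w, F = f_from_table(7.680, 2, 563.679, 1501)
print(round(ms_b, 3), round(ms_w, 3), round(F, 3))

# Hypothetical toy data, only to illustrate the raw-score computation
print(one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```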
Post Hoc Tests
• Used to indicate which groups the difference lies between.
Post Hoc Tests (Homogeneous Subsets)
                 Subset for alpha = .05
race   N         1       2
1      1256      1.77
3      47                1.94
2      201               1.97
Sig.             1.000   .725
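Conclusion
• Between black people (2) and other people (3), there is no difference in HAPPINESS, since both fall in subset 2 and Sig. = .725 > 0.05.
• White people (1) form a separate subset, so their mean differs from those of the other two groups.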
Chi-square test
• Used to indicate whether there is a relationship between two categorical variables.
• Variables: TWO NOMINAL VARIABLES, or ONE NOMINAL VARIABLE and ONE ORDINAL VARIABLE
• Ex. To indicate whether there is a relationship between SEX (MALE/FEMALE) and EDUCATIONAL LEVEL.
Sex * Educational level Crosstabulation
                           Educational level
                           High school   University graduate   Post-graduate   Total
Male     Count             545           71                    20              636
         Expected count    529.9         85.5                  20.5            636.0
         Residual          15.1          -14.5                 -.5
Female   Count             719           133                   29              881
         Expected count    734.1         118.5                 28.5            881.0
         Residual          -15.1         14.5                  .5
Total    Count             1264          204                   49              1517
         Expected count    1264.0        204.0                 49.0            1517.0
Chi-square test
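• The Count rows (highlighted in red on the original slide) are the actual, observed values; the Expected count rows are what the counts would be if there were no relationship.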
Chi-square test
                               Value    df   Asymp. Sig. (2-sided)
Pearson Chi-square             5.011a   2    .082
Likelihood Ratio               5.094    2    .078
Linear-by-linear Association   2.944    1    .086
N of Valid Cases               1517


Chi-square test
• If Sig. < 0.05, there is a relationship between gender and educational level.
• If Sig. >= 0.05, there is no relationship between gender and educational level.
Chi-square test
• Chi-square = 5.011, df = 2, Sig. = .082 > 0.05. There is no relationship between gender and educational level.
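Each expected count in the crosstabulation is (row total × column total) / N, and the Pearson chi-square sums (observed - expected)² / expected over all cells. The sketch below rebuilds both from the observed counts. For the p-value it relies on a special case: only when df = 2 does the chi-square upper-tail probability reduce exactly to exp(-χ²/2). Other df values need a proper chi-square CDF from a statistics library.

```python
import math

# Observed counts: rows = Male, Female; columns = High school, University graduate, Post-graduate
observed = [
    [545, 71, 20],
    [719, 133, 29],
]

def chi_square(table):
    """Pearson chi-square; expected = row total * column total / N."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            chi2 += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = chi_square(observed)
p = math.exp(-chi2 / 2) if df == 2 else float("nan")  # exact only for df = 2
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p:.3f}")
```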
Pearson Correlation
• Used to indicate the relationship between two INTERVAL (score-type) variables
• Ex. between LISTENING SCORE and SPEAKING SCORE, WRITING SCORE, READING SCORE; between HEIGHT and WEIGHT.
Pearson Correlation
• If Sig. < 0.05, there is a relationship between the VARIABLES.
• If Sig. >= 0.05, there is no relationship between the VARIABLES. In other words, the VARIABLES are not significantly correlated.
• -1 <= r <= 1
• The closer r is to -1, the stronger the negative correlation.
• The closer r is to 1, the stronger the positive correlation.
• If r = 0, there is no correlation.
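The Pearson r is the sum of cross-products of deviations divided by √(SS_x × SS_y). The height and weight data behind this example are not reproduced here, so this sketch instead reuses the pre-test and post-test scores from the paired-samples example. SPSS reports r = .533 for that pair, which gives the sketch a check value.

```python
import math

def pearson_r(x, y):
    """r = sum of cross-products / sqrt(SS_x * SS_y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Pre-test and post-test scores from the paired-samples example
pre  = [7, 8, 6, 8, 7, 7, 7, 6, 8, 6, 7, 7, 8, 9, 7, 7, 8, 7, 6, 8]
post = [8, 9, 5, 9, 8, 9, 7, 7, 7, 8, 9, 5, 9, 10, 7, 9, 7, 9, 6, 8]
r = pearson_r(pre, post)
print(round(r, 3))
```

A positive r here means students who scored higher on the pre-test also tended to score higher on the post-test. Whether r is significant is a separate test, which SPSS reports as Sig.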
Conclusion
• As Sig. = .000 < 0.05, there is a relationship between HEIGHT and WEIGHT.
Pearson Correlation
• The researchers want to answer the following research question: "Is there any relationship between HAI LONG (satisfaction) and LUONG (salary)?" Data were collected and analyzed with SPSS 16.0. From the SPSS output below, write a report.
Difficulties that junior students in HCMUE
encounter when translating tourism expressions
from Vietnamese to English
THANK YOU
FOR LISTENING!
