Statistics For Sociologists (2015E.C)
Statistics for Sociologists; JJU BA in Sociology Students (Evening Program)
By:
REDIAT TAKELE
(MSc in Statistics)
Department of Statistics
E-Mail: redtakele@Gmail.com
Cell phone: (+251)-920824328
OVERVIEW ABOUT STATISTICS
1. Definitions of statistics
In the plural sense: Statistics is defined as aggregates of numerically expressed facts (figures) on different issues.
Facts/figures about households, students, or teachers (numeric information).
Facts about socio-economic and cultural factors or societal characteristics of households in rural, urban, or pastoral areas.
Facts/figures about births, deaths, and marital status (marriage, divorce) in a society.
Facts about education: 20% of household heads in pastoral areas have not completed grade 12, …
In the singular sense: Statistics is the science that deals with the collection, organization, presentation, analysis, and interpretation of data in order to make decisions and, consequently, to take action.
Illustration:
1) Organizing and presenting the age distribution, marital status, and gender of people in some area in tabular and graphic form.
2) Four people out of a hundred (4%) will have a heart attack over the next ten years if the current situation continues.
3) The average age at marriage of women in rural parts of the Somali region is 15 years.
[Pie chart (partially recovered): Others 8%, Digestive system 4%, Injury and poisoning 3%, Neoplasms]
Illustration
Statistics helps to organize, understand, and report big data, for example data on deaths from car accidents in different countries.
Inferential statistics is the body of statistics that provides methods for inferring or drawing conclusions about the population under study, based on the information (results or findings) obtained from sample data.
The methods under inferential statistics include: correlation analysis, regression analysis, t-test, analysis of variance (ANOVA), chi-square test, …
Inferential Statistics
Generalizing the result of a sample to the whole group (population) under study. This technique is used to:
• Compare different populations
• Make predictions
• Assess the strength of the evidence
Methods
• T-test
• Analysis of variance (ANOVA)
• Correlation analysis
• Regression analysis
• Logistic regression analysis
• Non-parametric tests
Illustrations!
Applying Inference for Comparison:
Principle of inference by comparison
• One or more populations (Population-1, Population-2) are compared based on a common issue or variable of interest.
• But it is difficult to reach every subject in the population, for different reasons.
• So a sample of subjects is taken from each population (Sample-1, Sample-2) and measured.
• In the end, the finding obtained from the sample comparison is used for the population comparison (inference).
Illustrations
1) It is scientifically possible to infer (obtain findings) about households in a general population (a region, a pastoral area, a school, etc.) by investigating a few households in some study area with respect to social and cultural issues and demographic characteristics (marital status, income, expenditure, gender, education, age, …).
Vocabulary of statistics
1) Statistical population
It is the collection of individuals, households, animals, plants, objects, or measurements under study.
It is the entire set of people or observations in which you are interested or which are being studied.
For example:
1) All students attending first-year classes this year
2) All people of the Somali region
3) All married women in the Somali Region in 2009 E.C.
4) The ages of all the students in your high school graduating class on the day you graduated, for a report to the newspaper on the characteristics of your graduating class
5) The first examination grades of all the students in your 6th-grade class
2) Statistical sample: a part or portion of the population under study.
For example:
6) The students in a particular classroom, as a sample of all the students attending that school
7) Some selected married women in the Somali region
By: Rediat T.(Assi.Prof.)
…Vocabulary of statistics
Statistical population: the collection of individuals, objects, or measurements under study. It is further classified as:
1) Target population (source population or reference population):
The population of interest about which we wish to draw conclusions at the end of the study, or the society in which we wish to solve a problem or fill a gap.
It is the population to which the investigators would like to generalize the results of the study.
2) Study population: the specific population group from which samples are drawn and data are collected.
Illustration:
1) Target population: all households/students in Jigjiga.
2) Study population: all households that live in Kebele 01 and 06 of Jigjiga.
It is helpful to further divide variables or data into different scales (levels) of measurement, because different statistical tests and methods depend on the nature and type of the variable.
To select statistical tests and methods for particular data, we need to know more about the properties or characteristics of the data or variable.
Variables are further classified into nominal, ordinal, and interval/ratio scales.
b) Ordinal data: Observations or values can be ranked, but arithmetic on them is still not meaningful. The variables deal with relative differences rather than quantitative differences, like ‘stronger, softer, weaker, better than’, etc.
Example:
Student letter grades: A, B, C, D, and F
Education level: Diploma, BSc, MSc, PhD
Military rank
c) Ratio scale: Values or observations that can be ordered and on which any arithmetic and relational operation can be done. It is characterized by the fact that equality of ratios as well as equality of intervals may be determined.
Example:
Age of a person or thing
Height
Weight, length, volume, rate, time, amount of rainfall, etc.
Stages in statistical investigation
Data collection, data organization, data presentation, data analysis, and interpretation
a. Data Collection: This is the first stage in any statistical investigation and involves the process of obtaining (gathering) a set of data.
b. Data Organization: This is the stage where we edit the data. After editing, we may classify (arrange) the data according to their common characteristics.
c. Data Presentation: The organized data can now be presented in a summarized and condensed form. Graphs and diagrams may also be used to give the data a vivid meaning and make the presentation attractive.
d. Data Analysis: This is the stage where we critically study the data to draw conclusions about the population parameter. The purpose of data analysis is to dig out information useful for decision making.
e. Data Interpretation: This is the stage where we draw valid conclusions from the results obtained through data analysis. The interpretation of data is a difficult task and requires a high degree of skill and experience.
Why study statistics?
Gender (Class)   Frequency   Percentage (%)
Male             30          (30/46) x 100 = 65.2%
Female           16          (16/46) x 100 = 34.8%
Total            46          100%
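The percentage column of a frequency table can be checked with a few lines of code. The sketch below is only an illustration: the function name `percentage_table` is hypothetical, and the raw observations are rebuilt from the counts in the table (30 male, 16 female).

```python
from collections import Counter


def percentage_table(observations):
    """Return {category: (frequency, percentage)} for categorical data."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {
        category: (freq, round(freq / total * 100, 1))
        for category, freq in counts.items()
    }


# Rebuild the raw data from the counts in the gender table (assumption:
# 30 male and 16 female respondents, 46 in total).
genders = ["Male"] * 30 + ["Female"] * 16
table = percentage_table(genders)

for category, (freq, pct) in table.items():
    print(f"{category:<8}{freq:>4}{pct:>8}%")
```

Each percentage is the class frequency divided by the total frequency, multiplied by 100, so the percentages of all classes add up to 100.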
total 21 76 71
1) In total, 10 children in the school are 4 years old or younger.
2) In total, 19 children are 8 years old or younger.
3) In total, 15 children in the school are 4 years old or older.
Interpretation of result
LCF for the first class is equal to 2, which means only 2 students from the sample spend less than 11.5 hours per week on their studies.
LCF for the second class is equal to 4, which indicates that in total 4 students spend less than 17.5 hours per week on their studies.
And so on…
MCF equal to 20 for the class with lower boundary 17.5 indicates that in total 20 students spend more than 17.5 hours per week on their studies.
Exercise
1) Compute the cumulative frequencies and cumulative percentages, and interpret the values, for the age distribution of teachers in some school.
2) What is the total number of teachers in the school whose age is less than or equal to 19 years?
3) What is the total number of teachers in the school whose age is 29 years or less?
To find the data in SPSS:
File > Open > Data > My Computer > Local Disk C > Program Files > SPSSInc > SPSS16 > Samples > Employee Data.sav > OK (…SPSS16)
Interpretation:
1) Among the total of 243 respondents, 61 (25.1%) were teachers and 182 were doctors.
2) Of the teacher respondents, 43 are male and the rest are female.
3) Of the male teacher respondents, 18% live in urban areas.
Data presentation:
Diagrammatic and Graphic
Most of these graphs and charts are seen frequently in newspapers, magazines, and various statistical reports.
[Pie chart: Circulatory system 42%, Neoplasms 30%, Respiratory system 13%, Others 8%, Digestive system 4%, Injury and poisoning 3%]
[Histogram: number of women (vertical axis, 0 to 40) by age group (horizontal axis, classes 14.5-19.5, 19.5-24.5, 24.5-29.5, 29.5-34.5, 34.5-39.5, 39.5-44.5, 44.5-49.5)]
Numeric Summary of data
Example:
Mean and median can’t be used for qualitative data
General formula
If $x_1, x_2, \dots, x_n$ are observed values, then
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Solution:
Age at marriage (x)   fi    fx
12                    8     96
13                    10    130
14                    16    224
15                    8     120
16                    5     80
17                    3     51
Total                 50    701
Mean = 701 / 50 = 14.02 years
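The mean of a frequency distribution is the sum of each value times its frequency, divided by the total frequency. The sketch below recomputes it from the age and frequency columns of the table above; the function name `frequency_mean` is hypothetical.

```python
def frequency_mean(values, frequencies):
    """Mean of a frequency distribution: sum(f * x) / sum(f)."""
    total_f = sum(frequencies)
    total_fx = sum(x * f for x, f in zip(values, frequencies))
    return total_fx / total_f


# Age at marriage (x) and frequency (f) columns from the solution table.
ages = [12, 13, 14, 15, 16, 17]
freqs = [8, 10, 16, 8, 5, 3]

# Each fx entry is the age multiplied by its frequency.
fx_column = [x * f for x, f in zip(ages, freqs)]
mean_age = frequency_mean(ages, freqs)

print("fx column:", fx_column)
print("sum of f:", sum(freqs), "sum of fx:", sum(fx_column))
print("mean age at marriage:", round(mean_age, 2))
```

Recomputing the fx column this way is a quick check that every row of a hand-built table really equals x times f.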
Properties of the arithmetic mean
• Can be used for both discrete and continuous data. However, it is
not appropriate for either nominal or ordinal data.
• For a given set of data there is one and only one arithmetic mean.
• It is easily understood and easy to compute.
Mode
[Bar chart illustrating the mode: vertical axis N, scaled 0 to 20]
T. Ancelle, D. Coulombie
Properties of mode
Can be used for nominal, ordinal, discrete and continuous data.
However, it is more appropriate for nominal and ordinal data.
It is not affected by extreme values
Often its value is not unique
The main drawback of mode is that often it does not exist
1) The amount of variation is small when the values are close together.
2) If it is large, the data values are far from each other.
3) If all the values are the same, there is no variation among the values.
Bottom line:
Knowing the average of a data set is not enough to describe or summarize the data set entirely.
To make a decision on a particular issue, in addition to being aware of the measures of central tendency (MCT), one must know how the data values are dispersed.
Why do we need measures of variation?
Case-2:
Consider the following scores of psychology and education students in a basic statistics course.
Psychology: 1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11
Education: 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8
1) We need to compare academic performance (for some pedagogic purpose) by computing the mean, median, and mode of each of the two data sets. As you can see from your results, the two data sets have the same mean, the same median, and the same mode, all equal to 6. The two data sets also happen to have the same number of observations, n = 12.
2) But the two data sets are different. What is the main difference between them?
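The claim in Case-2 can be verified directly. The sketch below uses Python's standard `statistics` module on the two score lists given above; the helper name `summarize` is hypothetical. It also reports the range and the sample standard deviation, which is where the two groups differ.

```python
import statistics


def summarize(scores):
    """Return central tendency and spread measures for a list of scores."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "range": max(scores) - min(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation (n - 1)
    }


psychology = [1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
education = [4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8]

psych_summary = summarize(psychology)
educ_summary = summarize(education)

for name, summary in (("Psychology", psych_summary), ("Education", educ_summary)):
    print(name, summary)
```

Both groups share the same mean, median, and mode, but the psychology scores are spread over a much wider range than the education scores. The measures of central tendency alone hide this difference.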
Case-3:
For some intervention you need to assess the age distribution of 6th-grade students in two schools.
It measures the average variation of the data values from the central value.
It is a measure of the variation (distance) of each observation from the mean.
It uses the information contained in all the observations in the data set or population. (The range contains information only on the distance between the largest and smallest observations.)
It is the value that describes how all of the scores in a distribution are dispersed or spread out about the mean.
Interpretation: If the variance is large, it indicates that, on average, the values lie far from the mean.
The sample variance of the set $x_1, x_2, \dots, x_n$ of n observations with mean $\bar{x}$ is
\[ S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
Illustration
A study was conducted to assess the academic performance of students in JJU. The time spent on study, in hours per week, was measured for 4 students.
The researcher needs to estimate the average variation in the hours students spend on study, based on the following sample data:
5 hr, 7 hr, 2 hr, 3 hr
\[ S = \sqrt{S^2} \]
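The illustration can be worked through step by step. The sketch below applies the sample variance formula (divisor n - 1) to the four study-hour values and then takes the square root; the function name `sample_variance` is hypothetical, and the result is cross-checked against the standard `statistics` module.

```python
import math
import statistics


def sample_variance(data):
    """Sample variance: sum of squared deviations from the mean over n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)


study_hours = [5, 7, 2, 3]  # hours per week for the 4 sampled students

mean_hours = sum(study_hours) / len(study_hours)
variance = sample_variance(study_hours)
std_dev = math.sqrt(variance)

print("mean:", mean_hours)
print("sample variance:", round(variance, 4))
print("sample standard deviation:", round(std_dev, 4))

# Cross-check against the standard library implementation.
assert math.isclose(variance, statistics.variance(study_hours))
```

The mean is 4.25 hours, the squared deviations sum to 14.75, and dividing by n - 1 = 3 gives a variance of about 4.92 and a standard deviation of about 2.22 hours per week.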
T-Test
Analysis of Variance(ANOVA)
Correlation Analysis
Regression Analysis
Chi-square test
Concept of hypothesis testing
A statistical hypothesis is an assumption or statement, which may or may not be true, concerning one or more population parameters or issues.
The assumption should be based on the researcher's experience in the area of expertise, observation, or related studies.
Collect data that will allow the researcher to test the hypothesis.
To test this scientifically/statistically, you should create two statements (Ho and H1). We call them the hypothesis statements.
Hypothesis statement:
Ho: some statement
H1: statement against Ho
Example:
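\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{\sum xy - [\sum x \sum y]/n}{\sqrt{[\sum x^2 - (\sum x)^2/n][\sum y^2 - (\sum y)^2/n]}} \]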
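Both forms of the correlation coefficient give the same value, which a short computation can confirm. The sketch below is an illustration only: the function names are hypothetical, and the study-hour and GPA values are made-up numbers, not the course data set.

```python
import math


def pearson_r_deviation(x, y):
    """Pearson r from deviations about the means (definition form)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_r_computational(x, y):
    """Pearson r from raw sums (computational form)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# Hypothetical data for illustration only (not the course data set).
study_hours = [2, 4, 6, 8, 10]
gpa = [2.0, 2.4, 3.0, 3.1, 3.6]

r_dev = pearson_r_deviation(study_hours, gpa)
r_comp = pearson_r_computational(study_hours, gpa)
print("r (deviation form):", round(r_dev, 4))
print("r (computational form):", round(r_comp, 4))
```

The sign of r gives the direction of the association and its absolute value gives the strength: values near +1 or -1 indicate a strong linear association, and values near 0 indicate a weak one.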
Case to Practice-1
Consider the following data on student score (GPA) and study hours. We are interested in assessing the strength and direction of the association between student score and study hours.
(The data is available on your PC.)
Linear Regression Analysis
Regression is a technique that can be used to investigate the effect of one or more predictor variables on an outcome variable.
It explains the variation in the dependent variable using the variation in the independent variables.
It measures the effect of the independent variables on the dependent variable.
R square is used to measure how well the model fits the data.
It measures the proportion of variability in the response variable explained by a given set of predictor variables.
In our case, the multiple R square (0.4167) implies that the included independent variables (gender, family income, and study hour) explain about 42% of the variation in students' academic performance.
The other 58% is explained by explanatory variables not included here.
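To make the meaning of R square concrete, the sketch below fits a simple linear regression with one predictor by least squares and computes R square as 1 minus the ratio of residual to total variation. This is a simplified illustration, not the course model: the model above has three predictors and was fitted in statistical software, while the function name and the data here are hypothetical.

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Total variation in y and the part left unexplained by the fitted line.
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r_squared = 1 - ss_residual / ss_total
    return intercept, slope, r_squared


# Hypothetical data for illustration only (not the course data set).
study_hours = [2, 4, 6, 8, 10]
gpa = [2.0, 2.4, 3.0, 3.1, 3.6]

intercept, slope, r_squared = simple_linear_regression(study_hours, gpa)
print("intercept:", round(intercept, 3))
print("slope:", round(slope, 3))
print("R square:", round(r_squared, 4))
```

R square is read as the share of the variation in the outcome that the fitted model accounts for, and the slope is read as the average change in the outcome for a one-unit increase in the predictor.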
The “ANOVA” table provides an F-test for the null hypothesis of overall parameter significance.
Ho: no variable significantly affects the dependent variable (all variables are not significant)
H1: at least one variable significantly affects the dependent variable
Decision: Reject Ho (since the p-value is small)
Conclusion: At least one explanatory variable is related to or affects the dependent variable (academic performance)
Coefficient Interpretation
Interpretation of coefficient:
1) Study hour (B1) = 0.179 implies that the academic performance of a student (GPA) increases by 0.179 points for each additional hour the student spends on study, holding the other variables constant.
That is, the more time students spend on their studies, the better they perform academically. (On average, each additional hour of study increases their AP by 0.179.)
Which individual variables affect the dependent variable?
\[ CV = \frac{S}{\bar{x}} \times 100 \]
\[ Z = \frac{x_i - \bar{x}}{sd} \]
Example
Two sections of 40 mathematics students were given an introduction to statistics examination. The following information was observed.
Value             Section 1   Section 2
Mean              78          90
Stan. Deviation   6           5
Student A from section 1 scored 90 and student B from section 2 scored 95. Relatively speaking, who performed better?
Interpretation
Student A performed better relative to his section, because the Z-score of student A, (90 - 78)/6 = 2, is larger than that of student B, (95 - 90)/5 = 1.
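The comparison can be reproduced with the standard score and coefficient of variation formulas. The sketch below uses the section means and standard deviations from the example; the function names `z_score` and `coefficient_of_variation` are hypothetical.

```python
def z_score(value, mean, sd):
    """Standard score: number of standard deviations a value lies from the mean."""
    return (value - mean) / sd


def coefficient_of_variation(sd, mean):
    """Coefficient of variation: standard deviation as a percentage of the mean."""
    return sd / mean * 100


# Section summaries and individual scores from the example.
z_a = z_score(90, mean=78, sd=6)  # student A, section 1
z_b = z_score(95, mean=90, sd=5)  # student B, section 2

cv_section1 = coefficient_of_variation(sd=6, mean=78)
cv_section2 = coefficient_of_variation(sd=5, mean=90)

print("Z-score of student A:", z_a)
print("Z-score of student B:", z_b)
print("CV of section 1 (%):", round(cv_section1, 2))
print("CV of section 2 (%):", round(cv_section2, 2))
```

Student A lies 2 standard deviations above the mean of section 1, while student B lies 1 standard deviation above the mean of section 2, so A's relative standing is higher even though B's raw score is larger. The CV values show that scores in section 1 are relatively more variable than in section 2.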
Exercise:
Standard score