
JIGJIGA UNIVERSITY

Statistics for
Sociologists; JJU-BA
in Sociology
Student(Evening
Program)

By:
REDIAT TAKELE
(MSc in Statistics)
Department of Statistics

E-Mail: redtakele@Gmail.com
Cell phone: (+251)-920824328
OVERVIEW ABOUT STATISTICS
1. Definitions of statistics
In the plural sense: Statistics are aggregates of numerically expressed facts (figures) on different issues.
• Facts/figures about households, students, or teachers (numeric information).
• Facts about the socio-economic, cultural, or social characteristics of households in rural, urban, or pastoral areas.
• Facts/figures about births, deaths, and marital status (marriage, divorce) in a society.
• Facts about education: e.g., 20% of household heads in pastoral areas have not completed grade 12.

In the singular sense: Statistics is the science that deals with the collection, organization,
presentation, analysis, and interpretation of data in order to make decisions and, consequently, to take action.

Statistics is the science of learning from data.

• Data are the backbone, input, or raw material of statistics. Without data, statistics is unthinkable.

 Statistical Data and Information??


Data in Statistics View
• Data is any value(it can be number, string, date, currency) which can
be obtained by measurement, observation or counting.
• Can be obtained from(source):
– Routine records,
– Literature
– Surveys(sample/census)
– Experiments
– Reports
– Observation… etc.
• Information, by contrast, is any processed/organized data: e.g., the number of
students in JJU and how many of them are female/male in each college/department,
the number of students in grades 8, 9, 10, ..., or the dropout rate of
students/employees over the last 10 years.
• Numeric data are obtained by measurement or by counting.
• Data are the backbone, raw material, and input for statistics.
Classifications of statistics:
-Descriptive Vs Inferential Statistics

1. Descriptive Statistics: The body of statistics that provides methods/techniques to
organize, present, and summarize a set of data, without drawing any conclusions or
inferences about the general target population, so as to understand/explore the data.
• Methods of organizing and summarizing data so as to understand the features of the data.
• Aim: To describe/explore/understand the data in order to see what
patterns/features/characteristics the data have, and so extract useful information.
• The methods under descriptive statistics are: tables, charts, graphs, and numeric
summary measures (mean, median, variance, standard deviation, and so on).

Illustration:
1) Organizing and presenting the age distribution, marital status, and gender of people in some area in tabular and graphic form.
2) Four people out of a hundred (4%) will have a heart attack over ten years if the current situation continues.

3) The average age at marriage of women in the rural part of Somali region is 15 years.

[Figure: pie chart, "Distribution of cause of death for females, in England and Wales, 1989". Labelled sectors: Circulatory system 42%, Respiratory system 13%, Others 8%, Digestive system 4%, Injury and poisoning 3%, Neoplasms.]
Illustration
Descriptive statistics help to organize, understand, and report large data sets, e.g., deaths
due to car accidents in different countries.

By: Rediat T.(Assi.Prof.)


2. Inferential Statistics:
• This is important especially when writing a scientific paper and when dealing with surveys.
• It is the science of drawing conclusions from a survey and generalizing them to the whole population, based on a random and
representative sample.
• The body of statistics that provides methods to infer, draw conclusions, or
report about the population under study, based on the information/results/findings
obtained from sample data.
• The methods under inferential statistics include: correlation analysis, regression analysis, t-test, analysis of
variance (ANOVA), chi-square test, ...
Inferential Statistics
Methods
• Generalizing the results of a sample to the whole group
(population) under study. These techniques are used to:
– Compare different populations
– Make predictions
– Assess the strength of the evidence
T-Test
Analysis of Variance(ANOVA)
Correlation Analysis
Regression Analysis
Logistic regression analysis
Non-parametric test
Illustrations!
Applying Inference for Comparison: Principle of inference by comparison
• One or more populations are compared based on common issues or a variable of interest.
• But it is difficult to reach every subject in the population, for different reasons.
• So, a sample of subjects from each population is taken and measured/collected.
• In the end, the finding from the sample comparison is used for the population comparison.
[Figure: Population-1 and Population-2 share a common issue/variable to compare; Sample-1 and Sample-2 are drawn from them and compared; by inference, the finding of the sample comparison works for the population comparison.]
Illustrations
1) It is scientifically possible to infer (or obtain findings) about households in a general
population (a region, a pastoral area, a school, etc.) by analyzing/investigating
a few households in some study area with respect to social and cultural
issues and demographic characteristics (marital status, income, expenditure,
gender, education, age, ...).
Vocabulary of statistics

1) Statistical population
• It is the collection of individuals/households/animals/plants, objects, or
measurements under study/consideration.
• It is the entire set of people or observations in which you are interested or which
is being studied.
• For example:
1) All students attending first-year classes this year
2) All people of Somali region
3) All married women in Somali region in 2009 E.C.
4) The ages of all the students in your high school graduating class on the day you graduated, for
a newspaper report on the characteristics of your graduating class
5) The first examination grades of all the students in your 6th-grade class
2) Statistical sample: a part or portion of the population under study. For example:
1) The students in a particular classroom, as a sample of all the students attending that
school
2) Some selected married women in Somali region
…Vocabulary of statistics
• Statistical population: the collection of individuals, objects, or
measurements under study. It is further classified as:
1) Target population (source population or reference population):
• The population of interest, about which we wish to draw
conclusions at the end of the study, or the society in which we wish to solve a problem or fill a gap.
• The population to which the investigators would like to generalize the results of the study.
2) Study population: The specific population group from which samples are drawn and data
are collected.

• Example-1: Suppose a study wishes to assess the socio-cultural characteristics of
households in Jigjiga woreda, and the researcher selects 150 households from Kebele 01 and 06.
1) Target population: all households in Jigjiga woreda

2) Study population: all households in Kebele 01 and 06 of Jigjiga

3) Sample: the 150 households selected from Kebele 01 and 06


Sample
(150 households from Kebele 01 & 06)

Study Population
(households in Kebele 01 & 06)

Target Population
(all households in Jigjiga woreda)



Some Illustrations
1) A sociologist wishes to assess the age at marriage of women in Somali region. She selects 500 married women from
Jigjiga and Godey and discovers that 45% of them were 15 years old or younger when they married. Identify:
a) Target population
b) Study (sampled) population
c) Sample
2) A researcher tries to estimate the number of households in Ethiopia under the poverty line. She selects 1000 households from Addis
Ababa and discovers that 900 people out of 2000 are under the poverty line. Identify:
a) Target population for this study
b) Study (sampled) population
c) Sample
3) In a study of the prevalence of HIV/AIDS among Jigjiga University students, 100 students were randomly selected
from the social science college and the prevalence was estimated. For this study, identify the target population, study
population, and sample.
4) In a study of the dropout rate of students in Jigjiga University in 2010 E.C., a random sample of 100 students was selected
from the College of Education and the dropout rate was estimated. For this study, identify:
a) Target population for this study
b) Study population
c) Sample
5) Suppose a study wishes to address challenges in the education of primary school students in Jigjiga woreda; for this
purpose, it selects 400 students from two primary schools. Identify:
a) Target population for this study
b) Study population
c) Sample
Identify Population & Sample
Sampling frame:
• It is the list of all elements/items in the population.
• The list of units/individuals/objects from which the sample is to be
selected. Illustration:
– If the study focuses on some issue of households in Somali region, the
sampling frame will be a list of households in Somali region.
– If the study is designed to investigate the academic performance of
Jigjiga University students, the sampling frame will be the list of all
students in Jigjiga University.
– If the study focuses on schools/districts/regions, the sampling
frame will be a list of schools/districts/regions, etc.
Sampling unit
The individual/item/element to be selected into the sample
in the sampling process.

Illustration:
• If the study focuses on some issue of households in Somali region, the sampling
unit will be the household.
• If the study is designed to investigate the academic performance of Jigjiga
University students, the sampling unit will be the student.
• If the study focuses on schools/districts/regions, the sampling unit will be…..
Illustration:
Define the sampling frame and sampling unit for the following:
1) A sociologist wishes to assess the variation in age at marriage of
couples (men and women) in Ethiopia. She selects 150 married couples
from Addis Ababa and discovers that most men were older than
the women when they married.
2) You want to study the marital status of households in Jigjiga
town.
3) A researcher wants to study the performance of first-year students in
Jigjiga University. For this purpose, he selects a sample of 120 first-year College
of Education students.
4) A study wishes to address challenges in the education of primary school students
in Jigjiga woreda; for this purpose, it selects 400 students from two primary schools.
5) An educational researcher tries to estimate the proportion of female teachers in Somali region
whose education level is a bachelor's degree or above. She selects 2000 female teachers
from Jigjiga and Godey and discovers that 10% of them have a BSc or above.



Nature & type of variables/data
• Variable: any attribute or characteristic that can assume different values (these can
be strings, numbers, dates, currency, ...).
• Examples: gender, marital status, attitude/perception toward something/someone,
education level, number of households/students, age of people, ...
1) Qualitative variables are non-numeric; their values are categories, not measured quantities.
• Examples: gender of students, marital status of people in Jigjiga, job satisfaction of
employees, place of birth, blood type, ...
2) Quantitative variables are numerical and are obtained by measuring or counting.
• Examples: number of children in a school, age of a school child, height, weight, etc.
Quantitative variables can be further classified as continuous or discrete,
depending on how the values are obtained (measuring or counting):
a. Continuous quantitative variables: usually obtained by measurement, not by counting.
These variables can take any value, including decimals, within a range. Examples:
time, height, weight, temperature, etc.
b. Discrete quantitative variables: obtained by counting. A discrete variable always takes
whole-number values.
Examples: number of students/households, number of traffic accidents, age in
completed years, etc.
Scale of Measurement

• It is helpful to further divide variables or data into different scales/levels, because different statistical tests
and methods require different types of variables.
• To select statistical tests and methods for particular data, we need to know more
about the properties of the data or variable.
• Variables are further classified into nominal, ordinal, and interval/ratio scales.

a) Nominal data or variable:
• Values or observations cannot naturally be ordered or ranked; they are only classified into
mutually exclusive categories.
• No arithmetic or ordering (greater than/less than) operations can be applied to them.
• Examples:
– Gender: male, female
– Eye color: brown, black, white
– Blood type: A, B, AB, O
– Place of birth: Jigjiga, D/bour, Gode, ...
– Marital status: single, married, divorced
– Attitude: negative, positive, low, ...
…Scale….Cont’d

b) Ordinal data: Observations or values can be ranked, but arithmetic is still not meaningful. These variables
express relative differences rather than quantitative differences, like 'stronger, softer,
weaker, better than', etc.

Examples:
• Students' grade results: A, B, C, D, and F
• Education rank: Diploma, BSc, MSc, PhD
• Military rank

c) Ratio scale: Values or observations can be ordered, and arithmetic and
relational operations can be carried out on them. It is characterized by the fact that equality of ratios as well as equality of
intervals can be determined.
Examples:
• Age of a person/thing
• Height
• Weight, length, volume, rate, time, amount of rainfall, etc.
Stages in statistical investigation
• Data collection, data organization, data presentation, data analysis, and interpretation

a. Data Collection: This is the first stage in any statistical investigation and involves the process
of obtaining (gathering) a set of data.

b. Data Organization: The stage where we edit our data. After editing, we may classify
(arrange) the data according to their common characteristics.

c. Data Presentation: The organized data can now be presented in the form of tables or diagrams in a
summarized and condensed manner. Graphs and diagrams may also be used to give the
data a vivid meaning and make the presentation attractive.

d. Data Analysis: This is the stage where we critically study the data to draw conclusions about
the population parameter. The purpose of data analysis is to dig out information useful for
decision making.

e. Data Interpretation: Drawing valid conclusions from the results obtained through data
analysis. The interpretation of data is a difficult task and requires a high degree of skill
and experience.
Why study statistics?

1. Data are everywhere, and they need to be managed.
2. Statistical techniques are used to make many decisions
that affect our lives.
3. No matter what your career, you will make
professional decisions that involve data. An
understanding of statistical methods will help you
make these decisions effectively.
Advantage/Rationale of sampling
• Reduced cost: Sampling reduces demands on resources such as
finance, personnel, and material.
• Greater accuracy: Sampling may lead to more accurate
data collection due to:
– Better trained personnel, and more careful supervision and processing
• Greater speed: Data can be collected and summarized
more quickly.
Sampling Methods
1) Random or probability sampling methods
• A method of sampling in which every element/unit in the population has a
known, non-zero chance/probability of being included in the sample.
• In the simplest case (simple random sampling), every individual of the study population has an equal chance of being included in
the sample.
• It can be categorized as: simple random sampling, systematic random
sampling, stratified random sampling, and cluster random sampling.
2) Non-random sampling methods: Units are not chosen at random, so their chance of being selected
is unknown, as in quota sampling, ...
A) Simple random sampling (SRS)
• It is the simplest and most basic probability sampling method.
• It is a method of selecting items from a population such that every
possible sample of a given size has an equal chance of being selected.
• Each unit in the sampling frame has an equal chance of being selected.
• However, SRS is costly and tedious to conduct when the
population under study is large.
Example: Having the list of households in Jigjiga
• It is applicable for relatively small populations.
Procedure in simple random sampling:
• Each unit on the list should be numbered/identified in sequence
from 1 to N (where N is the size of the population).
• Decide on the size of the sample.
• Select the required number of units, using either:
– the lottery method, or
– a table of random numbers.

"Lottery" method:
• Each unit in the population is represented by an identical slip/piece of paper (to make
sure each unit has an equal chance); the slips are put in a box and mixed, and slips
are drawn until the required sample size is reached.
• It is applied when the population is small.

Table of random numbers:
• The numbers are generated randomly by a computer program and do not
follow any sequence/order in any direction. To use such a table to select
a sample, the steps are:
i) Assign a code number from 1 to N to the units in the population.
ii) Randomly select a column of the table of random numbers and
randomly select a starting value, reading as many digits as N has.
iii) Go down the column in order until the required sample size (n) is reached.
iv) Reject numbers that come up more than once and numbers greater than N.
Illustration
Suppose we need to select a random sample of 10 students
from a College of Education with 528 students,
using a table of random numbers.
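The same illustration can be sketched in code, with a pseudo-random number generator standing in for the printed table. This is a minimal sketch, assuming the students are coded 1 to 528; the function name `simple_random_sample` and the seed value are illustrative choices, not part of the course material.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw `sample_size` distinct codes from 1..population_size (SRS without replacement)."""
    rng = random.Random(seed)  # fixing the seed only makes the draw repeatable
    # random.sample never repeats a unit, which mirrors rejecting duplicate numbers
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# Hypothetical run: 10 students out of 528 (seed chosen arbitrarily)
selected = simple_random_sample(528, 10, seed=2024)
print(selected)
```

Each printed code identifies one student on the numbered list; a different seed gives a different, equally valid sample.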
B) Systematic random sampling
• A complete list of all elements within the population (sampling frame)
is required.
• Let N = population size, n = sample size, and k = N/n (the sampling interval).
• The procedure starts by determining the first element to be included in
the sample.
• The first unit is taken at random from among the first k
units. That is, randomly choose a number j between 1 and k (1 ≤ j ≤ k);
the jth unit is the first unit included in the sample.
• After the jth unit, the (j + k)th unit, the (j + 2k)th unit, the (j
+ 3k)th unit, etc. are selected until the required sample size is reached.
Illustration
Suppose a sample of 10 students is to be selected
from the 100 students of a school using systematic
random sampling.
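A minimal sketch of this illustration, assuming the 100 students are listed in positions 1 to 100 and that N is an exact multiple of n; the function name `systematic_sample` and the seed are hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Return the list positions j, j+k, j+2k, ... of a systematic sample."""
    k = population_size // sample_size   # sampling interval k = N/n (assumes N is a multiple of n)
    rng = random.Random(seed)
    j = rng.randint(1, k)                # random start with 1 <= j <= k
    return [j + i * k for i in range(sample_size)]

positions = systematic_sample(100, 10, seed=1)
print(positions)
```

With N = 100 and n = 10 the interval is k = 10, so once the random start j is fixed, every 10th student after it is taken.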
C) Stratified random sampling
• The population is divided into non-overlapping but exhaustive
groups called strata.
• The grouping is based on a variable related to the study variable of interest.
• Elements in the same stratum should be more or less homogeneous, while
elements in different strata differ.
• If you assume that some units have more or less the same
characteristics on the main study variable, you can group them together as one stratum.
• A separate sample is then taken independently from each stratum by
simple random sampling.
• Some of the criteria for dividing a population into strata are:
age (under 18, 18 to 28, and 29 to 39); sex (male, female); occupation
(manager, professional, and other); college (statistics, maths, Bio, Chem,
Phy).
• Example: To study the academic performance of Social
Science and Natural Science students in a mathematics or statistics course.
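The example can be sketched as follows. The stratum sizes (300 and 200 students), the student ID format, and the use of proportional allocation are all assumptions made for illustration; the only requirement stated above is an independent simple random sample in each stratum.

```python
import random

def stratified_sample(strata, total_sample_size, seed=None):
    """SRS within each stratum, with sample sizes proportional to stratum size."""
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Proportional allocation is an assumption, not a rule from the slides
        n_h = round(total_sample_size * len(units) / population_size)
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical sampling frame: student IDs grouped by college (sizes invented)
strata = {
    "Social Science": [f"SS{i}" for i in range(1, 301)],
    "Natural Science": [f"NS{i}" for i in range(1, 201)],
}
sample = stratified_sample(strata, 50, seed=7)
print({name: len(ids) for name, ids in sample.items()})
```

Because every stratum contributes to the sample, both colleges are guaranteed to be represented.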
D) Cluster random sampling
• The reference population is divided into clusters. Units within a cluster are
heterogeneous, while the clusters themselves are more or less similar to each other with respect to the
variable of interest.
• These clusters are often geographic units (e.g., districts, villages, schools, etc.).
• A simple random sample of clusters is chosen, and
all the sampling units in the selected clusters are surveyed.
• Clusters are formed in such a way that elements within a cluster are
heterogeneous, i.e., observations in each cluster should be more or less
dissimilar.
Cluster Sampling vs Stratified Random Sampling
Example: How would these two sampling methods differ
in selecting students from all high schools in Jigjiga?
• Cluster random sampling: Some high schools would be
randomly selected from a list of all high schools in Jigjiga.
– All students from the selected high schools would be
included in the study.
• Stratified random sampling: A specific number of
students would be randomly selected from each high
school in Jigjiga.
– Unlike cluster sampling, this method ensures that
every high school in Jigjiga is represented in the study.
For example, to estimate the average annual
household income in a large city, we might use cluster
sampling. Simple random
sampling needs a complete list of households
in the city from which to sample, and
stratified random sampling would again
need the list of households. A less expensive
way is to let each block within the city represent
a cluster. A sample of clusters could then be
randomly selected, and every household within
these clusters could be interviewed to find the
average annual household income.
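A minimal sketch of the city-block example. The city layout (20 blocks of 15 households each) and the naming scheme are invented for illustration; only the procedure, a random draw of whole blocks followed by a full survey of each chosen block, comes from the text.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and return every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # SRS of cluster names
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical city: 20 blocks with 15 households each
blocks = {
    f"Block-{b:02d}": [f"B{b:02d}-H{h:02d}" for h in range(1, 16)]
    for b in range(1, 21)
}
chosen_blocks, households = cluster_sample(blocks, 4, seed=3)
print(chosen_blocks, len(households))
```

Only a list of blocks is needed as the sampling frame, which is why this design is cheaper than one that requires a list of every household.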
Sample size determination
• What parameters do you need to consider?
1) The variance or heterogeneity of the variable of interest in the population (p
or σ)
Obtained from:
a) Previous studies
b) Researcher expectations
c) A pilot study
2) The degree of acceptable/tolerable error (d)
• In any survey (sampling) there is error (sampling and non-
sampling error), which can come from respondents, natural variation, etc. We should take
this error into account by fixing a tolerable value for it, e.g., as a percentage.
3) Confidence level
Z at 95% confidence = 1.96
Z at 99% confidence = 2.58



General Sample size formula



Example
We wish to determine the required sample size,
with 95% confidence and 5% error tolerance,
for a study investigating academic
performance among the 5000 students of JJU. A
recent study reported 40% variability (p = 0.40)
in the academic performance of students in
higher education. What is the required sample
size?
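One common way to work this example is Cochran's formula for a proportion, n0 = Z²·p(1 − p)/d², followed by a finite population correction n = n0 / (1 + (n0 − 1)/N). Using that formula here is an assumption, since the lecture's own general formula may differ; the sketch also assumes the 40% is read as p = 0.40.

```python
import math

def cochran_sample_size(z, p, d, population_size=None):
    """Sample size for estimating a proportion (Cochran's formula, assumed here)."""
    n0 = (z ** 2) * p * (1 - p) / d ** 2            # sample size ignoring population size
    if population_size is None:
        return math.ceil(n0)
    n = n0 / (1 + (n0 - 1) / population_size)       # finite population correction
    return math.ceil(n)

n_infinite = cochran_sample_size(z=1.96, p=0.40, d=0.05)
n_finite = cochran_sample_size(z=1.96, p=0.40, d=0.05, population_size=5000)
print(n_infinite, n_finite)
```

Under these assumptions the calculation gives about 369 students before the correction and about 344 once the population of 5000 is taken into account.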



2. Data Organization and Presentation
Descriptive statistics:
• Techniques used to organize, summarize, and present a set of
data in a concise way.
• Goal: To describe and understand the data at hand,
so that useful information can be extracted from the data
and the next stage of analysis is prepared.
• Numbers/data that have not been summarized and organized are
called raw data.
Methods of descriptive statistics include:
• Tables: frequency distribution (simple and cumulative)
• Charts and graphs: pie chart, bar chart, histogram, scatterplot, line
graph
• Numeric summary tools:
- Measures of central tendency: mean (average), median, mode
- Measures of variability: range, variance, standard deviation, standard score
Frequency distribution
• It is the most common and simplest method of organizing any type of
data.
• Aim: To organize the data by assigning identical or similar values
to one class and other values to another class, and reporting the frequency
and percentage of each class, e.g., by age, income level, socio-cultural group, or education
level.
• It shows the distribution of values within a variable.
• It is a table that lists each of the possible values that the data/variable
can assume (class), along with the number of times each value
occurs (frequency) and the percentage for each value.
• It organizes data using three attributes: class, frequency, and
percent.
– Frequency refers to how often a score/value occurs in a set of data, i.e., the number of times each
value is repeated.
– Percentage refers to the share of observations that have a certain score or fall in a certain category.
Example-1:
Organizing data collected from students of JJU for some study:
M,M,F,M,M,F,F,F,M,M,M,F,F,F,M,M,M,F,F,M….M,F

Gender (class)   Frequency   Percentage (%)
Male             30          (30/46) × 100 = 65.2%
Female           16          (16/46) × 100 = 34.8%
Total            46          100%

% = (number of cases in the class / total number of cases) × 100
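A minimal sketch of how such a frequency table can be computed. The raw list in the example is abbreviated, so the data are rebuilt here from the table's counts (30 male, 16 female); the function name `frequency_table` is illustrative.

```python
from collections import Counter

def frequency_table(values):
    """Return {class: (frequency, percent)} for a list of categorical values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cls: (freq, round(100 * freq / total, 1)) for cls, freq in counts.items()}

# Rebuilt from the counts in the table, not from the original raw data
gender = ["M"] * 30 + ["F"] * 16
table = frequency_table(gender)
for cls, (freq, pct) in table.items():
    print(cls, freq, f"{pct}%")
```

The percentages are each class frequency divided by the total of 46 and multiplied by 100, so they add up to 100%.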
Example-2:
Organizing age distribution of household in rural area



Cumulative Frequency Distribution

• Sometimes we need cumulative values
rather than individual values.
• It measures the cumulative (accumulated) count rather than the
individual count.
• It is a table constructed from classes,
cumulative frequencies, and cumulative percents.
For example, it answers questions such as:
1) How many students in the class are aged 20 or
younger?
2) "How many individuals had a score of 60 or lower?"
Cumulative frequency
• Cumulative frequency: the number of observations
less than or equal to, or greater than or equal to, a specific value.
• More-than cumulative frequency (MCF): the total
number of values/observations that are greater than or
equal to the values of a specific class.
– It is calculated as the sum of the frequencies of all classes above the
specific class, plus the frequency of that class.
• Less-than cumulative frequency (LCF): the total
number of values/observations that are less than or equal
to the values of a specific class.
– It is calculated as the sum of the frequencies of all classes below the
specific class, plus the frequency of that class.
Example:
• The ages of children were recorded for assessing
some social issue in a rural area of Somali
region, as follows:
10,2,4,10,2,5,10,2,4,5,2,2,5,4,8,4,3,8,8,8,5,5
1) Construct a cumulative frequency distribution (using class, cumulative
frequency, and cumulative percent) and interpret the
results.
2) What is the total number of children in school whose age is less than or
equal to 4 years?
3) What is the total number of children in school whose age is greater than 8 years?
4) What percent of children in school are aged 5 years or
younger?
Cumulative frequency distribution!
Age of      Frequency         Less-than type (LCF)   More-than type (MCF)   %LCF     %MCF
children    (# of children)
2 years     5                 0+5=5                  5+1+4+5+4+2=21         23.8%    100%
3 years     1                 5+1=6                  1+4+5+4+2=16           28.6%    76.2%
4 years     4                 5+1+4=10               4+5+4+2=15             47.6%    71.4%
5 years     5                 5+1+4+5=15             5+4+2=11               71.4%    52.4%
8 years     4                 5+1+4+5+4=19           4+2=6                  90.5%    28.6%
10 years    2                 5+1+4+5+4+2=21         2+0=2                  100%     9.5%
Total       21

%LCF = (LCF / 21) × 100 and %MCF = (MCF / 21) × 100, where 21 is the total number of children.

1) In total, 10 children in school were aged 4 years or younger.
2) The total number of children aged 8 years or younger was 19.
3) The total number of children in school aged 4 years or older was 15.
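The cumulative columns can be reproduced with a short sketch. It starts from the frequencies listed in the table (age in years mapped to number of children) and divides each cumulative frequency by the total number of children to get the cumulative percent; the function name `cumulative_table` is illustrative.

```python
from itertools import accumulate

def cumulative_table(freq):
    """freq: {class value: frequency}. Returns rows (value, f, LCF, MCF, %LCF, %MCF)."""
    values = sorted(freq)
    counts = [freq[v] for v in values]
    n = sum(counts)
    lcf = list(accumulate(counts))                    # running total from the smallest class
    mcf = list(accumulate(reversed(counts)))[::-1]    # running total from the largest class
    return [
        (v, f, l, m, round(100 * l / n, 1), round(100 * m / n, 1))
        for v, f, l, m in zip(values, counts, lcf, mcf)
    ]

# Frequencies as listed in the table (age in years -> number of children)
ages = {2: 5, 3: 1, 4: 4, 5: 5, 8: 4, 10: 2}
rows = cumulative_table(ages)
for row in rows:
    print(row)
```

Reading the row for age 4, for instance, gives the number of children aged 4 or younger (LCF) and the number aged 4 or older (MCF).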
Interpretation of result
• LCF for the first class equal to 2 means that only 2
students in the sample spend less than 11.5 hours per week
studying.
• LCF for the second class equal to 4 indicates that in total 4
students spend less than 17.5
hours per week studying.
• And so on…
• An MCF equal to 20 indicates that in total
20 students spend more than 17.5
hours per week studying.
Exercise
1) Compute the cumulative frequencies and cumulative percents, and interpret the values, for the
age distribution of teachers in some school.
2) What is the total number of teachers in the school whose age is less than or equal to 19 years?
3) What is the total number of teachers in the school aged 29 years or younger?
To find the data in SPSS:
File → Open → Data → My Computer → Local Disk C → Program Files → SPSSInc → SPSS16 → Samples → Employee Data.sav → OK (…SPSS16)



Data organization:
Two or more variables

• Organizing data that involve two or more variables
helps us to see the relationship among them.
• Examples:
1) Organizing and examining the relationship between gender and
academic performance of students in some school.
2) The relationship between marital status and education
level of households.
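…Data organization: Two variables
[Table: cross-tabulation of gender by education level of teachers]
Row percentages (female teachers): certificate (18/216) × 100 = 8.3%, diploma (26/216) × 100 = 12.0%, BSc 63.4%
Column percentages (female share of each education level): certificate (18/60) × 100 = 30.0%, diploma 39.4%, BSc 50.4%
% = (number of cases in the cell / total number of cases in the row or column) × 100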
Interpretation
From the table:
1) Of the total 474 teachers, 258 are male
and 216 are female.
Gender view:
1) Of the 216 female teachers, 18 (8.3%) have a
certificate, 26 (12%) are diploma holders, ...
2) Of the 258 male teachers, 42 (16%) are
certificate holders, ...
Education status view:
1) Among the 60 certificate holders, 30% are
female and 70% are male.
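The two "views" correspond to row and column percentages of the same cell. A minimal sketch using only the counts quoted in this interpretation (216 female and 258 male teachers; 18 female and 42 male certificate holders); the function name is illustrative.

```python
def row_and_column_percent(cell, row_total, column_total):
    """Percent of a cell within its row and within its column, rounded to 1 decimal."""
    return round(100 * cell / row_total, 1), round(100 * cell / column_total, 1)

# Counts quoted in the interpretation
female_total, male_total = 216, 258
certificate = {"female": 18, "male": 42}
certificate_total = sum(certificate.values())   # all certificate holders

female_row_pct, female_col_pct = row_and_column_percent(
    certificate["female"], female_total, certificate_total)
male_row_pct, male_col_pct = row_and_column_percent(
    certificate["male"], male_total, certificate_total)
print(female_row_pct, female_col_pct)   # share among females, share among certificate holders
print(male_row_pct, male_col_pct)
```

The row percent answers "what share of female teachers hold a certificate?", while the column percent answers "what share of certificate holders are female?".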
Example of higher order table
Table: Distribution of professionals by sex and residence (row percentages in parentheses)

Profession   Sex      Urban        Rural         Total
Teacher      Male     8 (19%)      35 (81%)      43
             Female   2 (11%)      16 (89%)      18
Doctor       Male     46 (56%)     36 (44%)      82
             Female   23 (23%)     77 (77%)      100
Total                 79 (32.5%)   164 (67.5%)   243

• Interpretation:
1) Of the total 243 respondents, 61 (25.1%) were teachers and 182 were doctors.
2) Of the teacher respondents, 43 are male and the rest are female.
3) Of the male teacher respondents, 19% live in urban areas.
Data presentation:
Diagrammatic and Graphic

• Charts are used for presenting qualitative (categorical) data,
whereas graphs are used for presenting quantitative data.

• Importance of diagrammatic and graphic representation:
1. Diagrams are more attractive than mere figures.
2. They give a quick overall impression of the data.
3. They are easier to remember than mere figures.
4. They make comparison easy for the audience and the researcher.
5. They help to understand patterns or trends in the data easily and quickly.
Cont’d…
Commonly used types of charts and graphs in all fields of study are:
• Bar chart
– Simple bar chart
– Multiple bar chart
– Component bar chart
• Pie chart
• Histogram
• Line graph

• Most of these graphs and charts are seen frequently in newspapers,
magazines, and various statistical reports.



1. Simple bar chart:
• It is used when we have only one categorical variable.
– Example: presenting the sex of students or the marital
status of people.
• It is a chart in which each bar represents the whole
of a magnitude.
• The height or length of each bar indicates the size
(frequency) of the figure represented.
Practical Example
• Data were collected from JJU students to determine the gender
proportion of student enrolment in some academic year:
M,M,F,F,M,F,F,F,M....

The data can be organized with SPSS software as:

Let us present the data visually using a simple bar chart…


Simple bar chart will be



Modify/Edit bar chart
To edit the output of a bar chart, double-click
it, then
> right-click it, then
> select from the following options:
1) show label (to see the percentage of each
category)
2) add title (to write the title of the chart)
3) Try other options to see more effects on the chart
Interpretation
• The number of male respondents in the sample is
258 and that of females is 216.
• This implies that there are more male
students than female students.
• So, you may recommend that the concerned body
take some action to balance the numbers.



2. Multiple bar chart:
• It describes the distributional pattern of more than one
variable.
• It is used for comparing different variables at the
same time.
• Figures are shown as separate bars adjoining each other.
• The height of each bar represents the actual value of the
component figure.



Practical illustration
Data were collected from some organization for
educational research. Let us present these data for visual
analysis and comparison.



Multiple bar chart output will be



Interpretation
• Among the employees in the organization who hold a certificate,
18 are female and 42
are male.
• Among the diploma holders, 26 are
female and 40 are male.
• 137 of them are BSc graduates and the rest
are MSc holders.



3. Component (sub-divided) Bar Diagram

• Used when there is an interest in knowing how a
total (or aggregate) is divided into its component parts.
• Bars are sub-divided into the component parts of the
figure.
• Component diagrams are constructed when each total
is built up from two or more component figures.



Output will be



OR



Interpretation
Among female employees, 18 hold a
certificate, 26 hold a diploma,
137 are BSc graduates, and the rest
are MSc holders.



2. Pie chart

• Shows the frequency of each category by dividing a circle into sectors (parts)
• Used for presenting a single categorical variable
• The angle of each sector is proportional to the relative frequency (f/n)



Steps to construct a pie-chart

• Construct a frequency table

• Change the frequencies into percentages (P)

• Change the percentages into degrees, where: degrees = (P / 100) × 360°

• Draw a circle and divide it accordingly



Example: Distribution of deaths for females, in England and Wales, 1989.

Cause of death          No. of deaths
Circulatory system      100 000
Neoplasm                 70 000
Respiratory system       30 000
Injury and poisoning      6 000
Digestive system         10 000
Others                   20 000
Total                   236 000
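
As a rough sketch, the pie-chart construction steps can be applied to the table above in Python. The counts come from the table; the variable names are illustrative only.

```python
# Convert frequencies to percentages and then to pie-chart angles.
# Counts are taken from the table above; names are illustrative.
deaths = {
    "Circulatory system": 100_000,
    "Neoplasm": 70_000,
    "Respiratory system": 30_000,
    "Injury and poisoning": 6_000,
    "Digestive system": 10_000,
    "Others": 20_000,
}

total = sum(deaths.values())  # n = 236 000

# percentage = f / n * 100 ; degrees = percentage / 100 * 360
percentages = {cause: f / total * 100 for cause, f in deaths.items()}
degrees = {cause: p / 100 * 360 for cause, p in percentages.items()}

for cause in deaths:
    print(f"{cause:22s} {percentages[cause]:5.1f}%  {degrees[cause]:6.1f} degrees")
```

The angles add up to 360 degrees, which is a quick check that the circle has been divided completely.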



Distribution of cause of death for females, in England and Wales, 1989 (pie chart)

Circulatory system: 42%
Neoplasms: 30%
Respiratory system: 13%
Others: 8%
Digestive system: 4%
Injury and poisoning: 3%



Practical Exercise
Present each of the following variables using a pie chart, then edit the chart:
a) Employee categories
b) Minority classification in the employee data
3. Histogram
• A histogram is a graph of a frequency distribution with continuous class intervals.

• Given a set of numerical data, we can get an impression of the shape (distribution) of the data by constructing a histogram.

• Many statistical methods assume that the data are normally distributed; a histogram gives a quick visual check of this assumption.



Example: Distribution of the age of women at the time of marriage

Age group   15-19  20-24  25-29  30-34  35-39  40-44  45-49
Number         11     36     28     13      7      3      2
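
A histogram needs continuous class intervals, so the class limits in the table are first turned into class boundaries. The sketch below shows one way to do this, assuming ages are recorded in whole years so that 0.5 is subtracted from each lower limit and added to each upper limit. It also prints a simple text histogram; the names are illustrative.

```python
# Class limits and counts from the table above.
classes = [(15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44), (45, 49)]
counts = [11, 36, 28, 13, 7, 3, 2]

# Class boundaries close the gap between adjacent classes:
# lower limit - 0.5 and upper limit + 0.5 (assumes whole-year ages).
boundaries = [(lower - 0.5, upper + 0.5) for lower, upper in classes]

# Text histogram: one '#' per woman in the class.
for (lower, upper), f in zip(boundaries, counts):
    print(f"{lower:4.1f}-{upper:4.1f} | {'#' * f} ({f})")
```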
[Histogram: Age of women at the time of marriage. Horizontal axis: age group (class boundaries 14.5-19.5, 19.5-24.5, 24.5-29.5, 29.5-34.5, 34.5-39.5, 39.5-44.5, 44.5-49.5). Vertical axis: number of women.]
Numeric Summary of data



A. Measures of Central tendency(MCT)

• It is a measure that shows the central value of the data.

• It is a way of representing, describing or summarizing a whole set of data by a single value or a few numbers.
• MCT appear commonly on TV, in newspapers and in news magazines.
• Here are some examples:
1) The average monthly salary of instructors at JJU is 6000 birr.
2) One may read that the average distance covered by instructors in Jigjiga from home to campus is one kilometer.
3) The average income of people in Jigjiga is 300 birr.
4) To see whether first-year students spend more study hours in the library than third-year students, we can compare their averages.
…MCT ….Cont’d
• The commonly used measures of central tendency are the mean, the median and the mode.

• However, there are situations in which not all of them can be used.

Example:
• The mean cannot be used for qualitative data, and the median cannot be used for nominal data.

• For instance, "the average region of Ethiopia" or "the average marital status of the students in this class" has no meaning.

• The mode can be used for both quantitative and qualitative data.

• The mean should not be used for data that contain extreme values; in this case the median or mode can be used instead (visual inspection of the data is very important for this purpose).
1. Arithmetic mean/Average:
The sum of all observations divided by the total number of observations

General formula
If x_1, x_2, ..., x_n are the observed values, then

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Example-1: Suppose that a small class of 10 students (n = 10) has the following scores on an introduction to sociology exam:
78, 85, 92, 81, 90, 88, 71, 75, 81, 84
Compute the average score of the students.
Solution:
Average = (78 + 85 + 92 + 81 + 90 + 88 + 71 + 75 + 81 + 84) / 10
Average = 825 / 10
Average = 82.5
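
The same calculation can be checked with a few lines of Python. This is only a sketch of the general formula applied to the ten scores listed in the example; the variable names are illustrative.

```python
# Arithmetic mean: sum of the observations divided by their number.
scores = [78, 85, 92, 81, 90, 88, 71, 75, 81, 84]

n = len(scores)        # number of observations
total = sum(scores)    # sum of all observations
mean = total / n

print(f"n = {n}, sum = {total}, mean = {mean}")
```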

Example-2: The data below show the ages of 20 sampled students in some department. We need to know the average age of the students.
19 21 20 20 34 22 24 27 27 27 18 21
22 18 23 23 25 24



Example-3:
The data below show the distribution of women's age at marriage in some area in Ethiopia. Calculate the arithmetic mean (average age at marriage of women).

Age             13  14  15  16  17  18
No. of people    8  10  16   3   5   3

Solution:
For data given as a frequency table, the mean is $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$.

Age at marriage (x)    f     fx
13                     8    104
14                    10    140
15                    16    240
16                     3     48
17                     5     85
18                     3     54
Total                 45    671

Mean = 671 / 45 ≈ 14.9 years
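
For data given as a frequency table, each value is weighted by its frequency: the mean is the sum of f times x divided by the sum of f. The sketch below applies this to the ages and counts stated in the example (ages 13 to 18 with counts 8, 10, 16, 3, 5, 3); the variable names are illustrative.

```python
# Mean from a frequency table: sum(f * x) / sum(f).
ages = [13, 14, 15, 16, 17, 18]   # x: age at marriage
freqs = [8, 10, 16, 3, 5, 3]      # f: number of people at each age

n = sum(freqs)                                     # total number of people
sum_fx = sum(f * x for f, x in zip(freqs, ages))   # sum of f * x
mean_age = sum_fx / n

print(f"sum f = {n}, sum fx = {sum_fx}, mean = {mean_age:.2f}")
```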
Properties of the arithmetic mean
• Can be used for both discrete and continuous data. However, it is not appropriate for nominal or ordinal data.
• For a given set of data there is one and only one arithmetic mean.
• It is easily understood and easy to compute.

• The algebraic sum of the deviations of the given values from their arithmetic mean is always zero.
• It is greatly affected by extreme values.



2. The Median
• The median is defined as the middle observation of the ordered set of data.
• Aim: to describe the data by dividing them into two equal parts, instead of looking at each and every data value.
• Example: An article recently reported that the median net income of college professors in the US is 40,000 dollars.
• It can be computed as follows:

Step-1: Arrange the data in ascending or descending order.

Step-2: Select the formula based on whether the number of observations (n) is even or odd:
  If n is odd: Median = value of the ((n + 1)/2)th observation
  If n is even: Median = average of the values of the (n/2)th and (n/2 + 1)th observations
Example-1
For the following data, let us calculate the median:
a) 19 23 20 21 22 24 27 30 18 34 21
Solution:
Step-1: Arrange the data (in ascending or descending order)
18 19 20 21 21 22 23 24 27 30 34
Step-2: Select the formula based on the number of observations (n = 11; odd)
• Median = the value of the ((n + 1)/2)th observation
• Median = the value of the ((11 + 1)/2)th observation
• Median = the value of the 6th observation
• Median = 22



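Example-2
b) 20 19 26 18 21 22 25 17 27 34 (n = 10; even)
Solution:
Step-1: Arrange the data (in ascending or descending order)
17 18 19 20 21 22 25 26 27 34
Step-2: Select the formula based on the number of observations (n = 10; even)
• Median = average of the values of the (n/2)th and (n/2 + 1)th observations
• Median = average of the values of the 5th and 6th observations
• Median = (21 + 22) / 2
• Median = 21.5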
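
The two-step rule (sort, then pick the middle value or the average of the two middle values) can be sketched in Python as follows. The function name `median` is illustrative; Python's standard `statistics.median` does the same job.

```python
def median(values):
    """Median by the two-step rule: sort, then choose by odd/even n."""
    ordered = sorted(values)          # Step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th observation (index mid, counting from 0)
        return ordered[mid]
    # n even: average of the (n / 2)th and (n / 2 + 1)th observations
    return (ordered[mid - 1] + ordered[mid]) / 2


data_a = [19, 23, 20, 21, 22, 24, 27, 30, 18, 34, 21]   # n = 11 (odd)
data_b = [20, 19, 26, 18, 21, 22, 25, 17, 27, 34]       # n = 10 (even)

print(median(data_a), median(data_b))
```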



Properties of median
• Can be used for ordinal, discrete and continuous data. However, it
is not appropriate for nominal data.
• There is only one median for a given set of data
• The median is easy to calculate
• Median is a positional average and hence it is not drastically
affected by extreme values
• It is not a good representative of data if the number of items is
small



3. Modal value
• It is the score/value that occurs most frequently in a set of data.
• It may not be unique.
• It may or may not exist in a set of data.
• It can be used for qualitative and quantitative data.
Example: What is the modal value of each of the following?
a) 19 21 20 30 22 24 27 27 27 (modal value is 27)
b) 20 19 18 30 31 62 19 30 (modal values are 19 and 30)
c) 27 28 60 40 50 70 20 32 (no modal value)
d) The marital status of employees in a certain organization is given as:
Marital status   No. of employees
Single           20
Married          45   (modal value is married)
Divorced         15
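
The four cases above can be reproduced with a short Python sketch. The helper `modes` is a hypothetical name introduced here; it returns every value tied for the highest frequency, and an empty list when no value repeats, which matches the "no modal value" convention used in case c).

```python
from collections import Counter


def modes(values):
    """Return all modal values; an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1 and len(counts) > 1:
        return []  # every value occurs once: no mode
    return [value for value, count in counts.items() if count == highest]


a = [19, 21, 20, 30, 22, 24, 27, 27, 27]
b = [20, 19, 18, 30, 31, 62, 19, 30]
c = [27, 28, 60, 40, 50, 70, 20, 32]

# d) Qualitative data given as a frequency table: the mode is the
# category with the largest count.
marital_status = {"Single": 20, "Married": 45, "Divorced": 15}
modal_status = max(marital_status, key=marital_status.get)

print(modes(a), modes(b), modes(c), modal_status)
```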
Interpretation of the mode:
How can we interpret the meaning of the following?
1) It was found that the modal marital status of households in the Jigjiga area is married.
2) The modal age at marriage of women in the D/bour area is 15 years.
3) If the car accidents on the streets of a city were tabulated by hour of occurrence, it is likely that two modal periods would become evident: between 7 AM and 8 AM and between 5 PM and 6 PM, the hours when traffic to and from stores and offices is heaviest and when drivers are in the greatest hurry.



[Figure: frequency distribution with the modes marked; vertical axis: N (0 to 20). Source: T. Ancelle, D. Coulombie]
Properties of mode
 Can be used for nominal, ordinal, discrete and continuous data.
However, it is more appropriate for nominal and ordinal data.
 It is not affected by extreme values
 Often its value is not unique
 The main drawback of mode is that often it does not exist



B. Measures of Variability
• Statistical measures that show the extent to which data values are dispersed or spread out from the central value.
• They measure the average distance of the data values from the central value.
• They quantify the average variation or dispersion of a set of data around its central location.
Interpretation of the meaning:

1) The measure is small when the values are close together.
2) If it is large, the data values are far from each other.
3) If all the values are the same, there is no variation among the values.



Case-1: Why do we need measures of variation?
Illustration:
1) If your nature guide told you that the river has an average depth of 0.45 m, would you want to cross it on foot without additional information?
• If you then learned that the maximum depth of the river is 2.25 m and the minimum is 0.35 m, would you still agree to cross it?
• Some questions about the distribution of data call for additional statistical measures beyond the measures of central tendency (mean, median, mode).

Bottom line:
Knowing the average of a data set is not enough to describe or summarize the data set entirely. To make a decision on a particular issue, in addition to knowing the MCT, one must know how the data values are dispersed.
Why do we need measures of variation?
• Case-2:
Consider the following scores of psychology and education students in a basic statistics course:
Psychology: 1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11
Education: 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8
1) Suppose we need to compare academic performance (for some pedagogic purpose) by computing the mean, median and mode of each of the two data sets. The two data sets have the same mean, the same median and the same mode, all equal to 6. They also have the same number of observations (n = 12).
2) But the two data sets are different. What is the main difference between them?
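
A quick way to see the difference is to compute the three measures of central tendency together with the spread of each data set. The sketch below uses Python's standard `statistics` module and, as a first simple measure of spread, the distance between the largest and smallest score.

```python
import statistics

psychology = [1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
education = [4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8]

summary = {}
for name, scores in [("Psychology", psychology), ("Education", education)]:
    summary[name] = {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "spread": max(scores) - min(scores),  # largest minus smallest score
    }
    print(name, summary[name])
```

The central values agree, but the psychology scores are spread over a much wider interval than the education scores.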
Case-3:
1) For some intervention you need to assess the age distribution of 6th-grade students in two schools.
2) The following data show the scores of a group of students on two different tests: one is a reading exam and the other is an arithmetic exam.

• Can the MCT fully describe the difference in achievement between the students' scores? Why?
Common measures of variation
1. Range
2. Variance
3. Standard Deviation
4. Coefficient of Variation

1. Range (R):
• It is the simplest measure of variability.
• It considers only the two extreme values.
• It is obtained by subtracting the smallest value from the highest value.
• That is: R = H - L

Example: Measure the variability of the following data using the range:
20, 12, 14, 15, 10, 16, 9, 17, 18, 19, 14, 19, 7
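
As a sketch, the range of these 13 values follows directly from R = H - L; the variable names are illustrative.

```python
# Range: highest value minus lowest value.
data = [20, 12, 14, 15, 10, 16, 9, 17, 18, 19, 14, 19, 7]

highest = max(data)   # H
lowest = min(data)    # L
data_range = highest - lowest

print(f"H = {highest}, L = {lowest}, R = {data_range}")
```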
Variance (s²)

• It measures the average variation of the data from the central value.
• It is based on the squared distance of each observation from the mean.
• It uses the information contained in all the observations in the data set or population (the range contains information only on the distance between the largest and smallest observations).
• It describes how all of the scores in a distribution are dispersed or spread out about the mean.
Interpretation: A large variance indicates that, on average, the values lie far from the mean.

• The sample variance of the set x_1, x_2, ..., x_n of n observations with mean $\bar{x}$ is

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Illustration
A study was conducted to assess the academic performance of students in JJU. The time spent on study, in hours per week, was measured for 4 students.
The researcher needs to estimate the average variation in the students' study hours based on the following sample data:
5 hr, 7 hr, 2 hr, 3 hr



Solution:
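
One way to work through the calculation is sketched below, assuming the four observations are treated as a sample so that the divisor is n - 1. The variable names are illustrative, and the result is cross-checked against `statistics.variance` from the Python standard library.

```python
import statistics

hours = [5, 7, 2, 3]   # study hours per week for the 4 sampled students

n = len(hours)
mean = sum(hours) / n                                   # 17 / 4
squared_deviations = [(x - mean) ** 2 for x in hours]   # (x_i - mean)^2
variance = sum(squared_deviations) / (n - 1)            # sample variance, in hours squared
sd = variance ** 0.5                                    # standard deviation, in hours

print(f"mean = {mean}, variance = {variance:.3f}, SD = {sd:.3f}")

# Cross-check with the standard library (also uses the n - 1 divisor).
library_variance = statistics.variance(hours)
```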



Properties of Variance:
• The main disadvantage of the variance is that its unit is the square of the unit of the original data.

Example: Time spent on study is measured in hours, so the variance is in hours squared. We need to estimate the average variation in students' study hours, but the result is difficult to interpret using the variance.

• This drawback of the variance is overcome by the standard deviation.
Standard deviation (s)

• It is the square root of the variance.

• This produces a measure having the same scale as that of the individual values.

$$s = \sqrt{s^2}$$



Properties of SD

• Has the advantage of being expressed in the same units of measurement as the data.
• It is the best measure of dispersion for comparing the variation of two or more groups, provided the groups have the same unit of measurement.
Example: Comparing the library study hours of first-year and third-year students (both measured in hours).
Drawback:
• However, if the units of measurement of the two data sets are not the same, their variability cannot be compared using the SD.
Inferential Statistics
Methods
 Generalizing result of sample to the whole group
(population) under study
 Make comparisons
 Make predictions
 Assess the strength of the evidence

T-Test
Analysis of Variance(ANOVA)
Correlation Analysis
Regression Analysis
Chi-square test
Concept of Hypothesis testing
• A statistical hypothesis is an assumption or statement, which may or may not be true, concerning one or more population parameters.
• The assumption should be based on the researcher's experience in the area of expertise, on observation, or on related studies.
• The researcher collects data that allow the hypothesis to be tested.
• To test it statistically, you should create two statements (Ho & H1). We call these the hypothesis statements.
• Hypothesis statements:
Ho: some statement
H1: statement against Ho
• Example:

1) The average income of households in Jigjiga town is not more than 2,000 birr.

2) The average age at marriage of women in rural parts of Ethiopia is 13 years.
3) As a professional in the field, you might think that most households in Jigjiga are educated or have basic education.
…Steps in Hypothesis Testing
Step-1. State the hypothesis to be tested, called the null hypothesis (Ho), and choose an alternative hypothesis (HA), which is accepted if the null hypothesis is rejected.
Step-2. Choose a rule (decision value/line) for deciding when to reject the null hypothesis.
• The decision value (test statistic) is obtained from the observed sample data, based on the distribution the data follow.
• Collected data follow a standard distribution such as the normal distribution, t-distribution, F-distribution, Chi-square distribution...
• Each distribution has its own formula for the test statistic (Z-value, t-value, F-value…) and its own tabulated (critical) values, with which the test statistic is compared.
Step-3. Make the decision (either reject Ho or fail to reject, that is accept, Ho).
Step-4. State the conclusion about the statement.
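
The four steps can be sketched in Python for the household income example (Ho: the average income is not more than 2,000 birr). Everything numeric below is hypothetical and not from the slides: the sample size, the sample mean, the assumed known population standard deviation, and the 5% significance level. Under those assumptions a one-sided Z-test applies.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: hypotheses. Ho: mu <= 2000 birr, H1: mu > 2000 birr.
mu0 = 2000

# Hypothetical sample summary (NOT from the slides).
n = 64               # sample size
sample_mean = 2150   # sample average income in birr
sigma = 400          # population SD, assumed known for a Z-test

# Step 2: test statistic and tabulated (critical) value,
# assuming a one-sided test at the 5% significance level.
z = (sample_mean - mu0) / (sigma / sqrt(n))
z_critical = NormalDist().inv_cdf(0.95)

# Step 3: decision rule: reject Ho if the test statistic exceeds the critical value.
reject_h0 = z > z_critical

# Step 4: conclusion.
if reject_h0:
    print(f"Z = {z:.2f} > {z_critical:.3f}: reject Ho")
else:
    print(f"Z = {z:.2f} <= {z_critical:.3f}: fail to reject Ho")
```

With a small sample or an unknown population standard deviation, the t-distribution would be the usual choice instead of the normal distribution.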
Normal distribution:
Tabulated values & Critical value formula(decision value)

 Decision point formula for normal distribution of data:


Correlation Analysis
Correlation analysis is used for assessing the strength of association or relationship between two or more variables.
It measures, or quantifies, the direction and strength of the relationship between two or more quantitative variables.
• For example:
1) To see the association between the income and expenditure of a society.
2) To measure the strength of association between the GPA and study hours of students.
The correlation coefficient can be computed as:

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{\sum xy - (\sum x)(\sum y)/n}{\sqrt{\left[\sum x^2 - (\sum x)^2/n\right]\left[\sum y^2 - (\sum y)^2/n\right]}}$$
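
Both forms of the correlation formula (the deviation form and the computational shortcut form) give the same value. The sketch below implements each one with plain Python; the study-hour and GPA values are made up for illustration and are not the course data set.

```python
from math import sqrt


def pearson_r(x, y):
    """Correlation coefficient using deviations from the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pearson_r_shortcut(x, y):
    """Same coefficient using sums of x, y, xy, x^2 and y^2."""
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    sxx = sum(a * a for a in x) - sum(x) ** 2 / n
    syy = sum(b * b for b in y) - sum(y) ** 2 / n
    return sxy / sqrt(sxx * syy)


# Hypothetical data: daily study hours and GPA of six students.
study_hours = [2, 3, 4, 5, 6, 7]
gpa = [2.1, 2.4, 2.9, 3.0, 3.4, 3.7]

r = pearson_r(study_hours, gpa)
r_shortcut = pearson_r_shortcut(study_hours, gpa)
print(f"r = {r:.3f} (deviation form), r = {r_shortcut:.3f} (shortcut form)")
```

A value of r close to +1 indicates a strong positive association, a value close to -1 a strong negative association, and a value near 0 a weak linear association.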
Case to Practice-1
Consider the following data on student score (GPA) and study hours. We are interested in assessing the strength and direction of the association between student score and study hours.
(Data is available with your PC)
Linear Regression Analysis
• Regression is a technique that can be used to investigate the effect of one or more predictor variables on an outcome variable.
• It explains the variation in the dependent variable using the variation in the independent variables.
• It measures the effect of the independent variables on the dependent variable.

• The response (dependent) variable should be a continuous quantitative variable.
• For example:

1) To identify the factors that affect the academic performance of Jigjiga University students.

2) To identify the factors affecting the living standard of households in some selected rural areas of Jigjiga.
Case to Practical-1
A study was conducted to investigate the factors that affect the academic performance (GPA) of students: the case of Jigjiga University. The independent variables expected to affect GPA are: average study hours per day, gender and income.

Objective: To identify the factors that affect the academic performance of students in the study area.

Question of interest: What is the major factor that affects the academic performance of students in the study area?
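
Written out as a sketch, the model for this case could take the following form. The assignment of B2 to gender and B3 to income, the error term e, and the coding of gender as a 0/1 indicator are assumptions made here for illustration; only B1 for study hours is stated in the slides.

```latex
\[
\text{GPA} = B_0 + B_1\,(\text{study hours}) + B_2\,(\text{gender}) + B_3\,(\text{income}) + e
\]
```

Here B0 is the intercept and each remaining coefficient is the average change in GPA for a one-unit change in its variable, holding the other variables constant.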
…Cases to discuss
a) How do we interpret all the parameters (R, R2, B0, B1, B2, B3)?
b) What is the regression equation based on this data?
c) Test the significance of study hour, gender and income. What is your conclusion?
d) Predict the GPA of a student who studies 8 hours per day on average.
e) How does the GPA of a student change with gender (male or female)?
f) Which factor is the major determining factor that affects the academic performance of students in JJU?
Measuring how well the model fits the data

• R Square is used to measure how well the model fits the data.
• It measures the proportion of variability in the response variable explained by a given set of predictor variables.
• In our case, Multiple R Square (0.4167) implies that the included independent variables (gender, family income and study hour) explain about 42% of the variation in the academic performance of students.
• The other 58% is due to other explanatory variables not included here.
• The “ANOVA” table provides an F-test for the null hypothesis of overall parameter significance.
Ho: B1 = B2 = B3 = 0 (no variable is significant)
H1: at least one variable significantly affects the dependent variable
Decision: Reject Ho (since the p-value is small)
Conclusion: At least one explanatory variable is related to the dependent variable (academic performance).
Coefficient Interpretation

Interpretation of coefficient:
1) Study hour (B1) = 0.179 implies that the academic performance of a student (GPA) increases by 0.179 points, on average, for each additional hour the student spends on study, holding the other variables constant.
That is, the more time students spend on their study, the better they perform academically.
Which individual variables affect the dependent variable?

• The significance of each individual parameter (independent variable) is tested using a t-test.
That is:
Ho: Bi = 0 (the ith independent variable has no effect on the dependent variable)
• That is, in our case:
1) Ho: Study hours have no effect on the GPA of students
2) Ho: Family income does not affect the GPA of students
3) Ho: Gender does not affect the GPA of students
…From table-3: Cont’d
1) Study hours significantly affect the academic performance of students (GPA) (p-value < 0.05).
2) Gender (being male or female) significantly affects the GPA of students (p-value (0.034) < 0.05).
3) Family income does not significantly affect academic performance (p-value (0.15) > 0.05).
Which factor is major?
The variable that passes the significance test and has the largest coefficient in absolute value affects the academic performance of students more than the other variables.
Conclusion:
Based on this study's data, family income does not significantly affect GPA. However, study hour and gender were found to significantly affect the academic performance of students.
Gender has a larger effect on the academic performance of students than study hour.
Thank you!!



Coefficient of variation (CV)

• When two data sets have different units of measurement, the CV should be used as the measure of dispersion.
• It is the best measure for comparing the variability of two or more groups of observations.
• Data with a smaller coefficient of variation are considered more consistent and less variable.
CV is the ratio of the SD to the mean, multiplied by 100:

$$CV = \frac{S}{\bar{x}} \times 100$$

Example: Two sections of mathematics students were given an introduction to statistics examination. The following information was observed.

Value             Section 1   Section 2
Mean              78          90
Stan. Deviation   6           5

Compare the variability of the scores of students in the two sections.
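
The comparison can be sketched in Python by applying the CV formula to the mean and standard deviation of each section. The function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """CV = (SD / mean) * 100, expressed as a percentage."""
    return sd / mean * 100


cv_section1 = coefficient_of_variation(sd=6, mean=78)
cv_section2 = coefficient_of_variation(sd=5, mean=90)

print(f"Section 1: CV = {cv_section1:.2f}%")
print(f"Section 2: CV = {cv_section2:.2f}%")

# The section with the smaller CV has the more consistent scores.
more_consistent = "Section 1" if cv_section1 < cv_section2 else "Section 2"
print("More consistent:", more_consistent)
```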


Standard Scores (Z-scores)
• They are used to compare the relative standing of two individual observations coming from different groups.
• If x_i is a measurement from a distribution with mean $\bar{x}$ and standard deviation s, then its value in standard units is given as:

$$Z = \frac{x_i - \bar{x}}{s}$$
Example
Two sections of 40 mathematics students were given an introduction to statistics examination. The following information was observed.

Value             Section 1   Section 2
Mean              78          90
Stan. Deviation   6           5

Student A from section 1 scored 90 and student B from section 2 scored 95. Relatively speaking, who performed better?
Interpretation
Z-score of student A = (90 - 78) / 6 = 2.0 and Z-score of student B = (95 - 90) / 5 = 1.0.
Student A performed better relative to his section because the Z-score of student A is larger than that of student B.
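
The two standard scores can be verified with a short Python sketch that applies Z = (x - mean) / SD to each student; the function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Standard score: how many SDs the value x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_a = z_score(90, mean=78, sd=6)   # student A, section 1
z_b = z_score(95, mean=90, sd=5)   # student B, section 2

print(f"Z(A) = {z_a}, Z(B) = {z_b}")
print("Better relative performance:", "A" if z_a > z_b else "B")
```

Student B has the higher raw score, but student A lies further above the mean of his own section.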

Exercise:
Standard score
