Practical 592: MA Sociology, SPSS, Fourth Semester

Tribhuvan University

Department of Sociology
Saraswoti Multiple Campus

So592: Survey Research and Computer Data Analysis with SPSS
Lecturer: Janak Paudel
Scale of Measurement
Scale of      Ordering   Difference   Ratio   Variable type
measurement   relation
Nominal       No         No           No      Qualitative variables
Ordinal       Yes        No           No      Qualitative variables
Interval      Yes        Yes          No      Quantitative variables
Ratio         Yes        Yes          Yes     Quantitative variables
Example of Four Types of Variables

Student   Gender (nominal)    Social status (ordinal)    Score in I.Q. test (interval)   Age (ratio)
          Male=1, Female=2    High=1, Medium=2, Low=3
A         1                   1                          30                              20
B         2                   2                          40                              26
C         1                   3                          60                              22
Analysis of Nominal Scale
• Univariate data analysis: analysis of one variable (one-way frequency table)
• Bivariate data analysis: analysis of two variables (two-way frequency table, chi-square test)
• Level of significance

One-Way Frequency Table

• Analyze → Descriptive Statistics → Frequencies → select Ecological region (or any one of the variables) → OK
Two-Way Frequency Table
• Analyze → Descriptive Statistics → Crosstabs → select Poverty and Education (or any two categorical variables) → OK
For the chi-square test, click Statistics and tick (√) Chi-square after you insert the variables, then OK. The result is reported as Pearson Chi-Square in the output.
Formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$
where χ² = chi-square, O = observed frequency, E = expected frequency
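The formula can be checked by hand with a short Python sketch. The 2×2 counts below are hypothetical, and the expected frequency of each cell is computed with the usual rule E = row total × column total / grand total; degrees of freedom are (rows - 1) × (columns - 1).

```python
def chi_square(observed):
    """Return (chi2, df) for a two-way table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            chi2 += (o - e) ** 2 / e                         # (O - E)^2 / E
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical table: rows = poor / not poor, columns = literate / illiterate
table = [[20, 30],
         [40, 10]]
chi2, df = chi_square(table)
print(round(chi2, 2), df)
```

Here every expected count is 30 or 20, so the four cell contributions are 3.33, 5, 3.33 and 5.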
Go to New File
Analysis of Ordinal Scale
• Univariate data analysis: analysis of one variable (one-way frequency table)
• Bivariate data analysis: analysis of two variables (Spearman's rho, chi-square test of independence)
Ordinal Scale Data-Frequency
(K4 in 04538-0011.sav CVFS)
Spearman’s Rank Order
Spearman’s rho: $\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$
where ρ (rho) denotes the coefficient of rank correlation, D denotes the difference between each pair of ranks, and N stands for the number of pairs (observations).
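A minimal Python sketch of the rank-difference formula, assuming there are no tied values (the simple formula is exact only without ties). The two judges' scores are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to N; assumes all values are distinct."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))  # sum of squared rank differences
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical scores given by two judges to five respondents
judge_a = [10, 20, 30, 40, 50]
judge_b = [12, 25, 22, 48, 41]
rho = spearman_rho(judge_a, judge_b)
print(rho)
```

The ranks differ by one place for four respondents, so ∑D² = 4 and ρ = 1 - 24/120 = 0.8.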
Spearman’s rho (rank in ordinal)
Analysis of Interval and Ratio-scale Data

• Univariate data analysis: analysis of one variable (mean, median and mode; range and standard deviation; one-sample z and t tests)
• Bivariate data analysis: two-way frequency table; two-sample z, t and F tests; scatter diagram, correlation and correlation coefficient; simple linear regression and binary logistic regression
Univariate Data Analysis
Mean, median, mode and standard deviation
Descriptive Statistics
Variable (pcinc)
(mean, median, mode, std. deviation), file: INCOME
Univariate statistics (pcinc)
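The same four statistics can be reproduced with Python's standard `statistics` module. The income values below are hypothetical, not the course data. `statistics.stdev` divides by n - 1 (the sample standard deviation), which is what SPSS reports as Std. Deviation.

```python
import statistics

# Hypothetical per capita income values (not the course data file)
pcinc = [1200, 1500, 1500, 1800, 2500, 3000]

mean = statistics.mean(pcinc)
median = statistics.median(pcinc)
mode = statistics.mode(pcinc)
sd = statistics.stdev(pcinc)  # sample SD, divisor n - 1

print("Mean:", round(mean, 2))
print("Median:", median)
print("Mode:", mode)
print("Std. deviation:", round(sd, 2))
```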
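Difference between z and t
• z-test: for large samples (or when the population standard deviation is known)
• t-test: for small samples
• There is no difference in meaning and interpretation. As the sample size grows, the t distribution approaches the normal (z) distribution, and since SPSS has made handling large-scale data easier, we now use the t-test for both large and small samples.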
Descriptive Statistics

• Analyze
One-Sample t-Test
• Analyze → Compare Means → One-Sample T Test
  Dependent (test) variable
  Test value
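A sketch of what the one-sample t statistic measures: the distance between the sample mean and the test value, in units of the standard error. The data and test value are hypothetical. The p value (Sig. in SPSS) needs the t distribution with n - 1 degrees of freedom, which the Python standard library does not provide, so this sketch stops at t and df.

```python
import math
import statistics

def one_sample_t(sample, test_value):
    """t = (mean - test value) / (s / sqrt(n)), with df = n - 1."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(sample) - test_value) / se, n - 1

# Hypothetical sample and test value
sample = [12, 15, 11, 14, 13, 16, 12, 15]
t, df = one_sample_t(sample, test_value=12)
print(round(t, 3), df)
```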
Compare Means
• Analyze → Compare Means →
  One-Sample T Test
  Independent-Samples T Test
  Paired-Samples T Test
  One-Way ANOVA
One-sample test (pcinc in income &
Independent-samples test (age in New one.sav)
Paired-samples test (E15 & E15_BLT2 in 04538-0011 CVFS.sav)
Same sample at different points in time (1996 and 2008)
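Because the paired test uses the same cases at two points in time, it reduces to a one-sample t-test of the differences against 0. A minimal sketch with hypothetical scores for five households:

```python
import math
import statistics

def paired_t(before, after):
    """Paired-samples t: one-sample t of the differences (after - before) against 0."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

# Hypothetical scores of the same five households in two survey years
year_1996 = [10, 12, 9, 14, 11]
year_2008 = [13, 14, 10, 17, 12]
t, df = paired_t(year_1996, year_2008)
print(round(t, 3), df)
```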

One variable categorical and another continuous

(E15 & A1_recode)
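When one variable is a two-group categorical variable and the other is continuous, the independent-samples t-test compares the two group means. The sketch below uses the pooled-variance form (the "equal variances assumed" version) and hypothetical ages for two groups; it is an illustration, not a reproduction of the course output.

```python
import math
import statistics

def independent_t(group1, group2):
    """Independent-samples t with pooled variance; df = n1 + n2 - 2."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(group1) - statistics.mean(group2)) / se, n1 + n2 - 2

# Hypothetical ages for two groups (e.g. male and female respondents)
male_age = [20, 22, 24, 26]
female_age = [21, 25, 27, 31]
t, df = independent_t(male_age, female_age)
print(round(t, 3), df)
```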
Bivariate Analysis (two variables)

• Correlation: Karl Pearson's
• Analyze → Correlate → Bivariate
  Variables (pcinc & pcexp)
• Two continuous (quantitative) variables
• Formula
$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$

where x is the deviation of each value of the first variable from its mean (x = X - X̄), and y is the deviation of each value of the second variable from its mean (y = Y - Ȳ).
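The deviation formula for r translates directly into a few lines of Python. The income and expenditure figures below are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), where x and y are deviations from the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    x = [v - mean_x for v in xs]  # deviations of the first variable
    y = [v - mean_y for v in ys]  # deviations of the second variable
    sum_xy = sum(a * b for a, b in zip(x, y))
    return sum_xy / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

# Hypothetical per capita income and expenditure
pcinc = [10, 20, 30, 40, 50]
pcexp = [8, 15, 21, 33, 38]
r = pearson_r(pcinc, pcexp)
print(round(r, 4))
```

Here ∑xy = 780, ∑x² = 1000 and ∑y² = 618, giving a strong positive correlation.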
Bivariate Correlation (per capita income and per capita expenditure)
Scatter Plot
• Graphs → Legacy Dialogs → Scatter/Dot
  Variables (X axis and Y axis)
  (pcinc on the X axis and pcexp on the Y axis)
Scatter Plot
Association between two categorical variables
(H7 & Gender in 04538-0011.sav)
Multivariate Analysis
• Linear regression (OLS)
• Logistic regression (binary logistic regression)
Linear Regression (Ordinary Least Squares)
• It is used when the variables are continuous, e.g. income, birth rate.
• The dependent variable must be continuous.
• Regression is used to predict the value of the dependent variable (Y), also called the “effect”, from a change in the value of the independent variable (X), also called the “cause”.
• In simple regression we examine the effect of one independent variable (X), also called the cause, on the dependent variable (Y).
• In multiple regression we examine the effect of two or more independent variables, also called causes (Xs), on the dependent variable (Y).
• The assumption is that there is a linear relationship between the independent and dependent variables.

The regression equation is
$Y = \alpha + \beta X + \mu$
where Y = dependent variable (effect)
α = intercept (constant)
β = effect of X on Y (slope)
X = independent variable (cause)
μ = error term
This is the simple linear regression model.
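A minimal sketch of how OLS estimates α and β for one independent variable, using hypothetical data (years of schooling and interview minutes). With deviations x and y as in the correlation formula, β = ∑xy/∑x² and α = Ȳ - βX̄. The sketch also computes R² = 1 - SS(residual)/SS(total), the share of variance in Y explained by X, which SPSS reports as R Square. SPSS fits multiple regression with several Xs at once; this sketch handles a single X only.

```python
def simple_ols(xs, ys):
    """Return (alpha, beta, r_squared) for Y = alpha + beta * X + error."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx                  # slope: change in Y per unit change in X
    alpha = mean_y - beta * mean_x    # intercept: predicted Y when X = 0
    fitted = [alpha + beta * x for x in xs]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in ys)             # total variation
    return alpha, beta, 1 - ss_res / ss_tot

# Hypothetical data: years of schooling (X) and interview length in tens of minutes (Y)
schooling = [1, 2, 3, 4, 5]
minutes = [2, 4, 5, 4, 5]
alpha, beta, r2 = simple_ols(schooling, minutes)
print(round(alpha, 2), round(beta, 2), round(r2, 2))
```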
Multiple Regression (Linear regression)
• Analyze → Regression → Linear
  Dependent variable (minutes for interview)
  Independent variables (education, age and gender)
  Statistics → (√) Model fit
Output (Result)

1. The first table shows the variables entered and the method (Enter).
2. The second table, Model Summary, shows R Square = 0.347, which means that about 35% (34.7%) of the variance in the dependent variable is explained by the independent variables (education, age and gender).
3. The third table, ANOVA, shows that the fitted model is significant (the F value is significant, p < 0.001).
4. The fourth table, Coefficients, shows the effects of the independent variables (age, education and gender) on the dependent variable (length of interview).
Logistic (binary) regression

• Use logistic (binary) regression when the dependent variable is categorical and measured in binary form, ‘Yes’ and ‘No’ (dummy coded ‘0’ and ‘1’). The independent variables may be categorical or continuous.
• Take one group as the reference (usually the first category, e.g. 0) and compare the other group with it.
Equation of binary logistic regression

Logistic Regression
Equation takes the form:

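$\text{Logit}(Y) = \ln\left[\frac{P_i}{1 - P_i}\right] = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$

$\text{Logit}(Y) = \ln(\text{Odds}) = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$

Logit(Y) = predicted value of the dependent variable (its log odds)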

ln = Natural log
Pi =Probability of experiencing an event
(1-Pi) =Probability of not experiencing an event
Pi/(1-Pi) =Odds of experiencing an event

β0 =Intercept
β1 =Regression coefficient (Logit coefficient)
Xn =Independent (or explanatory) variable
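Since the model predicts the logit, the predicted probability is recovered by inverting it: Pi = 1 / (1 + e^-(β0 + β1X1 + … + βnXn)). The sketch below uses hypothetical coefficients (β0 = -1.0, β1 = 0.5) and also shows that a one-unit increase in X multiplies the odds by e^β1, which is why the exponentiated coefficient is read as an odds ratio.

```python
import math

def predicted_probability(b0, coefficients, xs):
    """Invert the logit: P = 1 / (1 + exp(-(b0 + b1*x1 + ... + bn*xn)))."""
    logit = b0 + sum(b * x for b, x in zip(coefficients, xs))
    return 1 / (1 + math.exp(-logit))

def odds(p):
    return p / (1 - p)

# Hypothetical coefficients: b0 = -1.0, b1 = 0.5
p0 = predicted_probability(-1.0, [0.5], [0])  # logit = -1.0
p1 = predicted_probability(-1.0, [0.5], [1])  # logit = -0.5
p2 = predicted_probability(-1.0, [0.5], [2])  # logit = 0.0, so P = 0.5

odds_ratio = odds(p1) / odds(p0)  # equals exp(b1) = exp(0.5)
print(round(p0, 4), round(p2, 4), round(odds_ratio, 4))
```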

Intercept = value of the dependent variable (logged odds of Y) when X = 0

Slope or logit coefficient = the amount of change in Y (logged odds of Y) for each unit change in X

Its exponentiated value, Exp(B) in the SPSS output, is interpreted as an odds ratio

X = an independent variable (the predicted value of Y depends on the value of X)


Problem = the dependent variable (a probability) is bounded between 0 and 1, while β0 + β1X1 + … + βnXn can take any value

Solution= Transform the dependent variable

How to Transform?

1. Transform the probability into “odds”

If Pi is the probability of an event occurring,

the odds = Pi/(1-Pi)

If Pi tends to 1, Pi/(1-Pi) tends to +∞ (upper limit)

If Pi tends to 0, Pi/(1-Pi) tends to 0 (lower limit)
2. Transform the “odds” into the log odds or “logit”

Log (of the) odds or “logit” = ln(Pi/(1-Pi))

As Pi tends to 1, ln(Pi/(1-Pi)) tends to +∞

As Pi tends to 0, ln(Pi/(1-Pi)) tends to -∞

Probabilities, Odds, and Log Odds (Logit)

Probability   Odds          Logit = log odds
(Pi)          Pi/(1-Pi)     ln(Pi/(1-Pi))
0.999         999.0         +6.9
0.5           1.0           0
0.001         0.001001      -6.9
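The three rows of the table can be reproduced in a few lines of Python, which also shows the symmetry of the logit around 0 (probability 0.5):

```python
import math

rows = []
for p in (0.999, 0.5, 0.001):
    odds = p / (1 - p)      # step 1: probability -> odds
    logit = math.log(odds)  # step 2: odds -> natural log of the odds
    rows.append((p, odds, logit))
    print(f"{p:<8} {odds:<12.6f} {logit:+.1f}")
```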
The odds ratio ranges from 0 to +∞

More than 1 = increased (higher) odds
Less than 1 = decreased (lower) odds
Equal to 1 = same odds (no difference)

The logit coefficient ranges from -∞ to +∞

Negative (-) = decreased logged odds
Positive (+) = increased logged odds
Process of binary logistic regression

• Analyze → Regression → Binary Logistic
  Dependent variable (response)
  Covariates (age, gender)
  Categorical → move the categorical covariates from Covariates to Categorical Covariates
  Contrast: Indicator; Reference category: Last or First → Change
Binary logistic regression
(salaried job, uch, gender and
Omnibus Tests of Model Coefficients

                 Chi-square   df   Sig.
Step 1   Step    1487.077     2    .000
         Block   1487.077     2    .000
         Model   1487.077     2    .000

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      4653.326a           .246                   .357
a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.
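The two pseudo R² values can be recovered from the -2 log likelihood and the omnibus chi-square. The chi-square equals the drop in -2LL from the constant-only model, so -2LL(null) = 4653.326 + 1487.077. The sketch assumes the number of cases analysed is the total of the classification table (2958 + 894 + 444 + 975 = 5271); with that assumption it reproduces .246 and .357.

```python
import math

# Values from the Omnibus Tests and Model Summary tables
model_chi_square = 1487.077  # reduction in -2LL compared with the constant-only model
neg2ll_model = 4653.326      # -2 Log likelihood of the fitted model
n = 2958 + 894 + 444 + 975   # ASSUMPTION: cases analysed = classification table total

neg2ll_null = neg2ll_model + model_chi_square  # -2LL of the constant-only model

# Cox & Snell R^2 = 1 - (L0 / L1)^(2/n) = 1 - exp(-chi_square / n)
cox_snell = 1 - math.exp(-model_chi_square / n)
# Nagelkerke divides Cox & Snell by its maximum possible value, 1 - L0^(2/n)
max_cox_snell = 1 - math.exp(-neg2ll_null / n)
nagelkerke = cox_snell / max_cox_snell

print(round(cox_snell, 3), round(nagelkerke, 3))
```

Nagelkerke's rescaling is why its value is always at least as large as Cox & Snell's and can reach 1.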
Variables in the Equation

                     B        S.E.   Wald      df   Sig.   Exp(B)
Step 1a  GENDER(1)   -2.778   .097   819.838   1    .000   .062
         ed_eth(1)   .504     .079   40.664    1    .000   1.655
         Constant    -.396    .069   32.522    1    .000   .673
a. Variable(s) entered on step 1: GENDER, ed_eth.
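The Exp(B) column is simply e raised to the power B, i.e. the odds ratio for each predictor compared with its reference category. Recomputing it from the rounded B values in the table gives the same three decimals:

```python
import math

# B coefficients copied from the "Variables in the Equation" table
coefficients = {"GENDER(1)": -2.778, "ed_eth(1)": 0.504, "Constant": -0.396}

odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}
for name, value in odds_ratios.items():
    print(f"{name}: Exp(B) = {value:.3f}")
```

An Exp(B) below 1 (GENDER(1)) means lower odds than the reference group; above 1 (ed_eth(1)) means higher odds.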
Classification Tablea

                                  Predicted
                                  Salary ever      Percentage
Observed                          No      Yes      Correct
Step 1   Salary ever   No         2958    894      76.8
                       Yes        444     975      68.7
         Overall Percentage                        74.6
a. The cut value is .500
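The percentages in the classification table are the correct predictions (the diagonal cells) divided by the row or grand totals. A short check with the counts from the table:

```python
# Counts from the Classification Table:
# observed category -> (predicted No, predicted Yes)
observed = {"No": (2958, 894), "Yes": (444, 975)}

# Correct predictions lie on the diagonal: No predicted as No, Yes predicted as Yes
correct = {"No": observed["No"][0], "Yes": observed["Yes"][1]}
percent_correct = {k: 100 * correct[k] / sum(observed[k]) for k in observed}
overall = 100 * sum(correct.values()) / sum(sum(row) for row in observed.values())

for k, v in percent_correct.items():
    print(f"Observed {k}: {v:.1f}% correct")
print(f"Overall: {overall:.1f}%")
```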

