Discriminant Analysis (Smoker Edition) Final

Group members
Talha Basham
Kashan Iqbal
Umer Farooq
Ozair Ali
M Aneeq Siddiqui
Definition
A procedure for the determination of the group to which an
individual belongs, based on the characteristics of that individual.
Suppose we have measurements on p characteristics for each of a
sample of individuals. We know that each individual belongs to one
of g groups, but we do not know which. Discriminant analysis
attempts to maximize the probability of correct allocation. It differs
from cluster analysis in that we have an initial data set, the training
set, whose group allocations are known.
Purpose
- The purpose of discriminant analysis is to classify objects (people,
  customers, things, etc.) into one of two or more groups based on a set of
  features that describe the objects (e.g. gender, age, income, weight,
  preference score). In general, we assign an object to one of a number
  of predetermined groups based on observations made on the object.
- Note that the groups are known or predetermined and have no order
  (i.e. they are on a nominal scale). The classification problem supplies
  several objects, each with a set of features measured on it. We are
  looking for two things:
  - Which set of features best determines the group membership of an
    object?
  - What classification rule or model best separates the groups?
Example
Literally, "discrimination" means recognizing a difference between
two things. If I choose to buy a new-model Toyota instead of a
reconditioned car, that is an example of discrimination on the basis
of performance.
SPSS Requirements
- The observations are a random sample.
- Each predictor variable is normally distributed.
- The dependent variable should be categorical (nominal).
- The sample size should be greater than 30.
- There must be at least two groups or categories.
- The independent variables may have any level of measurement
  (ordinal, nominal, or scale).
Objective
We take ‘smoke’ as the nominal dependent variable, indicating whether
the employee smokes or not. The predictor variables are age, days
absent sick from work last year, self-concept score, anxiety score,
and score on attitudes to anti-smoking policies at work. The aim of
the analysis is to determine whether these variables discriminate
between those who smoke and those who do not.
Box’s M test results table
H0: The group covariance matrices are equal
H1: The group covariance matrices are not equal

Box’s M          176.474
F    Approx.     11.615
     df1         15
     df2         600825.3
     Sig.        .000
Wilks’ lambda
Wilks’ lambda indicates the significance of the discriminant function.
The table below indicates a highly significant function (p < .001).

Test of function(s)   Wilks’ Lambda   Chi-square   df   Sig.
1                     0.356           447.227      5    .000
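As a rough check on the table, the chi-square statistic can be approximated from Wilks’ lambda with Bartlett's formula, χ² = −[N − 1 − (p + g)/2] · ln Λ, with p(g − 1) degrees of freedom. The sketch below assumes N = 438 cases (the total in the classification table), p = 5 predictors and g = 2 groups. The function and variable names are illustrative only and are not SPSS output.

```python
import math

# Values taken from this document; N is assumed to be the number of
# cases in the classification table (257 non-smokers + 181 smokers).
wilks_lambda = 0.356
n_cases = 257 + 181
n_predictors = 5
n_groups = 2


def bartlett_chi_square(lam, n, p, g):
    """Bartlett's chi-square approximation for Wilks' lambda."""
    multiplier = n - 1 - (p + g) / 2
    return -multiplier * math.log(lam)


chi_square = bartlett_chi_square(wilks_lambda, n_cases, n_predictors, n_groups)
degrees_of_freedom = n_predictors * (n_groups - 1)

print(f"chi-square ~ {chi_square:.1f}, df = {degrees_of_freedom}")
```

The result lands close to the reported 447.227. The small gap is expected because lambda is rounded to three decimals in the table.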
Eigenvalues table
The canonical correlation of 0.802 indicates a strong relationship
between the discriminant scores and actual group membership.

Function   Eigenvalue   % of variance   Cumulative %   Canonical correlation
1          1.806        100             100            0.802
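For a single discriminant function, the eigenvalue, the canonical correlation and Wilks’ lambda are tied together by standard identities: r = sqrt(λ / (1 + λ)) and Λ = 1 / (1 + λ). The short sketch below recomputes both from the reported eigenvalue. The variable names are illustrative.

```python
import math

eigenvalue = 1.806  # reported for function 1

# Canonical correlation between discriminant scores and group membership
canonical_r = math.sqrt(eigenvalue / (1 + eigenvalue))

# With only one function, Wilks' lambda is 1 / (1 + eigenvalue)
wilks_lambda = 1 / (1 + eigenvalue)

# Squared canonical correlation: share of variance in the discriminant
# scores that is explained by group membership
explained = canonical_r ** 2

print(f"canonical correlation = {canonical_r:.3f}")
print(f"Wilks' lambda         = {wilks_lambda:.3f}")
print(f"r squared             = {explained:.3f}")
```

Both values agree with the tables (0.802 and 0.356), and the squared correlation shows that roughly 64% of the variation in the discriminant scores is accounted for by smoking status.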

Canonical Discriminant Function Coefficients
D = (.024 × age) + (.080 × self-concept) + (−.100 × anxiety)
    + (−.012 × days absent) + (.134 × anti-smoking score) − 4.543

                                        Function 1
Age                                     .024
Self concept score                      .080
Anxiety score                           -.100
Days absent last year                   -.012
Total anti-smoking policies subtest     .134
(Constant)                              -4.543
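To show how the unstandardized coefficients are used, here is a minimal sketch that computes the discriminant score D for one case. The coefficients and constant come from the table above. The dictionary keys and the example employee's values are hypothetical and chosen only for illustration.

```python
# Unstandardized coefficients from the table above
COEFFICIENTS = {
    "age": 0.024,
    "self_concept": 0.080,
    "anxiety": -0.100,
    "days_absent": -0.012,
    "anti_smoking": 0.134,
}
CONSTANT = -4.543


def discriminant_score(case):
    """Return D = constant + sum(coefficient * value) for one case."""
    return CONSTANT + sum(COEFFICIENTS[name] * case[name] for name in COEFFICIENTS)


# Hypothetical employee; these values are NOT from the data set
example_case = {
    "age": 40,
    "self_concept": 35,
    "anxiety": 20,
    "days_absent": 5,
    "anti_smoking": 25,
}

score = discriminant_score(example_case)
print(f"D = {score:.3f}")
```

Assigning the case to a group would additionally require the group centroids or a cut-off score, which are not reported here.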
Classification Results(a)

                                    Predicted Group Membership
                   Smoke or not     Non-smoker     Smoker     Total
Original   Count   Non-smoker       238            19         257
                   Smoker           17             164        181
           %       Non-smoker       92.6           7.4        100.0
                   Smoker           9.4            90.6       100.0

(a) 91.8% of original grouped cases correctly classified.
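The percentages in the classification table follow directly from the counts. The sketch below recomputes the overall hit ratio and the per-group accuracy. The nested-dictionary layout and function names are assumptions made for this example.

```python
# Rows are actual groups, inner keys are predicted groups (counts from the table)
confusion = {
    "non_smoker": {"non_smoker": 238, "smoker": 19},
    "smoker": {"non_smoker": 17, "smoker": 164},
}


def hit_ratio(table):
    """Share of all cases whose predicted group equals the actual group."""
    correct = sum(table[group][group] for group in table)
    total = sum(sum(row.values()) for row in table.values())
    return correct / total


def group_accuracy(table, group):
    """Share of cases in one actual group that were classified correctly."""
    return table[group][group] / sum(table[group].values())


overall = hit_ratio(confusion)
print(f"overall:     {overall:.1%}")
for name in confusion:
    print(f"{name:12s} {group_accuracy(confusion, name):.1%}")
```

Here 402 of the 438 cases sit on the diagonal, which gives the 91.8% reported in the footnote.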

Research Paper
We would like to distinguish users of Internet banking from non-users.
Here the dependent variable is a nominal variable with two categories,
say 1 = User and 2 = Non-user, so regression analysis is no longer
appropriate. We then have a choice between discriminant analysis, which
is a parametric analysis, and logistic regression, which is a
non-parametric analysis. The basic assumption of discriminant analysis
is that the sample comes from a normally distributed population,
whereas logistic regression is called a distribution-free test because
the normality requirement is not needed. This paper delves only into
the use of discriminant analysis, because parametric tests are much
more powerful than their non-parametric alternatives
(Ramayah et al., 2004; Ramayah et al., 2006).
Thank you
