A Comparison of Classification Accuracy for Gender Using Neural Networks Multilayer Perceptron (MLP), Radial Basis Function (RBF) Procedures Compared to Discriminant Function Analysis and Logistic Regression Based on Nine Sports Psychological Constructs to Measure Motivations to Participate in Masters Sports Competing at the 2009 World Masters Games

Ian Heazlewood¹, Joe Walsh¹, Mike Climstein², Jyrki Kettunen³, Kent Adams⁴ and Mark DeBeliso⁵

¹ Charles Darwin University
² University of Sydney
³ Arcada University of Applied Sciences
⁴ California State University
⁵ Southern Utah University

Abstract. Neural networks can be applied to many predictive data mining applications because of their power, flexibility and relative ease of use. Predictive neural networks are very useful for applications where the underlying process is complex, such as classification using a mix of nominal and ratio level variables, and for predictive validity based on classification modelling. A neural network can approximate a wide range of statistical models without requiring the researcher to hypothesize in advance certain relationships between the dependent and independent variables. The two major procedures are the multilayer perceptron (MLP) and radial basis function (RBF) networks. In contrast to MLP networks, in RBF networks only the output units have a bias term. Discriminant analysis (or discriminant function analysis) based on classification modelling is applied to classify cases into the values of a categorical dependent variable, usually a dichotomy. Logistic regression is useful for situations in which you want to predict the presence or absence of a characteristic or outcome based on the values of a set of predictor variables. It is similar to a linear regression model but is suited to models where the dependent variable is dichotomous. The aim of this research was to apply neural networks, discriminant function analysis (a more traditional statistical approach under the general linear model) and logistic regression, and to compare their ability as statistical techniques to classify the genders based on nine sports psychological constructs that measure motivations to participate in masters sports. The sample consisted of 3687 male and 3488 female masters athletes who participated in the 2009 World Masters Games; it represented a volunteer (convenience) sample in a cross-sectional, non-experimental research design. The Motivations of Marathoners Scales (MOMS) psychometric instrument assessed participant motivation on nine constructs/factors, using factor scores from a 56-item survey answered on a seven-point Likert-type scale. These factors were health orientation, weight concern, personal goal achievement, competition, recognition, affiliation, psychological coping, life meaning and self-esteem. For the neural networks, the accuracy of the solutions was assessed by classification accuracy in both test and holdout samples, predicted-by-observed charts, ROC curves, cumulative gains and lift charts, and independent variable importance and normalised importance. For discriminant function analysis, it was assessed using both original and cross-validated samples, lambda values, p-values, tolerance, F-to-remove and, in stepwise discriminant analysis, the hierarchy of inclusion steps. Similar methods were applied when assessing classification accuracy using logistic regression. The MLP analysis achieved an overall correct classification rate of 64.4%, and the order of importance was competition, self-esteem, affiliation, recognition, weight concern, health orientation, goal achievement, psychological coping and life meaning. For the RBF analysis, the overall correct classification rate in the training sample was 60.5%, and the order of importance was competition, affiliation, recognition, psychological coping, weight concern, life meaning, self-esteem, goal achievement and health orientation. The overall correct classification rate was 63.3% for discriminant analysis and 63.0% for logistic regression, and the stepwise entry order into both analyses was affiliation, competition, self-esteem, recognition, weight concern and health orientation. The classification accuracies of MLP, discriminant analysis and logistic regression were very similar, both for each gender and combined. None of the neural network or multivariate classification techniques was clearly superior to the others, although the RBF neural network displayed slightly lower classification accuracy than the other three methods.

© Springer International Publishing Switzerland 2016
P. Chung et al. (eds.), Proceedings of the 10th International Symposium on Computer Science in Sports (ISCSS), Advances in Intelligent Systems and Computing 392, DOI 10.1007/978-3-319-24560-7_12

1 Introduction

1.1 Neural Networks


Neural networks are the preferred tool for many predictive data mining applications because of their power, flexibility, relevance and ease of use. Predictive neural networks [1, 2] are particularly useful where the underlying process is complex, especially in pattern recognition and classification problems that use a mix of nominal and ratio level variables and that concern predictive or concurrent validity based on classification modelling. A neural network can approximate a wide range of statistical models without requiring the researcher to hypothesize in advance certain relationships between the dependent and independent variables.
Neural networks used in predictive applications, such as the multilayer perceptron
(MLP) and radial basis function (RBF) networks, are supervised in the sense that the
model-predicted results can be compared against known values of the target variables.
These target variables are specified a priori by the researcher. The term neural network applies to a loosely related family of models, characterized by a large parameter space and flexible structure, descending from studies of brain functioning. As the family grew, most of the new models were designed for non-biological applications, though much of the associated terminology reflects its origin in biology [1, 2]. A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use, and it is analogous to human brain function. Specifically, it resembles the brain in two respects: knowledge is acquired by the network through a learning process, and interneuron connection strengths, known as synaptic weights and analogous to human synapses, are used to store that knowledge.
A neural network can approximate a wide range of statistical models without requiring you to hypothesize in advance certain relationships between the dependent and independent variables; that is, the model is not specified a priori. Instead, the form of the relationships is determined during the learning process [1, 3], so the model emerges from the network's own processing of the data. The trade-off for this flexibility is that the synaptic weights of a neural network are not easily interpretable. Thus, if you are trying to explain an underlying process that produces the relationships between the dependent and independent variables, it would be better to use a more traditional statistical model, such as discriminant analysis or logistic regression. However, if model interpretability is not important, you can often obtain good model results more quickly using a neural network [1, 3]. Although neural networks, unlike inferential statistical models, impose minimal demands on model structure and assumptions, it is useful to understand the general neural network architecture. The multilayer perceptron (MLP) and radial basis function (RBF) networks are functions of predictors (also called inputs or independent variables) that minimize the prediction error of target variables (also called outputs) [1, 3].
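To make the idea of a network as a function of the predictors concrete, the following is a minimal NumPy sketch, not the SPSS implementation. It assumes an MLP with one tanh hidden layer and an identity output, and an RBF network with Gaussian hidden units; both map the nine motivation factor scores to a single output, and the sum-of-squares error is the prediction error that training would minimize. The weights are random placeholders, the Gaussian basis and its width are assumptions, and the function names are hypothetical.

```python
import numpy as np

N_FACTORS = 9  # the nine participant motivation factors used as predictors


def mlp_forward(X, W1, b1, w2, b2):
    """One hidden layer with tanh activation; identity activation at the output."""
    hidden = np.tanh(X @ W1 + b1)  # hidden units: tanh of weighted sums plus a bias
    return hidden @ w2 + b2        # output unit: weighted sum plus a bias, returned unchanged


def rbf_forward(X, centers, width, w2, b2):
    """Gaussian radial basis hidden units (no bias); only the output unit has a bias."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    hidden = np.exp(-sq_dist / (2.0 * width ** 2))
    return hidden @ w2 + b2


def sum_of_squares_error(y_pred, y_true):
    """Prediction error that training seeks to minimize."""
    return 0.5 * np.sum((y_pred - y_true) ** 2)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, N_FACTORS))        # five hypothetical cases (factor scores)
y = np.array([0.0, 1.0, 1.0, 0.0, 1.0])    # hypothetical gender codes (0 = male, 1 = female)

n_hidden = 4
mlp_out = mlp_forward(X, rng.normal(size=(N_FACTORS, n_hidden)), rng.normal(size=n_hidden),
                      rng.normal(size=n_hidden), 0.1)
rbf_out = rbf_forward(X, rng.normal(size=(n_hidden, N_FACTORS)), 1.0,
                      rng.normal(size=n_hidden), 0.1)
print(mlp_out.shape, rbf_out.shape)             # (5,) (5,)
print(sum_of_squares_error(mlp_out, y) >= 0.0)  # True
```

The only structural difference between the two functions is where the bias terms sit, which mirrors the contrast between MLP and RBF networks drawn in this paper.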

1.2 Discriminant Analysis


Discriminant analysis (or discriminant function analysis) based on classification modelling is applied to classify cases into the values of a categorical dependent variable, usually a dichotomy [3, 4]. In sport, this could be males compared with females on different motor fitness tests, or different player grades using the same principles. If discriminant function analysis is effective for a set of data, the classification table of correct and incorrect estimates will yield a high percentage of correctly classified cases, and the method may be useful in processes such as sport talent identification, for example in Olympic sports, or in examining motor fitness differences based on gender. The major foci of discriminant analysis [3, 4, 5, 6, 7] are to:
1. Classify cases into groups using a discriminant prediction equation, and test theory by observing whether cases are classified correctly as predicted; investigate differences between or among groups and determine the most parsimonious way to distinguish among them.
2. Determine the percent of variance in the dependent variable explained by the independents and, using sequential discriminant analysis, the percent explained over and above the variance accounted for by control variables.
3. Assess the relative importance of the independent variables in classifying the dependent variable, and discard variables that are little related to group distinctions.
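For the two-group case used in this study (male versus female), the discriminant prediction equation in point 1 can be illustrated with Fisher's linear discriminant. The sketch below is a NumPy illustration on synthetic data, assuming equal group priors and a pooled within-group covariance matrix; it is not the SPSS DISCRIMINANT procedure, and the function names are hypothetical.

```python
import numpy as np


def fit_fisher_lda(X, y):
    """Return discriminant weights and a cut-off for two groups coded 0 and 1."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance matrix
    S_w = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    w = np.linalg.solve(S_w, m1 - m0)  # discriminant function coefficients
    cutoff = 0.5 * (m0 + m1) @ w       # midpoint of the projected group means (equal priors)
    return w, cutoff


def predict(X, w, cutoff):
    """Classify a case as group 1 when its discriminant score exceeds the cut-off."""
    return (X @ w > cutoff).astype(int)


rng = np.random.default_rng(1)
n = 500
X0 = rng.normal(loc=0.0, size=(n, 3))
X1 = rng.normal(loc=[1.0, 0.5, 0.0], size=(n, 3))  # groups differ on the first two variables only
X = np.vstack([X0, X1])
y = np.repeat([0, 1], n)

w, cutoff = fit_fisher_lda(X, y)
accuracy = (predict(X, w, cutoff) == y).mean()
print("discriminant weights:", np.round(w, 2))
print(f"classification accuracy: {accuracy:.1%}")
```

The fitted weights are largest for the variables on which the groups differ most, which is the sense in which discriminant coefficients rank the contribution of each predictor.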

1.3 Logistic Regression


Logistic regression is useful for situations in which you want to be able to predict the
presence or absence of a characteristic or outcome based on values of a set of predic-
tor variables. It is similar to a linear regression model but is suited to models where
the dependent variable is dichotomous. Logistic regression coefficients can be used to estimate odds ratios for each of the independent variables in the model. Logistic regression is applicable to a broader range of research situations than discriminant analysis. When logistic regression is compared with discriminant analysis, both provide similar predictive and classification outcomes and employ similar diagnostic measures, such as model fit indices; however, logistic regression is less affected when basic assumptions, such as normality, are violated [8].
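The link between logistic regression coefficients and odds ratios can be sketched as follows. The model is fitted by maximum likelihood using Newton-Raphson iterations in NumPy, and exponentiating each slope coefficient gives the odds ratio for a one-unit increase in that predictor. The synthetic data and the function name are hypothetical, and SPSS Regression may use different convergence settings.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, tol=1e-10):
    """Maximum likelihood logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))                # predicted probabilities
        gradient = X.T @ (y - p)                           # score vector
        information = X.T @ (X * (p * (1 - p))[:, None])   # information matrix
        step = np.linalg.solve(information, gradient)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=(n, 2))
true_beta = np.array([-0.3, 0.8, 0.0])  # intercept, then two predictors (the second has no effect)
X = np.column_stack([np.ones(n), x])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

beta = fit_logistic(X, y)
odds_ratios = np.exp(beta[1:])  # odds ratio per one-unit increase in each predictor
print("coefficients:", np.round(beta, 3))
print("odds ratios:", np.round(odds_ratios, 3))
```

An odds ratio near 1 (here, for the second predictor) indicates a predictor that does little to change the odds of group membership.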
Previous research has evaluated the classification accuracy of MLP and discriminant analysis in classifying high-ability and non-high-ability karate athletes based on physiological and biomechanical measures [9]; the classification accuracy of MLP, RBF and discriminant analysis in classifying gender using similar participant motivation predictors [10]; and the classification accuracy for gender of MLP, RBF and discriminant analysis based on the biomechanical measures of isokinetic torque, work, power, fatigue index, countermovement jump and 10 m acceleration [11]. These studies found that MLP, RBF and discriminant analysis produced very similar classification accuracy; however, they used smaller samples and did not include logistic regression as a multivariate statistical classification method. Comparing multiple classification methods, namely MLP, RBF, discriminant analysis and logistic regression, would enable a more complete understanding of the value of such methods when attempting to classify cases into mutually exclusive groups based on underlying differences, if they exist, between those groups.

1.4 Research Aim


The aim of this research was to apply the MLP and RBF neural networks, standard multivariate discriminant function analysis (a more traditional statistical approach under the general linear model) and logistic regression, and then to compare their ability as statistical techniques to classify the genders based on nine sports psychological constructs/factors that measure motivations to participate in masters sports. Specifically, the motivation-to-participate factors were health orientation, weight concern, personal goal achievement, competition, recognition, affiliation, psychological coping, life meaning and self-esteem [12]. The research also aimed to identify which factors or variables provided the greatest difference between the genders, to establish a hierarchy of factor importance, and to assess which multivariate classification method provided the best solution.

2 Methods

The sample consisted of 3687 male (mean age = 53.72 years, SD = 10.05 years) and 3488 female (mean age = 49.39 years, SD = 9.15 years) masters athletes. It was a volunteer (convenience) sample drawn from a potential population of approximately 33,000 masters athletes competing at the 2009 World Masters Games, and the study used a cross-sectional, non-experimental research design.
MLP and RBF neural networks, discriminant analysis and logistic regression were applied to a set of predictor (covariate) variables, namely the nine participant motivation factors of health orientation, weight concern, personal goal achievement, competition, recognition, affiliation, psychological coping, life meaning and self-esteem. The dichotomous classification variable in all models was gender, specifically the male and female athletes who competed at the 2009 World Masters Games and completed the sports psychometric instrument designed to measure the nine participant motivation factors [12]. The instrument derived the participant motivation factors as factor scores from a 56-item question bank, with athletes responding to each question on a seven-point Likert-type scale [12]. The sport psychological instrument was completed online, using the LimeSurvey™ interactive survey system, before and during competition at the 2009 World Masters Games. The four statistical methods were compared for their classification accuracy in discriminating between male and female athlete responses: the neural networks, specifically the multilayer perceptron (MLP) and radial basis function (RBF) networks, were compared with stepwise discriminant analysis and stepwise logistic regression. The multilayer perceptron architecture was based on:
• Selecting one hidden layer, where the hidden layer contains unobservable network nodes (units). Each hidden unit is a function of the weighted sum of the inputs. The function is the activation function, and the values of the weights are determined by the estimation algorithm.
• The selected activation function was the hyperbolic tangent, where the activation function links the weighted sums of units in a layer to the values of units in the succeeding layer.
• The hyperbolic tangent function has the form
  $\gamma(c) = \tanh(c) = \frac{e^{c} - e^{-c}}{e^{c} + e^{-c}}$   (1)
  It takes real-valued arguments and transforms them to the range (−1, 1). When automatic architecture selection is used in SPSS, this is the activation function for all units in the hidden layers.
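Equation (1) can be checked numerically. This short sketch, using only the Python standard library, evaluates the exponential form of the hyperbolic tangent, compares it with `math.tanh`, and confirms that the outputs stay inside the open interval (−1, 1).

```python
import math


def tanh_from_exponentials(c: float) -> float:
    """Hyperbolic tangent written as in equation (1): (e^c - e^-c) / (e^c + e^-c)."""
    return (math.exp(c) - math.exp(-c)) / (math.exp(c) + math.exp(-c))


for c in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    value = tanh_from_exponentials(c)
    assert math.isclose(value, math.tanh(c), rel_tol=1e-12, abs_tol=1e-15)
    assert -1.0 < value < 1.0  # squashed into the open interval (-1, 1)
    print(f"tanh({c:+.1f}) = {value:+.4f}")
```

The direct exponential form is only for illustration: for large |c| the exponentials overflow or round to ±1, which is why library implementations such as `math.tanh` are preferred in practice.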
The identity function was selected for the output units; it has the form $\gamma(c) = c$. It takes real-valued arguments and returns them unchanged. When automatic architecture selection is used, this is the activation function selected for units in the output layer if there are any scale-dependent variables. The network was trained using the batch method, which updates the synaptic weights only after passing all training data records; that is, batch training uses information from all records in the training dataset. Batch training is often preferred because it directly minimizes the total error, and it is most useful for smaller datasets. In contrast to MLP networks, in RBF networks only the output units have a bias term.
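The batch method can be sketched in NumPy. In the sketch below, a one-hidden-layer MLP (tanh hidden units, identity output) is trained by plain batch gradient descent on a mean sum-of-squares error: the gradient is accumulated over every training record, and the synaptic weights are updated once per pass through the data. This is an illustration under stated assumptions (plain gradient descent, an arbitrary learning rate, synthetic data, hypothetical function name); the optimization algorithm SPSS uses for batch training may differ.

```python
import numpy as np


def train_mlp_batch(X, y, n_hidden=4, learning_rate=0.1, epochs=500, seed=0):
    """Batch gradient descent for a tanh-hidden, identity-output MLP on sum-of-squares error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    history = []
    for _ in range(epochs):
        # Forward pass over ALL training records
        H = np.tanh(X @ W1 + b1)
        y_hat = H @ w2 + b2
        residual = y_hat - y
        history.append(0.5 * np.mean(residual ** 2))
        # Gradients accumulated over the whole batch (averaged over records)
        grad_w2 = H.T @ residual / n
        grad_b2 = residual.mean()
        delta_hidden = np.outer(residual, w2) * (1.0 - H ** 2)  # tanh derivative is 1 - tanh^2
        grad_W1 = X.T @ delta_hidden / n
        grad_b1 = delta_hidden.mean(axis=0)
        # One synaptic-weight update per pass through the training data
        W1 -= learning_rate * grad_W1
        b1 -= learning_rate * grad_b1
        w2 -= learning_rate * grad_w2
        b2 -= learning_rate * grad_b2
    return (W1, b1, w2, b2), history


rng = np.random.default_rng(3)
X = rng.normal(size=(200, 9))  # hypothetical scores on nine factors
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(float)
params, history = train_mlp_batch(X, y)
print(f"error before: {history[0]:.4f}, after: {history[-1]:.4f}")
```

By contrast, online (case-by-case) training would update the weights after every record, which is why the order of the records can affect its result.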
Discriminant analysis used the nine factors/variables as the starting point for the stepwise method, in which statistical criteria determine which variable enters the model at each step. Gender was the dichotomous grouping (dependent) variable in the analysis. It must be emphasised that the comparison of discriminant analysis with the neural network analyses was based on the identical nine factors. The stepwise method was applied to generate a hierarchy of importance among the predictor variables and to assess which variables contributed to significant differences between the genders.
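The stepwise entry procedure can be sketched with Wilks' lambda, $\Lambda = \det(W)/\det(T)$, where $W$ and $T$ are the within-group and total sums-of-squares-and-cross-products matrices of the variables currently in the model. At each step, the candidate that gives the smallest $\Lambda$ is entered if its partial F-to-enter, $\frac{n-g-p}{g-1}\left(\frac{\Lambda_p}{\Lambda_{p+1}} - 1\right)$ with $p$ variables already entered and $g$ groups, exceeds a threshold. This is a simplified, forward-only NumPy sketch on synthetic data; the 3.84 threshold is an assumed value, the function names are hypothetical, and removal steps, tolerance checks and other SPSS options are omitted.

```python
import numpy as np


def sscp(X):
    """Sums of squares and cross-products about the column means."""
    centered = X - X.mean(axis=0)
    return centered.T @ centered


def wilks_lambda(X, groups, cols):
    """Wilks' lambda = det(W) / det(T) for the selected columns."""
    if not cols:
        return 1.0
    Xs = X[:, cols]
    W = sum(sscp(Xs[groups == g]) for g in np.unique(groups))
    T = sscp(Xs)
    return np.linalg.det(W) / np.linalg.det(T)


def forward_stepwise_lda(X, groups, f_to_enter=3.84):
    """Forward entry by minimum Wilks' lambda with a partial F-to-enter test (assumed threshold)."""
    n, n_vars = X.shape
    g = len(np.unique(groups))
    selected, current_lambda = [], 1.0
    while len(selected) < n_vars:
        p = len(selected)
        candidates = [j for j in range(n_vars) if j not in selected]
        lambdas = {j: wilks_lambda(X, groups, selected + [j]) for j in candidates}
        best = min(lambdas, key=lambdas.get)
        partial_f = (n - g - p) / (g - 1) * (current_lambda / lambdas[best] - 1.0)
        if partial_f < f_to_enter:
            break
        selected.append(best)
        current_lambda = lambdas[best]
    return selected, current_lambda


rng = np.random.default_rng(4)
n = 400
groups = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, 5))
X[groups == 1, 0] += 1.5  # strongest group difference
X[groups == 1, 1] += 0.8  # weaker group difference
selected, final_lambda = forward_stepwise_lda(X, groups)
print("entry order:", selected, "final Wilks' lambda:", round(final_lambda, 3))
```

The order in which variables enter is what produces the hierarchy of importance reported for the stepwise analyses.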
For logistic regression, the dependent variable was the nominal dichotomous variable gender, dummy-coded 0 for male and 1 for female. The predictor (independent) variables were the nine participant motivation factors used in the MLP, RBF and discriminant analyses. The variable selection method was forward stepwise selection (likelihood ratio), with entry testing based on the significance of the score statistic and removal testing based on the probability of a likelihood-ratio statistic computed from the maximum partial likelihood estimates.
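Entry testing with the score statistic can be sketched without refitting the model for each candidate. With the current model's design matrix $X$, fitted probabilities $\hat p$ and weights $D = \mathrm{diag}(\hat p(1-\hat p))$, the score for a candidate predictor $x$ is $U = x^\top(y - \hat p)$, its variance is $V = x^\top D x - x^\top D X (X^\top D X)^{-1} X^\top D x$, and $U^2/V$ is compared with a $\chi^2(1)$ critical value. The NumPy sketch below implements forward entry only: it assumes a critical value of 3.841 ($\chi^2(1)$ at α = 0.05), omits the likelihood-ratio removal test described above, and uses hypothetical function names and synthetic data.

```python
import numpy as np

CHI2_1DF_05 = 3.841  # assumed entry criterion: chi-square(1) critical value at alpha = 0.05


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Newton-Raphson maximum likelihood fit; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1 - p))[:, None])
        step = np.linalg.solve(info, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def score_statistic(X_cur, x_new, y, p):
    """Rao score statistic for adding x_new to the model fitted on X_cur."""
    v = p * (1 - p)
    u = x_new @ (y - p)
    xv = X_cur.T @ (v * x_new)
    var = x_new @ (v * x_new) - xv @ np.linalg.solve(X_cur.T @ (X_cur * v[:, None]), xv)
    return u ** 2 / var


def forward_score_selection(Z, y):
    """Forward entry: add the candidate with the largest score statistic while it is significant."""
    n, k = Z.shape
    selected = []
    while len(selected) < k:
        X_cur = np.column_stack([np.ones(n)] + [Z[:, j] for j in selected])
        p = 1.0 / (1.0 + np.exp(-X_cur @ fit_logistic(X_cur, y)))
        stats = {j: score_statistic(X_cur, Z[:, j], y, p) for j in range(k) if j not in selected}
        best = max(stats, key=stats.get)
        if stats[best] < CHI2_1DF_05:
            break
        selected.append(best)
    return selected


rng = np.random.default_rng(5)
n = 1500
Z = rng.normal(size=(n, 4))
logit = -0.2 + 1.0 * Z[:, 2] + 0.5 * Z[:, 0]  # predictors 2 and 0 matter; 1 and 3 are noise
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
order = forward_score_selection(Z, y)
print("entry order:", order)
```

Because the score test only needs the current model's fitted probabilities, every candidate can be screened at each step with a single model fit.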

3 Results

The different statistical methods produced slightly different outcomes in independent variable importance and slightly different classification accuracy. For MLP, the order of importance was competition, self-esteem, affiliation, recognition, weight concern, health orientation, goal achievement, psychological coping and life meaning. For RBF, the order of importance was somewhat different: competition, affiliation, recognition, psychological coping, weight concern, life meaning, self-esteem, goal achievement and health orientation. Self-esteem, ranked second with MLP, was ranked seventh with RBF, whereas life meaning was ranked ninth with MLP and sixth with RBF. This indicates that the two approaches provide different solutions in terms of the predictor variable importance hierarchy.
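The extent of disagreement between the two importance hierarchies can be quantified with a rank correlation. The sketch below applies Spearman's formula for untied ranks, $\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$, to the MLP and RBF orders reported above; the choice of Spearman's ρ is an illustration added here and was not part of the original analysis.

```python
mlp_order = ["competition", "self-esteem", "affiliation", "recognition", "weight concern",
             "health orientation", "goal achievement", "psychological coping", "life meaning"]
rbf_order = ["competition", "affiliation", "recognition", "psychological coping", "weight concern",
             "life meaning", "self-esteem", "goal achievement", "health orientation"]


def spearman_rho(order_a, order_b):
    """Spearman rank correlation for two complete rankings of the same items (no ties)."""
    rank_b = {item: r for r, item in enumerate(order_b, start=1)}
    n = len(order_a)
    d_squared = sum((r - rank_b[item]) ** 2 for r, item in enumerate(order_a, start=1))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


for item in mlp_order:
    print(f"{item:22s} MLP {mlp_order.index(item) + 1}  RBF {rbf_order.index(item) + 1}")
print(f"Spearman rho = {spearman_rho(mlp_order, rbf_order):.3f}")  # 0.483
```

The squared rank differences sum to 62, giving ρ ≈ 0.48, which indicates only moderate agreement between the two hierarchies and is driven mainly by the shifts in self-esteem, psychological coping, life meaning and health orientation.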
The discriminant and logistic regression analyses entered the factors into their stepwise solutions in the same order: affiliation, competition, self-esteem, recognition, weight concern and health orientation. It is important to highlight that the stepwise methods selected only six significant factors as important to the classification solutions, whereas the MLP and RBF methods included all nine factors, without the inclusion and exclusion criteria that apply in discriminant analysis and logistic regression.
In terms of classification accuracy, MLP, discriminant analysis and logistic regression were very similar, at approximately 66% for males and approximately 60% for females; the exact classification percentages are displayed in Table 1. The RBF network was slightly less accurate, at 63.5% for males and 57.4% for females. The combined classification accuracy was 64.4% for MLP, 60.5% for RBF, 63.3% for discriminant analysis and 63.0% for logistic regression, so once again the RBF network was slightly less accurate overall. The differences in accuracy between the classification methods were only marginal, which indicates that all classification solutions converged for the data set in this research.

Table 1. Classification accuracy for males, females and combined genders based on MLP, RBF,
discriminant analysis and logistic regression.

Method    Male classification (%)    Female classification (%)    Combined classification (%)
MLP 66.8 61.8 64.4
RBF 63.5 57.4 60.5
Discriminant 66.0 60.5 63.3
Logistic 66.1 59.8 63.0
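The combined figures in Table 1 are consistent with a sample-size-weighted average of the male and female rates, using the 3687 male and 3488 female athletes reported in the Methods. The short check below reproduces the combined column to one decimal place; it assumes each method's rates refer to the full sample, which the table does not state explicitly.

```python
N_MALE, N_FEMALE = 3687, 3488

table_1 = {  # method: (male %, female %, combined % as reported in Table 1)
    "MLP": (66.8, 61.8, 64.4),
    "RBF": (63.5, 57.4, 60.5),
    "Discriminant": (66.0, 60.5, 63.3),
    "Logistic": (66.1, 59.8, 63.0),
}


def weighted_combined(male_pct, female_pct, n_male=N_MALE, n_female=N_FEMALE):
    """Overall percent correct as the sample-size-weighted mean of the group rates."""
    return (male_pct * n_male + female_pct * n_female) / (n_male + n_female)


for method, (male, female, reported) in table_1.items():
    combined = weighted_combined(male, female)
    print(f"{method:12s} computed {combined:.1f}%  reported {reported:.1f}%")
```

For MLP, for example, (66.8 × 3687 + 61.8 × 3488) / 7175 ≈ 64.37%, which rounds to the reported 64.4%.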

4 Conclusion

The multilayer perceptron (MLP) network was the most effective method for predicting group membership based on gender, using the nine factors that represented the multiple dimensions of participant motivation among male and female athletes competing at the 2009 World Masters Games (WMG). It displayed a reasonable level of predictive validity and was marginally more predictive than the radial basis function (RBF) network. However, when compared with the general linear multivariate methods of discriminant analysis and logistic regression, MLP only marginally outperformed these methods. The MLP and RBF networks used all nine factors in the analysis, whereas stepwise discriminant analysis and logistic regression required only six discriminating variables to provide a solution nearly as accurate as MLP.

In terms of which participant motivation factors were the best discriminators between the genders, discriminant analysis and logistic regression produced identical hierarchies of importance, which from most to least important were affiliation, competition, self-esteem, recognition, weight concern and health orientation. The variables excluded from the model as not contributing significantly were life meaning, psychological coping and goal achievement. The orders of importance identified by MLP and RBF differed from that of discriminant analysis and logistic regression, and they also differed slightly from each other.
The order of importance for MLP was competition, self-esteem, affiliation, recognition, weight concern, health orientation, goal achievement, psychological coping and life meaning, whereas for RBF the order was competition, affiliation, recognition, psychological coping, weight concern, life meaning, self-esteem, goal achievement and health orientation. This indicates that, although the two methods differed noticeably in their order of importance, they were somewhat similar in classification accuracy.
One of the problems with neural networks is that they can produce different solutions from the same dataset, because they are open-ended learning structures. To replicate results, the researcher has to use the same initialization value for the random number generator, the same data order and the same variable order, in addition to the same procedure settings [3]. Alternatively, as stated in the introduction, if the researcher is trying to explain an underlying process that produces the relationships between the dependent and independent variables, it would be better to use a more traditional statistical model, such as discriminant analysis or logistic regression; in this research, these two methods produced essentially identical solutions.
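The replication point above can be illustrated with NumPy's random generator: fixing the seed fixes both the random initial weights and the random partition of cases into training and holdout samples, while changing the seed changes both. The 70/30 split, the seed values and the helper name are hypothetical, and this is not how SPSS configures its own random number generator.

```python
import numpy as np


def initialise_and_partition(n_cases, n_inputs, n_hidden, seed, train_fraction=0.7):
    """Hypothetical helper: random initial weights plus a random training/holdout split."""
    rng = np.random.default_rng(seed)
    initial_weights = rng.normal(scale=0.5, size=(n_inputs, n_hidden))
    order = rng.permutation(n_cases)
    n_train = int(train_fraction * n_cases)
    return initial_weights, order[:n_train], order[n_train:]


n_athletes = 3687 + 3488  # full sample size from the Methods
w_a, train_a, hold_a = initialise_and_partition(n_athletes, 9, 4, seed=2009)
w_b, train_b, hold_b = initialise_and_partition(n_athletes, 9, 4, seed=2009)
w_c, train_c, hold_c = initialise_and_partition(n_athletes, 9, 4, seed=2010)

print(np.array_equal(w_a, w_b), np.array_equal(train_a, train_b))  # True True
print(np.array_equal(w_a, w_c), np.array_equal(train_a, train_c))  # False False
```

Recording the seed alongside the procedure settings is therefore the minimum needed for another researcher to reproduce a particular network solution.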

References
1. Fausett, L.: Fundamentals of Neural Networks: Architectures, Algorithms and Applications. Prentice Hall, Upper Saddle River, NJ (1994)
2. SPSS Inc.: SPSS Statistics Base User's Guide 17.0. SPSS Inc., Chicago, IL (2007)
3. SPSS Inc.: SPSS Neural Networks™ 17.0. SPSS Inc., Chicago, IL (2007)
4. Norusis, M.: Advanced Statistics Guide: SPSSX. SPSS Inc., Chicago, IL (1985)
5. StatSoft, Inc.: Electronic Statistics Textbook. StatSoft, Tulsa, OK (2010). http://www.statsoft.com/textbook/
6. Hair, J., Black, W., Babin, B., Anderson, R., Tatham, R.: Multivariate Data Analysis, 6th edn. Pearson Prentice Hall, Upper Saddle River, NJ (2006)
7. SPSS Inc.: IBM SPSS Statistics 22. SPSS Inc., Chicago, IL (2013)
8. SPSS Inc.: SPSS Regression 17.0. SPSS Inc., Chicago, IL (2007)
9. Heazlewood, I., Keshishian, H.: A Comparison of Classification Accuracy for Karate Ability Using Neural Networks and Discriminant Function Analysis Based on Physiological and Biomechanical Measures of Karate Athletes. In: Refereed Proceedings of the Tenth Australasian Conference on Mathematics and Computers in Sport, Crowne Plaza, Darwin, Northern Territory, Australia, July 5-7, 2010, pp. 197-204 (2010)

10. Heazlewood, I., Walsh, J., Burke, S., Climstein, M., Kettunen, J., Adams, K., DeBeliso, M.: A Comparison of Classification Accuracy for Gender Using Neural Networks Multilayer Perceptron (MLP) and Radial Basis Function (RBF) Procedures and Discriminant Function Analysis Based on Nine Sports Psychological Constructs to Measure Motivations to Participate in Masters Sports. In: Proceedings of the 2012 Pre-Olympic Congress-IACSS 2012, Liverpool, England, UK, July 24-25, 2012, pp. 88-94. ISBN 978-1-84626-094-0 (2012)
11. Heazlewood, I., Walsh, J.: A Comparison of Classification Accuracy for Using Neural Networks Multilayer Perceptron (MLP) and Radial Basis Function (RBF) Procedures and Discriminant Function Analysis. In: Heazlewood, I., Bedford, A. (eds.) Proceedings of the International Association of Computer Science in Sport Conference (IACSS2014), Darwin, Australia, June 22-24, 2014, pp. 116-120 (2014)
12. Masters, K., Ogles, B., Jolton, J.: The Development of an Instrument to Measure Motivation for Marathon Running: the Motivations of Marathoners Scales (MOMS). Research Quarterly for Exercise and Sport 64(2), 134-143 (1993)
