
Journal of Real Estate Research

ISSN: 0896-5803 (Print) 2691-1175 (Online) Journal homepage: https://www.tandfonline.com/loi/rjer20

Impact of Artificial Neural Networks Training Algorithms on Accurate Prediction of Property Values

Joseph Awoamim Yacim & Douw Gert Brand Boshoff

To cite this article: Joseph Awoamim Yacim & Douw Gert Brand Boshoff (2018) Impact of Artificial
Neural Networks Training Algorithms on Accurate Prediction of Property Values, Journal of Real
Estate Research, 40:3, 375-418, DOI: 10.1080/10835547.2018.12091505

To link to this article: https://doi.org/10.1080/10835547.2018.12091505

Published online: 17 Jun 2020.

Impact of Artificial Neural Networks Training Algorithms on Accurate Prediction of Property Values

Authors: Joseph Awoamim Yacim and Douw Gert Brand Boshoff

Abstract This study extended the use of artificial neural networks (ANNs)
training algorithms in mass appraisal. The goal was to verify the
comparative performance of ANNs with linear, semi-log, and
log-log models. The methods were applied to a dataset of 3,232
single-family dwellings sold in Cape Town, South Africa. The
results reveal that the semi-log model and the Levenberg-
Marquardt trained artificial neural networks (LMANNs)
performed best in their respective categories. The best
performing models were tested in terms of prediction accuracy
within the 10% and 20% of the assessed values, performance,
and reliability ranking, and explicit explainability ranking order.
The LMANNs outperform the semi-log model in the first two
tests, but fail the explainability ranking order test. The results
establish the semi-log model as the preferred technique
due to its simplicity, consistency, transparency, locational
advantage, and ease of application within the mass appraisal
environment. The black box nature of the ANNs inhibits the
production of sufficiently transparent estimates that appraisers
could use to explain the process in legal proceedings.

Keywords: ANN training algorithms, mass appraisal, predictive accuracy, model transparency

There have been concerted efforts worldwide to deal with problems of
inconsistencies that sometimes occur in mass appraisal assessment. These
challenges can sometimes be the result of appraisers' intuitiveness and the
approach or methods employed. The intuition of an appraiser is predicated on the
foreknowledge of market reactions relating to different classes of properties over
time. The intuition is what influences the appraiser's choice of method for
estimation of property values. Income, market, and cost approaches are the most
widely used appraisal methods for determining the market values of residential
properties. These methods are used for single and mass appraisal until obvious
limitations are observed. The limitations relate to subjectivity in dealing with a
number of properties, delay in reporting value estimates to clients, and an
insufficient number of comparable properties. In mass appraisal predictions,
sophisticated technology is required to comparatively evaluate a number of
properties (McGreal, Adair, McBurney, and Patterson, 1998). Technology is
employed to complement the appraisers’ efforts in appraising values, reduce their
workload, and increase the precision of estimates.
Several high-tech methods, including support vector machines, hedonic
regression models, expert systems, fuzzy logic, and artificial neural networks, are
used in mass appraisal. The hedonic regression models have been the most
extensively used techniques in modeling property prices among practitioners and
academics (Zurada, Levitan, and Guan, 2011). The hedonic regression models are
used to advise mortgage lenders, local tax authorities, dissolved companies, and others
on the market values of properties. However, despite their widespread use, the
methods have a number of shortcomings, including an inability to handle
specification error exacerbated by nonlinearity, multicollinearity, and functional
form (Do and Grudnitski, 1992; Worzala, Lenk, and Silva, 1995). The
shortcomings led to the emergence of a number of propositions towards the use
of non- or semi-parametric regression techniques in mass appraisal. Artificial
neural networks (ANNs) are among the techniques designed to remediate the
obvious limitations of the hedonic regression models. Pioneering works in the
field include Borst (1991, 1995), Do and Grudnitski (1992), Tay and Ho (1992),
Worzala, Lenk, and Silva (1995), and McCluskey (1996). The models are designed
to handle the complex nonlinear relations that exist in data without the many
parametric restrictions that are found in statistical techniques.
Numerous elements and processes are required for the smooth operation of the
ANNs. Training the ANNs is one such process that is fundamental to the
achievement of desired results. The purpose of the training phase is to minimize a
cost function, usually defined as the sum squared error (SSE), mean squared error
(MSE), or root mean squared error (RMSE) between the actual and the predicted
property sale values, by adjusting weights and biases. There are a number of
training algorithms including back propagation (BP), Levenberg-Marquardt (LM),
Powell-Beale conjugate gradient (PBCG), and scaled conjugate gradient (SCG).
The most frequently used training algorithm in the mass appraisal of properties
is the BP, first developed by Werbos (1974) and popularized for multilayer
perceptron by Rumelhart, Hinton, and Williams (1986). Most of the studies
involving the ANNs utilize the BP training algorithm (Borst, 1991; Do and
Grudnitski, 1992; Tay and Ho, 1992; Borst, 1995; Worzala, Lenk, and Silva, 1995;
McCluskey, 1996; Lenk, Worzala, and Silva, 1997; McGreal, Adair, McBurney,
and Patterson, 1998; Nguyen and Cripps, 2001; Limsombunchai, Gan, and Lee,

2004; Peterson and Flanagan, 2009; McCluskey et al., 2012; McCluskey, et al.,
2013). Additionally, there has been no definite conclusion on the superiority of
the ANNs over the hedonic regression models in most of these studies.
Furthermore, the BP algorithm has an unimpressive behavior of getting entrapped
in local minima, providing longer iterations, and employing parameter settings
that are user dependent. Therefore, in this study we propose two ANN training
algorithms for mass appraisal. The goal is to compare their performance relative
to previously used training algorithms and the hedonic regression models.
Accordingly, the methodologies are applied to data for 3,232 sales transactions of
single-family dwellings in Cape Town, South Africa. We find that training the
ANNs with other algorithms, namely LM, PBCG, and SCG, considerably improves
their performance relative to the BP, with the LM outperforming all the other algorithms.
We also document the relevance of the hedonic regression models in the mass
appraisal environment despite their underperformance relative to the ANNs.
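The three cost functions mentioned above are simple to compute. As a minimal illustration (the function name and toy values here are our own, not drawn from the study):

```python
import numpy as np

def appraisal_errors(actual, predicted):
    """Return SSE, MSE, and RMSE between actual and predicted sale values."""
    residuals = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    sse = float(np.sum(residuals ** 2))   # sum squared error
    mse = sse / residuals.size            # mean squared error
    rmse = mse ** 0.5                     # root mean squared error
    return sse, mse, rmse

# Toy check: three properties mispriced by -10, 10, and 10 units.
sse, mse, rmse = appraisal_errors([100, 200, 300], [110, 190, 290])
```

Any of the three measures can serve as the training objective; they differ only in scaling.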
The paper is organized as follows. We provide a literature review, and then give
an overview of the methodology and training algorithm. We then describe the data
and procedures used in the research, and establish the baseline regression models
for the Cape Town property market. We next apply the training algorithms and
analyze the models. We close with concluding remarks.

Literature Review

Pricing of Properties with Traditional Hedonic Regression

According to Goodman and Thibodeau (1998), the hedonic regression technique
has given crucial procedures for analyzing commodities that had previously
seemed extraordinarily complex. Real estate fell under the category of previously
hard to analyze commodities but the work of Rosen (1974) provided a framework
for real estate pricing. The central idea behind the hedonic regression is that
different units (attributes) of a property are aggregated to develop a property price.
The use of this technique can be traced to Court (1939), who used it in the
automobile industry, and to Lancaster (1966), who related it to the bundle of
characteristics that provide utility to consumers. Malpezzi, Ozanne, and Thibodeau
(1980) relate real property to consumer goods like groceries having differing sizes
and items. Relating this to the different component features of property, Sirmans,
Macpherson, and Zietz (2005) report that property is a bundle of structural,
environmental, and spatial attributes. These attributes differentiate one property from
another; therefore, the hedonic multiple regression would price the features over a
collective sample of many dwellings.
Des Rosiers and Thériault (2008) list a number of areas where the hedonic multiple
regression model has been used and find it to be particularly important for the


property market due to the high level of competition between buyers and sellers.
In practice, the model has been used for pricing real estate for more than four
decades. Goodman (1978) used the hedonic regression model on a database of
1,835 single family dwellings in the New Haven standard metropolitan statistical
area (SMSA) to form submarkets and indices that determine relative prices of
housing services. However, despite the robustness of this model, the choice of
functional form and ability to capture spatial features are among its limitations.
On functional form specification, economic theory fails to specify the
functional relation that a property price and its attributes should take. According
to Cropper, Deck, and McConnell (1988), this has led a number of scholars
to rely on goodness-of-fit criteria, as suggested in Rosen (1974) and Goodman
(1978), to select an appropriate functional form. However, they note that when all
attributes are observed, linear and quadratic Box-Cox transformed variables give
accurate estimates of marginal attribute prices, but this changes when certain
variables are not observed or are replaced by proxies; in these instances, a linear
function outperformed the quadratic Box-Cox function. Goodman (1978) found
the frequently used linear form to be overly restrictive and favors the Box-Cox
transformation. Several criticisms trailed the use of the Box-Cox function, leading to
a situation where some authors directly formulate a model structure without
recourse to the hedonic function (Borst, 2007).
In general, apart from the basic linear multivariate specification, researchers
commonly use the semi-log and log-log specifications. In the semi-log
formulations, the left-hand side of the equation, namely the dependent variable
(property price), is regressed against the linear arrays of structural and locational
characteristics. In the log-log formulation, both the dependent variable (property
price) and independent variables (structural features) are transformed. Therefore,
all assessment must take a particular form, which researchers in practice usually
specify in their model calibration. Kang and Reichert (1987) note the significance
of regression coefficients and that the prediction accuracy depends on the choice
of estimating technique and the functional form of the regression equation. Schulz,
Wersing, and Werwatz (2014) use the price and log-prices as dependent variable
while other variables remain in their linear format. They examine 18,444 single-
family transactions to estimate property prices and find the semi-log hedonic
regression to be a better choice. McCluskey et al. (2012) examine data on 2,694
properties in the Lisburn district of Northern Ireland using linear, semi-log, and
log-log hedonic regression models. They find that the semi-log model outperforms
the other models. The three studies show that log transformation could improve
the predictability of hedonic regression. However, a dissimilar result was achieved
with semi-log model in a study by McCluskey (2016) on a dataset of 46,689
properties (before the removal of outliers) in Kazakhstan using three scenarios,
which included analysis after the first data cleaning, second data cleaning, and
semi-logarithm. The best result (67%) was obtained after the second data cleaning
with the linear additive regression. The results of these studies show that
sometimes log transformation does not assure optimal results. Therefore, it is good
I m p a c t o f A r t i f i c i a l N e u r a l N e t w o r k s u 3 7 9

to first undertake a test that supports the need for log transformation of the
variables. There are studies that find an alternative way of avoiding the parametric
restrictions of the hedonic regression, which led to the introduction of ANNs in
modeling property prices.
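One simple heuristic version of such a test, sketched here with simulated data (this is our own illustration, not one of the formal specification tests used in the studies above), is to check whether taking logs reduces the skewness of the price distribution:

```python
import numpy as np

def skewness(x):
    """Sample skewness: mean of the cubed standardized deviations."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

# Hypothetical right-skewed (log-normal) price sample; a skewness much
# closer to zero after the transformation supports taking logs.
rng = np.random.default_rng(0)
prices = np.exp(rng.normal(13, 0.5, 1000))
raw_skew = skewness(prices)
log_skew = skewness(np.log(prices))
```

If the log transformation does not reduce skewness (or formal diagnostics reject it), the linear additive form may be the better starting point, consistent with the McCluskey (2016) result.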

The Application of ANNs in Mass Appraisal of Properties

The ANNs have been applied in many studies of mass appraisal since the 1990s
and produced mixed results (McCluskey et al., 2012). Several papers have been
written on the potential of the model within the real estate sector. While some
studies find the model as a useful tool for effective prediction of property prices,
others take a contrary position. For instance, Borst (1991), Evans, James, and
Collins (1992), Tay and Ho (1992), and Do and Grudnitski (1992) find it useful
in mass appraisal. Since the objective of ANNs is to overcome the functional and
nonlinear limitations of the hedonic regression, in most studies the prediction
accuracy of the ANNs is compared to the hedonic regression in analyzing data
for different countries. For example, Tay and Ho (1992) examine properties in
Singapore; Do and Grudnitski (1992) in the United States; Evans, James, and
Collins (1992) in the United Kingdom; and Borst (1995) in the U.S.; all establish that
ANNs perform better than the hedonic regression models. Similarly, studies such
as Nguyen and Cripps (2001) carried out in the U.S., Limsombunchai, Gan, and
Lee (2004) undertaken in New Zealand, and Peterson and Flanagan (2009) also in
the U.S., all provide evidence that supports earlier studies that ANNs are better
than the hedonic regression models. However, a contrary position is taken in
Worzala, Lenk, and Silva (1995), Lenk, Worzala, and Silva (1997), and McGreal,
Adair, McBurney, and Patterson (1998). These studies also gave a caveat to the
assessment community against the use of ANNs in mass appraisal due to the
unreliable results they find in their studies.
Zurada, Levitan, and Guan (2011) utilize hedonic regression, ANNs, additive
regression (AR), M5P trees, SVMs, radial basis function neural networks
(RBFNN), and memory-based reasoning (MBR) in their analysis of properties in
Louisville, Kentucky. They find that AR, M5P trees, and SVMs perform better
than the ANNs, RBFNN, and MBR models. Lin and Mohan (2011) utilize hedonic
regression, ANNs, and ANR to examine mass appraisal in Amherst, New York.
They find ANNs to be better than the hedonic regression and the ANR in both
training and validation data. McCluskey et al. (2012) use linear, semi-log, and
log-log models on the one hand and the ANNs on the other to examine properties
in Lisburn, Northern Ireland. They find that the three regression models are better
at predicting property prices than the ANNs. McCluskey et al. (2013) conduct a
comparative study on the predictive abilities of the hedonic regression model, a
simultaneous autoregressive (SAR) model, a geographically-weighted regression
(GWR) model, and ANNs. They find that the GWR performs better than the other
models.


The studies of Peterson and Flanagan (2009) and McCluskey et al. (2012) are of
significant interest to this study because of the RESET test carried out to support
the case for the use of nonlinear models. Peterson and Flanagan (2009) observe
that hedonic regression models are open to pricing errors owing to the way means
are extrapolated from large samples; thus, they will also be exposed to significant
sampling errors. The authors note that specification error is also unavoidable in
ad hoc specifications and, to the extent that value does not map linearly on to
property characteristics, so too are errors due to neglected nonlinearities. To the
extent that nonlinear models nest linear forms, nonlinear models
would be the desired choice. However, the exact nonlinear form is neither apparent
nor are there practical steps one would take to find the correct form. The ANNs
do, however, provide a practical alternative to the conventional least squares form
(including nonlinear least squares) that is easily implementable and
efficiently models nonlinearity in the underlying relations (including the
parameters) (Peterson and Flanagan, 2009).
The above studies provide inconsistent results that reveal different model
performances. McCluskey and Anand (1999) provide a summary of different
methods to reveal their strengths and weaknesses. They conclude that almost all
methods have limitations in terms of prediction accuracy, transparency, and ease
of application. Kauko and d’Amato (2008) observe these limitations and suggest
that certain pertinent issues should be reflected upon. These are: Is the challenge
on the accuracy of techniques? Is it on the feasibility of using the model in mass
appraisal environment or on some non-technical portion that requires a more
detailed analysis, such as making adjustments to the model structure? It is
expected that when these issues are examined, some improvements or
modifications should be implemented so that optimality could be realized in a
model(s). Such improvement may be reflected in the model structure or data
quality.
McCluskey et al. (2012) utilize the different regression models, having observed
their limitations in handling nonlinearity in property data relative to the ANNs. We
extend their work by improving on the structure of the ANNs in the field of mass
appraisal. The ANNs have a structure that permits the adjustment of weights on
the basis of patterns in the property dataset. Should discrepancies occur, the
weights are altered, changing the original state of the network, so the system
appears to learn (McCluskey et al., 2012). This alteration is performed with the aid of
the back propagation (BP) learning algorithm. BP uses the gradient descent search
method to modify connection weights in order to minimize error between the
actual and desired output. The algorithm is simple in execution and suitable for
solving different problems. However, foremost among its challenges are its
inability to consistently achieve a global optimum, long iterations, and the possibility
of getting trapped in local optima.

Methodology and Algorithms

Hedonic Regression Models

Janssen and Söderberg (1999, p. 361) report that the hedonic regression gives the
framework for the assessment of "differentiated goods like housing units whose
individual features do not have observable market prices." The model takes each
attribute of a property, measures its contribution, and assigns a weight. These
attributes in the aggregate will produce the price of a property. The ability of the
hedonic regression to capture complex interactions that exist within the property
variables and provide estimates in an explicit and transparent manner makes it
attractive, acceptable, and widely used for property tax assessment (McCluskey et
al., 2012; McCluskey et al., 2013). However, McCluskey et al. (2012) note that
three important elements must be addressed for this model to be effective: careful
selection of dependent and independent variables; choice of functional form; and
the statistical relevance and contribution of the independent variables to the model.
In this study, the dependent variable is the assessed values of the properties and
the independent variables are selected based on the a priori expert knowledge that
they are likely to have a positive influence on property prices. The functional form
to be used in hedonic modeling is not specified in the economic theory; therefore,
we employ the linear, semi-log, and log-log models. Palmquist (1979) points out
that selection must be approached from an empirical viewpoint. The general
formulation of a linear model is given in equation (1):

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_S x_S + \varepsilon \qquad (1)$$

or

$$Y = \beta_0 + \sum_{i=1}^{S} \beta_i x_i + \varepsilon,$$

where $Y$ is the assessed value of the property; $x_1, x_2, \ldots, x_S$ are the independent
variables; $\beta_1, \beta_2, \ldots, \beta_S$ are coefficients or prices per unit assigned by the model
to the independent variables; $\beta_0$ is the regression constant; and $\varepsilon$ is the error term.
To model a nonlinear interaction between variables, the semi-log and log-log
models are given in equations (2) and (3):

$$\log Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_S x_S + \varepsilon \qquad (2)$$

and

$$\log Y = \beta_0 + \beta_1 \log x_1 + \beta_2 \log x_2 + \cdots + \beta_S \log x_S + \varepsilon. \qquad (3)$$

The importance of log transformation is noted in Schulz, Wersing, and Werwatz
(2014). These authors observe that using the log transformation could partially
remediate the inherent heteroscedasticity of property prices. It also makes the
effective relations among the variables nonlinear while still preserving the linear
model. Furthermore, Gloudemans (2002) reports that of the three functional forms,
the linear additive model is widely used for single-family dwellings, as it is easy
to calibrate and allows the contribution of each variable to be added to the model.
However, foremost among its limitations are the inability to deal with the problems
of spatial dependence and spatial heterogeneity, which are commonly found in
property markets. We concentrate on dealing with the functionality and nonlinear
limitations of hedonic regression models through the use of ANNs. Since the ANN
models have the capability of recognizing patterns, the influence of spatial effects
on property data could easily be suppressed.
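To make equations (1)-(3) concrete, all three functional forms can be fitted by ordinary least squares in a few lines. The attribute names, simulated data, and coefficients below are illustrative assumptions, not the Cape Town dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative attributes: floor area (m^2) and number of rooms.
X = np.column_stack([rng.uniform(50, 300, 200),    # x1: floor area
                     rng.integers(1, 6, 200)])     # x2: rooms
# Simulated prices generated from a known linear rule plus noise.
price = 50_000 + 1_200 * X[:, 0] + 8_000 * X[:, 1] + rng.normal(0, 5_000, 200)

def ols_fit(design, y):
    """Least-squares coefficients beta for y = design @ beta."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

ones = np.ones((len(price), 1))
linear  = ols_fit(np.hstack([ones, X]), price)                  # equation (1)
semilog = ols_fit(np.hstack([ones, X]), np.log(price))          # equation (2)
loglog  = ols_fit(np.hstack([ones, np.log(X)]), np.log(price))  # equation (3)
```

Only the transformation of the dependent and independent variables changes between the three forms; the estimator itself is identical.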

Artificial Neural Network Models

According to Engelbrecht (2007), the human brain is a complex, nonlinear, and
parallel computer that can handle tasks such as pattern recognition, perception,
and motor control at a greater speed than any computer, irrespective of how briefly
an event occurs for the neural systems. The ability of the human brain to learn,
memorize, and generalize well created research interest in artificially mimicking
the biological neural systems (BNSs), an effort commonly known as the ANNs.
The BNSs consist of nerve cells, also referred to as neurons, each containing a
cell body, dendrites, and an axon, massively interconnected. The
connection between the axon of the neuron and the dendrite of another neuron is
commonly referred to as a synapse (Exhibit 1). The signals are conveyed from
the dendrites through the cell body to the axon from where the signals are
transmitted to all connected dendrites.
The artificial neuron is modeled after the biological neuron of the human brain.
The artificial neurons (ANs) are units in the ANNs that receive one or more inputs
as numerical values associated with their respective weights. A bias or threshold
level is added as an additional input value to the summation function. The summed
value is passed to the next phase to execute the activation function, which produces
the output from the neuron, as shown in Exhibit 2.
The most popular ANN architecture is the multilayer perceptron, consisting of an
input layer, an output layer, and at least one hidden layer for processing the
nonlinear elements. The minimum of one hidden layer follows the suggestion in
Masters (1993) that a single hidden layer should be the initial choice for any
practical ANN design.

Exhibit 1 | Biological Neuron (adapted from Engelbrecht, 2007, p. 6)

Exhibit 2 | Structure of the Artificial Neuron

In this regard, Lin and Mohan
(2011) reveal that a single hidden layer is adequate for ANNs to approximate any
complex nonlinear function and achieve accuracy (Hornik, 1991). We use a hidden
layer to construct the architecture of the ANNs. The values of the input variables
are sent into the network via the input layer to the hidden layer. The number of
input variables in the network is determined by the configuration of the input data.
The output layer is fully connected to the neurons of the input layer and this line
of connection runs through all the units of the network.
The number of neurons in the hidden layer is a matter of user discretion, determined
through a trial-and-error process. Kwok and Yeung (1997) propose beginning with a
small number of hidden neurons and increasing the number until the desired result
is found. We follow the same procedure of optimal selection of the number of hidden
neurons as used by Lin and Mohan (2011) and McCluskey et al. (2013, p. 250). Another
important element of the neural networks is the transfer function that determines
the relation between inputs and output (target) of the neuron and its network. The
transfer function we employ is the tan-sigmoid used in the neurons of the hidden
layer and the linear transfer function used in the neurons of the output layers. The
network compares the output with the actual value to ascertain its accuracy by
the total mean squared error. In building a model, the usual practice is to split the
data into two or three segments, such as for training, testing, or validation
processes.
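The architecture just described, a single tan-sigmoid hidden layer feeding a linear output neuron, can be sketched as follows. The layer sizes, weight initialization, and input values are illustrative assumptions, not the configuration used in the study:

```python
import numpy as np

class OneHiddenLayerANN:
    """Multilayer perceptron with one tanh (tan-sigmoid) hidden layer and a
    single linear output neuron, matching the architecture in the text."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.5, (n_hidden, n_inputs))  # input -> hidden weights
        self.b1 = np.zeros(n_hidden)                        # hidden-layer biases
        self.W2 = rng.normal(0, 0.5, (1, n_hidden))         # hidden -> output weights
        self.b2 = np.zeros(1)                               # output bias

    def predict(self, x):
        hidden = np.tanh(self.W1 @ x + self.b1)  # tan-sigmoid activation
        return (self.W2 @ hidden + self.b2)[0]   # linear output neuron

net = OneHiddenLayerANN(n_inputs=4, n_hidden=6)
value = net.predict(np.array([0.3, 0.8, 0.1, 0.5]))
```

Training then consists of adjusting `W1`, `b1`, `W2`, and `b2` to minimize the chosen cost function, which is exactly where the algorithms discussed next differ.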
The ANNs utilize a number of training algorithms including conjugate gradient
(CG), LM, and BP being the most widely used, amongst many others.
Back Propagation Algorithm. The back propagation (BP) algorithm is not the first
training rule for ANNs (Engelbrecht, 2007). However, in most real estate mass
appraisal assessments, the BP is the most commonly used algorithm in training
the ANNs. The sum squared error given in equation (4) is required to evaluate
the training process:

$$E(y, w) = \frac{1}{2} \sum_{m=1}^{M} \sum_{l=1}^{L} (d_{lm} - o_{lm})^2, \qquad (4)$$

where $y$ and $w$ are the network input and weight vectors, respectively; $m$ is the
index of patterns, from 1 to $M$, where $M$ denotes the number of training patterns;
$l$ is the index of outputs, from 1 to $L$, where $L$ is the total number of outputs;
$k$ and $p$ are indices of weights, from 1 to $N$, where $N$ is the total number of
network weights; and $d_{lm}$ and $o_{lm}$ are the desired and actual values of the $l$th output
for the $m$th pattern. Yu and Wilamowski (2011) note that the gradient $g$ is defined
as the first-order derivative of the total error function (equation (4)):

$$g = \frac{\partial E(y, w)}{\partial w} = \left[ \frac{\partial E}{\partial w_1} \; \frac{\partial E}{\partial w_2} \; \cdots \; \frac{\partial E}{\partial w_N} \right]^T. \qquad (5)$$

With the definition of g in equation (5), the update rule of the gradient/steepest
descent algorithm is written as:

$$w_{r+1} = w_r - \alpha g_r, \qquad (6)$$

where $r$ is the index of iterations and $\alpha$ is the learning constant (the step size taken
in the negative direction of the gradient). The training process of the gradient
descent algorithm converges asymptotically, approaching the minimum arbitrarily
closely. The convergence behavior of BP depends on the choice of the
initial values of the network connection weights, as well as on the network
parameters, including the learning rate and momentum. This procedure can
sometimes be problematic because of the tendency to overtrain the network and
slow convergence. Training speed could be significantly increased by the use of
second-order algorithms including Newton and LM, scaled conjugate gradient, and
Powell-Beale amongst others. Additionally, Openshaw (1998) suggests that
another way of easing this problem might be to use meta-heuristics (GA, PSO,
etc.) to design and train ANNs. The use of these training algorithms is, among
other things, the task of this study.
Levenberg-Marquardt Algorithm. The Levenberg-Marquardt (LM) algorithm was
principally designed by Levenberg (1944) and Marquardt (1963) to provide a
numerical solution to nonlinear least squares problems. This algorithm has the
properties of gradient descent stability (slight variation in training error even if
there is a change in training data point) and Gauss-Newton (GN) speed, which
gives it an advantage over all the others. These attributes carved a unique niche
for LM as a versatile ANN training algorithm because in many respects
convergence speed is assured despite the complexities of the error surface.
According to Yu and Wilamowski (2011a), the central idea of LM is that its
training process switches between two algorithms. If the combination coefficient
$\mu$ is small, the GN algorithm is utilized, but when the combination coefficient
is large, the gradient/steepest descent algorithm is employed. Consequently, $\mu$
is adjusted so that the network converges to the desired optimum. The GN algorithm
simplifies the calculation of the second-order derivative, while the LM can be
considered a trust-region modification of the GN method (Battiti, 1992).


The sum squared error (E) in equation (4) is used to evaluate the training process
in LM. The update rule of LM is given as (Yu and Wilamowski, 2011):

$$\Delta w_r = (H_r + \mu I)^{-1} g_r, \qquad (7)$$

where $I$ is the identity matrix, $g$ is the gradient vector, $H$ is the Hessian matrix,
and $\mu$ is the combination coefficient. The Hessian matrix ($H$) and gradient
vector ($g$) are defined in equations (8) and (9) as:

$$H = \begin{bmatrix}
\dfrac{\partial^2 E}{\partial w_1^2} & \dfrac{\partial^2 E}{\partial w_1 \partial w_2} & \cdots & \dfrac{\partial^2 E}{\partial w_1 \partial w_N} \\
\dfrac{\partial^2 E}{\partial w_2 \partial w_1} & \dfrac{\partial^2 E}{\partial w_2^2} & \cdots & \dfrac{\partial^2 E}{\partial w_2 \partial w_N} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 E}{\partial w_N \partial w_1} & \dfrac{\partial^2 E}{\partial w_N \partial w_2} & \cdots & \dfrac{\partial^2 E}{\partial w_N^2}
\end{bmatrix} \qquad (8)$$

$$g = \begin{bmatrix}
\dfrac{\partial E}{\partial w_1} \\
\dfrac{\partial E}{\partial w_2} \\
\vdots \\
\dfrac{\partial E}{\partial w_N}
\end{bmatrix}. \qquad (9)$$

According to Yu and Wilamowski (2011), the second-order derivatives of $E$ in
equation (8) have to be computed in order to perform the update rule in equation
(7). This process, according to Battiti (1992) and Yu and Wilamowski (2011a,
2011b), is complicated. Furthermore, in the Hagan and Menhaj (1994) and Yu and
Wilamowski (2011) implementation of the LM algorithm, the Jacobian matrix ($J$) is
introduced so that the computation is simplified by avoiding the
second-order derivatives:

$$J = \begin{bmatrix}
\dfrac{\partial e_{11}}{\partial w_1} & \dfrac{\partial e_{11}}{\partial w_2} & \cdots & \dfrac{\partial e_{11}}{\partial w_N} \\
\dfrac{\partial e_{12}}{\partial w_1} & \dfrac{\partial e_{12}}{\partial w_2} & \cdots & \dfrac{\partial e_{12}}{\partial w_N} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial e_{1L}}{\partial w_1} & \dfrac{\partial e_{1L}}{\partial w_2} & \cdots & \dfrac{\partial e_{1L}}{\partial w_N} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial e_{M1}}{\partial w_1} & \dfrac{\partial e_{M1}}{\partial w_2} & \cdots & \dfrac{\partial e_{M1}}{\partial w_N} \\
\dfrac{\partial e_{M2}}{\partial w_1} & \dfrac{\partial e_{M2}}{\partial w_2} & \cdots & \dfrac{\partial e_{M2}}{\partial w_N} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial e_{ML}}{\partial w_1} & \dfrac{\partial e_{ML}}{\partial w_2} & \cdots & \dfrac{\partial e_{ML}}{\partial w_N}
\end{bmatrix}. \qquad (10)$$

By combining equations (4) and (9), the gradient vector elements can be calculated
as:

$$\frac{\partial E}{\partial w_k} = \sum_{m=1}^{M} \sum_{l=1}^{L} \frac{\partial (d_{lm} - o_{lm})}{\partial w_k} \, (d_{lm} - o_{lm}). \qquad (11)$$

So that the relation between Jacobian matrix (J) and gradient vector (g) can be
presented by:

$$g = J^T e. \qquad (12)$$

By combining equations (4) and (8), the Hessian matrix (H) can be computed
as:


$$\frac{\partial^2 E}{\partial w_k \partial w_p} = \sum_{m=1}^{M} \sum_{l=1}^{L} \left[ \frac{\partial (d_{lm} - o_{lm})}{\partial w_k} \, \frac{\partial (d_{lm} - o_{lm})}{\partial w_p} + \frac{\partial^2 (d_{lm} - o_{lm})}{\partial w_k \, \partial w_p} \, (d_{lm} - o_{lm}) \right] \approx \sum_{m=1}^{M} \sum_{l=1}^{L} \frac{\partial (d_{lm} - o_{lm})}{\partial w_k} \, \frac{\partial (d_{lm} - o_{lm})}{\partial w_p}. \qquad (13)$$

So that the relation between Jacobian matrix (J) and Hessian matrix (H) can be
described by equation (14) as:

$$H \approx J^T J = Q, \qquad (14)$$

where the matrix $Q$ is the approximated Hessian matrix ($H$), known as the
quasi-Hessian matrix. Therefore, the LM update rule is achieved by substituting
equations (12) and (14) into equation (7):

D wr 5 (J rT Jr 1 m I)21J Tr er . (15)

Here e is the error vector. To implement the LM algorithm, equation (15) is used. The procedure is first to calculate and store the Jacobian matrix, then carry out the matrix multiplications of equations (12) and (14) for each weight update. In the Jacobian matrix (J) defined in (10), there are M x L x N elements that need to be stored. Wilamowski, Kaynak, Iplikci, and Efe (2001) and Yu and Wilamowski (2011) report that the LM algorithm is effective for solving problems with small- and medium-sized sets of training patterns but ineffective in handling large sets because of its memory requirement. To support this view, Wilamowski and Yu (2010) observe its limitation in solving a parity-16 problem of 65,536 patterns and suggest an improvement that does not store the Jacobian matrix but replaces the Jacobian matrix multiplications with vector operations, so that problems with an unlimited number of training patterns can be solved. The size of the training set in this study is considered adequate (small) for the LM algorithm; for a large pattern set, however, the approach suggested and used in Wilamowski and Yu (2010) should be preferred.
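The update rule in equation (15) can be sketched directly in code. The following is a minimal numpy illustration on a toy curve-fitting problem, not the Matlab implementation used in the study; all function and variable names are our own:

```python
import numpy as np

def lm_step(jacobian, errors, mu):
    """One Levenberg-Marquardt update, eq. (15):
    delta_w = (J^T J + mu*I)^(-1) J^T e."""
    J = np.asarray(jacobian)
    e = np.asarray(errors)
    quasi_hessian = J.T @ J            # Q = J^T J, eq. (14)
    gradient = J.T @ e                 # g = J^T e, eq. (12)
    n = J.shape[1]
    return np.linalg.solve(quasi_hessian + mu * np.eye(n), gradient)

# Toy demo: recover w1 = 2, w2 = 1 in the model o = w1*x + w2.
x = np.linspace(0.0, 1.0, 20)
d = 2.0 * x + 1.0                      # desired outputs
X = np.column_stack([x, np.ones_like(x)])
w = np.zeros(2)
for _ in range(50):
    e = d - X @ w                      # error vector e = d - o
    J = -X                             # Jacobian of e w.r.t. (w1, w2)
    w = w - lm_step(J, e, mu=1e-3)     # w_{r+1} = w_r - delta_w
```

In a full network trainer, the combination coefficient \mu is adapted per iteration (decreased after a successful step, increased otherwise) rather than held fixed as in this sketch.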
LM has been used in many fields for ANN training with optimal results. The only mass appraisal/valuation study found that involved LM is El Hamzaoui and Perez (2011), who used it to determine selling prices in Casablanca, Morocco. That study is exploratory in nature and finds the algorithm useful for ANN training. The present research is more comprehensive in its analysis of residential properties to ascertain the algorithm's capability in ANN training.
Conjugate Gradient Algorithm. Unlike the BP algorithm, which adjusts weights in the direction of the negative gradient of the error surface, the conjugate gradient (CG) algorithm searches along conjugate directions, which usually gives faster convergence than searches along the gradient descent direction. The algorithm trades off the simplicity of gradient/steepest descent against the fast quadratic convergence of Newton's method (Engelbrecht, 2007). Many variants of the CG algorithm are used in training feed-forward ANNs, including the Fletcher-Reeves CG, Polak-Ribière CG, Powell-Beale Restarts CG, and scaled conjugate gradient. In our study, the Powell-Beale Restarts and scaled CG are the primary focus. This is because the Powell-Beale Restart method gives an opportunity to restart if there is little orthogonality (shift or change) left between the current and previous gradients (Sharma, Sharma, and Kasana, 2007). And, to avoid the time-consuming line search that standard CG performs at each iteration step, the scaled CG utilizes a step-size scaling approach instead of a line search per iteration.
Scaled Conjugate Gradient Algorithm. The scaled conjugate gradient (SCG) algorithm is another approach to estimating the step size (Møller, 1993). This algorithm is unique in training feed-forward ANNs because it combines the LM and CG algorithms. Accordingly, the process involves introducing a scalar parameter denoted \beta_r in CG, where r = 0, 1, 2, 3, ..., N, the step size \alpha_r is positive, and the search direction p_r is generated by the equation:

p_{r+1} = -\theta_{r+1} g_{r+1} + \beta_r p_r.   (16)

This is the search direction of the scaled CG algorithm. In equation (16), the parameters to be determined are \theta_{r+1} and \beta_r. Thus, if \theta_{r+1} = 1, the result is a classical CG algorithm based on the value of the scalar parameter \beta_r (Andrei, 2007). Conversely, if the scalar parameter is zero (\beta_r = 0), another class of algorithms evolves that rests on the choice of the parameter \theta_{r+1}. With \beta_r = 0, it is possible for \theta_{r+1} to be a positive scalar or a positive definite matrix. These possibilities correspond to the directions of the gradient descent or Newton algorithms: if \theta_{r+1} = 1, a gradient descent algorithm ensues, but if \theta_{r+1} = \nabla^2 f(x_{r+1})^{-1}, or an estimate of it (Andrei, 2007), the direction shifts toward that of the Newton or quasi-Newton algorithms. Again, if \theta_{r+1} \neq 1 is chosen in a quasi-Newton way and \beta_r \neq 0, equation (16) characterizes a blend of quasi-Newton and CG techniques. According to Andrei (2011), if \theta_{r+1} is a matrix that carries information about an inverse Hessian of the function f, it is better to use p_{r+1} = -\theta_{r+1} g_{r+1}, because adding the term \beta_r p_r might prevent p_{r+1} from being a descent direction unless the line search is sufficiently accurate.
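The direction update of equation (16) reduces to a few vector operations. Below is a hedged numpy sketch on a two-variable quadratic with exact line search; the Polak-Ribière formula for \beta_r is one common choice (our assumption, not prescribed by the text), and \theta_{r+1} = 1 recovers classical CG as the text notes:

```python
import numpy as np

def scaled_direction(g_new, g_old, p_old, theta=1.0):
    """Search direction of eq. (16): p_new = -theta*g_new + beta*p_old.
    beta uses the Polak-Ribiere formula, one common choice."""
    beta = g_new @ (g_new - g_old) / (g_old @ g_old)
    return -theta * g_new + beta * p_old

# Minimize f(x) = 0.5*x'Ax - b'x; with theta = 1 this is classical CG
# and converges in two steps for a 2x2 system.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = np.zeros(2)
g = A @ x - b                          # gradient of f at x
p = -g
for _ in range(2):
    alpha = -(g @ p) / (p @ A @ p)     # exact line search for a quadratic
    x = x + alpha * p
    g_new = A @ x - b
    p = scaled_direction(g_new, g, p, theta=1.0)
    g = g_new
```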
A scaled conjugate gradient algorithm has been applied in ANN training for parity problems using 20 different initial weight vectors, with momentum set at 0.9 for all simulations. The results indicate that SCG converges faster than standard BP, CG with line search, and the Broyden-Fletcher-Goldfarb-Shanno memory-less quasi-Newton algorithm (Møller, 1993). Orozco and García (2003) use SCG to avoid the time-consuming line search per iteration of other second-order CG algorithms in classifying two types of infant cries. Ambarish and Saroj (2016) use the SCG algorithm in the prediction of soil moisture content for the control of farm irrigation in eastern India; they find that SCG performs better than Broyden-Fletcher-Goldfarb-Shanno (BFGS) in training feed-forward ANNs. With this in mind, SCG is proposed for ANN training in the mass appraisal of properties.
Powell-Beale Conjugate Gradient Algorithm. In the conjugate gradient algorithm, the search direction is periodically reset to the negative of the gradient. The standard reset point occurs when the number of epochs equals the number of weights and biases in the network; however, other reset approaches, such as Powell-Beale restarts (PBCG), are used to improve the efficiency of network training. These approaches were proposed by Powell (1977) based on the earlier work of Beale (1972). Powell's proposition is that a restart should ensue if very little orthogonality remains between the new and the former gradient. This is defined by the following condition for resetting to the steepest descent direction:
| g_{r-1}^T g_r | \geq \alpha \| g_r \|^2,   (17)

where \alpha is the restart/reset factor, which ranges over 0.1, 0.2, ..., 0.9 (Powell, 1977), g is the gradient, and the subscript r is the iteration index. The major shortcoming of this algorithm is the expensive line search required at each iteration step. The algorithm has been utilized in ANN training for identifying the presence of the red palm weevil (Al-Saqer and Hassan, 2011); their study shows that although the Powell-Beale CG training algorithm consumes less training time, the SCG algorithm outperformed it in their assessment.
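The restart condition of equation (17) amounts to a single inner-product test; a small sketch (function names ours):

```python
import numpy as np

def should_restart(g_prev, g_curr, alpha=0.2):
    """Powell's test, eq. (17): restart to steepest descent when
    |g_{r-1}^T g_r| >= alpha * ||g_r||^2, i.e. when little
    orthogonality is left between successive gradients."""
    return bool(abs(g_prev @ g_curr) >= alpha * (g_curr @ g_curr))

# Orthogonal successive gradients: keep the conjugate direction.
# Parallel ones: reset the search direction.
keep = should_restart(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
reset = should_restart(np.array([1.0, 0.0]), np.array([1.0, 0.0]))
```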

u Methods and Procedures

T h e C a p e To w n A s s e s s m e n t

The city of Cape Town is 944 square miles. It is located on a peninsula beneath
Table Mountain on the southwest coast of South Africa. It is the provincial capital
of the Western Cape and the legislative capital of the country. There are other
municipalities that are adjacent to it including Swartland and West Coast to the
north; Drakenstein, Cape Winelands, and Stellenbosch to the northeast; and
Theewaterskloof, Overberg, and Overstrand to the southeast. The city of Cape
Town has centralized its property tax assessment under the city valuation office
(CVO). The CVO is the body responsible for the valuation of 915,148 residential
and commercial properties in Cape Town (KPMG, 2015). The CVO has created
1,555 neighborhoods and 31 submarkets for the purpose of assessment and
reassessment.

The Data

The data we use were supplied by the CVO in Cape Town. The data contained missing values, extreme values, and unreliable transactions, which were removed because leaving them could compromise an effective assessment. The original data contain 46 property variables and 3,526 transactions; after the removal of questionable transactions, cost-related variables, and variables that simply describe the property type (such as single-family dwelling), 3,232 property transactions and 11 variables were left for analysis. The sample was considered adequate for the analysis because the methods are not in any case constrained by the number of properties that can be handled (McCluskey and Anand, 1999). Text variables, such as property quality, condition, building style, and property view, were recoded and assigned numeric values for ease of assessment. For property quality, for example, six bands reflect poor, fair, average, good, very good, and excellent, which were assigned the values 1-6, respectively. The same procedure is used for all other text variables (condition, building style, and property view).
Furthermore, within the building style variable, the dummy for group housing (style grouph) appears in only one transaction, so it was added to the style medit/t category. Transactions above four stories were included in the modeling but recoded as three stories because they are few in number. Transactions with zero stories were considered ground floor or bungalow; the two transactions found in that category were recoded to the one-story category. The recoding is consistent with the study of Guan, Zurada, and Levitan (2008) and is designed to ameliorate the dimensionality problem that can reduce the strength of a model. Depending on the platform, binary, continuous, and categorical variables are used. The variables are left in their categorical and continuous states when the ANNs are used, but the binary dummy format is used when the hedonic regression models are applied. This is necessary because of the limitations of ANNs when there are a large number of variables (McCluskey et al., 2013; Feng and Jones, 2015). For instance, the condition and quality variables are coded between 1 and 5 or 6 when the ANNs are used, but when the OLS, semi-log, or log-log models are used, they are coded 1 and 0 (1 if the categorical condition is met, 0 otherwise), leaving out the most commonly occurring category to avoid the dummy variable trap that might lead to perfect multicollinearity and the failure of the regression program (Greene, 2003; Borst, 2007). Exhibit 3 contains a summary of the property data.
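The dummy coding described above (dropping the most frequently occurring category as the base) can be sketched as follows; a pure-Python illustration with made-up data, not the study's actual workflow:

```python
from collections import Counter

def dummy_code(values):
    """One-hot encode a categorical column, dropping the modal category
    (the base) to avoid the dummy-variable trap described in the text."""
    base = Counter(values).most_common(1)[0][0]
    levels = [v for v in sorted(set(values)) if v != base]
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Hypothetical 'quality' column (illustrative, not the Cape Town data).
quality = ["average", "good", "average", "excellent", "average", "good"]
levels, rows = dummy_code(quality)   # base category: "average"
```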
The number of variables (11) is carefully chosen using pragmatic and realistic
approaches. They are also considered likely to be value significant in line with


E x h i b i t 3 u Property Variables Description

Variable Description

Assessed values Assessed values of properties in South African Rand (1 USD = 15 ZAR)
Beds Total number of bedrooms
Quality Quality grade factor of construction
Condition Condition /state of property
Storey Total number of stories
Bld style Building design and architectural style
View Quality grade factor of property view
RMOS Reverse month of sale
Size Size of property in square meters
Pool Size of swimming pool in square meters
SMKT Locational variable identifying submarkets

E x h i b i t 4 u Descriptive Statistics of Property Variables

Variable Mean Std. Dev. Min. Max.

Assessed values 4,483,474.29 3,117,754.04 824,000 38,000,000


Beds 3.56 0.99 1.0 10
Quality 3.49 0.62 1.0 6.0
Condition 3.51 0.63 1.0 5.0
Storey 1.52 0.55 1.0 3.0
Bld style 3.03 0.43 1.0 7.0
View 3.58 0.96 1.0 6.0
RMOS 14.89 8.16 1.0 29.0
Size 177.45 78.91 31 599
Pool 13.97 18.36 0.0 154

previous multivariate analyses of this nature. Exhibit 4 presents the descriptive statistics of the property data. The statistics illustrate the variability within the data. The mean assessed value in the sample is R4,483,474.29, while the average size of a property is 177 square meters. The smallest property assessed in terms of size is 31 square meters. The average (mean) number of stories in the sample is two, and the average property in the sample has four bedrooms. The total transactions have been stratified into 70% training and 30% testing datasets in accordance with the acceptable norm that data should be partitioned to allow for modeling (training) and testing. The stratification is done using the WEKA explorer via a resample procedure.
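A comparable 70/30 partition can be sketched in a few lines (a plain random split; WEKA's resample filter is what the study actually used, and its exact procedure may differ):

```python
import random

def split_70_30(records, seed=42):
    """Shuffle and cut a dataset into 70% training and 30% testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]

train, test = split_70_30(range(3232))   # 3,232 transactions, as in the study
```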
The sales date (temporal or time trend) is very important in any modeling activity. This is necessary because of the fluctuation of property prices over time in different geographical areas of the property market. Different techniques are employed to incorporate the time trend into a model, including paired sales analysis, resale analysis, incorporating time variables in regression models, and using the current appraised value (Gloudemans, 1990). Borst (2008, 2009) suggests the application of the reverse month of sale (RMOS), Fourier expansion, quarterly or semi-annual binary (dummy) variables, and the linear spline method to capture the time trend in a model. In this study, the RMOS and semi-annual dummy variables are used to account for the time trend: the RMOS is used for modeling with the ANNs, while the semi-annual dummies are used for modeling prices with the hedonic regression. The dates of sale in the sample cover the period from January 2012 to May 2014. Consequently, in using the RMOS, the most recent month in the sample is RMOS = 1, the next is RMOS = 2, and this continues to the oldest month of sale, which is assigned RMOS = 29. Five bands are created to reflect the semi-annual dummies. The notation SA1 represents the first half year, which equals 1 for RMOS = 1-6 and 0 otherwise; SA2 describes the second half year, which equals 1 for RMOS = 7-12 and 0 otherwise; SA3 equals 1 for RMOS = 13-18 and 0 otherwise. This is done for all periods through the last band; however, to mitigate dimensionality and dummy trap problems, the most frequently occurring category (SA4) was excluded from the analysis.
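The RMOS and semi-annual band assignments described above reduce to simple arithmetic; a sketch (function names ours):

```python
def rmos(sale_year, sale_month, newest=(2014, 5)):
    """Reverse month of sale: May 2014 (the most recent month in the
    sample) maps to 1; January 2012 maps to 29."""
    return (newest[0] - sale_year) * 12 + (newest[1] - sale_month) + 1

def semi_annual_band(r):
    """Map an RMOS value to its semi-annual band label SA1-SA5
    (RMOS 1-6 -> SA1, 7-12 -> SA2, and so on)."""
    return "SA" + str((r - 1) // 6 + 1)
```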
Having taken into account the temporal aspect of the data, the next step is to account for its spatial nature. McCluskey and Borst (2007) report several techniques by which location is specified and calibrated in a model. These include market segmentation, which groups properties into different segments; a neighborhood delineation variable, which acts as an indicator of the particular neighborhood a property belongs to; accessibility measures; and explicit use of location and advanced model specification methods. For further information, see McCluskey and Borst (2007). These techniques are sometimes used together in a single study depending on the model's capability and specification and the availability of location element(s) within the data. We utilize the neighborhood delineation variable to indicate the submarket location of the property. The neighborhoods and submarkets were delineated by tax assessors for ease of identification and assessment. There are 181 neighborhoods and 31 submarkets in the property data. The neighborhoods are used to establish 15 submarket dummy variables, excluding the most frequently occurring submarket (SMKT54). Other studies that use a neighborhood delineation or submarket dummy variable to account for location are Wilhelmsson (2002), Bourassa, Cantoni, and Hoesli (2007), and Lin and Mohan (2011). Wilhelmsson (2002) uses 13 submarket dummy variables from previously defined administrative parishes to account for location in the hedonic regression model; Bourassa, Cantoni, and Hoesli (2007) create 33 submarket dummies to improve their results; and Lin and Mohan (2011) create 66 dummies from neighborhood codes to form submarkets/clusters in their analysis.

Performance Measurement

The Spatial Analysis in Macroecology (SAM v4.0) and Statistical Package for the Social Sciences (SPSS v21) packages were used for the assessment of property prices with the hedonic regression models. Matlab R2013b and the Waikato Environment for Knowledge Analysis (WEKA 3.6) explorer were used for property pricing with the ANNs. The accuracy test statistics used to explain the performance of the models are the root mean squared error (RMSE), the squared coefficient of correlation (R2), and the mean absolute error (MAE). The RMSE formula is:

RMSE = \sqrt{\sum_{i} (y_i - \hat{y}_i)^2 / n}.   (18)

The RMSE is the square root of the average of the squared estimation errors. The RMSE assigns greater weight to large errors than to small ones. The MAE formula is:

MAE = \frac{\sum_{i=1}^{n} | \hat{y}_i - y_i |}{n}.   (19)

The MAE weights errors of all sizes uniformly. It is the average of the absolute values of the estimation errors. The R2 formula is:

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}.   (20)

Thus, apart from the squared coefficient of correlation R2, which measures the correlation between the actual and predicted values and must show a reasonable goodness-of-fit well above 50%, the model with the lowest MAE and RMSE is considered the better method. Another important measure of model performance we use is
Akaike’s information criterion (AIC). This accuracy test statistic is designed to
choose from many competing models the optimal model based on the maximum
likelihood criterion of the model parameter. McCluskey and Borst (2011) report
that the estimates of bi are based on the least squares and the maximum likelihood
estimates of the model parameters are identical. This permits the expression of
AIC in terms of statistics available from the OLS regression as:

AIC = n + n \ln(2\pi) + n \ln\left(\frac{RSS}{n}\right) + 2K,   (21)

where K is the number of estimated parameters in the model, including the intercept and \hat{\sigma}^2 (McCluskey and Borst, 2011), and RSS denotes the sample residual sum of squares. The model with the minimum AIC is adjudged the best in this study.
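Equation (21) is straightforward to evaluate from OLS output; a hedged sketch (function name ours):

```python
import math

def aic_from_rss(rss, n, k):
    """Eq. (21): AIC = n + n*ln(2*pi) + n*ln(RSS/n) + 2K, where K counts
    the estimated parameters (including the intercept and the error
    variance) and RSS is the residual sum of squares."""
    return n + n * math.log(2 * math.pi) + n * math.log(rss / n) + 2 * k

# Of two models fitted to the same n observations, the one with the
# smaller RSS (at equal K) has the smaller, i.e. better, AIC.
better = aic_from_rss(100.0, 50, 3)
worse = aic_from_rss(200.0, 50, 3)
```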
To measure model quality in terms of consistency and uniformity, the benchmark tests acceptable to the International Association of Assessing Officers (IAAO, 2013) are used: the coefficient of dispersion (COD) and the price-related differential (PRD). The COD measures assessment uniformity, i.e., how much the value ratios vary from the median ratio:

COD = \frac{\sum | Z_i - Z_m |}{n Z_m} \times 100.   (22)

The PRD, on the other hand, is used to measure the consistency of valuation ratios between low-valued and high-valued properties. In essence, this statistic measures the vertical equity of the appraisals and should be close to 1.0: values below 1.0 suggest progressivity, while values above 1.0 suggest regressivity (IAAO, 2013):

PRD = \frac{\sum Z_i / n}{\sum PV_i / \sum AV_i},   (23)

where Z_i = PV_i / AV_i is the ratio between the predicted value and the assessed value of the ith property; PV_i is the predicted value of the ith property; AV_i is its assessed value; and Z_m is the median of the Z_i.
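The accuracy and ratio statistics of equations (18), (19), (22), and (23) can be sketched as follows; a numpy illustration with toy numbers, not the study's software:

```python
import numpy as np

def rmse(y, yhat):
    """Eq. (18): square root of the mean squared estimation error."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    """Eq. (19): mean of the absolute estimation errors."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs(yhat - y)))

def cod(pv, av):
    """Eq. (22): dispersion of the ratios Z_i = PV_i/AV_i about their
    median, expressed as a percentage."""
    z = np.asarray(pv, float) / np.asarray(av, float)
    zm = np.median(z)
    return float(np.sum(np.abs(z - zm)) / (len(z) * zm) * 100)

def prd(pv, av):
    """Eq. (23): mean ratio over the value-weighted ratio; values above
    1.0 indicate regressivity."""
    pv, av = np.asarray(pv, float), np.asarray(av, float)
    return float(np.mean(pv / av) / (pv.sum() / av.sum()))

# Toy check: perfect predictions give COD = 0 and PRD = 1; overvaluing
# cheap properties relative to dear ones pushes PRD above 1.
av = [100.0, 200.0, 300.0]
pv_regressive = [110.0, 200.0, 270.0]
```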


u Analysis and Discussion of Results


This section contains the results of the hedonic regression models and the various ANN training algorithms. The model building is done in categories, with a view to selecting the optimal model within each category before a general comparative analysis of the two best-performing models selects the best model based on the stated criteria.

E s t a b l i s h i n g a B a s e l i n e M o d e l f o r t h e C a p e To w n
Property Market

We first examine neglected nonlinearities in the linear model to establish the case for using the nonlinear techniques. Peterson and Flanagan (2009) report that those supporting the use of ANNs base their argument on the neglected nonlinearities observed in linear models. Consequently, a RESET-type test that applies the predictions of the ANNs, as in Peterson and Flanagan (2009) and McCluskey et al. (2012), is used to justify the application of nonlinear models. The predictions of the ANNs are included as an additional regressor in the OLS. Therefore, if the OLS specification is:

y = X\beta + \mu,   (24)

and the predictions of the ANNs are contained in the vector \hat{y}, then a test of neglected nonlinearities is equivalent to a t-test of H_0: \varphi = 0 in the regression:

\mu = X\beta + \varphi \hat{y} + v.   (25)

The null of no neglected nonlinearities is easily rejected at any conventional level of significance, which reveals the need to use nonlinear techniques on these data. Exhibit 5 provides the summary.
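The t-test of H_0: \varphi = 0 can be illustrated with simulated data; a numpy sketch in which everything (names, data, the quadratic proxy standing in for the ANN predictions) is an assumption for illustration, not the paper's Cape Town sample:

```python
import numpy as np

def added_regressor_t(y, X, nonlinear_pred):
    """t-statistic on an added regressor in OLS: append the predictions,
    re-estimate, and divide the last coefficient by its standard error."""
    Z = np.column_stack([X, nonlinear_pred])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    s2 = resid @ resid / (len(y) - Z.shape[1])   # error variance estimate
    cov = s2 * np.linalg.inv(Z.T @ Z)            # coefficient covariance
    return float(beta[-1] / np.sqrt(cov[-1, -1]))

# Simulated data with a genuine nonlinearity: the linear OLS neglects
# the x^2 term, so a proxy for the nonlinear fit enters significantly.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 200)
y = 1.0 + 2.0 * x + 3.0 * x ** 2 + rng.normal(0.0, 0.1, 200)
X = np.column_stack([np.ones_like(x), x])
proxy = x ** 2                       # stands in for the ANN predictions
t_stat = added_regressor_t(y, X, proxy)
```

A large |t_stat| rejects the null of no neglected nonlinearities, mirroring the finding reported above.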

The Baseline Regression Model

The linear, semi-log, and log-log models are tested based on their R2, adjusted R2, AIC, and F-statistics to show the levels of their statistical acceptability and confidence. Apart from the RESET test undertaken above, McCluskey (2016) observes that transforming variables into their log form makes it possible to explore the presence of nonlinear relations in the data.
The R2 statistic demonstrates the variation in property prices across the linear,
semi-log, and log-log models. The parameter estimates reveal that 58.9%, 69.4%,
E x h i b i t 5 u OLS Test for Neglected Nonlinearities

Variable                  Coeff.          Std. Coeff.  VIF     Std. Error     T-stat.   P-value

Constant                  184,128.64      0.000        0.000   108,996.235    1.689     .091
ANNs PRED                 0.17            0.302        5.434   0.020          8.279     .000
Beds1                     -268,912.33     -0.016       1.029   268,324.101    -1.002    .316
Beds2                     59,294.49       0.012        1.158   86,323.676     0.687     .492
Beds4                     -15,482.94      -0.005       1.350   55,025.949     -0.281    .778
Beds5                     84,891.11       0.018        1.294   84,908.868     1.000     .317
Beds6                     -41,474.57      -0.005       1.193   140,961.806    -0.294    .769
Beds7                     -199,621.72     -0.011       1.069   300,670.729    -0.664    .507
Beds8                     -718,658.19     -0.023       1.048   489,621.563    -1.468    .142
Beds9                     -563,217.54     -0.010       1.014   900,162.925    -0.626    .532
Beds10                    -1,611,303.8    -0.020       1.023   1,278,560.17   -1.260    .208
Poor quality              281,408.02      0.008        1.898   779,474.055    0.361     .718
Fair quality              282,547.11      0.018        1.268   270,141.889    1.046     .296
Good quality              120,315.50      0.042        2.572   71,753.480     1.677     .094
Very good quality         692,168.80      0.049        1.145   236,679.916    2.924     .003
Excellent quality         224,460.42      0.018        1.391   226,330.294    0.992     .321
Poor condition            671,187.54      0.022        1.901   659,403.960    1.018     .309
Fair condition            10,129.23       0.001        1.265   213,404.801    0.047     .962
Good condition            -21,277.10      -0.007       2.507   71,271.145     -0.299    .765
Excel. condition          51,451.23       0.008        1.524   124,717.101    0.413     .680
Storey 2                  1,740.85        0.001        1.500   54,616.097     0.032     .975
Storey 3                  180,262.75      0.021        1.369   158,106.703    1.140     .254
Sub-economic style        -5,557.41       0.000        1.020   737,259.479    -0.008    .994
Unconventional style      -23,656.01      -0.003       1.262   157,940.060    -0.150    .881
Georgian victor style     208,584.53      0.016        1.035   202,106.906    1.032     .302
Cape Dutch style          -304,658.87     -0.014       1.019   354,544.091    -0.859    .390
Maisonette style          108,745.01      0.008        1.042   213,405.683    0.510     .610
Mediterranean style       -40,112.35      -0.001       1.034   575,263.505    -0.070    .944
Partly obstructed view    -50,059.37      -0.007       1.091   109,684.505    -0.456    .648
Below average view        -32,586.80      -0.002       1.019   293,559.725    -0.111    .912
Above average view        153,332.36      0.051        1.283   53,134.430     2.886     .004
Panoramic view            343,931.27      0.085        1.510   77,445.606     4.441     .000
Excellent view            -11,128.77      -0.001       1.127   185,853.749    -0.060    .952
Smkt48                    -256,405.29     -0.020       1.158   213,843.225    -1.199    .231
Smkt50                    -173,579.83     -0.031       1.492   106,282.625    -1.633    .103
Smkt52                    -245,923.90     -0.065       1.821   79,270.687     -3.102    .002
Smkt53                    -88,541.59      -0.021       1.565   84,031.206     -1.054    .292
Smkt55                    519,394.25      0.097        2.317   127,637.310    4.069     .000
Smkt56                    158,524.51      0.031        1.529   100,060.322    1.584     .113
Smkt64                    -291,337.73     -0.012       1.034   371,839.580    -0.784    .433
Smkt65                    -43,415.82      -0.001       1.008   897,615.767    -0.048    .961
Smkt66                    -150,383.35     -0.019       1.183   133,362.619    -1.128    .260
Smkt67                    151,260.73      0.033        1.574   90,234.646     1.676     .094
Smkt68                    243,594.95      0.043        1.422   105,282.361    2.314     .021
Smkt69                    -110,396.79     -0.008       1.156   221,757.391    -0.498    .619
Smkt70                    -275,218.52     -0.028       1.303   174,352.198    -1.579    .115
Smkt73                    -196,250.27     -0.004       1.011   734,343.680    -0.267    .789
SA1                       22,488.97       0.006        1.440   69,159.491     0.325     .745
SA2                       -75,546.55      -0.021       1.466   67,391.449     -1.121    .262
SA3                       -60,514.74      -0.017       1.456   68,269.379     -0.886    .375
SA5                       1,074.55        0.000        1.424   69,578.385     0.015     .988
Size                      2.778           0.000        2.155   413.752        0.007     .995
Pool                      -599.422        -0.008       1.316   1,389.581      -0.431    .666

Note: The dependent variable is Residual. AIC = 100,046.80.

E x h i b i t 6 u Goodness-of-Fit Measurements for Linear, Semi-log, and Log-log Models

Model

Linear Semi-log Log-log

R2 0.589 0.694 0.687


Adj. R2 0.583 0.689 0.682
F-statistic 89.441 141.527 137.045
AIC 103,057.028 1,461.717 1,533.562

E x h i b i t 7 u Performance Comparison of Linear, Semi-log, and Log-log Models

Model

Test Linear Semi-log Log-log

Median ratio 1.055 1.027 1.032


Mean ratio 1.087 1.044 1.046
PRD 1.08 1.10 1.10
COD 26.7 22.1 22.6
MAE 1,234,015 1,091,961 1,127,104
RMSE 1,997,911 1,951,184 1,995,487

and 68.7% of the variation in property prices are explained by the three models, respectively. There is a reduction in the level of explanation to 58.3%, 68.9%, and 68.2% when adjustment is made to the general population, as revealed by the adjusted R2 statistic. The adjusted R2 confirms the acceptability of the results, since the maximum value is 100%. Again, the three regression models exhibit high F-statistics at p < 0.01: linear regression (89.441), semi-log (141.527), and log-log (137.045). Overall, the semi-log model outperforms the linear and log-log models. However, the goodness-of-fit criteria used in Exhibit 6 are not sufficient to establish a baseline regression model for the city of Cape Town because the three models have different structures. The COD, as used by Borst (2007), can be used for direct comparison of the three models (Exhibit 7). Additional measures of model performance, including the median ratio, mean ratio, RMSE, MAE, and price-related differential, are also used. The mean and median ratios are part of the quality assurance benchmark tests recommended for industry use. The standard stipulates that a ratio of between 0.90 and 1.10 meets the required standards.

The results in Exhibit 7 reveal that the semi-log model (1,951,184) performs better
than the linear and log-log models (1,997,911 and 1,995,487) in terms of RMSE.
According to Limsombunchai, Gan, and Lee (2004), a model with the lowest
RMSE is considered the best in terms of prediction accuracy. The MAE value
also reveals the semi-log model (1,091,961) predicts prices that are closer to the
assessed values than the linear and log-log models (1,234,015 and 1,127,104). The
results show marginal performance by the log-log and linear models in terms of
predicting property prices closer to the assessed values. Furthermore, the semi-
log model has the best performance in terms of the mean ratio, median ratio, and
COD while the only slim advantage of the linear model is that it has a slightly
better PRD relating to the vertical equity of prices. Therefore, on the strength of
these results, the baseline model for the Cape Town property market is the semi-
log model.
The detailed results reveal the regression coefficients, t-statistics, and indicators
of the level of significance of the three models. This is presented in Exhibits 8–
10. The test of multicollinearity using the variance inflation factor (VIF) indicates
that in all assessments (linear, semi-log, and log-log), there is no inflation of any
of the coefficients.
The coefficients show the contribution of specific attributes to property price in
the linear model. The result shows that property size is positively related to
property price at R9,156.69. The base number of bedrooms is a three-bedroom
property. It therefore follows that properties with two, four, five, six, seven, eight,
and ten bedrooms are positively related to property prices while properties with
one and nine bedrooms are negatively related to price in this study. The quality
grade factor and condition of properties are essential. As expected, properties
having poor and fair qualities are negatively related to price while good, very
good, and excellent properties are positively related to prices. The condition of
properties provides a result that is somewhat counterintuitive in excellent condition
because of the negative values. While the negative value for fair condition is
expected, the same cannot be said of excellent property condition, which should
ordinarily be additive towards property price, albeit at an appreciating rate. The
number of stories is very important, and two- and three-story properties are
associated with higher prices.
The building style reveals that sub-economic, unconventional, Georgian victor,
and maisonette are positively related to property price, while the Cape Dutch and
Mediterranean building styles are negatively related to price. Property view is also
very important, and coefficients show that properties that have partially obstructed,
above average, panoramic, and excellent views are positively related to price while
properties having a view that is below average are negatively related to price. The
location of a property is very important in determination of its price. Consequently,
the submarket dummy variable is used to depict areas where the property is
located. Properties that are located in SMKT55 and SMKT56 are positively related
to property prices while all properties located in the remaining submarkets are
negatively related to price.

E x h i b i t 8 u Linear Regression Model Coefficients

Variable                  Coeff.          Std. Coeff.  VIF     Std. Error     T-stat.   P-value

Constant                  1,925,246.24                         158,212.73     12.169    .000
Beds1                     -95,531.07      -0.003       1.029   427,538.96     -0.223    .823
Beds2                     15,630.32       0.001        1.158   137,541.94     0.114     .910
Beds4                     37,641.80       0.006        1.350   87,676.96      0.429     .668
Beds5                     291,784.11      0.028        1.293   135,278.58     2.157     .031
Beds6                     629,564.89      0.035        1.182   223,563.63     2.816     .005
Beds7                     637,132.87      0.016        1.066   478,398.83     1.332     .183
Beds8                     5,007,485.25    0.075        1.017   768,486.58     6.516     .000
Beds9                     171,405.83      0.001        1.013   1,434,258.1    0.120     .905
Beds10                    3,867,852.90    0.022        1.015   2,029,838.2    1.905     .057
Poor quality              -202,524.20     -0.003       1.897   1,241,655.6    -0.163    .870
Fair quality              -61,100.14      -0.002       1.267   430,244.67     -0.142    .887
Good quality              442,444.77      0.071        2.537   113,563.08     3.896     .000
Very good quality         2,618,454.27    0.084        1.123   373,413.86     7.012     .000
Excellent quality         2,756,016.95    0.102        1.282   346,187.56     7.961     .000
Poor condition            363,167.64      0.005        1.900   1,050,457.90   0.346     .730
Fair condition            -52,365.85      -0.002       1.264   339,991.77     -0.154    .878
Good condition            247,138.27      0.039        2.505   113,526.24     2.177     .030
Excel. condition          -209,527.37     -0.015       1.519   198,402.30     -1.056    .291
Storey 2                  613,462.18      0.098        1.377   83,371.75      7.358     .000
Storey 3                  2,392,759.12    0.126        1.264   242,111.40     9.883     .000
Sub-economic style        280,819.94      0.003        1.019   1,174,492.0    0.239     .811
Unconventional style      2,103,625.07    0.107        1.143   239,491.88     8.784     .000
Georgian victor style     572,529.82      0.021        1.034   321,948.83     1.778     .075
Cape Dutch style          -286,716.32     -0.006       1.018   564,814.73     -0.508    .612
Maisonette style          92,098.37       0.003        1.042   340,034.39     0.271     .787
Mediterranean style       -38,174.25      0.000        1.034   916,498.59     -0.042    .967
Partially obstructed view 176,691.38      0.012        1.090   174,756.93     1.011     .312
Below average view        -139,569.28     -0.003       1.019   467,739.72     -0.298    .765
Above average view        485,613.16      0.074        1.268   84,179.59      5.769     .000
Panoramic view            1,327,260.85    0.150        1.382   118,050.99     11.243    .000
Excellent view            1,994,698.18    0.081        1.075   289,304.63     6.895     .000
Smkt48                    -2,515,927.10   -0.090       1.089   330,435.18     -7.614    .000
Smkt50                    -2,094,033.0    -0.172       1.281   156,943.99     -13.343   .000
Smkt52                    -1,316,265.2    -0.160       1.603   118,522.76     -11.106   .000
Smkt53                    -1,136,953.0    -0.121       1.424   127,714.31     -8.902    .000
Smkt55                    3,852,948.92    0.328        1.460   161,459.66     23.863    .000
Smkt56                    1,391,301.39    0.123        1.439   154,642.04     8.997     .000
Smkt64                    -2,176,342.9    -0.042       1.023   589,279.11     -3.693    .000
Smkt65                    -1,892,688.7    -0.015       1.006   1,429,077.4    -1.324    .185
Smkt66                    -830,327.69     -0.048       1.166   210,958.29     -3.936    .000
Smkt67                    -874,903.05     -0.087       1.463   138,579.95     -6.313    .000
Smkt68                    -445,484.83     -0.036       1.373   164,824.71     -2.703    .007
Smkt69                    -2,032,349.3    -0.070       1.095   343,996.14     -5.908    .000
Smkt70                    -2,293,589.6    -0.107       1.206   267,271.65     -8.581    .000
Smkt73                    -1,698,941.2    -0.017       1.008   1,168,266.1    -1.454    .146
SA1                       -35,065.4       -0.004       1.439   110,172.13     -0.318    .750
SA2                       -187,413.1      -0.024       1.463   107,292.86     -1.747    .081
SA3                       -42,573.42      -0.005       1.455   108,746.12     -0.391    .695
SA5                       132,591.65      0.016        1.423   110,819.77     1.196     .232
Size                      9,156.69        0.232        1.789   600.68         15.244    .000
Pool                      7,025.48        0.041        1.300   2,200.61       3.193     .001

Note: The dependent variable is Assessed values.


Exhibit 9. Semi-log Regression Model Coefficients

Variable                   Coeff.   Std. Coeff.  VIF    Std. Error  T-stat.  P-value

Constant                   14.722                       0.024       622.852  .000
Beds1                      -0.086   -0.013       1.029  0.064       -1.352   .176
Beds2                      -0.034   -0.018       1.158  0.021       -1.675   .094
Beds4                      0.038    0.033        1.350  0.013       2.909    .004
Beds5                      0.071    0.039        1.293  0.020       3.522    .000
Beds6                      0.092    0.029        1.182  0.033       2.754    .006
Beds7                      0.118    0.017        1.066  0.071       1.649    .099
Beds8                      0.524    0.045        1.017  0.115       4.562    .000
Beds9                      0.085    0.004        1.013  0.214       0.399    .690
Beds10                     0.201    0.007        1.015  0.303       0.663    .507
Poor quality               -0.080   -0.006       1.897  0.185       -0.433   .665
Fair quality               -0.002   0.000        1.267  0.064       -0.036   .972
Good quality               0.089    0.082        2.537  0.017       5.227    .000
Very good quality          0.266    0.050        1.123  0.056       4.773    .000
Excellent quality          0.350    0.075        1.282  0.052       6.768    .000
Poor condition             0.058    0.005        1.900  0.157       0.371    .711
Fair condition             -0.019   -0.004       1.264  0.051       -0.379   .705
Good condition             0.048    0.044        2.505  0.017       2.843    .004
Excel. condition           -0.066   -0.027       1.519  0.030       -2.240   .025
Storey 2                   0.148    0.137        1.377  0.012       11.904   .000
Storey 3                   0.342    0.104        1.264  0.036       9.458    .000
Sub-economic style         0.070    0.004        1.019  0.175       0.396    .692
Unconventional style       0.169    0.050        1.143  0.036       4.722    .000
Georgian victor style      0.117    0.024        1.034  0.048       2.439    .015
Cape Dutch style           -0.077   -0.009       1.018  0.084       -0.910   .363
Maisonette style           -0.006   -0.001       1.042  0.051       -0.125   .900
Mediterranean style        0.233    0.017        1.034  0.137       1.700    .089
Partially obstructed view  0.100    0.039        1.090  0.026       3.820    .000
Below average view         -0.019   -0.003       1.019  0.070       -0.265   .791
Above average view         0.115    0.101        1.268  0.013       9.184    .000
Panoramic view             0.242    0.158        1.382  0.018       13.720   .000
Excellent view             0.287    0.068        1.075  0.043       6.638    .000
Smkt48                     -0.914   -0.190       1.089  0.049       -18.520  .000
Smkt50                     -0.570   -0.270       1.281  0.023       -24.299  .000
Smkt52                     -0.339   -0.237       1.603  0.018       -19.123  .000
Smkt53                     -0.248   -0.152       1.424  0.019       -13.000  .000
Smkt55                     0.502    0.247        1.460  0.024       20.800   .000
Smkt56                     0.202    0.103        1.439  0.023       8.737    .000
Smkt64                     -0.810   -0.091       1.023  0.088       -9.200   .000
Smkt65                     -0.609   -0.028       1.006  0.213       -2.851   .004
Smkt66                     -0.205   -0.069       1.166  0.032       -6.492   .000
Smkt67                     -0.220   -0.126       1.463  0.021       -10.635  .000
Smkt68                     -0.120   -0.056       1.373  0.025       -4.874   .000
Smkt69                     -0.733   -0.147       1.095  0.051       -14.273  .000
Smkt70                     -0.829   -0.224       1.206  0.040       -20.763  .000
Smkt73                     -1.024   -0.058       1.008  0.175       -5.868   .000
SA1                        -0.014   -0.010       1.439  0.016       -0.845   .398
SA2                        -0.035   -0.026       1.463  0.016       -2.157   .031
SA3                        -0.021   -0.016       1.455  0.016       -1.310   .190
SA5                        0.003    0.002        1.423  0.017       0.206    .837
Size                       0.002    0.271        1.789  0.000       20.674   .000
Pool                       0.001    0.050        1.300  0.000       4.451    .000

Note: The dependent variable is Ln Assessed values.

The base time of sale is in the fourth (SA4) semi-annual notation. Properties that
are sold in the fifth semi-annual time of sale (SA5) are positively related to price
with R132,591.65, while properties sold during the first (SA1), second (SA2), and
third (SA3) semi-annual time of sale are negatively related to price. Also
properties that have a swimming pool are positively related to price with a
R7,025.48 premium over properties without a swimming pool.
The coefficients of the semi-log and log-log models reveal that one- and two-
bedroom properties are negatively related to property prices, while properties with
a higher number of bedrooms are positively related to price. Again, as expected,
properties with poor and fair quality grade factors are negatively related to price
while higher quality properties are positively related to price. Properties in poor
or good condition are positively related to price while those in fair or excellent
condition are negatively related to price. Again, this is counterintuitive, as already
noted for the negative coefficient on excellent condition in the linear model. In
addition, properties with two or more stories are positively related to price.
Property size and swimming pool both have a positive correlation with

Exhibit 10. Log-log Regression Model Coefficients

Variable                   Coeff.   Std. Coeff.  VIF    Std. Error  T-stat.  P-value

Constant                   13.492                       0.083       162.123  .000
Beds1                      -0.076   -0.008       1.034  0.093       -0.811   .417
Beds2                      -0.032   -0.011       1.171  0.030       -1.063   .288
Beds4                      0.056    0.034        1.363  0.019       2.917    .004
Beds5                      0.117    0.045        1.291  0.029       3.989    .000
Beds6                      0.177    0.039        1.169  0.048       3.652    .000
Beds7                      0.216    0.021        1.062  0.104       2.076    .038
Beds8                      0.754    0.045        1.017  0.167       4.504    .000
Beds9                      0.282    0.009        1.010  0.312       0.903    .367
Beds10                     0.678    0.015        1.010  0.441       1.536    .125
Poor quality               -0.120   -0.006       1.898  0.271       -0.445   .656
Fair quality               0.008    0.001        1.266  0.094       0.090    .928
Good quality               0.133    0.085        2.540  0.025       5.356    .000
Very good quality          0.380    0.049        1.124  0.081       4.672    .000
Excellent quality          0.527    0.078        1.282  0.075       6.990    .000
Poor condition             0.087    0.005        1.900  0.229       0.379    .705
Fair condition             -0.038   -0.006       1.264  0.074       -0.510   .610
Good condition             0.066    0.042        2.509  0.025       2.646    .008
Excel. condition           -0.100   -0.028       1.519  0.043       -2.311   .021
Storey 2                   0.216    0.138        1.411  0.018       11.739   .000
Storey 3                   0.500    0.106        1.276  0.053       9.432    .000
Sub-economic style         0.128    0.005        1.020  0.256       0.502    .616
Unconventional style       0.242    0.049        1.144  0.052       4.644    .000
Georgian victor style      0.162    0.023        1.035  0.070       2.311    .021
Cape Dutch style           -0.080   -0.006       1.016  0.123       -0.648   .517
Maisonette style           0.053    0.007        1.051  0.074       0.712    .476
Mediterranean style        0.273    0.014        1.035  0.200       1.364    .173
Partially obstructed view  0.130    0.035        1.089  0.038       3.425    .001
Below average view         -0.036   -0.004       1.018  0.102       -0.355   .722
Above average view         0.171    0.104        1.268  0.018       9.329    .000
Panoramic view             0.357    0.162        1.385  0.026       13.856   .000
Excellent view             0.437    0.071        1.078  0.063       6.924    .000
Smkt48                     -1.300   -0.187       1.094  0.072       -18.019  .000
Smkt50                     -0.813   -0.267       1.280  0.034       -23.775  .000
Smkt52                     -0.482   -0.235       1.612  0.026       -18.624  .000
Smkt53                     -0.344   -0.146       1.424  0.028       -12.374  .000
Smkt55                     0.722    0.246        1.460  0.035       20.510   .000
Smkt56                     0.357    0.126        1.392  0.033       10.768   .000
Smkt64                     -1.169   -0.091       1.024  0.128       -9.097   .000
Smkt65                     -0.874   -0.028       1.006  0.311       -2.805   .005
Smkt66                     -0.297   -0.069       1.165  0.046       -6.453   .000
Smkt67                     -0.305   -0.121       1.460  0.030       -10.114  .000
Smkt68                     -0.153   -0.050       1.371  0.036       -4.266   .000
Smkt69                     -0.961   -0.133       1.120  0.076       -12.685  .000
Smkt70                     -1.117   -0.209       1.241  0.059       -18.907  .000
Smkt73                     -1.452   -0.057       1.009  0.255       -5.700   .000
SA1                        -0.026   -0.013       1.439  0.024       -1.072   .284
SA2                        -0.056   -0.029       1.462  0.023       -2.409   .016
SA3                        -0.033   -0.017       1.456  0.024       -1.413   .158
SA5                        -0.001   0.000        1.423  0.024       -0.031   .976
Size                       0.304    0.259        1.899  0.016       18.946   .000
Pool                       0.014    0.044        1.318  0.004       3.855    .000

Note: The dependent variable is Ln Assessed values.

price in the semi-log model. Similar variables have a positive influence on property
price in the log-log model. The results show that the three regression models have
dissimilar significant variables: the linear model has the fewest significant
variables (28), the semi-log model slightly more (34), and the log-log model the
most (35). In addition, more variables carry the expected signs in the log-
transformed models than in the linear regression.
In general, the results of the log-transformed regressions are better than those of
the linear regression, improving predictive accuracy. McCluskey (2016) observed
that it is imperative to have quality control measures, such as the log
transformation, to confirm the results of a predictive model before such estimates
are used for taxation purposes. This analysis shows that the semi-log model can be
used to achieve the objective of tax assessors in arriving at a tax amount for
properties in Cape Town. As noted earlier, due to the limitations of the hedonic
models, various improvements were made to enhance their predictive accuracy.
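As a concrete illustration of the semi-log specification behind Exhibit 9, ordinary least squares can be run on the natural log of price. The sketch below uses synthetic data and hypothetical attribute names (size, pool, one submarket dummy), not the Cape Town dataset; the true coefficients are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic characteristics: floor area, pool dummy, one submarket dummy.
size = rng.uniform(80, 400, n)
pool = rng.integers(0, 2, n)
smkt = rng.integers(0, 2, n)

# Semi-log data-generating process: ln(price) is linear in the attributes.
ln_price = 13.5 + 0.002 * size + 0.05 * pool - 0.6 * smkt + rng.normal(0, 0.05, n)

# Design matrix with an intercept column, then OLS via least squares.
X = np.column_stack([np.ones(n), size, pool, smkt])
beta, *_ = np.linalg.lstsq(X, ln_price, rcond=None)

# Each slope is (approximately) the proportional price change per unit of the attribute.
print(beta.round(3))
```

In this form a coefficient such as 0.002 on size means each additional square metre raises price by roughly 0.2%, which is how the semi-log coefficients in Exhibit 9 are read.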

Exhibit 11. Goodness-of-Fit Measurements and Prediction Comparison of BP, LM, PBCG, and SCG Algorithms

Performance Measures  R2      Mean Ratio  Median Ratio  PRD     COD %  MAE          RMSE

SCG   All data        0.7590  1.0263      0.9941        1.0275  11.77  256,469.9    397,494.6
      Train data      0.7558  1.0251      0.9940        1.0277  11.82  1,762,084.0  2,889,345.0
      Test data       0.7653  1.0291      0.9965        1.0272  11.62  253,126.3    407,919.9
PBCG  All data        0.7504  1.0276      0.9978        1.0257  11.75  255,564.8    527,264.9
      Train data      0.7509  1.0279      0.9979        1.0262  11.85  257,894.6    571,605.3
      Test data       0.7495  1.0269      0.9978        1.0245  11.50  250,131.6    405,444.6
LM    All data        0.7583  1.0129      0.9957        1.0229  10.77  239,297.9    513,824.2
      Train data      0.7973  1.0139      0.9936        1.0239  10.98  243,832.4    559,611.0
      Test data       0.6716  1.0106      1.0003        1.0207  10.26  228,723.7    386,524.8
BP    All data        0.6789  1.0461      1.0278        1.0829  25.14  1,057,708.0  1,773,281.0
      Train data      0.6618  0.8262      0.7951        1.0180  28.76  1,288,025.0  2,042,443.0
      Test data       0.7749  1.0859      1.0413        1.0606  24.16  1,013,821.0  1,410,072.0

The Influence of ANNs Training Algorithms in Mass Appraisal

The structure of the ANNs is different from the regression-based techniques. There
is no a priori pre-specification of the relation between the dependent and
independent variables. The variables used for ANN training and testing are
identical but have differing composition as previously noted. Four ANN training
algorithms (BP, SCG, PBCG, and LM) were used in this assessment. The results
are presented in Exhibit 11. The performance of the models reveals a very good
squared correlation between the actual assessed values and the estimated values for
the all data, training, and testing datasets. For the BP training algorithm, the R2
reveals that the ANN architecture explains 67.9% of prices for all data, 66.2% for
training data, and 77.5% for testing data. There is improvement in the performance
of the SCG, PBCG, and LM relative to the BP in explaining the variation in
property prices. Specifically, the SCG algorithm explains 75.9%, 75.6%, and
76.5% of variation in property prices for all data, training, and testing data,
respectively. In addition, the R2 statistic measures for PBCG revealed 75%, 75.1%,
and 74.9% variation in property prices for all data, training, and testing data,
respectively. The LM algorithm explains 75.8%, 79.7%, and 67.2% of the variation
in property prices for the all data, training, and testing datasets, respectively.
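The R2 reported throughout this section is the squared correlation between the actual assessed values and the model estimates. A small helper makes the measure concrete; the values below are hypothetical, for illustration only.

```python
import numpy as np

def r_squared(assessed, estimated):
    """Squared Pearson correlation between assessed and estimated values."""
    return np.corrcoef(assessed, estimated)[0, 1] ** 2

# Hypothetical assessed and estimated values.
assessed  = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
estimated = np.array([1.1, 1.9, 3.2, 3.8, 5.3])
print(round(r_squared(assessed, estimated), 3))
```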
The MAE accuracy test statistic reveals the LM algorithm predicts property prices
closer to the assessed values than all other training algorithms for all data, training,


and testing data (239,298, 243,832, and 228,724). The MAE accuracy test shows the
BP training algorithm performs worst in the all data and testing data (1,057,708
and 1,013,821), although its training-data MAE (1,288,025) slightly outperforms
that of the SCG.
The RMSE accuracy test shows the BP algorithm performs poorly in all data and
test data in comparison to the other training algorithms. The BP algorithm also
marginally outperforms the SCG in the training dataset. The LM reveals a lower
RMSE in the training and testing dataset, thus showing that in terms of model
prediction and accuracy, it is better than other training models.
The results of the different ANN training algorithms when viewed in the light of
the IAAO approved guidelines for mass appraisal reveal that apart from the BP
training algorithm, all other models demonstrate consistency between lower valued
and higher valued properties. The PRD for all other models, and for the training
data of BP, lies between 1.01 and 1.03, indicating that neither lower nor higher
valued properties are favored. In fact, the BP algorithm performs better in one
scenario: its training-data PRD is lower than that of the other models.
Again, the results show that the LM algorithm
performs better in the training and testing datasets in terms of their COD statistic
but the BP algorithm performs poorly in terms of uniformity and horizontal or
random dispersion (COD). The results reveal a high similarity of results among
the different algorithms. Therefore, to arrive at a definite conclusion on which
training algorithm performs optimally, a reliability ranking order as used in
McCluskey et al. (2013) was employed. The analysis was carried out on the R2,
PRD, COD, MAE and RMSE for all data, training, and testing datasets.
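The PRD and COD statistics used above follow the standard IAAO ratio-study definitions. A minimal sketch of their computation is below; the appraisal figures are hypothetical and only illustrate the formulas, not the study data.

```python
import numpy as np

def ratio_stats(predicted, assessed):
    """Median ratio, COD, and PRD, following the IAAO ratio-study definitions."""
    r = np.asarray(predicted) / np.asarray(assessed)
    med = np.median(r)
    # COD: average absolute deviation from the median ratio, as a percentage;
    # lower values indicate more uniform (less dispersed) ratios.
    cod = 100.0 * np.mean(np.abs(r - med)) / med
    # PRD: mean ratio divided by the value-weighted mean ratio;
    # values above ~1.03 suggest regressivity (higher-value homes under-appraised).
    prd = np.mean(r) / (np.sum(predicted) / np.sum(assessed))
    return med, cod, prd

# Hypothetical assessed values and model predictions.
assessed  = np.array([100_000, 250_000, 400_000, 800_000, 1_200_000], dtype=float)
predicted = np.array([105_000, 240_000, 420_000, 790_000, 1_150_000], dtype=float)
print(ratio_stats(predicted, assessed))
```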
The results of reliability ranking order in Exhibit 12 reveal the LM training
algorithm outperforms all other models in the study. The LM algorithm has the
properties of gradient descent and Newton speed, which enables it to train the
ANNs faster and better than the most widely used BP algorithm. Although the
SCG and PBCG algorithms perform well, they nonetheless could not surpass the
LM training algorithm.
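The blend of gradient descent and Newton's method that gives the LM algorithm its speed can be written explicitly. With residual vector $e$, Jacobian $J$ of the residuals with respect to the network weights $w$, and damping parameter $\mu$ (notation as in Hagan and Menhaj, 1994, cited below), the weight update is:

```latex
\Delta w = -\left( J^{\top} J + \mu I \right)^{-1} J^{\top} e
```

A large $\mu$ yields a small gradient-descent step, while $\mu \to 0$ recovers the Gauss-Newton step; the algorithm adapts $\mu$ between these regimes during training.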

Comparison of the Performance of LMANNs and Semi-log Models

Previous studies are not wholly conclusive in terms of model superiority.
McCluskey et al. (2012) observe that the right question on model superiority is
not rightly asked. The correct question should be: Which modeling approach meets
the rigorous standards of transparency, stability of output, predictive ability and
defensibility for the mass appraisal industry? This question is worthy of use in
this study to select the optimal model for the Cape Town property market. The
analyses reveal the baseline model as the semi-log, and the optimal ANN training
algorithm is the Levenberg-Marquardt. The tests used for comparison are the
model performance and reliability ranking order, model explainability ranking
order (McCluskey et al., 2013) and prediction accuracy of a model within 10%

Exhibit 12. Performance of Algorithms and Reliability Ranking Order

Training Algorithms R2 PRD COD MAE RMSE Overall Rank

Panel A: All data

SCG 1 3 2 3 1 2.0
PBCG 3 2 1 2 3 2.2
LM 2 1 2 1 2 1.6
BP 4 4 4 4 4 4.0

Panel B: Training data

SCG 2 4 2 4 4 3.2
PBCG 3 3 3 2 2 2.6
LM 1 2 1 1 1 1.2
BP 4 1 4 3 3 3.0

Panel C: Testing data

SCG 2 3 3 3 3 2.8
PBCG 3 2 2 2 2 2.2
LM 4 1 1 1 1 1.6
BP 1 4 4 4 4 3.4

Exhibit 13. Model Prediction within 10% and 20% of Actual Assessed Values

Model Percent within 10% Percent within 20%

Semi-log 43.4% 50.5%
LMANNs 51.1% 57.1%

and 20% of the actual assessed values. The model that performs best in all
measures is selected as the Cape Town mass appraisal model.
The first test is to assess the level of prediction accuracy that falls within 10%
and 20% of the assessed values as used by Thibodeau (2003) and McCluskey et
al. (2013). The results are presented in Exhibit 13. The results reveal that semi-
log model fails the minimum standard of 50% prediction accuracy within 10% of
the assessed values but the LMANNs marginally achieve the benchmark of 50%.
These models have a good predictive performance. There is a slight improvement
in the semi-log model when the benchmark was increased to prediction within


Exhibit 14. General Model Performance and Reliability Ranking

Accuracy Measures Semi-log LMANNs

R2 2 1
Median ratio 2 1
Mean ratio 2 1
PRD 2 1
COD 2 1
MAE 2 1
RMSE 2 1
Overall rank 2.00 1.00

20% of the assessed values. In all, the LMANNs performed best in this test. The
model has a good prediction accuracy, with 51.1% of properties appraised within
10% and 57.1% within 20% of the assessed values.
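The within-10% and within-20% test can be sketched as follows; the numbers below are hypothetical and only illustrate the computation, not the Cape Town results.

```python
import numpy as np

def pct_within(predicted, assessed, band):
    """Share of predictions whose absolute relative error is within `band`."""
    err = np.abs(np.asarray(predicted) - np.asarray(assessed)) / np.asarray(assessed)
    return 100.0 * np.mean(err <= band)

# Hypothetical assessed values and model predictions.
assessed  = np.array([200_000, 500_000, 750_000, 1_000_000.0])
predicted = np.array([215_000, 460_000, 740_000, 1_250_000.0])
print(pct_within(predicted, assessed, 0.10))  # share within 10%
print(pct_within(predicted, assessed, 0.20))  # share within 20%
```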
The second test employed is the performance and reliability ranking. The R2,
median, and mean ratios, COD, PRD, MAE, and RMSE accuracy measures are
used. The accuracy measures were selected because they are found in all platforms
used. The results are summarized in Exhibit 14.
The results presented in Exhibit 14 are tied to the Cape Town property data. The
comparative analysis reveals that the LMANNs outperform the semi-log model.
The LMANNs combine the gradient descent search method with Newton speed,
which enhances their performance. The overall result favors the LMANNs as
the best in terms of predictive accuracy, but it should be noted that within the
mass appraisal field, the ability to support the assessed/estimated values before
a tribunal is equally significant (McCluskey et al., 2012). Although such
challenges against the model are rare, the detailed estimates predicted should be
reasonably comprehensible to the appraiser, so that in the event of any challenge
they can be explicitly explained.
Exhibit 15 provides the explainability ranking of the models, which is very
important in mass appraisal. The hedonic regression model has been used in the
mass appraisal environment despite its shortcomings because it provides simple
and consistent statistical evidence that supports objective judgement about the
superiority of a predictive method in terms of R2, adjusted R2, F-statistics,
t-statistics, and the significance levels of variables. McCluskey et al. (2013)
note that the traditional hedonic regression models are preferred because of the
single set of parameter estimates that are consistently applied. These qualities
give the traditional regression models an edge over the other models used
in this comparative analysis; hence, the semi-log model occupies a unique

Exhibit 15. Model Explicit Explainability Ranking

Measurement Criteria Semi-log LMANNs

Simple 1 2
Consistent 1 2
Transparency 1 2
Locational 1 2
Applicability 1 2
Mean ranking 1.00 2.00

position, as shown in Exhibit 15. In terms of the explicit nature of location, the
traditional semi-log models are not inherently locational (McCluskey et al., 2013).
The linear, semi-log, and log-log models do not capture explicit location (x, y
coordinates), although locational dummies can be used. The locational dummies
could be neighborhood, political or administrative units or known submarkets.
The ANN training algorithms have the ability to recognize patterns; hence,
locational dummies are equally used. McCluskey et al. (2013) opine that absolute
location can also be used for ANNs but we find no such evidence. The most
important measure is the applicability of models in mass appraisal. It is not
enough for a model to predict values accurately; what matters is the extent to
which the model can be used in mass appraisal. Exhibits 13 and 14 reveal the
LMANNs to be the best performing model in terms of prediction accuracy;
however, the results in Exhibit 15 negate this finding because of the non-
transparent nature of the model estimates. The black box nature of the ANN
training algorithms makes them difficult to implement in mass appraisal. The
hedonic regression models, particularly the semi-log form, can be implemented
for mass appraisal due to the explicit manner in which their parameter estimates
are provided.

u Conclusion
The importance of modeling property prices in a way that avoids the many
parametric restrictions of hedonic regression has resulted in a plethora of
academic research over the last 40 years. The BP training algorithm of the ANNs is one
such model that has been applied in modeling price with varying degrees of
success. However, a few studies have taken a contrary position that negates such
claims. This study represents a first attempt to evaluate the feasibility and
effectiveness of other ANN training algorithms in modeling property prices and
comparing results with hedonic regression models. Using industry and standard
benchmark accuracy statistical tests, the results reveal that the PBCG, LM, and
SCG training algorithms outperform the traditional BP algorithm in this study


with the overall performance credited to the LM training algorithm. The LMANNs
combine gradient descent with Newton speed, thus enhancing the speed with which
they train the ANNs. In general, the ANNs do not require much effort in
pre-processing data. They also do not require the creation of binary variables or
transformations to linearize variables, and overfitting is less of an issue with
these models. They do, however, suffer from non-transparency in the interpretation
of coefficients, and the trial-and-error procedure of ANNs can produce slightly
inconsistent and unreliable results each time the technique is used.
The hedonic regression models have long been used to provide a well-established
approach in mass appraisal. In contrast to the ANNs, the regression models create
coefficients for each variable, providing a transparent platform for explanation and
defense should the appraisal exercise be called into question. The ability of the
regression-based models to replicate findings irrespective of the software utilized
is an added advantage of the model. The results of this study reveal that while
the LMANNs perform best in terms of prediction accuracy within the 10% and
20% of the assessed values and performance reliability ranking order, the model
was rejected on the basis of non-transparency and applicability within the mass
appraisal environment. The results favor the semi-log model; although its
predictive accuracy was below that of the LMANNs, it is superior due to its
simplicity, consistency, locational ability, transparency, and applicability in the
mass appraisal domain.
For future research, it might be of interest to improve the degree of explainability
of ANNs through research that promotes the opening of the black box (to see
what is happening inside) so that parameter estimates can be viewed explicitly.
Until this is done, these models would continue to be regarded as techniques that
support and complement econometric analysis and not as substitutes. Data from
other jurisdictions within the South African property market would be useful to
determine the robustness of the findings. It will also be of great interest to utilize
property data from other regions within Sub-Saharan Africa to ascertain the
robustness of the models. Finally, this study adds voice to the view of McCluskey
et al. (2012) that, given the computing abilities of the regression techniques and
the ANNs, a hybrid of the two should be formed such that the results from
ANNs could be used to smooth the output generated by the hedonic regression
models. This is because, despite the higher performance of the other training
algorithms in this study, they are no different from the back-propagation training
algorithm in terms of transparency.

u References
Al-Saqer, S.M. and G.M. Hassan. Artificial Neural Networks based Red Palm Weevil
(Rynchophorus Ferrugineous, Olivier) Recognition System. American Journal of
Agricultural and Biological Sciences, 2011, 6:3, 356–64.
Ambarish, G.M. and K.L. Saroj. Neural Network Pattern Classification and Weather
Dependent Fuzzy Logic Model for Irrigation Control in WSN-based Precision Agriculture.
Procedia Computer Science, 2016, 78, 499–506.

Andrei, N. Scaled Conjugate Gradient Algorithms for Unconstrained Optimization.
Computational Optimization and Applications, 2007, 38:3, 401–16.
——. Open Problems in Nonlinear Conjugate Gradient Algorithms for Unconstrained
Optimization. Bulletin of the Malaysian Mathematical Sciences Society, 2011, 34:2, 319–
30.
Battiti, R. First- and Second-order Methods for Learning: between Steepest Descent and
Newton’s Method. Neural Computation, 1992, 4, 141–66.
Beale, E.M.L. A Derivative of Conjugate Gradients. In: Numerical Methods for Nonlinear
Optimisation, F.A. Lootsma (ed.). London: Academic Press, 1972.
Borst, R.A. Artificial Neural Networks: The Next Modelling/Calibration Technology for
the Assessment Community? Property Tax Journal, 1991, 10:1, 69–94.
——. Artificial Neural Networks in Mass Appraisal. Journal of Property Tax Assessment
& Administration, 1995, 1:2, 5–15.
——. Discovering and Applying Locational Influence Pattern in the Mass Valuation of
Domestic Real Property. Ph.D. thesis submitted to the faculty of engineering, University
of Ulster, U.K., 2007.
——. Evaluation of the Fourier Transformation for Modelling Time Trends in a Hedonic
Model. Journal of Property Tax Assessment, 2008, 6:4, 33–40.
——. Time-varying Parameters: Obtaining Time Trends in a Hedonic Model without
Specifying their Functional Form. Journal of Property Tax Assessment, 2009, 6:4, 29–36.
Bourassa, S.C., E. Cantoni, and M. Hoesli. Spatial Dependence, Housing Submarkets and
House Prediction. Journal of Real Estate Finance & Economics, 2007, 35, 143–60.
Court, A.T. Hedonic Price Indexes with Automobile Examples, In: The Dynamics of
Automobile Demand. New York: General Motors Corporation, 1939.
Cropper, M.L., L.B. Deck, and K.E. McConnell. On the Choice of Functional Form for
Hedonic Price Functions. The Review of Economics and Statistics, 1988, 70:4, 668–78.
Des Rosier, F. and M. Thériault. Mass Appraisal, Hedonic Price Modelling and Urban
Externalities: Understanding Property Value Shaping Processes. In: Mass Appraisal
Methods: An International Perspective for Property Valuers, T Kauko, and M. d’Amato
(eds.). Oxford: Wiley-Blackwell, 2008.
Do, A.Q. and G. Grudnitski. A Neural Network Approach to Residential Property
Appraisal. The Real Estate Appraiser, 1992, 58:3, 38–45.
El Hamzaoui, Y. and J.A.H. Perez. Application of Artificial Neural Networks to Predict
the Selling Price in the Real Estate Valuation Process. Proceedings of 10th Mexican
International Conference on Artificial Intelligence, Puebla, Mexico, November 26–
December 4, 2011.
Engelbrecht, A.P. Computational Intelligence: An Introduction. Second edition. England:
John Wiley & Sons, Ltd., 2007.
Evans, A., H. James, and A. Collins. Artificial Neural Networks: An Application to
Residential Valuation in the U.K. Journal of Property Valuation and Investment, 1992, 11:
2, 195–204.
Feng, Y. and K. Jones. Comparing Multilevel Modelling and Artificial Neural Networks in
House Price Prediction. 2015 2nd IEEE International Conference on Spatial Data Mining
and Geographical Knowledge Services (ICSDM 2015). Proceedings of a meeting held 8–
10 July 2015, Fuzhou, China. Institute of Electrical and Electronics Engineers (IEEE),
2015.


Gloudemans, R.J. Adjusting for Time in Computer-assisted Mass Appraisal, Property Tax
Journal, 1990, 83–99.
——. Comparison of Three Residential Regression Models: Additive, Multiplicative, and
Nonlinear. Assessment Journal, 2002, 9:4, 25–36.
Goodman, A.C. Hedonic Prices, Price Indices and Housing Markets. Journal of Urban
Economics, 1978, 5, 471–84.
Goodman, A.C. and T.G. Thibodeau. Housing Market Segmentation. Journal of Housing
Economics, 1998, 7, 121–43.
Greene, H.W. Econometric Analysis. Fifth Edition. Upper Saddle River, NJ: Prentice Hall,
2003.
Guan, J., J. Zurada, and A.S. Levitan. An Adaptive Neuro-fuzzy Inference System-based
Approach to Real Estate Property Assessment. Journal of Real Estate Research, 2008, 30:
4, 395–420.
Hagan, M.T. and M.B. Menhaj. Training Feedforward Networks with the Marquardt
Algorithm. IEEE Transactions on Neural Networks, 1994, 5:6, 989–93.
Hornik, K. Approximation Capabilities of Multilayer Feed-forward Networks. Neural
Networks, 1991, 4, 251–57.
IAAO. 2013. Standard on Mass Appraisal of Real Property. Accessed February 4, 2015 at:
http://katastar.rgz.gov.rs/masovna-procena/Files/2.Standard of Mass Appraisal of
Real Property 2013.pdf.
Janssen, C. and B. Söderberg. Estimating Market Prices and Assessed Values for Income
Properties. Urban Studies, 1999, 36:2, 359–96.
Kang, H.B. and A.K. Reichert. An Evaluation of Alternative Estimation Techniques and
Functional Forms in Developing Statistical Appraisal Models. Journal of Real Estate
Research, 1987, 2:1, 1–27.
Kauko, T. and M. d’Amato. Introduction: Suitability Issues in Mass Appraisal Methodology.
In: Mass Appraisal Methods: An International Perspective for Property Valuers, T. Kauko
and M. d’Amato, (eds). Oxford: Wiley-Blackwell, 2008.
KPMG. Exploring the Benefits of Technology to Government and Society: A Study of
Thomson Reuters Aumentum in the City of Cape Town, South Africa. Delaware: KPMG,
LLP, 2015.
Kwok, T.Y. and D.Y. Yeung. Constructive Algorithms for Structure Learning in
Feedforward Neural Networks for Regression Problems. IEEE Transactions on Neural
Networks, 1997, 8:3, 630–45.
Lancaster, K. A New Approach to Consumer Theory. The Journal of Political Economy,
1966, 74:2, 23–9.
Lenk, M.M., E.M. Worzala, and A. Silva. High-tech Valuation: Should Artificial Neural
Networks Bypass the Human Valuer? Journal of Property Valuation and Investment, 1997,
15, 8–26.
Levenberg, K. A Method for the Solution of Certain Problems in Least Squares. Quarterly
of Applied Mathematics, 1944, 2, 164–68.
Limsombunchai, V., C. Gan, and M. Lee. House Price Prediction: Hedonic Price Model vs
Artificial Neural Network. American Journal of Applied Sciences, 2004, 1:3, 193–201.
Lin, C.C. and S.B. Mohan. Effectiveness Comparison of the Residential Property Mass
Appraisal Methodologies in the USA. International Journal of Housing Markets and
Analysis, 2011, 4:3, 224–43.

Malpezzi, S., L. Ozanne, and T. Thibodeau. Characteristic Prices of Housing in Fifty-nine
Metropolitan Areas. Research report. Washington, D.C.: The Urban Institute, December,
1980.
Marquardt, D. An Algorithm for Least-Squares Estimation of Nonlinear Parameters. Journal
of the Society for Industrial and Applied Mathematics, 1963, 11:2, 431–41.
Masters, T. Practical Neural Network Recipes in C++. Boston, MA: Academic Press, 1993.
McCluskey, W.J. Real Property Taxation in the Republic of Kazakhstan. Land Tenure
Journal, 2016, 2, 119–38.
McCluskey, W.J. Predictive Accuracy of Machine Learning Models for the Mass Appraisal
of Residential Property. New Zealand Valuers’ Journal, 1996, July, 41–7.
McCluskey, W.J. and S. Anand. The Application of Intelligent Hybrid Techniques for the
Mass Appraisal of Residential Properties. Journal of Property Investment and Finance,
1999, 17:3, 218–38.
McCluskey, W.J. and R.A. Borst. Specifying the Effect of Location in Multivariate
Valuation Models for Residential Properties. Property Management, 2007, 23:4, 313–43.
——. Detecting and Validating Residential Housing Submarkets: A Geostatistical Approach
for Use in Mass Appraisal. International Journal of Housing Market and Analysis, 2011,
4:3, 290–318.
McCluskey, W.J., P. Davis, M. Haran, M. McCord, and D. McIlhatton. The Potential of
Artificial Neural Networks in Mass Appraisal: The Case Revisited. Journal of Financial
Management of Property and Construction, 2012, 17:3, 274–92.
McCluskey, W.J., M. McCord, P.T. Davis, M. Haran, and D. McIlhatton. Prediction
Accuracy in Mass Appraisal: A Comparison of Modern Approaches. Journal of Property
Research, 2013, 30:4, 239–65.
McGreal, S., A. Adair, D. McBurney, and D. Patterson. Neural Networks: The Prediction
of Residential Values. Journal of Property Valuation and Investment, 1998, 16:1, 57–70.
Møller, F.M. A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning. Neural
Networks, 1993, 6, 525–33.
Nguyen, N. and A. Cripps. Predicting Housing Values: A Comparison of Multiple
Regression Analysis and Artificial Neural Networks. Journal of Real Estate Research, 2001,
22:3, 313–36.
Openshaw, S. Neural Network, Genetic, and Fuzzy Logic Models of Spatial Interaction.
Environment and Planning A, 1998, 30, 1857–72.
Orozco, J. and C. García. Detecting Pathologies from Infant Cry Applying Scaled
Conjugate Gradient Neural Networks. Paper presented at the European Symposium on
Artificial Neural Networks, Bruges, Belgium, April 23–25, 2003.
Palmquist, R.B. Alternative Techniques for Developing Real Estate Price Indexes. The
Review of Economics and Statistics, 1980, 62:3, 442–48.
Peterson, S. and A.B. Flanagan. Neural Network Hedonic Pricing Models in Mass Real
Estate Appraisal. Journal of Real Estate Research, 2009, 31:2, 147–64.
Powell, M.J.D. Restart Procedures for the Conjugate Gradient Method. Mathematical
Programming, 1977, 12, 241–54.
Rosen, S. Hedonic Prices and Implicit Markets: Product Differentiation in Pure
Competition. Journal of Political Economy, 1974, 82:1, 34–55.

Rumelhart, D.E., G.E. Hinton, and R.J. Williams. Learning Representations by Back-
propagating Errors. Nature, 1986, 323, 533–36.
Schulz, R., M. Wersing, and A. Werwatz. Automated Valuation Modelling: A Specification
Exercise. Journal of Property Research, 2014, 31:2, 131–53.
Sharma, A.K., R.K. Sharma, and H.S. Kasana. Prediction of First Lactation 305-day Milk
Yield in Karan Fries Dairy Cattle using ANN Modelling. Applied Soft Computing, 2007,
7:3, 1112–20.
Sirmans, G.S., D.A. Macpherson, and E.N. Zietz. The Composition of Hedonic Pricing
Models. Journal of Real Estate Literature, 2005, 13:1, 3–43.
Tay, D.P.H. and D.K.K. Ho. Artificial Intelligence and the Mass Appraisal of Residential
Apartments. Journal of Property Valuation and Investment, 1992, 10, 525–40.
Thibodeau, T.G. Marking Single-family Property Values to Market. Real Estate Economics,
2003, 31:1, 1–22.
Wilamowski, B.M., O. Kaynak, S. Iplikci, and M.Ö. Efe. An Algorithm for Fast
Convergence in Training Neural Networks. IEEE, 2001, 1778–82.
Wilamowski, B.M. and H. Yu. Improved Computation for Levenberg-Marquardt Training.
IEEE Transactions on Neural Networks, 2010, 21:6, 930–37.
Werbos, P.J. Beyond Regression: New Tools for Prediction and Analysis in Behavioral
Sciences. Doctoral dissertation in Applied Mathematics, Harvard University, 1974.
Wilhelmsson, M. Spatial Models in Real Estate Economics. Housing, Theory and Society,
2002, 19, 92–101.
Worzala, E.M., M.M. Lenk, and A. Silva. An Exploration of Neural Networks and its
Application to Real Estate Valuation. Journal of Real Estate Research, 1995, 10:2, 185–
202.
Yu, H. and B.M. Wilamowski. Levenberg-Marquardt Training. In: The Industrial
Electronics Handbook. Vol. 5, Intelligent Systems, second edition. Boca Raton, FL: CRC
Press, 2011a.
——. Neural Network Training with Second Order Algorithms. In: Human–Computer
Systems Interaction. Background and Applications, Z.S. Hippe, et al. (eds.), 2nd Series:
Advances in Intelligent and Soft Computing. Springer-Verlag, 2011b.
Zurada, J., A.S. Levitan, and J. Guan. A Comparison of Regression and Artificial
Intelligence Methods in a Mass Appraisal Context. Journal of Real Estate Research, 2011,
33:3, 349–87.

The opinions expressed in this study are those of the authors. Financial assistance
received from the National Research Foundation (NRF) and the IRE/BS Foundation
for African Real Estate Research is hereby acknowledged. We also thank Justine
Kahonde, a staff member of the City of Cape Town valuation office, for the property
data and other useful information.

Joseph Awoamim Yacim, University of Pretoria, Hatfield 0028, South Africa and
Federal Polytechnic, Nasarawa, Nasarawa 962, Nigeria or Joseph.yacim@up.ac.za.
Douw Gert Brand Boshoff, University of Pretoria, Hatfield 0028, South Africa and
drdgbboshoff@gmail.com.
