
PREDICTION MODEL FOR RESOURCE REQUIREMENT

ANALYSIS OF NETWORK PROJECTS USING ARTIFICIAL


NEURAL NETWORK

Thesis

Submitted in partial fulfilment of the requirements for the degree of

MASTER OF TECHNOLOGY in

CONSTRUCTION TECHNOLOGY AND MANAGEMENT

By

ANKUR MITRA

16CM38F

DEPARTMENT OF CIVIL ENGINEERING

NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA

SURATHKAL, MANGALORE – 575025

MAY 2018
DECLARATION

I hereby declare that the thesis entitled “Prediction Model for Resource Requirement Analysis of
Network Projects Using Artificial Neural Network”, which is being submitted at the
National Institute of Technology Karnataka, Surathkal in partial fulfilment of the
requirements for the award of the Degree of Master of Technology in Construction
Technology and Management is a bonafide report of the research work carried out by me.
The material contained in this study report has not been submitted to any University or
Institution for the award of any degree.

……………………………………………..

ANKUR MITRA

(16CM38F)

Department of Civil Engineering

Place: NITK Surathkal

Date:

CERTIFICATE

This is to certify that the thesis report entitled “Prediction Model for Resource Requirement
Analysis of Network Projects Using Artificial Neural Network” submitted by Mr. ANKUR
MITRA (16CM38F) is a bonafide record of the work carried out by him for the partial
fulfilment of the requirements for the award of Master of Technology in Construction
Technology and Management, Department of Civil Engineering, National Institute of
Technology Karnataka, Surathkal.

……………………………………………….

Dr. C. P. DEVATHA

Assistant Professor

Department of Civil Engineering

N.I.T.K, Surathkal

……………………………………………….

Chairman – DPGC

N.I.T.K, Surathkal

ACKNOWLEDGEMENT

I would like to express my wholehearted appreciation to my project guide and supervisor Dr.
C.P. Devatha for her continued support and encouragement. She always gave me the freedom to
maintain an unrelenting work process and gave her valuable suggestions whenever I required
them. Her critical approach to every nuanced aspect was crucial in making this report as
flawless as possible.

I would also like to express my heartiest gratitude to Mr. G. BALASUBRAMANIAN
(Internal Guide - L&T). His constant guidance and advice played a vital role in the
execution of this report.

I am very thankful to L&T Construction-Water & Effluent Treatment IC for giving me the
opportunity and guiding me throughout my summer internship. It was a unique and very good
learning experience for me to be involved in.

During the period of my internship work, I received generous help from many
departments of the IT City, Mohali site, like Accounting, Planning, Contracts and Tendering,
Quality Control and Plants and Machinery, which I would like to put on record here with deep
gratitude and great pleasure.

I would also like to extend my warmest gratitude to Mr. Manoj Kumar Dubey (Project
Manager of IT City, Mohali during my visit) for allowing me to encroach upon his precious
time and giving me all the necessary help and insights required for my study.

I would like to extend my appreciation towards all the staff members of the Mohali project
site and L&T internal BIS students of NITK, Surathkal for their encouragement, support and
guidance throughout this venture.

Ankur Mitra

(16CM38F)

ABSTRACT
Construction projects are highly complex and subtly evolving endeavours aimed at the time-bound
achievement of predetermined objectives. The precise art of achieving the exact
goals without any hindrances and delays is still elusive. A majority of projects still
work on half-baked predictions and results obtained from previous experience. The biggest
crisis facing the construction industry today is the need to accurately optimize all the
essential parameters of change.

In this study, an attempt has been made to learn, from past construction projects, all
the important factors that control resources and their availability (the resource parameters
include the 4Ms across activities such as excavation and backfilling, masonry, piling, road
laying, reinforcement and concreting). The success or failure of past projects has been utilised to
train a model for future projects, to analyse these changing factors to a quantifiable degree and
implement them to obtain a close fit of resource requirement and scheduling for that
particular project. In the study, an Artificial Neural Network (ANN) has been used to learn and
train a model coded in MATLAB. The Non-linear Auto-Regressive network with
eXogenous input (NARX) model extracts vital information and hidden patterns from a large
database to create near-accurate predictions.

All subsequent data have been obtained from individual sites under the Water & Effluent
Treatment IC of L&T. Within the limited context of the project, resource analysis of waste water
(and network) projects is used. The results obtained underline the need to quantify unknown
factors to a suitable degree and classify future projects for accurate predictions. The finalized
weights for the parameters of the case studies were obtained. In the 1st case study, the
weights in order of the above parameters were 5.1159, 3.0205, 2.2348, 1.087 and 0.5289 with a
mean squared error of 170.85. For the 2nd case study, the weights in order of the parameters
above were 12.666, 12.902, 6.514 and 1.307 with a mean squared error of 7.17 × 10⁶. The
results obtained from the prediction model are within the acceptable range, as shown in the auto-
correlation error graphs. Although the model works well for the two case studies, errors
occurred due to data insufficiency. The use of Artificial Intelligence methods like ANN
provides very flexible prediction cycles for future projects.

TABLE OF CONTENTS

Acknowledgement………………………………………………………………………..….iv
Abstract..............................................................................................................................…...v
List of Figures……………………………………………………………………………......ix
List of Notations……………………………………………………………………………..xi

CHAPTER 1 INTRODUCTION…………………….……….……………………………..1
1.1 Artificial Neural Network……………….…………………………………...........4
1.1.1 Background…………………..….………………………………………4
1.1.2 Feed Forward ANN……………….…………….……………………….5
1.1.3 Feedback ANN……......…………………………………………………6
1.2 Machine Learning in ANN….....………….……………………………………….7
1.3 ANN in Construction Projects…………….……………………………………….8
1.4 Decision Support System………………….……………………………………....9
1.5 Objective of the study………….………………………………………………...10
1.6 Scope of the study………….…………………………………………………….10

CHAPTER 2 LITERATURE REVIEW…………………….…………………………….11


2.1 Time Series Modelling (NTSTOOL)……………………………………….........14
2.1.1 Components of Time Series…………………………………………....14
2.1.2 Introduction to Time Series Analysis………….……………………….16
2.2 Box Jenkins Methodology….....………………………………………………….17
2.3 Time Series Forecasting using ANN……………………………………………..18
2.4 Nonlinear Auto Regressive Model (NAR)…………………………………….....20
2.4.1 Architecture of Auto Regressive Neural Network…………………......21
2.5 Nonlinear Autoregressive Exogenous Model (NARx)…………………………..22
2.5.1 Structure of Nonlinear ARX Model……………………………………23
2.5.2 Nonlinear ARX Model Orders and Delays…………………………….24
2.6 Defining a Proper Network Architecture………………………………………...24

CHAPTER 3 METHODOLOGY…………………….……………………………………26
3.1 Division of Data……………………………………….........................................27
3.2 Data Processing…………………………………………......................................28
3.3 Determination of Network Architecture………….………………………………28
3.4 Model Building Process….....……………………………………………………29
3.4.1 Obtaining Training Data………………………………………………..29
3.4.2 Training of Input Data…………………………………….....................30
3.4.2.1 Transfer Function………………….........................................31
3.4.2.2 Levenberg Marquardt Algorithm…………………………….32
3.4.2.3 Scaled Conjugate or Conjugate Gradient Algorithm………...33
3.4.2.4 Other Factors………………………………………………....34
3.4.3 Testing and Validation………………………………………................35
3.4.3.1 Problem of Under-fitting or Overfitting……………………...36
3.4.4 Error Analysis – Performance Measures……………………………….36
3.5 Forecast Performance Measures………………………………………………….36
3.5.1 Mean Squared Error……………………………………………………36
3.5.2 Sum of Squared Error…………………………………………………..37
3.5.3 Signed Mean Squared Error……………………………………………37
3.5.4 Root Mean Squared Error………………………………………….......37
3.5.5 Normalized Mean Squared Error………………………………………37
3.5.6 Mean Forecast Error……………………………………………………38
3.5.7 Mean Absolute Error…………………………………………………...38
3.5.8 Mean Absolute Percentage Error………………………………………38
3.5.9 Mean Percentage Error…………………………………………………39

CHAPTER 4 CASE STUDY DETAILS…………………………………………………...40


4.1 Case Study 1…..…………………………….........................................................40
4.1.1 Site Details…………………………………………..............................40
4.1.2 Site Location and Layout………….…………………………………...41
4.1.3 Project Introduction………….…………………………………………42
4.1.4 Project Completion Time………….…………………………………...43
4.2 Case Study 2……… ………………………………..............................................45
4.2.1 Site Details…………………………………………..............................45
4.2.2 Brief Scope of Work………….………………………………………..46

4.2.3 Civil Works………….…………………………………………………47
4.2.4 Electrical Works………….…………………………………………….47

CHAPTER 5 RESULT AND DISCUSSION……………………………………………...48


5.1 Results of Case Study 1...………………………………………………………...48
5.1.1 Project Data………….…………………………………………………48
5.1.2 Choice of Selection of Parameter………….…………………………...48
5.2 Result and Analysis of Case Study 1....………………………………..……..….49
5.2.1 Regression Graph……………………………………………....49
5.2.2 Time Series Graph……………………………………………...52
5.2.3 Error Analysis…………………………………….....................52
5.2.3.1 Error Histogram………………………………………53
5.3 Results of Case Study 2…………………………………………………………..55
5.3.1 Project Data………….…………………………………………………55
5.4 Result and Analysis of Case Study 2....………………………………...………..55
5.4.1 Regression Graph………………………………………………55
5.4.2 Time Series Graph……………………………………………...58
5.4.3 Error Analysis…………………………………….....................58
5.4.3.1 Error Histogram………………………………………59
5.5 Comparative Analysis………………………………………................................61
5.5.1 Data Gaps…………………………………………................................61
5.5.2 Noise - Scattered Data………….………………………………………62
5.5.3 Pseudo Effect………….………………………………………………..62
5.5.4 Large Data Set………….………………………………………………63

CHAPTER 6 CONCLUSION……………………………………………………………...64
REFERENCES……………………………………………………………………………...65

LIST OF FIGURES

Fig 1.1. Basic structure of Artificial Neural Network………………………………….……...5

Fig 1.2. Feed Forward Neural Network……………………………………………….………6

Fig 1.3. Feedback Neural Network………………………………………………….………...6

Fig 1.4. Supervised VS Unsupervised Learning…………………………………….………...7

Fig 1.5. Basic Structure of Decision Support System……………………………….………...9

Fig 2.1. A four phase business cycle……………………………………………….………...15

Fig 2.2. Flowchart of Time Series Analysis……………………………………….…………16

Fig 2.3. Flowchart of Box Jenkins Methodology……………………………….……………17

Fig 2.4. Flowchart of Methodology followed………………………………….…………….19

Fig 2.5. MATLAB NTSTOOL page…………………………………………...…………….19

Fig 2.6. Basic Network of NAR model……………………………………………...……….21

Fig 2.7. NAR network diagram with inputs…………………………………………...……..22

Fig 2.8. Basic Network of NARx model………………………………………...…………...22

Fig 2.9. NARx network diagram with inputs……………………………………...…………23

Fig 2.10. A Proper Network Architecture………………………………………….………...25

Fig 3.1. Basic Architecture of Proposed Network Model……………………...…………….29

Fig 4.1. Site Layout of IT City, Mohali……………………………………………………...41

Fig 4.2. Site Location of IT City, Mohali…………………………………………………….42

Fig 4.3. Project Schedule Completion - Planned vs Actual………………………………….44

Fig 4.4. Reinforcement Works for Ghat Floor Slab at Chaudhary Tola Ghat……………….45

Fig 4.5. Concrete Work under progress using Barge Mounted Batching Plant………….......46

Fig 5.1. Regression Graph of Case Study 1………………….……………...……………….49

Fig 5.2. Performance Graph of Case Study 1…………..……………………...……………..51

Fig 5.3. Time Series Graph of Case Study 1………………………..…………...…………...52

Fig 5.4. Error Histogram of Case Study 1…………………………………..……...………...53

Fig 5.5. Error Auto-Correlation Graph of Case Study 1…………………………....………..54

Fig 5.6. Regression Graph of Case Study 2……………………………………..……….…..56

Fig 5.7. Performance Graph of Case Study 2………………………………….......................57

Fig 5.8. Time Series Graph of Case Study 2……………………………………..…..............58

Fig 5.9. Error Histogram of Case Study 2……………………………………………………59

Fig 5.10. Error Auto-Correlation Graph of Case Study 2…………………………................60

LIST OF NOTATIONS

AIC Akaike Information Criterion

ANN Artificial Neural Network

ARIMA Auto Regression with Integrated Moving Average

ARMA Auto Regression and Moving Average

BIC Bayesian Information Criterion

CGA Conjugate Gradient Algorithm

CM Construction Management

CPM Critical Path Method

DSS Decision Support System

FFBP Feed Forward Back Propagation

GA Genetic Algorithm

LM Levenberg-Marquardt algorithm

MAE Mean Absolute Error

MAPE Mean Absolute Percentage Error

MFE Mean Forecast Error

MPCS Management Planning and Control System

MPE Mean Percentage Error

MSE Mean Squared Error

NAR Non-linear Auto Regressive network

NARx Non-linear Autoregressive with eXogenous inputs

NMSE Normalized Mean Squared Error

NTSTOOL Neural Time Series Tool

PERT Program Evaluation and Review Technique

RMSE Root Mean Squared Error

SCA Scaled Conjugate Algorithm

SD Standard Deviation

SMSE Signed Mean Squared Error

SSE Sum of Squared Error

STAR Smooth Transition Auto Regression

TAR Threshold Auto Regression

TCT Time Cost Trade-off

WET IC Water & Effluent Treatment IC

CHAPTER 1
INTRODUCTION

In a multi-dimensional, critically constrained construction project, resources and their
availability play a crucial role. A project becomes stagnant when resources are not made
available at the right time, leading to high cost and time overruns. Hence it is absolutely
necessary to plan and make decisions well ahead of schedule.

However, this brings us to the next stage of the problem of planning resources. Planning, per
se, is a phenomenon based on experience and predictions. There may be certain cases when
the resources planned fall short of the required target or exceed it by far, both cases
causing huge cost burdens to the overall project. A systematic treatment of resource
planning with accurate predictions has not been achieved so far. Many resource optimization
techniques have been used, such as resource levelling, resource smoothening, data gathering
and representation, and analytical techniques. Use of PRIMAVERA and MS Project for resource
optimization has not yet yielded sufficient results.

Few companies can remain competitive in today’s highly competitive business environment
without effectively managing the cost of resources. In practice, basic PERT and CPM
scheduling techniques have proven to be helpful only when the project deadline is not fixed
and the resources are not constrained by either availability or time. Since this is not practical
even for small-sized projects, several techniques have been used to modify CPM results to
account for practical considerations. In dealing with project resources, the two commonly
discussed techniques are resource allocation and resource levelling. Resource allocation
(sometimes referred to as constrained-resource scheduling) attempts to reschedule the project
tasks so that a limited number of resources can be efficiently utilized while keeping the
unavoidable extension of the project to a minimum. Resource levelling (often referred to as
resource smoothing), on the other hand, attempts to reduce the sharp variations among the
peaks and valleys in the resource demand histogram while maintaining the original project
duration. These techniques, as such, deal with two distinct sub-problems that can only be
applied to a project one after the other rather than simultaneously. Accordingly, they do not
guarantee (either individually or combined) a project schedule that minimizes the overall
project time or cost.
There are a number of factors, however, that make resource management a difficult task, as
summarized in the following points:

• Segregation of Resource Management: Various researchers have introduced a
number of techniques to deal with individual aspects of resource management, such as
resource allocation, resource levelling, cash flow management, and time-cost trade-off
(TCT) analysis. The work of Talbot and Patterson (1979) and Gavish and Pirkul
(1991), for example, deals with resource allocation, while the work of Easa (1989)
and Shah et al. (1993) deals with resource levelling. Other models focused only
on TCT analysis, e.g., Liu et al. (1995). While these studies are beneficial, they deal
with isolated aspects that can be applied to projects one after the other, rather than
simultaneously. Little effort has been made to achieve combined resource optimization
because of the inherent complexity of projects and the difficulties associated with
modelling all aspects combined.
• Inadequacy of Traditional Optimization Algorithms: In the past few decades,
traditional resource optimization was based on either mathematical methods or
heuristic techniques. Mathematical methods, such as integer, linear, or dynamic
programming, have been proposed for individual resource problems. Mathematical
methods, however, are computationally non-tractable for any real-life project of
reasonable size. In addition, mathematical models suffer from being complex in
their formulation and may be trapped in a local optimum. Heuristic methods, on the
other hand, use experience and rules of thumb, rather than rigorous mathematical
formulations. Despite their simplicity, heuristic methods perform with varying
effectiveness when used on different project networks, and there are no hard
guidelines that help in selecting the best heuristic approach to use.
• Difficulties with Simulation Modelling: During the past three decades, computer
simulation has been introduced to support the efficient use of construction resources.
Even though researchers were interested in its ability to mimic real-world
construction processes on computers, construction practitioners may find it difficult to
master. Since simulation is a very beneficial tool for resource planning, extensive research was
directed at developing simulation models of construction operations. However, many
of the existing tools require knowledge of computer programming and a simulation
language, and lack integration with existing project management software and with
optimization algorithms.

• Availability of a New Breed of Tools: Recent developments in computer science
have produced a new breed of tools that can be beneficially utilized for construction
applications.
Based on recent advances in artificial intelligence, a new optimization technique, genetic
algorithms (GA), has emerged. Simulating natural-evolution and survival-of-the-fittest
mechanisms, GAs apply a random search for the optimum solution to a problem. Due to their
perceived benefits, GAs have been used successfully to solve several engineering and
construction management problems.
In the recent past, a move has been made towards the use of algorithms to solve the problem
of optimization. Heuristic, Analytical and multi-objective optimization techniques have been
employed to solve the critical problems of resource allocation. In multi-objective optimization,
in fact, there is not even a universally accepted definition of "optimum" as there is in
single-objective optimization, which makes it difficult to even compare the results of one
method with those of another, because normally the decision about what the "best" answer is
depends on the decision maker. In the
past few decades, traditional resource optimization was based on either mathematical
methods or heuristic techniques. Mathematical methods, such as integer, linear, or dynamic
programming have been proposed for individual resource problems.
Mathematical methods, however, are computationally non-tractable for any real-life project
of reasonable size. In addition, mathematical models suffer from being complex in
their formulation and may be trapped in a local optimum.
Heuristic methods, on the other hand, use experience and rules-of-thumb, rather than rigorous
mathematical formulations. Researchers have proposed various heuristic methods for
resource allocation. Despite their simplicity, heuristic methods perform with varying
effectiveness when used on different project networks, and there are no hard guidelines that
help in selecting the best heuristic approach to use. They, therefore, cannot guarantee
optimum solutions. Furthermore, their inconsistent solutions have contributed to large
discrepancies among the resource-constrained capabilities of commercial project
management software.
In this study, a new genre of decision making is used in order to solve the problem of resource
optimization. Artificial intelligence and the use of deep learning have gathered speed in the
past decade. An Artificial Neural Network has been employed in order to learn from
previous projects and develop a better understanding of resource allocation for future
projects. The complexity of mathematical methods and the unpredictability of heuristic
methods are eliminated in the use of an Artificial Neural Network (ANN). The process involving
ANN uses vast amounts of data from previous projects in order to identify patterns and classify
important factors that sway decisions on resources.

1.1 Artificial Neural Network


1.1.1 Background
Artificial Neural Networks (ANN for short) are practical, elegant, and mathematically
fascinating models for machine learning. They are inspired by the central nervous systems of
humans and animals – smaller processing units (neurons) are connected together to form a
complex network that is capable of learning and adapting.
The idea of such neural networks is not new. McCulloch and Pitts (1943) described binary
threshold neurons back in the 1940s. Rosenblatt (1958) popularised the use of
perceptrons, a specific type of neuron, as very flexible tools for performing a variety of
tasks. The rise of neural networks was halted after Minsky and Papert (1969) published a
book about the capabilities of perceptrons and mathematically proved that they are severely
limited in what they can compute. This result was quickly generalised to all neural networks,
whereas it actually applied only to a specific type of perceptron, leading to neural networks
being disregarded as a viable machine learning method.
In recent years, however, the neural network has made an impressive comeback. Research in
the area has become much more active, and neural networks have been found to be more than
capable learners, breaking state-of-the-art results on a wide variety of tasks. This has been
substantially helped by developments in computing hardware, allowing us to train very large
complex networks in reasonable time.
Artificial Neural Networks are computing systems made up of a number of simple, highly
interconnected processing elements, which process information by their dynamic state
response to external inputs. A neural network is a computational structure inspired by the
study of biological neural processing. The idea of ANNs is based on the belief that the working
of the human brain can be imitated, by making the right connections, using silicon and wires
in place of living neurons and dendrites. The human brain is composed of 86 billion nerve cells
called neurons. They are connected to thousands of other cells by axons. Stimuli from the
external environment or inputs from sensory organs are accepted by dendrites. These inputs
create electric impulses, which quickly travel through the neural network. A neuron can then
either send the message forward to another neuron or not pass it on. ANNs are
composed of multiple nodes, which imitate the biological neurons of the human brain. The neurons
are connected by links and they interact with each other. The nodes can take input data and
perform simple operations on the data. The result of these operations is passed to other
neurons. The output at each node is called its activation or node value. ANNs are considered
nonlinear statistical data modelling tools where the complex relationships between inputs and
outputs are modelled or patterns are found.
Each link is associated with a weight. ANNs are capable of learning, which takes place by
altering weight values. The following Figure 1.1 shows a simple ANN:

Fig 1.1: Basic structure of Artificial Neural Network

There are many different types of neural networks, from relatively simple to very complex.
ANNs can be classified into feed forward and feedback (recurrent) networks.

1.1.2 Feed Forward ANN:


The information flow is unidirectional. A unit sends information to another unit from which it
does not receive any information. There are no feedback loops. They are used in pattern
generation/recognition/classification. They have fixed inputs and outputs. The goal of a
feedforward network is to approximate some function f*. For example, for a classifier, y =
f*(x) maps an input x to a category y. A feedforward network defines a mapping y = f(x;θ)
and learns the value of the parameters θ that result in the best function approximation. These
models are called feed forward because information flows through the function being
evaluated from x, through the intermediate computations used to define f, and finally to the
output y. There are no feedback connections in which outputs of the model are fed back into

5
itself. When feedforward neural networks are extended to include feedback connections, they
are called recurrent neural networks as shown in Figure 1.2.

Fig 1.2: Feed Forward Neural Network
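
As an illustration of the mapping y = f(x; θ) described above, a minimal MATLAB sketch is given below. It is not part of the thesis model; the target function, network size and data are illustrative assumptions only.

    x = linspace(-1, 1, 201);      % input samples
    t = sin(2*pi*x);               % target function the network should approximate
    net = feedforwardnet(10);      % two-layer feed forward network, 10 hidden neurons
    net = train(net, x, t);        % supervised training (Levenberg-Marquardt by default)
    y = net(x);                    % network output, i.e. y = f(x; theta)
    perf = perform(net, t, y);     % mean squared error between target and output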


1.1.3 Feed Back ANN:
Here, feedback loops are allowed. They are used in content addressable memories. The basic
difference between the two is that in feed forward networks the signal is passed in a forward
manner only, till the desired output is obtained from the output layer, whereas in a feedback
network the output obtained is fed back into the network through the input layer; thus this
type of network will have a minimum of a single loop in its structure, as shown in Figure 1.3.

Fig 1.3: Feedback Neural Network

A teacher is assumed to be present during the learning process, when a comparison is made
between the network’s computed output and the correct expected output, to determine the
error. The error can then be used to change network parameters, which result in an
improvement in performance. In unsupervised learning the target output is not presented to
the network. It is as if there is no teacher to present the desired patterns and hence, the system
learns of its own by discovering and adapting to structural features in the input patterns
(Deepthi I. Gopinath and G.S. Dwarakish, 2015).

1.2 Machine Learning in ANNs


ANNs are capable of learning and they need to be trained. There are several learning
strategies:
• Supervised Learning − It involves a teacher that is more knowledgeable than the ANN itself. For
example, the teacher feeds some example data about which the teacher already knows
the answers, such as in pattern recognition. The ANN comes up with guesses while
recognizing. Then the teacher provides the ANN with the answers. The network then
compares its guesses with the teacher’s “correct” answers and makes adjustments
according to the errors.
• Unsupervised Learning − It is required when there is no example data set with known
answers. For example, searching for a hidden pattern. In this case, clustering i.e.
dividing a set of elements into groups according to some unknown pattern is carried
out based on the existing data sets present.
• Reinforcement Learning − This strategy is built on observations. The ANN makes a
decision by observing its environment. If the observation is negative, the network
adjusts its weights to be able to make a different required decision the next time.

Fig 1.4: Supervised VS Unsupervised Learning
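
The difference between the first two strategies can be sketched in MATLAB as follows; the data, network types and sizes are illustrative assumptions and not part of the thesis model.

    x = rand(2, 100);                       % example input data (hypothetical)
    labels = double(sum(x) > 1);
    t = [labels; 1 - labels];               % teacher-provided "correct" answers (one-hot)
    supNet = train(patternnet(10), x, t);   % supervised: learns from known answers

    unsupNet = selforgmap([4 4]);           % unsupervised: no targets are given
    unsupNet = train(unsupNet, x);          % groups inputs by discovered structure
    clusters = vec2ind(unsupNet(x));        % cluster index assigned to each input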

1.3 ANN in Construction Projects
Since the late 1980s, several investigators have applied ANN in civil engineering to carry out a
variety of tasks such as prediction, optimization, system modelling and classification. The
applications of ANN in Construction Management include the following fields: Cost,
Productivity, Risk Analysis, Safety, Duration, Dispute, Unit rate and Hybrid Models. ANNs
were found to learn the relationships between input and output provided through
training data and could generalize the output, making them suitable for non-linear problems
where judgment, experience and surrounding conditions are the key features. ANNs typically
comprise 3 layers, i.e. an input layer with input neurons, hidden layer(s) with hidden neurons
and an output layer with output neurons.
Each neuron in the input layer is connected to each neuron in the hidden layer and each
neuron in hidden layer is connected to each neuron in the output layer. The number of hidden
layers and number of neurons in each hidden layer can be one or more than one. The number
of input neurons, hidden neurons and output neurons constitute the network architecture.
Before its application the network is trained, i.e., the connection weights and bias values are
fixed, with the help of a mathematical optimization algorithm and using part of the data set
until a very low value of error is attained. The network is then tested with an unseen data set
to judge the accuracy of the developed model. The network is trained using various training
algorithms which aim at minimizing the error between the observed and network predicted
values. The networks are classified according to the direction of flow of information: either in the
forward direction (feed forward) or in reverse or lateral directions (recurrent network).
Generally, three-layer feed-forward or recurrent networks are found to be sufficient in civil
engineering practice.
There are several papers and studies that confirm the usefulness of ANNs in carrying out a
variety of prediction, classification, optimization and modelling related tasks in areas of CM.
ANNs are based on the input-output data in which the model can be trained and can always
be updated to obtain better results by presenting new training examples. ANN thus has
significant benefits that make it a powerful tool for solving many problems in the field of
CM.

1.4 Decision Support System (DSS)
A decision support system (DSS) is a computerized information system used to support
decision-making in an organization or a business. A DSS lets users sift through and analyse
massive reams of data and compile information that can be used to solve problems and make
better decisions, as shown in Figure 1.5. DSSs serve the management, operations and planning
levels of an organization and help people make decisions about problems that may be rapidly
changing and not easily specified in advance—i.e. unstructured and semi-structured decision
problems.
The primary purpose of using a DSS is to present information to the customer in a way that is
easy to understand. A DSS is beneficial because it can be programmed to generate
many types of reports, all based on user specifications. A DSS can generate information and
output it graphically, such as a bar chart that represents projected revenue, or as a written
report.

Fig 1.5: Basic Structure of Decision Support System

Some authors have extended the definition of DSS to include any system that might support
decision making and some DSS include a decision-making software component; Sprague
(1980) defines a properly termed DSS as follows:
• DSS tends to be aimed at the less well structured, underspecified problem that upper
level managers typically face;
• DSS attempts to combine the use of models or analytic techniques with traditional
data access and retrieval functions;
• DSS specifically focuses on features which make them easy to use by
non-computer-proficient people in an interactive mode; and
• DSS emphasizes flexibility and adaptability to accommodate changes in the
environment and the decision making approach of the user.

DSSs include knowledge-based systems. A properly designed DSS is an interactive software-
based system intended to help decision makers compile useful information from a
combination of raw data, documents, and personal knowledge, or business models to identify
and solve problems and make decisions.

1.5 Objective of the Study


The objectives of this study are listed below:
• Study and analyse the resource utilization parameters of the 4M resources (manpower,
machinery, materials and money) in terms of activities including excavation and
backfilling, piling, road laying, masonry, reinforcement and concreting through case
studies.
• Program the code using Artificial Neural Network to optimize and predict the future
data.
• Measure the performance indicators of the data model for two case studies and
present a comparative analysis between them.

1.6 Scope of the Study


This study mainly focuses on the Waste Water unit of the construction giant Larsen and Toubro.
The study covers the required data from various projects undertaken by this water business unit.
This data from the sites is used to investigate the effect of resource requirements on the timely
completion of the projects. The study mainly pertains to the construction industry and firms
in India.

CHAPTER 2
LITERATURE REVIEW

In this chapter, a number of previous studies that have used Artificial Neural Networks for
prediction models in construction projects are reviewed. Various authors have given
comprehensive accounts of ways and methods to increase the accuracy of prediction models
using ANN. The concept of Time Series analysis, and how it can be implemented in ANN, is
also explained in detail.

Kulkarni et al. (2017) reviewed the application of ANNs in construction activities related to
prediction of costs, risk and safety, tender bids, as well as labour and equipment productivity.
The review suggested that ANNs had been highly beneficial in correctly interpreting
inadequate input information. It was seen that most of the investigators used the feed forward
back propagation type of network; however, if a single ANN architecture was found to be
insufficient, hybrid modelling in association with other machine learning tools such as
genetic programming and support vector machines proved useful. It was, however, clear
that the authenticity of the data and the experience of the modeller are important in obtaining
good results.
The review confirmed the usefulness of ANNs in carrying out a variety of prediction,
classification, optimization and modelling related tasks in areas of CM. ANNs are based on
input-output data on which the model can be trained, and they can always be updated to obtain
better results by presenting new training examples. ANN thus has significant benefits that
make it a powerful tool for solving many problems in the field of CM. However, large scope
still exists for experimenting with a variety of network architectures, training
algorithms and hybrid methods, which could lead to a higher level of model
performance. Acceptability of ANN for routine use in CM can be increased if clear
guidelines to select inputs, network architecture, learning algorithms and other network
control parameters are evolved from an exhaustive assessment of all past works. Providing a
standard benchmark for determining the accuracy level of construction proposals will
help increase the use of ANN in CM. Large scale attempts in the future to unlock potential
knowledge in the network system can also go a long way in increasing user confidence in
ANN use. Only a few instances are seen pertaining to the use of developed ANN models for
practical applications. Implementation of ANN on live projects, together with steps towards
understanding the user-related problems in such implementation, should be taken up.

Sodikov (2005) attempted to prove that cost estimation inaccuracy at the conceptual phase
can be reduced to half of what it is at the present time by using network simulation models.
ANN could be an appropriate tool to help solve problems which come from a number of
uncertainties such as cost estimation at the conceptual phase. Future work was focused on
developing an ANN model of cost estimation by incorporating other methods including fuzzy
logic, case-based reasoning, and other up-to-date techniques. The limitation of this study
was that only two types of road works were investigated: new construction and asphalt
overlay projects. Some assumptions were accepted during data analysis, such as the type of
missing data and the use of synthetic data. Nevertheless, this study can be reproduced and
applied to other road works with a complete dataset. Some recommendations developed
during the study, such as how to choose the number of variables corresponding to
project type and prediction accuracy levels, can be reproduced for other model types.

Hung and Babel (2009) presented a new approach using an Artificial Neural Network
technique to improve rainfall forecast performance. A real world case study was set up in
Bangkok; 4 years of hourly data from 75 rain gauge stations in the area were used to develop
the ANN model. The developed ANN model is being applied for real time rainfall forecasting
and flood management in Bangkok, Thailand. Aimed at providing forecasts in a near real
time schedule, different network types were tested with different kinds of input information.

Gopinath and Dwarakish (2015) attempted to predict waves at New Mangalore Port Trust
(NMPT), located along the west coast of India, using Feed Forward Back Propagation (FFBP)
with the LM algorithm and a recurrent network called the Non-linear Auto Regressive with
exogenous input (NARX) network. Field data of NMPT has been used to train and test the
network performance, which are measured in terms of mean square error (mse) and
correlation coefficient (r). Effect of network architecture on the performance of model has
been studied. Correlation coefficient is found to be 0.94 in case of NARX predictions
indicating better performance than FFBP network whose ‘r’ value is 0.9. It was found that for
time series prediction NARX network outperform FFBP network not only in terms of
accuracy but also in terms of time required for computation.
Their study made use of the relatively new technique of Artificial Neural Network, which
has been tried and tested in various coastal engineering applications. In the study,
FFBP and NARX networks are used to predict waves at NMPT along the west coast of India.
Prediction of waves at NMPT for one year, carried out using year-long wave data
with the FFBP network, gave satisfactory correlation coefficient ‘r’ values of 0.90 and 0.91 for
data sets divided on a monthly and weekly basis respectively. Using the NARX network, prediction
up to 25 weeks can be achieved with an accuracy level greater than 0.94 using one week’s data,
and yearly prediction can be achieved with accuracy greater than 0.94 using one month’s
data. Comparison of the results of the FFBP network and the NARX network showed NARX
performing better, as the ‘r’ obtained in the case of NARX was 0.94.

Ezeldin and Sharara (2016) developed three neural networks to estimate the productivity,
within a developing market, for formwork assembly, steel fixing, and concrete pouring
activities. Eighteen experts working in six projects were carefully selected to gather the data
for the neural networks. Ninety-two data surveys were obtained and processed for use by the
neural networks. Commercial software was used to perform the neural network calculations.
The processed data were used to develop, train, and test the neural networks. The results of
the developed framework of neural networks indicate adequate convergence and relatively
strong generalization capabilities. When used to perform a sensitivity analysis on the input
factors influencing the productivity of concreting activities, the framework has demonstrated
a good potential in identifying trends of such factors.

Mandal and Prabharan (2005) described an artificial neural network, namely a recurrent
neural network with the rprop update algorithm, applied to wave forecasting. Measured
ocean waves off Marmugao, on the west coast of India, are used for this study. Here, the recurrent
neural network of 3, 6 and 12 hourly wave forecasting yields the correlation coefficients of
0.95, 0.90 and 0.87 respectively. This shows that the wave forecasting using recurrent neural
network yields better results than the previous neural network application.

Shukla et al. (2014) focused on a data mining technique based on artificial neural networks and
its application in runoff forecasting. Long-term and short-term forecasting models were
developed for runoff forecasting using various approaches of Artificial Neural Network
techniques. This study compares various artificial neural network (ANN) approaches available
for runoff forecasting. On the basis of this comparative study, an attempt is made to identify
the better approach from the perspective of the research work.

2.1 Time Series Modelling (NTSTOOL)
Time series modelling is a dynamic research area which has attracted the attention of the
research community over the last few decades. The main aim of time series modelling is to
carefully collect and rigorously study the past observations of a time series to develop an
appropriate model which describes the inherent structure of the series. This model is then
used to generate future values for the series, i.e. to make forecasts. Time series forecasting
can thus be termed the act of predicting the future by understanding the past. Due to the
indispensable importance of time series forecasting in numerous practical fields such as
business, economics, finance, science and engineering, proper care should be taken to fit
an adequate model to the underlying time series. It is obvious that successful time series
forecasting depends on appropriate model fitting. A lot of effort has been put in by
researchers over many years for the development of efficient models to improve
forecasting accuracy. As a result, various important time series forecasting models have
evolved.
A time series is a sequential set of data points, measured typically over successive times. It is
mathematically defined as a set of vectors x(t), t = 0, 1, 2, ..., where t represents the time elapsed.
The variable x(t) is treated as a random variable. The measurements taken during an event in
a time series are arranged in a proper chronological order.
A time series containing records of a single variable is termed as univariate. But if records of
more than one variable are considered, it is termed as multivariate. A time series can be
continuous or discrete. In a continuous time series observations are measured at every
instance of time, whereas a discrete time series contains observations measured at discrete
points of time. For example temperature readings, flow of a river, concentration of a chemical
process etc. can be recorded as a continuous time series. On the other hand population of a
particular city, production of a company, exchange rates between two different currencies
may represent discrete time series. Usually in a discrete time series the consecutive
observations are recorded at equally spaced time intervals such as hourly, daily, weekly,
monthly or yearly time separations.

2.1.1 Components of a Time Series


A time series in general is supposed to be affected by four main components, which can be
separated from the observed data. These components are: Trend, Cyclical, Seasonal and
Irregular components. A brief description of these four components is discussed below. The
general tendency of a time series to increase, decrease or stagnate over a long period of time
is termed as Secular Trend or simply Trend. Thus, it can be said that trend is a long term
movement in a time series. For example, series relating to population growth, number of
houses in a city etc. show upward trend, whereas downward trend can be observed in series
relating to mortality rates, epidemics, etc.
Seasonal variations in a time series are fluctuations within a year caused by the seasons. The
important factors causing seasonal variations are climate and weather conditions, customs,
traditional habits, etc. For example, sales of ice-cream increase in summer and sales of woollen
clothes increase in winter. Seasonal variation is an important factor for businessmen,
shopkeepers and producers in making proper future plans.
The Cyclical variation in a time series describes the medium-term changes in the series,
caused by circumstances which repeat in cycles. The duration of a cycle extends over a longer
period of time, usually two or more years. Most economic and financial time series
show some kind of cyclical variation. For example a business cycle consists of four phases,
viz. i) Prosperity, ii) Decline, iii) Depression and iv) Recovery.
Schematically a typical business cycle is shown in Figure 2.1:

Fig 2.1: A four phase business cycle


Irregular or random variations in a time series are caused by unpredictable influences, which
are not regular and also do not repeat in a particular pattern. These variations are caused by
incidences such as war, strike, earthquake, flood, revolution, etc. There is no defined
statistical technique for measuring random fluctuations in a time series.
Considering the effects of these four components, two different types of models are generally
used for a time series viz. Multiplicative and Additive models.
Multiplicative Model: Y(t) = T(t) × S(t) × C(t) × I(t)
Additive Model: Y(t) = T(t) + S(t) + C(t) + I(t)
Here Y(t) is the observation and T(t), S(t), C(t) and I(t) are respectively the trend, seasonal,
cyclical and irregular variation at time t. Multiplicative model is based on the assumption that
the four components of a time series are not necessarily independent and they can affect one
another; whereas in the additive model it is assumed that the four components are
independent of each other.
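
A toy MATLAB sketch of the additive model is given below; the trend, seasonal, cyclical and irregular components are purely illustrative and not drawn from the thesis data.

    t = (1:120)';                  % 120 monthly observations
    T = 0.5*t;                     % secular trend (long-term upward movement)
    S = 10*sin(2*pi*t/12);         % seasonal component with a 12-month period
    C = 5*sin(2*pi*t/48);          % cyclical component with a 4-year cycle
    I = randn(120, 1);             % irregular (random) component
    Yadd = T + S + C + I;          % additive model: Y(t) = T(t) + S(t) + C(t) + I(t)
    % In the multiplicative model the components would instead be multiplied,
    % Y(t) = T(t).*S(t).*C(t).*I(t), so that they can affect one another.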

2.1.2 Introduction to Time Series Analysis


In practice a suitable model is fitted to a given time series and the corresponding parameters
are estimated using the known data values. The procedure of fitting a time series to a proper
model is termed as Time Series Analysis. It comprises methods that attempt to understand
the nature of the series and is often useful for future forecasting and simulation.
In time series forecasting, past observations are collected and analysed to develop a suitable
mathematical model which captures the underlying data generating process for the series. The
future events are then predicted using the model as shown in Figure 2.2. This approach is
particularly useful when there is not much knowledge about the statistical pattern followed by
the successive observations or when there is a lack of a satisfactory explanatory model. Time
series forecasting has important applications in various fields. Often valuable strategic
decisions and precautionary measures are taken based on the forecast results. Thus making a
good forecast, i.e. fitting an adequate model to a time series is very important. Over the past
several decades many efforts have been made by researchers for the development and
improvement of suitable time series forecasting models.

Fig 2.2: Flowchart of Time Series Analysis

2.2 Box-Jenkins Methodology
After describing various time series models, the next step is how to select an appropriate
model that can produce accurate forecasts based on a description of the historical pattern in the
data, and how to determine the optimal model orders. Statisticians George Box and Gwilym
Jenkins developed a practical approach to building Auto-regressive models which best fit a
given time series. Their concept is of fundamental importance in the area of time series
analysis and forecasting.
The Box-Jenkins methodology does not assume any particular pattern in the historical data of
the series to be forecasted. Rather, it uses a three step iterative approach of model
identification, parameter estimation and diagnostic checking to determine the best model
from a general class of Auto-regressive models. This three-step process is repeated several
times until a satisfactory model is finally selected. Then this model can be used for
forecasting future values of the time series. The Box-Jenkins forecast method is shown
schematically in Figure 2.3 below.

Fig 2.3: Flowchart of Box Jenkins Methodology

A crucial step in an appropriate model selection is the determination of optimal model
parameters. One criterion is that the sample Auto-Correlation factors, calculated from the
training data should match with the corresponding theoretical or actual values. Other widely
used measures for model identification are Akaike Information Criterion (AIC) and Bayesian
Information Criterion (BIC) which are defined below:
AIC(p) = n ln(σe²/n) + 2p
BIC(p) = n ln(σe²/n) + p + p ln(n)
Here n is the number of effective observations used to fit the model, p is the number of
parameters in the model and σe² is the sum of sample squared residuals. The optimal model
order is chosen as the number of model parameters which minimizes either AIC or BIC.
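
A small MATLAB helper applying these two formulas is sketched below; it assumes the residual vector e and the parameter count p of a fitted candidate model are already available.

    function [aic, bic] = infoCriteria(e, p)
    % e: residuals of a fitted candidate model, p: number of model parameters
    n   = numel(e);               % number of effective observations
    sse = sum(e.^2);              % sum of sample squared residuals (sigma_e^2 above)
    aic = n*log(sse/n) + 2*p;
    bic = n*log(sse/n) + p + p*log(n);
    end
    % The candidate order with the smallest AIC (or BIC) value is selected.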

2.3 Time Series Forecasting Using Artificial Neural Networks


Recently, artificial neural networks (ANNs) have attracted increasing attention in the domain
of time series forecasting. Although initially biologically inspired, ANNs have later
been successfully applied in many different areas, especially for forecasting and classification
purposes. The excellent feature of ANNs, when applied to time series forecasting problems, is
their inherent capability of non-linear modelling, without any presumption about the
statistical distribution followed by the observations. The appropriate model is adaptively
formed based on the given data. Due to this reason, ANNs are data-driven and self-adaptive
by nature.
The salient features of ANNs, which make them a favourite for time series analysis and
forecasting, can be discussed as follows:
First, ANNs are data-driven and self-adaptive in nature. There is no need to specify a
particular model form or to make any prior assumption about the statistical distribution of the
data; the desired model is adaptively formed based on the features presented from the data.
This approach is quite useful for many practical situations, where no theoretical guidance is
available for an appropriate data generation process.
Second, ANNs are inherently non-linear, which makes them more practical and accurate in
modelling complex data patterns, as opposed to various traditional linear approaches. There
are many instances which suggest that ANNs produce considerably better analyses and forecasts
than various linear models.
Finally, as suggested by Hornik and Stinchcombe, ANNs are universal function
approximators. They have shown that a network can approximate any continuous function to
any desired accuracy. ANNs use parallel processing of the information from the data to
approximate a large class of functions with a high degree of accuracy. Further, they can deal
with situations where the input data are erroneous, incomplete or fuzzy.

Fig 2.4: Flowchart of Methodology followed
In MATLAB, ntstool launches the neural time series application and leads the user through
solving a time series problem using a two-layer feed-forward network. Several methods of
time-series modelling can be applied, such as the NAR model, the NARx model and the
Nonlinear Input-Output model. The System Identification Toolbox software provides tools for
modelling and forecasting time-series data. Estimates of both linear and nonlinear black-box
and grey-box models for time series data can be obtained. Some particular types of models are
parametric autoregressive (AR), autoregressive and moving average (ARMA), and autoregressive
models with integrated moving average (ARIMA). For nonlinear time series models, the
toolbox supports nonlinear ARX models. Figure 2.5 below shows the MATLAB time
series tool for developing the prediction model.

Fig 2.5: MATLAB NTSTOOL page
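
For completeness, the linear models mentioned above can also be estimated programmatically; the short sketch below uses the System Identification Toolbox with an illustrative random-walk series and an assumed AR order of 4.

    y   = iddata(cumsum(randn(200, 1)), [], 1);  % output-only (time series) data, sample time 1
    sys = ar(y, 4);                              % fit a parametric AR(4) model
    yf  = forecast(sys, y, 10);                  % forecast 10 steps beyond the observed data
    compare(y, sys);                             % compare measured and model output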


2.4 Nonlinear Autoregressive Model (NAR)
Predictive models are also used for system identification (or dynamic modelling), in which
you build dynamic models of physical systems. These dynamic models are important for
analysis, simulation, monitoring and control of a variety of systems, including manufacturing
systems, chemical processes, robotics and aerospace systems.
Nonlinear models try to overcome the problem of observed nonstandard features in linear
models. They can be interpreted as an alternative to linear models, with extensions on
the stochastic part (ARMA), as they try to improve the predictable part to explain the process
rather than add stochastic components or introduce assumptions which are
difficult to handle. By contrast, it is possible that a nonlinear AR has its εt in accordance with
the standard assumptions. In the natural sciences only nonlinear modelling allows us to think of
pure deterministic processes.
However, according to Granger and Teräsvirta (1993), such theory does not fit economic
and financial time series. Nonlinear methods are more flexible than linear models on the one
hand, but it may become difficult to interpret their parameters.
The range of nonlinear modelling techniques is large. The first step to classify them is to
distinguish between parametric, semiparametric and nonparametric methods. Parametric
means that the structure of the function to estimate and the number of the related parameters
are known. Examples are threshold auto-regression (TAR) or smooth transition auto-
regression (STAR), methods which consider regime switching effects. Nonparametric models
do not constrain the function to any specific form, but allow for a range of possible functions.
Granger and Teräsvirta (1993) describe semiparametric models as a combination of
parametric and nonparametric parts. Granger and Teräsvirta (1993) as well as Kuan and
White (1994) classify neural networks as parametric econometric models, for the model has
to be specified - including the number of parameters - before it is estimated.
Neural networks have a universal approximation property. This means that they are able to
approximate any (unspecified) function arbitrarily accurately. This property can be seen as
evidence for a nonparametric model. However, the neural network function has to be
specified and is therefore parametric, even if this parametric function may be able to
approximate any unknown function arbitrarily precisely. Hence a neural network can be
referred to as a parametric model in the statistical sense. Of course, in estimating linear functions
neural networks are clearly inferior to linear methods because of the needless additional
effort.

Nonlinear autoregressive network is a type of recurrent neural network that can learn to
predict a time series Y given past values of Y. In recurrent networks, the output depends not
only on the current input to the network but also on the previous input and output of the
network. The response of the static network at any point of time depends only on the value of
the input sequence at that same point, whereas the response of recurrent networks lasts
longer than the input pulse. Its response at any given time depends not only on the current
input, but on the history of input sequence. This is done by introducing a tapped delay line in
the network which makes the input pulse last longer than its duration by an amount which is
equal to the delay given in the tapped delay.
The Nonlinear Autoregressive model predicts a time series y(t) given d past values of y(t), as
shown in Figure 2.6. It follows y(t) = f[y(t-1), y(t-2), ..., y(t-d)].

Fig 2.6: Basic Network of NAR model
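
A minimal MATLAB sketch of this formulation, using the narnet function behind NTSTOOL, is given below; the series, the number of delays d = 4 and the hidden layer size are illustrative assumptions rather than the thesis case-study data.

    T = num2cell(sin(0.1*(1:400)) + 0.05*randn(1, 400));   % example univariate series y(t)
    net = narnet(1:4, 10);                    % d = 4 feedback delays, 10 hidden neurons
    [Xs, Xi, Ai, Ts] = preparets(net, {}, {}, T);           % build tapped-delay inputs
    net = train(net, Xs, Ts, Xi, Ai);         % train on past values of the series
    Y = net(Xs, Xi, Ai);                      % one-step-ahead predictions
    perf = perform(net, Ts, Y);               % mean squared error of the fit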

2.4.1 Architecture of Auto-Regressive Neural Network


Neural networks, as they often appear in the econometric literature, always contain a linear and a
nonlinear part. To make the neural network function easily accessible, a signal-flow
graph representation is used stepwise, at first on the linear part and then on the whole neural
network function. The basic components of the universal approximation theorem are explained
clearly. As the universal approximation property depends on the activation function, it
becomes necessary to understand some appropriate bounded functions. Their boundedness
allows the analysis of stationarity using linear methods. Non-bounded activation functions, in
contrast, are much more difficult to handle. After the activation function and the architecture
of the network, including the number of parameters, are specified, the AR-NN becomes a
parametric function as mentioned above. This is the starting point for model building
according to the typical scheme of the Box-Jenkins methodology (variable selection, estimation,
evaluation).

Fig 2.7: NAR network diagram with inputs

2.5 Nonlinear Autoregressive Exogenous Model (NARx)


The Nonlinear Autoregressive model with eXogenous inputs, or NARX network, is a
dynamic neural network which is effective for the input-output identification of both linear
and nonlinear systems. Nonlinear ARX models extend linear ARX models to the nonlinear
case. The structure of these models enables complex nonlinear behaviour to be modelled
using flexible nonlinear functions, such as wavelet and sigmoid networks. When identifying a
system with a NARX model, the first step is to collect training data; the final results vary
considerably with different training data. In time series modelling, a nonlinear autoregressive
exogenous model (NARX) is a nonlinear autoregressive model which has exogenous inputs.
This means that the model relates the current value of a time series to both:
 past values of the same series; and
 current and past values of the driving (exogenous) series — that is, of the externally
determined series that influences the series of interest.
In addition, the model contains an "error" term which reflects the fact that knowledge of the
other terms will not be sufficient to predict the current value of the time series exactly, as
shown in figure 2.8. The NARX network describes a discrete nonlinear system using past
input and output data. For the modelled system, the expression is:
y(k) = f[y(k−1), y(k−2), …, y(k−n); u(k), u(k−1), …, u(k−m)]
where u(k) and y(k) represent the input and output values of the system at time step k, n and
m represent the output and input orders, and f(·) represents the nonlinear mapping function.

Fig 2.8: Basic Network of NARx model
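A corresponding minimal MATLAB sketch for a NARX network is shown below; it again assumes the Neural Network Toolbox, and the exogenous series u (for example, a planned-quantity series), the delays and the hidden-layer size are illustrative assumptions.

% Minimal NARX sketch (Neural Network Toolbox assumed); dummy data.
u = num2cell(rand(1, 36));            % exogenous (driving) series
y = num2cell(rand(1, 36));            % output series to be predicted
net = narxnet(1:2, 1:2, 10);          % input delays, feedback delays, 10 hidden neurons
[Xs, Xi, Ai, Ts] = preparets(net, u, {}, y);
net = train(net, Xs, Ts, Xi, Ai);
Y   = net(Xs, Xi, Ai);                % predictions using past y and past u values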

2.5.1 Structure of Nonlinear ARX Models
A nonlinear ARX model consists of model regressors and a nonlinearity estimator. The
nonlinearity estimator comprises both linear and nonlinear functions that act on the model
regressors to give the model output. This block diagram in figure 2.9 represents the structure
of a nonlinear ARX model in a simulation scenario.

Fig 2.9: NARx network diagram with inputs

The MATLAB software computes the nonlinear ARX model output (y) in two stages:
 It computes regressor values from the current and past input values and past output
data. In the simplest case, regressors are delayed inputs and outputs, such as u(t-1)
and y(t-3). These kinds of regressors are called standard regressors. We can specify
the standard regressors using the model orders and delay. We can also specify custom
regressors, which are nonlinear functions of delayed inputs and outputs. By default,
all regressors are inputs to both the linear and the nonlinear function blocks of the
nonlinearity estimator. We can choose a subset of regressors as inputs to the nonlinear
function block.
 It maps the regressors to the model output using the nonlinearity estimator block. The
nonlinearity estimator block can include linear and nonlinear blocks in parallel. For
example:
F(x) = Lᵀ(x−r) + d + g(Q(x−r))
Here, x is the vector of regressors and r is the mean of the regressors. Lᵀ(x−r) + d is the
output of the linear function block and is affine when d ≠ 0, where d is a scalar offset.
g(Q(x−r)) represents the output of the nonlinear function block, and Q is a projection matrix
that makes the calculations well-conditioned. The exact form of F(x) depends on the choice
of nonlinearity estimator; we can select from the available estimators, such as tree-partition
networks, wavelet networks, and multilayer neural networks. Either the linear or the
nonlinear function block can also be excluded from the nonlinearity estimator.

When estimating a nonlinear ARX model, the MATLAB software computes the model
parameter values, such as L, r, d, Q, and other parameters specifying g. Typically, all
nonlinear ARX models act as black-box structures. The nonlinear function of the nonlinear
ARX model is a flexible nonlinearity estimator with parameters that need not have physical
significance. We can estimate nonlinear ARX in the System Identification app or at the
command line using the nlarx command in MATLAB. Uniformly sampled time-domain
input-output data, or time-series data (with no inputs), are used for estimating nonlinear ARX
models. The data can have one or more input and output channels, while frequency-domain
data cannot be used for estimation.

2.5.2 Nonlinear ARX Model Orders and Delay


The orders and delays of a nonlinear ARX model are used to define the standard regressors of
the model. The orders and delay are defined as follows:
na — Number of past output terms used to predict the current output.
nb — Number of past input terms used to predict the current output.
nk — Delay from input to the output in terms of the number of samples.
The meaning of na, nb, and nk is similar to that for linear ARX model parameters. Orders are
specified as scalars for SISO data, and as ny-by-nu matrices for MIMO data, where ny and nu
are the numbers of outputs and inputs. A preliminary estimate of the orders based on linear
ARX models only provides initial guidance; the best orders for a linear ARX model might
not be the best orders for a nonlinear ARX model.
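As an illustration of the nlarx command mentioned above, a hedged sketch is given below; it assumes the System Identification Toolbox, and the order values [na nb nk] and the sigmoid-network nonlinearity are example choices only.

% Illustrative nlarx call (System Identification Toolbox assumed); dummy data.
y  = rand(36, 1);                     % output series (e.g. monthly resource usage)
u  = rand(36, 1);                     % exogenous input series
data = iddata(y, u, 1);               % sample time of 1 (one month per sample)
na = 2; nb = 2; nk = 1;               % past outputs, past inputs, input delay
sys  = nlarx(data, [na nb nk], 'sigmoidnet');
yhat = predict(sys, data, 1);         % one-step-ahead prediction
compare(data, sys);                   % plot the model fit against the measured data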

2.6 Defining a Proper Network Architecture


After specifying a particular network structure, the next most important issue is the
determination of the optimal network parameters. The number of network parameters is equal
to the total number of connections between the neurons and the bias terms.
A desired network model shown in figure 2.10 should produce reasonably small error not
only on within sample (training) data but also on out of sample (test) data. Due to this reason
immense care is required while choosing the number of input and hidden neurons. However,
it is a difficult task as there is no theoretical guidance available for the selection of these
parameters and often experiments, such as cross-validation are conducted for this purpose.
Another major problem is that an inappropriate (in particular, an excessively large) number
of network parameters may lead to overtraining on the data. Overtraining produces a
spuriously good within-sample fit, which does not generate better forecasts. To penalize the
addition of extra parameters, model comparison criteria such as AIC and BIC can be used.
In summary, NNs are conceptually simple yet powerful techniques for time series
forecasting. The selection of appropriate network parameters is crucial when using NNs for
forecasting. A suitable transformation or rescaling of the training data is also often necessary
to obtain the best results.

Fig 2.10: A Proper Network Architecture

CHAPTER 3
METHODOLOGY

At the beginning of the model building process, it is important to clearly define the criteria by
which the performance of the model will be judged, as they can have a significant impact on
the model architecture and weight optimisation techniques chosen. In most applications,
performance criteria include one or more of the following: prediction accuracy, training
speed and the time delay between the presentation of inputs and the reception of outputs for a
trained network. The time delay between the presentation of network inputs and the reception
of the corresponding outputs is a function of processing speed.
For a particular computational platform, this is a function of the number of connection
weights and the type of connection between them. In order to maximise processing speed, it
is desirable to keep the number of connection weights as small as possible and the
connections between them as simple as possible. The time taken to train a network is highly
problem dependent. However, for a particular case study, training speed is a function of a
number of factors. The optimisation method and its associated parameters have a major
influence on convergence speed. The size of the training set can also play a significant role.
Another factor that affects training speed is the size of the network. Larger networks
generally require fewer weight updates to find an acceptable solution. However, the time
taken to perform one weight update is increased.
Prediction accuracy is affected by the optimisation algorithm. The method’s ability to escape
local minima in the error surface is of particular importance. A number of generally proposed
measures are calculated using data that have not been utilised in the training process so that
the model’s generalisation ability can be assessed. Generalisation ability is defined as
a model’s ability to perform well on data that were not used to calibrate it and is a function of
the ratio of the number of training samples to the number of connection weights. If this ratio
is too small, continued training can result in overfitting of the training data. This problem is
exacerbated by the presence of noise in the data. To minimise the overfitting problem,
various techniques can be used for determining the smallest number of connection weights
that will adequately represent the desired relationship. Generalisation ability is also affected
by the degree with which the training and validation sets represent the population to be
modelled and by the stopping criterion used.

3.1 Division of data
It is common practice to split the available data into two sub-sets: a training set and an
independent validation set. The validation set is itself divided into validation and testing
subsets. Typically, ANNs are unable to extrapolate beyond the range of the data used for
training. Consequently, poor forecasts/predictions can be expected when the validation data
contain values outside the range of those used for training. It is also imperative that the
training and validation sets are representative of the same population. When limited data are
available, it might be difficult to assemble a representative validation set. In this model, the
data-division percentages differ between the case studies in order to achieve optimum results.
One method which maximises utilisation of the available data is the holdout method
(Masters, 1993). The basic idea is to withhold a small subset of the data for validation and to
train the network on the remaining data. Once the generalisation ability of the trained
network has been obtained with the aid of the validation set, a different subset of the data is
withheld and the above process is repeated. Different subsets are withheld in turn, until the
generalisation ability has been determined for all of the available data. Other methods for
maximising the availability of data for training have also been proposed. For example,
Lachtermacher and Fuller (1994) generated a synthetic test set that possessed the same
statistical properties as the training data. Maier and Dandy (1998) suggested using a subset of
the data as a testing set in a trial phase to determine how long training should be carried out
so that acceptable generalisation ability is achieved. The subset used for testing is then added
to the remaining training data, and the whole data set is used to train the network for a fixed
number of epochs, based on the results from the trial phase.
Cross-validation (Stone, 1974) is a technique that is used frequently in ANN modelling and
has a significant impact on the way the available data are divided. It can be used to determine
when to terminate training and to compare the generalisation ability of different models. In
cross-validation, an independent test set is used to assess the performance of the model at
various stages of learning. As the validation set must not be used as part of the training
process, a different, independent testing set is needed for the purposes of cross-validation.
This means that the available data need to be divided into three subsets; a training set, a
testing set and a validation set, which is very data intensive. The same applies to cases where
network geometry or internal parameters are optimised by trial and error.
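A minimal MATLAB sketch of this three-way split is given below; the 70/15/15 ratios correspond to those used later for Case Study 1, and the network definition itself is only a placeholder.

% Sketch of dividing the data into training, validation and test subsets.
net = narnet(1:2, 10);                      % placeholder NAR network
net.divideFcn              = 'dividerand';  % random division of the samples
net.divideParam.trainRatio = 0.70;          % training set
net.divideParam.valRatio   = 0.15;          % validation set (used for early stopping)
net.divideParam.testRatio  = 0.15;          % independent test set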

3.2 Data Processing
The data processing was conducted in three steps. The first step included obtaining the
Management Planning Control System data record of the latest month from each particular
site as prepared by planning department of each individual L&T site. The inputs of these data
were obtained in several parts containing schedules of planned vs actual quantities of various
resources utilized. Each part had specific details of the various resources used in each activity
included in the study. The data included the quantities of equipment, materials, labour and
money used for the activities. These individual data were used to create the neural networks
for the study model.
The second step in the data processing was the conversion of the data into numeric values (if
needed) and the formulation of an individual set for each neural network. If the input data
fields available in the Excel sheet were numbers, they were entered directly into the neural
networks without manipulation or calculation. Each numerical data field was linked to its
corresponding position in its matrix and simply transferred there. Data fields of that nature
were concrete quantity, steel quantity, and crew size. On the other hand, if the input data were
in text form, they needed to be converted from text into numerical form so that the neural
network could utilize them in its computations.
The third step was an optional scheme for randomizing the data in order to avoid misleading
outcomes. In this step, data processing included randomizing the records and normalizing the
data. As the data were fed into the main Excel table, records were entered in the order in
which they appeared in the source sheets. To avoid the ordering bias that could arise from
this, randomization or shuffling of the records was required. The randomization improved the
generalization capability of the network and allowed for smoother convergence. Normalizing
the data was another manipulation carried out. The process converted the numbers available
in each matrix to values within a fixed range. Such scaling allows the neural networks to
converge faster and, later, to generalize better.
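A short sketch of the record-shuffling step is given below; the matrix name and its dimensions are illustrative stand-ins for the records assembled from the Excel sheets.

% Optional randomization (shuffling) of the records before training.
records  = rand(36, 5);                % dummy data: 36 monthly records x 5 resource columns
idx      = randperm(size(records, 1)); % random permutation of the row indices
shuffled = records(idx, :);            % records in randomized order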

3.3 Determination of network architecture


Network architecture determines the number of connection weights (free parameters) and the
way information flows through the network. Determination of appropriate network
architecture is one of the most important, but also one of the most difficult, tasks in the model
building process. Below a network architecture is described graphically in Figure 3.1.

Fig 3.1: Basic Architecture of Proposed Network Model

3.4 The model building process consists of four sequential steps:


 Obtaining Data for the supervised NAR learning.
 Training of the normalized data using LM learning.
 Testing and validating the goodness of fit of the model.
 Comparing the predicted output with the desired output.

3.4.1 Obtaining Training Data


In any model development process, familiarity with the available data is of the utmost
importance. ANN models are no exception, and data pre-processing can have a significant
effect on model performance. It is important to note that the available data need to be divided
into their respective subsets (e.g. training, testing and validation) before any data pre-
processing is carried out. Generally, different variables span different ranges. In order to
ensure that all variables receive equal attention during the training process, they should be
standardised. In addition, the variables have to be scaled in such a way as to be
commensurate with the limits of the activation functions used in the output layer.
Data selection is carried out by a preliminary pre‐processing of all data coming from several
sites. After obtaining the data to be used to model the ANN, the subsequent step was to
separate the data into two parts: training dataset and test dataset. All of the data were
normalized to have the same range of values for each of the inputs to the ANN model. This
guarantees that no attribute is given more importance than others merely because of its data
range, and also facilitates stable convergence of the network weights and biases. In most traditional
statistical models, the data have to be normally distributed before the model coefficients can
be estimated efficiently. If this is not the case, suitable transformations to normality have to
be found. It has been suggested in the literature that ANNs overcome this problem, as the
probability distribution of the input data does not have to be known.
More recently, it has been pointed out that as the mean squared error function is generally
used to optimise the connection weights in ANN models, the data need to be normally
distributed in order to obtain optimal results. However, this has not been confirmed by
empirical trials. Clearly, this issue requires further investigation.
Until recently, the issue of stationarity has been rarely considered in the development of
ANN models. However, there are good reasons why the removal of deterministic components
in the data (i.e. trends, variance, seasonal and cyclic components) should be considered. As
previously discussed, it is generally accepted that ANNs cannot extrapolate beyond the range
of the training data. One way to deal with this problem is to remove any deterministic
components using methods commonly used in time series modelling such as classical
decomposition (Chatfield, 1975) or differencing (Box and Jenkins, 1976). Differencing has
already been applied to neural network modelling of non-stationary time series. However, use
of the classical decomposition model may be preferable, as differenced time series can
possess infinite variance. Another way of dealing with trends in the data is to use an adaptive
weight update strategy during on-line operation.
It has been suggested that the ability of ANNs to find non-linear patterns in data makes them
well suited to dealing with time series with non-regular cyclic variation. Maier and Dandy
(1996a) investigated the effect of input data with and without seasonal variation on the
performance of ANN models. Their results indicate that ANNs have the ability to cater to
irregular seasonal variation in the data with the aid of their hidden layer nodes.

3.4.2 Training of the Input Data


In certain cases, the input and the output data obtained have to be normalized because they
are in different units, and otherwise the differing scales obscure the relationship between the
input and the output values.
First, the mean M of each parameter is taken separately:
M = (sum of all entries) / (number of entries)
Then the standard deviation, SD, of each parameter is calculated individually. With the mean
and SD available for every parameter, each value x of that parameter is normalized as:
Normalized value = (x − M) / SD
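The same normalization, applied column-wise to a matrix of parameter values, can be sketched in MATLAB as follows (the matrix X and its size are illustrative assumptions):

% Normalization (x - M)/SD applied to each parameter (column) of X.
X     = rand(36, 4);                  % dummy data: 36 months x 4 parameters
M     = mean(X);                      % mean of each parameter
SD    = std(X);                       % standard deviation of each parameter
Xnorm = (X - M) ./ SD;                % equivalent to zscore(X)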

3.4.2.1 Transfer (activation) Function


The transfer functions that are most commonly used are sigmoidal type functions such as the
logistic and hyperbolic tangent functions. However, other transfer functions may be used as
long as they are differentiable. In a study, Moody and Yarvin (1992) compared the
performance of logistic, polynomial, rational function (ratios of polynomials) and Fourier
series (sums of cosines) transfer functions on datasets containing varying degrees of noise
and non-linearities. They found that the non-sigmoidal transfer functions performed best
when the data were noiseless and contained highly non-linear relationships. When the data
were noisy and contained mildly non-linear relationships, the performance of the polynomial
transfer function was inferior while the performance of the other transfer functions was
comparable. There is also an argument that the hyperbolic tangent transfer function should be
used. Another option is to use a radial basis transfer function. However, radial basis function
networks operate quite differently from feedforward networks with polynomial or sigmoidal-
type transfer functions. The use of radial basis function networks is a research area in itself
and is beyond the scope of this study.
Generally, the same transfer function is used in all layers. However, using sigmoidal- type
transfer functions in the hidden layers and linear transfer functions in the output layer can be
an advantage when it is necessary to extrapolate beyond the range of the training data (but it
is not considered in the context of this study). It should also be noted that the type of transfer
function used affects the size of the steps taken in weight space, as weight updates are
proportional to the derivative of the transfer function.
The training is done with the help of three algorithms:
 Levenberg-Marquardt Algorithm
 Scaled Conjugate Gradient Algorithm
 Bayesian Regularization Algorithm
In the case of the current study, only the first two algorithms have been taken into
consideration.
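For illustration, the sketch below shows how the transfer functions and the training algorithm can be selected in MATLAB's Neural Network Toolbox; the tansig/purelin combination and the network sizes are example choices, not prescriptions.

% Selecting transfer functions and the training algorithm (illustrative).
net = narnet(1:2, 10);
net.layers{1}.transferFcn = 'tansig';   % sigmoidal (hyperbolic tangent) hidden layer
net.layers{2}.transferFcn = 'purelin';  % linear output layer
net.trainFcn = 'trainlm';               % Levenberg-Marquardt
% net.trainFcn = 'trainscg';            % Scaled Conjugate Gradient (the alternative used)
% net.trainFcn = 'trainbr';             % Bayesian Regularization (not used in this study)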

3.4.2.2. Levenberg-Marquardt algorithm
The Levenberg-Marquardt algorithm, founded on the works of Levenberg (1944) and
Marquardt (1963), combines the steepest descent algorithm of Rumelhart, Hinton and
Williams (1986b) with Newton's method. Like other quasi-Newton methods it cannot be
counted among the second-order gradient methods, because the Hessian matrix is
approximated by combinations of the Jacobian matrix of Ɛt(·), the matrix of first-order
gradients, so that no second-order gradients remain to be calculated. According to Haykin
(2009) the advantages of this method are therefore that it converges rapidly like Newton's
method but cannot diverge, owing to the influence of the steepest descent component. By
modifying some parameters the Levenberg-Marquardt algorithm can be made equivalent to
either the steepest descent or Newton's algorithm. According to Bishop (1995) the
Levenberg-Marquardt algorithm is especially applicable to error-sum-of-squares performance
functions.
Levenberg-Marquardt algorithm used in this study can be written as:

Wnew = Wold − [JᵀJ + γI]⁻¹ Jᵀ E(Wold)

where J is the Jacobian of the error function (E), I is the identity matrix and γ is the parameter
used to define the iteration step value. It minimizes the error function while trying to keep the
step between old weight configuration (Wold) and new updated one (Wnew) small. The
performance of the network is measured in terms of various performance functions like sum
squared error (SSE), mean squared error (MSE), and Co-efficient of Correlation (CC or ‘r’)
between the predicted and the observed values of the quantities. Lower value of MSE and
higher value of CC indicates better performance of the network.
The Levenberg–Marquardt (LM) modification of the classical Newton algorithm overcomes
one of the problems of the classical Newton algorithm, as it guarantees that the Hessian is
positive definite. However, the computation/memory requirements are still O(k²). The LM
algorithm may be considered to be a hybrid between the classical Newton and steepest
descent algorithms. When far away from a local minimum, the algorithm’s behaviour is
similar to that of gradient descent methods. However, in the vicinity of a local minimum, it
has a convergence rate of order two. This enables it to escape local minima in the error
surface.
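To make the update rule concrete, a hedged numerical sketch of a single Levenberg-Marquardt step is written out below directly from the formula; W, J, e and γ are generic illustrative quantities, not variables of any toolbox routine.

% One Levenberg-Marquardt weight update (illustrative dummy quantities).
Wold  = randn(5, 1);                   % current weight vector
J     = randn(20, 5);                  % Jacobian of the errors w.r.t. the weights
e     = randn(20, 1);                  % error vector at Wold
gamma = 0.01;                          % damping parameter
H     = J' * J + gamma * eye(numel(Wold));  % damped Gauss-Newton approximation of the Hessian
Wnew  = Wold - H \ (J' * e);           % update step; backslash solves the linear system
% Large gamma -> behaves like steepest descent; small gamma -> behaves like Newton's step.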

3.4.2.3 Scaled Conjugate or Conjugate Gradient Algorithm
In the Conjugate Gradient Algorithm (CGA) a search is performed along conjugate
directions, which generally produces faster convergence than searching along steepest
descent directions. A search is made along the conjugate gradient direction to determine the
step size which will minimize the performance function along that line; that is, a line search
is performed to determine the optimal distance to move along the current search direction.
The next search direction is then determined so that it is conjugate to the previous search
direction.
Conjugate gradient methods may be viewed as approximations to the Shanno algorithm. The
quasi-Newton approach proposed by Shanno (1978) overcomes both problems associated
with the classical Newton algorithm while maintaining a convergence rate of order two. The
computation/memory requirements are reduced to O(k) by using an approximation of the
inverse of the Hessian. This approximation also has the property of positive definiteness,
avoiding the problem of ‘uphill’ movement. One potential problem with the Shanno
algorithm is that it cannot escape local minima in the error surface, and may thus converge to
a sub-optimal solution. However the Shanno algorithm is expected to converge more rapidly
to a nearby strict local minimum, take fewer uphill steps, and have greater numerical
robustness than the generic conjugate gradient algorithm.
The general procedure for determining the new search direction is to combine the new
steepest descent direction with the previous search direction. An important feature of the
CGA is that the minimization performed in one step is not partially undone by the next, as it
is the case with gradient descent methods. The key steps of the CGA are summarized as
follows:
 Choose an initial weight vector w1.
 Evaluate the gradient vector g1, and set the initial search direction d1 = −g1.
 At step j, minimize E(wj + α dj) with respect to α to give wj+1 = wj + αmin dj.
 Test to see if the stopping criterion is satisfied.
 Evaluate the new gradient vector gj+1.
 Evaluate the new search direction using dj+1 = −gj+1 + βj dj.
The various versions of conjugate gradient are distinguished by the manner in which the
constant βj is computed.
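As an example, the sketch below computes one new search direction using the Fletcher-Reeves choice of βj; the gradient values are dummies, and other CG variants differ only in how β is computed.

% One conjugate-gradient direction update (Fletcher-Reeves beta; dummy gradients).
gOld = randn(5, 1);                    % previous gradient g_j
gNew = randn(5, 1);                    % new gradient g_(j+1)
dOld = -gOld;                          % previous search direction d_j
beta = (gNew' * gNew) / (gOld' * gOld);   % Fletcher-Reeves coefficient
dNew = -gNew + beta * dOld;            % new search direction d_(j+1)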
The approximation tends in the limit to the true value of E(wj + α dj). The calculation
complexity and memory usage of gj are, respectively, O(3N²) and O(N). If this strategy is
combined with the CG approach, we get an algorithm directly applicable to a feedforward
neural network. This slightly modified version of the original CG algorithm will also be
referred to as CG. When the CG algorithm was tested on a generic test problem, it failed in
almost every case and converged to a nonstationary point. The cause of this failure is that the
algorithm only works for functions with positive definite Hessian matrices, and that the
quadratic approximations on which the algorithm works can be very poor when the current
point is far from the desired minimum. The Hessian matrix for the global error function E has
been shown to be indefinite in different areas of the weight space, which explains why CG
fails in the attempt to minimize E.

3.4.2.4 Other Factors essential for Training the Network


Epoch size
The epoch size is equal to the number of training samples presented to the network between
weight updates. If the epoch size is equal to 1, the network is said to operate in on-line mode.
If the epoch size is equal to the size of the training set, the network is said to operate in batch
mode. In many applications, batch mode is the preferred option, as it forces the search to
move in the direction of the true gradient at each weight update. However, several researchers
suggest using the online mode, as it requires less storage and makes the search path in the
weight space stochastic, which allows for a wider exploration of the search space and,
potentially, leads to better quality solutions.

Learning rate
The learning rate is directly proportional to the size of the steps taken in weight space.
However, it is worthwhile re-iterating that learning rate is only one of the parameters that
affect the size of the steps taken in weight space. Traditionally, learning rates remain fixed
during training and optimal learning rates are determined by trial and error. Guidelines for
appropriate learning rates have been proposed for single-layer networks but it is difficult to
extend these guidelines to the multilayer case. Many heuristics have been proposed which
adapt the learning rate, and hence the size of the steps taken in weight space, as training
progresses based on the shape of the error surface.

Stopping criteria
The criteria used to decide when to stop the training process are vitally important, as they
determine whether the model has been optimally or sub-optimally trained. Examples of the
latter include stopping training too early or once overfitting of the training data has occurred.

At this stage, it is important to understand that overfitting is intricately linked to the ratio of
the number of training samples to the number of connection weights. Overfitting does not
occur if the above ratio exceeds 30. In such cases, training can be stopped when the training
error has reached a sufficiently small value or when changes in the training error remain
small.
When the above condition is not met, there are clear benefits in using cross-validation. In
practical terms, however, this is not a straightforward task, and there has been much
discussion about the relative merits of the use of cross-validation as a stopping criterion.
Some researchers suggest that it is impossible to determine the optimal stopping time and that
there is a danger that training is stopped prematurely (i.e. even though the error obtained
using the test set might increase at some stage during training, there is no guarantee that it
will not reach lower levels at a later stage if training were continued). Consequently, when
cross-validation is used, it is vital to continue training for some time after the error in the test
set first starts to rise.
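The sketch below shows how such stopping criteria can be expressed as training parameters in the Neural Network Toolbox; the numeric values are illustrative, not tuned values from this study.

% Stopping-related training parameters (illustrative values).
net = narnet(1:2, 10);
net.trainFcn            = 'trainlm';
net.trainParam.epochs   = 1000;        % maximum number of epochs
net.trainParam.goal     = 1e-3;        % stop when the training error is sufficiently small
net.trainParam.max_fail = 6;           % consecutive validation failures allowed (early stopping)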

3.4.3 Testing and Validation


Testing is done after training on the data is complete and the error is below the tolerance
levels. The NAR setup generally keeps 30% of the input data for testing and validation. Once
the training (optimisation) phase has been completed, the performance of the trained network
has to be validated on an independent data set using the criteria chosen. It is vital that the
validation data should not have been used as part of the training process in any capacity. If
the error obtained using the validation set is markedly different from that obtained using the
training data, it is likely that the two data sets are not
representative of the same population or that the model has been over-fitted. Poor validation
can also be due to network architecture, a lack of, or inadequate, data pre-processing and
normalisation of training/validation data. Diagnostic tools used to check the adequacy of
statistical models can also be applied to neural networks.
A validation of the neural networks is required to determine the extent of the learning and
generalization of the neural networks. To proceed with comparison between the desired
productivity and the predicted productivity it is crucial to determine a level of accuracy. The
capability of the networks to produce accurate results is obtained by determining the
percentage of exemplars meeting or exceeding a preset accuracy level. It is worth mentioning
that the level of accuracy should be regarded as an indicator, with the available data, for the
capability of the training data sets to converge and for the testing data sets to generate values.

3.4.3.1 The Problem of Under-fitting or Overfitting
The training set was used to train the network in order to choose its parameters (weights);
the cross-validation set was used for generalization, that is, to produce better outputs for
unseen examples. Finally, the test set was used to measure the performance of the selected
ANN model. A practical issue is the under-fitting/overfitting dilemma. Under-fitting may
occur when the model is too simple or the training data are insufficient; in such cases both
the training error and the testing error are large. Overfitting occurs when the model is too
complex or the training data are insufficient or noisy, resulting in a small training error but a
large testing error. To tackle these problems, stopping criteria and weight re-initialisation
were used during network training.

3.4.4 Error Analysis: Performance Measures


After the testing is done, the results are saved in the workspace and a graph is plotted
between the actual output and the predicted output so that a comparison can be made. The
graph is an efficient way of comparing the two types of data available with us. It can also be
used to calculate the accuracy of the model.
Due to the fundamental importance of time series forecasting in many practical situations,
proper care should be taken while selecting a particular model. For this reason, various
performance measures are proposed in literature to estimate forecast accuracy and to compare
different models. These are also known as performance metrics. Each of these measures is a
function of the actual and forecasted values of the time series.

3.5 Description of Various Forecast Performance Measures


3.5.1 The Mean Squared Error (MSE)
Mathematical definition of this measure is MSE = (1/n) Σ et²
Its properties are:
 It is a measure of the average squared deviation of the forecasted values.
 As opposite-signed errors do not offset one another, MSE gives an overall idea of the
error that occurred during forecasting.
 It penalises extreme errors that occurred while forecasting.
 MSE emphasizes the fact that the total forecast error is much affected by large
individual errors, i.e. large errors are much more expensive than small errors.
 MSE does not provide any idea about the direction of the overall error.

 MSE is sensitive to changes of scale and to data transformations.
 Although MSE is a good measure of overall forecast error, it is not as intuitive and
easily interpretable as the other measures.

3.5.2 The Sum of Squared Error (SSE)


It is mathematically defined as SSE = Σ et²
 It measures the total squared deviation of the forecasted observations from the actual
values.
 The properties of SSE are the same as those of MSE.

3.5.3 The Signed Mean Squared Error (SMSE)


This measure is defined as SMSE = (1/n) Σ (et / |et|) · et²
Its salient features are:
 It is the same as MSE, except that the original sign is kept for each individual squared
error.
 SMSE penalises extreme errors that occurred while forecasting.
 Unlike MSE, SMSE also shows the direction of the overall error.
 In the calculation of SMSE, positive and negative errors offset each other.
 Like MSE, SMSE is also sensitive to changes of scale and to data transformations.

3.5.4 The Root Mean Squared Error (RMSE)


Mathematically, RMSE = √MSE = √[(1/n) Σ et²]
 RMSE is nothing but the square root of calculated MSE.
 All the properties of MSE hold for RMSE as well.

3.5.5 The Normalized Mean Squared Error (NMSE)
This measure is defined as NMSE = MSE/σ² = (1/(n·σ²)) Σ et²
Its features are:
 NMSE normalizes the obtained MSE after dividing it by the test variance.
 It is a balanced error measure and is very effective in judging forecast accuracy of a
model.
 The smaller the NMSE value, the better is the forecast.
 Other properties of NMSE are same as those of MSE.

3.5.6 The Mean Forecast Error (MFE)
This measure is defined as MFE = 1/n Σ et
The properties of MFE are:
 It is a measure of the average deviation of forecasted values from actual ones.
 It shows the direction of error and thus also termed as the Forecast Bias.
 In MFE, the effects of positive and negative errors cancel out and there is no way to
know their exact amount.
 A zero MFE does not mean that forecasts are perfect, i.e. contain no error; rather it
only indicates that forecasts are on proper target.
 MFE does not penalise extreme errors.
 It depends on the scale of measurement and is also affected by data transformations.
 For a good forecast, i.e. to have a minimum bias, it is desirable that the MFE is as
close to zero as possible.

3.5.7 The Mean Absolute Error (MAE)


The mean absolute error is defined as MAE = 1/n Σ |et|. Its properties are:
 It measures the average absolute deviation of forecasted values from original ones.
 It is also termed as the Mean Absolute Deviation (MAD).
 It shows the magnitude of overall error, occurred due to forecasting.
 In MAE, the effects of positive and negative errors do not cancel out.
 Unlike MFE, MAE does not provide any idea about the direction of errors.
 For a good forecast, the obtained MAE should be as small as possible.
 Like MFE, MAE also depends on the scale of measurement and data transformations.
 Extreme forecast errors are not penalised by MAE.

3.5.8 The Mean Absolute Percentage Error (MAPE)


This measure is given by MAPE = 1/n Σ (|et| / |yt|) * 100. Its important features are:
 This measure represents the percentage of average absolute error occurred.
 It is independent of the scale of measurement, but affected by data transformation.
 It does not show the direction of error.
 MAPE does not penalise extreme deviations.
 In this measure, opposite signed errors do not offset each other.

3.5.9 The Mean Percentage Error (MPE)
It is defined as MPE = 1/n Σ (et / yt) * 100
The properties of MPE are:
 MPE represents the percentage of average error occurred, while forecasting.
 It has similar properties to MAPE, except that:
 It shows the direction of the error that occurred.
 Opposite-signed errors affect each other and cancel out.
 Thus like MFE, by obtaining a value of MPE close to zero, we cannot conclude that
the corresponding model performed very well.
 It is desirable that for a good forecast the obtained MPE should be small.
Each of these measures has some unique properties, different from the others. In experiments,
it is better to consider more than one performance criterion. This will help to obtain a reasonable
knowledge about the amount, magnitude and direction of overall forecast error. For this
reason, time series analysts usually use more than one measure for judgment.
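For illustration, the sketch below computes several of the above measures in MATLAB for a pair of short dummy series; the variable names and values are assumptions for demonstration only.

% Forecast performance measures computed from actual (y) and forecasted (yhat) values.
y    = [10 12 15 14 18]';              % actual values (dummy data)
yhat = [11 11 16 13 19]';              % forecasted values (dummy data)
e    = y - yhat;                       % forecast errors
MSE  = mean(e.^2);                     % mean squared error
SSE  = sum(e.^2);                      % sum of squared errors
RMSE = sqrt(MSE);                      % root mean squared error
MFE  = mean(e);                        % mean forecast error (bias)
MAE  = mean(abs(e));                   % mean absolute error
MAPE = mean(abs(e) ./ abs(y)) * 100;   % mean absolute percentage error
R    = corrcoef(y, yhat);              % correlation between targets and outputs
r    = R(1, 2);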

In addition, other plots are produced, such as the regression best-fit plot, the error histogram
and the performance plot. Error in this study is assessed on the basis of the mean squared
error and the sum of squared errors. Both error measures have to be close to zero for the
predictions to be accurate; the further they are from 0, the less accurate the predictions are.
Also, in the regression plot the correlation coefficient ‘r’ should be close to 1 in order to have
a high correlation between output values and target values.

CHAPTER 4
CASE STUDY DETAILS

In this study, the Artificial Neural Network models developed using the NAR and NARx
networks have been tried on data from two different sites, obtained from the Waste Water
Unit and the Industrial & Large Water Unit of L&T's Water & Effluent Treatment (WET) IC.
The two projects had different project scopes, site conditions, essential requirements and
different critical parameters governing the resource requirements of the sites. The data were
obtained from the MPCS records of each site, which are updated regularly.
Both the site requirements for resources are discussed in connection with the Case Study
created to understand how ANN can be used to predict resource requirement for future
projects. A detailed analysis with regression and error graphs are provided in the study.
Certain aspects of each project have been found to be unique and do not correlate with one
another.
In order to understand this, a comparative analysis is presented at the end of the two sets of
findings. The comparison represents the similarities as well as the differences in the pattern
of resource requirements, and an attempt has been made to understand which critical factors
explain the differences observed in the findings. This limitation is not dwelt upon, as the
validation of the model can only be done accurately when a large data sample is available.
Since only two sites and their resource analyses are considered, the efficiency of the model is
considerably reduced. Some of the performance measures may vary beyond the permissible
limits because the prediction is done on a very limited database. These problems can be
overcome with adequate data availability.

4.1 Case Study 1


4.1.1 Site Details
Project Name : Development of IT City, Mohali Project
Location : Sector-66A, 66B, 82A, 82B, S.A.S. Nagar,
Mohali
Project Client/ Owner : GMADA (Greater Mohali Area Development
Authority)
Project Chief Consultant : SYAL Consultancy Ltd.
Project Contractor : L&T Construction.
Project Duration : 18 Months.

Project Scope : Construction of Roads, PH services (Water
Supply, Sewerage Storm Water Drainage &
Treated Water Networks), Electrical Services
(HT, LT and Streetlights), Maintenance of PH
and Electrical Works for 5 years
Project Value : Rs 349 Cr
Total built up area : 1700 acres
Type of Contract : Item Rate

Fig 4.1: Site Layout of IT City, Mohali

4.1.2 Site Location and Layout

GMADA IT City Mohali is located in Sectors 82 A, 83 A and 101 B near the Indian School of
Business (ISB) and the Indian Institute of Science Education and Research (IISER). The site
is conveniently placed near all major transport networks of both Mohali and Chandigarh. It is
approximately 1.5 km from Mohali International Airport, 0.75 km from Mohali Railway
Station and 10 km from Chandigarh Railway Station. It is an integrated township developed
by GMADA (Greater Mohali Area Development Authority). It is strategically located very
close to the International Airport yet does not fall in the flight path, making it a highly sought
after residential destination. It is spread over 1680 acres covering Sectors 66-B / 82-A /
83-A / 101-A, offering residential, commercial and industrial plots as shown in Figure 4.2.

A detailed site layout was outlined at the beginning of the project keeping in mind the
logistic requirements. The important areas, such as the office area, material stacking area,
laydown area, parking area, quality lab area and dumping yard, were clearly demarcated.
Areas for soil stacking, aggregate stacking and reinforcement stacking were also identified.
Areas for the formwork and rebar yards were clearly separated considering the vehicular
movement requirements. Access roads to all parts of the project site were constructed after
finalizing the above areas. The temporary buildings for L&T offices were also built keeping
in mind the requirement of maintenance for 5 years post completion. All necessary drainage
works, infrastructure works and office establishment were constructed. The PPE and
non-PPE areas were clearly demarcated as per the layout.

Fig 4.2: Site Location of IT City, Mohali

4.1.3 Project Introduction


GMADA IT City Mohali consists of IT industries, residential areas and commercial
establishments. GMADA IT City is located in sectors 82A, 83A and 101A, which are situated
on a 200-feet-wide road. The GMADA IT City Mohali project is built on an estimated area of
1,688 acres, out of which 400 acres will be used for IT industries, another 1,100 acres for
residential purposes and the remaining 188 acres for commercial establishments. 80% of the
plots in GMADA IT City Mohali have already been given to landowners under land pooling,
and the remaining plots would be offered to the public through a draw by GMADA.

The Waste Water Business Unit of the WET IC of L&T Construction was awarded the
construction of roads, the development of public health services (i.e. the water supply and
distribution network and the sewerage, storm water and treated water networks), and
electrical services including HT and LT cabling works and street lighting, complete with
maintenance of the PH and electrical services for a period of five years.
The project was initiated on 30th August 2014 after receiving environmental clearances from
the State Departments. The estimated project duration was 18 months with a completion date
of 30th April 2017. However, due to multiple interconnected issues and discrepancies in
resource handling, the project got delayed. As per the monthly progress report for April 2017,
the physical progress was 95% with a financial progress of 86%.

4.1.4 Project Completion Timeline


There have been multiple reasons for the delay in completion of the project. After a detailed
audit was reported, it was found that at the onset 50% of the work area was on hold by
GMADA and work was not permitted in those areas. These restrictions to access to land
hindered the process of constructing access roads and temporary structures.
Possession of the site was given in phases, with 50% of the project site handed over at the
start. Of the remaining 50%, 25% was handed over in June 2015 and 5% in July 2016. Even
after 24 months, 20% of the site was still on hold and its use not permitted by the GMADA
authorities.
The second major cause of delay was the issuance of designs and drawings. Since the work
included water supply, public utilities, electrical networks and several other PH services, the
drawings were erratically issued. As per the report:
 Design and drawings of Sewerage, Storm water, Water Supply were issued to us vide
letter no 2015/656 dated 19.02.2015 between Rd 1 to Rd 5 & Block I.
 Design & drawing of Public Health utilities between Rd 5 & 6 were issued on
10.04.2015.
 Revised utilities drawings on approved layout plan Between Rd 6 to 8 & upper
portion issued to us vide letter No: 2631 Dt. 01.07.2015
 Electrical LT & HT Scheme yet to be approved
 It was not possible to lay PH services in the blocks, as the location of the plot gates
was required to locate manhole chambers, inspection chambers and other house
service connections. This drawing was received through mail from the DTP Dept. on
13th April 2015.
Also, there was a no-work period of 3 months from December 2014 to February 2015, as
during this time GMADA undertook a feasibility check of the IT City Layout Plan, as shown
in Figure 4.3.

Fig 4.3: Project Schedule Completion - Planned vs Actual

4.2 Case Study 2
4.2.1 Site Details
1. LOA DATE : 04-03-2014
2. AGREEMENT DATE : 10-04-2014
3. LOA REFERENCE NO. : BUIDCo/Yo-24/10(lll) -665 dated 04-03-2014
4. START DATE : 24-04-2014
5. COMPLETION DATE : 24-06-2016
6. ORIGINAL PROJECT DURATION : 26 Months
7. DLP PERIOD : 1 year after project completion
8. OPERATION & MAINTENANCE : NIL
9. CONTRACT VALUE : 254.52 crores
10. TYPE OF CONTRACT : Item Rate Contract

Fig 4.4: Reinforcement Works for Ghat Floor Slab at Chaudhary Tola Ghat

4.2.2 Brief Scope of Work:
PART A:
1) Development of Ghats - 20 No's.
2) Promenade of 6.6 km.
3) Crematorium at Gulvi Ghat.

PART B: BUILDINGS - 5 NOS.

PART C: ELECTRICAL WORKS

PART D: ENVIRONMENTAL MANAGEMENT PROGRAM

Fig 4.5: Concrete Work under progress using Barge Mounted Batching Plant

4.2.3 Civil Works (Approx):
1. Earthwork
1. Excavation : 85,000 cum
2. Filling : 90,000 cum
2. Pile Work : 9,000 no's
3. Concrete : 83,500 cum
4. Reinforcement steel : 8,500 MT
5. Shuttering : 3,80,000 Sqm
6. Brickwork : 11,000 cum
7. Flooring works : 1,32,000 Sqm
8. Plastering & Finishing works : 4,54,000 Sqm
9. Gabion : 25,000 cum

4.2.4 Electrical Work:


1. Double Chambered body cremation furnace : 2 No's
2. Electrical Wiring : 19,000 meter

This project was also completed, as of January 2017. The project was delayed by a period of
5 months, mainly due to intense rainfall in the Gangetic belt during the monsoon season. The
project was under the Industrial and Large Water unit of the Water & Effluent Treatment IC
of L&T.

CHAPTER 5
RESULT AND DISCUSSION
5.1 Results of Case Study 1
5.1.1 Project Data
The latest update of the site MPCS contained the details of every activity performed monthly
up to the present. The site construction was complete, which gave a full picture of the project
resource requirements. The MPCS records data in a planned vs actual format. The planned
amount of monthly usage of resources under each activity was calculated in the project
initiation phase.
Here, there is a need to understand the planned variable inputs in projects. The planned
resources are fixed under various assumptions and on the basis of previous experience. The
data calculated for planned resources assume an overall optimum set of conditions. Certain
allowances are obviously made, be they for seasonal or other uncontrollable hindrances.
However, most of the time these planned data do not accommodate various unforeseeable
circumstances. The actual data obtained from the physical implementation of activities on
site therefore differ markedly from the estimates.
For the model analysis, the actual data obtained from the sites were taken as input data. This
raises an interesting question regarding the choice of input parameters, discussed below.

5.1.2 Choice of selection of parameter


A debate can arise over which data to take for the model analysis. The vast differences
between the actual and planned data can produce different results. So how should the choice
of data for this specific case study be interpreted?
Primarily, it should be understood that the choice of parameter for the prediction does not
affect the case study as such. If planned values were input, the model predicted the future
values in that fashion, and similarly for actual values. The hindrances, negligence and delays
that cause the actual values to deviate from the planned values are not relevant to the
prediction.
In this model, we have taken actual values of resource requirement for activities due to a
specific reason. In planned-value model prediction, the estimated values have a slightly
better correlation with each other owing to the fact that the values are estimated by the rule
book. There is very little noise in these values, so for a case study it is far easier to correlate
these values and predict the future estimates. However, in the actual data the discrepancies,
or noise, are very high, as the values do not correlate well with one another. This makes it
harder for the model to find the underlying pattern in these values. Hence these actual
resource values were used to test the efficiency of the model under adverse circumstances. In
future, if operations on site are made more viable and less risk prone, the actual values will
have a more established connection, much closer to the planned values. This will make the
work of the model easier.

5.2 Result and Analysis of Case Study 1


5.2.1 Regression Graph

Fig 5.1 Regression Graph of Case Study 1

As described earlier, the entire data set was divided into 3 parts: training set, validation set
and testing set. For this project analysis, the data was divided as 70%, 15% and 15%. The
main consideration in deciding how to divide the data sets is the normalizing effect across
the input parameters. If the values lie within a finite range and the percentage of noise is
below 20%, the training set is given the bulk of the inputs. This helps to formulate a better
non-linear relation between the inputs. Since there is very little noise in the remainder of the
data, it is highly probable that these values will more or less fit the designed correlation.
The combined regression ‘r’ value of all three subsets comes to 0.997, as shown in Figure
5.1. This is quite a high value, since the optimum value is 1. It shows the stability of the data
set and that the range of errors is limited. The regression values sit high on the graph scale,
indicating that the LM algorithm has an inbuilt mechanism to fine-tune the weights towards
the point of least error. It should also be noted that a normalization process standardizing the
values was applied to this set of data inputs. The performance graph shows a strong
convergence between the training and validation data sets. This normalization had a great
impact in filling the “Data Gaps”. The standardization of the inputs helped to reduce the
noise and produce a more consistent result. In the larger scheme, the slight adjustment of the
data inputs due to normalization does not make the prediction unreliable; the normalization
takes into account the range and variation of consecutive values.
The regression graph shows a very high convergence pattern within the input data. This
indicates a high correlation among the individual input parameters and shows that the noise
level in the input data is minimal, well within the prescribed 20% limit.

Fig 5.2: Performance Graph of Case Study 1

The performance graph shows that the validation data set and the training set are closely
correlated, to a degree within the error range. The testing data falters to a certain extent, as
its curve shifts slightly upward, following a similar path but with a deviated pattern. The
testing or predicted data had the best convergence at the 9th epoch, where the minimum MSE
was obtained, as shown in Figure 5.2. As detailed earlier, the epoch is the iteration number at
which the model continues to search for the minimum error. Training continued until the
15th epoch, after which the error between the predicted values and the actual values starts
diverging. The LM algorithm escapes local minima by continuing the iterations until such a
condition occurs. The points of intersection before the 9th epoch between the testing and the
validation and training data sets indicate local minima; these are dealt with by the LM
algorithm.

5.2.2 Time Series Graph

Fig 5.3: Time Series Graph of Case Study 1

The time series graph in Figure 5.3 shows all the points of the input data and the predicted
values for all three input subsets; each point denotes both values. Time series analysis is used
because the recorded data follow a monthly pattern, although no direct connection between
the months and the data values can be found. The time series framing makes it easier for the
algorithm to treat each value as indexed to a particular time (in this case one month), which
is used as the corresponding point to the next month's value.
The time series graph also shows the error for each predicted value. The error analysis is
described in detail in the next section.

5.2.3 Error Analysis


In the model, error analysis is done on the basis of the mean squared error and the sum of
squared errors. In the methodology chapter, these errors are described in detail, along with
several other performance parameters. However, only these two parameters were used here.
This is because the data range is wide and unbounded; in order to understand the full scope
of the work, the entire sum of errors needs to be considered, while the mean squared error
gives a mean value over the entire error matrix. The range and limit of the error values is not
fixed. The recommended maximum error value is determined by eliminating the noise and
then driving the MSE value towards 0. Although 0 is the optimum value for any error
variable, a large error value in this case study reflects minor individual errors over a
sufficiently large data set. In such cases, the regression ‘r’ value comes into play: the better
the ‘r’ value, the more acceptable the error value is.

5.2.3.1 Error Histogram

Fig 5.4: Error Histogram of Case Study 1

The error histogram in Figure 5.4 shows the deviation and magnitude of the error at all the
data points. The central yellow line marks the point of zero error. In this graph, 60% of the
inputs have near-zero error, i.e. fall within the prescribed central bin. The remaining inputs
have errors distributed as shown in the graph. As can be seen, the error range lies between
-86.79 and 61.76, and only 12% of the values fall beyond the central acceptable range of
-30 to 30.

Fig 5.5: Error Auto-Correlation Graph of Case Study 1

The mean squared error (MSE) comes to a value of 170.85, which is within reasonable
limits; the MSE gives the mean squared deviation of the errors from 0. The sum of squared
errors (SSE) comes to about 6.08 × 10⁴. As we can see from the error correlation
graph in Figure 5.5 the error deviations are within the considerable ranges as shown by the
confidence limit lines. This reasserts that the model prediction has a high convergence of data
inputs.
From the outcome of the predicted data, the model worked well for the given data set of the
first project. The correlation ‘r’ value comes to 0.997 while the MSE is 170.85. The estimated
data are highly accurate, and the approach suits such projects and sequences of events well.
The parameters for estimation and prediction were the 4M resources, namely Manpower, Machinery, Materials and Money. These resources were estimated based on the essential activities over the entire project completion duration; in this 1st case study the activities were road laying, excavation and backfilling, masonry, reinforcement and concreting. The finalized weights for the above parameters, in respective order, were 5.1159, 3.0205, 2.2348, 1.087 and 0.5289, with a mean squared error of 170.85.

5.3 Results of Case Study 2
5.3.1 Project Data
As in the first project, the data were collected from the MPCS records after project completion. The work was divided into activities such as concreting, excavation and reinforcement, and the resource requirements for each of these activities were listed. The resources primarily considered for this study are the manpower, machinery and material used. The activities taken into consideration are generally the important activities that form the bulk of the construction work. Peripheral activities of only minor importance were not considered: activities such as flooring, wood work, aluminium work and painting each occupy only a small part of the overall project timeline, so their monthly requirement is essentially 0 except for the few months in which they are carried out. Entering such inputs would not produce a good correlation equation and generally tends to fail and create large errors.

5.4 Result and Analysis of Case Study 2


5.4.1 Regression Graph
The entire data set was first divided into three parts, a training set, a validation set and a testing set, split as 50%, 25% and 25% for this project. Data normalization was performed on the inputs before training. In this case, however, the data inputs form a highly deviant, non-convergent graph. The combined correlation ‘r’ value comes to 0.75, as shown in Figure 5.6, which can be said to be just at the edge of the acceptable limit; as stated earlier, an ‘r’ value should be as close to 1 as possible.
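A minimal sketch of this kind of preprocessing is given below. The min–max scaling to [0, 1] and the sequential 50/25/25 split are assumptions for illustration; the study's own normalization formula and division of data (which may be random) are not reproduced here.

```python
import numpy as np

def minmax_normalize(x):
    """Scale a series to the range [0, 1] (assumed form of normalization)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def split_50_25_25(x):
    """Split a series sequentially into training (50%), validation (25%) and testing (25%)."""
    n = len(x)
    n_train = n // 2
    n_val = (n - n_train) // 2
    return x[:n_train], x[n_train:n_train + n_val], x[n_train + n_val:]

data = np.array([0.0, 5.0, 12.0, 30.0, 22.0, 18.0, 40.0, 35.0])  # illustrative only
train, val, test = split_50_25_25(minmax_normalize(data))
print(len(train), len(val), len(test))  # 4, 2, 2
```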

Fig 5.6: Regression Graph of Case Study 2

It should be noted that normalization was also applied to this data set; however, it contained large “data gaps” with no values. As per the site records, no work was done during the months of heavy rain, and to account for this the corresponding data inputs were set to 0. The problem arises when the LM algorithm tries to find the correlation pattern between the data inputs: a consistent series of 0 values, whether in the middle of the series or at its ends, creates an artificial decreasing trend in the resource requirement. When this effect is superimposed on the genuine trend in the data, a different pattern emerges, and the pattern finalised in the validation phase then predicts wrong values for future inputs.

The error is very high in this case because the data hardly converge, as shown in the performance graph:

Fig 5.7: Performance Graph of Case Study 2

Figure 5.7 shows that epoch 3 is taken as the best point, after which the validation and testing curves diverge or follow almost parallel paths with respect to each other. The iteration process was continued until the 9th epoch, beyond which any further convergence is highly unlikely.
The point to be made in this case study is that this particular analysis of the data inputs is not completely wrong. The correlation value comes to quite a significant 0.75, which is reasonable given the conditions. The error, however, is very high, which indicates substantial individual errors. As stated for the previous model, there is no fixed cap or range of acceptable error, since the total error can grow with the number of data inputs; the most reliable indicator is the correlation value. Going by that, this predicted data set is still a moderately good output. This is discussed further in the error analysis.

5.4.2 Time Series Graph

Fig 5.8: Time Series Graph of Case Study 2

In Figure 5.8, each data point is represented by both its actual value and its predicted value for all three input subsets. As for Case Study 1, the time series treatment is used because the recorded data follow a monthly pattern even though no direct relationship between the month and the data value can be found; indexing each value by its month allows the algorithm to use it as the reference point for the next month's value. The graph also shows the error for each predicted value; the error analysis is described in detail in the next section.

5.4.3 Error Analysis


The parameters used to measure error are the same mean squared error and sum of squared errors, for which the optimum value is 0. However, as the following graphs show, the error values here are quite high, of the order of 10^6. The error arises from the substantial individual errors of the individual data points.

5.4.3.1 Error Histogram

Fig 5.9: Error Histogram of Case Study 2

The zero error line in Figure 5.9 is slightly skewed towards the left, as for the majority of errors the predicted value exceeds the actual value; negative errors dominate the data set. The zero-error bin is wider in this case, at about -75 to 75, and the majority of data inputs show errors in the range of -1000 to 1000. This is higher than in the previous model, for the reasons discussed earlier.
Another point that emerges from the high error values is the degree of data scattering, or noise. In this model the proportion of noisy, scattered data is very high, exceeding the acceptable limit of 20%, which is a further reason the error is large. Data scattering is one of the main obstacles to obtaining a uniform correlation equation between the input data sets, and it also causes the skewing of the histogram above. The noise could not be reduced by data normalization in this case because of the pre-existing data gaps. This led to a “pseudo effect” in which a spurious correlation parametric equation was formed between the input and output: the model tried to predict values while taking several intermediate zero values into account, so it formed a decreasing gradient for value estimation that the subsequent values did not follow.

Fig 5.10: Error Auto-Correlation Graph of Case Study 2

The Mean Squared Error (MSE) comes to 7.17 × 10^6, which is rather high; the MSE gives the mean squared deviation of the errors from 0. The sum of squared errors (SSE) comes to about 1.65 × 10^9. As the error auto-correlation graph in Figure 5.10 shows, the error deviations nevertheless lie within acceptable ranges, as indicated by the confidence limit lines. This reasserts the earlier point that high error values do not necessarily signify wrong predictions: given the inputs and their inherent data shortcomings, the predicted values lie within range, which shows that the model does indeed cope with such discrepancies as well.
Based on the predicted outputs, the model worked moderately well for the given data set of the second project. The correlation ‘r’ value comes to 0.75 while the MSE is 7.17 × 10^6. The estimated data show prediction difficulties that could be eliminated with better data inputs and no data gaps.
The parameters for estimation and prediction were again the 4M resources, namely Manpower, Machinery, Materials and Money, estimated based on the essential activities over the entire project completion duration. In this 2nd case study the activities were excavation, piling, reinforcement and concreting. The finalized weights for the above parameters, in respective order, were 12.666, 12.902, 6.514 and 1.307, with a mean squared error of 7.17 × 10^6.

5.5 Comparative Analysis
As stated in the individual case study results for the two projects, the first prediction worked very well while the second had certain issues. These issues do not represent the entire spectrum of problems that may arise in such ANN models; however, in the resource analysis of large construction projects, the underlying errors in judgement arise from the problems stated above. Assessing these misinterpretations of the data brings certain core aspects of how such models work into the picture.
Looking closely, these prediction errors or misinterpretations do follow certain patterns, or traceable symptoms, and such mechanisms need to be understood in order to fulfil the desired goals of the model in question. The main reasons can be explained as follows:

5.5.1 Data Gaps


As suggested earlier, the concept of data gaps is neither novel nor exceptional. Working on large, multi-dimensional projects requires an overall view of project progress, and analysing this vast database only in terms of resource requirements carries a real risk of overlooking aspects such as project delays and improper risk management. “Data gaps” are essentially the result of such overlooking. When resource data are assessed on a monthly basis, not every resource will be required throughout the entire data period; certain resources may be required only for a short, intermittent phase of the project.
It may be noted that, in trying to eliminate this discrepancy, activities without an overall impact on the project timeline were eliminated at the outset; resource requirements for such activities can be predicted separately on a smaller scale with sufficiently good results. However, certain activities continue year-round on the project site and tend to use a specific set of resources throughout the project. Even for these activities the problem of data gaps, i.e. the absence of intermittent data, persists. For other prediction problems such as rainfall and temperature, missing data can be accounted for by various data mining techniques, whose core aim is to recover input data that occurred but were never registered in any format; the data exist even if they are not recorded anywhere. The problem with resource analysis of construction project sites is that there are cases where the data are missing in a stronger sense: they are not merely absent from any recorded database, they never occurred at all. In such cases there is no choice but to enter 0 for those data values.

Doing so creates a problem for the prediction algorithm (in this case trained with the LM algorithm), as it tries to formulate a pattern that includes the 0 values, which leads it to generate lower or random values in their place.
Uniformity of data is a must for understanding the series and predicting accurate values. The problem of data gaps cannot be accommodated by the pre-defined algorithms, and it is important to understand that data normalization is not a solution here: it cannot remedy the absence of data.
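One way to make the distinction between "no work occurred" and "a genuine zero observation" explicit is sketched below, under the assumption that the gap months (for example, the monsoon months) are known in advance: pairs touching a gap month are simply excluded from training rather than fed to the network as literal zeros. This is an illustrative workaround, not the procedure used in this study.

```python
import numpy as np

def drop_gap_pairs(series, gap_months):
    """Build one-month-lag (input, target) pairs, skipping any pair that
    touches a month flagged as a data gap (no work actually occurred)."""
    series = np.asarray(series, dtype=float)
    gaps = set(gap_months)
    inputs, targets = [], []
    for t in range(len(series) - 1):
        if t in gaps or (t + 1) in gaps:
            continue  # neither use a gap month as input nor predict into one
        inputs.append(series[t])
        targets.append(series[t + 1])
    return np.array(inputs), np.array(targets)

# Illustrative series with two monsoon months (indices 3 and 4) left as gaps.
monthly_resource = [80, 95, 110, 0, 0, 120, 130, 140]
x, y = drop_gap_pairs(monthly_resource, gap_months=[3, 4])
print(x, y)  # [80. 95. 120. 130.] [95. 110. 130. 140.]
```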

5.5.2 Noise - Scattered Data


In the prediction for the second project, the regression graphs showed a number of scattered data points, i.e. values that deviate from the generated prediction equation. Such values push the errors much higher than under normal circumstances, and when the noise in a data set is high, the error and correlation ‘r’ values shift decisively away from their optimum. In general, scattering of up to 20% of the data inputs is acceptable under normal circumstances, and a good prediction model can still result.
The scattering of data, however, has a very simple solution: data normalization. In this study both data sets were normalized to obtain standardized data values, so normalization was also applied to the first model. Scattered data therefore do not necessarily create problems for prediction: in the first model the noise was well within the desired 20% limit and a good result was obtained, whereas in the second case the data scattering exceeded 20% and the error values were somewhat skewed.
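The 20% scattering limit mentioned above can be checked numerically once predictions are available. The sketch below estimates the fraction of points whose residual falls outside a chosen tolerance band; the band width and the example values are assumptions for illustration, since the study does not define scattering by a formula.

```python
import numpy as np

def scatter_fraction(actual, predicted, tolerance):
    """Fraction of points whose absolute residual exceeds the tolerance band."""
    residuals = np.abs(np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float))
    return float(np.mean(residuals > tolerance))

# Illustrative values only.
actual = np.array([100, 120, 90, 300, 110, 95, 105, 400])
predicted = np.array([102, 118, 95, 150, 108, 97, 100, 180])
frac = scatter_fraction(actual, predicted, tolerance=30)
print(f"{frac:.0%} of points lie outside the band")  # 25% here, above the 20% limit
```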

5.5.3 Pseudo Effect


This phenomenon is in conformation of the error caused due to “data gaps” and noise or data
scattering. In this process, the presence of scattered data creates a “pseudo” correlative
parametric equation between the input and output. Due to this effect, the model tried to
predict data taking into consideration several skewed and zero values in the middle. The
result was that, it formed a decreasing gradient slope for value estimation whereas the
subsequent values did not follow the trend. This kind of value estimation creates several
pressure points where we see huge increase in error as the pattern tries to shift from a low
lying 0 trend to absolutely higher value zones.

5.5.4 Volume of Data-Set
One of the most important requirements for such models is the sheer volume of data. For a model to work well and predict nearly accurate outputs it needs a large data set to train from: the larger the data set, the better the predictive power of the algorithm. Since the data set is divided into three subsets, namely the training, validation and testing sets, a large volume of data needs to be assigned to the training part of the model.
In the first model the data set is much larger than in the second. The first project had a 5-year timeline with 5 crucial activities, which produced a larger data set with more meaningful insight into the pattern of data variation. The second project had a timeline of little more than 3 years, or about 40 months, and its only essential activities were excavation, piling, reinforcement and concreting; even among these, reinforcement and concreting formed only part of the overall operation, which further reduced the usable data inputs.
Another interesting point of discussion is the error generated between the outputs and the targets. As noted briefly earlier, a large data set tends to produce a larger cumulative error for obvious reasons: as the number of data points increases, each point contributes an error value, increasing the total error produced. This relationship, however, is not proportional. Two things must be kept in mind: first, a large data set creates a higher number of individual errors; second, a large data set also yields a better fit for the resulting prediction pattern. Although these two statements appear to conflict, the two mechanisms act as counteracting forces against each other, creating a more stable neural pattern. Since the algorithm works on the principle of reducing error by adjusting the pattern weights, and not the other way round, the better fit ultimately dominates: the larger the data set, the better the predictability of the algorithm. Although more individual errors are created, their magnitudes become negligible, thereby reducing the overall cumulative error.
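To make the counteracting effect concrete, the short sketch below compares a small set of moderate per-point errors with a larger, better-fitted set; the numbers are invented purely for illustration. When the fit improves enough, even the cumulative SSE of the larger set ends up lower despite the greater number of error terms, although whether that happens in practice depends on how far each individual error shrinks.

```python
import numpy as np

# Small data set: 5 points with moderate individual errors (illustrative numbers only).
small_errors = np.array([15.0, -20.0, 18.0, -12.0, 22.0])
# Larger data set: 20 points, each with a much smaller error thanks to a better fit.
large_errors = np.array([5.0, -5.0] * 10)

for name, errors in [("small", small_errors), ("large", large_errors)]:
    sse = float(np.sum(errors ** 2))   # cumulative error, grows with the point count
    mse = float(np.mean(errors ** 2))  # per-point error, reflects the quality of fit
    print(f"{name}: N={len(errors)}, SSE={sse:.0f}, MSE={mse:.1f}")
# small: N=5,  SSE=1577, MSE=315.4
# large: N=20, SSE=500,  MSE=25.0  -> even the cumulative error falls when the fit improves enough
```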

CHAPTER 6
CONCLUSION

With the help of two distinct iterative processes, using separate data from two individual sites and applying a Nonlinear Autoregressive Neural Network, certain characteristic observations were obtained. General assumptions about data representation, prediction, misinterpretation and error generation were made which may be applicable to all construction projects. Better normalization of data, more innovative algorithms, larger data sets and more convergent data inputs would certainly have yielded better results; nevertheless, given the persistence required to run each model manually with varying sets of parameters, the best achievable output has been obtained. Some observations noted in the process of training, validating and testing the neural network are as follows:
• The parameters for estimation and prediction were the 4M resources, namely Manpower, Machinery, Materials and Money. These resources were estimated based on the essential activities over the entire project completion duration. In the 1st case study the activities were road laying, excavation and backfilling, masonry, reinforcement and concreting; in the 2nd case study they were excavation, piling, reinforcement and concreting.
• The finalized weights for the parameters of the case studies were obtained. In the 1st case study, the weights in the order of the above parameters were 5.1159, 3.0205, 2.2348, 1.087 and 0.5289, with a mean squared error of 170.85. For the 2nd case study the weights, in the order of the parameters above, were 12.666, 12.902, 6.514 and 1.307, with a mean squared error of 7.17 × 10^6.
• The main determinants of the success of the tests are the mean squared error (MSE) and the correlation coefficient (r). The best correlation was obtained in the first iterative model, at an average value of 0.997 across all three phases combined. The correlation coefficient of the second model comes to a decent 0.75, a moderately suitable prediction; that analysis does not provide accurate predictions but brings certain data shortcomings into the picture.
• This model can be used to predict accurately the resource requirements of partially completed projects from data on the work already completed. It can help predict the resource volumes of any specific site in accordance with its historical data, and it captures the slight nuances and differences of each individual site with precision.

