
COPYRIGHT AND CITATION CONSIDERATIONS FOR THIS THESIS/ DISSERTATION

• Attribution — You must give appropriate credit, provide a link to the license, and indicate if
changes were made. You may do so in any reasonable manner, but not in any way that
suggests the licensor endorses you or your use.

• NonCommercial — You may not use the material for commercial purposes.

• ShareAlike — If you remix, transform, or build upon the material, you must distribute your
contributions under the same license as the original.

How to cite this thesis

Surname, Initial(s). (2012) Title of the thesis or dissertation. PhD. (Chemistry)/ M.Sc. (Physics)/
M.A. (Philosophy)/M.Com. (Finance) etc. [Unpublished]: University of Johannesburg. Retrieved
from: https://ujdigispace.uj.ac.za (Accessed: Date).
The design of a vehicle traffic congestion Bayesian
prediction model for Gauteng

by

ZIPHOZETHU PEARL NKOSI

A dissertation submitted in fulfilment of the requirements for the Degree

of

Magister Commercii

in

Information Technology Management

Faculty of Management

UNIVERSITY OF JOHANNESBURG

Supervisor: Dr B.N. Gatsheni

2015
ABSTRACT

Traffic congestion is a challenge in the Gauteng province of South Africa, and it has a
negative impact on the province's economy in that services and products are not
rendered on time. Traffic congestion affects the quality of life of Gauteng residents and
visitors alike. Historical vehicle traffic data for the freeway linking Midrand with Florida in
Johannesburg was collected from Mikros Traffic Monitoring (MTM), an agency contracted
by the Gauteng Department of Transport. This data was used to construct the vehicle
traffic flow prediction models. In this research, the Bayesian network model provides a
reliable alternative to the other evaluated traffic flow prediction models, namely the Naive
Bayes, k-Nearest Neighbor and Decision tree models. Cross-validation and the root
mean square error were used to evaluate the models. The results of this study will benefit
both commuters and employers by reducing stress levels and saving costs for companies,
improving the South African economy, and assisting the Gauteng Department of Transport
in aligning future road traffic strategies.
ACKNOWLEDGEMENTS

First and foremost, I would like to thank God Almighty for giving me the strength and
patience to complete this dissertation. I would not have made it on my own.

Secondly, I would like to express my sincere appreciation to my supervisor, Dr Barnabas
Gatsheni, for his guidance, time, expertise and encouragement. Thank you to my fellow
friends for encouraging and inspiring me to pursue academic excellence.

Finally, I would like to thank my husband, Andile Nkosi, for being a constant source of
encouragement and support. This dissertation is dedicated to my dearest son Musa and
my lovely sisters Cleo, Sbahle, Yongie, Thando, Noma and Blessing. I appreciate your
support and fervent prayers.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS ........................................................................................................................................... I

CHAPTER 1: INTRODUCTION .................................................................................................................................. 1

1.1 INTRODUCTION ............................................................................................................................................1


1.2 A BACKGROUND TO JOHANNESBURG AND ITS TRANSPORTATION SECTOR .................................................................4
1.3 THE RESEARCH PROBLEM STATEMENT ...............................................................................................................6
1.4 RESEARCH OBJECTIVE ....................................................................................................................................7
1.5 RESEARCH QUESTION ....................................................................................................................................7
1.6 JUSTIFICATION OF THE RESEARCH .....................................................................................................................8
1.7 THE RESEARCH REPORT STRUCTURE ..................................................................................................................8

CHAPTER 2: LITERATURE REVIEW .......................................................................................................................... 9

2.1 RELATED WORK............................................................................................................................................9


2.2 THE FRAMEWORK .......................................................................................................................................15
2.3 CHAPTER CONCLUSION................................................................................................................................16

CHAPTER 3: RESEARCH METHODOLOGY .............................................................................................................. 17

3.1 CHAPTER INTRODUCTION .............................................................................................................................17


3.2 RESEARCH METHODOLOGY ..........................................................................................................17
3.3 LIMITATIONS .............................................................................................................................................19
3.4 RESEARCH PROCEDURE AND METHODS ...........................................................................................................19
3.4.1 Bayesian Networks .....................................................................................................................19
3.4.2 Naive Bayesian networks ...........................................................................................................33
3.4.3 Decision Tree (C4.5) ....................................................................................................................34
3.4.4 Neural networks .........................................................................................................................36
3.4.5 k-Nearest-Neighbor (KNN) ..........................................................................................................38
3.5 CHAPTER CONCLUSION................................................................................................................................40

CHAPTER 4: EXPERIMENT .................................................................................................................................... 41

4.1 CHAPTER INTRODUCTION ................................................................................................................................41


4.2 EXPERIMENT .............................................................................................................................................41

4.2.1 Cross-validation Method ............................................................................................................44
4.2.2 Recipe for constructing the Bayesian model .............................................................................46
4.2.3 The Naive Bayes Prediction Model .............................................................................................49
4.2.4 Bayesian Belief Network (BNN) ..................................................................................................52
4.2.5 k-Nearest-Neighbor ....................................................................................................................55
4.2.6 Decision Tree ..............................................................................................................................57
4.3 POST PROCESSING ......................................................................................................................................59
4.3.1 Total cost for Naive Bayes ..........................................................................................................61
4.3.2 Total cost for Bayesian Network ................................................................................................61
4.3.3 Total Cost for k-Nearest Neighbor ..............................................................................................61
4.3.4 Total cost for Decision tree .........................................................................................................61
4.3.5 The attribute selection ...............................................................................................................64
4.4 CHAPTER CONCLUSION................................................................................................................................68

CHAPTER 5: SUMMARY, CONCLUSION AND RECOMMENDATIONS ...................................................................... 69

5.1 SUMMARY ................................................................................................................................................69


5.2 CONCLUSION AND RECOMMENDATIONS..........................................................................................................71

CHAPTER 6: REFERENCES ..................................................................................................................................... 73

ANNEXURES ......................................................................................................................................................... 77

LIST OF FIGURES

FIGURE 1.1: MAP OF JOHANNESBURG WITH RING ROAD AND REGION MARKINGS............................................... 5

FIGURE 2.1: TRAFFIC CONGESTION RESEARCH FRAMEWORK .............................................................................. 15

FIGURE 3.1: BAYESIAN BELIEF NETWORK STRUCTURE ......................................................................................... 21

FIGURE 3.2: DIRECT ACYCLIC GRAPH .................................................................................................................... 23

FIGURE 3.3: JOINT PROBABILITY DISTRIBUTION GRAPH ...................................................................................... 26

FIGURE 3.4: SERIAL CONNECTION ........................................................................................................................ 29

FIGURE 3.5: DIVERGING CONNECTION ................................................................................................................. 30

FIGURE 3.6: CONVERGING CONNECTION ............................................................................................................. 30

FIGURE 3.7: BAYES NETWORK STRUCTURE .......................................................................................................... 33

FIGURE 3.8: DECISION TREE SHOWING FIRST STEP OF ID3 .................................................................................. 35

FIGURE 3.9: ARTIFICIAL NEURAL NETWORK STRUCTURE (MULTILAYER PERCEPTRON (MLP)) .............................. 37

FIGURE 3.10: 1- NEAREST NEIGHBOR ................................................................................................................... 39

FIGURE 4.1: SHOWS A PROCESS FOR CONSTRUCTING THE MODEL ...................................................................... 45

FIGURE 4.2 (A): SHOWS HOW THE MODEL IS USED TO PREDICT NEW INSTANCE ................................................. 45

FIGURE 4.2(B): SHOWS THE FLOW FOR THE CONSTRUCTION OF THE MODEL AND ITS USE IN PREDICTING A NEW
INSTANCE ............................................................................................................................................................ 46

FIGURE 4.3: LOADING SAVED TRAINING EXAMPLES ON WEKA ............................................................................ 48

FIGURE 4.4: THE RESULTS FROM THE BAYESIAN NETWORK PREDICTION MODEL ................................................ 49

FIGURE 4.5: TRAINING PREDICTION RESULTS FOR THE BN ALGORITHM USING K2-P1 -S BAYES SEARCH ............. 53

FIGURE 4.6: K-NEAREST NEIGHBOR TRAINING PERFORMANCE RESULTS WITH 4-NEIGHBORS ............................. 56

FIGURE 4.7: PREDICTION ACCURACY PER MODEL ............................................................................................... 63

FIGURE 4.8: RMSE PER MODEL ............................................................................................................................ 63

FIGURE 4.9: TOTAL COST PER MODEL ................................................................................................................. 64

FIGURE 4.10: RESULTS OF A SEARCH METHOD FOR SELECTING THE BEST ATTRIBUTES ........................................ 65

FIGURE 4.11: SHOWS THE VARIATIONS OF VEHICLE TRAFFIC VOLUME WITH TIME OF DAY ................................. 67

LIST OF TABLES

TABLE 3.1: JOINT PROBABILITY DISTRIBUTION OF A, B AND C ............................................................................. 27

TABLE 4.1: 24 OF THE 520 TRAINING EXAMPLES ................................................................................................. 43

TABLE 4.2: A SUMMARY OF TRAINING AND TEST DATA SETS .............................................................................. 43

TABLE 4.3: NAIVE BAYES RESULTS ........................................................................................................................ 51

TABLE 4.4: NAIVE BAYES CONFUSION MATRIX USING THE NEW DATA ON THE MODEL ...................................... 51

TABLE 4.5: CONDITIONAL PROBABILITY TABLE (CPT) ............................................................................................. 53

TABLE 4.6: BAYESIAN NETWORK RESULTS USING THE K2-P1 -S BAYES SEARCH ................................................... 54

TABLE 4.7: BAYESIAN NETWORKS CONFUSION MATRIX USING THE TRAINING SET ............................................. 54

TABLE 4.8: RECIPE FOR KNN TRAINING ALGORITHM ............................................................................................ 55

TABLE 4.9: KNN PREDICTION RESULTS USING 4-K ................................................................................................ 56

TABLE 4.10: K-NEAREST NEIGHBOR CONFUSION MATRIX USING THE TRAINING SET ............................................. 57

TABLE 4.11: DECISION TREE PREDICTION RESULTS .............................................................................................. 58

TABLE 4.12: DECISION TREE CONFUSION MATRIX USING THE NEW INPUT TO THE MODEL ................................. 58

TABLE 4.13: THE LOSS MATRIX FOR COMPUTING THE COST OF VEHICLE TRAFFIC PREDICTION ........................... 60

TABLE 4.14: PREDICTION MODELS RESULTS ......................................................................................................... 62

TABLE 4.15: ATTRIBUTE SELECTION COMPARING TWO ATTRIBUTES FOR THE BN MODEL DATASET .................... 66

LIST OF ANNEXURES

ANNEXURE 1A: LIVE VEHICLE POPULATION ADOPTED FROM NATIONAL TRAFFIC INFORMATION SYSTEMS ...... 77

ANNEXURE 1B: PERCENTAGE OF THE TYPE OF VEHICLE USED ON JOHANNESBURG FREEWAY ............................. 78

ANNEXURE 4A: DATA FOR VEHICLE TRAFFIC FLOW COLLECTED FROM MTM....................................................... 79

ANNEXURE 4B: INSTANCES USED FOR THE VEHICLE TRAFFIC FLOW EXPERIMENT ............................................... 82

ANNEXURE 4C: PROCEDURE FOR CONVERTING DATA TO NOMINAL VALUES ...................................................... 87

CHAPTER 1: INTRODUCTION

1.1 Introduction

Traffic congestion is a challenge in the Gauteng province of South Africa, and it has a dire
impact on the province's economy in that services and products are not rendered on
time [9]. The lack of punctuality from employees due to traffic jams also affects the
economy negatively. It is estimated that every month companies lose approximately
R1.1 billion in the form of salaries paid for hours not worked [12]. This loss excludes other
costs resulting from the cancellation of meetings. Statistics show that 8% of employees'
meetings are delayed because of traffic congestion [12]. Traffic congestion affects the
quality of life of Gauteng residents and visitors alike, resulting in longer than necessary
driving times [1]. In addition, fuel emissions from vehicles contribute to greenhouse gases.
Furthermore, traffic congestion can prevent emergency services from reaching
individuals whose lives might be in peril. Among the causes of traffic congestion are
incidents such as car accidents, roadblocks, bad weather and lane closures due to road
construction.

Studies ([11]; [14]) have provided a number of solutions to traffic congestion. Technologies
such as Artificial Neural Networks (ANN) [14], Support Vector Machines [14], the Kalman
Filter [14] and Fuzzy Systems have been used to complement the available transport
infrastructure and to reduce vehicle traffic congestion in urban areas.

The ANN model was constructed by using vehicle speed (km/hr) as the input to the ANN in
predicting travel time (the time taken to move from one location to another) on a given
road [14]. The model predicted the traffic during off-peak hours better than during peak
hours. The Kalman Filter model was used for dynamic travel time information forecasts
[14]. The inputs into this model were the speed of the vehicle, the travel time and the
travel times of the rear and the adjacent cars [14]. The Kalman Filter model gave a
maximum absolute relative error of 0.015 in all time periods. When the smoothing process
was carried out as the second, improved model for predicting dynamic travel time, the
forecast results fluctuated when the proportion of the accidental factors was too large in a
single segment; the smoothing process nevertheless showed better prediction results
than the Kalman Filter. Support Vector Machines (SVM) were used to determine the
occurrence and characteristics of an incident. The inputs to this model were vehicle speed
profiles and lane-changing behaviour [14]. It was established that the SVM, as well as the
ANN, successfully classified the occurrence and characteristics of the incident into three
categories (i.e. normal conditions, passing a possible incident scene, and stopped in a
queue) [14].

Ji et al. [13] studied changes in travel time variations using the Kalman filter method.
Attributes including the speed of the vehicle, the travel time and the travel times of the
rear and the adjacent vehicles were used as inputs to the Kalman filter. The predicted
results of the Kalman filter gave a maximum absolute relative error (MARE) of 0.015 in all
periods. On smoothing [13], the forecast results fluctuated when the proportion of the
accidental factors was too large in a single segment. An accidental factor refers to
unexpected traffic incidents, such as potholes on the road, vehicle accidents and road
works, among other factors that may cause traffic to slow down. The smoothing process
showed better prediction results, with a MARE of 0.999. The study only compared the
predicted travel time against the statistical travel time to see if the model is able to predict
travel time variations. The authors did not mention whether the results obtained from the
smoothing model were able to resolve the travel time variations problem.

Zhao et al. [25] used ANN, fuzzy systems and evolutionary computation algorithms
(which overcome the nonlinearity and randomness of traffic systems) to investigate their
application in Traffic Signal Control (TSC) systems for urban surface-way and freeway
networks [25]. Zhao et al. [25] mentioned that Computational Intelligence (CI)
technologies were effective solutions for TSC. Because of the advanced sensing and
communication capabilities of CI technologies, real-time traffic measurements have
become commonly available, and these are suitable for regulating traffic flow by sensing
the minimal average delay of vehicles. However, the researchers did not conduct an
experiment for this research; there were no results, and no attributes were mentioned. In
essence, the researchers only discussed the CI technologies available for predicting
traffic congestion and did not mention which technology is the best to use for predicting
traffic congestion.

Hoong et al. [10] proposed a Bayesian Network (BN) framework for road condition
predictions. The events used were road traffic, road construction, roadblocks and weather.
For the evaluation process, they created two variations of the BN, namely a Naive
Bayesian Network (M2) and a parameter-learning Bayesian Network (M3), as benchmarks
to measure the predictive accuracy of the proposed Bayesian Network (M1); the
constructed BN was thus evaluated against both the original-CPT and the parameter-learning
variants. M2 gave the lowest average accuracy (58.57%), followed by M1 (74.37%), while
M3 scored the highest accuracy of 76.01%. According to Hoong et al. [10], the BN model
showed promising results, even though the accuracy of road traffic prediction was only
76.01%. In conclusion, they presented an approach to provide coverage of road traffic
conditions. The research relied on Twitter tweets and user reports as the main source of
traffic status evidence. This means that if users did not report traffic conditions on time,
the reports would be delayed, thereby inconveniencing commuters with incorrect report
information.

Sun et al. [24] proposed a traffic flow forecasting model for Beijing that operates under the
criterion of minimum mean square error (MSE). Their approach departs from many existing
traffic-flow forecasting models in that it explicitly includes information from adjacent road
links to analyse the trends of the current link statistically, and in that it supports traffic flow
forecasting when incomplete data exists. The BN model's performance was better than
that of the Autoregressive (AR) method, with an RMSE of 1322.6 versus 1384.0.
Comparing the results to several other methods, such as the fuzzy-neural model (FNM)
and AR, showed that the BN would be a promising and effective approach for traffic flow
modelling and forecasting, for both complete and incomplete data. The weakness of the
study is that the pre-processing of the data was not mentioned. The authors pointed out
that when a large amount of data is used the model may not be optimal; therefore, this
model may not be a good model for predicting vehicle traffic flow.

The research using the ANN model, the SVM and the Kalman Filter [14] did not address
attributes that include the day of the week, public holidays, and an increase in the
population. Thus, this study intends to use Bayesian networks, the Naive Bayes, k-Nearest
Neighbor (kNN) and Decision trees to predict vehicle traffic flow on the Johannesburg
ring road; the focus is the study of traffic patterns. Furthermore, the vehicle traffic
congestion attributes will be built into the model.

1.2 A background to Johannesburg and its transportation sector

Johannesburg, the industrial and commercial capital of Gauteng province in South Africa,
had a population of 4 434 827 and a population density of 2 696/km² in 2011 [5]. The
Gauteng population is growing at 3.18% per annum due to mass urbanisation. This growth
rate has increased transportation demand and has resulted in a 0.37% increase in the
number of vehicles per month [9].

Johannesburg has commuter trains, but these trains do not connect the important nodes
of the city. The public transport system is still very poor. However, cycle tracks are being
introduced. In addition, plans to expand the 80 km rapid rail system to link other
important nodes of the city are at an advanced stage. The focus of this study is on vehicle
traffic flow on the Johannesburg Ring Road. The Ring Road shown in Figure 1.1
comprises three freeways that converge to form an 80-kilometre loop around
Johannesburg [8]. These freeways comprise the N3 Eastern Bypass, which links
Johannesburg with Durban; the N1 Western Bypass, which links Johannesburg with
Pretoria and Cape Town; and the N12 Southern Bypass, which links Johannesburg with
Kimberley.

Figure 1.1 – Map of Johannesburg with Ring Road and region markings (adapted
from Turnbull [20])

In spite of the freeways being up to 12 lanes wide in some areas, the average travel time
can be as slow as 2.4 minutes per kilometre [6]. Annexure 1A shows the live vehicles (the
registered vehicles that use the roads) for the nine provinces in South Africa, according
to the National Traffic Information System (eNaTiS), for 30 April 2013 to 31 May 2013. It
further shows the increase or decrease in the live vehicle population over that one-month
period, clearly showing that Gauteng had the highest increase in absolute numbers among
the provinces (from 30 April 2013 to 31 May 2013).

1.3 The research problem statement

The economic growth in Johannesburg has attracted tens of thousands of people seeking
employment and other opportunities. These people come from within South Africa and
from the rest of Africa. This increase in population has put more pressure on the existing
transportation infrastructure and systems. As a result, traffic congestion is a challenge in
Gauteng province. This has a negative impact on the economy of Gauteng in that services
are not being rendered on time [9], and products become more expensive as companies
tend to factor in the costs related to the overtime allowances they pay to drivers. The lack
of punctuality by employees due to traffic jams also affects the economy negatively, as
companies are likely to miss their customers' agreed targets. It is estimated that every
month companies lose approximately R1.1 billion in the form of salaries paid for hours
not worked [12].

Current interventions by the Gauteng province include the introduction of bus rapid
transit (BRT), also called Rea Vaya, the introduction of the rapid rail called the Gautrain,
and e-tolls. However, the effectiveness of the e-tolls has been dealt a heavy blow by the
new premier of Gauteng, who gave mixed messages due to pressure from the trade union
COSATU, which opposes e-tolls. The Gautrain only connects the Gauteng province's
cities of Pretoria, Johannesburg and Ekurhuleni. Consequently, areas outside this
corridor still experience serious traffic congestion. Johannesburg is introducing cycle
tracks, but the public has not yet bought into cycling.

According to Beall et al. [3], 34% of Johannesburg commuters use public transport, 32%
walk to work or school and 34% use private transport. According to ENaTiS [9], the total
number of active vehicles using Gauteng roads is greater than 4 million as shown in
Annexure 1A.

Annexure 1B shows the percentages of vehicle types that use the Johannesburg
freeways. The vehicle types include buses, mini-taxis, vans, trucks and private
vehicles [9].

An efficient and fast transportation system is necessary, as it would be beneficial to
individuals, to organisations and to Johannesburg's economic growth. This research is
undertaken to design a model that will be able to predict traffic conditions in advance.

A number of technologies have been used to overcome the challenges of vehicle traffic
congestion [25]. However, the technologies used by other researchers have limitations in
that some of the research was carried out in simulation environments, the attributes used
were incomplete, and some research work did not mention the attributes used. The
purpose of this research is to develop a vehicle traffic prediction model for Johannesburg,
Gauteng using Bayesian networks, Naive Bayes, the nearest neighbour and decision
trees. These algorithms will be used to study the current traffic flow patterns and find the
best prediction model for predicting future traffic flow. The Bayesian network, Naive
Bayes, Nearest Neighbor and Decision tree models will be trained, and the best model will
be selected based on the rate of correct prediction, the RMSE and the cost of prediction.
The best model should be able to give real-time traffic conditions in advance, thereby
saving time and cost for commuters and organisations and thus potentially increasing
work productivity.
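As an illustration of this selection procedure, the minimal sketch below compares candidate classifiers by 10-fold cross-validated accuracy. It is not the Weka workflow used in Chapter 4: it uses Python's scikit-learn, the four attributes mirror those described in Chapter 3 (day, time of day, volume, speed) only by assumption, and the data and labels are synthetic.

```python
# A hypothetical sketch of the model-selection idea; the actual experiments
# in Chapter 4 were run in Weka, and this data is synthetic, not MTM data.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((520, 4))            # 520 instances: day, time of day, volume, speed
y = (X[:, 2] > 0.6).astype(int)     # synthetic label: 1 = congested, 0 = free flow

models = {
    "Naive Bayes": GaussianNB(),
    "4-Nearest Neighbor": KNeighborsClassifier(n_neighbors=4),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)   # 10-fold cross-validation
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

A Bayesian network learner is not included in this sketch, as the BN model was built directly in Weka (Chapter 4).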

1.4 Research objective


The objective of this study is to develop a model for predicting vehicle traffic flow in the
Johannesburg urban area of Gauteng using artificial intelligence methods.

1.5 Research question

The following questions will be asked in this study:

• What is the best model(s) for predicting vehicle traffic congestion on the Ring Road
(freeway)?
• What are the key variables for constructing the model?
• Which time of the day has the highest traffic congestion?

1.6 Justification of the research

The unemployment rate in South Africa is almost 25% [12]. Increasing efficiency in
production by industry is key to attracting investment and hence to the creation of jobs.
Vehicle traffic congestion contributes substantially to inefficiencies in the productivity of
companies, owing to workers arriving late at work or delivery vehicles being held for hours
in traffic, thus resulting in unnecessary and costly overtime.

1.7 The research report structure

Chapter 1 introduces the study, giving a general background that includes the general
problem area and the cause-and-effect relationship to be studied in the context of the
study. Furthermore, the research problem, research objectives and research questions
are outlined in Chapter 1.

The literature review in Chapter 2 raises different arguments from various studies on the
elements affecting traffic congestion in developed and developing countries. The review
of methods, data, and explanations of traffic congestion done by other researchers are
also discussed in Chapter 2.

Chapter 3 shows the methodology that was followed. It defines the different algorithms to
be used to predict vehicle traffic flow on the Gauteng Johannesburg freeway. The chapter
outlines the research design, methods used to collect data, prediction tools, and a
performance analysis for each tool.

Chapter 4 presents the experiments that were conducted. Chapter 5 presents a summary
and discussion of the findings, together with the conclusions and recommendations. A
comprehensive reference list is presented in Chapter 6. Additional figures are attached in
the Annexures.

CHAPTER 2: LITERATURE REVIEW

This chapter deals with related work done by other researchers.

2.1 Related work

Wisitpongphan et al. [23] used an artificial neural network (ANN) model for predicting
vehicle travel time in Bangkok, Thailand. The attributes used as inputs to the ANN model
included vehicle speed (km/h) and time of day: peak time (6:30-9:30 and 15:00-18:00)
and off-peak time (11:00-14:00 and from 19:00). The model predicted the real-time traffic
during off-peak hours better than during peak hours. The off-peak hour travel time
prediction deviated from the actual (GPS) data by 8.768e-8 compared to the generated
Google Maps results. Wisitpongphan et al. [23] concluded that the proposed ANN model
could provide drivers with real-time traffic conditions. However, they did not show clearly
how the experiment was done. Their approach is weak because only two attributes, speed
and time of day, were used; these do not include the volume of vehicles, which would give
drivers an indication of the real traffic status well in advance.

Ma et al. [17] used vehicle speed profiles and lane-changing behaviour as the attributes
to determine the occurrence and characteristics of an incident. The study presented a
real-time traffic condition assessment framework conducted in a simulation environment
in Spartanburg. The results showed 95% accurate vehicle traffic incident detection for the
SVM and 92% for the ANN. The occurrence and characteristics of an incident were
categorised into normal conditions, passing a possible incident scene, and stopped in a
queue [17]. The study only focused on two attributes, speed and lane-changing behaviour.
The attributes used were thus insufficient; more attributes, such as the time of day when
the incident occurred, would be useful. The research was conducted in a simulation
environment, which may produce different results from a real-world environment.

Pascale et al. [27] proposed, for California highways, an adaptive Bayesian network (ABN)
in which the network topology changes following the non-stationary characteristics of
vehicle traffic flow. Two traffic phases, free flow and congestion, were identified. Free flow
describes the situation in which vehicles can travel at the maximum allowed speed;
congestion is said to take place when vehicle speed is well below the allowed maximum
speed due to the mutual interaction of vehicles. The weakness of their model is that only
26 days of traffic flow data was collected for the experiment, in 2010. Their results
obtained using the ABN show an RMSE of 10. There are a number of models [26] [23]
that have been used for predicting traffic flow; had other models been employed, different
results might have been obtained. Their research only included 26 days of traffic data,
which is not enough for predicting traffic flow across the different seasons in a year.
Therefore, their model does not capture the full coverage of the vehicle traffic flow
status.

Ji et al. [13] studied changes in travel time variations using the Kalman filter method.
Attributes including the speed of the vehicle, the travel time and the travel times of the
rear and the adjacent vehicles were used as inputs to the Kalman filter. The predicted
results of the Kalman filter gave a maximum absolute relative error (MARE) of 0.015 in all
periods. On smoothing [13], the forecast results fluctuated when the proportion of the
accidental factors was too large in a single segment. An accidental factor refers to
unexpected traffic incidents, such as potholes on the road, vehicle accidents and road
works, among other factors that may cause traffic to slow down. The smoothing process
showed better prediction results, with a MARE of 0.999. The study only compared the
predicted travel time against the statistical travel time to see if the model is able to predict
travel time variations. The authors did not mention whether the results obtained from the
smoothing model were able to resolve the travel time variations problem.
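For reference, the MARE figure quoted above can be computed as in the following minimal sketch, assuming the usual definition (the largest absolute prediction error relative to the corresponding actual value); the travel-time values are illustrative only.

```python
# Maximum absolute relative error (MARE), assuming the definition
# max_i |predicted_i - actual_i| / actual_i. Values below are illustrative.
def mare(actual, predicted):
    return max(abs(p - a) / a for a, p in zip(actual, predicted))

actual_travel_time = [12.0, 15.5, 9.8]        # minutes, hypothetical
predicted_travel_time = [12.1, 15.4, 9.9]
print(mare(actual_travel_time, predicted_travel_time))   # ~0.0102
```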

Zhao et al. [25] used ANN, fuzzy systems and evolutionary computation algorithms
(which overcome the nonlinearity and randomness of traffic systems) to investigate their
application in Traffic Signal Control (TSC) systems for urban surface-way and freeway
networks [25]. Zhao et al. [25] mentioned that Computational Intelligence (CI)
technologies were effective solutions for TSC. Because of the advanced sensing and
communication capabilities of CI technologies, real-time traffic measurements have
become commonly available, and these are suitable for regulating traffic flow by sensing
the minimal average delay of vehicles. However, the researchers did not conduct an
experiment for this research; there were no results, and no attributes were mentioned. In
essence, the researchers only discussed the CI technologies available for predicting
traffic congestion and did not mention which technology is the best to use for predicting
traffic congestion.

Hoong et al. [10] proposed a Bayesian Network (BN) framework for road condition
predictions. The events used were road traffic, road construction, roadblocks and weather.
For the evaluation process, they created two variations of the BN, namely a Naive
Bayesian Network (M2) and a parameter-learning Bayesian Network (M3), as benchmarks
to measure the predictive accuracy of the proposed Bayesian Network (M1); the
constructed BN was thus evaluated against both the original-CPT and the parameter-learning
variants. M2 gave the lowest average accuracy (58.57%), followed by M1 (74.37%), while
M3 scored the highest accuracy of 76.01%. According to Hoong et al. [10], the BN model
showed promising results, even though the accuracy of road traffic prediction was only
76.01%. In conclusion, they presented an approach to provide coverage of road traffic
conditions. The research relied on Twitter tweets and user reports as the main source of
traffic status evidence. This means that if users did not report traffic conditions on time,
the reports would be delayed, thereby inconveniencing commuters with incorrect report
information.

Sun et al. [24] proposed a traffic flow forecasting model for Beijing that operates under the
criterion of minimum mean square error (MSE). Their approach departs from many existing
traffic-flow forecasting models in that it explicitly includes information from adjacent road
links to analyse the trends of the current link statistically, and in that it supports traffic flow
forecasting when incomplete data exists. The BN model's performance was better than
that of the Autoregressive (AR) method, with an RMSE of 1322.6 versus 1384.0.
Comparing the results to several other methods, such as the fuzzy-neural model (FNM)
and AR, showed that the BN would be a promising and effective approach for traffic flow
modelling and forecasting, for both complete and incomplete data. The weakness of the
study is that the pre-processing of the data was not mentioned. The authors pointed out
that when a large amount of data is used the model may not be optimal; therefore, this
model may not be a good model for predicting vehicle traffic flow.

May et al. [26] used the k-nearest neighbour (kNN) model to predict traffic flow and
pedestrian frequency for German cities with populations of more than 50 000
inhabitants. The experiment was done in the cities of Rodgau and Hamburg. The prediction
model was used for determining the cost of road advertising using posters in Germany.
May et al. [26] compared the kNN, decision tree, Gaussian process (GP), support
vector regression (SVR) and linear regression models using the correlation and the
relative absolute error (RAE) as the performance indicators. The kNN with 9 neighbours
gave the best results, outperforming the other models, with an RAE of 31.65% for
Rodgau and 28.36% for Hamburg, and correlations of 0.9503 and 0.9034 respectively. The
decision tree model gave RAE results of 40.99% for Rodgau and 48.86% for Hamburg,
and correlations of 0.9023 and 0.8209 respectively. The SVR model gave RAE
results of 42.93% for Rodgau and 47.70% for Hamburg, and correlations of 0.8928 and
0.7690 respectively. May et al. [26] used 10-fold cross-validation, which gave almost
identical results, and then used leave-one-out cross-validation to improve the results.
The kNN model was able to predict traffic frequencies for vehicles and pedestrians in
Rodgau and Hamburg, thus becoming the basis for the pricing of posters. The weakness of
the study was that no manual calculations were done to validate the accuracy of the
results. The study also does not mention the days on which the data was collected;
weekdays, weekends and public holidays may produce different results.
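Assuming the standard definition of the RAE (the model's total absolute error divided by the total absolute error of always predicting the mean), the metric reported by May et al. [26] can be sketched as follows; the traffic frequencies below are illustrative only.

```python
# Relative absolute error (RAE), assuming the standard definition:
# sum |predicted - actual| / sum |actual - mean(actual)|.
def rae(actual, predicted):
    mean = sum(actual) / len(actual)
    model_error = sum(abs(p - a) for a, p in zip(actual, predicted))
    baseline_error = sum(abs(a - mean) for a in actual)
    return model_error / baseline_error

actual = [100, 150, 130, 170]         # hypothetical traffic frequencies
predicted = [110, 140, 135, 160]
print(f"RAE = {rae(actual, predicted):.2%}")   # RAE = 38.89%
```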

Yu et al. [32] proposed a prediction model for forecasting vehicle traffic flow information.
A BN was used for each road link, with causal nodes which can affect road situations in
the future. A joint probability density function of the Bayesian network was obtained by
assuming a Gaussian Mixture Model (GMM), which utilises the training data set. Yu et al.
[32] conducted an experiment to validate the accuracy of the model with two measures:
one was the root mean square error (RMSE) and the other was travel time. To construct
the BN model and to verify its accuracy, Yu et al. [32] used real-time traffic information for
road links of an area at Kang-Nam-Ku in Seoul for one month, from 1 September to 30
September 2004. For the experiment the researchers divided the traffic flow information
into two parts: the training data set and the test data set. The training data set was
extracted from the 1st of September through to the 20th of September and was used to
estimate the parameters of the GMM. The test data set was extracted from the 21st of
September through to the 30th of September and was used for validating the model's
forecasting performance. This is a weakness, as the test data is likely not to be
representative of the data collected throughout the whole month of September. The
predicted RMSE for the BN model averaged from 0.44% to 0.79%, and the predicted
travel time accuracy obtained was 85%. Yu et al. [32] concluded that the proposed BN is
able to predict traffic flow with a low RMSE of 0.44%. The procedure followed was not
clearly documented, such that one cannot repeat the same experiment to produce the
same results, and the attributes used for the experiment were not clearly stated.

Ji et al. [28] proposed a traffic flow forecasting model based on a genetic neural network
(GNN). They conducted simulation experiments using the genetic neural network for
forecasting vehicle traffic flow. The data was collected from Jingshishi Road in Jinan. The
ANN weights and biases were obtained using the genetic algorithm, and the
backpropagation algorithm was used for training the ANN. The results show a bias of
-0.1150 and a weight of -0.1677. Their results showed that the genetic neural network
method is an accurate and valid method for traffic flow forecasting. The weakness of the
study is that the researchers did not discuss the results obtained. The model is poorly
evaluated because they only used the ANN weights to assess its performance.

Passow et al. [29] applied an ANN method to traffic flow condition prediction. The traffic
flow data was collected from 20 individual roads in Leicester in the UK. Four different ANNs
were tested on traffic flow data collected over a year: the feed-forward back-propagation
(FF-BP), cascade forward back-propagation (CF-BP), radial basis function (RBF) and the
generalised regression (GR) networks. An adaptive ANN-based training data filter was
introduced and tested. The adaptive ANN-based filter showed that it is able to detect
outliers. The FF-BP ANN and the CF-BP ANN benefited from using the adaptive training
data filter: the mean error of the FF-BP reduced on average by more than 48% and that of
the CF-BP by more than 43%. The filter also improved the RBF and the GR ANNs'
performance, with a mean error reduction of 64%. Passow et al. [29] concluded that the
adaptive ANN with the data filter delivered better real-time traffic flow conditions for the
CF-BP and FF-BP than without the data filter. The researchers did not mention the
attributes that were used for the experiment with each model.

Li et al. [30] proposed the k-nearest neighbour method and a pattern recognition algorithm
(PRA) to predict the future state of traffic flow. The performance of the weighted pattern
recognition algorithm (WPRA) was evaluated with simulation data and real data. The
performance of the WPRA and the PRA was compared by studying the RMSE between
the actual and the predicted traffic flow. Li et al. [30] collected the traffic flow measurement
data at a site monitored by the Institute of Automation, Chinese Academy of Sciences
(CASIA). The variables used included weather, time of day and the volume of vehicles.
The daily traffic flow was divided into four time intervals: A (5:30 - 9:30), B (9:30 - 15:30),
C (15:30 - 18:30) and D (18:30 - 5:30). A covers the morning peak hours, B and D the
off-peak hours, and C the evening peak hours. The RMSE of the WPRA relative to the
actual traffic flow was 17%, while that of the PRA was 22%. Li et al. [30] declared that
good prediction results were obtained. Traffic flow data for summer days only (September)
was used, and this is a weakness as the data is not representative of a whole year's traffic
flow pattern. For future work, traffic data for different weather seasons should be
considered.

2.2 The framework

[Figure 2.1 is a block diagram titled "The design of a vehicle traffic congestion Bayesian
prediction model in Gauteng". It links the research problem, past studies (and why they
are included in the study), the research objectives, the attributes and the research
questions to the applied design/algorithm and, finally, the proposed method/model.]

Figure 2.1 – Traffic congestion research Framework

Figure 2.1 shows the research problem, possible processes, and the proposed
methodology of pursuing this study.

2.3 Chapter Conclusion

A number of technologies have been used to overcome the nonlinearity that shows in the
randomness of traffic flow [25]. Artificial intelligence tools that have been identified for
building a predictive model for determining road traffic flow include the BN, ANN, support
vector machines, decision trees and the Kalman filter [14] [17].

The related work shows that the BN [24] [27] [32], followed by the ANN [29] [28], gave
good performance results. The ANN was not used for this research because it requires a
large amount of data in order to produce better prediction results, and in this work the
data set was small. Thus, based on the related work, the most suitable approach for this
research is the Bayesian network model. On the other hand, some of the models in the
related work were evaluated in simulation environments, which may differ from real-world
situations. In addition, some of the reported work does not cover a whole year of data.

CHAPTER 3: RESEARCH METHODOLOGY

3.1 Introduction

This section focuses on the approach used to find answers to the underlying research
questions and how the research was conducted. The chapter addresses the research
methodology, the research design (the data collection method, data analysis) and
limitations associated with these approaches.

3.2 Research Methodology

The research methodology describes how the research will be done. Research
methodology refers to the way in which the researcher gathers and analyses the
evidence. Put another way, research methodology is the systematic way in which the
research problem is solved (Bell et al. [33]).

The approach selected for this research is quantitative research. A predictive model will
be constructed. This model must be able to generalise from the analysis by predicting
certain phenomena on the basis of hypothesised, general relationships. Thus the findings
for a particular problem will be applicable to similar problems elsewhere.

The main aspect of the research was first to construct a model using artificial intelligence
algorithms, which was then evaluated based on each algorithm's performance. This was
carried out through an experimental study in the Weka open-source software. Other
experimental software, such as MATLAB, could have been used, but for this study Weka
was chosen as it is open-source software and requires minimal setup. An experiment is
constructed so as to be able to explain some kind of causation. To make sure that this
model is applicable to the prediction of vehicle traffic flow on freeways, interviews with a
data analyst at Mikros Traffic Monitoring (MTM) were undertaken.

3.2.1 Research Design

The research design describes what was done. A quantitative study was adopted for
creating the vehicle traffic congestion forecast model for Johannesburg in Gauteng.
Charoenruk [7] described a quantitative study as a formal, objective process that
describes, tests relationships, and examines cause-and-effect interactions among
variables. The training data used to predict the traffic congestion was obtained from the
Mikros Traffic Monitoring (MTM) databases.

The data analyst from MTM was interviewed telephonically. Telephone interviews reduce
the costs associated with face-to-face interviews. MTM was selected because it is a road
traffic agency that electronically collects and monitors data on vehicle traffic flow.

The traffic data obtained from the MTM TrafBase system shows the name of the road,
travel time, day, time of day, traffic volume and the vehicle speed on each lane of the
freeway. The attributes day, time of day, travel time, traffic volume and vehicle speed
were used for this study. The model that was developed shows the interaction between
these attributes.

3.2.2 Data Collection

The data obtained from MTM was from the N1 freeway in Johannesburg, starting from
the N1 New Road in Midrand to the N1 south direction 14th Avenue near Florida and the
Ring road shown in Figure 1.1.

MTM uses traffic monitoring stations, which are various wireless devices that collect traffic
data for diverse purposes. MTM has devices, such as ADTs, which are induction loops
and electronic monitoring equipment permanently installed on the roads. These devices
are used for monitoring long- and short-term traffic trends. They include high-speed
weigh-in-motion stations that measure dynamic axle masses and overloading trends, as
well as interchange/intersection monitoring, which is used to monitor the mainline traffic,
turning movements, queue lengths, time delays, signal phasing, pedestrian movements,
vehicle occupancy and vehicle class. The data collection stations are installed on the
Johannesburg freeway and are set 50 metres apart. These devices collect real-time traffic
data and store it in the traffic database system called TrafBase. MTM has a traffic
database management system to ensure quality assurance and the validation of the data
collected.

3.2.3 Reliability

To ensure that the data collected is reliable and not corrupted, MTM has a traffic
database management system for quality assurance and validation of the data collected.
According to MTM, the databases are maintained and cleaned frequently.

3.3 Limitations

The limitations include the time constraints under which this research was conducted. The
data collected from the MTM databases is primary data to MTM but secondary data to us.
We thus had no control over the collection of the data. Despite this, collecting primary
data ourselves would have cost thousands of rands in infrastructure for electronically
capturing the data, as well as considerable time spent at the side of the freeways
observing individual vehicle traffic flow.

3.4 Research procedure and methods

This section focuses on the artificial intelligence methods and explains how each method
is used to predict traffic congestion. Each model is explained in detail with its specific
advantages and challenges.

3.4.1 Bayesian Networks


A Bayesian belief network (BBN) is a directed graph suitable for the representation of
uncertain knowledge. The BBN is adopted to provide a solution to the vehicle traffic
congestion problem in Gauteng. It is a graphical model that encodes relationships among
variables.

The Bayesian network model can be adapted to suit various kinds of problems, including:

Military applications include:

• Automatic Target Recognition


• Autonomous control of unmanned underwater vehicle
• Assessment of Intent

Applications in Medical Diagnosis include:

• Internal medicine
• Pathology diagnosis
• Breast Cancer Manager with Intellipath

Commercial applications include:

• Financial market analysis


• Information retrieval
• Situation assessment for nuclear power plant

Decision making by a motorist on the freeway is characterised by:

• Time pressure
• Dynamically evolving scenarios

Bayesian networks (BNs) – also called belief networks, Bayesian belief networks, Bayes
nets and, sometimes, probabilistic networks – describe the probability distribution over a
set of variables by specifying a set of conditional independence assumptions along with
a set of conditional probabilities [18]. Consider an arbitrary set of random variables
Y1 ... Yn, where each variable Yi can take on the set of possible values V(Yi). We define the
joint space of the set of variables Y to be the cross product V(Y1) × V(Y2) × ... × V(Yn).
Each item in the joint space corresponds to one of the possible assignments of values to
the tuple of variables ⟨Y1 ... Yn⟩. The probability distribution over this joint space is called
the joint probability distribution [18], which specifies the probability for each of the possible
variable bindings for the tuple ⟨Y1 ... Yn⟩. A BBN describes the joint probability distribution
for a set of variables [18].

Conditional Independence

The notion of conditional independence is as follows. Let X, Y and Z be three discrete-valued
random variables. X is conditionally independent of Y given Z if the probability
distribution governing X is independent of the value of Y given a value for Z, that is, if

(∀ xi, yj, zk)  P(X = xi | Y = yj, Z = zk) = P(X = xi | Z = zk)

where xi ∈ V(X), yj ∈ V(Y), and zk ∈ V(Z). The above expression is commonly written as
P(X | Y, Z) = P(X | Z). More generally, a set of variables X1 ... Xl is conditionally
independent of a set of variables Y1 ... Ym given a set of variables Z1 ... Zn if
P(X1 ... Xl | Y1 ... Ym, Z1 ... Zn) = P(X1 ... Xl | Z1 ... Zn).
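As a concrete check of this definition, the following minimal sketch (with hypothetical probabilities) builds a joint distribution in which X is conditionally independent of Y given Z, and verifies numerically that P(X = 1 | Y = 1, Z = 1) equals P(X = 1 | Z = 1).

```python
# Hypothetical probabilities: the joint is constructed as P(z) P(x|z) P(y|z),
# which makes X conditionally independent of Y given Z by construction.
p_z = {0: 0.5, 1: 0.5}
p_x1_given_z = {0: 0.2, 1: 0.8}    # P(X=1 | Z=z)
p_y1_given_z = {0: 0.3, 1: 0.6}    # P(Y=1 | Z=z)

def p_joint(x, y, z):
    px = p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]
    py = p_y1_given_z[z] if y == 1 else 1 - p_y1_given_z[z]
    return p_z[z] * px * py

# P(X=1 | Y=1, Z=1) = P(X=1, Y=1, Z=1) / P(Y=1, Z=1)
numerator = p_joint(1, 1, 1)
denominator = sum(p_joint(x, 1, 1) for x in (0, 1))
print(numerator / denominator)     # 0.8, equal to P(X=1 | Z=1)
```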

BNs are mathematical models presented graphically, so that each variable is presented as
a node with directed links forming arcs between them. The information content of each
variable is presented as one or several probability distributions. If a variable has no
incoming arcs and is not dependent on any other variables (i.e. it has no parent in the joint
space), it has one probability distribution; if it has parents, it has one probability
distribution per combination of possible values of the parents. BNs use probability as a
measure of uncertainty: beliefs about the values of variables are expressed as probability
distributions, and the higher the uncertainty, the wider the probability distribution. As
information accumulates, knowledge of the true value of the variable usually increases,
i.e. the uncertainty of the value decreases and the probability distribution grows
narrower [21].

Figure 3.1 – Bayesian Belief Network structure

Figure 3.1 shows that the node traffic volume is connected to traffic congestion; therefore,
traffic volume is the parent of traffic congestion; similarly, speed is the parent of traffic
congestion. The Bayesian belief network is of the following form:

• Factored joint probability distribution as a directed graph:

o structure for representing knowledge about uncertain variables

o computational architecture for computing the impact of evidence on beliefs

• Knowledge structure:

o variables are depicted as nodes

o arcs represent probabilistic dependence between variables

o conditional probabilities encode the strength of the dependencies

• Computational architecture:

o computes posterior probabilities given evidence about selected nodes

o exploits probabilistic independence for efficient computation

The Bayes rule:

P(Ai | E) = P(E | Ai) P(Ai) / P(E)

• is based on the definition of conditional probability

• P(Ai|E) is the posterior probability of Ai given evidence E

• P(Ai) is the prior probability of Ai

• P(E|Ai) is the likelihood of the evidence given Ai

• P(E) is the preposterior probability of the evidence
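As a numeric illustration of the rule, with hypothetical figures rather than data from this study, let A be the event "the road is congested" and E the evidence "average speed is below 40 km/h":

```python
# Bayes' rule on illustrative traffic figures (all probabilities hypothetical).
p_a = 0.30            # prior P(A): congestion at this time of day
p_e_given_a = 0.90    # likelihood P(E|A): low speed when congested
p_e_given_not_a = 0.10

# P(E) by total probability, then the posterior P(A|E)
p_e = p_e_given_a * p_a + p_e_given_not_a * (1 - p_a)
p_a_given_e = p_e_given_a * p_a / p_e
print(round(p_a_given_e, 3))       # 0.794
```

Observing a low average speed thus raises the belief in congestion from the prior of 0.30 to a posterior of roughly 0.79.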

Its architecture consists of a set of nodes and a set of directed edges (links) between the
nodes. Each node has a finite set of mutually exclusive states. The nodes together with
the edges (arcs) form a directed acyclic graph. A directed acyclic graph is a probabilistic
graphical model that represents a set of random variables and their conditional
dependencies.

Figure 3.2 – Direct Acyclic Graph

Figure 3.2 shows a directed acyclic graph, with the nodes representing the random
variables or events and the edges representing the causal relationships between the
nodes. Examples of nodes in this study are average speed, time of day and incidents,
among other variables. The links show the relationships between these variables
(attributes). An important concept in BBNs is the conditional independence between
variables. A BBN is a probabilistic graphical model that represents a set of events and
their causal relations using a directed acyclic graph. Causal inference in the BBN makes
it possible to evaluate the effects of various scenarios. The nodes at the head and at the
tail of an arc are called the child and the parent respectively. The parent nodes affect the
child nodes based on the dependence relations between them.

Nodes with parents have a conditional probability table (CPT). It is conditional because it
is derived from conditional probability. The CPT for each node captures the mutual
relationship between the node and its parent nodes, giving a probability distribution for
every combination of states of the variable's parents. Nodes with no parents have a
probability table giving the prior probability distribution of the node; a node with no
parents is called a root node. The child node can be deterministic, incompletely
deterministic or nondeterministic, as described below:

• The deterministic node: its state is completely deterministic when the parent node
state is defined.
• The incomplete deterministic node: when one of the parent nodes is in a certain state,
no matter what the other parent nodes’ state is, the node’s state is certain. However,
for the parent node’s other states, the node’s state depends on the other parent nodes’
states.
• The nondeterministic node: this node's state is always uncertain no matter what the parent state is. For those probabilities of the nondeterministic node that are hard to estimate manually, the noisy-OR principle can be applied, which allows the estimation of the joint conditional probabilities from the marginal conditional probabilities and thus reduces the number of probabilities to estimate.

The variables for vehicle traffic flow prediction are nondeterministic; thus, in this study, the nondeterministic-node approach is used.
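As a rough illustration of the noisy-OR principle mentioned above, the sketch below builds the conditional probability of an effect from per-cause marginals. The cause probabilities used in the example are hypothetical.

```python
# Sketch of the noisy-OR principle: the joint conditional probability of a
# nondeterministic node is built from marginal probabilities
# p_i = P(effect | only cause i present), so only n numbers need estimating
# instead of 2^n CPT entries. The p_i values below are hypothetical.
def noisy_or(cause_probs, causes_present):
    prob_no_effect = 1.0
    for p_i, present in zip(cause_probs, causes_present):
        if present:
            prob_no_effect *= (1.0 - p_i)
    return 1.0 - prob_no_effect

# P(congestion | high volume, low speed) under the noisy-OR assumption:
print(noisy_or(cause_probs=[0.7, 0.6], causes_present=[True, True]))  # 0.88
```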

BBN provides a principled method of measuring the effect of events on each other. For instance, if event C can be affected by events A and B, then the known probabilities can be used to calculate the initial probability of C by summing over the combinations of A and B under which C is true, breaking those probabilities down into known quantities:

$p(C) = p(C,A,B) + p(C,\neg A,B) + p(C,A,\neg B) + p(C,\neg A,\neg B)$

One can then calculate the revised probability of A and B being true (and therefore the chance that they caused C to be true) by using Bayes' theorem with the initial probability:

$p(B|C) = \dfrac{p(C|B)\,p(B)}{p(C)}$

In this work, the model of the intelligent system will be said to be complete when the nodes' states and relationships are specified.

The strength of BBN vis-à-vis other technologies lies in modelling and analysing a complex problem that is characterised by direct or indirect effects and uncertainty. The causal relations are derived by pairwise comparisons. With BBN, the strength of a relationship between attributes can be quantified using the CPT. The CPT represents the detailed relations (the strength of the dependence relations) between nodes. BBN is a means of addressing uncertainty in vehicle traffic flow prediction.

In BBN, learning constitutes the following:

• parameters (parameter estimation)
• structure (optimization with score functions)
BBN is:

o different from other knowledge-based systems, because uncertainty is handled in a mathematically rigorous yet efficient and simple way.
o different from other probabilistic analysis tools, because of the network representation of problems, the use of Bayesian statistics, and the synergy between these.

The Bayesian net situation assessment model provides:

o knowledge of the structural relationship between situations, events and event clues
o a means of integrating the situations and events to form a holistic view of their meaning
o a mechanism for projecting future events

BBN captures the qualitative (by the graph) and the quantitative (by CPT) contextual
knowledge as a compact and single model. The network part captures the qualitative
nature of dependence relations, while the quantitative nature is captured by the probability
tables.

The other reason for choosing BBN is that it is able to deal with uncertain and incomplete
data based on the sound mathematical theory.

Joint probability distribution

A joint probability is analogous to a knowledge base in expert systems. It means that the
joint probability distribution is supposed to have answers for all the queries on this
particular problem domain.

$P(A=\text{true}, B=\text{true})$ is written to mean "the probability of $A=\text{true}$ and $B=\text{true}$".

Figure 3.3 further shows the joint probability graph.

Figure 3.3 – Joint probability distribution graph

The joint probability can be between any number of variables, e.g.

$P(A=\text{true}, B=\text{true}, C=\text{true})$

For each combination of variables, it is necessary to say how probable that combination
is as shown in Figure 3.3. The probability of these combinations needs to sum to 1 as it
shows in Table 3.1.

Table 3.1 – Joint probability distribution of A, B and C

A      B      C      P(A,B,C)
false  false  false  0.1
false  false  true   0.2
false  true   false  0.05
false  true   true   0.05
true   false  false  0.3
true   false  true   0.1
true   true   false  0.05
true   true   true   0.15
                     Σ = 1

Once one has the joint probability distribution, one can calculate any probability involving A, B and C. For example:

$P(A=\text{true})$ = sum of $P(A,B,C)$ over the rows with $A=\text{true}$

$P(A=\text{true}, B=\text{true} \mid C=\text{true}) = P(A=\text{true}, B=\text{true}, C=\text{true}) \,/\, P(C=\text{true})$
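These two queries can be checked with a short Python sketch over the joint distribution of Table 3.1:

```python
# Sketch: answering queries directly from the joint distribution in Table 3.1.
joint = {  # (A, B, C) -> P(A, B, C)
    (False, False, False): 0.1,  (False, False, True): 0.2,
    (False, True,  False): 0.05, (False, True,  True): 0.05,
    (True,  False, False): 0.3,  (True,  False, True): 0.1,
    (True,  True,  False): 0.05, (True,  True,  True): 0.15,
}

# P(A = true): sum the rows where A is true.
p_a = sum(p for (a, b, c), p in joint.items() if a)   # 0.6

# P(A = true, B = true | C = true) = P(A, B, C all true) / P(C = true)
p_c = sum(p for (a, b, c), p in joint.items() if c)   # 0.5
p_ab_given_c = joint[(True, True, True)] / p_c        # 0.3
print(p_a, p_ab_given_c)
```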

Some challenges with the Joint Distribution are:

• Lots of entries in the table to fill up

• For k Boolean random variables, one needs a table of size $2^k$

• To use fewer numbers, one needs the concept of independence

Independence: Variables A and B are independent, if any of the following hold:

• P(A,B) = P(A) P(B)

• P(A | B) = P(A)

• P(B | A) = P(B)

The Chain Rule for Bayesian Networks


Theorem: Let BN be a Bayesian network over $U = \{A_1, \ldots, A_n\}$. Then BN specifies a unique joint probability distribution $P(U)$ given by the product of all conditional probability tables specified in BN:

$P(U) = \prod_{i=1}^{n} P(A_i \mid pa(A_i))$ (Equation 3.1)

where $pa(A_i)$ are the parents of $A_i$ in BN, and $P(U)$ reflects the properties of BN.

If the variables A and B are d-separated in BN given the set C, then A and B are
independent given C in 𝑃𝑃(𝑈𝑈).
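A minimal sketch of Equation 3.1 for the two-parent structure of Figure 3.1 follows; the CPT values are hypothetical, chosen only to show the factorisation.

```python
# Sketch of Equation 3.1: P(V, S, C) = P(V) * P(S) * P(C | V, S), where
# Volume and Speed are the parents of Congestion. All CPT values hypothetical.
p_volume = {"high": 0.4, "low": 0.6}
p_speed = {"low": 0.3, "average": 0.7}
p_congestion = {  # P(Congestion = true | Volume, Speed)
    ("high", "low"): 0.9, ("high", "average"): 0.5,
    ("low", "low"): 0.4, ("low", "average"): 0.05,
}

def joint(volume, speed, congested):
    p_c = p_congestion[(volume, speed)]
    return p_volume[volume] * p_speed[speed] * (p_c if congested else 1 - p_c)

print(joint("high", "low", True))  # 0.4 * 0.3 * 0.9 = 0.108
```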

d-Separation

The conditional independence in BN can be explained using the concept called d-separation. In summary, nodes X and Y are d-separated if on every (undirected) path between X and Y there is some variable Z such that either:

• Z is in a serial or diverging connection and Z is known, or

• Z is in a converging connection and neither Z nor any of Z's descendants are known.

Nodes X and Y are said to be d-connected if they are not d-separated (there exists an
undirected path between X and Y not d-separated by any node or a set of nodes). If X
and Y are d-separated by Z, then X and Y are conditionally independent.

Steps to test d-Separation

Test whether A and B are d-separated given hard evidence on a set of variables C.

1. Construct the ancestral graph consisting of A, B and C together with all nodes from
which there is a directed path to A, B or C.
2. Insert an undirected link between each pair of nodes with a common child.
3. Make all links undirected obtaining a moral graph of the ancestral graph.
4. If all paths connecting A and B intersect C in the moral graph, then A and B are d-
separated given C.
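The four steps can also be expressed directly in code. The sketch below is a straightforward (not optimised) Python rendering of this moral-graph test, assuming the network is given as a node-to-parents mapping.

```python
# Sketch of the four-step d-separation test on a node -> parents mapping.
from itertools import combinations

def d_separated(parents, a, b, evidence):
    # Step 1: ancestral graph of {a, b} and the evidence set.
    keep, stack = set(), [a, b, *evidence]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, []))
    # Steps 2-3: moralise - marry co-parents, then drop edge directions.
    edges = {frozenset((child, p)) for child in keep
             for p in parents.get(child, []) if p in keep}
    for child in keep:
        ps = [p for p in parents.get(child, []) if p in keep]
        edges |= {frozenset(pair) for pair in combinations(ps, 2)}
    # Step 4: a, b are d-separated iff every path intersects the evidence,
    # i.e. removing the evidence nodes disconnects a from b.
    reachable, stack = set(), [a]
    while stack:
        node = stack.pop()
        if node in reachable or node in evidence:
            continue
        reachable.add(node)
        stack.extend(n for e in edges if node in e for n in e if n != node)
    return b not in reachable

# Converging connection Volume -> Congestion <- Speed:
net = {"Congestion": ["Volume", "Speed"]}
print(d_separated(net, "Volume", "Speed", evidence=set()))           # True
print(d_separated(net, "Volume", "Speed", evidence={"Congestion"}))  # False
```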
Learning causal relations

Causal relations play an important role when developing probabilistic models. In BN, causal relations are helpful because the directed links between variables should follow the causal directions. Figure 3.4 shows a serial connection: with no evidence on B, information about either A or C will influence our belief about the state of the other variable. This means evidence can be transmitted through a serial connection if the state of the middle variable B is not known [15].

Figure 3.4 – Serial connection

Figure 3.5 – Diverging connection

In the diverging connection in Figure 3.5, evidence on A makes B, C and E independent of one another (common cause); in other words, once A is known, evidence on B will not affect our belief about C or E.

Figure 3.6 – Converging connection

In Figure 3.6, which is a converging connection, not knowing A makes B, C and E independent. This means that if A is unknown, then E is not an indicator of B or C, and vice versa. Therefore, unlike the serial and diverging connections, the converging structure allows transmission of information whenever evidence on the intermediate variable is available [15].
Markov Blanket
The Markov blanket of variable A is the set consisting of the parents of A, the children of
A, and the variables sharing a child with A. When a Markov blanket is instantiated, then
A is d-separated from the rest of the network.

The other independence relations in BN are described using the Markov blanket. A
variable (node) is conditionally independent of its non-descendants given its parents.
Using the Markov blanket, then a node is conditionally independent of all other nodes
given its Markov blanket, i.e. its parents, children and spouses (parents of common
children).
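A short sketch of extracting the Markov blanket (parents, children and spouses) from a node-to-parents mapping; the small traffic network used in the example is hypothetical.

```python
# Sketch: the Markov blanket of a node is its parents, its children and the
# other parents of its children (spouses).
def markov_blanket(parents, node):
    blanket = set(parents.get(node, []))                    # parents
    children = [c for c, ps in parents.items() if node in ps]
    blanket.update(children)                                # children
    for child in children:                                  # spouses
        blanket.update(p for p in parents[child] if p != node)
    return blanket

net = {"Congestion": ["Volume", "Speed"], "TravelTime": ["Congestion"]}
print(markov_blanket(net, "Congestion"))  # {'Volume', 'Speed', 'TravelTime'}
print(markov_blanket(net, "Volume"))      # {'Congestion', 'Speed'}
```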

A Hidden Markov model is a Dynamic Bayesian Network (DBN) model with the Markov
property, while a Kalman filter is a hidden Markov model where exactly one variable has
relatives outside the time slice.

Dempster-Shafer Theory (DST)

If there is no prior knowledge, a probability of P = 1/N is assigned to each possibility, where N is the number of possibilities. Thus, probability theory (PT) distributes an equal amount of probability even in ignorance. This assignment of P is made in desperation; it is the use of the principle of indifference. When only two possibilities exist, P = 50%.

It turns out that the difference between DST and probability theory (PT) is the treatment
of ignorance.

• PT requires evidence that does not support a hypothesis to contest it.

• DST does not force belief to be assigned to ignorance or to the refutation of a hypothesis. Instead, mass is only assigned to the subsets to which one wishes to assign belief.

• With DST, any belief that is not assigned to a specific subset is treated as non-belief; non-belief merely withholds support from a hypothesis, which is not the same as disbelief.

Dempster's rule combines masses to produce a new mass that represents a consensus of the original, possibly conflicting, evidence, as it tends to favour agreement by including masses in set intersections (common elements of evidence).

Dempster-Shafer differs from the Bayesian approach in that each piece of evidence has a degree of support between 0 and 1, where 0 means there is no evidence support for that fact and 1 means there is full evidence support for it. Both values can be 0, which means there is no evidence about that fact. In BN, accurate use depends on prior knowledge of the probabilities, which makes the BN algorithm more reliable for predicting traffic flow.

The BN has been chosen for this research, as it offers the following benefits:

• It is suitable for small and incomplete data sets – there is no minimum sample size required to perform the analysis, as it takes into account all the available data. It can show good prediction accuracy even with rather small samples.

• It provides a simple way to visualise the structure of a probabilistic model and can be
used to design and motivate new models.
EM algorithm

The EM algorithm can be used even for variables whose values are never observed. It is used to learn an unobservable variable, provided the general form of the probability distribution governing that variable is known [19].

The EM algorithm has been used to train a BN and many other unsupervised clustering
algorithms.

For this study, d-separation was not used, because it treats each variable as independent of its ancestors' values given the values of its parents. That means it only looks at the latest evidence when updating the current problem.

Some challenges with Bayesian networks are that they practically depend on prior
knowledge of many probabilities. If these probabilities are not known in advance, they are
often estimated based on background knowledge, previous available data and
assumptions about the underlying distribution. Another challenge is the significant
computational cost required to determine the Bayes optimal hypothesis in the general
case [18].
In summary, Bayesian Networks (BN) are graphical probabilistic models. BN provide efficient representation and inference. BN represent a knowledge structure that models the relationships between vehicle traffic flow challenges, their causes and effects, vehicle traffic information and diagnostic tests.

3.4.2 Naive Bayesian networks

The Naive Bayes (NB) classifier applies to learning tasks where each instance $x$ is described by a conjunction of attribute values and where the target function $f(x)$ can take on any value from some finite set $V$. A set of training examples of the target function is provided, each described by the tuple of attribute values $\langle a_1, a_2, \ldots, a_n \rangle$. The model is asked to predict the target value for a new instance [18].

The NB classifier is based on the simplifying assumption that the attribute values are
conditionally independent given the target value [18].

$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j)$ (Equation 3.2)

where $v_{NB}$ represents the target value output by the NB algorithm. The NB algorithm involves a learning step in which the various $P(v_j)$ and $P(a_i \mid v_j)$ terms are estimated, based on their frequency over the training data. The set of these estimates corresponds to the learned model. The model is then used to predict each new instance by applying the rule in Equation 3.2.

NB refers to the strong independence assumptions in the model, rather than to the
particular distribution of each feature.

Figure 3.7 – Bayes network structure (adapted from Russell et al. [19])

In Figure 3.7, the NB model assumes that each of the attributes $\langle f_1, f_2, \ldots, f_n \rangle$ is conditionally independent of the others given some state $c$. More formally, to calculate the probability of observing features $f_1$ through $f_n$ given some state $c$, under the NB assumption the following holds:

$P(f_1, \ldots, f_n \mid c) = \prod_{i=1}^{n} P(f_i \mid c)$ (Equation 3.3)

This means that when one wants to use a NB model to predict a new example, the
posterior probability is much simpler to work with:

$P(c \mid f_1, \ldots, f_n) \propto P(c)\,P(f_1 \mid c) \cdots P(f_n \mid c)$ (Equation 3.4)

These assumptions of independence are rarely true, but in practice, NB models have
performed surprisingly well, even on complex tasks where it is clear that the strong
independence assumptions are false [18].
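A minimal Python sketch of an NB classifier along the lines of Equations 3.2 to 3.4 follows, using raw frequency estimates and a few illustrative records; real implementations usually smooth the counts to avoid zero probabilities.

```python
# Minimal Naive Bayes sketch (Equations 3.2-3.4) over nominal attributes.
# The training records are illustrative only.
from collections import Counter, defaultdict

def train_nb(examples):  # examples: list of (attribute_tuple, target)
    class_counts = Counter(target for _, target in examples)
    cond_counts = defaultdict(Counter)
    for attrs, target in examples:
        for i, a in enumerate(attrs):
            cond_counts[(i, target)][a] += 1
    return class_counts, cond_counts, len(examples)

def predict_nb(model, attrs):
    class_counts, cond_counts, n = model
    def score(v):  # P(v) * prod_i P(a_i | v), as in Equation 3.2
        p = class_counts[v] / n
        for i, a in enumerate(attrs):
            p *= cond_counts[(i, v)][a] / class_counts[v]
        return p
    return max(class_counts, key=score)

data = [(("peak", "low"), "trafficjam"),
        (("peak", "average"), "flowingcongestion"),
        (("offpeak", "high"), "freeflow"),
        (("offpeak", "average"), "freeflow")]
model = train_nb(data)
print(predict_nb(model, ("peak", "low")))  # trafficjam
```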

Advantages of NB [22]:

• Fast, highly scalable model building and scoring
• Easy to understand
• It can be used for both binary and multi-class prediction problems.

NB is very simple and uses few parameters, whereas the unrestricted BN is complex, with a possibility of overfitting. Various approximations between these two extremes (Naive Bayes and Bayesian belief networks) are therefore useful.

3.4.3 Decision Tree (C4.5)


A Decision tree is a classifier that represents the learned function as a tree-like graph. Decision tree learning is a supervised prediction method because the dependent attribute and the set of classes (values) are given.

Decision trees are easy for humans to understand. Decision tree learning has been applied to problems such as predicting medical patients' diseases, classifying loan applicants by their likelihood of defaulting on payments, and predicting vehicle traffic flow volumes.

Decision trees predict instances by sorting them down the tree from the root to some leaf node, which provides the prediction for the instance. Each internal node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values of this attribute, while each leaf node provides the predicted value [18]. Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances, and each path from the root node to a leaf node corresponds to a conjunction of attribute tests [18]. An instance is predicted by starting at the root node of the tree, testing the attribute specified by this node, then moving down the tree branch corresponding to the value of the attribute in the given example.

Figure 3.8 – Decision tree showing first step of ID3

Figure 3.8 shows the partially learned decision tree diagram, where the descendant of
Low shows positive examples and therefore becomes a leaf node with a Yes. The other
two attributes Average and High will be expanded further by selecting the attribute with
the highest information gain related to the new subset of examples [18].

The information gain can be used to measure which attribute is the best predictor. This
can be done by first defining the measure used in information gain called the Entropy.

Given a collection B of samples, containing positive and negative examples of the target concept, the entropy of B relative to this Boolean prediction is:

$Entropy(B) \equiv -p_\oplus \log_2 p_\oplus - p_\ominus \log_2 p_\ominus$ (Equation 3.5)

where $p_\oplus$ is the proportion of positive examples in B and $p_\ominus$ is the proportion of negative examples in B.
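Equation 3.5 translates directly into code; a minimal sketch:

```python
# Sketch of Equation 3.5: entropy of a Boolean sample collection.
from math import log2

def entropy(pos, neg):
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # define 0 * log2(0) = 0
            p = count / total
            result -= p * log2(p)
    return result

print(entropy(9, 5))   # ~0.940 (mixed sample)
print(entropy(14, 0))  # 0.0 (pure sample)
```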

Decision tree learning based on the information gain measure of Equation 3.5 does well in the following:

• Handling continuous variables,


• Handling training data with missing values, and
• Improving computational efficiency.

The Decision tree, once trained, is much faster than the ANN. The Decision tree usually discards input features that it finds not useful, whereas the Neural Network will use them unless feature selection is done; this can give the ANN slightly better performance than Decision trees. Similarly, BN does not discard any attributes unless this is done during feature selection; therefore, BN can outperform ANN and Decision trees.

3.4.4 Neural networks


The artificial neural network learning method provides a robust approach to approximating real-valued, discrete-valued and vector-valued target functions. Algorithms such as BACKPROPAGATION use gradient descent to tune network parameters to fit a training set of input-output pairs. These parameters include the weights, the number of epochs, the number of inputs, the learning rate η and the momentum term α. An ANN is built out of a densely interconnected set of simple units, where each unit takes a number of real-valued inputs and produces a single real-valued output [18].

A simple Neural network model consists of three layers, namely the input layer, the hidden layer and the output layer. The number of attributes or features determines the number of units in the input layer; the output layer has a neuron whose target value during training is provided by the training data; and the number of units in the hidden layer is determined by the numbers of input and output neurons [18]. The future vehicle traffic forecast is related to the historical vehicle traffic flow; therefore, future and current traffic flow can be predicted from the vehicle traffic flow of several preceding periods. An idealisation of the neuron's processing activity is shown in Figure 3.9.

Figure 3.9 – Artificial Neural Network structure (Multilayer perceptron (MLP))

Good characteristics of ANN:

• It is robust to errors in the training data [19].
• It is efficient in handling input data, enabling learning from large datasets [19].

However, ANN needs a large amount of training data. The sum-of-squared-errors function minimised during training is:
$E(\vec{w}) \equiv \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2$ (Equation 3.6)

where $outputs$ is the set of output units in the network, and $t_{kd}$ and $o_{kd}$ are the target and output values associated with the $k$th output unit and training example $d$.
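A small numeric sketch of Equation 3.6 with illustrative target and output values:

```python
# Sketch of the training error in Equation 3.6 over a batch of examples:
# rows of `targets` and `outputs` are examples d, columns are output units k.
import numpy as np

targets = np.array([[1.0, 0.0], [0.0, 1.0]])   # t_kd (illustrative values)
outputs = np.array([[0.8, 0.1], [0.3, 0.7]])   # o_kd from the network

error = 0.5 * np.sum((targets - outputs) ** 2)
print(error)  # 0.5 * (0.04 + 0.01 + 0.09 + 0.09) = 0.115
```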

ANN, especially the MLP, has been found to perform well on non-linear problems. However, since the data set in this study is small (520 samples), the MLP would have performed poorly.

3.4.5 k-Nearest Neighbor (KNN)
Instance-based learning methods, such as Nearest Neighbor and locally weighted regression, are used for approximating real-valued or discrete-valued functions. An instance-based learning method stores the training examples, and generalisation beyond these examples is postponed until a new instance must be predicted. Each time a new instance is encountered, its relationship to the stored examples is examined in order to assign a target function value to the new instance.

This algorithm assumes all instances correspond to points in the n-dimensional space $\Re^n$. $K$ in kNN stands for the selected number of neighbors. The nearest neighbors of an instance are defined in terms of the standard Euclidean distance. Let an arbitrary instance $x$ be described by the feature vector [18]:

$\langle a_1(x), a_2(x), \ldots, a_n(x) \rangle$

where $a_r(x)$ is the value of the $r$th attribute of instance $x$. The distance between two instances $x_i$ and $x_j$ is then defined as $d(x_i, x_j)$, using the Euclidean distance:

$d(x_i, x_j) \equiv \sqrt{\sum_{r=1}^{n} \left(a_r(x_i) - a_r(x_j)\right)^2}$ (Equation 3.7)

Equation 3.7 measures how far each of the neighbors is from the instance. A strength of the Nearest Neighbor approach is that the model allows different K values for different classes, rather than a fixed K value for all classes [2]. The model performance is sensitive to the choice of the parameter K (the number of neighbors), and this parameter is manipulated in the experiments.
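A minimal sketch of the kNN prediction rule built on Equation 3.7 follows; the speed/volume training points are hypothetical.

```python
# Minimal k-Nearest-Neighbor sketch: Euclidean distance (Equation 3.7) plus
# a majority vote over the k closest training examples.
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)

def knn_predict(training, query, k=4):
    # training: list of (feature_vector, label); query: feature_vector
    neighbors = sorted(training, key=lambda ex: dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical (speed km/h, volume) points:
train = [((30, 9000), "trafficjam"), ((35, 8500), "trafficjam"),
         ((110, 2000), "freeflow"), ((95, 2500), "freeflow")]
print(knn_predict(train, (40, 8800), k=3))  # trafficjam
```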

In Nearest Neighbor, the target function being learned may be either discrete-valued or real-valued. Learning a discrete-valued target function of the form $f: \Re^n \rightarrow V$, where $V$ is the finite set of targets $\{v_1, \ldots, v_s\}$, is illustrated in Figure 3.10.

Figure 3.10 – 1-Nearest Neighbor (adapted from Mitchel [18])

Figure 3.10 shows the shape of the decision surface induced by 1-Nearest Neighbor over the entire instance space; here, 1-Nearest Neighbor predicts $x_q$ as positive. The decision surface is a combination of complex polyhedra surrounding each of the training examples.

A good characteristic of kNN is that, as a lazy learning method, instead of estimating the target function once for the entire instance space, it can estimate the target function locally and differently for each new instance to be predicted [18].

The following challenges exist with the k-Nearest-Neighbor classifier:

• It has the potential to be a computationally expensive algorithm.


• The model breaks down quickly and becomes inaccurate when there are a few data
points for comparison.
• It is inefficient, as it needs to compare a new instance with all the samples in the
training set.

The kNN decision boundary can take on any form [18], because kNN makes no assumption about the data distribution. Contrast this with NB, which assumes that attributes are conditionally independent of each other given the class. As a result, NB can only have linear, elliptic or parabolic decision boundaries, which makes the flexibility of the kNN decision boundary a considerable advantage. However, if the data is separable by any of the forms of NB's decision boundaries, then kNN offers little additional help.

3.5 Chapter Conclusion

This section discussed the different artificial intelligence algorithms. The challenges and
benefits of each algorithm were discussed. It was concluded that the BN, NB, Decision
trees and kNN were better candidates, based on their potential benefits, in minimizing
traffic congestion in Gauteng, and that they would be evaluated further in Chapter 4. The BN and NB emerged as the most interesting candidates; the other two algorithms would be included in the experimentation in Chapter 4 for comparison purposes.

CHAPTER 4: EXPERIMENT

4.1 Chapter Introduction

The focus of this work is to learn to recognize complex patterns automatically and to make
intelligent decisions based on data. The WEKA (Waikato Environment for Knowledge
Analysis) open source application which implements a number of different artificial
intelligence learning algorithms [16] was used for the experiment. WEKA supports data
mining tools that include data processing, clustering, classification, prediction, regression,
visualization and feature selection among other tools [22]. For this experiment, the four
algorithms that were used to produce models from data included the BN, NB, k-Nearest-
Neighbor and the Decision tree algorithm. The models predict vehicle traffic flow into three
levels: freeflow, trafficjam and flowingcongestion. Annexure 4C illustrate the difference
between these levels. The four algorithms were compared, based on their performance
on predicting traffic flow. The model that performed best in predicting vehicle traffic flow
was selected.

4.2 Experiment

The data set for the year 2013 (January to December) was used to construct a model. The data used for the experiment was obtained from Mikros Traffic Monitoring (MTM). A sample of this data is shown in Annexure 4A. The data used for constructing the model had five attributes, as shown in Table 4.1, where 24 of the 520 instances are shown. The four attributes Day (day of the week), Time of day (time), Volume (vehicle volume size) and Speed (speed of the vehicle), excluding Class (the target value), were selected as having a high influence on predicting vehicle traffic congestion.

All the models were generated from algorithms that include Bayesian networks, Naive Bayes, k-Nearest Neighbor and Decision trees. Cross-validation was selected at 10 folds. The data was split into two-thirds for training examples and one-third for test examples, as shown in Table 4.2. The test sample was a separate file that the model had never seen before. The training set was used for building the model. The class label was removed from the test set, and the test set was then used to determine whether the model was able to predict the unseen data. Cross-validation is explained in Section 4.2.1. Three indices were used to measure the performance of the four selected algorithms, namely the correct prediction, the root mean squared error (RMSE) and the cost of prediction.

Data was converted from integer to nominal values in order to present the five attributes
as pre-specified fixed values. The full procedure is shown in Annexure 4C. The 173 test
instances are shown in the Annexure 4B.

Time of day - Peak time is defined as (06h00 - 09h00, 15h00 - 18h00) and Off Peak time
is defined as (10h00 - 13h00, 19h00 - 22h00).

Table 4.1 – 24 of the 520 training examples

Table 4.2 – A summary of training and test data sets

                  TrafficJam   FreeFlow   FlowingCongestion   Total
Training set      101          160        86                  347
Test set          46           76         51                  173
Total instances                                               520

4.2.1 Cross-validation Method
Cross-validation is a validation technique used to evaluate the performance of a prediction model. In cross-validation the dataset is partitioned into k subsets called folds; in each round, one fold is held out for testing and the remaining folds are used for training. Ten-fold cross-validation was used to reduce any bias caused by the sample chosen. Ten folds means that the process was repeated ten times, each time with a different subset, where each subset was used exactly once for testing. Ten-fold cross-validation was used in all the experiments conducted in Section 4.3.
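A minimal sketch of the k-fold procedure, assuming placeholder train and evaluate functions (the toy learner and data below are illustrative only):

```python
# Sketch of k-fold cross-validation over a list of labelled instances.
def cross_validate(examples, train, evaluate, k=10):
    folds = [examples[i::k] for i in range(k)]      # k disjoint subsets
    scores = []
    for i in range(k):                              # each fold tests once
        test_fold = folds[i]
        train_folds = [ex for j, fold in enumerate(folds) if j != i
                       for ex in fold]
        model = train(train_folds)
        scores.append(evaluate(model, test_fold))
    return sum(scores) / k                          # mean performance

def train(exs):        # toy learner: memorise the majority label
    labels = [l for _, l in exs]
    return max(set(labels), key=labels.count)

def evaluate(model, exs):  # accuracy of always predicting that label
    return sum(l == model for _, l in exs) / len(exs)

data = [(i, "jam" if i % 3 == 0 else "free") for i in range(60)]
print(cross_validate(data, train, evaluate))  # ~0.67, the majority baseline
```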

Three measures were employed to assess the model’s performance: the accuracy for
prediction; the root mean square error (RMSE) defined by equation 4.1; and the cost of
prediction. The accuracy of prediction is a ratio of the number of correct predictions and
the total number of test examples.

$\text{RMSE} = \sqrt{\dfrac{\sum_{i=1}^{M} (p_i - y_i)^2}{M}}$ (Equation 4.1)

In Equation 4.1:

$p_i$ is the model's predicted output for example $i$,

$y_i$ is the true output for example $i$, and

$M$ is the total number of test examples.

A high RMSE means that the obtained result is poor, while a low RMSE indicates that the obtained result is good.
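Equation 4.1 in code, as a minimal sketch:

```python
# Sketch of Equation 4.1, treating predictions and true outputs as numeric.
from math import sqrt

def rmse(predicted, actual):
    m = len(actual)
    return sqrt(sum((p - y) ** 2 for p, y in zip(predicted, actual)) / m)

print(rmse([1, 0, 1, 1], [1, 0, 0, 1]))  # sqrt(1/4) = 0.5
```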

Prediction is the foretelling of the occurrence of an event based on historical data. The model must have generalisation ability; in other words, it must be able to predict future unseen data from the same distribution. For the model to predict traffic flow, one first trains the model and then deploys the trained model to predict unseen data. Figure 4.1 shows a pictorial view of the process of training the model.

Figure 4.1 – Shows a process for constructing the Model

Figure 4.2 shows a pictorial view depicting a model predicting unseen data.

Figure 4.2(a) – Shows how the Model is used to predict a new instance

Figure 4.2(a) shows the model accepting new data, in this case the attributes time of day, volume, speed and weekday, which can be represented as $[x_1, x_2, \ldots, x_n]$. The resulting output $y$ indicates the likely outcome for this new data, as shown in Figure 4.2(b). The important aspect in modelling is that the model should generate accurate predictions for future observations. Figure 4.2(b) captures in flow-chart form the combination of Figure 4.1 and Figure 4.2(a). The detailed process for constructing the model is given from Section 4.2.2 to Section 4.2.6.

Figure 4.2 (b) - Shows the flow for the construction of the Model and its use in
predicting a new instance (new data)

Under perfect conditions, the model would be error-free. In real-world problems, however, even the finest model produces errors. Sources of model error include noise, insufficient data, missing data and inappropriate data. The best model, as shown in Table 4.14, is assessed based on the accuracy of predicting future data.

4.2.2 Recipe for constructing the Bayesian model


4.2.2.1 The recipe for training the model was as follows:

This follows the block diagrams in Figure 4.1 and Figure 4.2.

A set of 347 training examples was used, where each instance is described by the attributes Day, Time of day, Volume, Speed and Class. The model needs to predict the target value (freeflow, trafficJam, flowingCongestion), which is in essence the state of vehicle traffic flow on the freeway.

Step 1 – Load the saved training examples (training data) on WEKA as shown in
Figure 4.3

Step 2 – Under Classify select the Bayesian algorithm and select start (start
training)

Step 3 – Save the model

The saved model is the brains of the system, as it has captured the underlying function that describes the data. This model is then used for prediction in steps 4 and 5.

Step 4 – Select "Supply test set" and upload the unseen test data

Step 5 – Reload the saved model by right-clicking the result list, then select "re-evaluate model". The output results for the Bayesian network (covered in more detail in Section 4.2.4) are shown in Figure 4.4.

The data that the model has never seen before (test data) is used to evaluate the
performance of the model. In other words, the ability of the model to predict future
data (new data).

Figure 4.3 – Loading saved training examples on Weka

Figure 4.4 – The results from the Bayesian Network prediction model

4.2.3 The Naive Bayes prediction model

The learning algorithm in Figure 4.1 in our case is the NB algorithm. Thus, a Naïve Bayes
model is created. A set of 347 training examples, where each instance is described by
the attributes Day, Time of day, Volume, Speed and Class were used.

The model needs to predict the target value (freeflow, trafficJam, flowingCongestion)
which is in essence the state of vehicle traffic flow on the freeway.

The mathematical procedure followed in constructing the Naïve Bayes Model

Step 1: Instantiate Equation 3.2 to fit the current task; the target value $v_{NB}$ is given by

$v_{NB} = \arg\max_{v_j \in \{\text{freeflow},\, \text{trafficjam},\, \text{flowingcongestion}\}} P(v_j) \prod_i P(a_i \mid v_j)$ (Equation 4.2)

where $v_{NB}$ denotes the target value output by the Bayes predictor. Given a new instance, the expression to be maximised becomes

$P(v_j)\, P(\text{Day}=\text{weekday} \mid v_j)\, P(\text{Time of day}=\text{off-peak} \mid v_j)\, P(\text{Volume}=\text{Low} \mid v_j)\, P(\text{Speed}=\text{Average} \mid v_j)$

Now it would be possible to predict the state of vehicle traffic flow on the freeway:

Step 2: Determine the probabilities of the target values based on their occurrences over the training examples.

P(State = freeflow) = 160/347 = 0.46

P(State = trafficJam) = 101/347 = 0.29

P(State = flowingCongestion) = 86/347 = 0.25

Step 3: Estimate the conditional probabilities

The estimates for Volume = Low are:

P(Volume = Low | State = freeflow) = 139/139 = 1

P(Volume = Low | State = trafficjam) = 0/0 = 0

P(Volume = Low | State = flowingCongestion) = 0/0 = 0
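The Step 2 priors can be reproduced with a few lines of code from the class counts in Table 4.2:

```python
# Reproducing the Step 2 prior estimates from the training-set class counts
# (160 freeflow, 101 trafficjam and 86 flowingcongestion of 347 instances).
counts = {"freeflow": 160, "trafficjam": 101, "flowingcongestion": 86}
total = sum(counts.values())  # 347
priors = {state: n / total for state, n in counts.items()}
print({s: round(p, 2) for s, p in priors.items()})
# freeflow ~0.46, trafficjam ~0.29, flowingcongestion ~0.25
```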

Having shown the mathematical treatment of the data using NB, the NB implementation in WEKA can now be used. Table 4.3 shows the NB training prediction and the prediction results for the test data. The NB prediction model was constructed using 10-fold cross-validation. A total of 347 training instances were used to train the model, and 173 test instances, with the Class target value removed, were used to test whether the model could predict the traffic status.

The Naïve Bayes model predicted 343 of the 347 training instances correctly (98.8%), with 4 instances (1.15%) predicted incorrectly, as shown in Table 4.3. The model was then loaded and the test data was used for the model to predict the target values. Table 4.3 shows that on the test data the model predicted 171 of the 173 instances correctly, with only two instances predicted incorrectly.

Table 4.3 – Naive Bayes results

                      Training Data                               Test Data
Correctly Predicted   Incorrectly Predicted   RMSE     Correctly Predicted   RMSE
Instances             Instances                        Instances
343 – 98.8%           4 – 1.15%               0.09     171 – 98%             0.09

Table 4.4 shows the Confusion matrix with the NB algorithm. The rows in a Confusion
matrix denote the instances in an actual state, and the columns denote the instances in
a predicted state.

Naive Bayes Confusion Matrix

Table 4.4 – Naive Bayes Confusion matrix using the new data on the model
Predicted
Freeflow TrafficJam FlowingCongestion
Actual Freeflow 74 2 0
TrafficJam 0 46 0
FlowingCongestion 0 0 51

Table 4.4 shows the Confusion matrix for NB which was used later to calculate the Loss
matrix for NB. The results from Table 4.4 mean that 74 Freeflow instances were predicted
correctly out of 76, with 2 instances predicted incorrectly as Trafficjam. All 46 actual Trafficjam instances were correctly predicted as Trafficjam, and 51 instances were predicted correctly as FlowingCongestion.

4.2.4 Bayesian Belief Network (BBN)

Conditional probability is defined in terms of joint probabilities, as shown in Table 4.5; it is useful for analysing the effect of an event (traffic congestion).

Recipe for the BBN algorithm

1. Define, from the instance space X, the set of relevant target values $X_i$.

2. Provide a training set of the target function $\langle X_1, X_2, \ldots, X_n \rangle$, ordered such that causes precede effects.

3. Add the instance (node) $X_i$ to the network.

4. Add relations to the node $X_i$ from some minimal set of attributes already in the network, such that $P(X_i \mid X_1, X_2, \ldots, X_m) = P(X_i \mid Parents(X_i))$, where $X_1, X_2, \ldots, X_m$ are all the variables preceding $X_i$ that are not in $Parents(X_i)$.

5. Determine the conditional probability table (CPT) for each attribute, $P(X_i \mid Parents(X_i))$.

Table 4.5 shows the conditional probability table (CPT) for the BN algorithm. With BBN,
the strength of a relationship between attributes can be quantified using the CPT. The
CPT represents the detailed relations (the strength of the dependence relations) between
nodes. The CPT is discussed in more detail in Chapter 3, Section 3.4.1.

Table 4.5 – Conditional probability table (CPT)

                                        Traffic volume
                        FreeFlow   FlowingCongestion   TrafficJam   P(Traffic Congestion)
Traffic      True       0.01       0.02                0.07         0.1
Congestion   False      0.09       0.04                0.01         0.23
P(Traffic volume)       0.1        0.06                0.08

Marginalization is the summing of the rows and columns of P(Traffic Congestion, Traffic volume).

Figure 4.5 – Training prediction results for the BN algorithm using the K2-P1-S Bayes search

Figure 4.5 shows the Bayesian model summary results and the Confusion matrix of the
BN algorithm using K2-P1 -S Bayes search. Figure 4.4 shows the results of the Bayesian
model after the unseen data was used to test if the model is able to predict future unseen
data. Table 4.6 shows the BN results after loading the model.

Table 4.6 – Bayesian network results using the K2-P1-S Bayes search

                      Training Data                               Test Data
Correctly Predicted   Incorrectly Predicted   RMSE     Correctly Predicted   RMSE
Instances             Instances                        Instances
344 – 99.1%           3 – 0.86%               0.08     171 – 98.8%           0.06

The results for the Bayesian network model using the K2-P1-S search are shown in Table 4.6. From Figure 4.5, the training results show that the BN model predicted 99.1% of instances correctly, with 3 Freeflow instances incorrectly predicted as Trafficjam. Table 4.6 further shows that the model predicted 98.8% of the test instances correctly, with an RMSE of 0.06.

Bayesian networks Confusion Matrix

Table 4.7 – Bayesian networks Confusion matrix using the test set
Predicted
Freeflow TrafficJam FlowingCongestion
Freeflow 74 2 0
Actual TrafficJam 0 46 0

FlowingCongestion 0 0 51

The results in Table 4.7 mean that 74 Freeflow instances were predicted correctly out of 76, with two instances predicted incorrectly as Trafficjam. A total of 46 instances were predicted correctly as TrafficJam and 51 instances were predicted correctly as FlowingCongestion.

4.2.5 k-Nearest-Neighbor
The recipe of 𝑘𝑘 − 𝑁𝑁earest neighbor algorithm for approximating a discrete-valued target
function is given in Table 4.8.

Table 4.8 – Recipe for kNN training algorithm

For the k-Nearest-Neighbor experiment, the Classify tab was selected in WEKA, then Lazy, then IBk (the IB stands for Instance-Based, and the k allows one to specify the number of neighbors to examine). "Use training set" was selected in WEKA to use the data set that had just been loaded to create the model.

Figure 4.6 – k-Nearest Neighbor training performance results with 4 Neighbors

From Figure 4.6, the model shows a 98.8% accuracy rating in correctly predicting training instances, which makes for an accurate and effective model. However, the model breaks down quickly and becomes inaccurate when there are few data points for comparison. In Table 4.9, the kNN model with k = 4 shows 4 misclassifications, meaning that 4 instances were predicted incorrectly.

Table 4.9 – kNN prediction results using the value k = 4

                      Training Data                               Test Data
Correctly Predicted   Incorrectly Predicted   RMSE     Correctly Predicted   RMSE
Instances             Instances                        Instances
343 – 98.8%           4 – 1.15%               0.06     171 – 98%             0.07

Table 4.10 shows the Confusion matrix for the kNN algorithm with k = 4, which was used later to calculate the cost for kNN. Table 4.10 shows that the model predicted 74 of the 76 Freeflow instances correctly, while 2 instances were incorrectly predicted as Trafficjam. The model predicted 46 instances correctly as TrafficJam and 51 instances as FlowingCongestion.

Table 4.10 – k-Nearest Neighbor Confusion matrix using the test set
Predicted
Freeflow TrafficJam FlowingCongestion
Actual Freeflow 74 2 0
TrafficJam 0 46 0
FlowingCongestion 0 0 51

4.2.6 Decision Tree

Cross-validation was used for the experiment, and 520 instances from January to December were used. A total of 347 instances were used for the training set and the remaining 173 were used as test data.

The Decision tree C4.5 was used for this experiment. The saved training data was used to predict unseen data. The model predicted 345 of the 347 training instances correctly (99%), with 2 instances (0.57%) predicted incorrectly, as shown in Table 4.11. The model was then loaded, and the new data was used to evaluate the model's prediction performance on the state of vehicle traffic flow. Table 4.11 shows that the model predicted 95% of the test instances correctly.

Steps to add attributes in a tree

Step 1: Calculate the entropy of each attribute in the training examples to determine which attribute should be tested first in the tree. Entropy is calculated using Equation 4.3:

$Entropy(B) \equiv -p_\oplus \log_2 p_\oplus - p_\ominus \log_2 p_\ominus$ (Equation 4.3)

Step 2: Calculate the information gain of each attribute in the training examples:

$Gain(S, A) \equiv Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)$ (Equation 4.4)

Step 3: From the above Step 2, choose the attribute with the most information gain.

Step 4: Calculate the entropy of each attribute value of the root node and partition
them against all the remaining attributes.

Step 5: Calculate the information gain of the attribute values against the remaining
attributes. The attribute with the highest information gain is the next attribute to
branch from.

Step 6: Repeat the process until all attributes are included in the tree.
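A minimal sketch of Steps 1 to 3 (Equations 4.3 and 4.4) on a few illustrative nominal examples; the attribute names echo the study's, but the records themselves are hypothetical.

```python
# Sketch of attribute selection by information gain (Equations 4.3 and 4.4).
# `examples` are (attribute_dict, label) pairs with Boolean labels.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(examples, attribute):
    labels = [label for _, label in examples]
    remainder = 0.0
    for v in {attrs[attribute] for attrs, _ in examples}:
        subset = [label for attrs, label in examples if attrs[attribute] == v]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

data = [({"volume": "low", "time": "peak"}, True),
        ({"volume": "low", "time": "offpeak"}, True),
        ({"volume": "high", "time": "peak"}, False),
        ({"volume": "high", "time": "offpeak"}, False)]
best = max(["volume", "time"], key=lambda a: gain(data, a))
print(best)  # volume (gain 1.0 versus 0.0 for time in this toy data)
```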

Table 4.11 – Decision tree prediction results

                      Training Data                               Test Data
Correctly Predicted   Incorrectly Predicted   RMSE     Correctly Predicted   RMSE
Instances             Instances                        Instances
345 – 99%             2 – 0.57%               0.07     166 – 95%             0.14

Decision tree Confusion Matrix

Table 4.12 shows the Confusion matrix of the Decision tree, which was used later to calculate the cost of the Decision tree model.

Table 4.12 – Decision Tree Confusion matrix using the new input to the model
Predicted
Freeflow TrafficJam FlowingCongestion
Actual Freeflow 69 2 5
TrafficJam 0 46 0
FlowingCongestion 0 0 51

In Table 4.12, the results of the Confusion matrix using the Decision tree (C4.5) show that the model predicted 69 instances correctly as Freeflow, while 2 instances were incorrectly predicted as Trafficjam and 5 test instances were incorrectly predicted as FlowingCongestion. The model predicted 46 instances correctly as TrafficJam and 51 instances as FlowingCongestion. This means the model performed better on the training data than on the test data.

4.3 Post Processing

In the previous section, four algorithms were used to predict traffic flow on a freeway. The Confusion matrix was used to identify the prediction rate for each algorithm, where the false positives and false negatives were used to evaluate the cost of the model. The Loss matrix is the function used to measure the overall cost incurred in taking any of the available decisions or actions [4].

Predictions that were not beneficial to commuters were penalised, and this was done using the Loss matrix. In other words, the model was heavily penalised when it predicted FreeFlow where there was a TrafficJam, and penalised at low cost when it predicted FlowingCongestion where there was FreeFlow, and vice versa.

The goal of the Loss matrix is to reduce the number of incorrect predictions and thereby reduce the total cost incurred. A penalty of 4 was assigned to a model when it predicted
FreeFlow when the actual traffic state was TrafficJam as shown in Table 4.13. A penalty
of 1 was assigned to a model when it predicted TrafficJam when the traffic status was
FlowingCongestion. A penalty of 2 was assigned to a model when it predicted
FlowingCongestion when the actual traffic status was TrafficJam. A penalty of 3 was
assigned to a model when it predicted FreeFlow when the actual traffic status was
FlowingCongestion. A penalty of 3 was assigned to a model when it predicted TrafficJam
when the actual traffic status was FreeFlow. A penalty of 1 was assigned to a model when
it predicted FlowingCongestion when the actual traffic status was FreeFlow.

Table 4.13 – The Loss Matrix for computing the cost of vehicle traffic prediction
Predicted
FreeFlow TrafficJam FlowingCongestion
FreeFlow 0 3 1
Actual
TrafficJam 4 0 2
FlowingCongestion 3 1 0

The elements $L\_M_{kl}$ of the Loss matrix in Table 4.13 specify the penalty associated with each prediction. These elements were chosen by hand.

The cost of a model is given by Equation 4.5:

$Cost\_Mod = \sum_{k,l} C\_M_{kl} \times L\_M_{kl}$ (Equation 4.5)

i.e. the total cost is the sum, over all cells, of the Confusion matrix entries multiplied by the corresponding Loss matrix entries, where $Cost\_Mod$ is the total cost incurred, $C\_M_{kl}$ is the Confusion matrix from Table 4.4 and $L\_M_{kl}$ is the Loss matrix from Table 4.13.
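Equation 4.5 amounts to an element-wise product and sum; the sketch below reproduces, from Tables 4.4 and 4.13, the total cost of 6 that is computed by hand in the subsections that follow.

```python
# Sketch of Equation 4.5: element-wise product of the Confusion matrix
# (Table 4.4) with the Loss matrix (Table 4.13), summed over all cells.
confusion = [[74, 2, 0],   # actual FreeFlow
             [0, 46, 0],   # actual TrafficJam
             [0, 0, 51]]   # actual FlowingCongestion
loss = [[0, 3, 1],
        [4, 0, 2],
        [3, 1, 0]]

total_cost = sum(c * l for c_row, l_row in zip(confusion, loss)
                 for c, l in zip(c_row, l_row))
print(total_cost)  # 6
```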

4.3.1 Total cost for Naive Bayes
The cost for NB is calculated by taking the values from the Naive Bayes Confusion matrix in Table 4.4, multiplying them by the corresponding values in the Loss matrix in Table 4.13, and then adding the products together to get the cost of vehicle traffic flow prediction using the NB model:

$Cost\_Mod = \sum_{k,l} C\_M_{kl} \times L\_M_{kl}$ (Equation 4.5)

Total cost = 74x0 + 2x3 + 0x1 + 0x4 + 46x0 + 0x2 + 0x3 + 0x1 + 51x0

=6

4.3.2 Total cost for Bayesian Network


The total cost for Bayesian Network is calculated by multiplying the values of the Loss
matrix in Table 4.13 with the values from the corresponding cell of the Confusion matrix
in Table 4.7. Equation 4.5 is applied to get the total cost for vehicle traffic flow prediction
using the Bayesian Network model.

Total cost = 74x0 + 2x3 + 0x1 + 0x4 + 46x0 + 0x2 + 0x3 + 0x1+ 51x0

=6

4.3.3 Total Cost for k-Nearest Neighbor


The total Cost for kNN is calculated by multiplying the values of the Loss matrix in Table
4.13 with the values from the corresponding cell of the Confusion matrix in Table 4.10.
Equation 4.5 is used to calculate the total cost for vehicle traffic flow prediction using the
k-Nearest Neighbour.

Total Cost = 74x0 + 2x3 + 0x1 + 0x4 + 46x0 + 0x2 + 0x3 + 0x1+ 51x0

=6

4.3.4 Total cost for Decision tree


The total cost for the Decision tree is calculated by multiplying the values of the Loss matrix in Table 4.13 with the values from the corresponding cells of the Confusion matrix in Table 4.12. The total cost for vehicle traffic flow prediction using the Decision tree model is calculated using Equation 4.5.

Total Cost = 69x0 + 2x3 + 5x1 + 0x4 + 46x0 + 0x2 + 0x3 + 0x1+ 51x0

= 11

Table 4.14 shows each model's performance results. The Bayesian Network outperformed the other models by predicting the test instances correctly at 98.8%, with a low RMSE of 0.06 and a cost of 6. Although the other models performed well with the same cost of 6 as the BN model, the model with the lowest RMSE was chosen for this experiment.

Table 4.14 – Prediction model results

Model                Test Prediction Accuracy   RMSE   Cost
Bayesian Networks    98.8%                      0.06   6
Naive Bayes          98%                        0.09   6
K-Nearest Neighbor   98%                        0.07   6
Decision Tree        95%                        0.14   11

Figure 4.7 – Prediction accuracy per model (Bayesian Networks 98.8%, Naïve Bayes 98%, K-Nearest Neighbor 98%, Decision Tree 95%)

In Figure 4.7, the BN model outperformed the other models with 98.8% correct traffic flow prediction, followed by NB with 98%, kNN with 98% and the Decision tree with 95%.

Figure 4.8 – RMSE per model (Bayesian Networks 0.06, Naïve Bayes 0.09, K-Nearest Neighbor 0.07, Decision Tree 0.14)

Figure 4.8 shows the RMSE per model. One can observe that the BN has the lowest RMSE of 0.06, which makes the BN model the best method, with kNN second best at an RMSE of 0.07, then NB at an RMSE of 0.09, and the Decision tree the weakest method with an RMSE of 0.14.

Figure 4.9 – Total cost per model

Figure 4.9 shows the total cost for each model used in the experiment; the Bayesian Networks, Naïve Bayes and K-Nearest Neighbor models each have a total cost of 6, which makes these three the best models in comparison with the Decision tree, which has the highest cost of 11.

4.3.5 The attribute selection

The results in Figure 4.9 show that the Bayesian network is a promising model for predicting vehicle traffic flow in Gauteng. The next step is to identify the attributes that contribute most to vehicle traffic congestion. The most influential attributes were selected from day, time of day, volume and speed. The detail of these four attributes is as follows: time of day comprises peak time (06h00 - 09h00, 15h00 - 18h00) and off-peak time (10h00 - 13h00, 19h00 - 22h00); vehicle speed is segmented into low (<= 40 km/h), average (41 to 100 km/h) and high (> 100 km/h); day is the day of the week; and vehicle volume is segmented into low (<= 3000), average (<= 60000) and high (<= 90000).

The two attributes (Time of day and Speed) were selected based on the correct prediction and RMSE. The steps followed in the attribute selection are given below.

Step 1: Load data

1. Load the saved training examples on WEKA

Step 2: Attribute Selection

2. Click on “Select attributes” tab,


This selection allows you to reject unrelated attributes and thus reduce the dimensionality of the dataset.

3. Under “Attribute Evaluator” frame select relevant evaluation method (e.g.


“CfsSubsetEval”)
4. Under “Search Method” frame select relevant search method and in this case it is
“GreedyStepwise”.
5. Under "Attribute Selection Mode", the selection depends on the evaluation method chosen in point 3; in this case, select "Use full training set", then click on the Start button.

Figure 4.10 – Results of a search method for selecting the best attributes

Figure 4.10 shows that, of the four attributes used in the dataset, "time of day" and "speed" are the best attributes (meaning they are highly discriminative).

Step 3: Confirm that the selection in Step 2 is correct by using the best two-attribute combinations in building the model, then compare the RMSE and correct prediction of the selected attribute pairs.

6. Under the "Preprocess" tab, highlight the two attributes you want to remove, then click the Remove button.
7. You will be left with two attributes and the target concept.
8. Under "Classifier", select a classifier (e.g. BayesNet), then click the Start button.
9. Right-click on the result list and save the model.
10. Repeat the same steps for all the features, comparing attributes in pairs.
11. Evaluate the output buffer of all the result sets, comparing their RMSE and correct prediction.
12. Steps 6 to 11 can be repeated to compare all the attribute pairs two by two.
Table 4.15 – Attribute selection comparing two attributes for the BN model dataset

    Learning Algorithm   Attributes               RMSE   Correct Prediction
1   BN                   Time of day and Speed    0.14   96%
2   BN                   Day and Speed            0.14   96%
3   BN                   Volume and Speed         0.14   96%
4   BN                   Time of day and Volume   0.31   80%
5   BN                   Day and Volume           0.35   63%
6   BN                   Time of day and Day      0.24   58%

Table 4.15 shows the results of the BN model. The results show that the pairs including speed obtained a correct prediction of 96% and an RMSE of 0.14, which means speed is the attribute that contributes most to vehicle traffic congestion. The pairs including time of day obtained a correct prediction of up to 80%, with RMSEs between 0.24 and 0.31. Figure 4.10 shows that speed and time of day are the two attributes that influence the vehicle traffic flow status the most.

Figure 4.11 – Shows the variation of vehicle traffic volume with time of day

Figure 4.11 shows that the largest vehicle traffic volume occurs at 07h15, when the volume is slightly more than 12000 vehicles. The next highest is at 16h00, with a volume of approximately 10600 vehicles. The difference between the morning peak and the evening peak is that in the latter the knock-off time is spread from 15h30 to 17h00, whereas in the former most companies in Johannesburg start work between 07h30 and 08h00.

4.4 Chapter Conclusion

In conclusion, the Bayesian Networks, Naïve Bayes, K-Nearest Neighbor and Decision Tree algorithms were used for the experiment. The performance results for each model were evaluated based on prediction accuracy, RMSE and the total cost of prediction. The cost associated with incorrect prediction of vehicle traffic flow was calculated in order to reduce the number of inaccurate predictions and thereby decrease the total loss incurred. The attribute selection was done in WEKA, where the attributes were compared in pairs. The two attributes with the lowest RMSE and highest correct prediction were selected as the attributes that contribute most to traffic congestion.

CHAPTER 5: SUMMARY, CONCLUSION AND RECOMMENDATIONS

5.1 Summary

The results in Table 4.6 mean that the best prediction model for the state of traffic flow is the Bayesian network model. It achieved a test prediction accuracy of 98.8%, a cost of 6 and the lowest RMSE of 0.06. Thus the BN has high generalisation ability, meaning that it can predict data that was not used in constructing the model but that comes from the same distribution as the data used for constructing it. The reason for choosing BN is that it is able to deal with uncertain situations and incomplete data, and this ability is based on sound mathematical theory.

Table 4.3 shows that the Naïve Bayes model achieved a test performance of 98%, with an RMSE of 0.09 and a total cost of 6, the same as that of the BN model. The model predicted 2 instances incorrectly as traffic jam instead of free flow; the cost penalty of 3 shown in Table 4.13 was given for such incorrect predictions. Its RMSE was higher than that of the BN model, which means that commuters might make costly, inaccurate travel decisions.

From Table 4.9, the kNN model achieved a test performance of 98%, with an RMSE of 0.07 and a total cost of 6. Thus the kNN model is a better model for predicting traffic flow congestion than the NB model.

The Decision tree model achieved a test performance of 95%, with an RMSE of 0.14 and a total cost of 11, as shown in Table 4.14. This means the Decision tree is a poor model for predicting vehicle traffic flow. The Decision tree model was penalised heavily and is considered a high-cost model; it will not be considered for predicting vehicle traffic flow.

Thus the BN model outperformed the NB, kNN and Decision tree models, based on the number of correctly predicted instances, the root mean squared error and the cost of prediction. The ANN was not used in this work because the sample size was small, and the ANN requires a larger data sample to perform well.

The models incorrectly predicted Freeflow as Trafficjam; this is due to the high number of Freeflow instances (76) compared to the other two classes, namely FlowingCongestion with 51 instances and TrafficJam with 46 instances. The attributes that had a very strong influence on predicting the vehicle traffic flow were vehicle speed and the time of day. The time of day, peak or off-peak, also shows a strong influence on traffic congestion, as shown in Figure 4.11.

Similarly, others applying the BN algorithm to traffic flow prediction have achieved good results. Sun et al. [24] and Hoong et al. [10] proposed BN models for forecasting vehicle traffic flow. Sun et al. [24] proposed a traffic flow forecasting model trained under the criterion of minimum mean square error (MMSE). Their approach differs from this study in that it includes information from adjacent road links in the analysis. In addition, it includes traffic flow forecasting when incomplete data exist. Their forecasting BN achieved an RMSE of 1322.6.

Hoong et al. [10] proposed a BN framework for road condition predictions. For the
evaluation process, they created two variations of BN, namely, Naive Bayesian Network
(M2) and parameter-learning Bayesian Network (M3), as benchmark to measure the
predictive accuracy of the proposed Bayesian Network (M1). M2 depicted the lowest
average accuracy (58.57%), followed by M1 (74.37%), while M3 scored the highest
accuracy of 76.01%. Their results are lower than the results from this study.

The implications of the Bayesian network traffic prediction model from this study are that commuters will be able to know the status of vehicle traffic flow on the freeway ahead of time and will be able to use alternative routes or leave earlier to avoid traffic jams. This will allow businesses to save millions of rands per annum on the production loss they currently incur as a result of workers reporting late for work.

Constructing a model that predicts the status of traffic flow for the next five years is a challenge, because five years is a long time and there is a high likelihood that new variables will emerge to disrupt the pattern (trend); hence the model might no longer be able to predict the state of traffic accurately as a result of missing variables. The model is not perfect, because there is missing data (variables), for example data on weather.

5.2 Conclusion and recommendations

In this study, a vehicle traffic flow prediction model based on Bayesian networks has been
designed for Gauteng freeways. The Bayesian Network outperformed the Naive Bayes,
k-NN and Decision tree models in predicting vehicle traffic flow on the Johannesburg ring
road.

The results show that the Bayesian networks model outperformed the other models, with
the highest number of correctly predicted instances, a low RMSE and a low cost. From
the results in this study, one can conclude that the BN algorithm is a valid tool for
predicting vehicle traffic flow.

A model was built using historical vehicle traffic flow data captured from Gauteng
freeways. From the results, the Bayesian Networks model outperforms the Naive Bayes,
k-Nearest Neighbor and Decision tree models in predicting vehicle traffic flow. The other
three models performed comparably; the differences between their performances are
small, and thus any of the three could equally be used to construct a traffic flow prediction
model. The results of this study will benefit both commuters and employers by potentially
reducing stress levels and thus improving their health, saving on production costs for
companies and potentially improving the efficiency of South African organisations. The
recommendation from the study to the Gauteng department of transport is that it must
intervene in the area of vehicle traffic speed and also target specific times of the day to
promote free-flowing traffic. This is because, out of the four variables considered (day,
time of day, volume and speed), the combination of speed and time of day was found to
contribute 96% to vehicle traffic congestion, as shown in Figure 4.10.

The Bayesian networks model is recommended as the best model for predicting vehicle
traffic flow. The prediction model is only valid for about 18 months, as beyond that new
variables could have emerged. As an example, the past 5 years have seen the
introduction of cycle tracks and a marked increase in the number of commuters using
public transport. Thus this model will need to be updated every 18 months or whenever
new variables appear. A model that predicts traffic for 5 years will consequently not be
realistic, as new variables could have disrupted the pattern (trend), and the model might
no longer be able to accurately predict the state of traffic. Further work could include
collecting weather and road construction data.

A further recommendation to the Gauteng department of transport is to invest in artificial
intelligence tools to overcome the problem of vehicle traffic congestion in South Africa.

CHAPTER 6: REFERENCES

[1] Acentric, Acentric survey reveals traffic congestion's effect on SA, 2011.
[2] L. Baoli, Y. Shiwen and L. Qin, An improved k-nearest neighbor algorithm for text
categorization, Proceedings of the 20th International Conference on Computer
Processing of Oriental Languages, Shenyang, China, 2003.
[3] J. Beall, O. Crankshaw and S. Parnell, Local government, poverty reduction and
inequality in Johannesburg, Environment & Urbanization, vol.12 no.1, pp.107–122,
2000.
[4] C. Bishop, Pattern Recognition and Machine Learning. Springer Science, New York,
2006.
[5] T. Brinkhoff, Major agglomerations of the world, 2013. Available at
http://www.citypopulation.de/world/Agglomerations.html.
[6] City of Johannesburg Metropolitan Municipality, The Local Government Handbook:
A complete guide to municipalities in South Africa, 2012. Available at
http://www.localgovernment.co.za/metropolitans/view/2/city-of-johannesburg-
metropolitan-municipality.
[7] D. Charoenruk, Communication Research methodologies: qualitative and
quantitative methodology, 2002. Available at
http://utcc2.utcc.ac.th/localuser/amsar/PDF/Documents49/quantitative_and_qualita
tive_methodologies.pdf.
[8] DistanceFrom. Travel Time Calculator, 2013. Available at
http://www.distancesfrom.com/Travel-Time.aspx.
[9] ENaTiS, Live vehicle population, 2013. Available at
http://carinsurance.arrivealive.co.za/category/car-statistics.
[10] P. Hoong, I. Tan, O. Chien, and C. Ting. Road traffic prediction using Bayesian
Networks. Proceedings of IET International Conference on Wireless
Communications and Applications, pp. 1 - 5, 2012.

[11] A. Ghazy and T. Ozkul, Design and simulation of an artificially intelligent VANET for
solving traffic congestion. Proceedings of IEEE 6th International Symposium on
Mechatronics and its Applications, pp. 1 - 6, 2009.
[12] InterNations, Johannesburg at a glance: working in Johannesburg, 2013.
Available at http://www.internations.org/johannesburg-expats/guide/working-in-
johannesburg-15840.
[13] H. Ji, A. Xu, X. Sui and L. Li, The applied research of Kalman in the dynamic travel
time prediction, Proceedings of 18th International Conference on Geoinformatics,
pp.1-5, 2010.
[14] M. Kamal, J. Imura, A. Ohata, T. Hayakawa, and K. Aihara, Efficient control of
vehicles in congested traffic using model predictive control, Proceedings of IEEE
International Conference on Control Applications, pp. 1522 - 1527, 2012.
[15] U. Kjaerulff and A.L. Madsen, Bayesian Networks and Influence Diagrams: A Guide
to Construction and Analysis. Springer Science, New York, 2013.
[16] W. Labeeuw, K. Driessens, D. Weyns, T. Holvoet and G. Deconick, Prediction of
congested traffic on the critical density point using machine learning and
decentralised collaborating sensors, 2009.
[17] Y. Ma, M. Chowdhury, A. Sadek, and M. Jeihani, Real-time highway traffic condition
assessment framework using vehicle-infrastructure integration (VII) with Artificial
Intelligence (AI), IEEE Transactions on Intelligent Transportation Systems, vol.10,
no.4, 2009.
[18] T. Mitchell, Machine Learning, McGraw-Hill International Editions, Singapore, 1997.
[19] S. Russell and P. Norvig, Artificial Intelligence a Modern Approach. Prentice Hall,
New Jersey, 2010.
[20] N. Turnbull, Map of Johannesburg with road and region markings, 2005, Available
at http://en.wikipedia.org/wiki/File:Johannesburgmap-ringroad.jpg.
[21] L. Uusitalo, Advantages and challenges of Bayesian networks in environmental
modelling, Ecological modelling, vol.203, no.3-4, pp.312-318, 2006.
[22] S. Vigneswaran, A. Joseph and E. Rajamanickam, Efficient Analysis of traffic
accident, vol.2, no.3, pp.107-118, 2014.

[23] N. Wisitpongphan, W. Jitsakul and D. Jieamumporn, Travel time prediction using
multi-layer feed forward artificial neural network. Proceedings of IEEE Fourth
International Conference on computational intelligence, communication systems
and networks, pp.326 - 330, 2012.
[24] S. Sun, C. Zhang and G. Yu, A Bayesian network approach to traffic flow
forecasting, IEEE Transactions on Intelligent Transportation Systems, vol.7, no.1,
pp.216-221, 2006.
[25] D. Zhao, Y. Dai, and Z. Zhang, Computational intelligence in urban traffic signal
control: A survey, IEEE Transactions on Systems, Man, and Cybernetics, Part C:
Applications and Reviews, vol.42, no.4, pp.485-494, 2012.
[26] M. May, D. Hecker, C. Korner, S. Scheider and D. Schulz, A Vector-Geometry
Based Spatial kNN-Algorithm for Traffic Frequency Predictions, Proceedings of
IEEE International Conference on Data Mining Workshops, pp.442-447, 2008.
[27] A. Pascale, M. Nicoli, Adaptive Bayesian network for traffic flow prediction,
Proceedings of IEEE International Conference on Statistical Signal Processing
Workshop (SSP), pp.177-180, 2011.
[28] T. Ji, Q. Pang and X. Liu, Study of Traffic Flow Forecasting Based on Genetic Neural
Network, Proceedings of IEEE 6th International Conference on Intelligent Systems
Design and Applications, vol.1, pp.960-965, 2006.
[29] B. Passow, D. Elizondo, F. Chiclana, S. Witheridge and E. Goodyer, Adapting traffic
simulation for traffic management: A neural network approach, Proceedings of IEEE
16th International Conference on Intelligent Transportation Systems - (ITSC), pp.
1402 - 1407, 2006.
[30] S. Li, Z. Shen, F. Wang, A weighted pattern recognition algorithm for short-term
traffic flow forecasting, Proceedings of IEEE 9th International Conference on
Networking, Sensing and Control (ICNSC), pp.1 - 6, 2012.
[31] J. Collis and R. Hussey, Business Research: A Practical Guide for Undergraduate
and Postgraduate Students, Palgrave MacMillan, England, 2003.
[32] Y. Yu and M. Cho, A short-term prediction model for forecasting traffic information
using Bayesian network, Proceedings of IEEE 3rd International Conference on
Convergence and hybrid Information Technology, vol.1, pp.242-247, 2014.

[33] E. Bell and A. Bryman, Business research methods, Third edition, Oxford University
Press. 2003.

ANNEXURES

Annexure 1A

Live vehicle population, adapted from the National Traffic Information System (eNaTiS) [9].

Annexure 1B

[Bar chart: percentage of each vehicle type, with categories Buses, Taxis, Private
Vehicles, Vans and Trucks; vertical axis from 0% to 80%.]

Percentage of the type of vehicle used on the Johannesburg freeway

Annexure 4A

Data for vehicle traffic flow collected from MTM (from January 2013 to December 2013).
Due to the large sample size, only a sample for January, February and August is shown
in Annexure 4A.

Instances   Site ID   Date   Time   Total (Dir 1)   Average Speed (Dir 1)
1 1863 2013/01/28 14 4169 100
2 1863 2013/01/28 15 6719 58
3 1863 2013/01/28 16 7660 54
4 1863 2013/01/28 17 8682 56
5 1863 2013/01/28 18 7793 60
6 1863 2013/01/28 19 4853 99
7 1863 2013/01/28 20 2855 90
8 1863 2013/01/28 21 1871 94
9 1863 2013/01/28 22 1340 99
10 1863 2013/01/28 23 910 103
11 1863 2013/01/28 24 462 102
12 1863 2013/01/29 1 227 101
13 1863 2013/01/29 2 194 96
14 1863 2013/01/29 3 184 95
15 1863 2013/01/29 4 214 102
16 1863 2013/01/29 5 613 108
17 1863 2013/01/29 6 4046 98
18 1863 2013/01/29 7 11147 55
19 1863 2013/01/29 8 11427 58
20 1863 2013/01/29 9 9091 60
21 1863 2013/01/29 10 7593 58
22 1863 2013/01/29 11 7303 60
23 1863 2013/01/29 12 7183 58
24 1863 2013/01/29 13 7376 60
25 1863 2013/01/29 14 7271 60
26 1863 2013/01/29 15 6969 101
27 1863 2013/01/29 16 8032 56
28 1863 2013/01/29 17 7868 58
29 1863 2013/01/29 18 7902 60
30 1863 2013/01/29 19 5140 103
31 1863 2013/01/29 20 3169 102
32 1863 2013/01/29 21 2167 103
33 1863 2013/01/29 22 1546 104

34 1863 2013/01/29 23 1054 105
35 1863 2013/01/29 24 614 106
36 1863 2013/01/30 1 418 108
37 1863 2013/01/30 2 225 103
38 1863 2013/01/30 3 221 100
39 1863 2013/01/30 4 219 102
40 1863 2013/01/30 5 617 110
41 1863 2013/01/30 6 4172 99
42 1863 2013/01/30 7 12145 54
43 1863 2013/01/30 8 11021 55
44 1863 2013/01/30 9 9142 60
45 1863 2013/01/30 10 7570 72
46 1863 2013/01/30 11 7189 75
47 1863 2013/01/30 12 7412 98
48 1863 2013/01/30 13 7201 101
49 1863 2013/01/30 14 7146 100
50 1863 2013/01/30 15 7175 58
51 1863 2013/01/30 16 7973 60
52 1863 2013/01/30 17 9067 60
53 1863 2013/01/30 18 7716 60
54 1863 2013/01/30 19 5365 102
55 1863 2013/01/30 20 3324 102
56 1863 2013/01/30 21 2264 102
57 1863 2013/01/30 22 1724 102
58 1863 2013/01/30 23 1198 103
59 1863 2013/01/30 24 669 106
60 1863 2013/01/31 1 404 106
61 1863 2013/01/31 2 284 102
62 1863 2013/01/31 3 218 100
63 1863 2013/01/31 4 244 101
64 1863 2013/01/31 5 656 105
65 1863 2013/01/31 6 4034 80
66 1863 2013/01/31 7 11820 54
67 1863 2013/01/31 8 11014 55
68 1863 2013/01/31 9 9525 59
69 1863 2013/01/31 10 7783 60
70 1863 2013/01/31 11 7373 60
71 1863 2013/01/31 12 7459 60
72 1863 2013/01/31 13 7711 60
73 1863 2013/01/31 14 7505 100
74 1863 2013/01/31 15 7359 58

400 1863 2013/02/26 16 7972 60
401 1863 2013/02/26 17 8849 58
402 1863 2013/02/26 18 7780 58
403 1863 2013/02/26 19 5327 103
404 1863 2013/02/26 20 3349 102
405 1863 2013/02/26 21 2207 102
406 1863 2013/02/26 22 1634 103
407 1863 2013/02/26 23 1101 104
408 1863 2013/02/26 24 621 106
409 1863 2013/02/27 1 350 101
410 1863 2013/02/27 2 219 98
411 1863 2013/02/27 3 192 98
412 1863 2013/02/27 4 246 104
413 1863 2013/02/27 5 690 108
414 1863 2013/02/27 6 4299 107
415 1863 2013/02/27 7 12552 50
416 1863 2013/02/27 8 10324 58
417 1863 2013/02/27 9 8300 59
418 1863 2013/02/27 10 7983 59
419 1863 2013/02/27 11 7470 59
420 1863 2013/02/27 12 7249 100
421 1863 2013/02/27 13 7351 100
422 1863 2013/02/27 14 7349 101
423 1863 2013/02/27 15 7235 101
424 1863 2013/02/27 16 8270 68
425 1863 2013/02/27 17 9223 70
426 1863 2013/02/27 18 7734 100
427 1863 2013/02/27 19 5585 104
428 1863 2013/02/27 20 3521 103
429 1863 2013/02/27 21 2388 103
430 1863 2013/02/27 22 1799 104
431 1863 2013/02/27 23 1182 103
432 1863 2013/02/27 24 680 105
433 1863 2013/02/28 1 398 103
434 1863 2013/02/28 2 261 101
435 1863 2013/02/28 3 192 101
436 1863 2013/02/28 4 247 104
437 1863 2013/02/28 5 704 109
438 1863 2013/02/28 6 4274 109
439 1863 2013/02/28 7 12353 50
440 1863 2013/02/28 8 10558 50

Annexure 4B

Test instances used for the vehicle traffic flow experiment in chapter 4

Day   Time of day   Volume   Speed   Class
weekday off-peak High Average flowingCongestion
weekday off-peak High Low trafficJam
weekday off-peak High Average flowingCongestion
weekday off-peak High Average flowingCongestion
weekday Peak High Average flowingCongestion
weekday Peak High Average flowingCongestion
weekday Peak High Average flowingCongestion
weekday Peak High Average flowingCongestion
weekday off-peak Average Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday Peak Average Average flowingCongestion
weekday Peak High Low trafficJam
weekday Peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak High Average flowingCongestion
weekday off-peak High Average flowingCongestion
weekday off-peak High Average flowingCongestion
weekday Peak High Average flowingCongestion
weekday Peak High Average flowingCongestion
weekday Peak High Low trafficJam
weekday Peak High Low trafficJam
weekday off-peak Average Low freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow

weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday Peak Average Average flowingCongestion
weekday Peak High Low trafficJam
weekday Peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak High Average flowingCongestion
weekday off-peak High Average flowingCongestion
weekday off-peak High Average flowingCongestion
weekday off-peak High Average flowingCongestion
weekday Peak High Average flowingCongestion
weekday Peak High Average flowingCongestion
weekday Peak High Low trafficJam
weekday Peak High Low trafficJam
weekday off-peak Average Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday Peak Average Average flowingCongestion
weekday Peak High Low trafficJam
weekday Peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak High Average flowingCongestion
weekday off-peak High Average flowingCongestion
weekday Peak High Average flowingCongestion

weekday Peak High Average flowingCongestion
weekday Peak High Average flowingCongestion
weekday Peak High Average flowingCongestion
weekday off-peak Average Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday Peak Average Average flowingCongestion
weekday Peak High Low trafficJam
weekday Peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak High Average flowingCongestion
weekday off-peak High Average flowingCongestion
weekday off-peak High Average flowingCongestion
weekday Peak High Average flowingCongestion
weekday Peak High Average flowingCongestion
weekday Peak High Low trafficJam
weekday Peak High Low trafficJam
weekday off-peak Average Low freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low High freeflow
weekday Peak Average Average flowingCongestion
weekday Peak High Low trafficJam
weekday Peak High Low trafficJam

weekday off-peak High Low trafficJam
weekday off-peak High Average flowingCongestion
weekday off-peak High Average flowingCongestion
weekday off-peak High Average flowingCongestion
weekday off-peak High Average flowingCongestion
weekday off-peak High Average flowingCongestion
weekday Peak High Average flowingCongestion
weekday Peak High Average flowingCongestion
weekday Peak High Low trafficJam
weekday Peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak Average Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday Peak Average Average flowingCongestion
weekday Peak High Low trafficJam
weekday Peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak High Average flowingCongestion
weekday off-peak High Average flowingCongestion
weekday off-peak High Average flowingCongestion
weekday Peak High Average flowingCongestion
weekday Peak High Low trafficJam
weekday Peak High Low trafficJam
weekday Peak High Low trafficJam
weekday off-peak Average Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow

weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday off-peak Low Average freeflow
weekday Peak Average Average flowingCongestion
weekday Peak High Low trafficJam
weekday Peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak High Low trafficJam
weekday off-peak High Average flowingCongestion
weekday off-peak High Average flowingCongestion
weekday off-peak High Average flowingCongestion
weekday Peak High Average flowingCongestion

Annexure 4C

Procedure for converting data to nominal values:

if speed <= 60 then traffic = trafficJam (i.e. vehicles that travel at any speed of 60 km/h
or below)

else if speed >= 61 and speed <= 100 then traffic = flowingCongestion (i.e. vehicles that
travel between 61 and 100 km/h)

else traffic = freeflow (free-flow vehicles can travel at their own anticipated speed).

Peak time (06h00 - 09h00, 15h00 - 18h00)

Off Peak (10h00 - 13h00, 19h00 - 22h00)

Public holiday (any bank holidays)

School holidays (April, July, September, December)

Number of vehicles (< 3000 = Low, 3000 to 6000 = Average, > 6000 = High)

=IF(F3<=60,"Low",IF(F3<90,"Average","High"))

=IF(E3<3000,"Low",IF(AND(E3>=3000,E3<=6000),"Average","High"))

=IF(AND(J3="off-Peak",K3="high",L3="Average"),"flowingCongestion",IF(AND(J3="off-
peak",K3="High",L3="Low"),"trafficJam",IF(AND(J3="Peak",K3="High",L3="Low"),"traffic
Jam",IF(AND(J3="Peak",K3="Low",L3="High"),"freeflow",IF(AND(J3="Peak",K3="Avera
ge",L3="Average"),"flowingCongestion",IF(AND(J3="off-
peak",K3="Average",L3="High"),"freeflow",IF(AND(J3="Peak",K3="High",L3="Average")
,"flowingCongestion","freeflow")))))))

=IF(AND(J4="off-Peak",K4="high",L4="Average"),"flowingCongestion",IF(AND(J4="off-
peak",K4="High",L4="Low"),"trafficJam",IF(AND(J4="Peak",K4="High",L4="Low"),"traffic

87
Jam",IF(AND(J4="Peak",K4="Low",L4="High"),"freeflow",IF(AND(J4="Peak",K4="Avera
ge",L4="Average"),"flowingCongestion",IF(AND(J4="off-
peak",K4="Average",L4="High"),"freeflow",IF(AND(J4="Peak",K4="High",L4="Average")
,"flowingCongestion","freeflow")))))))

Loading training and testing data into the model


The recipe for training the model was as follows:

Step 1

1. The training data set was captured in an Excel spreadsheet.
2. Convert the spreadsheet file to the comma-separated values (CSV) format as
follows:
2.1 Open the Excel file.
2.2 Click File, and then Save As.
2.3 Change the file type by clicking the down arrow on the Save As type
textbox.
2.4 Select .CSV format and click Save.
2.5 Click OK on the message box to save only the active sheet.
2.6 Click Yes on the message box to keep the .CSV format.
2.7 Close the .CSV file.
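As an alternative to the GUI conversion performed in Step 2 below, the .CSV file can also be converted to WEKA's native .arff format through the WEKA Java API. A minimal sketch, with illustrative file names:

import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

public class CsvToArff {
    public static void main(String[] args) throws Exception {
        // Load the CSV file produced in Step 1.
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("traffic_train.csv"));
        Instances data = loader.getDataSet();

        // Write the same data out in ARFF format.
        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(new File("traffic_train.arff"));
        saver.writeBatch();
    }
}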

Step 2

1. Launch WEKA to get the graphical user interface (GUI) chooser panel.
2. Click the Explorer button from the choices on the right pane.
3. Click the “Open file” button.
4. Navigate to the folder that contains the .CSV file created in Step 1.
5. Select .CSV on the Files of type drop-down box (all the .CSV files in this folder
will appear).
6. Click the .CSV file in the folder and click the Open button to load the file (a
screen similar to Figure 4.1.1 will appear).
7. Click the Save button in the top right corner to save the file in .arff format.

8. Click the Classify tab and select the classifier by clicking the Choose button at
the top left corner of the screen to get the different classifiers in a hierarchical
menu.
9. Select the appropriate classifier.
10. Click the classifier name in the text box next to the Choose button to get the
object editor, which enables you to change the parameters of the selected
classifier.
11. In the Test Options section, select Cross-validation with 10 folds.
12. Click the Start button to start the training; the result will be displayed in the
classifier output section as shown in Figure 4.1.2.
13. To save the model, right-click the model name in the Result List section and
click Save Model.
14. Save the trained model in the preferred folder.
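The same training procedure can also be reproduced through the WEKA Java API instead of the Explorer GUI. The sketch below uses WEKA's BayesNet classifier with 10-fold cross-validation and then saves the trained model; the file names and random seed are illustrative:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.BayesNet;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

public class TrainModel {
    public static void main(String[] args) throws Exception {
        // Load the training data and mark the last attribute as the class.
        Instances train = DataSource.read("traffic_train.arff");
        train.setClassIndex(train.numAttributes() - 1);

        // Evaluate the Bayesian network classifier with 10-fold cross-validation.
        BayesNet classifier = new BayesNet();
        Evaluation eval = new Evaluation(train);
        eval.crossValidateModel(classifier, train, 10, new Random(1));
        System.out.println(eval.toSummaryString());
        System.out.println("RMSE: " + eval.rootMeanSquaredError());

        // Train on the full data set and save the model for later re-evaluation.
        classifier.buildClassifier(train);
        SerializationHelper.write("traffic.model", classifier);
    }
}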

Figure 4.1.1 - WEKA Explorer Interface

The recipe for testing the model

With the result of the trained model opened on the screen as shown in Figure 4.1.2:

1. Select Supplied test set in the Test Options section shown in Figure 4.1.2.
2. The small Test Instances window will appear.
3. Click Open file on the Test Instances window, navigate to the location on your
computer where the test data set file is saved, and load the file.
4. Once the file is loaded, select No Class from the list of attributes in the Class section
of the Test Instances window and click Close.
5. Then click More Options shown in Figure 4.1.2; in the new window that opens, choose
PlainText from Output Predictions and click OK.
6. Then right-click the model in the Result List window shown in Figure 4.1.2 and click
Re-evaluate Model on Current Test Set.
7. The results will be displayed on the Classifier Output panel, under Predictions on User
Test Set, as shown in Figure 4.1.3.
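Equivalently, re-evaluating the saved model on a supplied test set can be done through the WEKA Java API. A minimal sketch under the same assumptions as before (illustrative file names; the class is the last attribute):

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

public class TestModel {
    public static void main(String[] args) throws Exception {
        // Load the model saved during training and the supplied test set.
        Classifier model = (Classifier) SerializationHelper.read("traffic.model");
        Instances test = DataSource.read("traffic_test.arff");
        test.setClassIndex(test.numAttributes() - 1);

        // Re-evaluate the model on the test set and print the summary.
        // (In practice the Evaluation is often constructed with the training
        // data instead, so that the class priors come from the training set.)
        Evaluation eval = new Evaluation(test);
        eval.evaluateModel(model, test);
        System.out.println(eval.toSummaryString());
    }
}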

Figure 4.1.2 - The output for the Decision tree; the attributes used are shown at the top.

Figure 4.1.3 - Results for the test set using the Decision tree algorithm

