AIMES Report - Melbourne Traffic Prediction - Urban PeakHour


TRAFFIC PREDICTION:

A comprehensive analysis of the prediction quality of a data-driven Deep Learning Algorithm (DLA) to predict
traffic

Created: 18th June 2021

Updated: 1st December 2021

eng.unimelb.edu.au/industry/transport

Contents

1 Introduction
2 Terminology
2.1 Model-based approach
2.2 Forecast Quality, The GEH
3 Numerical comparison
3.1 Model-based
3.2 The Deep Learning Algorithms (DLA)
4 Conclusion
Appendix A, Deep Learning Algorithm


Executive summary

The University of Melbourne (UoM) has comprehensively studied and evaluated traffic flow prediction using
purely data-driven Artificial Intelligence (AI) in general and Machine Learning (ML) in particular. The predictive
algorithm, called the DLA (Deep Learning Algorithm), was developed by PeakHour Urban Technologies and
was provided to the UoM for this study.

In this report, we provide reasoning for the importance of reliable and accurate prediction methods for
today’s Intelligent Transport Systems (ITS) as well as an overview of the classical alternatives. We believe
that examination of the pros and cons of such alternatives can provide the necessary context for the DLA
methodology.

Numerical applications of the DLA utilised in the City of Melbourne are provided and discussed. The results
presented show highly promising predictive power of the AI (DLA) in two dimensions: the accuracy and the
time span of the prediction.

For traffic flow prediction, GEH is a metric that assesses predictive capability against the actual traffic
volume; the lower this index, the better the prediction. For example, the 60-minute prediction of a
leading classical model-based method¹ shows a GEH of 10.0, whereas the DLA prediction is
much more accurate, with a GEH of 3.0. Moreover, the DLA can extend the depth of prediction to 120 min
ahead with almost the same level of precision. This is not attainable by the model-based approach.

Even though the DLA is a data-driven method, it has also proven to be data-frugal. That is,
compared to classical methods, the DLA requires much less data. The DLA can work with only the
traffic volume data currently being collected at traffic signals (e.g., using SCATS loop detectors), whereas
classical models require a series of comprehensive datasets, including household surveys, land use data,
travel demand, road survey traffic counts, etc.

Preparation or development of the DLA is fast and economical, a significant advantage compared to classical
models. Finally, the DLA has a learning capability that keeps pace with gradual shifts or changes in the traffic
conditions. Thus, costly maintenance, which is one of the big drawbacks of the model-based solutions, is
already embedded in the DLA.

1 We have access to the calibrated OPTIMA model provided to UoM by PTV Group. This model-based approach
provides a fair and clear context for comparison with DLA.


Introduction

Traffic control and management have traditionally been dependent on traffic models. The existing models
date back to the 1970s and suffer from several shortcomings. For example, they are largely made on the
back of abstract theories (such as traffic flow theory, traffic assignment etc.) that are not able to accurately
reflect reality. Nevertheless, in absence of other alternatives, they have been applied in traffic management
practice for many years. These models rely on large datasets such as traffic counts, roadside interviews, etc.

In recent years, this domain has changed on three fronts:

• Big Data: The area of Intelligent Transport Systems (ITS) is witnessing a new era of Big Data, sensor
revolution, cameras, lidar, radar, Bluetooth, GPS, mobile phones, etc. which provide a wealth of real-
time traffic and traveller data. Such data were not available in the past.

• Computation power: The early traffic models were developed when computers were far less
advanced in their capabilities. Today, we are witnessing high computation power, quantum
computing and cloud computing that provide enormous capability to collect and process data,
including computation “on the spot” (edge computing).

• Artificial Intelligence (AI): Advances in AI and machine learning (ML) in general, and recent progress in
deep learning, have provided a boost in data processing which was not imaginable even a decade
ago.

The coincidence of the above-mentioned advances has sparked a great opportunity for advanced traffic
control and management, with improvements in efficiency, affordability and safety, by moving from
model-based approaches to data-driven ones.

An increasingly important part of traffic control and management is prediction: knowing what is going to
happen in the future (in the next 5 min, 15 min, 1 hr, 2 hr etc.). Predicting traffic flow (or traffic volume) is
arguably the most important information for traffic control schemes, city logistics management, and transit
management as well as providing crucial information for travellers (e.g., how long will it take to travel from
point to point?).

To this end, PeakHour Urban Technologies has developed a suite of Deep Learning Algorithms (DLA) to predict
traffic volume and speed with high precision and for a prolonged period (prediction up to 2 hrs). The DLA
works on available speed/volume data and learns from the hidden patterns to predict for the next 2 hours,
with updates every 15 min.

The DLA has been comprehensively tested by the University of Melbourne Transport Technologies group on
a real dataset obtained in Melbourne, and its effectiveness is compared with that of a model-based method.


Terminology

Model-based approach

The UoM set up and calibrated a real-time traffic simulation with short-term forecast capability as a baseline
model-based method.

The study area is shown in Figure 1, bordering the University of Melbourne campus and the area
immediately to the east, bounded by Alexandra Parade to the north, Royal Parade to the west, Victoria
Street to the south and Hoddle Street to the east.

The model-based method forecasts traffic conditions by considering a model-based demand forecasting
method utilising real-time data. To this end, traffic flow and signal plans of SCATS sites within the study area
have been connected to the model through a compatible interface with a SCATS ITS-port. The model
includes 2966 links (segments) and 87 count locations under which 197 SCATS detectors are embedded.

Figure 1: Study Area


Forecast Quality, The GEH

As mentioned before, the GEH indicator was used in this measurement to check the quality of different
forecast intervals. The inputs of GEH (Equation 1) are the predicted flow and the actual traffic volume at
the road level.

$$\mathrm{GEH} = \sqrt{\frac{2(M - C)^{2}}{M + C}}$$

Equation 1

In which:
• M: hourly traffic flow predicted for each forecast interval (15, 30, 45, and 60 minutes)
• C: hourly actual traffic flow detected corresponding to the forecast interval

The GEH Statistic is a formula used in traffic engineering, traffic forecasting, and traffic modelling to compare
two sets of traffic volumes. Although its mathematical form resembles that of a chi-squared test, it is not a
true statistical test. Rather, it is an empirical formula that has proven useful for a variety of traffic analysis
purposes. Using the GEH Statistic avoids some pitfalls that occur when using simple percentages to compare
two sets of volumes. This is because the traffic volumes in real-world transportation systems vary over a
wide range. For example, the mainline of a freeway/motorway might carry 2000 vehicles per hour, while one
of the on-ramps leading to the freeway might carry only 50 vehicles per hour. The GEH statistic reduces this
problem; because the GEH statistic is non-linear, a single acceptance threshold based on GEH can be used
over a wide range of traffic volumes.
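
As a minimal sketch of why a single GEH threshold works across very different volumes, the snippet below applies the standard GEH formula to the freeway and on-ramp example above. The function name `geh`, the zero-flow handling, and the 10% over-prediction figures are illustrative assumptions and are not taken from the study data.

```python
import math


def geh(m: float, c: float) -> float:
    """GEH statistic for a predicted hourly flow m and an observed hourly flow c."""
    if m < 0 or c < 0:
        raise ValueError("flows must be non-negative")
    if m + c == 0:
        return 0.0  # assumption: two zero flows are treated as a perfect match
    return math.sqrt(2.0 * (m - c) ** 2 / (m + c))


# Illustrative values only: the same 10% over-prediction on two very different roads.
freeway = geh(2200, 2000)  # mainline carrying about 2000 vehicles per hour
on_ramp = geh(55, 50)      # on-ramp carrying about 50 vehicles per hour

print(f"freeway GEH = {freeway:.2f}")
print(f"on-ramp GEH = {on_ramp:.2f}")
```

The same relative error produces a much larger GEH on the busy mainline than on the lightly used ramp, which is the non-linear behaviour that a plain percentage comparison cannot capture.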

Acceptable standards of the GEH rate are shown in Figure 2 in which anything below 5 indicates a good and
trustworthy prediction. A good model should have a GEH of 5 or below for at least 85% of all roads.

GEH less than 5           Acceptable fit
GEH between 5 and 10      Caution: Possible model error
GEH larger than 10        Unacceptable

Figure 2 Standard rates of GEH
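
The bands in Figure 2 and the 85% rule can be sketched as follows. This is an illustration under stated assumptions: the helper names (`geh_band`, `share_below`, `meets_criterion`) are hypothetical, a GEH of exactly 5 is placed in the caution band, and the (predicted, observed) pairs are made-up numbers rather than study results.

```python
import math


def geh(m, c):
    """GEH statistic for predicted hourly flow m and observed hourly flow c."""
    if m + c == 0:
        return 0.0
    return math.sqrt(2.0 * (m - c) ** 2 / (m + c))


def geh_band(value):
    """Map a GEH value to the bands of Figure 2 (boundary handling is an assumption)."""
    if value < 5:
        return "Acceptable fit"
    if value <= 10:
        return "Caution: possible model error"
    return "Unacceptable"


def share_below(values, threshold):
    """Percentage of links whose GEH is strictly below the threshold."""
    values = list(values)
    if not values:
        raise ValueError("at least one GEH value is required")
    return 100.0 * sum(v < threshold for v in values) / len(values)


def meets_criterion(values, threshold=5, min_share=85):
    """True if at least min_share percent of links have a GEH below the threshold."""
    return share_below(values, threshold) >= min_share


# Made-up (predicted, observed) hourly flows for five links.
pairs = [(1000, 950), (480, 520), (300, 420), (60, 55), (1500, 1100)]
scores = [geh(m, c) for m, c in pairs]

for (m, c), s in zip(pairs, scores):
    print(f"M={m:5d} C={c:5d} GEH={s:5.2f} -> {geh_band(s)}")
print(f"share below 5: {share_below(scores, 5):.0f}%")
print(f"meets 85% criterion: {meets_criterion(scores)}")
```

The same `share_below` helper can be called with other thresholds when a different acceptance target is used.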

Numerical comparison

Model-based

The target GEH used by the model-based method applied to the study area (Figure 1) for real-time prediction
is as follows:

• 15-minute forecast
  o GEH < 12 for at least 70% of all links with valid count data
• 30–60-minute forecast
  o GEH < 15 for at least 70% of all links with valid count data

Such lenient targets reflect both the difficulty of the prediction task and the limited capability of existing
model-based methods.

A summary of the model-based calculation aggregated over peak hours is available in Table 1 and the
aggregation for whole day is in Table 2.

As can be seen, the results are not promising and the prediction is myopic in that, as the horizon of the
prediction exceeds 15 min, the prediction deteriorates significantly.

25/6/2020, peak hours (AM peak: 6:00-9:00 & PM peak: 17:00-19:00)

Future Forecast Time (minutes)   AVG GEH   Percentage of links with GEH less than "X"   X
15                               4         95                                           12
30                               7         90                                           15
45                               13        67                                           15
60                               13        67                                           15

Table 1: GEH Values for Different Forecast Intervals during Peak Hours

25/6/2020, whole day

Future Forecast Time (minutes)   AVG GEH   Percentage of links with GEH less than "X"   X
15                               3         97                                           12
30                               5         95                                           15
45                               10        79                                           15
60                               10        79                                           15

Table 2: GEH Values for Different Forecast Intervals for Whole Day Aggregation


It is worth noting that, even for the 15 min prediction (which is far less complicated than the 60 min
prediction), the average GEH during peak hours is around 4, and about 5% of roads still have a GEH of 12 or
above (Table 1).

The Deep Learning Algorithms (DLA)

The DLA only needs traffic volume and speed data that is readily available almost everywhere. These two
types of datasets are usually considered minimum traffic data.

Figure 3 shows the location of the roads used in the DLA, covering an area much bigger than the area used
for the model-based evaluation. Covering 2215 km, 18461 road segments and 4300 signalised intersections,
this area provides more challenges to the DLA as it is significantly larger and demonstrates much more
diverse and non-homogeneous traffic patterns. Table 3 reports the average hourly traffic volume and speed
for a typical weekday during the morning peak hour. As can be seen, there is a balanced blend of highly
congested and less-congested roads scattered in and around the city, providing a fair basis for evaluation
and comparisons. This dataset includes 41 intersections, 43 road segments, and 127 SCATS loop detectors.

Table 3. Morning peak average volume and speed (per 15-minute interval)

Link Name                 Average Hourly Volume   Average Speed (km/h)
kings-york                3760                    26
alexandra-nicholson       2983                    19
hoddle-johnston           2945                    18
hoddle-victoria           2744                    11
flemington-racecourse     2719                    14
springvale-waverley       2226                    21
canterbury-springvale     2115                    20
sydney-boundary           1719                    24
ferrars-park-             1667                    27
victoria-smith            1637                    22
warrigal-north            1595                    12
burwood-warrigal          1491                    13
king-dudley               1465                    10
stgeorges-merri           1370                    14
bell-stgeorges            1341                    10
punt-toorak               1324                    19
pascoevale-victoria       1285                    57
olympic-batman            1210                    14
sydney-gaffney            1189                    11
bell-sydney               1148                    17
pascoevale-loeman         1135                    45
victoria-elizabeth        1056                    14
mtalexander-ormond        1007                    20
pascoevale-fletcher       985                     28
macaulay-dryburgh         983                     23
ingles-normanby           973                     14
rathdowne-elgin           955                     20
epsom-smithfield          946                     17
nicholson-moreland        905                     12
sydney-albion             886                     19
melville-victoria         863                     31
exhibition-lonsdale       857                     9
city-clarendon            825                     12
gilbert-murray            807                     20
high-westgarth            796                     11
melville-moreland         792                     22
johnston-wellington       737                     18
high-arthurton            663                     16
harbour-latrobe           641                     17


The DLA is designed to predict both volume and speed every 15 minutes up to 2 hours (although the DLA has
shown promising results on longer prediction periods of 3 hrs or more) for both weekdays and weekends on
a 24x7 basis.

Table 4 provides a snapshot of the results. As can be seen, the average GEH from 15 min to 120 min is
around 4, which is significantly better than that of the model-based method (where for the 60 min prediction
it could go above 17). This is a significant achievement compared to classical transport models.

Moreover, the DLA provides great results on a 24x7 basis (peak hour, off-peak hour, weekday and weekend).
The DLA is an online model: (i) it can learn and adapt from the latest live data, and (ii) it can render results
in a matter of seconds, representing a true real-time application.

Figure 3 Location of roads used in the DLA


Table 4 DLA results, average of 15 min to 120 min

                            Weekday                                           Weekend
Model                       Model Type      Learning Mode        Error        Model Type   Learning Mode       Error
Volume to predict Volume    Peak/Off-peak   Online² (per road)   1.83 (GEH)   24-hour      Online (per road)   1.63 (GEH)
Speed to predict Volume     Peak/Off-peak   Online (per road)    2.2 (GEH)    24-hour      Online (per road)   1.9 (GEH)
Speed to predict Speed      24-hour         Online (per road)    3.7%         24-hour      Online (per road)   3.5%

Moreover, data availability and the ability to handle various scenarios of available data are also important
factors for prediction, and the DLA can handle such scenarios. Table 4 shows the results of applying the DLA
in different data availability scenarios.

Lastly, it is important to note that the DLA is extremely data-frugal. We started training (or calibrating)
the DLA using only one day of data and kept recording the GEH up to 28 days. The weekday results are
shown in Figure 4. As can be seen, the average GEH on the first day (after collecting only one day of data) was
around 7, still significantly better than that of the model-based method. Moreover, as the figure suggests,
with only five days of data the DLA can reach its optimal performance. Figure 5 shows the corresponding
weekend results, in which the GEH on the first day of training stands at 4.5 and decreases to 3.0 at the end
of 12 days' training.

2 Daily update

3 A 14-day window is used to obtain enough weekend data (i.e., 4 weekend days)


Figure 4 Average GEH and speed accuracy across all roads, weekday

Figure 5 Average GEH and speed accuracy across all roads, weekend


To take a closer look at the level of individual roads, two roads (Kings-York street and Sydney-Albion street)
with high and medium traffic flow, respectively, were singled out and analysed (see Figure 6 and
Figure 7 for their locations). Prediction results for the DLA operating on these two roads for one typical
weekday, over a period of 24 hours, are shown in Figure 8 and Figure 9, in which both the actual volume and
the predicted volume for the 15 min up to the 120 min prediction are illustrated. These figures show
how accurate the DLA's predictions are, as the prediction (green line) closely follows the actual volume
(orange line), even for the 120 min ahead prediction.

Figure 6 King Street to York Street location

Figure 7 Sydney Street to Albion Street location


Figure 8 Prediction vs actual traffic volume, King Street to York Street


Figure 9 Prediction vs actual traffic volume, Sydney Street to Albion Street


Long-term Prediction

The long-term model aims to predict over a prolonged interval in the future, say 60 days ahead, with high
accuracy. It is a new provision that mainly takes historical data into account when predicting the future. It
differs from the short-term prediction (120 min), which is highly dependent on the live readings of the data.

Table 5 shows the 60-day prediction accuracy of the models from the model training date (Date column in
the table) at 10 sample locations. The location names are as follows: City-Clarendon (CC), Exhibition-Lonsdale
(EL), Harbour-Latrobe (HL), Hoddle-Victoria (HV), Lonsdale-William (LW), Melville-Moreland (MM), Pascoe
Vale-Victoria (PV), Springvale-Waverley (SW), Sydney-Gaffney (SG) and Victoria-Elizabeth (VE). For most
locations the model achieves high accuracy if no extreme events happen.

Table 5. Average GEH of the long-term predictions.

Date CC EL HL HV LW MM PV SW SG VE

2019-05-01 2.61 2.59 2.58 4.71 2.82 1.99 2.22 2.63 2.59 2.92
2019-06-01 2.43 2.42 2.58 4.98 2.92 1.97 2.11 2.51 2.18 2.87
2019-07-01 2.28 2.48 2.39 5.40 2.92 1.86 2.01 2.57 2.06 2.73
2019-08-01 2.40 3.02 2.37 5.18 2.69 1.91 2.18 2.61 2.18 2.66
2019-09-01 2.51 5.26 2.47 4.19 2.49 1.98 2.12 2.48 2.24 2.75
2019-10-01 2.68 6.44 2.54 3.83 3.21 1.99 2.20 2.67 2.23 2.82

Figure 10 and Figure 11 illustrate the 60-day predictions from 02-09-2019 to 31-10-2019 for Springvale road
and Lonsdale street, respectively.

Figure 10. Long-term prediction for Springvale road near Waverley road


Figure 11. 60 days prediction for Lonsdale street near William street

Visualising the Similarity of Daily Volume Patterns

We also developed a way to visualize the similarity of daily volumes. In Figure 12, similar daily traffic volume
patterns are plotted close to each other. Interestingly, the traffic volume patterns follow the order of the
days: they appear to run from left to right (Monday to Sunday). All weekdays form a cluster, while Saturday
and Sunday have their own cluster.

This could be considered an interesting, easy-to-use and easy-to-understand tool to measure and detect
different patterns in the traffic at different locations, different times of day, different days of the week and
different months of the year (e.g., school seasons, shopping seasons, etc.).

Figure 12. Similarity of daily volume patterns in 2-dimensional latent space


Conclusion

The DLA, as a data-driven method, has shown significant superiority over classical model-based alternatives
in many aspects. The DLA has shown itself to be the beginning of a paradigm shift from classical transport
models to data-driven models, sparking new analytical opportunities. For example, the DLA prediction can be
combined with a data-driven traffic-signal-setting algorithm to provide smarter signal control addressing
mobility issues as well as safety concerns.

In contrast, classical model-based approaches have been revealed as data-intensive methods
which are not appropriate for reliable and accurate traffic predictions. In general, the shortcomings of the
model-based approach can be summarised as follows:


• There needs to be a traffic model to start with, which itself is a restricting factor, in the sense that some
cities may not have such models. In addition, the reliability of such models is questionable, even though they
usually are the result of a rigorous, expensive and time-consuming data collation process and calibration
activity.

• They need a diverse amount of data that tends to be expensive to collect and maintain.

• Setup and calibration require significant effort. There are several functions and parameters to be
calibrated on a continuous basis, which is an expensive and difficult task.

• These models are basically reactive; they collect and process data to react upon. They hardly see
latent emerging trends in the data.

• Such complicated and data-intensive methods with myriads of equations and parameters require
regular maintenance to keep pace with the changes in the data. There is no sense of live-learning.

• The level of complexity associated with such models requires special skills and trained operators.

• Such models are usually customised to a very specific case study. For example, a model developed for
Melbourne is not transferable to, or valid in, Sydney.

• These models are usually customised for a specific time of day or day of the week (for example, peak
hour on a weekday).

On the other hand, the DLA can address the above-mentioned shortcomings and bring additional capabilities
tailored to the emerging needs of smart traffic control schemes. These benefits can be summarised as follows:

• The traffic flow prediction performance of the DLA exhibits high accuracy, and the length of prediction
is noteworthy. The DLA can predict with a very low GEH of 4 (high precision) up to 3 hrs ahead of time.

• The DLA can work with bare minimum data (traffic volume and speed) which are readily available.
Hence, the DLA can work on the existing datasets and doesn’t need any additional sources of data.

• The DLA can work with live (real-time) data and provide live prediction (online application). At the
same time, the DLA can work with historical data to provide offline prediction. The latter (offline
application) can be applied for planning purposes which is a task currently reserved for macro models.

• Running the DLA is fast, so the DLA can be deployed at large scale; it has already been applied to
Melbourne's road network covering more than 20,000 roads. Moreover, fast computation at such a
large scale does not need special hardware.

• The DLA has already been uploaded for cloud computing, which streamlines operation of the system.

• Establishment of the DLA is fast, easy and cost-effective. The DLA can be deployed to a new area (city,
county, town, council etc.) in a matter of a few days, a task that may take months for classical model-
based methods.


• The DLA has an embedded self-learning feature; that is, the DLA can see gradual shifts or changes in
the data and learn from them. Hence, there is no need for regular maintenance (which is costly for
model-based approaches), and the DLA maintains itself.

• The DLA will be a prerequisite for future smart traffic control schemes, such as smart traffic signal
controls, where accurate predictions for a prolonged period are needed.

• The DLA can accommodate features, such as weather conditions and special events, which are not
available in classical model-based methods.

• The DLA has been developed entirely on open-source software packages, which come at no
software cost (unlike model-based alternatives, which run on commercial software packages with
associated license fees).

• The DLA's calibration has been automated, and there is no need for any skilled manpower. Furthermore,
running the DLA is easy and straightforward, and all the operations are controlled and enacted through
a simple user-friendly dashboard. Though the DLA itself is highly sophisticated and intelligent,
operating it does not need any special skill or training.


Appendix A

Deep Learning Algorithm: Deep Learning is the next generation of Machine Learning. It is currently a subset of
Machine Learning (Figure 13) and it can make its own predictions entirely independent of humans, whereas in
machine learning there is still a need for some level of human intervention in many cases to arrive at the
optimal outcome. Deep Learning models use artificial neural networks. The design of such networks is inspired
by the biological neural network of the human brain. They analyse data with a logical structure influenced by
the manner in which a human would draw conclusions.

The adjective "deep" in deep learning refers to the use of multiple layers in the network. Deep learning is a
modern variation of AI which is concerned with an unbounded number of layers of bounded size, and which
permits practical application and optimized implementation while retaining theoretical universality under
mild conditions. In deep learning, the layers are also permitted to be heterogeneous and to deviate widely
from biologically informed connectionist models. This is allowed for the sake of efficiency, trainability and
understandability and leads to the "structured" aspect of many descriptions of deep learning.

Figure 13 Deep learning in the context of AI and ML


The most important strength of the DLA is feature extraction: the extraction of latent patterns in a large
amount of data (Figure 14). This is not available in classical theories and models.

Figure 14 Deep learning for feature extraction and prediction


The DLA requires a time series of live data (for example, traffic volume and speed), which may be collected
from SCATS loop detectors. Although the SCATS port provides instant data every second, the DLA can also
work with aggregated data (every 15 minutes). The main data component is the traffic volume, although
additional data sources such as speed can be easily incorporated (as an additional data layer) to achieve
better precision. Figure 15 illustrates the way the DLA works, consisting of four main steps.

[Figure 15 flowchart: (1) Pre-processing data: live data cleaning; (2) DLA's architecture design with an
optimization-ML method; (3) DLA's training; (4) Applying DLA, GEH]

Figure 15 DLA methodology
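
The aggregation of per-second detector output into 15-minute intervals mentioned above can be sketched as follows. This is only an illustration: the `(timestamp_seconds, vehicle_count)` input layout, the function names and the synthetic readings are assumptions, not the SCATS data format, and scaling a 15-minute count to an hourly-equivalent flow is shown only because GEH is defined on hourly flows.

```python
from collections import defaultdict

BIN_SECONDS = 15 * 60  # 15-minute aggregation interval


def aggregate_counts(readings, bin_seconds=BIN_SECONDS):
    """Sum (timestamp_seconds, vehicle_count) readings into fixed-width bins.

    Returns a dict mapping the start second of each bin to its total count.
    """
    bins = defaultdict(int)
    for timestamp, count in readings:
        bin_start = (timestamp // bin_seconds) * bin_seconds
        bins[bin_start] += count
    return dict(sorted(bins.items()))


def hourly_equivalent(count, bin_seconds=BIN_SECONDS):
    """Scale a count observed over one bin to vehicles per hour."""
    return count * 3600 / bin_seconds


# Synthetic per-second readings: one vehicle every 4 seconds for 30 minutes.
readings = [(t, 1 if t % 4 == 0 else 0) for t in range(0, 1800)]
bins = aggregate_counts(readings)

for start, count in bins.items():
    print(f"bin starting at {start:4d} s: {count} vehicles, "
          f"{hourly_equivalent(count):.0f} veh/h equivalent")
```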


The four steps of the DLA may be summarised as follows:

Step 1: The data are first pre-processed to address several known issues with large, real-time datasets,
such as sporadic missing or faulty data. In this step, a data cleaning algorithm (derived using a machine
learning technique) is run over the data to locate and rectify such issues. The aim of this step is to make sense
of the data before passing them on to the DLA. No matter how good or smart the DLA is, the outcome will be
unreliable if the input is incorrect. The data cleaning process must keep pace with the live data and is done
instantly, in real time.
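
The report does not describe the cleaning algorithm itself, so the sketch below uses plain linear interpolation as a simple stand-in to show what locating and rectifying missing or faulty readings can look like. The function name `clean_series`, the `max_valid` plausibility bound and the sample values are all hypothetical.

```python
def clean_series(values, max_valid=None):
    """Replace missing (None) or implausible readings by linear interpolation.

    A reading is treated as faulty if it is negative or above max_valid.
    Gaps at either end are filled with the nearest valid reading.
    """
    def is_valid(v):
        return v is not None and v >= 0 and (max_valid is None or v <= max_valid)

    cleaned = [float(v) if is_valid(v) else None for v in values]
    valid_idx = [i for i, v in enumerate(cleaned) if v is not None]
    if not valid_idx:
        raise ValueError("no valid readings to interpolate from")

    for i, v in enumerate(cleaned):
        if v is not None:
            continue
        # Nearest originally valid neighbours on each side of the gap.
        prev = max((j for j in valid_idx if j < i), default=None)
        nxt = min((j for j in valid_idx if j > i), default=None)
        if prev is None:
            cleaned[i] = cleaned[nxt]
        elif nxt is None:
            cleaned[i] = cleaned[prev]
        else:
            weight = (i - prev) / (nxt - prev)
            cleaned[i] = cleaned[prev] + weight * (cleaned[nxt] - cleaned[prev])
    return cleaned


# Hypothetical 15-minute volumes with a gap, a negative value and a spike.
raw = [120, None, 140, -5, 9999, 170, None]
print(clean_series(raw, max_valid=3000))
```

A production cleaner would also have to run incrementally on the live feed; this batch version only illustrates the rectification idea.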

Step 2: Given a subject road with live and cleaned data, the second step is to set up or design the architecture
of the DLA, including the identification of the essential characteristics of the DLA such as number of layers,
activation functions, etc. A supervised machine learning algorithm combined with a non-linear optimization
method has been developed to test different architecture designs and settle on a preferred version. Early
numerical analysis showed that the outcome (in terms of GEH) is highly dependent on the design of the DLA’s
architecture.

Step 3: The third step is to train the DLA using live and cleaned data. This process starts with the arrival of the
first batch of data (collected a day before), and the training is triggered at 2400 hours on a nightly basis.
Daily collected data is added to the stack of already-collected data for the DLA's training purposes.

Step 4: The fourth step is to apply the DLA to the live data and to provide predictions for 2 hours ahead, at a
resolution of 15 minutes.
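
The operating cycle of Steps 3 and 4 (a growing stack of daily data, nightly retraining, then a forecast of eight 15-minute steps) can be sketched as below. The predictor here is deliberately NOT the DLA: it is a hypothetical time-of-day average used only so that the loop is runnable, and the class name, constants and synthetic days are assumptions.

```python
from statistics import mean

INTERVALS_PER_DAY = 96  # 24 hours at 15-minute resolution
HORIZON_STEPS = 8       # 8 x 15 minutes = 2 hours ahead


class HistoricalAverageModel:
    """Stand-in predictor (not the DLA): mean volume per time-of-day slot."""

    def __init__(self):
        self.history = []    # growing stack of daily volume profiles
        self.slot_mean = {}  # fitted mean volume for each 15-minute slot

    def add_day(self, daily_volumes):
        """Append one day of cleaned 15-minute volumes to the stack."""
        if len(daily_volumes) != INTERVALS_PER_DAY:
            raise ValueError("expected 96 fifteen-minute values")
        self.history.append(list(daily_volumes))

    def retrain(self):
        """Nightly step: refit on all days collected so far."""
        self.slot_mean = {
            slot: mean(day[slot] for day in self.history)
            for slot in range(INTERVALS_PER_DAY)
        }

    def predict(self, current_slot):
        """Volumes for the next 8 slots, i.e. 15 to 120 minutes ahead."""
        return [
            self.slot_mean[(current_slot + step) % INTERVALS_PER_DAY]
            for step in range(1, HORIZON_STEPS + 1)
        ]


# Two synthetic days; each night the new day is stacked and the model is refit.
model = HistoricalAverageModel()
for offset in (100, 110):
    model.add_day([offset + slot for slot in range(INTERVALS_PER_DAY)])
    model.retrain()

forecast = model.predict(current_slot=32)  # slot 32 starts at 08:00
for step, volume in enumerate(forecast, start=1):
    print(f"+{15 * step:3d} min: {volume}")
```

Swapping the stand-in for a trained network would leave the surrounding cycle unchanged, which is the point of the sketch.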
