Dissertation-Time at Door
Konstantinos-Michail Mylonas
September 7, 2017
Contents

Part I Abstract
Part II Introduction
1 Purpose
2 Background
3 Acknowledgements
4 Literature Overview
Part III Methodology
5 Pre-Processing
5.1 Data
7 Statistical Modelling
7.4.1 Chi-Square
7.5.2 Ensemble Methods
Part IV Results
7.7 Recommendation
Part V Discussion
List of Figures

Figure 1: The graph depicts a clear pattern in early product codes; as the weight of products increases in later product types, there is a break from the pattern.
Figure 4: The graph shows that as the number of individual pieces increases, so does the delivery time.
Figure 5: The plot shows no particular difference in delivery time across the different time slots.
Figure 6: The graph shows the mean delivery time against the number of orders; as the number of orders increases, the fewer the chances are of having available data, while most single orders concern larger items.
Figure 8: The plot shows that the predicted series follows the actual line, indicating that in most cases the two values are close; the occasional large estimated delivery times capture the attention.
Figure 9: The plot shows that the predicted series follows the actual line, indicating that in most cases the two values are close; the occasional large estimated delivery times capture the attention.
Figure 10: The plot shows that, even though in most cases the delivery time is predicted close to the actual value, there is a systematic pattern of inflating the estimated delivery time.
Figure 11: The plot shows that the predicted series follows the actual line, indicating that in most cases the two values are close; the occasional large estimated delivery times capture the attention.
Figure 12: The plots show that collection item and product type play the most significant part in estimating delivery duration.
Figure 13: The plot shows that the predicted series follows the actual line, indicating that in most cases the two values are close; the occasional large estimated delivery times capture the attention.
Figure 14: The graph shows that those participants who dropped the survey had higher
List of Tables

Table 2: The largest proportion of the information was excluded from the final data set; other covariates were transformed to facilitate their use.
Part I
Abstract
This article considers the application of machine learning algorithms to optimising routing schedules in the delivery industry by tackling a fundamental problem: accurately predicting the time needed to deliver an order. Using the current operational research algorithms, delivery time was predicted in an ad hoc manner, without utilising past data on the process. By contrast, the present research utilises past data in the estimation of delivery time in an attempt to ensure a better customer experience. In this report, the results from several machine learning algorithms are compared. Although initial results meet the project's goal of predicting 60% of orders within a three-minute time window and with reasonable error metrics, hardware limitations constrain the analysis.
Part II
Introduction
Operational research is a science dedicated to applying advanced analytical methods to support better decisions in real-world problems. In particular, one of the main areas of concern in operational research is route optimisation, a family of problems which attempt to find the optimal set of routes for a fleet of vehicles to traverse in order to deliver to a given set of customers. Its origins lie in the 19th century, when Hamilton first mathematically formulated the problem. However, like all branches of operational research, it came into prominence during World War 2 and the subsequent decades. At that time, experts relied on advanced mathematical tools such as combinatorics and integer programming to solve their problems. However, as time moved forward, further tools either emerged or found an application to this particular set of problems. Most recently, efforts have been made to apply data science methods to route planning. In more detail, a key parameter in this problem is the time delivery services need to finish a delivery. A novel approach uses past data on orders to estimate the time which delivery experts spend with the client, applying machine learning algorithms to find patterns between delivery time and available covariates such as item characteristics and the spatial information hidden in the delivery address.
1 Purpose
The purpose of this project is to build predictive models that estimate delivery time and compare the developed models to prescribe the best course of action for estimating time at door.
2 Background
In today's world, globalisation has succeeded in bringing the world together through trade. However, a far less known aspect of it is the struggle to move products in the most efficient manner. Indeed, as the volume of products increases, it becomes more vital to rely on more scientific ways to optimise route planning, avoiding delays in the execution plan or missed opportunities to deliver items. In this particular context, a retail company sought the aid of the data science company Satalia, providing data from every stage of the delivery process to estimate the time spent at the door.
3 Acknowledgements
I would like to thank all those whose selfless time and care were sometimes all that helped me endure, Mrs Vega and Satalia for trusting me with this project and providing ample guidance, and Dr Simon Tomlinson for bringing all these opportunities to Lancaster University. Finally, a thank you to all my professors for instilling in me the pursuit of excellence under the most dire circumstances.
4 Literature Overview
The travelling salesman problem (TSP) is a great chapter in operational research, asking the following question: given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city? Expressed more formally, given a length L, the task is to decide whether the graph has any tour shorter than L. The problem arose from the business routes of travelling merchants in Europe in the first half of the 19th century and was studied in related forms by W.R. Hamilton and Thomas Kirkman; the general problem was formulated in 1930, when optimisation problems began to be studied intensively. However, it was as late as the 1950s before another touchstone was reached with the seminal paper of George Dantzig, Delbert Ray Fulkerson and Selmer M. Johnson [1], expressing the problem as an integer linear program and developing the cutting-plane method for its solution using an example of 49 cities. Later, Christofides [2] developed a worst-case approximation algorithm in 1976. Owing to the speed and simplicity of the algorithm, many hoped it would pave the way to a near-optimal solution method. In the 1990s, Applegate, Bixby, Chvátal and Cook [3] made substantial computational progress, and most recently Cook and others computed an optimal tour through an 85,900-city instance given by a microchip layout problem.
Part III
Methodology
5 Pre-Processing
5.1 Data
The data was provided by Satalia and consists of four different data sets which contain information for tracking both orders and items at every stage of the delivery procedure. Firstly, the DFS DEL HIST data set holds information about the delivery of products from the delivery branches to their final destination. In particular, this data set bears the branch id, the location of the delivery branch and the expected date and time of delivery, along with the route and the van id through which the products are transported to the branch. Secondly, the DFS DEL DET data set gives a description of the characteristics of the products to be delivered. More specifically, DFS DEL DET contains information such as the item's weight, volume, number of individual pieces, product category and a brief description. Moreover, it includes a notice of whether the delivery expert should expect a payment or an item on loan from the customer. The aforementioned data sets were merged using the inner join command of the dplyr package. However, working in R, it was deemed necessary to reduce the dimensions of both DFS DEL HIST and DFS DEL DET, due to memory restrictions and to the irrelevance of many columns to the given task. Thus, many covariates were dropped, giving rise to the DFS DEL data set. A description of the resulting data set is given below.

Collection Item | Logical | A binary variable indicating whether there is an item on loan that needs to be retrieved

Table 2: The largest proportion of the information was excluded from the final data set. Other covariates needed to be transformed to facilitate their use by machine learning algorithms.

Indeed, both Collection Item and the delivery address postcode were transformed: the former to a binary variable, and the latter to use only its first three digits, to facilitate clustering amongst observations. Moreover, it was deemed appropriate to use the variables Individual Pieces and Product Type as factors rather than integers, due to the special meaning of the numeric values, which are associated with categories.
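The join and the transformations described above can be sketched as follows. The analysis itself was carried out in R with dplyr's `inner_join`; the snippet below shows the equivalent steps in Python/pandas on hypothetical miniature tables, with all column names and values illustrative rather than taken from the real data sets.

```python
import pandas as pd

# Hypothetical miniature versions of DFS DEL HIST and DFS DEL DET;
# column names and values are illustrative, not the originals.
dfs_del_hist = pd.DataFrame({
    "order_id": [1, 2, 3],
    "branch_id": [10, 11, 10],
    "postcode": ["LA1 4YW", "LA2 0PD", "M1 1AE"],
})
dfs_del_det = pd.DataFrame({
    "order_id": [1, 2, 4],
    "weight_kg": [12.5, 3.0, 40.0],
    "collection_item": [True, False, True],
})

# Inner join on the shared order id, mirroring dplyr's inner_join:
# only orders present in both tables survive.
dfs_del = pd.merge(dfs_del_hist, dfs_del_det, on="order_id", how="inner")

# Keep only the first block of the postcode to cluster nearby addresses.
dfs_del["postcode_prefix"] = dfs_del["postcode"].str.split().str[0]

# Treat the loan-collection flag as a categorical (factor-like) variable.
dfs_del["collection_item"] = dfs_del["collection_item"].astype("category")

print(dfs_del[["order_id", "postcode_prefix"]])
```

The inner join keeps only orders 1 and 2 here, since order 3 has no item details and order 4 no delivery history.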
Turning to the third data set, ApoloOrders holds data about the deliveries, such as the order id; the time of arrival and departure; the location of the delivery destination, given both as a postcode address and as the longitude and latitude of the location; the date, time and slot in which the delivery took place; and the time spent with the client. In this data set a key piece of information is found: the timeAtDoor covariate. Similarly, it was decided to join the data sets using the order id as the primary key, leaving only the selected features in the final data set. Therefore, the aforementioned features were merged with DFS DEL to compose the final data set. Below, a short description of the resulting data set is given.

Collection Item | Logical | A binary variable indicating whether there is an item on loan that needs to be retrieved

Table 4: A lot of key information was found in AppoloOrderDetails, enriching the data.

By enriching the data, the addition of more advanced modelling techniques, such as spatial models and Gaussian processes, becomes feasible. Finally, HistMasterDrops illustrates the itineraries of each delivery vehicle, such as the number and location of each stop along the route and information pertaining to the proportion of time spent actively during the journey. Although it was not considered necessary to merge this data set with the previous ones, it was asked to endeavour to find a smart way of joining the two data sets. In the absence of common columns that could serve as a primary key, it was decided to use the postcode and date columns to join them.
5.2 Feature Engineering
Having obtained the final data set, it was decided that extra features should be engineered, firstly to further reduce the dimensions of the data and secondly to facilitate their inclusion in the models. Thus, the attribute TimDiff was created, defined as the time interval between arrivalTime and departureTime. It is believed that this conversion helps implement time-interval information in the model, something that would be infeasible employing the original columns, and facilitates the assessment of the performance of the models. Finally, it was decided to create a column with the first three or four digits of the postcode, in an attempt to cluster together observations lying close to each other.
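The two engineered features can be sketched as below. The project used R, so this Python/pandas snippet is only an illustration of the logic; the timestamps, postcodes and the exact column names `arrivalTime`/`departureTime` are assumed from the text.

```python
import pandas as pd

# Illustrative arrival/departure timestamps and postcodes.
orders = pd.DataFrame({
    "arrivalTime":   pd.to_datetime(["2017-05-01 09:00:00",
                                     "2017-05-01 10:15:00"]),
    "departureTime": pd.to_datetime(["2017-05-01 09:12:30",
                                     "2017-05-01 10:18:00"]),
    "postcode": ["LA1 4YW", "M1 1AE"],
})

# TimDiff: the interval between arrival and departure, in minutes,
# so the model can use a single numeric column instead of two timestamps.
orders["TimDiff"] = (
    orders["departureTime"] - orders["arrivalTime"]
).dt.total_seconds() / 60.0

# First three characters of the compacted postcode, clustering
# observations that lie close to each other on the map.
orders["postcode_area"] = orders["postcode"].str.replace(" ", "").str[:3]

print(orders[["TimDiff", "postcode_area"]])
```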
The next stage of the analysis involved an exploratory analysis of the data. Although there is a plethora of covariates that intuitively influence the duration of a delivery, the analysis attempts to retrieve the key relationships in the data. In doing so, a number of graphical tools will be employed to disseminate information. Once the strength of the relationships has been established, the statistically important features will be used as covariates in the machine learning algorithms. Additionally, spatial-temporal correlations will be taken into account, due to the inclusion of models whose parameters involve correlations amongst observations in the data set. To begin with, one of the features that most characterises an item is its product code. Subsequently, the analysis revolved around how timeAtDoor differs with respect to the product category: if the category matters, then the mean delivery time of each product category will differ. Thus, to present tangible evidence, it was decided to plot the mean estimated delivery time with respect to product type.
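The quantity plotted in Figure 1 is a simple group mean. As a sketch (in Python/pandas rather than the R used in the project, and with made-up numbers), the computation behind the plot is:

```python
import pandas as pd

# Toy delivery records; product_type codes and times (minutes) are invented.
df = pd.DataFrame({
    "product_type": [1, 1, 2, 2, 3],
    "timeAtDoor":   [4.0, 6.0, 10.0, 14.0, 30.0],
})

# Mean delivery time per product category: the values behind Figure 1.
mean_per_type = df.groupby("product_type")["timeAtDoor"].mean()
print(mean_per_type)
```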
Scatter plot of Mean Time per Product Type

Figure 1: The graph depicts a clear pattern in early product codes; as the weight of products increases in later product types, there is a break from the pattern.

As expected, similar products need approximately the same amount of time to be delivered, as the graph depicts. More importantly, major deviations appear amongst clusters of product categories. Indeed, it is natural for categories 2 and 3 to take a significantly smaller amount of time to be delivered compared to products such as sofas or mirrors. However, displaying the mean of each category is not enough to paint the larger picture. On the contrary, a more thorough treatment of the data is needed to let them fully express themselves. Thus, a bar plot is used to obtain a better picture of how the delivery time is distributed within each product type.
Bar plot of Estimated Delivery Time per Product Type

Figure 2: The weight of items found in each product category increases as the product code increases. Additionally, the presence of outliers diminishes as the product type increases.

As can be observed, outliers are prevalent, especially in the earlier categories. However, it is of interest to discover the reason behind the differences in delivery time within product categories. At this point, the analysis turns to other covariates to explain the great divergence in delivery time. Initially, it is thought that, irrespective of the category items belong to, weight and the number of individual pieces play a significant part in the inflation of delivery time. This holds because heavier items intuitively require more time to be transported from the delivery vehicle's stopping location to the delivery address. Secondly, items comprised of a large number of individual pieces need to be assembled at the delivery location, devoting time which is added to the overall delivery time. However, this seems to be a more challenging task than it appeared to be, since delivery specialists often do not monitor the time, especially with items corresponding to certain categories. This results in having incomplete information about those categories.
Bar plot of Item Weights per Product Type

Figure 3: The weight of items found in each product category increases as the product code increases. Additionally, the presence of outliers diminishes as the product type increases.

The resulting plot seems to have imperfect information for certain product categories related to heavier items. This disquieting habit amongst delivery experts has its impact on the models, whose performance will be hampered by the limitations of the collected data. Turning to the individual pieces, a bar plot shows the distribution of delivery time with respect to the number of individual pieces.
Bar plot of Delivery Time per Number of Product's Individual Pieces

Figure 4: The graph shows that as the number of individual pieces increases, so does the delivery time.

In the first few categories the median is largely unchanged; however, there is an upward trend in the behaviour of the median. Conversely, as the product type changes, moving to larger item categories, delivery times are homogenised, causing the extinction of outliers. Turning to other factors that might influence the time spent with the client, the number of individual items is considered. Again, it seems that the number of individual items influences the time spent with the client, since a stable increase of time is observed as the number of pieces increases, with a steeper increase for products comprised of six items. Another observation that needs to be stressed is the existence of outlying observations.
Bar plot of Estimated Delivery Time with Respect to Item Collection

Figure 5: The plot shows no particular difference in delivery time across the different time slots.

This may result from traffic and stopping restrictions on the vehicle during rush hours, spurring the delivery vehicle to stop or search for a suitable parking space that does not fall under parking restrictions. However, on the whole, large deviations in time are not observed in the bar plot.
Bar plot of Mean Estimated Time at Door per Number of Orders

Figure 6: The graph shows the mean delivery time against the number of orders, sorted by date. As can be observed, as the number of orders increases, the fewer the chances are of having available data. On the contrary, it seems that most single orders concern larger items.
7 Statistical Modelling
To forecast the time at door (TAD), it was decided to employ Random Forests and deep learning neural networks, due to the methods' robustness and their capacity to model large-scale problems with complex relationships. Secondly, because the results of the exploratory data analysis corroborate that postcodes carry enough predictive power, multilevel modelling along with spatial-temporal models were considered as modelling options, using postcodes to cluster observations and to investigate the spatial pattern which they follow. Lastly, Gaussian processes were employed due to the existence of a latitude and longitude for each order. Below, the main utilised algorithms are described.
7.1 Multilevel Modelling
Multilevel models implement linear or generalised linear regression models on clustered data [4]. This is achieved by allowing the intercept and coefficients to vary for each cluster in the data. In particular, multilevel models are comprised of levels of hierarchies: in each level there is a regression model which gives rise to the next hierarchy, with the last layer being the level where the observations lie. To achieve this, the model's coefficients contain fixed and random effects:

y = β0 + β1 x1 + . . . + βn xn + ε
β0 = γ00 + γ01 w1 + . . . + u0

where each coefficient combines a fixed part (the γ's) and a cluster-specific random effect u. The global coefficients have similar properties to their counterparts in simple linear regression and are given by solving the same equations, whereas the random-effect parts are drawn from a Normal distribution. The random part discerns the different clusters in the data, giving a more flexible and expressive model. As for the assumptions [5] governing the model:

Linearity: there is a rectilinear (straight-line, as opposed to curved) relationship between the covariates and the response.
Normality: the error terms at every level of the model are Normally distributed with homogeneous variances.
Independence: cases are random samples from the population, and scores on the dependent variable are independent of each other. However, multilevel models usually deal with cases where the assumption of independence is violated; thus, multilevel models alter the standard analysis to account for the dependence between observations in the same cluster.
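The varying-intercept structure above can be made concrete by simulation. The sketch below (Python/NumPy, whereas the project itself used R; all parameter values are invented) generates clustered data with a random intercept per cluster, then recovers the shared slope by within-cluster demeaning and reads off one intercept per cluster — a crude stand-in for a fitted multilevel model.

```python
import numpy as np

# Simulate y_ij = beta0_j + beta1 * x_ij + eps_ij, where the intercept
# beta0_j = gamma00 + u_j varies by cluster (e.g. postcode area).
rng = np.random.default_rng(0)
n_clusters, n_per = 5, 200
gamma00, beta1 = 10.0, 2.0

u = rng.normal(0.0, 3.0, n_clusters)              # random effects u_j
cluster = np.repeat(np.arange(n_clusters), n_per)
x = rng.normal(size=n_clusters * n_per)
y = gamma00 + u[cluster] + beta1 * x + rng.normal(0.0, 1.0, x.size)

# Recover the shared slope after removing cluster means (demeaning),
# so between-cluster intercept differences do not bias the slope.
x_dm = x - np.array([x[cluster == j].mean() for j in range(n_clusters)])[cluster]
y_dm = y - np.array([y[cluster == j].mean() for j in range(n_clusters)])[cluster]
beta1_hat = (x_dm @ y_dm) / (x_dm @ x_dm)

# One intercept per cluster, given the estimated slope.
intercepts = np.array([(y[cluster == j] - beta1_hat * x[cluster == j]).mean()
                       for j in range(n_clusters)])
print(beta1_hat, intercepts - gamma00)   # slope and estimated u_j
```

With enough observations per cluster, `beta1_hat` approaches the true slope and `intercepts - gamma00` approaches the simulated random effects.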
Neural networks have great capacity for modelling complex and large data such as stock prices [6][7]; most noticeably, neural networks excel at high-dimensional data [8]. Artificial neural networks are processing units which resemble, on a smaller scale, the neuronal structure of the human cerebral cortex. Neural networks are organised in layers, which in turn are comprised of interconnected nodes [9]. These layers can be categorised into input, hidden and output layers, where the input and output layers are concerned with receiving and presenting information, while the hidden layers are burdened with disseminating the information from the input layer to identify patterns in the data. The artificial neural network learns to classify instances by being exposed to patterns linked with the categories found in the data set in question. In doing so, the neural network extends the notion of regression using basis functions [10]. This adjustment gives rise to the basic neural network. Then a learning rule, most frequently the delta method, is employed to adjust the connection weights w. This is achieved by implementing the gradient descent algorithm within the solution vector space in order to minimise the error. In particular, to update the weights, each step is taken against the gradient of the error,

w ← w − η ∇E(w),

where η is the learning rate, an indication of how small the step in the direction of steepest decrease of the error function is. The total error is a sum over the n training examples:

E(w) = Σn En = Σn (yn − tn)²    (2)

where wi is the weight attached to the i-th connection. This update adjusts the weights towards the direction of greatest decrease of the error, with a small step to safeguard against overshooting; however, this gradient descent may still settle in a local minimum. Writing the unit's output as yn = f(wᵀxn),

E(w) = Σn (f(wᵀxn) − tn)²    (3)

For the logistic activation, the derivative takes the convenient form

∂f/∂u = f(u)(1 − f(u))

and the activation of a unit is

a = f( Σi wi xi )    (5)

so that, by the chain rule,

∂En/∂wi = (∂En/∂y)(∂y/∂u)(∂u/∂wi) = (y − t) y (1 − y) xi    (6)
Turning to deep learning networks [11], they work in the same fashion as neural networks; however, the difference lies in the number of hidden layers involved. Moreover, in this particular context, the developed network is a feed-forward deep learning network with 10 nodes per hidden layer.
Gaussian processes can be seen as a generalisation of linear and polynomial models. In particular, instead of making assumptions about what kind of curve could fit the data, a less parametric approach is taken. This approach enables the data to be seen as realisations of points coming from a multivariate Normal [12]. In this context, as in many others, the mean function is assumed to be zero, and the covariance is specified by the squared-exponential kernel

k(x, x') = σf² exp( −(x − x')² / (2l²) )    (7)

where σf², the maximum allowable covariance, should be high for functions which cover a broad range on the y axis, and the length scale l controls how quickly the correlation decays with distance. Each observation y can be thought of as related to an underlying function value through observational noise. For simplicity of exposition, the noise is folded into k(x, x') by writing

k(x, x') = σf² exp( −(x − x')² / (2l²) ) + σn² δ(x, x')    (9)

where δ(x, x') is the Kronecker delta function. Thus, given n observations y, our objective is to predict y*, not the actual f*; their expected values are identical, but their variances differ owing to the observational noise process. Calculating the covariance matrices [13],

K = [ k(x1, x1) ... k(x1, xn) ; ... ; k(xn, x1) ... k(xn, xn) ],
K* = ( k(x*, x1), ..., k(x*, xn) ),  K** = k(x*, x*)    (10)

Then, using the assumption that the data come from a multivariate Normal,

( y ; y* ) ~ N( 0, [ K  K*ᵀ ; K*  K** ] )    (11)

where ᵀ indicates matrix transposition. We are of course interested in the conditional probability p(y* | y): given the data, how likely is a certain prediction for y*? This conditional is again Gaussian,

y* | y ~ N( K* K⁻¹ y, K** − K* K⁻¹ K*ᵀ )

so that the point prediction and its variance are

ȳ* = K* K⁻¹ y,  var(y*) = K** − K* K⁻¹ K*ᵀ    [13]
Decision trees belong to the category of supervised algorithms employed to model both categorical and continuous data. In particular, a decision tree is a flowchart-like tree in which an instance is routed through successive tests at each internal node [14]. Branches emerge denoting the outcome of a test, leading to further testing or to a leaf node which holds the class label of the instance. Given a training instance, it goes through a pattern of questions by which it is decided to which category it should be assigned. However, a common problem is how to organise the sequence of questions. Therefore, a number of splitting criteria have been developed to ensure the homogeneity of the resulting nodes; each split is performed on the most informative attribute according to one of the following criteria.

7.4.1 Chi-Square

The chi-square algorithm finds the statistical significance of the differences between sub-nodes and the parent node. It is measured by the sum of squares of the standardised differences between the observed and expected frequencies of the target variable. The algorithm works for a categorical target variable; the higher its value, the higher the statistical significance of the differences between the sub-node and the parent node.

Chi-square = Σ (Actual − Expected)² / Expected    (12)
Although it originated in physics, entropy, on which information gain is based, is a measure of the degree of disorganisation in a system. Thus, it allows pure and less pure nodes to be discerned, and the split to be taken according to which feature requires less information to be described. For a binary node, entropy can be calculated as

Entropy = −p log2(p) − q log2(q)

where p and q are the probabilities of success and failure, respectively, in that node. Entropy is also used with a categorical target variable. The algorithm chooses the split which has the lowest entropy compared to the parent node and the other splits: the lower the entropy, the better.
The Gini index says that if we select two items from a population at random, then they must be of the same class; the probability of this is one if the population is pure. It works with a categorical target variable (Success or Failure) and performs only binary splits; the higher the value of Gini, the higher the homogeneity. CART (Classification and Regression Trees) uses the Gini method to create binary splits. The Gain Ratio criterion is a normalising version of information gain, which has a tendency to favour tests with many outcomes; it represents the potential information generated by splitting the training instances into n partitions, one per outcome. After the criterion is selected, the data set is split on all the distinct values of the chosen attribute Sx. Subsequently, another splitting attribute is selected and its criterion calculated based on the frequencies of the distinct values. This procedure continues until no attribute is left.
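The three splitting criteria described above reduce to short formulas. As a sketch (Python; the binary success/failure framing follows the text, and the example counts are invented):

```python
import numpy as np

def entropy(p):
    """Binary entropy -p log2(p) - q log2(q), with q = 1 - p."""
    q = 1.0 - p
    terms = [x * np.log2(x) for x in (p, q) if x > 0.0]  # 0 log 0 := 0
    return -sum(terms)

def gini_impurity(p):
    """1 - (p^2 + q^2): zero for a pure node, maximal at p = 0.5."""
    q = 1.0 - p
    return 1.0 - (p * p + q * q)

def chi_square(actual, expected):
    """Sum of (Actual - Expected)^2 / Expected over child-node counts (eq. 12)."""
    actual = np.asarray(actual, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.sum((actual - expected) ** 2 / expected))

# A 50/50 node is maximally impure; a pure node has zero entropy.
print(entropy(0.5), gini_impurity(0.5), chi_square([30, 10], [20, 20]))
```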
7.5.2 Ensemble Methods
Despite the nice properties of decision trees, the algorithm generally suffers from high variance, even in its more complex incarnations. Thus, the aid of ensemble methods is enlisted to reduce the variance of the predictions. In doing so, bagging builds several weak learners on different sub-samples of the same data. A random forest is an implementation of the bagging paradigm which grows a number of weak learners whose predictions are combined, using the mean/median or a majority-voting approach, to produce the final prediction. Turning to boosting, similarly to other ensemble methods, it employs weak learners to create stronger classification rules. To create such rules, a weak learner must first be defined; this is achieved by repeatedly applying the learner to reweighted distributions of the data. Gradient boosted trees encapsulate the boosting paradigm. Typically, gradient boosted trees grow a number of decision trees whose predictions are summed; each new decision tree attempts to minimise the observed error by fitting the residuals left by the previous trees.
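The residual-fitting idea behind gradient boosting can be shown in miniature. The sketch below (Python/NumPy; not the project's implementation — the project used full tree packages, whereas here each "tree" is a depth-1 stump, and the target function and shrinkage rate are invented) repeatedly fits a stump to the residuals of the current ensemble and adds it to the running prediction:

```python
import numpy as np

def fit_stump(x, r):
    """Depth-1 regression tree: the threshold minimising the SSE of r."""
    best = None
    for thr in np.unique(x):
        left, right = r[x <= thr], r[x > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    return best[1:]                      # (threshold, left value, right value)

def predict_stump(stump, x):
    thr, lval, rval = stump
    return np.where(x <= thr, lval, rval)

# Toy 1-D regression target.
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * x)

pred = np.zeros_like(y)
nu = 0.5                                 # shrinkage (learning rate)
for _ in range(100):
    residual = y - pred                  # what is still unexplained
    stump = fit_stump(x, residual)       # next tree fits the residuals
    pred += nu * predict_stump(stump, x)

print(np.abs(y - pred).mean())           # training error shrinks round by round
```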
Part IV
Results
At this point, having identified important relationships in the data set, the analysis turned to predicting delivery time employing a number of machine learning algorithms. In this chapter, performance is portrayed through mean square error, mean absolute error and the proportion of orders whose delivery time is predicted within a three-minute time window, the project goal being to predict 70% of orders' delivery times within that window. Initially, the analysis turned to the most ubiquitous algorithm for handling this task: regression. Indeed, regression is widely utilised for forecasting and prediction, leading to the belief that this course of action would be the best start, since it is simple and could potentially be used as a benchmark for more advanced methods. Therefore, a simple linear regression model is developed.

Line Plot of Predicted against Actual Predicted Time at Door

Figure 8: The plot shows that the predicted series follows the actual line, indicating that in most cases the two values are close. However, another feature that captures the attention is the occasional large estimated delivery time.

Turning to the performance of the model, the aforementioned linear regression model succeeded in predicting 68.76% of observations within a three-minute time window, with an RMSE of 7.029438.
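The three metrics reported throughout this chapter can be computed as follows. This is a Python sketch with invented actual/predicted values purely to show the calculations; the project's figures come from its own R pipeline.

```python
import numpy as np

def report(actual, predicted, window=3.0):
    """RMSE, MAE and the share of orders predicted within +/- window minutes."""
    err = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    within = float(np.mean(np.abs(err) <= window))
    return rmse, mae, within

# Illustrative actual vs predicted times at door, in minutes.
actual    = np.array([5.0, 12.0, 8.0, 20.0, 7.0])
predicted = np.array([6.0, 10.0, 9.0, 30.0, 7.5])
rmse, mae, within = report(actual, predicted)
print(rmse, mae, within)
```

Note how one large miss (20 vs 30 minutes) inflates the RMSE far more than the MAE, which is why both are reported side by side.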
However, it is becoming more popular to explore the spatial structure of the observations to enhance the performance of a model. Indeed, spatial information such as address postcodes can capture hidden relationships in the data [15]. This is of importance, since observations lying proximate on the map might have a similar duration and be affected differently by covariates. Thus, a multilevel model is employed to develop a different regression model for each cluster of observations, by letting the intercept and coefficients vary. The random intercept for postcode cluster s takes the form

β0 = γ00 + us

Table 5: Output of the Multilevel Model - Random Effects

Having identified the model parameters, focus is shifted to the performance of the model. The model identified 72.51% of the observations within a three-minute time interval, with root mean square error and mean absolute error of 6.13 and 3.72 respectively. Below is depicted a plot of the predicted and actual values.
Line Plot of Predicted against Actual Predicted Time at Door

Figure 9: The plot shows that the predicted series follows the actual line, indicating that in most cases the two values are close. However, another feature that captures the attention is the occasional large estimated delivery time.

From the plot, it is apparent that there is an alarming pattern in predicting the duration. Support vector machines were also used to predict delivery time; however, in this particular case the initial expectation was verified that they would not perform as well as the other methods, due to the complexity of the data, without a smart choice of kernel. Support vector machines managed to predict 69% of observations within the desired interval, with an inflated root mean square error of 7.20 and a mean absolute error of 4.24.
Line Plot of Predicted against Actual Predicted Time at Door

Figure 10: The plot shows that, even though in most cases the delivery time is predicted close to the actual value, there is a systematic pattern of inflating the estimated delivery time.

Leaving the sub-par performance of the support vector machine behind, it is time to attempt to predict delivery time utilising more advanced methods: decision tree algorithms. However, due to some limitations, it is considered more beneficial to employ ensemble methods built on decision trees, such as random forests and gradient boosted trees. Turning first to random forests, the algorithm succeeds in predicting 70% within a three-minute time interval, and the performance is reasonable in terms of mean square error, mean absolute error and root mean square error. Below can be found a plot illustrating the performance of the random forest.
Figure 11: The plot shows that the predicted series follows the actual line, indicating that in most cases the two values are close. However, another feature that captures the attention is the occasional large estimated delivery time.

Furthermore, the rf package used to develop the random forest supports graphical assessment of the importance of the variables in the model. Below is depicted a plot illustrating the key variables.
Importance Plot of Variables in the Random Forest

Figure 12: The plots show that collection item and product type play the most significant part in estimating the duration of a delivery, followed by item weight and the number of individual pieces.

From the plot, it is concluded that the order of statistical importance is somewhat counter-intuitive compared to what was expected based on the exploratory data analysis. As for the gradient boosted trees, the performance is similar to the random forest, with a slight increase, reaching 73% in the percentage of correctly predicted observations within a three-minute error. In particular, the mean absolute error and root mean square error are estimated at 3.73 and 6.22 respectively. Finally, no effort would be complete without trying to develop a deep learning network, owing to its capacity for dealing with complex and large data such as stock prices [6][7]; most noticeably, neural networks excel at high-dimensional data [8]. In particular, a recurrent deep learning network with 3 hidden layers comprised of 10 nodes each was developed and trained for 500 epochs, succeeding in correctly predicting 77% within a three-minute error. More specifically, the mean absolute error is 2.75 and the root mean square error is decreased to 3.012, compared to the decision trees.
Line Plot of Predicted against Actual Time at Door
Figure 13: The plot shows that the predicted values follow the actual line, indicating that in most cases the two values are close. However, another feature that captures the attention
It is apparent that a Deep Learning network certainly deserved the time and resources to develop. Although industry is moving towards Deep Learning for the reasons discussed above, it is not the only option available. Ultimately, Gaussian Processes have found particular application in modelling complex systems [16], including those with spatio-temporal information [17]. Thus, a Gaussian Process model is employed to predict delivery time using both item characteristics and spatial information. In contrast to the other methods, it is only feasible to implement on Amazon Web Services, since its computational complexity outweighs the resources of an ordinary computer. As for the performance of the algorithm
Figure 14: The graph shows that those participants who dropped the survey had higher
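A minimal sketch of Gaussian Process regression in the spirit described above, using scikit-learn's GaussianProcessRegressor on synthetic stand-ins for spatial and item features (the column meanings are assumptions, not the project's schema). The O(n³) cost of inverting the kernel matrix is what pushes such models onto cloud hardware:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
# Hypothetical inputs: normalised latitude, longitude, item weight.
X = rng.uniform(0, 1, size=(60, 3))
y = 5 + 4 * X[:, 0] + np.sin(6 * X[:, 1]) + rng.normal(scale=0.1, size=60)

# RBF kernel for smooth spatial variation plus a white-noise term.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X, y)

# The GP returns a predictive mean and an uncertainty estimate,
# which point predictors such as random forests do not provide.
mean, std = gp.predict(X[:5], return_std=True)
print(np.round(mean, 2), np.round(std, 2))
```

The per-prediction uncertainty (`std`) is the practical advantage of the GP here: an inflated interval can flag deliveries whose time at door is hard to predict.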
7.7 Recommendation
No report would be complete without providing a list of recommendations that have emerged from scrutiny of the project. It is hoped that this piece of work will provide useful guidance.
The datasets DFS DET DEL, DFS DET HIST and AppoloORdersDetails provide useful insight into the problem; DFS DET DEL and DFS DET HIST should be combined using an inner join, and the resulting data set should then be merged with AppoloORdersDetails using a left join.
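Assuming a shared order identifier as the join key (the real tables lack a documented primary key, so `order_id` and all column names below are hypothetical), the recommended combination can be sketched with pandas:

```python
import pandas as pd

# Toy stand-ins for the three datasets; "order_id" is an assumed key.
dfs_det_del = pd.DataFrame({"order_id": [1, 2, 3], "delivery_time": [12, 7, 9]})
dfs_det_hist = pd.DataFrame({"order_id": [2, 3, 4], "item_weight": [30, 55, 20]})
apollo = pd.DataFrame({"order_id": [3], "product_type": ["sofa"]})

# Inner join DFS_DET_DEL with DFS_DET_HIST: keep only orders in both.
combined = dfs_det_del.merge(dfs_det_hist, on="order_id", how="inner")

# Left join the result with AppoloORdersDetails: keep all combined
# orders, filling missing product details with NaN.
result = combined.merge(apollo, on="order_id", how="left")
print(result)
```

The inner join discards orders that appear in only one of the first two tables, while the left join preserves every remaining order even when AppoloORdersDetails has no matching row.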
Moreover, the usefulness of the datasets has been limited by the lack of a primary key.
Exploratory data analysis revealed that individual pieces, item weight and product type play the most significant part in estimating time at door.
Deep Learning, Gaussian Processes and Random Forests were amongst the machine learning algorithms that bore the most fruitful results. Future work should pay more attention to optimising the performance of those models, or should start by considering those techniques first.
Part V
Discussion
In an attempt to rethink the routing optimisation problem as part of this project, a plethora of machine learning algorithms were employed to predict time at door. In particular, the analysis showed that although a number of factors influence the time spent with the client, the type of product, whether there is an item for collection, and the number of individual pieces making up an item play the most significant part in estimating time at door. Secondly, all machine learning algorithms utilised performed equally satisfactorily, with the percentage of estimates falling within the three-minute error window reaching almost 74%. However, Gaussian Processes and Deep Learning performed significantly better when the process was repeated on cloud computing platforms. This is the reason behind the industry's shift to computationally intensive methods such as Deep Learning and, in particular, Gaussian Processes. On the whole, machine learning algorithms successfully tackled the problem of predicting the time that a delivery expert needs to stay with the client, with the majority of observations predicted within a three-minute interval of the correct time. Most importantly, this project stands alongside the few efforts that endeavour to bring Operational Research and Data Science closer together under the umbrella of Business Intelligence. However, even though the developed models bore tangible results, there remains room for improvement in the future. In addition, it was not feasible to enrich the data with external resources. In summary, the present study enlisted the aid of machine learning algorithms to predict time at door. In doing so, the analysis bore tangible results qualifying state-of-the-art algorithms such as Gaussian Processes and Deep Learning.
References
[2] N. Christofides, Worst-case analysis of a new heuristic for the travelling salesman problem,
1976.
[3] D. L. Applegate, R. E. Bixby, V. Chvatal, and W. J. Cook, The traveling salesman problem:
[4] M. Kuhn and K. Johnson, Applied predictive modeling, vol. 810. Springer, 2013.
[5] J. J. Faraway, Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models.
[6] B. Mandelbrot, Forecasts of future prices, unbiased markets, and martingale models,
[7] B. Alipanahi, A. Delong, M. T. Weirauch, and B. J. Frey, Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning, Nature Biotechnology, vol. 33,
[8] I. Arel, D. C. Rose, and R. Coop, Destin: A scalable deep learning architecture with
[9] P. Flach, Machine learning: the art and science of algorithms that make sense of data.
[11] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[12] D. Koller and N. Friedman, Probabilistic graphical models: principles and techniques. MIT
press, 2009.
[13] M. Ebden et al., Gaussian processes for regression: A quick introduction, The Website
2008.
[14] L. Rokach and O. Maimon, Data mining with decision trees: theory and applications. World
scientific, 2014.
[15] … and health outcomes: a critical review, Journal of Epidemiology & Community Health.
[16] N. Chen, Z. Qian, I. T. Nabney, and X. Meng, Wind power forecasts using gaussian
processes and numerical weather prediction, IEEE Transactions on Power Systems, vol. 29,
[17] Y. Xie, K. Zhao, Y. Sun, and D. Chen, Gaussian processes for short-term traffic volume forecasting,