Szkuh-Big Data Analytics Dissertation
2019-2020
Acknowledgements
A personal thanks to Shortridge Ltd for agreeing to be the case organisation, and
to the employees for taking the time to participate in the research and provide the
information needed to complete the dissertation, especially amid the chaos and
substantial impacts of the COVID-19 pandemic.
Thank you to my supervisor, Dr Adam Gripton, for providing support, time, and
experience throughout the dissertation process.
Declaration
I declare that the thesis embodies the results of my own work and has been composed
by myself and meets the University policies on plagiarism and ethical research.
Where appropriate within the thesis I have made full acknowledgement to the work
and ideas of others or have made reference to work carried out in collaboration with
other persons.
BDA dissertation Heriot-Watt University
Abstract
Purpose: The aim of this research is to investigate how a BDA solution to a VRP
compares to an experience-led heuristic in an SME and to understand the issues this
case highlights in SME adoption of BDA.
Methodology: The research strategy followed a single case study with an embedded mixed
methods design. Primary data was collected using a self-report online questionnaire. A
purposeful sample of the office employees at the case organisation resulted in 13
responses (response rate of 61.9%). Secondary data consisted of the February 2020
extract of telematics data from the case organisation plus open-source data, upon which
BDA (descriptive and prescriptive) was performed using Python (v3.8.3 64-bit) and
Google OR-Tools (v7.6.7691).
Findings: The BDA solution to the VRP performed better than the experience-led heuristic in
all simulations, reducing total distance covered by 4-31% and total duration of
routes by 4-26%. Barriers to adoption of BDA consistent with the literature emerged
from the analysis and were reported by employees. These include limited technical expertise,
data-driven culture, and understanding of “how” to collect the data and generate value.
Research value: The research illustrates the value of BDA in solving a real business
problem and provides an example use-case for SMEs. In addition, it provides evidence
of the barriers to BDA adoption in SMEs from the literature.
Limitations: This is a single case study, so results may lack generalisability. The
selection of research methods was constrained, and the validity of findings affected,
by the COVID-19 pandemic.
Table of Contents
ACKNOWLEDGEMENTS
CHAPTER 3 - METHODOLOGY
CHAPTER 6 - CONCLUSION
REFERENCES
APPENDICES
List of Figures
Figure 2-1 - The eight attributes of Big Data (adapted from Mikalef et al. (2018) and Belhadi et al. (2019))
Figure 2-2 - Types of Big Data Analytics (source: Belhadi et al. (2019, p. 3))
Figure 2-3 - An illustration of The Sweep Method with a suboptimal solution (left) versus the optimal solution (right) for a 3-node-capacity vehicle (source: author)
Figure 3-1 - Diagram depicting the research design (source: author)
Figure 4-1 - Python code used to geocode the postcodes from the telematics data (source: author)
Figure 4-2 - Python code to request distance and duration matrix from local OSRM (source: author)
Figure 4-3 - Example of how demand was derived based on delivery frequency and capacity of original delivery vehicle (source: author)
Figure 4-4 - Time series of the number of routes operated in February (source: author)
Figure 4-5 - Time series of the number of nodes serviced per day during February (source: author)
Figure 4-6 - Time series of total distance (left) and total duration (right) of routes during February (source: author)
Figure 4-7 - Map of nodes visited in February 2020 with red denoting location visited on a Wednesday (source: author)
Figure 4-8 - Map of how many times per week a node is serviced: grey - once, black - twice and red - three and more (source: author)
Figure 4-9 - Histogram of percentage difference in route distance (left) and duration (right) between Original and Algorithm routes in Analysis 1 (source: author)
Figure 4-10 - Scatter graphs of the number of stops versus % difference between the Original and Algorithm routing in route distance (left) and time (right) in Analysis 1 (source: author)
Figure 4-12 - Map plot of Original route “2020-02-26+PX19LCJ” with the red tooltip indicating the depot (source: author)
Figure 4-13 - Map plot of Algorithm route “2020-02-26+PX19LCJ” with the red tooltip indicating the depot (source: author)
Figure 4-14 - The distribution of route distances for Original and Algorithm routing. Adjusting only the order of routes in Analysis 1 (left) and allowing the Algorithm to route at depot level in Analysis 2 (right) (source: author)
Figure 4-15 - Network view of Original routing (left) and Algorithm routing (right) from Workington depot on 05/02/2020 in Analysis 2 (source: author)
Figure 4-16 - Comparison in vehicle distance (left) and time (right) between Original routing and Algorithm routing from Workington depot on 05/02/2020 in Analysis 2 (source: author)
Figure 4-17 - Map plot of Original routing from Workington depot on 05/02/2020 (source: author)
Figure 4-18 - Map plot of Algorithm routing from Workington depot on 05/02/2020 in Analysis 2 (source: author)
Figure 4-19 - Number of nodes visited per vehicle for the Original routing and the Algorithm routing under 95% CoODV simulation (left) and the bag capacity of the vehicles (right) (source: author)
Figure 4-20 - Number of nodes visited per vehicle for the Original routing and the Algorithm routing under 95% CoODV simulation (left) and the bag capacity of the vehicles (right) with vehicle selection (source: author)
Figure 4-21 - Number of nodes visited per vehicle for the Original routing and the Algorithm routing under 85% CoODV simulation with vehicle selection (source: author)
Figure 4-22 - Route duration for the Original routing and the Algorithm under 95% CoODV simulation with vehicle selection (source: author)
Figure 4-24 - Routes from Original routing and Algorithm routing in Analysis 5 with the route time constraint (source: author)
Figure 4-25 - Example of a route that returns to the depot twice (source: author)
Figure 4-26 - Role of participant at Shortridge (left) and length of time with Shortridge (right) (source: author)
Figure 4-27 - Selected software, services and programming options to Q7 from the questionnaire (source: author)
List of Tables
Table 2-1 - Table of terms from the literature that represent interrogating data to generate value (source: author)
Table 2-2 - Common Vehicle Routing Problem variants (source: adapted from Rincon-Garcia et al. (2017, p. 128))
Table 2-3 - Common methods in the literature for solving Vehicle Routing Problems (adapted from Güneri (2007) and Gendreau et al. (2008))
Table 3-1 - Table of research questions, aim and objectives (source: author)
Table 3-2 - Additional secondary data collected during the research (source: author)
Table 3-3 - Summary of the approach to data analysis and how it aligns with research aims and objectives (source: author)
Table 3-4 - Summary of Big Data Analysis approach (source: author)
Table 4-1 - Description of the comparative BDA analyses (source: author)
Table 4-2 - Shortridge vehicle fleet (types and capacities) (source: author)
Table 4-3 - Identification of the critical data elements from the telematics dataset (source: author)
Table 4-4 - Assumptions for each version of the routing algorithm analysis (source: author)
Table 4-5 - Number of each vehicle type at each depot (source: author)
Table 4-6 - Summary of the five analyses setups and results (source: author)
Table 4-7 - Descriptive statistics of the number of routes per day in February (source: author)
Table 4-8 - Descriptive statistics of the number of nodes serviced per day (source: author)
Table 4-9 - Percentage of available journeys* vehicle type is used during February 2020 (source: author)
Table 4-10 - Average usage of each vehicle type during February 2020 (source: author)
Table 4-11 - Description of records with and without solutions from the routing algorithm in Analysis 5 (source: author)
Abbreviations
BA Business Analytics
BD Big Data
BI Business Intelligence
DC Dynamic Capabilities
OR Operational Research
By 2025, the European Commission projects the EU data economy will grow from €301b in
2018 to €829b, roughly 2.75 times its 2018 value, with the volume of data to grow by
530% in the same timeframe (European Commission, 2020). Big Data Analytics (BDA) is seen as a
capability for firms to extract value from the ever-growing volumes of data (H Chen et
al., 2012; Wang et al., 2016; Nguyen et al., 2018). Among the benefits, BDA enables
objectivity and transparency in decision-making (Belhadi et al., 2019). From the
replacement of manual processes, BDA has been shown to increase profitability
(Raguseo et al., 2020) through reduced operational costs (Carlan et al., 2020), improved
efficiency (Mikalef et al., 2018) and increased productivity (Müller et al., 2018; Ferraris
et al., 2019). BDA has been positively associated with innovation (Božič and Dimovski,
2019) and a firm’s agility (Wamba and Akter, 2019). Although LaValle et al. (2011)
stated that top firms use analytics five times more than lower performers, and
expectations of BDA to enhance performance are high, adoption is in the minority across
supply chain functions (Wang et al., 2016) and logistics (Schoenherr and Speier-Pero,
2015), with many firms struggling to deliver valuable insights (Roßmann et al., 2018)
and unconvinced of the influence BDA has had on firm outcomes (Ghasemaghaei and
Calic, 2020). In Small Medium Enterprises (SMEs), decision-making is more likely to
rely on feelings and intuition (Garengo and Bititci, 2007) and estimates of BDA
adoption are even lower – 15% in 2016 (Eurostat, 2020). SMEs also have a significant
impact on the road network (Miwa and Bell, 2017), yet uptake of Computerised Vehicle
Routing Software (CVRS) is also low (McCrea, 2017, cited in Fontaine et al., 2020, p.
1) which perhaps suggests experience-led heuristic approaches to routing are common.
With logistics highlighted as one of the most applicable areas for BDA (Kache and
Seuring, 2017), a lack of empirical examples of BDA in the literature (Mikalef et al.,
2018) and a limited volume of research on adoption of BDA in SMEs (Coleman et al.,
2016; Bordeleau et al., 2019), this research seeks to address this gap through a case
study of an SME and the Vehicle Routing Problem (VRP). Thus, the aim of this
research is to investigate how a BDA solution to a VRP compares to an experience-led
heuristic in an SME and to understand the issues this case highlights in SME adoption
of BDA.
The rest of this paper is organised as follows. Chapter 2 reviews the literature to
provide an understanding of current research and theory. Chapter 3 details the
methodology followed, including research philosophy, strategy, details of the case and
data collection instruments. Chapter 4 presents the findings and analysis from the
collected data, and Chapter 5 discusses these results in relation to the academic literature
before the research is concluded in Chapter 6, with implications for practitioners and
research. The appendix also contains information referred to in the chapters.
Chapter 2 - Literature Review
2.1. Introduction
The following chapter is a review of the existing literature on Big Data (BD), Big Data
Analytics (BDA), and its adoption within Small Medium Enterprises (SMEs). The
review then explores the Vehicle Routing Problem (VRP) from a practical and
theoretical perspective. The chapter concludes with the research questions, aims and
objectives for this research.
Like many terms associated with data and analytics, there are ambiguities in the
literature as to precise definitions (Gandomi and Haider, 2015; Dedić and Stanier,
2016). In the mainstream, BD usually refers to structured datasets that cannot be
managed and processed with traditional IT systems (Min Chen et al., 2014; Duan and
Xiong, 2015). However, Gandomi and Haider (2015) highlight that this definition is
largely driven by the marketing exploits of large software companies and it overlooks
the semi-structured and unstructured data (video files, sensors, geospatial, HTML and
text files) which make up the vast proportion of data today (Cukier, 2010; Syed et al.,
2013). In Logistics and Supply Chain Management (LSCM) literature, BD is commonly
defined instead by numerous “V” attributes. Volume, Velocity and Variety, the original
three attributes proposed by Russom (2011) and McAfee et al. (2012), have been added to
so that there are now eight attributes of BD (Mikalef et al., 2018; Belhadi et al., 2019).
Yet it is unclear from the literature how many attributes, and what thresholds, data must
meet to be considered BD. This might be because the definition is continually evolving,
is contextual or is subjective: BD for a smaller organisation might be different to BD
for a larger organisation (Gandomi and Haider, 2015). With a broad definition in the LSCM
literature, most data could be justified as BD should it meet a “V” characteristic;
Volume is the attribute most commonly associated with BD, but it is just one of eight
dimensions. Additionally, Ghasemaghaei and Calic (2020) found that Volume is not critical
for innovation, so collecting large amounts of data is unlikely to help innovation.
Increased digitisation and computing power have created Variety in data sources and, with
it, opportunities: data is no longer confined to standard systems (e.g. CRM). Through
this prevalence and the opportunities unearthed, McAfee et al. (2012) state that data
has become BD.
Figure 2-1 - The eight attributes of Big Data (adapted from Mikalef et al. (2018) and
Belhadi et al. (2019))
There have been many differing definitions put forward by researchers for BD and BDA
(Rozados and Tjahjono, 2014). Differentiating between the two has sometimes been
unclear with the same definition used for both terms (e.g. Wamba et al. (2015) and
Wamba and Akter (2019)). It appears more common in the literature to define BDA as
the process, tools and techniques of generating insight and subsequent value from BD
(Russom, 2011; Lamba and Dubey, 2015; George et al., 2016). Belhadi et al. (2019)
break down BDA into Descriptive, Inquisitive, Predictive and Prescriptive Analytics
(see Figure 2-2). Descriptive analytics provide a view of the current state often using
descriptive statistics and delivered through reporting tools and dashboards (Rozados
and Tjahjono, 2014; Duan and Xiong, 2015). Examples include reviewing the volume
of sales and total distance covered.
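As a minimal sketch of what descriptive analytics can look like in practice, the Python snippet below summarises a short list of daily route distances. The values and the helper name describe are hypothetical and chosen only for illustration; they are not taken from the case organisation's telematics data.

```python
from statistics import mean, median, stdev

# Hypothetical daily total route distances in km (illustrative values only,
# not taken from the case organisation's telematics data).
daily_distance_km = [412.5, 398.0, 455.2, 430.1, 389.7, 441.3]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "total": sum(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = describe(daily_distance_km)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

A summary of this kind describes the current state only; it does not explain why a given day was longer or shorter than another.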
Inquisitive analytics explore why something happened (Belhadi et al., 2019), though
some researchers do not make a distinction between Descriptive and Inquisitive
analytics (e.g. Nguyen et al. (2018)). The process builds on descriptive analytics to
investigate root causes and reveal underlying patterns. For example, investigating
correlations between the number of sales and seasonality or identifying customer
sentiments through analysing social media data (Gandomi and Haider, 2015).
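To illustrate the kind of correlation check mentioned above, the following sketch computes a Pearson correlation coefficient between monthly sales and a seasonal indicator. The data, the choice of average temperature as the seasonal indicator and the function name pearson are all assumptions made for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical monthly figures (illustrative only).
avg_temperature_c = [4, 5, 8, 11, 14, 17]
monthly_sales = [120, 125, 140, 160, 170, 190]

r = pearson(avg_temperature_c, monthly_sales)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to 1 or -1 would indicate a strong linear association, although correlation alone does not establish the root cause.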
Figure 2-2 - Types of Big Data Analytics (source: Belhadi et al. (2019, p. 3))
Across the academic divide, Operational Research (OR) literature tends to use Business
Intelligence (BI), Business Analytics (BA) and Business Intelligence and Analytics
(BI&A) in a similar way to how BDA is used in the LSCM literature. BI is typically associated
with extracting insights and reporting from data in a structured tabular form, either
a spreadsheet or a relational database management system (H Chen et al., 2012; Mortenson
et al., 2015; Dedić and Stanier, 2016). BA emerged later to represent the extensive use
of the data, statistical and modelling components often associated with BI (H Chen et al.,
2012; Mortenson et al., 2015). The composite term, BI&A is used more widely in recent
literature to mitigate ambiguity between the two individual terms (Mortenson et al.,
2015). BI&A is defined as “the techniques, technologies, systems, practices,
methodologies, and applications that analyse critical business data to help an enterprise
better understand its business and market and make timely business decisions” (H Chen
et al., 2012, p. 1166). Some researchers distinguish BI&A from BDA on the grounds that it
does not extend to BD (Dedić and Stanier, 2016; Kache and Seuring, 2017); however, other
researchers disagree (Mortenson et al., 2015; Hindle et al., 2020). BI&A is also associated
with the four types of analysis by which BDA is defined: Descriptive, Inquisitive,
Predictive and Prescriptive (Del Vecchio et al., 2018; Hindle et al., 2020). So whilst some
authors have tried to distinguish between the various nomenclature (Dedić and Stanier,
2016), it could be argued that these are negligible semantic differences brought on by
the superficial separation of research disciplines (Mortenson et al., 2015). Analytics
and data are interdisciplinary; the focus is on delivering objectivity to support
decision-making (Hindle et al., 2020). Table 2-1 defines terms that are often used
interchangeably in the literature and in the mainstream. Ultimately, “Big Data are
worthless in a vacuum” (Gandomi and Haider, 2015, p. 140); analytics, intelligence,
science and discovery are employed to generate value from that data.
Table 2-1 - Table of terms from the literature that represent interrogating data to
generate value (source: author)
Big Data Analytics - techniques used to analyse and acquire intelligence from big data to inform decision-making - Gandomi and Haider (2015)
2.2.4. Applications of BDA in LSCM
The Department for Business Energy & Industrial Strategy (2019) defines an SME as
a business with fewer than 250 employees; these constitute 99% of businesses in the
UK private sector (excluding businesses with no employees) and account for 60% of all
UK employment. Yet there is a marked difference in the adoption of BDA between SMEs
and larger organisations with 250 or more employees. Only 15% of SMEs performed BDA
in 2016 versus 35% of large
organisations in the same year (Eurostat, 2020). Cultural differences exist between
SMEs and larger organisations, which affect how organisations of different sizes
operate (Gibb, 2000). As a result, differences between SMEs and larger organisations
have been found in a wide range of other domains, such as approaches to management
development (Gray and Mabey, 2016), corporate social responsibility (Jenkins, 2004),
and adoption of ERP systems (Buonanno et al., 2005). Therefore, it is perhaps
unsurprising that there are differences in the adoption of BDA. It also poses
problems for generalising research results from large organisations to SMEs
(Mikalef et al., 2019). For example, Raguseo et al. (2020) found that smaller firms
generally do not receive the same level of profitability from BDA investment that larger
organisations do, whereas Bughin (2016) found no effect of firm size on firm
performance. Conversely, Dong and Yang (2020) found SMEs were better able
to take advantage of analytics on social media data, with it proving relatively more
valuable than for larger firms. The different structures and differing levels of resources
between larger companies and SMEs likely influence the outcomes of BDA.
The Resource-Based View (RBV) of the firm was proposed by Barney (1991) and explains a
firm as the sum of its resources: assets, knowledge, information and processes. Different
firms have different and contrasting levels of resources, and utilising these in
combination enables a firm to achieve competitive advantage. The potential is created
from resources that are valuable, rare, inimitable and non-substitutable (Barney, 1991).
Following extensive use in the IT literature for understanding challenges, adoption and
value creation (Bharadwaj, 2000), the RBV of the firm has become a common framework for BDA
(Vidgen et al., 2017; Mikalef et al., 2018; Wamba and Akter, 2019; Ghasemaghaei and
Calic, 2020; Raguseo et al., 2020). Bordeleau et al. (2019) also concluded that the RBV
is suitable for application of BDA in SMEs. The resources associated with BDA tend
to consist of technology, process, people and organisation (Akter et al., 2016; Vidgen
et al., 2017).
BD itself is a key resource but getting the data right is critical (Mikalef et al., 2018) and
though quality of data is not the biggest obstacle in adoption (LaValle et al., 2011),
poor data can lead to incorrect decisions and unnecessary cost (Hazen et al., 2014;
Wamba et al., 2015). Though not always the case and dependent on the BD definition,
smaller companies are thought to have fewer of these BD resources than larger
organisations (Del Vecchio et al., 2018). Even with the necessary BD resource, finding
vendor BD solutions that are both user friendly and embedded with robust analytics
solutions is rare (Russom, 2011; Selamat et al., 2018).
2.3.2.2. People
Having the knowledge and expertise to identify what is needed from a solution, how it
can be implemented and being able to identify suitable data are all required intangible
resources (Schoenherr and Speier-Pero, 2015; Coleman et al., 2016). BDA requires
employees to combine the technical knowledge to interrogate the data with the
business and relational knowledge to understand what is important and why (Russom,
2011; Waller and Fawcett, 2013; Wamba et al., 2015; Vidgen et al., 2017; Del Vecchio
et al., 2018; Mikalef et al., 2018; Surbakti et al., 2020). There are fewer employees in
SMEs and managers often have multiple or broader responsibilities than in large
organisations (Gibb, 2000). As a result, where large organisations have analytics teams,
Bordeleau et al. (2019) found analytics is conducted by managers and senior managers
in SMEs. Additionally, using external consultancies with the expertise is unaffordable
(Coleman et al., 2016) and while large companies might be able to team up with large
software providers for implementation support, as in the case of Hopkins and Hawking
(2018) and SAP, this is often out of reach for smaller companies (Akter et al., 2019).
2.3.2.3. Organisation
Researchers also note “Data-Driven Culture” (DDC) is a key resource (Mikalef et al.,
2018), defined as a collective thought pattern summarising mindsets and attitudes
towards process optimisation (Belhadi et al., 2019). It involves collaboration across
different people, skillsets and departments (Akter et al., 2019) and affects how data is
viewed and perceived throughout the organisation (Mikalef et al., 2018; Akter et al.,
2019; Dremel et al., 2020; Mikalef et al., 2020). A successful DDC permeates all levels
of the organisation often requiring a shift towards analytical and problem-solving skills
(Vidgen et al., 2017). Despite the different organisational structures between large
organisations and SMEs (Coleman et al., 2016), Ferraris et al. (2019) also confirmed
the evidence of DDC as a resource for BDA in SMEs.
Particularly important for the development of the DDC is the support of senior
leadership (Schoenherr and Speier-Pero, 2015), who need to trust the data-derived
insights and understand how they were derived (McAfee et al., 2012;
Mikalef et al., 2018; Conboy et al., 2020; Mikalef et al., 2020). Leadership in SMEs is
often different to that in large organisations (Gibb, 2000); in particular, personality
and leadership style have an impact on the success of BDA (Bordeleau et al., 2019), echoing
findings on implementing Performance Management in SMEs (Garengo and
Bititci, 2007).
A limitation with the RBV of the firm is that having static levels of resources does not
necessarily explain how firms adapt to changing external environments and maintain a
competitive advantage (Eisenhardt and Martin, 2000). Additionally, by drawing a
parallel with Information Systems, BDA resources may be imitable so just having the
right resources might be insufficient (Bharadwaj, 2000). For example, some BD
technology is developed as open source and data can be bought from third parties.
To account for this limitation, Dynamic Capabilities (DC) have been proposed in the
management (Teece, 2007) and Information Systems literature (Bharadwaj, 2000). DC
are an organisation’s ability to create, integrate and deploy resources in combination
and simultaneously to support sustained business performance (Bharadwaj, 2000;
Teece, 2007). Accordingly, Big Data Analytics Capability (BDAC) refers to a firm’s ability to effectively
implement infrastructure, technology and talent to capture and analyse data towards the
generation of insight for decision-making (Akter et al., 2016; Mikalef et al., 2020). In
the analogy of a production process, resources are the input and capability is the process
of leveraging these resources in a strategic way (Mikalef et al., 2018).
Much research has begun theorising a view of BDAC and it has been linked to
competitive advantage (Wamba et al., 2017) and firm performance (Mikalef et al.,
2019; Wamba and Akter, 2019). However, there is limited empirical research exhibiting
BDAC with a reliance on anecdotal evidence (Mikalef et al., 2018). Additionally, with
a different resource makeup in SMEs and large organisations, a uniform approach is
unlikely to fit since traditional resources considered for large companies are insufficient
to facilitate analytics capabilities in SMEs (Bordeleau et al., 2019). There is little
research about how organisations develop a BDAC (Kayser et al., 2018). Multiple
researchers suggest that it is gradual (Mikalef et al., 2019) and occurs through learning
and as the learning evolves over time, the competence and value of the BDAC also
develops (Vidgen et al., 2017; Hindle and Vidgen, 2018; Conboy et al., 2020).
However, SMEs are underrepresented in the literature (Bordeleau et al., 2019) and with
story-telling proven to aid with BDA adoption (Boldosova, 2019), there is an absence
of trendsetting use-cases to aid understanding and develop knowledge (Coleman et al.,
2016).
The output of logistics is customer service (Gubbins, 2003); delivering the right product
to the right place at the right time. With logistics including several functions such as
transportation, inventory planning, warehousing and site locations (Kasilingam, 1998),
there is a balance between customer service and logistics cost (Rushton et al., 2010).
Transportation is not only one of the highest cost logistics operations (Güneri, 2007)
but also a key physical interface between a company and their customers – it has a direct
impact on customer service.
Organisations that outsource transportation to a third-party (3PL) or fourth-party (4PL)
logistics provider will likely have a single fixed cost for transportation. For an
in-house operation, there will be both fixed costs and variable costs. According to
Kasilingam (1998) and Rushton et al. (2010), fixed costs include the depreciation of the
vehicle value, excise duty, driver compensation and insurance across the vehicle fleet,
whereas variable costs, such as fuel, oil and maintenance, fluctuate with the distance
each of the vehicles travels.
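The split between fixed and variable costs can be sketched with a simple linear cost model. The function name annual_transport_cost, the per-kilometre rate and all figures below are hypothetical; the point is only that a reduction in distance travelled lowers the variable component while the fixed component is unchanged.

```python
def annual_transport_cost(fixed_cost_per_vehicle, n_vehicles,
                          variable_cost_per_km, km_per_vehicle):
    """Return (fixed, variable, total) cost for an in-house fleet.

    Assumes a simple linear model: fixed costs scale with fleet size and
    variable costs scale with total distance travelled.
    """
    fixed = fixed_cost_per_vehicle * n_vehicles
    variable = variable_cost_per_km * sum(km_per_vehicle)
    return fixed, variable, fixed + variable


# Hypothetical fleet of three vehicles (illustrative figures only).
baseline_km = [30000, 28000, 35000]
reduced_km = [km * 0.9 for km in baseline_km]  # assumed 10% shorter routes

baseline = annual_transport_cost(12000.0, 3, 0.45, baseline_km)
reduced = annual_transport_cost(12000.0, 3, 0.45, reduced_km)

print("baseline (fixed, variable, total):", baseline)
print("reduced  (fixed, variable, total):", reduced)
```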
Therefore, finding the best routes a vehicle should take is a frequent decision problem
in logistics to reduce costs and improve customer service (Güneri, 2007). First
described by Dantzig and Ramser (1959), the Vehicle Routing Problem (VRP) plays a
fundamental role in logistics (Laporte, 1992). The original VRP, also known as the
Capacitated Vehicle Routing Problem (CVRP), designs optimal delivery routes where
each identical vehicle sets out from a central depot, travels a single route, and returns
to the depot. The aim is to find the routes of least-expense for each vehicle such that
each customer is visited only once by only one vehicle and the capacity of the vehicles
is not exceeded (Laporte, 1992; Güneri, 2007; Braekers et al., 2016). Thus, a solution
to the problem minimises variable costs by maximising vehicle usage whilst achieving
requirements of customer service (Rushton et al., 2010).
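The CVRP definition above can be made concrete with a small Python sketch that evaluates a candidate solution: it checks that every customer is visited exactly once, that no route exceeds the vehicle capacity, and sums the cost of each depot-to-depot route. The function name, the distance matrix, the demands and the capacity are all invented for illustration and are not taken from the case data.

```python
def evaluate_cvrp(routes, dist, demand, capacity):
    """Return the total cost of a CVRP solution, or raise ValueError if infeasible.

    routes   -- list of routes, each a list of customer indices (depot 0 is implicit)
    dist     -- square matrix of travel costs between nodes
    demand   -- demand per node (index 0 is the depot)
    capacity -- capacity shared by every (identical) vehicle
    """
    visited = [c for route in routes for c in route]
    customers = list(range(1, len(dist)))
    if sorted(visited) != customers:
        raise ValueError("each customer must be visited exactly once")
    total = 0
    for route in routes:
        if sum(demand[c] for c in route) > capacity:
            raise ValueError("vehicle capacity exceeded")
        # Every route starts and ends at the depot (node 0).
        path = [0] + list(route) + [0]
        total += sum(dist[a][b] for a, b in zip(path, path[1:]))
    return total


# Toy instance (illustrative values only): depot 0 and four customers.
DIST = [
    [0, 4, 5, 6, 7],
    [4, 0, 2, 8, 9],
    [5, 2, 0, 7, 8],
    [6, 8, 7, 0, 3],
    [7, 9, 8, 3, 0],
]
DEMAND = [0, 2, 2, 3, 1]

total = evaluate_cvrp([[1, 2], [3, 4]], DIST, DEMAND, capacity=4)
print(total)
```

A solver searches over many such candidate route sets; this sketch only scores one of them.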
Table 2-2 - Common Vehicle Routing Problem variants (source: adapted from Rincon-Garcia et al. (2017, p. 128))
Variant | Description | Reference
MDVRP | Multiple Depots (to start and end routes from) | Pisinger and Ropke (2007)
TDVRP | Time Dependent – routing varies by the time of day (e.g. congestion) | Malandraki and Daskin (1992)
DVRP | Dynamic – routes adapted as new information arises (e.g. traffic, additional orders etc) | Wilson and Colvin (1977)
Method heuristic starting with a linear sweep East of the red depot will not find the
global optimum solution. However, if the linear sweep started North or South of the
depot it would. In general, metaheuristic methods have ways of evaluating more
potential solutions to select a better solution. A list of example heuristics and
metaheuristics is shown in Table 2-3.
Figure 2-3 - An illustration of The Sweep Method with a suboptimal solution (left)
versus the optimal solution (right) for a 3-node-capacity vehicle (source: author)
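The dependence of the Sweep Method on its starting direction can be sketched in Python. The example below is a minimal illustration, assuming capacity is counted in nodes (as in Figure 2-3) and using invented coordinates; the function names are hypothetical. Customers are ordered by their polar angle around the depot and cut into routes of at most three nodes.

```python
import math


def sweep_routes(depot, customers, capacity, start_angle=0.0):
    """Group customers into routes by sweeping a ray anticlockwise around the depot.

    start_angle is the direction (in radians) where the sweep begins:
    0 points East, math.pi / 2 points North.
    """
    def angle(point):
        raw = math.atan2(point[1] - depot[1], point[0] - depot[0])
        return (raw - start_angle) % (2 * math.pi)

    ordered = sorted(customers, key=angle)
    # Cut the angular ordering into consecutive chunks of `capacity` nodes.
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]


def total_distance(depot, routes):
    """Sum of straight-line route lengths, each starting and ending at the depot."""
    total = 0.0
    for route in routes:
        path = [depot] + route + [depot]
        total += sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    return total


# Invented layout: one cluster East of the depot, one cluster West of it.
DEPOT = (0, 0)
CUSTOMERS = [(6, 0), (5, 1), (5, -1), (-6, 0), (-5, 1), (-5, -1)]

east = sweep_routes(DEPOT, CUSTOMERS, capacity=3, start_angle=0.0)
north = sweep_routes(DEPOT, CUSTOMERS, capacity=3, start_angle=math.pi / 2)

print("sweep from East :", east, round(total_distance(DEPOT, east), 2))
print("sweep from North:", north, round(total_distance(DEPOT, north), 2))
```

In this layout the sweep starting East begins inside the eastern cluster and splits it across two vehicles, whereas the sweep starting North keeps each cluster on one route.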
Table 2-3 - Common methods in the literature for solving Vehicle Routing Problems
(adapted from Güneri (2007) and Gendreau et al. (2008))
Method | Type | Description
Genetic Algorithms | Meta-heuristic | Algorithm that operates in a similar way to natural selection. Solutions evolve over generations with only the best solutions parenting (“crossover”) the next generation. Two parent solutions are combined to create offspring solutions. A mutation operator is applied to each offspring for the next generation. The best solution found is returned.
Greedy Randomised Adaptive Search Procedure | Heuristic | Multiple random candidate solutions are generated before a local search is performed across the candidate solutions. Each element not yet added to the solution is evaluated by a heuristic function and a random element is chosen from the “best” elements stored in a restricted list. The best solution after a specified number of restarts is returned.
Simulated Annealing | Meta-heuristic | Randomised local search method where modifications that increase the cost of the solution can be added with some probability (i.e. there is a chance that the next element is not the best element). Modifications are added at each iteration with a solution kept if it is better than the current solution. The best solution is returned after a set number of iterations. Most likely method to converge to the global optimum.
Tabu Search | Meta-heuristic | Randomised local search method where the best solution is selected as the current solution even if it causes an increase in solution cost. A memory (tabu list) of recently visited solutions is stored to avoid repeated solutions. The best solution is returned after a set number of iterations or consecutive iterations without improvement.
Variable Neighbourhood Search | Meta-heuristic | Local search method that exploits different neighbourhoods to escape local optima. When a local optimum is reached, another neighbourhood is selected and used in the following iterations. The best solution is returned from all neighbourhoods searched.
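As an illustration of one row of Table 2-3, the following Python sketch applies simulated annealing to a small single-vehicle tour. It is a minimal sketch under stated assumptions rather than the method used in this research: the segment-reversal move, the geometric cooling schedule, the parameter values and the coordinates are all choices made for the example.

```python
import math
import random


def tour_length(tour, dist):
    """Length of a closed tour that returns to its first node."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def simulated_annealing(dist, iterations=5000, start_temp=10.0, cooling=0.999, seed=0):
    """Return (best_tour, best_cost) found by a simple simulated annealing run."""
    rng = random.Random(seed)
    current = list(range(len(dist)))
    current_cost = tour_length(current, dist)
    best, best_cost = current[:], current_cost
    temp = start_temp
    for _ in range(iterations):
        # Modification: reverse a random segment, keeping node 0 (the depot) first.
        i, j = sorted(rng.sample(range(1, len(dist)), 2))
        candidate = current[:i] + current[i:j + 1][::-1] + current[j + 1:]
        candidate_cost = tour_length(candidate, dist)
        delta = candidate_cost - current_cost
        # Worse candidates are accepted with a probability that shrinks as temp falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        temp *= cooling
    return best, best_cost


# Invented coordinates; node 0 plays the role of the depot.
POINTS = [(0, 0), (4, 3), (1, 0), (3, 4), (0, 2), (5, 0), (2, 5), (5, 5)]
DIST = [[math.dist(a, b) for b in POINTS] for a in POINTS]

initial_cost = tour_length(list(range(len(POINTS))), DIST)
best, best_cost = simulated_annealing(DIST)
print(round(initial_cost, 2), round(best_cost, 2), best)
```

Accepting occasional cost increases early on is what allows the search to leave a local optimum; as the temperature falls the method behaves increasingly like a plain local search.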
A challenge with the VRP methods and solutions in the literature is their inaccessibility
to practitioners in raw form, owing to the language and knowledge of mathematical
notation required, the expensive software used (e.g. MATLAB), and the fact that the solutions
generated are often very specific and have limited validity in a practical setting
(Kasilingam, 1998; Rincon-Garcia et al., 2017). Therefore, the two common options
left for organisations are the choice of either experience-led, manual heuristics or
purchasing vendor software – Computerised Vehicle Routing Software (CVRS).
From a survey of organisations in 2017, only 25% of medium enterprises and 50% of
large organisations were using CVRS (McCrea, 2017, cited in Fontaine et al., 2020, p.
1). CVRS generally have complex algorithms and geographical representations of the
road network to automate the daily planning of collections and deliveries (Rincon-
Garcia et al., 2017). Because of the complexity of the VRP, the automation from the
software generally improves reliability, reduces fixed costs, decreases operational
costs, with vendors also claiming a 10-30% reduction in mileage and an 80-90%
reduction in planning time over manual heuristics (Bräysy and Hasle, 2014; Rushton et
al., 2014). Differences between CVRS are often due to the algorithms and the map data
used for the road network.
Though CVRS is often easy to use and can quickly recalculate routes when changes
occur, a possible reason for the low adoption is the unspecialised nature of cheaper
software with tailored software solutions generally more expensive (Rushton et al.,
2010; Rincon-Garcia et al., 2017; Carlan et al., 2020; Fontaine et al., 2020).
Additionally, there is a desire for closer association between software developers and
researchers, which perhaps emphasises the differences between suboptimal software
and unrealistic research solutions (Bräysy and Hasle, 2014; Rincon-Garcia et al., 2017;
Vidal et al., 2020).
In the absence of software or complex algorithms, methods for routing tend to fall under
Cluster First, Route Second; Route First, Cluster Second; or follow the Sweep and
Savings methods in Table 2-3 (Kasilingam, 1998; Fontaine et al., 2020). A number of
principles for routing are suggested in the literature (Kasilingam, 1998; Güneri, 2007;
Rushton et al., 2010). These include:
• Combine deliveries on the same day of the week together (temporal coordination)
• Build routes beginning with the farthest stop from the depot
• Use the largest vehicle first to maximise utilisation
• Avoid narrow time windows
• Consider alternate delivery means for remote or low-volume locations
However, the complexity of the VRP means human planning is inadequate in most
cases (Bräysy and Hasle, 2014). Indeed, Fontaine et al. (2020) used participants with
no logistics experience and found that they rarely identified the optimal solution. In
particular, the participants performed poorly at identifying the clusters, but their routing
within the clusters was very close to optimal. Though participants with logistics
experience might have performed better, the manual approach to routing is often labour
intensive, time-consuming, and likely to be embedded with errors and inefficiencies
(Carlan et al., 2020).
2.5. Conclusion
2.5.1. Research Questions
Waller and Fawcett (2013) highlighted the importance of conducting research at the
intersection of the domains of OR and LSCM, yet the number of studies with a BDA
focus in OR literature is low (Mortenson et al., 2015). This includes research on the
value of BDA (Vidgen et al., 2017; Hindle and Vidgen, 2018) and research from a
practical setting (Mortenson et al., 2015; Conboy et al., 2020). In addition, researchers
in LSCM literature have highlighted that more empirical research is needed to better
understand BD and BDA in supply chains in general (Wamba et al., 2015; Kamble and
Gunasekaran, 2019) and in logistics specifically (Hopkins and Hawking, 2018). Logistics is
highlighted as one of the most applicable areas for BDA (Kache and Seuring, 2017)
including to support route-optimisation using data such as telematics, traffic density
and weather (Rozados and Tjahjono, 2014; Hopkins and Hawking, 2018). Thus, there
is an opportunity for BDA to fill the gap between experience-led manual heuristics and
CVRS in vehicle routing:
Adoption of BDA is lower in SMEs than in larger organisations (Eurostat, 2020), which
the literature suggests is a question of differing levels of technology, process, people
and organisational resources and the challenges of achieving BDAC. The literature
often uses cases from large companies where an established analytics function already
exists and generalises to SMEs (e.g. Wamba et al. (2015); Vidgen et al. (2017); Belhadi
et al. (2019)). Additionally, real problem scenarios and cases exploring the barriers to
adoption of BDA within particular contexts are also limited in the literature (Kache and
Seuring, 2017; Shukla and Mattar, 2019) with much understanding built on the
assumption that all organisations face the same challenges (Mikalef et al., 2019).
Therefore, evidence from a real SME will help build these theories:
Research Question 2: What does the example highlight to explain why there is low
adoption of BDA in SMEs?
The aim of this research is to investigate how a BDA solution to a VRP compares to an
experience-led heuristic in an SME and to understand the issues this case highlights in
SME adoption of BDA.
Chapter 3 - Methodology
3.1. Introduction
The aim of this research, the research questions, and the research objectives derived
from the review of the literature are shown in Table 3-1. The following chapter
describes the methodology for the research and includes the research philosophy,
research strategy, data collection instruments, plan for analysis and ethical issues.
Table 3-1 - Table of research questions, aim and objectives (source: author)
3.2.1. Ontology
Ontological assumptions are concerned with the nature of being and reality (Saunders
et al., 2016). O'Gorman and MacIntosh (2015) describe ontology in straightforward
terms as viewing the world as either objective or subjective. An objective ontological
viewpoint assumes a reality consisting of objects that are measurable and testable
whereas a subjective ontological viewpoint assumes a reality as the emergence of the
individual perceptions and interactions of individuals (O'Gorman and MacIntosh,
2015). Although there is a large objective element to the research with the application
of BDA to solve a VRP, this analysis is grounded within the context, case, and reality
in which it occurs. The value of the research lies in interpreting the results with
respect to this context so that they are of practical use; as highlighted in the literature
review, such practical interpretation is a considerable gap within the literature. Thus,
the research follows a mixed ontology.
3.2.2. Epistemology
according to a different set of epistemological standards” than traditional science. Yet,
with the focus of BDA on the creation of practical functioning code, algorithms and
solutions (Lowrie, 2017; Mehozay and Fisher, 2019), it aligns with the pragmatic
epistemology. Therefore, this research follows a pragmatic epistemology.
3.2.3. Axiology
Axiology refers to the role of values and ethics within the research process (Saunders
et al., 2016) with the values informing the bias the researcher brings to the research
(O'Gorman and MacIntosh, 2015). In positivism and objectivity, the axiological
assumption is that the research is free from bias with the researcher seeking to minimise
the influence of values (Teddlie and Tashakkori, 2009). In interpretivism, by contrast, the
axiological assumption is that the research is value-bound and biased due to the researchers
actively employing subjective interpretations of the data (O'Gorman and MacIntosh,
2015). In pragmatic epistemology, axiological assumptions are often overlooked in
research (Biddle and Schafft, 2015). With pluralistic methods employed by research
with a pragmatic epistemology, it is likely that parts of the research will contain bias
and other parts will not (Creswell and Plano Clark, 2011). The axiological assumptions
of this research will be value-bound and contain bias largely through the analysis and
interpretation of the data in relation to the context.
There are three common approaches to theory development: Deduction, Induction and
Abduction (Saunders et al., 2016). In deduction, research tests existing theories through
hypothesis testing, whereas in induction, theory is derived from the research; these
are common analytic strategies in positivist and interpretivist epistemologies
respectively (Bryman, 2012). Conversely, abduction begins with an inductive approach
followed by the testing of the modified or generated theories (Kovács et al., 2005). This
research is inductive as it builds theory and does not systematically test existing theory.
BDA also tends to follow an inductive approach (Mortenson et al., 2015) and the aim
is to provide an example of BDA to build on existing theory of BDA adoption in SMEs.
The research strategy is the general plan of how the research will be undertaken to
answer the research questions (Saunders et al., 2016). Though some literature
associates research methods with research purposes (e.g. Creswell (2003)), Yin (2018)
suggests that each research method can be used for all research purposes and the
selection of the research strategy depends on three conditions: (i) the form of the
research question, (ii) the level of control over events required and (iii) the recency of
the events. In this research, the research questions contain “how”, “what” and “why”
which perhaps indicates an explanatory research purpose (Saunders et al., 2016).
Explanatory research tends to be associated with experiments, archival research
or case studies (Yin, 2018). Archival research assesses change over time (O'Gorman
and MacIntosh, 2015) with no control over the historic events (Yin, 2018). Conversely,
experiments tend to investigate causal links through controlled manipulation of
independent variables and measurement of dependent variables (O'Gorman and
MacIntosh, 2015). However, neither archival research nor experiments are suitable for
this research. Whilst this research tries to understand why a change is occurring, the
research focus is on observing and analysing a present problem within context to
understand and explain the low adoption of BDA in SMEs. Moreover, as identified by the
literature review, there is limited literature featuring practical examples of BDA. Since
a case study investigates a contemporary phenomenon within its real-world context
(Yin, 2018), the research strategy is a case study. A single case study is used that
represents the common case of an SME that has not adopted BDA as this is likely to
provide insight into a typical, more prevalent situation (Saunders et al., 2016). Though
case studies are often criticised for lacking generalisability (Saunders et al., 2016),
the aim of this research is to expand theories of the low adoption of BDA and
not to extrapolate to the entire SME population (Yin, 2018).
The case for this research is an SME called Shortridge Limited. Shortridge trace their
business roots back to 1845 in providing laundry services. The organisation have an
annual turnover of £9.5m and employ an average of 246 staff (Shortridge Ltd., 2018) to
provide quality linen hire and laundry services to businesses in the North of England
and Scotland. Typically, the company service a range of industries and business sizes,
predominantly hospitality – hotels, B&Bs, holiday parks. Across three sites
(Workington, Dumfries and Darlington), Shortridge operate a fleet of 32 vehicles to
collect and deliver linen to customer sites up to six times per week. Shortridge have
faced challenges with vehicle routing relying on intuition and experience-led heuristics
from their Transport team. In July 2020, Shortridge implemented CVRS, Max Optra,
which is claimed to reduce operational costs by up to 20% at an annual price of £600
per vehicle (MaxOptra, 2020) – roughly £19k for the Shortridge vehicle fleet.
Case study research commonly draws on multiple sources of data, including both
quantitative and qualitative, which converge in a triangulating fashion (Yin, 2018).
Using both types for data collection is called mixed methods (Creswell, 2003). Though
there are disadvantages associated with mixed methods research, like the extra skills
and resources required, a great advantage is that the strength of one method can offset
the weakness of another method (Creswell and Plano Clark, 2011). In particular,
qualitative methods can aid the explanation and utility of quantitative results (Bryman,
2012) that are often weak in understanding context when used alone (Creswell and
Plano Clark, 2011). Therefore, mixed methods are employed in this research in an
embedded design with an emphasis on the quantitative element (QUAN) and
integration of results during the interpretation (see Figure 3-1).
Data is considered primary if it originates for the specific purpose of the research
(O'Gorman and MacIntosh, 2015). Primary data was primarily qualitative and collected
through a self-completion questionnaire using Qualtrics (see Appendix 1). The
questionnaire consisted of 6 questions plus 2 questions for consent at the beginning,
and 2 questions for demographic information at the end. With the aim of the qualitative
questionnaire to explain the context of the quantitative results, the questionnaire was
weighted towards open questions to capture the participants’ own words and
understanding as much as possible (Bryman, 2012). Following feedback from a pilot
with other students, two questions were posed as closed questions to help clarify the
question meaning (Bryman, 2012). The questions centre around how the employees in
the organisation solve problems (Q3), their understanding of BDA (Q4 & Q5),
the barriers to using it (Q6) and the tools associated with BDA (Q7). Other qualitative
data collection instruments such as interviews potentially offer richer data collection;
however, they are time-consuming for both the researcher and the organisation
(Bryman, 2012). With the qualitative element having less emphasis in the research and
mixed methods research considered intensive, a questionnaire was chosen for
convenience and efficiency (Creswell and Plano Clark, 2011; Bryman, 2012). The
“online” mode of administration was selected for similar reasons (Rosenfeld et al., 1993).
In recognition of the potential difference in response rates between modes of
administration (Bowling, 2005), a Director of the organisation distributed the
questionnaire as self-completion questionnaires typically have a low response rate
(Bryman, 2012).
Participants for the questionnaire were purposefully sampled from across the office staff at the
organisation. It was administered to participants in the organisation over email and was
completely anonymous. The questionnaire link was shared with 21 employees and 13
responses were received, a response rate of 61.9%.
Secondary data was sought from the academic literature to provide an understanding of
the background and current state of research into BDA, its adoption within SMEs and
the background to VRP. This data also was used in the formation of the codes used in
the qualitative data analysis.
To identify the historic routes and the vehicles used by Shortridge, telematics data was
extracted from the 3rd party telematics vendor portal: PRS telematics. Each vehicle is
fitted with a telematics device that transmits the GPS location of the vehicle when the
engine is switched on. A Transport Manager at Shortridge extracted a sample from the
vendor cloud portal for analysis in Comma Separated Values (.csv) format, covering
the entirety of February 2020 (12,662 rows, 14 columns, 2MB), and transferred it via
email. Additional secondary data was collected to supplement the telematics data
during the analysis (Table 3-2). February 2020 was the most recent month
of normal business activity for Shortridge, as restrictions imposed by the UK Government to
combat the Coronavirus pandemic took effect throughout March 2020 (UK
Government, 2020b). The one-month sample of automatically generated telematics data
may or may not meet the “Volume” characteristic of BD, but the combination of
secondary data used, the varied formats and the method of access suggest a presence of
other BD characteristics. In addition, building a routing solution to a bespoke problem is
considered innovation, so “Volume” is a less important BD characteristic
(Ghasemaghaei and Calic, 2020).
Table 3-2 - Additional secondary data collected during the research (source: author)
Table 3-3 - Summary of the approach to data analysis and how it aligns with research
aims and objectives (source: author)
Template analysis is a form of thematic analysis and is suitable for most analysis
approaches and forms of qualitative data, including questionnaire responses (King,
2012; Saunders et al., 2016). Template analysis is both systematic and flexible,
however, it has been criticised by some researchers for the focus on the template rather
than the data (Saunders et al., 2016). Alternative techniques would be content analysis,
which codes qualitative data in order to analyse it quantitatively (Saunders et al., 2016),
or grounded theory, a recursive analysis that generates theory due to close alignment of
the analysis and theory (Bryman, 2012). However, due to the expected small sample of
responses, treating the data as quantitative is unlikely to yield useful findings or be
substantial enough to generate new theory. Template Analysis better supports the
purpose of this research to supplement existing theory.
For the analysis, an initial coding template for each of the questions was generated a
priori using codes generated from the literature review (King, 2012). The codes were
then modified during the analysis of the questionnaire responses (the initial template
is shown in Appendix 8). Questionnaire responses were downloaded from Qualtrics and
analysed in Microsoft Excel.
BDA generally requires an explorative and inductive approach as the analysis often
starts from a dataset rather than theory and requirements (Mortenson et al., 2015;
Kayser et al., 2018; Chehbi-Gamoura et al., 2020). There is limited literature on
methods of approaching Big Data Analysis (Hindle and Vidgen, 2018). The six-step
method proposed by Akter et al. (2019) was adapted for this research (Table 3-4).
All programming and analyses were conducted in Jupyter (v6.0.3) using Python (v3.8.3
64bit) programming language. Microsoft Excel was also used to store data and collate
results. To perform the routing optimisation, commercial solvers like Gurobi
Optimization (2020) and CPLEX from IBM (2020) were excluded due to cost. Two
open-source solvers were considered: Google OR-Tools (2020b) and the “vrpy” library
from Montagné and Sanchez (2020). Because the “vrpy” package was in a beta development
phase, Google OR-Tools was selected.
environment tools | perform the analysis | Installation of local routing engine (OSRM) built on the OpenStreetMap data (Luxen and Vetter, 2011) in a docker container
Ethics are the standards of behaviour that guide conduct and the rights of the
participants during the research (Saunders et al., 2016). Bryman (2012) refers to four
principles of ethics: harm, consent, deception, and privacy. For this research, both the
organisation and the employees participating in the questionnaire are participants. A
separate information sheet was shared with the organisation and was also attached to
the email with the online questionnaire link (Appendix 9). The information sheet
explained the study, what information was being requested, and the
participant’s right to anonymity and right to withdraw. Consent was sought from
the organisation and the questionnaire had two questions related to consent to
participate in the research. To align with GDPR (2018), the minimum amount of data
was requested, and the data was kept confidentially and securely on the Heriot-Watt
University OneDrive. The data will also be retained for no longer than required by the
assessment process (CDRC, 2018).
3.8. Conclusion
Chapter 4 - Findings and Data Analysis
4.1. Introduction
This chapter presents the findings and the data analysis from the research methods
described in the previous chapter. The chapter begins with the quantitative BDA of the
telematics data and the comparisons in routing between the derived algorithms and is
followed by the qualitative findings of the online questionnaire of the employees.
There are six analyses: an initial descriptive analysis of the telematics dataset to explore,
understand and measure the experience-led heuristic (original) routing, followed by five
comparative analyses of the experience-led heuristic routing against five versions of the
BDA solution to routing (algorithm) (Table 4-1). The five versions of algorithm routing
were developed iteratively in Python (v3.8.3 64bit) using the Google OR-Tools
(2020b) constraint programming solver.
Analysis | Description
1 Route level comparison of original with TSP | Applying the Travelling Salesperson Problem (TSP) to the original routes
2 Depot level comparison of original with CHVRP (capacity = number of stops) | Algorithm builds the same number of routes from the nodes serviced from the same depot on the same day but is restricted to using the same number of stops from the original routes
3 Depot level comparison of original with CHVRP (capacity = derived) | As Analysis 2, but customer demand is calculated, and vehicle capacity is used instead of number of stops
4 Depot level comparison of original with CHVRP (capacity = derived, all vehicles) | As Analysis 3, except any vehicle at the depot can be used
5 Depot level comparison of original with TCHVRP (capacity = derived, all vehicles) | As Analysis 4, with additional constraint of maximum route time of 9 hours and 10 minutes service time per node (to meet UK driving limits (UK Government, 2020a))
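The time constraint added in Analysis 5 can be illustrated with a short Python check. This is a sketch under one reading of the constraint, namely that travel time plus 10 minutes of service at every node must not exceed 9 hours; the function names and the duration matrix (in minutes) are invented for the example.

```python
MAX_ROUTE_MINUTES = 9 * 60        # maximum route time of 9 hours
SERVICE_MINUTES_PER_NODE = 10     # service time allowed at each node


def route_minutes(route, duration_min, depot=0):
    """Travel time from depot through the route and back, plus service time per node."""
    path = [depot] + list(route) + [depot]
    travel = sum(duration_min[a][b] for a, b in zip(path, path[1:]))
    return travel + SERVICE_MINUTES_PER_NODE * len(route)


def within_time_limit(route, duration_min, depot=0):
    """True if the route satisfies the assumed 9-hour limit."""
    return route_minutes(route, duration_min, depot) <= MAX_ROUTE_MINUTES


# Invented duration matrix in minutes (node 0 is the depot).
DURATION = [
    [0, 60, 80, 270],
    [60, 0, 30, 200],
    [80, 30, 0, 180],
    [270, 200, 180, 0],
]

print(route_minutes([1, 2], DURATION), within_time_limit([1, 2], DURATION))
print(route_minutes([3], DURATION), within_time_limit([3], DURATION))
```

In a routing solver the same rule is normally expressed as an upper bound on a cumulative time dimension rather than checked after the fact.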
The analysis followed the process as outlined in the methodology (see Table 3-4) and
key preliminary activity is presented before the results and analysis.
4.2.1. Problem Context
Linen is delivered in separate laundry bags for each customer weighing 10-15kg on
average for both soiled and clean linen. The bag weight is highly variable, depending
on the order size and due to differences in packing between depot and the customers,
with bags of dirty linen known to have been in excess of 25kg. Delivery and collection
can be at any point during the day with customers having specific instructions on where
the exchange of linen needs to occur. Larger customers, such as hotels, are serviced
more often, generally have larger orders and tend to have laundry delivered in roll cages
in a 12T or 18T vehicle.
Table 4-2 - Shortridge vehicle fleet (types and capacities) (source: author)
The routes follow a weekly cycle with different geographical areas serviced on different
days (e.g. North Pennines on a Wednesday). In addition, some larger customers are
serviced daily with smaller and medium customers once or twice per week. The routes
are manually constructed, largely unchanged week-to-week and rarely re-evaluated or
optimised. Existing customers without an order for that day are excluded from the route
and new customers are added to existing routes based on demand and the existing
customers they are closely located to. This approach fosters a routine that gives the
drivers familiarity and is easier to manage and administer.
Customer demand is seasonal with demand peaking in both the summer and around
Christmas and New Year. To cope with the rise in demand, routes switch to “Summer
Routes” which are longer, use larger vehicles, and sometimes include a driver’s mate
to help with deliveries and collections. The switch to “Summer Routes” tends to occur
in April, lasting until early October, and again in mid-December, lasting until mid-January.
As highlighted by Hazen et al. (2014), poor data quality can lead to incorrect decisions,
inaccurate insights and reduced value. Thus, the first step of the data exploration was
to identify the Critical Data Elements (Table 4-3) for the analysis and transform the
data into an appropriate format for analysis.
The transforming and cleansing of the February dataset with 12,662 rows of data is
described in detail in Appendix 3. This led to the identification of 1720 unique
postcodes (or nodes) which is greater than the 1000 customers Shortridge estimated
their customer base to be, indicating noise in the data. Ideally, these nodes would be
cross-referenced with the customer base or the nodes would directly come from the
customer base. However, the aim of the analysis is to illustrate that BDA can be used
to provide solutions to the VRP as an alternative to experience-led heuristics.
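The step of reducing journey rows to unique postcode nodes can be sketched with the Python standard library. The example below is an illustration only: the cleansing actually applied is described in Appendix 3, so the normalisation rule here (upper-casing and collapsing whitespace) is an assumption, and the sample rows are invented. Only the column name “End Postcode” is taken from the telematics extract.

```python
import csv
import io


def unique_end_postcodes(csv_text):
    """Return the set of distinct, normalised end postcodes in a telematics extract."""
    reader = csv.DictReader(io.StringIO(csv_text))
    nodes = set()
    for row in reader:
        raw = (row.get("End Postcode") or "").strip().upper()
        if raw:
            # Collapse repeated whitespace so "ca14  2ab" and "CA14 2AB" match.
            nodes.add(" ".join(raw.split()))
    return nodes


# Invented sample rows using a subset of the extract's columns.
SAMPLE = """Registration,End Time,End Postcode
AB12 CDE,2020-02-03 08:15,CA14 2AB
AB12 CDE,2020-02-03 09:40,ca14  2ab
FG34 HIJ,2020-02-03 10:05,DG1 1AA
FG34 HIJ,2020-02-03 11:30,
"""

nodes = unique_end_postcodes(SAMPLE)
print(len(nodes), sorted(nodes))
```

Comparing the resulting count with the known size of the customer base is one quick way to gauge how much noise remains in the nodes.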
Table 4-3 - Identification of the critical data elements from the telematics dataset
(source: author)
Column | Description | %Populated | Critical Data Elements*
Registration | Registration of the vehicle (number plate) | 100% | Critical
Start Time | Datetime when journey started | 100% | Low
End Time | Datetime when journey ended | 100% | Critical
Start Location | Description of location/Address where vehicle starts journey | 100% | Low
End Location | Description of location/Address where vehicle ends journey | 100% | Low
Start POI | “Shortridge Darlington” or empty | 8% | No
End POI | “Shortridge Darlington” or empty | 8% | No
Driver | Name of driver | 21% | No
Duration | Length of time the journey took | 100% | No
Idle | Length of time spent stationary on the journey | 100% | No
Miles | Distance covered on the journey | 100% | No
Max Speed | Highest speed attained on the journey | 100% | No
Start Postcode | Postcode at the start of the journey | 100% | Low
End Postcode | Postcode at the end of the journey | 100% | Critical
*Critical – fundamental for analysis, Low – used in data cleansing activity, No – not used
The baseline datasets for analysis consist of 630 routes and 72 depot-day pairs. All
distances and durations in the datasets use the same distance and duration matrix from
OSRM to ensure comparability (see 4.2.2.3.)
4.2.2.2. Geocoding
Geocoding was undertaken via HTTP requests to the GetTheData API, with the JSON
responses parsed within the Python code. The function to do this, built using guidance
from Andrade (2018), is shown in Figure 4-1. For the 1720
nodes, this took roughly 20 minutes to complete.
Figure 4-1 - Python code used to geocode the postcodes from the telematics data
(source: author)
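The general shape of such a geocoding function can be sketched as follows. This is not the code from Figure 4-1: the HTTP call is replaced by an injected `fetch` callable so that the example runs offline, and the response layout (a `status` field and a `data` object holding `latitude` and `longitude` strings) is an assumption made for illustration rather than the documented GetTheData schema. The postcode and coordinates are invented.

```python
import json


def geocode(postcodes, fetch):
    """Map each postcode to (latitude, longitude) using fetch(postcode) -> JSON text.

    Postcodes without a usable match are skipped rather than raising an error.
    """
    coords = {}
    for postcode in postcodes:
        payload = json.loads(fetch(postcode))
        data = payload.get("data") or {}
        # Assumed response layout; adjust the keys to the API actually used.
        if payload.get("status") == "match" and "latitude" in data and "longitude" in data:
            coords[postcode] = (float(data["latitude"]), float(data["longitude"]))
    return coords


def fake_fetch(postcode):
    """Stand-in for the HTTP request, returning canned JSON for one known postcode."""
    known = {"CA14 2AB": ("54.64", "-3.54")}
    if postcode in known:
        lat, lon = known[postcode]
        return json.dumps({"status": "match", "data": {"latitude": lat, "longitude": lon}})
    return json.dumps({"status": "no_match"})


coords = geocode(["CA14 2AB", "ZZ99 9ZZ"], fake_fetch)
print(coords)
```

Passing the fetcher in as an argument also makes it straightforward to add caching or rate limiting around the real request.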
The setup for a local routing engine, OSRM, is described in Appendix 4. The distance
and duration matrices are retrieved from OSRM via an HTTP request of longitude and
latitude tuples to the backend of the OSRM using the code shown in Figure 4-2.
Figure 4-2 - Python code to request distance and duration matrix from local OSRM
(source: author)
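A request of this kind can be sketched with the Python standard library against the OSRM `table` service. This is a minimal sketch and not the code from Figure 4-2: the host and port (`http://localhost:5000`), the `driving` profile name and the function names are assumptions about a local installation. OSRM expects coordinates as `longitude,latitude` pairs separated by semicolons and returns distances in metres and durations in seconds.

```python
import json
from urllib.request import urlopen


def osrm_table_url(coords, host="http://localhost:5000", profile="driving"):
    """Build a table-service URL from a list of (longitude, latitude) tuples."""
    path = ";".join(f"{lon},{lat}" for lon, lat in coords)
    return f"{host}/table/v1/{profile}/{path}?annotations=distance,duration"


def parse_table(response_text):
    """Extract the (distances, durations) matrices from a table-service response."""
    body = json.loads(response_text)
    if body.get("code") != "Ok":
        raise RuntimeError(f"OSRM request failed: {body.get('code')}")
    return body["distances"], body["durations"]


def fetch_matrices(coords, **kwargs):
    """Request both matrices from a running OSRM backend (needs a live server)."""
    with urlopen(osrm_table_url(coords, **kwargs)) as response:
        return parse_table(response.read().decode("utf-8"))


# Invented coordinates; printing the URL does not require a running server.
url = osrm_table_url([(-3.5, 54.6), (-3.6, 55.1)])
print(url)
```

For a large number of nodes the coordinate list makes the URL long, so requests may need to be split into blocks of sources and destinations.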
The routing algorithm was set up using Google OR-Tools (2020b) guidance. Analyses
1 and 2 use the heuristic “PATH_CHEAPEST_ARC” for the first solution and the
metaheuristic “GUIDED_LOCAL_SEARCH” to refine the solution as recommended
by Google OR-Tools (2020a). In Analyses 3, 4 and 5, the heuristic was changed to
“PARALLEL_CHEAPEST_INSERTION” which improved the number of solutions
found and followed guidance from the developers (see Furnon (2017)). A time-limit
was implemented for each routing problem in the analysis as the metaheuristic will
otherwise run indefinitely (see Appendix 5).
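The two-stage pattern described above (a construction heuristic for the first solution, then an improvement search bounded by a time limit) can be illustrated without the solver. The Python sketch below is a simplified stand-in and does not use OR-Tools: a nearest-neighbour construction plays the role of a cheapest-arc first solution, and a plain 2-opt local search stands in for the metaheuristic; it is not Guided Local Search. All names and coordinates are invented for the example.

```python
import math
import time


def tour_cost(tour, dist):
    """Cost of a closed tour that returns to its first node."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


def cheapest_arc_tour(dist, start=0):
    """First solution: repeatedly extend the path along the cheapest arc available."""
    unvisited = set(range(len(dist))) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda n: (dist[last][n], n))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def improve_with_time_limit(tour, dist, time_limit_s=1.0):
    """Refine a tour with 2-opt segment reversals until no gain or the time limit."""
    deadline = time.monotonic() + time_limit_s
    best, best_cost = tour[:], tour_cost(tour, dist)
    improved = True
    while improved and time.monotonic() < deadline:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cost = tour_cost(candidate, dist)
                if cost < best_cost:
                    best, best_cost, improved = candidate, cost, True
    return best, best_cost


# Invented coordinates; node 0 plays the role of the depot.
POINTS = [(0, 0), (2, 3), (5, 1), (1, 4), (6, 4), (3, 0), (4, 5)]
DIST = [[math.dist(a, b) for b in POINTS] for a in POINTS]

first = cheapest_arc_tour(DIST)
best, best_cost = improve_with_time_limit(first, DIST, time_limit_s=1.0)
print(round(tour_cost(first, DIST), 2), round(best_cost, 2), best)
```

Unlike this 2-opt stand-in, which stops by itself at a local optimum, a metaheuristic keeps searching for better solutions, which is why an explicit time limit is needed in the solver.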
The objective of the routing algorithm was to minimise distance travelled. Analysis 1
was also simulated for minimising route duration and there were no substantial
differences in results between the two variables (see Table 4-3). Distance was selected
because duration is much more vehicle dependent (e.g. maximum speeds) and, since
the default vehicle profile was installed in the routing engine, OSRM, it is more likely
to be inaccurate (see Appendix 4).
4.2.2.5. Assumptions
Table 4-4 - Assumptions for each version of the routing algorithm analysis (source: author)
Assumption | Analyses
iii. Order volume and customer demand are the same for each node | 1,2
vii. 18T vehicles are primarily used for trunking. These are excluded to simplify the analysis. Further modifications to the algorithm would be required for their inclusion (e.g. customers that can be serviced by certain vehicles) | 2,3,4,5
viii. 6 is the maximum times a customer/node can be visited per week (Shortridge Ltd., 2020) | 3,4,5
ix. Vehicles rarely operate at 100% capacity | 3,4,5
4.2.2.6. Derived Customer Demand (CoODV)
Analyses 3, 4 and 5 consider the capacity of the vehicle in route selection. Real demand
data was not used in the analysis, so demand was derived to provide an understanding of
the impact of capacity constraints on the routes. Customer demand is measured in bags
and was derived based on assumption v. (Table 4-4): customers that are visited most
frequently tend to be larger customers and have larger orders. Thus, customer demand
was derived as a function of the frequency of deliveries per week and the capacity of
the vehicle that originally made the delivery, termed CoODV. An example of how this
was calculated is shown in Figure 4-3, with further detail in Appendix 6.
Figure 4-3 - Example of how demand was derived based on delivery frequency and
capacity of original delivery vehicle (source: author)
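One way such a derivation could work is sketched below. The exact formula is given in
Figure 4-3 and Appendix 6 and is not reproduced here, so this is an assumption: the
demand on a route is taken to be a chosen percentage of the original vehicle's
capacity, shared between the route's nodes in proportion to how often each node is
visited per week. The function name and the sample values are hypothetical.

```python
def derive_demand(visits_per_week, vehicle_capacity, utilisation_pct=95):
    """Split a share of the original vehicle's capacity across a route's nodes.

    ASSUMPTION: demand is proportional to weekly visit frequency; this is an
    illustrative rule, not necessarily the one used in the dissertation.
    """
    total_visits = sum(visits_per_week.values())
    bags_on_route = vehicle_capacity * utilisation_pct / 100
    return {
        node: bags_on_route * visits / total_visits
        for node, visits in visits_per_week.items()
    }


# Hypothetical route served by a 500-bag vehicle at 95% CoODV.
demand = derive_demand({"A": 1, "B": 2, "C": 2}, vehicle_capacity=500)
```

Under this rule the nodes visited twice a week receive twice the demand of the node
visited once, and the route's total demand stays below the vehicle's capacity.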
Analyses 1, 2 and 3 use the original delivery vehicles whereas Analysis 4 and 5 expand
to include all delivery vehicles at the depot.
Table 4-5 - Number of each vehicle type at each depot (source: author)
The five key findings are presented below, with a summary of the results from the five
comparative analyses shown in Table 4-6.
H00335623 Edmund Houldridge
Table 4-6 - Summary of the five analyses setups and results (source: author)

Analysis
1) Route level comparison of original with TSP
2) Depot level comparison of original with CHVRP (capacity = number of stops)
3) Depot level comparison of original with CHVRP (capacity = derived)
4) Depot level comparison of original with CHVRP (capacity = derived, all vehicles)
5) Depot level comparison of original with TCHVRP (capacity = derived, all vehicles)

Assumptions
1) i, ii, iii
2) i, ii, iii, iv, vii
3) i, ii, iv, vi, vii, viii, ix
4) i, ii, iv, vi, vii, viii, ix
5) i, ii, iv, v, vi, vii, viii, ix, x

Scope
1) 490 routes that began and end at the same depot and had more than one stop
2) 72 depot-day pairs (472 routes, routes operated by 18T vehicles excluded, 12T
vehicle PX19 LCJ also excluded)
3) 72 depot-day pairs (472 routes, routes operated by 18T vehicles excluded, 12T
vehicle PX19 LCJ also excluded)
4) 72 depot-day pairs (472 routes, routes operated by 18T vehicles excluded, 12T
vehicle PX19 LCJ also excluded). Entire vehicle fleet in scope
5) 72 depot-day pairs (472 routes, routes operated by 18T vehicles excluded, 12T
vehicle PX19 LCJ also excluded). Entire vehicle fleet in scope. Maximum route
duration at 9 hours. 10 minutes per node

Heuristic /
1) PATH_CHEAPEST_ARC /
2) PATH_CHEAPEST_ARC /
3) PARALLEL_CHEAPEST_INSERTION /
4) PARALLEL_CHEAPEST_INSERTION /
5) PARALLEL_CHEAPEST_INSERTION /

Results (nine simulation columns, in table order)
Time taken to run: 8 hours | 8 hours | 11.5 hours | 15 hours | 15 hours | 15 hours | 15 hours | 15 hours | 15 hours
% scope with solutions found: 100% | 100% | 100% | 100% | 99% | 100% | 99% | 96% | 78%
% total reduction in distance: 5% | 4% | 20% | 30% | 24% | 31% | 25% | 25% | 21%
% total reduction in time: 4% | 4% | 17% | 25% | 20% | 26% | 21% | 13% | 11%
% total reduction in average vehicles used: N/A | N/A | N/A | N/A | N/A | 19% | 8% | 18% | 17%
4.2.3.1. Key Finding 1 – The Original Routing operates a similar number of routes each
weekday despite a drop in nodes/customers on a Wednesday
In February 2020, Shortridge operated 630 routes covering 1720 unique nodes
(including 3 depot nodes). 111 (18%) of these routes involved the 18T vehicles for
trunking. Figure 4-4 displays the number of routes operated each day in February
(excluding Trunk vehicles) with a clear distinction between weekends and weekdays.
The number of nodes serviced each day follows a similar pattern (Figure 4-5). The main
exception to the pattern is Wednesdays, when the average number of nodes serviced is
less than half the average for the other weekdays (172.8 vs 370.7), yet Figure 4-4
shows that a similar number of routes are operated.
Figure 4-4 - Time series of the number of routes operated in February (source: author)
Figure 4-5 – Time series of the number of nodes serviced per day during February
(source: author)
On weekdays the mean and median number of routes is 24, reducing to 4 on weekends,
and this pattern is broadly stable, as shown by the standard deviation of ± 2 routes
(Table 4-7). The number of nodes serviced per day has a greater variance, especially on
weekdays, with a range of 294 nodes largely driven by the differences on a Wednesday
(Table 4-8).
Table 4-7 - Descriptive statistics of the number of routes per day in February (source:
author)
Table 4-8 - Descriptive statistics of the number of nodes serviced per day (source:
author)
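The kind of summary reported in Table 4-7 and Table 4-8 can be sketched with the
Python standard library. The daily counts below are made up for illustration and are
not the telematics data; the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median, stdev


def summarise_by_day_type(counts_per_day):
    """Group daily counts into weekday/weekend and describe each group."""
    groups = defaultdict(list)
    for day, count in counts_per_day.items():
        # date.weekday() returns 5 for Saturday and 6 for Sunday.
        groups["weekend" if day.weekday() >= 5 else "weekday"].append(count)
    return {
        label: {
            "mean": mean(values),
            "median": median(values),
            "std": stdev(values) if len(values) > 1 else 0.0,
            "range": max(values) - min(values),
        }
        for label, values in groups.items()
    }


# Illustrative (made-up) routes per day for one week in February 2020.
routes_per_day = {
    date(2020, 2, 3): 24,  # Monday
    date(2020, 2, 4): 26,  # Tuesday
    date(2020, 2, 5): 22,  # Wednesday
    date(2020, 2, 8): 4,   # Saturday
    date(2020, 2, 9): 4,   # Sunday
}
summary = summarise_by_day_type(routes_per_day)
```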
Figure 4-6 shows the total distance covered and the combined duration of all vehicle
journeys in February. The pattern of the chart mirrors the pattern of the nodes per day
with a similar distance and duration covered each weekday except for Wednesday
(Figure 4-5). A possible inefficiency is highlighted: as many routes are operated on a
Wednesday for a reduced number of nodes, even though total distance and total duration
are also reduced. Unless each vehicle is at 100% capacity, this perhaps implies
underutilisation of the vehicles being used.
Figure 4-6 - Time series of total distance (left) and total duration (right) of routes
during February (source: author)
Figure 4-7 shows that the Wednesday routes cover the North Pennines and North
Yorkshire Dales and visually there appears no spatial reason to have as many routes
operating. In addition, Figure 4-8 shows that these Wednesday nodes are predominantly
serviced once per week which suggests they are not Shortridge’s larger customers and
thus, it is unlikely all the vehicles are at capacity.
Figure 4-7 - Map of nodes visited in February 2020 with red denoting location visited
on a Wednesday (source: author)
Figure 4-8 - Map of how many times per week a node is serviced: grey - once, black -
twice and red – three and more (source: author)
4.2.3.2. Key Finding 2 – Algorithm routing reduced the total route distances (and
duration) in all comparative analyses
All versions of the Algorithm routing reduced the total distance travelled throughout
February by between 4-31%. Total duration of routing was also reduced by between 4-
26%.
Analysis 1 shows that simply reordering the delivery route to minimise the distance
travelled can result in a saving of 5,385km (5%). Most routes have either a reduction
of less than 5% in distance travelled (39%) or no improvement at all (31%) under
Algorithm routing; however, some routes achieve reductions in distance of up to
36% (Figure 4-9).
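The distinction between the total reduction and the per-route reductions plotted in
Figure 4-9 can be made concrete with a short sketch. The distances below are
invented; the point is that the total figure is weighted by route length, so it need
not equal the average of the per-route percentages.

```python
def pct_reduction(original, algorithm):
    """Percentage saved by the algorithm relative to the original value."""
    return 100.0 * (original - algorithm) / original


def summarise_reductions(routes):
    """routes: list of (original_km, algorithm_km) pairs, one per route."""
    total_original = sum(original for original, _ in routes)
    total_algorithm = sum(algorithm for _, algorithm in routes)
    per_route = [pct_reduction(o, a) for o, a in routes]
    return pct_reduction(total_original, total_algorithm), per_route


# Two hypothetical routes: one unchanged, one shortened by a quarter.
total_pct, per_route_pct = summarise_reductions([(100, 100), (200, 150)])
```

Here the per-route reductions are 0% and 25% (a simple average of 12.5%), while the
total reduction is 50km out of 300km, or roughly 16.7%.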
Figure 4-9 - Histogram of percentage difference in route distance (left) and duration
(right) between Original and Algorithm routes in Analysis 1 (source: author)
It might be expected that the greater the number of nodes in a route, the greater the
difference between the Algorithm routing and Original routing. However, there is only
a weak positive correlation for both route distance and route time (0.44 and 0.36
respectively using Kendall’s Tau) (Figure 4-10), indicating that additional variables
are needed to explain the difference.
Figure 4-10 - Scatter graphs of the number of stops versus %difference between the
Original and Algorithm routing in route distance (left) and time (right) in Analysis 1
(source: author)
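For readers unfamiliar with the statistic, Kendall's Tau compares every pair of
observations and counts how often the two variables move in the same direction
(concordant) or in opposite directions (discordant). The sketch below implements the
tau-b variant, which adjusts for ties; the source does not state which variant was
used, so that choice is an assumption, as is the function name.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation for two equal-length sequences."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx != 0 and dy != 0:
            if dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denominator = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denominator == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denominator
```

A value of 1 means the rankings agree perfectly, -1 means they are reversed, and
values near 0 indicate little monotonic association.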
in the shape of the routes with the Original route seemingly criss-crossing, back and
forth and the Algorithm route more circular (note the distances are road distances, only
the visuals are as-the-crow-flies). By simply reordering the nodes, 58km is saved.
There may be extraneous factors behind the order of the Original routes that are
overlooked by the Algorithm and the underlying routing engine. For example, traffic
might build up at different points in the route, or one node may have had a
particularly urgent order. However, decision-making should factor in that covering
this extra distance increases the variable costs of the overall operation, and the two
should be balanced accordingly.
Figure 4-12 – Map plot of Original route “2020-02-26+PX19LCJ” with the red tooltip
indicating the depot (source: author)
Figure 4-13 - Map plot of Algorithm route “2020-02-26+PX19LCJ” with the red
tooltip indicating the depot (source: author)
4.2.3.3. Key Finding 3 – Giving the Algorithm routing further freedom to select routes
at depot level further reduced total routing distance and time in all depot-level analyses
Whilst the depot-level version of the Algorithm routing includes further assumptions
around vehicle capacity and accessibility of customer sites which will reduce how
realistic the simulation is, by allowing the Algorithm routing this extra freedom, the
savings in distance covered increased to 20-31% versus the 4-5% at route-level.
Figure 4-14 shows the differences in route distribution between the Original routing
and the Algorithm routing. With depot-level routing, there is a more pronounced left-
shift in the distribution to routes between 0-150km with a compensatory reduction in
routes between 150-350km whereas at route-level these savings are not achieved to the
same extent (Original route distributions are slightly different between the charts due
to data cleansing reasons, see Appendix 3).
Figure 4-14 – the distribution of route distances for Original and Algorithm routing.
Adjusting only the order of routes in Analysis 1 (left) and allowing the Algorithm to
route at depot level in Analysis 2 (right) (source: author)
is visualised by the different routes in Figure 4-17 and Figure 4-18. The Original
routing has overlapping routes and limited node-clustering with each route covering
similar areas. Conversely, the Algorithm routing shows a clustering of nodes with
distinct separation of routes.
Figure 4-15 – Network view of Original routing (left) and Algorithm routing (right)
from Workington depot on 05/02/2020 in Analysis 2 (source: author)
Figure 4-16 – Comparison in vehicle distance (left) and time (right) between Original
routing and Algorithm routing from Workington depot on 05/02/2020 in Analysis 2
(source: author)
Figure 4-17 – Map plot of Original routing from Workington depot on 05/02/2020
(source: author)
Figure 4-18 - Map plot of Algorithm routing from Workington depot on 05/02/2020 in
Analysis 2 (source: author)
4.2.3.4. Key Finding 4 – The Algorithm routing maximises vehicle utilisation across
the fleet to first reduce the number of vehicles required and then individual vehicle
utilisation
Analysis 3 and Analysis 4 simulate customer demand and vehicle capacity constraints.
The difference is that Analysis 4 permits any vehicle from the depot and Analysis 3
restricts to the vehicles used in the Original routing. Table 4-9 shows the difference in
usage of the vehicle types across all depots. By giving the algorithm freedom of vehicle
selection, the vehicles with the highest capacity are used more regularly with usage of
the 500-bag capacity vehicle type increasing from 86% to 95% and thus, the smaller
capacity vehicle usage reduced.
Table 4-9 – Percentage of available journeys* vehicle type is used during February
2020 (source: author)
Because customer demand is derived from the telematics data, comparisons between
the Algorithm routing and Original routing are questionable given the assumptions.
Yet, there is a hint that vehicle selection might be suboptimal. Consider the example
from before, Workington depot on 05/02/2020 (Figure 4-17). Figure 4-19 shows the
Algorithm routing uses the larger vehicle as much as possible. Additionally, when the
Algorithm can select from the fleet of vehicles in Analysis 4, both 500-bag capacity
vehicles are utilised, reducing the number of vehicles needed from 5 to 3. Since
customer demand has been derived based upon the capacity of the original delivery
vehicles, in this example there are 1160 bags to be delivered. Whereas the Original
routing uses 5 vehicles to achieve that capacity, the Algorithm uses 3 vehicles, with 1
vehicle having a surplus space of 40 bags. Demand may be less too, say 986 bags, in
which case, the algorithm shows that only two vehicles are required (Figure 4-21). In
both cases, route duration is not substantially impacted either (e.g. Figure 4-22).
Figure 4-19 – Number of nodes visited per vehicle for the Original routing and the
Algorithm routing under 95% CoODV simulation (left) and the bag capacity of the
vehicles (right) (source: author)
Figure 4-20 - Number of nodes visited per vehicle for the Original routing and the
Algorithm routing under 95% CoODV simulation (left) and the bag capacity of the
vehicles (right) with vehicle selection (source: author)
Figure 4-21 - Number of nodes visited per vehicle for the Original routing and the
Algorithm routing under 85% CoODV simulation with vehicle selection (source:
author)
Figure 4-22 – Route duration for the Original routing and the Algorithm under 95%
CoODV simulation with vehicle selection (source: author)
Figure 4-23 - Map plot of Algorithm routing from Workington depot on 05/02/2020
with 95% CoODV and vehicle selection (source: author)
There may be other reasons for vehicle selection not captured in the Algorithm routing:
particular vehicles may have been unavailable through maintenance, sites could be
inaccessible by larger vehicles, or drivers unavailable etc. However, this also impacts
the size and makeup of vehicle fleet required. In the analysis, the fleet consists of 28
vehicles (Table 4-10) and the maximum number used on any day in February was 25
vehicles occurring 7 times in the month suggesting there is a surplus of vehicles.
However, February is low-season for Shortridge, and the surplus vehicles are likely
utilised during high-season. Yet with improved routing and vehicle selection, both the
maximum number of vehicles required and the frequency with which the maximum occurs
are reduced, with the maximum falling to 20-24 vehicles. There is also the possibility
of reducing this even further through holistic planning of customer demand, operations,
and routing so that slack from weekends and Wednesdays may also be used.
Table 4-10 – Average usage of each vehicle type during February 2020 (source:
author)
4.2.3.5. Key Finding 5 – Algorithm routing does not always find a solution when the
solution-space is small
A Vehicle Routing Problem may have many feasible solutions; however, only one of
these solutions is the Global Optimum. The more constraints that are added, and the
more restrictive those constraints are, the smaller the solution-space becomes, i.e.
there could be fewer local optima or even a single optimum. Thus, the heuristic and
metaheuristic employed by the algorithm may not find a solution within the time limit.
Each of the 95% CoODV simulations failed to find a solution for the same record
(Dumfries on 19/02/2020). The problem appears feasible as it is designed based on the
Original routing and appears unremarkable with 49 nodes and a total demand of 2537
bags versus a total vehicle capacity over the 8 vehicles of 2700 bags. Furthermore,
additional constraints could make the problem unsolvable. This is perhaps evidenced in
Analysis 5 where limiting the duration of routes to less than 9 hours including a 10-
minute stop per customer may have made problems unsolvable, since in the analysis
the Original routing had routes over 9 hours under these conditions (Figure 4-24). Table
4-11 shows that the problems failing to find a solution tended to have more nodes,
suggesting a more complex problem and smaller solution space.
Table 4-11 – Description of records with and without solutions from the routing
algorithm in Analysis 5 (source: author)
Figure 4-24 – Routes from Original routing and Algorithm routing in Analysis 5 with
the route time constraint (source: author)
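Two simple pre-checks help explain why a problem that looks feasible may still have
no solution. The sketch below is illustrative only: the function names are
hypothetical, the first-fit-decreasing packing is a heuristic check and not the
OR-Tools solver, and it assumes a customer's order cannot be split across vehicles.
A positive capacity slack (2700 - 2537 = 163 bags in the Dumfries example) is
necessary but does not guarantee that individual orders fit into individual vehicles.

```python
def capacity_slack(demands, capacities):
    """Spare capacity across the fleet; negative means certainly infeasible."""
    return sum(capacities) - sum(demands)


def first_fit_decreasing(demands, capacities):
    """Heuristic check: can whole orders be packed into the vehicles?"""
    remaining = sorted(capacities, reverse=True)
    for demand in sorted(demands, reverse=True):
        for i, free in enumerate(remaining):
            if demand <= free:
                remaining[i] -= demand
                break
        else:
            return False  # no vehicle had room for this order
    return True


def within_duration_limit(travel_minutes, n_stops,
                          limit_hours=9, service_minutes=10):
    """Route time = driving time plus a fixed service time per customer."""
    return travel_minutes + n_stops * service_minutes <= limit_hours * 60


slack = capacity_slack([2537], [2700])                 # totals from the text
packs = first_fit_decreasing([6, 6, 6], [10, 10])      # slack of 2, yet no fit
ok_route = within_duration_limit(380, n_stops=15)      # 380 + 150 = 530 min
long_route = within_duration_limit(400, n_stops=15)    # 400 + 150 = 550 min
```

The per-stop service time is why a route with many nodes can breach the 9-hour limit
even when its driving time alone is well under it.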
4.2.4. Limitations
The Algorithm routing and simulations are a simplification of a real and dynamic
problem. Whilst the scope and assumptions have been explicitly stated, there will be
intricacies and variables in the real problem that have not been accounted for. For
example, the 18T vehicles are excluded from much of the analysis to simplify the
problem to customer deliveries only. However, as well as sometimes being used in
customer deliveries, deliveries from Darlington depend on these vehicles. Hence, these
vehicles have an important role on the real routing problem which is not accounted for
in these simulations. The definition and derivation of customer demand in Analyses 3, 4
and 5 are unlikely to be completely reflective of actual customer demand. True customer
demand is independent of vehicle capacity and will be driven by many factors,
including both the season and the linen the customer already has in circulation. As well
as limitations in the data and how it is used (e.g. 1720 nodes versus a customer base of
1000), the visualisations also highlight discrepancies in the data of how a route has been
defined as the activity of the vehicle on a particular day. For example, for Route “2020-
02-28+PX67TNO”, the baseline route is split into two parts, returning to the depot in
between. Under a different route definition this could be defined as two routes or it
could highlight, if vehicle capacity was available, an inefficiency in the routing. The
algorithm for this route reduces the distance travelled by 157km (31%) (Figure 4-25).
Further cross-referencing of data sources would improve validity of the results.
Figure 4-25 – Example of a route that returns to the depot twice (source: author)
4.3. Qualitative Analysis
The results and template analysis of the online questionnaire are described below. The
aim of the questionnaire was to gauge broad perceptions and understanding of BD and
BDA across the organisation. The questionnaire was designed to be completed in less
than 15 minutes to avoid taking too much time away from the business and ensure
completion. Completion rate was 85% and average completion time was 8.5 minutes.
There were 13 participants that responded to the online questionnaire at Shortridge with
all participants answering positively to the questions on consent. However, 2 (15%)
participants did not complete the full questionnaire and with the placement of the
demographic questions at the end of the survey, these were not completed. Placing the
demographic questions earlier in the questionnaire may have led to lower response rates
in general (Roberson and Sundstrom, 1990) or it may have had no effect (Teclaw et al.,
2012). As the purpose of demographic questions in research is often either as an
independent variable or to ensure the “correct” population responded to the survey
(Hughes et al., 2016), it is less important in this research within a single organisation.
Figure 4-26 shows how the 11 participants identified their role at Shortridge and the
length of time they have worked at Shortridge. Most participants identified as Senior
Management (64%) and had been with Shortridge at least one year (72%). The sample
is also influenced by the effects of the COVID-19 pandemic as only employees that
returned from furlough on 1st July 2020 were sent the link for the questionnaire.
[Bar charts, y-axis: Frequency. Left panel, Position at Shortridge: Sales / Customer
Service and Office; Manager / Supervisor; Senior Management. Right panel, Length of
time with Shortridge: More than 3 years; 1 to 3 years; Less than 1 year; Other.]
Figure 4-26 – Role of participant at Shortridge (left) and length of time with Shortridge
(right) (source: author)
4.3.2. Problem-Solving (Q3)
The purpose of the question was to understand the approach the participant takes to
solve a problem. Following feedback from the pilot, this became a closed question to
aid understanding of the question, with three responses indicating the participant either
solves problems using their experience and intuition (A), shared problem-solving with
other colleagues (B) or through data and analysis (C). All 13 respondents answered the
question.
Q3 If you were faced with a problem in your normal work, such as planning a
large production line, choosing to offer an additional product/service, or a
change in regulations or guidelines; would you...
A. use your own experience and judgement to solve the problem and make a
decision: 1 (8%)
B. identify and speak with colleagues (or the internet) who may have an answer
and between you make a choice: 6 (46%)
C. collect data on the problem from lots of sources and using analysis outputs
to make a decision: 4 (31%)
Sourcing external information, either from colleagues or from hard data, was the most
common approach. When speaking with a colleague, that external information could be
experience-based or, more likely given these results, data-driven. This could also be
indicative of the demographic of senior management, who will regularly raise and
discuss issues during management meetings or seek insight from direct reports.
The purpose of the question was to gauge participants' understanding of BD through
association with a company and to prime the participant to think in terms of BD for
the next questions, thus avoiding technical terms that may decrease response rate
(Bryman, 2012). The majority (77%) of participants showed an understanding of BD
through association by naming large technology companies that use BD (e.g. Amazon,
Google, Netflix, etc.), industries (e.g. financial services) or mainstream news (e.g.
Cambridge Analytica). The remaining 3 participants (23%) did not provide an answer
or indicated they did not know.
The purpose of the question was to understand if the participants could recognise
opportunities for problem-solving with BD in their workplace. The majority (69%) of
participants provided an example with Participant 4 providing five possible examples.
General themes centred around the operational aspects of the organisation (e.g. washing
machine efficiency, transport and routing, daily reporting) and customer demand, both
current (e.g. customer databases, orders) and prospective (e.g. marketing, market
research). Participant 9 also gave an interesting response alluding to their definition of
BD:
“…the term “big” is just relative to the size of the organisation.” (Participant 9)
This perhaps illustrates the ambiguity of the definition of BD and the differences
between its mainstream and academic usage. Although this participant clarifies that
most organisations can use data, the response possibly alludes to a misconception that
BD applies only to “Big” companies.
The purpose of the question was to gauge participants' perception of the barriers to
BD at Shortridge; the response rate was 72%. Themes broadly align with the
Technology, People and Organisation resources from the RBV of the firm (Table 4-13).
Several responses related to the data itself. Data integrity and quality as a barrier
is well documented in the literature (Hazen et al., 2014; Wamba et al., 2015). Whilst data
protection is an issue for some use-cases of BD, particularly for personal customer
information, it is unlikely to be an obstacle for all use-cases (e.g. washing machine
efficiency which this participant suggested for the previous question). Additionally, the
response by Participant 1 on applicability of the data perhaps corroborates the limited
knowledge of BDA within the business suggested by Participant 10.
Table 4-13 – Summary of responses to Q6 from the questionnaire (source: author)
Resource Responses
Type
The purpose of this question was to understand the technical skills of the participants
at Shortridge through selection from a non-exhaustive list of BDA tools. Excluding the
participant who did not select any of the options, every participant had used Microsoft
Excel. Both Google and Amazon analytics and cloud products were selected by over
50% of participants. The validity of these responses is questionable considering the
answers to previous questions. In addition, experience of these computing services
tends to be accompanied by knowledge of a programming language, yet only 1
respondent reported using one (SQL). This perhaps highlights a limitation
with the question design too. An open question or forced-choice design (Bryman, 2012)
may have yielded more reliable results, or perhaps it is the conglomerate nature of
these companies that has caused confusion.
[Bar chart, y-axis: Frequency. Categories: Microsoft Excel; Google Analytics / Google
Cloud; Amazon Web Services; SQL (inc. MySQL, Teradata etc); Hadoop / Spark / Hive;
Python / R.]
Figure 4-27 – Selected software, services and programming options to Q7 from the
questionnaire (source: author)
Analytical techniques were selected infrequently, with Regression Analysis the most
frequently selected technique with 3 positive respondents. This is
corroborated by the response of Participant 9, who adds:
[Bar chart, y-axis: Frequency, x-axis: Techniques. Categories: Linear Programming /
Optimisation; Clustering / Segmentation; Regression Analysis; Machine Learning.]
4.4. Conclusion
Through an inductive and iterative approach, the analysis illustrates how the
experience-led heuristic (original) routing at Shortridge compares to a BDA solution
(algorithm). The BDA solution appears to improve on the experience-led heuristic,
reducing total distance by 4-31% and total duration by 4-26%, and the analysis
explores the subsequent impact on fleet size. Though there are several limitations and
assumptions which will impact the external validity of the results, the indication is that
a more objective approach to vehicle routing, standardised across all three depots, would
ultimately reduce delivery costs. Additionally, the reduced number of deliveries on a
Wednesday perhaps indicates that more holistic demand planning would benefit the
vehicle routing too. The results of the questionnaire indicate a limited understanding
and knowledge of BDA techniques at Shortridge. Although most participants elect to
solve problems with peer support and some identify as having data-driven problem-
solving approaches, this appears not to be comparable with modern analytical
approaches. 77% of respondents identified a BD associated business, 69% were able to
identify a use-case at Shortridge and a minority of the question responses clearly show
an appetite for using data and analytics, but the level of knowledge and understanding
across the organisation is clearly a barrier. This is borne out in the responses to BD
obstacles and the BDA tools and techniques.
Chapter 5 - Discussion
5.1. Introduction
The aim of this research is to investigate how a BDA solution to a VRP compares to an
experience-led heuristic in an SME and to understand the issues this case highlights in
SME adoption of BDA. The following chapter discusses the findings from undertaking
the five objectives (Table 3-1) in relation to the two research questions and the academic
literature.
5.2. Discussion
5.2.1. Research Question 1 - How does a BDA solution to a VRP compare to an
experience-led heuristic?
The results show the BDA solution outperformed the experience-led heuristic routing,
reducing distance covered by 4-31%, which aligns with the 10-30% reduction
often claimed by vendors of CVRS (Bräysy and Hasle, 2014). The results also support
the findings by Fontaine et al. (2020) that, even with experienced logistics personnel,
manual routing rarely solves the VRP optimally. Additionally, the differences between
the two methods are less marked when viewed at route-level than depot-level (4-5% vs
20-31%), which aligns with the finding of Fontaine et al. (2020) that within-cluster
routing is reasonable but clustering is generally poor. However, this may be an unfair
comparison since the structure of the routes goes largely unchanged at Shortridge; the
routes change only when demand peaks, and this is not captured in the one-month
window used for the analysis. Yet keeping the routes unchanged, for ease of
administration and to develop a routine for the driver, is itself indicative of VRP
complexity. It highlights the planning resource required for manual planning and the
potential for errors and inefficiencies (Carlan et al., 2020), so reusing the same routes
simplifies the problem. Such a simplification ignores the principles of routing in the
literature (e.g. minimising mileage, using the largest vehicle first to maximise
utilisation (Güneri, 2007)), leading to excess variable costs. Conversely, the BDA
solution captures these principles inherently through the objective function. CVRS is
estimated to save 80-90% of the planning time over simple heuristics (Bräysy and Hasle,
2014) and the indications from this analysis are that an equivalent BDA solution, once
developed, would do the same.
The BDA solution also provides oversight of route duration, allowing objective
performance monitoring of the routes and the drivers. Integrating the telematics with
BDA routing could also provide real-time updates on delivery progress and
interruptions on the route (Hopkins and Hawking, 2018). Furthermore, the BDA
solution designs routes around the customer demand rather than fitting the customer
demand into predetermined routes, changing the focus of the logistics from the operation
to the customer and enabling involvement in holistic demand planning. Such a change in
focus has been shown to have further benefits such as higher customer satisfaction and
improved competitive advantage (Thomé et al., 2012; Wagner et al., 2014). The
saving in route durations frees up capacity for Shortridge to expand sales, reduce
their fleet or enhance their customer service offering for further competitive advantage,
for example by encouraging the drivers to use the extra time to provide a better
customer service.
The BDA solution to the VRP and subsequent routing improvements showcase the
insights and value that can be generated from BDA in an SME and confirm BDA as
an effective tool for planning (Chehbi-Gamoura et al., 2020). Despite this, there are
limitations to the analysis, not least from the quality of the raw telematics data. Under
the RBV (Barney, 1991), BD is a critical resource (Mikalef et al., 2018) and multiple
researchers highlight the importance of data quality in using BDA to avoid incorrect
decisions (Hazen et al., 2014). The autonomously generated telematics data contained
a large volume of noise, requiring extensive data cleansing for analysis. However, there
is still noise in the data, as seen by the 1720 nodes versus a customer base of
approximately 1000. Though the analysis could have cross-referenced with other
data sources for validation, the noise inherent in the data potentially decreases the value
of the insights. LaValle et al. (2011) stated data quality is not a barrier to adoption;
however, it is a significant obstacle and can limit the value (Wamba et al., 2015).
To overcome such data quality issues and deliver value requires the intangible resources
of technical expertise to transform the data and understand the business context (Waller
and Fawcett, 2013; Wamba et al., 2015). In this analysis, this meant the transformation
of the raw data, the routing engine setup, the build of the routing algorithm to meet the
problem requirements, and then to run the simulations. However, the results of the
questionnaire indicate a limited understanding of BD and BDA with numerous
participants citing little experience and knowledge across Shortridge as a barrier to
adoption. It appears a little more nuanced though, since the majority (69%) of
participants provided an example of potential BD at Shortridge, indicating an awareness
of what BD is. The possible gap in knowledge is in knowing both how to collect the BD
and then how to derive value from it with BDA. This gap in empirical examples is widely
reported in the literature too (Mortenson et al., 2015; Mikalef et al., 2018; Conboy et
al., 2020) and Coleman et al. (2016) suggest these example use-cases are particularly
important for SME adoption.
The drivers are a critical resource to the logistics operation and key to implementation
of new routing. The drivers need to trust the changes to the static routes to undertake
them and their feedback is critical in continually improving the routing. Literature
highlights “Integrated human-data intelligence” as core for developing BDA
capabilities within a production environment (Belhadi et al., 2019, p. 12), implying that the
BDA solution is an enhancement of, rather than a replacement for, human decision-making. This
is perhaps the heart of a DDC, with all levels of the organisation viewing and perceiving
data as an enhancement (Akter et al., 2019; Dremel et al., 2020). Vidgen et al. (2017)
highlight that a shift in problem-solving skills is often required, not least in senior
management whose support has been shown as important for BDA adoption
(Schoenherr and Speier-Pero, 2015; Shukla and Mattar, 2019). Moreover, this is the
population that often performs analytics (Bordeleau et al., 2019). The questionnaire
sample was largely (91%) Management or Senior Management, and though a
proportion (31%) of this sample identified with data-driven decision-making, there
appears not to be the requisite understanding of analytic tools and techniques to
recognise how insights would be derived, which researchers have found is important
for DDC (McAfee et al., 2012; Mikalef et al., 2018; Conboy et al., 2020). Thus, without
the requisite knowledge and understanding to develop the DDC at a senior management
level, adoption of BDA is unlikely.
5.3. Conclusion
The BDA routing algorithm reduced the total distance of the manual experience-led
routing by 4-31%. The results highlight both the complexity of the VRP and the
inefficiencies and inadequacy of manual experience-led routing. The BDA routing
algorithm allows the flexibility to plan routes according to demand and unlocks
opportunities to reduce costs, to expand or to improve customer service and
performance. The benefits are suspected to be similar to those of CVRS, but inexpensive and
with the advantage of creating a bespoke solution to the problem. However, the
challenge is having the expertise and understanding to do so. In this case, though there
is an awareness of BD in the organisation, the specific understanding and expertise to
collect, extract and manipulate the BD to build BDA solutions is limited. As highlighted
by the literature, further use-cases that focus on “how” will help develop this
understanding and demonstrate the value of BDA. Internally too, other intangible
resources cited in the literature such as a DDC and senior management support are also
evident from the case which supports the literature from an SME perspective.
Chapter 6 - Conclusion
6.1. Introduction
This chapter concludes the research. It begins with a summary of the research and
its outcomes before assessing the wider implications for practitioners and future
research and acknowledging the limitations.
To add practical research to the body of BDA literature, the research strategy followed
a single case study of an SME, Shortridge Limited, with data collection through an
embedded mixed methods design. Descriptive BDA on a sample of telematics data and
other secondary data highlighted the inadequacy of experience-led heuristic routing via
both the complexity of the VRP in context and a deviation from the principles of routing
found in the literature. To simplify the VRP at Shortridge, routes cycled weekly and
were largely static with similar numbers of routes operated each weekday despite
differing customer volumes. Using prescriptive BDA, five versions of Algorithm
routing were developed iteratively in Python using the open-source Google OR-Tools
library. A comparative analysis showed the Algorithm routing reduced total distance
covered by 4-31% and total duration of routes by 4-26% versus the experience-led
heuristic routing. This comparative analysis highlighted further inefficiencies of the
experience-led heuristic routing in the order the customers were visited, the way the
customers were clustered to create routes and likely underutilisation of delivery
vehicles. Such inefficiencies likely lead to unnecessary transportation variable costs for
Shortridge with the results drawing a parallel with the benefits associated with CVRS
in the literature.
Although these results show the value of BDA, challenges were identified in the
analysis, such as the BDA routing algorithm struggling to find solutions as the VRP
became more complex and the expertise and knowledge resources required to
build such a solution. The self-report questionnaire of the employees at Shortridge
showed that the required technical expertise is perhaps unavailable, creating a barrier
to adoption of BDA. Additionally, other resource barriers identified from the literature
were also present, such as data quality, DDC, senior leadership support, costs, data
protection and understanding “how” to collect the data and generate value. With the
evidence of such barriers, it is understandable why experience-led heuristic routing and
CVRS are perhaps more commonly used for routing and why adoption of BDA in SMEs
is limited.
The research provides an example use-case of BDA in a real business setting. The
research illustrates the value that can be generated and the opportunities that BDA can
unlock to optimise operational processes. Specifically, the research shows how to set up
an in-house routing engine (and prerequisites) to optimise routing of delivery vehicles
using open-source data and tools, providing an alternative to CVRS. The research
demonstrates the further opportunities that can be uncovered to improve an
organisation. The BDA solution to routing enables a customer-focussed logistics
operation, objective route and driver performance monitoring, and involvement
in holistic demand planning, which can be used to expand operations, cut costs and
improve customer service for further competitive advantage. For SMEs, this is
particularly relevant as it shows how their customer service can become inimitable by
larger organisations. Additionally, the research perhaps provides an informal
benchmark for SMEs on which to self-reflect. The barriers to BDA adoption may be
similar in other organisations and the challenge for practitioners and research is to work
out a way of overcoming them.
Future research should look to address the “how” questions: how can organisations and
SMEs overcome the barriers to BDA adoption, how do they collect
the BD and generate value from it, and how do they cultivate a DDC?
Further case studies, empirical examples and action research of BDA adoption and
usage within different contexts would help to establish a more representative common
case among SMEs. In addition, these examples will provide practitioners with examples
they can look to recreate. The literature has emphasised the need for further research at
the intersection of OR and LSCM and the VRP is a prime example where reality meets
theory. Future VRP methods and solutions should look to be benchmarked on both the
standard benchmarking datasets and in a real context for the research to have more
utility for practitioners. Additionally, comparisons with and among CVRS would also
deliver value for developers, researchers and practitioners.
6.5. Limitations
The research follows a single case study which may mean findings lack generalisability
to the wider population (other SMEs, larger organisations, industries). The data
available, the routing problems, the knowledge of the employees could well be very
different in other cases and the wider population. The research also took place during
the COVID-19 pandemic in which UK Government-imposed lockdown restrictions
limited the choice of research approaches. This impacted Shortridge too, with the
furloughing of staff and the operation of only a single plant/depot because 99% of the customer
base had also shut down. Results from the BDA routing may also be less valid for the
level of operation at Shortridge after the lockdown restrictions are lifted, and the
questionnaire may have had different results had the furlough not been in place.
The BDA routing relied on the cleansing of the telematics data and several explicitly
called out assumptions. The results and conclusions should only be quoted with
reference to these assumptions. A variation in results would also be expected in a real
implementation: the cleansed telematics data still included noise, so nodes that were not
customer nodes have been used in the analysis, and the large 18T vehicles were out of
scope in the analysis.
References
Akter, S., Bandara, R., Hani, U., Wamba, S. F., Foropon, C. and Papadopoulos, T.
(2019) 'Analytics-Based Decision-Making for Service Systems: A Qualitative Study
and Agenda for Future Research', International Journal of Information Management,
48, pp. 85-95.
Akter, S., Wamba, S. F., Gunasekaran, A., Dubey, R. and Childe, S. J. (2016) 'How to
Improve Firm Performance Using Big Data Analytics Capability and Business Strategy
Alignment?', International Journal of Production Economics, 182, pp. 113-131.
Andrade, E. S. D. (2018) How to Use APIs with Pandas and Store the Results in
Redshift. Available at: https://medium.com/@ericsalesdeandrade/how-to-call-rest-
apis-with-pandas-and-store-the-results-in-redshift-2b35f40aa98f (Accessed: 02 July
2020).
Belhadi, A., Zkik, K., Cherrafi, A., Yusof, S. r. M. and El fezazi, S. (2019)
'Understanding Big Data Analytics for Manufacturing Processes: Insights from
Literature Review and Multiple Case Studies', Computers & Industrial Engineering,
137.
Biddle, C. and Schafft, K. A. (2015) 'Axiology and Anomaly in the Practice of Mixed
Methods Work: Pragmatism, Valuation, and the Transformative Paradigm', Journal of
Mixed Methods Research, 9(4), pp. 320-334.
Bing Maps (2020) Distance Matrix API. Available at: https://www.microsoft.com/en-
us/maps/distance-matrix (Accessed: 2 July 2020).
Braekers, K., Ramaekers, K. and Van Nieuwenhuyse, I. (2016) 'The Vehicle Routing
Problem: State of the Art Classification and Review', Computers & Industrial
Engineering, 99, pp. 300-313.
Bräysy, O. and Hasle, G. (2014) 'Chapter 12: Software Tools and Emerging
Technologies for Vehicle Routing and Intermodal Transportation', in Toth, P. and Vigo,
D. (eds.) Vehicle Routing: Problems, Methods, and Applications. 2nd edn.
Philadelphia: Society for Industrial and Applied Mathematics, pp. 351-380.
Bryman, A. (2012) Social Research Methods. 4th edn. New York: Oxford University
Press.
Bughin, J. (2016) 'Big Data, Big Bang?', Journal of Big Data, 3(1), pp. 1-14.
Buonanno, G., Faverio, P., Pigni, F., Ravarini, A., Sciuto, D. and Tagliavini, M. (2005)
'Factors Affecting ERP system Adoption: A Comparative Analysis between SMEs and
Large Companies', Journal of Enterprise Information Management, 18(4), pp. 384-426.
Carlan, V., Huybrechts, T., Hellinckx, P. and Vanelslander, T. (2020) 'A Universal
Middleware Streaming Framework and Data Analytics: Analysing their Economic
Feasibility in Road Transport Planning', Research in Transportation Business &
Management, 34, p. 100424.
CDRC (2018) The General Data Protection Regulation & Social Science Research.
Available at: https://www.cdrc.ac.uk/wp-content/uploads/2018/05/6-GDPR-and-
social-science-research-full-document-1.pdf (Accessed: 8 April 2020).
Chehbi-Gamoura, S., Derrouiche, R., Damand, D. and Barth, M. (2020) 'Insights from
Big Data Analytics in Supply Chain Management: An All-Inclusive Literature Review
Using the SCOR Model', Production Planning & Control, 31(5), pp. 355-382.
Chen, H., Chiang, R. H. and Storey, V. C. (2012) 'Business Intelligence and Analytics:
From Big Data to Big Impact', MIS Quarterly, 36(4), pp. 1165-1188.
Chen, M., Mao, S. and Liu, Y. (2014) 'Big Data: A Survey', Mobile Networks and
Applications, 19(2), pp. 171-209.
Coleman, S., Göb, R., Manco, G., Pievatolo, A., Tort-Martorell, X. and Reis, M. S.
(2016) 'How Can SMEs Benefit from Big Data? Challenges and a Path Forward',
Quality and Reliability Engineering International, 32(6), pp. 2151-2164.
Conboy, K., Mikalef, P., Dennehy, D. and Krogstie, J. (2020) 'Using Business Analytics
to Enhance Dynamic Capabilities in Operations Research: A Case Analysis and
Research Agenda', European Journal of Operational Research, 281(3), pp. 656-672.
Dantzig, G. B. and Ramser, J. H. (1959) 'The Truck Dispatching Problem', Management
Science, 6(1), pp. 80-91.
Del Vecchio, P., Di Minin, A., Petruzzelli, A. M., Panniello, U. and Pirri, S. (2018) 'Big
Data for Open Innovation in SMEs and Large Corporations: Trends, Opportunities, and
Challenges', Creativity and Innovation Management, 27(1), pp. 6-22.
Demchenko, Y., Grosso, P., De Laat, C. and Membrey, P. (2013) 2013 International
Conference on Collaboration Technologies and Systems (CTS). California, USA, 20-
24 May. IEEE.
Department for Business Energy & Industrial Strategy (2019) Business Population
Estimates 2019. UK Government. [Online]. Available at:
https://www.gov.uk/government/statistics/business-population-estimates-2019
(Accessed: 22 February 2020).
Dong, J. Q. and Yang, C.-H. (2020) 'Business Value of Big Data Analytics: A Systems-
Theoretic Approach and Empirical Test', Information & Management, 57(1), p. 103124.
Dremel, C., Herterich, M. M., Wulf, J. and vom Brocke, J. (2020) 'Actualizing Big Data
Analytics Affordances: A Revelatory Case Study', Information & Management, 57(1).
Duan, L. and Xiong, Y. (2015) 'Big Data Analytics and Business Analytics', Journal of
Management Analytics, 2(1), pp. 1-21.
Eiselt, H. and Sandblom, C.-L. (2000) 'Heuristic Algorithms', in Integer Programming
and Network Models. Berlin: Springer, pp. 229-258.
Ferraris, A., Mazzoleni, A., Devalle, A. and Couturier, J. (2019) 'Big Data Analytics
Capabilities and Knowledge Management: Impact on Firm Performance', Management
Decision, 57(8), pp. 1923-1936.
Fontaine, P., Taube, F. and Minner, S. (2020) 'Human Solution Strategies for the
Vehicle Routing Problem: Experimental Findings and a Choice-Based Theory',
Computers & Operations Research, p. 104962.
Furnon, V. (2017) 'Ortools RoutingModel not finding best solution to a VRP in a 14-
node example'. or-tools-discuss: Google. Available at:
https://groups.google.com/forum/#!topic/or-tools-discuss/6KHuJZ3C3VQ (Accessed:
7 July 2020).
Gandomi, A. and Haider, M. (2015) 'Beyond the Hype: Big Data Concepts, Methods,
and Analytics', International Journal of Information Management, 35(2), pp. 137-144.
Geisberger, R., Sanders, P., Schultes, D. and Delling, D. (2008) WEA 2008:
International Workshop on Experimental and Efficient Algorithms. Provincetown,
USA, 30 May - 1 June. Germany: Springer Berlin Heidelberg.
Gendreau, M., Potvin, J.-Y., Bräysy, O., Hasle, G. and Løkketangen, A. (2008)
'Metaheuristics for the Vehicle Routing Problem and Its Extensions: A Categorized
Bibliography', in Golden, B., Raghavan, S. and Wasil, E. (eds.) The Vehicle Routing
Problem: Latest Advances and New Challenges. Boston, MA: Springer US, pp. 143-
169.
George, G., Osinga, E. C., Lavie, D. and Scott, B. A. (2016) 'Big Data and Data Science
Methods for Management Research', Academy of Management Journal, 59(5), pp.
1493-1507.
Ghasemaghaei, M. and Calic, G. (2020) 'Assessing the Impact of Big Data on Firm
Innovation Performance: Big Data is not Always Better Data', Journal of Business
Research, 108, pp. 147-162.
Gibb, A. A. (2000) 'SME Policy, Academic Research and the Growth of Ignorance,
Mythical Concepts, Myths, Assumptions, Rituals and Confusions', International Small
Business Journal, 18(3), pp. 13-35.
Gubbins, E. J. (2003) Managing Transport Operations. 3rd edn. London: Kogan Page.
Hazen, B. T., Boone, C. A., Ezell, J. D. and Jones-Farmer, L. A. (2014) 'Data Quality
for Data Science, Predictive Analytics, and Big Data in Supply Chain Management: An
Introduction to the Problem and Suggestions for Research and Applications',
International Journal of Production Economics, 154, pp. 72-80.
Hindle, G., Kunc, M., Mortensen, M., Oztekin, A. and Vidgen, R. (2020) 'Business
Analytics: Defining the Field and Identifying a Research Agenda', European Journal
of Operational Research, 281(3), pp. 483-490.
Hopkins, J. and Hawking, P. (2018) 'Big Data Analytics and IoT in Logistics: A Case
Study', The International Journal of Logistics Management, 29(2), pp. 575-591.
Hornstra, R. P., Silva, A., Roodbergen, K. J. and Coelho, L. C. (2020) 'The Vehicle
Routing Problem with Simultaneous Pickup and Delivery and Handling Costs',
Computers & Operations Research, 115, p. 104858.
Kayser, V., Nehrke, B. and Zubovic, D. (2018) 'Data Science as an Innovation
Challenge: From Big Data to Value Proposition', Technology Innovation Management
Review, 8(3), pp. 16-25.
Kovács, G., van Hoek, R. and Spens, K. M. (2005) 'Abductive Reasoning in Logistics
Research', International Journal of Physical Distribution & Logistics Management,
35(2), pp. 132-144.
Kuo, R. (2001) 'A Sales Forecasting System Based on Fuzzy Neural Network with
Initial Weights Generated by Genetic Algorithm', European Journal of Operational
Research, 129(3), pp. 496-517.
Lamba, H. S. and Dubey, S. K. (2015) 'Analysis of Requirements for Big Data Adoption
to Maximize IT Business Value', 2015 4th International Conference on Reliability,
Infocom Technologies and Optimization (ICRITO)(Trends and Future Directions).
Noida, India, 2-4 September. pp. 1-6.
LaValle, S., Lesser, E., Shockley, R., Hopkins, M. S. and Kruschwitz, N. (2011) 'Big
Data, Analytics and the Path from Insights to Value', MIT Sloan Management Review,
52(2), pp. 21-32.
Lipworth, W., Mason, P. H., Kerridge, I. and Ioannidis, J. P. A. (2017) 'Ethics and
Epistemology in Big Data Research', Journal of Bioethical Inquiry, 14(4), pp. 489-500.
Lowrie, I. (2017) 'Algorithmic Rationality: Epistemology and Efficiency in the Data
Sciences', Big Data & Society, 4(1).
Luxen, D. and Vetter, C. (2011) 'Real-time routing with OpenStreetMap data', 19th
ACM SIGSPATIAL international conference on advances in geographic information
systems. Chicago, USA, 1-4 November. New York, USA: ACM, pp. 513-516.
McAfee, A., Brynjolfsson, E., Davenport, T. H., Patil, D. and Barton, D. (2012) 'Big
Data: The Management Revolution', Harvard Business Review, 90(10), pp. 60-68.
Mikalef, P., Boura, M., Lekakos, G. and Krogstie, J. (2019) 'Big Data Analytics and
Firm Performance: Findings from a Mixed-Method Approach', Journal of Business
Research, 98, pp. 261-276.
Mikalef, P., Krogstie, J., Pappas, I. O. and Pavlou, P. (2020) 'Exploring the Relationship
Between Big Data Analytics Capability and Competitive Performance: The Mediating
Roles of Dynamic and Operational Capabilities', Information & Management, 57(2), p.
103169.
Mikalef, P., Pappas, I. O., Krogstie, J. and Giannakos, M. (2018) 'Big Data Analytics
Capabilities: A Systematic Literature Review and Research Agenda', Information
Systems and e-Business Management, 16(3), pp. 547-578.
Min, H. (1989) 'The Multiple Vehicle Routing Problem with Simultaneous Delivery
and Pick-up Points', Transportation Research, 23(5), pp. 377-386.
Miwa, T. and Bell, M. G. (2017) 'Efficiency of Routing and Scheduling System for
Small and Medium Size Enterprises Utilizing Vehicle Location Data', Journal of
Intelligent Transportation Systems, 21(3), pp. 239-250.
Montagné, R. and Sanchez, D. T. (2020) A Python Framework for Solving the VRP and
its Variants with Column Generation. Available at: https://github.com/Kuifje02/vrpy
(Accessed: 2 July 2020).
Motorvation (Shows on the Road) Ltd (2020) Truck Spec. Available at:
http://motorv.com/truck-specifications/ (Accessed: 01 July 2020).
Müller, O., Fay, M. and vom Brocke, J. (2018) 'The Effect of Big Data and Analytics
on Firm Performance: An Econometric Analysis Considering Industry Characteristics',
Journal of Management Information Systems, 35(2), pp. 488-509.
Nguyen, T., Li, Z., Spiegler, V., Ieromonachou, P. and Lin, Y. (2018) 'Big Data
Analytics in Supply Chain Management: A State-of-the-Art Literature Review',
Computers & Operations Research, 98, pp. 254-264.
Parragh, S. N., Doerner, K. F. and Hartl, R. F. (2008) 'A Survey on Pickup and Delivery
Problems', Journal für Betriebswirtschaft, 58(2), pp. 81-117.
Pisinger, D. and Ropke, S. (2007) 'A General Heuristic for Vehicle Routing Problems',
Computers & Operations Research, 34(8), pp. 2403-2435.
Raguseo, E., Vitari, C. and Pigni, F. (2020) 'Profiting from Big Data Analytics: The
Moderating Roles of Industry Concentration and Firm Size', International Journal of
Production Economics, p. 107758.
Roßmann, B., Canzaniello, A., von der Gracht, H. and Hartmann, E. (2018) 'The Future
and Social Impact of Big Data Analytics in Supply Chain Management: Results from a
Delphi study', Technological Forecasting and Social Change, 130, pp. 135-149.
Rushton, A., Croucher, P. and Baker, P. (2010) The Handbook of Logistics and
Distribution Management. 4th edn. London: Kogan Page Limited.
Rushton, A., Croucher, P. and Baker, P. (2014) The Handbook of Logistics &
Distribution Management 5th edn. London: Kogan Page.
Russom, P. (2011) 'Big Data Analytics', TDWI Best Practices Report, Fourth Quarter.
Saunders, M., Lewis, P. and Thornhill, A. (2016) Research Methods for Business
Students. 7th edn. Harlow, UK: Pearson Education.
Schoenherr, T. and Speier-Pero, C. (2015) 'Data Science, Predictive Analytics, and Big
Data in Supply Chain Management: Current State and Future Potential', Journal of
Business Logistics, 36(1), pp. 120-132.
Seddon, J. J. J. M. and Currie, W. L. (2017) 'A Model for Unpacking Big Data Analytics
in High-Frequency Trading', Journal of Business Research, 70, pp. 300-307.
Seyedghorban, Z., Tahernejad, H., Meriton, R. and Graham, G. (2020) 'Supply Chain
Digitalization: Past, Present and Future', Production Planning & Control, 31(2-3), pp.
96-114.
Shah, S., Soriano, C. B. and Coutroubis, A. (2017) 'Is Big Data for Everyone? The
Challenges of Big Data Adoption in SMEs', 2017 IEEE International Conference on
Industrial Engineering and Engineering Management (IEEM). Singapore, 10-13
December. IEEE, pp. 803-807.
Shukla, M. and Mattar, L. (2019) 'Next Generation Smart Sustainable Auditing Systems
Using Big Data Analytics: Understanding the Interaction of Critical Barriers',
Computers & Industrial Engineering, 128, pp. 1015-1026.
Solomon, M. M. (1987) 'Algorithms for the Vehicle Routing and Scheduling Problems
with Time Window Constraints', Operations Research, 35(2), pp. 254-265.
Surbakti, F. P. S., Wang, W., Indulska, M. and Sadiq, S. (2020) 'Factors Influencing
Effective Use of Big Data: A Research Framework', Information & Management, 57(1).
Syed, A., Gillela, K. and Venugopal, C. (2013) 'The Future Revolution on Big Data',
Future, 2(6), pp. 2446-2451.
Taillard, É. D. (1999) 'A Heuristic Column Generation Method for the Heterogeneous
Fleet VRP', RAIRO-Operations Research, 33(1), pp. 1-14.
Vidal, T., Laporte, G. and Matl, P. (2020) 'A Concise Guide to Existing and Emerging
Vehicle Routing Problem Variants', European Journal of Operational Research,
286(2), pp. 401-416.
Vidgen, R., Shaw, S. and Grant, D. B. (2017) 'Management Challenges in Creating
Value from Business Analytics', European Journal of Operational Research, 261(2),
pp. 626-639.
W.S Hunt's Transport Ltd (2015) Dimensions and Capabilities. Available at:
https://huntstransport.co.uk/our-fleet/dimensions-and-capabilities/ (Accessed: 01 July
2020).
Wagner, S. M., Ullrich, K. K. and Transchel, S. (2014) 'The Game Plan for Aligning
the Organization', Business Horizons, 57(2), pp. 189-201.
Waller, M. A. and Fawcett, S. E. (2013) 'Data Science, Predictive Analytics, and Big
Data: A Revolution that will Transform Supply Chain Design and Management',
Journal of Business Logistics, 34(2), pp. 77-84.
Wamba, S. F., Akter, S., Edwards, A., Chopin, G. and Gnanzou, D. (2015) 'How ‘Big
Data’ can make Big Impact: Findings from a Systematic Review and a Longitudinal
Case Study', International Journal of Production Economics, 165, pp. 234-246.
Wamba, S. F., Gunasekaran, A., Akter, S., Ren, S. J.-f., Dubey, R. and Childe, S. J.
(2017) 'Big Data Analytics and Firm Performance: Effects of Dynamic Capabilities',
Journal of Business Research, 70, pp. 356-365.
Wang, G., Gunasekaran, A., Ngai, E. W. and Papadopoulos, T. (2016) 'Big Data
Analytics in Logistics and Supply Chain Management: Certain Investigations for
Research and Applications', International Journal of Production Economics, 176, pp.
98-110.
Yin, R. K. (2018) Case Study Research and Applications: Design and Methods. 6th
edn. California: Sage publications.
Zheng, P., Sang, Z., Zhong, R. Y., Liu, Y., Liu, C., Mubarok, K., Yu, S. and Xu, X.
(2018) 'Smart Manufacturing Systems for Industry 4.0: Conceptual Framework,
Scenarios, and Future Perspectives', Frontiers of Mechanical Engineering, 13(2), pp.
137-150.
Appendices
o Yes
o No
Skip To: End of Survey If I have read the information sheet and have an understanding of what the
research is about, what m... = No
Q2 I voluntarily consent to be a participant in this research and understand that I can
refuse to answer questions, I can withdraw from the study at any time without giving
a reason and that the information I provide will be kept anonymous
o Yes
o No
Skip To: End of Survey If I voluntarily consent to be a participant in this research and understand that I
can refuse to an... = No
Q3 If you were faced with a problem in your normal work, such as planning a large
production line, choosing to offer an additional product/service, or a change in
regulations or guidelines; would you...
o use your own experience and judgement to solve the problem and make a decision
o identify and speak with colleagues (or the internet) who may have an answer and
between you make a choice
o collect data on the problem from lots of sources and using analysis outputs to
make a decision
Q5 In your day-to-day work, can you think of any examples of where Big Data might
be created or is used?
Q6 What do you think the obstacle(s) are to using Big Data in your place of work?
Q7 Have you ever used any of the following?
○ Microsoft Excel
○ Python/R
○ Machine Learning
○ Clustering/Segmentation
○ Linear Programming/Optimisation
○ Hadoop/Spark/Hive etc
○ Regression Analysis
Q8 Thank you for your time and your answers. I would appreciate any other thoughts,
questions or considerations that you have about your understanding of big data and any
other feedback on this questionnaire
– Key python packages used in the analysis (source: author)
pipwin 0.5.0 Used to install packages that failed installation with pip
polyline 1.4.0 Used to interpret the route output between two locations in OSRM
requests 2.23.0 Used for making HTTP requests for Geocoding and Routing
– Data Cleansing activity detail (source: author)
Darlington – DL1 4QD, Dumfries – DG2 0HS, and Workington – CA14 4JX.

Issue: Routes not starting or ending at depot location
Detail: 44 routes were identified that did not begin or end at one of the depot postcodes (DL1 4QD, DG2 0HS, CA14 4JX).
Action: The 44 were removed, leaving 630 routes and 8812 rows in the dataset.

Issue: Duplicated “End Postcodes” in routes
Detail: There are routes where a postcode is repeated, indicating it was visited more than once for a particular route. This is likely to be noise in the data and unlikely to represent multiple visits to the same customer. Therefore, only the first instance of the repeated postcode is kept. This aligns the original routes with the requirements of the routing solvers, which cannot have repeated nodes. Another way of approaching this would be to assume these are different nodes and create dummy nodes for each repetition; however, this would likely lead to further noise and bias within the results, as the unique list of nodes would approach 7000 versus the 1720 without repeated nodes, a further departure from the Shortridge customer base of around 1000 customers.
This issue exists again in the depot-level analysis (Analysis 2 to Analysis 5), where routes on the same day may share a particular postcode. In this instance, the postcode is kept only once and repeats are removed from other routes on that day. In this way, the lists of nodes in the routes in the data and in the algorithm are unique and are the same. Again, though, this may highlight further limitations in the current method of routing, where similar areas are covered more than once in the same route and by multiple routes.
Action: Postcodes are deduplicated in routes. Also, repeated nodes in depot-level analysis are removed across routes so any depot-day pair has a unique set of postcodes to deliver to.

Issue: Distances and durations
Detail: Though the telematics dataset contains columns related to distance and time, these were actual travel distances and times and will include effects of traffic etc. To ensure a like-for-like comparison, only distances and times from OSRM were used for the analysis.
Action: Route distances and route times were taken from OSRM only.
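The two deduplication steps described above can be illustrated with a minimal Python sketch. This is not the author's cleansing code: the function names and the example postcodes are hypothetical, and it assumes each route is held as an ordered list of delivery postcodes with the depot postcode excluded.

```python
def dedupe_route(postcodes):
    """Keep only the first occurrence of each postcode within one route,
    preserving the original visiting order."""
    seen = set()
    kept = []
    for postcode in postcodes:
        if postcode not in seen:
            seen.add(postcode)
            kept.append(postcode)
    return kept


def dedupe_depot_day(routes):
    """Deduplicate across all routes of one depot on one day.

    `routes` is a list of routes (each a list of postcodes). A postcode is
    kept in the first route where it appears and dropped from later ones,
    so the depot-day pair ends up with a unique set of postcodes.
    """
    seen = set()
    result = []
    for route in routes:
        kept = []
        for postcode in route:
            if postcode not in seen:
                seen.add(postcode)
                kept.append(postcode)
        result.append(kept)
    return result


# Hypothetical example data (not taken from the telematics extract).
example_routes = [
    ["AA1 1AA", "AA1 2BB", "AA1 1AA"],
    ["AA1 2BB", "AA1 3CC"],
]
print(dedupe_route(example_routes[0]))
print(dedupe_depot_day(example_routes))
```

Keeping the first occurrence preserves the original visiting order, which matters when the cleansed route is later used as the baseline for comparison.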
– Routing Engine setup
A method of calculating the distance between each of the locations is required to find
the optimum routes. For the research to hold real-world value, actual road distances are
used rather than crow-fly distances calculated from the latitude and longitude with the
Haversine formula (Robusto, 1957). Routing distances are often sourced in a pair-wise
distance matrix, whereby the origins are the columns of the matrix, the destinations are the rows,
and the route distance is the cell at their intersection. Appendix Figure 1 illustrates the distance
from B to C; note that the distance from C to B is slightly longer.
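For contrast with the road distances used in the research, the crow-fly alternative can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the mean Earth radius of 6371 km and the example coordinates are assumptions, not values from the analysis.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (crow-fly) distance in km between two points
    given in decimal degrees, using the Haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def crow_fly_matrix(points):
    """Pair-wise distance matrix for a list of (lat, lon) tuples.

    matrix[i][j] is the distance from point i to point j. Because the
    Haversine distance is symmetric, matrix[i][j] == matrix[j][i], which
    is exactly what a road-network matrix does not guarantee."""
    return [[haversine_km(*origin, *destination) for destination in points]
            for origin in points]


# Hypothetical coordinates for illustration only.
points = [(54.52, -1.55), (55.07, -3.61), (54.64, -3.54)]
matrix = crow_fly_matrix(points)
print(round(matrix[0][1], 1), round(matrix[1][0], 1))
```

A crow-fly matrix is always symmetric and ignores the road network, so it cannot reproduce the B-to-C versus C-to-B difference noted above; this is one reason a routing engine was preferred.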
Appendix Table 1 – Example providers of distance matrices and the cost (source:
author)
*an element is a single cell within a distance matrix e.g. Appendix Figure 1 has 16 elements
**this research used a 1720×1720 matrix which has 2,958,400 elements
Appendix Figure 4 - Screenshot of the frontend of OSRM running on local machine
(source: author)
– Setting the time-limit for the routing algorithm
\[ \text{time limit (seconds)} = 30 + \left\lfloor \frac{(\text{number of nodes})^{2}}{c} \right\rfloor \]
where the “number of nodes” is the unique number of nodes in the baseline route and c is a
constant. Note the brackets return an integer.
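The time-limit rule can be expressed as a short Python function. This is a minimal sketch: the function name is hypothetical, and the value of c used in the example is an arbitrary placeholder because the constant is not stated here.

```python
import math


def time_limit_seconds(number_of_nodes, c):
    """Solver time limit: a 30-second base plus the floor of
    (number of nodes squared) divided by the constant c."""
    if c <= 0:
        raise ValueError("c must be a positive constant")
    return 30 + math.floor(number_of_nodes ** 2 / c)


# c = 50 is a placeholder chosen only to illustrate the calculation.
print(time_limit_seconds(10, 50))
```

Because the node count is squared, the limit grows quickly for larger routes, giving the solver more search time as the problem becomes harder.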
– Deriving customer demand for Analyses 3,4,5
Customer demand for bags of linen is derived based on the assumption from Shortridge
that customers that are visited most frequently tend to be larger customers and have
larger orders. Thus, the customer demand is modelled as proportional to the per week
frequency a customer was delivered to in February 2020. This frequency is calculated
using Appendix Equation 2.
where “Node PerWeekFreq” is the minimum of 6 and the average number of times per week a
node is visited during February 2020 (6 being the maximum integer number of days per week
any node can be serviced, assumption viii.), and 4 is the number of complete weeks in February
2020.
The demand for a customer/node is not static and is modelled to vary based on the
demand of the other customers in the route and the capacity of the original delivery
vehicle (Table 4-2). Therefore, the node demand for a particular customer, j, in any one
of the 72 records is as described in Appendix Equation 3.
where “Baseline Vehicle Capacity” is the capacity in bags of the original delivery vehicle, and
is divided by the sum of the “Node PerWeekFreq” for all nodes, n, in the original delivery
route. This is multiplied by the “Node PerWeekFreq” for customer j and multiplied by the
vehicle “Utilisation” – a percentage between 0 and 100%. Node Demand is then rounded down
to the nearest integer.
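Read together, the two verbal definitions above can be written out as follows. This is a reconstruction from the wording of the text only: the numerator "visits in February 2020" and the summation index i are assumed notation, not the author's own typesetting.

```latex
\text{Node PerWeekFreq} = \min\left( \frac{\text{visits in February 2020}}{4},\; 6 \right)

\text{Node Demand}_{j} = \left\lfloor
  \frac{\text{Baseline Vehicle Capacity}}{\sum_{i=1}^{n} \text{Node PerWeekFreq}_{i}}
  \times \text{Node PerWeekFreq}_{j} \times \text{Utilisation}
\right\rfloor
```

Because each demand is rounded down, the demands on a route sum to no more than the baseline vehicle capacity multiplied by the utilisation.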
Appendix Figure 5 - Python code that derives the customer/node demand. Here original
vehicle utilisation is set to 95% (source: author)
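The calculation described in this appendix can be sketched as below. This is not the author's code from Appendix Figure 5; it is a minimal reading of the two definitions, and the function names, the node labels, the visit counts and the 100-bag capacity are all hypothetical.

```python
import math

MAX_DAYS_PER_WEEK = 6   # cap from assumption viii.
COMPLETE_WEEKS = 4      # complete weeks in February 2020


def per_week_freq(visits_in_month):
    """Average visits per week, capped at the maximum serviceable days."""
    return min(visits_in_month / COMPLETE_WEEKS, MAX_DAYS_PER_WEEK)


def node_demands(visits_by_node, baseline_vehicle_capacity, utilisation=0.95):
    """Share the utilised vehicle capacity across nodes in proportion to frequency."""
    freqs = {node: per_week_freq(v) for node, v in visits_by_node.items()}
    total_freq = sum(freqs.values())
    if total_freq == 0:
        raise ValueError("at least one node must have been visited")
    return {
        node: math.floor(baseline_vehicle_capacity / total_freq * f * utilisation)
        for node, f in freqs.items()
    }


# Hypothetical route: three nodes visited 26, 8 and 4 times in the month,
# originally served by a vehicle that holds 100 bags.
visits = {"n1": 26, "n2": 8, "n3": 4}
demands = node_demands(visits, baseline_vehicle_capacity=100)
print(demands)
```

In this example the first node's 6.5 visits per week are capped at 6, so the frequencies are 6, 2 and 1 and the capacity is shared in those proportions.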
– Additional analysis – differences in routing between the depots
Of the three depots, Workington served 713 nodes in February, with Dumfries and
Darlington serving 557 and 466, respectively. The box plots (Appendix Figure 6) also
indicate a wider variance in the distances of Workington routes, which is likely
driven by differences in the number of nodes delivered to per day and per route
(Appendix Figure 7).
Appendix Figure 6 - Box Plots per depot of the total distance (A) and duration (B) in
original routing (source: author)
Appendix Table 2 - Descriptive Statistics for total distance and total time for each depot
in original routing (source: author)
Appendix Figure 7 - Distribution of the number of nodes to be delivered to each day in
February (source: author)
Appendix Figure 8 - Map of nodes and depot serviced from. Tooltip indicates the depot
location (Red = Dumfries, Blue = Workington, Black = Darlington) (source: author)
Throughout the comparative analyses, algorithm routing shows the greatest reduction
in total distance and duration, relative to the original routing, at the Workington
depot. For example, in Analysis 2 routes from Workington reduced total distance by an
average of 25%, compared with 17% at Dumfries and 12% at Darlington. Even simply
reordering the nodes within each route in Analysis 1 saw Workington account for 17 of
the 25 routes that made a 20% reduction in distance covered.
Appendix Figure 9 shows the visual differences in route distributions between the
depots in Analysis 2. Whilst all algorithm routing distributions differ from the original
routing to a statistically significant degree (p-values for the Wilcoxon 2-sample
test: 7.4×10⁻⁹ (Workington), 5.6×10⁻⁵ (Dumfries) and 0.001 (Darlington)), the difference
is much less pronounced at the Darlington depot. Take the routing from Darlington on
13/02/2020, which had 73 nodes to service: the five routes in the original routing
(Appendix Figure 10) are visually very similar to those proposed by the algorithm
(Appendix Figure 11). Perhaps routing methods differ between the depots or, since the
Darlington depot appears to cover a greater area (), a manual approach to clustering
routes is easier there.
Appendix Figure 9 - Histograms of the route distance from each depot under Original
routing and Algorithm routing in Analysis 2 (source: author)
Appendix Figure 10 - Original routing from Darlington on 13/02/2020 (source: author)
The changes to routing and vehicle selection also have a differing impact on the number
of vehicle journeys from each depot versus the original routing. Workington sees the
largest reduction in vehicle journeys, falling from a median of 9.5 to 6.5 with improved
vehicle selection (Appendix Table 3), whereas the number of journeys remains the same at
Darlington (Appendix Table 5).
Appendix Table 4 - Selected descriptive statistics of the number of vehicle journeys from
Dumfries depot (source: author)
A questionnaire will be shared across Shortridge Ltd asking questions about data and
analytics understanding at Shortridge. The anonymous responses will provide
justification and a background for the research to help future readers interpret the
results.
It is up to you to decide whether or not to take part. You do not have to take part if you
do not want to. If you do decide to take part, please follow the link in the email and
answer positively to the two statements in the questionnaire related to consent.
You can withdraw at any point of the study, without having to give a reason. If any
questions during the questionnaire make you feel uncomfortable, you do not have to
answer them. Withdrawing from the study will have no effect on you. If you withdraw
from the study, I will not retain the information you have given thus far, unless you are
happy for me to do so.
The records from this study will be kept as confidential as possible. The data will be
stored securely on the University systems. Only myself, my supervisor and exam
markers will have access to the data generated by the study. Your data will be
anonymised – your name is not recorded so will not be used in any reports or
publications resulting from the study. Any hard copies of research information will be
kept in locked files at all times.
The legal basis used to process your personal data will be Legitimate interests. The
legal basis used to process special category personal data (e.g. data that reveals racial
or ethnic origin, political opinions, religious or philosophical beliefs, trade union
membership, health, sex life or sexual orientation, genetic or biometric data) will be for
scientific and historical research or statistical purposes.
To request a copy of the data held about you please contact Edmund, egh3@hw.ac.uk.
If you have any questions regarding this study, please contact the researcher: Edmund
Houldridge (egh3@hw.ac.uk)
If you have any concerns or complaints regarding the conduct of this research, in the
first instance please contact Dr Adam Gripton (a.gripton@hw.ac.uk)
If you are dissatisfied with the response from my supervisor, please contact the School
of Social Sciences Research Officer: Dr James Richards (j.richards@hw.ac.uk)
START OF SCRIPT
#!/usr/bin/env python
# coding: utf-8
# In[4]:
from __future__ import print_function  # must be the first statement; a no-op on Python 3
import collections.abc  # binds `collections` and ensures `collections.abc` is loaded
import datetime
import json
import pickle
import folium
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd
import polyline
import requests
import vrpy as vrp
from ortools.constraint_solver import routing_enums_pb2
from ortools.constraint_solver import pywrapcp
pd.set_option('display.max_columns', None)
#pd.set_option('display.max_rows', 500)
#load inputs
location = "C:/Users/Ed/OneDrive - Heriot-Watt University/Uni work/Dissertation/ShortridgeData/VRPinput_v2/VRPinputs3_allv_2.txt"
with open(location, "rb") as fp: # Unpickling
    list_of_nodes_bl = pickle.load(fp)
# In[6]:
#load matrices
location_dist = "C:/Users/Ed/OneDrive - Heriot-Watt University/Uni work/Dissertation/ShortridgeData/Distance_matrix1.txt"
location_dur = "C:/Users/Ed/OneDrive - Heriot-Watt University/Uni work/Dissertation/ShortridgeData/Duration_matrix1.txt"
with open(location_dist, "rb") as fp: # Unpickling
    distance_matrix = pickle.load(fp)
with open(location_dur, "rb") as fp: # Unpickling
    duration_matrix = pickle.load(fp)
# In[7]:
#for testing
#list_of_nodes_bl = list_of_nodes_bl.head(3)
# In[8]:
transit_callback_index = routing.RegisterTransitCallback(distance_callback)
#Solve!
solution = routing.SolveWithParameters(search_parameters)
vrp_output['routes'].append(nodes+[manager.IndexToNode(index)])
vrp_output['r_node_capacity'].append(loads)
vrp_output['r_capacities'].append(route_load)
plan_output += ' {0} Load({1})\n'.format(manager.IndexToNode(index),route_load)
plan_output += 'Distance of the route: {}m\n'.format(route_distance)
plan_output += 'Load of the route: {}\n'.format(route_load)
vrp_output['r_distance'].append(route_distance)
vrp_output['r_node_dists'].append(dists)
print(plan_output)
total_distance += route_distance
total_load += route_load
vrp_output['total_load'] = total_load
vrp_output['total_distance'] = total_distance
print('Total Distance of all routes: {}m'.format(total_distance))
print('Total load of all routes: {}\n\n'.format(total_load))
return vrp_output
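def _flatten(l):
    # Recursively yield the leaf items of an arbitrarily nested iterable,
    # treating strings and bytes as single items rather than iterables.
    for el in l:
        if isinstance(el, collections.abc.Iterable) and not isinstance(el, (str, bytes)):
            yield from _flatten(el)
        else:
            yield el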
# In[12]:
def _loop_frame (row):
    #Vehicles
    no_vehicles = len(row['Bag_Capacity_all'])
    v_capacity = row['Bag_Capacity_all']
    except Exception:
        # Fallback: reuse the row's existing '...2' values and flag algo_solve = 'N'
        vrp = {}
        vrp['total_distance'] = row['total_distance2']
        vrp['total_time'] = row['total_time2']
        vrp['vehicle'] = list(range(len(row['Bag_Capacity'])))
        vrp['route_fix'] = row['routes2']
        vrp['r_node_times'] = []
        vrp['r_node_dists'] = []
        vrp['r_distance'] = row['r_distance2']
        vrp['route_time'] = row['route_time2']
        vrp['total_load'] = sum(row['Demand2'])
        vrp['r_node_capacity'] = []
        vrp['algo_solve'] = 'N'
    return (vrp['total_distance'],
            vrp['total_time'],
            vrp['vehicle'],
            vrp['route_fix'],
            vrp['r_node_times'],
            vrp['r_node_dists'],
            vrp['r_distance'],
            vrp['route_time'],
            vrp['total_load'],
            vrp['r_node_capacity'],
            vrp['algo_solve'])
# ### 4. Run Loop over dataset and add algo columns to dataset
# In[13]:
cols = ['total_distance_algo',
'total_time_algo',
'vehicle_algo',
'routes_algo',
'r_node_times_algo',
'r_node_dists_algo',
'r_distance_algo',
'r_time_algo',
'total_load',
'r_node_capacity',
'algo_solve']
list_of_nodes_bl[cols] = list_of_nodes_bl.apply(lambda row: pd.Series(_loop_frame(row)), axis=1)
# In[14]:
#save file
location = "C:/Users/Ed/OneDrive - Heriot-Watt University/Uni work/Dissertation/ShortridgeData/4. All_vehicles/5.All_vehicles_algo_time_95"+today+".txt"
with open(location, "wb") as fp: #Pickling
    pickle.dump(list_of_nodes_bl, fp)
# In[42]:
# In[15]:
list_of_nodes_bl.to_csv("C:/Users/Ed/OneDrive - Heriot-Watt University/Uni work/Dissertation/ShortridgeData/4. All_vehicles/5.All_vehicles_algo_time_95"+today+".csv")
# In[ ]:
END OF SCRIPT