
MSc Logistics and Supply Chain Management with Business Analytics

2019-2020

Leveraging Big Data Analytics in an SME: a comparison of an experience-led heuristic
and an analytical approach to a real Vehicle Routing Problem

Presented for the award of MSc.


BDA Dissertation Example

Acknowledgements

A personal thanks to Shortridge Ltd for agreeing to be the case organisation, and to
the employees for taking the time to participate in the research and providing the
information needed to complete the dissertation, especially amid the chaos and
substantial impacts of the COVID-19 pandemic.

Thank you to my supervisor, Dr Adam Gripton, for providing support, time and
experience throughout the dissertation process.

Declaration

I declare that the thesis embodies the results of my own work and has been composed
by myself and meets the University policies on plagiarism and ethical research.
Where appropriate within the thesis I have made full acknowledgement to the work
and ideas of others or have made reference to work carried out in collaboration with
other persons.

Signature of student:    Date: 2 August 2020

Word count: 14574

BDA dissertation Heriot-Watt University
sample

Abstract

Purpose: The aim of this research is to investigate how a BDA solution to a VRP
compares to an experience-led heuristic in an SME and to understand the issues this
case highlights in SME adoption of BDA.

Methodology: The research strategy followed a single case study with an embedded
mixed methods design. Primary data was collected using a self-report online
questionnaire. A purposeful sample of the office employees at the case organisation
resulted in 13 responses (a response rate of 61.9%). Secondary data consisted of the
February 2020 extract of telematics data from the case organisation plus open-source
data, upon which BDA (descriptive and prescriptive) was performed using Python
(v3.8.3, 64-bit) and Google OR-Tools (v7.6.7691).

Findings: The BDA solution to the VRP performed better than the experience-led
heuristic in all simulations, reducing the total distance covered by between 4% and 31%
and the total duration of routes by between 4% and 26%. Barriers to the adoption of
BDA that are consistent with the literature emerged from the analysis and were reported
by employees. These include limited technical expertise, a limited data-driven culture,
and limited understanding of “how” to collect the data and generate value from it.

Research value: The research illustrates the value of BDA in solving a real business
problem and provides an example use-case for SMEs. In addition, it provides evidence
of the barriers to BDA adoption in SMEs identified in the literature.

Practical implications: Provides an example showing the inefficiency of manually
solving complex problems and, in turn, indicates the transformative nature of BDA.
Specifically, it shows how to set up an in-house routing engine (and its prerequisites)
to optimise the routing of delivery vehicles as an alternative to CVRS.

Limitations: As this is a single case study, the results may lack generalisability. The
selection of research methods was constrained, and the validity of findings was also
affected by the COVID-19 pandemic.


Table of Contents

ACKNOWLEDGEMENTS

ABSTRACT

LIST OF FIGURES

LIST OF TABLES

ABBREVIATIONS

CHAPTER 1 - INTRODUCTION

CHAPTER 2 - LITERATURE REVIEW

2.1. Introduction
2.2. Big Data and Big Data Analytics
2.2.1. Defining Big Data
2.2.2. Defining Big Data Analytics (BDA)
2.2.3. Other Data Analytics
2.2.4. Applications of BDA in LSCM
2.3. Small Medium Enterprises (SMEs)
2.3.1. SMEs and BDA
2.3.2. Resource Based View (RBV) of the firm
2.3.3. Big Data Analytics Capability (BDAC)
2.3.4. Development of BDAC in SMEs
2.4. Logistics and the Vehicle Routing Problem (VRP)
2.4.1. Background to the VRP
2.4.2. Solving the VRP
2.4.3. Applying VRPs in practice
2.5. Conclusion
2.5.1. Research Questions
2.5.2. Research Aim
2.5.3. Research Objectives

CHAPTER 3 - METHODOLOGY

3.2. Research Philosophy
3.2.1. Ontology
3.2.2. Epistemology
3.2.3. Axiology
3.2.4. Approaches to theory development
3.3. Research Strategy
3.4. The case: Shortridge Ltd.
3.5. Data Collection
3.5.1. Primary Data
3.5.2. Secondary Data
3.6. Data Analysis plan
3.6.1. Qualitative Analysis
3.6.2. Quantitative Analysis
3.7. Ethical Issues
3.8. Conclusion

CHAPTER 4 - FINDINGS AND DATA ANALYSIS

4.1. Introduction
4.2. Quantitative Analysis
4.2.1. Problem Context
4.2.2. Key preliminary activity
4.2.3. Results and Analysis
4.2.4. Limitations
4.3. Qualitative Analysis
4.3.1. Demographics of the sample
4.3.2. Problem-Solving (Q3)
4.3.3. Example of company using BD (Q4)
4.3.4. Examples of BD at Shortridge (Q5)
4.3.5. Barriers to using BD at Shortridge (Q6)
4.3.6. BD/BDA tools and techniques (Q7)
4.4. Conclusion

CHAPTER 5 - DISCUSSION

5.1. Introduction
5.2. Discussion
5.2.1. Research Question 1 - How does a BDA solution to a VRP compare to an experience-led heuristic?
5.2.2. Research Question 2 - How does a BDA solution to a VRP compare to an experience-led heuristic?
5.3. Conclusion

CHAPTER 6 - CONCLUSION

6.1. Introduction
6.2. Research Summary and Outcomes
6.3. Implications for practitioners
6.4. Implications for future research
6.5. Limitations

REFERENCES

APPENDICES

– Questions from Questionnaire (source: author)
– Key Python packages used in the analysis (source: author)
– Data Cleansing activity detail (source: author)
– Routing Engine setup
– Setting the time-limit for the routing algorithm
– Deriving customer demand for Analyses 3, 4 and 5
– Additional analysis – differences in routing between the depots
– Initial coding template used in template analysis
– Information sheet sent to questionnaire participants (adapted from template)
– Python script used in Analysis 5 (source: author)


List of Figures

Figure 2-1 - The eight attributes of Big Data (adapted from Mikalef et al. (2018) and Belhadi et al. (2019))

Figure 2-2 - Types of Big Data Analytics (source: Belhadi et al. (2019, p. 3))

Figure 2-3 - An illustration of The Sweep Method with a suboptimal solution (left) versus the optimal solution (right) for a 3-node-capacity vehicle (source: author)

Figure 3-1 - Diagram depicting the research design (source: author)

Figure 4-1 - Python code used to geocode the postcodes from the telematics data (source: author)

Figure 4-2 - Python code to request distance and duration matrix from local OSRM (source: author)

Figure 4-3 - Example of how demand was derived based on delivery frequency and capacity of original delivery vehicle (source: author)

Figure 4-4 - Time series of the number of routes operated in February (source: author)

Figure 4-5 - Time series of the number of nodes serviced per day during February (source: author)

Figure 4-6 - Time series of total distance (left) and total duration (right) of routes during February (source: author)

Figure 4-7 - Map of nodes visited in February 2020 with red denoting location visited on a Wednesday (source: author)

Figure 4-8 - Map of how many times per week a node is serviced: grey - once, black - twice and red - three or more (source: author)

Figure 4-9 - Histogram of percentage difference in route distance (left) and duration (right) between Original and Algorithm routes in Analysis 1 (source: author)

Figure 4-10 - Scatter graphs of the number of stops versus % difference between the Original and Algorithm routing in route distance (left) and time (right) in Analysis 1 (source: author)

Figure 4-11 - Network plot of Original route “2020-02-26+PX19LCJ” (blue) and equivalent Algorithm route (teal) with depot in red (source: author)

Figure 4-12 - Map plot of Original route “2020-02-26+PX19LCJ” with the red tooltip indicating the depot (source: author)

Figure 4-13 - Map plot of Algorithm route “2020-02-26+PX19LCJ” with the red tooltip indicating the depot (source: author)

Figure 4-14 - The distribution of route distances for Original and Algorithm routing, adjusting only the order of routes in Analysis 1 (left) and allowing the Algorithm to route at depot level in Analysis 2 (right) (source: author)

Figure 4-15 - Network view of Original routing (left) and Algorithm routing (right) from Workington depot on 05/02/2020 in Analysis 2 (source: author)

Figure 4-16 - Comparison in vehicle distance (left) and time (right) between Original routing and Algorithm routing from Workington depot on 05/02/2020 in Analysis 2 (source: author)

Figure 4-17 - Map plot of Original routing from Workington depot on 05/02/2020 (source: author)

Figure 4-18 - Map plot of Algorithm routing from Workington depot on 05/02/2020 in Analysis 2 (source: author)

Figure 4-19 - Number of nodes visited per vehicle for the Original routing and the Algorithm routing under 95% CoODV simulation (left) and the bag capacity of the vehicles (right) (source: author)

Figure 4-20 - Number of nodes visited per vehicle for the Original routing and the Algorithm routing under 95% CoODV simulation (left) and the bag capacity of the vehicles (right) with vehicle selection (source: author)

Figure 4-21 - Number of nodes visited per vehicle for the Original routing and the Algorithm routing under 85% CoODV simulation with vehicle selection (source: author)

Figure 4-22 - Route duration for the Original routing and the Algorithm under 95% CoODV simulation with vehicle selection (source: author)

Figure 4-23 - Map plot of Algorithm routing from Workington depot on 05/02/2020 with 95% CoODV and vehicle selection (source: author)

Figure 4-24 - Routes from Original routing and Algorithm routing in Analysis 5 with the route time constraint (source: author)

Figure 4-25 - Example of a route that returns to the depot twice (source: author)

Figure 4-26 - Role of participant at Shortridge (left) and length of time with Shortridge (right) (source: author)

Figure 4-27 - Selected software, services and programming options to Q7 from the questionnaire (source: author)

Figure 4-28 - Selected responses to analytical techniques in Q7 of the questionnaire (source: author)

List of Tables

Table 2-1 - Table of terms from the literature that represent interrogating data to generate value (source: author)

Table 2-2 - Common Vehicle Routing Problem variants (source: adapted from Rincon-Garcia et al. (2017, p. 128))

Table 2-3 - Common methods in the literature for solving Vehicle Routing Problems (adapted from Güneri (2007) and Gendreau et al. (2008))

Table 3-1 - Table of research questions, aim and objectives (source: author)

Table 3-2 - Additional secondary data collected during the research (source: author)

Table 3-3 - Summary of the approach to data analysis and how it aligns with research aims and objectives (source: author)

Table 3-4 - Summary of Big Data Analysis approach (source: author)

Table 4-1 - Description of the comparative BDA analyses (source: author)

Table 4-2 - Shortridge vehicle fleet (types and capacities) (source: author)

Table 4-3 - Identification of the critical data elements from the telematics dataset (source: author)

Table 4-4 - Assumptions for each version of the routing algorithm analysis (source: author)

Table 4-5 - Number of each vehicle type at each depot (source: author)

Table 4-6 - Summary of the five analysis setups and results (source: author)

Table 4-7 - Descriptive statistics of the number of routes per day in February (source: author)

Table 4-8 - Descriptive statistics of the number of nodes serviced per day (source: author)

Table 4-9 - Percentage of available journeys* on which each vehicle type was used during February 2020 (source: author)

Table 4-10 - Average usage of each vehicle type during February 2020 (source: author)

Table 4-11 - Description of records with and without solutions from the routing algorithm in Analysis 5 (source: author)

Table 4-12 - Responses from Q3 of the questionnaire (source: author)

Table 4-13 - Summary of responses to Q6 from the questionnaire (source: author)

Abbreviations

3PL 3rd Party Logistics Provider

4PL 4th Party Logistics Provider

BA Business Analytics

BDA Big Data Analytics

BDAC Big Data Analytics Capability

BD Big Data

BI Business Intelligence

BI&A Business Intelligence & Analytics

CDE Critical Data Element

CHVRP Capacitated Heterogenous-fleet Vehicle Routing Problem

COVID-19 Coronavirus Disease 2019

CoODV Capacity of Original Delivery Vehicle

CRM Customer Relationship Management

CVRP Capacitated Vehicle Routing Problem

CVRS Computerised Vehicle Routing Software

DC Dynamic Capabilities

DDC Data Driven Culture

ERP Enterprise Resource Planning

LSCM Logistics and Supply Chain Management

OR Operational Research

OSRM Open Source Routing Machine


RBV Resource Based View

SME Small Medium Enterprise

TCHVRP Time-constrained Capacitated Heterogenous-fleet Vehicle Routing Problem

TSP Travelling Salesperson Problem
VRP Vehicle Routing Problem



Chapter 1 - Introduction

The European Commission projects that, by 2025, the EU data economy will grow
roughly 2.75-fold, from €301b in 2018 to €829b, with the volume of data growing
roughly 5.3-fold over the same timeframe (European Commission, 2020). Big Data
Analytics (BDA) is seen as a capability that enables firms to extract value from
ever-growing volumes of data (H Chen et al., 2012; Wang et al., 2016; Nguyen et al.,
2018). Among its benefits, BDA enables objectivity and transparency in decision-making
(Belhadi et al., 2019). By replacing manual processes, BDA has been shown to increase
profitability (Raguseo et al., 2020) through reduced operational costs (Carlan et al.,
2020), improved efficiency (Mikalef et al., 2018) and increased productivity (Müller et
al., 2018; Ferraris et al., 2019). BDA has also been positively associated with
innovation (Božič and Dimovski, 2019) and with a firm's agility (Wamba and Akter,
2019). Although LaValle et al. (2011) stated that top-performing firms use analytics five
times more than lower performers, and expectations that BDA will enhance performance
are high, adoption remains a minority practice across supply chain functions (Wang et
al., 2016) and logistics (Schoenherr and Speier-Pero, 2015), with many firms struggling
to deliver valuable insights (Roßmann et al., 2018) and unconvinced of the influence
BDA has had on firm outcomes (Ghasemaghaei and Calic, 2020). In Small Medium
Enterprises (SMEs), decision-making is more likely to rely on feelings and intuition
(Garengo and Bititci, 2007), and estimates of BDA adoption are even lower: 15% in
2016 (Eurostat, 2020). SMEs also have a significant impact on the road network (Miwa
and Bell, 2017), yet uptake of Computerised Vehicle Routing Software (CVRS) is also
low (McCrea, 2017, cited in Fontaine et al., 2020, p. 1), which suggests that
experience-led heuristic approaches to routing may be common. Logistics has been
highlighted as one of the most applicable areas for BDA (Kache and Seuring, 2017), yet
there is a lack of empirical examples of BDA in the literature (Mikalef et al., 2018) and
a limited volume of research on the adoption of BDA in SMEs (Coleman et al., 2016;
Bordeleau et al., 2019). This research seeks to address these gaps through a case study
of an SME and the Vehicle Routing Problem (VRP). Thus, the aim of this research is to
investigate how a BDA solution to a VRP compares to an experience-led heuristic in an
SME and to understand the issues this case highlights in SME adoption of BDA.

The rest of this dissertation is organised as follows. Chapter 2 reviews the literature to
provide an understanding of current research and theory; Chapter 3 details the
methodology followed, including the research philosophy, strategy, details of the case
and data collection instruments; Chapter 4 presents the findings and analysis of the
collected data; and Chapter 5 discusses these results in relation to the academic
literature, before the research is concluded in Chapter 6 with implications for
practitioners and research. The appendices contain supplementary information referred
to in the chapters.

Chapter 2 - Literature Review

2.1. Introduction

The following chapter reviews the existing literature on Big Data (BD), Big Data
Analytics (BDA) and the adoption of BDA within Small Medium Enterprises (SMEs). The
review then explores the Vehicle Routing Problem (VRP) from both a practical and a
theoretical perspective. The chapter concludes with the research questions, aim and
objectives for this research.

2.2. Big Data and Big Data Analytics


2.2.1. Defining Big Data

As with many terms associated with data and analytics, the literature is ambiguous as to
precise definitions (Gandomi and Haider, 2015; Dedić and Stanier, 2016). In the
mainstream, BD usually refers to structured datasets that cannot be managed and
processed with traditional IT systems (Min Chen et al., 2014; Duan and Xiong, 2015).
However, Gandomi and Haider (2015) highlight that this definition is largely driven by
the marketing exploits of large software companies, and that it overlooks the
semi-structured and unstructured data (video files, sensor data, geospatial data, HTML
and text files) which make up the vast proportion of data today (Cukier, 2010; Syed et
al., 2013). In the LSCM literature, BD is instead commonly defined by a number of “V”
attributes. Volume, Velocity and Variety, the original three attributes proposed by
Russom (2011) and McAfee et al. (2012), have since been extended, so that there are
now eight attributes of BD (Mikalef et al., 2018; Belhadi et al., 2019). Yet the literature
is unclear on how many of these attributes data must have, and what thresholds it must
exceed, to be considered BD. This may be because the definition is continually
evolving, contextual or subjective: BD for a smaller organisation might be different to
BD for a larger organisation (Gandomi and Haider, 2015). With such a broad definition
in the LSCM literature, most data could be justified as BD provided it meets a single “V”
characteristic; Volume is the attribute most commonly associated with BD, but it is just
one of eight dimensions. Additionally, Ghasemaghaei and Calic (2020) found that
Volume is not critical for innovation, so collecting large amounts of data is unlikely to
help innovation. Increased digitisation and computing power have created Variety in
data sources, which in turn creates opportunities: data is no longer confined to standard
systems (e.g. CRM). McAfee et al. (2012) argue that, through this prevalence and the
opportunities it has unearthed, data has become BD.

Figure 2-1 - The eight attributes of Big Data (adapted from Mikalef et al. (2018) and
Belhadi et al. (2019))

2.2.2. Defining Big Data Analytics (BDA)

Researchers have put forward many differing definitions of BD and BDA (Rozados and
Tjahjono, 2014). The distinction between the two is sometimes unclear, with the same
definition used for both terms (e.g. Wamba et al. (2015) and Wamba and Akter (2019)).
It appears more common in the literature to define BDA as the processes, tools and
techniques used to generate insight, and subsequently value, from BD (Russom, 2011;
Lamba and Dubey, 2015; George et al., 2016). Belhadi et al. (2019) break down BDA
into Descriptive, Inquisitive, Predictive and Prescriptive Analytics (see Figure 2-2).
Descriptive analytics provide a view of the current state, often using descriptive
statistics delivered through reporting tools and dashboards (Rozados and Tjahjono,
2014; Duan and Xiong, 2015). Examples include reviewing the volume of sales and the
total distance covered.
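As a minimal sketch of descriptive analytics in Python (the language used for the analysis in this research), the snippet below summarises route records into the number of routes operated and the total distance covered per day, then reports simple descriptive statistics. The dates, vehicle IDs and distances are invented for illustration and are not taken from the case data; the `daily_summary` helper is hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical route records: (date, vehicle ID, route distance in km).
# Illustrative values only; these are not Shortridge telematics data.
routes = [
    ("2020-02-03", "VAN01", 112.4),
    ("2020-02-03", "VAN02", 87.6),
    ("2020-02-04", "VAN01", 120.0),
    ("2020-02-04", "VAN02", 95.0),
    ("2020-02-04", "VAN03", 60.0),
]


def daily_summary(records):
    """Describe what happened: routes operated and total km for each day."""
    per_day = defaultdict(list)
    for date, _vehicle, distance_km in records:
        per_day[date].append(distance_km)
    return {
        date: {"routes": len(dists), "total_km": round(sum(dists), 1)}
        for date, dists in sorted(per_day.items())
    }


summary = daily_summary(routes)
for date, stats in summary.items():
    print(date, stats)

# Descriptive statistics of the number of routes per day
route_counts = [stats["routes"] for stats in summary.values()]
print("mean routes/day:", mean(route_counts), "median:", median(route_counts))
```

The output only describes the current state; explaining why one day needed more routes than another is the task of the next type of analytics.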

Inquisitive analytics explore why something happened (Belhadi et al., 2019), though
some researchers do not distinguish between Descriptive and Inquisitive analytics (e.g.
Nguyen et al. (2018)). The process builds on descriptive analytics to investigate root
causes and reveal underlying patterns. Examples include investigating correlations
between sales volumes and seasonality, or identifying customer sentiment by analysing
social media data (Gandomi and Haider, 2015).
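To illustrate the correlation step, the following sketch computes a Pearson correlation coefficient between a seasonality index and monthly sales. Both series and the `pearson` helper are hypothetical assumptions for illustration. A strong coefficient only flags a pattern worth investigating; on its own it does not establish the root cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance term and the two spread terms (unnormalised, so n cancels)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical monthly data: a seasonality index (e.g. average temperature)
# and unit sales for the same months. Illustrative values only.
temperature = [4, 6, 9, 12, 16, 19]
sales = [210, 200, 185, 170, 150, 140]

r = pearson(temperature, sales)
print(f"r = {r:.2f}")  # strongly negative for this made-up series
```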

Figure 2-2 - Types of Big Data Analytics (source: Belhadi et al. (2019, p. 3))

Predictive analytics consist of a number of techniques that forecast future outcomes
based on historical and current data (Duan and Xiong, 2015; Gandomi and Haider,
2015). Predictive analytics techniques feature statistical methods and extend to the use
of supervised or unsupervised machine learning algorithms. Applications range from
forecasting sales using regression techniques to developing an unsupervised neural
network to achieve the same end (e.g. Kuo (2001)).
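As a minimal example of the simpler end of this range, the sketch below fits an ordinary least squares linear trend to a short, invented sales series and extrapolates one period ahead. The helper names and data are hypothetical; a real forecast would also need to account for seasonality and be validated on held-out data.

```python
def fit_linear_trend(ys):
    """Ordinary least squares fit of y = a + b*t for t = 1..n."""
    n = len(ys)
    ts = range(1, n + 1)
    t_mean = (n + 1) / 2
    y_mean = sum(ys) / n
    # Slope: covariance of (t, y) divided by variance of t
    b = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    a = y_mean - b * t_mean  # intercept passes through the means
    return a, b


def forecast(ys, steps_ahead=1):
    """Extrapolate the fitted trend to period n + steps_ahead."""
    a, b = fit_linear_trend(ys)
    return a + b * (len(ys) + steps_ahead)


sales = [10, 12, 14, 16]  # hypothetical quarterly sales
print(forecast(sales))  # 18.0
```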

Prescriptive analytics use mathematical programming and simulations to identify
optimal actions (Duan and Xiong, 2015). These include scenario analysis, stress testing
and linear or non-linear programming. Examples include mining video data for optimal
product placement based on customer behaviour, or finding the minimum-distance route
for a vehicle visiting a set of nodes (Gandomi and Haider, 2015).
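The routing example can be made concrete with a deliberately tiny sketch. Given a hypothetical distance matrix with the depot at node 0, exhaustively enumerating every visiting order finds the minimum-distance closed route. This brute-force approach is an illustration only: the number of orders grows factorially with the number of nodes, which is why practical solvers (such as the Google OR-Tools library used in this research) rely on heuristics and metaheuristics instead.

```python
from itertools import permutations

# Hypothetical symmetric distance matrix (km); node 0 is the depot.
DIST = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]


def route_length(route, dist):
    """Total distance of consecutive legs along a route."""
    return sum(dist[a][b] for a, b in zip(route, route[1:]))


def best_route(dist, depot=0):
    """Exact optimum by enumerating all customer orders (tiny instances only)."""
    customers = [n for n in range(len(dist)) if n != depot]
    best = None
    for order in permutations(customers):
        route = (depot, *order, depot)  # start and finish at the depot
        length = route_length(route, dist)
        if best is None or length < best[1]:
            best = (route, length)
    return best


route, length = best_route(DIST)
print(route, length)  # (0, 1, 3, 2, 0) 80
```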

2.2.3. Other Data Analytics

Across the academic divide, the Operational Research (OR) literature tends to use
Business Intelligence (BI), Business Analytics (BA) and Business Intelligence and
Analytics (BI&A) in a similar way to how BDA is used in the LSCM literature. BI is
typically associated with extracting insights and reporting from data held in a structured
tabular form, in either a spreadsheet or a relational database management system (H
Chen et al., 2012; Mortenson et al., 2015; Dedić and Stanier, 2016). BA emerged later
to represent the extensive use of data, statistical analysis and modelling, the analytical
component often associated with BI (H Chen et al., 2012; Mortenson et al., 2015). The
composite term BI&A is used more widely in recent literature to mitigate ambiguity
between the two individual terms (Mortenson et al., 2015). BI&A is defined as “the
techniques, technologies, systems, practices, methodologies, and applications that
analyse critical business data to help an enterprise better understand its business and
market and make timely business decisions” (H Chen et al., 2012, p. 1166). Some
researchers distinguish BI&A from BDA on the grounds that BI&A does not extend to
BD (Dedić and Stanier, 2016; Kache and Seuring, 2017); however, other researchers
disagree (Mortenson et al., 2015; Hindle et al., 2020). BI&A is also associated with the
four types of analytics that define BDA: Descriptive, Inquisitive, Predictive and
Prescriptive (Del Vecchio et al., 2018; Hindle et al., 2020). So, whilst some authors
have tried to distinguish between the various nomenclature (Dedić and Stanier, 2016),
it could be argued that these are negligible semantic differences brought about by the
superficial separation of research disciplines (Mortenson et al., 2015). Analytics and
data are interdisciplinary; the focus is on delivering objectivity to support
decision-making (Hindle et al., 2020). Table 2-1 defines terms that are often used
interchangeably in the literature and in the mainstream. Ultimately, “Big Data are
worthless in a vacuum” (Gandomi and Haider, 2015, p. 140); analytics, intelligence,
science and discovery are employed to generate value from that data.

Table 2-1 - Table of terms from the literature that represent interrogating data to
generate value (source: author)

Term | Definition | Source
Big Data Analytics | techniques used to analyse and acquire intelligence from big data to inform decision-making | (Gandomi and Haider, 2015)
Business Analytics | extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions. The analytics may be input for human decisions or may drive fully automated decisions | (Davenport and Harris, 2007, p. 7, cited in Mortenson et al., 2015, p. 584)
Business Intelligence | A set of strategies, processes, applications, data, products, technologies and technical architectures used to support the collection, analysis, presentation and dissemination of business information | (Dedić and Stanier, 2017, p. 131)
Business Intelligence & Analytics | composite of Business Intelligence and Analytics referred to as the techniques, technologies, systems, practices, methodologies, and applications that analyze critical business data to help an enterprise better understand its business and market and make timely business decisions | (H Chen et al., 2012, p. 1166)
Data Science | Supply Chain Management data science is the application of quantitative and qualitative methods from a variety of disciplines in combination with Supply Chain Management theory to solve relevant Supply Chain Management problems and predict outcomes, taking into account data quality and availability issues | (Waller and Fawcett, 2013, p. 80)
Knowledge Discovery (synonym of Data Mining) | Applying data analysis and discovery algorithms to produce a particular enumeration of models over existing data. In this context, data exploration is the most relevant research area. | (Dedić and Stanier, 2016, p. 4)

2.2.4. Applications of BDA in LSCM

The interdisciplinary nature of BDA facilitates wide-ranging applications. It has been
used to better predict customer choices, understand the probability of developing
medical conditions, detect political extremism, manage traffic networks better and
improve customer service (Gandomi and Haider, 2015; Vidgen et al., 2017). Within
supply chains, applications include complete supply chain visibility and transparency,
innovation and design of new products, refining the marketing strategy, better
predictions of and responses to market demand, inventory management, supplier
evaluation, and smart factories (Duan and Xiong, 2015; Wamba et al., 2015; Kache and
Seuring, 2017; Zheng et al., 2018). Yet industry usage is largely absent from the
literature, which focuses mainly on tools, techniques and infrastructure (Mikalef et al.,
2018; Nguyen et al., 2018). Indeed, Seyedghorban et al. (2020, p. 107), in their
bibliometric literature review, found literature reviews to be the “lion share” of
methodological approaches. A few examples highlight the potential benefits of BDA.
Belhadi et al. (2019) show that BDA in manufacturing can improve operational
effectiveness through sensors on machines providing real-time automatic feedback to
operational personnel, with alerts generated on emissions, downtime, and failure rates.
Another example, by Hopkins and Hawking (2018), shows how using telematics data,
camera technology and live sensor information enabled proactive real-time alerting to
improve driver safety and to lower operational costs through optimal fuel purchasing
times and forecasting of vehicle maintenance schedules. Further research is needed to
develop large-scale, reliable empirical evidence of BDA (Müller et al., 2018; Mikalef
et al., 2020), to understand its role within decision-making (Akter et al., 2019) and to
understand how the actions of a firm lead to the realisation of value from BDA (Dremel
et al., 2020).

2.3. Small Medium Enterprises (SMEs)


2.3.1. SMEs and BDA

The Department for Business, Energy & Industrial Strategy (2019) defines an SME as
a business with fewer than 250 employees. SMEs constitute 99% of businesses in the
UK private sector (excluding businesses with no employees) and account for 60% of
all UK employment. Yet there is a marked difference in the adoption of BDA between
SMEs and larger organisations with 250 or more employees: only 15% of SMEs
performed BDA in 2016 versus 35% of large organisations in the same year (Eurostat,
2020). Cultural differences exist between SMEs and larger organisations which affect
how the different sized organisations operate (Gibb, 2000). As a result, differences
between SMEs and larger organisations have been found in a wide range of other
domains, such as approaches to management development (Gray and Mabey, 2016),
corporate social responsibility (Jenkins, 2004), and adoption of ERP systems
(Buonanno et al., 2005). Therefore, it is perhaps unsurprising that there are differences
in the adoption of BDA. This also poses problems for generalising research results from
large organisations to SMEs (Mikalef et al., 2019). For example, Raguseo et al. (2020)
found that smaller firms generally do not receive the same level of profitability from
BDA investment that larger organisations do, whereas Bughin (2016) found no effect of
firm size on firm performance. Conversely, Dong and Yang (2020) found that SMEs
were better able to take advantage of analytics on social media data, with it proving
relatively more valuable to them than to larger firms. The different structures and
differing levels of resources of larger companies and SMEs likely influence the
outcomes of BDA.

2.3.2. Resource Based View (RBV) of the firm

The RBV of the firm was proposed by Barney (1991) and views a firm as the sum of
its resources: assets, knowledge, information and processes. Different firms have
different levels of resources, and utilising these in combination enables a firm to
achieve competitive advantage. This potential is created from resources that are
valuable, rare, inimitable and non-substitutable (Barney, 1991). Following extensive
use in the IT literature for understanding challenges, adoption and value creation
(Bharadwaj, 2000), the RBV of the firm has become a common framework for BDA
(Vidgen et al., 2017; Mikalef et al., 2018; Wamba and Akter, 2019; Ghasemaghaei and
Calic, 2020; Raguseo et al., 2020). Bordeleau et al. (2019) also concluded that the RBV
is a suitable framework for the application of BDA in SMEs. The resources associated
with BDA tend to consist of technology, process, people and organisation (Akter et al.,
2016; Vidgen et al., 2017).

2.3.2.1. Technology and Process

BD itself is a key resource, but getting the data right is critical (Mikalef et al., 2018).
Though data quality is not the biggest obstacle to adoption (LaValle et al., 2011), poor
data can lead to incorrect decisions and unnecessary cost (Hazen et al., 2014; Wamba et
al., 2015). Though this is not always the case, and it depends on the BD definition,
smaller companies are thought to have fewer of these BD resources than larger
organisations (Del Vecchio et al., 2018). Even with the necessary BD resource, vendor
BD solutions that are both user-friendly and embedded with robust analytics are rare
(Russom, 2011; Selamat et al., 2018).

Typically, investment is required to upgrade traditional IT so that it can handle the
voluminous, varied and evolving nature of BD (Wamba et al., 2015; Kache and
Seuring, 2017; Shah et al., 2017; Shukla and Mattar, 2019). This is potentially a greater
issue for SMEs. They generally have less mature IT expertise and infrastructure, and
less access to the financing needed to invest in infrastructure (Coleman et al., 2016;
Bordeleau et al., 2019), particularly when trading conditions are more challenging (Del
Vecchio et al., 2018). Yet the required integration effort is greater across a larger
organisation than across an SME (Dong and Yang, 2020). Additionally, the
proliferation of cloud platforms offers a way for SMEs to avoid installing and
implementing much of their own infrastructure (Del Vecchio et al., 2018). However,
the European Commission (2020) highlights that smaller companies often suffer
economically from these platforms due to unfavourable contractual terms. There may
also be security and privacy concerns with outsourcing data and infrastructure to
another organisation (Belhadi et al., 2019). These concerns resonate particularly with
SMEs (Coleman et al., 2016).

2.3.2.2. People

Several intangible resources are required: the knowledge and expertise to identify
what is needed from a solution, how it can be implemented and which data are suitable
(Schoenherr and Speier-Pero, 2015; Coleman et al., 2016). BDA requires employees
with the technical knowledge to interrogate the data as well as the business and
relational knowledge to understand what is important and why (Russom, 2011; Waller
and Fawcett, 2013; Wamba et al., 2015; Vidgen et al., 2017; Del Vecchio et al., 2018;
Mikalef et al., 2018; Surbakti et al., 2020). There are fewer employees in SMEs, and
managers often have multiple or broader responsibilities than in large organisations
(Gibb, 2000). As a result, whereas large organisations have analytics teams, Bordeleau
et al. (2019) found that in SMEs analytics is conducted by managers and senior
managers. Additionally, using external consultancies with the necessary expertise is
often unaffordable (Coleman et al., 2016). Large companies might be able to team up
with large software providers for implementation support, as in the case of Hopkins and
Hawking (2018) and SAP, but this is often out of reach for smaller companies (Akter et
al., 2019).

2.3.2.3. Organisation

Researchers also note that a “Data-Driven Culture” (DDC) is a key resource (Mikalef et
al., 2018). DDC is defined as a collective thought pattern summarising mindsets and
attitudes towards process optimisation (Belhadi et al., 2019). It involves collaboration
across different people, skillsets and departments (Akter et al., 2019). It also affects
how data is viewed and perceived throughout the organisation (Mikalef et al., 2018;
Akter et al., 2019; Dremel et al., 2020; Mikalef et al., 2020). A successful DDC
permeates all levels of the organisation, often requiring a shift towards analytical and
problem-solving skills (Vidgen et al., 2017). Despite the different organisational
structures of large organisations and SMEs (Coleman et al., 2016), Ferraris et al. (2019)
also confirmed DDC as a resource for BDA in SMEs.

The support of senior leadership is particularly important for the development of a DDC
(Schoenherr and Speier-Pero, 2015). Senior leaders need to trust data-derived insights
and understand how they were derived (McAfee et al., 2012; Mikalef et al., 2018;
Conboy et al., 2020; Mikalef et al., 2020). Leadership in SMEs often differs from that in
large organisations (Gibb, 2000). In particular, the leader's personality and leadership
style have an impact on the success of BDA (Bordeleau et al., 2019), echoing findings on
implementing Performance Management in SMEs (Garengo and Bititci, 2007).

2.3.3. Big Data Analytics Capability (BDAC)

A limitation of the RBV of the firm is that static levels of resources do not
necessarily explain how firms adapt to changing external environments and maintain a
competitive advantage (Eisenhardt and Martin, 2000). Additionally, by analogy with
Information Systems, BDA resources may be imitable, so simply having the right
resources might be insufficient (Bharadwaj, 2000). For example, some BD technology
is developed open-source and data can be bought from third parties.

To account for this limitation, Dynamic Capabilities (DC) have been proposed in the
management (Teece, 2007) and Information Systems literature (Bharadwaj, 2000). DC
are an organisation’s ability to create, integrate and deploy resources in combination
and simultaneously to support sustained business performance (Bharadwaj, 2000;
Teece, 2007). Accordingly, BDAC refers to a firm’s ability to effectively implement
infrastructure, technology and talent to capture and analyse data towards the generation
of insight for decision-making (Akter et al., 2016; Mikalef et al., 2020). In the analogy
of a production process, resources are the input and capability is the process of
leveraging these resources in a strategic way (Mikalef et al., 2018).

2.3.4. Development of BDAC in SMEs

Much research has begun to theorise BDAC, and it has been linked to competitive
advantage (Wamba et al., 2017) and firm performance (Mikalef et al., 2019; Wamba and
Akter, 2019). However, there is limited empirical research exhibiting BDAC, and the
literature relies on anecdotal evidence (Mikalef et al., 2018). Additionally, SMEs and
large organisations have different resource makeups, so a uniform approach is unlikely
to fit. The traditional resources considered for large companies are insufficient to
facilitate analytics capabilities in SMEs (Bordeleau et al., 2019). There is little research
about how organisations develop a BDAC (Kayser et al., 2018). Multiple researchers
suggest that development is gradual (Mikalef et al., 2019) and occurs through learning.
As the learning evolves over time, the competence and value of the BDAC also develop
(Vidgen et al., 2017; Hindle and Vidgen, 2018; Conboy et al., 2020). However, SMEs
are underrepresented in the literature (Bordeleau et al., 2019). Storytelling has been
shown to aid BDA adoption (Boldosova, 2019), yet there is an absence of trendsetting
use-cases to aid understanding and develop knowledge (Coleman et al., 2016).

2.4. Logistics and the Vehicle Routing Problem (VRP)


2.4.1. Background to the VRP

The output of logistics is customer service (Gubbins, 2003): delivering the right product
to the right place at the right time. Logistics includes several functions such as
transportation, inventory planning, warehousing and site location (Kasilingam, 1998),
so there is a balance to strike between customer service and logistics cost (Rushton et
al., 2010). Transportation is one of the highest-cost logistics operations (Güneri, 2007).
It is also a key physical interface between a company and its customers, so it has a
direct impact on customer service.

Organisations that outsource transportation to a 3PL or 4PL will likely have a single
fixed cost for transportation. For an in-house operation, there will be both fixed and
variable costs. According to Kasilingam (1998) and Rushton et al. (2010), fixed costs
include the depreciation of the vehicle value, excise duty, driver compensation and
insurance across the vehicle fleet. Variable costs, such as fuel, oil and maintenance,
fluctuate with the distance each vehicle travels.
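The split between fixed and variable costs can be written as a simple cost model:

total cost = fixed cost per vehicle × vehicles used + variable cost per km × total km

The sketch below is a minimal illustration under these assumptions:

- a linear model over one planning period;
- the same cost rates for every vehicle;
- hypothetical rate values, which are not figures from the cited sources.

```python
def transport_cost(fixed_cost_per_vehicle, variable_cost_per_km, km_by_vehicle):
    """Total in-house transport cost for one planning period (linear model assumed).

    Fixed costs (depreciation, excise duty, driver pay, insurance) are charged per
    vehicle used; variable costs (fuel, oil, maintenance) scale with distance.
    """
    fixed = fixed_cost_per_vehicle * len(km_by_vehicle)
    variable = variable_cost_per_km * sum(km_by_vehicle)
    return fixed + variable


# Hypothetical rates: 100 per vehicle per period and 0.50 per km.
print(transport_cost(100.0, 0.5, [200, 300]))  # → 450.0 (two vehicles, 500 km)
print(transport_cost(100.0, 0.5, [400]))       # → 300.0 (one vehicle, 400 km)
```

Under this model, a routing plan affects cost in two ways. Shorter routes reduce the variable component. Using fewer vehicles removes fixed cost as well.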

Therefore, finding the best routes for vehicles is a frequent decision problem in
logistics for reducing costs and improving customer service (Güneri, 2007). First
described by Dantzig and Ramser (1959), the Vehicle Routing Problem (VRP) plays a
fundamental role in logistics (Laporte, 1992). The original VRP, also known as the
Capacitated Vehicle Routing Problem (CVRP), designs optimal delivery routes where
each identical vehicle sets out from a central depot, travels a single route, and returns
to the depot. The aim is to find the least-cost routes for the vehicles such that each
customer is visited exactly once by exactly one vehicle and the capacity of the vehicles
is not exceeded (Laporte, 1992; Güneri, 2007; Braekers et al., 2016). Thus, a solution
to the problem minimises variable costs by maximising vehicle usage whilst meeting
customer service requirements (Rushton et al., 2010).
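The CVRP conditions above can be checked mechanically for any candidate solution: each customer is visited exactly once by one vehicle, no route exceeds capacity, and the cost is the total distance. The sketch below assumes straight-line distances. The customer IDs, coordinates and demands in the example are hypothetical.

```python
from math import dist  # Python 3.8+


def evaluate_cvrp(depot, customers, demand, capacity, routes):
    """Check a CVRP solution and return its total straight-line distance.

    customers: dict id -> (x, y); demand: dict id -> units;
    routes: list of routes, each an ordered list of customer ids.
    Each route implicitly starts and ends at the depot.
    """
    visited = [c for route in routes for c in route]
    if sorted(visited) != sorted(customers):
        raise ValueError("each customer must be visited exactly once by one vehicle")
    total = 0.0
    for route in routes:
        if sum(demand[c] for c in route) > capacity:
            raise ValueError(f"route {route} exceeds the vehicle capacity")
        path = [depot] + [customers[c] for c in route] + [depot]
        total += sum(dist(p, q) for p, q in zip(path, path[1:]))
    return total


customers = {"A": (0, 3), "B": (4, 0)}
demand = {"A": 2, "B": 2}
print(evaluate_cvrp((0, 0), customers, demand, 3, [["A"], ["B"]]))  # → 14.0
```

Combining A and B on one vehicle would be shorter in this example. However, it is infeasible because the combined demand of 4 exceeds the capacity of 3.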

However, the problem is difficult to solve. The VRP is termed “NP-hard”
(non-deterministic polynomial-time hard) (Güneri, 2007): no known algorithm solves
it exactly in polynomial time. In practice, the number of possible routing solutions
grows at least exponentially as the number of customers to be delivered to increases.
The number of variations of the vehicle routing problem also grows at an exponential
rate as further constraints, dimensions and requirements are added to the problem
(Vidal et al., 2020). The nature of the problem and ever-increasing computer processing
power have spawned a substantial volume of literature, largely in the domain of OR, as
new, larger, more complicated versions of problems are created and new methods are
found for solving them (Braekers et al., 2016; Vidal et al., 2020). Table 2-2 lists
example variants of the VRP.
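To give a sense of scale, the sketch below counts the distinct routes available to a single vehicle that must visit every customer and return to the depot. It assumes symmetric distances, so a route and its reverse are counted once. With several vehicles the count is larger still, because customers must also be partitioned between vehicles.

```python
from math import factorial


def single_vehicle_routes(n_customers):
    """Distinct depot -> all customers -> depot routes for ONE vehicle.

    A route and its reverse are counted once (symmetric distances assumed).
    """
    if n_customers < 2:
        return 1
    return factorial(n_customers) // 2


for n in (5, 10, 15, 20):
    print(n, single_vehicle_routes(n))
```

With 10 customers there are already 1,814,400 distinct routes. This growth is why exhaustive search quickly becomes impractical.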

Table 2-2 - Common Vehicle Routing Problem variants (source: adapted from Rincon-Garcia
et al. (2017, p. 128))

Acronym | VRP Variant | Source
CVRP | Capacitated: each vehicle has the same capacity | Dantzig and Ramser (1959); Clarke and Wright (1964)
VRPTW | Time windows: specific time intervals for deliveries and collections | Solomon (1987)
VRPPDTW | Pickup and delivery with time windows: goods picked up from one location and delivered to another (e.g. cash for cash machines) | Parragh et al. (2008)
HVRP | Heterogeneous vehicles: different types of vehicle (e.g. capacity, speed etc.) | Taillard (1999)
SDVRP | Site-dependent: particular customers can only be visited by certain vehicles (e.g. size, congestion charges) | Pisinger and Ropke (2007)
MDVRP | Multiple depots (to start and end routes from) | Pisinger and Ropke (2007)
TDVRP | Time-dependent: routing varies by the time of day (e.g. congestion) | Malandraki and Daskin (1992)
DVRP | Dynamic: routes adapted as new information arises (e.g. traffic, additional orders etc.) | Wilson and Colvin (1977)
VRPSPD | Simultaneous pickup and delivery | Min (1989)
VRPSPD-H | Simultaneous pickup and delivery with handling costs (a built-in buffer time at each location) | Hornstra et al. (2020)

2.4.2. Solving the VRP

Most solution methods produce approximate solutions using “heuristic” or
“metaheuristic” methods rather than exact methods (Güneri, 2007). An exact method
returns an optimal solution, usually at high computational cost. A heuristic method
instead produces an approximate or near-optimal solution at lower computational cost
(Eiselt and Sandblom, 2000). Metaheuristic methods operate at a higher level than
heuristics and coordinate higher-level strategies with the underlying heuristics (Glover
and Kochenberger, 2003). Such methods are less problem-dependent than heuristics
alone and can escape local optima through a more extensive search of potential
solutions (Glover and Kochenberger, 2003; Braekers et al., 2016; Abdel-Basset et al.,
2018). For example, Figure 2-3 shows a hypothetical VRP with 6 nodes. The Sweep
Method heuristic starting with a linear sweep East of the red depot will not find the
global optimum solution; however, if the linear sweep started North or South of the
depot, it would. In general, metaheuristic methods have ways of evaluating more
potential solutions in order to select a better one. A list of example heuristics and
metaheuristics is shown in Table 2-3.

Figure 2-3 - An illustration of The Sweep Method with a suboptimal solution (left)
versus the optimal solution (right) for a 3-node-capacity vehicle (source: author)
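The Sweep Method behind Figure 2-3 can be sketched in Python as follows. This is a minimal illustration under these assumptions:

- straight-line geometry;
- a single vehicle capacity;
- hypothetical stop coordinates and demands.

The `start_angle` parameter (radians, 0 = East, measured anticlockwise) is an illustrative stand-in for the direction in which the sweep begins. The sketch only forms the clusters; stops within each route are left in sweep order rather than re-sequenced optimally.

```python
from math import atan2, pi


def sweep_routes(depot, stops, demand, capacity, start_angle=0.0):
    """Cluster stops into vehicle routes with the Sweep Method.

    depot: (x, y); stops: dict id -> (x, y); demand: dict id -> units.
    start_angle: direction of the initial ray in radians (0 = East, anticlockwise).
    """
    def sweep_angle(point):
        # Polar angle of the stop around the depot, measured from the starting ray.
        raw = atan2(point[1] - depot[1], point[0] - depot[0])
        return (raw - start_angle) % (2 * pi)

    routes, current, load = [], [], 0
    for stop in sorted(stops, key=lambda s: sweep_angle(stops[s])):
        if demand[stop] > capacity:
            raise ValueError(f"stop {stop} alone exceeds the vehicle capacity")
        if load + demand[stop] > capacity:
            # Capacity reached: close this route; the excluded stop starts the next one.
            routes.append(current)
            current, load = [], 0
        current.append(stop)
        load += demand[stop]
    if current:
        routes.append(current)
    return routes


stops = {"A": (1, 0), "B": (0, 1), "C": (-1, 0), "D": (0, -1)}
demand = {s: 1 for s in stops}
print(sweep_routes((0, 0), stops, demand, capacity=2))                     # → [['A', 'B'], ['C', 'D']]
print(sweep_routes((0, 0), stops, demand, capacity=2, start_angle=pi / 2))  # → [['B', 'C'], ['D', 'A']]
```

Changing only the starting direction produces different clusters, which is the sensitivity illustrated in the figure.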

Table 2-3 - Common methods in the literature for solving Vehicle Routing Problems
(adapted from Güneri (2007) and Gendreau et al. (2008))

Method | Type | Description
The Sweep Method | Heuristic | Usually applied manually on a map of demand stops. A straight line is extended from the depot in any direction and rotated until a stop is intercepted. The stop is added if the vehicle capacity is not exceeded, and the line is then extended from the point just added. Once capacity is reached, the next route starts from the stop excluded from the previous route. This is repeated for all points to return a routing solution.
The Savings Method | Heuristic | A dummy vehicle serving each point individually and returning to the depot gives the maximum distance for the routing problem. Points are combined based on the largest saving, calculated from removing the extra trips to and from the depot between the two points. This is repeated for all points to return a routing solution.
Ant Colony Optimisation | Metaheuristic | Algorithm based on how ants communicate using pheromone. Each ant lays pheromone to notify other ants of food, and the strength of the pheromone is driven by the number of ants following the path. The algorithm therefore starts by generating random candidate solutions. These solutions add elements based on a heuristic evaluation of each element and the “pheromone” weight associated with it. A solution emerges that follows the most travelled route.
Genetic Algorithms | Metaheuristic | Algorithm that operates in a similar way to natural selection. Solutions evolve over generations, with only the best solutions acting as parents of the next generation. Two parent solutions are combined (“crossover”) to create offspring solutions, and a mutation operator is applied to each offspring for the next generation. The best solution found is returned.
Greedy Randomised Adaptive Search Procedure | Heuristic | Multiple random candidate solutions are generated before a local search is performed across the candidate solutions. Each element not yet added to a solution is evaluated by a heuristic function, and a random element is chosen from a restricted list of the best elements. The best solution after a specified number of restarts is returned.
Simulated Annealing | Metaheuristic | Randomised local search method where modifications that increase the cost of the solution can be accepted with some probability (i.e. the next element is not necessarily the best element). Modifications are made at each iteration, with the best solution found so far retained. The best solution is returned after a set number of iterations. Most likely method to converge to the global optimum.
Tabu Search | Metaheuristic | Randomised local search method where the best neighbouring solution is selected as the current solution even if it increases the solution cost. A memory (tabu list) of recently visited solutions is stored to avoid repeating solutions. The best solution is returned after a set number of iterations or consecutive iterations without improvement.
Variable Neighbourhood Search | Metaheuristic | Local search method that exploits different neighbourhoods to escape local optima. When a local optimum is reached, another neighbourhood is selected and used in the following iterations. The best solution from all neighbourhoods searched is returned.
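Of the methods in Table 2-3, the Savings Method is simple enough to sketch compactly. The Python below is a minimal, simplified parallel version of the Clarke and Wright (1964) savings heuristic. It assumes straight-line distances, one vehicle capacity and hypothetical stop coordinates; the function and variable names are illustrative. The saving for serving stops a and b consecutively is d(depot, a) + d(depot, b) - d(a, b): the return-and-restart trip via the depot that is no longer needed.

```python
from math import dist  # Python 3.8+


def savings_routes(depot, stops, demand, capacity):
    """Simplified parallel Clarke-Wright savings heuristic.

    depot: (x, y); stops: dict id -> (x, y); demand: dict id -> units.
    Returns a list of routes (ordered stop ids), each starting and ending at the depot.
    """
    # Start from one dummy route per stop: depot -> stop -> depot.
    routes = {s: [s] for s in stops}   # route id -> ordered stops
    route_of = {s: s for s in stops}   # stop id -> route id
    load = {s: demand[s] for s in stops}
    to_depot = {s: dist(depot, stops[s]) for s in stops}

    # Saving from serving a and b consecutively instead of on separate trips.
    ids = list(stops)
    savings = [
        (to_depot[a] + to_depot[b] - dist(stops[a], stops[b]), a, b)
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
    ]
    savings.sort(key=lambda item: item[0], reverse=True)

    for saving, a, b in savings:
        ra, rb = route_of[a], route_of[b]
        if saving <= 0 or ra == rb or load[ra] + load[rb] > capacity:
            continue
        left, right = routes[ra], routes[rb]
        # a and b can only be linked if each sits at an end of its route (next to the depot).
        if left[-1] == a and right[0] == b:
            merged = left + right
        elif left[0] == a and right[-1] == b:
            merged = right + left
        elif left[-1] == a and right[-1] == b:
            merged = left + right[::-1]
        elif left[0] == a and right[0] == b:
            merged = left[::-1] + right
        else:
            continue
        routes[ra] = merged
        load[ra] += load[rb]
        del routes[rb], load[rb]
        for stop in right:
            route_of[stop] = ra
    return list(routes.values())


stops = {"A": (10, 0), "B": (10, 1), "C": (-10, 0), "D": (-10, 1)}
demand = {s: 1 for s in stops}
print(savings_routes((0, 0), stops, demand, capacity=2))  # → [['A', 'B'], ['C', 'D']]
```

Pairs are merged greedily in descending order of saving, and a merge is rejected if it would break the capacity constraint. A metaheuristic from the table could then refine the result.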

2.4.3. Applying VRPs in practice

A challenge with the VRP methods and solutions in the literature is that, in raw form,
they are inaccessible to practitioners. This is due to the language and knowledge of
mathematical notation required and the expensive software used (e.g. MATLAB). In
addition, the solutions generated are often very specific and have limited validity in a
practical setting (Kasilingam, 1998; Rincon-Garcia et al., 2017). Therefore,
organisations are left with two common options: experience-led manual heuristics, or
purchasing vendor software in the form of Computerised Vehicle Routing Software
(CVRS).

2.4.3.1. Computerised Vehicle Routing Software (CVRS)

A 2017 survey of organisations found that only 25% of medium enterprises and 50% of
large organisations were using CVRS (McCrea, 2017, cited in Fontaine et al., 2020, p.
1). CVRS generally uses complex algorithms and geographical representations of the
road network to automate the daily planning of collections and deliveries (Rincon-Garcia
et al., 2017). Because of the complexity of the VRP, the automation provided by the
software generally improves reliability and reduces both fixed and operational costs.
Vendors also claim a 10-30% reduction in mileage and an 80-90% reduction in
planning time compared with manual heuristics (Bräysy and Hasle, 2014; Rushton et
al., 2014). Differences between CVRS products are often due to the algorithms and the
map data used for the road network.

CVRS is often easy to use and can quickly recalculate routes when changes occur.
A possible reason for its low adoption is the unspecialised nature of cheaper software,
with tailored software solutions generally being more expensive (Rushton et al., 2010;
Rincon-Garcia et al., 2017; Carlan et al., 2020; Fontaine et al., 2020). Additionally,
there is a call for closer collaboration between software developers and researchers.
This perhaps reflects the gap between suboptimal software and unrealistic research
solutions (Bräysy and Hasle, 2014; Rincon-Garcia et al., 2017; Vidal et al., 2020).

2.4.3.2. Experience-led manual heuristics

In the absence of software or complex algorithms, routing methods tend to follow
Cluster First, Route Second or Route First, Cluster Second approaches, or the Sweep and
Savings methods in Table 2-3 (Kasilingam, 1998; Fontaine et al., 2020). A number of
principles for routing are suggested in the literature (Kasilingam, 1998; Güneri, 2007;
Rushton et al., 2010); these include:

• Minimise mileage and the number of vehicles
• Assign vehicle stops that are close to each other (spatial coordination)
• Combine deliveries and pickups
• Combine deliveries on the same day of the week together (temporal coordination)
• Build routes beginning with the farthest stop from the depot
• Use the largest vehicle first to maximise utilisation
• Avoid narrow time windows
• Consider alternate delivery means for remote or low-volume locations
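Two of these principles can be encoded directly as a cluster-first step. The sketch below is a hypothetical illustration, not a method taken from the cited sources. It starts each vehicle load with the unassigned stop farthest from the depot ("build routes beginning with the farthest stop"). It then adds the nearest remaining stops while capacity allows ("spatial coordination"). The sketch assumes straight-line distances, a single vehicle capacity and hypothetical coordinates, and it leaves the sequencing of stops within each load out of scope.

```python
from math import dist  # Python 3.8+


def farthest_seed_clusters(depot, stops, demand, capacity):
    """Group stops into vehicle loads using two manual routing principles.

    stops: dict id -> (x, y); demand: dict id -> units; capacity: units per vehicle.
    Returns a list of loads (lists of stop ids); within-load sequencing is not handled.
    """
    unassigned = set(stops)
    loads = []
    while unassigned:
        # "Build routes beginning with the farthest stop from the depot."
        # Sorting first makes tie-breaking deterministic.
        seed = max(sorted(unassigned), key=lambda s: dist(depot, stops[s]))
        members, load = [seed], demand[seed]
        unassigned.discard(seed)
        # "Assign vehicle stops that are close to each other": nearest stops first.
        for t in sorted(unassigned, key=lambda s: (dist(stops[seed], stops[s]), s)):
            if load + demand[t] <= capacity:
                members.append(t)
                load += demand[t]
        unassigned.difference_update(members)
        loads.append(members)
    return loads


stops = {"A": (10, 0), "B": (9, 0), "C": (1, 0), "D": (2, 0)}
demand = {s: 1 for s in stops}
print(farthest_seed_clusters((0, 0), stops, demand, capacity=2))  # → [['A', 'B'], ['D', 'C']]
```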

However, the complexity of the VRP means human planning is inadequate in most
cases (Bräysy and Hasle, 2014). Indeed, Fontaine et al. (2020) used participants with
no logistics experience and found that they rarely found the optimal solution. In
particular, the participants performed poorly at identifying the clusters, but their routing
within the clusters was very close to optimal. Participants with logistics experience may
have performed better. Even so, the manual approach to routing is often
labour-intensive, time-consuming, and likely to contain errors and inefficiencies
(Carlan et al., 2020).

2.5. Conclusion
2.5.1. Research Questions

Waller and Fawcett (2013) highlighted the importance of conducting research at the
intersection of the domains of OR and LSCM, yet the number of studies with a BDA
focus in the OR literature is low (Mortenson et al., 2015). This includes research on the
value of BDA (Vidgen et al., 2017; Hindle and Vidgen, 2018) and research from a
practical setting (Mortenson et al., 2015; Conboy et al., 2020). In addition, researchers
in the LSCM literature have highlighted that more empirical research is needed to better
understand BD and BDA in supply chains in general (Wamba et al., 2015; Kamble and
Gunasekaran, 2019) and in logistics in particular (Hopkins and Hawking, 2018).
Logistics is highlighted as one of the most applicable areas for BDA (Kache and
Seuring, 2017). This includes supporting route optimisation using data such as
telematics, traffic density and weather (Rozados and Tjahjono, 2014; Hopkins and
Hawking, 2018). Thus, there is an opportunity for BDA to fill the gap between
experience-led manual heuristics and CVRS in vehicle routing:

Research Question 1: How does a BDA solution to a VRP compare to an experience-led
heuristic?

Adoption of BDA is lower in SMEs than larger organisations (Eurostat, 2020) which
the literature suggests is a question of differing levels of technology, process, people
and organisational resources and the challenges of achieving BDAC. The literature
often uses cases from large companies where an established analytics function already
exists and generalises to SMEs (e.g. Wamba et al. (2015); Vidgen et al. (2017); Belhadi
et al. (2019)). Additionally, real problem scenarios and cases exploring the barriers to
adoption of BDA within particular contexts are also limited in the literature (Kache and
Seuring, 2017; Shukla and Mattar, 2019) with much understanding built on the
assumption that all organisations face the same challenges (Mikalef et al., 2019).
Therefore, evidence from a real SME will help build these theories:

Research Question 2: What does the example highlight to explain why there is low
adoption of BDA in SMEs?

2.5.2. Research Aim

The aim of this research is to investigate how a BDA solution to a VRP compares to an
experience-led heuristic in an SME and to understand the issues this case highlights in
SME adoption of BDA.

2.5.3. Research Objectives

To achieve the research aims, the following objectives will be completed:

i. Literature review on SME usage of BDA and methods to solve VRPs
ii. Administer a questionnaire to gauge perceptions and understanding of BDA
across the SME
iii. Descriptive BDA of historic BD to describe the experience-led heuristic
routing method employed
iv. Prescriptive BDA to derive a routing algorithm to meet the VRP criteria
using opensource packages and external secondary data
v. Descriptive and Inquisitive BDA to compare impact of outputted routes
from experience-led heuristic and BDA solution

Chapter 3 - Methodology

3.1. Introduction

The aim of this research, the research questions, and the research objectives derived
from the review of the literature are shown in Table 3-1. The following chapter
describes the methodology for the research and includes the research philosophy,
research strategy, data collection instruments, plan for analysis and ethical issues.

Table 3-1 - Table of research questions, aim and objectives (source: author)

Research Purpose Element | Description
Research Questions | Research Question 1 - How does a BDA solution to a VRP compare to an experience-led heuristic? Research Question 2 - What does the example highlight to explain why there is low adoption of BDA in SMEs?
Research Aim | The aim of this research is to investigate how a BDA solution to a VRP compares to an experience-led heuristic in an SME and to understand the issues this case highlights in SME adoption of BDA
Research Objectives | i. Literature review on SME usage of BDA and methods to solve VRPs; ii. Administer a questionnaire to gauge perceptions and understanding towards BDA across the SME; iii. Descriptive BDA of historic BD to describe the experience-led heuristic routing method employed; iv. Prescriptive BDA to derive a routing algorithm to meet the VRP criteria using opensource packages and external secondary data; v. Descriptive and Inquisitive BDA to compare impact of outputted routes from experience-led heuristic and BDA solution

3.2. Research Philosophy

Research philosophy is a framework of assumptions and beliefs about knowledge and how it is learned (Creswell, 2003; Saunders et al., 2016). Thus, grounding research in a particular research philosophy can determine how the research is designed and the methods that are employed (O'Gorman and MacIntosh, 2015). It consists of assumptions related to ontology, epistemology and axiology (Saunders et al., 2016).

3.2.1. Ontology

Ontological assumptions are concerned with the nature of being and reality (Saunders et al., 2016). O'Gorman and MacIntosh (2015) describe ontology in straightforward terms as viewing the world as either objective or subjective. An objective ontological viewpoint assumes a reality consisting of objects that are measurable and testable, whereas a subjective ontological viewpoint assumes a reality that emerges from the perceptions and interactions of individuals (O'Gorman and MacIntosh, 2015). Although the application of BDA to solve a VRP gives the research a large objective element, this analysis is grounded within the context, case and reality in which it occurs. The value of the research lies in interpreting the results with respect to this context so that they are of practical use, which, as highlighted in the literature review, is a considerable gap within the literature. Thus, the research follows a mixed ontology.

3.2.2. Epistemology

Epistemology is a theory of knowledge (Bryman, 2012). Two epistemological viewpoints are the extremes of positivism and interpretivism. In positivism, knowledge is established through measuring observable variables and testing (Creswell, 2003), and in interpretivism, knowledge is generated through the differing individual interpretations of reality (Creswell and Plano Clark, 2011). Since positivism and interpretivism are traditionally associated with objective and subjective ontological positions respectively (O'Gorman and MacIntosh, 2015), this research, with its mixed ontology, is not suited solely to a positivist or an interpretivist theory of knowledge. Mixed ontological assumptions indicate that assumptions will be drawn from both epistemologies. The epistemology of pragmatism does not commit to a single theory of knowledge and permits freedom of choice in selecting methods and procedures to meet the demands and purpose of the research (Creswell, 2003). The focus is on “what works” and on generating knowledge about the problem (Feilzer, 2009; Creswell and Plano Clark, 2011). This perspective permits mixed methods (Teddlie and Tashakkori, 2009) and the flexibility to adapt to the situation, which is particularly relevant when conducting research during the COVID-19 pandemic under UK Government-imposed constraints (UK Government, 2020b). Additionally, the epistemology of BDA is largely undefined; Lipworth et al. (2017, p. 494) describe it as “observational rather than experimental” and Lowrie (2017, p. 6) states that it “operates according to a different set of epistemological standards” than traditional science. Yet, with the focus of BDA on the creation of practical, functioning code, algorithms and solutions (Lowrie, 2017; Mehozay and Fisher, 2019), it aligns with the pragmatic epistemology. Therefore, this research follows a pragmatic epistemology.

3.2.3. Axiology

Axiology refers to the role of values and ethics within the research process (Saunders et al., 2016), with the researcher's values informing the bias they bring to the research (O'Gorman and MacIntosh, 2015). In positivism and objectivity, the axiological assumption is that the research is free from bias, with the researcher seeking to minimise the influence of values (Teddlie and Tashakkori, 2009). In interpretivism, by contrast, the axiological assumption is that the research is value-bound and biased because researchers actively apply subjective interpretations to the data (O'Gorman and MacIntosh, 2015). In research with a pragmatic epistemology, axiological assumptions are often overlooked (Biddle and Schafft, 2015). Because such research employs pluralistic methods, it is likely that parts of the research will contain bias and other parts will not (Creswell and Plano Clark, 2011). This research is value-bound and contains bias, largely through the analysis and interpretation of the data in relation to the context.

3.2.4. Approaches to theory development

There are three common approaches to theory development: deduction, induction and abduction (Saunders et al., 2016). In deduction, research tests existing theories through hypothesis testing, whereas in induction, theory is derived from the research; these are common analytic strategies in positivist and interpretivist epistemologies respectively (Bryman, 2012). Abduction, by contrast, begins with an inductive approach followed by the testing of the modified or generated theories (Kovács et al., 2005). This research is inductive, as it builds theory rather than systematically testing existing theory. BDA also tends to follow an inductive approach (Mortenson et al., 2015), and the aim is to provide an example of BDA that builds on existing theory of BDA adoption in SMEs.

3.3. Research Strategy

The research strategy is the general plan of how the research will be undertaken to answer the research questions (Saunders et al., 2016). Though some literature
associates research methods with research purposes (e.g. Creswell (2003)), Yin (2018)
suggests that each research method can be used for all research purposes and the
selection of the research strategy depends on three conditions: (i) the form of the
research question, (ii) the level of control over events required and (iii) the recency of
the events. In this research, the research questions contain “how”, “what” and “why”
which perhaps indicates an explanatory research purpose (Saunders et al., 2016).
Explanatory research tends to be associated with experiments, archival research or case studies (Yin, 2018). Archival research assesses change over time (O'Gorman and MacIntosh, 2015) with no control over the historic events (Yin, 2018). Conversely, experiments tend to investigate causal links through controlled manipulation of independent variables and measurement of dependent variables (O'Gorman and MacIntosh, 2015). However, neither archival research nor experiments suit this research. Whilst this research tries to understand why a change is occurring, its focus is on observing and analysing a present problem within its context to understand and explain the low adoption of BDA in SMEs; moreover, as identified by the literature review, there is limited literature featuring practical examples of BDA. Since a case study investigates a contemporary phenomenon within its real-world context (Yin, 2018), the research strategy is a case study. A single case study is used that represents the common case of an SME that has not adopted BDA, as this is likely to provide insight into a typical, more prevalent situation (Saunders et al., 2016). Though case studies are often criticised for lacking generalisability (Saunders et al., 2016), the aim of this research is to expand theories of the low adoption of BDA, not to extrapolate to the entire SME population (Yin, 2018).

3.4. The case: Shortridge Ltd.

The case for this research is an SME called Shortridge Limited, whose business roots trace back to 1845 as a provider of laundry services. The organisation have an annual turnover of £9.5m and employ an average of 246 staff (Shortridge Ltd., 2018) to provide quality linen hire and laundry services to businesses in the North of England and Scotland. Typically, the company serve a range of industries and business sizes, predominantly in hospitality (hotels, B&Bs and holiday parks). Across three sites, Workington, Dumfries and Darlington, Shortridge operate a fleet of 32 vehicles to collect and deliver linen to customer sites up to six times per week. Shortridge have faced challenges with vehicle routing, relying on the intuition and experience-led heuristics of their Transport team. In July 2020, Shortridge implemented a CVRS, MaxOptra, which is claimed to reduce operational costs by up to 20% at an annual price of £600 per vehicle (MaxOptra, 2020), roughly £19k for the Shortridge fleet of 32 vehicles.

3.5. Data Collection

Case study research commonly draws on multiple sources of data, including both
quantitative and qualitative, which converge in a triangulating fashion (Yin, 2018).
Using both types for data collection is called mixed methods (Creswell, 2003). Though mixed methods research has disadvantages, such as the extra skills and resources required, a major advantage is that the strength of one method can offset the weakness of another (Creswell and Plano Clark, 2011). In particular, qualitative methods can aid the explanation and utility of quantitative results (Bryman, 2012), which are often weak at capturing context when used alone (Creswell and Plano Clark, 2011). Therefore, mixed methods are employed in this research in an embedded design, with an emphasis on the quantitative element (QUAN) and integration of results during the interpretation (see Figure 3-1).

Figure 3-1 – Diagram depicting the research design (source: author)

3.5.1. Primary Data

Data is considered primary if it originates for the specific purpose of the research (O'Gorman and MacIntosh, 2015). Primary data was mainly qualitative and was collected through a self-completion questionnaire using Qualtrics (see Appendix 1). The questionnaire consisted of 6 questions, plus 2 questions for consent at the beginning and 2 questions for demographic information at the end. As the aim of the qualitative questionnaire was to explain the context of the quantitative results, it was weighted towards open questions to capture the participants' own words and understanding as much as possible (Bryman, 2012). Following feedback from a pilot with other students, two questions were posed as closed questions to help clarify their meaning (Bryman, 2012). The questions centre on how employees in the organisation solve problems (Q3), their understanding of BDA (Q4 and Q5), the barriers to using it (Q6) and the tools associated with BDA (Q7). Other qualitative data collection instruments, such as interviews, potentially offer richer data; however, they are time-consuming for both the researcher and the organisation (Bryman, 2012). With the qualitative element having less emphasis in the research and mixed methods research considered intensive, a questionnaire was chosen for convenience and efficiency (Creswell and Plano Clark, 2011; Bryman, 2012). The online mode of administration was selected for similar reasons (Rosenfeld et al., 1993). As self-completion questionnaires typically have a low response rate (Bryman, 2012), and response rates can differ between modes of administration (Bowling, 2005), a Director of the organisation distributed the questionnaire.

Participants were purposefully sampled from across the office staff at the organisation. The questionnaire was administered to them by email and was completely anonymous. The questionnaire link was shared with 21 employees and 13 responses were received, a response rate of 61.9%.

3.5.2. Secondary Data


3.5.2.1. Literature Review

Secondary data was sought from the academic literature to provide an understanding of the background and current state of research into BDA, its adoption within SMEs and the background to the VRP. This data was also used to form the codes used in the qualitative data analysis.

3.5.2.2. Quantitative Data

To identify the historic routes and the vehicles used by Shortridge, telematics data was extracted from the third-party telematics vendor portal, PRS telematics. Each vehicle is fitted with a telematics device that transmits the GPS location of the vehicle when the engine is switched on. A Transport Manager at Shortridge extracted a sample covering the entirety of February 2020 from the vendor cloud portal in Comma Separated Values (.csv) format (12,662 rows, 14 columns, 2MB) and transferred it via email. Additional secondary data was collected to supplement the telematics data during the analysis (Table 3-2). February 2020 was the most recent month of normal business activity for Shortridge, as UK Government-imposed restrictions to combat the Coronavirus pandemic took effect throughout March 2020 (UK Government, 2020b). The one-month sample of automatically generated telematics data may not fully meet the “Volume” characteristic of BD, but the combination of secondary data sources, their varied formats and their methods of access suggest the presence of other BD characteristics. Moreover, building a routing solution to a bespoke problem is considered innovation, for which “Volume” is a less important BD characteristic (Ghasemaghaei and Calic, 2020).

Table 3-2 - Additional secondary data collected during the research (source: author)

Longitude and Latitude (Geocoding)
  Description: Required for the translation of descriptive addresses and postcode data in the telematics data into an absolute geographic reference, known as Geocoding (Goldberg et al., 2007)
  Source: GetTheData (2020), who provide a locational database of the UK postcode directory derived from UK Office for National Statistics (ONS) published open data
  Accessed via: Application Programming Interface (API)

Great Britain OpenStreetMap data
  Description: Required for the local installation of the routing engine
  Source: Geofabrik (2018), who host OpenStreetMap download servers
  Accessed via: Download of the latest file (1.1GB) in pbf format on 13th May 2020

3.6. Data Analysis plan

The plan for data analysis is described in Table 3-3.

Table 3-3 - Summary of the approach to data analysis and how it aligns with research aims and objectives (source: author)

Research aim: The aim of this research is to investigate how a BDA solution to a VRP compares to an experience-led heuristic in an SME and to understand the issues this case highlights in SME adoption of BDA.

Research objectives, with the data collection and data analysis for each:
i. Literature review on SME usage of BDA and methods to solve VRPs
   Data collection: Literature review
ii. Administer a questionnaire to gauge perceptions and understanding towards BDA across the SME
   Data collection: Qualitative (primary data); Data analysis: Template analysis
iii. Descriptive BDA of historic BD to describe the experience-led heuristic routing method employed
   Data collection: Quantitative (secondary data); Data analysis: Descriptive analytics
iv. Prescriptive BDA to derive a routing algorithm to meet the VRP criteria and parameters using opensource packages and external secondary data
   Data collection: Quantitative (secondary data); Data analysis: Prescriptive analytics
v. Descriptive and Inquisitive BDA to compare impact of outputted routes from experience-led heuristic and BDA solution
   Data collection: Quantitative (secondary data); Data analysis: Descriptive analytics

3.6.1. Qualitative Analysis

Template analysis is a form of thematic analysis and is suitable for most analysis approaches and forms of qualitative data, including questionnaire responses (King, 2012; Saunders et al., 2016). Template analysis is both systematic and flexible; however, some researchers have criticised its focus on the template rather than the data (Saunders et al., 2016). Alternative techniques would be content analysis, which codes qualitative data in order to analyse it quantitatively (Saunders et al., 2016), or grounded theory, a recursive analysis that generates theory through the close alignment of analysis and theory (Bryman, 2012). However, given the expected small sample of responses, treating the data as quantitative is unlikely to yield useful findings, and the data is unlikely to be substantial enough to generate new theory. Template analysis better supports the purpose of this research, which is to supplement existing theory.

For the analysis, an initial coding template for each of the questions was generated a priori using codes identified in the literature review (King, 2012). The codes were then modified during the analysis of the questionnaire responses (the initial template is shown in Appendix 8). Questionnaire responses were downloaded from Qualtrics and analysed in Microsoft Excel.

3.6.2. Quantitative Analysis

BDA generally requires an explorative and inductive approach, as the analysis often starts from a dataset rather than from theory and requirements (Mortenson et al., 2015; Kayser et al., 2018; Chehbi-Gamoura et al., 2020). There is limited literature on methods of approaching Big Data Analysis (Hindle and Vidgen, 2018). The six-step method proposed by Akter et al. (2019) was adapted for this research (Table 3-4).

All programming and analyses were conducted in Jupyter (v6.0.3) using the Python (v3.8.3 64bit) programming language. Microsoft Excel was also used to store data and collate results. To perform the routing optimisation, commercial solvers such as Gurobi Optimization (2020) and CPLEX from IBM (2020) were excluded due to cost. Two open-source solvers were considered: Google OR-Tools (2020b) and the “vrpy” library from Montagné and Sanchez (2020). As the “vrpy” package was still in a beta development phase, Google OR-Tools was selected.

Table 3-4 – Summary of Big Data Analysis approach (source: author)

Step 1 - Problem recognition
  Description: Define the problem and the requirements
  Activity:
  - Initial data exploration
  - Discussion with Shortridge Transport Manager

Step 2 - Review previous findings and context
  Description: To avoid common pitfalls, understand what solutions are possible and what environment setup is required
  Activity:
  - Understand programming tools and packages required for analysis
  - Review of routing solver requirements
  - Selection of routing solver:
    o OR-Tools (Google OR-Tools, 2020b)
    o vrpy (Montagné and Sanchez, 2020)
  - Source affordable routing solver inputs (e.g. distance matrix)

Step 3 - Collect the data and environment tools
  Description: Source the required data to build the environment and perform the analysis
  Activity:
  - Data cleansing, identification of critical fields and data transformation for analysis
  - Geocoding of postcodes
  - Installation of local routing engine (OSRM) built on the OpenStreetMap data (Luxen and Vetter, 2011) in a docker container

Step 4 - Select variables and develop the model
  Description: Narrow the problem focus and gain an understanding of what variables need to be included and what relationships need to be measured
  Activity:
  - Programme the routing solver algorithm in the Python language
  - Select variables for comparison of routing (distance, duration and vehicles used)
  - Develop the output for the comparative analysis

Step 5 - Analyse the data
  Description: Using descriptive and prescriptive analytics
  Activity:
  - Descriptive analysis of the experience-led heuristic routing (original)
  - Route level comparison of baseline (Travelling Salesperson Problem)
  - Depot level comparison of baseline with CHVRP (capacity = number of stops)
  - Depot level comparison of baseline with CHVRP (capacity = derived)
  - Depot level comparison of baseline with CHVRP (capacity = derived, all vehicles)
  - Depot level comparison of baseline with TCHVRP (capacity = derived, all vehicles)

3.7. Ethical Issues

Ethics are the standards of behaviour that guide conduct and the rights of the
participants during the research (Saunders et al., 2016). Bryman (2012) refers to four
principles of ethics: harm, consent, deception, and privacy. For this research, both the
organisation and the employees participating in the questionnaire are participants. A separate information sheet was shared with the organisation and was attached to the email containing the online questionnaire link (Appendix 9). The information sheet explained the study and what information was being requested, and set out the participants’ right to anonymity and right to withdraw. Consent was sought from the organisation, and the questionnaire had two questions related to consent to participate in the research. To align with GDPR (2018), the minimum amount of data was requested, and the data was kept confidential and secure on the Heriot-Watt University OneDrive. The data will also be retained for no longer than required by the assessment process (CDRC, 2018).

3.8. Conclusion

The research uses a philosophy of mixed ontological assumptions and a pragmatist epistemology. Whilst pragmatism enables the use of both deduction and induction, the primary focus of the research is to build on and add to existing theory. The research strategy is a case study with an embedded mixed methods research design and an emphasis on the quantitative methods. Primary data was collected using a purposefully sampled online questionnaire, and secondary data comprised a telematics dataset shared by the case organisation, supplemented by open-source map data and geocoding. Template analysis was used to analyse the responses from the questionnaire, and BDA was used to analyse the quantitative data and derive a working solution to the VRP. The next chapter presents the results and data analysis.

Chapter 4 - Findings and Data Analysis

4.1. Introduction

This chapter presents the findings and the data analysis from the research methods described in the previous chapter. The chapter begins with the quantitative BDA of the telematics data and the routing comparisons between the experience-led heuristic and the derived algorithms, followed by the qualitative findings from the online questionnaire of the employees.

4.2. Quantitative Analysis

There are six analyses: an initial descriptive analysis of the telematics dataset to explore, understand and measure the experience-led heuristic (original) routing, followed by five comparative analyses between the experience-led heuristic routing and five versions of the BDA routing solution (algorithm) (Table 4-1). The five versions of algorithm routing were developed iteratively in Python (v3.8.3 64bit) using the Google OR-Tools (2020b) constraint programming solver.

Table 4-1 - Description of the comparative BDA analyses (source: author)

Analysis 1 - Route level comparison of original with TSP
  Applying the Travelling Salesperson Problem (TSP) to the original routes
Analysis 2 - Depot level comparison of original with CHVRP (capacity = number of stops)
  The algorithm builds the same number of routes from the nodes serviced from the same depot on the same day, but is restricted to using the same number of stops as the original routes
Analysis 3 - Depot level comparison of original with CHVRP (capacity = derived)
  As Analysis 2, but customer demand is calculated, and vehicle capacity is used instead of number of stops
Analysis 4 - Depot level comparison of original with CHVRP (capacity = derived, all vehicles)
  As Analysis 3, except any vehicle at the depot can be used
Analysis 5 - Depot level comparison of original with TCHVRP (capacity = derived, all vehicles)
  As Analysis 4, with the additional constraints of a maximum route time of 9 hours and a service time of 10 minutes per node (to meet UK driving limits (UK Government, 2020a))
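Analysis 1 re-sequences the stops of each original route as a TSP using OR-Tools. As a dependency-free illustration of the idea only, the sketch below builds a tour with a nearest-neighbour rule (for a single route, close in spirit to OR-Tools' PATH_CHEAPEST_ARC first-solution strategy) and then improves it with 2-opt, a simple local search used here as a stand-in for, not a reproduction of, GUIDED_LOCAL_SEARCH. The function names and the 4-node asymmetric distance matrix are hypothetical; node 0 plays the depot.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def route_length(route: Sequence[int], dist: Matrix) -> float:
    """Length of a closed tour that starts and ends at route[0] (the depot)."""
    closing = list(route[1:]) + [route[0]]
    return sum(dist[a][b] for a, b in zip(route, closing))


def nearest_neighbour(dist: Matrix, depot: int = 0) -> List[int]:
    """Greedy construction: always extend the tour with the cheapest arc
    from the last node added to a node not yet visited."""
    unvisited = set(range(len(dist))) - {depot}
    tour = [depot]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: dist[last][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def two_opt(tour: List[int], dist: Matrix) -> List[int]:
    """Reverse segments (depot kept in position 0) while any reversal shortens
    the tour; recomputing the full length keeps this valid for asymmetric costs."""
    best = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if route_length(candidate, dist) < route_length(best, dist):
                    best, improved = candidate, True
    return best


# Hypothetical 4-node asymmetric matrix (node 0 = depot), not Shortridge data.
dist = [
    [0, 2, 9, 10],
    [1, 0, 6, 4],
    [15, 7, 0, 8],
    [6, 3, 12, 0],
]
start = nearest_neighbour(dist)   # [0, 1, 3, 2], length 33
tour = two_opt(start, dist)       # [0, 2, 3, 1], length 21
print(tour, route_length(tour, dist))
```

Because road distances from a routing engine are generally asymmetric, the sketch recomputes the whole tour length for every candidate instead of using the symmetric 2-opt shortcut; this is slower but stays correct for one-way costs.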

The analysis followed the process outlined in the methodology (see Table 3-4), and the key preliminary activities are presented before the results and analysis.

4.2.1. Problem Context

The purpose of the routing at Shortridge is to service their customers by collecting soiled linen and delivering clean linen at the frequency required by each customer. The customer base consists of roughly 1,000 customers covering the North of England and Scotland from the three depots: Workington, Dumfries and Darlington. Darlington does not have linen cleaning facilities, so linen is trunked between Darlington and Dumfries twice per day. The fleet consists of 32 vehicles of differing capacities (see Table 4-2) spread across the three depots. Each vehicle is fitted with PRS telematics that records the GPS location of the vehicle.

Linen, both soiled and clean, is carried in separate laundry bags for each customer, weighing 10-15kg on average. Bag weight is highly variable, depending on the order size and on differences in packing between the depot and the customers, with bags of dirty linen known to have exceeded 25kg. Delivery and collection can be at any point during the day, with customers having specific instructions on where the exchange of linen needs to occur. Larger customers, such as hotels, are serviced more often, generally have larger orders and tend to have laundry delivered in roll cages in 12T or 18T vehicles.

Table 4-2 - Shortridge vehicle fleet (types and capacities) (source: author)

Vehicle type (tonnes) | Effective payload (tonnes) | Laundry bags** (number) | Number of vehicles* | Vehicle capacity constraint
18 | 10*** | 660 | 2 | Volume sensitive (filled before overload)
12 | 7.5 | 500 | 8 | Volume sensitive
7.5 | 3*** | 200 | 1 | Volume sensitive
7 | 3.8 | 250 | 12 | Volume sensitive
5 | 2.5 | 200 | 4 | Weight sensitive
3.5 | 1.9 | 80 | 4 | Weight sensitive
2 | 0.5*** | 10 | 1 | Weight sensitive (overload before filled)
*Vehicle numbers taken from February 2020 telematics data
**Based on laundry bag weight of 15kg
***Assumed based on (W.S Hunt's Transport Ltd, 2015; Motorvation (Shows on the Road) Ltd, 2020); other payload values supplied by Shortridge

The routes follow a weekly cycle, with different geographical areas serviced on different days (e.g. North Pennines on a Wednesday). In addition, some larger customers are serviced daily, with smaller and medium customers serviced once or twice per week. The routes are manually constructed, largely unchanged week-to-week and rarely re-evaluated or optimised. Existing customers without an order for that day are excluded from the route, and new customers are added to existing routes based on demand and their proximity to existing customers. This approach fosters a routine that gives the drivers familiarity and is easier to manage and administer.

Customer demand is seasonal, peaking both in the summer and around Christmas and New Year. To cope with the rise in demand, routes switch to “Summer Routes”, which are longer, use larger vehicles and sometimes include a driver’s mate to help with deliveries and collections. The switch to “Summer Routes” tends to occur in April, lasting until early October, and again in mid-December, lasting until mid-January.

4.2.2. Key preliminary activity


4.2.2.1. Data cleansing

As highlighted by Hazen et al. (2014), poor data quality can lead to incorrect decisions, inaccurate insights and reduced value. Thus, the first step of the data exploration was to identify the Critical Data Elements (Table 4-3) and transform the data into an appropriate format for analysis.

The telematics dataset is automatically generated in a structured format from the vehicles’ GPS signals. The dataset is transactional: each row represents a journey from a start location to an end location. Due to the automatic generation of data, the dataset inevitably includes noise alongside customer and depot visits, for instance refuelling, comfort and maintenance stops. However, such stops should be a minority of the activity recorded.

The transformation and cleansing of the February dataset of 12,662 rows is described in detail in Appendix 3. This led to the identification of 1,720 unique postcodes (or nodes), more than the 1,000 customers Shortridge estimated their customer base to be, indicating noise in the data. Ideally, these nodes would be cross-referenced with the customer base, or the nodes would come directly from the customer base. However, the aim of the analysis is to illustrate that BDA can be used to provide solutions to the VRP as an alternative to experience-led heuristics.

Table 4-3 - Identification of the critical data elements from the telematics dataset (source: author)

Column | Description | % Populated | Critical Data Element*
Registration | Registration of the vehicle (number plate) | 100% | Critical
Start Time | Datetime when journey started | 100% | Low
End Time | Datetime when journey ended | 100% | Critical
Start Location | Description of location/address where vehicle starts journey | 100% | Low
End Location | Description of location/address where vehicle ends journey | 100% | Low
Start POI | “Shortridge Darlington” or empty | 8% | No
End POI | “Shortridge Darlington” or empty | 8% | No
Driver | Name of driver | 21% | No
Duration | Length of time the journey took | 100% | No
Idle | Length of time spent stationary on the journey | 100% | No
Miles | Distance covered on the journey | 100% | No
Max Speed | Highest speed attained on the journey | 100% | No
Start Postcode | Postcode at the start of the journey | 100% | Low
End Postcode | Postcode at the end of the journey | 100% | Critical
*Critical: fundamental for analysis; Low: used in data cleansing activity; No: not used

The baseline datasets for analysis consist of 630 routes and 72 depot-day pairs. All
distances and durations in the datasets use the same distance and duration matrix from
OSRM to ensure comparability (see 4.2.2.3.)
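The step of turning transactional journey rows into routes and unique nodes can be sketched with Python's standard csv module. This is a minimal illustration, not the Appendix 3 procedure: it assumes the column headers match Table 4-3 (Registration, End Time, End Postcode) and that timestamps are ISO-formatted, and it treats each vehicle's end postcodes on one day, ordered by end time, as that day's route. The function names and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Set, Tuple

# ASSUMPTION: headers taken from Table 4-3 and ISO-style timestamps; a real
# PRS export may differ (e.g. day-first dates that need strptime instead).
REG, END_TIME, END_POSTCODE = "Registration", "End Time", "End Postcode"

RouteKey = Tuple[str, str]  # (vehicle registration, ISO date)


def normalise_postcode(raw: str) -> str:
    """Upper-case and remove all whitespace so 'ca14 1aa' matches 'CA14 1AA'."""
    return "".join(raw.split()).upper()


def build_routes(csv_text: str) -> Tuple[Dict[RouteKey, List[str]], Set[str]]:
    """Group journey rows into per-vehicle, per-day stop sequences plus unique nodes.

    Each row is one journey, so the end postcodes of a vehicle's journeys on a
    given day, ordered by end time, approximate the stops it made that day.
    """
    journeys = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        postcode = normalise_postcode(row[END_POSTCODE] or "")
        if not postcode:  # skip journeys with no usable end location
            continue
        ended = datetime.fromisoformat(row[END_TIME].strip())
        journeys[(row[REG], ended.date().isoformat())].append((ended, postcode))

    routes = {key: [pc for _, pc in sorted(stops)] for key, stops in journeys.items()}
    nodes = {pc for stops in routes.values() for pc in stops}
    return routes, nodes


# Hypothetical sample rows, not Shortridge data.
sample = """Registration,End Time,End Postcode
AB12 CDE,2020-02-03 08:15:00,ca14 1aa
AB12 CDE,2020-02-03 07:30:00,CA14 2BB
AB12 CDE,2020-02-04 09:00:00,CA14 1AA
XY34 ZZZ,2020-02-03 10:00:00,
"""
routes, nodes = build_routes(sample)
print(routes)
print(sorted(nodes))  # ['CA141AA', 'CA142BB']
```

Normalising postcodes before counting matters: the same site written with different spacing or case would otherwise be counted as two nodes, inflating the node total.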

4.2.2.2. Geocoding

Geocoding was undertaken via HTTP requests to the GetTheData API, with the JSON responses parsed within the Python code. The function used, built following guidance from Andrade (2018), is shown in Figure 4-1. For the 1,720 nodes, geocoding took roughly 20 minutes to complete.

Figure 4-1 - Python code used to geocode the postcodes from the telematics data
(source: author)
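A minimal standard-library sketch of this geocoding step is given below; it is not the code in Figure 4-1. The endpoint URL, the `status`/`data`/`latitude`/`longitude` response fields and the half-second pause between requests are assumptions to be checked against GetTheData's documentation, and the function names are hypothetical. Coordinates are returned as (longitude, latitude) because that is the order OSRM expects.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Dict, Iterable, Optional, Tuple

# ASSUMPTION: endpoint and JSON shape of the GetTheData postcode API;
# check both against the provider's current documentation before use.
BASE_URL = "https://api.getthedata.com/postcode/"

LonLat = Tuple[float, float]


def postcode_url(postcode: str) -> str:
    """Request URL for one postcode (whitespace removed, upper-cased, URL-encoded)."""
    return BASE_URL + urllib.parse.quote("".join(postcode.split()).upper())


def parse_coordinates(payload: dict) -> Optional[LonLat]:
    """(longitude, latitude) from a decoded response, or None if there is no match."""
    if payload.get("status") != "match":
        return None
    data = payload.get("data") or {}
    try:
        return float(data["longitude"]), float(data["latitude"])
    except (KeyError, TypeError, ValueError):
        return None


def geocode_all(postcodes: Iterable[str], pause_s: float = 0.5) -> Dict[str, Optional[LonLat]]:
    """One HTTP GET per postcode, pausing between calls to avoid overloading the API."""
    results: Dict[str, Optional[LonLat]] = {}
    for pc in postcodes:
        with urllib.request.urlopen(postcode_url(pc), timeout=10) as resp:
            results[pc] = parse_coordinates(json.load(resp))
        time.sleep(pause_s)
    return results


# Offline check of the parser with a hand-written (hypothetical) response.
example = {"status": "match", "data": {"latitude": "54.642", "longitude": "-3.548"}}
print(parse_coordinates(example))  # (-3.548, 54.642)
```

Keeping the parser separate from the network call lets it be checked offline, and returning None for unmatched postcodes makes non-geocodable nodes easy to filter out before building the distance matrix.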

4.2.2.3. Distance and duration matrices

The setup of a local routing engine, OSRM, is described in Appendix 4. The distance and duration matrices are retrieved from the OSRM backend via an HTTP request containing the longitude and latitude pairs of the nodes, using the code shown in Figure 4-2.

Figure 4-2 - Python code to request distance and duration matrix from local OSRM
(source: author)
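The matrix request uses OSRM's Table service, which returns all-pairs matrices in a single call (distances in metres and durations in seconds when `annotations=distance,duration` is requested). The sketch below is a minimal illustration, not the code in Figure 4-2: it assumes `osrm-routed` is running locally on its default port 5000 and that coordinates are (longitude, latitude) tuples, and the helper names are hypothetical. Because OR-Tools arc costs must be integers, the matrices are rounded, and any unreachable pair (returned as null) is given a large penalty.

```python
import json
import urllib.request
from typing import List, Optional, Sequence, Tuple

LonLat = Tuple[float, float]  # OSRM expects longitude first

# ASSUMPTION: osrm-routed is running locally on its default port.
OSRM_HOST = "http://localhost:5000"


def table_url(coords: Sequence[LonLat], host: str = OSRM_HOST) -> str:
    """OSRM Table service request for an all-pairs distance and duration matrix."""
    locs = ";".join(f"{lon:.6f},{lat:.6f}" for lon, lat in coords)
    return f"{host}/table/v1/driving/{locs}?annotations=distance,duration"


def parse_table(payload: dict) -> Tuple[list, list]:
    """Return (distances in metres, durations in seconds) from a Table response."""
    if payload.get("code") != "Ok":
        raise ValueError(f"OSRM error {payload.get('code')}: {payload.get('message', '')}")
    return payload["distances"], payload["durations"]


def to_int_matrix(matrix: Sequence[Sequence[Optional[float]]],
                  unreachable: int = 10**9) -> List[List[int]]:
    """Round to integers for OR-Tools; null (unroutable) pairs get a large penalty."""
    return [[unreachable if v is None else int(round(v)) for v in row] for row in matrix]


def fetch_matrices(coords: Sequence[LonLat], host: str = OSRM_HOST):
    """Query the local OSRM once and return integer (distance, duration) matrices."""
    with urllib.request.urlopen(table_url(coords, host), timeout=60) as resp:
        distances, durations = parse_table(json.load(resp))
    return to_int_matrix(distances), to_int_matrix(durations)


print(table_url([(-3.548, 54.642), (-3.6, 55.07)]))
```

By default `osrm-routed` caps table requests at 100 locations (`--max-table-size`), so matrices for larger node sets need that limit raised or the requests split.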

4.2.2.4. Routing Algorithm setup

The routing algorithm was set up using Google OR-Tools (2020b) guidance. Analyses 1 and 2 use the heuristic “PATH_CHEAPEST_ARC” to find the first solution and the metaheuristic “GUIDED_LOCAL_SEARCH” to refine it, as recommended by Google OR-Tools (2020a). In Analyses 3, 4 and 5, the heuristic was changed to “PARALLEL_CHEAPEST_INSERTION”, which increased the number of solutions found and followed guidance from the developers (see Furnon (2017)). A time limit was implemented for each routing problem in the analysis, as the metaheuristic would otherwise run indefinitely (see Appendix 5).

The objective of the routing algorithm was to minimise distance travelled. Analysis 1 was also simulated for minimising route duration, and there are no substantial differences in results between the two variables (see Table 4-3). Distance was selected because duration is much more vehicle-dependent (e.g. maximum speeds), and since the routing engine, OSRM, was installed with the default vehicle profile, duration is more likely to be inaccurate (see Appendix 4).

4.2.2.5. Assumptions

Each analysis was subject to the assumptions indicated in Table 4-4.

Table 4-4 - Assumptions for each version of the routing algorithm analysis (source: author)

i. The same volume of linen is delivered to and collected from each customer.
   Analyses: 1, 2, 3, 4, 5
ii. A node identified from the telematics dataset is a customer.
   Analyses: 1, 2, 3, 4, 5
iii. Order volume and customer demand are the same for each node.
   Analyses: 1, 2
iv. Each route/depot-day is treated independently. The focus of the analysis is a simulation of routing possibilities, so there is no aggregation of customer demand or integration of real demand planning over the month.
   Analyses: 3, 4, 5
v. To model demand in the analysis, nodes with a higher frequency of weekly visits during February are, as confirmed by Shortridge, generally larger customers and thus have larger orders.
   Analyses: 3, 4, 5
vi. All routes and customer sites are reachable by any vehicle.
   Analyses: 2, 3, 4, 5
vii. 18T vehicles are primarily used for trunking. These are excluded to simplify the analysis. Further modifications to the algorithm would be required for their inclusion (e.g. customers that can be serviced only by certain vehicles).
   Analyses: 2, 3, 4, 5
viii. 6 is the maximum number of times a customer/node can be visited per week (Shortridge Ltd., 2020).
   Analyses: 3, 4, 5
ix. Vehicles rarely operate at 100% capacity.
   Analyses: 3, 4, 5
x. Each customer/node stop takes 10 minutes.
   Analyses: 5

4.2.2.6. Derived Customer Demand (CoODV)

Analyses 3, 4 and 5 consider the capacity of the vehicle in route selection. Real demand
data was not used in the analysis, so demand was derived to provide an understanding of
the impact of capacity constraints on the routes. Customer demand is measured in bags
and was derived based on assumption v. (Table 4-4): customers that are visited most
frequently tend to be larger customers and have larger orders. Thus, customer demand
was derived as a function of the frequency of deliveries per week and the capacity of
the vehicle that originally made the delivery, termed CoODV. An example of how this
was calculated is shown in Figure 4-3, with further detail in Appendix 6.

Figure 4-3 - Example of how demand was derived based on delivery frequency and
capacity of original delivery vehicle (source: author)
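The exact CoODV calculation is set out in Figure 4-3 and Appendix 6 and is not reproduced here. Purely as a hypothetical sketch of the idea, one way to turn assumption v. into numbers is to share a fixed percentage (85% or 95%) of the original vehicle's bag capacity among the nodes on its route in proportion to each node's weekly visit frequency. The function, its parameters and the proportional rule are assumptions for illustration, not the author's method.

```python
def derive_demand(route_nodes, weekly_visits, vehicle_capacity, utilisation_pct=95):
    """HYPOTHETICAL proportional allocation (not the Appendix 6 method).

    Shares utilisation_pct% of the original vehicle's bag capacity among the
    nodes on its route, weighted by how often each node is visited per week.
    Integer floor division keeps demand in whole bags, so the total never
    exceeds the utilisation-adjusted capacity.
    """
    budget = vehicle_capacity * utilisation_pct  # bags x 100, kept integral
    total_visits = sum(weekly_visits[node] for node in route_nodes)
    return {
        node: budget * weekly_visits[node] // (100 * total_visits)
        for node in route_nodes
    }


# Illustrative route on a 250-bag vehicle: node A is visited three times a week.
print(derive_demand(["A", "B", "C"], {"A": 3, "B": 1, "C": 1}, 250))
# {'A': 142, 'B': 47, 'C': 47}
```

Under this rule the more frequently visited node receives the larger order, which is the behaviour assumption v. describes.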

4.2.2.7. Depot fleet

Analyses 1, 2 and 3 use the original delivery vehicles, whereas Analyses 4 and 5
expand the choice to all delivery vehicles at the depot.

Table 4-5 - Number of each vehicle type at each depot (source: author)

Vehicle Size   Vehicle Bag   Number of vehicles*
               Capacity      Darlington   Dumfries   Workington
12T            500           1            3          2
7.5T           200           0            1          0
7T             250           4            4          4
5T             200           2            0          2
3.5T           80            0            1          3
2T             10            1            0          0
*derived based on original telematics data

4.2.3. Results and Analysis

The five key findings are presented below; a summary of the setup and results of the
five comparative analyses is shown in Table 4-6.

H00335623 Edmund Houldridge

Table 4-6 - Summary of the five analyses setups and results (source: author)

Setup
1) Route level comparison of original with TSP
   Scope: 490 routes that began and ended at the same depot and had more than one stop
   Assumptions: i, ii, iii
   Heuristic / metaheuristic: PATH_CHEAPEST_ARC / GUIDED_LOCAL_SEARCH
   Optimisation: Distance and Duration (run separately)
   Capacity level: N/A
2) Depot level comparison of original with CHVRP (capacity = number of stops)
   Scope: 72 depot-day pairs (472 routes; routes operated by 18T vehicles excluded;
   12T vehicle PX19 LCJ also excluded)
   Assumptions: i, ii, iii, iv, vii
   Heuristic / metaheuristic: PATH_CHEAPEST_ARC / GUIDED_LOCAL_SEARCH
   Optimisation: Distance
   Capacity level: number of stops of baseline routes
3) Depot level comparison of original with CHVRP (capacity = derived)
   Scope: 72 depot-day pairs (472 routes; routes operated by 18T vehicles excluded;
   12T vehicle PX19 LCJ also excluded)
   Assumptions: i, ii, iv, vi, vii, viii, ix
   Heuristic / metaheuristic: PARALLEL_CHEAPEST_INSERTION / GUIDED_LOCAL_SEARCH
   Optimisation: Distance
   Capacity level: 85% and 95% capacity of original vehicles (run separately)
4) Depot level comparison of original with CHVRP (capacity = derived, all vehicles)
   Scope: 72 depot-day pairs (472 routes; routes operated by 18T vehicles excluded;
   12T vehicle PX19 LCJ also excluded). Entire vehicle fleet in scope
   Assumptions: i, ii, iv, vi, vii, viii, ix
   Heuristic / metaheuristic: PARALLEL_CHEAPEST_INSERTION / GUIDED_LOCAL_SEARCH
   Optimisation: Distance
   Capacity level: 85% and 95% capacity of original vehicles (run separately)
5) Depot level comparison of original with TCHVRP (capacity = derived, all vehicles)
   Scope: 72 depot-day pairs (472 routes; routes operated by 18T vehicles excluded;
   12T vehicle PX19 LCJ also excluded). Entire vehicle fleet in scope. Maximum route
   duration of 9 hours; 10 minutes per node
   Assumptions: i, ii, iv, v, vi, vii, viii, ix, x
   Heuristic / metaheuristic: PARALLEL_CHEAPEST_INSERTION / GUIDED_LOCAL_SEARCH
   Optimisation: Distance
   Capacity level: 85% and 95% capacity of original vehicles (run separately)

Results
Analysis (variant)   Time taken   % scope with      % total reduction   % total reduction   % total reduction in
                     to run       solutions found   in distance         in time             average vehicles used
1 (Distance)         8 hours      100%              5%                  4%                  N/A
1 (Duration)         8 hours      100%              4%                  4%                  N/A
2                    11.5 hours   100%              20%                 17%                 N/A
3 (85% CoODV)        15 hours     100%              30%                 25%                 N/A
3 (95% CoODV)        15 hours     99%               24%                 20%                 N/A
4 (85% CoODV)        15 hours     100%              31%                 26%                 19%
4 (95% CoODV)        15 hours     99%               25%                 21%                 8%
5 (85% CoODV)        15 hours     96%               25%                 13%                 18%
5 (95% CoODV)        15 hours     78%               21%                 11%                 17%
4.2.3.1. Key Finding 1 – The Original Routing operates a similar number of routes each
weekday despite a drop in nodes/customers on a Wednesday

In February 2020, Shortridge operated 630 routes covering 1720 unique nodes
(including 3 depot nodes). 111 (18%) of these routes involved the 18T vehicles used for
trunking. Figure 4-4 displays the number of routes operated each day in February
(excluding trunk vehicles), with a clear distinction between weekends and weekdays.
The number of nodes serviced each day follows a similar pattern (Figure 4-5). The main
exception to the pattern is Wednesday: on average, fewer than half as many nodes are
serviced on a Wednesday as on the other weekdays (172.8 vs 370.7), yet Figure 4-4
shows that a similar number of routes is operated.

Figure 4-4 - Time series of the number of routes operated in February (source: author)

Figure 4-5 – Time series of the number of nodes serviced per day during February
(source: author)

On weekdays the mean and median number of routes is 24, falling to 4 on weekends,
and this pattern is broadly stable, with a standard deviation of about ±2 routes
(Table 4-7). The number of nodes serviced per day varies more, especially on weekdays,
with a range of 294 nodes largely driven by the drop on Wednesdays (Table 4-8).

Table 4-7 - Descriptive statistics of the number of routes per day in February (source:
author)

Number of routes per day   Mean   Median   Minimum   Maximum   Standard Deviation
Weekday                    24.0   24       20        27        1.8
Weekend                    4.3    4        2         8         2.1
All                        17.9   23       2         27        9.4

Table 4-8 - Descriptive statistics of the number of nodes serviced per day (source:
author)

Number of nodes serviced per day   Mean    Median   Minimum   Maximum   Standard Deviation
Weekday                            331.1   357      154       448       86.5
Weekend                            16.4    16       9         25        6.4
All                                233.4   330      9         448       164.4
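As a quick arithmetic check on Tables 4-7 and 4-8: February 2020 contained four Wednesdays, sixteen other weekdays and nine weekend days. Weighting the two weekday means quoted above by those day counts reproduces the weekday mean of 331.1 nodes in Table 4-8, confirms that Wednesdays service fewer than half as many nodes as the other weekdays, and approximately recovers the 519 non-trunking routes (630 minus 111) from the daily route means in Table 4-7. The sketch uses only the Python standard library; the function name is illustrative.

```python
from datetime import date, timedelta


def count_day_types(year, month):
    """Count Wednesdays, other weekdays and weekend days in a month."""
    day = date(year, month, 1)
    wednesdays = other_weekdays = weekend = 0
    while day.month == month:
        weekday = day.weekday()  # Monday = 0 ... Sunday = 6
        if weekday == 2:
            wednesdays += 1
        elif weekday < 5:
            other_weekdays += 1
        else:
            weekend += 1
        day += timedelta(days=1)
    return wednesdays, other_weekdays, weekend


wed, other, weekend = count_day_types(2020, 2)         # (4, 16, 9)
wed_mean, other_mean = 172.8, 370.7                    # mean nodes per day, from the text
pooled = (wed * wed_mean + other * other_mean) / (wed + other)
print(round(pooled, 1))                                # 331.1, weekday mean in Table 4-8
print(round(wed_mean / other_mean, 2))                 # 0.47, i.e. under half
print(round(24.0 * (wed + other) + 4.3 * weekend))     # 519 routes excluding trunking
```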

Figure 4-6 shows the total distance covered and the combined duration of all vehicle
journeys in February. The chart mirrors the pattern of nodes per day (Figure 4-5), with
a similar distance and duration covered each weekday except Wednesday. Operating as
many routes on a Wednesday for fewer nodes, while total distance and total duration
also fall, highlights a possible inefficiency: unless each vehicle is at 100% capacity, it
perhaps implies that the vehicles being used are underutilised.

Figure 4-6 - Time series of total distance (left) and total duration (right) of routes
during February (source: author)

Figure 4-7 shows that the Wednesday routes cover the North Pennines and North
Yorkshire Dales, and visually there appears to be no spatial reason to operate as many
routes. In addition, Figure 4-8 shows that these Wednesday nodes are predominantly
serviced once per week, which suggests they are not Shortridge’s larger customers and,
thus, that it is unlikely all the vehicles are at capacity.

Figure 4-7 - Map of nodes visited in February 2020 with red denoting location visited
on a Wednesday (source: author)

Figure 4-8 - Map of how many times per week a node is serviced: grey - once, black -
twice and red – three and more (source: author)

4.2.3.2. Key Finding 2 – Algorithm routing reduced the total route distances (and
duration) in all comparative analyses

All versions of the Algorithm routing reduced the total distance travelled throughout
February, by 4-31%. Total route duration was also reduced, by 4-26%.

Analysis 1 shows that simply reordering the delivery route to minimise the distance
travelled can save 5,385km (5%). Most routes show either a reduction in distance of
less than 5% (39% of routes) or no improvement at all (31%) under Algorithm routing;
however, some routes achieve reductions in distance of up to 36% (Figure 4-9).

Figure 4-9 - Histogram of percentage difference in route distance (left) and duration
(right) between Original and Algorithm routes in Analysis 1 (source: author)

It might be expected that the greater the number of nodes in a route, the greater the
difference between the Algorithm routing and Original routing. However, there is only
a weak, positive correlation for both route distance and route time (0.44 and 0.36
respectively, using Kendall’s Tau) (Figure 4-10), indicating that further variables are
needed to explain the relationship.

Figure 4-10 - Scatter graphs of the number of stops versus %difference between the
Original and Algorithm routing in route distance (left) and time (right) in Analysis 1
(source: author)
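For readers unfamiliar with Kendall’s Tau, the sketch below computes the tau-b variant, which corrects for tied values (integer stop counts across several hundred routes contain many ties). It counts concordant and discordant pairs of observations directly, which is O(n²) but adequate for a few hundred routes. Which variant was used for Figure 4-10 is not stated; scipy.stats.kendalltau computes tau-b by default. The function name is illustrative.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation, corrected for ties (O(n^2) pair count)."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    pairs = n * (n - 1) // 2
    # Undefined (division by zero) if every x or every y value is tied.
    return (concordant - discordant) / sqrt((pairs - ties_x) * (pairs - ties_y))


print(kendall_tau_b([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0, perfectly concordant
```

A value of 1 means every pair of routes is ordered the same way by both variables; values such as 0.44 and 0.36 mean concordant pairs outnumber discordant ones, but far from exclusively.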

As an example, Route “2020-02-26+PX19LCJ” has four nodes and achieves a reduction
in distance of 26% and a duration saving of 24%. With the depot highlighted in red,
Figure 4-11 shows that the order in which the nodes are visited has changed under the
algorithm. When this is plotted on the map in Figure 4-12 and Figure 4-13, there is a
visible difference in the shape of the routes, with the Original route seemingly
criss-crossing back and forth and the Algorithm route more circular (note that the
distances are road distances; only the visuals are as-the-crow-flies). Simply reordering
the nodes saves 58km.

There may be extraneous factors, overlooked by the Algorithm and the underlying
routing engine, that explain why the order of the Original routes was selected. For
example, traffic might build up at different points in the route, or one node may have
had a particularly urgent order. However, the decision-making should factor in that
covering this extra distance increases the variable costs of the overall operation, and
the two should be balanced accordingly.

Figure 4-11 – Network plot of Original route “2020-02-26+PX19LCJ” (blue) and
equivalent Algorithm route (teal) with depot in red (source: author)

Figure 4-12 – Map plot of Original route “2020-02-26+PX19LCJ” with the red tooltip
indicating the depot (source: author)

Figure 4-13 - Map plot of Algorithm route “2020-02-26+PX19LCJ” with the red
tooltip indicating the depot (source: author)

4.2.3.3. Key Finding 3 – Giving the Algorithm routing further freedom to select routes
at depot level further reduced total routing distance and time in all depot-level analyses

Whilst the depot-level version of the Algorithm routing includes further assumptions
around vehicle capacity and accessibility of customer sites, which reduce how realistic
the simulation is, allowing the Algorithm routing this extra freedom increased the
savings in distance covered to 20-31%, versus 4-5% at route level.

Figure 4-14 shows the differences in route distribution between the Original routing
and the Algorithm routing. With depot-level routing, there is a more pronounced
left-shift in the distribution towards routes of 0-150km, with a compensatory reduction
in routes of 150-350km, whereas at route level these savings are not achieved to the
same extent (the Original route distributions differ slightly between the charts due to
data cleansing, see Appendix 3).

Figure 4-14 – the distribution of route distances for Original and Algorithm routing.
Adjusting only the order of routes in Analysis 1 (left) and allowing the Algorithm to
route at depot level in Analysis 2 (right) (source: author)

As an example, Workington depot on 05/02/2020 saw reductions of 37% and 34% in
total route distance and duration, respectively, in Analysis 2. Figure 4-15 shows an
equal number of stops per route for the Original and Algorithm routing, with only
vehicle 3 covering a greater distance under Algorithm routing (Figure 4-16).
Additionally, under the Original routing the distance and time covered are similar
across all the vehicles, which is not the case under the Algorithm routing. Whilst an
even spread in route duration may help with planning dependent activities (e.g. linen
washing shifts) and driver compensation, it is probable that fewer vehicles (and drivers)
are needed. The excess distance covered is visualised by the differences between the
routes in Figure 4-17 and Figure 4-18. The Original routing has overlapping routes and
limited node clustering, with each route covering similar areas. Conversely, the
Algorithm routing shows a clustering of nodes with a distinct separation of routes.

Figure 4-15 – Network view of Original routing (left) and Algorithm routing (right)
from Workington depot on 05/02/2020 in Analysis 2 (source: author)

Figure 4-16 – Comparison in vehicle distance (left) and time (right) between Original
routing and Algorithm routing from Workington depot on 05/02/2020 in Analysis 2
(source: author)

Figure 4-17 – Map plot of Original routing from Workington depot on 05/02/2020
(source: author)

Figure 4-18 - Map plot of Algorithm routing from Workington depot on 05/02/2020 in
Analysis 2 (source: author)

4.2.3.4. Key Finding 4 – The Algorithm routing maximises vehicle utilisation across
the fleet to first reduce the number of vehicles required and then individual vehicle
utilisation

Analysis 3 and Analysis 4 simulate customer demand and vehicle capacity constraints.
The difference is that Analysis 4 permits any vehicle from the depot, whereas Analysis 3
is restricted to the vehicles used in the Original routing. Table 4-9 shows the difference
in usage of the vehicle types across all depots. Given freedom of vehicle selection, the
algorithm uses the vehicles with the highest capacity more regularly, with usage of the
500-bag capacity vehicle type increasing from 86% to 95% (85% CoODV), and usage of
the smaller capacity vehicles therefore falls.

Table 4-9 – Percentage of available journeys* in which each vehicle type is used during
February 2020 (source: author)

Vehicle Bag   Original   Analysis 3 – no vehicle selection   Analysis 4 – vehicle selection
Capacity      Routing    85% CoODV      95% CoODV            85% CoODV      95% CoODV
500           86%        86%            86%                  95%            94%
250           72%        72%            71%                  60%            70%
200           66%        66%            66%                  35%            49%
80            60%        12%            21%                  1%             10%
10            9%         0%             0%                   0%             0%
*available journeys calculated at depot level as: the number of days the depot operated × vehicles at each depot

Because customer demand is derived from the telematics data, comparisons between
the Algorithm routing and the Original routing are questionable under these
assumptions. Yet there is a hint that vehicle selection might be suboptimal. Consider
the earlier example, Workington depot on 05/02/2020 (Figure 4-17). Figure 4-19 shows
that the Algorithm routing uses the larger vehicle as much as possible. Additionally,
when the Algorithm can select from the whole vehicle fleet in Analysis 4, both 500-bag
capacity vehicles are utilised, reducing the number of vehicles needed from 5 to 3.
Since customer demand has been derived from the capacity of the original delivery
vehicles, in this example there are 1160 bags to be delivered. Whereas the Original
routing uses 5 vehicles to provide that capacity, the Algorithm uses 3, with 1 vehicle
having 40 bags of spare space. Demand may also be lower, say 986 bags, in which case
the algorithm shows that only two vehicles are required (Figure 4-21). In both cases,
route duration is not substantially affected (e.g. Figure 4-22).
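The vehicle counts in the Workington example can be reproduced with a simple aggregate-capacity calculation. The sketch below, a hypothetical helper rather than part of the routing algorithm, finds the smallest set of Workington vehicles (Table 4-5) whose combined bag capacity covers a given demand, breaking ties by the least spare capacity. It deliberately ignores distances, route duration and whether each customer's order fits on a single vehicle, so it gives a lower bound on the vehicles needed rather than a routing solution.

```python
from itertools import combinations


def fewest_vehicles(demand, capacities):
    """Smallest set of vehicles whose combined bag capacity covers demand;
    ties on vehicle count are broken by the least spare capacity.
    Aggregate bound only: ignores distances, route duration and whether
    each customer's order fits on a single vehicle."""
    fleet = sorted(capacities, reverse=True)
    for k in range(1, len(fleet) + 1):
        feasible = [combo for combo in combinations(fleet, k) if sum(combo) >= demand]
        if feasible:
            best = min(feasible, key=sum)
            return list(best), sum(best) - demand
    return None  # demand exceeds the capacity of the whole fleet


# Workington fleet from Table 4-5: 2 x 500, 4 x 250, 2 x 200 and 3 x 80 bags
workington = [500] * 2 + [250] * 4 + [200] * 2 + [80] * 3
print(fewest_vehicles(1160, workington))  # ([500, 500, 200], 40)
print(fewest_vehicles(986, workington))   # ([500, 500], 14)
```

Both results agree with the text: 1160 bags need at least three vehicles (500 + 500 + 200 bags, leaving 40 spare), and 986 bags need two.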

Figure 4-19 – Number of nodes visited per vehicle for the Original routing and the
Algorithm routing under 95% CoODV simulation (left) and the bag capacity of the
vehicles (right) (source: author)

Figure 4-20 - Number of nodes visited per vehicle for the Original routing and the
Algorithm routing under 95% CoODV simulation (left) and the bag capacity of the
vehicles (right) with vehicle selection (source: author)

Figure 4-21 - Number of nodes visited per vehicle for the Original routing and the
Algorithm routing under 85% CoODV simulation with vehicle selection (source:
author)

Figure 4-22 – Route duration for the Original routing and the Algorithm under 95%
CoODV simulation with vehicle selection (source: author)

Figure 4-23 - Map plot of Algorithm routing from Workington depot on 05/02/2020
with 95% CoODV and vehicle selection (source: author)

There may be other reasons for vehicle selection not captured in the Algorithm routing:
particular vehicles may have been unavailable through maintenance, sites could be
inaccessible to larger vehicles, or drivers could have been unavailable. However,
vehicle selection also affects the size and makeup of the vehicle fleet required. In the
analysis, the fleet consists of 28 vehicles (Table 4-10), and the maximum number used
on any day in February was 25 vehicles, which occurred 7 times in the month,
suggesting there is a surplus of vehicles. However, February is low season for
Shortridge, and the surplus vehicles are likely to be utilised during high season. Yet
with improved routing and vehicle selection, the maximum number of vehicles required
fell to 20-24 and occurred less often (3-10% of days versus 24%). There is also the
possibility of reducing this even further through holistic planning of customer demand,
operations and routing, so that slack on weekends and Wednesdays can also be used.

Table 4-10 – Average usage of each vehicle type during February 2020 (source:
author)

Vehicle Bag       Number of   Original   Analysis 3 – no vehicle selection   Analysis 4 – vehicle selection
Capacity          vehicles    Routing    85% CoODV      95% CoODV            85% CoODV      95% CoODV
                  available
500               6           5.1        5.1            5.1                  5.7            5.7
250               12          8.6        8.6            8.5                  7.4            8.5
200               5           3.3        3.3            3.3                  1.8            2.4
80                4           1.2        0.5            0.8                  0              0.4
10                1           0.1        0              0                    0              0
Total             28          18.3       17.6           17.8                 14.9           17
Minimum (%freq)               2 (10%)    2 (10%)        2 (11%)              2 (10%)        2 (11%)
Maximum (%freq)               25 (24%)   20 (10%)       23 (7%)              20 (3%)        24 (4%)

4.2.3.5. Key Finding 5 – Algorithm routing does not always find a solution when the
solution-space is small

A Vehicle Routing Problem may have many feasible solutions; the best of these is the
global optimum. The more constraints that are added, and the more restrictive those
constraints are, the smaller the solution space becomes, i.e. there could be fewer
feasible solutions and local optima, or even a single feasible solution. Thus, the
heuristic and metaheuristic employed by the algorithm may not find a solution within
the time limit.

Each of the 95% CoODV simulations failed to find a solution for the same record
(Dumfries on 19/02/2020). The problem appears feasible, as it is designed based on the
Original routing, and appears unremarkable, with 49 nodes and a total demand of 2537
bags against a total capacity of 2700 bags across the 8 vehicles. Furthermore,
additional constraints could make a problem unsolvable. This is perhaps evidenced in
Analysis 5, where limiting route duration to less than 9 hours, including a 10-minute
stop per customer, may have made some problems unsolvable, since under these
conditions the Original routing had routes of over 9 hours (Figure 4-24). Table 4-11
shows that the problems for which no solution was found tended to have more nodes,
suggesting a more complex problem and a smaller solution space.
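The Dumfries record shows why an aggregate check is not enough: 2537 of 2700 bags is a load factor of about 94%, and total capacity exceeding total demand is necessary but not sufficient when orders cannot be split between vehicles. The toy bin-packing sketch below (orders only, no distances or times, and not how PARALLEL_CHEAPEST_INSERTION works) shows a greedy first-fit construction getting stuck on a fully loaded instance for which a feasible assignment exists, which an exhaustive search then finds. The function names and numbers are illustrative.

```python
from itertools import product


def first_fit(orders, capacities):
    """Greedy construction: load each order onto the first vehicle with room."""
    loads = [0] * len(capacities)
    assignment = []
    for order in orders:
        for vehicle, capacity in enumerate(capacities):
            if loads[vehicle] + order <= capacity:
                loads[vehicle] += order
                assignment.append(vehicle)
                break
        else:  # no vehicle had room: the greedy pass is stuck
            return None
    return assignment


def exhaustive_fit(orders, capacities):
    """Try every order-to-vehicle assignment (feasible only for tiny cases)."""
    for assignment in product(range(len(capacities)), repeat=len(orders)):
        loads = [0] * len(capacities)
        for order, vehicle in zip(orders, assignment):
            loads[vehicle] += order
        if all(load <= cap for load, cap in zip(loads, capacities)):
            return list(assignment)
    return None


print(round(2537 / 2700, 2))                  # 0.94 load factor, Dumfries 19/02/2020
print(first_fit([4, 4, 6, 6], [10, 10]))       # None: greedy gets stuck
print(exhaustive_fit([4, 4, 6, 6], [10, 10]))  # [0, 1, 0, 1]: a solution exists
```

Sorting the orders largest first ([6, 6, 4, 4]) lets the same greedy rule succeed, illustrating how sensitive construction heuristics can be when the solution space is small.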

Table 4-11 – Description of records with and without solutions from the routing
algorithm in Analysis 5 (source: author)

Analysis 5     85% CoODV              95% CoODV
               Failed   Succeeded     Failed   Succeeded
Records        3        69            16       56
Mean Nodes     185      84            142      72.6
Min Nodes      170      3             49       3
Max Nodes      203      184           203      82

Figure 4-24 – Routes from Original routing and Algorithm routing in Analysis 5 with
the route time constraint (source: author)
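The Analysis 5 duration constraint can be expressed as a simple check: driving time plus 10 minutes per customer stop must not exceed 9 hours. The sketch below is a hypothetical helper, not the constraint as coded in OR-Tools. It also shows that service time alone caps a route at 54 stops, so depot-days with many nodes need more vehicles, or become infeasible, before travel time is even considered.

```python
def route_duration_ok(travel_seconds, n_stops, service_minutes=10, max_hours=9):
    """Check a route against the Analysis 5 constraint: driving time plus a
    fixed service time per customer stop must not exceed the maximum duration."""
    total_seconds = travel_seconds + n_stops * service_minutes * 60
    return total_seconds <= max_hours * 3600


def max_stops(travel_seconds, service_minutes=10, max_hours=9):
    """Largest number of stops that still fits alongside a given driving time."""
    spare_seconds = max_hours * 3600 - travel_seconds
    return max(spare_seconds // (service_minutes * 60), 0)


print(route_duration_ok(8 * 3600, 6))  # True: 8 h driving + 60 min of stops = 9 h
print(route_duration_ok(8 * 3600, 7))  # False: 9 h 10 min
print(max_stops(0))                    # 54 stops fill 9 hours with no driving at all
```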

4.2.4. Limitations

The Algorithm routing and simulations are a simplification of a real and dynamic
problem. Whilst the scope and assumptions have been explicitly stated, there will be
intricacies and variables in the real problem that have not been accounted for. For
example, the 18T vehicles are excluded from much of the analysis to simplify the
problem to customer deliveries only. However, as well as sometimes being used for
customer deliveries, these vehicles are relied upon for deliveries from Darlington.
Hence, they play an important role in the real routing problem that is not accounted for
in these simulations. The definition and derivation of customer demand in Analyses 3, 4
and 5 is unlikely to fully reflect actual customer demand. True customer demand is
independent of vehicle capacity and will be driven by many factors, including the
season and the linen the customer already has in circulation. As well as limitations in
the data and how it is used (e.g. 1720 nodes versus a customer base of 1000), the
visualisations also highlight discrepancies in how a route has been defined, namely as
the activity of a vehicle on a particular day. For example, for Route
“2020-02-28+PX67TNO”, the baseline route is split into two parts, returning to the
depot in between. Under a different route definition this could be counted as two
routes or, if vehicle capacity was available, it could highlight an inefficiency in the
routing. For this route, the algorithm reduces the distance travelled by 157km (31%)
(Figure 4-25). Further cross-referencing of data sources would improve the validity of
the results.

Figure 4-25 – Example of a route that returns to the depot twice (source: author)

4.3. Qualitative Analysis

The results and template analysis of the online questionnaire are described below. The
aim of the questionnaire was to gauge broad perceptions and understanding of BD and
BDA across the organisation. The questionnaire was designed to be completed in less
than 15 minutes, to avoid taking too much time away from the business and to
encourage completion. The completion rate was 85% and the average completion time
was 8.5 minutes.

4.3.1. Demographics of the sample

There were 13 participants who responded to the online questionnaire at Shortridge,
all of whom answered positively to the questions on consent. However, 2 (15%)
participants did not complete the full questionnaire and, because the demographic
questions were placed at the end of the survey, did not answer them. Placing the
demographic questions earlier in the questionnaire may have led to lower response rates
in general (Roberson and Sundstrom, 1990) or it may have had no effect (Teclaw et al.,
2012). As demographic questions in research often serve either as an independent
variable or to ensure the “correct” population responded to the survey (Hughes et al.,
2016), they are less important in this research within a single organisation.

Figure 4-26 shows how the 11 participants identified their role at Shortridge and the
length of time they have worked at Shortridge. Most participants identified as Senior
Management (64%) and had been with Shortridge at least one year (72%). The sample
is also influenced by the effects of the COVID-19 pandemic as only employees that
returned from furlough on 1st July 2020 were sent the link for the questionnaire.

[Bar charts of Frequency by Position at Shortridge (left: Sales / Customer Service and Office; Manager / Supervisor; Senior Management) and by Length of time with Shortridge (right: More than 3 years; 1 to 3 years; Less than 1 year; Other)]

Figure 4-26 – Role of participant at Shortridge (left) and length of time with Shortridge
(right) (source: author)

4.3.2. Problem-Solving (Q3)

The purpose of the question was to understand the approach the participant takes to
solve a problem. Following feedback from the pilot, this became a closed question to
aid understanding, with three responses indicating that the participant solves problems
using their experience and intuition (A), through shared problem-solving with
colleagues (B) or through data and analysis (C). All 13 respondents answered the
question.

Table 4-12 – Responses from Q3 of the questionnaire (source: author)

Q3 If you were faced with a problem in your normal work, such as planning a
large production line, choosing to offer an additional product/service, or a
change in regulations or guidelines; would you...

A   use your own experience and judgement to solve the problem          1 (8%)
    and make a decision
B   identify and speak with colleagues (or the internet) who may        6 (46%)
    have an answer and between you make a choice
C   collect data on the problem from lots of sources and using          4 (31%)
    analysis outputs to make a decision
D   Mixture of A,B,C                                                    2 (15%)

Most respondents (B and C, 77%) sourced external information, either from colleagues
or from hard data, although information obtained from a colleague could itself be
experience-based or, judging by these results, more likely data-driven. This could also
reflect the senior management demographic, who regularly raise and discuss issues
during management meetings or seek insight from direct reports.

4.3.3. Example of company using BD (Q4)

The purpose of the question was to gauge the participants’ understanding of BD
through association with a company, and to prime them to think in terms of BD for the
following questions while avoiding technical terms that may decrease response rate
(Bryman, 2012). The majority (77%) of participants showed an understanding of BD
through association, naming large technology companies that use BD (e.g. Amazon,
Google, Netflix), industries (e.g. financial services) or mainstream news (e.g.
Cambridge Analytica). The remaining 3 participants (23%) did not provide an answer
or indicated they did not know.

4.3.4. Examples of BD at Shortridge (Q5)

The purpose of the question was to understand whether the participants could recognise
opportunities for problem-solving with BD in their workplace. The majority (69%) of
participants provided an example, with Participant 4 providing five possible examples.
General themes centred on the operational aspects of the organisation (e.g. washing
machine efficiency, transport and routing, daily reporting) and on customer demand,
both current (e.g. customer databases, orders) and prospective (e.g. marketing, market
research). Participant 9 also gave an interesting response alluding to their definition of
BD:

“…the term “big” is just relative to the size of the organisation.” (Participant 9)

This perhaps illustrates the ambiguity of the definition of BD and the differences
between its mainstream and academic usage. Although this participant clarifies that
most organisations can use data, the response possibly alludes to a misconception that
BD applies only to “Big” companies.

4.3.5. Barriers to using BD at Shortridge (Q6)

The purpose of the question was to gauge the participants’ perceptions of the barriers
to BD at Shortridge; the response rate was 72%. Themes broadly align with the
Technology, People and Organisation resources from the RBV of the firm (Table 4-13).

Several responses related to the data itself. Data integrity and quality issues are well
documented in the literature (Hazen et al., 2014; Wamba et al., 2015). Whilst data
protection is an issue for some use-cases of BD, particularly those involving personal
customer information, it is unlikely to be an obstacle for all use-cases (e.g. washing
machine efficiency, which this participant suggested in the previous question).
Additionally, the response by Participant 1 on the applicability of the data perhaps
corroborates the limited knowledge of BDA within the business suggested by
Participant 10.

Table 4-13 – Summary of responses to Q6 from the questionnaire (source: author)

Resource Type   Responses
Technology      - Data is “not specific enough to the requirements” (Participant 1)
                - Data protection
                - Integrity of the data
                - Cost
                - Infrastructure
Process         - Implementation
                - Collecting the data
                - “lack of knowledge within the business of how to access it and
                  use it” (Participant 10)
People          - Experience
                - Technical Skills
                - Knowledge
Organisation    - “appreciation of [the data’s] value” (Participant 9)
                - Isolated analytics usage

Other responses, such as “Implementation” (Participant 5) and “Systems to collect it”
(Participant 3), may come from the perspective of installing the infrastructure and the
cost of doing so. Alternatively, they could come from the perspective of understanding
“how”: how it would be implemented, how it would be collected, and how it would be
accessed and used.

4.3.6. BD/BDA tools and techniques (Q7)

The purpose of this question was to understand the technical skills of the participants
at Shortridge through selection from a non-exhaustive list of BDA tools. Excluding the
participant who did not select any of the options, every participant had used Microsoft
Excel. Both Google and Amazon analytics and cloud products were selected by over
50% of participants. The validity of these responses is questionable considering the
answers to previous questions. In addition, experience of these computing services
tends to be accompanied by knowledge of a programming language, yet only 1
respondent selected a programming language (SQL). This perhaps also highlights a
limitation of the question design. An open question or a forced-choice design (Bryman,
2012) may have yielded more reliable results, or perhaps it is the conglomerate nature
of these companies that caused confusion.

[Bar chart of Frequency by Software, services and programming: Microsoft Excel; Google Analytics / Google Cloud; Amazon Web Services; SQL (inc. MySQL, Teradata etc); Hadoop / Spark / Hive; Python / R]

Figure 4-27 – Selected software, services and programming options to Q7 from the
questionnaire (source: author)

Analytical techniques were selected infrequently, with Regression Analysis the most
frequently selected technique (3 respondents). This is corroborated by the response of
Participant 9, who adds:

“historically a lone user of data to solve problems in the company” (Participant 9)

[Bar chart of Frequency by Techniques: Linear Programming / Optimisation; Clustering / Segmentation; Regression Analysis; Machine Learning]

Figure 4-28 – Selected responses to analytical techniques in Q7 of the questionnaire
(source: author)

4.4. Conclusion

Through an inductive and iterative approach, the analysis illustrates how the
experience-led heuristic (original) routing at Shortridge compares to a BDA solution
(algorithm). The BDA solution appears to improve on the experience-led heuristic,
reducing total distance by 4-31% and total duration by 4-26%, and the analysis
explores the subsequent impact on fleet size. Although several limitations and
assumptions will affect the external validity of the results, the indication is that a more
objective approach to vehicle routing, standardised across all three depots, would
ultimately reduce delivery costs. Additionally, the reduced number of deliveries on a
Wednesday perhaps indicates that more holistic demand planning would also benefit
vehicle routing. The results of the questionnaire indicate a limited understanding and
knowledge of BDA techniques at Shortridge. Although most participants elect to solve
problems with peer support and some identify as having data-driven problem-solving
approaches, these appear not to be comparable with modern analytical approaches.
77% of respondents identified a BD-associated business, 69% were able to identify a
use-case at Shortridge, and a minority of responses clearly show an appetite for using
data and analytics, but the level of knowledge and understanding across the
organisation is clearly a barrier. This is borne out in the responses on BD obstacles and
on BDA tools and techniques.

Chapter 5 - Discussion

5.1. Introduction

The aim of this research is to investigate how a BDA solution to a VRP compares to an
experience-led heuristic in an SME and to understand the issues this case highlights in
SME adoption of BDA. This chapter discusses the findings from the five objectives
(Table 3-1) in relation to the two research questions and the academic literature.

5.2. Discussion
5.2.1. Research Question 1 - How does a BDA solution to a VRP compare to an
experience-led heuristic?

The results show the BDA solution outperformed the experience-led heuristic routing,
reducing distance covered by 4-31%, which aligns with the 10-30% reduction often
claimed by vendors of CVRS (Bräysy and Hasle, 2014). The results also support the
findings of Fontaine et al. (2020) that, even with experienced logistics personnel,
manual routing rarely solves the VRP optimally. Additionally, the differences between
the two methods are less marked at route level than at depot level (4% vs 20-31%),
which aligns with the finding of Fontaine et al. (2020) that within-cluster routing is
reasonable but clustering is generally poor. However, this may be an unfair comparison,
since the structure of the routes goes largely unchanged at Shortridge; the routes change
only when demand peaks, and this is not captured in the one-month window used for the
analysis. Yet keeping the routes unchanged, to ease their administration and to give
drivers a routine, is itself indicative of VRP complexity. It highlights the resource
required for manual planning and its potential errors and inefficiencies (Carlan et al.,
2020), so reusing the same routes simplifies the problem. Such a simplification ignores
the principles of routing in the literature (e.g. minimising mileage and using the
largest vehicle first to maximise utilisation (Güneri, 2007)), leading to excess variable
costs. Conversely, the BDA solution captures these principles inherently through the
objective function. CVRS is estimated to save 80-90% of the planning time over simple
heuristics (Bräysy and Hasle, 2014) and, once developed, the indications from this
analysis are that an equivalent BDA solution would do the same.
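The route-level and depot-level figures are both relative reductions, (original - algorithm) / original. One plausible reading of why the depot-level figure is larger is that re-sequencing a fixed route only improves the visit order, whereas depot-level routing can also re-cluster customers into different routes. The minimal Python sketch below illustrates that arithmetic; the function names and all distances are hypothetical and are not Shortridge data.

```python
from typing import Sequence


def percent_reduction(original: float, algorithm: float) -> float:
    """Relative reduction of `algorithm` against `original`, in percent."""
    if original <= 0:
        raise ValueError("original must be positive")
    return (original - algorithm) / original * 100.0


def total_reduction(original_routes: Sequence[float], algorithm_routes: Sequence[float]) -> float:
    """Reduction in total distance; the two lists may differ in length, e.g.
    when the algorithm re-clusters customers into a different number of routes."""
    return percent_reduction(sum(original_routes), sum(algorithm_routes))


# Hypothetical distances in km (illustrative only, not Shortridge data).
original_km = [120.0, 95.0, 110.0, 75.0]      # four fixed routes
resequenced_km = [114.0, 90.0, 104.0, 72.0]   # same clusters, better visit order
reclustered_km = [140.0, 120.0, 80.0]         # customers regrouped into three routes

for name, plan in [("re-sequenced", resequenced_km), ("re-clustered", reclustered_km)]:
    print(f"{name}: {total_reduction(original_km, plan):.1f}% less distance")
```

Because the comparison is on totals, it stays valid even when the algorithm operates a different number of routes from the original plan.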

The BDA solution also provides oversight of route duration, allowing objective
performance monitoring of the routes and the drivers. Integrating the telematics with
BDA routing could also provide real-time updates on delivery progress and interruptions
on the route (Hopkins and Hawking, 2018). Furthermore, the BDA solution designs routes
around customer demand rather than fitting customer demand into predetermined routes,
shifting the focus of logistics from the operation to the customer and enabling
involvement in holistic demand planning. Such a change in focus has been shown to bring
further benefits such as higher customer satisfaction and improved competitive
advantage (Thomé et al., 2012; Wagner et al., 2014). The savings in route duration free
capacity that Shortridge could use to expand sales, reduce its fleet or enhance its
customer service offering for further competitive advantage, for example by encouraging
drivers to use the extra time to provide better customer service.
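The link between saved route duration and fleet size can be approximated with a rough lower bound: the number of vehicles needed is at least the total route duration divided by the length of one driver shift, rounded up. The minimal sketch below assumes a hypothetical 9-hour shift and hypothetical route durations. It ignores vehicle capacity, time windows and the fact that routes cannot be split across vehicles, so the real fleet requirement may be higher than the bound.

```python
import math
from typing import Sequence


def min_vehicles(route_hours: Sequence[float], shift_hours: float) -> int:
    """Lower bound on vehicles: no vehicle can work more than one shift per day."""
    if shift_hours <= 0:
        raise ValueError("shift_hours must be positive")
    return math.ceil(sum(route_hours) / shift_hours)


def spare_hours(route_hours: Sequence[float], fleet_size: int, shift_hours: float) -> float:
    """Driver hours left over if the existing fleet keeps working full shifts."""
    return fleet_size * shift_hours - sum(route_hours)


SHIFT_HOURS = 9.0                    # hypothetical shift length
original_h = [8.5, 7.0, 6.5, 7.5]    # hypothetical current route durations
algorithm_h = [6.0, 5.5, 6.0, 5.0]   # hypothetical optimised route durations

print("lower bound, original:", min_vehicles(original_h, SHIFT_HOURS))
print("lower bound, algorithm:", min_vehicles(algorithm_h, SHIFT_HOURS))
print("spare hours with 4 vehicles:", spare_hours(algorithm_h, 4, SHIFT_HOURS))
```

The spare hours represent the capacity that could go towards extra sales, a smaller fleet or better customer service.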

However, a challenge with the BDA solution is making it as representative as possible
whilst still being able to find a solution, as shown by the lower proportion of solutions
found in Analysis 5 (96% and 78%), where constraints were most restrictive. For instance,
to be fully representative, further iterations of the BDA solution would need to include
constraints on accessibility at each customer site, to simulate deliveries and
collections of differing sizes and to include simulations of trunking. These criticisms
of practicality are also levelled at CVRS software in the literature (Rushton et al.,
2010; Rincon-Garcia et al., 2017; Fontaine et al., 2020). However, the benefit of a BDA
solution is the flexibility to attempt these changes, whereas CVRS is either “black-box”
or requires external support to do so (Bräysy and Hasle, 2014; Carlan et al., 2020).
From this example, the trade-off appears to be the expertise to build the solution versus
the cost of purchasing the CVRS (Rincon-Garcia et al., 2017), but, as suggested by
Bräysy and Hasle (2014), manual experience-led routing is inadequate.
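Each additional real-world constraint shrinks the set of feasible routes, which is one way to read the lower solution rates in Analysis 5. A minimal sketch of a feasibility check under two such constraints, vehicle capacity and customer time windows, is below. The function name, the dictionary data layout and the minute-based timing are hypothetical and are not the author's OR-Tools model, where constraints like these would instead be expressed as routing dimensions.

```python
from typing import Dict, Sequence, Tuple


def route_is_feasible(
    route: Sequence[int],
    demand: Dict[int, float],
    capacity: float,
    travel_min: Dict[Tuple[int, int], float],
    time_window: Dict[int, Tuple[float, float]],
    service_min: float = 0.0,
    depot: int = 0,
) -> bool:
    """Check one route (customer sequence, depot excluded) against a vehicle
    capacity and per-customer time windows, all times in minutes from departure."""
    # Capacity: the total load on the vehicle must fit.
    if sum(demand[c] for c in route) > capacity:
        return False
    t, prev = 0.0, depot
    for c in route:
        t += travel_min[(prev, c)]
        earliest, latest = time_window[c]
        if t > latest:          # arrived after the customer's window closed
            return False
        t = max(t, earliest)    # wait if arriving early
        t += service_min        # time spent unloading
        prev = c
    return True


# Hypothetical two-customer example (depot = 0).
demand = {1: 2.0, 2: 3.0}
travel = {(0, 1): 10.0, (1, 2): 15.0, (0, 2): 20.0, (2, 1): 15.0}
windows = {1: (0.0, 60.0), 2: (0.0, 30.0)}

print(route_is_feasible([1, 2], demand, 10.0, travel, windows, service_min=5.0))
print(route_is_feasible([1, 2], demand, 10.0, travel, windows, service_min=10.0))
print(route_is_feasible([1, 2], demand, 4.0, travel, windows, service_min=5.0))
```

Tightening any input (a smaller capacity, a narrower window, a longer service time) can only remove routes from the feasible set. This is one reason a solver may struggle to find a solution when constraints are most restrictive.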

5.2.2. Research Question 2 - What issues does this case highlight in SME adoption of
BDA?

The BDA solution to the VRP and the subsequent routing improvements showcase the
insights and value that can be generated from BDA in an SME and support the view of BDA
as an effective tool for planning (Chehbi-Gamoura et al., 2020). Despite this, there are
limitations to the analysis, not least the quality of the raw telematics data. Under the
RBV (Barney, 1991), BD is a critical resource (Mikalef et al., 2018), and multiple
researchers highlight the importance of data quality in using BDA to avoid incorrect
decisions (Hazen et al., 2014). The autonomously generated telematics data contained a
large volume of noise, requiring extensive data cleansing before analysis. Even so, noise
remains in the data, as seen in the 1720 nodes versus a customer base of approximately
1000. Although the analysis could have cross-referenced other data sources for
validation, the noise inherent in the data potentially decreases the value of the
insights. LaValle et al. (2011) state that data quality is not a barrier to adoption;
however, it is a significant obstacle and can limit the value generated (Wamba et al.,
2015).
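One way the remaining noise (1720 nodes against roughly 1000 customers) might be reduced is by merging telematics stops that lie within a small radius of one another into a single node. The sketch below does this greedily using the haversine great-circle distance. The 100 m radius, the first-seen representative rule and the function names are assumptions for illustration, not the cleansing steps used in this research; a real pipeline would also cross-reference customer addresses or postcodes.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]          # (latitude, longitude) in degrees
EARTH_RADIUS_M = 6_371_000.0         # mean Earth radius


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def merge_stops(stops: Sequence[Point], radius_m: float = 100.0) -> Tuple[List[Point], List[int]]:
    """Greedily merge stops lying within radius_m of an existing node.

    Returns the node coordinates (first-seen stop of each group) and, for every
    input stop, the index of the node it was assigned to. O(n^2), which is
    acceptable for a few thousand stops."""
    nodes: List[Point] = []
    assignment: List[int] = []
    for stop in stops:
        for idx, node in enumerate(nodes):
            if haversine_m(stop, node) <= radius_m:
                assignment.append(idx)
                break
        else:  # no existing node close enough: start a new one
            nodes.append(stop)
            assignment.append(len(nodes) - 1)
    return nodes, assignment


# Hypothetical stops: the first two are about 56 m apart, the third about 1.1 km away.
stops = [(55.9500, -3.2000), (55.9505, -3.2000), (55.9600, -3.2000)]
nodes, assignment = merge_stops(stops)
print(len(nodes), assignment)
```

Using a centroid per group instead of the first-seen stop would be a reasonable refinement.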

Overcoming such data quality issues and delivering value requires the intangible
resources of technical expertise to transform the data and an understanding of the
business context (Waller and Fawcett, 2013; Wamba et al., 2015). In this analysis, this
meant transforming the raw data, setting up the routing engine, building the routing
algorithm to meet the problem requirements and then running the simulations. However,
the results of the questionnaire indicate a limited understanding of BD and BDA, with
numerous participants citing little experience and knowledge across Shortridge as a
barrier to adoption. The picture appears a little more nuanced, though, since the
majority (69%) of participants provided an example of potential BD at Shortridge,
indicating an awareness of what BD is. The gap in knowledge may instead lie in knowing
how to collect the BD and then how to derive value from it with BDA. A corresponding
gap in empirical examples is widely reported in the literature (Mortenson et al., 2015;
Mikalef et al., 2018; Conboy et al., 2020), and Coleman et al. (2016) suggest that such
use-cases are particularly important for SME adoption.

The drivers are a critical resource to the logistics operation and key to implementing
new routing. The drivers need to trust changes to the static routes in order to undertake
them, and their feedback is critical to continually improving the routing. The literature
highlights “Integrated human-data intelligence” as core to developing BDA capabilities
within a production environment (Belhadi et al., 2019, p. 12), implying that the BDA
solution is an enhancement rather than a replacement of human decision-making. This is
perhaps the heart of a DDC, with all levels of the organisation perceiving data as an
enhancement (Akter et al., 2019; Dremel et al., 2020). Vidgen et al. (2017) highlight
that a shift in problem-solving skills is often required, not least in senior
management, whose support has been shown to be important for BDA adoption (Schoenherr
and Speier-Pero, 2015; Shukla and Mattar, 2019). Moreover, this is the population that
often performs analytics (Bordeleau et al., 2019). The questionnaire sample was largely
(91%) Management or Senior Management, and though a proportion (31%) of this sample
identified with data-driven decision-making, there appears not to be the requisite
understanding of analytical tools and techniques to recognise how insights would be
derived, which researchers have found to be important for a DDC (McAfee et al., 2012;
Mikalef et al., 2018; Conboy et al., 2020). Thus, without the requisite knowledge and
understanding to develop a DDC at senior management level, adoption of BDA is
unlikely.

5.3. Conclusion

The BDA routing algorithm reduced the total distance of the manual experience-led
routing by 4-31%. The results highlight both the complexity of the VRP and the
inefficiency and inadequacy of manual experience-led routing. The BDA routing algorithm
allows the flexibility to plan routes according to demand and unlocks opportunities to
reduce costs, to expand or to improve customer service and performance. The benefits
are likely to be similar to those of CVRS, but at lower cost and with the advantage of a
bespoke solution to the problem. However, the challenge is having the expertise and
understanding to build it. In this case, though there is an awareness of BD in the
organisation, the specific understanding and expertise to collect, extract and
manipulate the BD to build BDA solutions is limited. As highlighted by the literature,
further use-cases that focus on “how” will help develop this understanding and
demonstrate the value of BDA. Internally, the importance of other intangible resources
cited in the literature, such as a DDC and senior management support, is also evident
from the case, which supports the literature from an SME perspective.

Chapter 6 - Conclusion

6.1. Introduction

This chapter concludes the research. It begins with a summary of the research and its
outcomes before assessing the wider implications for practitioners and future research
and acknowledging the limitations.

6.2. Research Summary and Outcomes

This research investigated how a BDA solution to a VRP compares to an experience-led
heuristic in an SME and what this case highlights about SME adoption of BDA. The
literature review revealed ambiguous definitions of BD and BDA, which were also
evidenced in the case. Under a reductionist view, BD refers to volume; however, the
literature shows it can have up to eight dimensions which, if included, potentially
create further opportunities for BD use. The literature also highlighted that data and
analytics are interdisciplinary, so different terms are used for BDA across research
disciplines, which is another probable cause of ambiguity. BDA adoption among SMEs is
limited, and whilst there is a vast amount of literature on BDA, there is an absence of
empirical examples, particularly in SMEs, at the intersection of the LSCM and OR
disciplines, and of how organisations create a BDAC. Furthermore, methods for solving
VRPs in the literature, while extensive, are largely inaccessible and impractical for
practitioners, leaving them with a choice between inadequate experience-led heuristic
routing and expensive CVRS.

To add practical research to the body of BDA literature, the research strategy followed
a single case study of an SME, Shortridge Limited, with data collection through an
embedded mixed methods design. Descriptive BDA on a sample of telematics data and other
secondary data highlighted the inadequacy of experience-led heuristic routing through
both the complexity of the VRP in context and a deviation from the principles of routing
found in the literature. To simplify the VRP at Shortridge, routes cycled weekly and
were largely static, with similar numbers of routes operated each weekday despite
differing customer volumes. Using prescriptive BDA, five versions of Algorithm routing
were developed iteratively in Python using the open-source Google OR-Tools library. A
comparative analysis showed the Algorithm routing reduced total distance covered by
4-31% and total duration of routes by 4-26% versus the experience-led heuristic routing.
This comparative analysis highlighted further inefficiencies of the experience-led
heuristic routing in the order in which customers were visited, the way customers were
clustered into routes and the likely underutilisation of delivery vehicles. Such
inefficiencies likely lead to unnecessary variable transport costs for Shortridge, with
the results drawing a parallel with the benefits associated with CVRS in the
literature.
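The inefficiency in visit order can be illustrated with 2-opt, a classic local-search improvement heuristic for a single route: reverse a segment of the visit sequence whenever doing so shortens the tour. This is a swapped-in textbook technique for illustration only, not the Google OR-Tools search used in this research (OR-Tools applies its own local-search operators), and the straight-line distances and unit-square coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def distance_matrix(points: Sequence[Point]) -> List[List[float]]:
    """Straight-line distances between every pair of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def tour_length(tour: Sequence[int], dist: Sequence[Sequence[float]]) -> float:
    """Length of a closed tour that starts and ends at tour[0] (the depot)."""
    legs = zip(tour, list(tour[1:]) + [tour[0]])
    return sum(dist[a][b] for a, b in legs)


def two_opt(tour: Sequence[int], dist: Sequence[Sequence[float]]) -> List[int]:
    """Reverse segments of the visit order while that shortens the tour.
    The depot stays fixed at position 0."""
    best = list(tour)
    best_len = tour_length(best, dist)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = tour_length(candidate, dist)
                if cand_len < best_len - 1e-9:   # strict improvement only
                    best, best_len = candidate, cand_len
                    improved = True
    return best


# Hypothetical depot (index 0) and three customers on a unit square.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
dist = distance_matrix(points)
manual = [0, 2, 1, 3]            # a plausible but self-crossing visit order
optimised = two_opt(manual, dist)
print(optimised, round(tour_length(manual, dist), 2), round(tour_length(optimised, dist), 2))
```

2-opt only fixes the order within a route. The clustering and vehicle-utilisation inefficiencies need a multi-vehicle model such as the one built with OR-Tools.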

Although these results show the value of BDA, challenges were identified in the
analysis, such as the BDA routing algorithm struggling to find solutions as the VRP
became more complex and the expertise and knowledge required to build such a solution.
The self-report questionnaire of employees at Shortridge suggested that the required
technical expertise may be unavailable, creating a barrier to adoption of BDA. Other
resource barriers identified in the literature were also present, such as data quality,
DDC, senior leadership support, costs, data protection and understanding “how” to
collect the data and generate value. Given the evidence of such barriers, it is
understandable why experience-led heuristic routing and CVRS are perhaps more commonly
used for routing and why adoption of BDA in SMEs is limited.

6.3. Implications for practitioners

The research provides an example use-case of BDA in a real business setting. It
illustrates the value that can be generated and the opportunities that BDA can unlock to
optimise operational processes. Specifically, the research shows how to set up an
in-house routing engine (and its prerequisites) to optimise the routing of delivery
vehicles using open-source data and tools, providing an alternative to CVRS. The
research also demonstrates further opportunities that can be uncovered to improve an
organisation. The BDA solution to routing enables a customer-focussed logistics
operation, objective route and driver performance monitoring and involvement in holistic
demand planning, which can be used to expand operations, cut costs and improve customer
service for further competitive advantage. For SMEs, this is particularly relevant as it
shows how their customer service can become inimitable by larger organisations.
Additionally, the research perhaps provides an informal benchmark on which SMEs can
self-reflect. The barriers to BDA adoption may be similar in other organisations, and
the challenge for practitioners and researchers is to work out a way of overcoming them.
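An in-house routing engine of this kind needs a travel-time and distance matrix as input to the solver. Assuming a self-hosted OSRM server (the references cite running OSRM in Docker alongside Geofabrik map extracts, though the exact setup is not restated here), a minimal sketch of requesting and parsing a matrix from OSRM's table service follows. The localhost port 5000, the function names and the sample response values are assumptions for illustration. The resulting matrices could then feed a solver such as Google OR-Tools.

```python
import json
import urllib.request
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[Optional[float]]]


def osrm_table_url(coords_latlon: Sequence[Tuple[float, float]],
                   base_url: str = "http://localhost:5000",
                   profile: str = "driving") -> str:
    """Build an OSRM table-service request for all-pairs durations and distances."""
    # OSRM expects "lon,lat" pairs separated by semicolons.
    coord_str = ";".join(f"{lon:.6f},{lat:.6f}" for lat, lon in coords_latlon)
    return f"{base_url}/table/v1/{profile}/{coord_str}?annotations=duration,distance"


def parse_table(response: dict) -> Tuple[Matrix, Matrix]:
    """Return (durations in seconds, distances in metres); unreachable pairs are None."""
    if response.get("code") != "Ok":
        raise RuntimeError(f"OSRM request failed: {response.get('code')}")
    return response["durations"], response["distances"]


def fetch_matrices(coords_latlon: Sequence[Tuple[float, float]],
                   base_url: str = "http://localhost:5000") -> Tuple[Matrix, Matrix]:
    """Query a running OSRM server (network call; not exercised below)."""
    with urllib.request.urlopen(osrm_table_url(coords_latlon, base_url)) as resp:
        return parse_table(json.load(resp))


# Hypothetical depot and customer, plus a hand-written response of the same shape.
coords = [(55.95, -3.2), (55.90, -3.1)]
print(osrm_table_url(coords))
sample = {"code": "Ok", "durations": [[0, 600], [620, 0]], "distances": [[0, 8000], [8100, 0]]}
durations, distances = parse_table(sample)
print(durations, distances)
```

Keeping the network call separate from the parsing makes it possible to check the matrix handling without a running server.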

6.4. Implications for future research

Future research should look to address the “how” questions: how organisations and SMEs
can overcome the barriers to BDA adoption, how they collect BD and generate value from
it, and how they cultivate a DDC. Further case studies, empirical examples and action
research of BDA adoption and usage within different contexts would help to establish a
more representative common case among SMEs. These examples would also provide
practitioners with cases they can look to recreate. The literature has emphasised the
need for further research at the intersection of OR and LSCM, and the VRP is a prime
example where reality meets theory. Future VRP methods and solutions should be
benchmarked both on the standard benchmarking datasets and in a real context for the
research to have more utility for practitioners. Additionally, comparisons with and
among CVRS would also deliver value for developers, researchers and practitioners.

6.5. Limitations

The research follows a single case study, which may mean findings lack generalisability
to the wider population (other SMEs, larger organisations, industries). The data
available, the routing problems and the knowledge of the employees could well be very
different in other cases and in the wider population. The research also took place
during the COVID-19 pandemic, in which UK Government-imposed lockdown restrictions
limited the choice of research approaches. This impacted Shortridge too, with staff
furloughed and only a single plant/depot operating because 99% of the customer base had
also shut down. Results from the BDA routing may also be less valid for the level of
operation at Shortridge after the lockdown restrictions are lifted, and the
questionnaire may have produced different results had the furlough not been in place.

The BDA routing relied on the cleansing of the telematics data and on several
explicitly stated assumptions. The results and conclusions should only be quoted with
reference to these assumptions. Some variation in results would also be expected in a
real implementation, as the cleansed telematics data still included noise, so some nodes
used in the analysis were not customer nodes; in addition, the large 18T vehicles were
out of scope of the analysis.

The formatting and clarity of a couple of the online self-report questionnaire
questions could have been improved, particularly Question 7, for which a forced-choice
design is recommended in the literature. Definitions of the tools, services and
techniques may also have improved participant understanding and choices, thus improving
the internal validity of the instrument. Questionnaire responses were also limited, and
other collection instruments may have been more effective.

References

Abdel-Basset, M., Abdel-Fatah, L. and Sangaiah, A. K. (2018) 'Metaheuristic
Algorithms: A Comprehensive Review', in Sangaiah, A. K., Sheng, M. and Zhang, Z.
(eds.) Computational Intelligence for Multimedia Big Data on the Cloud with
Engineering Applications. Elsevier, pp. 185-231.

Akter, S., Bandara, R., Hani, U., Wamba, S. F., Foropon, C. and Papadopoulos, T.
(2019) 'Analytics-Based Decision-Making for Service Systems: A Qualitative Study
and Agenda for Future Research', International Journal of Information Management,
48, pp. 85-95.

Akter, S., Wamba, S. F., Gunasekaran, A., Dubey, R. and Childe, S. J. (2016) 'How to
Improve Firm Performance Using Big Data Analytics Capability and Business Strategy
Alignment?', International Journal of Production Economics, 182, pp. 113-131.

Andrade, E. S. D. (2018) How to Use APIs with Pandas and Store the Results in
Redshift. Available at: https://medium.com/@ericsalesdeandrade/how-to-call-rest-
apis-with-pandas-and-store-the-results-in-redshift-2b35f40aa98f (Accessed: 02 July
2020).

Barney, J. (1991) 'Firm Resources and Sustained Competitive Advantage', Journal of
Management, 17(1), pp. 99-120.

Belhadi, A., Zkik, K., Cherrafi, A., Yusof, S. r. M. and El fezazi, S. (2019)
'Understanding Big Data Analytics for Manufacturing Processes: Insights from
Literature Review and Multiple Case Studies', Computers & Industrial Engineering,
137.

Bharadwaj, A. S. (2000) 'A Resource-Based Perspective on Information Technology
Capability and Firm Performance: an Empirical Investigation', MIS Quarterly, 24(1),
pp. 169-196.

Biddle, C. and Schafft, K. A. (2015) 'Axiology and Anomaly in the Practice of Mixed
Methods Work: Pragmatism, Valuation, and the Transformative Paradigm', Journal of
Mixed Methods Research, 9(4), pp. 320-334.

Bing Maps (2020) Distance Matrix API. Available at: https://www.microsoft.com/en-
us/maps/distance-matrix (Accessed: 2 July 2020).

Boldosova, V. (2019) 'Deliberate Storytelling in Big Data Analytics Adoption',
Information Systems Journal, 29(6), pp. 1126-1152.

Bordeleau, F.-E., Mosconi, E. and de Santa-Eulalia, L. A. (2019) 'Business Intelligence
and Analytics Value Creation in Industry 4.0: A Multiple Case Study in Manufacturing
Medium Enterprises', Production Planning & Control, 31(2-3), pp. 173-185.

Bowling, A. (2005) 'Mode of Questionnaire Administration can have Serious Effects
on Data Quality', Journal of Public Health, 27(3), pp. 281-291.

Božič, K. and Dimovski, V. (2019) 'Business Intelligence and Analytics Use,
Innovation Ambidexterity, and Firm Performance: A Dynamic Capabilities
Perspective', The Journal of Strategic Information Systems, 28(4), p. 101578.

Braekers, K., Ramaekers, K. and Van Nieuwenhuyse, I. (2016) 'The Vehicle Routing
Problem: State of the Art Classification and Review', Computers & Industrial
Engineering, 99, pp. 300-313.

Bräysy, O. and Hasle, G. (2014) 'Chapter 12: Software Tools and Emerging
Technologies for Vehicle Routing and Intermodal Transportation', in Toth, P. and Vigo,
D. (eds.) Vehicle Routing: Problems, Methods, and Applications. 2nd edn.
Philadelphia: Society for Industrial and Applied Mathematics, pp. 351-380.

Bryman, A. (2012) Social Research Methods. 4th edn. New York: Oxford University
Press.

Bughin, J. (2016) 'Big Data, Big Bang?', Journal of Big Data, 3(1), pp. 1-14.

Buonanno, G., Faverio, P., Pigni, F., Ravarini, A., Sciuto, D. and Tagliavini, M. (2005)
'Factors Affecting ERP system Adoption: A Comparative Analysis between SMEs and
Large Companies', Journal of Enterprise Information Management, 18(4), pp. 384-426.

Carlan, V., Huybrechts, T., Hellinckx, P. and Vanelslander, T. (2020) 'A Universal
Middleware Streaming Framework and Data Analytics: Analysing their Economic
Feasibility in Road Transport Planning', Research in Transportation Business &
Management, 34, p. 100424.

CDRC (2018) The General Data Protection Regulation & Social Science Research.
Available at: https://www.cdrc.ac.uk/wp-content/uploads/2018/05/6-GDPR-and-
social-science-research-full-document-1.pdf (Accessed: 8 April 2020).

Chehbi-Gamoura, S., Derrouiche, R., Damand, D. and Barth, M. (2020) 'Insights from
Big Data Analytics in Supply Chain Management: An All-Inclusive Literature Review
Using the SCOR Model', Production Planning & Control, 31(5), pp. 355-382.

Chen, H., Chiang, R. H. and Storey, V. C. (2012) 'Business Intelligence and Analytics:
From Big Data to Big Impact', MIS Quarterly, 36(4), pp. 1165-1188.

Chen, M., Mao, S. and Liu, Y. (2014) 'Big Data: A Survey', Mobile Networks and
Applications, 19(2), pp. 171-209.

Clarke, G. and Wright, J. W. (1964) 'Scheduling of Vehicles from a Central Depot to a
Number of Delivery Points', Operations Research, 12(4), pp. 568-581.

Coleman, S., Göb, R., Manco, G., Pievatolo, A., Tort-Martorell, X. and Reis, M. S.
(2016) 'How Can SMEs Benefit from Big Data? Challenges and a Path Forward',
Quality and Reliability Engineering International, 32(6), pp. 2151-2164.

Conboy, K., Mikalef, P., Dennehy, D. and Krogstie, J. (2020) 'Using Business Analytics
to Enhance Dynamic Capabilities in Operations Research: A Case Analysis and
Research Agenda', European Journal of Operational Research, 281(3), pp. 656-672.

Creswell, J. W. (2003) Research Design: Qualitative, Quantitative, and Mixed Methods
Approaches. 2nd edn. California: Sage Publications.

Creswell, J. W. and Plano Clark, V. L. (2011) Designing and Conducting Mixed
Methods Research. 2nd edn. California: Sage Publications.

Cukier, K. (2010) 'Data, Data Everywhere: Managing Information', The Economist.
[Online] Available at: https://www.economist.com/special-report/2010/02/27/all-too-
much (Accessed: 21 May 2020).

Dantzig, G. B. and Ramser, J. H. (1959) 'The Truck Dispatching Problem', Management
Science, 6(1), pp. 80-91.

Dedić, N. and Stanier, C. (2016) 'Towards Differentiating Business Intelligence, Big
Data, Data Analytics and Knowledge Discovery', International Conference on
Enterprise Resource Planning Systems. Hagenberg, Austria, 14 November. Springer,
pp. 114-122.

Dedić, N. and Stanier, C. (2017) 'Measuring the Success of Changes to Business
Intelligence Solutions to Improve Business Intelligence Reporting', Journal of
Management Analytics, 4(2), pp. 130-144.

Del Vecchio, P., Di Minin, A., Petruzzelli, A. M., Panniello, U. and Pirri, S. (2018) 'Big
Data for Open Innovation in SMEs and Large Corporations: Trends, Opportunities, and
Challenges', Creativity and Innovation Management, 27(1), pp. 6-22.

Demchenko, Y., Grosso, P., De Laat, C. and Membrey, P. (2013) 2013 International
Conference on Collaboration Technologies and Systems (CTS). California, USA, 20-
24 May. IEEE.

Department for Business Energy & Industrial Strategy (2019) Business Population
Estimates 2019. UK Government. [Online]. Available at:
https://www.gov.uk/government/statistics/business-population-estimates-2019
(Accessed: 22 February 2020).

DistanceMatrix AI (2020) Pricing. Available at: https://distancematrix.ai/pricing
(Accessed: 2 July 2020).

Docker (2020). Available at: https://www.docker.com/ (Accessed: 29 June 2020).

Dong, J. Q. and Yang, C.-H. (2020) 'Business Value of Big Data Analytics: A Systems-
Theoretic Approach and Empirical Test', Information & Management, 57(1), p. 103124.

Dremel, C., Herterich, M. M., Wulf, J. and vom Brocke, J. (2020) 'Actualizing Big Data
Analytics Affordances: A Revelatory Case Study', Information & Management, 57(1).

Duan, L. and Xiong, Y. (2015) 'Big Data Analytics and Business Analytics', Journal of
Management Analytics, 2(1), pp. 1-21.

Eiselt, H. and Sandblom, C.-L. (2000) 'Heuristic Algorithms', in Integer Programming
and Network Models. Berlin: Springer, pp. 229-258.

Eisenhardt, K. M. and Martin, J. A. (2000) 'Dynamic Capabilities: What Are They?',
Strategic Management Journal, 21(10‐11), pp. 1105-1121.

European Commission (2020) European Data Strategy. Available at:
https://ec.europa.eu/info/strategy/priorities-2019-2024/europe-fit-digital-
age/european-data-strategy (Accessed: 27 May 2020).

Eurostat (2020) Big Data Analysis. Available at:
https://appsso.eurostat.ec.europa.eu/nui/show.do?query=BOOKMARK_DS-
801562_QID_5CF62F3F_UID_-
3F171EB0&layout=UNIT,L,X,0;GEO,L,Y,0;TIME,C,Z,0;INDIC_IS,L,Z,1;SIZEN_R
2,L,Z,2;INDICATORS,C,Z,3;&zSelection=DS-801562INDIC_IS,E_BD;DS-
801562SIZEN_R2,L_C10_S951_XK;DS-801562TIME,2018;DS-
801562INDICATORS,OBS_FLAG;&rankName1=TIME_1_0_-
1_2&rankName2=INDICATORS_1_2_-1_2&rankName3=SIZEN-R2_1_2_-
1_2&rankName4=INDIC-
IS_1_2_0_0&rankName5=UNIT_1_2_0_0&rankName6=GEO_1_2_0_1&rStp=&cSt
p=&rDCh=&cDCh=&rDM=true&cDM=true&footnes=false&empty=false&wai=fals
e&time_mode=NONE&time_most_recent=false&lang=EN&cfo=%23%23%23%2C
%23%23%23.%23%23%23 (Accessed: 27 May 2020).

Feilzer, M. Y. (2009) 'Doing Mixed Methods Research Pragmatically: Implications for
the Rediscovery of Pragmatism as a Research Paradigm', Journal of Mixed Methods
Research, 4(1), pp. 6-16.

Ferraris, A., Mazzoleni, A., Devalle, A. and Couturier, J. (2019) 'Big Data Analytics
Capabilities and Knowledge Management: Impact on Firm Performance', Management
Decision, 57(8), pp. 1923-1936.

Fontaine, P., Taube, F. and Minner, S. (2020) 'Human Solution Strategies for the
Vehicle Routing Problem: Experimental Findings and a Choice-Based Theory',
Computers & Operations Research, p. 104962.

Furnon, V. (2017) 'Ortools RoutingModel not finding best solution to a VRP in a 14-
node example'. or-tools-discuss: Google. Available at:
https://groups.google.com/forum/#!topic/or-tools-discuss/6KHuJZ3C3VQ (Accessed:
7 July 2020).

Gandomi, A. and Haider, M. (2015) 'Beyond the Hype: Big Data Concepts, Methods,
and Analytics', International Journal of Information Management, 35(2), pp. 137-144.

Garengo, P. and Bititci, U. (2007) 'Towards a Contingency Approach to Performance
Measurement: An Empirical Study in Scottish SMEs', International Journal of
Operations & Production Management, 27(8), pp. 802-825.

Geisberger, R., Sanders, P., Schultes, D. and Delling, D. (2008) WEA 2008:
International Workshop on Experimental and Efficient Algorithms. Provincetown,
USA, 30 May - 1 June. Germany: Springer Berlin Heidelberg.

Gendreau, M., Potvin, J.-Y., Bräysy, O., Hasle, G. and Løkketangen, A. (2008)
'Metaheuristics for the Vehicle Routing Problem and Its Extensions: A Categorized
Bibliography', in Golden, B., Raghavan, S. and Wasil, E. (eds.) The Vehicle Routing
Problem: Latest Advances and New Challenges. Boston, MA: Springer US, pp. 143-
169.

Geofabrik (2018). Available at: https://download.geofabrik.de/europe.html (Accessed:
29 June 2020).

George, G., Osinga, E. C., Lavie, D. and Scott, B. A. (2016) 'Big Data and Data Science
Methods for Management Research', Academy of Management Journal, 59(5), pp.
1493-1507.

GetTheData (2020) Open Postcode Geo - API Version. Available at:
https://www.getthedata.com/open-postcode-geo-api (Accessed: 9 April 2020).

Ghasemaghaei, M. and Calic, G. (2020) 'Assessing the Impact of Big Data on Firm
Innovation Performance: Big Data is not Always Better Data', Journal of Business
Research, 108, pp. 147-162.

Gibb, A. A. (2000) 'SME Policy, Academic Research and the Growth of Ignorance,
Mythical Concepts, Myths, Assumptions, Rituals and Confusions', International Small
Business Journal, 18(3), pp. 13-35.

Glover, F. W. and Kochenberger, G. A. (eds.) (2003) Handbook of Metaheuristics. New
York: Kluwer Academic Publishers.

Goldberg, D. W., Wilson, J. P. and Knoblock, C. A. (2007) 'From Text to Geographic
Coordinates: The Current State of Geocoding', Journal of Urban and Regional
Information Systems Association, 19(1), pp. 33-46.

Google Maps (2020). Available at:
https://developers.google.com/maps/documentation/distance-matrix/usage-and-billing
(Accessed: 2 July 2020).

Google OR-Tools (2020a) Routing Options. Available at:
https://developers.google.com/optimization/routing/routing_options (Accessed: 5 July
2020).

Google OR-Tools (2020b) Vehicle Routing Problem. Available at:
https://developers.google.com/optimization/routing/vrp (Accessed: 2 July 2020).

Gray, C. and Mabey, C. (2016) 'Management Development', International Small
Business Journal: Researching Entrepreneurship, 23(5), pp. 467-485.

Gubbins, E. J. (2003) Managing Transport Operations. 3rd edn. London: Kogan Page.

Güneri, A. (2007) 'Physical Distribution Activities and Vehicle Routing Problems in
Logistics Management: A Case Study', Proceedings of the Institution of Mechanical
Engineers, Part B: Journal of Engineering Manufacture, 221(1), pp. 123-133.

Gurobi Optimization (2020) tsp.py. Available at:
https://www.gurobi.com/documentation/9.0/examples/tsp_py.html (Accessed: 2 July
2020).

Hazen, B. T., Boone, C. A., Ezell, J. D. and Jones-Farmer, L. A. (2014) 'Data Quality
for Data Science, Predictive Analytics, and Big Data in Supply Chain Management: An
Introduction to the Problem and Suggestions for Research and Applications',
International Journal of Production Economics, 154, pp. 72-80.

Hindle, G., Kunc, M., Mortensen, M., Oztekin, A. and Vidgen, R. (2020) 'Business
Analytics: Defining the Field and Identifying a Research Agenda', European Journal
of Operational Research, 281(3), pp. 483-490.

Hindle, G. and Vidgen, R. (2018) 'Developing a Business Analytics Methodology: A
Case Study in the Foodbank Sector', European Journal of Operational Research,
268(3), pp. 836-851.

Hopkins, J. and Hawking, P. (2018) 'Big Data Analytics and IoT in Logistics: A Case
Study', The International Journal of Logistics Management, 29(2), pp. 575-591.

Hornstra, R. P., Silva, A., Roodbergen, K. J. and Coelho, L. C. (2020) 'The Vehicle
Routing Problem with Simultaneous Pickup and Delivery and Handling Costs',
Computers & Operations Research, 115, p. 104858.

Hughes, J. L., Camden, A. A. and Yangchen, T. (2016) 'Rethinking and Updating
Demographic Questions: Guidance to Improve Descriptions of Research Samples', Psi
Chi Journal of Psychological Research, 21(3), pp. 138-151.

IBM (2020) IBM CPLEX Optimizer. Available at:
https://www.ibm.com/analytics/cplex-optimizer (Accessed: 2 July 2020).

Jenkins, H. (2004) 'A Critique of Conventional CSR theory: An SME Perspective',
Journal of General Management, 29(4), pp. 37-57.

Kache, F. and Seuring, S. (2017) 'Challenges and Opportunities of Digital Information
at the Intersection of Big Data Analytics and Supply Chain Management', International
Journal of Operations & Production Management, 37(1), pp. 10-36.

Kamble, S. S. and Gunasekaran, A. (2019) 'Big Data-Driven Supply Chain Performance


Measurement System: A Review and Framework for Implementation', International
Journal of Production Research, 58(1), pp. 65-86.

Kasilingam, R. G. (1998) Logistics and transportation: Design and Planning.


Dordrecht: Kluwer Academic Publishers.

Kayser, V., Nehrke, B. and Zubovic, D. (2018) 'Data Science as an Innovation
Challenge: From Big Data to Value Proposition', Technology Innovation Management
Review, 8(3), pp. 16-25.

King, N. (2012) 'Doing Template Analysis', in Symon, G. and Cassell, C. (eds.) Qualitative Organizational Research: Core Methods and Current Challenges. London: Sage Publications, pp. 426-450.

Kovács, G., van Hoek, R. and Spens, K. M. (2005) 'Abductive Reasoning in Logistics
Research', International Journal of Physical Distribution & Logistics Management,
35(2), pp. 132-144.

Kuo, R. (2001) 'A Sales Forecasting System Based on Fuzzy Neural Network with
Initial Weights Generated by Genetic Algorithm', European Journal of Operational
Research, 129(3), pp. 496-517.

Lamba, H. S. and Dubey, S. K. (2015) 'Analysis of Requirements for Big Data Adoption to Maximize IT Business Value', 2015 4th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions). Noida, India, 2-4 September, pp. 1-6.

Laporte, G. (1992) 'The Vehicle Routing Problem: An Overview of Exact and Approximate Algorithms', European Journal of Operational Research, 59(3), pp. 345-358.

LaValle, S., Lesser, E., Shockley, R., Hopkins, M. S. and Kruschwitz, N. (2011) 'Big
Data, Analytics and the Path from Insights to Value', MIT Sloan Management Review,
52(2), pp. 21-32.

Leuthold, F. (2020) 'Run OSRM in Docker on Windows', phabi.ch. Available at: https://phabi.ch/2020/05/06/run-osrm-in-docker-on-windows/ (Accessed: 30 June 2020).

Lipworth, W., Mason, P. H., Kerridge, I. and Ioannidis, J. P. A. (2017) 'Ethics and
Epistemology in Big Data Research', Journal of Bioethical Inquiry, 14(4), pp. 489-500.

Lowrie, I. (2017) 'Algorithmic Rationality: Epistemology and Efficiency in the Data
Sciences', Big Data & Society, 4(1).

Luxen, D. and Vetter, C. (2011) 'Real-time routing with OpenStreetMap data', 19th
ACM SIGSPATIAL international conference on advances in geographic information
systems. Chicago, USA, 1-4 November. New York, USA: ACM, pp. 513-516.

Malandraki, C. and Daskin, M. S. (1992) 'Time Dependent Vehicle Routing Problems: Formulations, Properties and Heuristic Algorithms', Transportation Science, 26(3), pp. 185-200.

Mapbox (2020) Mapbox Pricing. Available at: https://www.mapbox.com/pricing/ (Accessed: 2 July 2020).

MaxOptra (2020). Available at: https://maxoptra.com/ (Accessed: 19 June 2020).

McAfee, A., Brynjolfsson, E., Davenport, T. H., Patil, D. and Barton, D. (2012) 'Big
Data: The Management Revolution', Harvard Business Review, 90(10), pp. 60-68.

Mehozay, Y. and Fisher, E. (2019) 'The Epistemology of Algorithmic Risk Assessment and the Path Towards a Non-Penology Penology', Punishment & Society, 21(5), pp. 523-541.

Mikalef, P., Boura, M., Lekakos, G. and Krogstie, J. (2019) 'Big Data Analytics and
Firm Performance: Findings from a Mixed-Method Approach', Journal of Business
Research, 98, pp. 261-276.

Mikalef, P., Krogstie, J., Pappas, I. O. and Pavlou, P. (2020) 'Exploring the Relationship
Between Big Data Analytics Capability and Competitive Performance: The Mediating
Roles of Dynamic and Operational Capabilities', Information & Management, 57(2), p.
103169.

Mikalef, P., Pappas, I. O., Krogstie, J. and Giannakos, M. (2018) 'Big Data Analytics
Capabilities: A Systematic Literature Review and Research Agenda', Information
Systems and e-Business Management, 16(3), pp. 547-578.

Min, H. (1989) 'The Multiple Vehicle Routing Problem with Simultaneous Delivery
and Pick-up Points', Transportation Research, 23(5), pp. 377-386.

Miwa, T. and Bell, M. G. (2017) 'Efficiency of Routing and Scheduling System for
Small and Medium Size Enterprises Utilizing Vehicle Location Data', Journal of
Intelligent Transportation Systems, 21(3), pp. 239-250.

Montagné, R. and Sanchez, D. T. (2020) A Python Framework for Solving the VRP and
its Variants with Column Generation. Available at: https://github.com/Kuifje02/vrpy
(Accessed: 2 July 2020).

Mortenson, M. J., Doherty, N. F. and Robinson, S. (2015) 'Operational Research from Taylorism to Terabytes: A Research Agenda for the Analytics Age', European Journal of Operational Research, 241(3), pp. 583-595.

Motorvation (Shows on the Road) Ltd (2020) Truck Spec. Available at: http://motorv.com/truck-specifications/ (Accessed: 1 July 2020).

Müller, O., Fay, M. and vom Brocke, J. (2018) 'The Effect of Big Data and Analytics
on Firm Performance: An Econometric Analysis Considering Industry Characteristics',
Journal of Management Information Systems, 35(2), pp. 488-509.

Nguyen, T., Li, Z., Spiegler, V., Ieromonachou, P. and Lin, Y. (2018) 'Big Data
Analytics in Supply Chain Management: A State-of-the-Art Literature Review',
Computers & Operations Research, 98, pp. 254-264.

O'Gorman, K. D. and MacIntosh, R. (2015) Research Methods for Business and Management: A Guide to Writing Your Dissertation. 2nd edn. Oxford: Goodfellow Publishers Limited.

OSRM (2020) Project-OSRM/osrm-backend. Available at: https://github.com/Project-OSRM/osrm-backend (Accessed: 29 June 2020).

Parragh, S. N., Doerner, K. F. and Hartl, R. F. (2008) 'A Survey on Pickup and Delivery
Problems', Journal für Betriebswirtschaft, 58(2), pp. 81-117.

Pisinger, D. and Ropke, S. (2007) 'A General Heuristic for Vehicle Routing Problems',
Computers & Operations Research, 34(8), pp. 2403-2435.

Raguseo, E., Vitari, C. and Pigni, F. (2020) 'Profiting from Big Data Analytics: The
Moderating Roles of Industry Concentration and Firm Size', International Journal of
Production Economics, p. 107758.

Rincon-Garcia, N., Waterson, B. J. and Cherrett, T. J. (2017) 'Requirements from Vehicle Routing Software: Perspectives from Literature, Developers and the Freight Industry', Transport Reviews, 38(1), pp. 117-138.

Roberson, M. T. and Sundstrom, E. (1990) 'Questionnaire Design, Return Rates, and Response Favorableness in an Employee Attitude Questionnaire', Journal of Applied Psychology, 75(3), p. 354.

Robusto, C. C. (1957) 'The Cosine-Haversine Formula', The American Mathematical Monthly, 64(1), pp. 38-40.

Rosenfeld, P., Booth-Kewley, S. and Edwards, J. E. (1993) 'Computer-Administered Surveys in Organizational Settings: Alternatives, Advantages, and Applications', The American Behavioral Scientist, 36(4), pp. 485-511.

Roßmann, B., Canzaniello, A., von der Gracht, H. and Hartmann, E. (2018) 'The Future
and Social Impact of Big Data Analytics in Supply Chain Management: Results from a
Delphi study', Technological Forecasting and Social Change, 130, pp. 135-149.

Rozados, I. V. and Tjahjono, B. (2014) 'Big Data Analytics in Supply Chain Management: Trends and Related Research', 6th International Conference on Operations and Supply Chain Management (OSCM). Bali, 10-12 December.

Rushton, A., Croucher, P. and Baker, P. (2010) The Handbook of Logistics and Distribution Management. 4th edn. London: Kogan Page Limited.

Rushton, A., Croucher, P. and Baker, P. (2014) The Handbook of Logistics & Distribution Management. 5th edn. London: Kogan Page.

Russom, P. (2011) 'Big Data Analytics', TDWI Best Practices Report, Fourth Quarter.

Saunders, M., Lewis, P. and Thornhill, A. (2016) Research Methods for Business
Students. 7th edn. Harlow, UK: Pearson Education.

Schoenherr, T. and Speier-Pero, C. (2015) 'Data Science, Predictive Analytics, and Big
Data in Supply Chain Management: Current State and Future Potential', Journal of
Business Logistics, 36(1), pp. 120-132.

Seddon, J. J. J. M. and Currie, W. L. (2017) 'A Model for Unpacking Big Data Analytics
in High-Frequency Trading', Journal of Business Research, 70, pp. 300-307.

Selamat, S. A. M., Prakoonwit, S., Sahandi, R., Khan, W. and Ramachandran, M. (2018) 'Big Data Analytics - A Review of Data-Mining Models for Small and Medium Enterprises in the Transportation Sector', Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(3).

Seyedghorban, Z., Tahernejad, H., Meriton, R. and Graham, G. (2020) 'Supply Chain
Digitalization: Past, Present and Future', Production Planning & Control, 31(2-3), pp.
96-114.

Shah, S., Soriano, C. B. and Coutroubis, A. (2017) 'Is Big Data for Everyone? The
Challenges of Big Data Adoption in SMEs', 2017 IEEE International Conference on
Industrial Engineering and Engineering Management (IEEM). Singapore, 10-13
December. IEEE, pp. 803-807.

Shortridge Ltd. (2018) Financial Statements. [Online]. Available at: https://beta.companieshouse.gov.uk/company/02853436/filing-history (Accessed: 19 June 2020).

Shortridge Ltd. (2020). Available at: https://www.shortridgelaundry.co.uk/ (Accessed: 7 July 2020).

Shukla, M. and Mattar, L. (2019) 'Next Generation Smart Sustainable Auditing Systems
Using Big Data Analytics: Understanding the Interaction of Critical Barriers',
Computers & Industrial Engineering, 128, pp. 1015-1026.

Solomon, M. M. (1987) 'Algorithms for the Vehicle Routing and Scheduling Problems
with Time Window Constraints', Operations Research, 35(2), pp. 254-265.

Surbakti, F. P. S., Wang, W., Indulska, M. and Sadiq, S. (2020) 'Factors Influencing
Effective Use of Big Data: A Research Framework', Information & Management, 57(1).

Syed, A., Gillela, K. and Venugopal, C. (2013) 'The Future Revolution on Big Data',
Future, 2(6), pp. 2446-2451.

Taillard, É. D. (1999) 'A Heuristic Column Generation Method for the Heterogeneous
Fleet VRP', RAIRO-Operations Research, 33(1), pp. 1-14.

Teclaw, R., Price, M. C. and Osatuke, K. (2012) 'Demographic Question Placement: Effect on Item Response Rates and Means of a Veterans Health Administration Survey', Journal of Business and Psychology, 27(3), pp. 281-290.

Teddlie, C. and Tashakkori, A. (2009) Foundations of Mixed Methods Research: Integrating Quantitative and Qualitative Approaches in the Social and Behavioral Sciences. California: Sage Publications.

Teece, D. J. (2007) 'Explicating Dynamic Capabilities: The Nature and Microfoundations of (Sustainable) Enterprise Performance', Strategic Management Journal, 28(13), pp. 1319-1350.

Thomé, A. M. T., Scavarda, L. F., Fernandez, N. S. and Scavarda, A. J. (2012) 'Sales and Operations Planning and the Firm Performance', International Journal of Productivity and Performance Management, 61(4), pp. 359-381.

TomTom (2020) Pricing. Available at: https://developer.tomtom.com/store/maps-api (Accessed: 2 July 2020).

UK Government (2020a) Drivers' hours: GB domestic rules. Available at: https://www.gov.uk/drivers-hours/gb-domestic-rules (Accessed: 7 July 2020).

UK Government (2020b) 'PM Address to the Nation on Coronavirus: 23 March 2020'. Available at: https://www.gov.uk/government/speeches/pm-address-to-the-nation-on-coronavirus-23-march-2020 (Accessed: 29 June 2020).

Vidal, T., Laporte, G. and Matl, P. (2020) 'A Concise Guide to Existing and Emerging
Vehicle Routing Problem Variants', European Journal of Operational Research,
286(2), pp. 401-416.

Vidgen, R., Shaw, S. and Grant, D. B. (2017) 'Management Challenges in Creating
Value from Business Analytics', European Journal of Operational Research, 261(2),
pp. 626-639.

W.S Hunt's Transport Ltd (2015) Dimensions and Capabilities. Available at: https://huntstransport.co.uk/our-fleet/dimensions-and-capabilities/ (Accessed: 1 July 2020).

Wagner, S. M., Ullrich, K. K. and Transchel, S. (2014) 'The Game Plan for Aligning
the Organization', Business Horizons, 57(2), pp. 189-201.

Waller, M. A. and Fawcett, S. E. (2013) 'Data Science, Predictive Analytics, and Big
Data: A Revolution that will Transform Supply Chain Design and Management',
Journal of Business Logistics, 34(2), pp. 77-84.

Wamba, S. F. and Akter, S. (2019) 'Understanding Supply Chain Analytics Capabilities and Agility for Data-rich Environments', International Journal of Operations & Production Management, 39(6/7/8), pp. 887-912.

Wamba, S. F., Akter, S., Edwards, A., Chopin, G. and Gnanzou, D. (2015) 'How ‘Big
Data’ can make Big Impact: Findings from a Systematic Review and a Longitudinal
Case Study', International Journal of Production Economics, 165, pp. 234-246.

Wamba, S. F., Gunasekaran, A., Akter, S., Ren, S. J.-f., Dubey, R. and Childe, S. J.
(2017) 'Big Data Analytics and Firm Performance: Effects of Dynamic Capabilities',
Journal of Business Research, 70, pp. 356-365.

Wang, G., Gunasekaran, A., Ngai, E. W. and Papadopoulos, T. (2016) 'Big Data
Analytics in Logistics and Supply Chain Management: Certain Investigations for
Research and Applications', International Journal of Production Economics, 176, pp.
98-110.

Wilson, N. H. M. and Colvin, N. J. (1977) Computer Control of the Rochester Dial-A-Ride System (Report R77-31). Boston, USA: Department of Civil Engineering, M. I. T.

Yin, R. K. (2018) Case Study Research and Applications: Design and Methods. 6th edn. California: Sage Publications.

Zheng, P., Sang, Z., Zhong, R. Y., Liu, Y., Liu, C., Mubarok, K., Yu, S. and Xu, X. (2018) 'Smart Manufacturing Systems for Industry 4.0: Conceptual Framework, Scenarios, and Future Perspectives', Frontiers of Mechanical Engineering, 13(2), pp. 137-150.

Appendices

– Questions from Questionnaire (source: author)


Q1 I have read the information sheet and have an understanding of what the research is
about, what my involvement will be and how the information that I provide will be used

o Yes
o No
Skip To: End of Survey If I have read the information sheet and have an understanding of what the
research is about, what m... = No
Q2 I voluntarily consent to be a participant in this research and understand that I can
refuse to answer questions, I can withdraw from the study at any time without giving
a reason and that the information I provide will be kept anonymous

o Yes
o No
Skip To: End of Survey If I voluntarily consent to be a participant in this research and understand that I
can refuse to an... = No
Q3 If you were faced with a problem in your normal work, such as planning a large
production line, choosing to offer an additional product/service, or a change in
regulations or guidelines; would you...

o use your own experience and judgement to solve the problem and make a decision
o identify and speak with colleagues (or the internet) who may have an answer and
between you make a choice

o collect data on the problem from lots of sources and using analysis outputs to
make a decision

o use another method (please describe below!)


o don't know
Q4 You may have heard of the term "Big Data" associated with companies that use
large amounts of data and analytics on a day-to-day basis. Which company do you think
of first?

Q5 In your day-to-day work, can you think of any examples of where Big Data might
be created or is used?

Q6 What do you think the obstacle(s) are to using Big Data in your place of work?

Q7 Have you ever used any of the following?

○ Microsoft Excel

○ Python/R

○ Google Analytics/Google Cloud

○ Amazon Web Services

○ Machine Learning

○ SQL (inc. MySQL, Teradata etc)

○ Clustering/Segmentation

○ Linear Programming/Optimisation

○ Hadoop/Spark/Hive etc

○ Regression Analysis

Q8 Thank you for your time and your answers. I would appreciate any other thoughts,
questions or considerations that you have about your understanding of big data and any
other feedback on this questionnaire

– Key Python packages used in the analysis (source: author)

Package | Version | Description
cspy | 0.1.1 | Algorithms for the constrained shortest path problem (dependency for vrpy)
folium | 0.11.0 | Used for plotting geocoded locations on maps
ipython | 7.14.0 | Interactive Python interpreter
jsonschema | 3.2.0 | Used to parse JSON-formatted data (responses from HTTP requests)
matplotlib | 3.2.1 | Plotting library (for charts etc.)
networkx | 2.4 | Used for creating network and graph visualisations
numpy | 1.18.4+mkl | Core package for manipulating numeric data (dependency for pandas)
ortools | 7.6.7691 | Google OR-Tools library for constrained optimisation – contains the algorithms used for the VRP
pandas | 1.0.3 | Provides data structures fundamental for data analysis
pickleshare | 0.7.5 | Used for saving/loading datasets in Python formats
pip | 20.1.1 | Used to install all other packages
pipwin | 0.5.0 | Used to install packages that failed to install with pip
polyline | 1.4.0 | Used to interpret the route output between two locations in OSRM
python-dateutil | 2.8.1 | Used for manipulating dates
requests | 2.23.0 | Used for making HTTP requests for geocoding and routing
scipy | 1.4.1 | Scientific library; used for statistical tests
vrpy | 0.2.0 | In-development library for solving vehicle routing problems

– Data Cleansing activity detail (source: author)

Data quality point: Records with the same “Start Postcode” and “End Postcode”
Description: Whenever the vehicle’s engine is turned on, the GPS position is recorded until the engine is switched off. This means that when the vehicle is switched on to be loaded or unloaded, there will be a record in the dataset even though the vehicle was stationary.
Result: 4,180 rows were removed where the “Start Postcode” and “End Postcode” are the same, leaving 8,482 rows in total.

Data quality point: Creation of a unique key for each route
Description: A “Route_ID” key was created to identify the different routes. For general deliveries, this is a concatenation of the date from “End Time” and the “Registration”. For routes involving the two 18-tonne vehicles used for overnight trunking, the “Route_ID” is formatted differently, using the depot postcodes, to avoid breaking a single trunk journey into two routes. Dummy records were also created for the starting location of each “Route_ID”.
Result: 9,156 rows in total and the identification of 674 routes.

Data quality point: Assessing the quality of “End Postcode”
Description: “End Postcode” was identified as the key field for analysing the customer demand nodes, as it provides the exact location of the vehicle when it stopped. Although 100% populated, “End Postcode” was either defaulted to “Unknown” or incomplete for 176 records (2%). This was resolved in three ways:
1. 39 records had the “Start Postcode” from the subsequent record in the route sequence applied as the “End Postcode” (following the logical assumption that the “End Postcode” of record n is the “Start Postcode” of record n+1).
2. 126 records had the longitude and latitude applied manually using Google Maps and Bing Maps searches of the “End Location” and incomplete “End Postcode” (38 distinct postcodes).
3. 11 records had “Unknown” for both “End Location” and “End Postcode”, with the “Start Postcode” in the subsequent record also unknown. These were fixed manually using the “End Location” and “End Postcode” from the subsequent record, effectively flattening these rows. Only two of these records related to non-depot locations.
Result: Each record has an End_Postcode populated.

Data quality point: Depot locations in “End Postcode”
Description: Identification of the depot locations showed that many were split over multiple postcodes. Darlington depot was split over DL1 4QB (133 records) and DL1 4QD (364 records), Dumfries over DG2 0HS (421 records) and DG2 0JE (12 records), and Workington over CA14 4JH (171 records) and CA14 4JX (422 records).
Result: The depot postcodes were set to the most frequently occurring postcode, with alternatives changed to these: Darlington – DL1 4QD, Dumfries – DG2 0HS, and Workington – CA14 4JX.

Data quality point: Routes not starting or ending at a depot location
Description: 44 routes were identified that did not begin or end at one of the depot postcodes (DL1 4QD, DG2 0HS, CA14 4JX).
Result: The 44 routes were removed, leaving 630 routes and 8,812 rows in the dataset.

Data quality point: Duplicated “End Postcodes” in routes
Description: There are routes where a postcode is repeated, indicating it was visited more than once in a particular route. This is likely to be noise in the data and unlikely to represent multiple visits to the same customer. Therefore, only the first instance of the repeated postcode is kept. This aligns the original routes with the requirements of the routing solvers, which cannot have repeated nodes. Another approach would be to treat these as different nodes and create a dummy node for each repetition; however, this would likely add further noise and bias to the results, as the unique list of nodes would approach 7,000 versus 1,720 without repeated nodes, a further departure from the Shortridge customer base of around 1,000 customers.
The same issue arises at depot level, which is used for the depot-level analysis (Analysis 2 to Analysis 5), where routes on the same day may share a particular postcode. In this instance, the postcode is kept only once and repeats are removed from other routes on that day. In this way, the lists of nodes in the data and in the algorithm are unique and identical. However, this may again highlight limitations in the current method of routing, where similar areas are covered more than once in the same route and by multiple routes.
Result: Postcodes are deduplicated within routes. Also, repeated nodes in the depot-level analysis are removed across routes, so any depot-day pair has a unique set of postcodes to deliver to.

Data quality point: Distances and durations
Description: Though the telematics dataset contains columns related to distance and time, these were actual travel distances and times and will include the effects of traffic etc. To ensure a like-for-like comparison, only distances and times from OSRM were used for the analysis.
Result: Route distances and route times were taken from OSRM only.
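Four of the cleansing steps above (removing stationary records, building the date-plus-registration “Route_ID”, back-filling an unknown “End Postcode” from the next record's “Start Postcode”, and keeping only the first visit to a repeated postcode) can be sketched in plain Python. This is a hypothetical illustration, not the pandas code used in the research: each record is assumed to be a dict keyed by the field names quoted in the table, “End Time” is assumed to be an ISO-format timestamp string, and the function names and sample postcodes are invented. The trunking-route key, depot dummy records and manual map look-ups are not reproduced.

```python
from collections import defaultdict
from datetime import datetime

# Assumed field names, taken from the telematics columns quoted in the table.
START, END, TIME, REG = "Start Postcode", "End Postcode", "End Time", "Registration"


def drop_stationary(records):
    """Remove records where the vehicle did not move (start postcode == end postcode)."""
    return [r for r in records if r[START] != r[END]]


def route_id(record):
    """General-delivery key: date from End Time concatenated with the registration."""
    day = datetime.fromisoformat(record[TIME]).date().isoformat()
    return f"{day}_{record[REG]}"


def fill_unknown_end(route):
    """In a time-ordered route, replace an 'Unknown' End Postcode of record n
    with the Start Postcode of record n+1 when that one is known."""
    filled = [dict(r) for r in route]  # copy so the input records are untouched
    for n in range(len(filled) - 1):
        nxt = filled[n + 1][START]
        if filled[n][END] == "Unknown" and nxt != "Unknown":
            filled[n][END] = nxt
    return filled


def first_visits_only(postcodes):
    """Keep the first occurrence of each postcode, preserving visit order."""
    return list(dict.fromkeys(postcodes))


def clean(records):
    """Return {Route_ID: [deduplicated End Postcodes in visit order]}."""
    routes = defaultdict(list)
    ordered = sorted(drop_stationary(records), key=lambda r: datetime.fromisoformat(r[TIME]))
    for r in ordered:
        routes[route_id(r)].append(r)
    return {
        rid: first_visits_only(r[END] for r in fill_unknown_end(route))
        for rid, route in routes.items()
    }


# Made-up records: a stationary start at the depot, an unknown stop, and a repeat visit.
sample = [
    {REG: "TEST01", TIME: "2020-02-03T06:00:00", START: "CA14 4JX", END: "CA14 4JX"},
    {REG: "TEST01", TIME: "2020-02-03T07:00:00", START: "CA14 4JX", END: "Unknown"},
    {REG: "TEST01", TIME: "2020-02-03T08:00:00", START: "AA1 1AA", END: "BB2 2BB"},
    {REG: "TEST01", TIME: "2020-02-03T09:00:00", START: "BB2 2BB", END: "AA1 1AA"},
]
print(clean(sample))  # {'2020-02-03_TEST01': ['AA1 1AA', 'BB2 2BB']}
```

A fuller version would also standardise the depot postcodes and drop routes that do not start or end at a depot before deduplicating.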

– Routing Engine setup

A method of calculating the distance between each pair of locations is required to find the optimum routes. For the research to hold real-world value, actual road distances are used rather than crow-fly distances calculated from latitude and longitude with the Haversine formula (Robusto, 1957). Routing distances are often sourced as a pairwise distance matrix, whereby the origins are the columns of the matrix, the destinations are the rows, and the route distance is the cell at their intersection. Appendix Figure 1 illustrates the distance from B to C; note that the distance from C to B is slightly longer, so the matrix is not symmetric.

Appendix Figure 1 - Example distance matrix (source: author)
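For contrast, the crow-fly distance that was deliberately not used can be computed with the Haversine formula. The sketch below is illustrative only and assumes a mean Earth radius of 6,371 km and (latitude, longitude) pairs in decimal degrees; the function names are hypothetical. Because the great-circle distance from p to q equals that from q to p, a matrix built this way is symmetric, unlike the road-distance matrix in Appendix Figure 1.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(a, b):
    """Great-circle (crow-fly) distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against h creeping just above 1 through rounding for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def crow_fly_matrix(points):
    """Pairwise matrix; cell [i][j] is the distance between points[i] and points[j]."""
    return [[haversine_km(p, q) for q in points] for p in points]


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
matrix = crow_fly_matrix(points)
print(round(matrix[0][1], 2))  # one degree of longitude at the equator: 111.19
```

Road distances from a routing engine are usually longer than these values and can differ by direction (one-way streets, junction layouts), which is why they were preferred for the analysis.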

Commercially available routing engines carry an expense. Appendix Table 1 provides a view of the cost of requesting a pairwise distance matrix. A single pairwise 1720×1720 distance matrix has 2,958,400 elements, which would cost between about $1,540 and over $5,000 with the providers in Appendix Table 1 that support a matrix of this size. There are ways of reducing the size of the matrix required (e.g. multiple smaller distance matrices), but for route planning at a holistic level it is not unreasonable to want a complete matrix (e.g. for planning which customers are served by which depot). Fortunately, an open-source routing engine is available: the Open Source Routing Machine (OSRM) (Luxen and Vetter, 2011). A local instance of OSRM was built using a Docker container (Docker, 2020). Appendix Figure 2 illustrates how this works: the Docker container acts like a virtual machine on which an OSRM Docker image is installed. The Great Britain map data was pre-processed in OSRM using the default car routing profile, “car.lua”. The engine was then launched using the command in Appendix Figure 3. This uses the contraction hierarchies algorithm (Geisberger et al., 2008) for routing (“--algorithm ch”), as recommended by the OSRM developers given the large size of the requested distance matrix, and increases the maximum table/matrix size that can be requested (“--max-table-size”). For a guide on how to set up OSRM using Docker, see Leuthold (2020) and OSRM (2020).
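Once the container is running, a matrix like the one in Appendix Figure 1 can be requested from OSRM's table service over HTTP. The sketch below is an illustration under stated assumptions rather than the script used in the research (which used the requests package): it assumes the server is reachable at http://localhost:5000, as published by the “-p 5000:5000” flag in Appendix Figure 3, and asks for both duration and distance annotations. The helper names are hypothetical.

```python
import json
import urllib.request

OSRM_HOST = "http://localhost:5000"  # assumption: port published by "-p 5000:5000"


def table_url(points, host=OSRM_HOST):
    """Build an OSRM table-service URL for a list of (lat, lon) points.

    OSRM expects "lon,lat" pairs separated by ";"; "driving" is the profile
    name used in OSRM's documented URL pattern.
    """
    coords = ";".join(f"{lon},{lat}" for lat, lon in points)
    return f"{host}/table/v1/driving/{coords}?annotations=duration,distance"


def fetch_matrices(points, host=OSRM_HOST, timeout=600):
    """Return (distances in metres, durations in seconds) from a running OSRM server.

    Row i of each matrix is the trip from points[i]; column j is the trip to points[j].
    HTTP error statuses raise urllib.error.HTTPError before the JSON is read.
    """
    with urllib.request.urlopen(table_url(points, host), timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("code") != "Ok":
        raise RuntimeError(f"OSRM returned {payload.get('code')}: {payload.get('message', '')}")
    return payload["distances"], payload["durations"]
```

Note that OSRM expects longitude before latitude, and that its response puts origins on the rows, i.e. the transpose of the layout described for Appendix Figure 1. If one request for all 1,720 locations is impractical, the table service also accepts `sources` and `destinations` index lists, so the full matrix can be assembled in blocks.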

Appendix Table 1 – Example providers of distance matrices and the cost (source: author)

Provider | Method of charging | Price for 10,000 elements* | Price for 2,958,400 elements**
Google Maps (2020) | Charged by matrix elements ($200 monthly credit) | Free | $2100-$5000+
DistanceMatrix AI (2020) | Charged by matrix elements | $40 | $1540-$4000+
Bing Maps (2020) | Charged by billable transactions (matrix elements / 4) | N/A (max is 2500 elements) | N/A (max is 2500 elements)
Mapbox (2020) | Charged by matrix elements | Free | $3951
TomTom (2020) | Charged by matrix elements | $25 | $2167

*An element is a single cell within a distance matrix, e.g. Appendix Figure 1 has 16 elements.
**This research used a 1720×1720 matrix, which has 2,958,400 elements.

Appendix Figure 2 – Diagram explaining the set-up of OSRM with OpenStreetMap (source: author)

docker run -t -i -p 5000:5000 -v %cd%:/data osrm/osrm-backend osrm-routed --algorithm ch --max-table-size 10000 /data/great-britain-latest.osrm

Appendix Figure 3 - Commands to launch OSRM within a Docker container (source: author)

Appendix Figure 4 - Screenshot of the frontend of OSRM running on local machine
(source: author)

– Setting the time-limit for the routing algorithm

The time-limit is calculated as described in Appendix Equation 1. This gives routing problems with more nodes a longer calculation time, as there are more permutations of possible solutions, whilst enabling the whole dataset to run through the algorithm in a reasonable time. The constant c varied per analysis:

• c = 10 for Analysis 1; total run time = 8 hours
• c = 20 for Analysis 2; total run time = 11.5 hours
• c = 15 for Analyses 3, 4 and 5; total run time = 15 hours

$$\text{time limit (seconds)} = 30 + \left\lfloor \frac{(\text{number of nodes})^2}{c} \right\rfloor$$

(Appendix Equation 1, source: author)

where the “number of nodes” is the number of unique nodes in the baseline route and c is a constant. The floor brackets round the quotient down to an integer.
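Appendix Equation 1 translates directly into integer arithmetic. A minimal sketch, using the constants listed above; the function name and the 73-node input are illustrative.

```python
# Constant c per analysis, as listed above.
C_BY_ANALYSIS = {1: 10, 2: 20, 3: 15, 4: 15, 5: 15}


def time_limit_seconds(number_of_nodes: int, c: int) -> int:
    """Appendix Equation 1: 30 + floor(number_of_nodes ** 2 / c)."""
    return 30 + (number_of_nodes ** 2) // c  # // floors for non-negative integers


print(time_limit_seconds(73, C_BY_ANALYSIS[2]))  # 30 + floor(5329 / 20) = 296
```

If the limit was passed to Google OR-Tools (listed among the packages used), the usual place is the routing search parameters, e.g. `search_parameters.time_limit.seconds`; the original solver code is not shown, so which solver received the limit is an assumption.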

– Deriving customer demand for Analyses 3, 4 and 5

Customer demand for bags of linen is derived from Shortridge's assumption that the most frequently visited customers tend to be larger customers with larger orders. Thus, customer demand is modelled as proportional to the number of times per week a customer was delivered to in February 2020. This frequency is calculated using Appendix Equation 2.

$$\text{Node PerWeekFreq} = \min\left[\frac{\text{Frequency of node visits in February 2020}}{4},\ 6\right]$$

(Appendix Equation 2, source: author)

where “Node PerWeekFreq” is the smaller of the average number of times per week a node is visited during February 2020 and 6 (the maximum integer number of days per week any node can be serviced, assumption viii.), and 4 is the number of complete weeks in February 2020.

The demand for a customer/node is not static and is modelled to vary based on the
demand of the other customers in the route and the capacity of the original delivery
vehicle (Table 4-2). Therefore, the node demand for a particular customer, j, in any one
of the 72 records is as described in Appendix Equation 3.

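$$\text{Node Demand}_j\ (\text{bags}) = \left\lfloor \frac{\text{Baseline Vehicle Capacity} \times \text{Node PerWeekFreq}_j \times \text{Utilisation}}{\sum_{i=1}^{n} \text{Node PerWeekFreq}_i} \right\rfloor$$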

(Appendix Equation 3, source: author)

where “Baseline Vehicle Capacity” is the capacity in bags of the original delivery vehicle, and
is divided by the sum of the “Node PerWeekFreq” for all nodes, n, in the original delivery
route. This is multiplied by the “Node PerWeekFreq” for customer j and multiplied by the
vehicle “Utilisation” – a percentage between 0 and 100%. Node Demand is then rounded down
to the nearest integer.

Appendix Figure 5 - Python code that derives the customer/node demand. Here original
vehicle utilisation is set to 95% (source: author)
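Appendix Figure 5 is an image, so its code is not reproduced here. The following is an independent sketch of Appendix Equations 2 and 3, not the author's code. It uses exact fractions so that the floor is not disturbed by floating-point rounding. The inputs (visit counts per node for one baseline route, a capacity of 100 bags and 95% utilisation) and the function names are hypothetical.

```python
import math
from fractions import Fraction

WEEKS_IN_FEB_2020 = 4   # complete weeks (Appendix Equation 2)
MAX_DAYS_PER_WEEK = 6   # assumption viii.


def per_week_freq(visits_in_february: int) -> Fraction:
    """Appendix Equation 2: min(visits / 4, 6), kept as an exact fraction."""
    return min(Fraction(visits_in_february, WEEKS_IN_FEB_2020), Fraction(MAX_DAYS_PER_WEEK))


def node_demand(visits: dict, capacity_bags: int, utilisation_pct: int) -> dict:
    """Appendix Equation 3 for every node j of one baseline route.

    visits: {node: number of visits in February 2020} for the nodes in the route.
    """
    freqs = {node: per_week_freq(v) for node, v in visits.items()}
    total = sum(freqs.values())  # denominator: sum over all n nodes in the route
    utilisation = Fraction(utilisation_pct, 100)
    return {node: math.floor(capacity_bags * f * utilisation / total) for node, f in freqs.items()}


demand = node_demand({"A": 9, "B": 4, "C": 30}, capacity_bags=100, utilisation_pct=95)
print(demand)  # {'A': 23, 'B': 10, 'C': 61}
```

Because every demand is rounded down, the demands in a route never add up to more than capacity × utilisation (here 23 + 10 + 61 = 94 ≤ 95 bags).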

– Additional analysis – differences in routing between the depots

Of the three depots, Workington served 713 nodes in February, with Dumfries and Darlington serving 557 and 466, respectively. The box plots (Appendix Figure 6) also indicate a wider variance in route distances for Workington routes, which is likely driven by differences in the number of nodes delivered to per day and per route (Appendix Figure 7).

Appendix Figure 6 - Box Plots per depot of the total distance (A) and duration (B) in
original routing (source: author)

Appendix Table 2 - Descriptive Statistics for total distance and total time for each depot
in original routing (source: author)

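Statistic | Dumfries | Workington | Darlington
Total nodes | 557 | 713 | 466
Total distance (km): Mean | 1423 | 1599 | 1387
Total distance (km): Median | 1524 | 1921 | 1665
Total distance (km): Minimum | 267 | 0 | 0
Total distance (km): Maximum | 2143 | 2413 | 2025
Total distance (km): Standard Deviation | 614.9 | 764.3 | 637.2
Total time (hrs): Mean | 21.6 | 26.1 | 21.1
Total time (hrs): Median | 24 | 32 | 26
Total time (hrs): Minimum | 3 | 0 | 0
Total time (hrs): Maximum | 32 | 40 | 30
Total time (hrs): Standard Deviation | 9.9 | 13 | 9.7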
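The statistics in Appendix Table 2 can be computed from one total per depot-day with the standard library. This sketch assumes the standard deviation is the sample version (n - 1 denominator), which is what pandas' describe() reports; the author's exact computation is not stated, and the daily values below are made up.

```python
import statistics


def describe(daily_totals):
    """Summary statistics of per-day totals for one depot and one measure."""
    return {
        "Mean": statistics.mean(daily_totals),
        "Median": statistics.median(daily_totals),
        "Minimum": min(daily_totals),
        "Maximum": max(daily_totals),
        # Sample standard deviation (n - 1 denominator), as in pandas' describe().
        "Standard Deviation": statistics.stdev(daily_totals),
    }


summary = describe([0, 1500, 1700, 1900])  # made-up daily distances in km
print(summary["Median"])  # 1600.0
```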

Appendix Figure 7 - Distribution of the number of nodes to be delivered to each day in
February (source: author)

Appendix Figure 8 - Map of nodes and depot serviced from. Tooltip indicates the depot
location (Red = Dumfries, Blue = Workington, Black = Darlington) (source: author)

Throughout the comparative analyses, algorithm routing shows the greatest reduction in total distance and duration relative to the original routing at the Workington depot. For example, in Analysis 2, routes from Workington depot reduced total distance by an average of 25%, compared with Dumfries (17%) and Darlington (12%). Even in Analysis 1, which simply reordered the nodes within each route, routes from Workington accounted for 17 of the 25 routes making a 20% reduction in distance covered.

Appendix Figure 9 shows the visual differences in route distributions between the depots in Analysis 2. Whilst the algorithm routing distribution differs significantly from the original routing distribution at every depot (p-values for the Wilcoxon two-sample test: 7.4×10⁻⁹ (Workington), 5.6×10⁻⁵ (Dumfries) and 0.001 (Darlington)), the difference is much less pronounced at Darlington depot. For example, on 13/02/2020 Darlington had 73 nodes to service, and the five routes in the original routing (Appendix Figure 10) are visually very similar to those proposed by the algorithm (Appendix Figure 11). Perhaps the routing methods differ between the depots or, since the Darlington depot appears to cover a greater area (), a manual approach to clustering routes is easier.

Appendix Figure 9 - Histograms of the route distance from each depot (panels: Dumfries Depot, Workington Depot, Darlington Depot) under Original routing and Algorithm routing in Analysis 2 (source: author)
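The Wilcoxon two-sample (rank-sum) test compares two distributions without assuming normality. The research used scipy for its statistical tests, but the exact function and options are not stated. The sketch below is a self-contained version using the large-sample normal approximation, with tied values given average ranks and no tie or continuity correction, which is the approach taken by scipy.stats.ranksums. It treats the two sets of route distances as independent samples, which is itself an assumption; the function names and toy samples are invented.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation). Returns (z, p)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])                       # rank sum of the first sample
    expected = n1 * (n1 + n2 + 1) / 2          # mean of r1 under the null hypothesis
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r1 - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))       # two-sided normal tail probability
    return z, p


z, p = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(round(z, 3))  # -2.611
```

In the setting above, x would hold the route distances from one depot under the original routing and y those under the algorithm routing.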

Appendix Figure 10 - Original routing from Darlington on 13/02/2020 (source: author)

Appendix Figure 11 - Algorithm routing from Darlington on 13/02/2020 in Analysis 2 (source: author)

The changes to routing and vehicle selection also have a differing impact on the number of vehicle journeys from each depot relative to the original routing. Workington sees the largest reduction in vehicle journeys, with the median falling from 9.5 to 6.5 with improved vehicle selection (Appendix Table 3), whereas the median remains the same at Darlington (Appendix Table 5).

Appendix Table 3 – Selected descriptive statistics of the number of vehicle journeys from Workington depot (source: author)

Workington | Original routing | Analysis 3 – no vehicle selection, 85% CoODV | Analysis 3 – no vehicle selection, 95% CoODV | Analysis 4 – vehicle selection, 85% CoODV | Analysis 4 – vehicle selection, 95% CoODV
Mean (std. dev.) | 8 (3.3) | 5.5 (1.8) | 6.7 (2.4) | 5.2 (1.8) | 6.3 (2.5)
Median | 9.5 | 6 | 7.5 | 6 | 6.5
Maximum | 11 | 7 | 9 | 7 | 9

Appendix Table 4 - Selected descriptive statistics of the number of vehicle journeys from Dumfries depot (source: author)

Dumfries | Original routing | Analysis 3 – no vehicle selection, 85% CoODV | Analysis 3 – no vehicle selection, 95% CoODV | Analysis 4 – vehicle selection, 85% CoODV | Analysis 4 – vehicle selection, 95% CoODV
Mean (std. dev.) | 6.1 (2.5) | 5 (1.9) | 5.7 (2.4) | 4.8 (2) | 5.5 (2.5)
Median | 7 | 6 | 7 | 6 | 7
Maximum | 9 | 7 | 8 | 7 | 8

Appendix Table 5 - Selected descriptive statistics of the number of vehicle journeys from Darlington depot (source: author)

Darlington | Original routing | Analysis 3 – no vehicle selection, 85% CoODV | Analysis 3 – no vehicle selection, 95% CoODV | Analysis 4 – vehicle selection, 85% CoODV | Analysis 4 – vehicle selection, 95% CoODV
Mean (std. dev.) | 5.5 (1.7) | 5.1 (1.6) | 5.4 (1.7) | 4.8 (1.7) | 5.3 (1.8)
Median | 6 | 5 | 6 | 5 | 6
Maximum | 8 | 7 | 7 | 7 | 7


These differences may reflect a different routing method at each depot, or an environmental constraint that is more visible at Workington than at the other depots (e.g. road quality or suitability for larger vehicles). Such a constraint might also explain the different fleet make-up at each depot.



– Initial coding template used in template analysis



– Information sheet sent to questionnaire participants (adapted from template)

Information for participants

1. What is the research about?

A questionnaire will be shared across Shortridge Ltd asking about your understanding of data and analytics at Shortridge. The anonymous responses will provide justification and background for the research to help future readers interpret the results.

The questionnaire is linked to additional research that seeks to compare three different approaches to solving a real-world logistics problem. This problem is the Vehicle Routing Problem, which is concerned with finding the best way for a fleet of vehicles to perform all required collections and drop-offs. At Shortridge Ltd, this is the daily routing of the vehicles to fulfil the collections and deliveries of laundry at customer sites. The research will use data provided by Shortridge Ltd to compare an experience-led approach to routing the vehicles (heuristic), a software product that produces the routing (software), and an analytically derived routing solution based on historic data (Big Data Analytics).

This research will form my dissertation and contribute towards my Master's degree.

2. What will my involvement be?

You will be asked to complete an anonymous online questionnaire of 5 questions about your experiences of data and analytics. It should take approximately 15 minutes.

3. Do I have to take part?

It is up to you to decide whether or not to take part. You do not have to take part if you
do not want to. If you do decide to take part, please follow the link in the email and
answer positively to the two statements in the questionnaire related to consent.

4. How do I withdraw from the study?

You can withdraw at any point of the study, without having to give a reason. If any
questions during the questionnaire make you feel uncomfortable, you do not have to
answer them. Withdrawing from the study will have no effect on you. If you withdraw
from the study, I will not retain the information you have given thus far, unless you are
happy for me to do so.

5. What will my information be used for?



The questions seek to gauge your level of understanding of analytics and Big Data, both
personal and at Shortridge Ltd, so I can highlight the differences and similarities
between the organisation and the literature on analytics and Big Data in SMEs and
logistics. I will predominantly be using the responses to provide a context for the
research and to help explain the findings. This should help readers to interpret the research as a whole.

6. Will my taking part and my data be kept confidential? Will it be anonymised?

The records from this study will be kept as confidential as possible. The data will be stored securely on the University systems. Only my supervisor, the exam markers and I will have access to the data generated by the study. Your data will be anonymised: your name is not recorded, so it will not be used in any reports or publications resulting from the study. Any hard copies of research information will be kept in locked files at all times.

7. Research Ethics and Data Protection

The Heriot-Watt University Research Ethics Policy can be found here:


https://www.hw.ac.uk/documents/research-ethics-policy.pdf

The Heriot-Watt University’s Data Protection Policy can be found here:


https://www.hw.ac.uk/documents/heriot-watt-university-data-protection-policy.pdf

The legal basis used to process your personal data will be Legitimate interests. The
legal basis used to process special category personal data (e.g. data that reveals racial
or ethnic origin, political opinions, religious or philosophical beliefs, trade union
membership, health, sex life or sexual orientation, genetic or biometric data) will be for
scientific and historical research or statistical purposes.

To request a copy of the data held about you please contact Edmund, egh3@hw.ac.uk.

8. What if I have a question or complaint?

If you have any questions regarding this study, please contact the researcher: Edmund
Houldridge (egh3@hw.ac.uk)

If you have any concerns or complaints regarding the conduct of this research, in the
first instance please contact Dr Adam Gripton (a.gripton@hw.ac.uk)

If you are dissatisfied with the response from my supervisor, please contact the School
of Social Sciences Research Officer: Dr James Richards (j.richards@hw.ac.uk)



– Python script used in Analysis 5 (source: author)

START OF SCRIPT

#!/usr/bin/env python
# coding: utf-8
# In[4]:
from __future__ import print_function  # a future import must precede all other statements
import pandas as pd, requests, folium, polyline, json
import numpy as np
import networkx as nx
import vrpy as vrp  # imported but not used below
import matplotlib.pyplot as plt
import pickle
import datetime
import collections
import collections.abc  # provides collections.abc.Iterable used in _flatten
from ortools.constraint_solver import routing_enums_pb2
from ortools.constraint_solver import pywrapcp
pd.set_option('display.max_columns', None)
#pd.set_option('display.max_rows', 500)

# ### 1. Import files


# In[5]:

#load inputs
location = "C:/Users/Ed/OneDrive - Heriot-Watt University/Uni work/Dissertation/ShortridgeData/VRPinput_v2/VRPinputs3_allv_2.txt"
with open(location, "rb") as fp: # Unpickling
    list_of_nodes_bl = pickle.load(fp)

# In[6]:

#load matrices
location_dist = "C:/Users/Ed/OneDrive - Heriot-Watt University/Uni work/Dissertation/ShortridgeData/Distance_matrix1.txt"
location_dur = "C:/Users/Ed/OneDrive - Heriot-Watt University/Uni work/Dissertation/ShortridgeData/Duration_matrix1.txt"
with open(location_dist, "rb") as fp: # Unpickling
    distance_matrix = pickle.load(fp)
with open(location_dur, "rb") as fp: # Unpickling
    duration_matrix = pickle.load(fp)

# In[7]:
#for testing
#list_of_nodes_bl = list_of_nodes_bl.head(3)
# In[8]:

#Set demand to be 95% capacity of vehicles - with relative weight given to customer order rate => larger customers
def _weighted_demand(row):
    for i in range(len(row['routes2'])):
        c = int(0.95* row['Bag_Capacity'][i])  # 95% of this vehicle's bag capacity
        u = row['routes2'][i]
        v = [n for n in u if n not in [0,1,2]]  # nodes 0, 1 and 2 are depots
        inds = []
        for element in v: #ignore depot
            inds.append(row['Node2'].index(element))
        pw = [row['perWeek2'][k] for k in inds]  # weekly order rate of each stop
        if sum(pw) == 0: #added because row 65, route 4 is [0,0]
            t = 0
        else:
            t = c / sum(pw)
        dem = [max(int(p * t),1) for p in pw]  # at least one bag per stop
        for a,b in zip(inds,dem):
            row['Demand2'][a] = b  # updates the list stored in the DataFrame in place
# In[9]:
list_of_nodes_bl.apply(_weighted_demand,axis=1)
list_of_nodes_bl.tail(3)
# In[62]:
#testing
#list_of_nodes_bl = list_of_nodes_bl.head(3)
#list_of_nodes_bl = pd.DataFrame(list_of_nodes_bl.iloc[[48,49],:])

# ### 2. Set up Routing


# In[10]:
def _HCVRP_orVRP (xvrpmatrix, xothermatrix, xvehicles, xcapacity,
                  xdemand, xwaiting_time,
                  xdepot_pos, xspancoeff, xtimelimit, xfirstsolution,
                  xmetaheuristic):

    #make matrix integer only
    matrix_to_use = [[int(x) for x in y ] for y in xvrpmatrix]
    time_matrix = [[int(x/60) for x in y] for y in xothermatrix] #minutes

    #maximum travel distance for distance capacity constraint
    #if using duration_matrix - might want to reduce capacity so vehicles finish in a certain time?
    sum_dist = sum([min([x for x in y]) for y in matrix_to_use])
    sum_dist = int(sum_dist/xvehicles)

    #Creates a dictionary of the data model
    data = {}
    data['distance_matrix'] = matrix_to_use
    data['time_matrix'] = time_matrix
    data['num_vehicles'] = xvehicles
    data['depot'] = xdepot_pos
    data['demands'] = xdemand
    data['vehicle_capacities'] = xcapacity

    #Create output model
    vrp_output = {}
    vrp_output['routes'] = []
    vrp_output['vehicle'] = []
    vrp_output['r_node_times'] = []
    vrp_output['r_node_capacity'] = []
    vrp_output['r_capacities'] = []
    vrp_output['route_time'] = []
    vrp_output['total_time'] = 0
    vrp_output['total_load'] = 0
    vrp_output['r_node_dists'] = []
    vrp_output['r_distance'] = []
    vrp_output['total_distance'] = 0

    # Create the routing index manager.
    manager = pywrapcp.RoutingIndexManager(len(data['distance_matrix']),
                                           data['num_vehicles'],
                                           data['depot'])

    # Create Routing Model.
    routing = pywrapcp.RoutingModel(manager)

    # Define distance callback
    def distance_callback(from_index, to_index):
        """Returns the distance between the two nodes."""
        # Convert from routing variable Index to distance matrix NodeIndex.
        from_node = manager.IndexToNode(from_index)
        to_node = manager.IndexToNode(to_index)
        return data['distance_matrix'][from_node][to_node]

    transit_callback_index = routing.RegisterTransitCallback(distance_callback)

    # Define time callback
    def time_callback(from_index, to_index):
        """Returns the time between the two nodes."""
        # Convert from routing variable Index to time matrix NodeIndex.
        from_node = manager.IndexToNode(from_index)
        to_node = manager.IndexToNode(to_index)
        return data['time_matrix'][from_node][to_node] + xwaiting_time

    # Add duration constraint
    time_callback_index = routing.RegisterTransitCallback(time_callback)
    routing.AddDimension(time_callback_index,
                         9*60, #slack in minutes
                         9*60, #total in minutes
                         True,
                         'Duration')
    duration_dimension = routing.GetDimensionOrDie("Duration")

    # Add demand callback from inputs
    def demand_callback(from_index):
        """Returns the demand of the node."""
        # Convert from routing variable Index to demands NodeIndex.
        from_node = manager.IndexToNode(from_index)
        return data['demands'][from_node]

    #Add demand constraint
    demand_callback_index = routing.RegisterUnaryTransitCallback(demand_callback)
    routing.AddDimensionWithVehicleCapacity(
        demand_callback_index,
        0, # null capacity slack
        data['vehicle_capacities'], # vehicle maximum capacities
        True, # start cumul to zero
        'Capacity')

    # Define cost of each arc.
    routing.SetArcCostEvaluatorOfAllVehicles(transit_callback_index)

    # Setting first solution heuristic.
    search_parameters = pywrapcp.DefaultRoutingSearchParameters()
    search_parameters.first_solution_strategy = xfirstsolution
    search_parameters.local_search_metaheuristic = xmetaheuristic
    search_parameters.time_limit.seconds = xtimelimit
    search_parameters.log_search = True

    #Solve!
    solution = routing.SolveWithParameters(search_parameters)

    # Generate route output
    total_distance = 0 #calc below
    total_time = 0
    total_load = 0

    #Prints to window and adds to vrp_output dictionary
    for vehicle_id in range(data['num_vehicles']):
        vrp_output['vehicle'].append(vehicle_id)
        index = routing.Start(vehicle_id)
        plan_output = 'Route for vehicle {}:\n'.format(vehicle_id)
        route_distance = 0
        route_load = 0
        nodes = []
        loads = []
        dists = []
        while not routing.IsEnd(index):
            node_index = manager.IndexToNode(index)
            nodes.append(node_index)
            route_load += data['demands'][node_index]
            loads.append(route_load)
            plan_output += ' {0} Load({1}) -> '.format(node_index, route_load)
            previous_index = index
            index = solution.Value(routing.NextVar(index))
            dist = routing.GetArcCostForVehicle(previous_index, index, vehicle_id)
            route_distance += dist
            dists.append(dist)

        vrp_output['routes'].append(nodes+[manager.IndexToNode(index)])
        vrp_output['r_node_capacity'].append(loads)
        vrp_output['r_capacities'].append(route_load)
        plan_output += ' {0} Load({1})\n'.format(manager.IndexToNode(index),route_load)
        plan_output += 'Distance of the route: {}m\n'.format(route_distance)
        plan_output += 'Load of the route: {}\n'.format(route_load)
        vrp_output['r_distance'].append(route_distance)
        vrp_output['r_node_dists'].append(dists)
        print(plan_output)
        total_distance += route_distance
        total_load += route_load
    vrp_output['total_load'] = total_load
    vrp_output['total_distance'] = total_distance
    print('Total Distance of all routes: {}m'.format(total_distance))
    print('Total load of all routes: {}\n\n'.format(total_load))

    #Add time from time matrix
    for route in vrp_output['routes']:
        for i in range(len(route)):
            if i == 0:
                r_cum_time = 0
                r_time = [0]
            else:
                r_cum_time += int(xothermatrix[route[i-1]][route[i]])
                r_time.append(int(xothermatrix[route[i-1]][route[i]]))
        vrp_output['r_node_times'].append(r_time)
        vrp_output['route_time'].append(r_cum_time)
        vrp_output['total_time'] += r_cum_time

    return vrp_output

# ### 3. Define function to loop dataset over VRP


# In[11]:
def _matrix_slim (xmatrix, xpoints):
    # keep only the rows and columns of the nodes on the current route
    A = np.matrix(xmatrix,dtype=(int))
    xrows = A[xpoints,:]
    xcols = xrows[:,xpoints]
    return xcols.tolist()

def _demand (dem_list):
    d = dict(enumerate(dem_list))
    return {k + 1: v for k,v in d.items()}

def _flatten(l):
    for el in l:
        if isinstance(el, collections.abc.Iterable) and not isinstance(el, (str, bytes)):
            yield from _flatten(el)  # recurse into nested iterables
        else:
            yield el

# In[12]:
def _loop_frame (row):

    print("\n\n Row Number: {} \n".format(row.name))

    #Format required inputs
    Nodes = [row['Depot_Node']] + row['Node2'] #Select distinct nodes from current route
    depot = 0 #Set location of depot node within Nodes - should be first element
    demand = [0] + row['Demand2'] #add 0 demand for depot..
    vrp_matrix = _matrix_slim(distance_matrix,Nodes)
    other_matrix = _matrix_slim(duration_matrix,Nodes)

    #Vehicles
    no_vehicles = len(row['Bag_Capacity_all'])
    v_capacity = row['Bag_Capacity_all']

    #Run VRP optimisation
    try:
        vrp = _HCVRP_orVRP(xvrpmatrix = vrp_matrix,
                           xothermatrix = other_matrix,
                           xdepot_pos = depot,
                           xvehicles = no_vehicles,
                           xcapacity = v_capacity,
                           xdemand = demand,
                           xwaiting_time = 10,
                           xspancoeff = 100,
                           xtimelimit = 30 + int((len(Nodes)*len(Nodes))/15),
                           xfirstsolution = (routing_enums_pb2.FirstSolutionStrategy.PARALLEL_CHEAPEST_INSERTION),
                           xmetaheuristic = (routing_enums_pb2.LocalSearchMetaheuristic.GUIDED_LOCAL_SEARCH)
                           )

        #Turn the nodes from reduced matrix into actual nodes
        vrp['route_fix'] = [[Nodes[x] for x in route] for route in vrp['routes']]
        vrp['algo_solve'] = 'Y'

    except Exception:
        #No solution found (or an error): keep the original routing for this row
        vrp = {}
        vrp['total_distance'] = row['total_distance2']
        vrp['total_time'] = row['total_time2']
        vrp['vehicle'] = [i for i in range(len(row['Bag_Capacity']))]
        vrp['route_fix'] = row['routes2']
        vrp['r_node_times'] = []
        vrp['r_node_dists'] = []
        vrp['r_distance'] = row['r_distance2']
        vrp['route_time'] = row['route_time2']
        vrp['total_load'] = sum(row['Demand2'])
        vrp['r_node_capacity'] = []
        vrp['algo_solve'] = 'N'

    return (vrp['total_distance'],
            vrp['total_time'],
            vrp['vehicle'],
            vrp['route_fix'],
            vrp['r_node_times'],
            vrp['r_node_dists'],
            vrp['r_distance'],
            vrp['route_time'],
            vrp['total_load'],
            vrp['r_node_capacity'],
            vrp['algo_solve'])

# ### 4. Run Loop over dataset and add algo columns to dataset
# In[13]:

cols = ['total_distance_algo',
'total_time_algo',
'vehicle_algo',
'routes_algo',
'r_node_times_algo',
'r_node_dists_algo',
'r_distance_algo',
'r_time_algo',
'total_load',
'r_node_capacity',
'algo_solve']
list_of_nodes_bl[cols] = list_of_nodes_bl.apply(lambda row: pd.Series(_loop_frame(row)),axis=1)

# ### 5. Save file

# In[14]:

today = str(datetime.date.today())

#save file
location = "C:/Users/Ed/OneDrive - Heriot-Watt University/Uni work/Dissertation/ShortridgeData/4. All_vehicles/5.All_vehicles_algo_time_95"+today+".txt"
with open(location, "wb") as fp: #Pickling
    pickle.dump(list_of_nodes_bl, fp)

# In[42]:

# this cell reloads the '4.All_vehicles_algo' file, not the file saved above
location = "C:/Users/Ed/OneDrive - Heriot-Watt University/Uni work/Dissertation/ShortridgeData/4. All_vehicles/4.All_vehicles_algo"+today+".txt"
with open(location, "rb") as fp: # Unpickling
    list_of_nodes_bl = pickle.load(fp)
list_of_nodes_bl.head()

# In[15]:

list_of_nodes_bl.to_csv("C:/Users/Ed/OneDrive - Heriot-Watt University/Uni work/Dissertation/ShortridgeData/4. All_vehicles/5.All_vehicles_algo_time_95"+today+".csv")

# In[ ]:

END OF SCRIPT
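In the script above, `_weighted_demand` does not use recorded bag counts: it gives each stop a share of 95% of the bag capacity of the vehicle that served the original route, in proportion to the customer's weekly order rate (`perWeek2`), with a floor of one bag. A standalone sketch of that rule for a single route; the function name, the dictionary keyed by node ID (the script instead indexes `perWeek2` by position in `Node2`) and the example values are hypothetical:

```python
from typing import Dict, List

DEPOT_NODES = {0, 1, 2}  # the script treats nodes 0, 1 and 2 as depots


def weighted_demand(route: List[int], per_week: Dict[int, float],
                    bag_capacity: int, fill: float = 0.95) -> Dict[int, int]:
    """Split fill * bag_capacity across a route's customers by weekly order rate."""
    target = int(fill * bag_capacity)              # e.g. 95% of the vehicle's bags
    customers = [n for n in route if n not in DEPOT_NODES]
    total = sum(per_week[n] for n in customers)
    scale = target / total if total else 0         # guard against all-zero order rates
    # int() truncates, so the allocated total can fall a little short of target;
    # every customer is given at least one bag.
    return {n: max(int(per_week[n] * scale), 1) for n in customers}


# Hypothetical route: depot 0 -> customers 10, 11, 12 -> depot 0.
route = [0, 10, 11, 12, 0]
per_week = {10: 5, 11: 2, 12: 1}   # orders per week (hypothetical)
print(weighted_demand(route, per_week, bag_capacity=100))  # {10: 59, 11: 23, 12: 11}
```

Because of truncation the three stops receive 93 bags rather than the 95-bag target, so a route rebuilt from these demands always fits within the vehicle that originally served it.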

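In `_HCVRP_orVRP`, each arc's time is the matrix travel time truncated to whole minutes plus `xwaiting_time` (10 minutes, set in `_loop_frame`), and the 'Duration' dimension starts at zero and caps each vehicle's cumulative time at 9*60 = 540 minutes. Because slack can only add time, a route satisfies that dimension exactly when its summed arc times are at most 540. The sketch below re-checks a finished route against that rule without OR-Tools; the function names and the matrix are hypothetical:

```python
from typing import List

WAITING_MINUTES = 10         # xwaiting_time passed in by _loop_frame
MAX_ROUTE_MINUTES = 9 * 60   # capacity of the 'Duration' dimension


def route_minutes(route: List[int], duration_s: List[List[int]],
                  waiting: int = WAITING_MINUTES) -> int:
    """Route time as the time callback counts it: whole minutes of travel per arc
    plus a fixed waiting time on every arc, including the return to the depot."""
    return sum(int(duration_s[a][b] / 60) + waiting
               for a, b in zip(route, route[1:]))


def within_shift(route: List[int], duration_s: List[List[int]]) -> bool:
    """True if the route fits inside the 9-hour duration cap."""
    return route_minutes(route, duration_s) <= MAX_ROUTE_MINUTES


# Hypothetical travel times in seconds; node 0 is the depot.
duration_s = [
    [0, 1500, 3000],
    [1500, 0, 2400],
    [3000, 2400, 0],
]
route = [0, 1, 2, 0]
print(route_minutes(route, duration_s))  # 145
print(within_shift(route, duration_s))   # True
```

The script's `route_time` output is accumulated from the raw duration matrix (in seconds, judging by the division by 60) and excludes the waiting time, so it is not directly comparable with this minute-based figure.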
