Professional Documents
Culture Documents
Trends and Challenges Towards An Effective Data-Driven
Trends and Challenges Towards An Effective Data-Driven
Funding
arXiv:2305.15454v1 [cs.CY] 24 May 2023
This work was funded by the European Union under the European Regional Development Fund (The Big Data Corridor
project - Project no. 12R16P00220) and match-funded by six Project Partners - Birmingham City Council, Aston University,
Birmingham City University, EnableID, Innovation Birmingham and West Midlands Combined Authority. The funding
source had no role in the design, execution, interpretation, or writing of the work..
Abdel-Rahman Tawil, Muhidin Mohamed, Xavier Schmoor, Konstantinos Vlachos, Diana Haidar Page 1 of 17
Towards an Effective Data-Driven Decision Making in UK SMEs
Abdel-Rahman Tawil, Muhidin Mohamed, Xavier Schmoor, Konstantinos Vlachos, Diana Haidar Page 2 of 17
Towards an Effective Data-Driven Decision Making in UK SMEs
the EU, recognize the importance of empowering SMEs to permission of these businesses. The study focuses on the
benefit from the digital revolution and generate measurable lessons learnt from the analysis of business data and the
economic benefits. This is evident from the proportion of adoption of data-driven solutions by SMEs.
EU funding allocated to data-related projects, big data, and The rest of the paper is organized as follows: Section
data science [14]. To increase the number of highly skilled 2 recaps concepts of open data and data analytics, then in
workers in AI and data science, the UK government, the Section 3, we briefly analyse SMEs’ data and digital tech-
Office for Students, universities, and industry partners have nology trends. Section 4 presents related discussion and the
established a fund of up to £24 million [15]. key lessons learned while supporting and collaborating with
In emerging economies, Small and Medium-sized Enter- SMEs. We provide a summary and conclusion in Section 5.
prises (SMEs) are estimated to generate 60% of employment
and 40% of Gross Domestic Product (GDP). Moreover, the
proportion of SMEs that employ 66% of the workforce in 2. Opening Data for SMEs
the European Union is higher [16]. Despite this, SMEs in In [19], Ghahremanlou et al. identified several static data
the UK are adopting big data analytics at a rate of less resources that have been established by, or are associated
than 1% [17, 18]. Yet businesses in the UK are becoming with six UK cities, and classified them according to these
more aware of the value of data-driven decision-making, and themes (domains), Figure 1.
data analytics is increasing in popularity, according to recent A focus of the study is to better understand how open and
reports. The data science industry is also poised to expand shared data might impact the economy. Consider outlining
over the next few years. how open data can be used by small and medium businesses
This paper aims to explore trends and challenges to- and startups, along with how it can contribute to economic
wards an effective data-driven decision making for UK busi- growth, new jobs, innovation, and other improvements in
nesses and how SMEs advance their Business Models (BMs) social and economic development aligned to smart cities.
around data to handle data-driven products and how this By using empirical studies to ground their analysis, we were
contributes to their innovation and performance. We present able to identify the most important challenges facing SMEs
an analysis of the challenges and opportunities of digitiliza- when considering open data strategies. As well as principles
tion, and adoption of data science and Artificial Intelligence that have proven, real-world applicability.
(AI) within the UK business sector. Our analysis of 85 UK SME’s can make use of collected and collated open
SMEs is based on case studies of SMEs located primarily data relatively inexpensively. Open data can be used to
in England’s West Midlands who are supported in the areas make innovative business decisions, or it can gain access to
of data management, machine learning, data analytics, and new data sources previously inaccessible. A data product or
other related digital technologies. Our study also briefly service can be created by combining open data with closed
examines how small businesses can take advantage of data- proprietary sources, either internally or with partners. As a
driven innovation and decision-making, while highlighting result of its potential to fuel digital innovation, governments
challenging areas where support for digital technology adop- have encouraged the utilisation of open data. Researchers
tion is most needed. The study explores digital technology and policymakers alike highlight the importance of open
trends, challenges limiting SMEs from effective utilization data for SMEs in job creation and economic growth [20].
of enabling technologies and the state of their adoption in The BDC project (Project name removed for double
AI technologies. Business opportunities and advantages of blind reviewing) provided us an opportunity to work with
adopting data analytic technologies and AI are demonstrated SMEs, supported by Birmingham City Council as lead
through several selected practical case studies in Section 4. project partner to motivate and encourage the use of open
The multi-perspective analysis and case studies in this data. Our work contributes to data driven innovation re-
paper inform the SME business industry and business in- search by analysing the main capabilities needed to over-
novation and growth bodies of potential challenges and key come existing barriers to successfully take advantage of
opportunities in AI usage. As well as encouraging small open data for SMEs support.
businesses to derive meaningful insights from closed data We analyse the data collected from 85 business assists
(those belonging to the business) and open data (those with SMEs participating in the BDC project and adopting an
available publicly) by taking advantage of emerging trends in integrated open data-based strategies. We identify a number
machine learning, and data and analytics in the past decade. of core factors that shape open data acquisition, assimilation,
The research also highlights areas where future support and transformation and exploitation by SMEs. Results show that
funding are most needed to enable SMEs enterprise sector without the specific capabilities delivered by our expert
to embark on the digital revolution and AI adoption thereby team, it will be difficult for these SMEs to successfully use
contributing to the growth and development of this key open data, which may explain why the uptake of open data
sector in the United Kingdom. by SMEs more broadly has so far been constrained.
Due to non-disclosure agreements and the need to pre- As open data sources expand, SMEs and startups often
serve business confidentiality, we do not detail individual find it difficult to identify and assess the usefulness of open
SMEs’ support or provide any associated data except in data. It could be a result of limited time or multitasking,
the two case studies that were used as examples with the inability to identify available relevant data sources, a lack of
Abdel-Rahman Tawil, Muhidin Mohamed, Xavier Schmoor, Konstantinos Vlachos, Diana Haidar Page 3 of 17
Towards an Effective Data-Driven Decision Making in UK SMEs
Figure 1: Open Static Datasets Distribution Across Major Themes by Six UK Cities
Abdel-Rahman Tawil, Muhidin Mohamed, Xavier Schmoor, Konstantinos Vlachos, Diana Haidar Page 4 of 17
Towards an Effective Data-Driven Decision Making in UK SMEs
Table 1
Top 10 business sectors among the 85 SMEs supported by BDC project.
The SMEs of the sample had a minimum of one (the since each company could only receive one, most of the
owner) and a maximum of 146 employees. The minimum companies (81 out of 85 or 95%) benefited from this type
turnover of the SMEs was £10,000 and the maximum re- of output. This can be attributed to the fact that this type
ported turnover was £10 million. Companies were cate- of support included attendance to seminars and workshops
gorised based on their activities to 20 different sectors. with data science related subjects, as well as receiving quick
Table 1 shows the top 10 business sectors based on the support to solve non-complex data related business prob-
count of SMEs, together with the ranges for the number of lems. From the remaining outputs which required longer-
employees and turnover for the SMEs of each sector. term assistance and collaboration, 61 of the supported SMEs
The top 10 business sectors presented in Table 1, to- were involved in the introduction of a new product or service
gether accounts for 78.9% of the total SMEs from the sample. to their businesses internally and 29 on the introduction of a
As expected, most of the SMEs were from the Information new product or service to the market. These results show
& Communications Technology sector with 16 businesses that companies mostly sought for data-related support to
representing 18.8% of the total companies involved in the upgrade their internal services, extract insight from their
project. Other sectors represented with a higher number of data and improve the decision-making processes. However,
SMEs includes Education and Training (10 SMEs and11.8% we should also mention that some of the outputs that were
of total), Consultancy (10 SMEs and 11.8% of total), Mar- not introduced to the market were either Proof of Concept
keting & Public Relations (7 SMEs and 8.2% of total) (PoC) solutions or components of larger projects aimed to
and Human health & Social work activities with 6 SMEs be released to the market in the future.
and 7.1% of total. These numbers suggest that companies
in the technology and services sectors rely more on data- Figure 4 presents the counts per type of output for
driven solutions to support their business when compared to the twenty different business sectors. The sectors with the
companies in other sectors. most outputs were Information & Communication Technol-
The SMEs included in this analysis completed at least ogy (32 outputs) and Education & Training (29 outputs).
one of the following types of business support (outputs) However, considering the count of SMEs from each sector,
provided by the BDC project: the Education & Training sector had almost 3 outputs
per SME (29 outputs for 10 SMEs) while Information &
1. 12 hours business assist and training
Communication Technology had 2 per SME (32 outputs for
2. Support to introduce a new product/service to the 16 SMEs).In this category, the Manufacturing sector comes
business first with 14 outputs for 4 SMEs (3.5 outputs per SME).
3. Support to introduce a new product/service to the The sector that received most of the 12-hour support was
market Information & Communication Technology with 14 outputs
Figure 3 below shows the count and proportion for
each of the three output types. The total number of outputs
(171) is higher than the total number of SMEs considering
that companies could receive multiple types of business
support. More specifically, each company should complete
only one 12-hours business assist output, but there was no
limitation on the number of new products/services offered
to a business and on the new products/services to market
outputs. The most prevalent type of output was the 12-
hours business assist with a total count of 81 assists, and Figure 3: Count and proportion of project outputs by type
Abdel-Rahman Tawil, Muhidin Mohamed, Xavier Schmoor, Konstantinos Vlachos, Diana Haidar Page 5 of 17
Towards an Effective Data-Driven Decision Making in UK SMEs
followed by Education & Training and Consultancy sectors (34.1%) for system design and development, and 19 (22.3%)
with 10 outputs each. The Information & Communication for data management and analytics. Across all different types
Technology sector also received most of the product/service of support, the main data science and analytics aspects in
to market support with 6 outputs, and Education & Training which the companies required assistance can be summarised
sector received most of the product/service to business as follows:
support with 14 outputs. 1. Acquiring data (open or proprietary) from external
web resources such as APIs, websites or web reposito-
For the purpose of this analysis, we further introduced ries and transform, filter or combine this data to extract
four categories of digital support based on the requirements business insight.
of the support provided to businesses from the project’s 2. Creating advanced visualisations including interactive
academic partners: dashboards and performing descriptive analytics to
1. Skills development and training: Companies attended gain insight and improve products/services or market-
training seminars and workshops focused on data col- ing practices.
lection, transformation and storage as well as semi- 3. Collecting, storing and analysing data originating
nars about data analysis, visualisation and using data from sensors including IoT.
science and machine learning to get better insight and 4. Using Machine Learning (ML) to improve their prod-
make data-driven decisions. ucts/services with data-driven processes or to provide
2. Data management and analytics: Supported SMEs to recommendations for their customers
collect and store data more efficiently, in order to
make them accessible for analysis and visualisation
and allow for better business insight extraction.
3. System design and development: Supported SMEs
to design and develop end-to-end business solutions
either to improve their products and services or to
enhance internal business decision making and plan-
ning.
4. Other: This refers to a small number of SMEs provided
with a support that does directly fall under one of the Figure 5: Count and percentage of SMEs per type of support
above three categories, such as recycling, and fire and received
safety.
As illustrated in Figure 5, 34 companies (40%) re- Figure 6 presents the number of outputs per type of
quired support related to skills development and training, 29 support. It is evident that the project produced higher number
Abdel-Rahman Tawil, Muhidin Mohamed, Xavier Schmoor, Konstantinos Vlachos, Diana Haidar Page 6 of 17
Towards an Effective Data-Driven Decision Making in UK SMEs
of outputs for System Design & Development and Data effectively utilize data-driven strategies and techniques to
Management & Analytics support when compared to Skills enhance their business operations.
Development & Training, especially for outputs that assisted
companies to introduce new products and services internally • First, improving digital marketing proved to be a com-
or to the market. mon data-driven use case among small businesses.
Specifically, a large number of the study’s SMEs
managed to derive immense marketing insights from
analysis performed on their own data or relevant open
data, e.g., identifying lead customers, gaining a better
understanding of customer purchasing patterns, etc.
• Second, the use of Social Media (SM) platforms for
digital marketing was found popular among our case
study SMEs. This is driven by the valuable data gen-
erated because of customers communicating SMEs
via their dedicated channels on these platforms. This
SM analytics was used by some enterprises to study
the effect of marketing on different platforms (e.g.,
Figure 6: Output count by type and average support period
Instagram, Facebook, LinkedIn, Twitter) on customer
per type of support purchase and service use intents. This leads SMEs to
retain the highest revenue generating platforms and
unsubscribe from underperforming ones, eventually
The duration of support received by SMEs ranges from reducing costs. The same analytics was also used to
1 to 16 months and the average support period was slightly identify working marketing strategies based on specif-
below 5 months (4.75) per SME. As evident in Figure 6, ically studied Key Performance Indicators (KPIs),
companies that received support to design and develop data- e.g., views, likes, reach, impressions, etc. Table 1
related solutions or to improve their data management and and Figure 7 show that Marketing and Public relations
analytics processes required higher support periods on aver- represents the top support seeking business sectors
age (5-6 months) when compared to companies that received according to our study. Linked to this, a study on
support for skills development and training (average support the adoption of social media and SMEs’ performance
period of 3 months). In Figure 7 we show the top 10 business found that around 70% of SMEs use social media for
sectors supported and the average support period received marketing [22].
by SMEs from each sector. Education & Training sector
SMEs had the highest average support period (8 months), • Third, a good number of the SMEs supported under
followed by Accounting & Finance (7 months) and Human BDC used data-driven methods to identify potential
Health & Social Work Activities (6 months). In general, new markets by performing cross-analysis of their
it was observed that longer support periods were required customer data and various relevant open datasets,
for sophisticated projects, which either demanded expert e.g. government open datasets, such as census, demo-
knowledge, or due to decreased levels of data science and graphics and geospatial data. Such customer analytics
technical skills within the companies. enabled SMEs to locate new markets with a high
Another important observation derived from Figure 7 is demand for their services and products, thus allowing
that System design & development support was requested them to expand their market presence.
by companies across all sectors. This highlights the high • Eventually, performing exploratory analysis on his-
demand for data-driven solutions and services in the market torical data with the objective of creating predictive
and corroborates with the main aim of the study, to encour- models was another common application observed
age digital transformation and use of data in SMEs to achieve among the case study SMEs. Going to the predictive
growth. stage of the analytics ladder enables many small and
medium-sized businesses to make use of cutting edge
3.2. Challenges and Learned Lessons ML and AI technologies. The selected two case stud-
The digital transformation of SMEs and their adoption
ies presented Section 4 represent good examples for
of data science techniques require organizational changes,
this type of application. They include an SME trans-
investment and resource before businesses can reap any
forming from manual to digital business operations,
resulting profits and benefits. However, we have observed
and another adopting machine learning for parameter
during this study that majority of micro and small enterprises
optimization in additive manufacturing.
are hesitant to invest in new digital tools and methods if they
cannot anticipate quick positive results and revenues. From Linked with the above, the research and consultancy work
our analysis, we have empirically recognized four main areas with SMEs under this project has also produced other
(without excluding other potential ones) in which SMEs lessons. Especially, we underline a number of factors found
Abdel-Rahman Tawil, Muhidin Mohamed, Xavier Schmoor, Konstantinos Vlachos, Diana Haidar Page 7 of 17
Towards an Effective Data-Driven Decision Making in UK SMEs
Figure 7: Count of SMEs received support by type of support and average support period per sector
Abdel-Rahman Tawil, Muhidin Mohamed, Xavier Schmoor, Konstantinos Vlachos, Diana Haidar Page 8 of 17
Towards an Effective Data-Driven Decision Making in UK SMEs
demand for businesses to exchange real-time information has in the long run. SMEs’ decision-making process could be
grown since the adoption of technologies such as mobile made more data-driven by encouraging a business shift.
commerce, electronic funds transfers, supply chain man-
agement, and online transaction processing. Organizations 3.2.6. Shortage of domain-specific data analysts
could benefit from integrating their IT infrastructures to It is often necessary to combine data analytics skills with
address this need. In many cases, however, portability and business context and domain knowledge in order to analyze
interoperability measures for SMEs are still being imple- business data effectively [? ]. As a result of the limited
mented relatively slowly, while in others they are at an early number of analysts in the business who meet such criteria,
stage. Data integration is a significant problem for SMEs SMEs outsource their data analytics needs to experts at com-
due to high costs and technological requirements and is fre- paratively higher costs. Data science and machine learning
quently cited as being a sticking point within organizations. for process control and modelling are still in their infancy in
Some existing approaches to integration, such as electronic some domains, e.g., additive manufacturing, due to the lack
data interchange (EDI), inventory management, and auto- of many experts who understand the business context. SMEs
mated data collection systems, can help SMEs overcome in manufacturing are less likely to use data for analytics and
some of their integration challenges, but they have their lim- decision-making [27], which makes it harder for them to
itations. It is possible to further simplify austere integration exploit their data to enhance growth and productivity.
problems by utilizing web services nowadays. The need for
case studies on the integration efforts of SMEs and how 3.2.7. Lack of or outdated technical knowledge
they deal with integration and interoperability problems is Small businesses need to develop and upgrade employee
critical [25]. skills in order to grow. The lack of finance, the lack of
capacity in data science and machine learning, and the lack
3.2.4. Limited internal and external financial of awareness of innovation funds, among others, prevent
resources many SMEs from remaining competitive as technology ad-
In our research work with SMEs, we discovered that vances in terms of employee skills and competence. These
most of them are able to recognize and utilize data and are especially applicable to smaller companies, such as man-
analytics to grow their businesses. Also, they we noted ufacturing and technology companies, who face challenges
their willingness to use data analytics and invest in related when designing, developing, and testing new products or
technologies to derive meaning from their data. Despite their upgrading their existing ones to meet new needs. Based on
attempts for adopting data technology, their capabilities are Section 3.1 analysis, we can understand that the percentage
insufficient since they do not possess the required knowledge of SMEs with skills and training development needs is
and expertise, nor the budget to hire specialists or outsource roughly 40% as shown in Figure 5.
their analytic needs. A further barrier comes from SMEs
having limited access to funding and loans compared to big 3.2.8. Insufficient knowledge of the data tools and
companies. Thus, the significance of the ERDF and other funds available
similar projects, which aim to bridge this gap, is justified. A In our study, we found that SMEs are unaware of the
greater level of financial and technical support is also needed available funding, especially for adopting innovative digi-
to foster the adoption of data analytics by micro- and small- tal and data-driven ideas. Knowledge Transfer Partnerships
sized businesses, especially during this period of COVID- (KTP) are considered one of the most appealing funding
19 [26]. schemes with respect to the BDC’s digital assistance and
collaboration with SME business owners since minimal con-
3.2.5. Perceived ease of doing business conventionally tributions are required from SMEs and the partnership may
The majority of SMEs operate in specific business sec- have a high level of impact on SMEs. In addition, our study
tors and are dependent upon conventional business meth- found that only a very small percentage of micro enterprises
ods. Thus, many business owners would prefer to maintain are aware of some of the most widely used data management
their traditional business practices and refrain from adopt- and analysis tools, and the benefits these tools can provide.
ing disruptive technologies, such as artificial intelligence, It proved crucial to many SMEs to support their adoption of
data analytics, etc. As a consequence, they are not tempted these tools (including Dropbox, Power BI, and R Studio).
to use data other than for record-keeping purposes. There
are a number of possible explanations for this reluctance 3.2.9. Capacity limitation in data science and machine
to leverage the benefits of advanced data usage-not just learning
a misunderstanding of what these advances are meant to One other primary challenge facing SMEs in digitizing
offer their companies, but also the perceived short-term and adopting AI and data science technologies is the inade-
disruptions such changes might cause-such as learning new quacy or lack of working knowledge and capacity in these
tools, implementing software packages or investing in cloud technologies. This barrier is partially linked to the other
computing, and hiring employees with data analytics skills. aforementioned challenges such as SMEs limited resources,
However, the result is an overlooked data-driven business e.g. finance. Figure 5 shows the breakdown analysis of the
opportunity, which could have been beneficial for the SMEs digital technology support sought by the sample SMEs of
Abdel-Rahman Tawil, Muhidin Mohamed, Xavier Schmoor, Konstantinos Vlachos, Diana Haidar Page 9 of 17
Towards an Effective Data-Driven Decision Making in UK SMEs
which skills development and training forms the largest per- the following guidance to the business: "your care services,
centage (40%) followed by system design and development treatment and support achieves good outcomes, helps you
(34%). For some SMEs, this was partially compensated by to maintain quality of life and is based on the best available
outsourcing data science and AI solutions from external evidence.".
expertise such as cloud computing-based Machine learning We list below the workflow steps of our collaboration
as a Service (MLaaS) platforms [26]. plan with PBL Care for its digitization: step 1: Identification
of Key Performance Indicators (KPIs); step 2: Data collec-
tion; step 3: Data visualisation and dashboard prototyping;
4. Case studies in digitization and adoption of and step 4: Data interpretation and evaluation. The aim of
AI and analytics aspects in SMEs the last step is to analyse outcomes following the implemen-
In this section, we present two illustrative case studies. tation of new processes by PBL Care management team; an
The first one (Section 4.1) is about an SME that transformed analysis that could lead to further process changes. In the
from manual business operations to digital processing under following sections we will describe each of these steps.
this research and support project. The second case study
looks at an additive manufacturing company that embarks 4.1.3. Development of a digital solution
the route to AI and Machine Learning adoption for parameter Identification of KPIs In order to identify KPIs, we
optimization (Section 4.2) started by looking at paper-based information collected by
PBL Care. The SME used paper “log books" to record
4.1. A data-driven solution for monitoring the information such as date, time in, time out, carer’s name,
delivery of PBL care services detailed log of tasks carried out and any concerns raised,
PBL Care is a domiciliary care service SME based in food/fluids taken, pad changed, etc. Care staff fill one log
Birmingham - West Midlands [28]. It provides home care book per visit at the service user (patient) location. We
and support to individuals who live independently in their identified the following attributes as key for monitoring:
own homes. The SME offers a wide range of services such Patient ID, Care staff ID, Date, Time In, Time Out.
as personal care, assistance with eating and toileting, med-
ication support, and palliative care. Among the individuals Data Collection Data collection templates were created
supported by PBL Care, some have physical disabilities, de- using Excel spreadsheets to digitally gather data based on the
mentia, and mental health conditions. PBL Care is regulated identified KPIs. For the first iteration of data collection and
by the Care Quality Commission (CQC), part of the UK dashboard prototyping, care staff continued to fill in paper
Department of Health and Social Care, responsible for the log books at the patients’ homes and other staff inputted
regulation and inspection of health and social care services these data in the spreadsheet at PBL Care office. We will
in England [29]. also discuss how this process was later improved.
Abdel-Rahman Tawil, Muhidin Mohamed, Xavier Schmoor, Konstantinos Vlachos, Diana Haidar Page 10 of 17
Towards an Effective Data-Driven Decision Making in UK SMEs
Abdel-Rahman Tawil, Muhidin Mohamed, Xavier Schmoor, Konstantinos Vlachos, Diana Haidar Page 11 of 17
Towards an Effective Data-Driven Decision Making in UK SMEs
improvement process, these steps can then be repeated and development projects for a range of energy systems includ-
as the number of iterations grows, it is likely to observe ing fuel cells, turbine machinery, nuclear, concentrated solar
improvements in the services provided and in the solution power, and other heat and power generation systems for auto-
developed. motive and airspace applications. HiETA’s unique technical
Conscious of the importance of collecting data digitally, capabilities and technologies include the development of
PBL Care started to use an accredited software provider from heat transfer surfaces with increased high heat transfer. The
February 2019, including a mobile application for care staff company strives to create methods and processes that dra-
and a web application for PBL Care management team. In matically reduce product size, and increase cycle efficiencies
March 2019, the Care Quality Commission conducted a new and product life.
inspection of PBL Care which resulted in a better rating from
“Requiring improvement” from "satisfactory" to “Good”. 4.2.1. Problem statement
We have learnt that a digital solution can help monitor the HiETA Technologies currently uses a series of trial
provision of care. However, it must not be the only source and error testing (printing component samples repeatedly
of information for driving decisions. Underlying factors can and testing their quality and defects) to optimize process
be at play behind the scenes and not be reflected on the parameters and get to optimal input values for the production
dashboard. The digital tool can help generating hypotheses of final components. This brute force method, commonly
and highlight patterns that need to be raised in conversations used in the wider 3D printing business sector, does not only
between management staff, care staff, and service users. For cost significant amount of time causing major delays, but
example, the management team must not jump to conclu- also incurs countless failures and mistakes before yielding
sions and blame a care staff if the dashboard shows that the right parameters that work for the given production.
an employee had a couple of visits that lasted 5 minutes. To address this issue, the company wants to investigate the
A simple reason like a service user telling the carer he did possibility of using machine learning in their processes to
not require care on that day could be the explanation. The reduce errors and the high time and material costs associated
identification of KPIs and interpretation of data must not be with the use of their existing trial and error method. On that
based only on performance, it must also reflect the provision initiative, HiETA joined the BDC in a research collaboration
of good care and satisfaction of individuals, both service to investigate this problem.
users and care staff.
4.2.2. Initial data preparation and exploration
4.2. Process parameter optimization for Additive HiETA’s primary objective of applying machine learning
Manufacturing using supervised machine in its additive manufacturing process is to automatically
learning identify the optimal parameters for building components
Additive Manufacturing (AM) is the process of fabricat- with few or no trails. In particular, the task of this first
ing components by adding layer-upon-layer of materials with stage is to develop a bespoke predictive model to be used
the aid of digital 3D design [33]. AM has recently gained in the material characterisation of new product develop-
increasing research attention because of its advantages in ment. This will in the long term optimise process parame-
comparison with traditional subtractive manufacturing [34]. ters for given geometry-material-machine combination, thus
Although, the use of machine learning for additive manufac- improving the SME’s understanding of parameter interac-
turing is still in its infancy stage, several machine-learning tions and cause-effect relationships in the AM process. Such
algorithms have been applied in AM tasks including param- predictive model will also establish a relationship formula
eter optimization and fault detection [23, 33]. Related to between the input and target parameters enabling HiETA
this, Meng et al. [23] reviews latest applications of different to run the minimum possible experiments to get the right
machine learning algorithms in additive manufacturing. The parameter values that work. This will also focus production
study matches various ML methods to corresponding AM time and resources into a small pool of experimental trials.
applications including parameter optimization and anomaly HiETA provided the BDC project with sample data of
detection. six parameters for this pilot investigation. Table 2 shows
This case study is about parameter optimization in 3D anonymised statistical summary of the sample parameters
printing for a company called HiETA Technologies1 . It data and their modelling roles.
attempts to investigate a model for establishing relation- You will note that four of data parameters, namely; laser
ship between specified process inputs and defect indica- power (LP), point distance (PD), hatch option (HO), and
tors in produced components. HiETA Technologies is a re- exposer time (ET) are predictors whereas border length den-
search and product development company founded in 2011. sities (S1BLD, S2BLD, S3BLD) and bulk length densities
The SME specializes in the use of Additive Manufacturing (S1BULD, S2BULD, S3BULD) for samples 1, 2, and 3
(metal 3D printing) for thermal management and light- are response variables. The table shows that all four input
weighting solutions. They offer an end-to-end metal 3D parameters are normally distributed as shown in their mean-
printing services in additive manufacturing and engage in median equality and their skewness. This suggests that there
is no need for performing any variable transformation prior
1 https://www.hieta.biz/
to training a predictive model. The roles indicated by output
Abdel-Rahman Tawil, Muhidin Mohamed, Xavier Schmoor, Konstantinos Vlachos, Diana Haidar Page 12 of 17
Towards an Effective Data-Driven Decision Making in UK SMEs
Table 2
Summary statistics of the Sample data parameters and roles (rounded to 3 S.F)
Abdel-Rahman Tawil, Muhidin Mohamed, Xavier Schmoor, Konstantinos Vlachos, Diana Haidar Page 13 of 17
Towards an Effective Data-Driven Decision Making in UK SMEs
models. One challenge facing HiETA, and perhaps the wider Table 4
AM research and product development companies, is the Defect Prediction performance of different regression algo-
lack or insufficiency of past fabrication data, which led for rithms based on oversampled data of size 500 instances: all
some researchers to suggest the use of simulated data for figures are rounded to 3 S.F
ML-based parameter optimization [23]. Insufficient training
Algorithm S1BLD S2BLD S1BULD S2BULD
data is a common challenge in machine learning, particularly
for domains starting to adopt AI or in which the process of Baseline 1.475 1.046 0.474 0.865
creating the training data is very expensive and time consum- Linear 1.32 0.926 0.384 0.74
ing such as additive manufacturing [23, 35]. To mitigate that, Polynomial 0.485 0.259 0.117 0.231
Decision Tree 0.410 0.25 0.090 0.126
we have used SMOGN algorithm [36] to oversample the data
Random Forest 0.461 0.289 0.104 0.161
into various sizes and up-to 2000 instances as the original SVM (rbf) 0.777 0.423 0.139 0.303
data was just less than 100 observations. It is widely accepted Neural Network 0.598 0.314 0.140 0.251
in AI/ML that predictive accuracy improves with increased
training data [37, 38]. Resampling limited training data and
synthesising new data instances is a common practice in Table 5
predictive machine learning with the objective of increasing Defect Prediction performance of different regression algo-
small-sized datasets. rithms based on oversampled data of size 1000 instances: all
figures are rounded to 3 S.F
To build a product defect predictive model, we have
applied 5 regression algorithms to the oversampled data Algorithm S1BLD S2BLD S1BULD S2BULD
of up-to 2000 instances. Tables 4-5 summarize the Root Baseline 1.718 1.188 0.513 0.954
Mean Square Error (RMSE) results of the models built Linear 1.282 1.025 0.380 0.786
with these algorithms and the best over-sampling rates, 500 Polynomial 0.433 0.293 0.144 0.286
and 1000. The RMSE is a regression performance measure Decision Tree 0.379 0.257 0.092 0.114
used to evaluate the differences between actual and model Random Forest 0.404 0.284 0.106 0.140
predicted parameter values. It is computed as per Equation 1, SVM (rbf) 0.545 0.363 0.148 0.335
where 𝑛, 𝑦𝑖 , 𝑦̂𝑖 represent the sample size, actual and predicted Neural Network 0.466 0.753 0.165 0.285
values respectively.
Table 6
√
√ 𝑛 Average dataset performance over all regression algorithms
√∑ (𝑦̂𝑖 − 𝑦𝑖 )2
𝑅𝑀𝑆𝐸 =√
algorithms
(1)
𝑛
𝑖=1 Data size S1BLD S2BLD S1BULD S2BULD
Tables 4-5 shows that decision trees, random forest, original 0.965 0.770 0.206 0.590
and polynomial models perform better for this sample data 500 0.790 0.501 0.207 0.383
1000 0.747 0.595 0.221 0.414
compared to other applied regression models. Polynomial
1500 0.802 0.664 0.207 0.391
models are known for their high computational complexity
2000 0.872 0.680 0.221 0.403
with increasing data variables and degrees. Overall, it is
clear from this set of results, that decision trees produce
lowest prediction error consistently across all models and over-sampled datasets 1500 and 2000 for nearly all models
for all oversampling rates2 . The fact that linear regression are higher than those achieved with lower level oversam-
has the lowest accuracy among all models suggests that pling, e.g. the 500 and 100 observations. This may be due to
the relationship between manufacturing defects and input the fact that the noise in the data gets significantly amplified
parameters are nonlinear. This corroborates the findings in at the higher oversampling rates.
the initial data exploration, e.g., the correlation coefficients To investigate any possible further improvement that
between the predictors and target parameters as in Table 3 might be achieved with the best identified model, we have
In Table 6, we show the average performance of all tuned the decision tree depth hyper-parameter. For illustra-
algorithms across varying over-sampled data sizes including tion purpose only Table 7 and Figure 14 show the model
original one 3 . errors (hence accuracy) for the S1BLD target variable for
Figure 13 visually illustrates the data size effect on different oversampled training data sizes while varying de-
prediction performance as presented in Table 6. Empirically cision tree depths. The results show the model accuracy
speaking, the oversampled data in the range between 500- improves with increasing decision tree depth, although the
1000 produces best overall average results across models. errors become stable from a maximum depth of 21. In
In other words, the average prediction errors produced with addition, the over sampled data of size 1500 instances con-
2 This includes results achieved with other over-sampled data sizes sistently achieves the lowest error with various models.
above 1000 and up-to 2000 instances. Based on our analysis, there are several reasons for the
3 The size of the original data could not be disclosed due to SME’s
weakness of tested models with the given sample data size
business confidentiality
Abdel-Rahman Tawil, Muhidin Mohamed, Xavier Schmoor, Konstantinos Vlachos, Diana Haidar Page 14 of 17
Towards an Effective Data-Driven Decision Making in UK SMEs
Abdel-Rahman Tawil, Muhidin Mohamed, Xavier Schmoor, Konstantinos Vlachos, Diana Haidar Page 15 of 17
Towards an Effective Data-Driven Decision Making in UK SMEs
Our analysis clearly suggests that most SMEs collect [17] M. Willetts, A. S. Atkins, C. Stanier, Barriers to smes adoption
and store some sort of business data but require skills to of big data analytics for competitive advantage, in: 2020 Fourth
analyse and produce useful insights for data-driven decision International Conference On Intelligent Computing in Data Sciences
(ICDS), IEEE, 2020, pp. 1–8.
making. If SMEs are empowered with the right skills and/or [18] S. Coleman, R. Göb, G. Manco, A. Pievatolo, X. Tort-Martorell,
supported financially for this purpose, they can make full M. S. Reis, How can smes benefit from big data? challenges and a
utilization of data to help them, and hence driving the growth path forward, Quality and Reliability Engineering International 32 (6)
of entire economy. (2016) 2151–2164.
[19] L. Ghahremanlou, A.-R. H Tawil, P. Kearney, H. Nevisi, X. Zhao,
A. Abdallah, A survey of open data platforms in six uk smart city
References initiatives, The Computer Journal 62 (7) (2019) 961–976.
[20] J. Wolf, T. L. Pett, Small-firm performance modeling the role of
[1] M. Savvy, Uk small business statistics (2021). product and process improvements, Journal of Small Business Man-
URL https://www.merchantsavvy.co.uk/uk-sme-data-stats-charts/ agement 44 (2) (2006) 268–284.
[2] M. W. Georgina Hutton, Business Statistics, House of Commons [21] A. Landi, M. Thompson, V. Giannuzzi, F. Bonifazi, I. Labastida,
Library, 2021. L. O. B. da Silva Santos, M. Roos, The “a” of fair–as open as possible,
URL https://researchbriefings.files.parliament.uk/documents/SN as closed as necessary, Data Intelligence 2 (1-2) (2020) 47–55.
06152/SN06152.pdf [22] S. A. Qalati, L. W. Yuan, M. A. S. Khan, F. Anwar, A mediated model
[3] M. Mohamed, P. Weber, Trends of digitalization and adoption of big on the adoption of social media and smes’ performance in developing
data & analytics among uk smes: Analysis and lessons drawn from countries, Technology in Society 64 (2021) 101513.
a case study of 53 smes, in: 2020 IEEE International Conference on [23] L. Meng, B. McWilliams, W. Jarosinski, H.-Y. Park, Y.-G. Jung,
Engineering, Technology and Innovation (ICE/ITMC), IEEE, 2020, J. Lee, J. Zhang, Machine learning in additive manufacturing: a
pp. 1–6. review, Jom 72 (6) (2020) 2363–2377.
[4] S. M. Chege, D. Wang, The influence of technology innovation on sme [24] J. Mancini, Data portability, interoperability and digital platform
performance through environmental sustainability practices in kenya, competition: Oecd background paper, Interoperability and Digital
Technology in Society 60 (2020) 101210. Platform Competition: OECD Background Paper (June 8, 2021).
[5] I. Alvarez, I. Zamanillo, E. Cilleruelo, Have information technologies [25] M. Themistocleous, H. Chen, Investigating the integration of smes’
evolved towards accommodation of knowledge management needs in information systems: an exploratory case study, International Journal
basque smes?, Technology in Society 46 (2016) 126–131. of Information Technology and Management 3 (2-4) (2004) 208–234.
[6] J.-w. Lee, Analysis of technology-related innovation characteristics [26] OECD, The Digital Transformation of SMEs, OECD Publishing,
affecting the survival period of smes: Focused on the manufacturing 2021. doi:https://doi.org/10.1787/bdb9256a-en.
industry of korea, Technology in Society 67 (2021) 101742. URL https://www.oecd-ilibrary.org/content/publication/bdb9256a
[7] S. Wang, H. Wang, Big data for small and medium-sized enterprises -en
(sme): a knowledge management model, Journal of Knowledge Man- [27] A. Soroka, Y. Liu, L. Han, M. S. Haleem, Big data driven customer
agement. insights for smes in redistributed manufacturing, Procedia CIRP 63
[8] D. Laney, 3D data management: Controlling data volume, velocity, (2017) 692–697.
and variety, Tech. rep., META Group (February 2001). [28] Pblcare caring about needs (2022).
URL http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3 URL https://pblcare.com/
D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety [29] Care quality commission: The independent regulator of health and
.pdf social care in england (2022).
[9] A. Gandomi, M. Haider, Beyond the hype: Big data concepts, meth- URL https://www.cqc.org.uk/
ods, and analytics, Int. J. Inf. Manag. 35 (2015) 137–144. [30] H. Atasoy, B. N. Greenwood, J. S. McCullough, The digitization of
[10] B. Marcinkowski, B. Gawin, Data-driven business model patient care: a review of the effects of electronic health records on
development–insights from the facility management industry, health care quality and utilization, Annual review of public health 40
Journal of Facilities Management. (2019) 487–500.
[11] A. Khayer, M. S. Talukder, Y. Bao, M. N. Hossain, Cloud computing [31] M. Mihailescu, D. Mihailescu, The emergence of digitalisation in the
adoption and its impact on smes’ performance for cloud supported context of health care, in: Proceedings of the 51st Hawaii International
operations: A dual-stage analytical approach, Technology in Society Conference on System Sciences, 2018. doi:10.24251/HICSS.2018.387.
60 (2020) 101225. [32] J. Soikkeli, M. Pulkkinen, T. Ruohonen, Evaluating the value of
[12] OECD, The Digital Transformation of SMEs, OECD Studies on enterprise resource planning in home care services, in: The European
SMEs and Entrepreneurship Series, OECD, 2021. Conference on Information Systems Management, Academic Confer-
URL https://books.google.co.uk/books?id=-Kc8zgEACAAJ ences International Limited, 2013, p. 249.
[13] K. Lepenioti, A. Bousdekis, D. Apostolou, G. Mentzas, Prescriptive [33] U. Delli, S. Chang, Automated process monitoring in 3d printing us-
analytics: Literature review and research challenges, International ing supervised machine learning, Procedia Manufacturing 26 (2018)
Journal of Information Management 50 (2020) 57–70. 865–870.
[14] M. Ghasemaghaei, Understanding the impact of big data on firm [34] X. Qi, G. Chen, Y. Li, X. Cheng, C. Li, Applying neural-network-
performance: The necessity of conceptually differentiating among based machine learning to additive manufacturing: current applica-
big data characteristics, International Journal of Information Manage- tions, challenges, and future perspectives, Engineering 5 (4) (2019)
ment (2019) 102055. 721–729.
[15] P. R. GOV.UK, 2,500 new places on artificial intelligence and data [35] M. A. Lateh, A. K. Muda, Z. I. M. Yusof, N. A. Muda, M. S. Azmi,
science conversion courses (2020). Handling a small dataset problem in prediction model by employ
URL https://www.gov.uk/government/news/2500-new-places-on-art artificial data generation approach: A review, in: Journal of Physics:
ificial-intelligence-and-data-science-conversion-courses-now-o Conference Series, Vol. 892, IOP Publishing, 2017, p. 012016.
pen-to-applicants [36] P. Branco, L. Torgo, R. P. Ribeiro, Smogn: a pre-processing approach
[16] S. K. Lam, S. Sleep, T. Hennig-Thurau, S. Sridhar, A. R. Saboo, for imbalanced regression, in: First international workshop on learn-
Leveraging frontline employees’ small data and firm-level big data in ing with imbalanced domains: Theory and applications, PMLR, 2017,
frontline management: An absorptive capacity perspective, Journal of pp. 36–50.
Service Research 20 (1) (2017) 12–28.
Abdel-Rahman Tawil, Muhidin Mohamed, Xavier Schmoor, Konstantinos Vlachos, Diana Haidar Page 16 of 17
Towards an Effective Data-Driven Decision Making in UK SMEs
[37] M. Banko, E. Brill, Scaling to very very large corpora for natural
language disambiguation, in: Proceedings of the 39th annual meeting
of the Association for Computational Linguistics, 2001, pp. 26–33.
[38] A. Halevy, P. Norvig, F. Pereira, The unreasonable effectiveness of
data, IEEE Intelligent Systems 24 (2) (2009) 8–12.
[39] C. N. Kroll, P. Song, Impact of multicollinearity on small sample hy-
drologic regression models, Water Resources Research 49 (6) (2013)
3756–3769.
Abdel-Rahman Tawil, Muhidin Mohamed, Xavier Schmoor, Konstantinos Vlachos, Diana Haidar Page 17 of 17