Advanced Engineering Informatics 52 (2022) 101631

journal homepage: www.elsevier.com/locate/aei
Full length article
A machine learning approach to predict production time using real-time RFID data in industrialized building construction
Osama Mohsen *, Yasser Mohamed, Mohamed Al-Hussein
Department of Civil and Environmental Engineering, University of Alberta, Edmonton, Alberta T6G 1H9, Canada
A R T I C L E I N F O

Keywords: Industrialized building construction; Prefabricated construction; Production time; Time prediction; RFID; Machine learning

A B S T R A C T

Industrialized building construction is an approach that integrates manufacturing techniques into construction projects to achieve improved quality, shortened project duration, and enhanced schedule predictability. Time savings result from concurrently carrying out factory operations and site preparation activities. In an industrialized building construction factory, the accurate prediction of production cycle time is crucial to reap the advantage of improved schedule predictability leading to enhanced production planning and control. With the large amount of data being generated as part of the daily operations within such a factory, the present study proposes a machine learning approach to accurately estimate production time using (1) the physical characteristics of building components, (2) the real-time tracking data gathered using a radio frequency identification system, and (3) a set of engineered features constructed to capture the real-time loading conditions of the job shop. The results show a mean absolute percentage error and correlation coefficient of 11% and 0.80, respectively, between the actual and predicted values when using random forest models. The results confirm the significant effects of including shop utilization features in model training and suggest that predicting production time can be reasonably achieved.

1. Introduction

Various terms, such as prefabricated construction, offsite construction, and modular construction, to name a few [1], all refer to the same approach: a construction process that applies manufacturing principles and techniques to the construction industry where products (in this case, building components) go through a particular life-cycle from concept, to design, to planning, to manufacturing, and then on-site assembly, as shown in Fig. 1. The term “industrialized building construction” (IBC) is used herein to denote any of the terms above.

Industrialized building construction (IBC) approaches, including offsite, prefab, and modularization, are being used by more than 80% of contractors in the U.S., underscoring the potential for future market growth [2]. In fact, it is projected that modular construction will reach $130 billion in market value in the U.S. and Europe by 2030 [3]. In Canada, modular construction accounted for a market value of more than $1 billion in 2019 [4]. Building construction enterprises have come to recognize the value of information technologies and knowledge discovery approaches [5]. The most recent evidence is the emerging paradigm of Construction 4.0, a term coined in reference to Industry 4.0, which encompasses automation, data exchange, and digitization in manufacturing [6–8]. Critical reviews of IBC were conducted that have identified the major research topics, trends, and challenges in this field [1,9,10]. Project planning and design, including workflow optimization and project delivery process, have been among the most frequently studied research themes in IBC over the past decade [11].

An IBC factory is one of the complex manufacturing environments due to numerous factors, namely, (i) the diversity of building components that leads, occasionally, to the complete customization of building modules; (ii) the integration of many sub-systems (structural, electromechanical, communications, finishing) that must be included in the building; (iii) the large and heavy machinery and components involved in the manufacturing process; and (iv) the numerous interrelated activities including production, assembly, finishing, delivery and on-site installation [12].

In any manufacturing process, as well as in the context of IBC, a job’s cycle time (CT) is a key performance metric [13], defined as the time taken to complete the production process of one product from start to finish. A closely related term is lead time (LT), which is defined as the time taken from order placement until the order is delivered to the
* Corresponding author.
E-mail address: omohsen@ualberta.ca (O. Mohsen).
https://doi.org/10.1016/j.aei.2022.101631
Received 4 August 2021; Received in revised form 15 April 2022; Accepted 9 May 2022
Available online 27 May 2022
1474-0346/© 2022 Elsevier Ltd. All rights reserved.
customer. Accurate prediction of cycle time is crucial when determining customer delivery dates, when scheduling resources and actions, and when controlling daily operations. Scheduling in IBC depends on the schedule of on-site activities, the factory production processes, and the expected delivery dates of raw materials from suppliers. As such, production CT constitutes a critical component in this supply chain, from raw material to the end product. A more predictable production schedule allows the production/project manager to mitigate delays and implement alternative actions should there be any unexpected interruption to factory production. In addition, accurate prediction of CT enables effective management of the fabrication shop’s capacity and workload.

In the literature, several methods have been developed to predict cycle time [14]. Simulation has been the most widely used approach in that it can realistically capture the complexities of manufacturing; however, practical disadvantages include the expense of model development and maintenance and the emphasis on average performance rather than on the prediction of CT for an individual product. On the other hand, the most fundamental analytical approach to predicting the average CT is Little’s law [15], where cycle time is deterministically calculated. Such a static model does not provide good predictions because it does not capture the stochastic and make-to-order nature of IBC manufacturing; nevertheless, it can be used as a benchmark. Despite a large amount of data being collected in IBC factories and the advances over the last decade to collect and analyze data automatically, production data are not being fully utilized to improve decision-making in IBC.

The main goal of the present study is to devise a framework for accurate prediction of CT that exploits the knowledge embedded in the raw data from production tracking systems. The present study represents a middle approach between simple statistical methods (e.g., Little’s law) and more sophisticated simulation modeling techniques. More specifically, the present study aims at extracting knowledge about the status of the workload of the production environment that is not readily attainable using existing production tracking data. Such extracted knowledge is then used to augment the inputs to machine learning (ML) models to improve the prediction of CT.

The remainder of this paper is organized as follows. Section 2 provides a brief review of the relevant literature concerning IBC and the application of ML in both construction and production cycle time prediction. The study objectives and research methods are presented in Sections 3 and 4. Section 5 describes the data preparation and feature engineering experiments, while Section 6 describes our experimentation with ML models, including a summary and discussion of the results. Lastly, the conclusions, including limitations and future work, are presented in Section 7.

2. Related literature

2.1. Knowledge discovery in construction

The term “Knowledge Discovery in Databases” (KDD) had emerged by the late-1980s to collectively refer to the various methods for identifying valid, practical, and nontrivial patterns and similarities in raw data, such as knowledge extraction, information discovery, information harvesting, and data pattern processing [16,17]. Several studies have investigated the application of KDD in the construction industry. For example, KDD processes are used to identify the causes of delays in drainage pipeline installation, focusing mainly on data preparation [18]; to identify the key factors that contribute to delays in construction projects [19]; to estimate future resource requirements in multi-project industrial construction [20], and to forecast construction project time and cost using support vector regression methods [21]. A detailed literature review is presented of the current applications and future potential of data analytics in construction, affirming that adopting these technologies in construction lags its broader application in other fields [22]. More recently, various ML algorithms are being used in construction to detect non-certified work on sites [23], classify workers encountering high fatality risk accidents [24], verify the personal protective equipment compliance of workers [25,26], and forecast profit margins for new projects using an applied machine learning process [27]. It is evident from the literature that ML application in many areas of the construction industry is gaining increasing acceptance and adoption; however, these techniques have seen limited applications in project planning and scheduling in IBC.

To enable the application of KDD and ML, the availability of large datasets is crucial. One technology for automatic real-time data acquisition typical to IBC is the radio frequency identification (RFID) tracking system. RFID is used as a sensing mechanism to locate product tags using the electromagnetic field of antennas connected to RFID readers, which populate a central database with tracking information. Within any specific setup, each tag contains unique identification information. The RFID system setup used in the present study is shown in Fig. 3; it consists of three main components: (a) scanning antennas, (b) a transceiver that reads and interprets product timestamps, and (c) transponders/tags that have been programmed with information and are attached to the objects being tracked (i.e., wall panels in the present study). Valero [28] and Valero and Adán [29] provided a comprehensive review of recent RFID applications in construction.

2.2. Machine learning for cycle time predictions

An overview of machine learning applications in manufacturing was conducted and indicated that future research would be focused on design, shop floor control, supply chain management, and developing solutions that can be easily integrated with existing systems [30]. Similarly, Choudhary et al. [31] reviewed the literature about data mining in manufacturing, focusing on categorizing the articles into the five primary functions of data mining, namely, description, association, classification, prediction, and clustering. In recent years, a few studies have focused on predicting LT and CT using ML approaches in manufacturing environments other than IBC. For example, a regression tree is used to estimate LT in a hypothetical make-to-order manufacturing factory [32], a hybrid method of clustering and regression trees is used to predict CT in a semiconductor factory [33], and in the context of wafer fabrication, a CT range is estimated using fuzzy backpropagation network, principal component analysis, and fuzzy c-
Fig. 1. Simplified high-level workflow in industrialized building construction.
means [34]. Another study used genetic programming to predict process CT based on system status information [35]. More recently, Gyulai et al. [36] compared analytical and ML techniques for LT prediction in the optics industry, and Lingitz et al. [37] compared the accuracy of several regression algorithms when predicting LT in semiconductor manufacturing; both suggested that random forests are the most accurate models, as they yield the lowest error measures.

Although IBC is an interdisciplinary field at the interface of manufacturing and construction, relatively few ML studies within this domain have been conducted. IBC typically employs a make-to-order production approach where products (i.e., building components or modules) are only built after a confirmed customer order. Products are also highly customized, and their production must be coordinated with site delivery; all the above factors make the prediction of production cycle time a complex task. Previous studies applying ML to predict CT have mainly used simulated data. In contrast, the real-time RFID data utilized in this study is more prone to noise than simulated data and, hence, is more challenging to manipulate and prepare for modeling. Actual RFID production data also provides a more accurate and realistic knowledge representation of the production system.

3. Study objectives

The present study builds upon the previous research to achieve an accurate prediction of cycle time in manufacturing environments using ML in the context of IBC, which differs from other manufacturing environments as described above. The goal is to develop data-driven predictive models to estimate the cycle time required to accomplish a highly customized production process in a wall panel manufacturing facility and to explore the effect of different product and production process attributes on the predictability of CT. The primary input data to the target models include (1) physical characteristics of wall panels obtained from building information models (BIM), (2) RFID timestamps used to track panel production, and (3) a set of engineered features that depicts the real-time loading conditions of the job shop.

More specifically, the present study seeks to achieve three objectives: (1) to accurately predict the CT of a production process using ML models and real-time data; (2) to examine the effect of augmenting the dataset with domain-specific engineered features that capture the real-time loading conditions of the fabrication shop; and (3) to investigate the effect of different grouping strategies, used to combine several features, on the prediction accuracy.

4. Research approach and methods

The research approach is based on an integrated KDD framework consisting of several interconnected phases, as shown in Fig. 2. This framework is developed based upon commonly used frameworks for data mining, such as the cross-industry standard process model for data mining (CRISP-DM) [38]; Oracle’s architecture development process [39]; the work published by Fayyad et al. [16], Davenport et al. [40], Dietrich et al. [41]; and the IBM foundational methodology for data science [42].

Phase (1) involves identifying and collecting data by determining some criteria for data inclusion suitable for the present analysis. This phase comprises two main steps: identifying data and acquisition mechanisms and identifying appropriate data storage media. In this research, data is acquired from a central database containing (a) the physical properties of wall panels as stored in BIM and (b) the RFID timestamps captured by the panel tracking system.

Phase (2) involves the iterative tasks of cleaning and preparing the data once it has been captured and stored. Essentially, this phase endeavors to answer the following two questions: Is the data collected representative of the problem to be solved? What efficient cleaning steps should be followed to represent the data in a format suitable for prediction models?

Phase (3) is subdivided into two tasks. The first is the construction of engineered features to reflect the real-time loading conditions of the job shop. In the second task, the newly generated features are combined with the cleaned dataset from phase (2), and the resulting integrated data is then directed back through phase (2) for further preparation. The output of this iterative process is a “tidy dataset.” Tidy data, it should be
Fig. 2. Framework of research approach.
noted, is a common term used in data science and has been formally defined by Wickham [43] as “a standard way of mapping the meaning of a dataset to its structure” in which each variable/feature forms a column, each observation/record forms a row, and each type of observational unit forms a table. To predict the CT, two factors should be quantified: the product composition (PC), which represents physical properties, and the shop loading conditions (SL). Accordingly, based on PC and SL, the predicted value of CT, \(CT_{pred}\), can be formulated as shown in Equation (1):

\[ CT_{pred} = f(PC, SL) \tag{1} \]

where f stands for the mapping function from PC and SL to \(CT_{pred}\).

Phase (4) involves mining the dataset by applying various ML regression techniques to the different versions of the tidy data. These versions are developed for the purpose of experimenting with different aspects of data aggregation and integration. Phase (5) involves evaluating the developed models using five performance metrics. Model evaluation typically consists of two main phases: the diagnostic phase, which ensures the model is working as intended, and the statistical testing phase, which is applied to ensure that the data is being properly handled and interpreted within the model.

The following sections describe each of these phases in greater detail, including the results.

5. Identifying and preparing the data

5.1. Process description

The production of prefabricated wall panels at an IBC factory located in Edmonton, Canada, is analyzed. The production floor layout is shown in Fig. 3, where the scope of the present study is restricted to the multiwall panel (MWP) production route, which consists of four consecutive workstations: framing station (FS), buffer table (BT), sheathing station (SS), and multifunction bridge (MFB). A wall panel goes through these stations in sequence, and no two panels are processed at one station simultaneously. RFID antennas are located between consecutive workstations where timestamps are recorded by reading RFID tags attached to wall panels as they pass by.

The floor layout is configured such that each MWP is framed, prepped, sheathed, and nailed between antenna locations A1 to A5. Then, each MWP is cut into several single panels and transferred to a butterfly table, where antenna A6 records when the single wall panel starts the following production phase. This study focuses on the first phase of production: the MWP production starting at antenna A1 (located at the front of FS) and ending at antenna A5 (located at the end of MFB).

5.2. Data acquisition and description

The raw data analyzed in this study covers a period of 42 months between February 2015 and August 2018. The raw data is composed primarily of two datasets. The first dataset (DF1) is the “RFID Readings” dataset that contains timestamps for each MWP along the production route. It consists of 416,948 records and ten attributes. The second dataset (DF2) is the “Multipanel” dataset that contains descriptive attributes of each MWP, including the physical properties of the panels. It consists of 39,703 records and 37 attributes. Fig. 4 illustrates the flow of the data identification and data cleaning steps. Initial data identification is performed to set the study’s limits and to include only the information relevant to the current analysis. These steps are expert rules depending mainly on the specific application. For example, in DF1, steps include (a) removing records before September 11, 2015, since the factory layout and RFID system setup were changed, and (b) only keeping RFID records of antennas A1 through A5 as these correspond to the scope of the study. As for DF2, the steps include (a) removing panels that are manually manufactured and (b) excluding any attribute where all data are missing or there is only one dominant value.

In addition, as part of this phase of the methodology, descriptive statistics and data visualization are applied to assess the content and quality and to gain initial insights into the data. Fig. 5 shows the total number of RFID readings per month, per week, and using a 14-day rolling window. Periods during which there is a low number of RFID readings, such as from July 2016 to September 2016, might indicate low productivity periods due to market recessions. Such periods represent abnormal operating conditions and should be excluded from the analysis as they may skew the results.

However, a more consistent indicator of daily productivity is to count the number of panels produced each day over the study period.
Fig. 3. Production floor layout and location of RFID antennas between workstations.
Fig. 4. Flowchart of initial data identification and data preprocessing steps.
Fig. 5. Number of RFID readings per month, per week, and 14-days rolling window.
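
The expert rules applied to DF1 and the 14-day rolling count of readings can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' script: the record layout (dicts with the hypothetical keys `antenna` and `initial_read`) and the function names are invented here; only the September 11, 2015 cutoff and the A1 to A5 antenna scope come from Section 5.2.

```python
from datetime import datetime, timedelta

LAYOUT_CHANGE = datetime(2015, 9, 11)        # cutoff stated in Section 5.2
IN_SCOPE = {"A1", "A2", "A3", "A4", "A5"}    # antennas within the study scope


def identify_readings(readings):
    """Keep readings taken on/after the layout change at antennas A1-A5.

    `readings` is a list of dicts with the hypothetical keys
    'antenna' (str) and 'initial_read' (datetime).
    """
    return [r for r in readings
            if r["initial_read"] >= LAYOUT_CHANGE and r["antenna"] in IN_SCOPE]


def rolling_count(readings, end_day, window_days=14):
    """Count readings in the `window_days` days ending on `end_day` (inclusive)."""
    start_day = end_day - timedelta(days=window_days - 1)
    return sum(1 for r in readings
               if start_day <= r["initial_read"].date() <= end_day)


# Invented sample records, for illustration only.
sample = [
    {"antenna": "A1", "initial_read": datetime(2015, 9, 10, 8, 0)},  # before cutoff
    {"antenna": "A1", "initial_read": datetime(2015, 9, 14, 8, 0)},
    {"antenna": "A6", "initial_read": datetime(2015, 9, 14, 9, 0)},  # out of scope
    {"antenna": "A5", "initial_read": datetime(2015, 9, 30, 9, 0)},
]
kept = identify_readings(sample)
count_sep_27 = rolling_count(kept, datetime(2015, 9, 27).date())
```

Evaluating `rolling_count` for every calendar day gives a curve comparable in spirit to the rolling-window series of Fig. 5; the window length is a parameter, so weekly or monthly counts follow from the same function.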
For example, approximately 1,400 panels were manufactured on days when the daily production rate was between 35 and 40 panels/day (Fig. 6). Working days are excluded if the daily production is less than five or more than 75 panels/day, as production beyond these limits represents abnormal operating conditions. Approximately 97% of the total study period remains covered after excluding these data. This exercise aims to determine which production days represent anomalous operations and, therefore, exclude the records during these days from further analysis.

Furthermore, Table 1 summarizes the statistics for the attributes in DF2. Some attributes have extreme values compared to the mean and median. For example, ‘NailCount’ has an extreme maximum value of 2,466, which is considered an outlier. It can also be noted that the attribute ‘LargeDoor’ is expected to have minimal influence on the predictability of CT as most of the panels have no large doors installed. This can be inferred from the value of zero for the mean, standard deviation, and interquartile range.

5.3. Data cleaning and preprocessing

Data preprocessing is an essential step in improving the quality of the data itself and, consequently, the quality of the data mining results. Data preprocessing techniques include cleaning, integration, transformation, and reduction [44]. The main preprocessing steps applied to each dataset in the present study (i.e., DF1 and DF2) are summarized in Table 2.

As an MWP passes over the RFID antenna, a timestamp is recorded in
Fig. 6. Total panel counts for each daily production throughput.
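
The working-day screening described in Section 5.2 amounts to counting panels per day and discarding days outside the stated limits. The sketch below assumes a hypothetical list of `(panel_id, production_date)` pairs and invented function names; the thresholds of 5 and 75 panels/day are the ones quoted in the text.

```python
from collections import Counter
from datetime import date

MIN_PANELS, MAX_PANELS = 5, 75   # limits quoted in Section 5.2


def normal_days(panel_days):
    """Return the set of dates whose daily panel count lies within the limits."""
    per_day = Counter(day for _panel_id, day in panel_days)
    return {d for d, n in per_day.items() if MIN_PANELS <= n <= MAX_PANELS}


def drop_abnormal_days(panel_days):
    """Keep only panels produced on days with normal throughput."""
    keep = normal_days(panel_days)
    return [(pid, d) for pid, d in panel_days if d in keep]


# Invented sample: three days with 3, 40, and 80 panels respectively.
d1, d2, d3 = date(2016, 3, 1), date(2016, 3, 2), date(2016, 3, 3)
panel_days = ([(f"a{i}", d1) for i in range(3)]      # too few panels
              + [(f"b{i}", d2) for i in range(40)]   # normal throughput
              + [(f"c{i}", d3) for i in range(80)])  # too many panels
kept_panels = drop_abnormal_days(panel_days)
```

The per-day counts held in `Counter` are also the raw material for a throughput histogram such as the one in Fig. 6.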
Table 1
Statistics summary of attributes in DF2 dataset.

Attribute         mean     std     min     25%     50%     75%     max
Length            8999    3315    1203    6087  10,404  11,878  12,338
Width             2525     158    1607    2467    2467    2505    3235
Height             117      29      89      89     140     140     314
Window               0       1       0       0       0       0       9
LargeWindow          0       1       0       0       0       0       5
Door                 1       1       0       0       0       1      10
LargeDoor            0       0       0       0       0       0       3
Sheetfull            1       2       0       0       0       0      17
SheetPartial         3       4       0       0       0       4      31
Cutzone              2       2       0       0       1       3      17
Drillhole            4       3       0       2       4       6      26
Stud                15       7       0       9      15      21     215
DStud                0       0       0       0       0       0       6
LStud                2       2       0       0       1       3      18
MStud                0       1       0       0       0       0      14
Block                1       3       0       0       0       0      42
Backing              2       4       0       0       0       3      50
NailCount          327     401       0       0     190     566    2466
Nailline            33      37       0       0      20      63     447

the attribute “InitialReadDateTime.” When the panel leaves the RFID antenna range, the attribute “LastReadDateTime” records another timestamp. Ideally, when there is no interruption or waiting in the production line, the two timestamps should be similar, indicating that the panel has not been sitting idle over the antenna. The idle time (IT) for panel i can be defined by Equation (2):

\[ IT_{panel} = t_{lastRead,\,i} - t_{initialRead,\,i} \tag{2} \]

The intermediate CT can be described as the processing time plus the idle time. Time differences between consecutive antenna locations indicate the time during which the panel is processed at each of the four intermediate stations. Total CT is calculated, according to Equation (3), as the sum of these individual station times:

\[ CT = \sum_{i=1}^{4} \left( t_{initialRead,\,i+1} - t_{initialRead,\,i} \right) \tag{3} \]

where \(t_{initialRead,\,i+1}\) and \(t_{initialRead,\,i}\) are the timestamps at the end and start antenna locations, respectively. It should be noted that the cycle time as measured in the present study encompasses processing time, idle time, and the time spent on regular breaks during a typical working shift.

Fig. 7 shows the histogram and the kernel density smoothing curve of both the measured CT (ActualCT) and the CT without the break times (ActualCT-BT) for all MWPs. There are three break times during a typical working shift in the case production factory: two 15-minute coffee breaks and one 30-minute lunch break. If the production of the panel falls within these break times, the measured CT is reduced accordingly. In this manner, actual production times are calculated, and any adverse
effect of break times on the accuracy of the CT predictions is mitigated. Hence, ‘ActualCT-BT’ rather than ‘ActualCT’ is used to develop the prediction models.

Table 2
Main preprocessing steps applied to DF1 and DF2.

“RFID Readings” dataset (DF1)

1. Preprocessing step: Keeping “InitialReadDateTime” and “LastReadDateTime”.
   Reasoning: Initial and last timestamps are used for time difference calculations.
2. Preprocessing step: Discarding “LocationSourceAntenna”, “LocationTagID”, “TagID”, “WallNumber”, and “LastReadDate”.
   Reasoning: These contain either redundant or irrelevant information.
3. Preprocessing step: Transforming the dataset so that each row stores the timestamps of one MWP at every station; the original dataset stores each timestamp for each panel at each station in a separate row.
   Reasoning: To prepare the data in a tidy format and to facilitate CT calculations.
4. Preprocessing step: Replacing a missing timestamp value at an individual station with the mean value, provided that the timestamps at all other stations are not missing.
   Reasoning: Replacing missing values so that the dataset is consistent and complete.
5. Preprocessing step: Discarding records where the production of a panel started on one day and finished on a subsequent day.
   Reasoning: Discarding outliers for consistent cycle time calculations.

“Multipanel” dataset (DF2)

1. Preprocessing step: Keeping the dimension attributes (“Length”, “Width”, and “Height”) within a specific range. Panel lengths range between 1,203 and 12,338 mm; widths range between 1,607 and 3,235 mm; and heights range between 89 and 314 mm.
   Reasoning: Only panels manufactured on the automatic production line are kept. Smaller panels, which are manually produced, are discarded.
2. Preprocessing step: Nominal values of the attribute ‘Type’ are converted to numbers using the “dummy coding” technique, commonly known as binarization. There are five unique values: ‘EXT’, ‘INT’, ‘STR’, ‘GAR’, and ‘MEC’.
   Reasoning: For all unique values of the nominal attribute, a new attribute is created. The attribute corresponding to the value of the record gets a value of 1; other attributes are set to 0.
3. Preprocessing step: Discarding the attributes “Job”, “Component”, “wall”, “Siding”, “SidingLine”, “Model”, “Floor”, “Unit”, “GarageDoor”, “Sequence”, “Basementwall”, “position”, and “ProductionJob”.
   Reasoning: These attributes are irrelevant as they contain redundant information, have one unique value, have few prevalent values, or have many missing values.
4. Preprocessing step: Missing values are either replaced using functions defined by fitting curves to the known values or discarded if they account for more than 90% of the total count of an attribute.
   Reasoning: Replacing missing values so that the dataset is consistent and complete.

5.4. Feature engineering and selection

Feature engineering (FE) has been widely used in data-driven modeling to improve the quality of feature sets. FE mainly encompasses feature selection, feature extraction, and feature construction [45]. In the present research, each record in the tidy dataset stores the physical properties of a single multiwall panel and the respective actual cycle time, which is the target value to be predicted. One of the main contributions of the present study is to capture the system status by constructing new features that objectively measure the loading conditions of the job shop (i.e., SL features). In essence, these new features are developed to augment the existing features of a product unit based on the units that immediately precede it along the production line. The CT in the IBC is largely governed by the SL features (i.e., a measure of how busy the fabrication floor is when the production of a new wall panel begins). The inclusion of SL features in model development significantly enhances the model accuracy and, hence, CT predictability. This is mainly due to the fact that these features capture shop utilization, which otherwise (i.e., if only the physical features such as length, width, etc., are used) would be unaccounted for.

Let \(S_i\) represent the set of newly constructed features of a product, i. Each feature \(s_{ki} \in S_i\) is generated in such a manner as to capture attributes of panels that preceded the panel, i, in the production line but whose production is still underway, i.e., work in progress (WIP). These WIP panels represent the current state of system utilization at the moment production of the current panel, i, begins. Let \(X_{n \times m}\) be a matrix in which each of the n rows represents one WIP panel preceding the current panel, and each of the m columns represents an existing feature, x. If we consider \(\theta\) to be a binary vector of size m, where each element corresponds to one of the existing features, then a value of 1 corresponds to that feature being included when calculating the new engineered feature of the current panel; otherwise, the feature is ignored. This can be expressed by Equation (4):

\[ S_i = \theta \cdot X^{T} = \begin{bmatrix} \theta_1 & \theta_2 & \cdots & \theta_m \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} \tag{4} \]

SL features are developed for the present study to capture the system status based on the total production during one work shift. The constructed features are “\(WP_i\)”, “\(WPLength_i\)” and “\(WPTime_i\)”, and they store the number, length, and processing time, respectively, of the WIP panels. These features represent the set, \(S_i\), in Equation (4), with each being generated by setting the appropriate values of the binary vector, \(\theta\). Given i = {1, 2, …, n}, where n is the total number of WIP panels associated with the current panel, the attributes above are calculated based on Equations (5) through (8):

\[ WP_i = |A| = \text{cardinality of set } A \tag{5} \]

\[ A = \{ (x, y) : \exists\, x, y \in DF^{2} \mid x.T_1 > y.T_1 \ \text{and} \ x.T_1 < \max(y.T_2, y.T_3, y.T_4) \} \tag{6} \]

\[ WPLength_i = \sum_{j=1}^{m} \{ x.L_j : \exists\, x \in A \}, \quad \text{where } m = |A| \tag{7} \]

\[ WPTime_i = \sum_{j=1}^{m} \{ x.PT_j : \exists\, x \in A \}, \quad \text{where } m = |A| \tag{8} \]

where (x, y) is the pair of timestamps; x is the timestamp at antenna A1 of the current panel i, and y is the timestamp of any other panel concurrently being processed;
DF denotes the tidy dataset;
\(T_k\) is the timestamp at antenna location \(A_k\), k = {1, 2, 3, 4};
\(x.L_j\) is the length of panels that satisfy the condition; and
\(x.PT_j\) is the processing time of the panels that satisfy the condition.

A Python script is developed to iterate records in DF and create the engineered features. The schematic in Fig. 8 represents the process of calculating the length of WIP panels, “\(WPLength_i\)”, that precede panel i in a typical work shift. In this example, \(t_{p,fs}\) is the timestamp associated with the panel when its production begins at the framing station, while \(t_{p-1,ss}\) and \(t_{p-2,mfb}\) are the timestamps for two panels that are ahead of the current panel but are still in production.

Moreover, based on the newly constructed features, different versions of the dataset are generated, each containing a different set of attributes to examine the effect of varying attribute subsets on the prediction accuracy. These new attributes are created by combining existing physical characteristics, both to reduce the dataset dimensions and to examine the effect of feature granularity on the accuracy of ML models. Table 3 summarizes the experiments conducted with different attribute subsets. Results are summarized in Table 4 and discussed in Section 6.3.3.
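
The attribute groupings listed in Table 3 are simple weighted sums of the physical attributes. A minimal sketch, assuming each panel is held as a dict keyed by the attribute names of Table 1; the function name and the sample values are hypothetical, while the formulas are copied from Table 3:

```python
def combine_attributes(panel):
    """Return a copy of `panel` (a dict) extended with the Table 3 groupings."""
    out = dict(panel)
    # "Engineered combined1": similar properties merged into single attributes.
    out["SheetFP"] = panel["Sheetfull"] + panel["SheetPartial"]
    out["TotalStuds"] = (panel["Stud"] + 2 * panel["DStud"]
                         + 2 * panel["LStud"] + 3 * panel["MStud"])
    out["BlockBack"] = panel["Block"] + panel["Backing"]
    out["totalWD"] = (panel["Window"] + panel["LargeWindow"]
                      + panel["Door"] + panel["LargeDoor"])
    # "Engineered combined2": all component types merged into one attribute.
    out["Components"] = (out["totalWD"] + out["TotalStuds"]
                         + out["SheetFP"] + out["BlockBack"])
    return out


# Invented panel record, for illustration only.
panel = {"Sheetfull": 1, "SheetPartial": 3, "Stud": 15, "DStud": 0,
         "LStud": 2, "MStud": 1, "Block": 1, "Backing": 2,
         "Window": 0, "LargeWindow": 0, "Door": 1, "LargeDoor": 0}
combined = combine_attributes(panel)
```

Returning a copy rather than mutating the input keeps the raw attributes available, so the "Raw data" and "Engineered" subsets can be derived from the same records.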
Fig. 7. Total cycle time (between A1 and A5 in minutes).
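
Equations (2) and (3) and the break-time adjustment of Section 5.3 can be illustrated as follows. The sketch assumes that one panel's initial-read timestamps at antennas A1 to A5 are held in a chronologically ordered list, and it treats the adjustment as subtracting the overlap between the production interval and each break, which is one possible reading of the text. The paper gives only the break durations, so the clock times and all function names below are invented for illustration.

```python
from datetime import datetime, timedelta


def idle_time(initial_read, last_read):
    """Equation (2): time a panel stays within one antenna's range."""
    return last_read - initial_read


def cycle_time(initial_reads):
    """Equation (3): sum of differences between consecutive antenna reads."""
    return sum(
        (later - earlier
         for earlier, later in zip(initial_reads, initial_reads[1:])),
        timedelta(),
    )


def overlap(start, end, brk_start, brk_end):
    """Length of the intersection of [start, end] with one break interval."""
    return max(min(end, brk_end) - max(start, brk_start), timedelta())


def cycle_time_without_breaks(initial_reads, breaks):
    """ActualCT-BT: cycle time minus the break time that falls inside it."""
    start, end = initial_reads[0], initial_reads[-1]
    lost = sum((overlap(start, end, b0, b1) for b0, b1 in breaks), timedelta())
    return cycle_time(initial_reads) - lost


# Invented example: reads at A1..A5 and one 15-minute coffee break.
day = datetime(2016, 3, 1)
reads = [day.replace(hour=9, minute=0), day.replace(hour=9, minute=20),
         day.replace(hour=9, minute=45), day.replace(hour=10, minute=10),
         day.replace(hour=10, minute=40)]
breaks = [(day.replace(hour=10, minute=0), day.replace(hour=10, minute=15))]
actual_ct = cycle_time(reads)                            # 100 minutes
actual_ct_bt = cycle_time_without_breaks(reads, breaks)  # 85 minutes
```

Because the sum in Equation (3) telescopes, the total equals the difference between the A5 and A1 reads; keeping the per-station differences explicit still makes the intermediate station times available for inspection.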
Fig. 8. Engineered feature calculations for Panel i at the time its production begins.
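The idea behind the shop-loading (SL) features can be illustrated with a minimal sketch. It assumes that each panel record carries a start timestamp, an end timestamp, and a length, and it interprets WP as the number of panels still in production when a new panel starts and WPLength as their total length. The record keys, the function name, and these two interpretations are assumptions made for illustration; the definitions used in the study are those given by its own equations and Fig. 8.

```python
from datetime import datetime


def loading_features(panels):
    """Return WP and WPLength for each panel, ordered by production start.

    `panels` is a list of dicts with the hypothetical keys 'start' and 'end'
    (datetime stamps at the first and last station) and 'length'.
    """
    ordered = sorted(panels, key=lambda p: p["start"])
    features = []
    for current in ordered:
        # Panels that started earlier and have not yet left the line.
        ahead = [p for p in ordered if p["start"] < current["start"] < p["end"]]
        features.append({
            "WP": len(ahead),
            "WPLength": sum(p["length"] for p in ahead),
        })
    return features


# Made-up records for illustration only (lengths in arbitrary units).
day = datetime(2021, 1, 4)
panels = [
    {"start": day.replace(hour=7, minute=0), "end": day.replace(hour=7, minute=40), "length": 3000},
    {"start": day.replace(hour=7, minute=10), "end": day.replace(hour=7, minute=45), "length": 4500},
    {"start": day.replace(hour=7, minute=30), "end": day.replace(hour=8, minute=10), "length": 6000},
    {"start": day.replace(hour=8, minute=0), "end": day.replace(hour=8, minute=30), "length": 2500},
]
sl_features = loading_features(panels)
```

Each resulting row would be appended to the physical properties of the corresponding panel, so that a model sees both what the panel is and how busy the shop was when its production began.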
6. Experiments, results, and discussion
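The attribute combinations listed in Table 3 are simple weighted sums of component counts, which the following sketch reproduces for a single panel record. The attribute names and weights are taken from Table 3; the dictionary representation, the function name, and the example counts are assumptions made for illustration.

```python
def combine_attributes(panel):
    """Apply the Table 3 combinations to one panel record (a dict of counts)."""
    sheet_fp = panel["Sheetfull"] + panel["SheetPartial"]
    total_studs = (panel["Stud"] + 2 * panel["DStud"]
                   + 2 * panel["LStud"] + 3 * panel["MStud"])
    block_back = panel["Block"] + panel["Backing"]
    total_wd = (panel["Window"] + panel["LargeWindow"]
                + panel["Door"] + panel["LargeDoor"])
    return {
        # 'Engineered combined 1' attributes
        "SheetFP": sheet_fp,
        "TotalStuds": total_studs,
        "BlockBack": block_back,
        "totalWD": total_wd,
        # 'Engineered combined 2' attribute
        "Components": total_wd + total_studs + sheet_fp + block_back,
    }


# Made-up component counts for illustration only.
example_panel = {
    "Sheetfull": 4, "SheetPartial": 2,
    "Stud": 10, "DStud": 2, "LStud": 1, "MStud": 1,
    "Block": 3, "Backing": 2,
    "Window": 1, "LargeWindow": 0, "Door": 1, "LargeDoor": 0,
}
combined = combine_attributes(example_panel)
```

Replacing the original count attributes with these sums is what reduces the number of attributes from one dataset version to the next.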

Table 3
Subsets created by combining attributes of the tidy dataset.

Name                    Size (records × attributes)   Description
Raw data                1,094 × 23                    Contains only the physical properties of panels.
Engineered              1,094 × 28                    In addition to the physical properties of panels, contains engineered
                                                      features capturing the system status (WP, WPLength, WPTime, Panels/Day).
Engineered combined 1   1,094 × 20                    Similar properties of a panel are combined into single attributes:
                                                        SheetFP = Sheetfull + SheetPartial
                                                        TotalStuds = Stud + 2*DStud + 2*LStud + 3*MStud
                                                        BlockBack = Block + Backing
                                                        totalWD = Window + LargeWindow + Door + LargeDoor
Engineered combined 2   1,094 × 17                    All types of components (openings, studs, sheets, supportive elements)
                                                      are combined into one attribute:
                                                        Components = totalWD + TotalStuds + SheetFP + BlockBack

6.1. Machine learning regression models

The three main categories of ML are supervised learning, unsupervised learning, and reinforcement learning. The CT prediction under investigation in the present study is a supervised regression task, and there exist numerous ML models to predict a continuous variable of this nature. Among these, four are selected for our study: linear regression (LR), k-nearest neighbor (KNN), random forest (RF), and neural networks (NN). A successful model should provide reasonably accurate estimates of the CT to be used in factory planning and should be easy to interpret. Representative references describing these methods include Hastie et al. [46], Witten et al. [47], and, in the context of the Python language, Raschka and Mirjalili [48]. For the treatment of regression trees and random forests, the interested reader may refer to Breiman et al. [49] and Breiman [50]. The four models mentioned above are selected from among the many ML models available for regression based on the following considerations: (1) they range from simple to implement and easily interpretable (i.e., LR and KNN) to more sophisticated black-box models (i.e., RF and NN); (2) they have been
implemented in previous studies to predict production cycle time, as discussed earlier in Section 2.2; and (3) they have been widely used in the literature to investigate construction problems. For example, LR was employed to quantify construction delays [51] and to predict construction costs [52,53]; KNN methods have been applied to the prediction of the construction cost index [54] and the planning of construction technical specifications for deep foundations [55]; RF models were used to predict ground surface settlement due to tunnel construction [56] and to improve the accuracy of BIM-enabled clash detection [57]; and NN was used to predict labor productivity [58] and to estimate the schedule to completion of construction projects [59]. Other potentially powerful models include support vector machines and deep learning networks, which were not thoroughly examined because an initial screening discarded the models that had a long run time given the large size of the dataset used in the present study. It should also be noted that the main focus of the present study is to evaluate the impact of engineered features on ML model performance in general.
With the tidy dataset developed, the selected models are trained and tested. Per Witten et al. [47], the dataset is divided into three subsets: a training set, a validation set, and a test set. The test set contains the records from a given day for which the CT is to be predicted. The training and validation sets contain the production data from previous days, and this data is used for training and validation using a 5-fold cross-validation technique, as shown in Fig. 9. The models are trained and tested using Scikit-Learn, an open-source Python library for ML [60].

Fig. 9. Data splitting and model evaluation.

Table 4
Performance measures of the models for the datasets described in Table 3.

Dataset                 Measure          LR       KNN             RF               NN
Hyperparameters                          –        k = 11, p = 2   400 trees, MAE   (52, 52), 2,000 itr.
Raw data                R² on training   0.27     0.91            0.80             0.24
                        MAE (min)        9.2      6.8             5.8              7.7
                        MedAE (min)      6.7      3.6             3.1              4.8
                        MAPE             23.6%    17.3%           17.0%            19.7%
                        CC               −0.03    0.13            0.47             0.19
Engineered              R² on training   0.46     0.99            0.90             0.42
                        MAE (min)        7.1      6.1             5.3              6.7
                        MedAE (min)      5.6      3.4             3.5              4.4
                        MAPE             17.8%    14.9%           13.7%            16.4%
                        CC               0.34     0.47            0.71             0.44
Engineered combined 1   R² on training   0.43     0.99            0.90             0.40
                        MAE (min)        6.6      6.1             4.9              6.9
                        MedAE (min)      4.2      3.4             2.8              5.0
                        MAPE             16.4%    14.9%           12.3%            16.7%
                        CC               0.40     0.47            0.73             0.39
Engineered combined 2   R² on training   0.41     0.99            0.90             0.39
                        MAE (min)        6.1      6.0             4.4              6.5
                        MedAE (min)      4.3      3.3             3.2              4.3
                        MAPE             15.0%    14.9%           11.0%            15.6%
                        CC               0.53     0.47            0.80             0.44

6.2. Comparison of model performances

The performance of the models is evaluated based on the following measures: mean absolute error (MAE), median absolute error (MedAE), mean absolute percentage error (MAPE), and correlation coefficient (CC), which are calculated as shown in Equations (9) through (12):

$$\text{Mean Absolute Error (MAE)} = \frac{1}{n}\sum_{i=1}^{n} \lvert a_i - p_i \rvert \tag{9}$$

$$\text{Median Absolute Error (MedAE)} = \operatorname{median}_{i=1,\dots,n}\left(\lvert a_i - p_i \rvert\right) \tag{10}$$

$$\text{Mean Absolute Percentage Error (MAPE)} = \frac{100}{n}\sum_{i=1}^{n} \left\lvert \frac{a_i - p_i}{a_i} \right\rvert \tag{11}$$

$$\text{Correlation Coefficient (CC)} = \frac{n\sum a_i p_i - \sum a_i \sum p_i}{\sqrt{n\sum a_i^2 - \left(\sum a_i\right)^2}\,\sqrt{n\sum p_i^2 - \left(\sum p_i\right)^2}} \tag{12}$$

where a_i is the actual CT of panel i, p_i is the predicted CT of panel i, and n is the total number of panels in the data sample. These measures are
selected as per the literature analysis on performance metrics in ML regression algorithms [61].
The resulting performance measures for the four datasets described in Table 3 are given in Table 4. The hyperparameters of the four models have been tuned using the training and validation datasets to attain better-performing and more robust models. The performance of ML algorithms depends highly on finding an optimal set of hyperparameters. Various strategies can be used to search for the best combination of hyperparameters; these include grid search and random search [62]. In the present study, a combination of manual search and brute-force search using the GridSearchCV class available in the Scikit-Learn library is used to find the best combination from a predefined set of values for each model's hyperparameters. The most important values are listed in Table 4 under the corresponding model's name.

6.3. Observations on model performances

6.3.1. General trends

Referring to Table 4, a distinct difference can be observed between the MAE and the MedAE values. The MedAE is smaller than the MAE for all models, a trend consistent with the distribution of actual CT, which is positively skewed (moderately skewed right), as shown in Fig. 7. This behavior is attributable to the presence of a few outlier records that are significantly higher than the average values. These extreme values stretch the right tail of the CT distribution. Another observation is that achieving a higher value of R² on the training set does not necessarily translate into better performance on the testing set, as can be seen for KNN, where the model is observed to have overfitted the training data. As this example underscores, model performance should not be evaluated based on the metrics of the training set.
Across all models and datasets, the MAPE ranges between 24% (worst) and 11% (best), and the CC ranges between approximately 0.0 (no correlation) and 0.8 (strong positive correlation). Overall, the RF models have better performance measures in terms of lower MAE, MedAE, and MAPE and higher correlation coefficients. However, it should be noted that RF requires slightly more computational time. An advantage of RF over other methods is its ability to handle both categorical and continuous attributes, and attributes need not be scaled to common units. When testing several configurations of RF models, using approximately 400 trees and MAE as the criterion for splitting the nodes yielded the best performance.

6.3.2. Engineered features

The production system utilization status is valuable information to add to the ML models to improve the predictability of CT. The raw dataset, 'Raw data' in Table 3, includes only the physical properties of the panels; the engineered features that represent the system utilization status are only included in the other three datasets. Using the raw data without the SL features leads to inferior model performance: the MAPE is as high as 24%, while the CC is low for all models. Meanwhile, an improvement in MAPE of as much as 9 percentage points is achieved when augmenting the datasets with the proposed SL features. Moreover, the correlation coefficients increase for all models, to as high as 0.80 in the case of the RF model.
Intuitively, augmenting the feature set with engineered features should result in better predictability of the developed ML models because these features provide additional information about how the production floor is being utilized when the production of a new product begins. The presented results support this assumption, with performance improvements consistently achieved for all models.

6.3.3. Combining features

Reducing the granularity of the features slightly improves model performance. For example, the mean absolute percentage error of the LR and RF models dropped from 17.8% and 13.7% in the case of the first version of the engineered dataset, 'Engineered', to 15.0% and 11.0% in the case of the reduced dataset, 'Engineered combined 2', respectively. In the aggregated dataset 'Engineered combined 2', all components of a wall panel (studs, windows, doors, backing, and sheathing sheets) are combined and represented by one feature, namely, "Components."
In addition to requiring less computational time, reducing the dimensionality of the data by combining similar attributes improves performance in terms of lower prediction errors and higher correlation coefficients. This might be because the individual features have a weak correlation with the target value when considered separately, whereas the correlation with the target value increases when these features are aggregated. For example, when comparing the correlation coefficients in Table 4, we observe an increase of 0.19 for LR models (from 0.34 to 0.53) and 0.09 for RF models (from 0.71 to 0.80) between the second dataset, 'Engineered', and the fourth dataset, 'Engineered combined 2'. However, in the case of KNN and NN models, we observe no improvement. It should be noted that these observations are specific to the current case study and are not necessarily true for other cases. Further investigation is required to explore the effect of combining features on model performance in the context of different cases.

6.3.4. Lookback timeframe for model training

The training/validation set is examined in reference to various lookback timeframes as part of the ML model development process. The goal here is to investigate the effect of the number of days used to train the model on the model performance and the predictability of cycle time. On average, using a lookback timeframe of 10 to 20 days for training and validation is found to result in better performance measures compared to shorter or longer lookback timeframes. Fig. 9 illustrates the manner in which the splitting of the data is carried out. It is worth noting that further investigation is required to examine the full effect of varying lookback timeframes on different days of the week. This task will be undertaken as part of future work.

6.4. Comparison with Little's law

The simplest and most widely used analytical method to predict cycle time is Little's law, which can be applied to dynamic systems by dividing the time into finite intervals. Equation (13) presents Little's law, where any of the average CT, system throughput (TH), and average units in the system (WIP or work-in-progress) can be calculated by making observations about the other two quantities over the given time interval [15]:

$$\mathrm{WIP} = \mathrm{CT} \times \mathrm{TH} \tag{13}$$

To compare these quantities, the time interval in the present study is set to one work shift. The average daily production, shown in Fig. 6, is found to be 39.3 panels/day. Given a 7.5-hour work shift, this translates into a system throughput of about 5 panels/hour. When the observed panel enters the production route, the average number of WIP panels is 3.5 panels. Using Little's law, the cycle time is then calculated as 42 min (CT = WIP/TH = 3.5/5 h = 0.7 h), close to the calculated average of 40.9 min, as shown in Fig. 7. Assuming that this average number is used as the cycle time for all panels, the performance measures used to compare ML models are calculated for the given dataset: MAE = 11 min, MedAE = 8 min, and MAPE = 30%. These values are inferior to those obtained from any of the data-driven ML models. Although this analytical method can be used to capture the average performance of the system by estimating CT based on the average daily production, error rates are high, and the variability in the production cannot be captured as efficiently as when ML models are applied to predict the CT.

7. Conclusions

This study focuses on the accurate prediction of cycle time in a
panelized wall manufacturing shop using the physical properties of the panels and real-time loading conditions of the fabrication shop constructed from raw RFID timestamps. Although actual real-time data poses several challenges compared to the simulated data used in most previous studies in this domain, chiefly a considerable amount of noise, it offers the advantage of depicting the actual production system more realistically than simulated data does. Therefore, extensive data identification, preprocessing, and cleaning are carried out to obtain the tidy dataset used for model training, validation, and testing. A methodology that draws upon the data mining frameworks commonly used in the literature is proposed and systematically applied to develop several ML models for CT prediction. The approach uses real-time RFID data to construct new features depicting the real-time loading conditions of the job shop. It seeks to map the relationship between features representing the current state of shop utilization, the product properties, and the CT based on historical data. The results show that all the developed ML models (LR, KNN, RF, and NN) yield predictions of production CT that are more accurate than the predictions of average CT as calculated using Little's law. In addition, all the developed models showed improved prediction accuracy when engineered features that represent shop utilization were added to the training dataset. The random forest model, which uses an ensemble of regression decision trees, provides a flexible method and is found to achieve the best performance among the various techniques considered, with a MAPE of 11% and a CC of 0.80.
The present study demonstrates the applicability and accuracy of ML for predicting production CT in IBC using real-time shop utilization data. This approach can help production managers make sound decisions regarding dynamic scheduling of orders and shop capacity planning. We show how this improved accuracy of CT prediction is largely attributable to the augmentation of the dataset with engineered features that capture shop utilization. This is evident in the improved performance (i.e., lower error measures and higher CC) of all ML models tested in this study compared to using only the physical properties of products. Moreover, the results show that all models perform slightly better when the granularity of the dataset is reduced by combining multiple physical attributes into unified features (e.g., panel components = studs + windows + doors + sheathing + backing), which are found to have a higher correlation with the cycle time. We also show that a lookback timeframe of 10–20 days results in more accurate predictions compared to using all available historical data. However, further study is required to examine the effect of varying lookback timeframes on the robustness and accuracy of the ML models.
The present research study uses historical data obtained from an RFID data acquisition system. This data contains considerable noise and requires extensive data cleaning and preparation. In addition, due to the existing RFID system setup, some data that might contain valuable information is not collected, especially the features associated with waiting times for each panel throughout the production process. Therefore, future research directions include developing an integrated data mining and simulation modeling approach that can generate the missing data regarding waiting and idle times of panels while in production.
Moreover, future research is to be directed towards investigating the application of ML and data mining techniques in other similar industrialized building construction settings and how accurate prediction of CT impacts the scheduling of on-site activities. The goal is to examine the applicability and consistency of using ML models to predict CT in mass customization production as in the IBC industry. It is worth noting that the dataset used in the present study represents one fabrication facility, and future work should also be directed towards confirming the results obtained through this study by repeating the work for other cases. The details of the engineered feature development reflecting the loading conditions of the job shop will depend on the specific case study, and some custom development may be required. However, due to the similarities among IBC manufacturing environments, with some minor customization, the presented features can give a representative picture of the current state of shop utilization in a given IBC factory and thereby improve the predictability of CT.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was funded through a Collaborative Research Development Grant by the Natural Sciences and Engineering Research Council of Canada (Grant File No. CRDPJ 536466-18).

References

[1] Z. Li, G.Q. Shen, X. Xue, Critical review of the research on the management of prefabricated construction, Habitat Int. 43 (2014) 240–249, https://doi.org/10.1016/j.habitatint.2014.04.001.
[2] USG Corporation and U.S. Chamber of Commerce, Commercial Construction Index Q1 2018, 2018. https://www.uschamber.com/report/usg-us-chamber-commerce-commercial-construction-index-2018-q1 (accessed April 2, 2021).
[3] N. Bertram, S. Fuchs, J. Mischke, R. Palter, G. Strube, J. Woetzel, Modular construction: From projects to products, 2019. https://www.mckinsey.com/business-functions/operations/our-insights/modular-construction-from-projects-to-products (accessed April 2, 2021).
[4] Modular Building Institute, 2020 Canadian Commercial Modular Construction Annual Report, 2020. https://www.modular.org/HtmlPage.aspx?name=2020-MBI-annual-reports (accessed April 2, 2021).
[5] Z. You, C. Wu, A framework for data-driven informatization of the construction company, Adv. Eng. Inform. 39 (2019) 269–277, https://doi.org/10.1016/j.aei.2019.02.002.
[6] H. Lasi, P. Fettke, H.-G. Kemper, T. Feld, M. Hoffmann, Industry 4.0, Bus. Inf. Syst. Eng. 6 (2014) 239–242, https://doi.org/10.1007/s12599-014-0334-4.
[7] M. Hermann, T. Pentek, B. Otto, Design Principles for Industrie 4.0 Scenarios, in: 49th Hawaii Int. Conf. Syst. Sci., Koloa, HI, 2016, pp. 3928–3937. https://doi.org/10.1109/HICSS.2016.488.
[8] K. Schwab, The Fourth Industrial Revolution, Crown Business, New York, 2017.
[9] A. Aapaoja, H. Haapasalo, The Challenges of Standardization of Products and Processes in Construction, in: 22nd Annu. Conf. Int. Gr. Lean Constr., Oslo, Norway, 2014, pp. 983–993. https://doi.org/10.13140/2.1.3993.7600.
[10] M.R. Hosseini, I. Martek, E.K. Zavadskas, A.A. Aibinu, M. Arashpour, N. Chileshe, Critical evaluation of offsite construction research: A Scientometric analysis, Autom. Constr. 87 (2018) 235–247, https://doi.org/10.1016/j.autcon.2017.12.002.
[11] R. Jin, S. Gao, A. Cheshmehzangi, E. Aboagye-Nimo, A holistic review of offsite construction literature published between 2008 and 2018, J. Clean. Prod. 202 (2018) 1202–1219, https://doi.org/10.1016/j.jclepro.2018.08.195.
[12] M.A. Mullens, Factory Design for Modular Homebuilding: Equipping the Modular Factory for Success, Constructability Press, Winter Park, FL, 2011.
[13] M.E. Pfund, S.J. Mason, J.W. Fowler, Semiconductor Manufacturing Scheduling and Dispatching, in: Handb. Prod. Sched., Kluwer Academic Publishers, Boston, 2006, pp. 213–241. https://doi.org/10.1007/0-387-33117-4_9.
[14] S.-H. Chung, H.-W. Huang, Cycle time estimation for wafer fab with engineering lots, IIE Trans. 34 (2002) 105–118, https://doi.org/10.1080/07408170208928854.
[15] W.J. Hopp, M.L. Spearman, Basic Factory Dynamics, in: Fact. Phys. Found. Manuf. Manag., 2nd ed., McGraw-Hill/Irwin, Boston, MA, 2000, pp. 213–247.
[16] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, From Data Mining to Knowledge Discovery in Databases, AI Mag. 17 (1996) 37–54, https://doi.org/10.1609/aimag.v17i3.1230.
[17] P. Cabena, R. Stadler, J. Verhees, A. Zanasi, P. Hadjnian, Discovering data mining from concept to implementation, Prentice Hall, New Jersey, 1998.
[18] L. Soibelman, H. Kim, Data Preparation Process for Construction Knowledge Generation through Knowledge Discovery in Databases, J. Comput. Civ. Eng. 16 (2002) 39–48, https://doi.org/10.1061/(ASCE)0887-3801(2002)16:1(39).
[19] H. Kim, L. Soibelman, F. Grobler, Factor selection for delay analysis using Knowledge Discovery in Databases, Autom. Constr. 17 (2008) 550–560, https://doi.org/10.1016/j.autcon.2007.10.001.
[20] A.M. Hammad, An Integrated Framework for Managing Labour Resources Data in Industrial Construction Projects: A Knowledge Discovery in Data (KDD) Approach, PhD, University of Alberta, 2009. https://www.collectionscanada.gc.ca/obj/thesescanada/vol2/002/NR55814.PDF (accessed April 2, 2021).
[21] M. Wauters, M. Vanhoucke, Support Vector Machine Regression for project control forecasting, Autom. Constr. 47 (2014) 92–106, https://doi.org/10.1016/j.autcon.2014.07.014.
[22] M. Bilal, L.O. Oyedele, J. Qadir, K. Munir, S.O. Ajayi, O.O. Akinade, H.A. Owolabi, H.A. Alaka, M. Pasha, Big Data in the Construction Industry: A review of Present Status, Opportunities, and Future Trends, Adv. Eng. Informatics. 30 (2016) 500–521, https://doi.org/10.1016/j.aei.2016.07.001.
[23] Q. Fang, H. Li, X. Luo, L. Ding, T.M. Rose, W. An, Y. Yu, A deep learning-based method for detecting non-certified work on construction sites, Adv. Eng. Informatics. 35 (2018) 56–68, https://doi.org/10.1016/j.aei.2018.01.001.
[24] J. Choi, B. Gu, S. Chin, J.-S. Lee, Machine learning predictive model based on national data for fatal accidents of construction workers, Autom. Constr. 110 (2020) 1–14, https://doi.org/10.1016/j.autcon.2019.102974.
[25] N.D. Nath, A.H. Behzadan, S.G. Paal, Deep learning for site safety: Real-time detection of personal protective equipment, Autom. Constr. 112 (2020) 1–20, https://doi.org/10.1016/j.autcon.2020.103085.
[26] J. Wu, N. Cai, W. Chen, H. Wang, G. Wang, Automatic detection of hardhats worn by construction personnel: A deep learning approach and benchmark dataset, Autom. Constr. 106 (2019) 1–7, https://doi.org/10.1016/j.autcon.2019.102894.
[27] M. Bilal, L.O. Oyedele, Guidelines for applied machine learning in construction industry—A case of profit margins estimation, Adv. Eng. Informatics. 43 (2020), 101013, https://doi.org/10.1016/j.aei.2019.101013.
[28] E. Valero, A. Adán, C. Cerrada, Evolution of RFID Applications in Construction: A Literature Review, Sensors. 15 (2015) 15988–16008, https://doi.org/10.3390/s150715988.
[29] E. Valero, A. Adán, Integration of RFID with other technologies in construction, Measurement 94 (2016) 614–620, https://doi.org/10.1016/j.measurement.2016.08.037.
[30] J.A. Harding, M. Shahbaz, A.K. Srinivas, Data Mining in Manufacturing: A Review, J. Manuf. Sci. Eng. 128 (2006) 969–976, https://doi.org/10.1115/1.2194554.
[31] A.K. Choudhary, J.A. Harding, M.K. Tiwari, Data mining in manufacturing: a review based on the kind of knowledge, J. Intell. Manuf. 20 (2009) 501–521, https://doi.org/10.1007/s10845-008-0145-x.
[32] A. Öztürk, S. Kayalıgil, N.E. Özdemirel, Manufacturing lead time estimation using data mining, Eur. J. Oper. Res. 173 (2006) 683–700, https://doi.org/10.1016/j.ejor.2005.03.015.
[33] P. Backus, M. Janakiram, S. Mowzoon, G.C. Runger, A. Bhargava, Factory Cycle-Time Prediction With a Data-Mining Approach, IEEE Trans. Semicond. Manuf. 19 (2006) 252–258, https://doi.org/10.1109/TSM.2006.873400.
[34] T. Chen, R. Romanowski, Precise and Accurate Job Cycle Time Forecasting in a Wafer Fabrication Factory with a Fuzzy Data Mining Approach, Math. Probl. Eng. 2013 (2013) 1–14, https://doi.org/10.1155/2013/496826.
[35] B. Can, C. Heavey, A Demonstration of Machine Learning for Explicit Functions for Cycle Time Prediction Using MES Data, in: 2016 Winter Simul. Conf., Washington, DC, 2016, pp. 2500–2511. https://doi.org/10.1109/WSC.2016.7822289.
[36] D. Gyulai, A. Pfeiffer, G. Nick, V. Gallina, W. Sihn, L. Monostori, Lead time prediction in a flow-shop environment with analytical and machine learning approaches, IFAC-PapersOnLine. 51 (2018) 1029–1034, https://doi.org/10.1016/j.ifacol.2018.08.472.
[37] L. Lingitz, V. Gallina, F. Ansari, D. Gyulai, A. Pfeiffer, W. Sihn, L. Monostori, Lead time prediction using machine learning algorithms: A case study by a semiconductor manufacturer, Proc. CIRP 72 (2018) 1051–1056, https://doi.org/10.1016/j.procir.2018.03.148.
[38] P. Chapman, J. Clinton, R. Kerber, T. Khabaza, T. Reinartz, C. Shearer, R. Wirth, CRISP-DM 1.0 Step-by-Step Data Mining Guide, 2000. https://the-modeling-agency.com/crisp-dm.pdf (accessed April 2, 2021).
[39] P. Heller, D. Piziak, R. Stackowiak, A. Licht, T. Luckenbach, B. Cauthen, A. Misra, J. Wyant, J. Knudsen, An Enterprise Architect's Guide to Big Data — Reference Architecture Overview, 2016. http://www.oracle.com/technetwork/topics/entarch/articles/oea-big-data-guide-1522052.pdf (accessed April 2, 2021).
[40] T.H. Davenport, J.G. Harris, R. Morison, Analytics at Work: Smarter Decisions, Better Results, Harvard Business Press, Boston, Massachusetts, 2010.
[41] D. Dietrich, B. Heller, Y. Beibei, Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, John Wiley & Sons, Indianapolis, IN, 2015.
[42] J.B. Rollins, Foundational Methodology for Data Science, IBM Anal. (2015). https://tdwi.org/~/media/64511A895D86457E964174EDC5C4C7B1.PDF (accessed April 2, 2021).
[43] H. Wickham, Tidy Data, J. Stat. Softw. 59 (2014) 1–23. https://doi.org/10.18637/jss.v059.i10.
[44] S. Chakrabarti, E. Cox, E. Frank, R.H. Güting, J. Han, X. Jiang, M. Kamber, S.S. Lightstone, T.P. Nadeau, R.E. Neapolitan, D. Pyle, M. Refaat, M. Schneider, T.J. Teorey, I.H. Witten, Data Mining Know It All, Morgan Kaufmann, Burlington, MA, 2009.
[45] Y. Li, C. Yang, Domain knowledge based explainable feature construction method and its application in ironmaking process, Eng. Appl. Artif. Intell. 100 (2021), 104197, https://doi.org/10.1016/j.engappai.2021.104197.
[46] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, 2nd ed., Springer New York, New York, NY, 2009. https://doi.org/10.1007/978-0-387-84858-7.
[47] I.H. Witten, E. Frank, M.A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed., Morgan Kaufmann, Burlington, MA, 2011. https://doi.org/10.1007/s00170-004-2497-5.
[48] S. Raschka, V. Mirjalili, Python Machine Learning - Second Edition, 2nd ed., Packt Publishing, Birmingham, 2017.
[49] L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification And Regression Trees, 1st ed., Routledge, 1984. https://doi.org/10.1201/9781315139470.
[50] L. Breiman, Random forests, Mach. Learn. 45 (2001) 5–32, https://doi.org/10.1023/A:1010933404324.
[51] A.H. Al-Momani, Construction delay: a quantitative analysis, Int. J. Proj. Manag. 18 (2000) 51–59, https://doi.org/10.1016/S0263-7863(98)00060-X.
[52] D.J. Lowe, M.W. Emsley, A. Harding, Predicting Construction Cost Using Multiple Regression Techniques, J. Constr. Eng. Manag. 132 (2006) 750–758, https://doi.org/10.1061/(ASCE)0733-9364(2006)132:7(750).
[53] T.M. Zayed, D.W. Halpin, Productivity and Cost Regression Models for Pile Construction, J. Constr. Eng. Manag. 131 (2005) 779–789, https://doi.org/10.1061/(ASCE)0733-9364(2005)131:7(779).
[54] J. Wang, B. Ashuri, Predicting ENR’S Construction Cost Index Using the Modified K Nearest Neighbors (KNN) Algorithm, in: Constr. Res. Congr. 2016, American Society of Civil Engineers, Reston, VA, 2016, pp. 2502–2509. https://doi.org/10.1061/9780784479827.249.
[55] Y. Zhang, L. Ding, P.E.D. Love, Planning of Deep Foundation Construction Technical Specifications Using Improved Case-Based Reasoning with Weighted k-Nearest Neighbors, J. Comput. Civ. Eng. 31 (2017) 04017029, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000682.
[56] J. Zhou, X. Shi, K. Du, X. Qiu, X. Li, H.S. Mitri, Development of Ground Movements Due to a Shield Tunnelling Prediction Model Using Random Forests, in: Geo-China 2016, American Society of Civil Engineers, Reston, VA, 2016, pp. 108–115. https://doi.org/10.1061/9780784480106.014.
[57] Y. Hu, D. Castro-Lacouture, Clash Relevance Prediction Based on Machine Learning, J. Comput. Civ. Eng. 33 (2019) 1–15, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000810.
[58] G. Heravi, E. Eslamdoost, Applying Artificial Neural Networks for Measuring and Predicting Construction-Labor Productivity, J. Constr. Eng. Manag. 141 (2015) 04015032, https://doi.org/10.1061/(asce)co.1943-7862.0001006.
[59] M.-Y. Cheng, Y.-H. Chang, D. Korir, Novel Approach to Estimating Schedule to Completion in Construction Projects Using Sequence and Nonsequence Learning, J. Constr. Eng. Manag. 145 (2019) 04019072, https://doi.org/10.1061/(asce)co.1943-7862.0001697.
[60] G. Varoquaux, L. Buitinck, G. Louppe, O. Grisel, F. Pedregosa, A. Mueller, Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res. 12 (2011) 2825–2830. http://jmlr.csail.mit.edu/papers/volume12/pedregosa11a/pedregosa11a.pdf (accessed April 2, 2021).
[61] A. Botchkarev, A New Typology Design of Performance Metrics to Measure Errors in Machine Learning Regression Algorithms, Interdiscip. J. Information, Knowledge, Manag. 14 (2019) 045–076. https://doi.org/10.28945/4184.
[62] J. Bergstra, Y. Bengio, Random Search for Hyper-Parameter Optimization, J. Mach. Learn. Res. 13 (2012) 281–305. https://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf (accessed April 2, 2021).