Artificial Intelligence For BIM Content Management and Delivery: Case Study of Association Rule Mining For Construction Detailing
All content following this page was uploaded by Hamid Abdirad on 22 November 2021.
a DPR Construction; hamida@dpr.com (Corresponding Author)
b North Carolina State University; pmathur2@ncsu.edu
Abstract
The proliferation of Building Information Modeling (BIM) applications in design and
engineering practice, in tandem with the extensive variation of building products, has posed
new demands on firms to efficiently manage and reuse BIM content (i.e., data-rich parametric
model objects and assembly details). Tasks such as classifying objects, indexing them with
meta-data (e.g., category), and searching digital libraries to load objects into models still plague
practice with inefficient manual workflows. This research aims to improve the productivity of
BIM content management and delivery by developing an artificially intelligent BIM content
recommender system. Using data from a case-study firm, this research extracted content from
over 30,000 technical BIM views (e.g., plans, sections, details) in historical projects to build
a prototype Association Rule Mining (ARM) model that explicated the strength of relationships
among co-occurring BIM objects. Using this prototype as the backbone AI engine in live BIM
sessions, this research developed a context-aware
recommender system that dynamically provides BIM users with a set of objects associable with
their modeling context (e.g., type of view, existing objects in the model) and human-computer
interactions (e.g., objects selected by the user). By mining association data from hundreds of
historical projects, this development marks a departure from the existing prototypes that rely
on explicit coding, recurring user input, or subjective ratings to recommend BIM content to
users. The simulation and experimental implementation of this recommender system yielded
high efficacy in predicting content needs and significant time savings in BIM workflows.
Keywords
1. Introduction
Architectural, Engineering, and Construction (AEC) practices accumulate large quantities
of technical information in their day-to-day work (Zhou, Goh, and Shen 2016). This
information describes technical characteristics of buildings with symbolic and iconic means
(text and graphics) through heterogeneous physical or digital media (Tzonis and White 2012).
The uptake of digital media, particularly Building Information Modeling (BIM) technologies,
has engendered new opportunities and challenges to represent technical building information. BIM
represents the geometries, graphics, and information of building components in digital models and drawings.
However, the extensive variation of building products and the growing expectations for digital
information delivery in projects have imposed new demands on designers and engineers to
create, standardize, maintain, and reuse thousands of object templates in their digital BIM
libraries. In particular, industry estimates show that an average BIM user spends 40-80
hours annually (2 to 4 percent of total working hours) just to search these extensive libraries to
locate and load objects into BIM models (UNIFI 2020). This issue motivated the development
of BIM Content Management and Delivery (BCMD) tools to facilitate organizing, searching,
and loading BIM content (with cloud repositories, digital libraries, add-on user interfaces, and
search engines). Despite these advances in BCMD tools, the emergent solutions still plague
BIM workflows with manual efforts such as: (a) classifying objects and organizing folders
based on custom categories or taxonomies; (b) indexing and registering objects in search
engines with hard-coded meta-data on their functional or technical properties; and (c) querying
or browsing BCMD tools to locate, pick, and load relevant objects into project models.
Given the significance of current BCMD inefficiencies in design and engineering, this
study aims to develop an artificially intelligent BIM Content Recommender System (BCRS)
to minimize the inefficiencies associated with manual content search and retrieval. This BCRS
uses Association Rule Mining (ARM) as an unsupervised Machine Learning (ML) approach
to accomplish the following objectives: (1) mine big data analytics of BIM objects from the
content used in historical BIM models; (2) measure the statistical probability for co-occurrence
and associations between BIM objects based on their technical precedents in BIM views (e.g.,
sections, details); and (3) use these statistics to automatically and dynamically load and
recommend a series of BIM objects to users in live BIM sessions given the cues from the
modeling context and human-computer interactions. Put differently, the goal for this prototype
is to learn from and make recommendations based on the associations of BIM objects that
co-occur in historical technical BIM views.
The simulation and experimental implementation of the proposed BCRS in a case study
engineering firm achieved 80% efficacy in predicting content needs and saved 15% of the time
spent on conventional BIM workflows. This state-of-the-art dynamic BCRS marks a departure
from the existing solutions that commonly rely on explicit coding, extensive user input and
feedback, or subjective ratings (Lee et al. 2020). With significant predictive power and process
efficiency, lessons learned from this BCRS development can make theoretical and empirical
contributions to BIM-enabled engineering informatics and inform the future of AI-backed BIM
2. Background
2.1. Artificial Intelligence for Recommender Systems
Recommender systems provide users with appropriate content based on data from
characteristics of users, contents, or their historical interactions (Falk 2019). These systems are
widely implemented in digital artifacts that consumers use for e-commerce, entertainment, and
knowledge management (Lee et al. 2020). Although the literature offers different taxonomies
for recommender systems, these systems are commonly classified according to
their data sourcing and content filtering approaches (Taghavi et al. 2018). These approaches
include (a) collaborative-filtering, (b) content-based filtering, and (c) hybrid filtering.
Collaborative filtering systems rely on the patterns of content usage in similar past scenarios
(e.g., based on content usage by similar peer users; based on ratings by other users). Content-
based filtering systems analyze and match the attributes and characteristics of contents to those
sought by the active user. Hybrid filtering systems (also known as Ensemble Systems) use a
combination of collaborative and content-based filtering to tap into massive historical datasets
as well as detailed descriptions of contents and users to make recommendations (Chen, Teng,
and Chang 2015, Falk 2019). Depending on the knowledge domain, hybrid systems may utilize
The core computational backbone of recommender systems is the algorithm or model that
analyzes input data and filters content to predict what a user needs (Taghavi et al. 2018). The
uptake of Artificial Intelligence (AI) has engendered and advanced many algorithms and
models for recommender systems (Falk 2019). The early approaches to AI automation, also
known as Symbolic AI, involved explicitly hard-coding human knowledge, tasks, and logical
rules in digital artifacts to generate outputs. These approaches - commonly used in expert
systems and case-based reasoning prototypes - failed to address complex problems with large
datasets where it was infeasible for humans to build in-depth knowledge, identify rules, or
explicate logics. This motivated the development of Machine-Learning (ML) methods and
Knowledge Discovery in Databases (KDD), whereby computers automatically learn rules and
build knowledge using data mining techniques (Figure 1). In fact, rather than being explicitly
programmed, ML models are trained with example cases to learn and implement rules for such
[Figure 1 (two panels): Symbolic AI Systems (Input Data; Explicitly Coded Rules and Conditions; Output Answers & Prediction) and Machine-Learning Systems (Input Data; Example Output Answers; Learned Rules and Conditions)]
Descriptive models learn rules and identify characteristics of existing datasets to inform
practice, while predictive models apply the learned rules on new datasets to predict their
characteristics. The learning process in these models can be either supervised or unsupervised.
With supervised ML models, previously seen outputs from example cases and their known
description and prediction of features in new data. With unsupervised ML models, previously
unknown features and outputs (e.g., discriminant characteristics, association of objects, clusters
of objects) are directly derived from data (Zhang and Zhang 2003, Flach 2012). Among the
unsupervised ML models, Association Rule Mining (ARM) has gained popularity because of
its diverse applications, clear representation, and relevance to both descriptive and predictive
approaches. ARM identifies the strength of relationships between objects that frequently occur
together in large datasets (Zhang and Zhang 2003). This research builds an ARM model that
supports a context-aware BCRS. The goal in this prototype is to mine existing BIM datasets
and identify associations of BIM objects that occur together in technical engineering
representations. These rules will enable the BCRS to monitor human-computer interactions in
live BIM sessions and recommend relevant BIM objects to users accordingly.
2.2. Applications of ML in the AEC Industry
Despite the significant progress in the development and diffusion of ML applications in
many high-tech sectors, the AEC industry has been notoriously slow and lagging in exploring
and adopting ML in business settings (Xu et al. 2021). With its unique institutional and business
context, the fragmented nature of this industry imposes many practical, commercial, cultural,
and organizational challenges onto the empirical investigation and real-world adoption of ML
(Sacks, Girolami, and Brilakis 2020). In fact, only in the past several years has the research
gained momentum to revisit and build on the early examples of ML in the AEC domains
(McCabe, AbouRizk, and Goebel 1998, Wilson, Sharpe, and Kenley 1987).
With this increasing momentum, recent studies have proposed a growing number of ML applications
for AEC practice in different project and life-cycle phases of built assets. For example, in the design and
planning phases, proposed applications of ML include predicting the efficacy of design
simulations based on prior simulations, and mining and classifying historical building data for
extracting implicit design insights (Tamke, Nicholas, and Zwierzycki 2018, Sacks, Girolami,
and Brilakis 2020). For preconstruction phases, researchers proposed ML prototypes that
predict project cost, schedule, profit margins, potentials for disputes, and potentials for
successful tendering (Bilal and Oyedele 2020, Xu et al. 2021, Amer and Golparvar-Fard 2021).
For the construction phase, the existing ML prototypes heavily focused on recognizing and
tracking objects (workers, building components, objects, and equipment) based on imagery,
GPS, or laser scan data in order to predict, monitor, and control project progress, safety hazards,
and as-built conditions (Braun and Borrmann 2019, Kim et al. 2021, Akhavian and Behzadan
2015). For the post-construction phases, the applications of ML have advanced methods of
analyzing built assets for operations, maintenance, and condition assessment, especially for
structural health monitoring and asset inspection (e.g., detecting cracks or damage from
images) and predictive maintenance planning for equipment (Cheng et al. 2020, Akinosho et
With the growing utilization of BIM in design, construction and operations, researchers and
practitioners have been intrigued by the possibility to leverage large BIM datasets or
visualizations in innovative ML applications (Braun and Borrmann 2019, Lomio et al. 2018).
However, this line of research is still in its infancy because of challenges such as incomplete
models, poorly structured or inaccessible file formats, the lack of semantically rich objects, and
data representations incompatible with machine learning algorithms (Sacks, Girolami, and
Brilakis 2020). These challenges limit the feasibility of developing scalable BIM-based ML
applications, and they often necessitate generating new data in addition to existing BIM models
to support ML. For instance, the existing studies that developed ML-based BCRSs require
supplementing BIM data with explicit codes, extensive user input and feedback, or subjective
ratings (Lee et al. 2020, Zhang, Liu, and Al-Hussein 2018). To overcome these challenges and
minimize data and development inefficiencies, this study proposes an innovative approach to
collecting, structuring, and representing historical BIM data to build an ML-backed BCRS
transactions (Agrawal, Imieliński, and Swami 1993). The term transaction refers to a group of
items or entities that have implicit or explicit relationships as they build a meaningful collection
together. For example, this can be a group of items bought together in a shopping cart, or a set
of webpages a user visited in a session. This study applies this concept to BIM processes such
that each technical BIM view (e.g., plan, section, detail, etc.) is considered a transaction
composed of associative BIM objects. ARM mines transactions and discloses the rules that
quantify the strength of association(s) between their items. Each rule takes the form
X→Y, meaning that when item-set X (antecedent) is observed in a transaction, item-set Y
(consequent) tends to be observed as well (Tan et al. 2019, Zhang and Zhang 2003).
Using set theory, let $I = \{i_1, i_2, \dots, i_m\}$ be the set of all unique items that can take place in
a transaction. Let $T = \{t_1, t_2, \dots, t_n\}$ be the set of all distinct transactions in a dataset. Each
transaction $t_j$ ($j = 1, \dots, n$) stores a number of items from $I$. Given the dataset $T$, for the association rule
$X \to Y$, let $X$ and $Y$ be the item-sets that (1) hold one or more items from $I$ ($X \subset I$, $Y \subset I$), (2)
occur together within one or more transactions: $X \cup Y \subseteq t_j$, and (3) do not intersect: $X \cap Y = \emptyset$.
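To make the notation concrete, the following minimal Python sketch represents each BIM view as a frozenset of unique object names (a transaction $t_j$) and checks conditions (1) to (3) for a candidate rule $X \to Y$. The view IDs and their contents are hypothetical toy data that borrow object names from this paper; they are not drawn from the case-study dataset.

```python
# Hypothetical toy data: each BIM view t_j is a transaction (a set of unique BIM objects).
T = {
    "View_A": frozenset({"Bolt - Side", "Weld Symb", "AISC Wide Flange Shapes-Section"}),
    "View_B": frozenset({"Bolt - Side", "Conn Steel Plate-Side"}),
    "View_C": frozenset({"Rebar-Detail-Bar Section", "Anno Break Line - Single"}),
}

# I: the set of all unique items that occur in any transaction.
I = frozenset().union(*T.values())


def is_valid_rule(X, Y, T, I):
    """Check conditions (1)-(3) for a candidate rule X -> Y."""
    nonempty_subsets = bool(X) and bool(Y) and X <= I and Y <= I  # (1) items drawn from I
    co_occurs = any((X | Y) <= t for t in T.values())             # (2) X and Y appear together in some t_j
    disjoint = not (X & Y)                                         # (3) X and Y do not intersect
    return nonempty_subsets and co_occurs and disjoint


X = frozenset({"Bolt - Side"})
Y = frozenset({"Weld Symb"})
print(is_valid_rule(X, Y, T, I))  # True: both objects appear together in View_A
print(is_valid_rule(X, X, T, I))  # False: antecedent and consequent intersect
```

Frozensets are used because they are hashable, so item-sets can later serve as dictionary keys when supports are counted.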
The search space for mining association rules and, therefore, the number of rules extractable
from big datasets are often too large for manual evaluation. The literature has proposed
different computational algorithms to search for frequent item-sets (X and Y). These
algorithms eliminate the need to enumerate all possible item-sets in the data, because the
number of possible item-sets grows exponentially with the number of items in the search space
($m$ items yield $2^m - 1$ non-empty item-sets, so each additional item doubles the count). The
choice of algorithm is highly dependent on the size of the dataset, the complexity of transactions,
and the expected runtime to identify item-sets. A review of the literature shows that the commonly
used algorithms include variations of the classic Apriori
(Agrawal and Srikant 1994) and FP-Growth algorithms (Han, Pei, and Yin 2000). In large and
complex datasets, FP-Growth outperforms the Apriori algorithm in generating the item-sets of
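The sketch below illustrates why these algorithms avoid exhaustive enumeration. It is a minimal, pure-Python Apriori (the classic algorithm named above), not the FP-Growth implementation in MLXTEND that this study used; FP-Growth instead compresses transactions into a prefix tree (FP-tree) and avoids explicit candidate generation. Apriori relies on downward closure: a candidate item-set is counted only if all of its smaller subsets are already frequent. The toy views are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori sketch returning {frozenset: support} for all frequent item-sets."""
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate item-set.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join: combine frequent (k-1)-item-sets into candidate k-item-sets.
        prev = list(level)
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def num_nonempty_itemsets(m):
    """Size of the exhaustive search space over m distinct items: 2^m - 1."""
    return 2 ** m - 1


# Hypothetical views (transactions) built from three object names.
views = [frozenset(v) for v in (
    {"Bolt - Side", "Weld Symb", "Conn Steel Plate-Side"},
    {"Bolt - Side", "Weld Symb"},
    {"Bolt - Side", "Conn Steel Plate-Side"},
    {"Weld Symb", "Conn Steel Plate-Side"},
    {"Bolt - Side", "Weld Symb", "Conn Steel Plate-Side"},
)]
freq = apriori(views, min_support=0.6)
print(len(freq))                 # 6: three single objects and three pairs
print(num_nonempty_itemsets(3))  # 7 item-sets if enumerated exhaustively
```

With a library of more than a thousand objects, $2^m - 1$ is astronomically large, which is why pruning (Apriori) or tree compression (FP-Growth) is necessary.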
After generating item-sets, the ARM model ranks and quantifies their association strength
using probabilistic measures (Zhang and Zhang 2003, Hahsler 2015). The literature has offered
different measures to quantify the strength of association rules. These measures include, but
are not limited to, support, confidence, lift, conviction, leverage, chi-square, and correlation
(Tan et al. 2019). Despite their extensive applications, these measures have different statistical
power and descriptive capacity, and they may even report inconsistent outcomes; therefore,
developers need to evaluate the efficacy and choice of measures for their specific ARM model
development (Tan, Kumar, and Srivastava 2004). In this study, the research methods section
will define and report the efficacy of a set of candidate measures evaluated specifically for the
proposed BCRS.
data for products or concepts relevant to the AEC industry (e.g., building components, spatial
elements and datums, and drafting symbols and annotations). With their predefined, flexible,
and reusable templates, these objects are conducive to the productivity of design and
engineering tasks (Abdirad and Lin 2015, Abdirad and Dossick 2020). Therefore, the
state-of-the-art digital practice is centered on BCMD to develop, standardize, organize, and share BIM
objects across AEC teams (Holzer 2011, Afsari and Eastman 2014). However, because of the
sheer number of BIM objects that firms store in their digital libraries, it takes an average user
40-80 hours annually to locate and load BIM objects into models (UNIFI 2020). This motivated
the development and growth of BCMD tools and BCRSs to promote BIM productivity through
improved search and retrieval of objects. Most commercial tools, however, still rely on
inefficient manual efforts to query or browse libraries based on object classifications or meta-
data (Afsari and Eastman 2014). These systems predominantly employ Symbolic AI to process
the queried search terms using hard-coded reasoning logic (Lee et al. 2020).
Recent BCRSs have applied ML and KDD methods that use object data supplemented with
historical usage information (for collaborative filtering) or custom object features (for content-
based filtering). For instance, using collaborative filtering, Lee et al. (2020) applied
probabilistic matrix factorization to learn from and rank pre-fabricated components based on
the subjective relevancy scores that users assigned to the searched keywords and recommended
objects. Using content-based filtering, Zhang, Liu, and Al-Hussein (2018) trained Neural
Networks to classify lighting-fixtures with their product images and specifications, and to
recommend objects based on new images and requirements users loaded into the system.
Although these state-of-the-art solutions have utilized AI to improve conventional BCMD
workflows, they are still reliant on the collection of heterogeneous data such as explicit user
The foregoing limitations motivated this study to develop an alternative approach to AI-backed
BCRS; this approach uses massive datasets of technical information and their context from
awareness for the proposed BCRS is in that it mitigates the need for (a) collecting heterogeneous
non-BIM data (e.g., user ratings) and (b) requiring active queries or explicit inputs to generate
recommendations. In this development, the technical context entails (a) the type of technical
view within which a user carries out the BIM tasks (e.g., detail view, plan view, section view),
and (b) existing BIM objects that appear on the technical view as potentially associable objects
(e.g., existing steel framing objects could be antecedents of welds or bolts as connection
objects).
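A minimal sketch of how such a technical context could drive recommendations, assuming mined rules are stored with a strength score and tagged with the view type they were mined from. The paper does not report how view type enters the ranking, so the `view_type` filter, the `Rule` structure, the `recommend` function, and all score values below are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # X: objects that must already be on the view
    consequent: frozenset  # Y: objects to recommend
    score: float           # strength under the chosen measure (hypothetical values below)
    view_type: str         # hypothetical: view type the rule was mined from


def recommend(rules, view_type, objects_on_view, top_n=3):
    """Rank candidate objects for the active view from rules whose antecedent is satisfied."""
    best = {}
    for rule in rules:
        if rule.view_type != view_type or not rule.antecedent <= objects_on_view:
            continue
        for item in rule.consequent - objects_on_view:  # never re-suggest placed objects
            best[item] = max(best.get(item, 0.0), rule.score)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(best, key=lambda item: (-best[item], item))[:top_n]


WF = "AISC Wide Flange Shapes-Section"
rules = [
    Rule(frozenset({WF}), frozenset({"Weld Symb"}), 0.9, "SECTION"),
    Rule(frozenset({WF}), frozenset({"Bolt - Side"}), 0.7, "SECTION"),
    Rule(frozenset({"Rebar-Detail-Bar Section"}), frozenset({"Rebar-Detail-Bend-90"}), 0.8, "SECTION"),
    Rule(frozenset({WF}), frozenset({"Bolt - Top"}), 0.95, "PLAN"),
]
print(recommend(rules, "SECTION", frozenset({WF, "Bolt - Side"})))  # ['Weld Symb']
```

Here a steel section already on a section view acts as the antecedent for a weld symbol, mirroring the steel framing example above; the bolt is not suggested because it is already placed, and the plan-view rule is ignored.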
• The conventional BCMD workflows require manual queries for object search and
retrieval. These tools are rigid and inefficient in finding content as they heavily rely on
human experience and cognition to repeatedly filter and target the desired content.
individual objects for content search and retrieval. This approach lacks the intelligence
to learn from exogenous relationships and logical associations between different objects
additional data specifically for serving ML processes (e.g., user feedback and ratings),
the BIM practice has yet to fully benefit from AI to minimize development efforts and
better manage the massive search space for digital content and BIM resources.
• Research and practice have yet to fully explore the potential applications of ARM in
to both descriptive and predictive approaches (Zhang and Zhang 2003). The important
parameters for model building, and finally, building, validating, and implementing
Zhang 2003). The tasks carried out to develop the AI prototype that empowered the BCRS
included: (1) Problem Formulation and Case Study Setting, (2) Data Collection, (3) Data
Cleaning, (4) Data Selection, (5) Data Transformation, (6) Model Development and Rule
Extraction, (7) Pattern Evaluation, (8) Model Deployment and User Interface Design, (9)
Maintenance Tasks, and (10) Knowledge Representation and Usage (Zhang and Zhang 2003).
similarity of their historical preferences as the basis for recommending content (Falk 2019).
However, in professional practice, advanced recommender systems should provide users with
content relevant to standards of practice and technical context in projects. Therefore, for
contents. Accordingly, this study advocates knowledge-based and context-based BCRSs that
promote technical coherence in BIM workflows. However, identifying all useful technical
patterns and explicating them in a BCRS can be infeasible with manual analysis because these
patterns are buried and distributed in large datasets (Zhang and Zhang 2003). Hence, this
research plans to extend the applications of AI and ML to the BCMD domain to automate
knowledge extraction and minimize the inefficiencies associated with manual knowledge
This study develops a hybrid BCRS to accomplish the following objectives: (1) mine big data
analytics of BIM objects from the content used in historical BIM models; (2) measure the
statistical probability for co-occurrence and associations between BIM objects based on their
technical precedents in BIM views; and (3) use these statistics to automatically and
dynamically load and recommend a series of BIM objects to users in live BIM sessions given
the cues from the modeling context and human-computer interactions. Using ARM, this
exploration will analyze the instances of BIM objects shown on each technical view in project
models to discover rules about objects that co-exist. This process explores objects that tend to
be associated together in different technical contexts (e.g., building assemblies shown on BIM
plans, sections, and details) to discover patterns that resemble the following: Technical Views
Which Represented BIM Object X also Represented BIM Object Y. In contrast to the
conventional applications of ARM, this approach can explicate AEC technical knowledge
because BIM objects in project representational views logically associate to generate cohesive
This research used a case-study approach to develop and evaluate the proposed BCRS (Yin
2009). The case-study setting was a structural engineering firm in the United States. The
empirical setting of this case was especially well-suited for this study because (1) as an early
adopter of BIM, this firm had hundreds of completed digital models suitable for data mining;
(2) it had a large portfolio of diverse ongoing projects in different market sectors; (3) it
maintained and utilized a large BIM library with more than 1200 objects; and (4) it had highly
routinized digital workflows to reinforce firm-wide efficiency with the use of standardized
BIM objects. Although some characteristics of BIM data in this research are unique to the case-
study firm (e.g., object names and graphics), the development approach and the sequential
methods presented in this case are highly replicable and transferable to other firms. Therefore,
as an information-oriented case, this study can offer new insights and inform BIM practices on
This research developed a custom software application (using C#) to collect BIM content usage
data from over 400 domestic and international projects within the case-study firm. These
projects used Autodesk Revit (versions 2014 to 2021) to develop data-rich BIM models,
drawings, and documentation. Using Application Programming Interface (API) in Revit, this
custom tool recorded data from over forty-five thousand (45,000) technical views (e.g., plans,
sections, and details). For each view, this application extracted data from the represented BIM
objects, and it captured their identifying information in a list. Put differently, this research
considered each BIM view a transaction of multiple associative BIM objects (Figure 2). The
authors stored the initial dataset in a JSON format for further data analysis in Python with data
MLXTEND, Plotly, Matplotlib, Seaborn). Table 1 presents a snippet of the initial dataset.
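As an illustration of the JSON-to-Python step, the sketch below parses one record per view. The field names echo those reported with Table 1 ("ViewID", "ViewType", "Transaction List"), but the exact file layout, the second record, and the `load_views` helper are assumptions made for illustration.

```python
import json

# Hypothetical export: one record per view. Field names echo those reported for Table 1,
# but the file layout itself is an assumption.
raw = """
[
  {"ViewID": "View_1_101", "ViewType": "SECTION",
   "Transaction List": ["Rebar-Detail-Bar Section", "Rebar-Detail-Bar Section",
                        "Anno Break Line - Single"]},
  {"ViewID": "View_X", "ViewType": "DRAFTING", "Transaction List": []}
]
"""


def load_views(text):
    """Parse view records into {ViewID: (ViewType, list of object names)}."""
    return {
        rec["ViewID"]: (rec["ViewType"], rec["Transaction List"])
        for rec in json.loads(text)
    }


views = load_views(raw)
for view_id, (view_type, objects) in views.items():
    print(view_id, view_type, len(objects))
```

The "Transaction List" may contain repeated instances of the same object; deduplication into sets happens later, during data transformation.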
Transaction_Set_View_23744 = {'AISC Tube Shapes-Section', 'Grid Head - Circle - Detail', 'Rebar-Detail-Bend-Z',
    'Bolt - Side', 'AISC Wide Flange Shapes-Top', 'Weld Studs-Side', 'Squiggle Symb', 'Rebar-Detail-Hook 135',
    'Conn Steel Shear-Elevation', 'AISC Tube Shapes-Side', 'AISC Wide Flange Shapes-Side', 'Conn Steel Plate-Side',
    'Bolt - Top', 'Anno Break Line - Single', 'Weld Symb', 'AISC Angle Shapes-Section',
    'AISC Wide Flange Shapes-Section', 'Level Head - Target - Detail', 'Rebar Symb - Hooked - 90',
    'Rebar Symb - Bend - 90', 'Rebar-Detail-Bar Section'}
Figure 2 – Extracting BIM Object Information as a Transaction from BIM Views
The initial dataset contained (a) noisy data of non-standard and highly custom BIM objects
used in a small fraction of projects, (b) variations in the name of standard objects as BIM
libraries and implementation procedures evolved from 2014 to 2021, and (c) miscellaneous
BIM contents that did not represent technical BIM objects (e.g., text notes). The data cleaning
process, therefore, involved standardizing data from BIM objects and BIM views and filtering
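A minimal sketch of the kind of name standardization and filtering described above. The alias entry and the non-object name are hypothetical placeholders: the case-study firm's actual renaming history and filtering lists are not reported.

```python
# Hypothetical mapping from a legacy object name to the current standard name.
ALIASES = {"W-Wide Flange-Section": "AISC Wide Flange Shapes-Section"}
# Hypothetical names of non-object content (e.g., text notes) to drop.
NON_OBJECTS = {"Text Note"}


def clean_objects(names):
    """Normalize whitespace, map legacy names to standard ones, and drop non-objects."""
    cleaned = []
    for name in names:
        name = " ".join(name.split())   # collapse stray and trailing whitespace
        name = ALIASES.get(name, name)  # unify renamed library objects
        if name not in NON_OBJECTS:
            cleaned.append(name)
    return cleaned


print(clean_objects(["W-Wide Flange-Section ", "Text Note", "Weld Symb"]))
```

Normalizing names before mining matters because two spellings of one object would otherwise be counted as two unrelated items, splitting their support.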
Table 1 – Snippets of the Cleaned Dataset

| ViewID | ViewType | BIM View Title | Transaction Set (Unique BIM Objects) | Set Length |
|---|---|---|---|---|
| View_1_10 | SECTION | BEAM - BEAM MOMENT CONNECTION | {'Bolt - Head-Side', 'AISC Wide Flange Shapes-Section', 'Weld Symb', 'Gusset Plate-Rectangular-Elevation', 'AISC Wide Flange Shapes-Side', 'Anno Break Line - Single', 'Conn Beam Cope'} | 7 |
| View_1_101 | SECTION | CONCRETE COLUMN SPIRAL REINFORCING | {'Level Head - Target - Detail', 'Rebar-Detail-Bar Section', 'Anno Section - Detail', 'Anno Break Line - Single'} | 4 |
| View_1_10121 | SECTION | SECTION | {'Rebar-Detail-Bar Section', 'Level Head - Target - Detail', 'Rough Joint Segment', 'Rebar-Detail-Stirrup Closed', 'Rebar-Embed End', 'PT Detail Anchor-Section', 'Rebar-Detail-Bend-90', 'Anno Extent-Line - Detail', 'Anno Break Line - Single'} | 9 |
| View_1_10128 | SECTION | TRELLIS POST ANCHORAGE | {'Bolt - Side', 'Squiggle Symb', 'Conn Steel Plate-Side', 'Anno Section - Detail', 'AISC Wide Flange Shapes-Section', 'Weld Symb', 'Anno Break Line - Single'} | 7 |
| View_1_1019 | SECTION | ELEVATOR RAIL SUPPORT POST | {'Bolt - Head-Side', 'Bolt - Side', 'Anno Section - Detail', 'Level Head - Target - Detail', 'Weld Symb', 'Anno Break Line - Single'} | 6 |

* Supplementary data collected for each BIM view but not shown here include: “ProjectName”, “ProjectPath”, “Revit FileName”, “RevitVersion”, “Sheet Name”, “View Revit ID”, “View GUID”, “Transaction List” (where multiple instances of BIM objects may be listed – not unique), and “List Length”.
+ The final dataset (after data cleaning and data selection) included 30,320 rows. Due to space limitations, this table presents only a few transaction sets from BIM views.
4.4. Data Selection
The preliminary exploration of BIM views collected for this study revealed the
importance of informed data selection for ARM. This analysis showed that (a) some
views represented only text with no BIM object (e.g., general notes, specifications), (b)
some views represented fewer than two unique BIM objects, and (c) some drafted views
(e.g., independent CMU wall and drywall sections drawn together). Accordingly, from
(where fewer than two objects were used) or potentially implied incorrect associations
single view). Therefore, this research selected those views that used two or more BIM
objects that logically associate in construction assemblies. In this process, the authors
selected 30,320 technical BIM views for further analysis and ARM. The number of BIM
objects these views represented ranged between 2 and 50 (i.e., set length as shown in
Table 1). Figure 3 shows the distribution of views and the number of BIM objects they
* The Boxplot Shows the Interquartile Range (IQR), i.e., the Middle 50% of the Values
+ The Curve Shows the Kernel Density Estimate (KDE)
Figure 3 – Distribution of Count of BIM Objects Represented in BIM Views
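The first two selection criteria, dropping text-only views and views with fewer than two unique BIM objects, can be expressed as a simple filter. This is a sketch over hypothetical views; the third criterion (unrelated assemblies drawn on one view) depended on judgment about construction logic and is not reproduced here.

```python
def select_views(views, min_unique=2):
    """Keep views whose object list holds at least `min_unique` distinct BIM objects."""
    return {
        view_id: objects
        for view_id, objects in views.items()
        if len(set(objects)) >= min_unique
    }


# Hypothetical views: object lists may repeat instances of the same object.
views = {
    "notes_only": [],                                     # text-only view, no BIM objects
    "single_obj": ["Weld Symb", "Weld Symb"],             # one unique object, repeated
    "connection": ["Bolt - Side", "Weld Symb", "Bolt - Side"],
}
print(sorted(select_views(views)))  # ['connection']
```

Counting distinct objects, rather than list length, matters here: a view with two instances of the same object cannot yield any association between different objects.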
4.5. Data Transformation
This research considered each BIM view a transaction list of multiple associative
BIM objects. As each view may represent multiple instances of an object (e.g., multiple
'AISC Wide Flange Shapes-Section' in Figure 2), the data transformation process
involved converting lists to immutable transaction sets of unique BIM objects (Table 1).
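The transformation can be sketched in plain Python as below. MLXTEND's frequent-pattern functions expect a one-hot encoded table (the library provides `TransactionEncoder` for this); the `one_hot` helper here is a hypothetical stand-in that shows the same idea without pandas, not the authors' code.

```python
def to_transaction_set(object_list):
    """Drop repeated instances of the same object and freeze the result (hashable)."""
    return frozenset(object_list)


def one_hot(transactions):
    """Encode transaction sets as boolean rows over a sorted vocabulary of objects."""
    columns = sorted(set().union(*transactions))
    rows = [[item in t for item in columns] for t in transactions]
    return columns, rows


view = ["AISC Wide Flange Shapes-Section", "AISC Wide Flange Shapes-Section",
        "Bolt - Side", "Weld Symb"]
t = to_transaction_set(view)
print(len(view), len(t))  # 4 3  (list length vs. set length)

columns, rows = one_hot([t, frozenset({"Weld Symb"})])
print(columns)
print(rows)
```

The set length produced here corresponds to the "Set Length" column in Table 1, while the original list length corresponds to the "List Length" field.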
frequent item-sets across transaction sets, and (2) quantifying the strength or likelihood
of associations between each pair of frequent item-sets (Zhang and Zhang 2003). The authors
randomly split the dataset into training data (80% of views) and test data (20% of views).
This facilitated building the ARM model on the training data and validating the predictive
capacity of extracted association rules on the test data. To generate the item-sets of
interest from the training dataset (80% of views, i.e., 24,256 views), this research used
the MLXTEND module in Python (Raschka 2018) with the FP-Growth algorithm to process the
large and complex dataset of technical BIM contents. FP-Growth performs faster than
alternative algorithms (e.g., the Apriori algorithm) in large datasets (Han, Pei, and Yin 2000).
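A minimal sketch of an 80/20 random split over the 30,320 selected views. The seed is arbitrary and the `split_views` helper is hypothetical; the paper does not report how its split was seeded.

```python
import random


def split_views(view_ids, train_frac=0.8, seed=0):
    """Randomly partition view IDs into training and test lists (seed is arbitrary)."""
    ids = list(view_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_frac)
    return ids[:n_train], ids[n_train:]


train, test = split_views(range(30_320))
print(len(train), len(test))  # 24256 6064
```

Splitting at the level of whole views keeps each transaction intact, so the test data can check whether rules learned from training views predict objects in unseen views.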
The classic ARM models use the general support-confidence framework to evaluate the
strength of association rules. The extant literature has found this framework inadequate
in many applications (Zhang and Zhang 2003), and therefore, explored and offered a
and Jorge 2007). These measures often evaluate the statistical correlation, probabilistic
definitions and formulas, these measures have different statistical power and predictive
capacity. Therefore, this study evaluates the efficacy of twelve relevant measures to
identify a measure that offers the highest predictive capacity for the BCRS in this study.
Table 2 outlines the definitions and formulas for these measures. This paper considered
both null-invariant and null-variant measures (Wu, Chen, and Han 2007) to account for
the possibility that many transactions may not represent certain BIM objects.
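To show how the measures defined in Table 2 relate to one another, the sketch below computes several of them from raw counts for a single rule $X \to Y$. The counts are hypothetical, the `rule_measures` helper is illustrative only, and chi-squared is omitted; it is not the authors' evaluation code. Leverage subtracts $\mathrm{supp}(X)\,\mathrm{supp}(Y)$, the co-occurrence rate expected under independence.

```python
import math


def rule_measures(n, n_x, n_y, n_xy):
    """Strength measures for X -> Y from counts: n transactions, n_x containing X,
    n_y containing Y, and n_xy containing both."""
    supp_x, supp_y, supp_xy = n_x / n, n_y / n, n_xy / n
    conf = supp_xy / supp_x
    return {
        "support": supp_xy,
        "confidence": conf,
        "lift": conf / supp_y,
        "leverage": supp_xy - supp_x * supp_y,
        # Conviction is unbounded when X never occurs without Y (confidence = 1).
        "conviction": math.inf if conf == 1 else (1 - supp_y) / (1 - conf),
        "phi": (supp_xy - supp_x * supp_y)
        / math.sqrt(supp_x * supp_y * (1 - supp_x) * (1 - supp_y)),
    }


# Hypothetical counts: 100 views, X in 20, Y in 40, both in 16.
for name, value in rule_measures(100, 20, 40, 16).items():
    print(f"{name}: {value:.4f}")
```

With these counts, confidence is 0.8 and lift is 2.0: Y appears in 80% of the views containing X, twice its 40% base rate.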
To manage computational performance for the analysis and the BCRS, the FP-Growth
algorithm generated frequent item-sets that occurred in at least 0.1 percent of the BIM
views (i.e., minimum support for X or Y in the association rules = 0.001). With this
criterion, this process retrieved 18,740 frequent item-sets from the dataset. Figure 4 shows
Figure 4 – Top 20 BIM Objects Found in the Dataset
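As a worked example of the support threshold, assuming support is computed over the 24,256 training views and that an item-set qualifies when its support is at least the threshold, a minimum support of 0.001 corresponds to the following absolute count (using the $\lvert X(t) \rvert$ notation of Table 2):

```latex
\min \lvert X(t) \rvert = \lceil 0.001 \times 24{,}256 \rceil = \lceil 24.256 \rceil = 25 \text{ views}
```

That is, under these assumptions, an item-set must appear in at least 25 training views to be retained as frequent.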
After generating frequent item-sets, the authors calculated the strength of associations
between all combinations of the item-sets using the measures reported in Table 2. This
analysis returned 672,156 combinations, on the assumption that the strength of X → Y
may not equal the strength of Y → X because some measures are directional (asymmetric).
Table 3 presents a snippet of the extracted rules and the values of their strength.
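Why both directions are kept can be seen in a small sketch that enumerates every rule from a set of frequent item-sets. The supports are hypothetical, and `rules_from_itemsets` is an illustrative helper, not the procedure that produced the 672,156 combinations reported above.

```python
from itertools import combinations


def rules_from_itemsets(supports):
    """Enumerate every rule X -> Y (X, Y non-empty, disjoint, X | Y = item-set) from a
    dict of frequent item-set supports; downward closure guarantees the subsets exist."""
    rules = []
    for itemset, supp_xy in supports.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                X = frozenset(antecedent)
                Y = itemset - X
                conf = supp_xy / supports[X]
                rules.append((X, Y, conf, conf / supports[Y]))  # (X, Y, confidence, lift)
    return rules


# Hypothetical supports for two objects and their pair.
supports = {
    frozenset({"Bolt - Side"}): 0.30,
    frozenset({"Weld Symb"}): 0.60,
    frozenset({"Bolt - Side", "Weld Symb"}): 0.24,
}
rules = rules_from_itemsets(supports)
for X, Y, conf, lift in rules:
    print(f"{set(X)} -> {set(Y)}: confidence={conf:.2f}, lift={lift:.2f}")
```

For this pair, confidence differs by direction (0.80 for bolt → weld, 0.40 for weld → bolt) while lift is identical, which is why directed measures require evaluating X → Y and Y → X separately. An item-set of size k yields $2^k - 2$ such rules.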
Table 2 – Measures to Evaluate the Strength of Association Rules

| Measures | Definition | Formulas | Interpretive Limitations |
|---|---|---|---|
| Support | Frequency of item-set X appearing in all transactions (range: [0,1]). Frequency of X → Y, i.e., item-sets X and Y appearing together in transactions (Agrawal, Imieliński, and Swami 1993). | $\mathrm{supp}(X) = \frac{\lvert X(t) \rvert}{\lvert T \rvert}$, $\mathrm{supp}(X \to Y) = \frac{\lvert \{X \cup Y\}(t) \rvert}{\lvert T \rvert}$ | Subject to the rare item problem when the distribution is uneven, the minimum support criterion is too low, and support is the sole criterion. |
| Confidence | Frequency of item-set Y appearing in transactions that contain X (range: [0,1]) (Agrawal, Imieliński, and Swami 1993). | $\mathrm{conf}(X \to Y) = \frac{\mathrm{supp}(X \to Y)}{\mathrm{supp}(X)}$ | It is a directed measure. It is highly sensitive to supp(Y). It is not null-invariant*. |
| Lift | The rate at which X and Y occur together relative to what would be expected if they were statistically independent (range: [0,∞]). A lift value of 1 suggests that X and Y are independent (Brin et al. 1997). | $\mathrm{lift}(X \to Y) = \frac{\mathrm{conf}(X \to Y)}{\mathrm{supp}(Y)}$ | Sensitive to small databases, wherein item-sets with low support may co-occur a few times by chance and generate high lift values. It is not null-invariant*. |
| Leverage | Also known as the Piatetsky-Shapiro measure. The difference between the observed frequency of X and Y occurring together and the frequency expected if X and Y were statistically independent (range: [−1,1]). A leverage value of 0 suggests independence (Piatetsky-Shapiro 1991). | $\mathrm{levg}(X \to Y) = \mathrm{supp}(X \to Y) - \mathrm{supp}(X)\,\mathrm{supp}(Y)$ | Similar to Lift. It is not null-invariant*. |
| Conviction | The rate at which X would occur without Y if X and Y were independent, divided by the actual frequency of X occurring without Y (range: [0,∞]). A conviction value of 1 indicates independence (Brin et al. 1997). | $\mathrm{conv}(X \to Y) = \frac{1 - \mathrm{supp}(Y)}{1 - \mathrm{conf}(X \to Y)}$ | It is a directed measure. It is not null-invariant*. |

* Null invariance means that the value does not change with the number of null transactions (transactions that contain neither X nor Y).
Table 2 (continued) – Measures to Evaluate the Strength of Association Rules
Correlation Coefficient
  Definition: Covariance between the two items divided by the product of their standard deviations (range: [−1,1]). A value of 0 suggests independence. Also known as the Phi correlation (Tan, Kumar, and Srivastava 2004).
  Formula: $\mathrm{corcoef}(X \to Y) = \frac{\mathrm{supp}(X \to Y) - \mathrm{supp}(X)\,\mathrm{supp}(Y)}{\sqrt{\mathrm{supp}(X)\,\mathrm{supp}(Y)\,\left(1 - \mathrm{supp}(X)\right)\left(1 - \mathrm{supp}(Y)\right)}}$
  Interpretive limitations: Measure of linear interdependency. It is not null-invariant*.
Chi-Squared
  Definition: Test of independence between the antecedent and the consequent. Larger values indicate stronger relationships (range: [0,∞]) (Brin, Motwani, and Silverstein 1997).
  Formula: $\mathrm{chi}(X \to Y) = \frac{\left(\mathrm{supp}(X \to Y) - \frac{\mathrm{supp}(X)\,\mathrm{supp}(Y)}{|T|}\right)^{2}}{\frac{\mathrm{supp}(X \to Y)}{|T|}}$
  Interpretive limitations: Sensitive to infrequent occurrences of item-sets in large datasets. It is not null-invariant*.
Cosine
  Definition: Modified cosine similarity measure for association rule mining (range: [0,1]; 0.5 means no correlation) (Tan, Kumar, and Srivastava 2004).
  Formula: $\mathrm{cos}(X \to Y) = \frac{\mathrm{supp}(X \to Y)}{\sqrt{\mathrm{supp}(X)\,\mathrm{supp}(Y)}}$
  Interpretive limitations: Null-invariant* measure of correlation.
Jaccard
  Definition: Jaccard similarity between the two sets of transactions that contain the items in X and Y (range: [0,1]) (Tan, Kumar, and Srivastava 2004).
  Formula: $\mathrm{jac}(X \to Y) = \frac{\mathrm{supp}(X \to Y)}{\mathrm{supp}(X) + \mathrm{supp}(Y) - \mathrm{supp}(X \to Y)}$
  Interpretive limitations: Null-invariant* measure of dependence.
Klosgen
  Definition: Test of independence between the antecedent and the consequent (range: [−1,1]). A value of 0 suggests independence (Tan, Kumar, and Srivastava 2004).
  Formula: $\mathrm{klosgen}(X \to Y) = \sqrt{\mathrm{supp}(X \to Y)}\,\left(\mathrm{conf}(X \to Y) - \mathrm{supp}(Y)\right)$
  Interpretive limitations: Null-invariant* measure of dependence.
Kulczynski
  Definition: Measure of the interestingness of relationships (range: [0,1]). A Kulczynski value of 0.5 suggests a neutral or uninteresting relationship (Wu, Chen, and Han 2010).
  Formula: $\mathrm{Kulc}(X \to Y) = \frac{\mathrm{conf}(X \to Y) + \mathrm{conf}(Y \to X)}{2}$
  Interpretive limitations: Null-invariant* measure with a preference for skewed patterns.
Loevinger
  Definition: Also known as the Certainty Factor (CF). It measures how far the probability that Y is in a transaction that contains X departs from the baseline probability of Y (range: [−1,1]). A Loevinger value of 0 suggests independence (Berzal et al. 2002).
  Formula: $\mathrm{Loev}(X \to Y) = \frac{\mathrm{conf}(X \to Y) - \mathrm{supp}(Y)}{1 - \mathrm{supp}(Y)}$
  Interpretive limitations: Null-invariant* measure of dependence.
* Null invariance means that the value does not change with the number of null transactions (transactions that contain neither X nor Y).
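The Table 2 formulas translate directly into code. The sketch below computes eleven of the twelve measures from the three supports of a rule; chi-squared is left out because it also depends on the transaction count |T|, and conf(Y → X), needed for Kulczynski, is derived from the same supports. The function name and dictionary keys are hypothetical, and the sketch assumes 0 < supp(X), supp(Y) < 1 so that no denominator is zero; conviction is returned as infinity when conf(X → Y) = 1.

```python
import math


def rule_measures(supp_x, supp_y, supp_xy):
    """Compute the Table 2 measures (except chi-squared) for X -> Y from
    supp(X), supp(Y), and supp(X -> Y), all given as fractions in [0, 1]."""
    conf_xy = supp_xy / supp_x
    conf_yx = supp_xy / supp_y
    leverage = supp_xy - supp_x * supp_y
    return {
        "support": supp_xy,
        "confidence": conf_xy,
        "lift": conf_xy / supp_y,
        "leverage": leverage,
        # Conviction is unbounded when X is always accompanied by Y.
        "conviction": math.inf if conf_xy == 1 else (1 - supp_y) / (1 - conf_xy),
        "correlation": leverage / math.sqrt(
            supp_x * supp_y * (1 - supp_x) * (1 - supp_y)),
        "cosine": supp_xy / math.sqrt(supp_x * supp_y),
        "jaccard": supp_xy / (supp_x + supp_y - supp_xy),
        "klosgen": math.sqrt(supp_xy) * (conf_xy - supp_y),
        "kulczynski": (conf_xy + conf_yx) / 2,
        "loevinger": (conf_xy - supp_y) / (1 - supp_y),
    }
```

For example, `rule_measures(0.915, 0.370, 0.350)` scores the first rule listed in Table 3.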
Table 3 – Snippets of Association Rules for Select BIM Objects
X (Antecedent) | Y (Consequent) | Support X | Support Y | (1) Support X → Y | (2) Confidence X → Y | (3) Lift X → Y | (4) Leverage X → Y
Anno Break Line - Single | Weld Symb | 0.915 | 0.370 | 0.350 | 0.380 | 1.030 | 0.010
Weld Symb | Conn Steel Plate-Side | 0.366 | 0.160 | 0.130 | 0.350 | 2.120 | 0.070
AISC Wide Flange Shapes-Side | Weld Symb | 0.136 | 0.370 | 0.100 | 0.770 | 2.100 | 0.050
AISC Tube Shapes-Side | Weld Symb | 0.098 | 0.370 | 0.080 | 0.830 | 2.280 | 0.050
Rebar-Detail-Bar Section | Rebar-Detail-Hook 135 | 0.236 | 0.110 | 0.050 | 0.200 | 1.840 | 0.020
PT Detail Anchor-Section | Rebar-Detail-Bar Section | 0.019 | 0.240 | 0.010 | 0.770 | 3.280 | 0.010
Rebar-Detail-Stirrup Open | Rebar-Detail-Bend-90 | 0.082 | 0.060 | 0.010 | 0.160 | 2.740 | 0.010
CMU-Section | CMU-Bond Beam-Section | 0.019 | 0.010 | 0.010 | 0.550 | 49.790 | 0.010
Table 3 (continued), measures for the same rules (X → Y): (5) Conviction | (6) Cosine | (7) Jaccard | (8) Klosgen | (9) Kulczynski | (10) Chi-Squared | (11) Correlation Coefficient | (12) Loevinger
+ This dataset includes 672,156 rules; this study did not assume bidirectionality in the rules. Due to space limitations, this table presents only a few rules.
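As a worked check of the Table 2 formulas against Table 3, the first rule (Anno Break Line - Single → Weld Symb) gives the following values, with lift written in its equivalent form supp(X → Y) / (supp(X) supp(Y)):

```latex
\begin{aligned}
\mathrm{conf}(X \to Y) &= \frac{\mathrm{supp}(X \to Y)}{\mathrm{supp}(X)} = \frac{0.350}{0.915} \approx 0.383 \\
\mathrm{lift}(X \to Y) &= \frac{\mathrm{supp}(X \to Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)} = \frac{0.350}{0.915 \times 0.370} \approx 1.03 \\
\mathrm{levg}(X \to Y) &= \mathrm{supp}(X \to Y) - \mathrm{supp}(X)\,\mathrm{supp}(Y) = 0.350 - 0.915 \times 0.370 \approx 0.011
\end{aligned}
```

The small differences from the tabulated 0.380 and 0.010 presumably reflect rounding of the reported supports.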
4.7. Pattern Evaluation Using BIM Process Simulation
This research used the test dataset (20% of views, i.e., 6064 views) to simulate forty-eight
(48) potential configurations of a BCRS that repeatedly offered BIM objects to a
virtual user in a BIM implementation process (sequentially picking and placing objects
capacity of the twelve (12) association measures (Table 2) by providing the user with N
recommendations (N = 24, 30, 36, or 42; i.e., four settings for the count of
Considering each view in the test data as a transaction set (Figure 2), this simulation
analyzed the percentage of objects that the BCRS accurately predicted and presented to
the user to assemble the transaction set. This simulation carried out the following steps
(1) Assuming that the BIM view is empty when a user begins the modeling process,
the system recommends the top N most frequently used BIM objects to the user.
(2) If there is no overlap (or intersection) between the N recommendations and the
transaction set needed for the BIM view, the simulation stops and the efficacy for
(3) If there is overlap (or intersection) between the N recommendations and the
transaction set of the BIM view, the user selects an object needed in the
transaction set.
(4) Considering the previously selected object(s) as antecedents (X), the BCRS
presents the top N consequents (Y) with the highest association strength (based
(5) Repeat steps 3 and 4 until (a) there is no overlap (or intersection) between the
recommendations and the transaction set of the BIM view, or (b) all objects
needed for assembling the transaction set are selected and used.
(6) Calculate the efficacy of the BCRS by dividing the number of appropriately
selected BIM objects by the number of BIM objects needed in the transaction set.
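Steps (1) to (6) can be sketched for a single view as follows. This is a minimal Python illustration, not the authors' implementation: `recommend` stands for any ranking of consequents by an association measure, the user is assumed to pick the highest-ranked needed object, and "overlap" is interpreted as overlap with needed objects not yet placed. The pairwise recommender, which scores each candidate Y by its best rule x → Y over the selected objects, is a hypothetical simplification of using the whole selection as the antecedent X.

```python
def simulate_view(needed, top_frequent, recommend, n):
    """Return the efficacy of steps (1)-(6) for one BIM view.

    needed: objects required by the view (its transaction set).
    top_frequent: objects ranked by support, offered while the view is empty.
    recommend(selected, n): returns up to n consequents ranked by a measure.
    """
    needed = set(needed)
    selected = []
    offered = list(top_frequent[:n])  # step (1): empty view, top-N by support
    while True:
        # Steps (2)/(5a): stop when no offered object is still needed.
        hits = [obj for obj in offered if obj in needed and obj not in selected]
        if not hits:
            break
        selected.append(hits[0])  # step (3): assume the top-ranked hit is picked
        if len(selected) == len(needed):  # step (5b): view fully assembled
            break
        offered = recommend(selected, n)  # step (4): selection acts as antecedents
    return len(selected) / len(needed)  # step (6)


def make_pairwise_recommender(strength):
    """Hypothetical recommender: score each candidate y by its best rule
    strength[(x, y)] over the selected objects x, and return the top n."""
    def recommend(selected, n):
        scores = {}
        for (x, y), value in strength.items():
            if x in selected and y not in selected:
                scores[y] = max(value, scores.get(y, float("-inf")))
        return sorted(scores, key=scores.get, reverse=True)[:n]
    return recommend
```

For example, with rules a → b (0.9) and b → c (0.8), a view needing {a, b, c} whose first object a is among the most frequent objects is fully assembled, for an efficacy of 1.0. Averaging `simulate_view` over all test views for one measure and one N would yield a single configuration's mean efficacy of the kind summarized in Table 4.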
Table 4 shows the efficacy of the simulated recommender system with 48 configurations.
The results show that Kulczynski has the highest average efficacy and the lowest standard
deviation (and hence variance) among the measures. With this measure, when N = 30 or more,
the simulation predicted and included, on average, 80% or more of the required BIM
objects in the sequential recommendations. Figure 5 visualizes these results with
boxplots, depicting the clear advantage that Kulczynski has over the alternative measures.
The authors used a one-way analysis of variance (ANOVA) with Type II sums of squares,
fitted by ordinary least squares, to test for statistically significant differences in efficacy
among the measures (Park 2009). These analyses showed significant F-statistics (p-value < 0.001)
and rejected the null hypothesis that all measures have similar average efficacy in the
BCRS. The pairwise comparison of measures using Tukey’s Honestly Significant
Difference test (HSD; Abdi and Williams 2010) confirmed that Kulczynski had statistically
significant differences in efficacy when compared to the other measures (p-value < 0.001;
see Table 5). Therefore, the authors selected Kulczynski as the proper measure that
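For reference, the one-way ANOVA F-statistic can be computed directly; with a single factor (the measure), Type II sums of squares reduce to the ordinary between-group sum of squares. The sketch assumes each group holds the per-view efficacy values of one measure at a fixed N, and the function name is hypothetical. In practice, `scipy.stats.f_oneway` returns the same statistic together with its p-value, and `scipy.stats.tukey_hsd` (SciPy 1.8 or later) performs the pairwise Tukey HSD comparison.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of samples (lists of numbers)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: group size times squared deviation of group mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: squared deviations from each group's own mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within)
```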
It is important to note that this study did not apply precision and recall as model
evaluation metrics because these measures are better suited to supervised ML methods,
wherein the outcome of each prediction is often treated as binary. In ARM, as an
the authors developed and implemented a custom simulation-based evaluation method.
Table 4 – Statistics for the Efficacy of the 48 Simulated Configurations using the Test Dataset
Efficacy for N recommendations (each cell: Mean / Median / STD)
Measure | N = 24 | N = 30 | N = 36 | N = 42
Chi-Squared | 0.415 / 0.333 / 0.246 | 0.435 / 0.375 / 0.250 | 0.476 / 0.444 / 0.243 | 0.510 / 0.500 / 0.241
Confidence | 0.518 / 0.500 / 0.248 | 0.531 / 0.500 / 0.251 | 0.550 / 0.500 / 0.257 | 0.569 / 0.500 / 0.258
Conviction | 0.554 / 0.500 / 0.244 | 0.574 / 0.500 / 0.246 | 0.596 / 0.500 / 0.250 | 0.620 / 0.600 / 0.248
Correl-Coeff | 0.565 / 0.500 / 0.242 | 0.584 / 0.500 / 0.245 | 0.608 / 0.600 / 0.243 | 0.632 / 0.625 / 0.244
Cosine | 0.534 / 0.500 / 0.251 | 0.560 / 0.500 / 0.258 | 0.584 / 0.500 / 0.264 | 0.614 / 0.571 / 0.266
Jaccard | 0.521 / 0.500 / 0.248 | 0.540 / 0.500 / 0.253 | 0.561 / 0.500 / 0.258 | 0.589 / 0.500 / 0.262
Klosgen | 0.535 / 0.500 / 0.248 | 0.548 / 0.500 / 0.250 | 0.571 / 0.500 / 0.254 | 0.588 / 0.500 / 0.256
Kulczynski | 0.743 / 0.777 / 0.209 | 0.802 / 0.860 / 0.195 | 0.841 / 0.910 / 0.181 | 0.873 / 0.967 / 0.171
Leverage | 0.541 / 0.500 / 0.250 | 0.571 / 0.500 / 0.248 | 0.591 / 0.500 / 0.249 | 0.621 / 0.600 / 0.249
Lift | 0.415 / 0.333 / 0.246 | 0.435 / 0.375 / 0.250 | 0.475 / 0.444 / 0.243 | 0.510 / 0.500 / 0.241
Loevinger | 0.554 / 0.500 / 0.244 | 0.574 / 0.500 / 0.246 | 0.595 / 0.500 / 0.250 | 0.620 / 0.600 / 0.248
Support | 0.556 / 0.500 / 0.265 | 0.580 / 0.500 / 0.270 | 0.608 / 0.500 / 0.275 | 0.639 / 0.667 / 0.276
Table 5 – Tukey’s HSD Test of Significance for Differences in Mean Efficacy for ARM Measures
Values are absolute differences in mean efficacy. The upper triangular matrix shows results for N = 24; the lower triangular matrix shows results for N = 30.
ARM Measure | Support | Correl-Coeff | Lift | Conviction | Loevinger | Leverage | Cosine | Jaccard | Klosgen | Confidence | Kulczynski | Chi-Squared
Support | - | 0.009** | 0.142** | 0.002 | 0.002 | 0.016** | 0.022** | 0.035** | 0.022** | 0.038** | 0.187** | 0.141**
Correl-Coeff | 0.004 | - | 0.151** | 0.011** | 0.011** | 0.025** | 0.031** | 0.044** | 0.03** | 0.047** | 0.178** | 0.15**
Lift | 0.145** | 0.148** | - | 0.139** | 0.139** | 0.126** | 0.12** | 0.107** | 0.12** | 0.103** | 0.328** | 0.0
Conviction | 0.006 | 0.01** | 0.139** | - | 0.0 | 0.013** | 0.02** | 0.033** | 0.019** | 0.036** | 0.189** | 0.139**
Loevinger | 0.006 | 0.01** | 0.139** | 0.0 | - | 0.013** | 0.02** | 0.033** | 0.019** | 0.036** | 0.189** | 0.139**
Leverage | 0.009** | 0.013** | 0.136** | 0.003 | 0.003 | - | 0.006 | 0.019** | 0.006 | 0.023** | 0.202** | 0.126**
Cosine | 0.02** | 0.024** | 0.125** | 0.014** | 0.014** | 0.011** | - | 0.013** | 0.001 | 0.016** | 0.209** | 0.119**
Jaccard | 0.04** | 0.044** | 0.105** | 0.034** | 0.034** | 0.031** | 0.02** | - | 0.014** | 0.003 | 0.222** | 0.106**
Klosgen | 0.031** | 0.035** | 0.113** | 0.025** | 0.025** | 0.022** | 0.011** | 0.009** | - | 0.017** | 0.208** | 0.12**
Confidence | 0.049** | 0.052** | 0.096** | 0.042** | 0.043** | 0.039** | 0.028** | 0.009** | 0.017** | - | 0.225** | 0.103**
Kulczynski | 0.223** | 0.219** | 0.367** | 0.229** | 0.228** | 0.232** | 0.243** | 0.263** | 0.254** | 0.271** | - | 0.328**
Chi-Squared | 0.144** | 0.148** | 0.0 | 0.138** | 0.139** | 0.135** | 0.124** | 0.104** | 0.113** | 0.096** | 0.367** | -
Significance of differences in means: * p-value < 0.01; ** p-value < 0.001
+ Pairwise post-hoc comparison of measures carried out after the ANOVA with significant F-statistics (p-value < 0.001)
Figure 5 – Visualization for the Efficacy of the 48 Simulated Configurations using the Test Dataset
Legend: median and mean lines. * Colored boxes show interquartile ranges (IQR), i.e., the middle 50% of the values. + ANOVA and Tukey’s HSD confirmed statistically significant differences between Kulczynski and the other measures.
Since BIM views may contain different counts of objects in project settings (Figure 3), the
authors measured the average efficacy of association measures for views of different
complexity (in terms of the count of BIM objects). Figure 6 visualizes the results for Kulczynski,
Support, Correlation-Coefficient, and Conviction (as the top candidate measures). This
analysis further confirmed that Kulczynski has a clear advantage over the other measures.
Figure 6 also shows that the efficacy of the BCRS remains relatively consistent between 80%
The simulations showed that changing the number of recommendations (N) can make up
to a 13-percentage-point difference in the average efficacy of Kulczynski (from 0.743 at N = 24
to 0.873 at N = 42; Table 4). The choice of N in this development is important not only for the
BCRS’s efficacy but also for the user experience, as a relatively small interface must accommodate
N objects for presentation to the user. After consultation with BIM experts in the case-study firm,
the authors deployed the BCRS with N = 30 to keep the interface concise yet effective (>80%
efficacy). The following section
Figure 6 – The Efficacy of Simulated Configurations across BIM Views of Different Complexity in the Test Dataset (for clarity, only the top 4 candidate measures are presented)
4.8. Model Deployment and User Interface Design
The simulation of BIM implementation tasks (sequentially placing objects on a view)
showed that Kulczynski (with N = 30 or more) is a practical measure to support the proposed
BCRS. Therefore, the authors used the final ARM model (Table 3) as the backbone data of a
custom Revit add-in. Using the Revit API and C#, the authors programmed the add-in as an
IExternalApplication to (a) create a Graphical User Interface (GUI) to customize the user
experience in Revit, and (b) trigger context-aware events when users open a BIM view (using
the ViewActivated event), update content in the view (using the DocumentChanged event), or select
The BCRS add-in works as a dockable pane attached to the user interface in Revit (Figure 7;
Part 2). This pane is composed of two sections: (1) AI Object Recommender (Figure 7; Part
2.A), and (2) Logical Object Browser (Figure 7; Part 2.B). The AI Object Recommender is
responsive to the BIM implementation context and continually updates recommendations when
users activate a BIM view, change its content, or select an object (i.e., reading from
objects with pre-saved thumbnail images. Revisiting the ARM model (Table 3), the objects
represented in the active view are antecedents (Figure 7; Part 1) and the recommended objects
are their consequents (Figure 7; Part 2.A). To improve the relevancy of recommendations, this
study used separate ARM models for different view types (e.g., Elevations, Sections, Details,
and Plans; see ViewType in Table 1). The Logical Object Browser is responsive to the
objects the user selects in the active view or in the AI Object Recommender, and it lists logical
alternatives when parametric BIM objects are built to accommodate two or more alternatives.
For instance, as shown in Figure 7, selecting the Standard C-shaped Steel Channel in Part 2.A
generated the list of alternative designations (e.g., C12x30, C9x15) in Part 2.B.
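The lookup behind the AI Object Recommender might be organized as below. This Python sketch only illustrates the data flow (one rule table per view type, antecedents taken from the active view and the user's selection, top-N consequents ranked by Kulczynski); the actual add-in was written in C# against the Revit API. The class name, the nested-dict rule format, the max-score aggregation across antecedents, the exclusion of objects already present, and the fallback to the most frequent objects for an empty view (mirroring step (1) of the simulation) are all assumptions.

```python
from collections import defaultdict


class ViewTypeRecommender:
    """Hypothetical lookup layer: one rule table per view type, each mapping
    antecedent -> {consequent: Kulczynski score}."""

    def __init__(self, rules_by_view_type, top_frequent_by_view_type, n=30):
        self.rules = rules_by_view_type
        self.fallback = top_frequent_by_view_type
        self.n = n

    def recommend(self, view_type, objects_in_view, selected=()):
        rules = self.rules.get(view_type, {})
        antecedents = set(objects_in_view) | set(selected)
        if not antecedents:
            # Empty view: offer the most frequent objects for this view type.
            return list(self.fallback.get(view_type, []))[: self.n]
        scores = defaultdict(float)
        for x in antecedents:
            for y, score in rules.get(x, {}).items():
                if y not in antecedents:  # assumption: skip objects already present
                    scores[y] = max(scores[y], score)
        return sorted(scores, key=scores.get, reverse=True)[: self.n]
```

In the add-in, such a call would presumably be re-issued from the ViewActivated and DocumentChanged handlers so that the pane refreshes with the modeling context.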
Figure 7 – Dockable GUI for the BCRS with Revit User Interface.
Part 1: Active View Shows Technical View in BIM and BIM Objects Used as Antecedents; Part 2: BCRS Add-in.
Part 2.A: AI Object Recommender Showing Consequents; Part 2.B: Logical Object Browser Showing Applicable Logical Alternatives to a Selected Element.
4.9. Maintenance Tasks for the Recommender System
Regular maintenance is indispensable for the proposed BCRS because the standards of
engineering practice, BIM standards, and BIM platforms change over time. Table 6 lists the
maintenance tasks that the authors identified during the prototype development and
Table 6 – Planned Maintenance Tasks in Response to Changes in BIM Implementation Ecosystem
Changes in BIM Implementation Ecosystem | Maintenance Tasks
Standards of Design and Engineering Practice | Regularly collect data from BIM views in ongoing and future BIM projects to capture and mine the latest building assemblies and associations of BIM objects.
users, who create and maintain digital building models in projects, and (2) BIM managers, who
standardize BIM workflows and libraries for efficiency and consistency of practice across
projects. For BIM users, the interface of the BCRS represents knowledge regarding associable
BIM objects given the technical context at hand in projects (Figure 7). For BIM managers, the
ARM output (e.g., Table 3) can offer knowledge regarding (a) the most frequently used BIM
objects (ranking item-sets by support in descending order), (b) the least frequently used BIM
objects (ranking item-sets by support in ascending order), and (c) the strongly associated BIM
objects (ranking item-sets by the strength of ARM measures). BIM managers can translate this
knowledge into actionable insights on potential changes to BIM content, standards, and
processes. For instance, BIM managers may create a new set of composite content that combines
strongly associated BIM objects (e.g., embedding the ‘Rebar Coupler’ object into a parametric
‘Rebar’ object to facilitate showing/hiding embedded objects instead of maintaining and using
two different objects). BIM managers may also pre-load a set of the most frequently used BIM
objects into standard project templates to mitigate the need to constantly access digital BIM
libraries and wait for objects to load. BIM managers may also exclude a set of the least
frequently used BIM objects from standard project templates to keep the size of BIM models
manageable and avoid an unfavorable user experience (e.g., sluggish BIM platforms when
working with large models). Therefore, knowledge of the state of BIM object usage across
projects can inform BIM managers about priority tasks in maintaining BIM content libraries
and project artifacts.
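The rankings that BIM managers would use can be read off the ARM output with a few lines of Python. The function names are hypothetical, and the input formats (a support value per frozenset item-set; a score per directed rule) are assumptions. Note that the "least frequently used" list can only contain objects that passed the minimum-support threshold, because objects below it never enter the frequent item-sets.

```python
def usage_report(supports, k=3):
    """Return the k most and k least frequent single objects by support.

    supports: {frozenset(item-set): support}, e.g., the frequent item-sets.
    """
    singles = {next(iter(s)): v for s, v in supports.items() if len(s) == 1}
    ranked = sorted(singles.items(), key=lambda kv: kv[1], reverse=True)
    most = ranked[:k]
    least = ranked[::-1][:k]  # ascending by support
    return most, least


def strongest_rules(rule_scores, k=3):
    """Return the k rules (x, y) with the highest association score."""
    return sorted(rule_scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```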
an experiment in the case-study firm. In this experiment, the authors invited five BIM users to
create a sample of 50 BIM views randomly selected from the dataset. To ensure these views
captured the conventional BIM tasks in the firm, the sampling targeted the interquartile range
for the complexity of views (i.e., showing 8 to 20 BIM objects; Figure 3).
The experiment used the One-Group Pretest-Posttest design (Campbell and Stanley 2015),
wherein the users first created BIM views without using the BCRS (pretest) and then re-created
the same BIM views using the BCRS (posttest). The goal was to time the users, calculate the
average time savings achieved by using the BCRS, and determine whether there was a
statistically significant difference between the time spent on the pretest and the posttest. The
posttest was carried out after a 5-minute demonstrative introduction to the BCRS’s GUI, since it
was the users’ first exposure to this tool. To control for the effect of user characteristics and
maturation (e.g., industry experience, technical knowledge, and familiarity with BIM objects),
the authors invited users with seven or more years of experience in BIM modeling at the firm.
To minimize the effect of history, instrumentation, and testing, the pretest and posttest for each
BIM view were carried out one week apart in a controlled office setting at the same hour of the
day using the same BIM platform for all users (Campbell and Stanley 2015). The total time
spent on the pretest was 14.35 hours and the total time spent on the posttest was 12.7 hours (~27 hours
To analyze the experimental data, the authors carried out a paired-sample t-test to compare
work on BIM views with and without the recommender system. This analysis showed that
using the BCRS saved, on average, 15 percent of the active time spent on the BIM views.
Furthermore, in this sample, there was a significant difference in the average time spent on
BIM views with the BCRS (mean = 15.1 minutes, STD = 11.8) and without the BCRS (mean
= 17.2 minutes, STD = 11.35) in the t-test (mean difference = 2.1 minutes; t(49) = 7.66; p < 0.001). In
summary, these results show that, by dynamically predicting and presenting associable BIM
objects, this BCRS can improve the efficiency of conventional BIM workflows with
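The paired-sample t statistic is computed from the per-view differences rather than from the two conditions' standard deviations, which is why a 2.1-minute mean difference can be highly significant even though both conditions have standard deviations above 11 minutes. A minimal sketch follows; the function name is hypothetical, and `scipy.stats.ttest_rel` gives the same statistic along with a p-value.

```python
import math
import statistics


def paired_t(before, after):
    """Paired-sample t statistic and degrees of freedom for before - after."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1
```

Working backward from the reported values, t(49) = 7.66 with a mean difference of 2.1 minutes implies a standard deviation of the per-view differences of roughly 2.1 × √50 / 7.66 ≈ 1.9 minutes.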
6. Discussion
Despite the exponential growth of the literature on BIM applications in the AEC industry,
research on BCRSs is still in its infancy. This section discusses the results and significance of
this study in comparison to the narrow body of literature on this topic. This study developed
an innovative AI-backed BCRS to bridge some critical development gaps in recent BCMD
solutions and BCRSs. First, the proposed prototype mines BIM content association data from
existing projects to mitigate the need for collecting heterogeneous non-BIM data. This marks a
departure from existing solutions that require subjective user ratings (e.g., Lee et al. 2020) or
product images (Zhang, Liu, and Al-Hussein 2018) to inform the AI-engine on the relevancy
or importance of BIM objects. Second, the proposed prototype automatically reads the
technical context of live BIM sessions (e.g., type of technical view; existing objects in a BIM
model) and offers recommendations accordingly. Recent solutions, however, rely on manual
user queries (e.g., search strings; Lee et al. 2020) or other inputs (e.g., uploading images;
Zhang, Liu, and Al-Hussein 2018) as the basis to trigger the AI-engine and generate
recommendations. Third, by using ARM as the ML method, the proposed prototype is
completely unsupervised, is computationally efficient and inexpensive for model building, and
does not require iterative development cycles, manual exploration, or fine-tuning of
hyperparameters. In addition to the streamlined development process and improved user
experience, the simulation and experimental implementation of the proposed prototype achieved
significant efficacy (>80% of the required objects predicted in the simulation) and efficiency
(saving 15% of work-hours). Although the existing research to date has not yet provided similar
statistics to facilitate a direct quantitative comparison of the prototypes (Lee et al. 2020; Zhang,
Liu, and Al-Hussein 2018), the promising results in this study show that BCMD can benefit from
future AI developments to improve the user experience and efficiency of BIM implementation
in AEC domains.
7. Conclusion
This research developed and implemented a context-aware, AI-backed BCRS in a case-study
engineering firm to minimize the manual efforts commonly associated with the
development and use of existing BCRSs. In particular, the proposed BCRS leveraged big data
from over 30,000 technical BIM views to build an unsupervised ARM model and explicate the
strength of associative relationships among BIM objects. This BCRS dynamically provides
users with a set of BIM objects that are probabilistically associable with cues from the modeling
context (e.g., view type, existing objects in the model) and human-computer interactions in live
BIM sessions (e.g., objects selected by the user). Based on the simulation carried out on over
6000 actual BIM views, the BCRS achieved an efficacy of over 80% in predicting the sought-after
BIM objects needed for sequentially creating the views. The field experiment of the BCRS
in the case-study firm showed that it can save, on average, 15% of the time spent on conventional
This study makes noteworthy contributions to research and development in engineering
informatics, especially BIM processes and technologies. First, this study offers an innovative
approach to learning from historical BIM data for BCMD. This approach leverages coherent
technical BIM precedents while mitigating the need for actively collecting heterogeneous data
such as user similarity, personalized ratings, or popularity of content to serve the recommender
system. Second, this study demonstrates the application of a BCRS in live BIM sessions, wherein
the recommender system can automatically respond to human-computer interactions and the
technical context in models. Earlier prototypes predominantly require recurring input from users
to offer recommendations. With these contributions, this study emphasizes that
recommender systems must align with the nature of the data, users, tasks, and work domain to
be effective (Falk 2019). Therefore, the authors caution BIM practitioners against merely
adopting conventional e-commerce recommender systems for such technical tasks as
Although the research and development methods presented in this case study can be replicated
in and transferred to other cases, the generalizability of the achieved efficacy and time savings is
subject to certain limitations. For instance, BIM standards and workflows may vary significantly
across firms and across AEC disciplines (e.g., the number of BIM objects maintained in
BIM libraries and used in projects). Furthermore, since this BCRS relies on BIM precedents,
it is biased toward the available data from historical cases. Therefore, similar to other
recommender systems, this BCRS may not account for new BIM objects or new digital
building assemblies that AEC firms introduce into their design and engineering practice. As
discussed earlier in Section 4.9, recurring maintenance tasks must be carried out to address such
Funding
This research did not receive any specific grant from funding agencies in the public,
8. References
Abdi, Hervé, and Lynne J Williams. 2010. "Tukey’s honestly significant difference (HSD) test." Encyclopedia of research design 3 (1):1-5.
Abdirad, H., and C. Dossick. 2020. "Rebaselining Asset Data for Existing Facilities and Infrastructure." Journal of Computing in Civil Engineering 34 (1):05019004. doi: 10.1061/(ASCE)CP.1943-5487.0000868.
Abdirad, H., and Kenyu Lin. 2015. "Advancing in Object-Based Landscape Information Modeling: Challenges and Future Needs." 2015 International Workshop on Computing in Civil Engineering, Austin, TX.
Afsari, Kereshmeh, and Chuck Eastman. 2014. "Categorization of building product models in BIM Content Library portals." Blucher design proceedings 1 (8).
Agrawal, Rakesh, Tomasz Imieliński, and Arun Swami. 1993. "Mining association rules between sets of items in large databases." Proceedings of the 1993 ACM SIGMOD international conference on Management of data.
Agrawal, Rakesh, and Ramakrishnan Srikant. 1994. "Fast algorithms for mining association rules." Proc. 20th int. conf. very large data bases, VLDB.
Akhavian, Reza, and Amir H. Behzadan. 2015. "Construction equipment activity recognition for simulation input modeling using mobile sensors and machine learning classifiers." Advanced Engineering Informatics 29 (4):867-877. doi: https://doi.org/10.1016/j.aei.2015.03.001.
Akinosho, Taofeek D., Lukumon O. Oyedele, Muhammad Bilal, Anuoluwapo O. Ajayi, Manuel Davila Delgado, Olugbenga O. Akinade, and Ashraf A. Ahmed. 2020. "Deep learning in the construction industry: A review of present status and future innovations." Journal of Building Engineering 32:101827. doi: https://doi.org/10.1016/j.jobe.2020.101827.
Amer, Fouad, and Mani Golparvar-Fard. 2021. "Modeling dynamic construction work template from existing scheduling records via sequential machine learning." Advanced Engineering Informatics 47:101198. doi: https://doi.org/10.1016/j.aei.2020.101198.
Azevedo, Paulo J, and Alípio M Jorge. 2007. "Comparing rule measures for predictive association rules." European Conference on Machine Learning.
Berzal, Fernando, Ignacio Blanco, Daniel Sánchez, and Maria-Amparo Vila. 2002. "Measuring the accuracy and interest of association rules: A new framework." Intelligent Data Analysis 6 (3):221-235.
Bilal, Muhammad, and Lukumon O. Oyedele. 2020. "Guidelines for applied machine learning in construction industry—A case of profit margins estimation." Advanced Engineering Informatics 43:101013. doi: https://doi.org/10.1016/j.aei.2019.101013.
Braun, Alex, and André Borrmann. 2019. "Combining inverse photogrammetry and BIM for automated labeling of construction site images for machine learning." Automation in Construction 106:102879. doi: https://doi.org/10.1016/j.autcon.2019.102879.
Brin, Sergey, Rajeev Motwani, and Craig Silverstein. 1997. "Beyond market baskets: Generalizing association rules to correlations." Proceedings of the 1997 ACM SIGMOD international conference on Management of data.
Brin, Sergey, Rajeev Motwani, Jeffrey D Ullman, and Shalom Tsur. 1997. "Dynamic itemset counting and implication rules for market basket data." Proceedings of the 1997 ACM SIGMOD international conference on Management of data.
Campbell, Donald T, and Julian C Stanley. 2015. Experimental and quasi-experimental designs for research: Ravenio Books.
Chen, Meng-Hui, Chin-Hung Teng, and Pei-Chann Chang. 2015. "Applying artificial immune systems to collaborative filtering for movie recommendation." Advanced Engineering Informatics 29 (4):830-839. doi: https://doi.org/10.1016/j.aei.2015.04.005.
Cheng, Jack C. P., Weiwei Chen, Keyu Chen, and Qian Wang. 2020. "Data-driven predictive maintenance planning framework for MEP components based on BIM and IoT using machine learning algorithms." Automation in Construction 112:103087. doi: https://doi.org/10.1016/j.autcon.2020.103087.
Chollet, François. 2017. Deep learning with Python. Shelter Island: Manning Publications.
Dayan, Aviram, Guy Katz, Naseem Biasdi, Lior Rokach, Bracha Shapira, Aykan Aydin, Roland Schwaiger, and Radmila Fishel. 2011. "Recommenders benchmark framework." Proceedings of the fifth ACM conference on Recommender systems.
Falk, Kim. 2019. Practical Recommender Systems. Shelter Island: Manning Publications.
Flach, Peter. 2012. Machine learning: the art and science of algorithms that make sense of data: Cambridge University Press.
Flyvbjerg, B. 2001. Making Social Science Matter: Why Social Inquiry Fails and How it Can Succeed Again: Cambridge University Press.
39
359 Hahsler, Michael. 2015. "A probabilistic comparison of commonly used interest measures
360 for association rules." accessed 2020/12/20.
361 http://michael.hahsler.net/research/association_rules/measures.html.
362 Han, Jiawei, Jian Pei, and Yiwen Yin. 2000. "Mining frequent patterns without candidate
363 generation." ACM sigmod record 29 (2):1-12.
364 Holzer, Dominik. 2011. "BIM's seven deadly sins." International journal of architectural
365 computing 9 (4):463-480.
366 Huang, M. Q., J. Ninić, and Q. B. Zhang. 2021. "BIM, machine learning and computer
367 vision techniques in underground construction: Current status and future perspectives."
368 Tunnelling and Underground Space Technology 108:103677. doi:
369 https://doi.org/10.1016/j.tust.2020.103677.
370 Kim, Min-Koo, Julian Pratama Putra Thedja, Hung-Lin Chi, and Dong-Eun Lee. 2021.
371 "Automated rebar diameter classification using point cloud data based machine
372 learning." Automation in Construction 122:103476. doi:
373 https://doi.org/10.1016/j.autcon.2020.103476.
374 Larose, D. T. 2015. Data Mining and Predictive Analytics, Wiley Series on Methods and
375 Applications in Data Mining: Wiley.
376 Lee, Pin-Chan, Danbing Long, Bo Ye, and Tzu-Ping Lo. 2020. "Dynamic BIM component
377 recommendation method based on probabilistic matrix factorization and grey model."
378 Advanced Engineering Informatics 43:101024. doi:
379 https://doi.org/10.1016/j.aei.2019.101024.
380 Lomio, F., R. Farinha, M. Laasonen, and H. Huttunen. 2018. "Classification of Building
381 Information Model (BIM) Structures with Deep Learning." 2018 7th European
382 Workshop on Visual Information Processing (EUVIP), 26-28 Nov. 2018.
383 McCabe, Brenda, M. AbouRizk, and Randy Goebel. 1998. "Belief Networks for
384 Construction Performance Diagnostics." Journal of Computing in Civil Engineering
385 12 (2):93-100. doi: 10.1061/(ASCE)0887-3801(1998)12:2(93).
Mythili, M. S., and A. R. Mohamed Shanavas. 2013. "Performance evaluation of Apriori and FP-growth algorithms." International Journal of Computer Applications 79 (10).
Park, Hun Myoung. 2009. "Comparing group means: t-tests and one-way ANOVA using Stata, SAS, R, and SPSS." The University Information Technology Services (UITS) Center for Statistical and Mathematical Computing, Indiana University, IN, USA.
Piatetsky-Shapiro, Gregory. 1991. "Discovery, analysis, and presentation of strong rules." Knowledge Discovery in Databases:229-238.
Raschka, Sebastian. 2018. "MLxtend: providing machine learning and data science utilities and extensions to Python's scientific computing stack." Journal of Open Source Software 3 (24):638.
Sacks, Rafael, Mark Girolami, and Ioannis Brilakis. 2020. "Building Information Modelling, Artificial Intelligence and Construction Tech." Developments in the Built Environment 4:100011. doi: https://doi.org/10.1016/j.dibe.2020.100011.
Taghavi, Mona, Jamal Bentahar, Kaveh Bakhtiyari, and Chihab Hanachi. 2018. "New Insights Towards Developing Recommender Systems." The Computer Journal 61 (3):319-348. doi: https://doi.org/10.1093/comjnl/bxx056.
Tamke, Martin, Paul Nicholas, and Mateusz Zwierzycki. 2018. "Machine learning for architectural design: Practices and infrastructure." International Journal of Architectural Computing 16 (2):123-143. doi: https://doi.org/10.1177/1478077118778580.
Tan, P. N., M. Steinbach, A. Karpatne, and V. Kumar. 2019. Introduction to Data Mining. Essex: Pearson.
Tan, Pang-Ning, Vipin Kumar, and Jaideep Srivastava. 2004. "Selecting the right objective measure for association analysis." Information Systems 29 (4):293-313.
Tzonis, A., and I. White. 2012. Automation Based Creative Design - Research and Perspectives. Amsterdam: Elsevier Science.
UNIFI. 2020. "Reasons You Need a BIM Content Management System." Digital Built Environment Institute, accessed 2020/12/12. https://www.dbei.org/news/top-3-reasons-you-need-a-bim-content-management-system/.
Wilson, O. D., K. Sharpe, and R. Kenley. 1987. "Estimates given and tenders received: a comparison." Construction Management and Economics 5 (3):211-226. doi: https://doi.org/10.1080/01446198700000021.
Wu, Tianyi, Yuguo Chen, and Jiawei Han. 2007. "Association mining in large databases: A re-examination of its measures." In European Conference on Principles of Data Mining and Knowledge Discovery.
Wu, Tianyi, Yuguo Chen, and Jiawei Han. 2010. "Re-examination of interestingness measures in pattern mining: a unified framework." Data Mining and Knowledge Discovery 21 (3):371-397.
Xu, Yayin, Ying Zhou, Przemyslaw Sekula, and Lieyun Ding. 2021. "Machine learning in construction: From shallow to deep learning." Developments in the Built Environment 6:100045. doi: https://doi.org/10.1016/j.dibe.2021.100045.
Yin, R. K. 2009. Case Study Research: Design and Methods. Thousand Oaks: SAGE Publications.
Zhang, C., and S. Zhang. 2003. Association Rule Mining: Models and Algorithms. Berlin: Springer Berlin Heidelberg.
Zhang, Yuxuan, Hexu Liu, and Mohamed Al-Hussein. 2018. "Recommender System for Improving BIM Efficiency: An Interior Finishing Case Study." In Construction Research Congress 2018, 22-32.
Zhou, Zhipeng, Yang Miang Goh, and Lijun Shen. 2016. "Overview and Analysis of Ontology Studies Supporting Development of the Construction Industry." Journal of Computing in Civil Engineering 30 (6):1-14. doi: https://doi.org/10.1061/(ASCE)CP.1943-5487.0000594.