

Artificial intelligence for BIM content management and delivery: Case study
of association rule mining for construction detailing

Article in Advanced Engineering Informatics · October 2021


DOI: 10.1016/j.aei.2021.101414





Abdirad, H., and Mathur, P. (2021). "Artificial intelligence for BIM content management and
delivery: Case study of association rule mining for construction detailing." Advanced
Engineering Informatics, 50, 101414. https://doi.org/10.1016/j.aei.2021.101414

Artificial Intelligence for BIM Content Management and Delivery:
Case Study of Association Rule Mining for Construction Detailing
Hamid Abdirad, Ph.D. (a)
Pegah Mathur, Ph.D. Candidate (b)

(a) DPR Construction; hamida@dpr.com (Corresponding Author)
315 2nd Ave S Ste. 200, Seattle, WA 98104, United States of America

(b) North Carolina State University; pmathur2@ncsu.edu
50 Pullen Rd, Raleigh, NC 27607, United States of America

Abstract
The proliferation of Building Information Modeling (BIM) applications in design and

engineering practice, in tandem with the extensive variation of building products, has posed

new demands on firms to efficiently manage and reuse BIM content (i.e., data-rich parametric

model objects and assembly details). Tasks such as classifying objects, indexing them with

meta-data (e.g., category), and searching digital libraries to load objects into models still plague

practice with inefficient manual workflows. This research aims to improve the productivity of

BIM content management and retrieval by developing an AI-backed BIM content

recommender system. Using data from a case-study firm, this research extracted content from

over 30,000 technical BIM views (e.g., plans, sections, details) in historical projects to build

an unsupervised machine-learning prototype with association rule mining. This prototype

explicated the strength of relationships among co-occurring BIM objects. Using this prototype

as the backbone AI-engine in live BIM sessions, this research developed a context-aware

recommender system that dynamically provides BIM users with a set of objects associable with

their modeling context (e.g., type of view, existing objects in the model) and human-computer

interactions (e.g., objects selected by the user). By mining association data from hundreds of

historical projects, this development marks a departure from the existing prototypes that rely

on explicit coding, recurring user input, or subjective ratings to recommend BIM content to

users. The simulation and experimental implementation of this recommender system yielded

high efficacy in predicting content needs and significant time savings in BIM workflows.

Keywords

BIM, Content Management, Recommender Systems, Machine Learning, Association Rules

1. Introduction
Architectural, Engineering, and Construction (AEC) practices accumulate large quantities

of technical information in their day-to-day work (Zhou, Goh, and Shen 2016). This

information describes technical characteristics of buildings with symbolic and iconic means

(text and graphics) through heterogeneous physical or digital media (Tzonis and White 2012).

The uptake of digital media, particularly Building Information Modeling (BIM) technologies,

engendered new opportunities and challenges to represent technical building information. BIM

supports creating, customizing, and re-using data-rich parametric “objects” to drive

geometries, graphics, and information of building components in digital models and drawings.

However, the extensive variation of building products and the growing expectations for digital

information delivery in projects have imposed new demands on designers and engineers to

create, standardize, maintain, and reuse thousands of object templates in their digital BIM

libraries. In particular, industry estimates show that an average BIM user spends 40-80

hours annually (2 to 4 percent of total working hours) just to search these extensive libraries to

locate and load objects into BIM models (UNIFI 2020). This issue motivated the development

of BIM Content Management and Delivery (BCMD) tools to facilitate organizing, searching,

and loading BIM content (with cloud repositories, digital libraries, add-on user interfaces, and

search engines). Despite these advances in BCMD tools, the emergent solutions still plague

BIM workflows with manual efforts such as: (a) classifying objects and organizing folders

based on custom categories or taxonomies; (b) indexing and registering objects in search

engines with hard-coded meta-data on their functional or technical properties; and (c) querying

or browsing BCMD tools to locate, pick, and load relevant objects into project models.

Given the significance of current BCMD inefficiencies in design and engineering, this

study aims to develop an artificially intelligent BIM Content Recommender System (BCRS)

to minimize the inefficiencies associated with manual content search and retrieval. This BCRS

uses Association Rule Mining (ARM) as an unsupervised Machine Learning (ML) approach

to accomplish the following objectives: (1) mine big data analytics of BIM objects from the

content used in historical BIM models; (2) measure the statistical probability for co-occurrence

and associations between BIM objects based on their technical precedents in BIM views (e.g.,

sections, details); and (3) use these statistics to automatically and dynamically load and

recommend a series of BIM objects to users in live BIM sessions given the cues from the

modeling context and human-computer interactions. Put differently, the goal for this prototype

is to learn from and make recommendations based on the association of BIM objects that

formed technically sound information in historical models and drawings.

The simulation and experimental implementation of the proposed BCRS in a case study

engineering firm achieved 80% efficacy in predicting content needs and saved 15% of the time

spent on conventional BIM workflows. This state-of-the-art dynamic BCRS marks a departure

from the existing solutions that commonly rely on explicit coding, extensive user input and

feedback, or subjective ratings (Lee et al. 2020). With significant predictive power and process

efficiency, lessons learned from this BCRS development can make theoretical and empirical

contributions to BIM-enabled engineering informatics and inform the future of AI-backed BIM

contents, processes, and platforms.

2. Background
2.1. Artificial Intelligence for Recommender Systems
Recommender systems provide users with appropriate content based on data from

characteristics of users, contents, or their historical interactions (Falk 2019). These systems are

widely implemented in digital artifacts that consumers use for e-commerce, entertainment, and

knowledge management (Lee et al. 2020). Although the literature offers different taxonomies

to dissect recommender systems, it is common to characterize recommender systems based on

their data sourcing and content filtering approaches (Taghavi et al. 2018). These approaches

include (a) collaborative-filtering, (b) content-based filtering, and (c) hybrid filtering.

Collaborative filtering systems rely on the patterns of content usage in similar past scenarios

(e.g., based on content usage by similar peer users; based on ratings by other users). Content-

based filtering systems analyze and match the attributes and characteristics of contents to those

sought by the active user. Hybrid filtering systems (also known as Ensemble Systems) use a

combination of collaborative and content-based filtering to tap into massive historical datasets

as well as detailed descriptions of contents and users to make recommendations (Chen, Teng,

and Chang 2015, Falk 2019). Depending on the knowledge domain, hybrid systems may utilize

supplementary context-based information (e.g., real-time user behavior or technical reasoning)

to improve recommendations (Dayan et al. 2011).

The core computational backbone of recommender systems is the algorithm or model that

analyzes input data and filters content to predict what a user needs (Taghavi et al. 2018). The

uptake of Artificial Intelligence (AI) has engendered and advanced many algorithms and

models for recommender systems (Falk 2019). The early approaches to AI automation, also

known as Symbolic AI, involved explicitly hard-coding human knowledge, tasks, and logical

rules in digital artifacts to generate outputs. These approaches - commonly used in expert

systems and case-based reasoning prototypes - failed to address complex problems with large

datasets where it was infeasible for humans to build in-depth knowledge, identify rules, or

explicate logics. This motivated the development of Machine-Learning (ML) methods and

Knowledge Discovery in Databases (KDD), whereby computers automatically learn rules and

build knowledge using data mining techniques (Figure 1). In fact, rather than being explicitly

programmed, ML models are trained with example cases to learn and implement rules for such

tasks as making recommendations (Francois 2017, Falk 2019). Therefore, knowledge in ML is

data-driven while knowledge in Symbolic AI is human-driven (Zhang and Zhang 2003).

[Figure 1 diagram: Symbolic AI systems take input data together with explicitly coded rules and conditions, and produce output answers and predictions. Machine-learning systems take input data together with example output answers, and produce learned rules and conditions.]

Figure 1 – Paradigm Shift in AI to Use Machine-Learned Knowledge (Francois 2017)

The analytic spectrum of ML models ranges from descriptive to predictive approaches.

Descriptive models learn rules and identify characteristics of existing datasets to inform

practice, while predictive models apply the learned rules on new datasets to predict their

characteristics. The learning process in these models can be either supervised or unsupervised.

With supervised ML models, previously seen outputs from example cases and their known

features (e.g., observed classification of objects or regressed numeric variables) drive

description and prediction of features in new data. With unsupervised ML models, previously

unknown features and outputs (e.g., discriminant characteristics, association of objects, clusters

of objects) are directly derived from data (Zhang and Zhang 2003, Flach 2012). Among the

unsupervised ML models, Association Rule Mining (ARM) has gained popularity because of

its diverse applications, clear representation, and relevancy to both descriptive and predictive

approaches. ARM identifies the strength of relationships between objects that frequently occur

together in large datasets (Zhang and Zhang 2003). This research builds an ARM model that

supports a context-aware BCRS. The goal in this prototype is to mine existing BIM datasets

and identify associations of BIM objects that occur together in technical engineering

representations. These rules will inform the BCRS to monitor human-computer interactions in

live BIM sessions and recommend relevant BIM objects to users accordingly.

2.2. Applications of ML in the AEC Industry
Despite the significant progress in the development and diffusion of ML applications in

many high-tech sectors, the AEC industry has been notoriously slow and lagging in exploring

and adopting ML in business settings (Xu et al. 2021). With its unique institutional and business

context, the fragmented nature of this industry imposes many practical, commercial, cultural,

and organizational challenges onto the empirical investigation and real-world adoption of ML

(Sacks, Girolami, and Brilakis 2020). In fact, only in the past several years has the research

gained momentum to revisit and build on the early examples of ML in the AEC domains

(McCabe, AbouRizk, and Goebel 1998, Wilson, Sharpe, and Kenley 1987).

With this increasing momentum, recent studies proposed a growing number of ML applications

(ranging from supervised to unsupervised, and descriptive to predictive) to improve AEC

practice in different project and life-cycle phases of built assets. For example, in the design and

planning phases, the proposed applications of ML include predicting the efficacy of design

alternatives based on historical precedents, reducing the duration of building performance

simulations based on prior simulations, and mining and classifying historical building data for

extracting implicit design insights (Tamke, Nicholas, and Zwierzycki 2018, Sacks, Girolami,

and Brilakis 2020). For preconstruction phases, researchers proposed ML prototypes that

predict project cost, schedule, profit margins, potentials for disputes, and potentials for

successful tendering (Bilal and Oyedele 2020, Xu et al. 2021, Amer and Golparvar-Fard 2021).

For the construction phase, the existing ML prototypes heavily focused on recognizing and

tracking objects (workers, building components, objects, and equipment) based on imagery,

GPS, or laser scan data in order to predict, monitor, and control project progress, safety hazards,

and as-built conditions (Braun and Borrmann 2019, Kim et al. 2021, Akhavian and Behzadan

2015). For the post-construction phases, the applications of ML have advanced methods of

analyzing built assets for operations, maintenance, and condition assessment, especially for

structural health monitoring and asset inspection (e.g., detecting cracks or damage from

images) and predictive maintenance planning for equipment (Cheng et al. 2020, Akinosho et

al. 2020, Huang, Ninić, and Zhang 2021).

With the growing utilization of BIM in design, construction and operations, researchers and

practitioners have been intrigued by the possibility to leverage large BIM datasets or

visualizations in innovative ML applications (Braun and Borrmann 2019, Lomio et al. 2018).

However, this line of research is still in its infancy because of challenges such as incomplete

models, poorly structured or inaccessible file formats, the lack of semantically rich objects, and

data representations incompatible with machine learning algorithms (Sacks, Girolami, and

Brilakis 2020). These challenges limit the feasibility of developing scalable BIM-based ML

applications, and they often necessitate generating new data in addition to existing BIM models

to support ML. For instance, the existing studies that developed ML-based BCRS require

supplementing BIM data with explicit codes, extensive user input and feedback, or subjective

ratings (Lee et al. 2020, Zhang, Liu, and Al-Hussein 2018). To overcome these challenges and

minimize data and development inefficiencies, this study proposes an innovative approach to

collecting, structuring, and representing historical BIM data to build an ML-backed BCRS

prototype with Association Rule Mining (ARM).

2.3. Association Rule Mining


ARM is an unsupervised ML method used to discover hidden relationships in distinct

transactions (Agrawal, Imieliński, and Swami 1993). The term transaction refers to a group of

items or entities that have implicit or explicit relationships as they build a meaningful collection

together. For example, this can be a group of items bought together in a shopping cart, or a set

of webpages a user visited in a session. This study applies this concept to BIM processes such

that each technical BIM view (e.g., plan, section, detail) is considered a transaction

composed of associative BIM objects. ARM mines transactions and discloses the rules that

quantify the strength of association(s) between their items. Each rule is defined in the form

X→Y, meaning that when the item X (antecedent) is observed in a transaction, item Y

(consequent) is also observed (Tan et al. 2019, Zhang and Zhang 2003).

Using set theory, let $I = \{i_1, i_2, \dots, i_m\}$ be the set of all unique items that can take place in

a transaction. Let $T = \{t_1, t_2, \dots, t_j\}$ be the set of all distinct transactions in a dataset. Each

transaction $t_j$ stores a number of items from $I$. Given the dataset $T$, for the association rule

$X \rightarrow Y$, let $X$ and $Y$ be the item-sets that (1) hold one or more items from $I$ ($X \subset I$, $Y \subset I$), (2)

occur within one or more transactions: $(X \cup Y) \subseteq t_j$, and (3) do not intersect: $X \cap Y = \emptyset$.
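The three conditions above can be checked mechanically. The following is a minimal Python sketch, not the authors' implementation: the object names are borrowed from Figure 2, while the three toy views and the helper name `is_valid_rule` are assumptions made for illustration. It uses the non-strict subset test for simplicity.

```python
# Toy transactions: each technical BIM view is a set of unique object names.
# Object names come from Figure 2; the views themselves are invented.
T = [
    frozenset({"AISC Wide Flange Shapes-Section", "Bolt - Side", "Weld Symb"}),
    frozenset({"AISC Wide Flange Shapes-Section", "Weld Symb"}),
    frozenset({"Rebar-Detail-Bar Section", "Anno Break Line - Single"}),
]

# I is the set of all unique items observed across the transactions.
I = frozenset().union(*T)


def is_valid_rule(X, Y, items, transactions):
    """Check the three conditions for a candidate rule X -> Y (hypothetical helper)."""
    X, Y = frozenset(X), frozenset(Y)
    # (1) X and Y each hold one or more items drawn from the item universe.
    non_empty_subsets = bool(X) and bool(Y) and X <= items and Y <= items
    # (2) X and Y occur together within at least one transaction.
    co_occur = any((X | Y) <= t for t in transactions)
    # (3) X and Y do not intersect.
    disjoint = X.isdisjoint(Y)
    return non_empty_subsets and co_occur and disjoint
```

In this toy dataset, a steel section paired with a weld symbol passes all three checks, whereas a bolt paired with a rebar section fails condition (2) because no view contains both.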

The search space for mining association rules and, therefore, the number of rules extractable

from big datasets are often too large for manual evaluation. The literature has proposed

different computational algorithms to search for frequent item-sets (X and Y). These

algorithms eliminate the need to enumerate all possible item-sets in data

because the number of possible item-sets grows exponentially with the number of items in

the search space. The choice of algorithm is highly dependent on the size of the

dataset, the complexity of transactions, and the expected runtime to identify item-sets. A review of the

literature shows that the commonly used algorithms include the variations of classic Apriori

(Agrawal and Srikant 1994) and FP-Growth algorithms (Han, Pei, and Yin 2000). In large and

complex datasets, FP-Growth outperforms the Apriori algorithm in generating the item-sets of

interest (Mythili and Shanavas 2013).
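To illustrate how such algorithms avoid enumerating every item-set, the following is a minimal pure-Python sketch of the level-wise Apriori idea: a candidate k-item-set is only counted if all of its (k-1)-subsets are already frequent. This is a teaching sketch under stated assumptions, not the implementation used in this study, and it does not show FP-Growth. The function name, the toy views, and the 0.5 support threshold are all illustrative.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for item-sets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the item-set.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-item-sets into k-item-set candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have any infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Invented toy views; object names are taken from Figure 2.
views = [
    {"Bolt - Side", "Weld Symb", "Conn Steel Plate-Side"},
    {"Bolt - Side", "Weld Symb"},
    {"Bolt - Side", "Conn Steel Plate-Side"},
    {"Rebar-Detail-Bar Section", "Weld Symb"},
]
frequent = apriori_frequent_itemsets(views, min_support=0.5)
```

With these toy views, the three-object candidate is pruned without being counted because one of its pairs (weld symbol with steel plate) is not frequent.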

After generating item-sets, the ARM model ranks and quantifies their association strength

using probabilistic measures (Zhang and Zhang 2003, Hahsler 2015). The literature has offered

different measures to quantify the strength of association rules. These measures include, but

are not limited to, support, confidence, lift, conviction, leverage, chi-square, and correlation

(Tan et al. 2019). Despite their extensive applications, these measures have different statistical

power and descriptive capacity. They may even report inconsistent outcomes when

comparatively analyzed in different applications (Azevedo and Jorge 2007). Therefore, ML

developers need to evaluate the efficacy and choice of measures in their specific ARM model

development (Tan, Kumar, and Srivastava 2004). In this study, the research methods section

will define and report the efficacy of a set of candidate measures evaluated specifically for the

proposed BCRS.
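As a point of reference, three of the measures named above (support, confidence, and lift) can be computed directly from transaction sets using their standard textbook definitions. The sketch below is illustrative only: it does not indicate which measures were ultimately selected for the BCRS, and the function names and toy views are assumptions.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the item-set."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_measures(X, Y, transactions):
    """Standard support, confidence, and lift for a rule X -> Y."""
    X, Y = frozenset(X), frozenset(Y)
    s_xy = support(X | Y, transactions)
    s_x = support(X, transactions)
    s_y = support(Y, transactions)
    # confidence(X -> Y) = support(X and Y together) / support(X)
    confidence = s_xy / s_x if s_x else 0.0
    # lift(X -> Y) = confidence(X -> Y) / support(Y)
    lift = confidence / s_y if s_y else 0.0
    return {"support": s_xy, "confidence": confidence, "lift": lift}


# Invented toy views; object names are taken from Figure 2.
toy_views = [
    {"Bolt - Side", "Weld Symb", "Conn Steel Plate-Side"},
    {"Bolt - Side", "Weld Symb"},
    {"Bolt - Side", "Conn Steel Plate-Side"},
    {"Rebar-Detail-Bar Section", "Weld Symb"},
]
measures = rule_measures({"Bolt - Side"}, {"Weld Symb"}, toy_views)
```

A lift above 1 indicates that the consequent appears more often alongside the antecedent than its overall frequency would suggest, while a lift below 1 indicates the opposite.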

2.4. BIM Content Management and Delivery (BCMD)


The technical premise of BIM entails digital objects that drive geometries, graphics, and

data for products or concepts relevant to the AEC industry (e.g., building components, spatial

elements and datums, and drafting symbols and annotations). With their pre-defined, flexible,

and re-usable templates, these objects are conducive to the productivity of design and

engineering tasks (Abdirad and Lin 2015, Abdirad and Dossick 2020). Therefore, the state-of-

the-art digital practice is centered on BCMD to develop, standardize, organize, and share BIM

objects across AEC teams (Holzer 2011, Afsari and Eastman 2014). However, because of the

sheer number of BIM objects that firms store in their digital libraries, it takes an average user

40-80 hours annually to locate and load BIM objects into models (UNIFI 2020). This motivated

the development and growth of BCMD tools and BCRSs to promote BIM productivity through

improved search and retrieval of objects. Most commercial tools, however, still rely on

inefficient manual efforts to query or browse libraries based on object classifications or meta-

data (Afsari and Eastman 2014). These systems predominantly employ Symbolic AI to process

the queried search terms using hard-coded reasoning logics (Lee et al. 2020).

Recent BCRSs have applied ML and KDD methods that use object data supplemented with

historical usage information (for collaborative filtering) or custom object features (for content-

based filtering). For instance, using collaborative filtering, Lee et al. (2020) applied

probabilistic matrix factorization to learn from and rank pre-fabricated components based on

the subjective relevancy scores that users assigned to the searched keywords and recommended

objects. Using content-based filtering, Zhang, Liu, and Al-Hussein (2018) trained Neural

Networks to classify lighting-fixtures with their product images and specifications, and to

recommend objects based on new images and requirements users loaded into the system.

Although these state-of-the-art solutions have utilized AI to improve conventional BCMD

workflows, they are still reliant on the collection of heterogeneous data such as explicit user

feedback, subjective ratings, or sample inputs from users.

The foregoing limitations motivated this study to develop an alternative approach to AI-backed

BCRS; this approach uses massive datasets of technical information and their context from

existing BIM models to generate recommendations. Accordingly, the importance of context

awareness for the proposed BCRS is that it mitigates the need for (a) collecting heterogeneous

non-BIM data (e.g., user ratings) and (b) requiring active queries or explicit inputs to generate

recommendations. In this development, the technical context entails (a) the type of technical

view within which a user carries out the BIM tasks (e.g., detail view, plan view, section view), and

(b) existing BIM objects that appear on the technical view as potentially associable objects

(e.g., existing steel framing objects could be antecedents of welds or bolts as connection

objects).
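The two context cues described above (view type and objects already present in the view) can be sketched as a simple rule lookup. The following Python sketch is a hypothetical illustration, not the system developed in this study: the rule table, its confidence values, the assumption that rules are keyed by view type, and the function name `recommend` are all invented for the example.

```python
# Hypothetical mined rules: (view type, antecedent, consequent, confidence).
# The numbers are invented for illustration and are not results from the study.
RULES = [
    ("SECTION", frozenset({"AISC Wide Flange Shapes-Section"}), "Weld Symb", 0.8),
    ("SECTION", frozenset({"AISC Wide Flange Shapes-Section"}), "Bolt - Side", 0.6),
    ("SECTION", frozenset({"Rebar-Detail-Bar Section"}), "Rebar-Detail-Bend-90", 0.7),
    ("PLAN", frozenset({"AISC Wide Flange Shapes-Top"}), "Weld Studs-Side", 0.5),
]


def recommend(view_type, objects_in_view, rules, top_n=3):
    """Rank objects whose antecedents are already satisfied by the view."""
    present = set(objects_in_view)
    best = {}
    for rule_view, antecedent, consequent, confidence in rules:
        if rule_view != view_type:
            continue  # rule was learned in a different technical context
        if not antecedent <= present:
            continue  # antecedent objects are not all in the view yet
        if consequent in present:
            continue  # no need to recommend what is already placed
        best[consequent] = max(confidence, best.get(consequent, 0.0))
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:top_n]]
```

For example, a section view that already contains a wide-flange section would surface the weld symbol ahead of the bolt under these made-up confidences, without any query from the user.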

3. Points of Departure and Development Gaps

• The conventional BCMD workflows require manual queries for object search and

retrieval. These tools are rigid and inefficient in finding content as they heavily rely on

human experience and cognition to repeatedly filter and target the desired content.

• The existing BCRSs primarily use inherent properties or indigenous features of

individual objects for content search and retrieval. This approach lacks the intelligence

to learn from exogenous relationships and logical associations between different objects

as they form coherent building information and construction assemblies in projects.

• The application of innovative recommender systems has advanced searching

mechanisms in many content-abundant fields (e.g., e-commerce, advertisement, and

manufacturing). Since the state-of-the-art BCRSs are still reliant on generating

additional data specifically for serving ML processes (e.g., user feedback and ratings),

the BIM practice has yet to fully benefit from AI to minimize development efforts and

better manage the massive search space for digital content and BIM resources.

• Research and practice have yet to fully explore the potential applications of ARM in

BCMD. As an unsupervised ML method, ARM has clear representation and relevancy

to both descriptive and predictive approaches (Zhang and Zhang 2003). The important

advantages of ARM over alternative ML methods are that it is fully unsupervised; it

does not require manual interpretation, exploration, or fine-tuning of hyperparameters

for model building; and building, validating, and implementing

ARM models are computationally inexpensive (Larose 2015).

4. Research Methods and Development Process


This research applied the sequential KDD tasks to build the ARM model (Zhang and

Zhang 2003). The tasks carried out to develop the AI prototype that empowered the BCRS

included: (1) Problem Formulation and Case Study Setting, (2) Data Collection, (3) Data

Cleaning, (4) Data Selection, (5) Data Transformation, (6) Model Development and Rule

Extraction, (7) Pattern Evaluation, (8) Model Deployment and User Interface Design, (9)

Maintenance Tasks, and (10) Knowledge Representation and Usage (Zhang and Zhang 2003).

4.1. Problem Formulation and Case Study Setting


The conventional recommender systems often utilize measures on similarity of users or

similarity of their historical preferences as the bases for recommending content (Falk 2019).

However, in professional practice, advanced recommender systems shall provide users with

content relevant to standards of practice and technical context in projects. Therefore, for

BCMD recommendations, technical pertinence shall take precedence over conventional

approaches used in e-commerce like user similarity, personalized ratings, or popularity of

contents. Accordingly, this study advocates knowledge-based and context-based BCRSs that

promote technical coherence in BIM workflows. However, identifying all useful technical

patterns and explicating them in a BCRS can be infeasible with manual analysis because these

patterns are buried and distributed in large datasets (Zhang and Zhang 2003). Hence, this

research plans to extend the applications of AI and ML to the BCMD domain to automate

knowledge extraction and minimize the inefficiencies associated with manual knowledge

coding and retrieval.

This study develops a hybrid BCRS to accomplish the following objectives: (1) mine big data

analytics of BIM objects from the content used in historical BIM models; (2) measure the

statistical probability for co-occurrence and associations between BIM objects based on their

technical precedents in BIM views; and (3) use these statistics to automatically and

dynamically load and recommend a series of BIM objects to users in live BIM sessions given

the cues from the modeling context and human-computer interactions. Using ARM, this

exploration will analyze the instances of BIM objects shown on each technical view in project

models to discover rules about objects that co-exist. This process explores objects that tend to

be associated together in different technical contexts (e.g., building assemblies shown on BIM

plans, sections, and details) to discover patterns that resemble the following: Technical Views

Which Represented BIM Object X also Represented BIM Object Y. In contrast to the

conventional applications of ARM, this approach can explicate AEC technical knowledge

because BIM objects in project representational views logically associate to generate cohesive

assemblies and information.

This research used a case-study approach to develop and evaluate the proposed BCRS (Yin

2009). The case-study setting was a structural engineering firm in the United States. The

empirical setting of this case was especially well-suited for this study because (1) as an early

adopter of BIM, this firm had hundreds of completed digital models suitable for data mining;

(2) it had a large portfolio of diverse ongoing projects in different market sectors; (3) it

maintained and utilized a large BIM library with more than 1200 objects; and (4) it had highly

routinized digital workflows to reinforce firm-wide efficiency with the use of standardized

BIM objects. Although some characteristics of BIM data in this research are unique to the case-

study firm (e.g., object names and graphics), the development approach and the sequential

methods presented in this case are highly replicable and transferable to other firms. Therefore,

as an information-oriented case, this study can offer new insights and inform BIM practices on

the development of innovative BCRSs (Flyvbjerg 2001).

4.2. Data Collection

This research developed a custom software application (using C#) to collect BIM content usage

data from over 400 domestic and international projects within the case-study firm. These

projects used Autodesk Revit (versions 2014 to 2021) to develop data-rich BIM models,

drawings, and documentation. Using the Application Programming Interface (API) in Revit, this

custom tool recorded data from over forty-five thousand (45,000) technical views (e.g., plans,

sections, and details). For each view, this application extracted data from the represented BIM

objects, and it captured their identifying information in a list. Put differently, this research

considered each BIM view a transaction of multiple associative BIM objects (Figure 2). The

authors stored the initial dataset in a JSON format for further data analysis in Python with data

engineering, visualization, and machine learning modules (e.g., Pandas, Scikit-Learn,

MLXTEND, Plotly, Matplotlib, Seaborn). Table 1 presents a snippet of the initial dataset.
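A minimal sketch of reading such a JSON export into per-view transaction lists is shown below. The exact schema of the authors' JSON file is not given, so the record layout here is an assumption; the field names "ViewID", "ViewType", and "Transaction List" follow the columns listed under Table 1, while the sample records and the "DETAIL" value are invented. Only the Python standard library is used.

```python
import json

# Invented sample export; the real file layout is not described in the paper.
raw = """
[
  {"ViewID": "View_A", "ViewType": "SECTION",
   "Transaction List": ["Bolt - Side", "Bolt - Side", "Weld Symb"]},
  {"ViewID": "View_B", "ViewType": "DETAIL",
   "Transaction List": ["Rebar-Detail-Bar Section"]}
]
"""


def load_transactions(json_text):
    """Map each view ID to the list of object names recorded for that view."""
    records = json.loads(json_text)
    return {rec["ViewID"]: list(rec["Transaction List"]) for rec in records}


transactions = load_transactions(raw)
```

At this stage the lists may still contain repeated object names, since a view can show several instances of the same object.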

Transaction_Set_View_23744 = {'AISC Tube Shapes-Section', 'Grid Head - Circle - Detail', 'Rebar-Detail-Bend-Z',
'Bolt - Side', 'AISC Wide Flange Shapes-Top', 'Weld Studs-Side', 'Squiggle Symb', 'Rebar-Detail-Hook 135',
'Conn Steel Shear-Elevation', 'AISC Tube Shapes-Side', 'AISC Wide Flange Shapes-Side', 'Conn Steel Plate-Side',
'Bolt - Top', 'Anno Break Line - Single', 'Weld Symb', 'AISC Angle Shapes-Section', 'AISC Wide Flange Shapes-Section',
'Level Head - Target - Detail', 'Rebar Symb - Hooked - 90', 'Rebar Symb - Bend - 90', 'Rebar-Detail-Bar Section'}
Figure 2 – Extracting BIM Object Information as a Transaction from BIM Views

4.3. Data Cleaning

The initial dataset contained (a) noisy data of non-standard and highly custom BIM objects

used in a small fraction of projects, (b) variations in the name of standard objects as BIM

libraries and implementation procedures evolved from 2014 to 2021, and (c) miscellaneous

BIM contents that did not represent technical BIM objects (e.g., text notes). The data cleaning

process, therefore, involved standardizing data from BIM objects and BIM views and filtering

out BIM data inapplicable to ARM (Table 1).
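The three cleaning steps can be sketched as a single pass over each view's object names. This is a hypothetical illustration rather than the authors' procedure: the alias map, the excluded-content list, and the set of standard names below are invented placeholders for whatever lookup tables a firm would maintain.

```python
# Hypothetical alias map: legacy object names -> current standard names.
ALIASES = {"Weld Symbol": "Weld Symb"}
# Hypothetical non-technical content to drop (e.g., text notes).
EXCLUDED = {"Text Note"}
# Hypothetical subset of the firm's standard library object names.
STANDARD = {"Weld Symb", "Bolt - Side", "Anno Break Line - Single"}


def clean_view(object_names):
    """Standardize names and drop non-standard or non-technical content."""
    cleaned = []
    for name in object_names:
        name = " ".join(name.split())   # collapse stray or trailing whitespace
        name = ALIASES.get(name, name)  # (b) unify renamed standard objects
        if name in EXCLUDED:            # (c) drop miscellaneous content
            continue
        if name not in STANDARD:        # (a) drop highly custom objects
            continue
        cleaned.append(name)
    return cleaned
```

For instance, `clean_view(["Weld Symbol", "Bolt  - Side ", "Text Note", "Custom Widget"])` keeps only the two standard objects under these assumed tables.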

Table 1 – Snippets of the Cleaned Dataset

ViewID | ViewType | BIM VIEW TITLE | Transaction Set (Unique BIM Objects) | Set Length
View_1_10 | SECTION | BEAM - BEAM MOMENT CONNECTION | {'Bolt - Head-Side', 'AISC Wide Flange Shapes-Section', 'Weld Symb', 'Gusset Plate-Rectangular-Elevation', 'AISC Wide Flange Shapes-Side', 'Anno Break Line - Single', 'Conn Beam Cope'} | 7
View_1_101 | SECTION | CONCRETE COLUMN SPIRAL REINFORCING | {'Level Head - Target - Detail', 'Rebar-Detail-Bar Section', 'Anno Section - Detail', 'Anno Break Line - Single'} | 4
View_1_10121 | SECTION | SECTION | {'Rebar-Detail-Bar Section', 'Level Head - Target - Detail', 'Rough Joint Segment', 'Rebar-Detail-Stirrup Closed', 'Rebar-Embed End', 'PT Detail Anchor-Section', 'Rebar-Detail-Bend-90', 'Anno Extent-Line - Detail', 'Anno Break Line - Single'} | 9
View_1_10128 | SECTION | TRELLIS POST ANCHORAGE | {'Bolt - Side', 'Squiggle Symb', 'Conn Steel Plate-Side', 'Anno Section - Detail', 'AISC Wide Flange Shapes-Section', 'Weld Symb', 'Anno Break Line - Single'} | 7
View_1_1019 | SECTION | ELEVATOR RAIL SUPPORT POST | {'Bolt - Head-Side', 'Bolt - Side', 'Anno Section - Detail', 'Level Head - Target - Detail', 'Weld Symb', 'Anno Break Line - Single'} | 6

* Supplementary data collected for each BIM view but not shown here include: "ProjectName", "ProjectPath", "Revit FileName", "RevitVersion", "Sheet Name", "View Revit ID", "View GUID", "Transaction List" (where multiple instances of BIM Objects may be listed – not unique), "List Length"
+ The final dataset (after data cleaning and data selection) included 30,320 rows. Due to space limitations, this table presents only several transaction sets from BIM views.

3 4.4. Data Selection
4 The preliminary exploration of BIM views collected for this study revealed the

5 importance of informed data selection for ARM. This analysis showed that (a) some

6 views represented only text with no BIM object (e.g., general notes, specifications), (b)

7 some views represented fewer than two unique BIM objects, and (c) some drafted views

8 with hundreds of BIM objects represented a collection of different construction details

9 (e.g., independent CMU wall and drywall sections drawn together). Accordingly, from

10 an ARM standpoint, these views either had no information on association of objects

11 (where fewer than two objects were used) or potentially implied incorrect associations

12 between BIM objects (where different construction assemblies were represented in a

13 single view). Therefore, this research selected those views that used two or more BIM

14 objects that logically associate in construction assemblies. In this process, the authors

15 selected 30,320 technical BIM views for further analysis and ARM. The number of BIM

16 objects these views represented ranged between 2 and 50 (i.e., set length as shown in

17 Table 1). Figure 3 shows the distribution of views and the number of BIM objects they

18 represented (mean = 15, median = 13).
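
The count-based part of this selection can be sketched as a simple filter. This is a minimal illustration, not the authors' implementation: the function name `select_views`, the dictionary layout, and the use of the reported range (2 to 50 unique objects) as explicit bounds are assumptions. The judgment of whether the objects in a view "logically associate" cannot be captured by a count and is not modeled here.

```python
def select_views(views, min_objects=2, max_objects=50):
    """Keep views whose number of unique BIM objects lies within the bounds.

    `views` maps a view ID to the list of BIM object names placed on it
    (hypothetical layout; the bounds mirror the range reported in the text).
    """
    selected = {}
    for view_id, objects in views.items():
        unique_count = len(set(objects))  # repeated instances count once
        if min_objects <= unique_count <= max_objects:
            selected[view_id] = objects
    return selected


example = {
    "notes_only": [],                                 # text-only view
    "single_object": ["Weld Symb ", "Weld Symb "],    # one unique object
    "detail": ["Weld Symb ", "Bolt - Side "],         # usable for ARM
}
kept = select_views(example)
```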

Figure 3 – Distribution of Count of BIM Objects Represented in BIM Views
* The boxplot shows the interquartile range (IQR), i.e., the middle 50% of the values.
+ The curve shows the kernel density estimate (KDE).

4.5. Data Transformation
This research considered each BIM view a transaction list of multiple associative BIM objects. As each view may represent multiple instances of an object (e.g., multiple 'AISC Wide Flange Shapes-Section' in Figure 2), the data transformation process involved converting lists to immutable transaction sets of unique BIM objects (Table 1).
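
In Python, an immutable set of unique items corresponds naturally to a `frozenset`. The snippet below is a small sketch under that assumption; the helper name `to_transaction_set` is hypothetical and the paper does not state which data type was used.

```python
def to_transaction_set(object_list):
    """Collapse a list of placed BIM objects into an immutable set of unique names."""
    return frozenset(object_list)


placed = [
    "AISC Wide Flange Shapes-Section ",
    "AISC Wide Flange Shapes-Section ",  # second instance of the same object
    "Weld Symb ",
]
transaction = to_transaction_set(placed)
```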

4.6. Model Development and Rule Extraction
The tasks to build a machine learning model for ARM included: (1) generating frequent item-sets across transaction sets, and (2) quantifying the strength or likelihood of associations between each pair of frequent item-sets (Zhang and Zhang 2003). The authors randomly split the dataset into training data (80% of views) and test data (20% of views). This facilitated building the ARM model on the training data and validating the predictive capacity of extracted association rules on the test data. To generate the item-sets of interest from the training dataset (80% of views, i.e., 24,256 views), this research used the MLxtend library in Python (Raschka 2018) with the FP-Growth algorithm to process the large and complex dataset of technical BIM contents. FP-Growth performs faster than alternative algorithms (e.g., the Apriori algorithm) on large datasets (Han, Pei, and Yin 2000).
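
A random 80/20 split of this kind can be sketched with the standard library alone. The function name `split_views` and the fixed seed are illustrative assumptions; the paper does not describe how the split was drawn. With 30,320 views, the split reproduces the reported sizes of 24,256 training views and 6,064 test views.

```python
import random


def split_views(view_ids, train_fraction=0.8, seed=0):
    """Shuffle view IDs reproducibly and cut them into train and test lists."""
    ids = list(view_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split can be repeated
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


train_ids, test_ids = split_views(range(30320))
```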

Measuring the strength of association between item-sets is a crucial problem in ARM. The classic ARM models use the general support-confidence framework to evaluate the strength of association rules. The extant literature has found this framework inadequate in many applications (Zhang and Zhang 2003) and has therefore explored and offered a multitude of new measures to quantify the association of item-sets (Hahsler 2015, Azevedo and Jorge 2007). These measures often evaluate the statistical correlation, probabilistic dependence, or general interestingness of associations. However, with different definitions and formulas, these measures have different statistical power and predictive capacity. Therefore, this study evaluates the efficacy of twelve relevant measures to identify the measure that offers the highest predictive capacity for the BCRS in this study. Table 2 outlines the definitions and formulas for these measures. This paper considered both null-invariant and null-variant measures (Wu, Chen, and Han 2007) to account for the possibility that many transactions may not represent certain BIM objects.

To manage the computational performance for the analysis and BCRS, the FP-Growth algorithm generated frequent item-sets that occurred in at least 0.1 percent of the BIM views (i.e., minimum support for X or Y in the association rules = 0.001). With this criterion, this process retrieved 18,740 frequent item-sets from the dataset. Figure 4 shows the top 20 frequently used BIM objects in the dataset.
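
The effect of a minimum-support threshold can be illustrated with a brute-force counter. This sketch is not FP-Growth: it enumerates every combination up to a length cap, which returns the same frequent item-sets as FP-Growth for a given threshold (within the cap) but scales far worse. The function name `frequent_itemsets` and the `max_len` cap are assumptions made for the illustration. With MLxtend, the comparable call would be `fpgrowth(df, min_support=0.001, use_colnames=True)` on a one-hot encoded DataFrame.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_len=3):
    """Return {item-set: support} for item-sets meeting the support threshold."""
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, min(max_len, len(items)) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


toy = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("b")]
result = frequent_itemsets(toy, min_support=0.5)
```

In the toy data, item-sets such as {b, c} (support 0.25) fall below the threshold and are dropped, just as item-sets seen in fewer than 0.1 percent of views were dropped in the study.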

Figure 4 – Top 20 BIM Objects Found in the Dataset
After generating frequent item-sets, the authors calculated the strength of associations between all combinations of the item-sets using the measures reported in Table 2. This analysis returned 672,156 combinations with the assumption that the strength of X → Y may not be equal to the strength of Y → X because some measures are not bidirectional. Table 3 presents a snippet of the extracted rules and values for their strength.

Table 2 – Measures to Evaluate the Strength of Association Rules

Measure: Support
Definition: Frequency of item-set X appearing in all transactions (range: [0,1]). For a rule, the frequency of X → Y, i.e., item-sets X and Y appearing together in transactions (Agrawal, Imieliński, and Swami 1993).
Formulas: $\mathrm{supp}(X) = \frac{|X(t)|}{|T|}$; $\mathrm{supp}(X \to Y) = \frac{|\{X \cup Y\}(t)|}{|T|}$
Interpretive limitations: Subject to the rare-item problem when the distribution is uneven, when the minimum support criterion is too low, and when support is the sole criterion.

Measure: Confidence
Definition: Frequency of item-set Y appearing in transactions that contain X (range: [0,1]) (Agrawal, Imieliński, and Swami 1993).
Formula: $\mathrm{conf}(X \to Y) = \frac{\mathrm{supp}(X \to Y)}{\mathrm{supp}(X)}$
Interpretive limitations: It is a directed measure. It is highly sensitive to supp(Y). It is not *null-invariant.

Measure: Lift
Definition: The rate at which X and Y occur together more often than if they were statistically independent (range: [0,∞]). A lift value of 1 suggests that X and Y are independent (Brin et al. 1997).
Formula: $\mathrm{lift}(X \to Y) = \frac{\mathrm{conf}(X \to Y)}{\mathrm{supp}(Y)}$
Interpretive limitations: Sensitive to small databases wherein item-sets with low support may co-occur a few times by chance and generate high lift values. It is not *null-invariant.

Measure: Leverage
Definition: Also known as the Piatetsky-Shapiro measure. The difference between the frequency of X and Y occurring together in the data and the frequency expected if X and Y were statistically independent (range: [−1,1]). A leverage value of 0 suggests independence (Piatetsky-Shapiro 1991).
Formula: $\mathrm{levg}(X \to Y) = \mathrm{supp}(X \to Y) - \mathrm{supp}(X)\,\mathrm{supp}(Y)$
Interpretive limitations: Similar to Lift. It is not *null-invariant.

Measure: Conviction
Definition: The frequency at which X would occur without Y if they were independent, divided by the actual frequency of X occurring without Y (range: [0,∞]). A conviction value of 1 indicates independence (Brin et al. 1997).
Formula: $\mathrm{conv}(X \to Y) = \frac{1 - \mathrm{supp}(Y)}{1 - \mathrm{conf}(X \to Y)}$
Interpretive limitations: It is a directed measure. It is not *null-invariant.

Measure: Correlation Coefficient
Definition: Covariance between the two items divided by their standard deviation (range: [−1,1]). A value of 0 suggests independence. Also known as Phi correlation (Tan, Kumar, and Srivastava 2004).
Formula: $\mathrm{corcoef}(X \to Y) = \frac{\mathrm{supp}(X \to Y) - \mathrm{supp}(X)\,\mathrm{supp}(Y)}{\sqrt{\mathrm{supp}(X)\,\mathrm{supp}(Y)\,(1 - \mathrm{supp}(X))\,(1 - \mathrm{supp}(Y))}}$
Interpretive limitations: Measure of linear interdependency. It is not *null-invariant.

Measure: Chi-Squared
Definition: Test of independence between the antecedent and the consequent. Larger values indicate stronger relationships (range: [0,∞]) (Brin, Motwani, and Silverstein 1997).
Formula: $\mathrm{chi}(X \to Y) = \frac{\left(\mathrm{supp}(X \to Y) - \frac{\mathrm{supp}(X)\,\mathrm{supp}(Y)}{|T|}\right)^{2}}{\frac{\mathrm{supp}(X \to Y)}{|T|}}$
Interpretive limitations: Sensitive to infrequent occurrences of item-sets in large datasets. It is not *null-invariant.

Measure: Cosine
Definition: Modified cosine similarity measure for association rule mining (range: [0,1]; 0.5 means no correlation) (Tan, Kumar, and Srivastava 2004).
Formula: $\mathrm{cos}(X \to Y) = \frac{\mathrm{supp}(X \to Y)}{\sqrt{\mathrm{supp}(X)\,\mathrm{supp}(Y)}}$
Interpretive limitations: *Null-invariant measure of correlation.

Measure: Jaccard
Definition: Jaccard similarity between the two sets of transactions that contain the items in X and Y (range: [0,1]) (Tan, Kumar, and Srivastava 2004).
Formula: $\mathrm{jac}(X \to Y) = \frac{\mathrm{supp}(X \to Y)}{\mathrm{supp}(X) + \mathrm{supp}(Y) - \mathrm{supp}(X \to Y)}$
Interpretive limitations: *Null-invariant measure of dependence.

Measure: Klosgen
Definition: Test of independence between the antecedent and the consequent (range: [−1,1]). A value of 0 suggests independence (Tan, Kumar, and Srivastava 2004).
Formula: $\mathrm{klosgen}(X \to Y) = \sqrt{\mathrm{supp}(X \to Y)}\,\left(\mathrm{conf}(X \to Y) - \mathrm{supp}(Y)\right)$
Interpretive limitations: *Null-invariant measure of dependence.

Measure: Kulczynski
Definition: Measure of interestingness of relationships (range: [0,1]). A Kulczynski value of 0.5 suggests a neutral or uninteresting relationship (Wu, Chen, and Han 2010).
Formula: $\mathrm{Kulc}(X \to Y) = \frac{\mathrm{conf}(X \to Y) + \mathrm{conf}(Y \to X)}{2}$
Interpretive limitations: *Null-invariant measure with a preference for skewed patterns.

Measure: Loevinger
Definition: Also known as Certainty Factor (CF). It measures the probability that Y is in a transaction that contains X (range: [−1,1]). A Loevinger value of 0 suggests independence (Berzal et al. 2002).
Formula: $\mathrm{Loev}(X \to Y) = \frac{\mathrm{conf}(X \to Y) - \mathrm{supp}(Y)}{1 - \mathrm{supp}(Y)}$
Interpretive limitations: *Null-invariant measure of dependence.

* Null invariance means that the value does not change with the number of null transactions (transactions that do not contain X and Y).
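
As a worked illustration of the formulas in Table 2, the sketch below computes ten of the rule-level measures from three support values. It is an assumption-laden example rather than the authors' code: the function name `association_measures` is hypothetical, supports are assumed to lie strictly between 0 and 1, leverage is taken as the observed joint support minus the product of the two individual supports, and chi-squared is left out.

```python
import math


def association_measures(supp_x, supp_y, supp_xy):
    """Compute rule-strength measures for X -> Y from supports in (0, 1)."""
    conf_xy = supp_xy / supp_x  # confidence of X -> Y
    conf_yx = supp_xy / supp_y  # confidence of Y -> X
    independent = supp_x * supp_y  # joint support expected under independence
    return {
        "confidence": conf_xy,
        "lift": conf_xy / supp_y,
        "leverage": supp_xy - independent,
        # Conviction is unbounded when the rule never fails (confidence = 1).
        "conviction": math.inf if conf_xy == 1 else (1 - supp_y) / (1 - conf_xy),
        "correlation": (supp_xy - independent)
        / math.sqrt(supp_x * supp_y * (1 - supp_x) * (1 - supp_y)),
        "cosine": supp_xy / math.sqrt(supp_x * supp_y),
        "jaccard": supp_xy / (supp_x + supp_y - supp_xy),
        "klosgen": math.sqrt(supp_xy) * (conf_xy - supp_y),
        "kulczynski": (conf_xy + conf_yx) / 2,
        "loevinger": (conf_xy - supp_y) / (1 - supp_y),
    }


m = association_measures(supp_x=0.5, supp_y=0.4, supp_xy=0.3)
```

Because Kulczynski averages the two directed confidences, it gives the same value for X → Y and Y → X, whereas confidence, conviction, and Loevinger depend on the direction of the rule.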

Table 3 – Snippets of Association Rules for Select BIM Objects

Part 1 of 2 (numbers in parentheses index the twelve measures)
X (Antecedent)* | Y (Consequent)* | Support X | Support Y | (1) Support X → Y | (2) Confidence X → Y | (3) Lift X → Y | (4) Leverage X → Y
Anno Break Line - Single | Weld Symb | 0.915 | 0.370 | 0.350 | 0.380 | 1.030 | 0.010
Weld Symb | Conn Steel Plate-Side | 0.366 | 0.160 | 0.130 | 0.350 | 2.120 | 0.070
AISC Wide Flange Shapes-Side | Weld Symb | 0.136 | 0.370 | 0.100 | 0.770 | 2.100 | 0.050
AISC Tube Shapes-Side | Weld Symb | 0.098 | 0.370 | 0.080 | 0.830 | 2.280 | 0.050
Rebar-Detail-Bar Section | Rebar-Detail-Hook 135 | 0.236 | 0.110 | 0.050 | 0.200 | 1.840 | 0.020
PT Detail Anchor-Section | Rebar-Detail-Bar Section | 0.019 | 0.240 | 0.010 | 0.770 | 3.280 | 0.010
Rebar-Detail-Stirrup Open | Rebar-Detail-Bend-90 | 0.082 | 0.060 | 0.010 | 0.160 | 2.740 | 0.010
CMU-Section | CMU-Bond Beam-Section | 0.019 | 0.010 | 0.010 | 0.550 | 49.790 | 0.010

Part 2 of 2 (continued from above; rules in the same order)
X → Y (Rule) | (5) Conviction | (6) Cosine | (7) Jaccard | (8) Klosgen | (9) Kulczynski | (10) Chi-Squared | (11) Correlation Coefficient | (12) Loevinger
Anno Break Line - Single → Weld Symb | 1.020 | 0.600 | 0.370 | 0.010 | 0.660 | 30319.030 | 0.080 | 0.020
Weld Symb → Conn Steel Plate-Side | 1.290 | 0.520 | 0.320 | 0.070 | 0.560 | 30319.530 | 0.380 | 0.220
AISC Wide Flange Shapes-Side → Weld Symb | 2.720 | 0.470 | 0.260 | 0.130 | 0.530 | 30319.520 | 0.330 | 0.630
AISC Tube Shapes-Side → Weld Symb | 3.790 | 0.430 | 0.210 | 0.130 | 0.530 | 30319.560 | 0.320 | 0.740
Rebar-Detail-Bar Section → Rebar-Detail-Hook 135 | 1.120 | 0.300 | 0.160 | 0.020 | 0.320 | 30319.460 | 0.160 | 0.100
PT Detail Anchor-Section → Rebar-Detail-Bar Section | 3.360 | 0.220 | 0.060 | 0.070 | 0.420 | 30319.700 | 0.180 | 0.700
Rebar-Detail-Stirrup Open → Rebar-Detail-Bend-90 | 1.120 | 0.190 | 0.100 | 0.010 | 0.190 | 30319.640 | 0.130 | 0.110
CMU-Section → CMU-Bond Beam-Section | 2.190 | 0.710 | 0.530 | 0.050 | 0.740 | 30319.980 | 0.710 | 0.540

* For clarity, the item-sets presented in this table held only a single BIM object. The length of frequent item-sets (those with support > 0.001) varied between 1 and 10.
+ This dataset includes 672,156 rules; this study did not assume bidirectionality in the rules. Due to space limitations, this table presents only several rules.

4.7. Pattern Evaluation Using BIM Process Simulation
This research used the test dataset (20% of views, i.e., 6,064 views) to simulate forty-eight (48) potential configurations of a BCRS that recurringly offered BIM objects to a virtual user in a BIM implementation process (sequentially picking and placing objects on a view). These 48 configurations aimed at evaluating the efficacy and predictive capacity of the twelve (12) association measures (Table 3) by providing the user with N recommendations (N = 24, 30, 36, or 42; i.e., four settings for the count of recommendations; 4 settings × 12 measures = 48 configurations).

Considering each view in the test data as a transaction set (Figure 2), this simulation analyzed the percentage of objects that the BCRS accurately predicted and presented to the user to assemble the transaction set. This simulation carried out the following steps for each view in the test dataset:

(1) Assuming that the BIM view is empty when a user begins the modeling process, the system recommends the top N frequently used BIM objects to the user.

(2) If there is no overlap (or intersection) between the N recommendations and the transaction set needed for the BIM view, the simulation stops and the efficacy for the BCRS is zero (0).

(3) If there is overlap (or intersection) between the N recommendations and the transaction set of the BIM view, the user selects an object needed in the transaction set.

(4) Considering the previously selected object(s) as antecedents (X), the BCRS presents the top N consequents (Y) with the highest association strength (based on association measures presented in Table 3).

(5) Repeat steps 3 and 4 until (a) there is no overlap (or intersection) between the recommendations and the transaction set of the BIM view, or (b) all objects needed for assembling the transaction set are selected and used.

(6) Calculate the efficacy of the BCRS by dividing the number of appropriately selected BIM objects by the number of BIM objects needed in the transaction set.
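
The six steps above can be sketched as a short loop. This is a simplified illustration under several assumptions that the paper does not specify: rules have single-object antecedents and are stored as a nested dictionary, a candidate's score is its strongest association with any selected object, the virtual user picks the first needed object in the recommendation order, and objects already selected are not recommended again. The function name `simulate_view` is hypothetical.

```python
def simulate_view(needed, top_frequent, rules, n):
    """Return the efficacy (share of needed objects reached) for one view.

    needed:       objects required by the view (its transaction set)
    top_frequent: objects ordered from most to least frequent
    rules:        {antecedent: {consequent: strength}} for one measure
    n:            number of recommendations shown at each step
    """
    needed = set(needed)
    if not needed:
        raise ValueError("a view must need at least one object")
    selected = []
    recommendations = top_frequent[:n]  # step 1: empty view
    while len(selected) < len(needed):
        hits = [o for o in recommendations if o in needed and o not in selected]
        if not hits:  # steps 2 and 5(a): no overlap, stop
            break
        selected.append(hits[0])  # step 3: the user picks a needed object
        scores = {}  # step 4: rank consequents of the selected objects
        for x in selected:
            for y, strength in rules.get(x, {}).items():
                if y not in selected:
                    scores[y] = max(scores.get(y, strength), strength)
        recommendations = sorted(scores, key=lambda y: (-scores[y], y))[:n]
    return len(selected) / len(needed)  # step 6


toy_rules = {"A": {"B": 0.9, "D": 0.8}, "B": {"D": 0.7}, "D": {"E": 0.6}}
efficacy = simulate_view({"A", "B", "D", "Z"}, ["A", "B", "C"], toy_rules, n=2)
```

In the toy run, objects A, B, and D are reached in sequence, but Z is never recommended, so the loop stops with three of four objects selected.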

Table 4 shows the efficacy of the simulated recommender system with 48 configurations. The results show that Kulczynski has the highest average efficacy and the lowest standard deviation and variance among the measures. With this measure, when N = 30 or more, the simulation predicted and included, on average, 80% or more of the required BIM objects in the sequential recommendations. Figure 5 visualizes these results with boxplots, depicting the clear advantage that Kulczynski has over the alternative measures. The authors used a one-way Type II Analysis of Variance (ANOVA) with Ordinary Least Squares to test for statistical significance of the results among the measures (Park 2009). These analyses showed significant F-statistics (p-value < 0.001) and rejected the null hypothesis that all measures have similar average efficacy in the BCRS. The pairwise comparison of measures using Tukey’s Honest Significant Difference (HSD; Abdi and Williams 2010) confirmed that Kulczynski had statistically significant differences in efficacy when compared to other measures (p-value < 0.001; see Table 5). Therefore, the authors selected Kulczynski as the proper measure that consistently yields high efficacy across all configurations.
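
For readers unfamiliar with the test, the F-statistic of a one-way ANOVA can be computed by hand as the ratio of between-group to within-group mean squares. The sketch below is illustrative only: the function name `one_way_anova_f` and the toy groups are assumptions, each group stands for the per-view efficacy values of one measure, and the p-value and the Tukey HSD post-hoc step are not reproduced. With a single factor, the Type II sum of squares coincides with the classical between-group sum of squares used here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```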

It is important to note that this study did not apply precision and recall as model evaluation metrics because these measures are more suited for supervised ML methods wherein the efficacy of predictions is often treated as binary. In ARM, as an unsupervised method, the relevancy of recommendations is probabilistic and therefore, the authors developed and implemented a custom simulation-based evaluation method.

Table 4 – Statistics for the Efficacy of the 48 Simulated Configurations using the Test Dataset
Efficacy for N recommendations; each cell lists Mean / Median / STD
Metric | N = 24 | N = 30 | N = 36 | N = 42
Chi-Squared | 0.415 / 0.333 / 0.246 | 0.435 / 0.375 / 0.250 | 0.476 / 0.444 / 0.243 | 0.510 / 0.500 / 0.241
Confidence | 0.518 / 0.500 / 0.248 | 0.531 / 0.500 / 0.251 | 0.550 / 0.500 / 0.257 | 0.569 / 0.500 / 0.258
Conviction | 0.554 / 0.500 / 0.244 | 0.574 / 0.500 / 0.246 | 0.596 / 0.500 / 0.250 | 0.620 / 0.600 / 0.248
Correl-Coeff | 0.565 / 0.500 / 0.242 | 0.584 / 0.500 / 0.245 | 0.608 / 0.600 / 0.243 | 0.632 / 0.625 / 0.244
Cosine | 0.534 / 0.500 / 0.251 | 0.560 / 0.500 / 0.258 | 0.584 / 0.500 / 0.264 | 0.614 / 0.571 / 0.266
Jaccard | 0.521 / 0.500 / 0.248 | 0.540 / 0.500 / 0.253 | 0.561 / 0.500 / 0.258 | 0.589 / 0.500 / 0.262
Klosgen | 0.535 / 0.500 / 0.248 | 0.548 / 0.500 / 0.250 | 0.571 / 0.500 / 0.254 | 0.588 / 0.500 / 0.256
Kulczynski | 0.743 / 0.777 / 0.209 | 0.802 / 0.860 / 0.195 | 0.841 / 0.910 / 0.181 | 0.873 / 0.967 / 0.171
Leverage | 0.541 / 0.500 / 0.250 | 0.571 / 0.500 / 0.248 | 0.591 / 0.500 / 0.249 | 0.621 / 0.600 / 0.249
Lift | 0.415 / 0.333 / 0.246 | 0.435 / 0.375 / 0.250 | 0.475 / 0.444 / 0.243 | 0.510 / 0.500 / 0.241
Loevinger | 0.554 / 0.500 / 0.244 | 0.574 / 0.500 / 0.246 | 0.595 / 0.500 / 0.250 | 0.620 / 0.600 / 0.248
Support | 0.556 / 0.500 / 0.265 | 0.580 / 0.500 / 0.270 | 0.608 / 0.500 / 0.275 | 0.639 / 0.667 / 0.276

Table 5 – Tukey’s HSD Test of Significance for Differences in Mean Efficacy for ARM Measures

Values of mean difference. Upper triangular matrix shows results for N = 24; lower triangular matrix shows results for N = 30.
ARM Measure | Support | Correl-Coeff | Lift | Conviction | Loevinger | Leverage | Cosine | Jaccard | Klosgen | Confidence | Kulczynski | Chi-Squared
Support | - | 0.009** | 0.142** | 0.002 | 0.002 | 0.016** | 0.022** | 0.035** | 0.022** | 0.038** | 0.187** | 0.141**
Correl-Coeff | 0.004 | - | 0.151** | 0.011** | 0.011** | 0.025** | 0.031** | 0.044** | 0.03** | 0.047** | 0.178** | 0.15**
Lift | 0.145** | 0.148** | - | 0.139** | 0.139** | 0.126** | 0.12** | 0.107** | 0.12** | 0.103** | 0.328** | 0.0
Conviction | 0.006 | 0.01** | 0.139** | - | 0.0 | 0.013** | 0.02** | 0.033** | 0.019** | 0.036** | 0.189** | 0.139**
Loevinger | 0.006 | 0.01** | 0.139** | 0.0 | - | 0.013** | 0.02** | 0.033** | 0.019** | 0.036** | 0.189** | 0.139**
Leverage | 0.009** | 0.013** | 0.136** | 0.003 | 0.003 | - | 0.006 | 0.019** | 0.006 | 0.023** | 0.202** | 0.126**
Cosine | 0.02** | 0.024** | 0.125** | 0.014** | 0.014** | 0.011** | - | 0.013** | 0.001 | 0.016** | 0.209** | 0.119**
Jaccard | 0.04** | 0.044** | 0.105** | 0.034** | 0.034** | 0.031** | 0.02** | - | 0.014** | 0.003 | 0.222** | 0.106**
Klosgen | 0.031** | 0.035** | 0.113** | 0.025** | 0.025** | 0.022** | 0.011** | 0.009** | - | 0.017** | 0.208** | 0.12**
Confidence | 0.049** | 0.052** | 0.096** | 0.042** | 0.043** | 0.039** | 0.028** | 0.009** | 0.017** | - | 0.225** | 0.103**
Kulczynski | 0.223** | 0.219** | 0.367** | 0.229** | 0.228** | 0.232** | 0.243** | 0.263** | 0.254** | 0.271** | - | 0.328**
Chi-Squared | 0.144** | 0.148** | 0.0 | 0.138** | 0.139** | 0.135** | 0.124** | 0.104** | 0.113** | 0.096** | 0.367** | -

Values of mean difference. Upper triangular matrix shows results for N = 36; lower triangular matrix shows results for N = 42.
ARM Measure | Support | Correl-Coeff | Lift | Conviction | Loevinger | Leverage | Cosine | Jaccard | Klosgen | Confidence | Kulczynski | Chi-Squared
Support | - | 0.001 | 0.133* | 0.013** | 0.013** | 0.017** | 0.024** | 0.047** | 0.037** | 0.058** | 0.232** | 0.132**
Correl-Coeff | 0.007(0. | - | 0.133* | 0.012** | 0.013** | 0.016** | 0.024** | 0.047** | 0.037** | 0.057** | 0.233** | 0.132**
Lift | 0.129** | 0.122** | - | 0.121** | 0.12** | 0.117** | 0.109** | 0.086** | 0.096** | 0.075** | 0.366** | 0.001
Conviction | 0.019** | 0.012** | 0.11** | - | 0.001 | 0.004 | 0.012** | 0.035** | 0.025** | 0.045** | 0.245** | 0.12**
Loevinger | 0.02** | 0.012** | 0.109* | 0.0 | - | 0.003 | 0.011** | 0.034** | 0.024** | 0.045** | 0.246** | 0.119**
Leverage | 0.018** | 0.011** | 0.11** | 0.001 | 0.001 | - | 0.008 | 0.031** | 0.021** | 0.041** | 0.249** | 0.116**
Cosine | 0.025** | 0.018** | 0.104* | 0.006 | 0.006 | 0.007 | - | 0.023** | 0.013** | 0.034** | 0.257** | 0.108**
Jaccard | 0.05** | 0.043** | 0.079* | 0.031** | 0.031** | 0.032** | 0.025** | - | 0.01** | 0.011** | 0.28** | 0.085**
Klosgen | 0.051** | 0.044** | 0.077* | 0.032** | 0.032** | 0.033** | 0.026** | 0.001 | - | 0.021** | 0.27** | 0.095**
Confidence | 0.07** | 0.063** | 0.059* | 0.051** | 0.051** | 0.052** | 0.045** | 0.02** | 0.019** | - | 0.29** | 0.074**
Kulczynski | 0.234** | 0.241** | 0.363* | 0.253** | 0.253** | 0.252** | 0.259** | 0.284** | 0.285** | 0.304** | - | 0.365**
Chi-Squared | 0.129** | 0.121** | 0.0 | 0.109** | 0.109** | 0.11** | 0.104** | 0.078** | 0.077** | 0.058** | 0.363** | -

Significance of difference in means: * p-value < 0.01; ** p-value < 0.001
+ The pairwise post-hoc comparison of measures was carried out after ANOVA with significant F-statistics (p-value < 0.001).

Figure 5 – Visualization for the Efficacy of the 48 Simulated Configurations using the Test Dataset
Legend: solid line shows the median; dotted line shows the mean.
* Colored boxes show interquartile ranges (IQR), i.e., the middle 50% of the values.
+ ANOVA and Tukey’s HSD confirmed statistically significant differences between Kulczynski and other measures.

Since BIM views may represent different counts of objects in project settings (Figure 3), the authors measured the average efficacy of association measures for views of different complexity (in terms of count of BIM objects). Figure 6 visualizes the results for Kulczynski, Support, Correlation-Coefficient, and Conviction (as the top candidate measures). This analysis further confirmed that Kulczynski has a clear advantage over the other measures. Figure 6 also shows that the efficacy of the BCRS remains relatively consistent, between 80% and 90%, for BIM views of different complexity.

The simulations showed that the change in the number of recommendations (N) can make up to a 13% difference in the average efficacy of Kulczynski (Table 4). The choice of N in this development is not only important for the BCRS’s efficacy; it also affects user experience, as a relatively small interface must accommodate N objects for presentation to the user. After consultation with BIM experts in the case-study firm, the authors deployed the BCRS with N = 30 to keep the interface concise and yet effective (>80% efficacy). The following section discusses the development of the user interface.

Figure 6 – The Efficacy of Simulated Configurations across BIM Views of Different Complexity in the Test Dataset (for clarity, only top 4 candidate measures are presented)

4.8. Model Deployment and User Interface Design
The simulation of BIM implementation tasks (sequentially placing objects on a view) showed that Kulczynski (with N = 30 or more) is a practical measure to support the proposed BCRS. Therefore, the authors used the final ARM model (Table 3) as the backbone data of a custom Revit add-in. Using the Revit API and C#, the authors programmed the add-in as an IExternalApplication to (a) create a Graphical User Interface (GUI) to customize the user experience in Revit, and (b) trigger context-aware events when users open a BIM view (using the ViewActivated event), update content in the view (using the DocumentChanged event), or select an object in the view (using the Idling event).

The BCRS add-in works as a dockable pane attached to the user interface in Revit (Figure 7; Part 2). This pane is composed of two sections: (1) AI Object Recommender (Figure 7; Part 2.A), and (2) Logical Object Browser (Figure 7; Part 2.B). The AI Object Recommender is responsive to the BIM implementation context and recurringly updates recommendations when users activate a BIM view, change its content, or select an object (i.e., reading from the ViewActivated, DocumentChanged, or Idling events). This recommender shows BIM objects with pre-saved thumbnail images. Revisiting the ARM model (Table 3), the objects represented in the active view are antecedents (Figure 7; Part 1) and the recommended objects are their consequents (Figure 7; Part 2.A). To improve the relevancy of recommendations, this study used multiple ARM models for different view types (e.g., ARM for Elevations, Sections, Details, or Plans; see ViewType in Table 1). The Logical Object Browser is responsive to the objects the user selects in the active view or in the AI Object Recommender, and it lists logical alternatives when parametric BIM objects are built to accommodate two or more alternatives. For instance, as shown in Figure 7, the Standard C-shaped Steel Channel selected in Part 2.A generated the list of alternative designations (e.g., C12x30, C9x15) in Part 2.B.

Figure 7 – Dockable GUI for the BCRS with Revit User Interface.
Part 1: Active View Shows Technical View in BIM and BIM Objects Used as Antecedents; Part 2: BCRS Add-in.
Part 2.A: AI Object Recommender Showing Consequents; Part 2.B: Logical Object Browser Showing Applicable Logical Alternatives to a Selected Element.

4.9. Maintenance Tasks for the Recommender System
Regular maintenance is indispensable for the proposed BCRS because the standards of engineering practice, BIM standards, and BIM platforms change over time. Table 6 lists the maintenance tasks that the authors identified during the prototype development and implementation phases.

Table 6 – Planned Maintenance Tasks in Response to Changes in BIM Implementation Ecosystem
Changes in BIM Implementation Ecosystem | Maintenance Tasks
Standards of Design and Engineering Practice | Recurringly collect data from BIM views in ongoing and future BIM projects to capture and mine the latest building assemblies and association of BIM objects. Rebuild the ARM model based on the latest BIM data.
BIM Standards | Clean and standardize BIM object data in the ARM model as new objects are added to the BIM library, old objects are decommissioned, and existing objects are changed (e.g., updates in naming conventions).
BIM Platforms (i.e., Revit) | Update the codebase for (1) the custom data collection application, and (2) the BCRS add-in developed for Revit when its Software Development Kit (SDK) and API evolve with new Revit versions.

4.10. Knowledge Representation and Usage
The knowledge generated by the ARM prototype has two primary stakeholders: (1) BIM users, who create and maintain digital building models in projects, and (2) BIM managers, who standardize BIM workflows and libraries for efficiency and consistency of practice across projects. For BIM users, the interface of the BCRS represents knowledge regarding associable BIM objects given a technical context at hand in projects (Figure 7). For BIM managers, the ARM output (e.g., Table 3) can offer knowledge regarding (a) the most frequently used BIM objects (ranking item-sets by support), (b) the least frequently used BIM objects (ranking item-sets by support), and (c) the strongly associated BIM objects (ranking item-sets by strength of ARM measures). BIM managers can translate this knowledge into actionable insights on the potential changes to BIM content, standards, and processes. For instance, BIM managers may create a new set of composite content to combine strongly associated BIM objects (e.g., embedding the ‘Rebar Coupler’ object into a parametric ‘Rebar’ object to facilitate showing or hiding embedded objects instead of maintaining and using two different objects). BIM managers may also pre-load a set of the most frequently used BIM objects into standard project templates to mitigate the need to constantly access digital BIM libraries and wait for objects to load. BIM managers may also exclude a set of the least frequently used BIM objects from standard project templates to keep the size of BIM models manageable and avoid unfavorable user experience (e.g., sluggish BIM platforms when working with large models). Therefore, the knowledge on the state of BIM object usage across projects can inform BIM managers on priority tasks of maintaining BIM content libraries and project artifacts.
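
Ranking objects by support, as a BIM manager might do to choose template content, takes only a sort. The sketch below is hypothetical: the function name `rank_objects_by_support` is an assumption, and the four support values are the "Support X" figures shown for those objects in Table 3.

```python
def rank_objects_by_support(support, k):
    """Return (k most used, k least used) object names from a support table."""
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    most_used = [name for name, _ in ranked[:k]]
    least_used = [name for name, _ in ranked[-k:][::-1]]  # rarest first
    return most_used, least_used


supports = {
    "Anno Break Line - Single": 0.915,
    "Weld Symb": 0.366,
    "Rebar-Detail-Bar Section": 0.236,
    "CMU-Section": 0.019,
}
preload, exclude = rank_objects_by_support(supports, k=2)
```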

5. Experimental Implementation
To evaluate the BIM process efficiency achieved from the BCRS, the authors carried out an experiment in the case-study firm. In this experiment, the authors invited five BIM users to create a sample of 50 BIM views randomly selected from the dataset. To ensure these views captured the conventional BIM tasks in the firm, the sampling targeted the interquartile range for complexity of views (i.e., showing 8 to 20 BIM objects; Figure 3).

The experiment used the One-Group Pretest-Posttest design (Campbell and Stanley 2015), wherein the users created BIM views first without using the BCRS (pretest) and then re-created the same BIM views using the BCRS (posttest). The goal was to time the users, calculate the average time savings achieved by using the BCRS, and determine whether there was a statistically significant difference between the time spent on pretest and posttest. The posttest was carried out after a 5-minute demonstrative introduction to the BCRS’s GUI since it was the first exposure of the users to this tool. To control for the effect of user characteristics and maturation (e.g., industry experience, technical knowledge, and familiarity with BIM objects), the authors invited users with seven or more years of experience in BIM modeling at the firm. To minimize the effect of history, instrumentation, and testing, the pretest and posttest for each BIM view were carried out one week apart in a controlled office setting at the same daytime hour using the same BIM platform for all users (Campbell and Stanley 2015). The total time spent on pretest was 14.35 hours and the total time spent on posttest was 12.7 hours (~27 hours active time for the experiment).

To analyze the experimental data, the authors carried out a paired sample t-test to compare work on BIM views with and without the recommendation system. This analysis showed that using the BCRS, on average, saved 15 percent of the active time spent on the BIM views. Furthermore, in this sample, there was a significant difference in the average time spent on BIM views with the BCRS (mean = 15.1 minutes, STD = 11.8) and without the BCRS (mean = 17.2 minutes, STD = 11.35) in the t-test (mean difference = 2.1; t(49) = 7.66; p < 0.001). In summary, these results show that, by dynamically predicting and presenting associative BIM objects, this BCRS can improve the efficiency of conventional BIM workflows with statistically significant outcomes.
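
The paired-sample t statistic divides the mean of the per-view time differences by its standard error. As a consistency check on the reported figures, a mean difference of 2.1 minutes with t(49) = 7.66 over 50 views implies a standard deviation of the paired differences of about 2.1 × √50 / 7.66 ≈ 1.94 minutes. The sketch below shows the computation on made-up timings; the function name `paired_t` and the data are illustrative assumptions, and the p-value lookup is omitted.

```python
import math
import statistics


def paired_t(pretest, posttest):
    """Return (mean difference, t statistic, degrees of freedom)."""
    diffs = [before - after for before, after in zip(pretest, posttest)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Sample standard deviation of the paired differences (n - 1 denominator).
    sd_diff = statistics.stdev(diffs)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return mean_diff, t_stat, n - 1


# Hypothetical minutes per view without (pretest) and with (posttest) the tool.
mean_diff, t_stat, dof = paired_t([10, 12, 14, 16], [9, 10, 13, 14])
```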

6. Discussion
Despite the exponential growth of the literature on BIM applications in the AEC industry, research on BCRSs is still in its infancy. This section discusses the results and significance of this study in comparison to the narrow body of literature on this topic. This study developed an innovative AI-backed BCRS to bridge some critical development gaps in the recent BCMD solutions and BCRSs. First, the proposed prototype mines BIM content association data from existing projects to mitigate the need for collecting heterogeneous non-BIM data. This marks a departure from existing solutions that require subjective user ratings (e.g., Lee et al. 2020) or product images (Zhang, Liu, and Al-Hussein 2018) to inform the AI-engine on the relevancy or importance of BIM objects. Second, the proposed prototype automatically reads the technical context of live BIM sessions (e.g., type of technical view; existing objects in a BIM model) and offers recommendations accordingly. Recent solutions, however, rely on manual user queries (e.g., search strings; Lee et al. 2020) or other inputs (e.g., uploading images; Zhang, Liu, and Al-Hussein 2018) as the basis to trigger the AI-engine and generate recommendations. Third, by using ARM as the ML method, the proposed prototype is completely unsupervised, is computationally efficient and inexpensive for model building, and does not require iterative development cycles, manual exploration, or fine-tuning of hyper-parameters. In addition to the streamlined development process and improved user experience, the simulation and experimental implementation of the proposed prototype achieved significant efficacy (>80% accuracy in prediction) and efficiency (saving 15% on man-hours). Although the existing research to date has not yet provided similar statistics to facilitate a direct quantitative comparison of the prototypes (Lee et al. 2020, Zhang, Liu, and Al-Hussein 2018), the promising results in this study show that BCMD can benefit from future AI developments to improve the user experience and efficiency in BIM implementation in AEC domains.

7. Conclusion

This research developed and implemented a context-aware, AI-backed BCRS in a case-study engineering firm to minimize the manual efforts commonly associated with the development and use of existing BCRSs. In particular, the proposed BCRS leveraged big data from over 30,000 technical BIM views to build an unsupervised ARM model and explicate the strength of associative relationships among BIM objects. This BCRS dynamically provides users with a set of BIM objects that are probabilistically associable with the cues from modeling context (e.g., view type, existing objects in the model) and human-computer interactions in live BIM sessions (e.g., objects selected by the user). Based on the simulation carried out on over 6,000 actual BIM views, the BCRS achieved an efficacy of over 80% in predicting the sought-after BIM objects needed for sequentially creating the views. The field experiment of the BCRS in the case-study firm showed that it can save on average 15% of the time spent on conventional BIM modeling tasks.
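As an illustration of how context cues could drive recommendations and how a sequential replay could score them, the sketch below estimates pairwise confidence from historical views, ranks candidate objects given the objects already in a view, and counts how often the next object placed appears in the top-k list. This is a simplified stand-in under stated assumptions, not the study's simulation protocol: the data, the function names (`confidence_table`, `recommend`, `replay_hit_rate`), the top-k cutoff, and the scoring rule (maximum confidence over context objects) are all hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical historical views; list order stands for the order of placement.
train_views = [
    ["anchor_bolt", "base_plate", "grout"],
    ["anchor_bolt", "base_plate", "steel_column"],
    ["base_plate", "grout", "steel_column"],
    ["anchor_bolt", "base_plate", "grout", "steel_column"],
    ["rebar", "concrete_footing"],
]
# Hypothetical held-out views to replay.
test_views = [
    ["base_plate", "anchor_bolt", "grout"],
    ["rebar", "steel_column"],
]


def confidence_table(views):
    """Estimate confidence(a -> b) = count(a and b) / count(a) from past views."""
    item_counts = Counter(i for v in views for i in set(v))
    pair_counts = Counter(p for v in views for p in permutations(set(v), 2))
    return {(a, b): c / item_counts[a] for (a, b), c in pair_counts.items()}


def recommend(context, table, k=3):
    """Rank objects absent from the view by their best confidence given the context."""
    scores = {}
    for (a, b), conf in table.items():
        if a in context and b not in context:
            scores[b] = max(scores.get(b, 0.0), conf)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda b: (-scores[b], b))[:k]


def replay_hit_rate(views, table, k=3):
    """Replay each view object by object; a hit means the next object was in the top k."""
    hits = trials = 0
    for view in views:
        for i in range(1, len(view)):
            trials += 1
            hits += view[i] in recommend(set(view[:i]), table, k)
    return hits / trials if trials else 0.0


table = confidence_table(train_views)
print(recommend({"base_plate"}, table, k=2))
print(replay_hit_rate(test_views, table, k=2))
```

In this toy setup the second test view produces a miss because `steel_column` never co-occurs with `rebar` in the historical data, which mirrors the dependence on precedents that a precedent-based recommender inherits.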

This study makes noteworthy contributions to research and development in engineering informatics, especially BIM processes and technologies. First, this study offers an innovative approach to learning from historical BIM data for BCMD. This approach leverages coherent technical BIM precedents while mitigating the need for actively collecting heterogeneous data like user similarity, personalized ratings, or popularity of contents to serve the recommendation system. Second, this study demonstrates the application of BCRS in live BIM sessions, wherein the recommender system can automatically respond to human-computer interactions and technical context in models. Earlier prototypes predominantly need recurring input from users to offer recommendations. With these contributions, this study emphasizes that recommendation systems must align with the nature of data, users, tasks, and work domain to be effective (Falk 2019). Therefore, the authors caution BIM practitioners against merely adopting conventional e-commerce recommendation systems for such technical tasks as building information modeling.

Although the research and development methods presented in this case study are replicable and transferable to other cases, the generalizability of the achieved efficacy and time savings is subject to certain limitations. For instance, BIM standards and workflows may vary significantly across firms and across AEC disciplines (e.g., the number of BIM objects maintained in BIM libraries and used in projects). Furthermore, since this BCRS relies on BIM precedents, it is biased toward available data from historical cases. Therefore, similar to other recommender systems, this BCRS may not account for new BIM objects or new digital building assemblies that AEC firms introduce into their design and engineering practice. As discussed earlier in section 4.9, recurring maintenance tasks must be carried out to address such limitations or biases in the system.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

8. References

Abdi, Hervé, and Lynne J Williams. 2010. "Tukey’s honestly significant difference (HSD) test." Encyclopedia of research design 3 (1):1-5.

Abdirad, H., and C. Dossick. 2020. "Rebaselining Asset Data for Existing Facilities and Infrastructure." Journal of Computing in Civil Engineering 34 (1):05019004. doi: 10.1061/(ASCE)CP.1943-5487.0000868.

Abdirad, H., and Kenyu Lin. 2015. "Advancing in Object-Based Landscape Information Modeling: Challenges and Future Needs." 2015 International Workshop on Computing in Civil Engineering, Austin, TX.

Afsari, Kereshmeh, and Chuck Eastman. 2014. "Categorization of building product models in BIM Content Library portals." Blucher design proceedings 1 (8).

Agrawal, Rakesh, Tomasz Imieliński, and Arun Swami. 1993. "Mining association rules between sets of items in large databases." Proceedings of the 1993 ACM SIGMOD international conference on Management of data.

Agrawal, Rakesh, and Ramakrishnan Srikant. 1994. "Fast algorithms for mining association rules." Proc. 20th int. conf. very large data bases, VLDB.

Akhavian, Reza, and Amir H. Behzadan. 2015. "Construction equipment activity recognition for simulation input modeling using mobile sensors and machine learning classifiers." Advanced Engineering Informatics 29 (4):867-877. doi: https://doi.org/10.1016/j.aei.2015.03.001.

Akinosho, Taofeek D., Lukumon O. Oyedele, Muhammad Bilal, Anuoluwapo O. Ajayi, Manuel Davila Delgado, Olugbenga O. Akinade, and Ashraf A. Ahmed. 2020. "Deep learning in the construction industry: A review of present status and future innovations." Journal of Building Engineering 32:101827. doi: https://doi.org/10.1016/j.jobe.2020.101827.

Amer, Fouad, and Mani Golparvar-Fard. 2021. "Modeling dynamic construction work template from existing scheduling records via sequential machine learning." Advanced Engineering Informatics 47:101198. doi: https://doi.org/10.1016/j.aei.2020.101198.

Azevedo, Paulo J, and Alípio M Jorge. 2007. "Comparing rule measures for predictive association rules." European Conference on Machine Learning.

Berzal, Fernando, Ignacio Blanco, Daniel Sánchez, and Maria-Amparo Vila. 2002. "Measuring the accuracy and interest of association rules: A new framework." Intelligent Data Analysis 6 (3):221-235.

Bilal, Muhammad, and Lukumon O. Oyedele. 2020. "Guidelines for applied machine learning in construction industry—A case of profit margins estimation." Advanced Engineering Informatics 43:101013. doi: https://doi.org/10.1016/j.aei.2019.101013.

Braun, Alex, and André Borrmann. 2019. "Combining inverse photogrammetry and BIM for automated labeling of construction site images for machine learning." Automation in Construction 106:102879. doi: https://doi.org/10.1016/j.autcon.2019.102879.

Brin, Sergey, Rajeev Motwani, and Craig Silverstein. 1997. "Beyond market baskets: Generalizing association rules to correlations." Proceedings of the 1997 ACM SIGMOD international conference on Management of data.

Brin, Sergey, Rajeev Motwani, Jeffrey D Ullman, and Shalom Tsur. 1997. "Dynamic itemset counting and implication rules for market basket data." Proceedings of the 1997 ACM SIGMOD international conference on Management of data.

Campbell, Donald T, and Julian C Stanley. 2015. Experimental and quasi-experimental designs for research: Ravenio Books.

Chen, Meng-Hui, Chin-Hung Teng, and Pei-Chann Chang. 2015. "Applying artificial immune systems to collaborative filtering for movie recommendation." Advanced Engineering Informatics 29 (4):830-839. doi: https://doi.org/10.1016/j.aei.2015.04.005.

Cheng, Jack C. P., Weiwei Chen, Keyu Chen, and Qian Wang. 2020. "Data-driven predictive maintenance planning framework for MEP components based on BIM and IoT using machine learning algorithms." Automation in Construction 112:103087. doi: https://doi.org/10.1016/j.autcon.2020.103087.

Dayan, Aviram, Guy Katz, Naseem Biasdi, Lior Rokach, Bracha Shapira, Aykan Aydin, Roland Schwaiger, and Radmila Fishel. 2011. "Recommenders benchmark framework." Proceedings of the fifth ACM conference on Recommender systems.

Falk, Kim. 2019. Practical Recommender Systems. Shelter Island: Manning Publications.

Flach, Peter. 2012. Machine learning: the art and science of algorithms that make sense of data: Cambridge University Press.

Flyvbjerg, B. 2001. Making Social Science Matter: Why Social Inquiry Fails and How it Can Succeed Again: Cambridge University Press.

Chollet, François. 2017. Deep learning with Python. Shelter Island: Manning Publications.

Hahsler, Michael. 2015. "A probabilistic comparison of commonly used interest measures for association rules." accessed 2020/12/20. http://michael.hahsler.net/research/association_rules/measures.html.

Han, Jiawei, Jian Pei, and Yiwen Yin. 2000. "Mining frequent patterns without candidate generation." ACM sigmod record 29 (2):1-12.

Holzer, Dominik. 2011. "BIM's seven deadly sins." International journal of architectural computing 9 (4):463-480.

Huang, M. Q., J. Ninić, and Q. B. Zhang. 2021. "BIM, machine learning and computer vision techniques in underground construction: Current status and future perspectives." Tunnelling and Underground Space Technology 108:103677. doi: https://doi.org/10.1016/j.tust.2020.103677.

Kim, Min-Koo, Julian Pratama Putra Thedja, Hung-Lin Chi, and Dong-Eun Lee. 2021. "Automated rebar diameter classification using point cloud data based machine learning." Automation in Construction 122:103476. doi: https://doi.org/10.1016/j.autcon.2020.103476.

Larose, D. T. 2015. Data Mining and Predictive Analytics, Wiley Series on Methods and Applications in Data Mining: Wiley.

Lee, Pin-Chan, Danbing Long, Bo Ye, and Tzu-Ping Lo. 2020. "Dynamic BIM component recommendation method based on probabilistic matrix factorization and grey model." Advanced Engineering Informatics 43:101024. doi: https://doi.org/10.1016/j.aei.2019.101024.

Lomio, F., R. Farinha, M. Laasonen, and H. Huttunen. 2018. "Classification of Building Information Model (BIM) Structures with Deep Learning." 2018 7th European Workshop on Visual Information Processing (EUVIP), 26-28 Nov. 2018.

McCabe, Brenda, M. AbouRizk, and Randy Goebel. 1998. "Belief Networks for Construction Performance Diagnostics." Journal of Computing in Civil Engineering 12 (2):93-100. doi: 10.1061/(ASCE)0887-3801(1998)12:2(93).

Mythili, MS, and AR Mohamed Shanavas. 2013. "Performance evaluation of apriori and fp-growth algorithms." International Journal of Computer Applications 79 (10).

Park, Hun Myoung. 2009. "Comparing group means: t-tests and one-way ANOVA using Stata, SAS, R, and SPSS." The University Information Technology Services (UITS) Center for Statistical and Mathematical Computing, Indiana University, IN, USA.

Piatetsky-Shapiro, Gregory. 1991. "Discovery, analysis, and presentation of strong rules." Knowledge discovery in databases:229-238.

Raschka, Sebastian. 2018. "MLxtend: providing machine learning and data science utilities and extensions to Python's scientific computing stack." Journal of open source software 3 (24):638.

Sacks, Rafael, Mark Girolami, and Ioannis Brilakis. 2020. "Building Information Modelling, Artificial Intelligence and Construction Tech." Developments in the Built Environment 4:100011. doi: https://doi.org/10.1016/j.dibe.2020.100011.

Taghavi, Mona, Jamal Bentahar, Kaveh Bakhtiyari, and Chihab Hanachi. 2018. "New Insights Towards Developing Recommender Systems." The Computer Journal 61 (3):319-348. doi: 10.1093/comjnl/bxx056.

Tamke, Martin, Paul Nicholas, and Mateusz Zwierzycki. 2018. "Machine learning for architectural design: Practices and infrastructure." International Journal of Architectural Computing 16 (2):123-143. doi: 10.1177/1478077118778580.

Tan, P.N., M. Steinbach, A. Karpatne, and V. Kumar. 2019. Introduction to Data Mining. Essex: Pearson.

Tan, Pang-Ning, Vipin Kumar, and Jaideep Srivastava. 2004. "Selecting the right objective measure for association analysis." Information Systems 29 (4):293-313.

Tzonis, A., and I. White. 2012. Automation Based Creative Design - Research and Perspectives. Amsterdam: Elsevier Science.

UNIFI. 2020. "Reasons You Need a BIM Content Management System." Digital Built Environment Institute, accessed 12/12/2020. https://www.dbei.org/news/top-3-reasons-you-need-a-bim-content-management-system/.

Wilson, O. D., K. Sharpe, and R. Kenley. 1987. "Estimates given and tenders received: a comparison." Construction Management and Economics 5 (3):211-226. doi: 10.1080/01446198700000021.

Wu, Tianyi, Yuguo Chen, and Jiawei Han. 2007. "Association mining in large databases: A re-examination of its measures." European conference on principles of data mining and knowledge discovery.

Wu, Tianyi, Yuguo Chen, and Jiawei Han. 2010. "Re-examination of interestingness measures in pattern mining: a unified framework." Data Mining and Knowledge Discovery 21 (3):371-397.

Xu, Yayin, Ying Zhou, Przemyslaw Sekula, and Lieyun Ding. 2021. "Machine learning in construction: From shallow to deep learning." Developments in the Built Environment 6:100045. doi: https://doi.org/10.1016/j.dibe.2021.100045.

Yin, R.K. 2009. Case Study Research: Design and Methods. Thousand Oaks: SAGE Publications.

Zhang, C., and S. Zhang. 2003. Association Rule Mining: Models and Algorithms. Berlin: Springer Berlin Heidelberg.

Zhang, Yuxuan, Hexu Liu, and Mohamed Al-Hussein. 2018. "Recommender System for Improving BIM Efficiency: An Interior Finishing Case Study." In Construction Research Congress 2018, 22-32.

Zhou, Zhipeng, Yang Miang Goh, and Lijun Shen. 2016. "Overview and Analysis of Ontology Studies Supporting Development of the Construction Industry." Journal of Computing in Civil Engineering 30 (6):1-14. doi: 10.1061/(ASCE)CP.1943-5487.0000594.