NLP-Based Query-Answering System For Information Extraction From Building Information Models

Ning Wang, S.M.ASCE 1; Raja R. A. Issa, F.ASCE 2; and Chimay J. Anumba, F.ASCE 3
Abstract: The construction industry is information-intensive, and building information modeling (BIM) has been proposed as an information source for supporting decision making by construction project team members in the architecture, engineering, construction, and operation (AECO) industry. Because building information models contain more building data, further use of the aggregated building information to support construction and operation activities has become important. In Industry 4.0, similar-to-real-life virtual assistants, e.g., Apple’s Siri and Google Assistant, are becoming ever more popular. This research developed a query-answering (QA) system for BIM information extraction (IE) by using natural language processing (NLP) methods to build a virtual assistant for construction project team members. The architecture of the developed QA system for BIM IE consists of three major modules: natural language understanding, IE, and natural language generation. A Python-based prototype application was developed based on the architecture of the QA system for BIM IE to evaluate functionalities of the developed QA system using several BIM/industry foundation classes (IFC) models. Seven building information models and 127 test queries were utilized to evaluate the accuracy of the developed QA system for BIM IE. The experimental results indicated that the developed QA system for BIM IE achieved an 81.9 accuracy score. The developed NLP-based QA system for BIM is valid to provide relatively accurate answers based on natural language queries. The contributions of this research facilitate the development of virtual assistants in the AECO industry, and the architecture of the developed QA system can be extended to queries in other areas. DOI: 10.1061/(ASCE)CP.1943-5487.0001019. © 2022 American Society of Civil Engineers.
Author keywords: Building information modeling (BIM); Query answering (QA); Natural language processing (NLP); Industry foundation classes; Information extraction (IE); Virtual assistant.
1 Ph.D. Candidate, Rinker School of Construction Management, Univ. of Florida, Gainesville, FL 32611 (corresponding author). ORCID: https://orcid.org/0000-0003-3096-2385. Email: n.wang@ufl.edu
2 Distinguished Professor, Rinker School of Construction Management, Univ. of Florida, Gainesville, FL 32611. ORCID: https://orcid.org/0000-0001-5193-3802. Email: raymond-issa@ufl.edu
3 Dean and Professor, College of Design, Construction, and Planning, Univ. of Florida, Gainesville, FL 32611. Email: anumba@ufl.edu
Note. This manuscript was submitted on July 23, 2021; approved on December 21, 2021; published online on February 21, 2022. Discussion period open until July 21, 2022; separate discussions must be submitted for individual papers. This paper is part of the Journal of Computing in Civil Engineering, © ASCE, ISSN 0887-3801.
J. Comput. Civ. Eng., 2022, 36(3): 04022004

Introduction

Building information modeling (BIM) has attained increasing popularity as a tool to provide information support for practitioners in the architecture, engineering, construction, and operation (AECO) industry. BIM not only focuses on developing the digital model of a physical building but also aims to provide more comprehensive information acquisition services to onsite and off-site construction personnel. However, the outcomes of the existing information extraction (IE) from building information models are limited to a data spreadsheet, an information list, and other data representations, which require BIM users to have knowledge of the BIM data structure and experience in manipulating BIM software. Existing research on BIM IE focused on obtaining structured BIM data by using a structured query language (SQL) or SPARQL query language request from the BIM database (Krijnen and Beetz 2018; Niknam and Karshenas 2015; Zhang and Issa 2013). However, BIM users will need more time to study query languages and BIM databases. Compared with SQL and SPARQL, natural language queries and answers are more acceptable for nonexpert users (Paredes-Valverde et al. 2016). The existing BIM IE efforts are limited in recognizing natural language queries and generating corresponding natural language responses. In addition, with the increase in data size and complexity of models and software functions, BIM users will need more time to study BIM software and tools, and the process of information acquisition will become more difficult (Lin et al. 2016).

In the Fourth Industrial Revolution (Industry 4.0), an increasing number of systems and companies are conducting research on developing natural language–based virtual assistants or query-answering (QA) systems to support daily life, such as Apple Siri, Amazon Alexa, Google Assistant, IBM Watson, Microsoft Cortana, and Nvidia Jarvis. QA systems aim to develop a human–machine interactive dialogue system to provide information support to humans (Park and Kang 2019). However, existing research on QA for IE from building information models is limited. A QA system for BIM IE can provide natural language–based information acquisition from building information models. Natural language processing (NLP) is used to enable a machine to imitate natural human language capabilities and is used to build a natural language–based virtual assistant to support daily work activities. This research aims to use NLP methods to develop a QA system for BIM IE. The virtual assistant for BIM IE requires (1) analyzing a natural human language query and identifying different types of content words from the query; (2) locating target structured building information from a BIM data repository, based on the identified and classified content words; and (3) transforming structured information into a natural human response. This paper describes the research undertaken to achieve the aforementioned goals. It starts with a “Literature Review” section that explains the key terminology and background knowledge used and describes the state of the art of research on BIM IE. Following the literature review, the
“Methodology” section presents the approach taken to develop the QA system for BIM IE, including the system architecture. The developed system comprises three major modules: natural language understanding (NLU), IE, and natural language generation (NLG). The developed system utilizes the industry foundation classes (IFC) data format as the information repository for building information models because it is a widely supported, open-source BIM specification in the AECO industry.

The NLU module was developed to use semantic and syntactic NLP methods to recognize the types of natural language queries and identify different types of content words from queries. The IE module uses these content words to locate the target IFC information and extract the IFC instance data. The IE module directly extracts property information from BIM architectural and structural models without complex reasoning and computation. The NLG module generates the corresponding natural language response based on the structured information from the NLU and IE modules. The algorithms for the three modules were developed in this research. A Python-based prototype application was developed based on the architecture of the developed QA system for BIM. Seven BIM/IFC architectural/structural models and 127 natural language queries were used to test the functionalities and accuracy of the developed QA system for BIM. Comparisons were performed to illustrate the differences between the developed BIM IE system and other related IE methods. The developed QA system for BIM IE will help facilitate the development of other virtual assistants in the AECO industry.

Literature Review

Industry Foundation Classes

IFC is one of the most popular BIM data exchange specifications for different BIM software platforms in the AECO industry. buildingSMART International (bSI) proposed IFC as the international standard [ISO 16739-1:2018 (ISO 2018)] for a BIM data share and exchange format. Because the IFC specification is open-source and easily accessible, data in a BIM/IFC model can be accessed, checked, and freely modified without any license (Wang and Issa 2019). IFC was developed based on the Standard for the Exchange of Product model data (ISO-STEP) EXPRESS data modeling language (buildingSMART 2019). The EXPRESS language is a product data specification language defined by ISO standard (ISO 2004). IFC4 Addendum 2 Technical Corrigendum 1 (IFC4 ADD2 TC1) is the most stable and official version of the IFC specification. Therefore, this research used the IFC version IFC4 ADD2 TC1. IFC4 ADD2 TC1 [ISO 16739-1:2018 (ISO 2018)] was published in 2017 and contains 776 entities, 420 property sets, 93 quantity sets, and 130 defined data types (buildingSMART 2017). One STEP-IFC file comprises two sections of information: header and data. The header section defines the basic information, like Model View Definition (MVD). The data section contains considerable building component information, like three-dimensional (3D) topological data.

BIM Information Extraction

Building information searching and retrieval involve many time-consuming tasks (Sacks et al. 2018). The related application areas of information acquisition from BIM include model compliance check (Zhang and El-Gohary 2015, 2016), model comparison (Ghang et al. 2011; Shi et al. 2018), data retrieval (Lin et al. 2016; Liu et al. 2016), partial model extraction (Jongsung et al. 2013; Zhang and Issa 2013), and data interoperability and integration (Karan et al. 2016; Nepal et al. 2013). The outcomes of existing research and applications for BIM information acquisition can be summarized into three categories: (1) spreadsheets that contain all relevant building information based on the different ontology domains (Lin et al. 2016); (2) partial BIM/IFC model codes for the purpose of sharing, visualizing, and comparing (Jongsung et al. 2013; Zhang and Issa 2013); and (3) organized and retrieved information representation and data structure for model checks, data interoperability, or data retrieval use (Zhang and El-Gohary 2015, 2016).

Existing research on QA for IE from building information models is limited, so this paper utilized related works for demonstration. Some research has used the Resource Description Framework (RDF) and Web Ontology Language (OWL) as BIM/IFC repositories and SPARQL (SPARQL Protocol and RDF query language) as the query language to get a relevant BIM data spreadsheet from the BIM RDF or OWL database for information retrieval (Karan et al. 2016; Liu et al. 2016; Studer et al. 2007). However, SPARQL and ontology language are not easily understood by construction personnel. BIM RDF/OWL research requires transforming building data from BIM/IFC into an RDF or OWL data format, which is a time-consuming and complicated process, and users are required to have experience in RDF/OWL. Moreover, the storage size of an RDF or OWL file is much larger than its corresponding BIM/IFC file (buildingSMART 2019; Krijnen and Beetz 2018).

Lin et al. (2016) proposed an NLP-based BIM framework for information retrieval. Their research utilized syntactic NLP as a tool to extract keywords from natural language requests. However, in their approach, after the part-of-speech (POS) tagging of natural language, the classification of concept and property words in a sentence depended on keyword direct mapping with the IFC dictionary library, which restricted the word selection of queries. Zhang and El-Gohary (2015) developed an automated IE from BIM, transforming it into semantic logic–based data representation for automated compliance checking (ACC) of building information models. Their research was based on first-order logic (FOL), which is the most prominent and fundamental logical formalism for semantic NLP use. Their research outcome was a logic-based information representation that was close to natural language.

The general purpose of existing research on BIM information acquisition was to provide information support for AECO activities, but the research results are difficult for AECO personnel with limited BIM experience to interpret. Natural language queries and outcomes are expected to be more acceptable to nonexpert BIM users (Paredes-Valverde et al. 2016). The QA for BIM research compared the proposed IE methodology with the related research of Zhang and El-Gohary (2015), Lin et al. (2016), and Karan et al. (2016) in the “Discussion” section. The reason for comparing the proposed methodology with the aforementioned studies was that they focused on information retrieval or extraction requests from building information models, which is related to the BIM IE in this QA for BIM research.

Natural Language Processing

Existing QA applications in building information acquisition have been limited. This research proposed a QA system to mimic language capabilities in providing building information. To mimic natural language capabilities, this research used NLP methods (i.e., semantic and syntactic analysis) instead of machine learning methods. Machine learning (ML) is one of the most popular approaches to process a large volume of raw data to make predictions. ML-based QA systems require a large volume of textual dialogue for training and testing purposes, but such training data for the AECO industry are very limited. Therefore, this research utilized NLP methods for the understanding and generation of natural language. NLP aims to process natural language, which enables a machine to mimic human language capabilities (Cherpas 1992; Zhang and El-Gohary 2015). The major research on NLP focused on semantic features and syntactic rules and structures within a natural language sentence. Semantic NLP focuses on processing the literal meaning of natural language. Syntactic NLP deals with the grammar structure of a sentence. There are two major application areas of NLP, NLU and NLG, which implement the semantic and syntactic analysis.

Natural Language Understanding

NLU is a subfield of NLP focused on getting a machine to identify and interpret natural language. NLU is widely applied for information retrieval, information extraction, text classification, and query-answering systems (Mujtaba and Mahapatra 2019). NLU is also a core component of a virtual assistant system (Zheng et al. 2020). NLU enables a machine to process natural language inputs. This research utilized semantic and syntactic analysis for natural language understanding. In computational linguistics, NLU also deals with synonyms. Similar to Wu et al. (2019) and Zhang and El-Gohary (2016), this QA for BIM research utilized WordNet (2020) to deal with synonyms.

Natural Language Generation

A natural language response is required for a query-answering system or a virtual assistant (Park and Kang 2019; Su and Chen 2019). NLG was developed to generate such natural language responses. NLG is a subtopic of NLP, which focuses on processing syntactic features within natural language. NLG is able to transform structured data into natural language, which is also the basic concept of IE. NLG was developed to describe structured data via textual representation (i.e., natural language representation), which can be easily understood by humans.

In general, the process of NLG needs six dependent elements: content determination, discourse planning, sentence aggregation, lexicalization, referring expression generation, and linguistic realization (Montenegro et al. 2018). As for the QA system, those six elements can be categorized into semantic content and syntactic structure (Wang and Issa 2020). In NLG, the semantic content is extracted from structured data. The syntactic sentence structure is a grammar rule in a natural language sentence, which can be represented by POS tagging. The generation of natural language requires grammar rules and syntactic structure to form a sentence. The grammar syntax is also required for the processing of NLU.

Grammar Syntax and Part of Speech

The syntactic sentence structure or POS tagging is the key to NLU and NLG (Li et al. 2019). The underlying sentence structure of natural human language in English is the subject-verb-object (S-V-O) syntactic word order (Celce-Murcia et al. 1999). A phrase structure is a basic component to form the S-V-O sentence structure, and the phrase structure provides rules to demonstrate three indispensable parts of the S-V-O sentence structure. The phrase categories of grammar syntax in a sentence include noun phrase (NP), verb phrase (VP), and prepositional phrase (PP). NP and VP are the two basic phrase structures to represent a sentence (i.e., NP + VP), in which NP stands for the subject and VP summarizes the verb and object parts.

To form the phrase structure of a sentence, English vocabularies are required, including noun, determiner, adjective, verb, adverb, preposition, and conjunction. Noun, adjective, verb, and adverb are content words, whereas determiner, preposition, and conjunction are function words in English grammar syntax. Content words carry actual meaning, whereas function words serve grammatical functions. In the process of NLP, POS tagging is applied to tag different elements in a sentence, like noun, verb, and adjective. The Penn Treebank POS tag set is one of the most popular tag sets for NLP (Penn Treebank 2003). A grammar tree is utilized to visualize the phrase structure and POS tags in a sentence. Fig. 1 shows a grammar tree for an example sentence of “The elevation of the second level is 10 feet.” In this grammar tree, the root is the sentence (S), and there are several routes from the root to the leaves. The underlying phrase structure is the type of NP + VP, and forming a phrase structure is based on the end nodes, which are word types, such as noun (NN), determiner (DT), and preposition (IN). The syntactic structure or sentence segmentation is the fundamental linguistic and grammar basis for NLP (Lin et al. 2016; Wu et al. 2019). Therefore, this research implemented POS tagging to analyze natural language input (i.e., NLU), and the phrase structure of NP + VP was utilized to generate natural human language based on the BIM/IFC model data (i.e., NLG).

Fig. 1. Grammar tree: phrase structure in a sentence.

Methodology

The goal of the QA system for BIM is to provide BIM users with natural language queries and answers. For example, a project manager can type in a natural language query like “What is the height of the second floor?,” and the QA system can answer the project manager via natural language, like “The height of the second floor is 10 feet.” The QA system for BIM can be considered as a virtual assistant for project managers to obtain useful building information from BIM.
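
As a rough illustration of how the NP + VP phrase structure can shape an answer such as “The height of the second floor is 10 feet,” the sketch below assembles a sentence from three pieces of structured information. The function name `generate_answer`, its parameters, and the single fixed template are assumptions made for illustration only; this is not the NLG algorithm developed in this research.

```python
def generate_answer(attribute_word: str, name_phrase: str, value: str) -> str:
    """Build a one-sentence answer with an NP + VP phrase structure."""
    # NP (subject): determiner + attribute noun + PP naming the queried instance.
    noun_phrase = f"The {attribute_word} of the {name_phrase}"
    # VP (verb and object parts): copular verb followed by the extracted value.
    verb_phrase = f"is {value}"
    return f"{noun_phrase} {verb_phrase}."


print(generate_answer("height", "second floor", "10 feet"))
# prints: The height of the second floor is 10 feet.
```

A single template of this kind only covers attribute-of-instance answers; other answer types would need their own phrase structures.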

BIM users can utilize a textual natural language question to query the developed system, and the QA system for BIM can provide users with natural language responses. Because there are many types of information aggregated in a BIM/IFC model, such as architectural, structural, mechanical, electrical, and plumbing information, this research focused on building data from the architectural and structural BIM/IFC models. The target information included building component property information (e.g., property object types), geometric information (e.g., floor elevation), and basic model information (e.g., model creation date). To achieve the research goal, the architecture of the NLP-based QA system for BIM IE was designed and developed to consist of three major modules: NLU, IE, and NLG. Fig. 2 shows the basic architecture of the developed system.

Fig. 2. NLP-based QA system for BIM IE.

Due to the limited AECO training and testing data available, this research implemented grammar and syntax analysis for NLU and NLG. The NLU module was used to understand and classify the input textual natural language query. NLU aimed to identify different content words within the natural language query and output words with classification to the IE module. The IE module was developed to locate the queried building information and extract the corresponding structured IFC data from the BIM/IFC model based on classified content words. This module aimed to directly extract such information from BIM/IFC models without computation and ontology reasoning. This research focused on the basic model and attribute information of IfcBuildingElements and IfcSpatialStructureElements from an architectural/structural building information model, including IfcDoor, IfcWindow, IfcBuildingStorey, IfcWall, IfcBeam, IfcRoof, IfcColumn, IfcSlab, IfcSpace, and IfcStair. An NLG module was finally implemented to transform the structured IFC data into a natural language response.

A Python-based prototype application was developed based on the architecture of the QA system for BIM IE. NLU, IE, and NLG were programmed and developed in isolated modules so that the main system can call different modular functions. The schema of the input IFC file is IFC4 ADD2 TC1, and the MVD is the design transfer view. After the developed QA system for BIM IE receives the natural language query text, the first step is to identify the target IFC section by the NLU module because the data structures of the IFC header and data sections are different. To better extract IFC data and generate natural language answers, the NLU module requires checking the queried IFC type. Although the information within the header section is simple, it also contains important model data. For example, users need the model creation date because there might be many versions of a model. Building models are commonly changed, and users need to track the version of the model. The model creation date provides the version information. Also, the host information provides users a contact person for communication. For a large construction project, it is difficult to know the responsible person for the building information model. For different IFC sections, the detailed algorithms for NLU, IE, and NLG are different. For example, if the input query is asking for header section information, like the IFC file creation date, one phrase from the input query will be used to search IFC data and generate a natural language answer. However, most times, the input query is requesting building information from the data section because the data section contains more building data. In this scenario, three types of phrases and words were identified and classified by the NLU module, and those phrases are the major indexes to find the target structured IFC data and key components to generate the corresponding natural language responses.

Natural Language Understanding Module

The goal of the developed NLU module was to identify and classify content words from the natural language query. Fig. 3 illustrates the NLU algorithm for the developed QA system for BIM IE. After a natural language query is inputted into the system, the first job is to tokenize and POS tag the input query. For example, if the query is “What is the height of the second floor?,” the output of tokenization and POS tagging shows the POS tag for each word, like (‘what’, ‘WP’), (‘is’, ‘VBZ’), (‘the’, ‘DT’), (‘height’, ‘NN’), (‘of’, ‘IN’), (‘the’, ‘DT’), (‘second’, ‘JJ’), (‘floor’, ‘NN’), (‘?’, ‘.’) (Figs. 4 and 5).

There are many tools for tokenization and POS tagging for NLU. For example, the Natural Language Toolkit (NLTK) is a leading Python library for processing natural language, such as tokenizing, tagging, and stemming (Bird et al. 2009). The Penn Treebank tag set is used by the NLTK library. Therefore, this research utilized NLTK to label each input word with a Penn Treebank POS tag.

Based on the tagged query, the next job was to check the queried IFC section. To better generate the corresponding natural language response, the target queried IFC section is necessary. The syntactic word dependencies within the tagged query were analyzed. Fig. 6 shows the syntax tree of a query example “What is the object of door 302?,” which is regarding the IFC data section. The child node NP (i.e., the object of door 302) in the parent node VP consists of an NP and a PP, and the node PP (i.e., of door 302) depends on the node NP (i.e., the object).
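
The tokenization and POS-tagging step can be sketched as follows. `nltk.word_tokenize` and `nltk.pos_tag` are standard NLTK calls, but the wrapper `tag_query`, the helper `content_words`, and the choice to keep only noun and adjective tags (the content words this paper identifies for the example query) are assumptions for illustration, not the prototype's code. NLTK is imported lazily because it needs its tokenizer and tagger resources installed beforehand.

```python
from typing import List, Tuple

TaggedToken = Tuple[str, str]

# Penn Treebank tag prefixes kept as content words in this sketch (assumption):
# NN* = nouns, JJ* = adjectives.
CONTENT_TAG_PREFIXES = ("NN", "JJ")


def tag_query(query: str) -> List[TaggedToken]:
    """Tokenize and POS-tag a query with NLTK (Penn Treebank tag set)."""
    import nltk  # lazy import; tokenizer and tagger resources must be installed

    return nltk.pos_tag(nltk.word_tokenize(query))


def content_words(tagged: List[TaggedToken]) -> List[TaggedToken]:
    """Keep only the tokens whose tag marks a noun or an adjective."""
    return [(word, tag) for word, tag in tagged if tag.startswith(CONTENT_TAG_PREFIXES)]


# Tagged tokens for "What is the height of the second floor?" as listed in the text.
TAGGED_EXAMPLE = [
    ("what", "WP"), ("is", "VBZ"), ("the", "DT"), ("height", "NN"), ("of", "IN"),
    ("the", "DT"), ("second", "JJ"), ("floor", "NN"), ("?", "."),
]

print(content_words(TAGGED_EXAMPLE))
# prints: [('height', 'NN'), ('second', 'JJ'), ('floor', 'NN')]
```

Calling `tag_query` on the same question would produce a tagged list of this shape, although the exact tags depend on the installed tagger model.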

Fig. 3. NLU algorithm for QA system for BIM.
Fig. 4. NLU classification of query for IFC header section.
Fig. 5. NLU classification of query for IFC data section.
Fig. 6. Syntax tree: word dependencies in a sentence.
This research used syntactic word dependencies to differentiate queries for the IFC header and data sections. For example, the query “What is the model creation date?” regarded the model information from the header section, and there was no word dependency relationship between an NP and a PP, whereas the query for the data section will contain a word dependency relationship between an NP and a PP, like the example query shown in Fig. 6. For queries like “What is the creation date of the model?,” although the query contains a word dependency relationship between an NP and a PP, the information within the PP “of the model” is unnecessary for NLU. Therefore, this research eliminated such useless PPs (e.g., “in the building information model”) before NLU.

The criterion for distinguishing useless PPs from useful PPs in natural language queries was whether the PP contained building element information: a useless PP contains none. The PP in the query regarding the IFC data section will contain building element information, for example, the “third floor” and “door 17.” After the target IFC section was determined, the next job was to find all content words within the query because content words carry the actual meaning. For the queries for the header section, only nouns are the content words, so the combination of the nouns is the target to be identified (Fig. 4). In this research, the combination of the nouns became the keyword phrase returned to the main program. In this scenario, the NLU module aimed to find the keyword phrase for the IE module use.
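
The header-versus-data decision and the keyword phrase can be approximated on a flat list of POS-tagged tokens. This is a simplified sketch, not the syntax-tree algorithm of the prototype: a preposition tag (IN) stands in for a PP that depends on an NP, and `BUILDING_ELEMENT_WORDS`, `strip_useless_pp`, `queried_section`, and `keyword_phrase` are hypothetical names. The hand-written tags in the usage lines are assumed tagger output.

```python
from typing import List, Tuple

TaggedToken = Tuple[str, str]

# Hypothetical vocabulary of words that signal building element information.
BUILDING_ELEMENT_WORDS = {
    "door", "window", "floor", "level", "wall", "beam",
    "roof", "column", "slab", "space", "stair",
}


def strip_useless_pp(tagged: List[TaggedToken]) -> List[TaggedToken]:
    """Drop the first PP (preposition to end of query) if it names no building element."""
    for index, (_, tag) in enumerate(tagged):
        if tag == "IN":
            pp_words = {word.lower() for word, _ in tagged[index:]}
            if pp_words & BUILDING_ELEMENT_WORDS:
                return tagged  # useful PP: keep the query unchanged
            return tagged[:index]  # useless PP such as "of the model"
    return tagged


def queried_section(tagged: List[TaggedToken]) -> str:
    """Return "data" if a useful PP remains after stripping, otherwise "header"."""
    remaining = strip_useless_pp(tagged)
    return "data" if any(tag == "IN" for _, tag in remaining) else "header"


def keyword_phrase(tagged: List[TaggedToken]) -> str:
    """Join the nouns of a header-section query into the keyword phrase."""
    nouns = [word for word, tag in strip_useless_pp(tagged) if tag.startswith("NN")]
    return " ".join(nouns)


header_query = [("What", "WP"), ("is", "VBZ"), ("the", "DT"), ("creation", "NN"),
                ("date", "NN"), ("of", "IN"), ("the", "DT"), ("model", "NN"), ("?", ".")]
data_query = [("What", "WP"), ("is", "VBZ"), ("the", "DT"), ("height", "NN"), ("of", "IN"),
              ("the", "DT"), ("second", "JJ"), ("floor", "NN"), ("?", ".")]

print(queried_section(header_query), "|", keyword_phrase(header_query))
# prints: header | creation date
print(queried_section(data_query))
# prints: data
```

A fixed word list is the weakest part of this sketch; a parse tree, as used in this research, distinguishes the two query types without relying on a hand-made vocabulary.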

If the query contained a syntactic word dependency relationship between an NP and a PP, the target was the IFC data section. In this scenario, the keyword phrase was not suitable for BIM IE, but the content words were still the target of the NLU module. For the example query “What is the height of the second floor?,” content words include nouns and adjectives. However, content words carry different semantic meanings. “Height” and “floor” are nouns, but “height” expresses an attribute of a building component, whereas “floor” is the type of the building component. In this research, “height” can be classified as an attribute word, whereas the other noun “floor” is a type word (Fig. 5). The type word is used to locate the IFC instance type (e.g., IfcBuildingStorey), and the attribute word is the index to find the corresponding IFC instance attribute name (e.g., elevation).

Although attribute word and type word have the same POS tag NN in a syntax tree, the node type word and the node attribute word have the same parent node, and the type word (i.e., in a PP) depends on the attribute word (i.e., in an NP). After identifying all nouns within the natural language input, the NLU module checks whether each noun is in a PP that depends on an NP. If not, the noun is the attribute word and is returned to the main program. If a noun is in a PP, the NLU module will return the noun as a type word. In addition, the content word adjective is combined with the type word as a name phrase, which is utilized to locate the exact IFC instance (e.g., find “second floor” in all IfcBuildingStorey types). Sometimes, users may use a cardinal number instead of an adjective. For example, “level 2” has the same meaning as the “second floor.” If there is no adjective in the natural language input, the cardinal number can be considered as a content word. In the latter scenario, the NLU module was developed to identify the attribute word, type word, and name phrase returned to the main program.

Information Extraction Module

The IE module aims to locate the queried structured IFC data within an IFC-STEP file based on all classified keywords from the NLU module. The scope of this module was to directly extract IFC data without complex computation and reasoning. The IE algorithms for the IFC header and data sections were different due to the different data structures. The data representation of the header section was an attribute name with an attribute value (e.g., Schema: IFC4), whereas that of the data section comprises the IFC instance ID, instance type, and multiple attribute values (e.g., #164=IFCBUILDINGSTOREY (‘0KoSqKsC516fss_6JNUq13’,#42,‘Second Floor’, $, ‘Level: 1/4” Head’, #163, $,‘Second Floor’,.ELEMENT.,10)). The corresponding attribute names were defined by the EXPRESS modeling language of the IFC4 ADD2 TC1 schema.

For example, the attribute names of an IfcBuildingStorey include GlobalId, OwnerHistory, Name, Description, ObjectType, ObjectPlacement, Representation, LongName, CompositionType, and Elevation (buildingSMART 2020). This module comprises two categories: IE for the IFC header section and IE for the data section. The method of header IE is vectorization to compare the cosine similarity between the keyword phrase and each line in the header section. The vectorization method was based on term frequency (TF) using the Bag of Words (BoW) (Manning et al. 2008). The TF using the BoW (TF-BoW) method builds a corpus for the two texts (i.e., keyword phrase and each header line) and converts the two texts into two vectors, and the measurement was cosine similarity between these two vectors. The cosine similarity

$$\text{Cosine similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}} \, \sqrt{\sum_{i=1}^{n} B_i^{2}}} \tag{1}$$

where A_i and B_i = components of two vectors A and B.

The range of cosine similarity is from −1 to 1, where −1 means the two vectors are exactly opposite, and 1 means the two vectors point in the same direction. In this research, the cosine similarity between the vectors ranges from 0 to 1 because term-frequency vectors have no negative components, and a higher cosine similarity means the two texts are more similar. The line in the header section with the highest cosine similarity will be extracted. This research utilized the CountVectorizer and cosine similarity functions from the scikit-learn Python package. For example, if the keyword phrase is “model date,” the outcome of this function is the line “* Model creation date: Sat Jun 06 21:40:32 2020” with the highest cosine similarity. The TF using the BoW method is based on word frequency between texts, which cannot be used for semantic similarity comparison.

Fig. 7 shows the IE algorithm for the queries regarding the IFC data section. To locate the target IFC data instance, the type of the IFC instance is required. The type word from the NLU module is utilized to find this IFC instance type. The system matches the type word with all supported instance types (i.e., IfcBuildingElement and IfcSpatialStructureElement) within an IFC-STEP file. Some researchers have used the IFC dictionary library or buildingSMART Data Dictionary (bSDD) to develop a mapping chart of the keyword. However, a specific IFC-STEP file may not contain all instance types. Using the list of all supported instance types from the target IFC-STEP file to match the keywords can improve the efficiency of locating the target IFC instance type. Therefore, for the data IE, the developed module extracts all supported instance types into an all-type list and then checks whether the type word is a substring of an IFC instance type in the all-type list.

In a real-world scenario, the type word is often not a substring of the IFC instance type because the naming convention of IFC instance types was designed for BIM software use instead of natural language use. For example, in a real-world situation, a BIM user may utilize “floor” instead of “IfcBuildingStorey,” and “floor” is not a substring of “IfcBuildingStorey.” Users may use different type words to express the same meaning like “level” or “floor” for “IfcBuildingStorey.” In this scenario, the BIM synonyms function was developed to find synonyms of the type word based on the NLTK WordNet Python package. The BIM synonyms function used the type word (e.g., floor) to find all synonyms (e.g., level, story, and so on) in WordNet and checked whether one of the synonyms is a substring of an IFC instance type in the all-type list.

After the module identified the target IFC instance type (e.g., IfcBuildingStorey), the next job was to locate the exact IFC instance (e.g., second floor). The name phrase from the NLU module was used to find the target IFC instance. This research also utilized the TF-BoW method to vectorize phrases and computed the cosine similarity between the name phrase and the “name” of each IFC instance in the target type. The “name” is one of the attribute names of IfcBuildingElement and IfcSpatialStructureElement based on the IFC4 schema. The IFC instance with the highest cosine similarity will be extracted. For example, if the type word is “floor” and name phrase is “second floor”, the IFC instance with the highest cosine similarity is “#164= IFCBUILDINGSTOREY (‘0KoSqKsC516fss_6JNUq13’, #42, ‘second level’, $, ‘Level:1/4” Head’, #163, $,
between the two vectors is computed by Eq. (1), which was derived ‘second level’,.ELEMENT., 10).” The IE module only extracts par-
from the dot product of two vectors (Spiegel et al. 2009) tial data from the extracted IFC instance for NLG, for example,
J. Comput. Civ. Eng., 2022, 36(3): 04022004

Fig. 7. IE algorithm for queries regarding data section.
only the “height” information is queried by the user. Therefore, the corresponding attribute value is required to be extracted.

The IFC schema was used to find the attribute value of the IFC instance. The IE module aims to use the attribute word to match each attribute name from the IFC instance; each IFC type may have different attribute names. The IFC4 ADD2 TC1 schema was followed to obtain the corresponding attribute names of the target IFC instance. Once the attribute word matches the attribute name of the IFC instance, the corresponding attribute value will be extracted. However, system users may use “height” to express the “elevation” of the target IfcBuildingStorey. In this scenario, the BIM synonyms function will be implemented to find synonyms of “height” to match the attribute name “elevation” from the IFC schema. The outcome of the IE module is the extracted attribute value.

Natural Language Generation Module

The NLG module aims to generate a natural language answer based on structured information from the NLU and IE modules. Table 1 provides the syntactic NLG patterns and NLG examples. The patterns were developed based on the underlying phrase structure of NP + VP from the basic English grammar syntax. The NLG pattern of the IFC header data is DT NN VBZ NN/CD, where DT NN is NP and VBZ NN/CD is VP, and the NLG pattern of the IFC data section is DT NN IN DT NN VBZ NN/CD, where DT NN IN DT NN is NP and VBZ NN/CD is VP.

The two patterns of natural language generation were based on IFC data structure: Pattern P1 is for the IFC header section and Pattern P2 is for the IFC data section. In Pattern P1, X1 represents the NLU keyword phrase, and X2 stands for the corresponding attribute value. In Pattern P2, Y represents the NLU attribute word, X1 is the NLU name phrase, and X2 is the corresponding attribute value. The attribute values could be a noun or cardinal number. If the attribute value is a cardinal number, a unit word is required for the natural language response. The unit word can be extracted from the entity IfcConversionBasedUnit, and the imperial unit “Foot” is commonly used in building elements. In the meantime, the automatic pluralization of the unit word was considered in the process of NLG. The rest of the NLG format is the preset language pattern, such as determiner and preposition. For example, if the input query is “What is the height of the second floor?,” the attribute word is “height”, and the name phrase is
Table 1. Syntactic NLG patterns and formats

Pattern number | IFC type | NLG pattern | NLG format | NLG example
P1 | Header | DT NN VBZ NN/CD (a) | The X1 is X2 (b) | The schema is IFC4.
P2 | Data | DT NN IN DT NN VBZ NN/CD NN/NNS (a) | The Y of the X1 is X2 (c) | The height of the second floor is 10 feet.
(a) DT = determiner; NN = noun (singular); VBZ = verb; CD = cardinal number; IN = preposition; and NNS = noun (plural).
(b) X1 = keyword phrase; and X2 = corresponding attribute value.
(c) Y = attribute word; X1 = name phrase; and X2 = corresponding attribute value.

“second floor”. The corresponding attribute value, 10, is extracted from the target IfcBuildingStorey instance. The natural language response is generated by these words, and the outcome is “The height of the second floor is 10 feet.”

Based on the syntactic NLP patterns and IFC schema, the NLG module converts the structured BIM/IFC data into a natural language response. The first step is to input all keywords from the NLU module, input the extracted IFC instance code from the IE module, and initialize all variables for the preset NLG patterns. The type of IFC instance is an indispensable and contributing factor to determining the format of the NLG patterns. If the IFC instance belongs to the header part, the keyword phrase and the queried attribute value are utilized to generate natural language for Pattern P1. If the instance is within the IFC data section, the natural language sentence is generated based on attribute word, name phrase, and attribute value in the NLG Pattern P2. The NLG module returns the generated natural language response to the main program and outputs the result for the system user. The generated natural language is the outcome of the developed QA system for BIM.

Evaluation and Results

The Python-based prototype application was developed based on the architecture of the QA system for BIM IE. The developed Python Package IfcReader version 1.0.0 parses IFC data and was published on the GitHub repository 1. IfcReader helps to extract organized data from IFC files. The open-source NLTK library was used to achieve the research goal. Three BIM/IFC sample models, namely the Sample architectural model 1 (16 KB), VDC Center Architecture (14.6 MB), and VDC Center Structure (13.1 MB), and four actual architectural/structural models, namely Rinker Building Architecture (26.8 MB), Rinker Building Structure (2.31 MB), an Airport Building Architecture (40.7 MB), and an Airport Building Structure (3.43 MB), were utilized to validate the functions of each module in the prototype application. All BIM/IFC models were exported in the Design Transfer View of the IFC4 ADD2 TC1 schema from the BIM authoring tool Autodesk Revit 2021. The hardware environment used to evaluate the prototype application was an Intel Core i7-10700 CPU of 2.90 GHz with eight cores and 32.0 GB RAM.

Fig. 8 shows two example results of the prototype application in the PyCharm interface. The queries were “model’s date” and “What is the height of the second level?” The developed NLU module recognized the queried IFC section and identified the keyword phrase, attribute word, type word, and name phrase from queries. The identified “height,” “level,” and “second level” were extracted for evaluating the accuracy of the NLU module. The IE module used these three indexes to extract the code of the target IFC instance. Also, the target IFC data was extracted to evaluate the accuracy of the IE module. The NLG module transformed the structured information into a natural language answer. Based on the predefined syntactic NLG pattern for the IFC data section, the generated natural language response was mainly comprised of NP and VP structures. In the NP part of generated natural language, the phrase structure is “the <attribute word> of the <name phrase>”. The VP represents the verb phrase structure of “is <attribute value>.”

This research generated 127 test queries with corresponding building information models’ names to test the accuracy of the developed prototype. The 127 queries were developed based on the IFC4 schema. For example, because an IfcBuildingStorey “level 1” includes the attribute “elevation,” one natural language query can be generated as “What is the elevation of the level 1?”. In total, 127 queries were generated based on this logic, including 20 basic model information queries (e.g., model creation date), 23 IfcDoor queries, 16 IfcWindow queries, 21 IfcBuildingStorey queries, 9 IfcWall, 6 IfcStair, 9 IfcSpace, 6 IfcBeam, 6 IfcColumn, 6 IfcSlab, and 5 IfcRoof. Each natural language query was linked with the corresponding building information model’s name. Also, this research collected the corresponding ground truth (i.e., natural language answer) based on NLP patterns for each query, which was used to compute the accuracy.

This research recorded each extracted natural language answer and compared it with the ground truth to determine whether the predicted answer matched the ground truth. The collected extracted answers were cleaned by removing extra spaces, and together with the ground truths, they were converted to lower case in preparation for accuracy computations. The accuracy was computed by the accuracy score function from scikit-learn. The experiment results showed an 81.9% accuracy score, which means 104 predictions exactly matched their ground truths. There were 23 failed predictions during the validation. The reason for those failed predictions was that the attribute word was wrongly matched with the target IFC instance by the vectorization methods with the highest cosine similarity. Test queries, corresponding building information model’s name, ground truth, and predictions were published on GitHub repository 2.
Fig. 8. Example result of prototype application in PyCharm.
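The second query in Fig. 8 can be traced through the IE and NLG steps with a short sketch. It is an illustration under stated assumptions rather than the prototype's code: the storey records, the `ATTRIBUTE_SYNONYMS` table (a hard-coded stand-in for the WordNet-based BIM synonyms function), and all function names are hypothetical. It shows term-frequency bag-of-words vectors compared by cosine similarity to pick the instance, followed by NLG Pattern P2 with unit pluralization.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case a phrase and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two term-frequency (bag-of-words) vectors."""
    tf_a, tf_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a)
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_instance(name_phrase, instances):
    """Return the instance whose Name is most similar to the name phrase."""
    return max(
        instances,
        key=lambda inst: cosine_similarity(name_phrase, inst["Name"]),
    )


# Irregular plurals needed for unit words; other units just take an "s".
IRREGULAR_PLURALS = {"foot": "feet", "inch": "inches"}


def pluralize(unit, value):
    """Return the unit word in singular or plural form depending on the value."""
    unit = unit.lower()
    if value == 1:
        return unit
    return IRREGULAR_PLURALS.get(unit, unit + "s")


def generate_answer(attribute_word, name_phrase, value, unit):
    """Fill NLG Pattern P2: 'The Y of the X1 is X2' followed by a unit word."""
    return (
        f"The {attribute_word} of the {name_phrase} "
        f"is {value} {pluralize(unit, value)}."
    )


# Hypothetical IfcBuildingStorey records (Name and Elevation only).
storeys = [
    {"Name": "first level", "Elevation": 0},
    {"Name": "second level", "Elevation": 10},
    {"Name": "roof", "Elevation": 20},
]

# Hypothetical stand-in for the WordNet-based BIM synonyms function.
ATTRIBUTE_SYNONYMS = {"height": "Elevation"}

attribute_word, name_phrase = "height", "second level"
target = pick_instance(name_phrase, storeys)
value = target[ATTRIBUTE_SYNONYMS.get(attribute_word, attribute_word)]
answer = generate_answer(attribute_word, name_phrase, value, unit="Foot")
print(answer)
```

Selecting the instance with the highest similarity always returns some instance, even when no name is a good match, which is consistent with the failure mode reported in the evaluation (a wrong match that nevertheless had the highest cosine similarity).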

Discussion

The evaluation results indicated that the developed Python-based prototype application enabled understanding of the natural language query, extracting corresponding IFC attribute information, and transforming the structured information into a natural language response. The developed prototype could provide BIM users with model information with an accuracy score of 81.9%. The prototype application was able to generate a natural language answer for BIM users based on their queries. The human performance on a general QA task is 86.8% (Rajpurkar et al. 2018), and the developed QA system for BIM is only 4.9 points lower than that of humans. The experimental results indicated that the developed methodology for the NLP-based QA system for BIM IE is valid to give a relatively accurate natural language answer based on the user’s query.

Compared with the regular BIM IE method, the developed QA system for BIM can provide BIM users with a natural language input option. The developed QA system for BIM IE can identify content words to extract the target IFC data and generate the corresponding natural language response back to the user. A comparison between the developed methodology and the related BIM IE approaches was conducted to show the differences (Table 2). Although those studies were not focused on building a QA system for BIM IE, they aimed to acquire building information from BIM, which was relevant to this research. The developed system used BIM/IFC models as the BIM data repository, which can reduce the time to convert IFC-STEP files into other formats and save more storage space than other databases (DB) and OWL/RDF formats.

For the NLU part, the developed methodology utilized semantic and syntactic NLP methods to identify different types of keywords. For the BIM IE part, the developed QA system used vectorization with the cosine similarity method. For the NLG part, the developed system used syntax grammar and POS to generate corresponding natural language responses. For the outcome, other related research generated different BIM information representations for the purposes of model checks (Zhang and El-Gohary 2015), information retrieval (Lin et al. 2016), or data interoperability and integration (Karan et al. 2016). Compared with those outcomes of existing research, natural language responses are more acceptable to nonregular BIM users, and software manipulation and data structure are not required for the developed system.

Traditional virtual assistants were designed to detect exact keywords to recognize the information from the natural language input (Kobayashi et al. 2019). For example, some keywords like calendar were used in the traditional QA system to fulfill relevant calendar jobs from users. The developed QA system for BIM aims to provide BIM users with more input options instead of using exact keywords to restrict the inputs. For example, a project manager who wants to find out height information about a door annotated as 312 on the drawings can use natural language questions like “What is the height of door 312?” to query the QA system for BIM. The developed system can answer the project manager via natural language sentences, like “The height of door 312 is 6.67 feet.” The system can be considered as a virtual assistant or chatbot for the project manager to obtain useful building information from BIM. The developed BIM system can help the project manager save time in finding accurate building information, and does not even require the project manager to understand the complex manipulation of BIM software.

The developed methodology still has limitations. Because this research used NLP syntactic analysis instead of machine learning methods, the queries have a similar pattern. Machine learning can provide a more intelligent way of analyzing queries with different patterns. However, there are no such training data for this process. Existing training and testing data for the QA development are more generalized text data for a general purpose. Such custom training data for the AECO industry are limited. Building such a data set is one of the proposed future research directions. To better understand input natural language queries with fewer restrictions, deep-learning methods, like artificial neural networks, can provide a good solution to expand the flexibility of the NLU module in analyzing more unstructured queries. Google has improved the Google Assistant by using deep neural network methods (Kepuska and Bohouta 2018). Deep learning is the future direction of the developed methodology. The developed IE module directly extracts architectural property information without computation and reasoning, but in other scenarios, users may query “Tell me the width of the bedroom window in Apartment 101.” There is an ontology relation between the bedroom window and Apartment 101. The ontology-based IE method is also one of the future directions of this research to provide a more intelligent IE with reasoning capability. The developed NLP-based QA for BIM IE is structured, and each module can be readily substituted in future efforts.

Conclusions

With the development of information technologies, many organizations and companies have focused their research on developing natural language–based virtual assistants to provide comprehensive information services to support daily life. Existing BIM IE methods require users to have knowledge of the BIM software data structure and be experienced in manipulating BIM software, and SQL or SPARQL query languages. However, BIM involves AECO users from different domains of interest and many construction practitioners with limited experience in BIM and query languages, who all require useful building information to support their construction and operation activities. A natural language–based QA system for BIM IE that allows them to use natural language queries to extract useful building information becomes very important to these users. To fill this gap, this research developed a natural language–based QA system for BIM IE using NLP methods. The developed QA system can identify and classify content words in natural language queries and generate the corresponding natural language responses.
Table 2. Comparison between developed BIM IE and other related IE

Categories | QA for BIM IE | Zhang and El-Gohary (2015) | Lin et al. (2016) | Karan et al. (2016)
Input query | Textual natural language | None | Textual natural language | SPARQL
BIM repository type | IFC | IFC | MongoDB | OWL/RDF
BIM repository transformation | Not necessary | Not necessary | Yes (IFC serialization) | Yes
NLU | Yes (NLP) | Not necessary | Yes (NLP) | Not necessary
Information searching | Vectorization | Not necessary | IFC keyword mapping | Structured query
NLG | Yes (NLP) | Not necessary | None | Not necessary
Outcome result | Natural language response | Data representation (logic facts) | Spreadsheets and model visualization | Data spreadsheets

A Python-based prototype application was developed based on the architecture of the QA system for BIM IE to evaluate the developed system.

The QA system can provide relatively accurate responses for information support for construction project team members. The developed QA system can be considered as a virtual assistant to provide comprehensive building information for users. For example, an onsite project manager can type in a natural language query on their smartphone, and the developed QA system for BIM can generate a natural language answer for the project manager. The developed methodology still has limitations in recognizing natural language queries with complex syntax. To enable machine learning–based NLU, future research will develop a data set of building information–related natural language queries for training and testing. The deep neural network method is the future research direction for the NLU module to understand building information–related natural language queries with more complex syntax. An ontology-based IE module is also a future research direction to provide a more advanced IE method for reasoning. In addition, there are other semantic relations between key phrases and IFC entities. For example, second floor provides similar semantics to level 2. A deep learning–based semantic similarity comparison method will be used in future research. The algorithms for the NLU, IE, and NLG modules need to be refined to improve the accuracy and intelligence of each module. When the system can provide more intelligent conversational capabilities, it would be ready for full-scale validation on projects. It is expected that the developed QA system for BIM IE would facilitate the development of virtual assistants in the AECO industry, and the architecture of the developed QA system can be extended to queries in other areas.

Data Availability Statement

All data, models, or code that support the findings of this study are available from the corresponding author upon reasonable request.

References

Bird, S., E. Loper, and E. Klein. 2009. Natural language processing with python. Newton, MA: O’Reilly Media.
buildingSMART. 2017. “Industry foundation classes 4.0.2.1 version 4.0—Addendum 2—Technical corrigendum 1.” Accessed May 21, 2020. https://standards.buildingsmart.org/IFC/RELEASE/IFC4/ADD2_TC1/HTML/link/alphabeticalorder-entities.htm.
buildingSMART. 2019. “Industry foundation classes (IFC).” Accessed June 10, 2019. https://technical.buildingsmart.org/standards/ifc.
buildingSMART. 2020. “IFC specifications database.” Accessed October 10, 2020. https://standards.buildingsmart.org/IFC/RELEASE/IFC4/ADD2_TC1/EXPRESS/IFC4.exp.
Celce-Murcia, M., D. Larsen-Freeman, and H. A. Williams. 1999. The grammar book: An ESL/EFL teacher’s course. Boston: Heinle & Heinle.
Cherpas, C. 1992. “Natural language processing, pragmatics, and verbal behavior.” Anal. Verbal Behav. 10 (1): 135. https://doi.org/10.1007/BF03392880.
Ghang, L., W. Jongsung, H. Sungil, and S. Yuna. 2011. “Metrics for quantifying the similarities and differences between IFC files.” J. Comput. Civ. Eng. 25 (2): 172–181. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000077.
ISO. 2004. Industrial automation systems and integration—Product data representation and exchange—Part 11: Description methods: The EXPRESS language reference manual. ISO 10303-11:2004. Geneva: ISO.
ISO. 2018. Industry foundation classes (IFC) for data sharing in the construction and facility management industries—Part 1: Data schema. ISO 16739-1:2018. Geneva: ISO.
Jongsung, W., L. Ghang, and C. Chiyon. 2013. “No-schema algorithm for extracting a partial model from an IFC instance model.” J. Comput. Civ. Eng. 27 (6): 585–592. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000320.
Karan, E. P., J. Irizarry, and J. Haymaker. 2016. “BIM and GIS integration and interoperability based on semantic web technology.” J. Comput. Civ. Eng. 30 (3): 04015043. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000519.
Kepuska, V., and G. Bohouta. 2018. “Next-generation of virtual personal assistants (Microsoft Cortana, Apple Siri, Amazon Alexa and Google Home).” In Proc., IEEE 8th Annual Computing and Communication Workshop and Conf. CCWC 2018, 99–103. New York: IEEE.
Kobayashi, Y., T. Yoshida, K. Iwata, H. Fujimura, and M. Akamine. 2019. “Out-of-domain slot value detection for spoken dialogue systems with context information.” In Proc., IEEE Spoken Language Technology Workshop, SLT 2018, 854–861. New York: IEEE.
Krijnen, T., and J. Beetz. 2018. “A SPARQL query engine for binary-formatted IFC building models.” Autom. Constr. 95 (Nov): 46–63. https://doi.org/10.1016/j.autcon.2018.07.014.
Li, H., A. Y. C. Wang, Y. Liu, D. Tang, Z. Lei, and W. Li. 2019. “An augmented transformer architecture for natural language generation tasks.” In Proc., Int. Conf. on Data Mining Workshops (ICDMW), 1131–1137. New York: IEEE. https://doi.org/10.1109/ICDMW48858.2019.9024754.
Lin, J., Z. Hu, J. Zhang, and F. Yu. 2016. “A natural-language-based approach to intelligent data retrieval and representation for cloud BIM.” Comput.-Aided Civ. Infrastruct. Eng. 31 (1): 18–33. https://doi.org/10.1111/mice.12151.
Liu, H., M. Lu, and M. Al-Hussein. 2016. “Ontology-based semantic approach for construction-oriented quantity take-off from BIM models in the light-frame building industry.” Adv. Eng. Inf. 30 (2): 190–207. https://doi.org/10.1016/j.aei.2016.03.001.
Manning, C. D., P. Raghavan, and H. Schütze. 2008. Introduction to information retrieval. Cambridge, UK: Cambridge University Press.
Montenegro, J. L. Z., C. A. Da Costa, R. D. R. Righi, A. Roehrs, and E. R. Farias. 2018. “A proposal for postpartum support based on natural language generation model.” In Proc., Int. Conf. on Computational Science and Computational Intelligence (CSCI), 756–759. New York: IEEE. https://doi.org/10.1109/CSCI46756.2018.00151.
Mujtaba, D., and N. Mahapatra. 2019. “Recent trends in natural language understanding for procedural knowledge.” In Proc., 6th Annual Conf. on Computational Science and Computational Intelligence, CSCI 2019, 420–424. New York: IEEE.
Nepal, M. P., S.-F. Sheryl, P. Rachel, and Z. Jiemin. 2013. “Ontology-based feature modeling for construction information extraction from a building information model.” J. Comput. Civ. Eng. 27 (5): 555–569. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000230.
Niknam, M., and S. Karshenas. 2015. “Integrating distributed sources of information for construction cost estimating using Semantic Web and Semantic Web Service technologies.” Autom. Constr. 57 (Sep): 222–238. https://doi.org/10.1016/j.autcon.2015.04.003.
Paredes-Valverde, M. A., R. Valencia-García, M. Á. Rodríguez-García, R. Colomo-Palacios, and G. Alor-Hernández. 2016. “A semantic-based approach for querying linked data using natural language.” J. Inf. Sci. 42 (6): 851–862. https://doi.org/10.1177/0165551515616311.
Park, Y., and S. Kang. 2019. “Natural language generation using dependency tree decoding for spoken dialog systems.” IEEE Access 7 (Dec): 7250–7258. https://doi.org/10.1109/ACCESS.2018.2889556.
Penn Treebank. 2003. “Penn Treebank P.O.S. tags.” Accessed June 7, 2020. https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html.
Rajpurkar, P., R. Jia, and P. Liang. 2018. “Know what you don’t know: Unanswerable questions for SQuAD.” In Proc., 56th Annual Meeting of the Association for Computational Linguistics, 784–789. Stroudsburg, PA: Association for Computational Linguistics.
Sacks, R., C. M. Eastman, P. M. Teicholz, and G. Lee. 2018. BIM handbook: A guide to building information modeling for owners, designers, engineers, contractors, and facility managers. Hoboken, NJ: Wiley.
Shi, X., Y. S. Liu, G. Gao, M. Gu, and H. Li. 2018. “IFCdiff: A content-based automatic comparison approach for IFC files.” Autom. Constr. 86 (Jun): 53–68. https://doi.org/10.1016/j.autcon.2017.10.013.

Spiegel, M., S. Lipschutz, and D. Spellman. 2009. Vector analysis (Schaum’s outlines). 2nd ed. New York: McGraw Hill.
Studer, R., A. Abecker, and S. Grimm. 2007. Semantic web services: Concepts, technologies, and applications. Edited by R. Studer, A. Abecker, and S. Grimm. Berlin: Springer.
Su, S. Y., and Y. N. Chen. 2019. “Investigating linguistic pattern ordering in hierarchical natural language generation.” In Proc., IEEE Spoken Language Technology Workshop, SLT 2018, 779–786. New York: IEEE.
Wang, N., and R. R. A. Issa. 2019. “Ontology-based building information model design change visualization.” In Proc., Workshop on Linked Building Data and Semantic Web Technologies (WLS2019), 53–61. Gainesville, FL: Smart Construction Informatics (SCI) Laboratory.
Wang, N., and R. R. A. Issa. 2020. “Natural language generation from building information models for intelligent NLP-based information extraction.” In Proc., EG-ICE 2020 Workshop on Intelligent Computing in Engineering, edited by L. C. Ungureanu and T. Hartmann, 275–284. Berlin: Universitätsverlag der TU Berlin.
WordNet. 2020. “WordNet: A lexical database for English.” Accessed June 7, 2020. https://wordnet.princeton.edu/.
Wu, S., Q. Shen, Y. Deng, and J. Cheng. 2019. “Natural-language-based intelligent retrieval engine for BIM object database.” Comput. Ind. 108 (Jun): 73–88. https://doi.org/10.1016/j.compind.2019.02.016.
Zhang, J., and N. M. El-Gohary. 2015. “Automated extraction of information from building information models into a semantic logic-based representation.” In Computing in civil engineering 2015, 173–180. Reston, VA: ASCE. https://doi.org/10.1061/9780784479247.022.
Zhang, J., and N. M. El-Gohary. 2016. “Extending building information models semiautomatically using semantic natural language processing techniques.” J. Comput. Civ. Eng. 30 (5): 331–346. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000536.
Zhang, L., and R. R. A. Issa. 2013. “Ontology-based partial building information model extraction.” J. Comput. Civ. Eng. 27 (6): 576–584. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000277.
Zheng, Y., G. Chen, and M. Huang. 2020. “Out-of-domain detection for natural language understanding in dialog systems.” IEEE/ACM Trans. Audio Speech Lang. Process. 28 (Apr): 1198–1209. https://doi.org/10.1109/TASLP.2020.2983593.