NLP-Based Query-Answering System For Information Extraction From Building Information Models
Abstract: The construction industry is information-intensive, and building information modeling (BIM) has been proposed as an information source for supporting decision making by construction project team members in the architecture, engineering, construction, and operation (AECO) industry. Because building information models contain more building data, further use of the aggregated building information to support construction and operation activities has become important. In Industry 4.0, similar-to-real-life virtual assistants, e.g., Apple’s Siri and Google Assistant, are becoming ever more popular. This research developed a query-answering (QA) system for BIM information extraction (IE) by using natural language processing (NLP) methods to build a virtual assistant for construction project team members. The architecture of the developed QA system for BIM IE consists of three major modules: natural language understanding, IE, and natural language generation. A Python-based prototype application was developed based on this architecture to evaluate the functionalities of the QA system using several BIM/industry foundation classes (IFC) models. Seven building information models and 127 test queries were utilized to evaluate the accuracy of the developed QA system for BIM IE. The experimental results indicated that the developed QA system for BIM IE achieved an accuracy of 81.9%. The developed NLP-based QA system for BIM can therefore provide relatively accurate answers to natural language queries. The contributions of this research facilitate the development of virtual assistants in the AECO industry, and the architecture of the developed QA system can be extended to queries in other areas. DOI: 10.1061/(ASCE)CP.1943-5487.0001019. © 2022 American Society of Civil Engineers.
Author keywords: Building information modeling (BIM); Query answering (QA); Natural language processing (NLP); Industry
foundation classes; Information extraction (IE); Virtual assistant.
tion and extract the IFC instance data. The IE module directly extracts property information from BIM architectural and structural models without complex reasoning and computation. The NLG module generates the corresponding natural language response based on the structured information from the NLU and IE modules. The algorithms for the three modules were developed in this research. A Python-based prototype application was developed based on the architecture of the developed QA system for BIM. Seven BIM/IFC architectural/structural models and 127 natural language queries were used to test the functionalities and accuracy of the developed QA system for BIM. Comparisons were performed to illustrate the differences between the developed BIM IE system and other related IE methods. The developed QA system for BIM IE will help facilitate the development of other virtual assistants in the AECO industry.

Literature Review

Industry Foundation Classes

IFC is one of the most popular BIM data exchange specifications for different BIM software platforms in the AECO industry. buildingSMART International (bSI) proposed IFC as the international standard [ISO 16739-1:2018 (ISO 2018)] for a BIM data share and exchange format. Because the IFC specification is open-source and easily accessible, data in a BIM/IFC model can be accessed, checked, and freely modified without any license (Wang and Issa 2019). IFC was developed based on the Standard for the Exchange of Product model data (ISO-STEP) EXPRESS data modeling language (buildingSMART 2019). The EXPRESS language is a product data specification language defined by ISO standard (ISO 2004). IFC4 Addendum 2 Technical Corrigendum 1 (IFC4 ADD2 TC1) is the most stable and official version of the IFC specification. Therefore, this research used the IFC version IFC4 ADD2 TC1. IFC4 ADD2 TC1 [ISO 16739-1:2018 (ISO 2018)] was published in 2017, and contains 776 entities, 420 property sets, 93 quantities sets, and 130 defined data types (buildingSMART 2017). One STEP-IFC file is comprised of two sections of information: header and data. The header section defines the basic information, like Model View Definition (MVD). The data section contains considerable building components information, like three-dimensional (3D) topological data.

BIM Information Extraction

Building information searching and retrieval involve many time-consuming tasks (Sacks et al. 2018). The related application areas of information acquisition from BIM include model compliance check (Zhang and El-Gohary 2015, 2016), model comparison (Ghang et al. 2011; Shi et al. 2018), data retrieval (Lin et al. 2016; Liu et al. 2016), partial model extraction (Jongsung et al. 2013; Zhang and Issa 2013), and data interoperability and integration

(RDF) and Web Ontology Language (OWL) as BIM/IFC repositories and SPARQL (SPARQL Protocol and RDF query language) as the query language to get a relevant BIM data spreadsheet from the BIM RDF or OWL database for information retrieval (Karan et al. 2016; Liu et al. 2016; Studer et al. 2007). However, SPARQL and ontology language are not easily understood by construction personnel. BIM RDF/OWL research requires transforming building data from BIM/IFC into a RDF or OWL data format, which is a time-consuming and complicated process, and users are required to have experience in RDF/OWL. Moreover, the storage size of an RDF or OWL file is much larger than its corresponding BIM/IFC file (buildingSMART 2019; Krijnen and Beetz 2018).

Lin et al. (2016) proposed an NLP-based BIM framework for information retrieval. Their research utilized syntactic NLP as a tool to extract keywords from natural language requests. However, in their approach, after the part-of-speech (POS) tagging of natural language, the classification of concept and property words in a sentence depended on keyword direct mapping with the IFC dictionary library, which restricted the word selection of queries. Zhang and El-Gohary (2015) developed an automated IE from BIM, transforming it into semantic logic–based data representation for automated compliance checking (ACC) of building information models. Their research was based on first-order logic (FOL), which is the most prominent and fundamental logical formalism for semantic NLP use. Their research outcome was a logic-based information representation that was close to natural language.

The general purpose of existing research on BIM information acquisition was to provide information support for AECO activities, but the research results are difficult for AECO personnel with limited BIM experience to interpret. Natural language queries and outcomes are expected to be more acceptable to nonexpert BIM users (Paredes-Valverde et al. 2016). The QA for BIM research compared the proposed IE methodology with the related research of Zhang and El-Gohary (2015), Lin et al. (2016), and Karan et al. (2016) in the “Discussion” section. The reason for comparing the proposed methodology with the aforementioned studies was that they focused on information retrieval or extraction requests from building information models, which is related to the BIM IE in this QA for BIM research.

Natural Language Processing

Existing QA applications in building information acquisition have been limited. This research proposed a QA system to mimic language capabilities in providing building information. To achieve mimicking natural language capabilities, this research used NLP methods (i.e., semantic, and syntactic analysis) instead of machine learning methods. Machine learning (ML) is one of the most popular approaches to process a large volume of raw data to make predictions. ML-based QA systems require a large size of textual dialogue for training and testing purposes, but such training data for the AECO industry are very limited. Therefore, this research
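To make the two-section layout of a STEP-IFC file (header and data) described under Industry Foundation Classes more concrete, the following minimal Python sketch splits a small hand-written file into its two sections. The sample text, the function names `split_sections` and `entity_types`, and the one-statement-per-line assumption are illustrative only; they are not taken from this research or from any IFC toolkit.

```python
import re

# Hand-written STEP-IFC fragment for illustration only (IDs and values are made up).
SAMPLE_IFC = """ISO-10303-21;
HEADER;
FILE_DESCRIPTION(('ViewDefinition [DesignTransferView_V1.0]'),'2;1');
FILE_NAME('sample.ifc','2021-03-01T10:00:00',(''),(''),'','','');
FILE_SCHEMA(('IFC4'));
ENDSEC;
DATA;
#1=IFCBUILDINGSTOREY('2Xk9a1',$,'Level 2',$,$,$,$,$,.ELEMENT.,12.);
#2=IFCDOOR('0Jm7b2',$,'Door 302',$,$,$,$,$,7.,3.,$,$,$);
ENDSEC;
END-ISO-10303-21;
"""


def split_sections(step_text):
    """Return the statements of the HEADER and DATA sections as two lists."""
    sections = {}
    for name in ("HEADER", "DATA"):
        # Capture everything between "<NAME>;" and the next "ENDSEC;" line.
        match = re.search(rf"^{name};\s*(.*?)^ENDSEC;", step_text, re.S | re.M)
        body = match.group(1) if match else ""
        # Assumes one statement per line, which holds for this sample.
        sections[name.lower()] = [ln.strip() for ln in body.splitlines() if ln.strip()]
    return sections


def entity_types(data_statements):
    """List the IFC entity type of each '#id=TYPE(...)' data statement."""
    pattern = re.compile(r"#\d+\s*=\s*([A-Z0-9_]+)\(")
    return [m.group(1) for m in map(pattern.match, data_statements) if m]


sections = split_sections(SAMPLE_IFC)
print(sections["header"][2])            # the FILE_SCHEMA statement
print(entity_types(sections["data"]))   # entity types found in the data section
```

In this sketch the header statements carry model-level information such as the MVD and the file timestamp, whereas the data statements carry the building elements, which is why the two sections need different handling downstream.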
BIM IE was designed and developed to consist of three major modules: NLU, IE, and NLG. Fig. 2 shows the basic architecture of the developed system.

Due to the limited AECO training and testing data available, this research implemented grammar and syntax analysis for NLU and NLG. The NLU module was used to understand and classify the input textual natural language query. NLU aimed to identify different content words within the natural language query and output the classified words to the IE module. The IE module was developed to locate the queried building information and extract the corresponding structured IFC data from the BIM/IFC model based on the classified content words. This module aimed to directly extract such information from BIM/IFC models without computation and ontology reasoning. This research focused on the basic model and attribute information of IfcBuildingElements and IfcSpatialStructureElements from an architectural/structural building information model, including IfcDoor, IfcWindow, IfcBuildingStorey, IfcWall, IfcBeam, IfcRoof, IfcColumn, IfcSlab, IfcSpace, and IfcStair. Finally, an NLG module was implemented to transform the structured IFC data into a natural language response.

A Python-based prototype application was developed based on the architecture of the QA system for BIM IE. NLU, IE, and NLG were programmed and developed in isolated modules so that the main system can call different modular functions. The schema of the input IFC file is IFC4 ADD2 TC1, and the MVD is the design transfer view. After the developed QA system for BIM IE receives the natural language query text, the first step is to identify the target IFC section by the NLU module because the data structures of the IFC header and data sections are different. To better extract IFC data and generate natural language answers, the NLU module needs to check the queried IFC type. Although the information within the header section is simple, it also contains important model data. For example, users need the model creation date because there

building information from the data section because the data section contains more building data. In this scenario, three types of phrases and words were identified and classified by the NLU model, and those phrases are the major indexes to find the target structured IFC data and key components to generate the corresponding natural language responses.

Natural Language Understanding Module

The goal of the developed NLU module was to identify and classify content words from the natural language query. Fig. 3 illustrates the NLU algorithm for the developed QA system for BIM IE. After a natural language query is input into the system, the first job is to tokenize and POS tag the input query. For example, if the query is “What is the height of the second floor?,” the output of tokenization and POS tagging shows the POS tag for each word: (‘what’, ‘WP’), (‘is’, ‘VBZ’), (‘the’, ‘DT’), (‘height’, ‘NN’), (‘of’, ‘IN’), (‘the’, ‘DT’), (‘second’, ‘JJ’), (‘floor’, ‘NN’), (‘?’, ‘.’) (Figs. 4 and 5).

There are many tools for tokenization and POS tagging for NLU. For example, the Natural Language Toolkit (NLTK) is a leading Python platform for processing natural language, with functions such as tokenizing, tagging, and stemming (Bird et al. 2009). The Penn Treebank tag set is used by the NLTK library. Therefore, this research utilized NLTK to label each input word with a Penn Treebank POS tag.

Based on the tagged query, the next job was to check the queried IFC section. The target IFC section must be known to generate the corresponding natural language response. The syntactic word dependencies within the tagged query were analyzed. Fig. 6 shows the syntax tree of a query example, “What is the object of door 302?,” which concerns the IFC data section. The child node NP (i.e., the object of door 302) in the parent node VP consists of an NP and a PP, and the node PP (i.e., of door 302) depends on the node NP (i.e., the object). This research used syntactic word dependencies to differentiate queries for the IFC header and data sections. For example, the query “What is the model creation date?” concerned the model information from the header section, and there was no word dependency relationship between an NP and a PP, whereas a query for the data section will contain a word dependency relationship between an NP and a PP, like the example query shown in Fig. 6. For queries like “What is the creation date of the model?,” although the query contains a word dependency relationship between an NP and a PP, the information within the PP “of the model” is unnecessary for NLU. Therefore, this research eliminated such useless PPs (e.g., “in the building information model”) before NLU.

The criterion for distinguishing useless PPs from useful PPs in natural language queries was that a useless PP contains no building element information. The PP in a query regarding the IFC data section will contain building element information, for example, the “third floor” and “door 17.” After the target IFC section was determined, the next job was to find all content words within the query because content words carry the actual meaning. For the queries for the header section, only nouns are the content words, so the combination of the nouns is the target to be identified (Fig. 4). In this research, the combination of the nouns became the keyword

Fig. 5. NLU classification of query for IFC data section.

Fig. 6. Syntax tree: word dependencies in a sentence.
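The header-versus-data decision described in the NLU module can be sketched in a few lines of Python. This is a simplified stand-in under stated assumptions, not the algorithm of Fig. 3: instead of a full syntax tree, it treats every token run after an IN tag as a PP, and it decides whether a PP is useful with a hypothetical vocabulary of building-element nouns. The tagged tokens are written by hand in Penn Treebank style; in practice they could come from `nltk.pos_tag(nltk.word_tokenize(query))`.

```python
# Hypothetical vocabulary of building-element nouns, loosely based on the IFC
# types named in this paper (door, window, storey/floor, wall, ...).
BUILDING_ELEMENT_NOUNS = {"floor", "level", "storey", "door", "window", "wall",
                          "beam", "roof", "column", "slab", "space", "stair"}


def split_prepositional_phrases(tagged):
    """Split tagged tokens into the head part and the PPs that follow each IN tag."""
    head, phrases = [], []
    for word, tag in tagged:
        if tag == ".":            # drop sentence-final punctuation
            continue
        if tag == "IN":           # a preposition opens a new PP
            phrases.append([])
        elif phrases:
            phrases[-1].append((word, tag))
        else:
            head.append((word, tag))
    return head, phrases


def classify_query(tagged):
    """Classify a tagged query as a header or data query and pull out content words."""
    head, phrases = split_prepositional_phrases(tagged)
    # A PP is kept only if it mentions a building element (assumed criterion).
    useful = [pp for pp in phrases
              if any(word.lower() in BUILDING_ELEMENT_NOUNS for word, _ in pp)]
    head_nouns = " ".join(word for word, tag in head if tag.startswith("NN"))
    if not useful:
        return {"section": "header", "keyword_phrase": head_nouns}
    name_phrase = " ".join(word for word, tag in useful[0] if tag != "DT")
    return {"section": "data", "attribute_word": head_nouns, "name_phrase": name_phrase}


HEIGHT_QUERY = [("what", "WP"), ("is", "VBZ"), ("the", "DT"), ("height", "NN"),
                ("of", "IN"), ("the", "DT"), ("second", "JJ"), ("floor", "NN"), ("?", ".")]
DATE_QUERY = [("what", "WP"), ("is", "VBZ"), ("the", "DT"), ("creation", "NN"),
              ("date", "NN"), ("of", "IN"), ("the", "DT"), ("model", "NN"), ("?", ".")]

print(classify_query(HEIGHT_QUERY))
print(classify_query(DATE_QUERY))
```

The second query shows the elimination step: the PP “of the model” carries no building element information, so the query is routed to the header section and only the noun combination is kept as the keyword phrase.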
only the “height” information is queried by the user. Therefore, the corresponding attribute value is required to be extracted.

The IFC schema was used to find the attribute value of the IFC instance. The IE module aims to use the attribute word to match each attribute name from the IFC instance; each IFC type may have different attribute names. The IFC4 ADD2 TC1 schema was followed to obtain the corresponding attribute names of the target IFC instance. Once the attribute word matches the attribute name of the IFC instance, the corresponding attribute value will be extracted. However, system users may use “height” to express the “elevation” of the target IfcBuildingStorey. In this scenario, the BIM synonyms function will be implemented to find synonyms of “height” to match the attribute name “elevation” from the IFC schema. The outcome of the IE module is the extracted attribute value.

Natural Language Generation Module

The NLG module aims to generate a natural language answer based on structured information from the NLU and IE modules. Table 1 provides the syntactic NLG patterns and NLG examples. The patterns were developed based on the underlying phrase structure of NP + VP from the basic English grammar syntax. The NLG pattern of the IFC header data is DT NN VBZ NN/CD, where DT NN is NP and VBZ NN/CD is VP, and the NLG pattern of the IFC data section is DT NN IN DT NN VBZ NN/CD, where DT NN IN DT NN is NP and VBZ NN/CD is VP.

The two patterns of natural language generation were based on the IFC data structure: Pattern P1 is for the IFC header section and Pattern P2 is for the IFC data section. In Pattern P1, X1 represents the NLU keyword phrase, and X2 stands for the corresponding attribute value. In Pattern P2, Y represents the NLU attribute word, X1 is the NLU name phrase, and X2 is the corresponding attribute value. The attribute value could be a noun or a cardinal number. If the attribute value is a cardinal number, a unit word is required for the natural language response. The unit word can be extracted from the entity IfcConversionBasedUnit, and the imperial unit “Foot” is commonly used in building elements. In addition, the automatic pluralization of the unit word was considered in the process of NLG. The rest of the NLG format is the preset language pattern, such as determiner and preposition. For example, if the input query is “What is the height of the second floor?,” the attribute word is “height”, and the name phrase is
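The IE matching step and the two NLG patterns described above can be sketched together in Python. The sketch rests on several assumptions that are not specified in this paper: the synonym table `BIM_SYNONYMS` is a hypothetical stand-in for the BIM synonyms function, character-bigram counts are an assumed vectorization for the cosine-similarity fallback, and the attribute values of the example storey are made up. The pattern wording follows the DT NN VBZ NN/CD (P1) and DT NN IN DT NN VBZ NN/CD (P2) structures.

```python
import math
from collections import Counter

# Hypothetical synonym table standing in for the paper's "BIM synonyms function".
BIM_SYNONYMS = {"height": ["elevation"]}
# Irregular plurals for unit words; any other unit word simply gets an "s".
IRREGULAR_PLURALS = {"foot": "feet", "inch": "inches"}


def bigram_vector(word):
    """Character-bigram counts: an assumed vectorization, chosen for brevity."""
    padded = f"#{word.lower()}#"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_attribute(attribute_word, attribute_names):
    """Pick an IFC attribute name: exact match, then synonym, then highest cosine."""
    by_lower = {name.lower(): name for name in attribute_names}
    word = attribute_word.lower()
    for candidate in [word, *BIM_SYNONYMS.get(word, [])]:
        if candidate in by_lower:
            return by_lower[candidate]
    query_vector = bigram_vector(word)
    return max(by_lower.values(), key=lambda n: cosine(query_vector, bigram_vector(n)))


def format_value(value, unit=None):
    """Render an attribute value; cardinal numbers get a pluralized unit word."""
    if isinstance(value, (int, float)) and unit:
        unit_word = unit.lower()
        if value != 1:
            unit_word = IRREGULAR_PLURALS.get(unit_word, unit_word + "s")
        return f"{value:g} {unit_word}"
    return str(value)


def generate_p1(keyword_phrase, value):
    """Pattern P1 (header section): DT NN VBZ NN/CD."""
    return f"The {keyword_phrase} is {format_value(value)}."


def generate_p2(attribute_word, name_phrase, value, unit=None):
    """Pattern P2 (data section): DT NN IN DT NN VBZ NN/CD."""
    return f"The {attribute_word} of the {name_phrase} is {format_value(value, unit)}."


# Made-up attribute values for one IfcBuildingStorey instance.
storey = {"Name": "Level 2", "Elevation": 12.0}
matched = match_attribute("height", storey.keys())
answer = generate_p2("height", "second floor", storey[matched], unit="Foot")
print(matched, "->", answer)
```

The cosine fallback always returns some attribute name, even when none is a good match, which illustrates how an attribute word can end up paired with the wrong attribute when only the highest similarity score is used.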
belongs to the header part, the keyword phrase and the queried attribute value are utilized to generate natural language for Pattern P1. If the instance is within the IFC data section, the natural language sentence is generated based on the attribute word, name phrase, and attribute value in the NLG Pattern P2. The NLG module returns the generated natural language response to the main program and outputs the result for the system user. The generated natural language is the outcome of the developed QA system for BIM.

Evaluation and Results

The Python-based prototype application was developed based on the architecture of the QA system for BIM IE. The developed Python Package IfcReader version 1.0.0 parses IFC data and was published on the GitHub repository 1. IfcReader helps to extract organized data from IFC files. The open-source NLTK library was used to achieve the research goal. Three BIM/IFC sample models, namely the Sample architectural model 1 (16 KB), VDC Center Architecture (14.6 MB), and VDC Center Structure (13.1 MB), and four actual architectural/structural models, namely Rinker Building Architecture (26.8 MB), Rinker Building Structure (2.31 MB), an Airport Building Architecture (40.7 MB), and an Airport Building Structure (3.43 MB), were utilized to validate the functions of each module in the prototype application. All BIM/IFC models were exported in the Design Transfer View of the IFC4 ADD2 TC1 schema from the BIM authoring tool Autodesk Revit 2021. The hardware environment used to evaluate the prototype application was an Intel Core i7-10700 CPU of 2.90 GHz with eight cores and 32.0 GB RAM.

Fig. 8 shows two example results of the prototype application in the PyCharm interface. The queries were “model’s date” and “What

tures. In the NP part of generated natural language, the phrase structure is “the <attribute word> of the <name phrase>”. The VP represents the verb phrase structure of “is <attribute value>.”

This research generated 127 test queries with corresponding building information models’ names to test the accuracy of the developed prototype. The 127 queries were developed based on the IFC4 schema. For example, because an IfcBuildingStorey “level 1” includes the attribute “elevation,” the natural language query “What is the elevation of the level 1?” can be generated. In total, 127 queries were generated based on this logic, including 20 basic model information queries (e.g., model creation date), 23 IfcDoor queries, 16 IfcWindow queries, 21 IfcBuildingStorey queries, 9 IfcWall, 6 IfcStair, 9 IfcSpace, 6 IfcBeam, 6 IfcColumn, 6 IfcSlab, and 5 IfcRoof. Each natural language query was linked with the corresponding building information model’s name. Also, this research collected the corresponding ground truth (i.e., natural language answer) based on NLP patterns for each query, which was used to compute the accuracy.

This research recorded each extracted natural language answer and compared it with the ground truth to determine whether the predicted answer matched the ground truth. The collected extracted answers were cleaned by removing extra spaces, and together with the ground truths, they were converted to lower case in preparation for accuracy computations. The accuracy was computed by the accuracy score function from scikit-learn. The experiment results showed an accuracy of 81.9%, which means 104 of the 127 predictions exactly matched their ground truths. There were 23 failed predictions during the validation. The reason for those failed predictions was that the attribute word was wrongly matched with the target IFC instance by the vectorization methods with the highest cosine similarity. Test queries, corresponding building information model’s name, ground truth, and predictions were published on GitHub repository 2.
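The accuracy computation described above (cleaning extra spaces, lowercasing, then exact matching) can be reproduced with a short Python sketch. The function names and the three example answers are made up for illustration; on normalized strings the result should equal what `sklearn.metrics.accuracy_score(y_true, y_pred)` returns, so scikit-learn is not needed here.

```python
def normalize(answer):
    """Collapse runs of whitespace and lowercase the answer."""
    return " ".join(answer.split()).lower()


def exact_match_accuracy(predictions, ground_truths):
    """Fraction of predictions that equal their ground truth after normalization."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground truths must have the same length")
    if not ground_truths:
        raise ValueError("at least one query is required")
    matches = sum(normalize(p) == normalize(t)
                  for p, t in zip(predictions, ground_truths))
    return matches / len(ground_truths)


# Made-up answers: the first two match after cleaning, the third does not.
predictions = ["The height of the second floor  is 12 feet.",
               "the model creation date is 2021-03-01.",
               "The width of the door 302 is 7 feet."]
ground_truths = ["The height of the second floor is 12 feet.",
                 "The model creation date is 2021-03-01.",
                 "The width of the door 302 is 3 feet."]
print(exact_match_accuracy(predictions, ground_truths))

# The reported result: 104 exact matches out of 127 test queries.
reported_accuracy = 104 / 127
print(f"{reported_accuracy:.1%}")
```

Because the metric is exact match, an answer that differs from the ground truth in a single word counts as a failure, which is consistent with the 23 failed predictions being attributed to wrongly matched attributes.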
experimental results indicated that the developed methodology for the NLP-based QA system for BIM IE can give a relatively accurate natural language answer based on the user’s query.

Compared with the regular BIM IE method, the developed QA system for BIM can provide BIM users with a natural language input option. The developed QA system for BIM IE can identify content words to extract the target IFC data and generate the corresponding natural language response back to the user. A comparison between the developed methodology and the related BIM IE approaches was conducted to show the differences (Table 2). Although those studies were not focused on building a QA system for BIM IE, they aimed to acquire building information from BIM, which was relevant to this research. The developed system used BIM/IFC models as the BIM data repository, which can reduce the time to convert IFC-STEP files into other formats and save more storage space than other databases (DB) and OWL/RDF formats.

For the NLU part, the developed methodology utilized semantic and syntactic NLP methods to identify different types of keywords. For the BIM IE part, the developed QA system used vectorization with the cosine similarity method. For the NLG part, the developed system used syntax grammar and POS to generate corresponding natural language responses. For the outcome, other related research generated different BIM information representations for the purposes of model checks (Zhang and El-Gohary 2015), information retrieval (Lin et al. 2016), or data interoperability and integration (Karan et al. 2016). Compared with those outcomes of existing research, natural language responses are more acceptable to nonregular BIM users, and software manipulation and data structure knowledge are not required for the developed system.

Traditional virtual assistants were designed to detect exact keywords to recognize the information from the natural language input (Kobayashi et al. 2019). For example, some keywords like calendar were used in the traditional QA system to fulfill relevant calendar jobs from users. The developed QA system for BIM aims to provide BIM users with more input options instead of using exact keywords to restrict the inputs. For example, a project manager who wants to find out height information about a door annotated as 312 on the drawings can use natural language questions like “What is the height of door 312?” to query the QA system for BIM. The developed system can answer the project manager via natural

patterns. However, there are no such training data for this process. Existing training and testing data for the QA development are more generalized text data for a general purpose. Such custom training data for the AECO industry are limited. Building such a data set is one of the proposed future research directions. To better understand input natural language queries with fewer restrictions, deep-learning methods, like artificial neural networks, can provide a good solution to expand the flexibility of the NLU module in analyzing more unstructured queries. Google has improved the Google Assistant by using deep neural network methods (Kepuska and Bohouta 2018). Deep learning is the future direction of the developed methodology. The developed IE module directly extracts architectural property information without computation and reasoning, but in other scenarios, users may query “Tell me the width of the bedroom window in Apartment 101.” There is an ontology relation between the bedroom window and Apartment 101. The ontology-based IE method is also one of the future directions of this research to provide a more intelligent IE with reasoning capability. The developed NLP-based QA for BIM IE is structured, and each module can be readily substituted in future efforts.

Conclusions

With the development of information technologies, many organizations and companies have focused their research on developing natural language–based virtual assistants to provide comprehensive information services to support daily life. Existing BIM IE methods require users to have knowledge of the BIM software data structure and be experienced in manipulating BIM software, and SQL or SPARQL query languages. However, BIM involves AECO users from different domains of interest and many construction practitioners with limited experience in BIM and query languages, who all require useful building information to support their construction and operation activities. A natural language–based QA system for BIM IE that allows them to use natural language queries to extract useful building information becomes very important to these users. To fill this gap, this research developed a natural language–based QA system for BIM IE using NLP methods. The developed QA system can identify and classify content words in natural language queries and generate the corresponding natural language responses.
NLU, future research will develop a data set of building information–related natural language queries for training and testing. The deep neural network method is the future research direction for the NLU module to understand building information–related natural language queries with more complex syntax. An ontology-based IE module is also a future research direction to provide a more advanced IE method for reasoning. In addition, there are other semantic relations between key phrases and IFC entities. For example, second floor provides similar semantics to level 2. A deep learning–based semantic similarity comparison method will be used in future research. The algorithms for the NLU, IE, and NLG modules need to be refined to improve the accuracy and intelligence of each module. When the system can provide more intelligent conversational capabilities, it would be ready for full-scale validation on projects. It is expected that the developed QA system for BIM IE would facilitate the development of virtual assistants in the AECO industry, and the architecture of the developed QA system can be extended to queries in other areas.

Data Availability Statement

All data, models, or code that support the findings of this study are available from the corresponding author upon reasonable request.

References

Bird, S., E. Loper, and E. Klein. 2009. Natural language processing with python. Newton, MA: O’Reilly Media.
buildingSMART. 2017. “Industry foundation classes 4.0.2.1 version 4.0—Addendum 2—Technical corrigendum 1.” Accessed May 21, 2020. https://standards.buildingsmart.org/IFC/RELEASE/IFC4/ADD2_TC1/HTML/link/alphabeticalorder-entities.htm.
buildingSMART. 2019. “Industry foundation classes (IFC).” Accessed June 10, 2019. https://technical.buildingsmart.org/standards/ifc.
buildingSMART. 2020. “IFC specifications database.” Accessed October 10, 2020. https://standards.buildingsmart.org/IFC/RELEASE/IFC4/ADD2_TC1/EXPRESS/IFC4.exp.
Celce-Murcia, M., D. Larsen-Freeman, and H. A. Williams. 1999. The grammar book: An ESL/EFL teacher’s course. Boston: Heinle & Heinle.
Cherpas, C. 1992. “Natural language processing, pragmatics, and verbal behavior.” Anal. Verbal Behav. 10 (1): 135. https://doi.org/10.1007/BF03392880.
Ghang, L., W. Jongsung, H. Sungil, and S. Yuna. 2011. “Metrics for quantifying the similarities and differences between IFC files.” J. Comput. Civ. Eng. 25 (2): 172–181. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000077.
ISO. 2004. Industrial automation systems and integration—Product data representation and exchange—Part 11: Description methods: The EXPRESS language reference manual. ISO 10303-11:2004. Geneva: ISO.
ISO. 2018. Industry foundation classes (IFC) for data sharing in the construction and facility management industries—Part 1: Data schema. ISO 16739-1:2018. Geneva: ISO.
context information.” In Proc., IEEE Spoken Language Technology Workshop, SLT 2018, 854–861. New York: IEEE.
Krijnen, T., and J. Beetz. 2018. “A SPARQL query engine for binary-formatted IFC building models.” Autom. Constr. 95 (Nov): 46–63. https://doi.org/10.1016/j.autcon.2018.07.014.
Li, H., A. Y. C. Wang, Y. Liu, D. Tang, Z. Lei, and W. Li. 2019. “An augmented transformer architecture for natural language generation tasks.” In Proc., Int. Conf. on Data Mining Workshops (ICDMW), 1131–1137. New York: IEEE. https://doi.org/10.1109/ICDMW48858.2019.9024754.
Lin, J., Z. Hu, J. Zhang, and F. Yu. 2016. “A natural-language-based approach to intelligent data retrieval and representation for cloud BIM.” Comput.-Aided Civ. Infrastruct. Eng. 31 (1): 18–33. https://doi.org/10.1111/mice.12151.
Liu, H., M. Lu, and M. Al-Hussein. 2016. “Ontology-based semantic approach for construction-oriented quantity take-off from BIM models in the light-frame building industry.” Adv. Eng. Inf. 30 (2): 190–207. https://doi.org/10.1016/j.aei.2016.03.001.
Manning, C. D., P. Raghavan, and H. Schütze. 2008. Introduction to information retrieval. Cambridge, UK: Cambridge University Press.
Montenegro, J. L. Z., C. A. Da Costa, R. D. R. Righi, A. Roehrs, and E. R. Farias. 2018. “A proposal for postpartum support based on natural language generation model.” In Proc., Int. Conf. on Computational Science and Computational Intelligence (CSCI), 756–759. New York: IEEE. https://doi.org/10.1109/CSCI46756.2018.00151.
Mujtaba, D., and N. Mahapatra. 2019. “Recent trends in natural language understanding for procedural knowledge.” In Proc., 6th Annual Conf. on Computational Science and Computational Intelligence, CSCI 2019, 420–424. New York: IEEE.
Nepal, M. P., S.-F. Sheryl, P. Rachel, and Z. Jiemin. 2013. “Ontology-based feature modeling for construction information extraction from a building information model.” J. Comput. Civ. Eng. 27 (5): 555–569. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000230.
Niknam, M., and S. Karshenas. 2015. “Integrating distributed sources of information for construction cost estimating using Semantic Web and Semantic Web Service technologies.” Autom. Constr. 57 (Sep): 222–238. https://doi.org/10.1016/j.autcon.2015.04.003.
Paredes-Valverde, M. A., R. Valencia-García, M. Á. Rodríguez-García, R. Colomo-Palacios, and G. Alor-Hernández. 2016. “A semantic-based approach for querying linked data using natural language.” J. Inf. Sci. 42 (6): 851–862. https://doi.org/10.1177/0165551515616311.
Park, Y., and S. Kang. 2019. “Natural language generation using dependency tree decoding for spoken dialog systems.” IEEE Access 7 (Dec): 7250–7258. https://doi.org/10.1109/ACCESS.2018.2889556.
Penn Treebank. 2003. “Penn Treebank P.O.S. tags.” Accessed June 7, 2020. https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html.
Rajpurkar, P., R. Jia, and P. Liang. 2018. “Know what you don’t know: Unanswerable questions for SQuAD.” In Proc., 56th Annual Meeting of the Association for Computational Linguistics, 784–789. Stroudsburg, PA: Association for Computational Linguistics.
Sacks, R., C. M. Eastman, P. M. Teicholz, and G. Lee. 2018. BIM handbook: A guide to building information modeling for owners, designers, engineers, contractors, and facility managers. Hoboken, NJ: Wiley.
Shi, X., Y. S. Liu, G. Gao, M. Gu, and H. Li. 2018. “IFCdiff: A content-based automatic comparison approach for IFC files.” Autom. Constr. 86 (Jun): 53–68. https://doi.org/10.1016/j.autcon.2017.10.013.
building information models for intelligent NLP-based information extraction.” In Proc., EG-ICE 2020 Workshop on Intelligent Computing in Engineering, edited by L. C. Ungureanu and T. Hartmann, 275–284. Berlin: Universitätsverlag der TU Berlin.
WordNet. 2020. “WordNet: A lexical database for English.” Accessed June 7, 2020. https://wordnet.princeton.edu/.
formation model extraction.” J. Comput. Civ. Eng. 27 (6): 576–584. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000277.
Zheng, Y., G. Chen, and M. Huang. 2020. “Out-of-domain detection for natural language understanding in dialog systems.” IEEE/ACM Trans. Audio Speech Lang. Process. 28 (Apr): 1198–1209. https://doi.org/10.1109/TASLP.2020.2983593.