
Advanced Engineering Informatics 34 (2017) 17–35

Contents lists available at ScienceDirect

Advanced Engineering Informatics


journal homepage: www.elsevier.com/locate/aei

Full length article

Long-term knowledge evolution modeling for empirical engineering knowledge

Xinyu Li a, Zuhua Jiang a,⇑, Bo Song a,b, Lijun Liu a

a Department of Industrial Engineering and Management, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, PR China
b China Institute of FTZ Supply Chain, Shanghai Maritime University, 1550 Haigang Ave., Shanghai 201306, PR China

Article info

Article history:
Received 20 March 2016
Received in revised form 28 June 2017
Accepted 4 August 2017
Available online 17 August 2017

Keywords:
Knowledge evolution
Empirical engineering knowledge (EEK)
Evolution model
Knowledge representation
Data visualization

Abstract

In this era of knowledge economy, appropriate management of the rapidly evolving knowledge is a real and urgent issue for factories and enterprises, in order to maintain the competitive edges. However, facing the onerous analysis required for understanding the long-term knowledge evolution, especially the evolving of empirical knowledge in the engineering field, effective and comprehensive modeling methods for knowledge evolution are absent. In this paper, a novel knowledge evolution modeling method is proposed for portraying the long-term evolution of empirical engineering knowledge (EEK) and assisting engineers in comprehending the evolving history. Three phases, EEK elicitation and formalization, EEK networks foundation, and family-tree evolution model construction, are included in the modeling method. This method is developed using natural language processing, semantic similarity calculation, fuzzy neural network prediction, clustering algorithm, and latent topic extraction techniques. To evaluate the performance of the proposed modeling method, an evolution model of empirical knowledge in computer-aided design (CAD) is constructed and then verified. Experimental results show that the proposed method outperforms the former approaches in feasibility and effectiveness, and hence opens up a better way of further understanding the long-term evolution course of EEK.
© 2017 Published by Elsevier Ltd.

1. Introduction

In this era of the knowledge economy, knowledge matures and mutates quickly, driven by plenty of fresh concepts, techniques, methodologies, experiences and activities. The Internet and network technology have also endowed knowledge with the characteristics of quick update, wide transmission and high interdisciplinarity, which further promote the renewal of knowledge [1]. Under this situation, appropriate management of this rapidly evolving knowledge is a significant task for factories and enterprises, since it is the core of maintaining their competitive edge in creativity and adaptability [2]. It also raises an inevitable research question in the field of knowledge management: how does knowledge evolve over a long period of time? Since the 1950s, researchers have realized that networks constructed from the relations among academic research outputs could distinctly reveal the growth and inheritance of knowledge in some disciplines, hence answering the above question with some preliminary findings. The citation network for academic papers was first proposed for discovering the course of development [3]. It was then expanded and enriched into co-word networks in subsequent research [4]. To establish and assess the networks, metrology methods are widely adopted for measuring research development in different aspects: authors, research groups, countries, keywords, journals, etc. With the networks and metrology methods, the evolution of the knowledge presented and reviewed in the academic literature is objectively and quantitatively modeled [5–13].

However, in practice, modeling the evolution of academic knowledge is far from enough. In both factories and enterprises, it is urgent to model the evolution of engineering knowledge, especially the evolution of empirical engineering knowledge (EEK). EEK is deduced and concluded from technical designs, decision-making or other engineering activities, and provides valuable instructions for success and lessons from failures. Drawing on and integrating numerous relevant studies [14,15,26–30], EEK can be defined as a specific technical know-how about solving an engineering problem, which is a consequence of probable association and extension of engineering concepts under specific constraints of engineering scenarios, deduced and concluded from repeated observations, practices and communications of engineering technicians in long-term engineering activities.

⇑ Corresponding author.
E-mail address: Zhjiang@sjtu.edu.cn (Z. Jiang).

http://dx.doi.org/10.1016/j.aei.2017.08.001
1474-0346/© 2017 Published by Elsevier Ltd.

Unfortunately, two aspects impede the direct application of co-word networks and metrology methods in modeling EEK evolution: the "randomness" [14–15] and "tacitness" [16] of EEK. Beyond this traditional methodology, several innovative modeling methods [17–24] have been designed in recent research. They use the theories and tools of complex network analysis (CNA), or put forward some specific evolution modes. However, most of them either fail to consider the innate semantic and logical characteristics of EEK comprehensively, or lack the capability of portraying engineering domain-level or subdomain-level evolution courses. Such defects become more critical when they analyze a proliferating engineering field with a huge amount of fast-updating information and maturing knowledge over the long term (such as one decade or more). Therefore, a customized and long-term-capable method for filtering colossal information and mining the evolution courses of EEK is our top priority.

To achieve this objective, this paper proposes a novel long-term knowledge evolution modeling method to reveal the courses of EEK evolution over several overlapping time periods. Based on the elicitation and formalization of EEK, the proposed method adopts semantic similarity calculation and fuzzy neural network prediction for constructing the EEK networks, a kind of network inspired by the co-word network. Also, a clustering algorithm and latent topic allocation are used to quantitatively discover the inheritance relationships in engineering fields. By presenting how EEK clusters evolve over a long time with a family-tree model, it becomes easy to identify and visualize the courses of knowledge evolution and discover some interesting trends and patterns, thereby obtaining a better understanding of long-term EEK evolution.

The remainder of this paper is structured as follows. Section 2 uses related works to introduce empirical engineering knowledge (EEK) and its evolution. Additionally, some recent knowledge evolution modeling approaches are briefly analyzed in this section. Section 3 designs the general framework of the proposed evolution modeling method. The elicitation and formalization of EEK are illustrated in Section 4. Sections 5 and 6 detail the foundation of EEK networks and the construction of the knowledge evolution model. An example of using the proposed method to model the evolution of EEKs originating from computer-aided design (CAD) missions, including the comparison and discussion with former works, is presented in Section 7. The last section concludes the paper with some possible improvements.

2. Related works

2.1. Representation of empirical engineering knowledge (EEK)

In factories and enterprises, EEK is of significant value for innovative design and decision-making processes, and good management of EEK will greatly promote these processes. As the first and foremost step in EEK management, a proper representation of EEK can largely facilitate the subsequent EEK acquisition, accumulation and reuse. Chan [25] first proposed that empirical engineering knowledge could be characterized by some discrete and standard attributes. Based on this idea, several scholars concentrated on EEK representation [14,15,26–30], and a concise summary of EEK representation in their works is listed in Table 1.

As summarized in Table 1, all of the researchers above regarded the core contents of EEK as Problem, Context and Solution. To comprehensively represent the capacity of EEK and the relevance among EEKs, they also considered some more detailed attributes, such as Effect, Contributors and Relevance. Unfortunately, none of the above researchers considers the timeliness of EEK in representation. In some engineering fields, such as electronic device design, advanced material manufacturing or software engineering, knowledge is updated frequently. The replacement of obsolete empirical knowledge by experience about new methods and equipment occurs rapidly, widely and deeply. Under this condition, it is hard for the existing representations to reflect the dynamics of EEK and portray its life cycle, since they are only designed for representing one steady state of EEK. Therefore, Time should be another important attribute in EEK representation, serving as a hint for demonstrating the evolution of EEK.

2.2. Mechanisms of EEK evolution

The theory of evolution was initially designed by Charles Darwin in 1842 for understanding and explaining the development of complex biological systems. Despite controversies regarding this theory, it is widely adopted in numerous studies. As an analogy, in knowledge evolution, the mutation and selection of concepts, terminologies and approaches are the fundamental causes of the knowledge evolution course that leads from specialty to generality, from vagueness to clarity, and from abstractness to concreteness [31]. Basically, there are two knowledge evolution mechanisms commonly accepted by researchers: Darwinism knowledge evolution and Lamarckism knowledge evolution [22,24,32–36]. Based on a review of the relevant literature, the two evolutionary mechanisms are compared in Table 2.

For the evolution of EEK, both mechanisms work simultaneously. On the one hand, since EEK originates from the solving of engineering problems, the necessary conditions of Darwinism knowledge evolution are nurtured. Along with the repeated cycles of utilization and update in various engineering situations, EEK develops gradually. On the other hand, triggered by disruptive technological innovation, fundamental theories, tools or approaches vary; that is, paradigm shifts happen [36,37]. During the process of a paradigm shift, the core contents of EEKs are partly or totally changed, and Lamarckism knowledge evolution is generated. Therefore, in the research of long-term EEK evolution, the characteristics of both evolutionary mechanisms should be considered. It is essential to perceive the coexistence of gradual and radical change in the evolution courses of EEK. In the proposed modeling method, they are considered through well-designed time windows.

Table 1
Considered attributes in EEK representation.

Columns: Representative works; Problem/Goal; Requirement/Context/Cause; Conclusion/Solution/Response; Credibility/Effect; Contributors/Participant/Individual; Relevance/Reference

Chen [15]: ✓ ✓ ✓
D’Eredita [26]: ✓ ✓ ✓
Argote [27]: ✓ ✓ ✓ ✓
Foguem [28–29]: ✓ ✓ ✓ ✓
Zhang [30]: ✓ ✓ ✓ ✓ ✓
Liu [14]: ✓ ✓ ✓ ✓ ✓ ✓

Table 2
Characteristics of two evolutional mechanisms.

Sameness (Darwinism and Lamarckism knowledge evolution):
• Knowledge evolution is the corollary of knowledge accumulation and reuse.
• Knowledge evolution is fundamentally triggered by the variation of constraints and requirements of the external environment.
• The essence of knowledge evolution is the mutation of contained key concepts and their relations.

Difference, Darwinism knowledge evolution:
• The course of evolution is slow, passive and continuous.
• The direction of evolution is determined by practical requirements.
• The dynamic of evolution is sourced from the adaption of engineering scenarios.
• The cycle of evolution is short and the occurrence of evolution is frequent.

Difference, Lamarckism knowledge evolution:
• The course of evolution is radical, active and sporadic.
• The direction of evolution is determined by industrial forecasts.
• The dynamic of evolution is sourced from the chasing of technological tracks.
• The cycle of evolution is long and the occurrence of evolution is occasional.

2.3. Approaches for modeling knowledge evolution

Considering how evolution modeling methods extract and utilize the relations between new ideas and original ones, they can be categorized into three types: metrology methods, CNA-based approaches and mode-based approaches [38].

2.3.1. Co-word networks and metrology methods

Academic literature reflects the cognitive status of scholars and is referred to by subsequent research. Hence, titles, keywords and abstracts in the academic literature, as well as the reference relations, can be utilized to track the evolution of academic knowledge, and "co-word networks" are constructed. A vertex in the networks is usually a theme of a paper, while an edge indicates a co-occurrence relation between two pieces of research work. By observing the change of the topological structure of the network over a long time, scholars could infer the process of past development and the future trend in a certain field. This kind of methodology, sometimes called metrology methods or "bibliometric analysis" in some recent works, has been frequently applied in multiple academic fields, for example, knowledge-based systems [5], industrial symbiosis [7], nano-technological innovation systems [8], technology mapping [9], information systems [10,12], social simulation [11], and other broader and more universal academic fields [6,13]. Integration with data visualization techniques pushed the application of metrology methods one step forward. With the help of CiteSpace [39], SCI2 Tool [40], In-SPIRE [41], or other visualization software, it is more convenient for scholars to map the evolution process over decades and then insightfully forecast the future trends in the field. Besides academic literature, in recent years, some structured textual data, for example, patent documents [42] and news articles [43], were also analyzed with co-word networks and metrology methods, in order to model the knowledge evolution.

The relative ease of establishing co-word networks and metrology methods often makes such approaches the first choice for experts to explore the knowledge evolution in many disciplines. However, metrology methods may not achieve an ideal result when they are transplanted to model the evolution of empirical engineering knowledge (EEK). This inapplicability has two main causes. Firstly, as a kind of practical but immature knowledge, EEK evolves in a more complex and random way over a long period of time [14,15]. Under this situation, co-word networks and metrology methods mine only some trivial information, which is not systematic enough to model a complex EEK evolution. Secondly, as a kind of tacit knowledge, the concepts and logical relations in EEK are often concealed in the thoughts or natural language of the engineers. It is rather difficult to extract the key contents in EEK, let alone construct relations among EEKs [16].

2.3.2. CNA-based modeling approaches

Based on the understanding and applications of small-world networks [44] and scale-free networks [45], theories and tools for complex network analysis (CNA) offer a novel direction for analyzing knowledge networks and modeling their evolution. Some scholars implemented CNA on co-word networks directly [10–12]. They analyzed the variation of the co-word network according to the changes of the measurements in the networks, and then postulated the evolution of the domain. Besides, some more complex networks were also established and analyzed, in order to deal with the mechanism of knowledge evolution in more detail. These categories of networks included hyper-networks [17], Bayesian networks [18], and agent-based networks [23].

Besides the various choices of categories of complex networks, researchers also concentrated on different ranges in CNA-based methods. Using communities to model the evolution of complex networks is a new hotspot in recent CNA-based research. With a structure between macro and micro, communities may reveal the differentiation, mutation and fusion courses for a group of tightly related knowledge. Some tentative but interesting results were also achieved. Newman [46] argued that the variation of communities was more capable of capturing the pace of development. Aiming at modeling the evolution in physics, Herrera et al. [47] showed that long-lived communities of knowledge are more likely to grow larger.

Despite the fact that CNA-based methods put forward some new ideas and tools for the research of knowledge evolution, some insufficiencies in current methods hinder their application in modeling EEK evolution. Firstly, only theoretical simulations were established in these works. They usually lack a detailed empirical analysis, not to mention an analysis in an actual engineering field with a large amount of knowledge. Secondly, the analysis of knowledge evolution carried out in the aforementioned research is mainly based on the geometrical topology of the networks. In this situation, the characteristics of nodes and edges, especially the semantic meanings of the concepts and relations, were given insufficient attention in the CNA-based methods. As a consequence, the results of these works are sparse in informative findings, and their validity might not be fully persuasive.

2.3.3. Mode-based modeling approaches

Considering the difficulty (sometimes even infeasibility) of representing informal knowledge, several scholars summarized some specific evolution modes to model the evolution process. Chen et al. [19] linked the paths of the changing topics and concepts in the threads of a professional virtual community, and discovered the modes of knowledge evolution in online discussion. Chen et al. [20] defined two knowledge evolution strategies, knowledge mutation and knowledge crossover, and showed their impacts on different aspects of organizational performance. O’Leary [21] traced the sorts of changes in the taxonomy that records the "best practices" knowledge in enterprises, and then built a mode-based approach to analyze the evolution of taxonomy. Ma [22] proposed the mode of dynamic evolution of ontologies by specifying changes

in ontologies caused by the business requirements. Barthelmé [24] hypothesized three evolution modes of cognition flows in knowledge systems, namely accommodation, assimilation and differentiation, imitated from the theories of biological evolution.

The approaches mentioned above demonstrate advantages as they are capable of dealing with semiformal and informal knowledge. However, the aforementioned mode-based approaches are short-term and microscopic compared to the metrology or CNA-based methods. This is because the mode-based approaches are designed for some targeted application, such as modeling the knowledge evolution in a particular kind of problem-solving process or a customized ontology. Consideration of the whole structure of the field and the entire process of its development is absent in such approaches. Thus, if the time scale of evolution is enlarged to one or several decades, and the range of concentration is broadened to the domain level, there will be plenty of unrepresentative and unpersuasive evolution modes.

2.3.4. Summary

Table 3 summarizes the modeling approaches for knowledge evolution described above, and compares all three types. Generally, constructing the co-word network and using the metrology method makes it easy to model the long-term and domain-level evolution of formal knowledge. The CNA-based approach achieves better performance in analyzing semi-structured data and in modeling ability, at the cost of ignoring the semantic meanings of the knowledge. On the contrary, the mode-based approach fully supports various unstructured data of informal knowledge and takes the semantic meanings into consideration, but lacks the ability to investigate trends and patterns in long-term and domain-level knowledge evolution. In this paper, EEK is embedded in textual records of problem-solving procedures, and the paper intends to discover the variation of themes of EEKs and scales of EEK communities over the long term. Hence all of the dimensions listed in Table 3 are required in the modeling method. Since none of the traditional approaches can satisfy such requirements simultaneously, a novel long-term knowledge evolution modeling method for EEK should be developed.

3. Framework of long-term knowledge evolution modeling for EEK

In order to study the evolution of empirical engineering knowledge (EEK) over the long term, this paper proposes a three-phase modeling method, as Fig. 1 demonstrates.

(1) Eliciting and formalizing EEKs: From huge numbers of problem-solving documents over a long period, many pieces of EEK are collected and extracted. For each piece of EEK, seven extracted attributes, namely Engineering Problem, Problem Context, Problem Solution, Effectiveness, Contributor, Time and Feature Association, are refined with the assistance of natural language processing (NLP) techniques and formed as EEK = ⟨EP, PC, PS, E, C, T, FA⟩. Then the formalized EEKs are organized with isometric overlapping time windows. Determined by two appropriate parameters, these time windows can retain both the gradual and the mutational evolution.
(2) Founding EEK networks: The similarity of each pair of attributes in two EEKs is calculated from their numerical relations and semantic relations. Based on the similarity computation of the seven attributes, an overall evaluation of the EEK relationship is predicted by a well-trained T-S Fuzzy Neural Network (T-S FNN). After filtering out weak relations with a pre-set threshold, the EEK network in each time

Table 3
Knowledge evolution modeling approaches and their characteristics.

Approaches | Diversity of processible input data | Availability of semantic information | Modeling ability in domain level | Modeling ability in time scale
Co-word and Metrology [5–13] | Low: structured data | Medium: considering keywords | Medium: domain-level | High: over a decade
CNA-based [10–12,17,18,23] | Medium: structured data and semi-structured data | Low: containing few semantic meanings | High: domain-level and subdomain-level | High: over a decade
Mode-based [19–22,24] | High: semi-structured data and unstructured data | High: regarding concepts, topics and semantic relations | Low: concept-level | Low: several weeks or months

Fig. 1. Framework of proposed EEK long-term evolution modeling method.



window is constructed from the EEK pairs belonging to it and their relations, and is expressed and saved as an undirected weighted graph.
(3) Modeling evolution of EEKs: A graph-based clustering method, the Chameleon algorithm, is adopted to divide the EEK networks into several EEK clusters. To represent an EEK cluster, the theme of the cluster is expressed with the latent topic extracted by the Latent Dirichlet Allocation (LDA) model, and the scale of the cluster is quantified by the number of EEKs it contains. Semantic and membership inheritance relations are built among clusters in neighboring time windows, and then a family-tree model is eventually constructed for revealing the courses in the long-term evolution of EEK.

4. Elicitation and formalization of EEK

4.1. Extracting EEK from problem-solving documents

In collaborative working environments, a lot of electronic documents are recorded and spread in solving engineering missions. These documents include virtual community Q&A threads, meeting notes, exchanged emails, success or failure cases, the revision history of a Wiki page and others. An electronic document downloaded from a professional CAD community is shown in Fig. 2. In this document, the questioner encountered a problem in fixing layers when drawing engineering plots, and he/she described the situation when the problem occurred. Through communication with other answerers, the asker finally got an applicable solution: XREFLAY.lsp. Some documents may also use plots, videos, audios, program files or mathematical models to provide more abundant and detailed information, but these parts will not be considered in this paper.

These problem-solving documents are an appropriate source of EEK: they record the encountered problems, the background situations and constraints, the empirical solutions, and some other directly accessible information, hence being able to cover all main elements of EEK. Even though these texts vary in wording and phrasing, they share a similar structure, which can be presented with the following structural template:

Proposer: propose an engineering problem EP as the topic of discussion;
Proposer: provide some problem contexts PC1 in EP;
Respondent 1: offer some problem solutions PS11 according to EP + PC1;
Respondent 2: offer some corresponding suggestions PS12 according to EP + PC1;
...... (Several respondents reply)
Proposer: (First round of interaction finished) adopt some suggestions and end the discussion; OR add some other problem contexts PC2 and continue the discussion;
(If discussion continues)
Respondent 1: offer some corresponding suggestions PS21 according to EP + PC1 + PC2;
Respondent 2: offer some corresponding suggestions PS22 according to EP + PC1 + PC2;
...... (Several respondents reply)
...... (Proposer decides whether to continue)
...... (Several rounds of interaction between proposer and respondents)
Proposer: Finally, adopt some suggestions and end the discussion.

Considering the retrievable information in this structural template and the elements of EEK mentioned in the previous works [14,15,26–30], EEK is represented and extracted with the formula EEK = ⟨EP, PC, PS, E, C, T, FA⟩ in this paper. Each attribute is explained as follows:

• EP (Engineering Problem) proposes a specific engineering problem.
• PC (Problem Context) describes the background information and constraints of this EP.
• PS (Problem Solution) shows the empirical solution of this EP.
• E (Effectiveness) evaluates the fitness of PS for this EP under such PC.
• C (Contributor) collects all the participants in the generation of this EEK.
• T (Time) records the time when the last response to the text of this piece of EEK was posted.
• FA (Feature Association) lists the relationships and their strengths with other EEKs.

These seven attributes comprehensively represent all aspects of a piece of EEK concluded in the previous works, and can be extracted from the problem-solving texts without much effort. With the structural template, the three main attributes of EEK, EP, PC, and PS, can be easily collected by excerpting the contents of the corresponding paragraphs in the problem-solving documents. The other two attributes of EEK, C and T, are obviously presented in

Fig. 2. A CAD community (forums.autodesk.com) and a piece of problem-solving thread.



the documents, hence they can be extracted directly. E and FA seem Step 3.2: Find the highest regression value, and assign it to E;
a little bit complicated, but they can be concluded from the pro- Step 3.3: Find the corresponding Answer sentences of the
posers’ evaluation of the results and the referred experiences in highest value, elicit the noun phrases in the sentences, and
the threads. Any engineer in the domain, whether experienced or fill them into PS.
not, can match the structural template and elicit a complete EEK Step 4: Obtain the participants from the original document
from the text. directly, and fill them into C;
However, it’s complicated for the intelligent systems to auto- Step 5: Obtain the time of final response from the original doc-
matically achieve such elicitation. Since the askers and respon- ument directly, and fill them into T;
dents are often unable to identify the key concepts in Step 6: FA is left blank at present. It will be filled in when cal-
engineering problems, a plenty of ‘‘noise” in the discussion, formed culating the EEKs attribute similarities.
by some unrelated sentences, hinders the correct understanding of
intelligent systems. Additionally, for a record that contains no suc- In the processing, all noun phrases in the attributes are singu-
cinct topic, for example, the non-topical discussion or Wiki pages larized and converted into lowercase. Repeated phrases in one
revision, intelligent systems are unable to locate the encountered attribute are eliminated, thus each noun phrase represents a
engineering problem in the record. To solve these two issues, more unique engineering concept. The CRF model and Logistic Regres-
detailed works in text processing and classification should be sion Model are trained with some annotated samples before for-
incorporated. malization procedure. Two training methods are separately
illustrated in article [48,49]. The corpus used in NLP procedures
4.2. NLP-based EEK formalization is composed with all collected texts of empirical problem-solving
documents.
To filter out the unrelated information in the problem-solving As an example, Table 4 shows a formalized EEK from the texts
text, and refine the content in each attribute in EEK, this paper illustrated in Fig. 2. After the formalization, the contents of the
adopts natural language processing (NLP) techniques in EEK formalization. At little cost and high speed, NLP techniques can automatically process the large volume of problem-solving texts and precisely extract key concepts. Two former works using NLP techniques laid the basis for this paper: Song et al. [48] and Shah et al. [49]. To process the texts of Q&A threads, Song adopted a Conditional Random Field (CRF) text classification technique to label the role of each sentence (Question/Context/Answer/Plain), and then represented problem objectives and context constraints with key concepts in the specific sentences. To evaluate the answers, Shah et al. extracted vocabulary, grammar and semantic features from questions, answers, and participating users in the threads. They then constructed a Logistic Regression model to quantify the quality of answers and pick out the best answers.
To formalize EEK, this paper combines these two NLP-based works and fully considers the text structure proposed in Section 4.1. The formalization process is detailed as follows:

Step 1: For each piece of text, generate a unique EEKID for indexing;
Step 2: Extract EP and PC;
  Step 2.1: As Song did, train a CRF model to label each sentence in the text, dividing the sentences into four categories: Question/Context/Answer/Plain;
  Step 2.2: Elicit the verb-object phrases in the sentences labeled as Question, and fill them into EP;
  Step 2.3: Elicit the noun phrases in the sentences labeled as Question and Context, and fill them into PC.
Step 3: Extract PS and E;
  Step 3.1: As Shah did, train a feature-based Logistic Regression Model to evaluate all answers;

attributes, especially EP, PC, and PS, have already filtered out unrelated information, but still retained key concepts in the original problem-solving texts. Therefore, NLP-based EEK formalization keeps a balance between simplicity and integrity, and makes the subsequent analysis of EEK evolution convenient.

4.3. Dividing time windows

For a long researched period, it is necessary to divide the whole span into several time intervals. Obviously, if each interval lasts longer, a larger number of evolution phenomena may occur inside the interval, and the variation between neighboring intervals will also be more drastic. Hence, choosing proper time windows is another important preparation for EEK evolution research. Continuous stages are chosen by most researchers [7,9,11–13]. For example, 15 years ranging from 1996 to 2010 may be divided into three five-year periods: 1996–2000, 2001–2005 and 2006–2010. Therefore, the granularity of evolution in the time dimension is determined by only one parameter: the length of the time period t.
However, this choice creates a dilemma. If researchers want to discover low-frequency drastic transitions occurring between neighboring time intervals, t may be set to a rather large number, such as three or five years; if high-frequency gradual mutations are the focus of the research, t may be set to a rather small number, such as several months or one year. Because one parameter cannot retain the mutations and the transitions simultaneously, it is far from sufficient for dividing well-behaved time windows. To address this issue, this paper designs a kind of isometric overlapping time windows. As shown in Fig. 3, all time windows have the same length and the same moving step. Therefore, the division of these time windows is decided by two parameters: the length of the time window t and the moving distance Δt.
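As a minimal sketch of the isometric overlapping time windows described above, the following Python fragment enumerates windows from a window length t and a moving distance Δt. The function names, the half-open interval convention and the example spans are assumptions made for illustration; they are not taken from the paper's implementation.

```python
from typing import List, Tuple


def overlapping_windows(start: float, end: float, t: float, dt: float) -> List[Tuple[float, float]]:
    """Return half-open windows [lo, lo + t) whose starts are dt apart.

    `start` and `end` bound the studied span; `end` is exclusive (assumption).
    """
    if t <= 0 or dt <= 0:
        raise ValueError("t and dt must be positive")
    windows = []
    k = 0
    # Multiply by the index instead of accumulating dt to limit float drift.
    while start + k * dt + t <= end + 1e-9:
        lo = start + k * dt
        windows.append((lo, lo + t))
        k += 1
    return windows


def windows_containing(timestamp: float, windows: List[Tuple[float, float]]) -> List[int]:
    """Indices of every window that contains the timestamp."""
    return [i for i, (lo, hi) in enumerate(windows) if lo <= timestamp < hi]


# Overlapping case (illustrative span): 3-year windows moved forward 1 year at a time.
overlapping = overlapping_windows(1998, 2015, t=3, dt=1)

# Setting dt equal to t gives conventional consecutive periods,
# e.g. the three five-year periods covering 1996-2010.
consecutive = overlapping_windows(1996, 2011, t=5, dt=5)
```

With overlapping windows a single timestamp can fall into several windows, which is what allows neighboring windows to share pieces of EEK; `windows_containing` makes that membership explicit.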

Table 4
A formalized EEK elicited from Fig. 2.

Attribute | Data type | Instance (EEKID = 18,005)
EP | Set of VO structures (objects are noun phrases) | {fix layer; fix layer name} (Noun-Phrase: {layer, layer name})
PC | Set of noun phrases | {layer name, drawing file, layer manager, dollar sign}
PS | Set of noun phrases | {file, cad drawing, xref, lsp, web, xreflay.lsp, layer}
E | Numerical value | 0.645
C | Set of name strings | {universe08, imadhabash, beekeecz, justindoughty}
T | Numerical value | 2015.302
FA | – | {}
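The seven attributes in Table 4 map naturally onto a small record type. The sketch below is illustrative only: the class name `EEK`, the field names, the separate `ep_noun_phrases` field and the choice of a dictionary from EEKID to FASim for FA are assumptions, not the authors' data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class EEK:
    """One formalized piece of empirical engineering knowledge (hypothetical layout)."""
    eek_id: int
    ep: Set[str]                 # engineering problem: verb-object structures
    ep_noun_phrases: Set[str]    # noun phrases that act as objects in EP
    pc: Set[str]                 # problem context: noun phrases
    ps: Set[str]                 # problem solution: noun phrases
    e: float                     # numerical evaluation of the solution
    c: Set[str]                  # contributor names
    t: float                     # time stamp as a fractional year
    fa: Dict[int, float] = field(default_factory=dict)  # related EEKID -> FASim


# The instance shown in Table 4.
example = EEK(
    eek_id=18005,
    ep={"fix layer", "fix layer name"},
    ep_noun_phrases={"layer", "layer name"},
    pc={"layer name", "drawing file", "layer manager", "dollar sign"},
    ps={"file", "cad drawing", "xref", "lsp", "web", "xreflay.lsp", "layer"},
    e=0.645,
    c={"universe08", "imadhabash", "beekeecz", "justindoughty"},
    t=2015.302,
)
```

Keeping EP, PC, PS and C as sets mirrors the "set of" data types in the table and removes repeated phrases automatically.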

Fig. 3. Isometric overlapping time windows.

The values of t and Δt should be set properly, according to the characteristics of mutations and transitions. Δt is a rather short length of time, reflecting the high frequency of mutation and the short cycle of Darwinism knowledge evolution. This kind of evolution is caused by everyday knowledge accumulation, reuse or updating, and results in minor modifications to the original EEKs. Even though it happens frequently, Darwinism knowledge evolution is hard to detect completely, especially among a huge number of evolutional phenomena. Under this condition, a smaller Δt produces a more careful search and comparison over the long evolutional history, and discloses the process of EEK mutation in more detailed steps. However, the cost of an evolution model with a smaller Δt is a larger number of time windows (for a given total length T, $(T - t + \Delta t)/\Delta t$ time windows are divided) and a heavier burden for evolution analysis.

The other parameter, the length of the time window t, ensures continuity among the knowledge evolution processes. As shown in Fig. 3, neighboring time windows share some pieces of EEK. The longer the time windows are, the larger the proportion of EEK in one time window that recurs in the next one, and accordingly the higher the structural similarity between the neighboring EEK distributions. Since paradigm shifts lead to drastic changes in themes or members between neighboring time windows, the evolution model can detect these sudden transitions to report the corresponding Lamarckism knowledge evolution. With a larger t and higher continuity, the model is more likely to avoid the misidentification of transitions and hence to discover the exact inheritance relationships before and after the paradigm shifts. Certainly, an excessively high value of t will let the model regard some actual paradigm shifts as ordinary modifications, thus weakening the effectiveness of evolution modeling. Similar to the setting of Δt, t is set according to the frequency of transition and the long cycle of Lamarckism knowledge evolution.

5. Foundation of EEK networks

5.1. Similarity calculation for EEK attributes

Enlightened by the co-word network, this paper intends to construct a similar EEK network in each time window. By analyzing the variations in topological structures and semantic meanings of the neighboring EEK networks, the dynamics of EEK evolution are discovered.
To establish these EEK networks, the most essential work is to evaluate the strength of relation among EEK nodes. With the representation of EEKs, similarities of seven attributes facilitate the evaluation. Namely, for two EEKs in the same time window, $EEK_1 = \langle EP_1, PC_1, PS_1, E_1, C_1, T_1, FA_1 \rangle$ and $EEK_2 = \langle EP_2, PC_2, PS_2, E_2, C_2, T_2, FA_2 \rangle$, seven types of similarities, EPSim, PCSim, PSSim, ESim, CSim, TSim and FASim, are calculated in the first step.

– Computing EPSim, PCSim and PSSim

EPSim, PCSim and PSSim calculate the similarities of two pieces of EEK in the corresponding attributes. Since EP, PC and PS are sets of concepts, their similarities in two pieces of EEK can be directly calculated with semantic similarities:

\[ \mathrm{EPSim}(EEK_1, EEK_2) = \mathrm{SemanticSim}(EP_1, EP_2) \tag{1} \]

\[ \mathrm{PCSim}(EEK_1, EEK_2) = \mathrm{SemanticSim}(PC_1, PC_2) \tag{2} \]

\[ \mathrm{PSSim}(EEK_1, EEK_2) = \mathrm{SemanticSim}(PS_1, PS_2) \tag{3} \]

In Eqs. (1), (2) and (3), semantic similarity is calculated with the contained noun-phrases (attribute ATT is EP, PC or PS):

\[
\mathrm{SemanticSim}(ATT_1, ATT_2) =
\begin{cases}
0 & \text{(SS.I)} \\[6pt]
\dfrac{1}{2}\left( \dfrac{\sum_{NP_i \in ATT_1} \max_{NP_j \in ATT_2} \mathrm{NPSim}(NP_i, NP_j)}{\mathrm{Count}(ATT_1)} + \dfrac{\sum_{NP_j \in ATT_2} \max_{NP_i \in ATT_1} \mathrm{NPSim}(NP_i, NP_j)}{\mathrm{Count}(ATT_2)} \right) & \text{(SS.II)}
\end{cases}
\tag{4}
\]

SS.I When $ATT_1$ or $ATT_2$ is empty;
SS.II When $ATT_1$ and $ATT_2$ each contain several noun-phrases. $NP_i$ is a noun-phrase of $ATT_1$, and $NP_j$ is a noun-phrase of $ATT_2$.
The value of function Count(ATT) is the number of non-repetitive noun-phrases in the EEK attribute (EP, PC or PS). Similarities between noun-phrases are computed as in [48]:

\[
\mathrm{NPSim}(NP_1, NP_2) =
\begin{cases}
2^{L-1} & \text{(NP.I)} \\
1 & \text{(NP.II)} \\
\max\limits_{Word_i \in NP_1,\; Word_j \in NP_2} \mathrm{WSim}(Word_i, Word_j) & \text{(NP.III)}
\end{cases}
\tag{4.1}
\]

NP.I When words in all the corresponding positions of the two phrases are synonyms, hyper/hyponyms, or the same; L is the phrase length;
NP.II When some words of the two phrases are synonyms, hyper/hyponyms, or the same;
NP.III When all the words of the two phrases are literally different. $Word_i$ is a word of $NP_1$ and $Word_j$ is a word of $NP_2$.
WordNet [50] is used to determine whether two words are synonyms or hyper/hyponyms. For a pair of literally different words, word similarity is determined as the larger value of their JCn similarity [51] in WordNet (WS.I) and their normalized pointwise mutual information (WS.II):

\[
\mathrm{WSim}(Word_1, Word_2) = \max
\begin{cases}
\mathrm{JCn}(Word_1, Word_2) & \text{(WS.I)} \\[4pt]
\log \dfrac{p(Word_1, Word_2)}{p(Word_1)\, p(Word_2)} \Big/ \log |D| & \text{(WS.II)}
\end{cases}
\tag{4.2}
\]

WS.I According to the study of Mohler et al. [52], JCn similarity in WordNet showed an excellent performance in evaluating the semantic similarities between literally different words. However, some professional terminologies, such as "xref" and "lsp" in the PS of the EEK presented in Table 4, are not recorded in WordNet. They are judged with normalized mutual information.
WS.II $p(Word_1)$ is the proportion of EEKs that contain $Word_1$, and $p(Word_1, Word_2)$ is the proportion of EEKs that contain $Word_1$ and $Word_2$ simultaneously. Since the similarity of two different words should not exceed the similarity of two identical words, the largest possible value of pointwise mutual information, $\log |D|$, is applied to normalize the similarity, and a negative similarity is set to 0.

– Computing FASim

Unlike EPSim, PCSim and PSSim, FASim considers the causal relevance between two pieces of EEK, rather than their sameness. That is to say, even though two pieces of EEK intend to solve two different engineering problems and adopt two different empirical answers, they may still be strongly related if one of them leads to the other. For FASim, two assumed relationships between two pieces of EEK are considered:

• Trigger Relationship: if $PC_2$ of $EEK_2$ contains $EP_1$ of $EEK_1$, then $EEK_1$ will trigger $EEK_2$
• Solved-by Relationship: if $PS_2$ of $EEK_2$ contains $EP_1$ of $EEK_1$, then $EEK_1$ is solved by $EEK_2$

The similarity between EP and PC or between EP and PS determines the type of the relationship, and then decides the value of FASim. The value of attribute FA is filled in with a record of (EEKID, FASim).

\[ \mathrm{FASim}(EEK_1, EEK_2) = \max\{\mathrm{TriggerSim}(EEK_1, EEK_2),\ \mathrm{SolvedbySim}(EEK_1, EEK_2)\} \tag{5} \]

\[ \mathrm{TriggerSim}(EEK_1, EEK_2) = \max\{\mathrm{SemanticSim}(EP_1, PC_2),\ \mathrm{SemanticSim}(EP_2, PC_1)\} \tag{5.1} \]

\[ \mathrm{SolvedbySim}(EEK_1, EEK_2) = \max\{\mathrm{SemanticSim}(EP_1, PS_2),\ \mathrm{SemanticSim}(EP_2, PS_1)\} \tag{5.2} \]

Semantic similarities in Eqs. (5.1) and (5.2) are also computed with Eq. (4).
This kind of crossed computation links pieces of EEK that have potential causal associations. By taking these potential causal associations into the similarity evaluation, FASim models the possibility of choosing a related EEK when solving some predecessor or successor engineering problem, like the citing relations in academic literature.

– Computing ESim, CSim and TSim

ESim, CSim and TSim analyze background information in two pieces of EEK. Since E, C and T in EEK are numerical values or simple sets of strings, the similarities are computed as follows:

\[
\mathrm{ESim}(EEK_1, EEK_2) =
\begin{cases}
0 & \text{if } E_1 = E_2 = 0 \\[4pt]
\dfrac{\min(E_1, E_2)}{\max(E_1, E_2)} & \text{else}
\end{cases}
\tag{6}
\]

\[ \mathrm{CSim}(EEK_1, EEK_2) = \frac{\mathrm{Count}(C_1 \cap C_2)}{\mathrm{Count}(C_1 \cup C_2)} \tag{7} \]

Function Count(C) is the number of non-repetitive contributors in contributor set C.

\[ \mathrm{TSim}(EEK_1, EEK_2) = 1 - \frac{|T_1 - T_2|}{t} \tag{8} \]

t in Eq. (8) is the length of the time window.

5.2. Evaluating of EEK relationship and establishing EEK networks

In evaluating the relationship between a pair of EEKs, the significance of each attribute is hard to decide, which impedes the use of linear models. In fact, exact results are not necessary in the evaluation, since a minor error will not significantly affect the topological structure of the EEK network. Hence, to obtain a fuzzy evaluation from the seven attributes, the T-S Fuzzy Neural Network (T-S FNN) method is applied. T-S FNN combines supervised machine learning and fuzzy logic [53–55], and its layer structure and working principle are detailed in [54].
In this paper, the T-S FNN is trained to output an overall grade of EEK relationship, based on the input of the seven calculated similarities. Since T-S FNN is a supervised method, training data is required. A certain number of sample EEK pairs are manually evaluated by a panel of domain experts. For each EEK pair, the experts are asked to read the two formalized EEKs and the corresponding original threads, and then to judge the relevance according to their own engineering experience. The manually evaluated relevance is used as the target in the training of the T-S FNN, while the seven calculated similarities serve as the input data. By altering the values of the parameters in the network, the T-S FNN reduces the error between the network forecast and the actual target over the training iterations. Once an admissible error has been obtained after enough iterations, the T-S FNN is regarded as well trained and is used to evaluate all other EEK pairs.
With the well-trained T-S FNN, the relationship between each pair of EEKs is ranked into one of several grades. To avoid high density and complexity of the network, a certain grade $Grade_{threshold}$ is pre-set as a threshold to delete weak relations in the networks. Specifically, for a time window $W_n$, an undirected weighted graph $UWG_n = \langle V_n, E_n \rangle$ is established with the following process:

Step 1: Construct an empty undirected weighted graph $UWG_n = \langle V_n, E_n \rangle$, where $V_n = \emptyset$ and $E_n = \emptyset$;
Step 2: According to the attribute T, retrieve all pieces of EEK that belong to $W_n$;
Step 3: Use the well-trained T-S FNN to forecast a pair of EEKs, $EEK_1$ and $EEK_2$. If their relevance strength $Grade_{EEK_1,EEK_2}$ exceeds $Grade_{threshold}$, then create an edge $e = \langle EEK_1, EEK_2, Grade_{EEK_1,EEK_2} \rangle$; else, jump to step 5;
Step 4: Add edge e into $E_n$, and add $EEK_1$ and $EEK_2$ into $V_n$ unless they already exist in $V_n$;
Step 5: If all pairs of EEK in $W_n$ have been forecasted, output the graph $UWG_n = \langle V_n, E_n \rangle$ and end the process; else, return to step 3.
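Steps 1 to 5 above can be sketched as a short routine. In this sketch the trained T-S FNN is replaced by an arbitrary `grade` callable, each EEK is reduced to an EEKID with a time stamp, and windows are treated as half-open intervals; all of these, along with the function and variable names, are illustrative assumptions rather than the paper's implementation.

```python
from itertools import combinations
from typing import Callable, Dict, Set, Tuple


def build_eek_network(
    timestamps: Dict[int, float],
    window: Tuple[float, float],
    grade: Callable[[int, int], float],
    grade_threshold: float,
) -> Tuple[Set[int], Dict[Tuple[int, int], float]]:
    """Build the undirected weighted graph of one time window."""
    vertices: Set[int] = set()                        # Step 1: empty V_n
    edges: Dict[Tuple[int, int], float] = {}          # Step 1: empty E_n
    lo, hi = window
    # Step 2: keep only the EEKs whose time stamp T falls in the window.
    members = sorted(i for i, ts in timestamps.items() if lo <= ts < hi)
    # Steps 3 and 5: visit every unordered pair exactly once.
    for a, b in combinations(members, 2):
        g = grade(a, b)                               # stand-in for the T-S FNN forecast
        if g > grade_threshold:
            edges[(a, b)] = g                         # Step 4: add the weighted edge
            vertices.update((a, b))                   # Step 4: add both endpoints
    return vertices, edges


# Toy data: EEK 4 lies outside the window, EEK 5 has only weak relations.
toy_timestamps = {1: 1998.2, 2: 1999.5, 3: 2000.9, 4: 2003.1, 5: 1999.0}
toy_grades = {(1, 2): 4, (2, 3): 5, (1, 3): 2}


def toy_grade(a: int, b: int) -> float:
    """Look up a hand-written grade; unlisted pairs get the weakest grade."""
    return toy_grades.get((a, b), 1)


vertices, edges = build_eek_network(toy_timestamps, (1998, 2001), toy_grade, grade_threshold=3)
```

Because a vertex is only added together with an edge, EEKs whose relations all fall at or below the threshold do not appear in the graph, which keeps the network sparse.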

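The attribute similarities of Section 5.1 can be sketched in a few functions. The fragment below follows Eqs. (4) to (8) but makes several assumptions: noun-phrase similarity is passed in as a callable (the default `exact_np_sim` is only a placeholder for the WordNet-based NPSim of Eqs. (4.1) and (4.2)), evaluation values are taken to be non-negative, CSim of two empty contributor sets is defined as 0, and all function names are hypothetical.

```python
from typing import Callable, Set

NpSim = Callable[[str, str], float]


def exact_np_sim(a: str, b: str) -> float:
    """Placeholder NPSim (assumption): 1 for identical phrases, 0 otherwise."""
    return 1.0 if a == b else 0.0


def semantic_sim(att1: Set[str], att2: Set[str], np_sim: NpSim = exact_np_sim) -> float:
    """Eq. (4): symmetric average of best-match noun-phrase similarities."""
    if not att1 or not att2:
        return 0.0                                     # case SS.I
    forward = sum(max(np_sim(a, b) for b in att2) for a in att1) / len(att1)
    backward = sum(max(np_sim(a, b) for a in att1) for b in att2) / len(att2)
    return 0.5 * (forward + backward)                  # case SS.II


def fa_sim(ep1: Set[str], pc1: Set[str], ps1: Set[str],
           ep2: Set[str], pc2: Set[str], ps2: Set[str],
           np_sim: NpSim = exact_np_sim) -> float:
    """Eqs. (5)-(5.2): larger of trigger and solved-by similarity."""
    trigger = max(semantic_sim(ep1, pc2, np_sim), semantic_sim(ep2, pc1, np_sim))
    solved_by = max(semantic_sim(ep1, ps2, np_sim), semantic_sim(ep2, ps1, np_sim))
    return max(trigger, solved_by)


def e_sim(e1: float, e2: float) -> float:
    """Eq. (6): ratio of the smaller to the larger evaluation value."""
    if e1 == 0 and e2 == 0:
        return 0.0
    return min(e1, e2) / max(e1, e2)


def c_sim(c1: Set[str], c2: Set[str]) -> float:
    """Eq. (7): shared contributors over all contributors."""
    union = c1 | c2
    return len(c1 & c2) / len(union) if union else 0.0  # empty case is an assumption


def t_sim(t1: float, t2: float, window_length: float) -> float:
    """Eq. (8): closeness in time relative to the window length t."""
    return 1.0 - abs(t1 - t2) / window_length


# Noun phrases of EP and the PC set from Table 4, compared with exact matching.
ep_phrases = {"layer", "layer name"}
pc_phrases = {"layer name", "drawing file", "layer manager", "dollar sign"}
example_sim = semantic_sim(ep_phrases, pc_phrases)
```

Passing `np_sim` as a parameter keeps the set-level logic of Eq. (4) separate from the word-level resources, so a WordNet- or corpus-based similarity can be substituted without touching the rest.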
6. Evolution modeling based on clustering and topic extracting

6.1. Clustering EEK and extracting latent topics

In the CNA-based approach, the focus is on key nodes in the sequential networks. Evolution is modeled and explained by the variations of these nodes over a long period of time. However, the evolution process portrayed by these key nodes is too abstract to be comprehended, because the nodes lack sufficient semantic and numerical meaning. To tackle this issue and provide more detailed clues in themes and scales, this paper follows [46,47] and focuses on EEK communities to model the evolution of EEK. The Chameleon Algorithm [56] is adopted to cluster the vertices in EEK networks and find the EEK communities in each time window. The Chameleon Algorithm is a graph-based clustering method that contains two phases: partitioning and merging. In the first phase, considering the strength of each edge, the whole network is torn into many small groups of nodes with the K-nearest-neighbor method. In the second phase, these groups are integrated with a merge function $RI(C_i, C_j) \times RC(C_i, C_j)^{\alpha}$ and an evaluation threshold MI. The detailed clustering process and calculation are described in [56]. For the clustering result, we filter the clusters with $|C|_{min}$, a pre-set minimum number of contained members. Namely, a cluster that contains more than $|C|_{min}$ pieces of EEK in time window $W_n$ is retained, and then denoted as $A_{n,i}$. The setting of $|C|_{min}$ directs the evolution model to focus on the main communities in the engineering field, neglecting occasionally occurring EEKs.
To further understand these EEK communities, this paper adopts a Latent Dirichlet Allocation (LDA) model to extract the latent topics and take the semantic information into account. Proposed by Blei [57,58], LDA assumes a probabilistic generative model, which models the documents as a mixture of several latent topics and characterizes each topic as a unique distribution over the observed words in the corpus. Similarly, with the LDA model, EEK clusters can be semantically represented with latent topics constructed from the set of concepts $C_i$ and their probabilities $prob(C_i)$. Besides, because of the preceding clustering work, the concepts involved in each cluster are rather convergent, so extracting only one latent topic is enough for each cluster.
This paper differs from Blei in the initial weights of concepts in EEK in the calculation. In Blei's method, all words in a document, except for stop words, have the same initial weight in the calculation. However, in EEK calculation, according to Huang et al. [59], the location of a noun phrase in a piece of EEK (i.e. EP, PC or PS) influences its semantic meaning. For example, even when two similar engineering problems occur, different problem contexts may lead to two totally distinct solutions. In this situation, the noun phrases in PC weigh more in the semantic meaning of a piece of EEK, and their weights should be higher than those of the noun phrases in other locations. To handle this issue, during document generation for LDA analysis, this paper duplicates the noun phrases in the various attributes of EEK a different number of times, in accordance with the initial weights optimized by Huang. The corpus is then constructed from all the adjusted EEKs in a cluster. For the convenience of subsequent calculations, the probabilities in the latent topic are normalized with $Nprob(C_i) = prob(C_i) / \sum_{k=1}^{n} prob(C_k)$, hence denoting the topic as $LT = \{(C_i, Nprob(C_i))_n\}$. At this point, EEK clusters can be represented with their contained EEKs and the extracted latent topic.

6.2. Family-tree model of EEK evolution

In this final step of our proposed method, we first extract the inheritance relationships between neighboring EEK clusters, then link the similar EEK clusters in different time windows, and finally visualize the changing courses, constructing a model that resembles a family tree.
The EEK evolution model is constructed with the inheritance relationships. An inheritance relationship, which is established from the inherited choice of mutations in concepts, terminologies and approaches that occurred in antecedent EEKs, reflects the adaptation and growth of EEK communities. These relationships are quantitatively evaluated by the similarities between two clusters in neighboring time windows. Specifically, in this paper, two kinds of similarities are taken into account: semantic similarity and membership similarity.
The semantic similarity of two clusters, evaluated by the similarity of two latent topics, LTSim, indicates the variation of themes in EEK evolution. Specifically, for $Cluster_1$ in $W_n$ and $Cluster_2$ in $W_{n+1}$, LTSim between the two clusters is computed as the weighted similarity of the two corresponding latent topics $LT_1 = \{(Concept_i^1, Nprob(Concept_i^1))_n\}$ and $LT_2 = \{(Concept_j^2, Nprob(Concept_j^2))_m\}$:

\[ \mathrm{LTSim}(Cluster_1, Cluster_2) = \sum_{i=1}^{n} \sum_{j=1}^{m} Nprob_i^1 \, Nprob_j^2 \, \mathrm{NPSim}(C_i^1, C_j^2) \tag{9} \]

where the calculation of $\mathrm{NPSim}(Concept_i^1, Concept_j^2)$ relies on Eqs. (4.1) and (4.2). With the normalization of the concept probabilities in Section 6.1, $\sum_{i=1}^{n} \sum_{j=1}^{m} Nprob_i^1 \, Nprob_j^2 = 1$ is constant for each pair of LTs.
The membership similarity, CTSim, evaluates the transformation of cluster members in the evolution. In the overlapping time period, if more mutual EEKs are inherited in the transformation, CTSim will be higher and the membership inheritance relationship between the two clusters will be stronger. Namely, for $Cluster_1$ in $W_n$ and $Cluster_2$ in $W_{n+1}$, CTSim is defined as the inclusion index of the two clusters [5] and computed with Eq. (10):

\[
\begin{aligned}
\mathrm{CTSim}(Cluster_1, Cluster_2) &= \frac{\mathrm{Count}(SubCluster_1 \cap SubCluster_2)}{\min(\mathrm{Count}(SubCluster_1),\ \mathrm{Count}(SubCluster_2))} \\
SubCluster_1 &= \{EEK \mid EEK \in Cluster_1 \text{ and } EEK.T \in W_n \cap W_{n+1}\} \\
SubCluster_2 &= \{EEK \mid EEK \in Cluster_2 \text{ and } EEK.T \in W_n \cap W_{n+1}\}
\end{aligned}
\tag{10}
\]

Using these two similarities, two inheritance relationships are extracted from clusters in neighboring time windows:

• Semantic inheritance: The semantic similarity LTSim of two clusters in neighboring time windows is larger than the pre-set threshold $LTSim_{threshold}$;
• Membership inheritance: The membership similarity CTSim of two clusters in neighboring time windows is larger than the pre-set threshold $CTSim_{threshold}$.

By scanning any two clusters in neighboring time windows and connecting the pairs that satisfy both inheritance relationships simultaneously, the directed graph of a family-tree style evolution model for EEKs is created.
With the family-tree model, it is intuitive for domain practitioners to explore the relationships among the EEK clusters. With the connected nodes in the family-tree model, the evolution courses in the long history of the engineering field can be easily discovered. By inspecting a whole course over continuous time intervals, the model reveals the life of a sub-domain, from its birth to its death. Some courses may merge or split in the model, which indicates merging or differentiating patterns in the evolution. Through researching the variations in semantic meanings and the

number of containing members in the courses, more detailed findings for the motivations of EEK evolution will be obtained.
Similar to the co-word networks and metrology methods, the family-tree evolution model can be visualized with data visualization software for a better understanding of the whole evolution process.

7. Case study

In this section, we evaluate the feasibility and effectiveness of the proposed long-term EEK evolution modeling method. We ran the Java code of the proposed method on a Core i5 2.5 GHz PC with 8 GB memory, and visualized the result data with Gephi software version 0.8.2. We chose the case of using AutoCAD software to finish computer-aided engineering design (CAD) missions. A family-tree model for CAD EEK evolution was constructed and visualized, showing the evolution courses over a decade. There are three main reasons for choosing CAD and AutoCAD: (1) CAD is a typical knowledge-intensive mission in the engineering field, which receives frequent attention from engineering technicians as well as knowledge management researchers; (2) AutoCAD software, which is the most popular CAD tool worldwide, is applied and discussed widely and deeply by a huge number of CAD workers; (3) CAD experts are available in this case for annotating the samples. The family-tree model was also inspected and assessed by CAD experts, in order to offer a comprehensive evaluation of the proposed modeling method.

7.1. Establishing family-tree evolution model for CAD EEK

7.1.1. CAD EEK networks

From three professional virtual communities, forums.autodesk.com, www.cadtutor.net and www.cadforum.cz, this paper collected 18004 English Q&A threads describing computer-aided engineering design (CAD) missions, as shown in Fig. 2. Forums.autodesk.com was the official forum of AutoCAD software, while the other two forums were the first and most influential virtual communities in CAD fields. These CAD missions were solved with different versions of AutoCAD software, ranging from March 1998 to December 2014. Thus these threads completely recorded the long-term evolution of EEK in the CAD field. 415 threads were randomly selected and annotated by experienced CAD experts. To train the CRF model, each sentence in the sample threads was labeled with its role (Question/Context/Answer/Plain). The best Answer of each sample thread was also picked out in annotation and used for training the Logistic Regression Model. The remaining 17589 threads were then processed by these two well-trained models. Verb-object phrases and noun phrases in the threads were picked up by two NLP tools, Stanford Parser and Stanford Tagger.
Choosing proper values for the two parameters in dividing time windows mattered in the proposed method, and this paper determined these values considering the characteristics of users and tools in CAD fields. Even though the AutoCAD software is updated annually, CAD engineers may not update their tools so frequently, due to their usage habits. In the investigation of CAD users, they used the same version of AutoCAD software in their CAD missions for 3 years on average. Then they would update the software and adapt themselves to the new version. Under this situation, the gradual evolution of CAD EEK, corresponding to the annual update, had a cycle of 1 year, while the drastic evolution, corresponding to the user habits, had a cycle of 3 years. Hence in the setting of overlapping time windows, the moving distance Δt of the time windows was set to 1 year, and the length of the time window t was 3 years. In total, 15 time windows were divided.
In the 15 time windows, we randomly selected 440 pairs of EEKs. The two pieces of EEK in each pair belonged to the same time window. Three domain experts were invited to score the pairs on a scale of 1–5 according to their relevance. 408 valid samples were returned, and the other 32 samples were deleted because of significant divergence of opinions (all the ranks given to a sample by the experts were different). To train the T-S FNN and evaluate its performance, the 408 valid samples were divided into two groups: 350 samples of training data and 58 samples of test data. Considering the number of EEK attributes and scoring scales, 7 input nodes, 1 output node and 14 fuzzy rule nodes were contained in the FNN. Both the learning constant and the momentum constant were 0.5, and the iteration number was 50000, referring to Fukuda [54]. The precision of the well-trained T-S FNN forecast was 87.9%, which is reliable for evaluating the rest of the EEK pairs and constructing the EEK networks. $Grade_{threshold}$ was set to 3.
Fig. 4 shows the undirected weighted graph of the EEK network in time window $W_0$ as an illustration. This network shows a static structure of the CAD domain in 1998–2000. Each node represents a piece of formalized EEK, and the number on the node is its EEKID. The edges in the network indicate the relations between two linked pieces of EEK. As shown in the network, the red and central nodes in the graph indicate the key EEKs that established many connections with other EEKs, while the blue and marginal ones represent some isolated EEKs. In the CAD domain, most pieces of EEK are highly inter-related (in red or yellow colors; for interpretation of color in Fig. 4, the reader is referred to the web version of this article), suggesting that in CAD problem-solving processes, many similar or related EEKs were generated and accumulated. They were the fertile sources nurturing the evolution of CAD EEK.

7.1.2. Family-tree model for CAD EEK

Based on the CAD EEK networks, the Chameleon Algorithm clustered the EEKs following the process described in Section 6.1. Referring to Karypis [56], in the first phase, the 2-nearest-neighbor approach was adopted. In the merge function $RI(C_i, C_j) \times RC(C_i, C_j)^{\alpha}$, α was set to 2 to emphasize the relative closeness RC in the second phase. The threshold MI was set to 0.2. The minimum number of members in each cluster $|C|_{min}$ was 5, and approximately 90% of all EEKs are considered in the clusters. Table 5 presents the statistics of EEK clusters in each time window.
The latent topic of each EEK cluster was extracted by the LDA model. The noun phrases of EP/PC/PS were duplicated as a whole 2/5/3 times respectively during the generation of the documents, which corresponds to the optimized weights in [59]. The Java implementation JGibbLDA [60] was used for extracting the latent topic of each cluster. The top 20 normalized probabilities and the corresponding noun phrases were contained in the latent topic LT.
Fig. 5 presents a visualized clustering result of the EEK network in time window $W_0$ as an example (this figure also shows the unqualified clusters that have fewer than $|C|_{min}$ pieces of EEK). After the clustering, nodes were grouped together to form different EEK communities. Nodes in each EEK cluster were painted with the same color. Intuitively, clusters in the EEK network had a similar power-law distribution, namely, a few larger clusters occupied most of the domain, while many smaller ones only possessed some marginal areas. This phenomenon in EEK also coincided with the conclusions proposed by Choi [10] and Herrera [47], that the more popular a topic of an EEK community, the more likely it is selected and evolved in follow-on engineering activities.
To construct the family-tree model to reveal the CAD EEK evolution more clearly, LTSim and CTSim of any two clusters in

Fig. 4. CAD EEK network in Time Window W0 (1998–2000).

Table 5
The statistics of EEK clusters in each time window.

Time window $W_n$ | Time span | Number of EEKs | Number of qualified clusters | Number of clustered EEKs | Clustering proportion
$W_0$ | 1998–2000 | 758 | 19 | 657 | 86.68%
$W_1$ | 1999–2001 | 1425 | 30 | 1241 | 87.09%
$W_2$ | 2000–2002 | 2003 | 41 | 1804 | 90.06%
$W_3$ | 2001–2003 | 2506 | 46 | 2288 | 91.30%
$W_4$ | 2002–2004 | 2551 | 50 | 2320 | 90.94%
$W_5$ | 2003–2005 | 2714 | 51 | 2395 | 88.25%
$W_6$ | 2004–2006 | 2915 | 52 | 2593 | 88.95%
$W_7$ | 2005–2007 | 3241 | 55 | 2927 | 90.31%
$W_8$ | 2006–2008 | 3582 | 57 | 3241 | 90.48%
$W_9$ | 2007–2009 | 3726 | 60 | 3393 | 91.06%
$W_{10}$ | 2008–2010 | 3974 | 63 | 3637 | 91.52%
$W_{11}$ | 2009–2011 | 4077 | 63 | 3690 | 90.51%
$W_{12}$ | 2010–2012 | 4101 | 64 | 3719 | 90.69%
$W_{13}$ | 2011–2013 | 4144 | 65 | 3696 | 89.19%
$W_{14}$ | 2012–2014 | 4084 | 64 | 3668 | 89.81%

neighbor time windows were computed. Referring to Cobo et al. [5], $CTSim_{threshold}$ was set to 0.5. $LTSim_{threshold}$ was empirically set to 0.3, according to the distribution of similarity values. Finally, the family-tree EEK evolution model was constructed. It is presented as a data table in Table 6 and as a visualized graph in Fig. 6.
In Fig. 6, each node represents an EEK cluster, and the arrows between the nodes indicate the courses of EEK evolution. Although the courses in the evolution of CAD EEK were complicated, the family-tree model offered plenty of assistance in analyzing them. Taking the evolution course $A_{0,1} \to A_{1,1} \to A_{2,2} \to A_{3,11}$ as an example, all the clusters in this course belonged to the subdomain of Management of Drawing Set, identified by the inherited highly-weighted noun phrases in their latent topics such as drawing set, license manager, database, and signature. These noun phrases were also regarded as the key concepts in this subdomain. Considering the size of the clusters, this subdomain stayed prosperous from $W_0$ to $W_3$ (namely 1998–2003), and a large number of related EEKs were generated and accumulated. Under this condition, some gradual knowledge mutations were likely to emerge in this course. Indicated by some newly emerged concepts in the latent topic of $A_{3,11}$, such as company, provider, collaboration and copyright, and the more weighted concepts, for example, dwf file (a specific file format for sharing AutoCAD

Fig. 5. Clusters in the EEK network in Time Window W0.

drawings online), publisher and access, an EEK evolution direction of Working and Sharing the Drawing Set over the Internet was implied by the evolution course from $A_{2,2}$ to $A_{3,11}$.

Table 6
Data table of the family-tree model of EEK evolution.

Ancestor | Descendant | CTSim | LTSim | RelationID
$A_{0,1}$ | $A_{1,1}$ | 0.8490 | 0.4373 | 0
$A_{0,10}$ | $A_{1,15}$ | 0.8333 | 0.4467 | 1
$A_{0,18}$ | $A_{1,16}$ | 0.5333 | 0.3651 | 2
$A_{0,18}$ | $A_{1,2}$ | 0.5455 | 0.3926 | 3
... | ... | ... | ... | ...
$A_{1,1}$ | $A_{2,2}$ | 0.8923 | 0.5524 | 9
$A_{1,10}$ | $A_{2,11}$ | 0.5901 | 0.3988 | 10
$A_{1,10}$ | $A_{2,15}$ | 0.5094 | 0.3172 | 11
$A_{1,11}$ | $A_{2,13}$ | 0.9500 | 0.6005 | 12
$A_{1,12}$ | $A_{2,12}$ | 0.5405 | 0.3648 | 13
$A_{1,18}$ | $A_{2,22}$ | 1.0000 | 0.6711 | 14
... | ... | ... | ... | ...

In addition, the split or merged courses revealed some drastic evolution in the family-tree model. For example, two clusters, $A_{2,6}$ and $A_{2,25}$, merged into a larger cluster $A_{3,16}$ in the following time window. Two separate groups of key concepts were integrated into a new latent topic. Through analyzing the latent topics of these three clusters, this evolution was found to have occurred because of an enhanced tool (dynamic blocks) in a new version of AutoCAD software. This tool was able to achieve two plotting functions together (drawing a block and modifying dimensions in the block), thereby constructing a bridge between two EEK communities. With the family-tree model, more interesting evolution phenomena can be discovered, and the motivations of the evolution can be inferred through further inspection and analysis of the evolution courses in the model.

7.2. Comparing family-tree model with related modeling approaches

7.2.1. Comparative evaluation

We directly compare the proposed method with the metrology method introduced in Section 2.3.1 to demonstrate its effectiveness and progressiveness; the bibliometric analysis proposed by Cobo et al. [5] is chosen in this case. As we mainly focused on the evolution of the conceptual sub-domains, the analyses of authors and nationalities are neglected and only the thematic evolution model was established. As with academic articles, in the setup work for bibliometric analysis, we used the 5 noun-phrases scoring the highest TF-IDF values as the keywords for each EEK document, and two documents are regarded as citation-related if they share at

Fig. 6. The family-tree model for evolution of CAD EEKs (an illustrative part).

least 2 identical keywords. As detailed in [5], the method comprises four stages: (1) applying co-word analysis and the centers algorithm to cluster keywords into themes; (2) calculating centrality and density to map a two-dimensional strategic diagram and classify the themes; (3) identifying the thematic areas (sets of themes that have evolved over several time periods) and discovering the evolution courses; (4) evaluating the performance. The result achieved with bibliometric analysis under the same division of time windows is presented in Fig. 7. Bold arrows indicate a conceptual nexus, while thin arrows indicate a partial conceptual nexus. The word on each node is the name of the theme it represents, determined by the most central keyword in the theme.
The Delphi method is a helpful tool for acquiring a consensus-based opinion from a panel of experts. 20 randomly selected inheritance relations in the family-tree model (Fig. 6) and the bibliometric analysis (Fig. 7), as well as the whole structure of the two models, were sent to 18 domain experts to evaluate their validity. To facilitate understanding, the domain experts were provided with the latent topics of the involved clusters, the engineering problems of all EEKs in the corresponding clusters, and the quantified inheritance relationships. The model performance was then assessed with the questionnaire adapted from Chen et al. [19], presented in Table 7. Table 8 presents the assessment of the family-tree model and the bibliometric analysis.
In the investigation with CAD experts, approximately 90% of them rated the family-tree model as "very useful" or "useful" in the evaluation, and were satisfied with the proposed long-term knowledge evolution modeling method in managing EEK, comprehending the evolution history and forecasting future trends of engineering domains, and discovering the EEK evolution courses in less time. Therefore, the proposed method performed well in modeling the long-term evolution of EEK. However, the performance of bibliometric analysis is significantly inferior on all the questions (t-test, p-value < 0.05), and fewer than 60% of the users are satisfied with the bibliometric analysis for detecting the long-term evolution of CAD EEKs.
Considering the respondents' feedback, three reasons contribute to the deficiency of bibliometric analysis. First, since the authors of EEK documents often use ordinary natural language to raise questions and offer solutions, EEK documents contain a large amount of ambiguous and personal words and phrases, which reduces the efficiency and accuracy of modeling EEK communities and discovering the evolution courses. By contrast, in the modeling method proposed in this paper, such words and phrases are filtered out or unified in the phase of EEK formalization. Second, bibliometric analysis uses the centers algorithm to organize the keywords, so that the keywords in the problems, contexts and solutions of an EEK document often fall into different clusters. In this situation, each cluster is just a group of strongly interrelated keywords, and thus fails to represent a meaningful sort of EEK. Third, as the EEK mainly focuses on solving an engineering problem, specific concepts are often chosen

Fig. 7. The bibliometric analysis result for evolution of CAD EEKs (an illustrative part).

Table 7
Questionnaire for assessing the performance of the CAD EEK evolution model.

1. The degree to which the Knowledge Evolution Model helps you in organizing CAD knowledge.
A. Very useful B. Useful C. No comment D. Useless E. Very useless
2. The degree to which the Knowledge Evolution Model helps you in understanding the changes of CAD field.
A. Very useful B. Useful C. No comment D. Useless E. Very useless
3. The degree to which the Knowledge Evolution Model helps you in forecasting the future trends of CAD field.
A. Very useful B. Useful C. No comment D. Useless E. Very useless
4. The degree to which the Knowledge Evolution Model helps you in discovering accurate evolution courses of CAD knowledge.
A. Very useful B. Useful C. No comment D. Useless E. Very useless
5. The degree to which the Knowledge Evolution Model helps you in saving time for analyzing the evolution of CAD knowledge.
A. Very useful B. Useful C. No comment D. Useless E. Very useless

(for example, CAD commands, jargon or slang) and serve as key concepts in EEKs. Such concepts can hardly represent a sub-domain, and consequently lead to a higher density of interrelationships among conceptual groups, as shown in Fig. 7, which makes the evolution courses harder to understand. In this paper, these concepts are related to or replaced by more ordinary concepts in LDA topic extraction.
Besides, the family-tree model outperforms the co-word-based metrology methods [5–13] because it fully considers the in- and inter-characteristics of documents, whereas metrology methods consider keywords, authors and citation relations separately in the analysis. In contrast, seven types of targeted attributes are extracted to formalize EEK and then used to calculate the overall similarities in the family-tree model. Such attributes portray the properties of EEK more effectively and reveal the relations among them in depth. This is also the fundamental advantage of the proposed method for modeling the evolution of EEK.
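
The setup used for the bibliometric baseline in this section (a handful of top TF-IDF noun phrases as keywords per EEK document, and a citation relation whenever two documents share at least two keywords) can be illustrated with a minimal sketch. The function names, the document ids and the keyword lists below are hypothetical and only serve the illustration; the TF-IDF extraction itself is assumed to have been done beforehand.

```python
from itertools import combinations


def citation_related(keywords_a, keywords_b, min_shared=2):
    """True when two documents share at least `min_shared` keywords."""
    return len(set(keywords_a) & set(keywords_b)) >= min_shared


def citation_pairs(doc_keywords, min_shared=2):
    """List all citation-related document pairs.

    doc_keywords maps a document id to its keywords (in the setup described
    above, the 5 noun phrases with the highest TF-IDF values).
    """
    return [
        (a, b)
        for a, b in combinations(sorted(doc_keywords), 2)
        if citation_related(doc_keywords[a], doc_keywords[b], min_shared)
    ]


# Hypothetical EEK documents; the keywords are invented for illustration.
docs = {
    "d1": ["dynamic block", "dimension", "layer", "viewport", "plot style"],
    "d2": ["dynamic block", "dimension", "hatch", "xref", "layout"],
    "d3": ["xref", "sheet set", "publish", "layout", "plot style"],
}
pairs = citation_pairs(docs)
print(pairs)
```

Here d1 and d3 share only one keyword and are therefore not linked, which shows how sensitive the resulting pseudo-citation network is to the exact wording of the documents.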

Table 8
Assessment of the family-tree model and bibliometric analysis.

Methods Questions Very useful(5) Useful(4) No comment(3) Useless(2) Very useless(1) Total Satisfaction
Family-tree model Q1 10 7 1 0 0 18 94.4%
Q2 7 9 1 1 0 18 88.8%
Q3 6 9 1 2 0 18 83.3%
Q4 7 7 2 2 0 18 77.8%
Q5 11 7 0 0 0 18 100.0%
Average 8.2 7.8 1.0 1.0 0 18 88.9%
Bibliometric analysis Q1 4 7 5 2 0 18 61.1%
Q2 2 8 6 1 1 18 55.6%
Q3 2 6 5 3 2 18 44.4%
Q4 1 6 7 3 1 18 38.9%
Q5 5 9 4 0 0 18 77.8%
Average 2.8 7.2 5.4 1.8 0.8 18 55.6%
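
The Satisfaction column of Table 8 appears to be the share of respondents who answered "Very useful" or "Useful"; this reading is an inference from the printed counts, not a definition stated in the text. Under that assumption, the figures can be recomputed with the short sketch below. The mean score on the 5-to-1 scale of the column headers is an extra illustrative statistic that the table does not report.

```python
SCORES = (5, 4, 3, 2, 1)  # very useful ... very useless, as in the column headers


def satisfaction(counts):
    """Percentage of respondents answering 'very useful' or 'useful'."""
    return 100.0 * (counts[0] + counts[1]) / sum(counts)


def mean_score(counts):
    """Average rating on the 5-to-1 scale (not reported in Table 8)."""
    return sum(s * c for s, c in zip(SCORES, counts)) / sum(counts)


# Response counts copied from Table 8 (18 respondents per question).
family_tree = {
    "Q1": (10, 7, 1, 0, 0),
    "Q2": (7, 9, 1, 1, 0),
    "Q3": (6, 9, 1, 2, 0),
    "Q4": (7, 7, 2, 2, 0),
    "Q5": (11, 7, 0, 0, 0),
}
bibliometric = {
    "Q1": (4, 7, 5, 2, 0),
    "Q2": (2, 8, 6, 1, 1),
    "Q3": (2, 6, 5, 3, 2),
    "Q4": (1, 6, 7, 3, 1),
    "Q5": (5, 9, 4, 0, 0),
}

for name, table in (("family-tree", family_tree), ("bibliometric", bibliometric)):
    for question, counts in table.items():
        print(f"{name} {question}: {satisfaction(counts):.1f}% satisfied, "
              f"mean score {mean_score(counts):.2f}")
```

For Q2 of the family-tree model the ratio is 16/18, which rounds to 88.9%; the 88.8% printed in the table is consistent with truncation rather than rounding.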

7.2.2. Qualitative comparison
Since different modeling methods depend on their respectively encoded empirical knowledge instead of a shared dataset, it is impossible for us to compare the proposed method with former research works quantitatively. To make a qualitative comparison instead, we use the Zachman Framework [61] to assess the capability coverage of former models and the family-tree model in this section. A complete Zachman Framework can be described by six primitive models, namely, Data Model (What), Process Model (How), Network Model (Where), Workflow Model (Who), Dynamic Model (When), and Motivation Model (Why). Detailed representations and explanations of these models are given in [61]. Providing a complete description of the system, the Zachman Framework is widely used to qualitatively compare models for elaborating scenarios, acquiring engineering knowledge and modeling information systems [14,62,63].
Assessing modeling capability with the Zachman Framework, Table 9 shows why the family-tree model covers all facets. The comparisons between the proposed method and some former modeling methods are then presented in Table 10. These methods include using complex network analysis (CNA) on the co-word network [10–12], using CNA on some other kinds of networks [17,18,23], and modeling approaches based on evolution modes [19–22,24].

Table 9
The modeling capability of the family-tree model against Zachman Framework.

Generic Model            Modeling capability of the proposed method
Data Model (What)        Supported by the representing of EEKs and their state transitions
Process Model (How)      Supported by the three-phases-framework for knowledge evolution modeling
Network Model (Where)    Supported by the EEKs' and clusters' numerical and semantic interrelationships
Workflow Model (Who)     Supported by the content information of EEKs and clusters
Dynamic Model (When)     Supported by the sequence of time windows
Motivation Model (Why)   Supported by the goals and goal decomposition of each phase

Because the two kinds of CNA-based approaches concentrate only on the topological characteristics of the networks and do not consider the semantic associations among knowledge nodes, they fail to provide a comprehensive nexus for the Network Model. Besides, the simulation models built by using CNA on some other kinds of networks also fail to cover the Data Model, since they neglect the innate semantic and logical characteristics of the knowledge and lose some potential information in knowledge evolution. In spite of these potential drawbacks, the evolving tools of CNA-based approaches are of significant assistance in analyzing and visualizing very large collections of knowledge. They can be used for optimizing the precision and efficiency of the family-tree model in the future. Because the mode-based approaches are only able to detect the variation modes of the topics or key concepts over a short time, they lack support for the Dynamic Model. Moreover, they perform unsatisfactorily in portraying knowledge evolution at the level of an engineering domain or subdomain, making them unable to cover the facet of the Motivation Model. In fact, mode-based approaches focus on processing the short-term evolution of knowledge (approximately several weeks or a few months), while the family-tree model intends to reveal the evolution courses over a long period (one or more decades). The two address different cases in studying knowledge evolution.
In conclusion, in terms of qualitative comparison, the advantages of the proposed method lie in that it provides more detailed information and more comprehensive relationships, as well as the long-term and domain-level analyzing capability.

7.3. Discussion of the family-tree model

Having operated successfully on plenty of EEKs in the CAD domain, the family-tree model performs well in modeling long-term EEK evolution. To expand the applications of the

Table 10
Comparing proposed method with other approaches against the Zachman Framework.

Representative works     This study             Choi [10]               Lee [18]               O'Leary [21]
Modeling methods         Family-tree modeling   Co-word Network + CNA   Other networks + CNA   Mode-based Approaches
Data Model (What)        ✓                      ✓                       ✗                      ✓
Process Model (How)      ✓                      ✓                       ✓                      ✓
Network Model (Where)    ✓                      ✗                       ✗                      ✓
Workflow Model (Who)     ✓                      ✓                       ✓                      ✓
Dynamic Model (When)     ✓                      ✓                       ✓                      ✗
Motivation Model (Why)   ✓                      ✓                       ✓                      ✗

✓: supported, ✗: not supported.



proposed method to more engineering domains, this section discusses the five main factors that influence the performance of the family-tree model.

– Parameters in dividing time windows

In the process of time window dividing, two parameters, namely the time window length t and the moving distance Δt, should first be determined according to the analyzed engineering domain. Fig. 8 reports two structures of the family-tree model of CAD EEK under different values of t and Δt.
Comparing the structure of the left family-tree model with the result in Fig. 6, a larger moving distance Δt eased the workload of the analysis, but weakened the connections among the EEK clusters. As more mutations accumulated within the moving distance, smaller similarities were computed between neighboring clusters. Therefore, when Δt was larger, the evolution courses in the family-tree model were short and incoherent, making them hard to comprehend. On the contrary, a larger time window length t enhanced the connections and resulted in plenty of long courses. However, more courses were parallel, as a comparison of the right structure in Fig. 8 with the structure in Fig. 6 shows. Fewer merged and split courses were also discovered with a larger t. As mentioned in Section 7.1.2, such courses indicate the fusion or differentiation patterns of EEK communities, revealing drastic knowledge evolution. Since a larger t increases the number of mutual EEKs between two clusters in neighboring time windows, it causes a significantly stronger inheritance relationship between these two clusters. The differing part of the two clusters, which indicates the evolution, is neglected in this process. Hence, some drastic transitions would be disregarded and the effectiveness of evolution modeling could be reduced.
For applications of the family-tree model in fast-developing engineering domains, such as chip manufacturing or vehicle R&D, smaller Δt and t are recommended. According to our experience in adjusting these two parameters, the value of Δt is preferably set to the cycle of short-term mutations of EEK, for example, the average duration of common engineering projects, while the value of t is set to 3–5 times Δt, in order to account for the technical paradigm shifts occurring in the field. With these parameter values, EEK evolution would be properly modeled with abundant details in the evolution courses but less workload in evolution analysis.

– Thresholds in clustering process

Since clustering is an unsupervised process, the setting of its thresholds, namely MI in the chameleon algorithm and |C|min for filtering out small clusters, affects the final result. Table 11 reports the number of qualified EEK clusters and the clustering proportions for different values of MI and |C|min in clustering the network shown in Fig. 4 (CAD EEK network for time window W0).
These thresholds determine the granularity and inclusiveness of the clusters: a higher MI in the chameleon algorithm resulted in a larger number of qualified clusters N (since small clusters were harder to merge into a large one), which enabled the latent topics of the clusters to focus on more specific kinds of engineering problems; while a higher |C|min filtered out more small clusters that contained few occasional EEK members and lowered the clustering proportion CP, directing the evolution model to concentrate on more popular topics. To achieve a balance between specific and general, minority and majority, the two thresholds should be set after a deliberate adjustment.
Empirically, according to the power-law distribution of EEK shown in Fig. 5, |C|min is set to retain about 80–90% of the clustered nodes in the original networks. Under this situation, it is ideal to get a number of clusters that is 5–10 times greater than the number of divided subdomains in the field. By applying this number of clusters, each

Fig. 8. Structures of family-tree model under two pairs of parameters.
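
The roles of the window length t and the moving distance Δt discussed in this section can be made concrete with a small sliding-window sketch. It assumes that window k spans [start + k·Δt, start + k·Δt + t) and that only complete windows are kept; the paper does not spell out these boundary conventions here, and the function name and the numeric values (months) are hypothetical.

```python
def divide_time_windows(start, end, length, step):
    """Return (window_start, window_end) pairs W0, W1, ... inside [start, end].

    length corresponds to the window length t and step to the moving
    distance Δt. Neighboring windows overlap by (length - step) when
    step < length.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    windows = []
    left = start
    while left + length <= end:  # keep only complete windows (assumption)
        windows.append((left, left + length))
        left += step
    return windows


# Hypothetical setting: a 60-month focus period, Δt = 6 months, t = 3 × Δt.
windows = divide_time_windows(start=0, end=60, length=18, step=6)
for k, (lo, hi) in enumerate(windows):
    print(f"W{k}: months {lo} to {hi}")
```

With t = 3·Δt, each window shares two thirds of its span with its successor, which is what produces the mutual EEKs between neighboring clusters; enlarging Δt shrinks that overlap, in line with the weaker connections described above.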

Table 11
Number of qualified clusters N and clustering proportions CP under different MI and |C|min.

MI = 0.1 MI = 0.2 MI = 0.3 MI = 0.4


|C|min = 2 N = 35, CP = 98.42% N = 38, CP = 95.25% N = 52, CP = 93.80% N = 53, CP = 93.67%
|C|min = 5 N = 10, CP = 90.11% N = 19, CP = 86.68% N = 22, CP = 77.97% N = 21, CP = 77.18%
|C|min = 10 N = 6, CP = 86.28% N = 8, CP = 75.07% N = 15, CP = 70.32% N = 16, CP = 71.77%
|C|min = 15 N = 4, CP = 82.85% N = 6, CP = 71.77% N = 9, CP = 60.16% N = 9, CP = 59.63%

(Values in bold were selected in Section 7.1.2, as shown in table 5; CP = Number of clustered EEKs/Total EEKs in the network)
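
The effect of |C|min on N and CP in Table 11 can be sketched as follows, using the definition CP = number of clustered EEKs / total EEKs given in the table note. The sketch assumes that a cluster qualifies when its size is at least |C|min (the text does not state whether the bound is inclusive) and takes the clusters as already produced by the clustering step; the cluster contents are invented.

```python
def filter_clusters(clusters, min_size):
    """Keep clusters with at least `min_size` members (the |C|min threshold)."""
    return [cluster for cluster in clusters if len(cluster) >= min_size]


def clustering_proportion(qualified_clusters, total_eeks):
    """CP = number of EEKs in qualified clusters / total EEKs in the network."""
    return sum(len(cluster) for cluster in qualified_clusters) / total_eeks


# Hypothetical clustering result for one time window: 25 EEKs in 5 clusters.
clusters = [
    set(range(0, 12)),   # 12 EEKs
    set(range(12, 18)),  # 6 EEKs
    set(range(18, 22)),  # 4 EEKs
    set(range(22, 24)),  # 2 EEKs
    {24},                # 1 EEK
]
total = sum(len(c) for c in clusters)

for min_size in (2, 5, 10):
    qualified = filter_clusters(clusters, min_size)
    cp = clustering_proportion(qualified, total)
    print(f"|C|min = {min_size}: N = {len(qualified)}, CP = {cp:.2%}")
```

Raising the threshold lowers both N and CP in this toy case, the same direction of change that each column of Table 11 shows.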

cluster can represent a proper specific EEK community in the domain as well as reveal its activities in the evolution, such as fusion or differentiation. Hence, MI is highly recommended to be set so as to achieve this number of clusters.

– Thresholds in constructing inheritance relations

Two thresholds in Section 6.2, CTSimthreshold and LTSimthreshold, determine the construction of inheritance relations in the family-tree model. To evaluate the filtered models after changing these two thresholds, questions 2 and 4 (comprehension of the evolution and accuracy of evolution courses) in Table 7 were used to judge the inheritance relations. As described in Section 7.2.1, for each model, its structure and 20 randomly selected relations were sent to 18 domain experts to evaluate the validity. Additionally, the average number of valid inheritance relations among the 14 pairs of neighboring time windows was counted, which demonstrates the richness of information in the evolution model. The results are presented in Table 12.

Table 12
Average counts of valid inheritance relations and model evaluation.

Values of two thresholds (CTSimthreshold, LTSimthreshold)   Average counts of relations   Satisfaction for Q2 (Comprehension)   Satisfaction for Q4 (Accuracy)
(0.75, 0.5)                                                  4.79                          72.2%                                 83.3%
(0.6, 0.4)                                                   10.14                         88.8%                                 83.3%
(0.5, 0.3)*                                                  21.36                         88.8%                                 77.8%
(0.35, 0.2)                                                  59.29                         61.1%                                 55.6%
(0.15, 0.1)                                                  163.14                        38.9%                                 22.2%

(Values marked with * were selected in Section 7.1.2, as shown in Fig. 6)

In fact, there is a potential positive correlation between CTSim and LTSim: the more pieces of EEK two clusters share, the more similarly their latent topics are distributed over the engineering concepts. With lower thresholds, more inheritance relations are tolerated and extracted in the family-tree model, resulting in a complex model with plenty of inaccurate evolution courses. As the main courses became hard to detect, users' satisfaction with the evolution model was negatively influenced.
However, excessively high thresholds may also deteriorate the capability of evolution modeling and fail to lead to a desired result. Fig. 9 illustrates the family-tree models of the first two cases in Table 12 (identically annotated nodes in Figs. 6 and 9 are the same EEK clusters). Comparing the two model structures with the result in Fig. 6 (the third case in Table 12), larger values of the two thresholds filtered the inheritance relations more strictly, and correspondingly simplified the family-tree model. The retained relations in the model could be more persuasive: as the evolution course A1,1 → A2,2 → A3,11 appears in all three structures, it firmly reveals the evolution in Management of Drawing Set. Nevertheless, some potential information embedded in other evolution courses was lost. Especially for the merged and split courses, according to the computation of CTSim and LTSim (see Eqs. (9) and (10)), it was hard to achieve high similarities on both inheritance relations simultaneously. As a consequence, the merged and split courses were likely to be reduced to an ordinary single course or even to disappear from the model. For example, as shown in Fig. 6 and mentioned in Section 7.1.2, A2,6 (drawing a block) and A2,25 (modifying dimensions in the block) merged into a larger cluster A3,16, while these two merged courses disappeared in Fig. 9 and the evolution caused by dynamic block was hence neglected.
From the above, in order to guarantee the effectiveness of the family-tree model, the thresholds in constructing inheritance relations should be fine-tuned to reach a balance between filtering many relations and offering enough information about EEK evolution courses. The adjustment of the two thresholds should consider the numbers and granularity of EEK clusters, and the distribution of similarity values. Inspections by domain experts are also recommended, to double-check the stable evolution courses in the different models obtained, and to comprehensively evaluate the potential information embedded in the filtered courses when changing thresholds in the system.

– Users' experiences and understandings

According to the feedback of respondents (see Table 8), almost all the respondents agree that the proposed modeling method can organize empirical knowledge in the engineering field in less time. However, the satisfaction for the

Fig. 9. Structures of family-tree model under different values of CTSimthreshold and LTSimthreshold.
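
How the two thresholds prune the inheritance relations can be sketched with the similarity pairs listed in the inheritance-relation table of Section 7.1.2. Two assumptions are made here and are not confirmed by the visible text: that the two numeric columns of that table are CTSim and LTSim in this order, and that a relation is kept when both values reach their thresholds (inclusive). The function name is hypothetical, and only the rows printed in the table are used, so the counts below are not the averages of Table 12.

```python
# (parent cluster, child cluster, CTSim, LTSim); column meaning is assumed.
relations = [
    ("A0,10", "A1,15", 0.8333, 0.4467),
    ("A0,18", "A1,16", 0.5333, 0.3651),
    ("A0,18", "A1,2", 0.5455, 0.3926),
    ("A1,1", "A2,2", 0.8923, 0.5524),
    ("A1,10", "A2,11", 0.5901, 0.3988),
    ("A1,10", "A2,15", 0.5094, 0.3172),
    ("A1,11", "A2,13", 0.9500, 0.6005),
    ("A1,12", "A2,12", 0.5405, 0.3648),
    ("A1,18", "A2,22", 1.0000, 0.6711),
]


def filter_relations(relations, ct_threshold, lt_threshold):
    """Keep relations whose CTSim and LTSim both reach their thresholds."""
    return [
        rel for rel in relations
        if rel[2] >= ct_threshold and rel[3] >= lt_threshold
    ]


for thresholds in ((0.75, 0.5), (0.6, 0.4), (0.5, 0.3)):
    kept = filter_relations(relations, *thresholds)
    print(thresholds, len(kept), [f"{p} -> {c}" for p, c, _, _ in kept])
```

Under these assumptions the split course from A1,10 (to A2,11 and A2,15) survives only at (0.5, 0.3), while A1,1 → A2,2 is retained at all three settings, which mirrors the behavior described for Figs. 6 and 9.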

correctness of the evolution courses, as well as for the ease of forecasting and cognizing the knowledge evolution, is slightly lower. A probable reason for this is the diverse levels of users' experience in engineering fields and their understandings of the meaning of latent topics. For example, as shown in Table 5, experienced engineers would swiftly and correctly recognize the theme of Parametric Plotting by matching some special concepts in the latent topic (constraint, parameter manager, drawing and dimconstraint) with their own engineering experience. However, these concepts might not trigger such a response in the minds of novice engineers. The "tacit" implications of latent topics may not be easily and completely recognized by these respondents, thus leading to some biased judgments of the obtained results.
In future studies, the latent topics could be represented with more readable texts for better understanding, to create a more user-friendly family-tree model for novice engineers.

– Quality and quantity of EEKs

The feasibility and effectiveness of the family-tree model also depend on two external factors: the quality and quantity of available domain EEKs. High quality of EEK means that the contexts in the EEK are detailed and clear, and that the empirical solution matches the problem well under these contexts. With such high-quality EEKs, the relations among EEKs will be correctly extracted and evaluated. Besides, the amount of EEKs also matters in the proposed method. Only by clustering a sufficient number of EEK nodes to form complete EEK communities can the evolution courses in multiple time windows be portrayed with high validity and persuasiveness.
Choosing multiple sources to collect these EEKs and setting proper limitations in preliminary filtering are essential to ensure these two factors. Taking the CAD EEK used in this paper for instance, more than 35,000 CAD Q&A threads were first downloaded. However, only the 18,004 threads that contain at least one round of Q&A interaction in solving encountered CAD problems were used in the analysis. Besides, before the beginning of the focus period, there were already sufficient numbers of CAD users and problem-solving threads in the CAD forums chosen in this paper. These collections and considerations guarantee the quality and quantity of CAD EEKs, as well as the performance of the evolution model.

8. Conclusion and future works

Understanding how new EEKs evolve from the original ones is a challenging task in managing such rapidly updating knowledge. With the long-term EEK evolution courses, knowledge-intensive engineers could enhance their creativity in accomplishing the encountered engineering tasks and save the time spent on adapting to a new paradigm.
In this paper, novel work was contributed to the research of long-term EEK evolution in three aspects: firstly, in order to consider the semantic meaning of EEK and formalize it for further processing, an NLP-based formalization for extracting EEK from problem-solving documents was designed; then, aiming at showing the whole structure of an engineering domain, a method for constructing EEK networks was proposed; finally, to directly illustrate the evolution courses in the domain, a family-tree model for revealing the evolution courses among EEK communities was constructed. Based on natural language processing techniques, fuzzy neural networks, a clustering algorithm and a latent topic model, the proposed method could be implemented on a huge collection of problem-solving documents to illustrate the long-term and domain-level EEK evolution with detailed semantic meanings. The modeling experiment held in the CAD field and the comparison with former approaches also verified that the family-tree model could provide detailed information and comprehensive relations. These relations and information will support knowledge workers to manage EEK in a more effective way, cognize the development of the engineering domain, and forecast the future trends with less cost and higher efficiency.
Although this paper achieves an explicit improvement in long-term EEK evolution modeling, there are still three aspects in our methods that could be further improved, in order to obtain more persuasive and insightful conclusions in future research. Firstly, an auto-learnt domain dictionary or a professional ontology will be constructed and utilized for mining deeper connections among EEKs, better understanding the meanings of the concepts, and more precisely evaluating their semantic similarities. Secondly, although a lot of EEKs are generated in the networks, about 10% of the EEKs are filtered out by |C|min during the clustering process, and these may contain some potential information that describes the germination of brand-new knowledge. It is undeniable that the mechanism of emergence and development of brand-new knowledge is as important as the maturing of existing groups of knowledge. Thirdly, some frequently recurring patterns in the family-tree model, such as the fusion pattern A2,6 + A2,25 → A3,16 and the differentiation pattern A1,10 → A2,15 + A2,11 shown in Fig. 6, imply some interesting patterns of knowledge evolution. In the future, we will put great effort into designing rules for searching them and finding the corresponding explanations, thereby achieving a further cognition of the motivations of the long-term evolution of EEK.

Acknowledgements

This research was supported by National Natural Science Foundation of China (No. 70971085, 71271133, 71671113), Shanghai Science and Technology Commission (No. 13111104500), and Shanghai Municipal Education Commission (13ZZ012). The authors would like to thank and express great gratitude to all the editors and reviewers for their fair, encouraging and constructive advice.

References

[1] E. Žemaitis, Knowledge management in open innovation paradigm context: high tech sector perspective, Proc. – Soc. Behav. Sci. 110 (2014) 164–173.
[2] G. Dosi, M. Faillo, L. Marengo, Organizational capabilities, patterns of knowledge accumulation and governance structures in business firms: an introduction, Org. Stud. 29 (2008) 1165–1185.
[3] E. Garfield, Citation indexes for science, Science 122 (1955) 108–111.
[4] M. Callon, J. Courtial, W.A. Turner, S. Bauin, From translations to problematic networks: an introduction to co-word analysis, Soc. Sci. Inf. 22 (1983) 191–235.
[5] M.J. Cobo, M.A. Martínez, M. Gutiérrez-Salcedo, H. Fujita, E. Herrera-Viedma, 25 years at knowledge-based systems: a bibliometric analysis, Knowl.-Based Syst. 80 (2015) 3–13.
[6] M. Angeles, Martinez, M. Jesus Cobo, M. Herrera, E. Herrera-Viedma, Analyzing the scientific evolution of social work using science mapping, Res. Soc. Work Pract. 25 (2015) 257–277.
[7] C. Yu, C. Davis, G.P.J. Dijkema, Understanding the evolution of industrial symbiosis research a bibliometric and network analysis (1997–2012), J. Ind. Ecol. 18 (2014) 280–293.
[8] A. Avila-Robinson, K. Miyazaki, Evolutionary paths of change of emerging nanotechnological innovation systems: The case of ZnO nanostructures, Scientometrics 95 (2013) 829–849.
[9] N. Gerdsri, A. Kongthon, R.S. Vatananan, Mapping the knowledge evolution and professional network in the field of technology roadmapping: a bibliometric analysis, Technol. Anal. Strategic Manage. 25 (2013) 403–422.
[10] J. Choi, S. Yi, K.C. Lee, Analysis of keyword networks in MIS research and implications for predicting knowledge evolution, Inf. Manage. 48 (2011) 371–381.
[11] M. Meyer, I. Lorscheid, K.G. Troitzsch, The development of social simulation as reflected in the first ten years of JASSS: a citation and Co-Citation analysis, Jasss-J. Artif. Soc. Soc. Simul. 12 (2009) A224–A243.
[12] E.A. Whitley, R.D. Galliers, An alternative perspective on citation classics: evidence from the first 10 years of the European conference on information systems, Inf. Manage. 44 (2007) 441–455.

[13] C. Robert, C.S. Wilson, J. Gaudy, C. Arreto, The evolution of the sleep science literature over 30 years: a bibliometric analysis, Scientometrics 73 (2007) 231–256.
[14] L. Liu, Z. Jiang, B. Song, A novel two-stage method for acquiring engineering-oriented empirical tacit knowledge, Int. J. Prod. Res. (2014) 1–22.
[15] Y. Chen, Development of a method for ontology-based empirical knowledge representation and reasoning, Decis. Support Syst. 50 (2010) 1–20.
[16] B. Kump, J. Moskaliuk, S. Dennerlein, T. Ley, Tracing knowledge co-evolution in a realistic course setting: a wiki-based field experiment, Comput. Educ. 69 (2013) 60–70.
[17] J. Liu, G. Yang, Z. Hu, A knowledge generation model via the hypernetwork, PLoS ONE 9 (2014) e89746.
[18] K.M. Lee, K.M. Lee, Agent-based knowledge evolution management and fuzzy rule-based evolution detection in bayesian networks, in: 2013 International Conference on Fuzzy Theory and its Applications (Ifuzzy 2013), 2013, pp. 146–149.
[19] Y. Chen, Y. Chen, Knowledge evolution course discovery in a professional virtual community, Knowl.-Based Syst. 33 (2012) 1–28.
[20] D. Chen, T. Liang, Knowledge evolution strategies and organizational performance: a strategic fit analysis, Electron. Commer. Res. Appl. 10 (2011) 75–84.
[21] D.E. O'Leary, Empirical analysis of the evolution of a taxonomy for best practices, Decis. Support Syst. 43 (2007) 1650–1663.
[22] Y. Ma, B. Jin, Y. Feng, Dynamic evolutions based on ontologies, Knowl.-Based Syst. 20 (2007) 98–109.
[23] A. Pyka, N. Gilbert, P. Ahrweiler, Simulating knowledge-generation and distribution processes in innovation collaborations and networks, Cybernet. Syst. 38 (2007) 667–693.
[24] F. Barthelmé, J. Ermine, C. Rosenthal-Sabroux, An architecture for knowledge evolution in organisations, Eur. J. Oper. Res. 109 (1998) 414–427.
[25] F. Chan, Application of a hybrid case-based reasoning approach in electroplating industry, Expert Syst. Appl. 29 (2005) 121–130.
[26] M.A. D'Eredita, C. Barreto, How does tacit knowledge proliferate? An episode-based perspective, Org. Stud. 27 (2006) 1821–1841.
[27] L. Argote, E. Miron-Spektor, Organizational learning: From experience to knowledge, Organ. Sci. 22 (2011) 1123–1137.
[28] P.P. Ruiz, B.K. Foguem, B. Grabot, Generating knowledge in maintenance from Experience Feedback, Knowl.-Based Syst. 68 (2014) 4–20.
[29] B.K. Foguem, T. Coudert, C. Beler, L. Geneste, Knowledge formalization in experience feedback processes: an ontology-based approach, Comput. Ind. 59 (2008) 694–710.
[30] Y. Zhang, X. Luo, J. Li, J.J. Buis, A semantic representation model for design rationale of products, Adv. Eng. Inform. 27 (2013) 13–26.
[31] D.J. Futuyma, T.R. Meagher, Evolution, science and society: evolutionary biology and the national research agenda, Calif. J. Sci. Educ. 1 (2001) 19–32.
[32] S. Harvey, Creative synthesis: exploring the process of extraordinary group creativity, Acad. Manage. Rev. 39 (2014) 324–343.
[33] S. Kim, Darwin and Lamarck in creative ideas: a qualitative study of inventors' stories, Qual. Quant. 47 (2013) 2945–2958.
[34] R.W. Weisberg, Creativity: Understanding Innovation in Problem Solving,
[41] S. Rose, S. Butner, W. Cowley, Describing story evolution from dynamic information streams, in: 2009 IEEE Symposium on Visual Analytics Science and Technology, Atlantic City, NJ, USA, 2009, pp. 99–106.
[42] J. Suh, S.Y. Sohn, Analyzing technological convergence trends in a business ecosystem, Ind. Manage. Data Syst. 115 (2015) 718–739.
[43] N. Kim, H. Lee, W. Kim, H. Lee, J.H. Suh, Dynamic patterns of industry convergence: evidence from a large amount of unstructured data, Res. Policy 44 (2015) 1734–1748.
[44] D.J. Watts, S.H. Strogatz, Collective dynamics of "small-world" networks, Nature 393 (1998) 440–442.
[45] A. Barabási, R. Albert, Emergence of scaling in random networks, Science 286 (1999) 509–512.
[46] M. Newman, Communities, modules and large-scale structure in networks, Nat. Phys. 8 (2012) 25–31.
[47] M. Herrera, D.C. Roberts, N. Gulbahce, Mapping the evolution of scientific fields, PloS One 5 (2010) e10355.
[48] B. Song, Z. Jiang, X. Li, Modeling knowledge need awareness using the problematic situations elicited from questions and answers, Knowl.-Based Syst. 75 (2015) 173–183.
[49] C. Shah, J. Pomerantz, Evaluating and predicting answer quality in community QA, in: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2010, pp. 411–418.
[50] G.A. Miller, WordNet: a lexical database for English, Commun. ACM 38 (1995) 39–41.
[51] J.J. Jiang, D.W. Conrath, Semantic similarity based on corpus statistics and lexical taxonomy, in: Proceedings of International Conference Research on Computational Linguistics, 1997, pp. 1–15.
[52] M. Mohler, R. Mihalcea, Text-to-text semantic similarity for automatic short answer grading, in: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, Athens, Greece, 2009, pp. 567–575.
[53] Y. Shi, H. Pan, T. Li, Evaluation model of university management informatization level based on fuzzy neural network t-s, J. Investig. Med. 62S (2014), S108-S108.
[54] S. Fukuda, Assessing the applicability of fuzzy neural networks for habitat preference evaluation of Japanese medaka (Oryzias latipes), Ecol. Inform. 6 (2011) 286–295.
[55] W.K. Wong, X.H. Zeng, W.M.R. Au, A decision support tool for apparel coordination through integrating the knowledge-based attribute evaluation expert system and the T-S fuzzy neural network, Expert Syst. Appl. 36 (2009) 2377–2390.
[56] G. Karypis, E.H. Han, V. Kumar, Chameleon: Hierarchical clustering using dynamic modeling, Computer 32 (1999) 68.
[57] D.M. Blei, Probabilistic topic models, Commun. ACM 55 (2012) 77–84.
[58] D.M. Blei, A.Y. Ng, M.I. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res. 3 (2003) 993–1022.
[59] Y. Huang, Z. Jiang, C. He, B. Song, L. Liu, An inner-enterprise wiki system integrated with semantic search for reuse of lesson-learned knowledge in product design, Proc. Inst. Mech. Eng. Part B: J. Eng. Manuf. (2014), 0954405414555739.
Science, Invention, and the Arts, John Wiley & Sons, 2006. [60] X. Phan, L. Nguyen, S. Horiguchi, Learning to classify short and sparse text &
[35] B.J. Loasby, The evolution of knowledge: beyond the biological model, Res. web with hidden topics from large-scale data collections, in: The 17th
Policy 31 (2002) 1227–1239. International World Wide Web Conference (WWW 2008), Beijing, China, 2008,
[36] G. Dosi, Technological paradigms and technological trajectories: a suggested pp. 91–100.
interpretation of the determinants and directions of technical change, Res. [61] J. Zachman, Excerpted from the Zachman Framework: A Primer for Enterprise
Policy 11 (1982) 147–162. Engineering and Manufacturing, Zachman International, 2003.
[37] T.S. Kuhn, The Structure of Scientific Revolution, University of Chicago Press, [62] Y. Wang, L. Zhao, X. Wang, X. Yang, S. Supakkul, PLANT: A pattern language for
1962. transforming scenarios into requirements models, Int. J. Hum Comput Stud. 71
[38] P.C. Palvia, S. Palvia, J.E. Whitworth, Global information technology: a meta (2013) 1026–1043.
analysis of key issues, Inf. Manage. 39 (2002) 403–414. [63] A. Radwan, M. Aarabi, Study of implementing Zachman framework for
[39] C.M. Chen, Searching for intellectual turning points: progressive knowledge modeling information systems for manufacturing enterprises aggregate
domain visualization, Proc. Natl. Acad. Sci. USA 1011 (2004) 5303–5310. planning, Simulation 16 (2011) 18.
[40] Sci2 Team, Science of science (Sci2) tool, 2009.
