
Patent Analysis with Text Mining for TRIZ*

Yanhong Liang, Runhua Tan, Jianhong Ma


Institute of Design for Innovation, Hebei University of Technology, Tianjin, P.R. China

* Yanhong Liang thanks the sponsors: the Scientific Research Program of Hebei and the Ph.D. Programs Foundation of the Ministry of Education of China.

Proceedings of the 2008 IEEE ICMIT. 978-1-4244-2330-9/08/$25.00 ©2008 IEEE

Abstract - Patents are an important knowledge source for industrial research and product development on account of their innovation and practicability. In TRIZ theory, a product design problem can be considered as one or more Contradictions. But the Contradictions and the corresponding Inventive Principles are rather abstract and general. It is more helpful for innovators solving a problem if they can obtain related examples from published patents that have solved the same Contradictions. In order to search automatically for the relevant patents according to Contradictions and Inventive Principles, text mining is used to analyze patent documents. In this paper, a computer-aided methodology to classify patents according to Contradictions and Inventive Principles is proposed.

Keywords - Contradiction, Inventive Principle, TRIZ, patent analysis, text mining

I. INTRODUCTION

Information hidden in patents can provide precious sources for scientific and technological innovation; the invention tool TRIZ, for example, originated from the study of patents. Through the analysis of millions of successful patents, TRIZ was created as a generic problem-solving tool that encapsulates the knowledge, strategies and best practices of the most creative and innovative thinkers of our times.

Altshuller, the father of TRIZ, recognized that the place to look for the basics of innovation and new ideas was not in the brains of inventors, but in the published inventions [1]. In product innovation, the problem we are trying to solve may well have been solved already by someone else. That is to say, what we normally need to do is to compromise. According to TRIZ, a significant operation of product innovation is to solve design contradictions. When Contradictions and Inventive Principles are defined, the product can be developed by referring to analogous inventions, not only in related fields but also in other fields, that have previously solved the same Contradictions.

Patents contain many key technologies for industrial research and product development on account of their innovation and practicability. In recent years, patent analysis [2, 3, 4, 5] has become more prominent in high-technology management as the process of innovation becomes more complex, the cycle of innovation becomes shorter and market demand becomes more volatile.

Obtaining useful information by scanning or reading indexed patent documents from long lists of noisy results is a tedious and time-consuming task that requires careful manual selection. Moreover, extracting information from patent documents indexed by standard keyword-based search methods has the defect of missing relevant solutions and inspiration from other fields. Especially when using TRIZ, searching patents by keyword or by International Patent Classification (IPC) is not enough, because patents that have solved the same Contradictions and used the same Inventive Principles may come from different fields. Thus, an automatic method to classify patents for TRIZ users is in great demand. A well-suited method for this difficult problem is text mining. Text mining, which is specialized for data in text format, is applied to derive information from patents.

In this paper, we first introduce TRIZ, focusing mainly on Inventive Principles, Contradictions, the TRIZ process for solving problems, and the insufficiency of systematic innovation software. Then the present state of patent analysis is reviewed. Following that, a proposed methodology for mining patents according to Contradictions and Inventive Principles is provided. Finally, future research is presented. Our focus, however, is largely directed towards the approach to classifying patents according to Contradictions and Inventive Principles.

II. TRIZ: A TOOL FOR CREATIVITY AND INNOVATION

TRIZ is the Russian acronym for the Theory of Inventive Problem Solving [1]. Unlike the spontaneous and intuitive creativity of individuals or groups, it is a knowledge-based technology for generating new concepts and ideas. It involves a series of tools, methods and strategies. TRIZ began in 1946 when the scientist G.S. Altshuller and his colleagues in the former U.S.S.R. discovered that invention, or problem solving, is not a random process, but rather is governed by certain objective laws. In reviewing thousands of excellent patents in different fields, Altshuller realized that all innovations emerge from a small number of Inventive Principles and that the most difficult problems in engineering involve fundamental Contradictions.

In TRIZ theory, Contradictions are characterized by a desire to improve one aspect of a system, with the result that another property declines in performance or value. By analyzing the ways of solving and eliminating the Contradictions, Altshuller discovered that only a limited number of Inventive Principles were used to solve or
eliminate Contradictions in many thousands of breakthrough inventions, irrespective of the patents' field (e.g., aerospace and agriculture). He categorized these Inventive Principles in several retrievable forms, including a Contradiction table, 40 Inventive Principles, and 76 Standard Solutions [1], which allow instant access to the strategies and principles used in these highly successful designs. These generic Inventive Principles can be applied to all areas of technology, greatly reducing the time needed to produce breakthrough ideas and inventions.

A. TRIZ Process to Solve Problems

In the course of solving any technical problem, one tool or many can be used. The 40 Inventive Principles of Problem Solving are the most accessible "tool" of TRIZ. According to TRIZ, when we have a problem, process, or product that appears to have a contradiction in its design or operation, the same Contradiction has been faced, and solved with the same Inventive Principles, before by people in other industries or technologies, i.e., "someone somewhere has already solved a problem like yours". Using TRIZ, an engineering problem is solved through four steps, as shown in Fig. 1.

Step 1 is to analyze the specific problem being encountered. Step 2 is to formulate the problem in terms of physical Contradictions using TRIZ language: could the improvement of one technical characteristic to solve the problem cause other technical characteristics to worsen? Step 3 is to search the Contradiction table for Inventive Principles that solve the Contradictions. Step 4 is to look for analogous solutions and adapt them to the specific solution. In step 3, the Inventive Principles are hints that help innovators find a possible solution to the problem. However, the Inventive Principles are rather abstract and general, so it is more helpful for innovators solving a specific problem if they can obtain related examples from previous analogous solutions, showing how former inventors have solved the Contradictions with the corresponding Inventive Principles. The relevant patents describe the inventions in detail, which makes it possible to stimulate creativity in solving the Contradiction at hand, especially with examples from other areas. Therefore, patent classification according to Inventive Principles, associated with the related Contradictions, can provide quite helpful references to innovators.

Fig. 1. The General Model for Problem Solving with TRIZ [6]

B. Systematic Innovation Software

Because TRIZ is built on a database of hundreds of thousands of patents, Inventive Principles, Contradictions, etc., some software has been developed to display the corresponding Inventive Principles, with outlines and explanations, quickly and in friendly formats once the Contradictions of the specific problem are defined. Many companies are engaged in developing computer-aided innovation software with TRIZ as the core. Currently popular software includes Pro/Innovator [7], Goldfire Innovator [8] and TechOptimizer [9]. Hebei University of Technology has also developed InventionTool 2.0. Such software acts as a catalyst for creativity and problem solving. But the Inventive Principles and the corresponding explanations are rather abstract, and there are different ways to use every Inventive Principle, so it is more helpful for innovators to obtain related examples from published patents that have solved the Contradictions.

Through the software mentioned above, innovators can acquire Contradictions and Inventive Principles, as in steps 2 and 3 above. Once innovators have a Contradiction, what they are interested in next is a feasible solution. It is more inspiring for innovators if they are provided with specific examples of how someone else has solved the analogous Contradiction. This process partly contrasts with the process by which TRIZ was generated.

So far there is no open patent database available with sufficient examples classified by Inventive Principles and Contradictions [10]. One reason is that classifying patents manually consumes a great deal of time and manpower, since patent documents are very numerous and their number increases every day. For such classification, the contents of a huge number of patents have to be analyzed. We simply do not have time to follow all the worldwide patents manually. Classifying patents for this purpose automatically or semi-automatically will save much time and effort. Therefore a method to classify patents automatically according to Inventive Principles and Contradictions is in great demand among TRIZ users and TRIZ software developers.

III. PATENT ANALYSIS

Patent documents contain important research results that are valuable to the process of product innovation. A patent is a legal title granting its holder the right to prevent third parties from commercially exploiting an invention without authorization [11]. An invention can belong to any field of technology. The World Intellectual Property Organization (WIPO) has indicated that patent publications cover approximately 90%-95% of scientific research results worldwide, a greater percentage than all scientific journals cover, and that probably 70% of patent publications have never been published in other, non-patent literature. Now most patents can be


obtained freely on the Internet. Consulting patents frequently in scientific research can not only raise the starting point and level of a research project, but also save almost 60% of the time and 40% of the cost. In recent years patent analysis has been recognized as an important task for product development.

The technologies and schemes implemented in a patent are described in its claims, description and so on. Because natural language is complex and varied, it is difficult to handle a patent document on a computer. The traditional method of extracting knowledge from patents was based on manual analysis carried out by experts. However, patent documents are often lengthy and rich in technical and legal terminology, and thus hard for non-specialists to read and analyze. Manual analysis is a time-consuming and labor-intensive task; the theory of TRIZ, for example, was developed through over 1500 person-years of research studying over two million of the world's most successful patents [12]. Nowadays the traditional method is impractical, as patent databases grow exponentially.

Patent analysis technologies include patent bibliometric data analysis [13], patent citation analysis [14], patent statistical analysis [15], patent classification, and so on. Bibliometric analysis of patents provides information on the growth of inventive activity and on technological trends. Patent citation analysis is usually used to measure the quality of patents by examining the citation links between patents and scientific literature. Patent statistical analysis applies statistical indicators of feature words to express patent information through word analysis. Patent classification is used to arrange and index the technical content of patent specifications so that a document disclosing an invention identical or similar to the invention for which a patent is claimed can be found quickly.

In addition, more and more patent analysts have become interested in discovering and exploring information hidden in patents that relates to technological activities and innovation. The natural language of patents is analyzed by computer. For example, Cascini and Russo provided a Subject-Action-Object model to extract useful information [16]. Yoon and Park proposed a text-mining-based patent network method [3]. Although clustering and finding relationships between patents with similar contents can extract useful information from huge sets of patent documents, for the time being these methods lack semantic analysis, and the results of the analysis are difficult for inventors using TRIZ to put into practice. For TRIZ users, patents need to be classified by Contradictions and Inventive Principles.

Currently popular patent classification schemes are used to organize and index the technical content of patent specifications so that specifications on a specific topic or in a given area of technology can be identified easily and accurately. Before their publication, patent documents are given one or more classification codes based on their textual contents for topic-based analysis and retrieval. Many patent classification schemes, such as IPC (International Patent Classification), US Classification and British Classification, have been developed. However, these classification schemes, which are organized according to technical fields, are not suitable for TRIZ users.

Patent documents are divided into different areas according to technology fields, which helps traditional inventors search for prior art. However, this is inadequate for TRIZ users, since they are interested in previous patents that have solved the same Contradiction and used the same Inventive Principles, which may come from different fields [10, 17]. Patent documents classified according to Contradictions will provide a broader view for TRIZ users and TRIZ software developers, by helping them find possible inspiration from fields that may be totally different from their own.

IV. PROPOSED METHODOLOGY OF MINING PATENTS ACCORDING TO CONTRADICTIONS

A. Framework for Patent Analysis

Since the original patent documents are expressed in natural language, text mining, as a technique for knowledge discovery from collections of unstructured text, is used to extract useful information from huge sets of patents. The overall process of patent analysis with text mining for TRIZ is made up of several steps. First of all, the full patent documents to be analyzed are collected in electronic text format. This may involve a repeated process of devising a set of query terms, searching a couple of patent databases, filtering out undesired patents, and downloading patents for local analysis. People can fetch all patents under certain IPC categories, with certain keywords, and/or within certain year limits, etc. Many patent offices, such as the USPTO in the United States and the EPO in Europe, already allow people to download the abstracts and complete texts of their patents freely. In particular, http://ep.espacenet.com/, the download service of the European Patent Office, has become a very popular source of information.

Second, the raw patent documents are transformed into structured data. Since the original documents are expressed in natural language, they must be transformed from raw data into structured data in order to be analyzed and utilized. The patent documents are segmented into bags of words, and stop words are then filtered from the bags of words to reduce the dimension. Semantic characterization and word sense disambiguation are then carried out. The prior patent documents, together with the corresponding Contradictions and Inventive Principles, serve as samples for the training text collections. A text mining methodology specialized for patent analysis for TRIZ is proposed and shown in Fig. 2. In relation to patent analysis, text mining is used as a data processing and information-extracting tool. Then, the patents to be analyzed are classified according to Contradictions and Inventive Principles.


Fig. 2. Steps in patent analysis for TRIZ

B. Data Preprocessing

1) Bag-of-words Representation

For mining large patent collections it is necessary to preprocess the patent documents and store the information in a data structure that is more appropriate for further processing than a plain text file. Since a patent document is written in words, that is to say, it is composed of a set of words, it can be represented at the word level (bag-of-words representation). A patent document is preprocessed by removing all punctuation marks, tabs and other non-text characters. After splitting the text on white space and removing punctuation, a stream of words is obtained. This tokenized representation is then used for further processing. The set of different words obtained by merging all patent documents is called a patent-document collection.

2) Filtering and Lemmatization, Stemming

In order to reduce the size of the bag and the dimension of the description of the patent documents in the collection, the set of words of the patent documents can be reduced by filtering and lemmatization.

Filtering removes words from the collection, i.e. from the documents. Term frequency (TF) and inverse document frequency (IDF) are two parameters used in filtering words. Words with low TF and low DF are often removed when indexing a collection. However, using these parameters alone does not prevent undesired words, such as function words, from being counted, so stop-word filtering is added. From a non-linguistic point of view, stop words are those words that carry little or no information, like articles, conjunctions, prepositions, etc. They mainly play functional roles. Examples are a, about, above, across, after, afterwards, again, against, all, almost, alone, along, already, also, etc. Furthermore, words that occur extremely often can be said to have too little information content to distinguish between documents. Also, words that occur very seldom are likely to be of no particular statistical relevance and can be removed [12].

Different forms of the same word are usually problematic for text data analysis, because they have different spellings but similar meanings (e.g. learns, learned, learning). So lemmatization and stemming are applied. Lemmatization is the process of finding the normalized form of a word, i.e. removing suffixes. Generally speaking, it transforms verb forms to the infinitive and nouns to the singular form. Words with a common stem will usually have similar meanings, for instance the words works, worked, working. Frequently, such words are conflated into the single word work by removal of the various suffixes -s, -ed, -ing. In addition, the suffix stripping process reduces the total number of words, and hence the size and complexity of the data in the collection. Stemming is the process of transforming a word into its stem, which is the normalized form. It is similar to lemmatization, except that stemming does not replace the suffix of a word but only extracts the basic form of the word, i.e., it strips the plural s from nouns, -ing from verbs, or other affixes. A lot of work has been carried out on word lemmatization and stemming, such as the well-known Porter's algorithm [18].

3) Patent Analysis based on Semantic Characteristic

The simple bag-of-words representation with Term Frequency and the Vector Space Model is inadequate for patent analysis, because word sense is ambiguous in natural language processing (NLP). It is a ubiquitous phenomenon that words often have more than one meaning; sometimes the meanings of a word are fairly similar and sometimes completely different. Without disambiguation, the accuracy of patent classification will suffer. Moreover, the descriptions of TRIZ Contradictions and Inventive Principles are abstract and general, so patents retrieved only by literal word matching are usually not appropriate. WordNet [19] gives a word multiple senses, which helps to connect it with other words in the text. Whenever a designer has a problem in product development, he searches for inventive experience and information to stimulate his thinking. However, this search is not an easy process for a TRIZ user. A keyword search is helpful to the innovator, but useful patents may sometimes be neglected, since the same Inventive Principles used to solve the problem come from patent documents in different fields. Therefore, it is necessary to return accurately the records that might solve analogous problems for innovators.

Under these circumstances, the prior patent documents corresponding to Contradictions and Inventive Principles are samples used to produce the training text collections. In the phase of training, the training patents are


preprocessed by removing stop words, filtering words and stemming, and then semantic analysis is performed using WordNet. On the basis of WordNet, each word is substituted by its word sense collection, which becomes a characteristic item of the document eigenvector. Then, using the latent semantic indexing model, the dimension of the document vector space is reduced using MATLAB. Finally the structured data can be stored in .arff or .csv format, which is appropriate for classifying the patents using the WEKA software [20].

V. CONCLUSION AND FUTURE WORK

Patents classified according to Contradictions and Inventive Principles can help innovators using TRIZ to solve a specific problem. Currently, however, we lack open databases with sufficient classified patents of this kind, partly because of the huge manpower requirement of manual classification. With the wider application of TRIZ and the enormous increase of patents worldwide, there is an urgent need to classify patents automatically for TRIZ users. The main purpose of this paper is to propose a computer-aided approach for classifying patents into TRIZ categories.

In this paper, a text-mining-based methodology that could be used to analyze patent documents for TRIZ users is presented. The prior patent documents corresponding to Contradictions and Inventive Principles are taken as samples to produce the training text collections. Because the descriptions of TRIZ Contradictions and Inventive Principles are rather abstract and general, patents retrieved only by literal word matching are usually not appropriate, so knowledge of semantic structure is obtained from WordNet. With the trained model, patents will be classified by Contradictions and Inventive Principles using the WEKA software. The corresponding patents obtained will accelerate the search for breakthrough solutions and give users the ability to reach a far greater level of product performance. Besides, in practice, an engineering problem often involves more than one Contradiction and needs several Inventive Principles. The difficulty is how to amalgamate the solutions of each Contradiction.

ACKNOWLEDGMENT

The research is supported in part by the Scientific Research Program of Hebei under Grant Number 07215602D-2 and the Ph.D. Programs Foundation of the Ministry of Education of China under Grant Number 20060080002.

REFERENCES

[1] R. H. Tan, Theory of Inventive Problem Solving: TRIZ. China: Science Press, 2004.
[2] V. W. Soo, S. Y. Lin, S. Y. Yang, S. N. Lin, S. L. Cheng, "A cooperative multi-agent platform for invention based on patent document analysis and ontology", Expert Systems with Applications, vol. 31, issue 4, pp. 766-775, Nov. 2006.
[3] B. Yoon, Y. Park, "A text-mining-based patent network: Analytical tool for high-technology trend", The Journal of High Technology Management Research, vol. 15, issue 1, pp. 37-50, Feb. 2004.
[4] G. Fischer, N. Lalyre, "Analysis and visualization with host-based software-The features of STN® AnaVist™", World Patent Information, vol. 28, issue 4, pp. 312-318, 2006.
[5] Y. H. Tseng, C. J. Lin, Y. L. Lin, "Text mining techniques for patent analysis", Information Processing and Management, vol. 43, issue 5, pp. 1216-1247, Sep. 2007.
[6] TRIZ, Available from: http://www.triz-journal.com/archives/what_is_triz/: 2007/05/10.
[7] Pro/Innovator, Available from: http://www.iwint.com.cn/: 2007/06/24.
[8] Goldfire, Available from: http://www.goldfire.com/: 2008/02/10.
[9] TechOptimizer, Available from: http://www.invention-machine.com/: 2008/02/24.
[10] H. T. Loh, C. He, L. X. Shen, "Automatic classification of patent document for TRIZ users", World Patent Information, vol. 28, issue 1, pp. 6-13, Mar. 2006.
[11] Patent, Available from: http://www.epo.org/patents/Grant-procedure/About-patents.html: 2008/01/13.
[12] R. H. Tan, Innovative design-TRIZ: theory of Inventive Problem Solving: TRIZ, China Machine Press, China, 2002.
[13] V. K. Gupta, N. B. Pangannaya, "Carbon nanotubes: bibliometric analysis of patents", World Patent Information, vol. 22, issue 3, pp. 185-189, Sep. 2000.
[14] J. Michel, B. Bettels, "Patent citation analysis: a closer look at the basic input data from patent search reports", Scientometrics, vol. 51, no. 1, pp. 185-201, 2001.
[15] Y. H. Tseng, C. J. Lin, Y. I. Lin, "Text mining for patent map analysis", Information Processing & Management, vol. 43, issue 5, pp. 1216-1247, Sep. 2007.
[16] G. Cascini, D. Russo, "Computer-aided analysis of patents and search for TRIZ contradictions", Int. J. Product Development, vol. 4, no. 1/2, 2007.
[17] C. He, H. T. Loh, "Grouping of TRIZ Inventive Principles to facilitate automatic patent classification", Expert System with Applications, vol. 34, issue 1, pp. 788-795, Jan. 2008.
[18] M. F. Porter, "An algorithm for suffix stripping", Program, vol. 14, no. 3, pp. 130-137, 1980.
[19] C. Fellbaum, WordNet: An Electronic Lexical Database, Language, Speech, and Communication, The MIT Press, Cambridge, May 1998.
[20] WEKA, Available from: http://www.cs.waikato.ac.nz/ml/weka/: 2008/01/08.
