Patent Analysis With Text Mining For TRIZ
Proceedings of the 2008 IEEE ICMIT
obtained freely on the Internet. Consulting patents during scientific research can not only raise the starting point and level of a research project but also save almost 60% of the time and 40% of the cost. In recent years, patent analysis has been recognized as an important task for product development.

The technologies and schemes implemented in a patent are described in its claims, description, and other sections. Because natural language is complex and varied, it is difficult to process a patent document by computer. The traditional method of extracting knowledge from patents was based on manual analysis carried out by experts. However, patent documents are often lengthy and rich in technical and legal terminology, and thus hard for non-specialists to read and analyze. Manual analysis is a time-consuming and labor-intensive task; the theory of TRIZ, for example, was developed through over 1500 person-years of research studying over two million of the world’s most successful patents [12]. The traditional method is now impractical because patent databases grow exponentially.

Patent analysis technologies include patent bibliometric data analysis [13], patent citation analysis [14], patent statistical analysis [15], patent classification, and so on. Bibliometric analysis of patents provides information on the growth of inventive activity and on technological trends. Patent citation analysis is usually used to measure the quality of patents by examining citation links between patents and scientific literature. Patent statistical analysis applies statistical indicators of feature words to express patent information through word analysis. Patent classification is used to arrange and index the technical content of patent specifications so that a document disclosing an invention identical or similar to the invention for which a patent is claimed can be found quickly.

In addition, more and more patent analysts have become interested in discovering and exploring information hidden in patents that relates to technological activities and innovation, and the natural language of patents is now analyzed by computer. For example, Cascini and Russo proposed a Subject-Action-Object model to extract useful information [16], and Yoon and Park proposed a text-mining-based patent network method [3]. Although clustering patents and finding relationships between similar contents can extract useful information from huge sets of patent documents, these methods currently lack semantic analysis, and their results are difficult for inventors using TRIZ to put into practice. TRIZ users need patents to be classified by Contradictions and Inventive Principles.

Currently popular patent classification schemes are used to organize and index the technical content of patent specifications so that specifications on a specific topic or in a given area of technology can be identified easily and accurately. Before their publication, patent documents are given one or more classification codes based on their textual contents for topic-based analysis and retrieval. Many patent classification schemes, such as the IPC (International Patent Classification), the US Classification, and the British Classification, have been developed. However, these schemes classify patents by technical field and are therefore not suitable for TRIZ users.

Patent documents are divided into different areas according to technology fields, which helps traditional inventors search the prior art. However, this is inadequate for TRIZ users, who are interested in previous patents that have solved the same Contradiction and used the same Inventive Principles, and such patents may come from different fields [10, 17]. Patent documents classified according to Contradictions will provide a broader view for TRIZ users and TRIZ software developers by helping them find possible inspiration from fields that may be totally different from their own.

IV. PROPOSED METHODOLOGY OF MINING PATENTS ACCORDING TO CONTRADICTIONS

A. Framework for Patent Analysis

Since the original patent documents are expressed in natural language, text mining, a technique for knowledge discovery from collections of unstructured text, is used to extract useful information from huge sets of patents. The overall process of patent analysis with text mining for TRIZ is made up of several steps. First, the full patent documents to be analyzed are collected in electronic text format. This may involve a repeated process of devising a set of query terms, searching several patent databases, filtering out undesired patents, and downloading patents for local analysis. Users can fetch all patents that fall under given IPC categories, match given keywords, and/or lie within given year limits. Many patent offices, such as the USPTO in the United States and the EPO in Europe, already allow abstracts and full texts of their patents to be downloaded freely. In particular, http://ep.espacenet.com/, the download service of the European Patent Office, has become a very popular source of information.

Second, the raw patent documents are transformed into structured data. Since the original documents are expressed in natural language, they must be transformed from raw data into structured data before they can be analyzed and utilized. The patent documents are segmented into bags of words, and stop words are then filtered out to reduce the dimension. Semantic feature extraction and word sense disambiguation are carried out. Prior patent documents and their corresponding Contradictions and Inventive Principles serve as the samples of the training text collections. A text mining methodology specialized for patent analysis for TRIZ is proposed and shown in Fig. 2. In relation to patent analysis, text mining is used as a data processing and information-extracting tool. Then, the patents to be analyzed are classified according to Contradictions and Inventive Principles.
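
The transformation of raw patent text into structured data described above (segmentation into a bag of words followed by stop-word filtering) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop-word list, the helper names bag_of_words and to_vectors, and the two sample sentences are all assumptions made for illustration, and the paper itself does not specify a tokenizer or a stop list.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; the paper does not give one).
STOP_WORDS = frozenset({
    "a", "an", "the", "of", "and", "or", "to", "in", "is", "are",
    "for", "by", "with", "that", "which",
})


def bag_of_words(text):
    """Segment text into lowercase word tokens and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def to_vectors(documents):
    """Map documents to term-frequency vectors over one shared, sorted vocabulary."""
    bags = [bag_of_words(d) for d in documents]
    vocabulary = sorted(set().union(*bags))
    vectors = [[bag[term] for term in vocabulary] for bag in bags]
    return vocabulary, vectors


# Hypothetical patent snippets, invented for illustration only.
docs = [
    "The spring reduces the weight of the moving object.",
    "A lighter housing increases the strength of the device.",
]
vocabulary, vectors = to_vectors(docs)
```

Each row of vectors is a structured representation of one document over the shared vocabulary. Filtering stop words before building the vocabulary is what reduces the dimension of these vectors, which is the purpose the text gives for that step.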
preprocessed by removing stop words, filtering words, and stemming; semantic analysis is then performed with WordNet. On the basis of WordNet, each word is replaced by its set of word senses, which become the feature items of the document feature vector. Then, using the latent semantic indexing model, the dimension of the document vector space is reduced in MATLAB. Finally, the structured data can be stored in .arff or .csv format, which is suitable for classifying the patents with the WEKA software [20].

V. CONCLUSION AND FUTURE WORK

Patents classified according to Contradictions and Inventive Principles can help innovators using TRIZ to solve specific problems. Currently, however, there is a lack of open databases with enough patents classified in this way, partly because manual classification requires a huge amount of manpower. With the wider application of TRIZ and the enormous increase in patents worldwide, there is an urgent need to classify patents automatically for TRIZ users. The main purpose of this paper is to propose a computer-aided approach for classifying patents into TRIZ categories.

In this paper, a methodology based on text mining that can be used to analyze patent documents for TRIZ users is presented. Prior patent documents corresponding to Contradictions and Inventive Principles are taken as samples to produce the training text collections. Because the descriptions of TRIZ Contradictions and Inventive Principles are rather abstract and general, patents retrieved only by matching literal words are usually not appropriate; knowledge of semantic structure is therefore obtained from WordNet. With the trained model, patents are classified into Contradictions and Inventive Principles using the WEKA software. The corresponding patents obtained will accelerate the search for breakthrough solutions and give users the ability to reach a far greater level of product performance. In practice, moreover, an engineering problem often involves more than one Contradiction and needs several Inventive Principles. The difficulty is how to combine the solutions of the individual Contradictions.

ACKNOWLEDGMENT

REFERENCES

[2] V. W. Soo, S. Y. Lin, S. Y. Yang, S. N. Lin, S. L. Cheng, “A cooperative multi-agent platform for invention based on patent document analysis and ontology”, Expert Systems with Applications, vol. 31, issue 4, pp. 766-775, Nov. 2006.
[3] B. Yoon, Y. Park, “A text-mining-based patent network: Analytical tool for high-technology trend”, The Journal of High Technology Management Research, vol. 15, issue 1, pp. 37-50, Feb. 2004.
[4] G. Fischer, N. Lalyre, “Analysis and visualization with host-based software - The features of STN® AnaVist™”, World Patent Information, vol. 28, issue 4, pp. 312-318, 2006.
[5] Y. H. Tseng, C. J. Lin, Y. L. Lin, “Text mining techniques for patent analysis”, Information Processing and Management, vol. 43, issue 5, pp. 1216-1247, Sep. 2007.
[6] TRIZ, Available from: http://www.triz-journal.com/archives/what_is_triz/: 2007/05/10.
[7] Pro/Innovator, Available from: http://www.iwint.com.cn/: 2007/06/24.
[8] Goldfire, Available from: http://www.goldfire.com/: 2008/02/10.
[9] TechOptimizer, Available from: http://www.invention-machine.com/: 2008/02/24.
[10] H. T. Loh, C. He, L. X. Shen, “Automatic classification of patent document for TRIZ users”, World Patent Information, vol. 28, issue 1, pp. 6-13, Mar. 2006.
[11] Patent, Available from: http://www.epo.org/patents/Grant-procedure/About-patents.html: 2008/01/13.
[12] R. H. Tan, Innovative design-TRIZ: theory of Inventive Problem Solving: TRIZ, China Machine Press, China, 2002.
[13] V. K. Gupta, N. B. Pangannaya, “Carbon nanotubes: bibliometric analysis of patents”, World Patent Information, vol. 22, issue 3, pp. 185-189, Sep. 2000.
[14] J. Michel, B. Bettels, “Patent citation analysis: a closer look at the basic input data from patent search reports”, Scientometrics, vol. 51, no. 1, pp. 185-201, 2001.
[15] Y. H. Tseng, C. J. Lin, Y. I. Lin, “Text mining for patent map analysis”, Information Processing & Management, vol. 43, issue 5, pp. 1216-1247, Sep. 2007.
[16] G. Cascini, D. Russo, “Computer-aided analysis of patents and search for TRIZ contradictions”, Int. J. Product Development, vol. 4, no. 1/2, 2007.
[17] C. He, H. T. Loh, “Grouping of TRIZ Inventive Principles to facilitate automatic patent classification”, Expert Systems with Applications, vol. 34, issue 1, pp. 788-795, Jan. 2008.
[18] M. F. Porter, “An algorithm for suffix stripping”, Program, vol. 14, no. 3, pp. 130-137, 1980.
[19] C. Fellbaum, WordNet: An Electronic Lexical Database, Language, Speech, and Communication, The MIT Press, Cambridge, May 1998.
[20] WEKA, Available from: http://www.cs.waikato.ac.nz/ml/weka/: 2008/01/08.