Cgworld - Architecture and Features: Lecture Notes in Computer Science July 2002

CGWorld - Architecture and Features

Conference Paper  in  Lecture Notes in Computer Science · July 2002


DOI: 10.1007/3-540-45483-7_20 · Source: DBLP

CGWorld – Architecture and Features
Pavlin Dobrev¹ and Kristina Toutanova²

¹ ProSyst Bulgaria Ltd., Sofia, Bulgaria
pavlin@prosyst.com
² Stanford University, Department of Computer Science, Stanford, CA, USA
kristina@cs.stanford.edu

Abstract. This article presents recent developments in CGWorld, a web-based
workbench for distributed development of a knowledge base of
conceptual graphs, stored on a central server. The initial version of CGWorld
met many of the needs that motivated its creation. It had excellent browsing,
searching and editing features for a KB of CGs. However, the support of large
data and distributed development was not fully satisfactory because of
limitations of the architecture. Subsequently the architecture of CGWorld was
changed according to the latest developments in the area of multi-tier web
applications. This paper describes several enhancements to this architecture
and the implementation of new features that increase the scalability, reliability
and usability of the application.

1 Motivation and Rationale

The main motivation for creating CGWorld was the need for an application that
allowed Internet access to a knowledge base of Conceptual Graphs (CG). The goal
was to provide various facilities for remote browsing and editing of a KB that resides
on a central server.

Support of different representation formats for CGs was also a high priority. As in
[9, 10, 11], we chose the graphical representation of conceptual graphs as the major
medium for browsing, editing and manipulation of the knowledge base, since it is
easier to use for knowledge engineers and end users who are not CG experts. The other
supported formats were CGIF [5], First Order Logic and a Prolog format [6,7,8].

CGWorld was first introduced at ICCS 2000 [3]. Further development was presented
at ICCS 2001 [1,2]. The main goals followed in the design and development of the
CGWorld workbench are:
(i) to allow for collaborative, distributed acquisition and editing of a CG
knowledge base;
(ii) to provide easy search and navigation in a large KB;
(iii) to maintain different representation languages, thus accommodating the
needs of different users of CGWorld and the different applications the
KB of CGs is used in;
(iv) to provide a graphical editor and viewer for CGs that is easy to use by
non-experts in CG theory;
(v) to integrate and add Web access to previously developed CG
applications, written in different programming languages.

The initial version of CGWorld met many of the needs that motivated its creation. It
had excellent browsing, searching and editing features for a KB of CGs. However, the
support of large data and distributed development was not fully satisfactory because of
limitations of the architecture. Subsequently the architecture of CGWorld was changed
according to the latest developments in the area of multi-tier web applications. This
paper describes several enhancements to this architecture that increase the scalability,
reliability and usability of the application. It also reports on the addition of new
procedures for CG acquisition and the integration of a new representation format for
CGs. These features facilitate the development of a large KB by multiple parties using
different representation formats.

The need for an application like CGWorld arose in the context of projects that required
Natural Language Processing to be built on top of a Conceptual Graphs Knowledge
Base [1,2,3,4,6,7,8]. Initially we concentrated on the functionality that is required in
this area. We built different representation formats of Conceptual Graphs, the most
used one being the display form, which is understandable by non-specialists, and added
support for CG operations to be used for inference.

The graphical editing facilities are implemented in the Editor, which is run over the
Internet and not downloaded locally as in [10]. The other difference from [10] is that the
Knowledge Base is distributed over the Internet and not loaded from the local
computer. CGWorld implements the canonical formation rules as in [9]. An
added advantage of CGWorld is that its Editor is an applet and thus it provides higher
security and easier maintenance.

We do not aim to build “ontology servers” as defined in [14]. WebKB-2 [14] is a
shared annotation tool. It permits Web users to store, organize and retrieve knowledge
in a large single knowledge base on a WebKB server machine. The CGWorld
Knowledge Base is a Conceptual Graphs Knowledge Base. Similarly to WebKB-2, we
use a server to store conceptual data. WebKB-2 may be used for representing the
content of documents and therefore indexing them. CGWorld provides mechanisms
for storing, indexing and retrieval of Conceptual Graphs. We have also implemented
operations on Conceptual Graphs. Currently WebKB-2 provides mechanisms to
support user-defined Knowledge Bases. CGWorld has a single Knowledge Base that
is accessible to all users. There is only one instance of the type and relation
hierarchies. Users can make changes only if they have the respective permissions.
These changes are immediately available to the other users of the system.

2 Logical Architecture

The architecture of CGWorld follows the idea of organizing applications in multiple
tiers.

The data layer is used to store persistently the conceptual graphs knowledge base in
a relational database.

The conceptual layer represents conceptual components such as concept, relation,
context, referent, and arc, operations for searching conceptual objects, and different
inference rules. This layer uses the data layer to store components persistently.

The application layer represents the end user logic. This layer uses the conceptual
layer to implement user-defined functionality (e. g. [2]).

[Figure: three stacked layers, from top to bottom: Application Layer, Conceptual
Layer, Data Layer]

Fig. 1. Layers
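
The dependency direction between the three layers can be sketched in plain Java. This is a minimal illustration and not the CGWorld code: the names DataLayer, InMemoryDataLayer, ConceptualLayer and LayeredApplication are hypothetical, and an in-memory map stands in for the relational database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Data layer: persistent storage of conceptual objects (hypothetical interface).
interface DataLayer {
    void store(int id, String label);
    String load(int id);
    List<Integer> ids();
}

// Assumption: a map replaces the relational database for the purpose of the sketch.
final class InMemoryDataLayer implements DataLayer {
    private final Map<Integer, String> rows = new LinkedHashMap<>();
    public void store(int id, String label) { rows.put(id, label); }
    public String load(int id) { return rows.get(id); }
    public List<Integer> ids() { return new ArrayList<>(rows.keySet()); }
}

// Conceptual layer: conceptual objects and search, built only on the data layer.
final class ConceptualLayer {
    private final DataLayer data;
    ConceptualLayer(DataLayer data) { this.data = data; }

    void addConcept(int id, String type) { data.store(id, type); }

    // Returns the ids of all stored concepts whose type label equals the given type.
    List<Integer> findConceptsByType(String type) {
        List<Integer> result = new ArrayList<>();
        for (Integer id : data.ids()) {
            if (type.equals(data.load(id))) result.add(id);
        }
        return result;
    }
}

// Application layer: end user logic, built only on the conceptual layer.
final class LayeredApplication {
    private final ConceptualLayer concepts;
    LayeredApplication(ConceptualLayer concepts) { this.concepts = concepts; }

    String describe(String type) {
        return type + ": " + concepts.findConceptsByType(type).size() + " concept(s)";
    }
}
```

Each layer holds a reference only to the layer directly below it, so the storage mechanism can be replaced without touching the application logic.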

3 Implementation View

In accordance with the new Java technologies, the current release of the CGWorld
workbench uses an application server that supports the Java 2 Enterprise Edition
(J2EE). The set of HTML and Java Server Pages (JSP) and most of the JavaBeans
components described in [3] were reused. Part of the application logic that was
previously developed as a set of JavaBeans is currently implemented as a set of
Session Enterprise JavaBeans. This facilitates the management of user sessions and
allows strict control of user rights. A set of Entity Enterprise JavaBeans represents
the persistent objects that are used to store concepts, relations, contexts, referents,
arcs and information about the knowledge base. The object model is very similar to the
UML model defined in [3, fig 2, p. 247]. This allows large amounts of data to be
maintained, and data integrity is controlled by the built-in mechanisms for
transaction maintenance.

The component-based architecture allows the implementation of new features as
standard components (Java Beans or Enterprise Java Beans) and reusing previously
developed applications or integrating new ones [4,6,7,8].

The use of Enterprise Java Beans allows the manipulation of larger amounts of data
and increased numbers of concurrent users. This allows distributed acquisition and
editing of a CG knowledge base. Applications developed on top of J2EE can be
distributed on several computers because most application servers provide this
feature. The J2EE server that we used for development and test purposes was the
Orion Application Server (http://www.orionserver.com) licensed by Oracle and sold
under the name Oracle J2EE container. We are working on an implementation that
can be used with an Open Source J2EE server (e. g. http://www.jboss.org) and we
intend to provide this version (including source code) to the CG community at ICCS
2002.

The Model-View-Controller (MVC) architecture organizes the CGWorld application
design by separating the data presentation, data representation, and the application
behavior. The main differences between the MVC architecture described in [1] and the
current one are in the data representation layer. In the previous versions [1, 3] the data
were stored in files in Prolog format. This limited the amount of manageable data
to the maximum file size that can be consulted by Prolog. In addition, updating the
knowledge base was slow because it required that the file be saved and reconsulted in
Prolog. Currently persistent data are accessed through Entity Enterprise Java Beans.
The data are stored permanently in a relational database. There is a direct object-
relational mapping between conceptual objects and tables in the database. Each object
instance is directly mapped to a row in the database table. The application server
manages the loading and storing of data, and the transaction behavior.

3.1 Mapping of the Data Layer

The Data layer is defined as a set of container managed persistence Entity Enterprise
Java Beans according to the Enterprise Java Beans 1.1 specification. This means that
it uses the built-in mechanisms for persistence of the corresponding container.

An Enterprise Java Bean consists of the remote interface, the home interface, and the
bean implementation. The remote interface exposes the methods of the EJB
to the outside world. The home interface specifies how to create and find a bean that
implements the remote interface. The bean implementation provides an
implementation of the methods specified by the remote and home interfaces.

[Figure: UML class diagram of the Arc entity EJB with three elements.
ArcBean (entity bean implementation): fields arcId, cgId, fromId, toId, all of type
Integer; remote methods getArcId(), setArcId(), getCgId(), setCgId(), getFromId(),
setFromId(), getToId(), setToId(); create methods ejbCreate(), ejbPostCreate().
ArcHome (home interface): create method create(); finder methods findAll(),
findByPrimaryKey(), findByCgId(), findByFromId(), findByToId().
Arc (remote interface): remote methods getArcId(), setArcId(), getCgId(), setCgId(),
getFromId(), setFromId(), getToId(), setToId().]

Fig 2. UML Model of Arc EJB

Fig 2 contains a UML model of the Arc Entity EJB. Arc is used to store persistently
information about arcs between concepts and relations in a given Conceptual Graph.
Arc is the Remote Interface of the Arc EJB. It contains methods for accessing the fields
that are stored persistently in the database. ArcBean is the Bean implementation. There
is no need to write any code for data persistence. The EJB container does this
automatically and manages transactions and data integrity. ArcHome is the home
interface of the Arc EJB. It defines methods for creating and finding an Arc by different
parameters like cgId (Id of the Conceptual Graph), fromId (Id of the Conceptual Object
(Concept, Context or Relation) at the beginning of the Arc) and toId (Id of the
Conceptual Object (Concept, Context or Relation) at the end of the Arc).
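
The division of work between the remote interface, the home interface and the container can be imitated in plain Java. The sketch below is not EJB code. It keeps the names Arc and ArcHome and the finder names from Fig 2, but it omits the setters, the javax.ejb base interfaces and the remote exceptions. The create signature with four arguments and the class InMemoryArcHome are assumptions made for illustration, with a map playing the role of the container and the ARC table.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Plain-Java counterpart of the Arc remote interface (read accessors only).
interface Arc {
    Integer getArcId();
    Integer getCgId();
    Integer getFromId();
    Integer getToId();
}

// Plain-Java counterpart of ArcHome: creation and finder methods.
interface ArcHome {
    Arc create(Integer arcId, Integer cgId, Integer fromId, Integer toId);
    Arc findByPrimaryKey(Integer arcId);
    Collection<Arc> findByCgId(Integer cgId);
    Collection<Arc> findByFromId(Integer fromId);
    Collection<Arc> findByToId(Integer toId);
}

// Hypothetical stand-in for the container: one map entry per arc.
final class InMemoryArcHome implements ArcHome {
    private final Map<Integer, Arc> rows = new LinkedHashMap<>();

    public Arc create(final Integer arcId, final Integer cgId,
                      final Integer fromId, final Integer toId) {
        if (rows.containsKey(arcId)) {
            throw new IllegalArgumentException("duplicate primary key: " + arcId);
        }
        Arc arc = new Arc() {
            public Integer getArcId() { return arcId; }
            public Integer getCgId() { return cgId; }
            public Integer getFromId() { return fromId; }
            public Integer getToId() { return toId; }
        };
        rows.put(arcId, arc);
        return arc;
    }

    // Returns null when no arc has the given key; this sketch does not model finder exceptions.
    public Arc findByPrimaryKey(Integer arcId) { return rows.get(arcId); }

    public Collection<Arc> findByCgId(Integer cgId) { return select(Arc::getCgId, cgId); }
    public Collection<Arc> findByFromId(Integer fromId) { return select(Arc::getFromId, fromId); }
    public Collection<Arc> findByToId(Integer toId) { return select(Arc::getToId, toId); }

    // Shared finder logic: keep the arcs whose selected field equals the wanted value.
    private Collection<Arc> select(Function<Arc, Integer> field, Integer wanted) {
        Collection<Arc> matches = new ArrayList<>();
        for (Arc arc : rows.values()) {
            if (wanted.equals(field.apply(arc))) matches.add(arc);
        }
        return matches;
    }
}
```

A client only sees Arc and ArcHome. With container managed persistence the finder logic written by hand here is supplied by the container from the deployment descriptor.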

The mapping to the database table is defined in the XML deployment descriptor of
the CGWorld application. For the JBoss application server this is defined in the files
ejb-jar.xml, jaws.xml and jboss.xml located in the META-INF subdirectory of the
application. The Arc bean is stored persistently in table ARC that has fields ARC_ID,
CG_ID, FROM_ID and TO_ID. As mentioned above there is a direct mapping
between EJB instances and rows in the table. For the Arc EJB this means that every
Arc instance in the container has a corresponding row in the table.
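
What the direct mapping amounts to for one Arc instance can be illustrated with a small Java sketch. The table and column names are those given in the text. The class ArcRowMapping and the generated statement are assumptions for illustration only, since with container managed persistence the container produces the actual SQL.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ArcRowMapping {
    // Maps the four persistent fields of one Arc instance to the columns of table ARC.
    static Map<String, Integer> toRow(int arcId, int cgId, int fromId, int toId) {
        Map<String, Integer> row = new LinkedHashMap<>();
        row.put("ARC_ID", arcId);
        row.put("CG_ID", cgId);
        row.put("FROM_ID", fromId);
        row.put("TO_ID", toId);
        return row;
    }

    // Builds a parameterized INSERT for the row; the values would be bound separately.
    static String insertStatement(Map<String, Integer> row) {
        String columns = String.join(", ", row.keySet());
        StringBuilder marks = new StringBuilder();
        for (int i = 0; i < row.size(); i++) {
            marks.append(i == 0 ? "?" : ", ?");
        }
        return "INSERT INTO ARC (" + columns + ") VALUES (" + marks + ")";
    }
}
```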

The container loads EJB instances into memory only when they are needed. This
allows large amounts of data to be handled using this model. Another advantage that
we gain from using EJB is that the implementation of the application is independent
of the choice of the particular database. The current version of CGWorld uses the
MySQL database, which is Open Source.

Fig 3 shows a UML Model of the Data Layer that is used to store conceptual
information. Here is a short description of the Entity Enterprise Java Beans given in
the model:

• CgBean – represents a conceptual graph;
• CgcBean – represents a concept;
• RelationBean – represents a conceptual relation;
• ArcBean – represents an arc between a concept and a relation in a conceptual
graph;
• RegistryBean – this is a registry of all conceptual objects. For every given
identifier this registry contains the type of the conceptual object, which is
represented by the TypeBean;
• TypeBean – contains a list of all allowed types of conceptual objects.
Currently they are Conceptual Graph, Simple Concept, Context (situation or
proposition), Conceptual Relation, Coreference Link and Arc;
• FsBean – represents name-value pairs that can be attached to a given
conceptual object. For example a concept usually has properties like number
(single or plural), definite marker, and/or quantifier (every or lambda). This
also allows external applications to attach additional information that will be
processed by CGWorld. Users can modify this information through the
properties dialog in the Conceptual Graph Editor;
• ReferentBean – represents Coreference Links;
• HierarchyBean – represents hierarchies among conceptual objects. The Type
and Relation hierarchies are represented using this bean;
• RootsBean – stores information about the root element of a particular
hierarchy.

[Figure: UML class diagram giving an overview of the Entity Enterprise Java Beans
that are used for representing persistent data. Beans and attributes as far as they
can be recovered from the diagram:
CgBean: cgId : Integer, cgif : String, cgpro : String, comment : String
CgcBean: id : Integer, typeId : Integer, cgId : Integer, name : String, comment : String
RelationBean: cgrId : Integer, name : String, comment : String
ArcBean: arcId : Integer, cgId : Integer, fromId : Integer, toId : Integer
RegistryBean: id : Integer, typeId : Integer
FsBean: cgId : Integer, name : String, value : String
TypeBean: typeId : Integer, name : String
ReferentBean: refId : Integer, cgId : Integer, fromId : Integer, toId : Integer,
name : String
RootsBean: id : Integer, rootId : Integer, typeId : Integer, hierarchyId : Integer
HierarchyBean: id : Integer, parentId : Integer, hierarchyId : Integer]

Fig 3. Entity EJB Model
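
One way the rows of HierarchyBean could be used to answer subtype questions is sketched below. The attributes id, parentId and hierarchyId are taken from Fig 3. Everything else is an assumption: the class names are hypothetical, the rows are held in a list instead of a database, and a type is treated as a subtype of itself. The paper does not describe how CGWorld traverses its hierarchies.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// One parent link, mirroring the attributes of HierarchyBean.
final class HierarchyRow {
    final int id;
    final int parentId;
    final int hierarchyId;

    HierarchyRow(int id, int parentId, int hierarchyId) {
        this.id = id;
        this.parentId = parentId;
        this.hierarchyId = hierarchyId;
    }
}

final class HierarchySketch {
    private final List<HierarchyRow> rows = new ArrayList<>();

    void addLink(int id, int parentId, int hierarchyId) {
        rows.add(new HierarchyRow(id, parentId, hierarchyId));
    }

    // True if ancestorId is reachable from id by following parent links
    // within the given hierarchy; every id counts as a subtype of itself.
    boolean isSubtype(int id, int ancestorId, int hierarchyId) {
        Deque<Integer> pending = new ArrayDeque<>();
        Set<Integer> seen = new HashSet<>();
        pending.push(id);
        while (!pending.isEmpty()) {
            int current = pending.pop();
            if (current == ancestorId) return true;
            if (!seen.add(current)) continue; // already expanded, e.g. via a second parent
            for (HierarchyRow row : rows) {
                if (row.hierarchyId == hierarchyId && row.id == current) {
                    pending.push(row.parentId);
                }
            }
        }
        return false;
    }
}
```

Because several rows may share the same id, the sketch also covers hierarchies in which a type has more than one supertype.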

4 Features

This section describes the procedures for extending the knowledge base with new
CGs and the additional representation format for CGs now supported by CGWorld.

The conceptual graphs formats currently supported by CGWorld are Display Form,
First Order Logic, CGPro format and the newly implemented XCG.

The basic way to add CGs is by manually creating and editing them with the graphical
CG editor. The latest version of CGWorld includes two additional methods for
creating CGs. These are automatic acquisition from natural language and derivation
from existing CGs through canonical formation rules.

The Conceptual Graphs Editor is a user-friendly graphical editor for CGs. It was
described in an earlier paper [3]. Until recently CGs were created only through this
editor.

Integration of CGExtract [2] into CGWorld makes it possible to extend the
knowledge base of CGs from natural language text. The input format is controlled
English. All CGs automatically generated by CGExtract include comments that are
the sentences used for the CG acquisition. These comments are used during search
and displayed as part of the graphs.

CG operations can also be used for automatic generation of conceptual graphs from
other CGs in the knowledge base. The inference rules for conceptual graphs supported
by CGWorld are join, generalization, specialization, projection, type extraction and
type contraction. They were implemented for simple graphs, graphs with identity lines
and some special complex graphs. The user can request operations and specify their
arguments through a user-friendly interface. A detailed description and screenshots of
the web interface for the operations can be found in [1].

CGWorld now supports an additional representation format for CGs. The added CG
format is XCG. XCG is an XML linearization of a subset of the CG model. XML is
widely used as a platform-independent format for information exchange. Support of
this format is developed according to [13] and Peter Becker’s work in the CGXML
project (http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/tockit/cgxml).

5 Knowledge Base

CGWorld was used to develop a Knowledge Base from the financial domain [12].
This Knowledge Base is an excerpt from the KB of the LARFLAST (LeARning
Foreign LAnguage Scientific Terminology¹) Project. Conceptual Graphs are used as a
knowledge representation core in the complex language-learning environment defined
in LARFLAST [4]. The type hierarchy and the Display, CGIF and CGPro forms of the
CGs in this Knowledge Base can be found in [12].

As mentioned in [1], the main format used for processing the Knowledge Base is the
Java format. All other formats are translated to and from this format. For better
performance the CGIF and CGPro formats are stored in the database and access to
them is implemented through CgBean. This allows search to be implemented directly
through the EJB find methods. The EJB container loads only the EJBs that match a
given query. In the previous versions of CGWorld the whole knowledge base was loaded
into memory. Because the components provide remote interfaces by default, large
numbers of user requests can be handled without writing additional code. Most
of the current implementations of EJB containers allow clustering of EJBs. Using this
feature it is possible for EJB applications to be run on several computers without
writing additional code.

¹ INCO Copernicus'98 Joint Research Project #977074

Fig 4. Knowledge Base in visual and CGIF form.

Fig. 4 is an example that shows the conceptual graphs "A convertible bond is one
which is convertible into the company's common stock", "When a bond is converted
to common stock, the corporate debt is reduced" and "A bond is converted into
common stock", each in both display and CGIF form.

The other representations that CGWorld supports are CGPro, FOL and XCG. The
graph “A bond is converted into common stock” (Fig 4.) in CGPro is:

cgc(55,simple,'bond',[fs(num,sing)],[]).
cgc(53,simple,'common_stock',[fs(num,sing)],[]).
cg(155,[cgr(convert_into, [55, 53], _)],
   none,
   [fs(kind,'body_of_context'),
    fs(comment,'A bond is converted into common stock')]).

The FOL representation is:

exists(A1,exists(A0,convert_into(A0,A1)
& bond(A0) & common_stock(A1)))
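
The shape of this formula can be reproduced with a few lines of Java. The sketch below is an assumption inferred from this single example and not the CGWorld converter: it handles one relation whose arguments are distinct, existentially quantified concepts, names the variables A0, A1 and so on in argument order, and nests the quantifiers so that the last variable is outermost. The class name FolSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class FolSketch {
    // Builds exists(...) around "relation(A0,...,An) & type0(A0) & ... & typen(An)".
    static String toFol(String relation, List<String> conceptTypes) {
        StringBuilder body = new StringBuilder(relation).append('(');
        for (int i = 0; i < conceptTypes.size(); i++) {
            if (i > 0) body.append(',');
            body.append('A').append(i);
        }
        body.append(')');
        // One unary predicate per concept, applied to that concept's variable.
        for (int i = 0; i < conceptTypes.size(); i++) {
            body.append(" & ").append(conceptTypes.get(i))
                .append("(A").append(i).append(')');
        }
        // Wrap the conjunction in one quantifier per variable, A0 innermost.
        String formula = body.toString();
        for (int i = 0; i < conceptTypes.size(); i++) {
            formula = "exists(A" + i + "," + formula + ")";
        }
        return formula;
    }

    public static void main(String[] args) {
        System.out.println(toFol("convert_into", Arrays.asList("bond", "common_stock")));
    }
}
```

For the graph above the sketch yields the same formula on a single line.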

The XCG representation is:

<relation type="convert_into">
  <concept type="bond">
    <number type="single" />
  </concept>
  <concept type="common_stock">
    <number type="single" />
  </concept>
</relation>
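
Since XCG is plain XML, a fragment like this one can be read with the XML parser that ships with the JDK. The following sketch extracts the relation type and the concept types from the fragment. It relies only on the element and attribute names visible in this example, which may not cover the full XCG format of [13], and the class name XcgReaderSketch is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XcgReaderSketch {
    // Returns the relation type followed by the types of its concepts in document order.
    static List<String> readRelation(String xcg) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xcg)));
            Element relation = doc.getDocumentElement();
            List<String> result = new ArrayList<>();
            result.add(relation.getAttribute("type"));
            NodeList concepts = relation.getElementsByTagName("concept");
            for (int i = 0; i < concepts.getLength(); i++) {
                result.add(((Element) concepts.item(i)).getAttribute("type"));
            }
            return result;
        } catch (Exception e) {
            // Parser configuration, I/O and SAX errors are all reported the same way here.
            throw new IllegalArgumentException("not a readable XCG fragment", e);
        }
    }
}
```

The returned list carries enough information to rebuild the other linear forms of this simple graph, which is the kind of conversion between representations that the text describes.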

The XCG, CGPro, CGIF and Java representations are equivalent. The Conceptual
Graphs can be converted from one representation to another. Currently FOL is
supported for a limited number of graphs and only as an output format. The modules
of CGWorld process Conceptual Graphs both in Java and Prolog representations. For
example, the conversion, searching, browsing and editing of Conceptual Graphs are
implemented in Java. Conceptual Graph operations are implemented in Prolog.

6 Used Software

• JDK 1.3.1-03 from Sun Microsystems (http://java.sun.com).
• SICStus Prolog from the Swedish Institute of Computer Science
(http://www.sics.se/sicstus). It allows for easy integration with Java.
• Tomcat - a servlet container with a JSP environment. A servlet container is a
runtime shell that manages and invokes servlets on behalf of users. Tomcat was
developed by the Apache Software Foundation as part of the open source Jakarta
Project (http://jakarta.apache.org).
• JBoss (http://www.jboss.org) - Open Source implementation of the Java 2 Enterprise
Edition specification. Currently we use version 2.2.4 of the product, which is bundled
together with Tomcat 4.0.1.
• MySQL (http://www.mysql.com) - Open Source Relational Database from
MySQL AB.

7 Conclusion

During the last three years CGWorld implemented different architectural concepts
and its development reflects the evolution of the authors’ understanding of enterprise
architectures. The general idea was to provide a set of components that can be used as
building blocks for CG applications and the authors continue to work in this direction.

8 Future Work

Currently CGWorld is implemented as a Java Application. It is not straightforward to
integrate it in applications written in a language other than Java. Supporting different
formats for knowledge representation allows importing and exporting of data to other
applications. Using XML as a format for knowledge representation allows data to be
exchanged between different applications. The next step that we intend to undertake is
the extension of CGWorld components to Web services. Web services, as the name
implies, are services offered via the Web. In a typical Web services scenario, a
business application sends a request to a service at a given URL using the SOAP
protocol over HTTP. The service receives the request, processes it, and returns a
response. The idea is to offer a set of services for processing conceptual data and
exporting it in different formats. This will allow integration with applications written
in languages other than Java, for example languages supported by the Microsoft .NET
platform such as C#, C, J#, Visual Basic etc.

9 Acknowledgements

We would like to thank all researchers involved in the LARFLAST (LeARning
Foreign LAnguage Scientific Terminology²) project and especially Galia Angelova,
Albena Strupchanska, Svetla Boytcheva, Ani Nenkova and Toma Nikolov, who helped
us with the programming and knowledge base development.

10 References

1. P. Dobrev, A. Strupchanska and K. Toutanova. CGWorld-2001 - New
Features and New Directions. CGTools Workshop at ICCS 2001.
2. S. Boytcheva, P. Dobrev and G. Angelova. CGExtract: Towards
Extraction of Conceptual Graphs from Controlled English. In: G. W.
Mineau (Ed.), Conceptual Structures: Extracting and Representing
Semantics, Contributions to ICCS 2001, pp. 89-101.
3. P. Dobrev and K. Toutanova. CGWorld - A Web Based Workbench for
Conceptual Graphs Management and Applications. In: G. Stumme (Ed.),
Working with Conceptual Structures, Contributions to ICCS 2000,
Shaker Verlag, Germany, pp. 243-256.
4. G. Angelova, A. Nenkova, S. Boycheva and T. Nikolov. Conceptual
Graph as a Knowledge Representation Core in a Complex Language
Learning Environment. In: G. Stumme (Ed.), Working with Conceptual
Structures, Contributions to ICCS 2000, Shaker Verlag, Germany, pp.
45-58.
5. Conceptual Graph Standard Information Technology (IT) - Conceptual
Graphs draft proposed American National Standard (dpANS)
NCITS.T2/98-003 (http://www.bestweb.net/~sowa/cg/cgdpansw.htm).
6. G. Angelova, K. Toutanova and S. Damianova. Knowledge Base of
Conceptual Graphs in DBR-MAT. University of Hamburg, Computer
Science Faculty, Project DBR-MAT (funded by the Volkswagen
Foundation). Technical Report BG-3-98, July 1998.
7. G. Angelova, S. Damianova, K. Toutanova and K. Bontcheva. Menu-Based
Interfaces to Conceptual Graphs: The CGLex Approach. In Proc. ICCS
1997, LNAI 1257, Springer, 1997, pp. 603-606.
8. G. Angelova and K. Bontcheva. DB-MAT: Knowledge Acquisition,
Processing and NL Generation Using Conceptual Graphs. In Proc. ICCS
1996, LNAI 1115, Springer, 1996, pp. 115-129.
9. S. Pollitt, A. Burrow and P. Eklund. WebKB-GE - A Visual Editor for
Canonical Conceptual Graphs. ICCS 1998, pp. 111-118.
10. H. Delugach. CharGer - A Conceptual Graph Editor written by Harry
Delugach (http://www.cs.uah.edu/~delugach/CharGer/).
11. H. Delugach. CharGer: Some Lessons Learned and New Directions.
Working with Conceptual Structures, Contributions to ICCS 2000, pp.
306-309.
12. A. Strupchanska, P. Dobrev, S. Boytcheva, T. Nikolov and K. Toutanova.
Sample Knowledge Base in Finance. Contribution to CGTools Workshop
at ICCS 2001 (http://www.ksl.stanford.edu/iccs2001/CGTools/).
13. M. Altheim. XML Conceptual Graphs (XCG) 1.0. Sun Microsystems
Technical Report, 23 August 2001.
14. Ph. Martin and P. Eklund. Large-scale cooperatively-built heterogeneous
KBs. Proceedings of ICCS'2001, 9th International Conference on
Conceptual Structures (Springer Verlag, LNAI 2120, pp. 231-244),
Stanford University, California, US.

² INCO Copernicus'98 Joint Research Project #977074
