A Reverse Engineering Tool For Precise Class Diagrams: January 2004

A reverse engineering tool for precise class diagrams

Conference Paper · January 2004


DOI: 10.1145/1034914.1034917 · Source: DBLP



This paper has been accepted at CASCON 2004.

A Reverse Engineering Tool for Precise Class Diagrams


Yann-Gaël Guéhéneuc

Département d’informatique et de recherche opérationnelle


Université de Montréal – CP 6128 succ. Centre Ville
Montréal, Québec, H3C 3J7 – Canada
guehene@iro.umontreal.ca

Abstract

Developers use class diagrams intensively to describe the architecture of their programs. Class diagrams represent the structure and global behaviour of programs. They show the programs' classes and interfaces and their relationships of inheritance, instantiation, use, association, aggregation, and composition. Class diagrams could provide useful data during program maintenance. However, they are often obsolete and imprecise: They do not reflect the real implementation and behaviour of programs. We propose a reverse-engineering tool suite, Ptidej, to build precise class diagrams from Java programs, with respect to their implementation and behaviour. We describe static and dynamic models of Java programs and algorithms to analyse these models and to build class diagrams. In particular, we detail algorithms to infer use, association, aggregation, and composition relationships, because these relationships do not have precise definitions. We show that class diagrams obtained semi-automatically are similar to those obtained manually and more precise than those usually provided.

Copyright © 2004 Yann-Gaël Guéhéneuc. Permission to copy is hereby granted provided the original copyright notice is reproduced in copies made.
This work has been partly funded by IBM OTI Labs – 2670 Queensview Drive – Ottawa, Ontario, K2B 8K1 – Canada.

1 Introduction
Software developers use UML-like class diagrams intensively during development to describe the architecture of object-oriented programs. Class diagrams represent the structure and global behaviour of programs [13], showing classes, interfaces, and their relationships [16]. They help software developers by abstracting implementation details and by presenting an easier-to-grasp clustered view of the programs' lines of code [17].

Class diagrams would help software maintainers to understand programs' architecture and to locate places requiring modifications during maintenance. However, they are often obsolete (unsynchronised with the concrete implementation of programs) when existing at all [6].

Software maintainers need tools to recover class diagrams from programs' source code and binaries, which are usually the only sources of data available during the maintenance process and which can be used to build both static and dynamic models of programs.

We present Ptidej (Pattern Trace Identification, Detection, and Enhancement in Java), a reverse engineering tool suite to build class diagrams from static and dynamic models of Java programs semi-automatically. This paper summarises our previous work on program architecture recovery started in 2000 at École des
Mines de Nantes and being pursued at University of Montréal. The main contribution of this paper is an overview of our tool suite and a complete example of static and dynamic analyses, using the JHotDraw program, which exemplifies the need for precise architecture recovery for maintainers.

By precise architecture recovery, we mean that the recovered architecture reflects the implementation of the analysed programs and that it provides the same data as if recovered by maintainers manually, in particular with respect to usual UML constituents: Classes, interfaces, inheritance, instantiation, use, association, aggregation, and composition relationships.

In Section 2, we present related work briefly and discuss its limitations. Then, we introduce the Ptidej tool suite: Its models and tools. We also sketch the use of the tool suite. In Section 3, we detail the definitions and algorithms to identify relationships among classes, interfaces, and their instances, and discuss their precision and recall. In Section 4, we apply our tool suite on the real-world JHotDraw program. We show that the class diagram obtained semi-automatically is similar to one obtained manually yet more precise than the one provided. Finally, in Section 5, we conclude and present future work on and with the Ptidej tool suite.

2 Tool Suite
There exist several reverse engineering tools to recover class diagrams from program implementations. We present three typical tools briefly, discuss their limitations, and summarise the contributions of our tool suite. Then, we detail our models and tools, and present a short example.

2.1 Related Work

Chava. Korn et al. propose Chava [15], a reverse engineering tool dedicated to Java applets. Chava creates a repository that contains the structure of a program from the source code or class files of the program. A repository stores entities that represent classes, interfaces, packages, files, methods, fields, and referenced strings. Entities are linked with subclass, implementation, declaration, field access/modification, and method call relationships. A repository contains a model of the program, which reproduces a program's static model exactly and does not increase its precision with respect to other kinds of relationships, such as use, association, aggregation, or composition, or with dynamic data.

Womble. Jackson and Waingold propose Womble [13], a tool for the lightweight extraction of object models, which are similar to class diagrams. The latest version of Womble is able to analyse programs' class files and to identify inheritance, use, and association relationships. It proposes heuristics to infer multiplicities of the origin and target classes, which distinguish association and aggregation. However, the authors do not attempt to identify composition relationships to increase the level of precision of the recovered object models.

CASE Tools. CASE tools, such as ArgoUML and Rational Rose, offer reverse engineering capabilities, but their capabilities are very limited. They only distinguish use, association, aggregation, and composition relationships graphically. Indeed, they use identical algorithms to reverse engineer use, association, aggregation, and composition relationships, which leads to inconsistency with programs' implementations [10].

Discussion. Existing reverse engineering tools are only capable of identifying structural relationships, existing physically in the static models of Java programs. They are incapable of abstracting relationships, which must be inferred both from static and dynamic models of programs. In particular, they lack precise definitions and algorithms to identify (or to distinguish) use, association, aggregation, and composition relationships.

The Ptidej tool suite is different from existing reverse engineering tools because it uses both static and dynamic data to infer relationships among classes and interfaces. It is able to infer inheritance, instantiation, use, association, aggregation, and composition relationships among classes and interfaces to represent
precisely programs. Thus, it helps software maintainers to grasp and to understand programs' architecture.

The recovery of programs' class diagrams decomposes in two steps: First, class diagrams are built from static data provided by Java programs' class files; Second, class diagrams are refined with dynamic data obtained by analysing the runtime behaviour of the programs.

2.2 Models
We use three different models to represent and to analyse static and dynamic data about Java programs and to describe class diagrams.

Static Model. We use class files composing Java programs as static models. Class files embody statically all the data provided by software developers about a program's architecture and about its runtime behaviour. They are easier to manipulate than source code, using specialised tools such as CFParse [8] and Javassist [4], because of their structure and of the processing performed at compilation-time, in particular type binding. Also, class files are always available, whereas source code is not.

Dynamic Model. We use traces as models of the runtime behaviour of Java programs. A trace is a history of execution events: Field accesses/modifications; Class loads/unloads; Method, constructor, and finalizer entries/exits; Program end [11]. A program has one and only one static model but several (possibly an infinity of) dynamic models. Thus, dynamic models only approximate programs' behaviour.

Class Diagram Model. We develop a meta-model, PADL (Pattern and Abstract-level Description Language) [2], to describe programs as class diagrams. PADL offers constituents, such as Model, Class, Method, Relationship, with which we can build class diagrams representing programs. It also offers methods to manipulate class diagrams easily and to generate other representations of class diagrams, using the Visitor design pattern.

2.3 Tools
The Ptidej tool suite decomposes into three tools¹: To analyse static models; To generate and to analyse dynamic models; To build class diagrams from the analyses.

¹ All tools are available at: www.yann-gael.gueheneuc.net/Work/

PADL ClassFile Creator. Different algorithms to analyse static models can be connected to the PADL meta-model, using the Bridge design pattern, to create class diagrams from different sources of data. We offer a default implementation of such a creator, PADL ClassFile Creator, that analyses static models of Java programs and creates the corresponding class diagrams by instantiating the constituents of the meta-model.

For each constituent C of the PADL meta-model, the PADL ClassFile Creator declares a recognizeC() method used to identify constructs corresponding to C in static models of Java programs and to instantiate C with the appropriate data from the constructs. Methods recognizeCPriority() order the identifications of constituents. (Creators for AOL files [3] and C++ files also exist.)

Caffeine. We develop a tool for the dynamic analysis of Java programs. Caffeine [11] is a 100%-pure Java program that generates and analyses dynamic models of Java programs on the fly. Analyses of dynamic models are performed with Prolog predicates. We use Prolog because of its unification and backtrack mechanisms and its high-level pattern-matching capabilities.

A Prolog engine runs as a co-routine of the Java program under analysis. It controls the program execution with the nextEvent/3 predicate. The nextEvent/3 predicate unifies a Prolog variable with the last event generated by the program, according to filters and to a list of expected events. Then, dedicated predicates may analyse the set of events.

We develop two Prolog predicates to analyse dynamic models and to assess the presence of composition relationships among classes, interfaces, and their instances: instanceLevelCompositions/1 and
classLevelCompositions/1. We use the results of these analyses to refine class diagrams of Java programs with dynamic data on relationships among their classes and interfaces.

Ptidej. The Ptidej tool is a front-end for PADL, PADL ClassFile Creator, and Caffeine. We implement this front-end both as a stand-alone 100%-pure Java program and as a plug-in for the Eclipse development environment for Java.

First, a maintainer selects a program static model, for example Java class files. The front-end calls the appropriate creator, for example the PADL ClassFile Creator, to build the corresponding class diagram, using the PADL meta-model. Then, the maintainer refines the class diagram by loading results from the dynamic analysis of the program with the Caffeine tool. Figure 1 summarises the data flow among tools. Thus, a maintainer builds semi-automatically (with user interactions) a class diagram representing the concrete implementation of a program. The front-end displays class diagrams with a dedicated graphic library and different layout algorithms, using the Strategy design pattern.

Figure 1: Flow of the data among tools (boxes: Java class files / Jar files; appropriate static analyses, PADL ClassFile Creator (CFParse); appropriate dynamic analyses, Caffeine (Prolog engine); Ptidej, merge of the analyses results and display of the class diagram; class diagram model)

2.4 Example
We exemplify the Ptidej tool suite with a simple document description program. We present a complete example in Section 4. Figure 2(a) shows the stand-alone Ptidej front-end. It consists of two panels to display class diagrams (left) and to control the tool (right). In addition to classes and interfaces, we can display the names and graphical representations of the relationships among classes, interfaces, and their instances (checkboxes on the right).

Figure 2(b) shows the class diagram displayed for the document description program, with the classes and interfaces declared in the selected class files (in black) and those that are only known through references² (in gray). We only display graphical representations of inheritance, aggregation, and composition relationships, and names of use, association, and instantiation relationships. This class diagram is built by the PADL ClassFile Creator tool and laid out with a simple layout algorithm, which minimises the crossing of inheritance representations. It shows an aggregation relationship (white triangle) between classes Document and Element, which represent a document and its structure respectively.

² We call ghosts classes and interfaces known only by references from analysed classes and interfaces: They surround analysed classes and interfaces but we do not know much about them.

Figure 3(a) shows the Caffeine tool to analyse the document description program dynamically. This tool runs as a co-routine of the program and analyses generated events. Figure 3(b) shows the output generated by the tool after analysing an execution of the document description program. The output shows that a composition relationship exists between classes Document and Element, which is more precise than the aggregation relationship found by static analysis. Figure 3(c) shows the class diagram refined with the results from the dynamic analysis: A composition relationship (black triangle) replaces the aggregation relationship.

Figure 2: Use of the Ptidej tool suite on a document description program
(a) The Ptidej front-end.
(b) The Ptidej front-end showing the class diagram obtained from the PADL ClassFile Creator tool.

Figure 3: Use of the Ptidej tool suite on a document description program (cont'd)
(a) The Caffeine tool.
(b) Data obtained from the Caffeine tool when analysing the program dynamically:

    Caffeine initialized.
    Caffeine started.
    (Remote JVM)
    (Remote JVM) . . .
    (Remote JVM)
    List of instance-level compositions:
    composition(
      jtu.example.composite2.Document, 1006,
      jtu.example.composite2.Element, 1011,
      true)
    List of class-level compositions:
    composition(
      jtu.example.composite2.Document,
      jtu.example.composite2.Element,
      true)

(c) The Ptidej tool displaying the class diagram refined with dynamic data.

3 Relationships
We now detail the definitions and algorithms that we use to identify the inheritance, instantiation, use, association, aggregation, and composition relationships. The recovery of classes and interfaces does not pose any problem because classes and interfaces exist in static models explicitly.

3.1 Inheritance, Instantiation
The inheritance and instantiation relationships are straightforward to identify in programs' static models because they exist in source code or class files physically.

Inheritance. In a static model of a program, classes and interfaces declare the classes and interfaces they implement or extend explicitly. An algorithm to infer inheritance relationships needs only to iterate over classes and interfaces and to retrieve their superclasses and superinterfaces syntactically.

Instantiation. Static initialisers, instance initialisers (constructors), and methods may contain object instantiations. An algorithm to infer instantiation relationships needs only to iterate over the byte-codes of each initialiser and method, looking for New, NewArray, ANewArray, MultiANewArray byte-codes.

3.2 Use, Association, Aggregation, Composition
The use, association, aggregation, and composition relationships are difficult to identify because they lack precise definitions. They do not appear in programs' static models explicitly and they require the use of dynamic models.

From the literature, we propose consensual definitions of these relationships [10], which decompose in four properties: Exclusivity, invocation site, lifetime, and multiplicity. The four properties are minimal and we use these to develop algorithms to identify the relationships.

Definitions. Links among classes, interfaces, and their instances exist at runtime to allow method invocations and data access. These links are described by different relationships in class diagrams. Conceptually, as in the UML, a relationship is non-oriented and may exist among two or more classes (or interfaces). Practically, however, most authors agree (see [10]) that use, association, aggregation, and composition relationships involve the instances of two classes, an origin and a target, respectively A and B, and that these relationships are oriented, irreflexive, anti-symmetric at instance and class level, and asymmetric at instance level [12].

First, we propose that an association between A and B defines the ability of an instance of A to send a message to an instance of B. Nothing prevents other relationships from existing between classes B and A.

Second, we say that an association between A and B is an aggregation relationship if the definition of A, the whole, contains instances of B, the part. The whole must define a field (or an array field, or field of type collection) of the type of its part. Instances of the whole send messages to the instances of the part. Subclasses inherit the aggregation relationship between A and B, because subclasses inherit the structure and behaviour of their superclasses.

Third, we define a composition as an aggregation with constraints on the lifetimes of the whole and of the parts and on the ownership of the parts. An instance of the whole owns the instances of its part. The instances of the part are exclusive to the instance of their whole. Parts can be exchanged during the lifecycle of the whole, but all the parts owned by a whole at the moment of its destruction are also destroyed. A composition relationship only allows an association relationship between its part and whole, to ensure the exclusivity and lifetime properties.

Finally, the use relationship is the default relationship between two classes when these are not linked through an association, an aggregation, or a composition relationship. For example, a use relationship exists between two interfaces IA and IB if interface IA defines methods whose signatures use interface IB. The use relationship is similar to the association relationship but only suggests that messages may be sent. We do not further consider use relationships in the rest of this section, because their recovery is fairly easy from source code (method parameters, return types...).

Table 1 summarises the possible links among classes, interfaces, and their instances, and their representations as relationships. For example, Table 1 states (third row, on the right) that an aggregation relationship exists between two classes A and B if two instances of these classes, respectively a and b, are linked together such that a sends messages to b and b is a field of A (on the left).

Properties. The definitions of the relationships use four language-independent properties. The association relationship allows multiple instances of A and B to take part in the relationship, while the aggregation and composition relationships allow multiple instances of B to be in a relationship with one instance of A. With an aggregation relationship, instances of A access instances of B through a particular invocation site: Field, array field, or field of type collection. With a composition relationship, instances of B are exclusive to their corresponding instance of A and instances of A and B have related lifetimes.

The exclusivity property states whether an instance of a class involved in a relationship can be in another relationship at a given time.

    EX(A, B) ∈ {true, false}

We name 𝔹 the set {true, false}. The value true states that an instance of B can take part in another relationship with another instance of A or of another class. The value false indicates that it cannot. The exclusivity property holds at a given time only. It does not prevent possible exchanges.

The invocation site property indicates that instances of A, involved in a relationship, send messages to instances of B.

    IS(A, B) ⊆ {field, array field, collection, parameter, local variable}

The values of the IS property summarise possible invocation sites for messages sent from instances of A to instances of B. There can be no message sent from A to B: IS(A, B) = ∅, or messages can be sent from A through a {field} of type B, an {array field}, a field of type {collection}, a method {parameter}, or a method {local variable}. We name yes the set {field, array field, collection, parameter, local variable}.

The lifetime property constrains the lifetime of all the instances of B with respect to the lifetime of all the instances of A. It corresponds to the time elapsed between the times of destruction LTd of two instances of A and B [5]. The time is in any convenient unit, for example in seconds or in CPU ticks. In programming languages with garbage collection, LTd matches the moment where an instance is ready for garbage collection.

    LT(A, B) = LTd(A) − LTd(B) ∈ {+, −}

We name 𝕜 the set {+, −}. LT(A, B) = + if instances of B are destroyed before the corresponding instances of A, LT(A, B) = − if destroyed after, and LT(A, B) ∈ 𝕜 if their times of destruction are unrelated (either + or −).

The multiplicity property describes the number of instances of B allowed in a relationship with A.

    MU(A, B) ⊂ ℕ ∪ {+∞}

For the sake of simplicity, we use an interval of the minimum and maximum numbers of instances to represent the multiplicity. We consider multiplicity at the target end of a relationship only. The interested reader may refer to [13] for a discussion on multiplicities at both ends of a relationship.

Formalisation. We now formalise the definitions of the relationships with the four properties to build identification algorithms. We define an association relationship between A and B, AS(A, B), as:

    AS(A, B) =
        (IS(A, B) ⊆ yes) ∧ (IS(B, A) = ∅) ∧
        (EX(A, B) ∈ 𝔹) ∧ (EX(B, A) ∈ 𝔹) ∧
        (LT(A, B) ∈ 𝕜) ∧ (LT(B, A) ∈ 𝕜) ∧
        (MU(A, B) = [0, +∞]) ∧
        (MU(B, A) = [0, +∞])
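As a rough illustration of how property values could be represented and tested, the sketch below encodes the invocation sites as an enum, computes the lifetime property as the sign of the difference between two destruction times, and transcribes the association definition literally. The type and method names are hypothetical and are not part of the Ptidej tool suite; treating equal destruction times as − is an assumption of the sketch.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative sketch; the type and method names are hypothetical, not part of Ptidej.
final class RelationshipProperties {

    /** Invocation sites that the IS property can contain. */
    enum Site { FIELD, ARRAY_FIELD, COLLECTION, PARAMETER, LOCAL_VARIABLE }

    /** The set named "yes": all five invocation sites. */
    static final Set<Site> YES = EnumSet.allOf(Site.class);

    /**
     * LT(A, B) as the sign of LTd(A) - LTd(B): '+' when the instance of B
     * is destroyed before the instance of A, '-' otherwise.
     * Mapping equal destruction times to '-' is an assumption of this sketch.
     */
    static char lifetime(long destructionTimeOfA, long destructionTimeOfB) {
        return destructionTimeOfA > destructionTimeOfB ? '+' : '-';
    }

    /**
     * AS(A, B), transcribed literally: IS(A, B) is a subset of "yes" and
     * IS(B, A) is empty. EX and LT may take any value and the interval
     * [0, +inf] admits any number of instances, so they add no restriction.
     * As written, an empty IS(A, B) also passes the subset test.
     */
    static boolean isAssociation(Set<Site> sitesFromAToB, Set<Site> sitesFromBToA) {
        return YES.containsAll(sitesFromAToB) && sitesFromBToA.isEmpty();
    }

    public static void main(String[] args) {
        Set<Site> fromAToB = EnumSet.of(Site.PARAMETER);
        Set<Site> fromBToA = EnumSet.noneOf(Site.class);
        System.out.println(isAssociation(fromAToB, fromBToA)); // one-way messages
        System.out.println(lifetime(20L, 10L));                // B destroyed first
    }
}
```

The destruction times passed to lifetime would, in practice, have to come from a dynamic model such as a trace of finalizer exits; here they are plain numbers in an arbitrary unit.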

                       Link                          (is described by)               Relationship
Origin           Means                      Target           |  Origin           Name         Target
---------------  -------------------------  ---------------  |  ---------------  -----------  ---------------
Class/Interface  Any                        Class/Interface  |  Class/Interface  Use          Class/Interface
Instance         Direct                     Instance         |  Class/Interface  Association  Class/Interface
Instance         Field                      Instance         |  Class            Aggregation  Class/Interface
Instance         Field + Lifetime property  Instance         |  Class            Composition  Class/Interface

Table 1: Definitions and applicability of the relationships
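Table 1 can also be read as a small decision procedure: given what is known about a link, pick the most specific row that applies. The sketch below is one hedged way to express that reading. The class, enum, and parameter names are hypothetical, and the three boolean inputs are assumed to be supplied by the static and dynamic analyses; like the table, the sketch names only the lifetime property for composition.

```java
// Illustrative sketch; the names below are hypothetical and the inputs are
// assumed to come from the static and dynamic analyses.
final class Table1Lookup {

    enum Relationship { USE, ASSOCIATION, AGGREGATION, COMPOSITION }

    /**
     * Picks the most specific row of Table 1 that the observed link satisfies.
     *
     * @param sendsMessages an instance of the origin sends messages to an instance of the target
     * @param throughField  the target instance is reached through a field of the origin
     * @param lifetimeHolds the lifetime property holds between the two instances
     */
    static Relationship classify(boolean sendsMessages, boolean throughField, boolean lifetimeHolds) {
        if (!sendsMessages) {
            return Relationship.USE;          // row 1: any link between classes or interfaces
        }
        if (!throughField) {
            return Relationship.ASSOCIATION;  // row 2: direct link between instances
        }
        return lifetimeHolds
                ? Relationship.COMPOSITION    // row 4: field plus lifetime property
                : Relationship.AGGREGATION;   // row 3: field
    }

    public static void main(String[] args) {
        System.out.println(classify(true, true, false));
        System.out.println(classify(true, true, true));
    }
}
```

Ordering the tests from the least to the most demanding row mirrors the way each relationship in the table refines the previous one.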

We define an aggregation relationship between A and B, AG(A, B), as:

    AG(A, B) =
        (IS(A, B) ⊆ {field, array field, collection}) ∧
        (IS(B, A) = ∅) ∧
        (EX(A, B) ∈ 𝔹) ∧ (EX(B, A) ∈ 𝔹) ∧
        (LT(A, B) ∈ 𝕜) ∧ (LT(B, A) ∈ 𝕜) ∧
        (MU(A, B) = [1, +∞]) ∧
        (MU(B, A) = [0, +∞])

We define a composition relationship between A and B, CO(A, B), as:

    CO(A, B) =
        (IS(A, B) ⊆ {field, array field, collection}) ∧
        (IS(B, A) = ∅) ∧
        (EX(A, B) = true) ∧
        (EX(B, A) = false) ∧
        (LT(A, B) = +) ∧ (LT(B, A) = −) ∧
        (MU(A, B) = [1, +∞]) ∧
        (MU(B, A) = [1, 1])

We show in two steps that the four properties are minimal with respect to our definitions and to other properties of the association, aggregation, and composition relationships: First, we show that the properties are minimal for our definitions; Second, we show that the properties appear in all definitions of the relationships in the literature. For lack of space, we cannot detail these two steps here. The interested reader may refer to [10] for the demonstration.

Typically, in the first step, we remove a property from the formalisation of a relationship and we show that we cannot distinguish it from another formalisation. For example, the exclusivity property is the only means to distinguish an aggregation from a composition relationship because values of the other properties of the aggregation relationship satisfy the composition relationship.

In the second step, we study the definitions of the relationships from the literature and we show that they are all expressed using, at least, these four properties. For example, the definitions of the aggregation and composition relationships by Henderson-Sellers and Barbier [12, table 4, page 356] use several characteristics, among which: C1. Propagation of one or more operations and C5. Propagation of destruction operation related to the invocation site and lifetime properties; C2. Ownership related to the exclusivity property; P1. Whole–part related to the multiplicity property. Thus, algorithms based on these four minimal properties identify and only identify association, aggregation, and composition relationships.

Algorithms. Identification of association relationships requires collecting the value of the IS property only, values of the other properties being indifferent. Identification of aggregation relationships requires inferring the values of the IS and MU properties. Identification of the composition relationships requires the values of the IS and MU properties and the values of the EX and LT properties. We compute values of the invocation site, IS, and multiplicity, MU, properties on static models. We infer the values of the exclusivity, EX, and lifetime, LT, properties from dynamic models.

The computation of the static values (IS and MU) of the three relationships is simple to perform by analysing programs' static models. The values of the MU property correspond to the fields and arrays and their multiplicities (i.e., multiplicity 1 and +∞). A difficulty arises when fields are typed as Java collections (Collection, Map), because these collections are not typed. If we assume that these kinds of collections are homogeneous (containing elements with a common superclass different from Object), it is possible to determine their types using well-known Java programming idioms, such as pairs of add()–remove() accessors [13, 18].

We assign a value to the IS property according to invocation sites and message types of method calls. We iterate through the class files, looking for byte-codes corresponding to method calls: InvokeInterface, InvokeStatic, InvokeSpecial, and InvokeVirtual.

The computation of the dynamic values (EX and LT) of the composition relationship is based on the dynamic models of programs. We check the exclusivity and lifetime properties of composition relationships with the Caffeine tool and dedicated Prolog predicates, as presented in Section 2. The predicates instanceLevelCompositions/1 and classLevelCompositions/1 compute the values of the exclusivity and lifetime properties using the order in which field modifications, finalizer exits, and program-end occur. They infer the presence of composition relationships among instances and their respective classes from the values of EX and LT.

We performed extensive testing of our algorithms on several programs, in particular Java AWT v1.2.2, JHotDraw v5.1, and JUnit v3.7. We analysed each program manually and compared the results of our analyses with those of our algorithms. We find that the identification of association relationships has a precision of 100% and a recall of 100% (4,925 existing), the identification of aggregation relationships has a precision of 75% and a recall of 96% (32 existing, 24 found, 1 false hit), and the identification of composition relationships has a precision of 100% and a recall of 100% (3 existing). The identification of aggregation relationships does not have a precision and a recall of 100% because the developers did not respect some of the idioms used in our detection algorithms to compute values for the MU properties. Also, precision and recall for composition relationships may vary depending on the execution paths taken when running programs. Identification of composition relationships suffers from the common limitations of dynamic analyses.

Figure 4: JHotDraw core classes

3.3 Precision
The Ptidej tool suite provides precise class diagrams, representative of programs' implementations. Indeed, class diagrams are built with both the static and dynamic models of Java programs using the PADL ClassFile Creator and Caffeine tools. They describe the programs' classes and interfaces, and their relationships accurately, using precise definitions and formalisations of the relationships and repeatable algorithms. In particular, we use precise formalisations of the association, aggregation, and composition relationships with four minimal properties, which allow our algorithms to identify these relationships accurately.

4 Application
We present an experiment with our tool suite on the JHotDraw program. We choose JHotDraw because it is an independent medium-size real-world program. We want to show that class diagrams recovered using Ptidej are (1) easily obtained and (2) more precise than class diagrams usually provided
Figure 5: Top view of JHotDraw core classes in Ptidej and their concrete relationships

with program documentation. Thus, we compare the architecture of JHotDraw as described by class diagrams from its documentation with the class diagram obtained using Ptidej. For each class, interface, and their relationships in the automatically reverse-engineered class diagram of JHotDraw, we assess their consistency (existence, absence, characteristics) with the JHotDraw core classes diagram, in Figure 4.

4.1 JHotDraw

JHotDraw is a highly customisable two-dimensional graphic framework for structured drawing editors [14]. It simplifies the development of drawing applications, such as editors for Pert diagrams or UML diagrams. Version 5.1 comprises 155 classes, distributed across 11 packages, for about 16,000 lines of Java source code. Its source code and binaries are freely available at http://members.pingnet.ch/gamma/. Its core classes, interfaces, and their relationships are shown on the class diagram, Figure 4, as provided in the framework documentation.

A DrawingWindow is a window, subclass of Frame, displaying a Drawing through a DrawingView, subclass of Panel, with a Tool to manipulate the drawing. An instance of DrawingWindow aggregates instances of DrawingView, Drawing, and Tool (white lozenges). An instance of DrawingView knows its containing DrawingWindow, contained Drawing, and selected Figure (arrows). Instances of Drawing use instances of DrawingView (dash arrow). A Drawing is composed of Figures (white lozenge with black circle) which know their containing Drawing (arrow) and create Handles to allow user-interactions (dash arrow with black circle).

4.2 JHotDraw and Ptidej

The class diagram shown in Figure 4 does not reflect the real implementation of the JHotDraw framework: It is obsolete and inaccurate. It is neither complete nor precise enough to allow maintainers to perform modifications with confidence. The class diagram shows classes and relationships that do not exist in the real implementation and only represents a simplification of the framework. Indeed, we built the JHotDraw class diagram using the Ptidej tool suite and we found several differences with the provided class diagram. Figures 5 and 6 show the JHotDraw core classes, from the documentation, in Ptidej.

The DrawingWindow class has been renamed DrawingEditor in the implementation. Core classes are in fact interfaces (<<interface>> stereotype) and only use relationships exist among them (-u--> symbol). We must add classes implementing these interfaces to display relationships really existing among them. Figures 5 and 6 show the JHotDraw core interfaces and implementation classes, recovered by static and dynamic analyses.

The DrawApplication class, implementing the DrawingEditor interface, is composed of (black lozenge) the Drawing interface (and its implementation class StandardDrawing). It is also composed of the StandardDrawingView and Tool classes. These composition relationships conform to what we could expect: An instance of class DrawApplication represents the JHotDraw editor, composed of a view and a drawing panel; when the editor is destroyed (closed), the view and drawing panel are destroyed as well.

The StandardDrawingView class is composed of instances of classes implementing the Figure interface, through the composed instance of class StandardDrawing, implementing the Drawing interface and extending the CompositeFigure class.

The StandardDrawingView class aggregates instances of classes implementing the Drawing and DrawingEditor interfaces (white lozenges). Instances of StandardDrawingView use instances of the class implementing interface DrawingEditor as backpointers to send messages to their parents (instances of DrawApplication).

The CompositeFigure class implements the Composite design pattern: The Figure interface plays the role of Component, the CompositeFigure of Composite, and other classes implementing interface Figure play the role of Leaves.

Finally, classes are associated with, use, or create various instances of other classes to perform their tasks, for example class DrawApplication creates its composing instances of classes StandardDrawing and StandardDrawingView.

4.3 Class Diagrams Comparisons

The class diagram shown in Figures 5 and 6 provides maintainers with a more precise view of the JHotDraw framework than that provided by the authors, in Figure 4.

The class diagram is more precise because it is built from both static and dynamic models of JHotDraw, using precise and consensual definitions of the use, association, aggregation, and composition relationships and all data required to distinguish these relationships. Thus, it distinguishes clearly classes, interfaces, and inheritance, instantiation, use, association, aggregation, and composition relationships.

The class diagram is built semi-automatically (with user-interactions), using the Ptidej tool suite, and does not require any manual analysis. It is created by the PADL ClassFile Creator tool in about 2 seconds (along with its graphical representation) on an AMD Athlon 64-bit processor at 2 GHz. It is refined with data obtained from Caffeine, whose computation time depends on the number of generated events [11]. Typically, execution time may be slowed down by a factor between 100 and 5,000 because of the inefficient yet frequent exchange of data between the Prolog engine performing the analyses and the analysed program.

5 Conclusion

We presented Ptidej, a tool suite for the precise semi-automatic reverse engineering of Java programs as UML-like class diagrams; i.e., classes and interfaces, inheritance, instantiation, use, association, aggregation, and composition relationships. Ptidej uses both static and dynamic models of programs. Static models are analysed using the PADL ClassFile Creator tool, dynamic models using the

Figure 6: Bottom view of JHotDraw core classes in Ptidej and their concrete relationships

Caffeine tool. PADL ClassFile Creator and Caffeine compute values of four minimal properties (exclusivity, lifetime, multiplicity, and invocation site) that we use to formalise the use, association, aggregation, and composition relationships. We exemplified the Ptidej tool suite on a simple document description program and detailed its application on the JHotDraw framework. We showed that the class diagram obtained semi-automatically for the JHotDraw framework is more precise than the class diagram provided with the documentation from the authors.

Currently, we are working on replacing dynamic analyses with type analyses of single uses of values. Also, we are extending PADL, using its implementation of the Visitor design pattern, with recovery algorithms for more UML constituents, such as data-types, implementation classes, and utility classes. Future work includes analyses of real-world programs (thousands of classes), such as telecommunication systems or development environments, to assess the usefulness of recovered class diagrams for maintainers. Also, we intend to implement sophisticated layout algorithms to improve the visual appeal of the reverse-engineered class diagrams [7, 17]. Finally, we investigated the use of the reverse-engineered class diagrams to identify design patterns automatically [1]. We plan to extend our experience to design defects [9] to help maintainers further.

References

[1] Hervé Albin-Amiot, Pierre Cointe, Yann-Gaël Guéhéneuc, and Narendra Jussien. Instantiating and detecting design patterns: Putting bits and pieces together. In Debra Richardson, Martin Feather, and Michael Goedicke, editors, proceedings of the 16th conference on Automated Software Engineering, pages 166–173. IEEE Computer Society Press, Nov. 2001.
[2] Hervé Albin-Amiot and Yann-Gaël Guéhéneuc. Meta-modeling design patterns: Application to pattern detection and


code synthesis. In Bedir Tekinerdogan, Pim Van Den Broek, Motoshi Saeki, Pavel Hruby, and Gerson Sunyé, editors, proceedings of the 1st ECOOP workshop on Automating Object-Oriented Software Development Methods. Centre for Telematics and Information Technology, University of Twente, Oct. 2001. TR-CTIT-01-35.
[3] Giuliano Antoniol, Roberto Fiutem, and L. Cristoforetti. Design pattern recovery in object-oriented software. In Scott Tilley and Giuseppe Visaggio, editors, proceedings of the 6th International Workshop on Program Comprehension, pages 153–160. IEEE Computer Society Press, Jun. 1998.
[4] Shigeru Chiba. Javassist – A reflection-based programming wizard for Java. In Jean-Charles Fabre and Shigeru Chiba, editors, proceedings of the OOPSLA workshop on Reflective Programming in C++ and Java. Center for Computational Physics, University of Tsukuba, Oct. 1998. UTCCP Report 98-4.
[5] Franco Civello. Roles for composite objects in object-oriented analysis and design. In Andreas Paepcke, editor, proceedings of the 8th conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 376–393. ACM Press, Sep. 1993.
[6] Serge Demeyer, Stéphane Ducasse, and Oscar Nierstrasz. Finding refactorings via change metrics. In Doug Lea, editor, proceedings of 15th conference on Object-Oriented Programming Systems, Languages and Applications, pages 166–177. ACM Press, Oct. 2000.
[7] Holger Eichelberger and Jürgen Wolff von Gudenberg. On the visualization of Java programs. In Stephan Diehl, editor, proceedings of the 1st international seminar on Software Visualization, pages 295–306. Springer-Verlag, May 2002.
[8] Matt Greenwood. CFParse Distribution. IBM AlphaWorks, Sep. 2000.
[9] Yann-Gaël Guéhéneuc and Hervé Albin-Amiot. Using design patterns and constraints to automate the detection and correction of inter-class design defects. In Quioyun Li, Richard Riehle, Gilda Pour, and Bertrand Meyer, editors, proceedings of the 39th conference on the Technology of Object-Oriented Languages and Systems, pages 296–305. IEEE Computer Society Press, Jul. 2001.
[10] Yann-Gaël Guéhéneuc and Hervé Albin-Amiot. Recovering binary class relationships: Putting icing on the UML cake. In Doug C. Schmidt, editor, proceedings of the 19th conference on Object-Oriented Programming, Systems, Languages, and Applications. ACM Press, Oct. 2004. To appear.
[11] Yann-Gaël Guéhéneuc, Rémi Douence, and Narendra Jussien. No Java without Caffeine – A tool for dynamic analysis of Java programs. In Wolfgang Emmerich and Dave Wile, editors, proceedings of the 17th conference on Automated Software Engineering, pages 117–126. IEEE Computer Society Press, Sep. 2002.
[12] Brian Henderson-Sellers and Franck Barbier. A survey of the UML’s aggregation and composition relationships. L’objet : Logiciel, Base de données, Réseaux, 5(3/4):339–366, Dec. 1999.
[13] Daniel Jackson and Allison Waingold. Lightweight extraction of object models from bytecode. In David Garlan and Jeff Kramer, editors, proceedings of the 21st International Conference on Software Engineering, pages 194–202. ACM Press, May 1999.
[14] Wolfram Kaiser. Become a programming picasso with JHotDraw – Use the highly customizable GUI framework to simplify draw application development. JavaWorld, Feb. 2001.
[15] Jeffrey Korn, Yih-Farn Chen, and Eleftherios Koutsofios. Chava: Reverse engineering and tracking of Java applets. In Kostas Kontogiannis and Françoise Balmas, editors, proceedings of the 6th Working Conference on Reverse Engineering, pages 314–325. IEEE Computer Society Press, Nov. 1999.
[16] Object Management Group, Inc. UML v1.5 Specification, Mar. 2003.
[17] Jochen Seemann. Extending the Sugiyama algorithm for drawing UML class diagrams: Towards automatic layout of object-oriented software diagrams. In Giuseppe Di Battista, editor, proceedings of the 5th international symposium on Graph Drawing, pages 415–424. Springer-Verlag, Sep. 1997.
[18] Paolo Tonella and Alessandra Potrich. Reverse engineering of the UML class diagram from C++ code in presence of weakly typed containers. In Gerardo Canfora and Anneliese Amschler Andrews-Von Maryhauser, editors, proceedings of the 9st International Conference on Software Maintenance, pages 376–385. IEEE Computer Society Press, Nov. 2001.
