
JSummarizer: An Automatic Generator of Natural Language Summaries for Java Classes


Laura Moreno¹, Andrian Marcus¹, Lori Pollock², K. Vijay-Shanker²
¹ Department of Computer Science, Wayne State University, Detroit, MI, USA, {lmorenoc, amarcus}@wayne.edu
² Computer and Information Sciences Department, University of Delaware, Newark, DE, USA, {pollock, vijay}@cis.udel.edu

Abstract—JSummarizer is an Eclipse plug-in for automatically generating natural language summaries of Java classes. The summary is based on the stereotype of the class, which implicitly encodes the design intent of the class and is automatically inferred by JSummarizer. The tool uses a set of predefined heuristics to determine what information will be reflected in the summary, and it uses natural language processing and generation techniques to form the summary. The generated summaries can be used to re-document the code and to help developers more easily understand large and complex classes.

Index Terms—Source code summarization, program comprehension, documentation generation.

I. INTRODUCTION

During software evolution, depending on the task at hand, developers need to understand relevant parts of the code. Consequently, developers often spend more time reading code than writing it [1]. Good leading comments help when reading code, by providing developers with at least a superficial understanding of the source code artifact that they describe. However, outdated or missing comments are very common, and developers often must read more of the code or turn to external documentation in order to gain any understanding of the code relevant to their task.

An obvious solution to this problem would be enforcing the creation and continuous update of internal documentation. While such a solution may work with new code, it will likely not work on existing, poorly documented code. A more suitable approach is automatically generating summaries that describe the code. Such summaries can be used for re-documentation and will help developers quickly understand existing code.

We present JSummarizer, an Eclipse plug-in that automatically generates natural language descriptions of Java classes. The tool takes as input a Java class and its parent project and produces a short, text-based description of that class, which is inserted as a Javadoc comment. The summary focuses on the content and responsibilities of the class by considering its stereotype, which is automatically inferred by JSummarizer. The tool implements a set of predefined heuristics to determine the information to be included in the summary, and makes use of natural language processing and generation techniques to form the summary.

II. CLASS SUMMARIZATION

Summarizing a class is more complex than simply listing its methods and/or its attributes. Object-Oriented (OO) classes have generic responsibilities (i.e., domain-independent) and specific responsibilities (i.e., domain-dependent). For example, the main functionality of a class may be providing data (generic role) of a particular file, such as an audio file (specific role). Ideally, both roles should be reflected by the class summaries.

While the specific responsibilities can be inferred from the textual information embedded in the source code (e.g., identifiers or comments), the generic responsibilities of a class must be inferred from its design. To this end, JSummarizer has a component that automatically infers the class stereotype [2], based on the stereotypes of its member methods. Class stereotypes are low-level patterns that capture the design intent of the class. For example, a class consisting mostly of methods that are in charge of external objects (i.e., factory and controller methods) is stereotyped as controller.

Next, JSummarizer uses the class stereotype to determine what parts of the class should be reflected in the summary, mostly fields and attributes of the class. The summary of a class generated by JSummarizer consists of:
• a general description based on its interfaces, superclass, and/or stereotype;
• the characterization of its structure given by the definition of its class stereotype;
• a description of its behavior provided by the relevant methods, grouped in blocks; and
• the enumeration of its inner classes, if they exist.

The goal of the summaries generated by JSummarizer is to highlight the main functionality of a class, so the tool ignores algorithmic details of the implementation. It also ignores the existing comments (if any).

JSummarizer is easy to use and fast. The user has the option of selecting a single class and generating its summary (right click in the class browser), or selecting an entire project to generate the summaries for each class in it. Single-class summaries are generated instantaneously, whereas small to medium-sized systems take seconds (e.g., it took 16 seconds to generate the summaries for aTunes version 1.6.0, an audio player and manager system consisting of 218 classes, on an Intel Core i3 powered laptop running Windows 7).
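
The controller example from the stereotype discussion can be made concrete with a small sketch. It is not JSummarizer's or JStereoCode's implementation: the class name StereotypeSketch, the reduced sets of method and class stereotypes, and the "more than half of the methods" threshold are all assumptions chosen only to illustrate how a class stereotype might be derived from the stereotypes of its member methods.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: derives a class stereotype from its method stereotypes. */
final class StereotypeSketch {

    /** Reduced, assumed set of method stereotypes; the taxonomy in [2] is richer. */
    enum MethodStereotype { GETTER, SETTER, FACTORY, CONTROLLER, OTHER }

    /** Reduced, assumed set of class stereotypes used only for this illustration. */
    enum ClassStereotype { CONTROLLER, DATA_PROVIDER, UNCLASSIFIED }

    /** Assumed rule: a group of methods decides only if it exceeds half of all methods. */
    static ClassStereotype infer(List<MethodStereotype> methods) {
        if (methods.isEmpty()) {
            return ClassStereotype.UNCLASSIFIED;
        }
        // Count how often each method stereotype occurs in the class.
        Map<MethodStereotype, Integer> counts = new EnumMap<>(MethodStereotype.class);
        for (MethodStereotype m : methods) {
            counts.merge(m, 1, Integer::sum);
        }
        // Factory and controller methods are the ones in charge of external objects.
        int external = counts.getOrDefault(MethodStereotype.FACTORY, 0)
                + counts.getOrDefault(MethodStereotype.CONTROLLER, 0);
        int getters = counts.getOrDefault(MethodStereotype.GETTER, 0);
        if (2 * external > methods.size()) {
            return ClassStereotype.CONTROLLER;
        }
        if (2 * getters > methods.size()) {
            return ClassStereotype.DATA_PROVIDER;
        }
        return ClassStereotype.UNCLASSIFIED;
    }
}
```

Under these assumptions, a class with one factory, one controller, and one getter method would be labeled CONTROLLER, while a class split evenly between getters and factory methods would stay UNCLASSIFIED.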

978-1-4673-3092-3/13/$31.00 © 2013 IEEE. ICPC 2013, San Francisco, CA, USA
III. DESIGN AND IMPLEMENTATION

JSummarizer is implemented as a plug-in for the Eclipse development environment, since it is aimed at supporting developers in their daily tasks. The general process for generating class summaries is presented in Figure 1.

JSummarizer takes as input a Java class and its respective project. Both elements are used in the identification of the class stereotype (1), by determining the stereotypes of its member methods (2), which in turn are inferred from their structural attributes (3). Next, the content to be included in the summary is selected according to stereotype information (4) and the access level of the methods (5). Then, the summary is built (6) by generating natural language phrases for the selected content (7). Finally, the summary is added (8) to the Javadoc comment of the class, which represents the main output of the tool (Figure 2).

The four major components of JSummarizer (shown in Figure 1 by the white rectangles) handle: the identification of code stereotypes (i.e., JStereoCode [3]); the selection of information to be included in the summary; the generation of the natural-language description; and its inclusion in the corresponding Javadoc comment. We developed and presented the first component independently [3]. The text generation component is based on the previous work of Hill et al. [4-6], whereas the other components are unique to JSummarizer.

IV. RELATED WORK

The automatic summarization of software artifacts is an emerging field in software engineering, especially the generation of natural language descriptions of source code.

Sridhara et al. proposed genSumm [5], a tool that automatically generates natural language comments for Java methods. Just like JSummarizer, this tool is implemented as an Eclipse plug-in; however, it is not available for use. Note that JSummarizer does not compose the method summaries that would be generated by genSumm.

Other simpler, but still important, programming constructs have been automatically described or documented. This is the case for formal method parameters [7] and exceptions in Java systems [8]. Taking a different approach, Deltadoc [9] focuses on describing the changes in the code by dynamically analyzing the behavior of the program. Also, Rastkar et al. [10] proposed a prototype for the automatic generation of summaries for cross-cutting concerns.

To the best of our knowledge, JSummarizer is the first tool that automatically generates natural language documentation for Java classes.

V. EVALUATION AND FUTURE WORK

JSummarizer is a fully automated tool, independent of any existing documentation or external domain knowledge. In an initial empirical evaluation, we asked programmers to evaluate summaries generated by JSummarizer [11]. The subjects found that, in most cases, the summaries are concise, readable, and understandable, and moreover, do not miss important information.

We expect that JSummarizer will be useful in many evolution and maintenance tasks. We plan to evaluate JSummarizer in the context of traceability link recovery [12] and feature location. The summaries generated by JSummarizer are generic, so for specific applications they will need to be modified to include information needed for the task at hand.

VI. AVAILABILITY

JSummarizer is free and publicly available for academic and non-commercial use at this stage. The most recent version of the plug-in is available to download at:
http://www.cs.wayne.edu/~severe/jsummarizer

ACKNOWLEDGEMENT

This work was supported in part by grants from the National Science Foundation (CCF-0845706, CCF-1017263, and CCF-0915803).

REFERENCES

[1] A. J. Ko, B. A. Myers, M. J. Coblenz, and H. H. Aung, "An Exploratory Study of How Developers Seek, Relate, and Collect Relevant Information during Software Maintenance Tasks," IEEE Transactions on Software Engineering (TSE), vol. 32, pp. 971-987, 2006.
[2] N. Dragan, M. L. Collard, and J. I. Maletic, "Automatic Identification of Class Stereotypes," in 26th IEEE International Conference on Software Maintenance (ICSM'10), Timisoara, Romania, 2010, pp. 1-10.
[3] L. Moreno and A. Marcus, "JStereoCode: Automatically Identifying Method and Class Stereotypes in Java Code," in 27th IEEE/ACM International Conference on Automated Software Engineering (ASE'12), Essen, Germany, 2012, pp. 358-361.
[4] E. Hill, L. Pollock, and K. Vijay-Shanker, "Automatically Capturing Source Code Context of NL-Queries for Software Maintenance and Reuse," in 31st IEEE International Conference on Software Engineering (ICSE'09), Vancouver, BC, 2009, pp. 232-242.
[5] G. Sridhara, E. Hill, D. Muppaneni, L. Pollock, and K. Vijay-Shanker, "Towards Automatically Generating Summary Comments for Java Methods," in 25th IEEE/ACM International Conference on Automated Software Engineering (ASE'10), Antwerp, Belgium, 2010, pp. 43-52.
[6] E. Hill, "Developing Natural Language-based Software Analyses And Tools to Expedite Software Maintenance," Ph.D., Dept. of Computer and Information Sciences, University of Delaware, Newark, DE, 2010.
[7] G. Sridhara, L. Pollock, and K. Vijay-Shanker, "Generating Parameter Comments and Integrating with Method Summaries," in 19th IEEE International Conference on Program Comprehension (ICPC'11), Kingston, Ontario, Canada, 2011, pp. 71-80.
[8] R. Buse and W. Weimer, "Automatic documentation inference for exceptions," in International Symposium on Software Testing and Analysis (ISSTA'08), Seattle, WA, USA, 2008, pp. 273-282.
[9] R. Buse and W. Weimer, "Automatically Documenting Program Changes," in 25th IEEE/ACM International Conference on Automated Software Engineering (ASE'10), Antwerp, Belgium, 2010, pp. 33-42.
[10] S. Rastkar, G. C. Murphy, and A. W. J. Bradley, "Generating Natural Language Summaries for Crosscutting Source Code Concerns," in 27th IEEE International Conference on Software Maintenance (ICSM'11), Williamsburg, VA, 2011, pp. 103-112.

[11] L. Moreno, J. Aponte, G. Sridhara, A. Marcus, L. Pollock, and V. Shanker, "Automatic Generation of Natural Language Summaries for Java Classes," in 21st International Conference on Program Comprehension (ICPC'13), San Francisco, USA, 2013, in press.
[12] J. Aponte and A. Marcus, "Improving Traceability Link Recovery Methods through Software Artifact Summarization," in 6th International Workshop on Traceability in Emerging Forms of Software Engineering (TEFSE'11), Honolulu, Hawaii, 2011, pp. 46-49.

Fig. 1. Summary generation process in JSummarizer
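
As a rough illustration of the middle of this process, the sketch below chains content selection by stereotype and access level (steps 4 and 5) with a very simplified form of phrase generation (step 7). Everything in it is an assumption made for illustration: the MethodInfo model, the stereotype labels passed as strings, the rule of keeping only public methods, and the camelCase splitting, which merely stands in for the natural language techniques of Hill et al. [4-6] and does not reproduce them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Illustrative sketch of steps (4), (5), and (7); not JSummarizer's actual code. */
final class SummaryPipelineSketch {

    /** Minimal method model; the field names are assumptions made for this example. */
    static final class MethodInfo {
        final String name;
        final boolean isPublic;
        final String stereotype;

        MethodInfo(String name, boolean isPublic, String stereotype) {
            this.name = name;
            this.isPublic = isPublic;
            this.stereotype = stereotype;
        }
    }

    /** Steps (4) and (5): keep public methods whose stereotype is deemed relevant. */
    static List<MethodInfo> selectContent(List<MethodInfo> methods, Set<String> relevantStereotypes) {
        List<MethodInfo> selected = new ArrayList<>();
        for (MethodInfo m : methods) {
            if (m.isPublic && relevantStereotypes.contains(m.stereotype)) {
                selected.add(m);
            }
        }
        return selected;
    }

    /** Step (7), greatly simplified: split a camelCase identifier into lowercase words. */
    static String toPhrase(String identifier) {
        return identifier.replaceAll("([a-z0-9])([A-Z])", "$1 $2").toLowerCase(Locale.ROOT);
    }

    /** Selection and phrase generation chained: one phrase per selected method, in input order. */
    static List<String> describeBehavior(List<MethodInfo> methods, Set<String> relevantStereotypes) {
        List<String> phrases = new ArrayList<>();
        for (MethodInfo m : selectContent(methods, relevantStereotypes)) {
            phrases.add(toPhrase(m.name));
        }
        return phrases;
    }
}
```

With a public method named getAudioTitle labeled "get", a private method resetCache labeled "command", and a public method createPlayer labeled "factory", calling describeBehavior with the relevant set {"get", "factory"} yields the phrases "get audio title" and "create player".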

Fig. 2. Different visualizations of the summary generated by JSummarizer for the class AudioFile of the aTunes system: (a) the full generated summary as added to the Javadoc of the class, including the HTML tags; (b) the same Javadoc as rendered by a browser; (c) the same Javadoc as displayed by the Eclipse pop-up when the mouse hovers over the name of the class.
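
To give a feel for what a Javadoc comment with HTML tags might look like, the following sketch assembles the four parts of a class summary (general description, structure, behavior, inner classes) into a comment string. The class JavadocSketch, the method toJavadoc, and the choice of the p, ul, and li tags are assumptions for illustration; the actual markup produced by JSummarizer is the one shown in the figure and is not reproduced here.

```java
import java.util.List;

/** Hypothetical sketch: renders the four summary parts as a Javadoc comment. */
final class JavadocSketch {

    /**
     * Builds a Javadoc comment from already generated text.
     * The inputs are assumed to be free of characters that need HTML escaping.
     */
    static String toJavadoc(String general, String structure,
                            List<String> behavior, List<String> innerClasses) {
        StringBuilder sb = new StringBuilder("/**\n");
        sb.append(" * <p>").append(general).append("</p>\n");
        sb.append(" * <p>").append(structure).append("</p>\n");
        // Behavior phrases become list items; the list is omitted when empty.
        if (!behavior.isEmpty()) {
            sb.append(" * <ul>\n");
            for (String phrase : behavior) {
                sb.append(" *   <li>").append(phrase).append("</li>\n");
            }
            sb.append(" * </ul>\n");
        }
        // Inner classes are enumerated only if they exist.
        if (!innerClasses.isEmpty()) {
            sb.append(" * <p>Inner classes: ")
              .append(String.join(", ", innerClasses))
              .append("</p>\n");
        }
        sb.append(" */");
        return sb.toString();
    }
}
```

The returned string could then be placed directly above the class declaration, where both a browser-rendered Javadoc page and the Eclipse hover pop-up would interpret the HTML tags.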
