Data Mining For Software Engineering: Computer July 2009

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 3

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/220476662

Data Mining for Software Engineering

Article  in  Computer · July 2009


DOI: 10.3233/IDA-2010-0475 · Source: DBLP

CITATIONS READS

96 1,778

4 authors, including:

Tao Xie David Lo


University of Illinois, Urbana-Champaign Singapore Management University
289 PUBLICATIONS   8,390 CITATIONS    389 PUBLICATIONS   7,833 CITATIONS   

SEE PROFILE SEE PROFILE

Chao Liu
Suzhou University
62 PUBLICATIONS   2,946 CITATIONS   

SEE PROFILE

Some of the authors of this publication are also working on these related projects:

mining software repository View project

SE knowledge sharing and management (e.g., Stack Overflow, Github) View project

All content following this page was uploaded by Tao Xie on 01 December 2014.

The user has requested enhancement of the downloaded file.


Mining Software Engineering Data

Tao Xie Jian Pei Ahmed E. Hassan


North Carolina State Univ. Simon Fraser Univ. Univ. of Victoria
USA Canada Canada
xie@csc.ncsu.edu jpei@cs.sfu.ca ahmed@ece.uvic.ca

Abstract lately within software engineering. The workshop in Min-


ing Software Repositories (MSR) has been recognized as
Software engineering data (such as code bases, exe- the most attended workshop at ICSE since 2001. MSR 2006
cution traces, historical code changes, mailing lists, and was oversubscribed. As a reflection of the great interest in
bug databases) contains a wealth of information about the area and the importance of the MSR work within the
a project’s status, progress, and evolution. Using well- context of software engineering, the best papers for three of
established data mining techniques, practitioners and re- the major conferences within SE (ICSE, ASE, and ICSM)
searchers can explore the potential of this valuable data for 2006 are on applying data mining techniques on SE data.
in order to better manage their projects and to produce A recent issue of IEEE Transactions on Software Engineer-
higher-quality software systems that are delivered on time ing (TSE) on the MSR topic received over 15% of all the
and within budget. submissions to the TSE in 2005 [6].
This tutorial presents the latest research in mining Soft- The tutorial will provide participants with an overview of
ware Engineering (SE) data, discusses challenges associ- the field of mining software engineering data, as shown in
ated with mining SE data, highlights SE data mining suc- Figure 1. In particular, the tutorial will cover the following
cess stories, and outlines future research directions. Partic- topics along three dimensions (software engineering, data
ipants will acquire knowledge and skills needed to perform mining, and future directions):
research or conduct practice in the field and to integrate
data mining techniques in their own research or practice. 1. Software Engineering:

(a) What types of SE data are available to be mined?


1. Introduction (b) Which SE tasks can be helped using data mining?
(c) How are data mining techniques used in SE?
Software engineering data (such as code bases, execu-
tion traces, historical code changes, mailing lists, and bug 2. Data Mining:
databases) contains a wealth of information about a soft-
(a) What are the challenges in applying data mining
ware project’s status, progress, and evolution. Many studies
techniques to SE data?
have emerged that use this data to support various aspects of
software development within industrial and open source set- (b) Which data mining techniques are most suitable
tings. Working with Nokia, Gall et al. [4] have shown that for specific types of SE data?
software repositories can help developers change legacy (c) What are freely available data mining and analy-
systems by pointing out hidden code dependencies. Work- sis tools (e.g., R [1] and WEKA [2])?
ing with Bell Labs and Avaya, Graves et al. [5] and Mockus
et al. [8] demonstrated that historical change information 3. Future Directions: What are the challenges and op-
can support management in building reliable software sys- portunities for the data mining and software engineer-
tems by predicting bugs and effort. Working on open source ing communities?
projects, Chen et al. [3] have shown that historical informa-
tion can assist developers in understanding large systems. The tutorial will cover these topics through case studies
Although the idea of applying data mining techniques on from recent software engineering conferences. Participants
software engineering data has existed since mid 1990s [7], will gain the knowledge needed to accomplish the following
the idea has especially attracted a large amount of interest tasks:
they should be customized to fit the requirements and char-
acteristics of SE data.
Second, we intend to understand the current research
and development frontier of data mining practice in soft-
ware engineering. We shall summarize several kinds of data
mining problems in software engineering that are under ac-
tive investigation based on three major perspectives: data
sources being mined, tasks being assisted, and mining tech-
niques being used. Through this discussion, researchers can
rapidly join this active research area and gain immediate
access to commonly available mining techniques for real
problems.
Third, we intend to analyze successful cases of mining
SE data. We shall review and demonstrate briefly several
research prototypes of data-mining systems for software en-
gineering. Through the case studies, the participants can
Figure 1. Overview of mining SE data understand how to build a testbed for research and develop-
ment.
Finally, we intend to give an overview on commonly
1. Appreciate the latest advancement and success stories used data mining tools. Our overview will help the par-
in the field of mining SE data; ticipants gain a better understanding of available tools. The
participants can use such tools in order to explore their data
2. Conduct leading-edge research in the field of mining and integrate data mining techniques in their research and
SE data; day to day work.
3. Apply data mining techniques on their own SE data
using advanced data mining analysis tools and algo- References
rithms;
[1] The R Project for Statistical Computing. Available online at
4. Contrast their results relative to other work within the http://www.r-project.org/.
[2] Weka 3: Data Mining Software in Java. Available online at
field;
http://www.cs.waikato.ac.nz/ml/weka/.
[3] A. Chen, E. Chou, J. Wong, A. Y. Yao, Q. Zhang, S. Zhang,
5. Recognize open problems and possible research direc-
and A. Michail. CVSSearch: Searching through source code
tions within the field.
using CVS comments. In Proceedings of the 17th Interna-
tional Conference on Software Maintenance, pages 364–374,
2. Detailed Overview Florence, Italy, 2001.
[4] H. Gall, K. Hajek, and M. Jazayeri. Detection of logical cou-
pling based on product release history. In Proceedings of
The tutorial will provide a good understanding of exist- the 14th International Conference on Software Maintenance,
ing research on mining SE data. The tutorial will categorize pages 190–198, Bethesda, Washington D.C., 1998.
the existing research [9] in this field into three major per- [5] T. L. Graves, A. F. Karr, J. S. Marron, and H. Siy. Predicting
spectives: data sources being mined, tasks being assisted, fault incidence using software change history. IEEE Trans.
and mining techniques being used. Figure 1 shows such a Softw. Eng., 26(7):653–661, 2000.
categorization with the bottom part as a set of software engi- [6] A. E. Hassan, A. Mockus, R. C. Holt, and P. M. Johnson.
Guest editor’s introduction: Special issue on mining software
neering data being mined, the middle part as a set of mining
repositories. IEEE Trans. Softw. Eng., 31(6):426–428, 2005.
techniques being used, and the top part as a set of software [7] M. Mendonca and N. L. Sunderhaft. Mining software engi-
engineering tasks being assisted. neering data: A survey. A DACS state-of-the-art report, Data
From the categorization, we intend to investigate the fol- & Analysis Center for Software, Rome, NY, 1999.
lowing four issues. First, we intend to identify inherent [8] A. Mockus, D. M. Weiss, and P. Zhang. Understanding
challenges of mining software engineering data. We shall and predicting effort in software projects. In Proceedings of
elaborate the essential requirements in software engineer- the 25th International Conference on Software Engineering,
ing, and analyze the differences between mining software pages 274–284, Portland, Oregon, 2003.
[9] T. Xie. Bibliography on mining software engineering data.
engineering data and mining other types of scientific and
Available online at
engineering data. We shall discuss what types of data min- http://ase.csc.ncsu.edu/dmse/.
ing techniques are desired in software engineering, and how

View publication stats

You might also like