Machine-Learning Techniques for Software Product Quality Assessment
Figure 1. ML Taxonomy: probabilistic algorithms; induction (induction of decision trees, induction of rules); learning by analogy (neural networks, case-based learning).

OC1 (Oblique Classifier 1) is also a decision tree induction system, designed for applications where the instances have numeric (continuous) feature values, which is the case in our study. However, it builds decision trees that contain linear combinations of one or more attributes at each internal node:

    Σ_{i=1}^{k} a_i x_i ≤ a_{k+1}

These trees then partition the space of examples with both oblique and axis-parallel hyperplanes. More details on this algorithm are given in [21].

The induction of rules is another important ML approach. Because the structure underlying many real-world datasets is quite rudimentary, and just one attribute is often sufficient to determine the class of an instance quite accurately, it turns out that simple rules frequently achieve surprisingly high accuracy. The simplest such approach, called one rule (1R), builds a rule set that tests a single attribute.

On the other hand, the multilayer perceptron is probably the most widely used neural network architecture for solving classification problems with supervised learning. It consists of an input layer, one or more hidden layers, and an output layer of neurons that feed one another via synaptic weights. The input layer simply holds the data to be classified. The outputs of the hidden and output layers are computed, for each neuron, by summing its inputs multiplied by its synaptic weights and passing the result through an output function. The synaptic weights of the hidden and output neurons are determined by a training procedure in which examples of the patterns to be learned are presented successively and the weights are adjusted so as to minimize the error between the obtained classification results and the desired ones. The standard training procedure for the multilayer perceptron uses the backpropagation algorithm [23] or one of its derivatives. For instance, the resilient backpropagation (RPROP) algorithm [24] offers faster learning and better overall performance than regular backpropagation or backpropagation with momentum [23].

Case-Based Learning (CBL) is the most popular technique in the learning-by-analogy paradigm. The training examples are stored verbatim, and a distance function is used to determine which member(s) of the training set is closest to an unknown test instance. Once the nearest training instance(s) has/have been located, its class is predicted for the test instance. Although there are multiple choices, most instance-based learners use Euclidean distance.
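The case-based procedure described above can be sketched as a minimal nearest-neighbour classifier. This is an illustration only, not the tooling used in the study; the feature vectors and class labels below are invented for the example (e.g. software metrics mapped to a quality label).

```python
import math

def euclidean(a, b):
    # Distance function used by most instance-based learners.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(training_set, test_instance):
    # Training examples are stored verbatim as (features, class) pairs;
    # the class of the closest stored example is predicted.
    nearest = min(training_set, key=lambda ex: euclidean(ex[0], test_instance))
    return nearest[1]

# Hypothetical labelled examples.
training = [((1.0, 2.0), "low-risk"), ((8.0, 9.0), "high-risk")]
print(classify(training, (7.5, 8.0)))  # -> high-risk
```

Extending this sketch to k nearest neighbours (majority vote over the k closest stored examples) is a common refinement when single neighbours are noisy.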
References

[1] L. Briand, P. Devanbu, W. Melo, "An Investigation into Coupling Measures for C++", in Proc. of ICSE '97, Boston, USA, 1997.

[2] J.M. Bieman, B.-K. Kang, "Cohesion and Reuse in an Object-Oriented System", in Proc. ACM Symp. Software Reusability (SSR'94), 259-262, 1995.

[3] S.R. Chidamber, C.F. Kemerer, "A Metrics Suite for Object Oriented Design", IEEE Transactions on Software Engineering, 20(6):476-493, 1994.

[4] W. Li, S. Henry, "Object-Oriented Metrics that Predict Maintainability", Journal of Systems and Software, 23(2):111-122, 1993.

[5] T.M. Khoshgoftaar, J.C. Munson, "Predicting Software Development Errors Using Software Complexity Metrics", IEEE Journal on Selected Areas in Communications, 8(2), February 1990.

[6] A. Porter, R. Selby, "Empirically-Guided Software Development Using Metric-Based Classification Trees", IEEE Software, 7(2):46-54, March 1990.

[7] J.R. Quinlan, "Induction of Decision Trees", Machine Learning, 1:81-106, 1986.

[8] V. Basili, Condon, K. El Emam, R.B. Hendrick, W.L. Melo, "Characterizing and Modeling the Cost of Rework in a Library of Reusable Software Components", in Proc. of the IEEE 19th Int'l Conf. on Software Engineering, Boston, 1997.

[9] M. Jorgensen, "Experience with the Accuracy of Software Maintenance Task Effort Prediction Models", IEEE Transactions on Software Engineering, 21(8):674-681, August 1995.

[10] M.A. De Almeida, H. Lounis, W. Melo, "An Investigation on the Use of ML Models for Estimating Software

[17] L.C. Briand, J. Wust, H. Lounis, "Replicated Case Studies for Investigating Quality Factors in Object-Oriented Designs", Empirical Software Engineering, An International Journal, 6(1):11-58, Kluwer Academic Publishers, 2001.

[18] L. Breiman, J. Friedman, R. Olshen, C. Stone, "Classification and Regression Trees", Wadsworth, 1984.

[19] B. Cestnik, I. Bratko, I. Kononenko, "ASSISTANT 86: A Knowledge Elicitation Tool for Sophisticated Users", Progress in Machine Learning, Sigma Press, 1987.

[20] J.R. Quinlan, "C4.5: Programs for Machine Learning", Morgan Kaufmann Publishers, 1993.

[21] S.K. Murthy, S. Kasif, S. Salzberg, "A System for Induction of Oblique Decision Trees", Journal of Artificial Intelligence Research, 2:1-32, August 1994.

[22] P. Clark, T. Niblett, "The CN2 Induction Algorithm", Machine Learning, 3(4):261-283, 1989.

[23] D.E. Rumelhart, J.L. McClelland (eds.), "Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1: Foundations", MIT Press, 1986.

[24] M. Riedmiller, H. Braun, "A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm", in Proc. of the IEEE Int'l Conf. on Neural Networks, San Francisco, CA, 1993.

[25] P. Langley, W. Iba, K. Thompson, "An Analysis of Bayesian Classifiers", in Proc. of the National Conference on Artificial Intelligence, 1992.

[26] R. Kohavi, "Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid", in Proc. of the 2nd Int'l Conference on Knowledge Discovery and Data Mining, 1996.