
The Journal of Systems and Software 85 (2012) 408–424

Contents lists available at SciVerse ScienceDirect

The Journal of Systems and Software


journal homepage: www.elsevier.com/locate/jss

Design patterns selection: An automatic two-phase method


Seyed Mohammad Hossein Hasheminejad, Saeed Jalili
SCS Lab., Department of Computer Engineering, Tarbiat Modares University, Tehran, Iran

Article info

Article history:
Received 4 September 2010
Received in revised form 30 August 2011
Accepted 31 August 2011
Available online 8 September 2011

Keywords:
Software design pattern
Text classification
Machine learning
Automatic pattern selection

Abstract

Over many years of research and practice in software development, hundreds of software design patterns have been invented and published. A question that naturally arises is how software developers select the right design patterns from all relevant patterns to solve design problems in the software design phase. To address this issue, we propose a two-phase method for selecting the right design pattern. The proposed method is based on a text classification approach and aims to suggest the right design pattern(s) to developers for solving each given design problem. The proposed method has two advantages over previous work. First, there is no need for semi-formal specifications of design patterns; second, the suitable design patterns are suggested together with their degree of similarity to the design problem. To evaluate the proposed method, we apply it to real problems and several case studies. The experimental results show that the proposed method is promising and effective.

© 2011 Elsevier Inc. All rights reserved.

E-mail addresses: SMH.Hasheminejad@Modares.ac.ir (S.M.H. Hasheminejad), Sjalili@Modares.ac.ir (S. Jalili).

1. Introduction

In the software development lifecycle, design is one of the most difficult tasks. In the literature, a number of design methods have been advanced; among them, architecture-based design methods, which commonly use software architecture styles, are believed to be more important than the others (Bass et al., 2003). Each software architecture style includes some components and their relations that meet the style’s constraints.

A common practice in the architecture-based design methods is ADD (Attribute-Driven Design) (Bass et al., 2003). In ADD, software developers initially choose a software architecture style considering the quality attributes that the software system should have, then they assign the system’s requirements to its architectural components. At this step, the software developers can employ software design patterns for designing the requirements related to each architectural component.

A design pattern encapsulates a proven solution to a recurring design problem. Over many years, software developers have suggested solutions for satisfying design problems of architectural components. These experience-based solutions are standardized and have been organized in the form of design patterns. The use of design patterns in software development can provide several advantages, such as increasing reusability, modularization, quality, consistency between the design and the implementation, and relationship between the design and the implementation teams (Gamma et al., 1994; Graves and Czarnecki, 2000).

However, it is a very difficult task to find the right design patterns for solving a given design problem without any tool support, because determining the applicability of a design pattern to a given problem relies heavily on the experience of the software developers (Kim and Khawand, 2007), and it is also extremely difficult for novice developers who are not familiar with design patterns and do not know how to find the best one. In addition, there are a large number of design patterns (Booch, 2006), which makes selection challenging even for a developer who is highly knowledgeable about patterns.

To help overcome these difficulties, in this paper, we attempt to automatically suggest the right design pattern(s) to the developer in the design phase according to the given design problem.

So far, little work has been done on the automatic selection of design patterns in the software design phase (Kim and Khawand, 2007). In contrast, many attempts have been made toward the determination of design patterns after the end of the programming phase, which is considered a reverse engineering method to increase maintainability (Kim and Khawand, 2007).

Current attempts to select the right design patterns can be divided into two approaches. First, a UML-Based approach (Kim and Khawand, 2007; Hsueh et al., 2007; Kim and Shen, 2008), which uses existing UML diagrams from the analysis phase (i.e., class diagrams and collaboration diagrams) to define design patterns and determine the right design pattern(s) for each design problem. Second, an Ontology-Based approach (Hasso and Carlson, 2005; Blomqvist, 2008; Hasso and Carlson, 2004; Khoury et al., 2008), which uses ontologies for the definition and selection of the right design

0164-1212/$ – see front matter © 2011 Elsevier Inc. All rights reserved.
doi:10.1016/j.jss.2011.08.031
S.M.H. Hasheminejad, S. Jalili / The Journal of Systems and Software 85 (2012) 408–424 409

pattern(s). The UML-Based approach has a number of weaknesses. Firstly, it is limited in how precisely it can specify the problem definition of all design patterns. Secondly, producing a meta-model for each design pattern is costly, and the approach is inefficient when dealing with a large number of design patterns. Finally, it does not suggest different design patterns according to the extent of similarity between the problem definition of the retrieved design pattern and the design problem. The Ontology-Based approach also has several shortcomings; for example, it is expensive and creates barriers to automating the approach. In addition, due to the lack of a unique ontology for the software engineering domain, it seems highly impractical (Blomqvist, 2008).

The proposed method is based on a Text Classification approach, which aims at learning the knowledge and experience of experts who invented design patterns and who can select the right design pattern(s) for each given design problem. The text classification approach takes as input design patterns, each having a specific label called Design Patterns Class that has been manually determined by experts; for example, in the book written by Gamma et al. (1994), design patterns are divided into three design patterns classes, i.e., Creational, Structural, and Behavioral Patterns. The goal is therefore to learn the knowledge of experts, i.e., design pattern labels, to determine the most suitable design pattern label for each given design problem. In other words, the input of our proposed method is the problem definitions of some design patterns and their labels. Although the existing approaches described above employ a semi-formal language like UML or the formal meaning of words in ontologies, the simple, high-precision, and low-cost text classification approach uses just the narrative text of the problem definition part of design patterns to determine the right design pattern for a given design problem.

The proposed method has two distinct steps. In the first step, i.e., Learning Design Patterns, after performing preprocessing on the problem definition part of design patterns, one classifier is learned for each design patterns class. In the second step, i.e., Design Pattern Retrieval, among the design patterns classes learned in the previous step, first, a candidate design pattern class similar to a given design problem is determined based on the text classification approach, then the right design pattern(s) belonging to the candidate design pattern class is suggested to the developer. There are multiple choices for the steps of the proposed method; therefore, an evaluation model has been presented for the determination of the best choice. The results of an experimental evaluation using real design problems to suggest the right design pattern(s) from three design pattern groups reveal a promising Precision value of 0.62, a Recall value of 0.75, and a low False Positive value of 0.037.

The rest of this paper is organized as follows: in Section 2, we describe the definition of design patterns. Section 3 presents an overview of the text classification concepts, and in Section 4, the proposed method is described. In Section 5, the evaluation model of the proposed method is presented. In Section 6, the proposed method is evaluated using real design problems. Section 7 describes a model to evaluate the consistency of the proposed design patterns classes manually determined by researchers. Section 8 states some limitations of the proposed method and related work is discussed in Section 9 before the paper concludes in Section 10.

Table 1
The adapter design pattern description (Gamma et al., 1994).

Pattern name: Adapter (Wrapper) Pattern

Problem Domain - Problem Definition
Intent: Convert the interface of a class into another interface clients expect. Adapter lets classes work together that could not otherwise because of incompatible interfaces.
Motivation: Sometimes a toolkit class that is designed for reuse is not reusable only because its interface does not match the domain-specific interface an application requires. Consider for example a drawing editor that lets users draw and arrange graphical elements (lines, polygons, text, etc.) into pictures and diagrams. The drawing editor’s key abstraction is the graphical object, which has an editable shape and can draw itself. The interface for graphical objects is defined by an abstract class called Shape. The rest of Motivation section is available in Gamma et al. (1994).
Applicability: You want to use an existing class, and its interface does not match the one you need. You want to create a reusable class that cooperates with unrelated or unforeseen classes, that is, classes that do not necessarily have compatible interfaces.

Solution Domain - Design Pattern
Structure: [diagram]

Solution Domain - Design Specification
Participants: Target, Adaptee, Client, and Adapter
Collaborations: Clients call operations on an Adapter instance. In turn, the adapter calls Adaptee operations that carry out the request.
Consequences: Adapts Adaptee to Target by committing to a concrete Adapter class. As a consequence, a class adapter will not work when we want to adapt a class and all its subclasses. Lets Adapter override some of Adaptee’s behavior, since Adapter is a subclass of Adaptee. Introduces only one object and no additional pointer indirection is needed to get to the Adaptee.
Implementation: The implementation of Adapter in C++ is available in Gamma et al. (1994).
Related Patterns: Bridge, Decorator, and Proxy Pattern. For example: Bridge has a structure similar to an adapter, but Bridge has a different intent: It is meant to separate an interface from its implementation so that they can be varied easily and independently.

2. Analysis of design patterns

In general, a design pattern is described in a template consisting of two sections, Problem Domain and Solution Domain. The Problem Domain describes the problem context where the pattern can be applied. Analogously, the Solution Domain describes the structure and collaborations of the pattern solution being applied to the problem. Any design pattern consists of Pattern Name section (originated from its concept), Intent section (description of its problem), Motivation section (a scenario that illustrates a design problem), Applicability section (the situations in which the design pattern can be applied), Structure section (a structure of the participants in the Solution Domain), Participants section (the classes and/or objects participating in the Solution Domain), Collaborations section (collaboration diagrams between solution participants), Consequences section (the results and trade-offs of applying the pattern), Implementation section, and the Related Patterns section (Gamma et al., 1994). In this paper, only the Problem Domain of each design pattern document, comprising the Intent, Motivation, and Applicability sections (called Problem Definition), is used to automatically select the right design pattern(s). To illustrate the template of a design pattern, Table 1 shows the Adapter design pattern document from the book by Gamma et al. (1994). Each section in the table describes a specific aspect of the Adapter design pattern document. For example, the Intent section describes a short statement that states the

need for the Adapter design pattern. In this table, the Adapter design pattern document is divided into three parts, Problem Definition, Design Pattern, and Design Specification.

There are a number of potential reasons why several researchers (Gamma et al., 1994; Booch, 2006; Tichy, 1997; Pree, 1995; Coad et al., 1995; Rising, 2000; Douglass, 2002; Trowbridge et al., 2006) classify design patterns. The most significant one is to create an abstraction for the Problem Domain of design patterns so that they can be searched according to their categories (i.e., design pattern class). In addition, these classifications can help developers to manually search design patterns.

In this paper, we employ the text classification approach, which takes as input the knowledge of the researchers in the form of a classification of design patterns, meaning that design patterns are grouped into several classes (design pattern class) according to their similarities. For this reason, the classifications made by researchers are discussed below.

We divide the design pattern classifications that have been done manually so far into two groups, Problem Based and Solution Based classification schemes:

Problem based classification scheme. Gamma et al. (1994) divided design patterns into three high level categories, Creational, Structural, and Behavioral. Pree (1995) also added a new coarse-grained classification scheme to the design patterns listed in Gamma et al. (1994) which relied on Abstract Coupling and Recursive Structures. Coad et al. (1995) organized design patterns into four major subclasses, Transaction, Aggregate, Plan, and Interaction. Tichy (1997) attempted to categorize the design patterns based on problems in the design phase. The major categories cited were Decoupling, Variant Management, State Handling, Control, Virtual Machine, Convenience Patterns, Compound Patterns, Concurrency, and Distribution. Rising (2000) used application domains as categories, for example, Accounting, Air Defense, Hypermedia, Integration, Trading, Database, C++ idioms, Persistence, System Modeling, and Customer Interactions. Douglass (2002) categorized design patterns into six classes, the Architecture Design, Concurrency, Memory, Resources, Distribution, and Safety and Reliability. In the classification made by Booch (2006), a large number of design patterns are mentioned, most of them related to architecture design. In his classification, design patterns are categorized into 45 categories. Trowbridge et al. (2006) classified design patterns based on the following questions: Purpose (Why), Data (What), Function (How), Timing (When), Network (Where), People (Who), and Scoreboard (Test).

Solution based classification scheme. Zimmer (1995) categorized design patterns based on the links between their solutions, such as: the solution of one pattern is composed of the solution of another pattern, the solution of one pattern is similar to the solution of another pattern, etc. However, these relations have been stated only between a small number of design patterns. Kim and Han (2007) grouped the design pattern solutions according to their structures and dependencies in object-oriented.

Although current classifications make it easier for developers to select the right design pattern(s) manually, they still face two obstacles: the high number of categories and the large number of design patterns in each category. These obstacles make the selection of design patterns difficult for developers; therefore, to overcome them, we need a tool to automatically select the right design patterns.

In the proposed method, between the two presented classifications, namely Problem Based and Solution Based, we select only the Problem Based classification as an input, because the proposed method uses only the problem definitions of design patterns to decide which design pattern(s) is suitable for the given design problem.

3. Overview of text classification

In general, supervised automatic text classification is a machine learning technique, i.e., assigning documents automatically to one or several predefined classes. When each document is assigned to only one category, it is called single-label categorization. Classifications are usually made in binary mode, which means the label of each document is either +1 or −1, i.e., each document is assigned to either one category or its complement (Sebastiani, 2002; Jalili and Bitarafan, 2006; Jalili and Sadri, 2007). Text classification usually uses two sets: a training set and a test set. The training set is used for learning some classifiers and requires a primary group of labeled documents, in which the category related to each document is obvious from its label. The test set is used to measure the efficiency of the learned classifiers and includes labeled documents which do not participate in learning classifiers.

3.1. Common text representation and processing

Texts cannot be directly interpreted by a classifier or by a classifier-building algorithm. Therefore, preprocessing is first performed on text documents and then a classifier is learned. During the preprocessing phase, usually two actions are performed on the documents: filtering stop words (e.g., articles, conjunctions, prepositions, etc.) and stemming words. Stemming methods (Hotho et al., 2005) are used to reduce the number of words in the document and try to build basic forms of words, i.e., strip the plural ‘s’ from nouns, the ‘ing’ from verbs, etc. A well-known rule-based stemming algorithm was originally proposed by Porter (2006). He defined a set of production rules to iteratively transform (English) words into their stems. Some stemming instruments have also been produced, such as UEA (2011).

In the next step, called indexing, the document representation model is determined. The most common model for indexing is the vector space model, which represents documents as feature vectors without using any explicit semantic information. In this model, each feature indicates the presence or absence of a word, or the frequency of a term in a document. To improve the performance of the text classification and remove noise from the documents, term weighting schemes are usually used (Sebastiani, 2002; Hotho et al., 2005), where the weights reflect the importance of a word in a specific document of the considered collection. There are several ways of determining the weight of any word in documents (Hotho et al., 2005): Binary, Term Frequency (TF), Term Frequency Inverse Document Frequency (TFIDF), Term Frequency Collection (TFC), Length Term Collection (LTC), and Entropy weighting.

In the last step, feature selection methods are used for noise omission and for reducing the number of words and feature vector dimensions. The feature selection methods attempt to keep the classifier precision unchanged while removing a subset of features. The feature selection methods are divided into two general categories: wrapper methods and filtering methods. Filtering methods are simple, low-cost, useful, more efficient, and computationally easier than wrapper methods (Hotho et al., 2005). There are different types of filtering methods (Jalili and Bitarafan, 2006; Hotho et al., 2005): Document Frequency (DF), Information Gain (IG), Mutual Information (MI), Chi-square (CHI), and Correlation Coefficient (CC) method.
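As an illustration of the preprocessing and weighting steps of Section 3.1, the following minimal Python sketch filters stop words, applies a crude suffix-stripping stemmer, and computes TFIDF weights. The stop-word list, the function names, the stemming rules (a rough stand-in for a real stemmer such as Porter's, not that algorithm itself), and the tf × log(N/df) weighting variant are all assumptions made for illustration; they are not taken from the proposed method.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "that", "is", "into"}


def naive_stem(word):
    """Crude suffix stripping; only a stand-in for a real stemmer."""
    if word.endswith("sses"):
        return word[:-2]            # classes -> class
    if word.endswith("ss"):
        return word                 # class stays class
    if word.endswith("s") and len(word) > 3:
        return word[:-1]            # shapes -> shape
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]            # drawing -> draw
    return word


def preprocess(text):
    """Lowercase, strip simple punctuation, drop stop words, then stem."""
    tokens = [t.strip(".,;:()") for t in text.lower().split()]
    tokens = [t for t in tokens if t.isalpha()]
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TFIDF weight vector per document."""
    token_lists = [preprocess(d) for d in documents]
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    n_docs = len(documents)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocabulary])
    return vocabulary, vectors
```

A term that occurs in every document receives weight zero under this variant, which is one way term weighting suppresses uninformative words.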

4. Proposed method

The text classification approach has been used in different domains, some of which we mention here. First, to choose the most suitable category for a given advertisement, a newspaper agency, like Reuters, classifies a high volume of advertisements in an automatic manner (Sebastiani, 2002). Second, to organize patents into categories, the text classification approach is used for making search easier (Larkey, 1999). Third, to automatically classify Web pages, or websites, under hierarchical catalogues, the text classification approach is used to find a particular category of interest (Attardi et al., 1998). Finally, to filter spam, electronic mailboxes often use this approach (Androutsopoulos et al., 2000; Jalili and Gerani, 2007).

Our main motivation for using the text classification approach is its ability to classify automatically; therefore, we use it in the proposed method to organize design patterns and to automatically retrieve the right design pattern for a given design problem. A further motivation is the fact that the design patterns selection problem is similar to an IR (Information Retrieval) problem; therefore, with regard to the literature, we can use the text classification approach for solving the design patterns selection problem.

In the proposed method, the text classification steps are performed on the narrative text of the Problem Definition part of design patterns (see Table 1).

Fig. 1 shows the process of the proposed method to select the right design pattern(s), which works in four steps: (1) Preprocessing, (2) Learning Design Patterns Classifiers as Design Patterns Organization, (3) Determination of A Design Patterns Class, and (4) Suggestion of Design Pattern(s) as Two-Phase Retrieval of the Right Design Pattern(s). The following sections describe the steps of the process.

Fig. 1. Process of the proposed design pattern(s) selection for each given design problem.

4.1. Preprocessing

As shown in Fig. 1, preprocessing the text of all design pattern problem definitions is the first step of the proposed method. The activities of the preprocessing are illustrated by a chart in Fig. 2. In this chart, there are some optional activities, consisting of word stemming, term weighting, and feature selection, shown by the dark blocks. The reason for their use in the text classification approach is to improve the efficiency of the proposed method.

Fig. 2. Activities of design patterns preprocessing step.

As discussed earlier, at first, stop words are removed; second, the extracted words are stemmed. Third, for each design patterns class, one particular training set is created based on single-label categorization and binary mode (see Section 3). To create the training set for each design patterns class, the labels of design patterns belonging to this class are positive (i.e., +1), and the labels of the rest of the design patterns are negative (i.e., −1).

Fourth, the vector space model is used for indexing and for all classifiers a feature vector is formed, describing all non-repeated terms in the documents.

Finally, among the weighting methods and feature selection methods mentioned earlier in Section 3.1, one of them can be selected and applied to the feature vector.

4.2. Learning classifiers of design patterns

In the second step, as shown in Fig. 1, a classifier is learned for each design patterns class using the training set corresponding to that design patterns class.

Note that there are several techniques for supervised learning (classification), but it cannot be said which one is better than the others (Alpaydın, 2010). The reason is that there is no learning technique that achieves good results for all problems. In real life, some of them may achieve good results for some problems and bad results for the rest (Alpaydın, 2010). Therefore, to find the best learning technique for a new problem (i.e., design pattern selection), we evaluate several learning techniques so that the best one is identified. Thus, one of the contributions of this paper is to suggest the best learning technique compatible with design pattern selection. Because the common learning techniques in text classification are (Sebastiani, 2002; Hotho et al., 2005) Naïve Bayesian, K-Nearest Neighbor (KNN), C4.5 Decision Trees, and Support Vector Machines (SVM), learning each classifier is performed in four different ways.

In recent years, a fairly large number of machine learning tools have been advanced in practice to apply these techniques automatically. Among the well-known machine learning tools, we use SVM Light (SVM, 2011) and WEKA (JPML, 2006), which include different types of classification methods. Using the evaluation model described in Section 5, the efficiency of the different learning techniques is evaluated.

Finally, we choose the best learning technique compatible with design pattern selection based on the presented evaluation model.

After finding the learning technique with the best performance, each classifier is typically trained again using the entire initial design patterns of each design patterns class and these classifiers

are then used in subsequent steps. In this case, the result of the evaluation is a pessimistic estimation of effectiveness, because the final classifier is trained with more data in comparison to the evaluated classifier (Sebastiani, 2002).

4.3. Determination of a design patterns class

As displayed in Fig. 1, the design pattern(s) corresponding to each given design problem is suggested to the developer in two phases (i.e., phases 3 and 4). For more clarity, Fig. 3 shows the proposed two-phase retrieval of the design pattern(s) for a sample design problem, where each person is a symbol of the classifier, while the + and − symbols in the rectangle present the decisions of the corresponding classifier.

In Fig. 3, we choose a sample collection of design patterns (i.e., Douglass Patterns (Douglass, 2002)), divided by expert opinions into five classes, namely Concurrency, Memory, Resources, Distribution, and Safety and Reliability patterns. Therefore, during the previous steps of the proposed method, five classifiers corresponding to the design patterns classes are learned. Note that the input of this section is the learned classifiers and the given design problems, whereas its output is a candidate design patterns class for each given design problem.

As shown in Fig. 3, a design problem description is given to each classifier, like an expert, to identify whether the design problem is related to the problem definitions of design patterns belonging to the classifier or not. As illustrated in Fig. 3, among the five learned classifiers, only one classifier opinion is positive, i.e., only the Distribution Patterns classifier decides that design problem n belongs to its design patterns class.

After that, one or more design patterns corresponding to the given design problem are suggested to the developer (see Section 4.4).

t_M)), where each w(d, t_i) presents a weight for each word of the design patterns collection, i.e., the size of the vector, M, is defined by the number of words of the complete design patterns collection after stop words removal and word stemming. After similarity measurement (based on Eq. (1)), one of the following equations is applied to select the best design pattern.

A. With respect to Eq. (2), the jth design pattern with the highest value of S_i is suggested to the developer.

$$j = \arg\max_{i} S_i \qquad (2)$$

B. The design patterns with S_i that satisfy Eq. (3) are suggested to the developer.

$$|S_i| > \tau_1 \qquad (3)$$

where τ_1 is the threshold level for similarity. Depending on the value of τ_1, no pattern might be suggested to the developer.

C. The design patterns with S_i that satisfy Eq. (4) are suggested to the developer.

$$|S_{Max} - S_i| \le \tau_2 \qquad (4)$$

where τ_2 and S_Max are the threshold level for similarity and the highest value of similarities between all design patterns and the given design problem, respectively.

D. The proposed method can use a combination of Eqs. (2)–(4): first, use Eq. (3) to filter design patterns, and second, use Eq. (2) to find the best design pattern, and last, use Eq. (4) to suggest all design patterns close to the best design pattern.

The efficient threshold level is obtained from experiment (Sebastiani, 2002). Appropriate values for τ_1 and τ_2 will be proposed in Section 6 according to the performed evaluations.
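The suggestion options of Eqs. (2)–(4), applied to the similarity of Eq. (1), can be illustrated with a minimal Python sketch. It assumes that the weight vectors are L2-normalized, so that the sum of products in Eq. (1) coincides with the cosine similarity; the function names, the sample vectors, and the threshold values used with them are hypothetical, and suitable thresholds can only be found experimentally.

```python
import math


def l2_normalize(weights):
    """Scale a weight vector to unit length (assumption: makes Eq. (1) a cosine)."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights] if norm > 0 else list(weights)


def similarity(pattern_weights, problem_weights):
    """Eq. (1): sum over all M terms of the product of the two term weights."""
    return sum(p * q for p, q in zip(pattern_weights, problem_weights))


def suggest(similarities, tau1=None, tau2=None):
    """Combine the options as in option D.

    similarities maps a pattern name to its S_i.
    tau1: if given, keep only patterns with |S_i| > tau1 (Eq. (3)).
    tau2: if given, return every pattern with |S_Max - S_i| <= tau2 (Eq. (4));
          otherwise return only the best pattern (Eq. (2)).
    """
    candidates = dict(similarities)
    if tau1 is not None:
        candidates = {n: s for n, s in candidates.items() if abs(s) > tau1}
    if not candidates:
        return []  # depending on tau1, no pattern may be suggested
    best = max(candidates, key=candidates.get)
    if tau2 is None:
        return [best]
    s_max = candidates[best]
    close = [n for n, s in candidates.items() if abs(s_max - s) <= tau2]
    return sorted(close, key=lambda n: candidates[n], reverse=True)


# Hypothetical four-term vector space and three patterns of one candidate class.
PROBLEM = l2_normalize([1, 1, 0, 0])
PATTERNS = {
    "Adapter": l2_normalize([1, 1, 0, 0]),
    "Bridge": l2_normalize([1, 0, 1, 0]),
    "Proxy": l2_normalize([0, 0, 1, 1]),
}
SIMILARITIES = {name: similarity(vec, PROBLEM) for name, vec in PATTERNS.items()}
```

Calling `suggest(SIMILARITIES)` corresponds to option A, while passing both `tau1` and `tau2` corresponds to option D, which returns the best pattern followed by those close to it.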
At this phase, after obtaining the description of the design problem, stemming and weighting are performed on its words as the preprocessing process. The vector space of the sample design problem, i.e., design problem n vector space, is created according to the classifiers vector space, and the result is depicted in Fig. 4. Each element of the vector shown in Fig. 4 presents a word of the problem definitions of the design patterns, where M is the number of words of all problem definitions of design patterns. Fig. 4 shows that design problem n has 10 words already in the classifiers vector space, so for these words, the corresponding feature values are 1 and for the rest, the feature values are 0.

5. Proposed evaluation model

In this section, first a model for evaluating different learning techniques of design patterns classes is described. In the evaluation, the best learning technique for determining the right design patterns class will be selected. Next, a method is presented for evaluating the options proposed in Section 4.4 for suggesting the design pattern(s) from design patterns belonging to the selected design patterns class. To evaluate the proposed method, we use resources consisting of three design pattern groups and 60 real design problems, which are described in Sections 5.3 and 5.4, respectively.
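The binary design problem vector of Fig. 4 and the class determination of Section 4.3 can be sketched in Python as follows. This is only an illustration under stated assumptions: a 1-nearest-neighbour rule stands in for the learned classifiers (the techniques actually evaluated are Naïve Bayesian, KNN, C4.5, and SVM), and the vocabulary, the token lists, and all function names are hypothetical.

```python
def binary_vector(tokens, vocabulary):
    """Feature value 1 if the vocabulary word occurs in the tokens, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


def one_vs_rest_labels(pattern_classes, target_class):
    """Binary-mode labels: +1 for patterns of target_class, -1 for the rest."""
    return [1 if cls == target_class else -1 for cls in pattern_classes]


def overlap(u, v):
    """Number of shared terms between two binary vectors."""
    return sum(a * b for a, b in zip(u, v))


def nearest_neighbour_decision(train_vectors, labels, query):
    """Stand-in classifier: return the label of the most overlapping pattern."""
    best = max(range(len(train_vectors)),
               key=lambda i: overlap(train_vectors[i], query))
    return labels[best]


def candidate_classes(train_vectors, pattern_classes, query):
    """Ask one binary classifier per class and collect the positive decisions."""
    positives = []
    for cls in sorted(set(pattern_classes)):
        labels = one_vs_rest_labels(pattern_classes, cls)
        if nearest_neighbour_decision(train_vectors, labels, query) == 1:
            positives.append(cls)
    return positives


# Hypothetical vocabulary and preprocessed problem definitions of three patterns.
VOCABULARY = ["memory", "object", "pool", "remote", "thread"]
PATTERN_TOKENS = [["memory", "pool"], ["remote", "object"], ["thread", "pool"]]
PATTERN_CLASSES = ["Memory", "Distribution", "Concurrency"]
TRAIN_VECTORS = [binary_vector(t, VOCABULARY) for t in PATTERN_TOKENS]

# A design problem whose words partly fall outside the vocabulary ("call").
QUERY = binary_vector(["remote", "object", "call"], VOCABULARY)
```

With these sample data only the Distribution classifier answers positively, mirroring the situation described for Fig. 3; words of the design problem that are not in the classifiers vector space are simply ignored.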
4.4. Suggestion of design pattern(s)

As shown in Fig. 3, in the design pattern(s) suggestion phase, after a candidate design patterns class which is appropriate to the given design problem is identified by the classifiers, the most suitable design pattern for the given design problem is identified from that design patterns class using cosine similarity and applying a threshold.

To suggest a right design pattern (from the identified design patterns class) for the given design problem, first, the cosine similarity (S_i) between the problem definition of the ith design pattern and the design problem description is calculated based on Eq. (1).

$$S_i = \sum_{k=1}^{M} w(\mathrm{DesignPattern}_i, t_k) \times w(\mathrm{TheDesignProblem}, t_k) \qquad (1)$$

where DesignPattern_i and TheDesignProblem are the problem definition of the ith design pattern and the design problem description, respectively. In Eq. (1), it is assumed that a document d, for example, a design pattern problem definition or a design problem description, is defined by a vector of term weights W(d) = (w(d, t_1), ..., w(d,

5.1. Learning evaluation of design patterns classes

The evaluation of document classifiers is typically conducted experimentally, rather than analytically. The classification effectiveness is usually measured in terms of four parameters, Precision (P), Recall (R), F1 (as combination of P and R using Eq. (5)), and False Positive error rate (FP). To estimate P and R for learned classifiers, either micro-averaging equations or macro-averaging equations can be used.

We use micro-averaging equations (see Eqs. (6) and (7)), because they are more precise than macro-averaging equations (Sebastiani, 2002). To select the best weighting method for each learning technique, we use the Effect of Weighting Method (EWM) computed according to Eq. (8). Let EWM_i be the effect of each weighting method, for example, the Binary, TF, TFIDF, TFC, and Entropy weighting method, on the learning technique. Therefore, EWM_i for each weighting method is computed, next the one which has the highest value of EWM is chosen. In the following equations, where C is the number of design patterns classes, TP is the number of design problems which belong to a design patterns class and the classifier

Fig. 3. The scheme of the two-phase retrieval of the right design pattern(s) for Design Problem n.

Fig. 4. The schema of Design Problem n vector space.
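The vector-space comparison behind Eq. (1) can be illustrated with a minimal Python sketch. It assumes plain term-frequency weights scaled to unit length, so that the sum of weight products behaves as a cosine; the weighting choice, the function names (tf_weights, similarity, rank_patterns), and the toy token lists are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def tf_weights(tokens):
    """Unit-length term-frequency weights (an assumed weighting scheme)."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def similarity(pattern_weights, problem_weights):
    """Eq. (1): sum over shared terms of the product of the two weights."""
    return sum(w * problem_weights[t]
               for t, w in pattern_weights.items()
               if t in problem_weights)


def rank_patterns(problem_tokens, pattern_definitions):
    """Return (pattern name, similarity) pairs, most similar first."""
    problem = tf_weights(problem_tokens)
    scores = [(name, similarity(tf_weights(tokens), problem))
              for name, tokens in pattern_definitions.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical, already stemmed and stop-word-filtered token lists.
patterns = {
    "Pattern A": ["process", "creat", "right", "child"],
    "Pattern B": ["file", "author", "access"],
}
ranking = rank_patterns(["process", "creat", "right"], patterns)
```

With these toy inputs, the first entry of ranking is the pattern whose problem definition shares the most terms with the design problem; a threshold on the score could then be applied as described in Section 4.4.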



has correctly identified them, FP is the number of design problems which do not belong to a design patterns class but the classifier has incorrectly identified them as belonging to that design patterns class, and FN is the number of design problems which belong to a design patterns class but the classifier has not identified them as belonging to that design patterns class.

$$F_1 = \frac{2 \times P \times R}{P + R} \tag{5}$$

$$P = \frac{\sum_{i=1}^{|C|} TP_i}{\sum_{i=1}^{|C|} (TP_i + FP_i)} \quad \text{(micro-averaging)} \tag{6}$$

$$R = \frac{\sum_{i=1}^{|C|} TP_i}{\sum_{i=1}^{|C|} (TP_i + FN_i)} \quad \text{(micro-averaging)} \tag{7}$$

$$\mathrm{EWM}_i = F_{1,i} - FP_i \tag{8}$$

To select the best learning technique, we calculate two factors called Evaluation Metrics Fusion (EMF) as follows:

$$\mathrm{EMF}_1 = \frac{P + R}{2} - \alpha \times FP \tag{9}$$

$$\mathrm{EMF}_2 = (1 - \beta) \times P + \beta \times R - \alpha \times FP \tag{10}$$

As shown in EMF1, P and R have been assigned the same weight, but FP has been considered as a penalty with a tuning coefficient α ≥ 0. Based on EMF2, a learning technique is suggested with regard to the relative importance (β) of P and R. If Precision (P) is more important than Recall (R) for the proposed design patterns identification system, β < 0.5 is considered; otherwise, β > 0.5. In case β = 0.5, EMF2 reduces to EMF1.

Finally, for feature selection, the Document Frequency (DF) method is used in the evaluation model.

5.2. Evaluation of design pattern(s) selection

The goal is to select the closest pattern(s) to the problem, after identifying the design patterns class related to a design problem. This phase of the evaluation is only applied to real design problems. Having determined the design patterns class of a real design problem, the proper design pattern(s) is suggested by applying the method with the proper similarity equation presented in Section 4.4. Therefore, in this evaluation, among the four similarity equations described in Section 4.4, the one with the highest Ratio of Correct Detection of Design Pattern (RCDDP), computed by Eq. (11), is chosen.

$$\mathrm{RCDDP} = \frac{\text{Number of Correctly Suggested Design Patterns}}{\text{Number of Suggested Design Patterns}} \tag{11}$$

5.3. Design patterns groups

In real life, design patterns are categorized into groups by software engineering experts. In our evaluation model, we use three groups of design patterns which are frequently used.

Security patterns group. The integration of security systems using security patterns is discussed in Schumacher et al. (2006), where 46 security patterns have been divided into eight categories. The number of non-repeated words for all these 46 patterns after removing stop words and stemming in this case study is 590 unique words.

Douglass patterns group. A robust scalable architecture has been presented for real time systems in (Douglass, 2002) using design patterns, where 34 design patterns have been presented in five categories. The number of non-repeated words for all these 34 patterns after removing stop words and stemming in this case study is 471 unique words.

GoF patterns group. The book (Gamma et al., 1994) of GoF (Gang of Four) patterns, which includes object-oriented design patterns, is one of the major references for design patterns, where 23 object-oriented design patterns are divided into three coarse-grained categories. The number of non-repeated words for all these 23 patterns after removing stop words and stemming in this case study is 441 unique words.

5.4. Real design problems

We employ 60 real design problems extracted from different resources (Schumacher et al., 2006; Silberschatz et al., 2002; Tanenbaum, 2001; CS, 2011) to evaluate the effectiveness of the proposed method. To provide an illustration of these design problems, we introduce five of them as follows:

Real design problems for security patterns evaluation. To evaluate security patterns in terms of real design problems, we choose 22 security design problems from (Schumacher et al., 2006) and from design documents of several applications. For example, one of them is presented as follows:

Design Problem 1: “A user executes processes; processes are usually created through system calls to the operating system. A process needs to create a new process (a child process). How to define the rights to be given to a new process?”

Real design problems for Douglass patterns evaluation. Due to the similarity between the Douglass design patterns and operating system design problems, we select 19 design problems from two famous operating system books (Silberschatz et al., 2002; Tanenbaum, 2001). For example, two of these design problems are presented as follows:

Design Problem 2: “How to create a dependency between distributed clients so that when one object changes data, all its dependent clients are notified and updated automatically”.

Design Problem 3: “All Program scripts must be loaded into physical memory before they can be executed. So, we need the memory manager of the operating system keeps track of allocated and unallocated regions of physical memory”.

Real design problems for GoF patterns evaluation. To evaluate this group of design patterns, we select 19 design problems from two object-oriented detailed design documents including the E-Archive Project (CS, 2011) and Personal Information Filter Project (PIF) (CS, 2011). For example, two of these design problems are presented as follows:

Design Problem 4: “The JDBC Control is used to provide and standardize the interface to the DBMS, thus increasing the modularity of the connection with the database. How to create JDBC Control?”

Design Problem 5: “An operation should handle all incoming requests from clients and forward the requests to the Event Handler. The client uses a web browser to send httprequests to the servlet. The servlet handles the request by sending a httpservletrequest to the eventhandler, along with a httpservletresponse that can be used to send results back to the corresponding browser”.

6. Results of the proposed method evaluation

In this section, the proposed method will be evaluated according to the evaluation model presented in Section 5. To reduce features and calculate the values of EMF1 and EMF2 in each learning technique, the most suitable weighting method (according to EWM) has been used for the respective learning technique, unless the weighting method is explicitly mentioned. In this paper, in the KNN learning technique, we assumed the value of parameter K equal to 3 to
Fig. 5. The evaluation results of learning security patterns.

Fig. 7. EMF2 curve of learning security patterns (x-axis: value of β from 0.1 to 0.9 using α = 1; y-axis: EMF2; curves: Naïve Bayes, SVM, C4.5, and KNN).

obtain better results. In all evaluations, the learning technique suggested as the best one is the technique with the highest value of EMF1, where α = 1, and of EMF2, where α = 1 and β = 0.5, i.e., the point at which the criteria P and R are considered as equal, like the Breakeven Point (Sebastiani, 2002).

6.1. Learning evaluation using design pattern groups

In this section, the effectiveness of learning techniques in the design patterns class identification is evaluated using the evaluation model explained in Section 5.1 and the three pattern groups mentioned in Section 5.3. In this evaluation, both the training set and test set are obtained from a design patterns collection, where 70 percent of the collection is randomly selected as the training set for training the classifier and the rest of the collection (30 percent) is used as the test set for testing the classifier. In this section, each learning technique is evaluated using 10 random training and test sets and their best results have been reported. Although the different weighting methods Binary, TF, TFIDF, TFC and Entropy are applied in the evaluations, only the results of the best weighting method for each learning technique have been reported. Moreover, in all figures, LT and BWM stand for Learning Technique and the Best Weighting Method, respectively.

6.1.1. Security patterns evaluation

Fig. 5 compares the evaluation results of the four learning techniques, SVM, Naïve Bayes, C4.5, and KNN, using security patterns. As shown in Fig. 5, according to the EWM criterion, Naïve Bayes and SVM achieve better results than the other learning techniques. Figs. 6 and 7 depict the results of combining the evaluation criteria according to the curves of EMF1 and EMF2, respectively.

All curves in Fig. 6 reveal that the values of EMF1 decrease as the penalty coefficient (α) is raised. However, KNN has a steeper decreasing slope in comparison to the other learning techniques. Therefore, according to the EMF1 criterion, Naïve Bayes achieves better performance than the other learning techniques for security patterns.

The upper curve in Fig. 7 reveals that the EMF2 value of Naïve Bayes goes up to 0.91 for β equal to 0.2 and then decreases slowly. The C4.5 curve falls steadily, revealing that the more attention is paid to R than to P, the more the value of EMF2 decreases. The KNN and SVM curves are constant for different β coefficients, revealing that the values of P and R are equal. Therefore, based on EMF2, Naïve Bayes has better results than the other learning techniques for security patterns.

As shown in Fig. 2, in the preprocessing step of the proposed method, we can optionally use feature selection both to increase the classification performance and to reduce the length of the feature vectors. Therefore, to evaluate the impact of this selection on the overall performance of the proposed method, we use Document Frequency (DF), due to its simplicity and effectiveness in feature selection (Sebastiani, 2002).

In this case study, the highest value of DF for a term is 16. In Fig. 8, the changes in P, R and F1 of Naïve Bayes are displayed with regard to the changes of term frequency in document (X) and the number of omitted features (N). For example, in Fig. 8, X equal to 12 means the omission of features whose DF value is equal to or more than 12. In Fig. 8, when X = 10, the values of the criteria are P = 0.94, R = 0.94, F1 = 0.94, and FP = 0.01, and 2% of the features have been removed. As illustrated in Fig. 5, before feature selection, the values

Fig. 6. EMF1 curve of learning security patterns (x-axis: value of α from 0 to 4; y-axis: EMF1; curves: Naïve Bayes, SVM, C4.5, and KNN).

Fig. 8. The results of feature selection in Naïve Bayes learning of security patterns (x-axis: X = 16, 14, 12, 10, 8, 6, 4, 2 with the corresponding number of omitted features N = 0, 3, 7, 10, 19, 59, 82, 185; y-axis: value of criterion; curves: P, R, and F1 of Naïve Bayes).

Fig. 9. The evaluation results of learning Douglass patterns.

Fig. 10. The evaluation results of learning GoF patterns.

of P and FP are 0.88 and 0.02, respectively. Also, the comparison between Figs. 5 and 8 reveals that removing features with high DF leads to a decrease in the degree of error (noise), i.e., FP, and an increase in the value of P. When X is in the range of (6, 10), the value of F1 remains the same as before the reduction of features. Therefore, in this case, the DF method achieves the same previous results in this range with a 10% reduction of features. Hence, if the goal here is to reduce features as much as possible, the feature reduction rate can be increased from 2% to 10%, which leads to a reduction of F1 from 0.94 to 0.91.

Feature selection has also been performed for the other learning techniques. In this case study, no improvement is observed in SVM. In C4.5, when X is in the range of (5, 16), the F1 value increases; when X becomes less than 5, the F1 value decreases. In KNN, when X is in the range of (10, 16), F1 is constant; in the range of (5, 9), F1 decreases; in the range of (3, 4), F1 increases; and when X becomes less than 3, F1 begins to decline.

6.1.2. Douglass patterns evaluation

Fig. 9 compares the evaluation results of the four learning techniques, SVM, Naïve Bayes, C4.5, and KNN, using Douglass patterns. As shown in Fig. 9, based on the EWM criterion, SVM and Naïve Bayes, in that order, are better than the other learning techniques. In this case study, the curves of EMF1 and EMF2 for the learning techniques are reported in (CS, 2011). In fact, in this case study, the evaluation results of the Naïve Bayes and SVM learning techniques indicate an ideal mode, because both P and R are equal to 1. Therefore, in terms of EMF1 and EMF2, these two learning techniques have better performance than the others for Douglass patterns.

6.1.3. GoF patterns evaluation

Fig. 10 compares the evaluation results of the four learning techniques, SVM, Naïve Bayes, C4.5, and KNN, using GoF patterns. As shown in Fig. 10, based on the EWM criterion, SVM and Naïve Bayes have better results than the others. In this case study, the curves of EMF1 and EMF2 for the learning techniques are reported in (CS, 2011).

Therefore, our experiments show that, based on the EMF1 and EMF2 criteria, SVM has better results than the other learning techniques for GoF patterns. The results of feature selection for the two former case studies are reported in (CS, 2011).

6.2. Learning evaluation using real design problems

In this section, the efficiency of learning techniques in design patterns class determination will be evaluated using the evaluation model explained in Section 5.1. Also, in this section, classifiers will be learned based on the design patterns groups (one by one) mentioned in Section 5.3 and will be evaluated using the real design problems mentioned in Section 5.4.

6.2.1. Security patterns evaluation

Having learned eight classifiers, one for each security design patterns class mentioned in Section 5.3, we used all 22 security design problems mentioned in Section 5.4 to evaluate the mean effectiveness of the best learning technique.

The evaluation results of real design problems for each learning technique are compared in Fig. 11. Our experiments show that, based on the curves of EMF1 and EMF2 (CS, 2011) and EWM in Fig. 11, SVM has better results than the others in this case study. Among the eight learned SVM classifiers for the eight security patterns classes, the Operating System Access Control Patterns Class is the only class suggested by the proposed method for design problem 1. Therefore, in this case, the result of the proposed method is the same as expert opinions.

6.2.2. Douglass patterns evaluation

Having learned five classifiers, one for each Douglass patterns class mentioned in Section 5.3, we used all 19 Douglass design problems mentioned in Section 5.4 to evaluate the mean effectiveness of the best learning technique.

The evaluation results of real design problems for each learning technique are compared in Fig. 12. Thus, in this case, according to EMF1 and EMF2 (CS, 2011) and EWM of Fig. 12, the two learning techniques SVM and Naïve Bayes (with a slight difference) achieve better results than the other learning techniques.

Fig. 11. The evaluation results of learning security patterns using 22 real design problems.
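The criteria used in these comparisons, Eqs. (5)–(10) of Section 5.1, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and the per-class count lists are hypothetical, and FP in EWM and EMF is passed as an error rate, as in the reported results.

```python
def micro_precision(tp, fp):
    """Eq. (6): micro-averaged precision from per-class TP and FP counts."""
    denom = sum(t + f for t, f in zip(tp, fp))
    return sum(tp) / denom if denom else 0.0


def micro_recall(tp, fn):
    """Eq. (7): micro-averaged recall from per-class TP and FN counts."""
    denom = sum(t + f for t, f in zip(tp, fn))
    return sum(tp) / denom if denom else 0.0


def f1_score(p, r):
    """Eq. (5): harmonic combination of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def ewm(f1, fp_rate):
    """Eq. (8): effect of a weighting method."""
    return f1 - fp_rate


def emf1(p, r, fp_rate, alpha=1.0):
    """Eq. (9): equal weight on P and R, FP penalised by alpha."""
    return (p + r) / 2 - alpha * fp_rate


def emf2(p, r, fp_rate, alpha=1.0, beta=0.5):
    """Eq. (10): beta shifts the emphasis between P and R."""
    return (1 - beta) * p + beta * r - alpha * fp_rate


# Values reported for Naive Bayes on security patterns at X = 10 (Section 6.1.1).
p, r, fp_rate = 0.94, 0.94, 0.01
score_1 = emf1(p, r, fp_rate)            # alpha = 1
score_2 = emf2(p, r, fp_rate, beta=0.5)  # coincides with EMF1 when beta = 0.5
```

Because β = 0.5 weights P and R equally, score_1 and score_2 coincide, which mirrors the remark in Section 5.1 that EMF2 reduces to EMF1 in that case.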
Fig. 12. The evaluation results of learning Douglass patterns using 19 real design problems.

Among five SVM classifiers for Douglass patterns classes, including the Concurrency, Memory, Resources, Distribution, and Safety and Reliability patterns classifiers (as shown in Fig. 3), Distribution Patterns Class and Memory Patterns Class are suggested by the proposed method for design problem 2 and design problem 3, respectively. Therefore, in these cases, the results of the proposed method are the same as expert opinions.

6.2.3. GoF patterns evaluation

Having learned three classifiers, one for each GoF design patterns class mentioned in Section 5.3, we used all 19 GoF design problems mentioned in Section 5.4 to evaluate the mean effectiveness of the best learning technique.

The evaluation results of real design problems for each learning technique are compared in Fig. 13. Therefore, based on EMF1 and EMF2 (CS, 2011) and EWM of Fig. 13, SVM has better results than the other learning techniques.

Among three SVM classifiers for GoF patterns classes, including the Creational, Structural, and Behavioral Patterns classifiers, Structural Patterns Class and Behavioral Patterns Class are suggested by the proposed method for design problem 4 and design problem 5, respectively. Therefore, in these cases, the results of the proposed method are the same as expert opinions.

Fig. 13. The evaluation results of learning GoF patterns using 19 real design problems.

6.3. Evaluation of design pattern(s) selection using real design problems

To show clearly how the proposed options in Section 4.4 work, we first follow the application of the proposed design pattern selection method on five real design problems in detail. Then, we present the results of experimental work on a subset of the real design problems mentioned in Section 5.4.

Table 2
Cosine similarities between Design Problem 1 and the corresponding problem definition of Operating System Access Control Patterns.

Operating system access control patterns      Cosine similarity
Authenticator Pattern                         0.14
Controlled Execution Environment Pattern      0.36
Controlled Object Factory Pattern             0.38
Controlled Object Monitor Pattern             0.55
Controlled Process Creator Pattern            0.80
Controlled Virtual Address Space Pattern      0.25
Execution Domain Pattern                      0.39
File Authorization Pattern                    0

Table 3
Cosine similarities between Design Problem 2 and the corresponding problem definition of Distribution Patterns.

Distribution Patterns                         Cosine similarity
Broker Pattern                                0.16
Data Bus Pattern                              0.42
Observer Pattern                              0.64
Proxy Pattern                                 0.27
Remote Method Call Pattern                    0.11
Shared Memory Pattern                         0.21

Table 4
Cosine similarities between Design Problem 3 and the corresponding problem definition of Memory Patterns.

Memory Patterns                               Cosine similarity
Fixed Sized Buffer Pattern                    0.62
Garbage Collection Pattern                    0.28
Garbage Compactor Pattern                     0.45
Pool Allocation Pattern                       0.18
Smart Pointer Pattern                         0.57
Static Allocation Pattern                     0.48

Table 5
Cosine similarities between Design Problem 4 and the corresponding problem definition of Structural Patterns.

Structural Patterns                           Cosine similarity
Adapter Pattern                               0.31
Bridge Pattern                                0.13
Compose Pattern                               0.11
Decorator Pattern                             0
Façade Pattern                                0.14
Flyweight Pattern                             0
Proxy Pattern                                 0.06

Table 6
Cosine similarities between Design Problem 5 and the corresponding problem definition of Behavioral Patterns.

Behavioral Patterns                           Cosine similarity
Chain of Responsibility Pattern               0.36
Command Pattern                               0.25
Interpreter Pattern                           0
Iterate Pattern                               0.08
Mediator Pattern                              0.07
Memento Pattern                               0.09
Observer Pattern                              0.07
State Pattern                                 0.07
Strategy Pattern                              0.06
Template Method Pattern                       0
Visitor Pattern                               0.07

6.3.1. Sample design problems

In Section 5.4, five real design problems are mentioned as sample design problems. In Section 6.2, for all of them, the proper design patterns class (type) is identified by the proposed method. In this section, we report the results of applying the proposed
Fig. 14. The evaluation of applying Eq. (4) on sample design problems.

Fig. 15. The evaluation of applying Eq. (3) on sample design problems.

Fig. 16. Mean RCDDP changes for 20 real design problems based on option (B) of cosine similarity method (x-axis: value of θ1 from 0.1 to 0.9; y-axis: RCDDP criterion).

options presented in Section 4.4 on these five real design problems. Tables 2–6 show the results of cosine similarities between Design Problem 1 and Operating System Access Control Patterns Class, Design Problem 2 and Distribution Patterns Class, Design Problem 3 and Memory Patterns Class, Design Problem 4 and Structural Patterns Class, and Design Problem 5 and Behavioral Patterns Class, respectively, i.e., the results of the fourth phase of the proposed method (see Fig. 1). In Tables 2–6, the design pattern that has the highest similarity value is highlighted.

Figs. 14 and 15 compare the results of applying Eqs. (4) and (3), respectively, on the five sample design problems presented in Section 5.4. In the special case of Fig. 14, when θ2 is equal to 0, Eq. (4) reduces to Eq. (2).

It is worth noting that, according to expert opinions, the highlighted design patterns in Tables 2–6 are correctly determined by applying Eq. (2) on all five sample design problems. It means that for these five sample design problems, in the fourth phase of the proposed method, the highest value of RCDDP is obtained when Eq. (2) is chosen. In the following, we discuss the best design pattern(s) based on different ranges for parameters θ1 and θ2.

Analysis of Design Problem 1: In Fig. 15, when θ1 is in the range (0.55, 0.79) and in Fig. 14 for all values of θ2, design problem 1 has the highest value of RCDDP.

Analysis of Design Problem 2: In Fig. 15, when θ1 is in the range (0.42, 0.63) and in Fig. 14 for all values of θ2, design problem 2 has the highest value of RCDDP.

Analysis of Design Problem 3: In Fig. 15, when θ1 is in the range (0.57, 0.61), the highest value of RCDDP is obtained. In Fig. 14, when θ2 is in the range (0, 0.04), the highest value of RCDDP is obtained.

Analysis of Design Problem 4: In Fig. 15, when θ1 is in the range (0.14, 0.30) and in Fig. 14, when θ2 is in the range (0, 0.16), the highest value of RCDDP is obtained.

Analysis of Design Problem 5: In Fig. 15, when θ1 is in the range (0.09, 0.35) and in Fig. 14 for all values of θ2, the highest value of RCDDP is obtained.

It is worth noting that we explain in detail in (CS, 2011) how the proposed method retrieves the right design patterns for four other real object-oriented design problems, and how the cosine similarities of these design problems and the GoF patterns are computed.

6.3.2. A subset of design problems

To assess more accurately, 20 design problems are randomly selected from the 60 real design problems (mentioned in Section 5.4). It is assumed that the design patterns class of each design problem is correctly detected in the third phase of the proposed method. Now, all proposed equations in Section 4.4 are evaluated for selection of the right design pattern(s) for these design problems. The presented results in this section are the average of RCDDP for all selected real design problems.

Mean RCDDP changes in terms of different values of θ1 and θ2, for all options (i.e., Eqs. (2–4)) of the cosine similarity (Si) method, are depicted in Figs. 16–18.

As shown in Fig. 16 (option B), when θ1 is set to 0.4, the highest value of RCDDP (0.47) is obtained. In Fig. 17 (option C), when θ2 is equal to 0.03, the highest value of RCDDP (0.82) is obtained. Therefore, we recommend that the values of θ1 (for option B) and θ2 (for option C) be 0.4 and 0.03, respectively.

Note that when the value of θ2 in option (C) is zero, option (C) reduces to option (A). In Fig. 17, option (A) achieves the value of 0.79 for RCDDP.

As shown in Fig. 18, when θ1 and θ2 are respectively equal to 0.3 and 0.03, option (D) achieves the best result (RCDDP is equal
Fig. 17. Mean RCDDP changes for 20 real design problems based on option (C) of cosine similarity method (x-axis: value of θ2; y-axis: RCDDP criterion).

Fig. 18. Mean RCDDP changes for 20 real design problems based on option (D) of cosine similarity method.

Fig. 19. The process of the evaluation and anomaly reduction of design patterns classes made by hand.

to 0.84). The reason is that, for unrelated design problems, whose resemblance to the design patterns is less than 0.3, the proposed method does not suggest any design patterns, and when θ2 is equal to 0.03, it achieves its maximum effectiveness.

6.4. A summary of the evaluations

According to the analysis performed on each design pattern group, outlines of the results are as follows:

A. Among the four evaluated learning techniques, Naïve Bayes and SVM achieve better results. Although SVM has better results than Naïve Bayes in more cases, the efficiency of each learning technique depends on the design pattern group under evaluation. Therefore, Section 7 addresses computing the consistency of each design patterns group.
B. In order to increase the effectiveness of learning techniques, for SVM, the TFC, TF and TFIDF weighting methods, and for Naïve Bayes, the TFC, TFIDF and TF weighting methods are respectively suggested.
C. Between the SVM and Naïve Bayes learning techniques, SVM is not sensitive to feature selection (Joachims, 1998), but feature selection by the DF method in the case studies reveals that in Naïve Bayes, with a decrease in features, the evaluation results improve. The results of feature selection for this learning technique indicate that if we want to increase the learning precision and decrease the error rate (FP), the features must be reduced until the values of learning precision and error rate reach their maximum and minimum, respectively. In our experiment, to gain maximum precision and minimum error rate, at most 2% of the features are removed. But if the purpose is to omit as many features as possible (Jalili and Bitarafan, 2006), then we can omit features as long as the values of P and R (or F1) are not less than the corresponding values before feature reduction. In our experiment, to gain maximum feature reduction, the number of features decreases by 10%.
D. According to the evaluation performed in Section 6.3, among the four proposed options in Section 4.4, option (D) is suggested with values of 0.3 and 0.03 for θ1 and θ2, respectively. Note that the effective values of θ1 and θ2 depend on the selected patterns group; therefore, to achieve the most suitable values, the developers need to perform these experiments with sample design problems as parameter tuning.

7. Consistency evaluation of design patterns classes

In the design patterns learning phase in Section 4.2, a design patterns classification made by researchers is used as the training set to learn the design patterns classes as classifiers. Any deficiency in design pattern classifications can have many negative effects on the efficiency of the learned classifiers. For example, if the classification has some inconsistency and considerable overlap between patterns classes, the results of the learned classifiers may be incorrect in the design patterns class detection.

One of the main shortcomings of the current classifications made by different researchers is their inconsistency and anomaly (Hasso and Carlson, 2004, 2005). As a result, before the classifier learning phase in the proposed method, each design pattern classification should first be evaluated to measure its consistency; then, after removing any anomaly, it is used to learn the classifiers. For this reason, we propose a novel method that evaluates and reduces the inconsistency of the classifications that are performed manually by researchers, which is illustrated in Fig. 19.

In this method, we use another machine learning technique, called Clustering, which Alpaydın (2010) describes as follows: “Clustering is an unsupervised learning, aims to find K categories (clusters) or groupings of the input, where K is determined by the user. In document clustering, the aim is to group similar documents. For example, news reports can be subdivided as those related to politics, sports, arts, and so on.”

As shown in Fig. 19, first the problem definitions of design patterns are clustered for each design pattern group. The number of clusters (K) is the same as the number of design pattern group classes made by researchers. After clustering, the results are compared with the classification made by researchers; some (undesirable) anomalies may then be found.

An anomaly means that an object (i.e., a design pattern) belongs to a class in reality, but it is incorrectly assigned to another class by the clustering process. To provide an illustration of the clustering process and the anomaly, Fig. 20 shows the results of the clustering process for two example object groups that experts subdivided into three classes earlier. As shown in Fig. 20(a), the comparison

Fig. 20. Example of the clustering process with 3 clusters (K = 3): (a) no anomaly appeared and (b) one anomaly appeared.

between objects in the generated clusters and objects in the classes researchers earlier. In the first phase, the second class (i.e., Identi-
reveals that no anomaly occurred. But, in Fig. 20(b), one anomaly fication and Authentication patterns) has the highest anomaly. After
occurred, because one object of Class 1 is located in Cluster 2 during improving that, the results of clustering indicate 16 inconsistent
the clustering process. design patterns. In the second phase, the seventh class (i.e., Fire-
If the number of anomalies is less than the threshold level, the design pattern consistency evaluation is finished; otherwise, the class with the highest anomaly is selected as the candidate.

In each design pattern class, cohesion (the similarity of design patterns inside a class) must be high, and coupling (the similarity of design patterns of a class with the rest of the design patterns) must be low. The class with the highest anomaly has low cohesion and high coupling. Therefore, in the next step, the anomalies of the design patterns in the candidate class must be eliminated.

To eliminate anomalies, it is recommended that descriptive words, which state the main features (the nature) of a pattern class, and discriminative words, which distinguish a pattern class from other classes, be used inside each class. To improve the problem definition text of the design patterns inside the candidate class, the definitions of all design patterns of that class should be rewritten using these words. This makes the similarity of members inside a class, and their dissimilarity with other classes, meaningful.

After the problem definitions of the candidate class have been improved, clustering is performed once more and the number of anomalies is recalculated. This process is repeated until the number of anomalies falls below the threshold level.

The advantage of using text clustering is that the anomaly of a design patterns group can be evaluated before the proposed method is applied. Also, the different classifications made by researchers can be verified from the viewpoint of the problem definition.

After studying clustering methods, we decided to use the Bisection K-means (RBR) method (Steinbach et al., 2000), because it performs better than Agglomerative Hierarchical Clustering (AHC) and K-means in document clustering (Steinbach et al., 2000). Additionally, we use the CLUTO toolkit (Karypis, 2002) for the clustering process. This tool offers a number of criterion functions for clustering methods; in this study, the H1 and I2 criterion functions are used because they yielded better results.

Below, the consistency of the three design pattern groups mentioned in Section 5.3 is evaluated first; then the effect of inconsistency reduction on the evaluation results of the learning techniques is presented (in these evaluations, the threshold level is set to 3 according to the experimental results).

7.1. Consistency evaluation of security patterns classes

The results of the primary clustering process, with the number of clusters (K) equal to 8 (because the number of classes in the security patterns group is 8), reveal that 18 of the 46 design patterns have anomalies, i.e., during the clustering process, 18 design patterns are not allocated to the classes that are assigned by

wall Architecture patterns) has the highest anomaly, and after its improvement, 14 inconsistent design patterns remain. In the end, all classes of this pattern group are improved, and the clustering results indicate that only one inconsistent design pattern remains. The above-mentioned process reveals that the security patterns group has 39% anomaly.

After the security patterns group has been improved, the learning techniques for the security pattern classes are re-evaluated using the improved patterns group, as in Section 6.1.1. The results of the evaluation indicate that Naïve Bayes, SVM, and C4.5 achieve P = 1, R = 1, F1 = 1, and FP = 0, i.e., the ideal mode, for all mentioned weighting methods. For KNN, the values of P and R increase for all weighting methods, but they do not reach 1 (the ideal mode). The results of repeating the evaluation of Section 6.2.1 using the improved security patterns group are shown in Fig. 21.

Fig. 21. The evaluation results of learning improved security patterns using 22 real design problems.

Comparing Fig. 21 with Fig. 11 reveals that the EWM values of all learning techniques have increased except for C4.5. Therefore, in terms of EMF1, EMF2 (CS, 2011), and EWM, Naïve Bayes has achieved better performance than the other learning techniques.

7.2. Consistency evaluation of Douglass patterns classes

The results of the clustering process, with the number of clusters (K) equal to 5 (because the number of classes in the Douglass patterns group is 5), indicate that only 2 of
34 design patterns are inconsistent with the classification made by Douglass. This low anomaly (6%) reveals that the design patterns in each class are very similar. Therefore, because of the low anomaly, and because the proposed method yielded P = 1 and R = 1 in the evaluation stage with these design patterns, there is no need to improve the problem definitions of this group.

7.3. Consistency evaluation of GoF patterns classes

The results of the clustering process, with the number of clusters (K) equal to 3 (because the number of classes in the GoF patterns group is 3), indicate that there are 9 anomalies among the 23 design patterns in the classification made by the authors. At the first stage, the second class (i.e., Structural Patterns) has the highest anomaly. After improvements are made, the results of clustering indicate 5 inconsistent design patterns. In the next stage, the third class (i.e., Creational Patterns) is improved, and in the following stage the first class (i.e., Behavioral Patterns). After all classes of the group have been improved, the number of anomalies becomes zero. The above-mentioned process shows that the design patterns group in the GoF book has 39% anomaly.

After the GoF patterns group has been improved, the learning techniques for these classes are re-evaluated using the improved design patterns group, as in Section 6.1.3. The results of the evaluation indicate that all learning techniques (Naïve Bayes, SVM, C4.5, and KNN) yielded P = 1 and R = 1 for all of the mentioned weighting methods.

The results of repeating the evaluation of Section 6.2.3 using the improved design patterns group are shown in Fig. 22.

Fig. 22. The evaluation results of learning improved GoF patterns using 19 real design problems.

Comparing these evaluation results with those presented in Section 6.2.3 (Fig. 13) shows that the results of Naïve Bayes and SVM have considerably increased in terms of EMF1, EMF2 (CS, 2011), and EWM. Thus, SVM has achieved better performance than the other learning techniques.

7.4. Learning evaluation with real design problems for all improved pattern groups

In this section, we combine the three design patterns groups mentioned in Section 5.3 into a new design patterns group, which consists of 103 design patterns, i.e., 46 security patterns and 34 Douglass patterns together with 23 GoF patterns. This new design patterns group includes 16 design patterns classes, i.e., 8 security patterns classes and 5 Douglass patterns classes together with 3 GoF patterns classes. It is worth remembering that the improved design patterns are used in this section. The evaluation results for the 60 real design problems mentioned in Section 5.4 are presented in Fig. 23.

Fig. 23. The evaluation results of learning all improved patterns using 60 real design problems.

As shown in Fig. 23, the value of EWM for Naïve Bayes is higher than for the other learning techniques. However, at point α = 1 on the EMF1 curve and at point β = 0.5 on the EMF2 curve, there is only a slight difference between SVM and Naïve Bayes (CS, 2011); hence, according to EMF1 and EMF2, these two learning techniques have the better results.

7.5. A summary of the evaluations

The important point observed in these evaluations is that, by improving the problem definitions and reducing inconsistencies, the results of Naïve Bayes were considerably enhanced, and in the evaluation of the three improved pattern groups with real design problems, based on EWM, EMF1, and EMF2, this learning technique achieved better results than the other learning techniques. As a result, we draw several important conclusions. First, when a design pattern classification is consistent and has low anomaly, Naïve Bayes has the best results. Second, SVM achieves better results without reduction of inconsistencies. Third, Naïve Bayes is more sensitive to inconsistent data than SVM. Finally, Naïve Bayes keeps the values of criteria P and R close to one another, while SVM sacrifices criterion R to criterion P.

8. Limitations of the proposed method

In the course of experimentation during the evaluation, a number of limitations of the proposed method became apparent. First, the results of applying the learning techniques to each of the three design pattern groups reveal that the effectiveness of the proposed method depends on the number of inconsistencies in the classification of the chosen design patterns group. For example, the results of applying the proposed method to the Douglass patterns group are better than the results for the GoF patterns group, because the classification of the GoF patterns group has many inconsistencies and anomalies; in other words, the problem definitions of the GoF design patterns are weak. Thus, if the problem definitions of design patterns are more complete, the results of the proposed method improve.

Second, the quality of a given design problem description has a great impact on the overall quality of the proposed method. For example, applying the proposed method to design problems 4 and 5 achieved the worst results (see Tables 5 and 6), due to the poor quality of these design problems. Thus, the more words from the problem definitions of design patterns are used in the description of a real design problem, the higher the probability of finding the right design patterns. To overcome this difficulty, the words used in the problem definitions of the design patterns group in use should be collected in a glossary, i.e., a Design Patterns Glossary, that is available
to software developers. It is recommended that developers make maximum use of the glossary when describing design problems in order to achieve better results. Alternatively, during design problem preprocessing, we could replace the words of the design problem description with synonymous words from the problem definitions of the design patterns group in use. In this case, instead of forcing the developer to use the design patterns glossary, a preprocessing step that replaces terms with suitable ones is performed on the given design problem.

Finally, another limitation of the proposed method lies in the fact that there are a couple of books that describe design patterns, but their design pattern descriptions are not necessarily searchable by computer. It is recommended that a pool of design pattern descriptions be supplied by some large research centers. Additionally, we recommend that Wikipedia.org, SourceForge.Net, and other collaborative web-based encyclopedias create an environment in which software engineers can participate in completing design pattern descriptions.

9. Related work

The attempts made to automatically select a design pattern are divided into two approaches, UML-Based (Kim and Khawand, 2007; Hsueh et al., 2007; Kim and Shen, 2008) and Ontology-Based (Hasso and Carlson, 2005; Blomqvist, 2008; Hasso and Carlson, 2004; Khoury et al., 2008), which are discussed in more detail below. In addition, we propose a third approach, called Text Classification.

First approach (UML-Based Approach). Kim and Khawand (2007) and Kim and Shen (2008) categorized the problem definition sources into two parts: the design pattern problem definition and known problems for the design pattern. Then, based on these sources, they used an abstract Meta-model to formalize design patterns. The Meta-model is inspired by the structure (class diagram) and behavior (collaboration diagram) of a software document. This work attempts to state the structure of design patterns abstractly, and pattern detection is performed according to UML diagrams. This approach has a number of limitations. First, it is limited in its ability to precisely specify the problem definition of all design patterns, because the created Meta-model may be the same for some design patterns. For example, it seems likely that the Meta-model for the State pattern (Gamma et al., 1994) is the same as the one for the Strategy pattern (Gamma et al., 1994); as another example, the Meta-models for the three design patterns Bridge, Factory Method, and Abstract Factory (Gamma et al., 1994) are the same. The reason for this difficulty is that the structure of the problem definition of these design patterns is the same from the viewpoint of UML diagrams, but their purposes are different (Kampffmeyer and Zschaler, 2007). Moreover, for many object-oriented design patterns (Gamma et al., 1994), for example, the Façade, Flyweight, Decorator, and Prototype patterns, a Meta-model cannot be offered. In our proposed method, because any type of problem can be stated in text, there is no such limitation. Second, this approach is not scalable, because the similarities among the Meta-models increase for a large number of design patterns; as already observed in Section 7.4, our proposed method can be extended to a large number of design patterns.

Third, the cost of creating a Meta-model for each design pattern is too high, while our proposed method requires less professional work on the design pattern problem definition and, in general, has a low cost. Fourth, in this work, the design pattern(s) is not suggested along with its similarity to the design problem, while our proposed method suggests different design patterns according to the extent of similarity between the problem definition of the retrieved design pattern and the design problem. Fifth, it is applicable only to object-oriented design patterns whose solutions are explained in UML, while our proposed method can be applied to any kind of design pattern. Last, the Meta-model creation stage for each design pattern is not automatic, while all steps of our proposed method are performed automatically.

Following the first approach, Hsueh et al. (2007) used different design pattern viewpoints, aiming to find a systematic method for selecting the right design patterns. These viewpoints consist of the Activity-view, to see how a design pattern can facilitate a design activity; the Requirement-view, to explore how a design pattern can enhance non-functional requirements; the Problem-view, to investigate how a design pattern can resolve design problems and prevent exceptions; and the Tradeoff-view, to see how a design pattern can resolve design conflicts. After that, the authors suggested a goal-driven framework based on UML diagrams, which includes three stages (modeling requirements, objects analysis, and objects design), for object-oriented process design. This work has a number of limitations. First, it was evaluated with only one case study and a limited number of design patterns. Second, it does not suggest an automatic way to select design patterns; it is a framework that helps the developer detect the right pattern. Last, it shares limitations 2, 4, and 5 mentioned for the method of Kim and Khawand (2007) and Kim and Shen (2008).

Second approach (Ontology-Based Approach). Hasso and Carlson (2005) and Hasso and Carlson (2004) suggested a classification of design patterns based on the meaning of the design pattern problem definition and a semantic analysis of its sentences. In this work, the classification is based on the meaning of the verbs in the design pattern problem definition and uses linguistic theories for the semantic analysis of sentence structure. The authors claimed that the right design pattern can be detected through the semantic classification of design patterns, but they did not present any evaluation of their method, and the classification is not automatic.

Khoury et al. (2008) suggested an interesting method based on the Web Ontology Language (OWL) that uses an ontological interface for software developers to select security patterns. It aims at providing a mapping between requirements on one side and threat models, security bugs, and security errors on the other, taking into consideration their contexts of applicability. This work has two shortcomings. First, it needs formal specifications of security concepts and security requirements such as confidentiality. However, formal descriptions are not available in all software design domains, which limits the application of this work in practice. In our proposed method, by contrast, there is no need for formal specifications of design patterns. Second, the authors did not report the effectiveness of the method.

Blomqvist (2008) suggested a method for the automatic selection of the right design patterns that uses an ontology of design patterns to rank the right design patterns. Blomqvist used several criteria for ranking the design patterns, including class matches, centrality, density, and semantic similarity. However, this work has a number of limitations similar to those of Khoury et al. (2008).

Finally, the comparison between our proposed method and the second approach is presented below.

First, the second approach suffers from the lack of a single ontology for each of the sections of a domain, while our proposed method does not need a single ontology. Second, the cost of ontology creation is too high, while our proposed method needs less professional work on design patterns and, in general, has a lower cost. Last, the second approach has a number of difficulties in automating its process, which are explained in more detail in Blomqvist (2008). For example, the output of most ontology approaches is diverse and lacks proper structure, thereby requiring manual post-processing, and after the problem definition has been extracted (Blomqvist, 2008), concepts must be added manually, while our proposed method does not have any of these difficulties.
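The similarity-ranked suggestion that distinguishes the proposed method from the UML-based and ontology-based approaches can be illustrated with a minimal sketch. The similarity measure is not specified in this section, so the sketch assumes TF-IDF term weighting and cosine similarity; the function names and the three shortened problem definitions in PATTERNS are hypothetical and are not the data used in the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (stand-in for real preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, with weight = tf * (log(N / df) + 1)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * (math.log(n_docs / df[t]) + 1.0) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_patterns(problem, pattern_definitions):
    """Rank patterns by the similarity of their problem definition to the design problem."""
    names = list(pattern_definitions)
    docs = [pattern_definitions[name] for name in names] + [problem]
    vectors = tfidf_vectors(docs)
    problem_vec = vectors[-1]
    scored = [(name, cosine(problem_vec, vec)) for name, vec in zip(names, vectors)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical, heavily shortened problem definitions (assumption, not the paper's data).
PATTERNS = {
    "Observer": "notify dependent objects automatically when the state of one object changes",
    "Strategy": "select one algorithm from a family of interchangeable algorithms at run time",
    "Singleton": "ensure a class has only one instance and provide a global access point",
}

ranking = rank_patterns(
    "several views must be notified automatically when the model object changes state",
    PATTERNS,
)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Each candidate is returned together with its score, so a developer sees how close every suggested pattern is to the design problem rather than a single unexplained answer.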

Third approach (Text Classification Approach). The proposed method uses the third approach, i.e., the text classification of the problem definitions of design patterns, to increase the efficiency and automation of right pattern selection. The proposed method has several advantages over similar works. First, there is no need for formal specifications of the design pattern problem definition. Second, not only is it applicable to any kind of design pattern, but it can also be used in selecting library routines. Third, it can suggest different design patterns according to the extent of similarity between the problem definition of the retrieved design pattern and the design problem. Fourth, the proposed method presents a systematic way to evaluate and improve inconsistencies in the manual classifications made by researchers. Fifth, its pattern selection process is simple to automate, and it has a low cost in comparison to other approaches. Last, it is applicable to a large number of design patterns, i.e., it is scalable.

A notable point of this research in comparison to similar works is that the proposed method has been evaluated with three pattern groups and enough design problems, whereas in all similar works only a weak evaluation is performed. Therefore, to provide the conditions for an equal evaluation, a common benchmark is needed.

10. Conclusions

The experimental results show that the proposed method based on the text classification approach appears to be a highly promising basis for selecting the right design pattern for a given design problem. The text classification approach supports the design pattern selection method with low-cost preprocessing; thus, it is faster, simpler, and less costly than other approaches. The proposed method was evaluated using three design pattern groups and enough real design problem descriptions. However, since previous research works were not evaluated on known real design problems, we present the advantages of our work only qualitatively.

The proposed method has four phases: Preprocessing, Learning Design Patterns Classifiers, Determination of a Design Patterns Class, and Suggestion of Design Pattern(s). To evaluate the proposed method, an evaluation model was proposed. The main goal of the evaluation is to show the effectiveness of the proposed method and to determine the best learning technique compatible with design pattern selection and the best weighting method for each learning technique.

According to the obtained results, we draw several important conclusions. First, when a design pattern classification is consistent and has low anomaly, Naïve Bayes has the best results. Second, SVM achieves better results without reduction of inconsistencies. Third, Naïve Bayes is more sensitive to inconsistent data than SVM. Also, Naïve Bayes keeps the values of criteria P and R close to one another, while SVM sacrifices criterion R to criterion P. Finally, SVM is not sensitive to feature selection, while the reduction of features in Naïve Bayes improves the evaluation results.

In future work, we intend to use a combination of the Naïve Bayes and Support Vector Machine (NB-SVM) methods to train a classifier in order to improve the performance of the proposed method. Moreover, we are going to use two adjacent words instead of a single word in the feature vector (Jalili and Bitarafan, 2006) to improve the results.

Acknowledgement

This project has been supported in part by the Iran Telecommunication Research Center (ITRC).

References

Alpaydın, E., 2010. Introduction to Machine Learning, second ed. The MIT Press, Cambridge, Massachusetts, London, England.
Androutsopoulos, I., Koutsias, J., Chandrinos, K.V., Spyropoulos, C.D., 2000. An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages. In: Proceedings of SIGIR-00, 23rd ACM International Conference on Research and Development in Information Retrieval, pp. 160–167.
Attardi, G., Marco, D., Salvi, D., 1998. Categorization by context. Journal of Universal Computer Science, 719–736.
Bass, L., Clements, P., Kazman, R., 2003. Software Architecture in Practice. Addison-Wesley Professional.
Blomqvist, E., 2008. Pattern ranking for semi-automatic ontology construction. In: Proceedings of the ACM Symposium on Applied Computing, pp. 2248–2255.
Booch, G., 2006. Handbook of Software Architecture Book. http://www.booch.com/architecture/patterns.jsp.
Case Studies of “Design Patterns Selection: An Automatic Two-Phase Method” paper, 2011. http://www.modares.ac.ir/enpage/systems/index/Schools/ece/grp/cmp/res/lab/SCSLAB/Project/Project2.
Coad, P., North, D., Mayfield, M., 1995. Object Models: Strategies, Patterns, Applications. Yourdon Press, Upper Saddle River, NJ, USA.
Douglass, B.P., 2002. Real-Time Design Patterns: Robust Scalable Architecture For Real-Time Systems. Addison-Wesley/Longman Publishing Co., Inc., Boston, MA, USA.
Gamma, E., Helm, R., Johnson, R., Vlissides, J., 1994. Design patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, Reading, MA.
Graves, A.R., Czarnecki, C., 2000. Design patterns for behavior-based robotics. IEEE Transactions on Systems, Man and Cybernetics: Part A 30, 36–41.
Hasso, S., Carlson, C.R., 2004. Linguistics-based software design patterns classification. In: Proceedings of 37th Annual Hawaii International Conference on System Science (HICSS-37). IEEE Computer Society Press.
Hasso, S., Carlson, C.R., 2005. A theoretically-based process for organizing design patterns. In: Proceedings of 12th Pattern Language.
Hotho, A., Nürnberger, A., Paaß, G., 2005. A brief survey of text mining. Journal for Computational Linguistics and Language Technology 20, 19–62.
Hsueh, N.-L., Kuo, J.-Y., Lin, C.-C., 2007. Object-oriented design: a goal-driven and pattern-based approach. Journal for Software and Systems Modeling (Springer-Verlag) 8 (1), 1–18.
Jalili, S., Bitarafan, M., 2006. Performance improvement in text classification based on a new feature selection method. Journal of Faculty of Engineering, University of Tehran, JFE 40 (3), 313–328 (in Persian).
Jalili, S., Gerani, S., 2007. Spam detection with genetic algorithm and SVM. In: 6th International ISC (Iranian Society of Cryptology) Conference on Information Security and Cryptology (ISCISC’09), (in Persian).
Jalili, S., Sadri, A., 2007. Performance improvement in text classification using two layered classifier committees. Journal of Faculty of Engineering, University of Tehran, JFE 41 (5), 597–614 (in Persian).
Joachims, T., 1998. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. Springer, Berlin/Heidelberg.
Java Programs for Machine Learning, University of Waikato, 1998–2006. http://www.cs.waikato.ac.nz/∼ml/weka.
Kampffmeyer, H., Zschaler, S., 2007. Finding the pattern you need: the design pattern intent ontology. Model Driven Engineering Languages and Systems, 211–225.
Karypis, G., 2002. CLUTO – A Clustering Toolkit, Release 2.1.1. Department of Computer Science, University of Minnesota. http://www.cs.umn.edu/∼karypis/cluoto.
Khoury, P.E., Mokhtari, A., Coquery, E., Hacid, M.S., 2008. An ontological interface for software developers to select security patterns. In: Proceedings of 19th International Conference on Database and Expert Systems Application, (DEXA’08), pp. 297–301.
Kim, G.J., Han, J.S., 2007. Clustering algorithm of design pattern using object-oriented relationship. In: LNCS, Springer, Computational Science and Its Applications, ICCSA 2007, pp. 997–1006.
Kim, D.K., Khawand, C.E., 2007. An approach to precisely specifying the problem domain of design patterns. Journal of Visual Languages and Computing (Elsevier) 18, 560–591.
Kim, D.K., Shen, W., 2008. Evaluating pattern conformance of UML models: a divide-and-conquer approach and case studies. Software Quality Journal (Springer, Netherlands) 16 (3), 329–359.
Larkey, L.S., 1999. A patent search and classification system. Proceedings of Fourth ACM Conference on Digital Libraries, 179–187.
Porter, M.F., 2006. An algorithm for suffix stripping. Journal of Program: Electronic Library and Information Systems 40, 211–218.
Pree, W., 1995. Design Patterns for Object-Oriented Software Development. Addison-Wesley, Reading, MA.
Rising, L., 2000. The Pattern Almanac 2000. Addison-Wesley, Boston.
Schumacher, M., Fernandez, E., Hybertson, D., Buschmann, F., 2006. Security Patterns: Integrating Security and Systems Engineering. John Wiley & Sons.
Sebastiani, F., 2002. Machine learning in automated text categorization. Journal of ACM Computing Survey (CSUR) 34, 1–47.
Silberschatz, A., Galvin, P.B., Gagne, G., 2002. Operating System Concepts, 6 ed.
Steinbach, M., Karypis, G., Kumar, V., 2000. A comparison of document clustering techniques. In: KDD Workshop on Text Mining, p. 35.
SVM-Light. Support Vector Machine. http://svmlight.joachims.org/.
Tanenbaum, A.S., 2001. Modern Operating Systems, second ed. Prentice Hall.
Tichy, W.F., 1997. A catalogue of general-purpose software design patterns. In: Proceedings of Technology of Object-Oriented Languages and Systems, pp. 330–339.
Trowbridge, D., Cunningham, W., Brader, L., Slater, P., 2006. Describing the Enterprise Architectural Space Organizing Table. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnpag/html/entarch.asp.
http://www.uea.ac.uk/cmp/research/graphicsvisionspeech/speech/WordStemming.
Zimmer, W., 1995. Relationships between design patterns. Journal of Pattern Languages of Program Design 1, 345–364.

Seyed Mohammad Hossein Hasheminejad is a Ph.D. candidate of computer engineering at Tarbiat Modares University (TMU). He received the M.Sc. degree in Software Engineering from TMU in 2009, and the B.Sc. degree in Software Engineering from Tarbiat Moalem University in 2007. His main research interests are Formal Methods for Software Engineering, Object-Oriented Analysis and Design, Search Based Software Engineering, and Self-Adaptive Systems.

Saeed Jalili received the Ph.D. degree from Bradford University (UK) in 1991 and the M.Sc. degree in computer science from Sharif University of Technology in 1985. Since 1992, he has been Associate Professor at Tarbiat Modares University (TMU). His main research interests are Self-* Systems, Software Runtime Verification, and Quantitative Evaluation of Software Architecture.
