A Framework To Predict Software Quality in Use From Software Reviews
I. INTRODUCTION
TABLE I. DEFINITIONS OF QUALITY IN USE CHARACTERISTICS FROM ISO 25010.
Characteristic | Definition
Effectiveness | Accuracy and completeness with which users achieve specified goals (ISO 9241-11).
Efficiency | Resources expended in relation to the accuracy and completeness with which users achieve goals (ISO 9241-11).
Freedom From Risk | Degree to which a product or system mitigates the potential risk to economic status, human life, health, or the environment.
Satisfaction | Degree to which user needs are satisfied when a product or system is used in a specified context of use.
Context Coverage | Degree to which a product or system can be used with effectiveness, efficiency, freedom from risk and satisfaction in both specified contexts of use and in contexts beyond those initially explicitly identified.
II. THE PROBLEM
III. RELATED WORKS
This section presents the related work on which the proposed
framework builds, grouped into WordNet, the EM algorithm,
topic modeling, and feature extraction, classification, and
summarization.
A. WordNet
WordNet is a lexical knowledge base of English that contains
more than 155,000 words organized into a taxonomic ontology of
terms. It contains nouns, verbs, adjectives, and adverbs that are
clustered into synonym sets (called synsets). Each synset can
be linked to other synsets via specific relationships between
concepts. Common relationships in WordNet are the
hyponym/hypernym (is-a) relationship and the
meronym/holonym (part-of) relationship.
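The synset and relationship structure described above can be illustrated with a minimal sketch. It does not use the real WordNet database: the tiny dictionaries `SYNSETS`, `HYPERNYMS`, and `MERONYMS`, and the helper names, are hypothetical and exist only to show how is-a and part-of links might be traversed.

```python
from collections import deque

# Hypothetical miniature lexical graph; real WordNet is far larger.
SYNSETS = {              # synset id -> member words (synonyms)
    "computer": {"computer", "computing machine"},
    "laptop": {"laptop", "laptop computer"},
}
HYPERNYMS = {            # is-a links: synset -> its direct hypernyms
    "laptop": ["computer"],
    "computer": ["machine"],
    "machine": ["artifact"],
    "artifact": [],
}
MERONYMS = {             # part-of links: whole -> its parts
    "computer": ["keyboard", "cpu"],
}

def hypernym_chain(synset):
    """Follow is-a links upward, breadth first, and return every ancestor."""
    seen, queue = [], deque(HYPERNYMS.get(synset, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.append(node)
            queue.extend(HYPERNYMS.get(node, []))
    return seen

def synonyms(word):
    """Return the other words that share a synset with `word`."""
    found = set()
    for members in SYNSETS.values():
        if word in members:
            found |= members
    return found - {word}

def parts_of(synset):
    """Return the direct meronyms (parts) of a synset."""
    return list(MERONYMS.get(synset, []))
```

A call such as `hypernym_chain("laptop")` walks from the most specific concept to the most general one, which is the kind of traversal a keyword expansion step could rely on.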
B. EM Algorithm
EM is a class of iterative algorithms that use maximum a
posteriori estimation to train a classifier in problems with
incomplete data [6]. In other words, EM performs
semi-supervised learning: a classifier seeded with labeled data
is used to estimate the incomplete data (the labels of unlabeled
documents). Each iteration has two steps, the E-step
(Expectation) and the M-step (Maximization). In the E-step, the
current model is used to estimate the class probabilities of each
unlabeled document; in the M-step, the model parameters (class
distributions) are re-estimated to maximize the posterior given
those estimates. The algorithm alternates between the two steps
until the model parameters converge. As a result, each
unlabeled document is labeled with its most probable class. In
our work, the EM algorithm is initialized with the features and
opinion words extracted using the algorithm of Qiu et al. [18],
[22].
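As an illustration, the following is a minimal sketch of semi-supervised EM with a multinomial naive Bayes classifier. It follows the usual formulation, in which the E-step computes class probabilities for the unlabeled documents and the M-step re-estimates the model parameters. The function names, the Laplace smoothing, and the fixed iteration count (used here in place of a convergence test) are assumptions made for this example, not details of the framework.

```python
import math
from collections import Counter

def train(docs, weights, classes, vocab):
    """M-step: re-estimate class priors and word probabilities (Laplace smoothed)."""
    prior, word_prob = {}, {}
    for c in classes:
        total = sum(w[c] for w in weights)
        prior[c] = (1 + total) / (len(classes) + len(docs))
        counts = Counter()
        for doc, w in zip(docs, weights):
            for tok in doc:
                counts[tok] += w[c]          # fractional counts for soft labels
        denom = len(vocab) + sum(counts.values())
        word_prob[c] = {t: (1 + counts[t]) / denom for t in vocab}
    return prior, word_prob

def posterior(doc, prior, word_prob, classes):
    """E-step for one document: P(class | doc) under the current model."""
    logp = {c: math.log(prior[c]) + sum(math.log(word_prob[c][t]) for t in doc)
            for c in classes}
    top = max(logp.values())                 # subtract the max for numerical stability
    unnorm = {c: math.exp(v - top) for c, v in logp.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

def em_classify(labeled, unlabeled, classes, iterations=10):
    """Label the unlabeled documents, starting from a model trained on the seeds."""
    docs = [d for d, _ in labeled] + unlabeled
    vocab = {t for d in docs for t in d}
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    prior, word_prob = train([d for d, _ in labeled], hard, classes, vocab)
    for _ in range(iterations):              # assumption: fixed count, no convergence check
        soft = [posterior(d, prior, word_prob, classes) for d in unlabeled]   # E-step
        prior, word_prob = train(docs, hard + soft, classes, vocab)           # M-step
    final = [posterior(d, prior, word_prob, classes) for d in unlabeled]
    return [max(classes, key=p.get) for p in final]
```

Documents are represented as lists of tokens; labeled seeds are `(tokens, class)` pairs. A production version would stop when the parameters change by less than a tolerance rather than after a fixed number of iterations.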
C. Topic Modeling
Topic modeling methods can be intuitively viewed as
clustering algorithms that group terms into meaningful
clusters or subtopics. In probabilistic topic models [7]-[10],
[11], a document is a mixture of topics, where a topic is a
probability distribution over words. To generate a document,
one chooses a distribution over topics; then, for each word in
the document, one chooses a topic at random according to this
distribution and draws a word from that topic. A well-known
topic model is Latent Semantic Indexing (LSI), also called
Latent Semantic Analysis (LSA) [12], [13]. LSA transforms
text into a low-dimensional matrix and finds the most common
topics that appear together in the processed text. Latent
Dirichlet Allocation (LDA) is another widely used topic model
[7]. It extends the Probabilistic Latent Semantic Analysis
(PLSA) model [6] to address two problems: overfitting, and the
inability to assign a probability to a document outside the
training set.
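The generative view of a document can be sketched in a few lines. The example below assumes the topic mixture of the document is already given (in LDA it would itself be drawn from a Dirichlet prior); the `TOPICS` table, its probabilities, and the function name are hypothetical.

```python
import random

# Hypothetical topics: each topic is a probability distribution over words.
TOPICS = {
    "performance": {"fast": 0.5, "cpu": 0.3, "memory": 0.2},
    "safety": {"crash": 0.6, "bug": 0.3, "alert": 0.1},
}

def generate_document(topic_mixture, topics, length, rng):
    """For each word position, pick a topic from the mixture, then a word from it."""
    topic_names = list(topic_mixture)
    topic_weights = [topic_mixture[t] for t in topic_names]
    words = []
    for _ in range(length):
        topic = rng.choices(topic_names, weights=topic_weights)[0]
        vocab = list(topics[topic])
        word = rng.choices(vocab, weights=[topics[topic][w] for w in vocab])[0]
        words.append(word)
    return words

# Example: a document that is mostly about performance.
sample = generate_document({"performance": 0.8, "safety": 0.2}, TOPICS, 10, random.Random(0))
```

Topic model inference runs this process in reverse: given only the words, it estimates the topics and the per-document mixtures.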
D. Feature Extraction, Classification, and Summarization
Feature/topic extraction has been discussed in many works in
the literature, such as [14]-[18]. Most of these works use
language semantics to extract features such as nouns and noun
phrases, keeping those whose frequencies exceed predefined
thresholds. Other works treat user reviews as documents and
apply text classification approaches such as [6], [19]-[21] to
classify them.
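A frequency-threshold extraction of noun features can be sketched as follows. The sketch assumes the reviews have already been tokenized and part-of-speech tagged by some external tool; the Penn Treebank style tags, the threshold value, and the function name are assumptions made for illustration.

```python
from collections import Counter

def frequent_features(tagged_reviews, min_count=2):
    """Return nouns whose frequency across all reviews reaches the threshold."""
    counts = Counter(
        word.lower()
        for review in tagged_reviews
        for word, tag in review
        if tag.startswith("NN")          # NN, NNS, NNP, ... are noun tags
    )
    return {word for word, n in counts.items() if n >= min_count}

# Hypothetical pre-tagged reviews: lists of (token, tag) pairs.
reviews = [
    [("The", "DT"), ("interface", "NN"), ("is", "VBZ"), ("clean", "JJ")],
    [("Interface", "NN"), ("freezes", "VBZ"), ("on", "IN"), ("startup", "NN")],
]
features = frequent_features(reviews, min_count=2)
```

Here only "interface" survives the threshold, since "startup" occurs once.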
Leopairote, Surarerks, & Prompoon [9] proposed a model that
extracts and summarizes software reviews in order to predict
software quality in use. The model depends on a manually
built ontology of ISO 9126 quality in use keywords, expanded
with WordNet 3.0 synonyms.
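The idea of a keyword ontology expanded with synonyms can be sketched as below. This is not the model of [9]: the keyword sets, the synonym table (a stand-in for a WordNet lookup), and the overlap-count scoring are all hypothetical choices made for this example.

```python
def expand(keywords, synonyms):
    """Add the synonyms of every keyword to its characteristic's keyword set."""
    expanded = {}
    for characteristic, words in keywords.items():
        bag = set(words)
        for word in words:
            bag |= synonyms.get(word, set())
        expanded[characteristic] = bag
    return expanded

def classify(sentence, expanded):
    """Assign the characteristic whose keywords overlap the sentence most."""
    tokens = set(sentence.lower().split())
    scores = {c: len(tokens & bag) for c, bag in expanded.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Hypothetical seed keywords and synonym table.
seed_keywords = {"efficiency": {"fast"}, "satisfaction": {"enjoy"}}
synonym_table = {"fast": {"quick", "speedy"}, "enjoy": {"love", "like"}}
dictionary = expand(seed_keywords, synonym_table)
```

With the expansion applied, `classify("the app is quick", dictionary)` matches the efficiency characteristic even though "quick" was not a seed keyword.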
Qiu et al. [18], [22] proposed extracting both features and
opinion words by propagating information between them using
syntactic dependency relations. The algorithm outperforms the
state-of-the-art algorithms of Kanayama & Nasukawa [23],
Hofmann [11], and Lafferty, McCallum, & Pereira [24].
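The propagation idea can be illustrated with a simplified sketch. It is not the rule set of Qiu et al.: it assumes the dependency parse has already been reduced to `(head, relation, dependent)` triples with only two hypothetical relation labels, `"mod"` (an adjective modifying a noun) and `"conj"` (two words joined by a conjunction).

```python
def double_propagation(relations, seed_opinions):
    """Grow feature and opinion-word sets from seed opinion words until stable."""
    opinions, features = set(seed_opinions), set()
    changed = True
    while changed:
        changed = False
        for head, rel, dep in relations:
            new_opinions, new_features = set(), set()
            if rel == "mod":                      # dep (adjective) modifies head (noun)
                if dep in opinions:
                    new_features.add(head)        # known opinion word -> new feature
                if head in features:
                    new_opinions.add(dep)         # known feature -> new opinion word
            elif rel == "conj":                   # conjoined words share a role
                if head in opinions:
                    new_opinions.add(dep)
                if dep in opinions:
                    new_opinions.add(head)
                if head in features:
                    new_features.add(dep)
                if dep in features:
                    new_features.add(head)
            if not new_opinions <= opinions or not new_features <= features:
                opinions |= new_opinions
                features |= new_features
                changed = True
    return features, opinions

# Hypothetical relations extracted from review sentences.
relations = [
    ("screen", "mod", "great"),
    ("screen", "mod", "responsive"),
    ("keyboard", "mod", "responsive"),
    ("keyboard", "conj", "trackpad"),
]
found_features, found_opinions = double_propagation(relations, {"great"})
```

Starting from the single seed "great", the loop discovers "screen", then "responsive", then "keyboard" and "trackpad", and stops once a full pass adds nothing new.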
Mukherjee & Bhattacharyya [14] extract domain-independent
features from reviews and classify them by building a graph of
dependent features and opinions and finding the shortest path
to predefined cluster heads.
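Assigning a term to its nearest cluster head can be sketched with a breadth-first search over an unweighted graph. The graph, the cluster heads, and the function name below are hypothetical, and the sketch is only an illustration of shortest-path assignment, not the method of [14].

```python
from collections import deque

def nearest_head(graph, start, heads):
    """Return (head, distance) for the cluster head closest to `start`."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in heads:
            return node, dist
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None, None                      # no cluster head is reachable

# Hypothetical undirected dependency graph stored as adjacency lists.
graph = {
    "battery": ["drains", "life"],
    "drains": ["battery", "fast"],
    "life": ["battery"],
    "fast": ["drains", "efficiency"],
    "efficiency": ["fast"],
    "price": [],
}
head, distance = nearest_head(graph, "battery", {"efficiency", "satisfaction"})
```

Breadth-first search is sufficient here because all edges have equal weight; a weighted graph would call for Dijkstra's algorithm instead.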
IV. PROPOSED FRAMEWORK
[Figure: proposed framework; components: review corpus, quality in use data set, seeds, opinion lexicon.]
http://sentiwordnet.isti.cnr.it/
Here, the sentences are classified by quality in use
characteristic as {efficiency, effectiveness, risk mitigation,
satisfaction, context coverage}, and each sentence has a
positive or negative orientation.
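The shape of this output, a characteristic plus an orientation for each sentence, can be sketched as follows. The keyword sets, the lexicon scores, and the rule of summing token scores are hypothetical; in particular the scores are illustrative values, not entries taken from any published opinion lexicon, and negation is not handled.

```python
def label_sentence(sentence, keywords, lexicon):
    """Return (characteristic, orientation) for one sentence, or None if no keyword matches."""
    tokens = sentence.lower().split()
    hits = {c: sum(tok in words for tok in tokens) for c, words in keywords.items()}
    characteristic = max(hits, key=hits.get)
    if hits[characteristic] == 0:
        return None
    score = sum(lexicon.get(tok, 0.0) for tok in tokens)
    # Assumption: a zero score is reported as positive, since only two labels exist.
    return characteristic, ("positive" if score >= 0 else "negative")

# Hypothetical quality in use keywords and opinion scores.
KEYWORDS = {
    "efficiency": {"fast", "slow", "cpu"},
    "risk mitigation": {"crash", "bug"},
}
LEXICON = {"fast": 0.6, "slow": -0.5, "crash": -0.8, "great": 0.7}

labels = [label_sentence(s, KEYWORDS, LEXICON)
          for s in ["The app is slow", "It will crash often", "great and fast"]]
```

Each entry of `labels` pairs one of the characteristic names with "positive" or "negative".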
V. PRELIMINARY RESULTS
[Figure: building the quality in use dictionary; components: documents (ISO 25010, sample software domain), topic modeling, quality in use topic mapping, keyword building, keywords (quality in use dictionary).]
Effectiveness
Achieve | accessibility | alert
Appropriateness | availability | Bug
Able | background | communication
Automatic | capacity | confidentiality
Control | compatible | Cost
Behavior | CPU | crash
Change | efficiency | Disabilities
VI. CONCLUSION
VII. REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]
[30]
[31]
[32]