Rough Set Concepts: Rule Formation
Rough set theory was developed by Zdzislaw Pawlak in the early 1980s. Since then there has been rapid growth of interest in rough set theory and its applications. It deals with the classificatory analysis of data tables, and its main goal is the induction of approximations of concepts from data. This chapter introduces the basic underlying concepts and terminology related to rough sets. It also explains the classical rough set method for mining rules from data sets.
3.1 Introduction
The philosophy of Rough Set (RS) theory is founded on the assumption that with every object of the universe of discourse some information (data, knowledge) is associated. Objects characterized by the same information are indiscernible (similar) in view of the available information about them. The indiscernibility relation generated in this way is the mathematical basis of the rough set theory. Any set of all indiscernible objects is called an elementary set, and forms a basic granule of knowledge about the universe. Any union of some elementary sets is referred to as a crisp (precise) set; otherwise the set is rough (imprecise, vague). Consequently, each rough set has boundary-line cases, while crisp sets have no boundary-line elements at all. In the rough set approach, a vague concept is replaced by a pair of well-defined concepts called the lower and the upper approximation of the vague concept. The lower approximation consists of all objects which surely belong to the concept, while the upper approximation contains all objects which possibly belong to the concept.

Knowledge about objects is expressed through the values of attributes that describe the objects. These facts are represented in the form of a data table, in which the entries in a row represent an object. A data table is described as S = (U, Q, V, f), where U, the non-empty, finite set of objects, is called the universe; Q is a finite set of attributes; V = ∪ Vq, ∀q ∈ Q, with Vq being the domain of attribute q; and f: U × Q → V is the information function assigning a value f(x, q) ∈ Vq to every object x ∈ U and attribute q ∈ Q. When the attribute set Q is split into a set of condition attributes and a set of decision attributes, information systems of this kind are called decision systems.
Example 3.1
Table 3.1 is an example of an information system containing data about 6 patients. Columns of the table are labelled by attributes (symptoms) and rows by objects (patients).

Table 3.1: Flu Data

Patient  Headache  Muscle-pain  Temperature  Flu
p1       no        yes          high         yes
p2       yes       no           high         yes
p3       yes       yes          very high    yes
p4       no        yes          normal       no
p5       yes       no           high         no
p6       no        yes          very high    yes

Patients p2, p3 and p5 are indiscernible with respect to the attribute Headache; patients p3 and p6 are indiscernible with respect to the attributes Muscle-pain, Temperature and Flu; and patients p2 and p5 are indiscernible with respect to the attributes Headache, Muscle-pain and Temperature. For example, the attribute Headache generates two elementary sets {p2, p3, p5} and {p1, p4, p6}.
3.2 Indiscernibility Relation
The indiscernibility relation is intended to express that, due to lack of knowledge, we are unable to discern some objects from others using the available information. For a subset P ⊆ Q of attributes of an information system S, a relation called the indiscernibility relation, denoted IND_S(P), is defined as

IND_S(P) = {(x, y) ∈ U × U : f(x, a) = f(y, a) for every a ∈ P}.

If (x, y) ∈ IND_S(P) then the objects x and y are called indiscernible with respect to P. Here f(x, a) and f(y, a) represent the value of the attribute a for the objects x and y respectively; the subscript S is omitted when the information system is implied from the context. IND(P) is an equivalence relation that partitions U into equivalence classes, the sets of objects indiscernible with respect to P. The set of these equivalence classes is denoted by U/IND(P).
Example 3.2
We have seen in Example 3.1 that the attribute Headache generates two elementary sets, namely {p1, p4, p6} and {p2, p3, p5}.
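To make the indiscernibility partition concrete, the following minimal Python sketch groups objects by their value vectors on a chosen attribute subset. The dictionary encoding of Table 3.1 and the helper name ind_classes are illustrative choices of this sketch, not notation from the text.

```python
# Table 3.1 encoded as object -> (Headache, Muscle-pain, Temperature, Flu).
ROWS = {"p1": ("no", "yes", "high", "yes"),       "p2": ("yes", "no", "high", "yes"),
        "p3": ("yes", "yes", "very high", "yes"), "p4": ("no", "yes", "normal", "no"),
        "p5": ("yes", "no", "high", "no"),        "p6": ("no", "yes", "very high", "yes")}
IDX = {"Headache": 0, "Muscle-pain": 1, "Temperature": 2, "Flu": 3}

def ind_classes(attrs):
    """Equivalence classes of IND(attrs): objects with identical value vectors."""
    classes = {}
    for obj, row in ROWS.items():
        key = tuple(row[IDX[a]] for a in attrs)
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())

print(ind_classes(["Headache"]))
# [{'p1', 'p4', 'p6'}, {'p2', 'p3', 'p5'}]  -- the elementary sets of Example 3.2
```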
3.3 Set Approximation
These partitions can be used to build new subsets of the universe. The subsets most often of interest are those whose elements share the same value of the decision attribute. Let X ⊆ U be such a desired subset of the universe. A description for X is desired that can determine the membership status of every object with respect to X; the indiscernibility relation is used for this purpose. If an equivalence class defined by IND(P) partially overlaps with the set X, the objects in such an equivalence class cannot be decisively classified as belonging to X or to its complement, and X can only be approximated. The P-lower and P-upper approximations of X are defined as

P̲X = {x ∈ U : [x]_P ⊆ X}
P̄X = {x ∈ U : [x]_P ∩ X ≠ ∅}

where [x]_P denotes the equivalence class of IND(P) containing x. A set X for which P̲X = P̄X is called an exact set; otherwise it is called a rough set. The set

BN_P(X) = P̄X − P̲X

is called the P-boundary region of X, and thus consists of those objects that we cannot decisively classify into X on the basis of the knowledge in P. The set U − P̄X is called the P-outside region of X and consists of those objects which can with certainty be classified as not belonging to X. The boundary region is non-empty for a rough set and empty for a crisp set.
Example 3.3
For Table 3.1, Patient p2 suffers from flu whereas Patient p5 does not, yet the same values are recorded against Headache, Muscle-pain and Temperature for p2 and p5. Therefore p2 and p5 are boundary-line cases which cannot be properly classified in view of the available knowledge. The remaining patients p1, p3 and p6 display symptoms which enable us to classify them with certainty as suffering from flu, p2 and p5 cannot be excluded as suffering from flu, and p4 for sure does not suffer from flu, in view of the recorded symptoms. Thus, for the concept Flu = yes, the lower approximation is {p1, p3, p6}, the upper approximation is {p1, p2, p3, p5, p6}, and the boundary region is {p2, p5}.
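The approximations in Example 3.3 can be checked mechanically. The sketch below re-uses the illustrative table encoding introduced earlier and computes the lower approximation, upper approximation and boundary region of the concept Flu = yes:

```python
ROWS = {"p1": ("no", "yes", "high", "yes"),       "p2": ("yes", "no", "high", "yes"),
        "p3": ("yes", "yes", "very high", "yes"), "p4": ("no", "yes", "normal", "no"),
        "p5": ("yes", "no", "high", "no"),        "p6": ("no", "yes", "very high", "yes")}
IDX = {"Headache": 0, "Muscle-pain": 1, "Temperature": 2, "Flu": 3}

def ind_classes(attrs):
    classes = {}
    for obj, row in ROWS.items():
        classes.setdefault(tuple(row[IDX[a]] for a in attrs), set()).add(obj)
    return list(classes.values())

def lower_upper(attrs, X):
    """P-lower and P-upper approximations of X, for P = attrs."""
    lower, upper = set(), set()
    for eq in ind_classes(attrs):
        if eq <= X:        # class entirely inside X: certainly in X
            lower |= eq
        if eq & X:         # class meets X: possibly in X
            upper |= eq
    return lower, upper

flu_yes = {o for o, r in ROWS.items() if r[IDX["Flu"]] == "yes"}
low, up = lower_upper(["Headache", "Muscle-pain", "Temperature"], flu_yes)
print(sorted(low), sorted(up), sorted(up - low))
# ['p1', 'p3', 'p6'] ['p1', 'p2', 'p3', 'p5', 'p6'] ['p2', 'p5']
```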
It can be shown that the lower and the upper approximations satisfy the following properties:

(1) P̲X ⊆ X ⊆ P̄X
(2) P̲∅ = P̄∅ = ∅,  P̲U = P̄U = U
(3) P̄(X ∪ Y) = P̄X ∪ P̄Y
(4) P̲(X ∩ Y) = P̲X ∩ P̲Y
(5) X ⊆ Y implies P̲X ⊆ P̲Y and P̄X ⊆ P̄Y
(6) P̲(X ∪ Y) ⊇ P̲X ∪ P̲Y
(7) P̄(X ∩ Y) ⊆ P̄X ∩ P̄Y
(8) P̲(−X) = −P̄X
(9) P̄(−X) = −P̲X
(10) P̲(P̲X) = P̄(P̲X) = P̲X
(11) P̄(P̄X) = P̲(P̄X) = P̄X

where −X = U − X denotes the complement of X.
[Figure: Approximating the set of patients suffering from flu. The elementary sets {p1}, {p3} and {p6} form the lower approximation (Flu = yes), {p2, p5} forms the boundary region (Flu = yes/no), and {p4} lies outside the upper approximation (Flu = no).]
3.4.2 Classes of Rough Sets
Four basic classes of rough sets can be defined as follows:
(a) X is roughly P-definable iff P̲X ≠ ∅ and P̄X ≠ U;
(b) X is internally P-undefinable iff P̲X = ∅ and P̄X ≠ U;
(c) X is externally P-undefinable iff P̲X ≠ ∅ and P̄X = U;
(d) X is totally P-undefinable iff P̲X = ∅ and P̄X = U.

Rough sets can also be characterized numerically by the accuracy of approximation

α_P(X) = |P̲X| / |P̄X|,

where |X| denotes the cardinality of X; clearly 0 ≤ α_P(X) ≤ 1, and X is crisp exactly when α_P(X) = 1.
3.5 Dependency of Attributes
Another important issue in data analysis is discovering dependencies between attributes. Intuitively, a set of attributes P depends totally on a set of attributes R if all values of attributes from P are uniquely determined by the values of attributes from R; the dependency is partial if only some values of P are determined by the values of R. Rough set theory introduces a measure of this dependency, the degree of dependency γ_R(P):

γ_R(P) = card(POS_R(P)) / card(U),   where   POS_R(P) = ∪ { R̲X : X ∈ U/IND(P) }.   (3.3)

The set POS_R(P), the positive region, is the set of all elements of U that can be classified uniquely into the classes of the partition U/IND(P) by means of R. The coefficient γ_R(P) thus represents the fraction of the objects in the universe which can be classified in this way.
Example 3.4
In Table 3.1, the attribute Temperature determines uniquely only some values of the attribute Flu. That is, (Temperature, very high) implies (Flu, yes), and similarly (Temperature, normal) implies (Flu, no), but (Temperature, high) does not always imply (Flu, yes). Thus there exists a partial dependency of Flu on Temperature, whose degree can be computed using equation (3.3): the positive region is {p3, p4, p6}, hence γ_Temperature({Flu}) = 3/6 = 0.5.
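The degree of dependency of equation (3.3) is equally easy to compute. In this sketch (again using the illustrative table encoding), gamma collects the R-lower approximations of each decision class into the positive region and reproduces the value 0.5 from Example 3.4:

```python
ROWS = {"p1": ("no", "yes", "high", "yes"),       "p2": ("yes", "no", "high", "yes"),
        "p3": ("yes", "yes", "very high", "yes"), "p4": ("no", "yes", "normal", "no"),
        "p5": ("yes", "no", "high", "no"),        "p6": ("no", "yes", "very high", "yes")}
IDX = {"Headache": 0, "Muscle-pain": 1, "Temperature": 2, "Flu": 3}

def ind_classes(attrs):
    classes = {}
    for obj, row in ROWS.items():
        classes.setdefault(tuple(row[IDX[a]] for a in attrs), set()).add(obj)
    return list(classes.values())

def gamma(R, P):
    """Degree of dependency gamma_R(P) of equation (3.3)."""
    pos = set()
    for X in ind_classes(P):      # each class of U/IND(P)
        # add the R-lower approximation of the class to the positive region
        pos |= {o for eq in ind_classes(R) if eq <= X for o in eq}
    return len(pos) / len(ROWS)

print(gamma(["Temperature"], ["Flu"]))  # 0.5, as computed in Example 3.4
```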
3.6 Data Reduction
A natural question is whether a data table can be reduced while preserving its basic properties, that is, whether a table contains some superfluous data. One natural dimension of reducing data is the size of the dataset. This is achieved by keeping only one element of each equivalence class to represent the entire class. The other dimension in reduction is the number of attributes, which is achieved by keeping only those attributes that preserve the indiscernibility relation and, consequently, the set approximation. The rejected attributes are redundant with respect to classification, since their removal does not worsen the classification.
3.6.1 Reducts
A minimal set of attributes that preserves the indiscernibility relation is called a reduct. A reduct that preserves the ability to discern between the decision classes is called a decision relative (relative) reduct. Since classification is the focus here, examples are discussed for the decision relative reduct only. A relative reduct RED(P, Q) ⊆ P satisfies

γ_{RED(P,Q)}(Q) = γ_P(Q),

and for any attribute a ∈ RED(P, Q), γ_{RED(P,Q)−{a}}(Q) < γ_P(Q); i.e. the relative reduct is minimal, in that no attribute can be removed from it without lowering the degree of dependency. Simple heuristics can compute a single relative reduct in linear time [SR92, Joh74]. Genetic algorithms [Gol89] are also used for the simultaneous computation of many reducts in often acceptable time, unless the number of attributes is very high [Wro95, Wro98, BK97].
3.6.2 Global and Local Reducts
A reduct, if not explicitly mentioned as a local reduct, is called a global reduct. Local reducts are also called value reducts. They are based on the fact that we can often remove some values of the attributes without affecting the consistency of the information system. Like global reducts, decision relative local reducts are useful for generating decision rules.
3.6.3 Core
The intersection of all relative reducts is called the relative core. In other words, each element of the core belongs to every reduct. In a sense, the core is the most important subset of attributes, for none of its elements can be removed without affecting the classification power of the attributes.
Example 3.5
For Table 3.1, we have two reducts {Temperature, Headache} and {Temperature,
Muscle-pain}, with respect to attribute Flu. It means that either the attribute
Headache or the attribute Muscle-pain can be eliminated from the table and
consequently instead of Table 3.1 we can use either Table 3.2 or Table 3.3.
Table 3.2: Flu Data with Reduced Condition Attributes Headache and Temperature

Patient  Headache  Temperature  Flu
p1       no        high         yes
p2       yes       high         yes
p3       yes       very high    yes
p4       no        normal       no
p5       yes       high         no
p6       no        very high    yes

Table 3.3: Flu Data with Reduced Condition Attributes Muscle-Pain and Temperature

Patient  Muscle-pain  Temperature  Flu
p1       yes          high         yes
p2       no           high         yes
p3       yes          very high    yes
p4       yes          normal       no
p5       no           high         no
p6       yes          very high    yes
Example 3.6
By removing some superfluous values of the attributes (e.g. of the attribute Headache in Table 3.2), Table 3.2 and Table 3.3 can be simplified further without affecting the consistency of the table; the attribute values that remain constitute the value (local) reducts.
3.6.4 Reduct Computation
To understand the methodology of reduct computation, we will use the discernibility matrix, which is defined next. Let S be an information system with n objects. The discernibility matrix of S is a symmetric n × n matrix with entries

c_ij = {a ∈ Q : a(x_i) ≠ a(x_j)}  for i, j = 1, ..., n.   (3.4)

Each entry thus consists of the set of attributes upon which objects x_i and x_j differ. A discernibility function f_S for an information system S is a Boolean function of m Boolean variables a*_1, ..., a*_m (corresponding to the attributes a_1, ..., a_m) defined as follows,

f_S(a*_1, ..., a*_m) = ∧ { ∨ c*_ij : 1 ≤ j < i ≤ n, c_ij ≠ ∅ },   (3.5)

where c*_ij = {a* : a ∈ c_ij}. The discernibility function can be simplified while fully preserving the function's semantics. First of all, duplicate sums can be eliminated, since conjunction obeys idempotence, meaning that a·a = a for all members a. If the function has n product terms, this can be done by a simple scan and sort procedure which is bounded by the sorting step, typically O(n log n). Furthermore, a sum that includes ("is a superset of") another sum in the function can be safely eliminated, a simplification called absorption. Absorption can be carried out naively in O(n²) time, but sub-quadratic approaches are possible.
An implicant of a Boolean function f is any conjunction of literals (variables or their negations) such that, if the values of these literals are true under an arbitrary valuation v of the variables, then the value of the function f under v is also true. A prime implicant is a minimal implicant, i.e. one from which no literal can be removed. The set of prime implicants of the discernibility function determines the set of all reducts of S. In other words, the constituents in the minimal disjunctive normal form of the function f_S(a*_1, ..., a*_m) are all the reducts of S.
In order to compute the value core and value reducts for an object x_i, the discernibility function is slightly modified so that only the entries in row i of the discernibility matrix are taken into account,

f_S^{x_i}(a*_1, ..., a*_m) = ∧ { ∨ c*_ij : 1 ≤ j ≤ n, c_ij ≠ ∅ }.   (3.6)

Relative reducts and the core can also be computed using the discernibility matrix, which is modified so that only objects that have to be discerned are compared:

c_ij = {a ∈ Q : a(x_i) ≠ a(x_j)}  for i, j = 1, ..., n such that w(x_i, x_j) holds, and c_ij = ∅ otherwise.   (3.7)
Recall that A is the set of condition attributes and D is the set of decision attributes. If the partition defined by D is definable by A, then the condition w(x_i, x_j) in the above definition can be reduced to (x_i, x_j) ∉ IND(D). Thus the entry c_ij is the set of all attributes which discern objects x_i and x_j that do not belong to the same equivalence class of the relation IND(D). The D-core (decision relative core) is the set of all single element entries of the discernibility matrix, i.e.

CORE_D(A) = {a ∈ A : c_ij = {a} for some i, j}.
Similarly, a D-reduct (decision relative reduct) is a minimal subset of attributes that discerns all equivalence classes of the relation IND_S(D) discernible by the whole set of attributes A. The constituents in the minimal disjunctive normal form of the corresponding discernibility function are all the decision relative reducts of A. In this study, decision relative reducts are used extensively; hence, unless stated otherwise, the term reduct will mean a decision relative reduct.
Example 3.7
We compute the decision relative reduct and core of Table 3.1. Using equation (3.7), we first construct the decision relative discernibility matrix shown below. Note that objects having the same decision are not compared among themselves. Also, the matrix is symmetrical with respect to the diagonal; hence only the upper half of the matrix needs to be considered when defining the discernibility function using equation (3.5).
      p2    p3    p4     p5    p6
p1                t      hm
p2                hmt    ∅
p3                ht     mt
p4                             t
p5                             hmt

Here h, m and t abbreviate Headache, Muscle-pain and Temperature; blank entries correspond to pairs with the same decision, and the empty entry ∅ for (p2, p5) records that these two objects cannot be discerned at all. Taking the conjunction of the non-empty entries and simplifying,

f_S(h, m, t) = t · (h + m) · (h + m + t) · (h + t) · (m + t) = t · (h + m) = ht + mt.
Thus a function in product-of-sums (POS) form is simplified to a function in sum-of-products (SOP) form. Each product term in this simplified form is a prime implicant of the function. Thus the decision relative global reducts obtained from this function are {h, t} and {m, t}, corresponding to Table 3.2 and Table 3.3 respectively. The decision relative core (the intersection of all the reducts) of Table 3.1 is {t}; the same core is obtained by observing the single element entries in the discernibility matrix.
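For a table this small, the reducts found by Boolean simplification can be cross-checked by exhaustive search. The sketch below is only a brute-force illustration (exponential in the number of attributes, so unsuitable for real tables); it keeps every minimal attribute subset that discerns all decision-discernible pairs:

```python
from itertools import combinations

ROWS = {"p1": ("no", "yes", "high", "yes"),       "p2": ("yes", "no", "high", "yes"),
        "p3": ("yes", "yes", "very high", "yes"), "p4": ("no", "yes", "normal", "no"),
        "p5": ("yes", "no", "high", "no"),        "p6": ("no", "yes", "very high", "yes")}
IDX = {"Headache": 0, "Muscle-pain": 1, "Temperature": 2, "Flu": 3}
COND = ["Headache", "Muscle-pain", "Temperature"]

def discerns(P):
    """Does P discern every pair (with different decisions) that all of COND discerns?"""
    for x, y in combinations(ROWS, 2):
        if ROWS[x][IDX["Flu"]] == ROWS[y][IDX["Flu"]]:
            continue                        # same decision: no need to discern
        diff = lambda attrs: any(ROWS[x][IDX[a]] != ROWS[y][IDX[a]] for a in attrs)
        if diff(COND) and not diff(P):
            return False
    return True

reducts = []
for r in range(1, len(COND) + 1):
    for P in combinations(COND, r):
        if discerns(P) and not any(set(q) <= set(P) for q in reducts):
            reducts.append(P)               # keep minimal subsets only
print(reducts)               # [('Headache', 'Temperature'), ('Muscle-pain', 'Temperature')]
print(set.intersection(*map(set, reducts)))  # the core: {'Temperature'}
```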
Computing all reducts exactly is expensive for large tables, so approximation algorithms are used; they do not give an optimal solution but have acceptable computational complexity. As noted earlier, genetic algorithms have been adopted for generating a large number of reducts, and Johnson's strategy for generating a single reduct.

3.7 Patterns and Decision Rules
A pattern is a conjunction of selectors, i.e. of equalities between attributes and attribute values. The most primitive pattern, and hence the fundamental one, is obtained by selecting a set of attributes A over an object x ∈ U and reading off the values of x for every attribute in A. Two important numerical measures associated with a pattern α are support and coverage. Support(α) refers to the number of objects in the information system that have the property described by pattern α, and coverage(α) is the fraction of the objects of the universe that α describes. Let S = (U, A ∪ D) denote a decision system, and let α denote a conjunction of selectors that only involves condition attributes and β a selector that assigns an allowed decision value. The decision rule read as "if α then β" is denoted by α → β. The pattern α is called the rule's antecedent, while the pattern β is called the rule's consequent. Decision rules can be generated by reading the attribute values from the reduced decision table using the attributes in a reduct. In practical applications, where the rules are used to classify unseen objects, rules with few selectors and high support are preferred.
Like patterns, rules also have some numerical measures, such as accuracy and coverage:

accuracy(α → β) = support(α ∧ β) / support(α)   (3.8)

coverage(α → β) = support(α ∧ β) / support(β)   (3.9)
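Equations (3.8) and (3.9) can be evaluated directly against the data table. In the following sketch, patterns are encoded as attribute-value dictionaries (an illustrative choice of this sketch); the printed values match those quoted for rule 2 in Example 3.8 below:

```python
ROWS = {"p1": ("no", "yes", "high", "yes"),       "p2": ("yes", "no", "high", "yes"),
        "p3": ("yes", "yes", "very high", "yes"), "p4": ("no", "yes", "normal", "no"),
        "p5": ("yes", "no", "high", "no"),        "p6": ("no", "yes", "very high", "yes")}
IDX = {"Headache": 0, "Muscle-pain": 1, "Temperature": 2, "Flu": 3}

def support(pattern):
    """Number of objects satisfying every selector of the pattern."""
    return sum(all(row[IDX[a]] == v for a, v in pattern.items())
               for row in ROWS.values())

def accuracy(alpha, beta):
    return support({**alpha, **beta}) / support(alpha)   # equation (3.8)

def coverage(alpha, beta):
    return support({**alpha, **beta}) / support(beta)    # equation (3.9)

alpha = {"Muscle-pain": "no", "Temperature": "high"}     # rule antecedent
beta = {"Flu": "yes"}                                    # rule consequent
print(accuracy(alpha, beta), coverage(alpha, beta))      # 0.5 0.25
```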
A good rule should display both a high degree of accuracy and a high degree of coverage, although one does not necessarily imply the other. Figure 3.3 shows graphically that as the antecedent of a decision rule grows longer, the coverage decreases while the accuracy increases. Defining a point that balances the trade-off between these two numerical rule measures can be difficult in practice, and is also a function of the application domain; Bazan explores this issue in detail.
[Figure 3.3: As the length |α| of a rule's antecedent grows, the corresponding equivalence classes shrink; coverage decreases while accuracy increases.]
Example 3.8
To produce the decision rules for Table 3.1, we make use of the reducts obtained in Example 3.7. Reading the attribute values off Table 3.3, the reduced table for the reduct {Muscle-pain, Temperature}, gives the following rules:
1. If Muscle-pain = yes and Temperature = high then Flu = yes (1, 0.25);
2. If Muscle-pain = no and Temperature = high then Flu = yes (0.5, 0.25);
3. If Muscle-pain = yes and Temperature = very high then Flu = yes (1, 0.5);
4. If Muscle-pain = yes and Temperature = normal then Flu = no (1, 0.5);
5. If Muscle-pain = no and Temperature = high then Flu = no (0.5, 0.5);
The figures in the parentheses refer to the accuracy and the coverage of the rule, respectively. Rules for the remaining concepts are obtained using the methodology explained in Example 3.8. On closer observation of the rules generated in the above case, there exists scope for improvement in the number of selectors and the number of rules; e.g. rule 1 and rule 2, for both the reducts, can be replaced by the single rule "If Temperature = high then Flu = yes". This results in reducing the length of the rule as well as improving the accuracy and coverage. This issue is addressed in the next chapter.
3.8 Some Other Terms and Concepts
3.8.1 Rough Membership
The rough membership function, when applied to an object x, quantifies the degree of relative overlap between the set X and the equivalence class [x]_B to which x belongs:

μ_X^B(x) = |X ∩ [x]_B| / |[x]_B|.

It can be interpreted as an estimate of Pr(x ∈ X | x, B), the conditional probability that object x belongs to set X, given the knowledge of the information signature of x with respect to the attributes B.
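A minimal sketch of the rough membership function, under the same illustrative table encoding, is given below; for p2 the class {p2, p5} overlaps the flu concept in exactly one of its two elements:

```python
ROWS = {"p1": ("no", "yes", "high", "yes"),       "p2": ("yes", "no", "high", "yes"),
        "p3": ("yes", "yes", "very high", "yes"), "p4": ("no", "yes", "normal", "no"),
        "p5": ("yes", "no", "high", "no"),        "p6": ("no", "yes", "very high", "yes")}
IDX = {"Headache": 0, "Muscle-pain": 1, "Temperature": 2, "Flu": 3}

def membership(x, attrs, X):
    """mu_X^B(x) = |X ∩ [x]_B| / |[x]_B|, with B = attrs."""
    eq = {o for o, r in ROWS.items()
          if all(r[IDX[a]] == ROWS[x][IDX[a]] for a in attrs)}  # the class [x]_B
    return len(eq & X) / len(eq)

flu_yes = {o for o, r in ROWS.items() if r[IDX["Flu"]] == "yes"}
B = ["Headache", "Muscle-pain", "Temperature"]
print(membership("p2", B, flu_yes))   # 0.5: p2's class {p2, p5} is half inside X
print(membership("p1", B, flu_yes))   # 1.0: p1's class {p1} lies wholly inside X
```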
3.8.2 Variable Precision Rough Sets
The Variable Precision Rough Set (VPRS) model was introduced by Ziarko [Zia93a, Zia93b]. The papers introduce the VPRS model and demonstrate how it can be used as a tool for data analysis. The primary advantage of a VPRS model is the ability to tolerate a controlled degree of misclassification. The formulae for the lower and upper set approximations can be generalized with a precision parameter π:

B̲_π X = {x ∈ U : Pr(x ∈ X | [x]_B) ≥ π}
B̄_π X = {x ∈ U : Pr(x ∈ X | [x]_B) > 1 − π}

Note that the lower and upper approximations as originally formulated are obtained as a special case with π = 1.0. A rough set X defined through the lower and upper approximations B̲_π X and B̄_π X is also referred to as a variable precision rough set, and the parameter π can be seen as a way of thinning the boundary region [Zia93a, Zia93b].
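The following sketch illustrates the π-approximations under the assumed table encoding; the thresholds follow the two formulae above. With B = {Temperature} and π = 0.6, the class for Temperature = high, two of whose three members have flu, moves into the lower approximation and the boundary region empties:

```python
ROWS = {"p1": ("no", "yes", "high", "yes"),       "p2": ("yes", "no", "high", "yes"),
        "p3": ("yes", "yes", "very high", "yes"), "p4": ("no", "yes", "normal", "no"),
        "p5": ("yes", "no", "high", "no"),        "p6": ("no", "yes", "very high", "yes")}
IDX = {"Headache": 0, "Muscle-pain": 1, "Temperature": 2, "Flu": 3}

def ind_classes(attrs):
    classes = {}
    for obj, row in ROWS.items():
        classes.setdefault(tuple(row[IDX[a]] for a in attrs), set()).add(obj)
    return list(classes.values())

def vprs(attrs, X, pi):
    """pi-lower and pi-upper approximations of X."""
    lower, upper = set(), set()
    for eq in ind_classes(attrs):
        p = len(eq & X) / len(eq)   # Pr(x in X | [x]_B) within this class
        if p >= pi:
            lower |= eq
        if p > 1 - pi:
            upper |= eq
    return sorted(lower), sorted(upper)

flu_yes = {o for o, r in ROWS.items() if r[IDX["Flu"]] == "yes"}
print(vprs(["Temperature"], flu_yes, 1.0))  # classical: (['p3','p6'], ['p1','p2','p3','p5','p6'])
print(vprs(["Temperature"], flu_yes, 0.6))  # the 2/3-pure 'high' class joins the lower approx.
```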
3.8.3 Approximate Reducts
The dependency degree γ expresses how exactly a set of attributes B approximates the partition induced by the decision attributes. Reducts preserve the positive region exactly; relaxing this requirement often allows increasing the classification accuracy on testing data. The error level of the reduct approximation should be tuned for a given data set to achieve this effect. Let ε denote the admissible relative decrease of the positive region. For R ∈ RED(A, d) and N equal to the number of objects in the positive region POS_R(d), an approximate reduct can be computed by the following procedure [SP97].
Algorithm ApproximateReduct
Step 1: If no attribute can be removed from R while keeping card(POS_R(d)) ≥ (1 − ε)·N, go to Step 4.
Step 2: Choose from the reduct R one attribute a0 satisfying the condition: removing a0 decreases the positive region the least, i.e. card(POS_{R−{a0}}(d)) is maximal.
Step 3: Begin R = R − {a0}; go to Step 1 End.
Step 4: The new set of attributes (R) is called the approximate reduct.
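A greedy sketch of this procedure is given below. It is an interpretation of the algorithm above under the assumed table encoding: pos_size plays the role of card(POS_R(d)), and attributes are dropped while the positive region stays within the error level ε.

```python
ROWS = {"p1": ("no", "yes", "high", "yes"),       "p2": ("yes", "no", "high", "yes"),
        "p3": ("yes", "yes", "very high", "yes"), "p4": ("no", "yes", "normal", "no"),
        "p5": ("yes", "no", "high", "no"),        "p6": ("no", "yes", "very high", "yes")}
IDX = {"Headache": 0, "Muscle-pain": 1, "Temperature": 2, "Flu": 3}

def ind_classes(attrs):
    classes = {}
    for obj, row in ROWS.items():
        classes.setdefault(tuple(row[IDX[a]] for a in attrs), set()).add(obj)
    return list(classes.values())

def pos_size(R):
    """|POS_R(Flu)|: objects classified with certainty by the attributes R."""
    return sum(len(eq) for X in ind_classes(["Flu"])
               for eq in ind_classes(R) if eq <= X)

def approximate_reduct(R, eps):
    """Greedily drop attributes while the positive region keeps >= (1 - eps) of its size."""
    n = pos_size(R)
    R = list(R)
    while len(R) > 1:
        a0 = max(R, key=lambda a: pos_size([b for b in R if b != a]))
        if pos_size([b for b in R if b != a0]) < (1 - eps) * n:
            break                  # any further removal exceeds the error level
        R.remove(a0)
    return R

print(approximate_reduct(["Headache", "Temperature"], eps=0.5))  # ['Temperature']
```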
Removing attributes from a reduct in this way may worsen the classification of objects from the training set, but it results in more general rules that often classify unseen objects better.

3.9 Computational Complexity
The rough set constructs, i.e. positive regions, reducts etc., as described in this chapter, can be straightforwardly computed from the discernibility matrix with space complexity O(kn²) and time complexity O(kn²), where k is the number of attributes and n the number of objects of the data table. With this complexity, these methods will not be feasible for large data sets. Hoa and Son [HS96] present efficient algorithms for computing positive regions and lower and upper approximations in O(kn log n) time using O(n) space, and reducts with O(kn log n) time complexity and O(kn) space complexity (Chapter 4). Their approach can be applied to large data sets.
3.10 Summary
In this chapter, the methodology of classical rough set theory for knowledge discovery has been presented through its basic concepts and definitions. Using a very small hypothetical dataset, a number of examples are presented. It has been shown that RS can be treated as a tool for data table analysis. Some popular extensions of the classical rough set model, e.g. the variable precision rough set model and approximate reducts, are also introduced. Efficient heuristics have been discussed to compute rough set tools, thus making them suitable for large data sets.