
Chapter 3

Rough Set Concepts

Rough set theory was developed by Zdzislaw Pawlak in the early 1980s. Since then there has been rapid growth of interest in rough set theory and its applications. It deals with the classificatory analysis of data tables, and the main goal of rough set analysis is to synthesize approximations of concepts from the acquired data. This chapter introduces the basic underlying concepts and terminology related to rough sets. It also explains the classical rough set method for mining rules from data sets.

3.1 Introduction
The philosophy of Rough Set (RS) theory is founded on the assumption that with every object of the universe of discourse we associate some information. For example, if objects are patients suffering from a certain disease, symptoms of the disease form information about the patients. In view of the available information, objects characterized by the same values of the corresponding attributes are indiscernible (similar). The indiscernibility relation generated in this way is the mathematical basis of rough set theory. Any set of all indiscernible objects is called an elementary set, and forms a basic granule of knowledge about the universe. Any union of some elementary sets is referred to as a crisp (precise) set; otherwise the set is rough (imprecise, vague). Consequently, each rough set has boundary-line cases, while crisp sets have no boundary-line elements at all. In the rough set approach, a vague concept is replaced by a pair of well defined concepts called the lower and the upper approximation of the vague concept. The lower approximation consists of all objects which surely belong to the concept and the upper approximation contains all objects which possibly belong to the concept.

3.2 Information System and Decision Table


In rough set theory, knowledge is a collection of facts expressed in terms of the values of attributes that describe the objects. These facts are represented in the form of a data table in which the entries of a row describe one object. A data table of this kind is called an information system, attribute-value table or information table. Formally, an information system S is a 4-tuple S = (U, Q, V, f), where U is a non-empty, finite set of objects called the universe; Q is a finite set of attributes; V = ∪ V_q, q ∈ Q, with V_q being the domain of attribute q; and f : U × Q → V is the information function assigning to every object in the universe U a value from the domain V_q of each attribute q.

In many applications, there is an outcome of classification that is known. This posterior knowledge is expressed by one distinguished attribute called the decision attribute. Information systems of this kind are called decision systems. Formally, a decision table is an information system in which Q = A ∪ D, where A is the set of condition attributes and D is a set of decision attributes. In RS, the decision table represents either a full or a partial dependency occurring in the data.

Example 3.1
Table 3.1 is an example of an information system containing data about six patients. Columns of the table are labelled by attributes (symptoms) and rows by objects (patients), whereas entries of the table are attribute values.

Patients p2, p3 and p5 are indiscernible with respect to the attribute Headache, patients p3 and p6 are indiscernible with respect to the attributes Muscle-pain, Temperature and Flu, and patients p2 and p5 are indiscernible with respect to the attributes Headache, Muscle-pain and Temperature. Hence, for example, the attribute Headache generates two elementary sets {p2, p3, p5} and {p1, p4, p6}. Considering Flu as the decision attribute and Headache, Muscle-pain and Temperature as condition attributes, the table may be called a decision system.

Table 3.1: Example of an Information System: Flu Dataset

Patient   Headache   Muscle-pain   Temperature   Flu

p1        no         yes           high          yes
p2        yes        no            high          yes
p3        yes        yes           very high     yes
p4        no         yes           normal        no
p5        yes        no            high          no
p6        no         yes           very high     yes

3.3 Indiscernibility Relation


The most basic concept in rough set theory is the indiscernibility relation, generated by information about the objects of interest. The indiscernibility relation is intended to express that, due to a lack of knowledge, we are unable to discern some objects employing the available information. It means that, in general, we are unable to deal with single objects but rather with granules of indiscernible objects, a fundamental concept of RS theory. Formally, for a subset P ⊆ Q of attributes of an information system S, a relation called the indiscernibility relation, denoted by IND, is defined as

IND_S(P) = {(x, y) ∈ U × U : f(x, a) = f(y, a) ∀ a ∈ P}

If (x, y) ∈ IND_S(P) then the objects x and y are called indiscernible with respect to P. Here f(x, a) and f(y, a) represent the value of the attribute a for the objects x and y respectively. The subscript S may be omitted in IND_S(P) if the information system is implied by the context. IND(P) is an equivalence relation that partitions U into equivalence classes, the sets of objects indiscernible with respect to P. The set of such partitions is denoted by U/IND(P). An equivalence class of IND(P), i.e., the block of the partition U/IND(P) containing object x, is denoted by P(x).

Example 3.2
We have seen in Example 3.1 that the attribute Headache generates two elementary sets, namely {p1, p4, p6} and {p2, p3, p5}.

Thus, U/IND({Headache}) = {{p1, p4, p6}, {p2, p3, p5}}.

Similarly, U/IND({Muscle-pain}) = {{p1, p3, p4, p6}, {p2, p5}},

U/IND({Headache, Muscle-pain}) = {{p1, p4, p6}, {p2, p5}, {p3}} and

U/IND({Flu}) = {{p1, p2, p3, p6}, {p4, p5}}.
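These partitions can be computed mechanically. The following Python sketch (not part of the original text; the names flu_table and partition are illustrative) groups the objects of Table 3.1 by their information signature to obtain U/IND(P):

    # A minimal sketch of computing U/IND(P) for the flu data of Table 3.1.
    from collections import defaultdict

    flu_table = {
        "p1": {"Headache": "no",  "Muscle-pain": "yes", "Temperature": "high",      "Flu": "yes"},
        "p2": {"Headache": "yes", "Muscle-pain": "no",  "Temperature": "high",      "Flu": "yes"},
        "p3": {"Headache": "yes", "Muscle-pain": "yes", "Temperature": "very high", "Flu": "yes"},
        "p4": {"Headache": "no",  "Muscle-pain": "yes", "Temperature": "normal",    "Flu": "no"},
        "p5": {"Headache": "yes", "Muscle-pain": "no",  "Temperature": "high",      "Flu": "no"},
        "p6": {"Headache": "no",  "Muscle-pain": "yes", "Temperature": "very high", "Flu": "yes"},
    }

    def partition(table, attributes):
        """Return U/IND(P): objects grouped by their values on the attributes in P."""
        classes = defaultdict(set)
        for obj, values in table.items():
            key = tuple(values[a] for a in attributes)   # information signature of obj on P
            classes[key].add(obj)
        return list(classes.values())

    print(partition(flu_table, ["Headache"]))                  # [{p1, p4, p6}, {p2, p3, p5}] (element order may vary)
    print(partition(flu_table, ["Headache", "Muscle-pain"]))   # [{p1, p4, p6}, {p2, p5}, {p3}]

The later sketches in this chapter reuse flu_table and partition.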

3.4 Approximation of Sets


We have seen that an equivalence relation induces a partitioning of the universe. These partitions can be used to build new subsets of the universe. Subsets that are most often of interest have the same value of the decision attribute. Let X ⊆ U be a desired subset of the universe. A description for X is sought that can determine the membership status of each object in U with respect to X. The indiscernibility relation is used for this purpose. If an equivalence class defined by IND(P) only partially overlaps with the set X, the membership of the objects in such an equivalence class cannot be determined without ambiguity, and consequently an exact description of such a set X may not be possible. Therefore, the description of X is given in terms of the P-lower approximation (denoted P̲X) and the P-upper approximation (denoted P̄X). For P ⊆ Q,

P̲X = ∪ {Y ∈ U/IND(P) : Y ⊆ X}        (3.1)

P̄X = ∪ {Y ∈ U/IND(P) : Y ∩ X ≠ ∅}        (3.2)

A set X for which P̲X = P̄X is called an exact set; otherwise it is called a rough set with respect to P.

The objects in P̲X can be classified with certainty as members of X on the basis of the knowledge in P, while the objects in P̄X can only be classified as possible members of X on the basis of the knowledge in P. The set

BN_P(X) = P̄X − P̲X

is called the P-boundary region of X, and thus consists of those objects that we cannot decisively classify into X on the basis of the knowledge in P. The set U − P̄X is called the P-outside region of X and consists of those objects which can be classified with certainty as not belonging to X on the basis of the knowledge in P. The boundary region is non-empty for a rough set and empty for a crisp set.

Example 3.3
For Table 3.1, patient p2 suffers from flu whereas patient p5 does not, and they are indiscernible with respect to the attributes Headache, Muscle-pain and Temperature; hence Flu cannot be characterized in terms of these attributes for p2 and p5. Therefore p2 and p5 are boundary-line cases which cannot be properly classified in view of the available knowledge. The remaining patients p1, p3 and p6 display symptoms which enable us to classify them with certainty as suffering from flu, p2 and p5 cannot be excluded as suffering from flu, and p4 for sure does not suffer from flu, in view of the displayed symptoms. Thus, if P = {Headache, Muscle-pain, Temperature} then

P̲(Flu = yes) = {p1, p3, p6}, P̄(Flu = yes) = {p1, p2, p3, p5, p6}, BN_P(Flu = yes) = {p2, p5}

P̲(Flu = no) = {p4}, P̄(Flu = no) = {p2, p4, p5}, BN_P(Flu = no) = {p2, p5}

These approximations are visualized in Figure 3.1.
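As an illustrative sketch (reusing the flu_table dictionary and the partition helper assumed in the earlier sketch), the approximations of Example 3.3 follow directly from definitions (3.1) and (3.2):

    def approximations(table, P, X):
        """Return the (lower, upper) approximations of an object set X w.r.t. attributes P."""
        lower, upper = set(), set()
        for eq_class in partition(table, P):    # equivalence classes Y in U/IND(P)
            if eq_class <= X:                   # Y is contained in X -> definition (3.1)
                lower |= eq_class
            if eq_class & X:                    # Y has elements in common with X -> definition (3.2)
                upper |= eq_class
        return lower, upper

    P = ["Headache", "Muscle-pain", "Temperature"]
    flu_yes = {o for o, v in flu_table.items() if v["Flu"] == "yes"}
    lower, upper = approximations(flu_table, P, flu_yes)
    print(lower)           # {'p1', 'p3', 'p6'}
    print(upper)           # {'p1', 'p2', 'p3', 'p5', 'p6'}
    print(upper - lower)   # boundary region: {'p2', 'p5'}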

3.4.1 Properties of Approximations


Following properties of approximations are easily observable:

(1) P̲(X) ⊆ X ⊆ P̄(X)

(2) P̲(∅) = P̄(∅) = ∅, P̲(U) = P̄(U) = U

(3) P̄(X ∪ Y) = P̄(X) ∪ P̄(Y)

(4) P̲(X ∩ Y) = P̲(X) ∩ P̲(Y)

(5) X ⊆ Y implies P̲(X) ⊆ P̲(Y) and P̄(X) ⊆ P̄(Y)

(6) P̲(X ∪ Y) ⊇ P̲(X) ∪ P̲(Y)

(7) P̄(X ∩ Y) ⊆ P̄(X) ∩ P̄(Y)

(8) P̲(−X) = −P̄(X)

(9) P̄(−X) = −P̲(X)

(10) P̲(P̲(X)) = P̄(P̲(X)) = P̲(X)

(11) P̄(P̄(X)) = P̲(P̄(X)) = P̄(X)

In the above properties −X denotes U − X.

Figure 3.1: Approximating the set of flu patients using attributes Headache, Muscle-pain and Temperature

[Figure: the universe is divided into the lower approximation of Flu = yes, containing the classes {p1}, {p3}, {p6}; the boundary region (Flu = yes / no), containing the class {p2, p5}; and the region Flu = no, containing the class {p4}.]

Notes: the equivalence classes contained in the corresponding regions are:
P̲(Flu = yes) = {p1, p3, p6}
P̄(Flu = yes) = {p1, p2, p3, p5, p6}
BN_P(Flu = yes) = {p2, p5}
P̲(Flu = no) = {p4}
P̄(Flu = no) = {p2, p4, p5}
BN_P(Flu = no) = {p2, p5}

3.4.2 Classes of Rough Sets
Four basic classes of rough sets can be defined as follows:

X is roughly P-definable, iff P̲(X) ≠ ∅ and P̄(X) ≠ U

X is internally P-undefinable, iff P̲(X) = ∅ and P̄(X) ≠ U

X is externally P-undefinable, iff P̲(X) ≠ ∅ and P̄(X) = U

X is totally P-undefinable, iff P̲(X) = ∅ and P̄(X) = U

3.4.3 Accuracy of Approximation


A rough set can also be characterized numerically by the coefficient α_P(X), called the accuracy of approximation and defined as follows:

α_P(X) = |P̲(X)| / |P̄(X)|

where |X| denotes the cardinality of X. If α_P(X) = 1 then X is crisp, while α_P(X) < 1 means X is rough with respect to P.
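For instance, with P = {Headache, Muscle-pain, Temperature} and the approximations obtained in Example 3.3, α_P(Flu = yes) = |{p1, p3, p6}| / |{p1, p2, p3, p5, p6}| = 3/5 = 0.6, so the concept Flu = yes is rough with respect to P.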

3.5 Dependency of Attributes


Another important issue in data analysis is discovering dependencies between attributes. A set of attributes P depends totally on a set of attributes R, denoted R ⇒ P, if all values of attributes in P are uniquely determined by values of attributes in R. In other words, P depends totally on R if there exists a functional dependency between the values of attributes in P and R. A more general concept of dependency of attributes, called a partial dependency of attributes, means that only some values of P are determined by the values of R. Rough set theory introduces a measure of dependency of two subsets of attributes P, R ⊆ Q. The measure is called the degree of dependency of P on R, denoted by γ_R(P). It is defined as

γ_R(P) = card(POS_R(P)) / card(U),  where  POS_R(P) = ∪ {R̲X : X ∈ U/IND(P)}        (3.3)

The set POS_R(P), the positive region, is the set of all the elements of U that can be uniquely classified into the partition U/IND(P) by means of R. The coefficient γ_R(P) represents the fraction of the objects in the universe which can be properly classified. If P depends totally on R then γ_R(P) = 1, else γ_R(P) < 1.

Example 3.4

To understand the usage of equation (3.3) for computing the dependency of Flu on Temperature (Table 3.1), we observe that the attribute Temperature determines uniquely only some values of the attribute Flu. That is, (Temperature, very high) implies (Flu, yes) and, similarly, (Temperature, normal) implies (Flu, no), but (Temperature, high) does not always imply (Flu, yes). Thus there exists a partial dependency between Temperature and Flu. To determine γ_Temperature(Flu) using equation (3.3):

U = {p1, p2, p3, p4, p5, p6} and U/IND(Flu) = {{p1, p2, p3, p6}, {p4, p5}}

POS_Temperature({Flu}) = {p3, p6} ∪ {p4} = {p3, p4, p6}

Thus, γ_Temperature(Flu) = 3/6 = 0.5.

Further, we compute that γ_Headache(Flu) = 0 and γ_Muscle-pain(Flu) = 0.
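The same computation can be expressed compactly in code. The sketch below (again an illustration, reusing flu_table, partition and approximations from the earlier sketches) builds the positive region of equation (3.3) from R-lower approximations of the decision classes:

    def positive_region(table, R, P):
        """POS_R(P): objects that R classifies unambiguously into the classes of U/IND(P)."""
        pos = set()
        for X in partition(table, P):               # each class X in U/IND(P)
            lower, _ = approximations(table, R, X)  # R-lower approximation of X
            pos |= lower
        return pos

    def dependency(table, R, P):
        """Degree of dependency gamma_R(P) = card(POS_R(P)) / card(U), equation (3.3)."""
        return len(positive_region(table, R, P)) / len(table)

    print(dependency(flu_table, ["Temperature"], ["Flu"]))   # 0.5
    print(dependency(flu_table, ["Headache"], ["Flu"]))      # 0.0
    print(dependency(flu_table, ["Muscle-pain"], ["Flu"]))   # 0.0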

3.6 Reduction of Attributes


One often faces the question of whether some data can be removed from a data table while preserving its basic properties, that is, whether a table contains superfluous data. One natural dimension of reducing data is the size of the dataset. This is achieved by identifying equivalence classes and considering only one element of an equivalence class to represent the entire class. The other dimension of reduction is the number of attributes, which is achieved by keeping only those attributes that preserve the indiscernibility relation and, consequently, the set approximations. The rejected attributes are redundant with respect to classification since their removal cannot deteriorate the classification.

3.6.1 Reducts
A minimal set of attributes that preserves the indiscernibility relation is called a reduct. In supervised learning, the reduct relative to the decision attribute is the useful one; hence examples are discussed for decision relative reducts only. The relative reduct of the attribute set P, P ⊆ Q, with respect to the dependency γ_P(Q) is defined as a subset RED(P, Q) ⊆ P such that:

γ_{RED(P,Q)}(Q) = γ_P(Q)

i.e. a relative reduct preserves the degree of inter-attribute dependency.

For any attribute a ∈ RED(P, Q), γ_{RED(P,Q)−{a}}(Q) < γ_P(Q), i.e. the relative reduct is a minimal subset of attributes preserving that degree of dependency.

Finding a minimal reduct (i.e. a reduct with a minimal cardinality of attributes among all reducts) is NP-hard [SR92]. The number of reducts of an information system with m attributes may be equal to the binomial coefficient C(m, ⌊m/2⌋). It appears that computing the reducts is a non-trivial task that cannot be solved simply by an increase of computational resources. It is, in fact, one of the bottlenecks of the rough set methodology. Fortunately, there exists an efficient algorithm to compute a single relative reduct in linear time [SR92, Joh74]. Genetic algorithms [Gol89] are also used for simultaneous computation of many reducts in often acceptable time, unless the number of attributes is very high [Wro95, Wro98, BK97].

3.6.2 Global and Local Reducts
A reduct, if not explicitly qualified as a local reduct, is called a global reduct. Local reducts are also called value reducts. They are based on the fact that we can often remove some values of the attributes without affecting the consistency of the information system. Like global reducts, decision relative local reducts are useful for supervised learning.

3.6.3 Core
The intersection of all relative reducts is called the relative core. In other words, each element of the core belongs to every reduct. In a sense, the core is the most important subset of attributes, for none of its elements can be removed without affecting the classification power of the attributes.

Example 3.5
For Table 3.1, we have two reducts, {Temperature, Headache} and {Temperature, Muscle-pain}, with respect to the attribute Flu. It means that either the attribute Headache or the attribute Muscle-pain can be eliminated from the table, and consequently instead of Table 3.1 we can use either Table 3.2 or Table 3.3.

The core of the dataset in Table 3.1 with respect to Flu is {Temperature}.

Table 3.2: Flu Data with Reduced Condition Attributes Headache and Temperature

Patient   Headache   Temperature   Flu

p1        no         high          yes
p2        yes        high          yes
p3        yes        very high     yes
p4        no         normal        no
p5        yes        high          no
p6        no         very high     yes

Table 3.3: Flu Data with Reduced Condition Attributes Muscle-Pain and Temperature

Patient   Muscle-pain   Temperature   Flu

p1        yes           high          yes
p2        no            high          yes
p3        yes           very high     yes
p4        yes           normal        no
p5        no            high          no
p6        yes           very high     yes

Example 3.6
By removing some values of the attributes Headache and Muscle-pain, Table 3.2 and Table 3.3 can be simplified as shown in Table 3.4 and Table 3.5 respectively.

Value reducts as observed from Table 3.4 are {(Headache, no), (Temperature, high)}, {(Headache, yes), (Temperature, high)}, {(Temperature, normal)} and {(Temperature, very high)}.

Value reducts as observed from Table 3.5 are {(Muscle-pain, yes), (Temperature, high)}, {(Muscle-pain, no), (Temperature, high)}, {(Temperature, normal)} and {(Temperature, very high)}.

Table 3.4: Flu Data with Reduced Values of Attribute Headache

Patient   Headache   Temperature   Flu

p1        no         high          yes
p2        yes        high          yes
p3        -          very high     yes
p4        -          normal        no
p5        yes        high          no
p6        -          very high     yes

Table 3.5: Flu Data with Reduced Values of Attribute Muscle-Pain

Patient   Muscle-pain   Temperature   Flu

p1        yes           high          yes
p2        no            high          yes
p3        -             very high     yes
p4        -             normal        no
p5        no            high          no
p6        -             very high     yes

3.6.4 Reduct Computation
To understand the methodology of reduct computation, we will use the discernibility matrix, which is defined next. Let S be an information system with n objects. The discernibility matrix of S is a symmetric n × n matrix with entries c_ij given by

c_ij = {a ∈ Q | a(x_i) ≠ a(x_j)}  for i, j = 1, ..., n        (3.4)

Each entry thus consists of the set of attributes upon which the objects x_i and x_j differ. A discernibility function f_S for an information system S is a Boolean function of m Boolean variables a*_1, ..., a*_m (corresponding to the attributes a_1, ..., a_m) defined as follows, where c*_ij = {a* : a ∈ c_ij}:

f_S(a*_1, ..., a*_m) = ∧ {∨ c*_ij : 1 ≤ j < i ≤ n, c_ij ≠ ∅}        (3.5)

A Boolean POS (product-of-sums) function as defined above can often be considerably simplified while fully preserving the function's semantics. First of all, duplicate sums can be eliminated, since Boolean algebras have the property of multiplicative idempotence, meaning that a·a = a for all members a. If the function has n product terms, this can be done by a simple scan-and-sort procedure which is bounded by the sorting step, typically O(n log n). Furthermore, a sum that includes ("is a superset of") another sum in the function can be safely eliminated, since in Boolean algebras a·(a + b) = a for all members a, b. This property is called absorption. Absorption can be carried out naively in O(n²) time, but sub-quadratic algorithms exist [Prit95]. The set of all prime implicants¹ of f_S determines the set of all reducts of S. In other words, the constituents in the minimal disjunctive normal form of the function f_S(a*_1, ..., a*_m) are exactly the reducts of S.

¹ An implicant of a Boolean function f is any conjunction of literals (variables or their negations) such that, if the values of these literals are true under an arbitrary valuation v of the variables, then the value of the function f under v is also true. A prime implicant is a minimal implicant. Here we are interested in implicants of monotone Boolean functions only, i.e. functions constructed without negation.

In order to compute the value core and the value reducts for an object x_i, the discernibility matrix as defined before is used and the discernibility function is restricted to the entries involving x_i:

f_S^{x_i}(a*_1, ..., a*_m) = ∧ {∨ c*_ij : 1 ≤ j ≤ n, j ≠ i, c_ij ≠ ∅}        (3.6)

Relative reducts and the core can also be computed using the discernibility matrix, which needs a slight modification:

c_ij = {a ∈ A | a(x_i) ≠ a(x_j)}  for i, j = 1, ..., n and w(x_i, x_j)        (3.7)

where w(x_i, x_j) holds iff

x_i ∈ POS_A(D) and x_j ∉ POS_A(D), or
x_i ∉ POS_A(D) and x_j ∈ POS_A(D), or
x_i, x_j ∈ POS_A(D) and (x_i, x_j) ∉ IND(D)

Recall that A is the set of condition attributes and D is the set of decision attributes. If the partition defined by D is definable by A, then the condition w(x_i, x_j) in the above definition reduces to (x_i, x_j) ∉ IND(D). Thus the entry c_ij is the set of all condition attributes which discern objects x_i and x_j that do not belong to the same equivalence class of the relation IND(D). The D-core (decision relative core) is the set of all single-element entries of the discernibility matrix, i.e.

CORE_D(A) = {a ∈ A : c_ij = {a}, for some x_i, x_j}.

Similarly, a D-reduct (decision relative reduct) is a minimal subset of attributes that discerns all equivalence classes of the relation IND_S(D) discernible by the whole set of attributes. Every modified discernibility matrix uniquely defines a discernibility function as before. The constituents in the minimal disjunctive normal form of this discernibility function are exactly the decision relative reducts of A. In this study, decision relative reducts are used extensively; hence the following example illustrates the steps of their computation.

Example 3.7
We compute the decision relative reducts and the core of Table 3.1. Using (3.7), we first construct the decision relative discernibility matrix shown below, where h, m and t abbreviate Headache, Muscle-pain and Temperature respectively. Note that objects having the same decision are not compared among themselves (blank entries). Also, the matrix is symmetric with respect to the diagonal; hence only the upper half of the matrix needs to be considered when defining the discernibility function using (3.5).

        p2        p3        p4        p5        p6
p1                          t         h,m
p2                          h,m,t     ∅
p3                          h,t       m,t
p4                                              t
p5                                              h,m,t

The empty entry for the pair (p2, p5) reflects the fact that these two objects are indiscernible on the condition attributes yet have different decisions, and therefore contributes no factor to the discernibility function.

f_A(D) = (t) (h+m+t) (h+t) (h+m) (m+t) (t) (h+m+t)

       = (t) (h+m+t) (h+t) (h+m) (m+t)        (idempotent law)

       = t (h+m)        (absorption law: "h+m+t", "m+t" and "h+t" are supersets of "t")

       = ht + mt

Thus the function in POS form is simplified to a function in SOP (sum-of-products) form. Each product term in this simplified form is a prime implicant of the function. Thus the decision relative global reducts are {h, t} and {m, t}, corresponding to Table 3.2 and Table 3.3 respectively. The decision relative core (the intersection of all the reducts) of Table 3.1 is {t}. Observing the single-element entries of the decision relative discernibility matrix also qualifies t as the core.
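The same result can be obtained programmatically. The sketch below (an illustration only, reusing the flu_table dictionary from the earlier sketches) builds the non-empty entries of the decision relative discernibility matrix and finds the reducts as the minimal attribute subsets that intersect every entry, rather than by symbolic Boolean simplification:

    from itertools import combinations

    def relative_discernibility_entries(table, condition, decision):
        """Non-empty entries c_ij of the decision relative discernibility matrix (3.7)."""
        entries = set()
        for x, y in combinations(table, 2):
            if table[x][decision] == table[y][decision]:
                continue                        # objects with the same decision are not compared
            diff = frozenset(a for a in condition if table[x][a] != table[y][a])
            if diff:                            # an empty entry (e.g. p2 vs p5) adds no constraint
                entries.add(diff)
        return entries

    def relative_reducts(table, condition, decision):
        """Minimal attribute subsets intersecting every discernibility entry."""
        entries = relative_discernibility_entries(table, condition, decision)
        hitting = [set(B) for r in range(1, len(condition) + 1)
                   for B in combinations(condition, r)
                   if all(set(B) & e for e in entries)]
        return [B for B in hitting if not any(other < B for other in hitting)]

    cond = ["Headache", "Muscle-pain", "Temperature"]
    print(relative_reducts(flu_table, cond, "Flu"))
    # [{'Headache', 'Temperature'}, {'Muscle-pain', 'Temperature'}]

This exhaustive search is adequate only for small attribute sets; as noted below, computing minimal reducts in general is NP-hard.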

The problem of computing a minimal reduct is NP-hard [SR92], but approximation algorithms can be used to obtain knowledge about reduct sets. Approximation algorithms do not give an optimal solution but have acceptable time complexity, e.g. algorithms based on simulated annealing and Boltzmann machines, genetic algorithms, and algorithms using neural networks. In this thesis we have adopted genetic algorithms for generating a large number of reducts and Johnson's algorithm for the computation of a single reduct.

3.7 Patterns and Rule Discovery


Rules can be perceived as data patterns or formulae that represent relationships between attribute values. The most primitive pattern, and hence the fundamental building block for generating rules, is called a selector. A selector is simply an expression a = v where a ∈ Q and v ∈ V_a. Patterns can be combined in a recursive manner to form more complex patterns by means of the propositional connectives {·, +, →, ¬}, denoting conjunction, disjunction, implication and negation respectively. The type of pattern most commonly considered in an information system S is a conjunction of selectors, formed by overlaying a set of attributes (here, a reduct) over an object x ∈ U and reading off the values of x for every a ∈ Reduct. Two numerical measures commonly associated with a pattern α are support and coverage. Support(α) refers to the number of objects in the information system that have the property described by the pattern α, and coverage(α) denotes the proportion of objects in U that match the description given by α.

Since a decision system is a specialized type of information system, a decision rule is similarly a specialized type of pattern that specifies a relationship, possibly a probabilistic one, between a set of conditions and a conclusion or decision. Let S denote a decision system, and let α denote a conjunction of selectors that only involve attributes in S. Furthermore, let β denote a selector d = v, where v is any allowed decision value. The decision rule read as "if α then β" is denoted by α → β. The pattern α is called the rule's antecedent, while the pattern β is called the rule's consequent. RS theory provides a mechanism to discover decision rules by reading the attribute values from the reduced decision table using the attributes in a reduct. In practical applications, where the rules are used to classify unseen objects, reduct approximations are typically employed instead of proper reducts. Like patterns, rules also have numerical measures such as accuracy, coverage and stability associated with them.

accuracy(α → β) = support(α · β) / support(α)        (3.8)

coverage(α → β) = support(α · β) / support(β)        (3.9)

A graphical display of the relationship between accuracy and coverage can be found in Figure 3.2. It is desirable for a rule to be accurate as well as to have a high degree of coverage, although one does not necessarily imply the other. Figure 3.3 shows graphically that as the antecedent of a decision rule grows longer, the coverage decreases while the accuracy increases. Defining a point that balances the trade-off between these two numerical rule measures can be difficult in practice, and is also a function of the application domain. Bazan explores the issue of the stability of a rule in detail [Baz98].

Figure 3.2: Description of four decision rules α_i → β_i over a binary decision domain, i.e., β_i = (d = 0) or β_i = (d = 1). Each dashed set represents support(α_i), the set of objects that match the rule's antecedent α_i.

Figure 3.3: Coverage and accuracy vs. |α|. As the length of the antecedent of a decision rule increases, the rule becomes more specific and less general. As a result, the coverage decreases while the accuracy increases. Finding a suitable balance in the trade-off between coverage and accuracy can be difficult in practice.

Example 3.8
To produce the decision rules for Table 3.1, we make use of the reducts obtained in Example 3.7. Using the reduct {Headache, Temperature}, the rules are:

1. If Headache = no and Temperature = high then Flu = yes (1, 0.25);
2. If Headache = yes and Temperature = high then Flu = yes (0.5, 0.25);
3. If Headache = yes and Temperature = high then Flu = no (0.5, 0.5);
4. If Headache = yes and Temperature = very high then Flu = yes (1, 0.25);
5. If Headache = no and Temperature = normal then Flu = no (1, 0.5);
6. If Headache = no and Temperature = very high then Flu = yes (1, 0.25);

Similarly, using the reduct {Muscle-pain, Temperature}, we get:

1. If Muscle-pain = yes and Temperature = high then Flu = yes (1, 0.25);
2. If Muscle-pain = no and Temperature = high then Flu = yes (0.5, 0.25);
3. If Muscle-pain = yes and Temperature = very high then Flu = yes (1, 0.5);
4. If Muscle-pain = yes and Temperature = normal then Flu = no (1, 0.5);
5. If Muscle-pain = no and Temperature = high then Flu = no (0.5, 0.5);

The figures in the parentheses refer to the accuracy and the coverage of the rule respectively. In subsequent chapters, decision rules based on rough set theory concepts are obtained using the methodology explained in Example 3.8. On closer observation of the rules generated above, there is scope for improvement in the number of selectors and the number of rules; e.g. rule 1 and rule 2 for both reducts can be replaced by the single rule "If Temperature = high then Flu = yes". This results in reducing the length of the rule as well as improving the accuracy and coverage. This issue is addressed in the next chapter.
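The rule-reading step of Example 3.8 can be sketched in code as follows (an illustration only, reusing the flu_table dictionary from the earlier sketches; the helper name rules_from_reduct is hypothetical). Each distinct combination of reduct values and decision value in the reduced table yields one rule, scored with equations (3.8) and (3.9):

    def rules_from_reduct(table, reduct, decision):
        """Read decision rules off the reduced table and attach (accuracy, coverage)."""
        seen = set()
        rules = []
        for values in table.values():
            antecedent = tuple((a, values[a]) for a in reduct)
            consequent = values[decision]
            if (antecedent, consequent) in seen:
                continue
            seen.add((antecedent, consequent))
            match_a = [v for v in table.values() if all(v[a] == val for a, val in antecedent)]
            match_ab = [v for v in match_a if v[decision] == consequent]
            support_b = sum(1 for v in table.values() if v[decision] == consequent)
            accuracy = len(match_ab) / len(match_a)      # equation (3.8)
            coverage = len(match_ab) / support_b         # equation (3.9)
            rules.append((antecedent, consequent, round(accuracy, 2), round(coverage, 2)))
        return rules

    for rule in rules_from_reduct(flu_table, ["Headache", "Temperature"], "Flu"):
        print(rule)
    # e.g. ((('Headache', 'no'), ('Temperature', 'high')), 'yes', 1.0, 0.25)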

3.8 Some Other Terms and Concepts
3.8.1 Rough Membership
The rough membership function, applied to an object x, quantifies the degree of relative overlap between the set X and the equivalence class B(x) to which x belongs. It is defined as follows:

μ_X^B(x) = |X ∩ B(x)| / |B(x)|

The rough membership function can be interpreted as a frequency-based estimate of Pr(x ∈ X | x, B), the conditional probability that object x belongs to the set X, given knowledge of the information signature of x with respect to the attributes B [PWZ88].
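A short sketch (again reusing flu_table and the partition helper from the earlier sketches) computes the rough membership of the flu patients in the concept Flu = yes:

    def rough_membership(table, B, X, x):
        """mu_X^B(x) = |X and B(x)| / |B(x)|, where B(x) is the equivalence class containing x."""
        block = next(c for c in partition(table, B) if x in c)   # the class B(x)
        return len(block & X) / len(block)

    B = ["Headache", "Muscle-pain", "Temperature"]
    flu_yes = {o for o, v in flu_table.items() if v["Flu"] == "yes"}
    print(rough_membership(flu_table, B, flu_yes, "p1"))   # 1.0: {p1} lies entirely inside Flu = yes
    print(rough_membership(flu_table, B, flu_yes, "p2"))   # 0.5: the class {p2, p5} overlaps Flu = yes in p2 only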

3.8.2 Variable Precision Rough Set Model


A generalized model of rough sets called the Variable Precision Rough Set (VPRS) model, aimed at modelling classification problems, is presented in [Zia93a, Zia93b]. These papers introduce the VPRS model and demonstrate how it can be used as a tool for data analysis. The primary advantage of the VPRS model is the ability to recognize the presence of data dependencies in situations where the data items are considered independent by the original rough set model.

The formulae for the lower and upper set approximations can be generalized to an arbitrary precision level π ∈ (0.5, 1] by means of the rough membership function, as shown below:

B̲_π X = {x | μ_X^B(x) ≥ π}

B̄_π X = {x | μ_X^B(x) > 1 − π}

Note that the lower and upper approximations as originally formulated are obtained as a special case with π = 1.0. A rough set X defined through the lower and upper approximations B̲_π X and B̄_π X is also referred to as a variable precision rough set and can be seen as a way of thinning the boundary region [Zia93a, Zia93b].
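A small sketch (reusing the rough_membership helper and the flu_table dictionary, B and flu_yes from the earlier sketches) implements the π-approximations exactly as defined above:

    def vprs_approximations(table, B, X, pi):
        """pi-lower and pi-upper approximations built from the rough membership function."""
        assert 0.5 < pi <= 1.0
        lower = {x for x in table if rough_membership(table, B, X, x) >= pi}
        upper = {x for x in table if rough_membership(table, B, X, x) > 1 - pi}
        return lower, upper

    # With pi = 1.0 the classical approximations of Example 3.3 are recovered.
    # In this tiny table the boundary {p2, p5} persists for every admissible pi,
    # because both objects have rough membership exactly 0.5 in Flu = yes.
    print(vprs_approximations(flu_table, B, flu_yes, 1.0))
    print(vprs_approximations(flu_table, B, flu_yes, 0.75))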

3.8.3 Approximate Reducts


Any subset B of A can be treated as an approximate reduct of A, and the number

ε_(A,D)(B) = (γ(A, D) − γ(B, D)) / γ(A, D) = 1 − γ(B, D) / γ(A, D)

denoted simply as ε(B), is called the error of reduct approximation. It expresses how exactly the set of attributes B approximates the set of condition attributes A (relative to the decision attributes D).

The concept of an approximate reduct is a generalization of the concept of a proper reduct and is useful in cases when a smaller number of condition attributes is preferred over accuracy of classification on the training data, which can allow increasing the classification accuracy on the testing data. The error level of reduct approximation should be tuned for a given data set to achieve this effect. One technique for reduct approximation is based on the approximation of the positive region. For R ∈ RED(A, D) and N equal to the number of objects in the decision system S, the following algorithm computes this kind of reduct approximation [SP97].

Algorithm ApproximateReduct

Step 1: Calculate the positive regions POS_{R−{a}} for all a ∈ R.

Step 2: Choose from the reduct R one attribute a0 satisfying the condition: ∀ a ∈ R, POS_{R−{a0}} ⊇ POS_{R−{a}}.

Step 3: If |POS_{R−{a0}}| > k·N (e.g. k = 0.9) then

Begin

R = R − {a0}, go to Step 1

End

Step 4: The new set of attributes R is called the approximate reduct.
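A possible reading of this procedure in code is sketched below (an illustration under the stated assumptions, reusing the positive_region helper and the flu_table dictionary from the earlier sketches; the guard that keeps at least one attribute is added for safety and is not part of the original pseudocode):

    def approximate_reduct(table, reduct, decision, k=0.9):
        """Drop attributes from the reduct R while the positive region keeps more than k*N objects."""
        R = list(reduct)
        N = len(table)
        while len(R) > 1:
            # Steps 1-2: the attribute whose removal keeps the largest positive region.
            best = max(R, key=lambda a: len(positive_region(table, [b for b in R if b != a], [decision])))
            pos = positive_region(table, [b for b in R if b != best], [decision])
            # Step 3: remove it only while the approximation quality stays above the threshold.
            if len(pos) > k * N:
                R.remove(best)
            else:
                break
        return R   # Step 4: the remaining attributes form the approximate reduct

    # With a deliberately low threshold the reduct {Headache, Temperature} shrinks to {Temperature}.
    print(approximate_reduct(flu_table, ["Headache", "Temperature"], "Flu", k=0.4))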

Approximate reducts can help to extract interesting rules from decision tables. Applying a reduct approximation instead of a reduct decreases the quality of the classification of objects from the training set, but it results in more general rules with a higher quality of classification for new objects.

3.9 Computational Complexity of Rough Set Tools


Any of the rough set tools described in this chapter, such as the lower approximation, the upper approximation, positive regions and reducts, can be computed straightforwardly from the discernibility matrix with space complexity O(kn²) and time complexity O(kn²), where n is the number of objects and k is the number of attributes of the data table. With this complexity, these methods are not feasible for large data sets. Hoa and Son [HS96] present efficient algorithms for computing the rough set tools; their implementation does not need to store the discernibility matrix. They propose algorithms for computing positive regions and lower and upper approximations in O(kn log n) time using O(n) space. The computation of a reduct can be implemented with O(kn log n) time complexity and O(kn) space complexity (Chapter 4). Their approach can be applied to the synthesis of efficient algorithms for the discretization of numeric attributes in large tables (Chapter 6).

3.10 Summary
In this chapter, the methodology of classical rough set theory for knowledge discovery or rule generation is sketched by introducing the relevant terms and definitions. A number of examples are presented using a very small hypothetical dataset. It has been shown that RS can be treated as a tool for data table analysis. A reduct is a minimal set of attributes that preserves the indiscernibility relation in an information system, and many algorithms are available for the computation of reducts. Some popular extensions of the classical rough set model, e.g. the variable precision rough set model and approximate reducts, are also introduced. Efficient heuristics to compute the rough set tools have been discussed, making them suitable for the analysis of large data tables.
