Construct Rough Approximation Based On GAE

Lin Shi, Yang Zhou
Abstract—Recently, cloud computing has emerged as a new paradigm which focuses on web-scale problems, large data centers, multiple models of computing and highly interactive web applications. It is highly available and scalable for distributed and parallel data storage and computing based on a large number of cheap PCs. As a representative product, Google App Engine (GAE), which acts as a platform-as-a-service (PaaS) cloud computing platform, mainly contains the Google File System (GFS) and the MapReduce programming model for massive data processing. This paper analyses GAE from the point of view of granular computing (GrC) and explains why it is suitable for massive data mining. Further, we present an example of how to use it to construct neighborhoods of rough set theory and compute lower and upper approximations accurately and strictly.

Keywords-Cloud computing; Google App Engine; Granular computing; Rough set

I. INTRODUCTION

Google App Engine (GAE) [1] is a prominent and representative platform of cloud computing [2]. It embodies the idea of platform as a service (PaaS) [3]. Since the first version was released in 2008, it has attracted much attention in computer science. As a parallel computing framework, it has been evaluated to provide commendable support for the development of computationally intensive high-performance computing (HPC) [4] algorithms and applications. Data submitted by users are stored in Google's distributed Bigtable [5] using the Google File System (GFS) [6]. Besides, GAE employs the MapReduce programming model [7, 8] to handle massive data computing. GAE supports the development of web applications for both scientific researchers and companies and is suitable for the mining of massive internet data.

Granular computing (GrC) [9] is a theory which simulates human thinking in solving complicated problems. Since the concept was proposed last century, researchers have taken a keen interest in information granulation. Generally, GrC involves a number of theories and techniques and acts as a method for massive data mining, the solution of complex problems, fuzzy information processing, etc. Rough set theory, quotient space theory and fuzzy information granulation constitute the three primary research models of GrC. Furthermore, any method or theory based on grouping, classification or clustering can fall under GrC theory. In 1979, the first concept of information granularity was proposed by Zadeh [10]. He argued that "information granularity", as a new concept, exists in many domains, while it appears in different forms. Hobbs directly used "granularity" as the title of his research and discussed the divide and conquer of granules [11]. Lin proposed the "neighborhood system" in 1988 and studied the relationship between neighborhood systems and databases [12]. Since then, a lot of related work has been carried out.

This paper analyzes GAE in detail with the theory of GrC to present the reason why it is a suitable platform for massive data mining. We focus on the distributed storage and computing mechanism at the bottom of GAE, which shares the "divide and conquer" concept with GrC. At last, a brief example is given to show how to construct neighborhoods of rough set theory and compute lower and upper approximations based on GAE.

The remainder of this paper is organized as follows. In section 2, some basic concepts and definitions about GrC, rough sets and neighborhoods are introduced. The frameworks of GFS and the MapReduce model are elaborated and analyzed with GrC theory in sections 3 and 4. In section 5, computing methods for granulation and for lower and upper approximations are proposed with the MapReduce model of GAE. Finally, section 6 concludes this paper.

II. GRANULAR COMPUTING MODEL AND ROUGH SET THEORY

GrC is a computing paradigm and conceptual theory of information processing. The main content of GrC is granulating the universe of a problem and gaining basic
Fig. 1. The architecture of Google file system.
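Fig. 1 depicts the Google File System [6], in which a single master keeps file-to-chunk metadata while fixed-size, replicated chunks (64 MB chunks with three replicas by default in GFS) are held by chunkservers. The toy sketch below illustrates only this placement idea; the class and server names are hypothetical, not GAE's actual implementation, and the chunk size is shrunk for demonstration:

```python
# Toy sketch of the GFS idea in Fig. 1 (illustrative only): a master maps each
# file to chunk handles, and each chunk is replicated on several chunkservers.
CHUNK_SIZE = 64  # bytes here for the toy; real GFS uses 64 MB chunks
REPLICAS = 3     # GFS replicates each chunk three times by default

class ToyMaster:
    def __init__(self, n_servers):
        self.servers = [f"chunkserver-{i}" for i in range(n_servers)]
        self.files = {}       # filename -> list of chunk handles
        self.locations = {}   # chunk handle -> list of replica servers
        self._next = 0

    def create(self, name, size):
        """Split a file of `size` bytes into chunks; place replicas round-robin."""
        handles = []
        for i in range((size + CHUNK_SIZE - 1) // CHUNK_SIZE):
            handle = f"{name}#chunk{i}"
            replicas = [self.servers[(self._next + r) % len(self.servers)]
                        for r in range(REPLICAS)]
            self._next += 1
            self.locations[handle] = replicas
            handles.append(handle)
        self.files[name] = handles
        return handles

master = ToyMaster(n_servers=5)
chunks = master.create("dataset.txt", size=150)  # 150 bytes -> 3 toy chunks
print(len(chunks), master.locations[chunks[0]])
# -> 3 ['chunkserver-0', 'chunkserver-1', 'chunkserver-2']
```

The point of the sketch is the "divide and conquer" structure the paper analyzes: the master holds only metadata, while the data themselves are granulated into chunks spread across many cheap machines.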
Fig. 3. A MapReduce programming example analyzed with GrC theory.

V. INSTANCE ANALYSIS

Recently, the data explosion is emerging and has become one of the hottest topics in the computing realm. Traditional computing methods face huge challenges from this data explosion. Cloud computing, with its highly available and scalable computing ability and its distributed storage structure, has become the most prominent approach. We now present how GAE deals with massive data mining, using the computation of rough set approximations as a simple example.

A decision table S = (U, A, V, f) is given in Table II, where C1, C2 are the condition attributes and D is the decision attribute.

TABLE II. DECISION TABLE

Object  C1  C2  D
x1      c1  c2  d1
x2      c2  c3  d2
x3      c3  c1  d2
x4      c2  c3  d2
x5      c1  c2  d1
x6      c3  c2  d2
x7      c3  c1  d1
x8      c1  c1  d2

Assume that there is a tolerance relation on the objects, where TC1 = {{c1,c2}, {c2,c3}}, short for TC1 = {{c1,c1}, {c2,c2}, {c3,c3}, {c1,c2}, {c2,c1}, {c2,c3}, {c3,c2}}, TC2 = {{c1}, {c2}, {c3}} and TD = {{d1}, {d2}}. Obviously, TC2 and TD satisfy an equivalence relation and can be regarded as a special tolerance relation.

A. Information Granulation

Information granulation is a significant part of GrC issues, and it can be hard with massive data. Neighborhood construction is a granulation procedure in rough set theory.

During the past decades, many rough-set-based algorithms have been proposed. However, the ever-growing data make them a challenging task. A previous study has proved the commendable behavior of MapReduce jobs in the computation of equivalence classes and rough set approximations [16]. In this instance, we try to extend the equivalence relation to a tolerance relation.

Firstly, we propose a parallel MapReduce method to construct a neighborhood for each object on the universe U.

Algorithm 1. MapReduce function for neighborhood construction.

Map function input:
key: document name
value: Si = (Ui, C∪D, V, f)
Output:
<key', value'> pairs, where key' is the information set of an object belonging to the condition attribute set C and value' denotes the object ID.

begin
  for each xi ∈ Ui do
    let value' = ID of xi
    let key' = ∅
    for each tempc1 in g(fC1(xi)) do
      for each tempc2 in g(fC2(xi)) do
        let key' = <tempc1, tempc2>
        output.collect(key', value')
      end
    end
  end
end

fCj(xi) denotes the jth condition attribute value of object xi; g(fCj(xi)) gets the value set within condition attribute Cj which has a tolerance relation with fCj(xi).

Reduce function input:
key: information set of the object with respect to the condition attribute set C
value: object ID
Output:
<key', value'> pairs, where key' is the information set of the object with respect to the condition attribute set C and value' denotes the set of object IDs.

begin
  let key' = key
  let value' = ∅
  for each ID do
    let value' = value' ∪ ID
  end
  output.collect(key', value')
end

In the Map function, objects are partitioned into neighborhoods within different splits. In the Reduce function, neighborhoods with the same information set are combined. The results are as follows:

C1' = <c1, c1>, C1'' = {x8};
C2' = <c1, c2>, C2'' = {x1, x5};
C3' = <c1, c3>, C3'' = {x2, x4};
C4' = <c2, c1>, C4'' = {x3, x7, x8};
C5' = <c2, c2>, C5'' = {x1, x5, x6};
C6' = <c2, c3>, C6'' = {x2, x4};
C7' = <c3, c1>, C7'' = {x3, x7};
C8' = <c3, c2>, C8'' = {x6};
C9' = <c3, c3>, C9'' = ∅.

Each line in the results can be viewed as an information granule, where Ci' is the label of the granule and Ci'' is the collection of objects contained in it. The neighborhood of an object xi based on the condition attribute set C can be found using the results above. For example, the value of the condition attributes of object x1 is <c1, c2>, which equals C2', so N_C(x1) = C2'' = {x1, x5}.

In this way, massive data are sorted and gathered for further analysis and processing. At the same time, the neighborhood construction method benefits from the MapReduce model and gains high speedup, scaleup and sizeup ability.

Next, in order to compute the approximation of each granule on the decision attribute, the universe U is granulated on D. A similar parallel process can be implemented as in Algorithm 1, using the decision attribute D in place of the condition attribute set C. Thus, we get the following results:

D1' = <d1>, D1'' = {x1, x5, x7};
D2' = <d2>, D2'' = {x2, x3, x4, x6, x8}.

B. Rough Set Approximations Computation

In subsection A we obtained the neighborhoods on both the condition attributes and the decision attribute. This subsection gives the detailed process of computing rough set approximations using the MapReduce model of GAE.

Firstly, a parallel algorithm is conducted to construct associations between neighborhoods on the condition attribute set and the decision attribute. The detailed algorithm is shown as follows:

Algorithm 2. MapReduce function for association construction.

Map function input:
key: document name
value: Si = (Ui, C∪D, V, f)
Output:
<key', value'> pairs, where key' is the information of the attribute set and value' is ∅.

begin
  for each xi ∈ Ui do
    let key' = fC1(xi) + fC2(xi) + fD(xi)
    let value' = ∅
    output.collect(key', value')
  end
end

Reduce function input:
key: information of the attribute set
value: ∅
Output:
<key', value'> pairs, where key' is the associated condition and decision granules and value' = True.

begin
  let key' = h(key) + g(key)
  let value' = True
  output.collect(key', value')
end

h(key) gets the condition granule tag and g(key) gets the decision granule tag according to the content of the key.

We use Ass(Ci', Dj') = True, which means Ci' and Dj' are associated, to denote each output line of the Reduce function. Thus, the final result is Ass(C2', D1') = True, Ass(C6', D2') = True, Ass(C7', D2') = True, Ass(C8', D2') = True, Ass(C7', D1') = True, Ass(C1', D2') = True.

In this way, the associations between condition and decision granules are confirmed, and the rough set approximation computation is transformed from a problem over massive data to one over a smaller set of neighborhood tags. Next we implement a serial algorithm to compute the approximations of each decision granule.

With massive data, if we compute the approximations directly, memory may overflow, since the neighborhoods based on the condition attributes and the decision attribute both contain too many objects. Hence, we use a special algorithm for the computation of the upper and lower approximations of each decision granule.

Algorithm 3. A serial algorithm to compute the approximations of each decision granule according to the association tags.
Input:
Condition and decision granule tags and the associations between them.
Output:
$\underline{apr}_R(D_i'')$, $\overline{apr}_R(D_i'')$.

begin
  for each Di'' do
    let $\overline{apr}_R(D_i'') = \emptyset$
    for each Ck' do
      if Ass(Ck', Di') = True then
        $\overline{apr}_R(D_i'') = \overline{apr}_R(D_i'') \cup C_k''$
      end
    end
    output($\overline{apr}_R(D_i'')$)
  end
  for each Ck' do
    if Ass(Ck', D1') = True and Ass(Ck', D2') = True then
      set Ck'.bool = False
    end
  end
  for each Di'' do
    let $\underline{apr}_R(D_i'') = \emptyset$
    for each Ck' do
      if Ass(Ck', Di') = True and Ck'.bool ≠ False then
        $\underline{apr}_R(D_i'') = \underline{apr}_R(D_i'') \cup C_k''$
      end
    end
    output($\underline{apr}_R(D_i'')$)
  end
end

We now explain the detailed process of computing the upper and lower approximations for D1'' and D2''. When we compute the upper approximation of D1'', we have the associations Ass(C2', D1') = True and Ass(C7', D1') = True. The object sets of C2'' and C7'' are combined into the upper approximation set, so we gain the upper approximation $\overline{apr}_R(D_1'') = \{x_1, x_3, x_5, x_7\}$ for D1''. Similarly for D2'', we have the associations Ass(C6', D2') = True, Ass(C7', D2') = True, Ass(C8', D2') = True and Ass(C1', D2') = True, and finally obtain the upper approximation set $\overline{apr}_R(D_2'') = \{x_2, x_3, x_4, x_6, x_7, x_8\}$. For the lower approximation set of D1'', while scanning each Ck', the boolean value of C7' is set to False. In this way, two associations exist with D1'', one of which has a False boolean value. Thus, we only add C2'' to the lower approximation and get the final result $\underline{apr}_R(D_1'') = \{x_1, x_5\}$. Using the same method we get the lower approximation set of D2'': $\underline{apr}_R(D_2'') = \{x_2, x_4, x_6, x_8\}$.

VI. CONCLUSION

Cloud computing has emerged as a popular method for dealing with massive data mining. With the information explosion from the internet, traditional computing methods can no longer satisfy people's needs. GAE, as a forerunner of cloud computing, has strong abilities for facing the distributed storage and computing challenges. GrC is a method which simulates human thinking. It simplifies complicated problems using "divide and conquer" methods and has been used in many academic fields. When GrC meets cloud computing, we believe they will both make great progress.

ACKNOWLEDGMENT

This paper is supported by the Natural Science Foundation of Liaoning Province of China (No. 20130200029) and the National Students' Innovation and Entrepreneurship Training Program of China (No. 201210141014).

REFERENCES

[1] M. Malawski, M. Kuźniar, P. Wójcik, and M. Bubak, "How to use Google App Engine for free computing", IEEE Internet Computing, vol. 17, 2013, pp. 50-59.
[2] V. Mauch, M. Kunze, and M. Hillenbrand, "High performance cloud computing", Future Generation Computer Systems, vol. 29, August 2013, pp. 1408-1416.
[3] G. Lawton, "Developing software online with platform-as-a-service technology", Computer, 2008, pp. 3-15.
[4] K. Dowd, C. R. Severance, and M. K. Loukides, High Performance Computing, vol. 2, Sebastopol, CA: O'Reilly & Associates, 1998.
[5] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, et al., "Bigtable: a distributed storage system for structured data", ACM Transactions on Computer Systems, 2008.
[6] S. Ghemawat, H. Gobioff, and S. T. Leung, "The Google file system", ACM SIGOPS Operating Systems Review, vol. 37, ACM, 2003.
[7] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters", Communications of the ACM, 2008, pp. 107-113.
[8] J. Ekanayake, S. Pallickara, and G. Fox, "MapReduce for data intensive scientific analyses", IEEE Fourth International Conference on e-Science, 2008, pp. 7-12.
[9] T. Y. Lin, "Granular computing: practices, theories, and future directions", Computational Complexity, Springer, New York, 2012, pp. 1404-1420.
[10] L. A. Zadeh, "Fuzzy sets and information granularity", Advances in Fuzzy Set Theory and Applications, Amsterdam: North-Holland Publishing, 1979.
[11] J. R. Hobbs, "Granularity", Proceedings of the Ninth International Joint Conference on Artificial Intelligence, 1985.
[12] T. Y. Lin, "Neighborhood systems and relational databases", Proceedings of the 1988 ACM Sixteenth Annual Conference on Computer Science, 1988.
[13] Z. Z. Shi, Z. Zheng, and Z. Q. Meng, "Image segmentation-oriented tolerance granular computing model", IEEE International Conference on Granular Computing, 2008.
[14] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, 1991.
[15] Y. Y. Yao and T. Y. Lin, "Generalization of rough sets using modal logics", Intelligent Automation and Soft Computing, 1996, pp. 103-120.
[16] J. Zhang, T. Li, D. Ruan, et al., "A parallel method for computing rough set approximations", Information Sciences, 2012, pp. 209-223.
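The worked example of Section V can be cross-checked end to end. The sketch below is an illustrative serial simulation of Algorithms 1-3 in Python (the names TABLE, neighborhoods and ass are our own, and it stands in for, rather than reproduces, the paper's GAE/MapReduce jobs); it rebuilds the tolerance neighborhoods from Table II and reproduces the approximation sets computed above:

```python
from itertools import product

# Decision table from TABLE II: object -> (C1 value, C2 value, D value).
TABLE = {
    "x1": ("c1", "c2", "d1"), "x2": ("c2", "c3", "d2"),
    "x3": ("c3", "c1", "d2"), "x4": ("c2", "c3", "d2"),
    "x5": ("c1", "c2", "d1"), "x6": ("c3", "c2", "d2"),
    "x7": ("c3", "c1", "d1"), "x8": ("c1", "c1", "d2"),
}

# Tolerance classes: on C1, c1~c2 and c2~c3 (TC1); C2 uses plain equality (TC2).
G_C1 = {"c1": {"c1", "c2"}, "c2": {"c1", "c2", "c3"}, "c3": {"c2", "c3"}}

def g_c2(v):
    return {v}

# Algorithm 1 (Map + Reduce in one pass): each object emits every key tolerant
# with its information set; grouping the emissions yields the granules Ci''.
neighborhoods = {}
for obj, (a, b, _) in TABLE.items():
    for key in product(G_C1[a], g_c2(b)):
        neighborhoods.setdefault(key, set()).add(obj)

# Neighborhood of x1: its information set <c1, c2> labels granule C2'.
assert neighborhoods[("c1", "c2")] == {"x1", "x5"}

# Algorithm 2: associate each object's own condition tag with its decision tag.
ass = {((a, b), d) for (a, b, d) in TABLE.values()}

# Algorithm 3: the upper approximation unions every associated granule; the
# lower approximation skips tags associated with both decision granules.
def upper(d):
    return set().union(*(neighborhoods[c] for c, dd in ass if dd == d))

def lower(d):
    other = "d1" if d == "d2" else "d2"
    return set().union(*(neighborhoods[c] for c, dd in ass
                         if dd == d and (c, other) not in ass))

# All four approximation sets match the results derived in Section V.
assert upper("d1") == {"x1", "x3", "x5", "x7"}
assert upper("d2") == {"x2", "x3", "x4", "x6", "x7", "x8"}
assert lower("d1") == {"x1", "x5"}
assert lower("d2") == {"x2", "x4", "x6", "x8"}
print("Section V results reproduced")
```

In a real GAE deployment the two loops over TABLE would run as distributed Map tasks over splits of the universe, with the grouping performed by the MapReduce shuffle rather than a dictionary.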