Egpv Data Mining
Tao Li
University of Lu Dong
University of Lu Dong
wang_yilei2000@.com
Litao_@sina.com.cn
I. INTRODUCTION
Generally, data mining (sometimes called data or
knowledge discovery) is the process of analyzing data from
different perspectives and summarizing it into useful
information that can be used to increase revenue, cut costs,
or both. It allows users to analyze data from many different
dimensions or angles, categorize it, and summarize the
relationships identified. Technically, data mining is the
process of finding correlations or patterns among dozens of
fields in large relational databases. However, pure data
mining may raise problems [1] that cannot be solved within
the scope of traditional mathematics. This paper therefore
introduces the fuzzy set as a solution to these problems.
II. THE CONCEPT OF E-GOVERNMENT
1-4244-1092-4/07/$25.00 2007IEEE.
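Let $r_{ij}$ denote the degree of similarity between $x_i$ and
$x_j$, and let $\tilde{R} = (r_{ij})_{n \times n}$, where
$r_{ij} = r_{ji}$ and $r_{ii} = 1$ ($i, j = 1, 2, \dots, n$):
\[
r_{ij} =
\begin{cases}
1, & i = j \\
\dfrac{1}{M} \sum_{k=1}^{m} x_{ik} x_{jk}, & i \neq j
\end{cases}
\]
where $M > 0$ and $M \geq \max_{i \neq j} \sum_{k=1}^{m} x_{ik} x_{jk}$.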
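The scalar-product rule for the fuzzy similarity matrix ($r_{ij} = 1$ on the diagonal, $\frac{1}{M}\sum_k x_{ik} x_{jk}$ elsewhere) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `dot_product_similarity` is hypothetical, $M$ defaults to the smallest admissible value (the largest off-diagonal product), and the five input rows reuse the department, economy, and vocation scores of the sample table purely as example input.

```python
def dot_product_similarity(samples, M=None):
    """Build a fuzzy similarity matrix with the scalar-product rule.

    r_ij = 1 when i == j, otherwise (1 / M) * sum_k x_ik * x_jk.
    If M is omitted, the largest off-diagonal scalar product is used,
    which is the smallest value that keeps every r_ij within [0, 1]
    for non-negative inputs (an assumption of this sketch).
    """
    n = len(samples)
    dots = [[sum(a * b for a, b in zip(samples[i], samples[j]))
             for j in range(n)] for i in range(n)]
    largest = max(dots[i][j] for i in range(n) for j in range(n) if i != j)
    if M is None:
        M = largest
    if M <= 0 or M < largest:
        raise ValueError("M must be positive and >= the largest off-diagonal product")
    return [[1.0 if i == j else dots[i][j] / M for j in range(n)]
            for i in range(n)]


# Illustrative input: (department, economy, vocation) for IDs 001 to 005.
example_samples = [
    [5, 3, 2],
    [3, 4, 5],
    [5, 2, 3],
    [5, 3, 1],
    [4, 5, 3],
]
similarity = dot_product_similarity(example_samples)
for row in similarity:
    print(" ".join(f"{value:.2f}" for value in row))
```

Choosing $M$ as the largest off-diagonal product maps the most similar pair to exactly 1; any larger $M$ is also admissible and simply scales all off-diagonal entries down.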
C. Cluster analysis
Cluster analysis can be carried out by three methods: the
equivalence closure method, the maximum tree method, and the
netting method. The most commonly used is the maximum tree
method. It is applied when n is very large, where the
workload would otherwise grow exponentially: it clusters
directly on the fuzzy similarity matrix. The concrete steps
are as follows:
(1) Take the objects to be classified as the vertices; when ...
(2) Let $r_{ij}$ ...
ID     department   economy   vocation
001    5            3         2
002    3            4         5
003    5            2         3
004    5            3         1
005    4            5         3
(5) Let $\lambda \in [0,1]$ and cut every edge whose weight is
smaller than $\lambda$; the objects that remain connected
belong to the same class at level $\lambda$.
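The maximum tree procedure, ending with the cut at level $\lambda$, can be sketched as follows. This is a minimal sketch under assumptions rather than the authors' implementation: the tree is built with Kruskal's algorithm on edges sorted by decreasing similarity, the helper names (`maximum_tree`, `clusters_at_level`) are hypothetical, and the 5 x 5 matrix `example_R` is made-up illustrative data.

```python
def _find(parent, v):
    """Return the representative of v in a union-find forest."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]  # path halving
        v = parent[v]
    return v


def maximum_tree(R):
    """Return the edges (weight, i, j) of a maximum spanning tree of R."""
    n = len(R)
    parent = list(range(n))
    # Consider the heaviest (most similar) edges first.
    edges = sorted(((R[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    tree = []
    for weight, i, j in edges:
        root_i, root_j = _find(parent, i), _find(parent, j)
        if root_i != root_j:  # the edge joins two separate components
            parent[root_i] = root_j
            tree.append((weight, i, j))
    return tree


def clusters_at_level(R, lam):
    """Cut tree edges lighter than lam and return the connected groups."""
    n = len(R)
    parent = list(range(n))
    for weight, i, j in maximum_tree(R):
        if weight >= lam:  # keep only edges that survive the cut
            parent[_find(parent, i)] = _find(parent, j)
    groups = {}
    for v in range(n):
        groups.setdefault(_find(parent, v), []).append(v)
    return sorted(groups.values())


# Hypothetical symmetric fuzzy similarity matrix (not data from the paper).
example_R = [
    [1.0, 0.4, 0.8, 0.5, 0.5],
    [0.4, 1.0, 0.4, 0.4, 0.4],
    [0.8, 0.4, 1.0, 0.5, 0.5],
    [0.5, 0.4, 0.5, 1.0, 0.6],
    [0.5, 0.4, 0.5, 0.6, 1.0],
]
for level in (0.8, 0.6, 0.5, 0.4):
    print(level, clusters_at_level(example_R, level))
```

Lowering $\lambda$ merges classes step by step, so running the cut at several levels yields the full hierarchy from a single tree.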
Time (IDs 001 to 005): 5, 2, 5, 1, 2
D. Forecast
For every mode obtained during cluster analysis, compute the
average index of the mode according to the following formula:
\[
\mathrm{Mode}_{ij} = \frac{1}{p} \sum_{k=1}^{p} x^{(i)}_{kj},
\qquad i = 1, 2, \dots, s; \; j = 1, 2, \dots, m
\]
Here $s$ is the total number of modes, $k$ indexes the records
of mode $i$ stored in the data warehouse, $p$ is the total
number of records of this mode, and $x^{(i)}_{kj}$ is the
$j$-th index of the $k$-th such record.
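As a small illustration of the averaging step, the sketch below computes one average profile per mode. The function name `mode_profile` and the assignment of records to `mode_1` and `mode_2` are hypothetical; they are not the clustering result of the paper.

```python
def mode_profile(records):
    """Average each index j over the p records that belong to one mode."""
    p = len(records)
    m = len(records[0])
    return [sum(record[j] for record in records) / p for j in range(m)]


# Hypothetical grouping of records into two modes (illustrative only).
records_by_mode = {
    "mode_1": [[5, 3, 2], [5, 2, 3], [5, 3, 1]],
    "mode_2": [[3, 4, 5], [4, 5, 3]],
}
profiles = {name: mode_profile(records)
            for name, records in records_by_mode.items()}
for name, profile in profiles.items():
    print(name, profile)
```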
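The sample to be predicted, $Y = (y_1, y_2, \dots, y_n)$, is a
fuzzy subset of the universe of discourse $X$ of samples.
Compare it with each mode classified in the data warehouse
and compute their degree of closeness:
\[
N(Y, \mathrm{Mode}_i) = \frac{1}{2} \left[ Y \circ \mathrm{Mode}_i
+ \left( 1 - Y \odot \mathrm{Mode}_i \right) \right]
\]
where $\circ$ denotes the inner product
$\bigvee_j (y_j \wedge \mathrm{Mode}_{ij})$ and $\odot$ the outer
product $\bigwedge_j (y_j \vee \mathrm{Mode}_{ij})$ of two fuzzy sets.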
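The closeness comparison can be sketched as below, assuming the formula is the usual lattice closeness degree, built from the inner product (maximum of pairwise minima) and the outer product (minimum of pairwise maxima). The function names, the membership values in [0, 1], and the mode labels are all hypothetical; how raw scores are scaled into memberships is left open here.

```python
def inner_product(a, b):
    """Fuzzy inner product: the maximum over j of min(a_j, b_j)."""
    return max(min(x, y) for x, y in zip(a, b))


def outer_product(a, b):
    """Fuzzy outer product: the minimum over j of max(a_j, b_j)."""
    return min(max(x, y) for x, y in zip(a, b))


def closeness(a, b):
    """Lattice closeness degree: 0.5 * [a . b + (1 - a (.) b)]."""
    return 0.5 * (inner_product(a, b) + (1.0 - outer_product(a, b)))


def nearest_mode(sample, modes):
    """Return the name of the mode with the highest closeness to the sample."""
    return max(modes, key=lambda name: closeness(sample, modes[name]))


# Hypothetical membership vectors (illustrative only).
example_sample = [0.8, 0.4, 0.6]
example_modes = {
    "mode_A": [1.0, 0.5, 0.4],
    "mode_B": [0.6, 0.9, 0.8],
}
for name, profile in example_modes.items():
    print(name, round(closeness(example_sample, profile), 3))
print("predicted mode:", nearest_mode(example_sample, example_modes))
```

The sample is assigned to the mode with the largest closeness degree, which is one common way to turn the comparison into a forecast.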
CONCLUSION
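\[
R =
\begin{pmatrix}
\vdots & \vdots & \vdots & \vdots & \vdots \\
0.8 & 0.4 & 1 & 0.5 & 0.5 \\
0.5 & 0.4 & 0.5 & 1 & 0.6 \\
\vdots & \vdots & \vdots & \vdots & \vdots
\end{pmatrix}
\]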