2011 Data Mining
Page(s): 5 - 21
Abstract
A variety of emerging online data delivery applications challenge existing techniques for data delivery
to human users, applications, or middleware that are accessing data from multiple autonomous servers.
In this paper, we develop a framework for formalizing and comparing pull-based solutions and present
dual optimization approaches. The first approach, most commonly used nowadays, maximizes user
utility under the strict setting of meeting a priori constraints on the usage of system resources. ...
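The first approach treats pull scheduling as "maximize utility subject to a fixed resource budget." As a rough illustration of that framing only, the sketch below uses a greedy utility-per-cost heuristic (a swapped-in technique, not the paper's method). The `Probe` record, its `utility` and `cost` fields, and the `schedule_probes` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """A candidate pull of one server; all fields are illustrative assumptions."""
    server: str
    utility: float  # hypothetical utility gained if this probe is performed
    cost: float     # hypothetical resource cost (e.g., bandwidth units), assumed > 0


def schedule_probes(candidates: list[Probe], budget: float) -> list[Probe]:
    """Greedily pick probes by utility per unit cost without exceeding the budget."""
    chosen: list[Probe] = []
    spent = 0.0
    # Consider the most cost-effective probes first.
    for probe in sorted(candidates, key=lambda p: p.utility / p.cost, reverse=True):
        # Accept a probe only if it still fits within the a priori budget.
        if spent + probe.cost <= budget:
            chosen.append(probe)
            spent += probe.cost
    return chosen


probes = [Probe("a", 10, 5), Probe("b", 6, 2), Probe("c", 4, 4)]
print([p.server for p in schedule_probes(probes, budget=7)])
```

With a budget of 7, probe "b" (ratio 3) and then "a" (ratio 2) fit exactly, and "c" is skipped. The resource constraint is never violated; the cost of that strictness is that utility may be left on the table, which is the trade-off the dual approaches in the abstract compare.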
Page(s): 22 - 36
Abstract
We consider the problem of data stream classification, where the data arrive in a conceptually infinite
stream, and the opportunity to examine each record is brief. We introduce a stream classification
algorithm that is online, running in amortized O(1) time, able to handle intermittent arrival of
labeled records, and able to adjust its parameters to respond to changing class boundaries (“concept
drift”) in the data stream. In addition, when blocks of labeled data are short...
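To make the properties listed above concrete (constant work per record, learning only when a label happens to arrive, and forgetting old data as class boundaries drift), here is a minimal sketch of a nearest-centroid classifier whose centroids are exponential moving averages. This is a generic, swapped-in technique for illustration, not the paper's algorithm; the class name and the `alpha` forgetting rate are assumptions.

```python
import math


class EmaCentroidClassifier:
    """Nearest-centroid classifier whose class centroids are exponential moving averages."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha  # hypothetical forgetting rate: larger values adapt (and forget) faster
        self.centroids: dict[str, list[float]] = {}

    def learn(self, x: list[float], label: str) -> None:
        """Fold one labeled record into its class centroid in O(d) time."""
        c = self.centroids.get(label)
        if c is None:
            self.centroids[label] = list(x)
        else:
            # Old records fade geometrically, so the centroid tracks drifting classes.
            self.centroids[label] = [(1 - self.alpha) * ci + self.alpha * xi for ci, xi in zip(c, x)]

    def predict(self, x: list[float]) -> str:
        """Return the label of the nearest centroid (O(k * d) for k known classes)."""
        return min(self.centroids, key=lambda lbl: math.dist(x, self.centroids[lbl]))


clf = EmaCentroidClassifier(alpha=0.2)
clf.learn([0.0, 0.0], "a")
clf.learn([10.0, 10.0], "b")
before = clf.predict([11.0, 11.0])
for _ in range(10):               # class "a" drifts toward (12, 12)
    clf.learn([12.0, 12.0], "a")
after = clf.predict([11.0, 11.0])
```

Because `predict` needs no label, unlabeled records can be classified whenever they arrive, and `learn` is called only when a labeled record shows up. Here the query point moves from class "b" to class "a" once the drifted records have been absorbed.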
3) Coupling Logical Analysis of Data and Shadow Clustering for Partially Defined Positive Boolean
Function Reconstruction
Page(s): 37 - 50
The problem of reconstructing the and-or expression of a partially defined positive Boolean function
(pdpBf) is solved by adopting a novel algorithm, denoted by LSC, which combines the advantages of two
efficient techniques, Logical Analysis of Data (LAD) and Shadow Clustering (SC). The kernel of the
approach followed by LAD consists in a breadth-first enumeration of all the prime implicants whose
degree is not greater than a fixed maximum d. In contrast, SC adopts an effective heuristic procedu...
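The LAD step described above, a breadth-first enumeration of prime implicants of degree at most d, can be sketched as follows for a positive function. A term is a set of variable indices (an AND of un-negated variables), and a term covers a point when all of its variables equal 1. The sketch makes two assumptions: an implicant must cover no false point, and it is kept only if it covers at least one true point. The function names are hypothetical.

```python
from itertools import combinations


def covers(term: frozenset[int], point: tuple[int, ...]) -> bool:
    """A positive term (AND of variables) covers a point if all its variables are 1."""
    return all(point[i] == 1 for i in term)


def prime_implicants(true_pts: list[tuple[int, ...]], false_pts: list[tuple[int, ...]],
                     n_vars: int, max_degree: int) -> list[frozenset[int]]:
    primes: list[frozenset[int]] = []
    for degree in range(1, max_degree + 1):  # breadth-first: all degree-1 terms, then degree 2, ...
        for combo in combinations(range(n_vars), degree):
            term = frozenset(combo)
            # Prime check: a term containing a shorter implicant is not prime.
            if any(p <= term for p in primes):
                continue
            # Not an implicant if it would evaluate to 1 on some false point.
            if any(covers(term, f) for f in false_pts):
                continue
            # Keep only terms that explain at least one true point (assumption).
            if any(covers(term, t) for t in true_pts):
                primes.append(term)
    return primes


T = [(1, 1, 0), (0, 1, 1)]
F = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
result = prime_implicants(T, F, n_vars=3, max_degree=3)
```

The subset check relies on the implicant property being monotone: any superset of an implicant also avoids every false point. Enumerating by increasing degree therefore guarantees that a shorter implicant is always found before its supersets. In this example every single variable hits a false point, so the primes are x0 x1 and x1 x2, and the and-or expression reconstructed from them is x0 x1 OR x1 x2.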
Page(s): 51 - 63
We study the following problem: A data distributor has given sensitive data to a set of supposedly
trusted agents (third parties). Some of the data are leaked and found in an unauthorized place (e.g., on
the web or somebody's laptop). The distributor must assess the likelihood that the leaked data came
from one or more agents, as opposed to having been independently gathered by other means. We
propose data allocation strategies (across the agents) that improve the probability of identifying leak...
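One simple way to reason about "the likelihood that the leaked data came from one or more agents" is sketched below under an assumed model that does not come from the abstract. In this model, each leaked object t was either obtained independently by the target with probability p, or else leaked by exactly one of the |V_t| agents holding it, each equally likely, with objects treated independently. An agent is then guilty with probability 1 - prod over its leaked objects t of (1 - (1 - p) / |V_t|). The function and parameter names are hypothetical.

```python
from math import prod


def guilt_probability(agent_objects: set[str], leaked: set[str],
                      holders: dict[str, int], p: float) -> float:
    """Probability that an agent leaked at least one leaked object (illustrative model).

    holders[t] is the number of agents that received object t; p is the assumed
    probability that the target obtained t on its own.
    """
    relevant = leaked & agent_objects
    # For each relevant object, the agent is innocent of it with prob 1 - (1 - p) / |V_t|.
    return 1 - prod(1 - (1 - p) / holders[t] for t in relevant)


holders = {"t1": 2, "t2": 1}       # t1 was given to two agents, t2 only to agent 1
leaked = {"t1", "t2"}
g1 = guilt_probability({"t1", "t2"}, leaked, holders, p=0.5)
g2 = guilt_probability({"t1"}, leaked, holders, p=0.5)
```

Under this model agent 1 scores 0.625 and agent 2 scores 0.25. Most of the difference comes from t2, which only agent 1 received. This shows why allocation matters: objects shared by fewer agents make a leak easier to attribute.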
Tsang, Smith; Kao, Ben; Yip, Kevin Y.; Ho, Wai-Shing; Lee, Sau Dan
Page(s): 64 - 78
Traditional decision tree classifiers work with data whose values are known and precise. We extend
such classifiers to handle data with uncertain information. Value uncertainty arises in many applications
during the data collection process. Example sources of uncertainty include measurement/quantization
errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item
is often represented not by one single value, but by multiple values forming a probability d...
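When a value is represented by several sample points with probabilities, a split test such as "value <= z" no longer sends a tuple wholly left or right. A common way to handle this is to split the tuple fractionally and score the split with probability-weighted class counts. The sketch below illustrates that idea and is not presented as the paper's exact algorithm. It assumes each tuple's probabilities sum to 1, and the names `UncertainTuple` and `split_entropy` are hypothetical.

```python
from collections import defaultdict
from math import log2

# Each uncertain tuple: (list of (value, probability) samples, class label).
UncertainTuple = tuple[list[tuple[float, float]], str]


def entropy(class_mass: dict[str, float]) -> float:
    """Entropy of a (possibly fractional) class distribution."""
    total = sum(class_mass.values())
    if total == 0:
        return 0.0
    return -sum(m / total * log2(m / total) for m in class_mass.values() if m > 0)


def split_entropy(data: list[UncertainTuple], z: float) -> float:
    """Weighted entropy of the split 'value <= z' when tuples are divided fractionally."""
    left: dict[str, float] = defaultdict(float)
    right: dict[str, float] = defaultdict(float)
    for samples, label in data:
        # Probability mass of this tuple that falls on the left side of the split.
        p_left = sum(prob for value, prob in samples if value <= z)
        left[label] += p_left
        right[label] += 1 - p_left
    n = len(data)
    return (sum(left.values()) / n) * entropy(left) + (sum(right.values()) / n) * entropy(right)


data: list[UncertainTuple] = [([(1.0, 0.5), (3.0, 0.5)], "A"), ([(4.0, 1.0)], "B")]
```

At z = 2.0, half of tuple A's mass lands on each side, so the right branch is mixed. At z = 3.5, the classes separate cleanly and the weighted entropy drops to 0. This is the kind of information that averaging each pdf to a single value would discard.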
Page(s): 79 - 94
Page(s): 95 - 109
Natural phenomena show that many creatures form large social groups and move in regular patterns.
However, previous work has focused on finding the movement patterns of either each single object or all objects. In
this paper, we first propose an efficient distributed mining algorithm to jointly identify a group of
moving objects and discover their movement patterns in wireless sensor networks. Afterward, we
propose a compression algorithm, called 2P2D, which exploits the obtained group movement patterns
to ...
Zhu, Xiaofeng; Zhang, Shichao; Jin, Zhi; Zhang, Zili; Xu, Zhuoming
Missing data imputation is a key issue in learning from incomplete data. Various techniques have been
developed with great success in dealing with missing values in data sets with homogeneous attributes
(their independent attributes are all either continuous or discrete). This paper studies a new setting of
missing data imputation, i.e., imputing missing data in data sets with heterogeneous attributes (their
independent attributes are of different types), referred to as imputing mixed-attribut...
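For contrast with the mixed-attribute setting described above, the simplest baseline treats each attribute type separately: it fills continuous attributes with the column mean and discrete attributes with the column mode. The sketch below is that baseline only, not the paper's method. It ignores dependencies between attributes, which is exactly what more careful mixed-attribute imputation tries to exploit. It assumes missing cells are `None` and that every column has at least one observed value; `impute_mixed` is a hypothetical name.

```python
from collections import Counter
from statistics import mean


def impute_mixed(rows: list[list], continuous: set[int]) -> list[list]:
    """Fill None cells: column mean for continuous attributes, column mode otherwise."""
    n_cols = len(rows[0])
    fills = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        if j in continuous:
            fills.append(mean(observed))                          # continuous: mean
        else:
            fills.append(Counter(observed).most_common(1)[0][0])  # discrete: mode
    # Replace only the missing cells; observed values are kept unchanged.
    return [[fills[j] if v is None else v for j, v in enumerate(r)] for r in rows]


rows = [[1.0, "red"], [None, "blue"], [3.0, None], [2.0, "blue"]]
filled = impute_mixed(rows, continuous={0})
```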
Abstract
We address issues related to the protection of private information in Online Analytical Processing
(OLAP) systems, where a major privacy concern is the adversarial inference of private information
from OLAP query answers. Most previous work on privacy-preserving OLAP focuses on a single
aggregate function and/or addresses only exact disclosure, which eliminates from consideration an
important class of privacy breaches where partial information, but not exact values, of private data is
disclosed ...
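The distinction between exact and partial disclosure can be seen in a toy example with made-up data. Two SUM answers that differ by a single cell reveal that cell exactly. A SUM and a MAX over the same pair of cells reveal no single value, but they pin both cells into a narrow interval. All names and values below are hypothetical.

```python
# Hypothetical cube cells: private salary of each employee in one department.
salaries = {"alice": 72_000, "bob": 65_000, "carol": 58_000}


def sum_query(names: set[str]) -> int:
    """An aggregate the OLAP system answers: SUM over a subset of cells."""
    return sum(salaries[n] for n in names)


def max_query(names: set[str]) -> int:
    """An aggregate the OLAP system answers: MAX over a subset of cells."""
    return max(salaries[n] for n in names)


# Exact disclosure: two SUMs whose query sets differ only by carol.
carol_exact = sum_query({"alice", "bob", "carol"}) - sum_query({"alice", "bob"})

# Partial disclosure: for two cells x + y = S and max(x, y) = M imply min(x, y) = S - M,
# so each of bob and carol lies in [S - M, M] without revealing which value is whose.
pair = {"bob", "carol"}
lower, upper = sum_query(pair) - max_query(pair), max_query(pair)
```

A defense that only blocks exact disclosure would allow the second pair of queries, even though it narrows both private values to a 7,000-wide range.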