Data Compression and Aggregation
Data Aggregation
When looking at data-centric networking in isolation, all messages still have to be delivered to all sinks. The real power of concentrating on data lies in the ability to operate on the data while it is transported in the network. The simplest example of such in-network processing is aggregation of data – computing a smaller representation of a number of messages that is equivalent in content to all the individual messages, or at least represents them suitably – and only forwarding such aggregates through the network. Computing the mean or the maximum of the measured values of all sensors is a typical case in point.

The actual benefits of such aggregation depend on the location of the data sources relative to the data sink. Intuitively, when the data sources are spread out, the paths to the sinks do not intersect, and there is little if any opportunity to aggregate data at intermediate nodes. If, on the other hand, the data sources are all nearby – for example, when they all observe an event at a certain place – and they are located far away from the sink so that their paths to the sink merge early on, the expected benefits of aggregation are large (Figure 12.7). This is in fact often the case, and the intuition about the resulting benefits is confirmed by results.

The principal mechanics of data aggregation are thus relatively straightforward: data flows from sources to a sink along a tree. Intermediate nodes in the tree apply some form of aggregation function to the data they have collected from some or all of their children. This aggregated value, possibly along with additional administrative values (for example, the number of nodes that have contributed to a mean value), is then forwarded. Apart from the tree formulation, data aggregation can also be used in the context of gossiping data throughout the network.

The efficacy of data aggregation can be judged using different metrics:
Accuracy: The difference between the resulting value at the sink and the true value – relevant since not all data is delivered to the sink any longer.
Completeness: A more operational approximation of accuracy is completeness, the percentage of all readings that are included in the computation of the final aggregate at the sink.
Latency: Aggregation can also increase the latency of reporting, as intermediate nodes might have to wait for data from their children before forwarding an aggregate.
Message overhead: The main advantage of aggregation lies in the reduced message overhead, which should result in improved energy efficiency and network lifetime.
The computation becomes even simpler if only the sum s and the count c are exchanged between nodes; the update rule is then simply <s, c> = <s1 + s2, c1 + c2>. The actual average is then only computed at the ultimate sink.
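The tree-based mechanics and the <s, c> update rule described above can be sketched in a few lines of Python. The tree layout, node names, and sensor readings below are purely illustrative assumptions, not taken from the text: each node combines its own reading with the partial sums and counts reported by its children, and only the sink divides.

```python
# Sketch of in-network averaging along an aggregation tree.
# Assumptions (not from the text): tree given as parent -> children dict,
# one reading per node, sink "A" at the root.

def aggregate(tree, readings, node):
    """Return the partial aggregate <s, c> for the subtree rooted at `node`,
    where s is the sum of readings and c the number of contributing nodes."""
    s, c = readings[node], 1
    for child in tree.get(node, []):
        cs, cc = aggregate(tree, readings, child)
        # Update rule from the text: <s, c> = <s1 + s2, c1 + c2>
        s, c = s + cs, c + cc
    return s, c

# Hypothetical topology: sink A with children B and C; C has children D and E.
tree = {"A": ["B", "C"], "C": ["D", "E"]}
readings = {"A": 10, "B": 20, "C": 30, "D": 40, "E": 50}

s, c = aggregate(tree, readings, "A")
print(s, c, s / c)  # 150 5 30.0 -- the average is computed only at the sink
```

Each intermediate node forwards a single <s, c> pair instead of all readings from its subtree, which is exactly where the message-overhead savings come from.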