Data Compression and Aggregation
Data Aggregation
When looking at data-centric networking in isolation, all messages still have to be delivered to all sinks. The real power of concentrating on data lies in the ability to operate on the data while it is transported in the network. The simplest example of such in-network processing is aggregation of data – computing a smaller representation of a number of messages that is equivalent in content to all the individual messages, or at least represents them suitably – and only forwarding such aggregates through the network. Computing the mean or the maximum of the measured values of all sensors is a typical case in point.

The actual benefits of such aggregation depend on the location of the data sources relative to the data sink. Intuitively, when the data sources are spread out, the paths to the sinks do not intersect, and there is little if any opportunity to aggregate data at intermediate nodes. If, on the other hand, the data sources are all nearby – for example, when they all observe an event at a certain place – and they are located far away from the sink so that their paths to the sink merge early on, the expected benefits of aggregation are large (Figure 12.7). This is in fact often the case, and the intuition about the resulting benefits is confirmed by results.

The principal mechanics of data aggregation are thus relatively straightforward: data flows from sources to a sink along a tree. Intermediate nodes in the tree apply some form of aggregation function to the data they have collected from some or all of their children. This aggregated value, possibly along with additional administrative values (for example, the number of nodes that have contributed to a mean value), is then forwarded. Apart from the tree formulation, data aggregation can also be used in the context of gossiping data throughout the network.

The efficacy of data aggregation can be judged using different metrics:
Accuracy: The difference between the resulting value at the sink and the true value – relevant since not all data is delivered to the sink any longer.
Completeness: A more operational approximation of accuracy is completeness, the percentage of all readings that are included in the computation of the final aggregate at the sink.
Latency: Aggregation can also increase the latency of reporting, as intermediate nodes might have to wait for data from their children before forwarding an aggregate.
Message overhead: The main advantage of aggregation lies in the reduced message overhead, which should result in improved energy efficiency and network lifetime.
The computation becomes even simpler if only the sum s and the count c are exchanged between nodes; the update rule is then simply <s, c> = <s1 + s2, c1 + c2>. The actual average is then only computed at the ultimate sink.
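The tree-based mechanics and the <s, c> update rule described above can be sketched in a few lines of Python. The tree layout, node names, and sensor readings below are purely illustrative assumptions, not taken from the text: each node combines its own reading with the partial sums and counts reported by its children, and only the sink divides.

```python
# Sketch of in-network averaging along an aggregation tree.
# Assumptions (not from the text): tree given as parent -> children dict,
# one reading per node, sink "A" at the root.

def aggregate(tree, readings, node):
    """Return the partial aggregate <s, c> for the subtree rooted at `node`,
    where s is the sum of readings and c the number of contributing nodes."""
    s, c = readings[node], 1
    for child in tree.get(node, []):
        cs, cc = aggregate(tree, readings, child)
        # Update rule from the text: <s, c> = <s1 + s2, c1 + c2>
        s, c = s + cs, c + cc
    return s, c

# Hypothetical topology: sink A with children B and C; C has children D and E.
tree = {"A": ["B", "C"], "C": ["D", "E"]}
readings = {"A": 10, "B": 20, "C": 30, "D": 40, "E": 50}

s, c = aggregate(tree, readings, "A")
print(s, c, s / c)  # 150 5 30.0 -- the average is computed only at the sink
```

Each intermediate node forwards a single <s, c> pair instead of all readings from its subtree, which is exactly where the message-overhead savings come from.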