Distributed DBMS
Applications
We target in-memory DBMSs, analytics, business intelligence, data mining, data
warehousing, and other read-intensive applications with discrete structured data, as well
as wireless devices, mobile apps, and infrastructures constrained by bandwidth, latency,
memory, or power.
Software Implementation
Typically, data structures are compressed and encrypted individually and then used in that
form. Decompression is encapsulated at the lowest software levels and invisible above
them, establishing a distinct boundary above which the host system remains unchanged.
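The encapsulation boundary described above can be pictured with a short sketch. This is our own illustration, not Xtreme Compression's implementation; zlib stands in for the proprietary codec, and the point is the boundary, not the compression method.

```python
import zlib

class CompressedField:
    """Illustrative sketch: a value stored in compressed form and
    decompressed transparently on access. Code above this class never
    sees the compressed representation (encryption omitted for brevity)."""

    def __init__(self, plaintext: bytes):
        # Stored form: compressed individually, as the text describes.
        self._stored = zlib.compress(plaintext)

    @property
    def value(self) -> bytes:
        # Decompression is encapsulated here, at the lowest level.
        return zlib.decompress(self._stored)

field = CompressedField(b"customer record payload")
assert field.value == b"customer record payload"
```

Because callers interact only with `value`, everything above this level remains unchanged when the storage format changes.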
Benefits
Xtreme Compression's proprietary technology reduces storage costs and speeds data
transfer and program execution. Here is how:
By concentrating more information into every physical word read from memory and
processed by the CPU, reducing the influence of the von Neumann bottleneck
In SAN/NAS-type applications, by reducing spindle count to save power
By reducing transmission time over links, networks, and other expensive-to-scale
infrastructure components
By storing more information in internal caches to improve data locality and cache
hit rates
By increasing effective disk cache size to retrieve more real information from hard
disk per read operation
By allowing storage on faster media (e.g., RAM vs. hard disk, hard disk vs. CD or DVD,
local hard disk vs. network)
By permitting more access data structures and paths for a given amount of storage
Repopulation
Repopulation is a structural method for compressing monotonic integer sequences in hash
tables and similar data structures. It populates table locations that would otherwise be
unused with subsequences that would otherwise occupy memory.
Unlike almost every other lossless compression method, repopulation is not a replacement
scheme. Instead, it is transpositional and mechanistic, working like a chess-playing
automaton; it draws on no information-theoretic concepts. Repopulation simultaneously
achieves the access speed of a low load factor and the table compactness of a high one,
thus avoiding that historical compromise.
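One way to picture the structural idea, without claiming anything about the proprietary algorithm itself, is a toy in which a monotonic integer sequence is stored in the empty slots of a sparsely loaded hash table, so the sequence costs no memory beyond the table already allocated:

```python
def repopulate(table, sequence):
    """Toy illustration only (not Xtreme Compression's method): place a
    monotonic integer sequence into otherwise-unused table locations,
    returning the slot indices needed to read it back."""
    empty = [i for i, v in enumerate(table) if v is None]
    assert len(empty) >= len(sequence), "not enough free slots"
    placed = []
    for slot, value in zip(empty, sequence):
        table[slot] = value        # slot would otherwise sit empty
        placed.append(slot)
    return placed

table = [7, None, 42, None, None, 9]      # hash table at a low load factor
slots = repopulate(table, [100, 200, 300])
assert [table[s] for s in slots] == [100, 200, 300]
```

The toy shows the trade the text describes: the table keeps its low-load-factor layout (and access speed) while the unused space carries real data.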
Superpopulation
Superpopulation is a variable-to-variable-length algorithm that compresses index tables,
lists, arrays, zerotrees, and similar data. It systematically accommodates wide local
variations in data statistics. Superpopulation may be used by itself or in conjunction with
repopulation.
Superpopulation recognizes that the distributions of values in access data structures are
often far from random, with areas of high and low correlation. It works by classifying each
such area as one of two distinct target types and applying a type-specific encoding
method to each.
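The classify-then-encode pattern can be sketched as follows. The two target types and both encodings here are our own stand-ins, chosen only to show the shape of the idea; the actual classification and codes are proprietary.

```python
def encode_blocks(values, block=8):
    """Hedged sketch of the two-target-type idea: split the data into
    local areas, classify each by its variation, and encode each class
    with a method suited to it."""
    out = []
    for i in range(0, len(values), block):
        area = values[i:i + block]
        if len(set(area)) == 1:                      # low local variation
            out.append(("run", area[0], len(area)))  # run-length encode
        else:                                        # high local variation
            deltas = [area[0]] + [b - a for a, b in zip(area, area[1:])]
            out.append(("delta", deltas))            # delta encode
    return out

def decode_blocks(encoded):
    values = []
    for tag, *payload in encoded:
        if tag == "run":
            value, count = payload
            values.extend([value] * count)
        else:
            acc = 0
            for d in payload[0]:
                acc += d
                values.append(acc)
    return values

data = [0] * 8 + [3, 7, 8, 15, 16, 20, 22, 40]
assert decode_blocks(encode_blocks(data)) == data
```

Each area gets the treatment its local statistics reward, which is the property the text attributes to superpopulation.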
Wordencoding
Wordencoding is a 0-order (context-independent) variable-to-variable-length algorithm
for compressing text strings in database table record fields. It achieves compression close
to the 0-order source entropy without sacrificing speed. It does that by efficiently
maximizing effective combined data locality over compressed record fields, lexicons
holding strings, and access data structures. Wordencoding deals explicitly with the
structure and statistics of the data by recognizing that redundancy in text strings exists at
multiple levels of granularity.
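A minimal sketch of a 0-order word-level scheme conveys the lexicon idea. This is our illustration under stated assumptions, not the wordencoding algorithm: words are ranked by frequency so common words receive short variable-length codes, and the lexicon is the shared structure that compressed fields reference.

```python
from collections import Counter

def build_lexicon(corpus_words):
    """Rank words by frequency (0-order: no context modeling), so the
    most common words get the smallest ranks and shortest codes."""
    ranked = [w for w, _ in Counter(corpus_words).most_common()]
    return {w: i for i, w in enumerate(ranked)}, ranked

def encode_word(lexicon, word):
    # Variable-length integer code: 7 payload bits per byte, high bit
    # set on every byte except the last (a standard varint layout).
    rank = lexicon[word]
    out = bytearray()
    while True:
        out.append((rank & 0x7F) | (0x80 if rank > 0x7F else 0))
        if rank <= 0x7F:
            return bytes(out)
        rank >>= 7

def decode_word(ranked, data):
    rank, shift = 0, 0
    for byte in data:
        rank |= (byte & 0x7F) << shift
        shift += 7
    return ranked[rank]

words = "the quick the lazy the dog the end".split()
lexicon, ranked = build_lexicon(words)
assert decode_word(ranked, encode_word(lexicon, "the")) == "the"
assert len(encode_word(lexicon, "the")) == 1  # most frequent word: 1 byte
```

Frequent words compress to a single byte while the lexicon is stored once, which is the locality trade-off the text describes across record fields, lexicons, and access structures.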