
DISTRIBUTED FRACTIONALIZED DATA NETWORKS FOR DATA INTEGRITY


Arun Majumdar Govind Mohan
Virgil Systems Virgil Systems
Toronto, Ontario Toronto, Ontario
arun@virgilsystems.com gmohan@virgilsystems.com

Abstract—The world is being transformed by the onset of new high-speed 5G
technologies that open the possibility of IoT networks at scale. This demands
delivery guarantees and coordinated distributed communications that are
resistant to damage and can self-heal under adversity. The speed of change is
increasing with increased automation, artificial intelligence, information
from multiple sources, integrated systems of systems and emerging quantum
technologies. Current distributed consensus checking mechanisms are
computationally intensive and fail to scale along with these changes because
of the complexity of proof of work calculations or the unnecessary need to
bind in domain specific elements such as cryptocurrencies. Furthermore, these
mechanisms are brittle in that small changes in messages can cause restarts
or failure of integrity checks, or they introduce domain specific elements
(e.g. monetary design that has little to do with integrity). We propose
distributed ledgers as a pure technology coupled with a strong proof protocol
for exchanges, called “Proof of Integrity”, without any need for
cryptocurrencies or other domain specific elements. Proof of Integrity
provides distributed data guarantees and operational continuity through
adversity or breakdowns while creating a reliable and trustworthy layer for
the application specificity of domain specific elements.

I. INTRODUCTION

We demonstrate a new system for distributed data integrity guarantees,
resilience to damage and tolerance to adversity. Virgil Systems is focused on
edge-compute and edge-storage concepts by integration of 100% data integrity
with fully distributed data availability. Virgil’s 100% data integrity
solution has three key features: Transmit and receive data with a
Proof-Of-Integrity (POI); Antifragile Scalability to physical storage
capacity; and Identity based data binding.

Proof of Integrity leverages distributed ledgers and erasure code designs to
ensure edge endpoints can prove that data was not degraded or faked and that
no losses occurred. Antifragility means that the system can self-heal and
gets more resistant to damage if attacked: our system uses a new holographic
data concept and distribution mechanism on the network so that content
immutability (i.e. resistance to censorship or attack), availability and
scalability are handled by machine-learning processes as smart-contracts.
Identity based data binding means data is bound to its provenance, such as a
person’s identity, and moves with that identity while privacy preservation
and traceability are conserved.

Most modern IoT networks require migrating storage and computation to the
cloud and to and from devices at the edge. Edge processing implies offloading
of operational storage and computations to the devices themselves [1].
Today’s centralized software systems bottleneck services to the billions of
connected devices, and the maintenance of cloud server farms becomes a point
of failure for the network.

While blockchain and distributed ledger systems naturally lend themselves to
enabling distributed edge processing, IoT devices have limited computing
power and need to minimize energy consumption. Conventional distributed
ledger systems, as in blockchains like Ethereum and Bitcoin, suffer from the
overhead of full replication of data across all devices in the network for
maintaining integrity through consensus: this is infeasible given the
resource constraints of computation, storage and power [2]. Recent
innovations in ledger technology, such as LazyLedger [3], have improved
performance in this area by using special codes to reduce the problem of
block verification to data availability verification, allowing a node to
verify a block by downloading just a portion of it. Virgil’s Proof of
Integrity takes this concept further, by distributing data across the network
as erasure codes and creating anonymity (resistance to censorship) by
distributed data mixing. This results in guaranteed immutability being
preserved while also enabling full data integrity using novel erasure codes
and maximized availability by creating redundancies across the network.
Innovations in distributed messaging, such as Hedera Hashgraph [4], have
allowed for a novel availability solution elaborated below. Thus, an IoT
network is able to withstand large network failures without losing any data.

II. P2P IOT DATA TRANSACTIONS

A. Conventional Blockchains

Blockchain systems such as Ethereum and Bitcoin require peers to have full
copies of the block for transactions to occur. A block has two components: a
header that, at the very least, contains a hash of the previous block, and
the list of transactions included in the present block. This method is
computationally intensive: all peers need to verify the block order as well
as transaction validity. LazyLedger [3] improves this scheme by requiring
that only some peers contain full transaction data while others operate
solely using block headers (light clients).
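
The header-plus-transactions block structure described for conventional blockchains can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the format of Bitcoin, Ethereum or LazyLedger: the `Block` class, the JSON serialization, the all-zero genesis value and the helper names `build_chain` and `verify_chain` are all hypothetical choices made for this example.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Iterable, List

# Assumed placeholder for "no previous block"; real chains define their own.
GENESIS_PREV = "0" * 64


@dataclass
class Block:
    prev_hash: str           # header: hash of the previous block
    transactions: List[str]  # body: transactions included in this block

    def hash(self) -> str:
        # Hash a canonical JSON encoding of header and body together.
        payload = json.dumps(
            {"prev": self.prev_hash, "txs": self.transactions}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(batches: Iterable[Iterable[str]]) -> List[Block]:
    """Link each batch of transactions to the hash of the preceding block."""
    chain: List[Block] = []
    prev = GENESIS_PREV
    for txs in batches:
        block = Block(prev, list(txs))
        chain.append(block)
        prev = block.hash()
    return chain


def verify_chain(chain: List[Block]) -> bool:
    """Check block order: every header must reference its predecessor's hash."""
    prev = GENESIS_PREV
    for block in chain:
        if block.prev_hash != prev:
            return False
        prev = block.hash()
    return True
```

Because every peer has to recompute each hash over the full transaction list, the verification cost grows with the amount of replicated data, which is the overhead the text attributes to full replication.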
The transaction data of a block is converted into a $2k \times 2k$ matrix of
row- and column-wise Reed-Solomon codes, where $k$ is the side length of the
matrix arrangement of the data, doubled when the Reed-Solomon codes are
calculated. Merkle roots are computed for all rows and columns. Another
Merkle root, known as the data root, is computed from these row and column
roots. The data root and all row and column roots are added to the block
header. The light client can thus sample a random number of shares $s$,
$0 < s < (k + 1)^2$, where each share is a member of the matrix, and proves
transaction validity given the availability of all $s$ shares [5].

B. Virgil Systems Data Integrity Solution

The Virgil Systems Data Integrity Solution takes a similar approach of
erasure codes but differs in avoiding block ordering and maintenance
completely, instead favoring Cryptographic Record Chaining (CRC) and network
data entanglement using binary operations to maintain network data
immutability.

A peer sending data divides the data into $k$ shards, each of which is
converted to a “hologram”, a patented erasure code that can recover the
original shard using $s$ shares, $n/5 < s < n$, where $n$ is the total number
of shares the hologram contains. Each hologram can be reconverted to a shard
by a unique key, stored in a receipt file. Each of the holograms is randomly
distributed among peers, and the randomized order is stored in the receipt
file as well. Any peer that requires the data receives the receipt file from
the sending peer, allowing it to reconstruct the data. Integrity is thus
preserved by the novel erasure code. Immutability is preserved as the
holograms received by the peers from the sender are “superposed” with the
existing data on that peer (using an operation such as XOR). As a result, any
binary data uploaded to the network acts as a smart contract that modifies
the state of the binary data spread across the network. Finally, availability
is guaranteed as peers gossip the data that they contain randomly with other
peers. Network data thus persists even if a substantial amount of the network
is destroyed.

III. NETWORK DATA AVAILABILITY SOLUTION

A. Hashgraph Structure

The Hashgraph data structure was conceived as a means of ensuring the
fairness of transaction ordering in a distributed ledger system. In
Hashgraph-based consensus, all information a node knows is communicated to
another randomly selected node, a process known as gossip. A hashgraph is
maintained by all nodes, which contains an order of who gossiped to whom. The
hashgraph itself is gossiped along with transaction information; in other
words, there is gossip about gossip. In this way, all nodes can keep track of
the information known by all other nodes. Further, accountability of
transaction ordering is possible by verifying exactly when a node was made
aware of transactions between other nodes.
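
The "gossip about gossip" idea can be made concrete with a small Python sketch. It is a toy model and not Hedera's implementation: the `Event` and `Node` classes, their fields and the fixed (non-random) gossip order in the usage note are assumptions for illustration, and consensus, timestamps and signatures are omitted entirely.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Event:
    creator: str                  # node that created this event
    self_parent: Optional[str]    # id of the creator's previous event
    other_parent: Optional[str]   # id of the sender's latest event, if any
    payload: Tuple[str, ...]      # transactions carried by this event

    def event_id(self) -> str:
        raw = repr((self.creator, self.self_parent, self.other_parent, self.payload))
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.events: Dict[str, Event] = {}  # this node's copy of the hashgraph
        self.head: Optional[str] = None     # id of this node's latest event
        self._append(None, ())              # initial event

    def _append(self, other_parent: Optional[str], payload: Tuple[str, ...]) -> None:
        event = Event(self.name, self.head, other_parent, tuple(payload))
        self.head = event.event_id()
        self.events[self.head] = event

    def submit(self, tx: str) -> None:
        """Record a new transaction in a fresh local event."""
        self._append(None, (tx,))

    def gossip_to(self, receiver: "Node") -> None:
        """Send everything this node knows; the receiver records the sync."""
        receiver.events.update(self.events)
        receiver._append(self.head, ())

    def known_transactions(self) -> Set[str]:
        return {tx for e in self.events.values() for tx in e.payload}

    def gossip_log(self) -> Set[Tuple[str, str]]:
        """(sender, receiver) pairs this node can reconstruct from its events."""
        return {
            (self.events[e.other_parent].creator, e.creator)
            for e in self.events.values()
            if e.other_parent is not None
        }
```

For example, if node `a` submits a transaction and gossips to `b`, and `b` then gossips to `c`, node `c` ends up knowing both the transaction and the fact that `a` gossiped to `b`, even though `c` never talked to `a` directly.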
B. Virgil Systems Availability
Network availability is ensured even when many nodes are
eradicated. As mentioned above, data shards are randomly
distributed to nodes across the network. These nodes contact
K nodes in the network every t units of time and entangle
all the data they contain with these nodes, using a binary
operation such as XOR. The period t and the number of
redundancy nodes K are network parameters. Nodes maintain
a hashgraph to record the order in which data is sent to nodes, and
this hashgraph is gossiped so that the overall state is tracked.
Using this, the network can ensure that data is appropriately
gossiped in order to survive catastrophic events. In case of
such an event, nodes can use the hashgraph to reconstruct
any lost data shards using the entangled holograms. Network
data thus persists even if a substantial amount of the network
is destroyed.
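
How XOR entanglement allows a lost shard to be rebuilt can be sketched in Python as follows. The paper does not specify the patented hologram code or the exact entanglement scheme, so this sketch substitutes plain XOR parity over equal-length shards as a stand-in; the function names `xor_bytes`, `entangle` and `recover` and the node labels are hypothetical, and this simple scheme tolerates the loss of only one shard per parity block.

```python
from functools import reduce
from typing import Dict, Iterable


def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(x) != len(y):
        raise ValueError("shards must have equal length")
    return bytes(p ^ q for p, q in zip(x, y))


def entangle(shards: Iterable[bytes]) -> bytes:
    """XOR all shards together into one parity block (assumed stand-in)."""
    return reduce(xor_bytes, shards)


def recover(surviving: Iterable[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing shard from the survivors and the parity."""
    return reduce(xor_bytes, surviving, parity)


# Usage: three nodes each hold one shard; a redundancy node holds the parity.
shards: Dict[str, bytes] = {"n1": b"abcd", "n2": b"wxyz", "n3": b"1234"}
parity = entangle(shards.values())

# Node n2 is destroyed; its shard is rebuilt from what remains.
surviving = [data for name, data in shards.items() if name != "n2"]
rebuilt = recover(surviving, parity)
```

The recovery works because XOR is its own inverse: XOR-ing the parity with every surviving shard cancels their contributions and leaves the missing shard.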

ACKNOWLEDGMENT
We gratefully acknowledge the critical input provided
by the Creative Destruction Lab at the University of Toronto,
Dr. Joshua Gans and Dr. Barney Pell.

REFERENCES
[1] M. Chiang and T. Zhang. Fog and IoT: An overview of research
opportunities. IEEE Internet of Things Journal, 3(6):854–864, Dec. 2016.
[2] T. M. Fernández-Caramés and P. Fraga-Lamas. A review on the use of
blockchain for the Internet of Things. IEEE Access, 6:32979–33001, 2018.
[3] M. Al-Bassam. LazyLedger: A distributed data availability ledger
with client-side smart contracts, 2019.
[4] L. Baird. Hedera: A public hashgraph network & governing council,
Sep. 2019.
[5] M. Al-Bassam, A. Sonnino, and V. Buterin. Fraud and
data availability proofs: Maximising light client security and scaling
blockchains with dishonest majorities, 2018.
