Easy There Entropy: Coding, Crypto, Culture, Cosmos
Radix trees are nice because they allow keys that begin with the
same sequence of characters to have values that are closer together
in the tree. There are also no key collisions in a trie, unlike in
hash tables. They can, however, be rather inefficient, for example
when you have a long key that shares no common prefix with any other
key. Then you have to travel (and store) a considerable number of
nodes in the tree to get to the value, despite there being no other
values along the path.
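To make that inefficiency concrete, here is a minimal sketch of a plain one-character-per-node trie. It is not the Ethereum trie, it is written for Python 3 rather than the Python 2 used in the exercises, and the helper names `insert` and `count_nodes` are made up for this illustration.

```python
def insert(root, key, value):
    """Insert key into a dict-based trie, one node per character."""
    node = root
    for ch in key:
        node = node.setdefault('children', {}).setdefault(ch, {})
    node['value'] = value


def count_nodes(node):
    """Count every node reachable from (and including) this one."""
    return 1 + sum(count_nodes(c) for c in node.get('children', {}).values())


root = {}
insert(root, 'dog', 1)
insert(root, 'doge', 2)                     # shares 'dog', costs one extra node
insert(root, 'a-very-long-lonely-key', 3)   # shares nothing, costs one node per character

# 1 root + 4 nodes for dog/doge + 22 nodes for the lonely key
print(count_nodes(root))
```

The lonely key accounts for most of the nodes even though it holds a single value, which is the waste the trie in this post avoids by collapsing such runs into one node.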
Ok. So this all sounds fine and dandy, and you probably read about
it here (https://github.com/ethereum/wiki/wiki/%5BEnglish%5D-Patricia-Tree)
or here
(https://wanderer.github.io/ethereum/nodejs/code/2014/05/21/using-ethereums-tries-with-node/),
or if you’re quite brave, here (http://gavwood.com/Paper.pdf), but
let’s get down and dirty with some python examples. I’ve set up a
little repo on github
(https://github.com/ebuchman/understanding_ethereum_trie) that you
can clone and follow along with.
git clone git@github.com:ebuchman/understanding_ethereum_trie
Basically I just grabbed the necessary files from the pyethereum repo
(trie.py, utils.py, rlp.py, db.py), and wrote a bunch of exercises as
short python scripts that you can try out. I also added some print
statements to help you see what’s going on in trie.py, though due to
recursion, this can get messy, so there’s a flag at the top of trie.py
allowing you to turn printing on/off. Please feel free to improve the
print statements and send a pull-request! You should be in the trie
directory after cloning, and run your scripts with python
exercises/exA.py, where A is the exercise number. So let’s start
with ex1.py.
k, v = state.root_node
print 'root node:', [k, v]
print 'hp encoded key, in hex:', k.encode('hex')
root hash
15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc
root node: [' \x01\x01\x02', '\xc6\x85hello']
hp encoded key, in hex: 20010102
Note the final 6 nibbles are the key we used, 010102, while the first
two give us the HP encoding. The first nibble tells us that this is a
terminator node (since it would be 10 in binary, so the second least
significant bit is on), and since the key was even length (least
significant bit is 0), we add a second 0 nibble.
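The rule just described can be sketched in a few lines. This is a minimal Python 3 illustration (the exercises themselves are Python 2), and `hp_encode` is a name chosen here, not necessarily what trie.py calls its own routine.

```python
def hp_encode(nibbles, terminator):
    """Hex-prefix encode a list of nibbles (ints 0-15) into bytes."""
    flag = 2 if terminator else 0      # second-lowest bit marks a terminator node
    if len(nibbles) % 2 == 1:
        # odd length: set the lowest bit; the first key nibble shares the flag byte
        prefixed = [flag | 1] + list(nibbles)
    else:
        # even length: add a zero nibble so the total number of nibbles stays even
        prefixed = [flag, 0] + list(nibbles)
    # pack pairs of nibbles into bytes
    return bytes(16 * hi + lo for hi, lo in zip(prefixed[0::2], prefixed[1::2]))


key_nibbles = [0, 1, 0, 1, 0, 2]              # the nibbles of '\x01\x01\x02'
print(hp_encode(key_nibbles, True).hex())     # 20010102, matching the output above
print(hp_encode([5], True))                   # b'5': flag nibble 3 followed by nibble 5
```

The second call shows the odd-length case: the flag nibble becomes 3 (terminator bit plus odd bit) and no padding nibble is needed.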
state = trie.Trie('triedb',
'15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc'.decode('hex'))
print state.root_node
state.update('\x01\x01\x02',
rlp.encode(['hellothere']))
print state.root_hash.encode('hex')
print state.root_node
So that’s not all that interesting, but it’s nice that we didn’t overwrite
the original entry, and can still access both using their respective
hashes. Now, let’s add an entry that uses the same key but with a
different final nibble (ex2b.py):
state.update('\x01\x01\x03',
rlp.encode(['hellothere']))
print 'root hash:', state.root_hash.encode('hex')
k, v = state.root_node
print 'root node:', [k, v]
print 'hp encoded key, in hex:', k.encode('hex')
print state._get_node_type(state.root_node) == trie.NODE_TYPE_EXTENSION
common_prefix_key, node_hash = state.root_node
print state._decode_to_node(node_hash)
print state._get_node_type(state._decode_to_node(node_hash)) == trie.NODE_TYPE_BRANCH
root hash: b5e187f15f1a250e51a78561e29ccfc0a7f48e06d19ce02f98dd61159e81f71d
root node: ['\x10\x10\x10', '"\x01\xab\x83u\x15o\'\xf7T-h\xde\x94K/\xba\xa3[\x83l\x94\xe7\xb3\x8a\xcf\n\nt\xbb\xef\xd9']
hp encoded key, in hex: 101010
True
['', '', [' ', '\xc6\x85hello'], [' ', '\xcb\x8ahellothere'], '', '', '', '', '', '', '', '', '', '', '', '', '']
True
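To see where the root key 101010 and the branch positions 2 and 3 come from, it can help to work out the nibbles by hand. The following is a minimal Python 3 sketch; `to_nibbles` and `common_prefix` are hypothetical helper names, not functions taken from trie.py.

```python
def to_nibbles(key):
    """Split each byte of key into its high and low 4-bit halves."""
    nibbles = []
    for byte in key:
        nibbles.extend(divmod(byte, 16))
    return nibbles


def common_prefix(a, b):
    """Return the longest shared leading run of two nibble lists."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


first = to_nibbles(b'\x01\x01\x02')
second = to_nibbles(b'\x01\x01\x03')
shared = common_prefix(first, second)

print(shared)                                      # [0, 1, 0, 1, 0]
print(first[len(shared)], second[len(shared)])     # 2 3: the branch slots used

# Five shared nibbles is an odd count and this is not a terminator node,
# so the HP flag nibble is 1 and no padding nibble is added.
print(''.join('%x' % n for n in [1] + shared))     # 101010
```

The shared nibbles become the key of the extension node at the root, and the first differing nibble of each key picks the slot in the branch node it points to.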
Ok, so that was pretty cool. Let’s do it again but with a key equal to
the first few nibbles of our original key (ex2c.py):
state.update('\x01\x01', rlp.encode(['hellothere']))
Again, we see that this results in the creation of a branch node, but
something different has happened. The branch node corresponds to
the key ‘\x01\x01’, but there is also a value with that key
(‘hellothere’). Hence, that value is placed in the final (17th) position
of the branch node. The other entry, with key ‘\x01\x01\x02’, is
placed in the position corresponding to the next nibble in its key, in
this case, 0. Since its key hasn’t been fully exhausted, we store the
leftover nibbles (in this case, just ‘2’) in the key position for the node.
Next, let’s add an entry whose key extends our original key instead:
state.update('\x01\x01\x02\x57',
rlp.encode(['hellothere']))
In this case, the opposite of what we just saw happens! The original
entry’s value is stored at the final position of the branch node, where
the key for the branch node is the key for that value
(‘\x01\x01\x02’). The second entry is stored at the position of its
next nibble (5), with a key equal to the remaining nibbles (just 7).
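Both prefix cases follow the same pattern, which can be sketched with plain Python lists. This is a rough Python 3 illustration under a few assumptions: the helper names are made up, the values are shown as raw bytes rather than RLP-encoded, and the child leaf is kept inline in the branch instead of being stored by hash.

```python
def hp_leaf_key(nibbles):
    """Hex-prefix encode leftover nibbles for a terminator (leaf) node."""
    if len(nibbles) % 2:
        prefixed = [3] + nibbles          # flag nibble 3: terminator, odd length
    else:
        prefixed = [2, 0] + nibbles       # flag nibble 2: terminator, even length, zero pad
    return bytes(16 * hi + lo for hi, lo in zip(prefixed[0::2], prefixed[1::2]))


def branch_for_prefix_pair(shorter, shorter_value, longer, longer_value):
    """Branch node for two nibble-list keys where `shorter` is a prefix of `longer`."""
    assert longer[:len(shorter)] == shorter and len(longer) > len(shorter)
    branch = [b''] * 17
    branch[16] = shorter_value                    # key ends at this node: 17th slot
    next_nibble = longer[len(shorter)]            # picks the slot for the longer key
    leftover = longer[len(shorter) + 1:]          # what remains after that nibble
    branch[next_nibble] = [hp_leaf_key(leftover), longer_value]
    return branch


# '\x01\x01' is a prefix of '\x01\x01\x02'
node_c = branch_for_prefix_pair([0, 1, 0, 1], b'hellothere',
                                [0, 1, 0, 1, 0, 2], b'hello')
print(node_c[0], node_c[16])

# '\x01\x01\x02' is a prefix of '\x01\x01\x02\x57'
node_d = branch_for_prefix_pair([0, 1, 0, 1, 0, 2], b'hello',
                                [0, 1, 0, 1, 0, 2, 5, 7], b'hellothere')
print(node_d[5], node_d[16])
```

In both cases the shorter key's value lands in the last slot, and the longer key's leftover nibble is HP-encoded into a single byte (0x32 and 0x37, which print as the characters 2 and 7).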
Tada! Try playing around a bit to make sure you understand what’s
going on here. Nodes are stored in the database according to the
hash of their rlp encoding. Once a node is retrieved, keys are used
to travel a path through a further series of nodes (which may
involve more hash lookups) to reach the final value. Of course,
we’ve only used two entries in each of these examples to keep things
simple, but that has been sufficient to expose the basic mechanics of
the trie. We could add more entries to fill up the branch node, but
since we already understand how that works, let’s move on to
something more complicated. In exercise 3, we will add a third
entry, which shares a common prefix with the second entry. This
one’s a little longer, but the result is totally awesome (ex3.py):
state = trie.Trie('triedb',
'15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc'.decode('hex'))
print state.root_hash.encode('hex')
print state.root_node
print ''
state.update('\x01\x01\x02\x55',
rlp.encode(['hellothere']))
print 'root hash:', state.root_hash.encode('hex')
print 'root node:', state.root_node
print 'branch node it points to:', state._decode_to_node(state.root_node[1])
print ''
Nothing new yet. Initialize from original hash, add a new node with
key '\x01\x01\x02\x55'. Creates a branch node and points to it
with a hash. We know this. Now the fun stuff:
state.update('\x01\x01\x02\x57',
rlp.encode(['jimbojones']))
print 'root hash:', state.root_hash.encode('hex')
print 'root node:', state.root_node
branch_node = state._decode_to_node(state.root_node[1])
print 'branch node it points to:', branch_node
We’re doing the same thing – add a new node, this time with key
'\x01\x01\x02\x57' and value 'jimbojones'. But now, in our
branch node, where there used to be a node with value
'hellothere' (i.e. at index 5), there is a messy ole hash! What do
we do with hashes in tries? We use em to look up more nodes, of
course!
next_hash = branch_node[5]
print 'hash stored in branch node:', next_hash.encode('hex')
print 'branch node it points to:', state._decode_to_node(next_hash)
And the output:
root hash: 17fe8af9c6e73de00ed5fd45d07e88b0c852da5dd4ee43870a26c39fc0ec6fb3
root node: ['\x00\x01\x01\x02', '\r\xca6X\xe5T\xd0\xbd\xf6\xd7\x19@\xd1E\t\x8ehW\x03\x8a\xbd\xa3\xb2\x92!\xae{2\x1bp\x06\xbb']
branch node it points to: ['', '', '', '', '', ['5', '\xcb\x8ahellothere'], '', '', '', '', '', '', '', '', '', '', '\xc6\x85hello']

root hash: fcb2e3098029e816b04d99d7e1bba22d7b77336f9fe8604f2adfb04bcf04a727
root node: ['\x00\x01\x01\x02', '\xd5/\xaf\x1f\xdeO!u>&3h_+\xac?\xf1\xf3*\xb7)3\xec\xe9\xd5\x9f2\xcaoc\x95m']
branch node it points to: ['', '', '', '', '', '\x00&\x15\xb7\xc4\x05\xf6\xf3F2\x9a(N\x8f\xb2H\xe75\xcf\xfa\x89C-\xab\xa2\x9eV\xe4\x14\xdfl0', '', '', '', '', '', '', '', '', '', '', '\xc6\x85hello']
hash stored in branch node: 002615b7c405f6f346329a284e8fb248e735cffa89432daba29e56e414df6c30
branch node it points to: ['', '', '', '', '', [' ', '\xcb\x8ahellothere'], '', [' ', '\xcb\x8ajimbojones'], '', '', '', '', '', '', '', '', '']
Ok! So this has been pretty cool. Hopefully by now you have a
pretty solid understanding of how the trie works, the HP encoding,
the different node types, and how the nodes are connected and refer
to each other. As a final exercise, let’s do some look-ups.
state = trie.Trie('triedb',
'b5e187f15f1a250e51a78561e29ccfc0a7f48e06d19ce02f98dd61159e81f71d'.decode('hex'))
print 'using root hash from ex2b'
print rlp.decode(state.get('\x01\x01\x03'))
print ''
state = trie.Trie('triedb',
'fcb2e3098029e816b04d99d7e1bba22d7b77336f9fe8604f2adfb04bcf04a727'.decode('hex'))
print 'using root hash from ex3'
print rlp.decode(state.get('\x01\x01\x02'))
print rlp.decode(state.get('\x01\x01\x02\x55'))
print rlp.decode(state.get('\x01\x01\x02\x57'))
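For a feel of what a lookup does under the hood, here is a rough Python 3 sketch of the traversal over the node shapes printed in exercise 3. It is not the code in trie.py. The names `store`, `resolve`, `unpack_hp` and `get` are made up, values are raw bytes rather than RLP, and sha256 of `repr()` stands in for the real hash of the RLP encoding.

```python
import hashlib

db = {}


def store(node):
    """Save a node under a hash of its contents and return that hash.
    ASSUMPTION: sha256 of repr() is a stand-in for the real sha3 of the RLP encoding."""
    digest = hashlib.sha256(repr(node).encode()).digest()
    db[digest] = node
    return digest


def resolve(ref):
    """A reference is either an inline node (a list) or a 32-byte hash into db."""
    return ref if isinstance(ref, list) else db[ref]


def unpack_hp(encoded):
    """Undo the HP encoding: return (nibbles, is_terminator)."""
    nibbles = []
    for byte in encoded:
        nibbles.extend(divmod(byte, 16))
    flag = nibbles[0]
    skip = 1 if flag & 1 else 2          # odd: flag nibble only; even: flag plus padding
    return nibbles[skip:], bool(flag & 2)


def get(ref, path):
    """Follow `path` (a list of nibbles) from the node at `ref`; b'' means not found."""
    if ref == b'':
        return b''
    node = resolve(ref)
    if len(node) == 17:                  # branch node
        if not path:
            return node[16]
        return get(node[path[0]], path[1:])
    key, is_leaf = unpack_hp(node[0])    # two-item node: leaf or extension
    if is_leaf:
        return node[1] if path == key else b''
    if path[:len(key)] != key:
        return b''
    return get(node[1], path[len(key):])


# Rebuild the shape of the final trie from exercise 3.
inner = [b''] * 17
inner[5] = [b' ', b'hellothere']
inner[7] = [b' ', b'jimbojones']
outer = [b''] * 17
outer[5] = store(inner)
outer[16] = b'hello'
root_hash = store([bytes.fromhex('00010102'), store(outer)])

print(get(root_hash, [0, 1, 0, 1, 0, 2]))
print(get(root_hash, [0, 1, 0, 1, 0, 2, 5, 5]))
print(get(root_hash, [0, 1, 0, 1, 0, 2, 5, 7]))
```

Each hash reference costs one database lookup, while inline nodes (the two leaves here) are read straight out of their parent.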
And that’s that! Now, you might wonder, “so, how is all this trie
stuff actually used in ethereum?” Great question. And my repository
does not have the solutions. But if you clone the official pyethereum
repo, and do a quick grep -r 'Trie' . , it should clue you in.
What we find is that a trie is used in two key places: to encode
transaction lists in a block, and to encode the state of a block. For
transactions, the keys are big-endian integers representing the
transaction count in the current block. For the state trie, the keys are
ethereum addresses. It is essential for any full node to maintain the
state trie, as it must be used to verify new transactions (since
contract data must be referenced). Unlike bitcoin, however, there is
no need to store old transactions for verification purposes, since
there is a state database. So technically the transaction tries don’t
need to be stored. Of course, if no one keeps them around, then no
one will ever be able to verify from the genesis block up to the
current state again, so it makes sense to hang on to them.
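As a small illustration of what “big-endian integer” keys look like, here is a Python 3 sketch. The helper name `int_to_big_endian` is made up for this example; pyethereum has its own utilities, and the real keys may be wrapped in a further encoding such as RLP before they go into the trie.

```python
def int_to_big_endian(n):
    """Shortest big-endian byte string for a non-negative integer."""
    return n.to_bytes((n.bit_length() + 7) // 8, 'big')


# Hypothetical transaction indices within one block
for index in (1, 2, 255, 256):
    print(index, int_to_big_endian(index).hex())
```

Most significant byte first means 256 becomes the two bytes 01 00, so it shares its leading nibbles with the key for 1.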
2. aranad
AUGUST 9, 2014 AT 12:00 PM
Awesome thanks!
4. jiehua
JUNE 10, 2015 AT 12:47 PM
Like it!
5. Christoph Jentzsch (@ChrJentzsch)
AUGUST 27, 2015 AT 9:17 AM
Nice. This link:
https://github.com/ethereum/wiki/wiki/%5BEnglish%5D-Patricia-Tree
should be replaced by
https://github.com/ethereum/wiki/wiki/Patricia-Tree
6. Christoph Jentzsch (@ChrJentzsch)
AUGUST 27, 2015 AT 9:20 AM
And the github repo should be public:
git clone git@github.com:ebuchman/understanding_ethereum_trie
Cloning into ‘understanding_ethereum_trie’…
Permission denied (publickey).
fatal: Could not read from remote repository.
◦ work2heat
AUGUST 27, 2015 AT 3:54 PM
Thanks Christoph! I accidentally moved the repo – should be
back now. Sorry about that!
7. Bruce Smith
SEPTEMBER 17, 2015 AT 7:23 AM
Thanks. I tried running the examples but am having trouble
finding sha3 anywhere on the net so I am unable to.
11. Bert
FEBRUARY 10, 2016 AT 6:08 PM
Hi,
could you please explain what the internal key is good for once it
has been created?
Since the internal code may change, as new updates of the trie
may relocate items from terminal nodes to non-terminal nodes, it
is not obvious to me why and how the internal key is used later
on.
The client code will use the normal key (the non-hex-prefixed
one) as the client code cannot know in advance where the item is
placed within the data structure.
◦ work2heat
FEBRUARY 11, 2016 AT 12:43 AM
Hi Bert,
12. Hamish MacEwan
APRIL 30, 2016 AT 11:44 PM
Hi,
13. Hamish MacEwan
MAY 2, 2016 AT 3:23 AM
Hi,
◦ work2heat
MARCH 10, 2017 AT 4:21 PM
Hello,
14. Tadhg
NOVEMBER 20, 2016 AT 4:29 PM
Thanks for this!
Tadhg
16. spharish24
MARCH 10, 2017 AT 12:58 PM
Hi Ethan, can you explain why, if the length is even, we are
appending an extra zero? In this picture,
https://i.stack.imgur.com/YZGxe.png, what does ‘3☐’ signify?
◦ work2heat
MARCH 10, 2017 AT 4:24 PM
I’m not sure we are appending an extra zero. I believe it’s just
notation for the extension node that holds the value. Note the
number under “prefix” in that diagram refers to the legend.
The 3[] says it’s a leaf node with an odd number of nibbles. If I
recall, the even/oddness is related to the hex-prefix encoding.
Hope that helps.