
Cryptographic Hash Functions
What are Hash functions?
• Mathematical functions: A mapping of items (values) in the domain
to items (values) in the range.
• Hash functions are special mathematical functions that satisfy the
following three properties:
• Inputs (or items in the domain) can be of any size (not fixed); in practice the
input size is still bounded
• Outputs (or items in the range) are fixed-size (e.g. SHA-256 has an output
size of 256 bits)
• Efficiently computable, i.e., the mapping should be efficiently (in polynomial
time) computable
• A hash function is any function that can be used to map data of arbitrary size to fixed-size
values. The values returned by a hash function are called hash values, hash codes, digests,
or simply hashes. The values are usually used to index a fixed-size table called a hash table.
Use of a hash function to index a hash table is called hashing or scatter storage addressing.
• Hash functions and their associated hash tables are used in data storage and retrieval
applications
A hash function takes an input as a key, which is associated with a datum or
record and used to identify it to the data storage and retrieval application.
The keys may be fixed length, like an integer, or variable length, like a name.
In some cases, the key is the datum itself. The output is a hash code used to
index a hash table holding the data or records, or pointers to them.
A hash function may be considered to perform three functions:
• Convert variable length keys into fixed length (usually machine word length
or less) values, by folding them by words or other units using a
parity-preserving operator like ADD or XOR.
• Scramble the bits of the key so that the resulting values are uniformly
distributed over the key space.
• Map the key values into ones less than or equal to the size of the table.

A good hash function satisfies two basic properties: 1) it should be very fast
to compute; 2) it should minimize duplication of output values (collisions).
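The three steps listed above (fold, scramble, map to the table size) can be sketched in Python. This is only an illustration of a non-cryptographic hash for a hash table: the function names, the 4-byte word size and the odd multiplier are assumptions chosen for the example, not part of the slides.

```python
def fold_xor(key: bytes, word_bytes: int = 4) -> int:
    """Step 1: fold a variable-length key into one machine word using XOR."""
    folded = 0
    for i in range(0, len(key), word_bytes):
        folded ^= int.from_bytes(key[i:i + word_bytes], "little")
    return folded


def scramble(value: int) -> int:
    """Step 2: spread the bits with a multiplication, kept to 32 bits.

    The multiplier is an arbitrary odd constant picked for illustration.
    """
    return (value * 2654435761) & 0xFFFFFFFF


def table_index(key: bytes, table_size: int) -> int:
    """Step 3: map the scrambled value into the range of table slots."""
    return scramble(fold_xor(key)) % table_size


if __name__ == "__main__":
    for name in (b"alice", b"bob", b"carol"):
        print(name, "-> slot", table_index(name, 16))
```

Such a function is fast, but it offers none of the security properties discussed in the rest of these slides.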
Data Integrity and Source Authentication

• Encryption does not protect data from modification by another party.


• Why?
• Need a way to ensure that data arrives at destination in its original form as sent
by the sender and it is coming from an authenticated source.
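Properties: Fixed length
[Figure: two messages of different lengths are passed through the hash function h; each yields a digest of the same fixed length L.]
Message 1: "Hello, world" → h → 661dce0da2bcb2d8 2884e0162acf8194
Message 2: "This is a clear text that can easily read without using the key. The sentence is longer than the text above." → h → 52f21cf7c7034a20 17a21e17e061a863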

• Arbitrary-length message to fixed-length digest


• A cryptographic hash function (CHF) is a
mathematical algorithm that maps data of arbitrary size (often called
the "message") to a bit array of a fixed size (the "hash value", "hash",
or "message digest"). It is a one-way function, that is, a function
which is practically infeasible to invert. Ideally, the only way to find a
message that produces a given hash is to attempt a brute-force
search of possible inputs to see if they produce a match, or use
a rainbow table of matched hashes. Cryptographic hash functions are
a basic tool of modern cryptography.
• An illustration of the potential use of a cryptographic hash is as
follows:
Alice poses a tough math problem to Bob and claims that she has
solved it. Bob would like to try it himself, but would yet like to be sure
that Alice is not bluffing. Therefore, Alice writes down her solution,
computes its hash, and tells Bob the hash value (whilst keeping the
solution secret). Then, when Bob comes up with the solution himself a
few days later, Alice can prove that she had the solution earlier by
revealing it and having Bob hash it and check that it matches the hash
value given to him before. (This is an example of a
simple commitment scheme; in actual practice, Alice and Bob will
often be computer programs, and the secret would be something less
easily spoofed than a claimed puzzle solution.)
• Cryptographic Hash Functions
• cryptographic hash is like a fingerprint
• extremely hard to find another person with the same left thumb
fingerprint
• fingerprint doesn’t disclose any information about the person other than
that particular fingerprint
• Digital information also has fingerprints
• These are called cryptographic hashes
• "hash" means something that’s chopped into small pieces or mixed up
• How to create a cryptographic hash of a file?
• send the file into a computer program called a cryptographic hash
function
• Why are cryptographic hash functions useful?
• Cryptographic hash functions can be used as an integrity check to detect
changes in data
The ideal cryptographic hash function has the following main
properties:
• it is deterministic, meaning that the same message always results in
the same hash
• it is quick to compute the hash value for any given message
• it is infeasible to generate a message that yields a given hash value
(i.e. to reverse the process that generated the given hash value)
• it is infeasible to find two different messages with the same hash
value
• a small change to a message should change the hash value so
extensively that the new hash value appears uncorrelated with the
old hash value (avalanche effect)
Example Cryptographic Hash

The output (the hash) is a 256-bit number. 256 bits equals 32 bytes,
because 1 byte consists of 8 bits, which is tiny compared to
the size of the 1.21 MB cat picture.
A bit is the smallest unit of information in a computer: 0 or 1. A
byte is 8 bits that together can take 256 different values.
The hash is written in hexadecimal, or hex: each byte is printed as
two hex digits, each in the range 0–f, where a = 10 and f = 15.
You can’t “reconstruct” the cat picture from just the hash: it is a ONE-WAY FUNCTION.
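The size relationships described above (256 bits, 32 bytes, two hex digits per byte) can be checked with Python's standard hashlib module. The input bytes below are a made-up stand-in, since the cat picture itself is not part of these slides.

```python
import hashlib

# Hypothetical stand-in for the contents of the picture file.
data = b"stand-in for the bytes of the cat picture" * 1000

digest = hashlib.sha256(data).digest()   # raw bytes of the hash
hex_digest = digest.hex()                # the same value printed in hex

print("input size :", len(data), "bytes")
print("hash size  :", len(digest), "bytes =", len(digest) * 8, "bits")
print("hex digits :", len(hex_digest))
print("hash (hex) :", hex_digest)
```

However large the input is, the hash stays at 32 bytes, printed as 64 hex digits.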
How does a cryptographic hash function work?
Suppose you want to hash a file containing the six bytes a1 02
12 6b c6 7d. You want the hash to be a 1-byte number (8 bits).
You can construct a hash function using addition modulo 256,
which means to wrap around to 0 when the result of an addition
reaches 256
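A minimal sketch of this toy hash in Python is shown here. The name toy_hash is an assumption made for the example; the six input bytes are the ones given above.

```python
def toy_hash(data: bytes) -> int:
    """Add all bytes together, wrapping around to 0 at 256 (addition modulo 256)."""
    total = 0
    for byte in data:
        total = (total + byte) % 256
    return total


if __name__ == "__main__":
    message = bytes.fromhex("a102126bc67d")
    print(f"{toy_hash(message):02x}")    # 63

    # The same bytes in a different order give the same hash, so
    # collisions are trivial to find: this toy is not cryptographically secure.
    reordered = bytes.fromhex("7dc66b1202a1")
    print(f"{toy_hash(reordered):02x}")  # 63
```

The sum of the six bytes is 611, and 611 modulo 256 is 99, which is 63 in hex.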
Integrity Checks using Hash Values
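An integrity check with a hash value can be sketched like this: the sender publishes the hash of the data, and the receiver recomputes the hash and compares. The helper names and the example messages are assumptions made for illustration; only hashlib and hmac from the Python standard library are used.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, expected_hex: str) -> bool:
    """Recompute the hash and compare it with the published one."""
    return hmac.compare_digest(sha256_hex(data), expected_hex)


if __name__ == "__main__":
    original = b"Transfer 10 coins to Bob"
    published = sha256_hex(original)          # obtained from a trusted place

    print(is_unmodified(original, published))                     # True
    print(is_unmodified(b"Transfer 90 coins to Bob", published))  # False
```

The check only detects changes if the published hash itself reaches the receiver unmodified.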
Cryptographically Secure Hash functions?
• Property 1: Deterministic
• No matter how many times you pass a particular input through a hash function, you will
always get the same result.
• Property 2: Quick Computation
• Hash function should be capable of returning the hash of an input quickly.
• Property 3: Pre-Image Resistance
• Given H(A) it is infeasible to determine A, where A is the input and H(A) is the output hash.
• Property 4: Small Changes In The Input Change The Hash
• Even if you make a small change in your input, the resulting change in the hash will be
huge.
• Property 5: Collision Resistant
• It is infeasible to find two different inputs A and B whose hashes H(A) and H(B)
are equal.
• Property 6: Puzzle Friendly
• For every output “Y”, if k is chosen from a distribution with high min-entropy it is infeasible to find
an input x such that H(k|x) = Y.
Cryptographically Secure Hash functions?
• In cryptography (and cryptography-based applications such as Bitcoin),
we are interested in a special type of hash function, often referred to as a
cryptographically secure hash function
• What is a cryptographically secure hash function?
• Satisfies the following additional security properties:
i. Collision resistant (Strong collision resistance)
ii. Hiding or Preimage resistant (One-way property)
iii. Second preimage resistant (Weak collision resistant)
The same input will always produce the same hash (of the same
size, 256 bits), and slightly different inputs will produce very
different hashes (of the same size, 256 bits).
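Both statements can be observed with SHA-256 from Python's hashlib. The sketch below counts how many of the 256 output bits differ between two hashes; the helper name differing_bits and the sample inputs are assumptions made for the example.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the output bits in which SHA-256(a) and SHA-256(b) differ."""
    hash_a = int.from_bytes(hashlib.sha256(a).digest(), "big")
    hash_b = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(hash_a ^ hash_b).count("1")


if __name__ == "__main__":
    # Same input twice: identical hashes, so no bit differs.
    print(differing_bits(b"Hello, world", b"Hello, world"))
    # One extra character: typically around half of the 256 bits flip.
    print(differing_bits(b"Hello, world", b"Hello, world!"))
```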
Collision-Resistance (Collision free)
• Means it should be hard to find two different inputs of any length that
result in the same hash. This property is also referred to as collision
free hash function.
• for a hash function h, it is hard to find any two different inputs x and y such that
h(x) = h(y).
• Since a hash function is a compressing function with a fixed hash length, it is
impossible for a hash function not to have collisions. The collision-free
property only means that these collisions should be hard to find.
• This property makes it very difficult for an attacker to find two input
values with the same hash.
Property 1: Collision Resistance

• Can’t find any two different messages with the same message digest
• Collision resistance implies second preimage resistance
• Collisions, if we could find them, would give signatories a way to repudiate their signatures
Property 1: Collision Resistance
• How to find a collision in a secure hash function with a 256-bit output?
• Strategy 1: Brute force – keep picking random inputs and computing their hashes until you
find a collision.
• How long does this take?
• Worst case: 2^256 + 1 inputs (by then two inputs must share a hash)
• On average: a collision is expected after about 2^128 inputs, the square root of 2^256 (birthday paradox)
• More than 99.8% chance of a collision after 2^130 randomly chosen inputs
• Brute force always finds a collision, no matter what H is. However, it takes too long to
matter (2^128 is a lot of tries!)
• Strategy 2: Find cryptographic or other weaknesses in hash functions
• Is the following function cryptographically secure: H(x) = x mod 23?
• Hash functions once considered cryptographically secure have turned out to have weaknesses. E.g., MD5 was considered to
be secure until, after many years of research, collisions were found. SHA-256 (a currently used
secure hash function) has no known practical collision attacks, but it has not been proven secure!
No hash function has been proven to be collision-resistant or secure!
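For the question about H(x) = x mod 23, a short sketch shows why the answer is no: both collisions and preimages can be written down directly, without any search. The function names are assumptions made for the example.

```python
def weak_hash(x: int) -> int:
    """The candidate function from the slide: H(x) = x mod 23."""
    return x % 23


def colliding_input(x: int) -> int:
    """Return a different input with the same hash: adding 23 never changes x mod 23."""
    return x + 23


def preimage(y: int) -> int:
    """For any output y in 0..22, y itself is an input that hashes to y."""
    return y


if __name__ == "__main__":
    x = 1000
    print(weak_hash(x), weak_hash(colliding_input(x)))  # same value twice
    print(weak_hash(preimage(7)))                        # 7
```

With only 23 possible outputs, even blind brute force would find a collision within 24 tries.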
• How to find a collision?
• Usually, a collision is expected after about sqrt(N) tries, where N is the total number of possible
outputs
• For example, for a 256-bit output, N = 2^256
• try 2^130 randomly chosen inputs
• 99.8% chance that two of them will collide
• This works no matter what H is … but it takes too long to matter
• How big is 2^256?
• 2^256 is about 10^77
• Suppose a computer does 60 million hashes per second, and the expected number of tries needed to
find a solution is 2^255. The result is 2^255 / (60 × 10^6) s ≈ 10^69 s ≈ 3 × 10^61 years
• Even if we had 1 trillion computers and ran them concurrently, it would take
about 3 × 10^49 years
• Is there a faster way to find collisions?
• For some possible Hashes, yes.
• For others, we don’t know of one.
• It is infeasible, but not impossible, to find a second input with the
same hash
• No Hash Function has been proven collision-free.
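The "how big" arithmetic above can be reproduced in a few lines of Python. The rate of 60 million hashes per second and the one trillion computers are the figures from the slide; the constant and function names, and the 365.25-day year, are assumptions made for the example.

```python
HASHES_PER_SECOND = 60e6                 # 60 million hashes per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumed length of a year


def years_to_try(tries: int, computers: int = 1) -> float:
    """Years needed to compute `tries` hashes at the assumed rate."""
    seconds = tries / (HASHES_PER_SECOND * computers)
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"2^256 is about {float(2**256):.2e}")
    print(f"one computer:          {years_to_try(2**255):.1e} years")
    print(f"one trillion computers: {years_to_try(2**255, 10**12):.1e} years")
```

The results are on the order of 3 × 10^61 years for one computer and 3 × 10^49 years for a trillion of them.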
What is the Birthday Paradox?
• Assuming all days of the year are equally likely to be a birthday, the
chance of another person sharing your birthday is 1/365, which is about 0.27%.
• But if you gather 20-30 people in one room, the odds of some two of them sharing
the exact same birthday rise dramatically.
• In fact, with 23 people there is about a 50-50 chance that two of them share a
birthday!
• Simple rule in probability:
• Suppose you have N different possibilities of an event happening; then you need roughly the square
root of N random items for them to have a 50% chance of a collision.
• Applying this to birthdays: there are 365 possible birthdays, so on the order of
sqrt(365) ≈ 19 randomly chosen people are needed; the exact calculation gives 23 people for a 50% chance of two
people sharing a birthday.
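The birthday numbers can be checked with a short calculation. The sketch below computes the exact collision probability for small cases and uses the standard approximation 1 - exp(-k^2 / 2N) for k items and N possibilities when N is as large as 2^256. The function names are assumptions made for the example.

```python
import math


def exact_collision_probability(items: int, possibilities: int) -> float:
    """Probability that at least two of `items` random picks coincide."""
    p_no_collision = 1.0
    for i in range(items):
        p_no_collision *= (possibilities - i) / possibilities
    return 1.0 - p_no_collision


def approx_collision_probability(items: int, possibilities: int) -> float:
    """Approximation 1 - exp(-k^2 / 2N), usable for very large N."""
    return 1.0 - math.exp(-(items * items) / (2 * possibilities))


if __name__ == "__main__":
    print(exact_collision_probability(22, 365))   # just under 0.5
    print(exact_collision_probability(23, 365))   # just over 0.5
    print(approx_collision_probability(2**128, 2**256))
    print(approx_collision_probability(2**130, 2**256))
```

For a 256-bit hash, 2^128 inputs give a collision chance of roughly 40%, and 2^130 inputs push it above 99.8%.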
Pre-image resistance
• Means that it should be computationally hard to reverse a hash
function.
• if a hash function h produced a hash value z, then it should be a
difficult process to find any input value x that hashes to z.
• This property protects against an attacker who only has a hash value and
is trying to find the input.
Property 2: Hiding or Preimage resistant
• This measures how difficult it is to devise a message which hashes to a known digest
• Roughly speaking, the hash function must be one-way.
• Hiding or pre-image resistance: Given H(x), it is infeasible to find x

Given only a message digest, can’t find any message (or preimage) that generates that digest.
Application of hiding: Commitments
• Commit to a value, reveal it later.
• Analogy: Want to “seal a value in an envelope”, and “open the envelope”
later.
• Commitment API
• Committing to a message returns two values: a "commitment" and a
"key";
• commitment - envelope put on the table;
• key - secret key for unlocking the envelope;
• Verify: given the commitment, the key and the message, anyone can check that the
message is the one that was committed to.
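One common way to realise this API with a hash function is to hash a fresh random key together with the message. The slides do not spell out a construction, so the sketch below is an assumption: the commitment is SHA-256(key | msg) with a random 256-bit key, and the names commit and verify are chosen for the example.

```python
import hashlib
import hmac
import secrets


def commit(msg: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, key): the sealed envelope and the secret that opens it."""
    key = secrets.token_bytes(32)                  # fresh random 256-bit key
    commitment = hashlib.sha256(key + msg).digest()
    return commitment, key


def verify(commitment: bytes, key: bytes, msg: bytes) -> bool:
    """Check that (key, msg) opens the given commitment."""
    expected = hashlib.sha256(key + msg).digest()
    return hmac.compare_digest(expected, commitment)


if __name__ == "__main__":
    com, key = commit(b"my solution is 42")        # publish com, keep key secret
    print(verify(com, key, b"my solution is 42"))  # True
    print(verify(com, key, b"my solution is 43"))  # False
```

The random key is what provides hiding when the message comes from a small set of guessable values; collision resistance is what prevents opening the envelope to a different message later.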
Property 3: Puzzle-friendliness or Second preimage resistant
• Puzzle-friendliness property: For every possible output value y, if k is chosen
from a distribution with high min-entropy, then it is infeasible to find x such
that H(k|x) = y.
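Puzzle-friendliness means there is no better strategy than trying values of x one after another. The sketch below illustrates that brute-force search on a deliberately easy variant: instead of hitting one exact output y, it accepts any hash whose first few bits are zero, so that it finishes quickly. This relaxation, the function name solve_puzzle and the 8-byte encoding of x are assumptions made for the example.

```python
import hashlib
import secrets
from itertools import count


def solve_puzzle(k: bytes, zero_bits: int) -> int:
    """Try x = 0, 1, 2, ... until SHA-256(k | x) starts with `zero_bits` zero bits."""
    target = 1 << (256 - zero_bits)        # hashes below this value qualify
    for x in count():
        digest = hashlib.sha256(k + x.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return x


if __name__ == "__main__":
    k = secrets.token_bytes(32)            # k drawn from a high min-entropy source
    x = solve_puzzle(k, zero_bits=12)      # about 2^12 tries expected
    print("found x =", x)
    print(hashlib.sha256(k + x.to_bytes(8, "big")).hexdigest())
```

Each extra required zero bit doubles the expected number of tries; demanding one exact 256-bit output y corresponds to the infeasible case in the definition.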
Second preimage resistant
• This measures how difficult it is to devise a second message which hashes to the same digest as a
known message
• Given one message, can’t find another message that has the same message digest.
• An attack that finds a second message with the same message digest is a second pre-image
attack.
• It would be easy to forge new digital signatures from old signatures if the hash
function used weren’t second preimage resistant
Secure Hash Algorithm
• SHA originally designed by NIST & NSA in 1993
• Revised in 1995 as SHA-1
• US standard for use with DSA signature scheme
• standard is FIPS 180-1 1995, also Internet RFC3174
• based on design of MD4 with key differences
• produces 160-bit hash values
• recent 2005 results on security of SHA-1 have raised concerns on its use
in future applications
Secure Hash Algorithm
• SHA originally designed by NIST & NSA in 1993
• Revised in 1995 as SHA-1
• NIST issued revision FIPS 180-2 in 2002
• adds 3 additional versions of SHA
• SHA-256, SHA-384, SHA-512
• designed for compatibility with increased security provided by the AES
cipher
• structure & detail is similar to SHA-1
• hence analysis should be similar
• but security levels are rather higher
Examples of cryptographic hash functions
• MD5:
• Produces a 128-bit hash. Collision resistance was broken
after ~2^21 hashes.
• SHA-1:
• Produces a 160-bit hash. Collision resistance was broken after
~2^61 hashes.
• SHA-256:
• Produces a 256-bit hash. This is currently being used by
Bitcoin.
• Keccak-256:
• Produces a 256-bit hash and is currently used by Ethereum.
SHA Versions
(all sizes in bits)
                SHA-1     SHA-224   SHA-256   SHA-384   SHA-512
Digest size     160       224       256       384       512
Message size    < 2^64    < 2^64    < 2^64    < 2^128   < 2^128
Block size      512       512       512       1024      1024
Word size       32        32        32        64        64
# of steps      80        64        64        80        80
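The digest and block sizes in the table can be cross-checked against Python's hashlib, which exposes them (in bytes) as the digest_size and block_size attributes of a hash object. The dictionary name sizes is an assumption made for the example.

```python
import hashlib

# Map each algorithm name to (digest size in bits, block size in bits).
sizes = {}
for name in ("sha1", "sha224", "sha256", "sha384", "sha512"):
    h = hashlib.new(name)
    sizes[name] = (h.digest_size * 8, h.block_size * 8)

if __name__ == "__main__":
    for name, (digest_bits, block_bits) in sizes.items():
        print(f"{name:8} digest={digest_bits:4} bits  block={block_bits:5} bits")
```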
Secure Hash Algorithm
• SHA-256 is used in several different parts of the Bitcoin network:
• Mining uses SHA-256 as the proof-of-work algorithm.
• SHA-256 is used in the creation of bitcoin addresses to improve security and
privacy.
Construction of Hash functions
• Hash functions are typically constructed from fixed-input-size compression functions!
• Example: See the construction of the SHA-256 hash function → SHA-256 is used in Bitcoin
• Also referred to as the Merkle-Damgård transform
• Why does it work?
• Theorem: If the compression function c is collision-free, then SHA-256 is collision-free.
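[Figure: Merkle-Damgård construction of SHA-256. The message, followed by padding (10* | length), is split into 512-bit blocks (block 1, block 2, ..., block n). Starting from the 256-bit IV, the compression function c takes the previous 256-bit value and the next 512-bit block and outputs a new 256-bit value; the output of the last c is the hash.]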
SHA-256…
• It takes the message you're hashing and breaks it up into blocks that
are 512 bits in size. The message size, in general, isn't necessarily a
multiple of the block size. To make it a multiple of the block size, padding is
appended (a 1 bit followed by a certain number of 0 bits, then the message length)
• You start with the 256-bit value called the IV, specified in the standards
document, and the first block. This 768-bit string goes through a special
function c (the compression function) that outputs a 256-bit string
• Then the compression function is applied to the concatenation of the first
output and the second block (this chaining is the Merkle-Damgård transform)
• The process is repeated until the last block; the hash is the final
256-bit output
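The chaining described above can be sketched in Python. Note the assumptions: toy_compress is only a stand-in that maps 768 bits to 256 bits (it is not the real SHA-256 compression function), the all-zero IV is a placeholder rather than the IV from the standard, and all names are chosen for the example. The padding follows the scheme described above: a 1 bit, then 0 bits, then the message length as a 64-bit number.

```python
import hashlib

BLOCK_BYTES = 64   # 512-bit message blocks
STATE_BYTES = 32   # 256-bit chaining value


def pad(message: bytes) -> bytes:
    """Append a 1 bit, then 0 bits, then the 64-bit message length in bits."""
    bit_length = len(message) * 8
    padded = message + b"\x80"                        # 1 bit followed by seven 0 bits
    padded += b"\x00" * ((-len(padded) - 8) % BLOCK_BYTES)
    return padded + bit_length.to_bytes(8, "big")


def toy_compress(state: bytes, block: bytes) -> bytes:
    """Stand-in for c: 256-bit state + 512-bit block -> 256-bit state."""
    return hashlib.sha256(state + block).digest()


def merkle_damgard(message: bytes, iv: bytes = bytes(STATE_BYTES)) -> bytes:
    """Chain the compression function over all padded blocks, starting from the IV."""
    state = iv
    padded = pad(message)
    for i in range(0, len(padded), BLOCK_BYTES):
        state = toy_compress(state, padded[i:i + BLOCK_BYTES])
    return state


if __name__ == "__main__":
    print(len(pad(b"abc")), "bytes after padding")
    print(merkle_damgard(b"abc").hex())
```

Because the stand-in differs from the real compression function and IV, the output does not equal the real SHA-256 of the message; the sketch only shows the structure of the construction.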
One Compression function in SHA-256
• One compression function in SHA-256 comprises
• a 256-bit block cipher with 64 rounds,
• a key expansion mechanism from 512 to 2048 bits, and
• a final set of eight 32-bit additions.
Application of SHA-256 in bitcoin
