B4. Cryptographic Hash Functions

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 28

Cryptographic Hash

Functions

Rajesh Palit, Ph.D.


Electrical and Computer Engineering
North South University
Data Integrity and Message Authentication

• Encryption does not protect data from modification by another party.


• Why?
• Need a way to ensure that data arrives at destination in its original
form as sent by the sender and it is coming from an authenticated
source.
Hash Functions
• Hash functions form an unique image of a message, condenses
arbitrary message to fixed size, h = H(M)
• The hash can be applied to any size of message to produce a fixed size
output
• The hash is easy and efficient to compute but is computationally
infeasible to invert
• Usually assume hash function is public
• Cryptographic hash functions are hash functions with additional
security requirements
Main Idea
• Hash function H is a lossy compression
function
• Collision: H(x)=H(x’) for some inputs x≠x’
• H(x) should look “random”
• Every bit (almost) equally likely to be 0 or 1
• A cryptographic hash function must have
certain properties
Hash Function Properties
Given a function h: X Y, then we say that h is:
• Preimage resistant (one-way): If given y Y it is computationally
infeasible to find a value x X such that h(x) = y
• Second Preimage Resistant (weak collision resistant): If given x  X it
is computationally infeasible to find a value x’  X, such that x’ x and
h(x’) = h(x)
• Collision Resistant (strong collision resistant): If it is computationally
infeasible to find two distinct values x’, x  X, such that h(x’) = h(x)
One-Way Function
• Intuition: hash should be hard to invert
• “Preimage resistance”
• Given a random, it should be hard to find any x such that h(x)=y
• y is an n-bit string randomly chosen from the output space of the hash function, i.e.,
y=h(x) for some x
• How hard?
• Brute-force: try every possible x, see if h(x)=y
• SHA-1 (a common hash function) has 160-bit output
• Suppose we have hardware that can do 230 trials per sec
• Assuming 234 trials per second, can do 289 trials per year
• Will take 271 years to invert SHA-1 on a random image
Weak Collision Resistance
• Given a randomly chosen x, hard to find x’ such that h(x)=h(x’)
• Attacker must find collision for a specific x… by contrast, to break collision
resistance, enough to find any collision
• Brute-force attack requires O(2n) time
• Weak collision resistance does not imply collision resistance (why?)
Birthday Paradox
• n people The probability of having a “collision”
when n items are selected among k
• Suppose each birthday is a random number possible outcomes is:
taken from k days (k = 365) – how many 𝑃𝑛𝑘
possibilities? 𝑝 𝑘, 𝑛 = 1 − 𝑛
𝑘
• kn - samples with replacement The problem parallels the number of
• How many possibilities that all are different? people you need in a room to find two
• (k)n = k(k-1)…(k-n+1) - samples without replacement people with the same birthday – it
only requires 23 people (approx.
• Probability of no repetition?
𝒌! 365) to get better than 50% (0.507)
• (k)n/kn =
𝒌−𝒏 !𝒌𝒏 probability of a match. P(365,30) is ~
• Probability of repetition? 71%
𝑷𝒌𝒏
𝒑 𝒌, 𝒏 = 𝟏 − 𝒏 Hash functions are subject to the
𝒌
same square root attack
Collision Resistance
• Should be hard to find x ≠ x’ such that h(x)=h(x’)
• Birthday paradox
• Let T be the number of values x, x’, x”… we need to look at before finding the
first pair x ≠ x’ such that h(x)=h(x’)
• Assuming h is random, what is the probability that we find a repetition after
looking at T values?
• Total number of pairs? O(2m)
• m = number of bits in the output of hash function
• Conclusion: T  O(2m/2)
• Brute-force collision search is O(2m/2), not O(2m)
• For SHA-1, this means O(280) vs. O(2160)
Message Digest Size and Attempts
Probability of Hash Collisions
• Arbitrary length message → Fixed length hash
• Many messages will map to the same hash
• Given 1000 bit messages → 21000 messages
• 128 bit hash → 2128 possible hashes
• 21000/2128 = 2872 messages/hash value
• n-bit hash → Need 2n/2 tries on average to find two messages with
same hash
• 64 bit hash → 232 tries (feasible)
• 128 bit hash → 264 tries (not feasible)
Usages of Cryptographic Hash Functions
• Software integrity e.g., tripwire
• Timestamping
• How to prove that you have discovered a secret on an earlier
date without disclosing it?
• Message authentication
• Password Storage, One-time passwords
• Hash of the user’s password, is compared with that in the
storage. Hackers can not get password from storage
• Digital signature - Encrypt hash with private key
• Pseudo Random Number Generation – Hash
an IV
Hashing vs. Encryption
 Hashing is one-way. There is no “un-hashing”!
 A ciphertext can be decrypted with a decryption key… hashes have no
equivalent of “decryption”
 Hash(x) looks “random”, but can be compared for equality with
Hash(x’)
 Hash the same input twice  same hash value
 Encrypt the same input twice  different ciphertexts
 Cryptographic hashes are also known as “cryptographic
checksums” or “message digests”
Types of Hash Function
• Keyed – usually based on a • DES can be used in Cipher Block
cipher – users must exchange Chaining (CBC) mode to produce
secret key in order to a digest of the message
authenticate – Message • This requires the exchange of a
Authentication Codes
secret key between users
• Un-keyed – no keys required –
anyone can generate or verify
– Cryptographic Hash
Functions
Well Known Hash Functions
• MD5
– output 128 bits
– collision resistance completely broken by researchers in China in 2004
• SHA1
– output 160 bits
– no collision found yet, but method exist to find collisions in less than 2^80
– considered insecure for collision resistance
– one-wayness still holds
• SHA2 (SHA-224, SHA-256, SHA-384, SHA-512)
– outputs 224, 256, 384, and 512 bits, respectively
– No real security concerns yet
SHA-3
 Wang et.al. showed a method of easily finding collisions in MD5 and
SHA-0
 SHA-1 not yet "broken”
 but similar to broken MD5 & SHA-0
 so considered insecure
 SHA-2 (esp. SHA-512) seems secure
 shares same structure and mathematical operations as predecessors so have
concern
 NIST announced in 2007 a competition for the SHA-3 next gen NIST
hash function
 Keccak winner Oct 2012 – standard in Q2, 2014
Message Digest (MD5)
• Introduced in 1992 by Ron Rivest (RSA fame)
• Un-keyed hash based on MD4
• Processes 512 bit input blocks – 128 bit output hash
• Has four rounds with 16 steps in each round
• Total 64 steps

North South University north south university


daab1263c3fc3ea2a62dd6cf864b8929 99ecc9f7958ede4f1e9144d655c9262a
Message Processing Block by Block
Operations on Each Message Block
MD5: Step 1 & 2
• Step 1. Appending Padding Bits. The original message is "padded"
(extended) so that its length (in bits) is congruent to 448, modulo 512. The
padding rules are:
• The original message is always padded with one bit "1" first.
• Then zero or more bits "0" are padded to bring the length of the message up to 64 bits fewer than a
multiple of 512.
• Step 2. Appending Length. 64 bits are appended to the end of the padded
message to indicate the length of the original message in bytes. The rules
of appending length are:
• The length of the original message in bytes is converted to its binary format of 64 bits. If overflow
happens, only the low-order 64 bits are used.
• Break the 64-bit length into 2 words (32 bits each).
• The low-order word is appended first and followed by the high-order word.
MD5: Step 3 & 4
• Step 3. Initializing MD Buffer. MD5 algorithm requires a 128-bit buffer
with a specific initial value. The rules of initializing buffer are:
• The buffer is divided into 4 words (32 bits each), named as A, B, C, and D.
• Word A is initialized to: 0x67452301.
• Word B is initialized to: 0xEFCDAB89.
• Word C is initialized to: 0x98BADCFE.
• Word D is initialized to: 0x10325476.
• Step 4. Processing Message in 512-bit Blocks. This is the main step of
MD 5 algorithm, which loops through the padded and appended
message in blocks of 512 bits each. For each input block, 4 rounds of
operations are performed with 16 operations in each round.
for i from 0 to 15 do
Auxiliary Functions func := F (Round 1) or G or H or I (Round 4)
X := func + A + M[i] + K[i]
A := D
D := C
• F(B,C,D) = (B AND C ) OR (NOT B AND D) C := B
• G(B,C,D) = (B AND D ) OR (C AND NOT D) B := B + leftrotate(X, s[round,i])
• H(B,C,D) = B XOR C XOR D end for
• I(B,C,D) = C XOR (B OR NOT D)

• R1(F, A, B, C, D, M[i] message block, # of left shifts, si, i)


• R2(G, A, B, C, D, M[i] message block, # of left shifts, si, i)
• R3(H, A, B, C, D, M[i] message block, # of left shifts, si, i)
• R4(I, A, B, C, D, M[i] message block, # of left shifts, si, i)
• 64 values of K

• Step 5. Output. The contents in buffer words A, B, C, D are returned in sequence with low-order
byte first.
History of MD5 Collisions
• 2004: first collision attack
• The only difference between colliding messages is 128 random-looking bytes
• 2007: chosen-prefix collisions
• For any prefix, can find colliding messages that have this prefix and differ up
to 716 random-looking bytes
• 2008: rogue SSL certificates
• Talk about this in more detail when discussing PKI
• 2012: MD5 collisions used in cyberwarfare
• Flame malware uses an MD5 prefix collision to fake a Microsoft digital code
signature
Secure Hash Algorithm (SHA)
• Developed by NIST and released as FIPS, PUB
180 in 1993
• Based on MD4 structure. Similar to MD5 –
messages are padded to multiple of 512 bits
• Produces 160 bit has value
• Every bit of output depends on every bit of
input
• Very important property for collision-resistance
• Brute-force inversion requires 2160 ops, birthday
attack on collision resistance requires 280 ops
• Some weaknesses discovered in 2005
• Collisions can be found in 263 ops
SHA1 algorithm consists of 6 tasks:
• Task 1. Appending Padding Bits. The original message is "padded"
(extended) so that its length (in bits) is congruent to 448, modulo 512. The
padding rules are:
• The original message is always padded with one bit "1" first.
• Then zero or more bits "0" are padded to bring the length of the message up to 64
bits fewer than a multiple of 512.
• Task 2. Appending Length. 64 bits are appended to the end of the padded
message to indicate the length of the original message in bytes. The rules
of appending length are:
• The length of the original message in bytes is converted to its binary format of 64
bits. If overflow happens, only the low-order 64 bits are used.
• Break the 64-bit length into 2 words (32 bits each).
• The low-order word is appended first and followed by the high-order word.
SHA1 Steps
• Task 3. Preparing Processing Functions. SHA1 requires 80 processing
functions defined as:
f(t;B,C,D) = (B AND C) OR ((NOT B) AND D) ( 0 <= t <= 19)
f(t;B,C,D) = B XOR C XOR D (20 <= t <= 39)
f(t;B,C,D) = (B AND C) OR (B AND D) OR (C AND D) (40 <= t <= 59)
f(t;B,C,D) = B XOR C XOR D (60 <= t <= 79)

• Task 4. Preparing Processing Constants. SHA1 requires 80 processing


constant words defined as:
K(t) = 0x5A827999 ( 0 <= t <= 19)
K(t) = 0x6ED9EBA1 (20 <= t <= 39)
K(t) = 0x8F1BBCDC (40 <= t <= 59)
K(t) = 0xCA62C1D6 (60 <= t <= 79)
SHA1 Steps
• Task 5. Initializing Buffers. SHA1 algorithm requires 5 word buffers with the
following initial values:
H0 = 0x67452301
H1 = 0xEFCDAB89
H2 = 0x98BADCFE
H3 = 0x10325476
H4 = 0xC3D2E1F0
• Task 6. Processing Message in 512-bit Blocks. This is the main task of SHA1
algorithm, which loops through the padded and appended message in blocks
of 512 bits each. For each input block, a number of operations are
performed. This task can be described in the following pseudo code slightly
modified from the RFC 3174's method 1:
• Input and predefined functions:
M[1, 2, ..., N]: Blocks of the padded and appended message
f(0;B,C,D), f(1,B,C,D), ..., f(79,B,C,D): Defined as above
K(0), K(1), ..., K(79): Defined as above
H0, H1, H2, H3, H4: Word buffers with initial values

You might also like