Data Structure Hashing Report


Hashing

Data Structure
By: Andre L. Marquez
Hashing Definition

In computer science and data structures, hashing is the
process of converting a given key into an index of a hash
table. A hash table is a data structure that is used to store
and retrieve values associated with a particular key. The
hash function takes a key and returns a hash code, which is
then used as an index in the hash table to retrieve the
associated value.
The purpose of hashing is to provide a fast and efficient
way of accessing and storing data. When a key is hashed, it
is converted into a fixed-size value that can be used as an
index into an array, making it possible to quickly retrieve
data with a simple lookup. Hashing is commonly used in
data structures such as hash tables, dictionaries, and sets.
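
The idea above can be sketched in a few lines of Python. This is only a minimal illustration: the function name simple_hash, the multiplier 31, and the table size 8 are assumptions made for the example, and collisions are ignored here.

```python
def simple_hash(key: str, table_size: int) -> int:
    """Toy hash function (illustrative only): fold the character codes
    into one integer, then reduce it to a valid table index."""
    code = 0
    for ch in key:
        code = code * 31 + ord(ch)
    return code % table_size


table_size = 8
table = [None] * table_size

# Store a value at the index derived from its key (collisions ignored here).
index = simple_hash("apple", table_size)
table[index] = ("apple", 3)

# Retrieval repeats the same computation and then does one array lookup.
found = table[simple_hash("apple", table_size)]
print(index, found)
```

Because the same key always produces the same index, storing and retrieving each cost one hash computation plus one array access.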
DIFFERENT TYPES OF HASHING

There are several types of hashing algorithms, each with its own unique characteristics and
strengths. Here are some common types of hashing:
• Division hashing
• Multiplication hashing
• Universal hashing
• Cryptographic hashing
• Perfect hashing
Division Hashing
Division hashing is a simple hashing technique in which the hash code for
an input key is obtained by taking the remainder of the key divided by the
table size.

In other words, given a key k and a table of size m, the hash function h(k)
for division hashing is defined as:

h(k) = k % m

where the % symbol represents the modulo operator, which returns the
remainder of the division.

This method is easy to implement and can work well for many
applications, but it has some limitations. If the table size m is chosen
poorly, many keys collide and the hash table slows down. For example,
when m is a power of 2, h(k) depends only on the low-order bits of k. A
prime m that is not close to a power of 2 is a common choice. Division
hashing can also be vulnerable to intentional collisions if an attacker
knows the hash function being used.
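
As a small sketch of h(k) = k % m, the Python example below hashes the same keys with two table sizes. The keys and the sizes 10 and 7 are assumptions picked for illustration, to show how the choice of m affects collisions.

```python
def division_hash(k: int, m: int) -> int:
    """Division hashing: the remainder of k divided by the table size m."""
    return k % m


keys = [10, 20, 30, 40, 50]

# With m = 10, every key is a multiple of m, so all keys land in slot 0.
bad = [division_hash(k, 10) for k in keys]

# With the prime m = 7, the same keys spread over five different slots.
good = [division_hash(k, 7) for k in keys]

print(bad)   # [0, 0, 0, 0, 0]
print(good)  # [3, 6, 2, 5, 1]
```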
Multiplication Hashing
Multiplication hashing is a technique used in hashing that involves
multiplying the input key by a constant and then using the fractional part
of the product to obtain a hash code.

The general formula for multiplication hashing is:

h(k) = floor(m * ((k * A) mod 1))

where k is the input key, m is the size of the hash table, A is a constant
between 0 and 1, and floor() is the floor function.

A commonly cited choice for A, suggested by Knuth, is
(sqrt(5) - 1) / 2 = 0.6180339887..., which is the fractional part of the
golden ratio. The fractional part of the product, (k * A) mod 1, always
lies in the range from 0 (inclusive) to 1 (exclusive), so multiplying by m
and taking the floor yields an index from 0 to m - 1, which is within the
range of the hash table.

Multiplication hashing can work well for many applications. Its main
advantage over division hashing is that the table size m is not critical,
so m can be a power of 2. A direct floating-point implementation can be
slower than a single modulo operation and can lose precision for very
large keys, so practical implementations often use integer multiplication
and bit shifts instead. How evenly the keys are spread still depends on
the constant A, and finding a good value for a particular data set may
take some trial and error.
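
The formula can be tried directly in Python. This is a floating-point sketch of h(k) = floor(m * ((k * A) mod 1)), using A = (sqrt(5) - 1) / 2, about 0.618. The function name and the sample key and table size are assumptions for illustration.

```python
import math

# Constant A between 0 and 1; here (sqrt(5) - 1) / 2, about 0.6180339887.
A = (math.sqrt(5) - 1) / 2


def multiplication_hash(k: int, m: int, a: float = A) -> int:
    """Multiplication hashing: scale the fractional part of k * a by m."""
    fractional = (k * a) % 1.0        # value in the range [0, 1)
    return math.floor(m * fractional)  # index in the range 0 .. m - 1


m = 16384  # a power of 2 is fine for this method
print(multiplication_hash(123456, m))  # 67
print([multiplication_hash(k, 8) for k in range(1, 6)])
```

Floating-point arithmetic keeps the sketch close to the formula; for very large keys the product loses precision, which is one reason integer-only variants are often preferred in practice.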
Universal Hashing
Universal hashing is a technique that selects the hash function at
random from a family of hash functions. The goal is to keep the
expected number of collisions low for any input, even when the
input data is not known in advance or is chosen by an adversary.
A family of hash functions is said to be universal if, for any two
distinct keys, the probability that they collide when hashed with a
randomly chosen hash function from the family is at most 1/m,
where m is the number of slots in the hash table. In other words,
any pair of keys is no more likely to collide than if each key were
assigned a slot uniformly at random.
To achieve this, the hash function is chosen at random when the
table is created, and that same function is then used for every key
in the table. Because the choice is made at runtime, no fixed set of
keys is bad for every function in the family, so collisions stay few
in expectation regardless of the input.
Universal hashing can be used with various collision resolution
techniques, such as chaining or open addressing, to create efficient
hash tables that provide fast access to data.
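
One well-known universal family for integer keys is h(k) = ((a * k + b) mod p) mod m, where p is a prime larger than any key and a and b are chosen at random. The slides do not name a specific family, so the Python sketch below is an assumption for illustration, as are the prime P and the helper name make_universal_hash.

```python
import random

# Assumed prime larger than every key we intend to hash (2^31 - 1).
P = 2_147_483_647


def make_universal_hash(m: int, rng: random.Random):
    """Pick one function at random from the family
    h(k) = ((a * k + b) mod P) mod m, with 1 <= a < P and 0 <= b < P."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)

    def h(k: int) -> int:
        return ((a * k + b) % P) % m

    return h


# The function is chosen once, at table creation, then reused for all keys.
h = make_universal_hash(16, random.Random())
print([h(k) for k in (3, 19, 35, 51)])
```

The printed indices differ from run to run because a and b are random, but within one run the same key always maps to the same slot.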
Cryptographic Hashing

Cryptographic hashing is a technique used in computer science and
cryptography to transform data (such as a message or a file) into a fixed-size
output, called a hash, that is infeasible to reverse. The output of a cryptographic
hash function is often called a message digest or digital fingerprint.
Cryptographic hashing algorithms are designed to be one-way functions,
meaning that it is computationally infeasible to generate the original input data
from its hash value. This property makes cryptographic hashing useful for a
variety of security-related applications, such as data integrity verification,
password storage, and digital signatures.
A good cryptographic hash function should have several properties, including:
1. Deterministic: Given the same input data, the hash function should always
produce the same hash value.
2. Practically unique: Two different inputs should be overwhelmingly unlikely to
produce the same hash value. Collisions must exist because there are more
possible inputs than outputs, but they should not be seen in practice.
3. Fixed-length: The hash function should always produce a fixed-length output,
regardless of the size of the input data.
4. Non-invertible: It should be computationally infeasible to generate the
original input data from its hash value.
5. Collision-resistant: It should be computationally infeasible to find two
different input data that produce the same hash value.
Common cryptographic hashing algorithms include SHA-256 and SHA-3. MD5 is
also widely known, but practical collision attacks against it exist, so it
should not be used where security matters. These algorithms are used in
security-related applications such as digital signatures, message
authentication, and secure communications.
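
Some of these properties can be observed with Python's standard hashlib module. The sketch below uses SHA-256; the sample messages are assumptions for illustration.

```python
import hashlib

# Deterministic: hashing the same bytes twice gives the same digest.
digest1 = hashlib.sha256(b"hello world").hexdigest()
digest2 = hashlib.sha256(b"hello world").hexdigest()

# A one-character change in the input gives a completely different digest.
digest3 = hashlib.sha256(b"hello world!").hexdigest()

# Fixed-length: SHA-256 always yields 256 bits (64 hex characters),
# even for a much longer input.
digest4 = hashlib.sha256(b"x" * 100_000).hexdigest()

print(digest1 == digest2)              # True
print(digest1 == digest3)              # False
print(len(digest1), len(digest4))      # 64 64
```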
Perfect Hashing

Perfect hashing is a technique used in computer science and data structures to
create a hash table with no collisions. A perfect hash function maps each key
in a given set of keys to a distinct index in the hash table.

In a perfect hash table, the time required to search for an item is constant
in the worst case and independent of the size of the table. This makes perfect
hashing ideal for applications where the set of keys is known in advance and is
relatively static.

There are two main approaches to implementing perfect hashing:

1. Static perfect hashing: This approach is used when the set of keys is known
in advance and is fixed. A static perfect hash function is built ahead of time
by trying candidate hash functions from a family and keeping one that results
in no collisions for the given keys.

2. Dynamic perfect hashing: This approach is used when the set of keys can
change over time. It typically uses two levels: a first-level hash function
that may produce collisions, and a secondary hash table for each slot that
resolves them. The affected tables are rebuilt when an insertion causes a
collision.

Perfect hashing can provide a significant improvement in lookup performance
over traditional hashing techniques, especially when the hash table is large
and lookups are frequent, because no time is spent resolving collisions.
However, the process of generating a perfect hash function can be
computationally expensive, and in a static scheme the function must be
generated again whenever the set of keys changes.
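
For a small fixed key set, a static perfect hash function can be found by simple trial and error. The Python sketch below tries seeds for a toy seeded hash until no two keys share a slot. The helper names, the key list, the modulus 1_000_003, and the table size 8 are all assumptions for illustration; production tools use far more efficient constructions.

```python
def seeded_hash(key: str, seed: int, m: int) -> int:
    """Toy seeded string hash (illustrative only), reduced to 0 .. m - 1."""
    code = seed
    for ch in key:
        code = (code * 31 + ord(ch)) % 1_000_003
    return code % m


def find_perfect_seed(keys, m, max_seed=10_000):
    """Return the first seed that maps every key to a distinct slot,
    or None if no seed below max_seed works."""
    for seed in range(max_seed):
        slots = {seeded_hash(k, seed, m) for k in keys}
        if len(slots) == len(keys):   # no two keys collided
            return seed
    return None


keys = ["if", "else", "while", "for", "return"]
m = 8
seed = find_perfect_seed(keys, m)
print(seed, {k: seeded_hash(k, seed, m) for k in keys})
```

The search cost is paid once, up front. After that, every lookup is a single hash computation with no collision handling, which matches the trade-off described above.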
Hashing Techniques

There are two primary techniques for resolving collisions in a hash table:

• Open Hashing (Separate Chaining)
• Closed Hashing (Open Addressing)
Open Hashing/Separate Chaining

Separate chaining is one of the most widely used collision
resolution techniques in data structures, and it uses linked
lists. Each slot of the array holds a linked list known as a
chain, and all entries whose keys hash to the same index are
stored in that slot's chain. Separate chaining, also known as
open hashing or closed addressing, does not avoid collisions;
it resolves them by keeping colliding entries together in one
list.
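
A minimal separate-chaining table might look like the Python sketch below. As a simplifying assumption, each chain is a Python list rather than a hand-written linked list, and keys are integers hashed with key % size. The class and method names are illustrative.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining entries per slot."""

    def __init__(self, size: int = 8):
        # One (initially empty) chain per slot.
        self.buckets = [[] for _ in range(size)]

    def _index(self, key: int) -> int:
        return key % len(self.buckets)

    def put(self, key: int, value) -> None:
        chain = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:       # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))        # new key: add to the chain

    def get(self, key: int):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = ChainedHashTable(size=8)
table.put(3, "a")
table.put(11, "b")   # 11 % 8 == 3, so it joins the same chain as key 3
print(table.get(3), table.get(11), len(table.buckets[3]))  # a b 2
```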
Closed Hashing (Open Addressing)

Open addressing stores all entry records within the array
itself, as opposed to linked lists. The phrase 'open
addressing' refers to the notion that the hash value of an
item does not by itself determine its final location or
address. To insert a new entry, the hash index of the key is
computed first, and the array is then checked starting at that
index. If the slot at the hashed index is empty, the entry is
inserted there; otherwise, further slots are probed in
sequence until an empty slot is found.

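The order in which slots are examined is known as the probe
sequence. Different probe sequences vary the interval between
successive slots: linear probing steps one slot at a time,
quadratic probing uses steps that grow quadratically, and
double hashing takes the step size from a second hash
function.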
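
The insertion procedure can be sketched in Python as follows. This is a minimal example that assumes linear probing (one slot at a time), integer keys hashed with key % size, and no deletion support, since deleting from an open-addressing table needs extra care. The class and method names are illustrative.

```python
class LinearProbingTable:
    """Open-addressing hash table that probes one slot at a time."""

    def __init__(self, size: int = 8):
        self.slots = [None] * size   # every entry lives in this one array

    def _probe(self, key: int):
        """Yield slot indices starting at the hashed index, wrapping around."""
        m = len(self.slots)
        start = key % m
        for step in range(m):
            yield (start + step) % m

    def put(self, key: int, value) -> None:
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is None or slot[0] == key:   # empty slot or same key
                self.slots[idx] = (key, value)
                return
        raise RuntimeError("hash table is full")

    def get(self, key: int):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is None:          # an empty slot ends the search
                break
            if slot[0] == key:
                return slot[1]
        raise KeyError(key)


table = LinearProbingTable(size=8)
for k, v in [(3, "a"), (11, "b"), (19, "c")]:   # all three hash to index 3
    table.put(k, v)
print(table.slots[3], table.slots[4], table.slots[5])
```

Keys 11 and 19 both hash to index 3, which is already taken, so they end up in the next free slots, 4 and 5.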
Activity: Identification
1. ______: These are hash functions that are designed to be secure against malicious
attacks, such as intentional collisions or reverse engineering of the original data.

2. ______: This is a special type of hashing that guarantees no collisions will occur.

3. ______: This is the simplest form of hashing, in which the key is divided by the table
size, and the remainder is used as the hash code.

4. ______: This is a family of hash functions that are designed to minimize the number of
collisions. The hash function is chosen randomly from a family of functions, which ensures
that no particular set of keys will cause a large number of collisions.

5. ______: This type of hashing uses a multiplication operation to generate a hash code.


Activity Answer
1. Cryptographic Hashing: These are hash functions that are designed to be secure against
malicious attacks, such as intentional collisions or reverse engineering of the original data.

2. Perfect Hashing: This is a special type of hashing that guarantees no collisions will occur.

3. Division Hashing: This is the simplest form of hashing, in which the key is divided by the
table size, and the remainder is used as the hash code.

4. Universal Hashing: This is a family of hash functions that are designed to minimize the
number of collisions. The hash function is chosen randomly from a family of functions, which
ensures that no particular set of keys will cause a large number of collisions.

5. Multiplication Hashing: This type of hashing uses a multiplication operation to generate a
hash code.
