UNIT-04 Hashing
What is Hashing in Data Structures?
In this technique, we give an input called a key to the hash function. The function uses this key
to compute a hash value, which serves as the index of a slot in the hash table. The data
associated with the key is then stored at, or retrieved from, that index.
Hashing maps data of arbitrary size, such as an integer or a string of characters, to a shorter,
fixed-length value for quicker access. This is how key-value pairs are stored in hash tables.
Hashing involves three components:
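• Hash key: It is the data you want to be hashed in the hash table. The hashing algorithm
translates the key to the hash value. This identifier can be a string or an integer. The
word "key" is also used in cryptography, where it means something different from a
hash-table key:
1. Public key - The openly shared half of an asymmetric key pair, used to encrypt
data or to verify signatures.
2. Private key - The secret half of an asymmetric key pair, used to decrypt data or
to create signatures. A symmetric key, by contrast, is a single shared secret used
for both encryption and decryption.
3. SSH key pair - SSH authentication uses a pair made up of a public key and a
private key.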
• Hash Function: It performs the mathematical operation of accepting the key value as
input and producing the hash code or hash value as the output. Some of the
characteristics of an ideal hash function are as follows:
o It must be deterministic: the same hash key always produces the same hash value.
o It must be collision-resistant: different inputs should rarely produce the same hash
code. Because the output has a fixed length, collisions cannot be eliminated, only
made unlikely.
o A small change in the input should lead to a drastic change in the output.
o The calculation must be quick.
• Hash Table: It is a type of data structure that stores data in an array format. The table
maps keys to values using a hash function.
Use cases of Hashing In DSA
• Password Storage: Hash functions are commonly used to securely store passwords.
Instead of storing the actual passwords, the system stores their hash values. When a
user enters a password, it is hashed and compared with the stored hash value for
authentication.
• Data Integrity: Hashing is used to ensure data integrity by generating hash values for
files or messages. By comparing the hash values before and after transmission or
storage, it's possible to detect if any changes or tampering occurred.
• Data Retrieval: Hashing is used in data structures like hash tables, which provide
efficient data retrieval based on key-value pairs. The hash value serves as an index to
store and retrieve data quickly.
• Digital Signatures: Hash functions are an integral part of digital signatures. They are
used to generate a fixed-length digest of a message, which is then signed with the
signer's private key. Anyone can then verify the authenticity and integrity of the
message using the signer's public key.
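The data integrity use case above can be sketched with Python's standard hashlib module. This is a minimal illustration, not a complete protocol: the helper name sha256_hex and the two messages are made up for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical messages: the second differs from the first by one character.
original = b"transfer 100 to account 42"
tampered = b"transfer 900 to account 42"

# The digest is computed once and stored before transmission.
stored_digest = sha256_hex(original)

print(sha256_hex(original) == stored_digest)  # True: data is unchanged
print(sha256_hex(tampered) == stored_digest)  # False: data was modified
```

For password storage, a plain fast hash like this is usually not enough on its own; dedicated password-hashing schemes add a salt and deliberate slowness.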
Types of Hash Functions
The primary types of hash functions are:
1. Division Method.
2. Mid Square Method.
3. Folding Method.
4. Multiplication Method.
1. Division Method
The easiest and quickest way to create a hash value is through division. The key k is
divided by M in this hash function, and the remainder is used as the hash value.
Formula:
h(k) = k mod M
(where k = key value and M = the size of the hash table)
Advantages:
• This method works for any table size M, although some choices of M distribute keys
better than others.
• The division strategy only requires one operation, thus it is quite quick.
Disadvantages:
• Consecutive keys map to consecutive hash values, so patterns in the keys carry over
into the table and can lead to clustering and poor performance.
• M must be chosen with care. A prime number that is not close to a power of two is a
common choice.
Example of Division Method
k = 1987
M = 13
h(1987) = 1987 mod 13
h(1987) = 11 (since 1987 = 13 x 152 + 11)
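The division method can be sketched in a few lines of Python. The function name division_hash is an assumption made for illustration; the values are those of the example above.

```python
def division_hash(key: int, table_size: int) -> int:
    """Division method: the hash value is the remainder of key / table_size."""
    return key % table_size


# 1987 = 13 * 152 + 11, so the remainder is 11.
print(division_hash(1987, 13))  # 11
```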
2. Mid Square Method
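The following steps are required to calculate this hash method:
• Square the value of k, that is, compute k x k.
• Take the middle r digits of the result as the hash value.
Formula:
h(k) = middle r digits of (k x k)
(where k = key value and r = number of digits kept, chosen to suit the table size)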
Advantages:
• This technique works well because most or all of the digits in the key value affect the
result. All of the necessary digits participate in a process that results in the middle digits
of the squared result.
• The result is not dominated by the top or bottom digits of the initial key value.
Disadvantages:
• The size of the key is a limitation: if the key is large, its square has roughly twice
as many digits and becomes expensive to compute and store.
• Collisions can still occur frequently, since different keys may share the same middle
digits.
Example of Mid Square Method
k = 60
k x k = 60 x 60 = 3600
Taking the middle two digits (r = 2) of 3600:
h(60) = 60
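A minimal sketch of the mid-square method follows. The function name mid_square_hash is hypothetical, and the rule for choosing the window when the middle cannot be centred exactly (it leans left here) is an assumption, since the text does not specify one.

```python
def mid_square_hash(key: int, r: int) -> int:
    """Mid-square method: square the key and keep the middle r digits."""
    squared = str(key * key)
    # Start of the r-digit window. When the window cannot be centred
    # exactly, this sketch leans towards the left (an assumption).
    start = max(0, (len(squared) - r) // 2)
    return int(squared[start:start + r])


# 60 * 60 = 3600, and the middle two digits of "3600" are "60".
print(mid_square_hash(60, 2))  # 60
```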
3. Folding Method
The process involves two steps:
• Divide the key value k into a predetermined number of parts k1, k2, k3, ..., kn, each
with the same number of digits, except for the last part, which may have fewer digits
than the others.
• Add the parts together. The hash value is the sum, ignoring the final carry, if any.
Formula:
k = k1, k2, k3, k4, ..., kn
s = k1 + k2 + k3 + k4 + ... + kn
h(k) = s
(where s = sum of the parts of key k)
Advantages:
• Produces a hash value simply, by splitting the key into equal-sized segments and
adding them.
• Does not depend on how the keys are distributed in the hash table.
Disadvantages:
• Efficiency suffers when many keys collide, which happens, for example, when keys
contain the same parts in a different order.
Example of Folding Method
k = 678912
k1 = 67; k2 = 89; k3 = 12
Therefore,
s = k1 + k2 + k3
s = 67 + 89 + 12
s = 168
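The folding steps can be sketched as below, using the parts 67, 89 and 12 from the example. The function name folding_hash and its part_digits parameter are assumptions for illustration. The sketch returns the plain sum s, as the formula does; dropping the final carry or reducing the sum to the table size is left out.

```python
def folding_hash(key: int, part_digits: int) -> int:
    """Folding method: split the key's digits into parts and add the parts."""
    digits = str(key)
    # Cut the digit string into chunks of part_digits digits each;
    # the last chunk may be shorter than the others.
    parts = [
        int(digits[i:i + part_digits])
        for i in range(0, len(digits), part_digits)
    ]
    return sum(parts)


# The key 678912 splits into 67, 89 and 12, which add up to 168.
print(folding_hash(678912, 2))  # 168
```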
4. Multiplication Method
• Choose a constant value A, where 0 < A < 1.
• Multiply the key value k by A.
• Take the fractional part of kA.
• Multiply the outcome of the preceding step by M, the hash table's size, and take the
floor of the result.
Formula:
h(k) = floor(M (kA mod 1))
(Where, M = size of the hash table, k = key value and A = constant value)
Advantages:
• It works with any value of A between 0 and 1, although some values yield better
outcomes than others. Knuth suggests A ≈ (√5 - 1)/2 ≈ 0.618.
Disadvantages:
• It is best suited to table sizes that are a power of two, where the index can be
computed quickly with integer arithmetic. For other table sizes it relies on fractional
(floating-point) arithmetic.
Example of Multiplication Method
k = 5678
A = 0.6829
M = 200
Now, calculating the value of h(5678):
h(5678) = floor[200(5678 x 0.6829 mod 1)]
h(5678) = floor[200(3877.5062 mod 1)]
h(5678) = floor[200(0.5062)]
h(5678) = floor[101.24]
h(5678) = 101
So, h(5678) is 101.
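The formula can be checked with a short Python sketch. The function name multiplication_hash is an assumption, and the sketch uses ordinary floating-point arithmetic. For the example values, 5678 x 0.6829 = 3877.5062, whose fractional part is 0.5062, and 200 x 0.5062 = 101.24.

```python
import math


def multiplication_hash(key: int, a: float, table_size: int) -> int:
    """Multiplication method: floor(M * frac(k * A)) with 0 < A < 1."""
    fractional = (key * a) % 1.0  # fractional part of k * A
    return math.floor(table_size * fractional)


print(multiplication_hash(5678, 0.6829, 200))  # 101
```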
Key Terms in Hashing
It is important to know some key terminologies in hashing before going any further:
Term | Description
Key | The unique input value (like an ID or name) used to identify data.
Hash Function | A function that converts a key into a hash code or index.
Hash Table | A data structure that stores key-value pairs based on hash codes.
Hash Code / Hash Value | The numeric value generated by the hash function.
Collision | When two keys map to the same hash value.
Load Factor | The ratio of stored elements to the size of the hash table. It helps decide when to resize or rehash.
Collision Resolution Techniques:
1) Separate Chaining
The idea behind Separate Chaining is to make each cell of the hash table point to a linked list
of records that have the same hash function value. Chaining is simple but requires additional
memory outside the table.
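Example: Given the hash function and elements below, insert the elements into the hash
table, using separate chaining to resolve collisions.
Hash function = key % 5,
Elements = 12, 15, 22, 25 and 37.
The step-by-step solution is as follows:
Step 1: 12 % 5 = 2, so 12 is placed in the list at index 2.
Step 2: 15 % 5 = 0, so 15 is placed in the list at index 0.
Step 3: 22 % 5 = 2. Index 2 already holds 12, so 22 is appended to its list: 12 -> 22.
Step 4: 25 % 5 = 0. Index 0 already holds 15, so 25 is appended to its list: 15 -> 25.
Step 5: 37 % 5 = 2. 37 is appended to the list at index 2: 12 -> 22 -> 37.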
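The separate chaining example can be traced with a short Python sketch. The function name build_chained_table is an assumption, and Python lists stand in for the linked lists.

```python
def build_chained_table(keys, size):
    """Insert keys into a table of `size` chains using key % size."""
    # One chain per slot; a Python list stands in for a linked list.
    table = [[] for _ in range(size)]
    for key in keys:
        table[key % size].append(key)  # colliding keys share a chain
    return table


table = build_chained_table([12, 15, 22, 25, 37], 5)
for index, chain in enumerate(table):
    print(index, chain)
# 0 [15, 25]
# 1 []
# 2 [12, 22, 37]
# 3 []
# 4 []
```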
2) Open Addressing
In open addressing, all elements are stored in the hash table itself. Each table entry contains
either a record or NIL. When searching for an element, we examine the table slots one by
one until the desired element is found or it is clear that the element is not in the table.
2.a) Linear Probing
In linear probing, the hash table is searched sequentially, starting from the original hash
location. If that location is already occupied, we check the next location, wrapping around
to the start of the table when the end is reached.
Algorithm:
1. Calculate the hash index, i.e. key = data % size
2. If hashTable[key] is empty,
• store the value directly: hashTable[key] = data
3. If hashTable[key] already holds a value,
• move to the next index using key = (key+1) % size
4. If hashTable[key] at the new index is empty, store the value there.
Otherwise, try the next index.
5. Repeat the above process until an empty slot is found.
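Example: Consider the simple hash function “key mod 5” and the sequence of keys
50, 70, 76, 85, 93 to be inserted.
Step 1: Start with an empty hash table of size 5 (slots 0 to 4).
Step 2: 50 mod 5 = 0. Slot 0 is empty, so 50 is stored in slot 0.
Step 3: 70 mod 5 = 0. Slot 0 is occupied, so 70 is stored in the next free slot, slot 1.
Step 4: 76 mod 5 = 1. Slot 1 is occupied, so 76 is stored in slot 2.
Step 5: 85 mod 5 = 0. Slots 0, 1 and 2 are occupied, so 85 is stored in slot 3.
Step 6: 93 mod 5 = 3. Slot 3 is occupied, so 93 is stored in slot 4.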
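The linear probing algorithm and the example keys can be sketched as follows. The function name linear_probe_insert is an assumption. The sketch uses None for an empty slot and stops with an error after one full pass, a guard the algorithm above leaves implicit.

```python
def linear_probe_insert(table, data):
    """Insert data using linear probing; return the slot that was used."""
    size = len(table)
    key = data % size  # original hash location
    for _ in range(size):  # visit each slot at most once
        if table[key] is None:
            table[key] = data
            return key
        key = (key + 1) % size  # occupied: move on, wrapping around
    raise RuntimeError("hash table is full")


table = [None] * 5
for value in [50, 70, 76, 85, 93]:
    linear_probe_insert(table, value)
print(table)  # [50, 70, 76, 85, 93]
```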
2.b) Quadratic Probing
Quadratic probing is an open addressing scheme in computer programming for resolving hash
collisions in hash tables. Quadratic probing operates by taking the original hash index and
adding successive values of an arbitrary quadratic polynomial until an open slot is found.
An example sequence using quadratic probing is:
H + 1^2, H + 2^2, H + 3^2, H + 4^2, ..., H + k^2
In this method, the i-th iteration probes the slot i^2 positions after the original hash
location, for i = 0, 1, ..., n - 1. Despite the squaring, this is a collision resolution scheme
and is distinct from the mid-square hash function. We always start from the original hash
location and check the other slots only if that location is occupied.
Let hash(x) be the slot index computed using the hash function and n be the size of the hash
table.
If the slot hash(x) % n is full, then we try (hash(x) + 1^2) % n.
If (hash(x) + 1^2) % n is also full, then we try (hash(x) + 2^2) % n.
If (hash(x) + 2^2) % n is also full, then we try (hash(x) + 3^2) % n.
This process is repeated for successive values of i until an empty slot is found.
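Example: Consider table size = 7, hash function Hash(x) = x % 7 and collision
resolution strategy f(i) = i^2. Insert 22, 30, and 50.
Step 1: Start with an empty hash table of size 7 (slots 0 to 6).
Step 2: 22 % 7 = 1. Slot 1 is empty, so 22 is stored in slot 1.
Step 3: 30 % 7 = 2. Slot 2 is empty, so 30 is stored in slot 2.
Step 4: 50 % 7 = 1. Slot 1 is occupied, so we try (1 + 1^2) % 7 = 2, which is also
occupied, and then (1 + 2^2) % 7 = 5, which is empty. 50 is stored in slot 5.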
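A minimal sketch of quadratic probing for this example is shown below. The function name quadratic_probe_insert is an assumption, and None marks an empty slot. Quadratic probing may fail to reach every slot, so the sketch gives up after n probes.

```python
def quadratic_probe_insert(table, x):
    """Insert x using quadratic probing; return the slot that was used."""
    n = len(table)
    home = x % n  # original hash location
    for i in range(n):
        slot = (home + i * i) % n  # i-th probe: offset i^2 from home
        if table[slot] is None:
            table[slot] = x
            return slot
    raise RuntimeError("no free slot found along the probe sequence")


table = [None] * 7
for x in [22, 30, 50]:
    quadratic_probe_insert(table, x)
print(table)  # [None, 22, 30, None, None, 50, None]
```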
2.c) Double Hashing
Double hashing is a collision resolution technique for open-addressed hash tables. Double
hashing makes use of two hash functions:
• The first hash function is h1(k), which takes the key and gives a location in
the hash table. If that location is empty, we place the key there.
• If the location is occupied (a collision), we use the secondary hash function
h2(k) in combination with the first hash function h1(k) to find a new
location in the hash table.
This combination of hash functions is of the form
h(k, i) = (h1(k) + i * h2(k)) % n
where
• i is a non-negative integer that indicates a collision number,
• k = element/key which is being hashed
• n = hash table size.
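Complexity of the Double hashing algorithm:
Time complexity: O(n) per operation in the worst case, where n is the table size. With a
good pair of hash functions and a low load factor, the average case is O(1).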
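Example: Insert the keys 27, 43, 692, 72 into a hash table of size 7, where the first hash
function is h1(k) = k mod 7 and the second hash function is h2(k) = 1 + (k mod 5).
Step 1: Start with an empty hash table of size 7 (slots 0 to 6).
Step 2: h1(27) = 27 mod 7 = 6. Slot 6 is empty, so 27 is stored in slot 6.
Step 3: h1(43) = 43 mod 7 = 1. Slot 1 is empty, so 43 is stored in slot 1.
Step 4: h1(692) = 692 mod 7 = 6. Slot 6 is occupied. h2(692) = 1 + (692 mod 5) = 3, so
we try (6 + 1 * 3) % 7 = 2, which is empty. 692 is stored in slot 2.
Step 5: h1(72) = 72 mod 7 = 2. Slot 2 is occupied. h2(72) = 1 + (72 mod 5) = 3, so
we try (2 + 1 * 3) % 7 = 5, which is empty. 72 is stored in slot 5.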
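The double hashing example can be traced with the sketch below. The function name double_hash_insert is an assumption, and the second hash function 1 + (k mod 5) is hard-coded to match the example rather than being a general choice.

```python
def double_hash_insert(table, k):
    """Insert k using h(k, i) = (h1(k) + i * h2(k)) % n; return the slot."""
    n = len(table)
    h1 = k % n        # first hash function: k mod n
    h2 = 1 + (k % 5)  # second hash function from the example
    for i in range(n):
        slot = (h1 + i * h2) % n
        if table[slot] is None:
            table[slot] = k
            return slot
    raise RuntimeError("no free slot found along the probe sequence")


table = [None] * 7
for k in [27, 43, 692, 72]:
    double_hash_insert(table, k)
print(table)  # [None, 43, 692, None, None, 72, 27]
```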
1. Load Density / Load Factor (α)
Definition:
Load factor (also called load density) is a measure that shows how full the hash table is.
α = n / m
Where:
• n = number of records (keys) actually stored in the table
• m = total number of buckets (or slots) in the hash table
Example:
If a hash table has 10 slots and 7 keys are stored, then α = 7 / 10 = 0.7.
Interpretation:
• A higher load factor means the table is fuller → more collisions likely.
• A lower load factor means the table has more empty spaces → fewer collisions but
uses more memory.
Ideal range:
For open addressing methods (like linear probing), α is typically kept below about 0.7.
For chaining methods, it can be greater than 1 since each bucket can hold multiple elements
(linked list).
2. Full Table
A full table means that all the slots in the hash table are occupied.
• This situation occurs when the load factor = 1 (for open addressing, also called closed
hashing).
• Once the table is full, no new element can be inserted without enlarging the table and
rehashing.
Example:
If a hash table has 10 slots and all 10 are filled, it is a full table.
When a new key comes, the system must either:
• Increase table size, or
• Rehash using a new hash function.
3. Rehashing
Definition:
Rehashing is the process of creating a new, larger hash table and re-inserting all elements
from the old table into the new one using a new hash function.
Purpose:
To reduce collisions and improve efficiency when the table becomes too full.
Steps:
1. Detect that the load factor has crossed a threshold (e.g., α > 0.7).
2. Create a new table with larger size (often double the old one).
3. Use a new hash function (or modify the old one).
4. Insert all elements from the old table into the new one.
Example:
Old hash table size = 10, α = 0.8 → rehashing triggered
New table size = 20
Recalculate hash positions and insert all existing elements again.
Benefits:
• Reduces collisions
• Improves search, insert, and delete performance.
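The rehashing steps can be sketched with a small chained hash table. The class name ChainedHashTable, its methods and the 0.7 threshold default are assumptions for illustration. The sketch keeps key % size as the hash function and only changes the table size, which corresponds to the "modify the old one" option in step 3.

```python
class ChainedHashTable:
    """Chained hash table that doubles in size when the load factor exceeds a threshold."""

    def __init__(self, size=10, max_load=0.7):
        self.buckets = [[] for _ in range(size)]
        self.count = 0
        self.max_load = max_load

    def load_factor(self):
        # alpha = n / m: stored keys divided by number of buckets
        return self.count / len(self.buckets)

    def insert(self, key):
        self.buckets[key % len(self.buckets)].append(key)
        self.count += 1
        if self.load_factor() > self.max_load:  # step 1: threshold crossed
            self._rehash()

    def _rehash(self):
        old_keys = [key for bucket in self.buckets for key in bucket]
        # Step 2: create a new table with double the size.
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        # Steps 3 and 4: re-insert every key using the new table size.
        for key in old_keys:
            self.buckets[key % len(self.buckets)].append(key)


table = ChainedHashTable(size=10)
for key in range(8):  # the 8th insert gives alpha = 0.8 > 0.7
    table.insert(key)
print(len(table.buckets), table.load_factor())  # 20 0.4
```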
4. Properties of a Good Hash Function
A good hash function should satisfy the following properties:
Property | Description
Uniform Distribution | Keys should be distributed uniformly across all table slots to minimize collisions.
Deterministic | The same key must always produce the same hash value.
Efficiency | Should be fast and simple to compute.
Less Collision | Should minimize cases where different keys map to the same index.
Use All Key Parts | All parts of the input key should affect the hash value to avoid patterns.
Independent of Patterns | Similar keys (e.g., 1001 and 1002) should not necessarily hash to adjacent slots.