CSC 3XX - Lecture Notes: Hashing and Hash Tables
Topic: Hashing and Hash Tables
Date: October 26, 2023
1. Introduction to Hashing
Hashing is a technique used to convert a large key (e.g., a string, a large integer, an object)
into a small, fixed-size integer value, which typically serves as an index in an array. The
primary goal of hashing is to enable very efficient data storage and retrieval operations,
ideally achieving constant average time complexity (O(1)).
The core data structure that leverages hashing is called a Hash Table (also known as a Hash
Map or Dictionary). It's an array-based data structure that stores key-value pairs, allowing for
quick lookups, insertions, and deletions based on the key.
Why Hashing? Consider a scenario where you need to store and quickly retrieve information
about millions of users based on their unique email addresses. If you used a simple array or
linked list, searching would be O(N). A balanced binary search tree could achieve O(log N).
Hashing aims to beat even O(log N) on average.
2. Key Concepts
2.1. The Hash Function
A hash function is the heart of any hash table. It takes an input key and returns an integer,
which is then mapped to an index within the hash table's underlying array.
Properties of a Good Hash Function:
1. Deterministic: The same input key must always produce the same hash value.
2. Fast Computation: The function should be efficient to compute, as it's called for every
operation (insert, search, delete).
3. Uniform Distribution: It should distribute keys as evenly as possible across the entire
range of possible hash values. This minimizes collisions and ensures good performance.
4. Low Collision Rate: While collisions are inevitable, a good hash function minimizes
their frequency.
Example Hash Functions:
• Simple Modulo (for integers): If keys are integers, a common approach is
hash_value = key % M, where M is the size of the hash table's array.
○ Problem: If keys have a common factor with M, or are clustered, distribution can
be poor.
○ Better: Choose M as a prime number.
• For Strings (Polynomial Rolling Hash): hash(S) = (S[0] * p^(k-1) + S[1]
* p^(k-2) + ... + S[k-1] * p^0) % M where S is the string, k is its length, p
is a prime number (e.g., 31 or 37 for lowercase English letters), and M is the table size.
This treats the string as a number in base p.
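As a concrete illustration, here is a small Python sketch of the polynomial rolling hash above, using Horner's rule so that p^k never has to be computed explicitly (the choices p = 31 and M = 101 are illustrative, not prescribed by these notes):

```python
def poly_hash(s: str, p: int = 31, M: int = 101) -> int:
    """Polynomial rolling hash: treat s as a number in base p, modulo M."""
    h = 0
    for ch in s:
        # Horner's rule: h accumulates S[0]*p^(k-1) + ... + S[k-1]*p^0 (mod M)
        h = (h * p + ord(ch)) % M
    return h
```

Because the reduction modulo M happens at every step, intermediate values stay small even for very long strings.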
2.2. The Hash Table Structure
A hash table fundamentally consists of:
1. An Array (Buckets): This is the main storage space. Each position in the array is called
a "bucket" or "slot."
2. A Mechanism for Collision Resolution: Since multiple keys can map to the same
bucket (a collision), a strategy is needed to handle this.
Basic Operations and Time Complexity:
• insert(key, value):
1. Compute index = hash_function(key) % M.
2. Place the (key, value) pair at index, handling collisions.
○ Average: O(1)
○ Worst: O(N) (if all keys hash to the same bucket)
• search(key):
1. Compute index = hash_function(key) % M.
2. Look for key at index, handling collisions.
○ Average: O(1)
○ Worst: O(N)
• delete(key):
1. Compute index = hash_function(key) % M.
2. Find and remove key at index, handling collisions.
○ Average: O(1)
○ Worst: O(N)
3. Collision Resolution Strategies
Collisions are inevitable due to the Pigeonhole Principle (mapping a larger set of possible
keys to a smaller set of array indices). Two main approaches exist:
3.1. Separate Chaining
In separate chaining, each bucket in the hash table array doesn't directly store an item, but
rather a reference to a data structure (typically a linked list, but can be a dynamic array or
even another hash table) that holds all key-value pairs that hash to that specific bucket.
• How it works:
1. To insert(key, value): Compute index = hash(key) % M. Add the
(key, value) pair to the linked list at table[index].
2. To search(key): Compute index. Traverse the linked list at table[index]
until the key is found or the list ends.
3. To delete(key): Compute index. Traverse the linked list at table[index]
and remove the node containing the key.
• Advantages:
○ Relatively simple to implement.
○ The hash table can never truly "fill up" (can always add more elements to linked
lists).
○ Performance degrades gracefully as the load factor increases.
○ Deletion is straightforward.
• Disadvantages:
○ Requires extra space for pointers in the linked lists.
○ Can suffer from poor cache performance if linked lists become long, as nodes
might not be contiguous in memory.
Example (Separate Chaining): Hash table size M = 5. Hash function hash(key) = key
% 5. Keys to insert: 10, 22, 5, 15, 7, 12
• 10 % 5 = 0: table[0] -> [10]
• 22 % 5 = 2: table[2] -> [22]
• 5 % 5 = 0: table[0] -> [10] -> [5]
• 15 % 5 = 0: table[0] -> [10] -> [5] -> [15]
• 7 % 5 = 2: table[2] -> [22] -> [7]
• 12 % 5 = 2: table[2] -> [22] -> [7] -> [12]
Resulting table:
table[0] -> [10] -> [5] -> [15]
table[1] -> NULL
table[2] -> [22] -> [7] -> [12]
table[3] -> NULL
table[4] -> NULL
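The worked example above can be reproduced with a minimal separate-chaining table in Python (the class name and the use of plain Python lists as chains are illustrative choices; for small integers Python's hash(key) equals key, so hash(key) % 5 matches key % 5):

```python
class ChainedHashTable:
    """Separate chaining: each bucket holds a list of (key, value) pairs."""
    def __init__(self, M: int = 5):
        self.M = M
        self.buckets = [[] for _ in range(M)]

    def insert(self, key, value):
        bucket = self.buckets[hash(key) % self.M]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: update in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # append to the end of the chain

    def search(self, key):
        for k, v in self.buckets[hash(key) % self.M]:
            if k == key:
                return v
        return None                      # key not in the table

    def delete(self, key):
        bucket = self.buckets[hash(key) % self.M]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket.pop(i)
                return True
        return False
```

Inserting 10, 22, 5, 15, 7, 12 into a table of size 5 produces exactly the chains shown above.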
3.2. Open Addressing
In open addressing, all elements are stored directly within the hash table array. When a
collision occurs, the algorithm "probes" for an alternative empty slot in the table using a
specific sequence.
• How it works:
1. To insert(key, value): Compute index = hash(key) % M. If
table[index] is empty, place the item there. If not, systematically search for
the next available empty slot.
2. To search(key): Compute index. If table[index] contains the key, return
it. If table[index] is occupied by a different key, follow the same probing
sequence as insertion until the key is found or an empty slot is encountered
(meaning the key is not in the table).
3. To delete(key): This is more complex. Simply removing an item can break the
search chain for other items that were placed later due to that item's presence. A
common solution is to mark the slot as "deleted" (a "tombstone") instead of truly
emptying it. Searches can then continue past the deleted slot, while insertions
may reuse tombstone slots; however, accumulated tombstones can degrade
performance over time until the table is rehashed.
• Advantages:
○ Better cache performance because elements are stored contiguously in memory.
○ No overhead for storing pointers.
• Disadvantages:
○ Sensitive to the load factor (table can become full).
○ Deletion is more complex.
○ Can suffer from clustering, where occupied slots form blocks, increasing probe
sequence lengths.
Probing Strategies for Open Addressing:
1. Linear Probing:
• Probe sequence: (hash(key) + i) % M, for i = 0, 1, 2, ...
• If table[index] is occupied, try table[(index + 1) % M], then
table[(index + 2) % M], and so on.
• Problem: Primary Clustering – long runs of occupied slots form, making future
insertions and searches take longer.
2. Quadratic Probing:
• Probe sequence: (hash(key) + i^2) % M, for i = 0, 1, 2, ...
• Helps reduce primary clustering.
• Problem: Secondary Clustering – keys that hash to the same initial index follow
the exact same quadratic probe sequence, still causing clusters. Also, not all slots
may be reachable unless M is chosen carefully; a standard guarantee is that if M is
prime and the table is less than half full, quadratic probing always finds an empty
slot.
3. Double Hashing:
• Probe sequence: (hash1(key) + i * hash2(key)) % M, for i = 0, 1, 2,
...
• Uses two hash functions: hash1 for the initial position, and hash2 for the step
size in the probing sequence.
• hash2(key) must never return zero, and its values should be relatively prime
to M (automatic when M is prime) so the probe sequence can reach every slot. A
common choice: hash2(key) = R - (key % R), where R is a prime number
smaller than M.
• Effectively eliminates both primary and secondary clustering by providing unique
probe sequences for each key.
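A short sketch of the double-hashing probe sequence, using the hash2 form suggested above (M = 11 and R = 7 are illustrative primes, not values fixed by these notes):

```python
def double_hash_probes(key: int, M: int = 11, R: int = 7):
    """Yield the probe sequence (hash1(key) + i * hash2(key)) % M."""
    h1 = key % M            # hash1: initial position
    h2 = R - (key % R)      # hash2: step size, always in 1..R, never zero
    for i in range(M):
        yield (h1 + i * h2) % M
```

Because M is prime, the step h2 is relatively prime to M, so the M probes visit every slot exactly once.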
Example (Linear Probing): Hash table size M = 5. Hash function hash(key) = key % 5.
Keys to insert: 10, 22, 5, 15, 7
• 10 % 5 = 0: table[0] = 10
• 22 % 5 = 2: table[2] = 22
• 5 % 5 = 0: table[0] is occupied. Try (0+1)%5 = 1. table[1] = 5
• 15 % 5 = 0: table[0] is occupied. Try (0+1)%5 = 1. table[1] is occupied. Try
(0+2)%5 = 2. table[2] is occupied. Try (0+3)%5 = 3. table[3] = 15
• 7 % 5 = 2: table[2] is occupied. Try (2+1)%5 = 3. table[3] is occupied. Try
(2+2)%5 = 4. table[4] = 7
Resulting table:
table[0] = 10
table[1] = 5
table[2] = 22
table[3] = 15
table[4] = 7
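The linear-probing example above can be replayed with a short Python sketch (a plain list of length M = 5 stands in for the table, with None marking an empty slot):

```python
def linear_probe_insert(table, key):
    """Insert key using linear probing: try (hash + i) % M for i = 0, 1, 2, ..."""
    M = len(table)
    index = key % M
    for i in range(M):
        slot = (index + i) % M
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table is full")

table = [None] * 5
for key in [10, 22, 5, 15, 7]:
    linear_probe_insert(table, key)
# table is now [10, 5, 22, 15, 7], matching the worked example
```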
4. Load Factor and Resizing (Rehashing)
The load factor (α) is a crucial metric for hash table performance. It is defined as
α = N / M, where N is the number of items currently stored in the hash table and M
is the total number of buckets (array size).
• Impact: A higher load factor means more collisions and longer collision resolution
chains/probes, degrading performance from O(1) towards O(N).
• Thresholds:
○ For separate chaining, α can exceed 1, but typically performance starts degrading
significantly above α = 1 or 2.
○ For open addressing, α must always be less than 1. Typically, a threshold of α =
0.5 to 0.7 is used before resizing.
Resizing (Rehashing): When the load factor exceeds a predefined threshold, the hash table
needs to be resized to maintain good performance. This involves:
1. Creating a new, larger array (e.g., double the size of the old array, and often choosing
a new prime number for M).
2. Rehashing all existing key-value pairs from the old table into the new table using the
new M. This is necessary because the modulo operation key % M will produce different
indices with a new M.
3. Discarding the old table.
• Cost: Resizing is an O(N) operation, as every item must be re-inserted.
• Amortized Analysis: While a single resize is expensive, if the table grows by a constant
factor, the cost of resizing is "amortized" over many insertions, resulting in an average
O(1) cost per insertion over a sequence of operations.
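The rehashing step can be sketched for a separate-chaining table as follows (the function name and the representation of the table as a list of bucket lists are illustrative choices):

```python
def rehash(old_table, new_M):
    """Move every (key, value) pair into a new, larger array of buckets.
    Each key gets a fresh index because key % M changes when M changes;
    the total work is O(N) over all stored items."""
    new_table = [[] for _ in range(new_M)]
    for bucket in old_table:
        for key, value in bucket:
            new_table[hash(key) % new_M].append((key, value))
    return new_table
```

In practice the new size is typically about double the old one, often rounded up to a prime.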
5. Applications of Hash Tables
Hash tables are one of the most widely used and versatile data structures in computer science
due to their average O(1) performance.
• Symbol Tables in Compilers/Interpreters: Store information about variables,
functions, and other identifiers for quick lookup during parsing and compilation.
• Database Indexing: Used to quickly locate records based on a key (e.g., primary key).
• Caches: Web browsers, CPU caches, and other caching systems use hash tables to store
frequently accessed data for fast retrieval.
• Implementing Set and Dictionary/Map Data Types: Most programming languages
(Python's dict, Java's HashMap, C++'s std::unordered_map) implement these
using hash tables.
• Routing Tables: Used in network routers to map IP addresses to outbound interfaces.
• Checksums and Data Integrity: While distinct from hash functions for data structures,
cryptographic hash functions generate fixed-size "fingerprints" for data blocks to detect
tampering.
• Spell Checkers: Store a dictionary of valid words for quick lookup.
• Graph Algorithms: Can be used to store visited nodes or edges efficiently.
6. Conclusion
Hashing and hash tables are powerful tools for achieving near-constant time complexity for
data storage and retrieval. Understanding hash functions, collision resolution strategies
(separate chaining, open addressing with its probing variants), and the concept of load factor
and resizing are fundamental to designing efficient and reliable software systems. While the
worst-case time complexity can be O(N), a well-designed hash table with good hash functions
and proper load factor management ensures excellent average-case performance in practice.