Understanding Hashing Techniques in Data Structures

Chapter 5 discusses hashing as a technique for efficient data retrieval in databases, comparing it to linked lists and binary search trees. It covers the general idea of hash tables, hash functions, collision resolution methods like separate chaining and open addressing, and the performance analysis of these techniques. The chapter emphasizes the importance of selecting effective hash functions and managing load factors to optimize search times.

Chapter 5

Hashing
Motivation
 Let us assume that we want to search for a particular
item in a database of 20,000,000 data items
 How long would it take to perform a successful search?
 How long would it take to perform an unsuccessful search?
 It depends on the data structure
Motivation…
 If the data structure is a linked list,
 the search time is O(N)
 If the data structure is a binary search tree,
 estimated running time is O(logN)
 log₂ 20,000,000 ≈ 24

 Can we do even better than O(logN) ?

 Yes: with the hash table ADT
Chapter 5: Hashing
5.1 General Idea
5.2 Hash Function
5.3 Separate Chaining
5.4 Open Addressing
5.5 Rehashing
5.6 Extendible Hashing
Chapter 5: Hashing
Our goals: We will
 See several methods of implementing the hash table
 Compare these methods analytically
 Show numerous applications of hashing
 Compare hash tables with binary search trees
First some terminology
 The hash table ADT is a data structure that supports
only a subset of the operations allowed by
binary search trees
 An implementation of a hash table is called hashing
 Hashing is a technique used for performing
insertions, deletions, and finds in constant average time
General Idea
 The general idea behind hashing is to directly map
each data item into an address in memory using
some function
 key  hash function  index to an array

 Components of hashing

 A hash table is an array of some fixed size ‘m’


 A hash function h(k) that maps a search key k to
some location in the range [0...m-1]
• h(k): S  {0, 1, …, m-1}
General Idea…
 [Figure: an array indexed 0 to m−1. A data item (Name: Irzam Shahid,
University: RCET, Office: room 1 EED, mobile number, email, etc.) is
stored at index 1, because the hash function, given the last name as
the key, returns h(Irzam) = 1.]
General Idea…
 Desired Properties of h(k)
 simple to compute
 uniform distribution of keys over {0, 1, …, m-1}

• when h(k1) = h(k2) for two distinct keys k1, k2 , we


have a collision
General Idea…
 [Figure: the same array. A second data item (Name: Rehan Arif,
University: RCET, Office: room 8 EED, etc.) also hashes to index 1,
h(Rehan) = 1. A collision has occurred.]
General Idea…
 Two Important Topics in Hashing
 How to select a hash function
 How to resolve collisions
General Idea…
 Hashing revisited
 A hash table data structure is an array
 Each data element contains a key
 Each key is mapped to some number in the range
from 0 to TableSize-1, with the help of a hash
function
• The hash function should be efficient to compute and
should ensure that different data items get mapped to
different numbers
 The key and the hashing function are used both to
insert the data into the table and to later find that
data
General Idea…
 Example
 PTCL is a large telephone company, and they
want to maintain a database that provides the
caller ID capability
• given a phone number, return the caller’s name
• phone numbers range from 0 to r = 10^7 − 1
• want to do this as efficiently as possible
General Idea…
 Solution 1
 an array indexed by key
• takes O(1) time,
• O(r) space - huge amount of wasted space

[Figure: an array of size r indexed by phone number; a few entries
(e.g. Umer at 6829227, Hassan at 6829229) are filled, and nearly all
other entries are null.]
General Idea…
 Solution 2
 Linked list
• takes O(r) time,
• O(r) space (only as much space as is needed )

[Figure: a linked list of (phone number, name) records, including
6829227 and 6829229.]
General Idea…
 Solution 3
 Hash table
• O(1) expected time, O(n+m) space, where m is table size
 Like an array, but come up with a function to map the
large range into one which we can manage
• e.g. take the original key, modulo the (relatively small) size
of the array, and use that as an index
• 6829229 mod 5 = 4

[Figure: a table of size 5, indexed 0 to 4; 6829229 mod 5 = 4, so
Hassan's record is stored at index 4 and the other entries are null.]
Hash Function
 A simple hash function
 If input keys (k) are integers
 hash function, h( k ) = k mod m
• where m is the table size

 Example
• Suppose m = 10,
• k = 10, 20, 30, 40
• h(k) = 0, 0, 0, 0
– A bad choice if the keys end in zeros
– To avoid such a situation, m should be a prime number
Hash Function…
 Another simple hash function
 If input keys (k) are integers
 hash function, h( k ) = k mod m
• where m is the table size and is a prime number

 Example
• Suppose m = 11,
• k = 10, 20, 30, 40
• h(k) = 10, 9, 8, 7
– Distributes the keys more uniformly
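Both cases can be reproduced with a one-line division-method hash (hash_mod is an illustrative name, not from the chapter):

```c
/* Division-method hash, h(k) = k mod m, as described above. */
unsigned int hash_mod(unsigned int k, unsigned int m)
{
    return k % m;
}
```
With m = 10 the keys 10, 20, 30, 40 all land in slot 0, while with the prime m = 11 they spread out to 10, 9, 8, 7.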
Hash Function…
 A simple hash function
 If the keys are strings, then the hash function
can be some function of the characters in the
strings
 One possibility is to simply add the ASCII
values of the characters:

 length  1 
h( str )   str[i ]  %m
 i 0 
• Example
– h(ABC) = (65 + 66 + 67)%m
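A sketch of this additive hash (hash_add is an illustrative name, not from the chapter):

```c
/* Additive string hash: sum the character codes, reduce mod m. */
unsigned int hash_add(const char *str, unsigned int m)
{
    unsigned int sum = 0;
    for (; *str != '\0'; str++)
        sum += (unsigned char)*str;
    return sum % m;
}
```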
Hash Function…
 Problem
 If the table size is large, the function does not
distribute the keys well
 TableSize = 10,007 (prime number)
 Keys are <= 8 characters
 Each char holds an ASCII value, so the highest value
it can have is 2^7 − 1 = 127
 The hash function therefore has range 0 to (127 × 8) = 1016
 ~10K slots in the table, but only the first
~1K are ever used
Hash Function…
 Another hash function
 If the keys are strings
 convert the string into some number in some
arbitrary base b

 length  1 i
h( str )   str[i ] b  %m
 i 0 

• Example
– h(ABC) = (65·b^0 + 66·b^1 + 67·b^2) % m
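A sketch of this polynomial hash, evaluated with Horner's rule from the last character back to the first so that intermediate values stay small (hash_poly is an illustrative name, not from the chapter):

```c
#include <string.h>

/* Polynomial string hash: ( sum of str[i] * b^i ) % m,
   evaluated with Horner's rule from the end of the string. */
unsigned int hash_poly(const char *str, unsigned int b, unsigned int m)
{
    size_t i = strlen(str);
    unsigned long long h = 0;
    while (i-- > 0)
        h = (h * b + (unsigned char)str[i]) % m;  /* reduce mod m at each step */
    return (unsigned int)h;
}
```
With b = 1 this degenerates to the additive hash above.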
Hash Function…
 Examines the first three characters of the input
 The value 27 represents the number of letters
in the English alphabet, plus the blank

Index Hash2( const char *Key, int TableSize )
{
    return ( Key[ 0 ] + 27 * Key[ 1 ] + 729 * Key[ 2 ] ) % TableSize;
}

 This index function, though easily computable, is also not
appropriate if the hash table is reasonably large
Hash Function…
 Rule of Thumb

 Hash functions should try to achieve uniform full


coverage of the hash table, while minimizing
collisions
 Since this is usually impossible, and collisions will
almost always occur, an important design
consideration is how you deal with the collision
resolution
Separate Chaining
 How to deal with two keys which hash to the same
spot in the array?
 Use chaining
 All data items that hash to the same number are
kept in a linked list
• Setup an array of lists, indexed by the keys, to
lists of items with the same key
Separate Chaining…
 Example

[Figure: slot 1 of the array now points to a linked list holding both
records: Irzam Shahid's and Rehan Arif's. The two entries that
collided are now stored in a linked list.]
Separate Chaining…
 Example
 Here the size of the
hash table = 10
 Keys are the first ten
perfect squares 0, 1, 4,
9, 16, 25, 36, 49, 64, and
81
 The hash function,
h(k) = k mod 10

A separate chaining hash table


Separate Chaining…
 To find an element
 using hash function, look up its position in table
 search for the element in the linked list of the
hashed slot

 To insert an element
 compute h(k) to determine which list to traverse
 If T[h(k)] contains a null pointer, initialize this entry
to point to a linked list that contains k alone
 If T[h(k)] is a non-empty list, we add k at the
beginning of this list
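The find and insert procedures above can be sketched for integer keys and a fixed table of size 10 (ChainNode, chain_insert, and chain_find are illustrative names, not from the chapter):

```c
#include <stdlib.h>

/* Minimal separate-chaining table for integer keys. */
typedef struct ChainNode {
    int key;
    struct ChainNode *next;
} ChainNode;

enum { TABLE_SIZE = 10 };
static ChainNode *table10[TABLE_SIZE];   /* array of list heads, one per slot */

/* Insert at the head of the hashed slot's list, as described above. */
void chain_insert(int key)
{
    unsigned int i = (unsigned int)key % TABLE_SIZE;
    ChainNode *n = malloc(sizeof *n);
    n->key = key;
    n->next = table10[i];
    table10[i] = n;
}

/* Find: hash to a slot, then walk that slot's linked list. */
int chain_find(int key)
{
    ChainNode *n = table10[(unsigned int)key % TABLE_SIZE];
    for (; n != NULL; n = n->next)
        if (n->key == key)
            return 1;
    return 0;
}
```
Inserting the ten perfect squares from the earlier example chains 0 with 0, 4 with 64 and 24, and so on, exactly as h(k) = k mod 10 dictates.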
Separate Chaining…
 To delete an element
 compute h(k), then search for k within the list at
T[h(k)]
 delete k if it is found
Separate Chaining…
 Analysing the performance of separate chaining hash
table
 as we increase the number of elements N in the
hash table, more and more items will be stored in
linked lists, thus slowing everything down
 Also, increasing the table size TableSize allows you
to hold more data in an efficient manner
 It turns out that the ratio λ = N / TableSize is the important
quantity to analyze
• This is called the load factor
Separate Chaining…
 Analysing the performance of separate chaining hash
table…
 Time to perform search = the constant time
required to evaluate the hash function + time to
traverse the list
 Note that, for separate chaining, the average
length of a linked list is λ
 Thus, an unsuccessful search will require to
traverse λ links on average
 A successful search requires that about 1 + (λ/2)
links be traversed
Separate Chaining…
 Analysing the performance of separate chaining hash
table…
 Thus, lowering the load factor is a good thing, from
the time point of view
 From the space point of view, lowering the load
factor means increasing the table size
• This can lead to largely wasted space
 A reasonable compromise is λ ≈ 1
• search times will be roughly O(1)
Open Addressing
 Separate chaining has the disadvantage of using
linked lists, which slow the algorithm down because of the
time required to allocate new cells

 Open addressing
 relocate the key k to be inserted if it collides with
an existing key
• That is, we store k at an entry different from
T[h(k)]
Open Addressing…
 Open addressing hashing resolves collisions by trying
alternative slots in the hash table, until an empty cell
is found
 cells h0 (X), h1 (X), h2 (X),… are tried in succession
where hi (X) = (Hash(X) + F(i))mod TableSize with
F(0) = 0
 The function, F, is the collision resolution strategy
Open Addressing…
 Linear Probing
 F(i) is a linear function of i, i.e. F(i) = i
• h0(X) = Hash(X) + 0
• h1(X) = Hash(X) + 1
• h2(X) = Hash(X) + 2
•…
• cells are probed sequentially (with wraparound)
in search of an empty cell
Open Addressing…
 *Example
 suppose that our hash function converts a 2-digit
integer into a single digit by taking the least-
significant digit

*[Link]
~ece250/
Open Addressing…
 *Insertions
 Insert the numbers 81, 70, 97, 60, 51, 38, 89, 68, 24 into the
initially empty hash table:

0 1 2 3 4 5 6 7 8 9

Open Addressing…
 *Insertions…
 We can easily insert 81, 70, and 97 into their corresponding
bins:

0 1 2 3 4 5 6 7 8 9
70 81 97

Open Addressing…
 *Insertions…
 Inserting 60 causes a collision in bin 0, therefore, we check:

• bin 1 (also full), and


• bin 2 (empty)

0 1 2 3 4 5 6 7 8 9
70 81 60 97

Open Addressing…
 *Insertions…
 Inserting 51 also causes a collision, this time, in bin 1,
therefore, we check:
• bin 2 (also full), and
• bin 3 (empty)

0 1 2 3 4 5 6 7 8 9
70 81 60 51 97

Open Addressing…
 *Insertions…
 38 and 89 can be placed into bins 8 and 9 respectively
without collisions

0 1 2 3 4 5 6 7 8 9
70 81 60 51 97 38 89

Open Addressing…
 *Insertions…
 Inserting 68 causes a collision in bin 8, and therefore we
check bins:
• 9, 0, 1, 2, 3, and finally 4 which is empty
• insert 68 into bin 4

0 1 2 3 4 5 6 7 8 9
70 81 60 51 68 97 38 89

Open Addressing…
 *Insertions…
 Inserting 24 causes a collision in bin 4, however the next bin
is empty

0 1 2 3 4 5 6 7 8 9
70 81 60 51 68 24 97 38 89

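The nine insertions above can be replayed with a short probing loop (a sketch: linear_insert and the −1 empty-bin marker are illustrative, not from the chapter):

```c
/* Linear-probing insertion: probe (h + i) mod m for i = 0, 1, 2, ...
   until an empty bin (marked -1) is found. Returns the bin used,
   or -1 if the table is full. */
int linear_insert(int *table, int m, int key)
{
    int h = key % m;                 /* the example's hash: last digit when m = 10 */
    for (int i = 0; i < m; i++) {
        int bin = (h + i) % m;       /* F(i) = i, with wraparound */
        if (table[bin] == -1) {
            table[bin] = key;
            return bin;
        }
    }
    return -1;
}
```
Replaying 81, 70, 97, 60, 51, 38, 89, 68, 24 into a table of size 10 yields exactly the bins shown in the slides (68 probes all the way around to bin 4, and 24 then spills into bin 5).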
Open Addressing…
 *Searching
 Testing for membership is similar to insertions
 Start at the appropriate bin, and continue
searching forward until either:
• the item is found, or
• an empty bin is found

Open Addressing…
 *Searching…
 Searching for 68, we first examine bin 8, then 9, 0, 1, 2, 3,
and 4, finding 68 in bin 4
 Searching for 23, we search bins 3, 4, 5, and bin 6 is empty,
so 23 is not in the table

0 1 2 3 4 5 6 7 8 9
70 81 60 51 68 24 97 38 89

Open Addressing…
 *Removing
 We cannot simply remove elements from the hash table
 For example, if we delete 89 by removing it, we can no
longer find 68

0 1 2 3 4 5 6 7 8 9
70 81 60 51 68 24 97 38 89

Open Addressing…
 *Removing…
 However, we cannot simply move all entries up to fill the gap
 Moving 70 to bin 9 would make it impossible to find 70

0 1 2 3 4 5 6 7 8 9
70 81 60 51 68 24 97 38 89

81 60 51 68 24 97 38 70

Open Addressing…
 *Removing…
 Instead, we must probe forward, moving only those
elements which would not be moved to a location before
their bin starts
 For example, we remove 89

0 1 2 3 4 5 6 7 8 9
70 81 60 51 68 24 97 38

Open Addressing…
 *Removing…
 We probe forward until we find an entry which can be moved
into bin 9
 We cannot move 70, 81, 60, or 51, but we can move 68

0 1 2 3 4 5 6 7 8 9
70 81 60 51 24 97 38 68

Open Addressing…
 *Removing…
 Next, we search forward again, and note that 24 can be
moved forward
 The next cell is already empty, and therefore we are finished

0 1 2 3 4 5 6 7 8 9
70 81 60 51 24 97 38 68

Open Addressing…
 *Removing…
 Suppose we now remove 60

0 1 2 3 4 5 6 7 8 9
70 81 60 51 24 97 38 68

Open Addressing…
 *Removing…
 We find 60 in bin 2, and therefore we remove it
 We search forward and find that we can move 51 into bin 2

0 1 2 3 4 5 6 7 8 9
70 81 51 24 97 38 68

Open Addressing…
 *Removing…
 We cannot move 24 forward
 The next bin (5) is empty, therefore we are finished

0 1 2 3 4 5 6 7 8 9
70 81 51 24 97 38 68

Open Addressing…
 *Primary Clustering
 We have already observed the following
phenomenon:
• as we insert more elements into the hash table,
the contiguous regions get larger
• Any key that hashes into the cluster will require
several attempts to resolve the collision
 This results in longer search times

Open Addressing…
 *Primary Clustering…
 Consider inserting the following entries 81, 70, 97, 63, 76,
38, 85, 68, 21, 9, 55, 73, 57, 60, 72, 74, 85, 16, 61, 7, 49
 Use the number modulo 25 to determine which bin it should
occupy
 The first five don’t cause any collisions

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

76 81 63 70 97

Open Addressing…
 *Primary Clustering…
 Inserting 38 causes a collision in bin 13
 The next seven do not cause any further collisions

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

76 55 81 57 9 85 63 38 68 70 21 97 73

Open Addressing…
 *Primary Clustering…
 The next four insertions cause collisions:

60 (bin 10)
72 (bin 22)
74 (bin 24)
85 (bin 10)
 We can safely insert 16 into bin 16

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

74 76 55 81 57 9 85 60 85 63 38 16 68 70 21 97 73 72

Open Addressing…
 *Primary Clustering…
 The remaining insertions all cause collisions:

61 (bin 11)
7 (bin 7)
49 (bin 24)

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

74 76 49 55 81 57 7 9 85 60 85 63 38 61 16 68 70 21 97 73 72

Open Addressing…
 Asymptotic Performance
 Primary clustering affects the number of probes
required to perform the insertions, searches or
deletions
 The average number of probes for a successful
search can be estimated as
• Number of probes ≈ ( ½ )( 1 + 1/( 1 − λ ) )
• where λ is the load factor – what fraction of the table is
used
Open Addressing…
 Asymptotic Performance…
 The number of probes for an unsuccessful search
or for an insertion is higher:
• Number of probes ≈ ( ½ )( 1 + 1/( 1 − λ )^2 )
• if λ = 0.75 , 8.5 probes are expected
• if λ = 0.9 , 50 probes are expected, and this is unreasonable
– Linear probing can be a bad choice if the table is more than
half full
Open Addressing…
 *[Figure: a plot showing how the number of required probes
increases with the load factor]

Open Addressing…
 *Primary clustering occurs with linear probing because every
collision is resolved with the same linear probe pattern
 if a bin is inside a cluster, then the next bin must
either
• also be in that cluster, or
• expand the cluster

 Instead of searching forward in a linear fashion,


consider searching forward using a quadratic function

Open Addressing…
 Quadratic Probing
 with quadratic probing F(i) = i2
 This eliminates the primary clustering problem of
linear probing
• h0(X) = Hash(X) + 0
• h1(X) = Hash(X) + 1
• h2(X) = Hash(X) + 4
• …
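The quadratic probe sequence can be sketched as a loop (quadratic_insert and the −1 empty-bin marker are illustrative names, not from the chapter; note that with m probes there is no guarantee an empty bin is reached):

```c
/* Quadratic-probing insertion: probe (h + i*i) mod m for i = 0, 1, 2, ...
   Returns the bin used, or -1 if no empty bin was reached in m probes. */
int quadratic_insert(int *table, int m, int key)
{
    int h = key % m;
    for (int i = 0; i < m; i++) {
        int bin = (h + i * i) % m;   /* F(i) = i^2 */
        if (table[bin] == -1) {      /* -1 marks an empty bin */
            table[bin] = key;
            return bin;
        }
    }
    return -1;
}
```
For instance, with m = 11, inserting 14, 107, 31, 118, 34, 112 (the example used later in the chapter) lands them in bins 3, 8, 9, 1, 2, and 6.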
Open Addressing…
*Insertions
 Suppose that an element should appear in bin h
 if bin h is occupied, then check the following
sequence of bins
h + 1², h + 2², h + 3², h + 4², h + 5², ...
h + 1, h + 4, h + 9, h + 16, h + 25, ...

 For example, with M = 17

Open Addressing…
*Insertions…
 If one of h + i² falls into a cluster, this does not imply
the next one will

Open Addressing…
*Insertions…
 For example, suppose an element was to be inserted in
bin 23 in a hash table with 31 bins

 The sequence in which the bins would be checked is


23, 24, 27, 1, 8, 17, 28, 10, 25, 11, 30, 20, 12, 6, 2, 0

 Even if two bins are initially close, the sequence in


which subsequent bins are checked varies greatly

Open Addressing…
*Insertions…
 Thus, quadratic probing solves the problem of
primary clustering
 Unfortunately, there is a second problem which must
be dealt with
 Suppose we have M = 8 bins
1² ≡ 1, 2² ≡ 4, 3² ≡ 1 (mod 8)

 In this case, we are checking bin h + 1 twice


having checked only one other bin

Open Addressing…
*Insertions…
 Unfortunately, there is no guarantee that
(h + i²) mod M
will cycle through 0, 1, ..., M – 1

 Solution
 M should be a prime number
 in this case, (h + i²) mod M for i = 0, ..., (M – 1)/2 will
cycle through exactly (M + 1)/2 distinct values before
repeating

Open Addressing…
*Insertions…
 Example

 with M = 11
0, 1, 4, 9, 16 ≡ 5, 25 ≡ 3, 36 ≡ 3

 with M = 13
0, 1, 4, 9, 16 ≡ 3, 25 ≡ 12, 36 ≡ 10, 49 ≡ 10

 with M = 17
0, 1, 4, 9, 16, 25 ≡ 8, 36 ≡ 2, 49 ≡ 15, 64 ≡ 13, 81 ≡ 13
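These cycle lengths can be checked mechanically (distinct_quadratic_bins is an illustrative helper, not from the chapter; the sketch assumes M < 64 and takes h = 0 without loss of generality):

```c
/* Count how many distinct bins (i*i) mod M visits for i = 0 .. (M-1)/2. */
int distinct_quadratic_bins(int M)
{
    int seen[64] = {0};
    int count = 0;
    for (int i = 0; i <= (M - 1) / 2; i++) {
        int bin = (i * i) % M;
        if (!seen[bin]) {
            seen[bin] = 1;
            count++;
        }
    }
    return count;
}
```
For the primes 11, 13, and 17 the count is exactly (M + 1)/2, matching the lists above; for the non-prime M = 8 it is only 3.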

Open Addressing…
*Insertions…
 Thus, quadratic probing avoids primary clustering

 Unfortunately, we are not guaranteed that we will use


all the bins
 In reality, if the hash function is reasonable, this is not
a significant problem until λ approaches 1

Open Addressing…
*Insertions…
 Example
 with a hash table with M = 19 using quadratic
probing, insert the following random 3-digit
numbers

086, 198, 466, 709, 973, 981, 374,


766, 473, 342, 191, 393, 300, 011,
538, 913, 220, 844, 565
 using the number modulo 19 to be the initial bin
Open Addressing…
*Insertions…
 The first two fall into their correct bin
086 → 10, 198 → 8

 The next already causes a collision


466 → 10 → 11

 The next four cause no collisions


709 → 6, 973 → 4, 981 → 12, 374 → 13

 Then another collision


766 → 6 → 7
Open Addressing…
*Insertions…
 At this point, two clusters have appeared and the
load factor is λ = 0.42

Open Addressing…
*Insertions…
 The next three also go into their appropriate bins
473 → 17, 342 → 0, 191 → 1

 Then there is one more collision


393 → 13 → 14

 300 falls into its correct bin


300 → 15

Open Addressing…
*Insertions…
 With the previous five insertions, the load factor is
λ = 0.68, with one large cluster

Open Addressing…
*Insertions…
 At this point, insertions become more tedious

011 → 11 → 12 → 15 → 1 → 8 → 17 → 9

538 → 6 → 7 → 10 → 15 → 3

913 → 1 → 2

220 → 11 → ⋅⋅⋅ → 9 → 3 → 18

844 → 8 → 9 → 12 → 17 → 5

Open Addressing…
*Insertions…
 To show how quadratic probing works, consider the
addition of 538, starting in bin 6

 The first four bins all fall within the same cluster,
however, the fifth bin checked falls far outside the
cluster

Open Addressing…
*Insertions…
 At this point, the array is almost full (bin 16 is open)
and the load factor is λ = 0.95

 If we try to add the last number 565, the sequence of


bins checked is
14 → 15 → 18 → 4 → 11 → 1 → 12 → 6 → 2 → 0
which does not hit bin 16
Open Addressing…
 *We can compare the number of probes required with
that of linear probing
086 → 10 198 → 8
466 → 10 → 11 709 → 6
973 → 4 981 → 12
374 → 13 766 → 6 → 7
473 → 17 342 → 0
191 → 1 393 → 13 → 14
300 → 15 011 → 11 → 12 → 13 → 14 → 15 → 16
538 → 6 → 7 → 8 → 9 913 → 1 → 2
220 → 11 → 12 → 13 → 14 → 15 → 16 → 17 → 18
844 → 8 → 9 → 10 → 11 → 12 → 13 → 14 → 15 → 16 → 17 → 18 → 0 → 1 → 2 → 3
565 → 14 → 15 → 16 → 17 → 18 → 0 → 1 → 2 → 3 → 4 → 5

Open Addressing…
*Deletions
 With linear probing, if we deleted the contents of a
bin, we had to search ahead to determine if any
nodes had to be moved back
 easy with linear probing; we simply moved from
bin to bin until an empty bin was located

Open Addressing…
*Deletions…
 The nonlinear probing associated with quadratic
probing does not allow us to do this efficiently
 For example, suppose we delete 466 which is
currently in bin 11

 The two other entries which pass through bin 11 were


011 and 220
 We cannot (efficiently) find these entries

Open Addressing…
*Deletions…
 Solution
 associate with each bin a field which is either
EMPTY, OCCUPIED, or DELETED

Open Addressing…
*Deletions…
 Initially, all bins are marked EMPTY

 When a bin is filled, it is marked OCCUPIED

 If a bin is emptied (as a result of a remove), it is


marked DELETED
 Note that a bin which is marked as being DELETED
may once again be filled (and hence marked
OCCUPIED)
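A search honouring the three flags might look like this (a sketch assuming quadratic probing; BinState, Bin, and probe_find are illustrative names, not from the chapter). The key point is that a DELETED bin does not stop the probe sequence, only a truly EMPTY one does:

```c
typedef enum { EMPTY, OCCUPIED, DELETED } BinState;

typedef struct {
    int key;
    BinState state;
} Bin;

/* Quadratic-probing search: returns the bin index of the key,
   or -1 if it is not in the table. */
int probe_find(const Bin *table, int m, int key)
{
    int h = key % m;
    for (int i = 0; i < m; i++) {
        int b = (h + i * i) % m;
        if (table[b].state == EMPTY)
            return -1;               /* an EMPTY bin ends the search */
        if (table[b].state == OCCUPIED && table[b].key == key)
            return b;
        /* DELETED, or a different key: keep probing */
    }
    return -1;
}
```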

Open Addressing…
*Deletions…
 Example
 given a hash table with
M = 11 bins, enter the values
135 909 246 894 518 365
Bin 0 1 2 3 4 5 6 7 8 9 10
Entry
Flag E E E E E E E E E E E

Open Addressing…
*Deletions…
 The first three are straightforward
135 → 3 909 → 7 246 → 4

Bin 0 1 2 3 4 5 6 7 8 9 10
Entry 135 246 909
Flag E E E O O E E O E E E

Open Addressing…
 The phenomenon of primary clustering will not occur
with quadratic probing
 However, if multiple items all hash to the same initial
bin, the same sequence of numbers will be followed
 This is termed secondary clustering

 The effect is less significant than that of primary


clustering

Open Addressing…
 Secondary clustering may be a problem if the hash
function does not produce an even distribution of
entries
 One solution to secondary clustering is double
hashing, associating with each element an initial bin
(defined by one hash function) and a skip (defined by
a second hash function)

Open Addressing…
 Example
 Insert the 6 elements
14, 107, 31, 118, 34, 112
into an initially empty hash table of size 11 using
quadratic probing

 Let the hash function be the number modulo 11

Open Addressing…
 The first three fall into bins 3, 8, and 9, respectively

0 1 2 3 4 5 6 7 8 9 10

14 107 31

Open Addressing…
 118 also falls into bin 8 (occupied)
 Thus, we check
8+1=9 - occupied
8+4=1 - unoccupied

0 1 2 3 4 5 6 7 8 9 10

118 14 107 31

Open Addressing…
 34 falls into bin 1 which is occupied, thus we check
 1+1=2 - unoccupied

0 1 2 3 4 5 6 7 8 9 10

118 34 14 107 31

Open Addressing…
 112 falls into bin 2 which is now occupied, thus we
check
2+1=3 - occupied
2+4=6 - unoccupied

0 1 2 3 4 5 6 7 8 9 10

118 34 14 112 107 31

Open Addressing…
 At this point, the hash table is over half full
 We are no longer guaranteed that the insertion of a
new element will be possible
 Solution
 increase the size of the table (perhaps only after failing)
 Problem
 the new size must, too, be prime

0 1 2 3 4 5 6 7 8 9 10

118 34 14 112 107 31

Open Addressing…
 To remove an element, we must simply mark it as
deleted
 In our example, removing 118, we begin in bin 8, and
continue to check 9, and then 1
 Mark that bin as having had an element deleted

0 1 2 3 4 5 6 7 8 9 10
DEL 34 14 112 107 31

Open Addressing…
 To find an element we start by checking the bin it
should have initially been in, and then begin checking
following quadratic probing until either
 we find it, or
 we find a bin which is empty

0 1 2 3 4 5 6 7 8 9 10
DEL 34 14 112 107 31

Open Addressing…
 We find 14 in bin 3
 We don’t find 34 in bin 1 (marked as deleted), so we
check bin 1 + 1 = 2, and find it

0 1 2 3 4 5 6 7 8 9 10
DEL 34 14 112 107 31

Open Addressing…
 We search for 19 in bin 8
 Not finding it, we check
8 + 1 = 9 - occupied
8 + 4 ≡ 1 (mod 11) - deleted
8 + 9 ≡ 6 (mod 11) - occupied
8 + 16 ≡ 2 (mod 11) - occupied
8 + 25 ≡ 0 (mod 11) - unoccupied: not found

0 1 2 3 4 5 6 7 8 9 10
DEL 34 14 112 107 31

Open Addressing…
 Double Hashing
 choose the initial bin with a first hash function
 choose the jump value with a different hash
function i.e. F(i) = i * hash2(key)
• A function such as hash2(key) = R – (key%R) ,
with R a prime smaller than TableSize, will often
work well
Open Addressing…
 Example
 The hash table size, TableSize = 10,
 Insert the keys 89, 18, 49, 58, and 69
 The hash function is h(key) = key%10

 The 2nd hash function is hash2(key) = 7 − (key % 7)
Open Addressing…
 Example…
 89 hashes to bin 9, and 18 to bin 8
 49 collides at bin 9; hash2(49) = 7 − 0 = 7, so try (9 + 7) mod 10 = 6
 58 collides at bin 8; hash2(58) = 7 − 2 = 5, so try (8 + 5) mod 10 = 3
 69 collides at bin 9; hash2(69) = 7 − 6 = 1, so try (9 + 1) mod 10 = 0
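Under the example's parameters (TableSize = 10, hash2(key) = 7 − (key % 7)), the insertions can be sketched as follows (double_hash_insert and the −1 empty-bin marker are illustrative names, not from the chapter):

```c
/* Double hashing: probe sequence (h + i * skip) mod m with
   skip = hash2(key) = R - (key % R) for a prime R < m.
   Returns the bin used, or -1 if m probes find no empty bin. */
int double_hash_insert(int *table, int m, int R, int key)
{
    int h = key % m;
    int skip = R - (key % R);        /* hash2 never evaluates to zero */
    for (int i = 0; i < m; i++) {
        int bin = (h + i * skip) % m;
        if (table[bin] == -1) {      /* -1 marks an empty bin */
            table[bin] = key;
            return bin;
        }
    }
    return -1;
}
```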
Open Addressing…
 Conclusions
 Double hashing has performance that is almost
optimal
 However, calculating the 2nd hash function does add
some extra computational cost per probe
Rehashing
 If the table gets too full, the running time for the
operations will start taking too long

 Solution
 Rehashing
• Build another table twice as big with a new associated
hash function
• Scan down entire original hash table
• Compute new hash value for each element
• Insert into new table
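The steps above can be sketched as one pass over the old table (a minimal sketch: rehash is an illustrative name, −1 marks an empty bin, linear probing is assumed, and a real implementation would pick the new size as a prime near twice the old):

```c
#include <stdlib.h>

/* Rehashing: allocate a bigger table and re-insert every element
   under the new table size. */
int *rehash(const int *old_table, int old_m, int new_m)
{
    int *new_table = malloc((size_t)new_m * sizeof *new_table);
    for (int i = 0; i < new_m; i++)
        new_table[i] = -1;
    for (int i = 0; i < old_m; i++) {     /* scan the entire old table */
        int key = old_table[i];
        if (key == -1)
            continue;
        int h = key % new_m;              /* compute the new hash value */
        while (new_table[h] != -1)
            h = (h + 1) % new_m;          /* linear probing in the new table */
        new_table[h] = key;
    }
    return new_table;
}
```
Rehashing the example's full size-7 table (holding 6, 15, 23, 24, 13) into a size-17 table places 6 at 6, 23 at 7, 24 at 8, 13 at 13, and 15 at 15.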
Rehashing…
 Example
 hash table size = 7
 Insert the elements 13, 15, 24, and 6
 The hash function is h(key) = key%7
 Use linear probing to resolve collisions
 Insert 23 now

 [Figure: the original hash table, and the table after inserting 23]

 Because this table is now too full, enlarge it to size 17,
and redefine the hash function
Rehashing…
 Example…
 hash table size = 17
 The hash function is h(key) = key%17
 The old table is scanned, and elements 6, 15, 23, 24,
and 13 are inserted into the new table
 Use linear probing to resolve collisions

 [Figure: the table after rehashing]
Rehashing…
 Complexity of Rehashing
 It takes O(N) time to rehash, since there are N
elements to rehash
Hashing : Summary
 Hash tables can be used to implement the Insert and
Find operations in constant average time
 For these time bounds to be valid, special attention has to
be paid to load factor
 For separate chaining hashing, λ should be close to 1
 For open addressing hashing, λ should not exceed 0.5
 If linear probing is used, performance degenerates rapidly
as λ approaches 1
 Rehashing can be implemented to allow the table to grow,
thus maintaining a reasonable λ
