Chapter 5
Hashing
Motivation
Let us assume that we want to search for a particular
item in a database of 20,000,000 data items
How long would a successful search take?
How long would an unsuccessful search take?
It depends on the data structure
Motivation…
If the data structure is a linked list,
the search time is O(N)
If the data structure is a binary search tree,
estimated running time is O(logN)
log₂ 20,000,000 ≈ 24
Can we do even better than O(logN) ?
hash table ADT
Chapter 5: Hashing
5.1 General Idea
5.2 Hash Function
5.3 Separate Chaining
5.4 Open Addressing
5.5 Rehashing
5.6 Extendible Hashing
Chapter 5: Hashing
Our goals: We will
See several methods of implementing the hash table
Compare these methods analytically
Show numerous applications of hashing
Compare hash tables with binary search trees
First some terminology
Hash table ADT is a data structure that supports
only a subset of the operations allowed by the
binary search trees
Implementation of a hash table is called hashing
Hashing is a technique for performing
insertions, deletions, and finds in constant average time
General Idea
The general idea behind hashing is to directly map
each data item into an address in memory using
some function
key → hash function → index into an array
Components of hashing
A hash table is an array of some fixed size ‘m’
A hash function h(k) that maps a search key k to
some location in the range [0...m-1]
• h(k): S → {0, 1, …, m-1}
General Idea…
Example: a data item (Name: Irzam Shahid, University: RCET, Office: room 1 EED, mobile number, email, etc.) is stored in an array with slots 0 to m-1
Here we are using a hash function that accepts the name as a key and returns a 1: h(Irzam) = 1, so the item goes into slot 1
General Idea…
Desired Properties of h(k)
simple to compute
uniform distribution of keys over {0, 1, …, m-1}
• when h(k1) = h(k2) for two distinct keys k1, k2 , we
have a collision
General Idea…
Example: two data items, (Name: Irzam Shahid, University: RCET, Office: room 1 EED, etc.) and (Name: Rehan Arif, University: RCET, Office: room 8 EED, etc.)
h(Irzam) = 1 and h(Rehan) = 1, so both items map to slot 1 of the array
A collision has occurred
General Idea…
Two Important Topics in Hashing
How to select a hash function
How to resolve collisions
General Idea…
Hashing revisited
A hash table data structure is an array
Each data element contains a key
Each key is mapped to some number in the range
from 0 to TableSize-1, with the help of a hash
function
• The hash function should be efficient to compute and
should ensure that different data items get mapped to
different numbers
The key and the hashing function are used both to
insert the data into the table and to later find that
data
General Idea…
Example
PTCL is a large telephone company, and they
want to maintain a database that provides the
caller ID capability
• given a phone number, return the caller’s name
• phone numbers range from 0 to r = 10⁷ − 1
• want to do this as efficiently as possible
General Idea…
Solution 1
an array indexed by key
• takes O(1) time,
• O(r) space - huge amount of wasted space
index 6829227 → Umer Hamid, index 6829229 → Hassan Hamid, all other entries (null)
General Idea…
Solution 2
Linked list
• takes O(n) time for a search,
• O(n) space (only as much space as is needed)
list: (6829227, Umer Hamid) → (6829229, Hassan Hamid)
General Idea…
Solution 3
Hash table
• O(1) expected time, O(n+m) space, where m is table size
Like an array, but come up with a function to map the
large range into one which we can manage
• e.g. take the original key, modulo the (relatively small) size
of the array, and use that as an index
• 6829229 mod 5 = 4
(null) (null) (null) (null) Hassan
Hamid
0 1 2 3 4
Hash Function
A simple hash function
If input keys (k) are integers
hash function, h( k ) = k mod m
• where m is the table size
Example
• Suppose m = 10,
• k = 10, 20, 30, 40
• h(k) = 0, 0, 0, 0
– A bad choice if the keys end in zeros
To avoid such a situation, m should be a prime number
Hash Function…
Another simple hash function
If input keys (k) are integers
hash function, h( k ) = k mod m
• where m is the table size and is a prime number
Example
• Suppose m = 11,
• k = 10, 20, 30, 40
• h(k) = 10, 9, 8, 7
– Distributes the keys more uniformly
Hash Function…
A simple hash function
If the keys are strings, then the hash function
can be some function of the characters in the
strings
One possibility is to simply add the ASCII
values of the characters:
h( str ) = ( Σ_{i = 0}^{length−1} str[i] ) mod m
• Example
– h(ABC) = (65 + 66 + 67) % m
Hash Function…
Problem
If the table size is large, the function does not
distribute the keys well
TableSize = 10,007 (prime number)
Keys are <= 8 characters
Each char is 1 byte long; ASCII values fit in 7 bits, so the highest value a char can have is 2⁷ – 1 = 127
Hash function will have range: 0 to (127*8) = 0 to 1016
~10K spaces in the table and only using the first
1K elements
Hash Function…
Another hash function
If the keys are strings
convert the string into some number in some
arbitrary base b
h( str ) = ( Σ_{i = 0}^{length−1} str[i] · bⁱ ) mod m
• Example
– h(ABC) = (65·b⁰ + 66·b¹ + 67·b²) % m
Hash Function…
Index function
Examines first three characters of the input
The value 27 represents the number of letters in the English alphabet, plus the blank

int Hash2( const char *Key, int TableSize )
{
    return ( Key[ 0 ] + 27 * Key[ 1 ] + 729 * Key[ 2 ] ) % TableSize;
}

This index function, though easily computable, is also not appropriate if the hash table is reasonably large
Hash Function…
Rule of Thumb
Hash functions should try to achieve uniform full
coverage of the hash table, while minimizing
collisions
Since this is usually impossible, and collisions will
almost always occur, an important design
consideration is how you deal with the collision
resolution
Separate Chaining
How to deal with two keys which hash to the same
spot in the array?
Use chaining
All data items that hash to the same number are
kept in a linked list
• Set up an array of linked lists, indexed by hash value; items
that hash to the same value go into the same list
Separate Chaining…
Example
The two data items from before, (Name: Irzam Shahid, Office: room 1 EED, …) and (Name: Rehan Arif, Office: room 8 EED, …), hash to the same index
The two entries are now stored in a linked list attached to that index
Separate Chaining…
Example
Here the size of the
hash table = 10
Keys are the first ten
perfect squares 0, 1, 4,
9, 16, 25, 36, 49, 64, and
81
The hash function,
h(k) = k mod 10
A separate chaining hash table
Separate Chaining…
To find an element
using hash function, look up its position in table
search for the element in the linked list of the
hashed slot
To insert an element
compute h(k) to determine which list to traverse
If T[h(k)] contains a null pointer, initialize this entry
to point to a linked list that contains k alone
If T[h(k)] is a non-empty list, we add k at the
beginning of this list
Separate Chaining…
To delete an element
compute h(k), then search for k within the list at
T[h(k)]
delete k if it is found
Separate Chaining…
Analysing the performance of separate chaining hash
table
as we increase the number of elements N in the
hash table, more and more items will be stored in
linked lists, thus slowing everything down
Increasing the table size TableSize, on the other hand, allows you to hold more data efficiently
It turns out that the ratio λ = N / TableSize is the important
quantity to analyze
• This is called the load factor
Separate Chaining…
Analysing the performance of separate chaining hash
table…
Time to perform search = the constant time
required to evaluate the hash function + time to
traverse the list
Note that, for separate chaining, the average
length of a linked list is λ
Thus, an unsuccessful search traverses λ links on
average
A successful search requires that about 1 + (λ/2)
links be traversed
Separate Chaining…
Analysing the performance of separate chaining hash
table…
Thus, lowering the load factor is a good thing, from
the time point of view
From the space point of view, lowering the load
factor means increasing the table size
• This can lead to largely wasted space
A reasonable compromise is λ ≈ 1
• search times will be roughly O(1)
Open Addressing
Separate chaining has the disadvantage of using
linked lists, which slows the algorithm because of the
time required to allocate new cells
Open addressing
relocate the key k to be inserted if it collides with
an existing key
• That is, we store k at an entry different from
T[h(k)]
Open Addressing…
Open addressing hashing resolves collisions by trying
alternative slots in the hash table, until an empty cell
is found
cells h0 (X), h1 (X), h2 (X),… are tried in succession
where hi (X) = (Hash(X) + F(i))mod TableSize with
F(0) = 0
The function, F, is the collision resolution strategy
Open Addressing…
Linear Probing
F(i) is a linear function of i, i.e. F(i) = i
• h0(X) = Hash(X) + 0
• h1(X) = Hash(X) + 1
• h2(X) = Hash(X) + 2
•…
• cells are probed sequentially (with wraparound)
in search of an empty cell
Open Addressing…
*Example
suppose that our hash function converts a 2-digit
integer into a single digit by taking the least-
significant digit
*[Link]
~ece250/
Open Addressing…
*Insertions
Insert the numbers 81, 70, 97, 60, 51, 38, 89, 68, 24 into the
initially empty hash table:
bins 0 – 9, initially all empty
Open Addressing…
*Insertions…
We can easily insert 81, 70, and 97 into their corresponding
bins:
bins: 0:70, 1:81, 7:97
Open Addressing…
*Insertions…
Inserting 60 causes a collision in bin 0, therefore, we check:
• bin 1 (also full), and
• bin 2 (empty)
bins: 0:70, 1:81, 2:60, 7:97
Open Addressing…
*Insertions…
Inserting 51 also causes a collision, this time, in bin 1,
therefore, we check:
• bin 2 (also full), and
• bin 3 (empty)
bins: 0:70, 1:81, 2:60, 3:51, 7:97
Open Addressing…
*Insertions…
38 and 89 can be placed into bins 8 and 9 respectively
without collisions
bins: 0:70, 1:81, 2:60, 3:51, 7:97, 8:38, 9:89
Open Addressing…
*Insertions…
Inserting 68 causes a collision in bin 8, and therefore we
check bins:
• 9, 0, 1, 2, 3, and finally 4 which is empty
• insert 68 into bin 4
bins: 0:70, 1:81, 2:60, 3:51, 4:68, 7:97, 8:38, 9:89
Open Addressing…
*Insertions…
Inserting 24 causes a collision in bin 4, however the next bin
is empty
bins: 0:70, 1:81, 2:60, 3:51, 4:68, 5:24, 7:97, 8:38, 9:89
Open Addressing…
*Searching
Testing for membership is similar to insertions
Start at the appropriate bin, and continue
searching forward until either:
• the item is found, or
• an empty bin is found
Open Addressing…
*Searching…
Searching for 68, we first examine bin 8, then 9, 0, 1, 2, 3,
and 4, finding 68 in bin 4
Searching for 23, we search bins 3, 4, 5, and bin 6 is empty,
so 23 is not in the table
bins: 0:70, 1:81, 2:60, 3:51, 4:68, 5:24, 7:97, 8:38, 9:89
Open Addressing…
*Removing
We cannot simply remove elements from the hash table
For example, if we delete 89 by removing it, we can no
longer find 68
bins: 0:70, 1:81, 2:60, 3:51, 4:68, 5:24, 7:97, 8:38, 9:89
Open Addressing…
*Removing…
However, we cannot simply move all entries up to fill the gap
Moving 70 to bin 9 would make it impossible to find 70
before: bins 0:70, 1:81, 2:60, 3:51, 4:68, 5:24, 7:97, 8:38, 9:89
after shifting every entry up one bin (70 wraps to bin 9): 0:81, 1:60, 2:51, 3:68, 4:24, 6:97, 7:38, 9:70
Open Addressing…
*Removing…
Instead, we must probe forward, moving only those
elements which would not be moved to a location before
their bin starts
For example, we remove 89
bins: 0:70, 1:81, 2:60, 3:51, 4:68, 5:24, 7:97, 8:38 (bin 9 now empty)
Open Addressing…
*Removing…
We probe forward until we find an entry which can be moved
into bin 9
We cannot move 70, 81, 60, or 51, but we can move 68
bins: 0:70, 1:81, 2:60, 3:51, 5:24, 7:97, 8:38, 9:68 (bin 4 now empty)
Open Addressing…
*Removing…
Next, we search forward again, and note that 24 can be
moved forward
The next cell is already empty, and therefore we are finished
bins: 0:70, 1:81, 2:60, 3:51, 4:24, 7:97, 8:38, 9:68
Open Addressing…
*Removing…
Suppose we now remove 60
bins: 0:70, 1:81, 2:60, 3:51, 4:24, 7:97, 8:38, 9:68
Open Addressing…
*Removing…
We find 60 in bin 2, and therefore we remove it
We search forward and find that we can move 51 into bin 2
bins: 0:70, 1:81, 2:51, 4:24, 7:97, 8:38, 9:68
Open Addressing…
*Removing…
We cannot move 24 forward
The next bin (5) is empty, therefore we are finished
bins: 0:70, 1:81, 2:51, 4:24, 7:97, 8:38, 9:68
Open Addressing…
*Primary Clustering
We have already observed the following
phenomenon:
• as we insert more elements into the hash table,
the contiguous regions get larger
• Any key that hashes into the cluster will require
several attempts to resolve the collision
This results in longer search times
Open Addressing…
*Primary Clustering…
Consider inserting the following entries 81, 70, 97, 63, 76,
38, 85, 68, 21, 9, 55, 73, 57, 60, 72, 74, 85, 16, 61, 7, 49
Use the number modulo 25 to determine which bin it should
occupy
The first five don’t cause any collisions
bins: 1:76, 6:81, 13:63, 20:70, 22:97
Open Addressing…
*Primary Clustering…
Inserting 38 causes a collision in bin 13
The next seven do not cause any further collisions
bins: 1:76, 5:55, 6:81, 7:57, 9:9, 10:85, 13:63, 14:38, 18:68, 20:70, 21:21, 22:97, 23:73
Open Addressing…
*Primary Clustering…
The next four insertions cause collisions:
60 (bin 10)
72 (bin 22)
74 (bin 24)
85 (bin 10)
We can safely insert 16 into bin 16
bins: 0:74, 1:76, 5:55, 6:81, 7:57, 9:9, 10:85, 11:60, 12:85, 13:63, 14:38, 16:16, 18:68, 20:70, 21:21, 22:97, 23:73, 24:72
Open Addressing…
*Primary Clustering…
The remaining insertions all cause collisions:
61 (bin 11)
7 (bin 7)
49 (bin 24)
bins: 0:74, 1:76, 2:49, 5:55, 6:81, 7:57, 8:7, 9:9, 10:85, 11:60, 12:85, 13:63, 14:38, 15:61, 16:16, 18:68, 20:70, 21:21, 22:97, 23:73, 24:72
Open Addressing…
Asymptotic Performance
Primary clustering affects the number of probes
required to perform the insertions, searches or
deletions
The average number of probes for a successful
search can be estimated as
• Number of probes ≈ ( ½ ) ( 1 + 1/( 1 − λ ) )
• where λ is the load factor – what fraction of the table is used
Open Addressing…
Asymptotic Performance…
The number of probes for an unsuccessful search
or for an insertion is higher:
• Number of probes ≈ ( ½ ) ( 1 + 1/( 1 − λ )² )
• if λ = 0.75, 8.5 probes are expected
• if λ = 0.9, 50 probes are expected, and this is unreasonable
Linear probing can be a bad choice if the table is more than half full
Open Addressing…
*(Plot omitted: number of required probes vs. load factor)
Open Addressing…
*Primary clustering occurs with linear probing
because probes follow the same linear pattern:
if a bin is inside a cluster, then the next bin must
either
• also be in that cluster, or
• expand the cluster
Instead of searching forward in a linear fashion,
consider searching forward using a quadratic function
Open Addressing…
Quadratic Probing
with quadratic probing F(i) = i²
This eliminates the primary clustering problem of
linear probing
• h0(X) = Hash(X) + 0
• h1(X) = Hash(X) + 1
• h2(X) = Hash(X) + 4
• …
Open Addressing…
*Insertions
Suppose that an element should appear in bin h
if bin h is occupied, then check the following
sequence of bins
h + 1², h + 2², h + 3², h + 4², h + 5², ...
i.e. h + 1, h + 4, h + 9, h + 16, h + 25, ...
For example, with M = 17
Open Addressing…
*Insertions…
If one of the bins h + i² falls into a cluster, this does not imply
the next one will
Open Addressing…
*Insertions…
For example, suppose an element was to be inserted in
bin 23 in a hash table with 31 bins
The sequence in which the bins would be checked is
23, 24, 27, 1, 8, 17, 28, 10, 25, 11, 30, 20, 12, 6, 2, 0
Even if two bins are initially close, the sequence in
which subsequent bins are checked varies greatly
Open Addressing…
*Insertions…
Thus, quadratic probing solves the problem of
primary clustering
Unfortunately, there is a second problem which must
be dealt with
Suppose we have M = 8 bins
1² ≡ 1, 2² ≡ 4, 3² ≡ 1 (mod 8)
In this case, we are checking bin h + 1 twice while
having checked only one other bin
Open Addressing…
*Insertions…
Unfortunately, there is no guarantee that
h + i² mod M
will cycle through 0, 1, ..., M – 1
Solution
M should be a prime number
in this case, h + i² mod M for i = 0, ..., (M – 1)/2 will
cycle through exactly (M + 1)/2 values before
repeating
Open Addressing…
*Insertions…
Example
with M = 11
0, 1, 4, 9, 16 ≡ 5, 25 ≡ 3, 36 ≡ 3
with M = 13
0, 1, 4, 9, 16 ≡ 3, 25 ≡ 12, 36 ≡ 10, 49 ≡ 10
with M = 17
0, 1, 4, 9, 16, 25 ≡ 8, 36 ≡ 2, 49 ≡ 15, 64 ≡ 13, 81 ≡ 13
Open Addressing…
*Insertions…
Thus, quadratic probing avoids primary clustering
Unfortunately, we are not guaranteed that we will use
all the bins
In reality, if the hash function is reasonable, this is not
a significant problem until λ approaches 1
Open Addressing…
*Insertions…
Example
with a hash table with M = 19 using quadratic
probing, insert the following random 3-digit
numbers
086, 198, 466, 709, 973, 981, 374,
766, 473, 342, 191, 393, 300, 011,
538, 913, 220, 844, 565
using the number modulo 19 to be the initial bin
Open Addressing…
*Insertions…
The first two fall into their correct bin
086 → 10, 198 → 8
The next already causes a collision
466 → 10 → 11
The next four cause no collisions
709 → 6, 973 → 4, 981 → 12, 374 → 13
Then another collision
766 → 6 → 7
Open Addressing…
*Insertions…
At this point, two clusters have appeared and the
load factor is λ = 0.42
Open Addressing…
*Insertions…
The next three also go into their appropriate bins
473 → 17, 342 → 0, 191 → 1
Then there is one more collision
393 → 13 → 14
300 falls into its correct bin
300 → 15
Open Addressing…
*Insertions…
With the previous five insertions, the load factor is
λ = 0.68, with one large cluster
Open Addressing…
*Insertions…
At this point, insertions become more tedious
011 → 11 → 12 → 15 → 1 → 8 → 17 → 9
538 → 6 → 7 → 10 → 15 → 3
913 → 1 → 2
220 → 11 → ⋅⋅⋅ → 9 → 3 → 18
844 → 8 → 9 → 12 → 17 → 5
Open Addressing…
*Insertions…
To show how quadratic probing works, consider the
addition of 538, starting in bin 6
The first four bins all fall within the same cluster,
however, the fifth bin checked falls far outside the
cluster
Open Addressing…
*Insertions…
At this point, the array is almost full (bin 16 is open)
and the load factor is λ = 0.95
If we try to add the last number 565, the sequence of
bins checked is
14 → 15 → 18 → 4 → 11 → 1 → 12 → 6 → 2 → 0
which does not hit bin 16
Open Addressing…
*We can compare the number of probes required with
that of linear probing
086 → 10                              198 → 8
466 → 10 → 11                         709 → 6
973 → 4                               981 → 12
374 → 13                              766 → 6 → 7
473 → 17                              342 → 0
191 → 1                               393 → 13 → 14
300 → 15                              011 → 11 → 12 → 13 → 14 → 15 → 16
538 → 6 → 7 → 8 → 9                   913 → 1 → 2
220 → 11 → 12 → 13 → 14 → 15 → 16 → 17 → 18
844 → 8 → 9 → 10 → 11 → 12 → 13 → 14 → 15 → 16 → 17 → 18 → 0 → 1 → 2 → 3
565 → 14 → 15 → 16 → 17 → 18 → 0 → 1 → 2 → 3 → 4 → 5
Open Addressing…
*Deletions
With linear probing, if we deleted the contents of a
bin, we had to search ahead to determine if any
nodes had to be moved back
easy with linear probing; we simply moved from
bin to bin until an empty bin was located
Open Addressing…
*Deletions…
The nonlinear probing associated with quadratic
probing does not allow us to do this efficiently
For example, suppose we delete 466 which is
currently in bin 11
The two other entries which pass through bin 11 were
011 and 220
We cannot (efficiently) find these entries
Open Addressing…
*Deletions…
Solution
associate with each bin a field which is either
EMPTY, OCCUPIED, or DELETED
Open Addressing…
*Deletions…
Initially, all bins are marked EMPTY
When a bin is filled, it is marked OCCUPIED
If a bin is emptied (as a result of a remove), it is
marked DELETED
Note that a bin which is marked as being DELETED
may once again be filled (and hence marked
OCCUPIED)
Open Addressing…
*Deletions…
Example
given a hash table with
M = 11 bins, enter the values
135 909 246 894 518 365
Bins 0 – 10: all entries empty, all flags E (EMPTY)
Open Addressing…
*Deletions…
The first three are straightforward
135 → 3 909 → 7 246 → 4
Bin 3: 135 (O), Bin 4: 246 (O), Bin 7: 909 (O); all other bins E
Open Addressing…
The phenomenon of primary clustering will not occur
with quadratic probing
However, if multiple items all hash to the same initial
bin, the same sequence of numbers will be followed
This is termed secondary clustering
The effect is less significant than that of primary
clustering
Open Addressing…
Secondary clustering may be a problem if the hash
function does not produce an even distribution of
entries
One solution to secondary clustering is double
hashing, associating with each element an initial bin
(defined by one hash function) and a skip (defined by
a second hash function)
Open Addressing…
Example
Insert the 6 elements
14, 107, 31, 118, 34, 112
into an initially empty hash table of size 11 using
quadratic probing
Let the hash function be the number modulo 11
Open Addressing…
The first three fall into bins 3, 8, and 9, respectively
bins: 3:14, 8:107, 9:31
Open Addressing…
118 also falls into bin 8 (occupied)
Thus, we check
8 + 1 = 9 - occupied
8 + 4 = 12 ≡ 1 (mod 11) - unoccupied
bins: 1:118, 3:14, 8:107, 9:31
Open Addressing…
34 falls into bin 1 which is occupied, thus we check
1+1=2 - unoccupied
bins: 1:118, 2:34, 3:14, 8:107, 9:31
Open Addressing…
112 falls into bin 2 which is now occupied, thus we
check
2+1=3 - occupied
2+4=6 - unoccupied
bins: 1:118, 2:34, 3:14, 6:112, 8:107, 9:31
Open Addressing…
At this point, the hash table is over half full
We are no longer guaranteed that the insertion of a
new element will be possible
Solution
increase the size of the table (perhaps only after failing)
Problem
the new size must, too, be prime
bins: 1:118, 2:34, 3:14, 6:112, 8:107, 9:31
Open Addressing…
To remove an element, we must simply mark it as
deleted
In our example, removing 118, we begin in bin 8, and
continue to check 9, and then 1
Mark that bin as having had an element deleted
bins: 1:DEL, 2:34, 3:14, 6:112, 8:107, 9:31
Open Addressing…
To find an element we start by checking the bin it
should have initially been in, and then begin checking
following quadratic probing until either
we find it, or
we find a bin which is empty
bins: 1:DEL, 2:34, 3:14, 6:112, 8:107, 9:31
Open Addressing…
We find 14 in bin 3
We don’t find 34 in bin 1 (marked as deleted), so we
check bin 1 + 1 = 2, and find it
bins: 1:DEL, 2:34, 3:14, 6:112, 8:107, 9:31
Open Addressing…
We search for 19 in bin 8
Not finding it, we check
8 + 1 = 9 - occupied
8 + 4 ≡ 1 (mod 11) - deleted
8 + 9 ≡ 6 (mod 11) - occupied
8 + 16 ≡ 2 (mod 11) - occupied
8 + 25 ≡ 0 (mod 11) - unoccupied: not found
bins: 1:DEL, 2:34, 3:14, 6:112, 8:107, 9:31
Open Addressing…
Double Hashing
choose the initial bin with a first hash function
choose the jump value with a different hash
function i.e. F(i) = i * hash2(key)
• A function such as hash2(key) = R – (key%R) ,
with R a prime smaller than TableSize, will often
work well
Open Addressing…
Example
The hash table size, TableSize = 10,
Insert the keys 89, 18, 49, 58, and 69
The hash function is h(key) = key%10
The 2nd hash function is hash2(key) = 7 − (key%7)
Open Addressing…
Example…
(worked table omitted – final bins: 0:69, 3:58, 6:49, 8:18, 9:89)
Open Addressing…
Conclusions
Double hashing has performance that is almost
optimal
However, computing the 2nd hash function does add
some computational cost
Rehashing
If the table gets too full, the running time for the
operations will start taking too long
Solution
Rehashing
• Build another table twice as big with a new associated
hash function
• Scan down entire original hash table
• Compute new hash value for each element
• Insert into new table
Rehashing…
Example
hash table size = 7
Insert the elements 13, 15, 24, and 6
The hash function is h(key) = key%7
Use linear probing to resolve collisions
(figure: Original Hash Table – bins: 0:6, 1:15, 3:24, 6:13)
Insert 23 now
(figure: After Inserting 23 – bins: 0:6, 1:15, 2:23, 3:24, 6:13)
Because this table is too full, enlarge
it to size 17, and redefine the hash
function
Rehashing…
Example…
hash table size = 17
The hash function is h(key) = key%17
The old table is scanned and elements 6, 15, 23, 24, and 13 are inserted into the new table
Use linear probing to resolve collisions
(figure: After Rehashing – bins: 6:6, 7:23, 8:24, 13:13, 15:15)
Rehashing…
Complexity of Rehashing
It takes O(N) time to rehash, since there are N
elements to rehash
Hashing : Summary
Hash tables can be used to implement the Insert and
Find operations in constant average time
For these time bounds to be valid, special attention has to
be paid to load factor
For separate chaining hashing, λ should be close to 1
For open addressing hashing, λ should not exceed 0.5
If linear probing is used, performance degenerates rapidly
as λ approaches 1
Rehashing can be implemented to allow the table to grow,
thus maintaining a reasonable λ