Chapter - 8
Searching and Hashing
Contents
• What is searching?
• Importance of searching
• Types of searching: unordered linear search,
sorted linear search, binary search,
interpolation search
• Comparison of searching algorithms
• Hashing techniques
Searching
• Searching is a process of finding a particular record, which
can be a single element or a small chunk, within a huge
amount of data.
• The data can be in various forms: arrays, linked lists, trees
and graphs etc. With the increasing amount of data
nowadays, there are multiple techniques to perform the
searching operation.
• Searching Algorithms in Data Structures
• Various searching techniques can be applied on the data
structures to retrieve certain data.
• A search operation is said to be successful only if it returns
the desired element or data; otherwise, the searching
method is unsuccessful.
Cont.
• There are two categories these searching
techniques fall into. They are −
• Sequential Searching
• Interval Searching
Sequential Searching
• As the name suggests, the sequential
searching operation traverses through each
element of the data sequentially to look for
the desired data.
• The data need not be in a sorted manner for
this type of search.
Example − Linear Search
Interval Searching
• Unlike sequential searching, the interval
searching operation requires the data to be in
a sorted manner.
• This method usually searches the data in
intervals; it could be done by either dividing
the data into multiple sub-parts or jumping
through the indices to search for an element.
Example − Binary Search
Evaluating Searching Algorithms
• Usually, not all searching techniques are suitable for all types of
data structures.
• In some cases, a sequential search is preferable while in other cases
interval searching is preferable.
• Evaluation of these searching techniques is done by checking the
running time taken by each searching method on a particular input.
• To explain briefly, there are three different cases of time complexity
in which a program can run. They are −
• Best Case
• Average Case
• Worst Case
Cont.
• The best case time complexity of a linear search
is O(1) where the desired element is found in the
first iteration;
• whereas the worst case time complexity is O(n)
when the program traverses through all the
elements and still does not find the desired element.
• This is labeled as an unsuccessful search.
Therefore, the overall time complexity of a linear
search is taken as O(n), where n is the number of
elements present in the input data structure.
Unordered linear search
• The linear search approach depends on how the list items are
stored: whether they are sorted in order or stored without any order.
• Let's first consider a list whose items are not sorted.
• Consider an example list that contains elements 60, 1, 88, 10, and
100—an unordered list.
• The items in the list have no order by magnitude. To perform a
search operation on such a list, one proceeds from the very first
item and compares that with the search item.
• If the search item is not matched then the next element in the list is
examined.
• This continues until a match is found or we reach the last element
in the list.
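The best and worst cases described earlier can be observed by counting comparisons. The sketch below is only an illustration: the function name `linear_search_count` and the convention of returning -1 for an unsuccessful search are assumptions, not part of the original slides.

```python
def linear_search_count(items, key):
    """Return (index, comparisons); index is -1 when key is absent."""
    comparisons = 0
    for index, value in enumerate(items):
        comparisons += 1
        if value == key:
            return index, comparisons
    return -1, comparisons


unordered = [60, 1, 88, 10, 100]             # the unordered example list
print(linear_search_count(unordered, 60))    # (0, 1)  best case, first item
print(linear_search_count(unordered, 100))   # (4, 5)  match on the last item
print(linear_search_count(unordered, 5))     # (-1, 5) unsuccessful search
```

An unsuccessful search costs as many comparisons as there are items, which is the O(n) worst case.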
Types of searching methods
• Many types of searching methods are used to
search for data entries in various data
structures. Some of them include −
• Linear Search
• Binary Search
• Interpolation Search
• Hash Table
Linear Search Algorithm
• Linear search is a type of sequential searching algorithm.
• In this method, every element within the input array is
traversed and compared with the key element to be found.
If a match is found in the array the search is said to be
successful;
• if there is no match found the search is said to be
unsuccessful and gives the worst-case time complexity.
• For instance, in the given animated diagram, we are
searching for an element 33. Therefore, the linear search
method searches for it sequentially from the very first
element until it finds a match. This returns a successful
search.
Cont.
Linear Search Algorithm
• The algorithm for linear search is relatively simple. The procedure starts at
the very first index of the input array to be searched.
• Step 1 − Start from the 0th index of the input array, compare the key value
with the value present in the 0th index.
• Step 2 − If the value matches with the key, return the position at which the
value was found.
• Step 3 − If the value does not match with the key, compare the next
element in the array.
• Step 4 − Repeat Step 3 until a match is found or the end of the array is
reached. If a match is found, return the position at which it was found.
• Step 5 − If it is an unsuccessful search, print that the element is not
present in the array and exit the program.
Example
• Let us look at the step-by-step searching of
the key element (say 47) in an array using the
linear search method.
Cont.
• Step 1
• The linear search starts from the 0th index. Compare
the key element with the value in the 0th index, 34.
• However, 47 ≠ 34. So it moves to the next element.
Cont.
• Step 2
• Now, the key is compared with value in the 1st index
of the array.
• Still, 47 ≠ 10, making the algorithm move for another
iteration.
Cont.
• Step 3
• The next element 66 is compared with 47.
They do not match, so the algorithm
moves on to the next element.
Cont.
• Step 4
• Now the element in 3rd index, 27, is
compared with the key value, 47. They are not
equal so the algorithm is pushed forward to
check the next element.
Cont.
• Step 5
• Comparing the element in the 4th index of the
array, 47, to the key 47.
• The two values match.
Now, the position in which 47 is present, i.e., 4
is returned.
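The five steps of the algorithm and the walk-through above can be sketched as follows. This is a minimal sketch under two assumptions: the array holds only the five values named in the walk-through (the original figure may show more), and an unsuccessful search returns -1 instead of printing a message.

```python
def linear_search(array, key):
    """Return the index of the first element equal to key, or -1 if absent."""
    for index in range(len(array)):      # Step 1: start at index 0
        if array[index] == key:          # Step 2: match, return the position
            return index
        # Steps 3 and 4: no match, so continue with the next element
    return -1                            # Step 5: unsuccessful search


# Values seen at indices 0 to 4 in the walk-through (assumed to be the
# whole array for this sketch).
array = [34, 10, 66, 27, 47]
print(linear_search(array, 47))   # 4
print(linear_search(array, 99))   # -1
```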
Binary Search Algorithm
• Binary search is a fast search algorithm with run-time complexity of
O(log n).
• This search algorithm works on the principle of divide and conquer,
since it divides the array into half before searching.
• For this algorithm to work properly, the data collection should be in
the sorted form.
• Binary search looks for a particular key value by comparing the
middle most item of the collection.
• If a match occurs, then the index of item is returned.
• But if the middle item has a value greater than the key value, the
left sub-array of the middle item is searched.
• Otherwise, the right sub-array is searched.
• This process continues recursively until the size of a subarray
reduces to zero.
Cont.
Binary Search Algorithm
• Binary Search algorithm is an interval
searching method that performs the searching
in intervals only.
• The input taken by the binary search
algorithm must always be in a sorted array
since it divides the array into subarrays based
on the greater or lower values.
• The algorithm follows the procedure below −
Cont.
• Step 1 − Select the middle item in the array and compare it with the key
value to be searched. If it is matched, return the position of the median.
• Step 2 − If it does not match the key value, check if the key value is either
greater than or less than the median value.
• Step 3 − If the key is greater, perform the search in the right sub-array; but
if the key is lower than the median value, perform the search in the left
sub-array.
• Step 4 − Repeat Steps 1, 2 and 3 iteratively, until a match is found or
the sub-array is empty.
• Step 5 − If the key value does not exist in the array, then the algorithm
returns an unsuccessful search.
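The procedure above can be sketched iteratively as shown below. The function name, the sample data, and the use of -1 to signal an unsuccessful search are assumptions made for illustration.

```python
def binary_search(array, key):
    """Return the index of key in the sorted list array, or -1 if absent."""
    low, high = 0, len(array) - 1
    while low <= high:                    # the sub-array still has elements
        mid = low + (high - low) // 2     # Step 1: pick the middle item
        if array[mid] == key:
            return mid
        if key > array[mid]:              # Step 3: key is greater, go right
            low = mid + 1
        else:                             # key is lower, go left
            high = mid - 1
    return -1                             # Step 5: unsuccessful search


sample = [2, 5, 8, 12, 16, 23, 38]        # hypothetical sorted data
print(binary_search(sample, 23))          # 5
print(binary_search(sample, 7))           # -1
```

The loop stops when `low` passes `high`, which is the point at which the remaining sub-array is empty.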
Cont.
• During the first iteration, the element is searched in
the entire array. Therefore, length of the array = n.
• In the second iteration, only half of the original array is
searched. Hence, length of the array = n/2.
• In the third iteration, half of the previous sub-array is
searched. Here, length of the array will be = n/4.
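• Similarly, in the ith iteration, the length of the array will
become n/2^(i-1), so about log2(n) iterations are enough to
shrink it to a single element. This gives the O(log n) running time.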
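The halving described above can be checked numerically. The helper `halving_sizes` is a hypothetical name introduced only for this illustration; it lists the length of the region searched in each iteration, using integer division.

```python
import math


def halving_sizes(n):
    """Lengths searched in successive iterations: n, n/2, n/4, ... down to 1."""
    sizes = []
    while n >= 1:
        sizes.append(n)
        n //= 2
    return sizes


sizes = halving_sizes(1024)
print(sizes)                               # [1024, 512, 256, 128, 64, 32, 16, 8, 4, 2, 1]
print(len(sizes))                          # 11
print(math.floor(math.log2(1024)) + 1)     # 11
```

The number of iterations grows with log2(n), not with n, which is why doubling the input adds only one more iteration.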
Example
• For a binary search to work, it is mandatory for the
target array to be sorted.
• We shall learn the process of binary search with a
pictorial example.
• The following is our sorted array and let us assume
that we need to search the location of value 31 using
binary search.
Cont.
• First, we shall determine half of the array by
using this formula −
• mid = low + (high - low) / 2
• Here it is, 0 + (9 - 0) / 2 = 4 (integer value of
4.5). So, 4 is the mid of the array.
Cont.
• Now we compare the value stored at location 4, with
the value being searched, i.e. 31.
• We find that the value at location 4 is 27, which is
not a match.
• As the target value 31 is greater than 27 and we have a sorted
array, we know that the target value must be
in the upper portion of the array.
Cont.
• We change our low to mid + 1 and find the new mid
value again.
• low = mid + 1
• mid = low + (high - low) / 2
• Our new mid is 7 now. We compare the value stored
at location 7 with our target value 31.
Cont.
• The value stored at location 7 is not a match;
rather, it is greater than what we are looking for. So,
the value must be in the lower part from this
location.
• Hence, we change our high to mid - 1 and calculate the mid again. This time it is 5.
Cont.
• We compare the value stored at location 5 with
our target value. We find that it is a match.
• We conclude that the target value 31 is stored at
location 5.
• Binary search halves the searchable items at every step and
thus reduces the number of comparisons to be
made to a very small count.
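The sequence of mid values in this example can be reproduced with a short trace. The array itself is an assumption, because the original figure is not reproduced here; the ten values were chosen to agree with the text (27 at location 4 and 31 at location 5). The function name is also illustrative.

```python
def binary_search_trace(array, target):
    """Return (index, mids): the match position (or -1) and every mid probed."""
    low, high = 0, len(array) - 1
    mids = []
    while low <= high:
        mid = low + (high - low) // 2
        mids.append(mid)
        if array[mid] == target:
            return mid, mids
        if array[mid] < target:
            low = mid + 1        # target is in the upper portion
        else:
            high = mid - 1       # target is in the lower portion
    return -1, mids


# Assumed array, consistent with the values quoted in the example.
array = [10, 14, 19, 26, 27, 31, 33, 35, 42, 44]
print(binary_search_trace(array, 31))   # (5, [4, 7, 5])
```

Three probes (locations 4, 7 and 5) are enough to find the target among ten items.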
Interpolation Search Algorithm
• Interpolation search is an improved variant of binary
search.
• This search algorithm works on the probing position of
the required value or estimates/predicts the probable
position of the search key.
• For this algorithm to work properly, the data collection
should be in a sorted form and uniformly distributed.
• Binary search has a huge advantage of time complexity
over linear search.
• Linear search has worst-case complexity of O(n) whereas
binary search has O(log n).
Cont.
• There are cases where the location of target data
may be known in advance.
• For example, in case of a telephone directory, if
we want to search the telephone number of
Morpheus, we jump close to where it should be,
not to the middle.
• Here, linear search and even binary search will
seem slow as we can directly jump to the memory
space where the names starting with 'M' are stored.
Positioning in Binary Search
• In binary search, if the desired data is not found then
the rest of the list is divided in two parts, lower and
higher. The search is carried out in either of them.
Cont.
• Even when the data is sorted, binary search
does not use the values themselves to estimate
the position of the desired data.
Cont.
• Formula used
• To calculate the estimated position:
• pos = low + ((key - arr[low]) * (high - low)) / (arr[high] - arr[low])
• Where
• arr = sorted array being searched
• low = starting index
• high = ending index
• key = element to search
Example
• Array: [10, 20, 30, 40, 50, 60, 70, 80, 90]
• Search key = 70
• low = 0, high = 8
• Estimated position
• pos = 0 + ((70 - 10) * (8 - 0)) / (90 - 10)
• = 480 / 80 = 6
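The estimate in this example can be verified with a few lines of code. The function name `probe_position` is an assumption made for this sketch; integer division is used because the result is an array index.

```python
def probe_position(arr, low, high, key):
    """Estimated index of key, assuming arr is sorted and roughly uniform."""
    return low + ((key - arr[low]) * (high - low)) // (arr[high] - arr[low])


arr = [10, 20, 30, 40, 50, 60, 70, 80, 90]
pos = probe_position(arr, 0, len(arr) - 1, 70)
print(pos, arr[pos])   # 6 70
```

Because the values are evenly spaced, the first estimate lands exactly on the key.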
Position Probing in Interpolation Search
• Interpolation search finds a particular item by
computing the probe position. Initially, the probe
position is the position of the middle most item of
the collection.
Cont.
• If a match occurs, then the index of the item is returned. To split the
list into two parts, we use the following method −
• mid = Lo + ((Hi − Lo) ∗ (key − Arr[Lo])) / (Arr[Hi] − Arr[Lo])
where −
• Arr = list
• Lo = Lowest index of the list
• Hi = Highest index of the list
• Arr[n] = Value stored at index n in the list
• If the middle item is greater than the key, then the probe position
is again calculated in the sub-array to the left of the middle item.
• Otherwise, the item is searched in the sub-array to the right of the
middle item.
• This process continues on the sub-array as well until the size of
subarray reduces to zero.
Interpolation Search Algorithm
• As it is an improvement on the existing binary search algorithm,
we are mentioning the steps to search the 'target' data
value index, using position probing −
• 1. Start searching data from middle of the list.
• 2. If it is a match, return the index of the item, and exit.
• 3. If it is not a match, probe position.
• 4. Divide the list using probing formula and find the
new middle.
• 5. If data is greater than middle, search in higher sub-
list.
• 6. If data is smaller than middle, search in lower sub-
list.
• 7. Repeat until a match is found or the sub-list is empty.
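A possible implementation of these steps is sketched below. It is an assumption-laden illustration, not the definitive algorithm: it computes the probe position with the formula on every iteration (including the first), it guards against division by zero when all remaining values are equal, and it returns -1 for an unsuccessful search.

```python
def interpolation_search(arr, key):
    """Return the index of key in the sorted list arr, or -1 if absent."""
    low, high = 0, len(arr) - 1
    # The key can only be present while it lies within arr[low]..arr[high].
    while low <= high and arr[low] <= key <= arr[high]:
        if arr[low] == arr[high]:
            # All remaining values are equal, and the loop condition
            # guarantees they equal the key.
            return low
        # Probe position estimated from the value of the key.
        pos = low + ((key - arr[low]) * (high - low)) // (arr[high] - arr[low])
        if arr[pos] == key:
            return pos
        if arr[pos] < key:
            low = pos + 1                  # search the higher sub-list
        else:
            high = pos - 1                 # search the lower sub-list
    return -1


arr = [10, 20, 30, 40, 50, 60, 70, 80, 90]
print(interpolation_search(arr, 70))   # 6
print(interpolation_search(arr, 35))   # -1
```

The range check in the loop condition lets the search stop early when the key is smaller or larger than everything left in the sub-list.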
Cont.
• Example
• To understand the step-by-step process involved in the
interpolation search, let us look at an example and work
around it.
• Consider an array of sorted elements given below −
Cont.
• Let us search for the element 19.
• Solution
• Unlike binary search, the middle point in this approach is chosen
using the formula −
• mid = Lo + ((Hi − Lo) ∗ (X − A[Lo])) / (A[Hi] − A[Lo])
• So in this given array input,
• Lo = 0, A[Lo] = 10
• Hi = 9, A[Hi] = 44
• X = 19
• Applying the formula to find the middle point in the list, we get
• mid = 0 + ((9 − 0) ∗ (19 − 10)) / (44 − 10)
• mid = (9 ∗ 9) / 34
• mid = 81 / 34
• mid = 2.38
Cont.
• Since, mid is an index value, we only consider the integer part
of the decimal. That is, mid = 2.
• Comparing the key element given, that is 19, to the element
present in the mid index, it is found that both the elements
match.
• Therefore, the element is found at index 2.
Comparison of searching algorithms
• The most common searching algorithms are compared below
in terms of how they work, their complexity, and when to
use them.
• 1. Linear Search
• Idea: Check each element sequentially until the target is
found or the list ends.
• Best Case: O(1) (target is first element)
• Worst Case: O(n)
• Space Complexity: O(1)
• When to Use: Small or unsorted datasets.
Cont.
• 2. Binary Search
• Idea: Repeatedly divide the sorted list in half
to find the target.
• Best Case: O(1)
• Worst Case: O(log n)
• Space Complexity: O(1) (iterative) or O(log n)
(recursive)
• When to Use: Large sorted datasets.
Cont.
• 3. Interpolation Search
• Idea: Improves binary search by estimating the
position of the target based on value distribution.
• Best Case: O(1)
• Average Case: O(log log n) (uniform distribution)
• Worst Case: O(n) (non-uniform data)
• When to Use: Sorted datasets with uniformly
distributed values.
Cont.
• 4. Hashing (Direct Access Search)
• Idea: Use a hash function to map keys to
indices for constant-time lookup.
• Best Case: O(1)
• Worst Case: O(n) (hash collisions)
• Space Complexity: O(n)
• When to Use: Fast lookups in large datasets
where extra memory is acceptable.
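The difference between the first three methods can be made concrete by counting how many elements each one examines. The following is a rough illustration only: the helper names are hypothetical, and the data set (1000 uniformly spaced keys) is an assumption chosen to favour interpolation search.

```python
def linear_probes(data, key):
    """Elements examined by a linear search for key."""
    for count, value in enumerate(data, start=1):
        if value == key:
            return count
    return len(data)


def binary_probes(data, key):
    """Elements examined by a binary search for key."""
    low, high, count = 0, len(data) - 1, 0
    while low <= high:
        mid = low + (high - low) // 2
        count += 1
        if data[mid] == key:
            return count
        if data[mid] < key:
            low = mid + 1
        else:
            high = mid - 1
    return count


def interpolation_probes(data, key):
    """Elements examined by an interpolation search for key."""
    low, high, count = 0, len(data) - 1, 0
    while low <= high and data[low] <= key <= data[high]:
        if data[low] == data[high]:
            return count + 1
        pos = low + ((key - data[low]) * (high - low)) // (data[high] - data[low])
        count += 1
        if data[pos] == key:
            return count
        if data[pos] < key:
            low = pos + 1
        else:
            high = pos - 1
    return count


data = list(range(0, 10_000, 10))   # hypothetical: 1000 uniformly spaced keys
key = 7_770                         # stored at index 777
print("linear:", linear_probes(data, key))
print("binary:", binary_probes(data, key))
print("interpolation:", interpolation_probes(data, key))
```

On this data the linear search examines 778 elements, the binary search at most 10, and the interpolation search finds the key on its first probe. On non-uniform data the interpolation count can be much higher.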
Hashing techniques
• Hashing is a technique used in data structures to store and retrieve
data quickly by converting a key into an index of an array called a
hash table.
• Hash Table is a data structure which stores data in an associative
manner.
• In a hash table, data is stored in an array format, where each data
value has its own unique index value.
• Access of data becomes very fast if we know the index of the
desired data.
• Thus, it becomes a data structure in which insertion and search
operations are very fast irrespective of the size of the data.
• Hash Table uses an array as a storage medium and uses hash
technique to generate an index where an element is to be inserted
or is to be located from.
Hashing
• Hashing is a technique to convert a range of
key values into a range of indexes of an array.
We're going to use the modulo operator to map
key values into that range of indexes.
• Consider an example of hash table of size 20,
and the following items are to be stored.
• Items are in the (key, value) format.
Cont.
• (1,20) (2,70) (42,80) (4,25) (12,44) (14,32)
(17,11) (13,78) (37,98)
Cont.
No.   Key   Hash            Array Index
1     1     1 % 20 = 1      1
2     2     2 % 20 = 2      2
3     42    42 % 20 = 2     2
4     4     4 % 20 = 4      4
5     12    12 % 20 = 12    12
6     14    14 % 20 = 14    14
7     17    17 % 20 = 17    17
8     13    13 % 20 = 13    13
9     37    37 % 20 = 17    17
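The index computation in the table above can be reproduced in a few lines. The names `SIZE` and `hash_code` are assumptions for this sketch; the second half groups the keys by index to show which ones collide.

```python
SIZE = 20   # table size from the example


def hash_code(key):
    """Map an integer key to an array index with the modulo operator."""
    return key % SIZE


keys = [1, 2, 42, 4, 12, 14, 17, 13, 37]
for key in keys:
    print(key, "->", hash_code(key))

# Group keys by index to reveal collisions.
buckets = {}
for key in keys:
    buckets.setdefault(hash_code(key), []).append(key)
collisions = {index: ks for index, ks in buckets.items() if len(ks) > 1}
print(collisions)   # {2: [2, 42], 17: [17, 37]}
```

Keys 2 and 42 both map to index 2, and keys 17 and 37 both map to index 17, so a collision-handling technique is needed.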
Linear Probing
• As we can see, the hash function may
produce an index of the array that is
already in use.
• In such a case, we can search the next empty
location in the array by looking into the next
cell until we find an empty cell.
• This technique is called linear probing.
Cont.
No.   Key   Hash            Array Index   Array Index After Linear Probing
1     1     1 % 20 = 1      1             1
2     2     2 % 20 = 2      2             2
3     42    42 % 20 = 2     2             3
4     4     4 % 20 = 4      4             4
5     12    12 % 20 = 12    12            12
6     14    14 % 20 = 14    14            14
7     17    17 % 20 = 17    17            17
8     13    13 % 20 = 13    13            13
9     37    37 % 20 = 17    17            18
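The final column of the table above can be derived by inserting the keys one at a time. This is a minimal sketch: the function name is hypothetical, and wrapping around to index 0 after the last cell is an assumption that the slides do not state.

```python
def insert_with_linear_probing(keys, size):
    """Place each key at key % size, or at the next free cell after it."""
    if len(keys) > size:
        raise ValueError("more keys than cells in the table")
    table = [None] * size
    placed = {}
    for key in keys:
        index = key % size
        while table[index] is not None:     # cell taken: try the next one
            index = (index + 1) % size      # assumed wrap-around at the end
        table[index] = key
        placed[key] = index
    return placed


keys = [1, 2, 42, 4, 12, 14, 17, 13, 37]
print(insert_with_linear_probing(keys, 20))
# {1: 1, 2: 2, 42: 3, 4: 4, 12: 12, 14: 14, 17: 17, 13: 13, 37: 18}
```

Key 42 moves from index 2 to index 3, and key 37 moves from index 17 to index 18, matching the table.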
Basic Operations
• Following are the basic operations of a hash table.
• Search − Searches an element in a hash table.
• Insert − Inserts an element in a hash table.
• Delete − Deletes an element from a hash table.
• Data Item
• Define a data item having some data and key, based on
which the search is to be conducted in a hash table.
Cont.
• Search Operation
• Whenever an element is to be searched,
compute the hash code of the key passed and
locate the element using that hash code as
index in the array.
• Use linear probing to check the cells ahead if
the element is not found at the computed
hash code.
Cont.
• Insert Operation
• Whenever an element is to be inserted,
compute the hash code of the key passed and
locate the index using that hash code as an
index in the array.
• Use linear probing to find an empty location if an
element is already present at the computed hash code.
Cont.
• Delete Operation
• Whenever an element is to be deleted, compute
the hash code of the key passed and locate the
index using that hash code as an index in the
array.
• Use linear probing to check the cells ahead if the
element is not found at the computed hash code.
When found, store a dummy item there so that later
searches can still probe past the deleted cell.
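The three operations can be sketched together as a small class. Everything here is illustrative and assumes integer keys: the class name, the method names, the `_DELETED` marker used as the dummy item, and the choice to return None for a missing key are assumptions, not details from the slides.

```python
class HashTable:
    """Open-addressing hash table with linear probing (illustrative sketch)."""

    _DELETED = object()   # dummy item left behind by delete()

    def __init__(self, size=20):
        self.size = size
        self.slots = [None] * size

    def _probe(self, key):
        """Yield every slot index once, starting at the hash code of key."""
        start = key % self.size
        for step in range(self.size):
            yield (start + step) % self.size

    def insert(self, key, value):
        free = None
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None or slot is self._DELETED:
                if free is None:
                    free = index          # remember the first reusable cell
                if slot is None:
                    break                 # key cannot be further along
            elif slot[0] == key:
                self.slots[index] = (key, value)   # key exists: update it
                return
        if free is None:
            raise OverflowError("hash table is full")
        self.slots[free] = (key, value)

    def search(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None:
                return None               # empty cell ends the probe
            if slot is not self._DELETED and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None:
                return False
            if slot is not self._DELETED and slot[0] == key:
                self.slots[index] = self._DELETED   # store the dummy item
                return True
        return False


table = HashTable(20)
for key, value in [(1, 20), (2, 70), (42, 80), (4, 25), (12, 44),
                   (14, 32), (17, 11), (13, 78), (37, 98)]:
    table.insert(key, value)

print(table.search(42))   # 80
table.delete(2)           # leaves a dummy item at index 2
print(table.search(42))   # 80, still reachable past the dummy item
print(table.search(2))    # None
```

If `delete` simply emptied the cell, the search for key 42 would stop at index 2 and wrongly report that the key is missing. The dummy item prevents this.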
Complexity Analysis of a Hash Table:
• For lookup, insertion, and deletion operations,
hash tables have an average-case time
complexity of O(1).
Yet, these operations may, in the worst case,
require O(n) time, where n is the number of
elements in the table.
Applications of Hash Table:
• Hash tables are frequently used for indexing and
searching massive volumes of data.
• A search engine might use a hash table to store the
web pages that it has indexed.
• Data is usually cached in memory via hash tables,
enabling rapid access to frequently used information.
• Hash functions are frequently used in cryptography to
create digital signatures, validate data, and guarantee
data integrity.
• Hash tables can be used for implementing database
indexes, enabling fast access to data based on key
values.
End of chapter