Data Structures and Algorithms Overview
• Operations on data structures:
o Traversing, Searching, Inserting, Deleting, Sorting, Merging.
• Algorithm properties:
o It must be correct (must produce desired output).
o It is composed of a series of concrete steps.
o There can be no ambiguity.
o It must be composed of a finite number of steps.
o It must terminate.
• To summarize:
o Problem - a mapping from inputs to outputs.
o Algorithm - a step-by-step set of operations to solve a specific problem
or a set of problems.
o Program - a specific sequence of instructions in a programming language; it
may contain the implementation of many algorithms.
• Two important things about data types:
o Defines a certain domain of values
o Defines operations allowed on those values
o Example: int
▪ Takes only integer values
▪ Operations: addition, subtraction, multiplication, division, bitwise
operations.
• ADT describes a set of objects sharing the same properties and behaviors.
o The properties of an ADT are its data.
o The behaviors of an ADT are its operations or functions.
• ADT example: stack (can be implemented with array or linked list)
• Abstraction is the method of hiding unwanted implementation details and exposing only the essential information.
• Encapsulation is a method of bundling data into a single entity or unit along with
methods that protect the information from outside. Encapsulation can be
implemented using access modifiers, i.e. private, protected and public.
• A data structure is the organization of the data in a way so that it can be used
efficiently.
• It is used to implement an ADT.
• An ADT tells us what is to be done and a data structure tells us how to do it.
• Types:
o linear (stack, array, linked list)
o non-linear (tree, graph)
o static (compile time memory allocation), array
▪ Advantage: fast access
▪ Disadvantage: slow insertion and deletion
o dynamic (run time memory allocation), linked list
▪ Advantage: faster insertion and deletion
▪ Disadvantage: slow access
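The access trade-off above can be illustrated with a minimal sketch (the Node class and get helper are my own illustrations, not from the notes):

```python
# Array (Python list): direct index access, O(1)
arr = [10, 20, 30, 40]

# Singly linked list: must walk node by node, O(n)
class Node:
    def __init__(self, info, next=None):
        self.info = info
        self.next = next

# Build 10 -> 20 -> 30 -> 40
head = Node(10, Node(20, Node(30, Node(40))))

def get(head, index):
    """Walk `index` links from the head - linear time."""
    cur = head
    for _ in range(index):
        cur = cur.next
    return cur.info

print(arr[2], get(head, 2))  # both print 30
```

Insertion is the mirror image: a linked list splices a node in constant time once the position is known, while an array may have to shift everything after the insertion point.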
Asymptotic notations
• Efficiency measured in terms of TIME and SPACE. In terms of number of
operations.
• Asymptotic complexity
• O (Big-O) notation (worst case, upper bound, maximum complexity): f(n) = O(g(n)) if 0 <= f(n) <= c*g(n) for all n >= n0
• Example: f(n) = 3n + 2, g(n) = n; claim: f(n) = O(g(n))
• 3n + 2 <= c*n
• With c = 4: 3n + 2 <= 4n, which holds for n >= 2
• So c = 4, n0 = 2
o n³ = O(n²) False
o n² = O(n³) True
• Ω (Omega) notation (best case, lower bound): f(n) = Ω(g(n)) if 0 <= c*g(n) <= f(n)
for all n >= n0
• Example: f(n) = 3n + 2, g(n) = n; claim: f(n) = Ω(g(n))
• c*n <= 3n + 2
• With c = 1: n <= 3n + 2, i.e. 2n >= -2, i.e. n >= -1
• So c = 1, and any n0 >= 1 works
• Θ (Big-theta) notation (tight bound, sandwiched between lower and upper bounds, often used for the average case): f(n) = Θ(g(n)) if 0 <= c1*g(n) <= f(n) <= c2*g(n) for all n >= n0
• Example: f(n) = 3n + 2, g(n) = n; claim: f(n) = Θ(g(n))
• c1*n <= 3n + 2 <= c2*n
• Upper bound: 3n + 2 <= c2*n; with c2 = 4 this holds for n >= 2
• Lower bound: c1*n <= 3n + 2; with c1 = 1 this holds for n >= -1, so for all n >= 1
• c2 = 4, c1 = 1, n0 = 2 // We must take the greater n0, which is true for both bounds
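The constants derived above can be checked numerically; a quick sketch:

```python
# Empirically check the bounds derived above for f(n) = 3n + 2, g(n) = n.
f = lambda n: 3 * n + 2
g = lambda n: n

# Big-O:   f(n) <= 4*g(n)  for all n >= 2
assert all(f(n) <= 4 * g(n) for n in range(2, 1000))

# Omega:   1*g(n) <= f(n)  for all n >= 1
assert all(1 * g(n) <= f(n) for n in range(1, 1000))

# Theta:   1*g(n) <= f(n) <= 4*g(n)  for all n >= 2 (the greater n0)
assert all(g(n) <= f(n) <= 4 * g(n) for n in range(2, 1000))
print("bounds hold")
```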
Searching Techniques
• Searching is an operation which finds the location of a given element in a list.
• The search is said to be successful or unsuccessful depending on whether the
element that is to be searched is found or not.
Linear Search
• In linear search, the value is compared with each element of the collection; if a match is found its location is
returned, otherwise the search continues till the end of the data collection.
• Pseudocode:
• procedure linear_search(list, value)
• for each item in the list
• if item == value
• return the item's location
• end if
• end for
end procedure
CODE: // C++ code to linearly search x in arr[]. If x
// is present then return its location, otherwise
// return -1
#include <iostream>
using namespace std;

int search(int arr[], int n, int x)
{
    for (int i = 0; i < n; i++)
        if (arr[i] == x)
            return i;
    return -1;
}

// Driver code
int main(void)
{
    int arr[] = { 2, 3, 4, 10, 40 };
    int x = 10;
    int n = sizeof(arr) / sizeof(arr[0]);
    // Function call
    int result = search(arr, n, x);
    (result == -1)
        ? cout << "Element is not present in array"
        : cout << "Element is present at index " << result;
    return 0;
}
• Analysis:
o Best case O(1)
o Average O(n)
o Worst O(n)
Binary Search
• Binary Search is a searching algorithm for finding an element's position in a sorted
array.
• It's fast and efficient; time complexity of binary search: O(log n)
• In this method:
o To search an element we compare it with the element present at the center of
the list. If it matches then the search is successful.
o Otherwise, the list is divided into two halves:
▪ One from 0th element to the center element (first half)
▪ Another from center element to the last element (second half)
o The searching will now proceed in either of the two halves depending upon
whether the element is greater or smaller than the center element.
o If the element is smaller than the center element then the searching will be done
in the first half, otherwise in the second half.
• It can be done recursively or iteratively.
• Pseudocode:
• procedure binary_search
• A ← sorted array
• n ← size of array
• x ← value to be searched
•
• set lowerBound = 1
• set upperBound = n
•
• while x not found
• if upperBound < lowerBound
• EXIT: x does not exist.
•
• set midPoint = lowerBound + (upperBound - lowerBound) / 2
•
• if A[midPoint] < x
• set lowerBound = midPoint + 1
•
• if A[midPoint] > x
• set upperBound = midPoint - 1
•
• if A[midPoint] = x
• EXIT: x found at location midPoint
• end while
•
• end procedure
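The pseudocode above translates to a runnable sketch (function and variable names are my own):

```python
def binary_search(A, x):
    """Iterative binary search on a sorted list; returns index of x or -1."""
    low, high = 0, len(A) - 1
    while low <= high:
        mid = low + (high - low) // 2  # avoids overflow in lower-level languages
        if A[mid] == x:
            return mid
        elif A[mid] < x:
            low = mid + 1   # search the second half
        else:
            high = mid - 1  # search the first half
    return -1

data = [2, 3, 4, 10, 40]
print(binary_search(data, 10))  # 3
print(binary_search(data, 7))   # -1 (not present)
```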
Sorting techniques
• Sorting - a process of arranging a set of data in certain order
• Internal sorting - deals with data in memory of computer
• External sorting - deals with data stored in data files when data is in
large volume
• Types of sorts:
o Selection sort - O(n²). Selects the smallest element from an unsorted list
and places that element at the front.
o Bubble sort - best O(n), else O(n²). Compares adjacent elements, and
swaps elements bringing large elements to the end.
o Insertion sort - best O(n), else O(n²). Places each unsorted element at its
suitable place in each iteration.
o Merge sort - O(n*log n). Based on the Divide and Conquer approach:
divides in the middle, sorts, then combines.
o Quick sort - PIVOT, worst O(n²), else O(n*log n). Based on the Divide and
Conquer approach: larger and smaller elements are placed after and
before the pivot element.
o Heap sort - O(n*log n).
o Radix sort
o Bucket sort
Selection Sort Algorithm
Selection sort is a sorting algorithm that selects the smallest element from
an unsorted list in each iteration and places that element at the beginning of
the unsorted list.
1. Set the first element as minimum.
2. Compare minimum with the second element; if the second element is smaller,
assign minimum to the second element. Then compare minimum with the third element. Again, if the third element is smaller,
then assign minimum to the third element, otherwise do nothing. The process
goes on until the last element.
3. After each iteration, minimum is placed at the front of the unsorted list.
selectionSort(array, size)
repeat (size - 1) times
set the first unsorted element as the minimum
for each of the unsorted elements
if element < currentMinimum
set element as new minimum
swap minimum with first unsorted position
end selectionSort
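The pseudocode above can be fleshed out as a runnable sketch (function and variable names are my own):

```python
def selection_sort(array):
    """In-place selection sort following the pseudocode above."""
    size = len(array)
    for step in range(size - 1):
        # assume the first unsorted element is the minimum
        min_idx = step
        for i in range(step + 1, size):
            if array[i] < array[min_idx]:
                min_idx = i  # found a new minimum
        # swap the minimum with the first unsorted position
        array[step], array[min_idx] = array[min_idx], array[step]

data = [20, 12, 10, 15, 2]
selection_sort(data)
print(data)  # [2, 10, 12, 15, 20]
```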
Best O(n²)
Worst O(n²)
Average O(n²)
Stability No
Cycle Number of Comparisons
1st (n-1)
2nd (n-2)
3rd (n-3)
... ...
last 1
Total: (n-1) + (n-2) + ... + 1 = n(n-1)/2, which nearly equals n².
Complexity = O(n²)
Time Complexities:
• Worst Case Complexity: O(n²)
It occurs when the elements of the array are in jumbled order (neither
ascending nor descending).
The time complexity of the selection sort is the same in all cases. At every
step, you have to find the minimum element and put it in the right place. The
minimum element is not known until the end of the array is reached.
Space Complexity:
Space complexity is O(1) because an extra variable temp is used.
Bubble Sort
Bubble sort is a sorting algorithm that compares two adjacent elements and
swaps them until they are in the intended order.
Just like the movement of air bubbles in the water that rise up to the surface,
each element of the array moves to the end in each iteration. Therefore, it is
called a bubble sort.
1. Starting from the first index, compare the first and the second elements.
2. If the first element is greater than the second element, they are swapped.
3. Now, compare the second and the third elements. Swap them if they are not
in order.
2. Remaining Iteration
The same process goes on for the remaining iterations.
After each iteration, the largest element among the unsorted elements is
placed at the end.
In each iteration, the comparison takes place up to the last unsorted element.
The array is sorted when all the unsorted elements are placed at their correct
positions.
bubbleSort(array)
for i <- 1 to indexOfLastUnsortedElement-1
if leftElement > rightElement
swap leftElement and rightElement
end bubbleSort
Optimized bubble sort (stop when no swaps occur in a pass):
bubbleSort(array)
swapped <- false
for i <- 1 to indexOfLastUnsortedElement-1
if leftElement > rightElement
swap leftElement and rightElement
swapped <- true
if swapped = false, the array is already sorted: stop
end bubbleSort
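The optimized version above, with the swapped flag, can be sketched in Python (names are my own):

```python
def bubble_sort(array):
    """Optimized bubble sort: stop early when a pass makes no swaps."""
    n = len(array)
    for i in range(n):
        swapped = False
        # comparison happens only up to the last unsorted element
        for j in range(n - i - 1):
            if array[j] > array[j + 1]:
                array[j], array[j + 1] = array[j + 1], array[j]
                swapped = True
        if not swapped:  # already sorted: this gives the best case O(n)
            break

data = [-2, 45, 0, 11, -9]
bubble_sort(data)
print(data)  # [-9, -2, 0, 11, 45]
```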
Best O(n)
Worst O(n²)
Average O(n²)
Stability Yes
Complexity in Detail
Cycle Number of Comparisons
1st (n-1)
2nd (n-2)
3rd (n-3)
....... ......
last 1
Total: (n-1) + (n-2) + ... + 1 = n(n-1)/2, which nearly equals n².
Also, if we observe the code, bubble sort requires two loops. Hence, the
complexity is n*n = n².
1. Time Complexities
• Worst Case Complexity: O(n²)
It occurs when the elements of the array are in jumbled order (neither
ascending nor descending).
2. Space Complexity
Space complexity is O(1) because an extra variable is used for swapping.
Insertion Sort
Insertion sort works the way we sort playing cards in our hands; it is used when
the number of elements is small or when complexity does not matter.
We assume that the first card is already sorted then, we select an unsorted
card. If the unsorted card is greater than the card in hand, it is placed on the
right otherwise, to the left. In the same way, other unsorted cards are taken
and put in their right place.
Initial array
1. The first element in the array is assumed to be sorted. Take the second
element and store it separately in key.
Compare key with the first element. If the first element is greater than key,
then key is placed in front of the first element.
2. Now, the first two elements are sorted.
Take the third element and compare it with the elements on the left of it.
Place it just behind the element smaller than it. If there is no element smaller
than it, then place it at the beginning of the array.
3. Similarly, place every unsorted element at its correct position.
Place 4 behind 1
Place 3 behind 1 and the
array is sorted
insertionSort(array)
mark first element as sorted
for each unsorted element X
'extract' the element X
for j <- lastSortedIndex down to 0
if current element j > X
move sorted element to the right by 1
break loop and insert X here
end insertionSort
def insertionSort(array):
    for step in range(1, len(array)):
        key = array[step]
        j = step - 1
        # Compare key with each element on the left of it until an element smaller than
        # it is found
        # For descending order, change key < array[j] to key > array[j].
        while j >= 0 and key < array[j]:
            array[j + 1] = array[j]
            j = j - 1
        # Place key after the element just smaller than it
        array[j + 1] = key

data = [9, 5, 1, 4, 3]
insertionSort(data)
print('Sorted Array in Ascending Order:')
print(data)
Best O(n)
Worst O(n²)
Average O(n²)
Stability Yes
Time Complexities
• Worst Case Complexity: O(n²)
Each element has to be compared with each of the other elements, so for
every nth element, (n-1) comparisons are made.
• Best Case Complexity: O(n)
When the array is already sorted, the outer loop runs n times
whereas the inner loop does not run at all. So, there are only n
comparisons. Thus, complexity is linear.
• Average Case Complexity: O(n²)
Space Complexity
Space complexity is O(1) because an extra variable key is used.
Merge Sort example
Suppose we had to sort an array A . A subproblem would be to sort a sub-
section of this array starting at index p and ending at index r , denoted
as A[p..r] .
Divide
If q is the half-way point between p and r, then we can split the
subarray A[p..r] into two arrays A[p..q] and A[q+1..r].
Conquer
In the conquer step, we try to sort both the subarrays A[p..q] and A[q+1..r].
If we haven't yet reached the base case, we again divide both these
subarrays and try to sort them.
Combine
When the conquer step reaches the base step and we get two sorted
subarrays A[p..q] and A[q+1..r] for array A[p..r], we combine the results by
creating a sorted array A[p..r] from the two sorted subarrays A[p..q] and A[q+1..r].
MergeSort Algorithm
The MergeSort function repeatedly divides the array into two halves until we
reach a stage where we try to perform MergeSort on a subarray of size 1
i.e. p == r .
After that, the merge function comes into play and combines the sorted
arrays into larger arrays until the whole array is merged.
MergeSort(A, p, r):
if p >= r
return
q = (p+r)/2
mergeSort(A, p, q)
mergeSort(A, q+1, r)
merge(A, p, q, r)
As shown in the image below, the merge sort algorithm recursively divides
the array into halves until we reach the base case of array with 1 element.
After that, the merge function picks up the sorted sub-arrays and merges
them to gradually sort the entire array.
The algorithm maintains three pointers, one for each of the two arrays and
one for maintaining the current index of the final sorted array.
Merge step
Writing the Code for Merge Algorithm
A noticeable difference between the merging step we described above and
the one we use for merge sort is that we only perform the merge function on
consecutive sub-arrays.
This is why we only need the array, the first index, the last index of the first
subarray (from which we can calculate the first index of the second subarray), and the last
index of the second subarray.
Our task is to merge two subarrays A[p..q] and A[q+1..r] to create a sorted
array A[p..r] . So the inputs to the function are A, p, q and r
The merge function works as follows:
The two halves are first copied into temporary arrays L and M (of sizes n1 and n2):

int n1 = q - p + 1; // size of first subarray A[p..q]
int n2 = r - q;     // size of second subarray A[q+1..r]
int L[n1], M[n2];

for (int i = 0; i < n1; i++)
    L[i] = arr[p + i];
for (int j = 0; j < n2; j++)
    M[j] = arr[q + 1 + j];
int i, j, k;
i = 0; // current index of first subarray L
j = 0; // current index of second subarray M
k = p; // current index of merged array arr[p..r]
Compare individual elements of the sorted subarrays until we reach the end of one.
Step 4: When we run out of elements in either L or M, pick up
the remaining elements and put them in A[p..r]:
Copy the remaining elements from the first array to the main subarray.
Copy the remaining elements of the second array to the main subarray. (This step
would have been needed if the size of M was greater than L.)
Merge Sort Code in Python, Java, and C/C++
# MergeSort in Python
def mergeSort(array):
if len(array) > 1:
i = j = k = 0
# Print the array
def printList(array):
for i in range(len(array)):
print(array[i], end=" ")
print()
# Driver program
if __name__ == '__main__':
array = [6, 5, 12, 10, 9, 1]
mergeSort(array)
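The truncated mergeSort snippet above can be completed; a full sketch, assuming the standard top-down approach described in this section:

```python
def mergeSort(array):
    """Recursive merge sort: divide in the middle, sort halves, merge."""
    if len(array) > 1:
        mid = len(array) // 2
        L = array[:mid]   # first half
        M = array[mid:]   # second half
        mergeSort(L)
        mergeSort(M)
        # merge the two sorted halves back into array
        i = j = k = 0
        while i < len(L) and j < len(M):
            if L[i] <= M[j]:
                array[k] = L[i]
                i += 1
            else:
                array[k] = M[j]
                j += 1
            k += 1
        # copy whatever remains in either half
        while i < len(L):
            array[k] = L[i]
            i += 1
            k += 1
        while j < len(M):
            array[k] = M[j]
            j += 1
            k += 1

array = [6, 5, 12, 10, 9, 1]
mergeSort(array)
print(array)  # [1, 5, 6, 9, 10, 12]
```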
Best O(n*log n)
Worst O(n*log n)
Average O(n*log n)
Stability Yes
Time Complexity: O(n*log n) in all cases (best, average, and worst).
Space Complexity: O(n), for the temporary subarrays used while merging.
Merge Sort Applications:
• External sorting
• E-commerce applications
Quicksort Algorithm
Quicksort is a sorting algorithm based on the divide and conquer
approach where
1. An array is divided into subarrays by selecting a pivot element (element
selected from the array).
While dividing the array, the pivot element should be positioned in such a
way that elements less than pivot are kept on the left side and elements
greater than pivot are on the right side of the pivot.
2. The left and right subarrays are also divided using the same approach. This
process continues until each subarray contains a single element.
3. At this point, elements are already sorted. Finally, elements are combined to
form a sorted array.
Working of Quicksort Algorithm
1. Select the Pivot Element
There are different variations of quicksort where the pivot element is selected
from different positions. Here, we will be selecting the rightmost element of
the array as the pivot element.
1. A pointer is fixed at the pivot element. The pivot element is compared with
the elements beginning from the first index.
2. If the element is greater than the pivot element, a second pointer is set for that element.
3. Now, pivot is compared with other elements. If an element smaller than the
pivot element is reached, the smaller element is swapped with the greater
element found earlier.
4. Again, the process is repeated to set the next greater element as the second
pointer. And, swap it with another smaller element.
3. Divide Subarrays
Pivot elements are again chosen for the left and the right sub-parts
separately. And, step 2 is repeated.
Select the pivot element in each half and put it at the correct place using recursion.
The subarrays are divided until each subarray is formed of a single element.
At this point, the array is already sorted.
# pointer for greater element
i = low - 1
data = [8, 7, 2, 1, 0, 9, 6]
print("Unsorted Array")
print(data)
size = len(data)
quickSort(data, 0, size - 1)
print(data)
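The fragments above come from a partition-based quicksort; a complete sketch using Lomuto partition with the rightmost element as pivot (function names are my own):

```python
def partition(array, low, high):
    """Lomuto partition: rightmost element as pivot."""
    pivot = array[high]
    i = low - 1  # pointer for the greater-element boundary
    for j in range(low, high):
        if array[j] <= pivot:
            i += 1
            array[i], array[j] = array[j], array[i]
    # put the pivot just after the last smaller element
    array[i + 1], array[high] = array[high], array[i + 1]
    return i + 1

def quickSort(array, low, high):
    if low < high:
        p = partition(array, low, high)
        quickSort(array, low, p - 1)   # elements left of pivot
        quickSort(array, p + 1, high)  # elements right of pivot

data = [8, 7, 2, 1, 0, 9, 6]
quickSort(data, 0, len(data) - 1)
print(data)  # [0, 1, 2, 6, 7, 8, 9]
```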
Quicksort Complexity
Time Complexity
Best O(n*log n)
Worst O(n²)
Average O(n*log n)
Stability No
1. Time Complexities
• Worst Case Complexity [Big-O]: O(n²)
It occurs when the pivot element picked is either the greatest or the smallest
element.
This condition leads to the case in which the pivot element lies in an extreme
end of the sorted array. One sub-array is always empty and another sub-
array contains n - 1 elements. Thus, quicksort is called only on this sub-
array.
• Best Case Complexity [Big-omega]: O(n*log n)
It occurs when the pivot element is always the middle element or near to the
middle element.
• Average Case Complexity [Big-theta]: O(n*log n)
Quicksort Applications
The quicksort algorithm is used when the programming language is good for
recursion and when time and space complexity matter.
Heap Sort Algorithm
Suppose an array is [10, 3, 76, 34, 23, 32]; after sorting, we get a sorted array [3, 10, 23, 32, 34, 76].
Heap sort works by visualizing the elements of the array as a special kind of
complete binary tree called a heap.
Note: As a prerequisite, you must know about a complete binary
tree and heap data structure.
If the index of any element in the array is i, the element at index 2i+1 will
become the left child and the element at index 2i+2 will become the right child.
Also, the parent of any element at index i is given by the floor of (i-1)/2.
Relationship between array and heap indices. For the array [1, 12, 9, 5, 6]:
Left child of 1 (at index 0)
= element at index 2*0+1 = 1
= 12
Right child of 1
= element at index 2*0+2 = 2
= 9
Similarly, left child of 12 (at index 1)
= element at index 2*1+1 = 3
= 5
Right child of 12
= element at index 2*1+2 = 4
= 6
Let us also confirm that the rules hold for finding the parent of any node:
Parent of 9 (position 2)
= (2-1)/2
= 0.5
~ index 0
= 1
Parent of 12 (position 1)
= (1-1)/2
= index 0
= 1
Max Heap and Min Heap
Since heapify uses recursion, it can be difficult to grasp. So let's first think
about how you would heapify a tree with just three elements.
heapify(array)
Root = array[0]
Largest = largest(array[0], array[2*0 + 1], array[2*0 + 2])
if (Root != Largest)
Swap(Root, Largest)
Heapify base cases
The example above shows two scenarios - one in which the root is the largest
element and we don't need to do anything. And another in which the root had
a larger element as a child and we needed to swap to maintain max-heap
property.
Now let's think of another scenario in which there is more than one level.
How to heapify the root element when its subtrees are already max-heaps
The top element isn't a max-heap but all the sub-trees are max-heaps.
To maintain the max-heap property for the entire tree, we will have to keep
pushing 2 downwards until it reaches its correct position.
How to heapify the root element when its subtrees are max-heaps
Thus, to maintain the max-heap property in a tree where both sub-trees are
max-heaps, we need to run heapify on the root element repeatedly until it is
larger than its children or it becomes a leaf node.
void heapify(int arr[], int n, int i) {
  int largest = i, left = 2 * i + 1, right = 2 * i + 2;
  if (left < n && arr[left] > arr[largest]) largest = left;
  if (right < n && arr[right] > arr[largest]) largest = right;
  // Swap and continue heapifying if root is not largest
  if (largest != i) {
    swap(&arr[i], &arr[largest]);
    heapify(arr, n, largest);
  }
}
This function works for both the base case and for a tree of any size. We can
thus move the root element to the correct position to maintain the max-heap
status for any tree size as long as the sub-trees are max-heaps.
Build max-heap
To build a max-heap from any tree, we can thus start heapifying each sub-
tree from the bottom up and end up with a max-heap after the function is
applied to all the elements including the root element.
In the case of a complete tree, the first index of a non-leaf node is given
by n/2 - 1 . All other nodes after that are leaf-nodes and thus don't need to
be heapified.
So, we can build a maximum heap as
Create array and calculate i
Steps to build max heap for heap sort
If you've understood everything till here, congratulations, you are on your
way to mastering the Heap sort.
1. Build: Build a max heap from the input array.
2. Swap: Remove the root element and put it at the end of the array (nth position).
Put the last item of the tree (heap) at the vacant place.
3. Remove: Reduce the size of the heap by 1.
4. Heapify: Heapify the root element again so that we have the highest element
at root.
5. The process is repeated until all the items of the list are sorted.
Swap, Remove, and Heapify
The code below shows the operation.
// Heap sort
for (int i = n - 1; i >= 0; i--) {
    swap(&arr[0], &arr[i]);
    // Heapify the reduced heap to bring the next largest element to the root
    heapify(arr, i, 0);
}
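The pieces described in this section can be assembled into a complete Python sketch (heapify, the build phase, and the swap-and-reheapify phase; names are my own):

```python
def heapify(arr, n, i):
    """Sift element i down until the subtree rooted at i is a max-heap."""
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < n and arr[left] > arr[largest]:
        largest = left
    if right < n and arr[right] > arr[largest]:
        largest = right
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)

def heapSort(arr):
    n = len(arr)
    # build a max heap: heapify non-leaf nodes bottom-up, starting at n/2 - 1
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)
    # repeatedly move the root (maximum) to the end and re-heapify the rest
    for i in range(n - 1, 0, -1):
        arr[0], arr[i] = arr[i], arr[0]
        heapify(arr, i, 0)

data = [1, 12, 9, 5, 6, 10]
heapSort(data)
print(data)  # [1, 5, 6, 9, 10, 12]
```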
Best O(n log n)
Worst O(n log n)
Average O(n log n)
Stability No
Heap Sort has O(n log n) time complexity for all the cases (best case,
average case, and worst case).
Let us understand the reason why. The height of a complete binary tree
containing n elements is log n.
As we have seen earlier, to fully heapify an element whose subtrees are
already max-heaps, we need to keep comparing the element with its left and
right children and pushing it downwards until it reaches a point where both
its children are smaller than it.
In the worst case scenario, we will need to move an element from the root to
the leaf node making a multiple of log(n) comparisons and swaps.
During the build_max_heap stage, we do that for n/2 elements so the worst
case complexity of the build_heap step is n/2*log n ~ nlog n .
During the sorting step, we exchange the root element with the last element
and heapify the root element. For each element, this again takes log n worst
time because we might have to bring the element all the way from the root
to the leaf. Since we repeat this n times, the heap_sort step is also nlog n .
Also since the build_max_heap and heap_sort steps are executed one after
another, the algorithmic complexity is not multiplied and it remains in the
order of nlog n .
Also it performs sorting in O(1) space complexity. Compared with Quick Sort,
it has a better worst case ( O(nlog n) ) . Quick Sort has complexity O(n^2) for
worst case. But in other cases, Quick Sort is fast. Introsort is an alternative
to heapsort that combines quicksort and heapsort to retain advantages of
both: worst case speed of heapsort and average speed of quicksort.
Heap sort is typically not preferred in practice over other efficient algorithms
(e.g. Quick Sort, Merge Sort). However, its underlying data structure, heap, can
be efficiently used if we want to extract the smallest (or largest) from the list
of items without the overhead of keeping the remaining items in the sorted
order. For e.g. Priority Queues.
Quick sort
• Based on divide and conquer approach.
• Algorithm:
o An array is divided into sub-arrays by selecting a pivot element (element
selected from the array).
o While dividing the array, the pivot element should be positioned in such a way
that elements less than pivot are kept on the left side and elements greater
than pivot are on the right side of the pivot.
o The left and right sub-arrays are also divided using the same approach. This
process continues until each subarray contains a single element.
o At this point, elements are already sorted. Finally, elements are combined to
form a sorted array
• Working with Quicksort algorithm:
i. Select the pivot element. We select the rightmost element of array as pivot
element.
ii. Rearrange the array. We rearrange smaller and larger elements to right and left
side of pivot.
a. A pointer "i" starts just before the first element; a second pointer "j"
scans the array from the first index.
b. We compare "j" with the pivot. If "j" is smaller than the pivot, we swap "j" with
"i" and increment "i".
c. If "j" reaches the pivot, we just swap the pivot with "i".
d. Now we have two sub-arrays, and we repeat the same algorithm on each.
Heap sort
• Left child of element i is 2i + 1, right child is 2i + 2. Indexing starts from 0
• Parent of element i can be found with (i-1) / 2
• Heap data structure:
o It is a complete binary tree (nodes are formed from left to right)
o All nodes are greater than their children (max-heap)
• To create a Max-Heap from a complete binary tree, we must use a heapify function.
o n/2 - 1 is the first index of a non-leaf node.
o The heapify function brings the larger element to the top. It is applied to one
sub-tree recursively.
o As a precondition for swapping, we must first bring our tree
to a MAX-HEAP, so that the largest element is at the top. This is needed before
we start sorting the array.
o // Max-heap creation
o for(int i = n/2 - 1; i >= 0; i--)
heapify(arr, n, i);
Linked List
• Array limitations:
o Fixed size
o Physically stored in consecutive memory locations
o To insert or delete items, may need to shift data
• Variations of linked list: linear linked list, circular linked list, double linked list
• head pointer "defines" the linked list (it is not a node)
• Disadvantages of Linked Lists
o A linked list will use more memory storage than an array: each node needs
an additional link (next pointer) field.
o Linked list elements cannot randomly be accessed.
o Binary search cannot be applied in a linked list.
o A linked list takes more time in traversing of elements.
• Node
o A linked list is an ordered sequence of items called nodes
o A node is the basic unit of representation in a linked list
o A node in a singly linked list consists of two fields:
▪ A data portion
▪ A link (pointer) to the next node in the structure
o The first item (node) in the linked list is accessed via a front or head pointer
▪ The linked list is defined by its head (this is its starting point)
• We will use ListNode and LinkedList classes:
• class Node {
• public:
• int info; // data
• Node* next; // pointer to next node in the list
• /*Node(int val) {info = val; next=NULL;}*/
• };
•
• class List {
• public:
• // head: a pointer to the first node in the list.
• // Since the list is empty initially, head is set to NULL
• List(void) {head = NULL;} // constructor
• ~List(void); // destructor
•
• private:
• Node* head;
• };
// isEmpty, insertNode, findNode, deleteNode, displayList
• Boundary condition
o Empty data structure
o Single element in the data structure
o Adding / removing beginning of data structure
o Adding / removing end of data structure
o Working in the middle
o When adding at the beginning, the new node becomes the head. This is achieved by
pointing node->next to the current head and then assigning head = node.
• cur = head;
•
• for(int i=1; i<pos; i++) {
• pre = cur;
• cur = cur->next;
• }
• pre->next = node;
• node->next = cur;
}
void insertSpecificValue(int sp_val, int data) {
    Node *pre;
    Node *cur;
    Node *node = new Node;
    node->info = data;
    cur = head; // "current" starts at the head; "previous" trails one node behind
    while (cur->info != sp_val) {
        pre = cur;
        cur = cur->next;
    }
    // insert the new node just before the node holding sp_val
    pre->next = node;
    node->next = cur;
}
Deleting the last node from a Linked List
• The following steps are needed to remove the last node:
o Check if the linked list exists or not if(head == NULL).
o Check if it is one element list.
o Take a pointer variable PTR and initialize it with head. That is, PTR now points to
the first node of the linked list. In the while loop, we take another pointer
variable PREPTR such that it always points to one node before the PTR. Once we
reach the last node and the second last node, we set the NEXT pointer of the
second last node to NULL, so that it now becomes the (new) last node of the
linked list. The memory of the previous last node is freed and returned back to
the free pool.
• STEP 1: IF START = NULL
• WRITE UNDERFLOW
• Go to STEP 8
• [END OF IF]
• STEP 2: SET PTR = START
• STEP 3: REPEAT Steps 4 and 5 while PTR->NEXT != NULL
• STEP 4: SET PREPTR = PTR
• STEP 5: SET PTR = PTR->NEXT
• [END OF LOOP]
• STEP 6: SET PREPTR->NEXT = NULL
• STEP 7: FREE PTR
• STEP 8: EXIT
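The PTR/PREPTR steps above can be sketched in Python (the Node class and function name are my own):

```python
class Node:
    def __init__(self, info):
        self.info = info
        self.next = None

def delete_last(head):
    """Remove the last node; returns the (possibly new) head."""
    if head is None:            # underflow: empty list
        return None
    if head.next is None:       # single-element list becomes empty
        return None
    ptr, preptr = head, None
    while ptr.next is not None: # walk PTR to the last node, PREPTR trailing
        preptr = ptr
        ptr = ptr.next
    preptr.next = None          # second-last node becomes the new last node
    return head

# 1 -> 2 -> 3  becomes  1 -> 2
head = Node(1); head.next = Node(2); head.next.next = Node(3)
head = delete_last(head)
print(head.info, head.next.info, head.next.next)  # 1 2 None
```

In CPython the removed node is reclaimed by the garbage collector, which plays the role of the explicit FREE step in the pseudocode.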
Circular Linked List
• class Node {
• public:
• int info;
• Node *next;
• };
•
• class CircularLList {
• public:
• Node *last;
•
• CircularLList() {
• last = NULL;
• }
• };
prev = last;
cur = last->next;
New->next = cur;
prev->next = New;
}
Stacks
• Last in, first out (LIFO)
• Elements are added to and removed from the top of the stack (the most recently added
items are at the top of the stack).
• Operations on Stack:
o push(i) to insert the element i on the top of the stack.
o pop() to remove the top element of the stack and to return the removed
element as a function value.
o top() to return the top element of stack(s)
o empty() to check whether the stack is empty or not. It returns true if stack is
empty and returns false otherwise.
• PUSH operation
• Step 1: IF TOP = MAX - 1
• PRINT "OVERFLOW"
• Goto Step 4
• [END OF IF]
• Step 2: SET TOP = TOP + 1
• Step 3: SET STACK[TOP] = VALUE
• Step 4: END
•
• POP operation
• Step 1: IF TOP = NULL
• PRINT "UNDERFLOW"
• Goto Step 4
• [END OF IF]
• Step 2: SET VALUE = STACK[TOP]
• Step 3: SET TOP = TOP - 1
• Step 4: END
•
• PEEK operation
• Step 1: IF TOP = NULL
• PRINT "STACK IS EMPTY"
• Goto Step 3
• Step 2: RETURN STACK[TOP]
• Step 3: END
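The PUSH/POP/PEEK algorithms above can be sketched as a Python class (the max_size bound stands in for MAX; names are my own):

```python
class Stack:
    """Array-backed stack with the operations listed above."""
    def __init__(self, max_size=100):
        self.items = []
        self.max_size = max_size

    def push(self, value):
        if len(self.items) == self.max_size:
            raise OverflowError("OVERFLOW")
        self.items.append(value)

    def pop(self):
        if not self.items:
            raise IndexError("UNDERFLOW")
        return self.items.pop()

    def top(self):
        if not self.items:
            raise IndexError("STACK IS EMPTY")
        return self.items[-1]  # peek without removing

    def empty(self):
        return len(self.items) == 0

s = Stack()
s.push(1); s.push(2); s.push(3)
print(s.top())    # 3
print(s.pop())    # 3
print(s.empty())  # False
```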
Infix to Postfix
• Algorithm used (Postfix):
o Step 1: Add ) to the end of the infix expression
o Step 2: Push ( onto the STACK
o Step 3: Repeat until each character in the infix notation is scanned
▪ IF a ( is encountered, push it on the STACK.
▪ IF an operand (whether a digit or a character) is encountered, add it
to the postfix expression.
▪ IF a ) is encountered, then
▪ a. Repeatedly pop from STACK and add it to the postfix
expression until a ( is encountered.
▪ b. Discard the (. That is, remove the ( from STACK and do not
add it to the postfix expression
▪ IF an operator O is encountered, then
▪ a. Repeatedly pop from STACK and add each operator (popped
from the STACK) to the postfix expression which has the same
precedence or a higher precedence than O
▪ b. Push the operator O to the STACK [END OF IF]
o Step 4: Repeatedly pop from the STACK and add it to the postfix expression until
the STACK is empty
o Step 5: EXIT
• Example: if / arrives while the stack holds ((-*, we pop only * (same precedence); after pushing /, the stack is ((-/
• Example: (A * B) + (C / D) – (D + E)
•
• (A * B) + (C / D) – (D + E)) [put extra ")" at last]
•
• Char Stack Expression
• ( (( Push at beginning "("
• A (( A
• * ((* A
• B ((* AB
• ) ( AB*
• + (+ AB*
• ( (+( AB*
• C (+( AB*C
• / (+(/ AB*C
• D (+(/ AB*CD
• ) (+ AB*CD/
• - (- AB*CD/+
• ( (-( AB*CD/+
• D (-( AB*CD/+D
• + (-(+ AB*CD/+D
• E (-(+ AB*CD/+DE
• ) (- AB*CD/+DE+
• ) AB*CD/+DE+-
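The algorithm above can be sketched in Python for single-character operands (the function name and precedence table are my own):

```python
def infix_to_postfix(expr):
    """Stack-based infix-to-postfix conversion for single-character operands."""
    prec = {'+': 1, '-': 1, '*': 2, '/': 2}
    stack = ['(']        # step 2: push ( onto the stack
    expr = expr + ')'    # step 1: add ) to the end of the infix expression
    out = []
    for ch in expr:
        if ch == '(':
            stack.append(ch)
        elif ch.isalnum():          # operand: add to postfix expression
            out.append(ch)
        elif ch == ')':
            while stack[-1] != '(':
                out.append(stack.pop())
            stack.pop()             # discard the (
        elif ch in prec:
            # pop operators of same or higher precedence, then push
            while stack[-1] != '(' and prec[stack[-1]] >= prec[ch]:
                out.append(stack.pop())
            stack.append(ch)
        # spaces and other characters are ignored
    return ''.join(out)

print(infix_to_postfix('(A*B)+(C/D)-(D+E)'))  # AB*CD/+DE+-
```

The result matches the trace table above.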
Infix to Prefix
Second method (via postfix conversion):
o Step 1: Reverse the infix string. Note that while reversing the string you must
interchange left and right parentheses. Eg. (3+2) will be (2+3) but not )2+3(
o Step 2: Obtain the postfix expression of the infix expression obtained in Step 1.
o Step 3: Reverse the postfix expression to get the prefix expression
• Example: 14 / 7 * 3 - 4 + 9 / 2
•
• Reversed: 2 / 9 + 4 - 3 * 7 / 14
•
• Char Stack Expression
• 2 ( Push at beginning "("
• / (/ 2
• 9 (/ 2 9
• + (+ 2 9 /
• 4 (+ 2 9 / 4
• - (+- 2 9 / 4
• 3 (+- 2 9 / 4 3
• * (+-* 2 9 / 4 3
• 7 (+-* 2 9 / 4 3 7
• / (+-*/ 2 9 / 4 3 7
• 14 (+-*/ 2 9 / 4 3 7 14
• ) 2 9 / 4 3 7 14 / * - +
•
• DON'T FORGET TO REVERSE: + - * / 14 7 3 4 / 9 2
•
• NOTE: In this method, operators with the same precedence must not be popped from the stack
•
Queue
• First in, first out (FIFO)
• The queue has a front and a rear
o
o Items can be removed only at the front
o Items can be added only at the other end, the rear
• Types of queues:
o Linear queue
o Circular queue
o Double ended queue (Deque)
o Priority queue
Linear Queue
• A queue is a sequence of data elements
• Enqueue (add element to back) when an item is inserted into the queue, it
always goes at the end (rear).
• Dequeue (remove element from front) when an item is taken from the queue, it
always comes from the front.
• Array implementation:
o ENQUEUE
o Step 1: IF REAR = MAX-1
o Write "OVERFLOW"
o Goto step 4
o [END OF IF]
o Step 2: IF FRONT = -1 and REAR = -1
o SET FRONT = REAR = 0
o ELSE
o SET REAR = REAR + 1
o [END OF IF]
o Step 3: SET QUEUE [REAR] = NUM
o Step 4: EXIT
o DEQUEUE
o Step 1: IF FRONT = -1 OR FRONT > REAR
o Write "UNDERFLOW"
o ELSE
o SET VAL = QUEUE[FRONT]
o SET FRONT = FRONT + 1
o [END OF IF]
o Step 2: EXIT
o ENQUEUE the same as adding a node at the end
o Step 1: Allocate memory for the new node and name it as PTR
o Step 2: SET PTR -> DATA = VAL
o Step 3:
o IF FRONT = NULL
o SET FRONT = REAR = PTR
o SET FRONT -> NEXT = REAR -> NEXT = NULL
o ELSE
o SET REAR -> NEXT = PTR
o SET REAR = PTR
o SET REAR -> NEXT = NULL
o [END OF IF]
o Step 4: END
o
o DEQUEUE the same as deleting a node from the beginning
o Step 1: IF FRONT = NULL
o Write "Underflow"
o Go to Step 5
o [END OF IF]
o
o Step 2: SET PTR = FRONT
o Step 3: SET FRONT = FRONT -> NEXT
o Step 4: FREE PTR
o Step 5: END
Circular Queue
• [Link]
• Drawback of a linear queue: once the queue is full, even though a few elements
from the front are deleted and some occupied space is freed, it is not possible
to add any more new elements, as the rear has already reached the queue's
last position.
• In a circular queue, once the queue is full, the first index of the queue becomes
the rear-most index, if and only if the front element has moved forward;
otherwise it is a queue-overflow state.
• ENQUEUE algorithm:
•
• 1. If Front = -1 and Rear = -1:
• then Set Front := 0 and Rear := 0 and go to step 5
•
• 2. If Front = 0 and Rear = N-1 or Front = Rear + 1:
• then Print: “Circular Queue Overflow” and Return
•
• 3. If Rear = N-1:
• then Set Rear := 0 and go to step 5
•
• 4. Set Rear := Rear + 1
•
• 5. Set CQueue [Rear] := Item
•
• 6. Return
o Here, CQueue is a circular queue.
o Rear represents the location in which the data element is to be inserted.
o Front represents the location from which the data element is to be removed.
o N is the maximum size of CQueue
o Item is the new item to be added.
o Initially Rear = -1 and Front = -1.
• DEQUEUE algorithm:
Double Ended Queue (Deque)
• No element can be added or deleted from the middle.
• Implemented using either a circular array or a circular doubly linked list.
• In a deque, two pointers are maintained, LEFT and RIGHT, which point to either end of
the deque.
• The elements in a deque extend from the LEFT end to the RIGHT end and since it is
circular, Deque[N–1] is followed by Deque[0].
• Two types:
o Input restricted deque: insertions can be done only at one of the ends,
while deletions can be done from both ends.
o Output restricted deque: deletions can be done only at one of the ends,
while insertions can be done at both ends.
Priority Queue
• A priority queue is a data structure in which each element is assigned a priority.
• The priority of the element will be used to determine the order in which the elements
will be processed.
• An element with higher priority is processed before an element with a lower priority.
• Two elements with the same priority are processed on a first-come-first-served (FCFS)
basis.
Tree
• Root: node without parent (A)
• Siblings: nodes share the same parent
• Internal node: node with at least one child (A, B, C, F)
• External node (leaf): node without children (E, I, J, K, G, H, D)
• Ancestors of a node: parent, grandparent, great-grandparent, etc.
• Descendants of a node: child, grandchild, great-grandchild, etc.
• Depth of a node: number of ancestors
• Height of a tree: maximum depth of any node (3)
• Degree of a node: the number of its children. The leaf of the tree does not have any
child so its degree is zero
• Degree of a tree: the maximum degree of a node in the tree.
• Subtree: tree consisting of a node and its descendants
• Empty (Null)-tree: a tree without any node
• Root-tree: a tree with only one node
Binary Tree
• [Link]
• In a binary tree, each node has at most two children.
• Complete binary tree - every level except possibly the last is completely filled.
All nodes must appear as far left as possible.
o Every node will have three parts: the data element, a pointer to the left
node, and a pointer to the right node.
class Node {
public:
    Node *left;
    int data;
    Node *right;
};
o Every binary tree has a pointer ROOT, which points to the root element
(topmost element) of the tree. If ROOT = NULL, then the tree is empty.
• PREORDER TRAVERSAL (NLR)
Binary Search Tree
• A binary search tree, also known as an ordered binary tree, is a variant of binary trees
in which the nodes are arranged in an order.
• Left sub-tree nodes must have values less than that of the root node.
• Right sub-tree nodes must have values equal to or greater than the root node.
• O(n) worst case for searching in BST
•
• Insert 39,27,45,18,29,40,9,21,10,19,54,59,65,60 in binary search tree
Deletion Operation in Binary Search Tree
• Deleting a Node that has no children, delete 78
• Main algorithm:
Graphs
• Vertices (nodes), edges (lines between vertices), undirected graph, directed
graph
• Size of a graph - The size of a graph is the total number of edges in it.
• Regular graph - It is a graph where each vertex has the same number of
neighbors. That is, every node has the same degree.
• Connected graph - A graph is said to be connected if for any two vertices (u,
v) in V there is a path from u to v. That is to say that there are no isolated
nodes in a connected graph.
• Complete graph - Fully connected. That is, there is an edge between every
pair of nodes in the graph. A complete graph has n(n–1)/2 edges, where n
is the number of nodes in G.
• Weighted graph - In a weighted graph, the edges of the graph are assigned
some weight or length.
• Multi-graph - A graph with multiple edges and/or loops is called a multigraph.
• Choose any arbitrary node and PUSH it onto the stack (STATUS 2, waiting). Only then
do we POP: when a node is popped it is processed (STATUS 3) and its neighbours are PUSHed (STATUS 2).
• A / B * C * D + E
• n: number of nodes
• number of non-null links: n-1
• total links: 2n
• null links: 2n-(n-1)=n+1
• Replace these null pointers with some useful “threads”.
• A one-way threading and a two-way threading exist.
class Node {
    int data;
    Node *left_child, *right_child;
    bool leftThread, rightThread;
};
• Inserting in the left side
AVL Trees
• [Link]
• Adelson-Velsky and Landis - one of many types of balanced binary search tree. Operations are O(log(n)).
• Balance Factor (BF): BF(node) = HEIGHT(node.left) - HEIGHT(node.right)
• Where HEIGHT(x) is the height of node x, i.e. the number of edges between x and
the furthest leaf below it.
• Allowed balance factor values: -1, 0, +1.
• Examples:
• Example R1:
• Example R-1:
Huffman Encoding
• Fixed-Length encoding
• Variable-Length encoding
• Prefix rule - used to prevent ambiguities during decoding which states that no
binary code should be a prefix of another code.
o Bad:  a = 0,  b = 011, c = 111, d = 11   (11 is a prefix of 111)
o Good: a = 0,  b = 11,  c = 101, d = 100  (prefix-free)
o
o Step 1- Create a leaf node for each character and build a min heap using all the
nodes (The frequency value is used to compare two nodes in min heap)
o Step 2- Repeat Steps 3 to 5 while heap has more than one node
o Step 3- Extract two nodes, say x and y, with minimum frequency from the heap
o Step 4- Create a new internal node z with x as its left child and y as its right
child. Also frequency(z)= frequency(x)+frequency(y)
o Step 5- Add z to min heap
o Step 6- Last node in the heap is the root of Huffman tree
M-way trees
• [Link]
• An m-way search tree generalizes the binary search tree; the binary search tree is the
special case m = 2.
• Each node has at most m children and m-1 key fields. The keys in each node are in ascending
order.
• A binary search tree has one value in each node and two subtrees. This notion easily
generalizes to an M-way search tree, which has (M-1) values per node and M subtrees.
• M is called the degree of the tree. A binary search tree, therefore, has degree 2.
• M is thus a fixed upper limit on how much data can be stored in a node.
B-Trees
Multiway Trees
A multiway tree is a tree that can have more than two children. A multiway tree of
order m (or an m-way tree) is one in which a tree can have m children.
As with the other trees that have been studied, the nodes in an m-way tree will be
made up of key fields, in this case m-1 key fields, and pointers to children.
To make the processing of m-way trees easier some type of order will be imposed
on the keys within each node, resulting in a multiway search tree of order m ( or
an m-way search tree). By definition an m-way search tree is a m-way tree in
which:
M-way search trees give the same advantages to m-way trees that binary search trees
gave to binary trees - they provide fast information retrieval and update. However,
they also have the same problems that binary search trees had - they can become
unbalanced, which means that the construction of the tree becomes of vital
importance.
B-Trees
An extension of a multiway search tree of order m is a B-tree of order m. This type
of tree will be used when the data to be accessed/stored is located on secondary
storage devices because they allow for large amounts of data to be stored in a node.
1. The root has at least two subtrees unless it is the only node in the tree.
2. Each nonroot and each nonleaf node has at most m nonempty children and
at least m/2 nonempty children.
3. The number of keys in each nonroot and each nonleaf node is one less than
the number of its nonempty children.
4. All leaves are on the same level.
These restrictions make B-trees always at least half full, have few levels, and remain
perfectly balanced.
The nodes in a B-tree are usually implemented as a class that contains an array of
m-1 cells for keys, an array of m pointers to other nodes, and whatever other
information is required in order to facilitate tree maintenance.
template <class T, int M>
class BTreeNode
{
public:
BTreeNode();
BTreeNode( const T & );
private:
T keys[M-1];
BTreeNode *pointers[M];
...
};
Searching a B-tree
An algorithm for finding a key in B-tree is simple. Start at the root and determine
which pointer to follow based on a comparison between the search value and key
fields in the root node. Follow the appropriate pointer to a child node. Examine the
key fields in the child node and continue to follow the appropriate pointers until the
search value is found or a leaf node is reached that doesn't contain the desired search
value.
The condition that all leaves must be on the same level forces a characteristic
behavior of B-trees, namely that B-trees are not allowed to grow at their leaves;
instead they are forced to grow at the root.
Inserting into a B-tree
When inserting into a B-tree, a value is inserted directly into a leaf. This leads to
three common situations that can occur:
Case 1: The leaf node is not full. This is the easiest of the cases to solve because the
value is simply inserted into the correct sorted position in the leaf node.
Case 2: The leaf node is full. In this case, the leaf node where the value should be
inserted is split in two, resulting in a new leaf node. Half of the keys will be moved
from the full leaf to the new leaf. The new leaf is then incorporated into the B-tree.
The new leaf is incorporated by moving the middle value to the parent and a pointer
to the new leaf is also added to the parent. This process is continues up the tree until
all of the values have "found" a location.
The new node needs to be incorporated into the tree - this is accomplished by taking
the middle value and inserting it in the parent:
Case 3: The split propagates to a full root. The upward movement of values from case
2 means that it's possible that a value could move up to the root of the B-tree. If the
root is full, the same basic process
from case 2 will be applied and a new root will be created. This type of split results
in 2 new nodes being added to the B-tree.
Results in:
The 15 needs to be moved to the root node but it is full. This means that the root
needs to be divided:
The 15 is inserted into the parent, which means that it becomes the new root node:
Deleting from a B-tree
As usual, this is the hardest of the processes to apply. The deletion process will
basically be a reversal of the insertion process - rather than splitting nodes, it's
possible that nodes will be merged so that B-tree properties, namely the requirement
that a node must be at least half full, can be maintained.
1a) If the leaf is at least half full after deleting the desired value, the remaining larger
values are moved to "fill the gap".
results in:
1b) If the leaf is less than half full after deleting the desired value (known as
underflow), two things could happen:
1b-1) If there is a left or right sibling with the number of keys exceeding the
minimum requirement, all of the keys from the leaf and sibling will be redistributed
between them by moving the separator key from the parent to the leaf and moving
the middle key from the node and the sibling combined to the parent.
1b-2) If the number of keys in the sibling does not exceed the minimum requirement,
then the leaf and sibling are merged by putting the keys from the leaf, the sibling,
and the separator from the parent into the leaf. The sibling node is discarded and the
keys in the parent are moved to "fill the gap". It's possible that this will cause the
parent to underflow. If that is the case, treat the parent as a leaf and continue
repeating step 1b-2 until the minimum requirement is met or the root of the tree is
reached.
Special Case for 1b-2: When merging nodes, if the parent is the root with only one
key, the keys from the node, the sibling, and the only key of the root are placed into
a node and this will become the new root for the B-tree. Both the sibling and the old
root will be discarded.
2) Deleting a value from a non-leaf node. This case can lead to problems with tree
reorganization, but it will be solved in a manner similar to deletion from a binary
search tree.
The key to be deleted will be replaced by its immediate predecessor (or successor)
and then the predecessor (or successor) will be deleted since it can only be found in
a leaf node.
The "gap" is filled in with the immediate predecessor:
The values in the left sibling are combined with the separator key (18) and the
remaining values. They are divided between the 2 nodes:
• Every node in a B-Tree contains at most m children. (other nodes beside root & leaf
must have at least m/2 children)
• All leaf nodes must be at the same level.
• Inserting
o Find the appropriate leaf node
o If the leaf node contains less than m-1 keys then insert the element in
increasing order.
o Else if the leaf contains m-1 keys:
▪ Insert the new element in the increasing order of elements.
▪ Split the node into two nodes at the median.
▪ Push the median element up to its parent node.
▪ If the parent node also contains m-1 keys, then split it too by
following the same steps.
Hashing refers to the process of generating a fixed-size output from an input of
variable size using the mathematical formulas known as hash functions. This
technique determines an index or location for the storage of an item in a data
structure.
Need for Hash data structure
Every day, the data on the internet is increasing multifold and it is always a
struggle to store this data efficiently. In day-to-day programming, this amount of
data might not be that big, but still, it needs to be stored, accessed, and processed
easily and efficiently. A very common data structure that is used for such a
purpose is the Array data structure.
Now the question arises: if arrays were already there, what was the need for a new
data structure? The answer lies in the word “efficiency”. Though storing in an
array takes O(1) time, searching in it takes at least O(log n) time (for a sorted
array; an unsorted one needs O(n)). This time appears to be small, but for a large
data set it can cause a lot of problems, and this, in turn, makes the array data
structure inefficient.
So now we are looking for a data structure that can store the data and search in it
in constant time, i.e. in O(1) time. This is how Hashing data structure came into
play. With the introduction of the Hash data structure, it is now possible to easily
store data in constant time and retrieve them in constant time as well.
Components of Hashing
There are majorly three components of hashing:
1. Key: A key can be anything (a string or an integer) that is fed as input to the hash
function, the technique that determines an index or location for storage of an
item in a data structure.
2. Hash Function: The hash function receives the input key and returns the
index of an element in an array called a hash table. The index is known as
the hash index.
3. Hash Table: Hash table is a data structure that maps keys to values using a
special function called a hash function. Hash stores the data in an associative
manner in an array where each data value has its own unique index.
• E.g., hashing a string by summing its letter positions: “cd” gives 7 mod 7 = 0, and
• “efg” gives 18 mod 7 = 4.
2. Should uniformly distribute the keys (each table position is equally likely for
each key).
3. Should minimize collisions.
4. Should have a low load factor (the number of items in the table divided by the size
of the table).
Complexity of calculating hash value using the hash function
• Time complexity: O(n), where n is the length of the key
• Space complexity: O(1)
Problem with Hashing
If we consider the above example, the hash function we used is the sum of the
letters, but if we examine the hash function closely then the problem can be
easily visualized: for different strings the same hash value is being generated by
the hash function.
For example: {“ab”, “ba”} both have the same hash value, and the strings {“cd”, “be”}
also generate the same hash value, etc. This is known as collision and it creates
problems in searching, insertion, deletion, and updating of values.
What is collision?
The hashing process generates a small number for a big key, so there is a
possibility that two keys could produce the same value. The situation where a
newly inserted key maps to an already occupied slot is called a collision, and it
must be handled using some collision handling technique.
Collision resolution technique
1) Separate Chaining
The idea is to make each cell of the hash table point to a linked list of records that
have the same hash function value. Chaining is simple but requires additional
memory outside the table.
Example: We have given a hash function and we have to insert some elements in
the hash table using a separate chaining method for collision resolution
technique.
Hash function = key % 5,
Elements = 12, 15, 22, 25 and 37.
Let’s see step by step approach to how to solve the above problem:
• Step 1: First draw the empty hash table which will have a possible range of
hash values from 0 to 4 according to the hash function provided.
Hash table
• Step 2: Now insert all the keys in the hash table one by one. The first key to be
inserted is 12 which is mapped to bucket number 2 which is calculated by
using the hash function 12%5=2.
Insert 12
• Step 3: Now the next key is 22. It will map to bucket number 2 because
22%5=2. But bucket 2 is already occupied by key 12.
Insert 22
• Step 4: The next key is 15. It will map to slot number 0 because 15%5=0.
Insert 15
• Step 5: Now the next key is 25. Its bucket number will be 25%5=0. But bucket
0 is already occupied by key 15. So the separate chaining method will again
handle the collision by creating a linked list at bucket 0.
Insert 25
Hence In this way, the separate chaining method is used as the collision
resolution technique.
2) Open Addressing
In open addressing, all elements are stored in the hash table itself. Each table
entry contains either a record or NIL. When searching for an element, we
examine the table slots one by one until the desired element is found or it is clear
that the element is not in the table.
2.a) Linear Probing
In linear probing, the hash table is searched sequentially, starting from the
original hash location. If the location that we get is already
occupied, then we check the next location.
Algorithm:
1. Calculate the hash key. i.e. key = data % size
2. Check, if hashTable[key] is empty
• store the value directly by hashTable[key] = data
3. If the hash index already has some value then
• check for next index using key = (key+1) % size
4. Check, if the next index is available hashTable[key] then store the value.
Otherwise try for next index.
5. Do the above process till we find the space.
Example: Let us consider a simple hash function as “key mod 5” and a sequence
of keys that are to be inserted are 50, 70, 76, 85, 93.
• Step 1: First draw the empty hash table which will have a possible range of
hash values from 0 to 4 according to the hash function provided.
Hash table
• Step 2: Now insert all the keys in the hash table one by one. The first key is
50. It will map to slot number 0 because 50%5=0. So insert it into slot number
0.
• Step 3: The next key is 70. It will map to slot number 0 because 70%5=0 but
50 is already at slot number 0 so, search for the next empty slot and insert it.
Insert 70 into hash table
• Step 4: The next key is 76. It will map to slot number 1 because 76%5=1 but
70 is already at slot number 1 so, search for the next empty slot and insert it.
• Step 5: The next key is 85. It will map to slot number 0 because 85%5=0, but
slots 0, 1 and 2 are already occupied, so insert it into slot number 3.
• Step 6: The next key is 93. It will map to slot number 3 because 93%5=3, but
85 is already at slot number 3, so insert it into slot number 4.
Insert 93 into hash table
Hash table
2.b) Quadratic Probing:
In quadratic probing, the i-th probe checks slot Hash(key) + i^2.
• Step 3: Inserting 50
• Hash(50) = 50 % 7 = 1
• In our hash table slot 1 is already occupied. So, we will search for
slot 1+1^2, i.e. 1+1 = 2,
• Again slot 2 is found occupied, so we will search for cell 1+2^2, i.e. 1+4
= 5,
• Now, cell 5 is not occupied so we will place 50 in slot 5.
2.c) Double Hashing:
Example: Insert the keys 27, 43, 692, 72 into the Hash Table of size 7, where the first
hash-function is h1(k) = k mod 7 and the second hash-function is h2(k) = 1 + (k
mod 5)
• Step 1: Insert 27
• 27 % 7 = 6, location 6 is empty so insert 27 into 6 slot.
• Step 2: Insert 43
• 43 % 7 = 1, location 1 is empty so insert 43 into 1 slot.
• Step 3: Insert 692
• 692 % 7 = 6, but location 6 is already being occupied and this is a
collision
• So we need to resolve this collision using double hashing.
hnew = [h1(692) + i * h2(692)] % 7
= [6 + 1 * (1 + 692 % 5)] % 7
= 9 % 7
= 2
• Step 4: Insert 72
• 72 % 7 = 2, but location 2 is already being occupied and this is a
collision.
• So we need to resolve this collision using double hashing.
hnew = [h1(72) + i * h2(72)] % 7
= [2 + 1 * (1 + 72 % 5)] % 7
= 5 % 7
= 5
Insert key 72 in the hash table
• Rabin-Karp algorithm for pattern matching in a string.
• Calculating the number of different substrings of a string.
Advantages of Hash Data structure
• Hash provides better synchronization than other data structures.
• Hash tables are more efficient than search trees or other data structures
• Hash provides constant time for searching, insertion, and deletion operations
on average.
Disadvantages of Hash Data structure
• Hash is inefficient when there are many collisions.
• Hash collisions are practically not avoided for a large set of possible keys.
• Hash does not allow null values.
Conclusion
From the above discussion, we conclude that the goal of hashing is to resolve the
challenge of finding an item quickly in a collection. For example, if we have a list
of millions of English words and we wish to find a particular term then we would
use hashing to locate and find it more efficiently. It would be inefficient to check
each item on the millions of lists until we find a match. Hashing reduces search
time by restricting the search to a smaller set of words at the beginning.
Hashing is a technique or process of mapping keys, and values into the hash table by
using a hash function. It is done for faster access to elements. The efficiency of mapping
depends on the efficiency of the hash function used.
Let a hash function H(x) maps the value x at the index x%10 in an Array. For example if
the list of values is [11,12,13,14,15] it will be stored at positions {1,2,3,4,5} in the array or
Hash table respectively.
Index Mapping (also known as Trivial Hashing) is a simple form of hashing where
the data is directly mapped to an index in a hash table. The hash function used in
this method is typically the identity function, which maps the input data to itself.
In this case, the key of the data is used as the index in the hash table, and the
value is stored at that index.
For example, if we have a hash table of size 10 and we want to store the value
“apple” with the key “a”, the trivial hashing function would simply map the key
“a” to the index “a” in the hash table, and store the value “apple” at that index.
One of the main advantages of Index Mapping is its simplicity. The hash function
is easy to understand and implement, and the data can be easily retrieved using
the key. However, it also has some limitations. The main disadvantage is that it
can only be used for small data sets, as the size of the hash table has to be the
same as the number of keys. Additionally, it doesn’t handle collisions, so if two
keys map to the same index, one of the data will be overwritten.
Given a limited range array contains both positive and non-positive numbers, i.e.,
elements are in the range from -MAX to +MAX. Our task is to search if some
number is present in the array or not in O(1) time.
Since the range is limited, we can use index mapping (or trivial hashing). We use
values as the index in a big array. Therefore we can search and insert elements in
O(1) time.
How to handle negative numbers?
The idea is to use a 2D array of size hash[MAX+1][2]
Algorithm:
Assign all the values of the hash matrix as 0.
Traverse the given array:
• If the element ele is non negative assign
• hash[ele][0] as 1.
• Else take the absolute value of ele and
• assign hash[ele][1] as 1.
To search any element x in the array.
• If X is non-negative check if hash[X][0] is 1 or not. If hash[X][0] is one then the
number is present else not present.
• If X is negative take the absolute value of X and then check if hash[X][1] is 1 or
not. If hash[X][1] is one then the number is present
Below is the implementation of the above idea.
Output
Present
Time Complexity: The time complexity of the above algorithm is O(N), where N
is the size of the given array.
Space Complexity: The space complexity of the above algorithm
is O(N), because we are using an array of max size.
Separate Chaining Collision Handling Technique in Hashing
What is Collision?
Since a hash function gets us a small number for a key which is a big integer or
string, there is a possibility that two keys result in the same value. The situation
where a newly inserted key maps to an already occupied slot in the hash table is
called collision and must be handled using some collision handling technique.
Separate Chaining:
The idea behind separate chaining is to implement each cell of the hash table as a
linked list called a chain. Separate chaining is one of the most popular and
commonly used techniques in order to handle collisions.
The linked list data structure is used to implement this technique. So what happens
is, when multiple elements are hashed into the same slot index, then these elements
are inserted into a singly-linked list which is known as a chain.
Here, all those elements that hash into the same slot index are inserted into a
linked list. Now, we can use a key K to search in the linked list by just linearly
traversing. If the intrinsic key for any entry is equal to K then it means that we
have found our entry. If we have reached the end of the linked list and yet we
haven’t found our entry then it means that the entry does not exist. Hence, the
conclusion is that in separate chaining, if two different elements have the same
hash value then we store both the elements in the same linked list one after the
other.
Example: Let us consider a simple hash function as “key mod 7” and a sequence
of keys as 50, 700, 76, 85, 92, 73, 101
You can refer to the following link in order to understand how to implement
separate chaining with C++.
C++ program for hashing with chaining
Advantages:
• Simple to implement.
• Hash table never fills up, we can always add more elements to the chain.
• Less sensitive to the hash function or load factors.
• It is mostly used when it is unknown how many and how frequently keys may
be inserted or deleted.
Disadvantages:
• The cache performance of chaining is not good as keys are stored using a
linked list. Open addressing provides better cache performance as everything
is stored in the same table.
• Wastage of Space (Some Parts of the hash table are never used)
• If the chain becomes long, then search time can become O(n) in the worst case
• Uses extra space for links
Performance of Chaining:
Performance of hashing can be evaluated under the assumption that each key is
equally likely to be hashed to any slot of the table (simple uniform hashing).
m = Number of slots in hash table
n = Number of keys to be inserted in hash table
Load factor α = n/m
Expected time to search = O(1 + α)
Expected time to delete = O(1 + α)
Time to insert = O(1)
Time complexity of search insert and delete is O(1) if α is O(1)
Open Addressing:
Like separate chaining, open addressing is a method for handling collisions. In
Open Addressing, all elements are stored in the hash table itself. So at any point,
the size of the table must be greater than or equal to the total number of keys
(Note that we can increase table size by copying old data if needed). This
approach is also known as closed hashing. This entire procedure is based upon
probing. We will understand the types of probing ahead:
• Insert(k): Keep probing until an empty slot is found. Once an empty slot is
found, insert k.
• Search(k): Keep probing until the slot’s key becomes equal to k or an
empty slot is reached.
• Delete(k): Delete operation is interesting. If we simply delete a key, then the
search may fail. So slots of deleted keys are marked specially as “deleted”.
The insert can insert an item in a deleted slot, but the search doesn’t stop at a
deleted slot.
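The three operations above can be sketched in Python, using a tombstone marker for deleted slots (class and marker names are illustrative, not from the original):

```python
EMPTY, DELETED = None, "tombstone"

class OpenAddressTable:
    def __init__(self, size):
        self.size = size
        self.slots = [EMPTY] * size

    def insert(self, key):
        i = key % self.size
        # Keep probing until an empty (or deleted) slot is found.
        while self.slots[i] not in (EMPTY, DELETED):
            i = (i + 1) % self.size
        self.slots[i] = key

    def search(self, key):
        i = key % self.size
        for _ in range(self.size):
            if self.slots[i] == EMPTY:      # search stops at an empty slot...
                return False
            if self.slots[i] == key:
                return True
            i = (i + 1) % self.size         # ...but walks past deleted slots
        return False

    def delete(self, key):
        i = key % self.size
        for _ in range(self.size):
            if self.slots[i] == EMPTY:
                return
            if self.slots[i] == key:
                self.slots[i] = DELETED     # mark, don't empty: keeps later searches correct
                return
            i = (i + 1) % self.size

t = OpenAddressTable(7)
for k in (50, 51, 57):
    t.insert(k)   # 50 -> slot 1; 51 -> slot 2; 57 collides at 1 and 2, lands in 3
t.delete(51)      # slot 2 becomes a tombstone, so 57 is still reachable
```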
Different ways of Open Addressing:
1. Linear Probing:
In linear probing, the hash table is searched sequentially, starting from the
original hash location. If the location we get is already occupied, we check the
next location.
The function used for rehashing is: rehash(key) = (n + 1) % table_size, where n
is the previously tried slot index.
The gap between two probes is 1, as seen in the example below:
Let hash(x) be the slot index computed using a hash function and S be the table
size
If slot hash(x) % S is full, then we try (hash(x) + 1) % S
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
… and so on.
Let us consider a simple hash function as “key mod 7” and a sequence of keys as 50,
700, 76, 85, 92, 73, 101, which means hash(key) = key % S, where S = size of the
table = 7, indexed from 0 to 6. We can define the hash function as per our choice
if we want to create a hash table, although it is fixed internally with a
pre-defined formula.
Applications of linear probing:
Linear probing is a collision handling technique used in hashing, where the
algorithm looks for the next available slot in the hash table to store the collided
key. Some of the applications of linear probing include:
• Symbol tables: Linear probing is commonly used in symbol tables, which are
used in compilers and interpreters to store variables and their associated
values. Since symbol tables can grow dynamically, linear probing can be used
to handle collisions and ensure that variables are stored efficiently.
• Caching: Linear probing can be used in caching systems to store frequently
accessed data in memory. When a cache miss occurs, the data can be loaded
into the cache using linear probing, and when a collision occurs, the next
available slot in the cache can be used to store the data.
• Databases: Linear probing can be used in databases to store records and
their associated keys. When a collision occurs, linear probing can be used to
find the next available slot to store the record.
• Compiler design: Linear probing can be used in compiler design to
implement symbol tables, error recovery mechanisms, and syntax analysis.
• Spell checking: Linear probing can be used in spell-checking software to
store the dictionary of words and their associated frequency counts. When a
collision occurs, linear probing can be used to store the word in the next
available slot.
Overall, linear probing is a simple and efficient method for handling collisions in
hash tables, and it can be used in a variety of applications that require efficient
storage and retrieval of data.
Challenges in Linear Probing :
• Primary Clustering: One of the problems with linear probing is primary
clustering: many consecutive elements form groups, and it then takes longer
to find a free slot or to search for an element.
• Secondary Clustering: Secondary clustering is less severe; two records only
have the same collision chain (probe sequence) if their initial position is the
same.
Example: Let us consider a simple hash function as “key mod 5”; the keys to be
inserted are 50, 70, 76, 93.
• Step 1: First draw the empty hash table, which will have a possible range of
hash values from 0 to 4 according to the hash function provided.
Hash table
• Step 2: Now insert all the keys in the hash table one by one. The first key is
50. It will map to slot number 0 because 50%5=0. So insert it into slot number
0.
Insert 50 into hash table
• Step 3: The next key is 70. It will map to slot number 0 because 70 % 5 = 0, but
50 is already at slot number 0, so search for the next empty slot (slot 1) and insert 70 there.
• Step 4: The next key is 76. It will map to slot number 1 because 76 % 5 = 1, but
70 is now at slot number 1, so search for the next empty slot (slot 2) and insert 76 there.
Insert 76 into hash table
• Step 5: The next key is 93. It will map to slot number 3 because 93 % 5 = 3, so
insert it into slot number 3.
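The steps above can be reproduced with a short linear-probing sketch (the function name is illustrative):

```python
def linear_probe_insert(table, key):
    """Insert key into an open-addressed table; None marks an empty slot."""
    size = len(table)
    i = key % size
    while table[i] is not None:   # slot occupied: try the next one
        i = (i + 1) % size
    table[i] = key

table = [None] * 5
for k in (50, 70, 76, 93):
    linear_probe_insert(table, k)
# 50 -> slot 0; 70 collides at 0 and moves to 1;
# 76 collides at 1 and moves to 2; 93 -> slot 3.
```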
2. Quadratic Probing
If you observe carefully, you will see that the interval between probes increases
with every failed attempt. Quadratic probing is a method with the help of which
we can reduce the clustering problem discussed above. In this method, we look
for the i²-th slot in the i-th iteration. We always start from the original hash
location; if that location is occupied, we check the other slots in this quadratic
pattern.
Let hash(x) be the slot index computed using the hash function and S be the table size.
If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
… and so on.
Example: Let us consider table size = 7, the hash function as Hash(x) = x % 7, and
the collision resolution strategy f(i) = i². Insert keys 22, 30, and 50.
• Step 1: Create a table of size 7.
Hash table
• Step 2: Insert 22 and 30. Hash(22) = 22 % 7 = 1 and Hash(30) = 30 % 7 = 2; both
slots are empty, so the keys are placed directly.
Insert keys 22 and 30 in the hash table
• Step 3: Inserting 50
• Hash(50) = 50 % 7 = 1
• In our hash table slot 1 is already occupied. So, we will search for
slot 1 + 1², i.e. 1 + 1 = 2.
• Again slot 2 is found occupied, so we will search for cell 1 + 2², i.e.
1 + 4 = 5.
• Now, cell 5 is not occupied so we will place 50 in slot 5.
Insert key 50 in the hash table
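The same example as a quadratic-probing sketch (the function name is illustrative): the i-th attempt checks slot (hash + i²) % size:

```python
def quadratic_probe_insert(table, key):
    """Probe slots hash, hash + 1^2, hash + 2^2, ... (mod table size)."""
    size = len(table)
    home = key % size
    i = 0
    while table[(home + i * i) % size] is not None:
        i += 1
    table[(home + i * i) % size] = key

table = [None] * 7
for k in (22, 30, 50):
    quadratic_probe_insert(table, k)
# 22 -> slot 1, 30 -> slot 2; 50 collides at slots 1 (i=0) and 2 (i=1),
# then lands in slot 5 (1 + 2*2).
```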
3. Double Hashing
The intervals that lie between probes are computed by another hash function.
Double hashing is a technique that reduces clustering in an optimized way. In this
technique, the increments for the probing sequence are computed by using
another hash function. We use another hash function hash2(x) and look for the
i*hash2(x) slot in the i-th iteration.
Let hash(x) be the slot index computed using the hash function and S be the table size.
If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
… and so on.
Example: Insert the keys 27, 43, 692, 72 into the hash table of size 7, where the
first hash function is h1(k) = k mod 7 and the second hash function is
h2(k) = 1 + (k mod 5).
• Step 1: Insert 27
• 27 % 7 = 6; location 6 is empty, so insert 27 into slot 6.
• Step 2: Insert 43
• 43 % 7 = 1; location 1 is empty, so insert 43 into slot 1.
Insert key 43 in the hash table
• Step 3: Insert 692
• 692 % 7 = 6, but location 6 is already occupied (by 27), so we probe:
hnew = [6 + 1 * (1 + 692 % 5)] % 7 = (6 + 3) % 7 = 2. Slot 2 is empty, so
insert 692 into slot 2.
Insert key 692 in the hash table
• Step 4: Insert 72
• 72 % 7 = 2, but location 2 is already occupied, so this is a collision.
• We resolve the collision using double hashing:
hnew = [h1(72) + i * h2(72)] % 7
= [2 + 1 * (1 + 72 % 5)] % 7
= 5 % 7
= 5
Now, as slot 5 is empty, we can insert 72 into slot 5.
Insert key 72 in the hash table
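The worked example can be checked with a small double-hashing sketch using the same h1 and h2 (the function name is illustrative):

```python
def double_hash_insert(table, key):
    """Probe slot (h1 + i*h2) % size with h1(k) = k % size, h2(k) = 1 + k % 5."""
    size = len(table)
    h1, h2 = key % size, 1 + key % 5
    i = 0
    while table[(h1 + i * h2) % size] is not None:
        i += 1
    table[(h1 + i * h2) % size] = key

table = [None] * 7
for k in (27, 43, 692, 72):
    double_hash_insert(table, k)
# 27 -> slot 6, 43 -> slot 1; 692 collides at 6 and moves to (6 + 3) % 7 = 2;
# 72 collides at 2 and moves to (2 + 3) % 7 = 5, as computed above.
```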
Double hashing can be done using :
(hash1(key) + i * hash2(key)) % TABLE_SIZE
Here hash1() and hash2() are hash functions and TABLE_SIZE
is size of hash table.
(We repeat by increasing i when collision occurs)
The first hash function is typically hash1(key) = key % TABLE_SIZE. A popular
second hash function is hash2(key) = PRIME − (key % PRIME), where PRIME is a
prime smaller than TABLE_SIZE.
A good second hash function:
• must never evaluate to zero, and
• should ensure that all cells can be probed.
/*
** Handling of collision via open addressing
** Method for Probing: Double Hashing
*/
#include <iostream>
#include <vector>
#include <bitset>
using namespace std;
#define MAX_SIZE 10000001ll
class doubleHash {
    int TABLE_SIZE, keysPresent, PRIME;
    vector<int> hashTable;
    bitset<MAX_SIZE> isPrime;
    /* Sieve of Eratosthenes; isPrime[i] == 0 means i is prime. */
    void __setSieve(){
        isPrime[0] = isPrime[1] = 1;
        for(long long i = 2; i < MAX_SIZE; i++)
            if(isPrime[i] == 0)
                for(long long j = i * i; j < MAX_SIZE; j += i)
                    isPrime[j] = 1;
    }
    int inline hash1(int value){ return value % TABLE_SIZE; }
    int inline hash2(int value){ return PRIME - (value % PRIME); }
    bool isFull(){ return TABLE_SIZE == keysPresent; }
public:
    doubleHash(int n){
        __setSieve();
        TABLE_SIZE = n;
        /* Find the largest prime number smaller than hash table's size. */
        PRIME = TABLE_SIZE - 1;
        while(isPrime[PRIME] == 1)
            PRIME--;
        keysPresent = 0;
        /* Fill the hash table with -1 (empty entries). */
        for(int i = 0; i < TABLE_SIZE; i++)
            hashTable.push_back(-1);
    }
    void insert(int value){
        if(isFull()){
            cout << "ERROR : Hash Table Full\n";
            return;
        }
        int probe = hash1(value), offset = hash2(value);
        while(hashTable[probe] != -1){
            if(-2 == hashTable[probe])
                break; // insert at deleted element's location
            probe = (probe + offset) % TABLE_SIZE;
        }
        hashTable[probe] = value;
        keysPresent += 1;
    }
    void erase(int value){
        int probe = hash1(value), offset = hash2(value);
        while(hashTable[probe] != -1)
            if(hashTable[probe] == value){
                hashTable[probe] = -2; // mark element as deleted (rather than unvisited(-1)).
                keysPresent--;
                return;
            }
            else
                probe = (probe + offset) % TABLE_SIZE;
    }
    bool search(int value){
        int probe = hash1(value), offset = hash2(value);
        int initialPos = probe;
        bool firstItr = true;
        while(1){
            if(hashTable[probe] == -1) // Stop search if -1 is encountered.
                break;
            else if(hashTable[probe] == value) // Stop search after finding the element.
                return true;
            else if(probe == initialPos && !firstItr) // Stop after one complete traversal of the table.
                return false;
            else
                probe = (probe + offset) % TABLE_SIZE; // Otherwise update the index and check again.
            firstItr = false;
        }
        return false;
    }
    void print(){
        for(int i = 0; i < TABLE_SIZE; i++)
            cout << hashTable[i] << ", ";
        cout << "\n";
    }
};
int main(){
    doubleHash myHash(13); // creates an empty hash table of size 13
    /* Insert some keys and print the table's status. */
    int insertions[] = {115, 12, 87, 66, 123};
    for(int key : insertions)
        myHash.insert(key);
    cout << "Status of hash table after initial insertions : ";
    myHash.print();
    return 0;
}
Output
Status of hash table after initial insertions : -1, 66, -1, -1, -1,
-1, 123, -1, -1, 87, -1, 115, 12,
Prerequisites: Hashing Introduction and Collision handling by separate chaining
For insertion of a key(K) – value(V) pair into a hash map, three steps are required:
1. K is converted into a small integer (called its hash code) using a hash function.
2. The hash code is used to find an index (hashCode % arrSize), and the entire
linked list at that index (separate chaining) is first searched for the presence
of K already.
3. If found, its value is updated; if not, the K-V pair is stored as a new node in
the list.
• For the first step, the time taken depends on K and the hash function.
For example, if the key is the string “abcd”, its hash may depend on the
length of the string. But for very large values of n (the number of entries in
the map), the length of the keys is almost negligible in comparison to n, so
hash computation can be considered to take place in constant time,
i.e., O(1).
• For the second step, traversal of the list of K-V pairs present at that index
needs to be done. For this, the worst case may be that all the n entries are at
the same index. So, time complexity would be O(n). But, enough research has
been done to make hash functions uniformly distribute the keys in the array
so this almost never happens.
• So, on average, if there are n entries and b is the size of the array, there
would be n/b entries at each index. This value n/b is called the load
factor, which represents the load that is there on our map.
• This load factor needs to be kept low, so that the number of entries at any
one index stays small and the complexity remains almost constant, i.e., O(1).
Rehashing:
When the load factor grows beyond a threshold (here, the default 0.75), the
bucket array is doubled in size and every stored K-V pair is inserted again using
the new array size, which brings the load factor back down.
#include <iostream>
#include <string>
#include <vector>
class Map {
private:
    class MapNode {
    public:
        int key;
        std::string value;
        MapNode* next;
        MapNode(int key, std::string value) : key(key), value(value), next(nullptr) {}
    };
    std::vector<MapNode*> buckets;
    int size;        // number of K-V pairs currently stored
    int numBuckets;  // current length of the bucket array
    // Default loadFactor
    double DEFAULT_LOAD_FACTOR = 0.75;
    int getBucketInd(int key) { return key % numBuckets; }
public:
    Map() {
        numBuckets = 5;
        size = 0;
        buckets.assign(numBuckets, nullptr);
        std::cout << "HashMap created" << std::endl;
        std::cout << "Number of pairs in the Map: " << size << std::endl;
        std::cout << "Size of Map: " << numBuckets << std::endl;
        std::cout << "Default Load Factor : " << DEFAULT_LOAD_FACTOR << std::endl;
    }
    void insert(int key, std::string value) {
        int bucketInd = getBucketInd(key);
        // If the key already exists in this bucket's chain, just update it.
        for (MapNode* head = buckets[bucketInd]; head != nullptr; head = head->next) {
            if (head->key == key) {
                head->value = value;
                return;
            }
        }
        // Otherwise prepend a new node to the chain at this bucket.
        MapNode* newElementNode = new MapNode(key, value);
        newElementNode->next = buckets[bucketInd];
        buckets[bucketInd] = newElementNode;
        std::cout << "Pair(" << key << ", " << value << ") inserted successfully." << std::endl;
        // Incrementing size
        // as new K-V pair is added to the map
        size++;
        double loadFactor = (1.0 * size) / numBuckets;
        std::cout << "Current Load factor = " << loadFactor << std::endl;
        if (loadFactor > DEFAULT_LOAD_FACTOR) {
            // Rehash
            rehash();
            std::cout << "New Size of Map: " << numBuckets << std::endl;
        }
        std::cout << "Number of pairs in the Map: " << size << std::endl;
    }
    void rehash() {
        std::cout << "\n***Rehashing Started***\n" << std::endl;
        std::vector<MapNode*> temp = buckets;
        numBuckets = 2 * numBuckets;       // double the bucket array
        buckets.assign(numBuckets, nullptr);
        size = 0;
        // Re-insert every stored pair using the new bucket count.
        for (MapNode* head : temp) {
            while (head != nullptr) {
                insert(head->key, head->value);
                MapNode* prev = head;
                head = head->next;
                delete prev;
            }
        }
        std::cout << "\n***Rehashing Ended***\n" << std::endl;
    }
    void display() {
        std::cout << "Current HashMap:" << std::endl;
        for (MapNode* head : buckets)
            for (; head != nullptr; head = head->next)
                std::cout << "key = " << head->key << ", val = " << head->value << std::endl;
        std::cout << std::endl;
    }
};
int main() {
    Map map;
    // Inserting elements
    map.insert(1, "Geeks");
    map.display();
    map.insert(2, "forGeeks");
    map.display();
    map.insert(3, "A");
    map.display();
    map.insert(4, "Computer");
    map.display();
    map.insert(5, "Portal");
    map.display();
    return 0;
}
Output:
HashMap created
Number of pairs in the Map: 0
Size of Map: 5
Default Load Factor : 0.75
Current HashMap:
key = 1, val = Geeks
Current HashMap:
key = 1, val = Geeks
key = 2, val = forGeeks
Current Load factor = 0.6
Number of pairs in the Map: 3
Size of Map: 5
Current HashMap:
key = 1, val = Geeks
key = 2, val = forGeeks
key = 3, val = A
***Rehashing Started***
Current Load factor = 0.3
Number of pairs in the Map: 3
Size of Map: 10
***Rehashing Ended***
Current HashMap:
key = 1, val = Geeks
key = 2, val = forGeeks
key = 3, val = A
key = 4, val = Computer
Current HashMap:
key = 1, val = Geeks
key = 2, val = forGeeks
key = 3, val = A
key = 4, val = Computer
key = 5, val = Portal
Recursion:
Definition: Recursion is a programming technique where a function calls itself to
solve a problem.
Key Points:
• A recursive function typically has a base case (the simplest version of the problem) that stops
the recursion.
• It also has a recursive case, where the function calls itself with a smaller or simpler version of
the problem.
• Recursion can lead to elegant solutions, but it's important to avoid infinite loops by ensuring
the base case is reached.
Example (Python):
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)
Backtracking:
Definition: Backtracking is an algorithmic technique that involves trying out different
possibilities and undoing choices when they lead to a dead end.
Key Points:
• It's used to solve problems where you need to find all possible solutions, or one solution that
meets specific constraints.
• Backtracking uses a depth-first search approach and maintains a state that can be reverted.
• It involves making choices, exploring a path, and then undoing the choices if they don't lead
to a solution.
Example: N-Queens Problem In the N-Queens problem, you need to place N
queens on an N×N chessboard in such a way that no two queens threaten each
other.
def is_safe(board, row, col, n):
    for i in range(row):
        if board[i][col] == 1:
            return False
        if col - row + i >= 0 and board[i][col - row + i] == 1:
            return False
        if col + row - i < n and board[i][col + row - i] == 1:
            return False
    return True

def solve_n_queens_util(board, row, n):
    if row == n:
        solutions.append([list(r) for r in board])
        return
    for col in range(n):
        if is_safe(board, row, col, n):
            board[row][col] = 1
            solve_n_queens_util(board, row + 1, n)
            board[row][col] = 0

def solve_n_queens(n):
    board = [[0] * n for _ in range(n)]
    solve_n_queens_util(board, 0, n)

n = 4
solutions = []
solve_n_queens(n)
for sol in solutions:
    for row in sol:
        print(' '.join('Q' if cell == 1 else '.' for cell in row))
    print()
These are just brief notes on recursion and backtracking. To fully understand and
master these concepts, it's important to study and practice them with different
problems and examples.
The Tower of Hanoi is a classic problem that's often used to illustrate the concept of
recursion. In this problem, you have three pegs and a set of disks of different sizes which can be
slid onto any peg. The puzzle starts with the disks in a neat stack in ascending order of size on
one peg, the smallest at the top. The objective is to move the entire stack to another peg,
obeying the following rules:
1. Only one disk can be moved at a time.
2. Each move takes the topmost disk from one peg and places it on top of another peg.
3. No disk may ever be placed on top of a smaller disk.
Here's how you can solve the Tower of Hanoi problem using recursion:
def tower_of_hanoi(n, source, auxiliary, target):
    if n == 1:
        print(f"Move disk 1 from {source} to {target}")
        return
    tower_of_hanoi(n - 1, source, target, auxiliary)
    print(f"Move disk {n} from {source} to {target}")
    tower_of_hanoi(n - 1, auxiliary, source, target)

n = 3  # Number of disks
tower_of_hanoi(n, 'A', 'B', 'C')  # 'A', 'B', 'C' are the pegs
In this example, the function tower_of_hanoi takes the number of disks n and the names of the
source, auxiliary, and target pegs as arguments. It uses recursion to solve the problem by
breaking it down into smaller subproblems.
The function first moves n-1 disks from the source peg to the auxiliary peg using the target peg
as an auxiliary. Then, it moves the remaining largest disk from the source peg to the target peg.
Finally, it moves the n-1 disks from the auxiliary peg to the target peg using the source peg as an
auxiliary.
The recursion ends when there's only one disk left to move, at which point it directly moves that
disk from the source peg to the target peg.
By following this approach, you can solve the Tower of Hanoi problem efficiently using recursion.
In the coin change problem, the goal is to find the minimum number of coins
needed to make up a given amount. The dynamic programming approach involves
building up the solutions for each amount incrementally using the values of smaller
amounts.
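A bottom-up sketch of that idea (the function name is illustrative): dp[a] holds the fewest coins needed for amount a, built from the values of smaller amounts:

```python
def min_coins(coins, amount):
    """Return the minimum number of coins summing to amount, or -1 if impossible."""
    INF = float("inf")
    dp = [0] + [INF] * amount          # dp[0] = 0 coins for amount 0
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and dp[a - c] + 1 < dp[a]:
                dp[a] = dp[a - c] + 1  # extend the best solution for a - c
    return dp[amount] if dp[amount] != INF else -1

print(min_coins([1, 2, 5], 11))  # 3 coins: 5 + 5 + 1
```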
String Formatting:
String formatting allows you to create dynamic strings by embedding variables and values within
them.
1. String Interpolation:
name = "Alice"
age = 30
message = f"My name is {name} and I am {age} years old."
2. format() Method:
name = "Bob"
age = 25
message = "My name is {} and I am {} years old.".format(name, age)
Case Conversion:
1. Uppercase and Lowercase:
text = "Hello, World!"
upper_text = text.upper()  # "HELLO, WORLD!"
lower_text = text.lower()  # "hello, world!"
String Validation:
1. Checking Prefix and Suffix:
text = "Hello, World!"
starts_with_hello = text.startswith("Hello")  # True
ends_with_world = text.endswith("world")  # False
2. Checking Type of Characters:
alphanumeric = "abc123"
only_letters = alphanumeric.isalpha()  # False
only_digits = alphanumeric.isdigit()  # False
Regular Expressions:
Regular expressions (regex) are powerful tools for pattern matching and manipulation within
strings. They are used to search, validate, and replace patterns in text data.
import re
pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,7}\b"
text = "Contact us at support@example.com"  # example.com is a placeholder domain
matches = re.findall(pattern, text)  # ["support@example.com"]
String manipulation is a versatile skill that's essential for working with textual data in various
programming tasks, ranging from data parsing and validation to user interfaces and text
processing applications.
Greedy Algorithms
• Last Updated : 26 Jul, 2023
Greedy is an algorithmic paradigm that builds up a solution piece by piece, always choosing
the next piece that offers the most obvious and immediate benefit. So the problems where
choosing locally optimal also leads to global solution are the best fit for Greedy.
For example consider the Fractional Knapsack Problem. The local optimal strategy is to
choose the item that has maximum value vs weight ratio. This strategy also leads to a
globally optimal solution because we are allowed to take fractions of an item.
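That strategy can be sketched as follows (the function name and sample items are illustrative):

```python
def fractional_knapsack(items, capacity):
    """items: (value, weight) pairs. Take the best value/weight ratio first; fractions allowed."""
    total = 0.0
    for value, weight in sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True):
        if capacity <= 0:
            break
        take = min(weight, capacity)   # take the whole item, or the fraction that fits
        total += value * take / weight
        capacity -= take
    return total

# Items worth 60, 100, 120 with weights 10, 20, 30; capacity 50.
print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))  # 240.0
```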
Some standard problems that fit the greedy approach:
1. Coin change problem: Given a set of coins with different denominations, find
the minimum number of coins required to make a given amount of change.
2. Fractional knapsack problem: Given a set of items with weights and values, fill
a knapsack with a maximum weight capacity with the most valuable items,
allowing fractional amounts of items to be included.
3. Huffman coding: Given a set of characters and their frequencies in a message,
construct a binary code with minimum average length for the characters.
4. Shortest path algorithms: Given a weighted graph, find the shortest path
between two nodes.
5. Minimum spanning tree: Given a weighted graph, find a tree that spans all
nodes with the minimum total weight.
Greedy algorithms can be very efficient and provide fast solutions for many
problems. However, it is important to keep in mind that they may not always
provide the optimal solution and to analyze the problem carefully to ensure
the correctness of the algorithm.
Greedy algorithms work step-by-step, and always choose the step which
provides immediate profit/benefit. They choose the “locally optimal solution”
without thinking about future consequences, and so may not always lead to the
optimal global solution, because they do not consider the entire data. In some
cases making the decision that looks right at that moment gives the best
solution (greedy works), but in other cases it doesn't. The greedy technique is
used for optimization problems (where we have to find the maximum or
minimum of something) and is best suited for looking at the immediate
situation.
All greedy algorithms follow a basic structure:
1. Declare an empty result (result = 0).
2. Make a greedy choice to select an item; if the choice is feasible, add it to the
result.
3. Return the result.
Why choose Greedy Approach:
The greedy approach has a few tradeoffs, which may make it suitable for
optimization. One prominent reason is to achieve the most feasible solution
immediately. In the activity selection problem (Explained below), if more activities
can be done before finishing the current activity, these activities can be performed
within the same time. Another reason is to divide a problem recursively based on a
condition, with no need to combine all the solutions. In the activity selection
problem, the “recursive division” step is achieved by scanning a list of items only
once and considering certain activities.
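The activity selection problem mentioned above has a standard greedy sketch: sort by finish time, then repeatedly take the first activity that starts no earlier than the last chosen finish (the function name and sample intervals are illustrative):

```python
def select_activities(activities):
    """activities: (start, finish) pairs; returns a maximum set of non-overlapping ones."""
    chosen = []
    last_finish = float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:          # compatible with everything chosen so far
            chosen.append((start, finish))
            last_finish = finish
    return chosen

print(select_activities([(1, 2), (3, 4), (0, 6), (5, 7), (8, 9), (5, 9)]))
# [(1, 2), (3, 4), (5, 7), (8, 9)]
```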
Greedy choice property:
This property says that the globally optimal solution can be obtained by making a
locally optimal solution (Greedy). The choice made by a Greedy algorithm may
depend on earlier choices but not on the future. It iteratively makes one Greedy
choice after another and reduces the given problem to a smaller one.
Optimal substructure:
A problem exhibits optimal substructure if an optimal solution to the problem
contains optimal solutions to the subproblems. That means we can solve
subproblems and build up the solutions to solve larger problems.
Note: Making locally optimal choices does not always work. Hence, Greedy
algorithms will not always give the best solutions.
Characteristics of Greedy approach:
• There is an ordered list of resources(profit, cost, value, etc.)
• Maximum of all the resources(max profit, max value, etc.) are taken.
• For example, in the fractional knapsack problem, the maximum value/weight
is taken first according to available capacity.
Characteristic components of greedy algorithm:
1. The feasible solution: A subset of given inputs that satisfies all specified
constraints of a problem is known as a “feasible solution”.
2. Optimal solution: The feasible solution that achieves the desired extremum
is called an “optimal solution”. In other words, the feasible solution that either
minimizes or maximizes the objective function specified in a problem is
known as an “optimal solution”.
3. Feasibility check: It investigates whether the selected input fulfils all
constraints mentioned in a problem or not. If it fulfils all the constraints then
it is added to a set of feasible solutions; otherwise, it is rejected.
4. Optimality check: It investigates whether a selected input produces either a
minimum or maximum value of the objective function by fulfilling all the
specified constraints. If an element in a solution set produces the desired
extremum, then it is added to a set of optimal solutions.
5. Optimal substructure property: The globally optimal solution to a problem
includes the optimal sub solutions within it.
6. Greedy choice property: The globally optimal solution is assembled by
selecting locally optimal choices. The greedy approach applies some locally
optimal criteria to obtain a partial solution that seems to be the best at that
moment and then find out the solution for the remaining sub-problem.
The local decisions (or choices) must possess three characteristics as mentioned
below:
1. Feasibility: The selected choice must fulfil local constraints.
2. Optimality: The selected choice must be the best at that stage (locally
optimal choice).
3. Irrevocability: The selected choice cannot be changed once it is made.
Applications of Greedy Algorithms:
• Finding an optimal solution (Activity selection, Fractional Knapsack, Job
Sequencing, Huffman Coding).
• Finding close to the optimal solution for NP-Hard problems like TSP.
• Network design: Greedy algorithms can be used to design efficient networks,
such as minimum spanning trees, shortest paths, and maximum flow
networks. These algorithms can be applied to a wide range of network design
problems, such as routing, resource allocation, and capacity planning.
• Machine learning: Greedy algorithms can be used in machine learning
applications, such as feature selection, clustering, and classification. In feature
selection, greedy algorithms are used to select a subset of features that are
most relevant to a given problem. In clustering and classification, greedy
algorithms can be used to optimize the selection of clusters or classes.
• Image processing: Greedy algorithms can be used to solve a wide range of
image processing problems, such as image compression, denoising, and
segmentation. For example, Huffman coding is a greedy algorithm that can be
used to compress digital images by efficiently encoding the most frequent
pixels.
• Combinatorial optimization: Greedy algorithms can be used to solve
combinatorial optimization problems, such as the traveling salesman
problem, graph coloring, and scheduling. Although these problems are
typically NP-hard, greedy algorithms can often provide close-to-optimal
solutions that are practical and efficient.
• Game theory: Greedy algorithms can be used in game theory applications,
such as finding the optimal strategy for games like chess or poker. In these
applications, greedy algorithms can be used to identify the most promising
moves or actions at each turn, based on the current state of the game.
• Financial optimization: Greedy algorithms can be used in financial
applications, such as portfolio optimization and risk management. In portfolio
optimization, greedy algorithms can be used to select a subset of assets that
are most likely to provide the best return on investment, based on historical
data and current market trends.
Applications of Greedy Approach:
Greedy algorithms are used to find optimal or near-optimal solutions to many
real-life problems. A few of them are listed below:
(1) Make a change problem
(2) Knapsack problem
(3) Minimum spanning tree
(4) Single source shortest path
(5) Activity selection problem
(6) Job sequencing problem
(7) Huffman code generation.
(8) Dijkstra’s algorithm
(9) Greedy coloring
(10) Minimum cost spanning tree
(11) Job scheduling
(12) Interval scheduling
(13) Greedy set cover
(14) Knapsack with fractions
• Greedy algorithms are often used as a first step in solving optimization
problems, because they provide a good starting point for more complex
optimization algorithms.
• Greedy algorithms can be used in conjunction with other optimization
algorithms, such as local search or simulated annealing, to improve the
quality of the solution.
Disadvantages of the Greedy Approach:
• The local optimal solution may not always be globally optimal.
• Greedy algorithms do not always guarantee to find the optimal solution, and
may produce suboptimal solutions in some cases.
• The greedy approach relies heavily on the problem structure and the choice of
criteria used to make the local optimal choice. If the criteria are not chosen
carefully, the solution produced may be far from optimal.
• Greedy algorithms may require a lot of preprocessing to transform the
problem into a form that can be solved by the greedy approach.
• Greedy algorithms may not be applicable to problems where the optimal
solution depends on the order in which the inputs are processed.
• Greedy algorithms may not be suitable for problems where the optimal
solution depends on the size or composition of the input, such as the bin
packing problem.
• Greedy algorithms may not be able to handle constraints on the solution
space, such as constraints on the total weight or capacity of the solution.
• Greedy algorithms may be sensitive to small changes in the input, which can
result in large changes in the output. This can make the algorithm unstable
and unpredictable in some cases.
Standard Greedy Algorithms :
• Prim’s Algorithm
• Kruskal’s Algorithm
• Dijkstra’s Algorithm
Difference between Greedy Algorithm and Divide and Conquer
Algorithm
Greedy algorithm and divide and conquer algorithm are two common
algorithmic paradigms used to solve problems. The main difference between
them lies in their approach to solving problems.
Greedy Algorithm:
A greedy algorithm solves an optimization problem by taking, at each step, the
decision that offers the most evident and immediate benefit, irrespective of the
final outcome.
Divide and Conquer Algorithm:
A typical Divide and Conquer algorithm solves a problem using the following
three steps:
• Divide: This involves dividing the problem into smaller sub-problems.
• Conquer: Solve sub-problems by calling recursively until solved.
• Combine: Combine the sub-problems to get the final solution of the whole
problem.
Greedy algorithm, divide and conquer algorithm, and dynamic programming
algorithm are three common algorithmic paradigms used to solve problems.
Here’s a comparison among these algorithms:
Approach:
1. Greedy algorithm: Makes locally optimal choices at each step with the hope of
finding a global optimum.
2. Divide and conquer algorithm: Breaks down a problem into smaller
subproblems, solves each subproblem recursively, and then combines the
solutions to the subproblems to solve the original problem.
3. Dynamic programming algorithm: Solves subproblems recursively and stores
their solutions to avoid repeated calculations.
Goal:
1. Greedy algorithm: Finds the best solution among a set of possible solutions.
2. Divide and conquer algorithm: Solves a problem by dividing it into smaller
subproblems, solving each subproblem independently, and then combining
the solutions to the subproblems to solve the original problem.
3. Dynamic programming algorithm: Solves a problem by breaking it down into
smaller subproblems and solving each subproblem recursively.
Time complexity:
1. Greedy algorithm: Generally fast; often dominated by an initial sort, e.g. O(n log n).
2. Divide and conquer algorithm: Depends on the recurrence; classic examples such as merge sort run in O(n log n).
3. Dynamic programming algorithm: Polynomial in the number of subproblems, e.g. O(n^2) for many two-dimensional tables.
Space complexity:
1. Greedy algorithm: Typically O(1) extra space.
2. Divide and conquer algorithm: O(log n) to O(n) for the recursion stack.
3. Dynamic programming algorithm: O(n^2) or O(n^3) depending on the problem.
Optimal solution:
1. Greedy algorithm: May or may not generate an optimal solution.
2. Divide and conquer algorithm: Does not aim for the optimal solution; it simply solves the given problem.
3. Dynamic programming algorithm: Always generates an optimal solution for problems with optimal substructure.
Examples:
1. Greedy algorithm: Fractional knapsack, Huffman coding, Kruskal's and Prim's minimum spanning tree.
2. Divide and conquer algorithm: Merge sort, quick sort, binary search.
3. Dynamic programming algorithm: 0/1 knapsack, longest common subsequence, Floyd-Warshall shortest paths.
Greedy Algorithm:
Greedy algorithm is defined as a method for solving optimization problems by
taking decisions that result in the most evident and immediate benefit
irrespective of the final outcome. It is a simple, intuitive algorithm that is used in
optimization problems.
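Activity selection is a small sketch of this greedy idea: at each step, take the activity that finishes earliest (the most immediate benefit) and never revisit the decision. The function name and the sample intervals below are illustrative:

```python
def select_activities(activities):
    """Greedily pick a maximum set of non-overlapping (start, finish) activities."""
    chosen = []
    last_finish = float("-inf")
    # Locally optimal choice: always consider the activity that finishes first.
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:       # compatible with everything chosen so far
            chosen.append((start, finish))
            last_finish = finish       # this decision is never reconsidered
    return chosen

print(select_activities([(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9)]))
# [(1, 4), (5, 7)]
```

For this particular problem the earliest-finish rule happens to be provably optimal, but in general a greedy choice carries no such guarantee.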
Divide and conquer Algorithm:
Divide and conquer is an algorithmic paradigm in which the problem is solved
using the Divide, Conquer, and Combine strategy. A typical Divide and Conquer
algorithm solves a problem using the following three steps:
• Divide: This involves dividing the problem into smaller sub-problems.
• Conquer: Solve sub-problems by calling recursively until solved.
• Combine: Combine the sub-problems to get the final solution of the whole problem.
Dynamic Programming:
Dynamic Programming is mainly an optimization over plain recursion. Wherever
we see a recursive solution that has sometimes repeated calls for the same input
states, we can optimize it using Dynamic Programming. The idea is to simply
store the results of subproblems so that we do not have to re-compute them
when needed later. This simple optimization reduces time complexities from
exponential to polynomial.
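A minimal Python sketch of this optimization: plain recursive Fibonacci repeats the same calls exponentially often, while memoizing each subproblem makes the count of distinct states linear (function names are illustrative):

```python
from functools import lru_cache

calls = 0

def fib_plain(n):
    """Plain recursion: the same states are recomputed over and over."""
    global calls
    calls += 1
    return n if n < 2 else fib_plain(n - 1) + fib_plain(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    """Dynamic programming: each state n is computed once, then reused."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

print(fib_plain(20), calls)   # 6765 computed with 21891 recursive calls
print(fib_memo(20), fib_memo.cache_info().misses)   # 6765 with only 21 distinct states
```

Storing results in a table (tabulation) instead of a cache (memoization) gives the same polynomial bound without deep recursion.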
Greedy Algorithm vs Divide and Conquer vs Dynamic Programming:
4. Optimal solution:
o Greedy Algorithm: It may or may not generate an optimal solution.
o Divide and conquer: It is used to obtain a solution to the given problem; it does not aim for the optimal solution.
o Dynamic Programming: It always generates an optimal solution.
7. Memory:
o Greedy Algorithm: Extra memory is not required.
o Divide and conquer: Some memory is required.
o Dynamic Programming: More memory is required to store subproblems for later use.
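The contrast between greedy and dynamic programming on optimality shows up clearly in coin change with denominations {1, 3, 4}: always taking the largest coin can miss the optimum, while a DP table always finds it. A small sketch (function names are illustrative):

```python
def greedy_coins(coins, amount):
    # Greedy: repeatedly take the largest coin that still fits.
    count = 0
    for c in sorted(coins, reverse=True):
        count += amount // c
        amount %= c
    return count

def dp_coins(coins, amount):
    # DP: best[a] = minimum number of coins needed to make amount a.
    best = [0] + [float("inf")] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a:
                best[a] = min(best[a], best[a - c] + 1)
    return best[amount]

print(greedy_coins([1, 3, 4], 6))  # 3  (4 + 1 + 1)
print(dp_coins([1, 3, 4], 6))      # 2  (3 + 3)
```

The DP version pays for optimality with O(amount) extra memory, matching row 7 above.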
The following summary lists common data structures and algorithms along with their time and space complexities, as a quick reference for DSA placement preparation:
Arrays:
o Access: O(1); Search: O(n); Insertion/Deletion: O(n); Space: O(n)
Linked Lists:
o Access/Search: O(n); Insertion/Deletion at a known node: O(1); Space: O(n)
Stacks:
o Push/Pop/Peek: O(1); Space: O(n)
Queues:
o Enqueue/Dequeue: O(1); Space: O(n)
Hash Tables:
o Search/Insert/Delete: O(1) average, O(n) worst case; Space: O(n)
Heaps (Binary):
o Insert/Delete: O(log n); Peek min/max: O(1); Build heap: O(n); Space: O(n)
Sorting Algorithms:
o Merge sort: O(n log n) time, O(n) space; Quick sort: O(n log n) average, O(n^2) worst, O(log n) space; Heap sort: O(n log n) time, O(1) space
Graph Traversal:
o BFS and DFS: O(V + E) time, O(V) space
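As one illustration of graph traversal complexity, breadth-first search touches every vertex and every edge once, giving O(V + E) time and O(V) space. A minimal sketch; the adjacency-list graph below is only an example:

```python
from collections import deque

def bfs(graph, source):
    """Breadth-first search over an adjacency list: O(V + E) time, O(V) space."""
    visited = {source}
    order = []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:   # each edge is examined once
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(bfs(graph, 0))  # [0, 1, 2, 3]
```

Swapping the deque for a stack (and popping from the end) turns this into an iterative DFS with the same asymptotic bounds.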
Please note that the time and space complexities provided in the table are simplified
and represent average and worst-case scenarios. Depending on the exact
implementation and specific scenarios, these complexities can vary. This table is
meant to provide a quick reference and overview of the complexities of common
data structures and algorithms encountered in DSA placement interviews.