Data Structures and Algorithms Overview

The document provides an overview of data structures, their operations, and the importance of selecting appropriate data structures based on problem requirements. It discusses various types of data structures, searching techniques like linear and binary search, and sorting algorithms including selection sort and bubble sort, along with their complexities. Additionally, it explains asymptotic notations for analyzing algorithm efficiency, emphasizing the significance of understanding time and space complexities.

Introduction

• Data structure usually refers to the organization, management, and storage
of data in main memory that enables efficient access and modification.
• If data is arranged in a systematic way, it gains structure and becomes
meaningful. This meaningful, processed data is information.
• The cost of a solution is the amount of resources that the solution needs.
• A data structure requires:
o Space for each data item it stores
o Time to perform each basic operation
o Programming effort
• How to select a data structure?
o Identify the problem
o Analyze the problem
o Quantify the resources
o Select the data structure

Data structures hierarchy

• Operations on data structures:
o Traversing, Searching, Inserting, Deleting, Sorting, Merging.
• Algorithm properties:
o It must be correct (must produce desired output).
o It is composed of a series of concrete steps.
o There can be no ambiguity.
o It must be composed of a finite number of steps.
o It must terminate.
• To summarize:
o Problem - a mapping from inputs to outputs.
o Algorithm - a step-by-step set of operations to solve a specific problem
or a set of problems.
o Program - a specific sequence of instructions in a programming language;
it may contain the implementation of many algorithms.

Abstract data type

• Two important things about data types:
o Defines a certain domain of values
o Defines operations allowed on those values
o Example: int
▪ Takes only integer values
▪ Operations: addition, subtraction, multiplication, division, bitwise
operations.
• ADT describes a set of objects sharing the same properties and behaviors.
o The properties of an ADT are its data.
o The behaviors of an ADT are its operations or functions.
• ADT example: stack (can be implemented with array or linked list)
• Abstraction is the method of hiding unwanted information.
• Encapsulation is a method of bundling data into a single entity or unit, along
with methods that protect the information from the outside. Encapsulation can
be implemented using access modifiers, i.e. private, protected and public.
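
To make the ADT idea concrete, here is a minimal C++ sketch (not from the original notes): the interface declares what a stack can do, while any array-based or linked-list-based class supplies the how. The name StackADT is illustrative.

// An ADT specifies *what* operations exist, not *how* data is stored.
// An array-based or a linked-list-based class can both implement it.
class StackADT {
public:
    virtual void push(int value) = 0; // add on top
    virtual int  pop()           = 0; // remove and return the top
    virtual int  top() const     = 0; // read the top without removing
    virtual bool empty() const   = 0;
    virtual ~StackADT() {}
};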

What is a data structure?

• A data structure is an organization of data that allows the data to be used
efficiently.
• It is used to implement an ADT.

• An ADT tells us what is to be done, and a data structure tells us how to do it.
• Types:
o linear (stack, array, linked list)
o non-linear (tree, graph)
o static (compile time memory allocation), array
▪ Advantage: fast access
▪ Disadvantage: slow insertion and deletion
o dynamic (run time memory allocation), linked list
▪ Advantage: faster insertion and deletion
▪ Disadvantage: slow access

Asymptotic notations
• Efficiency is measured in terms of TIME and SPACE, i.e. in terms of the
number of basic operations performed.

• Asymptotic complexity

o The running time depends on the size of the input


o f(n) = running time of an algorithm, where n = input size. We are
interested in the growth of f(n) as n grows.
o "Functions do more work for bigger input"
o Drop all constants: 3n, 5n, 100n => n. Why? Constant factors do not
change the order of growth.
o Ignore lower order terms: n³ + n² + n + 5 => n³
o Ignore the base of logarithms: log₂(n), log₁₀(n) and ln(n) differ only by a
constant factor => log n
• f(n) = O(n²) => describes how f(n) grows in comparison to n²

• Big-O notation, Ω (Omega) notation, Θ (Big-Theta) notation

• Big-O notation is used to measure the performance of any algorithm by


providing the order of growth of the function.



• O (Big-O) notation (worst case, upper bound, maximum complexity):
f(n) = O(g(n)) if 0 <= f(n) <= c*g(n) for all n >= n0

• f(n) = 3n + 2, g(n) = n, claim: f(n) = O(g(n))

• 3n + 2 <= c*n
• with c = 4: 3n + 2 <= 4n holds for n >= 2

• so c = 4, n0 = 2

o n³ = O(n²) False
o n² = O(n³) True
• Ω (Omega) notation (best case, lower bound): f(n) = Ω(g(n)) if
0 <= c*g(n) <= f(n) for all n >= n0
• f(n) = 3n + 2, g(n) = n, claim: f(n) = Ω(g(n))

• 3n + 2 >= c*n
• with c = 1: 3n + 2 >= n, i.e. 2n >= -2, i.e. n >= -1

• so c = 1, n0 = 1

• Θ (Big-Theta) notation (tight bound, sandwiched between lower and upper):
f(n) = Θ(g(n)) if 0 <= c1*g(n) <= f(n) <= c2*g(n) for all n >= n0
• f(n) = 3n + 2, g(n) = n, claim: f(n) = Θ(g(n))

• c1*n <= 3n + 2 <= c2*n

• Upper bound: 3n + 2 <= 4n for n >= 2, so c2 = 4
• Lower bound: 3n + 2 >= n for n >= -1, so c1 = 1

• Take n0 = 2 // we must take the greater threshold, which satisfies both

• Loops and if-else constructs: their asymptotic cost can be read directly off
the code, as sketched below.
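
A minimal C++ sketch (not from the original notes) of how loop structure maps to asymptotic cost; the function name and printed values are illustrative:

#include <cstdio>

void examples(int n) {
    // Single loop: body runs n times => O(n)
    for (int i = 0; i < n; i++)
        printf("%d ", i);

    // Nested loops: n * n iterations => O(n²)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            printf("%d ", i + j);

    // Loop variable doubles each time: about log₂(n) iterations => O(log n)
    for (int i = 1; i < n; i *= 2)
        printf("%d ", i);

    // if-else: the cost is the cost of the more expensive branch
    if (n % 2 == 0)
        printf("constant work\n");   // O(1)
    else
        for (int i = 0; i < n; i++)  // O(n) => the whole if-else is O(n)
            printf("%d ", i);
}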

Searching Techniques
• Searching is an operation which finds the location of a given element in a list.
• The search is said to be successful or unsuccessful depending on whether the
element that is to be searched is found or not.

Linear Search

• Problem: Given an array arr[] of n elements, write a function to search a given


element x in arr[].
• In this type of search, a sequential search is made over all items one by one.
Every item is checked and if a match is found then that particular item is

returned, otherwise the search continues till the end of the data collection.

• Pseudocode:

procedure linear_search(list, value)
   for each item in the list
      if item == value
         return the item's location
      end if
   end for
end procedure

CODE: // C++ code to linearly search x in arr[]. If x
// is present then return its location, otherwise
// return -1

#include <iostream>
using namespace std;

int search(int arr[], int n, int x)
{
    int i;
    for (i = 0; i < n; i++)
        if (arr[i] == x)
            return i;
    return -1;
}

// Driver code
int main(void)
{
int arr[] = { 2, 3, 4, 10, 40 };
int x = 10;
int n = sizeof(arr) / sizeof(arr[0]);

// Function call
int result = search(arr, n, x);
(result == -1)
? cout << "Element is not present in array"
: cout << "Element is present at index " << result;
return 0;
}
• Analysis:
o Best case O(1)
o Average O(n)
o Worst O(n)

Binary Search
• Binary Search is a searching algorithm for finding an element's position in a sorted
array.
• It's fast and efficient; the time complexity of binary search is O(log n).
• In this method:
o To search an element we compare it with the element present at the center of
the list. If it matches then the search is successful.

o Otherwise, the list is divided into two halves:
▪ One from 0th element to the center element (first half)
▪ Another from center element to the last element (second half)
o The searching will now proceed in either of the two halves depending upon
whether the element is greater or smaller than the center element.
o If the element is smaller than the center element then the searching will be done
in the first half, otherwise in the second half.
• It can be done recursively or iteratively.
• Pseudocode:

procedure binary_search
   A ← sorted array
   n ← size of array
   x ← value to be searched

   set lowerBound = 1
   set upperBound = n

   while x not found
      if upperBound < lowerBound
         EXIT: x does not exist.

      set midPoint = lowerBound + (upperBound - lowerBound) / 2

      if A[midPoint] < x
         set lowerBound = midPoint + 1

      if A[midPoint] > x
         set upperBound = midPoint - 1

      if A[midPoint] = x
         EXIT: x found at location midPoint
   end while

end procedure

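The original links to C++ and Python implementations are gone; here is a minimal iterative C++ sketch that follows the pseudocode above (0-indexed instead of 1-indexed; the driver mirrors the linear search example):

#include <iostream>
using namespace std;

// Iterative binary search: returns the index of x in sorted arr[], or -1.
int binarySearch(int arr[], int n, int x)
{
    int lowerBound = 0, upperBound = n - 1;
    while (lowerBound <= upperBound) {
        // written this way to avoid overflow of (lowerBound + upperBound)
        int midPoint = lowerBound + (upperBound - lowerBound) / 2;
        if (arr[midPoint] == x)
            return midPoint;
        else if (arr[midPoint] < x)
            lowerBound = midPoint + 1;   // search the right half
        else
            upperBound = midPoint - 1;   // search the left half
    }
    return -1; // x does not exist
}

int main()
{
    int arr[] = { 2, 3, 4, 10, 40 };
    int n = sizeof(arr) / sizeof(arr[0]);
    int result = binarySearch(arr, n, 10);
    (result == -1)
        ? cout << "Element is not present in array"
        : cout << "Element is present at index " << result;
    return 0;
}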


• Analysis:
o Best-case O(1)
o Average O(log n)
o Worst-case O(log n)

Sorting techniques
• Sorting - a process of arranging a set of data in certain order
• Internal sorting - deals with data in memory of computer
• External sorting - deals with data stored in data files when data is in
large volume

• Types of sorts:
o Selection sort - O(n²). Selects the smallest element from an unsorted list
and places that element in front. Python code
o Bubble sort - best O(n), else O(n²). Compares adjacent elements, and
swaps elements, bringing large elements to the end. Python code
o Insertion sort - best O(n), else O(n²). Places an unsorted element at its
suitable place in each iteration. Python code
o Merge sort - O(n·log n). Based on the Divide and Conquer approach:
divides in the middle, sorts, then combines.
o Quick sort - PIVOT, worst O(n²), else O(n·log n). Based on the Divide and
Conquer approach; larger and smaller elements are placed after and
before the pivot element.
o Heap sort - O(n·log n).
o Radix sort
o Bucket sort

Selection Sort Algorithm
Selection sort is a sorting algorithm that selects the smallest element from
an unsorted list in each iteration and places that element at the beginning of
the unsorted list.

Working of Selection Sort

1. Set the first element as minimum.


2. Compare minimum with the second element. If the second element is smaller
than minimum, assign the second element as minimum.

Compare minimum with the third element. Again, if the third element is smaller,
assign the third element as minimum; otherwise do nothing. The process goes on
until the last element.

3. After each iteration, minimum is swapped with the first element of the
unsorted list, placing it at its final position.


4. For each iteration, indexing starts from the first unsorted element. Steps 1 to
3 are repeated until all the elements are placed at their correct positions.
(Figures: the first, second, third, and fourth iterations.)

Selection Sort Algorithm

selectionSort(array, size)
repeat (size - 1) times
set the first unsorted element as the minimum
for each of the unsorted elements
if element < currentMinimum
set element as new minimum
swap minimum with first unsorted position
end selectionSort

Selection Sort Code in Python, Java, and C/C++


# Selection sort in Python
def selectionSort(array, size):

    for step in range(size):
        min_idx = step

        for i in range(step + 1, size):

            # to sort in descending order, change < to > in this line
            # select the minimum element in each loop
            if array[i] < array[min_idx]:
                min_idx = i

        # put min at the correct position
        (array[step], array[min_idx]) = (array[min_idx], array[step])

data = [-2, 45, 0, 11, -9]
size = len(data)
selectionSort(data, size)
print('Sorted Array in Ascending Order:')
print(data)

Selection Sort Complexity

Time Complexity
  Best:    O(n²)
  Worst:   O(n²)
  Average: O(n²)
Space Complexity: O(1)
Stability: No

Cycle    Number of Comparisons
1st      (n-1)
2nd      (n-2)
3rd      (n-3)
...      ...
last     1

Number of comparisons: (n - 1) + (n - 2) + (n - 3) + ... + 1 = n(n - 1) / 2,
which is nearly n².

Complexity = O(n²)

Also, we can analyze the complexity by simply observing the number of
loops. There are 2 loops, so the complexity is n·n = n².

Time Complexities:
• Worst Case Complexity: O(n²)
  If we want to sort in ascending order and the array is in descending order,
  then the worst case occurs.
• Best Case Complexity: O(n²)
  It occurs when the array is already sorted.
• Average Case Complexity: O(n²)
  It occurs when the elements of the array are in jumbled order (neither
  ascending nor descending).

The time complexity of selection sort is the same in all cases. At every
step, you have to find the minimum element and put it in the right place. The
minimum element is not known until the end of the array is reached.

Space Complexity:
Space complexity is O(1) because an extra variable temp is used.

Selection Sort Applications


The selection sort is used when

• a small list is to be sorted

• cost of swapping does not matter

• checking of all the elements is compulsory

• cost of writing to a memory matters, as in flash memory (the number of
writes/swaps is O(n) compared to O(n²) for bubble sort)

Bubble Sort
Bubble sort is a sorting algorithm that compares two adjacent elements and
swaps them until they are in the intended order.
Just like the movement of air bubbles in water that rise up to the surface,
each element of the array moves toward the end in each iteration. Therefore,
it is called bubble sort.

Working of Bubble Sort


Suppose we are trying to sort the elements in ascending order.
1. First Iteration (Compare and Swap)
1. Starting from the first index, compare the first and the second elements.

2. If the first element is greater than the second element, they are swapped.

3. Now, compare the second and the third elements. Swap them if they are not
in order.

4. The above process goes on until the last element.

Compare the adjacent elements

2. Remaining Iteration
The same process goes on for the remaining iterations.

After each iteration, the largest element among the unsorted elements is
placed at the end.

Put the largest element at the end

In each iteration, the comparison takes place up to the last unsorted element.

Compare the adjacent elements

The array is sorted when all the unsorted elements are placed at their correct
positions.

The array is sorted if all elements are kept in the right order

Bubble Sort Algorithm

bubbleSort(array)
for i <- 1 to indexOfLastUnsortedElement-1
if leftElement > rightElement
swap leftElement and rightElement
end bubbleSort

Bubble Sort Code in Python, Java and C/C++


# Bubble sort in Python

def bubbleSort(array):

    # loop to access each array element
    for i in range(len(array)):

        # loop to compare array elements
        for j in range(0, len(array) - i - 1):

            # compare two adjacent elements
            # change > to < to sort in descending order
            if array[j] > array[j + 1]:

                # swapping elements if elements
                # are not in the intended order
                temp = array[j]
                array[j] = array[j+1]
                array[j+1] = temp

data = [-2, 45, 0, 11, -9]

bubbleSort(data)

print('Sorted Array in Ascending Order:')
print(data)

Optimized Bubble Sort Algorithm


In the above algorithm, all the comparisons are made even if the array is
already sorted.

This increases the execution time.

To solve this, we can introduce an extra variable swapped. The value of
swapped is set to true if any elements are swapped during an iteration.
Otherwise, it is set to false.

After an iteration, if there is no swapping, the value of swapped will be false.
This means the elements are already sorted and there is no need to perform
further iterations.

This reduces the execution time and helps to optimize the bubble sort.

Algorithm for optimized bubble sort is

bubbleSort(array)
  for i <- 1 to indexOfLastUnsortedElement-1
    swapped <- false
    for each pair of adjacent elements
      if leftElement > rightElement
        swap leftElement and rightElement
        swapped <- true
    if not swapped
      break
end bubbleSort

Optimized Bubble Sort in Python, Java, and C/C++


# Optimized Bubble sort in Python

def bubbleSort(array):

    # loop through each element of array
    for i in range(len(array)):

        # keep track of swapping
        swapped = False

        # loop to compare array elements
        for j in range(0, len(array) - i - 1):

            # compare two adjacent elements
            # change > to < to sort in descending order
            if array[j] > array[j + 1]:

                # swapping occurs if elements
                # are not in the intended order
                temp = array[j]
                array[j] = array[j+1]
                array[j+1] = temp

                swapped = True

        # no swapping means the array is already sorted
        # so no need for further comparison
        if not swapped:
            break

data = [-2, 45, 0, 11, -9]

bubbleSort(data)

print('Sorted Array in Ascending Order:')
print(data)

Bubble Sort Complexity

Time Complexity
  Best:    O(n)
  Worst:   O(n²)
  Average: O(n²)
Space Complexity: O(1)
Stability: Yes

Complexity in Detail

Bubble Sort compares the adjacent elements.

Cycle    Number of Comparisons
1st      (n-1)
2nd      (n-2)
3rd      (n-3)
...      ...
last     1

Hence, the number of comparisons is
(n-1) + (n-2) + (n-3) + ... + 1 = n(n-1)/2, which is nearly n².

Hence, Complexity: O(n²)

Also, if we observe the code, bubble sort requires two loops. Hence, the
complexity is n·n = n².

1. Time Complexities

• Worst Case Complexity: O(n²)
  If we want to sort in ascending order and the array is in descending order,
  then the worst case occurs.
• Best Case Complexity: O(n)
  If the array is already sorted, then there is no need for sorting.
• Average Case Complexity: O(n²)
  It occurs when the elements of the array are in jumbled order (neither
  ascending nor descending).

2. Space Complexity

• Space complexity is O(1) because an extra variable is used for swapping.
• In the optimized bubble sort algorithm, two extra variables are used, which
  is still a constant amount of space, i.e. O(1).

Bubble Sort Applications


Bubble sort is used if

• complexity does not matter

• short and simple code is preferred

Insertion Sort Algorithm


Insertion sort is a sorting algorithm that places an unsorted element at its
suitable place in each iteration.
Insertion sort works similarly to the way we sort cards in our hands in a card
game.

We assume that the first card is already sorted; then, we select an unsorted
card. If the unsorted card is greater than the card in hand, it is placed on the
right; otherwise, to the left. In the same way, other unsorted cards are taken
and put in their right place.

A similar approach is used by insertion sort.

Working of Insertion Sort


Suppose we need to sort the following array.

Initial array

1. The first element in the array is assumed to be sorted. Take the second
element and store it separately in key .

Compare key with the first element. If the first element is greater than key ,

then key is placed in front of the first element.

If the first element is greater than key, then key is placed in front of the
first element.
2. Now, the first two elements are sorted.

Take the third element and compare it with the elements on the left of it.
Place it just behind the element smaller than it. If there is no element smaller
than it, then place it at the beginning of the array.

Place 1 at the beginning

3. Similarly, place every unsorted element at its correct position.

Place 4 behind 1

Place 3 behind 1 and the array is sorted

Insertion Sort Algorithm

insertionSort(array)
  mark first element as sorted
  for each unsorted element X
    'extract' the element X
    for j <- lastSortedIndex down to 0
      if current element j > X
        move sorted element to the right by 1
    break loop and insert X here
end insertionSort

Insertion Sort in Python, Java, and C/C++


# Insertion sort in Python

def insertionSort(array):

    for step in range(1, len(array)):
        key = array[step]
        j = step - 1

        # Compare key with each element on its left until an element
        # smaller than it is found.
        # For descending order, change key < array[j] to key > array[j].
        while j >= 0 and key < array[j]:
            array[j + 1] = array[j]
            j = j - 1

        # Place key just after the element smaller than it.
        array[j + 1] = key

data = [9, 5, 1, 4, 3]
insertionSort(data)
print('Sorted Array in Ascending Order:')
print(data)

Insertion Sort Complexity

Time Complexity
  Best:    O(n)
  Worst:   O(n²)
  Average: O(n²)
Space Complexity: O(1)
Stability: Yes

Time Complexities
• Worst Case Complexity: O(n²)
  Suppose an array is in ascending order and you want to sort it in
  descending order. In this case, the worst case occurs.
  Each element has to be compared with each of the other elements, so for
  every nth element, (n-1) comparisons are made.
  Thus, the total number of comparisons = n·(n-1) ~ n²
• Best Case Complexity: O(n)
  When the array is already sorted, the outer loop runs n times
  whereas the inner loop does not run at all. So, there are only n
  comparisons. Thus, complexity is linear.
• Average Case Complexity: O(n²)
  It occurs when the elements of an array are in jumbled order (neither
  ascending nor descending).

Space Complexity
Space complexity is O(1) because an extra variable key is used.

Insertion Sort Applications


The insertion sort is used when:

• the array has a small number of elements

• there are only a few elements left to be sorted

Merge Sort Algorithm


Merge Sort is one of the most popular sorting algorithms that is based on the
principle of Divide and Conquer Algorithm.
Here, a problem is divided into multiple sub-problems. Each sub-problem is
solved individually. Finally, sub-problems are combined to form the final
solution.

Merge Sort example

Divide and Conquer Strategy


Using the Divide and Conquer technique, we divide a problem into
subproblems. When the solution to each subproblem is ready, we 'combine'
the results from the subproblems to solve the main problem.

Suppose we had to sort an array A. A subproblem would be to sort a
subsection of this array starting at index p and ending at index r, denoted
as A[p..r].

Divide
If q is the half-way point between p and r, then we can split the
subarray A[p..r] into two arrays A[p..q] and A[q+1..r].

Conquer
In the conquer step, we try to sort both the subarrays A[p..q] and A[q+1..r].
If we haven't yet reached the base case, we again divide both these
subarrays and try to sort them.

Combine
When the conquer step reaches the base step and we get two sorted
subarrays A[p..q] and A[q+1..r] for array A[p..r], we combine the results by
creating a sorted array A[p..r] from the two sorted subarrays A[p..q]
and A[q+1..r].

MergeSort Algorithm
The MergeSort function repeatedly divides the array into two halves until we
reach a stage where we try to perform MergeSort on a subarray of size 1
i.e. p == r .

After that, the merge function comes into play and combines the sorted
arrays into larger arrays until the whole array is merged.

MergeSort(A, p, r):
    if p >= r            // base case: zero or one element
        return
    q = (p+r)/2
    mergeSort(A, p, q)
    mergeSort(A, q+1, r)
    merge(A, p, q, r)

To sort an entire array, we need to call MergeSort(A, 0, length(A)-1).

As shown in the image below, the merge sort algorithm recursively divides
the array into halves until we reach the base case of array with 1 element.
After that, the merge function picks up the sorted sub-arrays and merges
them to gradually sort the entire array.

Merge sort in action

The Merge Step of Merge Sort


Every recursive algorithm is dependent on a base case and the ability to
combine the results from base cases. Merge sort is no different. The most
important part of the merge sort algorithm is, you guessed it, merge step.
The merge step is the solution to the simple problem of merging two sorted
lists(arrays) to build one large sorted list(array).

The algorithm maintains three pointers, one for each of the two arrays and
one for maintaining the current index of the final sorted array.

Have we reached the end of any of the arrays?

No:

Compare current elements of both arrays

Copy smaller element into sorted array

Move pointer of element containing smaller element

Yes:

Copy all remaining elements of non-empty array

Merge step

Writing the Code for Merge Algorithm
A noticeable difference between the merging step we described above and
the one we use for merge sort is that we only perform the merge function on
consecutive sub-arrays.

This is why we only need the array, the first index, the last index of the first
subarray (we can calculate the first index of the second subarray), and the
last index of the second subarray.

Our task is to merge two subarrays A[p..q] and A[q+1..r] to create a sorted
array A[p..r]. So the inputs to the function are A, p, q and r.
The merge function works as follows:

1. Create copies of the subarrays L ← A[p..q] and M ← A[q+1..r].

2. Create three pointers i, j and k
   a. i maintains the current index of L, starting at 0
   b. j maintains the current index of M, starting at 0
   c. k maintains the current index of A[p..r], starting at p.

3. Until we reach the end of either L or M, pick the smaller among the elements
   from L and M and place it in the correct position at A[p..r]

4. When we run out of elements in either L or M, pick up the remaining elements
   and put them in A[p..r]

In code, this would look like:

// Merge two subarrays L and M into arr
void merge(int arr[], int p, int q, int r) {

    // Create L ← A[p..q] and M ← A[q+1..r]
    int n1 = q - p + 1;
    int n2 = r - q;

    int L[n1], M[n2];

for (int i = 0; i < n1; i++)
L[i] = arr[p + i];
for (int j = 0; j < n2; j++)
M[j] = arr[q + 1 + j];

// Maintain current index of sub-arrays and main array


int i, j, k;
i = 0;
j = 0;
k = p;

// Until we reach the end of either L or M, pick the smaller among
// the elements of L and M and place it in the correct position at A[p..r]
while (i < n1 && j < n2) {
if (L[i] <= M[j]) {
arr[k] = L[i];
i++;
} else {
arr[k] = M[j];
j++;
}
k++;
}

// When we run out of elements in either L or M,


// pick up the remaining elements and put in A[p..r]
while (i < n1) {
arr[k] = L[i];
i++;
k++;
}

while (j < n2) {


arr[k] = M[j];
j++;
k++;
}
}

Merge( ) Function Explained Step-By-Step


A lot is happening in this function, so let's take an example to see how this
would work.

As usual, a picture speaks a thousand words.

Merging two consecutive subarrays of array

The array A[0..5] contains two sorted subarrays A[0..3] and A[4..5]. Let us
see how the merge function will merge the two arrays.

void merge(int arr[], int p, int q, int r) {

    // Here, p = 0, q = 3, r = 5

Step 1: Create duplicate copies of sub-arrays to be sorted

// Create L ← A[p..q] and M ← A[q+1..r]


int n1 = q - p + 1 = 3 - 0 + 1 = 4;
int n2 = r - q = 5 - 3 = 2;

int L[4], M[2];

for (int i = 0; i < 4; i++)


L[i] = arr[p + i];
// L[0,1,2,3] = A[0,1,2,3] = [1,5,10,12]

for (int j = 0; j < 2; j++)


M[j] = arr[q + 1 + j];
// M[0,1] = A[4,5] = [6,9]

Create copies of subarrays for merging


Step 2: Maintain current index of sub-arrays and main array

int i, j, k;
i = 0;
j = 0;
k = p;

Maintain indices of copies of sub array and main array


Step 3: Until we reach the end of either L or M, pick the smaller among the
elements of L and M and place it in the correct position at A[p..r]

while (i < n1 && j < n2) {


if (L[i] <= M[j]) {
arr[k] = L[i]; i++;
}
else {
arr[k] = M[j];
j++;
}
k++;
}

Comparing individual elements of sorted subarrays until we reach end of one
Step 4: When we run out of elements in either L or M, pick up
the remaining elements and put in A[p..r]

// We exited the earlier loop because j < n2 doesn't hold


while (i < n1)
{
arr[k] = L[i];
i++;
k++;
}

Copy the remaining elements from the first array to the main subarray

// We exited the earlier loop because i < n1 doesn't hold


while (j < n2)
{
arr[k] = M[j];
j++;
k++;
}
}

Copy the remaining elements of the second array to the main subarray

This step would have been needed if the size of M were greater than L.

At the end of the merge function, the subarray A[p..r] is sorted.

Merge Sort Code in Python, Java, and C/C++
# MergeSort in Python

def mergeSort(array):
    if len(array) > 1:

        # r is the point where the array is divided into two subarrays
        r = len(array)//2
        L = array[:r]
        M = array[r:]

        # Sort the two halves
        mergeSort(L)
        mergeSort(M)

        i = j = k = 0

        # Until we reach the end of either L or M, pick the smaller among
        # the elements of L and M and place it in the correct position
        while i < len(L) and j < len(M):
            if L[i] < M[j]:
                array[k] = L[i]
                i += 1
            else:
                array[k] = M[j]
                j += 1
            k += 1

        # When we run out of elements in either L or M,
        # pick up the remaining elements and put them in place
        while i < len(L):
            array[k] = L[i]
            i += 1
            k += 1

        while j < len(M):
            array[k] = M[j]
            j += 1
            k += 1

# Print the array
def printList(array):
    for i in range(len(array)):
        print(array[i], end=" ")
    print()

# Driver program
if __name__ == '__main__':
    array = [6, 5, 12, 10, 9, 1]

    mergeSort(array)

    print("Sorted array is: ")
    printList(array)

Merge Sort Complexity

Time Complexity
  Best:    O(n·log n)
  Worst:   O(n·log n)
  Average: O(n·log n)
Space Complexity: O(n)
Stability: Yes

Time Complexity

Best Case Complexity: O(n*log n)

Worst Case Complexity: O(n*log n)

Average Case Complexity: O(n*log n)

Space Complexity

The space complexity of merge sort is O(n) .

Merge Sort Applications


• Inversion count problem

• External sorting

• E-commerce applications

Quicksort Algorithm
Quicksort is a sorting algorithm based on the divide and conquer
approach where
1. An array is divided into subarrays by selecting a pivot element (element
selected from the array).

While dividing the array, the pivot element should be positioned in such a
way that elements less than pivot are kept on the left side and elements
greater than pivot are on the right side of the pivot.
2. The left and right subarrays are also divided using the same approach. This
process continues until each subarray contains a single element.

3. At this point, elements are already sorted. Finally, elements are combined to
form a sorted array.

Working of Quicksort Algorithm
1. Select the Pivot Element
There are different variations of quicksort where the pivot element is selected
from different positions. Here, we will be selecting the rightmost element of
the array as the pivot element.

Select a pivot element

2. Rearrange the Array


Now the elements of the array are rearranged so that elements that are
smaller than the pivot are put on the left and the elements greater than the
pivot are put on the right.

Put all the smaller elements on the left and the greater on the right of the
pivot element

Here's how we rearrange the array:

1. A pointer is fixed at the pivot element. The pivot element is compared with
the elements beginning from the first index.

Comparison of
pivot element with element beginning from the first index

2. If the element is greater than the pivot element, a second pointer is set for
that element.

3. Now, pivot is compared with other elements. If an element smaller than the
pivot element is reached, the smaller element is swapped with the greater
element found earlier.

Pivot is compared with other elements.

4. Again, the process is repeated to set the next greater element as the second
pointer. And, swap it with another smaller element.

The process is repeated to set the next greater element as the second pointer.

5. The process goes on until the second last element is reached.


6. Finally, the pivot element is swapped with the second pointer.


3. Divide Subarrays
Pivot elements are again chosen for the left and the right sub-parts
separately. And, step 2 is repeated.

Select the pivot element in each half and put it at its correct place using
recursion

The subarrays are divided until each subarray is formed of a single element.
At this point, the array is already sorted.

Quick Sort Algorithm

quickSort(array, leftmostIndex, rightmostIndex)
  if (leftmostIndex < rightmostIndex)
    pivotIndex <- partition(array, leftmostIndex, rightmostIndex)
    quickSort(array, leftmostIndex, pivotIndex - 1)
    quickSort(array, pivotIndex + 1, rightmostIndex)

partition(array, leftmostIndex, rightmostIndex)
  set rightmostIndex as pivotIndex
  storeIndex <- leftmostIndex - 1
  for i <- leftmostIndex to rightmostIndex - 1
    if element[i] < pivotElement
      storeIndex++
      swap element[i] and element[storeIndex]
  swap pivotElement and element[storeIndex+1]
  return storeIndex + 1

Visual Illustration of Quicksort Algorithm


You can understand the working of quicksort algorithm with the help of the
illustrations below.

Sorting the elements on the left of pivot using recursion

Sorting the elements on the right of pivot using recursion

Quicksort Code in Python, Java, and C/C++


# Quick sort in Python

# function to find the partition position
def partition(array, low, high):

    # choose the rightmost element as pivot
    pivot = array[high]

    # pointer for greater element
    i = low - 1

    # traverse through all elements
    # compare each element with pivot
    for j in range(low, high):
        if array[j] <= pivot:
            # if element smaller than pivot is found
            # swap it with the greater element pointed by i
            i = i + 1

            # swapping element at i with element at j
            (array[i], array[j]) = (array[j], array[i])

    # swap the pivot element with the greater element specified by i
    (array[i + 1], array[high]) = (array[high], array[i + 1])

    # return the position from where partition is done
    return i + 1

# function to perform quicksort
def quickSort(array, low, high):
    if low < high:

        # find pivot element such that
        # elements smaller than pivot are on the left
        # elements greater than pivot are on the right
        pi = partition(array, low, high)

        # recursive call on the left of pivot
        quickSort(array, low, pi - 1)

        # recursive call on the right of pivot
        quickSort(array, pi + 1, high)

data = [8, 7, 2, 1, 0, 9, 6]
print("Unsorted Array")
print(data)

size = len(data)
quickSort(data, 0, size - 1)

print('Sorted Array in Ascending Order:')
print(data)

Quicksort Complexity

Time Complexity
  Best:    O(n·log n)
  Worst:   O(n²)
  Average: O(n·log n)
Space Complexity: O(log n)
Stability: No

1. Time Complexities

• Worst Case Complexity [Big-O]: O(n²)
  It occurs when the pivot element picked is either the greatest or the smallest
  element.
  This condition leads to the case in which the pivot element lies at an extreme
  end of the sorted array. One sub-array is always empty and the other sub-array
  contains n - 1 elements. Thus, quicksort is called only on this sub-array.
  However, the quicksort algorithm has better performance for scattered pivots.

• Best Case Complexity [Big-omega]: O(n*log n)

It occurs when the pivot element is always the middle element or near to the
middle element.
• Average Case Complexity [Big-theta]: O(n*log n)

It occurs when the above conditions do not occur.


2. Space Complexity

The space complexity for quicksort is O(log n) .

Quicksort Applications
Quicksort algorithm is used when

• the programming language is good for recursion

• time complexity matters

• space complexity matters

Heap Sort Algorithm


Heap Sort is a popular and efficient sorting algorithm in computer
programming. Learning how to write the heap sort algorithm requires
knowledge of two types of data structures - arrays and trees.
The initial set of numbers that we want to sort is stored in an array e.g. [10,

3, 76, 34, 23, 32] and after sorting, we get a sorted array [3,10,23,32,34,76] .

Heap sort works by visualizing the elements of the array as a special kind of
complete binary tree called a heap.

Note: As a prerequisite, you must know about a complete binary
tree and heap data structure.

Relationship between Array Indexes and Tree


Elements
A complete binary tree has an interesting property that we can use to find
the children and parents of any node.

If the index of any element in the array is i, the element at index 2i+1 will
be the left child and the element at index 2i+2 will be the right child.
Also, the parent of any element at index i is given by the floor of (i-1)/2.

Relationship between array and heap indices

Let's test it out,

Left child of 1 (index 0)
= element at index (2*0 + 1) = element at index 1 = 12

Right child of 1
= element at index (2*0 + 2) = element at index 2 = 9

Similarly,

Left child of 12 (index 1)
= element at index (2*1 + 1) = element at index 3 = 5

Right child of 12
= element at index (2*1 + 2) = element at index 4 = 6

Let us also confirm that the rules hold for finding the parent of any node:

Parent of 9 (position 2)
= (2-1)/2 = 1/2 = 0.5
~ index 0 = 1

Parent of 12 (position 1)
= (1-1)/2 = index 0 = 1

Understanding this mapping of array indexes to tree positions is critical to


understanding how the Heap Data Structure works and how it is used to
implement Heap Sort.

What is Heap Data Structure?


Heap is a special tree-based data structure. A binary tree is said to follow a
heap data structure if

• it is a complete binary tree
• all nodes in the tree follow the property that they are greater than their
  children, i.e. the largest element is at the root and both its children are
  smaller than the root, and so on. Such a heap is called a max-heap. If
  instead all nodes are smaller than their children, it is called a min-heap.

The following example diagram shows Max-Heap and Min-Heap.

Max Heap and Min Heap

To learn more about it, please visit Heap Data Structure.

How to "heapify" a tree


Starting from a complete binary tree, we can modify it to become a Max-
Heap by running a function called heapify on all the non-leaf elements of the
heap.

Since heapify uses recursion, it can be difficult to grasp. So let's first think
about how you would heapify a tree with just three elements.

heapify(array)
  Root = array[0]
  Largest = largest(array[0], array[2*0 + 1], array[2*0 + 2])
  if (Root != Largest)
    Swap(Root, Largest)

Heapify base cases

The example above shows two scenarios - one in which the root is the largest
element and we don't need to do anything. And another in which the root had
a larger element as a child and we needed to swap to maintain max-heap
property.

If you've worked with recursive algorithms before, you've probably identified
that this must be the base case.

Now let's think of another scenario in which there is more than one level.

How to heapify the root element when its subtrees are already max-heaps

The top element isn't a max-heap but all the sub-trees are max-heaps.

To maintain the max-heap property for the entire tree, we will have to keep
pushing 2 downwards until it reaches its correct position.

How to heapify the root element when its subtrees are max-heaps

Thus, to maintain the max-heap property in a tree where both sub-trees are
max-heaps, we need to run heapify on the root element repeatedly until it is
larger than its children or it becomes a leaf node.

We can combine both these conditions in one heapify function as

void heapify(int arr[], int n, int i) {

    // Find largest among root, left child and right child
    int largest = i;
    int left = 2 * i + 1;
    int right = 2 * i + 2;

    if (left < n && arr[left] > arr[largest])
        largest = left;

    if (right < n && arr[right] > arr[largest])
        largest = right;

    // Swap and continue heapifying if root is not largest
    if (largest != i) {
        swap(&arr[i], &arr[largest]);
        heapify(arr, n, largest);
    }
}

This function works for both the base case and for a tree of any size. We can
thus move the root element to the correct position to maintain the max-heap
status for any tree size as long as the sub-trees are max-heaps.

Build max-heap
To build a max-heap from any tree, we can thus start heapifying each sub-
tree from the bottom up and end up with a max-heap after the function is
applied to all the elements including the root element.

In the case of a complete tree, the index of the last non-leaf node is given
by n/2 - 1. All nodes after that are leaf nodes and thus don't need to
be heapified.

So, we can build a max heap as follows:

// Build heap (rearrange array)


for (int i = n / 2 - 1; i >= 0; i--)
heapify(arr, n, i);

Create array and calculate i

Steps to build max heap for heap sort
As shown in the above diagram, we start by heapifying the smallest subtrees
at the bottom and gradually move up until we reach the root element.

If you've understood everything till here, congratulations, you are on your
way to mastering the Heap sort.

Working of Heap Sort


1. Since the tree satisfies Max-Heap property, then the largest item is stored at
the root node.

2. Swap: Remove the root element and put it at the end of the array (nth
position). Put the last item of the tree (heap) at the vacant place.
3. Remove: Reduce the size of the heap by 1.
4. Heapify: Heapify the root element again so that we have the highest element
at root.
5. The process is repeated until all the items of the list are sorted.

Swap, Remove, and Heapify
The code below shows the operation.

// Heap sort
for (int i = n - 1; i >= 0; i--) {
swap(&arr[0], &arr[i]);

// Heapify root element to get highest element at root again


heapify(arr, i, 0);
}

Heap Sort Code in Python, Java, and C/C++


# Heap Sort in Python

def heapify(arr, n, i):

    # Find largest among root and children
    largest = i
    l = 2 * i + 1
    r = 2 * i + 2

    if l < n and arr[i] < arr[l]:
        largest = l

    if r < n and arr[largest] < arr[r]:
        largest = r

    # If root is not largest, swap with largest and continue heapifying
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)

def heapSort(arr):
    n = len(arr)

    # Build max heap
    for i in range(n//2, -1, -1):
        heapify(arr, n, i)

    for i in range(n-1, 0, -1):
        # Swap
        arr[i], arr[0] = arr[0], arr[i]

        # Heapify root element
        heapify(arr, i, 0)

arr = [1, 12, 9, 5, 6, 10]
heapSort(arr)
n = len(arr)
print("Sorted array is")
for i in range(n):
    print("%d " % arr[i], end='')

Heap Sort Complexity

Time Complexity
  Best:    O(n·log n)
  Worst:   O(n·log n)
  Average: O(n·log n)
Space Complexity: O(1)
Stability: No

Heap Sort has O(n log n) time complexity for all the cases (best case,
average case, and worst case).

Let us understand the reason why. The height of a complete binary tree
containing n elements is log n.

As we have seen earlier, to fully heapify an element whose subtrees are
already max-heaps, we need to keep comparing the element with its left and
right children and pushing it downwards until it reaches a point where both
its children are smaller than it.

In the worst case scenario, we will need to move an element from the root to
the leaf node, making a multiple of log(n) comparisons and swaps.

During the build_max_heap stage, we do that for n/2 elements, so the worst
case complexity of the build_heap step is n/2 · log n ~ n log n.

During the sorting step, we exchange the root element with the last element
and heapify the root element. For each element, this again takes log n
worst-case time because we might have to bring the element all the way from
the root to the leaf. Since we repeat this n times, the heap_sort step is
also O(n log n).

Also, since the build_max_heap and heap_sort steps are executed one after
another, the algorithmic complexity is not multiplied and it remains in the
order of n log n.

Also, it performs sorting in O(1) space complexity. Compared with Quick Sort,
it has a better worst case (O(n log n)); Quick Sort has complexity O(n²) in
the worst case. But in other cases, Quick Sort is fast. Introsort is an
alternative to heapsort that combines quicksort and heapsort to retain the
advantages of both: the worst-case speed of heapsort and the average speed
of quicksort.

Heap Sort Applications


Systems concerned with security, and embedded systems such as the Linux
kernel, use Heap Sort because of the O(n log n) upper bound on Heapsort's
running time and the constant O(1) upper bound on its auxiliary storage.

Although Heap Sort has O(n log n) time complexity even in the worst case,
it doesn't have many applications (compared to other sorting algorithms like
Quick Sort and Merge Sort). However, its underlying data structure, the heap,
can be used efficiently if we want to extract the smallest (or largest) item
from a list without the overhead of keeping the remaining items in sorted
order, e.g. in priority queues.

Quick sort
• Python code
• Based on divide and conquer approach.
• Algorithm:
o An array is divided into sub-arrays by selecting a pivot element (element
selected from the array).
o While dividing the array, the pivot element should be positioned in such a way
that elements less than pivot are kept on the left side and elements greater
than pivot are on the right side of the pivot.
o The left and right sub-arrays are also divided using the same approach. This
process continues until each subarray contains a single element.
o At this point, elements are already sorted. Finally, elements are combined to
form a sorted array
• Working of the Quicksort algorithm:
i. Select the pivot element. We select the rightmost element of the array as
the pivot element.

ii. Rearrange the array. We rearrange smaller elements to the left side of the
pivot and larger elements to the right side.

iii. How do we rearrange the array?

a. We need the PIVOT (the last element), "i" (which tracks the boundary
of the smaller-than-pivot region from the left), and "j" (the iterator
over the array).
b. We compare "j" with the pivot. If "j" is smaller than the pivot, we
increment "i" and swap "j" with "i".
c. When "j" reaches the pivot, we swap the pivot with the element just
after "i".
d. Now we have two sub-arrays, and we repeat the same algorithm on each.

Heap sort
• Python code
• Left child of element i is 2i + 1, right child is 2i + 2. Indexing starts from 0
• Parent of element i can be found with floor((i-1) / 2)
• Heap data structure:
o It is a complete binary tree (nodes are filled from left to right)
o All nodes are greater than their children (max-heap)
• To create a Max-Heap from a complete binary tree, we use a heapify function.
o n/2 - 1 is the index of the last non-leaf node.
o The heapify function brings the larger element to the top. It is applied
to one sub-tree recursively.

void heapify(int arr[], int n, int i) {

    // Find largest among root, left child and right child
    int largest = i;
    int left = 2 * i + 1;
    int right = 2 * i + 2;

    if (left < n && arr[left] > arr[largest])
        largest = left;

    if (right < n && arr[right] > arr[largest])
        largest = right;

    // Swap and continue heapifying if root is not largest
    if (largest != i) {
        swap(&arr[i], &arr[largest]);
        heapify(arr, n, largest);
    }
}

o First, as a precondition for the sorting phase, we must bring the tree
into MAX-HEAP form so that the largest element is at the top. This is
needed before we can start sorting the array.

// Max-heap creation
for (int i = n/2 - 1; i >= 0; i--)
    heapify(arr, n, i);

o After that we repeatedly swap the root with the last unsorted element
and apply heapify again:

// Sort: move the current maximum to the end, shrink the heap, re-heapify
for (int i = n - 1; i > 0; i--) {
    swap(arr[0], arr[i]);
    heapify(arr, i, 0);
}

Linked List
• Array limitations:
o Fixed size
o Physically stored in consecutive memory locations
o To insert or delete items, may need to shift data
• Variations of linked list: linear linked list, circular linked list, doubly linked list
• head pointer "defines" the linked list (it is not a node)

• Advantages of Linked Lists


o The items do NOT have to be stored in consecutive memory locations.
▪ So, can insert and delete items without shifting data.
▪ Can increase the size of the data structure easily.
o Linked lists can grow dynamically (i.e. at run time) – the amount of memory
space allocated can grow and shrink as needed.

• Disadvantages of Linked Lists
o A linked list uses more memory storage than an array: each node needs
extra memory for the additional link (next pointer) field.
o Linked list elements cannot be randomly accessed.
o Binary search cannot be applied to a linked list.
o A linked list takes more time in traversing elements.
• Node
o A linked list is an ordered sequence of items called nodes
o A node is the basic unit of representation in a linked list
o A node in a singly linked list consists of two fields:
▪ A data portion
▪ A link (pointer) to the next node in the structure
o The first item (node) in the linked list is accessed via a front or head pointer
▪ The linked list is defined by its head (this is its starting point)
• We will use ListNode and LinkedList classes.

class Node {
public:
    int info;    // data
    Node* next;  // pointer to next node in the list
    /* Node(int val) { info = val; next = NULL; } */
};

class List {
public:
    // head: a pointer to the first node in the list.
    // Since the list is empty initially, head is set to NULL
    List(void) { head = NULL; }  // constructor
    ~List(void);                 // destructor

private:
    Node* head;
};
// isEmpty, insertNode, findNode, deleteNode, displayList
• Boundary condition
o Empty data structure
o Single element in the data structure
o Adding / removing beginning of data structure
o Adding / removing end of data structure
o Working in the middle

Insertion at beginning in Linked List


• It is just a 2-step algorithm:
o The new node should be connected to the current first node, i.e. the node
the head points to. This is achieved by assigning the head's address to
the new node's next pointer.
o The new node should become the head. This is achieved by assigning the
new node to head.

void insertStart(int val) {
    Node *node = new Node;  // create a new node
    node->info = val;       // put value

    if (head == NULL) {     // check if the list is empty
        head = node;
        node->next = NULL;
    }
    else {                  // if list is not empty
        node->next = head;
        head = node;
    }
}

Insertion at the end in Linked List


void insertEnd(int val) {
    Node *node = new Node;  // create a new node
    node->info = val;       // put value
    node->next = NULL;      // pointer of last node is NULL

    if (head == NULL) {     // if empty
        head = node;
    }
    else {
        Node *cur = head;   // walk to the last node
        while (cur->next != NULL) {
            cur = cur->next;
        }
        cur->next = node;
    }
}

Insertion at particular position

• In this case, we don't disturb the head and tail nodes. Rather, a new node is
inserted between two consecutive nodes.
• We call one node current and the other previous, and the new node is placed
between them.
• Two steps are needed to insert between previous and current:
o Pass the address of the new node in the next field of the previous node.
o Pass the address of the current node in the next field of the new node.

void insertPosition(int pos, int val) {
    Node *pre;
    Node *cur;
    Node *node = new Node;

    node->info = val;
    cur = head;

    for (int i = 1; i < pos; i++) {
        pre = cur;
        cur = cur->next;
    }
    pre->next = node;
    node->next = cur;
}
void insertSpecificValue(int sp_val, int data) {
    Node *pre;
    Node *cur;
    Node *node = new Node;

    node->info = data;
    cur = head;  // "current" initially points to head, "previous" to NULL

    while (cur->info != sp_val) {  // stop at the node holding sp_val
        pre = cur;
        cur = cur->next;
    }
    node->next = cur;   // new node points to the node holding sp_val
    pre->next = node;   // previous node now points to the new node
}

Deleting the first node from a Linked List


• Steps to remove the first node:
o Check if the linked list exists or not: if (head == NULL).
o Check if it is a one-element list.
o Otherwise, if there are nodes in the linked list, we use a pointer
variable PTR that is set to point to the first node of the list. For this, we
initialize PTR with head, which stores the address of the first node.
o Head is made to point to the next node in sequence, and finally the memory
occupied by the node pointed to by PTR is freed and returned to the free pool.

void deleteFirst() {
    if (head == NULL) {              // if empty
        cout << "Underflow" << endl;
    }
    else if (head->next == NULL) {   // if only one element
        Node *ptr;
        ptr = head;
        head = NULL;
        delete ptr;
    }
    else {                           // otherwise
        Node *ptr;
        ptr = head;
        head = head->next;
        delete ptr;
    }
}

Deleting the last node from a Linked List
• Steps to remove the last node:
o Check if the linked list exists or not: if (head == NULL).
o Check if it is a one-element list.
o Take a pointer variable PTR and initialize it with head, so PTR points to
the first node of the linked list. In the while loop, we take another pointer
variable PREPTR such that it always points to one node before PTR. Once we
reach the last node (PTR) and the second last node (PREPTR), we set the
NEXT pointer of the second last node to NULL, so that it becomes the (new)
last node of the linked list. The memory of the previous last node is freed
and returned to the free pool.

STEP 1: IF START = NULL
          WRITE UNDERFLOW
          Go to STEP 8
        [END OF IF]
STEP 2: SET PTR = START
STEP 3: REPEAT Steps 4 and 5 while PTR->NEXT != NULL
STEP 4:   SET PREPTR = PTR
STEP 5:   SET PTR = PTR->NEXT
        [END OF LOOP]
STEP 6: SET PREPTR->NEXT = NULL
STEP 7: FREE PTR
STEP 8: EXIT

A C++ version of this procedure is sketched below.
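
A minimal C++ sketch of the pseudocode above, assuming the Node and List classes shown earlier (the function name deleteLast is illustrative):

void deleteLast() {
    if (head == NULL) {              // STEP 1: empty list
        cout << "Underflow" << endl;
        return;
    }
    if (head->next == NULL) {        // one-element list
        delete head;
        head = NULL;
        return;
    }
    Node *ptr = head;                // STEP 2
    Node *preptr = NULL;
    while (ptr->next != NULL) {      // STEPS 3-5: walk to the last node
        preptr = ptr;
        ptr = ptr->next;
    }
    preptr->next = NULL;             // STEP 6: second last becomes last
    delete ptr;                      // STEP 7
}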

Deleting a Specific Node in a Linked List

Step 1: IF START = NULL
          Write UNDERFLOW
          Go to Step 10
        [END OF IF]
Step 2: SET PTR = START
Step 3: SET PREPTR = PTR
Step 4: Repeat Steps 5 and 6 while PTR->DATA != NUM
Step 5:   SET PREPTR = PTR
Step 6:   SET PTR = PTR->NEXT
        [END OF LOOP]
Step 7: SET TEMP = PTR
Step 8: SET PREPTR->NEXT = PTR->NEXT
Step 9: FREE TEMP
Step 10: EXIT

A C++ version is sketched below.
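
A minimal C++ sketch of the deletion pseudocode above, again assuming the earlier Node class (deleteValue is an illustrative name; for simplicity it assumes the value is present and is not in the first node):

void deleteValue(int num) {
    if (head == NULL) {               // Step 1: empty list
        cout << "Underflow" << endl;
        return;
    }
    Node *ptr = head;                 // Steps 2-3
    Node *preptr = ptr;
    while (ptr->info != num) {        // Steps 4-6: find the node to delete
        preptr = ptr;
        ptr = ptr->next;
    }
    preptr->next = ptr->next;         // Step 8: unlink the node
    delete ptr;                       // Step 9: free its memory
}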

Circular Linked List


• In a circular linked list, the last node contains a pointer to the first node.
• No node points to NULL!
• To traverse, start at the head and iterate until you find the head again
(stop when t->next == head, i.e. when t is back at the start).
• Complexity for all operations is O(n)
class Node {
    int info;
    Node *next;
};

class CircularLList {
public:
    Node *last;

    CircularLList() {
        last = NULL;
    }
};
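
As a quick illustration of the "stop when you reach the start again" rule, here is a hedged traversal sketch (displayList is an illustrative name; it assumes the CircularLList above with info made accessible):

void displayList(Node *last) {
    if (last == NULL) return;        // empty list

    Node *t = last->next;            // first node (the head)
    do {
        cout << t->info << " ";
        t = t->next;
    } while (t != last->next);       // stop once we are back at the head
    cout << endl;
}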

Insertion at Beginning in Circular Linked List


void addBegin(int val) {
    Node *temp = new Node();
    temp->info = val;

    if (last == NULL) {          // if empty
        last = temp;
        temp->next = last;       // points next to itself
                                 // (in a simple LL it pointed to NULL)
    }
    else {
        temp->next = last->next; // new node points to the old first node
        last->next = temp;       // new node becomes the first node
    }
}

Insertion at the End in Circular Linked List


// With a pointer to the last node, no traversal is needed:
New->next = last->next;  // new node points to the first node
last->next = New;        // old last node points to the new node
last = New;              // the new node becomes the last node

Insertion at Particular Position in Circular Linked List


void insertNode(int item, int pos) {
    Node *New = new Node();
    Node *prev;
    Node *cur;
    New->info = item;

    if (last == NULL) {      // insert into empty list
        last = New;
        last->next = last;
        return;
    }

    prev = last;
    cur = last->next;

    for (int i = 1; i < pos; i++) {
        prev = cur;
        cur = cur->next;
    }

    New->next = cur;
    prev->next = New;
}

Deletion a Node in Circular Linked List


• From a single-node circular linked list (the node points to itself):

last = NULL;
delete cur;

• Delete the head node (the node after last):

cur = last->next;        // head
last->next = cur->next;  // last now points to the new first node
delete cur;

• Delete a middle node at position pos:

prev = last;
cur = last->next;
for (i = 1; i < pos; i++) {
    prev = cur;
    cur = cur->next;
}
prev->next = cur->next;
delete cur;

• Delete the end node:

cur = last->next;
while (cur != last) {    // walk until cur is the last node
    prev = cur;
    cur = cur->next;
}
prev->next = cur->next;  // second last node points to the head
last = prev;             // it becomes the new last node
delete cur;

Doubly Linked List


• A DLL node contains a pointer to the next as well as the previous node in the
sequence. Therefore, it consists of three parts:
o data
o a pointer to the next node
o a pointer to the previous node

class Node {
    int info;
    Node *next;
    Node *pre;
};
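
A hedged sketch of insertion at the front of a doubly linked list, using the Node class above (insertFront and the head variable are illustrative, and info/next/pre are assumed accessible); note that both the next and the pre links must be updated:

void insertFront(Node *&head, int val) {
    Node *node = new Node;
    node->info = val;
    node->pre = NULL;        // nothing before the new first node
    node->next = head;       // the old head comes after it

    if (head != NULL)
        head->pre = node;    // back-link from the old head

    head = node;             // the new node becomes the head
}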

Stacks

• Last in, first out (LIFO)
• Elements are added to and removed from the top of the stack (the most recently added
items are at the top of the stack).

• Operations on Stack:
o push(i) to insert the element i on the top of the stack.
o pop() to remove the top element of the stack and to return the removed
element as a function value.
o top() to return the top element of the stack
o empty() to check whether the stack is empty or not. It returns true if stack is
empty and returns false otherwise.

Array Representation of Stacks


• In the computer’s memory, stacks can be represented as a linear array.
• Every stack has a variable called TOP associated with it, which is used to store the
address of the topmost element of the stack.
• TOP is the position where the element will be added to or deleted from
• There is another variable called MAX, which is used to store the maximum number of
elements that the stack can hold.
• Underflow and Overflow:
o if TOP = NULL (underflow) it indicates that the stack is empty and
o if TOP = MAX–1 (overflow) then the stack is full.
• Pseudocode for PUSH, POP, PEEK:
• PUSH operation

• Step 1: IF TOP = MAX - 1
• PRINT "OVERFLOW"
• Goto Step 4
• [END OF IF]
• Step 2: SET TOP = TOP + 1
• Step 3: SET STACK[TOP] = VALUE
• Step 4: END

• POP operation
• Step 1: IF TOP = NULL
• PRINT "UNDERFLOW"
• Goto Step 4
• [END OF IF]
• Step 2: SET VALUE = STACK[TOP]
• Step 3: SET TOP = TOP - 1
• Step 4: END

• PEEK operation
• Step 1: IF TOP = NULL
• PRINT "STACK IS EMPTY"
• Goto Step 3
• Step 2: RETURN STACK[TOP]
• Step 3: END
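
• A minimal C++ sketch of the three operations, assuming a fixed MAX of 100 and TOP = -1 playing the role of NULL for an empty stack:

#include <iostream>
using namespace std;

const int MAX = 100;
int STACK[MAX];
int TOP = -1;

void push(int value) {
    if (TOP == MAX - 1) { cout << "OVERFLOW\n"; return; }
    STACK[++TOP] = value;        // increment TOP, then store
}

int pop() {
    if (TOP == -1) { cout << "UNDERFLOW\n"; return -1; }
    return STACK[TOP--];         // return the top, then decrement TOP
}

int peek() {
    if (TOP == -1) { cout << "STACK IS EMPTY\n"; return -1; }
    return STACK[TOP];
}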

Linked Representation of Stack


• A stack may be created using an array. This technique of creating a stack is easy, but the
drawback is that the array must be declared to have some fixed size.
• In a linked stack, every node has two parts—one that stores data and another that
stores the address of the next node. The START pointer of the linked list is used as TOP.
• PUSH adds a node at the beginning; POP deletes the front node.

Infix to Postfix
• Algorithm used (Postfix):
o Step 1: Add ) to the end of the infix expression
o Step 2: Push ( onto the STACK
o Step 3: Repeat until each character in the infix notation is scanned
▪ IF a ( is encountered, push it on the STACK.
▪ IF an operand (whether a digit or a character) is encountered, add it
to the postfix expression.
▪ IF a ) is encountered, then
▪ a. Repeatedly pop from STACK and add it to the postfix
expression until a ( is encountered.
▪ b. Discard the (. That is, remove the ( from STACK and do not
add it to the postfix expression
▪ IF an operator O is encountered, then

▪ a. Repeatedly pop from STACK and add to the postfix expression each
operator (popped from the STACK) which has the same or a higher
precedence than O
▪ b. Push the operator O onto the STACK [END OF IF]
o Step 4: Repeatedly pop from the STACK and add it to the postfix expression until
the STACK is empty
o Step 5: EXIT
• Example: if / is scanned while the stack holds ((-*, only * (same precedence) is popped, and the stack becomes ((-/
• Example: (A * B) + (C / D) – (D + E)

• (A * B) + (C / D) – (D + E)) [put extra ")" at last]

• Char Stack Expression
• ( (( Push at beginning "("
• A (( A
• * ((* A
• B ((* AB
• ) ( AB*
• + (+ AB*
• ( (+( AB*
• C (+( AB*C
• / (+(/ AB*C
• D (+(/ AB*CD
• ) (+ AB*CD/
• - (- AB*CD/+
• ( (-( AB*CD/+
• D (-( AB*CD/+D
• + (-(+ AB*CD/+D
• E (-(+ AB*CD/+DE
• ) (- AB*CD/+DE+
• ) AB*CD/+DE+-

Evaluation of Postfix expression


• [AB*CD/+DE+-] ==> 2 3 * 2 4 / + 4 3 + -

• Char Stack Operation
• 2 2
• 3 2, 3
• * 6 2*3
• 2 6, 2
• 4 6, 2, 4
• / 6, 0 2/4
• + 6 6+0
• 4 6, 4
• 3 6, 4, 3
• + 6, 7 4+3
• - -1 6-7
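
• A minimal C++ sketch of this evaluation, assuming space-separated integer tokens as in the trace above:

#include <iostream>
#include <stack>
#include <sstream>
#include <string>
#include <cctype>
using namespace std;

int evalPostfix(const string &expr) {
    stack<int> st;
    istringstream in(expr);
    string tok;
    while (in >> tok) {
        if (isdigit(tok[0])) st.push(stoi(tok));
        else {                             // operator: pop b, then a; compute a op b
            int b = st.top(); st.pop();
            int a = st.top(); st.pop();
            switch (tok[0]) {
                case '+': st.push(a + b); break;
                case '-': st.push(a - b); break;
                case '*': st.push(a * b); break;
                case '/': st.push(a / b); break;  // integer division, as in the trace
            }
        }
    }
    return st.top();
}

int main() {
    cout << evalPostfix("2 3 * 2 4 / + 4 3 + -") << endl;   // -1
}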

Infix to Prefix

First method

• Algorithm used (Prefix):

o Step 1. Push ) onto STACK, and add ( to the start of A.


o Step 2. Scan A from right to left and repeat Steps 3 to 6 for each element of A
until the STACK is empty or contains only )
o Step 3. If an operand is encountered add it to B
o Step 4. If a right parenthesis is encountered push it onto STACK
o Step 5. If an operator is encountered then:
▪ a. Repeatedly pop from STACK and add to B each operator (on the top
of STACK) which has a strictly higher precedence than the operator.
▪ b. Add operator to STACK
o Step 6. If left parenthesis is encountered then
▪ a. Repeatedly pop from the STACK and add to B (each operator on top
of stack until a right parenthesis is encountered)
▪ b. Remove the left parenthesis
o Step 7. Reverse B to get prefix form
• Example: 14 / 7 * 3 - 4 + 9 / 2

• (14 / 7 * 3 - 4 + 9 / 2 [Put extra "(" to start]

• Char Stack Expression
• 2 ) Push at beginning ")"
• / )/ 2
• 9 )/ 2 9
• + )+ 2 9 /
• 4 )+ 2 9 / 4
• - )+- 2 9 / 4
• 3 )+- 2 9 / 4 3
• * )+-* 2 9 / 4 3
• 7 )+-* 2 9 / 4 3 7
• / )+-*/ 2 9 / 4 3 7
• 14 )+-*/ 2 9 / 4 3 7 14
• ( 2 9 / 4 3 7 14 / * - +

• DON'T FORGET TO REVERSE: + - * / 14 7 3 4 / 9 2

Second method

• Algorithm used (Prefix):

o Step 1: Reverse the infix string. Note that while reversing the string you must
interchange left and right parentheses. Eg. (3+2) will be (2+3) but not )2+3(
o Step 2: Obtain the postfix expression of the infix expression obtained in Step 1.
o Step 3: Reverse the postfix expression to get the prefix expression
• Example: 14 / 7 * 3 - 4 + 9 / 2


• Reversed: 2 / 9 + 4 - 3 * 7 / 14

• Char Stack Expression
• 2 ( Push at beginning "("
• / (/ 2
• 9 (/ 2 9
• + (+ 2 9 /
• 4 (+ 2 9 / 4
• - (+- 2 9 / 4
• 3 (+- 2 9 / 4 3
• * (+-* 2 9 / 4 3
• 7 (+-* 2 9 / 4 3 7
• / (+-*/ 2 9 / 4 3 7
• 14 (+-*/ 2 9 / 4 3 7 14
• ) 2 9 / 4 3 7 14 / * - +

• DON'T FORGET TO REVERSE: + - * / 14 7 3 4 / 9 2

• NOTE: Operator with the same precedence must not be popped from stack

Evaluation of Prefix Expression


• For postfix we evaluated a+b; in prefix we evaluate b+a (the operand popped first becomes the left operand)
• Example: 14 / 7 * 3 - 4 + 9 / 2 ==> + - * / 14 7 3 4 / 9 2

• Char Stack Operation
• 2 2
• 9 2, 9
• / 4 9/2 [but in postfix we did 2/9]
• 4 4, 4
• 3 4, 4, 3
• 7 4, 4, 3, 7
• 14 4, 4, 3, 7, 14
• / 4, 4, 3, 2 14/7
• * 4, 4, 6 2*3
• - 4, 2 6-4
• + 6 2+4

Queue
• First in, first out (FIFO)
• The queue has a front and a rear

o Items can be removed only at the front
o Items can be added only at the other end, the rear
• Types of queues:
o Linear queue
o Circular queue
o Double ended queue (Deque)
o Priority queue

Linear Queue
• A queue is a sequence of data elements

• Enqueue (add element to back) when an item is inserted into the queue, it
always goes at the end (rear).
• Dequeue (remove element from front) when an item is taken from the queue, it
always comes from the front.

• Implemented using either array or a linear linked list.

• Array implementation:

o ENQUEUE
o Step 1: IF REAR = MAX-1
o Write "OVERFLOW"
o Goto step 4
o [END OF IF]
o Step 2: IF FRONT = -1 and REAR = -1
o SET FRONT = REAR = 0
o ELSE
o SET REAR = REAR + 1
o [END OF IF]
o Step 3: SET QUEUE [REAR] = NUM
o Step 4: EXIT
o DEQUEUE
o Step 1: IF FRONT = -1 OR FRONT > REAR
o Write "UNDERFLOW"
o ELSE
o SET VAL = QUEUE[FRONT]
o SET FRONT = FRONT + 1
o [END OF IF]
o Step 2: EXIT
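
o A minimal C++ sketch of this array implementation, assuming MAX = 100 and -1 as the "empty" value of FRONT and REAR:

#include <iostream>
using namespace std;

const int MAX = 100;
int QUEUE[MAX];
int FRONT = -1, REAR = -1;

void enqueue(int num) {
    if (REAR == MAX - 1) { cout << "OVERFLOW\n"; return; }
    if (FRONT == -1) FRONT = 0;      // first insertion
    QUEUE[++REAR] = num;
}

int dequeue() {
    if (FRONT == -1 || FRONT > REAR) { cout << "UNDERFLOW\n"; return -1; }
    return QUEUE[FRONT++];
}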

• Linked list implementation:

o ENQUEUE the same as adding a node at the end
o Step 1: Allocate memory for the new node and name it as PTR
o Step 2: SET PTR -> DATA = VAL
o Step 3:
o IF FRONT = NULL
o SET FRONT = REAR = PTR
o SET FRONT -> NEXT = REAR -> NEXT = NULL
o ELSE
o SET REAR -> NEXT = PTR
o SET REAR = PTR
o SET REAR -> NEXT = NULL
o [END OF IF]
o Step 4: END
o
o DEQUEUE the same as deleting a node from the beginning
o Step 1: IF FRONT = NULL
o Write "Underflow"
o Go to Step 5
o [END OF IF]
o
o Step 2: SET PTR = FRONT
o Step 3: SET FRONT = FRONT -> NEXT
o Step 4: FREE PTR
o Step 5: END
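
o A minimal C++ sketch of the linked implementation (QNode and the global pointers are assumptions):

#include <iostream>
using namespace std;

struct QNode { int data; QNode *next; };
QNode *FRONT = NULL, *REAR = NULL;

void enqueue(int val) {              // the same as adding a node at the end
    QNode *ptr = new QNode{val, NULL};
    if (FRONT == NULL) FRONT = REAR = ptr;
    else { REAR->next = ptr; REAR = ptr; }
}

int dequeue() {                      // the same as deleting the first node
    if (FRONT == NULL) { cout << "Underflow\n"; return -1; }
    QNode *ptr = FRONT;
    int val = ptr->data;
    FRONT = FRONT->next;
    if (FRONT == NULL) REAR = NULL;  // the queue became empty
    delete ptr;
    return val;
}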

Circular Queue

• Drawback of a linear queue: once the queue is full, even though a few elements
from the front are deleted and some occupied space is freed, it is not possible
to add any more new elements, as the rear has already reached the queue's
rearmost position.

• In a circular queue, once the queue is full, the "first" index of the queue becomes
the "rear"-most index, if and only if the "front" element has moved forward.
Otherwise it will be a "queue overflow" state.

• ENQUEUE algorithm:

• Insert-Circular-Q(CQueue, Rear, Front, N, Item)


• 1. If (Front = 0 and Rear = N-1) or Front = Rear + 1:
• then Print: “Circular Queue Overflow” and Return

• 2. If Front = -1:
• then Set Front := 0 and Rear := 0 and go to step 5

• 3. If Rear = N-1:
• then Set Rear := 0 and go to step 5

• 4. Set Rear := Rear + 1

• 5. Set CQueue[Rear] := Item

• 6. Return
o Here, CQueue is a circular queue.
o Rear represents the location of the most recently inserted element.
o Front represents the location from which the data element is to be removed.
o N is the maximum size of CQueue.
o Item is the new item to be added.
o Initially Rear = -1 and Front = -1.

• DEQUEUE algorithm:

• Delete-Circular-Q(CQueue, Front, Rear, Item)



• 1. If Front = -1:
• then Print: “Circular Queue Underflow” and Return

• 2. Set Item := CQueue [Front]

• 3. If Front = Rear: (the queue had only one element)
• then Set Front := Rear := -1 and Return

• 4. If Front = N – 1:
• then Set Front := 0 and Return

• 5. Set Front := Front + 1

• 6. Return
o CQueue is the place where data are stored.
o Rear represents the location of the most recently inserted element.
o Front represents the location from which the data element is to be removed.
o Front element is assigned to Item.
o Initially, Front = -1.
• On insert, REAR is incremented (FRONT is unchanged).
• On delete, FRONT is incremented (REAR is unchanged).
• If FRONT = REAR + 1 then the queue is full! Overflow will occur.
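
• A minimal C++ sketch of the two algorithms above, assuming N = 5:

#include <iostream>
using namespace std;

const int N = 5;
int CQueue[N];
int FRONT = -1, REAR = -1;

void insertCQ(int item) {
    if ((FRONT == 0 && REAR == N - 1) || FRONT == REAR + 1) {
        cout << "Circular Queue Overflow\n"; return;
    }
    if (FRONT == -1) FRONT = REAR = 0;      // first element
    else if (REAR == N - 1) REAR = 0;       // wrap around
    else REAR = REAR + 1;
    CQueue[REAR] = item;
}

int deleteCQ() {
    if (FRONT == -1) { cout << "Circular Queue Underflow\n"; return -1; }
    int item = CQueue[FRONT];
    if (FRONT == REAR) FRONT = REAR = -1;   // the queue became empty
    else if (FRONT == N - 1) FRONT = 0;     // wrap around
    else FRONT = FRONT + 1;
    return item;
}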

Double Ended Queue


• It is exactly like a queue except that elements can be added to or removed from
the head or the tail.

• No element can be added or deleted from the middle.
• Implemented using either a circular array or a circular doubly linked list.
• In a deque, two pointers are maintained, LEFT and RIGHT, which point to either end of
the deque.
• The elements in a deque extend from the LEFT end to the RIGHT end and since it is
circular, Deque[N–1] is followed by Deque[0].
• Two types:
o Input restricted deque In this, insertions can be done only at one of the ends,
while deletions can be done from both ends.
o Output restricted deque In this deletions can be done only at one of the ends,
while insertions can be done on both ends.

Priority Queue
• A priority queue is a data structure in which each element is assigned a priority.
• The priority of the element will be used to determine the order in which the elements
will be processed.
• An element with higher priority is processed before an element with a lower priority.
• Two elements with the same priority are processed on a first-come-first-served (FCFS)
basis.
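
• A minimal C++ usage sketch with std::priority_queue, storing (priority, value) pairs; note that the standard container does not by itself guarantee FCFS among equal priorities:

#include <iostream>
#include <queue>
#include <utility>
using namespace std;

int main() {
    priority_queue<pair<int, char>> pq;   // largest priority is served first
    pq.push({2, 'B'});
    pq.push({5, 'A'});
    pq.push({1, 'C'});
    while (!pq.empty()) {
        cout << pq.top().second << ' ';   // prints: A B C
        pq.pop();
    }
}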

Tree
• Root: node without parent (A)
• Siblings: nodes share the same parent
• Internal node: node with at least one child (A, B, C, F)
• External node (leaf): node without children (E, I, J, K, G, H, D)
• Ancestors of a node: parent, grandparent, grand-grandparent, etc.
• Descendant of a node: child, grandchild, grand-grandchild, etc.
• Depth of a node: number of ancestors
• Height of a tree: maximum depth of any node (3)
• Degree of a node: the number of its children. The leaf of the tree does not have any
child so its degree is zero
• Degree of a tree: the maximum degree of a node in the tree.
• Subtree: tree consisting of a node and its descendants
• Empty (Null)-tree: a tree without any node
• Root-tree: a tree with only one node


Binary Tree

• It is a data structure that is defined as a collection of elements called nodes.

• In a binary tree,

o The topmost element is called the root node.


o Each node has 0, 1, or at the most 2 children.
o A node that has zero children is called a leaf node or a terminal node.
o Every node contains a data element, a left pointer which points to the left child,
and a right pointer which points to the right child

• Complete binary tree - every level except possibly the last is completely filled.
All nodes must appear as far left as possible.

• Linked list implementation of binary tree:

o Every node will have three parts: the data element, a pointer to the left
node, and a pointer to the right node.

o class Node {
o public:
o Node *left;
o int data;
o Node *right;
};
o Every binary tree has a pointer ROOT, which points to the root element
(topmost element) of the tree. If ROOT = NULL, then the tree is empty.

• Array implementation of binary tree:

o If TREE[1] = ROOT then


▪ the left child of a node K ==> 2*K
▪ the right child of a node K ==> 2*K+1
▪ parent of any node K ==> floor(K/2)
▪ max number of nodes in the tree is 2^(h+1)-1, where h = height
▪ P.S. floor(3/2) = 1, so the parent of node 3 is node 1
o If TREE[0] = ROOT then
▪ the left child of a node K ==> 2*K+1
▪ the right child of a node K ==> 2*K+2
▪ parent of any node K ==> floor((K-1)/2)

• Algebraic expressions with binary tree

o ((a + b) – (c * d)) % ((f ^ g) / (h – i))

Traversing a Binary Tree


• PREORDER (NLR), POSTORDER (LRN) & INORDER TRAVERSAL (LNR)
• Preorder traversal can be used to extract a prefix notation


• PREORDER TRAVERSAL (NLR)

i. Visiting the root node,

ii. Traversing the left sub-tree, and finally

iii. Traversing the right sub-tree.

iv. Example outputs with preorder:


v. (a) A, B, D, G, H, L, E, C, F, I, J, K
vi. (b) A, B, D, C, E, F, G, H, I
• POSTORDER TRAVERSAL (LRN)

i. Traversing the left sub-tree,

ii. Traversing the right sub-tree, and finally

iii. Visiting the root node.

iv. Example outputs with postorder:


v. (a) G, L, H, D, E, B, I, K, J, F, C, A
vi. (b) D, B, H, I, G, F, E, C, A
• INORDER TRAVERSAL (LNR)

i. Traversing the left sub-tree,

ii. Visiting the root node, and finally

iii. Traversing the right sub-tree.

iv. Example outputs with inorder:


v. (a) G, D, H, L, B, E, A, C, I, F, K, J
vi. (b) B, D, A, E, H, G, I, F, C
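
• A minimal C++ sketch of the three traversals over the linked representation shown earlier:

#include <iostream>
using namespace std;

struct Node { Node *left; int data; Node *right; };

void preorder(Node *root) {       // NLR
    if (root == NULL) return;
    cout << root->data << ' ';
    preorder(root->left);
    preorder(root->right);
}

void inorder(Node *root) {        // LNR
    if (root == NULL) return;
    inorder(root->left);
    cout << root->data << ' ';
    inorder(root->right);
}

void postorder(Node *root) {      // LRN
    if (root == NULL) return;
    postorder(root->left);
    postorder(root->right);
    cout << root->data << ' ';
}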

Binary Search Tree
• A binary search tree, also known as an ordered binary tree, is a variant of binary trees
in which the nodes are arranged in an order.
• Left sub-tree nodes must have a value less than that of the root node.
• Right sub-tree nodes must have a value either equal to or greater than the root node.
• O(n) worst case for searching in BST

Search & Insert Operation in Binary Search Tree


• Insert 39,27,45,18,29,40,9,21,10,19,54,59,65,60 in binary search tree
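
• A minimal C++ sketch of search and insert (equal keys are sent to the right sub-tree, matching the definition above; the Node struct is an assumption):

struct Node { int data; Node *left, *right; };

Node *insert(Node *root, int val) {
    if (root == NULL) return new Node{val, NULL, NULL};
    if (val < root->data) root->left = insert(root->left, val);
    else root->right = insert(root->right, val);
    return root;
}

bool search(Node *root, int val) {
    if (root == NULL) return false;
    if (root->data == val) return true;
    if (val < root->data) return search(root->left, val);
    return search(root->right, val);
}

// Usage: Node *root = NULL;
// for (int v : {39,27,45,18,29,40,9,21,10,19,54,59,65,60}) root = insert(root, v);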

Deletion Operation in Binary Search Tree
• Deleting a Node that has no children, delete 78

• Deleting a Node with One Child, delete 54

• Deleting a Node with Two Children, delete 56

• Main algorithm:
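
A minimal sketch of the standard approach (the two-children case replaces the node with its inorder successor), assuming the Node struct from the sketch above:

Node *findMin(Node *root) {               // leftmost node of a sub-tree
    while (root->left != NULL) root = root->left;
    return root;
}

Node *removeNode(Node *root, int val) {
    if (root == NULL) return NULL;
    if (val < root->data) root->left = removeNode(root->left, val);
    else if (val > root->data) root->right = removeNode(root->right, val);
    else {
        if (root->left == NULL) {          // no child, or only a right child
            Node *r = root->right; delete root; return r;
        }
        if (root->right == NULL) {         // only a left child
            Node *l = root->left; delete root; return l;
        }
        Node *succ = findMin(root->right); // two children: inorder successor
        root->data = succ->data;
        root->right = removeNode(root->right, succ->data);
    }
    return root;
}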

Graphs
• Vertices (nodes), edges (lines between vertices), undirected graph, directed
graph

• Adjacent nodes and neighbors:

• O----O adjacent nodes

• Degree of a node - the total number of edges containing the node. If deg(u) = 0,
then u is an isolated node.

• Size of a graph - The size of a graph is the total number of edges in it.

• Regular graph - It is a graph where each vertex has the same number of
neighbors. That is, every node has the same degree.

• Connected graph - A graph is said to be connected if for any two vertices (u,
v) in V there is a path from u to v. That is to say that there are no isolated
nodes in a connected graph.

• Complete graph - Fully connected. That is, there is an edge between every pair of
nodes in the graph. A complete graph has n(n–1)/2 edges, where n
is the number of nodes in G.

• Weighted graph - In a weighted graph, the edges of the graph are assigned
some weight or length.

• Multi-graph - A graph with multiple edges and/or loops is called a multi-
graph.

• Directed Graphs - digraph, a graph in which every edge has a direction


assigned to it.

• Terminology of a Directed graph:

o Out-degree of a node - The out-degree of a node u, written as outdeg(u), is the


number of edges that originate at u.
o In-degree of a node - The in-degree of a node u, written as indeg(u), is the
number of edges that terminate at u.
o Degree of a node - The degree of a node, written as deg(u), is equal to the sum
of in-degree and out-degree of that node. Therefore, deg(u) = indeg(u) +
outdeg(u).
o Isolated vertex - A vertex with degree zero. Such a vertex is not an end-point of
any edge.
o Pendant vertex - (also known as leaf vertex) A vertex with degree one.

• REPRESENTATION OF GRAPHS. Sequential (adjacency matrix) & linked rep-s.


Breadth First Search Traversal


• There are two standard methods of graph traversal:
i. Breadth-first search (uses queue)
ii. Depth-first search (uses stack)
• Breadth-first search. Complexity = O(vertices + edges), finding the shortest path on
unweighted graphs.
• BFS starts at some arbitrary node of a graph and explores the neighbour nodes first,
before moving to the next level neighbours.
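
• A minimal C++ sketch of BFS over an adjacency list (the vector-of-vectors representation is an assumption):

#include <iostream>
#include <vector>
#include <queue>
using namespace std;

void bfs(const vector<vector<int>> &adj, int start) {
    vector<bool> visited(adj.size(), false);
    queue<int> q;
    visited[start] = true;
    q.push(start);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        cout << u << ' ';                 // process the current node
        for (int v : adj[u])              // explore all neighbours first
            if (!visited[v]) { visited[v] = true; q.push(v); }
    }
}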


Depth First Search


• Complexity = O(vertices + edges)
• Make sure you don't re-visit visited nodes! Continue from the previous node!
• Backtrack when a dead end is reached! That means don't take a node which has no
unvisited neighbours.


• Choose an arbitrary node and PUSH it onto the stack (STATUS 2). Then repeatedly
POP a node (STATUS 3) and PUSH its unvisited neighbours.
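
• A minimal C++ sketch of the stack-based DFS described above:

#include <iostream>
#include <vector>
#include <stack>
using namespace std;

void dfs(const vector<vector<int>> &adj, int start) {
    vector<bool> visited(adj.size(), false);
    stack<int> st;
    st.push(start);                       // PUSH the arbitrary start node
    while (!st.empty()) {
        int u = st.top(); st.pop();       // POP, then PUSH its neighbours
        if (visited[u]) continue;         // don't re-visit visited nodes
        visited[u] = true;
        cout << u << ' ';
        for (int v : adj[u])
            if (!visited[v]) st.push(v);
    }
}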

Threaded Binary Tree


• According to this idea, we replace all the null pointers by appropriate
pointer values called threads.
• The maximum number of nodes of a binary tree with height h is 2^(h+1)-1.
• If n0 is the number of leaf nodes and n2 the number of nodes of degree 2, then n0 = n2+1.

Inorder Traversal in TBT

• A / B * C * D + E

• n: number of nodes
• number of non-null links: n-1
• total links: 2n
• null links: 2n-(n-1)=n+1
• Replace these null pointers with some useful “threads”.
• A one-way threading and a two-way threading exist.

Threaded Binary Tree One-Way


• In the one way threading of T, a thread will appear in the right field of a node and will
point to the next node in the in-order traversal of T.


Threaded Binary Tree Two-Way


• If ptr->left_child is null, replace it with a pointer to the node that would be visited
before ptr in an inorder traversal (inorder predecessor)
• If ptr->right_child is null, replace it with a pointer to the node that would be visited
after ptr in an inorder traversal (inorder successor)


• class Node {
• int data;
• Node *left_child, *right_child;
• bool leftThread, rightThread;
• };
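
• A minimal sketch of inorder traversal over a two-way threaded tree, assuming the leftmost node's left thread and the rightmost node's right thread are NULL:

#include <iostream>
using namespace std;

struct TNode {
    int data;
    TNode *left_child, *right_child;
    bool leftThread, rightThread;     // true = the pointer is a thread
};

TNode *inorderSuccessor(TNode *ptr) {
    if (ptr->rightThread) return ptr->right_child;    // follow the thread
    ptr = ptr->right_child;                           // otherwise: leftmost
    while (!ptr->leftThread) ptr = ptr->left_child;   // node of right sub-tree
    return ptr;
}

void inorderTBT(TNode *root) {        // no stack and no recursion needed
    if (root == NULL) return;
    TNode *ptr = root;
    while (!ptr->leftThread) ptr = ptr->left_child;   // start at leftmost node
    while (ptr != NULL) {
        cout << ptr->data << ' ';
        ptr = inorderSuccessor(ptr);
    }
}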


Inserting Node in TBT


• Inserting in the right side

• Inserting in the left side

AVL Trees
• Adelson-Velsky-Landis tree - one of many types of balanced binary search tree.
Operations run in O(log(n)).
• Balance Factor (BF): BF(node) = HEIGHT(left sub-tree) - HEIGHT(right sub-tree)
• HEIGHT(x) is the height of node x, i.e., the number of edges between x and
the furthest leaf.
• A node is balanced when its balance factor is -1, 0, or +1.
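
• A minimal C++ sketch of the balance factor and the two single rotations (heights counted in edges, so an empty sub-tree has height -1; LR and RL are combinations of these two):

#include <algorithm>
using namespace std;

struct ANode { int data, height; ANode *left, *right; };

int height(ANode *n) { return n ? n->height : -1; }
int bf(ANode *n) { return height(n->left) - height(n->right); }

ANode *rotateRight(ANode *y) {       // fixes an LL-heavy sub-tree
    ANode *x = y->left;
    y->left = x->right;
    x->right = y;
    y->height = 1 + max(height(y->left), height(y->right));
    x->height = 1 + max(height(x->left), height(x->right));
    return x;                        // x is the new sub-tree root
}

ANode *rotateLeft(ANode *x) {        // fixes an RR-heavy sub-tree (mirror image)
    ANode *y = x->right;
    x->right = y->left;
    y->left = x;
    x->height = 1 + max(height(x->left), height(x->right));
    y->height = 1 + max(height(y->left), height(y->right));
    return y;
}
// LR case: node->left = rotateLeft(node->left); node = rotateRight(node);
// RL case: node->right = rotateRight(node->right); node = rotateLeft(node);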

Insertion in AVL Tree



• Examples:



Deletion in AVL Tree


• Rebalance if needed after deletion: L rotation & R rotation
• R rotations
o R0 -> LL Case
o R1 -> LL case
o R-1 -> LR case
• L rotations
o L0 -> RR Case
o L1 -> RL Case
o L-1 -> RR Case
• Example R0:

• Example R1:

• Example R-1:

Huffman Encoding
• Fixed-Length encoding

• Variable-Length encoding

• Prefix rule - used to prevent ambiguities during decoding which states that no
binary code should be a prefix of another code.

o Bad:  a = 0, b = 011, c = 111, d = 11   (11 is a prefix of 111)
o Good: a = 0, b = 11, c = 101, d = 100

• Algorithm for creating the Huffman Tree:

o Step 1- Create a leaf node for each character and build a min heap using all the
nodes (The frequency value is used to compare two nodes in min heap)
o Step 2- Repeat Steps 3 to 5 while heap has more than one node
o Step 3- Extract two nodes, say x and y, with minimum frequency from the heap
o Step 4- Create a new internal node z with x as its left child and y as its right
child. Also frequency(z)= frequency(x)+frequency(y)
o Step 5- Add z to min heap

o Step 6- Last node in the heap is the root of Huffman tree
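
• A minimal C++ sketch of Steps 1-6 using std::priority_queue as the min heap ('\0' marks internal nodes; left edge = 0, right edge = 1 is an assumption):

#include <iostream>
#include <queue>
#include <vector>
#include <string>
using namespace std;

struct HNode { char ch; int freq; HNode *left, *right; };

struct Cmp {                                  // min heap ordered by frequency
    bool operator()(HNode *a, HNode *b) const { return a->freq > b->freq; }
};

HNode *buildHuffman(const vector<pair<char,int>> &freqs) {
    priority_queue<HNode*, vector<HNode*>, Cmp> heap;
    for (auto &p : freqs)                     // Step 1: one leaf per character
        heap.push(new HNode{p.first, p.second, NULL, NULL});
    while (heap.size() > 1) {                 // Steps 2-5
        HNode *x = heap.top(); heap.pop();    // two minimum-frequency nodes
        HNode *y = heap.top(); heap.pop();
        heap.push(new HNode{'\0', x->freq + y->freq, x, y});
    }
    return heap.top();                        // Step 6: root of the Huffman tree
}

void printCodes(HNode *n, string code) {
    if (!n->left && !n->right) { cout << n->ch << ": " << code << "\n"; return; }
    printCodes(n->left, code + "0");
    printCodes(n->right, code + "1");
}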

M-way trees
• A binary search tree is the special case of an M-way search tree with M = 2.
• Each node has m children and m-1 key fields. The keys in each node are in ascending
order.
• A binary search tree has one value in each node and two subtrees. This notion easily
generalizes to an M-way search tree, which has (M-1) values per node and M subtrees.
• M is called the degree of the tree. A binary search tree, therefore, has degree 2.
• M is thus a fixed upper limit on how much data can be stored in a node.

B-Trees
Multiway Trees

A multiway tree is a tree that can have more than two children. A multiway tree of
order m (or an m-way tree) is one in which a tree can have m children.

As with the other trees that have been studied, the nodes in an m-way tree will be
made up of key fields, in this case m-1 key fields, and pointers to children.

multiway tree of order 5

To make the processing of m-way trees easier some type of order will be imposed
on the keys within each node, resulting in a multiway search tree of order m ( or
an m-way search tree). By definition an m-way search tree is a m-way tree in
which:

• Each node has m children and m-1 key fields


• The keys in each node are in ascending order.
• The keys in the first i children are smaller than the ith key
• The keys in the last m-i children are larger than the ith key

4-way search tree

M-way search trees give the same advantages to m-way trees that binary search trees
gave to binary trees - they provide fast information retrieval and update. However,
they also have the same problems that binary search trees had - they can become
unbalanced, which means that the construction of the tree becomes of vital
importance.

B-Trees
An extension of a multiway search tree of order m is a B-tree of order m. This type
of tree will be used when the data to be accessed/stored is located on secondary
storage devices because they allow for large amounts of data to be stored in a node.

A B-tree of order m is a multiway search tree in which:

1. The root has at least two subtrees unless it is the only node in the tree.
2. Each nonroot and each nonleaf node have at most m nonempty children and
at least m/2 nonempty children.
3. The number of keys in each nonroot and each nonleaf node is one less than
the number of its nonempty children.
4. All leaves are on the same level.

These restrictions make B-trees always at least half full, have few levels, and remain
perfectly balanced.

The nodes in a B-tree are usually implemented as a class that contains an array of
m-1 cells for keys, an array of m pointers to other nodes, and whatever other
information is required in order to facilitate tree maintenance.
template <class T, int M>
class BTreeNode
{
public:
BTreeNode();
BTreeNode( const T & );

private:
T keys[M-1];
BTreeNode *pointers[M];
...
};

Searching a B-tree

An algorithm for finding a key in B-tree is simple. Start at the root and determine
which pointer to follow based on a comparison between the search value and key
fields in the root node. Follow the appropriate pointer to a child node. Examine the
key fields in the child node and continue to follow the appropriate pointers until the

search value is found or a leaf node is reached that doesn't contain the desired search
value.

Insertion into a B-tree

The condition that all leaves must be on the same level forces a characteristic
behavior of B-trees, namely that B-trees are not allowed to grow at their leaves;
instead they are forced to grow at the root.

When inserting into a B-tree, a value is inserted directly into a leaf. This leads to
three common situations that can occur:

1. A key is placed into a leaf that still has room.


2. The leaf in which a key is to be placed is full.
3. The root of the B-tree is full.

Case 1: A key is placed into a leaf that still has room

This is the easiest of the cases to solve because the value is simply inserted into the
correct sorted position in the leaf node.

Inserting the number 7 results in:

Case 2: The leaf in which a key is to be placed is full

In this case, the leaf node where the value should be inserted is split in two, resulting
in a new leaf node. Half of the keys will be moved from the full leaf to the new leaf.
The new leaf is then incorporated into the B-tree.

The new leaf is incorporated by moving the middle value to the parent and a pointer
to the new leaf is also added to the parent. This process continues up the tree until
all of the values have "found" a location.

Insert 6 into the following B-tree:

results in a split of the first leaf node:

The new node needs to be incorporated into the tree - this is accomplished by taking
the middle value and inserting it in the parent:

Case 3: The root of the B-tree is full

The upward movement of values from case 2 means that it's possible that a value
could move up to the root of the B-tree. If the root is full, the same basic process

from case 2 will be applied and a new root will be created. This type of split results
in 2 new nodes being added to the B-tree.

Inserting 13 into the following tree:

Results in:

The 15 needs to be moved to the root node but it is full. This means that the root
needs to be divided:

The 15 is inserted into the parent, which means that it becomes the new root node:

Deleting from a B-tree

As usual, this is the hardest of the processes to apply. The deletion process will
basically be a reversal of the insertion process - rather than splitting nodes, it's
possible that nodes will be merged so that B-tree properties, namely the requirement
that a node must be at least half full, can be maintained.

There are two main cases to be considered:

1. Deletion from a leaf


2. Deletion from a non-leaf

Case 1: Deletion from a leaf

1a) If the leaf is at least half full after deleting the desired value, the remaining larger
values are moved to "fill the gap".

Deleting 6 from the following tree:

results in:

1b) If the leaf is less than half full after deleting the desired value (known as
underflow), two things could happen:

Deleting 7 from the tree above results in:

1b-1) If there is a left or right sibling with the number of keys exceeding the
minimum requirement, all of the keys from the leaf and sibling will be redistributed
between them by moving the separator key from the parent to the leaf and moving
the middle key from the node and the sibling combined to the parent.

Now delete 8 from the tree:

1b-2) If the number of keys in the sibling does not exceed the minimum requirement,
then the leaf and sibling are merged by putting the keys from the leaf, the sibling,
and the separator from the parent into the leaf. The sibling node is discarded and the
keys in the parent are moved to "fill the gap". It's possible that this will cause the
parent to underflow. If that is the case, treat the parent as a leaf and continue
repeating step 1b-2 until the minimum requirement is met or the root of the tree is
reached.

Special Case for 1b-2: When merging nodes, if the parent is the root with only one
key, the keys from the node, the sibling, and the only key of the root are placed into
a node and this will become the new root for the B-tree. Both the sibling and the old
root will be discarded.

Case 2: Deletion from a non-leaf

This case can lead to problems with tree reorganization but it will be solved in a
manner similar to deletion from a binary search tree.

The key to be deleted will be replaced by its immediate predecessor (or successor)
and then the predecessor (or successor) will be deleted since it can only be found in
a leaf node.

Deleting 16 from the tree above results in:

The "gap" is filled in with the immediate predecessor:

and then the immediate predecessor is deleted:

If the immediate successor had been chosen as the replacement:

Deleting the successor results in:

The values in the left sibling are combined with the separator key (18) and the
remaining values, and they are divided between the two nodes:

and then the middle value is moved to the parent:

• Every node in a B-Tree contains at most m children. (other nodes beside root & leaf
must have at least m/2 children)
• All leaf nodes must be at the same level.
• Inserting

o Find the appropriate leaf node
o If the leaf node contain less than m-1 keys then insert the element in the
increasing order.
o Else if the leaf already contains m-1 keys:
▪ Insert the new element in the increasing order of elements.
▪ Split the node into the two nodes at the median.
▪ Push the median element upto its parent node.
▪ If the parent node also contains m-1 keys, then split it too by
following the same steps.

Introduction to Hashing – Data Structure and Algorithm Tutorials

Hashing refers to the process of generating a fixed-size output from an input of
variable size using the mathematical formulas known as hash functions. This
technique determines an index or location for the storage of an item in a data
structure.

What is Hashing

Table of Contents/Roadmap
• What is Hashing

• Need for Hash data structure
• Components of Hashing
• How does Hashing work?
• What is a Hash function?
• Types of Hash functions:
• Properties of a Good hash function:
• Complexity of calculating hash value using the hash function
• Problem with Hashing:
• What is collision?
• How to handle Collisions?
• 1) Separate Chaining:
• 2) Open Addressing:
• 2.a) Linear Probing:
• 2.b) Quadratic Probing:
• 2.c) Double Hashing:
• What is meant by Load Factor in Hashing?
• What is Rehashing?
• Applications of Hash Data structure
• Real-Time Applications of Hash Data structure
• Advantages of Hash Data structure
• Disadvantages of Hash Data structure
• Conclusion
Need for Hash data structure
Every day, the data on the internet is increasing multifold and it is always a
struggle to store this data efficiently. In day-to-day programming, this amount of
data might not be that big, but still, it needs to be stored, accessed, and processed
easily and efficiently. A very common data structure that is used for such a
purpose is the Array data structure.
Now the question arises if Array was already there, what was the need for a new
data structure! The answer to this is in the word “efficiency“. Though storing in
Array takes O(1) time, searching in it takes at least O(log n) time. This time
appears to be small, but for a large data set, it can cause a lot of problems and
this, in turn, makes the Array data structure inefficient.
So now we are looking for a data structure that can store the data and search in it
in constant time, i.e. in O(1) time. This is how Hashing data structure came into
play. With the introduction of the Hash data structure, it is now possible to easily
store data in constant time and retrieve them in constant time as well.
Components of Hashing
There are majorly three components of hashing:
1. Key: A key can be anything, a string or an integer, which is fed as input to the
hash function, the technique that determines an index or location for storage
of an item in a data structure.
2. Hash Function: The hash function receives the input key and returns the
index of an element in an array called a hash table. The index is known as
the hash index.
3. Hash Table: Hash table is a data structure that maps keys to values using a
special function called a hash function. Hash stores the data in an associative
manner in an array where each data value has its own unique index.

Components of Hashing

How does Hashing work?


Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would like to store it in
a table.
Our main objective here is to search or update the values stored in the table
quickly in O(1) time and we are not concerned about the ordering of strings in
the table. So the given set of strings can act as a key and the string itself will act as
the value of the string but how to store the value corresponding to the key?
• Step 1: We know that hash functions (which is some mathematical formula)
are used to calculate the hash value which acts as the index of the data
structure where the value will be stored.
• Step 2: So, let’s assign
• “a” = 1,
• “b”=2, .. etc, to all alphabetical characters.
• Step 3: Therefore, the numerical value by summation of all characters of the
string:
• “ab” = 1 + 2 = 3,
• “cd” = 3 + 4 = 7 ,
• “efg” = 5 + 6 + 7 = 18
• Step 4: Now, assume that we have a table of size 7 to store these strings. The
hash function that is used here is the sum of the characters in key mod Table
size. We can compute the location of the string in the array by taking
the sum(string) mod 7.
• Step 5: So we will then store
• “ab” in 3 mod 7 = 3,

• “cd” in 7 mod 7 = 0, and
• “efg” in 18 mod 7 = 4.
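
A minimal C++ sketch of this toy hash function (lower-case keys assumed):

#include <iostream>
#include <string>
using namespace std;

int hashString(const string &key, int tableSize) {
    int sum = 0;
    for (char c : key) sum += c - 'a' + 1;   // "a" = 1, "b" = 2, ...
    return sum % tableSize;                  // sum(string) mod table size
}

int main() {
    cout << hashString("ab", 7) << "\n";     // 3 mod 7 = 3
    cout << hashString("cd", 7) << "\n";     // 7 mod 7 = 0
    cout << hashString("efg", 7) << "\n";    // 18 mod 7 = 4
}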

Mapping key with indices of array

The above technique enables us to calculate the location of a given string by


using a simple hash function and rapidly find the value that is stored in that
location. Therefore the idea of hashing seems like a great way to store (key,
value) pairs of the data in a table.
What is a Hash function?
The hash function creates a mapping between key and value, this is done through
the use of mathematical formulas known as hash functions. The result of the hash
function is referred to as a hash value or hash. The hash value is a representation
of the original string of characters but usually smaller than the original.
For example: Consider an array as a Map where the key is the index and the value
is the value at that index. So for an array A if we have index i which will be
treated as the key then we can find the value by simply looking at the value at
A[i].
Types of Hash functions:
There are many hash functions that use numeric or alphanumeric keys. This
article focuses on discussing different hash functions:
1. Division Method.
2. Mid Square Method.
3. Folding Method.
4. Multiplication Method
Properties of a Good hash function
A hash function that maps every item into its own unique slot is known as a
perfect hash function. We can construct a perfect hash function if we know the
items and the collection will never change but the problem is that there is no
systematic way to construct a perfect hash function given an arbitrary collection
of items. Fortunately, we will still gain performance efficiency even if the hash
function isn’t perfect. We can achieve a perfect hash function by increasing the
size of the hash table so that every possible value can be accommodated. As a
result, each item will have a unique slot. Although this approach is feasible for a
small number of items, it is not practical when the number of possibilities is
large.
So, We can construct our hash function to do the same but the things that we
must be careful about while constructing our own hash function.
A good hash function should have the following properties:
1. Efficiently computable.

2. Should uniformly distribute the keys (each table position is equally likely for
each key).
3. Should minimize collisions.
4. Should have a low load factor(number of items in the table divided by the size
of the table).
Complexity of calculating hash value using the hash function
• Time complexity: O(n)
• Space complexity: O(1)
Problem with Hashing
If we consider the above example, the hash function we used is the sum of the
letters, but if we examined the hash function closely then the problem can be
easily visualized that for different strings same hash value is begin generated by
the hash function.
For example: {“ab”, “ba”} both have the same hash value, and string {“cd”,”be”}
also generate the same hash value, etc. This is known as collision and it creates
problem in searching, insertion, deletion, and updating of value.
What is collision?
The hashing process generates a small number for a big key, so there is a
possibility that two keys could produce the same value. The situation where a
newly inserted key maps to an already occupied slot is called a collision, and it
must be handled using some collision handling technique.

What is Collision in Hashing

How to handle Collisions?


There are mainly two methods to handle collision:
1. Separate Chaining:
2. Open Addressing:

Collision resolution technique

1) Separate Chaining
The idea is to make each cell of the hash table point to a linked list of records that
have the same hash function value. Chaining is simple but requires additional
memory outside the table.
Example: We have given a hash function and we have to insert some elements in
the hash table using a separate chaining method for collision resolution
technique.
Hash function = key % 5,
Elements = 12, 15, 22, 25 and 37.
Let’s see step by step approach to how to solve the above problem:
• Step 1: First draw the empty hash table which will have a possible range of
hash values from 0 to 4 according to the hash function provided.

Hash table

• Step 2: Now insert all the keys in the hash table one by one. The first key to be
inserted is 12 which is mapped to bucket number 2 which is calculated by
using the hash function 12%5=2.

Insert 12

• Step 3: Now the next key is 22. It will map to bucket number 2 because
22%5=2. But bucket 2 is already occupied by key 12.

Insert 22

• Step 4: The next key is 15. It will map to slot number 0 because 15%5=0.

Insert 15

• Step 5: Now the next key is 25. Its bucket number will be 25%5=0. But bucket
0 is already occupied by key 15. So the separate chaining method will again
handle the collision by creating a linked list at bucket 0.

Insert 25

In this way, the separate chaining method is used as the collision
resolution technique.
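
A minimal C++ sketch of separate chaining with std::list buckets (the class and method names are assumptions):

#include <iostream>
#include <list>
#include <vector>
using namespace std;

struct ChainedHash {
    vector<list<int>> table;
    ChainedHash(int size) : table(size) {}
    int hashOf(int key) { return key % (int)table.size(); }
    void insert(int key) { table[hashOf(key)].push_back(key); }  // chain at the bucket
    bool search(int key) {
        for (int k : table[hashOf(key)])
            if (k == key) return true;                           // walk the chain
        return false;
    }
};

int main() {
    ChainedHash h(5);                        // hash function = key % 5
    for (int k : {12, 15, 22, 25, 37}) h.insert(k);
    cout << h.search(25) << "\n";            // 1: found in the chain of bucket 0
}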
2) Open Addressing
In open addressing, all elements are stored in the hash table itself. Each table
entry contains either a record or NIL. When searching for an element, we
examine the table slots one by one until the desired element is found or it is clear
that the element is not in the table.
2.a) Linear Probing
In linear probing, the hash table is searched sequentially that starts from the
original location of the hash. If in case the location that we get is already
occupied, then we check for the next location.
Algorithm:
1. Calculate the hash key. i.e. key = data % size
2. Check, if hashTable[key] is empty
• store the value directly by hashTable[key] = data
3. If the hash index already has some value then
• check for next index using key = (key+1) % size
4. Check, if the next index is available hashTable[key] then store the value.
Otherwise try for next index.
5. Do the above process till we find the space.
Example: Let us consider a simple hash function as “key mod 5” and a sequence
of keys that are to be inserted: 50, 70, 76, 93.
• Step 1: First draw the empty hash table which will have a possible range of
hash values from 0 to 4 according to the hash function provided.

Hash table

• Step 2: Now insert all the keys in the hash table one by one. The first key is
50. It will map to slot number 0 because 50%5=0. So insert it into slot number
0.

Insert 50 into hash table

• Step 3: The next key is 70. It will map to slot number 0 because 70%5=0 but
50 is already at slot number 0 so, search for the next empty slot and insert it.

Insert 70 into hash table

• Step 4: The next key is 76. It will map to slot number 1 because 76%5=1 but
70 is already at slot number 1 so, search for the next empty slot and insert it.

Insert 76 into hash table

• Step 5: The next key is 93. It will map to slot number 3 because 93%5=3, so
insert it into slot number 3.

Insert 93 into hash table
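
A minimal C++ sketch of this insertion, assuming a table of SIZE 5 with an occupancy flag per slot:

#include <iostream>
using namespace std;

const int SIZE = 5;
int table_[SIZE];
bool used[SIZE] = {false};

void insertLinear(int data) {
    int key = data % SIZE;                  // 1. calculate the hash key
    int probes = 0;
    while (used[key] && probes < SIZE) {    // 3. slot taken: try the next index
        key = (key + 1) % SIZE;
        probes++;
    }
    if (probes == SIZE) { cout << "Table full\n"; return; }
    table_[key] = data;                     // 4. store in the free slot found
    used[key] = true;
}

int main() {
    for (int k : {50, 70, 76, 93}) insertLinear(k);  // fills slots 0, 1, 2, 3
}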

2.b) Quadratic Probing


Quadratic probing is an open addressing scheme in computer programming for
resolving hash collisions in hash tables. Quadratic probing operates by taking the
original hash index and adding successive values of an arbitrary quadratic
polynomial until an open slot is found.
An example sequence using quadratic probing is:
H + 1², H + 2², H + 3², H + 4², …, H + k²
In this method we look for the i²-th probe (slot) in the i-th iteration, where i = 0,
1, . . ., n – 1. We always start from the original hash location. If the location is
occupied then we check the other slots.
Let hash(x) be the slot index computed using the hash function and n be the size
of the hash table.
If the slot hash(x) % n is full, then we try (hash(x) + 1²) % n.
If (hash(x) + 1²) % n is also full, then we try (hash(x) + 2²) % n.
If (hash(x) + 2²) % n is also full, then we try (hash(x) + 3²) % n.
This process will be repeated for all the values of i until an empty slot is found
Example: Let us consider table Size = 7, hash function as Hash(x) = x % 7 and
collision resolution strategy to be f(i) = i². Insert = 22, 30, and 50.
• Step 1: Create a table of size 7.

Hash table

• Step 2 – Insert 22 and 30


• Hash(22) = 22 % 7 = 1, Since the cell at index 1 is empty, we can
easily insert 22 at slot 1.
• Hash(30) = 30 % 7 = 2, Since the cell at index 2 is empty, we can
easily insert 30 at slot 2.

Insert key 22 and 30 in the hash table

• Step 3: Inserting 50
• Hash(50) = 50 % 7 = 1
• In our hash table slot 1 is already occupied. So, we will search for
slot 1+1², i.e. 1+1 = 2,
• Again slot 2 is found occupied, so we will search for cell 1+2², i.e. 1+4
= 5,
• Now, cell 5 is not occupied so we will place 50 in slot 5.

Insert key 50 in the hash table

2.c) Double Hashing


Double hashing is a collision resolving technique in open-addressed hash tables.
Double hashing makes use of two hash functions:
• The first hash function is h1(k), which takes the key and gives out a location
on the hash table. If that location is empty, we can simply place our key
there.
• But in case the location is occupied (collision) we will use secondary hash-
function h2(k) in combination with the first hash-function h1(k) to find the
new location on the hash table.
This combination of hash functions is of the form
h(k, i) = (h1(k) + i * h2(k)) % n
where
• i is a non-negative integer that indicates a collision number,
• k = element/key which is being hashed
• n = hash table size.
Complexity of the Double hashing algorithm:
Time complexity: O(n)

Example: Insert the keys 27, 43, 692, 72 into the Hash Table of size 7. where first
hash-function is h1(k) = k mod 7 and second hash-function is h2(k) = 1 + (k
mod 5)
• Step 1: Insert 27
• 27 % 7 = 6, location 6 is empty so insert 27 into 6 slot.

Insert key 27 in the hash table

• Step 2: Insert 43
• 43 % 7 = 1, location 1 is empty so insert 43 into 1 slot.

Insert key 43 in the hash table

• Step 3: Insert 692
• 692 % 7 = 6, but location 6 is already being occupied and this is a
collision
• So we need to resolve this collision using double hashing.
hnew = [h1(692) + i * h2(692)] % 7
= [6 + 1 * (1 + 692 % 5)] % 7
= 9 % 7
= 2

Now, as 2 is an empty slot,


so we can insert 692 into 2nd slot.

Insert key 692 in the hash table

• Step 4: Insert 72
• 72 % 7 = 2, but location 2 is already being occupied and this is a
collision.
• So we need to resolve this collision using double hashing.
hnew = [h1(72) + i * h2(72)] % 7
= [2 + 1 * (1 + 72 % 5)] % 7
= 5 % 7
= 5,

Now, as 5 is an empty slot,


so we can insert 72 into 5th slot.

Insert key 72 in the hash table

What is meant by Load Factor in Hashing?


The load factor of the hash table can be defined as the number of items the hash
table contains divided by the size of the hash table. Load factor is the decisive
parameter that is used when we want to rehash the previous hash function or
want to add more elements to the existing hash table.
It helps us in determining the efficiency of the hash function i.e. it tells whether
the hash function which we are using is distributing the keys uniformly or not in
the hash table.
Load Factor = Total elements in hash table/ Size of hash table
What is Rehashing?
As the name suggests, rehashing means hashing again. Basically, when the load
factor increases to more than its predefined value (the default value of the load
factor is 0.75), the complexity increases. So to overcome this, the size of the array
is increased (doubled) and all the values are hashed again and stored in the new
double-sized array to maintain a low load factor and low complexity.
Applications of Hash Data structure
• Hash is used in databases for indexing.
• Hash is used in disk-based data structures.
• In some programming languages like Python, JavaScript hash is used to
implement objects.
Real-Time Applications of Hash Data structure
• Hash is used for cache mapping for fast access to the data.
• Hash can be used for password verification.
• Hash is used in cryptography as a message digest.

• Rabin-Karp algorithm for pattern matching in a string.
• Calculating the number of different substrings of a string.
Advantages of Hash Data structure
• Hash provides better synchronization than other data structures.
• Hash tables are more efficient than search trees or other data structures
• Hash provides constant time for searching, insertion, and deletion operations
on average.
Disadvantages of Hash Data structure
• Hash is inefficient when there are many collisions.
• Hash collisions are practically not avoided for a large set of possible keys.
• Hash does not allow null values.
Conclusion
From the above discussion, we conclude that the goal of hashing is to resolve the
challenge of finding an item quickly in a collection. For example, if we have a list
of millions of English words and we wish to find a particular term then we would
use hashing to locate and find it more efficiently. It would be inefficient to check
each item on the millions of lists until we find a match. Hashing reduces search
time by restricting the search to a smaller set of words at the beginning.
Hashing is a technique or process of mapping keys, and values into the hash table by
using a hash function. It is done for faster access to elements. The efficiency of mapping
depends on the efficiency of the hash function used.
Let a hash function H(x) map the value x to the index x%10 in an array. For example, if
the list of values is [11,12,13,14,15] they will be stored at positions {1,2,3,4,5} in the array or
hash table respectively.

Hashing Data Structure

Index Mapping (or Trivial Hashing) with negatives allowed



Index Mapping (also known as Trivial Hashing) is a simple form of hashing where
the data is directly mapped to an index in a hash table. The hash function used in
this method is typically the identity function, which maps the input data to itself.
In this case, the key of the data is used as the index in the hash table, and the
value is stored at that index.
For example, if we have a hash table of size 10 and we want to store the value
“apple” with the key “a”, the trivial hashing function would simply map the key
“a” to the index “a” in the hash table, and store the value “apple” at that index.
One of the main advantages of Index Mapping is its simplicity. The hash function
is easy to understand and implement, and the data can be easily retrieved using
the key. However, it also has some limitations. The main disadvantage is that it
can only be used for small data sets, as the size of the hash table has to be the
same as the number of keys. Additionally, it doesn’t handle collisions, so if two
keys map to the same index, one of the data will be overwritten.
Given a limited-range array containing both positive and negative numbers, i.e.,
elements in the range from -MAX to +MAX, our task is to search whether some
number is present in the array or not in O(1) time.
Since the range is limited, we can use index mapping (or trivial hashing). We use
values as the index in a big array. Therefore we can search and insert elements in
O(1) time.

How to handle negative numbers?
The idea is to use a 2D array of size hash[MAX+1][2]
Algorithm:
Assign all the values of the hash matrix as 0.
Traverse the given array:
• If the element ele is non-negative, assign hash[ele][0] = 1.
• Else take the absolute value of ele and assign hash[ele][1] = 1.
To search any element x in the array.
• If X is non-negative check if hash[X][0] is 1 or not. If hash[X][0] is one then the
number is present else not present.
• If X is negative take the absolute value of X and then check if hash[X][1] is 1 or
not. If hash[X][1] is one then the number is present
Below is the implementation of the above idea.
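
A minimal C++ sketch of the idea, assuming MAX = 1000 and the example array from this article:

#include <iostream>
using namespace std;

const int MAX = 1000;                      // assumed range: -MAX .. +MAX
bool hashTable[MAX + 1][2];                // [x][0] = +x seen, [x][1] = -x seen

void insertAll(const int a[], int n) {
    for (int i = 0; i < n; i++) {
        if (a[i] >= 0) hashTable[a[i]][0] = true;
        else hashTable[-a[i]][1] = true;   // store under the absolute value
    }
}

bool searchX(int x) {
    if (x >= 0) return hashTable[x][0];
    return hashTable[-x][1];
}

int main() {
    int a[] = {-1, 9, -5, -8, -5, -2};
    insertAll(a, 6);
    cout << (searchX(-5) ? "Present" : "Not Present") << "\n";
}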

Output
Present
Time Complexity: The time complexity of the above algorithm is O(N), where N
is the size of the given array.
Space Complexity: The space complexity of the above algorithm
is O(N), because we are using an array of max size.
Separate Chaining Collision Handling Technique in Hashing
What is Collision?
Since a hash function gets us a small number for a key which is a big integer or
string, there is a possibility that two keys result in the same value. The situation
where a newly inserted key maps to an already occupied slot in the hash table is
called collision and must be handled using some collision handling technique.

What are the chances of collisions with a large table?
Collisions are very likely even if we have a big table to store keys. An important
observation is Birthday Paradox. With only 23 persons, the probability that two
people have the same birthday is 50%.
How to handle Collisions?
There are mainly two methods to handle collision:
• Separate Chaining
• Open Addressing
Here, only separate chaining is discussed; open addressing is covered in the
next section.

Separate Chaining:
The idea behind separate chaining is to implement the array as a linked list called
a chain. Separate chaining is one of the most popular and commonly used
techniques in order to handle collisions.
The linked list data structure is used to implement this technique. So what happens
is, when multiple elements are hashed into the same slot index, then these elements
are inserted into a singly-linked list which is known as a chain.
Here, all those elements that hash into the same slot index are inserted into a
linked list. Now, we can use a key K to search in the linked list by just linearly
traversing. If the intrinsic key for any entry is equal to K then it means that we
have found our entry. If we have reached the end of the linked list and yet we
haven’t found our entry then it means that the entry does not exist. Hence, the
conclusion is that in separate chaining, if two different elements have the same
hash value then we store both the elements in the same linked list one after the
other.
Example: Let us consider a simple hash function as “key mod 7” and a sequence
of keys as 50, 700, 76, 85, 92, 73, 101

Advantages:
• Simple to implement.
• Hash table never fills up, we can always add more elements to the chain.
• Less sensitive to the hash function or load factors.
• It is mostly used when it is unknown how many and how frequently keys may
be inserted or deleted.
Disadvantages:
• The cache performance of chaining is not good as keys are stored using a
linked list. Open addressing provides better cache performance as everything
is stored in the same table.
• Wastage of Space (Some Parts of the hash table are never used)
• If the chain becomes long, then search time can become O(n) in the worst case
• Uses extra space for links
Performance of Chaining:
Performance of hashing can be evaluated under the assumption that each key is
equally likely to be hashed to any slot of the table (simple uniform hashing).
m = Number of slots in hash table
n = Number of keys to be inserted in hash table

Load factor α = n/m
Expected time to search = O(1 + α)
Expected time to delete = O(1 + α)
Time to insert = O(1)
Time complexity of search insert and delete is O(1) if α is O(1)

Data Structures For Storing Chains:


1. Linked lists
• Search: O(l) where l = length of linked list
• Delete: O(l)
• Insert: O(l)
• Not cache friendly
2. Dynamic Sized Arrays ( Vectors in C++, ArrayList in Java, list in Python)
• Search: O(l) where l = length of array
• Delete: O(l)
• Insert: O(l)
• Cache friendly
3. Self Balancing BST ( AVL Trees, Red-Black Trees)
• Search: O(log(l)) where l = length of linked list
• Delete: O(log(l))
• Insert: O(log(l))
• Not cache friendly
• Java 8 onwards use this for HashMap
Open Addressing Collision Handling technique in Hashing
Open Addressing:
Like separate chaining, open addressing is a method for handling collisions. In
Open Addressing, all elements are stored in the hash table itself. So at any point,
the size of the table must be greater than or equal to the total number of keys
(Note that we can increase table size by copying old data if needed). This
approach is also known as closed hashing. This entire procedure is based upon
probing. We will understand the types of probing ahead:
• Insert(k): Keep probing until an empty slot is found. Once an empty slot is
found, insert k.

• Search(k): Keep probing until the slot’s key doesn’t become equal to k or an
empty slot is reached.
• Delete(k): Delete operation is interesting. If we simply delete a key, then the
search may fail. So slots of deleted keys are marked specially as “deleted”.
The insert can insert an item in a deleted slot, but the search doesn’t stop at a
deleted slot.
Different ways of Open Addressing:
1. Linear Probing:
In linear probing, the hash table is searched sequentially that starts from the
original location of the hash. If in case the location that we get is already
occupied, then we check for the next location.
The function used for rehashing is as follows: rehash(key) = (n+1)%table-size.
For example, The typical gap between two probes is 1 as seen in the example
below:
Let hash(x) be the slot index computed using a hash function and S be the table
size
If slot hash(x) % S is full, then we try (hash(x) + 1) % S
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
…………………………………………..
…………………………………………..
Let us consider a simple hash function as “key mod 7” and a sequence of keys as 50,
700, 76, 85, 92, 73, 101, which means hash(key) = key % S, where S = size of the
table = 7, indexed from 0 to 6. We can define the hash function as per our choice if
we want to create a hash table, although it is often fixed internally with a
pre-defined formula.

Applications of linear probing:
Linear probing is a collision handling technique used in hashing, where the
algorithm looks for the next available slot in the hash table to store the collided
key. Some of the applications of linear probing include:
• Symbol tables: Linear probing is commonly used in symbol tables, which are
used in compilers and interpreters to store variables and their associated
values. Since symbol tables can grow dynamically, linear probing can be used
to handle collisions and ensure that variables are stored efficiently.
• Caching: Linear probing can be used in caching systems to store frequently
accessed data in memory. When a cache miss occurs, the data can be loaded
into the cache using linear probing, and when a collision occurs, the next
available slot in the cache can be used to store the data.
• Databases: Linear probing can be used in databases to store records and
their associated keys. When a collision occurs, linear probing can be used to
find the next available slot to store the record.
• Compiler design: Linear probing can be used in compiler design to
implement symbol tables, error recovery mechanisms, and syntax analysis.
• Spell checking: Linear probing can be used in spell-checking software to
store the dictionary of words and their associated frequency counts. When a
collision occurs, linear probing can be used to store the word in the next
available slot.
Overall, linear probing is a simple and efficient method for handling collisions in
hash tables, and it can be used in a variety of applications that require efficient
storage and retrieval of data.

Challenges in Linear Probing :
• Primary Clustering: One of the problems with linear probing is Primary
clustering, many consecutive elements form groups and it starts taking time
to find a free slot or to search for an element.
• Secondary Clustering: Secondary clustering is less severe, two records only
have the same collision chain (Probe Sequence) if their initial position is the
same.
Example: Let us consider a simple hash function as “key mod 5” and a sequence
of keys that are to be inserted are 50, 70, 76, 93.
• Step 1: First draw the empty hash table, which will have a possible range of
hash values from 0 to 4 according to the hash function provided.
Hash table
• Step 2: Now insert all the keys in the hash table one by one. The first key is
50. It will map to slot number 0 because 50%5=0. So insert it into slot number
0.
Insert 50 into hash table
• Step 3: The next key is 70. It will map to slot number 0 because 70 % 5 = 0,
but 50 is already at slot 0, so search for the next empty slot and insert it
there (slot 1).
Insert 70 into hash table
• Step 4: The next key is 76. It will map to slot number 1 because 76 % 5 = 1,
but 70 is already at slot 1, so search for the next empty slot and insert it
there (slot 2).
Insert 76 into hash table
• Step 5: The next key is 93. It will map to slot number 3 because 93 % 5 = 3,
so insert it into slot number 3.
Insert 93 into hash table
2. Quadratic Probing
If you observe carefully, you will see that the interval between probes
increases with the number of attempts. Quadratic probing is a method that helps
resolve the primary clustering problem discussed above. In this method, we look
for the i²-th slot in the i-th iteration, always starting from the original hash
location: if the location is occupied, we check slots at increasing quadratic
offsets from it.
let hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
…………………………………………..
…………………………………………..
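Before the worked example, here is a minimal Python sketch of this probe
sequence (same assumptions as the linear-probing sketch; the number of attempts
is capped at the table size, since quadratic probing is not guaranteed to visit
every slot):

def quadratic_probe_insert(table, key):
    S = len(table)
    for i in range(S):                  # try h(x), h(x)+1*1, h(x)+2*2, ...
        index = (key % S + i * i) % S
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free slot found")

table = [None] * 7
for key in [22, 30, 50]:
    print(key, "->", quadratic_probe_insert(table, key))   # 22 -> 1, 30 -> 2, 50 -> 5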
Example: Let us consider table size = 7, hash function Hash(x) = x % 7, and
collision resolution strategy f(i) = i². Insert 22, 30, and 50.
• Step 1: Create a table of size 7.
Hash table
• Step 2: Insert 22 and 30
• Hash(22) = 22 % 7 = 1, Since the cell at index 1 is empty, we can
easily insert 22 at slot 1.
• Hash(30) = 30 % 7 = 2, Since the cell at index 2 is empty, we can
easily insert 30 at slot 2.
Insert keys 22 and 30 in the hash table
• Step 3: Inserting 50
• Hash(50) = 50 % 7 = 1
• In our hash table slot 1 is already occupied, so we search slot
(1 + 1²) % 7, i.e. 1 + 1 = 2.
• Slot 2 is also occupied, so we search slot (1 + 2²) % 7, i.e. 1 + 4 = 5.
• Slot 5 is not occupied, so we place 50 in slot 5.
Insert key 50 in the hash table
3. Double Hashing
The intervals between probes are computed with the help of a second hash
function. Double hashing is a technique that reduces clustering in an optimized
way: the increments for the probe sequence are computed using another hash
function hash2(x), and in the i-th iteration we look at the slot offset by
i * hash2(x).
let hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
…………………………………………..
…………………………………………..
Example: Insert the keys 27, 43, 692, 72 into a hash table of size 7, where the
first hash function is h1(k) = k mod 7 and the second hash function is
h2(k) = 1 + (k mod 5).
• Step 1: Insert 27
• 27 % 7 = 6; location 6 is empty, so insert 27 into slot 6.
Insert key 27 in the hash table
• Step 2: Insert 43
• 43 % 7 = 1; location 1 is empty, so insert 43 into slot 1.
Insert key 43 in the hash table
• Step 3: Insert 692
• 692 % 7 = 6, but location 6 is already occupied: this is a collision.
• So we need to resolve this collision using double hashing:
hnew = [h1(692) + i * h2(692)] % 7
     = [6 + 1 * (1 + 692 % 5)] % 7
     = 9 % 7
     = 2
Now, as 2 is an empty slot, we can insert 692 into slot 2.
Insert key 692 in the hash table
• Step 4: Insert 72
• 72 % 7 = 2, but location 2 is already occupied: this is a collision.
• So we need to resolve this collision using double hashing:
hnew = [h1(72) + i * h2(72)] % 7
     = [2 + 1 * (1 + 72 % 5)] % 7
     = 5 % 7
     = 5
Now, as 5 is an empty slot, we can insert 72 into slot 5.
Insert key 72 in the hash table
Comparison of the above three:
Open addressing is a collision handling technique used in hashing where, when a
collision occurs (i.e., when two or more keys map to the same slot), the algorithm
looks for another empty slot in the hash table to store the collided key.
• In linear probing, the algorithm simply looks for the next available slot in the
hash table and places the collided key there. If that slot is also occupied, the
algorithm continues searching for the next available slot until an empty slot is
found. This process is repeated until all collided keys have been stored. Linear
probing has the best cache performance but suffers from clustering. One more
advantage of linear probing is that it is easy to compute.
• In quadratic probing, the algorithm searches for slots in a more spaced-out
manner. When a collision occurs, the algorithm looks for the next slot using an
equation that involves the original hash value and a quadratic function. If that
slot is also occupied, the algorithm increments the value of the quadratic
function and tries again. This process is repeated until an empty slot is found.
Quadratic probing lies between the two in terms of cache performance and
clustering.
• In double hashing, the algorithm uses a second hash function to determine
the next slot to check when a collision occurs. The algorithm calculates a hash
value using the original hash function, then uses the second hash function to
calculate an offset. The algorithm then checks the slot that is the sum of the
original hash value and the offset. If that slot is occupied, the algorithm
increments the offset and tries again. This process is repeated until an empty
slot is found. Double hashing has poor cache performance but no clustering.
Double hashing requires more computation time as two hash functions need
to be computed.
The choice of collision handling technique can have a significant impact on the
performance of a hash table. Linear probing is simple and fast, but it can lead to
clustering (i.e., a situation where keys are stored in long contiguous runs) and
can degrade performance. Quadratic probing is more spaced out, but it can also
lead to clustering and can result in a situation where some slots are never
checked. Double hashing is more complex, but it can lead to more even
distribution of keys and can provide better performance in some cases.

S.No. | Separate Chaining | Open Addressing
1. | Chaining is simpler to implement. | Open addressing requires more computation.
2. | In chaining, the hash table never fills up; we can always add more elements to a chain. | In open addressing, the table may become full.
3. | Chaining is less sensitive to the hash function or load factor. | Open addressing requires extra care to avoid clustering and a high load factor.
4. | Chaining is mostly used when it is unknown how many and how frequently keys may be inserted or deleted. | Open addressing is used when the frequency and number of keys is known.
5. | Cache performance of chaining is not good, as keys are stored using a linked list. | Open addressing provides better cache performance, as everything is stored in the same table.
6. | Wastage of space (some parts of the hash table in chaining are never used). | In open addressing, a slot can be used even if an input doesn’t map to it.
7. | Chaining uses extra space for links. | No links in open addressing.

Note: Cache performance of chaining is not good because traversing a linked list
means jumping from one node to another all across the computer’s memory, so the
CPU cannot usefully cache the nodes that haven’t been visited yet. With open
addressing the data isn’t spread out, so if the CPU detects that a segment of
memory is being accessed repeatedly, it caches that segment for quick access.
Performance of Open Addressing:
Like chaining, the performance of hashing can be evaluated under the assumption
that each key is equally likely to be hashed to any slot of the table (simple
uniform hashing). With
m = number of slots in the hash table,
n = number of keys to be inserted, and
load factor α = n/m (α < 1),
the expected time to search, insert or delete is less than 1/(1 – α), i.e.,
Search, Insert and Delete take O(1/(1 – α)) expected time.
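For example, with m = 10 slots and n = 5 keys, α = 0.5 and the expected number
of probes per operation is below 1/(1 − 0.5) = 2; at n = 9 keys, α = 0.9 and the
bound grows to 1/(1 − 0.9) = 10. This is why open-addressed tables are usually
resized well before they approach being full.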
Double hashing is a collision resolution technique used in hash tables. It
works by using two hash functions to compute two different hash values for a
given key. The first hash function is used to compute the initial hash value,
and the second hash function is used to compute the step size for the
probing sequence.
Double hashing tends to have a low collision rate, as it uses two hash functions
to compute the hash value and the step size. This means that the probability of
two keys following the same probe sequence is lower than in other collision
resolution techniques such as linear probing or quadratic probing.
However, double hashing has a few drawbacks. First, it requires the use of
two hash functions, which can increase the computational complexity of the
insertion and search operations. Second, it requires a good choice of hash
functions to achieve good performance. If the hash functions are not well-
designed, the collision rate may still be high.
Advantages of Double hashing
• The advantage of Double hashing is that it is one of the best forms of
probing, producing a uniform distribution of records throughout a hash
table.
• This technique does not yield any clusters.
• It is one of the effective methods for resolving collisions.

Double hashing can be done using:
(hash1(key) + i * hash2(key)) % TABLE_SIZE
Here hash1() and hash2() are hash functions and TABLE_SIZE is the size of the
hash table (we repeat with increasing i when a collision occurs).
The first hash function is typically hash1(key) = key % TABLE_SIZE. A popular
second hash function is hash2(key) = PRIME – (key % PRIME), where PRIME is a
prime smaller than TABLE_SIZE.
A good second hash function:
• must never evaluate to zero, and
• should ensure that all cells can be probed.

Below is an implementation of the above approach in C++:
/*
** Handling of collision via open addressing
** Method for Probing: Double Hashing
*/

#include <iostream>
#include <vector>
#include <bitset>
using namespace std;

#define MAX_SIZE 10000001ll

class doubleHash {

    int TABLE_SIZE, keysPresent, PRIME;
    vector<int> hashTable;
    bitset<MAX_SIZE> isPrime; // bit i is set when i is composite

    /* Sieve of Eratosthenes: mark composite numbers. */
    void __setSieve(){
        isPrime[0] = isPrime[1] = 1;
        for(long long i = 2; i*i < MAX_SIZE; i++)
            if(isPrime[i] == 0)
                for(long long j = i*i; j < MAX_SIZE; j += i)
                    isPrime[j] = 1;
    }

    int inline hash1(int value){
        return value % TABLE_SIZE;
    }

    int inline hash2(int value){
        return PRIME - (value % PRIME); // never evaluates to zero
    }

    bool inline isFull(){
        return (TABLE_SIZE == keysPresent);
    }

public:

    doubleHash(int n){
        __setSieve();
        TABLE_SIZE = n;

        /* Find the largest prime number smaller than the hash table's size. */
        PRIME = TABLE_SIZE - 1;
        while(isPrime[PRIME] == 1)
            PRIME--;

        keysPresent = 0;

        /* Fill the hash table with -1 (empty entries). */
        for(int i = 0; i < TABLE_SIZE; i++)
            hashTable.push_back(-1);
    }

    void __printPrime(long long n){
        for(long long i = 0; i <= n; i++)
            if(isPrime[i] == 0)
                cout << i << ", ";
        cout << endl;
    }

    /* Function to insert a value into the hash table. */
    void insert(int value){
        if(value == -1 || value == -2){
            cout << "ERROR : -1 and -2 can't be inserted in the table\n";
            return;
        }

        if(isFull()){
            cout << "ERROR : Hash Table Full\n";
            return;
        }

        int probe = hash1(value), offset = hash2(value); // in linear probing, offset = 1

        while(hashTable[probe] != -1){
            if(-2 == hashTable[probe])
                break; // insert at a deleted element's location
            probe = (probe + offset) % TABLE_SIZE;
        }

        hashTable[probe] = value;
        keysPresent += 1;
    }

    void erase(int value){
        /* Return if the element is not present. */
        if(!search(value))
            return;

        int probe = hash1(value), offset = hash2(value);

        while(hashTable[probe] != -1){
            if(hashTable[probe] == value){
                hashTable[probe] = -2; // mark as deleted (rather than empty (-1))
                keysPresent--;
                return;
            }
            else
                probe = (probe + offset) % TABLE_SIZE;
        }
    }

    bool search(int value){
        int probe = hash1(value), offset = hash2(value), initialPos = probe;
        bool firstItr = true;

        while(1){
            if(hashTable[probe] == -1) // stop if an empty slot is encountered
                break;
            else if(hashTable[probe] == value) // stop after finding the element
                return true;
            else if(probe == initialPos && !firstItr) // stop after one full traversal of the table
                return false;
            else
                probe = (probe + offset) % TABLE_SIZE; // otherwise move to the next probe position

            firstItr = false;
        }
        return false;
    }

    /* Function to display the hash table. */
    void print(){
        for(int i = 0; i < TABLE_SIZE; i++)
            cout << hashTable[i] << ", ";
        cout << "\n";
    }
};

int main(){
    doubleHash myHash(13); // creates an empty hash table of size 13

    /* Insert some elements into the hash table. */
    int insertions[] = {115, 12, 87, 66, 123},
        n1 = sizeof(insertions)/sizeof(insertions[0]);

    for(int i = 0; i < n1; i++)
        myHash.insert(insertions[i]);

    cout << "Status of hash table after initial insertions : "; myHash.print();

    /*
    ** Searches for some elements in the hash table,
    ** and prints them if found.
    */
    int queries[] = {1, 12, 2, 3, 69, 88, 115},
        n2 = sizeof(queries)/sizeof(queries[0]);

    cout << "\n" << "Search operation after insertion : \n";

    for(int i = 0; i < n2; i++)
        if(myHash.search(queries[i]))
            cout << queries[i] << " present\n";

    /* Delete some elements from the hash table. */
    int deletions[] = {123, 87, 66},
        n3 = sizeof(deletions)/sizeof(deletions[0]);

    for(int i = 0; i < n3; i++)
        myHash.erase(deletions[i]);

    cout << "Status of hash table after deleting elements : "; myHash.print();

    return 0;
}
Output
Status of hash table after initial insertions : -1, 66, -1, -1, -1,
-1, 123, -1, -1, 87, -1, 115, 12,

Search operation after insertion :
12 present
115 present

Status of hash table after deleting elements : -1, -2, -1, -1, -1,
-1, -2, -1, -1, -2, -1, 115, 12,
Time Complexity (worst case):
• Insertion: O(n)
• Search: O(n)
• Deletion: O(n)
Under simple uniform hashing the expected cost of each operation is
O(1/(1 – α)), as noted above.
Auxiliary Space: O(size of the hash table).
Load Factor and Rehashing

Prerequisites: Hashing Introduction and Collision handling by separate chaining

How hashing works:

For insertion of a key (K) – value (V) pair into a hash map, the following steps
are required:
1. K is converted into a small integer (called its hash code) using a hash function.
2. The hash code is used to find an index (hashCode % arrSize), and the entire
linked list at that index (separate chaining) is first searched for the presence
of K.
3. If found, its value is updated; if not, the K-V pair is stored as a new node
in the list.

Complexity and Load Factor

• For the first step, the time taken depends on K and the hash function. For
example, if the key is the string “abcd”, the hash function may depend on the
length of the string. For very large values of n (the number of entries in the
map), the length of the keys is almost negligible in comparison to n, so hash
computation can be considered to take place in constant time, i.e., O(1).
• For the second step, the list of K-V pairs present at that index needs to be
traversed. The worst case is that all n entries land at the same index, giving a
time complexity of O(n). However, enough research has been done to make hash
functions distribute keys in the array nearly uniformly, so this almost never
happens.
• So, on average, if there are n entries and b is the size of the array, there
would be n/b entries at each index. This value n/b is called the load factor,
and it represents the load on our map.
• This load factor needs to be kept low, so that the number of entries at one
index stays small and the complexity stays almost constant, i.e., O(1).

Rehashing:

Rehashing is the process of increasing the size of a hashmap and redistributing
the elements to new buckets based on their new hash values. It is done to
improve the performance of the hashmap and to prevent collisions caused by a
high load factor.
When a hashmap becomes full, the load factor (i.e., the ratio of the number of
elements to the number of buckets) increases. As the load factor increases, the
number of collisions also increases, which can lead to poor performance. To
avoid this, the hashmap can be resized and the elements can be rehashed to new
buckets, which decreases the load factor and reduces the number of collisions.
During rehashing, all elements of the hashmap are iterated and their new bucket
positions are calculated using the new hash function that corresponds to the new
size of the hashmap. This process can be time-consuming but it is necessary to
maintain the efficiency of the hashmap.
Why rehashing?
Rehashing is needed in a hashmap to prevent collision and to maintain the
efficiency of the data structure.
As elements are inserted into a hashmap, the load factor (i.e., the ratio of the
number of elements to the number of buckets) increases. If the load factor
exceeds a certain threshold (often set to 0.75), the hashmap becomes inefficient
as the number of collisions increases. To avoid this, the hashmap can be resized
and the elements can be rehashed to new buckets, which decreases the load
factor and reduces the number of collisions. This process is known as rehashing.
Rehashing can be costly in terms of time and space, but it is necessary to
maintain the efficiency of the hashmap.
How Rehashing is done?
Rehashing can be done as follows:
• For each addition of a new entry to the map, check the load factor.
• If it is greater than its pre-defined value (or the default value of 0.75 if
not given), then rehash.
• To rehash, make a new bucket array of double the previous size.
• Then traverse each element in the old bucket array and call insert() for each,
so as to insert it into the new, larger bucket array.

Program to implement Rehashing (C++):
#include <iostream>
#include <vector>
#include <functional>

class Map {

private:
    class MapNode {
    public:
        int key;
        int value;
        MapNode* next;

        MapNode(int key, int value) {
            this->key = key;
            this->value = value;
            this->next = nullptr;
        }
    };

    // The bucket array where the nodes
    // containing K-V pairs are stored
    std::vector<MapNode*> buckets;

    // No. of pairs stored - n
    int size;

    // Size of the bucket array - b
    int numBuckets;

    // Default load factor
    double DEFAULT_LOAD_FACTOR = 0.75;

    int getBucketInd(int key) {
        // Using the inbuilt hash function for int keys
        int hashCode = (int)std::hash<int>()(key);

        // array index = hashCode % numBuckets
        return (hashCode % numBuckets);
    }

public:
    Map() {
        size = 0;
        numBuckets = 5;
        buckets.assign(numBuckets, nullptr);

        std::cout << "HashMap created" << std::endl;
        std::cout << "Number of pairs in the Map: " << size << std::endl;
        std::cout << "Size of Map: " << numBuckets << std::endl;
        std::cout << "Default Load Factor : " << DEFAULT_LOAD_FACTOR << std::endl;
    }

    void insert(int key, int value) {
        // Getting the index at which it needs to be inserted
        int bucketInd = getBucketInd(key);

        // The first node at that index
        MapNode* head = buckets[bucketInd];

        // First, loop through all the nodes present at that index
        // to check if the key already exists
        while (head != nullptr) {
            // If already present, the value is updated
            if (head->key == key) {
                head->value = value;
                return;
            }
            head = head->next;
        }

        // New node with the K and V
        MapNode* newElementNode = new MapNode(key, value);

        // The new node is inserted by making it the head,
        // with its next pointing to the previous head
        newElementNode->next = buckets[bucketInd];
        buckets[bucketInd] = newElementNode;

        std::cout << "Pair(" << key << ", " << value << ") inserted successfully." << std::endl;

        // Incrementing size as a new K-V pair is added to the map
        size++;

        // Load factor calculated (1.0 forces floating-point division)
        double loadFactor = (1.0 * size) / numBuckets;

        std::cout << "Current Load factor = " << loadFactor << std::endl;

        // If the load factor is > 0.75, rehashing is done
        if (loadFactor > DEFAULT_LOAD_FACTOR) {
            std::cout << loadFactor << " is greater than " << DEFAULT_LOAD_FACTOR << std::endl;
            std::cout << "Therefore Rehashing will be done." << std::endl;

            // Rehash
            rehash();

            std::cout << "New Size of Map: " << numBuckets << std::endl;
        }

        std::cout << "Number of pairs in the Map: " << size << std::endl;
    }

    void rehash() {
        std::cout << "\n***Rehashing Started***\n" << std::endl;

        // The present bucket list is kept in temp
        std::vector<MapNode*> temp = buckets;

        // A new bucket list of double the old size is created
        buckets.assign(2 * numBuckets, nullptr);

        // Now size is made zero, and we loop through all the nodes
        // in the original bucket list (temp) and insert them into the new list
        size = 0;
        numBuckets *= 2;

        for (int i = 0; i < (int)temp.size(); i++) {
            // Head of the chain at that index
            MapNode* head = temp[i];

            while (head != nullptr) {
                int key = head->key;
                int val = head->value;

                // Calling insert for each node in temp,
                // as the new list is now the bucket array
                insert(key, val);
                head = head->next;
            }
        }

        std::cout << "***Rehashing Ended***\n" << std::endl;
    }
};

int main() {
    Map map;

    // Inserting elements
    map.insert(1, 1);
    map.insert(2, 2);
    map.insert(3, 3);
    map.insert(4, 4);
    map.insert(5, 5);
    map.insert(6, 6);
    map.insert(7, 7);
    map.insert(8, 8);
    map.insert(9, 9);
    map.insert(10, 10);

    return 0;
}
Output:
(The trace below is the sample run from the string-valued version of this
example, which inserts values such as "Geeks" and "forGeeks" and prints the
whole map after each insertion; with the integer driver shown above, the
structure of the trace is the same but the values are 1, 2, 3, and so on.)

HashMap created
Number of pairs in the Map: 0
Size of Map: 5
Default Load Factor : 0.75

Pair(1, Geeks) inserted successfully.
Current Load factor = 0.2
Number of pairs in the Map: 1
Size of Map: 5

Current HashMap:
key = 1, val = Geeks

Pair(2, forGeeks) inserted successfully.
Current Load factor = 0.4
Number of pairs in the Map: 2
Size of Map: 5

Current HashMap:
key = 1, val = Geeks
key = 2, val = forGeeks

Pair(3, A) inserted successfully.
Current Load factor = 0.6
Number of pairs in the Map: 3
Size of Map: 5

Current HashMap:
key = 1, val = Geeks
key = 2, val = forGeeks
key = 3, val = A

Pair(4, Computer) inserted successfully.
Current Load factor = 0.8
0.8 is greater than 0.75
Therefore Rehashing will be done.

***Rehashing Started***

Pair(1, Geeks) inserted successfully.
Current Load factor = 0.1
Number of pairs in the Map: 1
Size of Map: 10

Pair(2, forGeeks) inserted successfully.
Current Load factor = 0.2
Number of pairs in the Map: 2
Size of Map: 10

Pair(3, A) inserted successfully.
Current Load factor = 0.3
Number of pairs in the Map: 3
Size of Map: 10

Pair(4, Computer) inserted successfully.
Current Load factor = 0.4
Number of pairs in the Map: 4
Size of Map: 10

***Rehashing Ended***

New Size of Map: 10
Number of pairs in the Map: 4
Size of Map: 10

Current HashMap:
key = 1, val = Geeks
key = 2, val = forGeeks
key = 3, val = A
key = 4, val = Computer

Pair(5, Portal) inserted successfully.
Current Load factor = 0.5
Number of pairs in the Map: 5
Size of Map: 10

Current HashMap:
key = 1, val = Geeks
key = 2, val = forGeeks
key = 3, val = A
key = 4, val = Computer
key = 5, val = Portal

The time complexity of the insert operation is O(1) and the auxiliary space is O(n).
The time complexity of the rehash operation is O(n) and the auxiliary space is O(n).
Recursion:
Definition: Recursion is a programming technique where a function calls itself to
solve a problem.

Key Points:

• A recursive function typically has a base case (the simplest version of the problem) that stops
the recursion.
• It also has a recursive case, where the function calls itself with a smaller or simpler version of
the problem.
• Recursion can lead to elegant solutions, but it's important to avoid infinite loops by ensuring
the base case is reached.

Example: Factorial Calculation

def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)
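For example, factorial(5) unwinds to 5 * 4 * 3 * 2 * 1 = 120, with the base case
n == 0 stopping the recursion.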

Backtracking:
Definition: Backtracking is an algorithmic technique that involves trying out different
possibilities and undoing choices when they lead to a dead end.

Key Points:

• It's used to solve problems where you need to find all possible solutions, or one solution that
meets specific constraints.
• Backtracking uses a depth-first search approach and maintains a state that can be reverted.
• It involves making choices, exploring a path, and then undoing the choices if they don't lead
to a solution.

Example: N-Queens Problem
In the N-Queens problem, you need to place N queens on an N×N chessboard in such
a way that no two queens threaten each other.

def is_safe(board, row, col, n):
    # Check the column and both upper diagonals for an existing queen
    for i in range(row):
        if board[i][col] == 1:
            return False
        if col - row + i >= 0 and board[i][col - row + i] == 1:
            return False
        if col + row - i < n and board[i][col + row - i] == 1:
            return False
    return True

def solve_n_queens_util(board, row, n, solutions):
    if row == n:
        solutions.append([list(r) for r in board])
        return
    for col in range(n):
        if is_safe(board, row, col, n):
            board[row][col] = 1                              # make a choice
            solve_n_queens_util(board, row + 1, n, solutions)
            board[row][col] = 0                              # undo the choice (backtrack)

def solve_n_queens(n):
    board = [[0] * n for _ in range(n)]
    solutions = []
    solve_n_queens_util(board, 0, n, solutions)
    return solutions

n = 4
for sol in solve_n_queens(n):
    for row in sol:
        print(' '.join('Q' if cell == 1 else '.' for cell in row))
    print()
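For n = 4, this prints the puzzle’s two distinct solutions: queens in columns
(1, 3, 0, 2) and (2, 0, 3, 1), reading row by row from the top.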

These are just brief notes on recursion and backtracking. To fully understand and
master these concepts, it's important to study and practice them with different
problems and examples.
The Tower of Hanoi is a classic problem that is often used to illustrate the
concept of recursion. In this problem, you have three pegs and a set of disks of
different sizes which can be slid onto any peg. The puzzle starts with the disks
in a neat stack in ascending order of size on one peg, the smallest at the top.
The objective is to move the entire stack to another peg, obeying the following
rules:

1. Only one disk can be moved at a time.
2. Each move consists of taking the top disk from one of the stacks and placing
it on top of another stack or an empty peg.
3. No disk may be placed on top of a smaller disk.

Here's how you can solve the Tower of Hanoi problem using recursion:

def tower_of_hanoi(n, source, auxiliary, target):
    if n == 1:
        print(f"Move disk 1 from {source} to {target}")
        return
    tower_of_hanoi(n - 1, source, target, auxiliary)
    print(f"Move disk {n} from {source} to {target}")
    tower_of_hanoi(n - 1, auxiliary, source, target)

n = 3  # Number of disks
tower_of_hanoi(n, 'A', 'B', 'C')  # 'A', 'B', 'C' are the pegs
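With n = 3 the call prints the familiar seven moves (2^n − 1 moves in general):
disk 1 A→C, disk 2 A→B, disk 1 C→B, disk 3 A→C, disk 1 B→A, disk 2 B→C,
disk 1 A→C.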

In this example, the function tower_of_hanoi takes the number of disks n and the names of the
source, auxiliary, and target pegs as arguments. It uses recursion to solve the problem by
breaking it down into smaller subproblems.

The function first moves n-1 disks from the source peg to the auxiliary peg using the target peg
as an auxiliary. Then, it moves the remaining largest disk from the source peg to the target peg.
Finally, it moves the n-1 disks from the auxiliary peg to the target peg using the source peg as an
auxiliary.

The recursion ends when there's only one disk left to move, at which point it directly moves that
disk from the source peg to the target peg.

By following this approach, you can solve the Tower of Hanoi problem efficiently using recursion.

Dynamic Programming (DP):


Definition: Dynamic programming is a method for solving complex problems by
breaking them down into simpler overlapping subproblems and solving each
subproblem only once.

Key Points:

• DP is suitable for problems with optimal substructure and overlapping subproblems.
• It involves solving each subproblem only once and storing its solution in a
table to avoid redundant computations.
• There are two types of DP approaches: bottom-up (iterative) and top-down
(recursive with memoization).

Steps for Solving DP Problems:

1. Characterize the structure of an optimal solution: Understand how an optimal
solution is built from solutions to smaller subproblems.
2. Define the value of an optimal solution recursively: Express the value of the
optimal solution in terms of the values of smaller subproblems.
3. Compute the value of an optimal solution bottom-up or top-down: Use
memoization or tabulation to avoid solving the same subproblem multiple times.
4. Construct an optimal solution to the problem: Use the stored information to
build the solution.

Example: Fibonacci Sequence using Memoization (Top-Down DP)

# Using memoization to avoid redundant computations
def fibonacci(n, memo={}):
    if n in memo:
        return memo[n]
    if n <= 1:
        return n
    memo[n] = fibonacci(n - 1, memo) + fibonacci(n - 2, memo)
    return memo[n]

n = 10
print(f"Fibonacci({n}) =", fibonacci(n))

Example: Coin Change Problem (Bottom-Up DP)

def coin_change(coins, amount):
    dp = [float('inf')] * (amount + 1)
    dp[0] = 0
    for coin in coins:
        for i in range(coin, amount + 1):
            dp[i] = min(dp[i], dp[i - coin] + 1)
    return dp[amount] if dp[amount] != float('inf') else -1

coins = [1, 2, 5]
amount = 11
print("Minimum number of coins:", coin_change(coins, amount))
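Here dp[11] works out to 3 coins (11 = 5 + 5 + 1): each dp[i] is built from the
already-computed values dp[i − coin] for every coin denomination.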

In the coin change problem, the goal is to find the minimum number of coins
needed to make up a given amount. The dynamic programming approach involves
building up the solutions for each amount incrementally using the values of smaller
amounts.

These examples demonstrate the essence of dynamic programming. To effectively
use DP, you'll need to recognize the problems that can be solved with this
technique, understand their optimal substructure and overlapping subproblems,
and choose the appropriate approach (top-down or bottom-up) based on the
problem's characteristics. Practice and familiarity with different types of DP
problems will help you master this technique.
String manipulation is a fundamental skill in programming, especially when working with text-
based data. It involves various operations and techniques for manipulating strings to achieve
desired outcomes. Here's an overview of key concepts and operations related to string
manipulation:

Basic String Operations:

1. Concatenation: Combining two or more strings together.
   str1 = "Hello, "
   str2 = "world!"
   result = str1 + str2   # "Hello, world!"
2. Substring: Extracting a portion of a string.
   text = "Hello, world!"
   substring = text[7:12]   # "world"
3. Length: Finding the length of a string.
   text = "Hello"
   length = len(text)   # 5

Searching and Manipulating:

1. Searching: Finding the position of a substring within a string.
   text = "Hello, world!"
   position = text.find("world")   # 7
2. Replacing: Replacing occurrences of a substring with another string.
   text = "Hello, world!"
   new_text = text.replace("world", "Python")   # "Hello, Python!"
3. Splitting and Joining: Splitting a string into a list of substrings based on
a delimiter, and joining a list of strings into a single string.
   text = "apple,banana,orange"
   fruits = text.split(",")   # ["apple", "banana", "orange"]
   new_text = "-".join(fruits)   # "apple-banana-orange"

String Formatting:
String formatting allows you to create dynamic strings by embedding variables and values within
them.

1. String Interpolation (f-strings):
   name = "Alice"
   age = 30
   message = f"My name is {name} and I am {age} years old."
2. format() Method:
   name = "Bob"
   age = 25
   message = "My name is {} and I am {} years old.".format(name, age)

Case Conversion:
1. Uppercase and Lowercase:
   text = "Hello, World!"
   upper_text = text.upper()   # "HELLO, WORLD!"
   lower_text = text.lower()   # "hello, world!"

String Validation:
1. Checking Prefix and Suffix:
   text = "Hello, World!"
   starts_with_hello = text.startswith("Hello")   # True
   ends_with_world = text.endswith("world")   # False (the actual suffix is "World!")
2. Checking Type of Characters:
   alphanumeric = "abc123"
   only_letters = alphanumeric.isalpha()   # False
   only_digits = alphanumeric.isdigit()   # False

Regular Expressions:
Regular expressions (regex) are powerful tools for pattern matching and manipulation within
strings. They are used to search, validate, and replace patterns in text data.

import re
pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,7}\b"
text = "Contact us at support@example.com"   # example.com is a stand-in; the address in the source was elided
matches = re.findall(pattern, text)   # ["support@example.com"]

String manipulation is a versatile skill that's essential for working with textual data in various
programming tasks, ranging from data parsing and validation to user interfaces and text
processing applications.

Greedy Algorithms

What is Greedy Algorithm?

Greedy is an algorithmic paradigm that builds up a solution piece by piece,
always choosing the next piece that offers the most obvious and immediate
benefit. Problems where making the locally optimal choice also leads to a
globally optimal solution are the best fit for greedy.
For example, consider the Fractional Knapsack Problem. The locally optimal
strategy is to choose the item that has the maximum value-to-weight ratio. This
strategy also leads to a globally optimal solution because we are allowed to
take fractions of an item, as the sketch below illustrates.
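A minimal Python sketch of that strategy (the item values, weights, and capacity
below are made-up numbers for illustration):

def fractional_knapsack(items, capacity):
    # items: list of (value, weight) pairs; sort by value/weight ratio, best first
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    total = 0.0
    for value, weight in items:
        if capacity <= 0:
            break
        take = min(weight, capacity)   # take the whole item, or the fraction that fits
        total += value * (take / weight)
        capacity -= take
    return total

print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))   # 240.0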

Greedy Algorithms (General Structure and Applications)

The general structure of a greedy algorithm can be summarized in the
following steps:

1. Identify the problem as an optimization problem where we need to find the
best solution among a set of possible solutions.
2. Determine the set of feasible solutions for the problem.
3. Identify the optimal substructure of the problem, meaning that the optimal
solution can be constructed from the optimal solutions of its subproblems.
4. Develop a greedy strategy that builds a feasible solution step by step,
making the locally optimal choice at each step.
5. Prove the correctness of the algorithm by showing that the locally optimal
choices at each step lead to a globally optimal solution.

Some common applications of greedy algorithms include:

1. Coin change problem: Given a set of coins with different denominations, find
the minimum number of coins required to make a given amount of change.
2. Fractional knapsack problem: Given a set of items with weights and values,
fill a knapsack with a maximum weight capacity with the most valuable items,
allowing fractional amounts of items to be included.
3. Huffman coding: Given a set of characters and their frequencies in a message,
construct a binary code with minimum average length for the characters.
4. Shortest path algorithms: Given a weighted graph, find the shortest path
between two nodes.
5. Minimum spanning tree: Given a weighted graph, find a tree that spans all
nodes with the minimum total weight.
Greedy algorithms can be very efficient and provide fast solutions for many
problems. However, keep in mind that they may not always provide the optimal
solution; analyze the problem carefully to ensure the correctness of the
algorithm.
Greedy algorithms work step by step, always choosing the step that provides an
immediate profit/benefit. They pick the “locally optimal solution” without
thinking about future consequences, and so may not always lead to the globally
optimal solution, because they never consider the entire data: the choice made
by the greedy approach does not take future data and choices into account. In
some cases making the decision that looks right at that moment gives the best
solution (greedy works), but in other cases it does not. The greedy technique is
used for optimization problems (where we have to find the maximum or minimum of
something) and is best suited to acting on the immediate situation.
All greedy algorithms follow a basic structure:
1. Declare a result variable (initially empty, or 0).
2. Make a greedy choice; if the choice is feasible, add it to the result.
3. Return the result.
Why choose the Greedy Approach:
The greedy approach has a few trade-offs that may make it suitable for
optimization. One prominent reason is that it reaches a feasible solution
immediately. In the activity selection problem (see the sketch below), if more
activities can be done before finishing the current activity, these activities
can be performed within the same time. Another reason is that it divides a
problem recursively based on a condition, with no need to combine all the
solutions: in the activity selection problem, the “recursive division” step is
achieved by scanning the list of items only once and considering certain
activities.
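A minimal sketch of activity selection in Python (start/finish times are made-up
values; the greedy rule is to always pick, among the remaining compatible
activities, the one that finishes earliest):

def select_activities(activities):
    # activities: list of (start, finish) pairs
    chosen = []
    last_finish = float('-inf')
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:       # compatible with the last chosen activity
            chosen.append((start, finish))
            last_finish = finish
    return chosen

print(select_activities([(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9), (6, 10), (8, 11)]))
# [(1, 4), (5, 7), (8, 11)]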
Greedy choice property:
This property says that the globally optimal solution can be obtained by making a
locally optimal solution (Greedy). The choice made by a Greedy algorithm may
depend on earlier choices but not on the future. It iteratively makes one Greedy
choice after another and reduces the given problem to a smaller one.
Optimal substructure:
A problem exhibits optimal substructure if an optimal solution to the problem
contains optimal solutions to the subproblems. That means we can solve
subproblems and build up the solutions to solve larger problems.
Note: Making locally optimal choices does not always work. Hence, Greedy
algorithms will not always give the best solutions.
Characteristics of Greedy approach:
• There is an ordered list of resources(profit, cost, value, etc.)
• Maximum of all the resources(max profit, max value, etc.) are taken.
• For example, in the fractional knapsack problem, the maximum value/weight
is taken first according to available capacity.
Characteristic components of greedy algorithm:
1. The feasible solution: A subset of given inputs that satisfies all specified
constraints of a problem is known as a “feasible solution”.
2. Optimal solution: The feasible solution that achieves the desired extremum
is called an “optimal solution”. In other words, the feasible solution that either
minimizes or maximizes the objective function specified in a problem is
known as an “optimal solution”.
3. Feasibility check: It investigates whether the selected input fulfils all
constraints mentioned in a problem or not. If it fulfils all the constraints then
it is added to a set of feasible solutions; otherwise, it is rejected.
4. Optimality check: It investigates whether a selected input produces either a
minimum or maximum value of the objective function while fulfilling all the
specified constraints. If an element in a solution set produces the desired
extremum, then it is added to the set of optimal solutions.

5. Optimal substructure property: The globally optimal solution to a problem
includes the optimal sub solutions within it.
6. Greedy choice property: The globally optimal solution is assembled by
selecting locally optimal choices. The greedy approach applies some locally
optimal criteria to obtain a partial solution that seems to be the best at that
moment and then find out the solution for the remaining sub-problem.
The local decisions (or choices) must possess three characteristics as mentioned
below:
1. Feasibility: The selected choice must fulfil local constraints.
2. Optimality: The selected choice must be the best at that stage (locally
optimal choice).
3. Irrevocability: The selected choice cannot be changed once it is made.
Applications of Greedy Algorithms:
• Finding an optimal solution (Activity selection, Fractional Knapsack, Job
Sequencing, Huffman Coding).
• Finding close to the optimal solution for NP-Hard problems like TSP.
• Network design: Greedy algorithms can be used to design efficient networks,
such as minimum spanning trees, shortest paths, and maximum flow
networks. These algorithms can be applied to a wide range of network design
problems, such as routing, resource allocation, and capacity planning.
• Machine learning: Greedy algorithms can be used in machine learning
applications, such as feature selection, clustering, and classification. In feature
selection, greedy algorithms are used to select a subset of features that are
most relevant to a given problem. In clustering and classification, greedy
algorithms can be used to optimize the selection of clusters or classes.
• Image processing: Greedy algorithms can be used to solve a wide range of
image processing problems, such as image compression, denoising, and
segmentation. For example, Huffman coding is a greedy algorithm that can be
used to compress digital images by efficiently encoding the most frequent
pixels.
• Combinatorial optimization: Greedy algorithms can be used to solve
combinatorial optimization problems, such as the traveling salesman
problem, graph coloring, and scheduling. Although these problems are
typically NP-hard, greedy algorithms can often provide close-to-optimal
solutions that are practical and efficient.
• Game theory: Greedy algorithms can be used in game theory applications,
such as finding the optimal strategy for games like chess or poker. In these
applications, greedy algorithms can be used to identify the most promising
moves or actions at each turn, based on the current state of the game.
• Financial optimization: Greedy algorithms can be used in financial
applications, such as portfolio optimization and risk management. In portfolio
optimization, greedy algorithms can be used to select a subset of assets that
are most likely to provide the best return on investment, based on historical
data and current market trends.

Applications of Greedy Approach:
Greedy algorithms are used to find an optimal or near optimal solution to many
real-life problems. Few of them are listed below:
(1) Make a change problem
(2) Knapsack problem
(3) Minimum spanning tree
(4) Single source shortest path
(5) Activity selection problem
(6) Job sequencing problem
(7) Huffman code generation.
(8) Dijkstra’s algorithm
(9) Greedy coloring
(10) Minimum cost spanning tree
(11) Job scheduling
(12) Interval scheduling
(13) Greedy set cover
(14) Knapsack with fractions

Advantages of the Greedy Approach:

• The greedy approach is easy to implement.
• Greedy algorithms typically have lower time complexity.
• Greedy algorithms can be used for optimization, or for finding close-to-optimal
solutions in the case of hard problems.
• Greedy algorithms can produce efficient solutions in many cases, especially
when the problem has a substructure that exhibits the greedy choice
property.
• Greedy algorithms are often faster than other optimization algorithms, such
as dynamic programming or branch and bound, because they require less
computation and memory.
• The greedy approach is often used as a heuristic or approximation algorithm
when an exact solution is not feasible or when finding an exact solution would
be too time-consuming.
• The greedy approach can be applied to a wide range of problems, including
problems in computer science, operations research, economics, and other
fields.
• The greedy approach can be used to solve problems in real-time, such as
scheduling problems or resource allocation problems, because it does not
require the solution to be computed in advance.

• Greedy algorithms are often used as a first step in solving optimization
problems, because they provide a good starting point for more complex
optimization algorithms.
• Greedy algorithms can be used in conjunction with other optimization
algorithms, such as local search or simulated annealing, to improve the
quality of the solution.
Disadvantages of the Greedy Approach:
• The local optimal solution may not always be globally optimal.
• Greedy algorithms do not always guarantee to find the optimal solution, and
may produce suboptimal solutions in some cases.
• The greedy approach relies heavily on the problem structure and the choice of
criteria used to make the local optimal choice. If the criteria are not chosen
carefully, the solution produced may be far from optimal.
• Greedy algorithms may require a lot of preprocessing to transform the
problem into a form that can be solved by the greedy approach.
• Greedy algorithms may not be applicable to problems where the optimal
solution depends on the order in which the inputs are processed.
• Greedy algorithms may not be suitable for problems where the optimal
solution depends on the size or composition of the input, such as the bin
packing problem.
• Greedy algorithms may not be able to handle constraints on the solution
space, such as constraints on the total weight or capacity of the solution.
• Greedy algorithms may be sensitive to small changes in the input, which can
result in large changes in the output. This can make the algorithm unstable
and unpredictable in some cases.
Standard Greedy Algorithms :
• Prim’s Algorithm
• Kruskal’s Algorithm
• Dijkstra’s Algorithm
Difference between Greedy Algorithm and Divide and Conquer
Algorithm

Greedy algorithm and divide and conquer algorithm are two common
algorithmic paradigms used to solve problems. The main difference between
them lies in their approach to solving problems.

Greedy Algorithm:

1. The greedy algorithm is an algorithmic paradigm that follows the
problem-solving heuristic of making the locally optimal choice at each stage
with the hope of finding a global optimum.
2. In other words, a greedy algorithm chooses the best possible option at each
step, without considering the consequences of that choice on future steps.
Greedy algorithms are useful for solving optimization problems that can be
divided into smaller subproblems. They may not always find the optimal solution,
but they are usually faster and simpler than other algorithms.
3. A greedy algorithm is defined as a method for solving optimization problems
by taking decisions that result in the most evident and immediate benefit
irrespective of the final outcome. It is a simple, intuitive approach that is
used in optimization problems.

Divide and Conquer Algorithm:

1. The divide and conquer algorithm is an algorithmic paradigm that involves
breaking down a problem into smaller subproblems, solving each subproblem
recursively, and then combining the solutions to the subproblems to solve the
original problem.
2. In other words, the divide and conquer algorithm solves a problem by dividing
it into smaller subproblems, solving each subproblem independently, and then
combining the solutions to solve the original problem. Divide and conquer
algorithms are useful for solving problems that can be divided into smaller
subproblems similar to the original problem.
3. Divide and conquer algorithms are generally slower than greedy algorithms,
but they are more likely to find the optimal solution.

In summary, the main difference between greedy algorithms and divide and conquer
algorithms is their approach to solving problems. Greedy algorithms make locally
optimal choices at each step, while divide and conquer algorithms divide a
problem into smaller subproblems and solve each subproblem independently. Greedy
algorithms are faster and simpler but may not always find the optimal solution,
while divide and conquer algorithms are slower but more likely to find the
optimal solution.

A typical Divide and Conquer algorithm solves a problem using the following
three steps:
• Divide: This involves dividing the problem into smaller sub-problems.
• Conquer: Solve sub-problems by calling recursively until solved.
• Combine: Combine the sub-problems to get the final solution of the whole
problem.
Difference between the Greedy Algorithm and the
Divide and Conquer Algorithm:
S.No. | Divide and conquer | Greedy Algorithm
1. | Divide and conquer is used to obtain a solution to the given problem; it does not aim for the optimal solution. | The greedy method is used to obtain an optimal solution to the given problem.
2. | In this technique, the problem is divided into small subproblems, which are solved independently; finally, all the subproblem solutions are combined into the solution of the given problem. | In the greedy method, a set of feasible solutions is generated, and one feasible solution is picked as the optimal solution.
3. | Divide and conquer is less efficient and slower because it is recursive in nature. | The greedy method is comparatively efficient and faster as it is iterative in nature.
4. | Divide and conquer may generate duplicate solutions. | In the greedy method, the optimal solution is generated without revisiting previously generated solutions; thus it avoids re-computation.
5. | Divide and conquer algorithms mostly run in polynomial time. | Greedy algorithms also run in polynomial time but take less time than divide and conquer.
6. | Examples: Merge sort, Quick sort, Strassen's matrix multiplication. | Examples: Fractional knapsack problem, Activity selection problem, Job sequencing problem.

Greedy, Divide and Conquer and Dynamic Programming Algorithms

Greedy algorithm, divide and conquer algorithm, and dynamic programming
algorithm are three common algorithmic paradigms used to solve problems.
Here’s a comparison among these algorithms:

Approach:

1. Greedy algorithm: Makes locally optimal choices at each step with the hope of
finding a global optimum.
2. Divide and conquer algorithm: Breaks down a problem into smaller subproblems,
solves each subproblem recursively, and then combines the solutions to the
subproblems to solve the original problem.
3. Dynamic programming algorithm: Solves subproblems recursively and stores
their solutions to avoid repeated calculations.

Goal:

1. Greedy algorithm: Finds the best solution among a set of possible solutions.
2. Divide and conquer algorithm: Solves a problem by dividing it into smaller
subproblems, solving each subproblem independently, and then combining the
solutions to the subproblems to solve the original problem.
3. Dynamic programming algorithm: Solves a problem by breaking it down into
smaller subproblems and solving each subproblem recursively, storing the results.

Time complexity:

1. Greedy algorithm: O(n log n) or O(n), depending on the problem.
2. Divide and conquer algorithm: O(n log n) or O(n^2), depending on the problem.
3. Dynamic programming algorithm: O(n^2) or O(n^3), depending on the problem.
Space complexity:

1. Greedy algorithm: O(1) or O(n), depending on the problem.
2. Divide and conquer algorithm: O(n log n) or O(n^2), depending on the problem.
3. Dynamic programming algorithm: O(n^2) or O(n^3), depending on the problem.

Optimal solution:

1. Greedy algorithm: May or may not provide the optimal solution.
2. Divide and conquer algorithm: May or may not provide the optimal solution.
3. Dynamic programming algorithm: Guarantees the optimal solution.

Examples:

1. Greedy algorithm: Huffman coding, Kruskal's algorithm, Dijkstra's algorithm, etc.
2. Divide and conquer algorithm: Merge sort, Quick sort, binary search, etc.
3. Dynamic programming algorithm: Fibonacci series, Longest common subsequence,
Knapsack problem, etc.
In summary, the main differences among these algorithms are their approach,
goal, time and space complexity, and their ability to provide the optimal
solution. Greedy and divide and conquer algorithms are generally faster and
simpler but may not always provide the optimal solution, while the dynamic
programming algorithm guarantees the optimal solution but is slower and more
complex.

Greedy Algorithm:
A greedy algorithm is a method for solving optimization problems by making, at each step, the decision that yields the most evident and immediate benefit, irrespective of the final outcome. It is a simple, intuitive approach that is widely used in optimization problems.
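
As an illustration, here is a minimal C++ sketch (added for clarity, not part of the original notes) of the activity selection problem solved greedily: sort the activities by finish time, then repeatedly pick the first activity that starts after the previously chosen one finishes.

#include <algorithm>
#include <iostream>
#include <vector>

// Greedy activity selection: picking the activity that finishes earliest
// is the locally optimal choice, and for this problem it also yields the
// globally optimal (maximum-size) set of non-overlapping activities.
int maxActivities(std::vector<std::pair<int, int>> acts) { // {start, finish}
    std::sort(acts.begin(), acts.end(),
              [](const auto& a, const auto& b) { return a.second < b.second; });
    int count = 0, lastFinish = -1;
    for (const auto& [start, finish] : acts) {
        if (start >= lastFinish) { // compatible with the last chosen activity
            ++count;
            lastFinish = finish;
        }
    }
    return count;
}

int main() {
    // Activities {start, finish}; the greedy choice selects (1,2), (3,4), (5,7).
    std::cout << maxActivities({{1, 2}, {3, 4}, {0, 6}, {5, 7}}) << "\n"; // 3
}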
Divide and conquer Algorithm:
Divide and conquer is an algorithmic paradigm in which the problem is solved using a Divide, Conquer, and Combine strategy. A typical divide and conquer algorithm solves a problem in three steps:
Divide: Divide the problem into smaller sub-problems.
Conquer: Solve the sub-problems recursively until they are small enough to solve directly.
Combine: Combine the solutions of the sub-problems to obtain the solution of the whole problem.
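
Merge sort follows this pattern exactly; the sketch below (an illustration added for clarity, not taken from the notes) annotates each of the three steps:

#include <algorithm>
#include <iostream>
#include <vector>

// Merge sort on the half-open range a[lo, hi).
void mergeSort(std::vector<int>& a, int lo, int hi) {
    if (hi - lo <= 1) return;          // base case: 0 or 1 element is already sorted
    int mid = lo + (hi - lo) / 2;      // Divide: split the range into two halves
    mergeSort(a, lo, mid);             // Conquer: sort each half recursively
    mergeSort(a, mid, hi);
    std::vector<int> merged;           // Combine: merge the two sorted halves
    int i = lo, j = mid;
    while (i < mid && j < hi)
        merged.push_back(a[i] <= a[j] ? a[i++] : a[j++]);
    while (i < mid) merged.push_back(a[i++]);
    while (j < hi) merged.push_back(a[j++]);
    std::copy(merged.begin(), merged.end(), a.begin() + lo);
}

int main() {
    std::vector<int> v{5, 2, 9, 1, 7};
    mergeSort(v, 0, static_cast<int>(v.size()));
    for (int x : v) std::cout << x << ' '; // 1 2 5 7 9
}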
Dynamic Programming:
Dynamic programming is mainly an optimization over plain recursion. Wherever we see a recursive solution that makes repeated calls for the same inputs, we can optimize it using dynamic programming. The idea is simply to store the results of subproblems so that we do not have to re-compute them when needed later. This simple optimization can reduce the time complexity from exponential to polynomial.
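
The classic example is the Fibonacci sequence: naive recursion takes exponential time because it recomputes the same values over and over, while memoization brings it down to linear time. A minimal C++ sketch (added for illustration, not from the original notes):

#include <iostream>
#include <vector>

// Naive recursion: fib(n - 2) is recomputed inside fib(n - 1), giving O(2^n) time.
long long fibNaive(int n) {
    return n < 2 ? n : fibNaive(n - 1) + fibNaive(n - 2);
}

// Memoized version: each subproblem is computed once and stored, giving O(n) time.
long long fibMemo(int n, std::vector<long long>& memo) {
    if (n < 2) return n;
    if (memo[n] != -1) return memo[n];  // reuse the stored result
    return memo[n] = fibMemo(n - 1, memo) + fibMemo(n - 2, memo);
}

int main() {
    int n = 40;
    std::vector<long long> memo(n + 1, -1);
    std::cout << fibMemo(n, memo) << "\n"; // 102334155, computed with only O(n) work
}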

Greedy Algorithm vs Divide and Conquer Algorithm vs Dynamic Programming Algorithm:

No. | Greedy Algorithm | Divide and Conquer | Dynamic Programming
1 | Follows a top-down approach. | Follows a top-down approach. | Follows a bottom-up approach.
2 | Used to solve optimization problems. | Used to solve decision problems. | Used to solve optimization problems.
3 | The optimal solution is generated without revisiting previously generated solutions; thus it avoids re-computation. | The solution of a subproblem may be computed recursively more than once. | The solution of each subproblem is computed once and stored in a table for later use.
4 | May or may not generate an optimal solution. | Used to obtain a solution to the given problem; it does not aim for the optimal solution. | Always generates an optimal solution (for problems with optimal substructure).
5 | Iterative in nature. | Recursive in nature. | Recursive in nature.
6 | Efficient and faster than divide and conquer; for instance, single-source shortest paths with Dijkstra's algorithm take O(E log V) time. | Less efficient and slower. | More efficient than plain recursion but slower than greedy; for instance, single-source shortest paths with the Bellman-Ford algorithm take O(VE) time.
7 | Extra memory is not required. | Some memory is required. | More memory is required to store subproblem results for later use.
8 | Examples: Fractional knapsack problem, Activity selection problem, Job sequencing problem. | Examples: Merge sort, Quick sort, Strassen's matrix multiplication. | Examples: 0/1 Knapsack, All-pairs shortest path, Matrix-chain multiplication.
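
Row 8 above pairs the fractional knapsack (greedy) with the 0/1 knapsack (dynamic programming). A minimal bottom-up sketch of the 0/1 knapsack in C++ (an illustration added for clarity, not from the original notes), showing the "stored in a table" idea from row 3:

#include <algorithm>
#include <iostream>
#include <vector>

// 0/1 knapsack, bottom-up: dp[w] holds the best value achievable with
// capacity w using the items processed so far.
int knapsack(const std::vector<int>& wt, const std::vector<int>& val, int cap) {
    std::vector<int> dp(cap + 1, 0);
    for (int i = 0; i < static_cast<int>(wt.size()); ++i)
        for (int w = cap; w >= wt[i]; --w) // iterate downward so each item is used at most once
            dp[w] = std::max(dp[w], dp[w - wt[i]] + val[i]);
    return dp[cap];
}

int main() {
    // weights {1, 3, 4}, values {15, 20, 30}, capacity 4 -> best is items 0 and 1 (value 35)
    std::cout << knapsack({1, 3, 4}, {15, 20, 30}, 4) << "\n"; // 35
}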
The following table summarizes common data structures and algorithms together with their time and space complexities. It is a handy quick reference for DSA placement preparation:

Data Structure / Algorithm | Time Complexity | Space Complexity

Arrays
Access | O(1) | O(1)
Search | O(n) | O(1)
Insertion (at the end) | O(1) | O(1)
Insertion (at an arbitrary position) | O(n) | O(1)
Deletion (at the end) | O(1) | O(1)
Deletion (at an arbitrary position) | O(n) | O(1)

Linked Lists
Access | O(n) | O(1)
Search | O(n) | O(1)
Insertion (at the end, with a tail pointer) | O(1) | O(1)
Insertion (at an arbitrary position, given the node) | O(1) | O(1)
Deletion (at the end) | O(n) | O(1)
Deletion (at an arbitrary position, given the node) | O(1) | O(1)

Stacks
Access | O(n) | O(n)
Search | O(n) | O(n)
Insertion (Push) | O(1) | O(1)
Deletion (Pop) | O(1) | O(1)

Queues
Access | O(n) | O(n)
Search | O(n) | O(n)
Insertion (Enqueue) | O(1) | O(1)
Deletion (Dequeue) | O(1) | O(1)

Hash Tables
Insertion | O(1) avg, O(n) worst | O(n)
Deletion | O(1) avg, O(n) worst | O(n)
Access | O(1) avg, O(n) worst | O(n)

Binary Search Trees
Search | O(log n) avg, O(n) worst | O(log n)
Insertion | O(log n) avg, O(n) worst | O(log n)
Deletion | O(log n) avg, O(n) worst | O(log n)

Heaps (Binary)
Insertion (Push) | O(log n) | O(n)
Deletion (Pop) | O(log n) | O(n)

Sorting Algorithms
Bubble Sort | O(n^2) | O(1)
Selection Sort | O(n^2) | O(1)
Insertion Sort | O(n^2) | O(1)
Merge Sort | O(n log n) | O(n)
Quick Sort | O(n log n) avg, O(n^2) worst | O(log n)

Graph Traversal
Breadth-First Search | O(V + E) | O(V)
Depth-First Search | O(V + E) | O(V)

Please note that the time and space complexities provided in the table are simplified
and represent average and worst-case scenarios. Depending on the exact
implementation and specific scenarios, these complexities can vary. This table is
meant to provide a quick reference and overview of the complexities of common
data structures and algorithms encountered in DSA placement interviews.
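
As a quick sanity check of the hash-table rows, here is a tiny C++ sketch (added for illustration, not part of the original notes) using std::unordered_map, whose insert, find, and erase operations are O(1) on average:

#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> freq; // a hash table
    for (const std::string w : {"a", "b", "a"}) ++freq[w]; // insertion/update: O(1) average each
    std::cout << freq["a"] << "\n";       // access: prints 2
    freq.erase("b");                      // deletion: O(1) average
    std::cout << freq.count("b") << "\n"; // search: prints 0
}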
