Algorithm analysis and design
Lecture-09
Bucket Sort
Basic idea:
if there are n input elements, we use n buckets
divide [0, 1) evenly into n consecutive sub-intervals [0, 1/n), [1/n, 2/n), ..., [(n−1)/n, 1)
(these sub-intervals are the buckets)
given some element A[i] ∈ [0, 1), throw it into the bucket with index ⌊n · A[i]⌋
hope that input is distributed evenly among buckets
sort buckets separately and concatenate results
Input A = A[1],...,A[n] with A[i] ∈ [0,1) drawn uniformly at random
Need auxiliary array B[0], . . . , B[n − 1] of linked lists
Bucket-Sort(A)
n ← length(A)
for i ← 1 to n do
    insert A[i] into list B[⌊n · A[i]⌋]
end for
for i ← 0 to n − 1 do
    sort list B[i] with insertion sort
end for
concatenate lists B[0], ..., B[n − 1] together in order
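The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes plain Python lists in place of linked lists, returns a sorted copy instead of sorting in place, and adds a `min(n - 1, ...)` guard (not part of the pseudocode) in case floating-point rounding pushes an index up to n. The names `insertion_sort` and `bucket_sort` are chosen here for illustration.

```python
def insertion_sort(items):
    """Sort a list in place with insertion sort and return it."""
    for j in range(1, len(items)):
        key = items[j]
        i = j - 1
        # Shift every larger element one slot to the right.
        while i >= 0 and items[i] > key:
            items[i + 1] = items[i]
            i -= 1
        items[i + 1] = key
    return items


def bucket_sort(a):
    """Return a sorted copy of a, assuming every value lies in [0, 1)."""
    n = len(a)
    buckets = [[] for _ in range(n)]  # B[0], ..., B[n - 1]
    for x in a:
        # Bucket index floor(n * x); the min() guard is an added safety net.
        buckets[min(n - 1, int(n * x))].append(x)
    result = []
    for bucket in buckets:
        result.extend(insertion_sort(bucket))  # sort, then concatenate in order
    return result
```

Calling `bucket_sort` on a list of floats in [0, 1) should give the same result as Python's built-in `sorted` on that list.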
Claim: expected running time is O(n)
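The claim can be justified with a short calculation, sketched here under the assumption that the n inputs are independent and uniform on [0, 1). The symbol n_i is notation introduced for this sketch: it denotes the number of elements that land in bucket B[i]. Insertion sort on bucket i costs O(n_i²), and each n_i follows a binomial distribution with n trials and success probability 1/n.

```latex
\begin{align*}
T(n) &= \Theta(n) + \sum_{i=0}^{n-1} O(n_i^2), \\
n_i &\sim \mathrm{Binomial}(n, 1/n), \\
\mathbb{E}[n_i^2] &= \mathrm{Var}[n_i] + \mathbb{E}[n_i]^2
  = \left(1 - \tfrac{1}{n}\right) + 1 = 2 - \tfrac{1}{n}, \\
\mathbb{E}[T(n)] &= \Theta(n) + n \cdot O\!\left(2 - \tfrac{1}{n}\right) = \Theta(n).
\end{align*}
```

The Θ(n) term covers distributing the elements and concatenating the lists; the sum covers sorting the buckets.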
Example
10 input elements, thus buckets [0, 1/10), [1/10, 2/10), ..., [9/10, 1)
After sorting buckets (/ marks an empty list):
 i   A[i]  |  j   B[j]
 1   0.32  |  0   0.02
 2   0.12  |  1   0.12
 3   0.78  |  2   0.22
 4   0.55  |  3   0.32
 5   0.91  |  4   0.41
 6   0.22  |  5   0.55 → 0.59
 7   0.41  |  6   /
 8   0.59  |  7   0.72 → 0.78
 9   0.72  |  8   /
10   0.02  |  9   0.91
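The example can be reproduced with a few lines of Python. This is only a sketch: Python lists stand in for the linked lists, and `list.sort()` stands in for insertion sort, which does not change the resulting buckets.

```python
A = [0.32, 0.12, 0.78, 0.55, 0.91, 0.22, 0.41, 0.59, 0.72, 0.02]
n = len(A)

# Distribute: element x goes to bucket floor(n * x).
B = [[] for _ in range(n)]
for x in A:
    B[int(n * x)].append(x)

# Sort each bucket (stand-in for insertion sort).
for bucket in B:
    bucket.sort()

# Show each bucket, then concatenate them in order.
for j, bucket in enumerate(B):
    print(j, bucket)
result = [x for bucket in B for x in bucket]
```

Buckets 6 and 8 stay empty, bucket 5 holds 0.55 and 0.59, and bucket 7 holds 0.72 and 0.78 after sorting (0.78 arrives first in the input).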
Best Case Complexity O(n)
It occurs when the elements are distributed uniformly among the buckets, so that each bucket holds nearly the same number of elements.
When the elements within each bucket are already sorted, the cost drops further, because insertion sort runs in linear time on sorted input.
If insertion sort is used to sort the bucket elements, the overall best-case complexity is linear, i.e. O(n + k), where k is the number of buckets (here k = n).
O(n) covers distributing the elements into the buckets and sorting them with an algorithm that is linear in its best case; O(k) covers creating and concatenating the k buckets.
Average Case Complexity O(n)
It occurs when the array's elements are drawn uniformly at random from [0, 1).
Bucket sort can still run in linear time even if the elements are not distributed uniformly.
This holds as long as the sum of the squares of the bucket sizes is linear in the total number of elements.
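The sum-of-squares condition can be checked empirically. The sketch below is illustrative only: the helper name `sum_of_squared_bucket_sizes`, the fixed seed, and the three test distributions are assumptions made for this example. It compares uniform input, a non-uniform input with bounded density (the average of two uniform draws), and a clustered input where every element falls into bucket 0.

```python
import random


def sum_of_squared_bucket_sizes(a):
    """Return the sum over all n buckets of (bucket size)^2 for input a in [0, 1)."""
    n = len(a)
    sizes = [0] * n
    for x in a:
        sizes[min(n - 1, int(n * x))] += 1
    return sum(s * s for s in sizes)


rng = random.Random(0)  # fixed seed, chosen arbitrarily
n = 10_000

uniform = [rng.random() for _ in range(n)]
triangular = [(rng.random() + rng.random()) / 2 for _ in range(n)]  # non-uniform
clustered = [rng.random() / (2 * n) for _ in range(n)]              # all in bucket 0

# Divide by n: a ratio that stays bounded means the sum is linear in n.
ratio_uniform = sum_of_squared_bucket_sizes(uniform) / n
ratio_triangular = sum_of_squared_bucket_sizes(triangular) / n
ratio_clustered = sum_of_squared_bucket_sizes(clustered) / n
print(ratio_uniform, ratio_triangular, ratio_clustered)
```

For the uniform input the ratio should be close to 2, and for the triangular input it should be a small constant as well, so both satisfy the condition. For the clustered input the ratio equals n, which is the quadratic case.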
Worst Case Complexity O(n²)
When the elements in the array are close in value, they are likely to be placed in the same bucket.
As a result, some buckets may contain many more elements than others.
The running time then depends on the algorithm used to sort each bucket.
If the elements within a bucket are also in reverse order, insertion sort hits its own worst case.
When insertion sort is used and all elements fall into a single bucket, the time complexity becomes O(n²).
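The gap between a well-spread input and a clustered one can be made concrete by counting the element shifts that insertion sort performs. The sketch below is a hypothetical illustration: the function names and the two test inputs are assumptions made for this example, with Python lists standing in for the linked lists.

```python
def insertion_sort_shifts(items):
    """Sort items in place and return the number of element shifts performed."""
    shifts = 0
    for j in range(1, len(items)):
        key = items[j]
        i = j - 1
        while i >= 0 and items[i] > key:
            items[i + 1] = items[i]
            i -= 1
            shifts += 1
        items[i + 1] = key
    return shifts


def bucket_sort_shifts(a):
    """Total insertion-sort shifts over all buckets for input a in [0, 1)."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        buckets[min(n - 1, int(n * x))].append(x)
    return sum(insertion_sort_shifts(b) for b in buckets)


n = 100
# One element per bucket: nothing to shift.
spread = [(i + 0.5) / n for i in range(n)]
# All values below 1/n, in descending order: everything lands in bucket 0.
clustered = [(n - i) / (n * (n + 1)) for i in range(n)]

print(bucket_sort_shifts(spread))     # 0
print(bucket_sort_shifts(clustered))  # 4950, i.e. n(n-1)/2
```

The clustered input forces a single insertion sort over all n elements in reverse order, which is where the quadratic bound comes from.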