Sets & Graphs
Sets and disjoint set union: Union and Find operations.
Graphs: adjacency matrix and adjacency list representations.
What are Sets?
• A set is fundamentally a collection of distinct elements.
• For our purposes, we will primarily consider elements as numbers,
typically ranging from 1 to n.
• All the sets we discuss are pairwise disjoint, meaning no element
exists in more than one set simultaneously.
• For example: If n=10, elements could be partitioned into three
disjoint sets: S1 = {1, 7, 8, 9}, S2 = {2, 5, 10}, and S3 = {3, 4, 6}.
How are Sets Represented?
• Sets can be efficiently represented using a forest, which is simply a
collection of trees.
• A unique aspect of this representation is that nodes are linked from
children to their parent, rather than the usual parent-to-child
direction. This "inverted" linking simplifies the implementation of
set operations.
• See Figure 2.17 for a visual example of this forest representation.
• Data Representation for Set Names: Figure 2.19 illustrates how external set
names can effectively point to the root of their respective trees, providing a
convenient way to refer to each set.
Core Set Operations
• Union (Si ∪ Sj): This operation combines two disjoint sets, Si and Sj, into
a single, unified set. After the union, the original individual sets Si
and Sj are replaced by their combined form.
• Ex: The union of S1 and S2 would result in a new set {1, 7, 8, 9, 2, 5, 10}.
• Figure 2.18 demonstrates various possible representations for the union of
two sets.
• Find(i): This operation determines which set contains element i.
• For example: Find(4) would return S3, indicating that element 4 is part of set S3;
similarly, Find(9) would return S1.
Simple Implementations (Initial Approach)
• Data Structure: Sets can be represented using a simple array p[1:n],
where each p[i] entry stores the parent of element i.
• Identifying Roots: The root nodes of the trees in our forest are
identified by having a value of -1 in their p[i] entry.
• Figure 2.20 provides a clear illustration of how the sets S1, S2, and
S3 are represented using this array-based structure.
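The array layout can be sketched as follows. This is a minimal illustration, not a transcription of Figure 2.20: the choice of roots (1 for S1, 5 for S2, 3 for S3) and the flat tree shapes are assumptions, since the text only names the members of each set.

```python
# Parent array for S1 = {1, 7, 8, 9}, S2 = {2, 5, 10}, S3 = {3, 4, 6}.
# Index 0 is unused so that elements keep their 1-based names.
# Assumed (hypothetical) roots: 1, 5, and 3; every other element
# points directly at the root of its set.
n = 10
p = [0] * (n + 1)
for root, members in ((1, (7, 8, 9)), (5, (2, 10)), (3, (4, 6))):
    p[root] = -1              # -1 marks a root
    for m in members:
        p[m] = root           # child-to-parent link

# The roots are exactly the entries holding -1.
roots = [i for i in range(1, n + 1) if p[i] == -1]
```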
• SimpleUnion(i, j): To unite the trees whose roots are i and j, the simplest
approach is to set p[i] := j, which makes the tree rooted at i a subtree of the
tree rooted at j.
• SimpleFind(i): This operation involves traversing the parent links in the p array,
starting from element i, until a -1 (which signifies the root) is encountered.
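The two operations can be sketched in a few lines. The function names and the sample parent array below are illustrative assumptions (the array assumes roots 1, 5, and 3 for S1, S2, and S3).

```python
def simple_union(p, i, j):
    """Make the tree rooted at i a subtree of the tree rooted at j.

    Both i and j are assumed to be roots of different trees.
    """
    p[i] = j


def simple_find(p, i):
    """Follow parent links from i until a root is reached.

    A root holds -1; the loop stops at any non-positive entry,
    so index 0 must stay unused.
    """
    while p[i] > 0:
        i = p[i]
    return i


# Index 0 is unused; elements are 1..10.
p = [0, -1, 5, -1, 3, -1, 3, 1, 1, 1, 5]
root_of_4 = simple_find(p, 4)          # 3, the root of S3
root_of_9 = simple_find(p, 9)          # 1, the root of S1
simple_union(p, 1, 5)                  # S1 becomes a subtree of S2
root_of_9_after = simple_find(p, 9)    # 5, via 9 -> 1 -> 5
```

Here a set is identified by its root element, so Find returns a root rather than a name such as S3.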
• Performance Issue: While conceptually straightforward, this simple
approach can unfortunately lead to "degenerate trees," which are
long and thin, resembling linked lists.
• The Union operation remains very fast, taking constant time (O(1)).
• However, the Find operation can become slow, taking O(n) time in the
worst case (e.g., finding the element at the very bottom of a long chain, as
shown in Figure 2.21).
• A sequence of n-1 unions followed by n finds can therefore take O(n^2)
time overall.
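To make the quadratic behaviour concrete, the sketch below builds a degenerate chain with the simple union and counts the parent links followed by the finds. The union sequence union(1,2), union(2,3), ..., union(n-1,n) and the helper name are assumptions chosen to produce the worst case.

```python
def simple_find_steps(p, i):
    """Return (root, number of parent links followed) for element i."""
    steps = 0
    while p[i] > 0:
        i = p[i]
        steps += 1
    return i, steps


n = 8
p = [0] + [-1] * n            # n singleton sets; index 0 unused

# union(1,2), union(2,3), ..., union(n-1,n) with the simple rule
# p[i] = j: each step hangs the current root under the next element.
for i in range(1, n):
    p[i] = i + 1

# Element i sits n - i links below the root, so n finds follow
# (n-1) + (n-2) + ... + 0 = n(n-1)/2 links in total.
total_steps = sum(simple_find_steps(p, i)[1] for i in range(1, n + 1))
```

For n = 8 the finds follow 28 links in total, which grows as n(n-1)/2.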
Improving Union: Weighted Union (Algorithm 2.14):
• The primary goal of Weighted Union is to
prevent the formation of degenerate trees and,
consequently, improve the performance of Find
operations.
• Weighting Rule : When uniting two trees with
roots i and j, the rule dictates that the tree with
fewer nodes should become a subtree of the
tree with more nodes. This strategy helps keep
the trees balanced.
• Implementation: To facilitate the weighting rule,
the p field of each root stores the negative of
the number of nodes in its tree
(e.g., p[i] = -count[i]).
• The WeightedUnion(i, j) operation still maintains
its O(1) time complexity.
• The SimpleFind algorithm itself remains
unchanged.
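A minimal sketch of the weighting rule, assuming the negative-count convention described above (index 0 unused, p[root] = -(number of nodes)). The function name and the tie-breaking choice are illustrative.

```python
def weighted_union(p, i, j):
    """Unite the trees rooted at i and j using the weighting rule.

    p[i] and p[j] hold the negated node counts of the two trees.
    """
    total = p[i] + p[j]       # negated size of the combined tree
    if p[i] > p[j]:           # i is less negative, so i has fewer nodes
        p[i] = j              # i becomes a child of j
        p[j] = total
    else:                     # j has fewer nodes, or the sizes are equal
        p[j] = i              # j becomes a child of i
        p[i] = total


p = [0] + [-1] * 4            # four singleton sets {1}, {2}, {3}, {4}
weighted_union(p, 1, 2)       # tie: 2 hangs under 1, p[1] becomes -2
weighted_union(p, 3, 4)       # tie: 4 hangs under 3, p[3] becomes -2
weighted_union(p, 1, 3)       # tie: 3 hangs under 1, p[1] becomes -4
```

Because every root holds a negative value, a find loop that stops at the first non-positive entry works unchanged.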
• Figure 2.22 illustrates how trees are structured
and united when the weighting rule is applied.
• Improved Find Performance: With the weighting
rule, a tree with m nodes has height at most
floor(log2 m) + 1, so a Find takes O(log m) time
(where m is the number of elements in the tree).
• The overall time complexity for u union
operations and f find operations becomes
O(u+f log u).
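The logarithmic height can be checked on a small case. The sketch below merges 16 singletons in rounds of equal-sized pairs, which is an assumed worst-case pattern for weighted union, and compares the tallest tree against the standard bound floor(log2 m) + 1. Helper names are hypothetical.

```python
import math


def weighted_union(p, i, j):
    """Weighting rule: the smaller tree becomes a subtree of the larger."""
    total = p[i] + p[j]
    if p[i] > p[j]:
        p[i], p[j] = j, total
    else:
        p[j], p[i] = i, total


def levels(p, i):
    """Number of nodes on the path from i up to its root (root counts as 1)."""
    count = 1
    while p[i] > 0:
        i = p[i]
        count += 1
    return count


n = 16
p = [0] + [-1] * n
roots = list(range(1, n + 1))

# Merge equal-sized trees pairwise until one tree remains.
while len(roots) > 1:
    merged = []
    for a, b in zip(roots[0::2], roots[1::2]):
        weighted_union(p, a, b)   # on a tie, a stays the root
        merged.append(a)
    roots = merged

height = max(levels(p, i) for i in range(1, n + 1))
bound = math.floor(math.log2(n)) + 1
```

Here both height and bound come out to 5, so this merge pattern reaches the bound exactly.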
Graph
What are Graphs?
• Graphs are incredibly versatile mathematical structures used to model
relationships between various objects.
• Components: A graph G is formally defined by two sets:
• Vertices (V): A finite, non-empty set of points or nodes that represent the
objects.
• Edges (E): A set of connections or pairs of vertices that represent the
relationships between the objects.
Types of Graphs
• Undirected Graph: In an undirected graph, edges are unordered pairs (e.g., (u,v) is
considered the same as (v,u)). This implies that connections are bidirectional,
allowing traversal in either direction.
• Directed Graph (Digraph): Conversely, in a directed graph, edges are ordered pairs
(u,v). This signifies a specific connection or flow from u (the "tail") to v (the "head"),
meaning traversal is only allowed in the specified direction.
• Figure 2.25 provides clear examples, showcasing G1 and G2 as undirected graphs, and
G3 as a directed graph.
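One way to mirror the unordered/ordered distinction in code is to store undirected edges as frozensets and directed edges as tuples. The three-vertex graphs below are made-up examples, not the graphs of Figure 2.25.

```python
V = {1, 2, 3}

# Undirected: {u, v} has no order, so (1, 2) and (2, 1) are the same edge.
undirected_E = {frozenset((1, 2)), frozenset((2, 3))}

# Directed: (tail, head) is ordered, so (2, 3) does not imply (3, 2).
directed_E = {(1, 2), (2, 1), (2, 3)}

same_edge = frozenset((2, 1)) in undirected_E     # True
reverse_present = (3, 2) in directed_E            # False
```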
Key Graph Terminology
• Incident: An edge (u,v) is said to be incident to both vertex u and vertex v, indicating it touches or
connects both.
• Adjacent: Two vertices, u and v, are considered adjacent if there exists an edge directly connecting
them.
• Self-edge (Self-loop): A self-edge or self-loop is a special type of edge that connects a vertex to itself
(e.g., (u,u)).
• Figure 2.26(a) illustrates a graph that includes a self-edge.
• Multigraph: A multigraph is a type of graph that permits the existence of multiple distinct edges
between the same two vertices.
• Figure 2.26(b) displays a multigraph, highlighting the presence of multiple edges between a single
pair of vertices.
• Degree (Undirected): For an undirected graph, the degree of a vertex is simply the
total number of edges connected to it.
• In-degree (Directed): In a directed graph, the in-degree of a vertex is the count of
edges where that vertex serves as the head (i.e., edges pointing into it).
• Out-degree (Directed): Conversely, the out-degree of a vertex in a directed graph is
the count of edges where that vertex acts as the tail (i.e., edges pointing out from
it).
• Path: A path is defined as a sequence of vertices where each consecutive pair in the
sequence is connected by an edge.
• Length of a Path: The length of a path is determined by the number of edges it
contains.
• Simple Path: A simple path is a path in which all vertices are distinct, with the
possible exception of the first and last vertices if it forms a cycle.
• Cycle: A cycle is a simple path that begins and ends at the same vertex, forming a
closed loop.
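The degree and path definitions above translate directly into short helpers. This is a sketch over plain edge lists; the function names and sample graphs are assumptions, and self-loops are not handled.

```python
from collections import Counter


def degrees(edges):
    """Degree of each vertex in an undirected graph given as (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def in_out_degrees(edges):
    """In- and out-degree for a directed graph given as (tail, head) pairs."""
    indeg, outdeg = Counter(), Counter()
    for tail, head in edges:
        outdeg[tail] += 1
        indeg[head] += 1
    return indeg, outdeg


def is_path(path, edges):
    """True if each consecutive pair in `path` is joined by an undirected edge."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset((a, b)) in edge_set for a, b in zip(path, path[1:]))


def is_simple(path):
    """All vertices distinct, except that the first may equal the last."""
    inner = path[:-1] if len(path) > 1 and path[0] == path[-1] else path
    return len(set(inner)) == len(inner)


und = [(1, 2), (1, 3), (2, 3), (3, 4)]
deg = degrees(und)                                   # vertex 3 has degree 3
indeg, outdeg = in_out_degrees([(1, 2), (2, 1), (2, 3)])
length = len([1, 2, 3, 4]) - 1                       # path length = edge count
```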
• Connected Component (Undirected): In an undirected graph, a connected component is a maximal
connected subgraph, meaning it's a part of the graph where every vertex can be reached from every
other vertex within that specific subgraph.
• Figure 2.28 presents a graph composed of two distinct connected components, labeled H1 and H2.
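Connected components can be found with a breadth-first search from each unvisited vertex. The sketch below is one possible approach, not a method given in the source; the eight-vertex edge list is a made-up example with two components.

```python
from collections import deque


def connected_components(n, edges):
    """Return the connected components of an undirected graph on 1..n."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    seen, components = set(), []
    for start in range(1, n + 1):
        if start in seen:
            continue
        # Everything reachable from `start` forms one maximal component.
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:
            u = queue.popleft()
            comp.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(sorted(comp))
    return components


edges = [(1, 2), (1, 3), (2, 4), (3, 4), (5, 6), (6, 7), (7, 8)]
components = connected_components(8, edges)
```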
• Strongly Connected Component (Directed): For a directed graph, a strongly connected component is
a maximal subgraph in which, for every pair of vertices u and v, there is a directed path from u to v
and also one from v to u.
• Figure 2.29 illustrates a directed graph and highlights its individual strongly connected components.
• Tree: A tree, in graph theory, is defined as a connected undirected graph that contains no cycles.
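Returning to strongly connected components: the definition can be applied literally by grouping vertices that reach each other. The sketch below does this with one depth-first search per vertex, which costs O(n(n+e)); linear-time algorithms such as Tarjan's or Kosaraju's exist but are not shown. The function names and the three-vertex digraph are assumptions for illustration.

```python
def reachable(adj, s):
    """Set of vertices reachable from s by directed paths (including s)."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def strongly_connected_components(vertices, edges):
    """Group vertices u, v together when each can reach the other."""
    adj = {v: [] for v in vertices}
    for tail, head in edges:
        adj[tail].append(head)
    reach = {v: reachable(adj, v) for v in vertices}

    components, assigned = [], set()
    for v in sorted(vertices):
        if v in assigned:
            continue
        comp = sorted(w for w in reach[v] if v in reach[w])
        components.append(comp)
        assigned.update(comp)
    return components


sccs = strongly_connected_components({1, 2, 3}, [(1, 2), (2, 1), (2, 3)])
```

In this example vertices 1 and 2 reach each other, while 3 has no outgoing edge and forms a component on its own.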
How are Graphs Represented in Memory?
Adjacency Matrix
• This representation uses an n x n array
(where n is the number of vertices).
• An entry a[i,j] is set to 1 if an edge exists
from vertex i to vertex j, and 0 otherwise.
• For undirected graphs, the adjacency
matrix will always be symmetric (a[i,j] will be
equal to a[j,i]).
• Space Requirement: This method requires
O(n^2) bits of memory. It is most suitable for
dense graphs, which have a relatively high
number of edges compared to vertices.
• Figure 2.30 displays the adjacency matrices
for graphs G1, G3, and G4.
• Pros: Checking for the existence of a
specific edge is extremely fast, taking
constant time (O(1)).
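A minimal sketch of building an adjacency matrix from an edge list. Vertices are assumed to be numbered 1..n and mapped to 0-based rows and columns; the function name and sample graphs are illustrative, not the graphs of Figure 2.30.

```python
def adjacency_matrix(n, edges, directed=False):
    """Build an n x n 0/1 matrix; vertex k maps to row/column k - 1."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u - 1][v - 1] = 1
        if not directed:
            a[v - 1][u - 1] = 1   # mirror entry keeps the matrix symmetric
    return a


digraph = adjacency_matrix(3, [(1, 2), (2, 1), (2, 3)], directed=True)
undirected = adjacency_matrix(3, [(1, 2), (2, 3)])

has_edge_2_3 = digraph[1][2] == 1        # O(1) edge test
is_symmetric = all(
    undirected[i][j] == undirected[j][i]
    for i in range(3) for j in range(3)
)
```

For an undirected graph, the degree of a vertex is the sum of its row.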
Adjacency Lists
• In this approach, each vertex i is associated
with a linked list that contains all the vertices
j to which i is directly connected.
• Space Requirement: The space needed is
O(n+e), where n is the number of vertices
and e is the number of edges. This makes it a
much more memory-efficient choice for
sparse graphs.
• Figure 2.31 demonstrates how adjacency
lists are constructed for graphs G1, G3, and G4.
• Pros: It is highly efficient for traversing or
finding all neighbors of a particular vertex.
• Cons: Determining if a specific edge exists
requires traversing a list, which can take
O(degree of vertex) time in the worst case.
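The same edge list can be turned into adjacency lists. In this sketch, Python lists stand in for the linked lists described above, and the function names are assumptions.

```python
def adjacency_lists(n, edges, directed=False):
    """Map each vertex 1..n to the list of vertices adjacent to it."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)      # an undirected edge appears on both lists
    return adj


def has_edge(adj, u, v):
    """Edge test by scanning u's list: O(degree of u) in the worst case."""
    return v in adj[u]


adj = adjacency_lists(3, [(1, 2), (2, 1), (2, 3)], directed=True)
neighbours_of_2 = adj[2]          # direct access to all neighbours of 2
```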
• Inverse Adjacency Lists: Figure 2.33 showcases inverse adjacency lists for G3, which list
vertices from which an edge points to a given vertex.
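Inverse adjacency lists are built like ordinary adjacency lists with the roles of tail and head swapped. A minimal sketch, using an assumed three-vertex digraph and a hypothetical function name:

```python
def inverse_adjacency_lists(n, edges):
    """Map each vertex to the list of vertices that have an edge into it."""
    inv = {v: [] for v in range(1, n + 1)}
    for tail, head in edges:
        inv[head].append(tail)    # record the source on the target's list
    return inv


inv = inverse_adjacency_lists(3, [(1, 2), (2, 1), (2, 3)])
in_degree_of_3 = len(inv[3])      # in-degree is the length of the list
```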
• Orthogonal List Representation: Figure 2.34 illustrates an orthogonal list
representation for G3, providing a flexible way to represent both incoming
and outgoing edges.
Adjacency Multilists
• Unlike adjacency lists, which store
each undirected edge (u,v) twice,
this representation uses exactly one
node per edge; that node is linked
into two lists, the one for vertex u
and the one for vertex v.
• It typically includes a "mark" field
to indicate whether an edge has
already been examined, which is
useful for certain graph
algorithms.
• Figure 2.35 displays the structure
of adjacency multilists for graph
G1.
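A rough sketch of a multilist in which each edge is stored as one node object that is linked into the lists of both of its endpoints. The field names (mark, next_u, next_v) and helper functions are assumptions rather than the layout of Figure 2.35, and self-loops are not supported.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class EdgeNode:
    u: int
    v: int
    mark: bool = False                      # set once the edge is examined
    next_u: Optional["EdgeNode"] = None     # next edge on u's list
    next_v: Optional["EdgeNode"] = None     # next edge on v's list


def build_multilist(n, edges):
    """Create one node per edge and thread it onto both endpoint lists."""
    head = {x: None for x in range(1, n + 1)}
    nodes = []
    for u, v in edges:
        node = EdgeNode(u, v, next_u=head[u], next_v=head[v])
        head[u] = node                      # push onto the front of u's list
        head[v] = node                      # and onto the front of v's list
        nodes.append(node)
    return head, nodes


def incident_edges(head, x):
    """Yield every edge node on vertex x's list."""
    node = head[x]
    while node is not None:
        yield node
        node = node.next_u if node.u == x else node.next_v


# Complete graph on four vertices (a made-up example).
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
head, nodes = build_multilist(4, edges)

# Mark edge (1, 2) while walking vertex 1's list ...
for e in incident_edges(head, 1):
    if (e.u, e.v) == (1, 2):
        e.mark = True
# ... and the mark is visible from vertex 2's list, since it is the same node.
marked_from_2 = [(e.u, e.v) for e in incident_edges(head, 2) if e.mark]
```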
Weighted Edges:
• It's common for edges in a graph to have associated numerical values, often
referred to as "weights" (e.g., representing distance, cost, or capacity).
• This weight information is stored directly alongside the edge details within
the chosen graph data structure.
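With adjacency lists, one simple option is to store each weight next to the neighbour it belongs to. The sketch below assumes edges are given as (u, v, weight) triples; the function name and the sample weights are made up.

```python
def weighted_adjacency_lists(n, edges, directed=False):
    """Map each vertex 1..n to a list of (neighbour, weight) pairs."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v, w in edges:
        adj[u].append((v, w))
        if not directed:
            adj[v].append((u, w))   # same weight in both directions
    return adj


adj = weighted_adjacency_lists(3, [(1, 2, 4.0), (2, 3, 1.5)])
total_weight_at_2 = sum(w for _, w in adj[2])
```

In an adjacency matrix, the weight would instead replace the 1 in entry a[i,j], with some agreed value standing for "no edge".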