Implementation of PageRank Algorithm
1. Introduction
PageRank is a link analysis algorithm developed by Larry Page and Sergey Brin, the
founders of Google. It is used to measure the importance of web pages by analyzing the
structure of incoming links. The basic idea is that a page is important if many important
pages link to it.
2. Theory of PageRank
The PageRank of a page A is defined by the formula:
PR(A) = (1 - d)/N + d * Σ [ PR(B) / L(B) ], summed over all pages B in M(A)
Where:
- PR(A) = PageRank of page A
- d = damping factor (usually 0.85)
- N = total number of pages
- M(A) = set of pages linking to A
- L(B) = number of outgoing links from page B
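The damping factor d is the probability that a user follows a link from the current page;
with probability 1 - d the user instead jumps to a page chosen uniformly at random. This
random jump prevents rank from becoming trapped in groups of pages that have no links
leading out of them.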
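As a small illustration of the formula, the sketch below computes a single update for one page from the current scores of the pages linking to it. The function name pagerank_update, its arguments, and the four-page scores used here are hypothetical, chosen only for this example.

```python
def pagerank_update(incoming, pr, out_degree, n_pages, d=0.85):
    """One application of PR(A) = (1 - d)/N + d * sum(PR(B) / L(B))."""
    # Each linking page B passes on its score divided by its number of out-links.
    link_share = sum(pr[b] / out_degree[b] for b in incoming)
    return (1 - d) / n_pages + d * link_share

# Four pages that all start at 1/4; the target page is linked from
# A (2 out-links), B (1 out-link) and D (1 out-link).
pr = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
out_degree = {"A": 2, "B": 1, "C": 1, "D": 1}
new_score = pagerank_update(["A", "B", "D"], pr, out_degree, n_pages=4)
print(new_score)
```

A page with no incoming links gets an empty sum, so its score reduces to the random-jump term (1 - d)/N.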
3. Steps of the Algorithm
1. Initialize the PageRank of every page to 1/N.
2. Treat dangling nodes (pages with no outgoing links) as if they link to all pages
equally.
3. At each iteration, update the PageRank of each page using the formula.
4. Repeat until the values converge (total change between iterations < tolerance).
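The steps listed here can be sketched in plain Python, without any matrix library. This is a minimal illustration rather than a reference implementation: the function name pagerank_iterate and the three-page graph are hypothetical, and the graph is assumed to be a dict in which every page appears as a key.

```python
def pagerank_iterate(graph, d=0.85, tol=1e-6, max_iter=100):
    """Iterate the PageRank update on a dict mapping page -> list of out-links."""
    pages = list(graph)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}  # equal initial scores
    for _ in range(max_iter):
        # Rank held by dangling pages is spread evenly over all pages.
        dangling = sum(pr[p] for p in pages if not graph[p])
        new_pr = {p: (1 - d) / n + d * dangling / n for p in pages}
        # Every other page splits its rank equally among its out-links.
        for p in pages:
            for q in graph[p]:
                new_pr[q] += d * pr[p] / len(graph[p])
        # Convergence test: summed absolute change between iterations.
        change = sum(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if change < tol:
            break
    return pr

# Hypothetical three-page graph in which Z is a dangling node.
scores = pagerank_iterate({"X": ["Y", "Z"], "Y": ["Z"], "Z": []})
print(scores)
```

Because the dangling rank is redistributed rather than discarded, the scores still sum to 1 after every iteration.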
4. Example
Consider a graph with 4 pages: A, B, C, D
- A → B, C
- B → C
- C → A
- D → C
After running the algorithm (with damping factor d = 0.85), the PageRank scores converge
approximately to:
A = 0.3725, B = 0.1958, C = 0.3941, D = 0.0375
Thus, page C is the most important page in this network. Page D has no incoming links, so
it receives only the minimum score (1 - d)/N.
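Because the converged scores satisfy the linear system PR = (1 - d)/N + d * M * PR, the example can also be checked without iterating, by solving that system directly. The sketch below assumes NumPy is available; the matrix M is entered by hand from the four links of the example, with rows and columns ordered A, B, C, D.

```python
import numpy as np

d, N = 0.85, 4
# M[i, j] = 1 / L(j) if page j links to page i, else 0 (order: A, B, C, D).
M = np.array([
    [0.0, 0.0, 1.0, 0.0],  # A is linked from C
    [0.5, 0.0, 0.0, 0.0],  # B is linked from A (which has 2 out-links)
    [0.5, 1.0, 0.0, 1.0],  # C is linked from A, B and D
    [0.0, 0.0, 0.0, 0.0],  # D has no incoming links
])
# Solve (I - d*M) PR = (1 - d)/N * 1 for the stationary scores.
pr = np.linalg.solve(np.eye(N) - d * M, np.full(N, (1 - d) / N))
print(dict(zip("ABCD", pr.round(4))))
```

The direct solve is practical only for small graphs; for large link graphs the iterative method is the usual choice because it needs only repeated matrix-vector products.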
5. Python Implementation
import numpy as np

def page_rank_numpy(graph, damping=0.85, max_iter=100, tol=1e-6):
    # Assumes every page appears as a key in graph, even if it has no out-links
    nodes = list(graph.keys())
    N = len(nodes)
    node_index = {node: i for i, node in enumerate(nodes)}

    # Build the column-stochastic transition matrix:
    # M[i, j] is the probability of moving from page j to page i
    M = np.zeros((N, N))
    for node, links in graph.items():
        if links:
            for link in links:
                M[node_index[link], node_index[node]] = 1 / len(links)
        else:  # dangling node: treat it as linking to every page
            M[:, node_index[node]] = 1 / N

    # Initialize PR to 1/N for every page
    PR = np.ones(N) / N
    for _ in range(max_iter):
        new_PR = (1 - damping) / N + damping * (M @ PR)
        # Converged once the L1 change between iterations is below tol
        converged = np.linalg.norm(new_PR - PR, 1) < tol
        PR = new_PR
        if converged:
            break
    return {nodes[i]: PR[i] for i in range(N)}

# Example graph
graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"]
}
result = page_rank_numpy(graph)
print("PageRank Scores:")
for node, score in result.items():
    print(f"{node}: {score:.4f}")
6. Sample Output
PageRank Scores:
A: 0.3725
B: 0.1958
C: 0.3941
D: 0.0375
7. Applications and Advantages
- Used by search engines to rank web pages.
- Identifies influential nodes in social networks.
- Helps in citation analysis of research papers.
- Used in recommendation systems and link prediction.