0% found this document useful (0 votes)
117 views3 pages

PageRank Algorithm Implementation in Python

Uploaded by

laxmipandey
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
117 views3 pages

PageRank Algorithm Implementation in Python

Uploaded by

laxmipandey
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

Implementation of PageRank Algorithm

1. Introduction
PageRank is a link analysis algorithm developed by Larry Page and Sergey Brin, the
founders of Google. It is used to measure the importance of web pages by analyzing the
structure of incoming links. The basic idea is that a page is important if many important
pages link to it.

2. Theory of PageRank
The PageRank of a page A is defined using the formula:

PR(A) = (1 - d)/N + d * Σ [ PR(B) / L(B) ] for all pages B linking to A

Where:
- PR(A) = PageRank of page A
- d = damping factor (usually 0.85)
- N = total number of pages
- M(A) = set of pages linking to A
- L(B) = number of outgoing links from page B

The damping factor introduces the probability that a user randomly jumps to another page,
preventing the algorithm from getting stuck at dead ends.

3. Steps of the Algorithm


1. Initialize PageRank of all pages equally as 1/N.
2. At each iteration, update the PageRank of each page using the formula.
3. Repeat until values converge (difference < tolerance).
4. Handle dangling nodes (pages with no outgoing links) by assuming they link to all pages
equally.

4. Example
Consider a graph with 4 pages: A, B, C, D
- A → B, C
-B→C
-C→A
-D→C

After running the algorithm (with damping factor = 0.85), the PageRank scores converge
approximately to:
A = 0.3721, B = 0.1958, C = 0.3945, D = 0.0376

Thus, Page C is the most important page in this network.

5. Python Implementation

import numpy as np

def page_rank_numpy(graph, damping=0.85, max_iter=100, tol=1e-


6):
nodes = list([Link]())
N = len(nodes)
node_index = {node: i for i, node in enumerate(nodes)}

# Build adjacency matrix


M = [Link]((N, N))
for node, links in [Link]():
if links:
for link in links:
M[node_index[link], node_index[node]] = 1 / len(links)
else: # dangling node
M[:, node_index[node]] = 1 / N

# Initialize PR
PR = [Link](N) / N

for _ in range(max_iter):
new_PR = (1 - damping) / N + damping * M @ PR
if [Link](new_PR - PR, 1) < tol:
break
PR = new_PR

return {nodes[i]: PR[i] for i in range(N)}

# Example graph
graph = {
"A": ["B", "C"],
"B": ["C"],
"C": ["A"],
"D": ["C"]
}

result = page_rank_numpy(graph)
print("PageRank Scores:")
for node, score in [Link]():
print(f"{node}: {score:.4f}")

6. Sample Output
PageRank Scores:
A: 0.3721
B: 0.1958
C: 0.3945
D: 0.0376

7. Applications and Advantages


 Used by search engines to rank web pages.
 Identifies influential nodes in social networks.
 Helps in citation analysis of research papers.
 Used in recommendation systems and link prediction.

You might also like