PageRank and HITS Algorithm Implementation

Uploaded by

martialgamer56

Experiment – 10

Aim : a] Implementation of the PageRank Algorithm.


Theory :
The PageRank algorithm, developed by Larry Page and Sergey Brin (Google
founders), assigns a numerical weight to each page in a hyperlinked web
graph to measure its relative importance.
It is based on the concept that a page is considered important if it is linked to
by many other important pages.
The rank of a page is defined recursively and depends on the number and
quality of links pointing to it.
Mathematically, the PageRank vector PR is given by:

PR = (1 − d) / N + d ⋅ M ⋅ PR    (the constant (1 − d) / N is added to every component)

Where:
• d → damping factor (usually 0.85)
• N → total number of web pages
• M → transition probability matrix (column-stochastic)
• PR → PageRank vector representing rank of each page
The algorithm iteratively updates PR until convergence, i.e., when the change
in PR between two iterations becomes very small.
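
As a minimal sketch of a single update, consider a hypothetical three-page graph (not part of the experiment) with links P1→P2, P1→P3, P2→P3 and P3→P1. The matrix M and the helper name pagerank_step are illustrative assumptions; plain Python lists are used so the arithmetic is easy to follow.

```python
d = 0.85  # damping factor, as in the theory above
N = 3     # number of pages in this toy graph

# Column-stochastic M: column j spreads page j's rank over its out-links.
# Links (assumed for illustration): P1->P2, P1->P3, P2->P3, P3->P1
M = [
    [0.0, 0.0, 1.0],  # P1 receives all of P3's rank
    [0.5, 0.0, 0.0],  # P2 receives half of P1's rank
    [0.5, 1.0, 0.0],  # P3 receives half of P1's rank and all of P2's
]


def pagerank_step(M, pr, d):
    """Apply one update PR <- (1 - d)/N + d * M * PR."""
    n = len(pr)
    return [
        (1 - d) / n + d * sum(M[i][j] * pr[j] for j in range(n))
        for i in range(n)
    ]


pr0 = [1.0 / N] * N             # uniform start
pr1 = pagerank_step(M, pr0, d)  # ranks after one update
print([round(x, 4) for x in pr1])
```

After this one step P3 holds the largest value, because it receives rank from both P1 and P2, and the three values still add up to 1.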
Algorithm Steps:
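1. Represent the links between web pages as an adjacency matrix (A).
2. Normalize each column of A so that it sums to 1, giving the transition matrix (M).
3. Initialize the PageRank vector (PR) uniformly, with each entry equal to 1 / N.
4. Apply the iterative formula: PR(k+1) = (1 − d) / N + d ⋅ M ⋅ PR(k)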
5. Continue iterations until convergence (difference < tolerance).
6. Output the final PageRank values.
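
Steps 1 and 2 can be sketched from an edge list, assuming the convention that A[i][j] = 1 when page j links to page i. The function name build_transition_matrix, the 0-based page indices and the four-page graph are assumptions made for illustration. Columns with no outgoing links are filled uniformly, which is one common way to handle dangling nodes.

```python
def build_transition_matrix(edges, n):
    """Return a column-stochastic matrix M (list of lists) for n pages.

    edges is a list of (source, target) pairs with 0-based page indices.
    """
    # Step 1: adjacency matrix with A[target][source] = 1
    A = [[0.0] * n for _ in range(n)]
    for source, target in edges:
        A[target][source] = 1.0

    # Step 2: divide each column by its sum (the page's out-degree)
    M = [row[:] for row in A]
    for j in range(n):
        col_sum = sum(A[i][j] for i in range(n))
        for i in range(n):
            # Dangling page (no out-links): spread its rank uniformly
            M[i][j] = A[i][j] / col_sum if col_sum else 1.0 / n
    return M


# Hypothetical 4-page graph; page 3 has no outgoing links.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
M = build_transition_matrix(edges, 4)
for row in M:
    print([round(x, 3) for x in row])
```

Every column of the printed matrix sums to 1, including the column of the dangling page, so the result can be used directly in the iterative formula of step 4.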

Code :
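import numpy as np

# --- 1. SETUP: Adjacency Matrix and Parameters ---

# A[i][j] = 1 means page j+1 links to page i+1, so each row lists a page's
# incoming links and each column lists a page's outgoing links.
A = np.array(
    [
        [0, 1, 1, 0, 0],  # P1: Links from P2, P3
        [0, 0, 1, 1, 0],  # P2: Links from P3, P4
        [1, 0, 0, 1, 1],  # P3: Links from P1, P4, P5
        [0, 0, 1, 0, 1],  # P4: Links from P3, P5
        [0, 0, 0, 0, 0],  # P5: No incoming links (every column is non-zero, so no page is dangling)
    ],
    dtype=np.float64,
)

n = len(A)  # number of pages
d = 0.85  # damping factor
TOLERANCE = 1e-6
MAX_ITERATIONS = 100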

# --- 2. TRANSITION MATRIX (M) CONSTRUCTION ---

M = A.copy()
for j in range(n):
    col_sum = M[:, j].sum()
    if col_sum == 0:
        M[:, j] = 1.0 / n  # Dangling node (no outgoing links): jump uniformly
    else:
        M[:, j] /= col_sum  # Normalize the column so it sums to 1

print("Transition Matrix (M):\n", np.round(M, 4))

# --- 3. POWER ITERATION ---

PR = np.ones(n) / n  # uniform initial ranks

for i in range(1, MAX_ITERATIONS + 1):
    previous_PR = PR.copy()
    PR = (1 - d) / n + d * np.dot(M, PR)
    # Stop when the total absolute change falls below the tolerance
    if np.sum(np.abs(PR - previous_PR)) < TOLERANCE:
        print(f"\nConverged after {i} iterations.")
        break
else:
    # for-else: runs only when the loop ends without hitting break
    print(f"\nDid not converge within {MAX_ITERATIONS} iterations.")

# --- 4. FINAL OUTPUT ---

print("\nFinal PageRank Values:")
for i, val in enumerate(PR, start=1):
    print(f"P{i}: {val:.4f}")

# The ranks should sum to 1 because M is column-stochastic
print(f"\nVerification (sum = {np.sum(PR):.10f})")
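
The verification line should print a sum very close to 1. A short derivation of why, assuming every column of M sums to 1 and the entries of PR sum to 1 before the update (PR' denotes the vector after one update):

```latex
\sum_{i=1}^{N} PR'_i
  = \sum_{i=1}^{N} \left( \frac{1-d}{N} + d \sum_{j=1}^{N} M_{ij}\, PR_j \right)
  = (1-d) + d \sum_{j=1}^{N} PR_j \sum_{i=1}^{N} M_{ij}
  = (1-d) + d \cdot 1
  = 1
```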

Conclusion: The PageRank algorithm was implemented using power iteration on a column-stochastic transition matrix.


Aim: b] Implementation of the HITS Algorithm.
Theory:
The HITS (Hyperlink-Induced Topic Search) algorithm, developed by Jon
Kleinberg, identifies two key roles of web pages:
• Authority: A page that is linked to by many good hubs (trusted sources of
information).
• Hub: A page that links to many good authorities.
The algorithm operates on a directed graph where nodes represent web
pages and edges represent hyperlinks.
Mathematically:
Authority = Aᵀ ⋅ Hub
Hub = A ⋅ Authority

where A is the adjacency matrix, with A[i][j] = 1 if page i links to page j.


Both vectors are normalized in each iteration until convergence.
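
A minimal sketch of one round of these updates on a hypothetical three-page graph (links 0→1, 0→2, 1→2), using plain Python lists. Normalizing by the sum of the scores is an assumption made here; other norms are also used in practice.

```python
# A[i][j] = 1 if page i links to page j (toy graph, not from the experiment)
A = [
    [0, 1, 1],  # page 0 links to pages 1 and 2
    [0, 0, 1],  # page 1 links to page 2
    [0, 0, 0],  # page 2 links to nothing
]
n = len(A)
hub = [1.0] * n

# Authority = A^T . Hub: add up the hub scores of the pages linking in
authority = [sum(A[i][j] * hub[i] for i in range(n)) for j in range(n)]

# Hub = A . Authority: add up the authority scores of the pages linked to
hub = [sum(A[i][j] * authority[j] for j in range(n)) for i in range(n)]

# Normalize so each vector sums to 1 (assumed normalization)
authority = [a / sum(authority) for a in authority]
hub = [h / sum(hub) for h in hub]

print("authority:", [round(a, 4) for a in authority])
print("hub:", [round(h, 4) for h in hub])
```

Page 2 ends up with the highest authority score because both other pages link to it, and page 0 ends up as the strongest hub because it links to both authorities.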

Algorithm Steps:
1. Represent the web graph using a directed graph.
2. Initialize all hub and authority scores to 1.
3. Update:
o Authority score = sum of hub scores of linking nodes.
o Hub score = sum of authority scores of linked nodes.
4. Normalize scores after each iteration.
5. Repeat until scores converge.
6. Display the final Hub and Authority values.
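
The steps above can be sketched without a graph library. The function name hits_scores, the tolerance, the small example graph and the sum-to-1 normalization are assumptions for illustration, so the numbers need not match a library implementation digit for digit.

```python
def hits_scores(edges, max_iter=100, tol=1e-8):
    """Iterate hub/authority updates on a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}        # step 2: start at 1
    authority = {node: 1.0 for node in nodes}

    for _ in range(max_iter):
        # Step 3a: authority = sum of hub scores of pages linking to it
        new_auth = {node: 0.0 for node in nodes}
        for source, target in edges:
            new_auth[target] += hub[source]

        # Step 3b: hub = sum of authority scores of pages it links to
        new_hub = {node: 0.0 for node in nodes}
        for source, target in edges:
            new_hub[source] += new_auth[target]

        # Step 4: normalize so each set of scores sums to 1
        auth_total = sum(new_auth.values()) or 1.0
        hub_total = sum(new_hub.values()) or 1.0
        new_auth = {k: v / auth_total for k, v in new_auth.items()}
        new_hub = {k: v / hub_total for k, v in new_hub.items()}

        # Step 5: stop once the scores barely change
        change = sum(abs(new_auth[k] - authority[k]) for k in nodes)
        change += sum(abs(new_hub[k] - hub[k]) for k in nodes)
        hub, authority = new_hub, new_auth
        if change < tol:
            break

    return hub, authority


# Small hypothetical graph
edges = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "A")]
hubs, authorities = hits_scores(edges)
for node in sorted(hubs):
    print(f"{node}: hub={hubs[node]:.4f}, authority={authorities[node]:.4f}")
```

In this small graph B links to both C and D, so it comes out as the strongest hub, while C and D share the authority score between them.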

Code:
import networkx as nx
import matplotlib.pyplot as plt

# Directed graph: an edge (u, v) means page u links to page v
G = nx.DiGraph()
G.add_edges_from(
    [
        ("A", "D"),
        ("B", "C"),
        ("B", "E"),
        ("C", "A"),
        ("D", "C"),
        ("E", "D"),
        ("E", "B"),
        ("E", "F"),
        ("E", "C"),
        ("F", "C"),
        ("F", "H"),
        ("G", "A"),
        ("G", "C"),
        ("H", "A"),
    ]
)

# Returns two dicts keyed by node; normalized=True scales each to sum to 1
hubs, authorities = nx.hits(G, max_iter=50, normalized=True)

print("\nHub Scores (Outward influence):")
for node, score in sorted(hubs.items()):
    print(f"{node}: {score:.4f}")

print("\nAuthority Scores (Incoming influence):")
for node, score in sorted(authorities.items()):
    print(f"{node}: {score:.4f}")

# Draw the graph with a fixed layout seed so the picture is reproducible
plt.figure(figsize=(10, 8))
pos = nx.spring_layout(G, seed=42)
nx.draw_networkx(
    G,
    pos,
    with_labels=True,
    node_size=1500,
    node_color="skyblue",
    font_size=10,
    font_weight="bold",
    arrowsize=20,
)
plt.title("Graph for HITS Analysis")
plt.show()

Conclusion:
The HITS algorithm was implemented using NetworkX, and hub and authority scores were computed for the sample graph.
