Scalable Indexing in Content Addressable Networks

CAN is a distributed system that maps keys to values in a d-dimensional coordinate space. Nodes are responsible for zones within this space, and route queries by forwarding them to the node responsible for the zone containing the key. New nodes join by contacting an existing node, randomly selecting a point in the space, and splitting the zone of the nearest node. Zones can be reassigned if nodes fail. Improvements include using multiple dimensions, coordinate spaces, overloading zones, caching, and topology-aware construction.

Uploaded by

Yahya Fakhroji

CONTENT ADDRESSABLE NETWORK
Sylvia Ratnasamy, Mark Handley,
Paul Francis, Richard Karp, Scott Shenker

OUTLINE

Introduction
Overview
Design
Improvements

Introduction
Key goal is a scalable indexing system for
large-scale decentralized storage
applications on the Internet
Examples: P2P systems, large-scale storage management
systems (OceanStore, Publius), wide-area name
resolution services

Overview
CAN is a distributed system
that maps keys onto values
Keys are hashed to points in a d-dimensional
space
Interface:
insert(key, value)
retrieve(key)
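
A minimal sketch of this interface, assuming keys are strings and the space is the unit cube [0, 1)^d. The names `key_to_point` and `SingleNodeCAN` are hypothetical, and SHA-256 is only one possible choice of hash; the slides do not specify one. A single node owns the whole space here, so no routing is involved.

```python
import hashlib


def key_to_point(key: str, d: int = 2) -> tuple:
    """Hash a key to a point in the unit d-dimensional space [0, 1)^d."""
    if not 1 <= d <= 8:
        raise ValueError("this sketch supports 1 <= d <= 8")
    digest = hashlib.sha256(key.encode("utf-8")).digest()  # 32 bytes
    coords = []
    for i in range(d):
        chunk = digest[4 * i: 4 * i + 4]  # 4 bytes per coordinate
        coords.append(int.from_bytes(chunk, "big") / 2**32)
    return tuple(coords)


class SingleNodeCAN:
    """Toy stand-in for a CAN in which one node owns the entire space."""

    def __init__(self, d: int = 2):
        self.d = d
        self.store = {}  # point -> value

    def insert(self, key: str, value) -> None:
        self.store[key_to_point(key, self.d)] = value

    def retrieve(self, key: str):
        return self.store.get(key_to_point(key, self.d))
```

Because the same key always hashes to the same point, `retrieve` finds whatever `insert` stored; in a real CAN the point would instead determine which node's zone holds the value.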

Overview
[Figure: state of the system at time t, a 2-dimensional coordinate space (axes x, y) divided into zones, with peers and resources marked]
In this 2-dimensional space a key is mapped to a point (x,y)

DESIGN
Routing
CAN Construction
Maintenance

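Routing
[Figure: 2-d coordinate space (axes x, y) showing peers, resources, and a query Q(x,y) being forwarded toward the point (x,y)]
d-dimensional space with n zones
Two zones are neighbors if their spans overlap along d-1 dimensions and abut along the remaining one
Average routing path length: (d/4)(n^(1/d)) hops for n equal zones
Algorithm: forward the query to the neighbor nearest to the destination key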
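
The greedy forwarding rule can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: zones are equal cells of a k x k grid over the unit square, "nearest" is measured from each zone's center, and the names `Zone`, `are_neighbors`, `build_grid`, and `route` are hypothetical.

```python
import math


class Zone:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi  # opposite corners of the zone
        self.neighbors = []

    def contains(self, p):
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))

    def center(self):
        return tuple((l + h) / 2 for l, h in zip(self.lo, self.hi))


def are_neighbors(a, b):
    """True if spans overlap in d-1 dimensions and abut in exactly one."""
    overlap = abut = 0
    for al, ah, bl, bh in zip(a.lo, a.hi, b.lo, b.hi):
        if al < bh and bl < ah:
            overlap += 1
        elif ah == bl or bh == al:
            abut += 1
    return abut == 1 and overlap == len(a.lo) - 1


def build_grid(k):
    """Partition the unit square into k*k equal zones and link neighbors."""
    zones = [Zone((i / k, j / k), ((i + 1) / k, (j + 1) / k))
             for i in range(k) for j in range(k)]
    for a in zones:
        a.neighbors = [b for b in zones if b is not a and are_neighbors(a, b)]
    return zones


def route(start, point, max_hops=1000):
    """Greedily forward toward `point`; return the list of zones visited."""
    path, current = [start], start
    while not current.contains(point):
        if len(path) > max_hops:  # guard for irregular zone layouts
            raise RuntimeError("routing did not converge")
        current = min(current.neighbors,
                      key=lambda z: math.dist(z.center(), point))
        path.append(current)
    return path
```

For example, `route(build_grid(4)[0], (0.9, 0.9))` walks from the corner zone at the origin to the opposite corner one grid cell at a time. Each node only needs to know its own neighbors, which is what keeps per-node state small.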

CAN: construction*

[Figure sequence: a 2-d coordinate space showing the bootstrap node, the new node, an existing node I, the random point (x,y), and the node J whose zone contains it]

1) New node discovers some node I already in the CAN
2) New node picks a random point (x,y) in the space
3) I routes to (x,y) and discovers node J
4) J's zone is split in half; the new node owns one half

* From slides of Santashil
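
The join steps above can be sketched roughly as below. Several things are assumptions made for illustration: the owner J is found by a linear scan rather than by routing through I, the zone is split along its longest dimension, and the new node takes the upper half. `Node` and `join` are hypothetical names.

```python
import random


class Node:
    def __init__(self, name, lo, hi):
        self.name = name
        self.lo, self.hi = list(lo), list(hi)  # copies, so zones are independent

    def contains(self, p):
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))

    def volume(self):
        v = 1.0
        for l, h in zip(self.lo, self.hi):
            v *= h - l
        return v


def join(nodes, new_name, rng):
    """Add a node: pick a random point, find its owner J, split J's zone."""
    d = len(nodes[0].lo)
    point = [rng.random() for _ in range(d)]           # step 2
    owner = next(n for n in nodes if n.contains(point))  # step 3 (scan, not routing)
    # Step 4: split along the longest side (an assumption of this sketch).
    dim = max(range(d), key=lambda i: owner.hi[i] - owner.lo[i])
    mid = (owner.lo[dim] + owner.hi[dim]) / 2
    new = Node(new_name, owner.lo, owner.hi)
    new.lo[dim] = mid    # new node owns the upper half
    owner.hi[dim] = mid  # J keeps the lower half
    nodes.append(new)
    return new
```

Starting from a single node that owns [0, 1)^2 and calling `join` repeatedly with a seeded `random.Random` produces a partition whose zones always tile the whole space.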

Maintenance
Use zone takeover when a node fails or leaves
Each node sends its neighbor table to its
neighbors at a fixed time interval t to
signal that it is alive
If a neighbor sends no alive message
within time t, take over its zone
Zone reassignment is then needed
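
A minimal sketch of the failure-detection side of this scheme, assuming each node records the time it last heard an alive message from every neighbor. The class `NeighborMonitor` and its methods are hypothetical; timestamps are passed in explicitly so the example is deterministic, and the takeover itself is left out.

```python
class NeighborMonitor:
    """Tracks alive messages and reports neighbors that have gone silent."""

    def __init__(self, interval):
        self.interval = interval  # the time interval t from the slide
        self.last_seen = {}       # neighbor id -> time of last alive message

    def heard_from(self, neighbor, now):
        self.last_seen[neighbor] = now

    def failed_neighbors(self, now):
        """Neighbors with no alive message within the last `interval`."""
        return [n for n, t in self.last_seen.items()
                if now - t > self.interval]
```

A node would call `failed_neighbors` periodically and start a zone takeover for each neighbor it returns.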

Zone reassignment
[Figures: four slides, each pairing a zoning diagram with its partition tree; node labels 1 and 3 appear in the first two slides, 1 and 2 in the last two]

Design Improvements
Multi-Dimension
Multi-Coordinate Spaces
Overloading the Zones
Multiple Hash Functions
Topologically Sensitive Construction
Uniform Partitioning
Caching

Multi-Dimension
Increase in the dimension
reduces the path length
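
As a rough numerical illustration, the sketch below assumes the average path length estimate (d/4) * n^(1/d) hops for n equal zones, which comes from the CAN paper rather than from this slide. The function name `avg_path_length` is hypothetical.

```python
def avg_path_length(n, d):
    """Estimated average hops for n equal zones in d dimensions (assumed formula)."""
    return (d / 4) * n ** (1 / d)


if __name__ == "__main__":
    n = 4096
    for d in (2, 3, 4, 6):
        print(d, round(avg_path_length(n, d), 2))
```

For a fixed n the estimate falls as d grows over this range, at the cost of each node keeping more neighbors.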

Multi-Coordinate Spaces
Maintain multiple coordinate spaces
Each node is assigned a different zone in each of them
Increases availability and reduces the path length

Overloading the Zones
More than one peer is assigned to each zone
Increases availability
Reduces path length
Reduces per-hop latency

Topologically Sensitive Construction
[Figure: example landmarks Istanbul, Tokyo, Ankara]
Zones are predefined according to landmarks
Each new node measures its round-trip time
to each zone's landmark and joins the zone with the shortest time
Topologically close nodes therefore reside
in the same portion of the space
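
The selection rule can be sketched in a few lines, assuming the new node has already measured its round-trip times. The function name `choose_landmark_zone` is hypothetical, and the millisecond values in the usage note are made up purely for illustration.

```python
def choose_landmark_zone(rtts_ms):
    """Return the landmark whose measured round-trip time is shortest.

    rtts_ms maps a landmark name to a measured RTT in milliseconds.
    """
    if not rtts_ms:
        raise ValueError("need at least one RTT measurement")
    return min(rtts_ms, key=rtts_ms.get)
```

With illustrative measurements such as `{"Istanbul": 12.0, "Tokyo": 240.0, "Ankara": 18.0}`, the node would join the zone associated with Istanbul.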

Uniform Partitioning
Instead of directly splitting the zone
of the occupant node:
The occupant compares the volume of its zone
with those of its neighbors
The zone that is split is the one with the
largest volume
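
A minimal sketch of this choice, assuming each zone is represented as a pair of corner tuples `(lo, hi)`. The names `volume` and `pick_zone_to_split` are hypothetical, and ties are resolved in favor of the occupant, which the slide does not specify.

```python
from math import prod


def volume(zone):
    """Volume of an axis-aligned zone given as (lo, hi) corner tuples."""
    lo, hi = zone
    return prod(h - l for l, h in zip(lo, hi))


def pick_zone_to_split(occupant, neighbors):
    """Choose the largest zone among the occupant's zone and its neighbors'."""
    # max() returns the first maximal element, so the occupant wins ties.
    return max([occupant] + list(neighbors), key=volume)
```

Splitting the largest nearby zone rather than the one the random point landed in tends to even out zone sizes, and with them the load per node.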
