
Resilient Data Structures

Francesco Silvestri

Department of Information Engineering


University of Padova
silvest1@[Link]

Workshop on Recent Advances in Data Structures

December 17-20th, 2011


The origins

Computing with unreliable information / faulty components dates back to the 1950s

von Neumann, Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components, 1956

F. Silvestri (UniPD) Resilient Data Structures DS Meet 2 / 78


Which components?

Processors

S.M. Ulam, Adventures of a Mathematician 1977



Which components?

Network nodes/links

Andrew C. Yao and F. Frances Yao, On Fault-Tolerant Networks for Sorting, 1985



Which components?

Memories

Memory fault
One or more bits read differently from how they were last written

Due to:
transient electronic noise: electrical or magnetic interference, e.g., cosmic rays
hardware problems: e.g., permanently damaged bit
corruption in data path between memories and processing units



This talk

Introduction to memory faults

The faulty RAM model

Some resilient data structures:

Resilient dictionary

Resilient priority queue

Resiliency & cache-obliviousness

Open problems



Impact of memory errors: machine crashes

Machine crashes



Impact of memory errors: security

Security vulnerabilities
Breaking cryptographic protocols
[Blömer and Seifert, 2003]
Taking control of a Java Virtual Machine
[Govindavajhala and Appel, 2003]
Breaking smart cards
[Skorobogatov and Anderson, 2003]



Impact of memory errors: unpredictable output

Unpredictable output: an example

MERGE(⟨1, 2, 3⟩, ⟨4, 5, 6⟩)

With the first key corrupted from 1 to 17:

MERGE(⟨17, 2, 3⟩, ⟨4, 5, 6⟩) = ⟨4, 5, 6, 17, 2, 3⟩
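
The example above can be reproduced with an ordinary two-way merge. The sketch below is a textbook merge, not code from the talk; it only shows how a single corrupted key pushes correct keys out of order.

```python
def merge(a, b):
    """Standard two-way merge; it silently assumes both inputs are sorted."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # leftover of a (at most one of the two is non-empty)
    out.extend(b[j:])  # leftover of b
    return out

clean = merge([1, 2, 3], [4, 5, 6])
faulty = merge([17, 2, 3], [4, 5, 6])  # first key corrupted: 1 -> 17
print(clean)
print(faulty)
```

In the faulty run the corrupted 17 blocks the first list until the second one is exhausted, so the uncorrupted keys 2 and 3 end up after 4, 5, 6.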



How common are memory errors?



A field study

In a field study by Google researchers [Schroeder et al., 2011]:

Observed mean fault rates much higher than in laboratory conditions
25,000-70,000 faults per billion device hours per Mb
> 8% of DIMMs affected by faults per year

Small cluster of computers with a few GB per node: one bit fault every few minutes

As memory size becomes larger, mean time between failures decreases
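
A rough back-of-the-envelope check of the "every few minutes" figure. This sketch assumes that "Mb" means megabit and uses a hypothetical cluster of 16 nodes with 4 GB each; neither the node count nor the memory size comes from the study.

```python
MBIT_PER_GB = 8 * 1024  # assumption: "Mb" on the slide means megabit

def faults_per_hour(nodes, gb_per_node, rate_per_1e9_hours_per_mbit):
    """Expected faults per hour for the whole cluster at the given rate."""
    mbits = nodes * gb_per_node * MBIT_PER_GB
    return rate_per_1e9_hours_per_mbit * mbits / 1e9

def minutes_between_faults(nodes, gb_per_node, rate):
    """Mean time between faults, in minutes."""
    return 60.0 / faults_per_hour(nodes, gb_per_node, rate)

# Hypothetical cluster: 16 nodes, 4 GB per node.
slow = minutes_between_faults(16, 4, 25_000)  # low end of the observed range
fast = minutes_between_faults(16, 4, 70_000)  # high end of the observed range
print(f"one fault every {fast:.1f} to {slow:.1f} minutes")
```

Doubling the memory halves the mean time between faults, which is the point of the last line of the slide.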



How to fight corruption?



Hardware vs software solutions

Hardware solution: error correcting codes (ECC)
expensive: large manufacturing and power costs
not always available
does not guarantee complete fault coverage: the number of bit faults may exceed the ECC limit
Software solution: robustification
Redesign algorithms
Rewrite software
When faults occur: possibly longer execution, but space/time penalties are not too large



Some models of faulty memories

Liar model [Rényi, 1994, Ulam, 1977, Pelc, 2002]
two-person game: how many comparison questions are needed to find a number in [1, 100] if the adversary can lie once or twice?
faults on operations, not on data
Sorting networks [Yao and Yao, 1985, Leighton and Ma, 1999]
Some comparison nodes may be faulty
Fault-tolerant pointer-based data structures [Aumann and Bender, 1996]
Losing a single pointer can make an entire data structure unreachable
Error-correcting data structures [de Wolf, 2009]
Exploit ECCs to obtain space-time trade-offs
Checking model [Blum et al., 1991]
Can we design (on/off-line) checkers to report buggy behavior of data structures using only a small (logarithmic) amount of reliable memory?



The Liar Model

Liar model: comparison questions answered by a possibly lying adversary [Ulam, 1977, Rényi, 1994]

Different variants:
Types of question: comparison, subset inclusion, ...
Types of lie: fixed number, probabilistic, ...
Degree of interactivity between players
Sorting and searching are well studied. E.g., sorting with k lies:
Ω(n log n + kn) [Lakshmanan et al., 1991]
O(n log n) for k = O(log n / log log n) [Ravikumar, 2002]
Lies ⇒ transient failures ⇒ algorithms can exploit query replication strategies

Models faults on operations, not on data
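
To illustrate what query replication buys in the liar model, here is a minimal sketch: each comparison is asked 2k + 1 times and the majority answer is trusted. The adversary `make_liar` and the function names are hypothetical, and this naive strategy is far from the optimal number of questions; it only shows why replication works when lies are transient.

```python
def make_liar(secret, max_lies):
    """Hypothetical adversary answering 'is secret <= q?', lying at most max_lies times."""
    state = {"lies": 0}

    def answer(q):
        truth = secret <= q
        if state["lies"] < max_lies:  # greedy liar: uses its lies as early as possible
            state["lies"] += 1
            return not truth
        return truth

    return answer

def search_with_lies(lo, hi, answer, k):
    """Binary search in [lo, hi]; every question is repeated 2k + 1 times."""
    while lo < hi:
        mid = (lo + hi) // 2
        yes = sum(answer(mid) for _ in range(2 * k + 1))
        if yes > k:        # majority says secret <= mid
            hi = mid
        else:              # majority says secret > mid
            lo = mid + 1
    return lo

found = search_with_lies(1, 100, make_liar(37, 2), k=2)
print(found)
```

The same trick does not carry over directly to memory faults: a corrupted memory word stays corrupted, so re-reading it returns the same wrong value.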



Parallel Computing With Memory Faults

Processor/memory faults in parallel settings [Chlebus et al., 1994, Indyk, 1996]

Used models: PRAM / distributed memory machine
Static/dynamic, deterministic/random faults
With fault-detection registers or limited adversary power

Simulation of fully operational models on faulty models. Limited adversary.



Fault-Tolerant Sorting Networks

Fault-tolerant sorting networks: comparators can be faulty and destroy one of the input values [Yao and Yao, 1985]

With probabilistic faults [Assaf and Upfal, 1991]
O(n log² n) nodes, depth O(log n)
Tight bound [Leighton and Ma, 1999]
Θ(log n) copies of each item
Uses fault-free replicators

Redundancy should be reduced.
Models faults on operations.



Pointer-Based Data Structures

Pointer-based data structures are highly non-resilient

Resilient pointer-based data structures [Aumann and Bender, 1996]
Faults are detected by the system
Resilient stacks, linked lists, binary search trees
Based on connectivity property of the butterfly (i.e., FFT DAG)

A limited amount of uncorrupted data may be lost upon the occurrence of a fault



Error-correcting data structures

Use ECC for restoring faults [de Wolf, 2009]

Provide data structures for equality, membership, substring, inner product
Trade-off between number of probes (time) and space

Assume no safe memory. Only ECC.



Checkers

Design a checker that is able to detect errors in the behavior of a data structure [Blum et al., 1991]

Detects faults
On-line checker: immediately after an operation
Off-line checker: at the end of a sequence
Checkers for stacks, queues and RAMs

Computation cannot be restored after a fault



We would like. . .

We would like to design algorithms and data structures
Resilient to δ faults
Resilient to faults inserted by a powerful adversary
Resilient to faults not recognizable by the system
May exploit O(1) safe memory

Provide a (partially) correct solution even with faults



Which kind of solution?

Do we require the solution to be correct even with faults?

Too much! We relax this requirement; otherwise δ-replication is required

We require correctness (at least) on uncorrupted data

Examples:
Sorting: sort correctly the uncorrupted data
Search: is x in a set S? yes if there is an uncorrupted copy of x in S, no if there are no uncorrupted values equal to x.



The faulty RAM

The Faulty RAM Model [Finocchi and Italiano, 2004, Finocchi and Italiano, 2008]

Memory fault: the correct value stored in a memory location is altered (destructive faults)
Adversary with unbounded computational power: can corrupt up to δ words
Fault appearance:
At any time
At any memory location
Simultaneously

Corrupted values indistinguishable from correct ones



The faulty RAM (2)

O (1) words of safe memory


cannot be corrupted by the adversary
can be read by the adversary

O (1) words of private memory


cannot be corrupted by the adversary
cannot be read by the adversary
useful for storing random bits

α: actual number of faults (α ≤ δ)



Why not data replication?

Data replication
Data replication can be quite inefficient in certain highly dynamic
scenarios, especially if objects to be replicated are large and complex

What can we do without (or with limited) data replication?

E.g., with respect to sorting:
Q1 Can we sort the correct values in the presence of, e.g., polynomially many memory faults?
Q2 How many faults can we tolerate in the worst case if we wish to maintain optimal time and space?

δ-resilient variable x
Write 2δ + 1 copies
Read by majority in O(1) safe memory: cannot be corrupted!

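
A minimal sketch of a δ-resilient variable. The class name and layout are hypothetical; the majority is computed here with the Boyer-Moore voting algorithm, chosen because it needs only two scalars, which is one way to stay within O(1) safe memory. The slides do not say which majority procedure is used.

```python
class ResilientVariable:
    """Stores 2*delta + 1 copies in (simulated) faulty memory."""

    def __init__(self, value, delta):
        self.copies = [value] * (2 * delta + 1)

    def write(self, value):
        for i in range(len(self.copies)):
            self.copies[i] = value

    def read(self):
        # Boyer-Moore majority vote: only `candidate` and `count` are kept,
        # which stand in for the O(1) words of safe memory.
        candidate, count = None, 0
        for c in self.copies:
            if count == 0:
                candidate, count = c, 1
            elif c == candidate:
                count += 1
            else:
                count -= 1
        return candidate

x = ResilientVariable(42, delta=3)
# Simulate delta = 3 faults on arbitrary copies.
x.copies[0], x.copies[3], x.copies[6] = 7, 7, 99
print(x.read())
```

With at most δ corrupted copies, the correct value still holds a strict majority of the 2δ + 1 copies, so the vote returns it. Each read and write costs Θ(δ) time, which is the overhead the rest of the talk tries to avoid paying on every access.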
Some results in literature

Sorting (mergesort & quicksort)
Searching (binary search & dictionaries)
Priority queues
Dynamic programming
Counting
K-d trees
Interval trees
Suffix trees
...



The rest of this talk

1 Resilient dictionary [Finocchi et al., 2007]

2 Resilient priority queue [Jørgensen et al., 2007]



Some results

Sorting [Finocchi and Italiano, 2008, Finocchi et al., 2009]
Θ(n log n + δ²) optimal
if δ = O(√(n log n)) no time blow-up
Searching [Finocchi and Italiano, 2008, Jørgensen et al., 2007]
Θ (log n + δ) optimal
if δ = O (log n) no time blow-up
Counting [Brodal et al., 2009b]
Many counters, o(δ) space, one safe word
Small additive error
K -d trees [Gieseke et al., 2010]
Similar to the resilient search tree in the dictionary we will see
Used for clustering
Suffix/Interval trees [Christiano et al., 2011]
Exploit trade-off between ECC and replication



RESILIENT DICTIONARY



Resilient Dictionary

Operations
search(x):
return yes if there is an uncorrupted key x
return no if there isn’t an uncorrupted key x
If there is a corrupted key x, the behavior is not defined
insert(x), delete(x): defined as usual

[Finocchi et al., 2007]:
O(log n + αδ) amortized time/operation
O(n + δ) space
We will see a simpler implementation: O(log n + αδ²) amortized time/operation



Main difficulties

Take wrong search direction upon reading a corrupted value

[Figure: binary search tree with keys 10, 5, 20, 2, 8 and one corrupted key; as a result, search(8) = false]

Unsafe pointers:
point to wrong addresses (even outside the tree)
point to wrong nodes (tree structure?)
losing a pointer ⇒ losing part of the data



Unsafe pointers: a naïve approach

Replicate the tree 2δ + 1 times

[Figure: 2δ + 1 identical copies of a binary search tree with keys 10, 5, 20, 2, 8]

At each step follow the majority value of the 2δ + 1 copies ⇒ correct since we can have at most δ memory faults

O(δ log n) time, O(δn) space: too expensive
It would be OK if the tree contained n/δ nodes
Push Θ(δ) keys/node



The data structure: ingredients

Group keys into disjoint intervals spanning the key space

(−∞, −10], (−10, 2], (2, 9], (9, 30], (30, 50], (50, +∞)

Each interval contains Θ(δ) keys (except possibly for the boundaries)
Say, at least δ/2 and at most 2δ

Intervals maintained in a (balanced) binary search tree (e.g., an AVL tree)

Tree stored reliably: each pointer and relevant info replicated 2δ + 1 times

Keys are not stored resiliently

Nodes stored in an array using doubling (check reliably in O(1) time if a link points to a tree node)



Example: search(15)



Space usage

δ/2 ≤ number of keys per node ≤ 2δ (except at the boundaries)

O(n/δ) nodes

Θ(δ) space per node

Linear space: O(n + δ)



Searching a key γ: the algorithm

search(γ)

1 Interval search
search for the interval that should contain γ (target node)
2 Key search
search γ in the list of keys of the target node
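
The two steps can be sketched without any fault tolerance as follows. This is only an illustration of the interval/key decomposition: a sorted array of right endpoints searched with `bisect` stands in for the balanced search tree, and the names `build` and `search` are hypothetical.

```python
from bisect import bisect_left
import math

def build(sorted_keys, delta):
    """Group sorted keys into buckets of delta keys each (last bucket may be smaller)."""
    buckets = [sorted_keys[i:i + delta] for i in range(0, len(sorted_keys), delta)] or [[]]
    # Bucket t covers the interval (uppers[t-1], uppers[t]]; the last one extends to +inf.
    uppers = [b[-1] for b in buckets[:-1]] + [math.inf]
    return uppers, buckets

def search(uppers, buckets, gamma):
    target = bisect_left(uppers, gamma)  # 1. interval search: first interval with right endpoint >= gamma
    return gamma in buckets[target]      # 2. key search: linear scan of the target node's keys

uppers, buckets = build([-10, 2, 9, 15, 30, 50, 70], delta=2)
print(search(uppers, buckets, 15), search(uppers, buckets, 8))
```

In the resilient structure only the interval tree is replicated; the Θ(δ) keys inside each node are stored once, which is why the key search is a plain scan.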



Example: search(15)



Useful tests

Given a key γ and a node v , we can check:

v = target node ⇒ γ ∈ I (v )

target node ∈ tree(v ) ⇒ γ ∈ U(v )

v ancestor of w ⇒ U(w ) ⊆ U(v )


Tests can be done:
Unreliably: O(1) time
Reliably: Θ(δ) time

Using only reliable tests ⇒ too expensive: O(δ log(n/δ) + δ)



How to pay only an additive overhead?

A lazy approach

typically asleep: trust unreliable variables...

from time to time wake up: do some checks




An O(log n + αδ²) algorithm

Rounds of at most δ search steps

starting checkpoint node x: target node ∈ tree(x)

ending checkpoint node y: target node ∈ tree(y)



Round structure

Unreliable phase: unreliable search steps + unreliable consistency checks
No inconsistency ⇒ Checkpoint
Failing check ⇒ Reliable phase

Checkpoint
All checks succeed ⇒ new round
Failing check ⇒ Reliable phase

Reliable phase: reliable search steps + reliable consistency checks



The unreliable phase

Perform (at most) δ unreliable search steps starting from x (use only the first copy of each variable)

1 Let v = current node


2 If v = target node, go to the checkpoint
3 Otherwise, follow left/right pointer (let w = new node)
4 Check whether:
the address of node w is valid
w descendant of v
target node ∈ tree(w )
5 If any consistency check fails, start the reliable phase from x



The checkpoint

Perform the following reliable checks (use all the 2δ + 1 copies of each variable)

1 Let x = starting checkpoint node


2 Let y = node on which the unreliable search terminated
3 If y = target node, then stop
4 If y descendant of x and target node ∈ tree(y ) ⇒ start new round
from y (search direction is correct)
5 Otherwise: start the reliable phase from x



The reliable phase

Perform δ reliable search steps starting from the checkpoint node x (use all the 2δ + 1 copies of each variable)

1 Let v = current node


2 If v = target node, then stop
3 Otherwise, follow the left/right pointer



Search analysis

Rounds terminate after:
Unreliable phase + checkpoint: cost O(δ)
Reliable phase: cost O(δ²)

Unsuccessful rounds: go down the tree < δ levels

IDEA: Charge the time spent in reliable phases and unsuccessful rounds to faulty values



Search analysis 2

Successful rounds:
O(log n / δ) such rounds ⇒ O(log n + δ) total time

Reliable phases:
Take place only if a check fails ⇒ a node at distance ≤ δ from the starting checkpoint x contains some faulty value
At the end of the phase, such faulty value is out of the subtree in which the search continues
At most α ≤ δ faulty values ⇒ O(αδ²) total time

Unsuccessful rounds:
similar reasoning



Inserting a key

insert(γ)
Find the target node v and add the key to its list of keys: O(log n + αδ²)
If the number of keys becomes 2δ: O(δ)
Delete node v: O(δ log n)
Split the interval I(v) into two subintervals L and R such that L ∪ R = I(v): O(δ²)
L takes the δ smallest keys of v
R takes the δ largest keys of v
Add two new nodes with intervals L and R to the search tree: O(δ log n)
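
The split step can be sketched as below. This is a fault-free illustration with a hypothetical function name: it sorts the keys with the built-in sort, whereas the talk charges O(δ²) for the resilient version of this step, and it leaves out the tree updates entirely.

```python
def insert_key(node_keys, gamma, delta):
    """Add gamma to a node's key list; split once the list holds 2*delta keys.

    Returns one key list if no split happens, or two lists (L and R) otherwise.
    """
    keys = node_keys + [gamma]
    if len(keys) < 2 * delta:
        return [keys]
    ordered = sorted(keys)
    left = ordered[:delta]    # L takes the delta smallest keys
    right = ordered[delta:]   # R takes the delta largest keys
    return [left, right]

no_split = insert_key([5], 1, delta=2)
split = insert_key([5, 1, 9], 3, delta=2)
print(no_split, split)
```

After a split each new node holds exactly δ keys, so at least δ further insertions are needed before either of them reaches the 2δ threshold again.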



Insert analysis

The cost O(δ log n) can be amortized over Ω(δ) operations:
The new nodes contain δ keys each ⇒ the threshold 2δ can be reached again after at least δ insertions
Similarly when we have deletions: Ω(δ) operations are necessary to reach the threshold δ/2

Total amortized time

$$O\Big(\underbrace{\log n + \alpha\delta^2}_{\text{node search}} + \underbrace{\frac{\delta \log n}{\delta}}_{\text{tree update}}\Big) = O\left(\log n + \alpha\delta^2\right)$$



Further improvements

[Finocchi et al., 2009]

O(log n + δ^(1+ε)) amortized for any constant ε > 0

O(log n + δ) expected amortized time



RESILIENT PRIORITY QUEUE



Priority queue

Operations
insert(x): insert a new entry x
deletemin(): return the smallest uncorrupted value or a corrupted
value

Priority queue from previous tree: O(log n + αδ²) amortized

From [Jørgensen et al., 2007]
O(log n + δ) amortized per operation
Based on the cache-oblivious implementation in [Arge et al., 2002]



Structure (1)

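[Figure: insertion buffer I of size b followed by the layers ..., L_i, L_{i+1}, ...; layer L_i holds the buffers D_i and U_i and has size threshold s_i]

Insertion buffer I
k = O(log n) layers L_i
Layer L_i contains two buffers D_i and U_i
Buffers are implemented as circular arrays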



Structure (2)

Buffers are doubly linked
The links between components and their sizes are stored resiliently
Buffers D_i contain small entries that are moving down
Buffers U_i contain large entries that are moving up



Structure (3)

Invariants
Each buffer (but I) is faithfully ordered, i.e., its correct values are sorted
The concatenations D_i D_{i+1} and D_i U_{i+1} are faithfully ordered
|I| ≤ b, where b = δ + log n + 1
s_i/2 ≤ |D_i| ≤ s_i for 0 ≤ i < k, where s_i = 2s_{i−1} = 2^i (δ² + log² n)
|U_i| ≤ s_i/2 for 0 ≤ i ≤ k



Insert

insert(x)

1 Append x to I

2 If |I| > b:
  1 Move all entries of I into U_0
  2 Resiliently sort U_0
  3 If |U_0| > s_0/2, invoke push(U_0)



Delete min

First, find the min
1 Find the min in I
2 Find the min of the first δ + 1 elements of U_0
3 Find the min of the first δ + 1 elements of D_0
4 The min is the smallest among the three values

Then, delete the min
1 Remove the min from the appropriate buffer
2 Right shift all the elements in the affected buffer from the beginning up to the position of the minimum
3 If the min was in D_0 and now |D_0| < s_0/2, invoke pull(D_0)
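
The "find the min" step can be sketched as follows; the function name and the tuple it returns are hypothetical. Scanning δ + 1 entries suffices because, in a faithfully ordered buffer, everything in front of the smallest uncorrupted entry must be corrupted, and there are at most δ corrupted entries.

```python
def find_min(I, U0, D0, delta):
    """Return (value, buffer_name, index) of the candidate minimum, or None if all are empty."""
    candidates = []
    # I is unordered, so it is scanned entirely; U0 and D0 only need their first delta + 1 entries.
    for name, buf, limit in (("I", I, len(I)), ("U0", U0, delta + 1), ("D0", D0, delta + 1)):
        prefix = buf[:limit]
        if prefix:
            idx = min(range(len(prefix)), key=prefix.__getitem__)
            candidates.append((prefix[idx], name, idx))
    return min(candidates) if candidates else None

# D0's first entry has been corrupted (10 -> 999); delta = 1.
result = find_min(I=[50, 40], U0=[12, 13, 14], D0=[999, 11, 20], delta=1)
print(result)
```

The value returned is either the smallest uncorrupted value or a corrupted one that happens to be smaller, which matches the deletemin specification given earlier.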



Push

push(U_i)
(Invoked when |U_i| > s_i/2)

If L_i is not the last layer
1 Merge U_i, D_i and U_{i+1}
2 Assign the first |D_i| − δ entries to a new buffer D'_i
3 Assign the remaining entries to a new buffer U'_{i+1}
4 Set U_i = ∅, D_i = D'_i, U_{i+1} = U'_{i+1}
5 If |U_{i+1}| > s_i/2 invoke push(U_{i+1}) recursively

For each D_i where |D_i| < s_i/2 invoke pull(D_i)

In the cache-oblivious implementation D'_i receives |D_i| entries



Faulty push

Before the push:
D_i = {1, 2, 3}, D_{i+1} = {4, 5, 6}
U_i = {100, 101}, U_{i+1} = {}

Without faults: merge {1, 2, 3} and {100, 101}
D_i = {1, 2, 3}, D_{i+1} = {4, 5, 6}
U_i = {}, U_{i+1} = {100, 101}

With the memory fault 3 → 200: merge {1, 2, 200} and {100, 101}
D_i = {1, 2, 100}, D_{i+1} = {4, 5, 6}
U_i = {}, U_{i+1} = {101, 200}

D_i and D_{i+1} are not faithfully ordered, since 100 has not been corrupted!
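
The example can be replayed in a few lines. The sketch below uses hypothetical helper names, ignores buffer size thresholds, and identifies corrupted entries by value purely for the sake of the check; `delta=0` gives the non-resilient rule (D'_i receives |D_i| entries) and `delta=1` the resilient one (|D_i| − δ entries).

```python
from heapq import merge

def push(D_i, U_i, U_next, delta=0):
    """Merge the three buffers and cut the result after |D_i| - delta entries."""
    merged = list(merge(D_i, U_i, U_next))
    cut = max(len(D_i) - delta, 0)
    return merged[:cut], merged[cut:]   # (new D_i, new U_{i+1})

def faithfully_ordered(buf, corrupted=frozenset()):
    """True if the uncorrupted entries of buf appear in sorted order."""
    clean = [v for v in buf if v not in corrupted]
    return clean == sorted(clean)

D_next = [4, 5, 6]
# Memory fault 3 -> 200 in D_i before the push.
naive_D, naive_U = push([1, 2, 200], [100, 101], [], delta=0)
safe_D, safe_U = push([1, 2, 200], [100, 101], [], delta=1)
print(naive_D, naive_U, faithfully_ordered(naive_D + D_next, {200}))
print(safe_D, safe_U, faithfully_ordered(safe_D + D_next, {200}))
```

Keeping δ fewer entries in the new D_i leaves room for up to δ corrupted keys that would otherwise push large correct keys, such as 100 here, into the down buffer.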



Push (again)

If L_i is the last layer
1 Merge U_i, D_i
2 Assign the first |D_i| entries to a new buffer D'_i
3 Assign the remaining entries to a new buffer D'_{i+1}
4 Set D_i = D'_i, D_{i+1} = D'_{i+1}, U_i = U_{i+1} = ∅



Pull

pull(D_i)
(Invoked when |D_i| < s_i/2)
If D_i is not the last layer
1 Merge D_i, D_{i+1} and U_{i+1}
2 Assign the first s_i entries to a new buffer D'_i
3 Assign the next |D_{i+1}| − (s_i − |D_i|) − δ entries to a new buffer D'_{i+1}
4 Assign the remaining values to a new buffer U'_{i+1}
5 Set D_i = D'_i, D_{i+1} = D'_{i+1}, U_{i+1} = U'_{i+1}
6 If |D_{i+1}| < s_i/2 invoke pull(D_{i+1}) recursively

If D_i is the last layer: nop

If |U_{i+1}| > s_i/2 invoke push(U_{i+1}) recursively

In the cache-oblivious implementation D'_{i+1} receives |D_{i+1}| − (s_i − |D_i|) entries

Complexities

Space: O (n)
Keys not replicated
Ω (δ + log n) keys per level (but I and Lk )
O (δ) pointers per level
O (n + δ) space
δ term can be removed by exploiting safe memory
Time/operation: O (log n + δ) amortized
Each layer is updated after O (si ) operations



Resiliency & Cache-obliviousness



I/O-efficiency

Faulty RAM has one memory level
Modern platforms feature memory hierarchies
Reducing I/O improves performance ⇒ exploit locality
Caches (SRAM) even more sensitive to memory faults
Low supply voltage, low critical charge per cell
ECC prohibitive: tight constraints on die size and speed



Fault tolerance vs I/O-efficiency

Hierarchical faulty memory model [Brodal et al., 2009a]

Two memory levels (memory and cache)
Cache size M, block length B
Both levels can be faulty
I/O resilient algorithms for: sorting, dictionary, priority queue

Algorithms are cache-aware: they crucially depend on memory parameters ⇒ reduced portability



Fault tolerance vs cache-oblivious
Cache-oblivious algorithms overcome the issue [Frigo et al., 1999]
no explicit dependency on memory parameters
adapt automatically to all memory levels
optimality on a two-level hierarchy implies optimality on an arbitrary
hierarchy

Question
Can we design algorithms that are fault-tolerant and cache-oblivious?

Cache-oblivious algorithms are designed in a flat model (faulty RAM), but executed on the hierarchical faulty memory model
P private memory
if P = Θ(1): private memory may be implemented in the CPU registers
if P = ω(1): private memory hierarchy whose largest level has size P
Misses due to private memory are negligible in our algorithms.
Cache-oblivious algorithms don't use M and B, but may use δ and P

Resilient cache-oblivious algorithms

[Caminiti et al., 2011] shows how to derive resilient cache-oblivious algorithms for many problems
Local-dependency dynamic programming
Edit distance
Longest common subsequence
Gaussian Elimination Paradigm
All-pairs shortest path
Matrix multiplication
Gaussian Elimination Without Pivoting
Fast Fourier Transform



Edit distance. . .

We focus on a case of local dependency dynamic programming

Case study
Computing the edit distance (ED) of two strings:
We show how to derive a resilient cache-oblivious algorithm for ED
using P private memory

Similar techniques apply to GEP and FFT



First, some notation. . .

r-resilient variable x
Write 2r + 1 copies
Read by majority (in O(1) safe memory)
At least r + 1 faults are required to corrupt x
An adversary can corrupt at most ⌊δ/(r + 1)⌋ r-resilient variables

Rabin fingerprint ψ_A of a vector A = ⟨a_0, a_1, ..., a_{n−1}⟩

$$\psi_A = \sum_{i=0}^{n-1} a_i \, 2^{w(n-i-1)} \bmod p$$

p prime number, w memory word size
Can be computed with a scan of A and O(1) space
If entries are not accessed in order, fingerprints may require O(n log n) time due to exponentiation
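
The single-scan computation follows from Horner's rule: processing the entries in order, each step multiplies the running value by 2^w and adds the next entry, modulo p. The sketch below uses hypothetical function names and an arbitrary prime, and includes a direct evaluation of the sum for comparison.

```python
def fingerprint(A, w, p):
    """psi_A = sum(a_i * 2**(w*(n-i-1))) mod p, computed in one left-to-right scan."""
    base = pow(2, w, p)
    psi = 0
    for a in A:
        psi = (psi * base + a) % p   # Horner step: O(1) extra space
    return psi

def fingerprint_direct(A, w, p):
    """Term-by-term evaluation; each term needs a modular exponentiation."""
    n = len(A)
    return sum(a * pow(2, w * (n - i - 1), p) for i, a in enumerate(A)) % p

A = [3, 1, 4, 1, 5]
w, p = 8, 1_000_003          # illustrative word size and prime
print(fingerprint(A, w, p) == fingerprint_direct(A, w, p))
```

The direct version shows where the extra logarithmic factor comes from when entries arrive out of order: each term then needs its own exponentiation instead of reusing the running product.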



Running example: ED

Edit distance
Input: strings X = x_1, ..., x_n and Y = y_1, ..., y_n.
Output: their edit distance

Edit-Distance(X, Y) = number of edit ops {ins, del, sub} required to transform X into Y
DP table for ED: (n + 1) × (n + 1) table, given by the following recurrence:

$$\ell[i, j] = \begin{cases} i + j & \text{if } i = 0 \text{ or } j = 0 \\ \ell[i-1, j-1] & \text{if } i, j > 0 \text{ and } x_i = y_j \\ 1 + \min\{\ell[i, j-1], \ell[i-1, j]\} & \text{if } i, j > 0 \text{ and } x_i \neq y_j \end{cases}$$

The ED is ℓ[n, n]
O(n²) running time
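
For reference, a plain (non-resilient, non-cache-oblivious) evaluation of the recurrence exactly as written on the slide. Note that in the mismatch case the recurrence takes the minimum over two neighbours only; a variant with unit-cost substitution would also include the diagonal entry, which this sketch deliberately does not add.

```python
def edit_distance(X, Y):
    """Fill the DP table row by row using the recurrence as stated on the slide."""
    n, m = len(X), len(Y)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 or j == 0:
                table[i][j] = i + j                      # boundary: i + j
            elif X[i - 1] == Y[j - 1]:
                table[i][j] = table[i - 1][j - 1]        # match: copy the diagonal
            else:
                table[i][j] = 1 + min(table[i][j - 1], table[i - 1][j])
    return table[n][m]

print(edit_distance("abc", "abd"))
```

Every cell depends only on its left, upper, and upper-left neighbours, which is the local dependency that the recursive decomposition into subtables exploits.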




A cache-oblivious algorithm for ED

Cache-oblivious algorithm [Chowdhury and Ramachandran, 2007]

Input: strings X and Y; T and L boundaries
Output: R and D boundaries

Decomposes the table into 4 subtables
Recursively computes the output boundaries of each subtable



The main idea

A bad idea
All variables are δ-resilient
O(δn²) time
O(δn²/(B√M)) misses
Matches lower bounds when δ = O(1)

A good idea
Use ⌈δ/2^i⌉-resilient variables at recursive level i
Each input at level i is associated with a fingerprint computed with correct values using prime p_i (read fingerprint).
The adversary can corrupt at most 2^i subproblems at level i
The algorithm can recognize faults on inputs w.h.p.



The resilient algorithm

The algorithm at recursive level i

Input: strings X and Y, boundaries L and T, and the respective read fingerprints in private memory.

Output: boundaries R and D and the respective read fingerprints; null if an incorrigible fault occurs.

Note: inputs and outputs are ⌈δ/2^(i+1)⌉-resilient and fingerprints are computed with prime p_i.

Private memory (input): Ψ_X, Ψ_Y, Ψ_T, Ψ_L
Private memory (output): Ψ_R, Ψ_D

The resilient algorithm (2)

Algorithm:
1 Compute the 4 subproblems recursively
2 For each subproblem:
  1 Extract the inputs as ⌈δ/2^(i+1)⌉-resilient variables
  2 Create read fingerprints of the subproblem inputs with prime p_{i+1}
  3 While creating the new fingerprints, check correctness using the old ones
3 If a fault is detected, return null
4 If a subproblem returns null, change the prime p_{i+1} and restart



Fingerprint mismatches
Input: vector X (⌈δ/2^i⌉-resilient) and Ψ(X) (computed with prime p_i)
1 Simultaneously compute:
  fingerprint Ψ(X') of X' using p_{i+1}
  fingerprint Ψ̃(X') of X' using p_i
2 Compute the fingerprint Ψ̃(X) of X using p_i starting from Ψ̃(X')
3 If Ψ̃(X) ≠ Ψ(X), at least ⌈δ/2^i⌉ + 1 faults have occurred: return null

Analysis

Successful recursive calls:
No fingerprint mismatch
T(n, δ) = 4T(n/2, δ/2) + Θ(n(δ + 1)) = O(n² + δn log n)

Unsuccessful calls (α ≤ δ actual number of faults):
Fingerprint mismatch at level i
at least δ/2^i values corrupted
at most α2^i/δ recomputations at level i

$$\sum_{i=1}^{\log \delta} \frac{\alpha 2^i}{\delta} \cdot \frac{T(n, \delta)}{4^i} \le T(n, \delta) \sum_{i=1}^{\log \delta} \frac{1}{2^i} \le T(n, \delta)$$
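
The closed form of the recurrence for successful calls can be checked by unrolling it level by level. The derivation below is a sketch that ignores ceilings and assumes a constant-cost base case: at depth j there are 4^j subproblems of size n/2^j with fault parameter δ/2^j.

```latex
\begin{aligned}
T(n,\delta)
  &= \sum_{j=0}^{\log n} 4^{j}\,\Theta\!\left(\frac{n}{2^{j}}\left(\frac{\delta}{2^{j}}+1\right)\right)
   = \sum_{j=0}^{\log n} \Theta\!\left(\delta n + 2^{j} n\right) \\
  &= \Theta\!\left(\delta n \log n\right) + \Theta\!\left(n \sum_{j=0}^{\log n} 2^{j}\right)
   = \Theta\!\left(n^{2} + \delta n \log n\right).
\end{aligned}
```

The δn term is the same at every level because the halving of δ exactly offsets the doubling of the total boundary length, which is why the resilience overhead is only a logarithmic factor on δn rather than a factor δ on n².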



Bounds for LD-DP

Bounds
Running time: O(n² + δn log n)
Matches O(n²) when δ = O(n / log n)
Cache misses: O(n²/(MB) + δn log n / B)
Lower bound: Ω(n²/(MB) + δn/B), where n²/(MB) is the non-resilient bound and δn/B is the resilient bound

Previous result: O(nm/B) misses even without faults
The algorithm is cache-oblivious
Requires private memory Θ(log n)



Research directions

Three research directions:

1 δ-obliviousness

2 Graphs

how to define correctness?

linear-time graph algorithms: cannot store input resiliently

3 Other algorithms/data structures


Trade-off redundancy vs ECC [Christiano et al., 2011]



Thank you!

Questions?



Arge, L., Bender, M. A., Demaine, E. D., Holland-Minkley, B., and Munro, J. I. (2002).
Cache-oblivious priority queue and graph algorithm applications.
In Proc. of 34th STOC, pages 268–276.

Assaf, S. and Upfal, E. (1991).
Fault tolerant sorting networks.
SIAM Journal on Discrete Mathematics, 4(4):472–480.

Aumann, Y. and Bender, M. (1996).
Fault tolerant data structures.
In Proc. of 37th FOCS, pages 580–589.

Blömer, J. and Seifert, J.-P. (2003).
Fault Based Cryptanalysis of the Advanced Encryption Standard (AES).
In Financial Cryptography, volume 2742 of LNCS, chapter 12, pages 162–181. Springer Berlin / Heidelberg.

Blum, M., Evans, W., Gemmell, P., Kannan, S., and Naor, M. (1991).
Checking the correctness of memories.
In Proc. 32nd FOCS, pages 90–99.

Brodal, G. S., Jørgensen, A. G., and Mølhave, T. (2009a).
Fault tolerant external memory algorithms.
In Proc. 11th WADS, volume 5664 of LNCS, pages 411–422.

Brodal, G. S., Jørgensen, A. G., Moruz, G., and Mølhave, T. (2009b).
Counting in the presence of memory faults.
In Proc. 20th ISAAC, volume 5878 of LNCS, pages 842–851.

Caminiti, S., Finocchi, I., Fusco, E. G., and Silvestri, F. (2011).
Dynamic programming in faulty memory hierarchies (cache-obliviously).
In Proc. of 31st FSTTCS.

Chlebus, B. S., Gambin, A., and Indyk, P. (1994).
PRAM computations resilient to memory faults.
In Proc. 2nd ESA, volume 855 of LNCS, pages 401–412. Springer Berlin / Heidelberg.

Chowdhury, R. A. and Ramachandran, V. (2007).
The cache-oblivious Gaussian elimination paradigm: theoretical framework, parallelization and experimental evaluation.
In Proc. 19th SPAA, pages 71–80.

Christiano, P., Demaine, E. D., and Kishore, S. (2011).
Lossless fault-tolerant data structures with additive overhead.
In Proc. 12th WADS, pages 243–254.

de Wolf, R. (2009).
Error-correcting data structures.
In STACS, pages 313–324.

Finocchi, I., Grandoni, F., and Italiano, G. F. (2007).
Resilient search trees.
In Proc. of the 18th SODA, pages 547–553.

Finocchi, I., Grandoni, F., and Italiano, G. F. (2009).
Optimal resilient sorting and searching in the presence of memory faults.
Theor. Comput. Sci., 410(44):4457–4470.

Finocchi, I. and Italiano, G. (2004).
Sorting and searching in the presence of memory faults (without redundancy).
In Proc. of the 36th STOC, pages 101–110.

Finocchi, I. and Italiano, G. (2008).
Sorting and searching in faulty memories.
Algorithmica, 52:309–332.

Frigo, M., Leiserson, C. E., Prokop, H., and Ramachandran, S. (1999).
Cache-oblivious algorithms.
In Proc. 40th FOCS, pages 285–298.

Gieseke, F., Moruz, G., and Vahrenhold, J. (2010).
Resilient k-d trees: K-means in space revisited.
In Proc. of 10th ICDM, pages 815–820.

Govindavajhala, S. and Appel, A. W. (2003).
Using memory errors to attack a virtual machine.
In Proc. of Symp. Security and Privacy, pages 154–165. IEEE.

Indyk, P. (1996).
On word-level parallelism in fault-tolerant computing.
In Proc. 13th STACS, volume 1046 of LNCS, pages 193–204. Springer Berlin / Heidelberg.

Jørgensen, A. G., Moruz, G., and Mølhave, T. (2007).
Priority queues resilient to memory faults.
In Algorithms and Data Structures, volume 4619 of LNCS, chapter 12, pages 127–138. Springer Berlin / Heidelberg.

Lakshmanan, K., Ravikumar, B., and Ganesan, K. (1991).
Coping with erroneous information while sorting.
IEEE Trans. on Computers, 40(9):1081–1084.

Leighton, T. and Ma, Y. (1999).
Tight bounds on the size of fault-tolerant merging and sorting networks with destructive faults.
SIAM Journal on Computing, 29(1):258–273.

Pelc, A. (2002).
Searching games with errors - fifty years of coping with liars.
Theoretical Computer Science, 270(1-2):71–109.

Ravikumar, B. (2002).
A fault-tolerant merge sorting algorithm.
In Computing and Combinatorics, volume 2387 of LNCS, pages 465–496. Springer Berlin / Heidelberg.

Rényi, A. (1994).
A Diary on Information Theory.
John Wiley & Sons.

Schroeder, B., Pinheiro, E., and Weber, W. D. (2011).
DRAM errors in the wild: a large-scale field study.
Commun. ACM, 54:100–107.

Skorobogatov, S. and Anderson, R. (2003).
Optical fault induction attacks.
In Proc. of CHES (Cryptographic Hardware and Embedded Systems), volume 2523 of LNCS, chapter 2, pages 31–48. Springer Berlin / Heidelberg.

Ulam, S. (1977).
Adventures of a mathematician.
Scribners.

Yao, A. C. and Yao, F. F. (1985).
On fault-tolerant networks for sorting.
SIAM Journal on Computing, 14(1):120–128.
