Resilient Data Structures
Francesco Silvestri
Department of Information Engineering
University of Padova
silvest1@[Link]
Workshop on Recent Advances in Data Structures
December 17-20th, 2011
The origins
Computing with unreliable information / faulty components dates back to
the 50s
Von Neumann, Probabilistic Logics and the
Synthesis of Reliable Organisms from Unreliable
Components, 1956
F. Silvestri (UniPD) Resilient Data Structures DS Meet 2 / 78
Which components?
Processors
S.M. Ulam, Adventures of a Mathematician 1977
Which components?
Network nodes/links
Andrew C. Yao and F. Frances Yao, On
Fault-Tolerant Networks for Sorting 1985
Which components?
Memories
Memory fault
One or more bits read differently from how they were last written
Due to:
transient electronic noise: electrical or magnetic interference, e.g.,
cosmic rays
hardware problems: e.g., permanently damaged bit
corruption in data path between memories and processing units
This talk
Introduction to memory faults
The faulty RAM model
Some resilient data structures:
Resilient dictionary
Resilient priority queue
Resiliency & cache-obliviousness
Open problems
Impact of memory errors: machine crashes
Machine crashes
Impact of memory errors: security
Security vulnerabilities
Breaking cryptographic protocols
[Blömer and Seifert, 2003]
Taking control over Java Virtual
Machine
[Govindavajhala and Appel, 2003]
Breaking smart cards
[Skorobogatov and Anderson, 2003]
Impact of memory errors: unpredictable output
Unpredictable output: an example...
MERGE(⟨1, 2, 3⟩, ⟨4, 5, 6⟩)
⇓ memory fault: 1 → 17
MERGE(⟨17, 2, 3⟩, ⟨4, 5, 6⟩)
⇓
⟨4, 5, 6, 17, 2, 3⟩
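The effect in the example above can be reproduced with a textbook two-way merge. This is a minimal sketch, not code from the talk: the function name `merge` and the way the fault is injected (by passing 17 in place of 1) are illustrative assumptions.

```python
def merge(a, b):
    """Standard two-way merge; it silently assumes both inputs are sorted."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # leftover keys of a
    out.extend(b[j:])  # leftover keys of b
    return out


clean = merge([1, 2, 3], [4, 5, 6])
# Simulated memory fault: the first key of the first sequence reads 17 instead of 1.
faulty = merge([17, 2, 3], [4, 5, 6])
```

A single corrupted key blocks the first sequence until the second one is exhausted, so the uncorrupted keys 2 and 3 also end up out of order.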
How common are memory errors?
A field study
In a field study by Google researchers [Schroeder et al., 2011]
Observed mean fault rates much higher than in laboratory conditions
25,000-70,000 faults per billion device hours per Mb
> 8% of DIMMs affected by faults per year
Small cluster of computers with few GB per node
one bit fault every few minutes
As memory size becomes larger, mean time between failures decreases
How to fight corruption?
Hardware vs software solutions
Hardware solution: error correcting codes (ECC)
$$$$$: large manufacturing and power costs
not always available
do not guarantee complete fault coverage: number of bit faults may
exceed ECC limit
Software solution: robustification
Redesign algorithms
Rewrite software
When faults occur: possibly longer execution, but space/time penalties
not too large
Some models of faulty memories
Liar model [Rényi, 1994, Ulam, 1977, Pelc, 2002]
two person game: how many comparison questions to find a number in
[1, 100] if the adversary can lie once or twice?
faults on operations, not on data
Sorting networks [Yao and Yao, 1985, Leighton and Ma, 1999]
Some comparison nodes may be faulty
Fault-tolerant pointer-based data structures
[Aumann and Bender, 1996]
Losing a single pointer can make an entire data structure unreachable
Error-correcting data structures [de Wolf, 2009]
Exploit ECCs to obtain space-time trade-offs
Checking model [Blum et al., 1991]
Can we design (on/off-line) checkers to report buggy behavior of data
structures using only a small (logarithmic) amount of reliable memory?
The Liar Model
Liar model: comparison questions answered by a
possibly lying adversary [Ulam, 1977, Rényi, 1994]
Different variants:
Types of question: comparison, subset inclusion,. . .
Types of lie: fixed number, probabilistic,. . .
Degree of interactivity between players
Sorting and searching well known. E.g. sorting with k lies:
Ω (n log n + kn) [Lakshmanan et al., 1991]
O (n log n) for k = O (log n/ log log n) [Ravikumar, 2002]
Lies ⇒ Transient failures ⇒ Algorithms can exploit query replication
strategies
Models faults on operations, not on data
Parallel Computing With Memory Faults
Processor/memory faults in parallel
settings
[Chlebus et al., 1994, Indyk, 1996]
Used models: PRAM / distributed memory machine
Static/dynamic deterministic/random faults
With fault-detection registers or limited adversary power
Simulation of fully operational models on faulty
models. Limited adversary.
Fault-Tolerant Sorting Networks
Fault-Tolerant Sorting Networks:
Comparators can be faulty and destroy one
of the input values [Yao and Yao, 1985]
With probabilistic faults [Assaf and Upfal, 1991]
O(n log² n) nodes, depth O(log n)
Tight bound [Leighton and Ma, 1999]
Θ (log n) copies of each item
Uses fault-free replicators
Redundancy should be reduced.
Model faults on operations.
Pointer-Based Data Structures
Pointer-based data structures highly
non-resilient
Resilient pointer-based data structures [Aumann and Bender, 1996]
Faults are detected by the system
Resilient stacks, linked list, binary search tree
Based on connectivity property of the butterfly (i.e., FFT DAG)
A limited amount of uncorrupted data may be
lost upon the occurrence of a fault
Error-correcting data structures
Use ECC for restoring faults
[de Wolf, 2009]
Provide data structures for equality, membership, substring, inner
product
Trade-off number of probes (time) and space
Assume no safe memory. Only ECC.
Checkers
Design a checker that is able to detect errors
in the behavior of a data structure
[Blum et al., 1991]
Detects faults, but does not correct them
On-line checker: immediately after an operation
Off-line checker: at the end of a sequence
Checkers for stacks, queues and RAMs
Computation cannot be restored after a fault
We would like. . .
We would like to design algorithms and data structures
Resilient to δ faults
Resilient to faults inserted by a powerful adversary
Resilient to faults not recognizable by the system
May exploit O (1) safe memory
Provide (partial) correct solution even with faults
Which kind of solution?
Do we require the solution to be correct even
with faults?
Too much! We relax this requirement, since otherwise δ-replication is required
We require correctness (at least) on the uncorrupted
data
Examples:
Sorting: correctly sort the uncorrupted data
Searching: is x in a set S? Yes if there is an uncorrupted copy of x in S,
no if there are no uncorrupted values equal to x.
The faulty RAM
The Faulty RAM Model
[Finocchi and Italiano, 2004,
Finocchi and Italiano, 2008]
Memory fault: the correct value stored in a memory location is
altered (destructive faults)
Adversary with unbounded computational power: can corrupt up to δ
words
Fault appearance:
at any time
at any memory location
simultaneously
Corrupted values indistinguishable from correct ones
The faulty RAM (2)
O (1) words of safe memory
cannot be corrupted by the adversary
can be read by the adversary
O (1) words of private memory
cannot be corrupted by the adversary
cannot be read by the adversary
useful for storing random bits
α: actual number of faults (α ≤ δ)
Why not data replication?
Data replication
Data replication can be quite inefficient in certain highly dynamic
scenarios, especially if objects to be replicated are large and complex
What can we do without (or limited) data replication?
E.g., with respect to sorting:
Q1 Can we sort the correct values in the presence of, e.g.,
polynomially many memory faults?
Q2 How many faults can we tolerate in the worst case if we wish
to maintain optimal time and space?
δ-resilient variable x
Write 2δ + 1 copies
Read by majority in O (1) safe memory: cannot be corrupted!
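The δ-resilient variable above can be sketched as follows. The slide only says that the read is a majority vote using O(1) safe memory; using the Boyer-Moore majority vote for that step is an assumption of this sketch, and the class name `ResilientVar` is hypothetical.

```python
class ResilientVar:
    """A delta-resilient variable: 2*delta + 1 copies kept in unreliable memory."""

    def __init__(self, value, delta):
        self.copies = [value] * (2 * delta + 1)

    def write(self, value):
        # Overwrite every copy; costs Theta(delta) time.
        for k in range(len(self.copies)):
            self.copies[k] = value

    def read(self):
        # Boyer-Moore majority vote: only a candidate and a counter are kept,
        # so the scan needs O(1) words of safe memory.
        candidate, count = None, 0
        for c in self.copies:
            if count == 0:
                candidate, count = c, 1
            elif c == candidate:
                count += 1
            else:
                count -= 1
        return candidate


x = ResilientVar(42, delta=2)
# Simulated adversary: corrupt delta = 2 of the 5 copies.
x.copies[0] = 7
x.copies[3] = 99
value = x.read()
```

With at most δ corrupted copies, the correct value still occupies at least δ + 1 of the 2δ + 1 cells, so the vote returns it.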
Some results in literature
Sorting (mergesort & quicksort)
Searching (binary search & dictionaries)
Priority queues
Dynamic programming
Counting
K-d trees
Interval trees
Suffix trees
...
The rest of this talk
1 Resilient dictionary [Finocchi et al., 2007]
2 Resilient priority queue [Jørgensen et al., 2007]
Some results
Sorting [Finocchi and Italiano, 2008, Finocchi et al., 2009]
Θ(n log n + δ²) optimal
if δ = O(√(n log n)) no time blow-up
Searching [Finocchi and Italiano, 2008, Jørgensen et al., 2007]
Θ (log n + δ) optimal
if δ = O (log n) no time blow-up
Counting [Brodal et al., 2009b]
Many counters, o(δ) space, one safe word
Small additive error
K-d trees [Gieseke et al., 2010]
Similar to the resilient search tree in the dictionary we will see
Used for clustering
Suffix/Interval trees [Christiano et al., 2011]
Exploit trade-off between ECC and replication
RESILIENT DICTIONARY
Resilient Dictionary
Operations
search(x):
return yes if there is an uncorrupted key x
return no if there isn’t an uncorrupted key x
If there is a corrupted key x, the behavior is not defined
insert(x), delete(x): defined as usual
[Finocchi et al., 2007]:
O (log n + αδ) amortized time/operation
O (n + δ) space
We will see a simpler implementation: O(log n + αδ²) amortized
time/operation
Main difficulties
Take wrong search direction upon reading a corrupted value
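[Figure: binary search tree with root 10, children 5 and 20, and leaves 2 and 8 under 5; after reading a corrupted value on the search path, search(8) = false]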
Unsafe pointers:
point to wrong addresses (even outside the tree)
point to wrong nodes (tree structure?)
losing a pointer ⇒ losing part of the data
Unsafe pointers: a naive approach
Replicate the tree 2δ + 1 times
[Figure: five replicas of the tree with root 10, children 5 and 20, and leaves 2 and 8]
At each step follow the majority value of the 2δ + 1 copies ⇒ correct
since we can have at most δ memory faults
Too expensive:
O(δ log n) time
O(δn) space
It would be OK if the tree contained n/δ nodes
Push Θ(δ) keys/node
The data structure: ingredients
Group keys into disjoint intervals spanning the key space
(−∞, −10], (−10, 2], (2, 9], (9, 30], (30, 50], (50, +∞)
Each interval contains Θ (δ) keys (except possibly for the boundaries)
Say, at least δ/2 and at most 2δ
Intervals maintained in a (balanced) binary search tree (e.g. AVL
trees)
Tree stored reliably: each pointer and relevant info replicated 2δ + 1
times
Keys are not stored resiliently
Nodes stored in an array using doubling (check reliably in O(1) time
if a link points to a tree node)
Example: search(15)
Space usage
δ/2 ≤ number of keys per node ≤ 2δ (except at the boundaries)
⇓
O (n/δ) nodes
⇓
Θ (δ) space per node
⇓
Linear space: O (n + δ)
Searching a key γ: the algorithm
search(γ)
1 Interval search
search for the interval that should contain γ (target node)
2 Key search
search γ in the list of keys of the target node
Example: search(15)
Useful tests
Given a key γ and a node v, we can check:
v = target node ⇒ γ ∈ I(v)
target node ∈ tree(v) ⇒ γ ∈ U(v)
v ancestor of w ⇒ U(w) ⊆ U(v)
Tests can be done:
Unreliably: O(1) time
Reliably: Θ(δ) time
Using only reliable tests ⇒ too expensive: O(δ log(n/δ) + δ)
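The cost of the all-reliable strategy can be unpacked as follows. This is a sketch that assumes the balanced tree over O(n/δ) interval nodes described earlier, one reliable test per level, and a final scan of the Θ(δ) keys in the target node.

```latex
\underbrace{O\!\left(\log\frac{n}{\delta}\right)}_{\text{tree height}}
\cdot
\underbrace{\Theta(\delta)}_{\text{reliable test per level}}
\;+\;
\underbrace{O(\delta)}_{\text{key search in the target node}}
\;=\;
O\!\left(\delta \log\frac{n}{\delta} + \delta\right)
```

The multiplicative δ in front of the logarithm is what the lazy approach replaces with an additive term.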
How to pay only an additive overhead?
A lazy approach
typically asleep:
trust unreliable variables. . .
from time to time wake up:
do some check
An O(log n + αδ²) algorithm
Rounds of at most δ search steps
starting checkpoint node x
target node ∈ tree(x)
ending checkpoint node y
target node ∈ tree(y )
Round structure
Unreliable phase: unreliable search steps + unreliable consistency checks
No inconsistency ⇒ go to the checkpoint
Failing check ⇒ start the reliable phase
Checkpoint
All checks succeed ⇒ start a new round
Failing check ⇒ start the reliable phase
Reliable phase: reliable search steps + reliable consistency checks
The unreliable phase
Perform (at most) δ unreliable search steps starting from x
(use only the first copy of each variable)
1 Let v = current node
2 If v = target node, go to the checkpoint
3 Otherwise, follow left/right pointer (let w = new node)
4 Check whether:
the address of node w is valid
w descendant of v
target node ∈ tree(w )
5 If any consistency check fails, start the reliable phase from x
The checkpoint
Perform the following reliable checks
(use all the 2δ + 1 copies of each variable)
1 Let x = starting checkpoint node
2 Let y = node on which the unreliable search terminated
3 If y = target node, then stop
4 If y descendant of x and target node ∈ tree(y ) ⇒ start new round
from y (search direction is correct)
5 Otherwise: start the reliable phase from x
The reliable phase
Perform δ reliable search steps starting from the checkpoint node x
(use all the 2δ + 1 copies of each variable)
1 Let v = current node
2 If v = target node, then stop
3 Otherwise, follow the left/right pointer
Search analysis
Rounds terminate after:
Unreliable phase + checkpoint:
cost O (δ)
Reliable phase:
cost O(δ²)
Unsuccessful rounds:
go down the tree < δ levels
IDEA: Charge the time spent in reliable phases and unsuccessful rounds to
faulty values
Search analysis 2
Successful rounds:
O((log n)/δ) such rounds ⇒ O(log n + δ) total time
Reliable phases:
Take place only if a check fails ⇒ A node at distance ≤ δ from the
starting checkpoint x contains some faulty value
At the end of the phase, such faulty value is out of the subtree in
which the search continues
At most δ faulty values ⇒ O(αδ²) total time
Unsuccessful rounds:
similar reasoning
Inserting a key
insert(γ)
Find the target node v and add the key to its list of keys: O(log n + αδ²)
If the number of keys becomes 2δ (check: O(δ)):
Delete node v: O(δ log n)
Split the interval I(v) into two subintervals L and R such that L ∪ R = I(v); L takes the δ smallest keys of v and R takes the δ largest keys of v: O(δ²)
Add two new nodes with intervals L and R to the search tree: O(δ log n)
Insert analysis
The cost O (δ log n) can be amortized over Ω (δ) operations:
The new nodes contain δ keys each ⇒ the threshold 2δ can be
reached again after at least δ insertions
Similarly when we have deletions: Ω (δ) operations are necessary to
reach the threshold δ/2
Total amortized time:
$$O\Big(\underbrace{\log n + \alpha\delta^2}_{\text{node search}} + \underbrace{\frac{\delta \log n}{\delta}}_{\text{tree update}}\Big) = O(\log n + \alpha\delta^2)$$
Further improvements
[Finocchi et al., 2009]
O(log n + δ^{1+ε}) amortized time for any constant ε > 0
O(log n + δ) expected amortized time
RESILIENT PRIORITY QUEUE
Priority queue
Operations
insert(x): insert a new entry x
deletemin(): return the smallest uncorrupted value or a corrupted
value
Priority queue from previous tree: O(log n + αδ²) amortized
From [Jørgensen et al., 2007]
O (log n + δ) amortized per operation
Based on the cache-oblivious implementation in [Arge et al., 2002]
Structure (1)
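[Figure: insertion buffer I of capacity b followed by the layers ..., L_i, L_{i+1}, ...; each layer L_i consists of a buffer D_i and a buffer U_i with size threshold s_i]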
Insertion buffer I
k = O (log n) layers Li
Layer Li contains two buffers Di and Ui
Buffers are implemented as circular arrays
Structure (2)
Buffers are double linked
The links between components and their sizes are stored resiliently
Buffers Di contain small entries that are moving down
Buffers Ui contain large entries that are moving up
Structure (3)
Invariants
Each buffer (except I) is faithfully ordered, i.e., its correct values are sorted
D_i D_{i+1} and D_i U_{i+1} are faithfully ordered
|I| ≤ b, where b = δ + log n + 1
s_i/2 ≤ |D_i| ≤ s_i for 0 ≤ i < k, where s_i = 2 s_{i−1} = 2^i (δ² + log² n)
|U_i| ≤ s_i/2 for 0 ≤ i ≤ k
Insert
insert(x)
1 Append x to I
2 if |I | > b
1 Move all entries of I into U0
2 Resiliently sort U0
3 If |U0 | > s0 /2, invoke push(U0 )
Delete min
First, find the min
1 Find the min in I
2 Find the min of the first δ + 1 elements of U0
3 Find the min of the first δ + 1 elements of D0
4 The min is the smallest among the three values
Then, delete the min
1 Remove the min from the appropriate buffer
2 Right shift all the elements in the affected buffer from the beginning
up to the position of the minimum
3 If the min was in D0 and now |D0| < s0/2, invoke pull(D0)
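The two steps above can be sketched on plain Python lists. This is a simplified illustration, not the structure from [Jørgensen et al., 2007]: the names `find_min` and `delete_min` are hypothetical, buffers are lists rather than circular arrays, `list.pop` stands in for the shift, and the call to pull is left out.

```python
def find_min(I, U0, D0, delta):
    """Return (buffer name, index, value) of the candidate minimum, or None."""
    candidates = []
    # I is scanned entirely; U0 and D0 only on their first delta + 1 cells.
    for name, buf, limit in (("I", I, len(I)),
                             ("U0", U0, delta + 1),
                             ("D0", D0, delta + 1)):
        prefix = buf[:limit]
        if prefix:
            idx = min(range(len(prefix)), key=prefix.__getitem__)
            candidates.append((prefix[idx], name, idx))
    if not candidates:
        return None
    value, name, idx = min(candidates)
    return name, idx, value


def delete_min(I, U0, D0, delta):
    """Remove and return the candidate minimum (pull is not modeled here)."""
    found = find_min(I, U0, D0, delta)
    if found is None:
        return None
    name, idx, value = found
    {"I": I, "U0": U0, "D0": D0}[name].pop(idx)  # closes the gap in the buffer
    return value


I, U0 = [9, 7], [20, 30]
D0 = [100, 4, 6, 8]  # the first cell held 2 before a simulated fault
result = delete_min(I, U0, D0, delta=1)
```

Looking at δ + 1 cells is enough because a faithfully ordered buffer with at most δ corrupted cells has an uncorrupted value among its first δ + 1 positions, and the first uncorrupted value is the smallest one.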
Push
push(U_i)
(Invoked when |U_i| > s_i/2)
If L_i is not the last layer:
1 Merge U_i, D_i and U_{i+1}
2 Assign the first |D_i| − δ entries to a new buffer D'_i
3 Assign the remaining entries to a new buffer U'_{i+1}
4 Set U_i = ∅, D_i = D'_i, U_{i+1} = U'_{i+1}
5 If |U_{i+1}| > s_i/2 invoke push(U_{i+1}) recursively
For each D_i where |D_i| < s_i/2 invoke pull(D_i)
In the cache-oblivious implementation D'_i receives |D_i| entries
Faulty push
Di = {1, 2, 3} Di+1 = {4, 5, 6}
Ui = {100, 101} Ui+1 = {}
Merge {1, 2, 3} and {100, 101}
Di = {1, 2, 3} Di+1 = {4, 5, 6}
Ui = {} Ui+1 = {100, 101}
Memory fault: 3 → 200
Merge {1, 2, 200} and {100, 101}
Di = {1, 2, 100} Di+1 = {4, 5, 6}
Ui = {} Ui+1 = {101, 200}
Di and Di+1 are not faithfully ordered, since 100 has not been corrupted!
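The example above can be replayed in a few lines, comparing a split after |Di| entries with a split after |Di| − δ entries. This is a sketch under simplifying assumptions: δ = 1, corrupted values are identified by value through the hypothetical set `corrupted`, and the helper names are not from the talk.

```python
def merge(a, b):
    """Standard two-way merge of two (supposedly sorted) lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]


def faithfully_ordered(seq, corrupted):
    """True if the uncorrupted values of seq appear in sorted order."""
    clean = [v for v in seq if v not in corrupted]
    return all(x <= y for x, y in zip(clean, clean[1:]))


delta = 1
D_i, U_i, D_next = [1, 2, 200], [100, 101], [4, 5, 6]  # fault: 3 -> 200
corrupted = {200}

merged = merge(D_i, U_i)

naive_D = merged[:len(D_i)]              # keeps |D_i| entries
slack_D = merged[:len(D_i) - delta]      # keeps |D_i| - delta entries
slack_U = merged[len(D_i) - delta:]      # everything else moves up

naive_ok = faithfully_ordered(naive_D + D_next, corrupted)
slack_ok = faithfully_ordered(slack_D + D_next, corrupted)
```

Keeping δ fewer entries leaves room for up to δ corrupted values to displace correct ones without pushing a large uncorrupted key such as 100 into the lower buffer.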
Push (again)
If L_i is the last layer:
1 Merge U_i and D_i
2 Assign the first |D_i| entries to a new buffer D'_i
3 Assign the remaining entries to a new buffer D'_{i+1}
4 Set D_i = D'_i, D_{i+1} = D'_{i+1}, U_i = U_{i+1} = ∅
Pull
pull(D_i)
(Invoked when |D_i| < s_i/2)
If D_i is not the last layer:
1 Merge D_i, D_{i+1} and U_{i+1}
2 Assign the first s_i entries to a new buffer D'_i
3 Assign the next |D_{i+1}| − (s_i − |D_i|) − δ entries to a new buffer D'_{i+1}
4 Assign the remaining values to a new buffer U'_{i+1}
5 Set D_i = D'_i, D_{i+1} = D'_{i+1}, U_{i+1} = U'_{i+1}
6 If |D_{i+1}| < s_i/2 invoke pull(D_{i+1}) recursively
If D_i is the last layer: nop
If |U_{i+1}| > s_i/2 invoke push(U_{i+1}) recursively
In the cache-oblivious implementation D'_{i+1} receives |D_{i+1}| − (s_i − |D_i|) entries
Complexities
Space: O (n)
Keys not replicated
Ω(δ + log n) keys per level (except I and L_k)
O (δ) pointers per level
O (n + δ) space
δ term can be removed by exploiting safe memory
Time/operation: O (log n + δ) amortized
Each layer is updated after O (si ) operations
Resiliency
&
Cache-obliviousness
I/O-efficiency
Faulty RAM has one memory level
Modern platforms feature memory
hierarchies
Reducing I/O improves performance ⇒
exploit locality
Caches (SRAM) even more sensitive to
memory faults
Low supply voltage, low critical charge per
cell
ECC prohibitive: tight constraints on die
size and speed
Fault tolerance vs I/O-efficiency
Hierarchical faulty memory model
[Brodal et al., 2009a]
Two memory levels (memory and cache)
Cache size M, block length B
Both levels can be faulty
I/O resilient algorithms for: sorting, dictionary,
priority queue
Algorithms are cache-aware: crucially depend on memory parameters
⇓
reduced portability
Fault tolerance vs cache-oblivious
Cache-oblivious algorithms overcome the issue [Frigo et al., 1999]
no explicit dependency on memory parameters
adapt automatically to all memory levels
optimality on a two-level hierarchy implies optimality on an arbitrary
hierarchy
Question
Can we design algorithms that are fault-tolerant and cache-oblivious?
Cache-oblivious algorithms are designed in a flat model (faulty-RAM),
but executed on the hierarchical faulty memory model
P private memory
if P = Θ (1): private memory may be implemented in the CPU registers
if P = ω (1): private memory hierarchy whose largest level has size P
Misses due to private memory are negligible in our algorithms.
Cache-oblivious algorithms don’t use M and B, but may use δ and P
Resilient cache-oblivious algorithms
[Caminiti et al., 2011] shows how to derive resilient cache-oblivious
algorithms for many problems
Local-dependency dynamic programming
Edit distance
Longest common subsequence
Gaussian Elimination Paradigm
All-pairs shortest path
Matrix multiplication
Gaussian Elimination Without Pivoting
Fast Fourier Transform
Edit distance. . .
We focus on a case of local dependency dynamic programming
Case study
Computing the edit distance (ED) of two strings:
We show how to derive a resilient cache-oblivious algorithm for ED
using P private memory
Similar techniques apply to GEP and FFT
First, some notation. . .
r -resilient variable x
Write 2r + 1 copies
Read by majority (in O (1) safe memory)
At least r + 1 faults are required to corrupt x
An adversary can corrupt at most ⌊δ/(r + 1)⌋ r-resilient variables
Rabin fingerprint ψ_A of a vector A = ⟨a_0, a_1, ..., a_{n−1}⟩:
$$\psi_A = \sum_{i=0}^{n-1} a_i \, 2^{w(n-i-1)} \bmod p$$
p prime number, w memory word size
Can be computed with a scan of A and O (1) space
If entries are not accessed in order, computing a fingerprint may require O(n log n) time
due to exponentiation
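The single-scan computation mentioned above can be sketched with Horner's rule. The function names, the word size w = 8 and the choice p = 2^31 − 1 are illustrative assumptions, not parameters from the talk.

```python
def fingerprint(A, w, p):
    """One left-to-right scan with O(1) extra space (Horner's rule)."""
    base = pow(2, w, p)
    psi = 0
    for a in A:
        psi = (psi * base + a) % p
    return psi


def fingerprint_direct(A, w, p):
    """Term-by-term evaluation of sum(a_i * 2^(w*(n-i-1))) mod p, for comparison."""
    n = len(A)
    return sum(a * pow(2, w * (n - i - 1), p) for i, a in enumerate(A)) % p


p = 2**31 - 1            # a Mersenne prime, chosen only for the example
A = [3, 1, 4, 1, 5]
A_faulty = [3, 1, 4, 1, 6]   # last entry corrupted

psi = fingerprint(A, 8, p)
psi_faulty = fingerprint(A_faulty, 8, p)
```

The direct version performs one modular exponentiation per entry, which is the extra cost the slide attributes to out-of-order access.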
Running example: ED
Edit distance
Input: strings X = x_1, ..., x_n and Y = y_1, ..., y_n.
Output: their edit distance
Edit-Distance(X, Y) = number of edit operations {ins, del, sub} required to
transform X into Y
DP table for ED: (n + 1) × (n + 1) table ℓ, given by the following
recurrence:
$$\ell[i,j] = \begin{cases} i + j & \text{if } i = 0 \text{ or } j = 0 \\ \ell[i-1, j-1] & \text{if } i, j > 0 \text{ and } x_i = y_j \\ 1 + \min\{\ell[i, j-1], \ell[i-1, j]\} & \text{if } i, j > 0 \text{ and } x_i \neq y_j \end{cases}$$
The ED is ℓ[n, n]
O(n²) running time
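A plain, non-resilient and non-cache-oblivious evaluation of the recurrence above is sketched below. It follows the recurrence exactly as written on the slide, so a mismatch is resolved through ℓ[i, j − 1] or ℓ[i − 1, j]; the function name and the support for strings of different lengths are assumptions of this sketch.

```python
def dp_value(X, Y):
    """Evaluate l[len(X), len(Y)] row by row, keeping only two rows."""
    n, m = len(X), len(Y)
    prev = list(range(m + 1))          # row i = 0: l[0, j] = j
    for i in range(1, n + 1):
        cur = [i] + [0] * m            # column j = 0: l[i, 0] = i
        for j in range(1, m + 1):
            if X[i - 1] == Y[j - 1]:
                cur[j] = prev[j - 1]                   # l[i-1, j-1]
            else:
                cur[j] = 1 + min(cur[j - 1], prev[j])  # l[i, j-1], l[i-1, j]
        prev = cur
    return prev[m]


same = dp_value("abc", "abc")
one_deletion = dp_value("abc", "ab")
mismatch = dp_value("a", "b")
```

Each cell depends only on its left, upper and upper-left neighbours, which is the local dependency that the boundary-based decomposition relies on.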
A cache-oblivious algorithm for ED
Cache-oblivious algorithm [Chowdhury and Ramachandran, 2007]
Input: Strings X and Y ; T and L boundaries
Output: R and D boundaries
Decomposes the table into 4 subtables
Recursively computes the output
boundaries of each subtable
The main idea
A bad idea
All variables are δ-resilient
O(δn²) time
O(δn²/(B√M)) misses
Matches lower bounds when δ = O(1)
A good idea
Use ⌈δ/2^i⌉-resilient variables at recursive level i
Each input at level i is associated with a fingerprint computed with
correct values using prime p_i (read fingerprint).
The adversary can corrupt at most 2^i subproblems at level i
The algorithm can recognize faults on inputs w.h.p.
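The interplay between the per-level replication ⌈δ/2^i⌉ and the adversary's budget can be checked numerically at the level of single variables. This is a small sanity check rather than the argument from [Caminiti et al., 2011]; it assumes the earlier fact that at most ⌊δ/(r + 1)⌋ r-resilient variables can be corrupted, and the function names are hypothetical.

```python
def resilience(delta, i):
    """r = ceil(delta / 2**i), the replication parameter at recursive level i."""
    return -(-delta // 2**i)


def max_corruptible(delta, r):
    """At most floor(delta / (r + 1)) r-resilient variables can be corrupted."""
    return delta // (r + 1)


# Exhaustive check of the bound 2**i over a range of small parameters.
bound_holds = all(
    max_corruptible(delta, resilience(delta, i)) <= 2**i
    for delta in range(1, 513)
    for i in range(0, 10)
)

# Example: delta = 64 at level 3 gives r = 8, so at most 64 // 9 variables.
example = max_corruptible(64, resilience(64, 3))
```

Halving the replication at each level doubles the number of variables the adversary can hit, while the subproblems at that level are also smaller, which is the balance the analysis exploits.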
The resilient algorithm
The algorithm at recursive level i
Input: strings X and Y, boundaries L and T, and the respective read fingerprints in
private memory.
Output: boundaries R and D and the respective read fingerprints; null if an
uncorrectable fault occurs.
Note: inputs and outputs are ⌈δ/2^{i+1}⌉-resilient and fingerprints are computed
with prime p_i.
Private memory (input): Ψ_X, Ψ_Y, Ψ_T, Ψ_L
Private memory (output): Ψ_R, Ψ_D
The resilient algorithm (2)
Algorithm:
1 Compute the 4 subproblems recursively
2 For each subproblem:
1 Extract the inputs as ⌈δ/2^{i+1}⌉-resilient variables
2 Create read fingerprints of the subproblem inputs with prime p_{i+1}
3 While creating the new fingerprints, check correctness using the old ones
3 If a fault is detected, return null
4 If a subproblem returns null, change prime p_{i+1} and restart
Fingerprint mismatches
Input: vector X (⌈δ/2^i⌉-resilient) and Ψ(X) (computed with prime p_i)
1 At the same time compute:
fingerprint Ψ(X′) of X′ using p_{i+1}
fingerprint Ψ̃(X′) of X′ using p_i
2 Compute fingerprint Ψ̃(X) of X using p_i starting from Ψ̃(X′)
3 If Ψ̃(X) ≠ Ψ(X), at least ⌈δ/2^i⌉ + 1 faults have occurred: return null
Analysis
Successful recursive calls:
No fingerprint mismatch
T(n, δ) = 4T(n/2, δ/2) + Θ(n(δ + 1)) = O(n² + δn log n)
Unsuccessful calls (α ≤ δ actual number of faults):
Fingerprint mismatch at level i
at least δ/2^i values corrupted
at most α2^i/δ recomputations at level i
$$\sum_{i=1}^{\log\delta} \frac{\alpha 2^i}{\delta}\cdot\frac{T(n,\delta)}{4^i} \le T(n,\delta)\sum_{i=1}^{\log\delta}\frac{1}{2^i} \le T(n,\delta)$$
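The closed form for successful calls can be checked by unrolling the recurrence T(n, δ) = 4T(n/2, δ/2) + Θ(n(δ + 1)) level by level. The sketch below assumes the recursion bottoms out at constant-size subproblems after log n levels; level j has 4^j subproblems of size n/2^j with parameter δ/2^j.

```latex
T(n,\delta)
= \sum_{j=0}^{\log n} 4^{j}\cdot\Theta\!\left(\frac{n}{2^{j}}\left(\frac{\delta}{2^{j}}+1\right)\right)
= \sum_{j=0}^{\log n} \Theta\!\left(n\delta + 2^{j} n\right)
= O\!\left(\delta n\log n + n^{2}\right)
```

The first term contributes nδ at every level, and the second is a geometric sum dominated by its last term, 2^{\log n} n = n².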
Bounds for LD-DP
Bounds
Running time: O(n² + δn log n)
Matches O(n²) when δ = O(n/log n)
Cache misses: O(n²/(MB) + δn log n/B)
Lower bound: Ω(n²/(MB) + δn/B), where n²/(MB) is the non-resilient bound and δn/B is the resilient bound
Previous result: O(nm/B) misses even without faults
The algorithm is cache-oblivious
Requires private memory Θ(log n)
Research directions
Three research directions:
1 δ-obliviousness
2 Graphs
how to define correctness?
linear-time graph algorithms: cannot store input resiliently
3 Other algorithms/data structures
Trade-off redundancy vs ECC [Christiano et al., 2011]
Thank you!
Questions?
Arge, L., Bender, M. A., Demaine, E. D., Holland-Minkley, B., and Munro, J. I.
(2002).
Cache-oblivious priority queue and graph algorithm applications.
In Proc. of 34th STOC, pages 268–276.
Assaf, S. and Upfal, E. (1991).
Fault tolerant sorting networks.
SIAM Journal on Discrete Mathematics, 4(4):472–480.
Aumann, Y. and Bender, M. (1996).
Fault tolerant data structures.
In Proc. of 37th FOCS, pages 580–589.
Blömer, J. and Seifert, J.-P. (2003).
Fault based cryptanalysis of the Advanced Encryption Standard (AES).
In Financial Cryptography, volume 2742 of LNCS, chapter 12, pages 162–181.
Springer Berlin / Heidelberg.
Blum, M., Evans, W., Gemmell, P., Kannan, S., and Naor, M. (1991).
Checking the correctness of memories.
In Proc. 32nd FOCS, pages 90–99.
Brodal, G. S., Jørgensen, A. G., and Mølhave, T. (2009a).
Fault tolerant external memory algorithms.
In Proc. 11th WADS, volume 5664 of LNCS, pages 411–422.
Brodal, G. S., Jørgensen, A. G., Moruz, G., and Mølhave, T. (2009b).
Counting in the presence of memory faults.
In Proc. 20th ISAAC, volume 5878 of LNCS, pages 842–851.
Caminiti, S., Finocchi, I., Fusco, E. G., and Silvestri, F. (2011).
Dynamic programming in faulty memory hierarchies (cache-obliviously).
In Procs. of 31st FSTTCS.
Chlebus, B. S., Gambin, A., and Indyk, P. (1994).
PRAM computations resilient to memory faults.
In Proc. 2nd ESA, volume 855 of LNCS, pages 401–412. Springer Berlin /
Heidelberg.
Chowdhury, R. A. and Ramachandran, V. (2007).
The cache-oblivious gaussian elimination paradigm: theoretical framework,
parallelization and experimental evaluation.
In Proc. 19th SPAA, pages 71–80.
Christiano, P., Demaine, E. D., and Kishore, S. (2011).
Lossless fault-tolerant data structures with additive overhead.
In Proc. 12th WADS, pages 243–254.
de Wolf, R. (2009).
Error-correcting data structures.
In STACS, pages 313–324.
Finocchi, I., Grandoni, F., and Italiano, G. F. (2007).
Resilient search trees.
In Proc. of the 18th SODA, pages 547–553.
Finocchi, I., Grandoni, F., and Italiano, G. F. (2009).
Optimal resilient sorting and searching in the presence of memory faults.
Theor. Comput. Sci., 410(44):4457–4470.
Finocchi, I. and Italiano, G. (2004).
Sorting and searching in the presence of memory faults (without redundancy).
In Proc. of the 36th STOC, pages 101–110.
Finocchi, I. and Italiano, G. (2008).
Sorting and searching in faulty memories.
Algorithmica, 52:309–332.
Frigo, M., Leiserson, C. E., Prokop, H., and Ramachandran, S. (1999).
Cache-oblivious algorithms.
In Proc. 40th FOCS, pages 285–298.
Gieseke, F., Moruz, G., and Vahrenhold, J. (2010).
Resilient k-d trees: K-means in space revisited.
In Proc. of 10th ICDM, pages 815–820.
Govindavajhala, S. and Appel, A. W. (2003).
Using memory errors to attack a virtual machine.
In Proc. of Symp. Security and Privacy, pages 154–165. IEEE.
Indyk, P. (1996).
On word-level parallelism in fault-tolerant computing.
In Proc. 13th STACS, volume 1046 of LNCS, pages 193–204. Springer Berlin /
Heidelberg.
Jørgensen, A. G., Moruz, G., and Mølhave, T. (2007).
Priority queues resilient to memory faults.
In Algorithms and Data Structures, volume 4619 of LNCS, chapter 12, pages
127–138. Springer Berlin / Heidelberg.
Lakshmanan, K., Ravikumar, B., and Ganesan, K. (1991).
Coping with erroneous information while sorting.
IEEE Trans. on Computers, 40(9):1081–1084.
Leighton, T. and Ma, Y. (1999).
Tight bounds on the size of fault-tolerant merging and sorting networks with
destructive faults.
SIAM Journal on Computing, 29(1):258–273.
Pelc, A. (2002).
Searching games with errors - fifty years of coping with liars.
Theoretical Computer Science, 270(1-2):71–109.
Ravikumar, B. (2002).
A fault-tolerant merge sorting algorithm.
In Computing and Combinatorics, volume 2387 of LNCS, pages 465–496. Springer
Berlin / Heidelberg.
Rényi, A. (1994).
A Diary on Information Theory.
John Wiley & Sons.
Schroeder, B., Pinheiro, E., and Weber, W. D. (2011).
DRAM errors in the wild: a large-scale field study.
Commun. ACM, 54:100–107.
Skorobogatov, S. and Anderson, R. (2003).
Optical fault induction attacks.
In Proc. of CHES, volume 2523 of LNCS, chapter 2, pages 31–48. Springer Berlin /
Heidelberg.
Ulam, S. (1977).
Adventures of a mathematician.
Scribners.
Yao, A. C. and Yao, F. F. (1985).
On fault-tolerant networks for sorting.
SIAM Journal on Computing, 14(1):120–128.