CS 153
Design of Operating Systems
Spring 25
Lecture 11: Locality, Cache, and Memory
Hierarchy
Instructor: Chengyu Song
Some slides modified from originals by Dave O’Hallaron
Efficient Translations
Recall that our original page table scheme doubled the latency of doing
memory lookups
◆ One lookup into the page table, another to fetch the data
Now two-level page tables triple the latency!
◆ Two lookups into the page tables, a third to fetch the data
◆ And this assumes the page table is in memory
How can we use paging but also have lookups cost about the same as
fetching from memory?
◆ Cache (remember) translations in hardware
◆ Translation Lookaside Buffer (TLB)
◆ TLB managed by Memory Management Unit (MMU)
(Theme of this lecture: why memory access is slow and what to do about it)
The CPU-Memory Gap
The gap widens between DRAM, disk, and CPU speeds.
[Chart: access time in ns (log scale, 0.1 to 100,000,000) vs. year (1980–2010), plotting disk seek time, SSD access time, DRAM access time, SRAM access time, CPU cycle time, and effective CPU cycle time. Disk and DRAM improve far more slowly than the CPU.]
The Price-Speed Gap
Question: why don’t we just use fast memory to do everything?
SRAM
◆ Latency: 0.5-2.5 ns, cost: ~$5000 per GB
DRAM
◆ Latency: 50-70 ns, cost: ~$20 - $50 per GB
SSD/NVM
◆ Latency: 70-150 ns, cost: ~$4 - $12 per GB
Magnetic disk
◆ Latency: 5-20 ms, cost: ~$0.02 - $2 per GB
Locality to the Rescue!
The key to bridging this CPU-Memory gap is a fundamental property of
computer programs known as locality
Locality
Principle of Locality: Programs tend to use data and instructions with
addresses near or equal to those they have used recently
Temporal locality:
◆ Recently referenced items are likely
to be referenced again in the near future
Spatial locality:
◆ Items with nearby addresses tend
to be referenced close together in time
Q: What does locality enable us to do?
Locality Example
sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;

Data references
◆ Reference array elements in succession (stride-1 reference pattern). → Spatial locality
◆ Reference variable sum each iteration. → Temporal locality
Instruction references
◆ Reference instructions in sequence. → Spatial locality
◆ Cycle through loop repeatedly. → Temporal locality
Qualitative Estimates of Locality
Claim: Being able to look at code and get a qualitative sense of its locality
is a key skill for a professional programmer.
Question: Does this function have good locality with respect to array a?
int sum_array_rows(int a[M][N])
{
    int i, j, sum = 0;
    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}
Locality Example
Question: Does this function have good locality with respect to array a?
int sum_array_cols(int a[M][N])
{
    int i, j, sum = 0;
    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}
Locality Example
Question: Can you permute the loops so that the function scans the 3-d
array a with a stride-1 reference pattern (and thus has good spatial
locality)?
int sum_array_3d(int a[M][N][N])
{
    int i, j, k, sum = 0;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < M; k++)
                sum += a[k][i][j];
    return sum;
}
Cache
Cache: a smaller, faster storage that acts as a staging area for a subset
of the data in a larger, slower storage.
◆ The storage could be a software data structure or a hardware device → memory
hierarchy
Why does a cache work?
◆ Because of locality!
» We hit the fast storage much more frequently even though it is smaller
General Cache Concepts
[Diagram: a small cache (holding blocks 8, 9, 14, 3) sits above a larger memory partitioned into blocks 0–15; blocks 10 and 4 are shown being copied between the levels.]
◆ Smaller, faster, more expensive storage caches a subset of the blocks
◆ Data is copied between levels in block-sized transfer units
◆ Larger, slower, cheaper storage is viewed as partitioned into “blocks”
General Cache Concepts: Hit
[Diagram: request for block 14; the cache holds blocks 8, 9, 14, 3.]
Data in block b is needed
Block b is in cache: Hit!
General Cache Concepts: Miss
[Diagram: request for block 12; the cache holds blocks 8, 9, 14, 3, so block 12 is fetched from memory and replaces block 9.]
Data in block b is needed
Block b is not in cache: Miss!
Block b is fetched from memory and stored in the cache
• Placement policy: determines where b goes
• Replacement policy: determines which block gets evicted (victim)
Types of Cache Misses
Cold (compulsory) miss
◆ Cold misses occur because the cache is empty.
Conflict miss
◆ Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the
block positions at level k.
» E.g. Block i at level k+1 must be placed in block (i mod 4) at level k.
◆ Conflict misses occur when the level k cache is large enough, but multiple data
objects all map to the same level k block.
» E.g. Referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time.
Capacity miss
◆ Occurs when the set of active cache blocks (working set) is larger than the cache.
Cache Replacement Policy
Cache replacement policy: determines which data to remove when we
need a victim
Does it matter?
◆ Yes! Cache filling is expensive
◆ Driving the miss count down can improve system performance significantly
Considerations
Cache replacement support has to be simple
◆ Lookups happen on every access; we cannot make the common case slow
But it can be complicated/expensive when a miss occurs – why?
◆ Reason 1: if we are successful, misses will be rare
◆ Reason 2: when a miss happens, we are already paying the cost of loading
» Loading from the lower layer is relatively slow: we can afford some extra computation
» Worth it if we can save some future misses
What makes a good cache replacement policy?
Evicting the Best Data
Goal is to reduce the cache miss rate
The best data to evict is data that will never be touched again
◆ We will never have a cache miss on it
Never is a long time, so picking the data closest to “never” is the next
best thing
◆ Evicting the data that won’t be used for the longest period of time minimizes the
number of cache misses
◆ Proved by Belady
We’ll survey various replacement algorithms: Belady’s, FIFO, LRU (least
recently used)
Belady’s Algorithm
Belady’s algorithm
◆ Idea: Replace the data that will not be used for the longest time in the future
◆ Optimal? How would you show?
◆ Problem: Have to predict the future
Why is Belady’s useful then?
◆ Use it as a yardstick/upper bound
◆ Compare implementations of page replacement algorithms with the optimal to gauge
room for improvement
» If optimal is not much better, then algorithm is pretty good
◆ What’s a good lower bound?
» Random replacement is often the lower bound
First-In First-Out (FIFO)
FIFO is an obvious algorithm and simple to implement
◆ Maintain a list of pages in order in which they were paged in
◆ On replacement, evict the one brought in longest time ago
Why might this be good?
◆ Maybe the one brought in the longest ago is not being used
Why might this be bad?
◆ Then again, maybe it’s not
◆ We don’t have any info to say one way or the other
FIFO suffers from “Belady’s Anomaly”
◆ The miss rate might actually increase when the cache size grows (very bad)
Least Recently Used (LRU)
LRU uses reference information to make a more informed replacement
decision
◆ Idea: We can’t predict the future, but we can make a guess based upon past
experience
◆ On replacement, evict the page that has not been used for the longest time in the past
(Belady’s: future)
◆ When does LRU do well? When does LRU do poorly?
Implementation
◆ To be perfect, need to time stamp every reference (or maintain a stack) – much too
costly
◆ So we need to approximate it
Memory Hierarchies
Some fundamental and enduring properties of hardware and software:
◆ Fast storage technologies cost more per byte, have less capacity, and require more
power (heat!).
◆ The gap between CPU and main memory speed is widening.
◆ Well-written programs tend to exhibit good locality.
These fundamental properties complement each other beautifully.
They suggest an approach for organizing memory and storage systems
known as a memory hierarchy.
An Example of Memory Hierarchy
L0: Registers. CPU registers hold words retrieved from the L1 cache.
L1: L1 cache (SRAM). Holds cache lines retrieved from the L2 cache.
L2: L2 cache (SRAM). Holds cache lines retrieved from main memory.
L3: Main memory (DRAM). Holds disk blocks retrieved from local disks.
L4: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: Remote secondary storage (tapes, distributed file systems, Web servers).
Moving up the hierarchy: smaller, faster, costlier per byte. Moving down: larger, slower, cheaper per byte.
Memory hierarchy
Fundamental idea of a memory hierarchy:
◆ At each layer, a faster, smaller device caches a larger, slower device
Why do memory hierarchies work?
◆ Because of locality!
» We hit fast memory much more frequently even though it is smaller
◆ Thus, the storage at level k+1 can be slower (but larger and cheaper!)
Big Idea: The memory hierarchy creates a large pool of storage that costs
as much as the cheap storage near the bottom, but that serves data to
programs at the rate of the fast storage near the top.
Examples of Caching in the Hierarchy
Cache Type            What is Cached?       Where is it Cached?   Latency (cycles)  Managed By
Registers             4-8 byte words        CPU core              0                 Compiler
TLB                   Address translations  On-Chip TLB           0                 Hardware
L1 cache              64-byte blocks        On-Chip L1            1                 Hardware
L2 cache              64-byte blocks        On/Off-Chip L2        10                Hardware
Virtual Memory        4-KB pages            Main memory           100               Hardware + OS
Buffer cache          Parts of files        Main memory           100               OS
Disk cache            Disk sectors          Disk controller       100,000           Disk firmware
Network buffer cache  Parts of files        Local disk            10,000,000        AFS/NFS client
Browser cache         Web pages             Local disk            10,000,000        Web browser
Web cache             Web pages             Remote server disks   1,000,000,000     Web proxy server
Intel Core i7 Memory System
[Diagram: processor package with 4 cores, a shared L3 cache, the QuickPath interconnect, and a DDR3 memory controller.]
Per core (×4):
◆ Registers, instruction fetch, MMU (address translation)
◆ L1 d-cache: 32 KB, 8-way; L1 i-cache: 32 KB, 8-way
◆ L1 d-TLB: 64 entries, 4-way; L1 i-TLB: 128 entries, 4-way
◆ L2 unified cache: 256 KB, 8-way; L2 unified TLB: 512 entries, 4-way
Shared by all cores:
◆ L3 unified cache: 8 MB, 16-way
◆ QuickPath interconnect: 4 links @ 25.6 GB/s each (to other cores and the I/O bridge)
◆ DDR3 memory controller to main memory: 3 x 64 bit @ 10.66 GB/s, 32 GB/s total
End-to-end Core i7 Address Translation
[Diagram: end-to-end translation. The CPU issues a 32/64-bit virtual address; an L1 miss goes on to L2, L3, and main memory.]
◆ Virtual address (VA): VPN (36 bits) | VPO (12 bits)
◆ The VPN is split into TLBT (32 bits) | TLBI (4 bits) to index the L1 TLB (16 sets, 4 entries/set)
◆ On a TLB miss, CR3 starts a 4-level page-table walk: VPN1 | VPN2 | VPN3 | VPN4 (9 bits each), one PTE fetch per level, yielding a 40-bit PPN
◆ Physical address (PA): PPN (40 bits) | PPO (12 bits)
◆ The L1 d-cache (64 sets, 8 lines/set) interprets the PA as CT (40 bits) | CI (6 bits) | CO (6 bits)
Simple Memory System Example
Addressing
◆ 14-bit virtual addresses
◆ 12-bit physical addresses
◆ Page size = 64 bytes
Virtual address: bits 13–6 = VPN (Virtual Page Number), bits 5–0 = VPO (Virtual Page Offset)
Physical address: bits 11–6 = PPN (Physical Page Number), bits 5–0 = PPO (Physical Page Offset)
Simple Memory System Page Table
Only show first 16 entries (out of 256)
VPN PPN Valid VPN PPN Valid
00 28 1 08 13 1
01 – 0 09 17 1
02 33 1 0A 09 1
03 02 1 0B – 0
04 – 0 0C – 0
05 16 1 0D 2D 1
06 – 0 0E 11 1
07 – 0 0F 0D 1
Simple Memory System TLB
16 entries
4-way associative* (what is this?!)
The VPN is split for the TLB: TLBT (tag) = bits 13–8 of the virtual address, TLBI (set index) = bits 7–6 (the low 2 bits of the VPN, selecting one of 4 sets)
Set Tag PPN Valid Tag PPN Valid Tag PPN Valid Tag PPN Valid
0 03 – 0 09 0D 1 00 – 0 07 02 1
1 03 2D 1 02 – 0 04 – 0 0A – 0
2 02 – 0 08 – 0 06 – 0 03 – 0
3 07 – 0 03 0D 1 0A 34 1 02 – 0
Simple Memory System Cache
16 lines, 4-byte block size
Physically addressed
Direct mapped
Physical address split: CT (tag) = bits 11–6, CI (index) = bits 5–2, CO (block offset) = bits 1–0
Idx Tag Valid B0 B1 B2 B3 Idx Tag Valid B0 B1 B2 B3
0 19 1 99 11 23 11 8 24 1 3A 00 51 89
1 15 0 – – – – 9 2D 0 – – – –
2 1B 1 00 02 04 08 A 2D 1 93 15 DA 3B
3 36 0 – – – – B 0B 0 – – – –
4 32 1 43 6D 8F 09 C 12 0 – – – –
5 0D 1 36 72 F0 1D D 16 1 04 96 34 15
6 31 0 – – – – E 13 1 83 77 1B D3
7 16 1 11 C2 DF 03 F 14 0 – – – –
Address Translation Example #1
Virtual Address: 0x03D4
Bit pattern (TLBT | TLBI | VPO): 0 0 0 0 1 1 1 1 0 1 0 1 0 0
VPN: 0x0F   TLBI: 0x3   TLBT: 0x03   TLB Hit? Y   Page Fault? N   PPN: 0x0D
Physical Address: 0x354
Bit pattern (CT | CI | CO): 0 0 1 1 0 1 0 1 0 1 0 0
CO: 0x0   CI: 0x5   CT: 0x0D   Cache Hit? Y   Byte: 0x36
Address Translation Example #2
Virtual Address: 0x0B8F
Bit pattern (TLBT | TLBI | VPO): 0 0 1 0 1 1 1 0 0 0 1 1 1 1
VPN: 0x2E   TLBI: 0x2   TLBT: 0x0B   TLB Hit? N   Page Fault? Y   PPN: TBD
Physical Address: unknown until the page fault is handled, so CO, CI, CT, Hit, and Byte cannot be determined
Address Translation Example #3
Virtual Address: 0x0020
Bit pattern (TLBT | TLBI | VPO): 0 0 0 0 0 0 0 0 1 0 0 0 0 0
VPN: 0x00   TLBI: 0x0   TLBT: 0x00   TLB Hit? N   Page Fault? N   PPN: 0x28
Physical Address: 0xA20
Bit pattern (CT | CI | CO): 1 0 1 0 0 0 1 0 0 0 0 0
CO: 0x0   CI: 0x8   CT: 0x28   Cache Hit? N   Byte: fetched from memory