Computer Architecture and Organization
Lecture 12: Locality and Caching
Majid Khabbazian mkhabbazian@[Link]
Electrical and Computer Engineering University of Alberta
April 9, 2013
Outline
Locality
Caching
Locality
Principle of Locality: Programs tend to use data and instructions with addresses near or equal to those they have used recently.
Temporal locality:
Recently referenced items are likely to be referenced again in the near future
Spatial locality:
Items with nearby addresses tend to be referenced close together in time
Locality Example
sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;
Data references
Reference array elements in succession (stride-1 reference pattern): spatial locality.
Reference variable sum each iteration: temporal locality.
Instruction references
Reference instructions in sequence: spatial locality.
Cycle through loop repeatedly: temporal locality.
Qualitative Estimates of Locality
Claim: Being able to look at code and get a qualitative sense of its locality is a key skill for a professional programmer.
Question: Does this function have good locality with respect to array a?
int sum_array_rows(int a[M][N])
{
    int i, j, sum = 0;

    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}
Locality Example
Question: Does this function have good locality with respect to array a?
int sum_array_cols(int a[M][N])
{
    int i, j, sum = 0;

    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}
Locality Example
Question: Can you permute the loops so that the function scans the 3-d array a with a stride-1 reference pattern (and thus has good spatial locality)?
int sum_array_3d(int a[M][N][N])
{
    int i, j, k, sum = 0;

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < M; k++)
                sum += a[k][i][j];
    return sum;
}
Memory Hierarchies
Some fundamental and enduring properties of hardware and software:
Fast storage technologies cost more per byte, have less capacity, and require more power (heat!).
The gap between CPU and main memory speed is widening.
Well-written programs tend to exhibit good locality.
These fundamental properties complement each other beautifully. They suggest an approach for organizing memory and storage systems known as a memory hierarchy.
An Example Memory Hierarchy
L0: Registers. CPU registers hold words retrieved from the L1 cache.
L1: L1 cache (SRAM). Holds cache lines retrieved from the L2 cache.
L2: L2 cache (SRAM). Holds cache lines retrieved from main memory.
L3: Main memory (DRAM). Holds disk blocks retrieved from local disks.
L4: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: Remote secondary storage (tapes, distributed file systems, Web servers).
Levels toward the top (L0) are smaller, faster, and costlier per byte; levels toward the bottom (L5) are larger, slower, and cheaper per byte.
Caches
Cache: A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.
Fundamental idea of a memory hierarchy:
For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.
Why do memory hierarchies work?
Because of locality, programs tend to access the data at level k more often than they access the data at level k+1. The storage at level k+1 can therefore be slower, and thus larger and cheaper per bit.
Big Idea: The memory hierarchy creates a large pool of storage that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.
General Cache Concepts
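[Figure: a small cache holding a subset of memory blocks (8, 9, 14, 3) above a memory partitioned into blocks 0 to 15; blocks 4 and 10 are shown being copied between them.]
Cache: smaller, faster, more expensive memory that caches a subset of the blocks.
Memory: larger, slower, cheaper memory, viewed as partitioned into blocks.
Data is copied between them in block-sized transfer units.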
General Cache Concepts: Hit
Request: 14. Data in block b is needed.
Block b (14) is in the cache: Hit!
General Cache Concepts: Miss
Request: 12. Data in block b is needed.
Block b (12) is not in the cache: Miss!
Block b is fetched from memory and stored in the cache.
Placement policy: determines where b goes.
Replacement policy: determines which block gets evicted (the victim).
General Cache Concepts: Types of Cache Misses
Cold (compulsory) miss
Cold misses occur because the cache starts out empty: the first reference to each block must miss.
Conflict miss
Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k.
E.g. Block i at level k+1 must be placed in block (i mod 4) at level k.
Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block.
E.g. Referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time.
Capacity miss
Occurs when the set of active cache blocks (the working set) is larger than the cache.
Examples of Caching in the Hierarchy
Cache Type           | What is Cached?      | Where is it Cached? | Latency (cycles) | Managed By
---------------------|----------------------|---------------------|------------------|-----------------
Registers            | 4-8 byte words       | CPU core            | 0                | Compiler
TLB                  | Address translations | On-chip TLB         | 0                | Hardware
L1 cache             | 64-byte blocks       | On-chip L1          | 1                | Hardware
L2 cache             | 64-byte blocks       | On/off-chip L2      | 10               | Hardware
Virtual memory       | 4-KB pages           | Main memory         | 100              | Hardware + OS
Buffer cache         | Parts of files       | Main memory         | 100              | OS
Disk cache           | Disk sectors         | Disk controller     | 100,000          | Disk firmware
Network buffer cache | Parts of files       | Local disk          | 10,000,000       | AFS/NFS client
Browser cache        | Web pages            | Local disk          | 10,000,000       | Web browser
Web cache            | Web pages            | Remote server disks | 1,000,000,000    | Web proxy server
Summary
The speed gap between CPU, memory and mass storage continues to widen.
Well-written programs exhibit a property called locality.
Memory hierarchies based on caching close the gap by exploiting locality.
Cache-friendly code
Example:
Several times slower (Pentium 4)!!