Contents
• Main Memory
– Memory access time
– Memory cycle time
• Types of Memory Unit
– RAM
– ROM
• Memory Organization
• Memory System
• Cache Memory
– Associative mapping
– Direct mapping
– Set-associative mapping
– Replacement algorithm
• Memory Interleaving
Main Memory ( 1 )
• Main Memory - part of computer where program and data
are stored during execution.
• It consists of a number of cells (or locations), each of which
can store a piece of information (data, instruction, character
or number).
• The size of the cell can be single byte or several successive
bytes (word) - byte-addressable or word-addressable
computer.
Main Memory ( 2 )
• Each cell has a reference number, called address, by which
program can refer to it.
• If a memory address has k bits, the maximum number of cells
directly addressable is 2^k.
– Example: For 16-bit addressing, the maximum
number of cells will be:
2^16 = 65536 memory cells
• The maximum number of addressable locations available in main
memory for a computer system is called the size of the main
memory.
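The relation between address width and addressable cells can be checked with a short sketch. The function name `max_addressable_cells` is a hypothetical helper introduced only for this illustration.

```python
def max_addressable_cells(address_bits: int) -> int:
    """Return how many distinct cells a k-bit address can select (2**k)."""
    return 2 ** address_bits


# The 16-bit example from the slide.
print(max_addressable_cells(16))  # 65536
```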
Main Memory ( 3 )
• The basic unit of memory is the binary digit “bit”. A bit contains a “0” or “1”.
• The most common definition of the word length of a computer is the number of
bits actually stored or retrieved in one memory access.
• Word length and address length are independent.
[Figure: Connection of the memory to the processor. The processor's MAR drives a k-bit address bus and its MDR connects to an n-bit data bus. The memory has up to 2^k addressable locations, word length = n bits. Control lines carry R/W, MFC, etc.]
Main Memory ( 4 )
• Memory Access Time
– The time that elapses between the initiation and the completion of
a memory access operation.
• e.g., the time between the READ and MFC (Memory Function
Complete) signals
• Memory Cycle Time
– The minimum time delay required between two successive memory
access operations.
– The cycle time is usually slightly longer than the access time.
MEMORY ORGANIZATION
C Programming Preview
Sizes of C Objects (in Bytes)
C Data Type    Typical 32-bit    Intel IA32    x86-64
char           1                 1             1
short          2                 2             2
int            4                 4             4
long           4                 4             8
long long      8                 8             8
float          4                 4             4
double         8                 8             8
long double    8                 10/12         10/16
char *         4                 4             8
» Or any other pointer
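One way to see which column of the table applies to a given machine is to query the sizes from Python's standard `ctypes` module. This is only a sketch: the dictionary name `C_TYPES` is an assumption made for the example, and the printed values depend on the platform and compiler ABI the interpreter was built for.

```python
import ctypes

# Map the C type names from the table to their ctypes equivalents.
C_TYPES = {
    "char": ctypes.c_char,
    "short": ctypes.c_short,
    "int": ctypes.c_int,
    "long": ctypes.c_long,
    "long long": ctypes.c_longlong,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
    "long double": ctypes.c_longdouble,
    "char *": ctypes.c_char_p,
}

# sizeof() reports the size in bytes on the machine running this script.
sizes = {name: ctypes.sizeof(ctype) for name, ctype in C_TYPES.items()}

for name, size in sizes.items():
    print(f"{name:12s} {size}")
```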
15-213, S ‘09 – 7–
Memory is always organized
Byte-Oriented
BUT
Machine has “Word Size”
Machine Has “Word Size”
Nominal size of integer-valued data
Including addresses
Most current machines use 32-bit (4-byte) words
Limits addresses to 4GB
Becoming too small for memory-intensive applications
High-end systems use 64-bit (8-byte) words
Potential address space 1.8 X 10^19 bytes
x86-64 machines support 48-bit addresses: 256 Terabytes
Machines support multiple data formats
Fractions or multiples of word size
Always integral number of bytes
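The address-space figures quoted for 32-bit, 48-bit and 64-bit addresses follow directly from 2^(address bits). A small sketch, assuming byte addressing and binary units (1 GB = 2^30 bytes, 1 TB = 2^40 bytes); the helper name is hypothetical.

```python
GB = 2 ** 30
TB = 2 ** 40


def address_space_bytes(address_bits: int) -> int:
    """Number of bytes reachable with the given address width."""
    return 2 ** address_bits


print(address_space_bytes(32) // GB)      # 4   (GB)
print(address_space_bytes(48) // TB)      # 256 (TB)
print(f"{address_space_bytes(64):.1e}")   # 1.8e+19 bytes
```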
Byte & Word Oriented Memory Organization
Addresses Specify Byte Locations
Addresses of successive words differ by 2 (16-bit), 4 (32-bit) or 8 (64-bit)
Byte Addr    Value    16-bit Words    32-bit Words    64-bit Words
0000         3F       Addr = 0000     Addr = 0000     Addr = 0000
0001         29
0002         7D       Addr = 0002
0003         87
0004         FE       Addr = 0004     Addr = 0004
0005         02
0006         20       Addr = 0006
0007         47
0008         AB       Addr = 0008     Addr = 0008     Addr = 0008
0009         CD
000A         EF       Addr = 000A
000B         F0
000C         D0       Addr = 000C     Addr = 000C
000D         19
000E         99       Addr = 000E
000F         8C
Byte Ordering
How should bytes within multi-byte word be ordered in
memory?
Conventions
Big Endian: Sun, PPC Mac, Internet
Least significant byte has highest address
Little Endian: x86
Least significant byte has lowest address
Byte Ordering Example
Big Endian
Least significant byte has highest address
Little Endian
Least significant byte has lowest address
Example
Variable x has 4-byte representation 0x01234567
Address given by &x is 0x100
Address          0x100    0x101    0x102    0x103
Big Endian       01       23       45       67
Little Endian    67       45       23       01
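The two byte orders for x = 0x01234567 can be reproduced with Python's built-in `int.to_bytes`. This is a small sketch; the `dump` helper and the base address 0x100 are taken from the example only for display purposes.

```python
x = 0x01234567

# Serialize the same 4-byte value under both conventions.
big = x.to_bytes(4, byteorder="big")
little = x.to_bytes(4, byteorder="little")


def dump(label: str, data: bytes, base: int = 0x100) -> None:
    """Print each byte next to the address it would occupy."""
    cells = "  ".join(f"{base + i:#x}: {b:02x}" for i, b in enumerate(data))
    print(f"{label:13s} {cells}")


dump("Big Endian", big)        # 0x100: 01  0x101: 23  0x102: 45  0x103: 67
dump("Little Endian", little)  # 0x100: 67  0x101: 45  0x102: 23  0x103: 01
```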
Types of Memory Unit ( 1 )
• Random-Access Memory (RAM)
– Any location can be accessed for a Read or Write
operation in some constant amount of time that is
independent of the memory location.
– Static RAM (SRAM)
• Memories that consist of circuits capable of retaining their state as long
as power is applied.
• SRAMs are fast (a few nanoseconds access time) but their cost is high.
– Dynamic RAM (DRAM)
• These memory units are capable of storing information for only tens of
milliseconds, and thus require periodic refresh to maintain their contents.
Types of Memory Unit ( 2 )
• Read-Only Memory (ROM)
– Nonvolatile memory
– Data are written into a ROM when it is manufactured.
Normal operation involves only reading of stored data.
– ROMs are useful as the control store component in a
micro-programmed CPU (micro-coded CPU).
– ROM is also commonly used for storing the bootstrap
loader, a program whose function is to load the boot
program from the disk into the memory when the
power is turned on.
Types of Memory Unit ( 3 )
– PROM (Programmable ROM)
• data can be loaded by the user, but this process is
irreversible
• provide a faster and less expensive approach when
only a small number of data are required
– EPROM (Erasable, Programmable ROM)
• stored data can be erased by exposing the chip to
ultraviolet light, after which new data can be loaded
– EEPROM
• stored data can be erased electrically and selectively
• different voltages are needed for erasing, writing, and
reading the stored data
Types of Memory Unit ( 4 )
– Flash memory
• similar to EEPROM technology
• it is possible to read the contents of a single cell, but it is
only possible to write an entire block of cells
• greater density, higher capacity, lower cost per bit, low
power consumption
• typical applications: hand-held computers, digital cameras,
MP3 music players
• large memory modules implementation: flash cards and
flash drives
Types of Memory Unit ( 5 )
[Figure 5.2 Typical Memory Cell Structures. (a) Dynamic RAM (DRAM) cell: a transistor and a storage capacitor, connected to the address line, bit line B and ground. (b) Static RAM (SRAM) cell: transistors T1 to T6 with nodes C1 and C2, connected to the dc voltage, the address line, the two bit lines and ground.]
Memory Systems ( 1 )
• Implement a 64K X 8 memory using 16K X 1 memory chips
• Number of rows = 64K/16K = 4
• Number of columns = 8/1 = 8.
• 64K = 16 bit address.
• Each small chip needs 14 bit (16K) address
• 16-14 = 2 bits for selecting one of the four rows.
[Figure: a 16K X 1 memory chip with a 14-bit address input, data input, data output and chip select.]
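The row, column and address-bit arithmetic above can be written as a small sketch. The function `plan_memory` and its result keys are hypothetical names introduced for this illustration, and the sketch assumes all sizes are powers of two.

```python
K = 1024


def plan_memory(total_words, word_bits, chip_words, chip_bits):
    """Work out how many chips are needed and how the address is split."""
    rows = total_words // chip_words        # chips stacked to reach the word count
    columns = word_bits // chip_bits        # chips side by side to reach the width
    address_bits = (total_words - 1).bit_length()
    chip_address_bits = (chip_words - 1).bit_length()
    return {
        "rows": rows,
        "columns": columns,
        "chips": rows * columns,
        "address_bits": address_bits,
        "chip_address_bits": chip_address_bits,
        "row_select_bits": address_bits - chip_address_bits,
    }


# 64K X 8 memory built from 16K X 1 chips.
plan = plan_memory(64 * K, 8, 16 * K, 1)
print(plan)
```

The same helper applied to a 2M x 32 memory built from 512K x 8 chips gives 4 rows, 4 columns, a 21-bit address and a 19-bit internal chip address.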
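Memory Systems ( 2 )
[Figure: the 64K X 8 memory built from 16K X 1 memory chips. Of the 16-bit address (A0 to A15), 14 bits form the internal chip address and the other 2 bits feed a 2-bit decoder that selects one row of chips. Each chip has data i/p and data o/p lines; the columns supply data bits b7, b6, ..., b0.]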
Memory Systems ( 3 )
[Figure: A memory system with 2M words (2M x 32) formed by (512K x 8) memory chips. Of the 21-bit address (A0 to A20), 19 bits form the internal chip address and 2 bits feed a 2-bit decoder. Each row of four chips supplies D31-24, D23-16, D15-8 and D7-0. Each 512K x 8 memory chip has a 19-bit address, 8-bit data input/output and chip select.]
Memory Systems ( 4 )
• Each chip has a control input called Chip Select (CS) used
to enable the chip.
• 21 address bits are needed to select a 32-bit word.
– The high-order 2 bits are decoded to determine which of the 4 CS
control signals is activated.
– The remaining 19 bits are used to access specific byte locations
inside each chip of the selected row.
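The address split described above (2 high-order bits for the row, 19 low-order bits inside the chip) can be sketched as follows. The function name `decode` and the one-hot list used to stand for the four CS signals are assumptions made for illustration.

```python
CHIP_ADDRESS_BITS = 19   # internal address of a 512K x 8 chip
ROW_SELECT_BITS = 2      # decoded into 4 chip-select signals
ADDRESS_BITS = CHIP_ADDRESS_BITS + ROW_SELECT_BITS


def decode(word_address: int):
    """Split a 21-bit word address into row, chip-select lines and chip address."""
    if not 0 <= word_address < 2 ** ADDRESS_BITS:
        raise ValueError("address does not fit in 21 bits")
    row = word_address >> CHIP_ADDRESS_BITS
    chip_address = word_address & ((1 << CHIP_ADDRESS_BITS) - 1)
    # Exactly one CS line is active: the one for the selected row.
    chip_select = [int(i == row) for i in range(2 ** ROW_SELECT_BITS)]
    return row, chip_select, chip_address


print(decode(0))              # (0, [1, 0, 0, 0], 0)
print(decode(1 << 19))        # (1, [0, 1, 0, 0], 0)
print(decode(2 ** 21 - 1))    # (3, [0, 0, 0, 1], 524287)
```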
Memory Systems ( 5 )
• Single In-line Memory Modules (SIMMs) and
Dual In-line Memory Modules (DIMMs)
– An assembly of several memory chips on a
separate small board that plugs vertically into a
single socket on the motherboard.
– Occupy a smaller amount of space.
– Allow easy expansion.
Memory Interleaving ( 1 )
• Main memory is structured as a number of physical modules (chips).
• Each memory module has its own Address Buffer Register (ABR) and Data
Buffer Register (DBR).
• Memory access may proceed in more than one module simultaneously,
so the aggregate rate of transmission of words to and from the main
memory can be increased.
• How individual addresses are distributed over the modules is critical in
determining the average number of modules that can be kept busy.
Memory Interleaving ( 2 )
• There are two memory address layouts :
(a) Consecutive words in a module
– The address consists of :
• (1) high-order k bits identify a single module (0 to n-1)
• (2) low-order m bits point to a particular word in that
module
• (3) Accessing consecutive addresses will keep only one
module busy.
Memory Interleaving ( 3 )
[Figure: MM address = Module (k bits) | Address in module (m bits). Modules 0, ..., i, ..., n-1 each have their own ABR and DBR.]
(a) Consecutive words in a module
Memory Interleaving ( 4 )
(b) Consecutive words in consecutive modules
– The address consists of :
• (1) low-order k bits determine a module
• (2) high-order m bits name a location within that module
• (3) Accessing consecutive addresses will keep several
modules busy at any one time
– It is called Memory Interleaving.
– Faster access to a block of data.
– Higher average utilization of the memory system.
Memory Interleaving ( 5 )
[Figure: MM address = Address in module (m bits) | Module (k bits). Modules 0, ..., i, ..., 2^k - 1 each have their own ABR and DBR.]
(b) Consecutive words in consecutive modules
Memory Interleaving Example*
Consecutive Bytes in the same Memory Bank
[Figure: the CPU is connected over D0 – D7 to Memory Bank 0 (holding bytes 01, 02, 03, 04) and Memory Bank 1 (holding bytes 11, 12, 13, 14). Address lines A0 and A1 go to both banks; A2 drives the decoder that generates Chip Select 0 and Chip Select 1.]
Addr (bits)    Data      Read Time
A2 A1 A0       (Byte)    (Clock Cycle)
000            01        2
001            02        3
010            03        3
011            04        3
100            11        2
101            12        3
110            13        3
111            14        3
*lots of assumptions to elaborate the Basic Concept
Memory Interleaving Example*
Consecutive Bytes in the Consecutive Memory Bank
[Figure: the CPU is connected over D0 – D7 to Memory Bank 0 (holding bytes 01, 02, 03, 04) and Memory Bank 1 (holding bytes 11, 12, 13, 14). Address lines A2 and A1 go to both banks; A0 drives the decoder that generates Chip Select 0 and Chip Select 1.]
Addr (bits)    Data      Read Time
A2 A1 A0       (Byte)    (Clock Cycle)
000            01        2
001            11        2
010            02        2
011            12        2
100            03        2
101            13        2
110            04        2
111            14        2
*lots of assumptions to elaborate the Basic Concept
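The two address layouts in the examples differ only in which address bits select the bank. The sketch below reproduces the byte order seen by the CPU in both cases, using the bank contents from the example (k = 1 module bit, m = 2 bits within a bank). The function and constant names are hypothetical, and read timing is deliberately not modelled.

```python
# Contents of the two banks from the example, as byte labels.
BANKS = [["01", "02", "03", "04"],   # Memory Bank 0
         ["11", "12", "13", "14"]]   # Memory Bank 1
K_BITS = 1   # bits that select the bank (module)
M_BITS = 2   # bits that select the byte inside a bank


def consecutive_layout(address: int):
    """High-order bits pick the bank: consecutive bytes share a bank."""
    module = address >> M_BITS
    offset = address & ((1 << M_BITS) - 1)
    return module, offset


def interleaved_layout(address: int):
    """Low-order bits pick the bank: consecutive bytes alternate banks."""
    module = address & ((1 << K_BITS) - 1)
    offset = address >> K_BITS
    return module, offset


def read_all(layout):
    """Read addresses 000 to 111 in order under the given layout."""
    return [BANKS[m][o] for m, o in map(layout, range(8))]


print(read_all(consecutive_layout))   # ['01', '02', '03', '04', '11', '12', '13', '14']
print(read_all(interleaved_layout))   # ['01', '11', '02', '12', '03', '13', '04', '14']
```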
Concept of Cache
Cache Memory ( 1 )
• During the execution of a typical program, memory references
tend to fall in a few localized areas of the program (in memory)
at any given interval of time. This property is called:
• Locality of Reference
– temporal: a recently executed instruction is likely to be executed
again very soon, e.g., loop, stack.
– spatial: instructions in close proximity to a recently executed
instruction are also likely to be executed soon.
Cache Memory ( 2 )
[Figure: Processor <-> Cache <-> Main memory]
• Cache memory used to store the active segments of the
program will then reduce the average memory access time,
resulting in faster execution of the program.
• It is usually implemented using SRAM, which is very fast
memory (a few ns access time) but expensive.
Cache Memory ( 3 )
• In a read operation, the block containing the location
specified is transferred into the cache from the main
memory, if it is not in the cache (a miss). Otherwise (a hit),
the block can be read from the cache directly.
• The performance of cache memory is frequently measured
in terms of the hit ratio. A high hit ratio verifies the validity
of the locality of reference property.
hit ratio = (Number of hits) / (Total number of memory references)
Cache Memory ( 4 )
• Two different ways of handling write accesses in a system with cache
memory :
– (1) Write-through method – the cache and the main memory
locations are updated simultaneously.
– (2) Write-back method - cache location updated during a write
operation is marked with a dirty or modified bit. The main memory
location is updated later when the block is to be removed from the
cache.
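The difference between the two write methods can be illustrated with a deliberately simplified sketch. It tracks single words rather than blocks, and the class `WordCache` with its `write` and `evict` methods is a hypothetical model for this illustration, not a description of real cache hardware.

```python
class WordCache:
    """Toy cache in front of a dict-based main memory (word granularity)."""

    def __init__(self, memory: dict, write_back: bool):
        self.memory = memory
        self.write_back = write_back
        self.lines = {}      # address -> cached value
        self.dirty = set()   # addresses modified in the cache only

    def write(self, address, value):
        self.lines[address] = value
        if self.write_back:
            # Write-back: only mark the location as dirty for now.
            self.dirty.add(address)
        else:
            # Write-through: update main memory at the same time.
            self.memory[address] = value

    def evict(self, address):
        value = self.lines.pop(address)
        if address in self.dirty:
            # Main memory is updated only when the entry leaves the cache.
            self.memory[address] = value
            self.dirty.discard(address)


memory_wt = {0x10: 1}
WordCache(memory_wt, write_back=False).write(0x10, 2)
print(memory_wt[0x10])   # 2: memory updated immediately

memory_wb = {0x10: 1}
cache_wb = WordCache(memory_wb, write_back=True)
cache_wb.write(0x10, 2)
print(memory_wb[0x10])   # 1: memory is stale until eviction
cache_wb.evict(0x10)
print(memory_wb[0x10])   # 2
```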
Cache Memory ( 5 )
• The correspondence between the main memory blocks and
those in the cache is specified by a mapping function.
– Direct Mapping
– Associative Mapping
– Set-associative Mapping
• To explain the mapping procedures, we consider
– a 2K cache consisting of 128 blocks of 16 words each, and
– a 64K main memory addressable by a 16-bit address, 4096 blocks of 16
words each. [Assumed Memory is Word Organized]
Direct Mapping ( 1 )
• Block j of the main memory maps onto block j modulo 128 of the
cache.
• The 7-bit cache block field determines the cache position.
• The high-order 5 tag bits identify which of the 32 blocks is currently
resident in the cache.
[Figure: main memory (Block 0, Block 1, ..., Block 127, Block 128, Block 129, ..., Block 255, Block 256, Block 257, ..., Block 4095) mapped onto the cache (Block 0, Block 1, ..., Block 127, each stored with a tag).]
Main memory address: Tag (5 bits) | Block (7 bits) | Word (4 bits)
Direct Mapping ( 2 )
• Since more than one memory block is mapped
onto a given cache block position, contention
may arise for that position even when the
cache is not full.
• This technique is easy to implement, but it is
not flexible.
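The 5/7/4 split of the 16-bit address used by direct mapping can be sketched as below. The function `split_direct` is a hypothetical helper, and the sketch assumes the word-organized 64K memory and 128-block cache of the running example.

```python
WORD_BITS = 4    # 16 words per block
BLOCK_BITS = 7   # 128 cache blocks
TAG_BITS = 5     # 4096 / 128 = 32 memory blocks share each cache block


def split_direct(address: int):
    """Split a 16-bit address into (tag, cache block, word) fields."""
    word = address & ((1 << WORD_BITS) - 1)
    block = (address >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = address >> (WORD_BITS + BLOCK_BITS)
    return tag, block, word


# Main memory blocks 0, 128 and 256 all contend for cache block 0;
# only the tag tells them apart.
for j in (0, 128, 256):
    print(j, split_direct(j * 16))   # (0, 0, 0), (1, 0, 0), (2, 0, 0)

# Word 3 of main memory block 129 lands in cache block 129 mod 128 = 1.
print(split_direct(129 * 16 + 3))    # (1, 1, 3)
```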
Associative Mapping ( 1 )
• A main memory block can be placed into any cache block
position, so the space in the cache can be used more efficiently.
• The 12 tag bits identify a memory block residing in the cache.
• The lower-order 4 bits select one of 16 words in a block.
[Figure: main memory (Block 0, Block 1, ..., Block i, ..., Block 4095) mapped onto the cache (Block 0, Block 1, ..., Block 127, each stored with a tag).]
Main memory address: Tag (12 bits) | Word (4 bits)
Associative Mapping ( 2 )
• The cost of an associative cache is relatively
high because of the need to search all 128
tags to determine whether a given block is in
the cache.
• For performance reasons, associative search
must be done in parallel.
Set-Associative Mapping ( 1 )
• Blocks of the cache are grouped into sets, and the mapping
allows a block of the main memory to reside in any block of a
specific set.
• A cache that has k blocks per set is referred to as a k-way set-
associative cache.
• The contention problem of the direct method is eased.
• The hardware cost of the associative method is reduced.
Set-Associative Mapping ( 2 )
• The 6-bit set field determines which set of the cache might
contain the desired block.
• The tag field is associatively compared to the tags of the
two blocks of the set to check if the desired block is present.
[Figure: main memory (Block 0, Block 1, ..., Block 63, Block 64, Block 65, ..., Block 127, Block 128, Block 129, ..., Block 4095) mapped onto a cache of 64 sets: Set 0 holds cache Blocks 0 and 1, Set 1 holds Blocks 2 and 3, ..., Set 63 holds Blocks 126 and 127, each block stored with a tag.]
Main memory address: Tag (6 bits) | Set (6 bits) | Word (4 bits)
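For the two-way set-associative organization, the address is split 6/6/4 and only the tags of the selected set are compared. A minimal sketch, with `split_set_associative`, `lookup` and the list-of-lists cache model all being hypothetical names chosen for illustration:

```python
WORD_BITS = 4   # 16 words per block
SET_BITS = 6    # 64 sets of 2 blocks each


def split_set_associative(address: int):
    """Split a 16-bit address into (tag, set, word) fields."""
    word = address & ((1 << WORD_BITS) - 1)
    set_index = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_index, word


def lookup(cache_sets, address: int) -> bool:
    """Return True on a hit: the tag matches one of the blocks in the set."""
    tag, set_index, _ = split_set_associative(address)
    return tag in cache_sets[set_index]


# Model the cache as 64 sets, each holding up to two tags.
cache_sets = [[] for _ in range(1 << SET_BITS)]
cache_sets[1] = [2, 5]   # set 1 currently holds memory blocks 129 and 321

print(split_set_associative(129 * 16 + 5))   # (2, 1, 5)
print(lookup(cache_sets, 129 * 16 + 5))      # True  (tag 2 is in set 1)
print(lookup(cache_sets, 65 * 16))           # False (tag 1 is not in set 1)
```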
Replacement Algorithms
• Difficult to determine which blocks to kick out
• Least Recently Used (LRU) block
• The cache controller tracks references to all
blocks as computation proceeds.
• Increase / clear track counters when a hit/miss
occurs
Replacement Algorithms
• For Associative & Set-Associative Cache
Which location should be emptied when the cache
is full and a miss occurs?
– First In First Out (FIFO)
– Least Recently Used (LRU)
• Distinguish an Empty location from a Full one
– Valid Bit
Replacement Algorithms
CPU Reference    A     B     C     A     D     E     A     D     C     F
                 Miss  Miss  Miss  Hit   Miss  Miss  Miss  Hit   Hit   Miss
Cache (FIFO)     A     A     A     A     A     E     E     E     E     E
                       B     B     B     B     B     A     A     A     A
                             C     C     C     C     C     C     C     F
                                         D     D     D     D     D     D
Hit Ratio = 3 / 10 = 0.3
Replacement Algorithms
CPU Reference    A     B     C     A     D     E     A     D     C     F
                 Miss  Miss  Miss  Hit   Miss  Miss  Hit   Hit   Hit   Miss
Cache (LRU)      A     B     C     A     D     E     A     D     C     F
                       A     B     C     A     D     E     A     D     C
                             A     B     C     A     D     E     A     D
                                         B     C     C     C     E     A
(rows ordered from most recently used at the top to least recently used at the bottom)
Hit Ratio = 4 / 10 = 0.4
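Both hit ratios can be reproduced with a short simulation of the reference string A B C A D E A D C F on a 4-block cache. This is a sketch for checking the tables; the function names are hypothetical and the cache is modelled with standard-library containers rather than hardware counters.

```python
from collections import OrderedDict, deque


def simulate_fifo(refs, capacity):
    """Count hits when the block that entered first is replaced."""
    cache = deque()          # leftmost entry is the oldest arrival
    hits = 0
    for block in refs:
        if block in cache:
            hits += 1        # a hit does not change the arrival order
        else:
            if len(cache) == capacity:
                cache.popleft()
            cache.append(block)
    return hits


def simulate_lru(refs, capacity):
    """Count hits when the least recently used block is replaced."""
    cache = OrderedDict()    # first entry is the least recently used
    hits = 0
    for block in refs:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict the LRU block
            cache[block] = True
    return hits


refs = list("ABCADEADCF")
print(simulate_fifo(refs, 4) / len(refs))  # 0.3
print(simulate_lru(refs, 4) / len(refs))   # 0.4
```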
Watch a YouTube video on cache