Understanding 5-Stage Pipelined Architecture

Introduction about pipelines

Uploaded by Pratham Bihani

PIPELINING: 5-STAGE PIPELINE

CS/ECE 6810: Computer


Single-cycle RISC Architecture
 Example: simple MIPS architecture
🞑 Critical path includes all of the processing steps
[Figure: single-cycle datapath with a Controller. Inst. Fetch: PC, Inst. Memory; Inst. Decode: Register File; Execute: ALU; Memory: Data Memory; Write Back]
Single-cycle RISC Architecture
 Example program
🞑 CT=6ns; CPU Time = ?

AND R1,R2,R3
XOR R4,R2,R3
SUB R5,R1,R4
ADD R6,R1,R4

CPU Time = IC x CPI x CT
Single-cycle RISC Architecture
 Example program
🞑 CT=6ns; CPU Time = 5 x 1 x 6ns = 30ns

AND R1,R2,R3
XOR R4,R2,R3
SUB R5,R1,R4
ADD R6,R1,R4
MUL

CPU Time = IC x CPI x CT
How to improve?
Reusing Idle Resources
 Each processing step finishes in a fraction of a cycle
🞑 Idle resources can be reused for processing next instructions
[Figure: datapath. Inst. Fetch: PC, Inst. Memory; Inst. Decode: Register File; Execute: ALU; Memory: Data Memory; Write Back]
Pipelined Architecture
 Five stage pipeline
🞑 Critical path determines the cycle time
[Figure: five-stage datapath annotated with stage delays. Inst. Fetch: 1.5ns; Inst. Decode: 1.05ns; Execute: 1.25ns; Memory: 1.5ns; Write Back: 0.7ns]
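The relation between the stage delays and the two cycle times can be sketched as follows. This is a minimal illustration, assuming the five delays on the figure map to the stages in order (fetch, decode, execute, memory, write back) and that pipeline register overhead is zero unless stated; the function names are hypothetical.

```python
# Stage delays in ns, as read from the annotated datapath figure (assumed mapping).
STAGE_DELAY_NS = {
    "fetch": 1.5,
    "decode": 1.05,
    "execute": 1.25,
    "memory": 1.5,
    "write_back": 0.7,
}

def single_cycle_ct(delays):
    """Single-cycle design: the critical path crosses every stage."""
    return sum(delays.values())

def pipelined_ct(delays, register_overhead_ns=0.0):
    """Pipelined design: the slowest stage sets the clock (plus any latch overhead)."""
    return max(delays.values()) + register_overhead_ns

print("single-cycle CT (ns):", round(single_cycle_ct(STAGE_DELAY_NS), 3))
print("pipelined CT (ns):", pipelined_ct(STAGE_DELAY_NS))
```

Under these assumptions the stage delays add up to the 6ns single-cycle clock, while the slowest stage gives the 1.5ns pipelined clock.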
Pipelined Architecture
 Example program
🞑 CT=1.5ns; CPU Time = ?

AND R1,R2,R3
XOR R4,R2,R3
SUB R5,R1,R4
ADD R6,R1,R4

CPU Time = IC x CPI x CT
Pipelined Architecture
 Example program
🞑 CT=1.5ns; CPU Time = 5 x 5 x 1.5ns = 37.5ns > 30ns
WORSE!! (CPI = 5 here: each instruction takes all five cycles before the next one starts)

AND R1,R2,R3
XOR R4,R2,R3
SUB R5,R1,R4
ADD R6,R1,R4

CPU Time = IC x CPI x CT
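Pipelined Architecture
 Example program
🞑 CT=1.5ns; CPU Time = 9 x 1 x 1.5ns = 13.5ns
(9 cycles: 5 for the first instruction to pass through the pipeline, then 1 more for each of the remaining 4 instructions)

AND R1,R2,R3
XOR R4,R2,R3
SUB R5,R1,R4
ADD R6,R1,R4

CPU Time = IC x CPI x CT
What is the cost of pipelining?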
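The three CPU time figures for the example program can be reproduced with a short calculation. This is a sketch, assuming 5 instructions, 5 stages, and no stalls; the helper names are hypothetical.

```python
def cpu_time_ns(cycles, cycle_time_ns):
    """CPU time is the total number of clock cycles times the cycle time."""
    return cycles * cycle_time_ns

def overlapped_cycles(instruction_count, stages):
    """Cycles for a stall-free pipeline: fill the pipe once, then one instruction per cycle."""
    return stages + instruction_count - 1

IC, STAGES = 5, 5

single_cycle = cpu_time_ns(IC * 1, 6.0)                            # CPI = 1, CT = 6ns
not_overlapped = cpu_time_ns(IC * STAGES, 1.5)                     # CPI = 5, CT = 1.5ns
overlapped = cpu_time_ns(overlapped_cycles(IC, STAGES), 1.5)       # 9 cycles, CT = 1.5ns

print(single_cycle, not_overlapped, overlapped)
```

The comparison suggests that the shorter clock only pays off when instructions overlap in the pipeline.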


Pipelining Technique
 Improving throughput at the expense of latency
🞑 Delay: D = T + nδ
🞑 Throughput: IPS = n/(T + nδ)
[Figure: one block of combinational logic, critical path delay = 30]
Pipelining Technique
 Improving throughput at the expense of latency
🞑 Delay: D = T + nδ
🞑 Throughput: IPS = n/(T + nδ)

Stages (n)   Critical path delay per stage   D    IPS
1            30                              31   1/31
2            15                              32   2/32
3            10                              33   3/33

(These values correspond to T = 30 and δ = 1.)
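The delay and throughput values for one, two, and three stages follow directly from the two formulas. The sketch below assumes T = 30 and δ = 1, which is what the listed values imply; exact fractions are used so the throughput can be compared with the slide's n/(T + nδ) form.

```python
from fractions import Fraction

def pipeline_delay(T, n, delta):
    """Latency of one operation through n stages: D = T + n*delta."""
    return T + n * delta

def pipeline_throughput(T, n, delta):
    """Operations per time unit: IPS = n / (T + n*delta)."""
    return Fraction(n, T + n * delta)

for n in (1, 2, 3):
    # Fraction reduces automatically, so 2/32 is shown as 1/16 and 3/33 as 1/11.
    print(n, pipeline_delay(30, n, 1), pipeline_throughput(30, n, 1))
```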
Pipelining Latency vs. Throughput
 Theoretical delay and throughput models for perfect pipelining
[Figure: Relative Performance (0 to 20) versus Number of Pipeline Stages (0 to 200); curves: Delay (D) and Throughput (IPS)]
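The general shape of the two curves can be explored numerically. This is only a sketch: it normalizes both models to the one-stage design and assumes T = 30 and δ = 1, which need not be the parameters behind the plotted figure.

```python
def relative_delay(T, n, delta):
    """Delay of an n-stage pipeline relative to the 1-stage design."""
    return (T + n * delta) / (T + delta)

def relative_throughput(T, n, delta):
    """Throughput of an n-stage pipeline relative to the 1-stage design."""
    return n * (T + delta) / (T + n * delta)

T, DELTA = 30, 1  # assumed values, not taken from the plot
for n in (1, 10, 50, 100, 200):
    print(n, round(relative_delay(T, n, DELTA), 2), round(relative_throughput(T, n, DELTA), 2))
```

In this model the relative delay grows linearly with the number of stages, while the relative throughput levels off below (T + δ)/δ because the per-stage overhead δ eventually dominates the cycle time.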
Five Stage MIPS Pipeline

Simple Five Stage Pipeline
 A pipelined load-store architecture that processes up to one instruction per cycle
[Figure: five-stage datapath. Inst. Fetch: PC, Inst. Memory; Inst. Decode: Register File; Execute: ALU; Memory: Data Memory; Write Back]
Instruction Fetch
 Read an instruction from memory (I-Memory)
🞑 Use the program counter (PC) to index into the I-Memory
🞑 Compute NPC by incrementing current PC
   What about branches?
 Update pipeline registers
🞑 Write the instruction into the pipeline registers
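Instruction Fetch
[Figure: fetch datapath with a clocked PC, Instruction Memory, a PC + 4 adder producing NPC, a Branch Target input, and a clocked Pipeline Register; three paths are labeled P1, P2, and P3]
 NPC = PC + 4 (Why increment by 4?)
 Critical Path = Max{P1, P2, P3}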
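The fetch step can be sketched in a few lines. This is an illustrative model, not the deck's implementation: it assumes 32-bit fixed-width instructions in a byte-addressed memory (which is why NPC is PC + 4), models I-Memory as a dictionary from byte address to instruction word, and uses hypothetical names for the function and the pipeline register fields.

```python
WORD_BYTES = 4  # assumption: each instruction is 32 bits in a byte-addressed memory

def fetch(pc, imem, branch_taken=False, branch_target=None):
    """Read the instruction at PC, compute NPC, and choose the next PC."""
    instruction = imem[pc]                # index I-Memory with the PC
    npc = pc + WORD_BYTES                 # NPC = PC + 4
    next_pc = branch_target if branch_taken else npc
    if_id = {"instruction": instruction, "npc": npc}   # written to the pipeline register
    return next_pc, if_id

imem = {0: 0x012A4020, 4: 0x00000000}
next_pc, if_id = fetch(0, imem)
print(next_pc, hex(if_id["instruction"]), if_id["npc"])
```

A taken branch is modeled here by passing `branch_taken=True` with a target, which replaces the sequential NPC as the next PC.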
Instruction Decode
 Generate control signals from the opcode bits
 Read source operands from the register file (RF)
🞑 Use the specifiers for indexing RF
   How many read ports are required?
 Update pipeline registers
🞑 Send the operand and immediate values to next stage
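The decode step can be illustrated with the standard 32-bit MIPS field layout (opcode in bits 31..26, rs in 25..21, rt in 20..16, rd in 15..11, funct in 5..0, immediate in 15..0). The sketch below is an assumption-laden model: the function names and pipeline register fields are hypothetical, and the opcode and funct fields are passed along as a stand-in for real control signals.

```python
def sign_extend_16(value):
    """Interpret a 16-bit field as a two's-complement signed integer."""
    return value - 0x10000 if value & 0x8000 else value

def decode_fields(word):
    """Split a 32-bit MIPS instruction word into its standard fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "funct": word & 0x3F,
        "imm": sign_extend_16(word & 0xFFFF),
    }

def decode(word, regfile, npc):
    """Read both source operands and forward them with the immediate."""
    f = decode_fields(word)
    return {
        "npc": npc,
        "a": regfile[f["rs"]],   # first read port
        "b": regfile[f["rt"]],   # second read port
        "imm": f["imm"],
        "rd": f["rd"],
        "opcode": f["opcode"],
        "funct": f["funct"],
    }

regfile = [0] * 32
regfile[9], regfile[10] = 5, 7
id_ex = decode(0x012A4020, regfile, npc=4)   # encoding of add $8, $9, $10
print(id_ex["a"], id_ex["b"], id_ex["rd"])
```

Because two source registers are read in the same cycle, this model uses two register file read ports.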
Instruction Decode
[Figure: decode stage between two pipeline registers. Labels shown: Instruction, NPC, Register File, decode, reg, target, ctrl]
Execute Stage
 Perform ALU operation
🞑 Compute the result of ALU
   Operation type: control signals
   First operand: contents of a register
   Second operand: either a register or the immediate value
🞑 Compute branch target
   Target = NPC + immediate
 Update pipeline registers
🞑 Control signals, branch target, ALU results, and destination
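The execute step described above can be sketched as a function from one pipeline register to the next. This is a simplified model with hypothetical field names; it follows the slide's formula Target = NPC + immediate without any shifting of the immediate, and wraps ADD and SUB results to 32 bits as an added assumption.

```python
MASK32 = 0xFFFFFFFF  # assumption: results wrap to 32 bits

ALU_OPS = {
    "AND": lambda a, b: a & b,
    "XOR": lambda a, b: a ^ b,
    "ADD": lambda a, b: (a + b) & MASK32,
    "SUB": lambda a, b: (a - b) & MASK32,
}

def execute(id_ex):
    """Run the ALU and compute the branch target for one instruction."""
    # Second operand: either a register value or the immediate.
    second = id_ex["imm"] if id_ex["use_imm"] else id_ex["b"]
    return {
        "alu_result": ALU_OPS[id_ex["alu_op"]](id_ex["a"], second),
        "branch_target": id_ex["npc"] + id_ex["imm"],   # Target = NPC + immediate
        "dest": id_ex["dest"],
        "ctrl": id_ex["ctrl"],
    }

ex_mem = execute({"alu_op": "ADD", "a": 5, "b": 7, "imm": 16,
                  "use_imm": False, "npc": 104, "dest": 6, "ctrl": {}})
print(ex_mem["alu_result"], ex_mem["branch_target"])
```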
Execute Stage
[Figure: execute stage between two pipeline registers. Labels shown: NPC, reg, ALU, Res, Target, ctrl]
Memory Access
 Access data memory
🞑 Load/store address: ALU outcome
🞑 Control signals determine read or write access
 Update pipeline registers
🞑 ALU results from execute
🞑 Loaded data from D-Memory
🞑 Destination register
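A minimal sketch of the memory access step, assuming D-Memory is modeled as a dictionary from address to value and that two hypothetical control flags, `mem_read` and `mem_write`, select the access type:

```python
def memory_access(ex_mem, dmem):
    """Perform a load or store at the address computed by the ALU."""
    address = ex_mem["alu_result"]        # load/store address comes from the ALU
    loaded = None
    if ex_mem["mem_read"]:
        loaded = dmem.get(address, 0)     # assumption: unwritten locations read as 0
    elif ex_mem["mem_write"]:
        dmem[address] = ex_mem["store_data"]
    # Pipeline register update: ALU result, loaded data, destination register.
    return {
        "alu_result": ex_mem["alu_result"],
        "loaded_data": loaded,
        "dest": ex_mem["dest"],
    }

dmem = {64: 99}
mem_wb = memory_access({"alu_result": 64, "mem_read": True, "mem_write": False,
                        "store_data": None, "dest": 1}, dmem)
print(mem_wb["loaded_data"], mem_wb["dest"])
```

Instructions that do not touch memory pass through this stage with both flags cleared, so only the ALU result and destination are forwarded.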
Memory Access
[Figure: memory stage between two pipeline registers. Labels shown: Target, Res, addr, Data Memory, data, reg, ctrl]
Register Write Back
 Update register file
🞑 Control signals determine if a register write is needed
🞑 Only one write port is required
   Write the ALU result to the destination register, or
   Write the loaded data into the register file
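The write back choice can be sketched as a single selection followed by one register write. The control flag names `reg_write` and `mem_to_reg` are hypothetical stand-ins for the control signals mentioned above.

```python
def write_back(mem_wb, regfile):
    """Write either the ALU result or the loaded data to the destination register."""
    if not mem_wb["reg_write"]:
        return                            # e.g. stores and branches write nothing
    value = mem_wb["loaded_data"] if mem_wb["mem_to_reg"] else mem_wb["alu_result"]
    regfile[mem_wb["dest"]] = value       # the single write port

regfile = [0] * 32
write_back({"reg_write": True, "mem_to_reg": False, "alu_result": 12,
            "loaded_data": None, "dest": 6}, regfile)
write_back({"reg_write": True, "mem_to_reg": True, "alu_result": 64,
            "loaded_data": 99, "dest": 1}, regfile)
print(regfile[6], regfile[1])
```

Only one value is written per instruction, which is why a single write port is enough in this model.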
Five Stage Pipeline
 Ideal pipeline: IPC=1
🞑 Are there enough resources to keep the pipeline stages busy all the time?
[Figure: resources used per stage. Inst. Fetch: PC, +4 adder, Mem; Decode: Reg. File; Execute: ALU; Memory: Mem; Writeback: Reg. File]
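To see when every stage is busy, the ideal schedule can be tabulated. This sketch assumes a stall-free pipeline in which instruction i (counting from 0) is fetched in cycle i + 1; the stage abbreviations and function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def schedule(num_instructions):
    """Map each instruction to {cycle: stage} for an ideal, stall-free pipeline."""
    return [
        {i + s + 1: name for s, name in enumerate(STAGES)}
        for i in range(num_instructions)
    ]

def busy_stages(table, cycle):
    """Stages occupied by some instruction in the given cycle."""
    return {row[cycle] for row in table if cycle in row}

table = schedule(5)
last_cycle = max(max(row) for row in table)
for cycle in range(1, last_cycle + 1):
    print(cycle, sorted(busy_stages(table, cycle)))
```

With five instructions, all five stages are occupied only in cycle 5; the earlier and later cycles are the fill and drain phases, so IPC approaches 1 only for long instruction streams.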
Pipeline Hazards
 Structural hazards: multiple instructions compete for the same resource
 Data hazards: a dependent instruction cannot proceed because it needs a value that hasn’t been produced
 Control hazards: the next instruction cannot be fetched because the outcome of an earlier branch is unknown
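As an illustration of the data hazard definition, the sketch below looks for read-after-write dependences in the four-instruction example program used earlier in the deck. The `window` parameter (how many preceding instructions can still be in flight) is an assumption that depends on the pipeline design and on whether forwarding exists; it is not specified here.

```python
# (opcode, destination, source1, source2), from the deck's example program.
PROGRAM = [
    ("AND", "R1", "R2", "R3"),
    ("XOR", "R4", "R2", "R3"),
    ("SUB", "R5", "R1", "R4"),
    ("ADD", "R6", "R1", "R4"),
]

def raw_dependences(program, window=2):
    """List (producer index, consumer index, register) pairs within the window."""
    found = []
    for j, instr in enumerate(program):
        sources = instr[2:]
        for i in range(max(0, j - window), j):
            dest = program[i][1]
            if dest in sources:
                found.append((i, j, dest))
    return found

for producer, consumer, reg in raw_dependences(PROGRAM):
    print(f"{PROGRAM[consumer][0]} needs {reg} produced by {PROGRAM[producer][0]}")
```

Whether each dependence actually causes a stall depends on the hardware; this check only identifies the candidate pairs.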
Structural Hazards
 1. Unified memory for instruction and data

R1 ← Mem[R2]
R3 ← Mem[R20]
R6 ← R4-R5
R7 ← R1+R0
Structural Hazards
 1. Unified memory for instruction and data
 2. Register file with shared read/write access ports

R1 ← Mem[R2]
R3 ← Mem[R20]
R6 ← R4-R5
R7 ← R1+R0