Computer Organization Lesson Plan
OBJECTIVES :
To understand the basic components of computers
To explore memory organization
To explore I/O organization in depth
To develop the ability to analyze the hardware and software issues related to computers and the interface between the two.
The basic functional units of a computer are made of electronic circuits and work with electrical signals. We provide input to the computer in the form of electrical signals and get the output in the form of electrical signals.
There are two basic types of electrical signals, namely analog and digital. Analog signals are continuous in nature and digital signals are discrete in nature.
An electronic device that works with continuous signals is known as an analog device, and an electronic device that works with discrete signals is known as a digital device. Most present-day computers are digital in nature, and we will deal with digital computers in this course.
A computer is a digital device, which works on two levels of signal. We call these two levels High and Low. The High level corresponds to some higher voltage (say 5 V or 12 V) and the Low level corresponds to a lower voltage (say 0 V). This is one convention, known as positive logic. There are other conventions as well, such as negative logic.
Since a computer is a digital electronic device, we have to deal with two kinds of electrical signals. But while designing a new computer system or understanding the working principle of a computer, it is inconvenient to write or work with 0 V or 5 V.
A computer is used mainly to solve numerical problems, and working with voltage levels is not convenient. For that purpose we move to a numeric representation. In this convention, we use 0 to represent LOW and 1 to represent HIGH.
0 means LOW
1 means HIGH
To describe the working principle of a computer, we therefore use only two numeric symbols, namely 0 and 1. All the functionality of a computer can be captured with 0 and 1, and its theoretical background corresponds to two-valued Boolean algebra.
With the symbols 0 and 1, we have a mathematical system known as the binary number system. The binary number system is used to represent and manipulate information in a computer. This information is basically strings of 0s and 1s.
The smallest unit of information represented in a computer is known as a bit (binary digit), which is either 0 or 1. Four bits together are known as a nibble, and eight bits together are known as a byte.
Computer technology has made incredible improvements in the past half century. In the early part of computer evolution, there were no stored-program computers, the computational power was low, and on top of that the size of a computer was huge.
Today, a personal computer has more computational power, more main memory and more disk storage, is smaller in size, and is available at an affordable cost.
This rapid rate of improvement has come both from advances in the technology used to build
computers and from innovation in computer design. In this course we will mainly deal with the
innovation in computer design.
The task that the computer designer handles is a complex one: determine what attributes are important for a new machine, then design a machine to maximize performance while staying within cost constraints.
This task has many aspects, including instruction set design, functional organization, logic design, and implementation.
While looking at the task of computer design, both the terms computer organization and computer architecture come into the picture.
It is difficult to give precise definitions for the terms Computer Organization and Computer Architecture. But while describing a computer system we come across these terms, and in the literature computer scientists try to make a distinction between the two.
Computer architecture refers to those parameters of a computer system that are visible to a
programmer or those parameters that have a direct impact on the logical execution of a program.
Examples of architectural attributes include the instruction set, the number of bits used to represent
different data types, I/O mechanisms, and techniques for addressing memory.
Computer organization refers to the operational units and their interconnections that realize the
architectural specifications. Examples of organizational attributes include those hardware details
transparent to the programmer, such as control signals, interfaces between the computer and
peripherals, and the memory technology used.
In this course we will touch upon all these factors and finally see how these attributes combine to build a complete computer system.
Basic Computer Model :
The model of a computer can be described by four basic units in high-level abstraction, as shown in figure 1.1. These basic units are:
Input Unit
Output Unit
Memory Unit
Central Processing Unit (CPU)
A. Central Processing Unit (CPU) :
The CPU consists of two sub-units:
o The program control unit has a set of registers and control circuits to generate control signals.
o The execution unit or data processing unit contains a set of registers for storing data and an Arithmetic and Logic Unit (ALU) for the execution of arithmetic and logical operations.
In addition, the CPU may have some additional registers for temporary storage of data.
B. Input Unit :
With the help of the input unit, data from outside can be supplied to the computer. A program or data is read into main storage from an input device or secondary storage under the control of a CPU input instruction.
Examples of input devices: keyboard, mouse, hard disk, floppy disk, CD-ROM drive, etc.
C. Output Unit :
With the help of the output unit, computer results can be provided to the user or stored in a storage device permanently for future use. Output data from main storage go to the output device under the control of CPU output instructions.
Examples of output devices: printer, monitor, plotter, hard disk, floppy disk, etc.
D. Memory Unit :
The memory unit is used to store data and programs. The CPU can work with the information stored in the memory unit. This memory unit is termed the primary memory or main memory module. These are basically semiconductor memories.
Secondary Memory :
There is another kind of storage device, apart from primary or main memory, which is known as secondary memory. Secondary memories are non-volatile and are used for permanent storage of data and programs.
Before going into the details of the working principle of a computer, we will analyse how computers work with the help of a small hypothetical computer.
In this small computer, we do not consider the input and output units. We will consider only the CPU and memory module. Assume that somehow we have stored the program and data in main memory. We will see how the CPU can perform the job depending on the program stored in main memory.
P.S. - Our assumption is that students understand common terms like program, CPU, memory etc. without knowing the exact details.
Consider the Arithmetic and Logic Unit (ALU) of the Central Processing Unit :
Consider an ALU which can perform four arithmetic operations and four logical operations.
To distinguish between arithmetic and logical operations, we may use one signal line; in a similar manner, we need another two signal lines to distinguish among the four arithmetic operations (and likewise the four logical operations).
Arithmetic Logical
000 ADD 100 OR
001 SUB 101 AND
010 MULT 110 NAND
011 DIV 111 NOR
Consider the part of the control unit whose task is to generate the appropriate signal at the right moment. There is an instruction decoder in the CPU which decodes this information in such a way that the computer can perform the desired task.
A simple model of the decoder has three input lines and, correspondingly, eight output lines. Depending on the input combination, only one of the output signals is generated, and it is used to indicate the corresponding operation of the ALU.
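The decoder described above can be sketched in software. This is a minimal illustrative model (not the actual hardware): three input lines select exactly one of eight output lines, and each output line corresponds to an operation from the table above.

```python
# A sketch of the 3-to-8 instruction decoder: three input bits select
# exactly one of eight one-hot output lines, each tied to an operation.

OPERATIONS = ["ADD", "SUB", "MULT", "DIV",   # arithmetic group (MSB = 0)
              "OR", "AND", "NAND", "NOR"]    # logical group    (MSB = 1)

def decode(b2, b1, b0):
    """Return (one_hot_outputs, operation) for a 3-bit input."""
    index = (b2 << 2) | (b1 << 1) | b0               # combine the input lines
    outputs = [1 if i == index else 0 for i in range(8)]  # only one line high
    return outputs, OPERATIONS[index]

outputs, op = decode(1, 0, 1)        # input 101
print(outputs, op)                   # output line 5 is high, selecting AND
```

Note that for any input combination exactly one output line is 1, which is what lets the ALU activate a single operation at a time.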
A computer also needs storage space to hold information. Some of this storage is inside the CPU, in the form of registers. The other, bigger chunk of storage space is known as primary memory or main memory. The CPU can work with the information available in main memory only.
To access data from memory, we need two special registers: one is known as the Memory Data Register (MDR) and the other as the Memory Address Register (MAR).
Data and programs are stored in main memory. While executing a program, the CPU brings instructions and data from main memory and performs the tasks as per the instructions fetched from memory. After completion of an operation, the CPU stores the result back into memory.
In the next section, we discuss the memory organization of our small machine.
The main memory unit is the storage unit. There are several locations for storing information in the main memory module.
The capacity of a memory module is specified by the number of memory locations and the amount of information stored in each location.
A memory module of capacity 16 X 4 indicates that there are 16 locations in the memory module and in each location we can store 4 bits of information.
We have to know how to indicate or point to a specific memory location. This is done by the address of the memory location.
READ operation: retrieves the data from memory and brings it to a CPU register.
WRITE operation: stores the data from a CPU register to a memory location.
We need some mechanism to distinguish these two operations READ and WRITE.
With the help of one signal line, we can differentiate these two operations. If the content of this signal line is 0, we perform a READ operation; if it is 1, a WRITE operation.
To transfer data between the CPU and the memory module, we need some connection. This is termed the DATA BUS.
The size of the data bus indicates how many bits we can transfer at a time. The size of the data bus is mainly determined by the data storage capacity of each location of the memory module.
We also have to resolve how to specify the particular memory location where we want to store our data or from where we want to retrieve data.
This is done by the memory address. Each location can be specified with the help of a binary address.
If we use 4 signal lines, we have 16 different combinations on these four lines, provided we use only two signal values (say 0 and 1).
So to distinguish 16 locations, we need four signal lines. The signal lines used to identify a memory location are termed the ADDRESS BUS. The size of the address bus depends on the memory size. For a memory module with a capacity of 2^n locations, we need n address lines, that is, an address bus of size n.
We use an address decoder to decode the address present on the address bus.
As an example, consider a memory module of 16 locations, where each location can store 4 bits of information.
The size of the address bus is 4 bits and the size of the data bus is 4 bits.
The size of the address decoder is 4 X 16.
If the contents of the address bus are 0101, the contents of the data bus are 1100 and R/W = 1, then 1100 will be written in location 5.
If the contents of the address bus are 1011 and R/W = 0, then the contents of location 1011 will be placed on the data bus.
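The behaviour of this 16 X 4 memory module can be sketched as a toy software model (an illustration, not real hardware): a 4-bit address bus, a 4-bit data bus, and an R/W line where 0 means READ and 1 means WRITE, as in the text.

```python
# A toy model of the 16 x 4 memory module described above.

class Memory16x4:
    def __init__(self):
        self.cells = [0] * 16                # 16 locations, 4 bits each

    def access(self, address, rw, data=0):
        if rw == 1:                          # WRITE: data bus -> location
            self.cells[address] = data & 0b1111  # keep only 4 bits
            return None
        return self.cells[address]           # READ: location -> data bus

mem = Memory16x4()
mem.access(0b0101, rw=1, data=0b1100)        # write 1100 into location 5
print(mem.access(0b0101, rw=0))              # read it back: 12 (binary 1100)
```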
In the next section, we will explain how to perform memory access operations in our small hypothetical computer.
Memory Instruction
We need some more instructions to work with the computer. Apart from the instructions needed to perform tasks inside the CPU, we need instructions for data transfer between main memory and the CPU.
In our hypothetical machine, we use three signal lines to identify a particular instruction. If we want to include more instructions, we need additional signal lines.
With one additional signal line, we can go up to 16 instructions. When the signal on this new line is 0, it indicates an ALU operation; when it is 1, it indicates one of 8 new instructions. So we can design 8 new memory-access instructions.
We have added 6 new instructions. Two codes are still unused, and can be used for other purposes; we show them as NOP, meaning No Operation.
We have seen that for an ALU operation, the instruction decoder generates the signal for the appropriate ALU operation.
Apart from that, we need many more signals for the proper functioning of the computer. Therefore we need a module known as the control unit, which is a part of the CPU. The control unit is responsible for generating the appropriate signals.
As an example, for the LDAI instruction, the control unit must generate a signal which enables register A to store the incoming data.
One major task is to design the control unit so that it generates the appropriate signal at the appropriate time for the proper functioning of the computer.
Consider a simple problem: add two numbers and store the result in memory; say we want to add 7 to 5.
To solve this problem on a computer, we have to write a computer program. The program is machine specific, and it is related to the instruction set of the machine.
Consider another example: say the first number is stored in memory location 13 and the second in memory location 14. Write a program to add the contents of memory locations 13 and 14 and store the result in memory location 15.
One question still remains unanswered: how do we store the program or data in main memory? Only once we put the program and data in main memory can the CPU execute the program. For that we need some more instructions.
We need some instructions to perform input tasks. These instructions are responsible for providing input data from input devices and storing it in main memory. For example, instructions are needed to take input from the keyboard.
We need some other instructions to perform output tasks. These instructions are responsible for delivering results to output devices. For example, instructions are needed to send results to the printer.
We have seen that the number of instructions that can be provided in a computer depends on the number of signal lines used to encode the instruction, which is basically related to the size of the storage units of the computer.
For uniformity, we use the same size for all storage spaces, which are known as registers. If we work with a 16-bit machine, the total number of instructions that can be encoded is 2^16.
The model that we have described here is known as the von Neumann stored-program concept. First we store all the instructions of a program in main memory, and the CPU works with the contents stored in main memory. Instructions are executed one after another.
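The stored-program idea above can be sketched in a few lines of code. Note that the tiny instruction set here (LOAD/ADD/STORE/HALT) is invented purely for illustration; it is not the instruction set defined earlier in the text. The program solves the earlier example: add the contents of locations 13 and 14 and store the result in location 15.

```python
# A highly simplified sketch of the stored-program concept: instructions
# and data sit in one memory, and the CPU fetches and executes them
# one after another, driven by a program counter.

memory = {0: ("LOAD", 13),   # program: acc <- M[13]
          1: ("ADD", 14),    #          acc <- acc + M[14]
          2: ("STORE", 15),  #          M[15] <- acc
          3: ("HALT", 0),
          13: 7, 14: 5, 15: 0}          # data

pc, acc = 0, 0                          # program counter, accumulator
while True:
    opcode, operand = memory[pc]        # fetch the next instruction
    pc += 1                             # advance to the following one
    if opcode == "LOAD":
        acc = memory[operand]
    elif opcode == "ADD":
        acc += memory[operand]
    elif opcode == "STORE":
        memory[operand] = acc
    elif opcode == "HALT":
        break                           # stop execution
print(memory[15])                       # 7 + 5 = 12
```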
We have explained the concept of a computer at a very high level of abstraction, omitting most of the details.
As the course progresses we will explain the exact working principle of a computer in more detail.
Present-day digital computers are based on the stored-program concept introduced by von Neumann. In this stored-program concept, programs and data are stored in a single storage unit called memory.
The Central Processing Unit, the main component of the computer, can work only with the information stored in the storage unit.
In 1946, von Neumann and his colleagues began the design of a stored-program computer at the Institute for Advanced Studies in Princeton. This computer is referred to as the IAS computer.
The structure of the IAS computer is shown in Figure 1.2.
The CPU is the main unit of the computer, responsible for performing all the operations. The CPU of the IAS computer consists of a data processing unit and a program control unit.
The data processing unit contains high-speed registers intended for temporary storage of instructions, memory addresses and data. The main actions specified by instructions are performed by the arithmetic-logic circuits of the data processing unit.
The control circuits in the program control unit are responsible for fetching instructions, decoding
opcodes, controlling the information movements correctly through the system, and providing proper
control signals for all CPU actions.
The memory unit is used for storing programs and data. Each location of the memory unit is uniquely specified by its memory address. M(X) is used to indicate the location of memory unit M with address X.
Data transfer between the memory unit and the CPU takes place with the help of the data register DR. When the CPU wants to read some information from the memory unit, the information is first brought to DR, and after that it goes to the appropriate register. Similarly, data to be stored in memory must be put into DR first, and then it is stored in the appropriate location in the memory unit.
The address of the memory location used during memory read and memory write operations is stored in the address register AR.
If the information fetched from memory is an operand of an instruction, it is moved from DR to the data processing unit (either to AC or MQ). If it is an instruction, it is moved to the program control unit (either to IR or IBR).
Two additional registers for the temporary storage of operands and results are included in the data processing unit: the accumulator AC and the multiplier-quotient register MQ.
Two instructions are fetched simultaneously from M and transferred to the program control unit. The instruction that is not to be executed immediately is placed in the instruction buffer register IBR. The opcode of the other instruction is placed in the instruction register IR, where it is decoded.
In the decoding phase, the control circuits generate the required control signals to perform the
specified operation in the instruction.
The program counter(PC) is used to store the address of the next instruction to be fetched from
memory.
Input devices are used to put information into the computer. With the help of input devices we can store information in memory so that the CPU can use it. A program or data is read into main memory from an input device or secondary storage under the control of a CPU input instruction.
Output devices are used to output information from the computer. If some results are evaluated by the computer and stored in it, then with the help of output devices we can present them to the user. Output data from main memory go to the output device under the control of CPU output instructions.
We have already mentioned that a computer handles two types of signals; therefore, to represent any information in the computer, we have to take the help of these two signals.
These two signals correspond to two levels of electrical signal, and symbolically we represent them as 0 and 1.
In our day-to-day arithmetic, we use the decimal number system. The decimal number system is said to be of base, or radix, 10, because it uses ten digits and the coefficients are multiplied by powers of 10.
A decimal number such as 5273 represents a quantity equal to 5 thousands plus 2 hundreds plus 7 tens plus 3 units. The thousands, hundreds, etc. are powers of 10 implied by the position of the coefficients. To be more precise, 5273 should be written as:
5 x 10^3 + 2 x 10^2 + 7 x 10^1 + 3 x 10^0
However, the convention is to write only the coefficients and deduce the necessary powers of 10 from their positions.
The binary number system is said to be of base 2, or radix 2, because it uses two digits and the coefficients are multiplied by powers of 2. For example, 1011 in binary represents 1 x 2^3 + 0 x 2^2 + 1 x 2^1 + 1 x 2^0 = 11 (in decimal).
In the case of 8-bit numbers, the minimum number that can be stored in the computer is 00000000 (0) and the maximum number is 11111111 (255) (if we are working with natural numbers).
So, the domain of numbers is restricted by the storage capacity of the computer. It is also related to the number system; the above range is for natural numbers.
In general, for an n-bit number, the range for natural numbers is from 0 to 2^n - 1.
If the result of an addition is an 8-bit number, it can be stored in the 8-bit computer, so we get the correct result.
  10000001     129
  10101010     170
-----------   -----
 100101011     299
In the above example, the result is a 9-bit number, but we can store only 8 bits; the most significant bit (MSB) cannot be stored.
The result of this addition will be stored as 00101011, which is 43, not the desired result.
Since we cannot store the complete result of the operation, this situation is known as overflow.
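The overflow case can be demonstrated directly: an 8-bit register keeps only the low 8 bits of a 9-bit sum, so adding the two bit patterns above does not give the true result.

```python
# Demonstrating 8-bit overflow: the true sum needs 9 bits, but only
# the low 8 bits fit in the register, so the stored value is wrong.

a, b = 0b10000001, 0b10101010    # the two 8-bit patterns above
full_sum = a + b                 # the mathematically correct sum
stored = full_sum & 0xFF         # masking keeps only the low 8 bits
print(full_sum, stored)          # 299 43
```

Masking with 0xFF models what the hardware does implicitly: the ninth carry bit simply has nowhere to go.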
For our convenience, while writing on paper, we may take the help of other number systems like octal and hexadecimal. This reduces the burden of writing long strings of 0s and 1s.
Octal number : The octal number system is said to be of base, or radix, 8, because it uses 8 digits and the coefficients are multiplied by powers of 8.
The eight digits used in the octal system are: 0, 1, 2, 3, 4, 5, 6 and 7.
Hexadecimal number : The hexadecimal number system is said to be of base, or radix, 16, because it uses 16 symbols and the coefficients are multiplied by powers of 16.
The sixteen digits used in the hexadecimal system are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F.
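The relationship between the three bases can be checked quickly: each octal digit stands for 3 bits and each hexadecimal digit for 4 bits. Python's built-in conversions illustrate this.

```python
# Octal and hexadecimal as compact notations for the same binary value.

n = 0b10101010             # the 8-bit pattern 10101010
print(oct(n), hex(n), n)   # 0o252 0xaa 170
print(int("aa", 16))       # parsing hex back to decimal: 170
```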
We know that for an n-bit number, the range for natural numbers is from 0 to 2^n - 1.
With n bits we have altogether 2^n different combinations, and we use these combinations to represent numbers ranging from 0 to 2^n - 1.
If we want to include negative numbers, naturally the range of magnitudes will decrease. Half of the combinations are used for positive numbers and the other half for negative numbers.
A signed number can be represented in one of three forms:
Signed-Magnitude form.
1’s complement form.
2’s complement form.
In signed-magnitude form, one particular bit is used to indicate the sign of the number, whether it is positive or negative. The other bits are used to represent the magnitude of the number.
For an n-bit number, one bit is used to indicate the sign information and the remaining (n-1) bits are used to represent the magnitude. Therefore, the range is from -(2^(n-1) - 1) to +(2^(n-1) - 1).
Generally, the Most Significant Bit (MSB) is used to indicate the sign and is termed the sign bit. 0 in the sign bit indicates a positive number and 1 indicates a negative number.
Consider a number system of base r, or radix r. There are two types of complements: the r's complement and the (r-1)'s complement.
Consider the eight-bit number 01011100; the 2's complement of this number is 10100100. If we perform the following addition:
  0 1 0 1 1 1 0 0
  1 0 1 0 0 1 0 0
--------------------------------
1 0 0 0 0 0 0 0 0
Since we are considering an eight-bit number, the 9th bit (MSB) of the result cannot be stored. Therefore, the final result is 00000000.
Since the addition of the two numbers is 0, one can be treated as the negative of the other. So the 2's complement can be used to represent negative numbers.
e.g., 10's complement of 5642 is 9's complement of 5642 + 1, i.e., 4357 + 1 = 4358
e.g., 2's complement of 1010 is 1's complement of 1010 + 1, i.e., 0101 + 1 = 0110.
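For binary (r = 2), these two rules can be sketched directly: the 1's complement inverts every bit, and the 2's complement is the 1's complement plus one, matching the examples above.

```python
# Computing 1's and 2's complements of n-bit binary numbers.

def ones_complement(x, n_bits):
    return x ^ ((1 << n_bits) - 1)       # flip every bit

def twos_complement(x, n_bits):
    # 1's complement plus one, kept within n bits
    return (ones_complement(x, n_bits) + 1) & ((1 << n_bits) - 1)

print(format(ones_complement(0b1010, 4), "04b"))      # 0101
print(format(twos_complement(0b1010, 4), "04b"))      # 0110
print(format(twos_complement(0b01011100, 8), "08b"))  # 10100100
```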
Similarly, consider the 1's complement of 01011100, which is 10100011. If we add the number and its 1's complement, the result is 11111111, which represents -0 in 1's complement form.
Since the sum of the two numbers represents zero, one can be treated as the negative of the other. So the 1's complement can also be used to represent negative numbers.
Decimal   2's Complement   1's Complement   Signed Magnitude
+7        0111             0111             0111
+6        0110             0110             0110
+5        0101             0101             0101
+4        0100             0100             0100
+3        0011             0011             0011
+2        0010             0010             0010
+1        0001             0001             0001
+0        0000             0000             0000
-0        -----            1111             1000
-1        1111             1110             1001
-2        1110             1101             1010
-3        1101             1100             1011
-4        1100             1011             1100
-5        1011             1010             1101
-6        1010             1001             1110
-7        1001             1000             1111
-8        1000             ------           -------
Representation of Real Numbers
Fixed-point representation
Floating-point representation
This is known as fixed-point representation, where the position of the decimal point is fixed and the numbers of bits before and after the decimal point are predefined.
If we use 16 bits before the decimal point and 7 bits after the decimal point, in signed-magnitude form the range is
-(2^16 - 2^-7) to +(2^16 - 2^-7).
One bit is required for the sign information, so the total size of the number is 24 bits
( 1 (sign) + 16 (before decimal point) + 7 (after decimal point) ).
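Fixed-point storage with 7 bits after the binary point amounts to storing a real number as an integer scaled by 2^7. The following sketch illustrates that idea (it models the fractional part only, not the full 24-bit signed format above).

```python
# Fixed-point as scaled integers: with 7 fraction bits, a value v is
# stored as round(v * 2**7), so the smallest step is 2**-7.

FRACTION_BITS = 7
SCALE = 1 << FRACTION_BITS        # 2**7 = 128

def to_fixed(value):
    return round(value * SCALE)   # store as a scaled integer

def from_fixed(stored):
    return stored / SCALE         # recover the real value

stored = to_fixed(3.5)
print(stored, from_fixed(stored))  # 448 3.5
```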
Floating-point representation:
In this representation, numbers are represented by a mantissa comprising the significant digits and an exponent part of radix R. The format is:
number = mantissa x R^exponent
Numbers are often normalized, such that the radix point is placed to the right of the first nonzero digit.
For example, the decimal number 5236 can be expressed as 0.5236 x 10^4. To store this number in floating-point representation, we store 5236 in the mantissa part and 4 in the exponent part.
IEEE standard floating-point format: IEEE has proposed two standards for representing floating-point numbers:
Single precision
Double precision
Single precision: 32 bits, divided as S (1 sign bit), E (8 exponent bits) and M (23 mantissa bits).
Double precision: 64 bits, divided as S (1 sign bit), E (11 exponent bits) and M (52 mantissa bits).
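The single-precision S, E, M fields can be inspected directly with Python's standard struct module; this sketch unpacks the bit pattern of 1.0.

```python
# Extracting the sign, exponent and mantissa fields of an IEEE
# single-precision number from its raw 32-bit pattern.
import struct

bits = struct.unpack(">I", struct.pack(">f", 1.0))[0]  # raw 32 bits of 1.0
s = bits >> 31               # S: sign bit
e = (bits >> 23) & 0xFF      # E: biased exponent (bias is 127)
m = bits & 0x7FFFFF          # M: 23 mantissa (fraction) bits
print(hex(bits), s, e, m)    # 0x3f800000 0 127 0
```

For 1.0 the stored exponent is 127 (the bias) and the mantissa field is zero, because the leading 1 of the normalized mantissa is implicit.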
Representation of Character
Since we are working with 0s and 1s only, to represent characters in the computer we also use strings of 0s and 1s.
To represent characters we use some coding scheme, which is nothing but a mapping function.
Some of standard coding schemes are: ASCII, EBCDIC, UNICODE.
ASCII : American Standard Code for Information Interchange.
It uses a 7-bit code. Altogether we have 128 combinations of 7 bits, so we can represent 128 characters.
For example, 65 = 1000001 represents the character 'A'.
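The 65 = 'A' mapping can be checked in Python, where ord() gives the code of a character and chr() maps a code back.

```python
# ASCII in practice: ord() and chr() realize the coding scheme's
# mapping function in both directions.
print(ord("A"), format(ord("A"), "07b"))  # 65 1000001
print(chr(65))                            # A
```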
UNICODE : It is used to capture most of the languages of the world. It uses a 16-bit code.
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others.
Computer Architecture is the field of study of selecting and interconnecting hardware components to create computers that satisfy functional, performance and cost goals. It refers to those attributes of the computer system that are visible to a programmer and have a direct effect on the execution of a program:
Instruction set
Data formats
In short, it is the combination of the Instruction Set Architecture, the machine organization and the related hardware.
Generations: A Brief History of Computer Architecture
ENIAC [1945]: Designed by Mauchly & Eckert, built by the US Army to calculate trajectories for ballistic shells during World War II. Around 18,000 vacuum tubes and 1,500 relays were used to build ENIAC, and it was programmed by manually setting switches.
UNIVAC [1950]: the first commercial computer
John von Neumann architecture: Goldstine and von Neumann took the idea of ENIAC and developed the concept of storing a program in memory. This is known as the von Neumann architecture and has been the basis for virtually every machine designed since then.
Features:
Electron-emitting devices
Data and programs are stored in a single read-write memory
Memory contents are addressable by location, regardless of the content itself
Machine language/assembly language
Sequential execution
William Shockley, John Bardeen, and Walter Brattain invented the transistor, which reduced the size of computers and improved reliability. Vacuum tubes were replaced by transistors.
Semiconductor memory
Fourth Generation (1974-Present) :: Very Large-Scale Integration (VLSI) / Ultra Large Scale
Integration (ULSI)
Instruction Set Architecture (ISA): the abstract interface between the hardware and the lowest-level software.
Typical RISC:
Under a rapidly changing set of forces, computer technology keeps changing dramatically, for example:
If computer architecture is a view of the whole design with the important characteristics visible
to programmer, computer organization is how features are implemented with the specific
building blocks visible to designer, such as control signals, interfaces, memory technology, etc.
Computer architecture and organization are closely related, though not exactly the same.
Memory -- storage for instructions and data for currently executing programs
I/O system -- controllers which communicate with "external" devices:
secondary memory, display devices, networks
Data-path & control -- collections of parallel wires that transmit data, instructions, or control signals
Computer organization defines the ways in which these components are interconnected and
controlled. It is the capabilities and performance characteristics of those principal functional units.
An architecture can have a number of organizational implementations, and the organization differs between versions. For example, all Intel x86 family members share the same basic architecture, and the IBM System/370 family shares its basic architecture.
The History of Computer Organization
Computer architecture has progressed through four generations: vacuum tubes, transistors, integrated circuits, and VLSI. Computer organization has made its historic progression accordingly.
1974: 8080 - the first general-purpose microprocessor, 8-bit data path, used in the first personal computer
1978: 8086 - 16 bit, 1 MB addressable, instruction cache, prefetches a few instructions
1982: 80186 - identical to the 8086 with additional reserved interrupt vectors and some very powerful built-in I/O functions
1982: 80286 - 16 MB addressable memory space (24-bit addresses), plus additional instructions
1985: 80386 - 32 bit, new addressing modes and support for multitasking
1989 -- 1995:
o 80486 - 25, 33 MHz, 1.2 M transistors, 5-stage pipeline, sophisticated powerful cache and instruction pipelining, built-in math co-processor.
o Pentium - 60, 66 MHz, 3.1 M transistors, branch predictor, pipelined floating point, multiple instructions executed in parallel, first superscalar IA-32.
o Pentium Pro - increased superscalar execution, register renaming, branch prediction, data flow analysis, and speculative execution
1995 -- 1997: Pentium II - 233, 266, 300 MHz, 7.5 M transistors, first compaction of micro-architecture, MMX technology, graphics, video and audio processing.
1999: Pentium III - additional floating point instructions for 3D graphics
2000: Pentium IV - Further floating point and multimedia enhancements
Evolution of Memory
o 1995: EDO - Extended Data Output, which speeds up the read cycle between memory and CPU, 20 MHz
o 1997-1998: SDRAM - Synchronous DRAM, which synchronizes itself with the CPU bus and runs at higher clock speeds, PC66 at 66 MHz, PC100 at 100 MHz
o 1999: RDRAM - Rambus DRAM, a DRAM with a very high bandwidth, 800 MHz
A bus is a parallel circuit that connects the major components of a computer, allowing the transfer of electric impulses from one connected component to any other.
o IDE - Integrated Drive Electronics, also known as ATA, EIDE, Ultra ATA, Ultra DMA; the most widely used interface for hard disks
o PS/2 port - mini-DIN plug with 6 pins for a mouse and keyboard
ALU
Arithmetic and Logic Unit
The basic operations are implemented at the hardware level. The ALU supports a collection of two types of operations:
Arithmetic operations
Logical operations
To identify any one of the four logical or four arithmetic operations, two control lines are needed. Also, to identify which of the two groups - arithmetic or logical - is intended, another control line is needed. So, with the help of three control lines, any one of these eight operations can be identified.
Consider an ALU having four arithmetic operations: addition, subtraction, multiplication and division. Also consider that the ALU has four logical operations: OR, AND, NOT and EX-OR.
We need three control lines to identify any one of these operations. The input combinations of these control lines are shown below:
One control line, say C2, is used to identify the group: C2 = 0 selects an arithmetic operation and C2 = 1 selects a logical operation.
Control lines C1 and C0 are used to identify any one of the four operations within a group. One possible combination is given here.
A decoder is used to decode the instruction. The block diagram of the ALU is shown in figure 2.1.
Figure 2.1: Block Diagram of the ALU
The ALU has two input registers, named A and B, and one output storage register, named C. It performs the operation:
C = A op B
The input data are stored in A and B, and according to the operation specified on the control lines, the ALU performs the operation and puts the result in register C.
As for example, if the contents of controls lines are, 000, then the decoder enables the addition
operation and it activates the adder circuit and the addition operation is performed on the data that
are available in storage register A and B . After the completion of the operation, the result is stored
in register C.
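The decoding scheme described above can be sketched as a small Python function. Only 000 = addition is stated in the text, so the remaining code-to-operation assignments and the 8-bit word size below are assumptions for illustration:

```python
# 3-bit ALU control decoder sketch. Bit 2 selects the group (0 = arithmetic,
# 1 = logical); bits 1 and 0 select the operation within the group.
# The mapping below (other than 000 = addition) is an assumed assignment.
def alu(control, a, b, width=8):
    mask = (1 << width) - 1          # keep results within the word size
    ops = {
        0b000: lambda: (a + b) & mask,   # arithmetic group
        0b001: lambda: (a - b) & mask,
        0b010: lambda: (a * b) & mask,
        0b011: lambda: a // b,
        0b100: lambda: a | b,            # logical group
        0b101: lambda: a & b,
        0b110: lambda: (~a) & mask,      # NOT uses only operand A
        0b111: lambda: a ^ b,
    }
    return ops[control]()

print(alu(0b000, 9, 3))              # control 000 enables the adder -> 12
print(alu(0b101, 0b1100, 0b1010))    # AND -> 0b1000 = 8
```
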
We should have hardware implementations for the basic operations. These basic operations can
be used to implement more complicated operations which are not feasible to implement directly in
hardware. Several logic gates exist in digital logic circuits, and these gates can be used to
implement the logical operations. Some of the common logic gates are mentioned here.
AND gate: The output is high if both the inputs are high. The AND gate and its truth table are
shown in Figure 2.2.
EX-OR gate: The output is high if exactly one of the inputs is high, i.e. when the two inputs differ.
The EX-OR gate and its truth table are given in Figure 2.4.
If we want to construct a circuit which will perform the AND operation on two 4-bit numbers, the
implementation of the 4-bit AND operation is shown in Figure 2.5.
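The same idea can be sketched in Python: a 1-bit AND gate applied to each bit position gives the 4-bit AND operation:

```python
# A 4-bit AND built bit-by-bit from 1-bit AND gates.
def and_gate(a, b):          # 1-bit AND: high only when both inputs are high
    return a & b

def and_4bit(a_bits, b_bits):
    # apply one AND gate per bit position
    return [and_gate(a, b) for a, b in zip(a_bits, b_bits)]

print(and_4bit([1, 0, 1, 1], [1, 1, 0, 1]))  # [1, 0, 0, 1]
```
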
Figure 2.5: 4-bit AND operator
Arithmetic Circuit
Binary Adder :
In general, the adder circuit needs two binary inputs and two binary outputs. The input variables
designate the augend and addend bits; the output variables produce the sum and carry.
The binary addition operation on a single bit is shown in the truth table:
C: Carry Bit
S: Sum Bit
This circuit cannot handle a carry input, so it is termed a half adder. The circuit diagram and block
diagram of the Half Adder are shown in Figure 2.6.
Full Adder: A full adder is a combinational circuit that forms the arithmetic sum of three bits. It
consists of three inputs and two outputs.
Two of the input variables, denoted by x and y, represent the two bits to be added. The third input
Z, represents the carry from the previous lower position.
The two outputs are designated by the symbols S for sum and C for carry.
The circuit diagram and block diagram of a Full Adder are shown in Figure 2.7. n such single-bit
full adder blocks are used to make an n-bit full adder. To demonstrate the binary addition of
four-bit numbers, let us consider a specific example:
A = 1001, B = 0011
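The half adder, the full adder built from two half adders, and the 4-bit ripple-carry addition of A = 1001 and B = 0011 can be sketched in Python (bit lists are written most significant bit first):

```python
# Half adder and full adder as Boolean functions, chained into a 4-bit
# ripple-carry adder.
def half_adder(x, y):
    return x ^ y, x & y                      # sum, carry

def full_adder(x, y, z):                     # z is the carry from below
    s1, c1 = half_adder(x, y)
    s, c2 = half_adder(s1, z)
    return s, c1 | c2                        # sum, carry-out

def add_4bit(a, b, carry_in=0):
    carry = carry_in
    result = []
    for x, y in zip(reversed(a), reversed(b)):   # add from the LSB upward
        s, carry = full_adder(x, y, carry)
        result.insert(0, s)
    return result, carry

# A = 1001 (9), B = 0011 (3): sum = 1100 (12), no final carry
print(add_4bit([1, 0, 0, 1], [0, 0, 1, 1]))  # ([1, 1, 0, 0], 0)
```
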
Binary Subtractor :
The subtraction operation can be implemented with the help of the binary adder circuit, because
we know that the 2's complement representation of a number is treated as the negative of that
number.
We can get the 2's complement of a given number by complementing each bit and adding 1 to the
result.
The circuit for subtracting A-B consists of an adder with an inverter placed between each data
input B and the corresponding input of the full adder. The input carry must be equal to 1 when
performing subtraction.
The operation thus performed becomes A plus the 1's complement of B, plus 1. This is equal to A
plus the 2's complement of B.
With this principle, a single circuit can be used for both addition and subtraction. The 4-bit
adder-subtractor circuit is shown in Figure 2.9. It has one mode (M) selection input line, which
determines the operation:
If M = 0, the circuit performs A + B.
If M = 1, each bit of B is complemented (the 1's complement of B) and the carry-in is set to 1, so
the circuit performs A - B.
Figure 2.9: 4-bit adder subtractor
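The adder-subtractor principle can be sketched in Python: an XOR gate on each bit of B together with the mode input M either passes B unchanged (M = 0) or inverts it and supplies the extra 1 through the carry-in (M = 1). The 4-bit width matches the figure; the gate-level details are an illustration:

```python
# 4-bit adder-subtractor sketch: mode M = 0 adds, M = 1 subtracts.
def full_adder(x, y, z):
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)   # sum, carry-out

def add_sub_4bit(a, b, m):
    carry = m                                  # carry-in equals M
    result = []
    for x, y in zip(reversed(a), reversed(b)):
        # XOR with M passes y when M = 0 and inverts it when M = 1
        s, carry = full_adder(x, y ^ m, carry)
        result.insert(0, s)
    return result

print(add_sub_4bit([1, 0, 0, 1], [0, 0, 1, 1], 0))  # 9 + 3 = 12 -> [1, 1, 0, 0]
print(add_sub_4bit([1, 0, 0, 1], [0, 0, 1, 1], 1))  # 9 - 3 = 6  -> [0, 1, 1, 0]
```
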
Multiplication
Multiplication of two numbers in binary representation can be performed by a process of SHIFT and
ADD operations. Since the binary number system allows only 0s and 1s, digit multiplication can
be replaced by SHIFT and ADD operations alone, because multiplying by 1 gives the number itself
and multiplying by 0 produces 0.
The process consists of looking at successive bits of the multiplier, least significant bit first.
If the multiplier bit is a 1, the multiplicand is copied down, otherwise, zeros are copied down. The
numbers copied down in successive lines are shifted one position to the left from the previous
number. Finally, the numbers are added and their sum forms the product.
Instead of providing registers to store and add simultaneously as many binary numbers as there are
bits in the multiplier, it is convenient to provide an adder for the summation of only two binary
numbers and successively accumulate the partial products in a register. It will reduce the
requirements of registers.
Instead of shifting the multiplicand to the left, the partial product is shifted to the right.
When the corresponding bit of the multiplier is 0, there is no need to add all zeros to the partial
product.
Let us consider an algorithm to multiply two binary numbers. Assume that the ALU does not
provide the multiplication operation, but it does provide the addition operation and the shifting
operation. Then we can write a micro program for the multiplication operation and store the micro
program code in memory. When a multiplication operation is encountered, the CPU executes this
micro code to perform the multiplication.
The micro code is nothing but a collection of instructions. The ALU must support those operations;
otherwise we need micro code again for those operations which are not supported in the ALU.
Consider a situation in which we do not have the multiplication operation in a primitive computer. Is
it possible to perform the multiplication? Of course, yes, provided the addition operation is
available. We can perform the multiplication with the help of the repeated addition method; for
example, if we want to multiply 4 by 5 (4 × 5), then we simply add 4 five times to get the result.
Consider a machine which can handle 8-bit numbers; then we can represent numbers from 0 to 255.
If we want to multiply 175 × 225, then there will be at least 175 addition operations.
But if we use the multiplication algorithm that involves shifting and addition, it can be done in 8
steps, because we are using an 8-bit machine.
Again, micro program execution is slightly slower, because we have to access the code from the
micro controller memory, and memory is a slower device than the CPU.
The counter P is initially set to a number equal to the number of bits in the multiplier. The counter is
decremented by 1 after forming each partial product. When the content of the counter reaches zero,
the product is formed and the process stops.
Initially, the multiplicand is in register B and the multiplier in Q. The register A is reset to 0.
The sum of A and B forms a partial product- which is transferred to the EA register.
Both partial product and multiplier are shifted to the right. The least significant bit of A is shifted into
the most significant position of Q; and 0 is shifted into E.
After the shift, one bit of the partial product is shifted into Q, pushing the multiplier bits one position
to the right.
The right-most flip-flop in register Q, designated Q0, holds the bit of the multiplier which must be
inspected next. If the content of this bit is 0, then it is not required to add the multiplicand; only
shifting is needed. If the content of this bit is 1, then both addition and shifting are needed. After
each shift, the value of counter P is decremented and the process continues till the counter value
becomes 0.
To control the operation, it is required to design the appropriate control logic that is shown in the
block diagram.
The flow chart of the multiplication operation is given in the Figure 2.11.
Figure 2.11: Flow chart of the multiplication operation
The working of multiplication algorithm is shown here with the help of an example.
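The register-level algorithm described above can be sketched in Python. The register and counter names (A, B, Q, E, P) follow the text; the exact shifting details are an illustration of the scheme, not a reproduction of Figure 2.11:

```python
# Shift-and-add multiplier sketch: B holds the multiplicand, Q the multiplier,
# A the partial product, E the carry flip-flop, and counter P the bit count.
def multiply(b, q, n):
    a, e = 0, 0
    p = n
    mask = (1 << n) - 1
    while p > 0:
        if q & 1:                     # Q0 = 1: add the multiplicand
            total = a + b
            e = total >> n            # carry out of the n-bit adder goes to E
            a = total & mask
        # shift E, A, Q one position to the right; 0 is shifted into E
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (e << (n - 1))
        e = 0
        p -= 1                        # decrement the counter after each step
    return (a << n) | q               # product sits in the A,Q register pair

print(multiply(0b1001, 0b0011, 4))    # 9 * 3 = 27
```

Note that the product needs 2n bits, which is why it occupies the combined A,Q register pair at the end.
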
UNIT-II
CENTRAL PROCESSING UNIT DESIGN
In this Module, we have six lectures, viz.
2. Processor Organization
5. Microprogrammed control - I
6. Microprogrammed control - II
Introduction to CPU : The operations or tasks that the CPU must perform are:
To do these tasks, it should be clear that the CPU needs to store some data temporarily. It must
remember the location of the last instruction so that it can know where to get the next instruction. It
needs to store instructions and data temporarily while an instruction is being executed. In other
words, the CPU needs a small internal memory. These storage locations are generally referred to
as registers.
The major components of the CPU are an arithmetic and logic unit (ALU) and a control unit (CU).
The ALU does the actual computation or processing of data. The CU controls the movement of data
and instruction into and out of the CPU and controls the operation of the ALU.
The CPU is connected to the rest of the system through system bus. Through system bus, data or
information gets transferred between the CPU and the other component of the system. The system
bus may have three components:
Data Bus:
Data bus is used to transfer the data between main memory and CPU.
Address Bus:
Address bus is used to access a particular memory location by putting the address of the memory
location.
Control Bus:
Control bus is used to provide the different control signals generated by the CPU to different parts
of the system. For example, memory read is a signal generated by the CPU to indicate that a
memory read operation has to be performed. Through the control bus this signal is transferred to
the memory module to indicate the required operation.
There are three basic components of CPU: register bank, ALU and Control Unit. There are several
data movements between these units and for that an internal CPU bus is used. Internal CPU bus is
needed to transfer data between the various registers and the ALU.
The internal organization of CPU in more abstract level is shown in the Figure 5.1 and Figure 5.2.
A computer system employs a memory hierarchy. At the highest level of hierarchy, memory is
faster, smaller and more expensive. Within the CPU, there is a set of registers which can be treated
as a memory in the highest level of hierarchy. The registers in the CPU can be categorized into two
groups:
Control and status registers: These are used by the control unit to control the operation of
the CPU. Operating system programs may also use these in privileged mode to control the
execution of programs.
User-visible Registers:
In other cases, there is a partial or clean separation between data registers and address registers.
Data registers may be used to hold only data and cannot be employed in the calculation of an
operand address.
Address registers may be somewhat general purpose, or they may be devoted to a particular
addressing mode. Examples include the following:
Segment pointer: In a machine with segment addressing, a segment register holds the
address of the base of the segment. There may be multiple registers, one for the code
segment and one for the data segment.
Index registers: These are used for indexed addressing and may be autoindexed.
Stack pointer: If there is user visible stack addressing, then typically the stack is in memory
and there is a dedicated register that points to the top of the stack.
Condition Codes (also referred to as flags) are bits set by the CPU hardware as the result of
operations. For example, an arithmetic operation may produce a positive, negative, zero or
overflow result. In addition to the result itself being stored in a register or memory, a condition
code is also set. The code may subsequently be tested as part of a conditional branch operation.
Condition code bits are collected into one or more registers.
Register Organization
There are a variety of CPU registers that are employed to control the operation of the CPU. Most of
these, on most machines, are not visible to the user.
Different machines will have different register organizations and use different terminology. We will
discuss here the most commonly used registers which are part of most of the machines.
Program Counter (PC): Contains the address of the instruction to be fetched. Typically, the PC is
updated by the CPU after each instruction fetch so that it always points to the next instruction to
be executed. A branch or skip instruction will also modify the contents of the PC.
Instruction Register (IR): Contains the instruction most recently fetched. The fetched instruction is
loaded into an IR, where the opcode and operand specifiers are analyzed.
Memory Address Register (MAR): Contains the address of a location of main memory from
where information has to be fetched or to which information has to be stored. The contents of
MAR are directly connected to the address bus.
Memory Buffer Register (MBR): Contains a word of data to be written to memory or the word
most recently read. The contents of MBR are directly connected to the data bus. MBR is also
known as the Memory Data Register (MDR).
Apart from these specific register, we may have some temporary registers which are not visible to
the user. As such, there may be temporary buffering registers at the boundary to the ALU; these
registers serve as input and output registers for the ALU and exchange data with the MBR and user
visible registers.
All CPU designs include a register or set of registers, often known as the processor status word
(PSW), that contains status information. The PSW typically contains condition codes plus other
status information. Common fields or flags include the following:
Sign: Contains the sign bit of the result of the last arithmetic operation.
Zero: Set when the result is zero.
Carry: Set if an operation resulted in a carry (addition) into or borrow (subtraction)
out of a high order bit.
Equal: Set if a logical compare result is equal.
Overflow: Used to indicate arithmetic overflow.
Interrupt enable/disable: Used to enable or disable interrupts.
Supervisor: Indicate whether the CPU is executing in supervisor or user mode.
Certain privileged instructions can be executed only in supervisor mode, and certain
areas of memory can be accessed only in supervisor mode.
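The flag settings listed above can be illustrated with a small Python sketch for an 8-bit addition. The flag names and the two's-complement overflow rule used here are the usual conventions; the exact PSW layout of any particular CPU is not specified in the text:

```python
# Illustrative computation of common PSW flags for an 8-bit addition.
def add_with_flags(a, b, width=8):
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask)
    result = total & mask
    sign_bit = 1 << (width - 1)
    flags = {
        "carry": total > mask,                 # carry out of the high-order bit
        "zero": result == 0,
        "sign": bool(result & sign_bit),
        # overflow: both operands share a sign but the result's sign differs
        "overflow": bool(~(a ^ b) & (a ^ result) & sign_bit),
    }
    return result, flags

print(add_with_flags(0x7F, 0x01))  # 127 + 1 = 128: sign and overflow set
```
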
Apart from these, a number of other registers related to status and control might be found in a
particular CPU design. In addition to the PSW, there may be a pointer to a block of memory
containing additional status information (e.g. a process control block).
Concept of Program Execution
The instructions constituting a program to be executed by a computer are loaded in sequential
locations in its main memory. To execute this program, the CPU fetches one instruction at a time
and performs the functions specified. Instructions are fetched from successive memory locations
until the execution of a branch or a jump instruction.
The CPU keeps track of the address of the memory location where the next instruction is located
through the use of a dedicated CPU register, referred to as the program counter (PC). After fetching
an instruction, the contents of the PC are updated to point at the next instruction in sequence.
For simplicity, let us assume that each instruction occupies one memory word. Therefore, execution
of one instruction requires the following three steps to be performed by the CPU:
1. Fetch the contents of the memory location pointed at by the PC. The contents of this
location are interpreted as an instruction to be executed. Hence, they are stored in
the instruction register (IR). Symbolically this can be written as:
IR = [ [PC] ]
2. Since each instruction occupies one memory word, increment the contents of
the PC to point at the next instruction:
PC = [PC] + 1
3. Carry out the actions specified by the instruction stored in the IR.
The first two steps are usually referred to as the fetch phase and step 3 is known as the
execution phase. The fetch cycle basically involves reading the next instruction from the memory
into the CPU and, along with that, updating the contents of the program counter. In the execution
phase, the CPU interprets the opcode and performs the indicated operation. The instruction fetch
and execution phases together are known as the instruction cycle. The basic instruction cycle is
shown in Figure 5.3.
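The fetch and execution phases can be sketched as a loop in Python. The tiny instruction set used here (LOAD, ADD, JUMP, HALT) is invented purely for illustration:

```python
# A minimal fetch-execute loop: each memory word holds one instruction, the PC
# is updated during the fetch phase, and the opcode is interpreted in the
# execution phase.
def run(memory):
    pc, acc = 0, 0
    while True:
        ir = memory[pc]            # fetch: IR = [[PC]]
        pc += 1                    # update the PC to the next instruction
        op, operand = ir           # decode
        if op == "LOAD":           # execute
            acc = operand
        elif op == "ADD":
            acc += operand
        elif op == "JUMP":         # branching replaces the PC contents
            pc = operand
        elif op == "HALT":
            return acc

program = [("LOAD", 5), ("ADD", 7), ("HALT", 0)]
print(run(program))                # 12
```
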
In cases where an instruction occupies more than one word, steps 1 and 2 can be repeated as
many times as necessary to fetch the complete instruction. In these cases, the execution of an
instruction may involve one or more operands in memory, each of which requires a memory access.
Further, if indirect addressing is used, then additional memory accesses are required.
The fetched instruction is loaded into the instruction register. The instruction contains bits that
specify the action to be performed by the processor. The processor interprets the instruction and
performs the required action. In general, the actions fall into four categories:
Data processing: The processor may perform some arithmetic or logic operation on data.
The main line of activity consists of alternating instruction fetch and instruction execution activities.
After an instruction is fetched, it is examined to determine if any indirect addressing is involved. If
so, the required operands are fetched using indirect addressing.
The execution cycle of a particular instruction may involve more than one reference to memory.
Also, instead of memory references, an instruction may specify an I/O operation. With these
additional considerations the basic instruction cycle can be expanded with more details view in the
Figure 5.4. The figure is in the form of a state diagram.
Processor Organization
There are several components inside a CPU, namely, ALU, control unit, general purpose register,
Instruction registers etc. Now we will see how these components are organized inside CPU. There
are several ways to place these components and interconnect them. One such organization is
shown in the Figure 5.6.
In this case, the arithmetic and logic unit (ALU) and all CPU registers are connected via a single
common bus. This bus is internal to CPU and this internal bus is used to transfer the information
between different components of the CPU. This organization is termed as single bus organization,
since only one internal bus is used for transferring of information between different components of
CPU. We have external bus or buses to CPU also to connect the CPU with the memory module and
I/O devices. The external memory bus is also shown in the Figure 5.6 connected to the CPU via the
memory data and address register MDR and MAR.
The number and function of registers R0 to R(n-1) vary considerably from one machine to another.
They may be given over for general-purpose use by the programmer. Alternatively, some of them
may be dedicated as special-purpose registers, such as index registers or stack pointers.
In this organization, two registers, namely Y and Z, are used which are transparent to the user.
The programmer cannot directly access these two registers. They are used as input and output
buffers to the ALU in ALU operations, serving the CPU as temporary storage.
Most of the operation of a CPU can be carried out by performing one or more of the following
functions in some prespecified sequence:
1. Fetch the contents of a given memory location and load them into a CPU register.
2. Store a word of data from a CPU register into a given memory location.
3. Transfer a word of data from one CPU register to another or to the ALU.
4. Perform an arithmetic or logic operation, and store the result in a CPU register.
Now we will examine the way in which each of the above functions is implemented in a computer.
Fetching a Word from Memory:
Information is stored in memory locations identified by their addresses. To fetch a word from
memory, the CPU has to specify the address of the memory location where this information is
stored and request a Read operation. The information may be either data for an operation or an
instruction of a program. To perform a memory fetch operation, we need to complete the following
tasks:
The CPU transfers the address of the required memory location to the Memory Address Register
(MAR).
The MAR is connected to the memory address lines of the memory bus; hence the address of the
required word is transferred to the main memory.
Next, CPU uses the control lines of the memory bus to indicate that a Read operation is initiated.
After issuing this request, the CPU waits until it receives an answer from the memory, indicating
that the requested operation has been completed.
The memory sets the MFC (Memory Function Completed) signal to 1 to indicate that the contents
of the specified memory location are available on the memory data bus.
As soon as MFC signal is set to 1, the information available in the data bus is loaded into the
Memory Data Register (MDR) and this is available for use inside the CPU.
As an example, assume that the address of the memory location to be accessed is kept in register
R2 and that the memory contents are to be loaded into register R1. This is done by the following
sequence of operations:
The time required for step 3 depends on the speed of the memory unit. In general, the time required
to access a word from the memory is longer than the time required to perform any operation within
the CPU.
The scheme that is used here to transfer data from one device (memory) to another device (CPU) is
referred to as an asynchronous transfer.
This asynchronous transfer enables transfer of data between two independent devices that have
different speeds of operation. The data transfer is synchronised with the help of some control
signals. In this example, Read request and MFC signal are doing the synchronization task.
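The Read request / MFC handshake can be sketched in Python. The one-element Memory class below is only a stand-in for a real memory module; in hardware the wait would be an actual handshake rather than a polling loop:

```python
# Sketch of the asynchronous fetch sequence: the CPU puts an address in MAR,
# issues Read, and waits for the MFC signal before copying the data bus into
# MDR.
class Memory:
    def __init__(self, contents):
        self.contents = contents
        self.mfc = 0
        self.data_bus = None
    def read(self, address):       # triggered by the Read control signal
        self.data_bus = self.contents[address]
        self.mfc = 1               # Memory Function Completed

memory = Memory({0x40: 1234})
r2 = 0x40                          # address kept in register R2
mar = r2                           # 1. MAR <- [R2]
memory.read(mar)                   # 2. Read
while not memory.mfc:              # 3. Wait for MFC
    pass
mdr = memory.data_bus              # MDR loaded from the data bus
r1 = mdr                           # contents moved into register R1
print(r1)                          # 1234
```
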
An alternative scheme is synchronous transfer. In this case all the devices are controlled by a
common clock pulse (a continuously running clock of a fixed frequency). These pulses provide
common timing signal to the CPU and the main memory. A memory operation is completed during
every clock period. Though the synchronous data transfer scheme leads to a simpler
implementation, it is difficult to accommodate devices with widely varying speeds. In such
cases, the duration of the clock pulse is dictated by the slowest device. This reduces the
speed of all the devices to that of the slowest one.
Storing a word into memory
The procedure of writing a word into memory location is similar to that for reading one from
memory. The only difference is that the data word to be written is first loaded into the MDR before
the write command is issued.
As an example, assume that the data word to be stored in the memory is in register R1 and that
the memory address is in register R2. The memory write operation requires the following sequence:
1. MAR ← [R2]
2. MDR ← [R1]
3. Write
4. Wait for MFC
In this case step 1 and step 2 are independent and so they can be carried out in any order. In
fact, step 1 and 2 can be carried out simultaneously, if this is allowed by the architecture, that is, if
these two data transfers (memory address and data) do not use the same data path.
In case of both memory read and memory write operation, the total time duration depends on wait
for the MFC signal, which depends on the speed of the memory module.
There is a scope to improve the performance of the CPU, if CPU is allowed to perform some other
operation while waiting for MFC signal. During the period, CPU can perform some other instructions
which do not require the use of MAR and MDR.
Register transfer operations enable data transfer between the various blocks connected to the
common bus of the CPU. We have several registers inside the CPU and it is often necessary to
transfer information from one register to another. For example, during a memory write operation,
data from the appropriate register must be moved to the MDR.
Since the input and output lines of all the registers are connected to the common internal bus, we
need appropriate input and output gating. The input and output gates for register Ri are controlled
by the signals Riin and Riout respectively.
Thus, when Riin is set to 1, the data available on the common bus is loaded into Ri. Similarly,
when Riout is set to 1, the contents of register Ri are placed on the bus. To transfer data from one
register to another, we need to generate the appropriate register gating signals.
For example, to transfer the contents of register R1 to register R2, the following actions are
needed:
1. Enable the output gate of register R1 (R1out), placing the contents of R1 on the bus.
2. Enable the input gate of register R2 (R2in), loading the data from the bus into R2.
Therefore, to perform any arithmetic or logic operation (say a binary operation), both operands
should be made available at the two inputs of the ALU simultaneously. Once both inputs are
available, the appropriate signal is generated to perform the required operation.
We may have to use temporary storage (register) to carry out the operation in ALU .
The sequence of operations that has to be carried out to perform one ALU operation depends on
the organization of the CPU. Consider an organization in which one operand of the ALU is stored in
some temporary register Y and other operand is directly taken from CPU internal bus. The result of
the ALU operation is stored in another temporary register Z. This organization is shown in the
Figure 5.7.
Therefore, the sequence of operations to add the contents of register R1 to the contents of register
R2 and store the result in register R3 is as follows:
1. R1out, Yin
2. R2out, Add, Zin
3. Zout, R3in
In step 2 of this sequence, the contents of register R2 are gated to the bus, and hence to input B of
the ALU, which is directly connected to the bus. The contents of register Y are always available at
input A of the ALU. The function performed by the ALU depends on the signals applied to the ALU
control lines. In this example, the Add control line of the ALU is set to 1, which indicates the
addition operation, and the output of the ALU is the sum of the two numbers at inputs A and B.
The sum is loaded into register Z, since its input gate is enabled (Zin). In step 3, the contents of
register Z are transferred to the destination register R3.
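The three control steps above can be simulated in Python, treating the internal bus as a single shared value and the gating signals as function arguments:

```python
# Simulation of the single-bus sequence R1out,Yin / R2out,Add,Zin / Zout,R3in.
regs = {"R1": 9, "R2": 3, "R3": 0, "Y": 0, "Z": 0}
bus = None

def step(out_reg, in_reg, alu_add=False):
    global bus
    bus = regs[out_reg]                    # Rout: drive the common bus
    if alu_add:                            # ALU adds input A (Y) and input B (bus)
        regs[in_reg] = regs["Y"] + bus     # sum gated into Z
    else:
        regs[in_reg] = bus                 # Rin: load from the bus

step("R1", "Y")                # 1. R1out, Yin
step("R2", "Z", alu_add=True)  # 2. R2out, Add, Zin
step("Z", "R3")                # 3. Zout, R3in
print(regs["R3"])              # 12
```

Only one register drives the bus in each step, which is exactly the constraint the single-bus organization imposes.
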
Till now we have considered only one internal bus of the CPU. This single-bus organization is only
one of the possibilities for interconnecting the different building blocks of the CPU.
An alternative structure is the two bus structure, where two different internal buses are used in
CPU. All register outputs are connected to bus A, and all register inputs are connected to bus B.
There is a special arrangement to transfer the data from one bus to the other. The buses are
connected through the bus tie G. When this tie is enabled, data on bus A is transferred to bus B. When
G is disabled, the two buses are electrically isolated.
Since two buses are used here, the temporary register Z, which is used in the single bus
organization to store the result of the ALU, is not required. The result can be directly transferred to
bus B, since one of the inputs is on bus A. With the bus tie disabled, the result can be directly
transferred to the destination register. A simple two bus structure is shown in Figure 5.8.
For example, the operation [R3] ← [R1] + [R2] can now be performed in fewer steps.
In this case the source register R2 and destination register R3 have to be different, because the
two operations R2in and R2out cannot be performed together. This restriction applies only if the
registers are made of simple latches.
We may have another CPU organization, where three internal CPU buses are used. In this
organization each bus is connected to only one output and a number of inputs. The elimination of
the need for connecting more than one output to the same bus leads to faster bus transfer and
simpler control.
A multiplexer is provided at the input to each of the two working registers A and B, which allows
them to be loaded from either the input data bus or the register data bus.
In this three bus organization, we keep two input data buses instead of one.
Figure 5.9 : Three Bus structure
Two separate input data buses are present: one is for external data transfer, i.e. retrieving from
memory, and the second one is for internal data transfer, that is, transferring data from a general
purpose register to other building blocks inside the CPU.
To execute a complete instruction, we need to take the help of these basic operations and we
need to execute these operations in some particular order.
For example, consider the instruction: "Add the contents of memory location NUM to the contents
of register R1 and store the result in register R1." For simplicity, assume that the address NUM is
given explicitly in the address field of the instruction. That is, in this instruction, the direct
addressing mode is used.
Execution of this instruction requires the following action :
1. Fetch instruction
2. Fetch first operand (Contents of memory location pointed at by the address
field of the instruction)
3. Perform addition
4. Load the result into R1
The following sequence of control steps is required to implement the above operation for the
single-bus architecture that we discussed in the earlier section.
Steps Actions
1. PCout, MARin, Read, Clear Y, Set carry-in to ALU, Add, Zin
2. Zout, PCin, Wait for MFC
3. MDRout, IRin
4. Address field of IRout, MARin, Read
5. R1out, Yin, Wait for MFC
6. MDRout, Add, Zin
7. Zout, R1in
8. END
In Step1:
The instruction fetch operation is initiated by loading the contents of the PC into the MAR
and sending a read request to memory.
To perform this task, first of all the contents of the PC have to be placed on the internal bus and
then loaded into the MAR. For this, the control circuit has to generate the PCout signal and the
MARin signal.
After issuing the read signal, the CPU has to wait for some time to get the MFC signal. During that
time the PC is incremented by 1 through the use of the ALU. This is accomplished by setting one of
the inputs to the ALU (register Y) to 0; the other input, available on the bus, is the current value of
the PC.
At the same time, the carry-in to the ALU is set to 1 and an add operation is specified.
In Step 2:
The updated value is moved from register Z back into the PC. Step 2 is initiated immediately after
issuing the memory Read request without waiting for completion of memory function. This is
possible, because step 2 does not use the memory bus and its execution does not depend on the
memory read operation.
In Step 3:
Step 3 has been delayed until the MFC is received. Once MFC is received, the word fetched from
the memory is transferred to the IR (Instruction Register), because it is an instruction. Steps 1
through 3 constitute the instruction fetch phase of the control sequence.
The instruction fetch portion is same for all instructions. Next step onwards, instruction execution
phase takes place.
As soon as the IR is loaded with the instruction, the instruction decoding circuits interpret its
contents. This enables the control circuitry to choose the appropriate signals for the remainder of
the control sequence, steps 4 to 8, which we refer to as the execution phase. To design the control
sequence of the execution phase, knowledge of the internal structure and instruction format of the
CPU is needed. Secondly, the length of the execution phase is different for different instructions.
Instruction format: opcode | M | R
In Step 4:
The address field of the IR, which contains the operand address NUM, is transferred to the MAR
and a memory Read operation is initiated.
In Step 5 :
The destination field of the IR, which contains the address of register R1, is used to transfer the
contents of register R1 to register Y, and the CPU waits for the Memory Function Completed
signal. When the read operation is completed, the memory operand is available in the MDR.
In Step 6:
The memory operand available in the MDR is gated to the bus and added by the ALU to the
contents of register Y; the sum is stored in temporary register Z.
In Step 7:
The result of the addition operation is transferred from the temporary register Z to the destination
register R1 in this step.
In step 8 :
It indicates the end of the execution of the instruction by generating End signal. This indicates
completion of execution of the current instruction and causes a new fetch cycle to be started by
going back to step 1.
Branching
With the help of a branching instruction, the control of execution of the program is transferred from
one particular position to some other position, due to which the sequential flow of control is broken.
Branching is accomplished by replacing the current contents of the PC by the branch address, that
is, the address of the instruction to which branching is required.
Consider a branch instruction in which branch address is obtained by adding an offset X, which is
given in the address field of the branch instruction, to the current value of PC.
The control sequence that enables execution of an unconditional branch instruction using the single
- bus organization is as follows :
Steps Actions
1. PCout, MARin, Read, Clear Y, Set carry-in to ALU, Add, Zin
2. Zout, PCin, Wait for MFC
3. MDRout, IRin
4. PCout, Yin
5. Offset field of IRout, Add, Zin
6. Zout, PCin
7. End
Execution starts as usual with the fetch phase, ending with the instruction being loaded into the IR
in step 3. To execute the branch instruction, the execution phase starts in step 4.
In Step 4
The contents of the PC are transferred to register Y.
In Step 5
The offset X of the instruction is gated to the bus and the addition operation is performed.
In Step 6
The result of the addition, which represents the branch address is loaded into the PC.
In Step 7
It generates the End signal to indicate the end of execution of the current instruction.
Consider now a conditional branch instruction instead of an unconditional branch. In this case, we
need to check the status of the condition codes between Steps 3 and 4, i.e., before adding the offset
value to the PC contents.
For example, if the instruction decoding circuitry interprets the contents of the IR as a Branch on
Negative (BRN) instruction, the control unit proceeds as follows: First, the condition code register is
checked. If bit N (negative) is equal to 1, the control unit proceeds with Step 4 through Step 7 of the
control sequence of the unconditional branch instruction. If N is equal to 0, an End signal is
generated instead.
This, in effect, terminates execution of the branch instruction and causes the instruction
immediately following the branch instruction to be fetched when a new fetch operation is
performed.
Therefore, the control sequence for the conditional branch instruction BRN can be obtained from
the control sequence of an unconditional branch instruction by replacing Step 4 by
4. If N = 0 then End
   If N = 1 then PCout, Yin
Some other conditional branch instructions are:
BZ : Branch on Zero
BP : Branch on Positive
BO : Branch on Overflow
To execute an instruction, the control unit of the CPU must generate the required control signals in
the proper sequence. As for example, during the fetch phase, the CPU has to generate the PCout
signal along with other required signals in the first clock pulse. In the second clock pulse, the CPU
has to generate the PCin signal along with other required signals. So, during the fetch phase, the
proper sequence for generating the signals to retrieve from and store to the PC is PCout followed by
PCin.
To generate the control signals in proper sequence, a wide variety of techniques exist. Most of these
techniques, however, fall into one of two categories,
1. Hardwired Control
2. Microprogrammed Control.
Hardwired Control
In the hardwired control technique, the control signals are generated by means of hardwired
circuits. The main objective of the control unit is to generate the control signals in proper sequence.
Consider the sequence of control signals required to execute the ADD instruction that is explained in
the previous lecture. It is obvious that eight non-overlapping time slots are required for proper execution
of the instruction represented by this sequence.
Each time slot must be at least long enough for the function specified in the corresponding step to
be completed. Since the control unit is implemented by hardwired devices, and every device has a
propagation delay, it requires some time to get a stable signal at the output port after applying the
input signal. So, determining the duration of each time slot is a complicated design task.
For the moment, for simplicity, let us assume that all slots are equal in duration. Therefore, the
required controller may be implemented based upon the use of a counter driven by a clock.
Each state, or count, of this counter corresponds to one of the steps of the control sequence of the
instructions of the CPU.
In the previous lecture, we have mentioned the control sequences for execution of two instructions
only (one for add and the other for branch). In the same way, we need to design the control
sequences of all the instructions.
By looking into the design of the CPU, we may say that there are various instructions for the add
operation. As for example,
ADD NUM R1 : Add the contents of the memory location specified by NUM to the contents of
register R1.
The control sequences for execution of these ADD instructions are different. Of course, the fetch
phase of all the instructions remains the same.
It is clear that the control signals depend on the instruction, i.e., the contents of the instruction
register. It is also observed that execution of some of the instructions depends on the contents of
the condition code or status flag register, as the control sequence of a conditional branch
instruction does.
Hence, the required control signals are uniquely determined by the following information:
1. the contents of the control step counter,
2. the contents of the instruction register,
3. the contents of the condition codes and status flags, and
4. the external input signals, such as MFC.
The external inputs represent the state of the CPU and the various control lines connected to it, such
as the MFC status signal. The condition codes/status flags indicate the state of the CPU. These
include the status flags like carry, overflow, zero, etc.
The structure of the control unit can be represented in a simplified view by putting it in a block
diagram. The detailed hardware involved may be explored step by step. The simplified view of the
control unit is given in the Figure 5.10.
The decoder/encoder block is simply a combinational circuit that generates the required control
outputs depending on the state of all its inputs.
The decoder part of the decoder/encoder block provides a separate signal line for each control step,
or time slot, in the control sequence. Similarly, the output of the instruction decoder consists of a
separate line for each machine instruction. Depending on the instruction loaded in the IR, one of the
output lines INS1 to INSm is set to 1 and all other lines are set to 0.
The detailed view of the control unit organization is shown in the Figure 5.11.
Figure 5.11: Detailed view of Control Unit organization
All input signals to the encoder block should be combined to generate the individual control signals.
In the previous section, we have mentioned the control sequence of the instruction,
"Add contents of the memory location addressed in memory direct mode to register R1 (ADD_MD)",
It is required to generate many control signals by the control unit. These are basically coming out
from the encoder circuit of the control signal generator. The control signals are: PCin, PCout, Zin, Zout,
MARin, ADD, END, etc.
By looking into the above three instructions, we can write the logic function for Zin as :
Zin = T1 + T6 . ADD_MD + T5 . BR + T5 . BRN + . . . . . . . . . . . . . .
For all instructions, in time step T1 we need the control signal Zin to enable the input to register Z.
Zin is also needed in time cycle T6 of the ADD_MD instruction, in time cycle T5 of the BR instruction,
and so on. Similarly, the logic function for the ADD control signal can be written as :
ADD = T1 + T6 . ADD_MD + T5 . BR + . . . . . . . . . . . . . .
These logic functions can be implemented by a two-level combinational circuit of AND and OR
gates.
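As a sketch of how such a sum-of-products function behaves, the Zin expression above can be evaluated as two-level AND-OR logic in Python. The one-hot step-decoder model and the sample inputs are assumptions for illustration:

```python
# Sketch: two-level (AND-OR) evaluation of
#   Zin = T1 + T6 . ADD_MD + T5 . BR + T5 . BRN
# T1..T8 come from the step decoder (one-hot); ADD_MD/BR/BRN come
# from the instruction decoder. Inputs here are example values.

def step(n):
    """One-hot step decoder: only Tn is high in control step n."""
    return {i: (i == n) for i in range(1, 9)}

def zin(T, ADD_MD, BR, BRN):
    # OR of AND terms, exactly as in the sum-of-products expression
    return T[1] or (T[6] and ADD_MD) or (T[5] and BR) or (T[5] and BRN)

print(zin(step(1), ADD_MD=False, BR=True,  BRN=False))  # True: step 1 of any instruction
print(zin(step(6), ADD_MD=True,  BR=False, BRN=False))  # True: step 6 of ADD_MD
print(zin(step(6), ADD_MD=False, BR=True,  BRN=False))  # False: BR needs Zin in step 5
```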
This END signal indicates the end of the execution of an instruction, so this END signal can be used
to start a new instruction fetch cycle by resetting the control step counter to its starting value.
The circuit diagram (Partial) for generating Zin and END signal is shown in the Figure 5.12 and
Figure 5.13 respectively.
Figure 5.12: Generation of Zin Control Signal Figure 5.13: Generation of the END Control Signal
The signals ADD_MD, BR, BRN, etc. come from the instruction decoder circuit, which depends on
the contents of the IR.
The signals T1, T2, T3, etc. come from the step decoder, which depends on the control step counter.
When the wait-for-MFC (WMFC) signal is generated, the CPU does not do any work and waits for
an MFC signal from the memory unit. In this case, the desired effect is to delay the initiation of the
next control step until the MFC signal is received from the main memory. This can be incorporated
by inhibiting the advancement of the control step counter for the required period.
Let us assume that the control step counter is controlled by a signal called RUN.
By looking at the control sequence of all the instructions, the WMFC signal is generated as:
WMFC = T2 + T5 . ADD_MD + . . . . . . . . . . . . . .
The RUN signal is generated with the help of WMFC signal and MFC signal. The arrangement is
shown in the Figure 5.14.
The MFC signal is generated by the main memory, whose operation is independent of the CPU
clock. Hence MFC is an asynchronous signal that may arrive at any time relative to the CPU clock.
It is possible to synchronize it with the CPU clock with the help of a D flip-flop.
When the WMFC signal is high, the RUN signal is low. The RUN signal is combined with the master
clock pulse through an AND gate. When RUN is low, the CLK signal remains low, and it does not
allow the control step counter to advance.
When the MFC signal is received, the RUN signal becomes high and the CLK signal becomes the
same as the MCLK signal, due to which the control step counter advances. Therefore, in the next
control step, the WMFC signal goes low and the control unit operates normally till the next memory
access signal is generated.
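The clock-gating arrangement described above can be sketched as follows. The trace of WMFC/MFC values is hypothetical, and the logic expression for RUN is an assumption consistent with the description (RUN is low only while WMFC is asserted and MFC has not yet arrived):

```python
# Sketch of the RUN / clock-gating logic around Figures 5.14-5.15.
# The control step counter advances only on gated CLK pulses.

def run_signal(WMFC, MFC):
    # RUN is low exactly while we are waiting on memory
    return (not WMFC) or MFC

def gated_clock(MCLK, WMFC, MFC):
    # CLK = MCLK AND RUN
    return MCLK and run_signal(WMFC, MFC)

counter = 1
# (MCLK pulse, WMFC, MFC) per cycle: memory is slow for two cycles
trace = [(1, 0, 0), (1, 1, 0), (1, 1, 0), (1, 1, 1), (1, 0, 0)]
for MCLK, WMFC, MFC in trace:
    if gated_clock(MCLK, WMFC, MFC):
        counter += 1          # control step counter advances

print(counter)   # 4: two of the five cycles were inhibited while waiting for MFC
```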
The timing diagram for an instruction fetch operation is shown in the Figure 5.15.
Figure 5.15: Timing of control signals during instruction fetch
In this discussion, we have presented a simplified view of the way in which the sequence of control
signals needed to fetch and execute instructions may be generated.
It is observed from the discussion that as the number of instructions increases, the number of
required control signals also increases.
In VLSI technology, structures that involve regular interconnection patterns are much easier to
implement than random connections.
One such regular structure is the PLA (programmable logic array). PLAs are nothing but arrays of
AND gates followed by an array of OR gates. If the control signals are expressed in sum-of-products
form, then they can be implemented with the help of a PLA.
Microprogrammed Control
In hardwired control, we saw how all the control signals required inside the CPU can be generated
using a state counter and a PLA circuit.
There is an alternative approach by which the control signals required inside the CPU can be
generated. This alternative approach is known as microprogrammed control.
In microprogrammed control unit, the logic of the control unit is specified by a microprogram.
A microprogrammed control unit is a relatively simple logic circuit that is capable of (1) sequencing
through microinstructions and (2) generating control signals to execute each microinstruction.
The concept of a microprogram is similar to a computer program. In a computer program the
complete instructions of the program are stored in main memory, and during execution the CPU
fetches the instructions from main memory one after another. The sequence of instruction fetch is
controlled by the program counter (PC).
Microprograms are stored in microprogram memory and their execution is controlled by the
microprogram counter (μPC).
A microprogram consists of microinstructions, which are nothing but strings of 0s and 1s. In a
particular instance, we read the contents of one location of microprogram memory, which is nothing
but a microinstruction. Each output line (data line) of the microprogram memory corresponds to one
control signal. If the content of the memory cell is 0, the corresponding signal is not generated; if
the content of the memory cell is 1, that control signal is generated at that instant of time.
First let us define the different terminologies that are related to the microprogrammed control unit.
A control word (CW) is defined as a word whose individual bits represent the various control signals.
Therefore, each of the control steps in the control sequence of an instruction defines a unique
combination of 0s and 1s in the CW.
A sequence of control words (CWs) corresponding to the control sequence of a machine instruction
constitutes the microprogram for that instruction.
The microprograms corresponding to the instruction set of a computer are stored in a special
memory, which will be referred to as the microprogram memory. The control words related to an
instruction are stored in the microprogram memory.
The organization of control unit to enable conditional branching in the microprogram is shown in the
Figure 5.18.
Figure 5.18: Organization of microprogrammed control with conditional branching.
The control bits of the microinstruction word which specify the branch conditions and address are
fed to the "Starting and branch address generator" block.
This block performs the function of loading a new address into the μPC when the condition of a
branch instruction is satisfied.
In a computer program we have seen that execution of every instruction consists of two parts: the
fetch phase and the execution phase of the instruction. It is also observed that the fetch phase of
all instructions is the same.
At the end of the fetch microprogram, the starting address generator unit calculates the appropriate
starting address of the microprogram for the instruction which is currently present in the IR. After
that, the μPC controls the execution of the microprogram, which generates the appropriate control
signals in the proper sequence.
When an End instruction is encountered, the μPC is loaded with the address of the first CW in the
microprogram for the instruction fetch cycle.
Let us examine the contents of the microprogram memory and how the microprogram of each
instruction is stored or organized in the microprogram memory. Consider the two examples that
were used in our previous lecture. The first example is the control sequence for execution of the
instruction "Add contents of the memory location addressed in memory direct mode to register R1".
Steps Actions
1. PCout, MARin, Read, Clear Y, Set carry-in to ALU, Add, Zin
2. Zout, PCin, WMFC
3. MDRout, IRin
4. Address-field-of-IRout, MARin, Read
5. R1out, Yin, WMFC
6. MDRout, Add, Zin
7. Zout, R1in
8. END
The second example is the control sequence for execution of an unconditional branch instruction:
Steps Actions
1. PCout, MARin, Read, Clear Y, Set Carry-in to ALU, Add, Zin
2. Zout, PCin, WMFC
3. MDRout, IRin
4. PCout, Yin
5. Offset-field-of-IRout, Add, Zin
6. Zout, PCin
7. End
First consider the control signals required for the fetch phase, which are the same for all
instructions; we list them in a particular order:
PCout, MARin, Read, Clear Y, Set Carry to ALU, Add, Zin, Zout, PCin, WMFC, MDRout, IRin
The control words for the first three steps of the above two instructions (which form the fetch cycle
of each instruction) are as follows:
Step 1 : 1 1 1 1 1 1 1 0 0 0 0 0 ---
Step 2 : 0 0 0 0 0 0 0 1 1 1 0 0 ---
Step 3 : 0 0 0 0 0 0 0 0 0 0 1 1 ---
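The fetch-phase control words can be generated mechanically from the named signals, in the 12-signal ordering listed above. A sketch:

```python
# Sketch: building the fetch-phase control words from named signals,
# using the 12-signal ordering given in the text.

SIGNALS = ["PCout", "MARin", "Read", "ClearY", "SetCarry", "Add",
           "Zin", "Zout", "PCin", "WMFC", "MDRout", "IRin"]

def control_word(active):
    """Return the CW bit string: 1 where the signal is active, else 0."""
    return "".join("1" if s in active else "0" for s in SIGNALS)

step1 = control_word({"PCout", "MARin", "Read", "ClearY", "SetCarry", "Add", "Zin"})
step2 = control_word({"Zout", "PCin", "WMFC"})
step3 = control_word({"MDRout", "IRin"})

print(step1)   # 111111100000
print(step2)   # 000000011100
print(step3)   # 000000000011
```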
We store these three CWs in memory locations 0, 1 and 2. Each instruction starts from memory
location 0. After executing up to the third step, i.e., the contents of microprogram memory location 2,
the instruction is available in the IR. The starting address generator circuit now calculates the
starting address of the microprogram for the instruction which is available in the IR.
Consider that the microprogram for the ADD instruction is stored from memory location 50 of the
microprogram memory. The partial contents from memory location 50 onwards are as follows :
Location 50 : 0 1 1 0 0 0 0 0 0 0 0 0 -- -- --
Location 51 : 0 0 0 0 0 0 0 0 0 1 0 0 -- -- --
and so on . . . .
(Figure: partial contents of the microprogram memory. Locations 0, 1 and 2 hold the fetch-phase
control words; locations 50 to 54 hold the execution-phase control words of the ADD instruction,
with the last one generating the End signal.)
When the microprogram executes the End microinstruction of an instruction, it generates the End
control signal. This End control signal is used to load the μPC with the starting address of the fetch
microprogram (in our case, address 0 of the microprogram memory). Now the CPU is ready to fetch
the next instruction from main memory.
From the discussion, it is clear that microprograms are similar to computer programs, but at one
level lower; that is why they are called microprograms.
For each instruction of the instruction set of the CPU, we will have a microprogram.
While executing a computer program, we fetch instructions one by one from main memory, which is
controlled by the program counter (PC).
When we fetch an instruction from main memory, to execute that instruction, we execute the
microprogram for that instruction. Microprograms are nothing but collections of microinstructions.
These microinstructions are fetched from the microprogram memory one after another, and their
sequence is maintained by the μPC. Fetching a microinstruction basically provides the required
control signals at that time instant.
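The fetch-and-execute behaviour of the μPC can be sketched as a tiny interpreter over the control store. The addresses follow the text's layout (fetch routine at locations 0 to 2, ADD microroutine at 50 to 54); the exact signal sets stored at locations 50 to 53 are illustrative assumptions:

```python
# Sketch: microprogram sequencing. The control store maps addresses
# to sets of active control signals; the uPC steps through them, and
# the End signal would reload the uPC with the fetch-routine address.

control_store = {
    0: {"PCout", "MARin", "Read", "ClearY", "SetCarry", "Add", "Zin"},
    1: {"Zout", "PCin", "WMFC"},
    2: {"MDRout", "IRin"},          # instruction is now in IR
    50: {"MARin", "Read"},          # assumed start of the ADD microroutine
    51: {"WMFC"},
    52: {"MDRout", "Add", "Zin"},
    53: {"Zout"},
    54: {"End"},
}

def run_one_instruction(start_of_routine):
    """Issue control words from uPC = 0 until an End microinstruction."""
    uPC, issued = 0, []
    while True:
        cw = control_store[uPC]
        issued.append(cw)
        if "End" in cw:
            return issued           # End reloads uPC with 0 for the next fetch
        # after location 2 the instruction is decoded; branch to its routine
        uPC = start_of_routine if uPC == 2 else uPC + 1

print(len(run_one_instruction(50)))   # 8 microinstructions issued in all
```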
In the previous discussion, to design a microprogrammed control unit, we have to do the following:
For each instruction of the CPU, we have to write a microprogram to generate the control
signals. The microprograms are stored in microprogram memory (control store). The starting
address of each microprogram is known to the designer.
Each microprogram is a sequence of microinstructions, and these microinstructions are
executed in sequence. The execution sequence is maintained by the microprogram counter.
Each microinstruction is nothing but a combination of 0s and 1s, which is known as a
control word. Each position of the control word specifies a particular control signal. A 0 in the
control word means that a low signal value is generated for that control signal at that
particular instant of time; similarly, a 1 indicates a high signal.
Since each machine instruction is executed by a corresponding microroutine, it follows that
a starting address for the microroutine must be specified as a function of the contents of
the instruction register (IR).
To incorporate branching, i.e., branching within the microprogram, a branch address generator unit
must be included. Both unconditional and conditional branching can be achieved with the help of
the microprogram. To incorporate conditional branching, it is required to check the contents of the
condition codes and status flags. A microprogrammed control unit is very similar to a CPU. In a
CPU, the PC is used to fetch instructions from the main memory, but in the case of the control unit,
the microprogram counter is used to fetch microinstructions from the control store.
But there are some differences between these two. In case of fetching an instruction from main
memory, we use two signals, MFC and WMFC. These two signals are required to synchronize the
speeds of the CPU and main memory. In general, main memory is a slower device than the CPU.
In microprogrammed control, the need for such signals is less obvious. The size of the control store
is much less than the size of main memory, so it is possible to implement the control store with a
faster memory, where the speed of the CPU and the control store is almost the same.
Since control stores are usually relatively small, it is feasible to speed them up through costly
circuits.
If we could implement the main memory with a faster device, then it would also be possible to
eliminate the signals MFC and WMFC. But, in general, the size of main memory is very big and it is
not economically feasible to replace the whole main memory by a faster memory to eliminate MFC
and WMFC.
Grouping of control signals:
It is observed that we need to store the information of each control signal in the control store. The
status of a particular control signal is either high or low at a particular instant of time.
It is possible to reserve one bit position for each control signal. If there are n control signals in a
CPU, then the length of each control word is n. Since we have one bit for each control signal, a
large number of resources can be controlled with a single microinstruction. This organization of
microinstructions is known as horizontal organization.
If the machine structure allows parallel use of a number of resources, then horizontal organization
has an advantage. Since more resources can be accessed in parallel, the operating speed is also
higher in such an organization.
If the machine architecture does not provide parallel access to resources, then we cannot generate
many control signals simultaneously, and most of the contents of the control store are 0. In such a
situation, we can combine some control signals and group them together. This will reduce the size
of the control word. If we use compact codes to specify only a small number of control functions in
each microinstruction, then it is known as vertical organization of microinstructions.
In horizontal organization, the size of the control word is longer, which is one extreme; in vertical
organization, the size of the control word is smaller, which is the other extreme.
In horizontal organization the implementation is simple, but in vertical organization the
implementation complexity increases due to the required decoder circuits. Also, the complexity of
the decoder depends on the level of grouping and encoding of the control signals.
Horizontal and vertical organization represent the two organizational extremes in microprogrammed
control. Many intermediate schemes are also possible, where the degree of encoding is a design
parameter.
We will explain the grouping of control signals with the help of an example. Grouping of control
signals depends on the internal organization of the CPU.
Assigning individual bits to each control signal is certain to lead to long microinstructions, since the
number of required control signals is normally large.
However, only a few bits are set to 1, and therefore used for active gating, in any given
microinstruction. This obviously results in low utilization of the available bit space.
If we group the control signals into non-overlapping groups, then the size of the control word
reduces.
This CPU contains four general purpose registers R0, R1, R2 and R3. In addition, there are three
other registers called SOURCE, DESTIN and TEMP. These are used for temporary storage within
the CPU and are completely transparent to the programmer; a computer programmer cannot use
these three registers.
For the proper functioning of this CPU, we need altogether 24 gating signals for the transfer of
information between the internal CPU bus and the other resources like registers.
In addition to these register gating signals, we need some other control signals, which include the
Read, Write, Clear Y, Set carry-in, WMFC and End signals. (Here we are restricting the control
signals for ease of discussion; in reality, the number of signals is more.)
It is also necessary to specify the function to be performed by the ALU. Depending on the power of
the ALU, we need several control lines, one control signal for each function. Assume that the ALU
used in the design can perform 16 different operations, such as ADD, SUBTRACT, AND, OR, etc.
So we need 16 different control lines.
The above discussion indicates that 46 (24 + 6 + 16) distinct signals are required. This indicates that
we need 46 bits in each microinstruction, therefore the size of the control word is 46.
Consider the microprogram pattern that is shown for the ADD instruction. On average, 4 to 5 bits
are set to 1 in each microinstruction and the rest of the bits are 0. Therefore, the bit utilization is
poor, and there is scope to improve the utilization of bits.
It is observed that most signals are not needed simultaneously and many signals are mutually
exclusive.
As for example, only one function of the ALU can be activated at a time. In our case we are
considering 16 ALU operations. Instead of using 16 different signals for the ALU operations, we can
group them together and reduce the number of control signals. From digital logic circuits, it is
obvious that instead of 16 different signals, we can use only 4 control signals for the ALU operation
and then use a 4 × 16 decoder to generate the 16 different ALU signals. Due to the use of a
decoder, there is a reduction in the size of the control word.
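A sketch of the 4 × 16 decoder idea: a 4-bit field in the control word drives a decoder whose 16 output lines are the individual ALU control signals. The operation names beyond the four shown in Table 5.1 are placeholders:

```python
# Sketch: replacing 16 separate ALU control lines with a 4-bit field
# plus a 4-to-16 decoder. Operation names OP4..OP14 are hypothetical.

ALU_OPS = ["ADD", "SUB", "MULT", "DIV"] + [f"OP{i}" for i in range(4, 15)] + ["XOR"]

def decode_4to16(code):
    """Exactly one of the 16 output lines goes high for a 4-bit code."""
    return [1 if i == code else 0 for i in range(16)]

lines = decode_4to16(0b0001)
print(sum(lines))                 # 1: decoder outputs are one-hot
print(ALU_OPS[lines.index(1)])    # SUB
```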
Another possibility of grouping control signals is the following: a source for a data transfer must be
unique, which means that it is not possible to gate the contents of two different registers onto the
bus at the same time. Similarly, the Read and Write signals to the memory cannot be activated
simultaneously.
This observation suggests the possibility of grouping the signals so that all signals that are
mutually exclusive are placed in the same group. Thus a group can specify one micro-operation at a
time.
At this point we have to use a binary coding scheme to represent a given signal within a group. As
for example, for the 16 ALU functions, four bits are enough to encode the appropriate function.
A possible grouping of the 46 control signals that are required for the above mention CPU is given
in the Table 5.1.
F1 (4 bits): 0000: No Transfer; 0001: PCout; 0010: MDRout; 0011: Zout; 0100: R0out;
             0101: R1out; 0110: R2out; 0111: R3out; 1000: SOURCEout; 1001: DESTINout;
             1010: TEMPout; 1011: ADDRESSout
F2 (3 bits): 000: No Transfer; 001: PCin; 010: IRin; 011: Zin; 100: R0in; 101: R1in;
             110: R2in; 111: R3in
F3 (2 bits): 00: No Transfer; 01: MARin; 10: MDRin; 11: TEMPin
F4 (2 bits): 00: No Transfer; 01: Yin; 10: SOURCEin; 11: DESTINin
F5 (4 bits): 0000: Add; 0001: Sub; 0010: MULT; 0011: Div; . . . ; 1111: XOR
F6 (2 bits): 00: no action; 01: read; 10: write
F7 (1 bit): 0: no action; 1: clear Y
F8 (1 bit): 0: carry-in = 0; 1: carry-in = 1
F9 (1 bit): 0: no action; 1: WMFC
F10 (1 bit): 0: continue; 1: end
A possible grouping of signals is shown here; other groupings are also possible. Here all out-gating
signals of the registers are grouped into one group, because the contents of only one register are
allowed to go to the internal bus at a time, otherwise there will be a conflict of data.
But the in-gating signals of the registers are grouped into three different groups. This implies that
the contents of the bus may be stored into three different registers simultaneously, for example,
transferred to MAR and Z at the same time. Due to this grouping, we are using 7 bits (3 + 2 + 2) for
the in-gating signals. If we had grouped them into one group, then only 4 bits would have been
enough, but it would take more time during execution. In that situation, two clock cycles would have
been required to transfer the contents of the PC to MAR and Z.
Therefore, the grouping of signals is a critical design parameter. If speed of operation is also a design parameter, then the compression of the control word will be less.
In this grouping, the 46 control signals are grouped into 10 different groups (F1, F2, ..., F10) and the size of the control word is 21 bits. So, the size of the control word is reduced from 46 to 21 bits, a reduction of more than 50%.
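As a sketch, the grouped control word can be modelled in software. The field names and widths below follow Table 5.1; the pack/unpack functions and the Python model itself are illustrative assumptions, not part of any real control unit.

```python
# Field names and widths from Table 5.1 (4+3+2+2+4+2+1+1+1+1 = 21 bits).
FIELD_WIDTHS = [("F1", 4), ("F2", 3), ("F3", 2), ("F4", 2), ("F5", 4),
                ("F6", 2), ("F7", 1), ("F8", 1), ("F9", 1), ("F10", 1)]

def pack(values):
    """Pack a dict of field values into one control word (F1 in the high bits)."""
    word = 0
    for name, width in FIELD_WIDTHS:
        v = values.get(name, 0)
        assert 0 <= v < (1 << width), f"{name} out of range"
        word = (word << width) | v
    return word

def unpack(word):
    """Recover the individual field values from a packed control word."""
    values = {}
    for name, width in reversed(FIELD_WIDTHS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values

# One example microinstruction: PCout, Zin, MARin, read, clear Y,
# carry-in=1, Add (all encodings taken from Table 5.1).
step = pack({"F1": 0b0001, "F2": 0b011, "F3": 0b01,
             "F6": 0b01, "F7": 1, "F8": 1})
```

Packing the 46 signals as ten encoded fields is exactly the compression the text describes: the word shrinks to 21 bits, at the cost of one decoder per field in the hardware.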
Each microroutine can be accessed initially by decoding the machine instruction into the
starting address to be loaded into the microprogram counter (μPC).
Writing a microprogram for each machine instruction is a simple solution, but it will increase the size
of control store.
We have already discussed that most machine instructions can operate in several addressing
modes. If we write different microroutine for each addressing mode, then most of the cases, we are
repeating some part of the microroutine.
The common part of the microroutine can be shared by several microroutines, which will reduce the size of the control store. This results in a considerable number of branch microinstructions being needed to transfer control among various parts. So, it introduces branching capabilities within the
microinstruction. This indicates that the microprogrammed control unit has to perform two basic
tasks:
Microinstruction sequencing: Get the next microinstruction from the control memory.
Microinstruction execution: Generate the control signals needed to execute the
microinstruction.
In designing a control unit, these tasks must be considered together, because both affect the format of the microinstruction and the timing of the control unit.
Design Consideration:
Two concerns are involved in the design of a microinstruction sequencing technique: the size of the
microinstruction and the address generation time.
Sequencing Techniques:
Based on the current microinstruction, condition flags and the contents of the instruction register, a
control memory address must be generated for the next microinstruction. A wide variety of
techniques have been used, and they can be grouped into three general categories:
Two address fields
Single address field
Variable format
Two address fields:
The branch control logic with two address fields is shown in the Figure 5.20. A multiplexer is provided that serves as a destination for both address fields and the instruction register. Based on an address selection input, the multiplexer selects either the opcode or one of the two addresses for the control address register (CAR). The CAR is subsequently decoded to produce the next microinstruction address. The address selection signals are provided by a branch logic module whose input consists of control unit flags plus bits from the control portion of the microinstruction.
Single address field:
In the single-address-field branch control logic, the options for the next address are as follows:
Address field
Instruction register code
Next sequential address
The address selection signals determine which option is selected. This approach reduces the number of address fields to one.
Variable format:
In variable-format branch control logic, one bit designates which format is being used. In one format,
the remaining bits are used to activate control signals. In the other format, some bits drive the branch
logic module, and the remaining bits provide the address. With the first format, the next address is
either the next sequential address or an address derived from the instruction register. With the
second format, either a conditional or unconditional branch is being specified. The approach is
shown in the Figure 5.22.
Address Generation:
We have looked at the sequencing problem from the point of view of format consideration and
general logic requirements. Another viewpoint is to consider the various ways in which the next
address can be derived or computed.
The address generation technique can be divided into two techniques: explicit & implicit.
With the two-address-field approach, the single-address-field approach, or the variable format, various branch instructions can be implemented with the explicit approach.
UNIT-III
Instruction Set & Addressing
In this Module, we have three lectures, viz.
1. Addressing Modes
2. Machine Instruction
3. Instruction Format
We have examined the types of operands and operations that may be specified by machine instructions. Now we have to see how the address of an operand is specified, and how the bits of an instruction are organized to define the operand addresses and operation of that instruction. The most common addressing techniques are:
o Immediate
o Direct
o Indirect
o Register
o Register Indirect
o Displacement
o Stack
All computer architectures provide more than one of these addressing modes. The question arises as to how the control unit can determine which addressing mode is being used in a particular instruction. Several approaches are used. Often, different opcodes will use different addressing modes. Also, one or more bits in the instruction format can be used as a mode field. The value of the mode field determines which addressing mode is to be used.
What is the interpretation of the effective address? In a system without virtual memory, the effective
address will be either a main memory address or a register. In a virtual memory system, the
effective address is a virtual address or a register. The actual mapping to a physical address is a
function of the paging mechanism and is invisible to the programmer.
Immediate Addressing:
The simplest form of addressing is immediate addressing, in which the operand is actually present
in the instruction:
OPERAND = A
This mode can be used to define and use constants or set initial values of variables. The advantage
of immediate addressing is that no memory reference other than the instruction fetch is required to
obtain the operand. The disadvantage is that the size of the number is restricted to the size of the address field, which, in most instruction sets, is small compared with the word length.
The instruction format for Immediate Addressing Mode is shown in the Figure 4.1.
Direct Addressing:
A very simple form of addressing is direct addressing, in which the address field contains the
effective address of the operand:
EA = A
Indirect Addressing:
With direct addressing, the length of the address field is usually less than the word length, thus
limiting the address range. One solution is to have the address field refer to the address of a word
in memory, which in turn contains a full-length address of the operand. This is known as indirect
addressing:
EA = (A)
Register Addressing:
Register addressing is similar to direct addressing. The only difference is that the address field
refers to a register rather than a main memory address:
EA = R
The advantages of register addressing are that only a small address field is needed in the
instruction and no memory reference is required. The disadvantage of register addressing is that
the address space is very limited.
Register Indirect Addressing:
Register indirect addressing is similar to indirect addressing, except that the address field refers to
a register instead of a memory location.
It requires only one memory reference and no special calculation.
EA = (R)
Register indirect addressing uses one less memory reference than indirect addressing, because the first piece of information, which is a memory address, is already available in a register; from that memory location we get the data. In general, register access is much faster than memory access.
Displacement Addressing:
A very powerful mode of addressing combines the capabilities of direct addressing and register
indirect addressing, which is broadly categorized as displacement addressing:
EA = A + (R)
Displacement addressing requires that the instruction have two address fields, at least one of which
is explicit. The value contained in one address field (value = A) is used directly. The other address
field, or an implicit reference based on opcode, refers to a register whose contents are added to A
to produce the effective address. The general format of Displacement Addressing is shown in the
Figure 4.6.
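The EA rules above can be sketched as a tiny Python model. The memory contents, register names, and function names are illustrative assumptions, not any real machine's semantics.

```python
# Toy memory and register file for illustrating effective-address computation.
mem = {100: 250, 250: 7}       # address -> contents (illustrative values)
reg = {"R1": 100}              # register file

def ea_direct(A):              # Direct:            EA = A
    return A

def ea_indirect(A):            # Indirect:          EA = (A)
    return mem[A]

def ea_register_indirect(R):   # Register indirect: EA = (R)
    return reg[R]

def ea_displacement(A, R):     # Displacement:      EA = A + (R)
    return A + reg[R]
```

With these values, indirect addressing through address 100 yields EA = 250 and the operand fetched there is 7, while displacement addressing with A = 5 and register R1 yields EA = 105.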
Three of the most common uses of displacement addressing are:
Relative addressing
Base-register addressing
Indexing
For relative addressing, the implicitly referenced register is the program counter (PC). That is, the
current instruction address is added to the address field to produce the EA. Thus, the effective
address is a displacement relative to the address of the instruction.
Base-Register Addressing:
The reference register contains a memory address, and the address field contains a displacement
from that address. The register reference may be explicit or implicit.
Indexing:
The address field references a main memory address, and the reference register contains a
positive displacement from that address. In this case also the register reference is sometimes
explicit and sometimes implicit.
Since index registers are generally used for iterative tasks, it is typical that there is a need to increment or decrement the index register after each reference to it. Because this is such a common operation, some systems will automatically do this as part of the same instruction cycle.
This is known as auto-indexing. We may get two types of auto-indexing:
-- one is auto-incrementing and the other one is
-- auto-decrementing.
If certain registers are devoted exclusively to indexing, then auto-indexing can be invoked implicitly
and automatically. If general purpose registers are used, the autoindex operation may need to be
signaled by a bit in the instruction.
For example, with auto-decrementing:
EA = A + (R)
R ← (R) - 1
In some machines, both indirect addressing and indexing are provided, and it is possible to
employ both in the same instruction. There are two possibilities: The indexing is performed either
before or after the indirection.
Postindexing:
EA = (A) + (R)
First, the contents of the address field are used to access a memory location containing an address. This address is then indexed by the register value.
Preindexing:
EA = (A + (R))
An address is calculated, the calculated address contains not the operand, but the address
of the operand
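The two orderings can be contrasted with the same kind of toy model (all memory and register values are illustrative):

```python
# Postindexing vs preindexing with illustrative memory and register contents.
mem = {20: 60, 25: 90}   # address -> contents
reg = {"R": 5}
A = 20

ea_post = mem[A] + reg["R"]   # EA = (A) + (R): indirection first, then index
ea_pre  = mem[A + reg["R"]]   # EA = (A + (R)): index first, then indirection
```

Here postindexing gives EA = 60 + 5 = 65, while preindexing gives EA = mem[25] = 90: the same address field and register produce two different effective addresses depending on the order of operations.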
Stack Addressing:
A stack is a linear array or list of locations. It is sometimes referred to as a pushdown list or last-in-first-out queue. A stack is a reserved block of locations. Items are appended to the top of the stack so
that, at any given time, the block is partially filled. Associated with the stack is a pointer whose
value is the address of the top of the stack. The stack pointer is maintained in a register. Thus,
references to stack locations in memory are in fact register indirect addresses.
The stack mode of addressing is a form of implied addressing. The machine instructions need not
include a memory reference but implicitly operate on the top of the stack.
Each instruction must contain the information required by the CPU for execution. The elements of
an instruction are as follows:
Operation Code:
Specifies the operation to be performed (e.g., add, move etc.). The operation is specified by a binary code, known as the operation code or opcode.
Next Instruction Reference:
The next instruction to be fetched is located in main memory. But in case of a virtual memory system, it may be either in main memory or secondary memory (disk). In most cases, the next instruction to be fetched immediately follows the current instruction. In those cases, there is no explicit reference to the next instruction. If an explicit reference is needed, then the main memory or virtual memory address must be given.
Source and result operands can be in main memory, virtual memory, a CPU register, or an I/O device. The steps involved in instruction execution are shown in the Figure 4.7.
Within the computer, each instruction is represented by a sequence of bits. The instruction is
divided into fields, corresponding to the constituent elements of the instruction. The instruction
format is highly machine specific and mainly depends on the machine architecture. A simple example of an instruction format is shown in the Figure 4.8. It is assumed that it is a 16-bit CPU. 4 bits are used to provide the operation code, so we may have up to 16 (2^4 = 16) different instructions. With each instruction, there are two operands. To specify each operand, 6 bits are used, so it is possible to reference 64 (2^6 = 64) different operands for each operand field.
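A sketch of that hypothetical 16-bit format in Python (the function names are illustrative):

```python
# Encode/decode for the hypothetical 16-bit format described above:
# a 4-bit opcode followed by two 6-bit operand references.
def encode(opcode, op1, op2):
    assert 0 <= opcode < 16 and 0 <= op1 < 64 and 0 <= op2 < 64
    return (opcode << 12) | (op1 << 6) | op2

def decode(instr):
    return (instr >> 12) & 0xF, (instr >> 6) & 0x3F, instr & 0x3F
```

Decoding an encoded instruction returns the three fields unchanged, and the largest possible instruction, encode(15, 63, 63), is exactly 0xFFFF, confirming the 4 + 6 + 6 = 16-bit layout.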
It is difficult to deal with binary representation of machine instructions. Thus, it has become
common practice to use a symbolic representation of machine instructions.
Opcodes are represented by abbreviations, called mnemonics, that indicate the operations.
Common examples include:
ADD     Add
SUB     Subtract
MULT    Multiply
DIV     Divide
LOAD    Load data from memory to CPU
STORE   Store data to memory from CPU
Figure 4.8: A simple instruction format
MULT R, X ; R ← R * X
may mean multiply the value contained in the data location X by the contents of register R and put
the result in register R
In this example, X refers to the address of a location in memory and R refers to a particular register.
Thus, it is possible to write a machine language program in symbolic form. Each symbolic opcode has a fixed binary representation, and the programmer specifies the location of each symbolic operand.
Instruction Types
Data Processing:
Arithmetic and Logic instructions. Arithmetic instructions provide computational capabilities for processing numeric data. Logic (Boolean) instructions operate on the bits of a word as bits rather than as numbers. Logic instructions thus provide capabilities for processing any other type of data. These operations are performed primarily on data in CPU registers.
Data Storage:
Memory instructions are used for moving data between memory and CPU registers.
Data Movement:
I/O instructions are needed to transfer program and data into memory from storage device or input
device and the results of computation back to the user.
Control:
Test and branch instructions are used to alter the sequence of instruction execution.
Number of Addresses
What is the maximum number of addresses one might need in an instruction? Most of the arithmetic and logic operations are either unary (one operand) or binary (two operands). Thus we need a maximum of two addresses to reference operands. The result of an operation must be stored, suggesting a third address. Finally, after completion of an instruction, the next instruction must be fetched, and its address is needed.
This reasoning suggests that an instruction may need to contain four address references: two operands, one result, and the address of the next instruction. In practice, four-address instructions are rare. Most instructions have one, two or three operand addresses, with the address of the next instruction being implicit.
Instruction Set Design
One of the most interesting, and most analyzed, aspects of computer design is instruction set
design. The instruction set defines the functions performed by the CPU. The instruction set is the
programmer's means of controlling the CPU. Thus programmer requirements must be considered in
designing the instruction set.
Operation repertoire : How many and which operations to provide, and how complex operations should be.
Data types : The various types of data upon which operations are performed.
Instruction format : Instruction length (in bits), number of addresses, size of various fields, and so on.
Registers : Number of CPU registers that can be referenced by instructions, and their use.
Addressing : The mode or modes by which the address of an operand is specified.
Types of Operands
Numbers: All machine languages include numeric data types. Numeric data are classified into two
broad categories: integer or fixed point and floating point.
Characters: A common form of data is text or character strings. Since computers work with bits, characters are represented by a sequence of bits. The most commonly used coding scheme is
ASCII (American Standard Code for Information Interchange) code.
Logical Data: Normally each word or other addressable unit (byte, halfword, and so on) is treated as
a single unit of data. It is sometime useful to consider an n-bit unit as consisting of n 1-bit items of
data, each item having the value 0 or 1. When data are viewed this way, they are considered to be
logical data. Generally 1 is treated as true and 0 is treated as false.
Types of Operations
The number of different opcodes and their types varies widely from machine to machine. However,
some general type of operations are found in most of the machine architecture. Those operations
can be categorized as follows:
Data Transfer
Arithmetic
Logical
Conversion
System Control
Transfer Control
Data Transfer:
The most fundamental type of machine instruction is the data transfer instruction. The data transfer
instruction must specify several things. First, the location of the source and destination operands
must be specified. Each location could be memory, a register, or the top of the stack. Second, the
length of data to be transferred must be indicated. Third, as with all instructions with operands, the
mode of addressing for each operand must be specified.
The CPU has to perform several tasks to accomplish a data transfer operation. If both source and
destination are registers, then the CPU simply causes data to be transferred from one register to
another; this is an operation internal to the CPU.
If one or both operands are in memory, then the CPU must perform some or all of the following
actions:
a) Calculate the memory address, based on the addressing mode.
b) If the address refers to virtual memory, translate from virtual to actual memory
address.
c) Determine whether the addressed item is in cache.
d) If not, issue a command to the memory module.
Commonly used data transfer operations:
Arithmetic:
Most machines provide the basic arithmetic operations like add, subtract, multiply, divide etc. These are invariably provided for signed integer (fixed-point) numbers. They are also available for floating-point numbers.
The execution of an arithmetic operation may involve data transfer operations to provide the operands to the ALU input and to deliver the result of the ALU operation.
Logical:
Most machines also provide a variety of operations for manipulating individual bits of a word or
other addressable units.
Conversion:
Conversion instructions are those that change the format or operate on the format of data. An
example is converting from decimal to binary
Input/Output :
Input/Output instructions are used to transfer data between input/output devices and memory/CPU
register.
System Control:
System control instructions are those which are used for system setting and it can be used only in
privileged state. Typically, these instructions are reserved for the use of operating systems. For
example, a system control instruction may read or alter the content of a control register. Another
instruction may be to read or modify a storage protection key.
Transfer of Control:
In most of the cases, the next instruction to be performed is the one that immediately follows the
current instruction in memory. Therefore, program counter helps us to get the next instruction. But
sometimes it is required to change the sequence of instruction execution and for that instruction set
should provide instructions to accomplish these tasks. For these instructions, the operation performed by the CPU is to update the program counter to contain the address of some instruction in memory. The most common transfer-of-control operations found in instruction sets are: branch,
skip and procedure call.
Branch Instruction
A branch instruction, also called a jump instruction, has one of its operands as the address of the
next instruction to be executed. Basically there are two types of branch instructions: Conditional
Branch instruction and unconditional branch instruction. In case of unconditional branch instruction,
the branch is made by updating the program counter to the address specified in the operand. In case of
conditional branch instruction, the branch is made only if a certain condition is met. Otherwise, the
next instruction in sequence is executed.
There are two common ways of generating the condition to be tested in a conditional branch instruction.
First, most machines provide a 1-bit or multiple-bit condition code that is set as the result of some operations. As an example, an arithmetic operation could set a 2-bit condition code with one of the following four values: zero, positive, negative and overflow. On such a machine, there could be four different conditional branch instructions:
BRP X    Branch to location X if result is positive
BRN X    Branch to location X if result is negative
BRZ X    Branch to location X if result is zero
BRO X    Branch to location X if overflow occurs
In all of these cases, the result referred to is the result of the most recent operation that set the
condition code.
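A minimal sketch of this mechanism (the condition-code names and function names are assumptions, and overflow detection is omitted for brevity):

```python
def set_cc(result):
    """Set the condition code from the result of the last operation."""
    if result == 0:
        return "zero"
    return "positive" if result > 0 else "negative"

def cond_branch(cc, cond, target, pc, il=1):
    """Branch to target if the saved condition code matches, else fall through."""
    return target if cc == cond else pc + il
```

For example, after computing 4 - 4 the condition code is "zero", so a branch-if-zero takes the target address, while after computing a nonzero result the PC simply advances to the next instruction.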
Another approach that can be used with a three-address instruction format is to perform a comparison and specify a branch in the same instruction. For example,
BRE R1, R2, X    Branch to location X if contents of R1 = contents of R2
Skip Instruction
Another common form of transfer-of-control instruction is the skip instruction. Generally, the skip implies that one instruction is to be skipped; thus the implied address equals the address of the next instruction plus one instruction length. A typical example is the increment-and-skip-if-zero (ISZ)
instruction. For example,
ISZ R1
This instruction will increment the value of the register R1. If the result of the increment is zero, then
it will skip the next instruction.
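A sketch of ISZ's effect on the program counter, assuming a 16-bit register and an instruction length of 1 (both assumptions):

```python
def step_isz(pc, reg, il=1):
    """ISZ R1: increment R1; if the result is zero, skip the next instruction."""
    reg["R1"] = (reg["R1"] + 1) & 0xFFFF   # 16-bit register wraps to zero
    return pc + 2 * il if reg["R1"] == 0 else pc + il
```

Starting from R1 = 0xFFFF, the increment wraps to zero and the PC advances by two instruction lengths (skipping one instruction); otherwise it advances by one as usual.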
Procedure Call Instruction
A procedure is a self-contained computer program that is incorporated into a larger program. At any
point in the program the procedure may be invoked, or called. The processor is instructed to go and
execute the entire procedure and then return to the point from which the call took place.
The procedure mechanism involves two basic instructions: a call instruction that branches from the
present location to the procedure, and a return instruction that returns from the procedure to the
place from which it was called. Both of these are forms of branching instructions.
Some important points regarding procedure call:
Since we can call a procedure from a variety of points, the CPU must somehow save the return
address so that the return can take place appropriately. There are three common places for storing
the return address:
Register
Start of procedure
Top of stack
Consider a machine language instruction CALL X, which stands for call procedure at location X. If
the register approach is used, CALL X causes the following actions:
RN ← PC + IL
PC ← X
where RN is a register that is always used for this purpose, PC is the program counter and IL is the
instruction length. The called procedure can now save the contents of RN to be used for the later
return.
A second possibility is to store the return address at the start of the procedure. In this case, CALL X causes
X ← PC + IL
PC ← X + 1
Both of these approaches have been used. The only limitation of these approaches is that they
prevent the use of reentrant procedures. A reentrant procedure is one in which it is possible to have
several calls open to it at the same time.
A more general approach is to use stack. When the CPU executes a call, it places the return
address on the stack. When it executes a return, it uses the address on the stack.
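The stack-based call/return can be sketched as follows. The cpu dictionary, the IL value, and the addresses are all illustrative; the point is that nested (and hence reentrant) calls simply push further return addresses.

```python
IL = 1   # assumed instruction length

def call(cpu, X):
    cpu["stack"].append(cpu["PC"] + IL)   # push the return address
    cpu["PC"] = X                          # branch to the procedure at X

def ret(cpu):
    cpu["PC"] = cpu["stack"].pop()         # pop the return address on return

cpu = {"PC": 100, "stack": []}
call(cpu, 500)   # enter a procedure at 500; return address 101 is stacked
call(cpu, 700)   # a nested call simply stacks a second return address
ret(cpu)         # back to 501
ret(cpu)         # back to 101
```

Unlike the register and start-of-procedure schemes, nothing here is overwritten by a second open call, which is why the stack approach supports reentrancy.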
It may happen that the called procedure has to use the processor registers. This would overwrite the contents of the registers, and the calling environment would lose the information. So, it is necessary to preserve the contents of the processor registers along with the return address. The stack is used to store the contents of the processor registers. On return from the procedure call, the contents of the stack are popped out to the appropriate registers.
In addition to providing a return address, it is often necessary to pass parameters with a procedure call. The most general approach to parameter passing is the stack. When the processor executes a call, it not only stacks the return address, it stacks the parameters to be passed to the called procedure. The called procedure can access the parameters from the stack. Upon return, return parameters can also be placed on the stack. The entire set of parameters, including the return address, that is stored for a procedure invocation is referred to as the stack frame.
The most commonly used transfer-of-control operations:
Instruction Format:
An instruction format defines the layout of the bits of an instruction, in terms of its constituent parts. An instruction format must include an opcode and, implicitly or explicitly, zero or more operands. Each explicit operand is referenced using one of the addressing modes that is available for that machine. The format must, implicitly or explicitly, indicate the addressing mode of each operand. For most instruction sets, more than one instruction format is used. Four common instruction formats are shown in the Figure 4.9.
Instruction Length:
On some machines, all instructions have the same length; on others there may be many different lengths. Instructions may be shorter than, the same length as, or longer than the word length. Having all the instructions be the same length is simpler and makes decoding easier, but it often wastes space, since all instructions then have to be as long as the longest one. Possible relationships between instruction length and word length are shown in the Figure 4.10.
Figure 4.10: Some Possible relationship between instructions and word length
Generally there is a correlation between memory transfer length and the instruction length. Either
the instruction length should be equal to the memory transfer length or one should be a multiple of
the other. Also, in most cases there is a correlation between memory transfer length and the word length of the machine.
Allocation of Bits:
For a given instruction length, there is clearly a trade-off between the number of opcodes and the
power of the addressing capabilities. More opcodes obviously mean more bits in the opcode field.
For an instruction format of a given length, this reduces the number of bits available for addressing.
The following interrelated factors go into determining the use of the addressing bits:
Number of addressing modes:
Sometimes an addressing mode can be indicated implicitly. In other cases, the addressing mode must be explicit, and one or more bits will be needed.
Number of Operands:
Typical instructions on today's machines provide for two operands. Each operand address in the
instruction might require its own mode indicator, or the use of a mode indicator could be limited to
just one of the address fields.
Register versus memory:
A machine must have registers so that data can be brought into the CPU for processing. With a
single user-visible register (usually called the accumulator), one operand address is implicit and
consumes no instruction bits. Even with multiple registers, only a few bits are needed to specify the
register. The more that registers can be used for operand references, the fewer bits are needed.
Number of register sets:
A number of machines have one set of general purpose registers, with typically 8 or 16 registers in
the set. These registers can be used to store data and can be used to store addresses for
displacement addressing. The trend recently has been away from one bank of general purpose
registers and toward a collection of two or more specialized sets (such as data and displacement).
Address range:
For addresses that reference memory, the range of addresses that can be referenced is related to the number of address bits. With displacement addressing, the range is opened up to the length of the address register.
Address granularity:
In a system with 16- or 32-bit words, an address can reference a word or a byte at the designer's
choice. Byte addressing is convenient for character manipulation but requires, for a fixed size
memory, more address bits.
Variable-Length Instructions:
Instead of looking for a fixed-length instruction format, the designer may choose to provide a variety of instruction formats of different lengths. This tactic makes it easy to provide a large repertoire of opcodes, with different opcode lengths. Addressing can be more flexible, with various combinations
of register and memory references plus addressing modes. With variable length instructions, many
variations can be provided efficiently and compactly. The principal price to pay for variable length
instructions is an increase in the complexity of the CPU.
Number of addresses :
The processor architecture is described in terms of the number of addresses contained in each instruction. Most arithmetic and logic instructions require one or more operands. All arithmetic and logic operations are either unary (one source operand, e.g. NOT) or binary (two source operands, e.g. ADD).
Thus, we need a maximum of two addresses to reference source operands. The result of an
operation must be stored, suggesting a third reference.
Three address instruction formats are not common because they require a relatively long instruction
format to hold the three address reference.
With two address instructions, and for binary operations, one address must do double duty as both
an operand and a result.
In a one-address instruction format, a second address must be implicit for a binary operation. For the implicit reference, a processor register is used, and it is termed the accumulator (AC). The accumulator contains one of the operands and is used to store the result.
UNIT-IV
Input / Output
In this Module, we have four lectures, viz.
1. Introduction to I/O
Input/Output Organization
The computer system's input/output (I/O) architecture is its interface to the outside world.
Till now we have discussed the two important modules of the computer system -
o The processor and
o The memory module.
Each I/O module interfaces to the system bus and controls one or more peripheral devices.
There are several reasons why an I/O device or peripheral device is not directly connected to the
system bus. Some of them are as follows -
There are a wide variety of peripherals with various methods of operation. It would be
impractical to include the necessary logic within the processor to control several devices.
The data transfer rate of peripherals is often much slower than that of the memory or
processor. Thus, it is impractical to use the high-speed system bus to communicate directly
with a peripheral.
Peripherals often use different data formats and word lengths than the computer to which
they are attached.
The major functions of an I/O module fall into the following categories:
Processor Communication
Device Communication
Data Buffering
Error Detection
During any period of time, the processor may communicate with one or more external devices in an unpredictable manner, depending on the program's need for I/O.
The internal resources, such as main memory and the system bus, must be shared among a
number of activities, including data I/O.
The I/O function includes a control and timing requirement to co-ordinate the flow of traffic between
internal resources and external devices.
For example, the control of the transfer of data from an external device to the processor might involve the following sequence of steps:
1. The processor interacts with the I/O module to check the status of the attached device.
2. The I/O module returns the device status.
3. If the device is operational and ready to transmit, the processor requests the transfer of
data, by means of a command to the I/O module.
4. The I/O module obtains a unit of data from external device.
5. The data are transferred from the I/O module to the processor.
If the system employs a bus, then each of the interactions between the processor and the I/O
module involves one or more bus arbitrations.
During the I/O operation, the I/O module must communicate with the processor and with the
external device.
Command decoding :
The I/O module accepts commands from the processor, typically sent as signals
on the control bus.
Data :
Data are exchanged between the processor and the I/O module over the data
bus.
Status Reporting :
Because peripherals are so slow, it is important to know the status of the I/O
module. For example, if an I/O module is asked to send data to the
processor(read), it may not be ready to do so because it is still working on the
previous I/O command. This fact can be reported with a status signal. Common
status signals are BUSY and READY.
Address Recognition :
Just as each word of memory has an address, so does each I/O device.
Thus an I/O module must recognize one unique address for each peripheral it
controls.
On the other hand, the I/O must be able to perform device communication. This communication
involves command, status information and data.
Data Buffering:
An essential task of an I/O module is data buffering. The data buffering is required due to the
mismatch of the speed of CPU, memory and other peripheral devices. In general, the speed of CPU
is higher than the speed of the other peripheral devices. So, the I/O modules store the data in a
data buffer and regulate the transfer of data as per the speed of the devices.
In the opposite direction, data are buffered so as not to tie up the memory in a slow transfer
operation. Thus the I/O module must be able to operate at both device and memory speed.
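The rate-matching role of the buffer can be sketched in Python (a toy model; `IOBuffer`, `burst_write` and `drain_one` are illustrative names, not real hardware interfaces):

```python
from collections import deque

class IOBuffer:
    """Minimal sketch of an I/O module's data buffer: the fast side
    (CPU/memory) deposits a whole block at once, while the slow side
    (peripheral) drains it one word at a time."""
    def __init__(self):
        self.fifo = deque()

    def burst_write(self, block):
        # Fast side: accept the whole block in one bus transaction.
        self.fifo.extend(block)

    def drain_one(self):
        # Slow side: the peripheral consumes one word per device cycle.
        return self.fifo.popleft() if self.fifo else None

buf = IOBuffer()
buf.burst_write([0x41, 0x42, 0x43])          # CPU hands over 3 words at memory speed
words = [buf.drain_one() for _ in range(3)]  # device drains at its own pace
```

The point of the sketch is that neither side waits on the other's speed: the CPU is tied up only for the burst, and the device empties the buffer at its own rate.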
Error Detection:
Another task of the I/O module is error detection and the subsequent reporting of errors to the
processor. One class of error includes mechanical and electrical malfunctions reported by the device (e.g.
paper jam). Another class consists of unintentional changes to the bit pattern as it is transmitted
from devices to the I/O module.
Block diagram of I/O Module is shown in the Figure 6.1.
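The second class of error, an unintentional bit change in transit, is commonly caught with a parity bit. A minimal even-parity sketch (function names are illustrative):

```python
def even_parity(word, bits=8):
    """Compute the even-parity bit for a data word: the parity bit is
    chosen so the total number of 1s (data + parity) is even."""
    ones = bin(word & ((1 << bits) - 1)).count("1")
    return ones % 2

def check(word, parity_bit, bits=8):
    # A single flipped bit changes the parity, so the check fails.
    return even_parity(word, bits) == parity_bit

sent = 0b1011_0010
p = even_parity(sent)           # parity computed at the device side
corrupted = sent ^ 0b0000_1000  # one bit flipped on the way to the I/O module
ok_clean = check(sent, p)
ok_corrupt = check(corrupted, p)
```

Parity catches any odd number of bit errors but misses even numbers of flips; stronger codes (CRC, ECC) are used where that matters.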
There will be many I/O devices connected through I/O modules to the system. Each device will be
identified by a unique address.
When the processor issues an I/O command, the command contains the address of the device. The
I/O module must interpret the address lines to check whether the command is
for itself.
Generally, in most processors, the processor, main memory and I/O share a common
bus (data, address and control lines).
Two types of addressing are possible -
Memory-mapped I/O
There is a single address space for memory locations and I/O devices.
The processor treats the status and address register of the I/O modules as memory location.
For example, if the address bus of a processor is 16 bits wide, then there are 2^16 combinations, so
altogether 2^16 = 65536 locations can be addressed with these 16 address lines.
Out of these 2^16 address locations, some can be used to address I/O devices and
the others are used to address memory locations.
Since I/O devices are included in the same memory address space, so the status and address
registers of I/O modules are treated as memory location by the processor. Therefore, the same
machine instructions are used to access both memory and I/O devices.
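The idea of a single address space can be sketched as follows (a toy Python model; the 0xFF00 I/O window and the register map are invented for illustration, not taken from any real machine):

```python
# Hypothetical 16-bit address space: 2**16 = 65536 locations. Addresses
# 0xFF00-0xFFFF are reserved for I/O registers; the rest is ordinary RAM.
IO_BASE = 0xFF00

ram = bytearray(IO_BASE)   # plain memory below the I/O window
device_regs = {}           # status/data registers of I/O modules

def load(addr):
    # The SAME "instruction" reaches RAM or a device register,
    # depending only on the address used.
    if addr >= IO_BASE:
        return device_regs.get(addr, 0)
    return ram[addr]

def store(addr, value):
    if addr >= IO_BASE:
        device_regs[addr] = value   # lands in an I/O module register
    else:
        ram[addr] = value           # lands in main memory

store(0x0010, 0x7F)   # ordinary memory write
store(0xFF01, 0x01)   # same operation, but this address is a device register
```

This is exactly the programmer-visible property of memory-mapped I/O: no special IN/OUT instructions are needed, only an address convention.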
Programmed I/O
With programmed I/O, the processor executes a program that gives it direct control of the I/O
operation, including sensing device status, sending a read or write command, and transferring the
data.
With interrupt driven I/O, the processor issues an I/O command, continues to execute other
instructions, and is interrupted by the I/O module when the I/O module completes its work.
In Direct Memory Access (DMA), the I/O module and main memory exchange data directly
without processor involvement.
With both programmed I/O and Interrupt driven I/O, the processor is responsible for extracting data
from main memory for output operation and storing data in main memory for input operation.
To send data to an output device, the CPU simply moves that data to a special memory location in
the I/O address space if I/O mapped input/output is used or to an address in the memory address
space if memory mapped I/O is used.
Input/Output Operation: The input and output operation looks very similar to a memory read or
write operation, except that it usually takes more time, since peripheral devices are slower than
main memory modules.
The working principle of the three methods for input of a block of data is shown in the Figure
6.2.
Figure 6.2: Three techniques for input of a block of data
Input/Output Port:
An I/O port is a device that looks like a memory cell to the computer but contains a connection to the
outside world.
An I/O port typically uses a latch. When the CPU writes to the address associated with the latch,
the latch device captures the data and makes it available on a set of wires external to the CPU and
memory system.
The I/O ports can be read-only, write-only, or read/write. The write-only port is shown in the
Figure 6.3.
Figure 6.3: The write-only port
First, the CPU will place the address of the device on the I/O address bus, and with the help of an
address decoder a signal is generated which will enable the latch.
If it is a read operation, the data already stored in the latch will be transferred to the
CPU.
A read only (input) port is simply the lower half of the Figure 6.4.
In case of I/O mapped I/O, a different address space is used for I/O devices. The address space for
memory is different. In case of memory mapped I/O, same address space is used for both memory
and I/O devices. Some of the memory address space is kept reserved for I/O devices.
To the programmer, the difference between I/O-mapped and memory-mapped input/output
operation is the instruction to be used.
For memory-mapped I/O, any instruction that accessed memory can access a memory-mapped I/O
port.
Generally, a given peripheral device will use more than a single I/O port. A typical PC parallel
printer interface, for example, uses three ports: a read/write port, an input port and an output
port.
The read/write port is the data port ( it is read/write to allow the CPU to read the last ASCII
character it wrote to the printer port ).
Memory-mapped I/O subsystems and I/O-mapped subsystems both require the CPU to move data
between the peripheral device and main memory.
For example, to input a sequence of 20 bytes from an input port and store these bytes into memory,
the CPU must read each value from the port and store it into memory.
Programmed I/O:
In programmed I/O, the data transfer between CPU and I/O device is carried out with the help of a
software routine.
When a processor is executing a program and encounters an instruction relating to I/O, it executes
that I/O instruction by issuing a command to the appropriate I/O module.
The I/O module will perform the requested action and then set the appropriate bits in the I/O status
register.
It is the responsibility of the processor to check periodically the status of the I/O module until it finds
that the operation is complete.
In programmed I/O, when the processor issues a command to an I/O module, it must wait until the
I/O operation is complete.
Generally, the I/O devices are slower than the processor, so in this scheme CPU time is wasted.
CPU is checking the status of the I/O module periodically without doing any other work.
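The busy-wait loop described above can be sketched like this (a toy model; the `Device` class and its delay counter are invented to stand in for a slow peripheral):

```python
BUSY, READY = 0, 1

class Device:
    """Toy device that becomes READY only after several status polls,
    standing in for a slow peripheral."""
    def __init__(self, delay, value):
        self.delay, self.value = delay, value
    def status(self):
        self.delay -= 1
        return READY if self.delay <= 0 else BUSY
    def read_data(self):
        return self.value

def programmed_read(dev):
    polls = 0
    # The processor busy-waits: every iteration is CPU time spent
    # doing nothing except re-reading the status register.
    while dev.status() != READY:
        polls += 1
    return dev.read_data(), polls

data, wasted_polls = programmed_read(Device(delay=5, value=0x2A))
```

Every iteration of the `while` loop is a wasted instruction cycle, which is precisely the inefficiency that interrupt-driven I/O removes.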
I/O Commands
To execute an I/O-related instruction, the processor issues an address, specifying the particular I/O
module and external device, and an I/O command. There are four types of I/O commands that an
I/O module will receive when it is addressed by a processor:
Control : Used to activate a peripheral device and instruct it what to do. For example, a
magnetic tape unit may be instructed to rewind or to move forward one record. These
commands are specific to a particular type of peripheral device.
Test : Used to test various status conditions associated with an I/O module and its
peripherals. The processor will want to know if the most recent I/O operation is completed
or any error has occurred.
Read : Causes the I/O module to obtain an item of data from the peripheral and place it in
the internal buffer.
Write : Causes the I/O module to take an item of data ( byte or word ) from
the data bus and subsequently transmit the data item to the peripheral.
The problem with programmed I/O is that the processor has to wait a long time for the I/O module of
concern to be ready for either reception or transmission of data. The processor, while waiting, must
repeatedly interrogate the status of the I/O module.
This type of I/O operation, where the CPU constantly tests a port to see if data is available, is called
polling: the CPU polls (asks) the port whether it has data available or whether it is capable of accepting
data. Polled I/O is inherently inefficient.
The solution to this problem is to provide an interrupt mechanism. In this approach the processor
issues an I/O command to a module and then goes on to do some other useful work. The I/O module
then interrupts the processor to request service when it is ready to exchange data with the
processor. The processor then executes the data transfer. Once the data transfer is over, the
processor resumes its former processing.
o For input, the I/O module services a READ command from the processor.
o The I/O module then proceeds to read data from an associated peripheral device.
o Once the data are in the module's data register, the module issues an interrupt to
the processor over a control line.
o The module then waits until its data are requested by the processor.
When the request is made, the module places its data on the data bus and is then ready for another
I/O operation. From the processor's point of view, the actions for an input are as follows:
Interrupt Processing: The occurrence of an interrupt triggers a number of events, both in the
processor hardware and in software. When an I/O device completes an I/O operation, the following
sequence of hardware events occurs:
1. The device issues an interrupt signal to the processor.
2. The processor finishes execution of the current instruction before responding to the
interrupt.
3. The processor tests for the interrupt; if an interrupt is pending, then the
processor sends an acknowledgement signal to the device which issued the
interrupt. After getting the acknowledgement, the device removes its interrupt signal.
4. The processor now needs to prepare to transfer control to the interrupt routine. It
needs to save the information needed to resume the current program at the point of
interrupt. The minimum information required to save is the processor status word
(PSW) and the location of the next instruction to be executed which is nothing but
the contents of the program counter. These can be pushed onto the system control
stack.
5. The processor now loads the program counter with the entry location of the interrupt
handling program that will respond to the interrupt.
Interrupt Processing:
Next, the program counter is loaded with the starting address of the interrupt service
routine.
The processor starts executing the interrupt service routine.
The data changes of memory and registers during interrupt service is shown in the Figure 6.5.
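The save-and-restore of the PSW and program counter described in the steps above can be sketched as follows (illustrative values only; no real instruction set is modelled):

```python
# Minimal sketch of the PSW/PC save-and-restore that surrounds an
# interrupt. All names (pc, psw, stack, ISR_ENTRY) are illustrative.
stack = []
pc, psw = 0x1000, 0b0101      # interrupted program's state
ISR_ENTRY = 0x8000

def take_interrupt():
    global pc, psw
    stack.append(psw)         # hardware pushes the processor status word
    stack.append(pc)          # ...and the return address
    pc = ISR_ENTRY            # PC now points at the service routine

def return_from_interrupt():
    global pc, psw
    pc = stack.pop()          # restore in reverse order of saving
    psw = stack.pop()

take_interrupt()
in_isr = (pc == ISR_ENTRY)
return_from_interrupt()      # interrupted program resumes exactly where it stopped
```

Because the values are restored in reverse order from a stack, nested interrupts unwind correctly as well.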
Return from Interrupt :
Once the program counter has been loaded, the processor proceeds to the next instruction cycle,
which begins with an instruction fetch. Control thus transfers to the interrupt handler routine for the
current interrupt.
1. At this point, the program counter and PSW relating to the interrupted program have
been saved on the system stack. In addition, some more information related to the
current processor state must be saved, in particular the contents of the
processor registers, because these registers may be used by the interrupt handler.
Typically, the interrupt handler will begin by saving the contents of all registers on the
stack.
2. The interrupt handler next processes the interrupt. This includes an examination of
status information relating to the I/O operation or, other event that caused an
interrupt.
3. When interrupt processing is complete, the saved register values are retrieved from
the stack and restored to the registers.
The final act is to restore the PSW and program counter values from the stack. As a result, the next
instruction to be executed will be from the previously interrupted program.
Design Issues for Interrupt :
1. There will almost invariably be multiple I/O modules; how does the processor
determine which device issued the interrupt?
2. If multiple interrupts have occurred, how does the processor decide which one to process?
Device Identification :
Four general categories of techniques are in common use to identify the interrupting device:
1. Multiple interrupt lines
2. Software poll
3. Daisy chain
4. Bus arbitration
Multiple Interrupt Lines :
The most straightforward approach is to provide multiple interrupt lines between the processor and
the I/O modules.
It is impractical to dedicate more than a few bus lines or processor pins to interrupt lines.
Thus, though multiple interrupt lines are used, it is most likely that each line will have multiple I/O
modules attached to it. Thus one of the other three techniques must be used on each line.
Software Poll :
When the processor detects an interrupt, it branches to an interrupt service routine whose job is to
poll each I/O module to determine which module caused the interrupt.
The poll could be implemented with the help of a separate command line (e.g. TEST I/O). In this
case, the processor raises TEST I/O and places the address of a particular I/O module on the
address lines. The I/O module responds positively if it raised the interrupt.
Alternatively, each I/O module could contain an addressable status register. The processor then
reads the status register of each I/O module to identify the interrupting module.
Once the correct module is identified, the processor branches to a device service routine specific to
that device.
The main disadvantage of software poll is that it is time consuming. The processor has to check the
status of each I/O module, and in the worst case the number of checks equals the number of I/O modules.
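A software poll of status registers can be sketched like this (a toy model; bit 0 of each status word is assumed, for illustration, to be the interrupt-pending flag):

```python
# Sketch of a software poll: the ISR reads each module's status register
# in a fixed order until it finds the one that raised the interrupt.
# Worst case, every module is checked, which is why polling is slow.
def find_interrupter(status_registers):
    checks = 0
    for module_id, status in enumerate(status_registers):
        checks += 1
        if status & 0x1:          # bit 0 = "I raised the interrupt"
            return module_id, checks
    return None, checks

# Module 3 (of 5) asserted its interrupt bit.
module, checks = find_interrupter([0x0, 0x0, 0x0, 0x1, 0x0])
```

Note that the polling order itself fixes the priority: a module earlier in the list is always discovered first.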
Daisy Chain :
In this method all I/O modules share a common interrupt request line. However, the
interrupt acknowledge line is connected in a daisy chain fashion. When the processor senses an
interrupt, it sends out an interrupt acknowledgement.
The interrupt acknowledge signal propagates through a series of I/O modules until it gets to a
requesting module.
The requesting module typically responds by placing a word on the data lines. This word is referred
to as a vector and is either the address of the I/O module or some other unique identification.
In either case, the processor uses the vector as a pointer to the appropriate device service routine.
This avoids the need to execute a general interrupt service routine first. This technique is referred to
as a vectored interrupt. The daisy chain arrangement is shown in the Figure 6.7.
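The propagation of the acknowledge signal down the chain can be sketched as follows (a toy model; the vector values are invented):

```python
# Sketch of daisy-chained acknowledgement: INTA enters at the module
# closest to the processor and is passed along until a requesting
# module absorbs it and places its vector on the data bus.
class Module:
    def __init__(self, vector, requesting=False):
        self.vector, self.requesting = vector, requesting

def propagate_ack(chain):
    for m in chain:               # order in the chain = priority
        if m.requesting:
            m.requesting = False  # this module absorbs the acknowledge
            return m.vector       # vector identifies the device/ISR
    return None                   # spurious interrupt: nobody claimed it

chain = [Module(0x40),
         Module(0x44, requesting=True),
         Module(0x48, requesting=True)]
vector = propagate_ack(chain)     # the requester nearest the processor wins
```

The sketch makes the priority rule visible: the farther requester stays pending and will be served by a later acknowledge cycle.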
Bus Arbitration :
In bus arbitration method, an I/O module must first gain control of the bus before it can
raise the interrupt request line. Thus, only one module can raise the interrupt line at a
time. When the processor detects the interrupt, it responds on the interrupt
acknowledge line. The requesting module then places its vector on the data lines.
There are several techniques to identify the requesting I/O module. These techniques also provide
a way of assigning priorities when more than one device is requesting interrupt service.
With multiple lines, the processor just picks the interrupt line with the highest priority. Priorities
may be assigned to the interrupt lines during the processor design phase itself.
With software polling, the order in which modules are polled determines their priority.
In case of daisy chain configuration, the priority of a module is determined by the position of the module in
the daisy chain. The module nearer to the processor in the chain has got higher priority, because
this is the first module to receive the acknowledge signal that is generated by the processor.
In case of the bus arbitration method, more than one module may need control of the bus. Since
only one module at a time can successfully transmit over the bus, some method of
arbitration is needed. The various methods can be classified into two groups:
centralized and distributed.
In a centralized scheme, a single hardware device, referred to as a bus controller or arbiter, is
responsible for allocating time on the bus. In a distributed scheme, there is no central controller;
rather, each module contains access control logic and the modules act together to share the bus.
It is also possible to combine different device identification techniques to identify the devices and to
set the priorities of the devices. For example, multiple interrupt lines and daisy chain techniques can
be combined together to give access to more devices.
Figure 6.8: Possible arrangement to handle multiple interrupts
On one interrupt line, more than one device can be connected in daisy chain fashion. The
higher-priority devices should be connected to the interrupt lines that have higher priority. A
possible arrangement is shown in the Figure 6.8.
Interrupt Nesting
The arrival of an interrupt request from an external device causes the processor to suspend the
execution of one program and start the execution of another. This other program is the interrupt
service routine for the specified device.
Interrupts may arrive at any time, so during the execution of an interrupt service routine another
interrupt may arrive. This situation is known as nesting of interrupts.
Whether interrupt nesting is allowed is a design issue. Generally, nesting of interrupts is
allowed, but with some restrictions. The common notion is that a high-priority device may interrupt a
low-priority device, but not vice versa.
To accommodate such restrictions, all computers provide the programmer with the ability to
enable and disable interruptions at various times during program execution. The processor
provides instructions to enable and disable interrupts. If interrupts are disabled,
the CPU will not respond to any interrupt signal.
On the other hand, when multiple lines are used for interrupts and priorities are assigned to these
lines, then an interrupt received on a low-priority line will not be served while an interrupt routine is in
execution for a high-priority device. After completion of the interrupt service routine of the high-priority
device, the processor will respond to the interrupt requests of low-priority devices.
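The priority rule for nesting can be sketched like this (an illustrative convention in which a smaller number means higher priority):

```python
# Sketch of priority-based nesting: an incoming request preempts the
# currently running service routine only if its priority is strictly
# higher. Lower number = higher priority here (an arbitrary convention).
def accept(current_level, incoming_level):
    """current_level is None when no ISR is running."""
    if current_level is None:
        return True               # idle processor takes any interrupt
    return incoming_level < current_level

idle_takes = accept(None, 3)      # no ISR running: request accepted
high_preempts_low = accept(2, 1)  # higher priority interrupts lower: allowed
low_preempts_high = accept(1, 2)  # the reverse request is held pending
```

A rejected request is not lost; it simply stays pending until the higher-priority service routine completes.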
We have discussed the data transfer between the processor and I/O devices using two different
approaches, namely programmed I/O and interrupt-driven I/O. Both methods
require the active intervention of the processor to transfer data between memory and the I/O
module, and any data transfer must traverse a path through the processor. Thus both these
forms of I/O suffer from two inherent drawbacks:
o The I/O transfer rate is limited by the speed with which the processor can test and
service a device.
o The processor is tied up in managing the I/O transfer; a number of instructions must be
executed for each transfer.
To transfer large block of data at high speed, a special control unit may be provided to allow
transfer of a block of data directly between an external device and the main memory, without
continuous intervention by the processor. This approach is called direct memory access or DMA.
DMA transfers are performed by a control circuit associated with the I/O device; this circuit is
referred to as the DMA controller. The DMA controller allows direct data transfer between the device and
the main memory without involving the processor.
To transfer data between memory and I/O devices, the DMA controller takes over control of the
system from the processor, and the transfer of data takes place over the system bus. For this
purpose, the DMA controller must use the bus only when the processor does not need it, or it must
force the processor to suspend operation temporarily. The latter technique is more common and is
referred to as cycle stealing, because the DMA module in effect steals a bus cycle.
Figure 6.9: Typical DMA block diagram
The typical block diagram of a DMA controller is shown in the Figure 6.9.
When the processor wishes to read or write a block of data, it issues a command to the DMA
module, by sending to the DMA module the following information.
o Whether a read or write is requested, using the read or write control line between
the processor and the DMA module.
o The address of the I/O device involved, communicated on the data lines.
o The starting location in the memory to read from or write to, communicated on data
lines and stored by the DMA module in its address register.
o The number of words to be read or written again communicated via the data lines
and stored in the data count register.
The processor then continues with other work. It has delegated this I/O operation to the DMA
module.
The DMA module checks the status of the I/O device whose address was communicated to the DMA
controller by the processor. If the specified I/O device is ready for data transfer, then the DMA module
generates a DMA request to the processor. The processor then indicates the release of the
system bus through DMA acknowledge.
The DMA module transfers the entire block of data, one word at a time, directly to or from memory,
without going through the processor.
When the transfer is completed, the DMA module sends an interrupt signal to the processor. After
receiving the interrupt signal, processor takes over the system bus.
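The complete flow described above, in which the processor programs the controller, the controller steals one bus cycle per word, and a single interrupt is raised on completion, can be sketched as follows (a toy model; the class and method names are invented):

```python
class DMAController:
    """Toy DMA controller: the processor 'programs' it once with the
    transfer parameters, then the controller moves the whole block
    without further processor involvement."""
    def __init__(self, memory):
        self.memory = memory

    def program(self, device_data, start_addr, count):
        # Corresponds to the processor sending the device data source,
        # starting memory location and word count over the data lines.
        self.device_data = device_data
        self.addr = start_addr
        self.count = count

    def run(self):
        stolen = 0
        for i in range(self.count):      # one word per stolen bus cycle
            self.memory[self.addr + i] = self.device_data[i]
            stolen += 1
        return "interrupt", stolen       # single interrupt on completion

mem = [0] * 16
dma = DMAController(mem)
dma.program(device_data=[9, 8, 7, 6], start_addr=4, count=4)
signal, cycles = dma.run()
```

Contrast this with interrupt-driven I/O, which would have interrupted the processor once per word rather than once per block.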
It is not necessary for the processor to complete the current instruction before it is suspended; the
processor may be suspended just after the completion of the current bus cycle. On the other hand,
the processor can be suspended just before it needs the system bus, because while the DMA
controller is using the system bus it does not need the processor.
When the processor is suspended, the DMA module transfers one word and returns control to
the processor.
Note that, this is not an interrupt, the processor does not save a context and do something else.
Rather, the processor pauses for one bus cycle.
During that time processor may perform some other task which does not involve the system bus. In
the worst situation processor will wait for some time, till the DMA releases the bus.
The net effect is that the processor will go slow. But the net effect is the enhancement of
performance, because for a multiple word I/O transfer, DMA is far more efficient than interrupt
driven or programmed I/O.
The DMA mechanism can be configured in different ways. The most common amongst them are:
In this organization all modules share the same system bus. The DMA module here acts as a
surrogate processor. This method uses programmed I/O to exchange data between memory and
an I/O module through the DMA module.
For each transfer it uses the bus twice. The first one is when transferring the data between I/O and
DMA and the second one is when transferring the data between DMA and memory. Since the bus
is used twice while transferring data, the processor is suspended twice. Each transfer thus consumes
two bus cycles.
By integrating the DMA and I/O functions, the number of required bus cycles can be reduced. In this
configuration, the DMA module and one or more I/O modules are integrated together in such a way
that the system bus is not involved. In this case DMA logic may actually be a part of an I/O module,
or it may be a separate module that controls one or more I/O modules.
The DMA module, processor and the memory module are connected through the system bus. In
this configuration each transfer will use the system bus only once and so the processor is
suspended only once.
The system bus is not involved when transferring data between DMA and I/O device, so processor
is not suspended. Processor is suspended when data is transferred between DMA and memory.
The configuration is shown in the Figure 6.12.
In this configuration the I/O modules are connected to the DMA module through a separate I/O bus. In
this case the number of I/O interfaces in the DMA module is reduced to one.
Transfer of data between an I/O module and the DMA module is carried out over this I/O bus; the
system bus is not involved, so the processor need not be suspended.
There is another transfer phase between the DMA module and memory. At this time the system bus is
needed for the transfer, and the processor will be suspended for one bus cycle. The configuration is shown
in the Figure 6.13.
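The bus-cycle cost of the three configurations can be compared with a quick sketch (the per-word costs follow the discussion above; the block size is arbitrary):

```python
# Rough comparison of system-bus cycles per transferred word in each DMA
# configuration: the shared single-bus scheme touches the bus twice per
# word (device->DMA, then DMA->memory); the integrated-DMA/I/O and
# separate-I/O-bus schemes touch it only once per word.
def system_bus_cycles(words, config):
    cycles_per_word = {"single_bus": 2,
                       "integrated_dma_io": 1,
                       "separate_io_bus": 1}
    return words * cycles_per_word[config]

block = 1024
naive = system_bus_cycles(block, "single_bus")
integrated = system_bus_cycles(block, "integrated_dma_io")
```

For a 1024-word block the single-bus scheme suspends the processor twice as often, which is exactly the saving the integrated configurations buy.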
I/O Buses:
The processor, main memory, and I/O devices can be interconnected through common data
communication lines, which are termed a common bus.
The primary function of a common bus is to provide a communication path between the devices for
the transfer of data. The bus includes the control lines needed to support interrupts and arbitration.
The bus lines used for transferring data may be grouped into three categories:
Data,
Address
Control lines.
A single line is used to indicate Read or Write operation. When several sizes are possible like
byte, word, or long word, control signals are required to indicate the size of data.
The bus control signal also carry timing information to specify the times at which the processor and
the I/O devices may place data on the bus or receive data from the bus.
Several schemes exist for handling the timing of data transfer over a bus. These can be
broadly classified as
o Synchronous bus
o Asynchronous bus
Synchronous Bus :
In a synchronous bus, all the devices are synchronised by a common clock, so all devices derive
timing information from a common clock line of the bus. A clock pulse on this common clock line
defines equal time intervals.
In the simplest form of a synchronous bus, each of these clock pulses constitutes a bus cycle during
which one data transfer can take place.
The timing of an input transfer on a synchronous bus is shown in the Figure 7.1.
At time t0, the master places the device address on the address lines and sends an appropriate
command (read in case of input) on the command lines.
In any data transfer operation, one device plays the role of a master, which initiates data transfer by
issuing read or write commands on the bus.
Normally, the processor acts as the master, but other devices with DMA capability may also
become bus masters. The device addressed by the master is referred to as the slave or target device.
The command also indicates the length of the operand to be read, if necessary.
The clock pulse width, t1 - t0, must be longer than the maximum propagation delay between two
devices connected to the bus.
After decoding the information on the address and control lines, the addressed slave device
responds at time t1 by placing the required input data on the data lines.
At the end of the clock cycle, at time t2, the master strobes the data on the data lines into its input
buffer. The period t2 - t1 must be greater than the maximum propagation delay on the bus plus the
setup time of the input buffer register of the master.
A similar procedure is followed for an output operation. The master places the output data on the
data lines when it transmits the address and command information. At time t2, the addressed device
strobes the data lines and loads the data into its data buffer.
The simple design of device interface by synchronous bus has some limitations.
A transfer has to be completed within one clock cycle. The clock period must be long
enough to accommodate the slowest device interfaced to the bus. This forces all devices to operate
at the speed of the slowest device.
The processor or the master has no way to determine whether the addressed device has
actually responded. It simply assumes that the output data have been received by the
device or that the input data are available on the data lines.
To solve these problems, most buses incorporate control signals that represent a response from the
device. These signals inform the master that the target device has recognized its address and it is
ready to participate in the data transfer operation.
They also adjust the duration of the data transfer period to suit the needs of the participating
devices.
A high-frequency clock pulse is used, so that a complete data transfer operation spans several
clock cycles. The number of clock cycles involved can vary from device to device.
An instance of this scheme is shown in the Figure 7.2.
The timing of an input data transfer using the handshake scheme is shown in the Figure 7.3.
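The handshake exchange can be sketched as an ordered sequence of events (the signal names `ready` and `accept` are illustrative, standing in for the master and slave response lines):

```python
# Sketch of a full-handshake input transfer: the master asserts a
# "ready" line, the slave answers with "accept" once its data is valid,
# and the master strobes the data before dropping "ready". Each step of
# the exchange is recorded as an event.
events = []

def master_request():
    events.append("master: address+command on bus, ready=1")

def slave_respond(data):
    # The slave answers only when ITS data is actually valid, however
    # slow it is: the duration adapts to the device.
    events.append("slave: data on bus, accept=1")
    return data

def master_latch(data):
    events.append("master: data strobed, ready=0")
    return data

master_request()
word = master_latch(slave_respond(0x5A))
```

The key contrast with the synchronous bus is visible in the comments: the transfer completes when both parties have responded, not when a fixed clock edge arrives.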
External Memory
Main memory plays an important role in the working of a computer. We have seen that the computer
works on the von Neumann stored-program principle: we keep information in main memory
and the CPU accesses it from there.
The main memory is made up of semiconductor devices and by nature it is volatile. For permanent
storage of information we need some non-volatile memory. The memory devices used to store
information permanently are termed external memory. While working, information is
transferred from external memory to main memory.
The devices used to store information permanently are either magnetic or optical devices.
Magnetic Disk
A disk is a circular platter constructed of metal or of plastic coated with a magnetizable material.
Data are recorded on and later retrieved from the disk via a conducting coil named the head.
During a read or write operation, the head is stationary while the platter rotates beneath it.
The write mechanism is based on the fact that electricity flowing through a coil produces a
magnetic field. Currents are sent to the head, and magnetic patterns are recorded on the surface
below. The pattern depends on whether the current is positive or negative, and the direction of the
current depends on the information stored, i.e., positive current for information '1' and negative current
for information '0'.
The read mechanism is based on the fact that a magnetic field moving relative to a coil produces
an electric current in the coil. When the surface of the disk passes under the head, it generates a
current of the same polarity as the one already recorded. The read/write head detail is shown in the Figure 7.5.
The head is a relatively small device capable of reading from or writing to a portion of the platter
rotating beneath it.
The data on the disk are organized in a concentric set of rings, called track. Each track has the
same width as the head. Adjacent tracks are separated by gaps. This prevents error due to
misalignment of the head or interference of magnetic fields.
For simplifying the control circuitry, the same number of bits are stored on each track. Thus the
density, in bits per linear inch, increases in moving from the outermost track to the innermost track.
Some means are needed to locate sector positions within a track. Clearly there must be some
starting point on the track and a way of identifying the start and end of each sector. These
requirements are handled by means of control data recorded on the disk. Thus, the disk is
formatted with some extra data used only by the disk drive and not accessible to the user. Since the data
density in the outermost track is lower than in the inner tracks, space is wasted
on the outer tracks.
To increase the capacity, the concept of zones is used. Each track is divided into zones
of equal length, and a fixed amount of data is stored in each zone. So the number of zones is smaller on the
innermost track and larger on the outermost track; therefore, more bits are stored on the outermost
tracks. The disk capacity increases due to the use of zones, but the
complexity of the control circuitry also increases. The concept of sectors and zones of a track is shown in
the Figure 7.7.
Figure 7.7: Sector and zone of a disk track
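The capacity gain from zones can be illustrated with a quick calculation (all geometry numbers below are made up for illustration):

```python
# Back-of-the-envelope capacity comparison: with a constant number of
# sectors per track, every track stores only what the innermost track
# can hold; with zone recording, outer tracks hold proportionally more.
SECTOR_BYTES = 512

def constant_sectors(tracks, sectors_inner):
    # Every track limited to the innermost track's sector count.
    return tracks * sectors_inner * SECTOR_BYTES

def zone_recorded(sectors_per_track):
    # sectors_per_track lists each track's sector count (inner -> outer).
    return sum(s * SECTOR_BYTES for s in sectors_per_track)

plain = constant_sectors(tracks=4, sectors_inner=100)
zoned = zone_recorded([100, 110, 120, 130])  # outer tracks gain sectors
```

Even on this tiny 4-track example the zoned layout stores about 15% more data, at the cost of the more complex control circuitry noted above.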
The head may be either fixed or movable with respect to the radial direction of the platter.
In a fixed-head disk, there is one read-write head per track. All of the heads are mounted on a rigid
arm that extends across all tracks.
In a movable-head disk, there is only one read-write head. Again the head is mounted on an arm.
Because the head must be able to be positioned above any track, the arm can be extended or
retracted for this purpose. The fixed-head and movable-head arrangements are shown in Figure 7.8.
The disk itself is mounted in a disk drive, which consists of the arm, the shaft that rotates the disk,
and the electronic circuitry needed to input and output the binary data and to control the
mechanism.
A non-removable disk is permanently mounted on the disk drive. A removable disk can be removed
and replaced with another disk.
For most disks, the magnetizable coating is applied to both sides of the platter, which is then
referred to as double-sided. If the magnetizable coating is applied to one side only, then it is termed
a single-sided disk.
Some disk drives accommodate multiple platters stacked vertically above one another. Multiple
arms are provided, one read-write head per surface. The platters come as a unit known as a disk pack.
The physical organization of a multiple-platter disk is shown in Figure 7.9.
Each surface is divided into concentric tracks and each track is divided into sectors. The set of
corresponding tracks on all surfaces of a stack of disks form a logical cylinder. Data bits are stored
serially on each track.
Data on disks are addressed by specifying the surface number, the track number, and the sector
number.
In most disk systems, read and write operations always start at sector boundaries. If the number of
words to be written is smaller than that required to fill a sector, the disk controller repeats the last bit
of data for the remainder of the sector.
During a read or write operation, it is required to specify the starting address of the sector from
where the operation will start; that is, the read/write head must be positioned at the correct track,
sector and surface. Therefore the disk address contains the track no., sector no., and surface
no. If more than one drive is present, then the drive number must also be specified.
The format of the disk address word is shown in the figure. It contains the drive no, track no.,
surface no. and sector no.
The read/write head is first positioned at the correct track. In the case of a fixed-head system, the
correct head is selected by taking the track no. from the address. In the case of a movable-head
system, the head is moved so that it is positioned at the correct track. By the surface no., it selects
the correct surface.
To get the correct sector below the read/write head, the disk is rotated to bring the correct sector
under the head with the help of the sector number. Once the correct sector, track and surface are
selected, the read/write operation starts.
Suppose that the disk system has 8 data recording surfaces with 4096 tracks per surface. Tracks are
divided into 256 sectors. Then the format of the disk address word is: 3 bits for the surface no.,
12 bits for the track no. and 8 bits for the sector no.
Suppose each sector of a track contains 512 bytes recorded serially; then the total capacity of the
disk is 8 x 4096 x 256 x 512 bytes = 2^32 bytes = 4 Gbytes.
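The field widths and total capacity in the example above can be checked with a short computation (a sketch; the variable names are illustrative only):

```python
import math

surfaces, tracks, sectors, sector_bytes = 8, 4096, 256, 512

# Bits needed for each field of the disk address word
surface_bits = int(math.log2(surfaces))   # 8 surfaces  -> 3 bits
track_bits = int(math.log2(tracks))       # 4096 tracks -> 12 bits
sector_bits = int(math.log2(sectors))     # 256 sectors -> 8 bits

# Total capacity = surfaces x tracks x sectors x bytes per sector
capacity = surfaces * tracks * sectors * sector_bytes
print(surface_bits, track_bits, sector_bits)   # 3 12 8
print(capacity)                                # 4294967296 = 2**32 bytes
```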
For moving head system, there are two components involved in the time delay between receiving
an address and the beginning of the actual data transfer.
Seek Time:
Seek time is the time required to move the read/write head to the proper track. This depends on the
initial position of the head relative to the track specified in the address.
Rotational Delay:
Rotational delay, also called the latency time is the amount of time that elapses after the head is
positioned over the correct track until the starting position of the addressed sector comes under the
Read/write head.
Disk Operation
Communication between a disk and the main memory is done through DMA. The following
information must be exchanged between the processor and the disk controller in order to specify a
transfer.
Disk address :
The location of the sector containing the beginning of the desired block of words.
Word count :
The number of words in the block to be transferred.
The word count may correspond to fewer or more bytes than are contained in a sector. When
the data block is longer than a track:
The disk address register is incremented as successive sectors are read or written. When one track
is completed, the surface count is incremented by 1.
Thus, long data blocks are laid out on cylinder surfaces as opposed to being laid out on successive
tracks of a single disk surface.
This is efficient for moving head systems, because successive sector areas of data storage on the
disk can be accessed by electrically switching from one Read/Write head to the next rather than by
mechanically moving the arm from track to track.
To read or write, the head must be positioned at the desired track and at the beginning of the
desired sector on the track.
Track selection involves moving the head in a movable-head system or electronically selecting one
head on a fixed head system.
On a movable-head system, the time taken to position the head at the track is known as seek time.
Once the track is selected, the disk controller waits until the appropriate sector rotates to line up
with the head. The time it takes to reach the beginning of the desired sector is known as rotational
delay or rotational latency.
The sum of the seek time (for a movable-head system) and the rotational delay is termed the access
time of the disk: the time it takes to get into the appropriate position (track and sector) to read or write.
Once the head is in position, the read or write operation is then performed as the sector moves
under the head, and the data transfer takes place.
Seek Time:
Seek time is the time required to move the disk arm to the required track. The seek time is
approximated as

Ts = m x n + s

where
Ts = estimated seek time
n = number of tracks traversed
m = a constant that depends on the disk drive
s = startup time
Rotational Delay:
Disk drive generally rotates at 3600 rpm, i.e., to make one revolution it takes around 16.7 ms. Thus
on the average, the rotational delay will be 8.3 ms.
Transfer Time:
The transfer time to or from the disk depends on the rotational speed of the disk and it is estimated
as

T = b / (r x N)

where
T = transfer time
b = number of bytes to be transferred
N = number of bytes on a track
r = rotational speed, in revolutions per second

Thus, the total average access time can be expressed as

Ta = Ts + 1/(2r) + b/(r x N)

where Ts is the average seek time.
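The access-time estimate (average seek time, plus half a revolution of rotational delay, plus transfer time) can be sketched as a small calculation. The drive parameters below are made-up example values, not figures from the text:

```python
def access_time(seek_ms, rpm, bytes_to_transfer, bytes_per_track):
    """Average access time Ta = Ts + 1/(2r) + b/(r*N), in seconds."""
    r = rpm / 60.0                         # rotational speed, rev/sec
    rotational_delay = 0.5 / r             # average: half a revolution
    transfer = bytes_to_transfer / (r * bytes_per_track)
    return seek_ms / 1000.0 + rotational_delay + transfer

# Example: 9 ms average seek, 3600 rpm, one 512-byte sector read
# from a track holding 16384 bytes
t = access_time(9, 3600, 512, 16384)
print(round(t * 1000, 2))   # 17.85 (ms): 9 seek + 8.33 latency + 0.52 transfer
```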
Disks are a potential bottleneck for system performance and storage system reliability. The disk
access time is much higher than the time required to access data from main memory or to perform
a CPU operation. Also, the disk drive contains mechanical parts and involves mechanical
movement, so the failure rate is also high.
While disk performance has been improving continuously, microprocessor performance has improved
much more rapidly.
In data striping, the data is segmented into equal-size partitions distributed over multiple disks. The
size of the partition is called the striping unit.
Consider a striping unit equal to a disk block. In this case, I/O requests of the size of a disk block
are processed by one disk in the array.
If many I/O requests of the size of a disk block are made, and the requested blocks reside on
different disks, we can process all requests in parallel and thus reduce the average response time
of an I/O request.
Since the striping units are distributed over the disks in the disk array in round-robin fashion,
large I/O requests of the size of many contiguous blocks involve all disks. We can process the
request on all disks in parallel and thus increase the transfer rate.
A disk array can thus be organized to increase performance and improve reliability of the resulting
storage system. Performance is increased through data striping. Reliability is improved through
redundancy.
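The round-robin placement of striping units can be sketched as a mapping from a logical block number to a (disk, offset) pair; the function name is illustrative only:

```python
def stripe_location(logical_block, num_disks):
    """Round-robin data striping: map a logical block number to
    (disk index, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks

# With 4 disks, consecutive logical blocks land on different disks,
# so a large sequential request can be served by all disks in parallel.
print([stripe_location(b, 4) for b in range(6)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```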
Disk arrays that implement a combination of data striping and redundancy are called Redundant
Arrays of Independent Disks (RAID).
Redundancy
While having more disks increases storage system performance, it also lowers overall storage
system reliability, because the probability that some disk in the array fails increases.
Reliability of a disk array can be increased by storing redundant information. If a disk fails, the
redundant information is used to reconstruct the data on the failed disk.
One design issue arises here: where to store the redundant information. There are two choices:
either store the redundant information on a small number of check disks, or distribute the redundant
information uniformly over all disks.
In a RAID system, the disk array is partitioned into reliability groups, where a reliability group
consists of a set of data disks and a set of check disks. A common redundancy scheme is applied
to each group.
RAID levels
RAID Level 0 :
A RAID level 0 system is not a true member of the RAID family, because it does not include
redundancy, that is, no redundant information is maintained.
For RAID 0, the user and system data are distributed across all of the disks in the array, i.e. data are
striped across the available disks.
If two different I/O requests arrive for two different data blocks, there is a good probability that the
requested blocks are on different disks. Thus, the two requests can be issued in parallel, reducing
the I/O waiting time.
RAID level 0 is a low cost solution, but the reliability is a problem since there is no redundant
information to retrieve in case of disk failure.
RAID level 0 has the best write performance of all RAID levels, because there is no need to update
redundant information.
RAID Level 1 : Mirrored
RAID level 1 is the most expensive solution to achieve the redundancy. In this system, two identical
copies of the data on two different disks are maintained. This type of redundancy is called mirroring.
Every write of a disk block involves two writes, due to the mirror image of the disk blocks.
These writes may not be performed simultaneously, since a global system failure may occur while
writing the blocks and then leave both copies in an inconsistent state. Therefore, write a block on a
disk first and then write the other copy on the mirror disk.
A read of a block can be scheduled to the disk that has the smaller access time. Since we are
maintaining fully redundant information, the disk for the mirror copy may be a less costly one to
reduce the overall cost.
RAID Level 2 :
RAID levels 2 and 3 make use of a parallel access technique where all member disks participate in
the execution of every I/O requests.
Data striping is used in RAID levels 2 and 3, but the size of the strips is very small, often as small
as a single byte or word.
With RAID 2, an error-correcting code is calculated across corresponding bits on each data disk,
and the bits of the code are stored in the corresponding bit positions on multiple parity disks.
RAID 2 requires fewer disks than RAID 1. The number of redundant disks is proportional to the log
of the number of data disks. For error-correcting, it uses Hamming code.
On a single read, all disks are simultaneously accessed. The requested data and the associated
error correcting code are delivered to the array controller. If there is a single bit error, the controller
can recognize and correct the error instantly, so that read access time is not slowed down.
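The claim that the number of redundant disks grows with the log of the number of data disks follows from the Hamming bound. A small sketch, assuming a single-error-correcting Hamming code with m data disks and k check disks:

```python
def hamming_check_disks(data_disks):
    """Smallest k such that 2**k >= data_disks + k + 1
    (the Hamming bound for single-error correction)."""
    k = 1
    while 2 ** k < data_disks + k + 1:
        k += 1
    return k

# The check-disk count grows roughly as log2 of the data-disk count.
for m in (4, 10, 25):
    print(m, hamming_check_disks(m))   # 4 -> 3, 10 -> 4, 25 -> 5
```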
RAID Level 3 :
RAID level 3 is organized in a similar fashion to RAID level 2. The difference is that RAID 3 requires
only a single redundant disk.
Instead of an error correcting code, a simple parity bit is computed for the set of individual bits in the
same position on all of the data disks.
In the event of a drive failure, the parity drive is accessed and data is reconstructed from the
remaining drives. Once the failed drive is replaced, the missing data can be restored on the new
drive.
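The parity-based reconstruction used by RAID 3 can be illustrated with bytewise XOR (a sketch; real controllers operate on whole disk blocks, and the data values here are made up):

```python
def parity(blocks):
    """Bytewise XOR parity across the given blocks."""
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

data = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]   # one block per data disk
p = parity(data)                                  # stored on the parity disk

# If disk 1 fails, XOR the parity with the surviving disks to rebuild it.
rebuilt = parity([data[0], data[2], p])
print(rebuilt == data[1])   # True
```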
RAID level 6 :
In RAID level 6, two different parity calculations are carried out and stored in separate blocks on
different disks.
The advantage of RAID 6 is that it provides high data availability, because the data can be
regenerated even if two disks containing user data fail. This is possible due to the use of the Reed-
Solomon code for the parity calculations.
In RAID 6, there is a write penalty, because each write affects two parity blocks.
UNIT-V
Memory
In this Module, we have four lectures, viz.
1. Concept of Memory.
2. Cache Memory.
3. Memory Management
4. Virtual memory
We have already mentioned that a digital computer works on the stored-program concept introduced
by Von Neumann. We use memory to store information, which includes both program and data.
For several reasons, we have different kinds of memories, and we use different kinds of
memory at different levels.
Internal and external:
Internal memory is used by the CPU to perform its tasks and external memory is used to store bulk
information, which includes large software and data.
Memory is used to store the information in digital form. The memory hierarchy is given
by:
Register
Cache Memory
Main Memory
Magnetic Disk
Register:
Registers are a part of the Central Processing Unit, so they reside inside the CPU. The information
from main memory is brought to the CPU and kept in registers. Due to space and cost constraints,
we have only a limited number of registers in a CPU. These are basically faster devices.
Cache Memory:
Cache memory is a storage device placed between the CPU and main memory. These are
semiconductor memories: basically fast memory devices, faster than main memory.
We cannot have a big volume of cache memory due to its higher cost and some constraints of the
CPU. Due to the higher cost we cannot replace the whole main memory by faster memory. Generally,
the most recently used information is kept in the cache memory; it is brought from the main memory
and placed in the cache memory. Nowadays, we get CPUs with internal cache.
Main Memory:
Like cache memory, main memory is also semiconductor memory, but it is relatively slower. We
have to first bring the information (whether data or program) to main memory. The CPU can work
only with the information available in main memory.
Magnetic Disk:
This is a bulk storage device. We have to deal with huge amounts of data in many applications, but
we don't have enough semiconductor memory to keep all this information in our computer. Moreover,
semiconductor memories are volatile in nature: they lose their contents once we switch off the
computer. For permanent storage, we use the magnetic disk. The storage capacity of a magnetic
disk is very high.
Removable media:
Register, cache memory and main memory are internal memory. Magnetic disk and removable media
are external memory. Internal memories are semiconductor memories. Semiconductor memories are
categorized as volatile memory and non-volatile memory.
RAM: Random Access Memories are volatile in nature. As soon as the computer is switched off, the
contents of memory are also lost.
ROM: Read only memories are non-volatile in nature. The storage is permanent, but it is read-only
memory. We cannot store new information in ROM.
PROM: Programmable Read Only Memory; it can be programmed once as per user
requirements.
EPROM: Erasable Programmable Read Only Memory; the contents of the memory can be
erased and new data stored into the memory. In this case, we have to erase the whole of the
stored information.
EEPROM: Electrically Erasable Programmable Read Only Memory; in this type of memory
the contents of a particular location can be changed without affecting the contents of other
locations.
Main Memory
The main memory of a computer is semiconductor memory. The main memory unit of a computer
basically consists of two kinds of memory: RAM and ROM.
The permanent information is kept in ROM and the user space is basically in RAM.
The smallest unit of information is known as a bit (binary digit), and in one memory cell we can store
one bit of information. 8 bits together are termed a byte.
The maximum size of main memory that can be used in any computer is determined by the
addressing scheme.
A computer that generates 16-bit addresses is capable of addressing up to 2^16 = 64K memory
locations. Similarly, for 32-bit addresses, the total capacity will be 2^32 = 4G memory
locations.
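These capacities can be checked in one line per address width:

```python
# A k-bit address can address 2**k distinct memory locations
for bits in (16, 32):
    print(bits, 2 ** bits)   # 16 -> 65536 (64K), 32 -> 4294967296 (4G)
```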
In some computer, the smallest addressable unit of information is a memory word and the machine
is called word-addressable.
The data transfer between main memory and the CPU takes place through two CPU registers.
MAR : Memory Address Register
MDR : Memory Data Register.
If the MAR is k bits long, then the total number of addressable memory locations will be 2^k.
If the MDR is n bits long, then n bits of data are transferred in one memory cycle.
The transfer of data takes place through the memory bus, which consists of an address bus and a
data bus. In the above example, the size of the data bus is n bits and the size of the address bus is
k bits.
It also includes control lines like Read, Write and Memory Function Complete (MFC) for
coordinating data transfer. In the case of a byte-addressable computer, another control line is
added to indicate a byte transfer instead of a whole-word transfer.
The CPU initiates a memory operation by loading the appropriate address into the MAR.
If it is a memory read operation, then it sets the read memory control line to 1. The contents of
the memory location are then brought to the MDR and the memory control circuitry indicates this
to the CPU by setting MFC to 1.
If the operation is a memory write operation, then the CPU places the data into MDR and sets the
write memory control line to 1. Once the contents of MDR are stored in specified memory location,
then the memory control circuitry indicates the end of operation by setting MFC to 1.
A useful measure of the speed of a memory unit is the time that elapses between the initiation of an
operation and its completion (for example, the time between Read and MFC). This is referred to as
the Memory Access Time. Another measure is the memory cycle time: the minimum time delay
between the initiation of two independent memory operations (for example, two successive memory
read operations). The memory cycle time is slightly longer than the memory access time.
The binary storage cell is the basic building block of a memory unit.
The binary storage cell that stores one bit of information can be modelled by an SR latch with
associated gates. This model of binary storage cell is shown in the figure 3.2.
Figure 3.2: Binary Storage cell made up of SR-Latch
The binary cell stores one bit of information in its internal latch.
Select   Read/Write   Memory Operation
  0          X            None
  1          0            Write
  1          1            Read
The storage part is modelled here with SR-latch, but in reality it is an electronics circuit made up of
transistors.
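The control table for the storage cell can be modelled as a small function (an illustrative sketch of the table's behaviour, not of the actual circuit):

```python
def memory_cell(select, read_write, stored, data_in):
    """Model of the storage cell's control table:
    select=0            -> no operation
    select=1, r/w=0     -> write data_in into the cell
    select=1, r/w=1     -> read the stored bit
    Returns (new stored bit, output bit or None)."""
    if select == 0:
        return stored, None        # cell not selected: no operation
    if read_write == 0:
        return data_in, None       # write: latch the input bit
    return stored, stored          # read: output the stored bit

state, out = memory_cell(1, 0, 0, 1)       # write a 1 into the cell
state, out = memory_cell(1, 1, state, 0)   # read it back
print(state, out)   # 1 1
```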
The memory constructed with the help of transistors is known as semiconductor memory.
Semiconductor memories are termed Random Access Memory (RAM), because it is possible to
access any memory location at random.
Depending on the technology used to construct a RAM, there are two types of RAM -
A DRAM is made with cells that store data as charge on capacitors. The presence or absence of
charge in a capacitor is interpreted as binary 1 or 0.
Because capacitors have a natural tendency to discharge due to leakage current, dynamic RAMs
require periodic charge refreshing to maintain data storage. The term dynamic refers to this
tendency of the stored charge to leak away, even with power continuously applied.
A typical DRAM structure for an individual cell that stores one bit information is shown in the figure
3.3.
Figure 3.3: Dynamic RAM (DRAM) cell
For the write operation, a voltage signal is applied to the bit line B; a high voltage represents 1 and
a low voltage represents 0. A signal is then applied to the address line, which will turn on the
transistor T, allowing a charge to be transferred to the capacitor.
For the read operation, when a signal is applied to the address line, the transistor T turns on and
the charge stored on the capacitor is fed out onto the bit line B and to a sense amplifier.
The sense amplifier compares the capacitor voltage to a reference value and determines if the cell
contains a logic 1 or a logic 0.
The read out from the cell discharges the capacitor, which must be restored to complete the read
operation.
Due to the discharge of the capacitor during read operation, the read operation of DRAM is termed
as destructive read out.
In an SRAM, binary values are stored using traditional flip-flops constructed from transistors. A
static RAM will hold its data as long as power is supplied to it.
A typical SRAM constructed with transistors is shown in the figure 3.4.
Four transistors (T1, T2, T3, T4) are cross connected in an arrangement that produces a stable logic
state. In logic state 1, point A1 is high and point A2 is low; in this state T1 and T4 are off, and T2 and
T3 are on . In logic state 0, point A1 is low and point A2 is high; in this state T1 and T4 are on, and T2
and T3 are off . Both states are stable as long as the dc supply voltage is applied.
The address line is used to open or close a switch which is nothing but another transistor. The
address line controls two transistors(T5 and T6).
When a signal is applied to this line, the two transistors are switched on, allowing a read or write
operation.
For a write operation, the desired bit value is applied to line B, and its complement is applied to the
complementary bit line. This forces the four transistors (T1, T2, T3, T4) into the proper state.
For a read operation, the bit value is read from the line B. When a signal is applied to the address
line, the signal of point A1 is available in the bit line B.
Both static and dynamic RAMs are volatile; that is, they retain the information only as long as the
power supply is applied.
A dynamic memory cell is simpler and smaller than a static memory cell. Thus a DRAM is more
dense, i.e., its packing density is higher (more cells per unit area), and a DRAM is less expensive
than the corresponding SRAM.
However, a DRAM requires supporting refresh circuitry. For larger memories, the fixed cost of the
refresh circuitry is more than compensated for by the lower cost of the DRAM cells.
SRAM cells are generally faster than DRAM cells. Therefore, to construct faster memory
modules (like cache memory), SRAM is used.
A memory cell is capable of storing 1-bit of information. A number of memory cells are organized in
the form of a matrix to form the memory chip. One such organization is shown in the Figure 3.5.
A memory chip consisting of 16 words of 8 bits each is usually referred to as a 16 x 8 organization.
The data input and data output lines of each Sense/Write circuit are connected to a single
bidirectional data line in order to reduce the number of pins required. For 16 words, we need an
address bus of size 4. In addition to the address and data lines, two control lines, R/W and CS, are
provided. The R/W line is used to specify the required operation, read or write. The CS (Chip
Select) line is required to select a given chip in a multi-chip memory system.
Consider a slightly larger memory unit that has 1K (1024) memory cells... 128 x 8 memory
chip:
If it is organized as a 128 x 8 memory chip, then it has 128 memory words of size 8 bits. So the
size of the data bus is 8 bits and the size of the address bus is 7 bits (2^7 = 128). The storage
organization of the 128 x 8 memory chip is shown in Figure 3.6.
Analysis of a large number of programs has shown that a number of instructions are executed
repeatedly. This may be in the form of simple loops, nested loops, or a few procedures that
repeatedly call each other. It is observed that many instructions in each of a few localized areas of
the program are repeatedly executed, while the remainder of the program is accessed relatively
infrequently. This phenomenon is referred to as locality of reference.
Figure 3.13: Cache memory between CPU and the main memory
Now, if it can be arranged to have the active segments of a program in a fast memory, then the total
execution time can be significantly reduced. The CPU is a faster device and memory is a relatively
slower device, so memory access is the main bottleneck for performance. If a faster memory device
can be inserted between main memory and the CPU, the efficiency can be increased. The faster
memory that is inserted between the CPU and main memory is termed cache memory. To make this
arrangement effective, the cache must be considerably faster than the main memory; typically it is
5 to 10 times faster than the main memory. This approach is more economical than using fast
memory devices to implement the entire main memory. It is also feasible due to the locality of
reference present in most programs, which reduces the frequent data transfer between main
memory and cache memory. The inclusion of cache memory between the CPU and main memory is
shown in Figure 3.13.
Operation of Cache Memory
The memory control circuitry is designed to take advantage of the property of locality of reference.
Some assumptions are made while designing the memory control circuitry:
1. The CPU does not need to know explicitly about the existence of the cache.
2. The CPU simply makes Read and Write requests. The nature of these two
operations is the same whether the cache is present or not.
3. The addresses generated by the CPU always refer to locations of main memory.
4. The memory access control circuitry determines whether or not the requested word
currently exists in the cache.
When a Read request is received from the CPU, the contents of a block of memory words
containing the location specified are transferred into the cache. When any of the locations in this
block is referenced by the program, its contents are read directly from the cache.
Consider the case where the addressed word is not in the cache and the operation is a read. First
the block of words is brought to the cache and then the requested word is forwarded to the CPU.
But it can be forwarded to the CPU as soon as it is available in the cache, instead of waiting for the
whole block to be loaded.
When the cache is full and a memory word is referenced that is not in the cache, a decision must be
made as to which block should be removed from the cache to create space to bring the new block
to the cache that contains the referenced word. Replacement algorithms are used to make the proper
selection of block that must be replaced by the new one.
When a write request is received from the CPU, there are two ways that the system can proceed. In
the first case, the cache location and the main memory location are updated simultaneously. This is
called the store through method or write through method.
The alternative is to update the cache location only. At replacement time, the cache block is
written back to the main memory. This method is called the write back method. If there has been no
write operation on the cache block, it is not required to write the block back to main memory. This
information is kept with the help of an associated bit, which is set whenever there is a write
operation on the cache block. During replacement, this bit is checked: if it is set, the cache block is
written back to main memory, otherwise not. This bit is known as the dirty bit. If the bit gets dirty
(set to one), writing to main memory is required.
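The dirty-bit bookkeeping described above can be sketched as follows (illustrative only; the class and variable names are made up):

```python
class WriteBackCacheBlock:
    """Sketch of write-back caching: the dirty bit records whether the
    cached copy was modified; only dirty blocks are written back."""
    def __init__(self, data):
        self.data = data
        self.dirty = False

    def write(self, data):
        self.data = data
        self.dirty = True          # cache updated, main memory now stale

    def evict(self, memory, addr):
        if self.dirty:             # write back only if the block was modified
            memory[addr] = self.data
        self.dirty = False

memory = {0x10: "old"}
block = WriteBackCacheBlock(memory[0x10])
block.write("new")                 # only the cache copy changes
block.evict(memory, 0x10)          # dirty, so write back on replacement
print(memory[0x10])   # new
```

A clean (never-written) block is simply discarded on replacement, avoiding the unnecessary memory writes that write-through would perform.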
The write through method is simpler, but it results in unnecessary write operations in the main
memory when a given cache word is updated a number of times during its cache residency period.
During a write operation, if the address word is not in the cache, the information is written directly
into the main memory. A write operation normally refers to the location of data areas and the
property of locality of reference is not as pronounced in accessing data when write operation is
involved. Therefore, it is not advantageous to bring the data block to the cache when there is a write
operation and the addressed word is not present in the cache.
Mapping Functions
The mapping functions are used to map a particular block of main memory to a particular block of
cache. This mapping function is used to transfer the block from main memory to cache memory.
Three different mapping functions are available:
Direct mapping:
A particular block of main memory can be brought to a particular block of cache memory. So, it is
not flexible.
Associative mapping:
In this mapping function, any block of Main memory can potentially reside in any cache block
position. This is much more flexible mapping method.
Block-set-associative mapping:
In this method, blocks of cache are grouped into sets, and the mapping allows a block of main
memory to reside in any block of a specific set. From the flexibility point of view, it is in between
the other two methods.
All these three mapping methods are explained with the help of an example.
Consider a cache of 4096 (4K) words with a block size of 32 words. Therefore, the cache is
organized as 128 blocks. For 4K words, required address lines are 12 bits. To select one of the
block out of 128 blocks, we need 7 bits of address lines and to select one word out of 32 words, we
need 5 bits of address lines. So the total 12 bits of address are divided into two groups: the lower 5
bits are used to select a word within a block, and the higher 7 bits are used to select a block of
cache memory.
Let us consider a main memory system consisting of 64K words. The size of the address bus is 16
bits. Since the block size of the cache is 32 words, the main memory is also organized with a block
size of 32 words. Therefore, the total number of blocks in main memory is 2048 (2K blocks x 32
words = 64K words).
To identify any one block of 2K blocks, we need 11 address lines. Out of 16 address lines of main
memory, lower 5 bits are used to select a word within a block and higher 11 bits are used to select
a block out of 2048 blocks.
The number of blocks in cache memory is 128 and the number of blocks in main memory is 2048, so
at any instant of time only 128 of the 2048 blocks can reside in cache memory. Therefore, we need
a mapping function to put a particular block of main memory into the appropriate block of cache
memory.
Direct Mapping Technique:
The simplest way of associating main memory blocks with cache blocks is the direct mapping
technique. In this technique, block k of main memory maps into block k modulo m of the cache,
where m is the total number of blocks in the cache. In this example, the value of m is 128. In the
direct mapping technique, one particular block of main memory can be transferred only to the
particular block of cache derived by the modulo function.
Since more than one main memory block is mapped onto a given cache block position, contention
may arise for that position. This situation may occur even when the cache is not full. Contention is
resolved by allowing the new block to overwrite the currently resident block, so the replacement
algorithm is trivial.
The main memory address is divided into three fields. The field sizes depend on the memory
capacity and the block size of the cache. In this example, the lower 5 bits of the address are used
to identify a word within a block. The next 7 bits are used to select a block out of 128 blocks (which
is the capacity of the cache). The remaining 4 bits are used as a TAG to identify the proper block of
main memory that is mapped to cache.
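The 4/7/5-bit split of the 16-bit address in this example can be illustrated with a short sketch (the function and field names are illustrative):

```python
def direct_map(address):
    """Split a 16-bit main memory address for the example cache:
    low 5 bits = word within block, next 7 = cache block, top 4 = TAG."""
    word = address & 0x1F            # bits 0-4
    block = (address >> 5) & 0x7F    # bits 5-11
    tag = (address >> 12) & 0xF      # bits 12-15
    return tag, block, word

# Main memory block k maps to cache block k mod 128.
addr = 0b1011_0000101_10011          # tag=11, block=5, word=19
print(direct_map(addr))              # (11, 5, 19)
```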
When a new block is first brought into the cache, the high-order 4 bits of the main memory address
are stored in the TAG bits associated with its location in the cache. When the CPU generates a
memory address, the TAG bits of the address are compared with the TAG bits of the selected cache
block; a match means the desired word is present in the cache.
If there is no match, the required word must be accessed from the main memory; that is, the
contents of that block of the cache are replaced by the new block specified by the address
generated by the CPU, and correspondingly the TAG bits are changed to the high-order 4 bits of the
address. The whole arrangement for the direct mapping technique is shown in Figure 3.14.
In the associative mapping technique, a main memory block can potentially reside in any cache
block position. In this case, the main memory address is divided into two groups: the low-order bits
identify the location of a word within a block and the high-order bits identify the block. In this
example, 11 bits are required to identify a main memory block when it is resident in the cache; the
high-order 11 bits are used as TAG bits and the low-order 5 bits are used to identify a word within a
block. The TAG bits of an address received from the CPU must be compared to the TAG bits of
each block of the cache to see if the desired block is present.
In associative mapping, any block of main memory can go to any block of the cache, so it has
complete flexibility, and we have to use a proper replacement policy to replace a block of the cache
if the currently accessed block of main memory is not present in the cache. It might not be practical
to use this complete flexibility of the associative mapping technique due to the searching overhead,
because the TAG field of the main memory address has to be compared with the TAG field of every
cache block. In this example, there are 128 blocks in the cache and the size of the TAG is 11 bits. The whole
arrangement of Associative Mapping Technique is shown in the figure 3.15.
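The search can be sketched as follows; this hypothetical lookup compares the 11-bit TAG against every one of the 128 cache blocks, with field widths taken from the example.

```python
# Fully associative lookup sketch: the address TAG must be compared with
# the TAG of every cache block. None marks an empty block.

def assoc_lookup(address, cache_tags):
    tag, word = address >> 5, address & 31   # 11 TAG bits | 5 word bits
    for block, t in enumerate(cache_tags):   # search the whole cache
        if t == tag:
            return block, word               # hit
    return None                              # miss: fetch from main memory

cache_tags = [None] * 128
cache_tags[3] = 0x2A   # pretend main memory block 0x2A sits in cache block 3
```

In hardware this comparison is done in parallel against all 128 TAGs, which is exactly the searching overhead the text mentions.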
Figure 3.15: Associative Mapping Cache
Block-Set-Associative Mapping Technique:
This mapping technique is intermediate between the previous two techniques. Blocks of the cache
are grouped into sets, and the mapping allows a block of main memory to reside in any block of a
specific set. Therefore, the flexibility of associative mapping is reduced from full freedom to a
specific set of blocks. This also reduces the searching overhead, because the search is restricted
to the number of sets instead of the number of blocks. Also, the contention problem of direct
mapping is eased by having a few choices for block replacement.
Consider the same cache memory and main memory organization of the previous example.
Organize the cache with 4 blocks in each set. The TAG field of the associative mapping technique
is divided into two groups, one termed the SET field and the other the TAG field. Since each set
contains 4 blocks, the total number of sets is 32 (128 / 4). The main memory address is grouped
into three parts: the low-order 5 bits identify a word within a block. Since there are 32 sets, the next
5 bits identify the set. The high-order 6 bits are used as TAG bits.
The 5-bit SET field of the address determines which set of the cache might contain the desired
block. This is similar to the direct mapping technique; direct mapping looks for a block, while
block-set-associative mapping looks for a set. The TAG field of the address must then be
compared with the TAGs of the four blocks of that set. If a match occurs, the block is present in the
cache; otherwise the block containing the addressed word must be brought into the cache. This
block can only come into the corresponding set. Since there are four blocks in the set, we have to
choose appropriately which block to replace if all the blocks are occupied. Since the search is
restricted to four blocks only, the searching complexity is reduced. The whole arrangement of the
block-set-associative mapping technique is shown in the figure 3.15.
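The set-associative split follows the same pattern; this sketch assumes the 16-bit address and the field widths of the example (6 TAG bits, 5 SET bits, 5 word bits).

```python
# 4-way set-associative address split from the example:
# 16-bit address = 6 TAG bits | 5 SET bits | 5 word bits (32 sets x 4 blocks).

def split_set_assoc(address):
    word = address & 31               # low-order 5 bits: word within a block
    set_index = (address >> 5) & 31   # next 5 bits choose one of 32 sets
    tag = address >> 10               # high-order 6 bits
    return tag, set_index, word

# Address 0x1234 maps to set 17; only that set's four TAGs are compared.
```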
It is clear that if we increase the number of blocks per set, the number of bits in the SET field is
reduced. Due to the increase of blocks per set, the complexity of the search is also increased. The
extreme condition of 128 blocks per set requires no SET bits and corresponds to the fully
associative mapping technique with 11 TAG bits. The other extreme of one block per set is the
direct mapping method.
Figure 3.15: Block-Set-Associative Mapping Cache with 4 blocks per set
Replacement Algorithms
When a new block must be brought into the cache and all the positions that it may occupy are full, a
decision must be made as to which of the old blocks is to be overwritten. In general, a policy is
required to keep blocks in the cache when they are likely to be referenced in the near future.
However, it is not easy to determine directly which blocks in the cache are about to be referenced.
The property of locality of reference gives some clue for designing a good replacement policy.
Since programs usually stay in localized areas for reasonable periods of time, it can be assumed
that there is a high probability that blocks which have been referenced recently will also be
referenced in the near future. Therefore, when a block is to be overwritten, it is a good decision to
overwrite the one that has gone the longest time without being referenced. This is defined as the
least recently used (LRU) block. Keeping track of the LRU block must be done as the computation
proceeds.
Consider a specific example of a four-block set. It is required to track the LRU block of this four-
block set. A 2-bit counter may be used for each block.
When a hit occurs, that is, when a read request is received for a word that is in the cache, the
counter of the referenced block is set to 0. All counters whose values were originally lower than
that of the referenced block are incremented by 1, and all other counters remain unchanged.
When a miss occurs, that is, when a read request is received for a word that is not present in the
cache, we have to bring the block into the cache.
There are two possibilities in case of a miss:
If the set is not full, the counter associated with the new block loaded from the main memory is set
to 0, and the values of all other counters are incremented by 1.
If the set is full and a miss occurs, the block with counter value 3 is removed, the new block is put
in its place, and its counter is set to 0. The other three block counters are incremented by 1.
It is easy to verify that the counter values of occupied blocks are always distinct, and that the
highest counter value indicates the least recently used block.
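The counter scheme just described can be simulated as follows. This is an illustrative sketch of one four-block set, not the hardware itself; the class and method names are invented for the example.

```python
class LRUSet:
    """One four-block set with a 2-bit LRU counter per block."""

    def __init__(self, ways=4):
        self.tags = [None] * ways    # cached block TAGs; None means empty
        self.counters = [0] * ways   # 2-bit age counters

    def access(self, tag):
        if tag in self.tags:                         # --- hit ---
            i = self.tags.index(tag)
            ref = self.counters[i]
            for j in range(len(self.tags)):
                if self.tags[j] is not None and self.counters[j] < ref:
                    self.counters[j] += 1            # younger blocks age by 1
            self.counters[i] = 0                     # referenced block is newest
            return True
        # --- miss: take an empty block, else evict the block at counter 3 ---
        if None in self.tags:
            i = self.tags.index(None)
        else:
            i = self.counters.index(3)               # counter 3 marks the LRU block
        for j in range(len(self.tags)):
            if self.tags[j] is not None and j != i:
                self.counters[j] += 1
        self.tags[i] = tag
        self.counters[i] = 0
        return False
```

Running the sequence A, B, C, D, B, E against one set fills it, makes B the most recently used, and then evicts A, confirming that the occupied counters stay distinct.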
A reasonable first-in first-out (FIFO) rule may also be used: remove the oldest block from a full set
when a new block must be brought in. With this technique, no update is required when a hit occurs.
When a miss occurs and the set is not full, the new block is put into an empty block and the counter
values of the occupied blocks are incremented by one. When a miss occurs and the set is full, the
block with the highest counter value is replaced by the new block, whose counter is set to 0; the
counter values of all other blocks of that set are incremented by 1. The overhead of this policy is
low, since no update is required on a hit.
The simplest algorithm is to choose the block to be overwritten at random. Interestingly enough, this
simple algorithm has been found to be very effective in practice.
Main Memory
The main working principle of the digital computer is the von Neumann stored-program principle.
First of all, we have to keep all the information in some storage, mainly known as main memory,
and the CPU interacts with the main memory only. Therefore, memory management is an important
issue while designing a computer system.
On the other hand, everything cannot be implemented in hardware, otherwise the cost of the
system will be very high. Therefore some of the tasks are performed by software programs. The
collection of such software programs is basically known as the operating system, so the operating
system is viewed as an extended machine. Many more functions or instructions are implemented
through software routines. The operating system is mainly memory resident, i.e., the operating
system is loaded into main memory.
Due to that, the main memory of a computer is divided into two parts. One part is reserved for the
operating system; the other part is for the user program. The program currently being executed by
the CPU is loaded into the user part of the memory. The two parts of the main memory are shown
in the figure 3.17.
In a uni-programming system, the program currently being executed is loaded into the user part of
the memory.
In a multiprogramming system, the user part of memory is subdivided to accommodate multiple
processes. The task of subdivision is carried out dynamically by the operating system and is known
as memory management.
Figure 3.17: Partition of main memory
Efficient memory management is vital in a multiprogramming system. If only a few processes are in
memory, then for much of the time all of the processes will be waiting for I/O and the processor will
be idle. Thus memory needs to be allocated efficiently to pack as many processes into main
memory as possible.
When memory holds multiple processes, the processor can switch from one process to another
when one process is waiting. But the processor is so much faster than I/O that it is common for all
the processes in memory to be waiting for I/O. Thus, even with multiprogramming, a processor
could be idle most of the time.
A process can be in one of the following states:
New: A program is admitted for execution, but is not yet ready to execute. The operating system
will initialize the process by moving it to the ready state.
Ready: The process is ready to execute and is waiting for access to the processor.
Running: The process is being executed by the processor. At any given time, only one process is
in the running state.
Blocked: The process is suspended from execution, waiting for some system resource, such as
I/O.
Exit: The process has terminated and will be destroyed by the operating system.
The processor alternates between executing operating system instructions and executing user
processes. While the operating system is in control, it decides which process in the queue should
be executed next.
A process being executed may be suspended for a variety of reasons. If it is suspended because
the process requests I/O, then it is placed in the appropriate I/O queue. If it is suspended because
of a timeout or because the operating system must attend to some of its own tasks, then it is
placed in the ready state.
We know that the information of all the processes that are in execution must be placed in main
memory. Since there is a fixed amount of memory, memory management is an important issue.
Memory Management
In a uniprogramming system, main memory is divided into two parts: one part for the operating
system and the other part for the program currently being executed.
In a multiprogramming system, the user part of memory is subdivided to accommodate multiple
processes.
To utilize the idle time of CPU, we are shifting the paradigm from uniprogram environment to
multiprogram environment.
Since the size of main memory is fixed, it is possible to accommodate only a few processes in the
main memory. If all of them are waiting for I/O operations, then the CPU again remains idle.
To utilize the idle time of the CPU, some of the processes must be offloaded from memory and new
processes must be brought into this memory space. This is known as swapping.
What is swapping:
1. A process waiting for some I/O to complete must be stored back on disk.
2. A new ready process is swapped in to main memory as space becomes
available.
3. As a process completes, it is moved out of main memory.
4. If none of the processes in memory are ready:
Swap out a blocked process to the intermediate queue of blocked
processes.
Swap in a ready process from the ready queue.
But swapping is itself an I/O process, so it also takes time. Instead of letting the CPU remain idle, it
is sometimes advantageous to swap in a ready process and start executing it.
The main question arises where to put a new process in the main memory. It must be done in such
a way that the memory is utilized properly.
Partitioning
Splitting of memory into sections to allocate processes, including the operating system. There are
two schemes for partitioning: fixed-size partitioning and variable-size partitioning.
Even with the use of unequal-size partitions, there will be wastage of memory. In most cases, a
process will not require exactly as much memory as the partition provides.
For example, a process that requires 5 MB of memory would be placed in the 6-MB partition, which
is the smallest available partition. In this partition, only 5 MB is used; the remaining 1 MB cannot be
used by any other process, so it is wasted. Like this, in every partition we may have some unused
memory. The unused portion of memory in each partition is termed a hole.
But this is not the only hole that will be present with variable-size partitions. When all processes
are blocked, a process is swapped out and another is brought in. The newly swapped-in process
may be smaller than the swapped-out process, and most likely we will not get two processes of the
same size, so another hole is created. As swap-out and swap-in occur more often, more and more
holes are created, leading to more wastage of memory.
There are two simple ways to reduce the problem of memory wastage:
Coalescing: Join adjacent holes into one large hole, so that some process can be accommodated
in it.
Compaction: From time to time, go through memory and move the processes so that all holes are
collected into one free block of memory.
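Coalescing can be sketched as a merge of adjacent free intervals. The (start, size) representation below is an assumption made for illustration; real allocators keep such information in their free lists.

```python
# Coalescing sketch: merge free holes, given as (start, size) pairs,
# whenever one hole begins exactly where the previous one ends.

def coalesce(holes):
    holes = sorted(holes)                 # order holes by start address
    merged = [list(holes[0])]
    for start, size in holes[1:]:
        last = merged[-1]
        if start == last[0] + last[1]:    # hole begins where the last ends
            last[1] += size               # join into one larger hole
        else:
            merged.append([start, size])
    return [tuple(h) for h in merged]

# Two adjacent 1-MB holes at addresses 5 MB and 6 MB merge into one
# 2-MB hole; a hole at 9 MB stays separate.
```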
During the execution of a process, it may be swapped in or swapped out many times. It is obvious
that a process is not likely to be loaded into the same place in main memory each time it is
swapped in. Furthermore, if compaction is used, a process may be shifted while in main memory.
A process in memory consists of instructions plus data. The instructions will contain addresses for
memory locations of two types: addresses of data items, and addresses of instructions used for
branching.
These addresses will change each time a process is swapped in. To solve this problem, a
distinction is made between logical addresses and physical addresses.
When the processor executes a process, it automatically converts each logical address to a
physical address by adding the current starting location of the process, called its base address, to
the logical address.
Every time the process is swapped in to main memory, the base address may be different
depending on the allocation of memory to the process.
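A minimal sketch of this translation follows; the limit check is an illustrative addition (a bounds check real hardware performs with a limit register), not something stated above.

```python
# Base-register relocation sketch: physical = base + logical.

def to_physical(logical, base, limit):
    if logical >= limit:            # illustrative protection check
        raise MemoryError("logical address outside the process")
    return base + logical

# A process loaded at base 0x40000: logical 0x132 maps to 0x40132.
# After being swapped out and back in at base 0x80000, the same
# logical address maps to 0x80132 with no change to the program.
```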
Consider a main memory of 2 MB, out of which 512 KB is used by the operating system. Consider
three processes of sizes 425 KB, 368 KB and 470 KB, and load these three processes into
memory. This leaves a hole at the end of the memory that is too small for a fourth process. At
some point none of the processes in main memory is ready. The operating system swaps out
process 2, which leaves sufficient room for a new process 4 of size 320 KB. Since process 4 is
smaller than process 2, another hole is created. Later a point is reached at which none of the
processes in main memory is ready, but process 2 is, so process 1 is swapped out and process 2
is swapped in there. This creates yet another hole. In this way a lot of small holes are created in
the memory system, leading to more memory wastage.
Dynamic partitioning thus creates more and more holes during execution.
Paging
Both unequal fixed-size and variable-size partitions are inefficient in the use of memory; both
schemes lead to memory wastage.
The memory is partitioned into equal fixed-size chunks that are relatively small. These chunks of
memory are known as frames or page frames.
Each process is also divided into small fixed-size chunks of the same size. The chunks of a
program are known as pages.
In this scheme, the wasted space in memory for a process is only a fraction of a page frame,
corresponding to the last page of the program.
At a given point of time some of the frames in memory are in use and some are free. The list of
free frames is maintained by the operating system.
Suppose process A, stored on disk, consists of six pages. At the time of execution of process A,
the operating system finds six free frames and loads the six pages of the process into them. These
six frames need not be contiguous frames in main memory. The operating system maintains a
page table for each process.
Within the program, each logical address consists of a page number and a relative address within
the page.
In the case of simple partitioning, a logical address is the location of a word relative to the
beginning of the program; the processor translates that into a physical address.
With paging, a logical address is the location of a word relative to the beginning of its page,
because the whole program is divided into several pages of equal length, and the length of a page
is the same as the length of a page frame.
Given a logical address consisting of a page number and a relative address within the page, the
processor uses the page table to produce the physical address, which consists of a frame number
and a relative address within the frame.
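The translation just described can be sketched as follows. The 1-KB page size and the sample page table are assumptions for illustration only.

```python
# Paged translation sketch: logical address = page number + offset;
# the page table maps each page to a frame of the same size.

PAGE_SIZE = 1024   # assumed page/frame size

def translate(logical, page_table):
    page, offset = divmod(logical, PAGE_SIZE)
    frame = page_table[page]          # raises KeyError on an unmapped page
    return frame * PAGE_SIZE + offset

page_table = {0: 5, 1: 9, 2: 2}      # hypothetical mapping for one process
# Logical address 1030 is page 1, offset 6, so it maps to frame 9,
# physical address 9 * 1024 + 6.
```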
The Figure 3.22 shows the allocation of frames to a new process in the main memory. A page table
is maintained for each process. This page table helps us to find the physical address in a frame
which corresponds to a logical address within a process.
The conversion of logical address to physical address is shown in the figure for the Process A.
Figure 3.23: Translation of Logical Address to Physical Address
This approach solves these problems. Main memory is divided into many small equal-size frames.
Each process is divided into frame-size pages. A smaller process requires fewer pages, a larger
process requires more. When a process is brought in, its pages are loaded into available frames
and a page table is set up.
The translation of logical addresses to physical addresses is shown in the Figure 3.23.
Figure 3.22: Allocation of free frames
Virtual Memory
Since a process need not be loaded into contiguous memory locations, a page of a process can be
put in any free page frame. On the other hand, it is not required to load the whole process into
main memory, because execution may be confined to a small section of the program (e.g., a
subroutine).
It would clearly be wasteful to load many pages for a process when only a few pages will be used
before the program is suspended.
Instead of loading all the pages of a process, each page is brought in only when it is needed, i.e.,
on demand. This scheme is known as demand paging.
Demand paging also allows us to accommodate more processes in main memory, since we do not
load the whole process; pages are brought into main memory as and when they are required.
With demand paging, it is not necessary to load an entire process into main memory.
This concept leads us to an important consequence: it is possible for a process to be larger than
the size of main memory. So, while developing a new process, it is not required to consider the
main memory available on the machine, because the process will be divided into pages and pages
will be brought into memory on demand.
Because a process executes only in main memory, the main memory is referred to as real memory
or physical memory.
A programmer or user perceives a much larger memory that is allocated on the disk. This memory
is referred to as virtual memory. The programmer enjoys a huge virtual memory space in which to
develop his or her program or software.
The execution of a program is the job of the operating system and the underlying hardware. To
improve performance, some special hardware is added to the system. This hardware unit is known
as the Memory Management Unit (MMU).
In a paging system, we make a page table for each process. The page table helps us to find the
physical address from the virtual address.
The virtual address space is used to develop a process. The special hardware unit, called the
Memory Management Unit (MMU), translates virtual addresses to physical addresses. When the
desired data is in main memory, the CPU can work with it. If the data is not in main memory, the
MMU causes the operating system to bring it into memory from the disk.
Address Translation
The basic mechanism for reading a word from memory involves the translation of a virtual or logical
address, consisting of page number and offset, into a physical address, consisting of frame number
and offset, using a page table.
There is one page table for each process, and each process can occupy a huge amount of virtual
memory. But the virtual memory of a process cannot go beyond a certain limit, which is restricted
by the underlying hardware of the MMU; one such constraint is the size of the virtual address
register.
The pages are relatively small, so the size of the page table increases as the size of the process
increases. Therefore, the size of a page table could be unacceptably high.
To overcome this problem, most virtual memory schemes store the page table in virtual memory
rather than in real memory.
This means that the page table is subject to paging just as other pages are.
When a process is running, at least a part of its page table must be in main memory, including the
page table entry of the currently executing page.
A virtual address translation scheme by using page table is shown in the Figure 3.25.
Figure 3.24: Virtual Memory Organization. Each virtual address generated by the processor is
interpreted as a virtual page number (high-order bits) followed by an offset (low-order bits) that
specifies the location of a particular word within a page. Information about the main memory
location of each page is kept in a page table.
Some processors make use of a two-level scheme to organize large page tables.
In this scheme, there is a page directory, in which each entry points to a page table.
Thus, if the length of the page directory is X, and if the maximum length of a page table is Y, then
the process can consist of up to X * Y pages.
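A two-level lookup can be sketched as follows. The directory and table sizes here (X = 4, Y = 4) and the 1-KB page size are illustrative assumptions, far smaller than any real processor would use.

```python
# Two-level page table sketch: the directory entry selects a page table,
# whose entry selects the frame.

TABLE_ENTRIES, PAGE_SIZE = 4, 1024   # assumed sizes

def translate2(logical, directory):
    page, offset = divmod(logical, PAGE_SIZE)
    d, t = divmod(page, TABLE_ENTRIES)   # directory index, then table index
    return directory[d][t] * PAGE_SIZE + offset

# With X = 4 directory entries and Y = 4 entries per table, a process
# can consist of up to X * Y = 16 pages.
directory = [[1, 2, 3, 4], [5, 6, 7, 8], [0, 0, 0, 0], [0, 0, 0, 0]]
```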
Typically, the maximum length of a page table is restricted to the size of one page frame.
Inverted Page Table Structures
There is one entry in the hash table and the inverted page table for each real memory page, rather
than one per virtual page.
Thus a fixed portion of real memory is required for the page table, regardless of the number of
processes or virtual pages supported.
Because more than one virtual address may map into the same hash table entry, a chaining
technique is used for managing the overflow.
The hashing technique results in chains that are typically short: usually one or two entries.
The inverted page table structure for address translation is shown in the Figure 3.26.
Every virtual memory reference can cause two physical memory accesses: one to fetch the page
table entry and one to fetch the data itself. Thus a straightforward virtual memory scheme would
have the effect of doubling the memory access time.
To overcome this problem, most virtual memory schemes make use of a special cache for page
table entries, usually called Translation Lookaside Buffer (TLB).
This cache functions in the same way as a memory cache and contains those page table entries
that have been most recently used.
In addition to the information that constitutes a page table entry, the TLB must also include the
virtual address of the entry.
The Figure 3.27 shows a possible organization of a TLB where the associative mapping technique
is used.
An essential requirement is that the contents of the TLB be coherent with the contents of the page
table in the main memory.
When the operating system changes the contents of the page table, it must simultaneously
invalidate the corresponding entries in the TLB. One of the control bits in the TLB is provided for
this purpose.
Address Translation proceeds as follows:
Given a virtual address, the MMU looks in the TLB for the referenced page.
If the page table entry for this page is found in the TLB, the physical address is obtained
immediately.
If there is a miss in the TLB, then the required entry is obtained from the page table in the
main memory and the TLB is updated.
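The hit-then-miss sequence above can be sketched as follows; the dictionary-based TLB and the 1-KB page size are illustrative assumptions.

```python
# TLB lookup sketch: try the TLB first; on a miss, walk the in-memory
# page table and update the TLB with the entry just found.

def tlb_translate(page, tlb, page_table, page_size=1024, offset=0):
    if page in tlb:                  # TLB hit: physical address immediately
        frame = tlb[page]
    else:                            # TLB miss: consult the page table
        frame = page_table[page]     # KeyError here would be a page fault
        tlb[page] = frame            # update the TLB
    return frame * page_size + offset

# First access to page 3 misses the TLB and fills it; a second access
# to page 3 is then served from the TLB without touching the page table.
```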
When a program generates an access request to a page that is not in the main memory, a
page fault is said to have occurred.
The whole page must be brought from the disk into the memory before access can proceed.
When it detects a page fault, the MMU asks the operating system to intervene by raising an
exception (interrupt).
Processing of active task is interrupted, and control is transferred to the operating system.