Fundamentals Signed Numbers Fixed-Point Numbers Floating Point
Chapter 4
Number Representations
SKEE2263 Digital Systems
Mun’im/Ismahani/Izam
{munim@[Link],e-izam@[Link],ismahani@[Link]}
January 28, 2017
Table of Contents
1 Fundamentals
2 Signed Numbers
3 Fixed-Point Numbers
4 Floating Point
Taxonomy of Number Systems
Numbers
    Integers
        Unsigned
        Signed
            Signed-Magnitude
            Ones’ Complement
            Two’s Complement
    Reals
        Fixed Point
        Floating Point
Integers
Number of bits    Number of values        Machine
4                 16                      Intel 4004
8                 256                     8080, 6800
16                65536                   PDP11, 8086, 68000
32                $4 \times 10^{9}$       68020, VAX11, IEEE single
48                $2.8 \times 10^{14}$    Unisys
64                $1.8 \times 10^{19}$    Cray, IEEE double
Integers
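Value of an N-bit unsigned integer bit pattern:
$$V_{\text{unsigned}} = \sum_{i=0}^{N-1} b_i \times 2^i$$
Example $10110_2$:
$$\begin{aligned}
10110_2 &= 1 \times 2^4 + 0 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 \\
&= 22_{10}
\end{aligned}$$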
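The positional sum can be checked with a short Java sketch. The class and method names (`UnsignedValue`, `valueOf`) are illustrative only and do not come from the slides.

```java
// Illustrative sketch: evaluate an unsigned bit pattern as the sum of b_i * 2^i.
class UnsignedValue {
    // 'bits' is written most significant bit first, e.g. "10110".
    static long valueOf(String bits) {
        long value = 0;
        int n = bits.length();
        for (int i = 0; i < n; i++) {
            // The character at position (n - 1 - i) is bit b_i.
            int b = bits.charAt(n - 1 - i) - '0';
            value += (long) b << i;   // b_i * 2^i
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(valueOf("10110"));              // 22
        System.out.println(Integer.parseInt("10110", 2));  // library equivalent, also 22
    }
}
```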
Signed-Magnitude
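Value of an N-bit signed-magnitude pattern:
$$V_{SM} = (-1)^{b_{N-1}} \times \sum_{i=0}^{N-2} b_i \times 2^i$$
Example $1010_{SM}$:
$$\begin{aligned}
V_{SM} &= (-1)^{b_3} \times [b_2 \times 2^2 + b_1 \times 2^1 + b_0 \times 2^0] \\
&= (-1)^1 \times [0(4) + 1(2) + 0(1)] \\
&= -1 \times 2 \\
&= -2
\end{aligned}$$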
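As a minimal sketch of the signed-magnitude formula, the decoder below splits a pattern into its sign bit and magnitude. The class name `SignedMagnitude` and the choice to pass the pattern in the low `n` bits of an `int` are assumptions made for illustration.

```java
// Illustrative sketch: decode an n-bit signed-magnitude pattern (1 <= n <= 31).
class SignedMagnitude {
    static int decode(int pattern, int n) {
        int sign = (pattern >> (n - 1)) & 1;             // b_{N-1}
        int magnitude = pattern & ((1 << (n - 1)) - 1);  // bits b_{N-2} .. b_0
        return sign == 1 ? -magnitude : magnitude;
    }

    public static void main(String[] args) {
        System.out.println(decode(0b1010, 4));        // -2
        System.out.println(decode(0b1000_0101, 8));   // -5
        System.out.println(decode(0b1000_0000, 8));   // 0 (the "negative zero" pattern)
    }
}
```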
Signed-Magnitude
Signed Integer Signed-Magnitude
+5 0000 0101
+4 0000 0100
+3 0000 0011
+2 0000 0010
+1 0000 0001
+0 0000 0000
-0 1000 0000
-1 1000 0001
-2 1000 0010
-3 1000 0011
-4 1000 0100
-5 1000 0101
Ones’ Complement
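Value of an N-bit ones’ complement pattern:
$$V_{1C} = -b_{N-1} 2^{N-1} + \sum_{i=0}^{N-2} b_i \times 2^i + b_{N-1}$$
Example $1010_{1C}$:
$$\begin{aligned}
V_{1C} &= -b_3 \times 2^3 + b_2 \times 2^2 + b_1 \times 2^1 + b_0 \times 2^0 + b_3 \\
&= -1(8) + 0(4) + 1(2) + 0(1) + 1 \\
&= -8 + 2 + 1 \\
&= -5
\end{aligned}$$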
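The ones’ complement formula translates almost term by term into code. This is a sketch under the assumption that the pattern sits in the low `n` bits of an `int`; the class name `OnesComplement` is hypothetical.

```java
// Illustrative sketch: V = -b_{N-1} * 2^(N-1) + sum(b_i * 2^i) + b_{N-1}, for 1 <= n <= 31.
class OnesComplement {
    static int decode(int pattern, int n) {
        int sign = (pattern >> (n - 1)) & 1;        // b_{N-1}
        int low = pattern & ((1 << (n - 1)) - 1);   // sum of b_i * 2^i for i = 0 .. N-2
        return -sign * (1 << (n - 1)) + low + sign;
    }

    public static void main(String[] args) {
        System.out.println(decode(0b1010, 4));        // -5
        System.out.println(decode(0b1000_0000, 8));   // -127
        System.out.println(decode(0b1111_1111, 8));   // 0 (the second zero pattern)
    }
}
```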
Ones’ Complement
Signed Integer Ones’ Complement
+127 0111 1111
+126 0111 1110
... ...
+2 0000 0010
+1 0000 0001
+0 0000 0000
-0 1111 1111
-1 1111 1110
-2 1111 1101
-3 1111 1100
... ...
-126 1000 0001
-127 1000 0000
Two’s Complement
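Value of an N-bit two’s complement pattern:
$$V_{2C} = -b_{N-1} 2^{N-1} + \sum_{i=0}^{N-2} b_i \times 2^i$$
Example $1010_{2C}$:
$$\begin{aligned}
V_{2C} &= -b_3 \times 2^3 + b_2 \times 2^2 + b_1 \times 2^1 + b_0 \times 2^0 \\
&= -1(8) + 0(4) + 1(2) + 0(1) \\
&= -8 + 2 \\
&= -6
\end{aligned}$$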
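A small sketch of the two’s complement formula follows, cross-checked against Java’s own `byte` type, which is an 8-bit two’s complement integer. The class name `TwosComplement` is made up for this example.

```java
// Illustrative sketch: V = -b_{N-1} * 2^(N-1) + sum(b_i * 2^i), for 1 <= n <= 31.
class TwosComplement {
    static int decode(int pattern, int n) {
        int sign = (pattern >> (n - 1)) & 1;        // b_{N-1}, weight -2^(N-1)
        int low = pattern & ((1 << (n - 1)) - 1);   // remaining bits, positive weights
        return -sign * (1 << (n - 1)) + low;
    }

    public static void main(String[] args) {
        System.out.println(decode(0b1010, 4));        // -6
        System.out.println(decode(0b1000_0000, 8));   // -128
        System.out.println((byte) 0b1000_0001);       // -127, Java's built-in 8-bit interpretation
    }
}
```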
Two’s Complement
Signed Integer Two’s Complement
+127 0111 1111
+126 0111 1110
... ...
+2 0000 0010
+1 0000 0001
0 0000 0000
-1 1111 1111
-2 1111 1110
... ...
-126 1000 0010
-127 1000 0001
-128 1000 0000
Sign Extension
Sign extension converts a number to a larger format (more bits) without changing its value.
Copy the sign bit into all of the new high-order bits.
+100 in 8-bit two’s complement binary:     0110 0100
+100 in 16-bit two’s complement binary:    0000 0000 0110 0100
-100 in 8-bit two’s complement binary:     1001 1100
-100 in 16-bit two’s complement binary:    1111 1111 1001 1100
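Sign extension can be sketched as filling the new high-order bits with copies of the sign bit. The helper below is a hypothetical illustration (names `SignExtend` and `extend` are assumptions) that works on patterns held in an `int`, with 1 ≤ n < m ≤ 32.

```java
// Illustrative sketch: extend an n-bit two's complement pattern to m bits.
class SignExtend {
    static int extend(int pattern, int n, int m) {
        int sign = (pattern >> (n - 1)) & 1;    // sign bit of the n-bit pattern
        int low = pattern & ((1 << n) - 1);     // the original n bits
        if (sign == 0) {
            return low;                         // new high-order bits are all 0
        }
        int fill = ((1 << (m - n)) - 1) << n;   // ones in bit positions n .. m-1
        return low | fill;
    }

    public static void main(String[] args) {
        // +100: 0110 0100 -> 0000 0000 0110 0100
        System.out.println(Integer.toBinaryString(extend(0b0110_0100, 8, 16)));
        // -100: 1001 1100 -> 1111 1111 1001 1100
        System.out.println(Integer.toBinaryString(extend(0b1001_1100, 8, 16)));
    }
}
```

Java performs the same operation implicitly when a `byte` is widened to a larger integer type.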
Offset binary
Also known as biased-K representation.
A variation of two’s complement.
Uses a value K as the bias: stored pattern = actual value + K.
Applications:
    Exponent of a floating-point number (biased-127 or biased-1023)
    Analog interfacing
    Excess-3 code (actual value = binary − 3)
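Encoding and decoding a biased-K value is a single addition or subtraction. The sketch below uses hypothetical names (`OffsetBinary`, `encode`, `decode`) and illustrates the bias values mentioned in the applications list.

```java
// Illustrative sketch: biased-K (offset binary) representation.
class OffsetBinary {
    // Stored pattern = actual value + K.
    static int encode(int value, int k) {
        return value + k;
    }

    // Actual value = stored pattern - K.
    static int decode(int stored, int k) {
        return stored - k;
    }

    public static void main(String[] args) {
        System.out.println(encode(2, 127));      // 129: exponent +2 in biased-127
        System.out.println(decode(0b0101, 3));   // 2: excess-3 code 0101
        System.out.println(Integer.toBinaryString(encode(7, 8)));  // 1111: +7 in 4-bit biased-8
        System.out.println(encode(-8, 8));       // 0: -8 in 4-bit biased-8 is 0000
    }
}
```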
Comparing Number Systems
Decimal   Signed-Magnitude   Ones’ Complement   Two’s Complement   Offset Binary
 7        0111               0111               0111               1111
 6        0110               0110               0110               1110
 5        0101               0101               0101               1101
 4        0100               0100               0100               1100
 3        0011               0011               0011               1011
 2        0010               0010               0010               1010
 1        0001               0001               0001               1001
 0        0000               0000               0000               1000
-0        1000               1111               –                  –
-1        1001               1110               1111               0111
-2        1010               1101               1110               0110
-3        1011               1100               1101               0101
-4        1100               1011               1100               0100
-5        1101               1010               1011               0011
-6        1110               1001               1010               0010
-7        1111               1000               1001               0001
-8        –                  –                  1000               0000
Signed Systems Compared
            Unsigned      Signed-Magnitude     Ones’ Complement     Two’s Complement
Smallest    $0$           $-(2^{n-1} - 1)$     $-(2^{n-1} - 1)$     $-2^{n-1}$
Largest     $2^n - 1$     $+(2^{n-1} - 1)$     $+(2^{n-1} - 1)$     $+(2^{n-1} - 1)$
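The range limits can be evaluated for a concrete width with a few shifts. This is a sketch with made-up names (`SignedRanges` and its methods), valid for 1 ≤ n ≤ 62 so that every result fits in a `long`.

```java
// Illustrative sketch: smallest and largest values for an n-bit word.
class SignedRanges {
    static long unsignedMax(int n)  { return (1L << n) - 1; }          // 2^n - 1
    static long symmetricMin(int n) { return -((1L << (n - 1)) - 1); } // signed-magnitude, ones' complement
    static long symmetricMax(int n) { return (1L << (n - 1)) - 1; }
    static long twosMin(int n)      { return -(1L << (n - 1)); }       // -2^(n-1)
    static long twosMax(int n)      { return (1L << (n - 1)) - 1; }    // 2^(n-1) - 1

    public static void main(String[] args) {
        int n = 8;
        System.out.println("unsigned:          0 .. " + unsignedMax(n));                    // 0 .. 255
        System.out.println("signed-magnitude:  " + symmetricMin(n) + " .. " + symmetricMax(n)); // -127 .. 127
        System.out.println("two's complement:  " + twosMin(n) + " .. " + twosMax(n));       // -128 .. 127
    }
}
```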
Real Numbers
Number System     Format               Characteristics
Fixed-point       $\pm i.f$            Low precision
Rational          $\pm p/q$            Difficult to work with
Floating-point    $\pm m \cdot b^e$    Most common way to handle reals
Fixed-Point Numbers
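The general expression for the value of an N-bit two’s complement fixed-point number is:
$$x = \frac{-b_{N-1} 2^{N-1} + \sum_{i=0}^{N-2} b_i \times 2^i}{2^f}$$
where:
N = total number of bits
f = number of bits in the fraction ($0 \le f \le N - 1$)
Bit layout (bit N−1 down to bit 0): S | int | frac, with an imaginary binary point between the int and frac fields.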
Expression for Two’s Comp.
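Value of an N-bit two’s complement integer, f = 0:
$$\begin{aligned}
x &= \frac{-b_{N-1} 2^{N-1} + \sum_{i=0}^{N-2} b_i \times 2^i}{2^0} \\
&= -b_{N-1} 2^{N-1} + b_{N-2} 2^{N-2} + \cdots + b_1 2^1 + b_0 2^0
\end{aligned}$$
Bit layout (bit N−1 down to bit 0): S | int, with the imaginary binary point to the right of bit 0.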
Expression for Q-Format
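Value of an N-bit fixed-point number, f = N − 1:
$$\begin{aligned}
x &= \frac{-b_{N-1} 2^{N-1} + \sum_{i=0}^{N-2} b_i \times 2^i}{2^{N-1}} \\
&= -b_0 + b_{-1} 2^{-1} + b_{-2} 2^{-2} + \cdots + b_{-(N-1)} 2^{-(N-1)}
\end{aligned}$$
In the second line the bits are relabelled by weight: $b_0$ is the sign bit and $b_{-k}$ is the bit with weight $2^{-k}$.
Bit layout (bit N−1 down to bit 0): S | frac, with the imaginary binary point immediately to the right of the sign bit.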
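N = 8, f = 4
Weights      $-2^3$   $2^2$   $2^1$   $2^0$   .   $2^{-1}$   $2^{-2}$   $2^{-3}$   $2^{-4}$
Bit value    0        1       0       1       .   1          1          0          0
$$\begin{aligned}
0101.1100_2 &= 2^2 + 2^0 + 2^{-1} + 2^{-2} \\
&= 4 + 1 + 0.5 + 0.25 \\
&= 5.75_{10}
\end{aligned}$$
OR
Weights      $-2^7$   $2^6$   $2^5$   $2^4$   $2^3$   $2^2$   $2^1$   $2^0$   (then divide by $2^4$)
Bit value    0        1       0       1       1       1       0       0
$$\begin{aligned}
x &= (2^6 + 2^4 + 2^3 + 2^2) \div 2^4 \\
&= (64 + 16 + 8 + 4) \div 16 \\
&= 5.75_{10}
\end{aligned}$$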
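N = 8, f = 7 → Q7 format
Weights      $-2^0$   $2^{-1}$   $2^{-2}$   $2^{-3}$   $2^{-4}$   $2^{-5}$   $2^{-6}$   $2^{-7}$
Bit value    0        1          1          1          1          1          1          1
$$\begin{aligned}
\max &= 2^{-1} + 2^{-2} + 2^{-3} + 2^{-4} + 2^{-5} + 2^{-6} + 2^{-7} \\
&= (2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^1 + 2^0) \div 2^7 \\
&= 127/128 \\
&= 0.9921875
\end{aligned}$$
Weights      $-2^0$   $2^{-1}$   $2^{-2}$   $2^{-3}$   $2^{-4}$   $2^{-5}$   $2^{-6}$   $2^{-7}$
Bit value    1        0          0          0          0          0          0          0
$$\begin{aligned}
\min &= -2^0 \\
&= -1
\end{aligned}$$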
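Both worked examples (N = 8 with f = 4, and Q7) follow the same recipe: read the pattern as a two’s complement integer, then divide by 2^f. A minimal sketch, with the class name `FixedPoint` assumed for illustration:

```java
// Illustrative sketch: value of an n-bit two's complement fixed-point pattern
// with f fraction bits (1 <= n <= 31, 0 <= f <= n - 1).
class FixedPoint {
    static double decode(int pattern, int n, int f) {
        int sign = (pattern >> (n - 1)) & 1;
        int low = pattern & ((1 << (n - 1)) - 1);
        int integer = -sign * (1 << (n - 1)) + low;   // two's complement integer value
        return integer / (double) (1 << f);           // divide by 2^f
    }

    public static void main(String[] args) {
        System.out.println(decode(0b0101_1100, 8, 4));   // 5.75
        System.out.println(decode(0b0111_1111, 8, 7));   // 0.9921875 (Q7 maximum)
        System.out.println(decode(0b1000_0000, 8, 7));   // -1.0 (Q7 minimum)
    }
}
```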
Multiplying Q15 Numbers
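Q15 operand:    bits 15..0, sign bit S at bit 15
× Q15 operand:  bits 15..0, sign bit S at bit 15
Q30 product:    bits 31..0, two sign bits S S at bits 31 and 30
Rounding is done by adding a '1' at bit 14 of the Q30 product (+ 100 0000 0000 0000 in binary).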
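The multiply-and-round sequence can be sketched as follows. It assumes Q15 values are stored in Java `short`s and that the Q15 result is taken from bits 30..15 of the product; the class and method names are hypothetical, and saturation is deliberately left out.

```java
// Illustrative sketch: Q15 x Q15 multiplication with rounding.
class Q15 {
    static short multiply(short a, short b) {
        int product = a * b;             // 32-bit Q30 product with two sign bits
        product += 1 << 14;              // rounding: add a '1' at bit 14
        return (short) (product >> 15);  // keep bits 30..15 as the Q15 result
        // Caveat: (-1.0) * (-1.0) overflows Q15; no saturation is applied here.
    }

    static short fromDouble(double x) {
        return (short) Math.round(x * 32768.0);   // scale by 2^15
    }

    static double toDouble(short q) {
        return q / 32768.0;
    }

    public static void main(String[] args) {
        short half = fromDouble(0.5);
        System.out.println(toDouble(multiply(half, half)));                // 0.25
        System.out.println(toDouble(multiply(fromDouble(-0.5), half)));    // -0.25
    }
}
```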
What is Floating Point?
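$$\begin{aligned}
5.25_{10} &= 101.01_2 \times 2^0 \\
&= 10.101_2 \times 2^1 \\
&= 1.0101_2 \times 2^2 \quad \leftarrow \\
&= 0.10101_2 \times 2^3
\end{aligned}$$
The binary point “floats” to a pre-defined position.
This process is called normalization.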
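Normalization of 5.25 can be reproduced with the standard `Math.getExponent` and `Math.scalb` methods. This is only a sketch for finite, non-zero, normal `double` values; the class name `Normalize` is an assumption.

```java
// Illustrative sketch: split x into a significand in [1, 2) and a power-of-two exponent.
class Normalize {
    static int exponentOf(double x) {
        return Math.getExponent(x);              // unbiased exponent e
    }

    static double significandOf(double x) {
        return x / Math.scalb(1.0, Math.getExponent(x));   // x / 2^e
    }

    public static void main(String[] args) {
        System.out.println(exponentOf(5.25));      // 2
        System.out.println(significandOf(5.25));   // 1.3125, which is 1.0101 in binary
    }
}
```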
Floating Point Parts
Example: $-5.4321 \times 10^{-2}$
    Sign of mantissa: −
    Mantissa: 5.4321
    Radix: 10
    Sign of exponent: −
    Exponent: 2
$$\pm X = m \times b^e$$
where m = mantissa, b = number base and e = exponent.
IEEE Single-Precision Format
Bit layout: bit 31 = S (sign), bits 30..23 = 8-bit exp, bits 22..0 = 23-bit frac
$$\pm X = (-1)^s \times 1.m \times 2^{e-127}$$
Sign field: 0 for positive numbers ($(-1)^0 = +1$),
1 for negative numbers ($(-1)^1 = -1$).
Exponent field: unsigned 8-bit value, biased-127.
Mantissa field: the bits to the right of the binary point of the normalized binary number.
IEEE Single-Precision Format
1 1000 0001 110 0000 0000 0000 0000 0000
$$\begin{aligned}
X &= (-1)^1 \times 1.11_2 \times 2^{129-127} \\
&= -1 \times 1.75_{10} \times 2^2 \\
&= -7
\end{aligned}$$
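The same decoding can be sketched in Java by extracting the three fields with shifts and masks. The bit pattern above is 0xC0E00000 in hexadecimal. The class name `Ieee754Single` is made up for this example, and the sketch only handles normalized numbers (it ignores the special exponent values 0 and 255, which these slides do not cover).

```java
// Illustrative sketch: decode a normalized IEEE single-precision bit pattern.
class Ieee754Single {
    static double decode(int bits) {
        int s = bits >>> 31;              // sign field (bit 31)
        int e = (bits >>> 23) & 0xFF;     // 8-bit biased exponent (bits 30..23)
        int frac = bits & 0x7FFFFF;       // 23-bit fraction (bits 22..0)
        double significand = 1.0 + frac / (double) (1 << 23);   // 1.m
        return Math.pow(-1, s) * significand * Math.pow(2, e - 127);
    }

    public static void main(String[] args) {
        int bits = 0xC0E00000;   // 1 1000 0001 110 0000 0000 0000 0000 0000
        System.out.println(decode(bits));                 // -7.0
        System.out.println(Float.intBitsToFloat(bits));   // -7.0, using the library conversion
    }
}
```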
IEEE Double-Precision Format
Double precision is more common.
Bit layout: bit 63 = S (sign), bits 62..52 = 11-bit exp, bits 51..0 = 52-bit frac
$$\pm X = (-1)^s \times 1.m \times 2^{e-1023}$$
public class TryFP {
    public static void main(String[] args) {
        double d = 1 / 3.;   // Java defaults to double precision for floating-point literals
        float f = 1f / 3f;   // the f suffix forces single precision
        System.out.println("Value of d=" + d);
        System.out.println("Value of f=" + f);
    }
}
Output:
Value of d=0.3333333333333333
Value of f=0.33333334
FX vs FP
Fixed-Point Arithmetic                                            Floating-Point Arithmetic
Simple circuit                                                    Complex circuit (due to rounding and normalization)
Small area and faster                                             Large area and slower
Less accurate (the result is truncated if it exceeds the size)    More accurate (high precision)
Smaller range of values can be handled                            Wider range of values can be handled