Introduction to Vectors in Math 1229
Unit 1:
Vectors
(text reference: Section 1.1)
© V. Olds 2010
1 Vectors
You are familiar with the set of real numbers. The real numbers are all the numbers you’ve
ever heard of or can imagine (unless you’ve learnt about, or at least heard of, imaginary numbers;
they’re not real numbers). The real numbers include all the integers (positive and negative, and of
course 0), fractions (which, together with the integers, are called rational numbers), and decimals
that can’t be expressed as fractions (the irrational numbers): all the numbers along the real number
line from −∞ to ∞. We call the set of real numbers ℜ.
You’re also familiar with the x-y plane. You’ve drawn graphs in this plane. There’s the x-axis
(horizontal) and the y-axis (vertical), which cross at the origin. Each axis is basically a real number
line. Any point in this plane can be expressed as an ordered pair, (x, y), giving its x-coordinate and
its y-coordinate. Each coordinate can be any real number. We call the x-y plane 2-space and the
set containing all of the points in this plane is called ℜ². (We pronounce that “R2”, i.e. “Artoo”.)
So ℜ² can be thought of as the set of all ordered pairs of real numbers.
You’ve probably also seen something even more complicated, with 3 axes: the x-axis, the y-axis
and the z-axis. Each axis is perpendicular to both of the others, which makes it hard to draw on a
piece of paper or a blackboard. They’re often drawn with the y-axis horizontal, the z-axis vertical,
and the x-axis off at a funny angle, to represent that it’s coming straight out of the page at you. The
3 axes represent the 3 dimensions of “space”, i.e. reality. Like the room you’re sitting in. There’s
not only up and down, and left and right, but also near and far, or here and there, or ... well, you
know, that third dimension, which we might call depth. In Math, we call the region defined by these
3 axes 3-space. And points in 3-space are represented by ordered triples, (x, y, z), giving the x-, y-
and z-coordinates of the point. As before, each of these coordinates can be any real number. The
set containing all of the points in 3-space is called ℜ³ (pronounced “R3”), and we can think of this
as the set of all ordered triples of real numbers.
Now that you’ve got that straight, let’s confuse things. Sometimes when we write an ordered
pair or an ordered triple, it doesn’t represent a point. Instead, it represents a directed line segment,
called a vector. And when the pair or triple represents a vector, the numbers in it (or symbols
representing numbers) aren’t called coordinates, they’re called components.
That seems like it will be confusing, using the same notation to denote two different things.
It’s not too bad, though, because you can tell by the context whether a pair or triplet is a point
or a vector. And because we write the names differently, depending whether it’s a point or a vec-
tor. Points are named with normal capital letters, usually P or something nearby in the alphabet,
and sometimes with subscripts. So we might have the points $P(p_1, p_2)$ and $Q(q_1, q_2)$, or the points
$P_1(x_1, y_1, z_1)$ and $P_2(x_2, y_2, z_2)$. With vectors, we don’t use capital letters, and we do something
to show that it’s a vector. In our textbook, they use boldface type for the name of a vector. For
instance, the vector $\mathbf{v} = (v_1, v_2)$ or the vector $\mathbf{u} = (u_1, u_2, u_3)$. “Sure”, I hear you thinking, “that’s
easy enough for you, but what about me, when I’m writing with a pen or pencil?” Well, that’s why
we’re going to use a different convention in these notes. One that’s much more obvious. When a
letter is the name of a vector, we’ll put an arrow over it. Like this: $\vec{v} = (v_1, v_2)$.
So why is it that we represent a vector, which we said is a directed line segment, using something
that looks like a point? Well, it’s just a convention. It’s shorthand. When we say ~v = (1, 2), what
we mean is that the vector ~v is the directed line segment that starts at the origin and ends at the
point (1, 2). So here’s a picture of the vector ~v = (1, 2). It starts at the point (0, 0), and then ends
at a place that’s 1 unit to the right and 2 units up from there.
[Figure: the vector $\vec{v} = (1, 2)$ drawn in the x-y plane as an arrow from the origin to the point (1, 2).]
Definition: The vector $\vec{v} = (v_1, v_2)$ in ℜ² is the directed line segment that goes
from the origin (i.e. the point (0, 0)) to the point $V(v_1, v_2)$. Similarly, the vector
$\vec{v} = (v_1, v_2, v_3)$ in ℜ³ is the directed line segment that goes from the point (0, 0, 0) to
the point $V(v_1, v_2, v_3)$. The point $V$ is called the endpoint of the vector $\vec{v}$.
If we want to say that ~v is a vector in ℜ2 , i.e. a vector which has two components, then we say
~v ∈ ℜ2 . You’ve probably seen that symbol before, for instance for saying that a particular object
is an element of a particular set. Similarly, we can say ~v ∈ ℜ3 to state that ~v is a vector in ℜ3 .
Notice that earlier we said that ℜ2 was the set of all points in the x-y plane. But now we’re saying
that a vector is in that set. Hmm. If ℜ2 is a set, does it contain points or vectors? Well, actually,
we can think of it either way. We can describe 2-space as the set of all points in the x-y plane, or
as the set of all vectors in the x-y plane. Or we can define it simply as the set of all ordered pairs
(x, y). That’s probably the best way to think of it. Because an ordered pair can (as we’ve already
discussed) represent either a point or a vector, depending on the context. Likewise, we’ll think of
ℜ3 as simply the set of all ordered triples (x, y, z).
There is a vector in ℜ2 corresponding to each point in the x-y plane. Likewise, there is a vector
in ℜ3 corresponding to each point in 3-space. But wait a minute! What about the point (0,0) or
(0,0,0)? That’s where the directed line segment starts. So can there be a directed line segment that
goes from that point to itself? Well, yes. Although you won’t be able to see it, and you won’t care,
or be able to determine, what direction it goes in. That is, in spite of the fact that “the line segment
from the point (0,0) to the point (0,0)” seems nonsensical, because there’s no line segment there,
we do consider there to be a vector ~v = (0, 0). It actually comes in very handy. We call it the “zero
vector”, and give it the name ~0.
Definition: A zero vector is a vector whose components are all 0. The zero vector
in ℜ² is the vector $\vec{0} = (0, 0)$. Similarly, $\vec{0} = (0, 0, 0)$ is the zero vector in ℜ³.
Whenever we define a new mathematical construct, we need to define what “equality” means for
that construct. Even if it seems pretty obvious. So we need to define what it means to say that
two vectors are equal. We’ve already used that concept, in attaching names to vectors. For instance
when we say ~v = (v1 , v2 ), we’re saying that the vector whose name is ~v is equal to the vector in ℜ2
whose first component is v1 and whose second component is v2 . Likewise, when we say ~0 = (0, 0, 0),
we’re saying that the vector whose name is ~0 is equal to the vector in ℜ3 whose components are
all 0. But of course, what we really meant there is “this is the name I’m going to call that vector
by”, rather than “here are 2 different vectors, and they’re equal”. But often we do need to equate 2
vectors in that sense, too. Or to say that the vector you get when you do certain vector arithmetic
operations (which we’ll learn about shortly) is equal to a specified vector. So what do we mean
when we say, for instance, that ~u = ~v ?
Definition: Two vectors are equal if they are vectors in the same space and their
corresponding components are equal. That is, two vectors in ℜ² are equal if they have
the same first component and also have the same second component. Similarly, two
vectors in ℜ³ are equal if they have the same first component and have the same second
component and have the same third component. In mathematical notation, we have
$$(u_1, u_2) = (v_1, v_2) \quad\text{if and only if}\quad u_1 = v_1 \text{ and } u_2 = v_2$$
and similarly
$$(u_1, u_2, u_3) = (v_1, v_2, v_3) \quad\text{if and only if}\quad u_1 = v_1,\ u_2 = v_2 \text{ and } u_3 = v_3$$
Notice that in order for 2 vectors to be equal they must be vectors in the same space. If ~u ∈ ℜ2 and
~v ∈ ℜ3 then they can never be equal vectors, no matter what their components are.
Example 1.1. If ~u = (a, 2) and ~v = (−1, b), where it is known that ~u = ~v , what are the values of a
and b?
Solution:
Since ~u = ~v , then (a, 2) = (−1, b). And for these vectors to be equal, their respective components
must be equal. Since the first component of ~v is −1 and ~u = ~v , then the first component of ~u must
also be −1. And a is the first component of ~u, so it must be true that a = −1. Likewise, since the
second component of ~u is 2 and ~v = ~u, then the second component of ~v must also be 2. But the
second component of ~v is b, and so b = 2.
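The definition of equality can be sketched in a few lines of Python. This is only an illustration: vectors are represented as tuples of numbers, and the function name `vectors_equal` is a hypothetical choice, not something from the textbook.

```python
from typing import Sequence

def vectors_equal(u: Sequence[float], v: Sequence[float]) -> bool:
    """True when u and v are in the same space and agree component by component."""
    if len(u) != len(v):
        # A vector in R^2 is never equal to a vector in R^3.
        return False
    return all(a == b for a, b in zip(u, v))

# Example 1.1: (a, 2) = (-1, b) holds exactly when a = -1 and b = 2.
a, b = -1, 2
print(vectors_equal((a, 2), (-1, b)))    # True
print(vectors_equal((0, 0), (0, 0, 0)))  # False: different spaces
```

The length check comes first because vectors in different spaces can never be equal, no matter what their components are.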
All vectors start at the origin. There are infinitely many lines that pass through the origin. For
any vector (other than the zero vector), there’s exactly one line through the origin that the vector
lies on. And often it’s important to realize whether or not 2 vectors lie on the same line. We have
a word for that. Collinear just means “same line”.
Definition: Two vectors in the same space are collinear if they lie on the same line.
Of course, if two vectors lie on the same line, they must be parallel to one another. So if two vectors
are collinear, they are also parallel, and vice versa.
Example 1.2. Consider the vectors $\vec{u} = (1, 2)$, $\vec{v} = (2, 1)$ and $\vec{w} = (2, 4)$. Which of these vectors
are collinear?
Solution:
First we’ll draw each vector on its own set of axes.
[Figure: three sets of axes, showing $\vec{u}$, $\vec{v}$ and $\vec{w}$ respectively, each drawn from the origin.]
We can also show the line that each vector lies on:
[Figure: the same three sets of axes, each also showing the line through the origin on which the vector lies.]
It’s pretty clear that the line that $\vec{u}$ lies on is not the same as the line that $\vec{v}$ lies on. For $\vec{u}$ and $\vec{w}$,
they look pretty much the same, but how can we be sure? We draw them all on the same axes:
[Figure: the vectors $\vec{u}$, $\vec{v}$ and $\vec{w}$ drawn on one set of axes.]
From this last diagram, even without the lines drawn in we can see that vectors $\vec{u}$ and $\vec{v}$ are certainly
not collinear, and also that vector $\vec{w}$ lies right on top of $\vec{u}$, because they are collinear.
All vectors start at the origin, so all vectors (in the same space) touch one another. And yet,
we talk about the distance between vectors. By this, we mean the distance between the endpoints
of the vectors.
Definition: The distance between two vectors $\vec{u}$ and $\vec{v}$ is defined to be the distance
between their endpoints and is denoted $d(\vec{u}, \vec{v})$. In ℜ² we have:
$$d(\vec{u}, \vec{v}) = \sqrt{(v_1 - u_1)^2 + (v_2 - u_2)^2}$$
Similarly, in ℜ³ we have:
$$d(\vec{u}, \vec{v}) = \sqrt{(v_1 - u_1)^2 + (v_2 - u_2)^2 + (v_3 - u_3)^2}$$
The formula for the distance between two vectors in ℜ2 is just an application of the Pythagorean
Theorem, found by considering the line segment joining the two endpoints to be the hypotenuse of a
right-angled triangle. The height of the triangle is the vertical distance between the two points (i.e.
the difference between their y-coordinates, or the second components of the vectors) and likewise
the length of the base of the triangle is the horizontal distance between the two points (i.e. the
difference between their x-coordinates, or the first components of the vectors). The formula for the
distance between two vectors in ℜ3 is based on the same idea, but in 3 dimensions. Notice that
because the terms inside the square root are each squared, the distance between ~u and ~v is the same
as the distance between ~v and ~u. That is, d(~u, ~v ) = d(~v , ~u). So it doesn’t matter which vector is
mentioned first.
Example 1.3. Let $\vec{u} = (1, 2)$, $\vec{v} = (2, 1)$ and $\vec{w} = (2, 4)$. Find (a) the distance between $\vec{u}$ and $\vec{v}$,
and (b) the distance between $\vec{w}$ and $\vec{u}$.
Solution:
(a) We have $u_1 = 1$, $u_2 = 2$, $v_1 = 2$ and $v_2 = 1$. (That is, when we refer to the components of $\vec{u}$ as
$u_1$ and $u_2$, we simply mean whatever numbers are the first and second components, respectively, of
the vector. Similarly for a vector with three components.) So for the distance between $\vec{u}$ and $\vec{v}$ we
get
$$d(\vec{u}, \vec{v}) = \sqrt{(v_1 - u_1)^2 + (v_2 - u_2)^2} = \sqrt{(2 - 1)^2 + (1 - 2)^2} = \sqrt{(1)^2 + (-1)^2} = \sqrt{1 + 1} = \sqrt{2}$$
(b) For the distance between $\vec{w}$ and $\vec{u}$ we do a similar calculation:
$$d(\vec{w}, \vec{u}) = \sqrt{(u_1 - w_1)^2 + (u_2 - w_2)^2} = \sqrt{(1 - 2)^2 + (2 - 4)^2} = \sqrt{(-1)^2 + (-2)^2} = \sqrt{1 + 4} = \sqrt{5}$$
Notice that the formula says to take, for each component, the square of a difference: the component
of the second-mentioned vector minus the corresponding component of the first-mentioned vector. So
even though the formula said $(v_1 - u_1)^2$, in calculating $d(\vec{w}, \vec{u})$ we put $u_1$ before the minus sign,
not after. (That is, in this calculation, $\vec{u}$ was filling the role played by $\vec{v}$ in the formula.) In this
particular formula, it doesn’t matter because, as previously mentioned, the distance is the same,
whether we think of it as the distance between $\vec{w}$ and $\vec{u}$ or as the distance between $\vec{u}$ and $\vec{w}$.
(Is the distance between here and there any different than the distance between there and here?
Well, I suppose sometimes, when there are one-way streets involved. But not usually.) In other
situations, though, it will be important to use the right vector in the right place in the formula.
Example 1.4. Find the distance between ~u = (1, 2, 3) and ~v = (−1, 0, 1).
Solution:
We do the same sort of calculation as before, but now with a third component. We get:
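$$\begin{aligned}
d(\vec{u}, \vec{v}) &= \sqrt{(v_1 - u_1)^2 + (v_2 - u_2)^2 + (v_3 - u_3)^2} = \sqrt{(-1 - 1)^2 + (0 - 2)^2 + (1 - 3)^2} \\
&= \sqrt{(-2)^2 + (-2)^2 + (-2)^2} = \sqrt{4 + 4 + 4} = \sqrt{4 \times 3} = \sqrt{4}\sqrt{3} = 2\sqrt{3}
\end{aligned}$$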
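The distance formula works the same way in ℜ² and ℜ³, so one short function covers both. The following is a minimal sketch, assuming vectors are stored as tuples; the name `distance` is a hypothetical choice.

```python
import math
from typing import Sequence

def distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Distance between the endpoints of u and v (both in R^2 or both in R^3)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    # Square each component difference, add the squares, take the square root.
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(u, v)))

print(distance((1, 2), (2, 1)))         # sqrt(2), about 1.414
print(distance((2, 4), (1, 2)))         # sqrt(5), about 2.236
print(distance((1, 2, 3), (-1, 0, 1)))  # 2*sqrt(3), about 3.464
```

Because each difference is squared, swapping the two arguments gives the same result.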
The length of a vector is defined as the distance between where it starts and where it ends. That
is, the distance from the origin (where all vectors start) to the endpoint of the vector. There are
some other words we sometimes use which mean exactly the same thing. We sometimes talk about
the magnitude or the norm of a vector. These terms both mean the length of the vector.
Definition: The length of a vector $\vec{v}$, also called the magnitude or the norm of the
vector, is denoted by $\|\vec{v}\|$ and is defined to be
$$\|\vec{v}\| = d(\vec{0}, \vec{v})$$
Therefore for $\vec{v}$ ∈ ℜ² we have:
$$\|\vec{v}\| = \sqrt{(v_1)^2 + (v_2)^2}$$
and for $\vec{v}$ ∈ ℜ³ we have:
$$\|\vec{v}\| = \sqrt{(v_1)^2 + (v_2)^2 + (v_3)^2}$$
Any vector whose length is 1 is called a unit vector.
Notice that the only way to get ||~v || = 0 is by having each component of ~v be 0. So ||~v || = 0 if and
only if ~v = ~0.
Example 1.5. Find the length of $\vec{u} = (1, 2)$, the magnitude of $\vec{v} = (2, 1)$ and the norm of $\vec{w} = (2, 4)$.
Solution:
Length, magnitude and norm all mean the same thing, so we just use the same formula three times,
once for each vector. We get
$$\|\vec{u}\| = \sqrt{1^2 + 2^2} = \sqrt{1 + 4} = \sqrt{5}$$
$$\|\vec{v}\| = \sqrt{2^2 + 1^2} = \sqrt{4 + 1} = \sqrt{5}$$
$$\|\vec{w}\| = \sqrt{2^2 + 4^2} = \sqrt{4 + 16} = \sqrt{20}$$
Notice that $\sqrt{20} = \sqrt{(4)(5)} = \sqrt{4}\sqrt{5} = 2\sqrt{5}$. The length of $\vec{w}$ is twice the length of $\vec{u}$. That is,
$\|\vec{w}\| = 2\|\vec{u}\|$.
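Example 1.6. Find $\left\| \left( \frac{3}{5}, 0, -\frac{4}{5} \right) \right\|$.
Solution:
$$\begin{aligned}
\left\| \left( \frac{3}{5}, 0, -\frac{4}{5} \right) \right\| &= \sqrt{\left(\frac{3}{5}\right)^2 + (0)^2 + \left(-\frac{4}{5}\right)^2} \\
&= \sqrt{\frac{3^2}{5^2} + 0 + \frac{(-4)^2}{5^2}} \\
&= \sqrt{\frac{9}{25} + \frac{16}{25}} \\
&= \sqrt{\frac{9 + 16}{25}} = \sqrt{\frac{25}{25}} = \sqrt{1} = 1
\end{aligned}$$
Notice: Since $\left\| \left( \frac{3}{5}, 0, -\frac{4}{5} \right) \right\| = 1$ then $\left( \frac{3}{5}, 0, -\frac{4}{5} \right)$ is a unit vector.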
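The length calculations in Examples 1.5 and 1.6 can be checked with a short sketch like the one below. The names `norm` and `is_unit_vector`, and the tolerance `tol` used to absorb floating-point rounding, are assumptions made for illustration.

```python
import math
from typing import Sequence

def norm(v: Sequence[float]) -> float:
    """Length (magnitude, norm) of v: the distance from the origin to its endpoint."""
    return math.sqrt(sum(x ** 2 for x in v))

def is_unit_vector(v: Sequence[float], tol: float = 1e-12) -> bool:
    """True when the length of v is within tol of 1 (tol absorbs rounding error)."""
    return abs(norm(v) - 1.0) <= tol

print(norm((1, 2)), norm((2, 4)))          # sqrt(5) and 2*sqrt(5)
print(is_unit_vector((3 / 5, 0, -4 / 5)))  # True
print(is_unit_vector((1, 2)))              # False
```

A tolerance is used rather than an exact comparison because fractions such as 3/5 are not stored exactly as floating-point numbers.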
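Example 1.7. If $\vec{u} = \left( \frac{2}{3}, \frac{2}{3}, k \right)$ is a unit vector, what is the value of $k$?
Solution:
We need $\|\vec{u}\| = 1$ in order for $\vec{u}$ to be a unit vector. Since $\|\vec{u}\|$ is given by
$$\|\vec{u}\| = \sqrt{\left(\frac{2}{3}\right)^2 + \left(\frac{2}{3}\right)^2 + k^2} = \sqrt{\frac{4}{9} + \frac{4}{9} + k^2} = \sqrt{\frac{8}{9} + k^2}$$
then we must have
$$\sqrt{\frac{8}{9} + k^2} = 1$$
so
$$\left(\sqrt{\frac{8}{9} + k^2}\right)^2 = 1^2 \quad\rightarrow\quad \frac{8}{9} + k^2 = 1$$
therefore
$$k^2 = 1 - \frac{8}{9} = \frac{9}{9} - \frac{8}{9} = \frac{9 - 8}{9} = \frac{1}{9}$$
and so
$$k = \pm\sqrt{\frac{1}{9}} = \pm\frac{\sqrt{1}}{\sqrt{9}} = \pm\frac{1}{3}$$
(That is, since both $\left(\frac{1}{3}\right)^2 = \frac{1}{9}$ and $\left(-\frac{1}{3}\right)^2 = \frac{1}{9}$, then knowing that $k^2 = \frac{1}{9}$ tells us that $k$ is one of
these 2 values, but doesn’t tell us which one it is.)
We see that $k$ could be either $\frac{1}{3}$ or $-\frac{1}{3}$. That is, both $\left( \frac{2}{3}, \frac{2}{3}, \frac{1}{3} \right)$ and $\left( \frac{2}{3}, \frac{2}{3}, -\frac{1}{3} \right)$ are unit vectors,
and $\vec{u}$ could be either of these vectors.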
Definition: For any scalar $c$ and any vector $\vec{v} = (v_1, v_2)$ in ℜ², the scalar multiple of $\vec{v}$ by $c$
is the vector $c\vec{v} = (cv_1, cv_2)$. Similarly, for $\vec{v} = (v_1, v_2, v_3)$ in ℜ³, $c\vec{v} = (cv_1, cv_2, cv_3)$.
For instance, for the vectors in Examples 1.2 and 1.3 we had
$$\vec{w} = (2, 4) = (2 \times 1, 2 \times 2) = 2(1, 2) = 2\vec{u}$$
The vector $\vec{w}$ is two times the vector $\vec{u}$, which is why we found that it was twice as long, i.e. has
length twice the length of $\vec{u}$.
Every scalar multiple of a vector lies along the same line as that vector. And from the origin,
there are (only) two directions you can go along a line that passes through the origin. So there are
two directions which a scalar multiple of a vector could go: the same direction as the vector, or the
opposite direction. For instance, if ~v goes straight up, then some scalar multiples of ~v go straight
up, and others go straight down. Or if the scalar is 0, then the scalar multiple doesn’t go anywhere
at all. Scalars bigger than 0 give vectors with the same direction, while scalars less than 0 give
vectors with the opposite direction. And whether a new vector goes twice as far as ~v in the same
direction as ~v , or goes twice as far as ~v in the opposite direction, it will still be true that the new
vector is twice as long as ~v . That is, both 2~v and −2~v have magnitude 2||~v ||, but 2~v goes in the
same direction as ~v , while −2~v goes in the opposite direction.
Theorem 1.1. Let $\vec{v}$ be any vector, either in ℜ² or ℜ³, and consider any $c$ ∈ ℜ. The vectors $\vec{v}$ and
$c\vec{v}$ are collinear, and
$$\|c\vec{v}\| = |c|\,\|\vec{v}\|$$
Saying that two vectors are collinear is the same as saying that they are parallel. So ~v and c~v
are always parallel to one another, no matter what the value of the scalar c is. And the last part of
the theorem says that you can find the magnitude of the scalar multiple of a vector by multiplying
the magnitude of the vector by the absolute value of the scalar. So for instance, as observed before,
both 2~v and −2~v have magnitude 2||~v ||.
Notice that the zero vector is collinear to every vector, because for any ~v we have 0~v = ~0, so the
zero vector is a scalar multiple of every vector (with the same number of components).
Example 1.8. Let $\vec{u} = (2, -3)$ and $\vec{v} = (0, -1, 2)$. Find the vectors $2\vec{u}$, $-0.5\vec{u}$, $-3\vec{v}$ and $\frac{10}{3}\vec{v}$, and
find the magnitude of each of these vectors.
Solution:
We could find the magnitudes of these new vectors using the appropriate formula for each. But
it’s easier to just find the magnitudes of $\vec{u}$ and $\vec{v}$ and use those to find the magnitudes of the given
vectors.
$$\|\vec{u}\| = \|(2, -3)\| = \sqrt{(2)^2 + (-3)^2} = \sqrt{4 + 9} = \sqrt{13}$$
so
$$\|2\vec{u}\| = |2|\,\|\vec{u}\| = 2\sqrt{13}$$
and
$$\|-0.5\vec{u}\| = |-0.5|\,\|\vec{u}\| = 0.5\sqrt{13} = \frac{\sqrt{13}}{2} = \frac{\sqrt{13}}{\sqrt{4}} = \sqrt{\frac{13}{4}} = \sqrt{3.25}$$
Also
$$\|\vec{v}\| = \|(0, -1, 2)\| = \sqrt{(0)^2 + (-1)^2 + (2)^2} = \sqrt{0 + 1 + 4} = \sqrt{5}$$
so
$$\|-3\vec{v}\| = |-3|\,\|\vec{v}\| = 3\sqrt{5}$$
and
$$\left\|\frac{10}{3}\vec{v}\right\| = \left|\frac{10}{3}\right| \|\vec{v}\| = \frac{10}{3}\sqrt{5} = \frac{10\sqrt{5}}{3}$$
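The scalar multiples asked for in Example 1.8 can be computed component by component, and Theorem 1.1 can be checked numerically at the same time. This is a sketch under the assumption that vectors are tuples; `scale` and `norm` are hypothetical helper names.

```python
import math
from typing import Sequence, Tuple

def scale(c: float, v: Sequence[float]) -> Tuple[float, ...]:
    """Scalar multiple c*v: multiply every component of v by c."""
    return tuple(c * x for x in v)

def norm(v: Sequence[float]) -> float:
    """Length of v."""
    return math.sqrt(sum(x ** 2 for x in v))

u = (2, -3)
v = (0, -1, 2)
for c, vec in [(2, u), (-0.5, u), (-3, v), (10 / 3, v)]:
    multiple = scale(c, vec)
    # Theorem 1.1: the length of c*v is |c| times the length of v.
    assert math.isclose(norm(multiple), abs(c) * norm(vec))
    print(c, multiple, norm(multiple))
```

For instance, the first pass through the loop produces the vector (4, -6), whose length is 2√13.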
Example 1.9. Let ~u = (4, −3) and ~v = (5, 0, −2). Find a unit vector in the same direction as ~u, and
a unit vector in the opposite direction to ~v .
Solution:
The magnitude of $\vec{u}$ is
$$\|\vec{u}\| = \|(4, -3)\| = \sqrt{(4)^2 + (-3)^2} = \sqrt{16 + 9} = \sqrt{25} = 5$$
We get a vector in the same direction as $\vec{u}$ by taking a scalar multiple of $\vec{u}$, using a positive scalar.
Let $c$ be the positive scalar for which $c\vec{u}$ is a unit vector. Then we need $\|c\vec{u}\| = 1$. But we know
that $\|c\vec{u}\| = |c|\,\|\vec{u}\|$, where in this case $\|\vec{u}\| = 5$, and since $c$ is positive, then $|c| = c$. So we need
$$c\|\vec{u}\| = 1 \quad\Rightarrow\quad 5c = 1 \quad\Rightarrow\quad c = \frac{1}{5}$$
Therefore, a unit vector in the same direction as $\vec{u}$ is the vector
$$c\vec{u} = \frac{1}{5}(4, -3) = \left(\frac{4}{5}, -\frac{3}{5}\right)$$
Notice: For any non-zero vector $\vec{u}$, if $c\vec{u}$ is a unit vector, then $|c| = \frac{1}{\|\vec{u}\|}$.
We take a similar approach for finding a unit vector in the opposite direction to $\vec{v}$. As we’ve already
seen, for a unit vector we need to scale $\vec{v}$ by a constant whose magnitude (i.e. absolute value) is
$\frac{1}{\|\vec{v}\|}$. And for the vector to have the opposite direction to that of $\vec{v}$, the constant must be negative.
So to get a unit vector in the opposite direction to $\vec{v}$, we multiply $\vec{v}$ by the scalar $-\frac{1}{\|\vec{v}\|}$. We have
$$\|\vec{v}\| = \|(5, 0, -2)\| = \sqrt{(5)^2 + (0)^2 + (-2)^2} = \sqrt{25 + 0 + 4} = \sqrt{29}$$
and so a unit vector in the opposite direction to $\vec{v}$ is the vector $c\vec{v}$ with $c = -\frac{1}{\|\vec{v}\|}$, which gives
$$-\frac{1}{\|\vec{v}\|}\vec{v} = -\frac{1}{\sqrt{29}}(5, 0, -2) = \left(-\frac{5}{\sqrt{29}}, 0, \frac{2}{\sqrt{29}}\right)$$
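The recipe used in Example 1.9 (scale by 1/‖v‖ for the same direction, or by −1/‖v‖ for the opposite direction) might be sketched as follows. The function name `unit_vector` and its `opposite` flag are hypothetical, introduced only for this illustration.

```python
import math
from typing import Sequence, Tuple

def unit_vector(v: Sequence[float], opposite: bool = False) -> Tuple[float, ...]:
    """Scale v by 1/||v|| (or by -1/||v|| when opposite=True) to get a vector of length 1."""
    length = math.sqrt(sum(x ** 2 for x in v))
    if length == 0:
        # The zero vector has no direction, so no scalar multiple of it has length 1.
        raise ValueError("cannot form a unit vector from the zero vector")
    c = -1 / length if opposite else 1 / length
    return tuple(c * x for x in v)

print(unit_vector((4, -3)))                    # approximately (0.8, -0.6)
print(unit_vector((5, 0, -2), opposite=True))  # approximately (-0.928, 0, 0.371)
```

The zero vector is rejected explicitly, since every scalar multiple of it is again the zero vector.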
In Theorem 1.1 we observed that for any vector ~v and any scalar c, the vectors ~v and c~v are
collinear. But it is also true that if 2 vectors are collinear then it must be true that they are scalar
multiples of one another.
Theorem 1.2. Vectors ~u and ~v are collinear if and only if there is some scalar value c such that
~v = c~u.
Example 1.10. Let ~u = (−2, 7) and ~v = (4, k). If it is known that ~u and ~v are collinear, what is the
value of k?
Solution:
We know that if 2 vectors are collinear, one can be written as a scalar multiple of the other. So
knowing that ~u and ~v are collinear tells us that there is some value c for which ~v = c~u. We have
c~u = c(−2, 7) = (−2c, 7c)
Therefore to have c~u = ~v , we need (−2c, 7c) = (4, k). And of course these vectors are only equal if
their corresponding components are the same. So we must have −2c = 4 and 7c = k. We use the
first of these to solve for c, and then substitute that in to find k. We get
$$-2c = 4 \quad\rightarrow\quad c = \frac{4}{-2} = -2 \quad\rightarrow\quad k = 7c = 7(-2) = -14$$
The only vector (4, k) which is collinear with ~u is the vector (4, −14).
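Theorem 1.2 suggests a direct test for collinearity: look for the scalar c with v = cu. The sketch below assumes u is non-zero and that the components are integers (or `Fraction` values), so that the comparisons are exact; the name `scalar_for` is hypothetical.

```python
from fractions import Fraction
from typing import Optional, Sequence

def scalar_for(u: Sequence[int], v: Sequence[int]) -> Optional[Fraction]:
    """Return the scalar c with v = c*u, or None if there is none (u assumed non-zero)."""
    if len(u) != len(v):
        raise ValueError("vectors must be in the same space")
    # The first non-zero component of u fixes the only possible value of c.
    c = next((Fraction(b) / a for a, b in zip(u, v) if a != 0), None)
    if c is None:  # u is the zero vector, so no component determines c
        return None
    return c if all(c * a == b for a, b in zip(u, v)) else None

print(scalar_for((1, 2), (2, 4)))  # 2, so (2, 4) = 2(1, 2)
print(scalar_for((1, 2), (2, 1)))  # None: not collinear
c = Fraction(4) / -2               # Example 1.10: -2c = 4
print(c, 7 * c)                    # -2 -14
```

Exact fractions are used here instead of floating-point numbers so that a test such as c·a = b is not spoiled by rounding.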
Translation of Vectors
A vector is a directed line segment which starts at the origin. A directed line segment from
some point P to some point Q, where P is not the origin, is not called a vector. (So all vectors are
directed line segments, but only some directed line segments are vectors.) We use $\overrightarrow{PQ}$ to denote
such a directed line segment.
Definition: Two directed line segments which have the same magnitude and the same
direction are called equivalent.
This means that any non-zero vector $\vec{v}$ is equivalent to many other directed line segments: every
directed line segment which is parallel to $\vec{v}$ in the same direction as $\vec{v}$ and is the same length as
$\vec{v}$. For instance, the vector $\vec{v} = (2, 2)$, which goes up to the right with slope 1, and is $\sqrt{8}$ units
long, is equivalent to the directed line segment from the point P(−4, 0) to the point Q(−2, 2), and
is also equivalent to the directed line segment from the point R(3, −1) to the point S(5, 1), and to
the directed line segment from the point A(−3, 3) to the point B(−1, 5), and so forth.
[Figure: the vector $\vec{v} = (2, 2)$ and the equivalent directed line segments $\overrightarrow{PQ}$, $\overrightarrow{RS}$ and $\overrightarrow{AB}$, drawn on one set of axes.]
In some contexts, we want to replace a directed line segment by the vector which is equivalent
to it, or replace a vector by an equivalent directed line segment which starts somewhere other than
at the origin. We refer to this as translating the vector, or the directed line segment.
Definition: The process of replacing a directed line segment $\overrightarrow{AB}$ with the equivalent
vector is called translating $\overrightarrow{AB}$ to the origin. Similarly, the process of replacing the
vector $\vec{v}$ with an equivalent directed line segment which starts at some point P is called
translating $\vec{v}$ to P.
Notice that a vector or other directed line segment obtained by translation is always parallel to, and
the same length as, the original directed line segment. This is because by definition, translating in-
volves an equivalent directed line segment, i.e. one which has the same direction and the same length.
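Translation can be illustrated numerically. The sketch below assumes that the vector equivalent to the directed line segment from P to Q has components equal to the coordinates of Q minus those of P, which matches the three segments listed in the example above; the function names are hypothetical.

```python
from typing import Sequence, Tuple

def translate_to_origin(p: Sequence[float], q: Sequence[float]) -> Tuple[float, ...]:
    """Vector equivalent to the directed line segment from point p to point q."""
    return tuple(b - a for a, b in zip(p, q))

def translate_to_point(v: Sequence[float], p: Sequence[float]) -> Tuple[Tuple[float, ...], Tuple[float, ...]]:
    """Start and end points of the directed line segment equivalent to v that starts at p."""
    return tuple(p), tuple(a + x for a, x in zip(p, v))

# The segments PQ, RS and AB from the example all translate to v = (2, 2).
for start, end in [((-4, 0), (-2, 2)), ((3, -1), (5, 1)), ((-3, 3), (-1, 5))]:
    print(translate_to_origin(start, end))  # (2, 2) each time

print(translate_to_point((2, 2), (-4, 0)))  # ((-4, 0), (-2, 2))
```

Translating a segment to the origin and then back to its starting point recovers the original segment, as the last line shows for P and Q.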
Addition of Vectors
We can add one vector to another, as long as they are in the same space, i.e. both in ℜ² or both
in ℜ³. The way we do this is quite obvious. If I told you that $\vec{u}$ is the vector that travels 40 metres
North, and that ~v is the vector that travels 30 metres East, what would you suppose the vector ~u +~v
would be? You’d probably guess that it’s equivalent to going 40 metres North and then 30 metres
East. And if you thought about it a bit more, you might realize that it should be the vector that
goes directly from where you’re starting to where you’re ending up, instead of taking the less direct
route. Because vectors don’t turn corners. Each starts at the origin and goes in a straight line to
its endpoint.
To add vector ~v to vector ~u, we translate ~v to the endpoint of ~u, which we can call U . This is
like travelling 40 metres North, and then travelling 30 metres East. So the sum vector, ~u + ~v , is the
vector that starts at the start of ~u, which is the origin of course, and goes to the endpoint of the
translation of ~v to U .
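[Figure: $\vec{u}$ (40 metres North), $\vec{v}$ (30 metres East), the translation of $\vec{v}$ to U, and the sum $\vec{u} + \vec{v}$ drawn from the origin to the endpoint of that translation.]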
Adding two vectors when they’re written in component form is even easier. All we need to do is to
add the corresponding components.
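Definition: For any vectors $\vec{u} = (u_1, u_2)$ and $\vec{v} = (v_1, v_2)$ in ℜ², the vector $\vec{u} + \vec{v}$, the
sum of $\vec{u}$ and $\vec{v}$, is given by
$$\vec{u} + \vec{v} = (u_1 + v_1, u_2 + v_2)$$
Similarly, for vectors $\vec{u} = (u_1, u_2, u_3)$ and $\vec{v} = (v_1, v_2, v_3)$ in ℜ³,
$$\vec{u} + \vec{v} = (u_1 + v_1, u_2 + v_2, u_3 + v_3)$$
For instance, in the picture above, we have $\vec{u} = (0, 40)$ (go 40 metres due North) and $\vec{v} = (30, 0)$ (go
30 metres due East), and we get
$$\vec{u} + \vec{v} = (0, 40) + (30, 0) = (0 + 30, 40 + 0) = (30, 40)$$
Example 1.11. If $\vec{u} = (12, -4, 7)$ and $\vec{v} = (-3, 5, 8)$, find $\vec{u} + \vec{v}$.
Solution:
$$\vec{u} + \vec{v} = (12, -4, 7) + (-3, 5, 8) = (12 + (-3), (-4) + 5, 7 + 8) = (9, 1, 15)$$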
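Adding vectors in component form is a one-line operation in code. A minimal sketch, assuming tuples for vectors and a hypothetical helper named `add`:

```python
from typing import Sequence, Tuple

def add(u: Sequence[float], v: Sequence[float]) -> Tuple[float, ...]:
    """Sum of u and v: add corresponding components (same space required)."""
    if len(u) != len(v):
        raise ValueError("vectors must be in the same space")
    return tuple(a + b for a, b in zip(u, v))

print(add((0, 40), (30, 0)))         # (30, 40): 40 metres North, then 30 metres East
print(add((12, -4, 7), (-3, 5, 8)))  # (9, 1, 15)
```

The length check mirrors the requirement that both vectors be in ℜ² or both be in ℜ³.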
There’s another way to think about the sum of 2 vectors. We’ve seen that if we translate ~v to
the endpoint of ~u (i.e. to U ), the vector ~u + ~v is the vector that goes from the start of ~u (i.e. the
origin) to the endpoint of the translation of ~v to U . And of course the translation of ~v is parallel to
the vector ~v . If we also translate ~u to the endpoint of ~v (i.e. to V ), then that will be a directed line
segment which is parallel to ~u. And those two sets of parallel lines form a parallelogram. That is,
the two translations have the same endpoint. So the vector ~u + ~v also goes from the start of ~v to
the end of the translation of ~u to V . This vector is the diagonal of the parallelogram. (See diagram
next page.)
[Figure: the parallelogram formed by $\vec{u}$, $\vec{v}$, the translation of $\vec{v}$ to U and the translation of $\vec{u}$ to V; $\vec{u} + \vec{v}$ is the diagonal of the parallelogram.]
Now let’s think about something a bit different. What do you suppose we mean by the negative
of a vector? For instance, if ~u is the vector which starts at the origin and goes 40 metres due North,
what would the negative of this vector be? What direction do you suppose it goes? How long do
you think it would be?
Definition: The negative of a vector is the vector which is the same length, but has
the opposite direction. We write the negative of ~v as −~v .
We know that a vector with the opposite direction to a vector $\vec{v}$ is collinear with $\vec{v}$ and hence is
a scalar multiple of $\vec{v}$. For the direction to be opposite, the scalar must be negative. And for the
length to be the same, the components of the vector mustn’t change in size, only in sign. That is,
$-\vec{v} = (-1)\vec{v}$. So for instance for the vectors in Example 1.11 we get
$$-\vec{u} = (-1)(12, -4, 7) = (-12, 4, -7) \quad\text{and}\quad -\vec{v} = (-1)(-3, 5, 8) = (3, -5, -8)$$
Theorem 1.3. In component form, $-\vec{u}$ is the vector obtained by changing the sign of each component
of $\vec{u}$. That is, $-\vec{u} = (-u_1, -u_2)$ in ℜ², or $-\vec{u} = (-u_1, -u_2, -u_3)$ in ℜ³.
In general in mathematics, subtraction is the same as adding the negative. For instance, with
numbers, we can think of 5 − 2 as 5 + (−2). And the same is true with subtraction of vectors. We
define that ~u − ~v means ~u + (−~v ). In terms of directed line segments, this means that we form the
vector difference ~u − ~v by adding −~v to ~u, i.e. by translating the vector −~v to the endpoint of ~u,
i.e. to U . And of course −~v is simply the vector with the same magnitude as ~v but in the opposite
direction. And then the vector ~u − ~v = ~u + (−~v ) is the vector that goes from the origin, i.e. from
the start of ~u, directly to the endpoint of the translation of −~v to U .
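[Figure: $\vec{u}$, $\vec{v}$ and $-\vec{v}$ drawn from the origin, the translation of $-\vec{v}$ to U, and the difference $\vec{u} - \vec{v}$ drawn from the origin to the endpoint of that translation.]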
Of course, since subtracting ~v from ~u is the same as adding −~v to ~u, and since adding two vectors
in component form simply involves adding corresponding components, then when we subtract one
vector from another in component form, we just add the negative of each component, i.e., subtract
corresponding components.
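Definition: For any vectors $\vec{u} = (u_1, u_2)$ and $\vec{v} = (v_1, v_2)$ in ℜ², the difference vector
$\vec{u} - \vec{v}$ is given by
$$\vec{u} - \vec{v} = (u_1 - v_1, u_2 - v_2)$$
Similarly, for vectors $\vec{u} = (u_1, u_2, u_3)$ and $\vec{v} = (v_1, v_2, v_3)$ in ℜ³,
$$\vec{u} - \vec{v} = (u_1 - v_1, u_2 - v_2, u_3 - v_3)$$
For instance, for $\vec{u} = (1, 2)$ and $\vec{v} = (3, 1)$ we get $\vec{u} - \vec{v} = (1, 2) - (3, 1) = (1 - 3, 2 - 1) = (-2, 1)$.
Similarly, for $\vec{u} = (1, 0, 2)$ and $\vec{v} = (1, 1, -1)$ we have
$$\vec{u} - \vec{v} = (1, 0, 2) - (1, 1, -1) = (1 - 1, 0 - 1, 2 - (-1)) = (0, -1, 3)$$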
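Since subtraction is defined as adding the negative, a code sketch can follow the same route. The helper names `negate`, `add` and `subtract` are hypothetical, and vectors are again assumed to be tuples.

```python
from typing import Sequence, Tuple

def negate(v: Sequence[float]) -> Tuple[float, ...]:
    """Negative of v: change the sign of each component."""
    return tuple(-x for x in v)

def add(u: Sequence[float], v: Sequence[float]) -> Tuple[float, ...]:
    """Sum of u and v, component by component."""
    return tuple(a + b for a, b in zip(u, v))

def subtract(u: Sequence[float], v: Sequence[float]) -> Tuple[float, ...]:
    """Difference u - v, computed as u + (-v)."""
    return add(u, negate(v))

print(negate((12, -4, 7)))                # (-12, 4, -7)
print(subtract((1, 2), (3, 1)))           # (-2, 1)
print(subtract((1, 0, 2), (1, 1, -1)))    # (0, -1, 3)
```

Writing `subtract` in terms of `add` and `negate` keeps the code parallel to the definition rather than subtracting components directly; both approaches give the same components.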
Just as there was with addition of vectors, there’s another way we can think of the subtraction
of one vector from another. When we find a difference, i.e. one vector minus another, the resulting
vector, when translated to the endpoint of the second vector mentioned, has as its endpoint the first
vector mentioned. That is, if we translate ~u − ~v to V , the endpoint of ~v , we see that it goes to U ,
the endpoint of $\vec{u}$. Or we can think of it the other way around. If we draw the vectors $\vec{u}$ and $\vec{v}$ (each
starting at the origin, of course), and draw the directed line segment $\overrightarrow{VU}$, from the endpoint of $\vec{v}$ to
the endpoint of $\vec{u}$, and then translate $\overrightarrow{VU}$ to the origin, this translated vector is the vector $\vec{u} - \vec{v}$.
We can see this in the diagram below.
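[Figure: $\vec{u}$ and $\vec{v}$ drawn from the origin, the directed line segment $\overrightarrow{VU}$, and the translation of $-\vec{v}$ to U; $\vec{u} - \vec{v}$ is equivalent to $\overrightarrow{VU}$.]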
The unit vectors which run along the axes are very useful, and so they have special names. Consider
ℜ². The unit vector that runs along the positive x-axis, i.e. that runs from the origin for one
unit in the positive horizontal direction (right), is called $\vec{\imath}$. And the unit vector that runs along the
positive y-axis, i.e. that runs from the origin for one unit in the positive vertical direction (up), is
called $\vec{\jmath}$. Similarly, in ℜ³ the unit vector along the positive x-axis is $\vec{\imath}$, the unit vector along the
positive y-axis is $\vec{\jmath}$ and we also have the unit vector that runs along the positive z-axis, which is called $\vec{k}$.
Of course, the vector that starts at the origin, i.e. the point (0, 0) or (0, 0, 0), and runs for one
unit along the positive part of one of the axes ends at the point which has 1 as the coordinate
corresponding to the axis it moved along, and the other coordinate(s) is (are) still 0. For instance,
the vector $\vec{\imath}$ in ℜ² starts at the origin and ends at the point one unit to the right, which is the point
(1, 0).
Definition: The special unit vectors running along the positive axes in ℜ² are:
$$\vec{\imath} = (1, 0) \quad\text{and}\quad \vec{\jmath} = (0, 1)$$
Similarly, the special unit vectors running along the positive axes in ℜ³ are:
$$\vec{\imath} = (1, 0, 0), \quad \vec{\jmath} = (0, 1, 0) \quad\text{and}\quad \vec{k} = (0, 0, 1)$$
Any other vector $\vec{v}$ can be expressed in terms of these vectors. We multiply each of these special
vectors by a scalar which is the corresponding component of $\vec{v}$ and then add them up. That is, we
can express $(v_1, v_2)$ as $v_1\vec{\imath} + v_2\vec{\jmath}$. Likewise, any vector $(v_1, v_2, v_3)$ can be expressed as $v_1\vec{\imath} + v_2\vec{\jmath} + v_3\vec{k}$.
So for instance $(2, -1) = 2\vec{\imath} - \vec{\jmath}$ and $(-3, 5, 17) = -3\vec{\imath} + 5\vec{\jmath} + 17\vec{k}$.
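The decomposition into the special unit vectors can be verified by rebuilding a vector from its components. The sketch below works in ℜ³; the constant names `I`, `J`, `K` and the helper `from_components` are hypothetical.

```python
from typing import Sequence, Tuple

# The special unit vectors along the positive x-, y- and z-axes in R^3.
I, J, K = (1, 0, 0), (0, 1, 0), (0, 0, 1)

def scale(c: float, v: Sequence[float]) -> Tuple[float, ...]:
    """Scalar multiple c*v."""
    return tuple(c * x for x in v)

def add(u: Sequence[float], v: Sequence[float]) -> Tuple[float, ...]:
    """Sum of u and v, component by component."""
    return tuple(a + b for a, b in zip(u, v))

def from_components(v1: float, v2: float, v3: float) -> Tuple[float, ...]:
    """Build v1*i + v2*j + v3*k from scalar multiples of the unit vectors."""
    return add(add(scale(v1, I), scale(v2, J)), scale(v3, K))

print(from_components(-3, 5, 17))  # (-3, 5, 17)
```

Each unit vector contributes to exactly one component, which is why the coefficients are simply the components of the vector.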
In this Unit we have learnt several vector operations: addition, subtraction and scalar multipli-
cation. There are various properties of these operations which hold because of the way they are
defined. (Some of them we’ve already seen; others we haven’t talked about – but they’re fairly
obvious.) You should be aware of, and able to use, all of these properties, which are enumerated in
the following Theorem.
Theorem 1.4. Let ~u, ~v and w ~ be any vectors, all in ℜ2 or all in ℜ3 . Let ~0 be the zero vector in
that same space. Let c and d be any scalars. Then the following properties hold:
(a) ~u + ~v = ~v + ~u
That is, addition of vectors is commutative.
(b) (~u + ~v ) + w
~ = ~u + (~v + w)
~
That is, addition of vectors is associative.
(c) ~u + ~0 = ~u
That is, adding the zero vector to any vector leaves the vector unchanged.
(d) ~u + (−~u) = ~0
That is, the sum of a vector and its negative is the zero vector.
(e) cd(~u) = c(d~u)
That is, to form the scalar multiple of a vector, where the scalar is a product of two scalars, it
doesn’t matter if the scalars are applied one at a time, or if the scalars are multiplied together
before the vector is multiplied by them.
(f ) (c + d)~u = c~u + d~u
That is, scalar multiplication of a vector is distributive over addition of scalars. So if the scalar
by which a vector is to be multiplied is considered as the sum of two scalars, it doesn’t matter
if the vector is multiplied by each scalar separately, and then these new vectors added together,
or if the two scalars are added together and then the vector is multiplied by that sum.
(g) c(~u + ~v ) = c~u + c~v
That is, scalar multiplication of a vector is distributive over addition of vectors. So if the sum
of two vectors is to be multiplied by a scalar, it doesn’t matter whether the vectors are multiplied
by the scalar separately, and then these new vectors added together, or whether the two vectors
are added together and then the sum vector is multiplied by the scalar.
(h) 1~u = ~u
That is, multiplying any vector by the scalar 1 leaves the vector unchanged.
(i) (−1)~u = −~u
That is, the negative of a vector (i.e. the vector with the same magnitude but opposite in
direction) is the vector obtained by multiplying the vector by the scalar −1.
(j) 0~u = ~0
That is, multiplying any vector by the scalar 0 produces the zero vector.
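The properties in Theorem 1.4 can be spot-checked numerically. The following is only a sketch in Python, not a proof: the helper names `add`, `scale` and `neg`, and the sample vectors and scalars, are assumptions chosen for illustration and are not part of the course text.

```python
from fractions import Fraction

def add(u, v):
    """Component-wise sum of two vectors from the same space."""
    return tuple(a + b for a, b in zip(u, v))

def scale(c, u):
    """Scalar multiple c*u."""
    return tuple(c * a for a in u)

def neg(u):
    """Negative of u: same magnitude, opposite direction."""
    return tuple(-a for a in u)

# Arbitrary sample vectors in R^3 and scalars (assumed values, for illustration only).
u, v, w = (1, -2, 3), (4, 0, -5), (-7, 6, 2)
zero = (0, 0, 0)
c, d = Fraction(3, 2), -4

# One entry per part of Theorem 1.4.
checks = {
    "a": add(u, v) == add(v, u),
    "b": add(add(u, v), w) == add(u, add(v, w)),
    "c": add(u, zero) == u,
    "d": add(u, neg(u)) == zero,
    "e": scale(c * d, u) == scale(c, scale(d, u)),
    "f": scale(c + d, u) == add(scale(c, u), scale(d, u)),
    "g": scale(c, add(u, v)) == add(scale(c, u), scale(c, v)),
    "h": scale(1, u) == u,
    "i": scale(-1, u) == neg(u),
    "j": scale(0, u) == zero,
}
print(all(checks.values()))  # True
```

Exact fractions are used for the scalar c so that the equality tests are not affected by floating-point rounding.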
Math 1229A/B
Unit 2:
Products of Vectors
(text reference: Section 1.2)
© V. Olds 2010
2 Products of Vectors
In Unit 1 we learnt about a variety of arithmetic operations we can do with vectors. But the
only kind of multiplication we learnt about is scalar multiplication of a vector, i.e. when we take
a vector, and multiply it by a scalar (i.e. a number) to obtain a new vector. We didn’t have
anything that was like multiplying two vectors together, i.e. forming a product of two vectors. In
this Unit, we learn about two different kinds of vector products.
The first of these is the dot product. This is a product in which two vectors are combined to
produce something new ... but what they produce isn’t a new vector, instead it’s a scalar. So the
inputs to the calculation are 2 vectors, i.e. two elements of ℜ2 , or of ℜ3 , but the output of the
calculation is a number, i.e. some element of ℜ. Because this product produces a scalar as its result,
the term scalar product is often used instead of dot product. In Physics, when a force is applied to
a mass (i.e. an object), to displace the object, both the force which is applied and the displacement
of the object (mass) are vectors. The amount of work done is a scalar quantity, calculated as the
dot product of the two vectors.
The second product is called the cross product. Again we have two vectors being combined to
produce something new, but this time what they produce is a new vector. (Just like when you take
a product of 2 numbers, what you get is a new number.) The cross product is therefore also known
as the vector product. The new vector that we get is perpendicular to both of the other vectors,
which means it is perpendicular to the whole plane in which those 2 vectors lie. Because of that,
this product is not defined for “vectors in the plane”, i.e. for vectors in ℜ2 . We must be dealing with
3-dimensional vectors, i.e. vectors in ℜ3 , in order to be able to express a vector that is perpendicular
to both. So the cross product is only defined for 2 vectors which are both in ℜ3 . Again, there’s a
Physics application of this kind of product. This time, think about applying a force to a wrench
in order to turn a bolt. The force applied is a vector, and the wrench represents another vector
(actually, the vector runs from the centre of the bolt to the point on the wrench at which the force
is applied). And the turning effect the force has on the bolt, called the moment, is the vector (i.e.
cross) product of those two vectors. Its direction is perpendicular to both the force and the wrench
– the bolt moves up or down. (But don’t worry about the Physics of these things. We’re not going
to be talking about Physics at all. They’re mentioned here just to show you that these products do
have real-world applications.)
Definition: Consider any vectors ~u and ~v , either both from ℜ2 or both from ℜ3 . The
dot product of ~u and ~v , written ~u • ~v , is the number obtained when the corresponding
components of ~u and ~v are multiplied together and then these products are summed.
That is, we have:
If ~u, ~v ∈ ℜ2
then ~u • ~v = u1 v1 + u2 v2
If ~u, ~v ∈ ℜ3
then ~u • ~v = u1 v1 + u2 v2 + u3 v3
Example 2.1.
(a) Calculate (1, 2) • (3, −1) and (2, −1, 4) • (3, 2, −2).
(b) If ~u = (0, 3, −2) and ~v = (1, 0, 5), find ~v • ~u and (−~u) • ~u.
(c) If ~u = (1, 2), ~v = (2, −1) and ~w = (1, −2), find (~u − ~v) • ~w.
Solution:
(a) For ~u = (u1 , u2 ) = (1, 2) and ~v = (v1 , v2 ) = (3, −1), we calculate the product of the first
components, and the product of the second components, and add the two products together. That
is,
(1, 2) • (3, −1) = [1 × 3] + [2 × (−1)] = 3 + (−2) = 3 − 2 = 1
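Likewise, if we consider the vectors to be ~x = (x1 , x2 , x3 ) = (2, −1, 4) and ~y = (y1 , y2 , y3 ) = (3, 2, −2),
we find the sum of x1 × y1 , x2 × y2 and x3 × y3 :
(2, −1, 4) • (3, 2, −2) = [2 × 3] + [(−1) × 2] + [4 × (−2)] = 6 + (−2) + (−8) = −4
(b) For ~v • ~u, we just use the formula (recognizing that the order of the vectors is reversed this time):
~v • ~u = (1, 0, 5) • (0, 3, −2) = (1)(0) + (0)(3) + (5)(−2) = 0 + 0 + (−10) = −10
For (−~u) • ~u, we first find −~u = −(0, 3, −2) = (0, −3, 2) and then take the dot product asked for:
(−~u) • ~u = (0, −3, 2) • (0, 3, −2) = (0)(0) + (−3)(3) + (2)(−2) = 0 + (−9) + (−4) = −9 − 4 = −13
(c) We first find ~u − ~v = (1, 2) − (2, −1) = (−1, 3) and then take the dot product with ~w:
(~u − ~v) • ~w = (−1, 3) • (1, −2) = (−1)(1) + (3)(−2) = −1 + (−6) = −7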
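If you want to check calculations like the ones in Example 2.1, the dot product definition translates almost word for word into a few lines of Python. This is just an illustrative sketch; the function names `dot`, `sub` and `neg` are assumptions made here, not notation from the course.

```python
def dot(u, v):
    """Dot product: multiply corresponding components, then sum the products."""
    if len(u) != len(v):
        raise ValueError("both vectors must come from the same space")
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    """Vector difference u - v."""
    return tuple(a - b for a, b in zip(u, v))

def neg(u):
    """Negative of u."""
    return tuple(-a for a in u)

# Example 2.1 (a)
print(dot((1, 2), (3, -1)))          # 1
print(dot((2, -1, 4), (3, 2, -2)))   # -4

# Example 2.1 (b)
u, v = (0, 3, -2), (1, 0, 5)
print(dot(v, u))        # -10
print(dot(neg(u), u))   # -13

# Example 2.1 (c)
u2, v2, w2 = (1, 2), (2, -1), (1, -2)
print(dot(sub(u2, v2), w2))  # -7
```

The length check mirrors the requirement in the definition that the two vectors are either both from ℜ2 or both from ℜ3.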
Because all we’re doing is multiplying and adding, and order isn’t important in those operations,
then order also isn’t important in calculating a dot product. That is, the order of the components is
of course important, but the order of the vectors isn’t. Whether we calculate ~u • ~v or calculate ~v • ~u,
the same numbers are being multiplied together, giving the same products being added up, so the
answer is the same. That is, the dot product operation is commutative. Likewise, it doesn’t matter
when, or to what, a scalar multiplier is applied. Whether we multiply ~u by a scalar before forming
the dot product, or multiply ~v by that scalar instead, or even wait and multiply the dot product
value by that scalar, it all comes out the same. The same with addition. We can distribute dot
product over addition of vectors, or we can “factor out” a dot product from a sum of dot products
(as long as the same vector is dotted) and the answer never changes. Of course, if there are 0’s in all
the products that are being added, the final result is just 0, so the dot product of any vector with
the 0 vector is just 0. And compare the dot product formula, when dotting a vector with itself, to
the formula for finding the magnitude of a vector. The only thing missing is the square root sign.
So the magnitude of ~u can be thought of as √(~u • ~u). (Remember, magnitude must be positive, so we
take the positive square root.) Or the dot product of ~u with itself can be thought of as the square
of the magnitude of ~u ... whichever way you want to look at it. These ideas are summarized in the
following theorem.
Theorem 2.1. Let ~u, ~v and ~w be any 3 vectors from ℜ2 , or any 3 vectors from ℜ3 , and let c be
any scalar. Then the following properties hold:
1. (commutative property) ~u • ~v = ~v • ~u
2. (c~u) • ~v = ~u • (c~v) = c(~u • ~v)
3. ~u • (~v + ~w) = ~u • ~v + ~u • ~w
4. ~u • ~0 = 0
5. ~u • ~u = ||~u||²
For instance, in Example 2.1 (b), we found that ~v • ~u = −10. By property 1. above, we could have
calculated ~u • ~v instead, and we would have got the same answer:
~u • ~v = (0, 3, −2) • (1, 0, 5) = (0)(1) + (3)(0) + (−2)(5) = 0 + 0 + (−10) = −10
Also, we found that (−~u) • ~u = −13. Property 2. tells us that we didn’t need to find −~u first.
We could instead have calculated ~u • ~u, and then taken the negative (i.e. applied a −1 multiplier)
afterwards.
Also, having found that −(~u • ~u) = −13, so that ~u • ~u = 13, Property 5. tells us that
||~u|| = √13. And in Example 2.1 (c), instead of finding (~u − ~v) • ~w, we could have used Property 3.
as follows:
(~u − ~v) • ~w = (~u • ~w) − (~v • ~w) = [(1, 2) • (1, −2)] − [(2, −1) • (1, −2)] = (1 + (−4)) − (2 + 2) = 1 − 4 − 4 = −7
Notice: In that calculation, we actually used other properties, of dot product and of vector arith-
metic, too. Because the dot product is commutative (Property 1.), we don’t need to worry about
whether we have the form ~x • (~y + ~z) (as shown in Property 3.), or the form (~y + ~z) • ~x (as we had
above) when we distribute the dot product over vector addition. Also, because we know that vector
subtraction is really just the same as addition of the negative of the vector, we can also distribute
the dot product over vector subtraction. That is, (~u − ~v) • ~w = (~u + (−1)~v) • ~w. And then when
we have distributed the dot product, we can also, by Property 2., pull the −1 multiplier outside, so
that again we’re adding the negative, of ~v • ~w this time, and thus just subtracting.
Also Notice: We learnt in Unit 1 that for any vector ~u, 0~u = ~0. That is, if any vector is multiplied
by the scalar value 0, the result is the zero vector. And now, in the above theorem, we are told that
~u • ~0 = 0. That is, when any vector is dotted with the zero vector, the result of that dot product,
which as always is a scalar, is the number 0. Make sure you understand the use of, and difference
between, the scalar value 0 and the vector ~0, here and elsewhere.
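The dot product properties discussed above (commutativity, moving a scalar multiplier around, distributing over vector addition, dotting with the zero vector, and the link to magnitude) can be spot-checked on sample vectors. This sketch assumes helper names `dot`, `add` and `scale` and an arbitrary choice of vectors; a check on a few numbers illustrates the theorem but does not prove it.

```python
import math

def dot(u, v):
    """Dot product of two vectors from the same space."""
    return sum(a * b for a, b in zip(u, v))

def add(u, v):
    """Component-wise vector sum."""
    return tuple(a + b for a, b in zip(u, v))

def scale(c, u):
    """Scalar multiple c*u."""
    return tuple(c * a for a in u)

# Sample vectors (u and v are the ones from Example 2.1 (b); w and c are arbitrary).
u, v, w = (0, 3, -2), (1, 0, 5), (2, -1, 4)
c = -3
zero_vector = (0, 0, 0)   # the vector ~0
zero_scalar = 0           # the number 0

commutative = dot(u, v) == dot(v, u)
scalar_moves = dot(scale(c, u), v) == dot(u, scale(c, v)) == c * dot(u, v)
distributive = dot(u, add(v, w)) == dot(u, v) + dot(u, w)
dot_with_zero = dot(u, zero_vector) == zero_scalar   # result is a scalar, not a vector
magnitude_u = math.sqrt(dot(u, u))                   # ||u|| = sqrt(u . u) = sqrt(13)

print(commutative, scalar_moves, distributive, dot_with_zero)  # True True True True
print(magnitude_u)
```

Notice that `zero_vector` and `zero_scalar` are different kinds of objects in the code, just as ~0 and 0 are different kinds of objects in the text.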
Solution:
Approach 1: Since ~v = −(2/3)~u, then we have
~u • ~v = ~u • (−(2/3)~u) = −(2/3)(~u • ~u) = −(2/3)||~u||²
Approach 2: (quicker – don’t find ||~u|| first) Knowing that ~v = −(2/3)~u also means that we have
~u = ~v ÷ (−2/3) = ~v × (−3/2) = −(3/2)~v. And then we have
~u • ~v = (−(3/2)~v) • ~v = −(3/2)(~v • ~v) = −(3/2)||~v||²
and so using the fact that ~u • ~v = −6 we get
−(3/2)||~v||² = −6 ⇒ ||~v||² = −6 × (−2/3) = 4 ⇒ ||~v|| = √4 = 2
Definition: For any vectors ~u and ~v , either both in ℜ2 or both in ℜ3 , the angle
between ~u and ~v means the angle no more than 180◦ (or π radians) formed by the
directed line segment representations of these vectors. We often use θ to represent this
angle.
For instance, for the vectors ~u and ~v depicted here, the angle between ~u and ~v is the angle θ shown:
[Figure: vectors ~u and ~v drawn from the origin in the x-y plane, with the angle θ between them marked.]
Geometrically, the scalar value that is the dot product of two vectors gives the value of a specific
calculation involving the magnitudes of the vectors and the angle between them. And we can use
this fact to express a formula for the cosine value of the angle between the vectors.
You have probably taken some trigonometry before, which deals with angles (particularly in
triangles). Since our focus in this course is primarily on vectors expressed in component form, we
are not particularly concerned with trigonometry in this course, so you don’t need to remember any-
thing you learned previously about this. We don’t even need to think about what the cosine value
of an angle means or represents. And we certainly don’t want to learn/review enough trigonometry
to understand where the following theorem comes from. However, you should realize that any 2
vectors do form an angle. And the formula in the theorem below expresses how the dot product and
magnitudes of the vectors tell us about this angle.
Theorem 2.2. Consider any vectors ~u and ~v , either both in ℜ2 or both in ℜ3 . Let θ be the angle
between ~u and ~v . Then
~u • ~v = ||~u|| ||~v || cos θ
and provided that ~u ≠ ~0 and ~v ≠ ~0, this relationship can be rearranged to find the value of cos θ as
cos θ = (~u • ~v) / (||~u|| ||~v||)
That is, for any non-zero vectors ~u and ~v , the angle between ~u and ~v is the angle θ whose cosine
value is the value given by this formula. (There’s only one angle no bigger than 180◦ which has that
cosine value.)
Notice: If ~u = ~0 or ~v = ~0, then one of the terms in the denominator is 0, so we are trying to divide
by 0, which isn’t possible. It doesn’t make any sense to talk about the angle between the zero vector
and another vector, so in that case θ is undefined and therefore so is cos θ.
Also Notice: This formula for calculating cos θ, and another formula involving sin θ later in this unit,
are the only times we’ll be using any of the ideas of trigonometry in this course.
Example 2.3. Find the value of cos θ where θ is the angle between ~u and ~v in each of the following:
(a) ~u = (5, 4) and ~v = (2, −3) (b) ~u = (1, 2, 3) and ~v = (3, −2, 1)
(c) ~u = (1, 2) and ~v = (2, −1) (d) ~u = (2, 3) and ~v = (4, 6)
Solution:
For each of these, we use the formula cos θ = (~u • ~v) / (||~u|| ||~v||), so we need to find ||~u||, ||~v|| and ~u • ~v.
(a) For ~u = (5, 4) and ~v = (2, −3) we get:
||~u|| = √(5² + 4²) = √(25 + 16) = √41
||~v|| = √(2² + (−3)²) = √(4 + 9) = √13
and ~u • ~v = (5, 4) • (2, −3) = (5)(2) + (4)(−3) = 10 + (−12) = 10 − 12 = −2
so cos θ = −2 / ((√41)(√13)) = −2 / (√41 √13)
Notice: Since (4, 6) = 2(2, 3) we have ~v = 2~u and so ~u and ~v have the same direction. That means
that the angle between them is 0 (i.e. 0° or 0 radians), and cos 0 = 1. Similarly, if ~w = (−4, −6)
we have ~w = −2~u and so ~w has the opposite direction to ~u. In this case, the calculations are the
same as those shown above except that the numerator is negative so the final answer is cos θ = −1.
Whenever ~u and ~v have opposite directions, the angle between them is θ = 180° (or π radians),
which has cosine value −1.
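The formula from Theorem 2.2 can be applied mechanically to all four pairs of vectors listed in Example 2.3, and to the opposite-direction vector (−4, −6) mentioned in the Notice. The following is a sketch only; the function names `dot`, `norm` and `cos_angle` are assumptions, the pairs are keyed "a" to "d" in the order they are listed in the example, and the results are floating-point approximations rather than exact values.

```python
import math

def dot(u, v):
    """Dot product of two vectors from the same space."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Magnitude ||u|| = sqrt(u . u)."""
    return math.sqrt(dot(u, u))

def cos_angle(u, v):
    """cos(theta) = (u . v) / (||u|| ||v||); undefined if either vector is the zero vector."""
    nu, nv = norm(u), norm(v)
    if nu == 0 or nv == 0:
        raise ValueError("the angle with the zero vector is undefined")
    return dot(u, v) / (nu * nv)

pairs = {
    "a": ((5, 4), (2, -3)),
    "b": ((1, 2, 3), (3, -2, 1)),
    "c": ((1, 2), (2, -1)),
    "d": ((2, 3), (4, 6)),
}
results = {label: cos_angle(u, v) for label, (u, v) in pairs.items()}
for label, value in results.items():
    print(label, value)

# Opposite directions: (-4, -6) = -2 * (2, 3)
opposite = cos_angle((2, 3), (-4, -6))
print(opposite)
```

Because of rounding, a value that is exactly 1 or −1 on paper may print as something like 1.0000000000000002, so comparisons are best made with a tolerance (for instance `math.isclose`).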
Let’s think some more about what we had in part (c) of that example. We had two non-zero vectors
whose dot product was 0, which meant that also the cosine of the angle between them was 0. What
angle has cosine value 0? Do you remember?
You probably don’t (and we said earlier that you wouldn’t need to remember such things), so let’s
look at a picture of these vectors. They were ~u = (1, 2) and ~v = (2, −1). We have:
[Figure: the vectors ~u = (1, 2) and ~v = (2, −1) drawn from the origin in the x-y plane, meeting at a right angle.]
It’s a right angle! ~u and ~v are perpendicular! Notice: In ℜ2 , the vector (b, −a) is always perpen-
dicular to the vector (a, b). And it’s true ... the angle (smaller than 180◦ ) whose cosine value is
zero is θ = 90° (or π/2 radians). But when we’re talking about the relationship between two vectors,
perpendicular isn’t the word we usually use. We say that two lines are perpendicular, or that two
planes are perpendicular, but we say that two vectors are orthogonal.
Definition: Two vectors are said to be orthogonal if the angle between them is 90°
(i.e. π/2 radians).
Notice: We would also use the word orthogonal if we were talking about two directed line segments.
Two directed line segments are orthogonal if the vectors obtained by translating them to the origin
are orthogonal.
Also Notice: Orthogonal means the same thing as perpendicular, but the usage is a bit different.
Another word that also means the same thing is normal. We use perpendicular when comparing
two lines or planes, orthogonal when comparing two vectors (or directed line segments), and normal
when comparing a vector to a line or a plane. That is, we say that the vector ~u is normal to a
particular line or plane if the vector, or the corresponding directed line segment when translated to
some point on that line or plane, meets the line or plane at an angle of 90◦ .
Theorem 2.3. Let ~u and ~v be two vectors, either both in ℜ2 or both in ℜ3 . Then:
~u and ~v are orthogonal if and only if ~u • ~v = 0.
So as mentioned above, for any real values of a and b, the 2-dimensional vectors (a, b) and (b, −a) are
orthogonal to one another. And any scalar multiple of (b, −a) is also orthogonal to (a, b). However,
in ℜ3 there’s no easy way to recognize that 2 vectors are orthogonal, other than by calculating their
dot product. (For ℜ2 , you’ll want to remember that you can get a vector orthogonal to (a, b) simply
by switching the components and changing the sign of one of them.)
Example 2.4. Prove that ~u = (1, 3, −2) and ~v = (5, 1, 4) are orthogonal.
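Solution:
We calculate the dot product:
~u • ~v = (1, 3, −2) • (5, 1, 4) = (1)(5) + (3)(1) + (−2)(4) = 5 + 3 − 8 = 0
Since ~u • ~v = 0, the vectors ~u and ~v are orthogonal.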
Example 2.5. Let ~u = (2, 0, 4) and ~v = (1, 1, k). If ~u and ~v are orthogonal, what is the value of k?
Solution:
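If ~u and ~v are orthogonal, then it must be true that ~u • ~v = 0. But we can calculate ~u • ~v :
~u • ~v = (2, 0, 4) • (1, 1, k) = (2)(1) + (0)(1) + (4)(k) = 2 + 4k
So we need 2 + 4k = 0, which gives 4k = −2, i.e. k = −1/2.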
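The orthogonality test (dot product equal to 0) is easy to automate. The sketch below checks the vectors of Example 2.4, solves the equation 2 + 4k = 0 from Example 2.5 using exact fractions, and tries the (a, b) and (b, −a) observation on one arbitrarily chosen pair. The names `dot` and `is_orthogonal` and the sample values of a and b are assumptions for illustration.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors from the same space."""
    return sum(x * y for x, y in zip(u, v))

def is_orthogonal(u, v):
    """Two vectors are orthogonal exactly when their dot product is 0."""
    return dot(u, v) == 0

# Example 2.4
print(is_orthogonal((1, 3, -2), (5, 1, 4)))  # True

# Example 2.5: u . v = 2*1 + 0*1 + 4*k, so we solve (known part) + 4k = 0 for k.
u = (2, 0, 4)
known_part = u[0] * 1 + u[1] * 1     # contribution of the first two components of v = (1, 1, k)
k = Fraction(-known_part, u[2])      # exact arithmetic, no rounding
print(k)                             # -1/2
print(is_orthogonal(u, (1, 1, k)))   # True

# In R^2, (b, -a) is orthogonal to (a, b); a and b here are arbitrary sample values.
a, b = 7, -3
print(is_orthogonal((a, b), (b, -a)))  # True
```

With floating-point components an exact `== 0` test can fail because of rounding, which is why exact integers and fractions are used here.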
As previously stated, the Cross Product of two vectors ~u and ~v in ℜ3 is a new vector in ℜ3
which is orthogonal to both ~u and ~v . This vector operation is only defined for vectors in ℜ3 .
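Definition: For ~u = (u1 , u2 , u3 ) and ~v = (v1 , v2 , v3 ) in ℜ3 , the cross product of ~u and ~v , written ~u × ~v , is the vector
~u × ~v = (u2 v3 − u3 v2 , u3 v1 − u1 v3 , u1 v2 − u2 v1 )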
Example 2.6. Find the cross product of ~u = (1, 2, 3) and ~v = (4, 5, 6).
Solution:
We have u1 = 1, u2 = 2 and u3 = 3, with v1 = 4, v2 = 5 and v3 = 6, and so we get:
~u × ~v = (u2 v3 − u3 v2 , u3 v1 − u1 v3 , u1 v2 − u2 v1 )
= ((2)(6) − (3)(5), (3)(4) − (1)(6), (1)(5) − (2)(4))
= (12 − 15, 12 − 6, 5 − 8)
= (−3, 6, −3)
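The component formula for the cross product can be written directly as a small function. This is a minimal sketch (the function name `cross` is an assumption); unpacking into three components means it raises an error for anything that is not a vector in ℜ3, matching the fact that the operation is only defined there.

```python
def cross(u, v):
    """Cross product of two vectors in R^3, straight from the component formula."""
    u1, u2, u3 = u   # fails with ValueError unless u has exactly 3 components
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2,
            u3 * v1 - u1 * v3,
            u1 * v2 - u2 * v1)

# Example 2.6
print(cross((1, 2, 3), (4, 5, 6)))  # (-3, 6, -3)
```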
You will have noticed that the formula is kind of nasty ... easy to get confused on. Fortunately,
there are some easier ways to remember how to find the cross product. That is, procedures that are
less easily confused that will get you to the answer. The text shows one such procedure, involving
something called a determinant, on p. 17. You should have a look at that. Here, we’ll look at
another procedure that the text doesn’t show, and which you might find easier (until later in the
course when we learn about determinants).
Step 1: Write down the components of the first vector, twice, side by side on one line.
Step 2: On the next line, lined up beneath them, write down the components of the second vector,
twice.
So now we have
u1 u2 u3 u1 u2 u3
v1 v2 v3 v1 v2 v3
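Step 3: Now, cross off the first and last numbers on each line.
We get (crossed-off entries are shown in square brackets):
[u1] u2 u3 u1 u2 [u3]
[v1] v2 v3 v1 v2 [v3]
Step 4: Always going left-to-right, and using only the numbers that aren’t crossed off, find the vectors
of down-products and of up-products, by multiplying.
For the down-products we go down to the right:
[u1] u2 u3 u1 u2 [u3]
       ↘  ↘  ↘
[v1] v2 v3 v1 v2 [v3]
which gives down-products = (u2 v3 , u3 v1 , u1 v2 ).
For the up-products we go up to the right:
[u1] u2 u3 u1 u2 [u3]
       ↗  ↗  ↗
[v1] v2 v3 v1 v2 [v3]
which gives up-products = (v2 u3 , v3 u1 , v1 u2 ).
Finally, subtract the vector of up-products from the vector of down-products:
~u × ~v = (u2 v3 , u3 v1 , u1 v2 ) − (v2 u3 , v3 u1 , v1 u2 ) = (u2 v3 − v2 u3 , u3 v1 − v3 u1 , u1 v2 − v1 u2 )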
Notice: If you compare this to the definition, you’ll see that we’ve switched the order of the things
being multiplied together corresponding to the up-products, i.e. the things after the minus signs. But
because multiplication of numbers is commutative, that doesn’t matter. And since we won’t be
writing down a formula like this, but rather just numbers, we don’t care.
Example 2.7. Use this procedure to find the cross product in Example 2.6.
Solution:
We have ~u = (1, 2, 3) and ~v = (4, 5, 6). Applying the procedure step-by-step, we get:
Step 1: We write down the components of ~u, twice:
1 2 3 1 2 3
Step 2: Then on the line below, we write down the components of ~v , twice:
1 2 3 1 2 3
4 5 6 4 5 6
Step 3: And we cross off the first and last numbers on each line (shown here in square brackets):
[1] 2 3 1 2 [3]
[4] 5 6 4 5 [6]
Step 4: Now we multiply down to the right for the vector of down-products and up to the right for the
vector of up-products:
[1] 2 3 1 2 [3]
     ↘ ↘ ↘
[4] 5 6 4 5 [6]
gives
down-products = (2 × 6, 3 × 4, 1 × 5) = (12, 12, 5)
and then
[1] 2 3 1 2 [3]
     ↗ ↗ ↗
[4] 5 6 4 5 [6]
gives
up-products = (5 × 3, 6 × 1, 4 × 2) = (15, 6, 8)
Subtracting the up-products from the down-products, we get
~u × ~v = (12, 12, 5) − (15, 6, 8) = (−3, 6, −3)
which is the same answer we found in Example 2.6.
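The write-twice, cross-off, multiply-diagonally procedure can itself be coded step by step, which is a handy way to convince yourself that it agrees with the formula. This is only a sketch of the procedure as described above; the function name `cross_by_procedure` is an assumption, and the final subtraction of the up-products from the down-products is how the two vectors of products are combined into the cross product.

```python
def cross_by_procedure(u, v):
    """Cross product via the down-products / up-products procedure.

    Returns (down_products, up_products, cross_product).
    """
    # Steps 1 and 2: write the components of each vector twice.
    top = list(u) * 2        # u1 u2 u3 u1 u2 u3
    bottom = list(v) * 2     # v1 v2 v3 v1 v2 v3

    # Step 3: cross off the first and last numbers on each line.
    top = top[1:-1]          # u2 u3 u1 u2
    bottom = bottom[1:-1]    # v2 v3 v1 v2

    # Step 4: going left to right, multiply down-to-the-right and up-to-the-right.
    down = tuple(top[i] * bottom[i + 1] for i in range(3))
    up = tuple(bottom[i] * top[i + 1] for i in range(3))

    # Combine: down-products minus up-products, component by component.
    result = tuple(d - p for d, p in zip(down, up))
    return down, up, result

down, up, result = cross_by_procedure((1, 2, 3), (4, 5, 6))
print(down)    # (12, 12, 5)
print(up)      # (15, 6, 8)
print(result)  # (-3, 6, -3)
```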
Solution:
We use the procedure above. This time we’ll just carry out the steps, without saying what we’re
doing for each:
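[1] 2 1 1 2 [1]
[−1] 0 1 −1 0 [1]
(crossed-off entries are shown in square brackets; we multiply down to the right for the down-products and up to the right for the up-products)
so ~u × ~v = (2 × 1, 1 × (−1), 1 × 0) − (0 × 1, 1 × 1, (−1) × 2) = (2, −1, 0) − (0, 1, −2) = (2, −2, 2)
Reversing the roles of the two vectors in the same procedure gives ~v × ~u = (0, 1, −2) − (2, −1, 0) = (−2, 2, −2).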
Oh, look! ~v × ~u looks a lot like ~u × ~v , except the signs are all switched. That is, we see that
(−2, 2, −2) = −(2, −2, 2), so ~v × ~u = −(~u ×~v ). That’s not just a fluke. That’s always true! (Go back
to the definition, or to the statement of the procedure, and think about what happens if you reverse
the roles of ~u and ~v . You’ll see that in each component, the 2 products have switched places in the
subtraction, so each component is the negative of what it was before. That is, in the procedure,
we’ve just switched the down-products and the up-products, so we get the negative of the vector we
were getting before.)
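Here is a quick numerical check that switching the order of the vectors just changes the sign of the cross product. It is a sketch only: the vectors (1, 2, 1) and (−1, 0, 1) are read off the worked solution above (treat that reading as an assumption), and the helper names `cross` and `neg` are not from the course text.

```python
def cross(u, v):
    """Cross product of two vectors in R^3 (component formula)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def neg(u):
    """Negative of a vector."""
    return tuple(-a for a in u)

u, v = (1, 2, 1), (-1, 0, 1)
print(cross(u, v))                      # (2, -2, 2)
print(cross(v, u))                      # (-2, 2, -2)
print(cross(v, u) == neg(cross(u, v)))  # True
```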
We asserted before that the cross product of two vectors is a vector that’s orthogonal (i.e. per-
pendicular) to both vectors. How can we see that this is true? Recall from Theorem 2.3 that if two
vectors are orthogonal, their dot product is 0. We can use this to see that ~u × ~v is orthogonal to
both ~u and ~v .
Example 2.9. For the vectors in Example 2.6, show that ~u × ~v is orthogonal to both ~u and ~v .
Solution:
We know that two vectors are orthogonal if and only if their dot product is 0. We calculate (~u ×~v )•~u
and (~u × ~v ) • ~v . In Example 2.6 we had ~u = (1, 2, 3) and ~v = (4, 5, 6), and we found that ~u × ~v =
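(−3, 6, −3). We get:
(~u × ~v) • ~u = (−3, 6, −3) • (1, 2, 3) = (−3)(1) + (6)(2) + (−3)(3) = −3 + 12 − 9 = 0
(~u × ~v) • ~v = (−3, 6, −3) • (4, 5, 6) = (−3)(4) + (6)(5) + (−3)(6) = −12 + 30 − 18 = 0
Since (~u × ~v ) • ~u = 0, then we see that the vector ~u × ~v and the vector ~u are orthogonal, i.e. are
perpendicular to one another. Also, since (~u × ~v) • ~v = 0 as well, then ~u × ~v and ~v are also orthogonal.
That is, the vector ~u × ~v is orthogonal both to ~u and to ~v .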
That’s just one example, but if you use the cross product formula, it’s not hard to see that for any
vectors ~u and ~v , (~u × ~v ) • ~u = 0 and also (~u × ~v ) • ~v = 0. (Try it!)
Theorem 2.4. For any vectors ~u and ~v in ℜ3 , the vector ~u × ~v is orthogonal to both ~u and ~v .
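Theorem 2.4 can be tried out on a few pairs of vectors by computing the cross product and then dotting it with each of the original vectors. The pairs below are the vectors of Example 2.6 plus two other sample pairs chosen for illustration; the helper names are assumptions, and checking a handful of cases is an illustration rather than a proof.

```python
def dot(u, v):
    """Dot product of two vectors from the same space."""
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product of two vectors in R^3 (component formula)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

pairs = [
    ((1, 2, 3), (4, 5, 6)),    # Example 2.6
    ((1, -2, 3), (2, 1, 0)),   # sample pair
    ((1, 0, 0), (0, 1, 0)),    # unit vectors along the x- and y-axes
]

# For each pair, record (u x v, (u x v) . u, (u x v) . v).
report = []
for u, v in pairs:
    n = cross(u, v)
    report.append((n, dot(n, u), dot(n, v)))
    print(n, dot(n, u), dot(n, v))
```

In every row the last two numbers come out as 0, which is exactly the orthogonality the theorem asserts.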
How can a vector be orthogonal to both ~u and ~v at the same time? If you try to think about
that in ℜ2 , you’ll hurt your brain, because it can’t be done! But in ℜ3 it’s easy. There’s a particular
plane which contains the vectors ~u and ~v , and the vector ~u × ~v lies in a plane perpendicular to that
plane. The easiest instance to understand can be seen in the next example.
Solution:
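We have ~ı = (1, 0, 0) and ~j = (0, 1, 0), so we get (crossed-off entries are shown in square brackets):
[1] 0 0 1 0 [0]
[0] 1 0 0 1 [0]
so ~ı × ~j = (0 × 0, 0 × 0, 1 × 1) − (1 × 0, 0 × 1, 0 × 0) = (0, 0, 1) − (0, 0, 0) = (0, 0, 1) = ~k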
So let’s think about that. We know that ~ı = (1, 0, 0) is a unit vector running along the positive
x-axis, and that ~j = (0, 1, 0) is a unit vector running along the positive y-axis, while ~k = (0, 0, 1)
is a unit vector running along the positive z-axis. Imagine the x- and y- axes drawn on the page
or the computer screen in the usual way (for 2-space), and the z-axis coming up out of the page at
you, or toward you out of the computer screen. The z-axis is perpendicular to both the x-axis and
the y-axis (and to the whole plane containing the page or screen). And so ~k is orthogonal to both ~ı
and ~j. Also, if we calculate ~j × ~ı, we have (0, 0, 0) − (0, 0, 1) = (0, 0, −1) = −~k. That’s a unit vector
running along the negative z-axis (down through the page, or into the computer screen). ~j × ~ı has
the opposite direction to ~ı × ~j. And it’s still orthogonal to both ~ı and ~j.
We’ve seen that the cross product of two vectors is orthogonal to both, and also that the cross
product is not commutative, because if you switch the order of the vectors in the cross product, we
get the opposite vector, i.e. the negative, which runs in the opposite direction. There are various
other interesting properties that the cross product has. For instance, just like with the dot product,
if there’s a scalar multiplier on one of the vectors in a cross product, it can be applied to either
vector, or simply factored out and applied after finding the cross product vector. And the cross
product is distributive over addition (or subtraction) of vectors ... but since cross product is not
commutative, we state 2 distributive laws, one for the sum being the first vector in the cross product,
and one for the sum being the second. And it should be pretty easy to see why it’s true that if
either one of the vectors in a cross product is the zero vector, then the cross product vector is the
zero vector. (But remember, it’s still a vector. It’s ~0, not 0.) And likewise, it’s not hard to see
that if you cross any vector with itself, the down-products and up-products vectors are the same,
so what you get is, again, the zero vector. Also, there’s a formula for the magnitude of the cross
product of 2 vectors, and it looks a lot like the formula relating the dot product of the vectors to
their magnitudes, except that it involves the sine of the angle between the vectors, rather than the
cosine. The following theorem lists all these properties.
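Theorem 2.5. Let ~u, ~v and ~w be any vectors in ℜ3 and let c be any scalar. Let θ denote the angle
between ~u and ~v , as previously defined. Then the following properties hold:
1. ~v × ~u = −(~u × ~v )
2. (c~u) × ~v = ~u × (c~v) = c(~u × ~v)
3. (~u + ~v) × ~w = (~u × ~w) + (~v × ~w)
4. ~u × (~v + ~w) = (~u × ~v) + (~u × ~w)
5. ~u × ~0 = ~0 × ~u = ~0
6. ~u × ~u = ~0
7. ||~u × ~v|| = ||~u|| ||~v|| sin θ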
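Several of the cross product properties described above can be checked numerically, including the magnitude formula ||~u × ~v|| = ||~u|| ||~v|| sin θ. In this sketch the angle θ is obtained from the dot product formula of Theorem 2.2 using `math.acos`; the helper names and the choice of the vectors from Example 2.6 are assumptions for illustration, and floating-point results are compared with a tolerance.

```python
import math

def dot(u, v):
    """Dot product of two vectors from the same space."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Magnitude ||u||."""
    return math.sqrt(dot(u, u))

def cross(u, v):
    """Cross product of two vectors in R^3 (component formula)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

u, v = (1, 2, 3), (4, 5, 6)
zero = (0, 0, 0)

# Crossing with the zero vector, or with the vector itself, gives the zero vector.
zero_cases = cross(u, zero) == zero and cross(zero, u) == zero
self_case = cross(u, u) == zero
print(zero_cases, self_case)  # True True

# Magnitude formula: ||u x v|| = ||u|| ||v|| sin(theta), with theta from cos(theta).
theta = math.acos(dot(u, v) / (norm(u) * norm(v)))
lhs = norm(cross(u, v))
rhs = norm(u) * norm(v) * math.sin(theta)
print(math.isclose(lhs, rhs, rel_tol=1e-9))  # True
```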
All those properties seem pretty theoretical, and this isn’t a very theoretical course. They’re
useful to know, but let’s focus on something more practical. What can we use the cross product
for? Well, because of the last part of the theorem above, and because of some trigonometry that we
don’t need to know, it turns out that the magnitude of the cross product of two vectors is the same
as the area of the parallelogram whose sides are those vectors, or are any directed line segments
equivalent to those vectors.
To see what we mean about the parallelogram whose sides are vectors ~u and ~v , consider the
following diagram. Note that this appears to depict vectors in ℜ2 . That’s just to keep the diagram
clear. Imagine that there’s a third axis coming straight out of the page, but that the vectors depicted
happen to both (all) have third component 0, so that they lie in the x-y plane.
[Figure: the parallelogram in the x-y plane with adjacent sides ~u and ~v drawn from the origin; ~u + ~v is the diagonal of the parallelogram.]
The area of this parallelogram is of course width times height. The width is ||~u||. What is
the height? Well, it turns out (for reasons we don’t need to concern ourselves with) it’s ||~v || sin θ.
And that means that width times height gives ||~u|| ||~v || sin θ which, according to the last part of the
theorem above, is the same thing as ||~u × ~v ||. You don’t need to worry about why that’s the height
of the parallelogram, or even why the magnitude of the cross product vector is what the theorem
says it is. You simply need to know that you can always find the area of a parallelogram in ℜ3 as
the magnitude of the cross product of vectors equivalent to 2 adjacent sides of the parallelogram, as
it says in the following theorem.
Theorem 2.6. Consider any parallelogram in ℜ3 . Let ~u and ~v be vectors equivalent to two adjacent
sides of the parallelogram. Then the area of the parallelogram is given by ||~u × ~v ||.
Example 2.11. Consider the vectors ~u = (1, −2, 3) and ~v = (2, 1, 0). Find the area of the parallelo-
gram determined by these vectors.
Solution:
When we say “the parallelogram determined by” ~u and ~v , we mean the parallelogram which has
these vectors as two adjacent sides. According to the theorem, the area of this parallelogram is the
magnitude of the cross product vector, so first we need to find that vector. For ~u = (1, −2, 3) and
~v = (2, 1, 0) we have
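[1] −2 3 1 −2 [3]
[2] 1 0 2 1 [0]
(crossed-off entries are shown in square brackets)
so ~u × ~v = ((−2) × 0, 3 × 2, 1 × 1) − (1 × 3, 0 × 1, 2 × (−2)) = (0, 6, 1) − (3, 0, −4) = (−3, 6, 5)
and therefore Area = ||(−3, 6, 5)|| = √((−3)² + 6² + 5²) = √(9 + 36 + 25) = √70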
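The area calculation for Example 2.11 can be reproduced in a few lines: find the cross product, then take its magnitude. This is a sketch with assumed helper names (`cross`, `norm`, `parallelogram_area`), and the area is printed as a decimal approximation.

```python
import math

def cross(u, v):
    """Cross product of two vectors in R^3 (component formula)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def norm(u):
    """Magnitude of a vector."""
    return math.sqrt(sum(a * a for a in u))

def parallelogram_area(u, v):
    """Area of the parallelogram determined by u and v (Theorem 2.6)."""
    return norm(cross(u, v))

u, v = (1, -2, 3), (2, 1, 0)
n = cross(u, v)
area = parallelogram_area(u, v)
print(n)     # (-3, 6, 5)
print(area)  # decimal approximation of the square root of 70
```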
Any triangle can be thought of as half of a parallelogram. Generally, you just think about taking
another copy of the same triangle and putting it upside down right beside the triangle, so that they
have a common edge.
For instance, for a triangle which has as two adjacent sides some vectors ~u and ~v , we have:
[Figure: the original triangle with sides ~u and ~v drawn from the origin in the x-y plane, and the same triangle upside down right next to it, so that together they form a parallelogram.]
This means that the triangle determined by ~u and ~v (i.e. with ~u and ~v as adjacent sides) is one half
of the parallelogram determined by ~u and ~v .
Theorem 2.7. The area of any triangle in ℜ3 can be found as area = (1/2) × ||~u × ~v ||, where ~u and ~v
are vectors equivalent to any 2 sides of the triangle.
Example 2.12. Find the area of the triangle OAB, where O is the origin, A is the point (2, 5, 1) and
B is the point (4, 6, 2).
Solution:
The sides of the triangle are ~a = →OA = (2, 5, 1), ~b = →OB = (4, 6, 2) and →BA = ~a − ~b = (−2, −1, −1).
(Recall from Unit 1 that the vector obtained by translating →UV to the origin is the vector ~v − ~u,
where ~u and ~v have endpoints U and V , respectively. Also, notice that we don’t need to use the
vector equivalent to →BA in finding the area of the given triangle, because we only need to use 2 of
the sides.) So the area of the triangle is one-half the magnitude of the vector ~a × ~b
(crossed-off entries are shown in square brackets):
[2] 5 1 2 5 [1]
[4] 6 2 4 6 [2]
so ~a × ~b = (5 × 2, 1 × 4, 2 × 6) − (6 × 1, 2 × 2, 4 × 5) = (10, 4, 12) − (6, 4, 20) = (4, 0, −8)
and therefore Area = (1/2)||(4, 0, −8)|| = √((4)² + (0)² + (−8)²) / 2 = √(16 + 0 + 64) / 2
= √80 / 2 = √(4 × 20) / 2 = √(4 × 4 × 5) / 2 = 4√5 / 2 = 2√5
There’s one more useful application of the cross product, which is to find the volume of the
parallelepiped determined by 3 vectors. A parallelepiped is a 3-dimensional figure which is basically
a sloped cube or box. That is, a cube is a special case of a parallelepiped, in which each face is a
square. In general in a parallelepiped, each face is a parallelogram.
[Figure: a parallelepiped with edges drawn as directed line segments equivalent to the vectors ~u, ~v and ~w.]
Notice that each of the edges is a (directed) line segment equivalent to one of the three vectors.
Theorem 2.8. The volume of the parallelepiped determined by the vectors ~u, ~v and ~w is given by
Volume = |(~u × ~v) • ~w|
That is, the volume is the absolute value of the dot product of the vector ~u × ~v with vector ~w.
Notice that the absolute value signs are necessary because volume must be positive, whereas the
dot product of ~u × ~v with ~w may be either a positive or a negative number. And given 3 vectors
or directed line segments which determine a parallelepiped, there’s no “right” designation of which
plays the role of ~u, which plays the role of ~v and which plays the role of ~w for this calculation. Any
configuration of the vectors will give the same value for the volume. (But for some, the value of the
dot product is negative, so we need to take the absolute value.)
Example 2.13. Find the volume of the parallelepiped determined by the directed line segments →AB,
→AC and →AD for the points A(1, 0, 1), B(2, 1, 1), C(2, 2, −1) and D(1, 1, 2).
Solution:
We first use the fact that when any directed line segment →UV is translated to the origin, the result
is the vector ~v − ~u, where ~u and ~v have endpoints U and V , respectively. We get:
→AB = ~b − ~a = (2, 1, 1) − (1, 0, 1) = (1, 1, 0)
→AC = ~c − ~a = (2, 2, −1) − (1, 0, 1) = (1, 2, −2)
→AD = ~d − ~a = (1, 1, 2) − (1, 0, 1) = (0, 1, 1)
The volume of the parallelepiped determined by the 3 directed line segments is the same as the
volume of the parallelepiped determined by these 3 vectors. (The parallelepiped determined by the
vectors is just the same parallelepiped, but moved so that the corner which was at point A is now at
the origin.) We can designate the 3 vectors above as ~u, ~v and ~w in any way. Let’s assign the names
in the order the vectors are listed in. That is, let’s use ~u = (1, 1, 0), ~v = (1, 2, −2) and ~w = (0, 1, 1).
The next step is to find ~u × ~v (crossed-off entries are shown in square brackets):
[1] 1 0 1 1 [0]
[1] 2 −2 1 2 [−2]
so ~u × ~v = (1 × (−2), 0 × 1, 1 × 2) − (2 × 0, (−2) × 1, 1 × 1)
= (−2, 0, 2) − (0, −2, 1)
= (−2, 2, 1)
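Then we find the dot product of this vector with ~w:
(~u × ~v) • ~w = (−2, 2, 1) • (0, 1, 1) = (−2)(0) + (2)(1) + (1)(1) = 0 + 2 + 1 = 3
so the volume of the parallelepiped is |3| = 3.
Notice: If we had designated the vectors in a different way, we would still get the same answer.
For instance, suppose we used ~u = (1, 1, 0), ~v = (0, 1, 1) and ~w = (1, 2, −2). (That is, suppose we
switched the roles of the vectors we previously called ~v and ~w.) Then we would get
(crossed-off entries are shown in square brackets):
[1] 1 0 1 1 [0]
[0] 1 1 0 1 [1]
so ~u × ~v = (1 × 1, 0 × 0, 1 × 1) − (1 × 0, 1 × 1, 0 × 1) = (1, 0, 1) − (0, 1, 0) = (1, −1, 1)
and (~u × ~v) • ~w = (1, −1, 1) • (1, 2, −2) = (1)(1) + (−1)(2) + (1)(−2) = 1 − 2 − 2 = −3
so again the volume is |−3| = 3.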
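The claim that any designation of the three vectors gives the same volume can be checked by brute force: try all 6 orderings and collect the values of |(~u × ~v) • ~w|. This is a sketch for the points of Example 2.13, with assumed helper names; `itertools.permutations` from the Python standard library generates the orderings.

```python
from itertools import permutations

def sub(u, v):
    """Vector difference u - v."""
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    """Dot product of two vectors from the same space."""
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product of two vectors in R^3 (component formula)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

A, B, C, D = (1, 0, 1), (2, 1, 1), (2, 2, -1), (1, 1, 2)
AB, AC, AD = sub(B, A), sub(C, A), sub(D, A)

# Signed values of (u x v) . w for every ordering, and the resulting volumes.
signed = [dot(cross(u, v), w) for u, v, w in permutations((AB, AC, AD))]
volumes = {abs(s) for s in signed}

print(sorted(signed))  # [-3, -3, -3, 3, 3, 3]
print(volumes)         # {3}
```

Half of the orderings give a negative dot product, which is why the absolute value in Theorem 2.8 is needed.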
Unit 3:
Lines and Planes
(text reference: Section 1.3)
© V. Olds 2010
Lines in ℜ2
You are already familiar with equations of lines. In previous courses you will have written
equations of lines in slope-point form, in slope-intercept form, and probably also in standard form
for a line in ℜ2 . Recall that:
y − y1 = m(x − x1 ) is the slope-point form equation of the line through point (x1 , y1 ) with slope m
y = mx + b is the slope-intercept form equation of the line with slope m and y-intercept b
ax + by = c is the standard form equation which either of the others can be rearranged to
In this course, we don’t use the slope-point or slope-intercept forms of equations of lines. Instead,
we use various other, vector-based, forms of equations. But we do still use the standard form.
You already know that given any 2 distinct points, whether in ℜ2 or in ℜ3 , there is exactly one
line which passes through both points. Suppose we have 2 points, P and Q. Let ℓ be the line that
passes through these two points. Both point P and point Q lie on line ℓ, and so do all the points
between them. In fact, the directed line segment →PQ lies on line ℓ. When this directed line segment
is translated to the origin, the resulting vector most likely doesn’t lie on line ℓ (unless the origin
happens to lie on line ℓ), but if not, it does lie on a line which is parallel to line ℓ. It lies on the line
parallel to ℓ which passes through the origin. So this vector does give us some information about
the line. (Similar to the information given by knowing the slope of a line in ℜ2 , although it’s not
quite the same information.)
If a vector ~v lies on a particular line, or on a line parallel to that line, we say that ~v is parallel to,
or is collinear with that line. And we call ~v a direction vector for the line. Not that the line actually
has a direction associated with it. It doesn’t. It extends in both directions, but has no particular
“forwards along the line” or “backwards along the line” associated with it. So don’t read too much
meaning into the term direction vector. If ~v is a direction vector for line ℓ, then so is −~v . And so
is every other scalar multiple of ~v , except for 0~v. Because of course 0~v = ~0 which has no direction
information. But every other scalar multiple of ~v starts at the origin, and goes either the same or
the opposite direction as ~v and therefore also lies on the line parallel to ℓ which passes through the
origin. So any such vector would be considered a direction vector for line ℓ.
Definition: Any non-zero vector which is parallel to a line ℓ is called a direction vector
for line ℓ.
For instance, consider the line x + y = 2. The points (1, 1), (2, 0), (0, 2), (−1, 3), (3, −1), (−2, 4),
(4, −2), etc., all lie on this line. So do (1/2, 3/2) and (3/2, 1/2) and infinitely many other points.
Pick any 2 of these points, and find the vector which is the translation to the origin of the directed
line segment between them, and you have a direction vector for the line. And we know that for any
points P and Q, letting ~p = →OP denote the vector from the origin to point P and ~q = →OQ denote
the vector from the origin to point Q, the vector ~v = ~q − ~p is the translation of directed line segment
→PQ to the origin. So for instance for points P(1, 1) and Q(0, 2), we have ~p = (1, 1) and ~q = (0, 2),
and we see that ~v = ~q − ~p = (0, 2) − (1, 1) = (−1, 1) is a direction vector for the line x + y = 2. And
other choices of P and Q give other direction vectors which are scalar multiples of this one. (Go
ahead, pick some other points, and see what vectors you get.)
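If you’d rather let a computer pick the points, here is a minimal Python sketch of that experiment. The helper names direction and is_scalar_multiple are made up for this illustration (they are not from the text), and the test for collinearity in the plane, u1 v2 − u2 v1 = 0, is a standard fact assumed here rather than something developed in this unit.

```python
from itertools import combinations

def direction(p, q):
    """Translate the directed segment from p to q to the origin: q - p."""
    return (q[0] - p[0], q[1] - p[1])

def is_scalar_multiple(u, v):
    """In the plane, u and v are collinear exactly when u1*v2 - u2*v1 == 0."""
    return u[0] * v[1] - u[1] * v[0] == 0

# Some of the points on the line x + y = 2 listed in the text.
points = [(1, 1), (2, 0), (0, 2), (-1, 3), (3, -1), (-2, 4), (4, -2)]
directions = [direction(p, q) for p, q in combinations(points, 2)]

# Every pair of distinct points gives a scalar multiple of (-1, 1).
all_parallel = all(is_scalar_multiple(d, (-1, 1)) for d in directions)
print(direction((1, 1), (0, 2)), all_parallel)   # (-1, 1) True
```

Integer coordinates are used so that the comparison with 0 is exact.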
Point-Parallel Form
If we know a direction vector, ~v , for a line, and any one point, P , on the line, we can use them
to write an equation of the line. Because if we take the vector from the origin to the specified point,
~p = →OP, and travel from there any non-zero scalar multiple of the direction vector (i.e. form the
vector sum of the vector ~p and some scalar multiple of the direction vector), we travel along the
line and end up at some other point on the line (i.e. the sum vector goes from the origin to some
point on the line). And any point on the line can be reached by doing this. It’s just a matter of
choosing the right scalar multiple. So let Q(x, y) be any point on the line ℓ which goes through
point P and has direction vector ~v. Then ~q = →OQ = ~p + t~v, for some value of t. And if we write the
vectors in component form, it looks like we’re adding points together, although of course we’re not.
If we let P (x1 , y1 ) denote the known point on the line, and use ~v = (v1 , v2 ) to denote the direction
vector, we get (x, y) = (x1 , y1 )+t(v1 , v2 ) as an equation which describes all points (x, y) on the line ℓ.
The way we actually write the line is a little different. We use ~x(t) to denote the vector (x, y),
i.e. the vector from the origin to an unspecified point on the line. (It is written as ~x(t) to denote
that the specific point obtained depends on, i.e. is a function of, the particular t value used.) Since
we’re writing an equation of the line using a point on the line and a vector which is parallel to the
line, we refer to the equation as being in point-parallel form.
Definition: The point-parallel form equation of the line ℓ which passes through point
P and has direction vector ~v is given by
~x(t) = ~p + t~v
i.e. ~x(t) = (p1, p2) + t(v1, v2)
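To see the definition in action, here is a small Python sketch that evaluates ~x(t) = ~p + t~v one component at a time. The function name point_on_line is an assumption made for this illustration, and the particular point and direction vector are just sample values.

```python
from fractions import Fraction

def point_on_line(p, v, t):
    """Return the point x(t) = p + t*v, computed component by component."""
    return tuple(pi + t * vi for pi, vi in zip(p, v))

# Sample line: through P(1, 2) with direction vector v = (2, -1).
p, v = (1, 2), (2, -1)

# Each value of t picks out one point on the line; t = 0 gives P itself.
for t in (0, 1, -1, Fraction(1, 2)):
    print(t, point_on_line(p, v, t))
```

Fractions are used instead of decimals so that a value like t = 1/2 gives exact coordinates.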
Example 3.1. Write a point-parallel form equation for each of the following lines:
(a) The line through P (3, 1) and Q(0, 6).
(b) The line through P (1, 2) with direction vector ~v = (2, −1).
(c) The line through the origin with direction vector ~v = (0, 1).
Solution:
(a) The directed line segment →PQ lies on, and hence is parallel to, the line through P and Q. We
have the vector ~p = (3, 1) which goes from the origin to the point P, and the vector ~q = (0, 6) which
goes from the origin to the point Q, and so when we translate →PQ to the origin, we get
~v = ~q − ~p = (0, 6) − (3, 1) = (−3, 5)
as a direction vector for the line. And now, we can use either P or Q as the point which we know
to be on the line. If we use P, we get the point-parallel form equation:
~x(t) = (3, 1) + t(−3, 5)
(b) This time, we don’t need to find the direction vector. We have ~p = (1, 2), so we use this and
~v = (2, −1) in the form ~x(t) = ~p + t~v to get the point-parallel form equation
~x(t) = (1, 2) + t(2, −1)
(c) Again, all we need to do is plug the point and the direction vector into the point-parallel form.
The point, of course, is the origin, i.e. (0, 0), and the direction vector is ~v = (0, 1). We get:
~x(t) = (0, 0) + t(0, 1)
Parametric Equations
There’s another form of an equation of a line, which is really more than one equation, that
follows directly from the point-parallel form. Remember, the ~x(t) on the left hand side of the
point-parallel equation is simply saying that the vector ~x = (x, y), corresponding to any point (x, y)
on the line, is determined by the choice of value of the parameter t. So as we saw before, the
point-parallel equation ~x(t) = (p1 , p2 ) + t(v1 , v2 ) really says that for any point (x, y) on the line,
(x, y) = (p1 , p2 )+ t(v1 , v2 ). Now, this is a statement about vectors, but (as previously noted) it looks
like we’re doing arithmetic with points. We’re not really, because that wouldn’t make any sense,
but if we break it down to individual components of vectors, we get statements which are equally
true of coordinates of points. Consider the first components of the vectors in the equation. We can
express the vector arithmetic being done for that component as x = p1 + tv1 . And if we think about
x as the x-coordinate of an unspecified point on the line, and p1 as the x-coordinate of the point P ,
and v1 as the x-coordinate of the endpoint of the direction vector, then the statement x = p1 + tv1
simply says that you can get the x-coordinate of a point on the line by adding some multiple of the
x-coordinate of the endpoint of the direction vector to the x-coordinate of the known point. And
then, if you do the same thing with the y-coordinates, using the same value of the multiplier, t, the
formula y = p2 + tv2 gives the y-coordinate of the same point on the line. So the point-parallel form
equation also gives us two equations, which together describe any point on the line. And because it
describes the point in terms of the effect of the parameter t, we call these parametric equations of
the line.
Definition: The line ℓ with point-parallel form equation ~x(t) = (p1 , p2 ) + t(v1 , v2 ) has
parametric equations
x = p1 + tv1
y = p2 + tv2
Notice: Parametric equations of lines in ℜ2 always come in pairs. You can’t have only one parametric
equation, telling about just one component/coordinate. That doesn’t describe the line. Also,
if you use parametric equations to find points on the line, you have to remember to use the same
value of t in both equations.
Example 3.2. Find parametric equations for each of the lines in Example 3.1.
Solution:
(a) We have the point-parallel form equation ~x(t) = (3, 1) + t(−3, 5). We get the right hand side of
the x equation from the first components, and the right hand side of the y equation from the second
components. Of course, we don’t usually write something like t(−3) or t(5), or even t5. We would
write this product as −3t or 5t. And we know that adding a negative is the same as subtracting, so
the minus sign in the −3t can replace the plus sign. We get:
x = 3 − 3t
y = 1 + 5t
(b) This time we have the point-parallel form equation ~x(t) = (1, 2) + t(2, −1), which gives the
parametric equations:
x = 1 + 2t
y = 2 − t
(c) And now, we use ~x(t) = (0, 0) + t(0, 1). But unless it’s the only thing there, we don’t need to
write a 0. And we never need to write a 1 multiplier. We get:
x = 0 + 0t ⇒ x = 0
y = 0 + 1t ⇒ y = t
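The warning about using the same value of t in both equations can be checked with a short Python sketch, using the parametric equations from part (a). The function names x_of, y_of and on_line are made up for this illustration. The test inside on_line (that the vector from P to the point is a scalar multiple of the direction vector) is an assumption of this sketch, not a method given in the text.

```python
def x_of(t):
    return 3 - 3 * t      # x = 3 - 3t

def y_of(t):
    return 1 + 5 * t      # y = 1 + 5t

def on_line(point):
    """True when (point - P) is a scalar multiple of v = (-3, 5), with P = (3, 1)."""
    dx, dy = point[0] - 3, point[1] - 1
    return dx * 5 - dy * (-3) == 0

same_t = (x_of(2), y_of(2))     # both equations use t = 2
mixed_t = (x_of(0), y_of(1))    # x uses t = 0 but y uses t = 1

print(same_t, on_line(same_t))      # (-3, 11) True
print(mixed_t, on_line(mixed_t))    # (3, 6) False
```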
Example 3.3. Write a point-parallel form equation for the line with parametric equations
x = 1 + 5t
y = 2
Solution:
We use the x equation to find the first components for our point-parallel equation, and the y equation
to find the second components. We need to recognize that in each equation, the number on the right
hand side that isn’t multiplied by t is the coordinate of the known point, P , and that the number
that is multiplied by t is the component of the direction vector, ~v . So from the first equation, i.e.
the x-equation, we see that p1 = 1 and v1 = 5. And from the second equation, since there’s no t
multiplying it, the 2 must be p2 . So where’s the t? It’s invisible, which means it must have a 0
multiplier. That is, v2 = 0. So we have the point P(1, 2) and the direction vector ~v = (5, 0), which
when we put it in the form ~x(t) = ~p + t~v gives the point-parallel form equation
~x(t) = (1, 2) + t(5, 0)
Two-Point Form
Now, suppose that we have two points, P and Q, and consider the vectors ~p and ~q, from the
origin to the points. Let X be any point on the line segment joining P and Q. Of course, we can
consider X to be a point on the directed line segment →PQ. How can we describe the vector ~x, from
the origin to point X? Let’s look at a picture.
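[Figure: the points P and Q in the plane, with the vectors ~p, ~x and ~q drawn from the origin; the point X lies on the line segment from P to Q.]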
Consider the directed line segment →PX. Suppose that we travel along the vector ~p and then
along the directed line segment →PX. Then we started at the origin and ended up at the point X,
the same as if we travelled along the vector ~x. In terms of vector sums, we travelled ~p + ~u where ~u
is the translation of →PX to the origin. So we have ~x = ~p + ~u.
But let’s think more about →PX and ~u. →PX is a piece of the directed line segment →PQ. For
instance, if X was exactly one-third of the way along →PQ, then →PX would be equivalent to one-third
of →PQ. And we know that →PQ = ~q − ~p. So we could say (if X happened to be exactly one-third
of the way along →PQ) that ~u = (1/3)(~q − ~p). Now, we don’t necessarily have X being one-third of
the way along. We’re considering any point X on →PQ. But then we do know that X is 100t% of the
way along →PQ, for some value t between 0 and 1. (For instance, if X is 20% of the way along, then
t = .2.) And then this means that we have →PX = t(→PQ) so that ~u = t(~q − ~p). Therefore we also have
~x = ~p + ~u = ~p + t(~q − ~p) = ~p + t~q − t~p = (1 − t)~p + t~q
That is, for any point X along the line segment joining point P and point Q, we have
~x = (1 − t)~p + t~q
But nothing that we did here really required X to be in between P and Q. We could do something
similar for any point on the line containing P and Q. The only difference is that t would no longer
necessarily be between 0 and 1. That is, for any point X on the line containing the points P and
Q, we could travel from the origin to point P, and then travel some scalar multiple of the vector
~q − ~p to end up at the point X. For instance, consider the diagram shown below. As before, we have
~x = ~p + t(~q − ~p), but now t is bigger than 1. Or if we needed to go the other direction along the line, from
P, then t would be negative.
[Figure: the line through P and Q, with the vectors ~p, ~q and ~x drawn from the origin; the point X lies on the line beyond Q.]
So for any point X on the line through P and Q we have
~x = ~p + t(~q − ~p) = (1 − t)~p + t~q
for some value t. That is, we can express the line containing two points P and Q as the line
containing all points X(x, y) such that ~x = (1 − t)~p + t~q for some value t. And so we have another
form of equation for the line. We call this the two-point form, and as before, we write ~x(t) instead
of just ~x.
Definition: The two-point form of equation for the line through points P and Q is:
~x(t) = (1 − t)~p + t~q
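Since the two-point form came from rearranging ~p + t(~q − ~p), the two forms should give exactly the same point for every value of t. Here is a hedged Python sketch of that check, using the points P(3, 1) and Q(0, 6) as sample data. The function names two_point and point_parallel are assumptions made for this illustration.

```python
from fractions import Fraction

def two_point(p, q, t):
    """x(t) = (1 - t)p + t q, component by component."""
    return tuple((1 - t) * pi + t * qi for pi, qi in zip(p, q))

def point_parallel(p, v, t):
    """x(t) = p + t v, component by component."""
    return tuple(pi + t * vi for pi, vi in zip(p, v))

p, q = (3, 1), (0, 6)
v = tuple(qi - pi for pi, qi in zip(p, q))   # q - p = (-3, 5)

# Same t, same point, whether t is between 0 and 1 or not.
for t in (0, Fraction(1, 3), 1, 2, -1):
    assert two_point(p, q, t) == point_parallel(p, v, t)

print(two_point(p, q, 0), two_point(p, q, 1), two_point(p, q, 2))
# (3, 1) (0, 6) (-3, 11)
```

Notice that t = 0 gives P, t = 1 gives Q, and t = 2 gives a point beyond Q, as described above.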
Example 3.4. Write equations in two-point form for each of the lines in Example 3.1.
Solution:
(a) The line here is the line through the points P(3, 1) and Q(0, 6), so we have ~p = (3, 1) and
~q = (0, 6) and we get the two-point form
~x(t) = (1 − t)(3, 1) + t(0, 6)
(b) This time, we have the line through P (1, 2) with direction vector ~v = (2, −1). We don’t know
two points on the line, so we need to find a second point. We saw in Example 3.1(b) that a point-
parallel form equation for this line is ~x(t) = (1, 2) + t(2, −1). We can choose any value of t other
than 0 to get another point on the line. (Notice: We don’t want to use t = 0, because that will
just give us the point we already know. But any other value of t will do.) For instance, using
t = 1 we have ~x(1) = (1, 2) + 1(2, −1) = (1, 2) + (2, −1) = (3, 1), so we see that Q(3, 1) is another
point on the same line. Now that we know two points on the line, we can find a two-point form
equation. Notice, though, that since we have already been using t as the parameter for the point-
parallel form equation, we should use a different name for the parameter in the two-point form
equation. (Especially since we gave t a specific value. We wouldn’t want to get confused and think
the parameter in the two-point form equation was supposed to have that value too.) Notice also
that it doesn’t matter in the least what letter we use to represent the parameter (which is just a
scalar multiplier). So we can use s instead. We get:
~x(s) = (1 − s)(1, 2) + s(3, 1)
(c) This time, we have the line through the origin with direction vector (0, 1). We know that the
point (0, 0) is on the line, and clearly the point (0, 1) is also on the line (because the vector (0, 1) is
on the line, since the line does pass through the origin). So a two-point form equation for this line
is
~x(t) = (1 − t)(0, 0) + t(0, 1)
Point-Normal Form
When two lines meet at right angles, we call them perpendicular. (You knew that.) And we
have already learnt that when two vectors are perpendicular, there’s another word we use. Instead
of saying they’re perpendicular, we say they are orthogonal. When we’re talking about a vector
and a line, there’s yet another word that we use. (This was mentioned earlier, but is now defined.)
Definition: Any non-zero vector which is perpendicular to a line ℓ is called a normal vector,
or simply a normal, for line ℓ.
So orthogonal and normal really just mean perpendicular, but the three words are used in different
contexts.
If ~n is a normal vector for a particular line, then it is orthogonal to any direction vector for the
line. We have already seen that if we know a direction vector for a line, and one point on the line,
we can write a vector equation for the line, in point-parallel form. Similarly, if we know a normal
vector for a line in ℜ2 , and one point on the line, we can write a vector equation for the line. We
call it the point-normal form of the line. The equation comes from the fact that the dot product
of two orthogonal vectors is 0.
Suppose ~n is a normal vector for a particular line, and P is a point on that line. Let X be
any other point on the line. Then the directed line segment →PX lies on the line, and so the vector
~x − ~p, which is equivalent to →PX, is parallel to the line. But then ~n is orthogonal to ~x − ~p, and so
~n • (~x − ~p) = 0. So if ~n = (n1, n2) and the point is P(p1, p2), then we have (n1, n2) • (~x − (p1, p2)) = 0.
This equation is the form we call point-normal. As always, it describes all the points X(x, y) which
lie on the line.
Definition: Let ℓ be any line in ℜ2. If ~n = (n1, n2) is a normal for line ℓ, and P(p1, p2)
is a point on line ℓ, then an equation for line ℓ in point-normal form is:
~n • (~x − ~p) = 0
i.e. (n1, n2) • (~x − (p1, p2)) = 0
Example 3.5. Write an equation in point-normal form for the line through P (1, 2) with normal
~n = (−1, 1).
Solution:
We get the equation:
(−1, 1) • (~x − (1, 2)) = 0
Recall that in ℜ2 , the vector (b, −a) is orthogonal to the vector (a, b), because (b, −a) • (a, b) = 0.
This means that whenever we know a direction vector for a line in ℜ2 (i.e. a vector which is parallel
to the line) then we can easily find a normal for the line (i.e. a vector which is perpendicular to the
line). And vice versa. So it’s easy to find the point-parallel form of a line from the point-normal
form, and also to find the point-normal form from the point-parallel form.
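The “switch the components and change one sign” recipe is easy to try out. Below is a minimal Python sketch; the function names dot and perpendicular are made up for this illustration. Note that perpendicular always negates the second entry of the result, so for (2, −1) it returns (−1, −2) rather than (1, 2). Those two vectors are scalar multiples of one another, so either one serves as a normal.

```python
def dot(u, v):
    """Dot product of two vectors with the same number of components."""
    return sum(ui * vi for ui, vi in zip(u, v))

def perpendicular(v):
    """For v = (a, b), return (b, -a), which satisfies (b, -a) . (a, b) = 0."""
    a, b = v
    return (b, -a)

# The dot product in the last column is always 0.
for v in [(2, -1), (1, 2), (-3, 5)]:
    n = perpendicular(v)
    print(v, n, dot(n, v))
```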
Example 3.6.
(a) Find an equation in point-normal form for the line ~x(t) = (0, 1) + t(2, −1).
(b) Write an equation in point-parallel form for the line from Example 3.5.
(c) Write a point-normal form equation for the line with parametric equations
x = 3 + t and y = 2t − 4
Solution:
(a) We have ~x(t) = (0, 1) + t(2, −1), which we recognize as a point-parallel form equation for the
line through point (0, 1) parallel to the vector (2, −1). Since the vector (2, −1) is parallel to the
line, then the vector (1, 2), obtained by switching the components and changing one of the signs,
is perpendicular to the line. That is, ~n = (1, 2) is a normal for this line. So a point-normal form
equation for the line is
(1, 2) • (~x − (0, 1)) = 0
(b) In Example 3.5 we found the point-normal form equation (−1, 1)• (~x − (1, 2)) = 0 for a particular
line. Since (−1, 1) is a normal for this line, and the vector (1, 1) is orthogonal to (−1, 1), then the
vector (1, 1) is parallel to the line, i.e. is a direction vector for the line. And of course (1, 2) is a
point on the line. So a point-parallel form equation for this line is
~x(t) = (1, 2) + t(1, 1)
(c) From the parametric equations of the line we can identify both a point on the line and a direction
vector for the line. Remember, the multiplier on t is the component of the direction vector, while
the number without a t is the coordinate of the known point. Keeping this in mind allows us to
correctly identify both the known point and the direction vector from the parametric equations,
even when they look a bit different than we expect.
x = 3+t
y = 2t − 4
We’re more accustomed to seeing the form we have in the x equation. The form in the y equation,
with the t term coming before the non-t term, is different. This is just done to avoid having a
“leading negative”. Equations look less tidy when the first thing on one side of the equation is a
negative sign, so mathematicians often avoid writing things that way. That is, the given parametric
equations are just a tidier form of
x = 3+t
y = −4 + 2t
In this form we see that the corresponding point-parallel form equation is ~x(t) = (3, −4) + t(1, 2).
So (1, 2) is a direction vector for the line and therefore (2, −1) is a normal for the line. Thus we can
write a point-normal form equation as
(2, −1) • (~x − (3, −4)) = 0
Standard Form
Using the point-normal form of a line, we can get another form of equation for a line in ℜ2 – one
which is already familiar to you. We get it by writing the vector ~x as ~x = (x, y) and distributing the
dot product over the bracket in the point-normal form equation. (The vector (x, y), as always, just
represents any vector whose endpoint (x, y) is a point on the line. That is, (x, y) is any unspecified
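point on the line.) For any line with normal vector (n1, n2) containing point (p1, p2) we get
(n1, n2) • ((x, y) − (p1, p2)) = 0
⇒ (n1, n2) • (x, y) − (n1, n2) • (p1, p2) = 0
⇒ n1 x + n2 y = (n1, n2) • (p1, p2)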
But if (n1 , n2 ) is a known normal vector to the line, and (p1 , p2 ) is a known point on the line (so that
n1 , n2 , p1 and p2 are all just numbers and we know which numbers they are), then (n1 , n2 ) • (p1 , p2 )
is just a number, i.e. is a known scalar, which we could call c. So we have n1 x + n2 y = c. Or,
to make this general form look more familiar to you, we could use a and b in place of n1 and n2
and write ax + by = c. Ah. You’ve seen that before, haven’t you? That’s the standard form of an
equation of a line.
And what we’ve seen is that in this kind of equation, the coefficients a and b are the components
of a normal vector for the line. And we also saw how to find the constant c.
Theorem 3.1. If ax + by = c is a standard form equation for line ℓ, then ~n = (a, b) is a normal
vector for line ℓ. Also, if P (p1 , p2 ) is a point on the line, then c = ~n • p~.
So if we know a point-normal equation for a line, i.e. if we know a normal for the line and we
know a point on the line, then it’s easy to find a standard form equation for the line. We simply
use the components of the normal as the coefficients of x and y, and use the normal vector and the
point to find the right hand side value, c. Likewise, if we have an equation of a line in standard
form, we can easily find a normal to the line, because the coefficients of x and y are the components
of a normal vector to the line. And then we just need to find any point on the line, to write a
point-normal form equation of the line.
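Theorem 3.1 translates almost word for word into a few lines of Python. The sketch below is only an illustration: the function names standard_form and satisfies are assumptions, and a line ax + by = c is stored as the triple (a, b, c). It uses the normal ~n = (−1, 1) and the point P(1, 2) from Example 3.5.

```python
def dot(u, v):
    """Dot product of two vectors with the same number of components."""
    return sum(ui * vi for ui, vi in zip(u, v))

def standard_form(n, p):
    """Return (a, b, c) describing ax + by = c, with (a, b) = n and c = n . p."""
    return (n[0], n[1], dot(n, p))

def satisfies(line, point):
    """Check whether the point's coordinates satisfy ax + by = c."""
    a, b, c = line
    return a * point[0] + b * point[1] == c

line = standard_form((-1, 1), (1, 2))
print(line)                      # (-1, 1, 1), i.e. -x + y = 1
print(satisfies(line, (1, 2)))   # True: the known point is on the line
print(satisfies(line, (0, 0)))   # False: the origin is not
```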
Example 3.7. Write a standard form equation for the line in Example 3.5.
Solution:
In Example 3.5 we had the line through P (1, 2) with normal vector ~n = (−1, 1). We use the
components of the normal vector as the coefficients of x and y in the standard form equation, so we
have (−1)x + (1)y = c, or −x + y = c. We find the value of c using c = ~n • ~p. We get
c = (−1, 1) • (1, 2) = (−1)(1) + (1)(2) = −1 + 2 = 1
So the standard form equation is −x + y = 1. However, we don’t usually write something like this
with a leading negative. So we multiply the whole equation (i.e. both sides of the equation) by −1
to get rid of it. We get x − y = −1. (Notice: (1, −1) = −(−1, 1) = −~n is parallel to (collinear with)
~n, and is therefore another normal vector for this line.)
Example 3.8. Write a point-normal form equation for the line x − 2y = 5.
Solution:
We use the coefficients of x and y as the components of a normal vector for the line. Of course,
x − 2y = 1x + (−2)y, so the coefficients are 1 and −2. That is, we get ~n = (1, −2) as a normal vector
for the line. Now, we just need to find any point on the line. We plug in any convenient x-value
and solve for y. Or we plug in any convenient y-value and solve for x. For instance, when y = 0 we
have x − 2(0) = 5, so x − 0 = 5. That is, we see that when y = 0 we must have x = 5. So P (5, 0) is
a point on the line. Now we can write the point-normal form equation:
(1, −2) • (~x − (5, 0)) = 0
Example 3.9. Write a standard form equation of the line ~x(t) = (3, 2) + t(2, 7).
Solution:
From the given point-parallel form equation, we see that P (3, 2) is a point on the line and ~v = (2, 7)
is a direction vector for the line, i.e. is parallel to the line. And so ~n = (7, −2) is a normal vector to
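the line, so the standard form equation has the form 7x − 2y = c for some value c. And we can find c using
c = ~n • ~p = (7, −2) • (3, 2) = 7(3) + (−2)(2) = 21 − 4 = 17
So a standard form equation of the line is 7x − 2y = 17.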
Lines in ℜ3
Of course, we can have lines in 3-space, as well as in the plane. And there’s a lot that’s the same
in ℜ3 as it was in ℜ2 , so we use the same terminology and notation.
For instance, when we move from 2 dimensions to 3, it’s still true that given any 2 points, there
is exactly one line that passes through both those points. And the vector equivalent to the directed
line segment between those points is parallel to that line, so we still call it a direction vector for the
line. That is, we define the term direction vector the same way in ℜ3 as we did in ℜ2 .
As in ℜ2 , we can use a direction vector for a line (i.e. a vector parallel to the line) and any
one point on the line to write a point-parallel equation for the line. And from that we can write
parametric equations. Or we could write a 2-point form equation, instead.
The only difference is that now the points have 3 coordinates and the vectors have 3 components.
Of course, for parametric equations this means that we have a third equation, corresponding to the
z components of the vectors.
Definition: Let P (p1 , p2 , p3 ) and Q(q1 , q2 , q3 ) be any points in ℜ3 and let ~v = (v1 , v2 , v3 )
be any vector in ℜ3 . Then:
1. If ℓ is the line which passes through P parallel to ~v (so that ~v is a direction vector
for ℓ), then
~x(t) = (p1 , p2 , p3 ) + t(v1 , v2 , v3 )
is an equation for line ℓ in point-parallel form.
2. If line ℓ passes through point P and ~v is a direction vector for ℓ, then parametric
equations of line ℓ are:
x = p1 + tv1
y = p2 + tv2
z = p3 + tv3
3. If points P and Q are both on line ℓ then a two-point form equation for ℓ is
~x(t) = (1 − t)(p1, p2, p3) + t(q1, q2, q3)
Example 3.10. Let ℓ be the line which passes through the points P (1, 2, 3) and Q(1, −1, 1). Write
equations of line ℓ in two-point form and in point-parallel form.
Solution:
In two-point form, we get the equation for ℓ:
~x(t) = (1 − t)(1, 2, 3) + t(1, −1, 1)
For a point-parallel form equation of line ℓ we first need to find a direction vector for ℓ. The directed
line segment →PQ is equivalent to
~v = ~q − ~p = (1, −1, 1) − (1, 2, 3) = (0, −3, −2)
which is parallel to (and hence is a direction vector for) ℓ. Using this direction vector and the point
P which we know is on the line, we get
~x(t) = (1, 2, 3) + t(0, −3, −2)
(Of course, we could have used point Q instead of point P to write the point-parallel form equation.
Likewise, we could have used ~p − ~q = (0, 3, 2) as the direction vector. And in the two-point form
equation, we could have switched the roles of P and Q.)
Example 3.11. Write parametric equations for the line through the point (0, 1, −1) which is parallel
to ~v = (2, 1, 0).
Solution:
An equation of the line in point-parallel form is ~x(t) = (0, 1, −1) + t(2, 1, 0). This tells us that a
point (x, y, z) is on this line if there is some value of t for which (x, y, z) = (0, 1, −1) + t(2, 1, 0). So
it must be true that, for the same value of t, we have
x = 0 + 2t
y = 1 + 1t
z = −1 + 0t
which we would write more simply as
x = 2t
y = 1+t
z = −1
Example 3.12. ℓ1 is the line ~x(t) = (1 − t)(2, 1, −1) + t(0, 1, 2). ℓ2 is the line with parametric equations
x = 2t − 2, y = 1, z = 5 − 3t. Are ℓ1 and ℓ2 the same line?
Solution:
Hmm. That’s different. Let’s see. For ℓ1 we recognize that what we’ve been given is a two-point
form equation. (We can tell because of the (1 − t) multiplier.) From it we can see that P (2, 1, −1)
and Q(0, 1, 2) are two points on line ℓ1 . This also tells us that the vector
~v = →PQ = ~q − ~p = (0, 1, 2) − (2, 1, −1) = (−2, 0, 3)
is parallel to line ℓ1 .
For ℓ2 we’re given parametric equations. It may be helpful to write these equations all the same
way, with “constant + multiple of t” on the right hand side. We have:
x = 2t − 2     ⇒   x = −2 + 2t
y = 1          ⇒   y = 1 + 0t
z = 5 − 3t     ⇒   z = 5 + (−3)t
From the rearranged set of equations, using our knowledge of the form of parametric equations, we
see that the point on ℓ2 used to write these parametric equations is R(−2, 1, 5). Also, the direction
vector used for these equations is ~u = (2, 0, −3).
Since ~u = (2, 0, −3) = −(−2, 0, 3) = −~v, we see that these vectors are scalar multiples of one another,
so they are collinear. That is, the direction vector ~u used to write the equation of ℓ2 is parallel
to the vector which we know is parallel to ℓ1 . Therefore ~u is also parallel to ℓ1 , and thus lines ℓ1
and ℓ2 are parallel to one another. It’s possible that they could be the same line. How can we tell
whether they are?
Since ℓ1 and ℓ2 are parallel, then either they have no points in common or else they are the same
line and have all points in common. So all we need to do is determine whether any point which is
known to be on one line is also on the other. If it is, then they are actually the same line. But if it
isn’t, then they must be different, but parallel, lines.
We know that the point Q(0, 1, 2) is on line ℓ1. Is it also on line ℓ2? If it is, then (x, y, z) = (0, 1, 2)
must satisfy the parametric equations for ℓ2, using the same value of t for each component (equation).
Since the second coordinate of Q is 1, the equation y = 1 is satisfied. For the first coordinate, we
see that we need to have x = 2t − 2 satisfied for x = 0. This gives
0 = 2t − 2 ⇒ 0 + 2 = 2t ⇒ 2t = 2 ⇒ t = 1
Using this same value, t = 1, in the z equation gives
z = 5 − 3(1) = 5 − 3 = 2
Since z = 2 is the third coordinate of point Q, we see that the point (x, y, z) = (0, 1, 2) does satisfy
the parametric equations of ℓ2. That is, we have
(0, 1, 2) = (−2, 1, 5) + 1(2, 0, −3)
So now we know that ℓ1 and ℓ2 are parallel lines, with a point in common, which means that they
must have all points in common and be the same line. That is, since ℓ1 and ℓ2 are parallel and
intersect at point Q(0, 1, 2), they must intersect at all other points as well, and actually be the same
line.
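The reasoning in Example 3.12 (first check that the lines are parallel, then check whether they share a point) can be sketched in Python. This is an illustration under stated assumptions, not the method of the text. The function names sub, cross, is_zero and same_line are made up, and parallel vectors are detected by checking that their cross product is the zero vector, a standard fact that the text does not use for this purpose.

```python
def sub(u, v):
    """Vector difference u - v."""
    return tuple(ui - vi for ui, vi in zip(u, v))

def cross(u, v):
    """Cross product of two vectors in 3-space."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def is_zero(u):
    return all(c == 0 for c in u)

def same_line(p1, v1, p2, v2):
    """Lines p1 + t*v1 and p2 + s*v2 coincide when v1 and v2 are parallel
    and the point p2 also lies on the first line."""
    parallel = is_zero(cross(v1, v2))
    shares_point = is_zero(cross(sub(p2, p1), v1))
    return parallel and shares_point

# l1: through P(2, 1, -1) with direction v = (-2, 0, 3)
# l2: through R(-2, 1, 5) with direction u = (2, 0, -3)
print(same_line((2, 1, -1), (-2, 0, 3), (-2, 1, 5), (2, 0, -3)))   # True
```

Moving the second line’s point off ℓ1, for instance to (−2, 1, 6), leaves the lines parallel but makes same_line return False.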
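Planes in ℜ3
Consider the line ℓ1 with point-parallel form equation ~x(t) = (2, −2, 0) + t(1, −1, 0), which has
parametric equations
x = 2 + t
y = −2 − t
z = 0
Every point on this line satisfies the equation x + y + z = 0, because (2 + t) + (−2 − t) + 0 = 0
for every value of t.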
But those aren’t the only points which satisfy that equation. For instance, the point R(1, 0, −1)
also satisfies this equation. And this point is not on ℓ1 . The easiest way to tell is because from the
parametric equations of ℓ1 we can see that every point on ℓ1 has z = 0, but the third coordinate of
point R isn’t 0, so it is not a point on ℓ1 .
Hmm. Every point on line ℓ1 satisfies x + y + z = 0, but it’s not true that every point that
satisfies x + y + z = 0 is on line ℓ1 . So x + y + z = 0 cannot be an equation of line ℓ1 . Then what
is it? Well, actually, it’s the equation of a plane. ℜ2 (i.e. 2-space) is just a single plane. But ℜ3 ,
which is to say 3-space, contains infinitely many planes. (For instance, think of the walls, ceilings
and floors of the building you’re in. And also every other building you ever have been or ever could
be in. And all the ramps you’ve ever seen. And what those ramps would look like if they were
knocked off kilter. And ... Each of those things lies in some particular plane in ℜ3 , and there are
many other planes besides those.)
So x + y + z = 0 is an equation of a plane. Let’s call it the plane Π. (Planes are often named
Π, which is just the Greek letter P, just like lines are named ℓ. ℓ for line, Π for plane. Same idea.)
Any plane contains infinitely many lines. One of the lines that lies in the particular plane Π we’ve
been talking about is the line ℓ1 . But there are many others. For instance, we saw that P (2, −2, 0)
and R(1, 0, −1) both lie on this plane, so the line on which those points lie is another line in plane
Π. We can call that one ℓ2 . And the vector ~r − ~p = (1, 0, −1) − (2, −2, 0) = (−1, 2, −1) is parallel to
ℓ2 so we can express ℓ2 as ~x(t) = (1, 0, −1) + t(−1, 2, −1).
Notice that x + y + z = (1, 1, 1) • (x, y, z). Let’s think about the vector (1, 1, 1) whose components
are the coefficients in the equation of the plane Π. We know that (1, −1, 0) is parallel to line ℓ1 ,
which lies in plane Π. Notice that (1, 1, 1) • (1, −1, 0) = 1(1) + 1(−1) + 1(0) = 1 − 1 + 0 = 0,
so the vector (1, 1, 1) is a normal for (i.e. is perpendicular to) line ℓ1 . Likewise, we know that
(−1, 2, −1) is parallel to line ℓ2 , which also lies in plane Π. Notice that (1, 1, 1) • (−1, 2, −1) =
1(−1) + 1(2) + 1(−1) = −1 + 2 − 1 = 0, so the vector (1, 1, 1) is also a normal for (perpendicular to)
line ℓ2. However (1, −1, 0) is not a scalar multiple of (−1, 2, −1), so those vectors aren’t collinear
(i.e. aren’t parallel) and therefore lines ℓ1 and ℓ2 aren’t parallel to one another. How can the same vector
be perpendicular to both? Well, by being perpendicular to the whole plane in which both lines lie.
This vector (1, 1, 1) is actually perpendicular to, i.e. a normal for, the plane Π.
At this point, it may not be too surprising to you to learn that if we write an equation using a
normal vector and a point in ℜ3 , what we get is an equation of a plane. That is, if we extend to ℜ3
the idea of the point-normal form of an equation of a line in ℜ2 , the result is not an equation of a
line in ℜ3 , but rather an equation of a plane in ℜ3 . This point-normal form equation of a plane in
3-space looks just like the point-normal form equation of a line in 2-space, except that the vectors
have 3 components instead of 2. So it’s the presence of that third component that distinguishes a
point-normal form equation of a plane (with 3 components in the vectors) from a point-normal form
equation of a line (with only 2 components in the vectors).
Example 3.13. Write an equation of the plane containing P (1, 2, 3) with normal vector ~n = (1, 0, −1)
in point-normal form.
Solution:
The form of the equation is ~n • (~x − ~p) = 0, so we get
(1, 0, −1) • (~x − (1, 2, 3)) = 0
Example 3.14. Write a point-normal form equation of the plane Π which contains the lines ℓ1, represented
by ~x(t) = (2, −2, 0) + t(1, −1, 0), and ℓ2, represented by ~x(s) = (1, 0, −1) + s(−1, 2, −1).
Solution:
In order to write a point-normal form equation of the plane Π, we need a normal vector for the
plane and a point that lies in the plane. Of course, the point-parallel form equations of ℓ1 and ℓ2
each give us a point that lies in the plane. That is, we know that (2, −2, 0) is a point in this plane,
because this is a point on ℓ1 which lies in plane Π, and likewise that (1, 0, −1) is another point on
plane Π, because it is a point on line ℓ2 , which also lies in this plane. So we can use either one of
these points in our equation of Π.
How do we find a normal for the plane? Well, we know from the equation of ℓ1 that the vector
~u = (1, −1, 0) is parallel to ℓ1 , and likewise from the equation of ℓ2 that the vector ~v = (−1, 2, −1)
is parallel to ℓ2 . Of course any vector ~n which is a normal for Π (i.e. is perpendicular to this plane)
must be perpendicular to any line that lies within Π. So if ~n is a normal for Π, then ~n is perpendicular
to both ℓ1 and ℓ2 and therefore must be orthogonal to both ~u and ~v. (That is, any vector
which is perpendicular to ℓ1 is also perpendicular to (orthogonal to) every vector that is parallel to
ℓ1 . And similarly for ℓ2 .)
So how do we find a vector which is perpendicular to both ~u and ~v ? Well that’s easy. We know
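that the vector ~u × ~v is perpendicular to both ~u and ~v . So we can use
~n = ~u × ~v = (1, −1, 0) × (−1, 2, −1) = ((−1)(−1) − (0)(2), (0)(−1) − (1)(−1), (1)(2) − (−1)(−1)) = (1, 1, 1)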
(Recall that we discussed previously that the vector (1, 1, 1) was a normal for the plane containing
these lines ℓ1 and ℓ2 .)
Now we know both a normal vector for Π and a point in plane Π so we can write the point-normal
form equation. We get
(1, 1, 1) • (~x − (2, −2, 0)) = 0
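The reasoning in Example 3.14 can be checked numerically. The following is only a minimal Python sketch, not part of the course text; the helper names `cross`, `dot` and `sub` are our own choices for illustration.

```python
# Illustrative sketch: helper names are assumptions, not from the notes.

def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Dot product: sum of products of corresponding components."""
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    """Component-wise difference u - v."""
    return tuple(a - b for a, b in zip(u, v))

# Direction vectors and known points of the two lines in Example 3.14.
u, p1 = (1, -1, 0), (2, -2, 0)
v, p2 = (-1, 2, -1), (1, 0, -1)

n = cross(u, v)  # candidate normal for the plane

# n should be orthogonal to both direction vectors ...
normal_ok = dot(n, u) == 0 and dot(n, v) == 0
# ... and the known point on the second line should satisfy n . (x - p1) = 0.
point_ok = dot(n, sub(p2, p1)) == 0
```

Here `n` comes out as (1, 1, 1), and both checks succeed, so either known point could be used in the point-normal equation.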
Consider any point-normal form equation of a plane. For instance, let’s work with the one we
found in that last example. If we express the vector ~x as ~x = (x, y, z) and carry through the vector
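arithmetic, what do we get? Well let’s see:
(1, 1, 1) • ((x, y, z) − (2, −2, 0)) = 0 ⇒ (1, 1, 1) • (x − 2, y + 2, z) = 0
⇒ (x − 2) + (y + 2) + z = 0
⇒ x + y + z = 0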
Ah, yes. We’ve seen that before. As an equation of the plane in which we found ℓ1 and ℓ2 to
lie. That is, we started out our discussion of planes by wondering what x + y + z = 0 was the
equation of, and we realized it was a plane. The equation looks just like the standard form equa-
tion of a line in ℜ2 , except it has a z in it. So this form is called the standard form equation of a plane.
That is, as we said earlier, if we have an equation in the same form as a standard form equation
of a line, but with z in it as well as x and y, then this equation isn’t describing a single line. It is
describing a whole plane in ℜ3 . And now we see that the coefficients of x, y and z in this equation
are the components of a normal vector for that plane.
Example 3.15. Write an equation in standard form for the plane with normal vector ~n = (1, 2, 3)
which contains the point P (0, −1, 2).
Solution:
Since ~n = (1, 2, 3) is a normal vector for the plane, then the standard form equation must have the
form 1x + 2y + 3z = d for some scalar d. But of course we would write that as x + 2y + 3z = d.
How can we find the value of d? Well, we know that the point P (0, −1, 2) lies on the plane, so
(x, y, z) = (0, −1, 2) must satisfy this equation. That is, we plug in x = 0, y = −1 and z = 2 to find
the value of d. We get:
x + 2y + 3z = d ⇒ 0 + 2(−1) + 3(2) = d ⇒ −2 + 6 = d ⇒ d = 6 − 2 = 4
So a standard form equation of the plane is x + 2y + 3z = 4.
Notice: We could have used ~n • ~p = d, from rearranging the point-normal equation for the plane.
What we did here is just another explanation of the exact same arithmetic. (Look back at the ex-
amples in which we found point-normal equations of lines. We could have described the arithmetic
we did there as “let x = p1 and y = p2 ” instead of “find ~n • ~p ”.)
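The observation that d is just ~n • ~p can be written as a very small Python sketch. The function name `standard_form` is hypothetical, chosen here only for illustration.

```python
def dot(u, v):
    """Dot product of two vectors with the same number of components."""
    return sum(a * b for a, b in zip(u, v))

def standard_form(n, p):
    """Return (coefficients, d) describing the plane n . x = d through point p."""
    return n, dot(n, p)

# Example 3.15: normal (1, 2, 3), point P(0, -1, 2).
coeffs, d = standard_form((1, 2, 3), (0, -1, 2))
```

With these inputs `d` is 4, so the coefficients (1, 2, 3) and the constant 4 describe x + 2y + 3z = 4.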
In Example 3.14, we used the fact that if we know 2 vectors which are direction vectors for 2
lines contained in a plane, their cross product gives a normal for the plane. However that only works
if the 2 vectors are non-collinear. That is, if the 2 vectors lie on the same line, then there are other
vectors in the same plane that are orthogonal to both vectors. It’s only if the 2 vectors are not
parallel to one another that we can be sure that any (non-zero) vector which is perpendicular to
both must be perpendicular to the whole plane.
Theorem 3.2. If ~u is a direction vector for a line in some plane Π, and ~v is a direction vector for
another line in plane Π, where ~u and ~v are not collinear, then ~n = ~u ×~v is a normal vector for plane
Π.
We know that for any two points (whether in ℜ2 or in ℜ3 ), there is exactly one line which con-
tains those two points. Consider 3 points in ℜ3 . If the 3 points are all collinear, i.e. if they all lie
on the same line, then there are many planes which contain those 3 points. (The infinitely many
different planes that intersect along that line.) But if the 3 points are not all collinear, then there is
only one plane that contains all three points.
Example 3.16. Find both a point-normal form equation and a standard form equation of the plane
determined by the points P (−1, 0, 1), Q(1, 2, 3) and R(2, −1, 5).
Solution:
The line passing through points P and Q lies in this plane, and so any vector parallel to that line
is also parallel to the plane. And if we let ~u = ~q − ~p, then ~u is such a vector. Similarly, the vector
~v = ~r − ~p is parallel to the line which passes through both P and R, and since that line also lies in
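the plane, ~v is another vector which is parallel to the plane we need to describe. Also, we have
~u = ~q − ~p = (1, 2, 3) − (−1, 0, 1) = (2, 2, 2) and ~v = ~r − ~p = (2, −1, 5) − (−1, 0, 1) = (3, −1, 4)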
and we can see that since ~u and ~v are not scalar multiples of one another then they are not collinear.
We use these two non-collinear vectors which are both parallel to the plane to find a normal for the
plane:
~n = ~u × ~v = (8 − (−2), 6 − 8, −2 − 6) = (10, −2, −8)
Now we use this normal vector and any one of the three points to write a point-normal equation of
the plane. For instance, using point P , the form ~n • (~x − ~p) = 0 gives:
(10, −2, −8) • (~x − (−1, 0, 1)) = 0
Finally, we can also rearrange this equation to standard form. Letting ~x = (x, y, z), we get:
(10, −2, −8) • ((x, y, z) − (−1, 0, 1)) = 0 ⇒ (10, −2, −8) • (x, y, z) − (10, −2, −8) • (−1, 0, 1) = 0
⇒ 10x − 2y − 8z = (10, −2, −8) • (−1, 0, 1)
⇒ 10x − 2y − 8z = −10 + 0 − 8
⇒ 10x − 2y − 8z = −18
(Note: We might prefer to divide through the equation by 2. That is, this plane would often be
expressed as 5x − y − 4z = −9.)
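The whole procedure of Example 3.16 (two difference vectors, a cross product, then ~n • ~p) can be sketched in Python as follows. This is an illustration under our own naming (`plane_through` is a hypothetical helper), and it assumes integer coordinates so that the common factor can be divided out exactly.

```python
from functools import reduce
from math import gcd

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def plane_through(p, q, r):
    """Return (n, d) with n . x = d for the plane through three points."""
    n = cross(sub(q, p), sub(r, p))
    if n == (0, 0, 0):
        # Collinear points do not determine a single plane.
        raise ValueError("the three points are collinear")
    return n, dot(n, p)

n, d = plane_through((-1, 0, 1), (1, 2, 3), (2, -1, 5))

# Optional tidy-up: divide through by the greatest common factor.
g = reduce(gcd, [abs(c) for c in n] + [abs(d)])
reduced = (tuple(c // g for c in n), d // g)
```

For the three points of the example this gives the normal (10, −2, −8) with d = −18, and the reduced form corresponds to 5x − y − 4z = −9.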
Suppose we have some particular plane Π. Consider any point P which does not lie on this plane.
How far is this point from the plane? That is, what is the shortest distance from the point to the
plane?
The shortest way to get from the plane to point P is to start from the point on the plane which
is nearest to point P . This will be the point P ′ on the plane such that the directed line segment
from P ′ to P is normal (i.e. perpendicular) to Π. And the (shortest) distance from P to the plane
will just be the length of that directed line segment, i.e. \(\|\overrightarrow{P'P}\|\). However we don’t want to have to
actually find the point P ′ , which is the point on the plane that is nearest to P , in order to find the
distance from P to the plane. And in fact we don’t need to.
Let plane Π have normal vector ~n and let Q be any known point on plane Π. Then it can be
shown that the distance from P to P ′ is given by:
\[ \|\overrightarrow{P'P}\| = \|\vec{p} - \vec{p}\,'\| = \frac{|\vec{n} \cdot (\vec{q} - \vec{p})|}{\|\vec{n}\|} \]
That is, we simply need to find the dot product of any normal vector to the plane with the
vector equivalent to the directed line segment between the point P and any known point on the
plane, discard the negative sign (if there is one), and divide by the magnitude of the normal vector
used.
(Notice: We have not explained why this gives \(\|\overrightarrow{P'P}\|\), so you should not be trying to understand
that from the above. If you’re interested, look at the explanation given in the text. All we’ve done
here is to assert that it can be shown that this is true.)
Theorem 3.3. Consider any plane Π. Let ~n be any normal vector for plane Π and let Q be any
point on plane Π. Consider any other point P which is not on the plane Π. Then the distance
between point P and plane Π is given by:
\[ \text{distance} = \frac{|\vec{n} \cdot (\vec{q} - \vec{p})|}{\|\vec{n}\|} \]
Example 3.17. Find the distance between the point P (1, 2, 3) and the plane with point-normal form
equation (1, 2, 1) • (~x − (3, −1, 0)) = 0.
Solution:
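We simply plug ~n = (1, 2, 1), ~p = (1, 2, 3) and ~q = (3, −1, 0) into the formula. We have:
\[ \frac{|\vec{n} \cdot (\vec{q} - \vec{p})|}{\|\vec{n}\|} = \frac{|(1, 2, 1) \cdot ((3, -1, 0) - (1, 2, 3))|}{\|(1, 2, 1)\|} = \frac{|(1, 2, 1) \cdot (2, -3, -3)|}{\sqrt{1^2 + 2^2 + 1^2}} = \frac{|2 - 6 - 3|}{\sqrt{6}} = \frac{7}{\sqrt{6}} \]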
Example 3.18. Find the distance from the origin to the plane x + y − z = 5.
Solution:
The origin is the point P (0, 0, 0). Notice that we know that this point is not on the plane because
(x, y, z) = (0, 0, 0) does not satisfy the equation of the plane. Also, we recognize from the standard
form equation that ~n = (1, 1, −1) is a normal for the plane. So now we just need to find any point
that’s on the plane. Simply pick any values of x, y and z that, when taken together, do satisfy the
equation of the plane. For instance, for the point (x, y, z) = (5, 0, 0) we get x + y − z = 5 + 0 − 0 = 5,
so Q(5, 0, 0) is a point on the plane.
Now we just use the formula. We see that the distance from the origin to the plane x + y − z = 5 is:
\[ \frac{|\vec{n} \cdot (\vec{q} - \vec{p})|}{\|\vec{n}\|} = \frac{|(1, 1, -1) \cdot ((5, 0, 0) - (0, 0, 0))|}{\|(1, 1, -1)\|} = \frac{|(1, 1, -1) \cdot (5, 0, 0)|}{\sqrt{1^2 + 1^2 + (-1)^2}} = \frac{5}{\sqrt{3}} \]
Notice: If we accidentally try to find the distance from a point P to a plane Π for a point P which
actually lies on the plane Π, we will simply get the answer 0. In that case the directed line segment
from P to another point Q known to be on the plane lies on the plane, so any normal for the plane
is orthogonal to ~q − p~. And we know that if two vectors are orthogonal then their dot product is 0,
so the numerator of the formula will be 0, giving the answer 0 as the distance.
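The distance formula of Theorem 3.3 translates directly into code. The sketch below is illustrative only; `distance_to_plane` is a name we made up, and the third call uses a point we picked ourselves, (0, 0, −5), which lies on the plane x + y − z = 5.

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def distance_to_plane(n, q, p):
    """Distance from point p to the plane with normal n through point q.

    Implements |n . (q - p)| / ||n||.
    """
    q_minus_p = tuple(a - b for a, b in zip(q, p))
    return abs(dot(n, q_minus_p)) / sqrt(dot(n, n))

# Example 3.17: P(1, 2, 3) and the plane (1, 2, 1) . (x - (3, -1, 0)) = 0.
d_317 = distance_to_plane((1, 2, 1), (3, -1, 0), (1, 2, 3))

# Example 3.18: the origin and the plane x + y - z = 5, with Q(5, 0, 0).
d_318 = distance_to_plane((1, 1, -1), (5, 0, 0), (0, 0, 0))

# A point that is actually on the plane gives distance 0.
d_on_plane = distance_to_plane((1, 1, -1), (5, 0, 0), (0, 0, -5))
```

Because the function never looks at how many components the vectors have, the same sketch should also work for a point and a line in ℜ2 when given 2-component tuples.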
We have already seen some ways in which planes in ℜ3 behave like lines in ℜ2 . That is, we
know that in ℜ2 , a point-normal form equation corresponds to a line, but the similar form in ℜ3
corresponds to a plane. And the same is true for a standard form equation. In ℜ2 it represents a
line, but in ℜ3 it represents a plane. And both of these forms use a normal vector. In fact, in ℜ2
we talk about a normal vector for a line, but in ℜ3 we only talk of a normal vector for a plane.
So it may not surprise you to learn that the same distance formula given in Theorem 3.3, which
refers to the distance between a point and a plane in ℜ3 , can also be used in ℜ2 , but there it gives
the distance between a point and a line.
Theorem 3.4. Consider any line ℓ. Let ~n be any normal vector for line ℓ and let Q be any point
on line ℓ. Consider any other point P which is not on line ℓ. Then the distance between point P
and line ℓ is given by:
\[ \text{distance} = \frac{|\vec{n} \cdot (\vec{q} - \vec{p})|}{\|\vec{n}\|} \]
Example 3.19. Find the distance between the point P (1, 2) and the line ℓ described by 2x + y = 1.
Solution:
Line ℓ has normal ~n = (2, 1). We need to find some point Q on line ℓ. Letting x = 0 we get
2(0) + y = 1, so y = 1. That is, the point on line ℓ which has x-coordinate 0 has y-coordinate 1, so
the point Q(0, 1) is a point on line ℓ. (Notice that for (x, y) = (1, 2) we have 2x + y = 2(1) + 2 = 4 ≠ 1,
so P (1, 2) is not on line ℓ.)
\[ \frac{|\vec{n} \cdot (\vec{q} - \vec{p})|}{\|\vec{n}\|} = \frac{|(2, 1) \cdot ((0, 1) - (1, 2))|}{\|(2, 1)\|} = \frac{|(2, 1) \cdot (-1, -1)|}{\sqrt{2^2 + 1^2}} = \frac{|-2 - 1|}{\sqrt{4 + 1}} = \frac{3}{\sqrt{5}} \]
Any 2 lines which lie in the same plane and are not parallel intersect at exactly 1 point. To find
this point of intersection, we simply need to find a point which is on both lines. We use parametric
equations of at least one of the lines when we do this. We can either: (1) Use parametric equations of
both lines to equate corresponding components of a point, and solve for the values of the parameters
which satisfy all of those equations; or (2) Use parametric equations of one line and a standard
form equation for the other line (if they are lines in ℜ2 ), and substitute for x and y in terms of the
parameter, then solve for the value of the parameter.
Example 3.20. Find the point of intersection of the line ℓ1 : ~x(t) = (1, 0) + t(2, 1) with the line ℓ2 :
~x(s) = (1, 1) + s(−1, 0).
Solution:
For ℓ1 we have parametric equations x = 1 + 2t, y = t and for ℓ2 we have x = 1 − s, y = 1.
If some point P (x, y) is on both these lines, then it must be true that there are some values of t and
s which give the same values of x and y. So we must have 1 + 2t = 1 − s and t = 1. Since t = 1,
then 1 + 2t = 3, so 1 − s = 3 and we see that s = 1 − 3 = −2.
Notice that we’ve found values of the parameters, s and t, but we have not yet found the point on
the line which corresponds to these values. That is, we know the value of t that gives the point on
line ℓ1 at which the two lines intersect, and likewise we know the value of s that gives that same
point on line ℓ2 . But we were asked to find the actual point at which the two lines intersect. We’re
not finished until we’ve done that. And we have more information than we need to find the point,
since we know two ways to get it. So we can use the value of t we found, in the equation for ℓ1 , to
get the point P . And then we can use the value of s we found, in the equation of ℓ2 , to check our
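work. We get:
~x(1) = (1, 0) + 1(2, 1) = (1 + 2, 0 + 1) = (3, 1)
as the point on ℓ1 which we were looking for. We check that the point on ℓ2 is the same point:
~x(−2) = (1, 1) + (−2)(−1, 0) = (1 + 2, 1 + 0) = (3, 1)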
Since we did find the same point on each line, this is the point we were looking for. We see that ℓ1
and ℓ2 intersect at the point P (3, 1).
Note: As we observed above, we found values of both parameters, but really we only need one.
As we have seen, the other allows us to check our work. We’re just checking that we didn’t make
an arithmetic error. If we got a different point on ℓ2 than the one on ℓ1 that would tell us that
somewhere in our calculations we made an arithmetic mistake. Either in finding the points, or (more
likely) in finding the values of the parameters. We would need to re-do our calculations until we
find the mistake, and then finish the problem (including the check) again.
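Approach (1), equating components of the two parametric descriptions, can be sketched in Python. This is only an illustration: the function name `intersect_lines_2d` is hypothetical, and the two equations are solved with Cramer's rule, which is our choice here rather than the substitution used in the example.

```python
from fractions import Fraction

def intersect_lines_2d(p1, d1, p2, d2):
    """Intersect x(t) = p1 + t*d1 with x(s) = p2 + s*d2 in the plane.

    Returns (t, s, point), or None when the direction vectors are parallel.
    Assumes integer inputs so that exact fractions can be used.
    """
    # Equating components gives  t*d1 - s*d2 = p2 - p1  (two equations).
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    det = -d1[0] * d2[1] + d2[0] * d1[1]
    if det == 0:
        return None
    t = Fraction(-bx * d2[1] + d2[0] * by, det)
    s = Fraction(d1[0] * by - bx * d1[1], det)
    point = (p1[0] + t * d1[0], p1[1] + t * d1[1])
    # The same check as in the notes: s must give the same point on the other line.
    assert point == (p2[0] + s * d2[0], p2[1] + s * d2[1])
    return t, s, point

# Example 3.20.
result = intersect_lines_2d((1, 0), (2, 1), (1, 1), (-1, 0))
```

For the lines of Example 3.20 this yields t = 1, s = −2 and the point (3, 1).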
Example 3.21. Find the point of intersection of the line ℓ1 : ~x(t) = (1, 1, 2) + t(2, 1, −1) with the line
ℓ2 : ~x(s) = (0, 1, 2) + s(1, −1, 1).
Solution:
For ℓ1 we have x = 1 + 2t, y = 1 + t, z = 2 − t and for ℓ2 we have x = s, y = 1 − s, z = 2 + s.
The point of intersection of ℓ1 and ℓ2 is a point P (x, y, z) which satisfies both sets of equations at
the same time, so we must have:
1 + 2t = s (1)
1+t = 1−s (2)
2−t = 2+s (3)
Equation (1) says that s = 1 + 2t, so that 1 − s = 1 − (1 + 2t) = 0 − 2t = −2t. Therefore equation
(2) gives 1 + t = −2t, so 1 = −3t and thus t = −1/3. And then substituting t = −1/3 into s = 1 + 2t
gives s = 1 + 2(−1/3) = 1 − 2/3 = 1/3. Checking these values in (3) we get
\[ 2 - t = 2 - \left(-\frac{1}{3}\right) = 2 + \frac{1}{3} = \frac{6}{3} + \frac{1}{3} = \frac{7}{3} \quad\text{and}\quad 2 + s = 2 + \frac{1}{3} = \frac{7}{3} \]
so it’s true that 2 − t = 2 + s.
Now, using s = 1/3 in the equation for ℓ2 we get the point
\[ (x, y, z) = (0, 1, 2) + s(1, -1, 1) = (0, 1, 2) + \frac{1}{3}(1, -1, 1) = \left(\frac{0}{3}, \frac{3}{3}, \frac{6}{3}\right) + \left(\frac{1}{3}, -\frac{1}{3}, \frac{1}{3}\right) = \left(\frac{1}{3}, \frac{2}{3}, \frac{7}{3}\right) \]
Note: Any 2 non-parallel lines in ℜ2 always intersect at a single point. For 2 non-parallel lines in
ℜ3 , there are 2 possibilities. If they lie in the same plane, then they intersect at a single point. But
they may lie in parallel planes and not intersect at all. In that case, it will be impossible to find
values of s and t that satisfy all 3 equations at the same time.
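The two possibilities for non-parallel lines in ℜ3 can be illustrated with a short Python sketch. It solves two of the three component equations for t and s and then checks whether the resulting points agree, which is the same idea as checking equation (3) in Example 3.21. The function name `intersect_lines_3d` and the second pair of lines in the usage are our own, chosen for illustration.

```python
from fractions import Fraction
from itertools import combinations

def intersect_lines_3d(p1, d1, p2, d2):
    """Intersect x(t) = p1 + t*d1 with x(s) = p2 + s*d2 in 3-space.

    Returns the common point, or None if the (non-parallel) lines do not meet.
    Assumes integer inputs so the arithmetic stays exact.
    """
    b = [Fraction(p2[i] - p1[i]) for i in range(3)]
    # Look for a pair of component equations that determines t and s uniquely.
    for i, j in combinations(range(3), 2):
        det = -d1[i] * d2[j] + d2[i] * d1[j]
        if det != 0:
            t = (-b[i] * d2[j] + d2[i] * b[j]) / det
            s = (d1[i] * b[j] - b[i] * d1[j]) / det
            on_line1 = tuple(p1[k] + t * d1[k] for k in range(3))
            on_line2 = tuple(p2[k] + s * d2[k] for k in range(3))
            # If the remaining equation fails, the points differ: no intersection.
            return on_line1 if on_line1 == on_line2 else None
    raise ValueError("direction vectors are parallel (or zero)")

# Example 3.21.
meet = intersect_lines_3d((1, 1, 2), (2, 1, -1), (0, 1, 2), (1, -1, 1))

# Two non-parallel lines that lie in parallel planes (z = 0 and z = 1).
miss = intersect_lines_3d((0, 0, 0), (1, 0, 0), (0, 0, 1), (0, 1, 0))
```

For Example 3.21 the result is the point (1/3, 2/3, 7/3), while the second pair of lines gives `None` because no values of s and t satisfy all 3 equations.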
As stated earlier, if we wish to find the point of intersection of 2 lines in ℜ2 and one of the
lines is in point-normal form or standard form, we don’t need to find a direction vector and write
parametric equations of that line. Instead we can just use the standard form directly, as shown in
the next example.
Example 3.22. Find the point of intersection of the lines ℓ1 : ~x(t) = (1, 0)+t(3, −1) and ℓ2 : 2x−y = 5.
Solution:
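For ℓ1 we have x = 1 + 3t, y = −t. Substituting for x and y in the standard form equation for ℓ2 we
have:
\[ 2x - y = 5 \;\Rightarrow\; 2(1 + 3t) - (-t) = 5 \;\Rightarrow\; 2 + 6t + t = 5 \;\Rightarrow\; 7t = 3 \;\Rightarrow\; t = \frac{3}{7} \]
Using t = 3/7 in the equation for ℓ1 gives the point
\[ (x, y) = (1, 0) + \frac{3}{7}(3, -1) = \left(1 + \frac{9}{7}, -\frac{3}{7}\right) = \left(\frac{16}{7}, -\frac{3}{7}\right) \]
Of course, we should check that this really is a point on ℓ2 (i.e. check for arithmetic mistakes). For
(x, y) = (16/7, −3/7) we get:
\[ 2x - y = 2\left(\frac{16}{7}\right) - \left(-\frac{3}{7}\right) = \frac{32}{7} + \frac{3}{7} = \frac{35}{7} = 5 \]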
If line ℓ lies in plane Π, then all points on line ℓ are on plane Π. If line ℓ lies on a plane parallel
to plane Π, then no point on line ℓ lies on plane Π. But if line ℓ does not lie on plane Π or on any
plane parallel to Π, then ℓ intersects Π at a single point. That is, there is only a single point on line
ℓ which lies on plane Π.
To find the point of intersection of line ℓ with plane Π, we use parametric equations of ℓ to
express the coordinates of the point in terms of the parameter, then use the standard form equation
of Π to solve for the value of the parameter. (This is exactly what we did in Example 3.22. But
now there’s z as well as x and y.)
Example 3.23. Find the point at which the line ℓ described by ~x(t) = (1, 0, 1) + t(3, 2, 1) intersects
the plane Π described by x + y − 2z = 3.
Solution:
Parametric equations of line ℓ give:
x = 1 + 3t
y = 2t
z = 1+t
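Substituting these into the standard form equation of Π gives:
x + y − 2z = 3 ⇒ (1 + 3t) + 2t − 2(1 + t) = 3 ⇒ 1 + 3t + 2t − 2 − 2t = 3
⇒ 3t − 1 = 3 ⇒ 3t = 3 − (−1) = 4 ⇒ t = 4/3
So ℓ intersects Π at the point ~x(4/3) = (1, 0, 1) + (4/3)(3, 2, 1) = (5, 8/3, 7/3).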
Notice: This procedure only works when the line ℓ does not lie in the plane Π or in any plane parallel
to Π. If line ℓ lies in plane Π, then when we substitute into the equation for Π and try to solve for
t, the t’s will all disappear and we’ll be left with something like 3 = 3. This equation is satisfied for
all values of t, telling us that all points on ℓ satisfy the equation for Π. But if line ℓ lies on a plane
parallel to Π, then again all the t’s will disappear, but we’ll be left with something like 2 = 3. This
equation isn’t satisfied for any value of t, telling us that there is no value of t for which ~x(t) is a
point on plane Π.
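The three cases described above (a single point, the whole line, or no point at all) can be sketched in Python. Substituting ~x(t) = ~p + t~v into ~n • ~x = d gives (~n • ~v)t = d − ~n • ~p, so the t's "disappear" exactly when ~n • ~v = 0. The function name and the returned labels are hypothetical, and the second and third test lines are our own examples.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def intersect_line_plane(p, v, n, d):
    """Intersect the line x(t) = p + t*v with the plane n . x = d.

    Returns a (label, point) pair; point is None unless there is a single point.
    Assumes integer inputs so the arithmetic stays exact.
    """
    coefficient_of_t = dot(n, v)
    right_side = d - dot(n, p)
    if coefficient_of_t == 0:
        # The t's have disappeared: we are left with 0 = right_side.
        if right_side == 0:
            return "line lies in plane", None
        return "line is parallel to plane", None
    t = Fraction(right_side, coefficient_of_t)
    return "single point", tuple(pi + t * vi for pi, vi in zip(p, v))

# Example 3.23: x(t) = (1, 0, 1) + t(3, 2, 1) and x + y - 2z = 3.
case_point = intersect_line_plane((1, 0, 1), (3, 2, 1), (1, 1, -2), 3)
# A line chosen to lie in the plane, and one in a parallel plane.
case_inside = intersect_line_plane((3, 0, 0), (1, 1, 1), (1, 1, -2), 3)
case_parallel = intersect_line_plane((0, 0, 0), (1, 1, 1), (1, 1, -2), 3)
```

For Example 3.23 the sketch finds t = 4/3 and the point (5, 8/3, 7/3).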
Math 1229A/B
Unit 4:
Vectors in ℜm
(text reference: Section 2.1)
© V. Olds 2010
4 Vectors in ℜm
We’ve learnt about ℜ2 and ℜ3 , which you’ve seen before. Now, we’re going to extend some of
the same ideas to spaces with more than 3 dimensions. We refer to m-dimensional space as ℜm or
m-space. “But”, you’re saying to yourself, “how can you have more than 3 dimensions?”. Well, for
starters, time is often referred to as the fourth dimension of physical space. The world we live in
has depth, breadth and height, and also time. Where you’re sitting right this minute is a particular
location that could be described by x-, y- and z-coordinates. But you’re in that location at this
particular instant. At other times you weren’t. However the location still existed. So time is another
axis along which space can be measured.
And after that? Well, for higher dimensions, it’s just much easier if you don’t try to relate it
to physical space. Because it tends to make your brain hurt if you try to actually picture ℜ5 or
ℜ6 or ℜ20 , and so forth. But to mathematicians, that’s no reason not to talk about them. The-
oretically, there could be more dimensions. And even if there aren’t, there are still situations in
which it is useful to use constructs which correspond to things we’ve already seen (points, vectors,
etc.) but with more parts to them, as if they were from a higher-dimensional space. For instance, a
company which produces 10 different products might have a use for considering a “point” with 10
coordinates, where each coordinate corresponds to a different one of the 10 products and indicates,
for instance, how many of that product are in stock at the moment. There are many uses of the
kind of mathematical constructs we’ve been working with, other than literally as points in space
or directed line segments, in which the kind of arithmetic we’ve been doing still makes sense. And
if these constructs aren’t being used to represent physical space, then there’s no reason that they
would need to be limited to 3 dimensions.
We start our study of ℜm with a number of definitions, which will all look familiar to you from
ℜ2 and ℜ3 . These definitions just extend the construct we’ve been using, vectors, to higher dimensional
space, and then extend (most of) the things we were doing with them, i.e. tell us how to
do arithmetic etc. with these higher-dimensional vectors. And along the way we’ll also see some
theoretical results. But the arithmetic works just the same in higher dimensions as it did in ℜ2 and
ℜ3 , and the theorems will look familiar too, so it will be really easy for you to learn this. There’s
not much that’s really new in this unit. It’s just a matter of getting used to the idea of ℜm for m > 3.
Definition: For any positive integer m ≥ 2, we use ℜm , also called m-space, to denote
the set of all ordered m-tuples ~u = (u1 , u2 , ..., um ), where each value ui may be any real
number (i.e. for all ui ∈ ℜ). For any ~u ∈ ℜm , we refer to ~u as a vector, or an m-vector.
The numbers u1 , u2 , ..., um are called the components of the m-vector ~u.
Note: As we have already learnt, the terminology is sometimes a bit different for ℜ2 and ℜ3 . For
instance, we generally refer to an ordered pair or (rarely) ordered duple in ℜ2 and an ordered triple
in ℜ3 , rather than saying ordered 2-tuple or ordered 3-tuple. Also note that, as we have been doing
with vectors in ℜ2 and ℜ3 , when a vector is given a certain name, such as ~u, the components (when
not specific numbers) are assumed to be named with the same letter, with subscripts. For instance
referring to the vector ~u we would understand that the first component is called u1 , the second
component is called u2 , and so forth. The only exception to this is, again as we’ve already seen,
that often in ℜ2 we use ~x = (x, y) and likewise in ℜ3 we use ~x = (x, y, z).
Examples: (1, 2, 3, 4) and (−0.2, 61.3, 0.04, 0) are both vectors in ℜ4 . (5, 3, 6, 3, −1, −10) is a 6-vector.
If ~v ∈ ℜ5 , then ~v = (v1 , v2 , v3 , v4 , v5 ).
Definition: The zero vector in ℜm is the m-vector whose components are all 0. Also,
two m-vectors are equal if and only if their corresponding components are identical.
That is, for any ~u, ~v ∈ ℜm , ~u = ~v if and only if u1 = v1 , u2 = v2 , ..., and um = vm .
Note: Only vectors which have the same number of components can be equal. That is, ~u can only
be equal to ~v if they are both m-vectors, for the same value of m. (And of course will only actually
be equal if their corresponding components are identical.)
Examples: (0, 0, 0, 0) is the zero vector in ℜ4 , and for ~0 ∈ ℜ12 we have ~0 = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0).
If we have ~u = (1, 2, c, 3, −1) and ~v = (a, 2, 8, d, −1) and we know that ~u = ~v , then we must have
a = 1, c = 8 and d = 3.
Definition: The distance between two m-vectors ~u and ~v is given by the distance
formula:
\[ d(\vec{u}, \vec{v}) = \sqrt{(v_1 - u_1)^2 + (v_2 - u_2)^2 + \cdots + (v_m - u_m)^2} \]
Note: These are precisely analogous to the notation, terminology and formulas we had for the dis-
tance between 2 vectors, and the magnitude of a vector, in ℜ2 and in ℜ3 .
Example 4.1.
(a) Show that (0, 0, 0, 0, −1, 0, 0, 0) is a unit vector.
(b) Show that there are no real numbers a and b for which ~u = (1, 1, a, b) is a unit vector.
Solution:
(a) We have
\[ \begin{aligned} \|(0, 0, 0, 0, -1, 0, 0, 0)\| &= \sqrt{0^2 + 0^2 + 0^2 + 0^2 + (-1)^2 + 0^2 + 0^2 + 0^2} \\ &= \sqrt{0 + 0 + 0 + 0 + 1 + 0 + 0 + 0} \\ &= \sqrt{1} = 1 \end{aligned} \]
(b) For any real numbers a and b we have a² ≥ 0 and b² ≥ 0, so
\[ \|(1, 1, a, b)\| = \sqrt{1^2 + 1^2 + a^2 + b^2} = \sqrt{2 + a^2 + b^2} \geq \sqrt{2} > 1 \]
Since the magnitude is always bigger than 1, there are no values of a and b for which ~u is a unit vector.
Example 4.2. If ~u = (1, 2, 3, 4) and ~v = (0, −1, 12, 7), find the distance between ~u and ~v and also
find the magnitude of each.
Solution:
The distance between ~u and ~v is
\[ \begin{aligned} d(\vec{u}, \vec{v}) &= \sqrt{(0 - 1)^2 + (-1 - 2)^2 + (12 - 3)^2 + (7 - 4)^2} = \sqrt{(-1)^2 + (-3)^2 + (9)^2 + (3)^2} \\ &= \sqrt{1 + 9 + 81 + 9} = \sqrt{100} = 10 \end{aligned} \]
For the magnitudes of ~u and ~v we get
\[ \|\vec{u}\| = \|(1, 2, 3, 4)\| = \sqrt{1^2 + 2^2 + 3^2 + 4^2} = \sqrt{1 + 4 + 9 + 16} = \sqrt{30} \]
\[ \text{and}\quad \|\vec{v}\| = \|(0, -1, 12, 7)\| = \sqrt{0^2 + (-1)^2 + (12)^2 + 7^2} = \sqrt{0 + 1 + 144 + 49} = \sqrt{194} \]
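Since the formulas are the same for every m, one small Python sketch covers distance and magnitude in any ℜm. The function names `magnitude` and `distance` are our own, and the length check simply reflects the fact that the distance is only defined between vectors with the same number of components.

```python
from math import sqrt

def magnitude(u):
    """Square root of the sum of the squared components of u."""
    return sqrt(sum(c * c for c in u))

def distance(u, v):
    """Distance between two m-vectors: the magnitude of v - u."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return magnitude(tuple(b - a for a, b in zip(u, v)))

# Example 4.2.
u = (1, 2, 3, 4)
v = (0, -1, 12, 7)
dist_uv = distance(u, v)
mag_u = magnitude(u)
mag_v = magnitude(v)
```

Here `dist_uv` is 10, while `mag_u` and `mag_v` are floating-point approximations of √30 and √194.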
Solution:
We are being asked to find the length, i.e. the magnitude, of this vector. We get:
\[ \|(1, 1, -1, 0, -1)\| = \sqrt{1^2 + 1^2 + (-1)^2 + 0^2 + (-1)^2} = \sqrt{1 + 1 + 1 + 0 + 1} = \sqrt{4} = 2 \]
We see that the vector (1, 1, −1, 0, −1) is 2 units long.
Definition: For any scalar c and any m-vector ~u, the scalar multiple of ~u by c is
c~u = (cu1 , cu2 , ..., cum )
For the scalar −1, the scalar multiple of any ~u ∈ ℜm by −1 is called the negative of ~u,
denoted −~u, so that
−~u = (−u1 , −u2 , ..., −um )
If two m-vectors are scalar multiples of one another, we say that they are parallel.
Note: Again, this is the same terminology and notation, and similar calculations, as we had in ℜ2
and ℜ3 . To find a scalar multiple of an m-vector, we multiply each component of that vector by
the scalar. And to find the negative of an m-vector, we multiply each component by −1, i.e. switch
the sign of each component. Also, we use the term parallel to describe two m-vectors which have
the same property that would lead us to conclude that they were parallel if they were vectors with
fewer components.
Solution:
We simply need to multiply the components of each vector by the corresponding scalar:
−~u = −(1, 2, −1, 2) = (−1, −2, −(−1), −2) = (−1, −2, 1, −2)
5~v = 5(2, 0, 3, 0, 1) = (5 × 2, 5 × 0, 5 × 3, 5 × 0, 5 × 1) = (10, 0, 15, 0, 5)
−2~w = −2(1, 0, 1, 0, 0, 3, 2)
= (−2 × 1, −2 × 0, −2 × 1, −2 × 0, −2 × 0, −2 × 3, −2 × 2)
= (−2, 0, −2, 0, 0, −6, −4)
Of course, we could have found −~u more quickly by realizing that we just needed to switch the sign
of each component:
−~u = −(1, 2, −1, 2) = (−1, −2, 1, −2)
Definition: Two m-vectors are said to be collinear if and only if each is a scalar
multiple of the other. That is, ~u and ~v ∈ ℜm are collinear if and only if there is some
scalar c such that ~u = c~v . If two non-zero m-vectors ~u and ~v are collinear, so that ~u = c~v ,
then they are said to have the same direction if c > 0 and are said to have opposite
directions if c < 0.
Examples: The vectors (1, 2, 3, 4) and (2, 4, 6, 8) are collinear because (2, 4, 6, 8) = 2(1, 2, 3, 4). These
vectors have the same direction. Also, for ~u = (1, 0, −1, 0, 1) and ~v = (−10, 0, 10, 0, −10), ~v = −10~u
and so ~u and ~v are collinear, with directions opposite to one another.
Solution:
The arithmetic is easier if we realize that (4, 8, −20, 12) = 4(1, 2, −5, 3). We get
\[ \|(4, 8, -20, 12)\| = \|4(1, 2, -5, 3)\| = |4|\,\|(1, 2, -5, 3)\| = 4\sqrt{1^2 + 2^2 + (-5)^2 + 3^2} = 4\sqrt{1 + 4 + 25 + 9} = 4\sqrt{39} \]
Example 4.6. If ~u = (1, −1, 2, −2, 3, −3) and ~v = −5~u, find ||~v ||.
Solution:
It will be easier to find the magnitude of ~u and use that to find the magnitude of ~v , rather than to
find the magnitude of ~v directly. We get:
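\[ \begin{aligned} \|\vec{v}\| = \|-5\vec{u}\| = |-5|\,\|\vec{u}\| &= 5\,\|(1, -1, 2, -2, 3, -3)\| \\ &= 5\sqrt{1^2 + (-1)^2 + 2^2 + (-2)^2 + 3^2 + (-3)^2} \\ &= 5\sqrt{1 + 1 + 4 + 4 + 9 + 9} \\ &= 5\sqrt{28} = 5\sqrt{4 \times 7} = 5\sqrt{4}\sqrt{7} = 5(2)\sqrt{7} = 10\sqrt{7} \end{aligned} \]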
Example 4.7. If ~u = (1, −1, 1, 0), find a unit vector in the opposite direction to ~u.
Solution:
Let ~v be a unit vector in the opposite direction to ~u. Then ~v is a scalar multiple of ~u, i.e. ~v = c~u,
for some scalar value c < 0, such that ||~v || = 1. Since ~v = c~u, then we get
||~v || = ||c~u|| = |c|||~u||
and so ||~v || = 1 gives |c| ||~u|| = 1, which means that we must have |c| = 1/||~u||. (Notice that we have
previously observed this in ℜ2 and ℜ3 .) For ~u = (1, −1, 1, 0), we have
\[ \|\vec{u}\| = \sqrt{1^2 + (-1)^2 + 1^2 + 0^2} = \sqrt{1 + 1 + 1 + 0} = \sqrt{3} \]
so we see that we need |c| = 1/√3 in order for ~v = c~u to be a unit vector, and we need c < 0 in order
for ~v = c~u to have the opposite direction to ~u. Therefore we need c = −1/√3. So we see that a unit
vector in the opposite direction to ~u is
\[ \vec{v} = c\vec{u} = -\frac{1}{\sqrt{3}}(1, -1, 1, 0) = \left(-\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, -\frac{1}{\sqrt{3}}, 0\right) \]
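The facts used in Examples 4.6 and 4.7, namely ||c~u|| = |c| ||~u|| and c = −1/||~u|| for a unit vector in the opposite direction, can be tried out with a short Python sketch. The helper names `scale`, `magnitude` and `unit_opposite` are hypothetical, and the results are floating-point approximations rather than exact values.

```python
from math import sqrt

def magnitude(u):
    return sqrt(sum(c * c for c in u))

def scale(c, u):
    """Scalar multiple c*u: multiply every component by c."""
    return tuple(c * x for x in u)

def unit_opposite(u):
    """Unit vector in the opposite direction to a non-zero vector u."""
    length = magnitude(u)
    if length == 0:
        raise ValueError("the zero vector has no direction")
    return scale(-1 / length, u)

# Example 4.7.
v = unit_opposite((1, -1, 1, 0))

# Example 4.6: the magnitude of -5u should be 5 times the magnitude of u.
u6 = (1, -1, 2, -2, 3, -3)
mag_scaled = magnitude(scale(-5, u6))
```

The components of `v` approximate (−1/√3, 1/√3, −1/√3, 0), and `mag_scaled` approximates 10√7.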
Note: This says that, as we do in ℜ2 and ℜ3 , we find the sum of 2 vectors as the vector whose com-
ponents are the sums of the corresponding components of the 2 vectors, and we find the difference
of 2 vectors, which can be considered to be the sum of the first vector and the negative of the second
vector, as the vector whose components are the differences of the corresponding components.
Solution:
Solution:
We simply carry out the specified arithmetic operations. We get:
~w = 3~u − 2~v
= 3(1, 0, 1, 0, −1, 0) − 2(0, 1, 1, 0, −1, −1)
= (3, 0, 3, 0, −3, 0) − (0, 2, 2, 0, −2, −2)
= (3 − 0, 0 − 2, 3 − 2, 0 − 0, −3 − (−2), 0 − (−2))
= (3, −2, 1, 0, −1, 2)
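Component-wise arithmetic like this is easy to sketch in Python. The function names `add`, `sub` and `scale` below are our own illustrative choices; the length check mirrors the fact that sums and differences are only defined for vectors with the same number of components.

```python
def _same_size(u, v):
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")

def add(u, v):
    """Sum of two m-vectors: add corresponding components."""
    _same_size(u, v)
    return tuple(a + b for a, b in zip(u, v))

def sub(u, v):
    """Difference u - v: subtract corresponding components."""
    _same_size(u, v)
    return tuple(a - b for a, b in zip(u, v))

def scale(c, u):
    """Scalar multiple c*u."""
    return tuple(c * a for a in u)

u = (1, 0, 1, 0, -1, 0)
v = (0, 1, 1, 0, -1, -1)
w = sub(scale(3, u), scale(2, v))          # 3u - 2v
w_as_sum = add(scale(3, u), scale(-2, v))  # the same vector, as 3u + (-2)v
```

Both `w` and `w_as_sum` come out as (3, −2, 1, 0, −1, 2), in line with the idea that a difference is the sum of the first vector and the negative of the second.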
Since all of the arithmetic operations we have defined for ℜm are exactly analogous to the arith-
metic operations we defined for ℜ2 and ℜ3 , then the same properties hold. That is, in Theorem
1.4, on page 13 we stated some properties which hold for vectors in ℜ2 and ℜ3 , and these same
properties hold for vectors in ℜm . You should review Theorem 1.4, and think about those same
properties when applied to m-vectors for m > 3. Experiment a bit to convince yourself that those
properties still hold when we have vectors from a higher dimensional space.
Dot Product in ℜm
Definition: Let ~u and ~v be any 2 m-vectors. Then the dot product of ~u and ~v ,
written ~u • ~v , is the scalar value given by
~u • ~v = u1 v1 + u2 v2 + ... + um vm
Note: This says that just as before, we find the dot product of 2 vectors by taking the sum of the
products of corresponding components.
Solution:
Example 4.11. If ~u = (1, 1, −1, 2, −2) and ~v = (−1, 2, −1, 1, −2), find ~u • ~v and ~v • (−~u).
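Solution:
~u • ~v = (1)(−1) + (1)(2) + (−1)(−1) + (2)(1) + (−2)(−2) = −1 + 2 + 1 + 2 + 4 = 8
~v • (−~u) = (−1, 2, −1, 1, −2) • (−1, −1, 1, −2, 2) = 1 − 2 − 1 − 2 − 4 = −8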
Notice that in that example, we found that ~v • (−~u) = −8 = −(~u •~v). That’s because once again,
having defined the dot product for m-vectors to be exactly analogous to the dot product for vectors
in ℜ2 and ℜ3 , the same properties hold. That is, all of the properties of the dot product that we
observed in Theorem 2.1, on page 16, hold for m-vectors just like they do for vectors in ℜ2 and ℜ3 .
As before, you should review the properties stated in that theorem, think about what they mean
when applied to vectors with more than 3 components, and experiment a bit to convince yourself
that these properties do still hold.
For instance, the theorem tells us that ~v • ~u = ~u • ~v , and also that ~v • (−~u) = ~v • (−1)~u =
(−1)(~v • ~u) = −(~v • ~u), so in Example 4.11 it is not surprising that we found that ~v • (−~u) = −(~u •~v ),
because we have
~v • (−~u) = −(~v • ~u) = −(~u • ~v )
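One way to "experiment a bit" with these properties is a short Python sketch such as the following. The names `dot` and `negative` are our own, and checking a property on one pair of vectors is of course only an experiment, not a proof.

```python
def dot(u, v):
    """Dot product of two m-vectors: sum of products of corresponding components."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(u, v))

def negative(u):
    """The negative of u: switch the sign of each component."""
    return tuple(-a for a in u)

# The vectors of Example 4.11.
u = (1, 1, -1, 2, -2)
v = (-1, 2, -1, 1, -2)

u_dot_v = dot(u, v)
v_dot_neg_u = dot(v, negative(u))

# The two properties used in the discussion above.
commutes = dot(v, u) == dot(u, v)
sign_pulls_out = v_dot_neg_u == -u_dot_v
```

For these vectors `u_dot_v` is 8 and `v_dot_neg_u` is −8, and both property checks hold.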
In ℜ2 and ℜ3 , the word orthogonal is used to describe 2 vectors which form a right angle at
the origin, i.e. two vectors which are perpendicular to one another. And we know an easy way to
determine whether two vectors are orthogonal: they are orthogonal exactly when the value of their dot
product is 0. We extend this sense of orthogonal to vectors in ℜm .
Definition: For vectors ~u and ~v in ℜm , we say that the vectors are orthogonal if and
only if ~u • ~v = 0.
Example 4.12. For what value(s) of k are the vectors ~u = (1, −2, −4, k, 2) and ~v = (2, 1, −1, 3, 1)
orthogonal?
Solution:
In order for ~u and ~v to be orthogonal, it must be true that ~u • ~v = 0. We calculate ~u • ~v , in terms of
the unknown constant k, and then solve for the value(s) of k which make that expression have the
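value 0. We have
~u • ~v = (1)(2) + (−2)(1) + (−4)(−1) + (k)(3) + (2)(1) = 2 − 2 + 4 + 3k + 2 = 3k + 6
so ~u • ~v = 0 requires 3k + 6 = 0, i.e. k = −2. That is, ~u and ~v are orthogonal only when k = −2.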
Notice: We have extended the definitions of vector sums and differences, scalar multiplication, and
the dot product of vectors to include vectors in ℜm for any m. But we have not extended the idea of
cross product. As stated in Unit 2, the cross product of 2 vectors is defined only for vectors in ℜ3 .
(Or at least for our purposes it is.)
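The orthogonality test from Example 4.12 can also be sketched in Python. Because ~u • ~v depends on k in a linear way, its value at k = 0 and at k = 1 is enough to find where it is 0; that shortcut, and the names `u_of` and `is_orthogonal`, are our own assumptions for this illustration.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def is_orthogonal(u, v):
    """Two m-vectors are orthogonal if and only if their dot product is 0."""
    return dot(u, v) == 0

def u_of(k):
    """The vector u of Example 4.12, with its unknown component k."""
    return (1, -2, -4, k, 2)

v = (2, 1, -1, 3, 1)

# u . v has the form a + b*k, so a = value at k = 0 and b = change from 0 to 1.
a = dot(u_of(0), v)
b = dot(u_of(1), v) - a
k = Fraction(-a, b)  # solves a + b*k = 0 (b is not 0 here)
```

With these vectors a = 6 and b = 3, so k = −2, and `is_orthogonal(u_of(k), v)` confirms the result.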
Lines in ℜm
Note: In ℜ2 , we use ~x = (x, y), and in ℜ3 we use ~x = (x, y, z). But now we’ve reached the end of the
alphabet ... what do we do in ℜ4 ? Or in ℜ5 ? Generally, when ~x ∈ ℜm for m > 3 we use subscripted
x’s. That is, we use ~x = (x1 , x2 , ..., xm ).
Definition: Borrowing from the terminology used for ℜ2 and ℜ3 , we define equations
of lines in ℜm as follows:
1. Consider any points (i.e. m-tuples) P and Q in ℜm , and let ~p and ~q be the corresponding
vectors in ℜm . The two-point form of an equation of the line
through P and Q is given by
~x(t) = (1 − t)~p + t~q
2. For any vector ~v which is parallel to the vector ~q − p~, we say that ~v is a direction
vector for the line through P and Q, and we can write a point-parallel form
equation of the line as
~x(t) = p~ + t~v
3. For any line ~x(t) = p~ +t~v in ℜm , parametric equations of the line can be obtained
by equating corresponding components from the point-parallel form equation. For
instance, for the line ~x(t) = ~p + t~v in ℜ4 , the parametric equations are:
x1 = p1 + v1 t
x2 = p2 + v2 t
x3 = p3 + v3 t
x4 = p4 + v4 t
There will, of course, be one parametric equation for each component of the m-
vector. That is, m parametric equations are required to describe a line in ℜm .
Example 4.13. Write equations of the line in ℜ4 containing P (1, 2, 3, 4) and Q(2, 0, −1, 1) in the
following forms: two-point form, point-parallel form, parametric equations.
Solution:
We use the two points given to write a two-point form equation of the line:
~x(t) = (1 − t)(1, 2, 3, 4) + t(2, 0, −1, 1)
As in ℜ2 and ℜ3 , we can find a direction vector, ~v , for the line as the vector equivalent to \(\overrightarrow{PQ}\), given
by ~q − ~p. We get
~v = ~q − ~p = (2, 0, −1, 1) − (1, 2, 3, 4) = (1, −2, −4, −3)
and so we can write a point-parallel form equation of the line using this direction vector and either
point:
~x(t) = (1, 2, 3, 4) + t(1, −2, −4, −3)
Equating corresponding components gives the parametric equations:
x1 = 1+t
x2 = 2 − 2t
x3 = 3 − 4t
x4 = 4 − 3t
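The different forms describe the same line, which can be checked with a small Python sketch. The function names `two_point` and `point_parallel` are hypothetical, and comparing the two forms at a handful of integer values of t is only a spot check.

```python
def two_point(p, q, t):
    """Point on the line through p and q: (1 - t)*p + t*q."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))

def point_parallel(p, v, t):
    """Point on the line through p with direction v: p + t*v."""
    return tuple(a + t * c for a, c in zip(p, v))

# Example 4.13.
P = (1, 2, 3, 4)
Q = (2, 0, -1, 1)
v = tuple(b - a for a, b in zip(P, Q))  # direction vector q - p

# Each component of point_parallel(P, v, t) is one parametric equation,
# e.g. component 0 is x1 = 1 + t and component 3 is x4 = 4 - 3t.
forms_agree = all(two_point(P, Q, t) == point_parallel(P, v, t)
                  for t in range(-3, 4))
```

As expected, t = 0 gives the point P and t = 1 gives the point Q in either form.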
Example 4.14. Write parametric equations of the line through P (1, 0, 2, 1, −3, 2) which is parallel to
~v = (0, 1, 0, 1, −1, 1).
Solution:
We see (by counting the coordinates of P and the components of ~v ) that this is a line in ℜ6 , so
there will be 6 parametric equations. Let ~x(t) = (x1 , x2 , x3 , x4 , x5 , x6 ) be any point on this line.
A point-parallel equation of the line is ~x(t) = (1, 0, 2, 1, −3, 2) + t(0, 1, 0, 1, −1, 1). Therefore the
parametric equations for the line are:
x1 = 1 + 0t ⇒ x1 = 1
x2 = 0 + t ⇒ x2 = t
x3 = 2 + 0t ⇒ x3 = 2
x4 = 1 + 1t ⇒ x4 = 1 + t
x5 = −3 − t ⇒ x5 = −3 − t
x6 = 2 + t ⇒ x6 = 2 + t
Hyperplanes
Example 4.15. Write an equation of the hyperplane in ℜ5 which passes through (1, 0, −2, 3, 1) with
normal (2, 3, −1, −2, 1) in:
(a) point-normal form (b) standard form
Solution:
(a) We have ~n = (2, 3, −1, −2, 1) and p~ = (1, 0, −2, 3, 1), so a point-normal form equation of this
hyperplane is:
(2, 3, −1, −2, 1) • (~x − (1, 0, −2, 3, 1)) = 0
(b) The normal vector gives the coefficients of the xi ’s, and the value of the constant on the right hand
side of the equation is found by evaluating the equation at the given point (i.e. by calculating ~n • ~p).
That is, the equation has the form 2x1 + 3x2 − x3 − 2x4 + x5 = c where c = 2(1) + 3(0) − (−2) − 2(3) + 1 =
2 + 0 + 2 − 6 + 1 = −1, so a standard form equation of the hyperplane is:
2x1 + 3x2 − x3 − 2x4 + x5 = −1
Example 4.16. Find a normal vector for the hyperplane in ℜ4 corresponding to the standard form
equation 5x1 − 3x2 + x4 = 7. Find any point on this hyperplane.
Solution:
The components of the normal vector are the coefficients of the xi ’s in the standard form equation.
Since the left hand side of the equation is 5x1 − 3x2 + x4 , with no x3 visible, the coefficient of x3
must be 0. So the normal vector is ~n = (5, −3, 0, 1).
To find a point on this hyperplane, we simply need to find any point P (x1 , x2 , x3 , x4 ) which satisfies
the standard form equation. For instance, if we set x1 = 0 and x2 = 0 then we get 0 − 0 + x4 = 7,
and x3 could have any value, so (0, 0, 0, 7) is a point on the hyperplane. Likewise, (0, 0, −43, 7) is
another. Many other points could be found, with any combination of x1 , x2 and x4 values which
makes 5x1 − 3x2 + x4 = 7. (For instance, if we chose x1 = 2 and x2 = 1, we would get x4 = 0. And
once again, x3 could have any value, since it is multiplied by 0.)
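The arithmetic in Examples 4.15 and 4.16 can be reproduced with a dot product. The sketch below is an added illustration; the function names dot and on_hyperplane are hypothetical. It computes the constant c = ~n • ~p for Example 4.15 and tests the points named in Example 4.16 against 5x1 − 3x2 + x4 = 7.

```python
def dot(u, v):
    """Dot product of two vectors with the same number of components."""
    return sum(ui * vi for ui, vi in zip(u, v))


def on_hyperplane(n, c, x):
    """True when x satisfies the standard form equation n . x = c."""
    return dot(n, x) == c


# Example 4.15: the constant in the standard form equation is n . p.
n15 = (2, 3, -1, -2, 1)
p15 = (1, 0, -2, 3, 1)
c15 = dot(n15, p15)
print(c15)  # -1

# Example 4.16: normal (5, -3, 0, 1) and constant 7.
n16 = (5, -3, 0, 1)
for point in [(0, 0, 0, 7), (0, 0, -43, 7), (2, 1, 0, 0), (2, 1, 99, 0)]:
    assert on_hyperplane(n16, 7, point)

# A point which does not satisfy the equation is not on the hyperplane.
print(on_hyperplane(n16, 7, (1, 1, 1, 1)))  # False
```

The third component never affects the result for Example 4.16, because it is always multiplied by the 0 in the normal vector.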
Notice: As we saw in this example, some of the xi ’s might be invisible in the standard form equation,
because the corresponding component of the normal vector is 0. This can be true for one or more
xi ’s with i larger than the largest subscript showing, as well as for “missing” ones between those
that show. In the example, we were told that the hyperplane was in ℜ4 , so we knew that there
weren’t any more xi ’s after the last one we could see. But if it hadn’t said that, we wouldn’t know
how many components the normal vector should have. We might assume that there should be 4,
but it could just as well have been 5 or 6 or ... That is, the hyperplane through (0, 0, 0, 7, 0, 0) with
normal vector (5, −3, 0, 1, 0, 0), in ℜ6 , also has standard form equation 5x1 − 3x2 + x4 = 7.
The same thing can happen with ℜ2 and ℜ3 . The equation x + y = 1 is an equation of a line in
ℜ2 , with normal (1, 1). But it’s also the equation of a plane in ℜ3 , with normal (1, 1, 0). (Since the
unknowns are called x and y, rather than x1 and x2 , it’s unlikely to correspond to a hyperplane in
ℜm for some m > 3, although it could. The text sometimes uses ~x = (x, y, z, w) for a vector in ℜ4 ,
and by that convention, x + y = 1 is also a hyperplane in ℜ4 with normal (1, 1, 0, 0). And I suppose
we could similarly use (x, y, z, w, v) for ℜ5 and so on ...)
Please Note: At the end of this section, the text contains some material that we aren’t going to
talk about, or at least not right now. The concepts of linear equation and linear combination we’ll
meet soon enough, and there’s no need to talk about them here and now. And the concept of a
plane determined by two vectors isn’t covered in this course.
Math 1229A/B
Unit 5:
Systems of Linear Equations
(text reference: Section 2.2)
© V. Olds 2010
Of course, we can have other kinds of equations with 2 variables, and you’ve probably seen
some of those before. For instance, x^2 + y^2 = 1 is the equation of the circle with radius 1 centred at the origin. More
complicated curves in 2-space have equations like x^2 + xy + y^2 = 4 or 5x^4 y − 3x^2 y^2 + 2xy^5 = 0 or
√x − x√y + y = 3. And then there are equations like x + x/y − 2/y^2 = 6, and 2^(x+y) = 4 and x + sin y = 1.
But none of those are linear equations. They’re not lines. They’re curvy things.
You also know that in ℜ3 , an equation like x + y + z = 1 is not a line. It’s a plane. But ... what is
a plane? It’s just a whole bunch of lines side-by-side. Well, okay, we wouldn’t really think of a plane
that way, but a plane does have some characteristics which make it similar to a line in some ways.
It’s flat. No curvy bits. And the same goes for a hyperplane in ℜm . Any plane or hyperplane, in
standard form, has an equation which is just the sum of a bunch of constant multiples of variables,
set equal to some constant. There are never any more complicated things done to the variables, like
squaring, or taking the square root, or multiplying two variables together, or dividing by a variable.
Just a constant times a variable, plus a constant times another variable, plus ... and equal to some
constant. The most interesting things that happen are that sometimes a constant is negative, so
that we’re actually subtracting, or sometimes a constant is 0, so that the variable isn’t even there.
Equations like this are called linear equations, because the relationship between the variables is
always like the relationship between the variables in an equation of a line in ℜ2 . We’re going to be
working with linear equations a lot in this course. In fact, the textbook is called Elementary Linear
Algebra, because the kind of algebra we’re studying all relates to these linear equations. So we need a
careful definition of what these are. But it’s not any more complicated than what we’ve already said.
Note: That last bit just means that the equation 0 = 0 isn’t considered to be a linear equation.
There has to be at least one variable actually appearing in the equation. Also, as you already
know, the variables don’t necessarily have to have the names shown in the definition. They could be
x, y, z, w, r and t, or they could be subscripted y’s or s’s instead of x’s. The names of the variables
don’t matter. Any equation which says, or can be rearranged to say, that the sum of scalar multiples
of variables is equal to a constant is a linear equation.
Definition:
• A SLE is in standard form if all of the equations have all of the variables appearing
on the left hand side of the equation, in the same order for all equations, with spaces
left for any variables missing in that equation, and the constant term appears on
the right hand side of each equation.
• A SLE is said to be consistent if it has at least one solution.
• A SLE which has no solutions is called inconsistent.
Example 5.1. Consider the SLE
x + 2z = y/2
y = 2z
(a) Put this system into standard form.
(b) Show that any vector of the form (x, y, z) = (−t, 2t, t) is a solution to this system.
Solution:
(a) To put the SLE into standard form, we collect all the variables on the left hand side and put any
constant on the right hand side. For the first equation, we can start by getting rid of the fraction
(although this isn’t necessary). We have
x + 2z = y/2 ⇒ 2x + 4z = y ⇒ 2x + 4z − y = 0 ⇒ 2x − y + 4z = 0
Also, for the second equation we have
y = 2z ⇒ y − 2z = 0
So we can write the SLE in standard form as
2x − y + 4z = 0
y − 2z = 0
Notice that the last step in rearranging the first equation wasn’t necessary. The variables don’t
necessarily have to be in any particular order, as long as they’re in the same order in each equation.
We could have left the first equation as 2x + 4z − y = 0, but in that case we would have needed
to state the second equation as −2z + y = 0 in our standard form. Usually, though, we do list the
variables in their “natural” order. It’s just easier to think of that way.
(b) From the original statement of the second equation, we know that any solution to this system
must satisfy y = 2z. Substituting this into the original form of the first equation gives
x + 2z = y/2 = (2z)/2 = z. And from x + 2z = z we see that we need x = z − 2z = −z. So for
any particular value of z, setting x = −z and y = 2z gives a solution to the SLE. That is, if we set
z equal to some value t, then if we also set x = −t and y = 2t we get a solution to the system. So
(x, y, z) = (−t, 2t, t) is a solution for any value t. For instance, setting t = 1 we see that (−1, 2, 1) is
a solution to this SLE. Likewise, t = −2 gives (2, −4, −2) as another solution.
Of course, to show that (−t, 2t, t) is a solution, we didn’t actually need to do that. What we did
above actually found the form of the solution. All that was really necessary here was to show that
(x, y, z) = (−t, 2t, t) does satisfy both equations. Substituting x = −t, y = 2t and z = t into the left
hand side of each equation in the standard form of the SLE we get:
2x − y + 4z = 2(−t) − (2t) + 4(t) = −2t − 2t + 4t = 0
y − 2z = 2t − 2(t) = 0
Since for both equations what we got was the right hand side value of the corresponding equation in
the standard form of the SLE, we see that this vector does satisfy both equations and therefore is a
solution to the system.
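The substitution check in part (b) is easy to repeat for particular values of t. The following sketch is an added illustration (the function name satisfies is hypothetical). It uses exact fractions so that non-integer values of t can be tried too. Trying a handful of values is only a spot check; the algebra above is what shows that every value of t works.

```python
from fractions import Fraction


def satisfies(x, y, z):
    """True when (x, y, z) satisfies both equations of the standard form SLE
    2x - y + 4z = 0 and y - 2z = 0 from Example 5.1."""
    return 2 * x - y + 4 * z == 0 and y - 2 * z == 0


# Spot-check the family (-t, 2t, t) for a few values of the parameter t.
for t in (Fraction(0), Fraction(1), Fraction(-2), Fraction(7, 3)):
    assert satisfies(-t, 2 * t, t)

print(satisfies(-1, 2, 1))  # True: this is the solution with t = 1
print(satisfies(1, 2, 1))   # False: not of the form (-t, 2t, t)
```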
Definition: When a solution to a system of equations can be stated with one or more
parameters, such that any value of the parameter(s) gives a solution to the system, the
system is said to have an r-parameter family of solutions, where r is the number of
parameters in the solution.
For instance, we saw that the SLE in Example 5.1 has the 1-parameter family of solutions (−t, 2t, t)
for any t ∈ ℜ.
Definition: Two systems of linear equations are said to be equivalent if they have
exactly the same solutions.
For instance, consider the SLE in Example 5.1. We know that in any solution to this system, it
must be true that y = 2z. But if y = 2z then it must also be true that 2y = 4z, i.e. that 2y − 4z = 0.
Likewise, we know that any solution to the system must satisfy y/2 = x + 2z, and with y = 2z we
could express this as y/2 = x + y to get x + y − y/2 = 0 which gives x + y/2 = 0.
So the SLE’s
2y − 4z = 0
x + y/2 = 0
and
2x − y + 4z = 0
y − 2z = 0
are equivalent.
Definition: To solve a system of linear equations means to find all solutions to the
system, or to determine that the system is inconsistent.
For instance, given the SLE
x + y = 1
y = 0
we can solve the system as follows. We see that
we must have y = 0. Substituting this into the first equation, we get x + 0 = 1 so we see that we
must have x = 1. Therefore (x, y) = (1, 0) is the only solution to this SLE.
You have most likely solved systems of equations before. In the next unit, we’ll learn a way to
organize the data from a SLE so that we can find all solutions in a systematic way which is less
cumbersome than working with the actual equations. But for now we’ll look at what’s going on with
the kind of manipulations we’re going to be doing, while working with the equations. We’ll be doing
things like “add two equations together”, so you need to understand that what we mean by that is
to create a new equation, whose left hand side (LHS) is the sum of the LHS’s of the equations being
added, and whose right hand side (RHS) is the sum of the RHS’s. For instance, if we want to add
the equation x + y = 1 and the equation x + 2y = −2, we form the new equation
(x + y) + (x + 2y) = (1) + (−2) ⇒ 2x + 3y = −1
Similarly, when we talk about multiplying an equation by a scalar, what we mean is to multiply
both sides of the equation by that scalar, so when we multiply the equation LHS = RHS by the
scalar (i.e. the constant) c, we make a new equation that says that c × LHS = c × RHS. For instance,
when we multiply 2x + 3y − z = 5 by 2 we get 4x + 6y − 2z = 10.
Our objective in manipulating equations in this way is to solve a SLE. In order to do this, any
manipulation we do must transform the SLE into an equivalent SLE. If the SLE we end up with
isn’t equivalent to the one we started with, then the solutions to the system we now have are not
necessarily solutions to, or are not necessarily all of the solutions to, the original system, and so
finding them won’t solve the SLE we were trying to solve.
There are only 3 kinds of things we’re allowed to do when manipulating equations to solve a SLE.
These are called elementary operations. It’s important to remember what these operations are, and
to do only these kinds of operations, because doing anything else will almost certainly result in a
SLE which is not equivalent to the system we’re trying to solve.
Definition: The following operations are the elementary operations which are allowed
in solving a SLE:
I Multiply an equation by a non-zero scalar.
II Interchange the positions of two equations in the system.
III Replace one of the equations by the sum of that equation and a scalar multiple of
another one of the equations in the system.
No other operations are allowed.
Theorem 5.1. Performing one of the elementary operations to transform a system of linear equations
always results in a SLE which is equivalent to the original system.
We’re not going to worry about why that’s true. We’ll just accept that it is. If you only perform
elementary operations, and of course perform them properly, then each system you get is equivalent
to the one before, so when you arrive at one for which the solution(s) are obvious, or for which it is
clear that there is no solution, you have solved the system you started with.
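The three elementary operations can be written as small functions. This is a minimal sketch added for illustration, under the assumption that each equation of a standard form SLE is stored as a list of its coefficients followed by its RHS value; the names scale, swap and add_multiple are hypothetical. Each function returns a new system and leaves the old one unchanged.

```python
def scale(system, i, c):
    """Type I: multiply equation i by the non-zero scalar c."""
    if c == 0:
        raise ValueError("a type I operation needs a non-zero scalar")
    new = [row[:] for row in system]
    new[i] = [c * a for a in system[i]]
    return new


def swap(system, i, j):
    """Type II: interchange the positions of equations i and j."""
    new = [row[:] for row in system]
    new[i], new[j] = new[j], new[i]
    return new


def add_multiple(system, i, j, c):
    """Type III: replace equation i by itself plus c times equation j."""
    if i == j:
        raise ValueError("a type III operation uses two different equations")
    new = [row[:] for row in system]
    new[i] = [a + c * b for a, b in zip(system[i], system[j])]
    return new


# The SLE  x + y = 1,  y = 0  stored as [coefficient of x, coefficient of y, RHS].
system = [[1, 1, 1],
          [0, 1, 0]]

# Subtract the second equation from the first (type III with c = -1).
solved = add_multiple(system, 0, 1, -1)
print(solved)  # [[1, 0, 1], [0, 1, 0]], i.e. x = 1 and y = 0

# Multiplying 2x + 3y - z = 5 by 2 gives 4x + 6y - 2z = 10.
print(scale([[2, 3, -1, 5]], 0, 2))  # [[4, 6, -2, 10]]
```

Equations are numbered from 0 in the code, so equation 0 is the first equation. Refusing a zero scalar in scale mirrors the definition: multiplying by 0 would turn an equation into 0 = 0 and lose information.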
Example 5.2. Solve the SLE
x + 2y = 4
x − y = 1
Solution:
If we add 2 times the second equation to the first equation, the y term in the resulting equation will
have coefficient 0, because 2 + 2(−1) = 0. So the first elementary operation we’re going to perform
is to replace the first equation by the first equation plus 2 times the second equation. This is an
elementary operation of type III. The new first equation is
(x + 2y) + 2(x − y) = 4 + 2(1) ⇒ x + 2y + 2x − 2y = 4 + 2 ⇒ 3x = 6
Replacing the first equation by this new equation we get the transformed, equivalent, SLE
3x = 6
x − y = 1
Now, we can see that the first equation is going to tell us the value of x. We multiply the first
equation (of this transformed system) by 1/3 (which is to say, we divide through by 3) to get just x
on the LHS. This is an elementary operation of type I. The new first equation is
(1/3)(3x) = (1/3)(6) ⇒ x = 2
Re-writing the system with this new version of the first equation, we have the SLE
x = 2
x − y = 1
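Next, we can eliminate x from the second equation by subtracting the first equation. That is, we
perform another type III elementary operation, replacing the second equation by itself plus −1 times
the first equation. The new second equation is
(x − y) + (−1)(x) = 1 + (−1)(2) ⇒ −y = −1
Multiplying this equation by −1 (a type I elementary operation) gives y = 1, so the SLE becomes
x = 2
y = 1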
Well, it’s pretty easy to see what the solution to that SLE is. It’s staring us right in the face. Clearly,
(x, y) = (2, 1) is the only solution to this system of equations. And since the only things we did
were elementary operations, this system is equivalent to, i.e. has the same solutions as, the original
system. And so (x, y) = (2, 1) is also the only solution to the SLE we started out with.
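The steps of Example 5.2 can be replayed with exact arithmetic. In this added sketch each equation is stored as a list [coefficient of x, coefficient of y, RHS], which is an assumption made for illustration. The last step, multiplying by −1 to turn −y = −1 into y = 1, is written out explicitly as a type I operation.

```python
from fractions import Fraction

# x + 2y = 4 and x - y = 1, stored as [coefficient of x, coefficient of y, RHS].
eq1 = [Fraction(1), Fraction(2), Fraction(4)]
eq2 = [Fraction(1), Fraction(-1), Fraction(1)]

# Type III: replace equation 1 by itself plus 2 times equation 2 (3x = 6).
eq1 = [a + 2 * b for a, b in zip(eq1, eq2)]
assert eq1 == [3, 0, 6]

# Type I: multiply equation 1 by 1/3 (x = 2).
eq1 = [Fraction(1, 3) * a for a in eq1]
assert eq1 == [1, 0, 2]

# Type III: replace equation 2 by itself plus -1 times equation 1 (-y = -1).
eq2 = [a + (-1) * b for a, b in zip(eq2, eq1)]
assert eq2 == [0, -1, -1]

# Type I: multiply equation 2 by -1 (y = 1).
eq2 = [-1 * a for a in eq2]
assert eq2 == [0, 1, 1]

# The final system says x = 2 and y = 1; check this against the original SLE.
x, y = eq1[2], eq2[2]
assert x + 2 * y == 4 and x - y == 1
print(x, y)  # 2 1
```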
Example 5.3. Solve the SLE
x + y + z = 5
2x + y = 10
y + 2z = 0
Solution:
Let’s think about what our elementary operations accomplished in Example 5.2. First, we eliminated
y from the first equation, so that it was telling us the value of x. Later, we eliminated x from
the second equation, so that it was telling us the value of y. This general approach of eliminating
variables from the equations, so that each equation tells us about a different variable, is what we
want to accomplish with elementary operations.
In this case, we want to end up with the first equation being the only one with an x in it, so that
this is the equation telling us about the value of x. Therefore we’ll start as before by eliminating x
from the other equations. Conveniently, the third equation already doesn’t have x in it, so we only
need to eliminate x from the second equation. In that equation, we have 2x, so in order to eliminate
this, we want to add −2x. We can do that by adding −2 times the first equation to the second
equation. Therefore we start with a type III elementary operation: replace the second equation by
the sum of itself and −2 times the first equation. The new second equation is
(2x + y) + (−2)(x + y + z) = 10 + (−2)(5) ⇒ 2x + y − 2x − 2y − 2z = 10 − 10 ⇒ −y − 2z = 0
The transformed SLE is
x + y + z = 5
−y − 2z = 0
y + 2z = 0
Since we want the second equation to be telling us about the value of y, it will be useful to have the
coefficient of y in the second equation being 1. That is, we want the second equation to start with
just y, rather than the −y it currently starts with. There are at least a couple of different ways to
accomplish this, but the easiest is to recognize that we already have an equation that starts with
just y, so we can simply move that equation, to make it the second equation. That is, we can use
a type II elementary operation to interchange the positions of the second and third equations. (In
this case, this elementary operation isn’t really very much easier than the alternatives, but generally
speaking type II operations are the easiest — and least prone to error — so we prefer those to either
of the others.) This transforms the SLE to
x + y + z = 5
y + 2z = 0
−y − 2z = 0
Next, in keeping with our approach of eliminating variables from every equation except the one
that’s going to tell us about the value of that variable, we want to eliminate the y terms from the
first and third equations. For the first equation, we need a type III elementary operation. (There
isn’t another equation we can interchange with the first, that doesn’t have a y in it, so we can’t use
a type II operation. Also, there’s no non-zero scalar that we can multiply the first equation by to
transform +y to 0, so we also can’t use a type I operation.) We see that since the coefficient of y
in the first equation is 1, the same as in the second equation, we can get rid of the y in the first
equation by simply subtracting the second equation. That is, we will replace the first equation by
the sum of itself and −1 times the second equation. This gives the new first equation as
(x + y + z) + (−1)(y + 2z) = 5 + (−1)(0) ⇒ x + y + z − y − 2z = 5 − 0 ⇒ x − z = 5
The newly transformed SLE is
x − z = 5
y + 2z = 0
−y − 2z = 0
And since the coefficient on y in the third equation is −1, we can eliminate the y term by adding
y, i.e. by simply adding the second equation. So we replace the third equation by the sum of itself
and 1 times the second equation, which gives the new third equation as
(−y − 2z) + (y + 2z) = 0 + 0 ⇒ 0=0
Hmm. Well, okay. The equivalent SLE is now
x − z = 5
y + 2z = 0
0 = 0
At this point, we have the first equation starting with just x, and with no y term in it, to tell us
about the value of x, and the second equation starting with just y, with no x term in it, to tell us
about the value of y. We want to next make the third equation start with just z, and eliminate
the z term from the other two equations, so that the third equation tells us about the value of z,
and the others involve only the variable that each is telling us about the value of. But we have a
problem. The third equation has gone away! It doesn’t have a z in it anymore. All it says is the
seemingly unhelpful fact that zero is equal to zero. Note: An equation that tells us that 0 = 0 isn’t
actually a problem. It simply says “everything’s okay”. In the next example, we’ll see that sometimes
we get an equation that tells us that 0 equals something other than 0. That does indicate a
problem, of sorts. But the assertion that 0 equals 0 is just a statement of fact that doesn’t bother us.
But then, how do we proceed if there’s no equation to tell us about the value of z? Well, we don’t.
That is, we’re pretty much done. The fact that there’s no equation telling us about the value of z
tells us that z could have any value. That is, we can have z = t for any value t ∈ ℜ. And the first
equation, telling us about the value of x, tells us about the value of x relative to z. Likewise, the
second equation is telling us about the value of y relative to z. Let’s look at those two equations,
and re-express them so that the LHS has only the variable we want the equation to be telling us
about:
x − z = 5 ⇒ x = z + 5
y + 2z = 0 ⇒ y = −2z
So if z is equal to some number t, we need x = t + 5 and y = −2t. That is, (x, y, z) = (t + 5, −2t, t)
is a solution for any value t ∈ ℜ. That is, there are infinitely many different solutions to this last
SLE, and they all have this same form (for different values of t). Therefore the SLE we started out
with,
x + y + z = 5
2x + y = 10
y + 2z = 0
(which is of course equivalent to the one we ended up with, since we only transformed the SLE using
elementary operations), has the one-parameter family of solutions (x, y, z) = (t + 5, −2t, t) for any
t ∈ ℜ.
Solution:
This time, the SLE we’re given is not in standard form. Whenever we’re given an SLE that isn’t in
standard form, the first thing we do, always, is to put the system into standard form. So we start
by rearranging each of the equations into (in this case, since we have 2 variables, x and y) the form
ax + by = c. For the first equation, we need to add 2y to each side of the equation, i.e. move the 2y
from the RHS to the LHS. We get the equation 2x + 2y = 2. For the second equation, we add 3x to
both sides, to move the 3x from the RHS to the LHS, and we also need to add 6 to both sides, i.e.
move the constant to the RHS. We get 3x + 3y = 6. This gives the SLE as
2x + 2y = 2
3x + 3y = 6
Notice: What we did here, so far, didn’t involve elementary operations. But we weren’t transforming
the SLE into an equivalent SLE, we were simply re-writing the given equations to get to standard
form. Moving things from one side of an equation to another, by adding or subtracting the same
thing on both sides of the equation, doesn’t change the equation in any fundamental way. It’s still
the same equation, it just looks a bit different.
You may be able to tell already, from the standard form SLE, what the conclusion is going to be.
But let’s see what we can do with elementary operations to make it more obvious. (Besides, it’s good
practice.) We’ll follow the same basic procedure as before, making the first equation tell us about
the value of x, and eliminating x from the other equation. First, in order to have the first equation tell
us about x, we want it to start with just x, rather than 2x. The easiest way to accomplish that this
time is to divide through by 2, i.e. to multiply the first equation by the non-zero constant 1/2. This
is a type I elementary operation. The new version of the first equation is (1/2)(2x + 2y) = (1/2)(2),
i.e. x + y = 1, so the SLE becomes
x + y = 1
3x + 3y = 6
Next, we want to eliminate x from the second equation. Since the coefficient of x in the second
equation is 3, we want to subtract 3x. We can accomplish that by subtracting 3 times the first
equation. That is, we perform a type III elementary operation, replacing the second equation by
itself plus −3 times the first equation. When we do this we get:
(3x + 3y) + (−3)(x + y) = 6 + (−3)(1) ⇒ 0 = 3
Notice: Whenever we obtain an equation that says that 0 = c where c is any non-zero constant, that
contradictory equation is telling us that the system is inconsistent. Not just that this particular
equation has no solution, but more, that the whole system has no solution.
In these last 3 examples, we had examples of the only 3 things that can happen when we solve
a system of linear equations. The system can have a unique solution, like in Example 5.2. Or the
system can have no solution, as we saw in Example 5.4. The only other possibility is that the system
has infinitely many solutions, expressed as a parametric family of solutions. This was the situation
in Example 5.3, which we had previously encountered in Example 5.1, as well.
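The three possible outcomes can be detected mechanically using nothing but elementary operations. The sketch below is an added illustration, not a procedure from the text: the function name classify and the storage of each equation as a list of coefficients followed by its RHS value are assumptions. It eliminates one variable at a time, then looks for an equation of the form 0 = c with c non-zero, and otherwise counts how many variables ended up with an equation of their own.

```python
from fractions import Fraction


def classify(system):
    """Report whether a standard form SLE has a unique solution, no solution,
    or infinitely many solutions. Each row is [coefficients..., RHS]."""
    rows = [[Fraction(v) for v in row] for row in system]
    n_vars = len(rows[0]) - 1
    used = 0  # number of equations already assigned to a variable
    for col in range(n_vars):
        if used == len(rows):
            break
        # Find an unused equation in which this variable actually appears.
        src = next((r for r in range(used, len(rows)) if rows[r][col] != 0), None)
        if src is None:
            continue  # no equation is left to tell us about this variable
        rows[used], rows[src] = rows[src], rows[used]            # type II
        lead = rows[used][col]
        rows[used] = [v / lead for v in rows[used]]              # type I
        for r in range(len(rows)):
            if r != used and rows[r][col] != 0:
                c = rows[r][col]
                rows[r] = [a - c * b for a, b in zip(rows[r], rows[used])]  # type III
        used += 1
    # An equation 0 = c with c non-zero means the system is inconsistent.
    for row in rows:
        if all(v == 0 for v in row[:-1]) and row[-1] != 0:
            return "no solution"
    return "unique solution" if used == n_vars else "infinitely many solutions"


print(classify([[1, 2, 4], [1, -1, 1]]))                       # Example 5.2
print(classify([[1, 1, 1, 5], [2, 1, 0, 10], [0, 1, 2, 0]]))  # Example 5.3
print(classify([[2, 2, 2], [3, 3, 6]]))                        # Example 5.4, standard form
```

The three calls report a unique solution, infinitely many solutions and no solution respectively, matching the conclusions reached by hand.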
Math 1229A/B
Unit 6:
Row Reduction
(text reference: Section 2.3)
© V. Olds 2010
6 Row Reduction
Next, we learn a method known as row-reduction for solving SLE’s. This method works in basically
the same way as the method we used in the previous unit, but we will use a structure called
a matrix to eliminate the repetitive writing down of symbols which never change from one step to
the next. This helps us to organize what we’re doing and develop a systematic procedure for getting
from the system which needs to be solved to an equivalent system in which the solution(s), or the
fact that there is no solution, is obvious. First, we must define what this mathematical structure
called a matrix is.
The horizontal lines of numbers are called rows and the vertical lines of numbers are
called columns. The number, aij , in row i and column j is called the (i,j)-entry of the
matrix. A matrix with m rows and n columns is called an m × n matrix (pronounced
“m by n”).
We will learn more about these things called matrices later in the course. For now, we want to
learn about one particular use of a matrix, to use it as shorthand to represent a system of linear
equations.
That is, row i of the coefficient matrix contains, in order, the coefficients from equation i. Likewise,
column j of the coefficient matrix contains the coefficients of the jth variable in the order in which
they appear in the equations. When we form the augmented matrix, by also including the column of
RHS values, the matrix looks just like the SLE, with all of the variables omitted, and with a vertical
line instead of all the equal signs.
Example 6.1. Write the coefficient matrix and the augmented matrix for the following SLE:
x − 4y + 3z = 5
−x + 3y − z = −3
2x − 4z = 6
Solution:
Since the system is already in standard form, we can obtain the coefficient matrix by simply writing
the coefficients from the system in the same order in which they appear in the SLE. For instance,
we get the first row of the coefficient matrix by extracting the coefficients of x, y and z, i.e. 1, −4
and 3, from the first equation. Note: The (3, 2)-entry is 0, recognizing that in the third equation,
there is a 0 coefficient making the y-term invisible. We get the Coefficient Matrix:
1 −4 3
−1 3 −1
2 0 −4
For the augmented matrix, we write the columns from the coefficient matrix, then write a vertical
line, and add a new column containing the right hand side values of the equations. In this case, the
Augmented Matrix is:
1 −4 3 | 5
−1 3 −1 | −3
2 0 −4 | 6
Notice: The augmented matrix for the SLE contains all of the numbers which appear in the system;
we’re simply omitting all the non-numeric objects (i.e. variables and equal signs).
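Building the augmented matrix from the coefficient matrix is a purely mechanical step. The following added sketch stores a matrix as a list of rows; the function names augment and show are hypothetical. It reproduces the augmented matrix of Example 6.1 and prints it with a vertical bar before the column of RHS values.

```python
def augment(coefficients, rhs):
    """Append each equation's RHS value to its row of coefficients."""
    if len(coefficients) != len(rhs):
        raise ValueError("need exactly one RHS value per equation")
    return [row + [b] for row, b in zip(coefficients, rhs)]


def show(aug):
    """Format an augmented matrix with a bar before the RHS column."""
    lines = []
    for row in aug:
        left = " ".join(f"{v:>3}" for v in row[:-1])
        lines.append(f"{left} | {row[-1]:>3}")
    return "\n".join(lines)


# Coefficient matrix and RHS values from Example 6.1.
A = [[1, -4, 3],
     [-1, 3, -1],
     [2, 0, -4]]   # the 0 is the invisible y-coefficient in the third equation
b = [5, -3, 6]

M = augment(A, b)
print(show(M))

# The (3, 2)-entry is in row 3, column 2; Python counts from 0.
print(M[2][1])  # 0
```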
Example 6.2. Write the augmented matrix for the system of linear equations:
x1 = 1 + x2 − 3x3
x2 = 2 − x1 + x4
x3 = x2 + x4 − x1
Solution:
Remember: Our definition of the coefficient matrix and the augmented matrix requires that the
system be in standard form. So the first thing we have to do is rearrange each equation to get the
standard form SLE. We get:
x1 − x2 + 3x3 = 1
x1 + x2 − x4 = 2
x1 − x2 + x3 − x4 = 0
Now, we just have to leave out all the x’s, +’s, −’s and =’s, while filling in the invisible 0’s and
±1’s. That is, we write the coefficients as we see (and don’t see) them, and write the RHS’s, with
a vertical line separating the column of RHS values from the coefficients. We get the augmented
matrix:
1 −1 3 0 | 1
1 1 0 −1 | 2
1 −1 1 −1 | 0
Example 6.3. If the augmented matrix for a particular system of linear equations is
1 2 | 3
4 5 | 6
write the SLE.
Solution:
The augmented matrix tells us everything we need to know about the SLE except what the variables
are called. Then again, it doesn’t matter much what names we use for the variables. We can tell
from the number of columns in the coefficient matrix part of the augmented matrix that there are
2 variables in the SLE, so let’s use x and y. And since there are 2 rows in the augmented matrix,
the SLE must have 2 equations. We just have to attach the coefficients to the variables for each
equation, replace the vertical line with equal signs, and use the RHS values from the extra column.
We get:
1 2 | 3
4 5 | 6
corresponds to the SLE
x + 2y = 3
4x + 5y = 6
Looking at these examples we see that it is easy to translate a standard form system of linear equations
to its augmented matrix, and to translate from the augmented matrix back to the system.
We are going to use these augmented matrices to transform from one SLE to an equivalent SLE,
using operations corresponding to the elementary operations we have already learnt, until we get to
an augmented matrix which corresponds to a SLE in which it is easy to find the solution(s) or to
recognize that the system is inconsistent. We will learn new operations to perform on the augmented
matrices, as well as a procedure for deciding which operations to apply, in what order. But before we
do that, there is another very important consideration ... How do we know when to stop performing
these operations? That is, how do we recognize that we have obtained “an augmented matrix which
corresponds to a SLE in which it is easy to find the solution(s) or to recognize that the system is
inconsistent”? We look for something called row-reduced echelon form.
(i) Every row of A which contains at least one non-zero entry has a 1 as its first (from
left to right) non-zero entry. By convention, we refer to this 1 as the leading 1 of
the row in which it occurs.
(ii) Each column of A which contains a leading 1 for some row contains no other non-
zero entries (i.e. all other entries are 0’s).
(iii) In any two rows of A which each contain some non-zero entries, the leading 1 from
the lower row must occur farther to the right than the leading 1 from the upper
row. That is, the leading ones in the matrix move from left to right as we read
down the matrix.
(iv) All rows of A which consist entirely of zeros are placed at the “bottom” of the
matrix, i.e. are lower in the matrix than all rows which contain some non-zero
entries.
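The four conditions can be tested one at a time. The sketch below is an added illustration which stores a matrix as a list of rows; the function names leading_index and is_rref are hypothetical, and the small test matrices are made up for the purpose. Each check is labelled with the condition it corresponds to.

```python
def leading_index(row):
    """Column index of the first non-zero entry, or None for an all-zero row."""
    for j, v in enumerate(row):
        if v != 0:
            return j
    return None


def is_rref(matrix):
    """True when the matrix satisfies conditions (i) to (iv)."""
    leads = [leading_index(row) for row in matrix]

    # (iv) Once an all-zero row appears, every later row must be all-zero too.
    seen_zero_row = False
    for lead in leads:
        if lead is None:
            seen_zero_row = True
        elif seen_zero_row:
            return False

    nonzero = [(i, j) for i, j in enumerate(leads) if j is not None]

    # (i) The first non-zero entry of each non-zero row is a 1.
    if any(matrix[i][j] != 1 for i, j in nonzero):
        return False

    # (ii) A column holding a leading 1 has 0's everywhere else.
    for i, j in nonzero:
        if any(row[j] != 0 for k, row in enumerate(matrix) if k != i):
            return False

    # (iii) Leading 1's move strictly to the right as we read down.
    cols = [j for _, j in nonzero]
    return all(a < b for a, b in zip(cols, cols[1:]))


print(is_rref([[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 0]]))  # True
print(is_rref([[2, 0], [0, 1]]))   # False: violates (i)
print(is_rref([[1, 2], [0, 1]]))   # False: violates (ii)
print(is_rref([[0, 1], [1, 0]]))   # False: violates (iii)
print(is_rref([[0, 0], [1, 0]]))   # False: violates (iv)
```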
than the leading 1 in the preceding row. Therefore condition (iii) is satisfied. Finally, looking at
the matrix we see that there are no rows which consist entirely of zeros, which means that where
such rows should come in the matrix is irrelevant. We only need to be concerned with condition
(iv) when there are some rows like that. If there aren’t any, then the condition is not violated, so
it is satisfied. Therefore all of the conditions required for row-reduced echelon form are satisfied
by this matrix, so the matrix is in RREF. Notice that column 4 does not contain any row’s leading
1, so the requirements of RREF say nothing about what the column 4 entries may or may not be.
Solution:
(a) We have the matrix
1 0 2 2 3
0 1 3 1 2
0 0 0 0 0
Looking at this matrix, we see that in each row that has any non-zero entries, the first non-zero
entry is a 1. That is, row 1 has a leading 1 (in column 1) and row 2 also has a leading 1 (in column
2). Row 3 doesn’t have any non-zero entries, so it doesn’t need (and cannot have) a leading 1.
Therefore condition (i) of the requirements for RREF is satisfied. Also, looking at columns 1 and 2,
which are the only columns which contain any row’s leading 1, we see that all of the other entries are
0. That is, the column which contains the leading 1 for row 1 does not contain any other non-zero
entries, and the same is true for the column which contains the leading 1 for row 2. Since these
are the only columns which contain leading 1’s for some row, these are the only columns to which
condition (ii) of the requirements for RREF pertains. Therefore this condition is satisfied. And we
see that row 2’s leading 1 is further to the right than row 1’s, above it, so condition (iii) of the
requirements for RREF is satisfied, too. Finally, row 3 contains only 0’s and is at the bottom of the
matrix, further down than all the rows which contain some non-zero entries. Thus condition (iv)
of the requirements for RREF is also satisfied. Since all of the conditions are satisfied, this matrix
does fulfill the requirements and therefore is in row-reduced echelon form.
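The four checks we just made by eye can also be sketched in code. The following is a minimal Python illustration, not part of the course material: the names leading_index and is_rref are our own, and a matrix is assumed to be stored as a list of rows. It tests conditions (i) to (iv) in the way they were used in the discussion above.

```python
def leading_index(row):
    """Return the column index of the first non-zero entry, or None for a zero row."""
    for j, entry in enumerate(row):
        if entry != 0:
            return j
    return None

def is_rref(matrix):
    """Check conditions (i)-(iv) for row-reduced echelon form."""
    previous_lead = -1
    seen_zero_row = False
    for i, row in enumerate(matrix):
        lead = leading_index(row)
        if lead is None:
            seen_zero_row = True      # zero rows are allowed, but only at the bottom
            continue
        if seen_zero_row:
            return False              # (iv) a non-zero row lies below a zero row
        if row[lead] != 1:
            return False              # (i) the first non-zero entry must be a 1
        if any(matrix[k][lead] != 0 for k in range(len(matrix)) if k != i):
            return False              # (ii) the rest of that column must be 0
        if lead <= previous_lead:
            return False              # (iii) leading 1's must move to the right
        previous_lead = lead
    return True

example_a = [[1, 0, 2, 2, 3],
             [0, 1, 3, 1, 2],
             [0, 0, 0, 0, 0]]
swapped = [[1, 0, 0],
           [0, 0, 1],
           [0, 1, 0]]                 # leading 1's do not move left to right

print(is_rref(example_a))  # True
print(is_rref(swapped))    # False
```

The first matrix is the one from part (a); the second has its leading 1's out of order, so condition (iii) fails.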
the matrix. The row whose leading 1 is in column 3 would have to be lower down in the matrix than
the row whose leading 1 is in column 2 in order for condition (iii) to be satisfied. (Unless, of course,
there wasn’t any row whose leading 1 was in column 2. But that’s not the case here.) Therefore
this matrix also is not in RREF.
Suppose that we have a system of linear equations, and that we also have a RREF augmented
matrix that we know corresponds to a system which is equivalent to that SLE. How do we use it to
find the solution(s) to the SLE? We simply use the RREF matrix to write the corresponding SLE so
that we can see the solution(s) to that SLE, and since this SLE is equivalent to the original SLE, we
have also found the solution(s) to the original SLE. For instance, suppose that we have the system
x + y = 3
x − y = 1
so, because we know that this system is equivalent to, and thus has the same solutions as, the
original system, we see that (x, y) = (2, 1) is the only solution to the system
x + y = 3
x − y = 1
Example 6.5. For each of the following, find all solutions to a system of linear equations which is
equivalent to the SLE corresponding to the given RREF augmented matrix, where the variables are
as stated.
(a) [ 1 0 0 | 1 ]
    [ 0 1 0 | 2 ]    with variables x, y and z.
    [ 0 0 1 | 0 ]
(b) [ 1 0 2 2 | 3 ]
    [ 0 1 3 1 | 2 ]    with variables x1, x2, x3 and x4.
    [ 0 0 0 0 | 0 ]
(c) [ 1 0 0  2 | 5 ]
    [ 0 1 0  0 | 3 ]
    [ 0 0 1 −3 | 2 ]    with variables w, x, y and z.
    [ 0 0 0  0 | 1 ]
Solution:
For each of the given RREF matrices, we write the corresponding SLE, using the variables given,
and state all solutions to that SLE.
(a) We can easily confirm that this matrix is in RREF. We see that
[ 1 0 0 | 1 ]                    x = 1
[ 0 1 0 | 2 ]   corresponds to   y = 2
[ 0 0 1 | 0 ]                    z = 0
so the only solution to any SLE which is equivalent to this SLE is (x, y, z) = (1, 2, 0).
(b) Notice that this is the augmented matrix which we have already confirmed, in Example 6.4(a),
is in RREF. This time we have
[ 1 0 2 2 | 3 ]                          x1 + 2x3 + 2x4 = 3
[ 0 1 3 1 | 2 ]   which corresponds to   x2 + 3x3 + x4 = 2
[ 0 0 0 0 | 0 ]                          0x1 + 0x2 + 0x3 + 0x4 = 0
We see that the first equation is telling us about the value of x1 (relative to x3 and x4 ) and the
second equation is telling us about the value of x2 (relative to x3 and x4 ). The third equation simply
states that 0 = 0, which tells us nothing (except that there isn’t a problem). That is, the original
system must have had 3 equations, but only the first 2 ended up giving us useful information, and
the third was consistent with those 2, but contained no new information. There is no equation
telling us about the value of x3 , so it must be free to have any real value. Likewise, with no equation
telling us about the value of x4 , this variable is also free to have any real value. Since each of these
variables could have any value, we use a different parameter for each. That is, we can have x3 = s
for any s ∈ ℜ and also x4 = t for any t ∈ ℜ. Using these values in the first and second equations,
and rearranging to state the corresponding values of x1 and x2 we get
x1 + 2s + 2t = 3 ⇒ x1 = 3 − 2s − 2t
x2 + 3s + t = 2 ⇒ x2 = 2 − 3s − t
Therefore any SLE which is equivalent to this SLE has the two-parameter family of solutions
(x1 , x2 , x3 , x4 ) = (3 − 2s − 2t, 2 − 3s − t, s, t).
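A quick way to gain confidence in a parametric answer like this one is to substitute it back for a handful of parameter values. Here is a small Python sketch of that check. The function names family and satisfies_system are our own, chosen only for this illustration, and testing a few integer values of s and t is a spot check, not a proof.

```python
def satisfies_system(x1, x2, x3, x4):
    """Check the three equations read off the RREF matrix in part (b)."""
    return (x1 + 2*x3 + 2*x4 == 3 and
            x2 + 3*x3 + x4 == 2 and
            0*x1 + 0*x2 + 0*x3 + 0*x4 == 0)

def family(s, t):
    """The solution (x1, x2, x3, x4) for parameter values s and t."""
    return (3 - 2*s - 2*t, 2 - 3*s - t, s, t)

# Try every pair (s, t) with s and t between -2 and 2.
all_ok = all(satisfies_system(*family(s, t))
             for s in range(-2, 3) for t in range(-2, 3))

print(all_ok)          # True
print(family(0, 0))    # (3, 2, 0, 0)
print(family(1, -1))   # (3, 0, 1, -1)
```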
(c) Checking for leading 1’s, and that their columns are otherwise empty, and seeing that the leading
1’s move from left to right as we read down the matrix (and observing that there is no row which
contains only 0’s), we see that this matrix is, as stated, in RREF. We write the corresponding SLE:
[ 1 0 0  2 | 5 ]                    w + 2z = 5
[ 0 1 0  0 | 3 ]                    x = 3
[ 0 0 1 −3 | 2 ]   corresponds to   y − 3z = 2
[ 0 0 0  0 | 1 ]                    0w + 0x + 0y + 0z = 1
In this system, it doesn’t matter what the first 3 equations say, because the fourth equation says
that 0 = 1. The existence of 4 equations tells us that the original SLE, whatever it was, had 4
equations. But the fourth equation has been transformed into an equation which is nonsense, i.e.
cannot be true, no matter what the values of w, x, y and z are. So the SLE corresponding to the
RREF augmented matrix has no solution, and therefore any SLE equivalent to this SLE must be
inconsistent.
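The three outcomes seen in this example (one solution, a family of solutions, no solution) can be read off an RREF augmented matrix mechanically. The Python sketch below illustrates this. The function name classify and the wording of its return values are our own assumptions, and the function assumes that the matrix it is given is already in RREF with the RHS values in the last column.

```python
def classify(aug, names):
    """Describe the solutions of the SLE for an augmented matrix already in RREF."""
    n = len(names)                    # one coefficient column per variable
    lead_cols = []
    for row in aug:
        coeffs, rhs = row[:n], row[n]
        if all(c == 0 for c in coeffs):
            if rhs != 0:
                return "no solution"  # this row says 0 = (something non-zero)
            continue                  # this row says 0 = 0
        # column of this row's leading 1
        lead_cols.append(next(j for j, c in enumerate(coeffs) if c != 0))
    free = [names[j] for j in range(n) if j not in lead_cols]
    if free:
        return "infinitely many solutions; free variables: " + ", ".join(free)
    return "unique solution"

part_a = [[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 0]]
part_b = [[1, 0, 2, 2, 3], [0, 1, 3, 1, 2], [0, 0, 0, 0, 0]]
part_c = [[1, 0, 0, 2, 5], [0, 1, 0, 0, 3], [0, 0, 1, -3, 2], [0, 0, 0, 0, 1]]

print(classify(part_a, ["x", "y", "z"]))            # unique solution
print(classify(part_b, ["x1", "x2", "x3", "x4"]))   # infinitely many solutions; free variables: x3, x4
print(classify(part_c, ["w", "x", "y", "z"]))       # no solution
```

The variables reported as free are exactly the ones with no leading 1 in their column, which is why each of them gets its own parameter.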
At this point, we have learnt how to write the augmented matrix for any SLE in standard form,
and how to recognize a matrix that is in RREF, as well as how to find the solution(s), if any, to
the SLE corresponding to an augmented matrix which is in RREF. But we don’t know yet how to
transform the augmented matrix for a SLE into a matrix in RREF for an equivalent system. That’s
what we learn next.
Manipulating the augmented matrix for a system of linear equations corresponds to, i.e. is
equivalent to, manipulating the equations in the system. In the previous unit, we learnt how to
manipulate equations using the three elementary operations, in order to solve a SLE. What we need
to learn now is how to perform operations on an augmented matrix which correspond to those ele-
mentary operations we perform on equations. But what we are going to learn applies more broadly
than just to augmented matrices corresponding to SLE’s. The operations we are going to learn can
be applied to any kind of matrix.
In an augmented matrix representing a SLE, each row of the matrix corresponds to a different
equation in the system. Therefore the kind of operations we perform on equations in a system are
performed on rows of a matrix. There are 3 elementary operations which we are allowed to perform
on equations, and so there are three corresponding elementary row operations which we can perform
on a matrix.
Definition: The following operations are the elementary row operations (abbrevi-
ated ero’s) which can be performed to transform a matrix into RREF:
I Multiply any row of the matrix by any non-zero scalar.
II Interchange the positions of any two rows in the matrix.
III Replace any row in the matrix by the sum of that row and a scalar multiple of any
other row of the matrix.
No other operations are allowed.
Also, we say that two matrices are row equivalent if one matrix can be transformed
into the other by applying a sequence of elementary row operations.
Notice: When we ‘multiply a row by a scalar’, or ‘sum one row and a scalar multiple of another
row’, we perform these arithmetic operations by treating a row of a matrix like a vector, using the
vector operations we have learnt previously. That is, we multiply a row of a matrix by a scalar
by multiplying each entry in the row by that scalar. Likewise, adding 2 rows of a matrix means
summing corresponding entries (components) in the rows.
Also Notice: Compare the definitions of the elementary row operations, stated here, to the definition
of the elementary operations on equations used to transform an SLE into an equivalent SLE, stated
on page 64. You will see that they correspond exactly, and so when we perform these elementary
row operations on an augmented matrix, it is just like performing the corresponding elementary
operations on the equations in the SLE represented by that augmented matrix. And therefore we
have a theorem for augmented matrices similar to Theorem 5.1 (see page 64) which told us that when
we perform elementary operations to transform a SLE, the new SLE is always equivalent to the one
we started with.
Theorem 6.1. Applying any sequence of elementary row operations to the augmented matrix for a
SLE produces a new augmented matrix whose corresponding SLE has exactly the same solution(s)
as (i.e. is equivalent to) the original SLE.
Using the definition of row equivalent, another way to state this theorem is:
If 2 augmented matrices are row equivalent, then their corresponding SLE’s have the
same solution(s).
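To see what the theorem is saying on a small case, take the system x + y = 3, x − y = 1, whose only solution is (x, y) = (2, 1). The Python sketch below applies three ero's to its augmented matrix and checks after each one that (2, 1) still satisfies every row. This is only an illustration under our own assumptions: the helper name satisfies and the particular sequence of ero's are chosen for this sketch, and checking that one solution survives is of course weaker than the full statement of the theorem.

```python
from fractions import Fraction

def satisfies(aug, solution):
    """True if `solution` satisfies every equation (row) of the augmented matrix."""
    return all(sum(c * v for c, v in zip(row[:-1], solution)) == row[-1]
               for row in aug)

aug = [[Fraction(1), Fraction(1), Fraction(3)],    # x + y = 3
       [Fraction(1), Fraction(-1), Fraction(1)]]   # x - y = 1
solution = (2, 1)
checks = [satisfies(aug, solution)]

# R2 <- R2 + (-1)R1
aug[1] = [a - b for a, b in zip(aug[1], aug[0])]
checks.append(satisfies(aug, solution))

# R2 <- (-1/2)R2
aug[1] = [Fraction(-1, 2) * a for a in aug[1]]
checks.append(satisfies(aug, solution))

# R1 <- R1 + (-1)R2
aug[0] = [a - b for a, b in zip(aug[0], aug[1])]
checks.append(satisfies(aug, solution))

print(checks)                           # [True, True, True, True]
print(aug == [[1, 0, 2], [0, 1, 1]])    # True
```

The final matrix corresponds to x = 2, y = 1, so the solution can simply be read off.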
Before we think about applying elementary row operations to an augmented matrix, to find the
solution(s) to a SLE, let’s look at some examples of row equivalent matrices, and practice using
these ero’s more generally to row-reduce any matrix (to RREF). For instance, the matrices
[  1 −4  3 ]         [ −2  8 −6 ]
[ −1  3 −1 ]   and   [ −1  3 −1 ]
[  2  0 −4 ]         [  2  0 −4 ]
are row equivalent since multiplying row 1 of the first matrix by −2 (a type I ero) transforms the
first matrix into the second matrix.
There is shorthand which we can use to indicate what ero we are performing. If we transform
a matrix by multiplying a row of the matrix by a scalar, i.e. if we replace Row i of the matrix
by c times that row, for some non-zero scalar c, we indicate this by writing Ri ← cRi . (This says
calculate c times Row i and put it where Row i was before. That is, Row i is replaced by c times
Row i.) Likewise, when we interchange two rows of a matrix, i.e. put Row i where Row j was, and
vice versa, we write Ri ↔ Rj . And for a type III ero, in which we add a scalar multiple of another
row to a row, i.e. Row i is replaced by Row i plus c times Row j, we write Ri ← Ri + cRj . So for
the three transformations we observed above, we can write
[  1 −4  3 ]   R1 ← (−2)R1   [ −2  8 −6 ]
[ −1  3 −1 ]   −−−−−−−−−−→   [ −1  3 −1 ]
[  2  0 −4 ]                 [  2  0 −4 ]
and
[  1 −4  3 ]   R1 ↔ R2       [ −1  3 −1 ]
[ −1  3 −1 ]   −−−−−−−−−−→   [  1 −4  3 ]
[  2  0 −4 ]                 [  2  0 −4 ]
and
[  1 −4  3 ]   R1 ← R1 + 2R2   [ −1  2  1 ]
[ −1  3 −1 ]   −−−−−−−−−−−−→   [ −1  3 −1 ]
[  2  0 −4 ]                   [  2  0 −4 ]
The long arrow says “transform the matrix to a row equivalent matrix by performing the ero indi-
cated”.
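The three types of ero, and the shorthand for them, translate directly into code. Below is a minimal Python sketch. The function names scale, swap and add_multiple are our own, matrices are assumed to be stored as lists of rows, and rows are numbered from 1 to match the Ri notation. Each function returns a new matrix and leaves the original unchanged.

```python
def scale(m, i, c):
    """Type I ero: Ri <- c*Ri, for a non-zero scalar c."""
    if c == 0:
        raise ValueError("the scalar must be non-zero")
    return [[c * x for x in row] if k == i - 1 else list(row)
            for k, row in enumerate(m)]

def swap(m, i, j):
    """Type II ero: Ri <-> Rj."""
    out = [list(row) for row in m]
    out[i - 1], out[j - 1] = out[j - 1], out[i - 1]
    return out

def add_multiple(m, i, c, j):
    """Type III ero: Ri <- Ri + c*Rj, where Rj is a different row."""
    if i == j:
        raise ValueError("a type III ero uses a multiple of another row")
    out = [list(row) for row in m]
    out[i - 1] = [x + c * y for x, y in zip(m[i - 1], m[j - 1])]
    return out

A = [[1, -4, 3],
     [-1, 3, -1],
     [2, 0, -4]]

print(scale(A, 1, -2))           # [[-2, 8, -6], [-1, 3, -1], [2, 0, -4]]
print(swap(A, 1, 2))             # [[-1, 3, -1], [1, -4, 3], [2, 0, -4]]
print(add_multiple(A, 1, 2, 2))  # [[-1, 2, 1], [-1, 3, -1], [2, 0, -4]]
```

The three printed matrices are the results of R1 ← (−2)R1, R1 ↔ R2 and R1 ← R1 + 2R2 applied to the same starting matrix.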
Example 6.6. For each of the matrices in Example 6.4 which is not already in RREF, perform ele-
mentary row operations to transform the matrix into a row equivalent matrix which is in RREF.
Solution:
(a) As we observed in Example 6.4(a), the matrix
[ 1 0 2 2 3 ]
[ 0 1 3 1 2 ]
[ 0 0 0 0 0 ]
is already in RREF.
(b) In Example 6.4(b), we observed that the augmented matrix
[ 1 1 0 | 0 ]
[ 0 1 0 | 0 ]
[ 0 0 1 | 0 ]
is not in RREF
because column 2, which contains the leading 1 for row 2, has a non-zero entry in row 1. That is,
the problem here is that the (1, 2)-entry of the matrix must be 0 in order to have RREF. Notice
that row 2’s leading 1 in column 2 means that we can eliminate the non-zero entry in this column,
in row 1, by adding the negative of the existing non-zero (1, 2)-entry times row 2 to row 1. That
is, we can use a type III ero in which we add a scalar multiple of row 2 to row 1, to get rid of the
non-zero (1, 2)-entry. And the scalar we need is just the negative of the entry which we are trying
to transform into 0. Also notice that row 2 has a 0 in column 1, so adding a multiple of row 2 to
row 1 will not have any effect on the (1, 1)-entry, which means that after we do this, row 1 will still
have a leading 1. (It’s important that when we perform ero’s we don’t “mess up” the things we
already have which do comply with RREF, so that we don’t create more work for ourselves.) When
we perform the ero R1 ← R1 + (−1)R2 , i.e. R1 ← R1 − R2 , we get the new version of Row 1 by
performing this arithmetic on the vectors which look like those rows of the matrix. (Note: It doesn’t
matter that the matrix happens to be an augmented matrix. We just ignore the | when we do the
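arithmetic.) So the new row 1 corresponds to (1, 1, 0, 0) − (0, 1, 0, 0) = (1, 0, 0, 0). Rows 2 and 3 are
unchanged, and the resulting matrix is in RREF.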
(c) In Example 6.4(c), we found that the matrix
[ 1 0 0 ]
[ 0 0 1 ]
[ 0 1 0 ]
is not in RREF because although
each row does have a leading 1, and the columns containing these leading 1’s each have no other
non-zero entries, the leading 1’s do not follow the required pattern of moving from left to right as
we read down the matrix. The leading 1 in row 3 is not farther to the right than the leading 1 in
row 2. We can easily fix this by simply switching the positions of those two rows. That is, we need
to perform a type II ero to interchange the positions of Rows 2 and 3. We get:
[ 1 0 0 ]   R2 ↔ R3   [ 1 0 0 ]
[ 0 0 1 ]   −−−−−−→   [ 0 1 0 ]
[ 0 1 0 ]             [ 0 0 1 ]
We see that the transformed matrix is in RREF, so we’re done.
(d) Finally, in part (d) of Example 6.4, we observed that the matrix
1 0 0
0 2 0
0 0 1
is not in RREF because row 2 does not have a leading 1. The first non-zero entry in row 2 is a 2, not
a 1. We can always transform a “leading c”, for some value c which is neither 0 nor 1, into a leading
1 by multiplying it (i.e. by multiplying the row) by 1/c. So all we need to do here is to perform the
type I ero: multiply Row 2 by the non-zero scalar 1/2. We get
[ 1 0 0 ]   R2 ← (1/2)R2   [ 1 0 0 ]
[ 0 2 0 ]   −−−−−−−−−−−→   [ 0 1 0 ]
[ 0 0 1 ]                  [ 0 0 1 ]
In this example, for each of the matrices which was not already in row-reduced echelon form, we
were able to, quite easily, transform the matrix into RREF using elementary row operations. But
those matrices were very close to RREF to start with. In each case, we needed only a single ero to
get to RREF. Usually, several or perhaps even many ero’s will be required to obtain RREF. One
natural question which may occur to you is “Is this always possible?”, that is, “Is it always possible to
transform a matrix into RREF using ero’s?” And when more than one ero is needed, assuming that
it is possible to obtain RREF, you might also wonder “Does it matter which ero’s we do, and does
the order we do them in matter?” Well, clearly it does matter which ero’s we do, because some will
take us to RREF and others won’t. But still, there will often be choices about which problem to
tackle next, or which type of ero to use to accomplish a particular goal in moving towards RREF. So
the question is, “Does it matter what choices we make?” The next theorem addresses these questions.
Theorem 6.2.
(a) Every matrix can be transformed into RREF by applying a finite sequence of elementary row
operations.
(b) The row-reduced echelon form of a matrix is unique. That is, if the same matrix is reduced by
two different sequences of ero’s, the RREF obtained in both cases will be identical.
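Part (b) can be seen at work on a small case. The Python sketch below reduces one matrix along two different routes and compares the results. The 2 × 3 matrix used here is a made-up illustration (it does not come from the examples in this unit), the helper names are our own, and rows are numbered from 0 in the code, so row 0 in the code is R1 in our notation.

```python
from fractions import Fraction

def scale(m, i, c):
    """Type I ero on row index i (counting from 0): multiply the row by c."""
    m[i] = [c * x for x in m[i]]

def swap(m, i, j):
    """Type II ero: interchange the rows with indices i and j."""
    m[i], m[j] = m[j], m[i]

def add(m, i, c, j):
    """Type III ero: add c times row index j to row index i."""
    m[i] = [x + c * y for x, y in zip(m[i], m[j])]

def start():
    """A fresh copy of the illustrative matrix, with exact fractions as entries."""
    return [[Fraction(2), Fraction(4), Fraction(6)],
            [Fraction(1), Fraction(3), Fraction(5)]]

# Route A: get the leading 1 in the top row by scaling.
a = start()
scale(a, 0, Fraction(1, 2))   # R1 <- (1/2)R1
add(a, 1, -1, 0)              # R2 <- R2 + (-1)R1
add(a, 0, -2, 1)              # R1 <- R1 + (-2)R2

# Route B: get the leading 1 in the top row by interchanging rows.
b = start()
swap(b, 0, 1)                 # R1 <-> R2
add(b, 1, -2, 0)              # R2 <- R2 + (-2)R1
scale(b, 1, Fraction(-1, 2))  # R2 <- (-1/2)R2
add(b, 0, -3, 1)              # R1 <- R1 + (-3)R2

print(a == b)                            # True
print(a == [[1, 0, -1], [0, 1, 2]])      # True
```

The two routes use different ero's, and a different number of them, but they arrive at the same RREF matrix.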
Although part (a) of Theorem 6.2 tells us that it is always possible to transform a matrix into
RREF by a finite sequence of ero’s, and part (b) tells us that any sequence of ero’s that results
in RREF will produce the same RREF, it is still possible to perform many ero’s on a matrix without
ever getting any closer to having the matrix in RREF. So it is very important to approach
the reduction of a matrix in a systematic way. We want to follow a procedure which ensures that
each ero performed is moving the matrix closer to being in RREF, so that we arrive at RREF quickly.
In order for a matrix to be in row-reduced echelon form, we need each row, if it has any non-zero
entries in it, to have a leading 1. Also, we need the column which has a row’s leading 1 in it to
contain 0’s in all the other rows. When we want to transform a matrix to RREF, we tackle the rows
of the matrix from the top down, and deal with those 2 requirements, in order, for the row we’re
currently dealing with. That is, starting with the top row, we make sure that we have a leading 1
in that row, and then make the rest of the entries in the column that contains that leading 1 be
0’s. Once we’ve accomplished that, we move on to the next row. As we go along, we move any rows
which contain only 0’s down to the bottom of the matrix, and also move rows around if necessary
to ensure that the leading 1 we’re currently getting, or working with, is the left-most of the entries
which don’t yet conform.
And we want to be sure that we never “undo” any of the work done in previous steps, which
can happen if we’re not careful. For instance, if we’ve obtained a leading 1 in a particular row, we
don’t want that 1 to be transformed into some other number later on. A leading 1 should always
stay a leading 1 as we continue to reduce the matrix. Likewise, if we’ve “zeroed out” a column, so
that the leading 1 it contains is the only non-zero entry in that column, we want to be sure that we
don’t re-introduce any non-zero entries into that column later on in the reduction process.
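The top-down approach just described can be sketched as a short Python function. This is an illustration under our own assumptions, not a transcription of the procedure used in this course: the name rref is ours, exact fractions are used so that no rounding occurs, and the sketch always picks the first available row with a non-zero entry, whereas by hand we would prefer a row which already has a 1 in the right place.

```python
from fractions import Fraction

def rref(matrix):
    """Return the row-reduced echelon form, working through the rows from the top down."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rows = len(m)
    cols = len(m[0]) if m else 0
    r = 0                                  # the row which needs its leading 1 next
    for c in range(cols):                  # scan the columns from the left
        # Look for a row, at or below row r, with a non-zero entry in column c.
        pivot = next((k for k in range(r, rows) if m[k][c] != 0), None)
        if pivot is None:
            continue                       # no leading 1 can go in this column
        m[r], m[pivot] = m[pivot], m[r]    # type II: bring that row up
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]    # type I: turn the leading entry into a 1
        for k in range(rows):              # type III: zero out the rest of column c
            if k != r and m[k][c] != 0:
                factor = -m[k][c]          # the negative of the entry to be eliminated
                m[k] = [x + factor * y for x, y in zip(m[k], m[r])]
        r += 1
        if r == rows:
            break
    return m

result = rref([[0, 1, 1],
               [1, -1, 2],
               [1, 0, -3]])
print(result == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
```

Rows containing only 0's are never chosen to hold a leading 1, so they end up at the bottom of the matrix without any special handling.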
With these considerations in mind, we use the procedure shown below whenever we need to row-
reduce a matrix. It is strongly recommended that you become familiar with, and comfortable
with using, this procedure/algorithm, in order to save time and work in reaching RREF. While it
may, on rare occasions, be true that the reduction could have been completed slightly more quickly
by some other sequence of operations, far more often this procedure gives a highly efficient route to
obtaining the row-reduced echelon form of the matrix.
needed is −c, where c is the entry which is to become 0. For instance, if we have
just obtained a leading one in row i, and some row farther down in the matrix, row
j, has a 2 in the column containing row i’s leading one, perform the type III ero:
replace row j by row j plus (-2) times row i. Or for the matrix obtained at the end
of Note 1, the next step would be
[  1 −1/2 5/2 1 ]   R2 ← R2 + (−12)R1   [ 1 −1/2 5/2  1 ]
[ 12   10   3 7 ]   −−−−−−−−−−−−−−−−→   [ 0   16 −27 −5 ]
3. Because we always “zero out” the column containing one leading 1 before moving
on to obtain the next leading 1, then when we perform any subsequent type III
ero’s, it is always 0 which is being added to a previously-obtained leading 1, leaving
it as a 1. For instance, look at the first ero being done in the line marked (*) in
Example 6.7 below (on page 81). Prior to this ero, we have obtained a leading 1 in
row 1, and “zeroed out” its column. At this point, having obtained a leading 1 in
row 2, we need to perform a type III ero to obtain a 0 in the (1, 2)-entry. When we
do this, because of the 0 in the first entry of row 2, the leading 1 in row 1 remains
1. (If we had not already obtained a 0 in the (1, 2)-entry, this next ero would end
up with something other than a 1 as the leading entry in row 1.)
4. Similarly, since we always use “replace this row by this row plus a scalar multiple of
the row we’re currently working with”, we never multiply “this row” by a constant,
and so we never replace a leading 1 with anything other than 1. To see this, look
at the (**) line in Example 6.7 below (on page 82). For row 1, we need to eliminate
the 3 in column 3, using the 1 in row 3. We don’t replace row 1 by −(1/3)R1 + R3,
because row 1’s leading 1 would become −1/3, and we’d need to start over to obtain
a leading 1 in row 1. What we use is R1 ← R1 + (−3)R3. That way, the leading 1
in Row 1 isn’t changed.
5. Likewise, when we replace a row using a type III ero, since it is always the row
containing the leading 1 whose column we need to “zero out” that we are adding
scalar multiples of to other rows, and that row has 0’s in all of the columns previously
“zeroed out”, we never add anything other than 0 to a previously-obtained 0, and
so the columns we have previously “zeroed out” remain that way. Look again at
what’s being done to Row 1 in the (**) line in Example 6.7 – when we do the ero
R1 ← R1 + (−3)R3 , it’s −3 × 0 = 0 that we add to the 0 in Row 1.
6. Although a leading 1 exists in a column, and we work with that column (to “zero
it out”), it’s important to remember that a leading 1 belongs to a particular row,
not a column. That is, it is rows which must have leading 1’s, not columns. If some
column doesn’t contain any row’s leading 1, but does have non-zero entries, that
doesn’t matter.
7. Because this procedure moves us toward RREF in a systematic fashion, with each
ero taking us closer to RREF and carefully not undoing the work already accom-
plished by previous ero’s, using this procedure will never “take you around in cir-
cles”, which is definitely a danger when row-reduction is approached less systemat-
ically.
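The point made in Note 4 is easy to check by doing the arithmetic. The short Python sketch below uses row 1 and row 3 as they stand at the start of the (**) line of Example 6.7, namely (1, 0, 3) and (0, 0, 1). The variable names good and bad are our own labels for the two candidate replacements of row 1.

```python
from fractions import Fraction

r1 = [Fraction(1), Fraction(0), Fraction(3)]   # row 1 before the (**) step
r3 = [Fraction(0), Fraction(0), Fraction(1)]   # row 3, which holds the leading 1 in column 3

# R1 <- R1 + (-3)R3: a type III ero, which leaves row 1's leading 1 alone.
good = [a + (-3) * b for a, b in zip(r1, r3)]

# -(1/3)R1 + R3: this also eliminates the 3, but it rescales row 1 along the way.
bad = [Fraction(-1, 3) * a + b for a, b in zip(r1, r3)]

print([str(x) for x in good])  # ['1', '0', '0']
print([str(x) for x in bad])   # ['-1/3', '0', '0']
```

Both versions put a 0 in column 3, but only the first one keeps the leading 1 in row 1.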
entry, then we will certainly have a row whose leading 1 is in the first column. And that will have
to be row 1, since any leading 1 in a row further down would have to be farther to the right than
row 1’s leading 1. So we will want the (1, 1)-entry of the matrix to be the leading 1 for row 1. We
currently have a 0 in the (1, 1)-entry. The easiest way to get a 1 there is to put a row which already
has a 1 as its first entry into the top position. Since both rows 2 and 3 have 1’s as their first entries,
either would do. Let’s choose row 2. Then our first ero is the type II ero: interchange the positions
of rows 1 and 2. So we have:
[ 0  1  1 ]   R1 ↔ R2   [ 1 −1  2 ]
[ 1 −1  2 ]   −−−−−−→   [ 0  1  1 ]
[ 1  0 −3 ]             [ 1  0 −3 ]
Now we have a leading 1 in row 1. The next thing we want to do is eliminate the other non-zero
entries in the column containing that leading 1, i.e. column 1. Conveniently, row 2 already has a 0
in column 1, so the only non-zero we need to eliminate is the first entry in row 3. It is currently a 1,
and we need it to be a 0. We can accomplish that by subtracting the 1 that’s in the same position
in row 1. That is, we can obtain a 0 in the (3, 1)-entry of the matrix by subtracting row 1 from row
3. Therefore our next step is to perform the type III ero: replace row 3 by row 3 plus (−1) times
row 1. We get:
[ 1 −1  2 ]   R3 ← R3 + (−1)R1   [ 1 −1  2 ]
[ 0  1  1 ]   −−−−−−−−−−−−−−−→   [ 0  1  1 ]
[ 1  0 −3 ]                      [ 0  1 −5 ]
Now that we have a leading 1 in row 1 and its column contains no other non-zero entries, we turn
our attention to row 2. We need the first non-zero entry in row 2 to be a 1. Aha! It already is!
That’s convenient. So, no ero needed to accomplish that. We’ve (already) got our next leading 1.
And now we need to transform the matrix so that there are no other non-zero entries in the column
which contains row 2’s leading 1, i.e. in column 2. Currently both row 1 and row 3 have non-zero
entries in column 2, so we need to eliminate both of these. That will require 2 separate ero’s, but
both will have the form: replace row i by row i plus some multiple of row 2. What are the multiples
we need? The ones which will give the value 0 for the (i, 2)-entry after this operation. For row 1, we
want to be adding a 1 to the −1 that’s there now, so we can just add row 2. That is, the multiplier
is just 1 (the negative of the −1 which we need to transform to 0). And for row 3, we have a 1 there
(i.e. in the second column) now, so we need to be subtracting 1. Which means adding (−1) times
the row 2 entry. So the multiplier we need this time is −1 (again, the negative of the entry which is
to be transformed to 0). We perform these 2 ero’s, one after the other:
      [ 1 −1  2 ]   R1 ← R1 + R2   [ 1 0  3 ]   R3 ← R3 + (−1)R2   [ 1 0  3 ]
(∗)   [ 0  1  1 ]   −−−−−−−−−−−→   [ 0 1  1 ]   −−−−−−−−−−−−−−−→   [ 0 1  1 ]
      [ 0  1 −5 ]                  [ 0 1 −5 ]                      [ 0 0 −6 ]
So we have “dealt with” rows 1 and 2. That leaves row 3. We need the first non-zero entry in row 3
to be a 1, instead of the −6 that’s there now. We can obtain this by simply dividing row 3 by −6.
That is, we perform the type I ero: multiply row 3 by −1/6. We get:
[ 1 0  3 ]   R3 ← (−1/6)R3   [ 1 0 3 ]
[ 0 1  1 ]   −−−−−−−−−−−−→   [ 0 1 1 ]
[ 0 0 −6 ]                   [ 0 0 1 ]
And now that we’ve obtained another leading 1, we need to “zero-out” its column. That is, we
need all the other entries in column 3 (which contains row 3’s leading 1) to be 0. Currently in
column 3 we have a 3 in row 1 and a 1 in row 2. To transform each of these to 0, we will perform
two type III ero’s, adding a scalar multiple of row 3 to each row, separately. In each case,
the scalar multiplier we need is just the negative of the number that’s already there. That is, −3
is the multiplier for the ero we apply to row 1, and −1 is the multiplier for the ero we apply to row 2.
Notice that when row 1 is replaced by row 1 minus 3 times row 3, neither row 2 nor row 3 is affected
by this change. Likewise, when row 2 is replaced by row 2 minus one times row 3, neither row 1 nor
row 3 is affected by the change. Whenever we do these “clear out the column which contains the
leading 1 we just found” operations, only the row whose entry we’re “clearing out” changes. This
means that we can actually perform more than one of these operations at the same time. Well,
we do them one at a time, but we don’t need to re-write the whole matrix with each one. We can
just re-write the matrix once, incorporating both of the changes we need, at the same time. Or
if there were 3 lines that all needed their non-zero entries to become 0, we could write this as a
single step in which all 3 changes appear to take place simultaneously. We still want to say what
ero’s we’re doing, so if there are 2 of them, we write one of them above the arrow and one below.
When there are more than 2, we can write two (or even more) on the same line, separated by commas.
So we can write the transformations corresponding to the 2 ero’s we need to do to eliminate the
non-zeroes in the column containing row 3’s leading 1 as:
       [ 1 0 3 ]   R1 ← R1 − 3R3   [ 1 0 0 ]
(∗∗)   [ 0 1 1 ]   −−−−−−−−−−−−→   [ 0 1 0 ]
       [ 0 0 1 ]   R2 ← R2 − R3    [ 0 0 1 ]
Checking the properties we need, we see that this matrix is in RREF, so we’re done. Since we only
performed ero’s, each matrix is row equivalent to the one preceding it, and so this last matrix is row
equivalent to the first. That is, the RREF matrix we found is row equivalent to the one we were given.
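The whole reduction carried out in this example can be replayed step by step. In the Python sketch below, each ero is written as a tuple and applied in the same order as in the solution. The function name apply_ero and the tuple encoding are our own conventions for this illustration, and rows are numbered from 1 to match the Ri notation.

```python
from fractions import Fraction

def apply_ero(m, op):
    """Apply one ero, written as a tuple, to the matrix m in place."""
    kind = op[0]
    if kind == "swap":                    # ("swap", i, j) means Ri <-> Rj
        _, i, j = op
        m[i - 1], m[j - 1] = m[j - 1], m[i - 1]
    elif kind == "scale":                 # ("scale", i, c) means Ri <- c*Ri
        _, i, c = op
        m[i - 1] = [c * x for x in m[i - 1]]
    elif kind == "add":                   # ("add", i, c, j) means Ri <- Ri + c*Rj
        _, i, c, j = op
        m[i - 1] = [x + c * y for x, y in zip(m[i - 1], m[j - 1])]
    else:
        raise ValueError("unknown ero: " + str(kind))

m = [[Fraction(x) for x in row]
     for row in [[0, 1, 1], [1, -1, 2], [1, 0, -3]]]

steps = [("swap", 1, 2),                  # R1 <-> R2
         ("add", 3, -1, 1),               # R3 <- R3 + (-1)R1
         ("add", 1, 1, 2),                # R1 <- R1 + R2
         ("add", 3, -1, 2),               # R3 <- R3 + (-1)R2
         ("scale", 3, Fraction(-1, 6)),   # R3 <- (-1/6)R3
         ("add", 1, -3, 3),               # R1 <- R1 - 3R3
         ("add", 2, -1, 3)]               # R2 <- R2 - R3

for op in steps:
    apply_ero(m, op)
    print(op, [[str(x) for x in row] for row in m])

print(m == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
```

Printing the matrix after every step makes it easy to compare against the hand calculation, one ero at a time.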
Notice: You should write down the transformed matrices showing only one change at a time for as
long as you need to. If it’s too confusing to you to re-write the matrix only once for two or more
changes, don’t do it until you’re more comfortable with it. After some practice, you’ll catch on and
you’ll get sick of re-writing the matrix.
Example 6.8. Use elementary row operations to bring the augmented matrix of the system of linear
equations shown below to row-reduced echelon form, and state all solutions to the SLE.
2x + 2y + z = 4
3x + y − 2z = 5
x − 3z = −2
Solution:
First, we need to form the augmented matrix for the system. Remember, we omit the variables and
equal signs (and insert the invisible zeroes and ones) to get an augmented matrix which looks very
much like the system, but only contains the numbers (with any needed negative signs). The RHS
values are offset by a vertical line. We get:
[ 2 2  1 |  4 ]
[ 3 1 −2 |  5 ]
[ 1 0 −3 | −2 ]
We start at the top and as far left as possible. Row 1’s left-most entry is non-zero, so we will want
it to be a leading 1. Currently it’s a 2. We could multiply row 1 by 1/2 to change the 2 into a 1, but
it’s easier to notice that there’s another row which already has a 1 as the first entry. We can just
make that row be row 1, by switching the positions of that row (row 3) and row 1:
[ 2 2  1 |  4 ]   R3 ↔ R1   [ 1 0 −3 | −2 ]
[ 3 1 −2 |  5 ]   −−−−−−→   [ 3 1 −2 |  5 ]
[ 1 0 −3 | −2 ]             [ 2 2  1 |  4 ]
Having just obtained a leading 1, the next thing we do is to eliminate all other non-zeroes in the
column that leading 1 is in, i.e. column 1. We can do this by subtracting 3 times row 1 from row 2
and subtracting 2 times row 1 from row 3. That is, for each of the other rows we replace that row
by itself plus the negative of the current column 1 entry times row 1. We get:
[ 1 0 −3 | −2 ]   R2 ← R2 − 3R1   [ 1 0 −3 | −2 ]
[ 3 1 −2 |  5 ]   −−−−−−−−−−−−→   [ 0 1  7 | 11 ]
[ 2 2  1 |  4 ]   R3 ← R3 − 2R1   [ 0 2  7 |  8 ]
Now that we’ve “dealt with” row 1, we turn our attention to row 2. We see that row 2 already has
a 1 as its first non-zero entry, and that this 1 is immediately to the right of row 1’s leading 1, so this
is our leading 1 for row 2. (That is, fortuitously we don’t have to do any work to obtain a leading 1
in row 2, since it already has a 1 in the appropriate position.) Since row 2’s leading 1 is in column
2, we next need to eliminate all other non-zero entries from column 2. We do this using the same
kind of ero’s as when we were doing this for column 1, except this time it is a multiple of row 2 which
will be added to each other row as necessary. We see that row 1 already has a 0 in column 2, so
we don’t need to do anything to row 1. It is only row 3 which has a column 2 entry which must be
transformed into a 0. Since this entry is a 2, we add −2 times row 2 to row 3:
[ 1 0 −3 | −2 ]   R3 ← R3 − 2R2   [ 1 0 −3 |  −2 ]
[ 0 1  7 | 11 ]   −−−−−−−−−−−−→   [ 0 1  7 |  11 ]
[ 0 2  7 |  8 ]                   [ 0 0 −7 | −14 ]
Row 2 has now been “dealt with” so we move on to row 3. This time, we don’t already have a
leading 1, so we have to get one. We need to transform the −7 which is currently row 3’s first
non-zero entry into a 1. We do that with a type I ero: multiply row 3 by 1/(−7), i.e. by −1/7:
[ 1 0 −3 |  −2 ]   R3 ← (−1/7)R3   [ 1 0 −3 | −2 ]
[ 0 1  7 |  11 ]   −−−−−−−−−−−−→   [ 0 1  7 | 11 ]
[ 0 0 −7 | −14 ]                   [ 0 0  1 |  2 ]
And of course now that we have a new leading 1 we must eliminate all other non-zero entries from
the column containing that leading 1, i.e. column 3. Again, we use type III ero’s, this time with
multiples of row 3 being added to the other rows. We need to eliminate a −3 in row 1, so to that
row we add −(−3), i.e. 3, times row 3. Likewise, to eliminate the 7 in row 2, we add −7 times row
3. We get:
[ 1 0 −3 | −2 ]   R1 ← R1 + 3R3   [ 1 0 0 |  4 ]
[ 0 1  7 | 11 ]   −−−−−−−−−−−−→   [ 0 1 0 | −3 ]
[ 0 0  1 |  2 ]   R2 ← R2 − 7R3   [ 0 0 1 |  2 ]
This final matrix is in RREF. (Since column 4 (the RHS column) does not contain the leading 1
for any row, it doesn’t matter what its entries are for purposes of RREF.) This RREF augmented
matrix corresponds to the linear system:
x = 4
y = −3
z = 2
Because we only performed elementary row operations in transforming the matrix, the final aug-
mented matrix is row equivalent to the original augmented matrix and therefore this new SLE has
the same solution(s) as the original system. We see that the only solution is (x, y, z) = (4, −3, 2).
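It is worth confirming the answer by substituting it into the three original equations. The Python sketch below does this. The list equations and the name residuals are our own, chosen for this illustration, and each residual is the left side minus the right side of one equation.

```python
equations = [([2, 2, 1], 4),     # 2x + 2y +  z =  4
             ([3, 1, -2], 5),    # 3x +  y - 2z =  5
             ([1, 0, -3], -2)]   #  x      - 3z = -2
solution = (4, -3, 2)            # the (x, y, z) found from the RREF matrix

# Left side minus right side, for each equation in turn.
residuals = [sum(c * v for c, v in zip(coeffs, solution)) - rhs
             for coeffs, rhs in equations]

print(residuals)  # [0, 0, 0]
```

All three residuals are 0, so (4, −3, 2) does satisfy every equation in the original system.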
Next, we want to formalize what we did in this example into a systematic approach for solving
systems of linear equations. It’s quite straightforward. We simply do exactly what we did in the
example: Write the augmented matrix for the given SLE, transform it to RREF, and use that RREF
matrix to find the solutions to the underlying transformed SLE, which is equivalent to the SLE we
started with. This procedure is called Gauss-Jordan Elimination.
The final SLE is equivalent to the original SLE, so these are also the solutions to the
original system.
Note: When we row-reduce an augmented matrix, it is never necessary to bring the whole matrix
to RREF, just the coefficient matrix part. That is, if there is a row whose leading 1 would be in the
RHS-value column, there is actually no need to convert that entry into a 1. (We’ll see why a bit
later, when we do an example in which there is such a row.)
3x + 3y + 12z = 6
x + y + 4z = 2
2x + 5y + 20z = 10
−x + 2y + 8z = 4
Solution:
We write the augmented matrix for the given system, and reduce it. Hopefully by this time you are
able to understand why we do the ero’s specified here, so the explanation will be omitted. If you have
trouble seeing what the reasoning was, look again at the procedure we described for transforming a
matrix to RREF. We get:
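[  3 3 12 |  6 ]                   [  1 1  4 |  2 ]
[  1 1  4 |  2 ]   R1 ↔ R2         [  3 3 12 |  6 ]
[  2 5 20 | 10 ]   −−−−−−−−−−−→    [  2 5 20 | 10 ]
[ −1 2  8 |  4 ]                   [ −1 2  8 |  4 ]

   R2 ← R2 + (−3)R1                               [ 1 1  4 | 2 ]
   −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−→          [ 0 0  0 | 0 ]
   (R3 ← R3 + (−2)R1), (R4 ← R4 + R1)             [ 0 3 12 | 6 ]
                                                  [ 0 3 12 | 6 ]

                                                  [ 1 1  4 | 2 ]
   R2 ↔ R4                                        [ 0 3 12 | 6 ]
   −−−−−−−−−−−→                                   [ 0 3 12 | 6 ]
                                                  [ 0 0  0 | 0 ]

                                                  [ 1 1  4 | 2 ]
   R2 ← (1/3)R2                                   [ 0 1  4 | 2 ]
   −−−−−−−−−−−→                                   [ 0 3 12 | 6 ]
                                                  [ 0 0  0 | 0 ]

   R1 ← R1 + (−1)R2                               [ 1 0 0 | 0 ]
   −−−−−−−−−−−−−−−→                               [ 0 1 4 | 2 ]
   R3 ← R3 + (−3)R2                               [ 0 0 0 | 0 ]
                                                  [ 0 0 0 | 0 ]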
x = 0                        x = 0
y + 4z = 2    which gives    y = 2 − 4z
The first equation tells us about the value of x, and the second equation tells us about y, because
the leading 1 for row 1 is in the first column (the x column) and the leading 1 in row 2 is in column
2 (the column corresponding to y). The last two rows of the RREF matrix just said that 0 = 0,
so we didn’t bother writing those as part of the SLE. We do not have an equation telling us about
the value of z, so z may have any value. We set z equal to a parameter, t, to indicate this, and
substitute t for z everywhere else. We get
x = 0, y = 2 − 4t and z = t,
where t may have any real value, so the solutions to the system are described by:
(x, y, z) = (0, 2 − 4t, t), t ∈ ℜ.
That is, for any real value of t, (x, y, z) = (0, 2 − 4t, t) gives a solution to the SLE.
Note: It is always a good idea to check that the solution(s) obtained does satisfy all of the equations
in the original SLE. As we saw in the previous unit, we can do this by substituting the unique
solution, or the parametric solution, into the original equations and verifying that left side equals
right side for each. (This is left to the reader here.) If the supposed “solution” obtained does not
satisfy the equations in the original SLE, this indicates that an arithmetic error has been made at
some point in the calculations (or perhaps that some operation other than an ero was performed
at some point, so that the final matrix is not row equivalent to the original matrix).
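The check described in this note can be sketched in a few lines of Python. This is only an illustration: the rows below are read from the augmented matrix at the start of the example, and the helper name `satisfies` is our own, not something from the course text.

```python
from fractions import Fraction

# Augmented matrix of the original SLE (columns: x, y, z | RHS).
augmented = [
    [3, 3, 12, 6],
    [1, 1, 4, 2],
    [2, 5, 20, 10],
    [-1, 2, 8, 4],
]

def satisfies(rows, point):
    """Return True if `point` makes the left side equal the right side in every equation."""
    return all(
        sum(c * v for c, v in zip(row[:-1], point)) == row[-1]
        for row in rows
    )

# Try the parametric solution (0, 2 - 4t, t) for several values of t.
samples = [Fraction(-3), Fraction(0), Fraction(1, 2), Fraction(7)]
all_ok = all(satisfies(augmented, (0, 2 - 4 * t, t)) for t in samples)
print(all_ok)
```

Testing a handful of values of t is not a proof, of course. Substituting the expressions in t by hand, as the note suggests, is what shows that every value of t works.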
Example 6.10. Solve the following SLE.
x + y + z = 2
2y − z = 4
x + 3y = 5
Solution:
We have:
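\[
\left[\begin{array}{ccc|c} 1 & 1 & 1 & 2 \\ 0 & 2 & -1 & 4 \\ 1 & 3 & 0 & 5 \end{array}\right]
\xrightarrow{R_3 \leftarrow R_3 - R_1}
\left[\begin{array}{ccc|c} 1 & 1 & 1 & 2 \\ 0 & 2 & -1 & 4 \\ 0 & 2 & -1 & 3 \end{array}\right]
\]
\[
\xrightarrow{R_2 \leftarrow (\frac{1}{2})R_2}
\left[\begin{array}{ccc|c} 1 & 1 & 1 & 2 \\ 0 & 1 & -\frac{1}{2} & 2 \\ 0 & 2 & -1 & 3 \end{array}\right]
\xrightarrow[R_3 \leftarrow R_3 - 2R_2]{R_1 \leftarrow R_1 - R_2}
\left[\begin{array}{ccc|c} 1 & 0 & \frac{3}{2} & 0 \\ 0 & 1 & -\frac{1}{2} & 2 \\ 0 & 0 & 0 & -1 \end{array}\right]
\]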
The coefficient matrix is now in RREF, so we stop. The SLE corresponding to this last augmented
matrix is:
\[
\begin{aligned}
x + \tfrac{3}{2}z &= 0 \\
y - \tfrac{1}{2}z &= 2 \\
0x + 0y + 0z &= -1
\end{aligned}
\]
The last equation says that 0 = −1, which of course is nonsense. As we have seen before, this sort of
thing is telling us that the SLE is inconsistent, so our conclusion is that the given SLE has no solution.
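To see the same conclusion reached mechanically, here is a small Python sketch that replays the four ero's used above with exact fractions and then looks for a row of the form "all 0's, then a non-zero". The function names are our own choices for illustration, and rows are numbered from 0 in the code.

```python
from fractions import Fraction

def add_multiple(m, target, source, factor):
    """ERO: R_target <- R_target + factor * R_source (rows are 0-indexed)."""
    m[target] = [t + factor * s for t, s in zip(m[target], m[source])]

def scale(m, row, factor):
    """ERO: R_row <- factor * R_row."""
    m[row] = [factor * entry for entry in m[row]]

def has_inconsistent_row(m):
    """True if some row is all zeros in the coefficient part but non-zero in the RHS."""
    return any(all(e == 0 for e in row[:-1]) and row[-1] != 0 for row in m)

start = ([1, 1, 1, 2], [0, 2, -1, 4], [1, 3, 0, 5])
m = [[Fraction(v) for v in row] for row in start]

add_multiple(m, 2, 0, -1)        # R3 <- R3 - R1
scale(m, 1, Fraction(1, 2))      # R2 <- (1/2) R2
add_multiple(m, 0, 1, -1)        # R1 <- R1 - R2
add_multiple(m, 2, 1, -2)        # R3 <- R3 - 2 R2

print(has_inconsistent_row(m))
```

Using `Fraction` rather than floating point keeps entries like 3/2 exact, so the test for "equals 0" is reliable.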
Notice what the final augmented matrix looked like in this example. The last row of the matrix
has only 0’s in the coefficient matrix part, and a non-zero value in the RHS column. There are 3
things to note about this.
First of all, it doesn’t matter what non-zero value is in the RHS column for a row in which the
coefficient matrix part contains only zeroes. As long as the coefficient matrix part is all 0’s and the
RHS column isn’t, the corresponding equation tells us that the system is inconsistent. And this is
the only situation in which we would, if we insisted on putting the whole augmented matrix into
RREF instead of just the coefficient matrix part, have a leading 1 in the RHS column. Of course,
we could obtain an augmented matrix (wholly) in RREF by simply multiplying the last row by −1
(or in general, by 1 over whatever the non-zero RHS value is). But there’s no need to do so. This is
why we express the Gauss-Jordan Method as only requiring that the coefficient matrix part of the
augmented matrix be in RREF.
Next, suppose we get a row like this in the augmented matrix before we’ve finished bringing the
coefficient matrix to RREF. If we continue reducing the augmented matrix, this row is not going
to change. The only ero’s we might perform involving this row, no matter how much work remains
to get to RREF, would be interchanging this row with other rows, moving it farther and farther
down the matrix. (In the example above, it was already at the bottom of the matrix, and we were
finished by the time we got it, so that didn’t happen.) So when we finish reducing the augmented
matrix, there will still be a row in which the coefficient matrix part contains only 0’s and the RHS
column contains a non-zero. And that row corresponds to an equation saying that zero equals some
non-zero value, which tells us that the system is inconsistent. And in that case, any work we did
after getting this row of the matrix was unnecessary. There’s no need to have the rest of the matrix
in RREF if we can already tell that the conclusion is going to be that the SLE has no solution.
Therefore, you can stop row-reducing as soon as a row like this appears in the matrix, and simply
conclude at that point that the system is inconsistent and has no solution.
Finally, if we know that a row which contains only 0’s in the coefficient matrix part and a non-zero
in the RHS column corresponds to the equation 0 = c for some c ≠ 0 and thus tells us that
the system is inconsistent, then we don’t need to bother writing the SLE which corresponds to the
reduced augmented matrix at all. We simply observe from the final augmented matrix that the SLE
has no solution.
Similarly, though, for any other kind of final augmented matrix we can also just “read off” the
solution from the final augmented matrix, without necessarily having to think about “okay, what is
the SLE which corresponds to this augmented matrix?” and then “what is/are the solution(s) to
that SLE?”. We can describe a procedure for identifying the set of solutions to a SLE directly from
the final augmented matrix. Consider what we found in the last 3 examples, in reverse order.
First of all, in Example 6.10 which we just did, we had the situation we’ve just been discussing
– a row in which there are only 0’s in the coefficient matrix part but with a non-zero entry in the
RHS column. And as we have discussed, regardless of anything else that may be in the augmented
matrix, this means that the system has no solution, because it is inconsistent. So that’s the first
thing we check.
Now, look at the final augmented matrix in Example 6.9. In that example we found that the
system had a parametric family of solutions. What characteristics did the final augmented matrix
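have, to give a corresponding SLE which led us to that conclusion? Well, let’s think about that.
The final augmented matrix in that example was
\[
\left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 4 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]
\]
In this matrix,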
row 1’s leading 1 is in column 1, and row 2’s is in column 2. Rows 3 and 4 do not have leading 1’s,
so there is no row whose leading 1 is in column 3. The variables in the original SLE were x, y and
z, so column 3 is the z-column. Row 1 is telling us about the value of x (it says x = 0), because the
leading 1 for row 1 is in the x-column (i.e. the first column). Row 2 is telling us about the value of
y, because the leading 1 in row 2 is in the y-column. But there is no row telling us about the value
of z, because the z-column does not contain the leading 1 for any row.
As we have seen, this means that z is free to take on any value. And likewise, any variable whose
column in the final augmented matrix does not contain the leading 1 for any row is free to take on
any value. Thus, once we have obtained the row-reduced augmented matrix, and we have checked
that the system is not inconsistent, if there is any variable whose column does not contain a leading
1 for some row in that matrix, then we set that variable equal to a parameter to obtain the set of
solutions for the system, and this solution set is a “parametric family of solutions”. If there is more
than one such variable, then we must set each of them equal to a different parameter. In that case,
the solution set to the SLE is a multiple parameter family of solutions.
Of course, if we do have a parametric family of solutions, then we need to write the SLE corre-
sponding to the final augmented matrix in order to find those solutions. But we can skip the part
where we literally write the underlying system, and introduce the parameter(s) immediately.
Notice that if a SLE is consistent, then having a column in the coefficient matrix part of the
augmented matrix which does not contain the leading 1 for any row can only occur when the num-
ber of leading 1’s in the final augmented matrix is smaller than the number of variables in the system.
If the system is not inconsistent, and does not have a parametric family of solutions, then the
only other possibility is that there is a unique solution. Consider the final augmented matrix. If the
system is not inconsistent, then there isn’t a row which contains only 0’s in the coefficient matrix
part but with a non-zero in the RHS column. So either there are no rows of the matrix which contain
only 0’s in the coefficient matrix, or else each such row has 0 also in the RHS column, i.e. is a row
containing only 0’s. And if the system does not have a parametric family of solutions, then there is no
column in the coefficient matrix part which does not contain a leading 1 for any row. That is, every
column in the coefficient matrix part of the augmented matrix does contain a leading 1 for some row.
And if every column in the coefficient matrix part of the final augmented matrix does contain
the leading 1 for some row, then the RREF of the coefficient matrix contains nothing but 0’s and
1’s. And when you write the SLE corresponding to the final augmented matrix, then (ignoring any
equations which just say 0 = 0) each equation has just one variable on the LHS (a different one each time) and a
number on the RHS. So this kind of reduced augmented matrix is telling us about a unique solution,
which we can find as follows: For each row, the variable whose column contains the leading 1 which
is the only non-zero entry in the row is equal to the RHS value in that row.
For instance, in Example 6.8 we had the final augmented matrix
\[
\left[\begin{array}{ccc|c} 1 & 0 & 0 & 4 \\ 0 & 1 & 0 & -3 \\ 0 & 0 & 1 & 2 \end{array}\right]
\]
We see that
in the coefficient matrix part, each column does contain the leading 1 for some row, so this part of
the matrix only contains these leading 1’s and the 0’s which the rest of each of these columns must
be filled with. Row 1 has its leading 1 in column 1, the x-column, so it’s telling us that x = 4. Row
2 has its leading 1 in column 2, the y-column, so this row says that y = −3. And row 3 has its
leading 1 in column 3, which is the z-column, so we have z = 2.
Theorem 6.3. Recognizing the solution(s) for a SLE from the final augmented matrix
Given an augmented matrix whose coefficient matrix part is in RREF:
1. If any row of the matrix contains only 0’s in the coefficient matrix part, with a non-zero in the
RHS column, then the system is inconsistent and has no solution.
2. Otherwise, if there are any columns in the coefficient matrix part which do not contain the
leading 1 for any row, the system has a parametric family of solutions (i.e. infinitely many
solutions). For each such column, set the corresponding variable equal to a different parameter.
Use the final matrix to write the underlying SLE and re-arrange to express each of the other
variables in terms of the parameter(s).
3. And if neither of the conditions above is true, then the system has a unique solution. Simply set
each variable equal to the RHS value for the row whose leading 1 is in that variable’s column.
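The three cases of the theorem can be sketched as a short Python function. This is a minimal illustration, assuming the input is an augmented matrix (a list of rows, with the RHS as the last entry of each row) whose coefficient part is already in RREF. The name `classify` and the returned labels are our own.

```python
def classify(matrix):
    """Apply the three cases of the theorem to an augmented matrix
    whose coefficient part is already in RREF.

    Returns (kind, free_columns), where free_columns lists the
    1-indexed columns that need a parameter.
    """
    n_vars = len(matrix[0]) - 1
    leading = set()
    for row in matrix:
        nonzero = [j for j, e in enumerate(row[:n_vars]) if e != 0]
        if not nonzero:
            # Case 1: only 0's in the coefficient part, non-zero RHS.
            if row[-1] != 0:
                return ("inconsistent", [])
            continue
        leading.add(nonzero[0])  # column of this row's leading 1
    # Case 2: columns with no leading 1 each get their own parameter.
    free = [j + 1 for j in range(n_vars) if j not in leading]
    if free:
        return ("parametric", free)
    # Case 3: every column has a leading 1.
    return ("unique", [])

print(classify([[1, 0, 0, 0], [0, 1, 4, 2], [0, 0, 0, 0], [0, 0, 0, 0]]))
print(classify([[1, 0, 0, 4], [0, 1, 0, -3], [0, 0, 1, 2]]))
```

The first call uses the final matrix from Example 6.9 and reports column 3 (the z-column) as needing a parameter. The second uses the final matrix from Example 6.8 and reports a unique solution.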
2x − y + 3z = 24
2y − z = 14
7x − 5y = 6
Solution:
We write the augmented matrix, row-reduce it, and then observe from the final matrix what the
solution(s), if any, to this SLE is/are.
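\[
\left[\begin{array}{ccc|c} 2 & -1 & 3 & 24 \\ 0 & 2 & -1 & 14 \\ 7 & -5 & 0 & 6 \end{array}\right]
\xrightarrow{R_1 \leftarrow \frac{1}{2}R_1}
\left[\begin{array}{ccc|c} 1 & -\frac{1}{2} & \frac{3}{2} & 12 \\ 0 & 2 & -1 & 14 \\ 7 & -5 & 0 & 6 \end{array}\right]
\]
\[
\xrightarrow{R_3 \leftarrow R_3 + (-7)R_1}
\left[\begin{array}{ccc|c} 1 & -\frac{1}{2} & \frac{3}{2} & 12 \\ 0 & 2 & -1 & 14 \\ 0 & -\frac{3}{2} & -\frac{21}{2} & -78 \end{array}\right]
\xrightarrow{R_2 \leftarrow (\frac{1}{2})R_2}
\left[\begin{array}{ccc|c} 1 & -\frac{1}{2} & \frac{3}{2} & 12 \\ 0 & 1 & -\frac{1}{2} & 7 \\ 0 & -\frac{3}{2} & -\frac{21}{2} & -78 \end{array}\right]
\]
\[
\xrightarrow[R_3 \leftarrow (R_3 + (\frac{3}{2})R_2)]{R_1 \leftarrow (R_1 + (\frac{1}{2})R_2)}
\left[\begin{array}{ccc|c} 1 & 0 & \frac{5}{4} & \frac{31}{2} \\ 0 & 1 & -\frac{1}{2} & 7 \\ 0 & 0 & -\frac{45}{4} & -\frac{135}{2} \end{array}\right]
\xrightarrow{R_3 \leftarrow (-\frac{4}{45}R_3)}
\left[\begin{array}{ccc|c} 1 & 0 & \frac{5}{4} & \frac{31}{2} \\ 0 & 1 & -\frac{1}{2} & 7 \\ 0 & 0 & 1 & 6 \end{array}\right]
\]
\[
\xrightarrow[R_2 \leftarrow (R_2 + (\frac{1}{2})R_3)]{R_1 \leftarrow (R_1 + (-\frac{5}{4})R_3)}
\left[\begin{array}{ccc|c} 1 & 0 & 0 & 8 \\ 0 & 1 & 0 & 10 \\ 0 & 0 & 1 & 6 \end{array}\right]
\]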
Looking at the final RREF augmented matrix, we see that (1) there is no row which contains only
0’s in the coefficient matrix part with a non-zero in the RHS column, so the system is consistent; (2)
every column of the coefficient matrix part does contain the leading 1 for some row, so no parameters
are needed, i.e. each variable has a unique value; and (3) from row 1, x has the value 8, from row
2, y has the value 10, and from row 3, z has the value 6. Therefore the unique solution to the given
SLE is (x, y, z) = (8, 10, 6).
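For readers who like to check row reductions by machine, the following Python sketch carries out a Gauss-Jordan reduction with exact fractions. It is one possible way to organize the procedure, not necessarily the same sequence of ero's used above (the RREF reached is the same either way). The function name `gauss_jordan` is our own.

```python
from fractions import Fraction

def gauss_jordan(augmented):
    """Row-reduce so that the coefficient part (all but the last column) is in RREF."""
    m = [[Fraction(v) for v in row] for row in augmented]
    n_rows, n_cols = len(m), len(m[0])
    pivot_row = 0
    for col in range(n_cols - 1):
        # Find a row at or below pivot_row with a non-zero entry in this column.
        src = next((r for r in range(pivot_row, n_rows) if m[r][col] != 0), None)
        if src is None:
            continue  # no leading 1 in this column
        m[pivot_row], m[src] = m[src], m[pivot_row]        # interchange rows
        pivot = m[pivot_row][col]
        m[pivot_row] = [e / pivot for e in m[pivot_row]]   # create the leading 1
        for r in range(n_rows):                            # zero out the rest of the column
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return m

reduced = gauss_jordan([[2, -1, 3, 24], [0, 2, -1, 14], [7, -5, 0, 6]])
solution = [row[-1] for row in reduced]
print(solution)
```

Because every column of the coefficient part ends up with a leading 1, the RHS column of `reduced` can be read off directly as the values of x, y and z.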
x1 + x2 − x3 + x4 = 0
x3 + x4 + x5 = 0
2x1 + 2x2 − x3 + x5 = 0
x1 + x2 − 2x3 − x5 = 0
2x3 − 4x4 + 2x5 = 0
−x1 − x2 + 2x3 − 3x4 + x5 = 0
Solution:
When we write the augmented matrix, we see that row 1 already has a leading 1. We “zero out” its
column (column 1):
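\[
\left[\begin{array}{ccccc|c}
1 & 1 & -1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 0 \\
2 & 2 & -1 & 0 & 1 & 0 \\
1 & 1 & -2 & 0 & -1 & 0 \\
0 & 0 & 2 & -4 & 2 & 0 \\
-1 & -1 & 2 & -3 & 1 & 0
\end{array}\right]
\xrightarrow[(R_4 \leftarrow R_4 + (-1)R_1),\ (R_6 \leftarrow R_6 + R_1)]{R_3 \leftarrow R_3 + (-2)R_1}
\left[\begin{array}{ccccc|c}
1 & 1 & -1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 0 \\
0 & 0 & 1 & -2 & 1 & 0 \\
0 & 0 & -1 & -1 & -1 & 0 \\
0 & 0 & 2 & -4 & 2 & 0 \\
0 & 0 & 1 & -2 & 1 & 0
\end{array}\right]
\]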
Notice that, below row 1, the left-most non-zero entry is in column 3, and is a leading 1 for row 2.
That is, column 2 (like column 1) contains only 0’s below row 1. Thus, as we proceed, column 2 will
not contain a leading 1 for any row. (Although this is a bit unusual, it does sometimes happen. It
means, of course, that we will (at the end) need to introduce a parameter for this column.) Since
row 2 already has a leading 1, we now need to clear out column 3.
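\[
\xrightarrow[(R_5 \leftarrow R_5 + (-2)R_2),\ (R_6 \leftarrow R_6 - R_2)]{(R_1 \leftarrow R_1 + R_2),\ (R_3 \leftarrow R_3 - R_2),\ (R_4 \leftarrow R_4 + R_2)}
\left[\begin{array}{ccccc|c}
1 & 1 & 0 & 2 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & -3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -6 & 0 & 0 \\
0 & 0 & 0 & -3 & 0 & 0
\end{array}\right]
\]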
After moving the zero row to the bottom of the matrix (R4 ↔ R6 ), we proceed by obtaining our
next leading 1 (in row 3, column 4) and then clear out column 4.
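\[
\xrightarrow{R_3 \leftarrow -\frac{1}{3}R_3}
\left[\begin{array}{ccccc|c}
1 & 1 & 0 & 2 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & -3 & 0 & 0 \\
0 & 0 & 0 & -6 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{array}\right]
\]
\[
\xrightarrow[(R_4 \leftarrow R_4 + 3R_3),\ (R_5 \leftarrow R_5 + 6R_3)]{(R_1 \leftarrow R_1 - 2R_3),\ (R_2 \leftarrow R_2 - R_3)}
\left[\begin{array}{ccccc|c}
1 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{array}\right]
\]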
The last matrix is in RREF. We see that there are no rows in which the coefficient matrix part
contains only 0’s while the RHS contains a non-zero. Therefore the system is consistent. We see
that as expected the x2 column does not contain the leading 1 for any row, and neither does the x5
column, so we need to set each of these equal to a parameter. Setting x2 = s and x5 = t we see that
row 1 says that x1 + s + t = 0, so x1 = −s − t; row 2 says that x3 + t = 0, so x3 = −t; and row 3
says that x4 = 0. Therefore the system has the 2-parameter family of solutions
(x1, x2, x3, x4, x5) = (−s − t, s, −t, 0, t), s, t ∈ ℜ.
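With x2 = s and x5 = t as parameters, the rows read off above give the family (x1, x2, x3, x4, x5) = (−s − t, s, −t, 0, t). As a sanity check, the Python sketch below substitutes this family into all six original equations for a grid of sample values of s and t. The helper names are hypothetical, and trying sample values only spot-checks the algebra.

```python
from itertools import product

# Coefficient rows of the six homogeneous equations (columns x1..x5).
coefficients = [
    [1, 1, -1, 1, 0],
    [0, 0, 1, 1, 1],
    [2, 2, -1, 0, 1],
    [1, 1, -2, 0, -1],
    [0, 0, 2, -4, 2],
    [-1, -1, 2, -3, 1],
]

def family(s, t):
    """Candidate solution read from the RREF: x2 = s and x5 = t are free."""
    return (-s - t, s, -t, 0, t)

def is_solution(x):
    """Every equation is homogeneous, so each left side must equal 0."""
    return all(sum(c * v for c, v in zip(row, x)) == 0 for row in coefficients)

checks = [is_solution(family(s, t)) for s, t in product(range(-2, 3), repeat=2)]
print(all(checks))
```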
Unit 7:
Matrix Operations
(text reference: Section 3.1)
© V. Olds 2010
7 Matrix Operations
At the beginning of Unit 6, we defined the mathematical construct called a matrix. Next we
learn more about matrices, especially how to do matrix arithmetic. First, we should review the
definition of a matrix, and define one more term that we didn’t need in what we have done so far.
The horizontal lines of numbers are called rows and the vertical lines of numbers are
called columns. The number, aij , in row i and column j is called the (i,j)-entry of the
matrix. A matrix with m rows and n columns is called an m × n matrix (pronounced
“m by n”). The numbers m and n are called the dimensions of the matrix.
When one of the dimensions of a matrix is 1, so that the matrix has only one row or one column,
the matrix is very similar to a vector. Because of that, we sometimes use the word vector in describing
the matrix. But we need to be clear about which dimension is 1, so we qualify the term vector.
This also helps to remind us that we’re really talking about a matrix, rather than an actual vector.
Definition: For any value n > 1, a 1 × n matrix can be referred to as a row vector.
Similarly, for any value m > 1, an m × 1 matrix can be referred to as a column vector.
Example 7.1. Describe each of the following matrices, and identify both the (1, 2)-entry and the
(2, 1)-entry, if the matrix has one.
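\[
\text{(a)}\ \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 5 \end{bmatrix}
\qquad
\text{(b)}\ \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 7 & 8 \\ 9 & 10 \end{bmatrix}
\qquad
\text{(c)}\ \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}
\qquad
\text{(d)}\ \begin{bmatrix} -3 & 0 & 2 & \frac{3}{7} \end{bmatrix}
\qquad
\text{(e)}\ \begin{bmatrix} -32 \end{bmatrix}
\]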
Solution:
(a) Since the matrix \(\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 5 \end{bmatrix}\) has 2 rows and 3 columns, it is a 2 × 3 matrix. The (1, 2)-entry
of this matrix is 2 (i.e. the number in row 1, column 2), and the (2, 1)-entry is 4 (i.e. the number
in row 2, column 1). (Notice that both in stating the dimensions of the matrix and in referring to a
particular entry, the first number always refers to row(s).)
(b) The matrix \(\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 7 & 8 \\ 9 & 10 \end{bmatrix}\) has 5 rows and 2 columns, so it is a 5 × 2 matrix. The (1, 2)-entry is 2
and the (2, 1)-entry is 3.
(c) We have the matrix \(\begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}\), which has 3 rows and only 1 column, so this is a 3 × 1 column vector.
Since it doesn’t have any column 2, there is no (1, 2)-entry. The (2, 1)-entry is 1.
(d) This matrix, \(\begin{bmatrix} -3 & 0 & 2 & \frac{3}{7} \end{bmatrix}\), has only 1 row, with 4 columns, so it is a 1 × 4 row vector. The
(1, 2)-entry is 0, and of course there is no (2, 1)-entry.
(e) Is that a matrix? Just [−32]? Sure it is. We can tell by the square brackets around it (as
well as by the fact that the question said that each of the question parts involved a matrix). This
matrix has only 1 row and 1 column. It is a 1 × 1 matrix, and therefore has neither a (1, 2)-entry
nor a (2, 1)-entry. Its only entry is the (1, 1)-entry, which is −32. Note: According to our definitions
of row vector and column vector, the “other” dimension must be bigger than 1, so a 1 × 1 matrix
is not considered to be either of these things. We just call it a 1 × 1 matrix (which helps us to re-
member that it is a matrix, rather than just a scalar which happens to be written in square brackets.)
There is some more terminology and notation for matrices that we should talk about. In vectors,
we talk about corresponding components, meaning the numbers in the same position in 2 vectors in
the same space. Similarly, when we’re talking about two matrices which have the same dimensions,
we use the term corresponding entries to refer to the numbers in the same positions in the 2
matrices. So for instance if we have two m × n matrices, i.e. with the same values of m and of
n for each, the (1, 1)-entries of the 2 matrices are corresponding entries. And the (3, 2)-entries of
the matrices, if there are any, are also corresponding entries. In general, the (i, j)-entry of one ma-
trix and the (i, j)-entry of the other matrix, for the same values of i and j, are corresponding entries.
Matrices are named with capital letters. And when a matrix is named with a particular capital
letter, we often use the lower case version of the same letter, subscripted with row and column
indices, to denote entries in the matrix. For instance, if we have a matrix called A, we can use aij to
denote the (i, j)-entry of A. And then sometimes we want to define a matrix as the matrix A whose
(i, j)-entry is called aij . We do this by saying “Let A = [aij ]”, or “Consider the matrix A = [aij ]”.
So [aij ] simply denotes the matrix containing entries which are referred to as aij . For instance, we
could say “Consider the 2 × 3 matrix A = [aij ] with aij = i − j”, which defines A to be the 2 × 3
matrix in which each entry is its row number minus its column number. So we would have
\[
A = \begin{bmatrix} 0 & -1 & -2 \\ 1 & 0 & -1 \end{bmatrix}
\]
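The idea of defining a matrix by a rule for its (i, j)-entry carries over directly to code. Here is a small Python sketch in which a matrix is stored as a list of rows. The helper name `build_matrix` is our own, and note that Python counts from 0 while matrix indices count from 1, so the helper does the shifting for us.

```python
def build_matrix(m, n, rule):
    """Return the m x n matrix whose (i, j)-entry is rule(i, j), with i and j starting at 1."""
    return [[rule(i, j) for j in range(1, n + 1)] for i in range(1, m + 1)]

# The 2 x 3 matrix A = [aij] with aij = i - j.
A = build_matrix(2, 3, lambda i, j: i - j)
print(A)  # [[0, -1, -2], [1, 0, -1]]
```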
Definition:
• Any matrix in which every entry is zero is called a zero matrix. So for any positive
integers m and n, there is an m × n zero matrix.
Notice: This is similar to the idea of the zero vector in ℜn .
• For any n > 1, any n × n matrix is called a square matrix of order n.
That is, a square matrix is just a matrix which has the same number of rows and
columns. And the order of the matrix is the number of rows (or the number of
columns).
• In a square matrix of order n, the entries aii for i = 1, ..., n are called the main
diagonal of the matrix.
That is, the main diagonal of a square matrix runs diagonally, from the top left
corner to the bottom right corner of the matrix.
• Any matrix in which the only non-zero entries appear on the main diagonal is called
a diagonal matrix.
So in a diagonal matrix, all the entries aij for i ≠ j are 0. Of course, there may
also be some zeroes along the main diagonal.
• The identity matrix of order n is the n × n diagonal matrix in which aii = 1 for
all i = 1, ..., n. The identity matrix of order n is often denoted I_n, or just I.
That is, an identity matrix is a square matrix which has 1’s all along the main
diagonal, and 0’s everywhere else.
Here, A is the 2 × 2 zero matrix. It is also a square matrix of order 2. And since it is a square matrix,
and all off-diagonal entries are 0, we could also say that it is a diagonal matrix. (Any square zero
matrix can be said to be a diagonal matrix. But diagonal matrices usually do have some non-zero
entries.) B is another square matrix of order 2. And C is another zero matrix — the 2 × 3 zero
matrix. Matrix D is a square matrix of order 3, and since all the non-zero entries are along the
main diagonal, with zeroes everywhere else, it is a diagonal matrix. And I_4, of course, is the identity
matrix of order 4. Which means it’s also a square matrix, and a diagonal matrix. (Notice: We’ve
seen identity matrices before. A square matrix in RREF which doesn’t have any rows of only zeroes
is always an identity matrix.)
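The definitions of zero, square, diagonal and identity matrices can each be written as a one-line test. The Python sketch below does this for matrices stored as lists of rows. The function names are our own, chosen only for illustration.

```python
def zero_matrix(m, n):
    """The m x n matrix in which every entry is 0."""
    return [[0] * n for _ in range(m)]

def identity(n):
    """The identity matrix of order n: 1's on the main diagonal, 0's elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def is_square(a):
    """True if the matrix has the same number of rows and columns."""
    return all(len(row) == len(a) for row in a)

def is_diagonal(a):
    """True if the matrix is square and every off-diagonal entry is 0."""
    n = len(a)
    return is_square(a) and all(a[i][j] == 0 for i in range(n) for j in range(n) if i != j)

print(is_diagonal(zero_matrix(2, 2)))   # a square zero matrix counts as diagonal
print(is_diagonal(zero_matrix(2, 3)))   # not square, so not diagonal
print(is_diagonal(identity(4)))
```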
Some matrix concepts, definitions, and/or arithmetic operations are just like the corresponding
concepts, definitions and/or arithmetic operations for vectors in ℜn . We’ve already seen some, like
the zero matrix. Next we learn some more.
Definition:
• Matrix Equality: Two matrices are said to be equal if and only if they have the
same dimensions, and their corresponding entries are equal.
That is, A = B if and only if A and B are both m × n matrices (for the same m
and n) and it is true that aij = bij for all values of i and j.
• Matrix Addition: If A and B have the same dimensions, then the sum of matrices
A and B is obtained by summing the corresponding entries.
So if A and B are both m × n matrices, the matrix C = A + B has cij = aij + bij for
all i and j. Notice that if A and B do not have the same dimensions, then A + B
is not defined. We can only add matrices which have the same dimensions.
• Scalar Multiplication: For any matrix A and any scalar c, the scalar multiple
cA is obtained by multiplying every element of A by c.
So the matrix B = cA has bij = c(aij ) for all i and j.
• Negation: For any matrix A, the negative of A, denoted −A, is the matrix
(−1)A. That is, each entry of −A is the negative of the corresponding entry of A,
so if B = −A, then bij = −aij for all i and j.
• Matrix Subtraction: For any matrices A and B which have the same dimensions,
the matrix difference A − B is defined to be the sum of A and −B.
That is, if C = A − B, then C = A + (−B), so cij = aij − bij for all i and j.
Notice that each of these works in exactly the same way as the analogous operation for vectors.
Vectors can only be equal, or be added or subtracted, if they’re from the same space. For matrices,
they must have the same dimensions. That is, in both cases, they must have the same number of
entries (components), in the same configuration. And the scalar multiplication operation multiplies
every element by the scalar, both for vectors and for matrices. Likewise, just as \(-\vec{v} = (-1)\vec{v}\), we
have −A = (−1)A for any matrix A.
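Since each of these operations works entry by entry, they are short to write in code. The following Python sketch is one way to do it for matrices stored as lists of rows. The function names and the sample matrices P and Q are made up for illustration.

```python
def dims(a):
    """Dimensions of a matrix as a (rows, columns) pair."""
    return (len(a), len(a[0]))

def add(a, b):
    """Sum of corresponding entries; only defined when the dimensions match."""
    if dims(a) != dims(b):
        raise ValueError("A + B is not defined: the dimensions differ")
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def scalar_multiple(c, a):
    """Multiply every entry of a by the scalar c."""
    return [[c * x for x in row] for row in a]

def negative(a):
    """-A is defined as (-1)A."""
    return scalar_multiple(-1, a)

def subtract(a, b):
    """A - B is defined as A + (-B)."""
    return add(a, negative(b))

P = [[1, 2], [3, 4]]
Q = [[0, 5], [-1, 2]]
print(add(P, Q))       # [[1, 7], [2, 6]]
print(subtract(P, Q))  # [[1, -3], [4, 2]]
```

Notice that `subtract` is written in terms of `add` and `negative`, mirroring the way the definition of A − B is built from the earlier definitions.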
Solution:
(a)
\[
A = \begin{bmatrix} 1 & -2 & 3 \\ 4 & 0 & 6 \end{bmatrix} = B
\]
Since A and B are both 2 × 3 matrices and aij = bij for each pair (i, j), they are equal matrices.
(b)
\[
A = \begin{bmatrix} 1 & 0 & 3 \\ 5 & 1 & -2 \end{bmatrix} \neq B = \begin{bmatrix} 1 & 0 & 3 \\ 5 & 1 & 2 \end{bmatrix}
\]
Although A and B are both 2 × 3 matrices, with many of their entries identical, there is a combination
ij for which aij ≠ bij (i.e. a23 = −2 whereas b23 = 2). Therefore, A and B are not equal
matrices.
(c)
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \neq B = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
\]
Here, A has dimension 2 × 3, whereas B has dimension 3 × 2, so they cannot be equal matrices, no
matter how similar their entries may be.
(d)
\[
A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \neq B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
\]
Again, A and B do not have the same dimension (A is 2 × 2 while B is 3 × 2), so they are not equal.
Before we look at more examples, there is one more matrix operation we should define. This
one is not like any operation on vectors, because it involves changing the dimensions of the matrix,
by interchanging the rows and columns, which has no counterpart in the context of vectors, since a
vector has only one dimension. Effectively, we turn the matrix sideways, so that the rows become
columns and the columns become rows. We refer to this as transposing, i.e. finding the transpose
of, the matrix.
Example 7.3. If \(A = \begin{bmatrix} 2 & 3 \\ -1 & 4 \\ 0 & -2 \end{bmatrix}\) and \(B = \begin{bmatrix} 1 & a & b \\ c & 2 & -1 \end{bmatrix}\), are there any values of a, b and c for
which A = 2B^T?
Solution:
Transposing B and then multiplying by 2, we have
\[
2B^T = 2\begin{bmatrix} 1 & c \\ a & 2 \\ b & -1 \end{bmatrix} = \begin{bmatrix} 2 & 2c \\ 2a & 4 \\ 2b & -2 \end{bmatrix}
\]
Comparing this last matrix to matrix A, we see that all of the known values match. That is, both
matrices have 2 as their (1, 1)-entry, 4 as their (2, 2)-entry and −2 as their (3, 2)-entry, so it will
be possible to find values of a, b and c which make these matrices equal. (If there was any entry
for which known values in the 2 matrices were not identical, then it would not be possible for the
matrices to be equal.)
We need to have
\[
\begin{bmatrix} 2 & 3 \\ -1 & 4 \\ 0 & -2 \end{bmatrix} = \begin{bmatrix} 2 & 2c \\ 2a & 4 \\ 2b & -2 \end{bmatrix}
\]
so it must be true that 2c = 3, 2a = −1 and 2b = 0.
We see that we need c = 3/2, a = −1/2 and b = 0.
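As a quick check of these values, the Python sketch below builds B with a = −1/2, b = 0 and c = 3/2, forms 2B^T, and compares it with A. The `transpose` helper is our own one-line version of "rows become columns".

```python
from fractions import Fraction

def transpose(m):
    """Rows become columns: the (i, j)-entry of the result is the (j, i)-entry of m."""
    return [list(col) for col in zip(*m)]

# The values found in the solution.
a, b, c = Fraction(-1, 2), Fraction(0), Fraction(3, 2)

A = [[2, 3], [-1, 4], [0, -2]]
B = [[1, a, b], [c, 2, -1]]

two_B_T = [[2 * entry for entry in row] for row in transpose(B)]
print(two_B_T == A)
```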
Example 7.4. Find the sum of matrices A and B, if possible, in each of the following.
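(a) \(A = \begin{bmatrix} 2 & -1 & 3 \\ 0 & 2 & 5 \end{bmatrix}\), \(B = \begin{bmatrix} 1 & 5 & 0 \\ -2 & 4 & -6 \end{bmatrix}\)
(b) \(A = \begin{bmatrix} -2 & 3 \\ 1 & 4 \end{bmatrix}\), \(B = \begin{bmatrix} 1 & 3 & -2 \\ -2 & 1 & -3 \end{bmatrix}\)
(c) \(A = \begin{bmatrix} 5 \\ 0 \\ -3 \end{bmatrix}\), \(-B^T = \begin{bmatrix} 2 & 7 & -4 \end{bmatrix}\)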
Solution:
(a) Recall that in order to add two matrices they must have the same dimensions. Since A is a 2 × 3
matrix and B is also a 2 × 3 matrix, the sum A + B is defined. Also recall that the sum of two matri-
ces is the matrix whose entries are the sums of the corresponding entries of the two matrices. We get
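\[
A + B = \begin{bmatrix} 2 & -1 & 3 \\ 0 & 2 & 5 \end{bmatrix} + \begin{bmatrix} 1 & 5 & 0 \\ -2 & 4 & -6 \end{bmatrix}
= \begin{bmatrix} (2 + 1) & (-1 + 5) & (3 + 0) \\ (0 - 2) & (2 + 4) & (5 - 6) \end{bmatrix}
= \begin{bmatrix} 3 & 4 & 3 \\ -2 & 6 & -1 \end{bmatrix}
\]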
(b) This time, A is a 2 × 2 matrix while B is a 2 × 3 matrix. Matrix addition can only be per-
formed when the matrices to be added have the same dimensions, so in this case A+B is not defined.
(c) We see that A is a 3 × 1 column vector. And since −B^T is a 1 × 3 row vector, then so is B^T and
therefore B is a 3 × 1 column vector. (Notice: The transpose of the transpose of a matrix is just
the original matrix. So B = (B^T)^T.) Therefore it will be possible to add A and B. But of course,
we need to find B. Recall that the negative of a matrix can be obtained by changing the sign of
each entry in the matrix. And of course the negative of the negative of a matrix is just the original
matrix (i.e. −(−M) = M for any matrix M). So here, B^T = −(−B^T).
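\[
-B^T = \begin{bmatrix} 2 & 7 & -4 \end{bmatrix}
\;\Rightarrow\;
B^T = -\begin{bmatrix} 2 & 7 & -4 \end{bmatrix} = \begin{bmatrix} -2 & -7 & 4 \end{bmatrix}
\;\Rightarrow\;
B = \begin{bmatrix} -2 \\ -7 \\ 4 \end{bmatrix}
\]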
Now that we have found B (which is, as we knew it would be, a (3 × 1) matrix, so that it can be
added to A), we find A + B:
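\[
A + B = \begin{bmatrix} 5 \\ 0 \\ -3 \end{bmatrix} + \begin{bmatrix} -2 \\ -7 \\ 4 \end{bmatrix}
= \begin{bmatrix} 5 - 2 \\ 0 - 7 \\ -3 + 4 \end{bmatrix}
= \begin{bmatrix} 3 \\ -7 \\ 1 \end{bmatrix}
\]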
Notice: We could have done this more directly as follows, using the fact that the transpose of the
transpose of a matrix is the matrix itself. (That is, if we switch the rows and columns, and then
switch them again, we have just put them back where they were in the first place.) So we can
consider B as (B^T)^T, and of course adding can be considered as subtracting the negative of the
matrix, and to subtract one matrix from another we just subtract the corresponding components.
Furthermore, whether we change the signs of a matrix before or after transposing it clearly makes
no difference. That is, (−B)^T = −(B^T), so we have
\[
A + B = A - (-B) = A - [-(B^T)^T] = A - (-B^T)^T
\]
Therefore to add A and B we can subtract the transpose of −B^T from A:
\[
A + B = A - (-B^T)^T
= \begin{bmatrix} 5 \\ 0 \\ -3 \end{bmatrix} - \begin{bmatrix} 2 & 7 & -4 \end{bmatrix}^T
= \begin{bmatrix} 5 \\ 0 \\ -3 \end{bmatrix} - \begin{bmatrix} 2 \\ 7 \\ -4 \end{bmatrix}
= \begin{bmatrix} 5 - 2 \\ 0 - 7 \\ -3 - (-4) \end{bmatrix}
= \begin{bmatrix} 3 \\ -7 \\ 1 \end{bmatrix}
\]
Example 7.5. Given A and B as follows, find (a) 3A − B and (b) (2A − 3I + B^T)^T.
\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\qquad
B = \begin{bmatrix} 1 & 2 \\ -2 & 0 \end{bmatrix}
\]
Solution:
(a) Notice that since A and B are both 2 × 2 matrices, the stated operations are all defined. We
find 3A by multiplying each element of A by 3, and then subtract B by subtracting corresponding
components:
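\[
\begin{aligned}
3A - B &= 3\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} - \begin{bmatrix} 1 & 2 \\ -2 & 0 \end{bmatrix}
= \begin{bmatrix} 3(1) & 3(2) \\ 3(3) & 3(4) \end{bmatrix} - \begin{bmatrix} 1 & 2 \\ -2 & 0 \end{bmatrix} \\
&= \begin{bmatrix} 3 - 1 & 6 - 2 \\ 9 - (-2) & 12 - 0 \end{bmatrix}
= \begin{bmatrix} 2 & 4 \\ 11 & 12 \end{bmatrix}
\end{aligned}
\]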
(b) Recall that I is an identity matrix, i.e. a square matrix whose main diagonal elements are all
1’s and whose off-diagonal elements are all 0’s. Since I here appears in a sum/difference of 2 × 2
matrices, clearly it is I2 , i.e. the identity matrix of order 2, which is meant. (That is, we assume that
I here means the particular identity matrix for which the required calculation is defined.) We start
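by finding the matrix whose transpose will be the final answer. That is, we can find 2A − 3I + B^T,
and then take the transpose of that matrix to get (2A − 3I + B^T)^T. We have:
\[
\begin{aligned}
2A - 3I + B^T &= 2\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} - 3\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 1 & 2 \\ -2 & 0 \end{bmatrix}^T \\
&= \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix} - \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix} + \begin{bmatrix} 1 & -2 \\ 2 & 0 \end{bmatrix} \\
&= \begin{bmatrix} 2 - 3 + 1 & 4 - 0 + (-2) \\ 6 - 0 + 2 & 8 - 3 + 0 \end{bmatrix}
= \begin{bmatrix} 0 & 2 \\ 8 & 5 \end{bmatrix}
\end{aligned}
\]
Therefore \((2A - 3I + B^T)^T = \begin{bmatrix} 0 & 8 \\ 2 & 5 \end{bmatrix}\).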
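Both parts of this example can be checked with a few lines of Python. The sketch below treats each expression as a sum of scalar multiples of matrices. The helper names `transpose` and `linear_combination` are our own, and `I2` stands for the identity matrix of order 2.

```python
def transpose(m):
    """Rows become columns."""
    return [list(col) for col in zip(*m)]

def linear_combination(terms):
    """Sum of c * M over (c, M) pairs; all M must have the same dimensions."""
    first = terms[0][1]
    rows, cols = len(first), len(first[0])
    if any((len(M), len(M[0])) != (rows, cols) for _, M in terms):
        raise ValueError("the matrices do not all have the same dimensions")
    return [[sum(c * M[i][j] for c, M in terms) for j in range(cols)]
            for i in range(rows)]

A = [[1, 2], [3, 4]]
B = [[1, 2], [-2, 0]]
I2 = [[1, 0], [0, 1]]

part_a = linear_combination([(3, A), (-1, B)])                                # 3A - B
part_b = transpose(linear_combination([(2, A), (-3, I2), (1, transpose(B))]))  # (2A - 3I + B^T)^T

print(part_a)  # [[2, 4], [11, 12]]
print(part_b)  # [[0, 8], [2, 5]]
```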
So far we have mostly been dealing with arithmetic operations for matrices which are very similar
to the corresponding arithmetic operations for vectors. But with matrices, there is also a multipli-
cation operation defined. Recall that with vectors, we don’t have a multiplication operation. We do
have two different kinds of products, the dot product and the cross product, but neither of these is
considered to be multiplication, so for vectors there is nothing that directly corresponds to multi-
plication. And the dot product operation for vectors is not one which could easily and directly be
extended to the context of matrices. Also, the cross product is only defined for vectors in ℜ3 , so
it is very exclusive and cannot be extended to matrices. However, part of the multiplication op-
eration for matrices will look very familiar, because it does involve what is effectively the dot product.
Matrix Multiplication
Matrix multiplication is more complicated than the other matrix operations that we’ve looked at
so far. It’s not hard, just somewhat more complicated. Once you get the hang of it, it’s easy. But
let’s work up to it gradually, to make sure you remember the steps. First, we’ll look at the rules
about which matrices can be multiplied together.
Well, that’s probably not what you were expecting! It seems a little quirky, but it’s not that
hard a rule. And there’s a good reason for it. Once we learn how to calculate a matrix product,
you’ll see why we need those dimensions to match. And then it will be easy to remember, because if
they don’t match, you won’t be able to calculate the entries of the product matrix. You can think
of it as the “inner” dimensions of the product. That is, if we multiply an m × n matrix times an
n × p matrix, we’re doing (m × n) × (n × p) and it’s those two inside dimensions, that are right
next to each other but from different matrices, that have to be the same. (Of course, when we say
(m × n) × (n × p), the middle × doesn’t mean the same thing that the other two do. But the fact
that it looks the same is kind of helpful. Or we could write it as (m × n) · (n × p), because sometimes
we use · to represent multiplication. And using the · might be even better, to help you remember
what to do ... but we’re not there yet.)
So for instance, if A is a 2 × 3 matrix, and B is a 3 × 2 matrix, then we can form the matrix
product AB, because (2 × 3) × (3 × 2) has the 2 inner dimensions matching. We can always multiply
a “something” by 3 times a 3 by “anything”. Likewise, we can multiply a 3 × 2 times a 2 × 3, so
the matrix product BA is also defined. However, the products A(B^T) and (A^T)B are not defined,
because for A(B^T) we’re trying to multiply a 2 × 3 times a 2 × 3, and for (A^T)B we’re trying to
multiply a 3 × 2 times a 3 × 2. In both of those, the number of rows in the second matrix is not the
same as the number of columns in the first matrix.
And now, suppose that we also have C, which is a 2 × 2 matrix. Then we can use C in a matrix
product as the first matrix if it’s multiplying a matrix that has 2 rows, or as the second matrix in
the product if it’s being multiplied by a matrix that has 2 columns. So the matrix product CA is
defined (i.e. (2 × 2) × (2 × 3) works) and the matrix product BC is defined (i.e. (3 × 2) × (2 × 2)
works). But the matrix product AC is not defined (because (2 × 3) × (2 × 2) doesn’t match) and
neither is the matrix product CB (because (2 × 2) × (3 × 2) doesn’t match either). On the other
hand, the products (A^T)C and C(B^T) are defined.
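The "inner dimensions must match" rule only involves the dimensions, not the entries, so it can be checked without knowing what is in the matrices. Here is a small Python sketch, with each matrix represented just by its (rows, columns) pair. The function names are our own.

```python
def product_defined(first, second):
    """(m x n) times (p x q) is defined exactly when the inner dimensions n and p match."""
    return first[1] == second[0]

def transposed(dim):
    """An m x n matrix has an n x m transpose."""
    return (dim[1], dim[0])

A, B, C = (2, 3), (3, 2), (2, 2)

print(product_defined(A, B), product_defined(B, A))              # AB and BA
print(product_defined(A, transposed(B)))                         # A(B^T)
print(product_defined(transposed(A), B))                         # (A^T)B
print(product_defined(C, A), product_defined(B, C))              # CA and BC
print(product_defined(A, C), product_defined(C, B))              # AC and CB
print(product_defined(transposed(A), C), product_defined(C, transposed(B)))
```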
1. We always just write the names of the matrices beside each other to express a matrix
product. We don’t use a × or a · to indicate that we’re multiplying. Just the same
as with unknowns. We never write x × y, we just write xy to say x times y. With
numbers, we need a multiplication symbol between them, or brackets, because two
numbers written beside each other mean something else ... another number. (e.g. if
we write 62, that doesn’t mean 6 times 2, it means sixty-two.) But if A is a matrix
and B is a matrix, then AB never means anything but A times B, so we don’t need
a symbol to say “times”.
2. When we write a T to indicate the transpose of a matrix, it always means just the
matrix it’s attached to, i.e. right beside. So we don’t usually write something like
A(B^T). There’s no need for the brackets. We just write AB^T and we know that it
means A times the transpose of B, because the T is on the B. If we wanted to say
“the transpose of the product matrix AB”, then we would have to write it as (AB)^T.
We need the brackets, so that the transpose can be “attached” to the brackets to
show that it’s the whole thing inside the brackets that is being transposed.
How many different matrix products of the form M1 M2 are defined, where each of M1 and M2 is
either one of the given matrices or the transpose of one of the given matrices?
Solution:
A is a 2 × 3 matrix, so $A^T$ is a 3 × 2 matrix. Both B and $B^T$ are 2 × 2 matrices. C is a 3 × 1 matrix
and D is a 3 × 4 matrix, so $C^T$ is a 1 × 3 matrix and $D^T$ is a 4 × 3 matrix.
Let’s consider the matrices, one by one, as the first matrix in the product, and see which of the 8
matrices could be the second matrix in the product. We can form a matrix product of the form
AM for any matrix M which has 3 rows, i.e. as long as M is a 3 × n matrix for any value n.
Therefore the products $AA^T$, AC and AD are all defined. For the matrix product BM, M must be
a 2 × n matrix. A satisfies this requirement, as do B itself, and its transpose, so the products BA,
BB and $BB^T$ are all defined. For CM we would need M to be a 1 × n matrix. Only $C^T$ meets
this requirement, so the only product of this form which is defined is $CC^T$. And DM requires that
M be a 4 × n matrix, which only describes $D^T$, so $DD^T$ is the only product with D as the first matrix.
Of course, we could also have a transposed matrix as the first matrix in the product. For $A^T M$ we
need M to be a 2 × n matrix, and that means that $A^T A$, $A^T B$ and $A^T B^T$ are all defined. Since $B^T$
is a 2 × 2 matrix, we can again have any of those same matrices as the second matrix in a product
$B^T M$, so $B^T A$, $B^T B$ and $B^T B^T$ are defined. $C^T M$ needs M to be a 3 × n matrix, so $C^T C$, $C^T D$
and $C^T A^T$ are all defined. Similarly, since $D^T$ also has 3 columns, it can multiply any of those same
matrices, that all have 3 rows, so $D^T C$, $D^T D$ and $D^T A^T$ are all defined.
Using only these 4 matrices and their transposes, any of 20 different matrix products can be formed.
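The count above can be double-checked mechanically. The following is a minimal Python sketch, not part of the course text: the names `dims`, `all_dims` and `is_defined` are our own, and the only inputs are the dimensions stated in the solution. It tries every ordered pair of the 8 matrices and keeps the pairs in which the number of columns of the first equals the number of rows of the second.

```python
from itertools import product

# Dimensions (rows, columns) of the four given matrices, as stated in the solution.
dims = {"A": (2, 3), "B": (2, 2), "C": (3, 1), "D": (3, 4)}

# Add each transpose by swapping the number of rows and columns.
all_dims = dict(dims)
for name, (rows, cols) in dims.items():
    all_dims[name + "^T"] = (cols, rows)

def is_defined(first, second):
    """M1 M2 is defined when (columns of M1) == (rows of M2)."""
    return all_dims[first][1] == all_dims[second][0]

# Try all 8 x 8 ordered pairs and keep the ones whose product is defined.
defined = [(m1, m2) for m1, m2 in product(all_dims, repeat=2) if is_defined(m1, m2)]

for m1, m2 in defined:
    print(m1, m2)
print(len(defined))  # 20
```

Only the dimensions matter here, which is why the sketch never needs the entries of the matrices.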
Okay, so we know which matrix products are defined. But what do we get when we multiply one
matrix by another? That is, if the matrix product AB is defined, what does it produce? Well, it
gives a new matrix. And the 2 inner dimensions, that are the same, collapse in on themselves and
disappear, as we see in the following.
Example 7.7. Recall the matrices defined in Example 7.6. Which of the matrix products identified
in that example as being defined give a matrix which is either a row vector or a column vector?
Solution:
We had the following: A is a 2 × 3 matrix, B is a 2 × 2 matrix, C is a 3 × 1 matrix (a column vector)
and D is a 3 × 4 matrix, so that $A^T$ is a 3 × 2 matrix, $B^T$ is another 2 × 2 matrix, $C^T$ is a 1 × 3
matrix (a row vector) and $D^T$ is a 4 × 3 matrix.
For the product matrix $M_1 M_2$ to be a row vector, i.e. a 1 × p matrix, we need $M_1$ to have only 1
row, i.e. to also be a row vector, with dimensions 1 × n, and so $M_2$ must be an n × p matrix. The
only row vector in the previous example is matrix $C^T$ and so only the products of the form $C^T M$
can produce a row vector. These are the products $C^T C$, $C^T D$ and $C^T A^T$, which give a 1 × 1 matrix, a
1 × 4 matrix and a 1 × 2 matrix, respectively. But in our definition of row vector, we specified that
a row vector must have more than one column, so $C^T C$ isn’t a row vector after all. Only $C^T D$ and
$C^T A^T$ are row vectors.
Similarly, for the product matrix $M_1 M_2$ to be a column vector, it must be an m × 1 matrix, for
some m > 1. That is, the second matrix in the product must be a column vector. Therefore we
need $M_2$ to be an n × 1 matrix, for any value n and so $M_1$ must be an m × n matrix, for some
m > 1. (That is, $M_1$ cannot be a row vector.) Among the matrices in the example, C is the only
column vector, and AC, $C^T C$ and $D^T C$ were the only products which could be formed with C as
the second matrix in the product. But $C^T C$, the product of a 1 × 3 times a 3 × 1, produces a 1 × 1
matrix, and hence is not a column vector. That is, $C^T$, which is a 1 × 3 matrix, doesn’t satisfy the
requirement of not being a row vector, so the only products for which the product matrix is a column
vector are AC (which gives a 2 × 1 column vector) and $D^T C$ (which gives a 4 × 1 column vector).
At this point, we know which matrix products are defined, and what the dimensions of the prod-
uct matrix are. Both things depend on the dimensions of the matrices in the product. We must have
the number of columns in the first matrix of the product being the same as the number of rows of
the second matrix in the product, and this number somehow disappears, so that the product matrix
has the same number of rows as the first matrix in the product, and has the same number of columns
as the second matrix in the product. Why? And how does that happen? Well, it’s because we do
what is effectively a dot product. And as you recall, a dot product is the product of 2 n-vectors,
and its value is only a single number. We’ll start by seeing how to multiply a matrix with only 1
row times a matrix with only one column, i.e. multiplying a row vector times a column vector. This
involves only a single calculation. After that, we’ll see that to multiply larger matrices, we just do a
series of that same kind of calculation, one for each combination of a row of the first matrix in the
product and a column of the second matrix.
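(a) AB, where $A = \begin{bmatrix} 1 & 2 \end{bmatrix}$ and $B^T = \begin{bmatrix} 3 & 4 \end{bmatrix}$.
(b) $BA^T$, where $B = \begin{bmatrix} 0 & -5 & 2 & 5 \end{bmatrix}$ and $A = \begin{bmatrix} 1 & 0 & -3 & 2 \end{bmatrix}$.
(c) $C^T C$, where $C^T = \begin{bmatrix} 5 & 0 & -3 \end{bmatrix}$.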
Solution:
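(a) If we let $\vec{a} = (1, 2)$ and $\vec{b} = (3, 4)$, then we have
$$AB = \begin{bmatrix} 1 & 2 \end{bmatrix} \begin{bmatrix} 3 \\ 4 \end{bmatrix} = [\vec{a} \bullet \vec{b}] = [(1, 2) \bullet (3, 4)] = [(1)(3) + (2)(4)] = [3 + 8] = [11]$$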
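(b) Since A is a row vector, then $A^T$ is a column vector, which is what we need here. Notice
that we don’t actually need to define vectors to calculate their dot product. We can just do the
calculation, without the vectors. We need to calculate $BA^T$, where $B = \begin{bmatrix} 0 & -5 & 2 & 5 \end{bmatrix}$ and
$A = \begin{bmatrix} 1 & 0 & -3 & 2 \end{bmatrix}$, which we can do by multiplying the kth entry in the only row of B by the
kth entry in the only column of $A^T$, and adding up these products:
$$BA^T = \begin{bmatrix} 0 & -5 & 2 & 5 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ -3 \\ 2 \end{bmatrix} = [0(1) + (-5)(0) + 2(-3) + 5(2)] = [0 + 0 - 6 + 10] = [4]$$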
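(c) For $C^T = \begin{bmatrix} 5 & 0 & -3 \end{bmatrix}$ we get
$$C^T C = \begin{bmatrix} 5 & 0 & -3 \end{bmatrix} \begin{bmatrix} 5 \\ 0 \\ -3 \end{bmatrix} = [5(5) + 0(0) + (-3)(-3)] = [25 + 0 + 9] = [34]$$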
Note: It’s important to remember that, unlike a vector dot product in which the product of two
vectors is a scalar, the product of two matrices is always a matrix. So when we find the product of a
row vector and a column vector, we don’t just get a number. We get a matrix, which contains only
one number (i.e. entry).
Also Note: In order to calculate the product of a row vector times a column vector in the manner
defined, the row of the first matrix must contain the same number of entries as the column of the
second matrix does. The number of entries in a row of a matrix is the number of columns in the
matrix, because a row has one entry for each column. Similarly, the number of entries in a column
of a matrix is the number of rows in the matrix, because a column has one entry for each row. This
is why the number of columns in the first matrix must be the same as the number of rows in the
second matrix, in order for the product matrix to be defined.
Now we’re ready to actually define the product of two matrices — any two matrices whose prod-
uct is actually defined. That is, we’re ready to learn how to calculate matrix products for larger
matrices. All we do is “dot” each row with each column, to get a number which is then one of the
entries in the product matrix.
Definition: Let A be any m × n matrix and B be any n × p matrix, for some values
of m, n and p. Also, let C = AB. Then C is the m × p matrix whose (i, j)-entry is
$a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}$. That is, if we let $\vec{a}_i$ be the n-vector whose components are
the entries from row i of matrix A, in order, and $\vec{b}_j$ be the n-vector whose components are
the entries from column j of matrix B, in order, then we have $C = [c_{ij}]$, where $c_{ij} = \vec{a}_i \bullet \vec{b}_j$.
Note: In the matrix product AB, where A is an m × n matrix and B is an n × p matrix, we do one
of these dot-product-like calculations for each combination of a row of A and a column of B. The
calculation using row i and column j gives us the (i, j)-entry of the product matrix. This gives m
different rows of entries (all the entries calculated using a particular row of A are in that row of the
product matrix) in p different columns (all the entries calculated using a particular column of B are
in that column of the product matrix). That’s why the product matrix is an m × p matrix.
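The definition translates almost word for word into code. Here is a minimal Python sketch, under the assumption that a matrix is stored as a list of its rows; the function name `mat_mul` is our own choice, not notation from the course. Each entry of the product is the dot-product-like sum for one row of the first matrix and one column of the second.

```python
def mat_mul(A, B):
    """Return the product AB, with matrices stored as lists of rows."""
    n = len(A[0])            # number of columns of A
    if n != len(B):          # must equal the number of rows of B
        raise ValueError("columns of A must equal rows of B")
    p = len(B[0])            # number of columns of B
    # (i, j)-entry: a_i1*b_1j + a_i2*b_2j + ... + a_in*b_nj
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

# A row vector times a column vector gives a 1 x 1 matrix, not a bare number.
print(mat_mul([[1, 2]], [[3], [4]]))             # [[11]]
print(mat_mul([[5, 0, -3]], [[5], [0], [-3]]))   # [[34]]
```

Notice that the result has one row for each row of the first matrix and one column for each column of the second, which is the m × p shape described in the note.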
Solution:
(a) Let’s do one with the vectors, calculating each entry separately. We define a vector for each
row of the matrix $A = \begin{bmatrix} 1 & 2 & 1 \\ 1 & 3 & 2 \end{bmatrix}$ and a vector for each column of the matrix
$B = \begin{bmatrix} 1 & 2 \\ -3 & -1 \\ 2 & 4 \end{bmatrix}$. We
get $\vec{a}_1 = (1, 2, 1)$, $\vec{a}_2 = (1, 3, 2)$, $\vec{b}_1 = (1, -3, 2)$ and $\vec{b}_2 = (2, -1, 4)$. Now we dot each a vector with
each b vector, with $\vec{a}_i \bullet \vec{b}_j$ giving us the (i, j)-entry of AB. Since we’re multiplying a 2 × 3 matrix
times a 3 × 2 matrix, the product matrix will be a 2 × 2 matrix. We get:
$$\begin{aligned}
(1,1)\text{-entry} &= \vec{a}_1 \bullet \vec{b}_1 = (1, 2, 1) \bullet (1, -3, 2) = 1(1) + 2(-3) + 1(2) = 1 - 6 + 2 = -3 \\
(1,2)\text{-entry} &= \vec{a}_1 \bullet \vec{b}_2 = (1, 2, 1) \bullet (2, -1, 4) = 1(2) + 2(-1) + 1(4) = 2 - 2 + 4 = 4 \\
(2,1)\text{-entry} &= \vec{a}_2 \bullet \vec{b}_1 = (1, 3, 2) \bullet (1, -3, 2) = 1(1) + 3(-3) + 2(2) = 1 - 9 + 4 = -4 \\
(2,2)\text{-entry} &= \vec{a}_2 \bullet \vec{b}_2 = (1, 3, 2) \bullet (2, -1, 4) = 1(2) + 3(-1) + 2(4) = 2 - 3 + 8 = 7
\end{aligned}$$
Now, we put it all together in a matrix. We see that
$$AB = \begin{bmatrix} 1 & 2 & 1 \\ 1 & 3 & 2 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ -3 & -1 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} -3 & 4 \\ -4 & 7 \end{bmatrix}.$$
(b) We can do the calculations separately, as we did in the previous part, or we can do them right
in a matrix. In this case, we’re multiplying a 4 × 3 matrix times a 3 × 1 matrix, so the product will
be a 4 × 1 matrix.
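$$AB = \begin{bmatrix} 1 & 2 & 3 \\ 1 & -2 & 1 \\ 3 & -1 & 2 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \\ -2 \end{bmatrix} = \begin{bmatrix} (1, 2, 3) \bullet (3, 1, -2) \\ (1, -2, 1) \bullet (3, 1, -2) \\ (3, -1, 2) \bullet (3, 1, -2) \\ (-1, 0, 1) \bullet (3, 1, -2) \end{bmatrix}$$
$$= \begin{bmatrix} 1(3) + 2(1) + 3(-2) \\ 1(3) + (-2)(1) + 1(-2) \\ 3(3) + (-1)(1) + 2(-2) \\ (-1)(3) + 0(1) + 1(-2) \end{bmatrix} = \begin{bmatrix} 3 + 2 - 6 \\ 3 - 2 - 2 \\ 9 - 1 - 4 \\ -3 + 0 - 2 \end{bmatrix} = \begin{bmatrix} -1 \\ -1 \\ 4 \\ -5 \end{bmatrix}$$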
(c) In part (b) we still used the vectors, even though we didn’t name them. But we don’t need
to. We can just do the vector-like calculations. (Besides, for the product we’re asked for this time,
they would be 1-vectors, which we haven’t actually defined.) We are asked to calculate $CC^T$, where
$C^T = \begin{bmatrix} 5 & 0 & -3 \end{bmatrix}$. We are multiplying a 3 × 1 column vector times a 1 × 3 row vector, which will
give us a 3 × 3 matrix (!). Because each row of the first matrix, C, contains only 1 entry, as does
each column of the second matrix, $C^T$, each of the dot-product-type calculations is just a single
product of two numbers. Remember, each of the row 1 entries of the product matrix is row 1 of C
“times” a different column of $C^T$, and similarly for the other rows. We get:
$$CC^T = \begin{bmatrix} 5 \\ 0 \\ -3 \end{bmatrix} \begin{bmatrix} 5 & 0 & -3 \end{bmatrix} = \begin{bmatrix} 5(5) & 5(0) & 5(-3) \\ 0(5) & 0(0) & 0(-3) \\ (-3)(5) & (-3)(0) & (-3)(-3) \end{bmatrix} = \begin{bmatrix} 25 & 0 & -15 \\ 0 & 0 & 0 \\ -15 & 0 & 9 \end{bmatrix}$$
Example 7.10. If $A = \begin{bmatrix} 1 & 2 \\ -1 & 3 \end{bmatrix}$, find AA.
Solution:
We have a 2 × 2 matrix, so in the matrix product AA, the number of rows in the second matrix
(2) is the same as the number of columns in the first matrix (also 2) and therefore the product is
defined. Since we are multiplying a 2 × 2 matrix times a 2 × 2 matrix, the product matrix will also
be a 2 × 2 matrix. We get:
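$$AA = \begin{bmatrix} 1 & 2 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ -1 & 3 \end{bmatrix} = \begin{bmatrix} 1(1) + 2(-1) & 1(2) + 2(3) \\ (-1)(1) + 3(-1) & (-1)(2) + (3)(3) \end{bmatrix} = \begin{bmatrix} 1 - 2 & 2 + 6 \\ -1 - 3 & -2 + 9 \end{bmatrix} = \begin{bmatrix} -1 & 8 \\ -4 & 7 \end{bmatrix}$$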
In this example, we multiplied a matrix by itself. We saw that the product was defined, and
that the product matrix has the same dimensions as the original matrix. Whenever we have a
square matrix, so that the number of columns is equal to the number of rows, the product of the
matrix multiplied by itself is defined. And for a square matrix of order n, when we do this we’re
multiplying an n × n matrix times an n × n matrix, so the product matrix is also an n × n matrix,
i.e. a square matrix of order n. And that means that we could also multiply this product matrix
by the original matrix again. And the result would be another n×n matrix, so we could keep going ...
We use the same terminology and notation for this product of copies of the same matrix multiplied
together as we use for numbers. If we have some number c and multiply it by itself, we get
$cc = c^2$, which we express as “c squared”. And then if we multiply by c again, we get $c^2 c = ccc = c^3$,
which we call “c cubed”, and so forth. In general, if we multiply k copies of a number c together,
we get $c^k$, which in general we refer to as “c to the power k”. So when we multiply a matrix A
by itself, we use $A^2$ to denote the product, and call it “A squared”. And if we then multiply
this product matrix by A again, we get $A^3$, pronounced “A cubed”. And in general, if we do this
over and over, multiplying k copies of A together, we call the product matrix $A^k$, pronounced “A
to the power k”. Of course, this only works if A is a square matrix, so that these products are defined.
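Repeated multiplication of this kind is easy to express in code. The following Python sketch is only an illustration: `mat_mul` and `mat_pow` are names we have made up, and matrices are assumed to be stored as lists of rows. It multiplies k copies of a square matrix together, one at a time, and refuses a matrix that is not square.

```python
def mat_mul(A, B):
    """Return AB for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(A, k):
    """Return A^k (k >= 1) by multiplying k copies of A together."""
    if any(len(row) != len(A) for row in A):
        raise ValueError("A must be square for A^k to be defined")
    result = A
    for _ in range(k - 1):      # k copies means k - 1 multiplications
        result = mat_mul(result, A)
    return result

A = [[1, 2], [-1, 3]]           # the matrix from Example 7.10
print(mat_pow(A, 2))            # [[-1, 8], [-4, 7]]
```

The printed matrix agrees with the product AA found in Example 7.10.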
Example 7.11. For $A = \begin{bmatrix} 1 & 2 \\ -1 & 3 \end{bmatrix}$, find $A^4$.
Solution:
In Example 7.10, we had this same matrix and we found AA, i.e. we calculated $A^2 = \begin{bmatrix} -1 & 8 \\ -4 & 7 \end{bmatrix}$.
We can use that in finding $A^4$.
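Approach 1: We can find $A^3 = A^2 A$ and then use that to find $A^4 = A^3 A$. That is, we just keep
multiplying by A until we have done it enough times. We get:
$$A^3 = A^2 A = \begin{bmatrix} -1 & 8 \\ -4 & 7 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ -1 & 3 \end{bmatrix} = \begin{bmatrix} (-1)(1) + 8(-1) & (-1)(2) + 8(3) \\ (-4)(1) + 7(-1) & (-4)(2) + 7(3) \end{bmatrix} = \begin{bmatrix} -1 - 8 & -2 + 24 \\ -4 - 7 & -8 + 21 \end{bmatrix} = \begin{bmatrix} -9 & 22 \\ -11 & 13 \end{bmatrix}$$
$$A^4 = A^3 A = \begin{bmatrix} -9 & 22 \\ -11 & 13 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ -1 & 3 \end{bmatrix} = \begin{bmatrix} (-9)(1) + 22(-1) & (-9)(2) + 22(3) \\ (-11)(1) + 13(-1) & (-11)(2) + 13(3) \end{bmatrix} = \begin{bmatrix} -9 - 22 & -18 + 66 \\ -11 - 13 & -22 + 39 \end{bmatrix} = \begin{bmatrix} -31 & 48 \\ -24 & 17 \end{bmatrix}$$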
Approach 2: We could get to the answer more quickly by realizing that 4 copies of A multiplied
together (i.e. $A^4$) could be considered as 2 copies of A multiplied together, and then multiplied
by another 2 copies of A multiplied together (i.e. $A^2$ times $A^2$). This way we get:
$$A^4 = A^2 A^2 = \begin{bmatrix} -1 & 8 \\ -4 & 7 \end{bmatrix} \begin{bmatrix} -1 & 8 \\ -4 & 7 \end{bmatrix} = \begin{bmatrix} (-1)(-1) + 8(-4) & (-1)(8) + 8(7) \\ (-4)(-1) + 7(-4) & (-4)(8) + 7(7) \end{bmatrix} = \begin{bmatrix} 1 - 32 & -8 + 56 \\ 4 - 28 & -32 + 49 \end{bmatrix} = \begin{bmatrix} -31 & 48 \\ -24 & 17 \end{bmatrix}$$
Sometimes when a matrix is multiplied by itself repeatedly, a particular pattern can be seen
in the various powers of the matrix, i.e. in the matrices that are the matrix powers. This allows
us to express all powers of the matrix easily, or perhaps to find a particular power of the matrix
without actually having to do all the matrix multiplication. Three different kinds of such patterns
are observed in the following example.
Example 7.12. Determine what $A^k$ looks like for all k > 1, and find $A^{99}$, for each of the following.
(a) $A = I_2$ (b) $A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ (c) $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$
Solution:
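(a) We start by finding $A^2$:
$$A^2 = AA = I_2 I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1(1) + 0(0) & 1(0) + 0(1) \\ 0(1) + 1(0) & 0(0) + 1(1) \end{bmatrix} = \begin{bmatrix} 1 + 0 & 0 + 0 \\ 0 + 0 & 0 + 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
We see that $A^2 = A$. But then if we multiply by A again, we’re just doing the same calculation
we already did. And so we’ll get the same result. And then multiplying that matrix by A will give
those same calculations again, with the same result. And so forth. So in this case, we see that
$$A^k = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \text{ for all } k \geq 1, \text{ so } A^{99} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$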
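Once again we get $A^2 = I_2$. But since this time $A \neq I_2$, calculating $A^3$ won’t involve repeating the
same calculations. So we need to calculate $A^3$:
$$A^3 = A^2 A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} 1(1) + 0(0) & 1(0) + 0(-1) \\ 0(1) + (1)(0) & 0(0) + (1)(-1) \end{bmatrix} = \begin{bmatrix} 1 + 0 & 0 + 0 \\ 0 + 0 & 0 - 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = A$$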
But now, since $A^3 = A$, then if we multiply by A again we’ll just be repeating the calculations we
did in finding $A^2$. And those calculations will, of course, once again give $A^2$. So without doing any
more calculations at all, we can see that $A^4 = A^2 = I_2$. And then when we multiply by A again, to
get $A^5$, we’ll be repeating the calculations that lead to $A^3$, so we’ll get A again (because $A^3 = A$).
And so forth. Every time we multiply by A, we’re going to get either A itself or the matrix that was
the product of A multiplied by A. And so we see that:
For any k > 1 with k an even number, we get $A^k = A^2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$.
And for any k > 1 with k an odd number, we get $A^k = A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$.
Therefore, since 99 is an odd number, we know that $A^{99} = A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$.
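Hmm. Well, let’s calculate $A^3$ and see what that looks like:
$$A^3 = A^2 A = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2(1) + 2(1) & 2(1) + 2(1) \\ 2(1) + 2(1) & 2(1) + 2(1) \end{bmatrix} = \begin{bmatrix} 4 & 4 \\ 4 & 4 \end{bmatrix}$$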
Let’s see. We’ve got a matrix filled with 1’s, then a matrix filled with 2’s, and next a matrix filled
with 4’s, and ... Of course, $2 = 2^1$ and $4 = 2^2$, so ... Aha! Let’s think for a minute. Suppose that
we have a matrix filled with $2^m$’s, and we multiply that matrix by A. Then we have:
$$\begin{bmatrix} 2^m & 2^m \\ 2^m & 2^m \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2^m(1) + 2^m(1) & 2^m(1) + 2^m(1) \\ 2^m(1) + 2^m(1) & 2^m(1) + 2^m(1) \end{bmatrix} = \begin{bmatrix} 2^m(2) & 2^m(2) \\ 2^m(2) & 2^m(2) \end{bmatrix} = \begin{bmatrix} 2^{m+1} & 2^{m+1} \\ 2^{m+1} & 2^{m+1} \end{bmatrix}$$
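So each time we multiply by A, we multiply all the entries in the matrix by 2. And since we have
$A^2 = \begin{bmatrix} 2^1 & 2^1 \\ 2^1 & 2^1 \end{bmatrix}$ and $A^3 = \begin{bmatrix} 2^2 & 2^2 \\ 2^2 & 2^2 \end{bmatrix}$, then we see that
$$A^k = \begin{bmatrix} 2^{k-1} & 2^{k-1} \\ 2^{k-1} & 2^{k-1} \end{bmatrix} \quad \text{and so} \quad A^{99} = \begin{bmatrix} 2^{98} & 2^{98} \\ 2^{98} & 2^{98} \end{bmatrix}$$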
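Patterns like these are found by reasoning, not by brute force, but a computer can at least confirm them for particular values of k. Here is a small Python sketch along those lines. The helper names `mat_mul` and `mat_pow` and the variable names `flip` and `ones` are our own, and the check simply multiplies 99 copies together the slow way.

```python
def mat_mul(A, B):
    """Return AB for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(A, k):
    """Return A^k (k >= 1) by repeated multiplication."""
    result = A
    for _ in range(k - 1):
        result = mat_mul(result, A)
    return result

identity = [[1, 0], [0, 1]]
flip = [[1, 0], [0, -1]]     # the matrix from part (b)
ones = [[1, 1], [1, 1]]      # the matrix from part (c)

# Part (b): even powers give I_2, odd powers give the matrix itself.
print(mat_pow(flip, 98) == identity, mat_pow(flip, 99) == flip)

# Part (c): every entry of A^k should be 2^(k-1).
k = 99
print(mat_pow(ones, k) == [[2 ** (k - 1)] * 2] * 2)
```

Python integers do not overflow, so the comparison with $2^{98}$ is exact.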
In our various examples and other calculations in this unit we have used some, and you may
have observed others, of the following properties of the matrix operations we’ve been learning.
It is worth looking at an example, to demonstrate this last property as well as the reason for this
reversal in the order of multiplication.
Example 7.13. For $A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 0 \\ 2 & -1 \end{bmatrix}$, find $(AB)^T$.
Solution: According to property 16, we can either find the matrix product AB and then take its
transpose, or else take the transposes of A and B and then find the product $(B^T)(A^T)$.
For the second approach, we have $A^T = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{bmatrix}$ and $B^T = \begin{bmatrix} 1 & 2 \\ 0 & -1 \end{bmatrix}$.
Notice that $A^T$ is 2 × 3 and $B^T$ is 2 × 2. The matrix product $(A^T)(B^T)$ is not defined, because
the dimensions are not appropriate for multiplying these matrices. However, the matrix product
$(B^T)(A^T)$ is defined (this is true whenever the matrix product AB is defined). And even if the
dimensions were not wrong (without the reversal), because we have interchanged the roles of rows
and columns in taking the transposes of the matrices we must reverse the order of multiplication, in
order to be doing the same dot product calculations as we did using the first approach.
We get
$$(B^T)(A^T) = \begin{bmatrix} 1 & 2 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{bmatrix} = \begin{bmatrix} 1 + 2 & 1 + 4 & 1 + 6 \\ 0 - 1 & 0 - 2 & 0 - 3 \end{bmatrix} = \begin{bmatrix} 3 & 5 & 7 \\ -1 & -2 & -3 \end{bmatrix}$$
We see that this matrix is the same as the one found by the first approach. That is, we see that
$(B^T)(A^T) = (AB)^T$.
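Both approaches can be compared side by side in a few lines of Python. This is a sketch for illustration only; `mat_mul` and `transpose` are our own helper names, and matrices are assumed to be stored as lists of rows.

```python
def mat_mul(A, B):
    """Return AB for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    """Rows of M become columns of the result."""
    return [list(col) for col in zip(*M)]

A = [[1, 1], [1, 2], [1, 3]]     # 3 x 2
B = [[1, 0], [2, -1]]            # 2 x 2

first = transpose(mat_mul(A, B))               # (AB)^T
second = mat_mul(transpose(B), transpose(A))   # (B^T)(A^T)
print(first)             # [[3, 5, 7], [-1, -2, -3]]
print(first == second)   # True
```

Trying `mat_mul(transpose(A), transpose(B))` instead multiplies a 2 × 3 matrix by a 2 × 2 matrix, which is not a defined product, so this particular sketch would fail with an index error.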
It is equally important to recognize the properties that do not hold for matrix operations. Most
notably, matrix multiplication is not commutative. That is, in general, $AB \neq BA$, even when both
of these matrix products are defined and they have the same dimension.
First of all, notice that it may be that neither AB nor BA is defined. (i.e., A and B might not have
appropriate dimensions for either product to exist.) And even if one of these matrix products exists,
the other may not. As well, if both AB and BA exist, they may have different dimensions, and
therefore cannot be equal.
However, even if A and B are square matrices of the same order, so that AB and BA are both
defined and have the same dimension, it still will not usually be true that these are equal matrices.
The next example demonstrates this.
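Example 7.14. For $A = \begin{bmatrix} 1 & 1 \\ 2 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 2 \\ -1 & 2 \end{bmatrix}$, show that $AB \neq BA$.
Solution:
$$AB = \begin{bmatrix} 1 & 1 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 1 - 1 & 2 + 2 \\ 2 - 1 & 4 + 2 \end{bmatrix} = \begin{bmatrix} 0 & 4 \\ 1 & 6 \end{bmatrix}$$
$$BA = \begin{bmatrix} 1 & 2 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 1 + 4 & 1 + 2 \\ -1 + 4 & -1 + 2 \end{bmatrix} = \begin{bmatrix} 5 & 3 \\ 3 & 1 \end{bmatrix} \neq AB$$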
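The same comparison takes only a few lines of Python. As before, this is just a sketch, with `mat_mul` being our own helper name and matrices stored as lists of rows.

```python
def mat_mul(A, B):
    """Return AB for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 1], [2, 1]]
B = [[1, 2], [-1, 2]]

AB = mat_mul(A, B)
BA = mat_mul(B, A)
print(AB)         # [[0, 4], [1, 6]]
print(BA)         # [[5, 3], [3, 1]]
print(AB == BA)   # False
```

Both products are 2 × 2 matrices, so the dimensions agree, and yet not a single entry matches.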
Notice: This calculation involved 3 ‘dot products’ to find BC (each involving 4 scalar products)
and then 2 ‘dot products’ to find A(BC) (each involving 3 scalar products) for a total of 5 ‘dot
products’ (and 18 scalar multiplications). Finding AB, i.e. multiplying a 2 × 3 matrix times a 3 × 4
matrix, would require 8 ‘dot products’ (each involving 3 scalar products) and then finding (AB)C,
i.e. multiplying a 2 × 4 matrix times a 4 × 1 matrix, would require 2 more ‘dot products’ (each
involving 4 scalar products) and so a total of 10 ‘dot products’ (involving 32 scalar products) would
be required. By eliminating the largest dimension first, we have significantly reduced the amount of
arithmetic required (and thereby reduced the opportunities for making arithmetic mistakes).
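The counting in this note follows a simple rule: multiplying an m × n matrix by an n × p matrix takes m·p ‘dot products’, each involving n scalar products. A short Python sketch makes the comparison explicit. The function names `dot_count` and `mult_count` are our own, and the dimensions used (A is 2 × 3, B is 3 × 4, C is 4 × 1) are the ones stated in the note.

```python
def dot_count(m, n, p):
    """'Dot products' needed for an (m x n) times (n x p) product: one per entry."""
    return m * p

def mult_count(m, n, p):
    """Scalar multiplications: each of the m*p entries needs n of them."""
    return m * p * n

# A(BC): first BC is (3 x 4)(4 x 1), then A(BC) is (2 x 3)(3 x 1).
dots_a_bc = dot_count(3, 4, 1) + dot_count(2, 3, 1)
mults_a_bc = mult_count(3, 4, 1) + mult_count(2, 3, 1)

# (AB)C: first AB is (2 x 3)(3 x 4), then (AB)C is (2 x 4)(4 x 1).
dots_ab_c = dot_count(2, 3, 4) + dot_count(2, 4, 1)
mults_ab_c = mult_count(2, 3, 4) + mult_count(2, 4, 1)

print(dots_a_bc, mults_a_bc)   # 5 18
print(dots_ab_c, mults_ab_c)   # 10 32
```

Both orders give the same final matrix, because matrix multiplication is associative. Only the amount of arithmetic differs.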
Math 1229A/B
Unit 8:
The Inverse of a Matrix
(text reference: Section 3.2)
© V. Olds 2010
Suppose we have a system of m equations in the n unknowns $x_1, x_2, \ldots, x_n$. Let $a_{ij}$ be the
coefficient of the jth variable in the ith equation. Call the right hand side value of the ith equation
$b_i$. So we have the SLE:
$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1 \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2 \\
&\;\;\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= b_m
\end{aligned}$$
Now, let $A = [a_{ij}]$ be the coefficient matrix for this SLE. Also, let X be a column vector whose
(j, 1)-entry is simply the unknown $x_j$, i.e. X just looks like a vertical list of the unknowns. And let
B be a column vector whose entries are the right hand side values, so that the (i, 1)-entry of B is
$b_i$. Notice that X is an n × 1 matrix and B is an m × 1 matrix.
Consider the matrix product AX. We have an m × n matrix times an n × 1 matrix, so this
product is defined, and the product matrix will be an m × 1 matrix. And because of the way matrix
multiplication works, this product matrix is:
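$$AX = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \\ \vdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n \end{bmatrix}$$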
Why look at that! AX is a column vector whose entries are the left hand sides of the equations.
And so if we write a matrix equation AX = B, this simply says that the ith entry of column vector
AX must be equal to the ith entry of column vector B, i.e. it says that for each equation in the
system, the left hand side of the equation must equal the right hand side of the equation. That is,
the matrix equation AX = B exactly represents the SLE. We have:
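$$AX = B \;\Rightarrow\; \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$
$$\Rightarrow\; \begin{bmatrix} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \\ \vdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$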
When we write the augmented matrix [A|B], which is the coefficient matrix of the SLE with the
column vector of right hand side values appended as an extra column, it is just a form of short-hand
for this matrix equation AX = B. We assume that we know what the unknowns are, and so the
interesting (i.e. important) parts of the equation are the coefficient entries in A and the right hand
side values in B. It’s easy to see how
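$$\left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right] \quad \text{is shorthand for} \quad \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$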
Consider, for instance, the SLE from Example 6.9 back in Unit 6 (page 84). The system is:
3x + 3y + 12z = 6
x + y + 4z = 2
2x + 5y + 20z = 10
−x + 2y + 8z = 4
Notice that when we solve the SLE AX = B, the solution is a set of values for the unknowns
in the column vector X. When we state a solution to a system of equations, it is usually more
convenient to express it as a vector, rather than as a column vector. That is, if the solution to an
SLE involving x, y and z is x = 1, y = 2 and z = 3, we generally write this as
$$(x, y, z) = (1, 2, 3) \quad \text{rather than} \quad \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$
Similarly, when we found the solution to the SLE shown above, in Example 6.9, we wrote it as
(x, y, z) = (0, 2 − 4t, t), and it makes sense that we would continue to do so, rather than having
to write it as $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 2 - 4t \\ t \end{bmatrix}$. (See how messy that looks when we need to mention it in a
paragraph! The vector form simply fits more easily.)
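To see the matrix equation AX = B at work on this system, we can multiply the coefficient matrix by the solution (0, 2 − 4t, t) and compare with the right hand side values. The following Python sketch does this for a handful of values of t. The names `mat_vec` and `solution` are our own, and a column vector is stored as a plain list of its entries purely for convenience.

```python
# Coefficient matrix and right hand side values of the system shown above.
A = [[3, 3, 12],
     [1, 1, 4],
     [2, 5, 20],
     [-1, 2, 8]]
b = [6, 2, 10, 4]

def mat_vec(A, x):
    """Entries of the column vector AX: row i of A 'dotted' with x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def solution(t):
    """The solution (x, y, z) = (0, 2 - 4t, t) for a given parameter value t."""
    return [0, 2 - 4 * t, t]

# Each entry of AX should equal the matching right hand side, whatever t is.
checks = [mat_vec(A, solution(t)) == b for t in range(-3, 4)]
print(all(checks))   # True
```

A check over a few values of t is not a proof, of course, but substituting the general solution by hand gives the same right hand sides, since the terms in t cancel in every equation.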
Because of this practice of writing an n × 1 column vector in the form of an n-vector, we often
give column vectors names using vector notation, rather than using matrix notation. That is, rather
than talking about the column vectors X and B, we can refer to these as the vectors $\vec{x}$ and $\vec{b}$. So
we usually write the matrix equation form of an SLE as $A\vec{x} = \vec{b}$, rather than as AX = B. This
is the convention which we will use from now on. However, you should always keep in mind that
when we write $A\vec{x} = \vec{b}$ for a system of m equations in n variables, since A is (as always) the m × n
coefficient matrix, $\vec{x}$, the vector of variables, is really (i.e. represents) an n × 1 column vector, and
$\vec{b}$, the vector of right hand side values, is actually an m × 1 column vector, so that this equation
makes sense mathematically. (That is, “matrix times vector” is not defined. The vector is really a
column vector, i.e. a matrix.)
Definition: Any SLE involving m equations and n variables can be represented by the
matrix form of the SLE $A\vec{x} = \vec{b}$, where A is the m × n coefficient matrix, $\vec{x}$ is the vector
(technically, column vector) of the unknowns and $\vec{b}$ is the vector (technically, column
vector) of right hand side values. Both $\vec{x}$ and $\vec{b}$ can be written out either as column
vectors or as n- or m-vectors, whichever is most convenient in the present context.
This means that solving a system of linear equations is equivalent to solving the matrix equation
$A\vec{x} = \vec{b}$ for the vector $\vec{x}$. When we have a single equation in one unknown, of the form ax = b,
where x is a variable and a and b are scalars (with a ≠ 0), we can solve this easily by multiplying both sides of
the equation by the inverse of a, $a^{-1}$, to get $x = a^{-1} b$. That is, you may think of this as dividing
both sides of the equation by a, but we can also express it as multiplying by $\frac{1}{a}$, which can also be
written as $a^{-1}$ and is called the multiplicative inverse of a. The equation $A\vec{x} = \vec{b}$ looks very much
like ax = b, but it is much more complicated than it looks. We have more than one unknown,
disguised as the vector $\vec{x}$, and there are several equations, hiding in the matrices A and $\vec{b}$. And yet
... it is sometimes possible to do something like $x = a^{-1} b$ to find the solution to the matrix equation
$A\vec{x} = \vec{b}$. To do this, we need to define the inverse of a matrix.
Definition: Let A be a square matrix. If there exists a matrix B with the same dimensions
as A such that
AB = BA = I
then we say that A is invertible (or nonsingular) and that B is the inverse of A,
written $B = A^{-1}$. If A has no inverse (i.e., if no such matrix B exists), then A is said to
be noninvertible (or singular).
Example 8.1. For $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} -2 & 1 \\ \frac{3}{2} & -\frac{1}{2} \end{bmatrix}$, show that B is the inverse of A.
Solution:
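We must show that AB = BA = I. Notice that here, since AB and BA are both 2 × 2 matrices, I
means $I_2$. We first calculate AB:
$$AB = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} -2 & 1 \\ \frac{3}{2} & -\frac{1}{2} \end{bmatrix} = \begin{bmatrix} (1)(-2) + (2)\left(\frac{3}{2}\right) & (1)(1) + (2)\left(-\frac{1}{2}\right) \\ (3)(-2) + (4)\left(\frac{3}{2}\right) & (3)(1) + (4)\left(-\frac{1}{2}\right) \end{bmatrix} = \begin{bmatrix} -2 + 3 & 1 - 1 \\ -6 + 6 & 3 - 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$$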
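Both products required by the definition can be checked with exact arithmetic. Here is a minimal Python sketch using the standard `fractions` module, so that 3/2 and −1/2 are not rounded. The helper name `mat_mul` is our own, and matrices are stored as lists of rows.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Return AB for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[Fraction(-2), Fraction(1)],
     [Fraction(3, 2), Fraction(-1, 2)]]
I2 = [[1, 0], [0, 1]]

print(mat_mul(A, B) == I2)   # True
print(mat_mul(B, A) == I2)   # True
```

Since both AB and BA come out as the identity matrix, B satisfies the definition of the inverse of A.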
Do you suppose it’s possible for a matrix to have more than one inverse? Could there be some
other, different, matrix C such that if we find AC and CA for the matrix A in the example, we once
again get I? Well, no. Just as $\frac{1}{a}$ is the only number for which $a \cdot \frac{1}{a} = 1$, so that the number a has
only one inverse, it is also true that a matrix cannot have more than one inverse.
Proof: Let A be any invertible square matrix. Suppose that matrix B is an inverse of A. And
suppose that C is also an inverse of A. We will show that it must be true that B = C.
We know that A is a square matrix, as it must be to be invertible. Let n be the order of the square
matrix A. Then B and C must also be square matrices of order n. Of course, the n × n matrix B
can be multiplied by the identity matrix of order n, $I_n$. This matrix multiplication leaves the matrix
unchanged (by property 6 of Theorem 7.1 in Unit 7, page 106). That is, we have:
$$B = B I_n$$
However, since C is an inverse of A, then by definition $AC = I_n$, that is $I_n = AC$, so we can write:
$$B I_n = B(AC)$$
Now, we know that matrix multiplication is associative (by property 5 of the same Theorem). This
means that
$$B(AC) = (BA)C$$
But since B is also an inverse of A, then, again using the definition, $BA = I_n$ and so we have:
$$(BA)C = I_n C$$
Finally, we know that multiplying the square matrix C by the identity matrix of the same order
leaves the matrix unchanged (as before), so we get:
$$I_n C = C$$
Putting this all together, we have:
$$B = B I_n = B(AC) = (BA)C = I_n C = C$$
and we see that, in fact, B = C. That is, we see that in order for B and C to both be inverses of
A, they must be the same matrix, so any invertible matrix has a unique inverse.
There is another very useful theorem, which tells us that whenever AB = I is true, it must also be
true that BA = I, so we don’t have to check. For instance, in Example 8.1 we only really needed to
compute one of AB or BA to prove that B is the inverse of A.
Theorem 8.2. Let A be a square matrix. If a square matrix B exists with AB = I, then it must
always be the case that BA = I as well, so that in fact $B = A^{-1}$.
At this point, we know what it means to say that a particular matrix is the inverse of some
square matrix. And we know how to tell whether some particular (square) matrix is the inverse
of some other particular (square) matrix. But suppose we have some matrix, and perhaps we even
know that it is invertible (also known as nonsingular), that is we know that it has an inverse ...
But how do we find the inverse matrix? Well, we can do it by setting up a particular augmented
matrix (bigger than the ones we’ve used before) and row-reducing. If we want to find the inverse of
some square matrix A of order n, then we set up an n × 2n augmented matrix which has all of the
columns of A on the left side, and all of the columns of the identity matrix $I_n$ on the right side, with the
line coming between the two sets of columns. (So we’ve just jammed A and $I_n$ together, with the
right bracket of A and the left bracket of $I_n$ fused together to make the line.) And then we bring
this matrix to RREF. When we’re done, if the matrix A on the left has been transformed into the
matrix $I_n$, which used to be on the right, then the matrix which is now on the right is $A^{-1}$. (Voila!
It’s magic!)
Note: The text contains an explanation of why this procedure finds the inverse, when it exists, in
their Example 3 of this section. You should have a look at that (pages 103 - 104).
Then
1. if C = I then $D = A^{-1}$
2. if C is not the identity matrix, then A is not invertible, i.e., is singular.
Notice: This procedure does more than just find the inverse of the matrix, if it exists. It also gives
us a way of knowing that no inverse exists, when we apply the procedure to a singular matrix.
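The row-reduction procedure described above can be sketched in Python. This is an illustration under our own assumptions, not the text’s algorithm verbatim: the function name `inverse` is made up, matrices are stored as lists of rows, exact fractions are used to avoid rounding, and rows are swapped when a leading entry is 0. It returns the right half of the RREF when the left half becomes the identity matrix, and `None` when it cannot.

```python
from fractions import Fraction

def inverse(A):
    """Row-reduce [A | I]; return A's inverse as a list of rows, or None if A is singular."""
    n = len(A)
    # Build the n x 2n augmented matrix [A | I_n] with exact fractions.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Look for a row at or below row `col` with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                 # the left side can never become I_n
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [x / lead for x in M[col]]          # get a leading 1
        for r in range(n):                           # clear the rest of the column
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]       # the columns to the right of the line

print(inverse([[1, 2], [3, 4]]))
print(inverse([[1, 2], [2, 4]]))   # None
```

The second matrix has a second row that is twice the first, so row-reducing leaves a row of 0’s on the left side and the sketch reports that no inverse exists.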
Example 8.2. Use this procedure to find the inverse of $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$.
Solution:
We set up the augmented matrix, writing the columns of A, then a line, then the columns of I2 :
$$\left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 3 & 4 & 0 & 1 \end{array}\right]$$
And now we perform elementary row operations to bring this matrix to RREF. Row 1 already has
a leading 1, so we clear out the rest of that column, and then get a leading one in row 2 and clear
out that column, too.
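$$\left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 3 & 4 & 0 & 1 \end{array}\right] \xrightarrow{R_2 \leftarrow R_2 - 3R_1} \left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 0 & -2 & -3 & 1 \end{array}\right]$$
$$\xrightarrow{R_2 \leftarrow -\frac{1}{2} R_2} \left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 0 & 1 & \frac{3}{2} & -\frac{1}{2} \end{array}\right]$$
$$\xrightarrow{R_1 \leftarrow R_1 - 2R_2} \left[\begin{array}{cc|cc} 1 & 0 & -2 & 1 \\ 0 & 1 & \frac{3}{2} & -\frac{1}{2} \end{array}\right]$$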
The matrix is now in RREF. We see that the left side of the matrix does contain the columns of $I_2$,
so matrix A is invertible and the columns in the right half of the matrix are the columns of $A^{-1}$.
Therefore
$$A^{-1} = \begin{bmatrix} -2 & 1 \\ \frac{3}{2} & -\frac{1}{2} \end{bmatrix}$$
Ordinarily we would check our arithmetic by finding the product $AA^{-1}$ to confirm that we do get
I2 . However the inverse we obtained in this case is the matrix that we already saw, in Example 8.1,
is the inverse of this matrix A, so we don’t need to check that again.
Solution:
(a) We wish to find the inverse of a 3 × 3 matrix, so we start by forming the augmented matrix
obtained by appending the 3 × 3 identity matrix. Then we row-reduce this augmented matrix to
bring it to RREF.
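$$[A|I] = \left[\begin{array}{ccc|ccc} 1 & 1 & 1 & 1 & 0 & 0 \\ 1 & 2 & 3 & 0 & 1 & 0 \\ 2 & 1 & 2 & 0 & 0 & 1 \end{array}\right] \xrightarrow[R_3 \leftarrow R_3 - 2R_1]{R_2 \leftarrow R_2 - R_1} \left[\begin{array}{ccc|ccc} 1 & 1 & 1 & 1 & 0 & 0 \\ 0 & 1 & 2 & -1 & 1 & 0 \\ 0 & -1 & 0 & -2 & 0 & 1 \end{array}\right]$$
$$\xrightarrow[R_3 \leftarrow R_3 + R_2]{R_1 \leftarrow R_1 - R_2} \left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 2 & -1 & 0 \\ 0 & 1 & 2 & -1 & 1 & 0 \\ 0 & 0 & 2 & -3 & 1 & 1 \end{array}\right]$$
$$\xrightarrow{R_3 \leftarrow \frac{1}{2} R_3} \left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 2 & -1 & 0 \\ 0 & 1 & 2 & -1 & 1 & 0 \\ 0 & 0 & 1 & -\frac{3}{2} & \frac{1}{2} & \frac{1}{2} \end{array}\right]$$
$$\xrightarrow[R_2 \leftarrow R_2 - 2R_3]{R_1 \leftarrow R_1 + R_3} \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & \frac{1}{2} & -\frac{1}{2} & \frac{1}{2} \\ 0 & 1 & 0 & 2 & 0 & -1 \\ 0 & 0 & 1 & -\frac{3}{2} & \frac{1}{2} & \frac{1}{2} \end{array}\right]$$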
This matrix is now in RREF. We see that the matrix A (i.e. the left side of the augmented matrix)
has been transformed into the 3 × 3 identity matrix. This tells us that A is invertible and that the
columns on the right side of the RREF augmented matrix are the columns of $A^{-1}$. Thus we see that
$$A^{-1} = \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} & \frac{1}{2} \\ 2 & 0 & -1 \\ -\frac{3}{2} & \frac{1}{2} & \frac{1}{2} \end{bmatrix}$$
Since the product of the given matrix A times the matrix we think is the inverse of A does give I3 ,
our inverse matrix is correct.
This last matrix is not yet in RREF. However, we can see that the matrix on the left is not going to
become I3 . The bottom row of the columns on the left side of the matrix has only 0’s. If we continue
performing ero’s, we will next get a leading one in row 3, and it will be way over on the right, in the
fourth column of the matrix. We’ll get it by changing the signs in row 3, but that doesn’t affect the
0’s in the left half of row 3. When we clear out the rest of column 4, we’ll be adding scalar multiples
of row 3 to the other rows. So in column 3, we’ll be adding 0’s and nothing will change. And it
certainly won’t affect the fact that (3, 3)-entry will be a 0, not a 1. So as previously stated, we’re
not going to end up with the columns on the left forming the 3 × 3 identity matrix. Even without
finishing bringing this augmented matrix to RREF, we can stop, and conclude that A is singular,
i.e. is not invertible, so A−1 does not exist in this case. (Notice that when
we stopped, the LHS of the augmented matrix was in RREF, and since that RREF was not I3 , we
could see that A is singular.)
Now, let’s return to the problem of solving a system of linear equations. We have seen that we
can represent any SLE by the matrix equation A~x = ~b, where A is the m × n coefficient matrix of
the SLE, ~x is actually an n × 1 column vector, containing the unknowns from the SLE, and ~b is
really another column vector, m × 1 this time, which contains the right hand side values of the SLE.
(Recall that we’re now using bi to denote the RHS value of equation i.)
In the particular situation where the coefficient matrix A happens to be an invertible square
matrix, we can use the inverse of this matrix, A−1 , to easily solve the SLE. Consider multiplying
each side of the matrix equation by this inverse matrix. In order to make the dimensions be right
for this multiplication to be defined, we pre-multiply by A−1 , by which we mean that A−1 is the first
matrix in the product. (As opposed to post-multiplying by a matrix, which means that the specified
matrix is the second term in the matrix product.)
So we take our matrix equation that represents the SLE, and we write A−1 at the beginning of
the LHS, and also at the beginning of the RHS. When we do this, we get:
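\[
\begin{aligned}
A\vec{x} = \vec{b} \;&\Rightarrow\; A^{-1}(A\vec{x}) = A^{-1}\vec{b} \\
&\Rightarrow\; (A^{-1}A)\vec{x} = A^{-1}\vec{b} \\
&\Rightarrow\; I\vec{x} = A^{-1}\vec{b} \\
&\Rightarrow\; \vec{x} = A^{-1}\vec{b}
\end{aligned}
\]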
Notice: In an m × n SLE with a square coefficient matrix, we have A (which must have m rows)
being an m × m matrix, and therefore so is A−1 (when it exists). And the vector of right hand side
values, ~b, is an m × 1 column vector. So the matrix product A−1~b involves multiplying an m × m
matrix times an m × 1 matrix, and therefore (in this situation) this product is always defined.
This means that if we can find A−1 , we are able to solve the system simply by multiplying A−1
times ~b. That is, the method we can use in this situation is:
~x = A−1~b
Remember: The method of inverses can only be used when A is a square invertible matrix. If the
coefficient matrix A is not a square matrix, or is singular, then we cannot use this approach to find ~x.
Example 8.4. Solve the following system using the method of inverses.
x + y + z = 4
x + 2y + 3z = 6
2x + y + 2z = 5
Solution:
The SLE can be represented as a matrix equation as:
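\[
A\vec{x} = \vec{b} \;\Rightarrow\;
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 2 & 1 & 2 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \begin{bmatrix} 4 \\ 6 \\ 5 \end{bmatrix}
\]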
We have a coefficient matrix A which is a square matrix. Also, this is the same matrix whose inverse
we found in Example 8.3(a). So we know that A is invertible, and that its inverse is
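\[
A^{-1} = \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} & \frac{1}{2} \\ 2 & 0 & -1 \\ -\frac{3}{2} & \frac{1}{2} & \frac{1}{2} \end{bmatrix}
\]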
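With A−1 in hand, the method of inverses reduces to one matrix-vector product. The sketch below (variable names are illustrative only) computes ~x = A−1~b for this example using exact fractions, and then substitutes the result back into the original system as a check.

```python
from fractions import Fraction as F

A = [[1, 1, 1], [1, 2, 3], [2, 1, 2]]
A_inv = [[F(1, 2), F(-1, 2), F(1, 2)],
         [F(2), F(0), F(-1)],
         [F(-3, 2), F(1, 2), F(1, 2)]]
b = [4, 6, 5]

# x = A^{-1} b: each component is a row of A^{-1} dotted with b.
x = [sum(entry * rhs for entry, rhs in zip(row, b)) for row in A_inv]
print([str(v) for v in x])  # ['3/2', '3', '-1/2']

# Substitute back: A x should reproduce the right hand side values.
print([sum(a * v for a, v in zip(row, x)) for row in A] == b)  # True
```

That is, under this calculation the system has the solution (x, y, z) = (3/2, 3, −1/2).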
Notice: For any SLE for which the coefficient matrix A is square and invertible, A−1 is unique
and therefore so is A−1~b. This means that any system of linear equations for which the method of
inverses can be used must have a unique solution. That is, if the coefficient matrix of an SLE is a
non-singular square matrix, then the SLE has a unique solution.
Solution:
The coefficient matrix for this system is:
\[
A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 2 & 4 \end{bmatrix}
\]
Example 8.6. For what value(s) of c does the following SLE not have a unique solution?
x + y + z = 1
x + 2y + 3z = 1
x + 2y + cz = 1
Solution:
We know that if the coefficient matrix, which is square, is invertible, then the system must have
a unique solution. So the only time that this system will not have a unique solution is when the
coefficient matrix has no inverse. We need to determine what value or values of c make the coefficient
matrix singular. We do this by trying to find the inverse.
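\[
\left[\begin{array}{ccc|ccc} 1 & 1 & 1 & 1 & 0 & 0 \\ 1 & 2 & 3 & 0 & 1 & 0 \\ 1 & 2 & c & 0 & 0 & 1 \end{array}\right]
\xrightarrow[R_3 \leftarrow R_3 - R_1]{R_2 \leftarrow R_2 - R_1}
\left[\begin{array}{ccc|ccc} 1 & 1 & 1 & 1 & 0 & 0 \\ 0 & 1 & 2 & -1 & 1 & 0 \\ 0 & 1 & c-1 & -1 & 0 & 1 \end{array}\right]
\]
\[
\xrightarrow[R_3 \leftarrow R_3 - R_2]{R_1 \leftarrow R_1 - R_2}
\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 2 & -1 & 0 \\ 0 & 1 & 2 & -1 & 1 & 0 \\ 0 & 0 & c-3 & 0 & -1 & 1 \end{array}\right]
\]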
As long as c − 3 ≠ 0, i.e. as long as c ≠ 3, we can multiply row 3 by 1/(c − 3) to obtain a leading 1 in
row 3, and then we will be able to clear out the rest of column 3 and get the columns of I3 on the
left side of the augmented matrix. Therefore for any c ≠ 3, the coefficient matrix is invertible and
therefore the SLE has a unique solution.
However, if c = 3, then the left side of the bottom row of the matrix is all 0’s and therefore when
we finish bringing the matrix to RREF, the left side of the matrix will not contain the columns of
an identity matrix, meaning that the coefficient matrix has no inverse. So it is only in the case c = 3
that the SLE can fail to have a unique solution.
Notice that the ero’s performed to bring the matrix [A | I] to RREF are the same (up to the point
at which the coefficient matrix A is in RREF, at least) as those which would be performed to bring
the augmented matrix of the SLE, [A | ~b] to RREF. And so when we solve the system, we will again
have the bottom row on the left of the augmented matrix containing only 0’s. Therefore column
3 of the RREF of the augmented matrix will not contain the leading one for any row. If the RHS
value in the bottom row of the augmented matrix is not also 0, the system will be inconsistent and
have no solution. But if the RHS value in the bottom row of the augmented matrix is also 0 (which
will be the case for the given SLE), then the system will have a one-parameter family of solutions,
i.e. will have infinitely many solutions.
Either way, when c = 3 the given SLE does not have a unique solution. But for any other value of
c, it does. So c = 3 is the only such value.
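The dependence on c can be seen by replaying the same elementary row operations for particular values of c. The following is a minimal sketch (the function name `bottom_row_after_eros` is hypothetical): it applies the two steps used above to [A | I] and returns row 3, whose third entry is c − 3.

```python
def bottom_row_after_eros(c):
    """Apply the ero's from the example to [A | I] and return row 3."""
    M = [[1, 1, 1, 1, 0, 0],
         [1, 2, 3, 0, 1, 0],
         [1, 2, c, 0, 0, 1]]
    # R2 <- R2 - R1 and R3 <- R3 - R1
    M[1] = [a - b for a, b in zip(M[1], M[0])]
    M[2] = [a - b for a, b in zip(M[2], M[0])]
    # R1 <- R1 - R2 and R3 <- R3 - R2 (both use the updated row 2)
    M[0] = [a - b for a, b in zip(M[0], M[1])]
    M[2] = [a - b for a, b in zip(M[2], M[1])]
    return M[2]

print(bottom_row_after_eros(3))  # [0, 0, 0, 0, -1, 1]: left half is all 0's
print(bottom_row_after_eros(5))  # [0, 0, 2, 0, -1, 1]: a leading 1 is available
```

Only c = 3 leaves the left half of the bottom row with nothing but 0's, which matches the conclusion of the example.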
Math 1229A/B
Unit 9:
Theory of SLE’s
(text reference: Section 3.3)
© V. Olds 2010
9 Theory of SLE’s
We have seen that any system of linear equations can be written as a matrix equation, i.e. a
statement about certain matrices. This means that characteristics of SLE’s are very closely related
to certain characteristics of matrices. Now that we know quite a bit about matrices, we can use
what we know, and some other things that we’ll learn in this unit, to recognize various properties
of systems of linear equations. (In the units yet to come, we’ll learn more about certain matrices,
which we will relate to SLE’s, too.)
We have seen examples of SLE’s which have no solution, and others which have a unique solution,
as well as some which have infinitely many solutions. We asserted before that these were the only
possibilities. Now that we know how to express an SLE as a matrix equation, and how to perform
matrix operations, and know the properties of those operations, we can actually prove that there
are no other possibilities.
Reminder: In the following theorem, and throughout this unit, it is important to remember that
when we express a system of linear equations as A~x = ~b, the “vectors” ~x and ~b are actually column
vectors. That is, they are really matrices, which happen to have only one column, so it’s more
convenient to express them as vectors. But they have the characteristics of matrices, not vectors,
and the operations we perform on them are matrix operations. And of course it’s also important
to remember what the dimensions of the various matrices in the equation are. For a system of m
equations in n unknowns, in the corresponding matrix equation A~x = ~b, A is an m × n matrix, ~x
is an n × 1 matrix and ~b is an m × 1 matrix. (In particular, keep in mind that ~x and ~b do not
necessarily have the same dimensions, even though both are column vectors.)
Theorem 9.1. Any system of linear equations which has more than one solution must have in-
finitely many solutions, i.e. must have a parametric family of solutions.
Proof:
Let A~x = ~b be any SLE with m equations in n unknowns (so that A is m × n, ~x is n × 1 and ~b is
m × 1). Suppose that the n × 1 column vectors ~x1 and ~x2 , with ~x1 ≠ ~x2 , are both solutions to this
system. That is, suppose that ~x1 and ~x2 are two different solutions to the SLE. Then A~x1 = ~b and
also A~x2 = ~b.
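Now let ~w = (1 − t)~x1 + t~x2 , where t is any real number. Then
\[
\begin{aligned}
A\vec{w} &= A\left[(1-t)\vec{x}_1 + t\vec{x}_2\right] && \text{(from our definition of } \vec{w}\text{)} \\
&= A(1-t)\vec{x}_1 + At\vec{x}_2 && \text{(by property 7 of Theorem 7.1)} \\
&= (1-t)A\vec{x}_1 + tA\vec{x}_2 && \text{(by property 14 of Theorem 7.1)} \\
&= (1-t)\vec{b} + t\vec{b} && \text{(because } A\vec{x}_1 = \vec{b} \text{ and } A\vec{x}_2 = \vec{b}\text{)} \\
&= \vec{b} - t\vec{b} + t\vec{b} && \text{(by property 9 of Theorem 7.1)} \\
&= \vec{b} && \text{(because we were adding and subtracting the same thing)}
\end{aligned}
\]
That is, we see that A~w = ~b, so ~w is also a solution to A~x = ~b.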
We see that whenever the system A~x = ~b has two different solutions ~x1 and ~x2 , it also has solutions
of the form w~ = (1 − t)~x1 + t~x2 for any real value of t. And since ~x1 and ~x2 are different solutions,
then these other solutions are all different from one another. And there are infinitely many of them,
so A~x = ~b has infinitely many solutions.
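The construction in the proof can be tried numerically. The sketch below uses, as an assumed illustration, the system from Example 8.6 with c = 3, for which (1, 0, 0) and (2, −2, 1) are two different solutions; the helper names `apply` and `combine` are hypothetical.

```python
from fractions import Fraction

A = [[1, 1, 1], [1, 2, 3], [1, 2, 3]]   # coefficient matrix when c = 3
b = [1, 1, 1]
x1 = [1, 0, 0]                          # one solution
x2 = [2, -2, 1]                         # a different solution

def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def combine(t):
    """The vector w = (1 - t) x1 + t x2 from the proof."""
    return [(1 - t) * u + t * v for u, v in zip(x1, x2)]

for t in [Fraction(-3), Fraction(1, 2), Fraction(7, 3)]:
    w = combine(t)
    # Every choice of t gives another solution of A x = b.
    print(str(t), [str(v) for v in w], apply(A, w) == b)
```

Each value of t produces a different vector w, and each one satisfies the system, just as the proof predicts.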
Since the 3 possibilities mentioned above are possible, and this theorem proves that no other
possibilities exist, then the following result follows directly from this theorem. (Note: Corollary
means a result which follows directly from a result already proven.)
Corollary 9.2. For any system of linear equations there are exactly 3 possibilities:
• the SLE may have no solution,
• the SLE may have a unique solution, or
• the SLE may have infinitely many solutions.
There is a particular characteristic which a matrix has, and which we have not yet defined, which
tells us a lot about how many solutions a system of linear equations can have when that matrix is
the coefficient matrix of the system. This characteristic is determined by the RREF of the matrix.
Definition: The rank of a matrix is the number of non-zero rows in the row-reduced
echelon form of the matrix. We can use r(A) to denote the rank of matrix A. Also, we
say that the m × n matrix A has full rank if r(A) = n, i.e. if every column of the
RREF of A contains the leading one for some row.
Of course, this means that if we want to find the rank of some matrix A, we need to find the
RREF of A. And then we just count the number of non-zero rows. That number is the value of r(A).
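Counting the non-zero rows of the RREF is easy to automate. The following is a minimal sketch, not a definitive implementation: the function name `rank` is hypothetical, entries are kept exact with `fractions`, and the count of leading ones found during row-reduction is returned (which equals the number of non-zero rows of the RREF).

```python
from fractions import Fraction

def rank(matrix):
    """Row-reduce a copy of the matrix to RREF and count its leading ones."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0  # index of the row that will receive the next leading one
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no leading one in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        p = rows[r][col]
        rows[r] = [v / p for v in rows[r]]           # make the leading 1
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:          # clear the rest of the column
                f = rows[i][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

print(rank([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))  # 1
print(rank([[1, 1, 1], [2, 3, 3], [3, 3, 5]]))  # 3
```

A matrix then has full rank, in the sense of the definition above, exactly when `rank(M) == len(M[0])`.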
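\[
\text{(b)}\quad
\begin{bmatrix} 1 & 1 & 1 \\ 2 & 2 & 2 \\ 3 & 3 & 3 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]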
This time, there are some zero rows in the RREF of A. In fact, the RREF of A has only one non-zero
row, so r(A) = 1. (And since A has more than 1 column, A does not have full rank.)
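\[
\text{(c)}\quad
\left[\begin{array}{cc|c} 1 & 1 & 1 \\ 2 & 3 & 3 \\ 3 & 3 & 5 \end{array}\right]
\xrightarrow{\text{RREF}}
\left[\begin{array}{cc|c} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right]
\]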
Notice that the given matrix, whose rank we are asked to find, is the augmented matrix, [A | ~b], not
the matrix A which is embedded in this augmented matrix. The RREF of [A | ~b] has 3 non-zero
rows, and so we have r([A | ~b]) = 3. And since the augmented matrix has 3 columns, this matrix
has full rank.
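Notice: From the RREF of the augmented matrix, we can also see that the matrix
\[
A = \begin{bmatrix} 1 & 1 \\ 2 & 3 \\ 3 & 3 \end{bmatrix}
\quad\text{has RREF}\quad
\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix},
\]
so r(A) = 2. And since A has only 2 columns, A also has full rank.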
Since the RREF of a matrix has the same number of rows and columns as the original matrix,
then clearly the rank of a matrix can never be larger than the number of rows in the matrix. But
also, the RREF of a matrix cannot contain more leading ones than there are columns for them to
occur in (since each row’s leading one must be in a different column), and any row which does not
have a leading one must contain only zeroes. And this means that the rank of the matrix also cannot
be larger than the number of columns in the matrix. (It also means that we can think of the rank
of a matrix as the number of leading ones in the RREF of the matrix.) So for any m × n matrix A,
it must be true that r(A) ≤ m and also that r(A) ≤ n (and so r(A) must be less than or equal to
the smaller of m and n).
When we solve a system of linear equations, using elementary row operations to bring the aug-
mented matrix to RREF (or at least to the point where the coefficient matrix is in RREF), we also
look at the number and positioning of the leading ones to see what solutions the system has. And
so the number of solutions that a system has is related to the rank – both the rank of the coeffi-
cient matrix and the rank of the augmented matrix. Let’s think about the possibilities one at a time.
If the RREF of the coefficient matrix has at least one zero row, and at least one of those rows has
a non-zero in the extra column, then we know that means that the SLE has no solution. Now think
about: what does this mean about the rank of the coefficient matrix, and the rank of the augmented
matrix? The final augmented matrix, in which the coefficient matrix is in RREF, has more non-zero
rows than the RREF of the coefficient matrix has. (And if we bring the whole augmented matrix
to RREF, it will have exactly one more non-zero row than the RREF of the coefficient matrix has.)
So the rank of the augmented matrix is larger than the rank of the coefficient matrix.
If that’s not the case, then clearly the final augmented matrix has the same number of non-zero
rows as the RREF of the coefficient matrix (and the whole augmented matrix is in RREF) so this
means that the rank of the augmented matrix is the same as the rank of the coefficient matrix.
But if some column of the RREF of the coefficient matrix does not contain the leading one for
any row, then we must introduce a parameter for the variable corresponding to that column and so
the system has infinitely many solutions. (There may be more than one such column, but for now
we’ll just characterise the SLE as having infinitely many solutions, no matter how many parameters
are needed to express those solutions.) In terms of rank, if there’s a column of the RREF of the
coefficient matrix which does not contain the leading one for any row, that means that the rank of
the coefficient matrix is less than the number of columns of the coefficient matrix (i.e. the number
of unknowns in the system).
On the other hand, if we don’t have “no solution” and we don’t have “infinitely many solutions”,
then the system must have a unique solution. That is, if we don’t find that the RREF of the coef-
ficient matrix has more zero rows than the final augmented matrix has (and therefore, as already
observed, the rank of the coefficient matrix is the same as the rank of the augmented matrix), and
every column of the RREF of the coefficient matrix contains the leading one for some row, then
the rank of the coefficient matrix is equal to the number of columns it contains (i.e. the number of
variables in the SLE). And the rank of the augmented matrix is the same.
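All of this means that knowing only the rank of the coefficient matrix and the rank of the augmented
matrix for some SLE, together with the number of unknowns, we can tell how many solutions the
SLE has. We summarize our findings in the following theorem.
Theorem 9.3. Let A~x = ~b be a system of m linear equations in n unknowns.
• If r(A) < r([A | ~b]), then the system has no solution.
• If r(A) = r([A | ~b]) = n, then the system has a unique solution.
• If r(A) = r([A | ~b]) < n, then the system has infinitely many solutions.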
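The rank comparisons worked out in the preceding paragraphs can be sketched as a small decision function. The function name `count_solutions` and its argument names are hypothetical; the only inputs are r(A), the rank of the augmented matrix, and the number of unknowns n.

```python
def count_solutions(rank_A, rank_aug, n):
    """Classify an SLE from r(A), r([A | b]) and the number of unknowns n."""
    # Appending one column can raise the rank by at most 1, and r(A) <= n.
    if not (0 <= rank_A <= n and rank_A <= rank_aug <= rank_A + 1):
        raise ValueError("impossible combination of ranks")
    if rank_A < rank_aug:
        return "no solution"          # a leading one sits in the extra column
    if rank_A == n:
        return "unique solution"      # every column has a leading one
    return "infinitely many solutions"  # some column needs a parameter

print(count_solutions(3, 3, 4))  # infinitely many solutions
print(count_solutions(3, 4, 3))  # no solution
print(count_solutions(3, 3, 3))  # unique solution
```

Note that the number of equations m does not appear: once the two ranks and n are known, m plays no further role in the classification.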
Solution:
(a) Since r(A) = r([A | ~b]) = 3, but A has 4 columns, so that the system has 4 unknowns, i.e.
n = 4, then we have r(A) = r([A | ~b]) < n and so the system has infinitely many solutions.
(Since r(A) = r([A | ~b]), then the RREF of A has the same number of non-zero rows as the RREF
of the augmented matrix, so there is no row in the final augmented matrix in which there are only 0’s
in the coefficient matrix part, and a non-zero in the extra column, so the system is consistent. And
since r(A) = 3 is the number of leading ones in the RREF of A, but A has 4 columns, then clearly one
column of the RREF of A does not contain a leading one for any row, so a parameter must be intro-
duced for the variable corresponding to that column and so the system has infinitely many solutions.)
(b) Since r(A) = 3 < r([A | ~b]) = 4 then the system has no solution.
(Since r(A) = 3 but A has 4 rows, and the rank of A is the number of leading ones in the RREF of
A, then clearly the RREF of A has one row which contains only zeroes, i.e. a zero row. But we have
r([A | ~b]) = 4, and this gives the number of leading ones in the RREF of the augmented matrix.
Since the coefficient matrix part of the RREF of the augmented matrix is just the RREF of A, and
there are only 3 leading ones in the RREF of A, then the fourth leading one must be in the row in
which the RREF of A has only zeroes. That is, in order for the rank of the augmented matrix to
be 4 when the rank of A is only 3, the last row of the 4 × 4 augmented matrix [A | ~b] must have
only zeroes in the coefficient matrix part, with its leading one in the extra column. And that tells
us that the system has no solution.)
(c) Since A is a 4 × 3 matrix, then the system has only n = 3 unknowns. And since r(A) =
r([A | ~b]) = 3 = n, then the system has a unique solution.
(Since r(A) = 3 and A has only 3 columns, then every column of the RREF of A contains the leading
one for some row. However, A has 4 rows, so clearly the RREF of A has a row of only zeroes. But
that row must also have a 0 in the extra column, because the rank of the augmented matrix is the
same as the rank of A. Therefore there is not “no solution”, and we don’t need to introduce any
parameters, so the system has a unique solution.)
Now let’s think again about a system that has infinitely many solutions, and think about the
number of columns of the RREF of A which don’t contain the leading one for any row, i.e. the
number of parameters needed to express those infinitely many solutions. We have already seen that
for a system with infinitely many solutions, it must be true that the coefficient matrix and the
augmented matrix have the same rank, and that this rank is less than the number of variables in
the system.
Let n be the number of variables in the SLE (and hence also the number of columns of A) and
let p = r(A). Then we know that (since the system has infinitely many solutions) p < n. The rank
of A is the number of leading ones in the RREF of A, so the RREF of A contains p leading ones,
which appear in p different columns. But there are n columns in A, so there must also be n − p
columns which don’t contain the leading one for any row. And to express the solution set for the
system, we need to introduce a parameter for each of these columns. Therefore the system has an
(n − p)-parameter family of solutions.
This means that if we know the rank of the coefficient matrix, the rank of the augmented matrix
and the number of variables in the system, we can not only tell whether the system has infinitely
many solutions, but also if it does, we can tell how many parameters are needed to express those
solutions.
Theorem 9.4. Let A~x = ~b be a system of m linear equations in n unknowns. Let p = r(A). If
p < n and r([A | ~b]) = p as well, then the system has an (n − p)-parameter family of solutions.
For instance, if we know that A~x = ~b has 21 variables, where r(A) = r([A | ~b]) = 17, then
the system must have a 4-parameter family of solutions. (That is, since the coefficient matrix and
the augmented matrix have the same rank, the system is consistent. And we have r(A) = 17, with
n = 21, so A does not have full rank and (since there are r(A) leading ones in the RREF of A) there
are 21 − 17 = 4 columns of the RREF of A which do not contain the leading one for any row, and
so 4 parameters are needed to express the infinitely many solutions to the system.)
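The parameter count is simple arithmetic once the ranks are known. As a minimal sketch (the function name `parameter_count` is hypothetical), the check that the two ranks agree is what guarantees the system is consistent before n − p is reported:

```python
def parameter_count(n, rank_A, rank_aug):
    """Number of parameters in the solution set of a consistent SLE."""
    if rank_A != rank_aug:
        raise ValueError("the system is inconsistent, so it has no solutions")
    # One parameter for each column of the RREF of A without a leading one.
    return n - rank_A

print(parameter_count(21, 17, 17))  # 4
```

A result of 0 corresponds to the unique-solution case, in which no parameters are needed.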
So far, we have focussed only on the ranks of the coefficient matrix and the augmented matrix.
That is, we have considered ~b only in conjunction with A, in comparing the rank of the augmented
matrix to the rank of the coefficient matrix and the number of unknowns in the system. However,
there is also one property of the right hand side values, i.e. the column vector ~b, which some SLE’s
have, which can give us some information about the number of solutions a system can have, even
without knowing anything at all about the coefficient matrix. We have a special name for a system
in which ~b has this property.
Definition: The system of linear equations corresponding to the matrix equation A~x = ~b
is called homogeneous whenever ~b = ~0, i.e. when the right hand side values are all 0.
A system in which ~b ≠ ~0 is said to be nonhomogeneous.
Example 9.3. Characterise each of the following SLE’s as either homogeneous or nonhomogeneous.
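\[
\text{(a)}\;\; \begin{aligned} x + y &= 0 \\ 5x - 4y &= 0 \end{aligned}
\qquad
\text{(b)}\;\; \begin{aligned} x &= y - z \\ z &= 2y - 3x \end{aligned}
\qquad
\text{(c)}\;\; \begin{aligned} x + z &= y \\ x &= 2y + z \\ x &= 4 - 3y \end{aligned}
\]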
Solution:
(a) The SLE is in standard form, so we can clearly see that the right hand side values are all 0.
Therefore this system is homogeneous.
(b) This time the SLE is not in standard form. If we put it into standard form, it is easier to see
the RHS values:
\[
\begin{aligned} x &= y - z \\ z &= 2y - 3x \end{aligned}
\quad\text{in standard form, becomes}\quad
\begin{aligned} x - y + z &= 0 \\ 3x - 2y + z &= 0 \end{aligned}
\]
Now we can easily see that this SLE is also homogeneous.
(c) Again, we start by putting the SLE into standard form (although you may already see what the
answer will be):
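\[
\begin{aligned} x + z &= y \\ x &= 2y + z \\ x &= 4 - 3y \end{aligned}
\quad\text{in standard form, becomes}\quad
\begin{aligned} x - y + z &= 0 \\ x - 2y - z &= 0 \\ x + 3y &= 4 \end{aligned}
\]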
This time, one of the RHS values is not 0, i.e. it is not the case that the RHS values are all 0, so
this system is nonhomogeneous.
There is one property which every homogeneous SLE has. We know, from property 12 of Theo-
rem 7.1, that for any matrix A, A~0 = ~0 (where, if A is m × n, the first zero column vector mentioned
is n × 1 and the second is m × 1, of course). But then that means that for any m × n matrix A,
the n × 1 column vector ~0 is a solution to the homogeneous system with coefficient matrix A. So
we already know one solution to any homogeneous SLE. That solution, which is so trivial that we
already know what it is, without knowing anything about the system other than that it is homoge-
neous, is called the trivial solution.
Definition: In a homogeneous SLE, the solution ~0 is called the trivial solution. Any
other solutions which the system may have are referred to as nontrivial solutions.
So we know that any homogeneous SLE has at least one solution – the trivial solution. And that
means that one of the 3 possibilities in Corollary 9.2 can’t happen for a homogeneous system. That
is, a homogeneous system cannot “have no solution”!
Theorem 9.5. Any homogeneous system of linear equations either has exactly one solution, the
trivial solution, or else has infinitely many solutions (including the trivial solution).
It shouldn’t be too hard for you to realize that in a homogeneous system, the rank of the augmented
matrix is always the same as the rank of the coefficient matrix. That is, r([A | ~0]) = r(A).
The RREF of the augmented matrix cannot have more leading ones than the RREF of the coefficient
matrix, because there cannot be a leading one in the extra column. No matter what elementary
row operations we perform, the RHS values in the augmented matrix of a homogeneous system are
always all zeroes. As we perform ero’s we may move those 0’s around, multiply them by non-zero
constants, or add a multiple of one 0 to another, but none of that will make any of them be anything
but 0.
This eliminates one of the situations in Theorem 9.3 when we’re dealing with a homogeneous
SLE. Of course it does, because we have eliminated one of the possibilities from Corollary 9.2, and
the 3 situations in Theorem 9.3 exactly correspond to the 3 possibilities from Corollary 9.2. Thus
we have a Corollary to Theorem 9.3 for the situation in which the SLE is homogeneous. And we
might as well incorporate the number of parameters into this, as well, from Theorem 9.4.
Corollary 9.6. For any m × n matrix A with rank r(A) = p, the homogeneous system A~x = ~0 has:
• a unique solution if p = n, or
• an (n − p)-parameter family of solutions if p < n.
You surely already realize, and we may have previously observed, that a system of linear equa-
tions in which there are fewer equations than unknowns cannot possibly have a unique solution. In
terms of row-reducing, the RREF of a coefficient matrix in which there are more columns than rows
cannot have a leading one in every column, because there are at most as many leading ones as there
are rows. In terms of rank, we have already commented on the fact that an m × n matrix A must
have r(A) ≤ m as well as r(A) ≤ n, so if m < n we certainly have r(A) < n.
In general, knowing that a SLE does not have a unique solution doesn’t tell us very much about
the number of solutions it does have. We cannot conclude that there are infinitely many solutions,
because we cannot eliminate the possibility that the system might have no solution.
But for a homogeneous system, that other possibility is already eliminated. We know that ev-
ery homogeneous system has at least one solution, and if there are more unknowns than there are
equations then there cannot be “only the trivial solution”. Therefore there must be infinitely many
solutions. (Knowing only that the system is homogeneous and that there are more unknowns than
equations doesn’t tell us exactly how many parameters are needed to express the infinitely many
solutions. We cannot find that without row-reducing, either to solve the system or to find the rank
of the coefficient matrix, because the RREF of the coefficient matrix may have some zero rows.)
This gives us another corollary, for the situation in which the coefficient matrix of a homogeneous
system cannot have full rank.
Corollary 9.7. Any homogeneous SLE in which the number of unknowns is larger than the number
of equations has infinitely many solutions.
This means that sometimes we can tell just at a glance how many solutions a system of linear
equations has. We use this in the following example.
Example 9.4. If possible, determine how many solutions each of the following SLE’s has just from
looking at it.
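\[
\text{(a)}\;\; \begin{aligned} 2x + 2y - 5z &= 0 \\ 23x + 14y - z &= 0 \end{aligned}
\]
\[
\text{(b)}\;\; \begin{aligned} 2x + 2y - 5z &= 0 \\ 23x + 14y - z &= 1 \end{aligned}
\]
\[
\text{(c)}\;\; \begin{aligned} 2x + 2y - 5z &= 0 \\ 23x + 14y - z &= 0 \\ 11x - 32y + 14z &= 0 \end{aligned}
\]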
Solution:
(a) The system
\[
\begin{aligned} 2x + 2y - 5z &= 0 \\ 23x + 14y - z &= 0 \end{aligned}
\]
is homogeneous, with 3 variables and only 2 equations, so it must have infinitely many solutions.
(b) The system
\[
\begin{aligned} 2x + 2y - 5z &= 0 \\ 23x + 14y - z &= 1 \end{aligned}
\]
is not homogeneous. Since there are 3 variables and only 2 equations, we know that it does not have
a unique solution, but we cannot tell how many solutions it does have. There may be no solution, or
infinitely many solutions.
(c) The system
\[
\begin{aligned} 2x + 2y - 5z &= 0 \\ 23x + 14y - z &= 0 \\ 11x - 32y + 14z &= 0 \end{aligned}
\]
is homogeneous. But since it does not have more variables than equations, we can’t tell whether it
has only the trivial solution, or infinitely many solutions.
is not homogeneous. However, because of the structure of the equations, we can see that this system
has infinitely many solutions. If we were to form the augmented matrix for this system, there would
be no work to do in bringing it to RREF, because it would already be there. We can tell just from
looking at the system that both the coefficient matrix and the augmented matrix have rank 3, which
is less than the number of unknowns (which is 6), and so the system must have a 3-parameter family
of solutions. (We can even see that (1, 3, 7, 0, 0, 0) is one solution, and that the parametric family of
solutions is (1 − 3r − 5s + 2t, 3 − 5r + 3s − 21t, 7 + 3r − 7s, r, s, t).)
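For part (a) of this example, the claim that a homogeneous system with more unknowns than equations has nontrivial solutions can be checked directly. The sketch below produces a candidate using the cross product of the two rows of coefficients; that shortcut is an assumption of this illustration (it is not the method used in this unit, and it only applies to 2 equations in 3 unknowns), and the helper names `cross` and `dot` are hypothetical.

```python
def cross(u, v):
    """Cross product of two vectors in 3-space."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def dot(u, v):
    """Dot product of two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))

row1 = [2, 2, -5]     # coefficients of 2x + 2y - 5z = 0
row2 = [23, 14, -1]   # coefficients of 23x + 14y - z = 0

d = cross(row1, row2)
print(d)  # [68, -113, -18], a nontrivial solution

# Every scalar multiple of d also satisfies both equations.
for t in [-2, 0, 1, 5]:
    w = [t * c for c in d]
    print(t, dot(row1, w) == 0 and dot(row2, w) == 0)
```

Since every multiple of (68, −113, −18) is a solution, the system has infinitely many solutions, as Corollary 9.7 guarantees.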
In this unit, we have observed a number of theoretical results which tell us about the number of
solutions a system of linear equations has. If you understand what the rank of a matrix is (just the
number of non-zero rows in the RREF of the matrix, which can also be thought of as the number of
leading ones in the RREF matrix), then the implications for the number of solutions aren’t really
very hard to keep track of. It’s mostly the same as our rules for identifying how many solutions
there are, and how many parameters we need to introduce, when we finish row-reducing the matrix.
But there have been a number of different pieces of information about a SLE or its coefficient
matrix, both in this unit and earlier, that all lead to the same important conclusion – that the
system has a unique solution. In the following theorem, we collect all of these pieces of information
together. (And later, we’ll add one more piece of information, having to do with something we
haven’t learnt about yet, that also leads to this same conclusion.)
Theorem 9.8. If A is a square matrix of order n then the following statements are equivalent to
one another.
1. A is invertible (i.e., nonsingular).
2. r(A) = n (i.e., A has full rank).
3. The RREF of A is I (i.e., A is row-equivalent to the identity matrix).
4. The system A~x = ~b has a unique solution (for all n × 1 column vectors ~b).
5. The homogeneous system A~x = ~0 has only the trivial solution.
Notice: Saying that these statements are all equivalent tells us that if any one of them is true for a
particular matrix A, then all of them are true for that matrix. So for instance knowing that A is
invertible tells us that r(A) = n and also that A~x = ~b has a unique solution no matter what ~b is,
and so on. But it’s important to remember that this only applies to square matrices.
Math 1229A/B
Unit 10:
Determinants
(text reference: Section 4.1)
© V. Olds 2010
10 Determinants
Square matrices are a special class of matrices. We have already seen one instance of a concept
which is defined only for square matrices — the inverse matrix. That is, only a square matrix may
have an inverse. In this unit we will (begin to) learn about another concept which is defined only
for square matrices — the determinant of a matrix.
The number which is the determinant of a square matrix measures a certain characteristic of the
matrix. In a more advanced study of matrix algebra, this characteristic is used for various purposes.
In this course, the only way in which we will use this number is in its connection to the existence of
the inverse of the matrix, and through that its application to SLE’s in which the coefficient matrix
is a square matrix. For these purposes, what will matter to us is whether or not this number, the
determinant of the matrix, is 0. But of course, in order to determine whether or not the determinant
of a particular matrix is 0, we need to know how to calculate that number.
Calculating the determinant of a square matrix is somewhat complicated. The definition is re-
cursive, meaning that the calculation is defined in a straightforward way for small matrices, and
then for larger matrices, the determinant is defined as being a calculation involving the determinants
of smaller matrices, which are certain submatrices of the matrix. We could express this recursive
definition of the determinant of a square matrix of order n as applying for all n ≥ 2, specifically
defining only the determinant of a square matrix of order 1, i.e. a (square) matrix containing only
a single number. However, the calculation for a 2 × 2 matrix is very straightforward — easier to
think of as a special definition all on its own — so instead we use specific definitions for n = 1 and
n = 2, and then define the determinant of a square matrix of order n > 2 in terms of determinants
of submatrices of order n − 1, which are found by expressing them in terms of determinants of
successively smaller submatrices until we get down to submatrices of order 2. The calculation of
det A as defined in this way, when A is a square matrix of order n > 2, is not really as complicated
as it will look. It’s just a matter of applying a certain formula carefully, as many times as necessary
until we have expressed det A in terms of the determinants of 2 × 2 matrices. Those determinants
are easy to find.
So we start by defining det A for square matrices of order 1 and of order 2. When A is a 1 × 1
matrix, i.e. a matrix containing only one number, finding the particular number det A which is
associated with that matrix is trivial. That number is the only number around — the single number
that’s in the matrix. For a square matrix of order 2, i.e. a matrix containing 4 numbers arranged
in a square, we have to do a little more work. But it’s a simple calculation. In fact, we can think
of the calculation as “down products minus up products”, which is something we have seen before.
But this time there’s only one down product, and only one up product, so it’s actually just “down
product minus up product”.
If
\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \]
then det A = ad − cb. That is, if A is a 2 × 2 matrix, then det A = $a_{11} a_{22} - a_{21} a_{12}$.
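Example 10.1. Find the determinants of the following matrices:
\[ \text{(a) } A = \begin{bmatrix} 5 \end{bmatrix} \qquad \text{(b) } B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \qquad \text{(c) } C = \begin{bmatrix} 2 & 0 \\ 1 & 3 \end{bmatrix} \]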
Solution:
(a) Here, A is 1 × 1, so det A = a11 = 5. That is, the determinant of this matrix is just the number
that’s in the matrix.
(b) For a 2×2 matrix, we use the formula det B = b11 b22 −b21 b12 . That is, we take the product of the
numbers going diagonally down to the right (i.e., on the main diagonal) and then subtract from that
the product of the numbers going diagonally up to the right. So we take the down product minus the
up product. Here, the down product is b11 b22 = 1(4) = 4 and the up product is b21 b12 = 3(2) = 6.
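Therefore det B = 1(4) − 3(2) = 4 − 6 = −2.
(c) Similarly, for the 2 × 2 matrix C, the down product is 2(3) = 6 and the up product is 1(0) = 0, so
det C = 2(3) − 1(0) = 6 − 0 = 6.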
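The “down product minus up product” rule is easy to check by machine. Here is a minimal Python sketch; the function name det2 and the list-of-rows representation are just choices made for this illustration, not anything from the notes.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    down = a * d  # product down the main diagonal
    up = c * b    # product going diagonally up to the right
    return down - up


B = [[1, 2], [3, 4]]  # matrix B from Example 10.1
C = [[2, 0], [1, 3]]  # matrix C from Example 10.1
print(det2(B), det2(C))
```

Running it on B and C should reproduce the values found by hand for those two matrices.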
Before we can define how the determinant of a larger matrix is defined in terms of determinants of
certain submatrices, we need to define what those submatrices are, and some notation for indicating
what submatrix we’re referring to. We will also define terminology which means the determinant
of a particular submatrix, and for another number obtained from that determinant, which is that
same number, but sometimes with the sign changed.
Notice that if we have a square matrix of order n, we can obtain various submatrices of order
n − 1 by deleting both one row and one column of the larger matrix. In fact, the matrix doesn’t
have to be square. For any m × n matrix with m > 1 and n > 1 we can obtain an (m − 1) × (n − 1)
submatrix by deleting one row and one column of the larger matrix. So we will define these sub-
matrices for any matrix that has more than one row and more than one column, but we’ll only be
using them in the context of the original matrix being a square matrix. We simply need to indicate
which row and which column are to be deleted.
Definition: For any m × n matrix A with m > 1 and n > 1, the submatrix Aij is the
(m − 1) × (n − 1) submatrix of A obtained by deleting row i and column j.
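Example 10.2. For
\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \]
find A21 and B11 .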
Solution:
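We find A21 by deleting row 2 and column 1 of
\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}. \]
Since A is a 3 × 2 matrix, A21 will
be a 2 × 1 matrix, consisting of the parts of rows 1 and 3 of A which are not in column 1. That is,
when we’ve deleted row 2 and column 1, all that’s left is the column 2 entries for rows 1 and 3. We
get
\[ \begin{bmatrix} \cancel{1} & 2 \\ \cancel{3} & \cancel{4} \\ \cancel{5} & 6 \end{bmatrix} \quad \text{gives} \quad A_{21} = \begin{bmatrix} 2 \\ 6 \end{bmatrix} \]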
Similarly, we get B11 by deleting the first row and the first column from B. This will give the 1 × 1
submatrix which contains only the number that’s in row 2, column 2. That is,
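\[ \text{For } B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad \begin{bmatrix} \cancel{1} & \cancel{2} \\ \cancel{3} & 4 \end{bmatrix} \quad \text{gives} \quad B_{11} = \begin{bmatrix} 4 \end{bmatrix} \]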
Example 10.3. For
\[ A = \begin{bmatrix} 1 & 2 & 1 & 3 \\ 2 & 1 & 3 & 1 \\ 2 & 5 & 3 & 4 \\ 3 & 7 & 8 & 9 \end{bmatrix}, \]
find A11 and A23 .
Solution:
To find A11 , we delete both the first row and the first column. We get
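\[ \begin{bmatrix} \cancel{1} & \cancel{2} & \cancel{1} & \cancel{3} \\ \cancel{2} & 1 & 3 & 1 \\ \cancel{2} & 5 & 3 & 4 \\ \cancel{3} & 7 & 8 & 9 \end{bmatrix} \quad \text{gives} \quad A_{11} = \begin{bmatrix} 1 & 3 & 1 \\ 5 & 3 & 4 \\ 7 & 8 & 9 \end{bmatrix} \]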
For A23 , we delete row 2 and column 3 from the matrix A. We see that
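\[ \begin{bmatrix} 1 & 2 & \cancel{1} & 3 \\ \cancel{2} & \cancel{1} & \cancel{3} & \cancel{1} \\ 2 & 5 & \cancel{3} & 4 \\ 3 & 7 & \cancel{8} & 9 \end{bmatrix} \quad \text{gives} \quad A_{23} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 4 \\ 3 & 7 & 9 \end{bmatrix} \]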
Notice that A11 is a submatrix that we could see, intact, within matrix A, whereas the submatrix
A23 contains non-contiguous parts of A, because things have been deleted from the midst of rows
and columns. But the numbers from A that are in A23 still have the same relative positions to one
another.
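Deleting one row and one column is mechanical enough that we can sketch it in a few lines of Python. The helper name submatrix and the list-of-rows representation are assumptions made for this illustration; row and column numbers are counted from 1, as in the definition of Aij.

```python
def submatrix(a, i, j):
    """Return A_ij: the matrix a with row i and column j deleted (counting from 1)."""
    return [
        [entry for c, entry in enumerate(row, start=1) if c != j]
        for r, row in enumerate(a, start=1)
        if r != i
    ]


A = [[1, 2, 1, 3],
     [2, 1, 3, 1],
     [2, 5, 3, 4],
     [3, 7, 8, 9]]  # the matrix from Example 10.3

print(submatrix(A, 1, 1))  # A11
print(submatrix(A, 2, 3))  # A23
```

The entries that survive keep their relative positions, just as observed above, because the list comprehension only skips entries and never reorders them.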
We have a special name for the determinant of submatrix Aij of a matrix A. Also, in our calculation
of the determinant of A we will use det Aij , but we will need the negative of this number
whenever i and j are not both odd or both even. Notice that the sum of two odd numbers is even, as
is the sum of two even numbers. But the sum of an odd number and an even number is odd. And of
course, if we raise −1 to an odd power, the value is −1, whereas if we raise −1 to an even power, the
value is 1. So we can accomplish “use det Aij if i and j are both odd or both even, but otherwise use the
negative of det Aij ” by multiplying det Aij by $(-1)^{i+j}$. We have a special name for this product, too.
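Definition: Let A be any square matrix of order n > 1 and let Aij be the submatrix
obtained by deleting row i and column j. The i,j-minor of A, denoted Mij , is the number
Mij = det Aij , and the i,j-cofactor of A, denoted Cij , is the number Cij = $(-1)^{i+j}$ Mij .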
To find a cofactor of A, we find the corresponding minor and multiply it by −1 raised to the power
i + j (so that we multiply by −1 only if one of the row number and column number is odd and the
other is even). We get:
\[
\begin{aligned}
C_{23} &= (-1)^{2+3} M_{23} \\
&= -[(-3)(-3) - 2(2)] \\
&= -(9 - 4) \\
&= -5
\end{aligned}
\]
= (2)(6) − 5(1)
= 12 − 5
= 7
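To see minors and cofactors side by side, here is a small Python sketch for a 3 × 3 matrix. The function names (submatrix, det2, minor, cofactor) are hypothetical helpers for this illustration, and the matrix used is the submatrix A11 found in Example 10.3, chosen only because it is a handy 3 × 3 matrix.

```python
def submatrix(a, i, j):
    """Delete row i and column j (counting from 1)."""
    return [[x for c, x in enumerate(row, start=1) if c != j]
            for r, row in enumerate(a, start=1) if r != i]


def det2(m):
    """Down product minus up product for a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[1][0] * m[0][1]


def minor(a, i, j):
    """M_ij = det A_ij (valid here only when a is 3 x 3, so that A_ij is 2 x 2)."""
    return det2(submatrix(a, i, j))


def cofactor(a, i, j):
    """C_ij = (-1)^(i+j) * M_ij."""
    return (-1) ** (i + j) * minor(a, i, j)


M = [[1, 3, 1],
     [5, 3, 4],
     [7, 8, 9]]
print(minor(M, 2, 3), cofactor(M, 2, 3))  # i + j is odd, so the sign flips
print(minor(M, 1, 1), cofactor(M, 1, 1))  # i + j is even, so the sign is kept
```

The only difference between the two functions is the factor $(-1)^{i+j}$, which is exactly the “sometimes with the sign changed” idea.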
Notice that we don’t, yet, know how to find minors or cofactors of a square matrix of order
4, or of any order larger than 3, because we haven’t yet defined how to find the determinant of a
square matrix of order larger than 2. For instance, the 1,1-minor of a 4 × 4 matrix A is simply the
determinant of A11 , but since A11 is a 3 × 3 matrix, we don’t know how to calculate that. But
we do, now, know everything we need to in order to define this. That is, we have assembled all
the pieces to allow us to recursively define how to find the determinant of a square matrix of order
bigger than 2. We define it in terms of the cofactors of certain submatrices. The definition we will
now state will give a particular way of calculating det A. But then afterwards, we’ll have a theorem
that shows other, similar, ways of calculating it, using a different series of cofactors. There are, in
fact, 2n different ways that we could calculate the determinant of an n × n matrix, using cofactors
determined by any one particular row or column. In our definition, we’ll use the cofactors of row 1.
We multiply each entry of this row by the cofactor with the same index. (That is, multiply a1j by
C1j .) And then we add them all up.
That is, we multiply each entry of row 1 by the corresponding cofactor of A, which is to
say that we multiply that entry by the determinant of the submatrix obtained by deleting
the row and column in which that entry occurs, or the negative of that determinant if
the row and column are not either both odd or both even. Using $\sum_{j=1}^{n}$ to denote “do this
calculation for every value of j from 1 to n, and add them all up”, we have
\[ \det A = \sum_{j=1}^{n} a_{1j} (-1)^{1+j} \det A_{1j} \]
Example 10.5. Find det A, where
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \\ 3 & 2 & 1 \end{bmatrix}. \]
Solution:
We calculate det A as it says in the definition:
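\[
\begin{aligned}
\det A &= \sum_{j=1}^{3} a_{1j} C_{1j} = \sum_{j=1}^{3} a_{1j} (-1)^{1+j} \det A_{1j} \\
&= a_{11} (-1)^{1+1} \det A_{11} + a_{12} (-1)^{1+2} \det A_{12} + a_{13} (-1)^{1+3} \det A_{13} \\
&= (1)(-1)^{2} \det \begin{bmatrix} 1 & 3 \\ 2 & 1 \end{bmatrix} + (2)(-1)^{3} \det \begin{bmatrix} 2 & 3 \\ 3 & 1 \end{bmatrix} + (3)(-1)^{4} \det \begin{bmatrix} 2 & 1 \\ 3 & 2 \end{bmatrix} \\
&= [(1)(1) - (2)(3)] - 2[(2)(1) - (3)(3)] + 3[(2)(2) - (3)(1)] \\
&= (1 - 6) - 2(2 - 9) + 3(4 - 3) \\
&= -5 - 2(-7) + 3(1) = -5 + 14 + 3 = 12
\end{aligned}
\]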
Notice that the $(-1)^{i+j}$ multipliers made the sign alternate across the row. That is, for the a11 term
it is +1, then for the a12 term it is −1 and for the a13 term it is +1 again. So we could express
det A as
\[ \det A = a_{11} \det A_{11} - a_{12} \det A_{12} + a_{13} \det A_{13} \]
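The recursive definition translates almost word for word into a recursive function. The following Python sketch is one possible way to write it, assuming matrices are stored as lists of rows; the names det and submatrix are choices made for this illustration.

```python
def submatrix(a, i, j):
    """Delete row i and column j (counting from 1)."""
    return [[x for c, x in enumerate(row, start=1) if c != j]
            for r, row in enumerate(a, start=1) if r != i]


def det(a):
    """Determinant by expansion along row 1, following the recursive definition."""
    n = len(a)
    if n == 1:
        return a[0][0]                                 # the only number around
    if n == 2:
        return a[0][0] * a[1][1] - a[1][0] * a[0][1]   # down product minus up product
    # Order n > 2: sum of a_1j * (-1)^(1+j) * det A_1j over j = 1, ..., n.
    return sum(a[0][j - 1] * (-1) ** (1 + j) * det(submatrix(a, 1, j))
               for j in range(1, n + 1))


A = [[1, 2, 3],
     [2, 1, 3],
     [3, 2, 1]]  # the matrix from Example 10.5
print(det(A))
```

Each call on an n × n matrix makes n calls on (n − 1) × (n − 1) matrices, so the amount of work grows very quickly with n. That is fine for the small matrices in these notes, but it is not how large determinants are computed in practice.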
Example 10.6. If
\[ A = \begin{bmatrix} 1 & -2 & -4 & 5 \\ 0 & 3 & 0 & 0 \\ 0 & -1 & 3 & 2 \\ 0 & 4 & -5 & 2 \end{bmatrix}, \]
find det A.
Solution: Again, we use the definition to express det A in terms of the determinants of certain
submatrices of A:
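\[
\begin{aligned}
\det A &= a_{11} (-1)^{1+1} \det A_{11} + a_{12} (-1)^{1+2} \det A_{12} + a_{13} (-1)^{1+3} \det A_{13} + a_{14} (-1)^{1+4} \det A_{14} \\
&= 1(-1)^{2} \det \begin{bmatrix} 3 & 0 & 0 \\ -1 & 3 & 2 \\ 4 & -5 & 2 \end{bmatrix} + (-2)(-1)^{3} \det \begin{bmatrix} 0 & 0 & 0 \\ 0 & 3 & 2 \\ 0 & -5 & 2 \end{bmatrix} \\
&\quad + (-4)(-1)^{4} \det \begin{bmatrix} 0 & 3 & 0 \\ 0 & -1 & 2 \\ 0 & 4 & 2 \end{bmatrix} + 5(-1)^{5} \det \begin{bmatrix} 0 & 3 & 0 \\ 0 & -1 & 3 \\ 0 & 4 & -5 \end{bmatrix} \\
&= 1 \det \begin{bmatrix} 3 & 0 & 0 \\ -1 & 3 & 2 \\ 4 & -5 & 2 \end{bmatrix} - (-2) \det \begin{bmatrix} 0 & 0 & 0 \\ 0 & 3 & 2 \\ 0 & -5 & 2 \end{bmatrix} \\
&\quad + (-4) \det \begin{bmatrix} 0 & 3 & 0 \\ 0 & -1 & 2 \\ 0 & 4 & 2 \end{bmatrix} - 5 \det \begin{bmatrix} 0 & 3 & 0 \\ 0 & -1 & 3 \\ 0 & 4 & -5 \end{bmatrix}
\end{aligned}
\]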
Notice that once again the +’s and −’s resulting from the $(-1)^{1+j}$’s are alternating. Now, to find
the determinant of each of those 3 × 3 submatrices of A, we need to use the definition again, to
express each in terms of determinants of 2 × 2 submatrices. Of course, this time the 1 and j indices
in the $(-1)^{1+j}$ term (as well as the $a_{1j}$ and $A_{1j}$ terms) are the row and column numbers in the
submatrix whose determinant we are currently calculating, not the row and column numbers from
the original matrix.
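\[
\begin{aligned}
\det A &= 1 \det \begin{bmatrix} 3 & 0 & 0 \\ -1 & 3 & 2 \\ 4 & -5 & 2 \end{bmatrix} - (-2) \det \begin{bmatrix} 0 & 0 & 0 \\ 0 & 3 & 2 \\ 0 & -5 & 2 \end{bmatrix}
+ (-4) \det \begin{bmatrix} 0 & 3 & 0 \\ 0 & -1 & 2 \\ 0 & 4 & 2 \end{bmatrix} - 5 \det \begin{bmatrix} 0 & 3 & 0 \\ 0 & -1 & 3 \\ 0 & 4 & -5 \end{bmatrix} \\
&= 1 \left\{ 3(-1)^{1+1} \det \begin{bmatrix} 3 & 2 \\ -5 & 2 \end{bmatrix} + 0(-1)^{1+2} \det \begin{bmatrix} -1 & 2 \\ 4 & 2 \end{bmatrix} + 0(-1)^{1+3} \det \begin{bmatrix} -1 & 3 \\ 4 & -5 \end{bmatrix} \right\} \\
&\quad - (-2) \left\{ 0(-1)^{1+1} \det \begin{bmatrix} 3 & 2 \\ -5 & 2 \end{bmatrix} + 0(-1)^{1+2} \det \begin{bmatrix} 0 & 2 \\ 0 & 2 \end{bmatrix} + 0(-1)^{1+3} \det \begin{bmatrix} 0 & 3 \\ 0 & -5 \end{bmatrix} \right\} \\
&\quad + (-4) \left\{ 0(-1)^{1+1} \det \begin{bmatrix} -1 & 2 \\ 4 & 2 \end{bmatrix} + 3(-1)^{1+2} \det \begin{bmatrix} 0 & 2 \\ 0 & 2 \end{bmatrix} + 0(-1)^{1+3} \det \begin{bmatrix} 0 & -1 \\ 0 & 4 \end{bmatrix} \right\} \\
&\quad - 5 \left\{ 0(-1)^{1+1} \det \begin{bmatrix} -1 & 3 \\ 4 & -5 \end{bmatrix} + 3(-1)^{1+2} \det \begin{bmatrix} 0 & 3 \\ 0 & -5 \end{bmatrix} + 0(-1)^{1+3} \det \begin{bmatrix} 0 & -1 \\ 0 & 4 \end{bmatrix} \right\}
\end{aligned}
\]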
Now, we have det A expressed in terms of determinants of 2 × 2 matrices, so for each of them we
simply use the formula:
\[ \det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - cb. \]
\[
\begin{aligned}
\det A &= 1 \{3(1)[3(2) - (-5)(2)] + 0(-1)[(-1)(2) - 4(2)] + 0(1)[(-1)(-5) - 4(3)]\} \\
&\quad + 2 \{0(1)[3(2) - (-5)(2)] + 0(-1)[0(2) - 0(2)] + 0(1)[0(-5) - 0(3)]\} \\
&\quad - 4 \{0(1)[(-1)(2) - 4(2)] + 3(-1)[0(2) - 0(2)] + 0(1)[0(4) - 0(-1)]\} \\
&\quad - 5 \{0(1)[(-1)(-5) - 4(3)] + 3(-1)[0(-5) - 0(3)] + 0(1)[0(4) - 0(-1)]\} \\
&= 1\{3(16) + 0 + 0\} + 2\{0\} - 4\{0\} - 5\{0\} \\
&= 48
\end{aligned}
\]
Well, that was a lot of work! But as long as we take it slowly and carefully, paying attention to
all the details of what we’re doing, none of it is difficult. Notice, though, that the way the calculation
was expressed above, many calculations were done unnecessarily, because we knew they were going
to be multiplied by 0. Let’s restate that calculation, exactly the same, but taking advantage of those
zero multipliers to not bother expressing the calculations that don’t matter because they won’t be
used.
Well, that was somewhat better! But what if we could have taken advantage of more 0’s earlier
on? Consider, for instance, the determinant of AT for the matrix A in that example. The transpose
of the matrix has 3 zeroes in its first row. And that would mean that we would only need to calculate
the determinant of one 3 × 3 matrix, because all of the others will simply be multiplied by 0 and
therefore don’t need to be calculated. That would certainly be convenient.
Actually, we could use that in the calculation of det A as given, too. Because the calculation
of the determinant of A doesn’t actually have to be done in the way stated in the definition. As
previously stated, that’s just one of the ways to calculate the determinant of a matrix. We call the
method expressed in the definition expansion along row 1 because we use the sum of the products
of the row 1 entries times the row 1 cofactors. But in fact we can expand along any row of the
matrix, instead of along row 1. (Just look at all those lovely 0’s in row 2!) Or, we can expand
along any column, instead of expanding along a row. (So we could expand along column 1, doing
the same convenient calculation that we already observed would make calculating the determinant
of the transpose of the matrix very easy.) We have a theorem which tells us that we can do any one
of these expansions and we’ll get the same answer, no matter which one we do.
Theorem 10.1. Let A be any square matrix of order n > 2. Then the value of det A can be found
by expanding along any row or column of A. That is:
• for any fixed value of i, with 1 ≤ i ≤ n, det A can be calculated using expansion along row
i as
\[ \det A = \sum_{j=1}^{n} a_{ij} (-1)^{i+j} \det A_{ij} \]
OR
• for any fixed value of j, with 1 ≤ j ≤ n, det A can be calculated using expansion along
column j as
\[ \det A = \sum_{i=1}^{n} a_{ij} (-1)^{i+j} \det A_{ij} \]
Theorem 10.1 means that we could have found det A in Example 10.6 much more easily, taking
advantage of the zeroes in column 1, or in row 2. Let’s see how that would work out.
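Recall that the matrix in Example 10.6 was
\[ A = \begin{bmatrix} 1 & -2 & -4 & 5 \\ 0 & 3 & 0 & 0 \\ 0 & -1 & 3 & 2 \\ 0 & 4 & -5 & 2 \end{bmatrix}. \]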
Solution:
Approach 1:
We can expand along row 2, which contains lots of 0’s, instead of along row 1. Watch out for the
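effects of the $(-1)^{i+j}$ multipliers, though. We get:
\[
\begin{aligned}
\det A &= 0(-1)^{2+1} \det A_{21} + 3(-1)^{2+2} \det A_{22} + 0(-1)^{2+3} \det A_{23} + 0(-1)^{2+4} \det A_{24} \\
&= -0 + 3 \det \begin{bmatrix} 1 & -4 & 5 \\ 0 & 3 & 2 \\ 0 & -5 & 2 \end{bmatrix} - 0 + 0
\end{aligned}
\]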
The effect of the $(-1)^{i+j}$ multipliers is shown, even for the 0’s, to demonstrate that although the
+’s and −’s still alternate, this time, since we had i being even, the pattern switched, and started
with minus instead of with plus. Now, to calculate the determinant of A22 , we can expand along
column 1, to once again take advantage of the 0’s. So we have:
\[
\begin{aligned}
\det A &= 3 \det \begin{bmatrix} 1 & -4 & 5 \\ 0 & 3 & 2 \\ 0 & -5 & 2 \end{bmatrix} \\
&= 3 \left\{ 1(-1)^{1+1} \det \begin{bmatrix} 3 & 2 \\ -5 & 2 \end{bmatrix} - 0 + 0 \right\} \\
&= 3[3(2) - (-5)(2)] = 3(16) = 48
\end{aligned}
\]
Approach 2:
Instead, we can expand along column 1, and once again choose wisely for the next expansion:
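\[
\begin{aligned}
\det A &= 1(-1)^{1+1} \det A_{11} - 0 \det A_{21} + 0 \det A_{31} - 0 \det A_{41} \\
&= 1 \det \begin{bmatrix} 3 & 0 & 0 \\ -1 & 3 & 2 \\ 4 & -5 & 2 \end{bmatrix} \\
&= 3(-1)^{1+1} \det \begin{bmatrix} 3 & 2 \\ -5 & 2 \end{bmatrix} - 0 + 0 \\
&= 3[3(2) - (-5)(2)] = 3(16) = 48
\end{aligned}
\]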
It won’t always be possible to reduce the amount of work required in calculating the determinant
of a large matrix by as much as we were able to here, and if the matrix doesn’t contain any zeroes,
it won’t be possible to reduce the work at all. But it’s always worthwhile to consider carefully in
choosing which row or column to expand along, to take advantage of as many zeroes as possible.
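We can check the claim that every row and every column gives the same answer with a short Python sketch. The function expand below is a hypothetical helper written for this illustration: it expands along whichever row or column we ask for, and skips any entry that is 0, since that term contributes nothing.

```python
def submatrix(a, i, j):
    """Delete row i and column j (counting from 1)."""
    return [[x for c, x in enumerate(row, start=1) if c != j]
            for r, row in enumerate(a, start=1) if r != i]


def expand(a, row=None, col=None):
    """det a by cofactor expansion along the given row or column (default: row 1)."""
    n = len(a)
    if n == 1:
        return a[0][0]
    if n == 2:
        return a[0][0] * a[1][1] - a[1][0] * a[0][1]
    if col is None:
        i = row or 1
        pairs = [(i, j) for j in range(1, n + 1)]   # walk across row i
    else:
        pairs = [(i, col) for i in range(1, n + 1)]  # walk down column col
    # Zero entries are skipped: their cofactors never need to be calculated.
    return sum(a[i - 1][j - 1] * (-1) ** (i + j) * expand(submatrix(a, i, j))
               for i, j in pairs if a[i - 1][j - 1] != 0)


A = [[1, -2, -4, 5],
     [0, 3, 0, 0],
     [0, -1, 3, 2],
     [0, 4, -5, 2]]  # the matrix from Example 10.6

values = [expand(A, row=i) for i in range(1, 5)] + [expand(A, col=j) for j in range(1, 5)]
print(values)  # all 2n = 8 expansions
```

All eight values in the list should agree, as Theorem 10.1 promises. Expanding along row 2 or column 1 simply gets there with far fewer recursive calls.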
If we can calculate the determinant of a matrix by expanding along a row, or by expanding along
a column, and get the same value either way, then that means that the matrix and its transpose
must have the same value for the determinant. That is, if we calculate the determinant of some
matrix A by, for instance, expanding along row 1, and then we calculate the determinant of AT by
expanding along column 1, we’re doing exactly the same calculations and must therefore end up
with the same value. That’s something worth remembering.
Let’s recap what we’ve got so far. We have stated the definition of determinant in 3 separate
pieces (referred to as Part One, Part Two and Part Three). The first piece just defined that there
is a number called the determinant of A for any square matrix A. The second piece told us how to
calculate the determinant of a 1 × 1 matrix and the determinant of a 2 × 2 matrix. Then we defined
the minors and cofactors of a matrix (as well as the submatrix Aij ). And finally the third piece of
the definition told us how to calculate the determinant of a square matrix of order larger than 2. But
then we had Theorem 10.1 telling us other ways to calculate those determinants. Let’s pull all of that
(except for the other definitions, of the submatrix Aij , the minors and the cofactors — we’ll still need
those definitions) together into a single definition of determinant, so that we’ve got it all in one place.
Sometimes, it is possible to know what the value of the determinant of a matrix is without any
calculations at all! Or to find the value with only very minimal calculation. That is, there are
various properties that a matrix can have which make the value of the determinant obvious. We
finish off this unit by observing these properties. For instance, consider the next example.
Solution: Oh my goodness! Let’s count ... yes, that’s a 10 × 10 matrix. Let’s see. We’ll have ten
9 × 9 submatrices, each of which will require nine 8 × 8 submatrices, for each of which we’ll have
eight 7 × 7 submatrices, for which we’ll have to find the determinants of seven 6 × 6 submatrices,
and for those ... it’s exhausting just trying to say how much work this will be. And look at some of
those numbers! Where did I put my calculator??? ... But wait! Look at row 4! Let’s expand along
row 4! Aha!
det A = −0 + 0 − 0 + 0 − 0 + 0 − 0 + 0 − 0 + 0 = 0
Well that wasn’t bad at all! We didn’t even need to write down any of those submatrices. Or even
observe the pattern of pluses and minuses, although it was good practice to do so.
Because A in this example had a row that contained nothing but 0’s, we were able to choose
to expand along this row to calculate the determinant without really doing any work at all. And
similarly, if we had a matrix which contained a column that was all 0’s, we could expand along that
column to see that the value of the determinant was 0. This is an extremely useful thing to realize.
Theorem 10.3. If square matrix A has any row which contains only 0’s, or any column which
contains only 0’s, then det A = 0.
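Example 10.8. Find
\[ \det \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 4 & 5 & 6 \end{bmatrix} \quad \text{and} \quad \det \begin{bmatrix} 5 & 5 \\ 11 & 11 \end{bmatrix}. \]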
Solution:
Well, no zeroes there. We’ll have to crank through the work. For the first one, expanding along row
1 (since there’s no other choice that looks easier) we have:
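\[
\begin{aligned}
\det \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 4 & 5 & 6 \end{bmatrix}
&= 1 \det \begin{bmatrix} 5 & 6 \\ 5 & 6 \end{bmatrix} - 2 \det \begin{bmatrix} 4 & 6 \\ 4 & 6 \end{bmatrix} + 3 \det \begin{bmatrix} 4 & 5 \\ 4 & 5 \end{bmatrix} \\
&= 1(30 - 30) - 2(24 - 24) + 3(20 - 20) = 0
\end{aligned}
\]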
Oh! Look at that! Every single one of those 2 × 2 matrices had determinant 0. Why was that? Well,
look at the 2 × 2 matrices. In each case, the two rows are identical, which means that when we did
the “down product minus up product” calculation, the down product and the up product were the
same. That’s where all those zeroes came from.
The second matrix whose determinant we’re asked to find is a 2 × 2, but it doesn’t have the two
rows being identical. We’d better do the calculation:
\[ \det \begin{bmatrix} 5 & 5 \\ 11 & 11 \end{bmatrix} = 5(11) - (11)(5) = 55 - 55 = 0 \]
Again the determinant is 0! Although the rows were different, the columns were identical. And
again that meant that the down product had the same value as the up product.
In this example, in the first matrix, the reason that all of the 2 × 2 submatrices had two identical
rows is because the larger matrix they were submatrices of had two identical rows. And similarly,
if we had a larger matrix in which there were two identical columns (for instance, the transpose of
the matrix with two identical rows), then careful choosing of a column to expand along would result
in all the 2 × 2 submatrices whose determinants we need to calculate having two identical columns.
And as we have seen here, when we take the determinant of a 2 × 2 matrix in which either the rows
are identical or the columns are identical, the determinant is just 0. So that means that if the larger
matrix whose determinant we need to find has two identical rows, or has two identical columns, we
can expand in such a way that we end up with 2 × 2 submatrices whose determinants are all 0, so
that the larger matrix also has determinant 0. And that’s another result which can save us a lot of
work.
Theorem 10.4. If a square matrix A has two rows which are identical, or has two columns which
are identical, then det A = 0.
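Example 10.9. Find
\[ \det \begin{bmatrix} 2 & 4 & 2 \\ 1 & 3 & 1 \\ 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad \det \begin{bmatrix} 2 & 4 & 2 \\ 1 & 3 & 1 \\ -5 & 0 & -5 \end{bmatrix}. \]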
Solution:
For the first matrix, we see that there is a row containing only 0’s, so the determinant is 0. And
for the second one, there are two identical columns, so again the determinant is 0. That is, without
doing any calculations at all we see that
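\[ \det \begin{bmatrix} 2 & 4 & 2 \\ 1 & 3 & 1 \\ 0 & 0 & 0 \end{bmatrix} = 0 \quad \text{and} \quad \det \begin{bmatrix} 2 & 4 & 2 \\ 1 & 3 & 1 \\ -5 & 0 & -5 \end{bmatrix} = 0 \]
Notice: Look back at Example 10.5 from earlier. There, we found that
\[ \det \begin{bmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \\ 3 & 2 & 1 \end{bmatrix} = 12. \]
And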
in that matrix, there’s a row that’s identical to a column. So clearly having a row and a column
identical doesn’t have the same effect as having two rows which are identical, or having two columns
which are identical. Make sure you remember that it’s only two identical rows or two identical
columns that cause the determinant to be 0, not just two identical “rows or columns”.
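The two shortcuts (Theorems 10.3 and 10.4) and the warning about a row matching a column can all be tested with a small Python sketch. The helper names has_zero_line and has_two_identical_lines are invented for this illustration; det is just the row 1 expansion from the definition.

```python
def submatrix(a, i, j):
    """Delete row i and column j (counting from 1)."""
    return [[x for c, x in enumerate(row, start=1) if c != j]
            for r, row in enumerate(a, start=1) if r != i]


def det(a):
    """Determinant by expansion along row 1."""
    if len(a) == 1:
        return a[0][0]
    return sum(a[0][j - 1] * (-1) ** (1 + j) * det(submatrix(a, 1, j))
               for j in range(1, len(a) + 1))


def has_zero_line(a):
    """True if some row or some column contains only 0's (Theorem 10.3)."""
    rows = [tuple(r) for r in a]
    cols = list(zip(*a))
    return any(all(x == 0 for x in line) for line in rows + cols)


def has_two_identical_lines(a):
    """True if two rows are identical or two columns are identical (Theorem 10.4).

    A row being identical to a column does NOT count.
    """
    rows = [tuple(r) for r in a]
    cols = list(zip(*a))
    return len(set(rows)) < len(rows) or len(set(cols)) < len(cols)


first = [[2, 4, 2], [1, 3, 1], [0, 0, 0]]     # Example 10.9, row of 0's
second = [[2, 4, 2], [1, 3, 1], [-5, 0, -5]]  # Example 10.9, columns 1 and 3 identical
third = [[1, 2, 3], [2, 1, 3], [3, 2, 1]]     # Example 10.5, row 1 matches column 1 only

for m in (first, second, third):
    print(has_zero_line(m), has_two_identical_lines(m), det(m))
```

For the third matrix both checks come back False, and the determinant is not 0, which matches the warning above.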
Also Notice: Warning! In the last several determinant calculations we did in which we actually
did expand along a row or a column, we didn’t bother writing the $(-1)^{i+j}$ multipliers. We just
used the alternating + and − pattern that we knew would hold. And that approach is fine as long
as you’re careful to figure out whether the pattern starts with a + or a −. But if you don’t think
carefully about that, and get it wrong, then all of the signs will be wrong in your calculation, which
will result in the sign in your final answer being wrong. (And in those multiple choice questions on
quizzes and exams, you’ve gotta know that the right number with the wrong sign will always be one
of the answer choices.) So be careful.
We’ve seen a couple of characteristics that a matrix can have that allow us to know, without
doing any calculations at all, that the value of the determinant is 0. There are also some character-
istics which allow us to find the value of the determinant, whatever it is, much more easily than by
actually carrying out the row or column expansions. For instance, let’s look at what happens in the
calculations in the next example.
Example 10.10. Find det I5 and
\[ \det \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 3 & 0 & 0 \\ 4 & 5 & 6 & 0 \\ 7 & 8 & 9 & 10 \end{bmatrix}. \]
Solution:
I5 is the Identity Matrix of order 5. It’s got lots of zeroes in it. Choosing to expand along row 1
each time (which has the +/− pattern always starting with +), we take advantage of many of those
zeroes. Likewise, the second matrix whose determinant we’re asked to find also has lots of zeroes
in it (although not as many as in an identity matrix). Because of where they’re placed, when we
calculate that determinant, we can take advantage of the zeroes by always expanding along ... well,
row 1 would work again, but just to change things up a bit, let’s expand along the last column each
time, which will also use many of the 0’s. (And will give us practice at being careful about the signs.
For the 4 × 4, expanding along column 4 the first sign is given by $(-1)^{1+4}$, so the pattern starts
with a minus. But then for the 3 × 3, expanding along column 3 the first sign is given by $(-1)^{1+3}$,
so that pattern will start with a plus.)
For I5 we get:
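\[
\begin{aligned}
\det I_5 &= \det \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
= 1 \det \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} - 0 + 0 - 0 + 0 \\
&= 1 \left\{ 1 \det \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - 0 + 0 - 0 \right\} + 0
= 1(1) \left\{ 1 \det \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - 0 + 0 \right\} + 0 \\
&= 1(1)(1)[1(1) - 0(0)] = 1
\end{aligned}
\]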
Hmm. We just ended up multiplying all the ones together. Those 5 ones that were along the main
diagonal.
Let’s see what we get for the other one. (Remember, we said we’re going to expand along the last
column each time for this one.)
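\[
\begin{aligned}
\det \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 3 & 0 & 0 \\ 4 & 5 & 6 & 0 \\ 7 & 8 & 9 & 10 \end{bmatrix}
&= -0 + 0 - 0 + 10 \det \begin{bmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{bmatrix} \\
&= 0 + 10 \left\{ 0 - 0 + 6 \det \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix} \right\} \\
&= 10 \{0 + 6[1(3) - 2(0)]\} \\
&= 10(6)[1(3) - 0] \\
&= 10(6)(1)(3) \\
&= 1(3)(6)(10) = 180
\end{aligned}
\]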
Again, the value of the determinant ended up being just the product of the numbers along the main
diagonal. Hmm.
In fact, any time we calculate the determinant of a diagonal matrix, such as an identity matrix,
we will end up with (no matter how we do the expansion) the value of the determinant being just
the product of the numbers along the main diagonal. And for a matrix that looks like the second
one, or like the transpose of that matrix, the same thing will happen. We have special names to
describe matrices that look like that one, or like its transpose.
Definition: A square matrix A is called upper triangular if all entries below the main
diagonal are zero, and is called lower triangular if all entries above the main diagonal
are zero.
(Notice: A diagonal matrix fits both definitions, i.e. could be said to be both upper triangular and
lower triangular. But it’s easier to just call it diagonal, as we have already been doing.)
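The definition says concerning the matrix
\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 3 & 0 & 0 \\ 4 & 5 & 6 & 0 \\ 7 & 8 & 9 & 10 \end{bmatrix} \quad \text{and its transpose} \quad \begin{bmatrix} 1 & 2 & 4 & 7 \\ 0 & 3 & 5 & 8 \\ 0 & 0 & 6 & 9 \\ 0 & 0 & 0 & 10 \end{bmatrix}, \]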
that the first is lower triangular (because the non-zero entries form a triangle in the lower left part
of the matrix), and the second is upper triangular (because the non-zero entries form a triangle in
the upper right part of the matrix).
As we have already seen, calculating determinants of all of these kinds of matrices can be done
very easily, without even thinking about how we do the expansion, as the following theorem tells us.
Theorem 10.5. If a square matrix A = [aij ] is either upper or lower triangular, or is a diagonal
matrix, then the determinant of A is the product of the elements lying on the main diagonal. That
is,
det A = (a11 )(a22 )(a33 )...(ann )
Corollary 10.6. The determinant of any Identity Matrix is 1. That is, for any n ≥ 1, det In = 1.
And finally we have D = −3 (I24 ). That is, matrix D is obtained by multiplying the Identity Matrix
of order 24 by the scalar −3. Remember that when we multiply a matrix by a scalar, every entry
in the matrix is multiplied by the scalar. Of course, the 0’s will still just be 0’s, so D is a diagonal
matrix of order 24 in which each of the entries on the main diagonal is −3. Therefore the determinant
is just the product of all those −3’s. That is, we have:
\[ \det D = (-3)^{24} = 3^{24} \]
(Note that because the exponent is even, the negatives all cancel out.)
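Theorem 10.5 is simple enough to sketch in Python: check that the matrix is triangular, then multiply down the main diagonal. The function names below are hypothetical, chosen for this illustration, and math.prod assumes Python 3.8 or later.

```python
from math import prod


def is_upper_triangular(a):
    """All entries below the main diagonal are zero."""
    return all(a[i][j] == 0 for i in range(len(a)) for j in range(i))


def is_lower_triangular(a):
    """All entries above the main diagonal are zero."""
    n = len(a)
    return all(a[i][j] == 0 for i in range(n) for j in range(i + 1, n))


def triangular_det(a):
    """Product of the main diagonal; only valid for triangular (or diagonal) matrices."""
    if not (is_upper_triangular(a) or is_lower_triangular(a)):
        raise ValueError("matrix is not triangular, so Theorem 10.5 does not apply")
    return prod(a[i][i] for i in range(len(a)))


def identity(n):
    """The Identity Matrix of order n."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


L = [[1, 0, 0, 0],
     [2, 3, 0, 0],
     [4, 5, 6, 0],
     [7, 8, 9, 10]]  # the lower triangular matrix from Example 10.10
D = [[-3 * x for x in row] for row in identity(24)]  # D = -3 (I24)

print(triangular_det(L), triangular_det(identity(5)), triangular_det(D))
```

The ValueError guard is a design choice for the sketch: multiplying the diagonal entries of a matrix that is not triangular would give a number, but not (in general) the determinant.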
Math 1229A/B
Unit 11:
Properties of Determinants
(text reference: Section 4.2)
© V. Olds 2010
11 Properties of Determinants
In this section, we learn more about determinants. First, we observe some properties of determi-
nants that allow us to calculate determinants more easily. We examine the effects on the determinant
when the various kinds of elementary row operations are performed, so that we can easily see how
the determinants of the various row-equivalent matrices are related to one another as we perform
these operations. This allows us to calculate the determinant of a matrix by row-reducing the matrix
(a procedure we already know well) to obtain a matrix whose determinant is easily calculated using
facts we’ve already learnt in the previous section. We also learn some useful properties which allow
us to calculate the determinant of a matrix from the determinants of one or more other matrices
whose determinants we may already know. And finally we examine the relationship between deter-
minants and inverses, which allows us to relate determinants to systems of linear equations, using
what we already know about the implications of the existence of the inverse of a matrix for the
number of solutions to the SLE which has that matrix as its coefficient matrix. Throughout all of
this, of course, it is important to remember that we are only dealing with square matrices when
we talk about determinants. That is, it is only for a square matrix that the characteristic “the
determinant of the matrix” is defined.
First, let’s think about what effect multiplying some row of a matrix by a non-zero scalar will
have on the determinant. That is, let’s think about the relationship between det A and det B if
matrix B is identical to matrix A except that one of the rows in B is the corresponding row of A
multiplied by some c ≠ 0.
So suppose we have some n × n matrix A = [aij ]. Let B = [bij ] be the matrix obtained by
multiplying one row, row k, by some non-zero scalar c. Then we know that bkj = cakj and bij = aij
for all i ≠ k. We can calculate det B by expanding along row k. Notice that when we form
submatrices of B by deleting row k (and also some column of B), the one row that’s different than
in matrix A is deleted, so that in the submatrix of B obtained, each entry is just the corresponding
entry from matrix A and therefore the entire submatrix of B is simply the corresponding submatrix
of A. That is, we have Bkj = Akj . So when we expand along row k we get:
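\[
\begin{aligned}
\det B &= \sum_{j=1}^{n} b_{kj} (-1)^{k+j} \det B_{kj} \\
&= \sum_{j=1}^{n} (c a_{kj}) (-1)^{k+j} \det A_{kj} \qquad \text{because } b_{kj} = c a_{kj} \text{ and } B_{kj} = A_{kj} \\
&= c a_{k1} (-1)^{k+1} \det A_{k1} + c a_{k2} (-1)^{k+2} \det A_{k2} + \cdots + c a_{kn} (-1)^{k+n} \det A_{kn} \\
&= c \left[ a_{k1} (-1)^{k+1} \det A_{k1} + a_{k2} (-1)^{k+2} \det A_{k2} + \cdots + a_{kn} (-1)^{k+n} \det A_{kn} \right] \\
&= c [\det A]
\end{aligned}
\]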
Why look at that! When we multiply a row of matrix A by any non-zero scalar c, the effect is to
multiply the value of the determinant by that same scalar. And notice that the same thing would
happen if we were to multiply a column by c instead of a row, because we could calculate the determinant
by expansion along that column. That is, we have already observed that det B = det $B^T$, so
doing something to a column has the same effect on the determinant as doing something to a row.
(We’ll use that fact every time we look at the effect on the determinant of doing something to a row
of a matrix.) So we have the following Theorem.
Theorem 11.1. If matrix B is obtained from square matrix A by multiplying one row or column of
A by some non-zero scalar c, then det B = c(det A).
Notice: We’re specifying here that the scalar must be non-zero, but of course multiplying a row or
column by 0 has the same effect – the value of the determinant is also multiplied by 0 – because we
know that if a matrix has a row or column of only 0’s (which is what multiplying a row or column
by 0 would give) then the value of the determinant is 0.
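Here is a quick numerical check of Theorem 11.1, written as a Python sketch. The helpers scale_row and scale_col are hypothetical names for this illustration, and the matrix is the one from Example 10.5, whose determinant we already know is 12.

```python
def submatrix(a, i, j):
    """Delete row i and column j (counting from 1)."""
    return [[x for c, x in enumerate(row, start=1) if c != j]
            for r, row in enumerate(a, start=1) if r != i]


def det(a):
    """Determinant by expansion along row 1."""
    if len(a) == 1:
        return a[0][0]
    return sum(a[0][j - 1] * (-1) ** (1 + j) * det(submatrix(a, 1, j))
               for j in range(1, len(a) + 1))


def scale_row(a, k, c):
    """Copy of a with row k (counting from 1) multiplied by c."""
    return [[c * x for x in row] if r == k else list(row)
            for r, row in enumerate(a, start=1)]


def scale_col(a, k, c):
    """Copy of a with column k (counting from 1) multiplied by c."""
    return [[c * x if j == k else x for j, x in enumerate(row, start=1)]
            for row in a]


A = [[1, 2, 3],
     [2, 1, 3],
     [3, 2, 1]]

print(det(A))                    # det A
print(det(scale_row(A, 2, 5)))   # row 2 multiplied by 5
print(det(scale_col(A, 3, -2)))  # column 3 multiplied by -2
print(det(scale_row(A, 1, 0)))   # row 1 multiplied by 0
```

The second and third values should be 5 and −2 times the first, and the last should be 0, in line with the Notice about multiplying by 0.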
We could also express the result from the theorem above as factoring a non-zero scalar multiplier
out of a row or column. That is, we see that (in the case of row k being multiplied by c):
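\[
\det \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{(k-1)1} & a_{(k-1)2} & \cdots & a_{(k-1)n} \\
c a_{k1} & c a_{k2} & \cdots & c a_{kn} \\
a_{(k+1)1} & a_{(k+1)2} & \cdots & a_{(k+1)n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
= c \det \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{(k-1)1} & a_{(k-1)2} & \cdots & a_{(k-1)n} \\
a_{k1} & a_{k2} & \cdots & a_{kn} \\
a_{(k+1)1} & a_{(k+1)2} & \cdots & a_{(k+1)n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\]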
And of course it’s the same if some column k is multiplied by a non-zero scalar c. So the following
result follows directly from Theorem 11.1.
Corollary 11.2. If every entry of one row or column of a square matrix B has a common factor, c,
and matrix A is obtained from matrix B by factoring that common factor out of that row or column,
i.e. by multiplying the row or column by 1/c, then det B = c det A.
One of the ways in which this is useful is when the numbers in a matrix are obnoxious, but there
are common factors in all the entries in some rows and/or columns which can be factored out to
make the arithmetic easier.
Example 11.1. Find det A where
\[ A = \begin{bmatrix} 42 & 5 & 1 \\ 84 & 0 & 0 \\ 63 & 5 & 2 \end{bmatrix} \]
Solution:
Some of the numbers in this matrix look pretty obnoxious. But we see that the entries in the first
column have a common factor of 21. The arithmetic will be easier if we factor it out. That is, we
have the matrix
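\[ A = \begin{bmatrix} 42 & 5 & 1 \\ 84 & 0 & 0 \\ 63 & 5 & 2 \end{bmatrix} = \begin{bmatrix} 21 \times 2 & 5 & 1 \\ 21 \times 4 & 0 & 0 \\ 21 \times 3 & 5 & 2 \end{bmatrix} \]
so the matrix
\[ B = \begin{bmatrix} 2 & 5 & 1 \\ 4 & 0 & 0 \\ 3 & 5 & 2 \end{bmatrix} \]
can be obtained from matrix A by factoring the 21 multiplier out of column 1. Therefore, according
to our Corollary above, we have
\[ \det A = \det \begin{bmatrix} 42 & 5 & 1 \\ 84 & 0 & 0 \\ 63 & 5 & 2 \end{bmatrix} = 21 \det \begin{bmatrix} 2 & 5 & 1 \\ 4 & 0 & 0 \\ 3 & 5 & 2 \end{bmatrix} = 21 (\det B) \]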
and the arithmetic for calculating det B is not difficult. Expanding along row 2, we have
\[ \det B = 4(-1)^{2+1} \det \begin{bmatrix} 5 & 1 \\ 5 & 2 \end{bmatrix} - 0 + 0 = 4(-1)[5(2) - 5(1)] = -4(5) = -20 \]
so we see that det A = 21 det B = 21(−20) = −420.
Of course, in that example, expanding along row 2 would have made the arithmetic not all
that complicated anyway, since only one of the column 1 entries would have been used. But we
can have matrices in which we can factor out common factors from more than one row or column,
having a much more profound effect in simplifying the arithmetic.
Example 11.2. Find det A where
\[ A = \begin{bmatrix} 33 & 22 & 88 \\ 3 & 0 & 6 \\ 12 & 0 & 32 \end{bmatrix} \]
Solution:
Clearly, we will want to expand along column 2 to calculate the determinant. But before we do
so, we can make the arithmetic much easier by factoring out common factors in various rows and
columns. We get:
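\[
\begin{aligned}
\det A &= \det \begin{bmatrix} 33 & 22 & 88 \\ 3 & 0 & 6 \\ 12 & 0 & 32 \end{bmatrix} && \text{(row 1 has common factor 11)} \\
&= 11 \det \begin{bmatrix} 3 & 2 & 8 \\ 3 & 0 & 6 \\ 12 & 0 & 32 \end{bmatrix} && \text{(column 1 has common factor 3)} \\
&= 3 \times 11 \det \begin{bmatrix} 1 & 2 & 8 \\ 1 & 0 & 6 \\ 4 & 0 & 32 \end{bmatrix} && \text{(row 3 has common factor 4)} \\
&= 4 \times 33 \det \begin{bmatrix} 1 & 2 & 8 \\ 1 & 0 & 6 \\ 1 & 0 & 8 \end{bmatrix} && \text{(column 3 has common factor 2)} \\
&= 2 \times 132 \det \begin{bmatrix} 1 & 2 & 4 \\ 1 & 0 & 3 \\ 1 & 0 & 4 \end{bmatrix} && \text{(now expand along column 2)} \\
&= 264 \left\{ 2(-1)^{1+2} \det \begin{bmatrix} 1 & 3 \\ 1 & 4 \end{bmatrix} - 0 + 0 \right\} \\
&= 264[2(-1)(4 - 3)] \\
&= 264(-2)(1) \\
&= -528
\end{aligned}
\]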
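The factoring idea can be automated for integer matrices. The Python sketch below pulls the greatest common divisor out of each row and then out of each column; the function name factor_out is hypothetical. It does not follow the same order of steps as the hand calculation above, so the reduced matrix it reaches may differ, but the product of the factors times the determinant of the reduced matrix should still equal det A.

```python
from functools import reduce
from math import gcd


def submatrix(a, i, j):
    """Delete row i and column j (counting from 1)."""
    return [[x for c, x in enumerate(row, start=1) if c != j]
            for r, row in enumerate(a, start=1) if r != i]


def det(a):
    """Determinant by expansion along row 1."""
    if len(a) == 1:
        return a[0][0]
    return sum(a[0][j - 1] * (-1) ** (1 + j) * det(submatrix(a, 1, j))
               for j in range(1, len(a) + 1))


def factor_out(a):
    """Factor the gcd out of every row, then every column.

    Returns (factor, reduced) with det(a) == factor * det(reduced).
    """
    m = [list(row) for row in a]
    n = len(m)
    factor = 1
    for i in range(n):
        g = reduce(gcd, m[i])  # gcd of the row; a row of 0's gives 0 and is left alone
        if g > 1:
            factor *= g
            m[i] = [x // g for x in m[i]]
    for j in range(n):
        g = reduce(gcd, [m[i][j] for i in range(n)])
        if g > 1:
            factor *= g
            for i in range(n):
                m[i][j] //= g
    return factor, m


A = [[33, 22, 88],
     [3, 0, 6],
     [12, 0, 32]]  # the matrix from Example 11.2
factor, reduced = factor_out(A)
print(factor, reduced, factor * det(reduced))
```

The same function can be tried on the matrix from Example 11.1; there it ends up factoring 84 out of row 2 rather than 21 out of column 1, which is a different but equally valid use of Corollary 11.2.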
Sometimes, the numbers are obnoxiously small, i.e. fractions or decimals, rather than obnox-
iously big. Again, we can make the arithmetic easier by factoring out common factors from rows or
columns, effectively multiplying by the common denominator of the fractions in a row or column, or
by the power of ten that makes the decimal numbers into integers.
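Example 11.3. Find det B, where
\[ B = \begin{bmatrix} \frac{1}{2} & -\frac{1}{6} & \frac{1}{3} \\[2pt] -\frac{2}{3} & \frac{1}{3} & \frac{4}{3} \\[2pt] \frac{1}{4} & -\frac{1}{4} & \frac{1}{2} \end{bmatrix} \]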
Solution:
Ugh! Fractions! (Not really, but ... ) Not only are there no 0’s, there aren’t even any integers!
But we can make the arithmetic easier by getting the fractions out in front. We can bring each row
to a common denominator, and then factor out “1 over the common denominator” from each row.
(Sometimes, we might want to factor from some columns as well as some rows, and/or factor out
common factors of numerators as well, but not this time.) We get:
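\[
\det B = \det \begin{bmatrix} \frac{1}{2} & -\frac{1}{6} & \frac{1}{3} \\[2pt] -\frac{2}{3} & \frac{1}{3} & \frac{4}{3} \\[2pt] \frac{1}{4} & -\frac{1}{4} & \frac{1}{2} \end{bmatrix}
= \det \begin{bmatrix} \frac{3}{6} & -\frac{1}{6} & \frac{2}{6} \\[2pt] -\frac{2}{3} & \frac{1}{3} & \frac{4}{3} \\[2pt] \frac{1}{4} & -\frac{1}{4} & \frac{2}{4} \end{bmatrix}
= \frac{1}{6} \times \frac{1}{3} \times \frac{1}{4} \det \begin{bmatrix} 3 & -1 & 2 \\ -2 & 1 & 4 \\ 1 & -1 & 2 \end{bmatrix}
\]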
That is, at the last step we factored 1/6 out of row 1, 1/3 out of row 2 and 1/4 out of row 3 (all at once).
Now we can see that there’s a 2 we can factor out of column 3, and column 2 is a reasonably good
choice to expand along:
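\begin{align*}
\det B &= \frac{1}{6} \times \frac{1}{3} \times \frac{1}{4} \times 2 \det \begin{bmatrix} 3 & -1 & 1 \\ -2 & 1 & 2 \\ 1 & -1 & 1 \end{bmatrix} \\
&= \frac{2}{6 \times 3 \times 4} \left[ (-1)(-1)^{1+2} \det \begin{bmatrix} -2 & 2 \\ 1 & 1 \end{bmatrix} + (1)(-1)^{2+2} \det \begin{bmatrix} 3 & 1 \\ 1 & 1 \end{bmatrix} + (-1)(-1)^{3+2} \det \begin{bmatrix} 3 & 1 \\ -2 & 2 \end{bmatrix} \right] \\
&= \frac{1}{3 \times 3 \times 4} \{ (-1)(-1)[-2 - 2] + (1)(1)[3 - 1] + (-1)(-1)[6 - (-2)] \} \\
&= \frac{1}{36}(-4 + 2 + 8) \\
&= \frac{6}{36} \\
&= \frac{1}{6}
\end{align*}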
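If you'd like to confirm that answer without trusting your fraction arithmetic, exact rational arithmetic makes it painless. The following is a small sketch, assuming Python's standard `fractions` module; `det` is an illustrative helper, not something from the text.

```python
from fractions import Fraction as F

def det(m):
    """Determinant by cofactor expansion along row 1 (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(sub)
    return total

B = [[F(1, 2), F(-1, 6), F(1, 3)],
     [F(-2, 3), F(1, 3), F(4, 3)],
     [F(1, 4), F(-1, 4), F(1, 2)]]

# Integer matrix left after factoring 1/6, 1/3, 1/4 out of the rows and 2 out of column 3.
cleared = [[3, -1, 1], [-2, 1, 2], [1, -1, 1]]
scale = F(1, 6) * F(1, 3) * F(1, 4) * 2

print(det(B), scale * det(cleared))
```

The two values should match: the direct determinant of B and the scaled determinant of the integer matrix.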
When we have a matrix full of obnoxious decimals, it may be easier for you to think of simulta-
neously multiplying and dividing by a power of 10, rather than “factoring out” the decimal places.
For instance, to eliminate a single decimal place in a row or column, we divide the determinant by
10, i.e. multiply by 1/10 = 0.1, while multiplying that row or column by 10. To eliminate 3 decimal places, we
would divide the determinant by 1000, i.e. multiply by 0.001 while multiplying the row or column
by 1000 to transform the numbers in the row or column to integers.
Example 11.4. Find the determinant of
\[ \begin{bmatrix} 0.003 & 0.002 \\ 0.05 & 0.04 \end{bmatrix} \]
Solution:
Rather than fiddle around with multiplying all those decimals, we can factor 0.001 out of row 1 and
0.01 out of row 2. That is, we will multiply the determinant by 0.001 while multiplying row 1 by
1000 and then also multiply the determinant by 0.01 while multiplying row 2 by 100. We get:
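\begin{align*}
\det \begin{bmatrix} 0.003 & 0.002 \\ 0.05 & 0.04 \end{bmatrix} &= (0.001) \det \begin{bmatrix} 3 & 2 \\ 0.05 & 0.04 \end{bmatrix} = (0.001)(0.01) \det \begin{bmatrix} 3 & 2 \\ 5 & 4 \end{bmatrix} \\
&= (0.00001)[3(4) - 5(2)] = (0.00001)(2) = 0.00002
\end{align*}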
Let’s think again about what Theorem 11.1 told us. The effect of multiplying any row or column
by a non-zero scalar is to multiply the determinant by that scalar. We know that if a matrix has
two identical rows, or has two identical columns, then the determinant of the matrix is 0. And if we
multiply one of those identical rows or columns by a scalar, the determinant of the resulting matrix
will be 0 times that scalar, i.e. will also be 0. But then that means that we don’t necessarily have
to have two identical rows or columns in order to know that the determinant is 0. The determinant
will be 0 whenever one row is a scalar multiple of another row, or one column is a scalar multiple of
another column, because we can factor out the common scalar from that row or column to obtain
a matrix with two identical rows, or two identical columns. That is, we get another Corollary from
Theorem 11.1.
Corollary 11.3. If a square matrix A has one row which is a scalar multiple of another, or one
column which is a scalar multiple of another, then det A = 0.
Example 11.5. Find det A where
\[ A = \begin{bmatrix} 1 & 2 & 3 & 2 & 1 \\ 2 & 4 & 1 & 6 & 2 \\ 5 & 10 & 15 & 9 & 6 \\ -4 & -8 & 7 & 9 & -4 \\ -3 & -6 & 21 & 9 & 18 \end{bmatrix} \]
Solution:
Well, that matrix is both big and relatively ugly. There are no 0’s, and so calculating the determinant
of this 5 × 5 matrix will be a lot of work. But wait ... look at columns 1 and 2. We see that column
2 is 2 times column 1. And that means that we don’t need to do any of those calculations. Since
one column is a scalar multiple of another, we see that
det A = 0
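For the skeptical, here is a quick brute-force confirmation of Corollary 11.3 on this matrix. It's only a sketch in Python; the helper `det` (cofactor expansion along row 1) is an assumption made for illustration, and it does all the work the corollary lets us skip.

```python
def det(m):
    """Determinant by cofactor expansion along row 1 (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(sub)
    return total

A = [[1, 2, 3, 2, 1],
     [2, 4, 1, 6, 2],
     [5, 10, 15, 9, 6],
     [-4, -8, 7, 9, -4],
     [-3, -6, 21, 9, 18]]

col1 = [row[0] for row in A]
col2 = [row[1] for row in A]
# Column 2 should be exactly 2 times column 1.
is_multiple = all(b == 2 * a for a, b in zip(col1, col2))

print(is_multiple, det(A))
```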
Next, let’s think about what happens when we interchange 2 rows of a matrix. Let’s start with
the easiest case. Suppose we interchange the rows of a 2 × 2 matrix. Then we have
\[ \det \begin{bmatrix} c & d \\ a & b \end{bmatrix} = cb - ad = -(ad - cb) = -\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]
Oh! Look at that! The sign of the determinant changes. In general, suppose we have an n × n
matrix and we interchange rows 1 and 2. That is, consider some n × n matrix $A = [a_{ij}]$, and let
$B = [b_{ij}]$ be the matrix obtained by interchanging rows 1 and 2 of matrix A, so that $b_{1j} = a_{2j}$ and
$b_{2j} = a_{1j}$, but for any other value i, $b_{ij} = a_{ij}$. Then deleting row 2 (and some column) of matrix B
has the same result as deleting row 1 (and that same column) of matrix A, so we have $B_{2j} = A_{1j}$.
Now, if we calculate det B by expanding along row 2, and use what we know about the way the
+’s and −’s alternate when we do the expansion (instead of thinking about the powers of (−1)),
recognizing that for expansion along row 2 the pattern starts with a negative, we have:
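\begin{align*}
\det B &= \sum_{j=1}^{n} b_{2j} (-1)^{2+j} \det B_{2j} \\
&= \sum_{j=1}^{n} a_{1j} (-1)^{2+j} \det A_{1j} \\
&= -\sum_{j=1}^{n} a_{1j} (-1)^{1+j} \det A_{1j} \\
&= -[\det A]
\end{align*}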
That is, we were getting the terms of the row 1 expansion of det A, but with the pattern of alternat-
ing pluses and minuses starting with a minus, whereas for expansion along row 1 the pattern should
start with a plus. So we “factored out a minus”, i.e. multiplied by −1 to switch all the signs. We see
that once again, the effect of interchanging two rows of a matrix is that the sign of the determinant
changes. The same thing would happen if we interchanged any two rows, although it’s not quite as
straightforward to see. And of course if we interchange two columns of a matrix, that’s the same as
interchanging two rows of the transpose of the matrix, so the effect on the determinant will be the
same — the sign of the determinant will change.
Theorem 11.4. Let A be any square matrix and let B be the matrix obtained either by interchanging
two rows of A, or by interchanging two columns of A. Then det B = − det A.
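Example 11.6. Find
\[ \det \begin{bmatrix} 1 & 2 & 3 \\ 1 & 0 & 2 \\ 0 & 0 & 5 \end{bmatrix} \]
Solution:
If we interchange columns 1 and 2, we’ll get an upper triangular matrix:
\[ \det \begin{bmatrix} 1 & 2 & 3 \\ 1 & 0 & 2 \\ 0 & 0 & 5 \end{bmatrix} = -\det \begin{bmatrix} 2 & 1 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 5 \end{bmatrix} = -(2)(1)(5) = -10 \]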
Example 11.7. Find det A where
\[ A = \begin{bmatrix} 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 3 \\ 1 & 0 & 1 & 1 \\ 0 & 0 & 2 & 1 \end{bmatrix} \]
Solution:
We can transform A to an upper triangular matrix by moving rows around. We have to be careful,
though. We need to know how many interchanges of rows are done, so that we know whether or not
to change the sign of the determinant. (That is, performing an even number of interchanges of rows
switches the sign an even number of times, effectively not switching it at all, but performing an odd
number of interchanges does switch the sign.) It’s important to remember that we can’t just pick up
a row and move it somewhere else. For instance, we can’t just put the third row at the top of the
matrix, leaving the relative positions of the other rows unchanged. We would have to interchange
row 1 and row 3, changing the relative positions of what had been the first and second rows. Or
move row 3 up gradually, first interchanging it with row 2 and then interchanging it with row 1,
performing 2 interchanges to get row 3 to the top and leaving the relative positions of the original
first and second rows unchanged. (That’s what’s done in the following.)
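\begin{align*}
\det A = \det \begin{bmatrix} 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 3 \\ 1 & 0 & 1 & 1 \\ 0 & 0 & 2 & 1 \end{bmatrix} &= -\det \begin{bmatrix} 0 & 1 & 1 & 2 \\ 1 & 0 & 1 & 1 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 2 & 1 \end{bmatrix} && \text{(by interchanging rows 2 and 3)} \\
&= \det \begin{bmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 2 & 1 \end{bmatrix} && \text{(by interchanging rows 1 and 2)} \\
&= -\det \begin{bmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 3 \end{bmatrix} && \text{(by interchanging rows 3 and 4)} \\
&= -(1)(1)(2)(3) = -6
\end{align*}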
Notice: There’s no one right pattern of row interchanges here. And some will be longer than others.
But all the possible series of interchanges you can do to get from the matrix we started with to the
matrix we finished with will involve an odd number of interchanges. And for any matrix, it will
always be true that to get from the initial matrix to any particular rearranged matrix by a series
of row or column interchanges, either all of the possible series of interchanges will require an even
number of interchanges, or all of the possible series of interchanges will require an odd number of
interchanges.
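To see that claim in action on Example 11.7, here is a small Python sketch that applies two different series of row interchanges (one of 3 swaps, one of 5) and tracks the sign along the way. The function name `apply_swaps` and the two routes are illustrative assumptions, not anything from the text.

```python
def apply_swaps(rows, swaps):
    """Apply a list of 1-indexed row interchanges; return the new matrix and the sign factor."""
    rows = [r[:] for r in rows]
    sign = 1
    for i, k in swaps:
        rows[i - 1], rows[k - 1] = rows[k - 1], rows[i - 1]
        sign = -sign  # each interchange flips the sign of the determinant
    return rows, sign

A = [[0, 1, 1, 2], [0, 0, 0, 3], [1, 0, 1, 1], [0, 0, 2, 1]]

short_route = [(2, 3), (1, 2), (3, 4)]                 # the route used in the solution
long_route = [(3, 4), (2, 3), (3, 4), (1, 2), (1, 3)]  # a longer route to the same matrix

T1, s1 = apply_swaps(A, short_route)
T2, s2 = apply_swaps(A, long_route)

# T1 is upper triangular, so its determinant is the product of the diagonal entries.
diag = T1[0][0] * T1[1][1] * T1[2][2] * T1[3][3]
print(T1 == T2, s1, s2, s1 * diag)
```

Both routes land on the same triangular matrix, and both use an odd number of interchanges, so the sign factor is the same either way.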
In Theorems 11.1 and 11.4, we’ve seen the effects of 2 of our 3 types of elementary row operations
on the determinant of a matrix. What is the effect of the one remaining ero on the determinant?
Actually, nothing at all. It’s harder to demonstrate this in the general case, i.e. for an n × n matrix,
but easy enough to see for a 2×2 matrix, and since evaluating a determinant can always be expressed
in terms of evaluating determinants of 2 × 2 submatrices, you should then be willing to accept that
the same result is true for a larger matrix.
Of course, the ero we’re talking about here is “replace a row by that row plus a scalar multiple
of another row”. Let’s see what happens to the determinant of a non-specific 2 × 2 matrix when we
replace row 2 by itself plus some non-specific scalar multiple of row 1, i.e. by row 2 plus k times row
1 for any k ≠ 0. For instance, if row 1 contains a and b, while row 2 contains c and d, then the new
row 2 will contain c + ka and d + kb, and we have:
\[ \det \begin{bmatrix} a & b \\ c + ka & d + kb \end{bmatrix} = a(d + kb) - (c + ka)b = ad + kab - cb - kab = ad - cb = \det \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]
We see that the value of the determinant is not affected at all by this ero. The determinant has the
same value as the determinant of the matrix before the ero is performed. And of course, if we did
a similar transformation using columns instead of rows, it would be like performing the ero on the
transpose of the matrix and would again make no difference to the value of the determinant.
Theorem 11.5. Let $A = [a_{ij}]$ be any square matrix. Consider the matrix $B = [b_{ij}]$ obtained by
adding any scalar multiple of one row (or column) to another row (or column) in A and leaving the
other rows (or columns) unchanged. Then det B = det A.
Example 11.8. Find
\[ \det \begin{bmatrix} 1 & 2 & 3 \\ 0 & 5 & 7 \\ -2 & -4 & 1 \end{bmatrix} \]
Solution:
In this case, row 3 is almost, but not quite, a scalar multiple of row 1. That is, the first two entries
of row 3 are −2 times the first two entries of row 1, but the third entry is different. This means that
if we add 2 times row 1 to row 3, those first two entries will become zeroes and we’ll have an upper
triangular matrix. And doing this will not change the value of the determinant. So we see that:
\[ \det \begin{bmatrix} 1 & 2 & 3 \\ 0 & 5 & 7 \\ -2 & -4 & 1 \end{bmatrix} = \det \begin{bmatrix} 1 & 2 & 3 \\ 0 & 5 & 7 \\ -2 + 2(1) & -4 + 2(2) & 1 + 2(3) \end{bmatrix} = \det \begin{bmatrix} 1 & 2 & 3 \\ 0 & 5 & 7 \\ 0 & 0 & 7 \end{bmatrix} = (1)(5)(7) = 35 \]
Example 11.9. If
\[ A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 2 & 5 & 3 \end{bmatrix} \]
find det A.
Solution:
Well, this one is less obvious. But maybe we can transform the matrix to an upper triangular matrix.
We’ll start by getting a zero in the (3, 1)-entry, by subtracting 2 times row 1 from row 3. According
to Theorem 11.5, the value of the determinant of this new matrix will be the same as the value of
the determinant of the original matrix. So we have:
\[ \det \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 2 & 5 & 3 \end{bmatrix} = \det \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 2 - 2(1) & 5 - 2(1) & 3 - 2(1) \end{bmatrix} = \det \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 3 & 1 \end{bmatrix} \]
And now, we can get a zero in the (3, 2)-entry by subtracting 3 times row 2 from row 3, again leaving
the value of the determinant unchanged:
\[ \det \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 3 & 1 \end{bmatrix} = \det \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 - 3(0) & 3 - 3(1) & 1 - 3(1) \end{bmatrix} = \det \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & -2 \end{bmatrix} = (1)(1)(-2) = -2 \]
Using the results from Theorems 11.1, 11.4 and 11.5 together, we can find the determinant of
any matrix by row-reducing. We don’t need to row-reduce all the way to RREF. We simply need to
get to an upper (or lower) triangular matrix. We track the effect of the ero’s on the determinant as
we perform them.
Example 11.10. Find the determinant of
\[ A = \begin{bmatrix} 1 & -1 & 1 & 2 \\ 3 & 1 & 4 & 1 \\ 2 & 1 & -1 & 2 \\ 1 & 1 & -2 & 1 \end{bmatrix} \]
Solution:
A does not have any row that’s helpfully filled with, or nearly filled with, zeroes, so there’s no row or
column we can expand along that won’t require a lot of work. And no row or column is duplicated,
or is a scalar multiple of any other, so we can’t easily conclude that the determinant is 0. But
rather than crunching through the work of expansion, we can perform the relatively friendlier task
of row-reducing the matrix to get to a triangular matrix. We don’t necessarily need to worry about
getting leading ones (although we may want to, to make the arithmetic easier), and we don’t need
to entirely clear out columns, we simply need to clear out the entries below (or alternatively above)
the main diagonal. Every time we perform an ero, we indicate what the effect on the determinant
is. We get:
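Start by clearing out column 1 below row 1, by subtracting multiples of row 1, which leaves the determinant unchanged:
\[ \det A = \det \begin{bmatrix} 1 & -1 & 1 & 2 \\ 3 & 1 & 4 & 1 \\ 2 & 1 & -1 & 2 \\ 1 & 1 & -2 & 1 \end{bmatrix} = \det \begin{bmatrix} 1 & -1 & 1 & 2 \\ 0 & 4 & 1 & -5 \\ 0 & 3 & -3 & -2 \\ 0 & 2 & -3 & -1 \end{bmatrix} \]
Clearing out column 2 below the main diagonal will be easier with a leading 2 in row 2, so we interchange rows 2 and 4. The sign of the determinant changes:
\[ \det A = -\det \begin{bmatrix} 1 & -1 & 1 & 2 \\ 0 & 2 & -3 & -1 \\ 0 & 3 & -3 & -2 \\ 0 & 4 & 1 & -5 \end{bmatrix} \]
Now we subtract 2 times row 2 from row 4 and subtract 3/2 times row 2 from row 3. (No effect on the determinant.)
\[ \det A = -\det \begin{bmatrix} 1 & -1 & 1 & 2 \\ 0 & 2 & -3 & -1 \\ 0 & 0 & \frac{3}{2} & -\frac{1}{2} \\ 0 & 0 & 7 & -3 \end{bmatrix} \]
The last calculation will be easier if we get rid of the leading fraction in row 3. Factor 1/2 out of row 3:
\[ \det A = -\frac{1}{2} \det \begin{bmatrix} 1 & -1 & 1 & 2 \\ 0 & 2 & -3 & -1 \\ 0 & 0 & 3 & -1 \\ 0 & 0 & 7 & -3 \end{bmatrix} \]
Now subtract 7/3 times row 3 from row 4, which doesn’t change the value of the determinant:
\[ \det A = -\frac{1}{2} \det \begin{bmatrix} 1 & -1 & 1 & 2 \\ 0 & 2 & -3 & -1 \\ 0 & 0 & 3 & -1 \\ 0 & 0 & 0 & -\frac{2}{3} \end{bmatrix} = -\frac{1}{2} (1)(2)(3)\left(-\frac{2}{3}\right) = 2 \]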
Notice: We could avoid that last, awkward, fraction calculation (i.e. row 4 − 7/3 row 3) by factoring a
3 out of row 3 (to get a leading one, but introducing a fraction in the (3, 4)-entry), or by calculating
the determinant at that point (i.e. almost upper triangular except for a 2 × 2 in the bottom right
corner) by repeatedly expanding on column 1 as follows:
\begin{align*}
-\frac{1}{2} \det \begin{bmatrix} 1 & -1 & 1 & 2 \\ 0 & 2 & -3 & -1 \\ 0 & 0 & 3 & -1 \\ 0 & 0 & 7 & -3 \end{bmatrix} &= -\frac{1}{2} \left[ 1 \det \begin{bmatrix} 2 & -3 & -1 \\ 0 & 3 & -1 \\ 0 & 7 & -3 \end{bmatrix} - 0 + 0 - 0 \right] \\
&= -\frac{1}{2} (1) \left[ 2 \det \begin{bmatrix} 3 & -1 \\ 7 & -3 \end{bmatrix} - 0 + 0 \right] \\
&= -\frac{1}{2} (2)[3(-3) - 7(-1)] \\
&= -(-9 + 7) = -(-2) = 2
\end{align*}
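The bookkeeping in this kind of solution (track the sign for interchanges, ignore "add a multiple of a row" steps, multiply the diagonal at the end) is mechanical enough to sketch in code. The following Python function is only an illustration under stated assumptions: the name `det_by_row_reduction` is made up, it uses exact `Fraction` arithmetic, and it picks pivots in a simpler way than the hand solution above (it never factors anything out of a row), so its intermediate matrices differ even though the final answer should not.

```python
from fractions import Fraction

def det_by_row_reduction(matrix):
    """Reduce to upper triangular form, tracking each ero's effect on the determinant."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # a zero would end up on the main diagonal
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign  # interchanging two rows flips the sign (Theorem 11.4)
        for r in range(col + 1, n):
            k = m[r][col] / m[col][col]
            # Subtracting k times the pivot row changes nothing (Theorem 11.5).
            m[r] = [a - k * b for a, b in zip(m[r], m[col])]
    product = Fraction(sign)
    for i in range(n):
        product *= m[i][i]  # triangular matrix: multiply along the main diagonal
    return product

A = [[1, -1, 1, 2], [3, 1, 4, 1], [2, 1, -1, 2], [1, 1, -2, 1]]
print(det_by_row_reduction(A))
```

Fractions appear along the way here too, and they cancel at the end, just as they did by hand.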
Notice that in that example, we had some fractional multipliers, but they miraculously disap-
peared. That’s because they were caused by the leading non-ones appearing on the main diagonal.
Of course, the determinant of a matrix which contains only integers must always be integer-valued.
So if fractions are introduced as you row-reduce, those fractions must cancel out at the end. If they
don’t, there’s something wrong in your arithmetic.
There are also some theoretical results relating the effect on the determinant of certain operations
of matrix arithmetic. First of all, suppose a matrix is multiplied by a scalar. What effect does that
have on its determinant? Look out ... I can hear you thinking “Well, that’s easy. Obviously, the
determinant is multiplied by ...” and up to that point you’re right. But if you were going to finish
the sentence with “that same scalar.” then you’re wrong. Think about it for a moment. We’ve
already seen (in Theorem 11.1) that multiplying one row of a matrix by a scalar has the effect of
multiplying the determinant by that scalar. And for any matrix A, we obtain the matrix cA by mul-
tiplying every element of the matrix by the scalar c. So for an n × n matrix, we’re multiplying each
of the n rows by c. When we multiply row 1 by c, that multiplies the determinant by c. And then
when we multiply row 2 by c, that multiplies the determinant by c again! So now the determinant
has been multiplied by $c \times c = c^2$. And by the time we’ve multiplied each of the n rows by c, we’ve
multiplied the determinant by c n times over, i.e. we’ve multiplied the determinant by $c^n$. So the
sentence actually needs to be finished with “by that same scalar, raised to the power n, where n is
the order of the square matrix.”. So we have another Corollary from Theorem 11.1.
Corollary 11.6. Let A be any square matrix of order n and c be any scalar. Then $\det(cA) = c^n \det A$.
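Here is a quick numerical sanity check of Corollary 11.6, as a Python sketch. The helpers `det` and `scale` are illustrative names, and the 3 × 3 matrix is the one from Example 11.9 (whose determinant is −2), with c = 5 chosen arbitrarily.

```python
def det(m):
    """Determinant by cofactor expansion along row 1 (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(sub)
    return total

def scale(c, m):
    """Return the matrix cA: every entry multiplied by c."""
    return [[c * x for x in row] for row in m]

A = [[1, 1, 1], [0, 1, 1], [2, 5, 3]]  # det A = -2
c = 5
n = len(A)

# det(cA) should equal c^n det A, not c det A.
print(det(scale(c, A)), c ** n * det(A), c * det(A))
```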
Example 11.11. If A is a 4 × 4 matrix with det A = 2, and B is a 3 × 3 matrix with det B = −1,
find det(−A) and det(5B).
Solution:
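Since A is a square matrix of order 4, then for any scalar c we have $\det(cA) = c^4 \det A$. So we get:
\[ \det(-A) = (-1)^4 \det A = (1)(2) = 2 \]
Similarly, since B is a square matrix of order 3, we have $\det(cB) = c^3 \det B$, so
\[ \det(5B) = 5^3 \det B = 125(-1) = -125 \]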
Example 11.12. Find
\[ \det \begin{bmatrix} 24 & 12 \\ 18 & -6 \end{bmatrix} \]
Solution:
Well, it’s only a 2 × 2 matrix, so we know how to “easily” calculate the determinant, but ... the
arithmetic looks a bit non-trivial. However, every element in the matrix is a multiple of 6. So we
can factor a 6 out of each row, to make the arithmetic friendlier. That is, letting A be the matrix
which when multiplied by 6 gives this matrix, the determinant of this matrix is $6^2 \det A$. And we
get A by dividing each element by 6. So we see that
\[ \det \begin{bmatrix} 24 & 12 \\ 18 & -6 \end{bmatrix} = 6^2 \det \begin{bmatrix} 4 & 2 \\ 3 & -1 \end{bmatrix} = 36[4(-1) - 3(2)] = 36(-4 - 6) = 36(-10) = -360 \]
There’s another result, that’s harder to explain, that tells us about the determinant of the prod-
uct of two matrices. We won’t worry about why it’s true. We’ll just accept that it is true.
Theorem 11.7. Let A and B be two square matrices of the same order. That is, both are n × n
matrices for some value n, so that the product matrix AB is defined. Then the determinant of the
product matrix is the product of the determinants of the 2 matrices. That is,
\[ \det(AB) = (\det A)(\det B) \]
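Example 11.13. Let
\[ A = \begin{bmatrix} 3 & 2 \\ 7 & 5 \end{bmatrix}, \quad B = \begin{bmatrix} -1 & 6 \\ 3 & 4 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 6 & -3 \\ -4 & 2 \end{bmatrix} \]
Find det(ABC).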
Solution:
Since each of the matrices is a 2×2 matrix, then the matrix ABC is defined and is also a 2×2 matrix.
That means that once we find what this matrix is, it will be easy to calculate its determinant. But
actually calculating ABC will be a lot of work. Instead, we can just use Theorem 11.7 to save us all
that work. We simply need to find the determinants of the 3 matrices, and multiply the determinants
together, instead of multiplying the matrices. That is, applying the theorem twice, we get:
det(ABC) = det[(AB)C] = (det AB)(det C) = [(det A)(det B)](det C) = (det A)(det B)(det C)
Therefore we have:
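\begin{align*}
\det(ABC) = (\det A)(\det B)(\det C) &= \det \begin{bmatrix} 3 & 2 \\ 7 & 5 \end{bmatrix} \det \begin{bmatrix} -1 & 6 \\ 3 & 4 \end{bmatrix} \det \begin{bmatrix} 6 & -3 \\ -4 & 2 \end{bmatrix} \\
&= (15 - 14)(-4 - 18)(12 - 12) = (1)(-22)(0) = 0
\end{align*}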
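If you'd like to see the long way and the short way agree, the following Python sketch multiplies the three matrices out and compares. The helper names `det2` and `matmul` are assumptions introduced just for this check.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[1][0] * m[0][1]

def matmul(x, y):
    """Ordinary matrix product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]

A = [[3, 2], [7, 5]]
B = [[-1, 6], [3, 4]]
C = [[6, -3], [-4, 2]]

ABC = matmul(matmul(A, B), C)  # the long way: multiply first, then take the determinant

print(det2(ABC), det2(A) * det2(B) * det2(C))
```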
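Example 11.14. If
\[ A = \begin{bmatrix} 1 & 5 & 34 \\ 0 & -2 & 47 \\ 0 & 0 & 10 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 2 & 0 & 0 \\ 9 & -1 & 0 \\ 15 & -7 & 2 \end{bmatrix} \]
find det AB.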
Solution:
Notice that although A is an upper triangular matrix, and B is a lower triangular matrix, the product
matrix, AB will not be triangular. (If they were both upper triangular, or both lower triangular,
then the product would be, too. But not when we have one of each type. Go ahead and experiment.
You don’t have to experiment with these. Try it with matrices whose non-zeroes are all 1’s. You’ll
see.) So not only will finding AB be a lot of work, but then we’ll have to do a lot of work to find its
determinant, too. However finding det A and det B is easy. Fortunately we can use Theorem 11.7
and not do much more work than that:
\[ \det AB = (\det A)(\det B) = [(1)(-2)(10)][(2)(-1)(2)] = (-20)(-4) = 80 \]
Example 11.15. If
\[ A = \begin{bmatrix} 5 & 1 \\ 4 & 3 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} -3 & 2 \\ 7 & 1 \end{bmatrix} \]
find det(A + B).
Solution:
This time it’s not a product, it’s a sum. We don’t have a theorem telling us about the determinant
of the sum of two matrices. (Maybe it’s coming next? Better not count on it, or make any foolish
assumptions. In the absence of a theorem stating otherwise, we need to do the work.) So we need
to find the sum matrix and then calculate its determinant.
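\[ A + B = \begin{bmatrix} 5 & 1 \\ 4 & 3 \end{bmatrix} + \begin{bmatrix} -3 & 2 \\ 7 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 11 & 4 \end{bmatrix} \]
so we get
\[ \det(A + B) = \det \begin{bmatrix} 2 & 3 \\ 11 & 4 \end{bmatrix} = 2(4) - 11(3) = 8 - 33 = -25 \]
Notice that det A = 11 and det B = −17 and so det A + det B = −6 ≠ det(A + B). (So in fact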
we aren’t going to have a theorem saying anything nice about the determinant of the sum of two
matrices. It is always necessary to calculate the sum matrix and find its determinant.)
Theorem 11.7 also has a nice corollary about the determinant of the inverse of a matrix. We know
that for any nonsingular matrix A, $AA^{-1} = I$. And we also know that det I = 1. So that means
that $\det(AA^{-1}) = 1$. And now from the theorem we also know that $\det(AA^{-1}) = (\det A)(\det A^{-1})$.
So we see that:
\[ AA^{-1} = I \;\Rightarrow\; \det(AA^{-1}) = 1 \;\Rightarrow\; (\det A)(\det A^{-1}) = 1 \;\Rightarrow\; \det A^{-1} = \frac{1}{\det A} \]
(Notice that since $(\det A)(\det A^{-1}) = 1$ then it cannot be true that det A = 0, because that would
give $(\det A)(\det A^{-1}) = 0 \neq 1$, and so we can divide through by det A.)
Corollary 11.8. For any nonsingular square matrix A, $\det A^{-1} = \dfrac{1}{\det A}$.
Of course, if we have to find A−1 in order to know whether or not A has an inverse, so that we
know whether or not we can apply the corollary, then maybe the corollary is of limited usefulness.
If only there was an easier way to know whether or not a matrix is invertible.
Let’s think again about what we said above. (The parenthetical remark just before the statement
of the corollary.) We saw that if A is nonsingular, i.e. if $A^{-1}$ exists, then det A cannot be 0. Of
course, we can also take that the other way. If det A = 0, then it cannot be true that A is invertible,
i.e. is nonsingular. (Using the same reasoning as above.) And in fact, whenever det A ≠ 0, it turns
out that A is nonsingular. That doesn’t follow from the reasoning above, or a variation of it, but
it’s not too hard to see.
Consider any square matrix A. Let B be the RREF of A. Then we know that B can be ob-
tained from A by transforming A using elementary row operations. And we know what the effect
on the determinant of those operations is. Some operations don’t change the determinant at all.
Others change the sign of the determinant. And the only other possibility for ero’s is to multiply
the determinant by a non-zero scalar. None of those effects will cause the determinant to be 0 if it
wasn’t already. That is, the net effect on the determinant of all the ero’s performed in transforming
A to the RREF matrix B can be expressed as det A = c det B for some non-zero scalar c. (c is the
product of all the non-zero scalars which rows were multiplied by, times the product of all the −1
multipliers resulting from interchanging rows.)
And what do we know about the RREF of a square matrix A? From our Procedure for Finding
the Inverse of a Matrix, we know that if the RREF of A is an identity matrix, then A is nonsin-
gular, and otherwise, it’s not. That is, if A is nonsingular, then B = I, so det B = det I = 1 and
det A = c det B = c for some non-zero value c. So we see that if A is nonsingular, i.e. if $A^{-1}$ exists,
then det A ≠ 0. (We had already realized that, above.)
Now suppose that det A ≠ 0. Then since det A = c det B for some non-zero value c, it must also
be true that det B ≠ 0. Therefore B, which is the RREF of A, doesn’t contain a row of only 0’s.
But for a square matrix, if the RREF is not an identity matrix then it must contain a row of zeroes.
So knowing that B does not contain a row of only zeroes tells us that it must be true that B = I.
And therefore (according to the Procedure for Finding the Inverse of a Matrix) A is invertible, i.e.
A is nonsingular.
And so we see that if A is nonsingular then det A ≠ 0, and if det A ≠ 0 then A is nonsingular.
And so “det A ≠ 0” and “A is nonsingular” can only occur together. Knowing that one is true also
tells us that the other is true.
Theorem 11.9. Let A be any square matrix. Then A is invertible if and only if det A ≠ 0.
Example 11.16. Prove that the matrix
\[ A = \begin{bmatrix} 4 & 5 & 9 & 12 & 15 \\ 0 & 9 & 6 & 10 & 14 \\ 0 & 0 & -13 & 8 & 11 \\ 0 & 0 & 0 & -21 & 26 \\ 0 & 0 & 0 & 0 & -5 \end{bmatrix} \]
has an inverse.
Solution:
Until now, the only way we knew to prove that this square matrix is invertible was to actually find
the inverse. (That’s not quite true. We have Theorem 9.8 which told us about various other things
which are equivalent to knowing that A is nonsingular, so we could prove any of those. For instance,
we could prove that $A\vec{x} = \vec{0}$ has only the trivial solution. But we would have to do just as much
work to prove that any of those results are true.) And since we weren’t asked to actually find $A^{-1}$,
but just to prove that it exists, and finding the inverse would be a lot of work, we’d really rather not
do it. Now, because of Theorem 11.9, we don’t have to. We only have to calculate det A and show
that it’s not 0. And since A is triangular, that’s easy. Or would be if those were nicer numbers along
the main diagonal of A. But in fact, we don’t even have to find the value of the determinant. We
just have to show that it’s not 0. And we know that the product of a bunch of non-zero numbers,
no matter how many, and no matter how ugly, cannot be 0. So here we have
\[ \det A = (4)(9)(-13)(-21)(-5) \neq 0 \]
and therefore, by Theorem 11.9, A has an inverse.
Example 11.17. For
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & -2 & 7 \\ 0 & 0 & -1 \end{bmatrix} \]
find $\det A^{-1}$.
Solution:
We see that det A = 1(−2)(−1) = 2 ≠ 0, so $A^{-1}$ exists and using Corollary 11.8 we get
\[ \det A^{-1} = \frac{1}{\det A} = \frac{1}{2} \]
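Example 11.18. Prove that
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 2 & 1 \\ 0 & 0 & 5 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \]
are both invertible, but the matrix A − B is not.
Solution:
We see that det A = 1(2)(5) = 10 ≠ 0, so A is invertible, and that det B = (2)(2)(3) = 12 ≠ 0, so B
is also invertible. And we have
\[ A - B = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 2 & 1 \\ 0 & 0 & 5 \end{bmatrix} - \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} = \begin{bmatrix} -1 & 2 & 3 \\ 0 & 0 & 1 \\ 0 & 0 & 2 \end{bmatrix} \]
Since A − B is upper triangular, det(A − B) = (−1)(0)(2) = 0 and therefore A − B is not invertible.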
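Theorem 11.9 turns "is this matrix invertible?" into a one-line test. The following Python sketch applies it to the matrices of Example 11.18; the names `det` and `is_invertible` are illustrative assumptions, and the test is only reliable as written for exact (integer or fraction) entries.

```python
def det(m):
    """Determinant by cofactor expansion along row 1 (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(sub)
    return total

def is_invertible(m):
    """A square matrix is invertible if and only if its determinant is non-zero."""
    return det(m) != 0

A = [[1, 2, 3], [0, 2, 1], [0, 0, 5]]
B = [[2, 0, 0], [0, 2, 0], [0, 0, 3]]
diff = [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

print(is_invertible(A), is_invertible(B), is_invertible(diff))
```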
In Example 11.16 above, we mentioned Theorem 9.8, which states several equivalent statements,
including “square matrix A is invertible”. We now know that “det A ≠ 0” is equivalent to “A is
invertible”, which means that it’s also equivalent to all those other statements. So we can add an-
other piece to that theorem. That is, combining Theorem 9.8 and Theorem 11.9 we get the following
Corollary.
Corollary 11.10. If A is a square matrix of order n then the following statements are equivalent
to one another.
1. A is invertible (i.e., nonsingular).
2. r(A) = n (i.e., A has full rank).
3. The RREF of A is I (i.e., A is row-equivalent to the identity matrix).
4. The system $A\vec{x} = \vec{b}$ has a unique solution (for all n × 1 column vectors $\vec{b}$).
5. The homogeneous system $A\vec{x} = \vec{0}$ has only the trivial solution.
6. det A ≠ 0.
Example 11.19. If
\[ A = \begin{bmatrix} 3 & 2 \\ 5 & 4 \end{bmatrix} \]
find all solutions to the homogeneous system $A^3 \vec{x} = \vec{0}$.
Solution:
We see that det A = 3(4) − 5(2) = 12 − 10 = 2, so $\det A^3 = (\det A)(\det A)(\det A) = 2^3 = 8 \neq 0$.
Therefore the homogeneous system with coefficient matrix $A^3$ has only the trivial solution. That is,
the only solution to the given SLE is $\vec{x} = \vec{0}$.
Example 11.20. How many solutions does the SLE $A\vec{x} = \vec{0}$ have, if
\[ A = \begin{bmatrix} 1 & 2 & 5 & 7 \\ 0 & 3 & 6 & 2 \\ 0 & 0 & 0 & 8 \\ 0 & 0 & 0 & 5 \end{bmatrix} ? \]
Solution:
We see that det A = 1(3)(0)(5) = 0 so this homogeneous SLE does not have only the trivial solution,
it has infinitely many solutions. Furthermore, we can see that when we row-reduce A, only column
3 will not contain the leading one for any row, so the system has a one-parameter family of solutions.
Math 1229A/B
Unit 12:
Applications of the Determinant
(text reference: Section 4.3)
© V. Olds 2010
Cramer’s Rule
Consider a SLE $A\vec{x} = \vec{b}$ in which A is a square matrix of order n with det A ≠ 0. We can form
n new n × n matrices by replacing different columns of A by the column vector $\vec{b}$. And if we do so,
then we can directly find the value of $x_j$ in the unique solution to $A\vec{x} = \vec{b}$ using the determinant of
one of these new matrices and the determinant of A. A fellow named Cramer developed a rule for
doing this. Before we get to the rule, though, we need to define these new matrices and the notation
we use to refer to them.
Definition: Let $A\vec{x} = \vec{b}$ be any SLE in which A is a square matrix. We define the matrix
A(j) to be the matrix obtained by replacing column j of A with the column vector $\vec{b}$.
First, let’s look at an example of forming these matrices, to make sure you understand what the
definition is saying.
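Example 12.1. Find A(1), A(2) and A(3), where A is the coefficient matrix of the linear system:
\[ \begin{aligned} x_1 + x_2 - x_3 &= 6 \\ x_1 - x_2 + x_3 &= 2 \\ x_1 \phantom{{}+ x_2} - 2x_3 &= 0 \end{aligned} \]
Solution:
We have the SLE $A\vec{x} = \vec{b}$ where
\[ A = \begin{bmatrix} 1 & 1 & -1 \\ 1 & -1 & 1 \\ 1 & 0 & -2 \end{bmatrix} \quad \text{and} \quad \vec{b} = \begin{bmatrix} 6 \\ 2 \\ 0 \end{bmatrix} \]
We form the matrix A(j) by replacing the j-th column of A by the column vector $\vec{b}$. So for instance
to form A(1) we write the numbers from $\vec{b}$ instead of the first column of A, and then write columns
2 and 3 of A as usual. And so forth. We get:
\[ A(1) = \begin{bmatrix} 6 & 1 & -1 \\ 2 & -1 & 1 \\ 0 & 0 & -2 \end{bmatrix} \qquad A(2) = \begin{bmatrix} 1 & 6 & -1 \\ 1 & 2 & 1 \\ 1 & 0 & -2 \end{bmatrix} \qquad A(3) = \begin{bmatrix} 1 & 1 & 6 \\ 1 & -1 & 2 \\ 1 & 0 & 0 \end{bmatrix} \]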
Now, how do we use these matrices? Recall that as long as det A ≠ 0, the SLE $A\vec{x} = \vec{b}$ has a
unique solution. According to Mr. Cramer, the values of the $x_j$’s in the unique solution to such
a system are the quotients of the determinants of these new matrices over the determinant of the
coefficient matrix. Neat trick, eh? We won’t attempt to prove or even explain why this works. We’ll
just take Mr. Cramer’s word for it. (And that means that you can just ignore pages 170 and 171 in
the text.)
This means that if these determinants are reasonably easy to find, using Cramer’s Rule can
be an easier way to find the solution to a SLE than row reducing. (However if the determinants
require a lot of work to calculate, then using Cramer’s Rule involves more work than row reducing.
So that’s just obnoxious.) For instance determinants of 2 × 2 matrices are always easy to find, so
Cramer’s Rule is a reasonably good way to solve a system of 2 equations in 2 unknowns, as long as
the coefficient matrix is nonsingular. And if there are 0’s around then sometimes the determinants
of larger square matrices are reasonably easy to find. The following examples show how Cramer’s
Rule can be used to find the unique solution to a SLE with a nonsingular square coefficient matrix.
It’s important to remember, though, that Cramer’s Rule simply doesn’t apply to a system whose
coefficient matrix isn’t square, or has determinant 0.
Example 12.2. Use Cramer’s Rule to find the unique solution to the SLE in Example 12.1.
Solution:
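We have the SLE:
\[ \begin{aligned} x_1 + x_2 - x_3 &= 6 \\ x_1 - x_2 + x_3 &= 2 \\ x_1 \phantom{{}+ x_2} - 2x_3 &= 0 \end{aligned} \]
with coefficient matrix
\[ A = \begin{bmatrix} 1 & 1 & -1 \\ 1 & -1 & 1 \\ 1 & 0 & -2 \end{bmatrix} \]
and we found the matrices
\[ A(1) = \begin{bmatrix} 6 & 1 & -1 \\ 2 & -1 & 1 \\ 0 & 0 & -2 \end{bmatrix} \qquad A(2) = \begin{bmatrix} 1 & 6 & -1 \\ 1 & 2 & 1 \\ 1 & 0 & -2 \end{bmatrix} \qquad A(3) = \begin{bmatrix} 1 & 1 & 6 \\ 1 & -1 & 2 \\ 1 & 0 & 0 \end{bmatrix} \]
First, we find det A. We can zero out column 1 and then expand along that column:
\[ \det A = \det \begin{bmatrix} 1 & 1 & -1 \\ 1 & -1 & 1 \\ 1 & 0 & -2 \end{bmatrix} = \det \begin{bmatrix} 1 & 1 & -1 \\ 0 & -2 & 2 \\ 0 & -1 & -1 \end{bmatrix} = 1 \det \begin{bmatrix} -2 & 2 \\ -1 & -1 \end{bmatrix} - 0 + 0 = (-2)(-1) - (-1)(2) = 4 \]
Similar calculations give det A(1) = 16, det A(2) = 16 and det A(3) = 8.
So using Cramer’s Rule, we see that the values of $x_1$, $x_2$ and $x_3$ in the unique solution to the system
are:
\[ x_1 = \frac{\det A(1)}{\det A} = \frac{16}{4} = 4, \qquad x_2 = \frac{\det A(2)}{\det A} = \frac{16}{4} = 4, \qquad x_3 = \frac{\det A(3)}{\det A} = \frac{8}{4} = 2 \]
That is, the unique solution to this SLE is $(x_1, x_2, x_3) = (4, 4, 2)$.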
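The recipe just used (build each A(j), take its determinant, divide by det A) can be sketched in a few lines of Python. This is an illustration only: the function names `det` and `cramer` are assumptions, integer entries are assumed, and exact fractions are used so that answers like 3/5 come out exactly.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along row 1 (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(sub)
    return total

def cramer(A, b):
    """Solve A x = b by Cramer's Rule; A must be square with non-zero determinant."""
    d = det(A)
    if d == 0:
        raise ValueError("Cramer's Rule does not apply: det A = 0")
    solution = []
    for j in range(len(A)):
        # A(j): replace column j of A by the column vector b.
        A_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(A)]
        solution.append(Fraction(det(A_j), d))
    return solution

A = [[1, 1, -1], [1, -1, 1], [1, 0, -2]]
b = [6, 2, 0]
print(cramer(A, b))
```

Note that the function refuses to proceed when det A = 0, matching the warning that Cramer's Rule simply doesn't apply in that case.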
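Example 12.3. Use Cramer’s Rule, if possible, to solve the system
\[ \begin{aligned} 3x_1 - x_2 &= 5 \\ 2x_1 + x_2 &= -2 \end{aligned} \]
Solution:
We have the coefficient matrix $A = \begin{bmatrix} 3 & -1 \\ 2 & 1 \end{bmatrix}$ and the RHS column vector $\vec{b} = \begin{bmatrix} 5 \\ -2 \end{bmatrix}$. We see
that det A = (3)(1) − (2)(−1) = 3 − (−2) = 5 ≠ 0, so Cramer’s Rule can be used. We get:
\[ x_1 = \frac{\det A(1)}{\det A} = \frac{\det \begin{bmatrix} 5 & -1 \\ -2 & 1 \end{bmatrix}}{\det A} = \frac{(5)(1) - (-2)(-1)}{5} = \frac{5 - 2}{5} = \frac{3}{5} \]
\[ \text{and} \quad x_2 = \frac{\det A(2)}{\det A} = \frac{\det \begin{bmatrix} 3 & 5 \\ 2 & -2 \end{bmatrix}}{\det A} = \frac{(3)(-2) - (2)(5)}{5} = \frac{-6 - 10}{5} = -\frac{16}{5} \]
Therefore the unique solution to the SLE is $(x_1, x_2) = \left( \frac{3}{5}, -\frac{16}{5} \right)$.
Check:
\[ \begin{bmatrix} 3 & -1 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} \frac{3}{5} \\ -\frac{16}{5} \end{bmatrix} = \begin{bmatrix} 3\left(\frac{3}{5}\right) - 1\left(-\frac{16}{5}\right) \\ 2\left(\frac{3}{5}\right) + 1\left(-\frac{16}{5}\right) \end{bmatrix} = \begin{bmatrix} \frac{9}{5} + \frac{16}{5} \\ \frac{6}{5} - \frac{16}{5} \end{bmatrix} = \begin{bmatrix} \frac{25}{5} \\ -\frac{10}{5} \end{bmatrix} = \begin{bmatrix} 5 \\ -2 \end{bmatrix} = \vec{b} \]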
Example 12.4. Use Cramer’s Rule, if possible, to solve the following SLE:
\[ \begin{aligned} x_1 + 2x_2 + x_3 &= 0 \\ 3x_2 + 2x_3 &= 2 \\ x_2 + 3x_3 &= 0 \end{aligned} \]
Solution:
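We have $A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 3 & 2 \\ 0 & 1 & 3 \end{bmatrix}$, with $\vec{b} = \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix}$. We get
\[ \det A = \det \begin{bmatrix} 1 & 2 & 1 \\ 0 & 3 & 2 \\ 0 & 1 & 3 \end{bmatrix} = 1 \det \begin{bmatrix} 3 & 2 \\ 1 & 3 \end{bmatrix} - 0 + 0 = 3(3) - 1(2) = 7 \]
so det A ≠ 0 and the system has a unique solution which we may find using Cramer’s Rule. We get:
\[ x_1 = \frac{\det A(1)}{\det A} = \frac{\det \begin{bmatrix} 0 & 2 & 1 \\ 2 & 3 & 2 \\ 0 & 1 & 3 \end{bmatrix}}{7} = \frac{0 - 2 \det \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} + 0}{7} = \frac{-2(6 - 1)}{7} = -\frac{10}{7} \]
\[ x_2 = \frac{\det A(2)}{\det A} = \frac{\det \begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{bmatrix}}{7} = \frac{(1)(2)(3)}{7} = \frac{6}{7} \]
\[ x_3 = \frac{\det A(3)}{\det A} = \frac{\det \begin{bmatrix} 1 & 2 & 0 \\ 0 & 3 & 2 \\ 0 & 1 & 0 \end{bmatrix}}{7} = \frac{1 \det \begin{bmatrix} 3 & 2 \\ 1 & 0 \end{bmatrix} - 0 + 0}{7} = \frac{0 - 2}{7} = -\frac{2}{7} \]
We get the unique solution $(x_1, x_2, x_3) = \left( -\frac{10}{7}, \frac{6}{7}, -\frac{2}{7} \right)$.
Check:
\[ \begin{bmatrix} 1 & 2 & 1 \\ 0 & 3 & 2 \\ 0 & 1 & 3 \end{bmatrix} \begin{bmatrix} -\frac{10}{7} \\ \frac{6}{7} \\ -\frac{2}{7} \end{bmatrix} = \begin{bmatrix} -\frac{10}{7} + \frac{12}{7} - \frac{2}{7} \\ 0 + \frac{18}{7} - \frac{4}{7} \\ 0 + \frac{6}{7} - \frac{6}{7} \end{bmatrix} = \begin{bmatrix} \frac{0}{7} \\ \frac{14}{7} \\ \frac{0}{7} \end{bmatrix} = \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix} = \vec{b} \]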
Therefore, to find AdjA, we simply make a matrix in which the (i, j)-entry is the (i, j)-cofactor of
$A^T$. To do this, it’s usually easiest to start by finding $A^T$. (Alternatively, we could find the matrix
of cofactors of A, and then take the transpose of this matrix, but if we do the transposing at the
beginning, it’s done with and we don’t have to remember to do it at the end.) Of course, since the
cofactors are what we use to calculate the determinant, the effect of the (−1) multipliers is to make
the signs alternate, just the way they do for determinant calculations.
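Example 12.5. Find AdjA where $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$.
Solution:
We have $A^T = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}$, so we get:
\[ \text{Adj}A = \begin{bmatrix} +\det A_{11} & -\det A_{21} \\ -\det A_{12} & +\det A_{22} \end{bmatrix} = \begin{bmatrix} \det[4] & -\det[2] \\ -\det[3] & \det[1] \end{bmatrix} = \begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} \]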
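Example 12.6. If $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}$, find AdjA.
Solution:
We have $A^T = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 4 & 0 \\ 3 & 5 & 6 \end{bmatrix}$, so we get:
\begin{align*}
\text{Adj}A &= \begin{bmatrix}
\det \begin{bmatrix} 4 & 0 \\ 5 & 6 \end{bmatrix} & -\det \begin{bmatrix} 2 & 0 \\ 3 & 6 \end{bmatrix} & \det \begin{bmatrix} 2 & 4 \\ 3 & 5 \end{bmatrix} \\
-\det \begin{bmatrix} 0 & 0 \\ 5 & 6 \end{bmatrix} & \det \begin{bmatrix} 1 & 0 \\ 3 & 6 \end{bmatrix} & -\det \begin{bmatrix} 1 & 0 \\ 3 & 5 \end{bmatrix} \\
\det \begin{bmatrix} 0 & 0 \\ 4 & 0 \end{bmatrix} & -\det \begin{bmatrix} 1 & 0 \\ 2 & 0 \end{bmatrix} & \det \begin{bmatrix} 1 & 0 \\ 2 & 4 \end{bmatrix}
\end{bmatrix} \\
&= \begin{bmatrix} 24 & -12 & -2 \\ 0 & 6 & -5 \\ 0 & 0 & 4 \end{bmatrix}
\end{align*}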
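The "transpose first, then take cofactors" recipe translates directly into code. Here is a Python sketch of it; the names `det` and `adjoint` are illustrative assumptions, and the function is written for square matrices of order 2 or more.

```python
def det(m):
    """Determinant by cofactor expansion along row 1 (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(sub)
    return total

def adjoint(A):
    """Adj A: the (i, j)-entry is the (i, j)-cofactor of the transpose of A."""
    n = len(A)
    At = [list(col) for col in zip(*A)]  # do the transposing at the beginning
    adj = []
    for i in range(n):
        row = []
        for j in range(n):
            # Delete row i and column j of the transpose (0-indexed here).
            sub = [r[:j] + r[j + 1:] for k, r in enumerate(At) if k != i]
            row.append((-1) ** (i + j) * det(sub))
        adj.append(row)
    return adj

print(adjoint([[1, 2], [3, 4]]))
print(adjoint([[1, 2, 3], [0, 4, 5], [0, 0, 6]]))
```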
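Example 12.7. Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$.
(a) Find A(AdjA). (b) Find (AdjA)A. (c) Find det A. (d) Show that $\frac{1}{\det A} \text{Adj}A = A^{-1}$.
Solution:
In Example 12.5 we found $\text{Adj}A = \begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix}$. We use that to calculate the matrix products in
parts (a) and (b).
(a) We get:
\[ A(\text{Adj}A) = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} 4 - 6 & -2 + 2 \\ 12 - 12 & -6 + 4 \end{bmatrix} = \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix} \]
Hmm. A diagonal matrix. With the same entry repeated along the diagonal. (Must be a fluke.)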
Why look! It’s the same matrix! We know that generally AB 6= BA, so that’s surprising.
1 2
(c) We have det A = det = 1(4) − 3(2) = 4 − 6 = −2. Goodness! That’s the number that
3 4
was all along the diagonal of the 2 (same) product matrices.
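(d) We already found, in part (b), that $(\operatorname{Adj}A)A = \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix}$, and from part (c) we have
$\det A = -2$, so we see that
$$\left(\frac{1}{\det A}\operatorname{Adj}A\right)A = \frac{1}{-2}\left[(\operatorname{Adj}A)A\right] = -\frac{1}{2}\begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2$$
Since $\left(\frac{1}{\det A}\operatorname{Adj}A\right)A = I_2$, then by Theorem 8.2 $\frac{1}{\det A}\operatorname{Adj}A$ is the inverse of A.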
What we found in that example was not, of course, just a fluke. (If it was just a fluke, we either
wouldn’t have done that example, or would immediately follow it with another example that didn’t
turn out that way, to show it wasn’t a general rule.) It can be proved (although we’re not going to
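do it) that for any square matrix A, it is always true that
$$A(\operatorname{Adj}A) = (\operatorname{Adj}A)A = (\det A)I$$
And that means that if $\det A \neq 0$, then A is invertible and we can post-multiply through by $A^{-1}$ in
the equation $(\operatorname{Adj}A)A = (\det A)I$ to get
$$(\operatorname{Adj}A)A(A^{-1}) = (\det A)I(A^{-1}) \;\Rightarrow\; \operatorname{Adj}A = (\det A)A^{-1} \;\Rightarrow\; A^{-1} = \frac{1}{\det A}\operatorname{Adj}A$$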
So this gives us a very different way to find $A^{-1}$, when it exists, than the way we were doing it before.
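The relationship between A, AdjA and det A can be spot-checked numerically. The sketch below is only an illustration on the matrices from Examples 12.5 and 12.6, not a proof; the helper names `det`, `adjoint`, `matmul` and `scaled_identity` are ours, and `adjoint` uses the alternative mentioned earlier (cofactors of A, then transpose).

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )

def adjoint(a):
    """Adj A: entry (i, j) is the (j, i)-cofactor of a (order 2 or more)."""
    n = len(a)

    def cofactor(i, j):
        sub = [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]
        return (-1) ** (i + j) * det(sub)

    return [[cofactor(j, i) for j in range(n)] for i in range(n)]

def matmul(p, q):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*q)] for row in p]

def scaled_identity(c, n):
    """The n x n matrix c * I."""
    return [[c if i == j else 0 for j in range(n)] for i in range(n)]

for A in ([[1, 2], [3, 4]], [[1, 2, 3], [0, 4, 5], [0, 0, 6]]):
    adj = adjoint(A)
    target = scaled_identity(det(A), len(A))
    # Both products should equal (det A) I.
    assert matmul(A, adj) == target
    assert matmul(adj, A) == target
    print(det(A), adj)
```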
Solution:
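We have $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}$, and in Example 12.6 we found that $\operatorname{Adj}A = \begin{bmatrix} 24 & -12 & -2 \\ 0 & 6 & -5 \\ 0 & 0 & 4 \end{bmatrix}$. We
see that $\det A = (1)(4)(6) = 24$ and so using the adjoint to find the inverse matrix we get:
$$A^{-1} = \frac{1}{\det A}\operatorname{Adj}A = \frac{1}{24}\begin{bmatrix} 24 & -12 & -2 \\ 0 & 6 & -5 \\ 0 & 0 & 4 \end{bmatrix} = \begin{bmatrix} 1 & -\frac{1}{2} & -\frac{1}{12} \\ 0 & \frac{1}{4} & -\frac{5}{24} \\ 0 & 0 & \frac{1}{6} \end{bmatrix}$$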
Notice that besides giving us a new way to find the inverse matrix, it also gives us a way to check
our calculation of AdjA. We check that this really is the inverse of A:
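$$\begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}\begin{bmatrix} 1 & -\frac{1}{2} & -\frac{1}{12} \\ 0 & \frac{1}{4} & -\frac{5}{24} \\ 0 & 0 & \frac{1}{6} \end{bmatrix} = \begin{bmatrix} 1 + 0 + 0 & -\frac{1}{2} + \frac{2}{4} + 0 & -\frac{1}{12} - \frac{10}{24} + \frac{3}{6} \\ 0 + 0 + 0 & 0 + \frac{4}{4} + 0 & 0 - \frac{20}{24} + \frac{5}{6} \\ 0 + 0 + 0 & 0 + 0 + 0 & 0 + 0 + \frac{6}{6} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$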
Example 12.9. Use the adjoint to find the inverse of $A = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 3 & 0 \\ 1 & 0 & 5 \end{bmatrix}$.
Solution:
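Expanding along row 2 we get $\det A = -0 + 3\det\begin{bmatrix} 1 & 2 \\ 1 & 5 \end{bmatrix} - 0 = 3(5 - 2) = 9$, so $A^{-1}$ exists. We
have $A^T = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 3 & 0 \\ 2 & 0 & 5 \end{bmatrix}$ so we get
$$\operatorname{Adj}A = \begin{bmatrix}
\det\begin{bmatrix} 3 & 0 \\ 0 & 5 \end{bmatrix} & -\det\begin{bmatrix} 0 & 0 \\ 2 & 5 \end{bmatrix} & \det\begin{bmatrix} 0 & 3 \\ 2 & 0 \end{bmatrix} \\
-\det\begin{bmatrix} 0 & 1 \\ 0 & 5 \end{bmatrix} & \det\begin{bmatrix} 1 & 1 \\ 2 & 5 \end{bmatrix} & -\det\begin{bmatrix} 1 & 0 \\ 2 & 0 \end{bmatrix} \\
\det\begin{bmatrix} 0 & 1 \\ 3 & 0 \end{bmatrix} & -\det\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} & \det\begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}
\end{bmatrix}
= \begin{bmatrix} 15 & 0 & -6 \\ 0 & 3 & 0 \\ -3 & 0 & 3 \end{bmatrix}$$
and
$$A^{-1} = \frac{1}{\det A}\operatorname{Adj}A = \frac{1}{9}\begin{bmatrix} 15 & 0 & -6 \\ 0 & 3 & 0 \\ -3 & 0 & 3 \end{bmatrix} = \begin{bmatrix} \frac{15}{9} & 0 & -\frac{6}{9} \\ 0 & \frac{3}{9} & 0 \\ -\frac{3}{9} & 0 & \frac{3}{9} \end{bmatrix} = \begin{bmatrix} \frac{5}{3} & 0 & -\frac{2}{3} \\ 0 & \frac{1}{3} & 0 \\ -\frac{1}{3} & 0 & \frac{1}{3} \end{bmatrix}$$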
Check:
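$$AA^{-1} = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 3 & 0 \\ 1 & 0 & 5 \end{bmatrix}\begin{bmatrix} \frac{5}{3} & 0 & -\frac{2}{3} \\ 0 & \frac{1}{3} & 0 \\ -\frac{1}{3} & 0 & \frac{1}{3} \end{bmatrix} = \begin{bmatrix} \frac{5}{3} + 0 - \frac{2}{3} & 0 + 0 + 0 & -\frac{2}{3} + 0 + \frac{2}{3} \\ 0 + 0 + 0 & 0 + 1 + 0 & 0 + 0 + 0 \\ \frac{5}{3} + 0 - \frac{5}{3} & 0 + 0 + 0 & -\frac{2}{3} + 0 + \frac{5}{3} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$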
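The whole procedure of Example 12.9 (find det A, find AdjA, divide, then check) can be sketched in a few lines of Python. This is a minimal sketch, assuming integer entries and using exact fractions; the function names are ours, not from the text.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )

def adjoint(a):
    """Adj A: entry (i, j) is the (i, j)-cofactor of the transpose (order 2 or more)."""
    at = [list(col) for col in zip(*a)]
    n = len(a)

    def cofactor(i, j):
        sub = [row[:j] + row[j + 1:] for r, row in enumerate(at) if r != i]
        return (-1) ** (i + j) * det(sub)

    return [[cofactor(i, j) for j in range(n)] for i in range(n)]

def inverse_via_adjoint(a):
    """A^{-1} = (1 / det A) Adj A; raises ValueError when det A = 0."""
    d = det(a)
    if d == 0:
        raise ValueError("det A = 0, so A is not invertible")
    return [[Fraction(entry, d) for entry in row] for row in adjoint(a)]

A = [[1, 0, 2], [0, 3, 0], [1, 0, 5]]
A_inv = inverse_via_adjoint(A)

# Check: A times A^{-1} should be the identity matrix.
product = [[sum(x * y for x, y in zip(row, col)) for col in zip(*A_inv)] for row in A]
print(product == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
```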
Example 12.10. If $\det A = 6$ and $\operatorname{Adj}A = \begin{bmatrix} 6 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}$, find A.
Solution:
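We find $A^{-1}$ and then find its inverse, which is A.
$$A^{-1} = \frac{1}{\det A}\operatorname{Adj}A = \frac{1}{6}\begin{bmatrix} 6 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & \frac{1}{3} \end{bmatrix}$$
Notice that in this case the easiest way to find $(A^{-1})^{-1} = A$ is to row reduce. We just multiply row
2 by 2 and row 3 by 3:
$$\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & \frac{1}{2} & 0 & 0 & 1 & 0 \\ 0 & 0 & \frac{1}{3} & 0 & 0 & 1 \end{array}\right] \xrightarrow{\text{RREF}} \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 2 & 0 \\ 0 & 0 & 1 & 0 & 0 & 3 \end{array}\right]$$
We see that $A = (A^{-1})^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$.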
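The two steps of Example 12.10 (scale AdjA by 1/det A, then invert the result by row reduction) can be sketched as follows. This is a minimal Gauss-Jordan sketch with exact fractions; the function name `inverse_by_row_reduction` is ours, and the code is a general routine rather than the two-row shortcut used above.

```python
from fractions import Fraction

def inverse_by_row_reduction(m):
    """Row reduce [m | I] to [I | m^{-1}]; raises ValueError if m is not invertible."""
    n = len(m)
    # Augmented matrix [m | I] with exact rational entries.
    aug = [
        [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(m)
    ]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Clear every other entry in the pivot column.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

det_A = 6
adj_A = [[6, 0, 0], [0, 3, 0], [0, 0, 2]]
A_inv = [[Fraction(v, det_A) for v in row] for row in adj_A]
A = inverse_by_row_reduction(A_inv)
print(A == [[1, 0, 0], [0, 2, 0], [0, 0, 3]])  # True
```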
Example 12.11. If $\det A = 2$ and $A^{-1} = \begin{bmatrix} 3 & 2 & 1 \\ 0 & \frac{1}{2} & 2 \\ 0 & 0 & \frac{1}{3} \end{bmatrix}$, find AdjA.
Solution:
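Since $A^{-1} = \frac{1}{\det A}\operatorname{Adj}A$, then $\operatorname{Adj}A = (\det A)A^{-1}$. (That is, just multiply through the equation
by det A.) So we see that:
$$\operatorname{Adj}A = (\det A)A^{-1} = 2\begin{bmatrix} 3 & 2 & 1 \\ 0 & \frac{1}{2} & 2 \\ 0 & 0 & \frac{1}{3} \end{bmatrix} = \begin{bmatrix} 6 & 4 & 2 \\ 0 & 1 & 4 \\ 0 & 0 & \frac{2}{3} \end{bmatrix}$$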
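Going from $A^{-1}$ back to AdjA is just a scalar multiplication, which the following minimal Python sketch illustrates with the numbers from Example 12.11. The function name `adjoint_from_inverse` is ours, not from the text.

```python
from fractions import Fraction

def adjoint_from_inverse(det_a, a_inv):
    """Adj A = (det A) A^{-1}: multiply every entry of the inverse by det A."""
    return [[det_a * entry for entry in row] for row in a_inv]

det_A = 2
A_inv = [
    [3, 2, 1],
    [0, Fraction(1, 2), 2],
    [0, 0, Fraction(1, 3)],
]
adj_A = adjoint_from_inverse(det_A, A_inv)
print(adj_A)  # rows (6, 4, 2), (0, 1, 4), (0, 0, 2/3)
```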
Solution:
As we have already seen, the formula $A^{-1} = \frac{1}{\det A}\operatorname{Adj}A$ can be rearranged to $\operatorname{Adj}A = (\det A)A^{-1}$,
so we just need to take the determinant of both sides. That is, since $\operatorname{Adj}A = (\det A)A^{-1}$ then
$\det(\operatorname{Adj}A) = \det[(\det A)A^{-1}]$. We use the facts that (1) when we factor a scalar multiplier out of
a determinant calculation, the scalar gets raised to the power n, where n is the order of the square
matrix whose determinant we’re calculating (i.e. $\det(cA) = c^n \det A$, from Corollary 11.6 on page
152), and of course the det A multiplier is a scalar, and (2) the determinant of the inverse is the
inverse (i.e. reciprocal) of the determinant (that is, $\det A^{-1} = \frac{1}{\det A}$, from Corollary 11.8 on page
154). Here, since A is a square matrix of order 5, then so is $A^{-1}$ and so we get:
$$\det(\operatorname{Adj}A) = \det[(\det A)A^{-1}] = (\det A)^5 \det(A^{-1}) = (-2)^5\left(\frac{1}{\det A}\right) = \frac{(-2)^5}{-2} = (-2)^4 = 16$$
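The same argument works for any order n: the scalar comes out as $(\det A)^n$ and one factor cancels against $\frac{1}{\det A}$, leaving $(\det A)^{n-1}$. A minimal Python sketch of that general formula, assuming A is invertible, with a cross-check against the adjoint found in Example 12.6 (the function names are ours, not from the text):

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )

def det_of_adjoint(det_a, n):
    """det(Adj A) for an invertible n x n matrix A with determinant det_a."""
    return det_a ** (n - 1)

# Order 5 with det A = -2, as in the calculation above.
print(det_of_adjoint(-2, 5))  # 16

# Cross-check with Example 12.6: order 3, det A = 24.
adj_A = [[24, -12, -2], [0, 6, -5], [0, 0, 4]]
print(det(adj_A) == det_of_adjoint(24, 3))  # True
```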
We’ll finish up the course by looking at one last example, to tie what we’ve just been learning into
solving systems of equations.
Example 12.13. If $\det A = 4$ and $\operatorname{Adj}A = \begin{bmatrix} 4 & -2 & -2 \\ -8 & 6 & 2 \\ 2 & -2 & 2 \end{bmatrix}$, find the unique solution to the SLE
$A\vec{x} = \vec{b}$ where $\vec{b} = (2, -1, 1)$.
Solution:
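Since we know the adjoint matrix and the value of det A, then we actually know $A^{-1}$ and so we can
use the method of inverses. We have $A^{-1} = \frac{1}{4}\operatorname{Adj}A$ and so we get:
$$\vec{x} = A^{-1}\vec{b} = \frac{1}{4}(\operatorname{Adj}A)\vec{b} = \frac{1}{4}\begin{bmatrix} 4 & -2 & -2 \\ -8 & 6 & 2 \\ 2 & -2 & 2 \end{bmatrix}\begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix} = \frac{1}{4}\begin{bmatrix} 8 + 2 - 2 \\ -16 - 6 + 2 \\ 4 + 2 + 2 \end{bmatrix} = \frac{1}{4}\begin{bmatrix} 8 \\ -20 \\ 8 \end{bmatrix} = \begin{bmatrix} 2 \\ -5 \\ 2 \end{bmatrix}$$
Notice that as usual we can save ourselves some hassle by waiting until the end to apply the fractional
scalar multiplier.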
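The same order of operations (multiply by AdjA first, divide by det A last) can be sketched in Python. This is a minimal sketch assuming integer entries; the function name `solve_with_adjoint` is ours, not from the text.

```python
from fractions import Fraction

def solve_with_adjoint(det_a, adj_a, b):
    """Solve A x = b as x = (1 / det A)(Adj A) b, dividing only at the end."""
    # Integer arithmetic for the matrix-vector product (Adj A) b.
    products = [sum(entry * bj for entry, bj in zip(row, b)) for row in adj_a]
    # Apply the fractional scalar multiplier last.
    return [Fraction(p, det_a) for p in products]

det_A = 4
adj_A = [[4, -2, -2], [-8, 6, 2], [2, -2, 2]]
b = [2, -1, 1]
x = solve_with_adjoint(det_A, adj_A, b)
print(x == [2, -5, 2])  # True
```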