
December 21, 2011

UNIVERSITY OF NAIROBI
FACULTY OF SCIENCE
B.Ed SCIENCE PROGRAMME LECTURE NOTES
SMA 204: LINEAR ALGEBRA II

WRITTEN BY :

Dr. Bernard Mutuku Nzimbi

REVIEWED BY:

Dr. James K. Katende

School of Mathematics, University of Nairobi


P.O. Box 30197, Nairobi, KENYA.

EDITED BY:

Dr. Bernard Mutuku Nzimbi


Author: Dr. B.M. Nzimbi
School of Mathematics
College of Biological and Physical Sciences
University of Nairobi
P.O. Box 30197, Nairobi, KENYA.
e-mail: nzimbi@[Link]

2010 Mathematics Subject Classification: 11C08, 11D04, 11D09, 11E04, 11E39, 46-01, 47A75.

Copyright © 2011 Benz, Inc. All rights reserved. Printed in Kenya.
Contents

Preface iv

1 DETERMINANTS 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Some Special Matrices . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1 Synopsis on Determinants . . . . . . . . . . . . . . . . . . . . . . 5
1.3.2 Determinants by Cofactor Expansion . . . . . . . . . . . . . . . . 5
1.3.3 Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.4 Determinants by Row/Column Reduction . . . . . . . . . . . . . 13
1.4 Determinants of Block Triangular Matrices . . . . . . . . . . . . . . . . . 18
1.5 Properties of the Determinant Function . . . . . . . . . . . . . . . . . . . 19
1.6 Applications of Determinants . . . . . . . . . . . . . . . . . . . . . . . . 23
1.6.1 Finding Inverse of a Matrix . . . . . . . . . . . . . . . . . . . . . 23
1.6.2 Equivalent Conditions and Systems of Linear Equations . . . . . 25
1.6.3 Cramer’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.6.4 Area, Volume and Equations of Lines and Planes . . . . . 28
1.7 Solved Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

2 EIGENVALUES AND EIGENVECTORS 46


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.2 The Eigenvalue Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

2.2.1 Applications of the Eigenvalue Problem . . . . . . . . . . . . . . . 48
2.2.2 Finding Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . 48
2.2.3 Characteristic Polynomial of a Square Matrix . . . . . . . . . . . 50
2.3 Polynomial Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.4 Algebraic Multiplicity and Geometric Multiplicity of an Eigenvalue . . . 54
2.4.1 Characteristic Polynomials of Block Triangular Matrices . . . . . 56
2.5 Similarity and Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . 57
2.5.1 Similar Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.5.2 Diagonalization of Square Matrices . . . . . . . . . . . . . . . . . 59
2.6 Orthonormal diagonalization . . . . . . . . . . . . . . . . . . . . . . . . 64
2.6.1 Diagonalization of Symmetric Matrices . . . . . . . . . . . . . . . 65
2.7 Solved Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

3 MINIMAL POLYNOMIAL OF A SQUARE MATRIX 81


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.2 Minimal polynomial of a Block Triangular Matrix . . . . . . . . . . . . . 85
3.3 Minimal polynomials of Diagonalizable matrices . . . . . . . . . . . . . . 86
3.4 Minimal polynomials of Similar matrices . . . . . . . . . . . . . . . . . . 86
3.5 Solved Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

4 LINEAR FUNCTIONALS 92
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.2 Dual Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.3 Dual Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.3.1 Determining Dual Basis given a Basis for a Vector Space . . . . . 94
4.4 Solved Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

5 BILINEAR AND QUADRATIC FORMS 98


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.2 Bilinear Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

5.2.1 Bilinear Forms and Matrices . . . . . . . . . . . . . . . . . . . . . 100
5.2.2 Symmetric Bilinear Forms, Quadratic Forms . . . . . . . . . . . . 100
5.2.3 Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.2.4 Classification of Real Symmetric Bilinear Forms . . . . . . . . . . 102
5.3 Testing for Positive Definiteness . . . . . . . . . . . . . . . . . . . . . . . 103
5.3.1 Change of Variable in a Quadratic Form . . . . . . . . . . . . . . 104
5.3.2 Geometric View of Principal Axes . . . . . . . . . . . . . . . . . . 106
5.4 Constrained Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.5 Solved Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

6 ORTHOGONAL MATRICES AND OPERATORS 111


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.2 Unitary and Orthogonal Matrices . . . . . . . . . . . . . . . . . . . . . . 112
6.3 Some Special Isometries . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.3.1 Special Orthogonal Groups in Two and Three Dimensions . . . . 115
6.4 Solved Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
6.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

Bibliography 122

Preface
The study of linear algebra is indispensable for a prospective student of pure mathematics, statistics, or applied mathematics. The subject of linear algebra is one of the fundamental areas of mathematics, and is the foundation for the study of many advanced topics, not only in mathematics, but also in engineering and the physical sciences. It prepares students for the demands of the physical sciences. A thorough understanding of the concepts of linear algebra has also become increasingly important for the study of advanced topics in economics and the social sciences. Topics such as the eigenvalue problem, determinants, bilinear and quadratic forms, and linear operators are fundamental in mathematics and physics as well as in engineering, economics, and many other areas.
The only absolute prerequisites for mastering the material in the book are SMA 203: Linear Algebra I, an interest in mathematics, and a willingness occasionally to suspend disbelief when a familiar idea occurs in an unfamiliar guise. But only an exceptional student would profit from reading the book unless he/she has previously acquired a fair working knowledge of vector spaces and matrices.

This book is a development of various courses designed for second-year students of mathematics and the humanities and third-year students of education at the University of Nairobi, whose preparation has been some rudimentary knowledge of matrices and vector spaces. The lectures have been designed to facilitate learning by outreach and distance-learning students on their own.

Objectives
At the end of this course unit the learner will be able to:

• Find determinants of square matrices by cofactor expansion and by elementary row/column operations.

• Apply the notion of a determinant to find the inverse of an invertible matrix; check whether a given system of linear equations has a unique solution, no solution or more than one solution; solve a system of equations using Cramer's rule; find the area of a triangle, equations of straight lines, curves and planes, and the volume of some solids.

• Comprehend the eigenvalue problem of a square matrix; find eigenvalues and corresponding eigenvectors of a square matrix; find characteristic values of block triangular matrices and show that a square matrix is a root of its characteristic polynomial.

• Understand the concept of similar matrices; be able to diagonalize some matrices; show that symmetric matrices are diagonalizable; be able to orthogonally diagonalize a symmetric matrix.

• Find the minimal polynomial of a square matrix; find the minimal polynomial of a block triangular matrix; link the minimal polynomial of a matrix to its diagonalizability.

• Define a linear functional; find the dual basis of a given vector space.

• Comprehend the concepts of a bilinear form and a quadratic form; give a matrix representation of bilinear and quadratic forms; classify real quadratic forms; test for positive definiteness; apply change of variable in a quadratic form; apply these concepts in constrained optimization.

• Comprehend the notions of unitary and orthogonal matrices; apply these concepts in some actions such as rotations and reflections.

The learner will leave with an understanding of the basic results of linear algebra and an
appreciation of the beauty and utility of mathematics. The learner will also be fortified
with the mathematical maturity required for subsequent courses in abstract algebra, real analysis, and elementary topology.
The organization of the book is as follows:
In Chapter one, we introduce the notion of a determinant and study its properties. We
explore two methods of computing determinants of matrices. We also explore some ap-
plications of determinants.

In Chapter two, we give a general theory of the eigenvalue problem. We spend some time showing how to find eigenvalues and eigenvectors of a matrix. This study shows how to find the characteristic polynomial of a square matrix. We also investigate similarity and diagonalization, and show that similar matrices have the same characteristic polynomial.

In Chapter three, we extend our study to the minimal polynomial, a unique polynomial extracted from the characteristic polynomial. We investigate the relationship between the characteristic polynomial and the minimal polynomial of a square matrix. This relationship will tell us whether or not a matrix is diagonalizable.

The fourth chapter is devoted to the study of linear functionals, which are scalar-valued
linear transformations. The most important concept here is to find the dual basis for a
given vector space.

In Chapter five, we generalize the notions of linear mappings and linear functionals. We
introduce the notion of a bilinear form, which gives rise to a quadratic form. We give
some applications of these concepts in constrained optimization.

Finally, in Chapter six, we continue our study of orthogonal matrices and extend it to orthogonal linear operators. We look at some properties of unitary and orthogonal matrices and operators. Last, but not least, we study some special orthogonal matrices in two and three dimensions and their applications in rotations and reflections.

Each chapter has several examples that are solved in detail. The idea is to remove the
mystery and show the student how to solve problems. Exercises at the end of each
chapter have been designed to correspond to the solved problems in the text so that the student can reinforce ideas learned while reading the chapter. A considerable effort was expended in designing the exercises to ensure an appropriate level of difficulty. The material in all these chapters constitutes the course unit of Linear Algebra II, a course
designed mainly for undergraduate students in the programmes of Bachelor of Education
(both Science and Arts) and Bachelor of Science by distance learning. It can also be
used by the undergraduate students in Bachelor of Science and Bachelor of Arts.

This book has been presented and developed with all the rigour linear algebra requires
at this level. Every definition and every theorem is stated carefully, and many of these theorems have complete proofs. I have enjoyed writing this book and it is my hope that you will enjoy reading it. I hope that students find their time spent reading the book profitable. I hope instructors find it flexible enough to fit the needs of their course. I also hope that everyone will send me their comments and suggestions, which
can help in improving this edition.

Acknowledgements
This book could not have been written without direct and indirect assistance from many
sources. I had the good fortune of interacting with a number of special people and insti-
tutions. I seize this opportunity to express my gratitude to all of them. It is a pleasure to acknowledge the encouragement I received for this project from
my former students and colleagues at the University of Nairobi and beyond. I have
been fortunate to meet my mentors at Syracuse University, especially Prof. Jack Ucci,
Prof. A. Lutoborski and Prof. Steven Diaz. Their enthusiasm and intellectual integrity were important ingredients in keeping me going. This experience has greatly influenced my work and my thinking and I hope some of it is reflected in this text. To them I owe a
special debt of gratitude. I am indebted to Dr. James K. Katende and Mwalimu Clau-
dio Achola for their constructive comments when they were reviewing the manuscript,
which have led to improvements including corrections, revisions and additional exercises.
I wish to acknowledge, with thanks, my colleagues who have encouraged me to acquire
expert knowledge of the typesetting software LaTeX. As always, I am grateful to the
University of Nairobi, whose facilities have made it possible to write this book. They have also provided an environment that helped me develop academically and professionally and given me the resources needed to make this book a success. I am also thankful
to the School of Mathematics for giving me the opportunity to teach linear algebra for
years, which has indeed helped this project to take shape. I would also like to thank
the Open and Distance Learning (ODL) Programme for giving me the opportunity at
several review sessions at Multimedia University, Mbagathi and Kenya Wildlife Training
Institute, Naivasha to organize and typeset this book. In these circumstances it would
have been extraordinarily slothful not to produce a book. I read the galleys myself and
checked the calculations in the text. If the book contains any errors, stylistic misjudgements or shortcomings, they are entirely mine!

Last, but not least, I am grateful to my family for their continuing and unwavering
support. To them I dedicate this book.

Dr. Bernard Mutuku Nzimbi


Nairobi, December 2011

Notation and Symbols
The following symbols and notation will be used throughout the book.

|A| or det(A): Determinant of a square matrix A.


aij : The (i,j)-entry of a matrix A.
Mij : The minor of aij; the determinant of the submatrix of A obtained by deleting the ith row and jth column of A.
Cij : The cofactor of aij , which is the signed (i, j)-minor.
RREF : The Reduced Row Echelon Form.
Mn×n : n × n matrix; sometimes the vector space of such matrices.
R: Set of real numbers.
C: Set of complex numbers.
F or K: Scalar field.
At or AT : Transpose of A.
In : The n × n identity matrix.
A−1 : The inverse of matrix A.
adj(A): The classical adjoint (adjugate) of A, which is the transpose of the matrix of cofactors of A.
χA (λ): The characteristic polynomial of a square matrix A.
mA (λ): The minimal polynomial of A.
EA (λ): The eigenspace corresponding to the eigenvalue λ of A.
Rn : The n-dimensional real space R × R × . . . × R.
Cn : The n-dimensional complex space or unitary space C × C × . . . × C.
⟨u, v⟩: The inner product, dot or scalar product of the vectors u and v.

Chapter 1

DETERMINANTS

1.1 Introduction

Determinants are among the most useful topics of linear algebra, with numerous applica-
tions in engineering, physics, economics, mathematics, and other sciences. In geometry
they offer a natural setting for writing very elegant formulas that compute areas and vol-
umes, as well as equations of geometric objects such as lines, circles, planes, spheres, etc.

Objectives
At the end of this lecture, you should be able to:

• Compute the determinant of any square matrix by cofactor expansion and by using elementary row/column operations.

• Use determinants to find equations of a line, the area of a triangle or parallelogram, and the volume of a parallelepiped.

• Use determinants to find the inverse of a nonsingular matrix.

• Use determinants to check whether a given square system of linear equations has a unique solution, no solution or more than one solution.

In order to discuss determinants of square matrices, we need to introduce some defini-
tions, terminologies and notations frequently used in linear algebra.

1.2 Matrices

Definition 1.1 A matrix A is a rectangular array of elements (called scalars) from an arbitrary but fixed field K, where each element, called an entry, is indexed by two subscripts.

A field could be: the set of real numbers, R, the complex numbers, C, etc.
A matrix A is usually presented in the form:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

The rows of A are the m horizontal lists of scalars:
$$(a_{11}, a_{12}, \ldots, a_{1n}),\ (a_{21}, a_{22}, \ldots, a_{2n}),\ \ldots,\ (a_{m1}, a_{m2}, \ldots, a_{mn})$$
and the columns of A are the n vertical lists of scalars:
$$\begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{pmatrix},\ \begin{pmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{pmatrix},\ \ldots,\ \begin{pmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{pmatrix}$$

The element aij , called the ij-entry or ij-element, appears in row i and column j.
We will denote such a matrix by simply writing A = [aij ].
A matrix with m rows and n columns is called an m by n matrix, written m × n. The
pair of numbers m and n is called the size or order of the matrix. In denoting the size
of a matrix we always list the number of rows first and the number of columns second.
Two matrices A and B are equal, written A = B, if they have the same size and if the
corresponding elements are equal.
A matrix with only one row is a row matrix or row vector, and a matrix with only one

column is called a column matrix, or column vector. A matrix whose entries are zero is
called a zero matrix and will be denoted by 0.
By interchanging the rows and columns of A, the transpose matrix A^T is generated. Namely, if
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \quad\text{then}\quad A^T = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix}.$$
The rows of A^T are the columns of A and the ij-th element of A^T is a_{ji}, i.e., (A^T)_{ij} = a_{ji}. If A is an m × n matrix, then A^T is an n × m matrix.
Matrices whose entries are all real numbers are called real matrices and said to be
matrices over R. Analogously, matrices whose entries are all complex numbers are
called complex matrices and are said to be matrices over C.
An m × n matrix A is said to be square if it has the same number of rows as columns (i.e., if m = n). Square matrices figure importantly in applications of linear algebra, but non-square matrices are also encountered in common physical applications, especially in least squares data analysis.

Example 1.1 The following are examples of matrices:
$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix},\quad [-2],\quad \begin{pmatrix} 1 & 2 & 5 \end{pmatrix},\quad \begin{pmatrix} 1 \\ 9 \\ -4 \end{pmatrix}$$

Remark 1.1

Often when dealing with 1 × 1 matrices, for instance, A = [−2], we will drop the
surrounding brackets and just write A = −2.
There are a lot of notational issues that we are going to have to get used to in this
course, especially when reading books by different authors. We will stick to a particular
convention:
Upper case letters denote matrices and lower case letters denote numbers or entries of
a matrix. The entry in the ith row and the jth column of a matrix A is denoted by aij

or (A)ij . The first (leftmost) subscript will always give the row the entry is in and the
second (rightmost) subscript will always give the column the entry is in. We denote the
determinant of a square matrix A by det(A) or |A|.
 
Definition 1.2 In an n × n matrix
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix},$$
the entries a11, a22, ..., ann are called the main diagonal of the matrix.

1.2.1 Some Special Matrices

The most basic examples of matrices are:

1. Zero matrix: a matrix whose entries are all zeros, e.g.
$$A = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}$$

2. Identity matrix: a matrix with 1's on the main diagonal and 0's elsewhere (off the main diagonal). That is,
$$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$

In this course we will be mainly concerned with such real square matrices.

1.3 Determinants

A determinant is defined specifically for a square matrix.

1.3.1 Synopsis on Determinants

For any square array of numbers, i.e., a square matrix, we can define a determinant: a
scalar number, real or complex. In this chapter we will give the fundamental definition
of a determinant and use it to prove several elementary properties. These properties
include: determinant addition, scalar multiplication, row and column addition or sub-
traction, and row and column interchange. As we will see, the elementary properties
often enable easy evaluation of a determinant, which otherwise could require an exceed-
ingly large number of multiplication and addition operations. Every determinant has
cofactors, which are also determinants but of lower order (if the determinant corresponds
to an n × n array, its cofactors correspond to (n − 1) × (n − 1) arrays). We will show how
determinants can be evaluated as linear expansions of cofactors. We will then use these
cofactor expansions to prove that a system of linear equations has a unique solution if
the determinant of the coefficients in the linear equations is not 0. This result is known
as Cramer’s rule, which gives the analytic solution to the linear equations in terms of
ratios of determinants. The properties of determinants established in this chapter will
play (in the chapters to follow) a big role in the theory of eigenvalues and eigenvectors,
inverses of square matrices, analytic geometry, and in the theory of matrices as linear
operators in vector spaces.
There are two main methods to compute determinants of square matrices. These are:

1. Cofactor Expansion.

2. Elementary Row/Column Operations.

1.3.2 Determinants by Cofactor Expansion

Determinants of order 1 and 2


The determinant of a 1 × 1 matrix A = [a] is a.
The determinant of a 2 × 2 matrix
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \quad\text{is}\quad a_{11}a_{22} - a_{12}a_{21}.$$

Example 1.2 $$\det\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = (1)(4) - (2)(3) = -2.$$

 
Let
$$B = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
The determinant of B can be written in terms of 2 × 2 determinants:
$$\det(B) = a_{11}\det\begin{pmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{pmatrix} - a_{12}\det\begin{pmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{pmatrix} + a_{13}\det\begin{pmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix}$$
or explicitly as
$$\det(B) = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}).$$

In the same manner we can define 4 × 4, 5 × 5,..., etc, determinants. We can continue
similarly and define n × n determinants in terms of (n − 1) × (n − 1) determinants called
minors.

Definition 1.3 Let A be a square matrix. The minor of aij, or the (i, j) minor, denoted by Mij, of A is the determinant of the sub-matrix obtained by deleting the ith row and the jth column of A.

Remark 1.2

In the examples above, we have introduced what is known as the cofactor expansion
of a determinant about the first row. Each entry of the first row was multiplied by
the corresponding minor. Each such product was multiplied by ±1, depending on the
position of the entry. The signed products were added together. In fact there is nothing
special about the choice of the first row in the computation of the determinant. We
could have used any other row or column. Here is how.
Let A be a square matrix. First we assign a sign to each entry of A according to a
checkerboard pattern of pluses and minuses.
 
$$\begin{pmatrix} + & - & + & \cdots \\ - & + & - & \cdots \\ + & - & + & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$

Then we pick any row or column and multiply each signed entry by the corresponding
minor. Finally, we add all these products. Note that the sign of the (i, j) position in
the checkerboard pattern is given by (−1)i+j .
We try to expand a determinant about the row or column with the most zeros. This
avoids the computation of some of the minors.

Definition 1.4 Let A be a square matrix. The cofactor of aij, or the (i, j) cofactor, denoted by Cij, of A is the signed (i, j) minor. That is, Cij = (−1)^{i+j} Mij.

Determinants of order 3 or higher


We can write determinants of 3 × 3 matrices in terms of cofactors. Let
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}. \quad\text{Then}$$
$$\det(A) = a_{11}M_{11} - a_{12}M_{12} + a_{13}M_{13} = a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13} \tag{1.1}$$

Note that the definition is recursive. For example, to compute the determinant of a square matrix of size n = 4, we must compute determinants of matrices of sizes n = 3, n = 2, and n = 1. We can generalize (1.1) to any n × n matrix, n ≥ 3.

Definition 1.5 Let A be an n × n matrix. Then
$$\det(A) = a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13} + \cdots + a_{1n}C_{1n} = \sum_{j=1}^{n} a_{1j}C_{1j}.$$
This is the cofactor expansion of the determinant along the first row of A.

Remark 1.3

In fact there is nothing special about the choice of the first row in the computation of
the determinant of A. We could have used any other row or column of A.
Thus
$$\det(A) = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in} \quad\text{(along row } i\text{)}$$
$$= a_{1s}C_{1s} + a_{2s}C_{2s} + \cdots + a_{ns}C_{ns} \quad\text{(along column } s\text{)},$$
for any 1 ≤ i ≤ n or 1 ≤ s ≤ n.
This says that the determinant of a square matrix is independent of the row or column
used to compute it in cofactor expansion.
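The recursive definition translates directly into code. The following minimal Python sketch (ours, not part of the original notes; the function names are our own) implements cofactor expansion along the first row:

    def minor(A, i, j):
        """Submatrix of A with row i and column j deleted (0-indexed)."""
        return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

    def det_cofactor(A):
        """Determinant by cofactor expansion along the first row."""
        if len(A) == 1:
            return A[0][0]
        # a_{1j} * C_{1j}, where C_{1j} = (-1)^{1+j} M_{1j};
        # with 0-indexed j this sign is (-1)**j
        return sum((-1) ** j * A[0][j] * det_cofactor(minor(A, 0, j))
                   for j in range(len(A)))

    print(det_cofactor([[1, 2], [3, 4]]))   # -2, as in Example 1.2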

Definition 1.6 The process of computing the determinant of a square matrix A across a row or down a column is called cofactor expansion of the determinant.

Example 1.3 Let
$$A = \begin{pmatrix} 4 & 2 & 1 \\ -2 & -6 & 3 \\ -7 & 5 & 0 \end{pmatrix}.$$
Find det(A) using the given cofactor expansion:
(a). Expand along the first row of A.
(b). Expand along the third row.
(c). Expand along the second column.

Solution
(a). The minors along the first row are:
$$M_{11} = \begin{vmatrix} -6 & 3 \\ 5 & 0 \end{vmatrix} = -15, \quad M_{12} = \begin{vmatrix} -2 & 3 \\ -7 & 0 \end{vmatrix} = 21, \quad M_{13} = \begin{vmatrix} -2 & -6 \\ -7 & 5 \end{vmatrix} = -52,$$
and the corresponding cofactors are
$$C_{11} = (-1)^{1+1}M_{11} = -15, \quad C_{12} = (-1)^{1+2}M_{12} = -21, \quad C_{13} = (-1)^{1+3}M_{13} = -52,$$
respectively. Thus, the determinant of A by cofactor expansion along the first row is
$$\det(A) = a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13} = 4(-15) + 2(-21) + 1(-52) = -154.$$

(b). The minors along the third row are:
$$M_{31} = \begin{vmatrix} 2 & 1 \\ -6 & 3 \end{vmatrix} = 12, \quad M_{32} = \begin{vmatrix} 4 & 1 \\ -2 & 3 \end{vmatrix} = 14, \quad M_{33} = \begin{vmatrix} 4 & 2 \\ -2 & -6 \end{vmatrix} = -20,$$
and the corresponding cofactors are
$$C_{31} = (-1)^{3+1}M_{31} = 12, \quad C_{32} = (-1)^{3+2}M_{32} = -14, \quad C_{33} = (-1)^{3+3}M_{33} = -20,$$
respectively. Thus, the determinant of A by cofactor expansion along the third row is
$$\det(A) = a_{31}C_{31} + a_{32}C_{32} + a_{33}C_{33} = -7(12) + 5(-14) + 0(-20) = -154.$$

(c). The minors along the second column are:
$$M_{12} = \begin{vmatrix} -2 & 3 \\ -7 & 0 \end{vmatrix} = 21, \quad M_{22} = \begin{vmatrix} 4 & 1 \\ -7 & 0 \end{vmatrix} = 7, \quad M_{32} = \begin{vmatrix} 4 & 1 \\ -2 & 3 \end{vmatrix} = 14,$$
and the corresponding cofactors are
$$C_{12} = (-1)^{1+2}M_{12} = -21, \quad C_{22} = (-1)^{2+2}M_{22} = 7, \quad C_{32} = (-1)^{3+2}M_{32} = -14,$$
respectively. Thus, the determinant of A by cofactor expansion along the second column is
$$\det(A) = a_{12}C_{12} + a_{22}C_{22} + a_{32}C_{32} = 2(-21) + (-6)(7) + 5(-14) = -154.$$

Some Simple Observations


We describe two simple observations that follow immediately from the definition of
determinant by cofactor expansion.

Proposition 1.1 Suppose that a square matrix A has a zero row or a zero column.
Then det(A) = 0.

Proof. We simply use cofactor expansion along the zero row or the zero column. Suppose A is an n × n matrix and row i of A is a row of zeroes. We compute det(A) via cofactor expansion about row i:
$$\det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} M_{ij} = \sum_{j=1}^{n} a_{ij} C_{ij} = \sum_{j=1}^{n} 0 = 0.$$
The proof for the case of a zero column is entirely similar and could also be derived by employing the transpose of the matrix.
 
Definition 1.7 Consider an n × n matrix
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.$$
If aij = 0 whenever i > j, then A is called an upper triangular matrix. If aij = 0 whenever i < j, then A is called a lower triangular matrix. We say that A is triangular if it is either upper triangular or lower triangular.

   
Example 1.4 $$A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 7 \end{pmatrix}$$ is upper triangular; $$B = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{pmatrix}$$ is lower triangular.

Definition 1.8 A square matrix A is diagonal if it is both upper and lower triangular.

Example 1.5 $$A = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & -3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 8 \end{pmatrix}$$ is diagonal.
Theorem 1.2 If A is a triangular matrix of order (size) n, then
$$\det(A) = a_{11}a_{22}\cdots a_{nn}.$$

Proof. We use mathematical induction on the number of rows/columns of A to prove the case when A is an upper triangular matrix. The case when A is a lower triangular matrix can be proven similarly.
If A has size 1 × 1, then A = [a11] and det(A) = a11.
Assume that the theorem is true for any upper triangular matrix of size k − 1. Consider an upper triangular matrix A of size k. Expanding along the kth row, we obtain
$$\det(A) = 0\cdot C_{k1} + 0\cdot C_{k2} + \cdots + 0\cdot C_{k(k-1)} + a_{kk}C_{kk} = a_{kk}C_{kk}.$$
Note that C_{kk} = (−1)^{k+k}M_{kk} = (−1)^{2k}M_{kk} = M_{kk}, where M_{kk} is the determinant of the upper triangular matrix formed by deleting the kth row and kth column of A. Since this matrix is of size k − 1, we can apply our induction assumption (hypothesis) to write
$$\det(A) = a_{kk}M_{kk} = a_{kk}(a_{11}a_{22}a_{33}\cdots a_{(k-1)(k-1)}) = a_{11}a_{22}a_{33}\cdots a_{kk},$$
as required.
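As a quick numerical check of Theorem 1.2 (our own illustration, assuming NumPy is available):

    import numpy as np

    # The upper triangular matrix of Example 1.6 below; by Theorem 1.2 its
    # determinant is the product of the diagonal entries: 9*(-3)*1*7 = -189.
    A = np.array([[9, 2, 3, 8],
                  [0, -3, 0, 5],
                  [0, 0, 1, 6],
                  [0, 0, 0, 7]], dtype=float)
    print(np.linalg.det(A))        # approximately -189.0
    print(np.prod(np.diag(A)))     # -189.0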

 
Example 1.6 If $$A = \begin{pmatrix} 9 & 2 & 3 & 8 \\ 0 & -3 & 0 & 5 \\ 0 & 0 & 1 & 6 \\ 0 & 0 & 0 & 7 \end{pmatrix},$$ then det(A) = 9(−3)(1)(7) = −189.

We need the following definitions.

Definition 1.9 An m × n matrix A is in echelon form if


1. All the rows that consist of zeros are grouped together at the bottom of the matrix;
2. The first (leading, counting from left to right) nonzero entry in the (i + 1)st row must appear in a column to the right of the leading nonzero entry in the ith row.

Definition 1.10 A matrix A that is in echelon form is in Reduced Row Echelon Form (RREF), provided that the first nonzero element in each nonzero row is the only nonzero entry in its column. This means that a matrix A is in RREF if it satisfies properties (1) and (2) in Definition 1.9 and the following two properties:
3. The leading entry in any nonzero row is 1;
4. All entries in the column above and below a leading 1 are zero.

Remark 1.4

Note that
1. A reduced row echelon form matrix is always in echelon form.
2. A zero row of a matrix is a row that consists entirely of zeros and a nonzero row
is a row that has at least one nonzero entry.
3. The first nonzero entry of a nonzero row is called a leading entry.
4. If a leading entry happens to be 1, we call it a leading 1 .

1.3.3 Gaussian Elimination

The following process reduces a matrix to echelon or RREF:


Step 1: Locate the leftmost nonzero column.
Step 2: If the first row has a zero in the column of Step 1, interchange it with one that has a nonzero entry in the same column.
Step 3: Obtain zeros below the leading entry by adding suitable multiples of the top row to the rows below it.
Step 4: Temporarily cover (ignore) the top row and repeat the same process starting with Step 1 applied to the leftover submatrix. Repeat this process with the rest of the rows. (At this stage, the matrix is already in echelon form.)
Step 5: Starting with the last nonzero row, work upward: for each row obtain a leading 1 and introduce zeros above it by adding suitable multiples of that row to the rows above.

A matrix B obtained from A by this process is said to be in Reduced Row Echelon Form.
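The five steps above can be sketched in Python as follows (our own sketch, not part of the original notes; Fraction keeps the arithmetic exact):

    from fractions import Fraction

    def rref(A):
        """Reduce a matrix (a list of rows) to RREF, following Steps 1-5."""
        A = [[Fraction(x) for x in row] for row in A]
        rows, cols = len(A), len(A[0])
        pivot_row = 0
        for col in range(cols):
            # Steps 1-2: find a row at or below pivot_row with a nonzero entry here
            pr = next((r for r in range(pivot_row, rows) if A[r][col] != 0), None)
            if pr is None:
                continue
            A[pivot_row], A[pr] = A[pr], A[pivot_row]
            # Step 5: scale to get a leading 1
            pivot = A[pivot_row][col]
            A[pivot_row] = [x / pivot for x in A[pivot_row]]
            # Steps 3 and 5: clear every other entry in the pivot column
            for r in range(rows):
                if r != pivot_row and A[r][col] != 0:
                    factor = A[r][col]
                    A[r] = [x - factor * y for x, y in zip(A[r], A[pivot_row])]
            pivot_row += 1
            if pivot_row == rows:
                break
        return A

    print(rref([[0, 2, -6], [0, -1, 1]]))   # matrix P of Example 1.7 below -> [[0, 1, 0], [0, 0, 1]]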

Example 1.7 Consider the following matrices
$$M = \begin{pmatrix} -1 & 0 & 1 & 2 \\ 0 & 0 & -3 & 4 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad N = \begin{pmatrix} 1 & -6 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad P = \begin{pmatrix} 0 & 2 & -6 \\ 0 & -1 & 1 \end{pmatrix}$$

M and N are in echelon form. N is also in RREF. M is not in RREF because condition
3 fails. Matrix P is not in echelon form, because condition 2 fails.

Example 1.8 Consider the following matrices
$$A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

Matrices A and B are not in echelon form and hence are not in RREF. Matrix C is in
RREF.

Elementary Row/Column Operations

Definition 1.11 The following operations, performed on the rows of a matrix, are called
elementary row operations:
1. Interchanging two rows: Ri ←→ Rj
2. Adding a constant multiple of one row to another: Ri := Ri +cRj , where c is a scalar.
3. Multiplying a row by a nonzero scalar: Ri := cRi .

Definition 1.12 Two matrices are (row) equivalent if one can be obtained from the other by a finite sequence of elementary row operations. Sometimes we use the abbreviation A ∼ B for the statement "matrix A is equivalent to B".

We investigate row operations on square matrices.

1.3.4 Determinants by Row/Column Reduction

We show how certain row/column operations simplify the calculation of determinants. We first study the effect of elementary row operations on determinants.

Proposition 1.3 (ELEMENTARY ROW OPERATIONS) Suppose that A and B are n × n matrices.
(a). If B is obtained from A by interchanging two rows of A, then

det(B) = −det(A)

(b). If B is obtained from A by adding a multiple of one row of A to another row, then

det(B) = det(A)

(c). If B is obtained from A by multiplying one row of A by a non-zero constant c, then
$$\det(B) = c\,\det(A).$$

Proof
(a). The proof is by induction on n, the size of matrix A. It is easily checked that the result holds when n = 2. When n > 2, we use cofactor expansion by a third row, say row i. Then
$$\det(B) = \sum_{j=1}^{n} a_{ij}(-1)^{i+j}\det(B_{ij}),$$
where the (n − 1) × (n − 1) matrices B_{ij} are obtained from the matrices A_{ij} by interchanging two rows of A_{ij}, so that det(B_{ij}) = −det(A_{ij}). It follows that
$$\det(B) = -\sum_{j=1}^{n} a_{ij}(-1)^{i+j}\det(A_{ij}) = -\det(A),$$
as required.

(b). Again, the proof is by induction on n. It is easily checked that the result holds when n = 2. When n > 2, we use cofactor expansion by a third row, say row i. Then
$$\det(B) = \sum_{j=1}^{n} a_{ij}(-1)^{i+j}\det(B_{ij}),$$
where the (n − 1) × (n − 1) matrices B_{ij} are obtained from the matrices A_{ij} by adding a multiple of one row of A_{ij} to another row, so that det(B_{ij}) = det(A_{ij}). It follows that
$$\det(B) = \sum_{j=1}^{n} a_{ij}(-1)^{i+j}\det(A_{ij}) = \det(A),$$
as required.

(c). This is simpler. Suppose that the matrix B is obtained from the matrix A by multiplying row i of A by a nonzero constant c. Then
$$\det(B) = \sum_{j=1}^{n} c\,a_{ij}(-1)^{i+j}\det(B_{ij}).$$
Note now that B_{ij} = A_{ij}, since row i has been removed respectively from B and A. It follows that
$$\det(B) = \sum_{j=1}^{n} c\,a_{ij}(-1)^{i+j}\det(A_{ij}) = c\,\det(A),$$
as required.

Proposition 1.4 (ELEMENTARY COLUMN OPERATIONS) Suppose that A and B are n × n matrices.
(a). If B is obtained from A by interchanging two columns of A, then

det(B) = −det(A)

(b). If B is obtained from A by adding a multiple of one column of A to another column, then
det(B) = det(A)

(c). If B is obtained from A by multiplying one column of A by a nonzero constant c, then
$$\det(B) = c\,\det(A).$$

Proof. This follows from the proof of Proposition 1.3 by replacing "row" with "column".

Remark 1.5

Elementary row and column operations can be combined with cofactor expansion to
calculate the determinant of a given square matrix.
 
Example 1.9 Consider the matrix
$$A = \begin{pmatrix} 2 & 3 & 2 & 5 \\ 1 & 4 & 1 & 2 \\ 5 & 4 & 4 & 5 \\ 2 & 2 & 0 & 4 \end{pmatrix}.$$
Adding −1·C3 to C1, we have
$$\det(A) = \det\begin{pmatrix} 0 & 3 & 2 & 5 \\ 0 & 4 & 1 & 2 \\ 1 & 4 & 4 & 5 \\ 2 & 2 & 0 & 4 \end{pmatrix}.$$
Adding −(1/2)R4 to R3, we have
$$\det(A) = \det\begin{pmatrix} 0 & 3 & 2 & 5 \\ 0 & 4 & 1 & 2 \\ 0 & 3 & 4 & 3 \\ 2 & 2 & 0 & 4 \end{pmatrix}.$$
Using cofactor expansion by column C1, we have
$$\det(A) = 2(-1)^{4+1}\det\begin{pmatrix} 3 & 2 & 5 \\ 4 & 1 & 2 \\ 3 & 4 & 3 \end{pmatrix} = -2\det\begin{pmatrix} 3 & 2 & 5 \\ 4 & 1 & 2 \\ 3 & 4 & 3 \end{pmatrix}.$$
Adding −R1 to R3, we have
$$\det(A) = -2\det\begin{pmatrix} 3 & 2 & 5 \\ 4 & 1 & 2 \\ 0 & 2 & -2 \end{pmatrix}.$$
Adding 1·C2 to C3, we have
$$\det(A) = -2\det\begin{pmatrix} 3 & 2 & 7 \\ 4 & 1 & 3 \\ 0 & 2 & 0 \end{pmatrix}.$$
Using cofactor expansion by R3, we have
$$\det(A) = -2\cdot 2(-1)^{3+2}\det\begin{pmatrix} 3 & 7 \\ 4 & 3 \end{pmatrix} = 4\det\begin{pmatrix} 3 & 7 \\ 4 & 3 \end{pmatrix}.$$
Using the formula for the determinant of 2 × 2 matrices, we conclude that det(A) = 4(9 − 28) = −76.

Example 1.10
$$\begin{vmatrix} 1 & 2 & 3 \\ 10 & 30 & 40 \\ 3 & 1 & 1 \end{vmatrix} = 10\begin{vmatrix} 1 & 2 & 3 \\ 1 & 3 & 4 \\ 3 & 1 & 1 \end{vmatrix} = 10\begin{vmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \\ 3 & 1 & 1 \end{vmatrix} = 10\begin{vmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \\ 0 & -5 & -8 \end{vmatrix} = 10(-8+5) = 10(-3) = -30.$$

Proposition 1.5 If A is an n × n matrix, then det(kA) = |kA| = k^n·|A|.

Remark 1.6

To evaluate the determinant of a matrix, the matrix is row-reduced/column-reduced until we obtain either a triangular matrix or a row/column of zeros. At each stage, there is a factor k, −1, or 1. We keep track of all these constants throughout the reduction process as we apply Proposition 1.3 or Proposition 1.4.

Example 1.11 Use elementary row operations to evaluate the determinant of
$$A = \begin{pmatrix} 0 & 1 & 5 \\ 3 & -6 & 9 \\ 2 & 6 & 1 \end{pmatrix}.$$

Solution
Interchanging R1 and R2 (one sign change), then factoring 3 out of the new first row and reducing, we have
$$\det(A) = (-1)\begin{vmatrix} 3 & -6 & 9 \\ 0 & 1 & 5 \\ 2 & 6 & 1 \end{vmatrix} = (-1)(3)\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 2 & 6 & 1 \end{vmatrix} = (-1)(3)\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 10 & -5 \end{vmatrix}$$
$$= (-1)(3)\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 0 & -55 \end{vmatrix} = (-1)(3)(1)(1)(-55) = 165.$$
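The bookkeeping described in Remark 1.6 can be automated. The sketch below (ours, not part of the original notes) reduces to triangular form while tracking row interchanges:

    def det_by_reduction(A):
        """Determinant via Gaussian elimination, tracking row swaps.
        Each swap flips the sign (Proposition 1.3(a)); adding a multiple of one
        row to another leaves the determinant unchanged (Proposition 1.3(b))."""
        A = [row[:] for row in A]
        n = len(A)
        sign = 1
        for col in range(n):
            pr = next((r for r in range(col, n) if A[r][col] != 0), None)
            if pr is None:
                return 0              # no pivot: the matrix is singular
            if pr != col:
                A[col], A[pr] = A[pr], A[col]
                sign = -sign
            for r in range(col + 1, n):
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
        det = sign
        for i in range(n):
            det *= A[i][i]            # Theorem 1.2: product of the diagonal
        return det

    print(det_by_reduction([[0, 1, 5], [3, -6, 9], [2, 6, 1]]))  # 165.0, as in Example 1.11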

Remark 1.7

If a square matrix has two proportional rows (in particular, if two rows are the same), then the determinant is zero. This result is stated and proved in the following theorem.

Theorem 1.6 Let A be an n × n matrix. If the jth row (or column) of A is a multiple of the kth row (or column) of A, then det(A) = 0.

Proof. We prove the result for columns. Let A = [A1, A2, ..., An], where A1, A2, ..., An denote the n column vectors of A, and suppose that Aj = cAk. Define B to be the matrix obtained from A by replacing its jth column with Ak (so that the jth and kth columns of B are both Ak), and observe that det(A) = c·det(B). Now if we interchange the jth and kth columns of B, then the matrix B remains the same, but the determinant changes sign. This [det(B) = −det(B)] can happen only if det(B) = 0; and since det(A) = c·det(B), then det(A) = 0.

Theorem 1.7 If A, B and C are n × n matrices that are equal except that the sth column (or row) of A is equal to the sum of the sth columns (or rows) of B and C, then det(A) = det(B) + det(C).

Theorem 1.8 If A is an n × n matrix, and if a multiple of the kth row (or column) is added to the jth row (or column), then the determinant is not changed.

Proof. Let A = [A1, A2, ..., Aj, ..., Ak, ..., An] and B = [A1, A2, ..., Aj + cAk, ..., An]. By the theorem above (Theorem 1.7), det(B) = det(A) + det(Q), where Q = [A1, A2, ..., cAk, ..., Ak, ..., An]. But by an earlier result (Theorem 1.6), det(Q) = 0; so det(B) = det(A), and the theorem is proved.

Remark 1.8

We have surveyed two general methods for evaluating determinants:


1. Cofactor Expansion
2. Elementary Row/Column Operations.
The second method is usually faster than cofactor expansion along a row or column. If the size of the matrix is large, the number of arithmetic operations for cofactor expansion can become extremely big. For this reason, most computer and calculator algorithms use the method involving elementary row operations. The following table gives the number of additions (plus subtractions) and multiplications (plus divisions) needed for each of these two methods for matrices of order 3, 5, and 10.

                 Cofactor Expansion            Row Reduction
order n      Additions   Multiplications   Additions   Multiplications
   3                 5                9           5             10
   5               119              205          30             45
  10         3,628,799        6,235,300         285            339

The number of operations for the cofactor expansion of an n × n matrix grows like n!. Since 30! ≈ 2.65 × 10^32, a 30 × 30 matrix would require over 10^32 operations! If a computer could do one trillion operations per second, it would still take over one trillion years to compute the determinant of this matrix using cofactor expansion (given that the matrix has no zero entries), yet row reduction would take only a few seconds!
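The arithmetic behind this claim is easy to verify (our own check, not from the notes):

    import math

    ops = math.factorial(30)                  # about 2.65e32 operations
    years = ops / 1e12 / (3600 * 24 * 365)    # at one trillion operations per second
    print(f"{ops:.2e} operations, about {years:.1e} years")   # ~8.4e12 years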

1.4 Determinants of Block Triangular Matrices

Definition 1.13 Let A = [Aij] be a square matrix partitioned into blocks. If the non-diagonal blocks below (or above) the diagonal blocks are all zero matrices, then A is called a block triangular matrix; if all the non-diagonal blocks are zero matrices, then A is called a block diagonal matrix.
   
Example 1.12 Let $$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 0 & 0 & 9 \end{pmatrix}$$ and $$B = \begin{pmatrix} 1 & 2 & 0 \\ 3 & 4 & 0 \\ 0 & 0 & 7 \end{pmatrix}.$$ Then A is a block triangular matrix while B is a block diagonal matrix.
Theorem 1.9 Let A be a square matrix. If
$$A = \begin{pmatrix} A_1 & B \\ 0 & A_2 \end{pmatrix},$$
then det(A) = det(A1)·det(A2). If B = 0, i.e. a zero matrix, then still det(A) = det(A1)·det(A2).

Note that in the theorem above, the zero matrix denoted by 0 and B need not be square.
We generalize our result to a finite number of diagonal blocks:

Theorem 1.10 Suppose A is an upper (lower) triangular block matrix with diagonal blocks A1, A2, ..., An. Then det(A) = det(A1)·det(A2)···det(An).

 
Example 1.13 Find |M| where
$$M = \begin{pmatrix} 2 & 3 & 4 & 7 & 8 \\ -1 & 5 & 3 & 2 & 1 \\ 0 & 0 & 2 & 1 & 5 \\ 0 & 0 & 3 & -1 & 4 \\ 0 & 0 & 5 & 2 & 6 \end{pmatrix}.$$

Solution. Note that M is an upper triangular block matrix with diagonal blocks
$$M_1 = \begin{pmatrix} 2 & 3 \\ -1 & 5 \end{pmatrix} \quad\text{and}\quad M_2 = \begin{pmatrix} 2 & 1 & 5 \\ 3 & -1 & 4 \\ 5 & 2 & 6 \end{pmatrix}.$$
It is easy to check that det(M1) = 13 and det(M2) = 29. Hence det(M) = det(M1)·det(M2) = 13(29) = 377.
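A numerical check of Theorem 1.10 on the matrix of Example 1.13 (our illustration, assuming NumPy):

    import numpy as np

    M = np.array([[2, 3, 4, 7, 8],
                  [-1, 5, 3, 2, 1],
                  [0, 0, 2, 1, 5],
                  [0, 0, 3, -1, 4],
                  [0, 0, 5, 2, 6]], dtype=float)
    M1, M2 = M[:2, :2], M[2:, 2:]                  # the diagonal blocks
    print(np.linalg.det(M))                        # approximately 377.0
    print(np.linalg.det(M1) * np.linalg.det(M2))   # 13 * 29 = 377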

1.5 Properties of the Determinant Function

Let Mn×n denote the vector space of all n × n matrices over a field K. Then the determinant function det : Mn×n → C (or R), defined by
$$A \mapsto \det(A),$$
is a complex-valued or real-valued map depending on the field K.

Definition 1.14 If A is an m × n matrix then the transpose of A, denoted by At is


the n × m matrix whose ith column is the ith row of A.
   
Example 1.14 If $$A = \begin{pmatrix} 1 & 2 & 9 \\ 3 & 4 & 0 \\ 5 & 6 & 8 \end{pmatrix},$$ then $$A^t = \begin{pmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \\ 9 & 0 & 8 \end{pmatrix}.$$

Assuming that the sizes of the matrices are such that the operations can be performed (i.e. they are compatible), the transpose operation has the following properties:

Theorem 1.11 Let A and B be compatible matrices and k be a scalar. Then

(i). (At )t = A

(ii). (A + B)t = At + B t

(iii). (kA)t = kAt

(iv). (AB)t = B t At

Proposition 1.12 If A is a square matrix, then det(A) = det(At ).

Proof. If A is triangular, so is A^t. The case when A is not triangular can be proved easily by reducing A to its row-equivalent triangular form. Note that the transpose operation does not change the elements on the main diagonal of any square matrix. Hence
$$\det(A) = \det(A^t).$$

Note that Proposition 1.12 says that we can effectively replace "row" by "column" in most results.

Proposition 1.13 Let A and B be n × n matrices. Then
(i). in general, det(A + B) ≠ det(A) + det(B);
(ii). det(AB) = det(A)·det(B).

Proof. We prove the case for diagonal matrices. The case when A and B are not triangular can be proved similarly by first reducing them to triangular form. Let
$$A = \begin{pmatrix} a_{11} & & 0 \\ & \ddots & \\ 0 & & a_{nn} \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} b_{11} & & 0 \\ & \ddots & \\ 0 & & b_{nn} \end{pmatrix}. \quad\text{Then}\quad AB = \begin{pmatrix} a_{11}b_{11} & & 0 \\ & \ddots & \\ 0 & & a_{nn}b_{nn} \end{pmatrix}.$$
Therefore
$$\det(AB) = (a_{11}b_{11})(a_{22}b_{22})\cdots(a_{nn}b_{nn}) = (a_{11}a_{22}\cdots a_{nn})(b_{11}b_{22}\cdots b_{nn}) = \det(A)\cdot\det(B).$$

Proposition 1.14 Let R be an n × n matrix in reduced row echelon form, and let R
contain no row of zeros. Then R = In , where In denotes the n × n identity matrix.

Remark 1.9

No row of zeros in Proposition 1.14 implies that each column of R contains a leading 1.
So the leading 1s are on the main diagonal. Hence the result.

Definition 1.15 A square matrix A is said to be a singular matrix if det(A) = 0.

Definition 1.16 A square matrix B is said to be the inverse of A if AB = BA = I.

A square matrix A is said to be invertible if such a B exists. We denote the inverse of


A by A−1 .

Definition 1.17 An elementary matrix E is a simple matrix which differs from the identity matrix in a minimal way (i.e. by a single elementary row or column operation).

Recall that left (pre-) / right (post-) multiplication by an elementary matrix represents an elementary row operation / elementary column operation. Recall also that if A and Â are such that Â is obtained from A by a series of elementary row operations, then they are said to be row-equivalent. Each such row operation can be represented by an elementary matrix E. That is, A → Â under E; equivalently, EA = Â.

Example 1.15 Let $$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.$$ Applying the elementary operation R2 := R2 − 3R1 on A gives $$\hat{A} = \begin{pmatrix} 1 & 2 \\ 0 & -2 \end{pmatrix}.$$ It is easy to check that this elementary row operation is represented by the matrix $$E_1 = \begin{pmatrix} 1 & 0 \\ -3 & 1 \end{pmatrix}.$$ Clearly, E1A = Â.

Theorem 1.15 A square matrix A is invertible if and only if det(A) ≠ 0.

Proof. (⟹) If A is invertible, then there exists A^{-1} such that AA^{-1} = I. Taking determinants on both sides of this equation and using the fact that det(I) = 1, we have
$$1 = \det(I) = \det(AA^{-1}) = \det(A)\cdot\det(A^{-1}).$$
So neither determinant on the right is zero. Thus det(A) ≠ 0.
(⟸) Conversely, assume det(A) ≠ 0. We shall show that A is row-equivalent to the identity matrix I. Let R be the Reduced Row Echelon Form (RREF) of A. That is, E_k···E_3E_2E_1A = R, where E_1, E_2, ..., E_k are the elementary matrices corresponding to the elementary row operations in the reduction process. Note that each of these matrices has an inverse. Hence
$$A = E_1^{-1}E_2^{-1}\cdots E_k^{-1}R.$$
Therefore
$$\det(A) = \det(E_1^{-1})\det(E_2^{-1})\cdots\det(E_k^{-1})\det(R).$$
Since det(A) ≠ 0, so also det(R) ≠ 0.
Since R is in reduced row echelon form, it must be the identity matrix I or it must have at least one row of zeros. But if R had a row of zeros, then det(R) = 0, which would imply that det(A) = 0. This contradicts our assumption that det(A) ≠ 0. So R cannot have any zero rows, and by Proposition 1.14, R must be I. We therefore conclude that A is row equivalent to I and so A is invertible.

Corollary 1.16 If A is an invertible matrix, then det(A^{-1}) = 1/det(A).

Proof. Since A is invertible, there exists A^{-1} such that AA^{-1} = I and det(A) ≠ 0. Thus det(AA^{-1}) = det(I). Equivalently, det(A)·det(A^{-1}) = 1, from which we obtain det(A^{-1}) = 1/det(A).
 
Example 1.16 Find |A^{-1}| for the matrix $$A = \begin{pmatrix} 1 & 0 & 3 \\ 0 & -1 & 2 \\ 2 & 1 & 0 \end{pmatrix}.$$

Solution. One way to solve this problem is to find A^{-1}, then evaluate det(A^{-1}). It is simpler, however, to apply the corollary above. It is easy to check that
$$|A| = \begin{vmatrix} 1 & 0 & 3 \\ 0 & -1 & 2 \\ 2 & 1 & 0 \end{vmatrix} = 4.$$
Thus |A^{-1}| = 1/|A| = 1/4.

1.6 Applications of Determinants

1.6.1 Finding Inverse of a Matrix

Definition 1.18 Let A = [aij] be an n × n matrix over a field K and let Cij denote the cofactor of aij. The classical adjoint (adjugate) of A, denoted by adj(A), is the transpose of the matrix of cofactors of A, namely
$$\mathrm{adj}(A) = [C_{ij}]^t.$$
 
Example 1.17 Let $$A = \begin{pmatrix} 2 & 3 & -4 \\ 0 & -4 & 2 \\ 1 & -1 & 5 \end{pmatrix}.$$ The cofactors of the nine elements of A are:
$$C_{11} = +\begin{vmatrix} -4 & 2 \\ -1 & 5 \end{vmatrix} = -18, \quad C_{12} = -\begin{vmatrix} 0 & 2 \\ 1 & 5 \end{vmatrix} = 2, \quad C_{13} = +\begin{vmatrix} 0 & -4 \\ 1 & -1 \end{vmatrix} = 4,$$
$$C_{21} = -\begin{vmatrix} 3 & -4 \\ -1 & 5 \end{vmatrix} = -11, \quad C_{22} = +\begin{vmatrix} 2 & -4 \\ 1 & 5 \end{vmatrix} = 14, \quad C_{23} = -\begin{vmatrix} 2 & 3 \\ 1 & -1 \end{vmatrix} = 5,$$
$$C_{31} = +\begin{vmatrix} 3 & -4 \\ -4 & 2 \end{vmatrix} = -10, \quad C_{32} = -\begin{vmatrix} 2 & -4 \\ 0 & 2 \end{vmatrix} = -4, \quad C_{33} = +\begin{vmatrix} 2 & 3 \\ 0 & -4 \end{vmatrix} = -8.$$
The matrix of cofactors is
$$[C_{ij}] = \begin{pmatrix} -18 & 2 & 4 \\ -11 & 14 & 5 \\ -10 & -4 & -8 \end{pmatrix}.$$
The transpose of this matrix of cofactors yields the classical adjoint of A. That is,
$$\mathrm{adj}(A) = \begin{pmatrix} -18 & -11 & -10 \\ 2 & 14 & -4 \\ 4 & 5 & -8 \end{pmatrix}.$$

Theorem 1.17 Let A be any square matrix. Then
$$A\,\mathrm{adj}(A) = \mathrm{adj}(A)\,A = \det(A)\,I,$$
where I is the identity matrix. Thus, if det(A) ≠ 0, then
$$A^{-1} = \frac{1}{\det(A)}\,\mathrm{adj}(A).$$
Proof. Consider the product of the matrix A with its adjoint:
$$A\,\mathrm{adj}(A) = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} C_{11} & C_{21} & \cdots & C_{j1} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{j2} & \cdots & C_{n2} \\ \vdots & \vdots & & \vdots & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{jn} & \cdots & C_{nn} \end{pmatrix}.$$
The entry in the ith row and jth column of this product is
$$a_{i1}C_{j1} + a_{i2}C_{j2} + \cdots + a_{in}C_{jn}.$$
If i = j, then this sum is simply the cofactor expansion of A along the ith row, which means that the sum is det(A).
If i ≠ j, then the sum is zero. To see this, consider the following matrix B in which the jth row of A has been replaced with the ith row of A:
$$B = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \quad\text{(rows } i \text{ and } j \text{ both equal to row } i \text{ of } A\text{)}.$$
Cofactor expansion along the jth row of B produces
$$\det(B) = a_{i1}C_{j1} + a_{i2}C_{j2} + \cdots + a_{in}C_{jn}.$$
Since B has two identical rows, we have that det(B) = 0.
Therefore A adj(A) has the form
$$A\,\mathrm{adj}(A) = \begin{pmatrix} |A| & 0 & \cdots & 0 \\ 0 & |A| & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & |A| \end{pmatrix} = |A|\,I.$$
Therefore
$$A\left[\frac{1}{|A|}\,\mathrm{adj}(A)\right] = I,$$
which implies that
$$A^{-1} = \frac{1}{\det(A)}\,\mathrm{adj}(A).$$
 
Example 1.18 Find A^{-1} for $$A = \begin{pmatrix} 2 & 3 & -4 \\ 0 & -4 & 2 \\ 1 & -1 & 5 \end{pmatrix}.$$

Solution. It is easy to check that det(A) = −46 ≠ 0. Thus A does have an inverse. The adjoint of A has been computed in Example 1.17. Thus
$$A^{-1} = \frac{1}{\det(A)}\,\mathrm{adj}(A) = -\frac{1}{46}\begin{pmatrix} -18 & -11 & -10 \\ 2 & 14 & -4 \\ 4 & 5 & -8 \end{pmatrix} = \begin{pmatrix} \frac{9}{23} & \frac{11}{46} & \frac{5}{23} \\ -\frac{1}{23} & -\frac{7}{23} & \frac{2}{23} \\ -\frac{2}{23} & -\frac{5}{46} & \frac{4}{23} \end{pmatrix}.$$
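The adjoint formula translates directly into code. Below is a small Python sketch (ours, not from the notes; the helper names are our own), using exact rational arithmetic:

    from fractions import Fraction

    def minor_det(A, i, j):
        """Determinant of A with row i and column j deleted, by cofactor expansion."""
        sub = [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]
        if len(sub) == 1:
            return sub[0][0]
        return sum((-1) ** c * sub[0][c] * minor_det(sub, 0, c) for c in range(len(sub)))

    def inverse_by_adjugate(A):
        """A^{-1} = (1/det(A)) adj(A), where adj(A)[i][j] = C_{ji}."""
        n = len(A)
        A = [[Fraction(x) for x in row] for row in A]
        if n == 1:
            return [[1 / A[0][0]]]
        d = sum((-1) ** j * A[0][j] * minor_det(A, 0, j) for j in range(n))
        if d == 0:
            raise ValueError("matrix is singular")
        # entry (i, j) of the inverse is the cofactor C_{ji} divided by det(A)
        return [[(-1) ** (i + j) * minor_det(A, j, i) / d for j in range(n)] for i in range(n)]

    A = [[2, 3, -4], [0, -4, 2], [1, -1, 5]]
    print(inverse_by_adjugate(A)[0])   # [Fraction(9, 23), Fraction(11, 46), Fraction(5, 23)]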

1.6.2 Equivalent Conditions and Systems of Linear Equations

Theorem 1.18 If A is an n × n matrix, then the following statements are equivalent.


1. A is invertible.
2. Ax = b has a unique solution for every n × 1 matrix b.
3. Ax = 0 has only the trivial solution.
4. A is row equivalent to In .
5. A can be written as the product of elementary matrices.
6. det(A) ≠ 0.

The notion of a determinant can be used to check whether or not a system of n linear
equations in n unknowns has a unique solution, more than one solution or no solution,
without even having to solve the system.

Example 1.19 Which of the following systems has a unique solution?

(a). 2x2 − x3 = −1
     3x1 − 2x2 + x3 = 4
     3x1 + 2x2 − x3 = −4

(b). 2x2 − x3 = −1
     3x1 − 2x2 + x3 = 4
     3x1 + 2x2 + x3 = −4

Solution. The coefficient matrices of the two systems have determinants
$$\text{(a).}\ \begin{vmatrix} 0 & 2 & -1 \\ 3 & -2 & 1 \\ 3 & 2 & -1 \end{vmatrix} = 0, \qquad \text{(b).}\ \begin{vmatrix} 0 & 2 & -1 \\ 3 & -2 & 1 \\ 3 & 2 & 1 \end{vmatrix} = -12.$$
Therefore only the second system has a unique solution.
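This determinant test is one line of code per system. A small check of Example 1.19 (our own, assuming NumPy):

    import numpy as np

    A1 = np.array([[0, 2, -1], [3, -2, 1], [3, 2, -1]], dtype=float)
    A2 = np.array([[0, 2, -1], [3, -2, 1], [3, 2, 1]], dtype=float)
    for name, A in [("(a)", A1), ("(b)", A2)]:
        d = round(np.linalg.det(A))
        print(name, "det =", d, "-> unique solution" if d != 0 else "-> no unique solution")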

1.6.3 Cramer’s Rule

Theorem 1.19 [Cramer's Rule] If a system of n linear equations in n variables Ax = b has a coefficient matrix A with non-zero determinant |A|, then the solution to the system is given by
$$x_1 = \frac{\det(A_1)}{\det(A)}, \quad x_2 = \frac{\det(A_2)}{\det(A)}, \quad \ldots, \quad x_n = \frac{\det(A_n)}{\det(A)},$$
where the ith column of Ai is the column of constants b in the system of equations.

Proof. Let the system be represented by Ax = b. Since |A| ≠ 0, we can write
$$x = A^{-1}b = \frac{1}{|A|}\,\mathrm{adj}(A)\,b = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.$$
If the entries of b are b1, b2, ..., bn, then xi is given by
$$x_i = \frac{1}{|A|}\left(b_1C_{1i} + b_2C_{2i} + \cdots + b_nC_{ni}\right).$$
But the sum b1C1i + b2C2i + ... + bnCni is precisely the cofactor expansion of Ai along its ith column, which means that
$$x_i = \frac{|A_i|}{|A|}.$$
Cramer's rule is named after the Swiss mathematician Gabriel Cramer (1704-1752), who introduced determinants and used them to solve algebraic equations.

Example 1.20 Use Cramer's Rule to solve the following system of linear equations for x, y and z.

−x + 2y − 3z = 1
2x + z = 0
3x − 4y + 4z = 2

Solution. The determinant of the coefficient matrix A is
$$|A| = \begin{vmatrix} -1 & 2 & -3 \\ 2 & 0 & 1 \\ 3 & -4 & 4 \end{vmatrix} = 10.$$
Since |A| ≠ 0, we know that the solution exists and is unique, and Cramer's Rule may be applied to solve for x, y and z as follows:
$$x = \frac{1}{10}\begin{vmatrix} 1 & 2 & -3 \\ 0 & 0 & 1 \\ 2 & -4 & 4 \end{vmatrix} = \frac{8}{10} = \frac{4}{5},$$
$$y = \frac{1}{10}\begin{vmatrix} -1 & 1 & -3 \\ 2 & 0 & 1 \\ 3 & 2 & 4 \end{vmatrix} = \frac{-15}{10} = -\frac{3}{2},$$
$$z = \frac{1}{10}\begin{vmatrix} -1 & 2 & 1 \\ 2 & 0 & 0 \\ 3 & -4 & 2 \end{vmatrix} = \frac{-16}{10} = -\frac{8}{5}.$$
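Cramer's rule is equally direct to implement. A sketch of ours (not from the notes), assuming det(A) ≠ 0:

    import numpy as np

    def cramer(A, b):
        """Solve Ax = b by Cramer's rule."""
        A, b = np.asarray(A, dtype=float), np.asarray(b, dtype=float)
        d = np.linalg.det(A)
        x = []
        for i in range(len(b)):
            Ai = A.copy()
            Ai[:, i] = b          # replace the ith column with the constants b
            x.append(np.linalg.det(Ai) / d)
        return x

    # The system of Example 1.20
    print(cramer([[-1, 2, -3], [2, 0, 1], [3, -4, 4]], [1, 0, 2]))
    # [0.8, -1.5, -1.6], i.e. x = 4/5, y = -3/2, z = -8/5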

1.6.4 Area, Volume and Equations of Lines and Planes

Determinants have many applications in analytic geometry. These include:

(a). Area of a Triangle in the xy-plane

The area of the triangle whose vertices are (x1, y1), (x2, y2) and (x3, y3) is given by
$$\text{Area} = \pm\frac{1}{2}\det\begin{pmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{pmatrix} \tag{1.2}$$
where the sign (±) is chosen to give a positive area.

Proof. We prove the case for yi > 0. Assume x1 ≤ x3 ≤ x2, and that (x3, y3) lies above the line segment connecting (x1, y1) and (x2, y2) (see Figure 1.1).

Figure 1.1: Three trapezoids

Consider the three trapezoids whose vertices are as follows:
T1: (x1, 0), (x1, y1), (x3, y3), (x3, 0)
T2: (x3, 0), (x3, y3), (x2, y2), (x2, 0)
T3: (x1, 0), (x1, y1), (x2, y2), (x2, 0)

The area of the given triangle is equal to the sum of the areas of the first two trapezoids less the area of the third. Therefore
$$\text{Area of Triangle} = \frac{1}{2}(y_1 + y_3)(x_3 - x_1) + \frac{1}{2}(y_3 + y_2)(x_2 - x_3) - \frac{1}{2}(y_1 + y_2)(x_2 - x_1)$$
$$= \frac{1}{2}(x_1y_2 + x_2y_3 + x_3y_1 - x_1y_3 - x_2y_1 - x_3y_2) = \frac{1}{2}\begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix}.$$
If the vertices do not occur in the order x1 ≤ x3 ≤ x2, or if the vertex (x3, y3) is not above the line segment connecting the other two vertices, then the formula may give the negative of the area. So the area will be the absolute value of this determinant expression.

Example 1.21 Find the area of the triangle whose vertices are (1, 0), (2, 2) and (4, 3).

Solution. It is unnecessary to know the relative position of the three vertices. We simply evaluate the determinant:
$$\frac{1}{2}\begin{vmatrix} 1 & 0 & 1 \\ 2 & 2 & 1 \\ 4 & 3 & 1 \end{vmatrix} = -\frac{3}{2}.$$
Therefore Area = |−3/2| = 3/2.
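Formula (1.2) in code (our own sketch, assuming NumPy):

    import numpy as np

    def triangle_area(p1, p2, p3):
        """Area of a triangle: half the absolute value of the determinant in (1.2)."""
        M = np.array([[p1[0], p1[1], 1],
                      [p2[0], p2[1], 1],
                      [p3[0], p3[1], 1]], dtype=float)
        return abs(np.linalg.det(M)) / 2

    print(triangle_area((1, 0), (2, 2), (4, 3)))   # 1.5, as in Example 1.21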

Remark 1.10

Suppose the three points in part (a) are collinear (i.e. they lie on the same line). Then
the determinant in (1.2) would be zero. This can be used to find the equation of a
straight line.

(b). Equation of Straight Line

Consider the collinear points (0, 1), (2, 2) and (4, 3). The determinant giving the area of the "triangle" having these three points as vertices is
$$\frac{1}{2}\begin{vmatrix} 0 & 1 & 1 \\ 2 & 2 & 1 \\ 4 & 3 & 1 \end{vmatrix} = 0.$$
Generalization: Three points (x1, y1), (x2, y2) and (x3, y3) are collinear if and only if
$$\det\begin{pmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{pmatrix} = 0.$$
Two-point Form of the Equation of a Straight Line: The equation of the line through the distinct points (x1, y1) and (x2, y2) is given by
$$\det\begin{pmatrix} x & y & 1 \\ x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \end{pmatrix} = 0.$$

Example 1.22 Use a determinant to find the equation of the line through the points (2, 4) and (−1, 3).

Solution. Applying the determinant formula for the equation of the line passing through these two points we have:
$$\begin{vmatrix} x & y & 1 \\ 2 & 4 & 1 \\ -1 & 3 & 1 \end{vmatrix} = 0.$$
Expanding by cofactor expansion along the top row we obtain
$$x\begin{vmatrix} 4 & 1 \\ 3 & 1 \end{vmatrix} - y\begin{vmatrix} 2 & 1 \\ -1 & 1 \end{vmatrix} + 1\begin{vmatrix} 2 & 4 \\ -1 & 3 \end{vmatrix} = x - 3y + 10 = 0.$$
Therefore the equation of the line is
$$x - 3y = -10.$$
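A computer algebra system can expand such determinants symbolically. A SymPy sketch (ours, not from the notes):

    import sympy as sp

    x, y = sp.symbols('x y')
    # Line through (2, 4) and (-1, 3), as in Example 1.22
    M = sp.Matrix([[x, y, 1],
                   [2, 4, 1],
                   [-1, 3, 1]])
    print(sp.expand(M.det()))   # x - 3*y + 10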

Remark 1.11

We can use the determinant method to find equations of curves defined by more complicated equations, for instance, the equation of a circle passing through three points (x1, y1), (x2, y2) and (x3, y3).

Example 1.23 Find the equation of the circle passing through the three points (1, 1), (−2, 0), and (0, 1).

Solution
Instead of working with the specific equation of the form
$$x^2 + y^2 + ax + by + c = 0,$$
we need to have coefficients in all terms so as to create a system of equations with more than one solution. We try an equation of the form
$$r_1(x^2 + y^2) + r_2x + r_3y + r_4 = 0.$$
Combining this general equation with the specific equations that must hold in order for the given points to lie on the curve leads to the system
$$r_1(x^2 + y^2) + r_2x + r_3y + r_4 = 0$$
$$r_1(1^2 + 1^2) + r_2(1) + r_3(1) + r_4 = 0$$
$$r_1((-2)^2 + 0^2) + r_2(-2) + r_3(0) + r_4 = 0$$
$$r_1(0^2 + 1^2) + r_2(0) + r_3(1) + r_4 = 0$$
Since this system has more than one solution, the determinant of the coefficient matrix is zero. Writing out this equation we obtain the coefficients ri of the equation for the circle. That is,
$$\begin{vmatrix} x^2 + y^2 & x & y & 1 \\ 2 & 1 & 1 & 1 \\ 4 & -2 & 0 & 1 \\ 1 & 0 & 1 & 1 \end{vmatrix} = 0.$$
Expanding, we have −(x^2 + y^2) + x − 5y + 6 = 0, or x^2 + y^2 − x + 5y − 6 = 0, which upon completing squares gives (x − 1/2)^2 + (y + 5/2)^2 = 50/4, a circle centred at (1/2, −5/2) with radius (5√2)/2.

We can adjust this idea to obtain equations of other standard curves in R2 .


Exercise. Extend the above notion and try to get a polynomial of degree n passing
through n + 1 specific points. Try to extend this method to surfaces in R3 .

32
(c). Volume of Tetrahedron

The volume of a tetrahedron whose vertices are (x1 , y1 , z1 ), (x2 , y2 , z2 ), (x3 , y3 , z3 ) and
(x4 , y4 , z4 ) is given by
 
x1 y1 z1 1
 
 x y z 1 
1  2 2 2 
V olume = ± det   (1.3)
6  x3 y3 z3 1 
 
x4 y4 z4 1
where the ± sign is chosen to give a positive area.

Figure 1.2: Tetrahedron

Example 1.24 Find the volume of the tetrahedron whose vertices are (0, 4, 1), (4, 0, 0), (3, 5, 2)
and (2, 2, 5).

Solution. Using the determinant formula for volume produces

0 4 1 1
1 4 0 0 1 1
= (−72) = −12.
6 3 5 2 1 6
2 2 5 1

Therefore the volume of the tetrahedron is 12.

33
(d). Test for Coplanar Points in Space

Theorem 1.20 If four points in 3-dimensional space happen to lie in the same plane,
then the determinant in the formula for volume (1.3) turns out to be zero. Thus, four
points (x1 , y1 , z1 ), (x2 , y2 , z2 ), (x3 , y3 , z3 ) and (x4 , y4 , z4 ) are coplanar if and only if
 
x y 1 z1 1
 1 
 x y 2 z2 1 
 2 
det   = 0.
 x3 y 3 z3 1 
 
x4 y 4 z4 1

This test provides the determinant form for the equation of a plane passing through
three points in space.

(e). Three-point Form of the Equation of a Plane

The equation of the plane passing through the distinct points (x1 , y1 , z1 ), (x2 , y2 , z2 ) and
(x3 , y3 , z3 ) is given by  
x y z 1
 
 x y z 1 
 1 1 1 
det   = 0.
 x 2 y 2 z2 1 
 
x 3 y 3 z3 1

Example 1.25 Find the equation of the plane passing through the points (0, 1, 0), (−1, 3, 2)
and (−2, 0, 1).

Solution. Using the determinant form of the equation of the plane passing through
three distinct points produces
 
x y z 1
 
 0 1 0 1 
 
det   = 0.
 −1 3 2 1 
 
−2 0 1 1

Cofactor expansion along the second row gives

4x − 3y + 5z = −3.

34
1.7 Solved Exercises
a + b + 2c a b
1. Show that c b + c + 2a b = 2(a + b + c)3 .
c a c + a + 2b

Solution. Applying the elementary row operations: R2 := R2 − R1 and R2 := R2 − R3

we get

a + b + 2c a b a + b + 2c −(a + b + c) 0
c b + c + 2a b = 0 a+b+c −(a + b + c)
c a c + a + 2b c a c + a + 2b

1 −1 0
= (a + b + c)2 0 1 −1
c a c + a + 2b

= (a + b + c)2 (c + a + 2b + c + a)

= (a + b + c)2 (2a + 2b + 2c)

= 2(a + b + c)3

1 1 1
2. Show that x y z = (x − y)(y − z)(z − x).
x2 y 2 z 2

Solution. Carrying out the elementary operations: C2 := C2 − C1 and C3 := C3 − C1 ,


the Vandermonde’s determinant

35
1 1 1 1 0 0
x y z = x y−x z−x
x 2
y 2
z 2
x 2
y −x
2 2
z 2 − x2

(y − x) (y 2 − x2 )
=
(z − x) (z 2 − x2 )

1 1
= (y − x)(z − x)
(y + x) (z + x)

= (y − x)(z − x)(z − y)

= (x − y)(y − x)(z − x)

1 a b+c
3. Show that 1 b c+a = 0.
1 c a+b

Solution. Using the operation C3 := C3 + C2 and factoring out (a + b + c), we have:

1 a b+c 1 a a+b+c
1 b c+a = 1 b a+b+c
1 c a+b 1 c a+b+c

1 a 1
= (a + b + c) 1 b 1
1 c 1

= (a + b + c).0

= 0

36
4. Consider the system

kx + y + z = 1
x + ky + z = 1
x + y + kz = 1
Use determinants to find those values of k for which the system has:

(a). a unique solution

(b). more than one solution

(c). no solution

Solution. (a). The system has a unique solution when the determinant det(A) of the
k 1 1
coefficient matrix A is not equal to zero. It easy to check that det(A) = 1 k 1 =
1 1 k
k − 3k + 2 = (k − 1) (k + 2). The system has a unique solution when (k − 1) (k + 2) ̸= 0,
3 2 2

i.e. when k ̸= 1 and k ̸= −2.


Using Gaussian elimination on the augmented matrix [A|b], we have
 
..
 k 1 1 . 1 
 .. k−1 
 0 k−1 k2 −1
.  (1.4)
 k k k 
..
0 0 2−k −k . 1−k
2

(b). Using (1.4) the system has more than one solution when k = 1.

(c). Using (1.4) the system has no solution(i.e. is inconsistent) when k = −2.

5. Solve by Cramer’s rule the system

x + 3y − z = 4
2x − y + z = 3
3x − 2y + 2z = 5

37
 
1 3 −1
 
Solution The coefficient matrix for the system is A = 
 2 −1 1  and

3 −2 2
 
4
 
b =  3 

. A simple computation gives det(A) = −2. Since det(A) is not zero, we
5
can apply Cramer’s rule. We first need to compute det(Ai ), i = 1, 2, 3 where Ai is the
matrix obtained by replacing the i-th column of A by the vector b.

4 3 −1 1 4 −1 1 3 4
det(A1 ) = 3 −1 1 = −2, det(A2 ) = 2 3 1 = −4, det(A3 ) = 2 −1 3 = −6
5 −2 2 3 5 2 3 −2 5
|A1 | −2 |A2 | −4 |A3 | −6
Hence x = |A|
= −2
= 1, y = |A|
= −2
= 2 and z = |A|
= −2
= 3.
 
a1 x x ··· x
 
 ··· x 
 x a2 x 
 
6. Find det(A) by elementary operations, where A = 
 x x a3 ··· x  .
 .. .. .. .. 
 . . . . 
 
x x x · · · an

Solution Subtracting row 1 from all other rows we have

a1 x x ··· x
x − a1 a2 − x 0 ··· 0
det(A) = x − a1 0 a3 − x · · · 0 .
.. .. .. ...
. . .
x − a1 0 0 · · · an − x

Taking out (x − a1 ) from column 1, (a2 − x) from column 2 and so on, we obtain
a1
a1 −x
x
a2 −x
··· x
an −x
−1 1 0 ··· 0
det(A) = (a1 − x)(a2 − x)...(an − x) −1 0 1 ··· 0 .
.. .. .. . .
. . . .
−1 0 0 ··· 1

38
a1 x
Putting a1 −x
=1+ a1 −x
and adding all columns to the first one we get

1+ x
a1 −x
+ · · · anx−x x
a2 −x
··· x
an −x
0 1 0 ··· 0
det(A) = (a1 − x)(a2 − x)...(an − x) 0 0 1 ··· 0 .
.. .. .. . .
. . . .
0 0 0 ··· 1

The last matrix is upper triangular and its determinant is the product of the diagonal
entries:
x x
(1 + + ··· + ).1n−1 .
a1 − x an − x
Thus
x x
det(A) = (a1 − x)(a2 − x)...(an − x)(1 + + ··· + ).
a1 − x an − x

7. Find the point of intersection of the three planes defined by

x + 2y − z = 4
2x − y + 3z = 3
4x + 3y − 2z = 5
 
1 2 −1
 
Solution The coefficient matrix A = 
 2 −1 3 
 and det(A) = 15. Since det(A) ̸=
4 3 −2
0, we can use Cramer’s rule to solve the system. It is easy to check that det(A1 ) =
0, det(A2 ) = 45 and det(A3 ) = 30, where Ai , i = 1, 2, 3 is the matrix obtained from A
by replacing the i-th column of A with the vector b.
|A1 | |A2 | |A3 |
Thus x = |A|
= 0, y = |A|
= 3 and z = |A|
= 2. Thus the three planes intersect at
the point P (0, 3, 2).
( )
2 4
8. Find A−1 by the classical adjoint (i.e. adjugate) method where A = .
6 8
Solution Clearly det(A) = −8 ̸= 0. Therefore A−1 exists. We compute all the four

39
cofactors of A.
M11 = 8, C11 = 8 ; M12 = 6, C12 = −6
M21 = 4, C21 = −4 ; M22 = 2, C22 = 2
( ) ( )
C11 C12 8 −6
Therefore the matrix of cofactors is Cij = = and it is easy
C21 C22 −4 2
( )
8 −4
to see that adj(A) = .
−6 2
( ) ( )
8 −4 −1 1
Therefore A−1 = 1
det(A)
adj(A) = − 81 = 2
.
−6 2 3
4
− 41

1.8 Exercises

1. Compute the following determinants

cos θ − sin θ
(a).
sin θ cos θ

cos θ cos φ sin θ cos φ − sin φ


(b). cos θ sin φ sin θ sin φ cos φ
− sin θ cos θ 0

1 1 1 1
1 x 1 1
2.(a). Show that = (x − 1)3 .
1 1 x 1
1 1 1 x
1 2 3 4 5
2 3 4 5 1
(b). Evaluate the determinant 3 4 5 1 2 .
4 5 1 2 3
5 1 2 3 4

40
(b + c) c b
3.(a). Show that c (c + a) a = 4abc.
b a (a + b)

(b). Prove that if all the entries of a 3 × 3 matrix A are equal to ±1, then det(A) is
an even number.

(c). Without evaluating determinants, show that

a1 b1 a1 x + b1 y + c1 a1 b1 c1
a2 b2 a2 x + b2 y + c2 = a2 b2 c2
a3 b3 a3 x + b3 y + c3 a3 b3 c3

1 a bc
1 b ca = (b − a)(c − a)(c − b)
1 c ab

1 a a3 1 a a2
1 b b3 = (a + b + c) 1 b b2
1 c c3 1 c c2

4.(a). Show that

1 + c1 1 1 1
1 1 + c2 1 1 1 1 1 1
= c1 c2 c3 c4 (1 + + + + ).
1 1 1 + c3 1 c1 c2 c3 c4
1 1 1 1 + c4

(b). Compute the following determinants

sin α + sin β cos β − cos α


(i).
cos β − cos α sin α − sin β

41
α2 + 1 αβ αζ
(ii). 2
αβ β +1 βζ
2
αζ βζ ζ +1

sin α cos α 1
(iii). sin β cos β 1 .
sin ζ cos ζ 1

(c). Prove that

0 1 1 a
1 0 1 b
= a2 + b2 + c2 − 2ab − 2bc − 2ac + 2d.
1 1 0 c
a b c d

5. (a). Consider the two matrices below, and suppose you already have computed
det(A) = −120. What is det(B)? Why?
   
0 8 3 −4 0 8 3 −4
   
 −1 2 −2 5   0 −4 2 3 
   
A= , B =  
 −2 8 4 
3   −2 8 4 3 
  
0 −4 2 3 −1 2 −2 5
(b). Solve the equation

1 1 1 ··· 1
1 1−x 1 ··· 1
1 1 2 − x ··· 1 = 0.
.. .. .. ..
. . . .
1 1 1 ··· n−x

6. Consider the polynomial g = g(x1 , x2 , ..., xn ) defined by



g = g(x1 , x2 , ..., xn ) = (xi − xj ).
i<j

42
Show that g = g(x1 , x2 , ..., xn ) = (−1)n Vn−1 (x) where x = xn and Vn−1 is the Van-
dermonde’s determinant (named after French mathematician and musician Alexandre
Theophile Vandermonde(1735-1796), one of the founders of the theory of determi-
nants) defined by
1 1 ··· 1 1
x1 x2 ... xn−1 x
Vn−1 = x21 x22 ... x2n−1 x2 .
.. .. .. ..
. . ··· . .
xn−1
1 xn−1
2 ... xn−1
n−1 x
n−1

7. Using Cramer’s rule, find x, y, and z for each of the following systems of equations.

3x − 4y + 2z = 1
2x + 3y − 3z = −1
5x − 5y + 4z = 7.

3x + 4y − 2z = 3
2x + 2y − 3z = 1
−x + y − 2z = −2.

4x + 7y − z = 7
3x + 2y + 2z = 9
x + 5y − 3z = 3

x + 3y = 0
2x + 6y + 4z = 0
−x + 2z = 0.

6
x
− 2
y
+ 1
z
= 4
2
x
+ 5
y
− 2
z
= 3
4
5
x
− 1
y
+ 3
z
= 63
4
.

43
x+y+z = 1
x + 1.0001y + 2z = 2
x + 2y + 2z = 1

8. Do the following planes intersect? Give a reason for your answer(Hint: Do not use
Cramer’s rule.)
x+y+z = 2
2x + 2y + 2z = 4
x y z
2
+ 2
+ 2
= 1

 
−2 1 0
 
9. Let A = 
 2 6 2 
.
1 8 4

(a). Find the cofactors of A.

(b). Find adj(A).

(c). Find det(A). Does A−1 exist? Give a reason for your answer. If so, find A−1 .

10. Explain why A is singular if and only if A(aj(A)) = 0.

11. For scalars α, explain why adj(αA) = αn−1 adj(A).

12. Derive the following formulas

1+α1
α1
1 ··· 1
1 1+α2
··· 1 1+

(a). α2
= ∏ αi .
.. .. .. αi
. . .
1 1 ··· 1+αn
αn n×n

44
α β β ··· β
β α β ··· β
{ (α − β)n (1 + nβ
), if α ̸= β
(b). β β α ··· β = α−β

.. .. .. . . . 0, if α = β
. . . . ..
β β β ··· α
n×n

1 + α1 α2 ··· αn
α1 1 + α2 · · · αn
(c). .. .. .. .. = 1 + α1 + α2 + ... + αn .
. . . .
α1 α2 ··· 1 + αn
n×n

13. Use induction to argue that a cofactor expansion of det(A), where A is an n × n


matrix requires
1 1 1
c(n) = n!(1 + + + ... + )
2! 3! (n − 1)!
multiplications for n ≥ 2.
Assume a computer will do 1000, 000 multiplications per second, and neglect all other
operations to estimate how long it will take to evaluate the determinant of a 100 × 100
matrix using cofactor expansion. Hint: Recall the expansion for ex , and use 100! ≈
9.33 × 10157 .

 
0 3 2
 
14.(a). Find det(A), where A = 
 1 5 1 .

−4 2 −1
 
0 3 2
 
(b). With no further computation, what is det 
 10 50 10 ?

−4 2 −1
 
0 3 2
 
(c). With no further computation, what is det  10 56 14 

?
−4 2 −1

45
Chapter 2

EIGENVALUES AND
EIGENVECTORS

2.1 Introduction

We welcome you to the second lecture on eigenvalues and eigenvectors of square matrices.
In this chapter we will prove that every square matrix has at least one eigenvalue and
an eigenvector to go with it. We will also determine the maximum number of eigenval-
ues a square matrix may have. The determinant function will be a powerful tool here.
However, it is possible, with some more advanced machinery, to compute eigenvalues
without ever making use of determinants.

46
Objectives
At the end of this lecture, you should be able to:

• State the eigenvalue problem.

• Determine the characteristic polynomial and characteristic equation of a square


matrix.

• Find eigenvalues and corresponding eigenvectors of a square matrix.

• Determine the algebraic and geometric multiplicities of eigenvalues of a square


matrix.

• Use eigenvalues and eigenvectors to diagonalize and orthogonally diagonalize a


square matrix.

2.2 The Eigenvalue Problem

If A is an n × n matrix, do there exist non-zero vectors x ∈ Rn such that Ax is a


scalar multiple of x, i.e., Ax = λx? This scalar, often denoted by the Greek letter
lambda, is called the eigenvalue of A, and the non-zero vector x is called the eigenvector
of A corresponding to λ. Eigenvectors are also called characteristic vectors and
eigenvalues are also called characteristic values. The pair (λ, x) is sometimes called
an eigenpair. The set of all distinct eigenvalues of A is denoted by σ(A), and called
the spectrum of A.

Remark 2.1

Note that we omit the case x = 0 since A0 = λ0 is true for all values of λ. An eigenvalue
λ = 0, however, is possible. The equation

Ax = λx

is equivalent to
(λI − A)x = 0, x ̸= 0, (2.1)

where I is the n × n identity matrix. If equation (2.1) is to have non-zero solutions, then
λ must be chosen so that the n × n matrix λI − A is singular. That is, det(λI − A) = 0.

47
Therefore the eigenvalue problem consists of two parts:

1. Find all scalars λ such that the matrix A − λI is singular, i.e. det(λI − A) = 0.

2. Given that λI − A is singular, find all the non-zero vectors x such that
(λI − A)x = 0.
Clearly if we know an eigenvalue of A, then the elementary row operation techniques
provide an efficient way to find the eigenvectors.

2.2.1 Applications of the Eigenvalue Problem

Many problems in sciences lead to the eigenvalue problem. Eigenvalue analysis is used
in solving systems of differential equations, solving optimization problems and diago-
nalization of linear transformations. Eigenvalue analysis is also used in the design of
car stereo systems so that the sounds are directed correctly for listening, vibration in
bridges, storey buildings, aircraft, suspension systems of cars and aerospace appliances,
e.g. rockets, missiles, etc and in describing the evolution of ”discrete-time systems”.

2.2.2 Finding Eigenvalues and Eigenvectors

To find the eigenvalues and eigenvectors for an n × n matrix A, we let I be the n × n


matrix. Writing the equation Ax = λx in the form λIx = Ax then produces (λI −A)x =
0. This homogeneous system has non-zero solutions if and only if the coefficient matrix
(λI − A) is not invertible. That is if and only if det(λI − A) = 0.

Theorem 2.1 Let A be an n × n matrix.


1. An eigenvalue of A is a scalar λ such that det(λI − A) = 0.
2. The eigenvalues of A corresponding to λ are the non-zero solutions of the homogeneous
system (λI − A)x = 0.
( )
2 0
Example 2.1 Let A = . Verify that x1 = (1, 0)t is an eigenvector of A
0 −1
corresponding to the eigenvalue λ1 = 2 and x2 = (0, 1)t is an eigenvector of A corre-
sponding to the eigenvalue λ2 = −1.

48
( )( ) ( ) ( )
2 0 1 2 1
Solution. Ax1 = = =2 .
0 −1 0 0 0

Thus x1 = (1, 0)t is an eigenvector of A corresponding to the eigenvalue λ1 = 2.


( )( ) ( ) ( )
2 0 0 0 0
Similarly, Ax2 = = = −1 .
0 −1 1 −1 1

Thus x2 = (0, 1)t is an eigenvector of A corresponding to the eigenvalue λ2 = −1.


 
1 −2 1
 

Exercise. For the matrix A =  0 0 0 , verify that x1 = (−3, −1, 1)t and x2 =

0 1 1
t
(1, 0, 0) are eigenvectors of A and find their corresponding eigenvalues.

Proposition 2.2 If A is an n × n matrix and λ is an eigenvalue of A with a corre-


sponding eigenvector x, then every non-zero scalar multiple of x is also an eigenvector
of A.

Proof. A(cx) = c(Ax) = c(λx) = λ(cx).

Proposition 2.3 If x1 and x2 are eigenvectors of A corresponding to the same eigen-


value λ, then their sum is also an eigenvector corresponding to λ.

Proof. A(x1 + x2 ) = Ax1 + Ax2 = λx1 + λx2 = λ(x1 + x2 ).

Remark 2.2

From Proposition 2.2 and Proposition 2.3, we conclude that the set of all eigenvectors
of a given eigenvalue λ, together with the zero vector, is a vector subspace of Rn . This
special subspace of Rn is called the eigenspace of A, and is usually denoted EA (λ).
Clearly EA (λ) = Ker(λI − A).

Theorem 2.4 Suppose A is an n×n matrix and λ is an eigenvalue of A. Then EA (λ) =


Ker(λI − A).

Proof
First note that 0 ∈ EA (λ) by definition and 0 ∈ Ker(λI − A). Now consider any nonzero

49
vector x ∈ Rn or Cn . Then

x ∈ EA (λ) if f λx = Ax
if f λx − Ax = 0
if f λIx − Ax = 0
if f (λI − A)x = 0
if f x ∈ Ker(λI − A).

2.2.3 Characteristic Polynomial of a Square Matrix

Definition 2.1 Let A be an n × n matrix. The equation det(λI − A) = 0 is called the


characteristic equation of A. When expanded in polynomial form, the polynomial

|λI − A| = λn + cn−1 λn−1 + ... + c1 λ + c0

is called the characteristic polynomial of A and is denoted by χA (λ).

Definition 2.1 tells us that the eigenvalues of an n × n matrix A correspond to the roots
of the characteristic polynomial of A. That is, they are the solutions of the characteristic
equation.

Example 2.2 ( Find the )


characteristic equation, eigenvalues and corresponding eigenvec-
2 −12
tors of A = .
1 −5
Solution. The characteristic equation of A is

χA (λ) = |λI − A|
(λ − 2) 12
=
−1 (λ + 5)
= (λ − 2)(λ + 5) − (−12)
= λ2 + 3λ − 10 + 12
= λ2 + 3λ + 2
= (λ + 1)(λ + 2) = 0

which gives λ1 = −1 and λ2 = −2 as the eigenvalues of A.


We now determine the eigenvectors of A.

50
( ) ( )
−3 12 1 −4
• For λ = λ1 = −1, we have −I − A = which reduces to .
−1 4 0 0
( )( ) ( )
1 −4 x1 0
Solving = , we have x1 − 4x2 = 0. Letting x2 = t ̸= 0, we
0 0 x2 0
conclude that every eigenvector of λ1 is of the form
( ) ( ) ( )
x1 4t 4
v1 = = =t , t ̸= 0.
x2 t 1
( )
4
In particular, v 1 = is an eigenvector of A corresponding to the eigenvalue λ = −1.
1
( ) ( )
−4 12 −1 3
• For λ = λ2 = −2, we have −2I − A = which reduces to .
−1 3 0 0
( )( ) ( )
−1 3 x1 0
Solving = , we have −x1 + 3x2 = 0. Letting x2 = t ̸= 0, we
0 0 x2 0
conclude
( that) every
( eigenvector
) ( of)λ2 is of the form ( )
x1 3t 3 3
v2 = = =t , t ̸= 0. In particular, v 2 = is an eigenvector
x2 t 1 1
of A corresponding to the eigenvalue λ = −2.
 
2 1 0
 
Example 2.3 . Find the eigenvalues and corresponding eigenvectors of A =   0 2 0 .

0 0 2

Solution. The characteristic equation of A is

λ−2 −1 0
χA (λ) = |λI − A| = 0 λ−2 0 = (λ − 2)3 = 0.
0 0 λ−2

Thus the only eigenvalue of A is λ = 2.

To find the eigenvectors


 for λ= 2,we solve
 the homogeneous system (2I − A)x = 0.
0 −1 0 x1 0
    
That is    
 0 0 0   x2  =  0
. This implies that x2 = 0. Using parameters

0 0 0 x3 0

51
s = x1 , t = x3 , we find that the eigenvectors of λ = 2 are of the form
       
x1 s 1 0
       
v=       
 x2  =  0  = s  0  + t  0  , s, t ̸= 0.
x3 t 0 1
   
1 0
   
In particular, v 1 =    
 0  and v 2 =  0  are eigenvectors of A corresponding to the
0 1
eigenvalue λ = 2.
Since λ = 2 has two linearly independent eigenvectors, the dimension of its eigenspace
is 2.
 
3 −2 0
 
Example 2.4 Find the eigenvalues and eigenspaces of A = 
 −2 3 0 
.
0 0 5
Solution. It is easy to show that the characteristic polynomial of A is (λ − 1)(λ − 5)2 .
Therefore the eigenvalues of A are λ = 1 and λ = 5.
    
−2 2 0 x1 0
    
• λ = 1:  2 0     
 −2   x2  =  0 , with solutions x − 3 = 0, x2 =
0 0 −4 x3 0
     
x1 t 1
     
t, x1 = t and t ̸= 0. So v1 =      
 x2  =  t  = t  1  .
x3 0 0
 
{ 1 }
So the eigenvector basis for λ = 1 is  
 1  . That is, the eigenspace for λ = 1 is
0
 
{ 1 }
the span of   1  .

0
    
2 2 0 x1 0
    
• λ=5:  2 2 0   x2  =  0 . Solving, we have x1 = −x2 , x2 =
    
0 0 0 x3 0
x2 , x3 = x3 (i.e it is a free variable or can take any value).

52
       
x1 −x2 −1 0
       
Thus v =      
 x2  =  x2  = x2  1  + x3  0
 . Let x2 = s ̸= 0, x3 = t ̸= 0.

x3 x3 0 1
   
−1 0
   
This implies that v = s  1  + t  0 
  
.
0 1
Thus v can be expressed as a linear combination of two linearly
 independent
   vectors as
{  −1   }
0
shown above. So a basis for the eigenspace for λ = 5 is  1
, 0  .
  
0 1

Thus the eigenspace is two-dimensional.

2.3 Polynomial Matrices

Consider a polynomial f (t) = an tn + ... + a1 t + a0 over a field K. If A is any square


matrix, we define
f (A) = an An + ... + a1 A + a0 I,

where I is the identity matrix of the same size as A. We say that A is a root of f (t) if
f (A) = 0, the zero matrix.  
−1 3 2
 
Example. Let p(x) = 14 + 19x − 3x2 − 7x3 + x4 , and A = 
 1 0 −2 .
−3 1 1
We will compute p(A). First we compute the necessary powers of A. Note that A0 is
defined
 as the identity 
matrix. It
 is easy to show 
that  
−2 1 −6 19 −12 −8 −7 49 54
  3   4  
A2 = 
 5 1 0 , A =  −4 15
  8 , A =  −5 −4 −30 .
  
1 −8 −7 12 −4 11 −49 47 43
 
−139 193 166
 
Then p(A) = 14I + 19A − 3A2 − 7A3 + A4 =   27 −98 −124 .

−198 118 20
Note that p(x) factors as p(x) = 14 + 19x − 3x2 − 7x3 + x4 = (x − 2)(x − 7)(x + 1)2 .
Therefore

53
p(A) = (A − 2I)(A − 7I)(A + I)2 .
This example shows that it is natural to evaluate a polynomial with a matrix, and that
the factored form of the polynomial is as good as (or may be better than) the expanded
form.

Theorem 2.5 [Cayley-Hamilton Theorem] Every square matrix A is a root of its


characteristic polynomial.

2.4 Algebraic Multiplicity and Geometric Multiplicity of an


Eigenvalue

Definition 2.2 Suppose A is an n×n matrix. Let χA (λ) be the characteristic polynomial
of A. The algebraic multiplicity of an eigenvalue λ0 is the number of times it appears
in the factorization χA (λ) = (λ − λ1 )(λ − λ2 )...(λ − λn ) of the characteristic polynomial.
That is, the highest power of (λ − λ0 ) that divides the characteristic polynomial χA (λ).
The geometric multiplicity of λ0 is the dimension of the eigenspace EA (λ0 ) of λ0 .
That is, the number of linearly independent eigenvectors that span EA (λ0 ).

The algebraic multiplicity of an eigenvalue λ0 is usually denoted by αA (λ0 ) and its


geometric multiplicity is denoted by GA (λ0 ). Since an eigenvalue λ0 is a root of the
characteristic polynomial of A, there is a factor (λ − λ0 ) and the algebraic multiplicity is
just the power of this factor in a factorization of χA (λ). In particular, αA (λ0 ) ≥ 1. Since
every eigenvalue must have at least one eigenvector, the associated eigenspace cannot
be trivial(i.e equal to {0}), and so GA (λ0 ) ≥ 1. The relationship between the algebraic
and geometric multiplicities is given in the following theorem.

Theorem 2.6 Geometric multiplicity of A ≤ Algebraic multiplicity of A.

That is 1 ≤ GA (λ) ≤ αA (λ) ≤ n.

Theorem 2.7 Suppose that A is an n × n matrix with distinct eigenvalues λ1 , λ2 , ..., λk .


Then

k
αA (λi ) = n.
i=1

That is, the sum of all algebraic multiplicities is equal to n.

54
Theorem 2.8 Suppose A is an n × n matrix. Then A cannot have more than n distinct
eigenvalues.

Proof Suppose A has k distinct eigenvalues λ1 , λ2 , ..., λk . Then


∑k
k = i=1 1
∑k
≤ i=1 αA (λi )

= n.

Definition 2.3 Matrices in which Geometric multiplicity of A < Algebraic multiplicity


of A are called defective matrices or imperfect matricesor deficient matrices.
These are matrices that fail to possess complete sets of eigenvectors. Matrices in which
geometric multiplicity is equal to algebraic multiplicity are called perfect matrices.

By definition, an n × n matrix A is defective if it has fewer than n eigenvectors.

Example 2.5 If χA (λ) = (λ − 7)4 (λ − 5)(λ − 3)2 is the characteristic polynomial of


A, then the eigenvalues of A are {3, 5, 7}. The eigenvalues 3, 5 and 7 have algebraic
multiplicities 2, 1, and 4, respectively. The geometric multiplicities cannot be determined
from the characteristic polynomial, unless considerably more information is known. All
that may be said in this example is that:

dim(E3 ) ≤ 2, dim(E5 ) ≤ 1, dim(E7 ) ≤ 4.

Example 2.6 Determine


 the algebraic and geometric multiplicities for the eigenvalues
1 1 0
 
of A =  0 1 1 

.
0 0 1

Solution. The characteristic polynomial of A is χA (λ) = (λ − 1)3 , and thus the only
eigenvalue of A is λ = 1. This eigenvalue has algebraic multiplicity 3. Solving for the
eigenspace using (I − A)x = 0 gives x2 = 0, x3 = 0 and x1 is a free variable. Thus v is
in the eigenspace E1 if and only if v is of the form
   
x1 1
   
v=   
 0  = x1  0  .
0 0
Thus the geometric multiplicity of the eigenvalue λ = 1 is 1.

55
Example 2.7 Determine
 the algebraic and geometric multiplicities for the eigenvalues
1 1 0
 
of B = 
 0 1 0 .

0 0 1

Solution. The characteristic polynomial of B is χB (λ) = (λ − 1)3 and so λ = 1 is the


only eigenvalue of B and it has algebraic multiplicity
 3. The
 corresponding
  eigenspace
0 −1 0 x1 0
    
is found by solving (I − B)x = 0. That is     
 0 0 0   x2  =  0 . That is,
0 0 0 x3 0
x2 = 0, x1 
= t, x
3 = s, with
  t ̸
= 0 and
  s ̸
= 0.
t 1 0
     
Thus v =      
 0  = t  0  + s  0 . This shows that the geometric multiplicity for
s 0 1
λ = 1 is 2.

2.4.1 Characteristic Polynomials of Block Triangular Matrices

Suppose that(A is a block


) triangular matrix with square diagonal blocks A1 and A2 .
A1 B
That is A = where A1 and A2 are square [Link] λI − A is also a
0 A2
block triangular matrix with diagonal blocks λI − A1 and λI − A2 . Thus

λI − A1 −B
|λI − A| = = |λI − A1 |.|λI − A2 | = χA1 (λ) . χA2 (λ).
0 λI − A2

That is, the characteristic polynomial of A is the product of the characteristic polyno-
mials of the diagonal blocks A1 and A2 .
By induction, we have the following result.

Theorem 2.9 Suppose A is a block triangular matrix with square diagonal blocks A1 , A2 , ..., Ar .
Then the characteristic polynomial of A is the product of the characteristic polynomials
of the diagonal blocks Ai , i = 1, 2, ..., r. That is,

χA (λ) = χA1 (λ) . χA2 (λ)...χAr (λ).

56
 
9 −1 5 7
 
 8 3 2 −4 
 
Example 2.8 Find the characteristic polynomial of M =  .
 0 0 3 6 
 
0 0 −1 8

Solution. Check that M is a block triangular matrix with diagonal blocks M1 and M2 ,
with χM1 (λ) = λ2 −12λ+35 = (λ−5)(λ−7) and χM2 (λ) = λ2 −11λ+30 = (λ−5)(λ−6).
Thus the characteristic polynomial of M is the product

χM (λ) = χM1 (λ) . χM2 (λ) = (λ − 5)2 (λ − 6)(λ − 7).

2.5 Similarity and Diagonalization

We know that two linear systems of equations have the same solution if their augmented
matrices are row equivalent. We now identify classes of matrices that have the same
eigenvalues.

2.5.1 Similar Matrices

Definition 2.4 A matrix A is said to be similar to a matrix B if there exists an


invertible(non-singular) matrix P such that P −1 AP = B.

We refer to P −1 AP as a similarity transformation. The notion of two matrices being


similar is a lot like saying they are row-equivalent. Two similar matrices need not be
equal, but they share many important properties.

Definition 2.5 Suppose A is a square matrix. Then A is said to be diagonalizable


if there exists a non-singular matrix P such that D = P −1 AP , where D is a diagonal
matrix.

Remark 2.3

Clearly, a square matrix A is diagonalizable if it is similar to a diagonal matrix D.

57
Theorem 2.10 If A and B are similar square matrices, then A and B have the same
characteristic polynomials and hence the same eigenvalues. Moreover, these eigenvalues
have the same algebraic multiplicity.

Proof. Since A and B are similar, there exists a non-singular matrix P such that
B = P −1 AP . To establish the above fact, observe that

χB (λ) = |λI − B|
= |λI − P −1 AP |
= |λP −1 P − P −1 AP |
= |P −1 (λI − A)P |
= |P −1 ||λI − A||P |
= |P −1 ||P ||λI − A|
= |λI − A|
= χA (λ)

Remark 2.4

Note that although similar matrices always have the same characteristic polynomial, it
is not true that two matrices with the same characteristic polynomials are necessarily
similar.

Example 2.9 Consider the following matrices


( ) ( )
1 0 1 0
A= , I= .
1 1 0 1

Now χ(λ) = (λ − 1)2 is the characteristic polynomial for both A and I; so A and I have
the same set of eigenvalues. If A and I were similar, however, there would be a 2 × 2
matrix P such that I = P −1 AP . But the equation I = P −1 AP is equivalent to P = AP ,
which is in turn equivalent to P P −1 = A or I = A. Thus I and A cannot be similar.

Remark 2.5

Two matrices can have exactly the same characteristic polynomial without being similar,
so similarity leads to a more finely detailed way to distinguish matrices. Although similar
matrices have the same eigenvalues, they do not generally have the same eigenvectors.

58
For example, if B = P −1 AP and if Bx = λx, then P −1 AP x = λx or A(P x) = λ(P x).
Thus if x is an eigenvector for B corresponding to λ, then P x is an eigenvector for A
corresponding to λ.

2.5.2 Diagonalization of Square Matrices

Computations involving a square matrix A can be simplified if we know that A is similar


to a diagonal matrix. Suppose we need to calculate the power Ak where k is a positive
integer. Knowing that D = P −1 AP , we can proceed as follows:

Dk = (P −1 AP )k = P −1 Ak P.

Since D is diagonal, it is easy to form the power Dk . Once the matrix Dk has been
computed, the matrix Ak can be recovered easily by forming P Dk P −1 :

P Dk P −1 = P (P −1 Ak P )P −1 = Ak .

Diagonalization Algorithm

Given an n × n matrix A,
Step 1: Find the characteristic polynomial χA (λ) of A.
Step 2: Find all the n roots of χA (λ) to obtain the eigenvalues of A.
Step 3: Find all the eigenvectors v i corresponding to each eigenvalue λi .
Step 4: Consider the collection S = {v 1 , v 2 , ..., v m } of all eigenvectors obtained in Step 3.

(a). If m ̸= n then A is not diagonalizable.


(b). If m = n then A is diagonalizable. Let P be the matrix whose columns are the
eigenvectors v 1 , v 2 , ..., v n . Then D = P −1 AP = diag(λ1 , λ2 , ..., λn ) is diagonal, where λi
is the eigenvalue corresponding to the eigenvector v i .

Theorem 2.11 Let A be an n × n matrix. Suppose that v 1 , v 2 , ..., v n are non-zero eigen-
vectors of A belonging to distinct eigenvalues λ1 , λ2 , ..., λn . Then v 1 , v 2 , ..., v n are linearly
independent.

Proof. We prove by contradiction. Suppose the theorem is not true. Let v 1 , v 2 , ..., v s
be a minimal set of vectors for which the theorem is not true. We have s > 1, since

59
v 1 ̸= 0. Also, by the minimality condition, v 2 , v 3 , ..., v s are linearly independent. Thus
v 1 is a linear combination of v 2 , v 3 , ..., v s , say

v 1 = a2 v 2 + a3 v 3 + ... + as v s (2.2)

where some ak ̸= 0. Applying A to both sides of (2.2) and using the linearity of A yields

Av 1 = λ1 v 1 = a2 Av 2 + a3 Av 3 + ... + as Av s = a2 λ2 v 2 + a3 λ3 v 3 + ... + as λs v s (2.3)

Multiplying (2.2) by λ1 yields

λ1 v 1 = a2 λ2 v 2 + ... + a − sλs v s (2.4)

Subtracting (2.3) from (2.4) yields

a2 (λ1 − λ2 )v 2 + a3 (λ1 − λ3 )v 3 + ... + as (λ1 − λs )v s = 0. (2.5)

Since v 1 , v 2 , ..., v s are linearly independent, the coefficients in (2.5) must all be zero.
That is
a2 (λ1 − λ2 ) = 0, a3 (λ1 − λ3 ) = 0, ..., as (λ1 − λs ) = 0.

However, the λi are distinct. Hence λ1 − λj ̸= 0 for j > 1. Hence a2 = 0, a3 = 0, ..., as =


0. This contradicts the fact that some ak ̸= 0. Thus our initial assumption is false.
Thus the theorem is proved.

Theorem 2.12 Suppose χA (λ) = (λ − λ1 )(λ − λ2 )...(λ − λn ) is the characteristic poly-


nomial of an n × n matrix A, and that the n roots are distinct. Then A is diagonalizable.

Proof. Since A has n distinct eigenvalues, by Theorem 2.11 A has a set of n linearly
independent eigenvectors. Thus A is diagonalizable.

Corollary 2.13 Let A be an n × n matrix. If A has n distinct eigenvalues, then A has


a set of n linearly independent eigenvectors.

A matrix A is said to be nilpotent if Ak = 0, for some integer k ≥ 1. We have


found that matrices with non-distinct eigenvalues may or may not be perfect.
( Note )
that
0 1
not all square matrices are diagonalizable. For instance the matrix A = is
0 0

60
not diagonalizable. Observe that A2 = 0. If there exists a non-singular matrix P such
that P −1 AP = D, where D is diagonal then D2 = P −1 A2 P = 0 implies that D = 0,
which implies that A = 0, which is false. Thus A, as well as any other nonzero nilpotent
matrix, is not diagonalizable. Non-zero nilpotent matrices are not the only ones that
can’t be diagonalized, but as we will see, nilpotent matrices paly a particularly nilpotent
role in non-diagonalizability.
We give necessary and sufficient conditions for a square matrix to be diagonalizable.

Theorem 2.14 Suppose A is an n × n matrix. Then A is diagonalizable if and only if


there is a linearly independent set S that contains n eigenvectors of A.

Proof.
(⇐=): Let S = {v 1 , v 2 , ..., v n } be a linearly independent set of eigenvectors of A corre-
sponding to the eigenvalues λ1 , λ2 , ..., λn of A. Define P = [v 1 |v 2 |...|v n ]. That is, P is a
matrix whose columns are the eigenvectors of A. Then
 
λ1 0
 
D= 
...  = [λ1 e1 |λ2 e2 |...|λn en ]

0 λn

The columns of P are the eigenvectors of the linearly independent set S and so P is
non-singular. We know P −1 exists and

P −1 AP = P −1 A[v 1 |v 2 |...|v n ]
= P −1 [Av 1 |Av 2 |...|Av n ]
= P −1 [λ1 v 1 |λ2 v 2 |...|λn v n ]
= P −1 [λ1 P e1 |λ2 P e2 |...|λn P en ]
= P −1 [P (λ1 e1 )|P (λ2 e2 )|...|P (λn en )]
= P −1 P [λ1 ev 1 |λ2 e2 |...|λn en ]
= ID
= D

Thus A is similar to a diagonal matrix D, via the non-singular matrix P . Thus A is


diagonalizable.
(=⇒): Suppose A is diagonalizable. Then there is a non-singular n × n matrix T =

61
 
d1 0
 
[y 1 |y 2 |...|y n ] and a diagonal matrix E = 

..
.  = [d1 e1 |d2 e2 |...|dn en ] such

0 dn
−1
that T AT = E. Then consider

[Ay 1 |Ay 2 |...|Ay n ] = A[y 1 |y 2 |...|y n ]


= AT
= IAT
= T T −1 AT
= TE
= T [d1 e1 |d2 e2 |...|dn en ]
= [T d1 e1 |T d2 e2 |...|T dn en ]
= [d1 T e1 |d2 T e2 |...|dn T en ]
= [d1 y 1 |d2 y 2 |...|dn y n ]

Thus we conclude that the individual columns are equal vectors. That is, Ay i = di y i ,
for 1 ≤ i ≤ n. That is, y i is an eigenvector of A corresponding to the eigenvalue di .
Since T is non-singular, the set S containing T ′ s columns is a linearly independent set.
So the set set S has all the required properties.
It is clear that diagonalizable matrices have full eigenspaces.

Theorem 2.15 A square matrix A is diagonalizable if and only if the algebraic multi-
plicity of each eigenvalue is the same as its geometric multiplicity. That is, for every
eigenvalue λ of A, GA (λ) = αA (λ).

Corollary 2.16 Suppose A is an n × n matrix with n distinct eigenvalues. Then A is


daigonalizable.

Proof Let λ1 , λ − 2, ..., λn denote the n distinct eigenvalues of A. Then we have n =


∑n
i=1 αA (λi ), which implies that αA (λi ) = 1, 1 ≤ i ≤ n. It follows that GA (λi ) = 1,

1 ≤ i ≤ n. So GA (λi ) = αA (λi ), 1 ≤ i ≤ n and by Theorem 2.15, A is diagonalizable.


( )
5 −6
Example 2.10 . Show that A = is diagonalizable. Compute A10 .
3 −4

62
Solution. It is easy to verify
( that ( ) λ1 = 2 and λ2 = −1 with corre-
) A has eigenvalues
2 1
sponding eigenvectors v 1 = and v 2 = . Forming P = [v 1 v 2 ], we obtain
1 1
( ) ( )
2 1 −1 2
P = , P −1 = . It is easy to check that
1 1 1 −1
( )
2 0
P −1 AP = =D
0 −1

We now compute A10 .


( ) ( )
210 0 1024 0
D10 = =
0 (−1)10 0 1

Hence A10 = P D10 P −1 is given by


( )
2047 −2046
A10 =
1023 −1022

Definition 2.6 An n × n matrix Q is called an orthogonal matrix if Q is invertible


and Q−1 = Qt .

Definition 2.6 can be rephrased as follows: A square matrix Q is orthogonal if and only
if Qt Q = I. Another useful description of orthogonal matrices can be obtained from the
above relation. Suppose Q = [q1 , q − 2, ..., qn ] is an n × n matrix. since the ith row of Qt
is equal to qit , the definition of matrix multiplication tells us that: the ij-th entry of Qt Q
is equal to qit qj . Thus a matrix Q is orthogonal if and only if the columns {q1 , q2 , ..., qn }
of Q, form an orthonormal set of vectors.

Example 2.11 Verify that the matrices Q1 and Q2 are orthogonal


   
1 0 1 0 0 1
1  √   
Q1 = √  0 2 0  , Q2 =  1 0 0  .
2   
−1 0 1 0 1 0

Definition 2.7 A square matrix A is said to be symmetric if At = A.

63
( ) ( )
1 2 1 2
Example 2.12 Let A = and B = . Then A is symmetric while B
2 3 1 2
is not symmetric.

Symmetric matrices A have the property that αA (λ) = GA (λ) and hence are examples
of perfect matrices.

Definition 2.8 A square matrix A is said to be real symmetric if it is symmetric and


all its entries are real numbers.

Theorem 2.17 If A is an n × n real symmetric matrix, then all its eigenvalues are real.

Proof. Let A be any n × n real symmetric matrix, and suppose that Av = λv, v ̸= 0
and where we allow the possibility that v is a complex vector. To isolate λ, we first note
that
v t (Av) = v t (λv) = λ(v t v).

Regarding Av as a vector, we see that v t (Av) = (Av)t v. Thus

λv t v = v t (Av) = (Av)t v = v t At v.

This holds since A = At . Since A is real, we also know that Av = λv. Hence, we deduce
that
λv t v = λv t v.

Because v ̸= 0, v t v is non-zero, and we see that λ = λ, which means that λ is real.

In many physical problems, a matrix of interest will be real and symmetric. If the
eigenvalues are to represent physical quantities of interest, then they need to be real
numbers.

2.6 Orthonormal diagonalization

We have seen some conditions for a square matrix to be similar to a diagonal matrix.
Notice that a similarity transformation is a change of basis on a matrix representation.
So we can now discuss the choice of basis used to build a matrix representation, and
decide if some bases are better than others.

64
2.6.1 Diagonalization of Symmetric Matrices

Gram Schmidt Orthogonalization Process Suppose {v 1 , v 2 , ..., v 2 } is a basis of


an inner product space V . One can use this basis to construct an orthogonal basis
{w1 , w2 , ..., wn } of V as follows:

Set
w1 = v1
⟨v 2 ,w1 ⟩
w2 = v 2 − ⟨w 1 ,w 1 ⟩
w1
⟨v 3 ,w1 ⟩ ⟨v 3 ,w2 ⟩
w3 = v3 − ⟨w1 ,w1 ⟩ 1
w − ⟨w 2 ,w 2 ⟩
w2
.. .. ..
. . .
⟨v n ,w1 ⟩ ⟨v n ,w2 ⟩ ⟨v n ,wn−1 ⟩
wn = v n − ⟨w1 ,w1 ⟩ 1
w − ⟨w2 ,w2 ⟩ 2
w − ... − w
⟨wn−1 ,wn−1 ⟩ n−1

Each wk is orthogonal to the preceding w′i s. Thus {w1 , w2 , ..., wn } is an orthogonal


basis for V . Normalizing each wi will yield an orthonormal basis for V . This process is
known as the Gram-Schimdt orthogonalization process.

Example 2.13 Use the Gram-Schmidt process to construct an orthonormal basis set
from      
1 0 3
     
v1 =      
 2  , v 2 =  1  , v 3 =  −7 
−1 −1 1

Solution We use have w1 = v 1 . Normalizing this vector we have u1 = √ w1 =


⟨w1 ,w1 ⟩
 
1
 
√  2 . To find the second vector, first compute ⟨u1 , v 2 ⟩ = 3. The first vector is
1
6  
−1
already normalized, so
     
0 1 − 12
⟨w1 , v 1 ⟩   3   
w2 = v 2 − u1 = 
 1 −  2 = 0 
 6   
⟨w1 , w1 ⟩
−1 −1 −2 1

   
− 12 − √12
√    
Normalizing we have u2 = √ w2
= 2
 0  =  0 .
  
⟨w2 ,w2 ⟩
−2
1
− 2
√1

65
Finally, the third vector is found from
⟨w1 , v 3 ⟩ ⟨w2 , v 3 ⟩
w3 = v 3 − w1 − w2 .
⟨w1 , w1 ⟩ ⟨w2 , w2 ⟩
       
3 1 − 1
3
  12   2 2   
    
But ⟨w2 , v 3 ⟩ = −2, so w3 =  −7  + 6  2  + 1  0  =  −3   .
2

1 −1 − 12 −3
Normalizing ⟨w3 , w3 ⟩ = 27 and so
     
3 3 1
w3 1   1  
 −3  = √1  −1 
 
u3 = =√  −3 = √
∥w3 ∥ 27   3 3  3 
−3 −3 −1

It is easy to check that {w1 , w2 , w3 } is an orthogonal set while {u1 , u2 , u3 } is an or-


thonormal set.

We now show that every symmetric matrix can be diagonalized by an orthogonal matrix.
We demonstrate this by first stating the following theorem.

Theorem 2.18 (Schur’s Theorem) Let A be an n × n matrix, where A has only real
eigenvalues. Then there is an n × n matrix Q such that Qt AQ = T , where T is an n × n
upper triangular matrix.

Theorem 2.19 Let A be a real nn matrix.


(a). If A is symmetric, then there exists an orthogonal matrix Q such that Qt AQ = D,
where D is diagonal.
(b). If Qt AQ = D, where Q is orthogonal and D is diagonal, then A is a symmetric
matrix.

Proof.(a). Suppose A is symmetric. Then A has only real eigenvalues. Thus there exist
an orthogonal matrix Q such that Qt AQ = M , where M is an upper triangular matrix.
Using the transpose operation on the above equality and also using the fact that At = A,
we obtain
M t = (Qt AQ)t = Qt At Q = Qt AQ = M.

Thus, since M is upper triangular and M t = M , it follows that M is a diagonal matrix.


(b). Suppose that Qt AQ = D, where Q is orthogonal and D is diagonal. Since D is

66
diagonal, we know that Dt = D. Thus, using the transpose operation on this equality,
we obtain
Qt AQ = D = Dt = (Qt AQ)t = Qt At Q.

From this result, we see that Qt AQ = Qt At Q. Multiplying by Q and Qt , we obtain

Q(Qt AQ)Qt = Q(Qt At Q)Qt ,

or
(QQt )A(QQt ) = (QQt )At (QQt )

Thus A = At . Hence A is symmetric.

Remark 2.6

Theorem 2.14 above states that every real symmetric matrix A is orthogonally diag-
onalizable. That is, there exists an orthogonal matrix Q such that Qt AQ = D, where
D is diagonal. The eigenvalues of A are the diagonal entries of D and eigenvectors of A
can be chosen as the columns of Q. Since the columns of Q form an orthonormal set,
we have the following result.

Theorem 2.20 Suppose A is a symmetric matrix and x and y are two distinct eigen-
vectors of A corresponding to distinct eigenvalues. Then x and y are orthogonal vectors.

Proof Let x be an eigenvector of A corresponding to the eigenvalue λ and let y be an


eigenvector of A corresponding to a different eigenvalue ρ. So we have λ − ρ ̸= 0. Then

⟨x, y⟩ = 1
λ−ρ
(λ − ρ)⟨x, y⟩
= 1
λ−ρ
(λ⟨x, y⟩ − ρ⟨x, y⟩)
= 1
λ−ρ
(⟨λx, y⟩ − ⟨x, ρ y⟩)
= 1
λ−ρ
(⟨λx, y⟩ − ⟨x, ρy⟩)
= 1
λ−ρ
(⟨Ax, y⟩ − ⟨x, Ay⟩)
= 1
λ−ρ
(⟨Ax, y⟩ − ⟨Ax, y⟩)
1
= λ−ρ
(0)
= 0.

Thus x is orthogonal to y.

67
Corollary 2.21 Let A be an n × n symmetric matrix. It is possible to choose eigenvec-
tors v 1 , v 2 , ..., v n for A such that {v 1 , v 2 , ..., v n } is an orthonormal basis for Rn .

Example 2.14 Find an orthonormal basis for R4 consisting of the eigenvectors of


 
1 −1 −1 −1
 
 −1 1 −1 −1 
 
A= .
 −1 −1 1 −1 
 
−1 −1 −1 1

Solution. The characteristic polynomial of A is χA (λ) = (λ − 2)3 (λ + 2). Thus the


eigenvalues of A are λ1 = λ2 = λ3 = 2 and λ4 = −2. It is easy to verify that the
corresponding eigenvectors are given by
       
1 1 1 1
       
 −1   0   0   1 
       
w1 =   , w2 =   , w3 =   , w4 =  .
 0   −1   0   1 
       
0 0 −1 1
Note that w1 , w2 , w3 belong to the eigenspace associated with λ = 2, whereas w4 is in the
eigenspace associated with λ = −2. Check that wt1 w4 = wt2 w4 = wt3 w4 = 0. Also note
that the matrix P = [w1 , w2 , w3 , w4 ] will diagonalize A. However, P is not an orthogonal
matrix. To obtain an orthonormal basis for R4 (and hence an orthogonal matrix Q that
diagonalizes A), we first find an orthogonal basis for the eigenspace associated with
λ = 2. Applying Gram-Schmidt process to the set {w1 , w2 , w3 }, we produce orthogonal
vectors      
1 1/2 1/3
     
 −1   1/2   1/3 
     
v1 =   , v2 =   , v3 =  .
 0   −1   1/3 
     
0 0 −1
Thus the set {v 1 , v 2 , v 3 , w4 } is an orthogonal basis for R4 consisting of eigenvectors of
A. This set can then be normalized to determine an orthonormal basis for R4 and an
orthogonal matrix Q that diagonalizes A.

68
2.7 Solved Exercises
( )
1. Let V = P2 and T (x) = T x(t) = (1 + t2 )x′′ (t) + x′ (t) + x(t), where ′ = d
dt
and P2 is
the vector space of polynomials of degree less than or equal to 2. Find the eigenvalues
and eigenvectors of T .

Solution. One must verify that T : V −→ V , and this can be done in the process
of the computations. Pick an ordered basis {1, t, t2 } for V and compute each T (ti ) to
find the matrix A of T . If the image under T of each basis member lies in V , then so
does the image T (x) for every x ∈ V , by the linearity of T . Hence the construction
of A by expressing each T (ti ) as a linear combination of {1, t, t2 } can succeed only if
T : V −→ V . From T (1) = 1, T (t) = 1 + t and T (t2 ) = 2(1 + t2 ) + 2t + t2 , we have
 
1 1 2
 
A = [T ] = 
 0 1 2 .

0 0 3
Hence χA (λ) is the characteristic polynomial of A. Thus the eigenvalues of T are 1 and
3, with algebraic multiplicities 2 and 1, respectively.
 
0 1 0
 
• For λ1 = 1: (A − I) after row-reduction gives  0 0 1 
. A basis for the eigenspace
0 0 0
 
{ 1 }
is  
 0  .
0
   
−2 0 3 {  1.5  }
 

• For λ2 = 3: (A−3I) after row-reduction yields  0 1 −1 , which gives 
 
 1 
0 0 0 1
as a basis.
Returning to P2 , the eigenvalue, eigenspace pairs for T are λ + 1, E1 = span{e1 }, with
e1 = 1, and λ2 = 3, E2 = span{e2 } with e2 = 1.5 + t + t2 . Since all the eigenvectors
have been found, there is not a basis of eigenvectors for T .

69
 
3 1 1
 
2. Let A =  
 1 3 1 .
1 1 3

(a). Find all the eigenvalues of A.


(b). Exhibit two different bases for R3 consisting solely of eigenvectors for A.
Solution. (a). It is easy to check that {5, 2, 2} is the set of eigenvalues of A.
   
−2 1 1 1 0 −1
   
• For λ = 5: we have A − 5I = 
 1 1 
−2  which reduces to  0 1 −1 .
 
1 1 −2 0 0 0
 
{ 1 }
So this eigenspace has dimension one and has a basis S5 =  
 1  .
1
   
1 1 1 1 1 1
   
• For λ = 2: we have A − 2I =  1 1 1  which reduces to  0 0 0 .
  
1 1 1 0 0 0
   
{ −1 −1
  }
So this eigenspace has dimension two and a basis is S2 =  1 , 0 
  
0 1
Putting these together, we obtain our first basis for R :3
     
{ 1   −1 −1
  }
S=   
 1 , 1  
, 0
 .

1 0 1
There are many ways to transform this into a second basis, e,g, choose an orthonormal
basis, i.e. a basis of pairwise orthogonal unit vectors. To do this use Gram-Schmidt
procedure to obtain:
     
{ 1  1  1  −1  1  −1
}
Ω= √   1  ,√   1   ,√  −1 
 .
3 2 6
1 0 2
( )
5 2
3. Show that A = satisfies the Cayley-Hamilton Theorem.
9 2

70
SolutionThe characteristic equation of A is χA (λ) = λ2 − 7λ − 8 = 0. The Cayley-
( )
43 14
Hamilton Theorem tells us that A2 − 7A − 8I = 0. Note that A2 = , 7A =
63 22
( ) ( )
35 14 8 0
, 8I = . Putting these together, we obtain
63 14 0 8
( ) ( ) ( ) ( )
43 14 35 14 8 0 0 0
A2 − 7A − 8I = − − =
63 22 63 14 0 8 0 0

4. Suppose that λ and ρ are two different eigenvalues of a square matrix A. Prove
that the intersection of the eigenspaces for these two eigenvalues is trivial. That is,
EA (λ) ∩ EA (ρ) = {0}.
Solution.
It suffices to show that the two sets are equal. First, note that {0} ⊆ EA (λ) ∩ EA (ρ).
Choose x ∈ {0}. Then x = 0. Eigenspaces are subspaces, so both EA (λ) and EA (ρ)
contain the zero vector, and therefore x ∈ EA (λ) ∩ EA (ρ). That is

{0} ⊆ EA (λ) ∩ EA (ρ)..........(∗)

To show that EA (λ)∩EA (ρ) ⊆ {0}, suppose x ∈ EA (λ)∩EA (ρ). Then x is an eigenvector
of A for both λ and ρ and so

x = 1x
= 1
λ−ρ
(λ − ρ)x, λ ̸= ρ, λ − ρ ̸= o
= 1
λ−ρ
(λx − ρx)
= 1
λ−ρ
(Ax − Ax)
1
= λ−ρ
(0)
= = 0.

Since x = 0, and trivially x ∈ {0}. That is

EA (λ) ∩ EA (ρ) ⊆ {0}..........(∗∗)

From (*) and (**), equality follows.

5. Suppose A is a square matrix and λ is an eigenvalue of A. Let q(x) be a polynomial


in the variable x. Show that q(λ) is an eigenvalue of the matrix q(A).

71
Solution.
Let x ̸= 0 be one eigenvector of A corresponding to λ, and write q(x) = a0 + a1 x +
a2 x2 + ... + am xm . Then

q(A)x = (a0 A0 + a1 A1 + ... + am Am )x


= (a0 A0 )x + (a1 A1 )x + ... + (am Am )x
= a0 (A0 x) + a1 (A1 x) + ... + am (Am x)
= (a0 λ0 )x + (a1 λ1 )x + ... + (am λm )x
= (a0 λ0 + a1 λ1 + ... + am λm )x
= q(λ)x

So x ̸= 0 is an eigenvector of q(A) corresponds to eigenvalue q(λ).

6. Suppose that A is a square matrix. Prove that the constant term of the characteristic
polynomial of A is equal to det(A).
Solution. Suppose the characteristic polynomial χA (λ) = a0 + a1 λ + ... + an λn . Then

a0 = a0 + a1 (0) + a2 (0) + ... + an (0)


= χA (0)
= det(A − 0I)
= det(A)

7. Suppose A is a square matrix. Prove that a single vector may not be an eigenvector
of A corresponding to two different eigenvalues.
Solution. Suppose x ̸= 0 is an eigenvector of A corresponding to two eigenvalues λ and
ρ, where λ ̸= ρ. Then λ − ρ ̸= 0, and so we also have

0 = Ax − Ax
= λx − ρx
= (λ − ρ)x

Either λ − ρ = 0 or x = 0 , which are both contradictions. This proves the claim.

8. Suppose A and B are similar matrices. Prove that A3 and B 3 are similar matrices.
Generalize.

72
Solution
We know that there is a non-singular matrix P such that A = P −1 BP . Then
A3 = (P −1 BP )3
= (P −1 BP )(P −1 BP )(P −1 BP )
= P −1 BBBP
= P −1 B 3 P
More generally, if A is similar to B and k is any non-negative integer, then Ak is similar
to B k . This can be proved by mathematical induction on k.

9. Suppose B is a non-singular matrix. Prove that AB is similar to BA.


Solution Note that B −1 (BA)B = (B −1 B)(AB) = IAB = AB.
( )
−1 4
10. Find a matrix Q that orthogonally diagonalizes A = .
4 5
λ = −3 and (
Solution It can be easily checked(that ) λ = 7)are the eigenvalues of A with
−2 1
corresponding eigenvectors v 1 = and v 2 = .
1 2

Since A is symmetric, the eigenvectors are orthogonal. To get eigenvectors of length 1,


we normalize v 1 and v 2 to get the new unit vectors:
( )
√2
v1 5
w1 = ∥v 1 ∥
=
√1
5
and
( )
√1
v2 5
w2 = ∥v 2 ∥
= .
√2
5

Therefore, Q = [w1 |w2 ]. That is


( )
√2 √1
5 5
Q=
− √15 √2
5
 
−2 0 3
 
10. Find the eigenvalues and all the eigenvectors of the matrix A = 
 0 4 0 
.
−6 0 7

73
Does there exist a vector v ∈ R3×1 such that Av = 2v?
Solution The characteristic equation is −(λ − 1)(λ − 4)2 and the eigenvalues of A are
λ = 1 and λ = 4 (of algebraic multiplicity 
2).   
−3 0 3 1 0 1
   
• If λ = 1, (A − I) reduces to (A − I) = 
 0 3 0  ∼  0 1 0 .

−6 0 6 0 0 0
The rank
 is 2, so the eigenspace has dimension 3 − 2 = 1, generated by the eigenvector
1
 
v1 =  0 .

−1
   
−6 0 3 2 0 −1
   
• If λ = 4 (A − 4I) reduces to (A − 4I) = 
 0 0 0  ∼  0 0 0 .
  
−6 0 3 0 0 0
The rank is 1, so the eigenspace
 has dimension 3
 − 1 = 2, generated by two linearly
1 0
   
independent eigenvectors v 2 =    
 0  and v 3 =  1 .
2 0
The answer is ”no”. Because if it was true, then λ = 2 would be an eigenvalue of A,
which is not.

 f : R −→R be the linear map which is in the usual basis of R has the matrix
3 3 3
11. Let
2 0 −3
 
A=  0 5 0 .

4 0 9
(a). Find all eigenvalues and the corresponding eigenvectors of f .
(b). Is there a basis for R3 such that the matrix of f with respect to this basis is a
diagonal matrix?
Solution
(a). It is easy to show that χA (λ) = −(λ − 5)2 (λ − 6) and hence the eigenvalues of A
are λ = 5(with algebraic multiplicity 2) and λ = 6.  
1
 
• If λ = 5, there are two linearly independent eigenvectors v 1 =  
 0  and v 2 =
−1

74
 
0
 
 1 .
 
0
 
3
 
• If λ = 6, we have v 3 =  0 

 as its corresponding eigenvector.
−4
(b). Since we have 3(= dimR3 ) linearly independent eigenvectors, these must be a basis
for R
3
, and the matrix
 with respect to {v 1 , v 2 , v 3 } is the diagonal matrix
5 0 0
 
D=  0 5 0 .

0 0 6
   
2 2 1 2 1 −1
   
12. Prove that the matrices A =  
 1 3 1  and B =  0
 2 −1  have the

1 2 2 −3 −2 3
same characteristic polynomial, and yet they are not similar.
Solution A simple computation shows that χA (λ) = −(λ − 1)2 (λ − 5) and χB (λ) =
−(λ − 1)2 (λ − 5). That is χA (λ) = χB (λ).
If they are not similar, then λ = 1 , which has algebraic multiplicity 2, must have dif-
ferent geometric multiplicities for A and B.
For λ = 1we reduce:  
1 2 1 1 2 1
   
A−I =  
 1 2 1  ∼  0 0 0  and

1 2 1 0 0 0
   
1 1 −1 1 0 0
   
B−I = 0   
1 −1  ∼  0 1 −1  .
−3 −2 2 0 0 0
Since A − I has rank 1, and B − I has rank 2, the matrices A and B cannot be similar.
 
1 0 0
 
13. Given that A =   1 1 1 ,

1 0 2
(a). find all the eigenvalues and the corresponding eigenvectors of A.
(b). find a matrix P such that P −1 AP = D.

75
Solution
A simple computation gives χA (λ) = (1 − λ)2 (2 − λ) and the eigenvectors  are λ =
1
 
1(algebraic multiplicity 2) and λ = 2 with corresponding eigenvalues v 1 =   0  and

−1
   
0 0
   
v2 =    
 1  and v 3 =  1 .
0 1
   
1 0 0 1 0 0
   
It is immediately seen that P = [v 1 |v 2 |v 3 ] = 
 0 1 1  and D =  0 1 0 .
  
−1 0 1 0 0 2
 
0 2 0
 
14. Diagonalize the matrix A =   2 0 2 .

0 2 0
Solution Notice that since A is symmetric , the eigenvectors of A forms a basis for
√ √
R3 . Solving the characteristic polynomial gives {0, −2 2, 2 2} as the
 eigenvalues
 of
1 1
   √ 
A, corresponding normalized eigenvectors v 1 = √12   0  , v2 = 1  − 2  , v3 =
 2 
−1 1
 
1
1
 √ 
2 2 
.
1
 
√1 1 1
 2 2
√ √
2

The matrix that orthogonally diagonalizes A is Q = 
 0 − 2
2
2
2 .

− √12 1 1
  2 2

0
0 0
 √ 
Note that Qt AQ = 
 0 −2 2 0 .
√ 
0 0 2 2

15. Find two matrices that are of the same size and have the same determinant but are
not similar.(Hint: keep this simple. Look at 2 × 2 diagonal matrices).

76
( ) ( )
1
1 0 2
0
Solution Let A = and B = .
0 1 0 2
Clearly det(A) = det(B). But A and B are not similar. For suppose they are similar.
Then B = P −1 AP = P −1 P = I, which is false.

2.8 Exercises
   
4 −2 4 2
   
1. Given the matrix A = 
 −2 2 
7  and v =  1 ,
 
4 2 4 −2
(a). Prove that v is an eigenvector of A.
(b). Solve the equation v T x = 0 and prove that every x ∈ R3 which satisfies the equation
is an eigenvector of A.
(c). Find an orthogonal matrix Q and a diagonal matrix D such that QT AQ = D.
 
2 −1 −1
 
2.(a).Find the eigenvalues and eigenvectors of A = 
 −1 −1  . 2
−2 2 −1
(b). Determine the algebraic and geometric multiplicities of each eigenvalue of A.
 
2 −2 3
 
3. Consider the matrix A = 
 1 1 
1 .
1 3 −1
(a). Find characteristic polynomial of A.
(b). Find all the eigenvalues of A and their corresponding eigenvectors.
(c). Show by a direct calculation that A obeys the Cayley-Hamilton Theorem.
(d). Compute A4 using the Cayley-Hamilton Theorem.
(e). Compute A−1 .
(f). Show that the eigenvectors of A are linearly independent.
( )
−5 8
4. Suppose A is the 2 × 2 matrix A = .
−4 7
(a). Find the eigenvalues of A.
(b). Find the eigenspaces of A.

77
(c). For the polynomial p(x) = 3x2 − x + 2, compute p(A).
(d). Find a diagonal matrix similar to A.

5. Find the
( eigenvalues,
) eigenspaces, algebraic and geometric multiplicities for the ma-
−1 2
trix B = .
−6 6

6. A matrix A is idempotent if A2 = A. Show that the only possible eigenvalues of an


idempotent matrix are λ = 0 and λ = 1.

7.(a). Suppose A is an n × n matrix. Suppose λ is an eigenvalue of A and k ≥ 0 is an


integer. Prove that λk is an eigenvalue of Ak .
(b). Suppose that A is a non-singular matrix and λ is an eigenvector of A. Prove that
1
λ
is an eigenvalue of A−1 .

8. Suppose A and B are similar matrices with A non-singular. Prove that B is non-
singular and that A−1 is similar to B −1 .
( )
1 α
9. Prove that the eigenvectors of the matrix A = , where α ̸= 0, generates a
0 1
one-dimensional space. Find a basis for this space.
 
2 2 0
 
10. Prove that the eigenvectors of the matrix A =  0 2 0 

 generates a 2-dimensional
0 0 2
space and find a basis for this space.

11. Find 
the eigenvalues
 and eigenvectors of each of the following matrices:
1 1 1
 
(a). A = 
 0 1 1 

0 0 1

78
 
1 1 0
 
(b). B = 
 0 1 1 

0 0 1
( )
0 2
(c).
−2 0
(Hint: In (c) expect complex eigenvalues and eigenvectors.)

12. Diagonalize
 the matrices

−1 3 −1
 
(a). A = 
 −3 5 −1 

−3 3 1
 
1 1 1 1
 
 1 1 −1 −1 
 
(b).  .
 1 −1 1 −1 
 
1 −1 −1 1
d
13. Find the eigenvalues and eigenvectors of the linear operator dt
in the space of poly-
nomials Pn of degree at most n with real coefficients.
 
0 1 −3
 
14. Let A = 
 −1 −3 3 .
−1 −1 0
(a). Find the eigenvalues and corresponding eigenvectors of A.
(b). Determine the algebraic and geometric multiplicities of each eigenvalue of A.
(c). Explain why A is not similar to a diagonal matrix.
 
3 2 1
 
15. Show that λ = 2 is an eigenvalue of A = 
 0 0 
.2
−2 −3 0
Find αA (λ) and GA (λ). Can you conclude anything about the diagonalizability of A
from these results? ( )
7 1
5 5
16. Compute limn→∞ An for A = .
−1 1
2

79
17.(a). Find a matrix with eigenvalues 2 and 5.
(b). Prove that a square matrix is singular if and only if 0 is one of its eigenvalues.
(c). Suppose v is an eigenvector of an n × n matrix A associated with the eigenvalue λ.
Suppose P is a non-singular n × n matrix. Show that P −1 v is an eigenvector of P −1 AP
associated with the eigenvalue λ.
(d). Suppose that A and B are n × n matrices. Suppose v is an eigenvector of A asso-
ciated with the eigenvalue λ. Suppose v is also an eigenvector of B associated with the
eigenvalue ρ. Show that v is an eigenvector of A + B associated with λ + ρ.

18. Prove that if A is diagonalizable, then so is every matrix that is similar to A.

19. Consider the linear operator T : C(R) −→ C(R) defined by T (f (x)) = f (x − 1),
where C(R) denotes the set of all continuous functions from R to R.
(a). Show that the function f(x) = e^x is an eigenvector of T associated with the
eigenvalue e^{-1}.
(b). Show that the function f(x) = e^{2x} is an eigenvector of T. What is the associated
eigenvalue?
(c). Show that any negative number is an eigenvalue of T .
(d). Show that 0 is not an eigenvalue of T .

Chapter 3

MINIMAL POLYNOMIAL OF A
SQUARE MATRIX

3.1 Introduction

We now turn to a unique polynomial extracted from the characteristic polynomial of a
square matrix: the minimal polynomial.

Objectives
At the end of this lecture, you should be able to:

• Determine the minimal polynomial of any square matrix.

• Understand the relationship between characteristic polynomial and minimal


polynomial of a square matrix.

• Find eigenvalues and corresponding eigenvectors of a square matrix.

• Determine whether a square matrix is diagonalizable by using its minimal poly-


nomial.

• Determine whether two square matrices of the same size and in the same field
are similar.

Definition 3.1 A polynomial f(t) ≠ 0 is said to be monic if its leading coefficient
equals 1.

Let A be a square matrix. Let J(A) denote the collection of all polynomials f(t) that
have A as a root, i.e., for which f(A) = 0. The set J(A) is not empty, since the Cayley-
Hamilton Theorem tells us that the characteristic polynomial χA(t) of A belongs to J(A).
Let mA(t) denote the monic polynomial of lowest degree in J(A). Such a polynomial
exists and is unique. We call mA(t) the minimal polynomial of the matrix A. That is,
the minimal polynomial mA(t) of A is a monic polynomial of smallest degree for which
mA(A) = 0. The Cayley-Hamilton Theorem guarantees the existence of a minimal
polynomial. Clearly, its degree is less than or equal to the degree of the characteristic
polynomial.

Example 3.1 f (t) = t2 + 3t + 1 and g(t) = t10 − 5t6 + 7t2 + 5 are monic polynomials.

We say that a polynomial p(x) annihilates a matrix A if p(A) = 0. This is equivalent


to saying that the polynomial p(x) has A as a root. By the Cayley-Hamilton Theorem,
the characteristic polynomial χA (x) of A annihilates A. Sometimes an n × n matrix A
is annihilated by a monic polynomial whose degree is less than n.

Theorem 3.1 The minimal polynomial p(x) of A is unique.

Proof Let p′(x) be another monic polynomial that has the same degree as p(x) and such
that p′(A) = 0. Then p(x) − p′(x) is a polynomial of degree strictly less than deg p(x)
(the leading terms cancel, since both polynomials are monic of the same degree) that
annihilates A. Since p(x) is an annihilating polynomial of least degree, this forces
p(x) − p′(x) = 0, that is, p′(x) = p(x).

By the division algorithm, every polynomial q(x) can be written as q(x) = s(x)p(x) + r(x),
where r(x) = 0 or deg r(x) < deg p(x). The polynomial r(x) is called the remainder. We
say that a polynomial p(x) divides another polynomial q(x) if there exists a polynomial
s(x) such that q(x) = s(x)p(x). In other words, p(x) divides q(x) if the remainder r(x)
can be taken to be zero.

Example 3.2 Let p(x) = x − 2 and q(x) = x^2 − 4x + 4. Clearly p(x) divides q(x), since
q(x) = (x − 2)(x − 2).

Clearly, every nonzero constant polynomial divides every polynomial.

Theorem 3.2 The minimal polynomial mA (t) of a matrix A divides every polynomial
that has A as a root. In particular, mA (t) divides the characteristic polynomial χA (t) of
A. Moreover, the minimal polynomial mA (t) of A has A as a root. That is, mA (A) = 0.

Proof. Suppose that f (t) is a polynomial for which f (A) = 0. By the division al-
gorithm, there exist polynomials q(t) and r(t) for which f (t) = mA (t)q(t) + r(t) and
r(t) = 0 or deg r(t) < deg mA (t). Substituting t = A in this equation and using that
f(A) = 0 and mA(A) = 0, we obtain r(A) = 0. If r(t) ≠ 0, then r(t) is a polynomial of
degree less than that of mA(t) that has A as a root. This contradicts the definition of the mini-
mal polynomial. Thus r(t) = 0, and so f (t) = mA (t)q(t), that is, mA (t) divides f (t).

There is an even stronger relationship between the characteristic polynomial and the
minimal polynomials of A.

Theorem 3.3 The characteristic polynomial χA (t) and the minimal polynomial mA (t)
of A have the same irreducible factors.

Proof. Suppose that f(t) is an irreducible polynomial. If f(t) divides mA(t), then f(t)
divides χA(t) (since mA(t) divides χA(t)). On the other hand, it is known that χA(t)
divides [mA(t)]^n; so if f(t) divides χA(t), then f(t) also divides [mA(t)]^n. But f(t) is
irreducible; hence f(t) also divides mA(t). Thus mA(t) and χA(t) have the same
irreducible factors.

Clearly, χA (λ) = ±mA (λ) if χA (λ) has distinct linear factors.

Remark 3.1

Note that an irreducible factor is a factor that cannot be expressed as the product of
two or more non-trivial factors. Irreducibility depends on the field K: a polynomial may
be irreducible over some fields but reducible over others. For instance, p(x) = x^2 + 1 is
irreducible over R but reducible over C. The polynomial q(x) = x^2 − 2 is irreducible
over Q but reducible over R. Clearly, all polynomials of degree 1 are irreducible.
Note that Theorem 3.3 does not say that mA(t) = χA(t), only that any irreducible
factor of one must divide the other. In particular, since a linear factor is irreducible,
mA(t) and χA(t) have the same linear factors. Hence they have the same roots. Thus
we have the following theorem.

Theorem 3.4 A scalar λ is an eigenvalue of a square matrix A if and only if λ is a
root of the minimal polynomial of A.

Clearly any nonconstant polynomial f (t) is expressible as the product of irreducible


polynomials. Also if f (t) is monic, then it can be expressed as a product of monic
irreducibles.
Example 3.3 Let

    A = [ 1  1 ] .
        [ 0  1 ]

Find the minimal polynomial of A.

Solution The characteristic polynomial of A is χA(λ) = (λ − 1)^2. The minimal polynomial
of A is either f(λ) = (λ − 1)^2 or g(λ) = (λ − 1), but not both, since it is unique. In
addition, it must annihilate A and be of least degree. The polynomial f(λ) satisfies all
these conditions by the Cayley-Hamilton Theorem. We need only check g. Note that

    g(A) = A − I = [ 0  1 ]  ≠  [ 0  0 ] .
                   [ 0  0 ]     [ 0  0 ]

Therefore g(λ) is not the minimal polynomial of A. Hence the minimal polynomial of A
is mA(λ) = (λ − 1)^2. Notice in this example that χA(λ) = mA(λ).
 
Example 3.4 Find the minimal polynomial mA(λ) of

         [ 2  2   −5 ]
    A =  [ 3  7  −15 ] .
         [ 1  2   −4 ]

Solution. Check that the characteristic polynomial of A is χA(λ) = λ^3 − 5λ^2 + 7λ − 3 =
(λ − 1)^2 (λ − 3). The minimal polynomial mA(λ) must divide χA(λ). Also each irreducible
factor of χA(λ), that is λ − 1 and λ − 3, must also be a factor of mA(λ). Thus mA(λ)
is exactly one of the following: f(λ) = (λ − 1)(λ − 3) or g(λ) = (λ − 1)^2 (λ − 3). We
know by the Cayley-Hamilton Theorem that g(A) = χA(A) = 0. Hence we need only
test f(λ). We have

                              [ 1  2   −5 ] [ −1  2   −5 ]   [ 0  0  0 ]
    f(A) = (A − I)(A − 3I) =  [ 3  6  −15 ] [  3  4  −15 ] = [ 0  0  0 ] .
                              [ 1  2   −5 ] [  1  2   −7 ]   [ 0  0  0 ]

Thus
    f(λ) = mA(λ) = (λ − 1)(λ − 3) = λ^2 − 4λ + 3
is the minimal polynomial of A.
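The computation in Example 3.4 can be automated. The following Python/SymPy sketch
(not part of the original notes; it assumes SymPy is available) finds the minimal
polynomial exactly as above: it factors the characteristic polynomial, forms the monic
candidates that contain every irreducible factor at least once (Theorem 3.3), and returns
the candidate of least degree that annihilates A.

```python
# Sketch only: mirrors the procedure of Example 3.4 using SymPy.
from itertools import product
import sympy as sp

def minimal_poly(A):
    lam = sp.symbols('lam')
    chi = A.charpoly(lam)                      # characteristic polynomial chi_A
    _, factors = sp.factor_list(chi.as_expr(), lam)
    # By Theorem 3.3, m_A contains each irreducible factor at least once,
    # with exponent at most its exponent in chi_A.
    ranges = [range(1, e + 1) for _, e in factors]
    candidates = sorted(
        (sp.Poly(sp.Mul(*[f**k for (f, _), k in zip(factors, exps)]), lam)
         for exps in product(*ranges)),
        key=lambda p: p.degree())
    for p in candidates:
        value = sp.zeros(A.rows, A.rows)       # evaluate p(A) by Horner's rule
        for c in p.all_coeffs():
            value = value * A + c * sp.eye(A.rows)
        if value == sp.zeros(A.rows, A.rows):  # p(A) = 0: p annihilates A
            return p.as_expr()

A = sp.Matrix([[2, 2, -5], [3, 7, -15], [1, 2, -4]])
print(sp.factor(minimal_poly(A)))              # (lam - 1)*(lam - 3)
```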

3.2 Minimal polynomial of a Block Triangular Matrix

We investigate the case of a diagonal block matrix. The case of a block triangular matrix
is similar.

Theorem 3.5 Suppose that M is a block diagonal matrix with diagonal blocks A1, A2, ..., Ar.
Then the minimal polynomial of M is equal to the least common multiple (LCM) of the
minimal polynomials of the diagonal blocks A1, A2, ..., Ar.

Proof. We prove the theorem for the case r = 2. The general theorem follows easily
by induction. Suppose

    M = [ A  0 ] ,
        [ 0  B ]

where A and B are square matrices. We need to show that the minimal polynomial mM(t)
of M is the least common multiple of the minimal polynomials mA(t) and mB(t) of A and
B, respectively. Since mM(t) is the minimal polynomial of M,

    mM(M) = [ mM(A)    0   ]  =  0,
            [   0    mM(B) ]

and hence mM(A) = 0 and mM(B) = 0. Since mA(t) is the minimal polynomial of A,
mA(t) divides mM(t). Similarly, mB(t) divides mM(t). Thus mM(t) is a multiple of
mA(t) and mB(t). Now let f(t) be another multiple of mA(t) and mB(t). Then

    f(M) = [ f(A)   0  ]  =  [ 0  0 ]  =  0.
           [  0   f(B) ]     [ 0  0 ]

But mM(t) is the minimal polynomial of M; hence mM(t) divides f(t). Thus mM(t) is
the least common multiple of mA(t) and mB(t).

Example 3.5 Find the characteristic polynomial and the minimal polynomial of the
block diagonal matrix

         [ 2  5  0  0  0 ]
         [ 0  2  0  0  0 ]
    A =  [ 0  0  4  2  0 ] .
         [ 0  0  3  5  0 ]
         [ 0  0  0  0  7 ]

Solution. Note that A = diag(A1, A2, A3), where

    A1 = [ 2  5 ] ,   A2 = [ 4  2 ] ,   A3 = [7].
         [ 0  2 ]          [ 3  5 ]

The characteristic polynomial χA(λ) of A is the product of the characteristic poly-
nomials χA1(λ), χA2(λ) and χA3(λ) of A1, A2 and A3, respectively. One can show
that χA1(λ) = (λ − 2)^2, χA2(λ) = (λ − 2)(λ − 7) and χA3(λ) = (λ − 7). Thus
χA(λ) = (λ − 2)^3 (λ − 7)^2 is the characteristic polynomial of A.
It is easy to check that the minimal polynomials mA1(λ), mA2(λ) and mA3(λ) of A1, A2
and A3, respectively, are equal to their characteristic polynomials. That is,
mA1(λ) = (λ − 2)^2, mA2(λ) = (λ − 2)(λ − 7) and mA3(λ) = (λ − 7). But mA(λ) is equal
to the least common multiple of these three polynomials, i.e. mA(λ) =
LCM{mA1(λ), mA2(λ), mA3(λ)}. Thus mA(λ) = (λ − 2)^2 (λ − 7).
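As a quick sanity check of Example 3.5 (my own sketch, not part of the notes), one can
verify symbolically that the smaller candidate (λ − 2)(λ − 7) fails to annihilate A,
while the LCM (λ − 2)^2 (λ − 7) does:

```python
# SymPy sketch verifying Example 3.5.
import sympy as sp

A = sp.diag(sp.Matrix([[2, 5], [0, 2]]),
            sp.Matrix([[4, 2], [3, 5]]),
            sp.Matrix([[7]]))          # the block diagonal matrix of Example 3.5
I = sp.eye(5)

print((A - 2*I) * (A - 7*I) == sp.zeros(5, 5))       # False: (lam-2)(lam-7) is too small
print((A - 2*I)**2 * (A - 7*I) == sp.zeros(5, 5))    # True: m_A(A) = 0
```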

3.3 Minimal polynomials of Diagonalizable matrices

Theorem 3.6 Let A be an n × n matrix. Then A is diagonalizable if and only if the
minimal polynomial of A is a product of distinct linear factors.

Remark 3.2

Theorem 3.6 says that A, with distinct eigenvalues λ1, λ2, ..., λk, is diagonalizable if
and only if its minimal polynomial has the form
mA(λ) = (λ − λ1)^{α1} (λ − λ2)^{α2} ··· (λ − λk)^{αk} with α1 = α2 = ... = αk = 1.

3.4 Minimal polynomials of Similar matrices

In Chapter two we saw that similar matrices have the same characteristic polynomial
and that the converse is not true: we gave examples of matrices with the same
characteristic polynomial which are not similar. In this section we prove that similar
matrices also have the same minimal polynomial, and we examine to what extent the
converse holds. We also note that two matrices with the same characteristic polynomial
need not have the same minimal polynomial.

Theorem 3.7 Similar matrices have the same minimal polynomials.

Proof Suppose A and B are similar, say B = P^{-1}AP for some invertible matrix P.
Since (P^{-1}AP)^k = P^{-1}A^k P for every integer k ≥ 0, it follows that f(B) =
P^{-1}f(A)P for any polynomial f(x). Hence f(B) = 0 if and only if f(A) = 0; that is,
A and B are annihilated by exactly the same polynomials. In particular, mA(x)
annihilates B and mB(x) annihilates A, so by the minimality of each, mA(x) divides
mB(x) and mB(x) divides mA(x). Since both are monic, this proves that mA(x) = mB(x).

Remark 3.3

The converse of Theorem 3.7 is not true in general: matrices with the same minimal
polynomial need not be similar (see Example 3.7 below). A correct partial converse is
this: if two matrices are both diagonalizable and have the same characteristic polynomial,
then they are similar, since each is similar to the same diagonal matrix. Note also that
two matrices having the same characteristic polynomial does not generally imply that
they have the same minimal polynomial.

Example 3.6 Let A and B be the following matrices:

         [ 2  1  0 ]          [ 2  1  0 ]
    A =  [ 0  2  1 ] ,   B =  [ 0  2  0 ] .
         [ 0  0  2 ]          [ 0  0  2 ]

(a). Find the characteristic and minimal polynomials of A and B.


(b). Are A and B diagonalizable? Give a reason for your answer.
(c). Is A similar to B? Give a reason for your answer.

Solution
(a). By inspection, χA(λ) = χB(λ) = (λ − 2)^3. A simple computation shows that
mA(λ) = (λ − 2)^3 and mB(λ) = (λ − 2)^2.
(b). Clearly A and B are not diagonalizable, since their minimal polynomials are not
products of distinct linear factors.
(c). A and B are not similar. Although they are both non-diagonalizable, they do not
have the same minimal polynomial, so by Theorem 3.7 they cannot be similar.

Remark 3.4

Note that two matrices of the same size can have the same minimal and characteristic
polynomials without being similar.

Example 3.7 Let A and B be the following matrices:

         [ 0  1  0  0 ]          [ 0  1  0  0 ]
    A =  [ 0  0  0  0 ] ,   B =  [ 0  0  0  0 ] .
         [ 0  0  0  1 ]          [ 0  0  0  0 ]
         [ 0  0  0  0 ]          [ 0  0  0  0 ]

Clearly χA(λ) = χB(λ) = λ^4 and mA(λ) = mB(λ) = λ^2, yet A and B are not similar.
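A quick way to see that A and B in Example 3.7 are not similar (a NumPy sketch of mine,
not part of the notes): similar matrices must have the same rank, and here the ranks
differ even though both matrices square to zero.

```python
import numpy as np

A = np.array([[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]])
B = np.array([[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])

print(np.all(A @ A == 0), np.all(B @ B == 0))               # True True: m = lam^2 for both
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(B))   # 2 1, so A and B are not similar
```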

3.5 Solved Exercises

1. Suppose A ≠ I is a square matrix for which A^3 = I. Determine whether or not A is
similar to a diagonal matrix if A is a matrix over
(a). the real field R.
(b). the complex field C.

Solution. Since A^3 = I, A is a zero of the polynomial f(t) = t^3 − 1 = (t − 1)(t^2 + t + 1).
The minimal polynomial mA(t) of A cannot be t − 1, since A ≠ I. Hence mA(t) = t^2 + t + 1
or mA(t) = t^3 − 1.
(a). Neither polynomial is a product of linear polynomials over R. Hence A is not
diagonalizable over R.
(b). On the other hand, each of the polynomials is a product of distinct linear factors
over C. Hence A is diagonalizable over C.
 
2. Let

         [ 0  0  1 ]
    A =  [ 0  1  0 ] .
         [ 0  0  1 ]

(a). Find the characteristic polynomial.
(b). Find the minimal polynomial.
(c). Without further computation, determine whether or not A is diagonalizable.
Solution
(a). A simple computation gives the characteristic polynomial χA(λ) = λ(λ − 1)^2.
(b). We extract the minimal polynomial of A from the characteristic polynomial: it
must divide the characteristic polynomial, be monic and of least degree, have the same
linear (irreducible) factors and hence the same roots, and have A as a root. Thus the
minimal polynomial mA(λ) is either f(λ) = λ(λ − 1)^2 or g(λ) = λ(λ − 1). We only check
g, since f, being the characteristic polynomial, satisfies all the above conditions.

    A(A − I) = [ 0  0  1 ] [ −1  0  1 ]   [ 0  0  0 ]
               [ 0  1  0 ] [  0  0  0 ] = [ 0  0  0 ] .
               [ 0  0  1 ] [  0  0  0 ]   [ 0  0  0 ]

Thus g(λ) = mA(λ) = λ(λ − 1) is the minimal polynomial of A.
(c). Since the minimal polynomial of A is a product of distinct linear (irreducible)
factors, A is diagonalizable.
 
3. Let

         [ 2  1  0   0   0 ]
         [ 0  2  0   0   0 ]
    A =  [ 0  0  2   0   0 ] .
         [ 0  0  0  −1   1 ]
         [ 0  0  0   0  −1 ]

(a). Find χA(λ) and mA(λ).
(b). Is A similar to a diagonal matrix? Give a reason for your answer.
Solution
(a). By inspection χA(λ) = (λ + 1)^2 (λ − 2)^3. Note that A is block diagonal with
diagonal blocks

    A1 = [ 2  1 ] ,   A2 = [2] = 2,   A3 = [ −1  1 ]
         [ 0  2 ]                          [  0 −1 ]

with minimal polynomials mA1(λ) = (λ − 2)^2, mA2(λ) = (λ − 2) and mA3(λ) = (λ + 1)^2.
Therefore mA(λ) = LCM{mA1(λ), mA2(λ), mA3(λ)} = LCM{(λ − 2)^2, λ − 2, (λ + 1)^2} =
(λ + 1)^2 (λ − 2)^2.
(b). A is not similar to a diagonal matrix because the minimal polynomial of A is not
a product of distinct linear factors.

Note that the minimal polynomial is unique and does not depend on how we choose our
diagonal blocks. In Problem 3, we could have made several other choices of diagonal blocks.

3.6 Exercises
 
1. Let

         [ −1  −2  −2 ]
    A =  [  1   2   1 ] .
         [ −1  −1   0 ]

(a). Find the characteristic polynomial of A.
(b). Find the minimal polynomial of A.

2. Let T : P2 −→ P2 be defined by T(p(x)) = (1 − x^2)p′′(x) − xp′(x) + 2p(x), where P2
denotes the vector space of polynomials of degree less than or equal to 2, and p′(x) and
p′′(x) denote the first and second derivatives of p(x), respectively.
(a). Find a matrix representation A for T.
(b). Find the characteristic polynomial and minimal polynomial of A.
(c). Is A diagonalizable? Give a reason for your answer.
 
3. Let

         [ 1  1  1  1 ]
    B =  [ 1  1  1  1 ] .
         [ 1  1  1  1 ]
         [ 1  1  1  1 ]

(a). Find the characteristic polynomial of B.
(b). Find the minimal polynomial of B.
(c). Is B diagonalizable? Give a reason for your answer.
 
4. Let

         [ 1   2   0  0 ]
    M =  [ 0  −1   0  0 ] .
         [ 0   0   1  0 ]
         [ 0   0  −1  2 ]

(a). Find the characteristic polynomial of M.
(b). Find the minimal polynomial of M.
(c). Is M diagonalizable? Give a reason for your answer.

 
5. Let

         [ 2  0  1  0  0 ]
         [ 0  2  3  0  0 ]
    N =  [ 0  0  1  0  0 ] .
         [ 0  0  0  1  1 ]
         [ 0  0  0  1  1 ]

(a). Find the characteristic polynomial of N.
(b). Find the minimal polynomial of N.
(c). Is N diagonalizable? Give a reason for your answer.

6. (a). Give an example of a 3 × 3 complex matrix whose minimal polynomial equals λ^2.
(b). Give an example of a 4 × 4 complex matrix whose minimal polynomial equals
λ(λ − 1)^2.
   
7. Let

         [ 1  1  1 ]          [ 1  1  0 ]
    A =  [ 0  1  1 ]  and B = [ 0  1  1 ] .
         [ 0  0  1 ]          [ 0  0  1 ]

(a). Find the characteristic polynomials of A and B.
(b). Find the minimal polynomials of A and B.
(c). Are A and B diagonalizable? Give a reason for your answer.
(d). Use your results to determine whether or not A and B are similar.
 
8. Let

         [ 3  1  0  0  0 ]
         [ 0  3  0  0  0 ]
    A =  [ 0  0  3  1  0 ] .
         [ 0  0  0  3  0 ]
         [ 0  0  0  0  3 ]

(a). Find χA(λ) and mA(λ).
(b). Is A similar to a diagonal matrix? Give a reason for your answer.

9. Explain why similar matrices have the same minimal and characteristic polynomials.

Chapter 4

LINEAR FUNCTIONALS

4.1 Introduction

In this chapter we will study linear mappings from a vector space V into its field of
scalars K. Here, we view K as a vector space over itself.

Objectives
At the end of this lecture, you should be able to:

• Define a linear functional.

• Construct examples of common functionals.

• Find dual basis of a given basis.

Definition 4.1 Let V be a vector space. A linear functional is a linear mapping


T : V −→ K, where K is the scalar field for V .

Remark 4.1

In Definition 4.1 since T is linear, we have T (v 1 + v 2 ) = T v 1 + T v 2 and T (αv) = αT v


for every v 1 , v 2 , v ∈ V and α ∈ K. Equivalently, T (αu + βv) = αT u + βT v, for every
u, v ∈ V and α, β ∈ K.

The following are some examples of linear functionals:

1. Let T : R^n −→ R be defined by Tx = Σ_{i=1}^n x_i = x1 + x2 + ... + xn, where
x = (x1, x2, ..., xn). Then T is a linear functional on V = R^n.

2. Let T : C([0, 1]) −→ R be defined by Tx = ∫_0^1 x(t)e^{−t} dt, where V = C([0, 1])
denotes the vector space of continuous functions on the closed interval [0, 1]. Then T is
a linear functional on V.

3. Let T : C([0, 1]) −→ R be defined by T x = x(t0 ), where V = C([0, 1]) denotes the
vector space of continuous functions on the closed interval [0, 1] and t0 is a fixed number
in [0, 1]. This is called the evaluation map. It is easy to check that T is a linear
functional on V = C([0, 1]).

4. Let πi : R^n −→ R be the ith projection mapping πi(x1, x2, ..., xn) = xi. Then πi is a
linear functional on R^n.

5. Let V = Mn be the vector space of n × n matrices over a scalar field K. Let


T : V −→ K be the trace mapping T (A) = a11 + a22 + ... + ann , where A = [aij ].
T assigns to a matrix A ∈ V the sum of its diagonal elements. Then T is a linear
functional.

4.2 Dual Space

The set of linear functionals on a vector space V over a field K is also a vector space
over K, with addition and scalar multiplication defined by

(ϕ + ψ)(v) = ϕ(v) + ψ(v)

and
(αϕ)(v) = αϕ(v),

where ϕ and ψ are linear functionals on V and α ∈ K. This space is called the dual
space of V , and is denoted by V ∗ .

4.3 Dual Basis

Suppose V is a vector space of dimension n over K. The dual space V ∗ has dimension
n (since K is of dimension 1 over itself). In fact, each basis of V determines a basis for
V ∗.

4.3.1 Determining Dual Basis given a Basis for a Vector Space

Theorem 4.1 Suppose B = {v1, v2, ..., vn} is a basis of a vector space V over a scalar
field K. Let ϕ1, ϕ2, ..., ϕn ∈ V* be the linear functionals defined by

    ϕi(vj) = δij = { 1, if i = j
                   { 0, if i ≠ j

Then B* = {ϕ1, ϕ2, ..., ϕn} is a basis of V*.

The basis B* is termed the basis dual to B, or the dual basis. The above formula
is a short way of writing

    ϕ1(v1) = 1, ϕ1(v2) = 0, ϕ1(v3) = 0, ..., ϕ1(vn) = 0
    ϕ2(v1) = 0, ϕ2(v2) = 1, ϕ2(v3) = 0, ..., ϕ2(vn) = 0
    ...
    ϕn(v1) = 0, ϕn(v2) = 0, ϕn(v3) = 0, ..., ϕn(vn) = 1.

These linear mappings ϕi are unique and well-defined.

Example 4.1 Consider the basis B = {v1 = (2, 1), v2 = (3, 1)} of R^2. Find the dual
basis B* = {ϕ1, ϕ2}.

Solution. We seek linear functionals ϕ1(x, y) = ax + by and ϕ2(x, y) = cx + dy such that

    ϕ1(v1) = 1, ϕ1(v2) = 0
    ϕ2(v1) = 0, ϕ2(v2) = 1.

These four conditions lead to the following two systems of linear equations:

    ϕ1(v1) = ϕ1(2, 1) = 2a + b = 1,
    ϕ1(v2) = ϕ1(3, 1) = 3a + b = 0

and

    ϕ2(v1) = ϕ2(2, 1) = 2c + d = 0,
    ϕ2(v2) = ϕ2(3, 1) = 3c + d = 1,

with solutions a = −1, b = 3 and c = 1, d = −2. Hence ϕ1(x, y) = −x + 3y and
ϕ2(x, y) = x − 2y form the dual basis. Therefore B* = {−x + 3y, x − 2y}.
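For R^n, the dual-basis computation above reduces to a single matrix inversion. The
following NumPy sketch (an illustration of mine, not from the notes) stacks the basis
vectors as the columns of a matrix M; the rows of M^{-1} are then the coefficient
vectors of the dual functionals, since (M^{-1}M)_{ij} = δ_{ij}.

```python
import numpy as np

M = np.column_stack([(2, 1), (3, 1)])   # columns are v1, v2 of Example 4.1
print(np.linalg.inv(M))
# [[-1.  3.]     -> phi1(x, y) = -x + 3y
#  [ 1. -2.]]    -> phi2(x, y) =  x - 2y
```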

4.4 Solved Exercises

1. Find the basis {ϕ1, ϕ2, ϕ3} that is dual to the basis

    { v1 = (1, −1, 3), v2 = (0, 1, −1), v3 = (0, 3, −2) }

of R^3.

Solution. We seek linear functionals ϕ1(x, y, z) = a1x + a2y + a3z, ϕ2(x, y, z) = b1x +
b2y + b3z and ϕ3(x, y, z) = c1x + c2y + c3z such that

    ϕi(vj) = δij = { 1, if i = j
                   { 0, if i ≠ j

We find ϕ1 by setting ϕ1 (v1 ) = 1, ϕ1 (v2 ) = 0, ϕ1 (v3 ) = 0. This yields

ϕ1 (1, −1, 3) = a1 − a2 + 3a3 = 1, ϕ1 (0, 1, −1) = a2 − a3 = 0, ϕ1 (0, 3, −2) = 3a2 − 2a3 = 0.

Solving the system of equations yields a1 = 1, a2 = 0, a3 = 0. Thus

ϕ1 (x, y, z) = x.

We find ϕ2 by setting ϕ2 (v1 ) = 0, ϕ2 (v2 ) = 1, ϕ2 (v3 ) = 0. This yields

ϕ2 (1, −1, 3) = b1 − b2 + 3b3 = 0, ϕ2 (0, 1, −1) = b2 − b3 = 1, ϕ2 (0, 3, −2) = 3b2 − 2b3 = 0.

Solving the system of equations gives b1 = 7, b2 = −2, b3 = −3. Thus

ϕ2 (x, y, z) = 7x − 2y − 3z.

We find ϕ3 by setting ϕ3(v1) = 0, ϕ3(v2) = 0, ϕ3(v3) = 1. This yields

ϕ3(1, −1, 3) = c1 − c2 + 3c3 = 0, ϕ3(0, 1, −1) = c2 − c3 = 0, ϕ3(0, 3, −2) = 3c2 − 2c3 = 1.

Solving the system of equations gives c1 = −2, c2 = 1, c3 = 1. Thus

ϕ3 (x, y, z) = −2x + y + z.

2. Let V = P1 = {a + bt : a, b ∈ R}, the vector space of real polynomials of degree
≤ 1. Find the basis {v1, v2} of V that is dual to the basis {ϕ1, ϕ2} of V* defined by
ϕ1(f(t)) = ∫_0^1 f(t) dt and ϕ2(f(t)) = ∫_0^2 f(t) dt.

Solution. Let v1 = a + bt and v2 = c + dt. By definition of the dual basis,

    ϕ1(v1) = 1, ϕ1(v2) = 0
and
    ϕ2(v1) = 0, ϕ2(v2) = 1.

Thus
    ϕ1(v1) = ∫_0^1 (a + bt) dt = a + (1/2)b = 1,
    ϕ2(v1) = ∫_0^2 (a + bt) dt = 2a + 2b = 0
and
    ϕ1(v2) = ∫_0^1 (c + dt) dt = c + (1/2)d = 0,
    ϕ2(v2) = ∫_0^2 (c + dt) dt = 2c + 2d = 1.

Solving each system yields a = 2, b = −2 and c = −1/2, d = 1. Thus {v1 = 2 − 2t,
v2 = −1/2 + t} is the basis of V that is dual to {ϕ1, ϕ2}.
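The answer can be checked symbolically. A small SymPy sketch (mine, not part of the
notes) confirms that v1 = 2 − 2t and v2 = −1/2 + t satisfy ϕ_i(v_j) = δ_{ij}:

```python
import sympy as sp

t = sp.symbols('t')
v1, v2 = 2 - 2*t, sp.Rational(-1, 2) + t
phi1 = lambda f: sp.integrate(f, (t, 0, 1))   # phi1(f) = integral of f over [0, 1]
phi2 = lambda f: sp.integrate(f, (t, 0, 2))   # phi2(f) = integral of f over [0, 2]

print(phi1(v1), phi1(v2))   # 1 0
print(phi2(v1), phi2(v2))   # 0 1
```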

4.5 Exercises
1. Find the dual basis of the basis {(1, 0, 0), (0, 1, 0), (0, 0, 1)} of R^3.

2. Let V = P2, the vector space of polynomials over R of degree ≤ 2. Let ϕ1, ϕ2, ϕ3 be
the linear functionals on V defined by

    ϕ1(f(t)) = ∫_0^1 f(t) dt,   ϕ2(f(t)) = f′(1),   ϕ3(f(t)) = f(0).

Here f(t) = a + bt + ct^2 ∈ V and f′(t) denotes the derivative of f(t). Find the basis
{f1(t), f2(t), f3(t)} of V that is dual to {ϕ1, ϕ2, ϕ3}.

3. Prove that the mapping f : R^3 −→ R defined by f(x, y, z) = 2x − 5y + z is a linear
functional on R^3.

4. Find the dual basis of the basis {(1, −2, 3), (1, −1, 1), (2, −4, 7)}.

Chapter 5

BILINEAR AND QUADRATIC FORMS

5.1 Introduction

In this chapter we generalize the notions of linear mappings and linear functionals. We
introduce the notion of a bilinear form, which gives rise to a quadratic form. Quadratic
forms occur frequently in applications of linear algebra to engineering (in design crite-
ria and optimization) and signal processing (as output noise power). They also arise
in physics (as potential and kinetic energy), differential geometry (as normal curvature
of surfaces), economics (as utility functions), and statistics (in confidence ellipsoids).
Some of the mathematical background for such applications flows easily from our work
on symmetric matrices.

Objectives
At the end of this lecture, you should be able to:

• Distinguish between a linear form, a bilinear form and a quadratic form.

• Give the properties of a quadratic form.

• Determine the symmetric matrix associated with a quadratic form and vice
versa.

• Classify any quadratic form.

• Diagonalize a quadratic form.

5.2 Bilinear Forms

Definition 5.1 Let V be a vector space of finite dimension over a field K. A bilin-
ear form on V is a mapping f : V × V −→ K such that, for all α, β ∈ K, and all
u1 , u2 , v 1 , v 2 , u, v ∈ V the following two conditions are satisfied:

(1). f (αu1 + βu2 , v) = αf (u1 , v) + βf (u2 , v), i.e. f is ”linear in the first
variable”.

(2). f (u, αv 1 + βv 2 ) = αf (u, v 1 ) + βf (u, v 2 ), i.e. f is ”linear in the second


variable”.

Example 5.1 Let f be the dot product on R^n. For u = (u1, u2, ..., un) and v = (v1, v2, ..., vn),
we have
    f(u, v) = u · v = u1v1 + u2v2 + ... + unvn.
Clearly, f is a bilinear form.

5.2.1 Bilinear Forms and Matrices

Let f be a bilinear form on V and let B = {u1, u2, ..., un} be a basis of V. Suppose
u, v ∈ V with u = a1u1 + a2u2 + ... + anun and v = b1u1 + b2u2 + ... + bnun. Then

    f(u, v) = f(a1u1 + ... + anun, b1u1 + ... + bnun) = Σ_{i,j} a_i b_j f(u_i, u_j).

The bilinear form f is completely determined by the n^2 values f(ui, uj). The matrix
A = [aij] where aij = f(ui, uj) is called the matrix representation of f relative to the
basis B or, simply, the "matrix representation of f in B". In this notation,

    f(u, v) = Σ_{i,j} a_i b_j f(u_i, u_j) = [u]_B^t A [v]_B,

where [u]_B denotes the co-ordinate (column) vector of u in the basis B.
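In coordinates this is easy to experiment with. The NumPy sketch below (my own example
with an arbitrary matrix A, not one taken from the notes) defines f(u, v) = u^t A v and
spot-checks bilinearity in each argument:

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 3.0]])        # matrix representation of f
f = lambda u, v: u @ A @ v                    # f(u, v) = u^t A v

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
w = np.array([0.5, 4.0])

print(np.isclose(f(2*u + w, v), 2*f(u, v) + f(w, v)))   # True: linear in first slot
print(np.isclose(f(u, 2*v + w), 2*f(u, v) + f(u, w)))   # True: linear in second slot
```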

Definition 5.2 Let f be a bilinear form on V. Then f is called
(i). alternating if f(v, v) = 0 for every v ∈ V;
(ii). skew-symmetric if f(u, v) = −f(v, u) for every u, v ∈ V.

Theorem 5.1 Let f be a bilinear form. Then f is alternating if and only if it is skew-
symmetric.

Proof. Suppose f is an alternating bilinear form. Then for any u, v ∈ V ,

0 = f (u + v, u + v) = f (u, u) + f (u, v) + f (v, u) + f (v, v) = f (u, v) + f (v, u).

This proves that f (u, v) = −f (v, u).


Conversely, suppose f is skew-symmetric. We show it is alternating. Since for every
v ∈ V we have f (v, v) = −f (v, v), then 2f (v, v) = 0, which implies that f (v, v) = 0. 

5.2.2 Symmetric Bilinear Forms, Quadratic Forms

We now investigate the notions of symmetric bilinear forms and quadratic forms and
their representation by means of symmetric matrices.

Definition 5.3 Let f be a bilinear form on V . Then f is said to be symmetric if, for
every u, v ∈ V , f (u, v) = f (v, u).

Theorem 5.2 f is a symmetric bilinear form if and only if any matrix representation
A of f is a symmetric matrix.

5.2.3 Quadratic Forms

Quadratic forms are heavily used in calculus to check the second order conditions in
optimization problems. They have a particular use in econometrics, as well.

Definition 5.4 Let V be a vector space and K a scalar field. A mapping q : V −→ K


is a quadratic form if q(v) = f (v, v) for some symmetric bilinear form f on V .

Suppose f is represented by a symmetric matrix A = [aij]. Letting X = [xi] denote a
column vector of variables, the quadratic form q can be represented in the form

    q(X) = f(X, X) = X^t A X = Σ_{i,j} a_ij x_i x_j = Σ_i a_ii x_i^2 + 2 Σ_{i<j} a_ij x_i x_j.

Definition 5.5 [Alternative definition of a quadratic form] A quadratic form
q in variables x1, x2, ..., xn is a polynomial such that every term has degree two; that is,

    q(x1, x2, ..., xn) = Σ_i c_i x_i^2 + Σ_{i<j} d_ij x_i x_j,

with associated symmetric matrix A = [aij], where a_ii = c_i and a_ij = a_ji = (1/2)d_ij
for i < j.

Remark. These two definitions are equivalent.


If the matrix representation A of q is diagonal, then q has the diagonal representation

    q(X) = X^t A X = a11 x1^2 + a22 x2^2 + ... + ann xn^2.

This means that the quadratic polynomial representing q has no "cross-product" terms.
Indeed, every quadratic form has such a representation.

Remark 5.1

A quadratic form on R^n is a function q defined on R^n whose value at a vector X ∈ R^n
can be computed by an expression of the form q(X) = X^t A X, where A is an
n × n symmetric matrix. The matrix A is called the matrix of (or associated with) the
quadratic form. The simplest quadratic form is q(X) = X^t I X = X^t X = ∥X∥^2.

5.2.4 Classification of Real Symmetric Bilinear Forms

Definition 5.6 A real symmetric bilinear form f on V is said to be


(i). positive definite if and only if q(v) = f (v, v) > 0 for every v ̸= 0, v ∈ V .
(ii). positive(or positive semidefinite or nonnegative semidefinite) if and only
if q(v) = f (v, v) ≥ 0 for all v ∈ V .
(iii). negative definite on V if and only if q(v) = f (v, v) < 0 for all v ̸= 0, v ∈ V
(iv). negative on V if and only if q(v) = f (v, v) ≤ 0, for all v ∈ V .
(v). indefinite on V if and only if q(v 1 ) = f (v 1 , v 1 ) > 0 > q(v 2 ) = f (v 2 , v 2 ), for some
v1, v2 ∈ V .

Example 5.2 Let f be the dot product on R^n. Clearly f is a symmetric bilinear form on
R^n. Note that f is positive definite: for any u = (u1, u2, ..., un) ≠ 0 in R^n,

    f(u, u) = u1^2 + ... + un^2 > 0.

Example 5.3 Let u = (x1, x2, x3) and v = (y1, y2, y3). Express f in matrix notation,
where f(u, v) = 3x1y1 − 2x1y3 + 5x2y1 + 7x2y2 − 8x2y3 + 4x3y2 − 6x3y3.

Solution. Let A = [aij], where aij is the coefficient of xi yj. Then

                                    [ 3  0  −2 ] [ y1 ]
    f(u, v) = X^t A Y = [x1 x2 x3]  [ 5  7  −8 ] [ y2 ] .
                                    [ 0  4  −6 ] [ y3 ]

Example 5.4 Find the symmetric matrix that corresponds to each of the following
quadratic forms:
(a). q(x, y, z) = 3x^2 + 4xy − y^2 + 8xz
(b). q(x, y, z) = 2x^2 − 5y^2 − 7z^2

Solution
(a). Let X = (x, y, z)^t. Then

                                 [ 3   2  4 ] [ x ]
    q(x, y, z) = X^t A X = [x y z] [ 2  −1  0 ] [ y ] .
                                 [ 4   0  0 ] [ z ]

Thus
         [ 3   2  4 ]
    A =  [ 2  −1  0 ] .
         [ 4   0  0 ]

(b). Proceeding as in (a), it is clear that

         [ 2   0   0 ]
    A =  [ 0  −5   0 ] .
         [ 0   0  −7 ]

Example 5.5 Find the quadratic form q(X) that corresponds to the symmetric matrix

    A = [  5  −3 ] .
        [ −3   8 ]

Solution
Let X = (x, y)^t. Then

    q(x, y) = X^t A X = 5x^2 − 6xy + 8y^2.

5.3 Testing for Positive Definiteness

There are several ways to determine whether or not a real quadratic form is positive
definite.

Theorem 5.3 (Sylvester's Test) Let q be a quadratic form on a vector space V with
symmetric matrix A in some basis. Then q is positive definite if and only if all the
leading principal minors of A are strictly positive; that is, det(A^(k)) > 0 for
k = 1, 2, ..., n.

Recall: Principal minors are the determinants of the leading principal submatrices A^(k)
of A. Suppose

         [ a11  a12  ...  a1n ]
         [ a21  a22  ...  a2n ]
    A =  [  :    :         :  ] .
         [ an1  an2  ...  ann ]

Then A^(1) = [a11],  A^(2) = [ a11  a12 ],  ...,  A^(n) = A.
                             [ a21  a22 ]

Note that Sylvester’s test is named after the famous English and American mathemati-
cian James Joseph Sylvester (1814-1897).

Theorem 5.4 (Principal Axis Theorem). A quadratic form q is positive definite
if and only if all the eigenvalues of A, the symmetric matrix representation of q in some
basis of V are strictly positive.

Theorem 5.5 (Diagonalization Theorem). A quadratic form q on V with sym-


metric matrix representation A in some basis of V is positive definite if all the diagonal
entries of the diagonal representation D obtained from A of q are strictly positive.

Note that the diagonal entries of D are the eigenvalues of A.


 
Example. For which values of α will the matrix

         [ 1  1  0 ]
    A =  [ 1  2  α ]
         [ 0  α  1 ]

be positive definite?
Solution. The principal minors of A are det(A^(1)) = 1 > 0, det(A^(2)) = 2 − 1 = 1 > 0,
and det(A^(3)) = det(A) = 1 − α^2, which is > 0 if −1 < α < 1. Thus A is positive
definite iff −1 < α < 1, i.e. |α| < 1.
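Sylvester's test is straightforward to implement. The helper below is a NumPy sketch of
mine (not from the notes); it checks the leading principal minors and reproduces the
conclusion of the example for one value of α inside and one outside the interval (−1, 1):

```python
import numpy as np

def is_positive_definite(A):
    # Sylvester's test: all leading principal minors strictly positive.
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

for alpha in (0.5, 1.5):
    A = np.array([[1.0, 1.0, 0.0],
                  [1.0, 2.0, alpha],
                  [0.0, alpha, 1.0]])
    print(alpha, is_positive_definite(A))   # 0.5 True, 1.5 False
```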

5.3.1 Change of Variable in a Quadratic Form

If X represents a variable vector in Rn , then a change of variable is an equation of the


form
X = Pu (5.1)

or equivalently, u = P −1 X, where P is an invertible matrix and u is a new variable


vector in Rn . Here u is the coordinate vector of X relative to the basis of Rn determined
by the columns of P .
If the change of variable (5.1) is made in a quadratic form q(X) = X^t A X, then

    q(X) = X^t A X = (Pu)^t A (Pu) = u^t P^t A P u = u^t (P^t A P) u

and the new matrix of the quadratic form is P t AP . If P orthogonally diagonalizes A,


then P t = P −1 and P t AP = P −1 AP = D. Thus the matrix of the new quadratic form
is diagonal.

Example 5.6 Make a change of variable that transforms the quadratic form

    q(x, y) = x^2 − 8xy − 5y^2
into a quadratic form with no cross-product term.
Solution The matrix of the quadratic form is

    A = [  1  −4 ] .
        [ −4  −5 ]

The first step is to orthogonally diagonalize A. Its eigenvalues turn out to be λ = 3
and λ = −7. The corresponding unit (normalized) eigenvectors are

    λ = 3 : v1 = (2/√5, −1/√5)^t ;   λ = −7 : v2 = (1/√5, 2/√5)^t.

These vectors are orthogonal and so provide an orthonormal basis for R^2. Let

    P = [v1 v2] = [  2/√5  1/√5 ] ,   D = [ 3   0 ] .
                  [ −1/√5  2/√5 ]         [ 0  −7 ]

Then A = PDP^{-1} and D = P^{-1}AP = P^t AP. A suitable change of variable is X = Pu,
where X = (x, y)^t and u = (u1, u2)^t. Then

    q(x, y) = x^2 − 8xy − 5y^2 = X^t A X = (Pu)^t A (Pu) = u^t P^t A P u = u^t D u
            = 3u1^2 − 7u2^2.
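The diagonalization in Example 5.6 can be reproduced numerically. In the NumPy sketch
below (not part of the notes), eigh returns an orthogonal matrix P of eigenvectors, and
P^t A P is diagonal, so the new form has no cross-product term (note that eigh lists
eigenvalues in ascending order, here −7 before 3):

```python
import numpy as np

A = np.array([[1.0, -4.0], [-4.0, -5.0]])   # matrix of q(x, y) = x^2 - 8xy - 5y^2
eigvals, P = np.linalg.eigh(A)              # columns of P are orthonormal eigenvectors

print(eigvals)                              # [-7.  3.]
print(np.round(P.T @ A @ P, 10))            # diag(-7, 3): q = -7u1^2 + 3u2^2
```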

Theorem 5.6 (Principal Axes Theorem) Let A be an n × n symmetric matrix. Then
there is an orthogonal change of variable X = Pu that transforms the quadratic form
q(X) = X^t A X into a quadratic form u^t D u with no cross-product term.

Remark 5.2

The columns of P in Theorem 5.6 are called the principal axes of the quadratic form
q(X) = X^t A X. The vector u is the coordinate vector of X relative to the orthonormal
basis of Rn given by these principal axes. Quadratic forms are important in statistics in
the form of the covariance matrix. Here the change of basis to diagonalize the matrix
leads to a decomposition of a multivariate distribution as a product of independent
distributions in the direction of the new basis vectors.

5.3.2 Geometric View of Principal Axes
Suppose q(X) = X^t A X, where A is an invertible 2 × 2 symmetric matrix, and let c be
a constant. It can be shown that the set of all X in R^2 that satisfy

    q(X) = X^t A X = c

either corresponds to an ellipse(or circle), a hyperbola, two intersecting lines, a single


point, or contains no points at all. If A is a diagonal matrix, the graph is said to be in
standard position.

Figure 5.1: Ellipse and hyperbola in standard position

Figure 5.2: Ellipse and hyperbola not in standard position

We now present another method we can use to classify quadratic forms.

Theorem 5.7 (Quadratic forms and Eigenvalues) Let A be an n × n symmetric
matrix. A quadratic form q(X) = X^t A X is

(i). positive definite if and only if the eigenvalues of A are all positive, i.e λi > 0 for
all i.
(ii). positive(or positive semidefinite or nonnegative semidefinite) if and only
if the eigenvalues of A are non-negative, i.e λi ≥ 0 for all i.
(iii). negative definite on V if and only if the eigenvalues of A are negative, i.e.
λi < 0 for all i.
(iv). negative on V if and only if the eigenvalues of A are non-positive, i.e. λi ≤ 0 for
all i.
(v). indefinite on V if and only if A has both positive and negative eigenvalues.

Proof. By the Principal Axes Theorem, there exists an orthogonal change of variable
X = Pu such that

    q(X) = X^t A X = u^t D u = λ1 u1^2 + λ2 u2^2 + ... + λn un^2        (5.2)

where λ1 , ..., λn are the eigenvalues of A. Since P is invertible, there exists a one-to-
one correspondence between all nonzero X and all nonzero u. Thus the values of q(X)
for X ̸= 0 coincide with the values of the expression on the right hand side of (5.2),
which is obviously controlled by the signs of the eigenvalues λ1 , ..., λn in all the five cases
described in the theorem.

Example 5.7 Is the quadratic form

    q(x1, x2, x3) = 3x1^2 + 2x2^2 + x3^2 + 4x1x2 + 4x2x3

positive definite? Give a reason for your answer.

Solution Here

         [ 3  2  0 ]
    A =  [ 2  2  2 ]
         [ 0  2  1 ]

and the eigenvalues of A are 5, 2 and −1. Since A has both positive and negative
eigenvalues, q is an indefinite quadratic form, and hence not positive definite.

5.4 Constrained Optimization

Engineers, economists, scientists and mathematicians often need to find the maximum
or minimum value of a quadratic form q(X) for X in some specified set. Typically, the
problem can be arranged so that X varies over the set of unit vectors. This is called a
constrained optimization problem.
When a quadratic form q(X) has no cross-product terms, it is easy to find the maximum
and minimum of q(X) subject to X^t X = 1.

Example 5.8 Find the maximum and minimum values of q(x1, x2, x3) = 9x1^2 + 4x2^2 + 3x3^2
subject to the constraint x1^2 + x2^2 + x3^2 = 1.

Solution Since x2^2 and x3^2 are non-negative, note that 4x2^2 ≤ 9x2^2 and 3x3^2 ≤ 9x3^2, and
hence

    q(x1, x2, x3) = 9x1^2 + 4x2^2 + 3x3^2
                  ≤ 9x1^2 + 9x2^2 + 9x3^2
                  = 9(x1^2 + x2^2 + x3^2)
                  = 9

whenever x1^2 + x2^2 + x3^2 = 1. So the maximum value of q(X) cannot exceed 9 when X
is a unit vector. Furthermore q(X) = 9 when X = (1, 0, 0)^t. Thus 9 is the maximum
value of q(X) for X^t X = 1.
To find the minimum value of q(X), observe that

    9x1^2 ≥ 3x1^2,   4x2^2 ≥ 3x2^2,

and hence

    q(X) ≥ 3x1^2 + 3x2^2 + 3x3^2 = 3(x1^2 + x2^2 + x3^2) = 3

whenever x1^2 + x2^2 + x3^2 = 1. Also, q(X) = 3 when x1 = 0, x2 = 0, and x3 = 1. So 3 is
the minimum value of q(X) when X^t X = 1.

Remark 5.3

It is easy to see from Example 5.8 that the matrix of the quadratic form has eigenvalues
9, 4, and 3 and the largest and smallest eigenvalues equal, respectively, the (constrained)
maximum and minimum of q(X). The same holds for any quadratic form.
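This eigenvalue description of the constrained maximum and minimum is easy to test
empirically. The NumPy experiment below (a sketch of mine, not from the notes) samples
many random unit vectors for the matrix of Example 5.8 and observes that q stays
between the smallest and largest eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.diag([9.0, 4.0, 3.0])               # matrix of q in Example 5.8

X = rng.normal(size=(3, 10000))
X /= np.linalg.norm(X, axis=0)             # columns are random unit vectors
q = np.einsum('ij,ik,kj->j', X, A, X)      # q(X) = X^t A X, column by column

print(q.min(), q.max())                    # close to 3 and 9
print(np.linalg.eigvalsh(A))               # [3. 4. 9.]
```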

5.5 Solved Exercises


 
1. Let

         [ 1  0  2 ]
    B =  [ 0  α  2 ] .
         [ 2  2  α ]

Find values of α for which B is positive definite.

Solution It is easy to check that α must satisfy α > 2 + 2√2.

2. Find a change of variable that removes the cross-product term in

    5x1^2 − 4x1x2 + 5x2^2 = 48.

Solution Here

    A = [  5  −2 ] .
        [ −2   5 ]

The eigenvalues of A are 3 and 7, with corresponding unit (normalized) eigenvectors

    v1 = (1/√2, 1/√2)^t ,   v2 = (−1/√2, 1/√2)^t .

Let

    P = [v1 v2] = [ 1/√2  −1/√2 ] .
                  [ 1/√2   1/√2 ]

Then P orthogonally diagonalizes A, so the change of variable X = Pu produces the
quadratic form

    q(u1, u2) = u^t D u = 3u1^2 + 7u2^2 .

The new axes for this change of variable are shown in Fig 5.2(a).

5.6 Exercises

1. Suppose ψ : V × V −→ K is a bilinear form.
(a). Prove that Ω(u, v) = (1/2)[ψ(u, v) + ψ(v, u)] is a bilinear form, and that
Ω(u, v) = Ω(v, u).
(b). Prove that Ω(u, v) = (1/2)[ψ(u + v, u + v) − ψ(u, u) − ψ(v, v)].

2. Classify the quadratic form. Then make a change of variable X = Pu that transforms
the quadratic form into one with no cross-product term. Write the new quadratic form.
(a). q(x1, x2) = 3x1^2 − 3x1x2 + 6x2^2
(b). q(x1, x2) = 9x1^2 − 8x1x2 + 3x2^2
(c). q(x1, x2) = x1^2 − 6x1x2 + 9x2^2
(d). q(x1, x2) = 8x1^2 + 6x1x2
(e). q(x1, x2) = x1^2 + 10x1x2 + x2^2

3. Let A be the matrix of the quadratic form

    q(x1, x2, x3) = 9x1^2 + 7x2^2 + 11x3^2 − 8x1x2 + 8x1x3.

It can be shown that the eigenvalues of A are 3, 9, and 15. Find an orthogonal matrix P
such that the change of variable X = Pu transforms X^t A X into a quadratic form with
no cross-product term. Give P and the new quadratic form.

4. Let A = B^t B, where B is an n × n matrix.
(a). Show that A is symmetric and positive semidefinite.
(b). Show that if B is invertible, then A is positive definite.

5. Let A be an n × n invertible matrix. Show that if the quadratic form q(X) = X^t A X
is positive definite, then so is the quadratic form q̂(X) = X^t A^{-1} X.

6. Find all values of α such that the quadratic form

    q(x1, x2, x3) = 2x1x2 − x1^2 − 2x2^2 + 2αx1x3 − x3^2

is positive definite.

Chapter 6

ORTHOGONAL MATRICES AND OPERATORS

6.1 Introduction

In Chapter 2, we introduced the notion of orthogonal matrices and the role they play
in orthogonally diagonalizing real symmetric matrices. In this lecture, we continue the
study on orthogonal matrices, studying more properties and how they define orthogonal
operators in vector spaces.
Recall that a real matrix P is orthogonal if P is nonsingular and P^{-1} = P^t; equivalently,
if PP^t = P^tP = I. A complex matrix U is said to be unitary if it satisfies
U*U = UU* = I, that is, U* = U^{-1}, where T* = (T̄)^t denotes the adjoint of a matrix T.

Objectives
At the end of this lecture, you should be able to:

• Characterize orthogonal matrices and unitary matrices.

• Understand the geometric significance of orthogonal matrices and operators.

6.2 Unitary and Orthogonal Matrices

Definition 6.1 A unitary matrix is an n × n complex matrix U whose columns (or


rows) constitute an orthonormal basis for Cn .

Definition 6.2 An orthogonal matrix is an n × n real matrix P whose columns (or


rows) constitute an orthonormal basis for Rn .

We note that unitary and orthogonal matrices have special features, one of which is the
fact that they have easily computed inverses. We study some properties of orthogo-
nal/unitary matrices.
Example 6.1 The matrix

    A = [  1/2   √3/2 ]
        [ −√3/2  1/2  ]

is orthogonal.

Theorem 6.1 (Orthogonal matrices are invertible) Let Q be an n × n orthogonal


matrix. Then Q is nonsingular and Q−1 = Qt .

To be able to prove this theorem we need the following result.

Lemma 6.2 (One-sided Inverse is sufficient) Suppose A and B are n × n matrices
such that AB = I. Then BA = I.

Proof The matrix I is nonsingular. If B were singular, there would be a nonzero vector
x with Bx = 0, and then x = Ix = (AB)x = A(Bx) = 0, a contradiction. So B must be
nonsingular, and hence there exists an n × n matrix C such that BC = I. Now

    BA = (BA)I = (BA)(BC) = B(AB)C = BIC = BC = I.

Lemma 6.2 says that if AB = I, then B is both a right inverse and a left inverse for A,
so A is invertible and A^{-1} = B.
Proof of Theorem 6.1. By definition we know that Q^t Q = I. If either Q^t or Q
were singular, then this equation would imply that I is singular. This is a
contradiction, since I is nonsingular. So Q and Q^t are both nonsingular. By Lemma 6.2,
QQ^t = I, and hence Q^{-1} = Q^t.
Theorem 6.1 also holds for a unitary matrix: simply replace Q^t with Q*.

Theorem 6.3 (Columns of orthogonal/unitary matrices are orthonormal sets)
Suppose that A is an n × n matrix with columns S = {A1, A2, ..., An}. Then A is an
orthogonal/unitary matrix if and only if S is an orthonormal set.

Proof. The proof revolves around recognizing that a typical entry of the product A*A
is an inner product of columns of A (here A* = (Ā)^t; for a real matrix A* = A^t). To
support this claim note that

    [A*A]_ij = Σ_{k=1}^n [A*]_ik [A]_kj = Σ_{k=1}^n conj([A]_ki) [A]_kj = ⟨A_j, A_i⟩.

We now employ this equality in a chain of equivalences: S = {A1, A2, ..., An} is an
orthonormal set

    ⟺  ⟨A_j, A_i⟩ = 0 for i ≠ j and ⟨A_i, A_i⟩ = 1 for each i
    ⟺  [A*A]_ij = [I_n]_ij for 1 ≤ i ≤ n, 1 ≤ j ≤ n
    ⟺  A*A = I_n
    ⟺  A is an orthogonal/unitary matrix.

Theorem 6.4 (Orthogonal/Unitary matrices preserve Inner Products) Sup-


pose that Q is an n × n orthogonal or unitary matrix and u and v be two vectors in Rn
or Cn . Then ⟨Qu, Qv⟩ = ⟨u, v⟩ and ∥Qv∥ = ∥v∥.

Proof.

    ⟨Qu, Qv⟩ = (Qu)^t (Qv) = u^t Q^t Q v = u^t I_n v = u^t v = ⟨u, v⟩

(for a unitary Q, replace each transpose by the adjoint *). The second conclusion is a
specialization of the first:

    ∥Qv∥^2 = ⟨Qv, Qv⟩ = ⟨v, v⟩ = ∥v∥^2,

and hence ∥Qv∥ = ∥v∥.
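A numerical spot-check of Theorem 6.4 (my own sketch, not part of the notes): a QR
factorization of a random matrix yields an orthogonal Q, which indeed preserves inner
products and norms up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # Q is orthogonal: Q^t Q = I
u, v = rng.normal(size=4), rng.normal(size=4)

print(np.isclose((Q @ u) @ (Q @ v), u @ v))                    # True
print(np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v)))    # True
```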

Theorem 6.5 Let P be a real matrix. Then the following are equivalent:
(a). P is orthogonal
(b). the rows of P form an orthonormal set
(c). the columns of P form an orthonormal set.

Theorem 6.6 If A is orthogonal then det(A) = ±1.

Proof Since A is orthogonal, A^{-1} = A^t. Thus det(A^{-1}) = det(A^t) = det(A). But
det(A^{-1}) = 1/det(A), so 1/det(A) = det(A), and hence [det(A)]^2 = 1. This implies
that det(A) = ±1.
Theorem 6.7 Suppose B = {e_i} and B′ = {e′_i} are orthonormal bases of a vector space
V. Let P be the change-of-basis matrix from B to B′. Then P is orthogonal.

Theorem 6.8 If λ is an eigenvalue of an orthogonal matrix, then |λ| = 1.

Example. Let T : R3 −→ R3 be the linear operator that rotates each vector v about
the z-axis by a fixed angle θ. That is, T (x, y, z) = (x cos θ − y sin θ, x sin θ + y cos θ, z).
It is easy to show that T is orthogonal.

Definition 6.3 Let V be an n-dimensional vector space. Mappings T : V −→ V that
preserve a metric are called isometries.

Note that isometries are called unitary operators when V is complex (i.e. when the
underlying field is C) and orthogonal operators when V is real (i.e. when the underlying
field is R). Thus the isometries on R^n are precisely the orthogonal matrices, and the
isometries on C^n are the unitary matrices. The set U(n) ⊂ M(n, C) of unitary n × n
matrices is a group under matrix multiplication. It is called the unitary group.
The set O(n) ⊂ M(n, R) of orthogonal n × n matrices is a group under matrix
multiplication. It is called the orthogonal group.

Definition 6.4 Two matrices A, B ∈ M(n) are said to be unitarily equivalent if


there exists U ∈ U(n) such that A = U −1 BU and orthogonally equivalent if there
exists C ∈ O(n) such that A = C −1 BC.

Note that orthogonal equivalence implies unitary equivalence but the converse is not
generally true.

Theorem 6.9 The composition of isometries is again an isometry.

Proof Let U, V and W be vector spaces and assume that the mappings P : U −→ V
and Q : V −→ W both are isometries. Then ∥P u∥ = ∥u∥ for all u ∈ U and ∥Qv∥ = ∥v∥
for all v ∈ V . Then
∥QP u∥ = ∥Q(P u)∥ = ∥P u∥ = ∥u∥

for all u ∈ U .

6.3 Some Special Isometries

6.3.1 Special Orthogonal Groups in Two and Three Dimensions

The orthogonal operators in a vector space V with determinant 1 form a group called
the special orthogonal group . This group is usually denoted by SO(V). An element
of this group is called a proper rotation. If V = Rn , this group is denoted by SO(n, R).
This group plays an important role in Euclidean geometry. The operators T in SO(V)
in the 2-dimensional case are the simplest ones. If B = {e1, e2} is an orthonormal basis
of V, then these operators are of the form

    T = [ cos φ  −sin φ ]                                  (6.1)
        [ sin φ   cos φ ]

A matrix of the form (6.1) is called a matrix of two-dimensional rotation, while
the numeric parameter φ is interpreted as the angle of rotation.
Let B = {e1, e2, e3} be an orthonormal basis of V. The operators T in SO(V) in the
3-dimensional case are of the form

         [ cos φ  −sin φ  0 ]
    T =  [ sin φ   cos φ  0 ]                              (6.2)
         [   0       0    1 ]

The operator T associated with matrix (6.2) is called the operator of rotation about
the vector e3 by the angle φ.

Example 6.2 In 2-dimensional Euclidean plane X, with B = {e1 , e2 } as orthonormal


basis, the identity operator I is clearly a proper rotation and so also the isometry −I.
The isometry ϕ defined by
ϕe1 = e2 , ϕe2 = e1

has determinant −1 and is therefore an improper rotation. This is a reflection.

Example 6.3 A vector u ∈ R^3 can be rotated counterclockwise through an angle φ
around a coordinate axis by means of the multiplication P_x u, in which P_x is an
appropriate orthogonal matrix:

          [ 1    0       0   ]
    P_x = [ 0  cos φ  −sin φ ]
          [ 0  sin φ   cos φ ]

Figure 6.1: Rotation around the x-axis

 
          [  cos φ  0  sin φ ]
    P_y = [    0    1    0   ]
          [ −sin φ  0  cos φ ]

Figure 6.2: Rotation around the y-axis

 
          [ cos φ  −sin φ  0 ]
    P_z = [ sin φ   cos φ  0 ]
          [   0       0    1 ]

Figure 6.3: Rotation around the z-axis
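The rotation matrices above are easy to check numerically. The NumPy sketch below (not
part of the notes) verifies for one angle that P_z is orthogonal with determinant +1,
i.e. a proper rotation in SO(3):

```python
import numpy as np

phi = 0.7
c, s = np.cos(phi), np.sin(phi)
Pz = np.array([[c, -s, 0.0],
               [s,  c, 0.0],
               [0.0, 0.0, 1.0]])

print(np.allclose(Pz.T @ Pz, np.eye(3)))    # True: Pz^t Pz = I
print(np.isclose(np.linalg.det(Pz), 1.0))   # True: det = +1, a proper rotation
```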

6.4 Solved Problems

1. Find an orthogonal matrix P whose first row is u1 = (1/3, 2/3, 2/3).

Solution. First find a nonzero vector w2 = (x, y, z) which is orthogonal to u1, that is,
for which

    0 = ⟨u1, w2⟩ = x/3 + 2y/3 + 2z/3,   i.e.  x + 2y + 2z = 0.

One such solution is w2 = (0, 1, −1). Normalize w2 to obtain the second row of P, that
is, u2 = (0, 1/√2, −1/√2). Next find a nonzero vector w3 = (x, y, z) that is orthogonal
to both u1 and u2, i.e. for which

    0 = ⟨u1, w3⟩ = x/3 + 2y/3 + 2z/3,   i.e.  x + 2y + 2z = 0, and
    0 = ⟨u2, w3⟩ = y/√2 − z/√2,         i.e.  y − z = 0.

Set z = −1 and find the solution w3 = (4, −1, −1). Normalize w3 to obtain the third
row of P, that is, u3 = (4/√18, −1/√18, −1/√18). Thus

         [   1/3       2/3        2/3    ]
    P =  [    0       1/√2      −1/√2    ] .
         [ 4/(3√2)  −1/(3√2)  −1/(3√2)   ]
Note that the above matrix P is not unique.

2. Let P and Q be square matrices. Prove each of the following:


(a). P is orthogonal if and only if P t is orthogonal
(b). If P is orthogonal, then P −1 is orthogonal
(c). If P and Q are orthogonal, then P Q is orthogonal.
(d). If P is rotation through θ and Q is rotation through ϕ, what is P Q? Can you find
the trigonometric identities for sin(θ+ϕ) and cos(θ+ϕ) in the matrix multiplication P Q?

Proof. (a). We have (P t )t = P . Thus P is orthogonal iff P P t = I iff (P t )t P t = I iff P t


is orthogonal.
(b). We have P t = P −1 . Since P is orthogonal, by part (a), P −1 is orthogonal.
(c). We have P t = P −1 and Qt = Q−1 . Thus

(P Q)(P Q)t = P QQt P t = P QQ−1 P −1 = I.

Therefore (P Q)t = (P Q)−1 , and so P Q is orthogonal.


(d). is left as an exercise.
( )
1 −1
3. Is the matrix B = orthogonal?
1 1

Solution No! Although the column vectors are orthogonal, they are not normalized.

4. Are the following transformations orthogonal? T : R^3 −→ R^3 and L : R^3 −→ R^3
defined by T(x, y, z) = (x − z, x + y, z) and L(x, y, z) = (z, x, y).

Solution With respect to the standard basis of R^3, T and L have matrix representations

         [ 1  0  −1 ]          [ 0  0  1 ]
    T =  [ 1  1   0 ]  and L = [ 1  0  0 ] .
         [ 0  0   1 ]          [ 0  1  0 ]

Since

            [  2  1  −1 ]
    TT^t =  [  1  2   0 ]  ≠ I,
            [ −1  0   1 ]

T is not orthogonal. On the other hand, LL^t = I, and hence L is orthogonal. You can
check that the rows and columns of L are orthonormal.

6.5 Exercises

1. Let A be an orthogonal matrix. Show that det(A) = ±1. Show that if B is also
orthogonal and det(A) = −det(B), then A + B is singular.

2. Prove that the eigenvalues of a unitary operator are contained in the unit circle
∂D = {z : |z| = 1}.

3. If u is a unit vector, show that Q = I − 2uu^t is an orthogonal matrix (it is a
reflection, known as the Householder transformation). Compute Q when
u^t = (1/2, 1/2, −1/2, −1/2).

4. Find a third column so that the matrix

         [ 1/√3   1/√14    · ]
    Q =  [ 1/√3   2/√14    · ]
         [ 1/√3  −3/√14    · ]

is orthogonal. (Hint: It must be a unit vector that is orthogonal to the other columns.
How much freedom does it leave?) Verify that the rows automatically become orthonor-
mal at the same time.

5. Verify that the following matrices are orthogonal.

     [  1/√2  1/√2 ]
(a). [ −1/√2  1/√2 ]

     [ 0  0  1 ]
(b). [ 1  0  0 ]
     [ 0  1  0 ]

     [ 1/2   −√3/2 ]
(c). [ √3/2   1/2  ]

     [  3/7   6/7  −2/7 ]
(d). [ −2/7   3/7   6/7 ]
     [  6/7  −2/7   3/7 ]

6. Verify that the following matrices are unitary.

(a). (1/2) [ 1 + i  −1 + i ]
           [ 1 + i   1 − i ]

     [ (1+i)/√5   (3+2i)/√55   (2+2i)/√22 ]
(b). [ (1−i)/√5   (2+2i)/√55   (−3+i)/√22 ] .
     [   i/√5     (3−5i)/√55     −2/√22   ]

7. Suppose P and Q are n × n orthogonal matrices and α ∈ R. Prove or give a coun-


terexample for each of the following:
(a). P + Q is orthogonal.
(b). αP is orthogonal.

Bibliography

[1] Roland E. Larson and Bruce H. Edwards, Elementary Linear Algebra, Third
Edition,...

[2] Seymour Lipschutz and Marc Lars Lipson, Schaum’s Outline Series: Theory
and Problems of Linear Algebra, Third Edition, McGraw-Hill, NY, 2001.

[3] Scheick J.T., Linear Algebra with Applications, International Series in Pure and
Applied Mathematics, McGraw-Hill, NY, 1997.

[4] Lee W. Johnson, R. Dean Riess and Jimmy T. Arnold, Introduction to


Linear Algebra, Second Edition, Addison-Wesley, NY, 1989.

[5] George Nakos and David Joyner, Linear Algebra with Applications,
Brooks/Cole, CA, 1998.

