16. More on determinants
In this chapter only, n-vectors will be denoted by lower-case boldface roman letters; for example,
a = (a1 , . . . , an ) ∈ IF^n .
Definition and basic properties
Determinants are often brought into courses such as this quite unnecessarily. But when they
are useful, they are remarkably so. The use of determinants is a bit bewildering to the beginner,
particularly if confronted with the classical definition as a sum of signed products of matrix entries.
I find it more intuitive to follow Weierstrass and begin with a few important properties of
the determinant, from which all else follows, including that classical definition (which is practically
useless anyway).
(As to the many determinant identities available, in the end I have always managed with just
one nontrivial one, viz. Sylvester’s determinant identity, and this is nothing but Gauss elimination;
see the end of this chapter. The only other one I have often used is the Cauchy-Binet formula.)
The determinant is a map,
det : IF^{n×n} → IF : A ↦ det A,
with various properties. The first one in the following list is perhaps the most important one.
(i) det(AB) = det(A) det(B)
(ii) det( id) = 1
Consequently, for any invertible A,
1 = det( id) = det(AA^{-1}) = det(A) det(A^{-1}).
Hence,
(iii) If A is invertible, then det A ≠ 0 and det(A^{-1}) = 1/ det(A).
While the determinant is defined as a map on matrices, it is very useful to think of det(A) =
det[a1 , . . . , an ] as a function of the columns a1 , . . . , an of A. The next two properties are in those
terms:
(iv) x ↦ det[. . . , aj−1 , x, aj+1 , . . .] is linear, i.e., for any n-vectors x and y and any scalar α (and
arbitrary n-vectors ai ),
det[. . . , aj−1 , x + αy, aj+1 , . . .] = det[. . . , aj−1 , x, aj+1 , . . .] + α det[. . . , aj−1 , y, aj+1 , . . .].
(v) The determinant is an alternating form, i.e.,
det[. . . , ai , . . . , aj , . . .] = − det[. . . , aj , . . . , ai , . . .].
In words: Interchanging two columns changes the sign of the determinant (and nothing else).
It can be shown (see below) that (ii) + (iv) + (v) implies (i) (and anything else you may wish
to prove about determinants). Here are some basic consequences first.
(vi) Since 0 is the only scalar α with the property that α = −α, it follows from (v) that det(A) = 0
if two columns of A are the same.
(vii) Adding a multiple of one column of A to another column of A doesn’t change the determinant.
Indeed, using first (iv) and then the consequence (vi) of (v), we compute
det[. . . , ai , . . . , aj + αai , . . .] = det[. . . , ai , . . . , aj , . . .] + α det[. . . , ai , . . . , ai , . . .] = det[. . . , ai , . . . , aj , . . .].
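Property (vii) is easy to verify numerically. Here is a minimal sketch for the 2×2 case, using the familiar formula ad − bc (the helper name det2 is ours, not the text's):

```python
# Numeric check of (vii): adding a multiple of one column to another
# column leaves the determinant unchanged. Uses the 2x2 formula ad - bc.
def det2(a, b):
    """Determinant of the 2x2 matrix with columns a and b."""
    return a[0] * b[1] - a[1] * b[0]

a = (3.0, 1.0)
b = (2.0, 5.0)
alpha = 7.0
# Replace column b by b + alpha*a ...
b_shifted = (b[0] + alpha * a[0], b[1] + alpha * a[1])
# ... and the determinant is unchanged:
assert det2(a, b_shifted) == det2(a, b)
```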
19aug02 c 2002 Carl de Boor
162 16. More on determinants
Here comes a very important use of (vii): Assume that b = Ax and consider det[. . . , aj−1 , b, aj+1 , . . .].
Since b = x1 a1 + · · · + xn an , subtracting xi times column i from column j (i.e., subtracting
xi ai from b here) for each i ≠ j is, by (vii), guaranteed not to change the determinant, yet changes
the jth column to xj aj ; pulling out that scalar factor xj (permitted by (iv)) then leaves us finally
with xj det A. This proves
(viii) If b = Ax, then
det[. . . , aj−1 , b, aj+1 , . . .] = xj det A.
Hence, if det A ≠ 0, then b = Ax implies
xj = det[. . . , aj−1 , b, aj+1 , . . .]/ det(A), j = 1, . . . , n.
This is Cramer’s rule.
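For a 2×2 system, Cramer's rule can be sketched in a few lines (the helper names det2 and cramer2 are ours; each unknown xj is the determinant with column j replaced by b, divided by det A):

```python
# A sketch of Cramer's rule for a 2x2 system Ax = b, with A given by its
# columns a1, a2.
def det2(c1, c2):
    return c1[0] * c2[1] - c1[1] * c2[0]

def cramer2(a1, a2, b):
    d = det2(a1, a2)
    if d == 0:
        raise ValueError("matrix is singular")
    # x1 replaces column 1 by b, x2 replaces column 2 by b:
    return (det2(b, a2) / d, det2(a1, b) / d)

# Solve  1*x1 + 2*x2 = 5,  3*x1 + 4*x2 = 6  (columns a1=(1,3), a2=(2,4)):
x1, x2 = cramer2((1, 3), (2, 4), (5, 6))
assert (x1, x2) == (-4.0, 4.5)
```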
In particular, if det(A) ≠ 0, then Ax = 0 implies that xj = 0 for all j, i.e., then A is 1-1, hence
invertible (since A is square). This gives the converse to (iii), i.e.,
(ix) If det(A) ≠ 0, then A is invertible.
In old-fashioned mathematics, a matrix was called singular if its determinant was 0. So, (iii)
and (ix) combined say that a matrix is nonsingular iff it is invertible.
The suggestion that one actually construct the solution to A? = y by Cramer’s rule is ridiculous
under ordinary circumstances since, even for a linear system with just two unknowns, it is more
efficient to use Gauss elimination. On the other hand, if the solution is to be constructed symbolically
(in a symbol-manipulating system such as Maple or Mathematica), then Cramer’s rule is preferred
to Gauss elimination since it treats all unknowns equally. In particular, the number of operations
needed to obtain a particular unknown is the same for all unknowns.
We have proved all these facts (except (i)) about determinants from certain postulates (namely
(ii), (iv), (v)) without ever saying how to compute det(A). Now, it is the actual formulas for det(A)
that have given determinants such a bad name. Here is the standard one, which (see below) can
be derived from (ii), (iv), (v), in the process of proving (i):
(x) If A = (aij : i, j = 1, . . . , n), then
det(A) = Σ_{σ∈S_n} (−1)^σ ∏_{j=1}^{n} aσ(j),j .
Here, σ ∈ S_n is shorthand for: σ is a permutation of the first n integers, i.e.,
σ = (σ(1), σ(2), . . . , σ(n)),
where σ(j) ∈ {1, 2, . . . , n} for all j, and σ(i) ≠ σ(j) if i ≠ j. In other words, σ is a 1-1 and onto
map from {1, . . . , n} to {1, . . . , n}. This is bad enough, but I still have to explain the mysterious
(−1)^σ . This number is 1 or −1 depending on whether the parity of σ is even or odd. Now, this
parity can be determined in at least two equivalent ways:
(a) keep making interchanges until you end up with the sequence (1, 2, . . . , n); the parity of
the number of steps it took is the parity of σ (note the implied assertion that this parity will not
depend on how you went about this, i.e., the number of steps taken may differ, but the parity never
will; if it takes me an even number of steps, it will take you an even number of steps.)
(b) count the number of pairs that are out of order; its parity is the parity of σ.
Here is a simple example: σ = (3, 1, 4, 2) has the pairs (3, 1), (3, 2), and (4, 2) out of order, hence
(−1)^σ = −1. Equivalently, the following sequence of 3 interchanges gets me from σ to (1, 2, 3, 4):
(3, 1, 4, 2)
(3, 1, 2, 4)
(1, 3, 2, 4)
(1, 2, 3, 4)
Therefore, again, (−1)^σ = −1.
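The inversion count of method (b) is easy to code; here is a minimal sketch (the function name sign is ours) that reproduces the worked example:

```python
# Parity of a permutation, computed as in (b): count the inversions, i.e.,
# the pairs i < j with sigma(i) > sigma(j). The sign (-1)^sigma is +1 for
# even parity and -1 for odd.
def sign(sigma):
    inversions = sum(1 for i in range(len(sigma))
                       for j in range(i + 1, len(sigma))
                       if sigma[i] > sigma[j])
    return 1 if inversions % 2 == 0 else -1

# The worked example: (3, 1, 4, 2) has 3 pairs out of order, hence sign -1.
assert sign((3, 1, 4, 2)) == -1
# The identity permutation has even parity:
assert sign((1, 2, 3, 4)) == 1
```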
Now, fortunately, we don’t really ever have to use this stunning formula (x) in calculations,
nor is it physically possible to use it for n much larger than 8 or 10. For n = 1, 2, 3, one can derive
from it explicit rules for computing det(A):
det [ a ] = a,

      [ a b ]
  det [ c d ] = ad − bc,

      [ a b c ]
  det [ d e f ] = aei + bfg + cdh − (ceg + afh + bdi);
      [ g h i ]
the last one can be remembered easily by the following mnemonic: copy the first two columns to
the right of the matrix,

a b c a b
d e f d e
g h i g h

then add the three products along the diagonals running down and to the right, and subtract the
three products along the diagonals running up and to the right.
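As a sanity check, the 3×3 mnemonic can be compared against the full permutation sum (x) by brute force (the helper names det_sarrus, det_perm, and sign are ours):

```python
from itertools import permutations

# Cross-check of the 3x3 mnemonic against the permutation sum (x).
# Matrices are given as lists of rows.
def sign(sigma):
    return 1 if sum(1 for i in range(len(sigma))
                      for j in range(i + 1, len(sigma))
                      if sigma[i] > sigma[j]) % 2 == 0 else -1

def det_perm(A):
    n = len(A)
    total = 0
    for s in permutations(range(n)):
        p = 1
        for j in range(n):
            p *= A[s[j]][j]      # the factor a_{sigma(j), j}
        total += sign(s) * p
    return total

def det_sarrus(A):
    (a, b, c), (d, e, f), (g, h, i) = A
    return a*e*i + b*f*g + c*d*h - (c*e*g + a*f*h + b*d*i)

A = [[2, 0, 1],
     [3, 5, -2],
     [1, 4, 6]]
assert det_sarrus(A) == det_perm(A)
```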
For n > 3, this mnemonic does not work, and one would not usually make use of (x), but use instead
(i) and the following immediate consequence of (x):
(xi) The determinant of a triangular matrix equals the product of its diagonal entries.
Indeed, when A is upper triangular, then aij = 0 whenever i > j. Now, if σ(j) > j for some
j, then the factor aσ(j),j in the corresponding summand (−1)^σ ∏_{j=1}^{n} aσ(j),j is zero. This means
that the only possibly nonzero summands correspond to σ with σ(j) ≤ j for all j, and there is
only one permutation that manages that, the identity permutation (1, 2, . . . , n), and its parity
is obviously even. Therefore, the formula in (x) gives det A = a11 · · · ann in this case. – The proof
for a lower triangular matrix is analogous; else, use (xiii) below.
Consequently, if A = LU with L unit lower triangular and U upper triangular, then
det A = det U = u11 · · · unn .
If, more generally, A = P LU , with P some permutation matrix, then
det A = det(P )u11 · · · unn ,
i.e.,
(xii) det A is the product of the pivots used in elimination, times (−1)^i , with i the number of row
interchanges made.
Since, by elimination, any A ∈ IF^{n×n} can be factored as A = P LU , with P a permutation
matrix, L unit lower triangular, and U upper triangular, (xii) provides the standard way to compute
determinants.
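A minimal sketch of (xii), the standard way to compute determinants (the function name det_lu is ours; partial pivoting is one choice of pivoting strategy, made here for numerical safety):

```python
# Sketch of (xii): compute det A by Gauss elimination with partial pivoting,
# as the product of the pivots times (-1)^(number of row interchanges).
def det_lu(A):
    A = [row[:] for row in A]         # work on a copy
    n = len(A)
    sgn = 1
    for k in range(n):
        # choose the pivot row (largest entry in column k, rows k..n-1)
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        if A[p][k] == 0:
            return 0.0                # singular: det A = 0
        if p != k:
            A[k], A[p] = A[p], A[k]   # a row interchange flips the sign
            sgn = -sgn
        for r in range(k + 1, n):     # eliminate below the pivot
            m = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= m * A[k][c]
    d = float(sgn)
    for k in range(n):
        d *= A[k][k]                  # product of the pivots
    return d

A = [[2.0, 0.0, 1.0],
     [3.0, 5.0, -2.0],
     [1.0, 4.0, 6.0]]
assert abs(det_lu(A) - 83.0) < 1e-9
```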
Note that, then, A^t = U^t L^t P^t , with U^t lower triangular, L^t unit upper triangular, and P^t the
inverse of P , hence
(xiii) det A^t = det A.
This can also be proved directly from (x). Note that this converts all our statements about
the determinant in terms of columns to the corresponding statements in terms of rows.
(xiv) “expansion by minors”:
Since, by (iv), the determinant is slotwise linear, and x = x1 e1 + x2 e2 + · · · + xn en , we obtain
(16.1) det[. . . , aj−1 , x, aj+1 , . . .] = x1 C1j + x2 C2j + · · · + xn Cnj ,
with
Cij := det[. . . , aj−1 , ei , aj+1 , . . .]
the so-called cofactor of aij . With the choice x = ak , this implies
a1k C1j + a2k C2j + · · · + ank Cnj = det[. . . , aj−1 , ak , aj+1 , . . .] = { det A, if k = j;
                                                                             { 0, otherwise.
The case k = j gives the expansion by minors for det A (and justifies the name ‘cofactor’ for
Cij ). The case k ≠ j is justified by (vi). In other words, with
          [ C11 C21 · · · Cn1 ]
          [ C12 C22 · · · Cn2 ]
adj A :=  [  .   .         .  ]
          [  .   .         .  ]
          [ C1n C2n · · · Cnn ]
the so-called adjugate of A (note that the subscripts appear reversed), we have
adj(A) A = (det A) id.
For an invertible A, this implies that
A^{-1} = (adj A)/ det A.
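The adjugate formula for the inverse can be sketched for the 3×3 case (the helper names minor and adjugate_inverse are ours; exact rational arithmetic avoids rounding questions):

```python
from fractions import Fraction

# Sketch of A^{-1} = adj(A)/det(A) for a 3x3 matrix, via the cofactors
# C_ij = (-1)^(i+j) det of A with row i and column j removed.
def det2(m):
    return m[0][0]*m[1][1] - m[0][1]*m[1][0]

def minor(A, i, j):
    return [[A[r][c] for c in range(3) if c != j] for r in range(3) if r != i]

def adjugate_inverse(A):
    A = [[Fraction(x) for x in row] for row in A]        # exact arithmetic
    C = [[(-1)**(i + j) * det2(minor(A, i, j)) for j in range(3)]
         for i in range(3)]
    d = sum(A[i][0] * C[i][0] for i in range(3))         # expand along column 1
    # adj A is the *transpose* of the cofactor matrix (subscripts reversed):
    return [[C[j][i] / d for j in range(3)] for i in range(3)]

A = [[2, 0, 1], [3, 5, -2], [1, 4, 6]]
Ainv = adjugate_inverse(A)
# A * Ainv should be the identity, exactly:
prod = [[sum(A[i][k] * Ainv[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)]
assert prod == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```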
The expansion by minors is useful since, as follows from (x), the cofactor Cij equals (−1)^{i+j}
times the determinant of the matrix A(n\i|n\j) obtained from A by removing row i and column
j, i.e.,

                       [ · · ·    · · ·    · · ·    · · · ]
Cij = (−1)^{i+j} det   [ · · · ai−1,j−1 ai−1,j+1 · · · ] ,
                       [ · · · ai+1,j−1 ai+1,j+1 · · · ]
                       [ · · ·    · · ·    · · ·    · · · ]
and this is a determinant of order n − 1, and so, if n − 1 > 1, can itself be expanded along some
column (or row).
As a practical matter, for [a, b, c] := A ∈ IR^{3×3} , the formula adj(A) A = (det A) id implies that
(a × b)^t c = det[a, b, c],
with
a × b := (a2 b3 − a3 b2 , a3 b1 − a1 b3 , a1 b2 − a2 b1 )
the cross product of a with b. In particular, a × b is perpendicular to both a and b. Also, if
[a, b] is o.n., then so is [a, b, a × b] but, in addition, det[a, b, a × b] = 1, i.e., [a, b, a × b] provides
a right-handed cartesian coordinate system for IR3 .
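The cross product and its stated properties are easy to check (the helper names cross and dot are ours):

```python
# The cross product as defined in the text, plus the stated facts:
# a x b is perpendicular to both a and b, and (a x b)^t c = det[a, b, c].
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(u, v):
    return sum(x*y for x, y in zip(u, v))

a, b, c = (1, 2, 3), (4, 5, 6), (7, 8, 10)
n = cross(a, b)
assert dot(n, a) == 0 and dot(n, b) == 0    # perpendicularity
# (a x b) . c equals det of the matrix with columns a, b, c:
assert dot(n, c) == -3
# For the standard basis, [e1, e2, e1 x e2] is right-handed:
assert cross((1, 0, 0), (0, 1, 0)) == (0, 0, 1)
```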
(xv) det A is the n-dimensional (signed) volume of the parallelepiped
{Ax : 0 ≤ xi ≤ 1, all i}
spanned by the columns of A.
For n > 3, this is a definition, while, for n ≤ 3, one works it out (see below). This is a very
useful geometric way of thinking about determinants. Also, it has made determinants indispensable
in the definition of multivariate integration and the handling therein of changes of variable.
Since det(AB) = det(A) det(B), it follows that the linear transformation T : IF^n → IF^n : x ↦
Ax changes volumes by the factor |det(A)|, meaning that, for any set M in the domain of T ,
vol_n (T (M )) = |det(A)| vol_n (M ).
As an example, consider det[a, b], with a, b linearly independent vectors in the plane, and
assume, wlog, that a1 ≠ 0. By (vii), det[a, b] = det[a, b̃], with b̃ := b − (b1 /a1 )a having its first
component equal to zero, and so, again by (vii), det[a, b] = det[ã, b̃], with ã := a − (a2 /b̃2 )b̃ having
its second component equal to zero. Therefore, det[a, b] = ã1 b̃2 = ±‖ã‖‖b̃‖ equals ± the area of
the rectangle spanned by ã and b̃. However, following the derivation of ã and b̃ graphically, we
see, by matching congruent triangles, that the rectangle spanned by ã and b̃ has the same area as
the parallelepiped spanned by a and b̃, and, therefore, as the parallelepiped spanned by a and b.
Thus, up to sign, det[a, b] is the area of the parallelepiped spanned by a and b.
(Figure: the parallelepiped spanned by a and b is sheared first into the one spanned by a and b̃,
and then into the rectangle spanned by ã and b̃, without change of area.)
Here, finally, for the record, is a proof that (ii) + (iv) + (v) implies (i), hence everything else
we have been deriving so far. Let A and B be arbitrary matrices (of order n). Then the linearity
(iv) implies that
det(BA) = det[Ba1 , Ba2 , . . . , Ban ] = det[. . . , Σ_i bi aij , . . .] = Σ_{σ∈{1,...,n}^n} det[bσ(1) , . . . , bσ(n) ] ∏_j aσ(j),j .
By the consequence (vi) of the alternation property, most of these summands are zero. Only those
determinants det[bσ(1) , . . . , bσ(n) ] for which all the entries of σ are different are not automatically
zero. But those are exactly the σ ∈ S_n , i.e., the permutations of the first n integers. Further, for
such σ,
det[bσ(1) , . . . , bσ(n) ] = (−)^σ det(B)
by the alternation property, with (−)^σ = ±1 depending on whether it takes an even or an odd
number of interchanges to change σ into a strictly increasing sequence. (We discussed this earlier;
the only tricky part remaining here is an argument that shows the parity of such number of needed
interchanges to be independent of how one goes about making the interchanges. The clue to the
proof is the simple observation that any one interchange is bound to change the number of sequence
entries out of order by an odd amount.) Thus
det(BA) = det(B) Σ_{σ∈S_n} (−)^σ ∏_j aσ(j),j .
Since idA = A while, by the defining property (ii), det( id) = 1, the formula (x) follows and,
with that, det(BA) = det(B) det(A) for arbitrary B and A. On the other hand, starting with the
formula in (x) as a definition, one readily verifies that det so defined satisfies the three properties
(ii) (det( id) = 1), (iv) (multilinear), and (v) (alternating) claimed for it. In other words, there
actually is such a function (necessarily given by (x)).
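The multiplicativity just proved can be checked directly from the permutation-sum definition (x), on integer matrices so that equality is exact (the helper names det and matmul are ours):

```python
from itertools import permutations

# Check of (i), det(AB) = det(A) det(B), using the permutation-sum
# definition (x) directly.
def det(A):
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        sgn = 1 if sum(1 for i in range(n) for j in range(i + 1, n)
                       if sigma[i] > sigma[j]) % 2 == 0 else -1
        p = 1
        for j in range(n):
            p *= A[sigma[j]][j]       # the factor a_{sigma(j), j}
        total += sgn * p
    return total

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k]*B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
B = [[2, 1, 1], [0, 3, 0], [1, 0, 2]]
assert det(matmul(A, B)) == det(A) * det(B)
```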
Sylvester
Here, for the record, is a proof and statement of Sylvester’s Determinant Identity. For it,
the following notation will be useful: If i = (i1 , . . . , ir ) and j = (j1 , . . . , js ) are suitable integer
sequences, then A(i, j) = A(i|j) is the r × s-matrix whose (p, q) entry is A(ip , jq ), p = 1, . . . , r,
q = 1, . . . , s. This is just as in MATLAB except for the vertical bar used here at times, for emphasis
and in order to list, on either side of it, a sequence without having to encase it in parentheses. Also,
it will be handy to denote by : the entire sequence 1:n, and by \i the sequence obtained from 1:n
by removing from it the entries of i. Thus, as in MATLAB, A(: |j) = A(:, j) is the jth column of A.
With k := 1:k, consider the matrix B with entries
B(i, j) := det A(k, i|k, j).
On expanding det A(k, i|k, j) by entries of the last row,
B(i, j) = A(i, j) det A(k|k) − Σ_{r≤k} A(i, r)(−)^{k−r} det A(k|(k\r), j).
This shows that
B(:, j) ∈ A(:, j) det A(k|k) + span A(: |k),
while, directly, B(i, j) = 0 for i ∈ k since then det A(k, i|k, j) has two rows the same.
In the same way,
B(i, :) ∈ A(i, :) det A(k|k) + span A(k| :),
while, directly, B(i, j) = 0 for j ∈ k. Thus, if det A(k|k) ≠ 0, then, for i > k,
B(i, :)/ det A(k|k)
provides the ith row of the matrix obtained from A after k steps of Gauss elimination (without piv-
oting). In other words, the matrix S := B/ det A(k|k) provides the Schur complement S(\k|\k)
in A of the pivot block A(k|k).
Since such row elimination is done by elementary matrices with determinant equal to 1, it
follows that
det A = det A(k|k) det S(\k|\k).
Since, for any #i = #j, B(i, j) depends only on the square matrix A(k, i|k, j), this implies
Sylvester’s determinant identity. If
S(i, j) := det A(k, i|k, j)/ det A(k|k), ∀i, j,
then
det S(i|j) = det A(k, i|k, j)/ det A(k|k).
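Sylvester's identity can be checked numerically; here is a sketch for a particular 4×4 matrix with k = 2 and i = j = (3, 4) (the helper names det and sub are ours; exact rationals avoid rounding questions):

```python
from fractions import Fraction
from itertools import permutations

# Numeric check of Sylvester's determinant identity, k = 2, i = j = (3, 4).
def det(A):
    n = len(A)
    total = Fraction(0)
    for s in permutations(range(n)):
        sgn = 1 if sum(1 for p in range(n) for q in range(p + 1, n)
                       if s[p] > s[q]) % 2 == 0 else -1
        prod = Fraction(1)
        for j in range(n):
            prod *= A[s[j]][j]
        total += sgn * prod
    return total

def sub(A, rows, cols):                 # the submatrix A(rows|cols)
    return [[Fraction(A[r][c]) for c in cols] for r in rows]

A = [[2, 1, 0, 3],
     [1, 3, 2, 0],
     [0, 2, 1, 4],
     [3, 0, 4, 2]]
k = [0, 1]                              # the pivot block is A(k|k)
dk = det(sub(A, k, k))                  # here det A(k|k) = 5, nonzero
# S(i, j) := det A(k,i|k,j) / det A(k|k), for i, j in {3, 4} (0-based 2, 3):
S = [[det(sub(A, k + [i], k + [j])) / dk for j in (2, 3)] for i in (2, 3)]
# Sylvester: det S(i|j) = det A(k,i|k,j) / det A(k|k), here = det A / det A(k|k):
full = sub(A, [0, 1, 2, 3], [0, 1, 2, 3])
assert det(S) == det(full) / dk
```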
Cauchy-Binet
Cauchy-Binet formula. det((AB)(i|j)) = Σ_{#h=#i} det A(i|h) det B(h|j).
Even the special case #i = #A of this, i.e., the most important determinant identity, (13.11),
det(AB) = det(A) det(B),
was first proved by Binet and by Cauchy.
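The Cauchy-Binet formula can be checked by brute force for small sizes; here is a sketch for A of size 2×3 and B of size 3×2, where the sum runs over the increasing pairs h from {1, 2, 3} (the helper name det2 is ours):

```python
from itertools import combinations

# Numeric check of Cauchy-Binet: det(AB) equals the sum, over all increasing
# pairs h of column/row indices, of det A(:|h) * det B(h|:).
def det2(m):
    return m[0][0]*m[1][1] - m[0][1]*m[1][0]

A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 1],
     [2, 3]]
AB = [[sum(A[i][k]*B[k][j] for k in range(3)) for j in range(2)]
      for i in range(2)]
rhs = sum(det2([[A[i][h[c]] for c in range(2)] for i in range(2)]) *
          det2([[B[h[r]][j] for j in range(2)] for r in range(2)])
          for h in combinations(range(3), 2))
assert det2(AB) == rhs
```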