Quantum Mechanics: Probability Theory Basics

Chapter 2 introduces the essentials of quantum mechanics, focusing on the foundational elements of probability theory as it applies to quantum systems. It discusses probability distributions, transformations, and the concept of Markov matrices, which are crucial for understanding dynamics in quantum learning theory. The chapter also covers joint distributions and tensor products, emphasizing their importance in the mathematical framework of quantum mechanics.


CHAPTER 2

Essentials of Quantum Mechanics

We begin by building up the basic ingredients of quantum mechanics. This is not meant to be a course on quantum mechanics, and so we will proceed pragmatically and without much fanfare. We will have the luxury of working with finite-dimensional Hilbert spaces (if you do not know what this means, you will soon), since this is the setting of most present applications of quantum learning theory. Our pedagogical approach will be to revisit ordinary probability theory in a suggestive way that naturally generalizes to quantum theory. Our exposition is meant to be accessible to readers with a knowledge of linear algebra and probability theory.

1. Probability theory on vector spaces


1.1. Probability distributions and their transformations
Here we will formulate probability theory on a discrete space, with some additional linear algebraic baggage that will be useful later. If we have a set of size $N$ we can represent a probability distribution over that set as a vector in $\mathbb{R}^N$ given by
$$\vec{p} = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{bmatrix}$$
where $p_i$ is the probability of the $i$th item. We have, out of convenience, chosen an ordering on our set of items so that we can organize the probabilities into a vector, but of course this ordering is arbitrary. As usual, we require $p_i \geq 0$ for all $i$ since probabilities cannot be negative, and also $\sum_{i=1}^N p_i = 1$ so that the probabilities are appropriately normalized. There is a natural way of packaging the normalization condition. To this end, consider the row vector
$$\vec{1}^T = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}.$$
Then $\sum_{i=1}^N p_i = 1$ is equivalent to
$$\vec{1}^T \cdot \vec{p} = 1\,,$$
and we will use this more compact expression henceforth. It will sometimes be useful to consider the probability simplex $\Delta_N$, a subset of $\mathbb{R}^N$ consisting of all nonnegative vectors with entries summing to one. Then we can write $\vec{p} \in \Delta_N$.
Next we consider a rudimentary version of dynamics. That is, what kinds of transformations on $\vec{p}$ will map it into another valid probability distribution? The simplest kind of transformation we can imagine is a linear one, so let us examine that first. Letting $M$ be an $N \times N$ matrix, we consider the transformation
$$\vec{p}\,' = M \cdot \vec{p}\,,$$
so that $\vec{p}\,'$ is the new probability distribution after the transformation. But what conditions do we need to put on $M$ such that $\vec{p}\,'$ is a bona fide probability distribution for all initial distributions $\vec{p}$? Well, we need for all entries of $\vec{p}\,'$ to be nonnegative, and for $\vec{1}^T \cdot \vec{p}\,' = 1$. To ensure the first property, suppose that $\vec{p}$ is all zeroes except for the $j$th entry which equals one. (That is, we would sample the $j$th object with probability 1 and never sample anything else.) To introduce some other notation, let $\vec{e}_j$ be the vector which is all zeroes except for the $j$th entry which equals one. Then we have
$$\vec{p}\,' = M \cdot \vec{e}_j = \begin{bmatrix} M_{1j} \\ M_{2j} \\ \vdots \\ M_{Nj} \end{bmatrix}.$$
In order for all entries of $\vec{p}\,'$ to be nonnegative, we evidently require $M_{ij} \geq 0$ for all $i$, with $j$ fixed. Varying over $j$ as well, we find the requirement that $M_{ij} \geq 0$ for all $i, j$, and so $M$ must be a matrix with nonnegative entries. Since we also demand that $\vec{1}^T \cdot \vec{p}\,' = 1$, we find the condition
$$\vec{1}^T \cdot \vec{p}\,' = \vec{1}^T \cdot M \cdot \vec{e}_j = \vec{1}^T \cdot \begin{bmatrix} M_{1j} \\ M_{2j} \\ \vdots \\ M_{Nj} \end{bmatrix} = 1\,.$$
That is, the $j$th column of $M$ must sum to one. Since this must hold for every column, we find the condition
$$\vec{1}^T \cdot M = \vec{1}^T. \tag{2}$$
Thus a nonnegative matrix satisfying (2) will send probability vectors to probability vectors. We honor this finding with a definition:

Definition 5 (Markov matrix). Let $M$ be an $N \times N$ matrix. We say that $M$ is a Markov matrix if $M_{ij} \geq 0$ for all $i, j$, and $\vec{1}^T \cdot M = \vec{1}^T$. Then $M$ maps probability vectors to probability vectors.
A few comments are in order. In many treatments of Markov matrices, there is a different convention in which $M$ is taken to act on probability distributions 'to the left', which would give the transpose of our definition above. Our conventions here are chosen to align with those of quantum mechanics, as we will see later on. We immediately notice that Markov matrices behave nicely under composition. Specifically, we have the useful lemma:

Lemma 6 (Composition of Markov matrices). If $M_1, M_2, \ldots, M_k$ are Markov matrices, then $M_k \cdots M_2 \cdot M_1$ is also a Markov matrix.

The proof of this useful fact follows by a short calculation using the definition (which you should do if you have not thought it through before). The upshot of
this lemma is that we can consider transformations like
$$\vec{p}\,' = M_k \cdots M_2 \cdot M_1 \cdot \vec{p}$$
as instantiating a type of 'circuit', with depth $k$. That is, we could say the words: starting with $\vec{p}$ we apply $M_1$ followed by $M_2$ followed by $M_3$ and so on, and then finally apply $M_k$.
Before moving on to increasing levels of sophistication, we consider a simple
example:

Example 1 (Bernoulli coin, $N = 2$). We now specialize to a two-outcome space and fix the ordering so that the first coordinate is outcome 0 ("success") and the second is outcome 1 ("failure"). A Bernoulli distribution with success probability $\theta$ is therefore represented by
$$\vec{p}_\theta = \begin{bmatrix} \Pr[0] \\ \Pr[1] \end{bmatrix} = \begin{bmatrix} \theta \\ 1-\theta \end{bmatrix}, \qquad \theta \in [0,1].$$
Consider the bit-flip dynamics with flip probability $\varepsilon \in [0,1]$,
$$M_\varepsilon = \begin{bmatrix} 1-\varepsilon & \varepsilon \\ \varepsilon & 1-\varepsilon \end{bmatrix},$$
whose entries are nonnegative and whose columns each sum to 1, so $M_\varepsilon$ is a Markov matrix in our sense. Acting on $\vec{p}_\theta$ produces
$$\vec{p}_{\theta'} = M_\varepsilon \, \vec{p}_\theta = \begin{bmatrix} (1-\varepsilon)\theta + \varepsilon(1-\theta) \\ \varepsilon\theta + (1-\varepsilon)(1-\theta) \end{bmatrix} \implies \theta' = (1-2\varepsilon)\,\theta + \varepsilon\,,$$
where $\theta' = \Pr{}'[0]$ is the new success probability.

Some immediate checks help build intuition. When $\varepsilon = 0$ the map is the identity; when $\varepsilon = 1$ it deterministically flips $0 \leftrightarrow 1$; and when $\varepsilon = \frac{1}{2}$ it sends every input to the uniform distribution $\vec{p}_{1/2} = \begin{bmatrix} 1/2 \\ 1/2 \end{bmatrix}$ in one step. For any $0 < \varepsilon < 1$, the unique fixed point solves $\theta' = \theta$ and is $\theta^* = \frac{1}{2}$. (To see this, simply solve $\theta^* = (1-2\varepsilon)\theta^* + \varepsilon$ for $\theta^*$.) Iterating $M_\varepsilon$ a total of $k$ times yields exponential mixing toward the fixed point $\theta^*$ at rate $|1-2\varepsilon|$:
$$\theta^{(k)} = \left(1-2\varepsilon\right)^k \left(\theta^{(0)} - \frac{1}{2}\right) + \frac{1}{2}\,.$$
Finally, the family $M_\varepsilon$ of Markov matrices is closed under composition (illustrating the lemma above): a short calculation shows
$$M_\eta \, M_\varepsilon = M_{\varepsilon + \eta - 2\varepsilon\eta}\,,$$
and in particular $M_\varepsilon^k = M_{\varepsilon_{\mathrm{eff}}}$ with
$$\varepsilon_{\mathrm{eff}} = \frac{1 - (1-2\varepsilon)^k}{2}\,.$$
This two-state example already displays dynamics, fixed points, and circuit composition within the linear-algebraic language we have been developing.
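The mixing and composition formulas in the example above are easy to verify numerically. Here is a short sketch (not from the text; the parameter values are arbitrary) checking both the closed-form $k$-step formula and the effective flip probability:

```python
def apply_flip(theta, eps):
    """One step of the bit-flip channel: theta' = (1 - 2*eps)*theta + eps."""
    return (1 - 2 * eps) * theta + eps

eps, theta0, k = 0.1, 0.9, 5   # arbitrary illustrative values
theta = theta0
for _ in range(k):
    theta = apply_flip(theta, eps)

# Closed form from the text: theta_k = (1-2*eps)**k * (theta0 - 1/2) + 1/2
closed = (1 - 2 * eps) ** k * (theta0 - 0.5) + 0.5
assert abs(theta - closed) < 1e-12

# Composition rule: k applications equal a single flip with effective
# probability eps_eff = (1 - (1-2*eps)**k) / 2
eps_eff = (1 - (1 - 2 * eps) ** k) / 2
assert abs(apply_flip(theta0, eps_eff) - theta) < 1e-12
```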

Moving on, it is useful to recount a few features of probability distributions. If we have $k$ probability distributions $\vec{p}_1, \ldots, \vec{p}_k$, then we can form a new probability distribution by forming a convex combination
$$\vec{p}\,' = \sum_{j=1}^k r_j \, \vec{p}_j \tag{3}$$
where $r_j \geq 0$ and $\sum_{j=1}^k r_j = 1$. To see this, notice that $\vec{p}\,'$ has nonnegative entries and that $\vec{1}^T \cdot \vec{p}\,' = \sum_{j=1}^k r_j (\vec{1}^T \cdot \vec{p}_j) = \sum_{j=1}^k r_j = 1$. We can interpret $r_1, \ldots, r_k$ as a probability distribution over $k$ items in its own right, and say of (3) that we have a probabilistic mixture of $k$ probability distributions wherein we sample from $\vec{p}_j$ with probability $r_j$. That is, $r_1, \ldots, r_k$ is a probability distribution over probability distributions. (You can use this 'meta' statement to impress your friends, if you like.) To make this concrete, consider the following example:

Example 2 (Sampling two coins, $N = 2$). Suppose we have two Bernoulli coins, represented by the probability vectors $\vec{p}_{1/2}$ and $\vec{p}_{1/3}$, respectively. The first one gives heads with probability 1/2 and tails with probability 1/2, and the second gives heads with probability 1/3 and tails with probability 2/3. Now suppose I have both coins in my pocket in such a way that when I reach in, I grab the first coin with probability 1/4 and the second coin with probability 3/4. Then if I reach in and grab a coin and toss it, what is the probability that I would output heads? This is described by the convex combination
$$\frac{1}{4}\,\vec{p}_{1/2} + \frac{3}{4}\,\vec{p}_{1/3} = \begin{bmatrix} 3/8 \\ 5/8 \end{bmatrix},$$
and so evidently the probability of heads is 3/8.
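The arithmetic in Example 2 can be reproduced exactly with rational numbers. A quick sketch (assumed example code, not from the text):

```python
# Convex combination of two Bernoulli coins, computed with exact fractions.
from fractions import Fraction as F

p_half  = [F(1, 2), F(1, 2)]   # heads/tails for the first coin
p_third = [F(1, 3), F(2, 3)]   # heads/tails for the second coin

# Grab the first coin with probability 1/4, the second with probability 3/4.
mix = [F(1, 4) * a + F(3, 4) * b for a, b in zip(p_half, p_third)]
assert mix == [F(3, 8), F(5, 8)]   # probability of heads is 3/8
```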

So far we have only considered linear transformations on $\vec{p}$ that map it into another probability distribution. What if we consider nonlinear transformations? One example would be the nonlinear transformation
$$T(\vec{p}\,) = \begin{bmatrix} \dfrac{p_1^2}{\sum_{i=1}^N p_i^2} \\[1ex] \dfrac{p_2^2}{\sum_{i=1}^N p_i^2} \\ \vdots \\ \dfrac{p_N^2}{\sum_{i=1}^N p_i^2} \end{bmatrix}.$$
Another example would be a Bayesian update. There are clearly a vast infinitude of other possibilities as well. Among this infinitude of transformations there is a natural class that interfaces well with convex combinations of probability distributions. In particular, suppose we mandate that $T$ satisfies
$$T\!\left(\sum_{j=1}^k r_j \, \vec{p}_j\right) = \sum_{j=1}^k r_j \, T(\vec{p}_j) \tag{4}$$
for any $\vec{p}_1, \ldots, \vec{p}_k$ and any valid $r_1, \ldots, r_k$. In words, we are requiring that a transformation of a probabilistic mixture is a probabilistic mixture of transformations (and specifically, of the same transformation). Such $T$'s satisfy a nice structure theorem:

Theorem 7 (Mixture-preserving transformations are Markov matrices). Suppose that $T : \Delta_N \to \Delta_N$ is a mixture-preserving transformation, namely that (4) is satisfied. Then there exists a Markov matrix $M$ such that $T(\vec{p}\,) = M \cdot \vec{p}$ for all $\vec{p}$.

Proof. Write $\vec{p} = \sum_{j=1}^N p_j \, \vec{e}_j$. Using the mixture-preserving property of $T$, we have
$$T(\vec{p}\,) = T\!\left(\sum_{j=1}^N p_j \, \vec{e}_j\right) = \sum_{j=1}^N p_j \, T(\vec{e}_j)\,.$$
Let $M$ be the matrix whose $j$th column is $T(\vec{e}_j)$. Then $T(\vec{p}\,) = M \cdot \vec{p}$. Each column $T(\vec{e}_j)$ is a probability vector, so $M_{ij} \geq 0$ and $\vec{1}^T \cdot M = \vec{1}^T$. Thus $M$ is a Markov matrix, as claimed. $\square$
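The proof is constructive: the Markov matrix is read off from the action of $T$ on basis vectors. Here is a sketch (not the book's code; the 'lazy cycle' map `T` is a hypothetical mixture-preserving example) that builds $M$ column by column and checks it reproduces $T$:

```python
def basis_vector(j, N):
    """The vector e_j: all zeroes except a one in position j (0-indexed)."""
    return [1.0 if i == j else 0.0 for i in range(N)]

def matrix_from_map(T, N):
    """Column j of M is T(e_j), exactly as in the proof of Theorem 7."""
    cols = [T(basis_vector(j, N)) for j in range(N)]
    return [[cols[j][i] for j in range(N)] for i in range(N)]

# Hypothetical mixture-preserving map: keep an item with probability 0.7,
# otherwise advance it cyclically to the next slot.
def T(p):
    N = len(p)
    return [0.7 * p[i] + 0.3 * p[(i - 1) % N] for i in range(N)]

M = matrix_from_map(T, 3)
p = [0.2, 0.5, 0.3]
Mp = [sum(M[i][j] * p[j] for j in range(3)) for i in range(3)]
assert all(abs(Mp[i] - T(p)[i]) < 1e-12 for i in range(3))   # M·p = T(p)
```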
Mixture-preserving transformations are natural from a physical point of view. Imagine a preparation device that, with probabilities $r_1, \ldots, r_k$, produces one of the distributions $\vec{p}_1, \ldots, \vec{p}_k$ by consulting some randomly tossed coins you do not get to see. If dynamics could distinguish whether this randomization happened "before" or "after" the transformation, then the timing of the unseen coin flips would be observable from the output statistics alone. Requiring that they not be observable is exactly the statement of (4).

Two simple consequences are worth keeping in mind. First, the admissible dynamics are closed under randomized control: if with probability $r_j$ you implement a Markov matrix $M_j$, then the overall map is
$$M' = \sum_{j=1}^k r_j \, M_j\,,$$
which is again a Markov matrix since $\vec{1}^T \cdot M' = \sum_{j=1}^k r_j (\vec{1}^T \cdot M_j) = \vec{1}^T$ and all entries are nonnegative. Second, if one further insists that deterministic states are carried to deterministic states, so that $\vec{e}_j$ never acquires additional randomness, then each column $T(\vec{e}_j)$ must itself be a basis vector. Equivalently, $M$ has exactly one 1 (and zeros elsewhere) in each column. Such matrices are sometimes called deterministic or functional Markov matrices. If in addition the mapping $j \mapsto i(j)$ is injective (no two distinct columns point to the same basis vector), then $M$ is a permutation matrix.
By contrast, nonlinear updates arise when you condition on a revealed outcome and then renormalize; the rule in that case depends on which outcome was announced, so it is not a single fixed map on $\Delta_N$ and does not represent closed-system dynamics. This classical discussion sets the stage for the quantum case, which we will treat soon. (There, the state space becomes the convex set of density operators, mixture-preserving maps become convex-linear "channels," and the role of Markov matrices is played by completely positive, trace-preserving maps.)

1.2. Joint distributions and tensor products


In probability theory it is essential to consider joint distributions. Here we develop the basic operations of joint distributions in a convenient and illuminating linear algebraic notation. First we require some additional tools on the linear algebra side. Specifically, we will upgrade our linear algebraic toolkit to multi-linear algebra. The key operation will be the tensor product, which is an operation for joining two or more vector spaces.

We will proceed by motivating the tensor product informally through simple examples, and then give the abstract definition. It is worth paying close attention as the tensor product will serve as an essential piece of mathematical architecture for almost everything in quantum learning theory.

Consider two vectors $\vec{v}, \vec{w}$ in $\mathbb{R}^N$. We denote their tensor product by $\vec{v} \otimes \vec{w}$. To develop what this means, consider the example below.

Example 3. Let $\vec{v} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\vec{w} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$. Then their tensor product $\vec{v} \otimes \vec{w}$ is represented by
$$\vec{v} \otimes \vec{w} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \otimes \begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 1 \cdot \begin{bmatrix} 3 \\ 4 \end{bmatrix} \\[1ex] 2 \cdot \begin{bmatrix} 3 \\ 4 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \\ 6 \\ 8 \end{bmatrix}.$$
In words, $\vec{w}$ gets 'sucked in' to $\vec{v}$. Now let us take the tensor product in the other order, namely $\vec{w} \otimes \vec{v}$:
$$\vec{w} \otimes \vec{v} = \begin{bmatrix} 3 \\ 4 \end{bmatrix} \otimes \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \cdot \begin{bmatrix} 1 \\ 2 \end{bmatrix} \\[1ex] 4 \cdot \begin{bmatrix} 1 \\ 2 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} 3 \\ 6 \\ 4 \\ 8 \end{bmatrix}.$$
From this we glean that, in general, $\vec{v} \otimes \vec{w} \neq \vec{w} \otimes \vec{v}$. Moreover, since $\vec{v} \in \mathbb{R}^2$ and $\vec{w} \in \mathbb{R}^2$, we notice that $\vec{v} \otimes \vec{w} \in \mathbb{R}^4$. To this end we write $\vec{v} \otimes \vec{w} \in \mathbb{R}^2 \otimes \mathbb{R}^2 \simeq \mathbb{R}^4$.
Example 4. Suppose $\vec{v} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\vec{w} = \begin{bmatrix} 3 \\ 4 \\ 5 \end{bmatrix}$ so that $\vec{v} \in \mathbb{R}^2$ and $\vec{w} \in \mathbb{R}^3$. Then
$$\vec{v} \otimes \vec{w} = \begin{bmatrix} 3 \\ 4 \\ 5 \\ 6 \\ 8 \\ 10 \end{bmatrix} \in \mathbb{R}^6\,,$$
and we write $\vec{v} \otimes \vec{w} \in \mathbb{R}^2 \otimes \mathbb{R}^3 \simeq \mathbb{R}^6$.

From the previous two examples we see the general rule that if $\vec{v} \in \mathbb{R}^N$ and $\vec{w} \in \mathbb{R}^M$, then $\vec{v} \otimes \vec{w} \in \mathbb{R}^N \otimes \mathbb{R}^M \simeq \mathbb{R}^{NM}$. So upon taking the tensor product of two vector spaces, the dimensions multiply. We can generalize this further by contemplating another example:
Example 5. Let $\vec{v} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $\vec{w} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$, and $\vec{u} = \begin{bmatrix} 5 \\ 6 \end{bmatrix}$. Then we have
$$\vec{v} \otimes \vec{w} \otimes \vec{u} = (\vec{v} \otimes \vec{w}) \otimes \vec{u} = \begin{bmatrix} 3 \\ 4 \\ 6 \\ 8 \end{bmatrix} \otimes \begin{bmatrix} 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 15 \\ 18 \\ 20 \\ 24 \\ 30 \\ 36 \\ 40 \\ 48 \end{bmatrix}$$
and $\vec{v} \otimes \vec{w} \otimes \vec{u} \in \mathbb{R}^2 \otimes \mathbb{R}^2 \otimes \mathbb{R}^2 \simeq \mathbb{R}^8$.

The above example indicates that
$$\mathbb{R}^{N_1} \otimes \mathbb{R}^{N_2} \otimes \cdots \otimes \mathbb{R}^{N_k} \simeq \mathbb{R}^{N_1 N_2 \cdots N_k}\,,$$
namely that if we take the tensor product of $k$ vector spaces then the result is a vector space whose dimension is the product of the dimensions of the constituents.
We are now ready to define tensor products abstractly, and to really appreciate what it means. Consider the following definition:

Definition 8 (Tensor product). Let $V$ and $W$ be real vector spaces. A tensor product of $V$ and $W$ is a vector space $V \otimes W$ together with a map
$$\otimes : V \times W \to V \otimes W, \qquad (v, w) \mapsto v \otimes w,$$
that is bilinear, i.e. linear in each argument: for all scalars $a, b, c \in \mathbb{R}$ and vectors $\vec{v}, \vec{w}, \vec{u}$,
$$(a\,\vec{v} + b\,\vec{w}) \otimes \vec{u} = a\,(\vec{v} \otimes \vec{u}) + b\,(\vec{w} \otimes \vec{u}),$$
$$\vec{v} \otimes (b\,\vec{w} + c\,\vec{u}) = b\,(\vec{v} \otimes \vec{w}) + c\,(\vec{v} \otimes \vec{u}),$$
and in particular $(a\vec{v}) \otimes \vec{w} = \vec{v} \otimes (a\vec{w}) = a(\vec{v} \otimes \vec{w})$. Concretely, one may construct $V \otimes W$ as the vector space spanned by formal symbols $v \otimes w$ modulo the above bilinearity relations.
To connect this with coordinates, fix bases $\{\vec{e}_i\}_{i=1}^N$ of $\mathbb{R}^N$ and $\{\vec{f}_j\}_{j=1}^M$ of $\mathbb{R}^M$. Then the $NM$ simple tensors $\{\vec{e}_i \otimes \vec{f}_j\}_{i,j}$ form a basis of $\mathbb{R}^N \otimes \mathbb{R}^M$, and so $\dim(\mathbb{R}^N \otimes \mathbb{R}^M) = NM$. If $\vec{v} = \sum_i v_i \, \vec{e}_i$ and $\vec{w} = \sum_j w_j \, \vec{f}_j$, then
$$\vec{v} \otimes \vec{w} = \sum_{i,j} v_i w_j \, (\vec{e}_i \otimes \vec{f}_j)\,,$$
which recovers the stacking rules seen in the earlier examples and realizes the identification $\mathbb{R}^N \otimes \mathbb{R}^M \simeq \mathbb{R}^{NM}$.

Identifying $\mathbb{R}$ with the one-dimensional space spanned by 1, there are canonical isomorphisms $V \otimes \mathbb{R} \simeq V \simeq \mathbb{R} \otimes V$ given by $\vec{v} \otimes a \mapsto a\,\vec{v}$ and $a \otimes \vec{v} \mapsto a\,\vec{v}$. Hence $\mathbb{R}^N \otimes \mathbb{R}^1 \simeq \mathbb{R}^N \simeq \mathbb{R}^1 \otimes \mathbb{R}^N$.

Linear maps interact nicely with tensor products. If $A : \mathbb{R}^N \to \mathbb{R}^{N'}$ and $B : \mathbb{R}^M \to \mathbb{R}^{M'}$ are linear, there is a linear map $A \otimes B : \mathbb{R}^N \otimes \mathbb{R}^M \to \mathbb{R}^{N'} \otimes \mathbb{R}^{M'}$ defined by
$$(A \otimes B)(\vec{v} \otimes \vec{w}) = (A\vec{v}) \otimes (B\vec{w})\,,$$
which in matrix form is the familiar Kronecker product.
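The defining property $(A \otimes B)(\vec{v} \otimes \vec{w}) = (A\vec{v}) \otimes (B\vec{w})$ can be verified directly against the Kronecker product. A stdlib-only sketch (not from the text; the matrices and vectors are arbitrary examples, and `numpy.kron` implements `kron_mat`):

```python
def kron_vec(v, w):
    """v ⊗ w for vectors, in the text's stacking order."""
    return [a * b for a in v for b in w]

def kron_mat(A, B):
    """Kronecker product: block (i, j) of the result is A[i][j] * B."""
    return [[A[i][j] * B[k][l]
             for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def matvec(M, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
v, w = [1, 2], [3, 4]
lhs = matvec(kron_mat(A, B), kron_vec(v, w))   # (A ⊗ B)(v ⊗ w)
rhs = kron_vec(matvec(A, v), matvec(B, w))     # (A v) ⊗ (B w)
assert lhs == rhs
```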

Remark 9 (Associativity of tensor products). For our purposes, it does not matter whether we first form $(V \otimes W)$ and then tensor with $U$ from the right, or first form $(W \otimes U)$ and then tensor with $V$ from the left. There is a canonical identification between
$$(V \otimes W) \otimes U \quad \text{and} \quad V \otimes (W \otimes U),$$
and so we will simply write
$$V \otimes W \otimes U$$
without worrying about parentheses. This scales to many tensor factors. For a vector space $V$ we write
$$V^{\otimes k} := \underbrace{V \otimes \cdots \otimes V}_{k \text{ copies}}\,,$$
which has dimension $(\dim V)^k$ and a basis $\{\vec{e}_{i_1} \otimes \cdots \otimes \vec{e}_{i_k}\}$. We will use this to model multi-part systems: for example, a register of $k$ $N$-ary variables naturally lives in $(\mathbb{R}^N)^{\otimes k} \simeq \mathbb{R}^{N^k}$.

As a word of caution, order still matters. As we explained before, in general we have $\vec{v} \otimes \vec{w} \neq \vec{w} \otimes \vec{v}$. When we want to swap the order of a tensor product we will use the linear map $\mathrm{SWAP} : V \otimes W \to W \otimes V$, acting by
$$\mathrm{SWAP} \cdot (\vec{v} \otimes \vec{w}) = \vec{w} \otimes \vec{v}\,.$$
In summary, associativity lets us ignore parentheses; SWAP lets us reorder factors when needed.
Going from the abstract back to the concrete, we have the example below:

Example 6. Suppose you are faced with this mess:
$$(a\,\vec{v} + b\,\vec{w}) \otimes (c\,\vec{s} + d\,\vec{t} + e\,\vec{u}) \otimes (f\,\vec{q} + g\,\vec{r})\,.$$
To expand it, what do you do? Don't panic. If you have a long list of things to do, just do them one at a time. Specifically in this case, use associativity to expand the bracketed terms first:
$$(a\,\vec{v} + b\,\vec{w}) \otimes (c\,\vec{s} + d\,\vec{t} + e\,\vec{u}) \otimes (f\,\vec{q} + g\,\vec{r})$$
$$= (ac\,\vec{v} \otimes \vec{s} + ad\,\vec{v} \otimes \vec{t} + ae\,\vec{v} \otimes \vec{u} + bc\,\vec{w} \otimes \vec{s} + bd\,\vec{w} \otimes \vec{t} + be\,\vec{w} \otimes \vec{u}) \otimes (f\,\vec{q} + g\,\vec{r}).$$
Now you can multiply through and expand the rest of the terms as
$$acf\,\vec{v} \otimes \vec{s} \otimes \vec{q} + acg\,\vec{v} \otimes \vec{s} \otimes \vec{r} + adf\,\vec{v} \otimes \vec{t} \otimes \vec{q} + adg\,\vec{v} \otimes \vec{t} \otimes \vec{r}$$
$$+\; aef\,\vec{v} \otimes \vec{u} \otimes \vec{q} + aeg\,\vec{v} \otimes \vec{u} \otimes \vec{r} + bcf\,\vec{w} \otimes \vec{s} \otimes \vec{q} + bcg\,\vec{w} \otimes \vec{s} \otimes \vec{r}$$
$$+\; bdf\,\vec{w} \otimes \vec{t} \otimes \vec{q} + bdg\,\vec{w} \otimes \vec{t} \otimes \vec{r} + bef\,\vec{w} \otimes \vec{u} \otimes \vec{q} + beg\,\vec{w} \otimes \vec{u} \otimes \vec{r}\,,$$
which is the desired expansion.
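The twelve-term expansion in Example 6 can be double-checked numerically: bilinearity guarantees that the direct triple tensor product equals the sum of the expanded terms. A sketch (assumed example code; the vectors and coefficients are arbitrary):

```python
def tensor(u, v):
    return [a * b for a in u for b in v]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def scale(c, u):
    return [c * a for a in u]

# Arbitrary vectors in R^2 and arbitrary scalar coefficients.
v, w = [1.0, 2.0], [3.0, 5.0]
s, t, u = [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]
q, r = [1.0, 4.0], [2.0, 1.0]
a, b, c, d, e, f, g = 2.0, 3.0, 1.0, 5.0, 7.0, 1.0, 2.0

# Compute (a v + b w) ⊗ (c s + d t + e u) ⊗ (f q + g r) directly.
direct = tensor(tensor(add(scale(a, v), scale(b, w)),
                       add(add(scale(c, s), scale(d, t)), scale(e, u))),
                add(scale(f, q), scale(g, r)))

# The twelve terms from the text, with x ⊗ y ⊗ z = tensor(tensor(x, y), z).
terms = [(a*c*f, v, s, q), (a*c*g, v, s, r), (a*d*f, v, t, q), (a*d*g, v, t, r),
         (a*e*f, v, u, q), (a*e*g, v, u, r), (b*c*f, w, s, q), (b*c*g, w, s, r),
         (b*d*f, w, t, q), (b*d*g, w, t, r), (b*e*f, w, u, q), (b*e*g, w, u, r)]
expanded = [0.0] * 8
for coeff, x, y, z in terms:
    expanded = add(expanded, scale(coeff, tensor(tensor(x, y), z)))
assert all(abs(x - y) < 1e-9 for x, y in zip(direct, expanded))
```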

With some basic tensor product definitions at hand, we can now leverage them
to discuss joint probability distributions in a slick vector space formalism.
Respecting historical tradition,[1] suppose we have two urns, where the first urn has $N$ objects and the second urn has $M$ objects. Suppose that the probability that we select one of the $N$ items in the first urn is described by the probability vector $\vec{p} \in \mathbb{R}^N$, and the probability that we select one of the $M$ items in the second urn is described by the probability vector $\vec{q} \in \mathbb{R}^M$. Then if we select an item from the first urn followed by the second urn, what is the probability that we sampled item $i$ from the first urn and item $j$ from the second urn? The answer is encoded in the tensor product $\vec{p} \otimes \vec{q}$, and in particular its $((i-1)M + j)$th entry:
$$[\vec{p} \otimes \vec{q}\,]_{(i-1)M+j} = p_i \, q_j\,.$$
We can extract this entry by dotting $\vec{p} \otimes \vec{q}$ against $\vec{e}_i^{\,T} \otimes \vec{e}_j^{\,T}$, namely
$$(\vec{e}_i^{\,T} \otimes \vec{e}_j^{\,T}) \cdot (\vec{p} \otimes \vec{q}\,) = p_i \, q_j\,.$$
The vector $\vec{p} \otimes \vec{q}$ is itself a probability vector living in $\Delta_{NM} \subset \mathbb{R}^{NM}$; thus it is a probability distribution on $NM$ outcomes, as we wanted.

[1] See Ars Conjectandi by Jacob Bernoulli, published posthumously in 1713.
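The indexing rule above is easy to confirm with exact arithmetic. A sketch (assumed example code; the distributions `p` and `q` are made up):

```python
# Check that entry (i-1)*M + j of p ⊗ q (1-based) equals p_i * q_j.
from fractions import Fraction as F

def tensor(v, w):
    return [a * b for a in v for b in w]

p = [F(1, 2), F(1, 2)]            # N = 2
q = [F(1, 6), F(1, 3), F(1, 2)]   # M = 3
joint = tensor(p, q)
N, M = len(p), len(q)
for i in range(1, N + 1):
    for j in range(1, M + 1):
        # (i-1)*M + j in 1-based indexing is (i-1)*M + (j-1) in 0-based.
        assert joint[(i - 1) * M + (j - 1)] == p[i - 1] * q[j - 1]
assert sum(joint) == 1   # p ⊗ q is itself a probability vector
```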
So far we have examined $\vec{p} \otimes \vec{q}$, which is a product distribution, assuming in our example that our sampling from each of the two urns is uncorrelated. Below we show in an example that convex combinations of tensor products can represent a correlated, joint distribution.

Example 7. Suppose the first urn has two items ($N = 2$), say a ring and a watch, and the second urn has three items ($M = 3$), say a tissue, a match, and a rubber band. The urns were prepared by the ghost of Jacob Bernoulli. We are told that with probability 1/3 he put a ring in the first urn and a rubber band in the second urn, and with probability 2/3 he put a watch in the first urn and a match in the second urn. Then the joint distribution over the urns is described by
$$\frac{1}{3} \begin{bmatrix} 1 \\ 0 \end{bmatrix} \otimes \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} + \frac{2}{3} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \otimes \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1/3 \\ 0 \\ 2/3 \\ 0 \end{bmatrix}.$$
This distribution does not factorize into a tensor product of two individual vectors.

We abstract this example in the following remark.

Remark 10 (Joint distributions and multi-index notation). Given $k$ probability spaces represented by $\Delta_{N_i} \subset \mathbb{R}^{N_i}$ for $i = 1, \ldots, k$, a distribution on the joint space is represented by
$$\Delta_{N_1 \cdots N_k} \subset \mathbb{R}^{N_1 \cdots N_k} \simeq \mathbb{R}^{N_1} \otimes \cdots \otimes \mathbb{R}^{N_k}\,.$$
Product (independent) distributions have the special form $\vec{p}^{\,(1)} \otimes \vec{p}^{\,(2)} \otimes \cdots \otimes \vec{p}^{\,(k)}$, and general joint distributions are convex combinations of such products. For example, if $\vec{p}^{\,(j)}_i$ represents a distribution in $\mathbb{R}^{N_j}$, then
$$\sum_{i_1, i_2, \ldots, i_k} r_{i_1 i_2 \cdots i_k} \, \vec{p}^{\,(1)}_{i_1} \otimes \vec{p}^{\,(2)}_{i_2} \otimes \cdots \otimes \vec{p}^{\,(k)}_{i_k}$$
is a joint distribution so long as $r_{i_1 i_2 \cdots i_k} \geq 0$ for all $i_1, i_2, \ldots, i_k$ and additionally $\sum_{i_1, i_2, \ldots, i_k} r_{i_1 i_2 \cdots i_k} = 1$. Here we have used a multi-index notation, in which we are putting subscripts on subscripts; this is to avoid notation like $\sum_{a,b,c,\ldots} r_{abc\cdots}$ which does not specify the total number of subscripts, which in our case is $k$. (Moreover, there are only 26 letters in the Latin alphabet.) Multi-index notation may initially seem like gross notation, but you will soon grow accustomed to it, like generations have before you.
Joint distributions interface nicely with the $\vec{1}^T$ row vector in a number of ways. For clarity, let us write $\vec{1}^T_N$ to denote the all-ones row vector with $N$ entries. Then we have the nice identity
$$\vec{1}^T_{N_1} \otimes \vec{1}^T_{N_2} \otimes \cdots \otimes \vec{1}^T_{N_k} = \vec{1}^T_{N_1 N_2 \cdots N_k}\,.$$
Thus if $\vec{p}$ is a joint distribution living in $\Delta_{N_1 N_2 \cdots N_k}$, then we have
$$(\vec{1}^T_{N_1} \otimes \vec{1}^T_{N_2} \otimes \cdots \otimes \vec{1}^T_{N_k}) \cdot \vec{p} = \vec{1}^T_{N_1 N_2 \cdots N_k} \cdot \vec{p} = 1\,.$$
We can also use the all-ones row vector to formulate a nice way of computing marginal distributions. To illustrate, we proceed with the example below.

Example 8. Consider a joint distribution on $\Delta_6 \subset \mathbb{R}^2 \otimes \mathbb{R}^3$. Let us denote the joint distribution by $\vec{p}_{AB}$, where $A$ represents the first subsystem of two items, and $B$ represents the second subsystem of three items. Then we can write $\vec{p}_{AB}$ as
$$\vec{p}_{AB} = \begin{bmatrix} p_{AB}(1,1) \\ p_{AB}(1,2) \\ p_{AB}(1,3) \\ p_{AB}(2,1) \\ p_{AB}(2,2) \\ p_{AB}(2,3) \end{bmatrix}.$$
Suppose we want to marginalize over the second probability space (the one over three items). Letting $\mathbb{1}_N$ denote the $N \times N$ identity matrix, we marvel at the linear operator $\mathbb{1}_2 \otimes \vec{1}^T_3$, which maps $\mathbb{R}^2 \otimes \mathbb{R}^3 \to \mathbb{R}^2$. We marvel at it because applying the operator to $\vec{p}_{AB}$ we find
$$(\mathbb{1}_2 \otimes \vec{1}^T_3) \cdot \vec{p}_{AB} = \begin{bmatrix} p_{AB}(1,1) + p_{AB}(1,2) + p_{AB}(1,3) \\ p_{AB}(2,1) + p_{AB}(2,2) + p_{AB}(2,3) \end{bmatrix} = \begin{bmatrix} p_A(1) \\ p_A(2) \end{bmatrix} = \vec{p}_A$$
where $\vec{p}_A$ is the marginal distribution on the first subsystem $A$, which has two items.
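The operator $\mathbb{1}_2 \otimes \vec{1}^T_3$ from Example 8 can be built explicitly as a Kronecker product. A stdlib-only sketch (not from the text; the joint distribution `p_AB` is a made-up example):

```python
def kron_mat(A, B):
    """Kronecker product of matrices stored as row lists."""
    return [[A[i][j] * B[k][l]
             for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def matvec(M, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

eye2 = [[1, 0], [0, 1]]
ones3_T = [[1, 1, 1]]              # the row vector 1^T_3 as a 1x3 matrix

marg_A = kron_mat(eye2, ones3_T)   # maps R^2 ⊗ R^3 -> R^2

p_AB = [0.10, 0.20, 0.05, 0.30, 0.15, 0.20]   # hypothetical joint distribution
p_A = matvec(marg_A, p_AB)
assert abs(p_A[0] - 0.35) < 1e-12 and abs(p_A[1] - 0.65) < 1e-12
```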

The insight in the above example generalizes in the following way.

Remark 11 (Marginalizing any subset of subsystems). Let $\vec{p} \in \Delta_{N_1 \cdots N_k}$ be a joint distribution on $k$ subsystems with sizes $N_1, \ldots, N_k$. For any subset $S \subseteq \{1, \ldots, k\}$, define the linear "marginalization" map
$$\mathcal{M}_S := \bigotimes_{j=1}^k K_j = K_1 \otimes K_2 \otimes \cdots \otimes K_k\,, \qquad K_j = \begin{cases} \mathbb{1}_{N_j} & \text{if } j \in S \\ \vec{1}^T_{N_j} & \text{if } j \notin S \end{cases}\,,$$
and so $\mathcal{M}_S : \mathbb{R}^{N_1 \cdots N_k} \to \mathbb{R}^{\prod_{j \in S} N_j}$. Then $\mathcal{M}_S \cdot \vec{p}$ is the marginal over the subsystems indexed by $S$.
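The general map $\mathcal{M}_S$ of Remark 11 assembles the same way: tensor an identity for each retained subsystem and a row of ones for each summed-out one. A sketch (assumed example code, not from the text):

```python
def kron_mat(A, B):
    return [[A[i][j] * B[k][l]
             for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def matvec(M, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def eye(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def ones_row(n):
    return [[1] * n]

def marginalizer(sizes, S):
    """M_S = K_1 ⊗ ... ⊗ K_k with K_j = identity if j in S, else 1^T."""
    factors = [eye(n) if j in S else ones_row(n) for j, n in enumerate(sizes)]
    M = factors[0]
    for K in factors[1:]:
        M = kron_mat(M, K)
    return M

# Keeping subsystem 0 of a (2, 3) joint space recovers Example 8's operator.
M_S = marginalizer([2, 3], {0})
assert M_S == [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
```

Here subsystems are 0-indexed for convenience, whereas the text indexes them from 1.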
To summarize, we have recast ordinary probability theory (on discrete probability spaces) in a linear-algebraic language, which has motivated us to develop the fundamentals of multi-linear algebra and tensor products. This mathematical technology certainly illuminates aspects of multi-linearity lurking in ordinary probability theory. But our true motivation was to set up probability theory in such a way as to make (finite-dimensional) quantum mechanics appear as a natural generalization, using many of the same ingredients. In the next section, when we introduce quantum mechanics, we will relentlessly capitalize on parallels with probability theory, but also take care to point out where such parallels break down.

2. Quantum theory in finite dimensions


We begin with a very brief history of quantum theory. Circa 1900 Max Planck studied blackbody radiation, and solved an inadequacy in the extant equations by stipulating that energy is quantized in units of his eponymous constant. Then in 1905, Einstein suggested that light itself is quantized as "photons", providing an explanation for the photoelectric effect. In the ensuing decade, Bohr made a first pass at quantum theory (the so-called 'old' quantum theory), and correctly predicted the spectral lines of hydrogen. This first pass at quantum theory only went so far, and a second pass was made in the 1920's. In 1924, de Broglie postulated that a particle with momentum $p$ has 'wavelength' $\lambda = h/p$, which was soon confirmed by electron diffraction experiments. Thereafter, Heisenberg, Born, and Jordan developed matrix mechanics in 1925 (although they did not yet understand the connection to de Broglie). In 1926, Schrödinger leveraged de Broglie's insight to develop wave mechanics, and that same year showed its equivalence with matrix mechanics. That year as well, Born gave a 'probabilistic' interpretation of quantum mechanics which clarified its connections to measurable quantities in experiments. In 1927, Heisenberg wrote down his famous uncertainty principle. Most of the abstract mathematical foundations of quantum mechanics were consolidated by Dirac and von Neumann in the early 1930's, and Einstein, Podolsky, and Rosen as well as Schrödinger highlighted the importance of entanglement in 1935. The year after, in 1936, Birkhoff and von Neumann investigated how quantum mechanics leads to a new form of logical reasoning that goes beyond classical Boolean logic; in hindsight this may be regarded as the first hint of the possibility of quantum computing (although it was not understood as such at the time).
Having completed our brief historical diegesis, we now turn to presenting the axioms of quantum mechanics. There are various ways of 'motivating' the axioms of quantum mechanics, although at some level they were guessed by very clever people and experimentally confirmed by very clever people (sometimes in the opposite order). We will, however, give some intuition. But first, a word of caution. When someone asks for a motivation for quantum mechanics in terms of classical mechanics, this is philosophically backwards; it would be like asking for a derivation of special relativity starting from Newton's equations. Indeed, just as special relativity reduces to Newtonian physics in a certain regime of validity, so too does quantum mechanics reduce to classical mechanics in a certain regime of validity. Nonetheless, we will proceed with an idiosyncratic way of 'guessing' some of the axioms of quantum mechanics starting from classical intuitions.

2.1. Mechanics on $\ell^p$ spaces: from classical to quantum


Let us begin by contemplating the salient mathematical structures undergirding
the dynamics of probability distributions discussed above. For this, it is useful to
have the following definition:

Definition 12 (Normed vector space). Let $V$ be a vector space over a field $K$; we will consider either $V = \mathbb{R}^N$ (with $K = \mathbb{R}$), or $V = \mathbb{C}^N$ (with $K = \mathbb{C}$). A normed