Quantum Mechanics: Probability Theory Basics
and we will use this more compact expression henceforth. It will sometimes be useful to consider the probability simplex $\Delta_N$, which is a subset of $\mathbb{R}^N$: $\Delta_N$ consists of all nonnegative vectors with entries summing to one. Then we can write $\vec{p} \in \Delta_N$.
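As a small sanity check in code (our own illustration; the helper name `in_simplex` is ours), membership in $\Delta_N$ is easy to test numerically:

```python
import numpy as np

def in_simplex(p, tol=1e-12):
    # A vector lies in the probability simplex Delta_N when all of its
    # entries are nonnegative and they sum to one.
    p = np.asarray(p, dtype=float)
    return bool(np.all(p >= -tol) and abs(p.sum() - 1.0) <= tol)

print(in_simplex([0.25, 0.75]))  # True
print(in_simplex([0.5, 0.6]))    # False: entries sum to 1.1
```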
Next we consider a rudimentary version of dynamics. That is, what kinds of transformations on $\vec{p}$ will map it into another valid probability distribution? The first observation is that if we have $k$ probability distributions $\vec{p}_1, ..., \vec{p}_k$, then we can form a new probability distribution by forming a convex combination
$$\vec{p}^{\,\prime} = \sum_{j=1}^{k} r_j\, \vec{p}_j \tag{3}$$
where $r_j \ge 0$ and $\sum_{j=1}^{k} r_j = 1$. To see this, notice that $\vec{p}^{\,\prime}$ has nonnegative entries and that $\vec{1}^T \cdot \vec{p}^{\,\prime} = \sum_{j=1}^{k} r_j (\vec{1}^T \cdot \vec{p}_j) = \sum_{j=1}^{k} r_j = 1$. We can interpret $r_1, ..., r_k$ as a probability distribution over $k$ items in its own right, and say of (3) that we have a probabilistic mixture of $k$ probability distributions wherein we sample from $\vec{p}_j$ with probability $r_j$. That is, $r_1, ..., r_k$ is a probability distribution over probability distributions. (You can use this 'meta' statement to impress your friends, if you like.) To make this concrete, consider the following example:
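For instance, the following minimal NumPy sketch (our own illustration) mixes two distributions over three outcomes and confirms the result is again a probability distribution:

```python
import numpy as np

# Two distributions over the same three outcomes.
p1 = np.array([0.5, 0.5, 0.0])
p2 = np.array([0.1, 0.2, 0.7])

# Mixture weights r_1, r_2: themselves a probability distribution.
r = np.array([0.25, 0.75])

p_mix = r[0] * p1 + r[1] * p2
print(p_mix)        # [0.2   0.275 0.525]
print(p_mix.sum())  # 1.0: nonnegative entries summing to one
```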
Another example of a transformation that carries probability distributions to probability distributions would be a Bayesian update. There are clearly a vast infinitude of other possibilities as well. Among this infinitude of transformations $T$ there is a natural class that interfaces well with convex combinations of probability distributions. In particular, suppose we mandate that $T$ satisfies
$$T\!\left( \sum_{j=1}^{k} r_j\, \vec{p}_j \right) = \sum_{j=1}^{k} r_j\, T(\vec{p}_j) \tag{4}$$
for any $\vec{p}_1, ..., \vec{p}_k$ and any valid $r_1, ..., r_k$. In words, we are requiring that a transformation of a probabilistic mixture is a probabilistic mixture of transformations (and specifically, the same transformation). Such $T$'s satisfy a nice structure theorem:
Theorem. A transformation $T$ satisfies (4) if and only if $T(\vec{p}) = M \cdot \vec{p}$ for a Markov matrix $M$, that is, a matrix with nonnegative entries each of whose columns sums to one.

Proof. A Markov matrix acts linearly and so certainly satisfies (4). Conversely, let $M$ be the matrix whose $j$th column is $T(\vec{e}_j)$. Since $\vec{p} = \sum_i p_i \vec{e}_i$ is itself a convex combination of the basis vectors, (4) gives $T(\vec{p}) = \sum_i p_i\, T(\vec{e}_i) = M \cdot \vec{p}$. Each column $T(\vec{e}_j)$ is a probability vector, so $M_{ij} \ge 0$ and $\vec{1}^T \cdot M = \vec{1}^T$. Thus $M$ is a Markov matrix, as claimed. $\square$
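As a quick numerical sanity check of the theorem (a sketch of our own, not from the text), a column-normalized random matrix is Markov, acts convex-linearly, and maps distributions to distributions:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random 3x3 Markov matrix: nonnegative entries, each column sums to one.
M = rng.random((3, 3))
M /= M.sum(axis=0, keepdims=True)

p1 = np.array([0.2, 0.3, 0.5])
p2 = np.array([0.6, 0.1, 0.3])
r = 0.25

# Equation (4) with k = 2: mixing then transforming equals
# transforming then mixing.
lhs = M @ (r * p1 + (1 - r) * p2)
rhs = r * (M @ p1) + (1 - r) * (M @ p2)
print(np.allclose(lhs, rhs))          # True
print(np.isclose((M @ p1).sum(), 1))  # True: the output is a distribution
```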
Mixture-preserving transformations are natural from a physical point of view. Imagine a preparation device that, with probabilities $r_1, ..., r_k$, produces one of the distributions $\vec{p}_1, ..., \vec{p}_k$ by consulting some randomly tossed coins you do not get to see. If dynamics could distinguish whether this randomization happened "before" or "after" the transformation, then the timing of the unseen coin flips would be observable from the output statistics alone. Requiring that they not be observable is exactly the statement of (4).
Two simple consequences are worth keeping in mind. First, the admissible dynamics are closed under randomized control: if with probability $r_j$ you implement a Markov matrix $M_j$, then the overall map is
$$M' = \sum_{j=1}^{k} r_j\, M_j,$$
which is again a Markov matrix since $\vec{1}^T \cdot M' = \sum_{j=1}^{k} r_j (\vec{1}^T \cdot M_j) = \vec{1}^T$ and all entries are nonnegative. Second, if one further insists that deterministic states are carried to deterministic states, so that $\vec{e}_j$ never acquires additional randomness, then each column $T(\vec{e}_j)$ must itself be a basis vector. Equivalently, $M$ has exactly one 1 (and zeros elsewhere) in each column. Such matrices are sometimes called deterministic or functional Markov matrices. If in addition the mapping $j \mapsto i(j)$ is injective (no two distinct columns point to the same basis vector), then $M$ is a permutation matrix.
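For concreteness, here is a small sketch (ours; the helper `functional_markov` is hypothetical) of a deterministic Markov matrix built from a map $j \mapsto i(j)$:

```python
import numpy as np

def functional_markov(f, n):
    # Column j is the basis vector e_{f(j)}: a deterministic Markov matrix.
    M = np.zeros((n, n))
    for j in range(n):
        M[f(j), j] = 1.0
    return M

# A non-injective map: outcomes 0 and 1 are both sent to 0,
# so M merges probability mass and is not a permutation matrix.
M = functional_markov(lambda j: 0 if j < 2 else 2, 3)
p = np.array([0.2, 0.3, 0.5])
print(M @ p)  # [0.5 0.  0.5]
```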
By contrast, nonlinear updates arise when you condition on a revealed outcome and then renormalize; the rule in that case depends on which outcome was announced, so it is not a single fixed map on $\Delta_N$ and does not represent closed-system dynamics. This classical discussion sets the stage for the quantum case,
which we will treat soon. (There, the state space becomes the convex set of density
operators, mixture-preserving maps become convex-linear “channels,” and the role
of Markov matrices is played by completely positive, trace-preserving maps.)
Before we take up the quantum theory itself, we need to build up some multilinear algebra. The key operation will be the tensor product, which is an operation for joining two or more vector spaces.
We will proceed by motivating the tensor product informally through simple
examples, and then give the abstract definition. It is worth paying close attention
as the tensor product will serve as an essential piece of mathematical architecture
for almost everything in quantum learning theory.
Consider two vectors $\vec{v}, \vec{w}$ in $\mathbb{R}^N$. We denote their tensor product by $\vec{v} \otimes \vec{w}$. To develop what this means, consider the example below.
Example 3. Let $\vec{v} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$ and $\vec{w} = \begin{pmatrix} 3 \\ 4 \end{pmatrix}$. Then their tensor product $\vec{v} \otimes \vec{w}$ is represented by
$$\vec{v} \otimes \vec{w} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \otimes \begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 1 \cdot \begin{pmatrix} 3 \\ 4 \end{pmatrix} \\[4pt] 2 \cdot \begin{pmatrix} 3 \\ 4 \end{pmatrix} \end{pmatrix} = \begin{pmatrix} 3 \\ 4 \\ 6 \\ 8 \end{pmatrix}.$$
In words, $\vec{w}$ gets 'sucked in' to $\vec{v}$. Now let us take the tensor product in the other order, namely $\vec{w} \otimes \vec{v}$:
$$\vec{w} \otimes \vec{v} = \begin{pmatrix} 3 \\ 4 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 3 \cdot \begin{pmatrix} 1 \\ 2 \end{pmatrix} \\[4pt] 4 \cdot \begin{pmatrix} 1 \\ 2 \end{pmatrix} \end{pmatrix} = \begin{pmatrix} 3 \\ 6 \\ 4 \\ 8 \end{pmatrix}.$$
The same stacking rule applies to vectors of different lengths: tensoring $\vec{v} \in \mathbb{R}^2$ with $\vec{w} \in \mathbb{R}^3$, for instance, produces a vector with six entries, and we write $\vec{v} \otimes \vec{w} \in \mathbb{R}^2 \otimes \mathbb{R}^3 \cong \mathbb{R}^6$.
From these examples we see the general rule that if $\vec{v} \in \mathbb{R}^N$ and $\vec{w} \in \mathbb{R}^M$, then $\vec{v} \otimes \vec{w} \in \mathbb{R}^N \otimes \mathbb{R}^M \cong \mathbb{R}^{NM}$. So upon taking the tensor product of two vector spaces, the dimensions multiply. We can generalize this further by contemplating another example:
Example 5. Let $\vec{v} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$, $\vec{w} = \begin{pmatrix} 3 \\ 4 \end{pmatrix}$, and $\vec{u} = \begin{pmatrix} 5 \\ 6 \end{pmatrix}$. Then we have
$$\vec{v} \otimes \vec{w} \otimes \vec{u} = (\vec{v} \otimes \vec{w}) \otimes \vec{u} = \begin{pmatrix} 3 \\ 4 \\ 6 \\ 8 \end{pmatrix} \otimes \begin{pmatrix} 5 \\ 6 \end{pmatrix} = \begin{pmatrix} 15 \\ 18 \\ 20 \\ 24 \\ 30 \\ 36 \\ 40 \\ 48 \end{pmatrix}$$
and $\vec{v} \otimes \vec{w} \otimes \vec{u} \in \mathbb{R}^2 \otimes \mathbb{R}^2 \otimes \mathbb{R}^2 \cong \mathbb{R}^8$.
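These stacked vectors are exactly what a Kronecker product computes; the following sketch (our own illustration) reproduces Examples 3 and 5 with NumPy:

```python
import numpy as np

v = np.array([1, 2])
w = np.array([3, 4])
u = np.array([5, 6])

print(np.kron(v, w))              # [3 4 6 8]     (Example 3)
print(np.kron(w, v))              # [3 6 4 8]     order matters
print(np.kron(np.kron(v, w), u))  # [15 18 20 24 30 36 40 48] (Example 5)
```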
If $\{\vec{e}_i\}$ denotes the standard basis of $\mathbb{R}^N$ and $\{\vec{f}_j\}$ the standard basis of $\mathbb{R}^M$, then the $NM$ simple tensors $\{\vec{e}_i \otimes \vec{f}_j\}_{i,j}$ form a basis of $\mathbb{R}^N \otimes \mathbb{R}^M$, and so $\dim(\mathbb{R}^N \otimes \mathbb{R}^M) = NM$. If $\vec{v} = \sum_i v_i \vec{e}_i$ and $\vec{w} = \sum_j w_j \vec{f}_j$, then
$$\vec{v} \otimes \vec{w} = \sum_{i,j} v_i w_j\, (\vec{e}_i \otimes \vec{f}_j),$$
which recovers the stacking rules seen in the earlier examples and realizes the identification $\mathbb{R}^N \otimes \mathbb{R}^M \cong \mathbb{R}^{NM}$.
Identifying $\mathbb{R}$ with the one-dimensional space spanned by $1$, there are canonical isomorphisms $V \otimes \mathbb{R} \cong V \cong \mathbb{R} \otimes V$ given by $\vec{v} \otimes a \mapsto a\vec{v}$ and $a \otimes \vec{v} \mapsto a\vec{v}$. Hence $\mathbb{R}^N \otimes \mathbb{R}^1 \cong \mathbb{R}^N \cong \mathbb{R}^1 \otimes \mathbb{R}^N$.
Linear maps interact nicely with tensor products. If $A : \mathbb{R}^N \to \mathbb{R}^{N'}$ and $B : \mathbb{R}^M \to \mathbb{R}^{M'}$ are linear, there is a linear map $A \otimes B : \mathbb{R}^N \otimes \mathbb{R}^M \to \mathbb{R}^{N'} \otimes \mathbb{R}^{M'}$ defined by
$$(A \otimes B)(\vec{v} \otimes \vec{w}) = (A\vec{v}) \otimes (B\vec{w}),$$
which in matrix form is the familiar Kronecker product.
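In coordinates this is the Kronecker product of the matrices; a quick check of the defining identity (our own sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((4, 2))  # a linear map R^2 -> R^4
B = rng.random((5, 3))  # a linear map R^3 -> R^5
v = rng.random(2)
w = rng.random(3)

# (A tensor B)(v tensor w) == (A v) tensor (B w)
lhs = np.kron(A, B) @ np.kron(v, w)
rhs = np.kron(A @ v, B @ w)
print(np.allclose(lhs, rhs))  # True
```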
Remark 9 (Associativity of tensor products). For our purposes, it does not matter whether we first form $(V \otimes W)$ and then tensor with $U$ from the right, or first form $(W \otimes U)$ and then tensor with $V$ from the left. There is a canonical identification between
$$(V \otimes W) \otimes U \quad \text{and} \quad V \otimes (W \otimes U),$$
and so we will simply write
$$V \otimes W \otimes U$$
without worrying about parentheses. This scales to many tensor factors. For a vector space $V$ we write
$$V^{\otimes k} := \underbrace{V \otimes \cdots \otimes V}_{k \text{ copies}},$$
which has dimension $(\dim V)^k$ and a basis $\{\vec{e}_{i_1} \otimes \cdots \otimes \vec{e}_{i_k}\}$. We will use this to model multi-part systems: for example, a register of $k$ $N$-ary variables naturally lives in $(\mathbb{R}^N)^{\otimes k} \cong \mathbb{R}^{N^k}$.
As a word of caution, order still matters. As we explained before, in general we have $\vec{v} \otimes \vec{w} \neq \vec{w} \otimes \vec{v}$. When we want to swap the order of a tensor product we will use the linear map $\mathrm{SWAP} : V \otimes W \to W \otimes V$, acting by
$$\mathrm{SWAP} \cdot (\vec{v} \otimes \vec{w}) = \vec{w} \otimes \vec{v}.$$
In summary, associativity lets us ignore parentheses; SWAP lets us reorder factors
when needed.
Going from the abstract back to the concrete, we have the example below:
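As one such concrete example, the following NumPy sketch (our own; the helper `swap_matrix` is hypothetical) builds the matrix of SWAP on $\mathbb{R}^2 \otimes \mathbb{R}^3$ explicitly:

```python
import numpy as np

def swap_matrix(n, m):
    # Permutation matrix S on R^(n*m) with S (v tensor w) = w tensor v.
    S = np.zeros((n * m, n * m))
    for i in range(n):
        for j in range(m):
            # Entry i*m + j of v tensor w holds v_i * w_j;
            # it must land at entry j*n + i of w tensor v.
            S[j * n + i, i * m + j] = 1.0
    return S

v = np.array([1, 2])      # in R^2
w = np.array([3, 4, 5])   # in R^3
S = swap_matrix(2, 3)
print(S @ np.kron(v, w))  # [ 3  6  4  8  5 10]
print(np.kron(w, v))      # the same vector
```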
With some basic tensor product definitions at hand, we can now leverage them
to discuss joint probability distributions in a slick vector space formalism.
Respecting historical tradition,¹ suppose we have two urns, where the first urn has $N$ objects and the second urn has $M$ objects. Suppose that the probability that we select one of the $N$ items in the first urn is described by the probability vector $\vec{p} \in \mathbb{R}^N$, and the probability that we select one of the $M$ items in the second urn is described by the probability vector $\vec{q} \in \mathbb{R}^M$. Then if we select an item from the first urn followed by the second urn, what is the probability that we sampled item $i$ from the first urn and item $j$ from the second urn? The answer is encoded in the tensor product $\vec{p} \otimes \vec{q}$, and in particular its $((i-1)M + j)$th entry:
$$[\vec{p} \otimes \vec{q}\,]_{(i-1)M + j} = p_i\, q_j.$$
We can extract this entry by dotting $\vec{p} \otimes \vec{q}$ against $\vec{e}_i^{\,T} \otimes \vec{e}_j^{\,T}$, namely
$$(\vec{e}_i^{\,T} \otimes \vec{e}_j^{\,T}) \cdot (\vec{p} \otimes \vec{q}) = p_i\, q_j.$$
The vector $\vec{p} \otimes \vec{q}$ is itself a probability vector living in $\Delta_{NM} \subset \mathbb{R}^{NM}$; thus it is a probability distribution on $NM$ outcomes, as we wanted.

¹See Ars Conjectandi by Jacob Bernoulli, published posthumously in 1713.
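A short sketch of this bookkeeping (ours; note that Python indexes from 0, so the text's 1-indexed entry $(i-1)M + j$ becomes index $(i-1)M + (j-1)$):

```python
import numpy as np

p = np.array([0.3, 0.7])       # N = 2 outcomes in the first urn
q = np.array([0.2, 0.5, 0.3])  # M = 3 outcomes in the second urn
joint = np.kron(p, q)          # product distribution on N*M = 6 outcomes

i, j = 2, 3                    # 1-indexed outcomes, as in the text
M = len(q)
print(joint[(i - 1) * M + (j - 1)])  # 0.21
print(p[i - 1] * q[j - 1])           # 0.21 = p_i * q_j
print(np.isclose(joint.sum(), 1.0))  # True: still a probability vector
```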
So far we have examined $\vec{p} \otimes \vec{q}$, which is a product distribution: in our example, the sampling from each of the two urns is uncorrelated. Below we show in an example that convex combinations of tensor products can represent a correlated joint distribution.
Example 7. Suppose the first urn has two items (N = 2), say a ring and a
watch, and the second urn has three items (M = 3), say a tissue, a match, and a
rubber band. The urns were prepared by the ghost of Jacob Bernoulli. We are told
that with probability 1/3 he put a ring in the first urn and a rubber band in the
second urn, and with probability 2/3 he put a watch in the first urn and a match
in the second urn. Then the joint distribution over the urns is described by
$$\frac{1}{3} \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} + \frac{2}{3} \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1/3 \\ 0 \\ 2/3 \\ 0 \end{pmatrix}.$$
This distribution does not factorize into a tensor product of two individual vectors.
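One way to check this numerically (a sketch of our own): reshape the joint vector into an $N \times M$ table. The distribution factorizes as $\vec{p} \otimes \vec{q}$ exactly when that table equals $\vec{p}\,\vec{q}^{\,T}$, i.e., has rank one:

```python
import numpy as np

joint = np.array([0.0, 0.0, 1/3, 0.0, 2/3, 0.0])
table = joint.reshape(2, 3)          # rows: first urn; columns: second urn
print(np.linalg.matrix_rank(table))  # 2, so no product form p tensor q exists
```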
More generally, a vector of the form
$$\sum_{i_1, i_2, \ldots, i_k} r_{i_1 i_2 \cdots i_k}\; \vec{e}_{i_1} \otimes \vec{e}_{i_2} \otimes \cdots \otimes \vec{e}_{i_k}$$
is a joint distribution so long as $r_{i_1 i_2 \cdots i_k} \ge 0$ for all $i_1, i_2, ..., i_k$ and additionally $\sum_{i_1, i_2, ..., i_k} r_{i_1 i_2 \cdots i_k} = 1$. Here we have used multi-index notation, in which we are putting subscripts on subscripts; this is to avoid notation like $\sum_{a,b,c,...} r_{abc\cdots}$, which does not specify the total number of subscripts, which in our case is $k$. (Moreover, there are only 26 letters in the Latin alphabet.) Multi-index notation may initially seem like gross notation, but you will soon grow accustomed to it, like generations have before you.
Joint distributions interface nicely with the $\vec{1}^T$ row vector in a number of ways. For clarity, let us write $\vec{1}_N^T$ to denote the all-ones row vector with $N$ entries. Then we have the nice identity
$$\vec{1}_{N_1}^T \otimes \vec{1}_{N_2}^T \otimes \cdots \otimes \vec{1}_{N_k}^T = \vec{1}_{N_1 N_2 \cdots N_k}^T.$$
We can also use the all-ones row vector to formulate a nice way of computing marginal distributions. Given a subset $S$ of the subsystems $\{1, ..., k\}$, let $M_S = B_1 \otimes B_2 \otimes \cdots \otimes B_k$, where $B_j$ is the $N_j \times N_j$ identity matrix if $j \in S$ and $B_j = \vec{1}_{N_j}^T$ otherwise; applying $\vec{1}^T$ sums over a subsystem and thereby discards it, and so $M_S : \mathbb{R}^{N_1 \cdots N_k} \to \mathbb{R}^{\prod_{j \in S} N_j}$. Then $M_S \cdot \vec{p}$ is the marginal over the subsystems indexed by $S$.
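A sketch of this recipe for $k = 2$ subsystems (ours; the helper `marginal` is hypothetical), applied to the joint distribution of Example 7:

```python
import numpy as np

def marginal(joint, dims, keep):
    # Build M_S = B_1 tensor ... tensor B_k, where B_j is the identity on
    # subsystem j if j is kept and the all-ones row vector otherwise.
    Bs = [np.eye(n) if j in keep else np.ones((1, n))
          for j, n in enumerate(dims)]
    M_S = Bs[0]
    for B in Bs[1:]:
        M_S = np.kron(M_S, B)
    return M_S @ joint

joint = np.array([0.0, 0.0, 1/3, 0.0, 2/3, 0.0])  # Example 7, dims (2, 3)
print(marginal(joint, (2, 3), keep={0}))  # first urn:  [1/3 2/3]
print(marginal(joint, (2, 3), keep={1}))  # second urn: [0, 2/3, 1/3]
```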
To summarize, we have recast ordinary probability theory (on discrete probability spaces) in a linear-algebraic language, which has motivated us to develop the fundamentals of multi-linear algebra and tensor products. This mathematical technology certainly illuminates aspects of multi-linearity lurking in ordinary probability theory. But our true motivation was to set up probability theory in such a way as to make (finite-dimensional) quantum mechanics appear as a natural generalization, using many of the same ingredients. We turn to that generalization in the next section.