Advances in Observational Cosmology

The book on observational cosmology discusses recent advancements in the field, including precision cosmology, gravitational lensing, and upcoming astronomical facilities. It is designed for self-guided students and includes exercises, further reading suggestions, and chapter summaries to aid understanding. Authored by Stephen Serjeant, the text aims to provide a comprehensive introduction to key concepts and current research in cosmology.


Observational cosmology is a rapidly developing field and this book covers some of the breadth of recent developments, such as precision cosmology and the concordance
cosmological model, inflation, gravitational lensing and shear, the extragalactic far-infrared
and X-ray backgrounds, downsizing and baryon wiggles. Forthcoming major facilities are
covered, including radio, X-ray, submm-wave and gravitational wave astronomy. Suggestions
for further reading provide accessible and approachable jumping-off points for students aiming
to further their studies. Produced by Open University academics and drawing on decades of
Open University experience in supported open learning, the book is completely self-contained
with numerous exercises (with full solutions provided). Designed to be worked through
sequentially by a self-guided student, it also includes clearly identified key facts and equations
as well as informative chapter summaries.

Stephen Serjeant is a Reader in Cosmology at The Open University. He led the extragalactic
science case of the SCUBA-2 All Sky Survey, and co-led the active galaxies science theme
of the ATLAS Key Project on the Herschel Space Observatory. Stephen also coordinates the
science faculty’s broadcasting at The Open University and is the lead science academic for the
BBC1 science show Bang Goes the Theory.
Cover image: A multi-wavelength, false-colour view of the M82 galaxy. X-ray data recorded
by Chandra appears in blue; infrared light recorded by Spitzer appears in red; Hubble’s
observations of hydrogen emission appear in orange, and the bluest visible light appears in
yellow-green. Copyright: NASA/JPL-Caltech/STScI/CXC/UofA/ESA/AURA/JHU
Author:
Stephen Serjeant
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Dubai, Tokyo
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
In association with THE OPEN UNIVERSITY
The Open University, Walton Hall, Milton Keynes MK7 6AA, UK
Published in the United States of America by Cambridge University Press, New York.
[Link]
Information on this title: [Link]/9780521157155
First published 2010.
Copyright © The Open University 2010.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, transmitted or utilised in any form
or by any means, electronic, mechanical, photocopying, recording or otherwise, without written permission from the publisher or a
licence from the Copyright Licensing Agency Ltd. Details of such licences (for reprographic reproduction) may be obtained from the
Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS; website [Link]
Open University course materials may also be made available in electronic formats for use by students of the University. All rights,
including copyright and related rights and database rights, in electronic course materials and their contents are owned by or licensed to
The Open University, or otherwise used by The Open University as permitted by applicable law. In using electronic course materials
and their contents you agree that your use will be solely for the purposes of following an Open University course of study or otherwise
as licensed by The Open University or its assigns.
Except as permitted above you undertake not to copy, store in any medium (including electronic storage or use in a website), distribute,
transmit or retransmit, broadcast, modify or show in public such electronic materials in whole or in part without the prior written
consent of The Open University or in accordance with the Copyright, Designs and Patents Act 1988.
Edited and designed by The Open University.
Typeset by The Open University.
Printed and bound in the United Kingdom by Latimer Trend and Company Ltd, Plymouth.
This book forms part of an Open University course S383 The Relativistic Universe. Details of this and other Open University courses
can be obtained from the Student Registration and Enquiry Service, The Open University, PO Box 197, Milton Keynes MK7 6BJ,
United Kingdom: tel. +44 (0)845 300 60 90, email general-enquiries@[Link]
[Link]
British Library Cataloguing in Publication Data available on request.
Library of Congress Cataloguing in Publication Data available on request.
ISBN 978-0-521-19231-6 Hardback
ISBN 978-0-521-15715-5 Paperback
Additional resources for this publication at [Link]/9780521157155
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites
referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents

OBSERVATIONAL COSMOLOGY
Introduction 9
Chapter 1 Space and time 11
Introduction 11
1.1 Olbers’ paradox 11
1.2 Olbers’ paradox in a different way 12
1.3 Metrics: the Universe in a nutshell 13
1.4 Redshift and time dilation 19
1.5 Cosmological parameters 21
1.6 The age of the Universe 25
1.7 The flatness problem 27
1.8 Distance in a warped spacetime 28
1.9 The edge of the observable Universe 31
1.10 Measuring distances and volumes 32
1.11 The fate of the Universe 35

Chapter 2 The cosmic microwave background 40


Introduction 40
2.1 The discovery of the cosmic microwave background 40
2.2 The CMB temperature as a function of redshift 42
2.3 Why is the CMB a black body? 45
2.4 Baryogenesis 46
2.5 The entropy per baryon 47
2.6 Primordial nucleosynthesis: a thousand seconds that shaped
the Universe 47
2.6.1 The primordial fireball 47
2.6.2 The primordial element abundances 50
2.7 The need for new physics 52
2.8 The inflaton field 57
2.9 The primordial density power spectrum 60
2.10 The real music of the spheres 67
2.11 The CMB dipole 70
2.12 The acoustic peaks in the CMB 72
2.13 The Sachs–Wolfe effect 74
2.14 Reionization 76
2.15 Cosmological parameter constraints 76
2.16 The polarization of the CMB 79
2.17 Dark energy and the fate of the Universe 83


Chapter 3 The local Universe 92


Introduction 92
3.1 Evidence for dark matter 92
3.2 The Hubble tuning fork 94
3.3 Spiral galaxies and the Tully–Fisher relation 96
3.4 The fundamental plane of elliptical galaxies 98
3.5 Clusters of galaxies 99
3.6 The Sunyaev–Zel’dovich effect 100
3.7 The morphology–density relation 102
3.8 The Butcher–Oemler effect 104
3.9 The cooling flow problem 105
3.10 The cosmological distance ladder 105
3.11 The large-scale structure of the Universe 108
3.12 Baryon wiggles 116

Chapter 4 The distant optical Universe 120


Introduction 120
4.1 Source counts 120
4.2 Cold dark matter and structure formation 121
4.3 Population synthesis 129
4.4 Photometric and spectroscopic redshifts 132
4.5 Luminosity functions 135
4.6 Active galaxies 139
4.7 Deep-field surveys and wide-field surveys 144
4.8 Morphological K-corrections 154
4.9 The blue cloud and red sequence 154

Chapter 5 The distant multi-wavelength Universe 159


Introduction 159
5.1 The extragalactic optical and infrared background light 159
5.2 Submm galaxies and K-corrections 162
5.3 Ultraluminous and hyperluminous infrared galaxies 166
5.4 Measuring star formation rates 169
5.5 Multi-wavelength surveys 173
5.6 Cosmic star formation history and stellar mass assembly 174
5.7 Downsizing 177
5.8 Feedback in galaxy formation 178

Chapter 6 Black holes 183


Introduction 183
6.1 What are black holes? 183

6.2 The Eddington limit 185


6.3 Accretion efficiency 188
6.4 Cosmic mass density of black holes, ΩBH 193
6.5 Finding supermassive black holes 195
6.5.1 Context 195
6.5.2 Stellar and gas kinematics 196
6.5.3 Megamasers 197
6.5.4 Stellar proper motion 198
6.5.5 Broad iron X-ray emission line 198
6.5.6 Reverberation mapping 199
6.6 The Magorrian relation 201
6.7 The hard X-ray background 203
6.8 Black hole demographics 207
6.9 Observations of black hole growth and the effects of feedback 207
6.10 Merging black holes and gravitational waves 210

Chapter 7 Gravitational lensing 216


Introduction 216
7.1 Gravitational lens deflection 216
7.2 The lens equation 220
7.3 Magnification 222
7.4 The singular isothermal sphere model 227
7.5 Time delays and the Hubble parameter 229
7.6 Caustics and multiple images 231
7.7 Other lens models 236
7.8 Microlensing 237
7.9 Cosmic shear 240
7.10 Galaxy cluster lenses 246
7.11 Finding gravitational lenses 249

Chapter 8 The intervening Universe 253


Introduction 253
8.1 The Lyman α forest 253
8.2 Comparison with cosmological simulations 257
8.3 Ωb and the cosmic deuterium abundance 257
8.4 The column density distribution 258
8.5 Damped Lyman α systems 261
8.6 The proximity effect 266
8.7 ΩH I , the neutral hydrogen density parameter 267
8.8 How big are Lyman α clouds? 269

8.9 Reionization and the Gunn–Peterson test 270


8.10 The Lyman α forest of He II 273
8.11 The first light in the Universe and gamma-ray blazars 275
8.12 The Square Kilometre Array 277
8.13 The CODEX experiment 278

Epilogue 282
Appendix A 283
Appendix B 285
Solutions 291
Acknowledgements 312
Index 318

Introduction
I have gathered a posie of other men’s flowers, and nothing but the thread
that binds them is my own.
Montaigne
Observational cosmology is in a tremendously exciting time of rapid discovery.
Cosmology can be enriching and enjoyable at this level no matter what your
aims are, but my guiding principle for the topics in this book has been: what
would I ideally like a person finishing an undergraduate degree and starting a
PhD in observational cosmology to know? What would represent a balanced
undergraduate introduction that I would like them to have had?
Throughout this book, I’ve tried to give readers enough grounding to appreciate
the current topics in this enormously active and exciting field, and to give some
sense of the gaps — and in some cases chasms — in our understanding. I haven’t
forgotten that the step up to third-level undergraduate study can be difficult and
daunting, so I’ve included further reading sections. Some of the items in these
lists will take you to more leisured introductions and backgrounds to some of the
material that we shall cover. Nevertheless, this book is intended to be fully
self-contained.
I’ve also given some jumping-off points if readers want to go into more
depth. You’ll find these mostly in the further reading sections, but also
in some footnotes and figure captions. There are some references to journal
articles, such as ‘Hughes et al., 1998, Nature, 394, 241’. The first number is
a volume number, and the second is a page number. Many of these can
currently be read online, either in preprint form or as published papers, at
[Link] [Link]. Some of the further reading is
most easily found on the internet, but internet addresses are transitory so I’ve tried
to keep these to a minimum. References to arXiv or astro-ph reference numbers
are to the preprint server, currently at [Link] or various worldwide
mirrors. Entering the article identification in the search there usually results in
the paper. Sometimes the further reading section will point to more advanced
material, beyond the normal scope of an undergraduate degree. I’ve chosen to do
this partly in order to ease the transition from undergraduate to postgraduate level
for those of you who are on that track. The online abstract service also has the
facility to list later papers that have cited any given paper, so it’s very useful for
literature reviews. At every stage, each level is a big step up from the previous one
and the transition can be difficult. I don’t intend this book to be a postgraduate
textbook, but if I can ease the transition to that level, then all to the good.
Inevitably the selection of topics betrays my own biases and interests, and there
are undoubtedly many exciting areas not covered. But the biggest problem is that
this is a fast-paced field with lots of exciting and rapid developments. Some future
advances are foreseeable, such as gravitational wave astronomy or the Square
Kilometre Array, and I can give tasters for what these fabulous new facilities
promise, so this book should keep its relevance for a few years at least. As I write
this, the Herschel and Planck satellites are waiting to be launched in French Guiana. However, the ‘unknown unknowns’ I can do nothing about. This is the
mixed blessing of writing a book during the golden age of cosmology.


Finally, I would like to thank David Broadhurst, Mattia Negrello, Andrew Norton,
Robert Lambourne, Jim Hague and Carolyn Crawford for their critical readings of
early drafts of this book. Any errors that I somehow managed to sneak through
their careful ministrations are down to me alone. I would also like to thank the
editors and artists at The Open University for turning my scribbles into something
beautiful.

Chapter 1 Space and time
God does not care about our mathematical difficulties. He integrates
empirically.
Albert Einstein

Introduction
How did the Universe begin? How big is the observable Universe? Why is the
night sky dark? What will the Universe be like in the year one trillion? What is
the ultimate fate of the Universe? This chapter will answer these questions and
more, and give you the tools that you need for understanding modern precision
cosmology.
Although it’s not necessary for you to have met special relativity and the
Robertson–Walker metric before, you may find that we take these subjects at a
fast pace in this chapter if these are new topics to you. If so, you may find
Appendix B on special relativity helpful, or you might try a more comprehensive
introduction to expanding spacetime metrics such as that in Robert Lambourne’s
Relativity, Gravitation and Cosmology (see the further reading section).

1.1 Olbers’ paradox


In 1823, the German astronomer Heinrich Wilhelm Olbers asked a profound question: if the Universe is infinite, then every line of sight should end on a star, so why isn't the night sky as bright as the Sun? The fact that these stars are further away doesn't help, as we'll show.

At a distance r from the Sun, its light is spread evenly over a sphere with an area 4πr², as shown in Figure 1.1. If the Sun has luminosity L, then the energy flux S from the Sun must be

S = L/(4πr²),    (1.1)

i.e. S ∝ L/r². Meanwhile, if the diameter of the Sun is D, then the angular diameter θ of the Sun (see Figure 1.2) will be given by

θ ≈ tan θ = D/r,

where the approximation θ ≈ tan θ is valid for small angles measured in radians. Therefore we have

θ ∝ D/r.

So the Sun's angular area on the sky (in, for example, square degrees or steradians) must be proportional to (D/r)². Therefore the surface brightness (flux per unit area on the sky, e.g. per square degree) is proportional to (L/r²)/(D/r)² = L/D², which is a constant independent of r. So if all stars are like the Sun (i.e. similar luminosities and diameters), then all stars should have surface brightnesses similar to that of the Sun. If every line of sight ends on a star, then the whole sky should be about as bright as the Sun.

Figure 1.1 A sphere of radius r surrounding the Sun.

Figure 1.2 The angle θ varies approximately as D/r.
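The distance-independence of surface brightness is easy to check numerically. The following minimal sketch (my own illustration, not from the book) uses approximate SI values for the solar luminosity and diameter and evaluates flux divided by angular area for Sun-like stars at several distances:

```python
# For Sun-like stars at increasing distances, flux and angular area both fall
# as 1/r^2, so their ratio -- the surface brightness -- stays constant.
import math

L = 3.828e26   # approximate solar luminosity in watts
D = 1.392e9    # approximate solar diameter in metres

def surface_brightness(r):
    """Flux divided by angular area, for a Sun-like star at distance r."""
    flux = L / (4 * math.pi * r**2)   # Equation 1.1: S = L/(4 pi r^2)
    angular_area = (D / r)**2         # proportional to the star's solid angle
    return flux / angular_area        # proportional to L/D^2, independent of r

one_pc = 3.086e16  # one parsec in metres
values = [surface_brightness(n * one_pc) for n in (1, 10, 100)]
# All three values agree: surface brightness does not depend on distance.
```

The three computed values are identical to within floating-point rounding, which is the numerical statement of the L/D² result above.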

1.2 Olbers’ paradox in a different way


Here is a different approach to the same problem. Suppose that there are ρ stars per unit volume, in a Universe that's homogeneous (the same seen from every point) and isotropic (no preferred direction). How many stars have fluxes in the range S to S + dS? (Here, dS can be thought of as a limitingly-small¹ increment of S, which we use in preference to δS.) First, let's assume for now that all stars are identical, and have luminosity L. Consider a radial shell around the Sun, with radius r and thickness dr (Figure 1.3). The volume of this shell is the area times the thickness, or 4πr² dr. (Another way of finding the volume of the shell is to subtract (4/3)πr³ from (4/3)π(r + dr)³, and neglect terms of the order (dr)².) The number of stars in this shell is ρ times the volume of the shell:

dN = ρ × 4πr² dr ∝ ρr² dr.    (1.2)

(We're assuming a flat space for now, known as Euclidean space — curved spaces will come later.) The flux S of a star varies with distance according to Equation 1.1, which implies that

dS/dr ∝ L r⁻³.

So the number of stars with fluxes between S and S + dS is

dN = (dN/dS) dS = (dN/dr)(dr/dS) dS ∝ ρr² × (r³/L) dS ∝ r⁵ dS.    (1.3)

This dN is the same as in Equation 1.2; all stars have the same luminosity L, so Equation 1.1 gives an exact one-to-one correspondence between S and r, so the interval (r, r + dr) corresponds exactly to an interval (S, S + dS).

We can also write the last result as dN/dS ∝ r⁵. Now, by rearranging Equation 1.1 we have that

r = (L/(4πS))^(1/2) ∝ S^(−1/2)

and therefore

(dN/dS) dS ∝ S^(−5/2) dS.

So we find that

dN/dS ∝ S^(−5/2).    (1.4)

Relations of this kind are known in cosmology as number counts or source counts, and play an important role, as we'll see. dN/dS is the number of stars dN in a flux interval dS, which is a slightly different way of regarding the rate of increase of N with respect to S. The general form y ∝ x^a is sometimes called a power law, with a in this case being the power law index. Here, dN/dS is a power law function of S, with a power law index of −5/2.

Figure 1.3 A shell of radius r and thickness dr around the Sun.

¹ The infinitesimal quantity dS, which we refer to vaguely as a ‘limitingly-small’ version of δS, can be defined more rigorously using a mathematical discipline called non-standard analysis. This takes us beyond the scope of this book, but if you are concerned by manipulating infinitesimals no differently to other algebraic symbols, try, for example, H. Jerome Keisler's book Elementary Calculus: An Approach Using Infinitesimals, which is available online.
Now, we've assumed that all the stars are identical, but suppose instead that there are several types of star, each with a different luminosity Lᵢ and number density ρᵢ, with i = 1, 2, 3, . . .. Each type of star will have its own number counts dNᵢ/dS = kᵢ S^(−5/2), where kᵢ is some constant specific to type i. The total number counts will still obey a −5/2 power law:

dN/dS = Σᵢ dNᵢ/dS = Σᵢ kᵢ S^(−5/2) = (Σᵢ kᵢ) S^(−5/2) ∝ S^(−5/2).

So any homogeneous, isotropic population of stars produces a −5/2 power law for number counts. But this leads to a profound problem: the total flux of stars brighter than S₀ is

S_total = ∫_{S₀}^{∞} S (dN/dS) dS ∝ ∫_{S₀}^{∞} S^(−3/2) dS ∝ S₀^(−1/2),

which diverges as S₀ tends to zero. So the sky should be infinitely bright!
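The Euclidean counts can also be checked by simulation. Here is a small Monte Carlo sketch (my own illustration, with arbitrary units, not from the book): identical stars scattered uniformly in a sphere should give cumulative counts N(>S) ∝ S^(−3/2), which is the integrated form of dN/dS ∝ S^(−5/2):

```python
# Scatter identical stars uniformly in a sphere, compute their fluxes, and
# check that the cumulative source counts follow the Euclidean S^(-3/2) law.
import math
import random

random.seed(42)
L = 1.0            # luminosity in arbitrary units
n_stars = 200_000
r_max = 100.0

# Uniform number density: radii must be drawn with probability density ~ r^2,
# which the inverse-transform u^(1/3) sampling below provides.
radii = [r_max * random.random() ** (1.0 / 3.0) for _ in range(n_stars)]
fluxes = [L / (4 * math.pi * r * r) for r in radii]   # Equation 1.1

def counts_brighter_than(s):
    """Cumulative counts N(>S)."""
    return sum(1 for f in fluxes if f > s)

# Compare N(>S) at two flux limits well inside the sampled range: the ratio
# should be close to (S1/S2)^(-3/2) = 8 for these choices.
s1, s2 = 1e-4, 4e-4
ratio = counts_brighter_than(s1) / counts_brighter_than(s2)
expected = (s1 / s2) ** (-1.5)
```

With 200 000 stars the measured ratio agrees with the expected value of 8 to within a few per cent of Poisson scatter, illustrating that any uniform population reproduces the −3/2 cumulative (−5/2 differential) slope.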

Exercise 1.1 First, we’ve argued that a homogeneous, isotropic Universe gives
you a sky as bright as the Sun. Next, we’ve argued that the sky is infinitely bright
in a homogeneous, isotropic Universe. They can’t both be true, and it’s not a
mistake in the algebra, so what’s different in our assumptions? ■
The night sky is a long way from being as bright as the Sun, and is certainly not
infinitely bright. So what is the answer to Olbers’ profound question? It’s not that
the Universe is opaque — in fact, as we shall see later in this book, the Universe is
surprisingly transparent at optical wavelengths. It’s also not that stars have finite
lifetimes, because that doesn’t stop lines of sight ending on a star eventually and
inevitably.
Part of the answer is that the Universe is only finitely old. Edgar Allan Poe was
the first to point out this solution, in his 1848 book Eureka: a Prose Poem. But
another part of the answer is that we don’t live in a static, flat space. Rather, we
live in a curved, expanding spacetime, which we shall meet in the next section.

1.3 Metrics: the Universe in a nutshell


We’re about to commit possibly the greatest ever act of hubris: to describe the
geometry of the Universe in a single equation. To do this, we’ll need to remind
you of a few preliminaries and notations. Pythagoras's theorem is

H² = x² + y²

for a right-angled triangle with length x, height y and hypotenuse H. In three dimensions this is just

H² = x² + y² + z².

So, if two points in space are separated by (δx, δy, δz), their separation δL is given by

(δL)² = (δx)² + (δy)² + (δz)².

In pre-relativistic physics, if one observer measures the separation to be δL, then all observers measure the same δL regardless of how they are moving.
This also has all observers agreeing over the passage of time: a separation in
time of δt of two events is the same for all observers. (Event is used to mean
a point in both space and time.) The whole aim of fundamental physics is
to describe the workings of the Universe in an observer-independent way, so
these observer-independent quantities, called invariants, often play a central
role. This is why it’s meaningless to say that the laws of physics are different
somewhere else in the Universe: if they’re different somewhere else, they weren’t
fundamental laws in the first place. Many quantities in physics owe almost all
their interest to the fact that they are conserved in all (or nearly all) situations:
energy, momentum, angular momentum, baryon number, lepton number,
strangeness, isospin.
Having said that, there is currently no consistent theory of everything. The best
description of the very small, quantum physics, contradicts the best theory of the
very large, general relativity. We proceed in the hope that the apparently-invariant
quantities discovered so far will lead us closer to the underlying workings of the
Universe . . .
In Einstein’s special relativity, neither spatial separations nor time intervals are
invariant, but there is a combined spacetime interval that is invariant:
(δs)² = (c δt)² − (δx)² − (δy)² − (δz)²,    (1.5)
where c is the speed of light in a vacuum. The coefficients on the right-hand side
(in this case +1, −1, −1, −1) are known as the metric coefficients. (Note
that some books choose to use a (−1, 1, 1, 1) metric instead.) Together, these
coefficients make up the metric tensor (see Appendix B); we shall discuss
tensors later in this book.
The metric of special relativity has many consequences with which you should be
familiar, such as time dilation, Lorentz contraction, Lorentz transformations and
the non-universality of simultaneity. If you need reminders, Appendix B gives a
very brief summary of special relativity. Freely falling particles move on paths for
which the total interval s is a maximum along that path, similarly to Fermat’s
principle in optics. These optimal paths are known as geodesics.

Worked Example 1.1


Two ticks of a watch are separated by (δt, 0, 0, 0) in the frame of the watch, with δt = 1 second. The watch is moving at a constant velocity v relative to an observer, for whom the ticks are separated by (δt′, δx′, δy′, δz′). Using Equation 1.5 or otherwise, show that δt′ = γ(v) δt with γ = (1 − v²/c²)^(−1/2), and calculate the spacetime interval δs between the ticks.
If I put the watch on a piece of string and whirl it around my head, and it
takes half a second (according to the watch) to go round once, what is the
total spacetime interval of the watch’s world-line of one orbit? (A world-line
is the set of events that trace the path of an object through spacetime.)


Solution
Without loss of generality, we can choose coordinates in which the watch is moving along the x′-axis, so δy′ = δz′ = 0. Now v = dx′/dt′, and from δs′ = δs we have (c δt)² = (c δt′)² − (δx′)². If we divide by (c δt′)², we find

c²(δt)²/(c²(δt′)²) = 1 − (δx′)²/(c²(δt′)²) = 1 − (dx′)²/(c²(dt′)²),

so

(δt/δt′)² = 1 − (dx′/(c dt′))² = 1 − v²/c²,

hence

δt′ = γ(v) δt.

For the second part, δs = c δτ, where τ is the proper time, i.e. the time measured by the watch. Here, we have δτ = δt, so the total interval for one orbit is c × (0.5 s), i.e. 0.5 light-seconds (about 1.5 × 10⁸ metres). When the watch is being whirled around, τ is measured in an accelerating frame, but it's always true that δs = c δτ, so the total interval is still 0.5 light-seconds.
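The invariance used in this worked example can be verified numerically. This is my own sketch (not from the book), with an assumed illustrative watch speed of 0.6c:

```python
# Check that the spacetime interval between two ticks is the same in the
# watch frame and the observer's frame, and that delta_t' = gamma * delta_t.
import math

c = 299_792_458.0     # speed of light, m/s
v = 0.6 * c           # assumed watch speed, chosen for illustration
dt = 1.0              # one tick in the watch's frame, seconds

gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
dt_obs = gamma * dt   # tick separation measured by the observer
dx_obs = v * dt_obs   # the watch moves between ticks in the observer's frame

# Interval computed in the watch frame: (c dt)^2, since dx = 0 there.
interval_watch = (c * dt) ** 2
# Interval computed in the observer's frame: (c dt')^2 - (dx')^2.
interval_obs = (c * dt_obs) ** 2 - dx_obs ** 2
# The two agree: (delta s)^2 is invariant, as Equation 1.5 requires.
```

At v = 0.6c the Lorentz factor is exactly 1.25, and the two interval calculations agree to floating-point precision.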

Worked Example 1.2


What is the total spacetime interval between any two points on a light ray,
and how does the interval relate to causality in the Universe? (Hint: If events
can be connected only by a signal travelling faster than light, they cannot be
in causal contact with each other.)

Solution
The spacetime interval between any two points on a light ray is zero. The
connection to causality is best illustrated in the lightcone diagram shown in
Figure 1.4. An event at the origin can send a message at light speed or
slower to any event in the future lightcone. Similarly, any event in its past
lightcone could have affected it. The spacetime intervals between the origin
and these events are time-like intervals, (δs)² > 0. Events outside the
lightcone cannot affect, or be affected by, the event at the origin. The
spacetime intervals between the origin and these events are space-like, i.e.
(δs)² < 0. Points on the lightcone have exactly zero spacetime interval
between one another. The δs = 0 intervals are sometimes referred to as null.
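The three cases in this solution can be summarised in a few lines of code. This is my own illustration (not from the book), working in units where c = 1 so that Equation 1.5 reads (δs)² = (δt)² − (δx)² − (δy)² − (δz)²:

```python
# Classify the spacetime interval between two events, in units where c = 1.
def classify_interval(dt, dx, dy=0.0, dz=0.0):
    """Return 'time-like', 'space-like' or 'null' for a given separation."""
    ds2 = dt**2 - dx**2 - dy**2 - dz**2   # (delta s)^2 from Equation 1.5
    if ds2 > 0:
        return "time-like"    # inside the lightcone: causal contact possible
    if ds2 < 0:
        return "space-like"   # outside the lightcone: no causal contact
    return "null"             # on the lightcone: connected by a light ray

# Two points on a light ray have dt = dx, so their interval is null.
examples = [classify_interval(2.0, 1.0),
            classify_interval(1.0, 2.0),
            classify_interval(1.0, 1.0)]
```

The three example separations land inside, outside and on the lightcone respectively, matching Figure 1.4.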

Exercise 1.2 The highest-energy cosmic rays have energies of 10²⁰ eV and above. Most cosmic rays are protons, with rest masses of 938.28 MeV/c². The diameter of our Galaxy is roughly 100 000 light-years. Calculate how long it would take to cross the Galaxy, according to the highest-energy cosmic rays. (Hint: You don't need to know the conversion between eV and joules, nor do you need the conversion between light-years and metres.) ■


We can also describe the invariant spacetime interval in spherical coordinates, in infinitesimals:

ds² = (c dt)² − dr² − (r dθ)² − (r sin θ dφ)²,

where ds² means (ds)². These coordinates are shown in Figure 1.5.

Figure 1.4 The lightcone in special relativity. The point at position (δx, c δt) has c δt > δx, so (c δt)² − (δx)² > 0, implying that (δs)² > 0, meaning that the invariant interval between that point and the origin is time-like.

Figure 1.5 The position of a point can be specified in terms of Cartesian coordinates (x, y, z) or in terms of spherical coordinates (r, θ, φ).

To describe an expanding Universe, we could modify the metric by multiplying the spatial parts with a time-dependent expansion factor:

ds² = c² dt² − R²(t) (dr² + r² dθ² + r² sin²θ dφ²),

where R(t) is called the scale factor of the Universe. A schematic representation of this is shown in Figure 1.6.

In fact, the most general homogeneous, isotropic metric is

ds² = c² dt² − R²(t) [dr²/(1 − kr²) + r² dθ² + r² sin²θ dφ²],    (1.6)

where the constant k determines whether the Universe is spatially flat (k = 0), spherical (k = +1) or hyperbolic (k = −1). (We use only these three values of k because other values can be found by rescaling R and r; for example, if k = −3, then the substitutions r′ = r√3 and R′ = R/√3 give an equation of the same form as Equation 1.6 for k = −1.) Figure 1.7 illustrates some two-dimensional surfaces in which k = +1, 0 or −1, to give you some intuition (if not actually a visualization) of the three-dimensional counterparts. You might reasonably object that these two-dimensional representations oversimplify the situation. In physics (and general relativity especially) it's often easier to describe something mathematically than it is to visualize it; physics makes tremendous demands on the imagination. Perhaps the human brain doesn't have the cognitive machinery to be able to visualize curved expanding spacetimes.

Figure 1.6 A cubical volume of the Universe. The length of the side l expands with the scale factor of the Universe, so l(t₂) = (R(t₂)/R(t₁)) l(t₁).

Figure 1.7 Curved surfaces may have geodesics that start parallel, but don't remain parallel. Also, the angles of a triangle need not add up to 180°, nor is the circumference of a circle necessarily 2π times the radius. The spherical model has k = +1, the flat model has k = 0 and the saddle-shaped model has k = −1.
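The rescaling argument below Equation 1.6 can be checked numerically. The following is my own sketch (not from the book), with arbitrary illustrative values: it evaluates the spatial part of the metric with k = −3, then again in the rescaled coordinates with k = −1, and the two line elements agree.

```python
# Verify that r' = r*sqrt(3), R' = R/sqrt(3) turns a k = -3 metric into a
# k = -1 metric of the same form (spatial part of Equation 1.6 only; the
# dphi term behaves like the dtheta term and is omitted here).
import math

def spatial_line_element(R, k, r, dr, dtheta):
    """R^2 [dr^2/(1 - k r^2) + r^2 dtheta^2], from Equation 1.6."""
    return R**2 * (dr**2 / (1 - k * r**2) + r**2 * dtheta**2)

R, r = 2.0, 0.3            # arbitrary scale factor and comoving radius
dr, dtheta = 1e-3, 1e-3    # small coordinate increments

# Original coordinates with k = -3 ...
original = spatial_line_element(R, -3, r, dr, dtheta)
# ... and rescaled coordinates with k = -1.
r_new, dr_new = r * math.sqrt(3), dr * math.sqrt(3)
rescaled = spatial_line_element(R / math.sqrt(3), -1, r_new, dr_new, dtheta)
# The two line elements are equal: only k = 0, +1, -1 are needed.
```

The agreement holds for any choice of R, r and the increments, which is why only the three canonical values of k need to be considered.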

Equation 1.6 is known as the Robertson–Walker metric or sometimes as the Friedmann–Robertson–Walker metric. Because this metric is spatially
homogeneous and isotropic, any point can be chosen for the origin r = 0. We
sometimes refer to t as coordinate time or cosmic time. This metric has preferred
inertial reference frames in which the expansion of the Universe is isotropic. It’s
in such a frame that the coordinates t, r, θ, φ are measured. We’ll refer to this
reference frame as the cosmic rest frame, and it’s assumed to be equivalent to the
frame of the cosmic microwave background, of which more later. If we took
an observer in the cosmic rest frame and applied a velocity boost (a Lorentz
transformation) to see what the expanding Universe would look like from a
moving observer’s point of view, we’d find that it no longer looked isotropic.
Equation 1.6 is our hubristic attempt to describe the Universe in one line, and we
shall meet this equation many times in this book.

It may now be worth discussing some frequent misconceptions about the Robertson–Walker metric.
● If the Universe is expanding, why am I not getting taller?
❍ Your head is not free-floating from your feet; your body is bound by chemical
bonds.
● Why does the Earth not drift away from the Sun as the Universe expands?
❍ Equation 1.6 describes a Universe that is exactly homogeneous and isotropic.
Locally, though, that’s obviously not right. Within the Solar System the local
metric is not the Robertson–Walker metric, because the gravitational field of
the Sun dominates. Equation 1.6 is an approximation that becomes
increasingly good at larger and larger scales, but on small scales spacetime
clearly has much more structure. When we consider the collapse of density
perturbations in Chapter 4, we’ll see that it’s wrong to think of the Earth
feeling a gentle tug from the expansion of the Universe, which is
overwhelmed by the attraction from the Sun.
● What is the Universe expanding into?
❍ In general relativity, the intrinsic curvature of spacetime (including the
expansion of space) can be specified using measurements within the
spacetime, using geodesics. This means that if we want to describe the
curvature of spacetime, we don’t need to embed spacetime in a
higher-dimensional space. But if we don’t need to refer to higher-dimensional
space, and it doesn’t affect anything that we can measure, do we need to
hypothesize its existence at all? In any case, we would have no reason to
believe that this higher-dimensional space is flat anyway. So, we don’t have
evidence that the Universe is expanding ‘into’ anything at all — it is simply
expanding.
● Where is the middle of the Universe, which everything exploded out of?
❍ This is a widely-held misconception that perhaps dates back to Lemaître’s
phrase ‘the primeval atom’. We frequently illustrate the expansion of space
with an analogy of an expanding balloon (such as in Figure 1.8), but what is
rarely pointed out is that the radial coordinate is time-like, not space-like. The
explosion started at the centre, but this was at the beginning of time in the
Robertson–Walker universe, not in a particular location. In fact, one might
reasonably say that it occurred everywhere, at every point in space. Also, if
the Universe is hyperbolic (k = −1), then the Universe may be infinite. If so,
then even in the earliest moments of the history of the Universe, there would
still be infinitely more matter outside any given volume than inside that volume.
Figure 1.8 The balloon metaphor for the expanding Universe. Note that the objects on the balloon don’t themselves expand.
● What caused the Universe to fling itself apart in the Big Bang?
❍ This question suggests an input of energy, flinging matter from an initial state of rest, but this is not how the field equations of general relativity are
formulated. Classically at least, the answer is that these are just the initial
conditions to Einstein’s field equations. But just saying ‘that’s how it started’
is clearly a very unsatisfactory answer, and arguably avoiding the question.
One attraction of the theory of inflation is that it gives a mechanism for this
initial expansion; we shall meet inflation in later chapters.

Note that Equation 1.6 defines a preferred reference frame, in which the expansion
is isotropic. As we shall see, this is well-supported by observations both of the
large-scale structure of the galaxy distribution, and of the cosmic microwave
background. Nevertheless, is it possible to conceive of a universe consistent with
Einstein’s field equations in which there are no preferred reference frames? One
possibility is a fractal structure, and we shall meet this in later chapters when
discussing inflation.
The field equations of Einstein’s general relativity determine both k and R(t). These equations can be shown to yield
(dR/dt)² = Ṙ² = (8πG/3)(ρm + ρr)R² − kc² + Λc²R²/3, (1.7)
d²R/dt² = d(dR/dt)/dt = R̈ = −(4πG/3)(ρm + ρr + 3p/c²)R + Λc²R/3, (1.8)
where ρm is the average density of the matter in the Universe, ρr is an equivalent matter density for radiation (derived using E = mc²), G is Newton’s gravitational constant, p is the pressure of the matter and radiation, and R is a function of time, R = R(t), though we drop the function notation for clarity and brevity. These equations are known as the Friedmann equations. Both the densities and p also vary with time. Λ is known as the cosmological constant, and features in Einstein’s field equations for general relativity. Physically, it represents an in-built tendency of space to expand (or, for Λ < 0, contract). Some special cases are fairly simple: for example, if k = Λ = 0, then R(t) ∝ t^(2/3) in a matter-dominated universe, or R(t) ∝ t^(1/2) in a radiation-dominated universe.
(Proving these equations would take us a long way outside the scope of this book into what is usually graduate-level physics, but if you wish to pursue this rewarding path you might try, for example, Relativity, Gravitation and Cosmology by R. Lambourne for an advanced undergraduate-level introduction, or General Relativity: An Introduction for Physicists by M.P. Hobson, G.P. Efstathiou and A.N. Lasenby.)
We can derive Equation 1.8 from Equation 1.7 by differentiation. This will give us
a term involving d(ρm + ρr )/dt, but we could treat a part of the Universe as a box
of gas, and use the conservation of energy to show that the change in energy
density equals the p dV work, i.e. d((ρm + ρr )c2 R3 ) = −p d(R3 ). Therefore
d((ρm + ρr )c2 R3 )/dt = −p d(R3 )/dt.

Exercise 1.3 Derive Equation 1.8 from Equation 1.7, using the conservation
of energy. ■
(The issue of p dV work is slightly more subtle in general relativity, since it’s
not immediately clear what the work is done against, but the full relativistic
calculation gives the same result.)
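The matter-dominated special case quoted above, R(t) ∝ t^(2/3) for k = Λ = 0, is easy to verify numerically from Equation 1.7. The following sketch (the function name, starting point and step size are our own choices) integrates da/dt = √(1/a) in units where H0 = 1, and recovers a = 1 at t = 2/(3H0), as the t^(2/3) law implies:

```python
import math

def matter_dominated_age(a_end=1.0, a0=1e-4, dt=5.0e-6):
    """Integrate Equation 1.7 for k = 0, Lambda = 0, matter only, with H0 = 1:
    da/dt = sqrt(1/a), since rho_m scales as a^-3.
    Returns the time at which the scale factor reaches a_end."""
    a = a0
    t = (2.0 / 3.0) * a0 ** 1.5      # analytic a(t) = (3t/2)^(2/3) near t = 0
    while a < a_end:
        k1 = math.sqrt(1.0 / a)                      # midpoint (RK2) step
        k2 = math.sqrt(1.0 / (a + 0.5 * dt * k1))
        a += dt * k2
        t += dt
    return t

t0 = matter_dominated_age()   # should be close to 2/3, i.e. t0 = 2/(3 H0)
```

The analytic solution is a(t) = (3t/2)^(2/3), so the numerical age should approach 2/3 in these units.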
In the next few sections, we shall explore some of the surprising aspects of this
expanding spacetime model, before returning to Olbers’ profound paradox in the
next chapter.

1.4 Redshift and time dilation


Imagine two photons being emitted at an epoch t1 when the scale factor of the
Universe was R1 . Suppose also that the photons were emitted a short time δt
apart. They arrive at the Earth now at time t0 when the scale factor is R0 . When
the photons were emitted at t1 , the distance between them was c δt. Now, the
distance between them is (R0 /R1 )c δt, i.e. stretched by a factor R0 /R1 due to the


expansion of the Universe. The second photon will arrive (R0 /R1 ) δt later than
the first. This implies that distant clocks in the Robertson–Walker universe appear
time-dilated by a factor R0 /R1 , sometimes called cosmological time dilation.
We’ll find it useful to define the dimensionless scale factor a as
a = R1/R0, (1.9)
so a = 1 today and a < 1 in the past.
A similar argument applies to the photons themselves. Treating them this time as
waves, the distance between two peaks of a light wave will be expanded by the
same factor R0 /R1 , i.e. the wavelength is longer, and the light is shifted to the
red. We define redshift (symbol z) using
1 + z = R0/R1 = 1/a = λobserved/λemitted, (1.10)
where λobserved is the observed wavelength of the photon, and λemitted is the original photon wavelength when the light was emitted. Sometimes this is written as
z = (λobserved − λemitted)/λemitted. (1.11)
A high redshift means that there has been a big increase in the expansion factor
since the light was emitted. Redshift is sometimes misleadingly referred to as
‘recession’, since a receding object would have a Doppler shift to the red. Indeed, galaxies are not stationary relative to each other, but have relative velocities of up to about 1000 km s⁻¹. In astronomy these are usually known as peculiar velocities,
and these will indeed contribute both blue and red Doppler shifts. However,
cosmological redshift swamps these effects at distances beyond about 100 Mpc,
and you should not confuse Doppler shifts with the redshift from cosmological
expansion. The distance between us and a distant galaxy is getting bigger because
of the expansion of the Universe, but this is a physically distinct situation to a
galaxy moving away in a flat, non-expanding spacetime.
One alternative to the Robertson–Walker model is the ‘tired light’ universe,
proposed by Fritz Zwicky in 1929. In this model, redshift is due to photons
gradually losing energy during their passage through the universe, due to some
interaction with intervening matter. There are many observations that are difficult
to reproduce in this model, but in particular, the experimental detection of
cosmological time dilation has made this interpretation untenable. Figure 1.9
shows the decay times of supernovae as a function of redshift, which show exactly
the (1 + z) time dilation predicted by theory.
But to measure redshifts, we need to know λemitted . This can be done using atomic
or molecular transitions that occur at particular quantized energies, and so involve
the emission or absorption of photons with particular quantized wavelengths. If
we can identify the transition in the distant object, we know λemitted , provided that
atoms behaved in the same way in the early Universe.
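For example, once a line such as [O III] 500.7 nm is identified in a spectrum, Equation 1.11 gives the redshift directly, and 1 + z gives the time dilation factor. A minimal sketch (the helper name and line list are ours; rest wavelengths as quoted with Figure 1.10):

```python
# Rest-frame wavelengths in nm (values quoted with Figure 1.10)
REST_NM = {"Halpha": 656.3, "Hbeta": 486.1, "OIII": 500.7}

def redshift(lam_observed, lam_emitted):
    """Equation 1.11: z = (lambda_observed - lambda_emitted) / lambda_emitted."""
    return (lam_observed - lam_emitted) / lam_emitted

# [O III] 500.7 nm observed at ~807.1 nm gives z ~ 0.612,
# and clocks in that galaxy appear slowed by the factor 1 + z.
z = redshift(807.1, REST_NM["OIII"])
dilation = 1.0 + z
```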
But did they? And if not, how could we tell? It turns out that many characteristic
atomic and molecular transitions can easily be recognized at high redshifts (see,
for example, Figure 1.10), so any differences must be fairly subtle. If the strength
of the electromagnetic interaction were different at early cosmic epochs, this

would change the atomic fine structure constant α = e²/(4πε0ℏc) ≈ 1/137. The fractional difference in wavelength (δλ/λ) between a pair of relativistic fine structure lines is proportional to α², so changes in α should lead to wavelength shifts between some emission lines in distant cosmological objects.
Figure 1.9 Supernova decay rates plotted against redshift z, in the nearby Universe (top panel, out to z ≈ 0.03) and in the high-redshift Universe (bottom panel, out to z ≈ 0.6). The dashed line shows the tired light prediction of no time dilation, and the red line shows the (1 + z)⁻¹ variation of decay rate expected in the Robertson–Walker metric. The high-redshift data strongly support the expanding universe model; the measured variation is (1 + z)^(−0.97±0.10).
So far, comparisons of the atomic and molecular transitions in the early Universe
with laboratory experiments have not yielded any uncontested evidence for α
being any different in the early Universe; some claimed detections of changes in
α have not been corroborated by other experiments, and it is clear that the
experiments are both very difficult and prone to systematic errors. In terrestrial
laboratories, α̇/α = (−2.6 ± 3.9) × 10−16 per year, i.e. consistent with no
change. However, it remains possible that ongoing cosmological experiments will
make a ground-breaking detection of a change in α. Some speculative theories
allow for possible changes in α, such as supersymmetry or M-theory. However,
these theories do not (or don’t yet) predict the specific variations in α with
redshift that some groups have claimed, in any unique and unforced way.

1.5 Cosmological parameters


How fast is the Universe expanding?
Suppose that the distance between us and a distant galaxy is Y = D × R, where R is the current scale factor of the Universe, and D is some constant. By differentiating this, we find that Y is increasing at a rate
dY/dt = D dR/dt


Figure 1.10 Spectra of objects in the local Universe and in the high-redshift Universe, showing many of the
same characteristic spectral features. The y-axes in all the spectra are relative flux. The top left panel is a planetary
nebula in our Galaxy, M57. The top right panel is an H II region in the Virgo cluster of galaxies. The bottom left
panel shows the spectrum of a star-forming galaxy at a redshift of z = 0.612 (the ⊕ symbol marks absorption from
the Earth’s atmosphere), and the bottom right panel shows another star-forming galaxy at a redshift of z = 2.225.
In all cases there is the characteristic [O III] emission line doublet at a rest-frame wavelength of 495.9, 500.7 nm, as
well as other emission lines such as Hα 656.3 nm, Hβ 486.1 nm, [N II] 654.8, 658.4 nm. The bottom right panel
has the emission lines redshifted into the near-infrared range, in which only certain regions of the spectrum are
available, for reasons of atmospheric transparency.

because of the expansion of the Universe. But D = Y/R, so
dY/dt = (Y/R)(dR/dt) = Y × (1/R)(dR/dt) = Y × H,
where
H = Ṙ/R (1.12)
is known as the Hubble parameter, whose current value is known as H0. If we regard dY/dt as an apparent recession velocity v, then we have
v = Y × H, (1.13)


i.e. the apparent recession velocity v is proportional to distance Y, but recall the
warnings in Section 1.4. Sometimes this apparent flow is called the Hubble flow.
Beware: H is frequently (but misleadingly) known as the Hubble constant
(Hubble constant is a fair description of H0 , however). Although it’s virtually
constant over our lifetimes, it certainly isn’t constant over the history of the
Universe. In some sense, H0 is a measure of the current expansion rate of the
Universe, and it has the value 72 ± 3 km s−1 Mpc−1 , or about 2 × 10−18 s−1 . This
is also sometimes written as H0 = 100h km s−1 Mpc−1 , with h = 0.72 ± 0.03.
This may seem deliberately obtuse, but the Hubble parameter is so fundamental
that it affects many other cosmological measurements, so some observational
cosmologists opt to quote their results in terms of h.
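The conversion between the astronomer’s units and SI quoted above is a one-liner. A sketch (function name ours; the megaparsec value is approximate):

```python
MPC_M = 3.0857e22       # metres in one megaparsec (approximate)

def hubble_si(h0_km_s_mpc):
    """Convert H0 from km s^-1 Mpc^-1 to s^-1."""
    return h0_km_s_mpc * 1.0e3 / MPC_M

H0_SI = hubble_si(72.0)                     # about 2.3e-18 s^-1, as in the text
hubble_time_gyr = 1.0 / H0_SI / 3.156e16    # 1/H0, about 13.6 Gyr
```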
If we divide Equation 1.7 by R², we obtain
H² = (Ṙ/R)² = 8πG(ρm + ρr)/3 + Λc²/3 − kc²/R². (1.14)

The terms on the right-hand side drive the expansion of the Universe. It’s common
in cosmology to define their fractional contributions:

Ωm = 8πGρm/(3H²), (1.15)
Ωr = 8πGρr/(3H²), (1.16)
ΩΛ = Λc²/(3H²), (1.17)
Ωk = −kc²/(R²H²), (1.18)

where the subscript ‘m’ stands for ‘matter’ and the subscript ‘r’ stands for
‘radiation’. Equation 1.14 then implies that
Ωm + Ωr + ΩΛ + Ωk = 1. (1.19)

We can also define Ωtotal = Ωm + Ωr + ΩΛ = 1 − Ωk . All these Ω parameters,


known as density parameters, are functions of time, except in a few special
cases. Figure 1.11 shows how the density parameters varied with the scale factor
of the Universe. As with the Hubble parameter, we shall use a subscript 0 for the
present-day value, e.g. ΩΛ,0 = Λc2 /(3H02 ). However, be warned that many
textbooks omit 0 subscripts for the present-day Ω values. Historically, another notation
It’s also useful to define a critical density ρcrit such that has been used, which is now out
of favour: q0 = −(RR̈/Ṙ2 )0 =
ρm 1
Ωm = , (1.20) 2 Ωm,0 − ΩΛ,0 and σ0 = Ωm,0 /2.
ρcrit Besides mentioning them here,
so that we won’t use this notation in
this book.
3H 2
ρcrit = , (1.21)
8πG
(we’ll explain below why this is ‘critical’).


Figure 1.11 Past and future variations in the density parameters Ωr, Ωm and ΩΛ, given the present-day WMAP cosmological parameters, plotted against the dimensionless scale factor a = R/R0. Some key times are marked: the Planck scale, the epoch of electroweak symmetry breaking (EW), Big Bang nucleosynthesis (BBN) and the present.

Matter densities are often expressed relative to this critical density. For example, the baryon density of the Universe, ρb, is sometimes written as
Ωb = ρb/ρcrit. (1.22)
The matter density of the Universe can be expressed as
ρm = Ωm ρcrit = 1.8789 × 10⁻²⁶ Ωm h² kg m⁻³ = 2.7752 × 10¹¹ Ωm h² M⊙ Mpc⁻³, (1.23)
where 1 M⊙ is the mass of the Sun.
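The numerical coefficients in Equation 1.23 follow directly from Equation 1.21 with H0 = 100h km s⁻¹ Mpc⁻¹. A sketch (function name ours; the physical constants are approximate values):

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
MPC_M = 3.0857e22    # metres in one megaparsec
M_SUN = 1.989e30     # solar mass in kg

def rho_crit_si(h):
    """Equation 1.21: rho_crit = 3 H0^2 / (8 pi G), in kg m^-3,
    with H0 = 100 h km s^-1 Mpc^-1."""
    H0 = 100.0e3 * h / MPC_M          # s^-1
    return 3.0 * H0 ** 2 / (8.0 * math.pi * G)

kg_per_m3 = rho_crit_si(1.0)                      # the 1.8789e-26 h^2 coefficient
msun_per_mpc3 = kg_per_m3 * MPC_M ** 3 / M_SUN    # the 2.7752e11 h^2 coefficient
```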
We shall see in later chapters that most of the matter content of the Universe
is dark matter that neither absorbs nor emits light. Dark matter appears to
interact only (or very nearly only) through gravitation, and its only observational
consequences so far have been via its gravitational effects.
The current experimental values from the WMAP satellite (which we shall meet
later) are
Ωm,0 h² = 0.1326 ± 0.0063, (1.24)
ΩΛ,0 = 0.742 ± 0.030, (1.25)
Ωb,0 h² = (2.273 ± 0.062) × 10⁻². (1.26)
We’ll show in Chapter 2 that the contribution from radiation and neutrinos gives
Ωr,0 h² ≈ 4.2 × 10⁻⁵. (1.27)


The value of Ωr,0 is therefore negligible, and we’ll usually assume that it’s zero
in this book. Note that WMAP doesn’t constrain Ωm,0 on its own, but rather
constrains the product of Ωm,0 with the Hubble parameter squared.

1.6 The age of the Universe


How old is the Universe? And how long does it take light from distant galaxies to
reach us?
It is a quite astonishing feat of modern precision cosmology that we know the
time since the Big Bang to better than a few per cent accuracy. To see how this is
calculated, we need to relate the redshift z to the age of the Universe at that epoch,
using only present-day observable quantities. The Hubble parameter H is closely related to dz/dt:
H = (1/R)(dR/dt) = (R0/R) d(R/R0)/dt = (1 + z) d(1/(1 + z))/dt
  = (1 + z) [d(1/(1 + z))/dz] (dz/dt)
  = −[1/(1 + z)] dz/dt. (1.28)
Next, we re-cast Equation 1.14 in terms of −kc² (assuming Ωr = 0):
−kc² = H²R² − 8πGρm R²/3 − Λc²R²/3. (1.29)
In particular, at the present time we have
−kc² = H0²R0² − 8πGρm,0 R0²/3 − Λc²R0²/3, (1.30)
where ρm,0 is the present-day matter density. Clearly, the right-hand sides of Equations 1.29 and 1.30 must be equal. But ρm = ρm,0 × R0³/R³, so
H²R² − 8πGρm,0 R0³/(3R) − Λc²R²/3 = H0²R0² − 8πGρm,0 R0²/3 − Λc²R0²/3.
We can express this in terms of the present-day density parameters Ωm,0 and ΩΛ,0 using Equations 1.15 and 1.17. After rearranging, this gives
(H/H0)² = R0²/R² + Ωm,0 (R0³/R³ − R0²/R²) + ΩΛ,0 (1 − R0²/R²). (1.31)
We can simplify this using 1 + z = R0/R, which gives
(H/H0)² = (1 + z)² + Ωm,0 [(1 + z)³ − (1 + z)²] + ΩΛ,0 [1 − (1 + z)²], (1.32)
which can be rearranged to give
(H/H0)² = (1 + z)²(1 + z Ωm,0) − z(2 + z) ΩΛ,0. (1.33)


Finally, using Equation 1.28, we reach
(dz/dt)² = H0²(1 + z)²[(1 + z)²(1 + z Ωm,0) − z(2 + z) ΩΛ,0], (1.34)

from which we can easily find dt/dz. Although admittedly pretty ghastly, this
equation does have the advantage of using only present-day observable quantities,
and we’ll be referring to it several times in this book.
In general, dt/dz can’t be integrated analytically, so t(z) can be calculated only
by numerically integrating dt/dz. The time to z = ∞ is the age of the Universe,
and this is shown in Figure 1.12. Equation 1.34 can also be integrated to give the
time taken for light to reach us from redshift z. This is known as the lookback
time.
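In practice this numerical integration is only a few lines of code. The sketch below (the function name, integration grid and midpoint rule are our own choices; radiation is neglected, as in Equation 1.33) evaluates t0 = ∫ dz/[(1 + z)H(z)] for a flat universe:

```python
import math

def age_gyr(omega_m0, omega_lambda0, h, z_max=3000.0, n=300000):
    """Age of the Universe by midpoint-rule integration of dt/dz
    (Equations 1.28 and 1.33); the z > z_max tail is negligible here."""
    dz = z_max / n
    total = 0.0
    for i in range(n):
        z = (i + 0.5) * dz
        e2 = (1+z)**2 * (1 + z*omega_m0) - z*(2+z)*omega_lambda0  # (H/H0)^2
        total += dz / ((1.0 + z) * math.sqrt(e2))
    return (9.78 / h) * total       # 1/H0 = 9.78/h Gyr

t0 = age_gyr(0.26, 0.74, 0.72)      # ~13.6 Gyr for roughly the WMAP-era values
```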
Figure 1.12 The age of the Universe times the Hubble parameter, t0H0, for several cosmological models, shown as labelled contours in the (Ωm,0, ΩΛ,0) plane. Also shown is the spatial geometry (open, flat and closed) and whether the present-day expansion of the Universe would be accelerating or decelerating.
How does this estimate of the age of the Universe compare to the ages of the
oldest objects in the Universe? There is now a well-developed theory for main
sequence stellar evolution that can be used to find the ages of stars. Particularly
useful are globular clusters (e.g. Figure 1.13), which are some of the oldest
gravitationally-bound objects in the Universe. The stars that comprise any given
globular cluster are believed to have formed at about the same time (though the
ages of globular clusters vary). More luminous stars spend less time on the main
sequence in the colour–magnitude diagram, so if one can find the luminosity of


the stars in a globular cluster that are just leaving the main sequence, one can infer
an age for the globular cluster.
The oldest known globular cluster appears to be 12.7 ± 0.7 Gyr old. In the 1990s
it was recognized that globular cluster ages are an important constraint on the age
of the Universe, and therefore on the cosmological parameters that control the
geometry and fate of the Universe (Figure 1.12). But as we shall see in the next
section, there seemed to be very good reasons to expect that we live in a Universe
with Ωm,0 = 1 and Λ = 0, which, as you will see, turns out to be significantly younger than this, so these stars appeared to be older than the Universe.

Exercise 1.4 Starting from Equation 1.33, or otherwise, show that in an Ωm = 1, Λ = 0 universe,
R/R0 = (t/t0)^(2/3). (1.35)
Figure 1.13 The globular star cluster M80. Most of its stars are older and redder than our Sun.

Exercise 1.5 Using Equation 1.35, show that the age of the Universe in an
Ωm = 1, Λ = 0 model is t0 = 2/(3H0 ), and evaluate the age in Gyr for the value
of H0 in Section 1.5. A spacetime that expands in this way is sometimes called
the Einstein–de Sitter model. ■
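Putting numbers into the Exercise 1.5 result makes the conflict with the globular cluster ages explicit (a sketch; names ours, constants approximate):

```python
MPC_M = 3.0857e22     # metres in one megaparsec
GYR_S = 3.156e16      # seconds in one gigayear

def eds_age_gyr(h0_km_s_mpc):
    """t0 = 2/(3 H0) for the Omega_m = 1, Lambda = 0 (Einstein-de Sitter) model."""
    H0 = h0_km_s_mpc * 1.0e3 / MPC_M       # s^-1
    return 2.0 / (3.0 * H0) / GYR_S

t0 = eds_age_gyr(72.0)   # ~9.1 Gyr: less than the 12.7 Gyr globular cluster age
```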

1.7 The flatness problem


In this section we shall introduce you to a profound, and as yet unsolved, problem
in cosmology. We have already noted that the density parameters in general
depend on time. For example, what was Ωm at a redshift of z = 1000 (about the
redshift of the cosmic microwave background)? Let’s assume for now that Λ = 0,
so ΩΛ is always zero, and Ωm = 1 − Ωk . Equation 1.18 can be modified to give
the current value of Ωk,
Ωk,0 = −kc²/(R0²H0²),
and dividing this into Equation 1.18 gives
Ωk(z)/Ωk,0 = R0²H0²/(R²H²) = (1 + z)²(H0/H)².
Now we know how to relate H to z (Equation 1.33), so we can find how Ωk and
Ωm evolve.
First, if Λ = 0 and Ωm,0 = 1, then Ωk,0 = 0. But this can happen only if k = 0, so
Ωk must always be zero, and Ωm = 1 at all times.
But if Λ = 0 and Ωm,0 = 0.7, then at z = 1000, Ωm = 0.9996. As redshift
increases, Ωk tends to zero and Ωm tends to 1. About one second after the Big
Bang, Ωm = 1 − 10−15 . So if the present-day value of Ωm is not 1, it must have
had only a very tiny offset from 1 in the early Universe. What could cause it to be
so close to 1, but not quite equal to 1?
This fine-tuning problem is worse if we include a non-zero Λ. If ΩΛ,0 = 0.7
and Ωm,0 = 0.3, then at z = 1000 their values were ΩΛ = 3.3 × 10−9 and
Ωm = 0.999 999 996.
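Values of this order are easy to reproduce from Equation 1.33 (a sketch; the function name is ours, radiation is neglected, and for a flat universe the two parameters sum to 1 at every epoch):

```python
def omegas_at_z(z, omega_m0, omega_lambda0):
    """Density parameters at redshift z: Omega_m(z) = Omega_m0 (1+z)^3 (H0/H)^2
    and Omega_Lambda(z) = Omega_Lambda0 (H0/H)^2, with (H/H0)^2 from Eq. 1.33."""
    e2 = (1+z)**2 * (1 + z*omega_m0) - z*(2+z)*omega_lambda0
    return omega_m0 * (1+z)**3 / e2, omega_lambda0 / e2

om, ol = omegas_at_z(1000.0, 0.3, 0.7)
# om differs from 1 by a few parts in 10^9; ol is of order 10^-9
```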

Figure 1.14 shows how the Ω parameters depend on redshift, for various
cosmological models. If Ωm = 1 and ΩΛ = 0, then they keep these values
throughout the history of the Universe. This would remove the need to explain the
cosmological fine-tuning in the Ω parameters, and led to the expectation among
at least some astronomers that the likely values are Ωm,0 = 1 and ΩΛ,0 = 0.
However, there is now good experimental evidence to reject this particular
cosmological model, so we are left with the problem: what caused the fine-tuning
in the early Universe?

Figure 1.14 Schematic illustration of the variation of the density parameters Ωm (upper, red lines) and ΩΛ (lower, black lines) with time, for various cosmological models. Note that the time-dependence of a density parameter for any given starting point also depends on the starting values of the other density parameters.

One possible solution for the smallness of Ωk, or rather a class of possible solutions, is inflation, which we shall meet in later chapters. But it is by no means
certain that this is the correct solution. Inflation also does not offer any clear
explanation for the fine-tuning of ΩΛ . There may be Nobel prizes to be had for
successful insights into the origin of ΩΛ .

1.8 Distance in a warped spacetime


In 1963, Maarten Schmidt and Bev Oke made an astonishing discovery: the extraterrestrial radio source 3C273 is at what was then an unprecedentedly vast distance, a redshift of z = 0.158. Suddenly astronomers realized that telescopes could explore a much bigger volume of the Universe than had been supposed, and see objects as they were much earlier in the history of the Universe. This produced great excitement and optimism in cosmology. It also meant that 3C273 must be extremely luminous. We shall explore the causes of this prodigious luminosity in later chapters.
Using dz/dt from Section 1.6, then integrating numerically with the cosmological parameters from Section 1.5, it turns out that the light from 3C273 has taken 1.9 billion years to reach us, which is about 14% of the age of the Universe. Does this mean that 3C273 is 1.9 billion light-years away?
It turns out that ‘distance’ is a surprisingly tricky concept to define in an expanding spacetime. Figure 1.15 shows some of the problems. Do we mean the distance that the light has travelled (B to C)? Or do we mean the distance that 3C273 was at when the light was emitted (B to A)? Or do we mean the distance that 3C273 is at now (D to C)?
Figure 1.15 Different options for measuring cosmological distances in an expanding universe. The points A, B, C and D are referred to in the text.


Cosmologists have settled on a convention to use D to C (neglecting any peculiar velocities), in the preferred reference frame of the Robertson–Walker metric. This
distance is known as the comoving distance. The comoving distance to 3C273 is
about 637 Mpc, or about 2.1 billion light-years. This is longer than the light-travel
distance, because the space has been expanding since the light was emitted, and
3C273 is now further away than when it emitted the light. Figure 1.16 illustrates
how the Universe looks in proper coordinates (proper distances are those that
would be measured by a tape measure at a fixed time in the cosmic rest frame),
and in comoving coordinates.

Figure 1.16 A simulation of the expanding Universe as seen in proper coordinates and
in comoving coordinates. The panels show a simulation of a segment of the Universe at
redshifts of z = 5.7 (left), z = 1.4 (centre) and z = 0 (right). The upper panels are shown
in proper coordinates, while the lower panels are shown in comoving coordinates. A white
bar shows a comoving length of 125/h Mpc. Note also the gradual increase in large-scale
structure in this simulation with time, which we shall return to in later chapters.
To calculate the comoving distance to a cosmological object, we can use the
Robertson–Walker metric, Equation 1.6. We want to know the radial distance
between us and a distant object, at a fixed coordinate time t = t0 (i.e. the present),
perhaps imagining a tape measure stretched between us and it, which we read at
the time t = t0 . Therefore dt = 0 and dθ = dφ = 0. The remaining non-zero
terms of Equation 1.6 are
ds² = −R²(t0) dr²/(1 − kr²).
This is −1 times the square of a spatial separation. We define the comoving distance dcomoving via
d dcomoving = R(t0) dr/√(1 − kr²) = R0 dr/√(1 − kr²) (1.36)
(with apologies for the profusion of the letter d) so that
dcomoving = R0 ∫0^r dr′/√(1 − kr′²). (1.37)


This integral has a standard solution, depending on the value of k:
dcomoving = R0 sin⁻¹r if k = +1,
dcomoving = R0 r if k = 0, (1.38)
dcomoving = R0 sinh⁻¹r if k = −1.
Now, it’s all very well imagining imaginary tape measures, but how can we relate
this to real observable quantities? To find out, consider the light ray arriving at the
Earth from the distant object. Light rays have ds = 0, and the motion of this light
ray is purely radial, so dθ = dφ = 0. The motion of the light ray is therefore just
R(t) dr/√(1 − kr²) = c dt. (1.39)
Our aim is to calculate dcomoving following Equation 1.38.
In general, R(t) can’t be expressed analytically, though there are a few analytic special cases (e.g. R(t) ∝ t^(2/3) for a matter-dominated universe with Ωm = 1 and Λ = 0). But it’s more helpful to express the right-hand side of Equation 1.39 in
terms of redshift, which (unlike lookback time) is directly observable. To do this,
we start with the chain rule for differentiation:
c dt = c (dt/dR) dR = c dR/(dR/dt) = c dR/(RH), (1.40)
where we have used H = Ṙ/R. Next, R = R0/(1 + z), which we can differentiate to find
dR = −R0 dz/(1 + z)².
Putting this into Equation 1.40 gives
c dt = −cR0 dz/(HR(1 + z)²) = −c dz/((1 + z)H)
and therefore
R dr/√(1 − kr²) = −c dz/((1 + z)H). (1.41)
The comoving distance is R0 ∫0^r (1 − kr′²)^(−1/2) dr′ (Equation 1.37), and this is almost in the right form. R0 = (1 + z)R, so R0 dr = (1 + z)R dr, and therefore
R0 dr/√(1 − kr²) = d dcomoving = −c dz/H, (1.42)
where H is a function of redshift. Therefore the comoving distance is just
dcomoving = c ∫0^z dz′/H(z′). (1.43)

But we know H in terms of z and the observed cosmological parameters — we found this back in Equation 1.33. Putting this in here, and using R0/R = 1 + z, we get
d dcomoving = −c dz / (H0 √[(1 + z)²(1 + z Ωm,0) − z(2 + z) ΩΛ,0]),


thus
dcomoving = (c/H0) ∫0^z dz′ / √[(1 + z′)²(1 + z′ Ωm,0) − z′(2 + z′) ΩΛ,0]. (1.44)
(We have integrated the previous differential from z to 0, but used its minus sign to swap the limits and obtain a positive integral.) There are a few special cases where this integral comes out as a relatively simple expression, such as when Λ = 0 and Ωm = 1:
dcomoving = (2c/H0)[1 − (1 + z)^(−1/2)] only for Ωm = 1, Λ = 0. (1.45)
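Equation 1.44 is straightforward to integrate numerically. The sketch below (our function name and grid; the midpoint rule is our choice) reproduces the ~637 Mpc comoving distance to 3C273 quoted in Section 1.8:

```python
import math

def comoving_distance_mpc(z, omega_m0, omega_lambda0, h, n=100000):
    """Midpoint-rule evaluation of Equation 1.44, returning Mpc."""
    c_km_s = 299792.458
    dz = z / n
    total = 0.0
    for i in range(n):
        zp = (i + 0.5) * dz
        e = math.sqrt((1+zp)**2 * (1 + zp*omega_m0) - zp*(2+zp)*omega_lambda0)
        total += dz / e
    return c_km_s / (100.0 * h) * total     # c/H0 in Mpc times the integral

d = comoving_distance_mpc(0.158, 0.26, 0.74, 0.72)   # ~637 Mpc, as for 3C273
```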

1.9 The edge of the observable Universe


How big is the observable Universe today? If we lived in a flat, unexpanding
space, the observable Universe would be the region of the Universe that could
have sent us a light signal. The radius would be ct0 , where t0 is the age of the
Universe. This radius would be increasing at a speed c.
In the Robertson–Walker expanding universe, this is different in a rather
astonishing way. First, we’ll calculate the size of the observable Universe. This
size is the same in comoving coordinates and in proper coordinates, provided that
we’re referring to the size at time t = t0 (i.e. now) when the scale factor is
R = R0 .
The size of the observable Universe is therefore the comoving radius in
Equation 1.44 as redshift z tends to infinity. In general, this comes out as a
number times c/H0 . For an Ωm = 1 and Λ = 0 universe, you can see from
Equation 1.45 that the radius of the observable Universe is 2c/H0 , or about
8300 Mpc for the currently-accepted value of H0 in Section 1.5. For the
currently-accepted values of the density parameters in Section 1.5, the radius of
the observable Universe comes out at about 3.53c/H0 . The volume enclosed is
sometimes referred to as the Hubble volume.
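The 3.53c/H0 figure can be checked by pushing the integral in Equation 1.44 to z → ∞. Changing variable to a = 1/(1 + z) makes the infinite range finite; for a flat universe the integrand becomes 1/√(Ωm a + ΩΛ a⁴). A sketch (our names; flatness assumed and radiation neglected, as in the text):

```python
import math

def horizon_radius(omega_m0, omega_lambda0, n=500000):
    """Comoving radius of the observable Universe in units of c/H0:
    Equation 1.44 with z -> infinity, rewritten over a = 1/(1+z) as the
    integral from 0 to 1 of da / sqrt(Omega_m a + Omega_Lambda a^4)."""
    da = 1.0 / n
    total = 0.0
    for i in range(n):
        a = (i + 0.5) * da          # midpoint rule handles the a = 0 endpoint
        total += da / math.sqrt(omega_m0 * a + omega_lambda0 * a**4)
    return total

r = horizon_radius(0.26, 0.74)   # ~3.5, close to the 3.53 quoted in the text
```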
Next, we’ll calculate how fast this observable Universe is growing, in proper
distances rather than comoving distances. Again, we take the time now to be t0
and the current scale factor to be R0 . Suppose that we see a distant object
at a redshift z and a proper distance R0 r. After a time δt, the time will be
t1 = t0 + δt, and the new scale factor of the Universe will be R1 . Therefore the
new proper distance to this distant object will be
R1 r = R0 r + (dR/dt) δt r ≈ R0 r + (R0 H0 δt)r.
Therefore the rate of change of proper distance is
d(Rr)/dt ≈ (R1 r − R0 r)/δt = R0 H0 r, (1.46)
i.e. just H0 times the current proper distance. This is one sense in which the
Hubble parameter is measuring the rate of the expansion of the Universe.
So, for the currently-accepted values of the density parameters in Section 1.5,
the observable Universe is getting bigger at an astonishing rate of
(3.53c/H0 ) × H0 = 3.53c. In Section 1.5 we met Equation 1.46 in a different

guise (Equation 1.13), but warned that the left-hand side is not a recession
velocity, but rather an apparent recession. Here, you see our reason for this
warning! An object moving through a flat, unexpanding space has a maximum
speed of c, but an expanding spacetime is a very different physical situation,
and the maximum cosmological apparent ‘recession’ speed in our Universe is
currently about 3.53c.

1.10 Measuring distances and volumes


Redshift is one way of measuring distances, but to convert redshift into a distance,
we need to know the cosmological parameters, especially H0 . If we want to
measure these fundamental cosmological parameters, we need independent ways
of measuring distances.
One useful distance measure is the angular diameter distance. If an object has a
known size D, and subtends an angle θ in radians, then the angular diameter
distance is
dA = D/θ. (1.47)
(Compare Figure 1.2, in which r takes the place of dA .)
Another useful measure is the proper motion distance. If an object has a known
transverse velocity u (i.e. motion in the plane of the sky), and has an observed
angular motion of dθ/dt, then the proper motion distance is defined as
$$d_M = \frac{u}{d\theta/dt}. \tag{1.48}$$
Finally, we can also define the luminosity distance. If an object has a known
luminosity L, and the observed flux is S, then the luminosity distance is
$$d_L = \left(\frac{L}{4\pi S}\right)^{1/2}. \tag{1.49}$$
These three approaches to measuring distance give the same answer in a flat,
unexpanding space, but they are surprisingly different in the Robertson–Walker
metric. Figure 1.17 illustrates how these distances are constructed in
Robertson–Walker universes. As usual, we use t0 and R0 for the current time and
current scale factor, respectively. Suppose that photons are emitted from a distant
object of size D at a time t1 , when the scale factor was R1 . From the figure, we
see that D = R1 rθ, so dA = R1 r.
Next, suppose that another object at the same redshift is moving with a proper
transverse velocity u, and is seen to move at an angular speed dθ/dt, as in
Equation 1.48. Here, the cosmological time dilation first noted in Section 1.4
comes into play. If t′ is the time measured when the photons are emitted, then dt′/dt = R1/R0. The proper velocity is
$$u = \frac{dD}{dt'} = \frac{d(R_1 r\theta)}{dt'},$$
and substituting this into Equation 1.48, we find dM = R0 r. Note that when
k = 0, this equals the comoving distance (compare Equation 1.38).


Figure 1.17 Light rays in the Robertson–Walker metric, for illustrating distance measures. (Labels in the figure: time, θ, D, R1 r and R0 r.)

Finally, suppose that an object at this redshift has a bolometric luminosity L


(where ‘bolometric’ means the total over all wavelengths). The photons are
distributed over a sphere with a proper area of 4π(R0 r)² (see Figure 1.17). The energy emitted in a time dt′ will be L dt′, but the redshifting will reduce the energy received by a factor of R1/R0. Therefore the flux received will be
$$S = L\,\frac{dt'}{dt}\,\frac{R_1}{R_0}\,\frac{1}{4\pi (R_0 r)^2} = L\,\frac{dt'}{dt}\,\frac{R_1}{R_0}\,\frac{1}{4\pi R_0^2 r^2}.$$
But because of cosmological time dilation, dt′/dt = R1/R0, so
$$S = L\,\frac{R_1}{R_0}\,\frac{R_1}{R_0}\,\frac{1}{4\pi R_0^2 r^2} = \frac{L R_1^2}{4\pi R_0^4 r^2} = \frac{L}{4\pi (R_0^2 r/R_1)^2}.$$
Comparing this to Equation 1.49, we see that dL = R0² r/R1. It would be
wonderful if comparing these three distance measures for a single object gave us
constraints on the cosmological parameters. It’s perhaps a little disappointing,
then, that they are all closely related. Using 1 + z = R0 /R1 , we find that
$$d_L = (1+z)\,d_M = (1+z)^2\,d_A, \tag{1.50}$$
independent of the cosmological parameters. The constraints on the cosmological
parameters can instead be gleaned from how these distance measures vary with
redshift. Figure 1.18 shows how the angular diameter distance varies with
redshift, for various cosmological models.

Figure 1.18 The variation of angular diameter distance with redshift, plotted as H0 dA/c against z on logarithmic axes, for the following cosmological models. A: Ωm,0 = 1, Λ = 0. B: Ωm,0 = 0.1, Λ = 0. C: Ωm,0 = 0.1, ΩΛ,0 = 0.9. D: Ωm,0 = 0.01, Λ = 0. E: Ωm,0 = 0.01, ΩΛ,0 = 0.99.
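The three distance measures are easy to explore numerically. The sketch below is our own illustration (not code from this book): it integrates the comoving distance with a simple trapezium rule for a flat universe, assuming illustrative parameters Ωm,0 = 0.3, ΩΛ,0 = 0.7 and H0 = 70 km s⁻¹ Mpc⁻¹, then applies dA = dM/(1 + z) and dL = (1 + z)dM.

```python
import math

C_KMS = 2.998e5  # speed of light in km/s

def E(z, om=0.3, ol=0.7):
    # Dimensionless Hubble parameter H(z)/H0 for a flat (k = 0) universe.
    return math.sqrt(om * (1.0 + z)**3 + ol)

def comoving_distance(z, h0=70.0, om=0.3, ol=0.7, n=2000):
    # d_comoving = (c/H0) * integral from 0 to z of dz'/E(z'), by the
    # trapezium rule; result in Mpc.
    dz = z / n
    s = 0.5 * (1.0 / E(0.0, om, ol) + 1.0 / E(z, om, ol))
    for i in range(1, n):
        s += 1.0 / E(i * dz, om, ol)
    return (C_KMS / h0) * s * dz

def distances(z, **kw):
    # Returns (d_A, d_M, d_L) in Mpc; in a flat universe d_M = d_comoving.
    dm = comoving_distance(z, **kw)
    return dm / (1.0 + z), dm, dm * (1.0 + z)

d_a, d_m, d_l = distances(3.0)
print(d_a, d_m, d_l)   # note d_L = (1 + z) d_M = (1 + z)^2 d_A (Equation 1.50)
```

The maximum in dA seen in Figure 1.18 can also be checked this way: for these parameters, dA at z = 10 comes out smaller than dA at z = 1.6.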

Chapter 1 Space and time

Again, the Robertson–Walker metric holds another surprise: the angular diameter
distance dA has a maximum value, as you can see in Figure 1.18. What does this
mean? In flat unexpanding space, objects always appear smaller when they are
placed further away, but in the Robertson–Walker spacetime, an object that is
placed further away can appear larger. This is partly because the Universe was
smaller when the light was emitted, so the object was then nearer to us. It’s partly
also to do with the geometry of the space. For example, light rays emitted at the South Pole of a two-dimensional spherical surface will initially diverge, but will eventually converge again as they approach the North Pole. A spherical
unexpanding space would therefore still have a maximum angular diameter
distance.
We shall see how these distances affect Olbers’ paradox later in this book. In the
meantime, the following exercise will give you a clue to how Olbers’ paradox is
resolved in a Robertson–Walker universe.

Exercise 1.6 How does surface brightness (flux per square degree) vary with
redshift? ■
In observational astronomy we rarely measure the total luminosities of distant
objects; instead, we tend to measure the redshift and the flux in a particular
wavelength interval Δλobs . Two effects change the observed flux: first, the
observed wavelength interval Δλobs corresponds to a smaller wavelength interval
in the emitted frame, because Δλobs = (1 + z)Δλem ; second, the distant object
may emit different amounts of light at rest wavelengths of λem and λobs . This
latter effect is known as the K-correction for historical reasons, and we shall
meet it later in this book. If the underlying spectrum is a ‘power law’, i.e. if the
flux per unit frequency is Sν ∝ ν^{-α}, then a useful expression for the luminosity is
$$\frac{L_\nu}{10^{26}\,{\rm W\,Hz^{-1}\,sr^{-1}}} = \frac{S_\nu}{10^{-26}\,{\rm W\,Hz^{-1}\,m^{-2}}}\left(\frac{d_M}{3241\,{\rm Mpc}}\right)^2 (1+z)^{1+\alpha}. \tag{1.51}$$
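As a quick worked example of Equation 1.51 (our own sketch, with hypothetical numbers rather than values from the book): a 1 Jy source (Sν = 10⁻²⁶ W Hz⁻¹ m⁻²) at z = 1 with dM = 3241 Mpc and spectral index α = 0.8 gives Lν ≈ 3.5 × 10²⁶ W Hz⁻¹ sr⁻¹.

```python
def luminosity_ratio(s_nu_jy, dm_mpc, z, alpha):
    # L_nu in units of 10^26 W/Hz/sr, from Equation 1.51, with S_nu in Jy
    # (1 Jy = 10^-26 W/Hz/m^2) and the proper motion distance d_M in Mpc.
    return s_nu_jy * (dm_mpc / 3241.0)**2 * (1.0 + z)**(1.0 + alpha)

print(luminosity_ratio(1.0, 3241.0, 1.0, 0.8))   # 2^1.8, i.e. about 3.48
```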
Finally, cosmologists often use the term comoving volume to describe volumes
with the expansion factor divided out (Figure 1.16). We shall use this many
times throughout this book. Imagine a patch of sky with an angular area δΩ
(in units, for example, of square degrees). We can convert this to a proper
area at any redshift using the angular diameter distance: δA = dA² δΩ. Now imagine that we are observing a slab at redshift z with a proper area δA and proper thickness R(1 − kr²)^{-1/2} dr. The proper volume will therefore be dVproper = δA × (1 − kr²)^{-1/2} R dr, or
$$dV_{\rm proper} = d_A^2(z)\,\delta\Omega \times \frac{R\,dr}{\sqrt{1 - kr^2}}. \tag{1.52}$$
Now the comoving volume is just dVcomoving = (1 + z)³ × dVproper, so
$$dV_{\rm comoving} = d_A^2(z)\,\delta\Omega\,\frac{R\,dr}{\sqrt{1 - kr^2}} \times (1+z)^3. \tag{1.53}$$

There are many ways of integrating this, but one approach is to express it in terms
of the proper motion distance dM = R0 r. (Recall that in a flat universe this is
equivalent to the comoving distance.) Then
$$dV_{\rm comoving} = \frac{d_M^2(z)}{\sqrt{1 + \Omega_{k,0} H_0^2 d_M^2/c^2}}\;d(d_M)\,\delta\Omega. \tag{1.54}$$

We’ll express this in another useful way in Chapter 6, Equation 6.26.


The function dVcomoving /dz is plotted in Figure 1.19. This can be integrated to
give Vcomoving (z), the total volume enclosed by a sphere with radius z centred on
us. The form of the comoving volume equation depends on whether k = 0, k = 1
or k = −1. For reference, there are analytic solutions for the comoving volume
over the whole sky, in terms of the proper motion distance dM :
$$V(d_M) = \begin{cases} \dfrac{4\pi D_H^3}{2\Omega_{k,0}}\left[\dfrac{d_M}{D_H}\sqrt{1+\Omega_{k,0}\,d_M^2/D_H^2} - \dfrac{\sin^{-1}\!\big(|\Omega_{k,0}|^{1/2}\,d_M/D_H\big)}{|\Omega_{k,0}|^{1/2}}\right] & \text{if } k=+1,\\[1.5ex] \dfrac{4}{3}\pi d_M^3 & \text{if } k=0,\\[1.5ex] \dfrac{4\pi D_H^3}{2\Omega_{k,0}}\left[\dfrac{d_M}{D_H}\sqrt{1+\Omega_{k,0}\,d_M^2/D_H^2} - \dfrac{\sinh^{-1}\!\big(|\Omega_{k,0}|^{1/2}\,d_M/D_H\big)}{|\Omega_{k,0}|^{1/2}}\right] & \text{if } k=-1, \end{cases} \tag{1.55}$$
where DH = c/H0 is sometimes known as the Hubble distance. Also, for
reference, the proper motion distance can be expressed as
 : * z L
 1 6 G−1/2

 DH sin |Ωk,0 |1/2 2
(1 + z) (1 + Ωm,0 z) − z(2 + z)ΩΛ,0 dz if k = +1,

 |Ωk,0 |1/2

 0
dM = dcomoving if k = 0,

 : * z L

 1 6 G−1/2

 sinh |Ωk,0 |1/2 (1 + z)2 (1 + Ωm,0 z) − z(2 + z)ΩΛ,0
DH 1/2
dz if k = −1.
|Ωk,0 | 0
(1.56)
Equation 1.44 gives the expression for dcomoving . Remember that proper motion
distance dM is equal to comoving distance dcomoving only when k = 0.
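The analytic volumes of Equation 1.55 can be checked against a direct numerical integration of Equation 1.54 over the whole sky. The sketch below is our own illustration (the function names and the sample values of dM, DH and Ωk,0 are assumptions, not from the book); note that Ωk,0 < 0 corresponds to k = +1 and Ωk,0 > 0 to k = −1.

```python
import math

def volume_analytic(dm, dh, ok):
    # All-sky comoving volume out to proper motion distance dm (Equation 1.55).
    # dh is the Hubble distance D_H = c/H0; ok is Omega_k,0.
    if abs(ok) < 1e-12:                       # k = 0
        return (4.0 / 3.0) * math.pi * dm**3
    x = dm / dh
    root = x * math.sqrt(1.0 + ok * x * x)
    if ok < 0:                                # k = +1 (closed)
        term = math.asin(math.sqrt(-ok) * x) / math.sqrt(-ok)
    else:                                     # k = -1 (open)
        term = math.asinh(math.sqrt(ok) * x) / math.sqrt(ok)
    return (4.0 * math.pi * dh**3 / (2.0 * ok)) * (root - term)

def volume_numeric(dm, dh, ok, n=100000):
    # Midpoint-rule integration of Equation 1.54 over the whole sky:
    # dV = 4 pi m^2 d(m) / sqrt(1 + ok m^2 / dh^2).
    h = dm / n
    total = 0.0
    for i in range(n):
        m = (i + 0.5) * h
        total += 4.0 * math.pi * m * m / math.sqrt(1.0 + ok * (m / dh)**2) * h
    return total

dh = 4283.0   # c/H0 in Mpc for an assumed H0 = 70 km/s/Mpc
print(volume_numeric(3000.0, dh, 0.1) / volume_analytic(3000.0, dh, 0.1))   # ~ 1.0
```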

Figure 1.19 The variation of the comoving volume derivative dV/dz with redshift, for the various cosmological models in Figure 1.18, and for an angular area on the sky of δΩ. (Vertical axis: (H0/c)³ (dV/dz) δΩ; both axes are logarithmic.)

1.11 The fate of the Universe


The cosmological parameters in Section 1.5 can also tell us the fate of the
Universe. An extraordinary consequence of the success of the Robertson–Walker
model is that we know the ultimate fate of the atoms in our bodies, at least up to
about the year AD 10^{35}.
We can again re-cast Equation 1.14, this time by changing the variables to
a = 1/(1 + z) = R/R0 and τ = H0 t. This comes out as
$$\left(\frac{da}{d\tau}\right)^2 = 1 + \Omega_{m,0}\left(\frac{1}{a} - 1\right) + \Omega_{\Lambda,0}\left(a^2 - 1\right).$$

This differential equation can be solved numerically, and the predicted fate
of the Universe is shown in Figure 1.20 as a function of Ωm,0 and ΩΛ,0 . The
cosmological parameters in Section 1.5 are very clearly in the regime of
expanding forever.
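A minimal numerical integration of this equation (our own sketch; the Euler scheme and the sample parameter values are illustrative, not the book's code) reproduces the two behaviours in Figure 1.20:

```python
import math

def da_dtau(a, om, ol):
    # (da/dtau)^2 = 1 + Omega_m,0 (1/a - 1) + Omega_Lambda,0 (a^2 - 1), tau = H0 t.
    rhs = 1.0 + om * (1.0 / a - 1.0) + ol * (a * a - 1.0)
    return math.sqrt(max(rhs, 0.0))

def evolve(om, ol, dtau=1e-4, tau_max=3.0):
    # Euler-integrate forward from a = 1 (today). Returns the scale factor at
    # tau_max, or the turnaround value if the expansion halts (recollapse) first.
    a, tau = 1.0, 0.0
    while tau < tau_max:
        da = da_dtau(a, om, ol)
        if da == 0.0:          # expansion has halted: the universe recollapses
            return a
        a += da * dtau
        tau += dtau
    return a

print(evolve(0.3, 0.7))   # keeps growing: the 'expands forever' region
print(evolve(3.0, 0.0))   # halts near a = 1.5: the 'recollapses eventually' region
```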

Figure 1.20 The predicted fate of the Universe as a function of the present-day cosmological density parameters, plotted in the (Ωm,0, ΩΛ,0) plane. The regions are labelled 'no big bang', 'expands forever' (subdivided into accelerating and decelerating), and 'recollapses eventually'; lines also mark the closed, flat and open geometries.

What will this be like? The Universe will become increasingly sparse, as the
matter density decreases and Ωm tends to zero. The cosmological constant will
then dominate, and the Universe will tend to the ΩΛ = 1 model. Equation 1.14
will reduce to
$$H^2 = \frac{\Lambda c^2}{3},$$
so the Hubble constant will, finally, be truly a constant. The expansion will be exponential, as you can see from substituting H = Ṙ/R into the equation above:
$$\frac{1}{R}\frac{dR}{dt} = \sqrt{\frac{\Lambda c^2}{3}},$$
which has the solution R ∝ e^{ct√(Λ/3)}. This is sometimes known as de Sitter spacetime.
This exponential expansion has a curious consequence. Regions of the Universe
that were once in causal contact eventually lose contact with each other, as the
rapidly expanding space makes it impossible for even light signals to pass between
them. To show this, imagine a light signal being sent out in this universe. How far
can it get? The light signal will be in an exponentially expanding universe, so in a
sense it will go infinitely far, but if we normalize our distances by the scale factor,
then we’ll see that the light signal reaches only a finite comoving region.

Light signals still satisfy Equation 1.39, i.e. R(t) dr = c dt (note that k = 0
because Ωk = 1 − Ωm − ΩΛ = 1 − 0 − 1 = 0), but this time R(t) is very
different. If we set R(t) = R1 at a time t = t1, then
$$\frac{R(t)}{R(t_1)} = \frac{R(t)}{R_1} = \frac{e^{ct\sqrt{\Lambda/3}}}{e^{ct_1\sqrt{\Lambda/3}}} = e^{c(t-t_1)\sqrt{\Lambda/3}}.$$
If we treat R1 r as our choice of comoving distance, then we have
$$R_1\,dr = c\,e^{-c(t-t_1)\sqrt{\Lambda/3}}\,dt,$$
so
$$R_1 r = \int_{t_1}^{\infty} c\,e^{-c(t-t_1)\sqrt{\Lambda/3}}\,dt = \frac{c}{c\sqrt{\Lambda/3}} = \sqrt{\frac{3}{\Lambda}} = \frac{c}{H}.$$
So as time tends to infinity, the light signal penetrates a comoving distance of only R1 r = √(3/Λ). Objects beyond a comoving distance of √(3/Λ) cannot be seen, because the intervening space is expanding so quickly that even a light signal cannot cross it. But H is constant and the expansion rate is unchanging, so this is true at any time. What this would look like is a fixed horizon around you at a distance of √(3/Λ), and neighbouring galaxies being accelerated away from you
towards this horizon and out of your observable Universe, which is gradually
being emptied out. However, you would never see a galaxy cross this horizon: its
redshift would get larger as it approached the horizon, and if you could watch a
clock in that galaxy, the time dilation of that clock would get longer. If t2 is the
coordinate time when the galaxy reaches the horizon, then you would see the
clock slow at it approached t2 , but it would never quite reach t2 from your
point of view. However, from the galaxy’s point of view, the passage of time is
unaffected. There, they would see your clocks running slowly, as you passed out
of their observable Universe.
You may recognize this redshifting and time dilation from descriptions of objects
falling into the event horizon of a black hole (which you will also meet in Chapter 6). Indeed, the horizon at √(3/Λ) is a cosmological event horizon.
Universe in the far future will look like a black hole, but inside-out.

Exercise 1.7 How big, in megaparsecs and in metres, will the cosmological
event horizon be? You will need the cosmological parameters in Section 1.5. How
does this compare to the current size of the observable Universe? ■
How far ahead can we look? When Ralph Alpher and George Gamow realized
that the early Universe was hot and dense enough for nuclear reactions, and
calculated the amount of heavy-element production, they were condemned by
some physicists for their rashness. What grounds do we have, the critics argued,
for believing that the same physical theories applied three minutes after the
Big Bang? The Universe provided the rebuttal: the predictions of primordial
nucleosynthesis have been very extensively confirmed, as we shall see in later
chapters. Nevertheless, the words of warning from these critics should still ring in
our ears as we extrapolate to the distant future.
At the moment all the baryons in the Universe are either involved in the cycle of
star birth and death, or could potentially take part. However, by about the year
one trillion (10^{12} years), the Universe will be too sparse to support more star

formation. At that point, baryons will either be in degenerate matter (in white
dwarfs or neutron stars), or be locked up in brown dwarfs, or have fallen into black
holes, or just be atoms or molecules too sparsely distributed to form new stars.
Looking further ahead, we eventually reach the epoch of possible proton decay.
In the standard model of particle physics, the proton is stable and does not decay.
However, some ‘grand unified theories’ in particle physics predict eventual
proton decay. The best current limit on proton half-life t1/2 comes from the
Super-Kamiokande experiment in Japan, which found t1/2 > 10^{35} years. Perhaps
10^{35} years is the furthest ahead that one might venture to predict the contents of
the Universe. But who would be around to contradict you if you got it wrong?

Summary of Chapter 1
1. In a flat Euclidean space, the number counts of a homogeneous and isotropic
distribution of objects vary as dN/dS ∝ S^{-5/2}, but the total flux diverges.
2. In special relativity, lengths and times are not observer-independent, but the
relativistic interval s is invariant.
3. δs = 0 always for light rays, and δs = c δτ always for massive particles,
where τ is proper time.
4. Any homogeneous, isotropic expanding Universe consistent with special
relativity can be described by the Robertson–Walker metric
$$ds^2 = c^2\,dt^2 - R^2(t)\left[\frac{dr^2}{1-kr^2} + r^2\,d\theta^2 + r^2\sin^2\theta\,d\phi^2\right]. \tag{Eqn 1.6}$$
5. Cosmological redshift z, given by
$$1+z = \frac{R_0}{R_1} = \frac{1}{a} = \frac{\lambda_{\rm observed}}{\lambda_{\rm emitted}}, \tag{Eqn 1.10}$$
is caused by the expansion of the Universe, not by the Doppler effect.
Random galaxy motions (known as peculiar velocities) can contribute
additional red or blue shifts from the relativistic Doppler effect.
6. Nevertheless, if one regards cosmological redshift as an apparent recession
velocity, then the apparent velocity is proportional to distance from the
observer, with the constant of proportionality known as the Hubble
parameter.
7. The contributions to the energy density of the Universe from matter and the
cosmological constant are denoted as Ωm and ΩΛ , respectively, and are
defined by
$$\Omega_m = \frac{8\pi G \rho_m}{3H^2}, \tag{Eqn 1.15}$$
$$\Omega_\Lambda = \frac{\Lambda c^2}{3H^2}. \tag{Eqn 1.17}$$
These determine the age and fate of the Universe.
8. Neglecting radiation, if Ωm + ΩΛ = 1 at any time, then this is true at all
times. Also, if either Ωm or ΩΛ is zero, then this is also true at all times. In
all other situations, there is a fine-tuning problem in the early Universe for
the values of Ωm and ΩΛ .

9. ‘Distance’ can have several meanings in a Robertson–Walker metric. We


have defined the angular diameter distance, proper motion distance and
luminosity distance:
$$d_A = \frac{D}{\theta}, \tag{Eqn 1.47}$$
$$d_M = \frac{u}{d\theta/dt}, \tag{Eqn 1.48}$$
$$d_L = \left(\frac{L}{4\pi S}\right)^{1/2}. \tag{Eqn 1.49}$$
These give different but related values for the distance. Angular diameter
distance has a maximum value, so objects placed further away could
sometimes appear larger.
10. The comoving distance to any distant object is the current proper distance,
neglecting peculiar velocities. (This turns out to be equal to the proper
motion distance when k = 0.)
11. The proper size of the observable Universe is increasing at a rate much larger
than the speed of light, c. An object moving in a flat spacetime is a very
different physical situation to free-floating objects in an expanding space.
12. The currently-accepted cosmological parameters imply that the foreseeable
fate of the Universe is exponential expansion.

Further reading
• For a more leisurely introduction to the Robertson–Walker metric, see
Lambourne, R., 2010, Relativity, Gravitation and Cosmology, Cambridge
University Press.
• For a useful review of distance measures in cosmology (though pre-dating dark
energy, which we shall meet in later chapters, and the observation that
ΩΛ ≈ 0.7), see Carroll, S.M., Press, W.H. and Turner, E.L., 1992, 'The
cosmological constant’, Annual Review of Astronomy and Astrophysics,
30, 499.

Chapter 2 The cosmic microwave background
I would rather live in a world where my life is surrounded by mystery than
live in a world so small that my mind could comprehend it.
Harry Emerson Fosdick

Introduction
We effectively resolve Olbers' paradox in this chapter, with the first and most
famous cosmic background light. You will also find out how quantitative
cosmology is done using this background, which has already resulted in two
Nobel prizes.

2.1 The discovery of the cosmic microwave


background
The Big Bang theory is supported by three major observations, one of which you
have already met, and the other two you will meet in this chapter: the expansion
of the Universe, the cosmic microwave background (CMB), and primordial
nucleosynthesis. Some would add the large-scale structure of matter in the
Universe, which we shall also meet later in this book in various guises.
Perhaps the most powerful recent development is the fact that many disparate
observations all converge on the same cosmological model, described in
Chapter 1. Our cosmological model is over-determined, in the sense that there are
more independent experimental constraints than there are parameters to constrain.
Perhaps this is a sign of a mature scientific discipline; the field is now described as
‘precision cosmology’, of which more later. This expression came into use
about the time of the microwave background measurements from the Wilkinson
Microwave Anisotropy Probe (WMAP), for reasons that will become clear in this
chapter.
The microwave background is the redshifted light from when the Universe was
last opaque. Because of the finite speed of light, this light must appear to any
observer as a spherical, receding surface, with the observer at the centre. For
this reason the microwave background is sometimes called the ‘surface of last
scattering’. In this sense, Olbers was exactly right — the sky is indeed uniformly
bright. It has a black body spectrum to an excellent approximation — in fact, it is
the most perfect black body spectrum known in existence (Figure 2.1). From
calculations of the probability of photon–electron collisions as a function of the
ionization of the primordial gas, the redshift of the last scattering surface can be
estimated as z ≈ 1000. (More precisely, it's at z = 1090.)
Around the time of last scattering is also the epoch of recombination. Before that
time, photons suffered many Thomson-scattering collisions with electrons, and
would ionize any atoms that tried to form. Once the density dropped enough for
the Universe to become transparent, the electrons and nuclei were free to combine
to form atoms, and the photons were free to travel in the newly-transparent Universe.

Figure 2.1 The intensity of the CMB radiation, measured by various techniques (ground-based and balloon experiments, cyanogen measurements, and the COBE satellite's DMR and FIRAS instruments). Both the x-axis (frequency in GHz, with wavelength in cm along the top) and the y-axis (intensity in J s⁻¹ m⁻² sr⁻¹ Hz⁻¹) are plotted logarithmically. The shallow slope at the left-hand side is known as the Rayleigh–Jeans regime, while the steeper slope at shorter wavelengths is known as the Wien regime. The uncertainties for the data from the FIRAS instrument on the COBE satellite are smaller than the plotting symbols, and the deviation from a perfect black body is less than 50 parts per million.

From a detailed statistical mechanical calculation² (we shall spare you the details), the fraction of ionized gas x near z = 1000 comes out as
$$x(z) \approx 2.4\times10^{-3}\,\frac{\sqrt{\Omega_{m,0}h^2}}{\Omega_{b,0}h^2}\left(\frac{z}{1000}\right)^{12.75}, \tag{2.1}$$
i.e. it depends on the density of baryons relative to the critical density, Ωb,0 , and
on Ωm,0 . We’ll see later how the CMB gives estimates of Ωm,0 and Ωb,0 . (The
physical process of electrons binding with protons to make hydrogen is known as
‘recombining’ but this is a misnomer here, because they are in fact combining for
the first time.)
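To get a feel for how sharp recombination is, Equation 2.1 can be evaluated directly. A sketch (our own; the values Ωm,0h² = 0.14 and Ωb,0h² = 0.022 are illustrative assumptions, not numbers quoted in this section):

```python
import math

def ionized_fraction(z, om_h2=0.14, ob_h2=0.022):
    # Residual ionized fraction near z = 1000 (Equation 2.1), with
    # om_h2 = Omega_m,0 h^2 and ob_h2 = Omega_b,0 h^2 (assumed values).
    return 2.4e-3 * math.sqrt(om_h2) / ob_h2 * (z / 1000.0)**12.75

# The steep (z/1000)^12.75 dependence makes the transition very rapid:
print(ionized_fraction(1100.0) / ionized_fraction(1000.0))   # 1.1^12.75, about 3.4
print(ionized_fraction(900.0) / ionized_fraction(1000.0))    # 0.9^12.75, about 0.26
```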

Exercise 2.1 We can ionize a hydrogen atom by colliding it with a photon


with an energy E = hν = 13.6 eV. Let’s think of recombination as the inverse of
ionization. Imagine that the electron binds to a proton and emits a photon with an
energy E = 13.6 eV. Won’t this photon go on to ionize another atom? How
is it that the Universe is able to recombine at all? (It’s not that the Universe
expands and makes the number density of atoms sufficiently dilute that ionization is
rare.) ■
The CMB was discovered in 1964 by Arno Penzias and Robert Wilson at the Bell
Laboratories in New Jersey. They were given the task of calibrating a microwave
antenna for use in telecommunications and astronomy, but found a surprising and
unexplained constant noise source distributed isotropically across the sky. The
isotropy and constancy themselves ruled out several potential causes: Galactic
sources should be preferentially found in the Galactic plane; Solar System sources
should be preferentially in the ecliptic plane; the signal was the same in the
direction of extraterrestrial radio sources (which we shall meet in Chapter 4);
sources related to nuclear tests should decay with time; other sources related to
human activity might be expected to be stronger in the direction of nearby

² Jones, B.J.T. and Wyse, R.F.G., 1985, Astronomy and Astrophysics, 149, 144.

cities. They also found pigeons roosting in the antenna and needed to remove
what Penzias later called a ‘white dielectric material’. Although they trapped
the pigeons and released them thirty miles away, the pigeons kept returning;
reluctantly, the birds were eventually shot. The anomalous isotropic noise source
remained, and neither Penzias nor Wilson could find its source.
Unknown to both, a rival group in Princeton, led by Robert Dicke, had just
predicted an isotropic CMB from the Big Bang theory. Dicke’s group were
planning an experiment to detect it. Once Dicke heard of Penzias and Wilson’s
anomalous noise, Dicke spoke on the telephone to the Bell Laboratories group,
after which he announced to his team: ‘Boys, we’ve been scooped’.
The result was that two back-to-back papers in the Astrophysical Journal
announced the discovery to the world: Penzias and Wilson published the
discovery itself, while Dicke’s group published the theoretical explanation.
Penzias and Wilson co-won the 1978 Nobel Prize in Physics for their discovery
(along with Pyotr Kapitsa for a different discovery).
The story has further twists. It turned out that the CMB prediction was implicit in
calculations published in 1948 by George Gamow, Ralph Alpher and Robert
Herman. Furthermore, in 1941 Andrew McKellar measured the 'effective
temperature of interstellar space’ in a careful experiment using spectroscopy to
derive the typical energy level excitations of molecules in the interstellar medium
(we shall see how in Section 2.2 and in later chapters). He found this to be about
2.3 K, but the work pre-dated the Big Bang predictions and neither he nor any
reader appreciated its significance at the time. The currently accepted value of the
CMB temperature is 2.725 ± 0.001 K.

Figure 2.2 All-sky maps of the CMB temperature. The top image is scaled from 0 K to 4 K and looks very uniform. The next image has a much smaller scaling, with a range of just 3.353 mK. The pattern is the dipole that results from the Doppler shift due to the Earth's motion relative to the cosmic rest frame (Section 2.11). After accounting for this motion and further restricting the temperature range to just 18 µK, we see a large band due to our Galaxy and the background primordial fluctuations where the Galactic foreground does not outshine them. These three images were taken with the COBE satellite; the final image is higher-resolution data from the WMAP satellite.

Exercise 2.2 Stefan's law can be shown to imply that the energy density is 4σT⁴/c. Use this to show that the present-day CMB radiation energy density of the Universe is Ωr,0 h² = 2.5 × 10⁻⁵. (The value of Stefan's constant σ is 5.67 × 10⁻⁸ W m⁻² K⁻⁴.) ■

There is also an additional contribution from light neutrinos to Ωr of about 68%, so the total relativistic energy density is slightly larger (Equation 1.27 in Chapter 1) with a value Ωr,0 h² = 4.2 × 10⁻⁵.

The CMB is also remarkably uniform (Figure 2.2); we'll show that this presents a very serious cosmological problem. If we increase the contrast level, we first find a characteristic hot and cold pattern (Figure 2.2). We'll show in Section 2.11 that this is caused by our motion relative to the cosmic rest frame. Correcting for this motion, we find a strong signal from our Galaxy (the horizontal band in Figure 2.2), and apart from this we find that the CMB has intrinsic fluctuations at the level of microkelvins. There is no single clear physical theory for the level of these fluctuations, though we'll see in Section 2.8 what we know about how they may have been generated.

2.2 The CMB temperature as a function of redshift

Why is the CMB such a perfect black body? Shouldn't the redshifting of the photons distort the spectrum? It turns out that black body radiation has the
remarkable property that it keeps a black body shape, independently of cosmic


expansion. The black body spectrum is
$$I(\nu, T)\,d\nu \propto \frac{\nu^3}{e^{h\nu/kT} - 1}\,d\nu, \tag{2.2}$$
where I(ν, T) dν is the energy per unit area in a frequency interval ν to ν + dν, T is the temperature, k is Boltzmann's constant, and h is Planck's constant. If we substitute in ν′ = ν/(1 + z), we find that
$$I(\nu', T)\,d\nu'\,(1+z) \propto \frac{(\nu')^3 (1+z)^3}{e^{h\nu'(1+z)/kT} - 1}\,d\nu'\,(1+z),$$
thus
$$I(\nu', T)\,d\nu' \propto \frac{(\nu')^3}{e^{h\nu'(1+z)/kT} - 1}\,d\nu',$$
omitting the constant (1 + z) factors in the proportionality, so
$$I(\nu', T')\,d\nu' \propto \frac{(\nu')^3}{e^{h\nu'/kT'} - 1}\,d\nu',$$
where T′ = T/(1 + z). This has exactly the same form as Equation 2.2. There are
some underlying physical reasons for this that we shall explore in this section.
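The algebra above is easy to verify numerically: redshift every frequency of a Planck spectrum by (1 + z) and the result is again a Planck shape at T′ = T/(1 + z), up to one overall constant. A sketch (our own, with the value of h/k hard-coded):

```python
import math

H_OVER_K = 4.799e-11   # h/k in units of K per Hz

def planck_shape(nu, temp):
    # Black body spectral shape nu^3 / (exp(h nu / k T) - 1), arbitrary normalization.
    return nu**3 / math.expm1(H_OVER_K * nu / temp)

z, temp = 2.0, 3.0 * 2.725          # emission temperature T = (1 + z) x 2.725 K
ratios = [planck_shape(nu * (1.0 + z), temp) /      # I(nu(1+z), T) ...
          planck_shape(nu, temp / (1.0 + z))        # ... against I(nu, T/(1+z))
          for nu in (1e10, 5e10, 1e11, 5e11)]
print(ratios)   # every ratio is the same constant, (1 + z)^3 = 27
```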
The wavelengths have increased by a factor of (1 + z) since the photons were
emitted, and the volumes have increased by a factor (1 + z)³. Since photon energy is related to frequency by E = hν, the energy density (i.e. energy per unit volume) has decreased by a factor of (1 + z)⁴ since the photons were emitted. But the Stefan–Boltzmann law states that the energy density is proportional to T⁴, where T is the temperature. Therefore
$$\frac{T_{\rm emitted}}{T_{\rm observed}} = 1 + z. \tag{2.3}$$
We therefore have a clear prediction that the CMB was warmer in the past:
TCMB (z) = (1 + z) TCMB (0), where TCMB (0) is the present-day observed
temperature of the CMB. But how could we tell? There are no friendly aliens
beaming us their measurements of TCMB from earlier in the Universe (or at
least, none that we know of). But the CMB sets a minimum temperature for
astronomical objects, because anything cooler would be heated by the CMB, and
we can detect the effects of this minimum temperature.
The temperature of a gas affects the energy levels of the gas molecules. Energy
levels of the order kT tend to be populated by the molecules, while energy levels
≪ kT or ≫ kT are not. The relative numbers in states close to kT will depend
on whether the gas is monatomic, diatomic (and so with extra rotational and
vibrational modes) or more complex, but the temperature-dependence of these
numbers is calculable and known for very many molecules. When a molecule
changes state, it involves the emission or absorption of a photon, and these
emission or absorption lines can be used to both identify the molecular species
and derive the redshift (see, for example, Figure 1.10 in Chapter 1). The relative
strengths of emission lines of any particular molecular species will depend on the
temperature, because the amount of any particular emission will depend on the
number of molecules in the emission process’s initial state. We can therefore use

the relative strengths of emission lines in high-redshift objects to constrain the gas
temperatures, and so place upper limits on the CMB temperature.
One recent measurement of this temperature at a redshift of z = 2.41837 yielded T = 9.15 ± 0.32 K using carbon monoxide rotational modes, and the CMB has been argued to dominate the CO excitation in this system (Srianand, R. et al., 2008, Astronomy & Astrophysics, 482, L39–42). This beautifully confirms the predicted CMB temperature of 9.315 ± 0.007 K at that redshift.
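The prediction TCMB(z) = (1 + z)TCMB(0) is simple to evaluate; this sketch (our own, using the present-day value of 2.725 K) reproduces the ≈9.3 K figure quoted above:

```python
T_CMB0 = 2.725   # present-day CMB temperature in K

def t_cmb(z, t0=T_CMB0):
    # Predicted CMB temperature at redshift z: T(z) = (1 + z) T(0).
    return (1.0 + z) * t0

t_pred = t_cmb(2.41837)
print(t_pred)                   # about 9.315 K
print((t_pred - 9.15) / 0.32)   # about 0.5 sigma from the measured value
```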
One subtle question often asked about the CMB temperature is: where did the
photons' energy go? The emitted energy of any photon would have been hνemitted, but the energy received is hνemitted/(1 + z).
Could it be a gravitational redshift? A gravitational field can certainly redshift
photons. A light signal sent from the surface of the Earth will be received in space
with a slightly redder wavelength, because energy has been lost by climbing up
the gravitational potential well. However, this cannot be the case in the expanding
Universe, because it is homogeneous and isotropic.
An analogy is sometimes made to the adiabatic expansion of a photon gas. If you
have a photon gas contained in a box, and expand the box by a factor (1 + z) (so
the volume increases by (1 + z)³), the energy density will decrease by a factor (1 + z)⁴ and the temperature by a factor (1 + z), as we have argued above. However,
in this case the photon gas does p dV work against the sides of the box. In the
cosmological case, though, the issue of the p dV work is much more subtle.
Unfortunately, this issue takes us into very deep waters. In general relativity, the
separate concepts of energy conservation and momentum conservation are
replaced by zero derivatives of the ‘energy–momentum tensor’. The short answer
is that ‘energy per unit volume’ becomes dependent on the reference frame, so
‘energy’ on its own cannot be said to be conserved, although a more general
conservation law does apply (see the further reading section).
To see why ‘energy per unit volume’ is dependent on the reference frame, imagine
that the Universe is filled with a pressureless gas of particles, and for simplicity
assume just the flat Minkowski metric of special relativity. Suppose that the
particles each have a rest mass m and that their number density is n particles per
unit volume. The energy density will therefore be mc² × n. If we now make
a Lorentz transformation to a reference frame that’s moving relative to the
first, we can see that the mass will be increased by a factor of γ to γm, while
Lorentz contraction of the volumes will result in an increase in n by the same
factor, to γn. The moving observer will therefore see an energy density of
γmc² × γn = γ²mc²n. Therefore the energy density isn't an invariant scalar,
because different observers see different energy densities. It also can’t be a
component of a four-vector, because when you Lorentz transform a four-vector
you get only one γ factor. So ‘energy density’ has to be part of a different sort of
mathematical object. This object is a tensor; we shall meet more examples of
tensors later in this book.

Exercise 2.3 Show that the redshift of matter–radiation equality zeq, when the
energy densities of matter and radiation (including neutrinos) were comparable, is
given by
$$1 + z_{\rm eq} = 23\,800\,\Omega_{\rm m,0} h^2 \left(\frac{T_{\rm CMB,0}}{2.725\,{\rm K}}\right)^{-4}, \qquad (2.4)$$
where TCMB,0 is the present-day CMB temperature. Evaluate zeq. ■
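As a numerical check on Equation 2.4, here is a short sketch in Python. The value Ωm,0 h² = 0.13 is an assumed, WMAP-era illustrative figure rather than one fixed by this chapter:

```python
# Evaluating Equation 2.4. The value of Omega_m,0 * h^2 below is an
# assumed WMAP-era figure, used purely for illustration.
omega_m_h2 = 0.13
T_cmb = 2.725   # K, present-day CMB temperature

z_eq = 23800 * omega_m_h2 * (T_cmb / 2.725) ** -4 - 1
print(f"z_eq = {z_eq:.0f}")   # a few thousand
```

With these inputs the matter and radiation densities were comparable at a redshift of roughly three thousand.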
At early enough times, the Universe was radiation-dominated, with Ωr ≈ 1 and
negligible contributions from other density parameters. Figure 1.11 shows the
evolution of density parameters.

Exercise 2.4 In Section 1.6, we showed that a universe with Ωm = 1 and all
other density parameters zero obeys a = R/R0 ∝ t^{2/3}. Make a similar analysis
for a universe in which Ωr = 1 and the other density parameters are negligible,
and show that in this case a = R/R0 ∝ t^{1/2}. ■
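The result of Exercise 2.4 can also be checked numerically. The sketch below integrates the Friedmann equation for a flat, radiation-only universe (written, as an assumption, in units where H0 = 1) and confirms that a(t) grows with an effective power-law index of 1/2:

```python
import math

# Friedmann equation for a flat, radiation-only universe (Omega_r = 1),
# in units where H0 = 1:  (da/dt)^2 = 1/a^2,  i.e.  da/dt = 1/a.
# Integrate with a simple Euler scheme and check a(t) ~ t^(1/2).
t0, t1, n = 1e-6, 1.0, 200000
dt = (t1 - t0) / n

a = math.sqrt(2 * t0)   # analytic seed a = sqrt(2t) near t = 0
t = t0
for _ in range(n):
    a += dt / a         # da/dt = 1/a
    t += dt

# Effective power-law index between the start and end points
slope = (math.log(a) - math.log(math.sqrt(2 * t0))) / (math.log(t) - math.log(t0))
print(f"effective slope = {slope:.3f}")   # close to 0.5
```

The Euler scheme is crude near t = 0, but the equation d(a²)/dt = 2 is self-correcting, so the late-time slope comes out very close to 1/2.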

2.3 Why is the CMB a black body?


The previous section asked why the CMB radiation should be such a perfect black
body, but arguably failed to answer the question; we showed only that a black
body should stay a black body, not why it should be black body radiation in the
first place. Figure 2.1 shows the spectrum observed from the COBE satellite. (The
COBE FIRAS error bars are smaller than the plot symbols.)
One reason why we might expect the spectrum not to be thermal is that the early
Universe was very young — was there enough time for the particles to thermalize?
A relatively simple argument shows that the young age of the Universe isn’t a
problem. The early Universe is expanding very fast and (on purely dimensional
grounds) we can define a dynamical timescale of τdyn ∝ (Gρ)^{-1/2}, where ρ is the
energy density. We won’t worry about the constant of proportionality for now.
The early Universe must have been radiation-dominated (Chapter 1), so ρ ∝ R^{-4},
where R is the scale factor of the Universe. Therefore the dynamical timescale
will scale as τdyn ∝ R^{+2}, and so the dynamical rate varies as
1/τdyn ∝ R^{-2}. Meanwhile, thermalization of the particles will happen through
interactions between the particles. Even photons can interact with each other,
though the interaction probability is low except at the highest energies. The
number of collisions that a single particle will experience per unit time will be
proportional to the number density of particles that it’s interacting with, n. To find
the total number of collisions for all the particles, we multiply this again by the
number density of particles. The collision rate per unit volume must therefore be
proportional to n², which scales as n² ∝ (R^{-3})² = R^{-6}. So as R tends to zero, the collision
rate (∝ R^{-6}) increases much faster than the dynamical rate (∝ R^{-2}). Regardless
of the constants of proportionality, thermalization will be increasingly easy at
earlier times, and we must reach a time in the early Universe when the particles
are thermalized. The current Big Bang model is sometimes referred to as the Hot
Big Bang, distinguishing it from a previous (but now disfavoured) model.
The closeness of the spectrum to a black body is also strong evidence against
the early Steady State models of the Universe, which have no Big Bang but a
continual creation of matter throughout all space. (The term ‘Big Bang’ was
originally a pejorative coined by a Steady State proponent, Fred Hoyle.) The
Steady State model supposed that the CMB was thermal radiation from dust
clouds, which were themselves heated by stars (much like the Orion nebula).
However, this would predict that the CMB comprises a range of temperatures
from a variety of redshifts, so the spectrum would not resemble a single black
body. Steady State models have fallen out of favour, partly because of the lack of
a plausible mechanism for the steady state creation of matter throughout all space,
Chapter 2 The cosmic microwave background

and partly because they struggle to account for other observations (microwave
background and its anisotropies, the evolution of large-scale structure, Big Bang
nucleosynthesis, etc.) without appearing contrived to many. For these reasons we
won’t discuss these models further in this book.

2.4 Baryogenesis
One of the deep unsolved problems in fundamental physics is why our Universe
has more matter than antimatter. In the standard model of particle physics,
baryon number is strictly conserved. (A baryon such as a proton or a neutron has
a baryon number of B = +1, while an antibaryon such as an antiproton or
antineutron has a baryon number of B = −1.) Baryons are created and destroyed
in baryon–antibaryon pairs, conserving baryon number, yet somehow we find
ourselves in a Universe composed almost entirely of matter. We could adopt the
position that it’s somehow set in the initial conditions, as is also argued for
‘explaining’ the expansion of the Universe, but in both cases this arguably evades
the question. One might suppose that there are distant galaxies made of antimatter
rather than matter, but the intergalactic medium is not empty and there should be
very clear observable signatures from ongoing annihilation along the boundary
between the matter and antimatter regions.
While the Universe was hot enough that kT was above the proton rest mass
energy, protons and antiprotons would have been being created and destroyed
from particles colliding at their thermal velocities. As the Universe expanded and
cooled, this baryon–antibaryon creation eventually ceased and the protons
and antiprotons were free to annihilate, and it turns out that the collision rates
were high enough for this to happen very efficiently. What we see now as a
Universe with mainly matter (rather than antimatter) is in fact the relic of a subtle
asymmetry in the early baryon versus antibaryon numbers, of the order of
1 + 10^{-9} protons for every antiproton. What generated this initial imbalance?
Why is there any matter left in the Universe?
This on its own tells us that there must be new physics beyond the standard model
of particle physics. What sort of theory could explain baryogenesis? The answer
may come from so-called grand unified theories (GUTs) that unify three of the
four fundamental forces: the strong nuclear force, the weak nuclear force and
electromagnetism. These forces appear distinct with different strengths, but their
strengths are predicted in GUTs to converge at energies of ∼10^{15} GeV. GUT
reactions above these energies could be the source of the present-day baryon
asymmetry in the Universe. However, it’s by no means certain that this is the
correct answer. An epoch of inflation (which we shall meet in Section 2.8) would
erase any pre-existing baryon asymmetry. Inflation is thought to be triggered by a
GUT-scale phase transition (see Section 2.8), and once inflation has finished, it
leaves the Universe at a lower temperature than the GUT scale. This would leave
us with no baryon asymmetry, unless generated during the processes that end
inflation.
There must also have been a primordial lepton asymmetry, otherwise the excess
number of protons over antiprotons would have left the Universe with an overall
electric charge. GUTs view baryons and leptons as different states of one common
species of particle, and in many GUTs, B − L is conserved (L being the lepton
number, e.g. +1 for e⁻ and νe, −1 for e⁺ and ν̄e), so this lepton asymmetry would
be a natural consequence of the baryon asymmetry. The present-day lepton
asymmetry would now reside in the cosmic neutrino background, which we shall
meet in Section 2.6.

2.5 The entropy per baryon


Once the baryon–antibaryon annihilation finished, leaving the residual baryon
excess, the comoving baryon density was conserved. The next major annihilation
stage in the Universe’s history was e± annihilation through the reaction
e− + e+ → γ + γ, i.e. creating two photons. After this point and until the first
starlight illuminated the Universe (Chapter 8), the comoving number density of
photons is conserved. It’s conventional to measure the baryon asymmetry in terms
of the photon abundance: we define a new quantity η as
$$\eta = \frac{n_{\rm b} - n_{\bar{\rm b}}}{n_\gamma}, \qquad (2.5)$$
where n_b is the comoving density of baryons, n_b̄ is that of antibaryons, and nγ is
that of photons. (Sometimes the notation η = 10^{-10} η10 is used, though we won’t
make use of this notation in this book.)
The quantity η is sometimes referred to as a measure of the entropy per baryon.
A photon gas has an entropy that’s proportional to the number of photons, so η is
in fact proportional to the reciprocal of the entropy per baryon. Another way of
measuring the baryon density of the Universe is as a fraction of the critical
density, i.e. Ωb,0 (see Chapter 1). The η parameter is related to Ωb,0 via
$$\Omega_{\rm b,0} h^2 = \frac{\eta}{2.74 \times 10^{-8}}. \qquad (2.6)$$
If we take the entropy in the photon gas and divide it by the
cosmological matter density, the entropy per unit mass comes out as
1.09 × 10^{12} (Ωm,0 h²)^{-1} J K^{-1} kg^{-1}. To give you some sense of the scale of this
number, taking 300 K water and raising its temperature by 1 K raises the entropy
by only 14 J K^{-1} kg^{-1}. The entropy content of the Universe is overwhelmingly
dominated by the entropy of the CMB (neglecting black holes), which is why the
expansion can be treated as adiabatic and reversible.
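These numbers are easy to verify. The sketch below recomputes Ωb,0 h² from the WMAP value of η quoted in Section 2.6.2, and the photon entropy per unit mass of matter, using standard SI constants (the specific heat capacity of water is an assumed textbook value):

```python
import math

# Physical constants (SI, rounded)
a_rad = 7.5657e-16        # radiation constant a = 4*sigma/c, J m^-3 K^-4
T_cmb = 2.725             # K, present-day CMB temperature
rho_crit_h2 = 1.878e-26   # critical density divided by h^2, kg m^-3

# Baryon density from the WMAP value of eta, via Equation 2.6
eta = 6.14e-10
omega_b_h2 = eta / 2.74e-8
print(f"Omega_b,0 h^2 = {omega_b_h2:.4f}")   # ~0.022

# Photon entropy density s = (4/3) a T^3, divided by the matter density
# rho_m = (Omega_m,0 h^2) * rho_crit_h2, gives the entropy per unit mass
s_photon = (4.0 / 3.0) * a_rad * T_cmb**3       # J m^-3 K^-1
entropy_per_mass_h2 = s_photon / rho_crit_h2    # x (Omega_m,0 h^2)^-1
print(f"entropy per unit mass = {entropy_per_mass_h2:.3g} x (Omega_m,0 h^2)^-1 J/K/kg")

# Comparison: heating water from 300 K to 301 K (assumed c = 4186 J/kg/K)
print(f"water comparison: {4186.0 * math.log(301/300):.1f} J/K/kg")
```

Both figures quoted in the text (1.09 × 10^{12} and 14 J K^{-1} kg^{-1}) drop straight out.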

2.6 Primordial nucleosynthesis: a thousand seconds that shaped the Universe
2.6.1 The primordial fireball
If we mentally rewind the history of the Universe, we shall reach a point where the
Universe becomes opaque like the photosphere of a star. As we’ve seen, the end
of that opaque epoch is the CMB, i.e. the surface of last scattering. If we continue
rewinding, the Universe will become hotter and denser, and should ultimately
reach the temperature of the Sun’s core. Should we therefore expect nuclear
reactions? It turns out that most of the energy density then was in radiation rather
than matter (Chapter 1) so baryons (e.g. protons and neutrons) would have been

much rarer relative to photons, compared to the Sun’s core. Nuclear reactions at
that time were therefore slow. However, it turns out that the nuclear reaction rates
were significant earlier on, at higher temperatures of around 10⁹ K.
It’s an extraordinary intellectual triumph of the Big Bang theory that it’s possible
to calculate the nuclear reaction rates in the early Universe and estimate the
abundances of nuclei and particles from this primordial nucleosynthesis. These
estimates are in fairly good agreement with observations, as we shall see in this
section. The key concept is freeze-out, which we shall meet first in the context of
protons and neutrons.
In thermal equilibrium, the relative numbers of protons and neutrons, np and nn
respectively, will be related through a Boltzmann distribution:
$$\frac{n_{\rm n}}{n_{\rm p}} = \exp\left(\frac{-\Delta m\,c^2}{kT}\right) \simeq \exp\left(\frac{-10^{10.176}\,{\rm K}}{T}\right), \qquad (2.7)$$
where Δm ≃ 1.29 MeV/c² is the mass difference between a neutron and a
proton. (Strictly speaking this is true only when the protons and neutrons are
non-relativistic, but this is the case in the following discussion.) Protons
and neutrons will be converting between each other through the reactions
p + e⁻ ⇌ n + νe and p + ν̄e ⇌ n + e⁺. The rate v of either reaction
can be calculated through the theory of the weak nuclear force, and in the
high-temperature limit it turns out that the reaction rates are the same and have a
very strong temperature-dependence:
$$v = \left(\frac{10^{10.135}\,{\rm K}}{T}\right)^{-5}\,{\rm s}^{-1}. \qquad (2.8)$$
Meanwhile, the ambient temperature is changing as the Universe expands, varying
as (1 + z) (see Section 2.2). The early Universe was approximately spatially flat
(because Ωk ≈ 0 in the early Universe — see Chapter 1) and radiation-dominated
(also Chapter 1), from which we find that the scale factor satisfies R ∝ t^{1/2}
(Exercise 2.4). From this it follows that the temperature varies with time as
t ∝ T^{-2}, and putting in the constants of proportionality gives
$$t = \left(\frac{10^{10.125}\,{\rm K}}{T}\right)^{2}\,{\rm s}. \qquad (2.9)$$
As a result, the reaction timescale (1/v) for p + e⁻ ⇌ n + νe will quite suddenly
become longer than the age of the Universe at a time t = 1/v. After this point the
neutron–proton reactions cease and the neutron–proton ratio (Equation 2.7) is
frozen out at the equilibrium value that it had at that time. The time t = 1/v
corresponds to a temperature of
$$\left(\frac{10^{10.125}\,{\rm K}}{T}\right)^{2} = \left(\frac{10^{10.135}\,{\rm K}}{T}\right)^{5}, \qquad (2.10)$$
which we can rearrange to find T = 10^{10.142} K. Plugging this into Equation 2.7,
we find that the relic neutron–proton ratio must be
$$\frac{n_{\rm n}}{n_{\rm p}} \simeq \exp\left(\frac{-10^{10.176}\,{\rm K}}{10^{10.142}\,{\rm K}}\right) \simeq 0.34. \qquad (2.11)$$
At this time, the Universe was only about one second old (Equation 2.9).
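The freeze-out algebra above can be reproduced in a few lines. The sketch below solves Equation 2.10 for the freeze-out temperature, then evaluates Equations 2.7 and 2.9 at that temperature:

```python
import math

# Freeze-out of the neutron-proton reactions (Equations 2.8-2.11).
# Setting the age of the Universe, t = (10**10.125 / T)**2 s, equal to
# the reaction timescale 1/v = (10**10.135 / T)**5 s gives
#   T**3 = 10**(5*10.135 - 2*10.125)
T_freeze = 10 ** ((5 * 10.135 - 2 * 10.125) / 3)
print(f"freeze-out temperature = 10^{math.log10(T_freeze):.3f} K")

# Relic neutron-proton ratio from the Boltzmann factor (Equation 2.7)
ratio = math.exp(-(10**10.176) / T_freeze)
print(f"n_n / n_p = {ratio:.2f}")   # ~0.34

# Age of the Universe at freeze-out (Equation 2.9)
t_freeze = (10**10.125 / T_freeze) ** 2
print(f"age at freeze-out = {t_freeze:.2f} s")   # about one second
```

This reproduces T = 10^{10.142} K, nn/np ≈ 0.34 and an age of roughly one second.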

There are some slight corrections to this, and more detailed calculations
give nn/np ≈ 1/7 for the relic neutron-to-proton ratio. For example, we’ve
assumed an instantaneous transition. Also,
the temperature-dependence of the rate will be slightly modified at lower
temperatures, because Equation 2.8 is the high-T limiting case. Another potential
complication is the fact that other reactions will be happening at the same time as,
for example, p + e⁻ ⇌ n + νe. For instance, atomic nuclei will form from the
protons and neutrons. While the ambient thermal energy kT of particles is much
larger than nuclear binding energies, these nuclei will quickly be destroyed again,
so we can ignore these reactions for now.
We’ve gone carefully through this freeze-out process because it’s one of the
key physical principles in the nuclear reactions of the primordial fireball. For
example, the electron–positron annihilation creates neutrinos through the weak
interaction reaction e⁻ + e⁺ ⇌ νe + ν̄e (as well as annihilating through the
electromagnetic interaction e⁻ + e⁺ ⇌ γ + γ). Neutrino production freezes out
at a temperature of approximately 10^{10.5} K, corresponding to a cosmic time of
about 0.18 seconds, leaving most of the electrons and positrons to annihilate to
make more photons, which happens at a temperature of T ≈ me c²/k ≈ 10^{9.77} K,
corresponding to a time (Equation 2.9) of about five seconds, i.e. shortly after the
neutron–proton freeze-out. Neutrinos interact only very weakly with other matter,
and their freeze-out happened before the epoch of the CMB; there should be a
cosmic neutrino background from much earlier cosmic epochs than the CMB.
Detailed calculations of the relic abundances of protons and neutrons take
into account the ongoing changes in the neutrino population, though we won’t
discuss this in this book. There are formidable experimental challenges to the
direct detection of the primordial neutrino background, though the presence
of these neutrinos can be inferred indirectly from the structures in the CMB.
(Nevertheless, neutrinos from astrophysical sources have been detected, most
famously from the supernova SN 1987A.)
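Equation 2.9 makes these epoch times a one-line calculation, as in the following sketch:

```python
# Cosmic time as a function of temperature in the radiation-dominated
# era, from Equation 2.9: t = (10**10.125 K / T)**2 seconds.
def age_seconds(T_kelvin):
    return (10**10.125 / T_kelvin) ** 2

# Neutrino freeze-out at ~10^10.5 K; e-/e+ annihilation at
# T ~ m_e c^2 / k ~ 10^9.77 K:
print(f"neutrino freeze-out: {age_seconds(10**10.5):.2f} s")   # ~0.18 s
print(f"e-/e+ annihilation:  {age_seconds(10**9.77):.1f} s")   # ~5 s
```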
Neutrons aren’t stable but instead decay, with a mean lifetime of τ = 885.7 ± 0.8 s.
Why are there any left? Why isn’t the Universe pure hydrogen? Luckily for us,
the temperatures soon dropped enough to allow the formation of atomic nuclei. To
see why, compare the binding energy of the first nuclide heavier than hydrogen
(deuteron, 2.225 MeV) with the electron rest mass energy (0.511 MeV) and the
neutron–proton mass difference (1.3 MeV). The Universe was still only a few
seconds old.

A note on terminology
Children are taught at school that nuclear reactions are not combustion, so
it’s not correct to refer to nuclear reactions as ‘burning’. This is quite right.
However, at this level it’s usually felt that there is no danger of confusing
this with combustion, so the technical literature makes free use of the verb
‘to burn’ and related words. For example, the early Universe is sometimes
referred to as the primordial fireball. I once heard a supernova described in a
seminar as ‘like a forest fire, but the trees can run away’.


The nuclear reactions in the next 1000 seconds or so shaped the baryonic content
of the Universe. To calculate the final mix of elements left at the end of these
nuclear reactions, you need to consider all the different reactions. We won’t do
this here (see the further reading section for more details), but the main processes
are summarized in Table 2.1 and Figure 2.3.

[Figure 2.3: log–log plot of mass fraction (10⁻¹⁴ to 1) against time (10 to 10⁴ seconds), showing curves for ¹H, n, ⁴He, ²H, ³H, ³He, ⁷Li and ⁷Be.]

Figure 2.3 Predicted light element abundances during the primordial fireball.
Many of the reactions involved in changing these abundances are listed in
Table 2.1.

2.6.2 The primordial element abundances


So, after this fireball, how did the Universe end up? The element abundances
depend on the number of baryons per photon, or (equivalently) on η or Ωb,0 h².
The value of η has been very well determined by the WMAP satellite (of which
more later) to be η = (6.14 ± 0.25) × 10^{-10}. Big Bang nucleosynthesis therefore
makes very clear predictions for the primordial abundances of elements created in
the first half hour of the Universe’s existence. We can test these predictions, and
the overall level of agreement with observations is one of the many successes of
the Big Bang model (and challenges to rivals such as Steady State models).
However, the tricky part of these experiments is to find baryonic matter that has
remained in its primordial condition for the ∼13.7 billion years since primordial
nucleosynthesis. Figure 2.4 shows a compilation of constraints on η.

Table 2.1 Some of the important reactions in primordial nucleosynthesis.

Time since the Big Bang | Reactions | Description
< 1 s | p + e⁻ ⇌ n + νe, n + e⁺ ⇌ p + ν̄e | Neutron–proton freeze-out sets the subsequent neutron–proton ratio.
∼1–100 s | n → p + e⁻ + ν̄e | Neutrons then decay.
∼100–200 s | p + n ⇌ D + γ | The reason why there are any neutrons left in the Universe is nuclear reactions that create stable deuterium nuclei, which allow further reactions.
∼200–1000 s | D + n → ³H + γ, ³H + p → ⁴He + γ, D + p → ³He + γ, ³He + n → ⁴He + γ, D + D → ³He + n, D + D → ³H + p, ³H + D → ⁴He + n, ³He + D → ⁴He + p | Deuterium burning exceeds deuterium creation. The net effect of these deuterium-burning reactions is D + D → ⁴He + γ.

[Figure 2.4: log–log plot of number abundance relative to hydrogen (10⁻¹⁰ to 1) against Ωb,0 h² (0.001 to 1), showing curves for ⁴He, D (²H), ³He and ⁷Li.]

Figure 2.4 The light element abundances relative to ¹H, as a function of the present-day baryon density times h².
The vertical area is the constraint on the baryon density from WMAP. The curves show the predictions from
nucleosynthesis calculations, while the horizontal boxes show the observational constraints. There is a broadly
consistent picture apart from the ⁷Li abundance, but this element can be destroyed in stars so is difficult to measure.
The best constraints on the deuterium abundance have come from the absorption
lines in neutral hydrogen clouds. Here, the light from a background source
(such as a quasar) passes through a foreground dense neutral hydrogen clump.
The abundance of deuterium is low, but if that clump is sufficiently dense,

characteristic absorption lines can be detected. We shall return to this in


Chapter 8. Deuterium is destroyed in stellar nucleosynthesis, so the observed
deuterium abundance is more accurately described as a lower bound to the
primordial abundance.
The lithium abundance is very difficult to measure because lithium is relatively
rare. Cosmic rays and stellar nucleosynthesis can both produce ⁷Li long after
primordial nucleosynthesis, and ⁷Li can also be destroyed in stellar interiors.
Attempts have been made to measure the ⁷Li abundance in old, metal-poor
stars for which one might hope that these reactions are minimized. However,
as we see in Figure 2.4, there seems to be some discrepancy with Big Bang
nucleosynthesis. (See, for example, Steigman, G., 2007, Annual Review of
Nuclear and Particle Science, 57, 463.) There is a very active debate over
whether this discrepancy is the signature of new physics, or whether there are
unknown systematic errors in the abundance calculations, or whether we are
seeing the destruction of lithium in the previous generation of stars, or whether
there are uncertainties in the stellar temperature scale that is used to convert the
lithium absorption line depths to lithium abundances.
The ³He abundance is also complicated and model-dependent. Again, it’s both
created and destroyed in stars. The strongest constraints on the ³He abundance
currently come from the metal-poor H II region that is most distant from the
Galactic Centre. ³He is less sensitive to changes in η than D (see Figure 2.4), so
the constraints on η in Figure 2.4 are correspondingly weaker.
Perhaps the biggest surprise is the observed ⁴He abundance. The abundance
depends only very weakly on η, and with the WMAP value of η, the abundance of
⁴He predicted by primordial nucleosynthesis is Yp = 0.2485 ± 0.0008. The ⁴He
abundance monotonically increases with time in stellar nucleosynthesis, but as the
oxygen abundance (also created in stellar nucleosynthesis) in stars tends to zero,
the ⁴He abundance should tend towards its primordial value. Some measurements
of the primordial ⁴He abundance (also written Yp) differ from the nucleosynthesis
prediction by more than 2σ, i.e. two standard deviations! However, there is
considerable debate over the systematic errors present in these data, which
appear to be significant. (Again see, for example, Steigman, 2007.) One approach
is to derive a conservative upper limit to Yp using only the best-studied systems
for which systematics are best characterized. In this way a 2σ limit of Yp < 0.254
has been found, consistent with Big Bang nucleosynthesis. Alternatively, the
currently most defensible compilation of observations for which systematics can
be well characterized gives Yp = 0.240 ± 0.006. This is only marginally
discrepant (∼1.4σ) with the Big Bang nucleosynthesis value.
If the ⁴He abundance is found ultimately to be discrepant with primordial
nucleosynthesis, it may be a signature of unknown new physical laws, for example
modifying the expansion rate at early post-inflationary epochs (but only by at
most a few tens of per cent). This would change the amount of time available for
primordial nucleosynthesis and so change the final abundances.

2.7 The need for new physics


The primordial abundances of light elements have long been felt by some to hint
tantalizingly at new unknown physical laws, but there are many much stronger


hints. For example, where did the matter–antimatter asymmetry of the Universe
come from (Section 2.4)? Also, what caused the initial inhomogeneities in the
Universe? If the Universe were perfectly homogeneous, it would have stayed
homogeneous and no stars or galaxies would have formed. Something must have
given the Universe its initial density perturbations. Also, what triggered the initial
expansion? This is often written off as part of the initial conditions, but isn’t
that evading the question? We’ve also met the fine-tuning problems for the
cosmological density parameters Ωm and ΩΛ in Chapter 1 (Section 1.7), known as
the flatness problem.
There is another very fundamental problem posed by the uniformity of the
microwave background itself, known as the horizon problem. Suppose that you
are at some place in the early Universe, arbitrarily close to the time of the Big
Bang. You send a photon out. Neglecting the opacity of the Universe, how far will
that photon travel? Anything further could not have been in causal contact with
you since the Big Bang, so the distance that the photon travels sets the size of the
causally-connected region.
We start from the Robertson–Walker metric (Equation 1.6) with the approximation
of a spatially-flat universe (Figure 1.11) so k = 0. We can set the origin to where
the photon starts so the light ray is radial, so dθ = dφ = 0. Also, all light rays
have ds = 0, so we have that R(t) dr = c dt, where R(t) is the scale factor of the
Universe, as in Chapter 1. The proper distance travelled by the photon will
therefore be
$$r = \int_0^t \frac{c\,{\rm d}t'}{R(t')}. \qquad (2.12)$$
In the early radiation-dominated Universe, R ∝ t^{1/2} (Exercise 2.4), so this integral
converges to a finite value. In the later matter-dominated Universe (before ΩΛ
became significant), we had R ∝ t^{2/3}, which again converges. The size of this
causally-connected region is known as the particle horizon. Note that this is
different to the event horizon (Section 1.11). To calculate the event horizon, you’d
integrate from t to infinity, not from 0 to t.
We saw in Section 1.9 that Equation 1.44 implies that the size of the comoving
distance to z = ∞ in an Ωm = 1, Λ = 0 universe is 2c/H0 . The proper distance
to the particle horizon must therefore be 2c/H(z), where H(z) is the Hubble
parameter at redshift z. (If we had assumed a radiation-dominated universe, this
would come out as c/H(z), which is a factor of two smaller.) The value of H(z)
at the time of recombination comes out as 18 200 × H0 using Equation 1.33 and
the values of the density parameters in Section 1.5. The particle horizon size then
comes out as 2c/H = 2c/(H0 × 18 200) = (2c/H0) × 5.5 × 10^{-5}, or about
0.46 Mpc for an H0 of 72 km s^{-1} Mpc^{-1}.
The horizon problem is that 0.46 Mpc on the CMB is very small: by
numerically integrating Equation 1.44, we find that the comoving distance
to z = 1090 (the redshift of the CMB) is 14 189 Mpc, so the angular diameter
distance dA = dcomoving/(1 + z) = 14 189/1091 ≈ 13 Mpc. Using Equation 1.47,
the angular size of the particle horizon at the time of recombination is
θ = D/dA = 0.46/13 radians ≈ 2 degrees (slightly less if we take into account
the early radiation-dominated phase). We’ve just shown that objects further apart
than this distance could not have been in causal contact, so how is it that parts of
the CMB sky more distant than two degrees ever managed to look so similar?
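The arithmetic of the horizon problem can be collected into a short sketch (the Hubble constant, the H(z)/H0 factor and the comoving distance are the values quoted above, not recomputed from first principles):

```python
import math

# Arithmetic of the horizon problem. H0, the H(z_rec)/H0 factor and the
# comoving distance to z = 1090 are the values quoted in the text.
c = 299792.458           # km/s
H0 = 72.0                # km/s/Mpc
hubble_dist = c / H0     # Mpc

# Proper particle horizon at recombination: 2c/H(z), with H(z) = 18200 H0
r_horizon = 2 * hubble_dist / 18200
print(f"particle horizon at recombination: {r_horizon:.2f} Mpc")  # ~0.46

# Angular diameter distance to the CMB, and the horizon angle on the sky
d_comoving = 14189.0     # Mpc
z_cmb = 1090
d_A = d_comoving / (1 + z_cmb)
theta_deg = math.degrees(r_horizon / d_A)
print(f"angular diameter distance: {d_A:.1f} Mpc")    # ~13
print(f"horizon angle on the sky:  {theta_deg:.1f} degrees")  # ~2
```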

There is also a problem that arises from almost all grand unified theories (GUTs)
that seek to unify three of the four fundamental forces (electromagnetism, strong
nuclear force, weak nuclear force). As the Universe expanded and cooled, the
GUT field (whatever it was) would settle into particular configurations. This
is rather like ferromagnetism, where below a critical temperature (the ‘Curie
temperature’) the magnetic moments of atoms align with those of their
neighbours into magnetic domains. In the cosmological case these domains can
have various sorts of boundaries, including a monopole state where the local field
points radially away from a particular point. Macroscopically this would look like
a magnetic monopole. GUTs predict about one monopole per horizon size at the
time when the Universe was at the critical GUT temperature, but as this was very
early in the Universe, the horizon size was small. Therefore the present-day
Universe should have many magnetic monopoles — so many, in fact, that they
would dominate the energy density of the Universe. Why do we not see them in
the Universe? This is known as the monopole problem.
Perhaps the solution to all these problems is at the Planck epoch. We currently
have no consistent theory that unifies quantum mechanics and general relativity.
Where should we expect such a theory to be needed? Presumably the theory
would need to use ℏ, G and c, so we can use these to derive a characteristic length,
mass and time:
$$m_{\rm Pl} = \sqrt{\frac{\hbar c}{G}} \simeq 10^{19}\,{\rm GeV}/c^2, \qquad (2.13)$$
$$r_{\rm Pl} = \sqrt{\frac{\hbar G}{c^3}} \simeq 10^{-35}\,{\rm m}, \qquad (2.14)$$
$$t_{\rm Pl} = \sqrt{\frac{\hbar G}{c^5}} \simeq 10^{-43}\,{\rm s}. \qquad (2.15)$$
These are known as the Planck mass, Planck length and Planck time,
respectively. When a mass, length or time interval under consideration is of
the order of the Planck scales, we should expect an unknown theory of
quantum gravity to be needed. Clearly the initial singularity at t = 0 in the
Robertson–Walker metric is an example, as is the singularity at the centre of a
black hole (Chapter 6). Note that the GUT energy scale of 10^{15} GeV is a factor of
10⁴ from the Planck scale. While 10⁴ might be considered a large factor, the
current temperature of the CMB of 2.7 K is equivalent to about 2 × 10^{-4} eV, i.e. a
factor of about 10^{28} from the GUT epoch.
Exercise 2.5 Show on dimensional grounds that the only characteristic
timescale involving ℏ, G and c is proportional to √(ℏG/c⁵). ■
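Equations 2.13–2.15 are easy to evaluate; the sketch below uses rounded SI values of the constants:

```python
import math

# SI constants (CODATA values, rounded)
hbar = 1.0546e-34   # J s
G = 6.674e-11       # m^3 kg^-1 s^-2
c = 2.9979e8        # m/s
GeV = 1.6022e-10    # J

m_pl = math.sqrt(hbar * c / G)      # Planck mass, kg
r_pl = math.sqrt(hbar * G / c**3)   # Planck length, m
t_pl = math.sqrt(hbar * G / c**5)   # Planck time, s

print(f"Planck mass:   {m_pl * c**2 / GeV:.2e} GeV/c^2")   # ~1.2e19
print(f"Planck length: {r_pl:.2e} m")                      # ~1.6e-35
print(f"Planck time:   {t_pl:.2e} s")                      # ~5.4e-44
```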
It may be that in order to solve all these problems (monopole, flatness, horizon,
baryon asymmetry, density perturbations, initial expansion) we need the unknown
theory of quantum gravity at the earliest times in the Universe (t ∼ tPl). However,
there has been a proposal to solve these problems at the later GUT epoch in the
Universe’s history, when the characteristic temperature was around the GUT scale
of approximately 10^{15} GeV, in which GUT-scale physics triggers a very rapid
phase of expansion known as inflation. Before describing what triggers this
phase, we’ll first look at how this solves some of these problems.
We’ve shown that the particle horizon size is r = ∫₀ᵗ c dt′/R(t′). If we want
this to diverge, we’ll need an expansion rate R(t) much faster than the t^{1/2}

in the radiation-dominated era or the t^{2/3} in the matter-dominated era. If we
suppose that R(t) ∝ t^α, then this integral gives the horizon size as ct^{1−α}/(1 − α)
evaluated from t = 0 to t, i.e.
$$r = \frac{c}{1-\alpha}\left(t^{1-\alpha} - 0^{1-\alpha}\right). \qquad (2.16)$$
If α > 1, then the horizon size formally diverges, if the t^α expansion operated
right back to the Big Bang at t = 0.
Physically, the t^α phase would be a period of very rapid expansion in the
Universe. This would immediately solve the horizon problem, because the regions
that appear to us to be causally disconnected were in fact once part of a much
smaller, causally-connected region that was then inflated to a much bigger size.
We can calculate the minimum amount of inflation needed to solve the horizon
problem: we need that the comoving horizon size when the Universe had a
temperature at the GUT scale (E ≈ 10^{15} GeV, i.e. T = E/k ≈ 10^{28} K) was
inflated to at least the horizon size today, when the CMB temperature is 2.7 K.
The redshift of the GUT epoch was therefore 1 + z ≈ 10^{28}/2.7 ≈ 10^{27.5}. We can
estimate the size of the proper causal horizon (in the absence of inflation) as
c tGUT, where tGUT is the age of the Universe in the GUT epoch — about 10^{-35} s.
The comoving size will be cz tGUT (using 1 + z ≈ z), which comes out as about
10 metres. This is about a factor of e⁶⁰ smaller than the current horizon size.
Therefore at least about 60 e-foldings of inflation are needed; were it not for
inflation, the present-day Universe would be strongly inhomogeneous everywhere
on scales more than a few metres.
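A rough version of this e-folding estimate, with the GUT-epoch age tGUT = 10^{-35} s and H0 = 72 km s^{-1} Mpc^{-1} assumed as in the text:

```python
import math

# Minimum e-foldings of inflation for the horizon problem. The GUT-epoch
# age and H0 are the assumed values used in the text.
c = 3.0e8                      # m/s
t_gut = 1e-35                  # s, age of the Universe at the GUT epoch
z_gut = 1e28 / 2.7             # redshift of the GUT epoch

# Comoving size of the causal horizon at the GUT epoch, without inflation
comoving_horizon = c * t_gut * z_gut     # metres, ~10 m
print(f"comoving causal horizon: {comoving_horizon:.0f} m")

# Present-day horizon ~ 2c/H0; compare and count e-foldings
H0 = 72.0 * 1000 / 3.086e22    # 72 km/s/Mpc in s^-1
horizon_today = 2 * c / H0     # metres
n_efolds = math.log(horizon_today / comoving_horizon)
print(f"e-foldings needed: about {n_efolds:.0f}")
```

Given the roughness of the inputs, the answer lands close to the ~60 e-foldings quoted above.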
Inflation gives a mechanism for generating the initial density perturbations in the
Universe. As we saw in Section 1.11, a universe dominated by a cosmological
constant has an event horizon with a proper radius of c/H. This event horizon
will generate Hawking radiation, which can be understood qualitatively as
follows. Quantum mechanics predicts that virtual particle–antiparticle pairs will
be created close to the event horizon, but sometimes one part of the pair will
fall inside the event horizon while the other escapes. Event horizons should
therefore radiate energy, an effect first predicted for black
holes. Detailed quantum field theory calculations show that this radiation has a
thermal fluctuations are ultimately the source of the initial cosmological density
perturbations that eventually formed stars and galaxies. All forms of radiation
should contribute to the Hawking radiation, including gravitational waves, so a
prediction of inflation is a primordial gravitational wave background.
The monopole problem could also be solved by a period of inflation. There would
have been many magnetic monopoles in the early Universe, but once made, the
total number of cosmological monopoles is conserved. The period of inflation
would then have greatly diluted their space density. If this inflation epoch is
allowed to run for sufficiently long, the probability of finding even one monopole
in the observable Universe could be vanishingly low.
We can infer more about the nature of the substance driving inflation if we use
Equation 1.8, which we’ll reproduce in a slightly modified form here:
(1/R) d²R/dt² = (−4πG/3)(ρ + 3p/c²) + Λc²/3. (2.17)

Chapter 2 The cosmic microwave background

As we found in Chapter 1, the Λ term makes a negligible contribution to the
dynamics of the expansion in the early Universe, so we can neglect it.
Let’s suppose that R is proportional to t^α: i.e. R(t) = n t^α, where n is
some constant. The time derivative is dR/dt = nα t^(α−1), and the second
derivative is d²R/dt² = nα(α − 1) t^(α−2). Dividing this by R gives us
(1/R) d²R/dt² = α(α − 1) t^−2, but this is also the left-hand side of
Equation 2.17. In order to solve the horizon problem, we need α > 1, which
means that α(α − 1) t^−2 must be positive. This means that the left-hand side of
Equation 2.17 must also be positive. In order for the right-hand side to be positive,
we need ρ + 3p/c² < 0.
What sort of substance would satisfy this? We characterize the equation of state of
a gas as p = wρc², where p is the pressure and ρ is the density. The parameter
w defines the equation of state. For example, w = 0 is pressureless matter
(sometimes called ‘dust’), while for a monatomic gas w = 2/3. A photon gas has
w = 1/3. Substituting p = wρc² into our condition ρ + 3p/c² < 0 gives
ρ(1 + 3w) < 0, which (since ρ > 0) implies that w < −1/3. In
other words, inflation needs a sort of negative pressure!
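The sign logic can be tabulated directly. This minimal check of Equation 2.17 (with Λ neglected and ρ > 0) substitutes p = wρc², so the sign of d²R/dt² is the sign of −(1 + 3w):

```python
# Sign of the acceleration for sample equations of state, from Equation 2.17
# with the Λ term neglected and ρ > 0.
for w, label in [(0.0, "pressureless matter ('dust')"),
                 (1 / 3, "photon gas"),
                 (-1 / 3, "boundary case"),
                 (-1.0, "cosmological-constant-like")]:
    sign = -(1 + 3 * w)          # proportional to (1/R) d^2R/dt^2
    verdict = ("accelerates" if sign > 1e-12
               else "coasts" if abs(sign) <= 1e-12 else "decelerates")
    print(f"w = {w:+.2f} ({label}): expansion {verdict}")
```

Only the entries with w < −1/3 accelerate, matching the conclusion above.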
Exercise 2.6 Show that the inflation condition that α > 1 is equivalent to the
scale factor accelerating, i.e. d²R/dt² > 0. ■
Inflation can also explain why the Universe is so close to being spatially flat (the
flatness problem). From thermodynamics, an adiabatic expansion of a gas with an
equation of state parameter w satisfies p ∝ V^−(1+w), where p is the pressure and
V is the volume. If the rest mass density is negligible, then ρ ∝ V^−(1+w) too. In
this case we can write ρ ∝ R^−3(1+w), where R is the scale factor. (As in the
photon gas in Section 2.2, we’re neglecting the issue of p dV work, but this
equation turns out to be true in the fully general relativistic case.) The key to
solving the flatness problem is in Equation 1.7 from Chapter 1, which we’ll
reproduce here:
(dR/dt)² = (8πGρR²)/3 − kc² + (Λc²R²)/3. (Eqn 1.7)
Again, we’ll neglect the Λ term because it makes a negligible contribution
to the dynamics in the early Universe. Suppose that the Universe had
some arbitrary curvature constant k before inflation. As inflation
expanded the Universe, the ρR² term in the equation above will vary as
ρR² ∝ R^−3(1+w) R² = R^−3(1+w)+2 = R^−(3w+1). If w < −1/3, then the ρR²
term increases with the scale factor. This means that the ρR2 term will eventually
dominate and the kc2 term will be very small in comparison, and can be neglected
or taken to be zero. Thus after inflation, the Universe is left in a state that is close
to spatial flatness. Another way of thinking of this is that the process of inflation
takes one tiny local patch that appears locally flat, and expands it enormously.
Thus no matter how wrinkly the initial state of spacetime before inflation, a small
enough local region will appear locally flat, so the result after inflation is a
spacetime that’s spatially flat. Spatial flatness is in fact a key prediction of
inflation.
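A one-line numerical illustration of this dilution: with w = −1 the ρR² term grows as R^−(3w+1) = R², while the kc² term stays constant, so over 60 e-foldings the curvature term is suppressed relative to ρR² by e^−120:

```python
import math

# During inflation (w = -1) the rho*R^2 term in Equation 1.7 grows as R^2,
# while the curvature term k*c^2 stays constant. Over N e-foldings the
# curvature term is therefore suppressed relative to rho*R^2 by e^(-2N).
N = 60
suppression = math.exp(-2 * N)
print(f"curvature term suppressed by ~10^{math.log10(suppression):.0f}")
```

Whatever curvature the pre-inflation patch had, a suppression of fifty-odd orders of magnitude leaves the post-inflation Universe indistinguishable from spatially flat.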
Finally, inflation also changes the estimates of the age of the Universe, because
the epoch of inflation could be arbitrarily long. In principle this is one way
of solving the singularity at t = 0 in the Robertson–Walker metric. Having
said that, we can calculate the minimum time needed for inflation to solve the
horizon problem. We found earlier that there need to be at least 60 e-foldings of
expansion, and these will take a time Δt ≈ 60 H^−1, where H is the Hubble
parameter during inflation. This comes out as only about 10^−33 seconds.
But how is inflation triggered? We shall look at some possibilities in the next
section.
2.8 The inflaton field
The idea behind inflation is to speculate that the Universe is filled with a scalar
field. This is very different to the fields associated with the four fundamental
forces that you have met so far. The electromagnetic force, weak nuclear force,
strong nuclear force and (Newtonian) gravitational force are all vector fields, i.e.
we can draw the force vector at every point in space, as illustrated in Figure 2.5.
A scalar field, however, has no direction. It’s just an intensity or strength at
every point in space, also illustrated in Figure 2.5. We have never detected a
fundamental scalar field in physics, though derived quantities such as temperature
are scalar fields (but not fundamental ones). Should the Large Hadron Collider
detect the Higgs boson, then this will be evidence for a fundamental scalar field
known as the Higgs field, though it’s known that the Higgs field can’t be identical
to the field responsible for inflation.
The scalar field associated with inflation takes a value that we shall symbolize
as φ. In general, φ can vary with time and space, though to a first approximation
everywhere in the Universe will have the same value of φ at any one time. The
time-variation of φ throughout the Universe will prove to be very important. The
field φ has a particle associated with it, just as the electromagnetic field is
associated with the photon. In this case the particle is known as the inflaton and
the corresponding field φ is known as the inflaton field. Note that this is ‘inflaton
field’, not ‘inflation field’.
The value of the scalar field φ has an energy associated with it (in fact, it’s an
energy density), which we write as V(φ). We can imagine this as an object in a
valley-shaped potential well, as shown in Figure 2.6. The height of the object in
Figure 2.6 represents the energy V(φ), while the horizontal position represents
the value of φ. If the object is offset from the minimum of V(φ), then it will slide
down the well in this analogy, and indeed we imagine that there should be an
energy term associated with dφ/dt.

Figure 2.5 An example of a vector field and a scalar field. Vector fields have an
amplitude (the length of the arrows) and a direction (where the arrows point) at
every position. We’ve drawn arrows at only a few points so the figure isn’t awash
with overlapping arrows, but vector fields have position and direction everywhere,
not just at a finite number of points. Scalar fields, on the other hand, have only an
amplitude. We’ve represented this as a greyscale image.

However, we don’t know the shape of the potential. Taking into account the
possibility of a temperature-dependent interaction with the other particles, it’s
expected that there should be a temperature-dependent ‘effective potential’ that
could, for example, look like Figure 2.7. The Universe starts in the minimum
configuration for φ, but as the temperature drops, the shape of the effective
potential changes. The Universe may find itself in a secondary minimum or may
find that the only minimum has shifted. In the former case, the Universe is in a
false vacuum and could quantum tunnel through the barrier, at which point the
value of φ could fall to the new state; in the latter case, the value of φ in the
Universe will simply slide down to the new state.
Figure 2.6 Schematic representation of the value of the inflaton field φ, versus
the energy associated with the field V(φ).

Figure 2.7 Illustration of how temperature-dependent effects can create a false
vacuum. Early in the history of the Universe, the inflaton field is around the
energy minimum at φ = 0, but as the Universe cools, a second, deeper minimum
appears elsewhere. The Universe slides or (if necessary) quantum tunnels to the
new minimum.

In either case, there is an energy difference between the upper and lower vacuum
states of ΔV. If we take V = 0 as the true vacuum, then the elevated state has an
effective cosmological constant (though strictly speaking this is a misnomer as it
would not be constant in this case). The order of magnitude for ΔV expected in
GUTs is huge, giving prima facie plausibility to inflation: the characteristic
energy density can be shown to be (on, for example, purely dimensional grounds)

ρ ≈ E_GUT^4/(ℏ³c⁵) ≈ (10^15 GeV)^4/(ℏ³c⁵) ≈ 10^80 kg m^−3.
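As a sanity check on this dimensional estimate, a short calculation with standard SI constants recovers the quoted order of magnitude:

```python
import math

# Order-of-magnitude check of rho ~ E_GUT^4 / (hbar^3 c^5) in SI units.
hbar = 1.055e-34                     # reduced Planck constant, J s
c = 3.0e8                            # speed of light, m/s
E_GUT = 1e15 * 1e9 * 1.602e-19       # 10^15 GeV expressed in joules

rho = E_GUT ** 4 / (hbar ** 3 * c ** 5)
print(f"rho ~ 10^{math.log10(rho):.0f} kg m^-3")   # ~ 10^80
```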
Going back to our analogy of an object sliding inside a potential well (Figure 2.6),
the full equation of motion turns out to be (in natural units of c = ℏ = 1, see box):

φ̈ + 3Hφ̇ − ∇²φ + dV(φ)/dφ = 0. (2.18)
The derivation of this formula is lengthy, but it follows ultimately from energy
conservation considerations in the quantum scalar field. The gradient ∇ is with
respect to proper spatial coordinates (not comoving ones), and the dots are time
derivatives. Note that it involves the Hubble parameter H. In the analogy of an
object sliding down the valley, the H φ̇ term is equivalent to a friction term, while
dV /dφ is the force acting on the object.
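The friction analogy can be made concrete by integrating Equation 2.18 numerically. The sketch below assumes a homogeneous field (∇²φ = 0), natural units, an illustrative quadratic potential V = ½m²φ², and a *fixed* Hubble parameter H (a simplification: in reality H depends on the field through the Friedmann equation):

```python
# Numerical sketch of Equation 2.18 with assumed V = 0.5*m^2*phi^2 and
# fixed H. The 3*H*phidot term acts as friction: for large H the field
# creeps slowly downhill instead of oscillating about the minimum.
m, H = 1.0, 10.0
phi, phidot = 5.0, 0.0
dt = 1e-3
for _ in range(200_000):                        # integrate to t = 200
    phiddot = -3 * H * phidot - m ** 2 * phi    # phi'' = -3H*phi' - dV/dphi
    phidot += phiddot * dt
    phi += phidot * dt
print(f"phi(t=200) = {phi:.4f}")                # still sliding slowly toward zero
```

With H = 0 the same code would oscillate about φ = 0; the strong Hubble friction is what makes the field "slide" rather than ring.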
Natural units
Particle physicists sometimes opt to use ‘natural units’ in which ℏ = c = 1
to keep the algebra simpler, avoiding fiddly factors of ℏ and c that can be
determined at the end from dimensional analysis. The thinking is to treat ℏ
and c as implying ‘conversion factors’ between different dimensions. For
example, c could be thought of as the conversion factor between space
measurements and time measurements. What defines these conversion
factors? Well, for us it’s about how we choose to measure lengths
(e.g. metres) and times (e.g. seconds). The Universe doesn’t care whether
we use metres or seconds, or miles and years, so why not choose units in
which c is set to one? For us c has dimensions LT−1 (e.g. metres per
second), so c = 1 has the effect of treating space units in the same way as
time units. In natural units, energies have the same dimensions as mass
(because E = mc2 ) and 1/time (because E = hν).
A similar consideration turns out to give the pressure and energy density, again in
natural units (c = ℏ = 1):

p = ½φ̇² − (1/6)(∇φ)² − V(φ), (2.19)
ρ = ½φ̇² + ½(∇φ)² + V(φ). (2.20)

In natural units the equation of state parameter w = p/(ρc²) is written as
w = p/ρ. Equations 2.19 and 2.20 could generate a negative equation of state
parameter: for example, if V ≫ φ̇² and spatial derivatives are negligible, then
w = −1. If we define φ to have the units of energy, then Equations 2.19 and 2.20
come out in conventional units as

p = φ̇²/(2ℏc³) − (∇φ)²/(6ℏc) − V(φ), (2.21)
ρc² = φ̇²/(2ℏc³) + (∇φ)²/(2ℏc) + V(φ), (2.22)

so each term has the dimensions of energy density.
It’s usual in inflationary calculations to assume that the spatial derivatives are
negligible, because we’re inflating a small, locally-homogeneous region to a giant
size, so any inhomogeneities will become negligible. This means that in most
contexts, φ is the value of the field throughout the observable Universe. If we
assume that the field φ is approximately the same everywhere, then ∇φ ≈ 0 and
∇²φ ≈ 0.
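With gradients neglected, Equations 2.19 and 2.20 reduce to w = (½φ̇² − V)/(½φ̇² + V) in natural units. The sample values below are arbitrary illustrations of how the kinetic/potential balance sets the equation of state:

```python
# Equation of state w = p/rho from Equations 2.19 and 2.20 with grad(phi) = 0,
# in natural units.
def w(phidot, V):
    kinetic = 0.5 * phidot ** 2
    return (kinetic - V) / (kinetic + V)

print(w(phidot=0.01, V=1.0))   # close to -1: potential-dominated, drives inflation
print(w(phidot=2.0, V=1.0))    # +1/3, above -1/3: kinetic energy spoils inflation
print(w(phidot=1.0, V=0.0))    # +1: purely kinetic ('stiff') limit
```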
We also found in Section 2.7 that we need w < −1/3 for inflation to happen. In
order to achieve this, we need that the potential V in Equations 2.19 and 2.20
starts off by dominating over the kinetic energy term involving φ̇. When this is no
longer true, inflation will cease. At this point the ‘object’ in Figure 2.6 will
oscillate around the minimum, with the oscillations damped by the H φ̇ term. In
addition, it’s expected that the inflaton field will then decay into conventional
matter and radiation. This particle generation would appear as another
friction-like term in the equation of motion. At this point in the history of
the Universe, the temperature would have been very low, because the energy
densities of matter and radiation will have been reduced by factors of a³ and a⁴,
respectively, where a is the dimensionless scale factor of the Universe. The
subsequent particle generation process is known as reheating, but the exact
mechanism is not known since the underlying physics of the inflaton field is not
known. During this process, the matter–antimatter asymmetry of the Universe
may have been generated. The end result of inflation is that the Universe is left
with more or less the same energy density as when it started, but in the form of
radiation and matter, and with an imbalance of matter over antimatter.
The requirement that V starts out much bigger than the kinetic energy term can
also be shown to imply that we need φ̈ to be small, and that φ is homogeneous.
The proof of this is very involved, but we can sketch a demonstration. Suppose
that φ has some intrinsic variation over a spatial scale δx. We’d expect there also
to be intrinsic temporal variations over timescales of δt = δx/c. We could
think of this as being equivalent to a kinetic energy term φ²/(δt)². In order for
the potential to dominate, we need that V(φ) is much bigger than φ²/(δt)²,
i.e. V(φ) ≫ φ²/(δt)². If we differentiate this with respect to φ, we find that
dV/dφ ≫ 2φ/(δt)². But this will be of the order of φ̈. We should therefore expect
to be able to neglect the φ̈ term in Equation 2.18. This approximation is known as
the slow-roll approximation. (The expression ‘slow-roll’ is perhaps misleading
because it seems to suggest that the object in Figure 2.6 acquires some angular
momentum. To avoid this, we’ve used the verb ‘slide’ in preference to ‘roll’
where we can, but be aware that most of the technical literature and textbooks use
‘roll’ in this context.) The result is that the slow-roll approximation leads to us
approximating the equation of motion as

3Hφ̇ = −dV/dφ = −V′. (2.23)

The next step is to substitute this into the Friedmann equation (Equation 1.7)
rewritten in natural units. (To do this, we replace the factor of G with one of the
Planck scales in Equations 2.13–2.15 — conventionally, mass.) We then use
V ≫ φ̇² to show that

H² = (8π/(3m_Pl²)) [½φ̇² + ½(∇φ)² + V(φ)] ≈ 8πV(φ)/(3m_Pl²). (2.24)
Putting these together, one can show that the requirement that V ≫ φ̇² can be
expressed as constraints on two new dimensionless quantities:

ε = −Ḣ/H² = (m_Pl²/16π)(V′/V)² ≪ 1, (2.25)
η = φ̈/(Hφ̇) = (m_Pl²/8π)(V″/V) ≪ 1, (2.26)

where V′ = dV/dφ and V″ = d²V/dφ².
These equations are a dimensionless way of expressing the constraint that the
potential V must be shallow and flat enough to allow slow-rolling. These criteria
are requirements for inflation to start, and inflation will end when, for example,
ε ≈ 1.
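Equations 2.25 and 2.26 are easy to evaluate for a specific potential. The quadratic potential below is a purely illustrative assumption (the text stresses that the true potential is unknown); for V(φ) = ½m²φ² one has V′/V = 2/φ and V″/V = 2/φ², so ε = η = m_Pl²/(4πφ²):

```python
import math

# Slow-roll parameters for an assumed quadratic potential, in units m_Pl = 1.
def epsilon(phi):
    return (1.0 / (16 * math.pi)) * (2 / phi) ** 2    # Equation 2.25

def eta(phi):
    return (1.0 / (8 * math.pi)) * (2 / phi ** 2)     # Equation 2.26

print(epsilon(3.0), eta(3.0))   # both ~0.009, well below 1: slow-roll holds
phi_end = 1 / math.sqrt(4 * math.pi)
print(epsilon(phi_end))         # ~1: inflation ends near phi ~ 0.3 m_Pl
```

For this potential slow-roll requires φ ≫ m_Pl, consistent with the Planck-scale discussion at the end of this section.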
We don’t know the shape of the inflation potential. There are many varieties of
inflation, each of which hypothesizes a differently shaped potential. However, the
observational consequences of inflation all rely on the last stages of inflation when
the ‘object’ in Figure 2.6 is close to the minimum, so they don’t depend strongly
on the shape of the potential. In a sense this is a pity, because it restricts our
ability to constrain this new physics experimentally, but it also greatly simplifies
the predictions of inflation and makes them more robust to changes in the
underlying assumptions. We’ll describe some of the observational consequences
of inflation in the next section.
2.9 The primordial density power spectrum
One of the key observables that are predicted to result from inflation is the
‘clumpiness’ of the CMB. The statistics of the clumpiness of the CMB are a key
cosmological constraint and the key to modern precision cosmology. To show you
how this works, we’ll need some mathematical way of describing this clumpiness.
Clumps can be small or large, so one way of describing clumpiness could be with
a Fourier series. It’s conventional in this field to use complex Fourier series, so if
you’ve not met these before, see the box below. This box will also briefly mention
Fourier transforms, though we won’t use these for the most part in this book.
Fourier series and transforms
Here’s a quick reminder of what a Fourier series expansion looks like. We
have a function f (x) that’s periodic over the interval −L/2 < x < L/2
(e.g. waves in a box of length L), and we find that it can be expressed as
f(x) = a₀/2 + Σ_{n=1}^{∞} [aₙ cos(2πnx/L) + bₙ sin(2πnx/L)], (2.27)

where the coefficients aₙ and bₙ are given by

aₙ = (2/L) ∫_{−L/2}^{L/2} f(x) cos(2πnx/L) dx, (2.28)

bₙ = (2/L) ∫_{−L/2}^{L/2} f(x) sin(2πnx/L) dx. (2.29)
Now, we can simplify this slightly if we use complex numbers, i.e. using
i = √(−1). It’s known that e^(iθ) = cos θ + i sin θ, which we can use to write
the Fourier series as

f(x) = Σ_{n=−∞}^{+∞} Aₙ e^(2πinx/L). (2.30)

This corresponds to the previous Fourier series if Aₙ = ½(a_|n| + ib_|n|) for
n < 0, a₀/2 for n = 0, and ½(aₙ − ibₙ) for n > 0. This can be shown to
lead to the following expression for Aₙ:

Aₙ = (1/L) ∫_{−L/2}^{L/2} f(x) e^(−2πinx/L) dx. (2.31)
Sometimes the complex Fourier series is written as

f(x) = Σ_{n=−∞}^{+∞} Aₙ e^(ixkₙ), (2.32)

where kₙ = 2πn/L is known as the wave number. We won’t ask you to
manipulate complex Fourier series in this book, but we do want you to have
met them.

Fourier series occur very often in physics, but what happens when the box
that you’re using becomes limitingly big? In this case, sums become
integrals, and Fourier series become Fourier integrals, known as
Fourier transforms. In cosmology we deal with only finite volumes, so in
practice we need only Fourier series, but just so you’ve met them, the
Fourier transforms are

f(x) = ∫_{−∞}^{+∞} F(k) e^(2πikx) dk, (2.33)

F(k) = ∫_{−∞}^{+∞} f(x) e^(−2πikx) dx. (2.34)
Note how similar these equations are. F (k) is known as the Fourier
transform of f (x). Transforming twice gets you almost back where you
started: if you make a Fourier transform of F (k), you get f (−x) back.
Fourier transforms occur throughout physics. For example, diffraction in
optics involves Fourier transforms. The image of a star seen through a
telescope is the Fourier transform of the telescope aperture — well, almost.
The amplitudes of the waves hitting your detector are the Fourier transform
of the telescope aperture, but what you measure is the energy of the light on
your detector, which is proportional to the amplitude of the electromagnetic
wave squared, so your image will be the Fourier transform of the telescope
aperture, squared.
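The coefficient formulas in the box can be verified numerically. The sample function below is an assumption chosen for easy checking, f(x) = cos(2πx/L) + 0.5 sin(4πx/L), whose only non-zero coefficients are A(±1) = 1/2 and A(±2) = ∓0.25i:

```python
import math, cmath

L, M = 2.0, 400                        # period and number of grid points
xs = [-L / 2 + L * j / M for j in range(M)]
f = [math.cos(2 * math.pi * x / L) + 0.5 * math.sin(4 * math.pi * x / L)
     for x in xs]

def A(n):
    # A_n = (1/L) * integral of f(x) exp(-2*pi*i*n*x/L) dx  (Equation 2.31),
    # approximated as a sum over the grid
    return sum(fx * cmath.exp(-2j * math.pi * n * x / L)
               for x, fx in zip(xs, f)) / M

print(A(1))                            # ~ 0.5
print(A(2))                            # ~ -0.25j
# Rebuilding f from the series (Equation 2.30) reproduces it:
x0 = xs[10]
rebuilt = sum(A(n) * cmath.exp(2j * math.pi * n * x0 / L) for n in range(-3, 4))
print(abs(rebuilt - f[10]) < 1e-9)     # True
```

Note that A(2) comes out as −ib₂/2 with b₂ = 0.5, exactly as the box's relation between Aₙ and (aₙ, bₙ) predicts.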

Let’s write the average density of matter as ρ and the deviation from this average
as δρ. This deviation will vary with position. It’s common to express the
clumpiness in terms of the fractional overdensity or underdensity, (δρ)/ρ. Often
this fractional overdensity is simply abbreviated as δ. By definition, the mean
value of δ is zero. Since δ will vary as a function of position, we’ll write this as
δ(x).
Imagine that we are considering a large box in the Universe with side length L.
We can write δ(r) as a Fourier series in three dimensions. For simplicity for now,
however, let’s just consider a one-dimensional universe so the ‘box’ is just a
length L. The density contrast expanded as a Fourier series will be

δ(x) = δρ/ρ = Σ_{n=−∞}^{+∞} C_{kₙ} e^(ikₙx), (2.35)

where kₙ = 2πn/L (with n as an integer) is known as the wave number, and

C_{kₙ} = (1/L) ∫_{−L/2}^{L/2} δ(x) e^(−ikₙx) dx. (2.36)
In some sense, the coefficients Ckn characterize how much structure there is at
any wavelength. We can carry this over into three dimensions:

δ(r) = Σ C(k) e^(ik·r), (2.37)

where the sum is over all wave numbers k = (kₓ, k_y, k_z) in the box, e.g.
kₓ = 2πn/L and similarly for y and z. The conventional symbol used to
represent the Fourier coefficients is δ_k:

δ(r) = Σ δ_k e^(ik·r), (2.38)

δ_k(k) = (1/L³) ∫_{within L³} δ(r) e^(−ik·r) d³r. (2.39)
So far we’ve just written down Fourier series; how can we use these to
characterize the clumpiness? One approach is to measure how much variation
there is in the Fourier coefficients. Since the root mean square (RMS) is a measure
of the spread of a random sample, we can estimate the variance by
averaging the squares of the Fourier coefficients over different realizations of the
density field for a fixed k, i.e. ⟨|δ_k|²⟩, where |δ_k|² = δ_k δ_k* (remember that δ_k is a
complex number, so δ_k* is its complex conjugate).
How might this work in practice? First, if the δ(x) distribution is isotropic on
average, the Fourier coefficients won’t on average depend on the direction of k.
Second, the amount of clumpiness could depend on how closely you look at the
density field map. For example, the density distribution could be clumpy on
medium-sized scales, but look smooth on larger scales and on smaller scales. For
this reason it’s useful to calculate the variance in the Fourier coefficients as a
function of the length of the wave number vector, k = |k|. This is known as the
power spectrum and is written as
P(k) = ⟨|δ_k|²⟩. (2.40)
In the present-day Universe, an overdensity (δ(x) > 0) will attract surrounding
matter through gravity and will tend to increase the value of δ. Similarly,
underdensities (δ < 0) will empty out of matter, causing the value of δ to become
more negative. The density perturbations δ(r) will initially evolve from
self-gravity in such a way that each Fourier mode evolves independently. This is
also referred to as the ‘linear regime’ in the evolution of the density field. This is
one reason why the power spectrum is used in cosmology, rather than other
measures of clustering.
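The average over realizations in Equation 2.40 can be simulated directly. The sketch below uses an assumed toy density field (Gaussian white noise in one dimension, not the inflationary spectrum), for which P(k) should come out flat:

```python
import cmath, math, random

# Estimate P(k) = <|delta_k|^2> by averaging over many realizations of a
# 1-D white-noise density field on M grid points.
random.seed(1)
M, realizations = 64, 200
acc = [0.0] * M
for _ in range(realizations):
    delta = [random.gauss(0.0, 1.0) for _ in range(M)]
    for n in range(M):
        # delta_k, approximating (1/L) * integral of delta(x) e^{-i k_n x} dx
        dk = sum(d * cmath.exp(-2j * math.pi * n * j / M)
                 for j, d in enumerate(delta)) / M
        acc[n] += abs(dk) ** 2
P = [a / realizations for a in acc]
print(P[1], P[5], P[20])   # roughly equal: white noise has a flat P(k)
```

This is also a preview of the claim below that "white noise would have n_s = 0": a flat P(k) is exactly P(k) ∝ k⁰.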
These effects of self-gravity can be neglected during inflation, but inflation makes
very clear predictions for the initial density power spectrum. The key idea for
inflation is that the gravitational potential laid down by the inflating Universe was
invariant under time translation, i.e. the Universe should look the same on average
if you make the transformation t → t + Δt, regardless of your choice of Δt (as
long as it’s shorter than the duration of inflation). Therefore there must be a
constant level of fluctuations on (say) the scale of the horizon. In other words,
there must be a continuous time-invariant process in which quantum fluctuations
are being created within the Hubble volume then inflated out of it. Also, these
fluctuations cannot have any characteristic length scale (or the Universe would not
look the same regardless of the choice of Δt). This is a fractal universe.
To see what this means in terms of the power spectrum, we need to express the
fluctuations in a scale-invariant way, then state that the fluctuations are constant.
For example, it’s no good measuring the power spectrum on scales of k = 1 m⁻¹
to k = 2 m⁻¹, because this invokes the characteristic scale length of the metre.
What we can do, however, is measure the power spectrum in an interval between
any wave number k and double that wave number, 2k. (Note that k here is the
wave number, not the curvature parameter!) We’d then require that the value of
the power spectrum shouldn’t depend on the choice of k. In other words,
in any factor-of-two interval in k, we should measure the same power spectrum.
Another way of expressing this is to use the natural log of the wave number, ln k,
and require that the power spectrum is the same in any logarithmic interval
Δ ln k = ln 2. Of course, there’s nothing special about the choice of 2. The
general way of expressing this scale-invariance is to say that the variance in the
density field per logarithmic k interval is constant:
Δσ²/Δ ln k ≈ dσ²/d ln k = constant. (2.41)
This is known as the scale-invariant spectrum or the Harrison–Zel’dovich
spectrum.
The quantity dσ²/d ln k is known as the dimensionless power spectrum of the
gravitational potential Φ and is conventionally (but perhaps confusingly) given the
symbol Δ²_Φ:

Δ²_Φ(k) ≡ dσ²/d ln k. (2.42)
To relate this to the matter power spectrum, we need to use Poisson’s equation,
∇²δΦ = 4πGρ₀δ, where as before the quantity δ is used as a shorthand for the
fractional overdensity δρ/ρ₀. From a Fourier transformation of this equation, it
turns out that the Fourier transform of the potential fluctuations δΦ_k satisfies
δΦ_k = −4πGρ₀δ_k/k², where δ_k is the Fourier transform of δ. The scale-invariant
potential fluctuations thus give a dimensionless matter power spectrum: Δ² ∝ k⁴.
This turns out to be similar to the power spectrum that we defined in
Equation 2.40, but not quite identical. The difference rests on how the averaging
of the Fourier modes is done, because some ranges of k have more Fourier modes
than others.
To see why this is, imagine that we have a cubical box of the Universe with a
volume V = L × L × L, which we’re describing with a Fourier series. The
allowed wavelengths along the x-axis will be λ = L/n, where n is an integer, so
the wave numbers along the x-axis are kx = 2πn/L. Therefore the number
of modes from kx to kx + dkx in the x-direction will be (L/2π) dkx . Now,
instead of just considering the modes along the x-axis, consider all three axes.
How many modes are there in a radial shell of thickness dk? (Note that we’re
writing k = |k| = √(kₓ² + k_y² + k_z²).) This will be the density of modes times the
volume of the shell, which is (L/2π)³ × the volume of the shell. This volume is
d³k = 4πk² dk. The number of modes in our shell is therefore

(L/2π)³ d³k = [V/(2π)³] d³k = [V/(2π)³] 4πk² dk = (V/2π²) k² dk.
When we’re calculating the dimensionless power spectrum, we’re asking what the
variance is per logarithmic interval of k, and d ln k = (1/k) dk, so dk = k d ln k.
It perhaps shouldn’t be a huge surprise, therefore, that the dimensionless power
spectrum comes out as
Δ²(k) = (V/2π²) k³ ⟨|δ_k|²⟩ = (V/2π²) k³ P(k). (2.43)
We’ll see in Chapter 4 how this is also used in measuring the clustering of
galaxies.
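The mode-counting formula above can be checked by brute force: count the lattice modes k = (2π/L)(nₓ, n_y, n_z) falling in a shell and compare with (V/2π²)k²dk (the shell limits below are arbitrary illustrative values):

```python
import math

L = 1.0
k_lo, k_hi = 40.0, 44.0
nmax = int(k_hi * L / (2 * math.pi)) + 1
count = 0
for nx in range(-nmax, nmax + 1):
    for ny in range(-nmax, nmax + 1):
        for nz in range(-nmax, nmax + 1):
            k = (2 * math.pi / L) * math.sqrt(nx * nx + ny * ny + nz * nz)
            if k_lo <= k < k_hi:
                count += 1
k_mid, dk = 0.5 * (k_lo + k_hi), k_hi - k_lo
formula = (L ** 3 / (2 * math.pi ** 2)) * k_mid ** 2 * dk
print(count, round(formula, 1))   # the two agree to within a few per cent
```

The residual discrepancy is just the discreteness of the lattice; it shrinks as the shell moves to larger k.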
Often the power spectrum is written as

P(k) ∝ k^(n_s) (2.44)

or equivalently

Δ²(k) ∝ k^(n_s + 3), (2.45)

where n_s is known as the spectral index of scalar perturbations, which satisfies
n_s = 1 for a scale-invariant spectrum. White noise (e.g. putting down atoms at
random with the same probability everywhere) would have n_s = 0.
But how big are these fluctuations? The detailed quantum field theoretical
calculation is complicated, but we can get some idea through the following
argument. Quantum fluctuations in the field φ will result in regions of the
Universe finishing inflation at slightly different times. Figure 2.8 illustrates this as
two objects sliding down the potential well V (φ), slightly offset. The field at
these two positions will have the same slow-rolling behaviour, but will finish at
times that are offset by δt, where

δt = δφ/φ̇. (2.46)
Figure 2.8 A slight difference δφ in the scalar field value means that inflation
finishes with a time separation of δt, which creates a density fluctuation of
δ = H δt (in natural units).
The difference in density between these two regions at the scale of the horizon δH
will be roughly H δt, where H is the Hubble parameter during inflation. Quantum
field theory (and in fact dimensional analysis) predicts that the RMS of δφ on the
scale of the horizon is equal to H/(2π) in natural units, so the horizon-scale
fluctuations will be of the order δ_H = H²/(2πφ̇). This will depend on the shape
of the inflation potential and is one of the free parameters in fitting inflationary
models to data on the large-scale structure of the Universe.
Scale-invariance breaks down once the expansion ceases to be exponential, so we
expect a slight deviation from scale-invariance to be imprinted on the fluctuations
as inflation ends. This will depend on the shape of the inflation potential near
the end of inflation, i.e. on the parameters ε and η (defined in Equations 2.25
and 2.26). The result is that n_s ≈ 1 + 2η − 6ε for the values of ε and η near
the end of inflation. There is also a prediction of a clustered background of
gravitational waves (which we shall meet in Section 2.16), which also has a
dependence on ε.
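Purely illustrative numbers: for the assumed quadratic potential V = ½m²φ² used earlier, the slow-roll parameters work out to ε = η = 1/(2N), where N is the number of e-foldings remaining when a given scale leaves the horizon, so the tilt can be evaluated directly:

```python
# Spectral index n_s ~ 1 + 2*eta - 6*eps for an assumed quadratic potential,
# for which eps = eta = 1/(2N).
N = 60
eps = eta = 1 / (2 * N)
n_s = 1 + 2 * eta - 6 * eps
print(n_s)   # 1 - 2/N ~ 0.967: slightly 'redder' than exact scale-invariance
```

A small deviation below n_s = 1 of this general size is characteristic of slow-roll models.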
Two other key predictions of inflation are worth mentioning. First, the
perturbations of the matter and radiation number densities should be equal; these
are known as adiabatic perturbations because adiabatic expansion conserves the
ratio of matter and radiation number densities. Second, the phases of the Fourier
decomposition should be random and uncorrelated with each other. Intuitively
this seems reasonable since the quantum fluctuations at one time should be
uncorrelated with the quantum fluctuations at a later time; the earlier quantum
fluctuations give rise to the Fourier components on larger spatial scales, while the
later quantum fluctuations are responsible for the Fourier modes on smaller spatial
scales. It can be shown that random phases imply that the fluctuations are a
Gaussian random field, which means that the joint probability distribution for
the density at any number of points must be a multivariate Gaussian distribution.
Because a Gaussian random field has no information contained in the phases
(i.e. they are all uniformly randomly distributed), all the statistical information
about the density field is contained in the amplitudes, so the power spectrum
completely characterizes the density fluctuations.
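The link between random phases and Gaussian statistics can be illustrated by direct synthesis. The equal mode amplitudes below are an arbitrary assumption; only the randomness of the phases matters for the Gaussianity of the result:

```python
import math, random

# Synthesize a 1-D field from Fourier modes with equal amplitudes and
# uniformly random phases, then check the field values are close to Gaussian
# (skewness and excess kurtosis near zero).
random.seed(2)
M, n_modes = 256, 127
values = []
for _ in range(50):                                # several realizations
    phases = [random.uniform(0, 2 * math.pi) for _ in range(n_modes)]
    for j in range(M):
        x = sum(math.cos(2 * math.pi * (n + 1) * j / M + phases[n])
                for n in range(n_modes))
        values.append(x / math.sqrt(n_modes))
mean = sum(values) / len(values)
var = sum((v - mean) ** 2 for v in values) / len(values)
skew = sum((v - mean) ** 3 for v in values) / len(values) / var ** 1.5
kurt = sum((v - mean) ** 4 for v in values) / len(values) / var ** 2 - 3
print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")   # both near 0
```

This is the central limit theorem at work: each field value is a sum of many independent random-phase modes.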
In summary, inflation predicts a nearly scale-invariant spectrum of
primordial Gaussian density fluctuations, spatial flatness, a gravitational
wave background, and adiabatic fluctuations.
The overall amplitude of the initial perturbations depends on the shape of the
inflation potential, as does the deviation from scale-invariance. We’ll see in
Section 2.16 that the gravitational wave background does too. The CMB
fluctuations have been shown to be consistent with adiabatic perturbations, so we
won’t discuss alternative sources of perturbations here (e.g. ‘isocurvature’
perturbations); if there is a non-adiabatic contribution, it must be small
(Trotta, R., 2007, Monthly Notices of the Royal Astronomical Society, 375, L26).
Many tests have been made of the CMB clustering to search for non-Gaussian
character, though no unequivocal signal has yet been found. The expectation is
that the reheating at the end of inflation was the time of baryogenesis, which set
the subsequent entropy per baryon, but the GUT-scale physics that determined
these processes (and inflation itself) is still uncertain.
One implication of inflation is that there may be regions far off the minimum
V (φ) that inflate eternally. If φ is very large, the quantum fluctuations in φ would
make φ perform a random walk that overwhelms the drift towards the minimum
of V(φ). Our observable portion of the Universe could be just an infinitesimal part of a
much, much larger complex. One of the enduring surprises of observational
cosmology is that it is possible at all — that is, we can build telescopes that are
big enough to detect light from most of the way back to the Big Bang, and
observe galaxies throughout most of the Hubble volume (Section 1.9). However,
if one of these variants of inflation is correct, the observable part of the Universe
is a very tiny part of it indeed. This boggles the mind.
Finally, it’s worth remembering that one of the motivations of inflation is to
solve the horizon problem and many others without invoking Planck scale
physics such as quantum gravity. We’re describing a general relativistic Universe,
which inevitably involves the gravitational constant G, and quantum mechanics,
which inevitably involves Planck’s constant ℏ = h/(2π). It’s therefore perhaps
inevitable but a little disappointing that the Planck scale should occur in various
forms in the inflation equations. The following exercise will demonstrate the
inevitability of the Planck scale in inflation.
Exercise 2.7 The number of e-foldings of inflation is roughly N = \int H \, dt.
Use the slow-roll approximation to show that

    N = -\frac{8\pi}{m_{\rm Pl}^2} \int_{\phi_2}^{\phi_1} \frac{V}{V'} \, d\phi,        (2.47)

where φ2 and φ1 are values of the inflaton field at the start and end points of
inflation, respectively. We can choose φ1 = 0 without loss of generality.
Next, make the assumption that V′ is roughly of the order of V/φ (which should
be true if the potential is reasonably smooth and slowly-varying) to show that
N ∼ (φ2/mPl)² and hence that we need φ2 significantly larger than mPl. ■

Similarly, the same criteria can be used to show that the parameters ε and η are
both ≪ 1 (Equations 2.25 and 2.26). Inflation ends when φ ∼ mPl, so we have
not escaped consideration of the Planck scale.
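To see the scale involved, consider the simplest illustrative case of a quadratic potential, V(φ) = ½m²φ² (an assumed toy model chosen for this sketch; as stressed above, the true inflaton potential is unknown). Then V/V′ = φ/2, and Equation 2.47 integrates to N = 2π(φ₂² − φ₁²)/m²Pl, which a short script can check:

```python
import math

# Illustrative check of Exercise 2.7 for an assumed quadratic potential
# V(phi) = (1/2) m^2 phi^2, for which V/V' = phi/2 and Equation 2.47
# integrates to N = 2*pi*(phi2^2 - phi1^2)/m_Pl^2.

def efolds(phi2, phi1=0.0, m_pl=1.0):
    """Number of e-foldings for the quadratic toy potential."""
    return 2.0 * math.pi * (phi2**2 - phi1**2) / m_pl**2

# Solving the horizon problem needs N of order 60; inverting for phi2:
phi2_needed = math.sqrt(60.0 / (2.0 * math.pi))  # in units of m_Pl
print(f"N = 60 requires phi2 ~ {phi2_needed:.2f} m_Pl")
```

With φ₁ = 0, sixty e-foldings require φ₂ ≈ 3 mPl, confirming that the field value must exceed the Planck scale in this toy case.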

2.10 The real music of the spheres


We’ve already seen that the microwave background is strikingly uniform, in
marked contrast to the present-day matter density of the Universe.
● Why is it that the photons in the Universe are so uniformly distributed, while
the distribution of matter is so varied?
❍ The photon uniformity reflects the distribution at the time of recombination,
i.e. the time when the Universe was last opaque. This was the last time that
the photon distribution was strongly coupled to the matter distribution. Since
then, the matter distribution has evolved while the photons have travelled
more or less unimpeded through the Universe.
However, as we saw in Figure 2.2, there are slight inhomogeneities. Inflation
is one potential mechanism for generating these irregularities, as we saw in
Section 2.8, though we don’t yet know the shape of the inflation potential.
These inhomogeneities are the fluctuations that grew through gravity into the
present-day matter distribution of stars, galaxies and clusters of galaxies.
Again we’ll be characterizing these with the power spectrum, which is essentially
the RMS as a function of angular scale, using Fourier transforms. However, our
Fourier analysis implicitly used a flat space. The sky isn’t flat, so we need some
equivalent that works on the spherical surface of the sky. The idea is to replace the
sin and cos functions with some other functions that are appropriate for a sphere.
The functions usually chosen are the spherical harmonics Ylm, defined as

    Y_{lm}(\theta, \phi) \propto e^{im\phi} P_{lm}(\cos\theta),        (2.48)

where the Plm are the associated Legendre polynomials. The θ and φ
coordinates are the ones from spherical coordinates. (A simpler way to express
the definition of spherical harmonics is as the eigenfunctions of the angular
part of the ∇² operator, but this takes us outside the scope of this course.)
We won’t derive the Legendre polynomial functions, but just for completeness
they are defined as

    P_{lm}(x) = \frac{(-1)^m}{2^l \, l!} \left(1 - x^2\right)^{m/2} \frac{d^{l+m}}{dx^{l+m}} \left(x^2 - 1\right)^l,        (2.49)
where d^{l+m}/dx^{l+m} is the (l + m)th-order derivative. When the CMB structure is
expanded (like a Fourier series) in terms of spherical harmonics, the l = 0, 1, 2, 3, . . .
terms are named the monopole, dipole, quadrupole, octopole, and so on. The l = 1
Legendre polynomials have just one trigonometric function of θ (e.g. sin θ), while
the l = 2 polynomials have two (e.g. sin² θ), and so on. Spherical harmonics are
also used in quantum mechanics, especially in describing electron orbitals in atoms,
and in helioseismology.
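If you want to evaluate the Plm in practice, repeated differentiation of Equation 2.49 is awkward; the standard recurrence relations (textbook results, not derived in this course) generate the same polynomials. A plain-Python sketch, which reproduces familiar low-order cases such as P₂₀(x) = (3x² − 1)/2:

```python
import math

def assoc_legendre(l, m, x):
    """Associated Legendre polynomial P_l^m(x) via the standard recurrences
    (equivalent to the Rodrigues-type formula of Equation 2.49)."""
    if m < 0 or m > l:
        raise ValueError("require 0 <= m <= l")
    # Seed: P_m^m(x) = (-1)^m (2m-1)!! (1 - x^2)^(m/2)
    pmm = (-1) ** m * math.prod(range(1, 2 * m, 2)) * (1 - x * x) ** (m / 2)
    if l == m:
        return pmm
    # Next band: P_{m+1}^m(x) = x (2m+1) P_m^m(x)
    pm1 = x * (2 * m + 1) * pmm
    if l == m + 1:
        return pm1
    # Upward recurrence: (l-m) P_l^m = x(2l-1) P_{l-1}^m - (l+m-1) P_{l-2}^m
    for ll in range(m + 2, l + 1):
        pmm, pm1 = pm1, (x * (2 * ll - 1) * pm1 - (ll + m - 1) * pmm) / (ll - m)
    return pm1

print(assoc_legendre(2, 0, 0.5))   # P_2(x) = (3x^2 - 1)/2 at x = 0.5
```

The recurrence includes the (−1)^m Condon–Shortley phase, matching the sign convention of Equation 2.49.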
We calculate δT/T_CMB, where T_CMB is the average CMB temperature and
δT = T − T_CMB. As with the power spectrum above, we shall write this as δ,
though in this case δ will depend on the angular position q = (θ, φ) on the sky
rather than the spatial position x.

Chapter 2 The cosmic microwave background

The spherical equivalent of the Fourier transform is

    \delta(\theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{+l} a_{lm} Y_{lm}(\theta, \phi).        (2.50)

The alm are the equivalent of Fourier coefficients:

    a_{lm} = \int \delta(q) \, Y_{lm}^{*} \, d^2 q,        (2.51)

where the integral is done over the whole sky.


One important quality in defining this spherical alternative to Fourier transforms
is orthogonality. In trigonometry, \int_{-\pi}^{\pi} \sin(nx) \sin(mx) \, dx for integers m and
n is always zero unless m = n. A similar result holds in two dimensions.
Here \int Y_{lm} Y_{l'm'}^{*} \, d^2 q = 0 unless l = l′ and m = m′. It is this orthogonality
that underlies the independent evolution of the Fourier components of density
perturbations in the linear regime (Section 2.9 and Chapter 4). We’ll return to
this in Chapter 8 on the Lyman α forest clustering, and in Chapters 3 and 4 on the
large-scale structure of galaxies.
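The orthogonality can be verified numerically. The sketch below uses the closed forms of two m = 0 harmonics, Y₁₀ ∝ cos θ and Y₂₀ ∝ (3cos²θ − 1), with their standard normalizations, and a simple midpoint quadrature over the sphere:

```python
import math

# Closed forms of two m = 0 spherical harmonics (standard normalizations):
def Y10(theta):
    return math.sqrt(3.0 / (4.0 * math.pi)) * math.cos(theta)

def Y20(theta):
    return math.sqrt(5.0 / (16.0 * math.pi)) * (3.0 * math.cos(theta) ** 2 - 1.0)

def sphere_integral(f, n=20000):
    """Midpoint-rule integral of f(theta) over the sphere.
    The area element is d^2 q = sin(theta) dtheta dphi; for m = 0 the
    phi integral simply contributes a factor of 2*pi."""
    h = math.pi / n
    total = sum(f(h * (i + 0.5)) * math.sin(h * (i + 0.5)) for i in range(n))
    return 2.0 * math.pi * h * total

norm = sphere_integral(lambda t: Y10(t) ** 2)        # expect ~1 (normalization)
cross = sphere_integral(lambda t: Y10(t) * Y20(t))   # expect ~0 (orthogonality)
print(f"<Y10|Y10> = {norm:.6f}, <Y10|Y20> = {cross:.2e}")
```

The self-integral comes out at unity and the cross-integral at zero, as the orthogonality relation requires.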
Because of isotropy, the statistical properties of the alm depend only on l, not
on m, so when the power is averaged, the sum over m at fixed l simply contributes
a factor of (2l + 1). The quantities l and m are like wave numbers on the sky, so
the smaller the angle, the larger the value of l. As a rough rule of thumb, the
scales l are related to angular sizes θ on the sky through l ≈ 180°/θ.
The power spectrum of the CMB is usually written as

    C_l = \langle |a_{lm}|^2 \rangle.        (2.52)

Conventionally, the Cl power spectrum tends to be plotted as
(ΔT)² = l(l + 1) Cl T²_CMB, or sometimes with an additional divisor of 2π. This
measures the power per logarithmic interval in l, so a scale-invariant spectrum
looks horizontal in such a plot. (See Chapter 18 of Peacock, J.A., 1999,
Cosmological Physics, Cambridge University Press.)
How precisely can the Cl power spectrum be measured? There are only
2l + 1 m-samples of power at any fixed l, which limits the precision of the
measurements of Cl. This limit is known as cosmic variance. Formally, the
precision limit from cosmic variance comes out as

    \Delta C_l = \sqrt{\frac{2}{2l+1}} \, C_l.        (2.53)
This is the best possible measurement, in the absence of any instrumental noise or
astrophysical foreground systematic effects. In order to measure the Cl modes
better than this, we’d need a bigger sky!
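Equation 2.53 is easy to evaluate; the sketch below shows how quickly the fractional cosmic-variance limit tightens as l increases:

```python
import math

def cosmic_variance_fraction(l):
    """Fractional cosmic-variance limit Delta C_l / C_l = sqrt(2/(2l+1))
    from Equation 2.53: only 2l+1 independent m-modes exist at each l."""
    return math.sqrt(2.0 / (2 * l + 1))

for l in (2, 10, 100, 1000):
    print(f"l = {l:4d}: Delta C_l / C_l = {cosmic_variance_fraction(l):.3f}")
```

At the quadrupole (l = 2) the limit is a daunting 63 per cent, but by l = 1000 it has fallen to about 3 per cent, which is why the high-l acoustic peaks can be measured so precisely.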
There have been many attempts to measure the CMB power spectrum. The first
great breakthrough was with the COBE (Cosmic Background Explorer) satellite,
which measured the anisotropies on scales larger than about 7◦. This confirmed
that the power spectrum is approximately scale-invariant, exactly as predicted by
inflation. COBE also showed that the CMB spectrum was an excellent black body
(Figure 2.1), exactly as predicted by the Hot Big Bang model. The COBE results
won John Mather and George Smoot the 2006 Nobel Prize in Physics. The
Wilkinson Microwave Anisotropy Probe (WMAP) has now constrained much

more of the Cl spectrum, shown in Figure 2.9. We shall see why the power
spectrum has peaks in Section 2.12. Many ground-based and balloon-borne
experiments have made constraints on the highest-l region, though the maps at
this resolution are not yet all-sky. This will change shortly with the European
Space Agency Planck mission, which launched on 14 May 2009. Planck is
also expected to make tremendous advances in measuring the clustering of the
polarized CMB, about which we shall hear more later.

[Figure 2.9 plots l(l + 1) Cl T²_CMB/2π (ranging from 0 to 6000) against multipole l from 10 to 1000, i.e. angular scales from 90° down to 0.2°.]

Figure 2.9 The CMB power spectrum measured in the first five years of the WMAP satellite. The curve shows
the best fit to the data, from which the cosmological parameters are inferred. The grey region shows the scatter in the
data that one would expect from cosmic variance, i.e. the fact that you’re sampling only a finite region of the
Universe. The quantity plotted on the y-axis is l(l + 1) Cl T²_CMB/2π.
We’ve seen how the theory of inflation predicts a roughly scale-invariant spectrum
of density perturbations, and that the real horizon size at recombination was in
fact much larger than one would predict without inflation. Nevertheless, the
apparent (i.e. non-inflationary) horizon size is still a useful scale length. On sizes
much smaller than this scale, regions will have had time since inflation to affect
each other. On sizes much larger than this scale, the only causal contact could
have been during or prior to inflation. We’d therefore expect that the power
spectrum on large scales should have the roughly scale-invariant behaviour
predicted by inflation. This is known as the Sachs–Wolfe plateau and is indeed
what’s seen in observations. (The clustering amplitude on the Sachs–Wolfe
plateau also agrees with the amplitude of matter fluctuations in the local Universe
on 8 Mpc scales, known as σ8 , of which more later.) However, we’ll see in
Section 2.13 that the passage of photons through the Universe over the past 13 or
so billion years can cause some additional distortions on the largest scales.
Another effect that might leave its imprint on the CMB is the topology of the
Universe. If we travel for long enough in one direction, might we go right round


the Universe and back to where we started? It’s possible to show that in any k > 0
(spatially spherical) universe, the expansion is too fast to permit this motion.
However, there’s another way to make this happen. Imagine a sheet of paper. We
can easily draw geodesics on this surface: they are just straight lines that you
would draw with a ruler. Now curl the paper into a tube. The lines previously
drawn are still geodesics. However, if you travel in one direction for long enough,
you get back where you started, despite the fact that geodesically the surface is
spatially flat. This is rather like the 1970s arcade game Asteroids in which you
can disappear off the edge of the screen in one direction and reappear at the
opposite edge. Curvature that changes the geodesics is called intrinsic curvature,
while curvature that doesn’t is called extrinsic curvature. Einstein’s theory
of general relativity makes predictions only for intrinsic curvature; we have
no theory making any prediction for extrinsic curvature. Going back to our
tube made of a piece of paper, we can’t link the two ends of the tube in three
dimensions without bending the tube and so generating intrinsic curvature, but if
we had four spatial dimensions, we could link the two ends and still have zero
intrinsic curvature. The paper would then be arranged into a torus shape, i.e. it has
assumed a different topology to a single sheet. What is the topology of our
Universe? A complex topology would leave characteristic imprints on the CMB if
the wrap-around scales were small enough. No such features have been found,
implying that any wrap-around topology in our Universe has to be at least around
the size of the Hubble volume.
We’ll see below how some of the fluctuations that are seen are due to acoustic
oscillations in the early Universe. Some audio representations of the acoustic
oscillations after the Big Bang can be found in the further reading section. The
cosmologist Peter Coles estimated the amplitude of these acoustic oscillations in
decibels (setting aside the obvious objections that there were no people to hear
them and the conditions were too hot and dense for terrestrial life anyway) and
found that the Big Bang was no louder than a rock band.

2.11 The CMB dipole


The CMB is extraordinarily uniform across the whole sky (Figure 2.2), as we’ve
seen. If we increase the contrast ratio of the image (Figure 2.2), we see that the
CMB is dominated by a characteristic pattern of hot and cold regions. This is the
dipole caused by the Doppler effect of our motion relative to the CMB rest frame.
We can derive this quite quickly in special relativity using the wave four-vector

    k = \left( \frac{\omega}{c}, k_x, k_y, k_z \right),        (2.54)

where ω is the angular frequency, related to the frequency ν and period T by
T = 1/ν = 2π/ω, and the k values are wave numbers, related to wavelengths λ
by λ = 2π/k. This four-vector describes the light waves and has zero invariant
length, i.e.

    \left( \frac{\omega}{c} \right)^2 - k_x^2 - k_y^2 - k_z^2 = 0.        (2.55)
Like any four-vector, it transforms with the Lorentz transformation (see
Appendix B). For simplicity (but without loss of generality) we’ll assume that our

motion is along the x-axis and we’ll consider a light ray in the xy-plane. There is
no z-axis component of the light ray’s motion, so the z-axis component of the
wave vector is zero, which is also true for all observers. We’ll also assume that
there is a CMB rest frame in which it appears uniform.
First, imagine a stationary observer on the Earth. He or she receives a CMB
photon in the xy-plane. Now we imagine making a Lorentz transformation to the
CMB rest frame, which we’ve chosen to be a velocity boost along the x-axis.
We’ll give the CMB rest frame primed coordinates. Applying the Lorentz
transformation (Appendix B, Section B.4), we find that an observer moving
relative to the Earth along the x-axis with velocity v will see a wave vector
    k' = \left( \frac{\omega'}{c}, k'_x, k'_y, k'_z \right) = \left( \gamma\frac{\omega}{c} - \gamma\frac{v}{c}k_x, \;\; \gamma k_x - \gamma\frac{v}{c}\frac{\omega}{c}, \;\; k_y, \;\; 0 \right).        (2.56)

Focusing on the time-like (zeroth) component, we find that the observer in the
CMB rest frame will see the light at a different frequency:

    \frac{\omega'}{c} = \gamma\frac{\omega}{c} - \gamma\frac{v}{c}k_x.        (2.57)

We can relate kx to ω using the null length of the wave vector (Equation 2.55) and
kz = 0:

    \left( \frac{\omega}{c} \right)^2 = k_x^2 + k_y^2.        (2.58)

This is Pythagoras’s theorem, with the hypotenuse of the triangle as ω/c.
The angle that the light ray makes with the x-axis, θ, can be found from
trigonometry: cos θ = adjacent divided by hypotenuse, or k_x/(ω/c). Therefore
k_x = (ω/c) cos θ. Plugging this into Equation 2.57 and rearranging, we find that

    \omega' = \gamma\omega\left(1 - \frac{v}{c}\cos\theta\right),        (2.59)
i.e. there is a θ-dependent blueshifting or redshifting. We’ve already found that a
redshifted or blueshifted black body spectrum is still a black body spectrum,
though with a different temperature. Therefore we can write
    T' = \gamma T\left(1 - \frac{v}{c}\cos\theta\right),        (2.60)
where T′ is the temperature in the CMB rest frame, while T is the temperature as
seen from Earth. But we’ve assumed that the CMB has a uniform temperature in
the CMB rest frame, i.e. T′ = constant, so we must see a fractional temperature
variation

    \frac{T}{T'} = \frac{1}{\gamma\left(1 - \frac{v}{c}\cos\theta\right)}
                 = \left[1 - \left(\frac{v}{c}\right)^2\right]^{1/2} \left(1 - \frac{v}{c}\cos\theta\right)^{-1}
                 = \left(1 - \frac{1}{2}\frac{v^2}{c^2} + \cdots\right)\left(1 + \frac{v}{c}\cos\theta + \frac{v^2}{c^2}\cos^2\theta + \cdots\right)
                 = 1 + \frac{v}{c}\cos\theta + \frac{v^2}{c^2}\left(\cos^2\theta - \frac{1}{2}\right) + \cdots.
Comparing this to Section 2.10, we see that our motion relative to the CMB
induces a dipole as well as having smaller effects on higher-order multipoles.
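Putting in numbers (taking v ≈ 370 km s⁻¹ for our motion relative to the CMB frame, a value close to the measured one, and T ≈ 2.725 K), a minimal sketch of Equation 2.60 gives a dipole amplitude of order T v/c, a few millikelvin:

```python
import math

C = 2.99792458e5   # speed of light, km/s
T0 = 2.725         # mean CMB temperature, K
v = 370.0          # assumed speed relative to the CMB frame, km/s

def T_seen(theta):
    """Temperature observed at angle theta from the apex of our motion,
    rearranging Equation 2.60: T = T' / (gamma * (1 - (v/c) cos(theta)))."""
    beta = v / C
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return T0 / (gamma * (1.0 - beta * math.cos(theta)))

# Dipole amplitude: half the hot-pole minus cold-pole difference, ~ T0 * v/c.
dipole_mK = 1e3 * (T_seen(0.0) - T_seen(math.pi)) / 2.0
print(f"dipole amplitude ~ {dipole_mK:.2f} mK")
```

The result is a little over 3 mK, about one part in a thousand of the mean temperature, which is why the dipole dominates Figure 2.2 once the contrast is raised.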

● Can we use the CMB to measure the primordial dipole?


❍ No, unless we can find an alternative way to measure our motion relative to
the cosmic rest frame.

2.12 The acoustic peaks in the CMB


For most of the time that cosmology has existed as a subject for study, it’s been
extremely difficult to measure most fundamental parameters to anything much
better than a factor of two. This embarrassing situation changed dramatically in
the last decade, and precision cosmology is now possible. Much of this new
precision has come from measurements of microwave background fluctuations,
and we’ll see in this section why they are so uniquely powerful in cosmology.
The density fluctuations following inflation were imprinted jointly on the
dark matter density and the photon density and the baryon density. Before
recombination (Section 2.1), the motion of the baryons was strongly coupled
to the photons, because the photons were scattered against the free electrons
(Thomson and Coulomb scattering), while the electrons were themselves strongly
coupled electrostatically to the baryons (e.g. protons). Therefore we can think of
the photons and baryons as a joint photon–baryon gas or photon–baryon fluid.
The distribution of dark matter dominates the gravitational potential.
Gravitational attraction caused the photon–baryon gas to fall out of underdense
regions and into overdense regions. As the gas fell in and compressed towards
the centre of the overdense region, photon pressure outwards resisted the
inward flow. This sets up oscillations. The frequency of the oscillations is
νosc = cs/λ, where cs is the sound speed in the early Universe, which comes
out as c_s = c/\sqrt{3 + 2.25\,\Omega_b/\Omega_r}. The value of λ depends on the size of the
inhomogeneity. The density perturbations were roughly scale-invariant
(Section 2.8), but some oscillations were still particularly favoured over others:
nearly the entire observable Universe acted as a resonating cavity!
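As a rough numerical illustration of the sound-speed formula (with assumed round-number densities Ωb h² ≈ 0.02 and Ωr h² ≈ 2.47 × 10⁻⁵, values not taken from this chapter, and the baryon-to-radiation ratio scaled to the epoch of recombination):

```python
import math

c = 2.99792458e5        # speed of light, km/s

# Assumed present-day densities for this sketch:
omega_b_h2 = 0.02       # baryons
omega_r_h2 = 2.47e-5    # radiation
z = 1090                # recombination

# Baryons dilute as (1+z)^3 and radiation as (1+z)^4, so their density
# ratio at redshift z is the present-day ratio divided by (1+z).
ratio = (omega_b_h2 / omega_r_h2) / (1 + z)
c_s = c / math.sqrt(3.0 + 2.25 * ratio)   # the sound-speed formula above
print(f"rho_b/rho_r at z = 1090 ~ {ratio:.2f}, c_s ~ {c_s / c:.3f} c")
```

The baryon loading drags the sound speed down to roughly 0.46c, noticeably below the pure-radiation value c/√3 ≈ 0.58c.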
The size of this cavity is the size of the sound horizon after inflation, i.e. the
distance that a sound wave could have travelled since the end of inflation. As with
a musical instrument, this resonating cavity has a fundamental note and has
overtones. The fundamental note has a wavelength that’s twice the size of the
sound horizon, which is the first and biggest peak in Figure 2.9. Overtones
correspond to situations where the sound horizon size is an integer multiple of
half-wavelengths (n = 2, 3, 4 . . .), which are the subsequent peaks in Figure 2.9.
Sometimes these peaks are called ‘Doppler peaks’, though Doppler motions are
only a small part of the physics of their generation. A more accurate term is
‘acoustic peaks’ since they arise mainly from the effects of acoustic waves. Each
of these peaks gives us a precise measurement of some cosmological parameters.

Exercise 2.8 Imagine that you are in the early Universe shortly after
recombination, watching the surface of last scattering recede from you. Would the
CMB at this time have the same acoustic peaks that we see today? ■
The first acoustic peak is determined mainly by the sound horizon size. This in
turn is mainly dependent on the Hubble parameter at that time, H, and therefore
on H0 . The angular size of this structure is found by calculating the angular

diameter distance to the surface of last scattering, which in turn has an


H0 -dependence. The ratio of the two is therefore almost H0 -independent. The
apparent angular size of the first acoustic peak is therefore determined almost
entirely by the geometry of the Universe, i.e. open, flat, or closed (see Figure 1.7).
Detailed calculations show that the l value of the first acoustic peak is predicted as
l ≈ 200 in a flat Universe.

Exercise 2.9 The Universe at the time of recombination (redshift
zrecomb ≈ 1090) was matter-dominated, because the epoch of matter–radiation
equality was much earlier (z ≈ 23 800 Ωm,0h² ≈ 3160; see Exercise 2.3). Show
that the size of the sound horizon at recombination was roughly 0.27 Mpc,
assuming cs ≈ c/√3. ■
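One plausible route to the Exercise 2.9 result (an assumed method for this sketch, not necessarily the book's worked solution): in a matter-dominated universe the proper horizon size is 2c/H, so taking Ωm,0 h² = 0.147 (the value used elsewhere in this chapter) and cs ≈ c/√3:

```python
import math

c = 2.99792458e5         # speed of light, km/s
omega_m_h2 = 0.147       # assumed matter density parameter times h^2
z = 1090                 # recombination

# Matter-dominated Hubble parameter:
# H(z) = 100 * sqrt(Omega_m0 h^2) * (1+z)^{3/2} km/s/Mpc
H = 100.0 * math.sqrt(omega_m_h2) * (1 + z) ** 1.5

d_hor = 2.0 * c / H               # proper horizon in matter domination, Mpc
r_s = d_hor / math.sqrt(3.0)      # sound horizon for c_s = c / sqrt(3)
print(f"sound horizon at recombination ~ {r_s:.2f} Mpc")
```

This crude estimate gives a proper sound horizon of about a quarter of a megaparsec, consistent with the "roughly 0.27 Mpc" quoted in the exercise.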
It’s sometimes said that the sound horizon is a standard rod, i.e. an object with a
known fixed length in metres. If we measure the angular size of a standard rod,
we can calculate the exact angular diameter distance to it. However, the sound
horizon is not quite so simple since it depends on Ωb and Ωr . The good news is
that we can determine Ωb /Ωr from the other acoustic peaks. The sound horizon
has sometimes been called a ‘standardizable rod’. We’ll meet another sort of
standard measure in Chapter 3, the ‘standard candle’, but this will also turn out to
be standardizable in practice rather than standard.
Now, Ωr can be found ultimately from the normalization of the present-day black
body radiation spectrum. Meanwhile, the second acoustic peak in the CMB gives
us Ωb , as follows. The more baryons are swept along with the photons, the deeper
into the potential wells the photon–baryon flux goes (a process called ‘baryon
drag’). The effects are different on the odd-numbered and even-numbered
acoustic peaks, because only the odd-numbered peaks contain a half-wavelength,
so the odd-numbered peaks are particularly sensitive to the amplitude of this
oscillation. The strength of the odd-numbered peaks is mainly about how far into
the potential well the baryons move, so increasing the baryon density will tend to
enhance the odd-numbered peaks. The measurement of the second acoustic peak
by WMAP is now the best experimental constraint on Ωb,0 , against which one can
test the predictions of primordial nucleosynthesis (Section 2.6).
The peaks on smaller angular scales depend on oscillations that started
earlier, when the sound horizon was smaller, probing times even earlier than
matter–radiation equality (i.e. when Ωm = Ωr ). It turns out that this extra
information is enough to unpick the dark matter density (which dominates Ωm ).
Figure 2.9 also shows that the higher acoustic peaks are suppressed. The reason
for this is that the transition to a transparent Universe wasn’t instantaneous. As the
opacity of the Universe dropped, the photons started diffusing away from the
positions that they had while the Universe was opaque. Once the Universe became
transparent, the photons travelled in a straight line (‘free-streamed’), but just
before that point they were undergoing a random walk. This diffusion smoothed
out the structure on the smallest scales. The strength of this effect depends on how
long it took the Universe to make the transition from opaque to transparent.
Another way of looking at this is that structures are smoothed out if
they are smaller than the thickness of the last scattering surface. This effect is
known as diffusion damping or Silk damping, after the cosmologist Joe Silk
who first characterized this effect. It’s also dependent on Ωb because increasing


the numbers of baryons also increases the rate of collisions that the photons
experience. The shape of this damping tail is an important consistency check for
the cosmological parameters derived from the acoustic peaks.
One final surprise is that this frozen cosmological sound wave structure has never
gone away — we’ll see in Chapter 3 that it’s still visible in the galaxy distribution!

2.13 The Sachs–Wolfe effect


After recombination, the density perturbations in the Universe slowly evolve.
Overdensities of dark matter attract the surrounding matter, so matter flows
from the underdense regions into the overdense regions. The Universe is now
transparent, so there is no photon pressure to prevent matter falling in, so instead
of forming oscillations the overdensities grow as more and more matter is drawn
in. Similarly, the voids empty out of matter, becoming more and more significant
underdensities.
The time evolution of matter density now has a measurable effect on the photons.
For example, a photon could pass into a density enhancement, but by the time the
photon crosses the overdensity and emerges out the other side, the overdensity has
grown. The photon finds that it has to climb out of a bigger potential well
than it originally fell into, so the photon will end up with a slight redshift.
Similarly, photons passing through a void will find themselves with a net blueshift
as the voids become more significant underdensities. This is known as the
Sachs–Wolfe effect after its discoverers (see, for example, Sachs, R.K. and
Wolfe, A.M., 1967, Astrophysical Journal, 147, 73). This effect is detectable in
the CMB, known as the early Integrated Sachs–Wolfe effect. Much later,
large-scale density inhomogeneities such as galaxy clusters affect the passage of
photons in exactly the same way. The early inhomogeneities had no luminous
matter to mark them, but the later, larger clusters and voids can be traced by
optical galaxies. This is sometimes known simply as the Integrated Sachs–Wolfe
effect. Finally, as the Universe began to enter the ΩΛ > 0 phase, the acceleration
of the expansion rate caused the depths of potential wells to decrease, leading to a
further late-time Integrated Sachs–Wolfe effect. In other words, photons gain
energy falling into galaxy clusters, but do not lose all this energy on leaving them
again. Meanwhile, photons expend energy to climb up the potential well into a
void, but do not gain all this energy back on leaving it. In both cases the
accelerating expansion of the Universe has reduced the strength of the overdensity
or underdensity. The effect is sometimes abbreviated as ISW. (It is sometimes
also known as the Rees–Sciama effect, though beware: sometimes the latter is
used for the non-linear terms in the derivation of this effect, while ‘Integrated
Sachs–Wolfe’ is used for the linear terms.)
The key prediction of the ISW effect is that there should be a correlation
between CMB hot spots and foreground galaxy clusters. Similarly, one can
predict a correlation between CMB cold spots and foreground voids.
This is exactly what is seen in correlating the galaxy clusters and voids in the
Sloan Digital Sky Survey with the CMB. Figure 2.10 shows part of the CMB map
from WMAP, with the positions of foreground galaxy clusters marked in red and
foreground voids marked in blue. The team who did this claimed a trend for
foreground clusters to be associated with warm (red) regions, while voids tended
to be cooler (blue). Since this trend does not appear compelling from this image

alone, they took small segments of the WMAP sky map around each of their
galaxy clusters and averaged these small images together. They did the same on
the voids. The resulting averaged images are shown in Figure 2.11. The resulting
detection is significant at over 4 standard deviations.

Figure 2.10 The CMB, with the positions of known foreground galaxy clusters
(red) and voids (blue) marked.

[Figure 2.11 shows two stacked panels, ‘voids’ and ‘clusters’, each spanning −8° to +8° on a side, with the temperature scale running from −20 µK to +20 µK.]
Figure 2.11 Average CMB images at the positions of voids and clusters.
We know from the acoustic peaks in the CMB that we live in a spatially flat
universe to a good approximation (Section 2.12); the detection of a late-time ISW
effect can be reconciled with spatial flatness only if there is a cosmological
constant (or, more generally, dark energy — see later). We’ll see below that it is
very difficult to measure ΩΛ from the CMB alone, so cross-correlating foreground
populations with the CMB to search for the ISW is a very valuable additional use
of the CMB maps.
This technique (averaging images of separate objects in order to detect the
average signal from the population) is an example of a stacking analysis. This is
used very widely in observational cosmology.
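The idea can be sketched with synthetic data (all numbers below are made up for illustration; this is not the actual WMAP/SDSS analysis): individually undetectable sources become visible in the average because the noise in the stack falls as the square root of the number of cutouts, while the common signal survives.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic 'sky': pure noise (sigma = 20 in arbitrary temperature units)
# hiding single-pixel 'clusters' of amplitude 5 at known positions --
# each far too faint to detect individually.
npix, ncut, nsrc, amp, sigma = 512, 16, 400, 5.0, 20.0
sky = rng.normal(0.0, sigma, (npix, npix))
positions = rng.integers(ncut, npix - ncut, size=(nsrc, 2))
for y, x in positions:
    sky[y, x] += amp

# Stacking analysis: cut out a patch around each catalogue position, average.
cutouts = np.array([sky[y - ncut // 2:y + ncut // 2,
                        x - ncut // 2:x + ncut // 2] for y, x in positions])
stack = cutouts.mean(axis=0)

# Noise in the stack is down by sqrt(nsrc) ~ 20x, so the common central
# signal (amplitude 5) now stands out at roughly 5 sigma.
centre = stack[ncut // 2, ncut // 2]
print(f"central pixel = {centre:.1f} +/- {sigma / np.sqrt(nsrc):.1f}")
```

The same averaging logic underlies the cluster and void stacks of Figure 2.11.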

2.14 Reionization
After the Big Bang, what generated the first light in the Universe? Were stars the
first luminous objects in the Universe that illuminated the darkness, or accreting
black holes? In Chapter 8 we’ll discuss what observational constraints we have on
the first light in the Universe, and also another way of finding Ωb. However, the
CMB gives its own unique constraint on the first light in the Universe.
The effect is similar to Silk damping (Section 2.12). Once the first luminous
objects have reionized the Universe, the free electrons liberated by ionizing the
atoms can once again scatter CMB photons through Thomson scattering. As with
Silk damping, the effect is to suppress the acoustic peaks in a characteristic
manner. This time, however, the Universe is transparent and the mean free path of
the photons is much larger, of the order of the horizon size. (This also implies that
new acoustic oscillations won’t form.) The suppression therefore acts on both
large and small scales. The overall effect of reionization resembles a change in the
overall normalization of the fluctuations, except on the largest scales.
(‘Reionization’ is perhaps a misleading term. We speak of ‘recombination’ at
z ≈ 1000 to describe the formation of the first neutral atoms. These are the first
atoms, so we can hardly speak of their ‘recombining’, since they are combining
for the first time! Nevertheless, the term ‘recombination’ is the one used.
Similarly, ‘reionization’ is open to criticism, since these atoms are being ionized
for the first time. They are re-making an ionized plasma that existed at z > 1000,
but this is the first example of the ionization process, so the term ‘reionization’ at
z > 6 could be considered as inappropriate as ‘recombination’ at z ≈ 1000.
Nevertheless, these are the terms in use. One can only apologize.)
Another effect of reionizing the Universe is to change the polarized components
of the CMB, because scattered light is polarized. We’ll discuss the polarization
of the CMB in Section 2.16. The current CMB constraints on the epoch of
reionization from the five-year WMAP results are shown in Figure 2.12. While
the redshift of reionization is not very well determined, the optical depth τ to
Thomson scattering is better measured: the probability of a photon undergoing
Thomson scattering is defined as (1 − e^{−τ}), where the five-year WMAP CMB
data set the constraint τ = 0.087 ± 0.017.

[Figure 2.12 shows the likelihood against zr (0–20) in the left-hand panel, and xe (0.0–1.0) against zr (0–30) in the right-hand panel.]
Figure 2.12 The constraints on reionization from the WMAP CMB maps. The left-hand panel shows the likelihood
constraints from the WMAP 3-year and 5-year data sets (note the improved constraint from the extra two years of
data), assuming an instantaneous reionization at a redshift zr. But the CMB data don’t in themselves require the
reionization to be instantaneous. The right-hand panel shows the constraints if we assume a two-step model: the
reionization was instantaneously set to a level of xe at redshift zr, then instantaneously completely ionized at
redshift 7. The dark shaded region is the 1σ contour, i.e. there is an approximately 68% chance that the underlying
value is in that region, while the light shaded region is the 2σ contour (≈ 95%).
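The quoted optical depth translates directly into a scattering probability; a one-line check (using the central WMAP value only, ignoring the ±0.017 uncertainty):

```python
import math

tau = 0.087                        # WMAP five-year optical depth
p_scatter = 1.0 - math.exp(-tau)   # probability a CMB photon rescattered
print(f"P(Thomson scatter after reionization) = {p_scatter:.3f}")
```

So roughly 8 per cent of CMB photons were rescattered by the reionized gas on their way to us.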

2.15 Cosmological parameter constraints


Figure 2.13 shows the effect that varying some cosmological parameters has on
the acoustic peaks. As we saw in Section 2.12, these effects are quite strong and

are our best experimental constraints on many cosmological parameters at the


time of writing. Unfortunately, the CMB on its own isn’t quite enough to constrain
all the cosmological parameters. To see why, see panel (b) of Figure 2.13.

[Figure 2.13 has four panels plotting ΔT/µK (0–100) against multipole moment l (10–1000): (a) curvature, varying Ωtot,0 from 0.2 to 0.8; (b) dark energy, varying ΩΛ,0 from 0.2 to 0.8; (c) baryons, varying Ωb,0 h² from 0.02 to 0.06; (d) matter, varying Ωm,0 h² from 0.1 to 0.5.]
Figure 2.13 How the shapes of the acoustic peaks depend on present-day cosmological parameters. All the
models are varied around a common starting point of Ωtotal,0 = 1, ΩΛ,0 = 0.65, Ωb,0 h² = 0.02, Ωm,0 h² = 0.147,
a scale-invariant power spectrum and no reionization. The temperature fluctuation ΔT plotted is defined as
\Delta T = \sqrt{l(l+1) C_l / 2\pi} \; T_{\rm CMB}.


The positions of the acoustic peaks are only very weakly dependent on ΩΛ,0 , for a
fixed Ωk,0 . If Ωb,0 is fixed and Ωk,0 = 0, then we can constrain Ωm,0 (panel (d) of
Figure 2.13), but there is not enough information in the CMB to constrain Ωk,0
and Ωm,0 and Ωb,0 and ΩΛ,0 all simultaneously.
The fact that constraints on one parameter can correlate with constraints on
another is known as a parameter degeneracy. These degeneracies are intrinsic to
CMB experiments, but the degeneracies can be broken by including a comparison
with other, non-CMB experiments. For example, Figure 2.14 shows the
constraints on Ωm,0 and ΩΛ,0 from the CMB and from high-redshift supernovae
(which we shall meet in Chapter 3). The supernovae and CMB constraints
both have degeneracies, but they are in opposite directions, so combining the
constraints makes it possible to measure Ωm,0 and ΩΛ,0 separately. Another
example that we’ve already met is the late-time ISW (Section 2.13), from which
one can infer ΩΛ,0 if Ωk,0 is known.

[Figure 2.14 plots ΩΛ,0 (0.0–1.4) against Ωm,0 (0.0–1.4), with the line of flat universes marked and the Monte Carlo points colour-coded by H0 from 30 to 100.]

Figure 2.14 The constraints on the cosmological density parameters from the WMAP CMB measurements and
from high-redshift supernovae. The contours show the 1σ (inner ring), 2σ (middle ring) and 3σ (outer ring)
allowed range for the supernova data of Kowalski et al. (2008). The dots show some Monte Carlo (i.e. random)
realizations of the WMAP data, in which model universes are selected in proportion to their likelihood of fitting the
WMAP data of Dunkley et al. The value of the Hubble parameter H0 is colour-coded for these model universes.
The combination of the WMAP data and the supernova data leaves only a very small region of this plane mutually
allowed, thus breaking the parameter degeneracy in the WMAP data alone.
Together, the parameter constraints from the CMB and other sources have
converged remarkably on the parameters. Overall, the level of agreement between
the CMB and other constraints has led to the resulting cosmological model being
called the concordance cosmology.
A completely different (and controversial) type of constraint is also worth
mentioning. The Universe contains intelligent life. Can we use this fact on its own
to constrain the cosmological parameters? This line of argument is known as the
anthropic principle. This might explain why, for example, we find ourselves in a
Universe after recombination and at a time when Ωm > Ωr so gravitational
collapse is possible and stars can form. The Universe also cannot be so old that
stars no longer form, as might be the case if the age were measured in hundreds of
billions of years, or when Ωm ≃ 0. Using anthropic arguments to constrain our
position in time and space within a given cosmological model is known as the
weak anthropic principle. This is quite a departure from the Copernican
principle! It is also an example of a ‘selection effect’, about which we shall say
more in Chapter 4. Anthropic arguments attracted some interest prior to precision
cosmology, but the experimental constraints on the concordance cosmological
model are now much stronger than the anthropic constraints.
A more radical variant is to suppose that an ensemble of universes exists (a
so-called ‘multiverse’), each universe having different fundamental physical
constants. Anthropic arguments can then be used to constrain which parts of this
ensemble intelligent observers could inhabit. The underlying assertion that our
own Universe must be suitable for the formation of intelligent life (from which
one might constrain the fundamental physical constants) is sometimes known as
the strong anthropic principle. While this could explain various apparent
fine-tunings in physical constants, the disadvantage of these arguments is that they
give no physical mechanism for explaining parameter values. There is also
predictably some disagreement over how to best calculate likelihood distributions
for fundamental physical constants on anthropic grounds. Also, is this part of
testable science? It may be or may become so if a testable theoretical framework
could be found for explaining this ensemble. In any case, is ‘science’ exclusively
concerned with things that are testable in practice, now? We won’t rehearse the
debates in this book but you will no doubt sense that this is an area that can
generate a great deal of controversy.

2.16 The polarization of the CMB


A small percentage of the CMB is polarized, and like the unpolarized CMB this
polarized component also has structure. This polarized signal is an independent
measure of the cosmological parameters and could be the key to uncovering the
physics of inflation. At the time of writing only part of this polarized clustered
signal has been reliably detected, but CMB polarization will be a major focus of
observational cosmology in the coming decades.
The primordial CMB polarization is generated only by the scattering of CMB
photons, so it is therefore sensitive to both recombination and reionization. The
CMB has no circular polarization, because this can’t be generated by scattering.
The scattering is also wavelength-independent, so the scattered CMB spectrum is
the same as the unpolarized CMB spectrum. For that reason it’s usual to use the
fractional polarization, in particular to measure the temperature difference
that the polarized light has relative to the mean CMB temperature, e.g.
(Tpol − ⟨TCMB ⟩)/⟨TCMB ⟩, where the angle brackets indicate an average. There is
also more scope for information in polarization on the sky because polarizations
have directions.
There are many ways of representing polarization mathematically. CMB science
tends to break the linear polarization into two components known as E-mode and
B-mode. (The origins of these names are an analogy to electromagnetism — these
are not the electric and magnetic field components of the CMB electromagnetic
wave! It’s not essential to our story, but if you want details on this analogy, see
the box below.) There’s a physical reason for doing this: it turns out that the
primordial B-mode is due entirely to primordial gravitational waves! (B-modes
are sometimes also called tensor modes, though we don't usually use that
expression in this book; the term is related to how they can be expressed as
perturbations of the metric.)

The Helmholtz–Hodge theorem

The electromagnetic analogy is as follows. There's a general mathematical
theorem, known as the Helmholtz–Hodge theorem, that any vector field v
can be expressed in two parts: v = B + ∇φ, where φ is a scalar field and B
has no divergence, i.e. ∇ · B = 0. This is like electromagnetism, where the
electric field is E = ∇φ and φ is the scalar potential in electromagnetism;
also, the lack of magnetic monopoles in electromagnetism implies that
∇ · B = 0, where B is the magnetic field. It’s also generally true that a curl
of a gradient is zero, i.e. ∇ × ∇φ = 0 for any scalar field φ, implying
∇ × E = 0. So any vector field can be broken into a ‘magnetic’
(i.e. divergence-free) component and an ‘electric’ (i.e. curl-free) component.
Now, in the case of our CMB linear polarizations we are dealing not with a
vector field but with something subtly different, because if we rotated the
polarization by 180◦ we’d get the same polarization, which isn’t true of a
vector. The mathematical expressions for the ‘electric’ and ‘magnetic’
components of CMB polarization are therefore slightly different. We won’t
go into these differences here, but there is more information in the further
reading section.
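The decomposition in the box is easy to demonstrate numerically for an ordinary vector field. The sketch below is our illustration (the genuine CMB E/B split is for a spin-2 polarization field and is slightly different, as the box notes): it splits a periodic 2D field into curl-free and divergence-free parts by projecting its Fourier transform along and perpendicular to the wavevector k.

```python
import numpy as np

# Helmholtz decomposition of a periodic 2D vector field via FFT:
# the curl-free ('electric'-like) part is the projection of v-hat along k;
# the divergence-free ('magnetic'-like) part is the perpendicular remainder.
rng = np.random.default_rng(0)
n = 64
vx = rng.standard_normal((n, n))
vy = rng.standard_normal((n, n))

kx = np.fft.fftfreq(n)[None, :] * 2 * np.pi
ky = np.fft.fftfreq(n)[:, None] * 2 * np.pi
k2 = kx**2 + ky**2
k2[0, 0] = 1.0  # avoid dividing by zero at the k = 0 mode

vx_h, vy_h = np.fft.fft2(vx), np.fft.fft2(vy)
dot = (kx * vx_h + ky * vy_h) / k2        # (k . v-hat) / |k|^2
ex_h, ey_h = kx * dot, ky * dot           # curl-free component
bx_h, by_h = vx_h - ex_h, vy_h - ey_h     # divergence-free remainder

ex, ey = np.fft.ifft2(ex_h).real, np.fft.ifft2(ey_h).real
bx, by = np.fft.ifft2(bx_h).real, np.fft.ifft2(by_h).real

# Checks in Fourier space: div of the 'B' part and curl of the 'E' part vanish.
div_b = np.abs(kx * bx_h + ky * by_h).max()
curl_e = np.abs(kx * ey_h - ky * ex_h).max()
print(div_b, curl_e)
```

The two parts also sum back to the original field, exactly as the theorem requires.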

How do you measure the clustering of the polarized CMB? Using a mathematical
formalism similar to the unpolarized CMB structure, it’s possible to quantify the
structure in the polarized background as a function of angular size. Instead of
measuring the difference in unpolarized temperatures between two locations
(known as the T T power spectrum, with T standing for temperature), we could
measure the differences in (say) the temperatures of the E-component of the
polarization between two locations. This is sometimes referred to as the EE
power spectrum. Similarly, we could measure the clustering of the B-mode,
which would be called the BB power spectrum. We could also compare, say, the
E-mode polarization temperature in one place with the unpolarized temperature
in another. This cross-correlation would go by the name T E. In all there are
six possible permutations: T T , EE, BB, T E, T B, EB. Of these, it can be
shown from parity arguments that T B and EB should be zero, so there are four
astrophysically useful cross-correlations. The predicted levels of these clustering
strengths are shown in Figure 2.15. Note that the EE oscillations are out of phase
with the T T oscillations, for reasons related to how the light is scattered at the
time of recombination (see the further reading section for more details).
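As a toy illustration of auto- and cross-power spectra, the sketch below uses 1D Fourier modes in place of spherical harmonics (an invented analogue for illustration, not a real C_l estimator): a partly correlated pair of 'maps' yields a strongly positive cross-spectrum, just as a correlated T and E give a non-zero T E signal.

```python
import numpy as np

# Toy 1D analogue of the TT, EE and TE spectra. Real CMB analysis uses
# spherical harmonics; plain Fourier modes stand in for them here.
rng = np.random.default_rng(1)
n = 1024
t = rng.standard_normal(n)            # 'temperature' map
noise = rng.standard_normal(n)
e = 0.5 * t + 0.1 * noise             # 'E-mode' map, partly correlated with T

t_h, e_h = np.fft.rfft(t), np.fft.rfft(e)
c_tt = np.abs(t_h) ** 2 / n           # auto-spectrum of T
c_ee = np.abs(e_h) ** 2 / n           # auto-spectrum of E
c_te = (t_h * np.conj(e_h)).real / n  # cross-spectrum

# A correlated component shows up as a large positive net cross-spectrum;
# parity arguments predict a zero-mean TB and EB signal in the real CMB.
r = c_te.sum() / np.sqrt(c_tt.sum() * c_ee.sum())
print(r)
```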
The detection of B-mode polarized clustering would be tremendously exciting,
because the primordial gravitational wave background constrains the shape of the
inflation potential (Section 2.8) and would be the first genuine consistency test of
inflation. If we describe the scalar clustering power spectrum as C_l^S ∝ l^(n_s − 3)
(where n_s is as defined in Equations 2.44 and 2.45) and the tensor clustering as
C_l^T ∝ l^(n_T − 3), then inflation's predictions for the powers are n_s ≃ 1 + 2η − 6ε and
n_T ≃ 1 − 2ε, respectively, where ε and η are defined in Section 2.8 and are related


[Figure 2.15: ΔT /µK (0.01 to 100) versus multipole moment l (10 to 1000), showing the T E, EE and BB power spectra, the low-l reionization bump, and the gravitational lensing and gravitational wave contributions to BB.]

Figure 2.15 The predicted unpolarized CMB power spectrum (upper line),
compared to the T E power spectrum, the EE power spectrum and the BB power
spectrum. Negative values are dashed, and the predicted 1σ (i.e. 68% confidence)
uncertainties from the European Space Agency Planck satellite are shown as bars.
Note that the polarized signals are much weaker than the unpolarized signal.
to the shape of the inflation potential. These powers are sometimes referred to as
spectral indices of the density perturbations. There is also a prediction for the
amplitudes of the scalar and tensor clustering to be related by C_l^T /C_l^S = 12.4ε.
The consistency test of inflation is whether the tensor spectral index agrees
with the relative strengths of the scalar and tensor modes. Unfortunately, the
closer the scalar power spectrum is to scale-invariance, the harder it is to test
inflation; the current experimental constraint on the scalar spectral index is
0.0081 < 1 − ns < 0.0647 (WMAP five-year constraint).
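The slow-roll relations quoted above can be packaged into a simple consistency check. The function names below are ours; the conventions are those of this section, with n_s ≃ 1 + 2η − 6ε, n_T ≃ 1 − 2ε and C_l^T /C_l^S = 12.4ε.

```python
# Slow-roll predictions in the conventions of this section.

def scalar_index(eps, eta):
    """n_s ~ 1 + 2*eta - 6*eps."""
    return 1.0 + 2.0 * eta - 6.0 * eps

def tensor_index(eps):
    """n_T ~ 1 - 2*eps."""
    return 1.0 - 2.0 * eps

def consistency_ratio_from_tensor_index(n_t):
    """The consistency test: infer eps from a measured n_T,
    then predict the tensor-to-scalar ratio C_l^T / C_l^S = 12.4*eps."""
    eps = (1.0 - n_t) / 2.0
    return 12.4 * eps

# Example: eps = 0.01, eta = 0.005 gives n_s = 0.95, i.e. 1 - n_s = 0.05,
# inside the WMAP five-year band 0.0081 < 1 - n_s < 0.0647 quoted above.
print(scalar_index(0.01, 0.005), tensor_index(0.01))
```

The consistency test is then whether the ratio predicted from the measured n_T agrees with the directly measured tensor-to-scalar amplitude ratio.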
This is a very challenging measurement for the current generation of
detectors, as can be seen from Figure 2.15. Currently, the only constraints on the
polarized signals are from the T E cross-correlation power spectrum from the
WMAP satellite, shown in Figure 2.16. The detection of primordial gravitational
waves is the next great challenge for CMB experiments and could show us the
path to new physics. One of the problems is the existence of astronomical
foregrounds that could contribute to the BB power spectrum. Gravitational
lensing of the CMB by the intervening large-scale structure of the Universe
deflects the CMB photons and so changes the Cl clustering signal (see Figure 2.15). It
can produce a BB correlation, particularly at smaller scales of l > 500 or so. We
discuss gravitational lensing in more depth in Chapter 7. It’s by no means certain
that there is a gravitational wave background to detect: a rival theory to inflation
known as the ‘ekpyrotic Universe’ predicts no detectable CMB gravitational wave
background. This model is based on an extension of superstring theory known as

M-theory. These theories predict more dimensions than our usual three space and
one time. Our Universe is imagined to be a ‘sheet’ in a higher-dimensional
spacetime, and the trigger for the Big Bang in this model was the collision
between two such sheets.

[Figure 2.16: T²CMB l(l + 1)C_l^TE /(2π) in µK² versus multipole moment l (10 to 1000), and the same for the T B spectrum.]

Figure 2.16 The T E polarized CMB clustering detected by the first seven
years of WMAP, and WMAP’s non-detection (as expected) of the T B clustering
signal. The boxes show WMAP’s five-year constraint, showing the improvement
from adding a couple of years’ more data.
We shall meet gravitational wave observatories briefly in Chapter 6. The direct
detection of primordial gravitational waves in the new generation of gravitational
wave observatories is also very challenging, partly because the energy density in
gravitational radiation redshifts in the same way as the photon energy density.
Another source of CMB photon scattering is the electrons that are liberated at the
epoch of reionization. As with the surface of last scattering at recombination, the
scattered CMB light will be partly polarized. The polarized structure of the CMB
is therefore simultaneously sensitive to physical processes at both recombination
and reionization. This shows up as a bump in the E-mode polarization on large
angular scales, around l of a few.


2.17 Dark energy and the fate of the Universe


Einstein’s field equations of general relativity equate a measure of the spacetime
curvature (the Einstein tensor) with a measure of the mass-energy and momentum
(the energy–momentum tensor). What does this mean? In what sense is a
curvature equal to a mass-energy? Is the mass-energy causing the curvature? Or
perhaps the reverse? The origin of mass is a deep and obscure problem in particle
physics. It may be that the Large Hadron Collider will detect the Higgs boson,
which is a prediction of one candidate explanation of the origin of mass. (Like
inflation, this also invokes a new scalar field, though it’s known that the Higgs
field and the inflaton field cannot be identical.) However, even if the Higgs boson
is detected, the causes of the link between spacetime curvature and mass-energy
will still be a mystery.
Spacetime also has an in-built capacity or tendency to expand, characterized by
the cosmological constant Λ as we noted in Chapter 1. According to how we
choose to interpret Einstein’s field equations, we could choose to regard Λ as a
property of spacetime, or we could choose to regard it as some substance within
space that drives an expansion. We’ve already treated inflation in this way, by
starting off by asking what sort of content in space could explain inflation
(Section 2.7). We’ve also already implicitly considered the cosmological constant
in these terms, by giving it an effective energy density parameter ΩΛ in Chapter 1.
If Λ is the result of spacetime containing some substance, what sort of substance
would it be?
First, let’s remind ourselves of the fundamental equations of the expansion from
Chapter 1:
\[
\left(\frac{\mathrm{d}R}{\mathrm{d}t}\right)^2 = \dot R^2 = \frac{8\pi G \rho R^2}{3} - kc^2 + \frac{\Lambda c^2 R^2}{3}, \quad \text{(Eqn 1.7)}
\]
\[
\frac{\mathrm{d}^2 R}{\mathrm{d}t^2} = \frac{\mathrm{d}}{\mathrm{d}t}\!\left(\frac{\mathrm{d}R}{\mathrm{d}t}\right) = \ddot R = -4\pi G\left(\rho + \frac{3p}{c^2}\right)\frac{R}{3} + \frac{\Lambda c^2 R}{3}. \quad \text{(Eqn 1.8)}
\]
If we want to regard the cosmological constant as having an effective energy
density ρΛ and pressure pΛ , we could write these as
\[
H^2 = \left(\frac{\dot R}{R}\right)^2 = \frac{-kc^2}{R^2} + \frac{8\pi G}{3}\sum_i \rho_i , \quad \text{(2.61)}
\]
\[
\frac{\ddot R}{R} = \frac{-4\pi G}{3}\sum_i \left(\rho_i + \frac{3p_i}{c^2}\right), \quad \text{(2.62)}
\]
where the sum is carried over all components, e.g. matter, radiation and in this
case the cosmological constant. In order to reconcile Equations 1.7 and 2.61, one
needs that
\[
\frac{8\pi G \rho_\Lambda}{3} = \frac{\Lambda c^2}{3},
\]
i.e. ρΛ = Λc² /(8πG).
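Putting numbers into ρΛ = Λc²/(8πG), using the observed Λ ≈ 1.3 × 10⁻⁵² m⁻² derived in Exercise 2.11 below:

```python
import math

# Evaluating rho_Lambda = Lambda c^2 / (8 pi G) for the observed Lambda.
c = 2.998e8          # speed of light, m s^-1
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
Lam = 1.3e-52        # observed cosmological constant, m^-2

rho_lambda = Lam * c**2 / (8 * math.pi * G)
print(rho_lambda)    # ~7e-27 kg m^-3: a few hydrogen-atom masses per cubic metre
```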
In Section 2.7 we wrote the equation of state of a gas as p = wρc2 , where p is the
pressure and ρ is the density. The parameter w defines the equation of state. We
can regard all the contents of the Universe as ‘gases’ in this sense. For example,

pressureless matter has w = 0. We can re-cast Equation 2.62 in this way as


\[
\frac{\ddot R}{R} = \frac{-4\pi G}{3}\sum_i \rho_i (1 + 3w_i), \quad \text{(2.63)}
\]
where w_i is the equation of state parameter for the ith component. In order to
reconcile this with Equation 1.8, we need that
\[
\frac{-4\pi G}{3}\,\rho_\Lambda (1 + 3w_\Lambda) = \frac{\Lambda c^2}{3}
\]
or
\[
\frac{-4\pi G}{3}\,\frac{\Lambda c^2}{8\pi G}\,(1 + 3w_\Lambda) = \frac{\Lambda c^2}{3},
\]
so
\[
-\tfrac{1}{2}(1 + 3w_\Lambda) = 1,
\]
thus
\[
w_\Lambda = -1,
\]
i.e. the cosmological constant ‘substance’ has an effective negative pressure! The
same is true of the inflaton field. However, inflation occurs at an energy scale of
around 1024 eV, while the energy scale of the cosmological constant is more like
10−3 eV. The physical processes behind inflation and the cosmological constant
would appear to be quite different.
We have no idea of the underlying physical causes of Λ. This has led some
cosmologists to speculate that wΛ might be different from −1, or even that wΛ
might be time-varying. In the latter case, the ‘cosmological constant’ would not
be a constant. These modifications to the cosmological constant are generically
known as dark energy. The cosmological constant Λ is therefore a special case of
dark energy, in which w = −1 at all times. Note that dark energy appears to
be very different to dark matter. Sometimes dark matter and dark energy are
collectively called the ‘dark sector’. In total, the dark sector comprises about 90%
of the present-day energy density of the Universe, yet we know next to nothing
about its physics.

Exercise 2.10 List at least four major differences between dark matter and
dark energy. ■
Figure 2.17 shows the observational constraints on the dark energy equation of
state. Note that there is a strong parameter degeneracy between the present-day
dark energy density ΩΛ and the equation of state. However, combining this with
other observational constraints narrows the field considerably, such as the Hubble
Space Telescope determination of the Hubble parameter, baryonic acoustic
oscillations and high-redshift supernovae (which we shall meet in Chapter 3).
If we admit the possibility of a time-varying equation of state, the constraints
worsen considerably. If we write w = w0 + w′ z/(1 + z) for the dark energy
equation of state, the corresponding constraints are shown in Figure 2.18.
However, there are good reasons for believing that the constraints will improve.
For example, the late-time ISW is sensitive to a late phase of accelerated
expansion in the Universe, and changes in the dark energy equation of state
produce changes in the ISW signal.

[Figure 2.17: four panels of confidence contours versus w (−2.5 to 0): ΩΛ,0 (0.4 to 0.8) and Ωk,0 against w, for WMAP alone and for WMAP combined with HST, BAO and SN data.]
Figure 2.17 The constraints on the dark energy equation of state w from the WMAP CMB maps and other
surveys, compared to the constraints on other parameters. The dark shaded regions are the 1σ contours, i.e. an
approximately 68% likelihood that the underlying value is in that region, while the lighter shaded regions are 2σ
(≈ 95% likelihood). The bottom-right panel uses BAOs (Chapter 3) from SDSS luminous red galaxies, while the
bottom-left panel uses a wider compilation.

Are there any theoretical reasons for expecting w to depend on time? We could
follow a similar line of reasoning to inflation and imagine that spacetime is filled
with (another!) scalar field, which we’ll call φΛ . (We’ll use the Λ subscript to
distinguish this field from the inflaton field.) If it has a potential of VΛ (φΛ ), then
following Section 2.8, the field will satisfy φ̈Λ + 3H φ̇Λ = −VΛ′ (φΛ ), where dots are
time derivatives and the prime is a derivative with respect to φΛ . (Compare
Equation 2.18 — again we’re assuming that the field is the same everywhere in

[Figure 2.18: confidence contours in the (w0 , w′ ) plane, with w0 from −1.4 to −0.8 and w′ from −2.0 to 2.0.]
Figure 2.18 The constraints on the time-variation of the dark energy equation of
state from the WMAP CMB data combined with supernovae, BAOs (Chapter 3)
and nucleosynthesis constraints. The dark shaded regions are the 1σ contours,
i.e. there is an approximately 68% likelihood inferred from the CMB data that
the underlying value is in that region, while the lighter shaded regions are
2σ (≈ 95% likelihood).

space.) Pursuing the analogy of a ball rolling or sliding down a hill, we could
regard VΛ as a potential energy, and ½φ̇Λ² as a kinetic energy that we'll call KΛ . As
with inflation, it turns out the energy density comes out as ρΛ ∝ KΛ + VΛ , while
the pressure is pΛ ∝ KΛ − VΛ (compare Equations 2.19 and 2.20), so the equation
of state is w = (KΛ − VΛ )/(KΛ + VΛ ). This would naturally vary with time. If the
field is slowly-rolling, then KΛ ≪ VΛ so w ≃ −1, i.e. it would look like a
cosmological constant. Before that point it would have w in the range −1/3 to −1
(Section 2.7). These models are sometimes called quintessence since they
postulate a fifth fundamental field in addition to the four fundamental forces of
nature. The mass of the particle associated with this field turns out to be very
small indeed in particle physics terms, namely 10−33 eV, which leads to a whole
new set of fine-tuning problems.
One curious regime that is not prohibited by experimental constraints is w < −1,
sometimes called ‘phantom energy’. These models radically change the projected
fate of the Universe. To see why, we’ll use the variation of energy density for an
adiabatic expansion: ρ ∝ (R/R0 )−3(1+w) , which we also met in Section 2.7. We
won’t prove this thermodynamic formula here, but note that for a photon gas
(w = 1/3) this gives ρ ∝ (R/R0 )−4 , while for pressureless matter (w = 0) this
gives ρ ∝ (R/R0 )−3 . Adapting Equation 2.61, we find that
\[
\frac{H^2}{H_0^2} = \sum_i \Omega_{i,0} \left(\frac{R}{R_0}\right)^{-3(1+w_i)}, \quad \text{(2.64)}
\]

where the wi are the equations of state of each of the contents of the Universe.
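Equation 2.64 is simple to evaluate; the minimal sketch below (the function name is ours, for illustration) recovers the familiar dilution laws for matter, radiation and a cosmological constant.

```python
def hubble_sq_ratio(R_over_R0, components):
    """H^2 / H0^2 from Equation 2.64; components is a list of (Omega_i0, w_i)."""
    return sum(om * R_over_R0 ** (-3.0 * (1.0 + w)) for om, w in components)

# Familiar limits at R = 2 R0: matter (w = 0) dilutes as (R/R0)^-3,
# radiation (w = 1/3) as (R/R0)^-4, and a cosmological constant (w = -1)
# not at all; phantom energy (w < -1) makes H grow with R.
print(hubble_sq_ratio(2.0, [(1.0, 0.0)]))        # -> 0.125
print(hubble_sq_ratio(2.0, [(1.0, 1.0 / 3.0)]))  # ~ 0.0625
print(hubble_sq_ratio(2.0, [(1.0, -1.0)]))       # -> 1.0
print(hubble_sq_ratio(2.0, [(1.0, -1.5)]))       # ~ 2.83, i.e. increasing
```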
Once we get to an epoch when dark energy dominates, we’ll have that
\[
H^2 \propto \left(\frac{R}{R_0}\right)^{-3(1+w_\Lambda)},
\]
where we’re now writing wΛ for the equation of state of the only remaining
component, dark energy. If wΛ = −1 (i.e. a cosmological constant), then the
right-hand side becomes a constant, so H tends to a constant value, which is what
we found in Section 1.11. However, if wΛ < −1, then H is perpetually increasing.
This means that the radius of the cosmological event horizon will shrink. When it
becomes smaller than the size of any gravitationally-bound object, that object will
become unbound. Clusters of galaxies will be unbounded, then galaxies, then
stars and planets. Eventually even atoms will become unbound. This model
universe has been called the ‘big rip’. The event horizon size reaches zero in a
finite time, at which point the scale factor of the Universe becomes infinite. This
singularity represents an effective end of the Universe. For wΛ = −3/2, the
Universe ends in about 21 Gyr. Galaxy clusters would be ripped apart about
1 Gyr before the end, galaxies about 60 Myr before the end, solar systems about
3 months before the end, planets about half an hour before the end, and atoms
about 10−19 seconds before the end. The Universe would also spend most of its
history in a dark-energy-dominated phase, avoiding the need for anthropic
arguments. There are, however, ongoing debates as to whether any viable
phantom energy model could be generated from particle physics considerations.
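The '21 Gyr' figure can be checked by integrating Equation 2.64 numerically. The sketch below assumes a flat universe with Ωm,0 = 0.258, ΩΛ,0 = 0.742, H0 = 72 km s⁻¹ Mpc⁻¹ and wΛ = −3/2 (illustrative values consistent with this chapter) and estimates the remaining time until the scale factor diverges:

```python
import math

# Time to the 'big rip': integrate dt = dR / (R H) from R = R0 to infinity,
# with H from Equation 2.64 for matter plus phantom dark energy (w = -3/2).
H0 = 72 * 1000 / 3.086e22          # 72 km/s/Mpc converted to s^-1
om_m, om_de, w = 0.258, 0.742, -1.5

def integrand(u):
    """1/sqrt(H^2/H0^2) with u = ln(R/R0), so that dt = du / H."""
    a = math.exp(u)
    return 1.0 / math.sqrt(om_m * a**-3 + om_de * a**(-3 * (1 + w)))

# Trapezoidal rule out to a huge scale factor; the tail of the integrand
# falls off as a^(-3/4), so the integral converges.
n, umax = 200000, 40.0
h = umax / n
total = 0.5 * (integrand(0.0) + integrand(umax))
for i in range(1, n):
    total += integrand(i * h)

t_rip_gyr = (total * h / H0) / 3.156e16   # seconds -> Gyr
print(t_rip_gyr)                          # ~20-21 Gyr, as quoted above
```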
Finally, it’s worth linking the discussion of dark energy with our earlier discussion
of inflation. When we calculated the effective pressure and density of the inflaton
field in Section 2.8, we found that V ≫ φ̇² led to an effective equation of state
w = −1 (Equations 2.19 and 2.20). As we’ve seen in this section, this is
equivalent to a cosmological constant, so the slow-roll approximation leads to a
Universe very much like a cosmological-constant-dominated de Sitter universe.
This inflation was driven by the difference between the initial value of V (φ)
and the minimum. We implicitly assumed that when the Universe reaches the
minimum in V (φ), inflation stops, implying V = 0. However, it’s by no means
clear that V = 0 is the natural minimum value. The framework of inflation only
deals with potential differences, but in general relativity the absolute value of the
energy (in the form of the energy–momentum tensor) determines the curvature
and dynamics of spacetime.
One might, for example, expect the zero-point energy to be set by Planck scale
physics. Since Λ has dimensions of one over length squared, we might expect
Λ ∼ r_Pl^−2 ≃ 10^70 m−2 . However, as the following exercise shows, this is wildly
out of kilter with the observations.
Exercise 2.11 Using the experimental values ΩΛ,0 = 0.742
and H0 = 72 km s−1 Mpc−1 , show that the observed value of Λ is
1.3 × 10−52 m−2 . ■
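A quick numerical check of Exercise 2.11: since ρΛ = Λc²/(8πG) and ΩΛ,0 = ρΛ /ρcrit,0 with ρcrit,0 = 3H0²/(8πG), we have Λ = 3ΩΛ,0 H0²/c².

```python
# Checking Exercise 2.11: Lambda = 3 * Omega_Lambda0 * H0^2 / c^2.
c = 2.998e8                      # m s^-1
H0 = 72 * 1000 / 3.086e22        # 72 km/s/Mpc in s^-1
omega_lambda = 0.742

lam = 3 * omega_lambda * H0**2 / c**2
print(lam)                       # ~1.3e-52 m^-2, as stated in the exercise
```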
This dimensional analysis gets the answer wrong by 122 orders of magnitude! We
might make a slightly more physically-motivated estimate by imagining that
the quantum vacuum is made up of quantum mechanical simple harmonic
oscillators. It’s a standard result in quantum mechanics that the wave function of a
particle of mass m has a zero-point energy of E0 = hω/(4π), where ω is the
angular frequency of the oscillator. We could consider a box in the Universe
with a side L and add up all the zero-point energies of these oscillators:
E = (1/(4π)) × Σⱼ hωⱼ , where ω² = k²c² + m²c⁴/ℏ² and k = 2π/λ, where λ is
the de Broglie wavelength. We can let the dimension of the box L tend to infinity.
The periodic boundary conditions of the box imply that the only wavelengths
allowed are λx = L/nx in the x-direction (where nx is an integer). Therefore in
the interval kx to kx + dkx , there should be (L/2π) dkx separate values of kx . The
same applies in y and z, in which case the energy per unit volume becomes
\[
\frac{E}{L^3} = \frac{h}{4\pi}\int \frac{\omega(k)}{(2\pi)^3}\,\mathrm{d}^3 k.
\]
Unfortunately, this integral diverges. Perhaps this is OK, since we should expect
our low-energy quantum mechanical calculations to break down at some length
scale, which should be 1/λ = k ≫ mc/ℏ. If we set this minimum length scale to
the Planck length, the vacuum energy density comes out again as 120 orders of
magnitude greater than the observed Λ. As the cosmologist John Peacock said:
‘We are left with the strong impression that some vital physical principle is
missing. This should make us wary of thinking that inflation, which exploits
changes in the level of vacuum energy, can be the final answer.’
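The '120 orders of magnitude' can be reproduced with the massless-field version of the integral above: with ω = kc and a cutoff k_max = 1/l_Pl, the integral gives E/L³ = ℏc k_max⁴/(16π²). The sketch below compares this with the observed dark energy density, taking ρΛ ≈ 6.9 × 10⁻²⁷ kg m⁻³ from the observed Λ (both the cutoff and the prefactor follow the assumptions stated here, for illustration):

```python
import math

# Zero-point energy density of a massless field with a Planck-scale cutoff,
# E/V = hbar c k_max^4 / (16 pi^2), versus the observed rho_Lambda c^2.
hbar = 1.055e-34     # J s
c = 2.998e8          # m s^-1
l_pl = 1.616e-35     # Planck length, m
k_max = 1.0 / l_pl

u_vacuum = hbar * c * k_max**4 / (16 * math.pi**2)   # J m^-3
u_observed = 6.9e-27 * c**2                          # rho_Lambda c^2, J m^-3

orders = math.log10(u_vacuum / u_observed)
print(orders)        # ~120 orders of magnitude discrepancy
```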
So this is the awkward position that we find ourselves in. We’ve tried to solve the
horizon problem and other problems avoiding Planck scale physics with the
GUT-scale inflation model, but we found that inflation doesn’t quite escape all
considerations of the Planck scale. We have no way of predicting even the order
of magnitude of the cosmological constant from fundamental principles. We’ve
filled space with at least three fundamental scalar fields (Higgs, inflaton, dark
energy), while being unable to reconcile the fundamental conceptual bases of
general relativity and quantum field theory at the Planck scale. Perhaps there is a
tremendous conceptual breakthrough coming soon, but it’s overdue, as the
problems have been around for some decades. Perhaps there’s some vital piece
of experimental evidence that will provide the trigger, or perhaps an insight
will provide a thrilling breakthrough. There are some contenders, but with
experimental constraints and/or quantitative testable predictions hard to come by,
they’re still a little speculative to cover in depth in a book such as this. As a
professional scientist one has a choice: do you take a punt on chasing these
fundamental problems, or perhaps do you try something fundamental but more
tractable? We have the astonishing capability to build telescopes that can detect
galaxies throughout most of the Hubble volume, i.e. almost the entire observable
Universe. In the next chapters we shall cover some of what’s been discovered
about the evolution of the Universe since the time of the CMB.

Summary of Chapter 2
1. The cosmic microwave background (CMB) is the light from the surface of
last scattering from when the Universe was last opaque. At the time of last
scattering the motion of the photon gas became decoupled from the baryonic
matter.
2. A redshifted black body spectrum is also a black body spectrum.
3. The CMB is expected to be a black body because the photon–baryon
collision rate scales as a−6 (where a is the dimensionless scale factor of the
Universe), which increases faster as a tends to zero than the dynamical
inverse timescale, so thermalization was increasingly easy at early epochs.
The CMB spectrum is observed to be an excellent black body.
4. Baryon number conservation predicts perfect matter–antimatter symmetry
(unless an asymmetry is incorporated into the initial conditions of the
Universe), so baryon number non-conservation is expected in grand unified
theories.
5. The η parameter is the ratio of the number density of baryons to that of
photons. Since CMB photons dominate the entropy density of the Universe,
η is a measure of the number of baryons per unit entropy and is hence a


measure of the (reciprocal of the) entropy per baryon.
6. Various particle creation and annihilation processes existed in equilibrium in
the early Universe, with relative numbers determined by the Boltzmann
distribution. When the ambient particle energies could no longer support the
reactions, the number densities were frozen out.
7. The neutron–proton ratio predicted at freeze-out is the first step in Big Bang
nucleosynthesis calculations of the abundances of nuclei. The predicted
abundances agree with observations, though there are cases where
systematic uncertainties are under discussion.
8. Several lines of evidence point at the need for new physics: the horizon
problem, the flatness problem, the baryon asymmetry problem, the
monopole problem, and the origin of the primordial density perturbations.
The theory of inflation gives a broad framework for new physics that could
resolve these problems.
9. Inflation hypothesizes a scalar field φ (assumed roughly the same
everywhere in the observable Universe) with associated energy density
V (φ). The field φ is offset from the minimum of V (φ) and evolves
sufficiently slowly to the minimum that the φ̈ terms can be neglected. This is
known as the slow-roll approximation. However, the shape of the inflation
potential is not yet known. Many varieties of inflation theories exist, each
with different assumptions about the shape of the V (φ) function.
10. Inflation implies an exponential phase of expansion in the early Universe,
with an effective equation of state w = −1/3 to −1 (where a cosmological
constant has w = −1, and w < −1/3 is necessary to solve the horizon
problem). This cooled the Universe but the decay of the energy in the field
into particles reheated the Universe; this is also a likely time for the creation
of the matter–antimatter asymmetry in the Universe (baryogenesis).
11. Inflation predicts a roughly scale-invariant power spectrum of initial density
perturbations, P (k) ∝ k. This is in good agreement with observations. The
slight departures from scale-invariance are imprinted at the end of inflation
when the slow-roll approximation breaks down. These departures depend on
the shape of the potential near the minimum.
12. Inflation also predicts adiabatic perturbations that are a Gaussian random
field, i.e. with random phases.
13. The power spectrum of the CMB is measured using the Cl spectrum.
14. The CMB shows a dipole due to our motion relative to the cosmic rest frame.
15. The higher-order multipoles of the CMB show approximate scale-invariance;
on smaller scales, the acoustic peaks in the Cl spectrum are due to
resonances within the sonic particle horizon at the time of recombination.
16. The position of the first peak is determined by the sonic horizon particle size.
The relative strength of the second peak is sensitive to the entropy per
baryon.
17. Cosmological parameters derived from the CMB nevertheless have several
degeneracies that can be resolved by incorporating other experimental
constraints.
18. On the smallest scales, the Silk damping effect damps down the acoustic
peaks. This effect is due to photon diffusion during the process of
decoupling, between the times when photons were tightly coupled to matter
and when the Universe became transparent.
19. The Sachs–Wolfe effect applies during the passage of photons through the
Universe since the time of last scattering. The change in gravitational
potentials as photons pass through the Universe leaves imprints that are
detectable through stacking analyses of galaxy clusters in CMB maps. The
strength of the late-time Integrated Sachs–Wolfe effect is sensitive to the
evolution of dark energy.
20. The reionization of the Universe by the first luminous objects in the
Universe (stars or accreting black holes) also increased the optical depth to
Thomson scattering experienced by CMB photons. This optical depth is a
free parameter in fitting the acoustic peaks of the CMB.
21. Primordial gravitational waves (also known as tensor modes) generate
B-mode polarization. The spectral index and intensity of the gravitational
wave background is determined by the same inflation parameters that predict
the departure from scale invariance in the scalar perturbations, so detecting
these primordial gravitational waves (either in B-mode CMB maps or in
gravitational wave observatories) would be a direct test of the inflationary
framework.
22. Dark energy models generalize the cosmological constant, whose effective
equation of state parameter is w = −1, and consider the possibility of w
varying with time. This could occur, for example, if Λ is associated with a
scalar field, analogous (but not identical) to the inflaton.

Further reading
• Audio representations of CMB acoustic peaks can currently be found on Mark
Whittle’s web pages ([Link]) and John Cramer’s web pages ([Link]).
• For more details on energy and momentum conservation in general relativity,
see Lambourne, R., 2010, Relativity, Gravitation and Cosmology, Cambridge
University Press.
• It’s also possible for advanced undergraduate-level students to achieve a deeper
knowledge of general relativity, though this is usually outside the scope of most
undergraduate degrees. For readers who would like to try this immensely
rewarding intellectual adventure, we would recommend an accessible text on
general relativity such as Hobson, M.P., Efstathiou, G.P. and Lasenby, A.N.,
2006, General Relativity: An Introduction for Physicists, Cambridge University Press.
• The discovery papers of the CMB are: Penzias, A.A. and Wilson, R.W., 1965,
‘A measurement of excess antenna temperature at 4080 Mc/s’, Astrophysical
Journal, 142, 419; Dicke, R.H., Peebles, P.J.E., Roll, P.G. and Wilkinson, D.T.,
1965, ‘Cosmic black-body radiation’, Astrophysical Journal, 142, 414.
• Steigman, G., 2007, ‘Primordial nucleosynthesis in the precision cosmology
era’, Annual Review of Nuclear and Particle Science, 57, 1 (available at
arXiv:0712.1100).
• For more on the statistical physics that connects the microscopic world of
molecules and collisions with the macroscopic world of densities and
pressures, see, for example, Mandl, F., 1988, Statistical Physics, Wiley.
• For more about neutrino astronomy, see Spiering, C., 2008, ‘High energy
neutrino astronomy: status and perspectives’, Proceedings of the 4th
International Meeting on High Energy Gamma-Ray Astronomy, AIP
Conference Proceedings Vol. 1085, pp. 18–29 (also available at
arXiv:0811.4747), or Learned, J.G. and Mannheim, K., 2000, ‘High-energy
neutrino astrophysics’, Annual Review of Nuclear and Particle Science, 50,
679.
• For more on Fourier series and transforms, see, for example, Gillett, P., 1984,
Calculus and Analytic Geometry, Houghton Mifflin Harcourt.
• For more about the physics of the CMB, try Hu, W. and Dodelson, S., 2002,
‘Cosmic microwave background anisotropies’, Annual Review of Astronomy
and Astrophysics, 40, 171, or Hu, W. and White, M., 1997, ‘A CMB
polarization primer’, New Astronomy, 2, 323.
• A more substantial introduction to many themes in this chapter can be found in
the graduate-level text Peacock, J.A., 1999, Cosmological Physics, Cambridge
University Press.

Chapter 3 The local Universe
‘When I use a word,’ said Humpty Dumpty, ‘it means exactly what I intend
it to mean.’ ‘The question is,’ said Alice, ‘can you use words this way?’
‘The question is,’ said Humpty, ‘who is to be the Master?’
Lewis Carroll, Through the Looking-Glass

Introduction
You may have been surprised at the number of fundamental unknowns in the
basic processes in the early Universe. In this chapter we’ll cover our cosmic
neighbourhood and find out some of what is, and what isn’t, understood even
here. We’ll cover some of the phenomenology and terminology (a necessary evil)
and some more tools of precision cosmology, and also start assembling the
evidence for how the birth of galaxies like our own Milky Way happened.

3.1 Evidence for dark matter


Some of the best direct evidence for dark matter is in the local Universe. The
rotation rate of a spiral galaxy can be used to infer the mass enclosed within a
radius r measured from the centre of that galaxy, since v^2/r = GM(r)/r^2, where
M(r) is the mass at radii less than r. The velocity of stars and gas as a function of
radius, v(r), is known as the rotation curve of the galaxy. If there were no dark
matter, the rotation curve should agree with predictions made from the sum of
stars (detectable in visible light) and gas (detectable from the radio hyperfine
transition of neutral hydrogen that we shall meet in Chapter 8). Figure 3.1 shows the rotation curve of
the galaxy NGC 1560. There is a clear and highly statistically significant deviation
from the prediction using only stars plus gas, implying a dominant dark matter
component at large radii. In due course there will be similar compelling evidence
from our own Galaxy: the planned launch of the European Space Agency Gaia
mission in the coming decade will map the kinematics of stars in our own Galaxy
and infer any substructure (i.e. clumpiness) in its dark matter distribution.
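The relation v^2/r = GM(r)/r^2 quoted above turns a measured rotation curve directly into an enclosed-mass estimate. Here is a minimal Python sketch (standard library only; the flat 75 km s^-1 rotation speed and radii are round illustrative numbers, not the measured NGC 1560 data points):

```python
# Enclosed mass from a rotation curve: M(r) = v^2 r / G.
G = 6.674e-11    # gravitational constant, m^3 kg^-1 s^-2
KPC = 3.086e19   # metres per kiloparsec
MSUN = 1.989e30  # kilograms per solar mass

def enclosed_mass(v_kms, r_kpc):
    """Mass (in solar masses) interior to radius r_kpc for circular speed v_kms."""
    v = v_kms * 1e3   # km/s -> m/s
    r = r_kpc * KPC   # kpc -> m
    return v**2 * r / G / MSUN

# A flat rotation curve (constant v) implies M(r) growing linearly with r:
for r in (2.0, 4.0, 8.0):
    print(r, enclosed_mass(75.0, r))
```

A flat rotation curve therefore implies M(r) ∝ r, i.e. mass that keeps growing with radius well beyond the visible light: the signature of a dark halo.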
One way of avoiding the conclusion of a dominant dark matter component is to
suppose that the gravitational force law is different at large separations. We shall
briefly discuss this possibility in Chapter 7. More evidence for dark matter comes
from galaxy motions in clusters and from gravitational lensing which we shall
also meet in Chapter 7. The latter provides significant challenges to modified
gravitational force laws. As we’ve seen, the existence of dark matter is also
supported by the relative strengths of the acoustic peaks in the CMB.
The current space density of stars and gas is lower than the baryon density Ωb,0
inferred from primordial nucleosynthesis and the CMB acoustic peaks, so some
‘dark’ matter must be baryonic, perhaps in the form of neutral gas, brown dwarfs
or MACHOs (see Chapters 7 and 8). However Ωm,0 cannot be mainly baryonic, or
it would violate the primordial nucleosynthesis constraints, so what is it? Despite
the fact that neutrinos are now known to have mass, the estimated mass density of
Ων h^2 = (Σ mi)/(93.5 eV) < 0.0076, (3.1)
[Figure 3.1 here: panels (a) and (b); the axes of panel (b) are circular velocity/km s^-1 (0–100) against radius/kpc (0–10), for NGC 1560.]
Figure 3.1 (a) An optical image of the approximately edge-on spiral galaxy NGC 1560 and (b) the galaxy’s
rotation curve. The data points show the observed velocities. The dashed curve shows the predicted contribution to
the velocities from the inferred mass of stars, while the dotted line shows the contribution from the inferred mass of
the gas. These are not enough on their own to explain the observed velocities. The dash–dotted line is the dark
matter halo required to make up the remaining mass.
(where the sum is over the neutrino species) is too low to account for all the dark
matter.
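Equation 3.1 is simple enough to evaluate directly. A short sketch (the three 0.05 eV masses below are hypothetical, chosen only for illustration):

```python
def omega_nu_h2(masses_eV):
    """Neutrino density parameter times h^2, from Equation 3.1:
    Omega_nu h^2 = sum(m_i) / 93.5 eV, summed over neutrino species."""
    return sum(masses_eV) / 93.5

# Three nearly degenerate ~0.05 eV neutrinos (hypothetical values):
print(omega_nu_h2([0.05, 0.05, 0.05]))  # well below the 0.0076 bound
```

Even saturating the bound, Ων h^2 < 0.0076 corresponds to a summed mass below about 0.7 eV, far too little to supply all the dark matter.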
Two leading dark matter particle possibilities are the neutralino and the axion.
The neutralino is a partner particle to the neutrino predicted in supersymmetry,
which is an extension of the standard model of particle physics. The neutralino is
an example of a weakly interacting massive particle (WIMP). If neutralinos exist
in cosmologically-significant numbers, they may be found by direct detection
experiments such as the one at Boulby Mine in the UK, through detecting the
recoils of nuclei colliding with WIMPs. The interaction probability between a
WIMP dark matter particle and normal matter is predicted to be extremely low,
but with a sufficient flux of particles through the Earth (as the Solar System
traverses the Galaxy) very rare collisions may be detected. As yet there are
no uncontested claims of WIMP detection from these experiments. Axions,
meanwhile, are particles proposed in quantum chromodynamics with a very low
mass (10^-3–10^-4 eV) that couple only very weakly to electromagnetism. They
are predicted to decay into two photons in the presence of a strong magnetic field.
Some recent laboratory claims of axion direct detection exploiting this decay (or
its time reverse) were unfortunately later withdrawn or proved unrepeatable in
other experiments.
In 2008 there was a flurry of excitement over the cosmic ray positrons detected by
the PAMELA satellite (Payload for Antimatter Matter Exploration and
Light-nuclei Astrophysics), as well as the ATIC (Advanced Thin Ionization
Calorimeter) balloon-borne measurements of electrons and positrons (which
ATIC couldn’t distinguish). Both experiments inferred a peak in the spectrum of
cosmic ray electrons at about 500 GeV. Could this be the signature of dark matter
annihilation? The Fermi gamma-ray telescope is also sensitive to cosmic ray
electrons, and though cosmic rays are usually a nuisance that needs to be carefully
identified and removed from data, in this case Fermi could act as a powerful
cosmic ray detector in its own right. Unfortunately, Fermi failed to confirm the
PAMELA/ATIC signal. Although there is an excess relative to the predictions of
cosmic ray propagation through the Galaxy, astrophysical processes could mimic
this excess. It’s not impossible, however, that future cosmic ray observations
could detect dark matter annihilation processes.
It’s usually assumed that dark matter interacts almost exclusively through gravity,
but one can’t rule out a set of forces only experienced by dark matter. For
example, a ‘dark electromagnetism’ has been proposed (Ackerman, L., Buckley,
M.R., Carroll, S.M. and Kamionkowski, M., 2009, Physical Review D, 79,
023519). In this model there would be equal numbers of dark-negative and
dark-positive particles, but their annihilations would be suppressed if the dark
equivalent of the fine structure constant is sufficiently small. Dark matter galaxy
haloes would be a sort of dark plasma in this model. Occam’s razor has perhaps
precluded much discussion of this imaginative proposal, but it still highlights how
little we know of the dark sector in general.

3.2 The Hubble tuning fork


Spiral galaxies are not the only cosmologically-relevant population. Edwin
Hubble, the discoverer of the expansion of the Universe, is also remembered for
making the first attempt at classifying galaxy morphologies. (Astronomers could
just as well use the word ‘structure’, but terms like ‘morphology–density relation’
are in such widespread use in astronomy that we need to use the standard
convention in this book.) His division of spiral and elliptical galaxies into a
hypothetical sequence is shown in Figure 3.2, with ellipticals called early-type and
spirals called late-type. However, be warned that there is no observed evolution
from early to late. This misleading terminology is still in use for historical reasons
only. In fact, if anything there is evolution in the opposite direction, since
spiral–spiral galaxy mergers are likely to generate an eventual merger product
with an elliptical morphology. We’ll see later that spirals and ellipticals are found
in different places, and have different colours and different contents.
Galaxy classification is usually done by specialists, but a curious development has
been the adaptation of artificial neural networks to classify galaxies. This is a way
of specifying a computer algorithm for classifying galaxies that can be compared
to a human’s classifications, and iteratively altered until its outputs match those
of a human. Neural nets can reproduce galaxy classifications with the same
level of disagreement with human classifiers as there is disagreement between
human classifiers themselves. Indeed, neural nets may be better than humans,
but we would never know. The algorithms found and used by neural nets are
unfortunately not often easily interpretable, but other approaches to ‘machine
learning’ can give interpretable classification schemes. Nevertheless, automatic
classification is still not completely reliable, especially when there are irregular or
disturbed galaxies. The Galaxy Zoo project has enrolled the general public to
make morphological classifications of all the galaxies in Sloan Digital Sky Survey
(SDSS) data release 6 — around a million galaxies. Astrophysics is unusual in
that the amateur community makes a real scientific impact, from supernova and
comet searches, to SETI@Home and Galaxy Zoo.

Figure 3.2 Digitized Sky Survey images of local galaxies illustrating points on
the Hubble tuning fork for galaxy classification. Irregular galaxies lie outside
this scheme.

In Exercise 1.6 in Chapter 1 we saw that the surface brightness of an object
(i.e. its flux per square degree) declines as (1 + z)^4. If redshift is cosmological,
we should expect this surface brightness variation in galaxies, but if for example
‘tired light’ models are responsible for redshift (Chapter 1), we should not. This is
known as the Tolman surface brightness test. (See, for example, Sandage, A.,
2009, Astronomical Journal, 139, 728.) The attraction of this test is that
the (1 + z)^4 prediction is independent of the cosmological parameters Ωm,0, ΩΛ,0,
H0 and so on. The result is that the surface brightness of elliptical galaxies
between z = 0 and z = 0.85 declines as (1 + z)^(2.80±0.25) in the R-band filter
(a broad optical filter used in optical astronomy at about 550–750 nm), and as
(1 + z)^(3.48±0.14) in the I-band filter (about 700–900 nm). However, this span of
cosmic time is enough to expect evolution in the stellar populations of galaxies, as
we shall see in Chapter 4. The colour-dependent surface brightness evolution is
exactly as expected from (1 + z)^4 dimming combined with stellar population
evolution models.
(The division between ‘optical’ and ‘near-infrared’ is often taken to be at around
1 µm, despite the fact that human eyesight loses sensitivity quickly above
0.75 µm. Perhaps this is because in observational astronomy, wavelengths > 1 µm
require different detector technology and observing techniques to
wavelengths < 1 µm.)
A population of galaxies entirely overlooked by the Hubble scheme, indeed
unknown to Edwin Hubble, is low surface brightness (LSB) galaxies. These are
defined as having surface brightnesses fainter than 22.5 magnitudes per square
arcsecond (recall that astronomical magnitudes m are related to fluxes Sν by
m = −2.5 log10 Sν + constant). Their rotation curves imply a much higher
mass-to-light ratio (mass divided by luminosity) than is typical in the rest of the
galaxy population, and are often dominated by dark matter at all radii. This makes
them very different to, say, globular clusters of stars. LSB galaxies are often also
very gas-rich and have low metallicity (i.e. low abundance of elements heavier
than hydrogen and helium — recall the convention in astronomy to call all of
these ‘metals’), as well as sometimes having low star formation rates for their
gas contents, implying that either they have yet to start the bulk of their star
formation, or this star formation is suppressed.

3.3 Spiral galaxies and the Tully–Fisher relation


The rotation of spiral discs turns out to be closely related to other properties of
spiral galaxies. In 1977, R. Brent Tully and J. Richard Fisher discovered that
the velocity widths of spiral galaxies, Δv, correlate very strongly with their
luminosities in the B-band filter (about 400–500 nm). Velocity widths can be
measured from optical spectroscopy (via Doppler shifts of absorption lines) or at
radio wavelengths (again via Doppler shifts of the 21cm-wave emission line of
neutral hydrogen). These correlations have been found at other wavelengths too,
and the correlations are typically stronger in the near-infrared, such as in the
K-band filter (2–2.5 µm). The luminosity scales as L ∝ (Δv)^α, where the
parameter α is typically between 3 and 4, depending on the wavelength at which
the luminosity is measured.
We have now mentioned so many astronomical filters that we had better give you
a full set, shown in Figure 3.3. There are also filters at longer wavelengths in
what is sometimes known as the ‘mid-infrared’. Ground-based astronomy
tended to use letter names for these too, but with the advent of space-based
mid-infrared astronomy this has been more or less superseded by simply quoting
the wavelength in microns. There is no clear convention for which wavelengths
are termed ‘near’- or ‘mid’-infrared (usage varies around 3–5 µm for the division),
but 60 µm is often referred to as ‘far-infrared’.
What physical processes give rise to the Tully–Fisher relation? Suppose that the
rotation of the galaxy is Keplerian, i.e. the period P at the edge of the galaxy is
related to the radius R by P^2 ∝ R^3/M, where M is the galaxy mass. The
rotation velocity is v = 2πR/P, so with the rotation width Δv = 2v we can write
M ∝ R(Δv)^2. It’s observed that the mass-to-light ratio of most spiral galaxies is
roughly constant, i.e. M/L ≈ constant, so L ∝ R(Δv)^2. If we assume that
galaxies have the same surface brightness, then L/R^2 must be constant, or
L ∝ R^2, but since L ∝ R(Δv)^2 we must have R ∝ (Δv)^2 and so L ∝ (Δv)^4. In
practice, the surface brightness has a weak dependence on luminosity. For
normal spiral galaxies on the Hubble sequence, the luminosity empirically
appears to scale with the size of the galaxy as L ∝ R^2.8 in the B-band, implying
L ∝ (Δv)^3.1.
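The exponent bookkeeping in the previous paragraph generalizes: given a size–luminosity scaling L ∝ R^β together with L ∝ R(Δv)^2, eliminating R gives L^(1 − 1/β) ∝ (Δv)^2 and hence the Tully–Fisher slope. A two-line check (β = 2 reproduces the constant-surface-brightness case, β = 2.8 the empirical B-band value quoted above):

```python
def tully_fisher_exponent(beta):
    """Tully-Fisher slope alpha in L ∝ (Δv)^alpha, given L ∝ R^beta and
    L ∝ R (Δv)^2: eliminating R gives L^(1 - 1/beta) ∝ (Δv)^2."""
    return 2.0 / (1.0 - 1.0 / beta)

print(tully_fisher_exponent(2.0))  # constant surface brightness: alpha = 4
print(tully_fisher_exponent(2.8))  # empirical B-band scaling: alpha ≈ 3.1
```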
The trouble is, the assumptions built into this quick calculation are clearly wrong.
The mass-to-light ratio should be roughly constant if the mass is dominated by
stars, but as we’ve seen, the mass is in fact dominated by dark matter. Also, we’ve
already met galaxy populations that have much lower surface brightnesses —

[Figure 3.3 here: transmission against wavelength λ/Å for each filter set; rows
labelled U B V R I, u g r i z, Z Y J H K, and the COMBO-17 filters.]
Figure 3.3 A selection of astronomical filters. The bottom row shows the UBVRI system at use at the Kitt Peak
National Observatory. The next row up shows the Sloan Digital Sky Survey filters. The second from top shows the
ZYJHK infrared filters in use at the United Kingdom Infrared Telescope, while the top row shows the filter set of
the COMBO-17 survey (Chapter 4). The y-axis gives the relative transmission from the top of the atmosphere to
the detector in the telescope, in arbitrary units.
the ‘low surface brightness’ galaxies. These also appear to follow the same
Tully–Fisher relationship, but with a bigger scatter. The correlation is therefore
somewhat surprising, and the low scatter about the best fit for high surface
brightness galaxies is doubly surprising, so reproducing the observed Tully–Fisher
relation has been a key constraint of models of spiral galaxy discs. Empirically,
the star formation rate in galactic discs is proportional to the gas density ρ to the
power n, where n ≈ 1–2, but star formation is a very complex process and the
theoretical underpinning of this relation (known as the Schmidt law) is still
sketchy.
Having said that, we’ll see below that the Tully–Fisher relation is also empirically
useful because it can be used to determine the luminosity distance to a galaxy,
from which the Hubble parameter can be found.
The profiles of spiral discs have a surface brightness I that varies with radius
typically as I ∝ exp(−r/rscale ), where rscale is in this case known as the scale
length. Similarly, the disc of our Galaxy (and indeed all spiral galaxies) has a
density of stars that varies exponentially with the vertical height above (or below)
the disc: I ∝ exp(−h/hscale ), where hscale is known as the scale height of the
disc. This scale height appears to vary with the type of stars: the youngest stars
have hscale ∼ 100 pc, while the oldest stars have hscale ∼ 1.5 kpc. This latter
population is also known as the thick disc. It’s not known what generated

the thick disc in our Galaxy, though these stars are observed to have a lower
metallicity so may represent a relic from an earlier phase of the formation of our
Galaxy, or perhaps they are a fossil of a disruption that our Galaxy underwent
early in its history from another passing galaxy. Further out, stars are found
throughout the halo of the Galaxy. These halo stars are typically much less
enriched in heavy elements, suggesting that they are from a very early phase in
the formation of our Galaxy.
The discs of spiral galaxies are far more abundant in gas than elliptical galaxies,
which we’ll meet in the next section. Hydrogen (mainly in the form of
molecular H2 , but also neutral H I and ionized H II) dominates the gas mass of
galaxies, but very often the easy-to-measure CO emission at radio wavelengths is
used as a substitute measure for the gas content. CO molecules have a J = 1 → 0
transition at 115 GHz, and other transitions at integer multiples of this frequency.
The CO-to-H2 abundance is sometimes taken as a fixed quantity in extragalactic
astronomy, but the CO-to-H2 conversion factor (known as XCO ) is only accurate
to ±50% at 1σ (see the further reading section for more information). We’ll
see in later chapters that many galaxies are strongly luminous at far-infrared
wavelengths, caused by thermal radiation from dust.
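The CO ladder described above is easy to tabulate. A sketch using the J = 1 → 0 rest frequency of 115.27 GHz and treating the higher transitions as exact integer multiples, as in the text (the true frequencies differ very slightly from integer multiples):

```python
def co_line_freq_GHz(J_upper, z=0.0):
    """Observed frequency of the CO J -> J-1 rotational line, taking the
    ladder as integer multiples of the 115.27 GHz J=1->0 transition."""
    rest = 115.27 * J_upper   # rest-frame frequency in GHz
    return rest / (1.0 + z)   # redshifted into the observed frame

print(co_line_freq_GHz(1))         # J=1->0 at 115.27 GHz
print(co_line_freq_GHz(3))         # J=3->2, near 346 GHz
print(co_line_freq_GHz(1, z=0.5))  # the J=1->0 line redshifted to z = 0.5
```

Redshifting moves these lines through the various millimetre and submillimetre atmospheric windows, which is why different transitions are targeted at different redshifts.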

3.4 The fundamental plane of elliptical galaxies


Curiously, elliptical galaxies have a similar relation between velocity
dispersion σv and luminosity L, known as the Faber–Jackson relation: L ∝ σv^α
with α ≈ 3–4. However, in this case the velocities are not regular and confined to
a disc, but are rather more like a gas of gravitationally-bound particles.
In fact, this is a special case of a more general relation in elliptical galaxies,
known as the fundamental plane: L ∝ I0^x σv^y, where I0 is the surface brightness
within a given radius (conventionally the radius within which half the total light is
generated, known as the half-light radius), and x ≈ −0.7 and y ≈ 3–4.
Empirically, elliptical galaxies appear to follow the de Vaucouleurs law or
de Vaucouleurs profile:
I(r) = I0 exp[−(r/r0)^(1/4)], (3.2)
where I(r) is the surface brightness, r is the radius measured from the
centre, I0 is a normalization that may vary from galaxy to galaxy, and r0 is a
characteristic scale length for that particular galaxy. Integrating this, the total
luminosity comes out as L ∝ I0 r0^2. The half-light radius is also proportional to r0.
Note that this profile is sometimes defined with −[(r/r0)^(1/4) − 1] in the exponent
instead of just −(r/r0)^(1/4), though this is just equivalent to changing the definition
of I0. Sometimes a slightly more general law is used in which the 1/4 index is
changed to 1/n, where n is a free parameter; this is known as a Sérsic profile.
Nevertheless, L ∝ I0 r0^2 still holds in all such profiles, though with different
constants of proportionality.
Exercise 3.1 Show that any surface brightness law of the form
I(r) = I0 f(r/r0) has a total luminosity that is proportional to I0 r0^2. ■
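The exercise can be checked explicitly for the de Vaucouleurs case: substituting t = (r/r0)^(1/4) in L = ∫ I(r) 2πr dr reduces the integral to a gamma function, giving L = 8π Γ(8) I0 r0^2. A sketch in Python (standard library only):

```python
import math

def devauc_total_luminosity(I0, r0):
    """Total luminosity of a de Vaucouleurs profile I(r) = I0 exp(-(r/r0)^(1/4)).
    With t = (r/r0)^(1/4), the integral ∫ I(r) 2πr dr becomes
    8π I0 r0^2 ∫ t^7 e^(-t) dt = 8π Γ(8) I0 r0^2."""
    return 8.0 * math.pi * math.gamma(8) * I0 * r0**2

# L scales as I0 r0^2, as Exercise 3.1 shows for any I(r) = I0 f(r/r0):
print(devauc_total_luminosity(1.0, 2.0) / devauc_total_luminosity(1.0, 1.0))  # -> 4.0
```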
Again, a simple plausibility argument can be given for the existence of the
fundamental plane if we accept the two-parameter characterization of elliptical
galaxies in Equation 3.2. If we assume that the galaxies are gravitationally bound,
then a dimensional analysis (or the virial theorem — see below and the further
reading section) implies σv^2 ∝ M/r0. Now suppose that there is a mass-to-light
ratio that is a weak function of mass, so M/L ∝ M^a. Together with L ∝ I0 r0^2
this can be shown to imply that
L^(1+a) ∝ σv^(4−4a) I0^(a−1), (3.3)
which fits the observations provided that a ≈ 1/4.
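The algebra behind Equation 3.3 can be verified numerically. The sketch below sets all proportionality constants to unity and uses arbitrary illustrative values of σv, I0 and a; it solves the three assumed relations for L and checks that the closed form still satisfies L = M^(1−a):

```python
import math

def fp_luminosity(sigma, I0, a):
    """Solve sigma^2 = M/r0, L = I0 r0^2 and L = M^(1-a) (unit proportionality
    constants) for L: (1+a) ln L = (4-4a) ln sigma + (a-1) ln I0."""
    s, i = math.log(sigma), math.log(I0)
    return math.exp(((4 - 4 * a) * s + (a - 1) * i) / (1 + a))

def fp_residual(sigma, I0, a):
    """Relative error in L = M^(1-a) when L comes from the closed form above."""
    L = fp_luminosity(sigma, I0, a)
    r0 = math.sqrt(L / I0)   # from L = I0 r0^2
    M = sigma**2 * r0        # from sigma^2 = M/r0
    return abs(L - M**(1 - a)) / L

print(fp_residual(200.0, 3.0, 0.25))   # ~0: Equation 3.3 is self-consistent
```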
The fundamental plane is also sometimes known as the Dn –σ relation. The
parameter Dn is the diameter in which a galaxy’s surface brightness is larger than
a given constant. This diameter will depend approximately on the fundamental
plane parameters as power laws: Dn ∝ r0^α I0^β for some constants α and β. If the
reference surface brightness is chosen carefully, one can find a definition for Dn
in which it is also proportional to some power of σv , and so approximates the
fundamental plane. In practice the Dn –σ relation is used as a distance indicator:
knowledge of σv and a measurement of the angular size gives an angular diameter
distance to the galaxy.

3.5 Clusters of galaxies


The largest gravitationally bound structures in the Universe are galaxy
clusters. These assemblages are virialized, i.e. they have settled into a stable
gravitationally-bound system for which the virial theorem applies: the average
kinetic energy is equal to −1/2 times the average potential energy. This theorem
is widely used in astrophysics, and a source for a proof of this useful relation is
given in the further reading section. In this context, the virial theorem gives us
direct evidence for dark matter, since the average kinetic energy of the constituent
galaxies is too high to be accounted for by the gravitational potential of the visible
matter.
You may be surprised to hear that the dominant baryonic component of galaxy
clusters (by a factor of ∼10–15) is not the galaxies themselves but a hot
plasma. The potential well of the cluster is so deep that the gas that has fallen in
and been entrained is heated to hundreds of millions of kelvins (kT ∼ 10 keV),
and emits X-rays through bremsstrahlung with luminosities > 10^35 W
at 0.5–2 keV. (Accelerating or decelerating charges radiate electromagnetic
waves, and in this case the acceleration/deceleration is during collisions between
electrons and ions at thermal velocities.) Since thermal emission depends on
collisions between particles, the X-ray luminosity scales as the square of the
number density n. The luminosity will also be proportional to the volume of gas.
If we define a characteristic size r0 for the cluster (a suitable definition will be
given in Section 7.7 of Chapter 7), the X-ray luminosity is LX ∝ n^2 r0^3.
Meanwhile, though, the mass of the baryonic gas will be given by the density
times the volume, so Mgas ∝ n r0^3. Eliminating n between these, we have
Mgas ∝ r0^(3/2) LX^(1/2).
The total mass of the cluster (typically 10^14–10^15 M⊙) will be governed by
the equation of hydrostatic equilibrium. We shall cover this in more detail in
Section 7.4, but for now we shall just quote the result that isothermal pressure
balance leads to Mtotal ∝ r0. Therefore the fraction of baryonic gas must be

Mgas/Mtotal ∝ r0^(1/2) LX^(1/2). So, if we know the distance to a cluster, we can calculate
its size and X-ray luminosity, and find the baryonic mass fraction. This is further
evidence for dark matter, i.e. the gas is not massive enough on its own to entrain
itself in hydrostatic equilibrium. It’s often assumed that the baryonic mass ratio
shouldn’t depend on redshift, since the mass is accreted into the cluster purely
gravitationally, regardless of whether it’s baryonic or not. This leads to another
constraint on the cosmological parameters: if the angular size is θ0 and the X-ray
flux is FX , we have that θ0 = r0 /dA (where dA is the angular diameter distance),
while LX ∝ FX dL^2 (where dL is the luminosity distance). If Mgas/Mtotal is
constant, then r0^(1/2) LX^(1/2) must be constant, which equals
θ0^(1/2) FX^(1/2) (1 + z)^(1/2) r^(3/2),
where r is the proper motion distance (using Equations 1.50) or the comoving
distance if we’re assuming a spatially-flat universe. The expression for r depends
on the cosmology (Equation 1.44), so the assumed constancy of Mgas /Mtotal can
be used to derive cosmological parameters.
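As a sketch of how the assumed constancy of Mgas/Mtotal constrains cosmology, the quantity θ0^(1/2) FX^(1/2) (1 + z)^(1/2) r^(3/2) can be evaluated for a trial cosmology. Below, r is approximated by a spatially-flat ΛCDM comoving distance computed with a simple trapezoid rule; the H0, Ωm,0 and ΩΛ,0 values are illustrative defaults, and the θ0 and FX arguments are in arbitrary units:

```python
import math

C_KMS = 299792.458  # speed of light in km/s

def comoving_distance(z, H0=70.0, Om=0.3, OL=0.7, steps=1000):
    """Flat-LCDM comoving distance in Mpc, r = c ∫ dz'/H(z'), trapezoid rule."""
    def inv_H(zz):
        return 1.0 / (H0 * math.sqrt(Om * (1 + zz)**3 + OL))
    h = z / steps
    total = 0.5 * (inv_H(0.0) + inv_H(z))
    for k in range(1, steps):
        total += inv_H(k * h)
    return C_KMS * total * h

def gas_fraction_proxy(theta0, FX, z, **cosmo):
    """The combination sqrt(theta0 FX) (1+z)^(1/2) r^(3/2), which should be
    constant with redshift if Mgas/Mtotal does not evolve (arbitrary units)."""
    r = comoving_distance(z, **cosmo)
    return math.sqrt(theta0 * FX) * (1 + z)**0.5 * r**1.5

print(comoving_distance(0.5))  # ~1.9 Gpc for the default trial cosmology
```

Requiring this proxy to come out the same for clusters at different redshifts then picks out preferred values of the cosmological parameters.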
Galaxy clusters were first discovered in optical photographic imaging surveys.
The Abell cluster catalogue (Abell, G.O., 1958, Astrophysical Journal
Supplement, 3, 211) was compiled by an exhaustive visual inspection of
photographic plates. Clusters were classified according to their ‘richness’, i.e.
their density of galaxies, with Abell richness class 0 being the sparsest and 5 the
densest. Abell found galaxy clusters out to around z = 0.2, though with some
incompleteness in the catalogue at the higher redshifts and lower richness classes.
X-ray sky surveys and modern digital optical sky surveys have both been used to
find galaxy clusters to higher redshifts, and in the future many more are expected
to be found in the next generation of CMB maps using the Sunyaev–Zel’dovich
effect (Section 3.6).
The number counts of clusters are also a cosmological constraint. The formation
of virialized clumps of dark matter can be calculated using a knowledge of the
matter power spectrum and the cosmological model. If the Universe underwent an
accelerated expansion in recent history (e.g. z < 0.5), this will have a strong
effect on the gravitational collapse of clumps. The mass function of clusters is
therefore a strong function of the dark energy parameter w(z).
At the centre of a galaxy cluster is usually a large galaxy, typically the brightest in
the cluster (known as the brightest cluster galaxy or BCG). Often this galaxy is a
giant elliptical; otherwise it is a very large S0 galaxy known as a D galaxy (the D
stands for ‘diffuse’, since they have unusually extended envelopes) or cD galaxy
(a larger variant of D galaxies). These galaxies sometimes have multiple cores,
suggesting that they are built up by ongoing galaxy mergers.

3.6 The Sunyaev–Zel’dovich effect


The CMB photons passing through a foreground galaxy cluster can be scattered
by the electrons in the gas through the Compton scattering process. This will
cause a change in the temperature of the CMB photons in the Rayleigh–Jeans
tail of the CMB spectrum: ΔT/T ∝ ∫ ne dY ∼ ne ΔY for a fixed cluster gas
temperature T, where ne is the electron number density and ΔY is the path
length through the cluster. Meanwhile, the X-ray luminosity will scale as
LX ∝ ∫ ne^2 dY ∼ ne^2 ΔY (see above). So from comparing the X-ray luminosity
with the temperature decrement, we can eliminate ne to find ΔY. If the cluster is
symmetrical (which will be true on average if not necessarily in an individual
case), the length ΔY should be the same as the angular size, so we can derive an
angular diameter distance to the cluster. The Hubble parameter can be derived
from cluster Sunyaev–Zel’dovich measurements, and though the constraints are
not currently as tight as other determinations (61 ± 3 (random) ± 18 (systematic)
km s−1 Mpc−1 , where the systematics are from, for example, uncertainties in the
radial temperature profiles in the cluster), it is important to have independent
checks.
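The elimination of ne described above can be sketched with the proportionality constants set to unity (a toy calculation, not the full geometry-dependent analysis, with all quantities in arbitrary consistent units):

```python
def sz_path_length(dT_over_T, SX):
    """Eliminate n_e between ΔT/T ∝ n_e ΔY and the X-ray signal S_X ∝ n_e^2 ΔY
    (unit proportionality constants): n_e = (ΔT/T)/ΔY, so S_X = (ΔT/T)^2/ΔY,
    giving the path length ΔY = (ΔT/T)^2 / S_X."""
    return dT_over_T**2 / SX

def sz_angular_diameter_distance(dT_over_T, SX, theta):
    """d_A = ΔY/θ, assuming the cluster is as deep along the line of sight
    as it is wide on the sky."""
    return sz_path_length(dT_over_T, SX) / theta

# Round-trip check: build the two observables from an assumed n_e and ΔY,
# then recover ΔY by elimination.
ne, dY = 2.0, 5.0
print(sz_path_length(ne * dY, ne**2 * dY))  # recovers 5.0
```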
The Compton scattering conserves the number of CMB photons, but redistributes
their energies. The CMB photons in the Rayleigh–Jeans regime are suppressed,
but on the other side of the black body peak the number of photons is enhanced.
This characteristic wavelength-dependent suppression and enhancement, known
as the Sunyaev–Zel’dovich effect (or S–Z effect), will be searched for in the
next generation of CMB maps to make comprehensive all-sky galaxy cluster
surveys. Figure 3.4 shows the predicted absorption and emission from the
Sunyaev–Zel’dovich effect.

[Figure 3.4 here: two panels plotting the intensity Iν and the distortion ΔIν (both
in MJy sr−1) against frequency/GHz, with wavelength/mm along the top axis; the
thermal and kinetic SZE curves are labelled in the right panel.]
Figure 3.4 The Sunyaev–Zel’dovich effect on the intensity Iν of the CMB. The left panel shows an exaggerated
demonstration of the effect, while the right panel shows the change in the CMB intensity from the thermal and
kinetic S-Z effect for an electron temperature of 10 keV, a Compton y-parameter of 10−4 and 500 km s−1 peculiar
velocity.

An additional S–Z effect can occur if the electrons have additional kinetic
energies from bulk motion of the cluster relative to the cosmic rest frame, though
this effect (termed ‘kinetic S–Z’ to distinguish it from the ‘thermal S–Z’ discussed
in this section) is much smaller, as shown in Figure 3.4. A more lengthy
calculation of the S–Z distortion (see the further reading section) gives
ΔT/TCMB = f(x) y = f(x) ∫ [kT/(me c²)] ne σT dY, (3.4)
Chapter 3 The local Universe

where x = hν/(kTCMB) is known as the dimensionless frequency, and the
parameter y defined in the above equation is the Compton y parameter. The
parameter σT is the Thomson scattering cross section (if cross sections are
unfamiliar, see the box below), and the function f (x) is
f(x) = ( x (eˣ + 1)/(eˣ − 1) − 4 ) ( xeˣ/(eˣ − 1) ), (3.5)
provided the gas temperature is ≫ TCMB. The important thing to note is that the
distortion is redshift-independent.
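A quick numerical sketch of this spectral behaviour (standard physical constants; not from the text): the thermal distortion changes sign from decrement to increment where the bracket x(eˣ + 1)/(eˣ − 1) − 4 vanishes, which a simple bisection locates at x ≈ 3.83, i.e. near 217 GHz.

```python
import math

def sz_bracket(x):
    """The frequency-dependent bracket x*(e^x + 1)/(e^x - 1) - 4 of the
    thermal S-Z distortion; it is negative (a decrement) in the
    Rayleigh-Jeans regime and positive (an increment) at high frequency."""
    return x * (math.exp(x) + 1.0) / (math.exp(x) - 1.0) - 4.0

# Bisection for the null, where the S-Z distortion changes sign.
lo, hi = 1.0, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if sz_bracket(lo) * sz_bracket(mid) <= 0.0:
        hi = mid
    else:
        lo = mid
x_null = 0.5 * (lo + hi)

# Convert the dimensionless frequency back to an observing frequency.
h, k, T_cmb = 6.626e-34, 1.381e-23, 2.725  # SI units; T_CMB in kelvin
nu_null_GHz = x_null * k * T_cmb / h / 1e9
print(x_null, nu_null_GHz)
```

The null near 217 GHz is what makes multi-band S–Z observations so distinctive: clusters appear in absorption below it and in emission above it.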

Cross sections
Cross sections express the probability in quantum mechanics that objects
interact, such as a photon being absorbed by a hydrogen atom. If atoms were
classical spheres, the probability that the photon is absorbed would be 1 if
the photon passed into the sphere, and 0 if it did not. The cross section σ in
this case would be σ = πr2 , where r is the radius of the sphere. This is the
size of the target that the photon must hit if it is to be absorbed. However,
the atom does not have a sharp boundary. Instead, we can define the total
cross section of N atoms to be R/S, where S is the number of incident
photons per unit area per unit time, and R is the absorption rate, i.e. the
number of absorptions per unit time. The cross section for one atom would
then be σ = R/(N S).
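A one-line numerical illustration of the definition σ = R/(NS) (all numbers invented for illustration):

```python
# Toy illustration of sigma = R / (N * S); the numbers are invented.
N = 1.0e20      # atoms in the beam
S = 5.0e21      # incident photons per square metre per second
R = 3.3e13      # measured absorptions per second
sigma = R / (N * S)  # cross section per atom, in m^2
print(sigma)    # ~6.6e-29 m^2, coincidentally of order the Thomson cross section
```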

3.7 The morphology–density relation


Galaxy morphologies turn out to be very strongly dependent on their
environments: elliptical galaxies are often (but not always) found in galaxy
clusters, while spiral galaxies are often (but not always) in the ‘field’ (i.e. not in
clusters). This has been quantified in the morphology–density relation shown in
Figure 3.5.
What could cause this strikingly strong effect? Clearly, environment must have
had a strong effect on the evolution of galaxies. As yet it's still not clear which
causes dominate, though plausible contenders are ram pressure stripping of the
gas from galaxies (which removes the fuel for star formation and so shuts it
down), or perhaps the removal of gas from galaxies through tidal forces from the
cluster (when the galaxy falls into the cluster), or from the tidal disruption caused
by frequent, close, fast passes of other galaxies in the rich environment (termed
galaxy harassment). In all cases, removal and/or disruption of gas-rich spiral
discs will tend to transform morphologies. Indeed, elliptical galaxies tend to have
very little cold gas from which stars could form (though they do contain hot gas)
and to contain very little dust. We'll see in Chapter 4 that their red colours are
related to their older ages relative to star-forming discs in spirals. Numerical
simulations of similarly-sized spiral–spiral mergers show that the randomization
of the stars' velocities eventually results in elliptical-like morphologies.

(The hot gas, spheroidal morphology, redder stellar populations and cluster
environments have led elliptical galaxies occasionally to be referred to in
technical conference proceedings as 'warm, round, pink and friendly'.)



Figure 3.5 The morphology–density relation. The fractions of elliptical, S0 and spiral + irregular galaxies are
plotted as a function of the logarithm of the projected galaxy density.
However, this doesn’t appear to be the whole story as to how elliptical galaxies
themselves form. The tightness of the fundamental plane relations and their old,
red colours (consistent with highly evolved stellar populations) suggest instead a
formation early in the history of the Universe (e.g. redshifts z > 2) in a large
starburst (a strong burst of star formation), followed by passive stellar evolution
(i.e. no subsequent star formation, so the galaxy colours change purely through
the passage of their stars through the Hertzsprung–Russell diagram). This became
known as the monolithic collapse model (the classic reference for this model is
Eggen, O.J., Lynden-Bell, D. and Sandage, A.R., 1962, Astrophysical Journal,
136, 748) and reproduced the fundamental plane and the colour–magnitude
diagrams of local elliptical galaxies. This model also explained why the
metal-poor stars in the Galactic halo have highly elliptical orbits: they were
formed when the initial gas cloud was in the process of collapse. Metal-rich stars
in the disc, meanwhile, are formed later in this model. The
alternative is the hierarchical structure formation model in which, as we’ll see,
galaxies are formed piecewise by the accretion and influence of neighbours. In
particular, numerical simulations predict that the merger of spiral discs will
result in an elliptical galaxy morphology. The relative dominance of these two
mechanisms is currently the subject of some debate, and we shall return to this
nature versus nurture issue in later chapters. The fundamental difficulty is
that while the evolving distribution of dark matter can be (reasonably) easily
calculated, because it only interacts through gravity, the evolving distribution of
baryons such as stars and gas is much more complicated.
The following exercises will derive the Jeans mass: the mass above which an
object is unstable to gravitational collapse.


Exercise 3.2 Imagine a uniform sphere of radius R and density ρ. Write down
the gravitational potential energy of a shell inside this sphere of radius r and
thickness dr, then integrate these shells to show that the gravitational binding
energy of the sphere is
EGR = −(3/5) GM²/R, (3.6)
where M is the total mass of the sphere.
Exercise 3.3 According to the virial theorem, an object is stable against
gravitational collapse when 2EK + EGR = 0, where EK is the kinetic energy.
Assuming that the sphere is made of an ideal gas, show that the condition for
gravitational collapse can be stated as
M > [5kT/(Gmp)]^(3/2) [3/(4πρ)]^(1/2). (3.7)
This mass threshold is known as the Jeans mass. Put in the numbers from
previous chapters to show that a baryonic overdensity of about a million solar
masses was unstable to collapse at the epoch of recombination. ■
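A rough numerical version of Exercise 3.3, with illustrative values (a present-day baryon density of ~4.2 × 10⁻²⁸ kg m⁻³ and a gas temperature of ~3000 K at z ≈ 1100 are assumed here, not taken from the text):

```python
import math

G, k, m_p = 6.674e-11, 1.381e-23, 1.673e-27  # SI units
M_sun = 1.989e30                             # kg

# Illustrative conditions at recombination (z ~ 1100); assumed values.
T = 3000.0                        # gas temperature, K
rho_b0 = 4.2e-28                  # present-day baryon density, kg m^-3
rho = rho_b0 * (1.0 + 1100) ** 3  # baryon density scales as (1+z)^3

# Jeans mass from Equation 3.7
M_jeans = (5 * k * T / (G * m_p)) ** 1.5 * (3.0 / (4.0 * math.pi * rho)) ** 0.5
print(M_jeans / M_sun)  # of order 1e6 solar masses
```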
This mass is tantalizingly similar to the masses of present-day globular
clusters, which are some of the oldest bound objects known in the Universe.
However, it’s unlikely that globular clusters formed as early as this, because of
observed relationships between globular clusters and the galaxies that they
inhabit. Nevertheless, you’ll see in Exercise 6.6 of Chapter 6 that seeds of around
this mass at redshifts higher than all known quasars are needed to explain the
existence of supermassive black holes in quasars. The main difficulty, however, in
making predictions is that baryons can do much more than just collapse under
gravity, as we’ll see in later chapters.

3.8 The Butcher–Oemler effect


Another hint at the formation histories of galaxies comes from the evolution in the
colours of cluster galaxies: at higher redshifts (e.g. z ∼ 0.4), there appears to be a
larger fraction of blue galaxies in the cluster cores than are seen in the cores of
clusters in the local Universe. This is known as the Butcher–Oemler effect, after
its discoverers. Since young massive (short-lived) stars have higher temperatures
and bluer colours, this suggests more star formation at higher redshifts in cluster
core galaxies. This seemed to be a tantalizing clue on how environments affect the
evolution of galaxies — was star formation shut down in rich environments in the
recent past?
There’s been some debate about the reality of this effect regarding underlying
biases in the samples of objects studied, and the underlying causes of those effects
that are present (see, for example, the discussion and review in Haines, C.P.
et al., 2009, Astrophysical Journal, 704, 126). We shall see in Chapters 4 and 5
how star formation in the Universe has evolved; the jury is still out on exactly
how this evolution depended on galaxy environment.


3.9 The cooling flow problem


We’ve already described how galaxies grow by mergers. Why don’t the galaxies
in galaxy clusters all merge to make one giant galaxy? One reason is that the
surrounding galaxies are moving too fast, but what about the intra-cluster gas
mentioned in Section 3.5? It needs to cool in order to be accreted, but the cooling
time for the gas is longer than the current age of the Universe except in cluster
cores. However, many clusters are observed to have dense cores, and the cooling
time in the densest regions is shorter: we’ve already argued that the X-ray
luminosity is proportional to the density n squared, so the energy loss per unit
mass must be proportional to n. In these dense regions, the cooling timescale is
still expected to be larger than the gravitational free-fall timescale (i.e. the time
for an object to fall from rest to the centre of the cluster), so the gas would evolve
as a quasi-hydrostatic equilibrium. If we imagine the gas in the core to have
cooled, it will not have enough pressure to support the weight of the gas at larger
radii, so the gas will be compressed, which further shortens its cooling time. This
runaway cooling is known as a cooling flow.
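An order-of-magnitude sketch of why the core is special, using the standard bremsstrahlung cooling-time scaling t_cool ∝ √T/n (the prefactor below is a commonly quoted approximation, not derived in the text):

```python
def t_cool_yr(n_e_cm3, T_K):
    """Approximate bremsstrahlung cooling time in years:
    t_cool ~ 8.5e10 yr * (n_e / 1e-3 cm^-3)^-1 * (T / 1e8 K)^0.5.
    The prefactor is a standard order-of-magnitude value (assumed)."""
    return 8.5e10 * (n_e_cm3 / 1e-3) ** -1 * (T_K / 1e8) ** 0.5

print(t_cool_yr(1e-3, 1e8))  # cluster outskirts: longer than the age of the Universe
print(t_cool_yr(0.1, 5e7))   # dense core: well under a Hubble time
```

The contrast of two orders of magnitude between core and outskirts is what drives the runaway described above.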
However, if cooling flows exist, then the inferred amount of gas deposition onto
the central cD galaxy turns out to be very large (hundreds of solar masses per
year), so there should be evidence for large amounts of cool X-ray gas in the
cluster cores. Cool gas is sometimes observed, but much less than expected, and
the inferred infall rate should lead to many observational signatures, such as
vigorous star formation, that are very rarely seen. The question then is why do
cooling flows not exist? What are the hidden distributed heating mechanisms for
the dense cluster cores? This is known as the cooling flow problem. We won’t
answer this immediately but will return to this question in Chapters 5 and 6.

3.10 The cosmological distance ladder


So far we have mentioned several ways of determining cosmological distances:
the S–Z effect (via an angular diameter distance), Tully–Fisher (a luminosity
distance), Dn –σ (an angular diameter distance), Faber–Jackson (a luminosity
distance), and the position of the first acoustic peak in the CMB (an angular
diameter distance). Many of these are based on the idea of a standard candle (an
object with a known or derivable luminosity) or a standard rod (an object with a
known or derivable size), from which one can calculate an angular diameter or
luminosity distance, respectively. There are several other ways of determining
cosmological distances that we have not yet covered, such as the following.
• The tip of the red giant branch: the brightest red giant stars turn out to have
I-band magnitudes that are relatively insensitive to metallicity and age (a
standard candle).
• RR Lyrae stars: these variable stars on the horizontal branch of the
Hertzsprung–Russell diagram pulsate with a period shorter than one day that is
closely correlated with their mean absolute magnitude (a standard candle).
• Cepheid variables: these are also variable stars with a well-defined
relationship between period and luminosity. There are two types: type I are
massive young stars while type II are low-mass stars. The period–luminosity


relationships for the two types are different; only type I Cepheid variables are
used as primary distance indicators.
• Planetary nebula luminosity function: this is the number of planetary nebulae
per unit volume per unit luminosity. The luminosity is usually measured in the
[O III] 500.7 nm emission line of oxygen. It has a characteristic shape with a
sharp cut-off above a fixed luminosity, which can be used as a standard candle.
• Globular cluster luminosity function: this is found to be approximately
Gaussian. The mean of this Gaussian can be used as a secondary distance
indicator (a standard candle).
• The light echo from the supernova SN 1987A is a standard rod.
• The rotation velocity of the water maser (a naturally-occurring microwave
laser) in the galaxy NGC 4258 can be compared to its proper motion, finding a
distance of 7.2 ± 0.5 Mpc.
• Surface brightness fluctuations: if there are on average N stars per square
arcminute on the sky in a galaxy, this number will be subject to Poisson
statistics and will fluctuate with standard deviation √N. The fluctuations in
surface brightness can therefore give an estimate of the number of stars per unit
area on the sky in that galaxy, from which an angular diameter distance to that
galaxy can be derived.
• Type Ia supernovae are standard candles that will be discussed in more detail
below.
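The surface brightness fluctuation entry above rests on nothing more than Poisson statistics; a toy sketch (numbers invented): since the number of stars per resolution element grows as d² for a galaxy at distance d, the fractional fluctuation 1/√N falls as 1/d, which is what makes the smoothness of a galaxy image a distance indicator.

```python
import math

def fractional_fluctuation(N):
    # Poisson statistics: standard deviation sqrt(N), so fractional size 1/sqrt(N)
    return 1.0 / math.sqrt(N)

N_at_10Mpc = 1.0e4  # invented: stars per resolution element at 10 Mpc
for d_Mpc in (10, 20, 40):
    N = N_at_10Mpc * (d_Mpc / 10.0) ** 2  # N grows as distance squared
    print(d_Mpc, fractional_fluctuation(N))
# the fluctuation amplitude halves each time the distance doubles
```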
Figure 3.6 summarizes how larger-scale distance indicators depend on earlier
stages. The techniques have been labelled as standard rods, candles etc., but note
that for the nearest objects the various types of distances are indistinguishable.
From a comparison with the redshifts, the Hubble parameter can be found through
Equation 1.44 and its variants. In the low-redshift limit, the various distances
(angular diameter, luminosity, comoving) are indistinguishable and the expression
for H0 reduces to H0 = cz/d, where d is the distance. One of the primary goals
of the Hubble Space Telescope (HST) was the determination of the Hubble
parameter to 10% accuracy. This was achieved by measuring the periods and
magnitudes of Cepheid variables in 18 spiral galaxies at distances < 20 Mpc. The
high angular resolution of the HST was needed to identify the stars in these
crowded fields. This provided the fundamental calibration of the Tully–Fisher
relation, the fundamental plane, type Ia supernovae and surface brightness
fluctuations; the final result was H0 = 72 ± 8 km s−1 Mpc−1 .
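In this low-redshift limit the arithmetic is simple; a minimal sketch with invented numbers:

```python
c = 2.998e5  # speed of light, km/s

def H0_low_z(z, d_Mpc):
    """Low-redshift estimate H0 = cz/d, valid only for z << 1."""
    return c * z / d_Mpc

# A Cepheid-calibrated galaxy at 15 Mpc with cz = 1080 km/s (invented numbers):
print(H0_low_z(1080.0 / c, 15.0))  # 72 km/s/Mpc
```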
Perhaps the most striking recent progress with the distance scale has been with
type Ia supernovae. These supernovae are caused when a white dwarf is accreting
matter from a companion star, which eventually sends the star above the threshold
for ignition of carbon. The star explodes and briefly has a luminosity of about ten
billion times that of our Sun. Type Ia supernovae occur only about once every two
hundred years in our Galaxy (the last one in our Galaxy was seen by Tycho Brahe
in 1572), but may occur at the rate of one per second in the observable Universe.
Since white dwarfs have a small range of masses, type Ia supernovae have a very
restricted range of luminosities, making them ideal for consideration as standard
candles. Empirically they have a peak luminosity that appears to be related to the


[Figure 3.6 diagram: the ladder runs from ~1 kpc (Cepheid B–W, RR Lyrae
B–W, globular cluster statistical π) through ~10–100 kpc (Galactic novae,
cluster Cepheids, RR Lyrae statistical π, LMC Cepheids, SN 1987A light echo,
globular clusters) and ~1–10 Mpc (HST Cepheids, Local Group Cepheids,
RR Lyrae and novae, NGC 4258 maser, GCLF, PNLF, tip of the RGB) out to
~100 Mpc (Tully–Fisher, Dn–σ, SBF, SN Ia), ending at the Hubble constant.]

Figure 3.6 The cosmological distance ladder, showing how each method
depends on the calibration for more nearby methods (arrows). Insecure calibration
steps are shown as dashed lines. The pink boxes refer to methods that are useful in
star-forming galaxies, while the blue boxes are useful in early-type galaxies. Open
boxes are geometric distance determinations. The PNLF box (planetary nebulae
luminosity function) works for all galaxy populations in the local supercluster.

subsequent rate of decay of brightness; although this relationship is not well
understood, applying a correction for this makes type Ia supernovae excellent
candidates for standard candles. The advent of large-scale digital CCD sky
surveys made it possible to search for these supernovae at much higher redshifts.
The Supernova Cosmology Project and the High-Redshift Supernova Search
Team found supernovae out to z ∼ 1. To many people’s great surprise, the
luminosity distances implied by the supernovae could only be fit with models in
which ΩΛ > 0. Figure 2.14 in Chapter 2 showed how the simultaneous constraints
from supernovae and the CMB were needed to constrain both Ωm and ΩΛ .
This conclusion was not immediately accepted by some in the astronomical
community, who argued that (for example) dust extinction preferentially in
the higher-redshift supernovae could mimic the effect of ΩΛ by making the
high-redshift supernovae preferentially fainter. Indeed, the opacity of spiral discs
is expected to evolve. However, this would also be expected to increase the
dispersion, which it did not appear to, and the supernovae showed no evidence of
having redder optical spectra. Figure 3.7 shows the current magnitude–redshift
diagram for type Ia supernovae. Plots of apparent magnitude against redshift are
known as Hubble diagrams owing to their use in determining the Hubble
parameter, and dating from Hubble’s original 1929 discovery of the expansion
of the Universe. A recent Cepheid-based calibration of the distances to the
galaxies that hosted nearby supernovae, plus the galaxy NGC 4258 (with a
water-maser-calibrated distance), has found 74.2 ± 3.6 km s−1 Mpc−1 (Riess,
A.G. et al., 2009, Astrophysical Journal, 699, 539).

[Figure 3.7 axes: cz/km s−1 (0–900 000) against luminosity distance/Gpc
(0–12); curves: empty, flat dark-energy, closed dark-energy, Einstein–de Sitter,
dusty E–dS, closed matter-only, de Sitter and evolving-supernova models, with
binned supernova data.]

Figure 3.7 Constraints on the Hubble diagram of type Ia supernovae. The flat
dark energy model has Ωm = 0.27 and ΩΛ = 0.73. It is, of course, also possible
to fit the data with a model in which supernovae have a specially-tailored
evolution.

3.11 The large-scale structure of the Universe


Next we’ll take you on a tour of the large-scale structure of the Universe. Our
Galaxy, the Milky Way, is in the process of accreting two satellite galaxies,
the Large Magellanic Cloud (LMC) and the Small Magellanic Cloud (SMC,
Figure 3.8), which can be seen by the naked eye on a dark enough night in the
southern hemisphere. These are our most famous satellite galaxies, but there are
many more, many of which have only recently been discovered. Figure 3.9 shows
our Galaxy’s immediate cosmic neighbourhood. One close neighbour, the
Sagittarius dwarf elliptical galaxy, was discovered only in 1994, despite being
only half the distance to the LMC; its location on the sky is on the far side of the
central bulge of our Galaxy. The Canis Major dwarf galaxy is even closer at only
7.6 kpc (closer to us than the centre of the Milky Way), yet was discovered only
in 2003 using the Two Micron All-Sky Survey (2MASS), which highlighted
the distinctive colours of the galaxy’s stellar population. The Milky Way is
accreting all these galaxies. The tidal shears from our Galaxy’s gravitational field
are distorting the satellite galaxies and in some cases pulling them into streams of
stars. In Chapter 4 we’ll see how this process has continued throughout the
history of the Universe.

Figure 3.8 The Magellanic Clouds as seen from Queensland, Australia.


[Figure 3.9 diagram: positions of the nearby dwarf galaxies (Leo I, Leo II, UMi,
Dra, Sex, Sgr, LMC, SMC, Car, Scl, For) around the Milky Way, plotted in
x/kpc (l = 0°), y/kpc (l = 90°) and z/kpc (b = +90°), out to about ±200 kpc.]
Figure 3.9 The neighbourhood of our galaxy. MW marks the Milky Way; l and
b are the galactic longitude and latitude coordinates. The other annotations are
abbreviated names of the nearby galaxies.

Figure 3.10 The Andromeda galaxy, M31, as seen in the optical.
Our Galaxy is falling towards and will eventually itself be accreted by its more
massive neighbour, the Andromeda galaxy (Figure 3.10), also known as M31. In
about a billion years’ time, Andromeda will appear about the size and brightness
of the Magellanic Clouds today. Even today, though, it is visible by the naked eye
on a dark enough night in the northern hemisphere. M31 also has its own satellite
galaxies currently being accreted.
Moving slightly out, both our Galaxy and M31 are members of the Local Group,
known to contain over 40 galaxies. Edwin Hubble was the first to recognize that
these neighbours represent an overdensity relative to the average galaxy density.
Moving slightly out again, the Local Group is itself interacting with other galaxy
groups, such as the Maffei 1 Group, the Sculptor Group, the M81 Group and
the M83 Group (each named after one of their members). Galaxy groups are
gravitationally bound but are not as massive as galaxy clusters; there is no
established dividing line between the two, but > 10¹⁴ M⊙ systems are usually
regarded as clusters. Our nearest galaxy cluster is the Virgo cluster (Figures 3.11
and 3.12). The Virgo cluster currently has a positive redshift, but the Local Group
is still likely eventually to be accreted by it. The Virgo cluster and the next nearest
cluster, Coma, are themselves part of a structure known as the local supercluster.
On these progressively larger scales (Figure 3.13), the Universe looks filled with
sheets and filaments — indeed, the local supercluster galaxies are preferentially
(though not exclusively) in a particular plane.

Figure 3.11 The centre of the Virgo cluster of galaxies, as seen in optical light.
The cluster contains over 2000 galaxies and has a diameter ten times bigger than
the full Moon on the sky, but is too faint to be seen by unaided eyesight.


Figure 3.12 The location of our Local Group of galaxies and the Virgo cluster.

As we zoom out further, this filamentary structure is increasingly apparent.
Figure 3.14 is the famous ‘stick man’ from the early CfA (Centre for
Astrophysics) galaxy redshift surveys. The horizontal ‘arms’ featured in later
surveys and became known as the Great Wall. This early survey is now dwarfed
by the latest generation of redshift surveys, such as the 2dF galaxy redshift survey
(2dFGRS, named after the fibre-optic spectroscope at the Anglo-Australian
Telescope, with a two degree field on the sky) and the Sloan Digital Sky Survey
(SDSS) galaxy redshift survey (a CCD imaging and spectroscopy survey of about
π steradians on the sky, sometimes referred to as ‘π in the sky’). Figure 3.15
shows the cone diagram for the 2dF survey.


Figure 3.13 The Virgo cluster in relation to nearby superclusters. On this truly
gigantic scale, the Local Group is barely discernible on the page.

Compare the scale to Figure 3.14. The large-scale structure of the Universe looks
cobwebby, with superclusters, walls and giant voids. Note that the sensitivity in
Figure 3.15 tapers off at the largest distances. On the very largest scales the
Universe is beginning to look homogeneous, which we shall quantify with the
power spectrum.


[Figure 3.14 diagram: a wedge of galaxies spanning 8 hr to 17 hr in right
ascension, with arcs marked at recession velocities of 5000 and 10 000 km s−1.]

Figure 3.14 The famous ‘stick man’ in the CfA galaxy redshift surveys. The
angular axis is right ascension, and the distance from the apex is the
galactocentric recession velocity.

[Figure 3.15 diagram: two wedges of galaxies, roughly 10 h–14 h and 22 h–3 h
in right ascension, extending from the Milky Way at the apex out to redshift 0.2.]

Figure 3.15 The distribution of galaxies seen in the 2dF galaxy redshift survey.

A sharp-eyed look at Figure 3.15 reveals several structures apparently pointed
directly at the Earth! These are known as fingers of God. To understand how
these come about, imagine how a galaxy cluster affects the redshifts of its
galaxies. These galaxies are decoupled from the Hubble flow and acquire the
typical redshift of the cluster, plus or minus its velocity dispersion. The length of
the ‘finger’ simply reflects the velocity dispersion in the cluster. On bigger scales
there is another effect, known as the Kaiser effect, caused by galaxies beginning
to fall into the cluster. Galaxies between us and the cluster (but near the cluster in
space) will tend to be falling in. This peculiar velocity (Section 1.4) adds a
Doppler component to the redshift and makes the galaxies appear further away.
Similarly, galaxies on the other side of the cluster appear closer. Together, the
Kaiser effect and fingers of God are known as redshift space distortions, and
the clustering analysis done with redshifts (rather than independently-derived
distances) is known as redshift space clustering. We’ll return to this in Chapter 4
and see at which point the Kaiser effect turns into a finger of God.
This is illustrated in Figure 3.16. This shows the amplitude of the redshift space
correlation function from 2dFGRS as a function of transverse separation on the
sky (symbolized σ) and redshift-axis separation (symbolized π). Note the finger

of God and the Kaiser effect flattening. This flattening is determined by the value
of Ωm^0.6/b, where b is known as the bias parameter and expresses how much
stronger the clustering of galaxies is compared to dark matter (see also Chapter 4).
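The length of a finger of God can be sketched directly: a cluster velocity dispersion σv masquerades, through the Doppler term, as a radial distance spread of roughly σv/H0 (a minimal sketch; H0 = 70 km s−1 Mpc−1 assumed):

```python
def finger_length_Mpc(sigma_v_kms, H0=70.0):
    """Apparent radial stretch in redshift space: a peculiar-velocity spread
    sigma_v looks like a distance spread sigma_v / H0."""
    return sigma_v_kms / H0

print(finger_length_Mpc(1000.0))  # a rich cluster: stretch of ~14 Mpc
print(finger_length_Mpc(300.0))   # a small group: ~4 Mpc
```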

[Figure 3.16 diagram: contour map of ξ(σ, π) over −20 to 20 in both
σ/h−1 Mpc and π/h−1 Mpc.]

Figure 3.16 The amplitude of the redshift space correlation function of galaxies,
ξ(σ, π), as a function of the transverse (σ) and radial (π) separations. The data
from the first quadrant have been repeated in the other three quadrants,
to highlight the deviation from circular symmetry. The ‘fingers of God’ effect is
seen at small scales, and the flattening on larger scales is due to the Kaiser effect.
Contours are plotted at ξ = 10, 5, 2, 1, 0.5, 0.2, 0.1 with the contour of highest ξ
in the centre.

The dimensionless power spectrum of galaxies is shown in Figure 3.17. This
diagram can be used to make a rough estimate of how big the fluctuations from
large-scale structure would be in any galaxy survey. We’ll show examples of this
in Chapter 4.
Another way of measuring galaxy clustering is through the correlation function
ξ(r), which is defined as follows. What is the probability that a galaxy will
have a neighbour at a distance r from it, in a small volume δV ? If galaxies are
unclustered, the answer will be n δV , where n is the number of galaxies per unit
volume. However, if galaxies are clustered, we could write this as
Pr(r) = {1 + ξ(r)} n δV, (3.8)
where ξ(r) just expresses the excess over what’s expected in a non-clustered case.
Slightly more generally, the probability of finding one galaxy in the volume dV1
and another in dV2 separated by a vector r is
d2 Pr(r) = {1 + ξ(r)} n2 dV1 dV2 . (3.9)
The correlation function can be estimated by counting the numbers of galaxy
pairs as a function of their separation, and comparing the results with those for
random unclustered populations. This measure of galaxy clustering is sometimes seen as
more intuitive since it avoids Fourier series.
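A minimal sketch of the pair-counting idea, using the simplest ‘natural’ estimator DD/RR − 1 on toy data (real analyses use refined estimators such as Landy–Szalay, with proper binning and edge corrections):

```python
import itertools, math, random

def pair_separations(points):
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]

def xi_estimate(data, randoms, r_lo, r_hi):
    """Natural estimator xi = DD/RR - 1 in one separation bin, with the
    pair counts normalized for the two catalogue sizes."""
    dd = sum(r_lo <= s < r_hi for s in pair_separations(data))
    rr = sum(r_lo <= s < r_hi for s in pair_separations(randoms))
    norm = (len(randoms) * (len(randoms) - 1)) / (len(data) * (len(data) - 1))
    return dd / rr * norm - 1.0 if rr else float("nan")

random.seed(1)
# an unclustered (Poisson) field compared with an independent random catalogue
data = [(random.random(), random.random(), random.random()) for _ in range(500)]
randoms = [(random.random(), random.random(), random.random()) for _ in range(500)]
print(xi_estimate(data, randoms, 0.1, 0.2))  # consistent with zero: no clustering
```

A genuinely clustered sample would return significantly positive values on small scales.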
It turns out to be closely related to the galaxy power spectrum. We’ll spare you
the details, but if we make a Fourier expansion of ξ(r), it turns out that it can be
expressed as
ξ(r) = [V/(2π)³] ∫ |δk|² e−ik·r d³k. (3.10)


[Figure 3.17 diagram: Δ²(k) from 0.001 to 10 against k/h Mpc−1 from 0.01 to 1,
for Abell, radio, Abell × IRAS, CfA, APM/Stromlo, radio × IRAS, IRAS and
APM (angular) samples.]

Figure 3.17 The dimensionless power spectrum of galaxies, selected through a
variety of means. The bias parameter depends on the nature of the selection, so
the power spectra have been normalized to b = 1.

In other words, the correlation function is the Fourier transform (see the box in
Section 2.9) of the power spectrum. If the density field is isotropic, then the
power spectrum will depend only on the magnitude of the k vector, not its
direction: ⟨|δk|²⟩ = |δk|²(k). Since ξ(r) must be real, we can replace e−ik·r with
cos(kr cos θ), and integrating over all angles in three dimensions can be shown to
give
ξ(r) = [V/(2π)³] ∫ P(k) (sin kr/kr) 4πk² dk. (3.11)
Similarly, we can write the power spectrum as a transform of the correlation
function:
Δ²(k) = [V/(2π)³] 4πk³ P(k) = (2/π) k³ ∫₀^∞ ξ(r) (sin kr/kr) r² dr. (3.12)
In summary, the power spectrum P (k) (or Δ2 (k)) is very closely related to the
correlation function ξ(r).
So far we’ve talked about three-dimensional power spectra and the correlation
function of galaxies in three dimensions. To measure this we need the positions of
galaxies on the sky, and the distance (e.g. redshift) to the galaxies. But what if we
don’t have redshifts? Can we determine anything about the clustering? It turns out
that we can, though we need to make some assumptions about the number of
galaxies that we see per unit redshift. This might evolve for example because of
galaxy merging, and needs to take account of the fact that less-luminous galaxies
can be seen only nearby while the rarer bright galaxies can be seen further away.
We can write down an angular correlation function w(θ) on the sky in a similar
way to the spatial correlation function, as the excess neighbours that you’d see
over an unclustered population:
d2 Pr(θ) = n2 (1 + w(θ)) dΩ1 dΩ2 , (3.13)

where dΩ1 and dΩ2 are the solid angles of nearby patches of sky, and n is the
number of galaxies per unit area on the sky. We won’t prove the relationship here
between ξ(r) and w(θ) (it is known as Limber’s equation), but if w(θ) is a power
law with w(θ) = Aθ1−γ (where A is some constant), then it turns out that ξ(r) is
also a power law with ξ(r) = (r/r0 )−γ (where r0 is a scale length known as the
clustering length scale). The constants A and r0 can be related with a knowledge
of the number of galaxies per unit redshift (see, for example, Gonzalez-Solares,
E.A. et al., 2004, Monthly Notices of the Royal Astronomical Society, 352, 44).
The clustering of galaxies was different at high redshifts. We can add an
evolution term into the correlation function:
ξ(r) = (r/r0)^(−γ) (1 + z)^(−(3+ε)). (3.14)
If ε = 0, then the galaxy clustering is constant in physical coordinates, i.e. it’s
unaffected by the expansion of the Universe. If ε = 3 − γ, then the clustering is
constant in comoving coordinates. Equation 3.14 still yields a power law angular
correlation function w(θ) = Aθ1−γ . Just so you have it, in this case Limber’s
equation (which we won’t prove) is See, for example, Phillips, S. et
) 1−γ −1 al., 1978, Monthly Notices of the
γ dA g (z) (1 + z)−(3+ε) (dN/dz)2 dz Royal Astronomical Society,
A = Cr0 1) (2 , (3.15)
(dN/dz) dz 182, 673.
where dA is the angular diameter distance, g(z) is the derivative of proper
distance with redshift, and dN/dz is the number of galaxies per unit redshift. The
normalization C is a function of γ and is given by C = √π Γ((γ − 1)/2)/Γ(γ/2),
where Γ is a standard function (usually calculated numerically) known as the
gamma function.
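The normalization can be evaluated directly with the standard library’s gamma function; for the canonical observed clustering slope γ ≈ 1.8 (a commonly quoted value):

```python
import math

def limber_C(gamma):
    """C = sqrt(pi) * Gamma((gamma - 1)/2) / Gamma(gamma / 2)."""
    return math.sqrt(math.pi) * math.gamma((gamma - 1.0) / 2.0) / math.gamma(gamma / 2.0)

print(limber_C(1.8))  # ~3.68
```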
We’ll end our tour of the large-scale structure of the Universe with the clustering
of quasars, shown in Figure 3.18. Quasars, as we’ll see in Section 4.6, are
supermassive black holes in the centres of galaxies. The accretion discs around
these black holes can outshine the rest of the galaxy, as we’ll discuss in Chapter 4.
This extraordinary luminosity makes them visible in large numbers throughout
most of the Hubble volume (Section 1.9). The Universe appears far more
homogeneous on these scales, but some clustering signal is still present. The
strength of quasar clustering today is comparable to the predicted dark matter
clustering, but in the early Universe the amplitude of the quasar correlation
function was higher (unlike the dark matter). In general, virialized dark matter
haloes (when considered as objects in their own right) cluster more strongly than
the dark matter distribution as a whole, so the enhanced clustering strength of
quasars could reflect the haloes that they inhabit. The observed quasar clustering
could be accounted for by assuming that quasars inhabit dark matter haloes with a
typical mass of around 10^13 M⊙. The phenomenon can be characterized by the
bias parameter b, mentioned briefly above:
(δρ/ρ)_QSOs = b (δρ/ρ)_dark matter, (3.16)
where the bias parameter can be a function of redshift. There isn’t much
theoretical basis for assuming this relationship; it’s more of an empirical rule of
thumb.

Chapter 3 The local Universe

Figure 3.18 The spatial distribution of quasars in the 2dF Quasar Redshift
Survey, which covered two 75◦ × 5◦ strips on the sky, one in each Galactic
hemisphere, with the Milky Way at the centre. The axes mark redshift and the
approximate distances in billions of parsecs; the distances are comoving,
assuming Ωm = 1 and Λ = 0.

3.12 Baryon wiggles


The acoustic peaks in the matter power spectrum did not disappear after
recombination — they are still there in the power spectrum of galaxies! These
appear as wiggles in the galaxy power spectrum and are sometimes referred to as
baryon wiggles or more fully as baryonic acoustic oscillations (BAOs). The
detection of these features is difficult but there have been hints of detections from
SDSS and the 2dF galaxy redshift survey, shown in Figure 3.19.
At the time of writing this work is in its early stages, but the crucial advantage of
detecting BAOs is that the positions of the acoustic peaks are a standard rod that
can be used right back to the time of the CMB. (There is a phase difference
between the photon and matter oscillations, but this is calculable.) This is ideal
for measuring changes in the expansion rate of the Universe, i.e. constraining the
cosmological parameters and the evolving dark energy equation of state. This
goal is so attractive that many international groups are planning ambitious sky
surveys with enough photometric and spectroscopic data for baryon wiggles and
other dark energy constraints such as cosmic shear (Chapter 7) and high-redshift
supernovae. In the further reading section, we’ve included a report on the future
of dark energy surveys in the coming decade; we’ll meet some of these ambitious
future surveys briefly in Chapter 7.
BAOs are not just detectable in the angular correlation function of galaxies at a
particular redshift. They could also be detected along the redshift axis, which as
the following exercises show can give us a measure of the evolving Hubble
parameter H(z). This in turn can be a measure of the dark energy equation of
state, because the generalization of Equation 1.33 for a flat Universe with dark
energy is
H(z) = H0 √[Ωm,0 (1 + z)³ + (1 − Ωm,0)(1 + z)^(3(1+w))]. (3.17)
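Equation 3.17 is straightforward to evaluate numerically; the parameter values below are illustrative choices, not taken from the text.

```python
import math

def hubble(z, h0=70.0, omega_m0=0.27, w=-1.0):
    """H(z) for a flat universe containing matter plus dark energy with a
    constant equation-of-state parameter w (Equation 3.17).
    Units follow h0 (e.g. km/s/Mpc); parameter defaults are illustrative."""
    return h0 * math.sqrt(omega_m0 * (1 + z)**3
                          + (1 - omega_m0) * (1 + z)**(3 * (1 + w)))

# For w = -1 (a cosmological constant) the dark energy term is
# redshift-independent, so H(z) grows only through the matter term:
print(hubble(0.0), hubble(1.0))
```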


Figure 3.19 The deviations from a smooth power spectrum,
log10[P(k)/P(k)smooth], seen in the galaxy power spectrum as a function of
k/h Mpc−1, compared to the predictions for baryon wiggles. The three panels
show (a) the 2dFGRS plus the SDSS main galaxies, (b) the subset of luminous red
galaxies (LRGs) from SDSS, and (c) all the data combined.

Exercise 3.4 If LBAO is the comoving size of the BAOs at some redshift z, and
θBAO is the observed angular size in a flat universe, show that the comoving
distance to redshift z is dcomoving = LBAO /θBAO .
Exercise 3.5 If δz is the size along the redshift axis of the BAOs, show that
H(z) = c δz/LBAO , where LBAO is the comoving size of the BAOs.
Exercise 3.6 Is the scale length of baryon wiggles dependent on galaxy
bias? ■
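The two distance relations in Exercises 3.4 and 3.5 can be sketched numerically. The comoving BAO scale of 150 Mpc below is an assumed illustrative value (a typical literature figure), not one given in the text.

```python
C_KM_S = 2.998e5   # speed of light in km/s
L_BAO = 150.0      # assumed comoving BAO scale in Mpc (illustrative)

def comoving_distance(theta_bao_rad):
    """Exercise 3.4: d_comoving = L_BAO / theta_BAO in a flat universe."""
    return L_BAO / theta_bao_rad

def hubble_from_bao(delta_z):
    """Exercise 3.5: H(z) = c * delta_z / L_BAO, in km/s/Mpc."""
    return C_KM_S * delta_z / L_BAO

# An angular BAO size of 0.04 rad and a redshift-axis size of 0.05 give:
print(comoving_distance(0.04), hubble_from_bao(0.05))
```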

Summary of Chapter 3
1. One piece of evidence in favour of dark matter is galaxy rotation curves.
2. Local galaxies can be classified by eye (or by software) along the Hubble
tuning fork. Irregular galaxies lie outside this framework, and low surface
brightness galaxies were unknown in Hubble’s time. Local elliptical
galaxies tend to occur in galaxy clusters, while local spiral galaxies tend to
occur in the field (the morphology–density relation).
3. There is experimental support for the Tolman test (that surface brightness
decreases as (1 + z)4 ) once stellar evolution is taken into account.


4. Spiral galaxies obey the Tully–Fisher relation, that luminosity L scales with
velocity width Δv as L ∝ (Δv)α , with α ≈ 3–4. This can also be used as a
distance indicator.
5. Elliptical galaxies also have a similar relation, known as the Faber–Jackson
relation: L ∝ σvα , where σv is the velocity dispersion and α ≈ 3–4, though
the physical underpinnings are different.
6. There is a more general relationship for ellipticals, known as the
fundamental plane: L ∝ I0x σvy , where L is the luminosity, I0 is the surface
brightness within a given radius, and σv is the velocity dispersion. This can
also be used as the basis of distance indicators.
7. Clusters of galaxies show evidence for dark matter, from the virial theorem
(kinetic energy equals minus one half times the potential energy for a
virialized system).
8. If we assume that the baryonic mass fraction of galaxy clusters doesn’t
depend on redshift, then galaxy cluster sizes and X-ray luminosities can be
used to constrain cosmological parameters.
9. The Sunyaev–Zel’dovich effect is the change of temperature of CMB
photons passing through a galaxy cluster, due to Compton scattering. It can
be used to find the line-of-sight size of a cluster, and assuming spherical
symmetry (true on average) gives an angular diameter distance estimate.
10. Galaxy surveys to determine the large scale structure of the Universe reveal
redshift space distortions known as the fingers of God and the Kaiser effect.
These are due to the velocity dispersion of galaxies within a cluster and the
peculiar velocities of galaxies falling into clusters respectively.
11. The clustering of galaxies can be measured with the correlation function
ξ(r), which is a measure of the number of neighbours that a galaxy has in
excess of what’s expected in an unclustered population. This is proportional
to the Fourier transform of the galaxy power spectrum. It’s also related to
the angular correlation function, i.e. the clustering of galaxies as they appear
on the sky.
12. The acoustic peaks imprinted into the cosmological density field in the early
Universe are still present in the present-day clustering of galaxies. These
features, known as baryonic acoustic oscillations or simply baryon wiggles,
are a standard rod that can be used to derive angular diameter distances and
also the variation in the Hubble parameter H(z). This is a route to
constraining the dark energy equation of state parameter w.

Further reading
• The Galaxy Zoo website is currently [Link].
• Gaitskell, R.J., 2004, ‘Direct detection of dark matter’, Annual Reviews of
Nuclear and Particle Science, 54, 315.
• For more on the virial theorem, see, for example, Chapter 2 of Ryan, S.G. and
Norton, A.J., 2010, Stellar Evolution and Nucleosynthesis, Cambridge
University Press.


• Young, J.S. and Scoville, N.Z., 1991, ‘Molecular gas in galaxies’, Annual
Review of Astronomy and Astrophysics, 29, 581.
• Reese, E.D., 2004, ‘Measuring the Hubble constant with the
Sunyaev–Zel’dovich effect’, in Freedman, W.L. (ed.) Measuring and Modeling
the Universe, Carnegie Observatories Centennial Symposia, Cambridge
University Press (also available at level 5 of [Link]).
• Freedman, W.L., 2001, ‘Final results from the Hubble Space Telescope key
project to measure the Hubble constant’, Astrophysical Journal, 553, 47.
• For an accessible account of recent developments in the cosmological distance
ladder, see, for example, Rowan-Robinson, M., 2008, ‘Climbing the
cosmological distance ladder’, Astronomy and Geophysics, 49, 3.30–3.33.
• An accessible review of the WiggleZ baryon oscillation survey is in Blake, C.,
2008, Astronomy and Geophysics, 49, 5.19–5.24.
• The classic graduate-level text for galaxy clustering (e.g. galaxy correlation
functions and power spectra) is Peebles, P.J.E., 1980, The Large-scale
Structure of the Universe, Princeton University Press.
• Albrecht, A. et al., 2006, Report of the Dark Energy Task Force, available at
astro-ph/0609591.

Chapter 4 The distant optical Universe
It’s one of nature’s ways that we often feel closer to distant generations than
to the generation immediately preceding us.
Igor Stravinsky

Introduction
Perhaps one of the most astonishing surprises about the Universe is that it’s
possible to build telescopes that can observe galaxies throughout almost all the
Hubble volume, most of the way back to the Big Bang. In this chapter we’ll
discuss some of the profound insights that optical telescopes have brought us on
the evolution of galaxies in our Universe.

4.1 Source counts


We started Chapter 1 with a derivation of the Euclidean source count slope
dN/dS ∝ S −5/2 , where dN/dS is the number of objects per unit flux S on the
sky. You may be surprised to hear that something as simple as counting galaxies
in this way has led to profound insights and questions.
First, however, let’s derive the Euclidean slope in a simpler and quicker way by
considering instead the integral source counts (or integral number counts),
defined as the number of objects brighter than a flux S. This is often given the
absurdly non-mathematical symbol N (> S), and it can be related to the number
per unit flux as N (> S) = ∫_S^∞ (dN/dS) dS. For this reason dN/dS is sometimes
called the differential source counts (or differential number counts).
Suppose that all galaxies have the same luminosity L, and have a constant number
per unit volume in a static Euclidean space. The number of objects brighter than a
flux S will be given by all the objects within a radius r, where S = L/(4πr2 ).
But we’ve assumed that the space density (number per unit volume) of galaxies
is a constant, say ρ, so the number of galaxies brighter than S must be
N (> S) = ρ × (4/3)πr3 ∝ r3 . But S ∝ r−2 , so N (> S) ∝ S −3/2 . Differentiating
this gives dN/dS ∝ S −5/2 . As in Chapter 1, objects with different luminosities
will still give power-law number counts with the same slope, so a mix of galaxies
with different luminosities will still give the same slope. This argument is very
old and dates back at least as far as John Michell in 1767!

Michell, J., 1767, Philosophical Transactions of the Royal Society of London, 57,
234–64; a reference url is given in the Further reading section at the end of this
chapter.

As early as the 1970s it became clear that the number counts of galaxies in
the B-band optical (blue) filter were strongly inconsistent with an unevolving
population of galaxies almost regardless of Ωm (Λ = 0 was implicitly being
assumed). There seemed to be too many faint blue galaxies, which became known
as the faint blue galaxies problem. We’ll see below that galaxies should evolve
in luminosity, and adding some luminosity evolution to the models helped but did
not resolve the discrepancy on its own. The only solution appeared to be a new
population of recently-formed dwarf galaxies, and theorists struggled to explain
this population in an unforced way.
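As a quick sanity check on the Euclidean argument above (not from the book), a short Monte Carlo can recover the −3/2 integral slope: scatter equal-luminosity sources uniformly through a sphere and measure the logarithmic slope of N(>S) between two flux cuts. The source count, flux cuts and seed are arbitrary choices.

```python
import math
import random

def euclidean_counts(n_sources=200_000, luminosity=1.0, r_max=1.0, seed=42):
    """Scatter sources uniformly in a sphere and return the logarithmic slope
    of N(>S) between two flux cuts; should be close to the Euclidean -3/2."""
    rng = random.Random(seed)
    fluxes = []
    for _ in range(n_sources):
        # uniform in volume => r proportional to u**(1/3), u in (0, 1]
        r = r_max * (1.0 - rng.random()) ** (1.0 / 3.0)
        fluxes.append(luminosity / (4.0 * math.pi * r * r))
    s_faint = luminosity / (4.0 * math.pi * r_max**2)  # flux at the sphere's edge
    s1, s2 = 4.0 * s_faint, 16.0 * s_faint             # two brighter flux cuts
    n1 = sum(1 for s in fluxes if s > s1)
    n2 = sum(1 for s in fluxes if s > s2)
    return math.log(n2 / n1) / math.log(s2 / s1)

print(euclidean_counts())  # close to -1.5, with Poisson scatter
```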


This changed with the advent of precision cosmology and the concordance model
(Chapters 1 and 2) with Ωm,0 h2 = 0.1326 ± 0.0063, ΩΛ,0 = 0.742 ± 0.030 and
h = 0.72 ± 0.03. A flat universe with a cosmological constant has both more time
and more volume at high redshifts than a flat Λ = 0 universe. Both these effects
increased the expected numbers of galaxies at the faint end of the B-band number
counts. This resolved most of the faint blue galaxies problem, but perhaps it’s a
pity that optical galaxy number counts are no longer demanding such strong
changes to our picture of galaxy formation and evolution. (The number counts of
galaxies at submm wavelengths nevertheless turned out to be an unexpectedly
strong constraint, as we’ll see.)
Although this chapter is mainly about optical astronomy, it’s worth just adding
that radio source counts were found as early as 1959 to be steeper than an S −5/2
power law. This was one of the first pieces of evidence that we don’t live in a flat,
steady-state universe, and it predated the discovery of the CMB.

4.2 Cold dark matter and structure formation


Why is dark matter sometimes described as ‘cold’?
The key to this is the time when the dark matter particles decouple, i.e. the time
when interactions with other matter cease. If the dark matter was relativistic at
decoupling, the mean energy of a particle would be ⟨E⟩ = 2.7kT for bosons, or
3.1kT for fermions. Also, the average momentum ⟨|p|⟩ will satisfy ⟨E⟩ = ⟨|p|⟩c.
At later times, the particle’s momentum will be reduced by redshifting (the
de Broglie wavelengths will redshift like that of a photon) until the motion is
non-relativistic and ⟨|p|⟩ = m⟨|v|⟩. However, the shape of the momentum
distribution will still be the same, so we can still apply ⟨E⟩ = ⟨|p|⟩c = m⟨|v|⟩c. A
neutrino background would have a present-day temperature of 1.95 K (this is
cooler than the CMB temperature because of the additional later contribution
to the photons from e+ e− annihilation), which leads to ⟨|v|⟩ = 159 (mc²/eV)^(−1) km s−1
for the present-day velocities of a hypothetical neutrino of mass m. However, if
the dark matter particles decoupled when non-relativistic, they can be arbitrarily
slow. These relative speeds lead to the terms ‘hot’ and ‘cold’ dark matter (HDM
and CDM, respectively), or compromises such as ‘warm’ or ‘mixed’ dark matter.
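The quoted relic-neutrino speed formula is simple to evaluate; the masses chosen below are purely illustrative.

```python
def neutrino_speed_kms(mass_ev):
    """Present-day mean speed of a relic neutrino of mass m, using the
    text's relation <|v|> = 159 (m c^2 / eV)^-1 km/s."""
    return 159.0 / mass_ev

# A 1 eV neutrino still moves at ~159 km/s today, and lighter neutrinos
# move faster -- fast enough to free-stream out of small perturbations:
print(neutrino_speed_kms(1.0), neutrino_speed_kms(0.1))
```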
Several effects bend the matter power spectrum over (see Figure 4.1) from the
nearly scale-invariant fluctuations imprinted by inflation. For example, if a matter
perturbation Fourier mode enters the horizon (i.e. when the horizon becomes
bigger than the scale of the Fourier mode) before the epoch of matter–radiation
equality, then this Fourier mode can’t grow through gravitational collapse,
because the dominant energy of radiation drives the Universe to expand so
fast that matter has no time to self-gravitate, i.e. the Fourier mode is ‘frozen’
at a constant value. This is known as the Mészáros effect. Also, if the dark
matter particles are fast-moving as in HDM, their free-streaming movement can
erase small-scale structure in the early Universe, until the particles become
non-relativistic. However, HDM models do not reproduce the observed
large-scale galaxy power spectrum. In fact, the upper limit on the cosmological
neutrino density (Equation 3.1) comes not from particle physics constraints but
from the evolution of large-scale structure in the Universe.


Figure 4.1 The current matter power spectrum P(k) (in units of (h−1 Mpc)³,
plotted against wavenumber k/(h Mpc−1) and wavelength λ/(h−1 Mpc)), from the
largest to the smallest scales, derived from multiple methods: the CMB, 2dF
galaxies, the cluster abundance, weak lensing and the Lyman α forest. This figure
pre-dates WMAP. The solid red curve is a model based on CDM in the
concordance cosmology.

The hierarchical formation of large-scale structure in the Universe is shown


schematically in Figure 4.2. Small virialized dark matter haloes merge to form
larger dark matter haloes. This is known as bottom-up structure formation
(as opposed to the top-down formation that results in HDM from the erasing
of small-scale structures). Note, however, that what happens to the baryons
(e.g. galaxies, stars, gas) is another matter altogether. In the early Universe,

Figure 4.2 Schematic representation of the hierarchical growth of dark matter
haloes. Time increases downwards in this figure. Haloes at a nominal formation
time tf in the figure merge and form a single halo in the present Universe at
time t0 .


the evolution of perturbations in matter is affected by the interaction of baryons


with the photons (via Thomson scattering). In the later Universe, the formation of
stars and galaxies is a very complex process, as we explore later in this book.
The evolution of density perturbations can be very complicated mathematically, if
you take into account the spacetime curvature perturbing the Robertson–Walker
metric. It’s relatively easy to derive the collapse of a spherical overdensity in a flat
Λ = 0 universe, however, assuming that the motion is non-relativistic. First, by
Birkhoff’s theorem (that you can neglect the external matter if the system is
spherically-symmetric — we shall demonstrate this in Chapter 6) the perturbation
must behave in exactly the same way as an identical region in a homogeneous
closed universe, which we’ll calculate in the next exercise.
Exercise 4.1 Verify that in a Λ = 0 universe with k = +1, the Friedmann
equation with dR/dt (Equation 1.7) can be written as
(dR/dθ)² = Rmax R − R², (4.1)

where Rmax is the radius at which dR/dt = 0, and we’ve introduced a new
parameter θ defined by dt/dθ = R/c.
Also verify that the solution satisfies the cycloid equations

R(θ) = (Rmax/2)(1 − cos θ), (4.2)
t(θ) = (Rmax/2c)(θ − sin θ). (4.3)

■

Incidentally, you may be curious why this is called the ‘cycloid’ solution. It’s
because Equations 4.2 and 4.3 describe the path swept out by an object on a
wheel, known as a cycloid, shown in Figure 4.3.

Figure 4.3 Schematic representation of the cycloid solution for a k = +1, Λ = 0
universe.

In the cosmological case, Rmax = 8πGρ0R0³/(3c²). In the case of a spherical
overdensity in a flat Λ = 0 universe, the same equations apply, but R this time
stands for the radius of the overdensity. The mass of the overdensity is

M = (4/3)πR³ρ, (4.4)

and since mass is conserved we can use the value at time t0, which is
M = (4/3)πR0³ρ0, so we can write

Rmax = 2GM/c². (4.5)
(This is curiously similar to the Schwarzschild radius, but remember that we’re
working in the Robertson–Walker metric, not the Schwarzschild metric.) In the
cosmological case, the universe begins at θ = 0, reaches a maximum size at
θ = π, and ends at θ = 2π, while in the overdensity the moment θ = π is called
the turn-around time and θ = 2π is the time when it would collapse to a point.
In practice, we suppose that the clump would become virialized before then.
Rearranging Equations 4.2 and 4.3 to find R(t) is not possible analytically, but we
can find the early behaviour of the fractional overdensity δ using Taylor series.
Combining Equations 4.4 and 4.5 gives us the density of the spherical clump:
ρ = 3M/(4πR³) = 3Rmax c²/(8πG R³(θ)). (4.6)

The density of the surrounding universe can be found by using the results of
Exercises 1.4 and 1.5 in Chapter 1, and applying Ωm = 1 (Equation 1.15). The
density of this Einstein–de Sitter universe comes out as
ρEdS = 1/(6πGt²). (4.7)
Taking the ratio of these two densities gives

ρ/ρEdS = [3Rmax c²/(8πG R³(θ))] × 6πG t²(θ) = (9/4) Rmax c² t²(θ)/R³(θ)
= (9/4) Rmax c² × [Rmax²(θ − sin θ)²/(4c²)] × [8/(Rmax³(1 − cos θ)³)]
= (9/2)(θ − sin θ)²/(1 − cos θ)³.

The function of θ is singular at θ = 0, but by writing a Taylor expansion around a
point arbitrarily close to θ = 0, it can be shown that

(θ − sin θ)²/(1 − cos θ)³ ≈ (2/9)[1 + (3/20)θ² + · · ·]

and therefore

ρ/ρEdS ≈ (9/2) × (2/9)[1 + (3/20)θ² + · · ·] = 1 + (3/20)θ² + · · · .

This is still in terms of θ, but if we now Taylor-expand Equation 4.3, we find that

t(θ) ≈ (Rmax/2c)(θ³/6 + · · ·)

and so

ρ/ρEdS ≈ 1 + (3/20)(12ct/Rmax)^(2/3).

We can therefore identify the fractional overdensity as

δ ≈ (3/20)(12ct/Rmax)^(2/3). (4.8)
Despite the high orders of θ that we’ve used, this is known as the linear theory
approximation (we’ll see why in Exercise 4.2). In this approximation, density
perturbations grow as t2/3 , i.e. proportional to the dimensionless scale
factor a. However, by the time the overdensity reaches the turn-around radius
Rmax , linear theory has broken down. This is at θ = π, i.e. t = πRmax /(2c)
(Equation 4.3). The overdensity will then have a turn-around density of exactly
ρ = 3M/(4πRmax³). Using Equations 4.5 and 4.7 we find that the density ratio is

ρ/ρEdS = [3M/(4πRmax³)] × 6πGt² = [9MG/(2Rmax³)] × [πRmax/(2c)]²
= 9π²MG/(8c²Rmax) = (9/16)π² ≈ 5.552,

so the overdensity is δ = 5.552 − 1 = 4.552. Linear theory would predict
δ = (3/20)(6π)^(2/3) ≈ 1.062 at this point (Equation 4.8).
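These turn-around numbers can be checked numerically. This sketch assumes nothing beyond Equations 4.2, 4.3 and 4.8, using 12ct/Rmax = 6(θ − sin θ) to write both the exact cycloid density ratio and the linear-theory prediction as functions of θ.

```python
import math

def density_ratio(theta):
    """Exact rho/rho_EdS = (9/2)(theta - sin theta)^2 / (1 - cos theta)^3
    for the cycloid solution."""
    return 4.5 * (theta - math.sin(theta))**2 / (1.0 - math.cos(theta))**3

def linear_delta(theta):
    """Linear-theory overdensity (Equation 4.8), with 12ct/Rmax = 6(theta - sin theta)."""
    return (3.0 / 20.0) * (6.0 * (theta - math.sin(theta)))**(2.0 / 3.0)

# At turn-around (theta = pi) the exact ratio is (9/16)pi^2 ~ 5.55,
# while linear theory is already badly wrong at delta ~ 1.06:
print(density_ratio(math.pi), linear_delta(math.pi))
```

For small θ the two expressions agree, which is exactly the sense in which Equation 4.8 is the "linear" limit.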

If the cycloid solution applied at θ = 2π, then the overdensity would collapse to a
point of infinite density. The linear theory prediction (now infinitely wrong)
is δ = (3/20)(12π)^(2/3) ≈ 1.686 at this point. However, it’s more likely that the
overdensity will stabilize at the virial equilibrium, where the potential energy
equals −2 times the kinetic energy. It’s usually assumed that virialization happens
around the time θ = 2π. It’s not too hard to calculate the final virialized density.
Total energy is conserved, and since there’s no kinetic energy at turn-around, the
total energy is just the binding energy at turn-around (E ∝ 1/Rmax , Equation 3.6).
But since the virial binding energy Ev must be related to the virial kinetic energy
EK,v by Ev + 2EK,v = 0, and the total energy is E = Ev + EK,v , we must have
E = Ev /2. This means that the virialized radius must be half the turn-around
radius (Equation 3.6), i.e. the virialized density is eight times the turn-around
density. The collapse time is at θ = 2π, which is twice the turn-around time
(proved either by symmetry in Figure 4.3 or using Equation 4.3), so the density of
the surrounding Universe has gone down by a factor of 4 (Equation 4.7). The
density ratio ρ/ρEdS is therefore 8 × 4 = 32 times bigger than at turn-around,
i.e. 32 × 5.552 ≈ 178, an overdensity δ ≈ 177. This collapse also sheds light on
the fingers of God. Figure 4.4
shows how the collapse of a spherical overdensity would appear in redshift space.

Figure 4.4 A schematic view of how a collapsing overdensity appears in real
space and in redshift space. In the linear regime the overdensity appears squashed
along the line of sight in redshift space; as it turns around, collapses and
virializes, it produces a finger-of-God.

Of course, you might quite reasonably object to this analysis, because real
overdensities are much more likely to be highly aspherical. The growth of these
density perturbations can be modelled with numerical N-body simulations.
However, the spherical collapse model contains more elements of the underlying
physics than you might expect, as we’ll see. We’ve also assumed an Ω = 1
universe — but what happens more generally?

In general, the growth of dark matter perturbations in the Universe is governed by
fluid dynamics equations. Taking the linear approximation of these equations
gives the relation

δ̈ + 2(ȧ/a)δ̇ = 4πGρm δ, (4.9)

Deriving Equation 4.9 would take us too far off-topic, and introduce mathematical
machinery that we won’t use elsewhere, but if you’re not satisfied with having this
used without proof, you can find proofs in Chapter 15 of Peacock, 1999, or
Chapter 11 of Coles and Lucchin, 1995, in the Further reading section below.

where δ is the shorthand for the matter overdensity (δρ)/ρ0 , and a = R/R0 is the
dimensionless scale factor (Chapter 1).

Exercise 4.2 Show using Equation 4.9 that in a flat matter-dominated universe
with no cosmological constant, matter perturbations grow following the power
law δ(t) ∝ t2/3 , i.e. δ(t) is just proportional to the scale factor a. (You may
assume that H(t) = 2/(3t) in this universe.) ■
The solution for general cosmologies can be expressed as

δ(z = 0, Ωm,0, ΩΛ,0) / δ(z = 0, Ωm,0 = 1, ΩΛ,0 = 0)
≈ (5/2) Ωm,0 [Ωm,0^(4/7) − ΩΛ,0 + (1 + Ωm,0/2)(1 + ΩΛ,0/70)]^(−1). (4.10)

(See Peacock, 1999, in the Further reading section.)

A reasonable approximation to this equation for a flat universe is Ωm,0^0.23, while for
a Λ = 0 universe a good approximation is Ωm,0^0.65. So, for fixed initial conditions,
open universes or Λ > 0 universes have weaker large-scale structure δ than the
flat Λ = 0 case. This is because the increased expansion rate at late epochs
suppresses the growth of large-scale structure. The suppression is weaker in the
flat Λ > 0 case because the Λ term becomes dynamically relevant only at later
epochs. A related analysis to the proof of Equation 4.9 shows that density
perturbations gravitationally induce peculiar velocities given approximately by
δv ≈ H0 r Ωm,0^0.6 (δρ)/ρ (Section 3.11).
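Equation 4.10 and the two power-law approximations can be compared directly; the value Ωm,0 = 0.3 used below is an illustrative choice.

```python
def growth_suppression(omega_m0, omega_l0):
    """Equation 4.10: present-day growth of delta relative to an
    Omega_m = 1, Lambda = 0 universe (a standard fitting formula)."""
    return 2.5 * omega_m0 / (omega_m0**(4.0 / 7.0) - omega_l0
                             + (1.0 + omega_m0 / 2.0) * (1.0 + omega_l0 / 70.0))

om = 0.3  # illustrative matter density
# Flat Lambda universe vs the Omega^0.23 approximation:
print(growth_suppression(om, 1.0 - om), om**0.23)
# Open, Lambda = 0 universe vs the Omega^0.65 approximation:
print(growth_suppression(om, 0.0), om**0.65)
```

Note that the formula returns exactly 1 for an Einstein–de Sitter universe, as the normalization requires.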

In the linear theory, we have that δ ∝ t2/3 in a flat matter-dominated Λ = 0


universe, so that P (k) ∝ t4/3 in this case. In the linear regime, each Fourier mode
evolves independently. In general this is expressed as the transfer function T (k),
which depends on the Fourier mode k: Δ2 (k, z = 0) = T 2 (k) f 2 (a) Δ2 (k, z),
where f (a) is the linear growth factor, which we found to be ∝ t2/3 in a
matter-dominated Λ = 0 universe. The calculation of the transfer function is in
general quite complicated and includes the Mészáros effect, Silk damping and
other related effects in the early Universe. Finally, though, the (linear) matter
power spectrum can be expressed as Δ2 (k) ∝ k 3+ns T 2 (k). How do we specify
the constant of proportionality? The galaxy spatial correlation function is about
ξ(r) ≈ 1 on a comoving scale of r = 8h−1 Mpc. However, galaxy clustering
may be biased, so it’s conventional to convolve (see the box below) the matter
density distribution with a spherical kernel of radius 8h−1 Mpc, and use the standard
deviation of the resulting distribution known as σ8 to define the constant of
proportionality in the galaxy power spectrum. (A further complication is that σ8 is
usually measured from the filtered linear-theory density field.) We could then
use the observed value of σ8 as an equivalent definition of the galaxy bias
(b = σ8 (galaxies)/σ8 (mass)), provided that the bias isn’t scale-dependent.
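The text doesn't give an explicit transfer function, but the widely used CDM fitting formula of Bardeen et al. (1986; "BBKS") sketches how T(k) bends Δ²(k) ∝ k^(3+ns) T²(k) over. The shape parameter value below is an assumed illustrative number (roughly Ωm h for the concordance cosmology), and the normalization is left arbitrary rather than fixed by σ8.

```python
import math

def bbks_transfer(k, gamma_shape=0.21):
    """CDM transfer function fitting formula (Bardeen et al. 1986), with
    q = k / (Gamma h Mpc^-1); Gamma ~ Omega_m h is the shape parameter
    (0.21 here is an illustrative choice, not a value from the text)."""
    q = k / gamma_shape
    if q == 0.0:
        return 1.0
    return (math.log(1.0 + 2.34 * q) / (2.34 * q)) * (
        1.0 + 3.89 * q + (16.1 * q)**2
        + (5.46 * q)**3 + (6.71 * q)**4) ** -0.25

def delta_squared(k, n_s=1.0, norm=1.0):
    """Linear power Delta^2(k) ~ k^(3+n_s) T^2(k); norm would be set by sigma_8."""
    return norm * k ** (3.0 + n_s) * bbks_transfer(k) ** 2

# On large scales (small k) T -> 1; on small scales the spectrum bends over:
print(bbks_transfer(0.001), bbks_transfer(1.0))
```

Plotting P(k) = Δ²(k)/k³ against k reproduces the characteristic turnover of Figure 4.1.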

Convolution
If you’ve ever experimented with image software like Photoshop, you’ll
know that it’s possible to smooth or blur an image. How does this work?
Well, at any particular point in the image, the software will consider a little
box around that point and take the average value in that little box. Perhaps
all pixels in that little box will be given the same weight, or perhaps it’ll be a

126
4.2 Cold dark matter and structure formation

weighted average with more weight given to pixels in the middle of the little
box than the edge. The software repeats this little-box-averaging around all
the positions in the image, and the result is a blurred or smoothed image.
It turns out that Fourier analysis has a very simple and beautiful way of
expressing this. Let’s suppose that the pixels are small enough that we can
consider an integral instead of a sum. In one dimension, the blurring
could be expressed as B(x) = ∫_−∞^∞ I(α) P (x + α) dα, where I(x) is the
un-blurred data, P is the weighting function (e.g. 1 in a range −L to +L,
representing the little box size, and 0 otherwise), and B(x) is the blurred
data. The weighting function P is sometimes called the kernel. In two or
more dimensions, we’d change x into a vector x and α into a vector α, but
otherwise it’s the same. We call this the convolution of I with P , and
it’s sometimes written as B = I ⊗ P . Now, it turns out that the Fourier
transforms of B, I and P , which we can write as B̃, Ĩ and P̃, are related by
B̃ = Ĩ × P̃. In other words, a convolution in real space can be expressed as
just a multiplication in Fourier space. The reverse is true too: a convolution
in Fourier space is the same as a multiplication in real space.
An example is the treble control on a stereo. High musical notes have high
frequencies and short wavelengths, and since the Fourier wave number k is
proportional to the reciprocal of wavelength, the high musical notes are in
the high Fourier k modes. Now suppose that we don’t want the treble (high)
notes to be so loud, so we rig up some electronics that suppresses these
high k modes, with (say) an exp(−(1/2)(k/k0)²) factor, where k0 is some
constant. We picked this factor because it’s a Gaussian, and it turns out that
the Fourier transform of a Gaussian is another Gaussian. Multiplying
the Fourier transform of our sound with a Gaussian to suppress the high
treble notes is therefore the same as convolving our sound with another
Gaussian. So, when you’re turning down the treble control on a stereo,
you’re smoothing out the sound waves coming out of the speaker, just like
blurring an image in Photoshop by convolving it with a Gaussian.
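The convolution theorem described in the box can be verified in a few lines of code. This sketch uses a naive DFT and the standard circular-convolution convention (the box's P(x + α) correlation form differs only by a reflection of the kernel); the signal and kernel values are arbitrary.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (fine for short signals)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]

def idft(x):
    """Inverse DFT, normalized by 1/n."""
    n = len(x)
    return [sum(x[k] * cmath.exp(2j * cmath.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]

def circular_convolve(signal, kernel):
    """Direct circular convolution B = I (x) P in real space."""
    n = len(signal)
    return [sum(signal[(j - m) % n] * kernel[m] for m in range(n))
            for j in range(n)]

signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
kernel = [0.25, 0.5, 0.25, 0, 0, 0, 0, 0]  # a little smoothing box
direct = circular_convolve(signal, kernel)
# Convolution in real space = multiplication in Fourier space:
via_fft = idft([a * b for a, b in zip(dft(signal), dft(kernel))])
print(all(abs(d - v) < 1e-9 for d, v in zip(direct, via_fft)))
```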

The number density of dark matter haloes per unit mass, known as the mass
function, can be predicted by assuming that the density fluctuations are Gaussian.
First we convolve the density field with a spherical kernel with radius R, which
corresponds to a mass scale
M = (4/3)πR³ρ0. (4.11)
The probability that a random point has an overdensity δconvolved is just
Pr(δconvolved) dδconvolved = [1/√(2πσ²convolved)] exp[−δ²convolved/(2σ²convolved)] dδconvolved, (4.12)

i.e. just a Gaussian distribution. It turns out that the variance σ²convolved
can be calculated from the power spectrum P (k) ∝ k^ns (Equation 2.44; see, for
example, Coles and Lucchin, 1995, in the Further reading section):
σconvolved ∝ M^(−(3+ns)/6). In particular, the probability that a region is above the


overdensity for virialized collapse δc = 1.686 is just

Pr(> δc, M) = ∫_δc^∞ Pr(δconvolved) dδconvolved. (4.13)

We’ve written this as a function of M because the smoothing scale R is equivalent


to a mass scale M (Equation 4.11). This isn’t quite enough to calculate the
mass function, because we need the clump to be isolated, i.e. surrounded by a
less dense region, so we need to subtract the probability Pr(> δc , M + dM ).
Also, there is the cloud-in-cloud problem: could an object that’s a virialized
clump on one mass scale M also be later contained in another larger clump on a
bigger mass scale? Moreover, doesn’t this blob counting miss all the mass in the
underdense regions? Perhaps we should multiply this probability by a factor of
two, since we’re missing the half of the matter that’s in underdense regions? A
careful but lengthy analysis of the cloud-in-cloud problem has shown that such a
factor of two correctly solves both problems, so the mass function is
n(M) M dM = 2ρ0 {Pr(> δc, M) − Pr(> δc, M + dM)}
= 2ρ0 |d Pr(δc)/dσconvolved| |dσconvolved/dM| dM,

thus

n(M) ∝ (M/M∗)^(α−2) exp[−(M/M∗)^(2α)], (4.14)

where we’ve defined σconvolved ∝ M −α and M∗ is a constant that depends on δc , α


and the normalization of the σconvolved –M relationship. This is known as the
Press–Schechter model, and is a surprisingly good fit to the results of N -body
simulations (an agreement that is not entirely understood).
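The famous Press–Schechter factor of two can be checked numerically: with it, integrating M n(M) over all masses returns exactly the mean density ρ0, i.e. all the matter ends up assigned to haloes. The parameter values below (α for ns = 1, unit ρ0 and M∗) are illustrative.

```python
import math

DELTA_C = 1.686      # linear-theory collapse threshold
ALPHA = 2.0 / 3.0    # sigma_convolved ~ M^-alpha; n_s = 1 gives alpha = (3+n_s)/6
RHO_0 = 1.0          # comoving matter density, arbitrary units
M_STAR = 1.0         # normalization of the sigma-M relation, arbitrary units

def sigma(m):
    return (m / M_STAR) ** (-ALPHA)

def prob_above(m):
    """Pr(> delta_c, M) for a Gaussian field smoothed on mass scale M."""
    return 0.5 * math.erfc(DELTA_C / (math.sqrt(2.0) * sigma(m)))

def mass_in_haloes(m_min=1e-12, m_max=1e6):
    """Total mass density in haloes: summing
    n(M) M dM = 2 rho_0 [Pr(>dc, M) - Pr(>dc, M + dM)]
    telescopes, so only the endpoints contribute."""
    return 2.0 * RHO_0 * (prob_above(m_min) - prob_above(m_max))

# The factor of two puts ALL the matter into haloes:
print(mass_in_haloes())
```

As M → 0 the smoothed variance diverges and Pr → 1/2, so the telescoping sum tends to 2ρ0 × 1/2 = ρ0, which is why the factor of two also fixes the "missing half" in underdense regions.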
The evolution of dark matter can be approximated by the Press–Schechter
approach and tracked more accurately with N -body simulations, but the evolution
of baryons is much more complicated because the physics is not purely
gravitational, but includes gas cooling and heating from photons and shocks,
energy and momentum input from winds, and multiple phases in the interstellar
medium. Understanding the formation of galaxies and the stars within them is one
of the key goals in observational cosmology. Much recent work has been done
on combining N -body simulations with simplified numerical models of the
chemical/dust/stellar content evolution of particular galaxies or classes of galaxy.
These so-called semi-analytic models have had many successes in reproducing
observed galaxy properties but currently still contain many adjustable parameters
that need to be tuned to match observations. There is, as yet, no consensus
between the semi-analytic interpretations. Different semi-analytic models make
different assumptions about, for example, the number of stars forming per unit
stellar mass (the initial mass function) and the feedback effects of energy and
momentum input from black hole accretion and star formation.
We also shouldn’t assume that there’s a one-to-one correspondence between dark
matter haloes and galaxies, and the halo occupation distribution describes the
probability distribution of the number of galaxies that a halo contains. This
distribution is one of the key predictions of models of galaxy evolution. It can be
constrained by the shape and amplitude of the spatial or angular correlation
function on small scales (e.g. less than one comoving Mpc), where this function
reflects the way that galaxies populate the haloes.
Figure 4.5 shows why baryons must behave differently to dark matter: if they
didn’t, galaxies would have far more satellites, just as galaxy clusters have
many satellite clumps containing galaxies. Recent discoveries of Milky Way
satellites have alleviated this problem (Section 3.11), but there is still a factor of
∼4 deficit at some mass ranges. It's possible to explain this deficit if the
star formation in the lowest-mass haloes is suppressed, which numerical
simulations of feedback (Chapters 5 and 6) suggest may be the case. If so, our
Milky Way galaxy is surrounded by puddles of ‘failed dwarf galaxies’ that
have yet to undergo their first star formation. Another dark matter puzzle that
baryonic physics might solve is the lack of sharp density cusps at the centres
of dark matter haloes: N -body simulations predict that haloes should have a
Navarro–Frenk–White profile — which we shall meet in detail in Chapter 7, and
which predicts a density profile varying as ρ(r) ∝ 1/r at the centre. Local galaxy
observations suggest otherwise, perhaps because winds from supernovae or black
hole accretion drive out matter and smooth out the density profile.

Figure 4.5 The dark matter density within a galaxy cluster halo with a mass 5 × 10^14 M☉ (top), compared to a galaxy halo with a mass 2 × 10^12 M☉ (bottom). Note the similar number of neighbouring objects. So why do galaxies not look like mini galaxy clusters?

4.3 Population synthesis

We saw in Section 4.1 that having evolution in galaxy luminosities helped to resolve the faint blue galaxies problem. Can we predict how galaxies evolve from first principles? Most of the optical light from galaxies comes from stars. There's a very well-developed theory for the evolution of stars along the main sequence, along the giant branch and (for more massive stars) to supergiants and ultimately to white dwarfs, neutron stars or black holes. If we make some assumption about
how the galaxies started, we could derive how the colours and luminosities of
galaxies evolve.
This technique is known as population synthesis (or sometimes ‘pop synth’,
colloquially). Figure 4.6 shows synthetic spectra of a galaxy evolving from a burst
of star formation that lasted 1 Gyr. There are several assumptions that one needs
to make, such as: whether star formation is ongoing or whether it occurred in a
single burst; the initial distribution of stellar masses, known as the initial mass
function (IMF); and what effect dust has on the spectra. This astronomical dust is
quite different from household detritus: it’s created in the winds of red giants and
in supernova explosions, and is composed of graphites and/or silicates. The grain
sizes of astronomical dust vary from sub-micron sizes to clusters of just a few
atoms. Large interstellar grains can penetrate the Solar System despite the solar
wind, and interplanetary/interstellar dust grains have been collected in the
upper atmosphere (see Figure 4.7). These are also the materials that comprise
protoplanetary discs and ultimately form asteroids and planets.
The amount of star formation per unit time is an important input to population
synthesis models. Depending on the context, an initial starburst might be
assumed, or the star formation history of a galaxy might be taken from predicted
merger rates. If there is no star formation, the change in colours purely from
stellar evolution is known as passive stellar evolution. In general this involves a
reddening and dimming of the galaxy.

Chapter 4 The distant optical Universe

Figure 4.6 Simulated galaxy spectra (relative flux Fλ against λ/Å) following a 1 Gyr-long burst of star formation. Ages are marked in Gyr on each spectrum, from 0.4 Gyr to 17.4 Gyr. The spectra have been normalized to 1 and offset vertically for clarity. Some spectra have lines or continua off the top of the figure, and have been truncated for clarity.

The initial mass function is one of the key unknowns in observational astronomy.
Often the initial number of stars per unit mass, dN/dm, is assumed to have
a simple form such as dN/dm ∝ m−2.35 over a range 0.1–100 M) or so,
which matches inferences from present-day mass distributions of Galactic stars.
However, it’s not at all clear whether the IMF varies from place to place within a
galaxy, or between galaxies. We’ll see in Chapter 5 that when star formation rates
of galaxies are estimated, the estimates all depend on light generated ultimately
by the most massive stars. In order to find the total number of stars being formed,
we need to extrapolate from these largest stars to the smallest, which is very
sensitive to the shape of the IMF.
Dust reddening is another unknown. Dust preferentially absorbs blue optical light
compared to red. This changes the (B–V) colour of a background star by an
amount symbolized as E(B–V). We can write the observed value of (B–V) as
(B–V)obs , and it’s related to the true value (B–V)true by
(B–V)obs = (B–V)true + E(B–V). (4.15)


Figure 4.7 A large interplanetary dust grain collected from the upper atmosphere of the Earth by a NASA aircraft in 2003, during the Earth's passage through the dust stream from comet 26P/Grigg-Skjellerup. Parts of the dust grain appear to be pre-solar, and other parts originated in the interstellar medium; labelled components include presolar silicates, interstellar organic matter, supernova olivine and an interstellar nanoglobule (scale bar: 2 µm).

Exercise 4.3 Which is redder: (B–V) = 0 or (B–V) = 1? Remember that (B–V) is the B-magnitude minus the V-magnitude, and magnitudes m are related to fluxes S by m = −2.5 log10 S + constant. ■
If we imagine a dust screen in front of a star, the amount of V-band attenuation in
magnitudes AV is related to the E(B–V) reddening by
AV = RV E(B–V), (4.16)
where RV is a constant that depends on the composition of the dust. For typical
Galactic dust, RV = 3.1, though it can range from 2.75 to 5.3. Different dust
compositions also change how absorbent the dust is at other wavelengths, as
shown in Figure 4.8. The LMC and SMC appear to be less enriched with heavy
elements than our Galaxy, and the dust in the LMC and SMC appears to have
different extinction properties. The differences in absorption at 217.5 nm also reflect dust composition; this feature is probably caused by small graphite grains.
We’ll revisit dust composition in Chapter 5.
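Equations 4.15 and 4.16 are simple enough to encode directly. A minimal sketch follows; the example colour of 0.65 is just an illustrative solar-type (B–V), not a value taken from this chapter.

```python
def observed_colour(bv_true, e_bv):
    """Equation 4.15: reddening shifts the observed (B-V) colour
    by E(B-V) magnitudes."""
    return bv_true + e_bv

def v_band_extinction(e_bv, r_v=3.1):
    """Equation 4.16: A_V = R_V * E(B-V), with R_V = 3.1 for typical
    Galactic dust (it can range from about 2.75 to 5.3)."""
    return r_v * e_bv

# A star reddened by E(B-V) = 0.3 mag behind typical Galactic dust:
assert abs(observed_colour(0.65, 0.3) - 0.95) < 1e-9   # redder colour
assert abs(v_band_extinction(0.3) - 0.93) < 1e-9       # ~0.93 mag dimming in V
```

Note that both relations are in magnitudes, so they add linearly even though the underlying flux attenuation is multiplicative.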

SMC bar
LMC2 supershell
6 LMC average
Milky Way (RV = 3.1)
Aλ /AV

2
Figure 4.8 The relative
extinction as a function of
wavelength, for dust found in
0 2 4 6 8
the LMC in two locations, in
(1/λ)/µm−1
the SMC and in our Galaxy.


One way of measuring the amount of dust attenuation is the Balmer decrement. The hydrogen Balmer lines Hα (656.3 nm) and Hβ (486.1 nm) are emitted by gas that's ionized by hot stars, with a characteristic ratio that's calculable from atomic physics. (This assumes that the gas is optically thick to the Lyman continuum — see, for example, Osterbrock, D.E. and Ferland, G.J., 2005, Astrophysics of Gaseous Nebulae and Active Galactic Nuclei, University Science Books.) We'll meet these lines again in Chapters 5 and 8. If we write the optical depth to Hα photons as τHα, the Hα fluxes are by definition attenuated by e^−τHα. The optical depth turns out to be related to AV by τHα ≈ 0.7AV. (It shouldn't be a surprise that it's a linear relation, since magnitudes are a logarithmic system and the optical depth appears in an exponential, e^−τHα.) However, as we see in Figure 4.8, the extinction and optical depth to Hβ will be greater. For Galactic dust, τHβ ≈ 1.45τHα. If we compare the observed flux ratio of Hα and Hβ, SHα/SHβ, with the predicted ratio of 2.8 from atomic physics, we can infer τHα and hence AV. The extinction is also empirically related to the gas column density (which we shall meet in subsequent chapters) through NH/E(B–V) = 4.93 × 10^21 atoms cm^−2 mag^−1 (see, for example, Diplas, A. and Savage, B.D., 1994, Astrophysical Journal, 427, 274).
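Inverting the Balmer decrement for AV follows directly from the relations above: the observed ratio is the intrinsic 2.8 multiplied by e^(τHβ − τHα) = e^(0.45τHα). Here is a hedged sketch of that inversion, assuming the foreground-screen geometry and the Galactic dust values quoted above.

```python
import math

def a_v_from_balmer_decrement(ratio_obs, intrinsic=2.8,
                              tau_ratio=1.45, tau_per_av=0.7):
    """Infer A_V from an observed Halpha/Hbeta flux ratio, assuming a
    foreground dust screen with Galactic dust (tau_Hbeta = 1.45 tau_Halpha,
    tau_Halpha = 0.7 A_V) and an intrinsic ratio of 2.8 from atomic physics.
    Observed ratio = intrinsic * exp(tau_Hbeta - tau_Halpha)
                   = intrinsic * exp(0.45 * tau_Halpha)."""
    tau_halpha = math.log(ratio_obs / intrinsic) / (tau_ratio - 1.0)
    return tau_halpha / tau_per_av

# No extinction reproduces the intrinsic ratio:
assert abs(a_v_from_balmer_decrement(2.8)) < 1e-12
# A Balmer decrement of ~4.4 corresponds to roughly 1.4 mag of V-band extinction:
a_v = a_v_from_balmer_decrement(4.4)
assert 1.3 < a_v < 1.5
```

As Exercise 4.4 warns, this inferred AV is only as good as the screen assumption: dust mixed in with the emitting gas gives a different answer.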
It’s not just the type of dust that affects the colours, it’s the location of the dust, as
the following (optional) exercise shows.

Exercise 4.4 The previous discussion of Balmer decrements assumed that the
dust is placed in a screen in front of the emission-line-emitting gas. Suppose
instead that the dust is evenly mixed in with this gas. Calculate the Balmer
decrement SHα /SHβ that you’d measure in this situation. Now suppose that you
observed this but wrongly assumed that the dust was in a screen in front of the
stars. What AV would you wrongly infer? (This is a more difficult and slightly
more open-ended exercise than most in this book, so for quantities not supplied in
the question, improvise!) ■
Another way of looking at this exercise is that the optical depth of extinction to a
given position depends on the wavelength, so Hα photons would have a lower
optical depth than Hβ photons or ultraviolet photons. However, the light that we
receive will always be dominated by the regions with optical depths of τ < 1, so
the observed light at shorter and shorter wavelengths will be dominated by regions
with lower and lower extinctions. The modern approach in population synthesis is
to include assumptions or predictions for the dust location, density variation and
composition.

4.4 Photometric and spectroscopic redshifts


All these effects complicate the spectra of galaxies, but this is also an advantage
because there is a lot of useful information in optical spectra. For example,
emission lines imply the presence of ionizing radiation from young, hot O stars
and B stars, or from an active nucleus, which we shall meet in Section 4.6. Blue
spectra also suggest the presence of young hot stars (Figure 4.6), while red spectra
can be the signature of old stellar populations (since the blue O and B stars are
short-lived). Elliptical galaxies and spiral bulges tend to have spectra consistent
with old stellar populations, while the bluer spectra of spiral discs are attributable
to more recent star formation.
The emission and absorption lines in Figure 4.6 are used to measure the redshift z
of galaxies, since each line will be shifted in wavelength by a factor (1 + z). We
need to know which emission line we’re observing in order to know the emitted
132
4.4 Photometric and spectroscopic redshifts

wavelength λem , and hence infer the redshift from the observed wavelength λobs
and 1 + z = λobs /λem (see Equation 1.10). But how can we find which emission
line is which? Often in practice certain features are characteristic enough to be
instantly recognizable, but this is not always the case.
One way is to use the wavelength ratios. Suppose that we have two emission lines
observed at wavelengths λobs,1 and λobs,2 . If they’re at the same redshift, then
λobs,1 λem,1 (1 + z) λem,1
= = . (4.17)
λobs,2 λem,2 (1 + z) λem,2
There are a limited number of astrophysically-plausible emission lines in galaxies,
so the observed wavelength ratios can quickly identify the emitted wavelengths.
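Equation 4.17 suggests a simple identification scheme: compare the observed wavelength ratio against ratios of known rest wavelengths. The sketch below uses a short, far-from-complete line list drawn from this chapter; real pipelines fit many lines and templates simultaneously.

```python
# Rest wavelengths in nm of some common lines quoted in this chapter:
REST_LINES = {
    'Lyalpha': 121.6, '[O II]': 372.7, 'Hbeta': 486.1,
    '[O III]': 500.7, 'Halpha': 656.3,
}

def identify_line_pair(lam_obs_1, lam_obs_2, tol=0.01):
    """Find rest-frame identifications whose wavelength ratio matches the
    observed ratio (Equation 4.17), and return the implied redshift."""
    matches = []
    for name1, lam1 in REST_LINES.items():
        for name2, lam2 in REST_LINES.items():
            if name1 == name2:
                continue
            if abs(lam_obs_1 / lam_obs_2 - lam1 / lam2) < tol:
                z = lam_obs_1 / lam1 - 1.0
                matches.append((name1, name2, round(z, 3)))
    return matches

# Two lines observed at 1312.6 nm and 745.4 nm have the wavelength ratio of
# Halpha to [O II] (each observed wavelength is doubled), implying z = 1:
assert identify_line_pair(2 * 656.3, 2 * 372.7) == [('Halpha', '[O II]', 1.0)]
```

Because the (1 + z) factors cancel in the ratio, the identification does not require knowing the redshift in advance; the redshift then falls out of either line.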
Even when a galaxy doesn’t have emission lines, the absorption lines can be used
to find redshifts. Another absorption feature is the ‘break’ in the spectra in
Figure 4.6 at about 400 nm, known as the 4000 Å break, caused by Balmer
continuum absorption in the atmospheres of stars.
Spectroscopy is not always available, because it’s harder to get good
signal-to-noise ratios in spectra (where you need enough photons in each of
hundreds of pixels along the wavelength axis) than in broad-band imaging (where
all the photons passing through the filter are directed onto a few pixels in the
image). No matter how big your telescope, it’s always possible to make images of
galaxies too faint to find redshifts from spectra with that telescope. The next best
approach is to use the broad-band colours of galaxies to estimate the redshift. If a
galaxy has photometric measurements in many filters (e.g. UBVRIJHK —
see Section 3.3), this is effectively a spectrum with a very coarse wavelength
resolution. Instead of using emission lines, this approach uses the broad shape
of the spectrum to estimate the redshift. With good enough photometric
measurements in enough filters, it’s possible to distinguish redshifting from the
reddening from dust or the presence of old stars. The most likely redshift is
usually found from the minimum-χ2 fit to the photometric data using template
spectra. Figure 4.9 shows that this works on the whole, but it’s hard to avoid a few See, for example,
strong outliers known sometimes as ‘catastrophic failures’ of the fitting. Reducing Bolzonella, M., Miralles, J.-M.
these outliers is one of the main challenges in calculating photometric redshifts. and Pelló, R., 2000, Astronomy
The Sloan Digital Sky Survey (SDSS) used a custom-made set of broad-band and Astrophysics, 363, 476.
filters named u, g, r, i, z. The sharper boundaries of the filter curves help with
photometric redshifts.
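The minimum-χ2 template fit can be sketched as follows. This is deliberately oversimplified — one template, filters treated as delta functions at made-up effective wavelengths, no dust — so with so few bands a step feature is only localized to somewhere between two neighbouring filters, which is why the toy estimate below brackets the true redshift rather than pinning it down.

```python
import numpy as np

OBS_WAVELENGTHS = np.array([365., 445., 551., 658., 806.])  # ~UBVRI, in nm

def photometric_redshift(fluxes, errors, template, z_grid):
    """Minimum-chi^2 photometric redshift: shift a rest-frame template
    through trial redshifts, synthesize broad-band fluxes and keep the
    best fit, with a free overall flux scaling fitted analytically."""
    best_z, best_chi2 = None, np.inf
    for z in z_grid:
        model = template(OBS_WAVELENGTHS / (1.0 + z))   # rest-frame lambdas
        scale = np.sum(fluxes * model / errors**2) / np.sum(model**2 / errors**2)
        chi2 = np.sum(((fluxes - scale * model) / errors) ** 2)
        if chi2 < best_chi2:
            best_z, best_chi2 = z, chi2
    return best_z

# A toy template with a step (a crude '4000 Angstrom break') at 400 nm,
# 'observed' at a true redshift of 0.3:
template = lambda lam: np.where(lam > 400.0, 2.0, 0.5)
truth = 0.3
fluxes = 10.0 * template(OBS_WAVELENGTHS / (1 + truth))
z_est = photometric_redshift(fluxes, np.full(5, 0.1), template,
                             np.arange(0.0, 1.0, 0.05))
assert 0.11 < z_est < 0.38   # break localized between the 445 and 551 nm bands
```

With real filter curves, several templates and many bands, this degeneracy tightens into the δz/(1 + z) of a few per cent quoted in the text, but the catastrophic failures arise when two different (template, redshift) combinations produce similar broad-band colours.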
As we’ll see in Chapter 8, high redshift galaxies can be identified from the fact
that intervening neutral clouds cause absorption of Lyman series lines. This
means that at observed wavelengths shorter than the redshifted Lyman α line (the
n = 2 → 1 hydrogen transition at an observed wavelength of 121.6(1 + z) nm),
and particularly those below the redshifted Lyman limit (91.2(1 + z) nm,
n = ∞ → 1), the continuum from the high redshift galaxy is strongly suppressed.
This means that z > 4 galaxies can be found by searching for galaxies with
detections in the B or g filters and longer wavelengths, yet non-detections in the U
or u filters. This photometric redshift technique is known as U-band dropouts or
Lyman break galaxies. Searches for galaxies at higher redshifts tend to focus on
dropouts at longer wavelengths, though it becomes progressively more difficult
because we start running out of cosmological volume, since dV /dz tends to zero
as z increases (the Hubble volume has a finite comoving size). Follow-up


Figure 4.9 A comparison of photometric redshift estimates zphot with spectroscopic redshifts zspec (which have much higher precision). Note that while the technique works for most objects, there are some strongly outlying points known as 'catastrophic' failures. Five survey data sets are used (HDF-S, MS-1054, CDF-S, MUSYC and HDF-N) and are marked in different symbols.

spectroscopy often finds a Lyman α emission line (rest wavelength 121.6 nm), as
shown in Figure 4.10. Note the weak continuum at λ above Lyman α, and the lack
of continuum at λ below. However, the presence of the rest-frame 4000 Å break in
lower-redshift galaxies and the nearby [O II] 372.7 nm emission line has resulted
in some high-z galaxy claims that failed under closer examination.
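The dropout criterion itself can be sketched as a simple detection-and-colour cut. The 2-magnitude colour threshold and the function interface below are illustrative assumptions, not any published survey's selection criteria.

```python
def is_u_dropout(u_mag, b_mag, detected_b=True, detected_u=True,
                 colour_cut=2.0):
    """Crude Lyman-break selection sketch: a z > 4 candidate is detected
    in B (and redder bands) but undetected, or very red, in U, because
    the redshifted Lyman limit (91.2(1+z) nm) has moved through the U
    band. The colour cut of 2 magnitudes is illustrative only."""
    if detected_b and not detected_u:
        return True                       # a genuine U-band dropout
    return detected_b and (u_mag - b_mag) > colour_cut

# Detected in B but not in U: a Lyman-break candidate.
assert is_u_dropout(None, 24.5, detected_u=False)
# Detected in both with a flat U-B colour: a lower-redshift galaxy.
assert not is_u_dropout(24.0, 23.8)
```

The same logic moves to redder filter pairs for higher-redshift searches (g-dropouts, r-dropouts and so on), with the caveats about interloper populations noted above.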

z = 5.34
6 Lyα

4
Fν /µJy

4
2

0
Fν /µJy

7700 7800
λ/Å
2

6500 7000 7500 8000 8500 9000


Figure 4.10 The spectrum of λ/Å
a z = 5.34 galaxy.


The COMBO-17 survey (Classifying Objects by Medium-Band Observations in 17 filters) ingeniously used an intermediate approach, taking images in many intermediate-width filters (Figure 3.3), from which photometric redshifts accurate to δz/(1 + z) ≈ 0.02 could be found.

4.5 Luminosity functions


What are the most common galaxies in the present-day Universe? If we picked a
galaxy at random, what sort of galaxy would it be? The Milky Way is sometimes
described as a ‘typical’ spiral galaxy, so you might think that the answer is
Milky-Way-sized galaxies, but they turn out to be enormously outnumbered by
dwarf galaxies.
The number of objects per unit volume, per unit luminosity is known as the
luminosity function, sometimes given the symbol φ(L). Figure 4.11 shows
the SDSS g-band luminosity function of all galaxies, and for early-type and
late-type galaxies separately. For higher redshift galaxies it becomes important to
account for the expansion of the Universe, and the usual convention is to use
comoving volume (Chapter 1) when calculating luminosity functions. The shape
is generally quite simple: a shallow slope, which changes to a steeper slope or
exponential decline around some characteristic luminosity L∗ known as the break
luminosity, or equivalently at some characteristic magnitude M∗ . This is very
often parameterized as a Schechter function, which fits many galaxy luminosity
functions:
φ(L) = φ∗ (L/L∗)^(−α) exp(−L/L∗), (4.18)
where the normalization φ∗ , the faint-end slope α and the break luminosity L∗ are
free parameters in fitting the luminosity function. This is suggestively similar
to the Press–Schechter form above (identical if α = 1/2, i.e. a white-noise
spectrum), giving some theoretical motivation for this form. However, the
underlying assumptions of Press–Schechter are overly simplistic, as we’ve seen.
Also, the observed galaxy luminosity functions have different faint-end slopes to
the predicted dark matter halo distributions, implying that there is no simple
linear relationship between galaxy luminosity and the mass of its halo. The
physics of dark matter halo formation is (relatively) simple, but the physics of the
baryons (star formation, winds, shocks, radiation) is much more complicated.
Understanding the formation and evolution of galaxies is one of the major
challenges of modern cosmology.
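A sketch of Equation 4.18 makes the two regimes explicit: a power law at the faint end and an exponential cut-off at the bright end. The parameter values below are placeholders in arbitrary units, not the SDSS fits of Figure 4.11.

```python
import numpy as np

def schechter(L, phi_star=1.0, L_star=1.0, alpha=1.0):
    """Schechter luminosity function, Equation 4.18:
    phi(L) = phi* (L/L*)^(-alpha) exp(-L/L*).
    Parameter values are illustrative placeholders."""
    x = L / L_star
    return phi_star * x**(-alpha) * np.exp(-x)

L = np.logspace(-3, 1, 1000)
phi = schechter(L)
# Faint end: a rising power law towards low luminosities...
assert phi[0] > phi[len(phi) // 2] > phi[-1]
# ...while the bright end is exponentially suppressed well beyond 1/L^alpha:
assert schechter(10.0) < 1e-4 * schechter(1.0)
```

Dwarf galaxies dominate by number because of the faint-end power law, while (as Exercise 4.9 shows for α < 1) the luminosity density is dominated by galaxies near the break L∗.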

Exercise 4.5 Figure 4.11 shows the number density of galaxies per absolute
magnitude. How is this related to the number density of galaxies per unit
luminosity? ■
Calculating the number density of galaxies in principle is very simple: count the
number of galaxies N within a volume V (we’ll assume that the galaxies aren’t
evolving), to find the number density ρ = N/V . This can be done if we know all
the galaxies within a particular volume V0 , known as a volume-limited sample.


Figure 4.11 The SDSS g-band luminosity functions for galaxies, plotted as log10(φ/h^3 Mpc^−3 mag^−1) against absolute magnitude Mg − 5 log10 h. (The notation mag^−1 means 'per magnitude'.) Late-type and early-type (i.e. spiral and elliptical) galaxies are also shown separately, alongside the total.

We could write this as
ρ = N/V0 = (1/V0) Σ_(i=1)^N 1. (4.19)

In practice, though, it's slightly trickier, because high-luminosity galaxies can be seen to greater distances than low-luminosity galaxies. Suppose that galaxy i has probability pi of being detected in the volume. (Again, we're assuming that the galaxies don't evolve.) The number density would then be
ρ = (1/V0) Σ_(i=1)^N (1/pi). (4.20)

For example, if there’s only a probability of 1/10 that a particular sort of galaxy
is in the volume, then each time we see that type of galaxy in the volume,
there must be nine more we’ve missed, so each one we see is ‘worth’ ten. If
it’s a volume-limited sample, then pi = 1 for every galaxy, and it reduces to
Equation 4.19. The root-mean-square (RMS) estimate of the uncertainty in ρ from
Equation 4.20 is just
σρ = (1/V0) √[Σ_(i=1)^N (1/pi^2)]. (4.21)

The probabilities or their equivalents are in general known as the selection function ('function' because in general it depends on properties of the galaxy
such as luminosity and redshift). As long as the selection function is known, the
number density can be calculated, as can the number density per unit luminosity
(the luminosity function).
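Equations 4.20 and 4.21 translate directly into code. This is a sketch in arbitrary volume units, purely to show how the 1/pi weighting works.

```python
import math

def number_density(p, V0):
    """Equation 4.20: each detected galaxy with detection probability
    p_i counts as 1/p_i galaxies in the volume V0."""
    return sum(1.0 / pi for pi in p) / V0

def number_density_error(p, V0):
    """Equation 4.21: RMS uncertainty on the estimator above."""
    return math.sqrt(sum(1.0 / pi**2 for pi in p)) / V0

# A volume-limited sample (all p_i = 1) reduces to rho = N/V0, with a
# root-N (Poisson-like) uncertainty:
p = [1.0] * 100
assert number_density(p, 50.0) == 2.0
assert abs(number_density_error(p, 50.0) - math.sqrt(100) / 50.0) < 1e-12
# If each galaxy has only a 1-in-10 chance of detection, every one seen
# is 'worth' ten:
assert number_density([0.1, 0.1], 50.0) == 0.4
```

Note how galaxies with small pi dominate the error budget through the 1/pi^2 term: a survey resting on a few heavily up-weighted detections has a large formal uncertainty.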

A common sort of survey in astronomy is the flux-limited sample, which
lists all objects brighter than a given flux at a particular wavelength. Unlike a
volume-limited sample, this one can detect high-luminosity galaxies to greater
distances than low-luminosity galaxies. To calculate the selection probabilities pi ,
imagine that we could push galaxy i to greater distances. When this galaxy is
sufficiently far away that its flux equals the flux limit of the survey, this is the
furthest distance at which it can be seen. We can write this distance as dmax , and
the volume enclosed by that distance as V (dmax ) or Vmax . In static Euclidean
space, Vmax will just be (4/3)π d_max^3 times the fraction of the sky covered by the
survey. Let’s choose a volume V0 that’s bigger than the biggest Vmax of any
of the galaxies. The probability that galaxy i is above the flux limit will be
pi = Vmax,i /V0 (again we’re assuming no evolution), where Vmax,i is the Vmax for
galaxy i. The number density is therefore
ρ = (1/V0) Σ_(i=1)^N (V0/Vmax,i) = Σ_(i=1)^N (1/Vmax,i). (4.22)
We could also calculate the number density per unit luminosity (the luminosity
function) φ(L) by counting the number of galaxies in intervals ΔL:
φ(L) = (1/ΔL) Σ_(i in L→L+ΔL) (1/Vmax,i), (4.23)
where this time the sum is over only those galaxies with luminosities in the
interval L to L + ΔL. This is known as the 1/Vmax method for calculating
luminosity functions. Luminosity functions from more complicated selection
functions can be found by writing down the detection probabilities pi . For
example, a wide shallow flux-limited survey with a central deep part would have
the pi dropping discontinuously as an increasingly-distant galaxy crosses the
shallow flux limit, and then dropping to zero as the galaxy passes below the
deeper flux limit. In these cases we use the pi to calculate the accessible volumes
for each galaxy, sometimes written as Va rather than the Vmax used for a single
flux-limited sample.
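The 1/Vmax method (Equation 4.23) can be sketched for the simplest case discussed above: a single flux-limited sample in static Euclidean space with no evolution, in arbitrary units. The inverse-square law S = L/(4π d^2) used here is the Euclidean assumption; a real survey would use luminosity distances and comoving volumes.

```python
import math
from collections import defaultdict

def luminosity_function(luminosities, flux_limit, sky_fraction, bin_edges):
    """1/Vmax luminosity function (Equation 4.23) for a flux-limited
    sample in static Euclidean space, where dmax follows from
    S = L / (4 pi d^2) and Vmax = (4/3) pi dmax^3 * sky_fraction."""
    phi = defaultdict(float)
    for L in luminosities:
        d_max = math.sqrt(L / (4 * math.pi * flux_limit))
        v_max = (4.0 / 3.0) * math.pi * d_max**3 * sky_fraction
        for j in range(len(bin_edges) - 1):
            if bin_edges[j] <= L < bin_edges[j + 1]:
                dL = bin_edges[j + 1] - bin_edges[j]
                phi[j] += 1.0 / (v_max * dL)      # Equation 4.23
    return dict(phi)

# Two L = 1 galaxies with the flux limit chosen so dmax = 1, full sky:
phi = luminosity_function([1.0, 1.0], flux_limit=1 / (4 * math.pi),
                          sky_fraction=1.0, bin_edges=[0.5, 1.5])
assert abs(phi[0] - 3 / (2 * math.pi)) < 1e-9    # 2 / ((4/3) pi * 1)
```

Each galaxy contributes 1/Vmax rather than 1/V0, which is exactly the up-weighting of low-luminosity galaxies that a flux limit requires.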
There are a number of assumptions built into these estimates. First, we’re
assuming that there are no types of galaxies that could not be detected anywhere
in the survey, i.e. pi is never 0. Second, we’re assuming that we have a good
estimate of the detection probabilities pi . Third, we’re assuming no evolution. It’s
worth checking to see if these conditions hold. The first is tricky, unless you are
measuring quantities that deliberately exclude objects outside your selection
function. For example, the g-band luminosity function of g-band-selected galaxies
is not prone to this problem, but (say) the g-band luminosity function of
radio-flux-selected galaxies might be. Spectroscopic redshifts can sometimes be
difficult to obtain, and the second assumption can fail in practice if only galaxies
in a biased subset of the sample have redshifts (e.g. galaxies that happen to already
have redshifts from other research programmes). Surveys without such biases are
known as complete samples. Sometimes only a random subset of galaxies is
targeted for redshifts, a technique known as sparse sampling. This reduces the
effective sky coverage of the survey, which can be folded into the pi estimates.
Note that a survey can be sparse sampled yet still be considered ‘complete’.
If we have a wrong estimate of the selection function, it would make a
non-evolving population look like it’s evolving. In local galaxy populations, the

key test is therefore whether they seem to evolve. There are many ways of testing this, but one method is particularly free of other assumptions: the ⟨V/Vmax⟩ test. The trick here is to use the cumulative probabilities. For any probability distribution p(x), a sample x will be in the 10th percentile exactly 10% of the time, in the 50th percentile 50% of the time, and so on. These percentiles are essentially the cumulative probabilities. If we plotted a histogram of the values of x, the shape would be p(x) (within the uncertainties). However, if we plotted a histogram of the percentiles associated with each x, the histogram would be uniform (again within the uncertainties). In other words, the cumulative probability distribution c(x) = ∫_0^x p(x′) dx′ is uniform from 0 to 1.
In the case of our flux-limited sample, the cumulative probability that a galaxy at
distance di is seen at any distance from 0 to di is Vi /Vmax,i , where Vi is the
volume enclosed by the distance di , and Vmax,i is, as before, the volume enclosed
by dmax,i. The test that there's no evolution is therefore that Vi/Vmax,i should be uniformly distributed from 0 to 1. In particular, this means that the average value should be ⟨V/Vmax⟩ = 1/2. To test the null hypothesis of no evolution, we need a measured value of ⟨V/Vmax⟩ and its uncertainty. You might think that the best approach is to propagate the uncertainties on the redshifts, but in practice these are effectively negligible. However, even if the null hypothesis holds and there is no evolution, there would still be some variation in ⟨V/Vmax⟩, as Exercise 4.6 shows. The usual procedure is to use this expected variation (in the case of no evolution) as the uncertainty in ⟨V/Vmax⟩, which is then used to test whether the sample data are consistent with no evolution.
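The ⟨V/Vmax⟩ test is easy to sketch. The relation V/Vmax = (d/dmax)^3 below assumes static Euclidean space; a real survey would use comoving volumes instead. The simulated non-evolving population fills its accessible volume uniformly, so d/dmax is drawn as the cube root of a uniform variate.

```python
import math
import random

def v_over_vmax_test(distances, d_max_values):
    """The <V/Vmax> test in Euclidean space, V/Vmax = (d/dmax)^3.
    Returns the sample mean and the expected no-evolution scatter
    1/sqrt(12 N) from Exercise 4.6. A mean far from 1/2 signals
    evolution (or a wrong selection function)."""
    ratios = [(d / dm) ** 3 for d, dm in zip(distances, d_max_values)]
    n = len(ratios)
    return sum(ratios) / n, 1.0 / math.sqrt(12 * n)

# A non-evolving mock sample of 10000 galaxies, each with dmax = 1:
random.seed(42)
d_over_dmax = [random.random() ** (1 / 3) for _ in range(10000)]
mean, sigma = v_over_vmax_test(d_over_dmax, [1.0] * 10000)
assert 0.45 < mean < 0.55    # consistent with <V/Vmax> = 1/2
```

A strongly evolving population (such as quasars, Section 4.6) concentrated at large distances would instead pile up near V/Vmax = 1 and push the mean well above 1/2.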

Exercise 4.6 Show that the variance of a uniform distribution from 0 to 1 is 1/12, then use the central limit theorem to show that the standard deviation of ⟨V/Vmax⟩ from a sample of N galaxies is 1/√(12N), assuming that the null hypothesis of no evolution holds. (Recall that the standard deviation is the square root of the variance, and the variance is the mean of the squares minus the square of the mean.)
Exercise 4.7 Does no evolution imply ⟨V/Vmax⟩ = 1/2?
Exercise 4.8 Does ⟨V/Vmax⟩ = 1/2 imply no evolution? ■
For more complicated selection functions, the idea is to use the pi to calculate the enclosed volumes Ve and compare them to the accessible volumes Va, so the ⟨V/Vmax⟩ test is generalized to ⟨Ve/Va⟩. In surveys of high-redshift galaxies and quasars (Section 4.6) the populations evolve strongly, so ⟨V/Vmax⟩ is no longer useful for testing the knowledge of the selection function. Instead, it's sometimes used to demonstrate the presence of evolution.
A related and possibly equivalent concept to the selection function is selection
effects. This can occasionally have a pejorative overtone if there is some debate
over whether an effect is really present in a population or whether it is just due to
the selection of the sample of objects. For example, suppose that we had a galaxy
survey with both a radio flux limit and an optical flux limit, so a galaxy has to
have enough radio flux and enough optical flux to be present in the sample. If we
plotted the optical luminosity against the radio luminosity in this sample of
objects, there could well be a broad correlation, but this would be because both
luminosities would correlate with distance (because the numerous faint objects


can only be seen nearby). This correlation would therefore be attributable to selection effects.
Having said that, there is no such thing as an ‘unbiased’ survey without selection
effects. All astronomical surveys are defined by what they include and what they
exclude. Even if you set out to measure every luminous object in the observable
Universe, from galaxies to the smallest star, you would still have chosen to
exclude many classes of object (planets, dark matter haloes without galaxies, and
so on). Ideally you would choose a survey selection function that is tailored to
answer a particular scientific question; these inevitable selection effects can work
in one’s favour.
Finally, there is a sense in which our Galaxy can be considered typical, as the
following exercise shows.

Exercise 4.9 Assume that the galaxy luminosity function has a Schechter
function form, with a faint-end slope α < 1. Show that most of the light emitted
by galaxies per unit volume is dominated by galaxies near L∗ (also near the Milky
Way luminosity). ■

4.6 Active galaxies


We’ve already seen that the discovery of non-Euclidean radio source counts was
very strong evidence that the early Universe was very different to the present,
even before the discovery of the CMB. The radio source population surprised in
all sorts of other ways. In 1954, Baade and Minkowski discovered that the
bright radio source Cygnus A (Figure 4.12) is identified with a faint optical
galaxy. Suddenly it became apparent that radio sources sample a much more
cosmologically distant population than anyone had considered before. But what
generated these giant radio-emitting lobes? Baade wrote of the optical imaging:
I knew something was unusual the moment I examined the negatives. There
were galaxies all over the plate, more than two hundred of them, and the
brightest was at the center. It showed signs of tidal distortion, gravitational
pull between the two nuclei. I had never seen anything like it before. It was so much on my mind that while I was driving home for supper, I had to stop the car and think.
Pfeiffer, J., 1956, The Changing Universe, Victor Gollancz.
Figure 4.12 Radio image of the radiogalaxy Cygnus A, taken with the Very Large Array at a frequency of 5 GHz at 0.5″ resolution. This image is about 4.3′ from top to bottom.
Then in 1963, Schmidt and Greenstein discovered that the radio source 3C273 was identified with an object at a (then gigantic) redshift of z = 0.158. The
was identified with an object at a (then gigantic) redshift of z = 0.158. The
optical luminosity implied by this was astounding: the V-band absolute magnitude
of −26.3 is about 120 times larger than the absolute magnitude of −21.1 of the
entire Andromeda galaxy.
● Why is a magnitude of −26.3 some 120 times larger than −21.1?
❍ Magnitudes m are related to fluxes S by m = −2.5 log10 S + k (where k is
some constant), so the difference between two magnitudes m1 and m2 is


related to their respective fluxes S1 and S2 by


m1 − m2 = −2.5 log10 S1 + k + 2.5 log10 S2 − k
= −2.5(log10 S1 − log10 S2 )
= −2.5 log10 (S1 /S2 ).

In other words, S1/S2 = 10^(−0.4(m1−m2)). We're given a magnitude difference of 26.3 − 21.1 = 5.2 magnitudes, which corresponds to a flux ratio of just over 120.
These prodigious luminosities meant that quasars could be found at far greater
distances than galaxies; the astronomer J.H. Oort wrote that ‘quasars gave great
expectations for cosmology’. In optical photographic plates the object looked like
a star, so 3C273 was referred to as a quasi-stellar object (abbreviated QSO)
or a quasar for short. Figure 4.13 shows an optical image from SDSS, and Figure 4.14 shows an optical spectrum of 3C273.
Figure 4.13 SDSS image of the quasar 3C273.

Figure 4.14 Optical spectrum (flux in mJy against λ/Å, from about 5000 Å to 8000 Å) of the quasar 3C273. Note the broad optical emission lines from the quasar's broad-line region (e.g. Hα; Hγ and [O III] are also marked).
Quasars typically (though not always) have very blue optical–ultraviolet spectra
and broad emission lines, sometimes with widths suggesting Doppler motions of
thousands of km s−1 . The optical–ultraviolet continuum varies on timescales of
weeks or less, suggesting physical sizes of light-weeks. These observations
suggest accretion onto a central massive object, and the size and velocity
constraints make it likely that this object is a supermassive black hole, i.e. one
with a mass ≳ 10^6 M☉, if only because any astrophysical alternatives would very
rapidly evolve into a single black hole.


We’ll cover some of the detailed physics of quasars in Chapter 6 (see also
the further reading section, especially Kolb’s book Extreme Environment
Astrophysics). For now, we’ll state a few general results to define the terminology.
The modern picture of quasars and radiogalaxies is of a dusty torus that makes the
optical appearance depend strongly on orientation. Figure 4.15 shows a schematic
diagram of these models of active galactic nuclei (AGN).

In the schematic, the narrow emission-line clouds, radio lobe, jet, gas, dust torus, black hole, broad emission-line clouds, accretion disc and host galaxy are labelled, with distance from the centre marked logarithmically from 10^−5 to 1 parsec.

Figure 4.15 Schematic view (not to scale) of the dust-torus-based unified model of radio-loud active galaxies.
Where the broad-line region is visible, the active galaxy is seen as a type 1 object (i.e. with broad and narrow
lines), such as a Seyfert 1 or quasar. When the torus obscures the line of sight to the broad-line region, the active
galaxy is seen as a type 2 object (i.e. with narrow lines only), such as a Seyfert 2 or radiogalaxy. When the jet is
pointed directly along the line of sight, the jet luminosity can sometimes swamp the rest of the active nucleus;
these examples are referred to as blazars.

When the central black hole accretion (sometimes called the central engine) is
visible along the line of sight, the broad (> 1000 km s−1 ) emission lines and blue
optical–ultraviolet continuum are visible; when the host galaxy is visible, these
are known as Seyfert 1 galaxies, and in general these are type 1 AGN. When the
dusty torus obscures the line of sight to the central engine, only narrow emission
lines are visible and the continuum is dominated by starlight from the host
galaxy; these are Seyfert 2 or type 2 AGN. The broad line region and narrow line
region are shown in Figure 4.15. These AGN also show evidence for gas with
higher ionization than found in starbursts, such as having high ratios of [O III]
495.9 + 500.7 nm to [O II] 372.7 nm or Hβ 486.1 nm, or a high ratio of [N II]
654.8 + 658.4 nm flux to Hα 656.3 nm flux.

Chapter 4 The distant optical Universe

Figure 4.16 illustrates how star-forming galaxies and AGN emission lines
differ. About 10% of active galaxies are radio-loud, i.e. they have luminous
radio-emitting lobes. It’s not clear why some active galaxies have radio lobes
while others (the radio-quiet AGN) do not, but the lobes appear to be
caused by particle jets emanating from the central engine. These impact on
the interstellar/intergalactic medium and create a cocoon of plasma, in which
electrons spiral along magnetic field lines and emit synchrotron radiation at radio
wavelengths. Even the radio-loud objects come in at least two distinct types:
Fanaroff–Riley (FR) types I and II (not to be confused with Seyfert type). FR-I
radiogalaxies have less luminous radio lobes that taper off in brightness towards
the edges (‘edge-darkened’), while FR-II radiogalaxies have more luminous,
edge-brightened lobes. About half the energy output of an FR-II radiogalaxy
comes out as jet kinetic energy — an astonishing output given that a quasar’s
luminous energy can exceed that of the rest of the galaxy combined.

(Figure 4.16 axes: intensity of [O III] 5007 Å / intensity of Hβ 4861 Å against intensity of [N II] 6583 Å / intensity of Hα 6563 Å, both on logarithmic scales.)

Figure 4.16 The relative strengths of narrow emission lines can be used to
diagnose whether black hole accretion or star formation is present in a galaxy.
This diagram is known as the ‘Baldwin–Phillips–Terlevich diagram’ or sometimes
‘BPT diagram’. Both line pairs are close in wavelength, so the ratios are insensitive
to dust reddening. Open circles represent galaxies with emission lines from
H II regions (i.e. star formation), while the closed symbols are active galaxies.
(Filled circles are Seyfert 2 galaxies, and triangles are weaker AGN known as
‘low-ionization nuclear emission regions’ or LINERs.) AGN can be separated
from star-forming galaxies using the curved line.
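The curved demarcation line is not given explicitly in the text; as an illustration, here is a sketch using the Kauffmann et al. (2003) curve, one common choice for this diagram (the book’s figure may plot a different line):

```python
import math

def is_agn_candidate(n2_ha, o3_hb):
    """Rough BPT classification. Uses the Kauffmann et al. (2003)
    demarcation -- one common choice of 'curved line'; treat as
    illustrative. Inputs are the linear flux ratios
    [N II]6583/Halpha and [O III]5007/Hbeta."""
    x = math.log10(n2_ha)
    y = math.log10(o3_hb)
    if x >= 0.05:  # beyond the curve's asymptote: always the AGN side
        return True
    return y > 0.61 / (x - 0.05) + 1.3

print(is_agn_candidate(0.1, 0.3))  # typical H II-region ratios: False
print(is_agn_candidate(1.0, 5.0))  # Seyfert-like ratios: True
```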
Baade’s original question on the triggers of active galaxies is still with us. As
we’ll see in Chapter 6, the formation of supermassive black holes and their
growth through accretion are closely related to the formation of stars in their
host galaxies. The luminosity function of quasars is not in general well fit
by the Schechter function; instead, researchers have opted for an arbitrary
double-power-law parameterization:
φ(L) = φ∗ / [(L/L∗)^α + (L/L∗)^β],   (4.24)
where φ∗ , L∗ , α and β are free parameters to be determined from the data. This
model doesn’t have any underlying physical motivation, but if α > β then it

has the property that φ ∝ L^(−β) at luminosities far below the break luminosity
(i.e. L ≪ L∗), and φ ∝ L^(−α) at L ≫ L∗. The overall shape is therefore of a
shallow power law at faint luminosities, steepening at luminosities around L∗ to a
steeper power law at high luminosities.
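Equation (4.24) and its limiting behaviour can be checked numerically — a short sketch with arbitrary parameter values:

```python
def double_power_law(L, phi_star, L_star, alpha, beta):
    """Double-power-law luminosity function of equation (4.24)."""
    x = L / L_star
    return phi_star / (x ** alpha + x ** beta)

# With alpha > beta the faint end follows the shallower slope:
# phi ~ phi_star * (L/L_star)**(-beta) for L << L*, and
# phi ~ phi_star * (L/L_star)**(-alpha) for L >> L*.
faint = double_power_law(1e-3, phi_star=1.0, L_star=1.0, alpha=3.0, beta=1.5)
bright = double_power_law(1e3, phi_star=1.0, L_star=1.0, alpha=3.0, beta=1.5)
print(faint / (1e-3) ** -1.5)   # close to 1: faint end ~ L**-beta
print(bright / (1e3) ** -3.0)   # close to 1: bright end ~ L**-alpha
```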
Quasars held another surprise: the luminosity function of quasars evolves very
strongly. Figure 4.17 shows the evolution of bright quasars from SDSS.


Figure 4.17 The total comoving number density of SDSS quasars with i-band
absolute magnitudes brighter than −27.6.
It appears that quasars were far more common at a redshift of z = 2 than at the
present. The first measurements of this evolution parameterized these changes as
evolution in φ∗ only (pure density evolution or PDE) or in L∗ only (pure
luminosity evolution or PLE). Initial indications were that PLE fitted better,
suggesting a long-lived population of quasars that dimmed over cosmological
times; however, the radio-loud subset also showed PLE, while the electron energy
loss from synchrotron radiation in their radio lobes implied ages of only tens of
millions of years. The PLE decline in quasar number density from z = 2 to z = 0
varies approximately as (1 + z)³. We’ll return to the underlying causes of this
sudden increase and decline in quasar activity in the cosmos in Chapters 5 and 6,
but an interesting clue comes from comparing the evolution in galaxy–galaxy
major merger rates (i.e. mergers of similarly-sized galaxies) inferred from optical
galaxy surveys: the merger rate varies as (1 + z)^(2.7±0.6) at least up to z = 1.
Numerical simulations have shown that galaxy–galaxy mergers could drive gas to
the final common centre, so a link is plausible. At the highest redshifts, the quasar
number density appears to drop quickly (Figure 4.17), known as the redshift
cut-off of quasars. Again the underlying physical causes of this change are still
debated.

Exercise 4.10 How would PDE and PLE translate a luminosity function in the
(log φ, log L) plane? ■


Exercise 4.11 (This is a more open-ended and difficult exercise than most in
this book.) The luminosity L of radio lobes depends on the kinetic energy output
per unit time of the radio jet, Q, on the density of the surrounding medium ρ
(which can be inferred independently from other observations), and on time. As
the radio jet burrows deeper into the surrounding medium, the cocoon of ionized
plasma increases in size with time, so the linear size r of the radio lobes increases.
It can be shown that the jet power output Q is related to these observables roughly
as Q ∝ L^(6/7) r^(−4/7) ρ^(−1/2) (see, for example, Miller, P., Rawlings, S. and
Saunders, R., 1993, Monthly Notices of the Royal Astronomical Society, 263, 425).
Comment on whether radio lobe surface brightness is suitable for the Tolman test. ■

One advantage that radiogalaxies have over radio-loud quasars is that the
observed-frame optical luminosities are not dominated by the central active
nucleus, so the host galaxy is visible. The K-band light is much less sensitive to
young stars than the ultraviolet (Figure 4.6), so should be a measure of the
assembled stellar mass in the host galaxy. How does the K-band luminosity of
radiogalaxies evolve? It turns out that the K-band Hubble diagram (apparent
magnitude versus redshift) has a tight scatter, consistent with just passive stellar
evolution in the host galaxies that locally are giant ellipticals. However, this
scatter increased considerably above a redshift of around 2. Is this the formation
epoch of giant elliptical galaxies? Another effect conspired to increase the
observed scatter: at z > 2 the [O III] 500.7 nm emission line redshifts into the
K-band window at 2–2.5 µm, so the added dispersion in the K–z relation might
just reflect variations in the emission line contribution.

4.7 Deep-field surveys and wide-field surveys


In 1996 the Hubble Space Telescope (HST) made an ultra-deep survey that was to
revolutionize the study of the evolution of galaxies. The idea was to invest 150
orbits (about 5% of a year’s total) using the HST’s Wide Field Planetary Camera 2
instrument (WFPC2), mainly making a deep image of a single camera field of
view in several broad-band filters, in a region in the constellation of Ursa Major.
Many of the galaxies detected would be too faint for optical spectroscopy, which
in part drove the development of better photometric redshift estimates. This
project was known as the Hubble Deep Field (HDF). A later HST survey
performed a similar WFPC2 map in the southern hemisphere with a z = 2.24
quasar centred in the HST’s ultraviolet imaging spectrograph STIS, and the
two are now known as the HDF North and HDF South (HDF-N and HDF-S).
The images of HDF-N are shown in Figure 4.18. Unusually, the data were
made public immediately and the astronomical community pounced on them,
metaphorically. The HDF-N and its successors contained many surprises. One is
that the sky is mostly black — it’s not a hopeless jumble of overlapping objects (a
further answer to Olbers’ paradox). Most distant galaxies appear quite small,
despite the fact that angular diameter distance has a maximum around z ≈ 2
(Figure 1.18). Another surprise was the evolution of the galaxy merger rate (see
above) and galaxy morphologies. While there are plenty of examples of spirals
and ellipticals as in the local Universe, the familiar Hubble sequence disintegrates
at high enough redshift. There are more irregular morphologies at faint apparent
magnitudes. The U-band dropouts in particular tended to have very disturbed
morphologies. Figure 4.19 shows an example of a type of U-band dropout that
became known as chain galaxies.

Figure 4.18 The Ursa Major constellation (top left), with a square degree region marked in red. A zoom into
this region shows the location of the Hubble Deep Field North (HDF-N) in red. Note that the region of sky is
unremarkable. The location was chosen to avoid bright objects at all wavelengths (e.g. bright stars in the optical,
bright galaxies in the far-infrared or radio); not every patch of sky is suitable for a blank-field survey. The panel to
the right shows the final zoom into the HDF-N itself.

Figure 4.19 Details from the Hubble Deep Field North. Alongside elliptical and spiral morphologies familiar from the Hubble sequence, there are many disturbed and/or irregular galaxies, including blue elongated ‘chain galaxies’.


These are part of what was originally called the ‘faint blue galaxy’ population.
These Lyman break galaxies cluster strongly. Combined with their number
density, it became clear that the bias parameter for these galaxies was high,
effectively cancelling the weaker clustering of dark matter at earlier epochs, so the
overall clustering strength resembled present-day galaxies.
Searches for U-band dropouts yielded another surprise: a population of galaxies
with a strong Lyman α emission line but weak continuum. (See, for example,
Steidel, C.C. et al., 2000, Astrophysical Journal, 532, 170.) These have become
known as Lyman α blobs. The widths of the lines suggest that the ionization
causing the Lyman α line is from star formation, though at least one candidate has
been found with an obscured X-ray core (Figure 4.20), suggesting hidden AGN.
We’ll return to the coupled formation of black holes and galaxies in later chapters.
It’s not clear what fraction of the ionizing radiation escaped from galaxies at high
redshifts; we’ll return to this in Chapter 8.

Figure 4.20 A composite image of a Lyman α blob, with Lyman α coloured yellow, Spitzer infrared data marking the sites of dust-shrouded star formation coloured red, and Chandra X-ray data marking the site of supermassive black hole accretion in blue. This may be an example of the feedback processes in action (Chapters 5 and 6). See Daddi, E. et al., 2004, Astrophysical Journal, 617, 746.

Other new galaxy populations have been discovered in these deep field surveys.
The Extremely Red Objects (ERO) were found to be very faint in optical
wavelengths, but very bright in the near-infrared, with colours (R–K) > 5. It
was not initially clear whether these were very dusty star-forming galaxies or
redshifted old stellar populations. In fact, both appear to be present in the ERO
population. EROs cluster strongly, suggesting that they are associated with more
massive dark matter haloes than Lyman break galaxies. We’ll return to this in the
next chapter. A related population is the BzK galaxies, selected in the (B–z)
versus (z–K) colour–colour plane to satisfy (z–K)–(B–z) > −0.2 (when the
magnitude zero points are given in the AB system). This appears to select z > 1
star-forming galaxies independently of reddening, because reddening moves
galaxies parallel to the (z–K)–(B–z) = −0.2 threshold.
Like quasars, the galaxy luminosity function evolves strongly at all wavelengths
investigated so far. Figure 4.21 shows the evolution in M∗ and φ∗ in the best-fit
Schechter functions at several wavelengths. The deepest survey so far is the
Hubble Ultra Deep Field (UDF). This is a further ultra-deep survey with the
more sensitive Advanced Camera for Surveys on the upgraded HST, some
details of which are shown in Figure 4.22. The most recent (and possibly final)
refurbishment of the HST introduced the WFC3 instrument, which has yielded
z ∼ 10 galaxy candidates in the UDF using the Lyman dropout technique, shown
in Figure 4.23. A surprise from these observations has been that the luminous
output from these galaxies is declining so quickly that these galaxies could not
have reionized the Universe (see Chapter 8).
We can use the evolution in the luminosity function to estimate the cosmic
star formation history, i.e. the amount of mass being turned into stars per unit
comoving volume per year, as a function of redshift. Because heavy elements
are synthesized in stars, this also traces the metal enrichment of the Universe
throughout its history. Recall that astronomers use ‘metal’ to refer to any elements
heavier than helium.


(Figure 4.21 panels: M∗ − M∗₀ and φ∗/φ∗₀ against redshift z, at rest-frame 1500 Å and 2800 Å and in the u′ and g′ bands, with the local SDSS data marked.)

Figure 4.21 The evolution of φ∗ and M∗ of the best-fit Schechter function for
galaxies, at various wavelengths, determined from the FORS Deep Field survey.
Also shown are the local SDSS galaxy constraints.

Figure 4.22 Details from the Hubble Ultra Deep Field, taken with the HST’s
Advanced Camera for Surveys.


Figure 4.23 Candidate z ≈ 10 galaxies discovered using the Lyman break technique as J-band (1.2 µm) dropouts, using the new WFC3 on the HST. Each row shows V + i + z, Y, J and H images of one candidate: UDFj-43 696 407 (H = 28.9, J − H > 1.5), UDFj-35 427 336 (H = 29.1, J − H > 1.4) and UDFj-38 116 243 (H = 28.9, J − H > 1.6). Confirmation of the H-band (1.6 µm) detections comes from an average of the H-band images taken with the HST’s earlier observations using the less-sensitive NICMOS instrument. These galaxies are seen just 500–600 Myr after recombination. Each image is 2.4″ × 2.4″.

We’ve seen that star-forming regions have hot young O and B stars that are most
luminous at ultraviolet wavelengths. The amount of ultraviolet light being
emitted per unit comoving volume could therefore be used as an estimator of the
volume-averaged star formation rate. If we integrate L φ(L) (where L is the
ultraviolet luminosity and φ(L) is the luminosity function at this ultraviolet
wavelength), we can calculate the ultraviolet luminosity density. The next step is
to extrapolate from the O and B star formation to calculate the formation of stars
of all types. For this, one needs to assume an initial mass function for the stars.
The next step is to account for dust obscuration, to which ultraviolet luminosity is
particularly prone. This is also difficult. As we’ve seen, the extinction depends on
both the geometry of the dust and the dust composition. One approach (known
as the Calzetti extinction law) is to use an empirical correction derived from a
comparison of models with optical–ultraviolet rest-frame spectra of galaxies
(see Calzetti, D., Kinney, A.L. and Storchi-Bergmann, T., 1994, Astrophysical
Journal, 429, 582). Figure 4.24 shows the cosmic star formation history derived
from ultraviolet observations with this extinction law. This was first known as the
‘Madau plot’ after the 1996 Madau et al. paper that first appeared to detect the
high-redshift decline (Madau, P. et al., 1996, Monthly Notices of the Royal
Astronomical Society, 283, 1388). Later usage changed to refer to the Madau
diagram or Madau–Lilly diagram (the latter acknowledging earlier work that
detected the initial increase from z = 0 to z = 1). Equivalently, the term
cosmic star formation history is
used. This diagram has been enormously influential, but it’s wise to keep in mind
the uncertainties in the underlying assumptions. We’ll see in Chapter 5 other
approaches to constraining this diagram. It is superficially similar to the evolution
of quasars, which immediately suggested a physical connection between the
formation of stars and the growth of black holes.
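The luminosity-density step above — integrating L φ(L) over the luminosity function — can be sketched numerically. A minimal illustration in arbitrary units, assuming a Schechter form (my own sketch, not the book’s calculation), checked against the closed form φ∗ L∗ Γ(α + 2):

```python
import math

def schechter(L, phi_star, L_star, alpha):
    """Schechter luminosity function phi(L)."""
    x = L / L_star
    return (phi_star / L_star) * x ** alpha * math.exp(-x)

def uv_luminosity_density(phi_star, L_star, alpha, n=20000):
    """Numerically integrate L * phi(L) dL on a log grid in x = L/L*.
    Arbitrary units; an illustration only."""
    lo, hi = math.log(1e-6), math.log(50.0)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = math.exp(lo + i * h)
        w = 0.5 if i in (0, n) else 1.0
        # integrand transformed to d(ln x): phi* L* x**(alpha+2) e**(-x)
        total += w * phi_star * L_star * x ** (alpha + 2) * math.exp(-x) * h
    return total

# Closed form for comparison: phi* L* Gamma(alpha + 2)
rho = uv_luminosity_density(1.0, 1.0, alpha=-1.25)
print(rho, math.gamma(0.75))
```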
Another way of constraining the cosmic star formation history is with the
present-day optical spectra of nearby galaxies. Population synthesis models can
be used to determine (or at least constrain) when the bulk of the stars formed, i.e.
the location of the peak in the Madau diagram. The result of this analysis applied
to the SDSS spectra of about 3 × 10⁵ galaxies shows a decline from z = 1 to
z = 0, but the analysis doesn’t have the redshift resolution to find the location of
the peak.

Figure 4.24 The cosmic star formation history (also known as the Madau diagram) derived from rest-frame ultraviolet luminosities of galaxies: log₁₀ of the star formation rate in M⊙ yr⁻¹ Mpc⁻³, against redshift z (with lookback time t/Gyr along the top axis). The data have had a correction for dust extinction applied.
In the previous chapter we spent some time discussing local galaxy scaling
relationships, so it’s worth saying briefly how these relationships change at higher
redshift. At redshifts z < 1, ellipticals show some evolution in the fundamental
plane, at least some of which may be due to passive stellar evolution, though there
are discussions of how the selection effects of magnitude-limited samples affect
the result.3 Meanwhile, spiral galaxies show evolution in the Tully–Fisher
relation, suggesting ‘differential’ evolution,4 meaning that different types of
galaxies evolve differently: low-mass galaxies appear to have undergone more
star formation more recently than higher-mass galaxies. The z ≈ 1 edge-on
spiral discs appear to have thicker widths than their z = 0 counterparts, and
disturbances such as warps were more common5 at z = 1.
The HST seems to be a long way from being limited by galaxy–galaxy overlaps,
but other telescopes have not been so fortunate. There is a threshold known as the
confusion limit beyond which one can’t rely on being able to separate individual
objects. Even the HST can reach this limit when observing a dense star cluster.
Intuitively, there has to come a point where the RMS fluctuations of the image are
no longer dependent on the noise in the image (which would reduce as 1/√time)
but instead are dominated by the fluxes of the faint sources lying below the
detection threshold.
The confusion limit is usually defined as three or five times the fluctuations from
background objects. The location of the confusion limit varies depending on the
slope of the source counts, and is surprisingly high, as the following worked
example shows. We shall use the term beam to mean the area on the sky occupied

3
See, for example, Treu, T., 2003, astro-ph/0307281, and Almedia, C., Baugh, C.M. and Lacy,
C.G., 2007, Monthly Notices of the Royal Astronomical Society, 376, 1711.
4
See, for example, Böhm, A. and Ziegler, B.M., 2007, Astrophysical Journal, 668, 846.
5
See, for example, Reshetnikov, V.P., Dettmar, R.-J. and Combes, F., 2003, Astronomy and
Astrophysics, 399, 879.

by a point source, i.e. an object that’s spatially unresolved by that telescope. For
example, if a point source has a Gaussian shape with a (standard deviation) width
of r arcseconds, then the beam area Ω could be regarded as πr² square arcseconds
(though conventions differ slightly on this choice). We’ll use a detection limit of
five times the noise, because then less than one beam in 3.5 million will have a
random detection. This may seem excessively cautious, but remember that
megapixel cameras are now very common and it’s quite possible to have many
millions of beams in an image, particularly in wide-field astronomical mosaics.

Worked Example 4.1


Find the noise caused by point sources below the detection limit Slim ,
assuming a source count slope of N (> S) = kS −α and a beam area of Ω.
At what surface density of sources are these fluctuations one-fifth of Slim ?
(You might predict a bit less than one source per beam, but this is very
wrong.) You can make use of the fact that the variance in the number of
galaxies equals the number of galaxies (Poisson statistics).

Solution
In an interval S → S + dS, the total number of sources per
unit area will be dN = αkS^(−α−1) dS. We can write this as
dN = α(N(>S)/S) dS. Therefore the number of sources in one beam will
be Ω dN = αΩ(N(>S)/S) dS. This will be subject to Poisson statistics,
so the variance will equal the mean, i.e. αΩ(N(>S)/S) dS. The variance
of the flux (as opposed to the number) will be S² times the variance in the
number, i.e. αSΩ N(>S) dS. Integrating this from S = 0 to S = Slim gives

Var(S) = ∫₀^Slim αSΩkS^(−α) dS
       = [α/(2 − α)] k Slim^(2−α) Ω
       = [α/(2 − α)] Ω N(>Slim) Slim².

Therefore the noise σ = √Var(S) will be

σ = √{[α/(2 − α)] Ω N(>Slim)} Slim.

Now, the quantity Ω N(>Slim) is the number of sources per beam.
We’ll find it more useful to work in the number of beams per source,
[Ω N(>Slim)]⁻¹. If we want the detection limit Slim to be five times the
noise level, i.e. Slim = 5σ, then

{[α/(2 − α)] Ω N(>Slim)}^(−1/2) = 5,

which rearranges to give

[Ω N(>Slim)]⁻¹ = 25α/(2 − α).
If the source count slope is Euclidean, then α = 1.5, so we’d need one
source per 75 beams! In practice the source counts flatten at the very
faintest fluxes, and one source per 20–40 beams is usually considered as the
confusion limit.
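The worked example’s result generalizes to any detection threshold of snr × σ, giving (Ω N(>Slim))⁻¹ = snr² α/(2 − α); a quick check:

```python
def beams_per_source(alpha, snr=5.0):
    """Beams per source at the confusion limit for source counts
    N(>S) = k * S**(-alpha), detecting at snr times the confusion
    noise. Generalizes the worked example's 25*alpha/(2 - alpha)."""
    return snr ** 2 * alpha / (2.0 - alpha)

print(beams_per_source(1.5))  # Euclidean counts: 75.0 beams per source
print(beams_per_source(1.0))  # flatter counts: 25.0
```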


The confusion limit severely constrains what can be said about individual objects,
but it may still be possible to constrain the shape of the source count slope from
the histogram of pixel values in a confusion-limited map. This is known as a P(D)
analysis, where D is the deflection from the mean in the map, and P is the
probability of that deflection. The observed histogram is compared to the pixel
value distribution predicted by a given source count model. We’ve also assumed
that the underlying point sources are not clustered, but clustering will increase the
confusion limit. It may also be possible to constrain the clustering of the sources
below the confusion limit by measuring the angular power spectrum of the
distribution of pixel values, known as a fluctuation analysis.
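A P(D) prediction can be mimicked with a toy Monte Carlo — my own sketch, with an arbitrary per-beam normalization k and flux cuts, and ignoring the beam profile, instrument noise and clustering:

```python
import math
import random

def simulate_pd(k, alpha, s_min, s_max, n_pix=20000, seed=1):
    """Toy P(D): k is the per-beam normalization of N(>S) = k*S**-alpha,
    truncated to [s_min, s_max]. Each pixel gets a Poisson number of
    sources whose fluxes are summed; the histogram of the returned
    values approximates the predicted P(D)."""
    random.seed(seed)
    norm = s_min ** -alpha - s_max ** -alpha
    mean_n = k * norm                      # mean sources per beam
    pixels = []
    for _ in range(n_pix):
        # Poisson draw by CDF inversion (stdlib only)
        n, p = 0, math.exp(-mean_n)
        cum, target = p, random.random()
        while cum < target:
            n += 1
            p *= mean_n / n
            cum += p
        # inverse-transform draws from dN/dS ~ S**-(alpha + 1)
        d = sum((s_min ** -alpha - random.random() * norm) ** (-1.0 / alpha)
                for _ in range(n))
        pixels.append(d)
    return pixels
```

Comparing such simulated histograms with the observed pixel distribution is the essence of the fit, although a real analysis convolves with the beam and the noise.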

Exercise 4.12 You might think that a limit of five times the RMS fluctuations
(a one in 3.5 million chance of a random noise spike for Gaussian noise) is
extraordinarily conservative. For a beam of one square arcsecond, which is typical
in some ground-based optical imaging, calculate how big an image has to be
in square degrees to have one random 5σ noise spike on average, assuming
Gaussian noise. (For comparison, there are several cameras on world-class optical
telescopes with fields of view of at least a half a degree along each side.) ■
The HDF-N and HDF-S are examples of pencil-beam surveys: very deep but
narrow surveys, covering a long and thin volume of the Universe. The opposite
strategy is a wide-field survey: shallower, but wider area. On the widest scales,
we’ve already met the SDSS survey, which covers a quarter of the sky. Digitized
versions of photographic plates taken by Schmidt survey telescopes are available
online for the whole sky, known as the Digitized Sky Survey (DSS). (The DSS is
available online from several sites.)

Exercise 4.13 For many observations, including HST imaging, the depth of an
image is proportional to the square root of the time spent integrating, i.e. the
faintest flux S is proportional to 1/√t. Suppose that the galaxy source counts
are Euclidean. Would you detect more galaxies in a pencil-beam survey or a
wide-field survey, for a fixed amount of observing time? What source count slope
would tip the balance in the opposite direction? ■
The HDF has perhaps had such an enormous impact on cosmology that time
allocation committees on the HST and other telescopes have been emboldened. In
any case, the HDF-N and HDF-S have been supplemented by various wider-area
HST surveys, most notably COSMOS (Cosmological Evolution Survey), which
covered 2 deg2 . We shall meet some of its key results in Chapter 7.
Figure 4.25 shows the redshift histogram in the HDF-N. The non-uniformity
is quite striking. This illustrates one disadvantage of pencil-beam surveys:
large-scale structure fluctuations can have a big effect on some measurements
such as the redshift distribution. This was part of the motivation for making a
second Hubble Deep Field, HDF-S. The similarity of the galaxy populations in
HDF-N and HDF-S is sometimes cited as confirmation of the cosmological
principle, though the redshift histograms differ. We saw in Chapters 2 and 3 how
the variance of the galaxy distribution depends on scale, and we can use this to
give an order-of-magnitude estimate for the fluctuations in our survey, e.g. by
defining k = 1/V^(1/3), where V is the comoving volume of the survey. (This is, in
fact, an underestimate of the fluctuations — see the further reading section.) The
clustering of dark matter is expected to be weaker at high redshift, but to first
order this is cancelled by the evolving bias parameter of galaxies, as we’ve seen.

Figure 4.25 Redshift histogram N(z) in the HDF-N estimated from a sparse sample of 140 objects, over the range 0.3 < z < 0.8. Note the peaks in the redshift distribution, caused by the pencil-beam survey passing through large-scale structures.

Figure 4.26 shows how the sizes of galaxies evolve in the Hubble UDF. Galaxies
of a given luminosity are consistently smaller at higher redshifts, with a
size-dependence scaling approximately as (1 + z)^(−1.1±0.2). What could cause this
size evolution? One clue comes from the phase space density of a galaxy. This is
the density of a galaxy (or a portion of a galaxy) in an imagined
six-dimensional space of three space dimensions (x, y, z) and three velocity axes
(vx, vy, vz). One can estimate it by dividing the mass density by the volume
of an ellipsoid with the axes equal to the velocity dispersions in each of the
three velocity axes. This is a useful quantity because numerical simulations of
galaxy–galaxy mergers have shown that phase space density decreases by a factor
of a few during a merger, and it can be shown by Liouville’s theorem (Chapter 7)
that phase space density cannot be increased, unless the stars’ kinetic energy is
dissipated into, for example, gas motions or radiation. Some elliptical galaxies
have phase space densities consistent with being the merger product of spiral
galaxy collisions, and numerical simulations predict that the final product would
have an elliptical morphology. On the other hand, the cores of giant elliptical
galaxies have phase space densities much higher than those of spiral galaxies, so
they cannot be formed by (dissipationless) mergers. However, Lyman break
galaxies at z > 5 have phase space densities similar to the cores of present-day
massive ellipticals. Could these be the progenitors of today’s giant ellipticals?
Perhaps. We’ll meet another population of galaxies making a similar claim in
Chapter 5: the submm galaxies, which have giant starbursts as expected in the
original monolithic collapse model (Chapter 3). It seems that we’re still some way
from resolving the monolithic collapse versus disc–disc merger debate on the
origin of elliptical galaxies.

Figure 4.26 The half-light radii rhl (in kpc; the radii containing 50% of the light) of galaxies in the Hubble UDF, plotted against redshift z.
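The coarse-grained phase-space density estimate described above can be sketched as follows — an order-of-magnitude illustration only; the isotropic-dispersion and uniform-sphere assumptions are mine, and the example values are taken from the compact z > 1 galaxy of Figure 4.27 discussed below:

```python
import math

M_SUN = 1.989e30   # kg
KPC = 3.086e19     # m

def phase_space_density(mass_msun, radius_kpc, sigma_kms):
    """Mass density divided by the volume of a velocity-space ellipsoid,
    here taken isotropic (sigma_x = sigma_y = sigma_z) with a uniform
    sphere in real space. Returns f in kg m**-3 (m/s)**-3."""
    m = mass_msun * M_SUN
    r = radius_kpc * KPC
    s = sigma_kms * 1e3
    rho = m / (4.0 / 3.0 * math.pi * r ** 3)   # mass density
    v_vol = 4.0 / 3.0 * math.pi * s ** 3       # velocity-space volume
    return rho / v_vol

print(phase_space_density(2e11, 0.78, 510))
```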
The sizes of high-redshift elliptical galaxies have recently thrown up another
fundamental puzzle, which at the time of writing is unresolved. There is a very
numerous population of small, passively-evolving (i.e. not star-forming) elliptical
galaxies at z > 1 whose luminosities imply large stellar masses of > 10¹¹ M⊙
(see van Dokkum, P.G., Kriek, M. and Franx, M., 2009, Nature, 460, 717, and
references therein). These are sometimes called red nuggets, and they have no
local counterparts. Could their luminosities be a misleading measure of the
underlying stellar mass, for some reason? Extremely deep spectroscopy of one
example suggests not:


Figure 4.27 The spectrum of galaxy 1255-0 (grey), with a smoothed version shown in black and the best-fit
population synthesis model shown in red. The wavelengths of some absorption lines are marked in yellow.
The insert shows the likelihood distribution of the galaxy’s velocity dispersion, using two different methods of
estimating the noise. The panels to the right show the HST 1.6 µm image, a model, and the difference between the
two.
despite a small effective radius of just 0.78 ± 0.17 kpc, this galaxy has an
enormous velocity dispersion of 510 (+165/−95) km s⁻¹ (Figure 4.27), implying a
stellar mass of around 2 × 10¹¹ M⊙. How do these galaxies transform into present-day
massive ellipticals? Major galaxy–galaxy mergers don’t appear to puff up the
sizes by enough, though a large number of accretion events of small galaxies
might work, as might a significant energy input from supermassive black hole
accretion, which we’ll discuss in Section 5.8. The implication of having many
minor mergers has been seen by some as significant evidence against monolithic
collapse models for massive ellipticals; whatever mechanisms are involved, a
single monolithic collapse cannot be the whole story. But with so many minor
mergers, how can the tight dispersion in the fundamental plane be maintained?


What about elliptical galaxies that form later — why do they conform to the same
fundamental plane? One possibility is that red nuggets are the cores of giant
elliptical galaxies at z > 1, and we simply haven’t imaged deeply enough to see
the diffuse faint outer regions (see Mancini, C. et al., 2009, arXiv:0909.3088).
Clearly, there are still many unanswered questions about the evolution of massive
galaxies.

4.8 Morphological K-corrections


At high redshifts, optical imaging samples the rest-frame ultraviolet light in
galaxies. Might this affect the observed morphologies of high-redshift galaxies?
Examples in the local Universe strongly suggest that it might. Figure 3.10 showed
the familiar optical image of the Andromeda galaxy, M31. Compare this to
Figure 4.28, which is an ultraviolet (135–275 nm) image taken with NASA’s
Galaxy Evolution Explorer (GALEX) space telescope. As we saw in Figure 4.6,
young hot O stars and B stars emit most of their light in the ultraviolet, so
we shouldn’t be too surprised to see the ultraviolet images dominated by star
formation in H II regions. The rest-frame ultraviolet morphology of M31 is quite
different to the rest-frame optical morphology. Could the unusual high-redshift
morphologies in the Hubble Deep Fields, such as chain galaxies, similarly be due
to the different rest-frame wavelengths at high redshifts?

Figure 4.28 The Andromeda galaxy, M31, as seen in the ultraviolet.

This effect is sometimes known as morphological K-corrections by analogy to
the K-correction effect on fluxes discussed in Chapter 1. The key test is whether
the rest-frame optical morphologies (observed-frame near-infrared) in the Hubble
Deep Fields are the same as those in the rest-frame ultraviolet (observed-frame
optical). Some of the highest angular resolution near-infrared images have been
made with the HST’s NICMOS camera, and the NICMOS imaging of the Hubble
Deep Field North found very similar morphologies, as shown in Figure 4.29. At
the moment it appears that morphological K-corrections are not a strong effect for
most galaxies, though counter-examples in a minority have been found.

4.9 The blue cloud and red sequence


The evolution of stars is completely characterized by their tracks on the
Hertzsprung–Russell diagram, also known as the colour–magnitude diagram. A
star’s position on the colour–magnitude diagram can reveal a great deal about its
composition and internal structure, past and future. Galaxies are made (partly) of
stars, so can we characterize their evolution on the galaxy colour–magnitude
diagram? This approach was first taken in 2004 by Eric Bell and collaborators
(Bell, E.F. et al., 2004, Astrophysical Journal, 608, 752) and generated some
useful insights. However, keep in mind that much less of the information about
galaxies is encoded in the galaxy colour–magnitude diagram than is the case with
stars, because galaxy properties are not uniquely determined by the average
luminosity and colour. Figure 4.30 shows the colour–magnitude diagram of
galaxies from the COMBO-17 survey.
The distribution of galaxies in the colour–magnitude plane is typically segregated
into the red sequence of red, passively-evolving galaxies and the blue cloud of
more actively star-forming galaxies. Broadly speaking, the red sequence is
occupied by early-type galaxies and the blue cloud by late-type galaxies.

Figure 4.29 Optical and near-infrared (0.8, 1.1, 1.6 µm) morphologies of
galaxies in the Hubble Deep Field North, at a variety of redshifts (panels at
z = 0.75, 0.95, 0.96, 1.01, 1.36, 2.01, 2.27 and 2.80). Most optical and
near-infrared morphologies are similar, so morphological K-corrections are not
responsible for observed morphological evolution.

The underpopulated narrow region between these two has come to be known
as the green valley. Some authors have used the red sequence as a de facto
morphology-independent definition of early-type galaxies at z < 1; in fact, one
result of the Galaxy Zoo project is that the colour–density relation is stronger
than the morphology–density relation (Chapter 3) in the local Universe
(Bamford, S.P. et al., 2009, Monthly Notices of the Royal Astronomical Society,
393, 1324). The red sequence and blue cloud both evolve with redshift
(Figure 4.30). The blue cloud becomes slightly redder with time, perhaps
because of stellar ageing or increasing
dust content. There are also far more luminous blue cloud galaxies at z > 0.5 than
in the local Universe. Similarly, the red sequence becomes redder on average with
time, consistent with passive stellar evolution. Taking into account the effects of
passive stellar evolution, the numbers and magnitudes of galaxies in the red
sequence imply a build-up of stellar mass in early-type galaxies by a factor of 2
since z = 1.
These observations could be explained if some star-forming galaxies in the blue
cloud stop forming stars, then move to the red sequence and evolve passively.
Galaxy evolution in general would then be a story of formation in the blue
cloud (or merger-induced starbursts moving a system into the blue cloud), then
migration across the green valley to join the red sequence. This, however, raises
the question of what mechanism stopped the star formation. There have been
suggestions that active galaxies are more common in the green valley, suggesting
that AGN activity is somehow responsible for truncating star formation (because
the green valley objects might be expected to be transition objects). Furthermore,
ultraviolet estimates of the star formation rate in (morphologically) early-type
galaxies find that the amount of star formation anti-correlates with the velocity
dispersion of the galaxy, but not with the overall galaxy luminosity
(Schawinski, K. et al., 2006, Nature, 442, 888). We’ll see in Chapter 6 that this
velocity dispersion is closely linked to the mass of the central
supermassive black hole. The truncation of star formation does appear to have
something to do with the central black hole.


Figure 4.30 The blue cloud and red sequence, shown in nine redshift bins from
0.2 < z ≤ 0.3 to 1.0 < z ≤ 1.1. Each panel plots rest-frame (U − V ) colour
against absolute magnitude MV − 5 log10 h. The sloping solid lines are a fit to
the red sequence location, and the dashed line marks the distinction between
galaxies regarded as ‘blue’ or ‘red’. The sloping dotted line is the approximate
apparent magnitude limit of the survey. Simulated evolutionary tracks of some
galaxies are shown with lines and crosses. The predicted movement of a galaxy
undergoing a reddening of AV = 1 is shown as a vector. Only representative
error bars are shown.

A detailed comparison of galaxy evolution models with the evolving blue cloud
and red sequence, together with the requirements of needing to reproduce the
Madau diagram and the evolution of the total stellar mass density Ω∗ , revealed
that the number densities of the most massive early-type galaxies could not be
reproduced. One possibility is that these most massive early-type galaxies are the

result of dry mergers, i.e. mergers of galaxies with very little gas, so no star
formation results. However, there is currently debate in the community as to
whether dry mergers could account for the mass–metallicity relation in early-type
galaxies, and the structure and sizes of early-types. The mass–metallicity relation
is the correlation between stellar mass and the metal enrichment as measured by
(for example) emission line ratios in star-forming galaxies. There is much that we
have yet to understand about the formation of the most massive elliptical galaxies.

Summary of Chapter 4
1. Dark matter is described as ‘cold’, ‘hot’ or ‘warm’ according to the relative
speeds of the dark matter particles.
2. The hierarchical formation model of large-scale structure describes the
merger of dark matter haloes to make progressively larger dark matter
haloes.
3. Population synthesis models describe the evolution of galaxy spectra by
modelling the evolution of stars (and dust) within them.
4. Dust extinction in the V-band is expressed as AV , measured in magnitudes.
This can be measured with the Balmer decrement, among other ways,
though the estimates are sensitive to assumptions about the dust distribution.
5. Redshifts can be determined from galaxy emission lines. Redshifts can also
be estimated by modelling the changes of observed colours with redshift, a
technique known as photometric redshifts. An important example of a
photometrically-selected high-redshift galaxy population is the Lyman break
galaxy population.
6. The luminosity function of galaxies is the number per unit luminosity (or per
decade luminosity), per unit comoving volume. It can be measured using the
1/Vmax statistic.
7. The V /Vmax values can be used to test for evolution, or in local galaxy
samples to test for incompleteness.
8. Type 1 active galaxies have a direct view of the quasar’s broad-line region,
whereas type 2 active galaxies do not. The type 2 systems can be
distinguished from star-forming galaxies using emission line ratios.
9. Quasars evolve strongly, with a peak in number density around z ≈ 2.5 and
a decline at higher redshifts.
10. The rest-frame ultraviolet luminosity density can be used to measure the
comoving star formation density of the Universe, known as the Madau–Lilly
diagram or the Madau diagram. This is similar, but it seems not identical, to
quasar evolution.
11. The Madau diagram can also be inferred from the ages of stars in local
galaxies.
12. The confusion limit restricts how deep a given telescope can image. This
limit is conventionally set at 3 or 5 times the noise per beam from
background objects.


13. Pencil-beam surveys such as the Hubble Deep Fields have redshift
histograms that show peaks due to the large-scale structures through which
the surveys pass.
14. Although the rest-frame ultraviolet morphologies of local galaxies can be
quite different to the rest-frame optical morphologies, it appears that these
differences are not responsible for most of the changing appearance of
galaxies in deep Hubble Space Telescope (observed-frame) optical surveys.
15. High-redshift galaxy surveys have used galaxy colour–magnitude diagrams
to generalize the local division of early-type and late-type galaxies into the
red sequence, the blue cloud and the green valley.

Further reading
• For more on the evolution of large-scale structure, see the graduate-level text
Peacock, J.A., 1999, Cosmological Physics, Cambridge University Press.
• Alternatively, try Coles, P. and Lucchin, F., 1995, Cosmology, Wiley.
• For more on gamma-ray bursts and active galaxies, see Kolb, U., 2010,
Extreme Environment Astrophysics, Cambridge University Press.
• Antonucci, R., 1993, ‘Unified models for active galactic nuclei and quasars’,
Annual Review of Astronomy and Astrophysics, 31, 473.
• Kennicutt, R.C., 1998, ‘Star formation in galaxies along the Hubble sequence’,
Annual Review of Astronomy and Astrophysics, 36, 189.
• There is a curious analogy between the clustering of galaxies or CMB
fluctuations, and the uncertainty principle in quantum mechanics: see
Tegmark, M., 1995, Astrophysical Journal, 455, 429, and Tegmark, M., 1996,
Monthly Notices of the Royal Astronomical Society, 280, 299.
• For an accessible review on the conundrums posed by red nugget galaxies, see
Glazebrook, K., 2009, Nature, 460, 694.
• Binney, J. and Tremaine, S., 2008, Galactic Dynamics, Princeton University
Press.
• To view John Michell’s famous paper of 1767 go to
[Link]

Chapter 5 The distant multi-wavelength Universe
The distant view is not always the truest view.
Nathaniel Hawthorne

Introduction
We’ve seen how strongly dust can affect optical observations — but what
happens hidden behind the dust? One of the great surprises in cosmology in
the past decade has been the tremendous amount of star formation and black
hole accretion hidden behind heavy dust extinction. This is invisible to optical
telescopes but not to other telescopes, as we’ll see in this chapter.

5.1 The extragalactic optical and infrared background light


What colour is the sky in space? It would appear black to us, but with a sensitive
enough camera we’d be able to detect the background light from all the stars and
galaxies that have ever existed. This would appear, if we had the eyes to see it, as
a dull reddish colour. Of course, this background is not restricted to optical light:
Figure 5.1 shows the extragalactic background at X-ray to radio wavelengths.
Figure 5.1 The spectrum of the extragalactic sky, plotted as νIν (in
nW m⁻² sr⁻¹) against wavelength λ/µm. To the right is the CMB, peaking
around 10³ µm, which dominates the energy density of photons in our Universe.
Moving to the left, there is the contribution from the far-infrared extragalactic
background light, peaking at around 10² µm. Around 10¹ µm is a minimum.
(There are stringent upper and lower limits at 15 µm, and for clarity we’ve just
drawn this as a single data point, though it’s not a direct measurement.) At
shorter wavelengths still there is the optical and near-infrared extragalactic
background light, peaking around 10⁰ µm. At the shortest wavelengths we have
the cosmic X-ray background.

The most striking features of Figure 5.1 are that there’s a bump in the optical and
near-infrared, a valley at around 15 µm, and another bump in the far-infrared. The
far-infrared bump is believed to be thermal radiation from dust in galaxies. As
Exercise 5.1 shows, the energy output in the far-infrared bump is roughly the
same as the output in the optical/near-infrared bump. The consequences are
profound: roughly speaking, for every two photons created by stars or by black
hole accretion, one photon has been absorbed by dust and its energy has been
re-radiated as thermal emission by the dust.


Exercise 5.1 The background intensity Iν per unit frequency is sometimes


measured in W m−2 Hz−1 sr−1 . Show that the quantity νIν is proportional to the
background intensity per decade in frequency (i.e. the intensity per factor of 10
interval in frequency), and hence that there’s about as much energy output in the
far-infrared bump as in the optical/near-infrared bump. ■
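A hint for the first part of the exercise: changing the integration variable to ln ν gives

```latex
\int I_\nu \,\mathrm{d}\nu
  = \int \nu I_\nu \,\frac{\mathrm{d}\nu}{\nu}
  = \int \nu I_\nu \,\mathrm{d}(\ln \nu)
  = \ln 10 \int \nu I_\nu \,\mathrm{d}(\log_{10} \nu),
```

so νIν is, up to a factor of ln 10, the intensity per decade of frequency; since the two bumps in Figure 5.1 reach similar heights in νIν and have comparable logarithmic widths, their integrated energies are roughly equal.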
The light from all the stars, galaxies and dust that have ever existed is only a small
part of the cosmic photon energy budget. Figure 5.1 compares the cosmic
microwave background to the extragalactic far-infrared and optical backgrounds.
The energy density from the luminous output of every object in the history of the
Universe is only about 5% of the energy density of the CMB.
Another way to think of the extragalactic background from objects is to integrate
the contributions as a function of flux. We’ll write the flux as Sν , to refer to the
flux density (in, for example, W m−2 Hz−1 ) for the energy received from a galaxy,
per unit time and frequency, measured at a frequency ν. The number of objects in
a flux interval Sν → Sν + dSν is (dN/dSν ) dSν , so the intensity from them must
be Sν (dN/dSν ) dSν . The extragalactic background must therefore be
Iν = ∫_0^∞ Sν (dN/dSν ) dSν . (5.1)
As we’ve already seen, a Euclidean source count slope has an integrated
background light that diverges at small fluxes (Chapter 1). Therefore the source
counts must at some faint flux eventually be less steep than the Euclidean slope.
The galaxies that dominate the extragalactic background light per unit interval in
flux will be the ones that have the largest values of Sν dN/dSν . Nevertheless,
it’s common in extragalactic astronomy to plot the Euclidean-normalized
differential source counts, which means dN/dSν multiplied by Sν^2.5. If the
source counts are Euclidean, then a plot of Sν^2.5 dN/dSν is a horizontal line.
These diagrams are used to illustrate the deviations from the Euclidean slope.
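To make Equation 5.1 and the Euclidean normalization concrete, here is a toy calculation. The break flux, slopes and normalization below are illustrative assumptions, not fitted values; the point is that counts that flatten below the Euclidean slope give a finite background:

```python
import math

def dN_dS(S, k=1.0, S_break=1.0, faint_slope=1.5):
    """Toy differential source counts: Euclidean (S^-2.5) above the break flux,
    flatter (S^-faint_slope, with faint_slope < 2) below it, continuous at the
    break. All parameter values are illustrative, not fits to real data."""
    if S >= S_break:
        return k * S**-2.5
    k_faint = k * S_break**(faint_slope - 2.5)
    return k_faint * S**-faint_slope

def background(S_min, S_max, n=100000):
    """Numerically integrate I = ∫ S (dN/dS) dS (Equation 5.1), working in
    u = ln S so that the faint end is well sampled (midpoint rule)."""
    u_min, u_max = math.log(S_min), math.log(S_max)
    du = (u_max - u_min) / n
    total = 0.0
    for i in range(n):
        S = math.exp(u_min + (i + 0.5) * du)
        total += S * dN_dS(S) * S * du   # dS = S du
    return total

# Euclidean-normalised counts S^2.5 dN/dS: flat above the break, falling below.
for S in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"S = {S:7.2f}   S^2.5 dN/dS = {S**2.5 * dN_dS(S):.3f}")

# The background integral converges as S_min -> 0 (pure Euclidean counts would
# diverge; the analytic value for this toy model over all fluxes is 4):
for S_min in (1e-2, 1e-4, 1e-6):
    print(f"S_min = {S_min:.0e}   I = {background(S_min, 1e4):.3f}")
```

Varying `faint_slope` towards 2 makes the convergence slower, which is one way of seeing why the faint-end slope of the counts controls who dominates the background.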
Figure 5.2 shows some example Euclidean-normalized counts in the mid-infrared.
This figure uses a non-SI unit that is very common in astronomy, namely the
jansky (symbol Jy), after the radio astronomer Karl Jansky:

1 Jy = 10−26 W m−2 Hz−1 . (5.2)

● In this plot, a no-evolution curve is plotted. Why is it not horizontal?


❍ The Euclidean (horizontal) slope assumes a flat unexpanding space with no
redshift, while the plotted no-evolution model includes these effects.
The source count diagram in Figure 5.2 shows a Euclidean slope, a pronounced
bump, and a shallower-than-Euclidean slope at the faintest fluxes. The bumps are
broad, so the galaxies in these bumps tend also to be the ones that dominate
the extragalactic backgrounds per unit interval in flux at those wavelengths
(i.e. Sν dN/dSν is maximum).

Exercise 5.2 We’ve just seen that the flux interval dSν that contributes the
most background will be the one in which Sν dN/dSν is a maximum. Which
logarithmic flux interval d ln Sν contributes the most background? ■


Figure 5.2 Galaxy differential source counts from a variety of sources at an
observed wavelength of λ = 15 µm, normalized to the Euclidean prediction: the
figure plots (dN/dS) S^2.5 (in mJy^1.5 deg⁻²) against flux S (in mJy). Also
shown are a no-evolution model (black) and a galaxy evolution model that better
fits the data (red).

Another way of thinking about these backgrounds is as an integral over redshift of
the galaxy populations. Suppose that the comoving luminosity density at some
redshift z and some frequency ν is Eν = ∫ Lν φ(Lν ) dLν (so Eν could be
measured in, for example, W per Hz per cubic comoving Mpc). In general, this
luminosity density Eν will depend on redshift (because galaxies evolve) and on
frequency (because galaxies don’t have flat featureless spectra). We’ll write
this as Eν = Eν (ν, z). The subscript ν is there as a reminder that it’s per unit
frequency. To find the background energy density Bν (ν0 ) at some observed
frequency ν0 , we can just add up the contributions to the comoving intensity
throughout time, taking account of the fact that the rest-frame wavelength is
different to the observed wavelength by a factor of (1 + z):

Bν (ν0 ) = ∫_{t=0}^{t0} Eν ((1 + z)ν0 , z) dt, (5.3)

where t0 is the present-day age of the Universe, as in Chapter 1. To convert this to
an energy flux density, measured in (for example) J s⁻¹ Hz⁻¹ sr⁻¹, we multiply
this by c/(4π):

Iν (ν0 ) = (c/4π) ∫_{t=0}^{t0} Eν ((1 + z)ν0 , z) dt. (5.4)

It turns out that this can also be expressed quite simply as an integral over
comoving distance. From Equation 1.42 we have that

ddcomoving = −c dz/H,

and using Equation 1.28 we can write this as

ddcomoving = c dz/((1 + z)⁻¹ dz/dt) = c(1 + z) dt.

So Equation 5.4 becomes an integral of the comoving luminosity density over
comoving distance:

Iν (ν0 ) = (1/4π) ∫ Eν ddcomoving/(1 + z). (5.5)
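To see how Equation 5.4 might be evaluated numerically, here is a minimal sketch; the cosmological parameters and the toy luminosity density Eν(ν, z) are illustrative assumptions, not measurements of any real population:

```python
import math

H0 = 70.0 * 1000 / 3.086e22   # Hubble constant in s^-1 (assumed 70 km/s/Mpc)
OMEGA_M, OMEGA_L = 0.3, 0.7   # assumed flat LambdaCDM

def H(z):
    """Hubble parameter H(z) for a flat LambdaCDM universe, in s^-1."""
    return H0 * math.sqrt(OMEGA_M * (1 + z)**3 + OMEGA_L)

def E_nu(nu, z):
    """Toy comoving luminosity density in W Hz^-1 m^-3: a flat spectrum with
    (1+z)^3 luminosity evolution out to z = 2, constant beyond. Purely
    illustrative -- not a fit to any real galaxy population."""
    evolution = (1 + min(z, 2.0))**3
    return 1e-33 * evolution

def background(nu0, z_max=6.0, n=10000):
    """Equation 5.4 rewritten as an integral over z, using dt = dz/((1+z)H):
    I_nu(nu0) = (c/4pi) * integral of E_nu((1+z)nu0, z) dz / ((1+z) H(z))."""
    c = 2.998e8  # speed of light in m/s
    dz = z_max / n
    total = 0.0
    for i in range(n):
        z = (i + 0.5) * dz
        total += E_nu((1 + z) * nu0, z) / ((1 + z) * H(z)) * dz
    return c / (4 * math.pi) * total

# Background intensity at an observed frequency of 3e14 Hz (~1 micron):
print(f"I_nu = {background(3e14):.3e} W m^-2 Hz^-1 sr^-1")
```

With a realistic, frequency-dependent Eν(ν, z) from population synthesis or survey luminosity functions, the same loop recovers the curves in Figure 5.1.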
161
Chapter 5 The distant multi-wavelength Universe

Now, we’ve already seen how the galaxies that dominate the ultraviolet luminosity
density can dominate the (optically-derived) cosmic star formation history. What
this means is that the galaxies that dominate the cosmic star formation history at
some redshift are necessarily the same ones that contribute the most to the
extragalactic background, at that frequency and redshift. Finding out which
galaxies dominate the extragalactic background light is a very similar research
problem to measuring the cosmic star formation history.
Reproducing the extragalactic background light is therefore a key objective of
source count models, one of which is plotted in Figure 5.2. These models
aim to account for the observed number counts and (where available) redshift
distributions, at all wavelengths, and reproduce the present-day stellar mass
density Ω∗ . There are many approaches. One approach is to vary the numbers of
galaxies of different types with redshift, and find the best-fit evolution for this
assumed population mix. Another approach is semi-analytic modelling, in which
the locations of dark matter haloes are given by a numerical model, and the haloes
are populated by galaxies based on some physical assumptions with adjustable
free parameters. Ideally, one would like to simulate a cosmological volume right
down to the scales of the formation of stars in molecular clouds, but this is a very
long way from being computationally possible, so many source count models rely
on template galaxy spectral energy distributions (SEDs), i.e. template spectra
from the ultraviolet to the far-infrared and beyond, to extend the galaxy number
count predictions to different wavelengths. These templates are sometimes
predictions from numerical radiative transfer models of dust and stars in particular
galaxies, or sometimes taken directly from observations.

5.2 Submm galaxies and K-corrections


Imagine that you had some spectacular power to push a galaxy to higher redshift.
Ordinarily, you’d expect the galaxy to appear fainter. We saw in Chapter 1 that the
difference between observed and rest-frame wavelengths changes the observed
flux of a high-redshift galaxy — an effect known as the K-correction. At submm
wavelengths and mm wavelengths, galaxy spectra are dominated by the steep
Rayleigh–Jeans slope of the thermal dust emission. The K-correction is therefore
extremely strong and can make a more distant galaxy brighter than a closer
identical galaxy! Figure 5.3 shows a redshifted galaxy template spectrum, keeping
its luminosity constant. At most observed wavelengths the galaxy is fainter at
high redshifts, except at wavelengths around 1 mm. Figure 5.4 shows how the
observed flux of a dusty galaxy is predicted to vary with redshift.
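This negative K-correction can be reproduced with a short calculation. The modified-blackbody SED (dust temperature T = 38 K, emissivity index β = 1.5, echoing the parameters quoted in Figure 5.4) and the cosmology below are illustrative assumptions rather than a fit to any particular galaxy:

```python
import math

# Assumed flat LambdaCDM, as in Figure 5.4 (Omega_m = 0.3, Omega_L = 0.7, h = 0.65)
H0 = 65.0 * 1000 / 3.086e22      # Hubble constant in s^-1
C = 2.998e8                      # speed of light in m/s
OM, OL = 0.3, 0.7

def comoving_distance(z, n=2000):
    """d_c = c * integral_0^z dz'/H(z') for a flat universe, in metres."""
    dz = z / n
    total = 0.0
    for i in range(n):
        zp = (i + 0.5) * dz
        total += dz / (H0 * math.sqrt(OM * (1 + zp)**3 + OL))
    return C * total

def greybody(nu, T=38.0, beta=1.5):
    """Modified blackbody L_nu ~ nu^(3+beta)/(exp(h nu / kT) - 1), arbitrary units."""
    h_over_k = 6.626e-34 / 1.381e-23
    return nu**(3 + beta) / math.expm1(h_over_k * nu / T)

def observed_flux(z, lam_obs):
    """Relative observed flux density at observed wavelength lam_obs (metres):
    S_nu ~ (1+z) L_nu((1+z) nu_obs) / (4 pi d_L^2), with d_L = (1+z) d_c."""
    nu_obs = C / lam_obs
    d_L = (1 + z) * comoving_distance(z)
    return (1 + z) * greybody((1 + z) * nu_obs) / (4 * math.pi * d_L**2)

# At 850 microns the steep Rayleigh-Jeans slope nearly cancels the dimming:
for z in (1, 2, 4, 8):
    ratio = observed_flux(z, 850e-6) / observed_flux(1, 850e-6)
    print(f"z = {z}:  S(z)/S(z=1) at 850 um = {ratio:.2f}")
```

The printed ratios stay of order unity from z = 1 to z = 8: pushing the galaxy to higher redshift slides an ever-brighter part of the rest-frame dust spectrum into the 850 µm band, which is the selection property that makes submm surveys so redshift-insensitive.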
But in order to make use of this K-correction, we need cameras that operate at
submm or mm wavelengths. The first was the Submillimetre Common User
Bolometer Array (SCUBA) on the Anglo–Dutch–Canadian James Clerk
Maxwell Telescope (JCMT) in Hawaii. Prior to SCUBA, the best available
models of galaxy evolution predicted zero or at most one galaxy detectable
in even the deepest single SCUBA exposures. However, it’s often the way
that opening up a new wavelength regime in astronomy brings unexpected
discoveries, and SCUBA’s image of the HDF-N field and of galaxy cluster lenses
(Figure 5.5; see also Chapter 8) revealed a completely unanticipated population of
far-infrared-luminous high-redshift galaxies!

Figure 5.3 The spectral energy distribution of the star-forming galaxy M82
(flux density in mJy against wavelength λ/µm), as it would appear at different
redshifts. The colours from blue to red refer to redshifts of 0.1, 0.3, 1, 2, 3, 4, 5
and 6. Note that at submm-wave and millimetre wavelengths, the K-corrections
are strong enough to counter the cosmological dimming. The same galaxy is
shown on the cover of this book. M82 is often used as a template SED for a
starburst galaxy.

Figure 5.4 The flux variation with redshift of a typical star-forming galaxy
with a fixed luminosity (5 × 10¹² L☉, dust temperature 38 K, assuming
Ωm,0 = 0.3, ΩΛ,0 = 0.7, h = 0.65), at a variety of observed-frame wavelengths:
from blue to red, λ = 24, 70, 110, 160, 200, 350, 450, 500, 850, 1100, 1400 and
2100 µm, together with optical and 1.4 GHz radio curves.

These galaxies were soon known as ‘SCUBA galaxies’, but as other cameras
became available (with names such as MAMBO, BOLOCAM, AzTEC,
SHARC-II, BLAST, LABOCA) the more generic term of submillimetre galaxies

(or SMGs) became current. The selection function of SMGs (or their mm-wave
counterparts MMGs) is strikingly uniform, as you can see in Figure 5.4: an SMG
at z = 10 would have almost the same brightness as an identical galaxy at z = 1.
● Would the histogram of redshifts of SMGs also be uniform?
❍ No. Even if the number density of SMGs didn’t evolve, we’d still be sampling
different amounts of comoving volume at different redshifts, i.e. dV /dz is not
constant.

Figure 5.5 The 850 µm image of the Hubble Deep Field North. This image has
a radius of 100 arcseconds. The zoom shows the corresponding Hubble Space
Telescope data in the region of the brightest SMG.
An intense campaign of optical spectroscopy with the Keck telescopes found a
median redshift of around z = 2.2 for SMGs.6 Even without redshifts, the
far-infrared luminosities suggested star formation rates of around 1000 M☉ per
year. (Because submm flux is more or less independent of redshift at 1 < z < 10,
the luminosities can be estimated even without redshifts.) We’ll discuss in
Section 5.7 how SMGs changed our physical picture of galaxy formation and
evolution.
Bigger submm- and mm-wave cameras have led to larger surveys of submm- and
mm-wave galaxies. One daring experiment to survey parts of the sky at submm
wavelengths involved dangling a telescope with a 2 m primary mirror from a
weather balloon! In 2006 the Balloon-borne Large Aperture Submm Telescope
(BLAST) flew for 11 days around the South Pole and mapped the Chandra Deep
Field South (CDF-S) field (among other fields), shown in Figure 5.6. Highly
redshifted galaxies should be visible at the longest wavelengths, but less visible at
the shorter wavelengths because the peak of the emission has redshifted past
(Figure 5.3). BLAST used the same detector technology as the SPIRE instrument
on the ESA Herschel Space Observatory, which launched in 2009. The first
images from Herschel have been spectacular. Figure 5.7 shows the local spiral
galaxy M74. Even in this short exposure, there are many background galaxies,
which appear to be clustered.

6 Chapman, S.C. et al., 2005, Astrophysical Journal, 622, 772.

Figure 5.6 Submm-wave maps of the Chandra Deep Field South region from
the BLAST mission, showing a deep 1 deg² exposure and a shallow 8 deg²
exposure at λ = 0.25, 0.35 and 0.50 mm, plus the total intensity (the sum of the
three wavelengths). For reference, the same features have been circled in each of
the images.

But what are these SMGs, apart from being galaxies detected at submm
wavelengths? To find out, we need to cross-match the submm-wave objects with
images or catalogues at other wavelengths, including the optical if we want to take
optical spectroscopy of the optical counterpart. However, the diffraction limit of
telescopes makes this difficult: the angular resolution of a circular aperture is
1.22λ/D radians, where λ is the wavelength of the light and D is the diameter of
the aperture.
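For instance, the 1.22λ/D formula can be evaluated directly (here for the 15 m JCMT at the 850 µm SCUBA wavelength; the instrument choice is just illustrative):

```python
import math

ARCSEC_PER_RAD = 180 / math.pi * 3600  # ~206 265 arcseconds per radian

def diffraction_limit_arcsec(wavelength_m, diameter_m):
    """Angular resolution 1.22 * lambda / D of a circular aperture, in arcsec."""
    return 1.22 * wavelength_m / diameter_m * ARCSEC_PER_RAD

# The 15 m JCMT observing at 850 microns:
theta = diffraction_limit_arcsec(850e-6, 15.0)
print(f"JCMT beam at 850 um: {theta:.1f} arcsec")  # ~14 arcsec
```

A beam this large can contain several faint optical galaxies, which is exactly the identification problem discussed below.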

Exercise 5.3 Calculate the angular resolution in arcseconds of the Herschel


Space Observatory (diameter 3.5 m) at its longest imaging wavelength of
500 µm. ■
Figure 5.5 illustrates the problem. Submm-wave imaging has the benefit of
negative K-corrections, but imaging at other wavelengths does not. The positions
of SMGs are typically only accurate to a few arcseconds or tens of arcseconds, but
within this area there can be many optical galaxies. Ultra-deep radio mapping has
managed to detect around half of the SMGs, but the rest were unidentified in even
the deepest 1.4 GHz images of the sky. It’s possible to improve the positions using
submm- or mm-wave interferometry, but this takes very long exposures and is
currently feasible only for the brightest SMGs.

Figure 5.7 First-light image from the Herschel Space Observatory at 250 µm,
of the nearby galaxy M74. There are also many background SMGs visible.
This will change dramatically with the Atacama Large Millimeter/Submillimeter
Array (ALMA), a huge interferometer in Chile that is scheduled to start full
operations no earlier than 2012. In the meantime, the Spitzer Space Telescope
achieved a breakthrough in identifying SMGs: soon after its launch in 2003, it
found that it could identify SMGs in only ten-minute snapshots. There does
appear to be some considerable overlap between the populations of galaxies seen
by Spitzer, and (except for a minority at very high redshifts) the population
of SMGs. There are also good indications from stacking analyses that
the galaxies that dominate the extragalactic background light at Spitzer’s

mid-infrared wavelengths are largely the same as the populations that dominate
the submm-wave extragalactic background at < 500 µm. However, much of the
background at 850–1100 µm is still unaccounted for.
We’ve already met the population of Extremely Red Objects (EROs), some of
which appear to be dusty starbursts, particularly at fainter K-band apparent
magnitudes. It turned out that approximately 10–20% of SMGs are also ERO
galaxies, and early indications are that a similar fraction appear to be BzK
starbursts. There is now a considerable variety of definitions of various types of
red galaxies, with often partly overlapping memberships, such as Distant Red
Galaxies (DRGs) with (J–K) > 2.3, or Dust Obscured Galaxies (DOGs) with
S24 /SR > 1000 (where SR is the R-band flux, and S24 is the 24 µm flux). DRGs
and BzK galaxies contribute tens of percent to the cosmic submm background
light. (See, for example: Pope, A. et al., 2008, Astrophysical Journal, 689, 127;
Knudsen, K.K. et al., 2005, Astrophysical Journal Letters, 632, 9; Takagi, T.
et al., 2007, Monthly Notices of the Royal Astronomical Society, 381, 1154.)

One final subtlety is that selection effects may have an insidious effect on the
types of galaxies seen in the far-infrared and submm. If there are populations of
galaxies with lots of cool dust radiating predominantly at longer wavelengths, we
might expect these galaxies to be over-represented in SMG samples. Similarly,
galaxies selected in the far-infrared at, say, 70 µm, may tend to have warmer
colour temperatures. It is too early to say definitively if this is the case, and to
what extent these biases operate, but several observations have been found
consistent with the presence of these subtle biases.

5.3 Ultraluminous and hyperluminous infrared galaxies
SMGs were not the first tremendous surprise in infrared extragalactic astronomy.
The US/UK/Dutch Infrared Astronomy Satellite (IRAS) mapped 98% of the sky
at 12, 25, 60 and 100 µm, and as well as detecting many tens of thousands of
star-forming galaxies, it soon became clear after its launch in 1983 that there were
many galaxies with luminosities of ∼10¹² L☉ or more, about 1–2 orders of
magnitude more luminous than the Milky Way, yet with most of the energy
output in the infrared (Houck, J.R. et al., 1985, Astrophysical Journal Letters,
290, 5). It was not immediately clear what caused these high luminosities, but
violent starbursts or dust-shrouded active nuclei were suspected.

All-sky surveys are also very useful for finding very rare populations, and
1991 saw the spectacular discovery by Rowan-Robinson et al. of the galaxy
IRAS FSC 10214+4724 (Rowan-Robinson, M. et al., 1991, Nature, 351, 719).
(Here FSC stands for ‘faint source catalogue’ and the numbers refer to
approximate right ascension and declination coordinates.) Most IRAS galaxies
were found at redshifts of < 0.1, and even the 10¹² L☉ objects rarely exceeded
z = 0.2, but this galaxy was found at the (then) tremendously high redshift of
z = 2.286 (Figure 5.8), with a derived far-infrared luminosity of 3 × 10¹⁴ L☉.
This was the most luminous object then known in the Universe,
and given the predictions of monolithic collapse galaxy formation models, this
galaxy excited a great deal of interest as a candidate ‘protogalaxy’.
Astronomy has a tendency as a field to be fond (perhaps overly fond) of
classifying. These discoveries of infrared-luminous galaxies led to the classes of
luminous infrared galaxies (or LIRGs) with 1011 –1012 solar luminosities,
ultraluminous infrared galaxies (or ULIRGs) with 1012 –1013 L) , and


Figure 5.8 Spectrum of the redshift z = 2.286 hyperluminous galaxy
IRAS FSC 10214+4724 (flux density in 10⁻²⁰ W m⁻² Å⁻¹ against observed
wavelength from 4000 to 8000 Å). Note the numerous emission lines from
ionized gas (including Ly α, O VI, N V, C IV, He II, C III], [O II] and [Ne IV]),
some of which are characteristic of starburst galaxies, and others characteristic of
active galaxy narrow-line regions.

hyper-luminous infrared galaxies (or HLIRGs) with > 10¹³ L☉. Some models
predicted different physical mechanisms driving the evolution of these classes, so
these divisions were not without physical motivation, though these interpretations
were by no means unique. Figure 5.9 shows HST morphologies of ULIRGs; it
appears that major galaxy–galaxy mergers are important in the local Universe
in triggering ultraluminous starbursts. There was a twist to the story of
IRAS FSC 10214+4724: the enormous apparent luminosity turned out to be in
part due to gravitational lensing, as we’ll see in Chapter 7, though it remains a
prototypical HLIRG, albeit a lensed one.

As a joke I once tried to coin the term ‘überluminous’ in a paper describing
hypothetical ∼10¹⁴–10¹⁵ L☉ galaxies, but the referee (perhaps quite rightly)
would have none of it!
IRAS was followed by the ESA Infrared Space Observatory (ISO) in 1995. As an
observatory rather than a sky survey, it specialized in a few deeper surveys and
follow-ups of individual objects. NASA’s Spitzer Space Telescope, launched in
2003, had a primary mirror with a similar diameter to the ISO, but enormously
more sensitive detectors. Both the ISO and Spitzer resulted in the discovery of
many new ULIRGs and HLIRGs, as well as shedding (metaphorical) light on
many star-forming galaxies. Figure 5.10 shows the Antennae galaxies, a pair
of colliding galaxies observed with the HST and with Spitzer. Some of the
heavily-extincted regions in the HST image are strongly luminous at mid-infrared
wavelengths in the Spitzer image. In the local Universe, ULIRGs contribute
a negligible amount to the cosmic star formation density, but the discovery
of SMGs and the surveys by Spitzer have demonstrated that the fractional
contribution from ULIRGs to the cosmic star formation history increases strongly
with redshift, as we’ll see.
Chapter 5 The distant multi-wavelength Universe

Figure 5.9 HST imaging of ultraluminous infrared galaxies in the local Universe: IRAS 01003-2238, Mrk 1014, IRAS 05189-2524, IRAS 08572+3915, IRAS 12071-0444, Mrk 231, PKS 1345+12, Mrk 463 and IRAS 15206+3342. Many contain quasars, which appear as bright point sources, but all have at least some evidence of a violent galaxy–galaxy merger.
Figure 5.10 The Antennae galaxies, a pair of colliding galaxies. The top black-and-white image shows a ground-based optical image, with a green overlay that shows the area of the HST optical imaging (centre). The lower image is the same field seen by the Spitzer Space Telescope at wavelengths of 3.6–8 µm. There is a large dusty region that is dark to the HST, but luminous at longer wavelengths. The widths of the two lower images are about 2.3 arcminutes, or about 13 kpc at the distance of these galaxies.

The thermal emission from dust is not entirely black body, because the wavelength of the light emitted can sometimes be of the same order as the dust grain size. As a result, the black body spectrum is modified by a wavelength-dependent factor kd, typically in the range kd ∝ λ⁻¹ to kd ∝ λ⁻². This index (1 to 2 in this case) is often given the symbol β and is known as the grey body emissivity index. It is usually found empirically by observing the Rayleigh–Jeans tail. Knowledge of this index is important in calculating dust masses from the thermal spectra of galaxies. If we are observing dust with a single temperature, the dust mass M for a given flux Sν at an observed frequency ν would be

M = [1/(1 + z)] × Sν dL² / [kd(νrest) B(νrest, T)],   (5.6)

where dL is the familiar luminosity distance (Chapter 1), νrest = ν(1 + z), and B(ν, T) is the Planck black body function (see, for example, Hughes, D.H., Dunlop, J.S. and Rawlings, S., 1997, Monthly Notices of the Royal Astronomical Society, 289, 766). However, it's physically implausible that the single-temperature approximation holds in practice, so some research

groups opt instead to use sophisticated numerical radiative transfer models of the three-dimensional heating of dust. (Sometimes these models impose a simplifying symmetry, such as cylindrical or spherical symmetry, for more rapid computations.) The single-temperature approximation may sometimes nevertheless still be useful for order-of-magnitude estimates of dust masses.
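To make Equation 5.6 concrete, here is a minimal numerical sketch. The source parameters are illustrative assumptions (a 5 mJy source observed at 850 µm at z = 3, with a luminosity distance of roughly 26 Gpc assumed for a concordance cosmology); the opacity normalization kd = 0.15 m² kg⁻¹ at a rest-frame 800 µm, β = 1.5 and T = 40 K are drawn from the ranges quoted in Exercise 5.4.

```python
import numpy as np

# Physical constants (SI units)
h_P = 6.626e-34    # Planck constant / J s
k_B = 1.381e-23    # Boltzmann constant / J K^-1
c = 2.998e8        # speed of light / m s^-1
M_SUN = 1.989e30   # solar mass / kg

def planck(nu, T):
    """Planck black body function B(nu, T), in W m^-2 Hz^-1 sr^-1."""
    return 2.0 * h_P * nu**3 / c**2 / np.expm1(h_P * nu / (k_B * T))

def kappa_d(nu_rest, kappa0=0.15, lambda0=800e-6, beta=1.5):
    """Grey body mass opacity k_d ∝ nu^beta, normalized to kappa0 at rest-frame 800 µm."""
    return kappa0 * (nu_rest * lambda0 / c)**beta

def dust_mass(S_nu, z, d_L_m, T, lambda_obs=850e-6, beta=1.5):
    """Single-temperature dust mass (Equation 5.6), in kg."""
    nu_rest = (c / lambda_obs) * (1 + z)   # rest-frame frequency of the observation
    return S_nu * d_L_m**2 / ((1 + z) * kappa_d(nu_rest, beta=beta) * planck(nu_rest, T))

# Fiducial SMG: 5 mJy at 850 µm at z = 3; d_L ~ 26 Gpc is an assumed value
S_nu = 5e-29                 # 5 mJy in W m^-2 Hz^-1
d_L_m = 26e9 * 3.086e16      # ~26 Gpc in metres
M = dust_mass(S_nu, z=3.0, d_L_m=d_L_m, T=40.0) / M_SUN
print(f"dust mass ~ {M:.1e} solar masses")
```

With these numbers the derived dust mass comes out at a few × 10⁸ M☉, which is why single-temperature fits remain popular as order-of-magnitude estimators despite their physical crudeness.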

Exercise 5.4 The factor kd has a large uncertainty. Estimates range from 0.04 m² kg⁻¹ to 0.3 m² kg⁻¹ at a rest-frame wavelength of 800 µm, i.e. a range of about a factor of 7, though more typical values are 0.15 ± 0.09 m² kg⁻¹. For an 850 µm observation of an SMG at a redshift of z = 3, what would the fractional range be in the possible dust mass assuming the following?
(a) kd = 0.15 ± 0.09 m² kg⁻¹, a grey body emissivity index of β = 1.5 and a fixed temperature.
(b) The same as (a), except also allowing a grey body emissivity index of β = 1–2.
(c) The same as (b), but also allowing a range of assumed temperature of T = 20–40 K (a wide but not unreasonable range for galaxies).
What advantages are there to measuring fluxes at more wavelengths than just
850 µm? ■
Although undoubtedly difficult to measure, dust masses are often key predictions of galaxy evolution models, especially those in which giant elliptical galaxies form at high redshifts, converting gas to stars at high star formation rates and generating large dust masses.

5.4 Measuring star formation rates


We’ve already met how to use the ultraviolet comoving luminosity density to
estimate the star formation history of the Universe, and discussed the systematic
uncertainties from the IMF and from dust obscuration. There are other ways of
making these estimates that are less prone to dust obscuration. We’ll see in
Section 5.6 that this leads to an astonishing conclusion that is similar to the result
from the extragalactic background light.
The short-lived but luminous O stars and B stars ionize their environments and
cause strong Balmer emission lines, as we found in Chapter 4, as well as [O II]
372.7 nm emission and Lyman α 121.6 nm emission. The Hα 656.3 nm Balmer
line, [O II] line and Lyman α line, once corrected for extinction, could therefore
be used to estimate star formation rates. Longer wavelengths are less prone to
dust extinction, as we saw in Chapter 4, giving Hα an advantage, but it redshifts
out of the observed-frame optical (i.e. > 1 µm) above a redshift of a half or so.
Infrared spectra can be measured but so far not in such large numbers as optical
spectra, because of technological limitations. As we saw in Chapter 4, shorter
wavelength observations are dominated by lower extinction regions, so longer
rest-frame wavelengths directly sample more of the star formation, but Hα is still
a long way from being an extinction-independent measure of star formation.
However, the energy absorbed by the dust in star-forming regions does not go
away; it is re-radiated as thermal radiation by the dust. The peak emission from

dust in galaxies is roughly around 70–130 µm, as shown in Figures 5.1 and 5.3.
Could we use this as our star formation rate indicator? How do we know that the
dust is heated by star formation, and not (for example) just by the ambient
interstellar radiation or from an active nucleus (Chapter 6)? In our Galaxy, the
ambient interstellar light heats the ambient dust, and the thermal radiation from
that dust has been detected in the all-sky far-infrared surveys from IRAS and the
Japanese AKARI space telescope. This dust has come to be known as cirrus
owing to its wispy appearance. (This foreground cirrus structure also places a limit on some deep-field observations, similar to point source confusion, known as cirrus confusion noise; see, for example, Gautier, T.N. III et al., 1992, Astronomical Journal, 103, 1313. The power spectrum of cirrus is approximately P(k) ∝ k⁻³, where k is inverse angle, so cirrus is smoother on smaller scales; thus observations with larger beams are more susceptible to cirrus confusion.)
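The beam-size dependence can be demonstrated with a toy simulation (an illustrative construction, not from the text): synthesize a random field with a cirrus-like P(k) ∝ k⁻³ power spectrum, smooth it with a small and a large Gaussian 'beam', and compare the surviving fluctuations with those of white noise. Because the cirrus power is concentrated on large angular scales, a bigger beam removes proportionally much less of its fluctuation power.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 256  # map size in pixels

# Spatial frequency grid, in cycles per pixel
ky, kx = np.meshgrid(np.fft.fftfreq(n), np.fft.fftfreq(n), indexing="ij")
k = np.hypot(kx, ky)
k[0, 0] = np.inf                  # suppress the mean (k = 0) mode

# Cirrus-like map: Fourier amplitude ∝ k^(-3/2), so power P(k) ∝ k^-3
phases = np.exp(2j * np.pi * rng.random((n, n)))
cirrus = np.fft.ifft2(k**-1.5 * phases).real
white = rng.standard_normal((n, n))   # white-noise map for comparison

def smoothed_rms(image, sigma_pix):
    """RMS of the map after convolving with a Gaussian beam of width sigma_pix."""
    beam = np.exp(-2 * np.pi**2 * (sigma_pix * k)**2)
    return np.fft.ifft2(np.fft.fft2(image) * beam).real.std()

# Fraction of the fluctuations surviving as the beam grows from 2 to 8 pixels
cirrus_ratio = smoothed_rms(cirrus, 8) / smoothed_rms(cirrus, 2)
white_ratio = smoothed_rms(white, 8) / smoothed_rms(white, 2)
print(f"cirrus: {cirrus_ratio:.2f}, white noise: {white_ratio:.2f}")
```

White noise is beaten down strongly by the larger beam, while the cirrus fluctuations barely diminish, which is the sense in which large-beam observations are more susceptible to cirrus confusion.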
Ultimately, the case for using the far-infrared luminosity to measure star formation
(like all other estimators) rests on astrophysical plausibility. Radiative transfer
models predict that the cirrus contribution is cooler than star-forming giant
molecular clouds (Orion, for example, is warmer than Galactic cirrus, shown in
Figure 5.11), and in star-forming galaxies the cirrus component is predicted to be
lower luminosity than the far-infrared emission from star formation. Supermassive
black hole accretion (Chapter 6) in a galaxy’s active nucleus also heats dust in the
circumnuclear torus, but most models predict this contribution to dominate in the
mid-infrared rather than at longer wavelengths. Far-infrared measurements
capture the obscured star formation but still aren’t free of IMF assumptions,
because the massive stars (above 5 M☉ or so) dominate the dust heating.
Another star formation rate indicator relies on an entirely different physical
process, and is completely independent of dust obscuration: the radio luminosity.
Supernovae from massive stars accelerate charged particles, which spiral along
the field lines of the galaxy’s magnetic field, emitting synchrotron radiation. The
synchrotron luminosity should therefore be proportional to the recent supernova
rate in the galaxy. This synchrotron radiation dominates at radio wavelengths
(e.g. metre-scale) with a power-law spectrum Sν ∝ ν^−α, with α ≈ 0.7 to 1.0 depending on the energy distribution of the charged particles (which itself depends on time). Dust clouds are transparent to this radiation, and synchrotron-emitting regions are themselves optically thin to synchrotron radiation, so the radio luminosity is ostensibly obscuration-independent. However, it's still subject to the IMF. Figure 5.12 shows schematically how stars of different masses contribute to the radio, far-infrared and ultraviolet luminosities. All three are sensitive to massive stars, to varying degrees, but none covers stars less massive than 5 M☉.

Figure 5.11 The Orion nebula, as seen by the AKARI space telescope at 140 µm. The constellation itself is marked in white. The bright far-infrared knot at the location of the nebula (inside the bottom of the constellation) is caused by dust-shrouded star formation.

These three star formation rate indicators are sensitive to different parts of the galaxy too: the ultraviolet traces the unobscured regions, the far-infrared traces the dust-shrouded regions, and the radio traces the total. Perhaps it's not too surprising, then, that the radio and far-infrared luminosities of galaxies correlate, as in Figure 5.13. This correlation was discovered by George Helou, Tom Soifer and Michael Rowan-Robinson in 1985 (Helou, G., Soifer, B.T. and Rowan-Robinson, M., 1985, Astrophysical Journal Letters, 298, 7). However, the tightness of the radio–far-infrared correlation over nearly four orders of magnitude in luminosity is an unsolved puzzle, as is the physical origin of the normalization. The radio–ultraviolet and far-infrared–ultraviolet correlations, meanwhile, are less tight.


Figure 5.12 The relative contributions to the far-infrared luminosity (LFIR), the supernova rate (νSN) and the ultraviolet ionizing radiation (Nuv), per logarithmic interval of stellar mass M/M☉, assuming an IMF varying as M^−5/2 up to 100 M☉.

Figure 5.13 The correlation between far-infrared and radio (1.49 GHz) luminosities of star-forming galaxies, plotted as log10[h² L1.49/W Hz⁻¹] against log10(h² LFIR/L☉). Galaxies with active nuclei have been excluded from this plot.

Ideally, we’d like to track the cosmic star formation history with far-infrared,
radio and ultraviolet tracers at the same time. However, we’ve seen that the
diffraction limit of telescopes limits far-infrared observations. Moreover,
the atmosphere is opaque in most far-infrared wavelengths, and the Earth’s
atmosphere is strongly luminous in the few transparent windows.

Exercise 5.5 Why would a high background flux from (for example) the
Earth’s atmosphere limit astronomical observations? Demonstrate your answer
using Poisson statistics. ■
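The essence of the answer can be demonstrated numerically. In the toy model below (arbitrary, illustrative count rates), a source contributing a mean of 100 photon counts is measured against backgrounds of 0 and 10 000 counts; since Poisson fluctuations grow as the square root of the total counts, a bright background degrades the signal-to-noise ratio even after perfect background subtraction.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 200_000
n_src = 100            # mean source counts in the aperture (arbitrary toy value)

snrs = []
for n_bkg in (0, 10_000):
    # Detected counts are Poisson-distributed about source + background
    total = rng.poisson(n_src + n_bkg, size=n_trials)
    estimate = total - n_bkg                 # background-subtracted source counts
    snrs.append(estimate.mean() / estimate.std())
    print(f"background {n_bkg:>6}: S/N ~ {snrs[-1]:.1f}")
# The noise is sqrt(n_src + n_bkg), so the S/N falls by an order of magnitude
```

This is why a bright terrestrial sky is so damaging: the noise is set by the total photon rate, not by the source alone.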
Radio observations get around the diffraction limit with interferometry, but this is
technically very challenging in the far-infrared partly because of the more
stringent timing requirements at higher frequencies. A solution to the terrestrial
sky background is space telescopes, but it’s difficult to launch large primary
mirrors into space. The largest so far is Herschel (Exercise 5.3). The proposed
Japanese SPICA space telescope will have a similar aperture, but will cool the

optics to reduce the background from the telescope and increase the sensitivity. At
the time of writing, there are also ambitious proposals with both NASA and ESA
for future far-infrared space-based interferometers.
The coarse angular resolution of far-infrared images makes it difficult to identify which optical galaxy is the far-infrared emitter: the angular size of the submm image in Figure 5.5 is approximately the same as that of the whole Hubble Deep Field North. This also increases the confusion noise. However, there are some possible shortcuts. One method is to use stacking analyses (Chapter 2), such as averaging together the far-infrared images of galaxies detected at other wavelengths, in order to measure the average far-infrared flux for those galaxies. Another method is to see what else the far-infrared luminosities correlate with. In the local Universe, the mid-infrared luminosities correlate with bolometric luminosities (i.e. total luminosities). The shorter wavelengths in the mid-infrared lead to higher angular resolutions. Figure 5.14 shows stacked far-infrared images of 24 µm-selected galaxies, implying that this correlation exists at higher redshifts too.
But why should mid-infrared luminosities correlate with far-infrared? The dust emitting the mid-infrared light is physically distinct from that responsible for the far-infrared emission. In order to be radiating at these shorter wavelengths, the dust must be hotter. It turns out that the dust grains responsible are small (less than 0.1 µm, or even as small as tens of atoms), and are sometimes transiently heated by a single photon. The mid-infrared spectra of star-forming galaxies have strong spectral signatures (see, for example, Figure 5.3), often (but not always) attributed to polycyclic aromatic hydrocarbons (or PAHs). This small-grained and PAH dust is heated by very short-wavelength light, because wavelengths much longer than the grain size are largely unaffected by these small dust particles, so the mid-infrared spectra measure the dust-shrouded ultraviolet light from O and B stars. The PAH emission line ratios can also be used to investigate the dust composition, and at around 10 µm there is an additional absorption feature from silicate dust grains. These mid-infrared spectral features do, however, make the K-corrections quite complicated. This can make it difficult to estimate the mid-infrared luminosities, but the Japanese AKARI space telescope turned this to its advantage and made deep surveys in many mid-infrared filters, in order to make mid-infrared photometric redshifts possible.

Figure 5.14 The average 70 µm (left) and 160 µm (right) images of 24 µm-selected galaxies, for a variety of ranges of 24 µm flux. Most of these 24 µm galaxies are individually undetected at longer wavelengths. For fainter 24 µm fluxes, the detections in the average images are noisier, but nonetheless significant.

One disadvantage of using the mid-infrared for estimating star formation rates is that black hole accretion can also contribute. The dust tori in active galactic nuclei (Chapter 4) are predicted to emit the peak of their radiation at exactly these wavelengths. However, the thermal spectra from dust tori are largely featureless (notwithstanding a 10 µm absorption feature), so mid-infrared spectra can in principle distinguish star formation from black hole accretion.
The proposed successor to the HST, the James Webb Space Telescope (JWST),
will have an expected collecting area of 25 m², and will operate from 0.6 µm to
28 µm. SPICA should still out-perform the JWST in the mid-infrared,
despite SPICA’s smaller mirror, because of its cooled optics. The proposed
ESA planet-finding mission Darwin may also be able to take high-resolution
mid-infrared images, using mid-infrared space-based interferometry.
One more star formation rate measure deserves a brief mention. High-mass
X-ray binary stars (HMXBs) have one massive star, emitting a wind that is
accreted by a companion neutron star, and the accretion is responsible for the

X-ray radiation. Since massive stars are shorter-lived, the numbers of HMXBs
must be a measure of the recent star formation rate. In practice the high-redshift
X-ray luminosities from star formation are often overwhelmed by that from
supermassive black hole accretion, though this star formation rate estimator can
be useful in some systems. For more details, see the further reading section.

5.5 Multi-wavelength surveys


With so many approaches to measuring star formation rates in galaxies, each with
different advantages and disadvantages, what’s the best way to study galaxy
evolution? In practice, astronomy at different wavelengths uses different
observing techniques and different detector technologies, and so has developed
specialist research communities; each community would at one time undoubtedly
have promoted their own as ‘best’. However, the more modern approach is to take
a coordinated multi-wavelength view, acknowledging the complementary insights
at different wavelengths.
A classic example of this is the Great Observatories Origins Deep Survey
(GOODS), which used three of NASA’s Great Observatories: the HST, the Spitzer
Space Telescope and the Chandra X-ray space telescope. GOODS made or uses
the deepest pencil-beam surveys with all three facilities: the Chandra Deep Field
North (CDF-N) in the HDF-N field and the Chandra Deep Field South (CDF-S);
the HST HDF-N, HDF-S and the Ultra-Deep Field in CDF-S; and deep Spitzer
imaging in HDF-N and CDF-S, now also known as GOODS-N and GOODS-S.
Note the deliberate choices to map the same sky areas (though HDF-S and CDF-S
are in different locations).
It isn’t just pencil-beam surveys that have taken this approach; many other
surveys of various depths and areas have taken a multi-wavelength approach.
Nevertheless, there are only around a dozen or so well-studied fields in
extragalactic astronomy. This is partly because of the competing requirements of
multi-wavelength astronomy, such as:
• avoiding bright foreground objects at optical, mid/far-infrared and X-ray
wavelengths;
• avoiding bright radio sources in the field or nearby for radio interferometers
such as the Very Large Array (VLA) or the UK’s MERLIN array (or the
forthcoming upgraded versions e-VLA and eMERLIN);
• having low Galactic neutral hydrogen column density for X-ray observations;
• having low cirrus intensity and hence low cirrus confusion noise for
far-infrared observations;
• having high visibility for the space-based observatories such as Chandra or
ESA’s XMM-Newton X-ray space telescope.
We’ve met the play-offs between depth and area in pencil-beam surveys and
wide-field surveys. Some facilities have taken a deliberate approach of combining
a variety of deep and wide surveys, sometimes called a ‘tiered’ approach or a
‘wedding cake’. Figure 5.15 shows a simulation of tiered surveys with the
Herschel Space Observatory. Each tier is a flux-limited survey, which means that
the survey includes all the objects brighter than a given flux, within the survey

area. Note that in any single tier, there is a strong correlation of luminosity with
redshift (Section 4.5), which as a selection effect is also known as Malmquist
bias. However, by combining surveys of different depths, there is better coverage
of the luminosity–redshift plane.


Figure 5.15 Simulated catalogues for planned surveys with the Herschel Space Observatory. Luminosity per
unit solid angle is plotted against redshift, with objects in each survey colour-coded. The luminosities of the local
starbursts Arp 220 and M82 are marked with dashed lines. The wide and shallow surveys cover the upper parts of
this figure, while the deep and narrow pencil-beam surveys cover the lower parts of the figure. In each survey in
isolation, redshift is correlated with luminosity, due to the Malmquist bias selection effect. Taking the surveys as an
ensemble improves the coverage of the luminosity–redshift plane. The slight horizontal banding is an artefact of
the simulation, but the paucity of high-luminosity objects at low redshifts is real and due to Malmquist bias.

● Why are there few high-luminosity objects at low redshift in any of the
surveys?
❍ This is partly because the luminosity function evolves, so high-luminosity
objects are intrinsically more common at higher redshifts. However, it’s also
because the amount of comoving volume sampled per unit redshift is smaller
at low redshift than at high redshift (Figure 1.19).

Exercise 5.6 Draw (or otherwise indicate) the approximate line of any flux
limit in Figure 5.15. ■
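Such a flux-limit locus is straightforward to compute: at each redshift the faintest detectable luminosity is L_min(z) = 4π dL(z)² S_lim, ignoring K-corrections. The sketch below assumes an illustrative flat concordance cosmology (H0 = 72 km s⁻¹ Mpc⁻¹, Ωm = 0.3, ΩΛ = 0.7) and an arbitrary flux limit; the steep rise of L_min with redshift is the Malmquist bias seen in each survey tier.

```python
import numpy as np

# Assumed flat concordance cosmology (illustrative values)
H0 = 72.0              # Hubble constant / km s^-1 Mpc^-1
Om, OL = 0.3, 0.7      # matter and dark energy density parameters
c_kms = 2.998e5        # speed of light / km s^-1

def luminosity_distance(z, n_steps=5_000):
    """d_L = (1 + z) (c/H0) * integral of dz'/E(z'), in Mpc (flat universe)."""
    zp = np.linspace(0.0, z, n_steps)
    f = 1.0 / np.sqrt(Om * (1 + zp)**3 + OL)
    integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(zp))  # trapezium rule
    return (1 + z) * (c_kms / H0) * integral

def L_min(z, S_lim):
    """Minimum detectable luminosity (W) for a flux limit S_lim (W m^-2)."""
    dL_m = luminosity_distance(z) * 3.086e22   # Mpc -> m
    return 4.0 * np.pi * dL_m**2 * S_lim

S_lim = 1e-17   # an arbitrary illustrative flux limit, in W m^-2
for z in (0.5, 1.0, 2.0, 3.0):
    print(f"z = {z}: L_min ~ {L_min(z, S_lim):.2e} W")
```

Plotting L_min(z) for each tier's flux limit reproduces the lower envelope of each coloured cloud of points in Figure 5.15.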

5.6 Cosmic star formation history and stellar mass assembly
The cosmic star formation histories derived from obscured star formation
indicators and obscuration-independent indicators show just as much evolution as


the rest-frame ultraviolet luminosity density. There’s a strong increase from the
local Universe to redshift z = 1 seen in radio surveys, but it’s difficult to extend
this to higher redshifts because of the difficulty in obtaining reliable redshift
estimates for the radio-selected starbursts.
However, this has been achieved for some submm-wave surveys. In these, the
negative K-corrections give a very uniform selection function over most of the
Hubble volume, as we’ve seen. Figure 5.17 shows the cosmic star formation
history of SMGs. While this seems similar to the ultraviolet star formation
history, remember that the observed optical identifications of SMGs are often faint
or unprepossessing optical galaxies, and not strongly luminous in the rest-frame
ultraviolet. Therefore this star formation is in addition to the ultraviolet-derived
star formation rate!
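The uniform selection function can be verified with a short calculation. For a grey body of fixed luminosity (T = 40 K and β = 1.5 assumed here for illustration, with an assumed flat concordance cosmology), the observed 850 µm flux density hardly changes between z ≈ 1 and z ≈ 6: the steeply rising Rayleigh–Jeans side of the dust spectrum is redshifted into the observing band at almost the rate at which inverse-square dimming removes flux.

```python
import numpy as np

h_P, k_B, c = 6.626e-34, 1.381e-23, 2.998e8   # SI constants
H0, Om, OL, c_kms = 72.0, 0.3, 0.7, 2.998e5   # assumed flat cosmology

def d_L(z, n=5_000):
    """Luminosity distance in metres (flat universe, trapezium-rule integral)."""
    zp = np.linspace(0.0, z, n)
    f = 1.0 / np.sqrt(Om * (1 + zp)**3 + OL)
    dc = (c_kms / H0) * np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(zp))
    return (1 + z) * dc * 3.086e22

def grey_body(nu, T=40.0, beta=1.5):
    """Unnormalized modified black body, nu^beta * B(nu, T)."""
    return nu**beta * 2.0 * h_P * nu**3 / c**2 / np.expm1(h_P * nu / (k_B * T))

def s850(z):
    """Relative observed 850 µm flux density of a fixed-luminosity dusty galaxy."""
    nu_rest = (c / 850e-6) * (1 + z)
    return grey_body(nu_rest) * (1 + z) / d_L(z)**2

for z in (2, 4, 6):
    # ratios stay of order unity out to high redshift
    print(f"z = {z}: S_850 / S_850(z = 1) = {s850(z) / s850(1):.2f}")
```

A fixed 850 µm flux limit therefore selects galaxies of roughly the same luminosity across most of the Hubble volume, which is the negative K-correction at work.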
Meanwhile, the Spitzer Space Telescope has made it possible to conduct
wide-field and deep extragalactic surveys at 3–8 µm. In the near-infrared, the
luminosities of galaxies are insensitive to star formation (see Chapter 4) and
relatively insensitive to dust obscuration, notwithstanding the possibility of a large
proportion of old stars in dust clouds. We could therefore use the near-infrared
luminosity density to measure the total density in stars, Ω∗ . By definition, this
must be an integral of the cosmic star formation history. These constraints are
shown in Figure 5.16. Differentiating this with respect to time gives the cosmic
star formation history in Figure 5.17. Surprisingly, these constraints turn out not
to be consistent with the observed cosmic star formation history!
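The bookkeeping behind this consistency check can be sketched numerically: integrate an assumed star formation history over cosmic time, reduced by the fraction R of stellar mass returned to the gas phase by dying stars, to predict ρ∗(z = 0). The double power-law parametrization, R = 0.3 and the flat concordance cosmology below are illustrative assumptions, not fits from this chapter.

```python
import numpy as np

# Assumed flat concordance cosmology (illustrative values)
H0_SI = 72.0 * 1e3 / 3.086e22   # Hubble constant in s^-1
Om, OL = 0.3, 0.7
YR = 3.156e7                    # seconds per year

def sfr_density(z):
    """Illustrative double power-law fit to the cosmic SFH, M_sun yr^-1 Mpc^-3."""
    return 0.015 * (1 + z)**2.7 / (1 + ((1 + z) / 2.9)**5.6)

def stellar_mass_density(z_now=0.0, z_max=10.0, R=0.3, n=20_000):
    """rho_* = integral of (1 - R) rho_dot_* |dt/dz| dz, in M_sun Mpc^-3."""
    z = np.linspace(z_now, z_max, n)
    E = np.sqrt(Om * (1 + z)**3 + OL)
    dt_dz = 1.0 / (H0_SI * (1 + z) * E) / YR        # years per unit redshift
    f = (1 - R) * sfr_density(z) * dt_dz
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(z))  # trapezium rule

rho0 = stellar_mass_density()
print(f"rho_*(z = 0) ~ {rho0:.1e} M_sun Mpc^-3 (log10 ~ {np.log10(rho0):.2f})")
```

An integral of this kind lands near a few × 10⁸ M☉ Mpc⁻³; the chapter's puzzle is that the directly measured star formation history, integrated in this way, over-predicts the measured stellar mass density.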

Figure 5.16 The stellar mass density of the Universe (plotted as log10[ρ∗/M☉ Mpc⁻³]) as a function of lookback time and of redshift. A compilation of earlier estimates is shown in coloured symbols, while the most precise determination so far is shown in black symbols. Open circles show the direct measurements, while filled stars incorporate an additional correction for galaxies below the flux limits of the data.


Figure 5.17 The comoving star formation density history of the Universe (ρ̇∗ in M☉ yr⁻¹ Mpc⁻³, against redshift and lookback time), derived from mid-infrared luminosities (blue error bars) and submm luminosities (magenta error bars). These estimates have been averaged together to make the black error bars. The error bars with filled stars and the open circles show the result of differentiating Figure 5.16, and are substantially lower.

What could cause this discrepancy? There are many potential sources of
systematic uncertainties that we’ve discussed in this chapter and the previous
one. Perhaps, for example, the dust extinction corrections need revision, or
perhaps some of the star formation rate indicators have additional unrecognized
contributions from AGN. A more fundamental underlying cause could be changes
in the IMF. Indeed, some semi-analytic models assume a strongly top-heavy IMF
in order to be able to reproduce the SMG population without overproducing Ω∗ .
The revisions to the IMF are radical, though. Many observations point to the same or very similar IMFs, but there is evidence that the IMF is strongly top-heavy in the central parsec of our Galaxy (see, for example, Larson, 2005, in the further reading section, and Bartko, H. et al., 2009, arXiv:0908.2177).

Another surprise has been the high-redshift reversal of the relationship between star formation and environment. In the local Universe, star-forming galaxies avoid rich environments such as the cores of galaxy clusters (see Section 3.7). However, there is increasing evidence that at z ≈ 1, star-forming galaxies detected at 24 µm are more common in richer environments. This is already a challenge to semi-analytic models, and might suggest that mergers were more important in triggering star formation at higher redshifts. There are also hints that SMGs are found in richer environments than non-starbursting galaxies. The natural generalization of the Butcher–Oemler effect (Chapter 3) is to measure the cosmic star formation history in different environments, and this is the subject of much active research.


5.7 Downsizing
When did galaxies form? Up to the 1980s, the tendency was to speak of an epoch
of galaxy formation at some (as yet unfathomed) redshift, perhaps involving
monolithic collapse (Chapters 3 and 4). The advent of deep HST imaging
changed the terminology and the thinking, with galaxy formation then being seen
as an ongoing process. The growing sophistication of N -body and semi-analytic
simulations led to a growing acceptance of hierarchical galaxy formation,
following the schematic picture in Figure 4.2.
The discovery of SMGs at redshifts z > 2 with enormous star formation rates of
thousands of solar masses per year therefore came as a tremendous surprise.
Evidence mounted that present-day massive galaxies formed earlier than
present-day less-massive galaxies. This is exactly the opposite to what you might
expect from hierarchical structure formation (Figure 4.2). Figure 5.18 shows the
fraction of stellar mass assembled in present-day galaxies as a function of
redshift. A similar trend is seen in direct measurements of the star formation
history at an observed wavelength of 24 µm (Figure 5.19).

Figure 5.18 The fraction ρ∗/ρ∗(z = 0), expressed as a percentage, of the present-day assembled stellar mass as a function of redshift and lookback time, for a variety of galaxy mass ranges (in bins from 10^10.0 < M/M☉ < 10^11.0 up to M/M☉ > 10^12.0). Note that the most massive galaxies formed their stars earlier in the history of the Universe.

Figure 5.19 The cosmic star formation history (plotted as ΩIR in L☉ Mpc⁻³, with the equivalent SFR density in M☉ yr⁻¹ Mpc⁻³) for all galaxies (green), for < 10¹¹ L☉ galaxies (blue), and for galaxies with > 10¹¹ L☉ (yellow) and > 10¹² L☉ (red). The contribution from higher-luminosity galaxies evolves more quickly, making it negligible in the present-day Universe but important by z = 1.

Note that massive starbursts comprised a greater proportion of the cosmic star
formation rate at higher redshifts. This is so striking that it was sometimes
called ‘anti-hierarchical’. Nevertheless, hierarchical semi-analytic models were
ultimately able to account for these populations, though needing to invoke
non-standard IMFs (Section 5.6) or particular feedback mechanisms, which we
shall meet in Section 5.8 and in Chapter 6. A more apt and widely-used term to
describe this top-down galaxy formation is downsizing.
If SMGs are the progenitors of giant elliptical galaxies, then we’d expect them to
be in massive dark matter haloes, and (according to semi-analytic models)
strongly biased tracers of the underlying dark matter distribution. They should
therefore cluster strongly. There have been hints of strong clustering in the SCUBA surveys, and one of the key aims of the new SCUBA-2 camera on the JCMT is to measure the clustering of SMGs. The BLAST balloon-borne telescope has already inferred a clustering signal in SMGs from fluctuations in its background measurements, consistent with z ∼ 1 SMGs having typical halo masses of ∼10¹³ M☉. Also, it appears that the central brightest cluster galaxies (BCGs) in z > 1 galaxy clusters have already formed > 90% of their z = 0 stellar masses by z = 1.5 (see Collins, C.A., 2009, Nature, 458, 603). This suggests that the formation of BCGs is more akin to monolithic collapse than to the result of repeated mergers. The forthcoming wide-field surveys by SCUBA-2 and Herschel may find the rare violent starbursts that accompanied this collapse.

5.8 Feedback in galaxy formation


What do ULIRGs evolve into? It’s very hard to know what any galaxy evolves
into without an impossibly long wait, but it’s been proposed that ULIRGs evolve
into quasars. Many local ULIRGs appear in optical imaging to be in the late
stages of a galaxy–galaxy merger, while there are at least some quasars that

appear to be in later merger stages. However, the levels of disturbance in both cases are luminosity-dependent, and the luminosity of the ULIRG need not necessarily equal the luminosity of the later quasar, so it's not clear how to use these observations to test the model. Numerical simulations have shown (Figure 5.20) that the energy input from quasar activity can heat the interstellar medium of a galaxy and/or expel gas, which can abruptly shut off star formation (recall that the Jeans mass is temperature-dependent; see Exercise 3.3).

Figure 5.20 A simulation of a collision between two galaxies, with (top) and
without (bottom) energy input from accretion round a supermassive black hole.
The gas temperature distribution is colour-coded, blue to red. The maximum in
the star formation and black hole accretion is at 1.6 Gyr, when the galaxies merge.
The energy output from black hole accretion expels the gas from the inner regions
of the merged galaxy after this stage. However, without this energy input, the
result would be very different.
Another source of energy input and a cause of gas expulsion is supernovae. Local
starburst galaxies often have supernova-driven ‘superwinds’ of gas being expelled
from the galaxy. The amount of star formation (or black hole accretion) can
therefore affect the future star formation; in general, this is known as feedback.
Both star formation and black hole accretion are very complex phenomena and
are so far best addressed with large numerical simulations, the results of which are
used in generic ways in semi-analytic models. The role of feedback is perhaps the
single greatest unknown in our understanding of galaxy evolution.
The intra-cluster medium in galaxy clusters is enriched with heavy elements,
which is good evidence that galactic winds have played an important role in
galaxy evolution. The figure on the cover of this book shows an example galactic
wind observed with the Chandra X-ray telescope. Numerical simulations show
that supernova-driven winds succeed in driving out most of the gas only in dwarf
galaxies (< 10⁸ M⊙). However, the lack of resolution in these simulations means
that they don’t account for the Rayleigh–Taylor instability (which could help the
hot wind escape) or the Kelvin–Helmholtz instability. Alternatively, an outflow
driven by the energy from black hole accretion could also lead to a bigger wind
from the galaxy than supernovae can generate on their own.


AGN feedback could also solve the cooling flow problem in galaxy clusters
(Section 3.9). Figure 5.21 shows a smoothed X-ray image of the Hydra A galaxy
cluster (greyscale) tracing the hot gas of the intra-cluster medium, superimposed
on contours of radio flux density from the radio lobes. The X-ray gas temperature
and profile is consistent with a cooling flow, but the AGN radio lobes have cleared
out cavities. Perhaps AGN radio lobes are the mechanism by which cooling flows
are stopped or regulated? A further source of mechanical energy input from the
AGN was found in Chandra X-ray images of the Perseus galaxy cluster, shown
in Figure 5.22 (the cavities are similarly due to radio lobes). After an image
processing technique called ‘unsharp masking’ (which amplifies high-frequency
Fourier components in the image, while suppressing low-frequency components),
large-scale ripples are seen (Figure 5.23). The lack of temperature changes
in these oscillations led to them being interpreted as acoustic waves with a
wavelength of about 11 kpc and a period of 9.6 × 10⁶ years, or a frequency
about 57 octaves below middle C.8 AGN can therefore inject energy into the
surrounding medium in two ways: radiation energy and mechanical energy. These
are sometimes called radiative mode and kinetic mode.
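The octave figure is a quick logarithm: each octave is a factor of two in frequency, so the number of octaves below middle C is log₂(f_C/f_ripple). A minimal Python check (the seconds-per-year conversion and middle C ≈ 261.6 Hz are standard values, not from the text):

```python
import math

# Period of the Perseus ripples (from the text), converted to seconds.
period_yr = 9.6e6
period_s = period_yr * 3.156e7   # seconds per year

f_ripple = 1.0 / period_s        # a few times 1e-15 Hz
f_middle_c = 261.6               # Hz, middle C

# Each octave is a factor of 2 in frequency.
octaves_below = math.log2(f_middle_c / f_ripple)
print(f"{octaves_below:.1f} octaves below middle C")
```

The result lands within an octave of the quoted figure; the exact value depends on the rounding of the period and the choice of reference pitch.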

Figure 5.21 Smoothed X-ray image of the galaxy cluster Hydra A (grey shading), compared to the radio lobes from the active nucleus (contours). The bar marks a distance of 20 arcseconds.
Figure 5.22 X-ray image of the Perseus galaxy cluster, colour-coded by temperature. The image is 350 arcseconds across, or 131 kpc.
Figure 5.23 The result of applying the unsharp masking technique to Figure 5.22, which removes large-scale features and accentuates small-scale features, revealing subtle ripples.
AGN may also play a more complex role than simply shutting down star
formation and expelling gas. Radio jets from radio-loud AGN have been observed
to trigger star formation in some systems, and there is at least one quasar in which
this jet-induced star formation has been argued9 to pre-date the formation of the
quasar host galaxy.

8 The press referred to this as the ‘deepest bass’ note in the Universe.
9 See Elbaz, D. et al., 2009, arXiv:0907.2923, and references therein.
Summary of Chapter 5

Summary of Chapter 5
1. The energy density in the cosmic optical/near-infrared background is about
the same as the energy density in the cosmic far-infrared background.
2. The observed cosmic backgrounds from optical to far-infrared wavelengths
are closely linked to the evolving comoving luminosity densities:
Iν(ν₀) = (1/4π) ∫ εν dDcomoving/(1 + z). (Eqn 5.5)
At any redshift, the populations that contribute most to the background will
be the same as those that contribute most to the corresponding luminosity
density. This links the extragalactic background light to the cosmic star
formation history.
3. A common unit used in astronomy is the jansky (symbol Jy), where
1 Jy = 10⁻²⁶ W m⁻² Hz⁻¹.
4. No-evolution source count models don’t follow the Euclidean slope at the
faint end, partly because of K-corrections and partly because of the
redshift-dependence of luminosity distance. In other words, the source
counts of a non-evolving population in a flat non-expanding universe are
very different to no-evolution counts in a flat expanding universe.
5. K-corrections often strongly affect the redshift distributions of surveys.
6. The angular resolution of a telescope in radians is given by θ = 1.22λ/D,
where λ is the observed wavelength and D is the diameter of the telescope’s
primary mirror. The large diameters of submm-wave telescopes are not
enough to compensate for the large wavelengths, so optical telescopes have
sharper images. Many optical galaxies can sometimes be found within the
observed position of an SMG, and it’s not always obvious which optical
galaxy is responsible for the submm flux. Other information is needed in
these cases.
7. Infrared-luminous galaxies are sometimes described as LIRGs
(10¹¹–10¹² L⊙), ULIRGs (10¹²–10¹³ L⊙) and HLIRGs (> 10¹³ L⊙).
Locally, ULIRGs and HLIRGs appear to be galaxy–galaxy mergers.
8. Far-infrared luminosity in galaxies is not always cospatial with
optical-ultraviolet light.
9. There are many assumptions in calculating dust masses: the values of κd, the
temperature and the grey body index are needed, even if assuming a single
dust temperature.
10. There are many methods of determining star formation rates, such as
ultraviolet luminosities from young stars, far-infrared luminosities from star
formation in giant molecular clouds, or radio luminosities deriving
ultimately from supernovae. However, all measure the numbers of massive
stars, so one needs to assume an initial mass function to extrapolate to all
stellar masses.
11. The far-infrared and radio luminosities of star-forming galaxies correlate
strongly.


12. In a flux-limited survey, the luminosity correlates with redshift. This is a
selection effect called Malmquist bias, caused by the effect of the flux limit
and the evolution of the objects being studied.
13. Downsizing is the apparently anti-hierarchical behaviour of massive
present-day galaxies having formed the bulk of their stars earlier in the
history of the Universe than present-day less-massive galaxies.
14. The energy and momentum input to the interstellar medium from an active
nucleus, or from supernovae, can have a large effect on the evolution of a
galaxy. In general, this is known as feedback, and is a major area of
uncertainty in models of galaxy formation and evolution.

Further reading
• Kolb, U.C., 2009, Extreme Environment Astrophysics, Cambridge University
Press.
• The Space Telescope Science Institute has an online tool for converting
between janskys, magnitudes and other systems, currently at
[Link].
• Condon, J.J., 1992, ‘Radio emission from normal galaxies’, Annual Review of
Astronomy and Astrophysics, 30, 575.
• De Zotti, G. et al., 2009, ‘Radio and millimeter continuum surveys and their
astrophysical implications’, arXiv:0908.1896.
• Hauser, M.G. and Dwek, E., 2001, ‘The cosmic infrared background:
measurements and implications’, Annual Review of Astronomy and
Astrophysics, 39, 249.
• Kennicutt, R.C., 1998, ‘Star formation in galaxies along the Hubble sequence’,
Annual Review of Astronomy and Astrophysics, 36, 189.
• Larson, R.B., 2005, ‘Thermal physics, cloud geometry and the stellar initial
mass function’, Monthly Notices of the Royal Astronomical Society, 359, 211.
• For more about the BLAST mission, including its crash and the heroic recovery
of its data, see the BLAST web page currently at [Link].

Chapter 6 Black holes
Black holes . . . are the most perfect macroscopic objects there are in the
universe: the only elements in their construction are our concepts of space
and time.
S. Chandrasekhar

Introduction
Where are the biggest black holes in the Universe? Why does every galaxy
contain a giant black hole at its centre, and how can we tell? And where did they
come from?
We’ll see in this chapter that black holes have been extremely important in galaxy
evolution. Most of the light that’s generated in the Universe has, ultimately, two
main origins: the release of nuclear binding energy through nuclear reactions in
stars, and the release of gravitational binding energy through accretion onto black
holes. This accretion luminosity is extraordinarily efficient — black holes turn out
to be not so black after all.

6.1 What are black holes?


Dark matter is an older idea than you might suppose. In the late eighteenth
century, John Michell speculated that some stars might be so massive that light
could not escape their surfaces. It was then very unusual for astronomers to
use statistics, but Michell used a statistical argument to show that most close
superpositions of stars on the sky (such as the Pleiades) are in fact neighbours, ‘to
whatever cause this may be owing, whether to their mutual gravitation, or to some
other law or appointment of the Creator’.10 In a brief but fascinating speculation11
in 1784, he pointed out that if many stars are in binary systems, then there could
be stars seen orbiting an invisible massive companion.
The modern counterpart to Michell’s ingenious suggestion of ‘dark stars’ is black
holes. In general relativity, the exterior of any spherically-symmetric non-rotating
(uncharged) mass is described12 by the Schwarzschild metric
ds² = (1 − RS/r) c² dt² − dr²/(1 − RS/r) − r² (dθ² + sin²θ dφ²), (6.1)

where the Schwarzschild radius is RS = 2GMBH/c², and MBH is the black hole
mass.13 The derivation assumes only spherical symmetry, a vacuum at the radii of
interest, and Einstein’s field equations, and so also proves Birkhoff’s theorem
10 Michell, J., 1767, Philosophical Transactions of the Royal Society of London, 57, 234–64; available at [Link]. This paper also proves the N(> S) ∝ S^(−3/2) Euclidean integral source counts in Chapter 4.
11 In Michell, J., 1784, Philosophical Transactions of the Royal Society of London, 74, 35–57; available at [Link]. This paper also makes the first prediction of gravitational redshift, using Newtonian arguments.
12 This is derived, for example, in Theoretical Cosmology by R. Lambourne.
13 Some texts use M• to represent the black hole mass.

(mentioned in Chapter 4). General relativity therefore shares two important
results with Newtonian gravity: that there is no gravitational field inside a
spherical shell, and that the gravitational field outside a spherically-symmetric
matter distribution is the same as the field from a central point mass. Another
corollary is that spherically-symmetric pulsation cannot produce gravitational
waves, of which more later in this chapter.
The surface r = RS is known as the event horizon, and black holes have their
matter within this surface. A light ray (i.e. ds = 0) on a radial trajectory in a
Schwarzschild metric (i.e. dθ = dφ = 0) has dr/dt = ±c(1 − RS /r). As t → ∞,
r → RS and dr/dt → 0, so the light ray never crosses r = RS as seen by a distant
observer. Nevertheless, it can be shown that an infalling observer would still cross
the event horizon in a finite proper time τ . Seen by a distant observer, an infalling
watch would tick increasingly slowly as it approached the event horizon, but
would not cross it; seen by an observer wearing that watch, time would not dilate
and he or she would cross the horizon in a finite time. Once the horizon is crossed,
this infalling observer is out of communication with the rest of the Universe.
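The coordinate-time divergence can be made concrete by integrating dr/dt = −c(1 − RS/r) for an ingoing ray, which gives t(r) in closed form. A minimal sketch (not from the book; the 10⁸ M⊙ mass and starting radius are illustrative choices):

```python
import math

G = 6.674e-11          # m^3 kg^-1 s^-2
c = 2.998e8            # m s^-1
M_sun = 1.989e30       # kg

def schwarzschild_radius(m_bh):
    """R_S = 2 G M / c^2."""
    return 2.0 * G * m_bh / c**2

def coord_time_infall(r0, r, rs):
    """Coordinate time for an ingoing radial light ray to travel from r0 to r,
    from integrating dr/dt = -c (1 - R_S/r)."""
    return ((r0 - r) + rs * math.log((r0 - rs) / (r - rs))) / c

rs = schwarzschild_radius(1e8 * M_sun)   # ~3e11 m for a 1e8 M_sun black hole
r0 = 10.0 * rs
for eps in (1e-1, 1e-3, 1e-6):           # approach the horizon: r = (1 + eps) R_S
    t = coord_time_infall(r0, (1.0 + eps) * rs, rs)
    print(f"r = (1+{eps:g}) R_S : t = {t:.3e} s")
# t grows logarithmically without bound as eps -> 0: the ray never crosses
# r = R_S in the coordinate time of a distant observer.
```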
There had been some speculation that some physical processes could prevent
matter inside a black hole collapsing to the singularity at r = 0. The
Penrose–Hawking singularity theorems put an end to these hopes, by showing that
the collapsing matter is trapped inside a volume whose surface shrinks to zero.
General relativity makes no predictions for the conditions at the singular point, so
it’s a theory with the unusual property that it demonstrates its own incompleteness.
Black holes are also surprisingly simple: it turns out that they can be described
completely by their mass, spin and electric charge. Once the event horizon is
formed and the metric eventually settles down to become time-independent, all
other information is lost. (The saying ‘black holes have no hair’ has led to this
being called the ‘no-hair theorem’.)
The metric of a rotating (uncharged) black hole is the Kerr metric, which we shall describe only briefly here. (This is described in detail in graduate-level texts such as Gravitation by Misner, C. W., Thorne, K. S. and Wheeler, J. A., published by W. H. Freeman.) It is completely determined by the mass MBH and angular momentum J, often expressed in terms of angular momentum per unit mass a = J/(MBH c). Provided that the spin is less than a maximal value of a = GM/c², there are two astrophysically-relevant critical surfaces, at

r₊ = GMBH/c² + √[(GMBH/c²)² − a²]

and

s₊ = GMBH/c² + √[(GMBH/c²)² − a² cos²θ].

The radius r₊ is an event horizon, while the outer surface s₊ is the boundary of the ‘ergoregion’ in which all matter and light rays are forced to co-rotate with the black hole — an extreme case of a relativistic phenomenon known as frame dragging. (There is also a third surface — see the further reading section.)
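As a numerical illustration of these two surfaces, here is a sketch (the mass and spin are arbitrary choices, with the spin written as a fraction of its maximal value GM/c²):

```python
import math

G = 6.674e-11
c = 2.998e8
M_sun = 1.989e30

def kerr_surfaces(m_bh, a_frac, theta):
    """Horizon radius r+ and ergosurface s+ for spin a = a_frac * GM/c^2
    (0 <= a_frac < 1) at polar angle theta, following the two expressions
    in the text."""
    m_geom = G * m_bh / c**2              # gravitational radius GM/c^2
    a = a_frac * m_geom
    r_plus = m_geom + math.sqrt(m_geom**2 - a**2)
    s_plus = m_geom + math.sqrt(m_geom**2 - a**2 * math.cos(theta)**2)
    return r_plus, s_plus

m_bh = 1e8 * M_sun
r_plus, s_plus = kerr_surfaces(m_bh, 0.9, math.pi / 2)   # equatorial plane
# In the equatorial plane cos(theta) = 0, so s+ = 2GM/c^2 = R_S, while the
# horizon r+ lies inside it; the gap between the two is the ergoregion.
print(r_plus / (G * m_bh / c**2), s_plus / (G * m_bh / c**2))
# -> about 1.436 and 2.0 (in units of GM/c^2)
```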
It’s not clear if black holes can have spins at or above the maximal value
a ≥ GM/c2 . No mode of matter accretion can take a sub-maximally-spinning
black hole and increase the spin above its maximal value — it turns out that any
angular momentum gained is more than balanced by an increase in MBH . From
184
6.2 The Eddington limit

the equations above, black holes spinning above the maximal rate have no event
horizons. It turns out that it then becomes possible for particle world-lines around
the black hole to be closed time-like loops, i.e. the black hole would be a time
machine! Roger Penrose has proposed a ‘cosmic censorship hypothesis’ that there
are no naked singularities in nature, i.e. no singularities without event horizons.
(This view has recently been challenged by Jacobson, T. and Sotiriou, T. P., arXiv:0907.4146.)
At the time of writing, this remains unproven in classical gravity (unless naked
singularities are taken as pre-existing). The hypothesis is the subject of a bet
between Stephen Hawking, who contends that it is correct, and Kip Thorne and
John Preskill, who contend that it is not. The stake of the bet (re-formulated in
1997 to eliminate possible loopholes) is: ‘The loser will reward the winner with
clothing to cover the winner’s nakedness. The clothing is to be embroidered
with a suitable, truly concessionary message.’ A related conjecture by Stephen
Hawking is the ‘chronology protection conjecture’ that fundamental physics
forbids closed time-like loops. This conjecture would be addressed by, or could
form part of, a future theory of quantum gravity.
General relativity is time-symmetric, so one could conceive of the time-reverse of
black holes, sometimes called ‘white holes’. Perhaps entropic reasons forbid
them, for a reason similar to why molecules in a lake do not suddenly conspire to
throw pebbles out. We shall not dwell on this, except to say that many deep
problems in classical and quantum gravity involve entropic aspects of gravity.
White holes also occur in discussions of the Kerr metric. The curvature singularity
in the Kerr metric is ring-shaped, unlike the point-like Schwarzschild singularity.
Infalling matter in a Kerr metric could pass through a further inner horizon and
through the ring, and, if the same metric holds past these points, would emerge out
of a white hole — where? This has been the subject of much speculation in
science fiction. But an infalling traveller attempting this journey would receive an
infinite flux of radiation that fell into the hole from the other side, infinitely
blueshifted — an infinitely ferocious gauntlet to run. Clearly, there is still much to
be understood about these singular regions.
Black holes can form as the end-point of the most massive stars’ evolution,
when nuclear reactions or degeneracy pressure can no longer support a stellar
core against gravity. This conclusion was first reached by Chandrasekhar and
infamously opposed by Sir Arthur Eddington, who said: ‘I think there should be a
law of nature to prevent a star from behaving in this absurd way.’ Hindsight has
shown Chandrasekhar to be correct, but ironically Eddington’s name has become
associated with black hole accretion, as we shall see in the next section. One
might nevertheless share a similar reaction to the inevitable curvature singularities
inside black holes.

6.2 The Eddington limit


How quickly can black holes grow? There must come a point where the outward
photon pressure from the accretion luminosity balances the inward gravitational
attraction towards the black hole.
Suppose that the luminosity of the central object is L. The rate of energy output
from this central object is just L, i.e. dE/dt = L. For photons, the momentum p
and energy E are related by E = pc, so the momentum flux from the central
object is dp/dt = L/c. The photon pressure P at any distance r from the centre

will be the momentum output per unit area, i.e.


P = L/(4πcr²), (6.2)
because the area of the surface of a sphere with radius r is 4πr2 . Suppose that the
infalling gas is hot enough to be a plasma (which is true for most astrophysical
black holes). The force felt by an electron in this plasma from this photon flux
will be this pressure P times the cross section σT for Thomson scattering of
photons by electrons:
Fphoton = σT P = σT L/(4πcr²). (6.3)
Atomic nuclei also present a Thomson scattering cross section, but σT ∝ q⁴/m²,
where q is the particle charge and m is its mass, so the electron Thomson
scattering cross section provides the dominant outward force on the infalling gas.
The inward force from gravity on a hydrogen nucleus is just
Fgravity = G MBH mp/r², (6.4)
where mp is the mass of a hydrogen ion, i.e. a proton, and MBH is the mass of the
black hole. Although we’ve assumed that the infalling matter is a plasma, the
electron gas would still be strongly coupled electrostatically to the gas of positive
nuclei or ions. The outward force on the infalling plasma is exerted mainly on the
electrons, while the inward force is exerted mainly on the protons and neutrons,
but the plasma still responds as a whole rather than separating out by charge.

Exercise 6.1 Demonstrate this, using some quantified argument of your own
invention. (Like Exercises 4.4 and 4.11 this is a more open-ended exercise.) ■
The inward and outward forces balance when Fphoton = Fgravity , so
σT L/(4πcr²) = G MBH mp/r². (6.5)
Remarkably, the r2 terms cancel, so the balance is independent of radius. The
luminosity at this balance is
LE = 4πGc MBH mp/σT, (6.6)
which is known as the Eddington luminosity, after Sir Arthur Eddington who
first made these calculations in the context of stellar opacity.
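To get a feel for the scale of Equation 6.6, here is a short sketch (the constants are standard SI values, and the 10⁸ M⊙ black hole mass is an illustrative choice, not from the text):

```python
import math

# Physical constants (SI)
G = 6.674e-11          # m^3 kg^-1 s^-2
c = 2.998e8            # m s^-1
m_p = 1.673e-27        # kg, proton mass
sigma_T = 6.652e-29    # m^2, Thomson cross section
M_sun = 1.989e30       # kg
L_sun = 3.828e26       # W

def eddington_luminosity(m_bh):
    """L_E = 4 pi G c m_bh m_p / sigma_T (Equation 6.6)."""
    return 4.0 * math.pi * G * c * m_bh * m_p / sigma_T

L_E = eddington_luminosity(1e8 * M_sun)
print(f"L_E(1e8 M_sun) = {L_E:.2e} W = {L_E / L_sun:.2e} L_sun")
# Roughly 1.3e39 W, i.e. a few times 1e12 L_sun -- comparable to the total
# luminosity of a bright galaxy, which is why accreting supermassive black
# holes can outshine their hosts.
```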
In black hole accretion, this luminosity is generated by the accretion disc, so
L ∝ dMacc /dt. Ultimately, some of this energy output comes from the release of
gravitational binding energy of matter falling towards the black hole. This
happens because friction in the accretion disc leads to energy losses through
thermal radiation, so the orbital radius of the matter decreases. As the accreting
matter approaches the black hole, some of the combined mass-energy is converted
to luminosity. The accretion luminosity can be expressed in terms of the
mass-energy accretion rate (E = mc² so Ė = ṁc²):

L = η (dMacc/dt) c², (6.7)

where η (which is ≤ 1) is the conversion efficiency from mass-energy accretion to luminosity. The maximum black hole growth rate will happen when the accreting matter reaches the Eddington luminosity, known as the Eddington limit. (Note therefore that the fraction of the accretion rate that goes into increasing the mass of the black hole is (1 − η), and so the rate of growth of the mass of the black hole is dMBH/dt = (1 − η) dMacc/dt.)

Exercise 6.2 The Eddington limit is also the highest luminosity that a gravitationally-bound object can have without photon pressure blowing it apart. How close is the Sun to the Eddington limit? (The value of σT is 6.65 × 10⁻²⁹ m². The mass of a proton, and the mass and luminosity of the Sun, are in Appendix A.)
Exercise 6.3 How close is a light bulb to the Eddington limit? Comment on your answer. ■
We can also define a characteristic timescale tE , sometimes called the Eddington
timescale or the Salpeter timescale:
tE = MBH c²/LE = σT c/(4πGmp) ≈ 4 × 10⁸ yr. (6.8)

This is the time that an object would take to radiate away all its rest mass, at the
Eddington limit. If we set LE = η(dMacc/dt)c², we can write this as

LE = (η/(1 − η)) (dMBH/dt) c².

Then, given the definition of the Eddington timescale in Equation 6.8, we can
combine these two equations to give

dMBH/dt = ((1 − η)/η) (MBH/tE).

This differential equation has a solution MBH ∝ exp[((1 − η)/η)(t/tE)]. So the
e-folding timescale for the growth of the black hole is η tE/(1 − η). (The e-folding
timescale is the time to increase by a factor of e = 2.718 28 . . .)

Remember that we’ve assumed spherical symmetry in deriving the Eddington
luminosity and accretion rate. Non-spherical accretion is complicated to
describe analytically, and is often studied in numerical simulations. Accreting
matter in astrophysics usually settles into a disc geometry, for which there is
a well-developed theory (see the further reading section). Nevertheless, the
Eddington limit provides an important order-of-magnitude reference for accretion
flows in general. It’s also possible for accretion rates above the Eddington rate
(called super-Eddington accretion) to occur in some cases. If the accreting
matter is optically-thick and also has a very high accretion rate, it could trap the
radiation and drag it down with it into the black hole — a process known as
advection. These advection-dominated accretion flows (ADAFs) have been
conjectured in the rare class of narrow line Seyfert 1 AGN, which appear to lack a
broad line region yet have a strong non-thermal continuum. Super-Eddington
ADAFs are not spherically symmetric, and strong gas flows along the angular
momentum axis are driven out by radiation pressure. In the opposite limit, in
which the accretion rate is much lower than the Eddington limit, advection can
again play a role. If the accreting gas has a very low density, it may be inefficient
at radiating and the matter may be advected onto the central black hole. These low
accretion rate ADAF models reproduce the spectral energy distribution of Sgr A*
at the centre of our Galaxy, and have been conjectured to be present in some
low-luminosity AGN.
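The exponential solution above, MBH ∝ exp[((1 − η)/η)(t/tE)], sets how quickly supermassive black holes can be assembled by Eddington-limited accretion. A sketch (the seed mass, final mass and η = 0.1 are illustrative assumptions, not values from the text):

```python
import math

t_E = 4.5e8  # yr, the Eddington (Salpeter) timescale sigma_T c / (4 pi G m_p)

def growth_time(m_seed, m_final, eta):
    """Time (in years) to grow from m_seed to m_final at the Eddington limit,
    inverting M(t) = m_seed * exp[((1 - eta)/eta) * t / t_E]."""
    return t_E * (eta / (1.0 - eta)) * math.log(m_final / m_seed)

# Illustrative numbers: a 100 M_sun seed growing to 1e9 M_sun with a
# typical radiative efficiency eta = 0.1.
t = growth_time(1e2, 1e9, 0.1)
print(f"{t:.2e} yr")   # about 8e8 yr of continuous Eddington-limited accretion
```

The answer, several hundred million years of uninterrupted Eddington-limited accretion, is why the existence of luminous quasars at high redshift puts interesting constraints on black hole seed masses.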


6.3 Accretion efficiency


As matter orbits a black hole, it gradually loses energy through radiative losses
from, for example, friction. One could think of this as a conversion of
gravitational binding energy or mass-energy to luminosity. We’ll find that
accretion onto black holes can be extraordinarily efficient at converting
mass-energy into luminosity — much more efficient than, say, a nuclear bomb.
Once matter has fallen inside the event horizon, no light signals can reach the
outside world, but before that point the accretion process can be prodigiously
luminous. Paradoxically, this makes black holes the best candidates for some of
the most luminous objects in the Universe.
To calculate the accretion efficiency of black holes, we’ll first have to calculate the
orbital motion. We calculated this for radial motion of light rays in Section 6.1,
but for a massive particle ds² ≠ 0, so we need more information to find the
motion. Freely-falling particles in general relativity follow geodesics, which are
extremal paths in the relativistic sense, i.e. ∫ds along the path is a maximum.
This means that the integral

Y = ∫ √[gµν (dx^µ/dτ)(dx^ν/dτ)] dτ = ∫ ds (6.9)
is a maximum for the path taken by the particle x(τ), where τ is the proper time,
and x^µ means the µth component of the vector x rather than x-to-the-power-µ
(see Appendix B). Calculus conventionally finds the value of a variable that
maximizes a function, but here we want to find the function x(τ) that maximizes
the value of a variable Y. (In space, as opposed to spacetime, geodesics are paths
of shortest distance between two points, but in the spacetime of general relativity,
geodesics are paths between two events along which the proper time is a maximum.)
This is the subject of a branch of calculus known as the calculus of variations.
The key result for our purposes is that if ∫L dτ is to be minimized, and L does
not depend on a parameter y, then

∂L/∂ẏ = constant (6.10)
is a conserved quantity (note the dot in the denominator). For example, if L
doesn’t depend on the µth component of the vector x, i.e. xµ , then ∂L/∂ ẋµ is a
conserved quantity. The proof takes us away from the main story of this chapter,
but in case you’re not satisfied with having this pulled out of a hat, there’s a proof
in the box below.

The principle of least action


The most useful application of the calculus of variations in physics is the
principle of least action. From it we’ll find a deep and beautiful connection
between angular and linear momentum, and a wonderful way of re-stating
the fundamental laws of the Universe. It will also indirectly answer this
question: why is it that in a plot of linear momentum mv against velocity v,
the area under the curve is the kinetic energy ½mv²? Where’s the
connection? There’s no obvious answer in conventional Newtonian physics.
We define the action A as A = ∫L dt, where L (known as the Lagrangian)
is the difference between the kinetic and potential energies, L = EK − V .
This is not the total energy, which would be EK + V . As in Fermat’s


principle in optics, it seems that particles in classical mechanics follow paths


that minimize the action (we’ll ask why later). Note that some textbooks use
the symbol T for kinetic energy in this context.
Suppose that x(t) is the path taken by a particle between two points x(t0 )
and x(t1 ) that minimizes the action, and that some other neighbouring path
x(t) + ε(t) doesn’t do so (see Figure 6.1).

Figure 6.1 The least-action path x(t) and a neighbouring path x(t) + ε(t).

We set ε(t0 ) = ε(t1 ) = 0 so that the paths start and end at the same points.
The velocity on the least-action path is v(t) = dx(t)/dt, while on the
neighbouring path it’s v(t) + ε̇(t). To first order, the Lagrangian L(x, v) on
the neighbouring path will be

L(x + ε, v + ε̇) = L(x, v) + ε(t) ∂L/∂x + ε̇(t) ∂L/∂v, (6.11)
so the action on the neighbouring path will be A + δA, where
δA = ∫_{t0}^{t1} [ε(t) ∂L/∂x + (dε/dt) ∂L/∂v] dt. (6.12)

We can integrate the second term in the integral by parts:


∫_{t0}^{t1} (dε/dt)(∂L/∂v) dt = [ε(t) ∂L/∂v]_{t0}^{t1} − ∫_{t0}^{t1} ε(t) (d/dt)(∂L/∂v) dt. (6.13)

But we set ε(t0 ) = ε(t1 ) = 0, so the term in the square brackets is zero, and
δA = ∫_{t0}^{t1} ε(t) [∂L/∂x − (d/dt)(∂L/∂v)] dt. (6.14)

Now, you already know that if a variable a minimizes a function y(a), then
changing a at that minimum doesn’t change y to first order (see Figure 6.2).


Figure 6.2 Near the minimum of this curve y(a), the displacement Δy is
proportional to a². This is because dy/da = 0 there, which means that the
first-order term in a Taylor series expansion around this point is zero, and
only the second-order term is non-zero. Elsewhere dy/da ≠ 0, so Δy ∝ a
to first order.
Similarly, if x(t) is the path that minimizes the action, then putting a small
wiggle ε(t) onto the path won’t change the action to first order, i.e. δA = 0
(sometimes written as δ∫L dt = 0). But though ε is small, it’s arbitrary, so
δA = 0 can happen only if

∂L/∂x = (d/dt)(∂L/∂v). (6.15)
This is known as the Euler–Lagrange equation.
We’ve done this in one dimension for simplicity, but if the Lagrangian
depends on many coordinates q1, q2, . . . , q̇1, q̇2, . . . — where, for example,
the coordinates could be Cartesians (q1 , q2 , q3 ) = (x, y, z) or polars
(q1 , q2 , q3 ) = (r, θ, φ) or indeed any coordinate system — then
∂L/∂qi = (d/dt)(∂L/∂q̇i) (6.16)
for every qi .
We can use this to find conservation laws in physics. If the Lagrangian L is
independent of a Cartesian coordinate x, and v = dx/dt, then
(d/dt)(∂L/∂v) = 0,
so ∂L/∂v must be a constant. Empty space with no potential (V = 0) has
this property, and the conserved quantity in Cartesian coordinates turns out
to be linear momentum. Similarly, L in polar coordinates won’t depend
on θ, and the conserved quantity turns out to be angular momentum. In
Newtonian gravitation, L also doesn’t depend on θ, and the conserved
quantity gives Kepler’s second law. This can also be proved from angular
momentum conservation in elliptical orbits, but the Lagrangian trick is much
quicker and easier.
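The conserved quantity behind Kepler’s second law, the specific angular momentum r²θ̇ = x vy − y vx, is easy to monitor numerically. A sketch (the orbit parameters and GM = 1 units are arbitrary choices; a leapfrog integrator is used here because it respects this conservation especially well):

```python
import math

# Integrate a planar Newtonian orbit (GM = 1, arbitrary units) with a
# leapfrog (kick-drift-kick) scheme and monitor the specific angular
# momentum l = x*vy - y*vx = r^2 * (dtheta/dt), the conserved quantity
# that the Lagrangian argument predicts (Kepler's second law).
GM = 1.0
x, y = 1.0, 0.0
vx, vy = 0.0, 0.8        # speed < 1 at r = 1 gives a bound, eccentric orbit
dt = 1e-3

def accel(x, y):
    r3 = (x * x + y * y) ** 1.5
    return -GM * x / r3, -GM * y / r3

l_values = []
ax, ay = accel(x, y)
for step in range(20000):    # roughly five orbital periods
    vx += 0.5 * dt * ax; vy += 0.5 * dt * ay
    x += dt * vx; y += dt * vy
    ax, ay = accel(x, y)
    vx += 0.5 * dt * ax; vy += 0.5 * dt * ay
    l_values.append(x * vy - y * vx)

drift = max(l_values) - min(l_values)
print(f"l = {l_values[0]:.6f}, spread over orbit = {drift:.2e}")
# The spread is at the level of floating-point round-off: r^2 * (dtheta/dt)
# is conserved along the orbit, even though r and the speed vary strongly.
```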
These coordinate independencies can also be thought of as symmetries,
because if L is not a function of x (i.e. L ≠ L(x)), then L is invariant under


the transformation x → x + δx for any δx. The fact that symmetries in the
Lagrangian imply conservation laws is known as Noether’s theorem, after
the brilliant physicist Amalie Emmy Noether (Figure 6.3), and is very
widely used in fundamental physics. A related argument can show that
energy conservation in general reflects time-independence. Maxwell’s
equations and charge conservation are equivalent to Lorentz invariance of
the electromagnetic vector potential — or rather, a more general invariance
known as gauge invariance (see, for example, Ryder, L.H., 1985, Quantum
Field Theory, Cambridge University Press). Einstein’s field equations of
general relativity can be found by minimizing an appropriate Lagrangian.
All fundamental physics can be thought of as simple symmetries and
conservation laws. This realization was a crowning achievement of
nineteenth and early-twentieth century physics, and is still true today.
What is energy, anyway? What is momentum? Lower-level textbooks tend
to describe their effects, but fundamentally these quantities are mostly
interesting because they are conserved. If they weren’t, we’d find and use
other quantities that were. This thinking is useful when particle physics
presents you with more abstract quantities such as strangeness or colour
charge or lepton number, and you wonder what they are.
You might well ask why Nature obeys the principle of least action —
after all, the total energy EK + V is an obviously physical quantity, but
L = EK − V isn’t. Classically, it’s hard to interpret, but the physicist
Richard Feynman found that in quantum mechanics the phase of the wave
function is A/ℏ. He imagined particles in quantum mechanics taking all
possible paths simultaneously, but nearby paths would have wave functions
that tend to cancel out (because to first order the phases are different), except
in the regions where the phases are all the same to first order, i.e. where
δA = 0, which is the path of least action. The smallness of ℏ ensures
that this cancellation happens only very close to the least-action path, so
macroscopically (i.e. classically) the particle appears to take only the
least-action path.
Figure 6.3 Amalie Emmy Noether, 1882–1935.
Geodesics are the paths that maximize the total relativistic spacetime distance
∫ds along the path (i.e. δ∫ds = 0 in the notation of the box above). In fact, we
can pick any two points on the path and the geodesic will follow the maximum
∫ds between those points — because if it didn’t, we could tweak the path
between those points and find a better global ∫ds. In general relativity, any
free-falling frame is locally the metric of special relativity, and the maximum
spacetime interval in special relativity between two events is just a straight line in
spacetime: δs = √[(c δt)² − (δx)² − (δy)² − (δz)²]. We can think of a geodesic
as a sum of these δs contributions measured in a chain of free-falling reference
frames along the path. But if δs is a maximum, then so is (δs)². By instead
summing up these (δs)² contributions, we can make a new quantity, say Y₂, that’s
also maximized along geodesics:

Y₂ = ∫ gµν (dx^µ/dτ)(dx^ν/dτ) dτ. (6.17)

Chapter 6 Black holes

The Schwarzschild metric in Equation 6.1 is independent of t and φ, so applying
the Euler–Lagrange equation (Equation 6.16) gives two conserved quantities, one
corresponding to energy and one to angular momentum:

(1 − RS/r) (dt/dτ) = constant = E/(mc²) = Ê  (6.18)

and

(r²/c) (dφ/dτ) = constant = J/(mc) = Ĵ,  (6.19)

where RS is the Schwarzschild radius, and we define the constants of specific
angular momentum (i.e. angular momentum per unit rest mass) to be J/(mc) or Ĵ,
and the specific energy to be Ê = E/(mc²).
To find the orbital equations in the Schwarzschild metric, we can choose an orbit
in the equatorial plane (θ = π/2, dθ = 0) without loss of generality. We can then
combine the metric (Equation 6.1 with ds² = c² dτ²) with the angular momentum
and energy conservation (Equations 6.18 and 6.19) to find that

(1/c)² (dr/dτ)² = Ê² − (1 − RS/r)(1 + Ĵ²/r²).  (6.20)

We can think of this as (1/c)² (dr/dτ)² = Ê² − V̂², where V̂ is an ‘effective
potential’ per unit mass given by

V̂(r) = √[(1 − RS/r)(1 + Ĵ²/r²)].  (6.21)

Figure 6.4 plots this function for various specific angular momenta Ĵ. Most
angular momenta have a stable minimum at one radius. This is also the radius of a
circular orbit at that Ĵ, because for circular orbits, dr = 0 ⇒ dr/dτ = 0
⇒ Ê = V̂. But at sufficiently low angular momenta, there is no stable circular
orbit. There is therefore a closest possible inner edge of the accretion disc.
We’ll use this closest stable circular orbit to calculate the maximum efficiency for
converting mass-energy to luminosity in a black hole accretion disc. It’s not too
hard (but a bit tedious) to show that the smallest stable circular orbit around a
Schwarzschild black hole has Ĵ = √3 RS and radius r = 3RS. (Either find where
d²V̂/dr² = 0, or set dV̂/dr = 0 and require that a finite root exists.) Putting in
the numbers, the fractional binding energy of an orbit at this radius is therefore

(mc² − E)/(mc²) = 1 − Ê = 1 − √(8/9) = 0.0572… ≈ 6%.  (6.22)
If a particle spirals in from r = ∞, radiating binding energy or mass-energy as
luminosity via friction, this is the fraction of energy released. For comparison,
< 0.1% of the uranium rest mass in a fission-based atomic bomb is converted to
energy.
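These results are easy to verify numerically. The sketch below works in units where RS = 1: differentiating Equation 6.21 and setting dV̂/dr = 0 gives the quadratic r² − 2Ĵ²r + 3Ĵ² = 0 for circular-orbit radii, which has a real root only for Ĵ ≥ √3.

```python
import numpy as np

# Effective potential of Equation 6.21, in units where R_S = 1 and
# Jh is the specific angular momentum in units of R_S, i.e. J/(m c R_S).
def V_eff(r, Jh):
    return np.sqrt((1.0 - 1.0 / r) * (1.0 + (Jh / r)**2))

# dV/dr = 0 gives r^2 - 2 Jh^2 r + 3 Jh^2 = 0, so a stable circular orbit
# exists only for Jh >= sqrt(3), at the larger root of the quadratic.
def r_stable(Jh):
    disc = Jh**4 - 3.0 * Jh**2
    if disc < -1e-9:                  # no real root: no stable circular orbit
        return float('nan')
    return Jh**2 + np.sqrt(max(disc, 0.0))

print(r_stable(1.0))                   # nan: below the critical angular momentum
print(r_stable(2.0))                   # stable circular orbit at r = 6 R_S
print(r_stable(np.sqrt(3.0)))          # the marginal (innermost) case, r = 3 R_S
print(1.0 - V_eff(3.0, np.sqrt(3.0)))  # ~0.0572, the ~6% of Equation 6.22
```

The `max(disc, 0.0)` guard simply protects the marginal Ĵ = √3 case from floating-point round-off.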
The corresponding result for a maximally-spinning Kerr metric is a minimum
stable equatorial circular orbit radius of r = (5 ± 4)GMBH/c² (use − for
prograde orbits, + for retrograde). The fractional binding energy is
1 − 5/(3√3) ≈ 4% for retrograde orbits and an astonishing 1 − 1/√3 ≈ 42% for
prograde orbits. (If the black hole has a spin just less than the maximal value,
e.g. 0.998 times the maximal spin, then this can drop to ‘merely’ ∼30%, but this
is still an astonishingly high efficiency; see, for example, Thorne, K.S., 1974,
Astrophysical Journal, 191, 507.) The proofs follow similar methods to the
Schwarzschild case, but are longer for this more complicated metric (details are
in Misner, Thorne and Wheeler). Retrograde accretion orbits seem unlikely, either
because of frame dragging or because the black hole and accretion disc both
formed from matter with similar angular momentum axes, so spinning black holes
are expected to be even more efficient converters of mass–energy to luminosity.

6.4 Cosmic mass density of black holes, ΩBH
Almost 200 years after John Michell’s suggestion of matter hidden in dark stars,
the cosmologist Andrzej Soltan spotted that the number counts of quasars give an
ingenious constraint on the present-day total mass of black holes, which is
independent of H0, Ωm and ΩΛ. This is done by estimating the total energy output
of quasars throughout the history of the Universe, and applying the Eddington
limit.

Figure 6.4 The effective potential V̂ around a Schwarzschild black hole, given
by Equation 6.21, plotted against r/(GM/c²). The numbers on the curves are the
values of the specific angular momentum relative to the black hole mass,
Ĵc²/(GM), where M is the black hole mass.

The energy output of quasars per unit comoving volume and per unit time is
E = L Φ(L, z), where L is the quasar bolometric luminosity and Φ is the
bolometric luminosity function. The bolometric energy output in a time t to t + dt
from quasars with luminosities from L to L + dL is then

E(L, t) dL dt = L Φ(L, z) dL dt,

where time t and redshift z are related through Equation 1.34. Now, the number
counts are related to the luminosity function by

4π n(S, z) dS dz = Φ(L, z) (dV/dz) dL dz,

where n(S, z) is dN/dS evaluated at flux S for objects with redshift z, and V is
the comoving volume. Luminosity and flux are also related:

L = 4π d_L² S,

so

E(L, t) dL dt = (4π)² S n(S, z) dS (dV/dz)⁻¹ d_L² dt.  (6.23)

Now one can show that

4π d_L² (dV/dz)⁻¹ dt = (1 + z) (1/c) dz.  (6.24)

Putting this into Equation 6.23 and integrating over L and t, we find that the total
energy output of quasars throughout the history of the Universe is

Etotal = ∫ (L=0 to ∞) ∫ (t=0 to t₀) E(L, t) dL dt
       = (4π/c) ∫ (z=0 to ∞) ∫ (S=0 to ∞) (1 + z) S n(S, z) dS dz.  (6.25)
This does not depend on either H0 or the cosmology!


Exercise 6.4 Soltan’s ingenious argument rests in part on Equation 6.24.
Prove this relation, by first showing that

dV/dz = 4πc d_L² / [(1 + z)² H(z)] = 4π d_L² c / [H0 (1 + z)² (H(z)/H0)]  (6.26)

(in itself a useful relation), and then using other results in Chapter 1. ■

Beware — some textbooks define the comoving volume differential dV/dz for a
unit solid angle, not for the whole sky as we have done, so they miss out the 4π
factor. Sometimes this is also referred to as the ‘volume element’.

We can use this to estimate the total mass of black holes today. Quasar number
counts are rarely measured in bolometric fluxes, but B-band number counts
are much more common. We can relate this to the bolometric flux S (where
‘bolometric’ means the total over all wavelengths):

S = kbol fB νB,

where kbol is known as the bolometric correction, fB is the B-band flux, and
νB is the typical frequency of B-band light. Soltan approximated Equation 6.25 as

Etotal ≈ (1/c) ∫ (fB=0 to ∞) kbol fB νB n(fB) {1 + ⟨z|fB⟩} dfB,  (6.27)

where ⟨z|fB⟩ is the mean redshift at a given B-band flux fB. The uncertainty in
the B-band number counts n(fB) is much larger than the uncertainties in ⟨z|fB⟩.
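Equation 6.26 is straightforward to check numerically (this is a check, not the proof the exercise asks for). The sketch below assumes a flat ΛCDM cosmology; the H0 and Ωm values are illustrative.

```python
import numpy as np

# Numerical check of dV/dz = 4 pi c d_L^2 / [(1+z)^2 H(z)] (Equation 6.26)
# for an assumed flat LCDM cosmology with illustrative parameters.
c = 2.998e5            # km/s
H0, Om = 70.0, 0.3     # km/s/Mpc (illustrative)
Ol = 1.0 - Om

def H(z):
    return H0 * np.sqrt(Om * (1 + z)**3 + Ol)

z = np.linspace(0.0, 3.0, 30001)
Dc = np.cumsum(c / H(z)) * (z[1] - z[0])   # comoving distance, Mpc (crude sum)
V = (4.0 * np.pi / 3.0) * Dc**3            # comoving volume, whole sky
dVdz = np.gradient(V, z)                   # numerical dV/dz
dL = (1 + z) * Dc                          # luminosity distance, flat cosmology
rhs = 4 * np.pi * c * dL**2 / ((1 + z)**2 * H(z))
i = 20000                                  # index where z = 2
print(dVdz[i] / rhs[i])                    # ~1: the two sides agree
```

The ratio is unity to within the discretization error, independent of the cosmological parameters chosen, as Equation 6.26 requires.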
Once we have the total energy emitted by quasars, we can convert this to a
present-day black hole mass density using ρBH c² = Etotal (1 − η)/η, where η is
the conversion efficiency of black hole mass accretion to luminosity. The factor of
(1 − η), not originally used by Soltan, accounts for the fact that not all the
accreted matter falls into the black hole. Putting in the numbers, Soltan found a
present-day black hole density of

ρBH = (0.1/η) × 8 × 10⁴ M☉ Mpc⁻³

for a bolometric correction of kbol = 6.0 (justified on the basis of quasar spectral
energy distributions). This corresponds to a cosmological density of black holes
from ‘dormant’ quasars of

ΩBH h² = 3 × 10⁻⁷ × [(1 − η)/0.9] × [0.1/η].

For comparison, the present-day mass density in stars has been estimated as
Ω∗h = (2.9 ± 0.43) × 10⁻³ for a Salpeter initial mass function, so ΩBH in
dormant quasars is only about 0.01h⁻¹% of Ω∗. However, we’ve only counted the
type 1 (broad line) active galaxies and not the type 2 (narrow line) ones, so this
should be regarded as a lower limit.
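The conversion from ρBH to ΩBH h² is a one-liner worth checking. The sketch below uses standard constants and takes H0 = 100h km s⁻¹ Mpc⁻¹, so the h² scaling falls out automatically; the η = 0.1 case is shown.

```python
import numpy as np

# Check the conversion from Soltan's rho_BH to Omega_BH h^2 (eta = 0.1 case).
G = 6.674e-11            # m^3 kg^-1 s^-2
Msun = 1.989e30          # kg
Mpc = 3.086e22           # m
H0 = 100.0 * 1e3 / Mpc   # 100h km/s/Mpc in SI, so densities carry the h^2

rho_crit = 3 * H0**2 / (8 * np.pi * G)      # critical density, kg m^-3 (x h^2)
rho_crit_astro = rho_crit * Mpc**3 / Msun   # ~2.8e11 Msun Mpc^-3 h^2
Omega_BH_h2 = 8e4 / rho_crit_astro          # rho_BH = 8e4 Msun Mpc^-3
print(rho_crit_astro, Omega_BH_h2)          # ~2.8e11, ~3e-7 as quoted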
Nearby type 1 active galaxies have had black hole masses measured using the
widths of the Balmer emission lines. On the assumption that the dynamics of the
broad line region are dominated by gravity, MBH ≈ v²RBLR/G, where v is the
velocity width and RBLR is the broad line region radius. This latter parameter can
be estimated from models of the ionization within quasars, or from reverberation
mapping, which we shall meet in the next section. This yields an integrated mass
density of ≈ 600 M☉ Mpc⁻³ from local active galaxies. This is two orders of
magnitude smaller than Soltan’s estimate of ΩBH. Most of the present-day ΩBH
must therefore be in dormant quasars, which we shall cover in the next section.


6.5 Finding supermassive black holes


6.5.1 Context
There are many lines of evidence that point to accreting black holes in other
galaxies, such as rapid variability, or non-thermal spectra. We’ll meet some of
these shortly. We’ll also show you some ways to find a black hole that’s not
accreting.
It’s not necessary for matter to be compressed to extreme densities to begin the
formation of a black hole. Filling the Solar System with liquid water (at standard
temperature and pressure, STP, of 0◦ C and 1 atm) up to the orbit of Jupiter would
be more than sufficient. There is not enough water in the Galaxy to do this, but
the density of the Sun is only about 1.4 times that of water at STP, and there
is no shortage of stars, particularly at the centres of galaxies. In practice the
stellar densities are not high enough to generate a supermassive black hole
spontaneously; nevertheless, supermassive black holes (MBH > 10⁶ M☉) are
inferred at the centres of many galaxies.
How can one detect supermassive black holes if they are not accreting matter?
Broadly speaking, the trick is to resolve the sphere of influence of the black
hole, within which the gravity of the black hole GMBH/r² dominates over the
centripetal acceleration from the galaxy’s velocity dispersion σ. These effects
balance at a typical radius rh, where

σ²/rh = GMBH/rh²,  (6.28)

so

rh = GMBH/σ² ≈ 10 (MBH / 10⁸ M☉) (σ / 200 km s⁻¹)⁻² pc.  (6.29)

For comparison, the Schwarzschild radius can be expressed as

RS ≈ 2 (MBH / 10⁸ M☉) AU,  (6.30)

where 1 astronomical unit (AU) is the mean distance from the Earth to the Sun,
equivalent to about 4.8 × 10⁻⁶ pc, so the radius rh is about 10⁶ times bigger
than the Schwarzschild radius RS when the galaxy’s velocity dispersion is
σ ≈ 200 km s⁻¹. So, if we can reach angular resolutions corresponding to
∼10⁶ RS, we may be able to discern the effects of the black hole.
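These scalings are quick to verify with standard constants and the fiducial values MBH = 10⁸ M☉ and σ = 200 km s⁻¹:

```python
# Check of Equations 6.29 and 6.30: sphere of influence vs Schwarzschild radius.
G = 6.674e-11          # m^3 kg^-1 s^-2
Msun = 1.989e30        # kg
pc = 3.086e16          # m
AU = 1.496e11          # m
c = 2.998e8            # m/s

M = 1e8 * Msun         # fiducial black hole mass
sigma = 200e3          # fiducial velocity dispersion, m/s

r_h = G * M / sigma**2     # sphere of influence (Equation 6.29)
R_S = 2 * G * M / c**2     # Schwarzschild radius (Equation 6.30)
print(r_h / pc, R_S / AU, r_h / R_S)   # ~10 pc, ~2 AU, ratio ~1e6
```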
Proving that the central object is a supermassive black hole, and not some
super-dense clump of non-luminous stuff, is another matter. One could imagine,
for example, a dense cluster of non-luminous stellar objects such as neutron stars,
white dwarfs and stellar-mass black holes. In three galaxies, the limits on the size
and density of the central object imply that it would have dispersed within the
lifetime of the galaxy, so these could be regarded as providing robust evidence for
a black hole. It’s still possible to hypothesize exotic alternatives, such as a cluster
of very low mass (≤ 0.04 M☉) black holes, but there is no plausible astrophysical
process for making them. It’s also very unclear how one would form a central star
cluster of that density without creating a supermassive black hole anyway. The
evidence for supermassive black holes in galaxies therefore rests on astrophysical
plausibility rather than direct detections.

Exercise 6.5 Find the angular size in arcseconds of the sphere of influence of a
10⁸ M☉ black hole in a galaxy with a central velocity dispersion σ = 220 km s⁻¹,
at a distance of 10 Mpc. Compare this to the best angular resolution typical of
optical ground-based telescopes (known as ‘seeing’) of ∼0.5″, set by turbulence in
the Earth’s atmosphere. (One arcsecond is 1/3600th of a degree.) ■

6.5.2 Stellar and gas kinematics


As you’ve found in the last exercise, you need excellent angular resolution to
probe the sphere of influence of a supermassive black hole, and this is far beyond
the capabilities of conventional optical ground-based telescopes. One approach is
to site your telescope above the Earth’s turbulent atmosphere. The Hubble Space
Telescope (HST) has been extremely important in placing high angular resolution
constraints on supermassive black holes. Another approach, known as adaptive
optics, is to correct for the turbulence in the Earth’s atmosphere: a nearby bright
reference object is used to monitor the distortions in the wavefront, and the
telescope’s optics are deformed in real time to correct for these distortions. The
bright reference could be a nearby bright star or a ‘laser guide star’, i.e. a laser
beam sent from the telescope itself.

Figure 6.5 The velocity dispersion σ and mean velocity v (both in km s⁻¹),
plotted against r/arcsec, along the major axis of the core of M31.

The Andromeda galaxy (M31) is 2.52 million light-years away, or 0.77 Mpc. It is
one of our Galaxy’s closest neighbours and can even be seen with the naked eye
on a dark enough night. At this distance, one arcsecond is about 3.7 pc, so the
196
6.5 Finding supermassive black holes

sphere of influence of the black hole may just be within the capabilities of
ground-based telescopes. Figure 6.5 shows the velocities of the stars in the bulge
of M31 (measured from the integrated starlight rather than detecting individual
stars). The characteristic sharp feature in the centre implies a black hole mass of
≈ 10⁶ M☉. Subsequent HST spatially-resolved spectroscopy revised this upwards
to (3.0 ± 1.5) × 10⁷ M☉.
The HST found evidence for a much larger black hole in another nearby galaxy,
M87, by measuring the Doppler shifts in ionized gas close to the centre. The
spatial sampling of the HST’s Faint Object Camera is 0.028″, corresponding
to only about 2 pc in M87. Two sets of observations by the HST yielded
(2.4 ± 0.7) × 10⁹ M☉ within 18 pc (0.25″), followed by (3.2 ± 0.9) × 10⁹ M☉
within 3.5 pc (0.05″). The HST has since been used for measuring black hole
masses in several other galaxies. For comparison, the total mass of the entire
Small Magellanic Cloud (Chapter 3), including its dark matter, is about
6.5 × 10⁹ M☉.

6.5.3 Megamasers
The early HST discoveries received a lot of press coverage, but at about the same
time — though finding less press notice — a much stronger constraint on a
supermassive black hole came from radio astronomy. The galaxy NGC 4258 (also
known as M106) has naturally-occurring masers (microwave lasers) generating
coherent radiation through stimulated emission, with H2 O molecules providing
the masing medium. These masers are believed to be generated in random
directions but with a small random subset lying along the line of sight to the
Earth. (This abundant maser activity is sometimes referred to as ‘megamasers’.)
This masing emission can be detected by radio interferometry, which routinely
has milliarcsecond (mas) resolutions or better. Figure 6.6 shows the line-of-sight
velocities of masers in NGC 4258 observed with the Very Long Baseline Array
(VLBA) of radio telescopes. Currently, observations are consistent with a central
mass of (3.82 ± 0.01) × 10⁷ M☉ within a central radius of 0.13 pc (4.1 mas, i.e.
0.0041″).
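As a rough consistency check (assuming purely Keplerian rotation, which the maser data support), the quoted enclosed mass and radius imply orbital speeds close to the ∼1000 km s⁻¹ spread seen in Figure 6.6:

```python
import numpy as np

G = 6.674e-11          # m^3 kg^-1 s^-2
Msun = 1.989e30        # kg
pc = 3.086e16          # m

M = 3.82e7 * Msun      # enclosed mass quoted in the text
r = 0.13 * pc          # central radius quoted in the text
v = np.sqrt(G * M / r) # circular (Keplerian) speed at that radius
print(v / 1e3)         # ~1100 km/s, comparable to the maser velocities
```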

Figure 6.6 The line-of-sight (LSR) velocities, in km s⁻¹, of megamasers around
the central supermassive black hole in the galaxy NGC 4258, plotted against
impact parameter in milliarcseconds.


6.5.4 Stellar proper motion


You may already have wondered whether the best chance of resolving the sphere
of influence of a supermassive black hole could be in our Galaxy. The distance
to the Galactic Centre is approximately 8 kpc, so one parsec at this distance
corresponds to around 1/8000th of a radian, or 25″. This is sufficiently close that
we can trace the orbits of individual stars around the central object and make
direct mass estimates. This can only be done in the near-infrared because of the
very heavy dust extinction to the Galactic Centre. Ground-based adaptive optics
in the 1–3 µm wavelength range on telescopes with 8–10 m primary mirrors can
be competitive with the HST, and long-term monitoring campaigns of the Galactic
Centre have yielded impressive results. For example, Figure 6.7 shows the orbit of
a star that passes just over ten light-hours from the central black hole. The enclosed
mass can be derived from the orbits of this star and others, as a function of the
distance to the centre. The data are consistent with a central mass of at least
2.6 × 10⁶ M☉ in our Galaxy. At the time of writing, the current best estimate for
the central black hole mass is (4.5 ± 0.4) × 10⁶ M☉ (Ghez, A. M. et al., 2008,
Astrophysical Journal, 689, 1044).
Figure 6.7 The orbit of the star S2 around SgrA*, the central supermassive
black hole in our Galaxy. The numbers indicate the dates of the observations,
expressed as decimal years; the scale bar is 0.05″ (2 light-days).
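A minimal Kepler's-third-law sketch gives the right order of magnitude for the enclosed mass. The S2 orbital elements used below (semi-major axis ≈ 0.125″, period ≈ 16 yr) are illustrative assumptions, not values quoted in the text:

```python
# Kepler's third law for a Galactic Centre star, with assumed illustrative
# orbital elements for S2: a ~ 0.125 arcsec, P ~ 16 yr, distance 8 kpc.
a_arcsec, P_yr, d_pc = 0.125, 16.0, 8000.0
a_AU = a_arcsec * d_pc        # 1 AU subtends 1 arcsec at 1 pc, so a comes out in AU
M_solar = a_AU**3 / P_yr**2   # M = 4 pi^2 a^3 / (G P^2), in Msun/AU/yr units
print(M_solar)                # ~4e6 solar masses
```

This lands comfortably within the quoted (4.5 ± 0.4) × 10⁶ M☉.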

6.5.5 Broad iron X-ray emission line


The inner regions of the accretion disc can sometimes give rise to a high-energy
X-ray emission line from ionized iron, known as the Kα line. A pioneering
detection in the nearby active galaxy MCG–6-30-15 has a line width implying
Doppler motions of ∼100 000 km s⁻¹ (!), suggesting a location at three to ten
Schwarzschild radii. This is a rare chance to sample the relativistic environment
very close to a black hole.

6.5.6 Reverberation mapping


Another way to set a limit on the size of an emitting region is the variability of its
light output. If an object varies on timescales of weeks, then it’s likely to have a
physical size of light-weeks or smaller. Quasars vary on timescales from years to (in some cases)
days. The extremely high luminosities of quasars and these small inferred
physical sizes, together with the very high radiative efficiency of black hole
accretion (Section 6.3), led to supermassive black holes being the leading
explanation for quasar activity.
However, there are exceptions to these variability arguments: for example, if the
light is from a relativistic jet that happens to be pointed at the Earth, the observed
variability can appear to be much shorter. This is one of the consequences of
combining motion towards the observer with the Lorentz transformation (see
Appendix B), known as relativistic beaming, which is covered in more detail
elsewhere (see the further reading section).
Does quasar variability reflect light-travel time or is it just a beaming artefact? A
variation on this technique, known as reverberation mapping, avoids the
possibility of relativistic beaming. A flare of emission from the centre will cause
a brightening in the broad emission lines, but there will be some time delay
because the broad emission line clouds are not as centrally located. Also, the
clouds exist in physical conditions that are very different to relativistic jets. By
cross-correlating the variation in the continuum with the variations in the emission
lines, one can infer how close the emission line clouds are to the central luminous
object. Figure 6.8 shows the time delay between continuum variations and broad
line luminosity variations in the Seyfert 1 galaxy NGC 5548. The time lag
is measured by cross-correlating the line and continuum measurements, i.e.
summing Σt C(t) L(t − Δt) for various supposed lags Δt.
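The cross-correlation step can be sketched with synthetic light curves (an illustrative toy model, not the NGC 5548 analysis): a 'line' curve that is a delayed copy of the 'continuum' yields a cross-correlation peak at the input lag.

```python
import numpy as np

# Toy reverberation-mapping sketch: a smoothed random 'continuum' light curve
# and a 'line' curve that is simply a delayed copy of it.
rng = np.random.default_rng(1)
n, lag_true = 400, 20                     # samples (days), input lag in days
cont = np.convolve(rng.normal(size=n + 100), np.ones(10) / 10, 'same')
line = np.roll(cont, lag_true)            # line responds lag_true days later
cont, line = cont[100:], line[100:]       # drop the wrap-around edge

lags = np.arange(0, 60)
ccf = [np.corrcoef(cont[:n - 60], line[l:l + n - 60])[0, 1] for l in lags]
best = lags[int(np.argmax(ccf))]
print(best)   # the cross-correlation peaks at the input lag of 20 days
```

Real analyses must also contend with irregular sampling and noise, but the principle is the same.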
Why are the lag measurements in Figure 6.8 fairly broad? One reason is that if the
Δt chosen is close to (but not equal to) the underlying value, then there may still
be some positive cross-correlation signal. More importantly, light-travel time
effects mean that the broad line clouds are sampled at a range of radii at any
fixed time lag, as shown in Figure 6.9. One can use this reverberation mapping
in different emission lines to derive the internal structures of the broad line
region of active galaxies. High-ionization lines respond most rapidly, implying a
stratification of ionization, and they also have the largest widths. This immediately
suggests a route to estimating black hole masses: if these widths represent
Doppler motion, then material closer to the black hole is moving faster and we
can use a formalism analogous to Equation 6.28 to estimate a black hole mass:

MBH ≈ r(Δv)²/G,  (6.31)

where the radius r is estimated from the reverberation measurements, while Δv is
measured from the line widths.
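Equation 6.31 in action, with illustrative numbers (the 20 light-day lag and 5000 km s⁻¹ line width below are assumed for the sketch, not taken from the text):

```python
# Virial black hole mass estimate (Equation 6.31) with assumed inputs.
G = 6.674e-11          # m^3 kg^-1 s^-2
c = 2.998e8            # m/s
Msun = 1.989e30        # kg

r = c * 20 * 86400.0   # assumed 20 light-day broad-line-region radius, in m
dv = 5.0e6             # assumed 5000 km/s line width, in m/s
M_BH = r * dv**2 / G   # Equation 6.31
print(M_BH / Msun)     # ~1e8 solar masses
```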


Figure 6.8 The variations in the continuum (Fλ at 1350 Å and 5100 Å) and in
selected emission lines (He II, Lyα, C IV and C III) in the Seyfert 1 galaxy
NGC 5548. The left-hand panels show the variations themselves, plotted against
(JD − 2 400 000)/days, and the right-hand panels show the strength of the lag Δt,
plotted as a correlation coefficient against delay in days. This strength is
proportional to the cross-correlation Σ C(t) L(t − Δt), where C(t) are the
continuum measurements and L(t) are the line measurements.

Reverberation mapping has also confirmed predictions from photoionization
models of the structure of quasar broad line regions. In the simplest such model,
the radius r from which an emission line mainly originates scales as r ∝ √L,
where L is the luminosity of the quasar. This follows from the expectation that the
number of photons per electron, U, will vary as U ∝ L/(4πr²), and any given
emission line will be most efficiently produced at a particular U. The parameter U
is known as the ionization parameter; we shall meet it again in Chapter 8.


Figure 6.9 Isodelay surface in a quasar broad line region. Broad line clouds at
radius r, lying along the isodelay surface at angle θ to the line of sight, re-emit
the illuminating continuum photons as emission line photons with a time delay of
Δt = (r/c)(1 + cos θ) relative to light from the centre.
In practice the long-term monitoring data are available for only a few quasars, so
the reverberation measurements act as calibrators for black hole mass estimators
of the form MBH = k Lν^α (Δv)², where Lν is the continuum luminosity measured
near a particular emission line, and k and α are constants specific to that emission
line. Typically α ≈ 0.5, as required if r ∝ √L.
The quasars in the Sloan Digital Sky Survey (SDSS) have had black hole mass
estimates ranging from around 10⁷ M☉ to an astonishing > 10¹⁰ M☉ (compare
the mass of the entire Small Magellanic Cloud galaxy in Subsection 6.5.2).
However, these black hole mass estimates have underlying uncertainties of many
tens of per cent at best. It’s possible that these largest black holes are in fact
lower-mass objects in which the underlying uncertainties happen to have given
rise to a higher measurement. Nevertheless, the SDSS quasar data set is consistent
with black holes existing up to a maximum mass of 3 × 10⁹ M☉.

6.6 The Magorrian relation


The masses of black holes turn out to have an astonishingly close relationship to
certain properties of their host galaxies. This closeness is a very important clue to
the formation of supermassive black holes and the galaxies that host them. The
next exercise should convince you that it’s at least not immediately obvious how
supermassive black holes came into existence.
Exercise 6.6 The age of the Universe in the concordance cosmology at
redshift z = 2 was about 3 Gyr. A 10 M☉ black hole could be made early in the
history of the Universe as an end-product of the first stars. Show that even if you
start with a 10 M☉ black hole at z = ∞, you still cannot create a supermassive
black hole by z = 2 through Eddington-limited black hole growth if it is accreting
with maximal efficiency η = 0.42. ■
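One way to see the result is via the e-folding (Salpeter) timescale for Eddington-limited growth, t = [η/(1 − η)] σT c/(4πG mp), which assumes pure-hydrogen, Thomson-scattering opacity; the (1 − η) factor is the mass-retention correction discussed in Section 6.4.

```python
import numpy as np

# Eddington-limited growth: dM/dt = (1-eta)/eta * L_Edd/c^2 gives exponential
# growth with e-folding time t_fold = [eta/(1-eta)] sigma_T c / (4 pi G m_p).
G, c = 6.674e-11, 2.998e8         # SI
sigma_T, m_p = 6.652e-29, 1.673e-27
yr = 3.156e7                      # seconds

eta = 0.42                        # maximal (prograde Kerr) efficiency
t_fold = eta / (1 - eta) * sigma_T * c / (4 * np.pi * G * m_p)
M_final = 10.0 * np.exp(3e9 * yr / t_fold)   # 10 Msun seed grown for 3 Gyr
print(t_fold / (1e6 * yr), M_final)          # ~330 Myr; ~1e5 Msun << 1e6 Msun
```

At this efficiency the seed falls well short of the supermassive (> 10⁶ M☉) threshold by z = 2, as the exercise claims.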
How do black holes relate to host galaxies? One test is to correlate the black hole
measurements with host galaxy properties. A closer physical link should result in
a tighter correlation. We must consider not just the correlation but also the scatter.

If we correlate the supermassive black hole masses against the total host galaxy
masses, the correlation is somewhat weak. A stronger correlation (shown in
Figure 6.10) is between MBH and the spheroid luminosity, i.e. the bulges in the
case of spiral galaxies, and the entire galaxies in the case of ellipticals.

Figure 6.10 Left: correlation between the black hole masses (in M☉) and central
velocity dispersions σ (in km s⁻¹) for local galaxies. Right: correlation between
black hole masses and bulge B-band magnitudes of the same sample of local
galaxies. Ellipticals are displayed as circles, spirals as triangles, and the squares
represent both lenticulars and compact elliptical galaxies.

This was first discovered in 1998 by a team led by the astronomer John
Magorrian, and has become known as the Magorrian relation. There is an even
stronger correlation between the spheroid velocity dispersions σ and MBH . This
correlation is also shown in Figure 6.10. The dispersion of the data points is
almost entirely attributable to the uncertainties in the measurements — in other
words, the measurement of the underlying scatter is almost consistent with zero.
The best-fit relationship is

MBH = (1.66 ± 0.32) × 10⁸ (σ / 200 km s⁻¹)^(4.58±0.52) M☉.  (6.32)
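Evaluating Equation 6.32 at a few dispersions (central values only, ignoring the quoted uncertainties) shows how steeply the inferred black hole mass climbs with σ:

```python
# Black hole mass from the M-sigma relation, central values of Equation 6.32.
def M_BH(sigma_kms, A=1.66e8, alpha=4.58):
    return A * (sigma_kms / 200.0)**alpha

print(M_BH(100.0), M_BH(200.0), M_BH(300.0))  # ~7e6, 1.66e8, ~1e9 Msun
```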
Central supermassive black holes are about 0.6% of the masses of their galactic
bulges (or the whole galaxies in the case of ellipticals), with a sphere of influence
that is less than a thousand billionth of the volume of the bulge, yet the black hole
mass correlates astonishingly strongly with the galaxy velocity dispersion.
Clearly, the creation of a central supermassive black hole has somehow been
strongly connected to the formation of its galaxy. We shall speculate on how and
why later in this chapter.


6.7 The hard X-ray background


In Chapter 5 we turned Olbers’ paradox around and made inferences about the
evolution of galaxies and star formation from the spectrum of the extragalactic
optical and infrared background light. With the cosmic X-ray background, we can
make similar inferences about the cosmic history of black hole accretion.
The cosmic X-ray background was discovered in 1962 on a rocket mission
designed to study the Moon. The Moon remained undetected in X-rays until
1991, but the discovery of the cosmic X-ray background pre-dates the discovery
of the CMB by two years, making it the first known cosmic background. Riccardo
Giacconi led this pioneering discovery, and his cumulative contributions to X-ray
astronomy were recognized in his Nobel Prize in 2002. Figure 6.11 shows a 1991
image of the Moon by the ROSAT (Röntgensatellit) X-ray satellite.

Figure 6.11 Image of the Moon taken with the ROSAT satellite. Note the
reflected solar X-rays, the isotropic background that is shadowed by the dark side
of the Moon, and the weaker background that appears to emanate from the dark
side of the Moon. This latter emission has since been deduced to be local to the
Earth and the satellite.

The Moon reflects solar X-rays, but the dark side is clearly obscuring a
faint background. The image is grainy because X-ray detectors respond to
individual photons. The dark side of the Moon isn’t quite black because the
Earth’s geocorona, or extended outer atmosphere, emits X-rays, and the satellite
orbits within this geocorona. (The extragalactic X-ray research community
sometimes refers to the ‘background’ as being the proportion that is not yet
resolved into point sources, rather than the total flux from the sky. We shall follow
the conventions at other wavelengths and take the ‘background’ to mean the total
flux, rather than the unresolved component of it.)
As with the CMB, the X-ray background appears to be fairly isotropic, once the
Galaxy has been subtracted. This on its own suggests a cosmological origin. The
X-ray background is included in Figure 5.1. One can make inferences about
the evolution of black hole accretion from this, but one thing quickly became
apparent. The spectrum of the extragalactic X-ray background is very different to
the spectrum of a star-forming galaxy or an unobscured quasar, so what could
generate this background? This is sometimes known as the X-ray spectral
paradox.
The shape of the X-ray background therefore requires some objects with harder
X-ray spectra, i.e. with a greater proportion of higher-energy X-rays. The most
likely candidate is type 2 AGN. Figure 6.12 shows the effect of X-ray and optical
absorption for various levels of obscuration. As we increase the gas column


density (the integral of the gas particle number density along the line of sight), so
the optical depth of dust should increase. As the dust content is increased, the
optical and near-ultraviolet flux is decreased, with the absorbed energy being
re-emitted in the infrared as thermal emission. Dust grains are not effective
absorbers of X-rays, but the neutral hydrogen gas associated with the dust does
absorb X-rays. The sharp cut-off at low X-ray energies is sometimes known as the
photoelectric cut-off. The highest-energy X-rays are the most penetrating,
which means that X-ray spectra with absorption have an inverted spectral index.
Cosmologically, this has two important observational consequences. First, there is
a negative K-correction, as with submm-wave surveys (see Chapter 5). Second,
the redshifting causes the observed photons to come from higher-energy
rest-frame X-rays at higher redshifts. Therefore the high-redshift objects will have
lower rest-frame obscuration, unlike in optical galaxy surveys.

Figure 6.12 Left: the transmitted flux, against photon energy in keV, through a
neutral hydrogen column density of (from top to bottom) 10²⁰, 10²¹, 10²² and
10²³ cm⁻². The greater the column density, the less flux is transmitted. Right:
the corresponding near-infrared to ultraviolet extinction, against wavelength in
nm, for NHI = AV × 1.8 × 10²⁵ m⁻² mag⁻¹. The right-hand curves show
AV = 0.055, 0.555, 5.55 and 55.5 (from top to bottom).

The active galaxies with the highest NH column densities,

NH > σT⁻¹ ≈ 1.5 × 10²⁴ cm⁻²

(where σT is the Thomson scattering cross-section), are optically thick to
Compton scattering of their hard X-ray photons. The scattered X-ray photons
have lost some of their energy, which was carried off as kinetic energy by the
electron with which they collided. These lower-energy X-rays are much more
easily absorbed. These Compton-thick active galaxies are difficult to detect in
hard X-ray surveys but their presence can be inferred from interpreting the
shape of the hard X-ray background. One could avoid having a Compton-thick
population, but (it turns out) only at the cost of failing to reproduce the observed
absorption column density distribution. There must therefore have been some
significant contribution from Compton-thick active galaxies to the black hole
accretion history of the Universe.
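The Compton-thick threshold quoted above is just one optical depth to Thomson scattering, as a one-line check confirms:

```python
# N_H > 1/sigma_T: unit Thomson optical depth defines 'Compton-thick'.
sigma_T = 6.652e-25        # Thomson cross-section, cm^2
N_H_thick = 1.0 / sigma_T
print(N_H_thick)           # ~1.5e24 cm^-2, as quoted in the text
```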
If the geometry of the absorption allows it, it may be possible to detect some of
the Compton scattered component directly (though faintly). The spectrum of this
‘Compton reflected’ component is expected to be a broad peak around 20–30 keV,
depending on the ionization of the scattering medium. Also, Compton scattering
from iron nuclei can excite a strong iron Kα line (Subsection 6.5.5). Both the iron
line and the spectral shape can be useful indicators of Compton-thick absorbers,
though with current X-ray telescopes this can be done only in luminous and/or
local active galaxies.
The softer X-ray background has been mostly resolved into its constituent
point sources by ROSAT (≈ 75% of the 0.5–2 keV background). It has taken
longer to do the same at harder X-ray energies, where the X-ray spectral paradox
suggested new populations. The Japanese ASCA satellite resolved ≈ 35%
of the 2–10 keV background, while the Italian BeppoSAX mission resolved
≈ 20–30% of the 5–10 keV background.
of the hard X-ray background into its constituent point sources came from
the European Space Agency’s XMM-Newton space telescope and NASA’s
Chandra space telescope. Figure 6.13 shows the deep pencil-beam surveys
taken by XMM-Newton and Chandra. The point sources in these surveys can
account for ≈ 80–90% of the 2–6 keV background, but only 50–70% of the
6–10 keV background. Astronomical X-ray CCDs detect individual photons by
converting them to electrons via the photoelectric effect (about 10–80% of
incident photons are converted, depending on the X-ray photon energies), then
reading the accumulated charge in each pixel. The faintest X-ray sources found by
XMM-Newton and Chandra have electron count rates of only ≈ 1 per day.
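At such rates, detecting a source is an exercise in counting individual photons: a 2 Ms deep exposure collects only a couple of dozen counts from the faintest sources, and Poisson statistics set the significance. A sketch (the background of 2 counts inside the extraction region is an illustrative assumption, not a measured value):

```python
import math

rate_per_day = 1.0                     # ~1 detected photon per day
exposure_s = 2e6                       # a 2 Ms deep exposure
counts = rate_per_day * exposure_s / 86400.0
print(f"expected source counts: {counts:.1f}")   # ~23

# Rough signal-to-noise, assuming an (illustrative) background of
# 2 counts falling inside the source extraction region:
background = 2.0
snr = counts / math.sqrt(counts + background)
print(f"approximate S/N: {snr:.1f}")
```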
The term ‘hard X-ray background’ usually refers roughly to the 2–10 keV
range, but one should not forget the higher-energy background. Most of the
energy density in the cosmic X-ray background is at 10–100 keV, but only a few
per cent of this background has been directly resolved so far. The proposed future
European Space Agency X-ray space telescope, currently named the International
X-ray Observatory (IXO), will be able to observe 0.1–40 keV and directly probe
the Compton-thick populations.
What sort of objects dominate the hard (2–10 keV) X-ray background? Most
turn out to be active galaxies, but there is an extraordinary range in the optical
properties: the X-ray–optical flux ratios vary by over four orders of magnitude.
There are unobscured and obscured AGN, which appear in optical spectra as
broad line and narrow line objects, respectively. More surprisingly, there is a
minority of objects with obscured X-ray spectra but broad optical emission lines,
and others with low X-ray column densities yet narrow line AGN optical spectra
(implying that the obscuring gas doesn’t follow the same distribution as the
obscuring dust in Figure 4.15). Many objects in the XMM-Newton and Chandra
pencil-beam surveys are too faint for optical spectroscopy, even with the largest
8–10 m-mirror optical telescopes, though broad-band photometry is consistent
with these being mainly distant AGN. Some objects have no optical counterparts
in even the deepest ground-based and space-based optical imaging. There are also
X-ray bright, optically-normal galaxies (with the delightful acronym XBONGs),

which show X-ray evidence for AGN (sometimes obscured, but not always) yet
no optical AGN evidence (such as broad or high-ionization emission lines) in the
optical spectra. A minority of the hard X-ray background also comes from
starburst galaxies (see Chapter 5), galaxy clusters and groups (see Chapters 3
and 7), and Galactic stars.

Figure 6.13 Deep fields taken by the XMM-Newton and Chandra space telescopes. (a) The Chandra Deep Field
North (CDF-N). This is a 2 Ms image (i.e. an exposure of 2 × 10^6 seconds) taken by Chandra, in the region of
the Hubble Deep Field North (HDF-N, marked in green). The Spitzer GOODS survey field is also shown in
green. There are X-ray data over around 448 arcmin², i.e. around 60% of the angular area of the Moon. This
image represents 0.5–2 keV photons as red, 2–4 keV as green (except for annotations), and 4–8 keV as blue.
Confusingly, there is also a Chandra Deep Field South (CDF-S), which does not coincide with the Hubble Deep
Field South (HDF-S), but the Chandra field is nevertheless the site of the Hubble Ultra Deep Field (UDF). (b) The
XMM-Newton deep field in a region of sky known as the Lockman Hole in Ursa Major (named after Felix
Lockman, who discovered that this region had very low X-ray absorption from Galactic neutral hydrogen). Here,
0.5–2 keV photons are represented as red, 2–4.5 keV as green, and 4.5–10 keV as blue. Objects are broader closer
to the edges because the instrumental angular resolution is coarser. This image covers about 1556 arcmin².
But where are the Compton-thick objects? A few high-redshift objects are known
to be Compton-thick from X-ray observations, such as the hyperluminous galaxy
IRAS F10214+4724 (Chapters 5 and 7), but most are very hard to detect in
X-rays. An intriguing clue has recently come from infrared surveys. Good
candidates for highly dust-shrouded quasars were found through high 24 µm to
3.6 µm flux ratios (Martínez-Sansigre, A. et al., 2005, Nature, 436, 666). (A faint 3.6 µm flux implies that it's not an unobscured quasar,
in which case the 3.6 µm flux is dominated by the host galaxy, while the 24 µm
excess suggests hot dust as expected for an AGN dust torus.) Furthermore, as we
saw in Chapter 5, star-forming galaxies have strong emission and absorption
features in the mid-infrared, while the mid-infrared spectra of active galaxies are
typically featureless. Mid-infrared spectroscopy of some mid-infrared-bright
but optically-faint galaxies in the Chandra Deep Field North (also known as
GOODS-North) has found evidence of active nuclei, but the X-ray emission of
these galaxies is weak or absent. This suggests that there is a population of
‘mid-infrared excess’ galaxies having high mid-infrared–optical flux ratios, at
least some of which are Compton-thick active galaxies. Could these be the
missing galaxies that dominate the highest-energy X-ray backgrounds?

6.8 Black hole demographics


We can use Equation 6.32 to estimate the present-day cosmological density of
black holes, by integrating over the spheroid luminosity function and using the
Faber–Jackson relationship between velocity dispersion and luminosity, and its
equivalent for spiral bulges (Chapter 3). From this, one finds that
ρ_BH = 9.4^{+3.9}_{−2.9} h² × 10^5 M_⊙ Mpc^−3.
This is about ten times bigger than Soltan’s original estimate of
ρ_BH = [(1 − η)/0.9] × [0.1/η] × 8 × 10^4 M_⊙ Mpc^−3
(where η is the accretion efficiency), but Soltan considered only the type 1 (broad
line) active galaxies. Locally, there are about four times as many type 2 active
galaxies as type 1. In order to bring the Soltan estimate broadly into agreement
with this local value, the type 1/type 2 fraction would need to evolve (or the
estimated bolometric correction was wrong).
Alternatively, one could make a Soltan-style analysis using the X-ray number
counts of active galaxies, since these will include the contributions from type 2
active galaxies, or at least the Compton-thin proportion. This gives
ρ_BH = (4.7–10.6) × [(1 − η)/(9η)] × 10^5 M_⊙ Mpc^−3.

Comparing this to the local black hole number density yields a broad constraint
on the accretion efficiency of η = 0.04–0.16. There have also been several
attempts to model the fraction of time that quasars spend accreting (known as the
duty cycle), their accretion efficiencies and their Eddington ratios. The typical
inputs to these models are the local black hole mass function (number per unit
volume per unit mass), the evolving quasar luminosity function, and the evolution
and luminosity dependence of the type 1/type 2 number density ratio. However,
as these constraints are model-dependent, we shall not discuss them here.
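The quoted accretion-efficiency range follows from equating the X-ray-background estimate to the local black hole density, i.e. solving (4.7–10.6) × (1 − η)/(9η) × 10^5 M_⊙ Mpc^−3 = ρ_BH,local for η, using the full uncertainty range on the local value. A sketch of that inversion (the equation form is as reconstructed from the text above, so treat the exact limits as approximate):

```python
def efficiency(c5, rho_local_5):
    """Solve c * (1 - eta)/(9 eta) = rho_local for eta.

    Both c5 and rho_local_5 are in units of 1e5 Msun/Mpc^3.
    Rearranging gives eta = c / (c + 9 * rho_local).
    """
    return c5 / (c5 + 9.0 * rho_local_5)

# Lowest eta: smallest accretion estimate vs largest local density
eta_min = efficiency(4.7, 9.4 + 3.9)
# Highest eta: largest accretion estimate vs smallest local density
eta_max = efficiency(10.6, 9.4 - 2.9)
print(f"eta = {eta_min:.2f} - {eta_max:.2f}")   # roughly 0.04 - 0.15
```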
X-ray observations of local galaxies have revealed a final surprise: a population of
‘ultraluminous X-ray sources’ with luminosities > 10^32 W (i.e. > 3 × 10^5 L_⊙)
that do not lie at the centres of the galaxies (see, for example, Figure 6.14). If
these are accreting black holes, then their masses are in the range 100–10 000 M_⊙,
sometimes referred to as ‘intermediate mass black holes’. How did they form?
Are they formed, for example, at the centres of globular clusters? Some globular
clusters do show evidence for central intermediate mass black holes, and they
appear to obey the same M_BH–σ relationship as galaxy spheroids (see, for
example, Maccarone, T. J. et al. (2007) Nature, 445, 183). Do intermediate mass
black holes eventually sink to the centre of their galaxy to build up the central
supermassive black hole? How do intermediate mass black holes participate in the
formation of globular clusters, if at all? Are they indeed black holes, or some brief
X-ray luminous super-Eddington state of an accreting stellar-mass black hole? At
the time of writing, these are still open questions.

Figure 6.14 Optical image of the galaxy M74, with the hard X-ray image from the Chandra
satellite superimposed in red. The ultraluminous X-ray source is marked as ULX. This is too
bright to be a conventional X-ray binary star system, unlike the other X-ray sources in this
image.
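The case for intermediate masses rests on the standard Eddington limit, L_E = 4πGMm_p c/σ_T ≈ 1.3 × 10^31 W per solar mass: if a ULX radiates above 10^32 W isotropically and no faster than the Eddington rate, it must be well above stellar mass. A quick check:

```python
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_P = 1.673e-27      # proton mass, kg
C = 2.998e8          # speed of light, m/s
SIGMA_T = 6.652e-29  # Thomson cross-section, m^2
M_SUN = 1.989e30     # solar mass, kg
PI = 3.141592653589793

def eddington_luminosity(mass_msun):
    """Eddington luminosity in watts for a mass in solar masses."""
    return 4 * PI * G * mass_msun * M_SUN * M_P * C / SIGMA_T

def min_eddington_mass(luminosity_w):
    """Minimum mass (Msun) for sub-Eddington emission at this luminosity."""
    return luminosity_w / eddington_luminosity(1.0)

# A ULX radiating 1e33 W needs ~80 Msun if it is not super-Eddington,
# comfortably above a stellar-mass black hole:
print(f"{min_eddington_mass(1e33):.0f} Msun")
```

The alternative, as the text notes, is a briefly super-Eddington stellar-mass black hole, which evades this bound.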

6.9 Observations of black hole growth and the effects of feedback


Which came first — the black hole or its galaxy? Can observations constrain
which came first? This takes us into what at the moment is uncertain experimental
territory. There’s a connection to the cooling flow problem in Section 3.9: dense
cooling core clusters are a nearby example where we can observe the influence of
a black hole suppressing star formation in the central galaxy.
One of the problems is that it’s not currently technologically feasible to directly
resolve the black hole sphere of influence in any but the most local galaxies. The
local MBH –σ relationship is the end result of billions of years of evolution,
including multiple galaxy mergers. Intuition suggests that the MBH –σ relationship
was somehow imprinted early on, and numerical simulations have confirmed that
this relationship, once established, is maintained surprisingly well in galaxy
mergers. Unfortunately, the primordial high-redshift links between black holes
and their host galaxies are very difficult to observe directly.
However, there are slightly less direct approaches. As we’ve seen, reverberation
mapping can be used to derive the masses of quasar black holes, while the
velocity dispersions can be inferred from the widths of absorption lines in the
quasars’ host galaxies. Provided that these are both reliable measures, we can
make some constraint on the evolution of the black hole–host galaxy relationship.
Another approach is to use radio-loud active galaxy unification models. The
radio-loud active galaxy population in general has powerful radio jets emitted
from the active nucleus, terminating in a bow shock in the intergalactic medium.
Figure 4.12 is an example. Only about 10% of active galaxies are radio-loud, but
it is not clear why.
In the radio-loud unification model, active galaxies have a dusty torus that
obscures the view of the quasar broad lines from some orientations (see
Figure 4.15). Quasars and radiogalaxies with the same radio lobe luminosities
should be members of the same population, though seen with different
orientations. Therefore we can measure the host galaxy properties by studying the
radiogalaxies, then measure the corresponding quasar properties by comparing the
radiogalaxies’ counterparts in the quasar population. There is some (admittedly
weak but suggestive) evidence for the evolving black hole mass and host galaxy
properties inferred from this method.
So which came first, the black hole or its galaxy? The comparison of the Madau
diagram (Chapters 4 and 5) to the black hole accretion history suggests that there
was plenty of star formation before the quasar epoch. However, the comparison of
3CRR host galaxy masses and black hole masses suggests that the MBH /Mspheroid
ratio increases with redshift, which in turn suggests that the most massive black
holes were pre-existing and spheroids formed around them.
On the other hand, there are hints that the submm-selected galaxy population has
a smaller MBH /Mgalaxy mass ratio than these quasars, at least in those submm
galaxies with broad optical emission lines (Figure 6.15). What’s more, many
submm galaxies have been detected in hard X-rays, implying that black hole mass
accretion appears to be much more common in those galaxies than in the general
galaxy population. Perhaps the most massive starbursts are eventually shut off by
the energy input from an exponentially-growing accreting black hole. Quasars at
all redshifts are ultraluminous starbursts on average, though this doesn’t tell us
whether the quasar phase comes at the start of the starburst or at the end. At the
time of writing, it seems that the star formation rate in quasars varies roughly as
the square root of the quasar luminosity, not linearly:

dM_∗/dt ∝ (dM_BH/dt)^{0.44±0.07}.

This feature has not so far been reproduced by models of quasar feedback
(Serjeant and Hatziminaoglou, 2009, Monthly Notices of the Royal Astronomical
Society, 397, 265).

Figure 6.15 Black hole–galaxy mass ratios for the galaxies selected at submm
wavelengths (SMGs) that are obscured at X-ray wavelengths with galaxy masses
inferred from observed stellar mass and via the width of a carbon monoxide
emission line. Also shown is the relationship for local ultraluminous infrared
galaxies (ULIRGs), active galaxies, and an indication of the range spanned on
average by X-ray luminous QSO SMGs. The effect of changing the assumed
value of η (Equation 6.7) for SMGs is also shown. The SMGs have black hole
masses that are smaller than those of quasars, for their host galaxy masses.
The scatter in the MBH –σ relationship appears to be smaller than that of the
MBH –Mhalo correlation, suggesting that the velocity dispersion and not the mass
of the dark matter halo is primary. This has been shown (at some length) to be
consistent with self-regulated black hole growth (Wyithe and Loeb, 2005,
Astrophysical Journal, 634, 910), in which the energy output from
black hole accretion is enough to unbind the gas, which chokes off the supply of
fuel to the black hole. There is currently a great deal of research activity in this
area, aiming at inferring the strengths of the physical links from the tightnesses of
the correlations. For example, the surprising lack of a black hole in the galaxies
M33 and NGC 205, even though their central star clusters obey the same central
mass versus spheroid mass relationship, may point at a different fundamental
relation. This is still being debated and studied. Also, in nearby active galaxies,
there appears to be a different distribution of Eddington ratios for galaxies with
recent star formation, compared to those with more quiescent stellar populations.

Galaxies with recent star formation also seem to have higher Eddington ratios
(Figure 6.16). It’s been suggested that this is consistent with self-regulated black
hole growth while the gas supply is plentiful (which also fuels the star formation),
but when the gas supply runs out, it seems that the only fuel for the black hole
comes from mass loss from evolved stars, starving the black hole.

Figure 6.16 Distribution of inferred Eddington ratios L/L_E for galaxies with young stellar populations (left),
and with old stellar populations (right). The AGN luminosity is taken to be proportional to the luminosity in the
[O III] emission line, symbolized as L[O III], and the y-axis is the logarithm of the fraction of the population, F. The
colours represent black hole mass ranges, from 6.75 < log10 M_BH < 7.00 up to 8.00 < log10 M_BH < 8.25 in
steps of 0.25 dex, as shown in the right-hand panel.

6.10 Merging black holes and gravitational waves


A tremendous new window in observational astronomy may soon open up. We
may soon be able to observe the only detectable radiation that comes directly from
black holes. (Hawking radiation is too faint ever to be detectable for any known
population of black holes.)
In electromagnetism, accelerating charges radiate electromagnetic waves. In
general relativity, the analogous process is the radiation of gravitational waves,
but in this case the medium of the wave is spacetime itself. We said at the start of
this chapter that spherically-symmetric motion does not generate gravitational
waves. Dipole gravitational radiation turns out to be impossible because it would
violate conservation of momentum: any accelerated mass would be balanced by
an equal and opposite change of momentum somewhere else, so any attempt by
that mass to radiate dipole gravitational radiation would be cancelled out by the
radiation from elsewhere. Only quadrupole-moment motion or above generates
gravitational waves.
Gravitational waves took some time to gain wide acceptance, perhaps because
they need a very careful treatment of coordinate systems in general relativity.

Einstein himself initially thought that gravitational waves did not exist. Eddington
is said to have dismissively quipped that gravitational waves travel ‘at the speed of
thought’, but in truth his remark was directed at a certain spurious subset, and in
fact he showed that members of another class of gravitational waves do indeed
carry energy (Eddington, A. S. (1922) Proceedings of the Royal Society of
London A, 102, 268–82).
Gravitational waves have been inferred in the binary pulsar PSR B1913+16, in a
beautiful verification of the predictions of general relativity that won Russell
Hulse and Joseph Taylor the 1993 Nobel Prize (Figure 6.17). The pulsar is in a
binary orbit with another star, detectable through subtle variation in the timings
of the pulses. (Doppler shifts imply timing variations, in the same way that
cosmological redshift implies supernova time dilation — see Chapter 1.) The
energy loss from gravitational radiation leads to a gradual spiralling in of the two
stars, detectable from the timings. Primordial gravitational waves are also
expected to contribute to the CMB power spectrum (Chapter 2), though direct
detection of primordial gravitational waves will be extremely challenging.
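The orbital decay of PSR B1913+16 can be checked against the leading-order (quadrupole) prediction of general relativity, dP_b/dt = −(192π/5)(2πG/P_b)^{5/3} m₁m₂(m₁ + m₂)^{−1/3} c^{−5} f(e), with eccentricity enhancement f(e) = (1 + 73e²/24 + 37e⁴/96)(1 − e²)^{−7/2}. This formula (due to Peters) is standard general relativity rather than anything derived in this chapter; the sketch below uses the published orbital parameters of the system:

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg

def orbital_decay(p_b, m1, m2, e):
    """Leading-order GR orbital period decay dPb/dt (dimensionless, s/s)."""
    f_e = (1 + 73*e**2/24 + 37*e**4/96) / (1 - e**2)**3.5
    return (-(192*math.pi/5)
            * (2*math.pi*G/p_b)**(5/3)
            * m1 * m2 * (m1 + m2)**(-1/3)
            * C**-5 * f_e)

# PSR B1913+16: 7.75 hr orbit, masses 1.4414 and 1.3867 Msun, e = 0.6171
pdot = orbital_decay(27906.98, 1.4414*M_SUN, 1.3867*M_SUN, 0.6171)
print(f"dPb/dt = {pdot:.2e}")   # ~ -2.4e-12 s/s
```

The orbit shrinks by about 76 µs per year; integrated over decades this produces the cumulative periastron shift shown in Figure 6.17.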

Figure 6.17 The cumulative change in the periastron time (the time of closest
approach of the two stars) of the pulsar PSR B1913+16, compared to the
prediction of general relativity; the cumulative shift grows to about −40 s between
1975 and 2005. The data agree with the prediction to 0.2%.

In fact, at the time of writing, no direct detections of gravitational waves have
been made, but ambitious experiments may soon succeed and usher in a new
era of gravitational wave astronomy. Gravitational waves change the distance
between free-falling observers, so gravitational wave observatories seek to
monitor distances carefully using laser interferometry. The detectability of
gravitational waves is greatly helped by the fact that the amplitudes are being
measured, rather than the energies as with electromagnetic radiation. The
energy E varies with amplitude A as E ∝ A2 . Therefore, while fluxes fall off as
1/r2 , amplitudes fall off only as 1/r. Several ground-based gravitational wave
detectors are under development at the time of writing, such as LIGO (Laser
Interferometer Gravitational-wave Observatory) and GEO, the German–British
gravitational wave detector. The European Space Agency also has plans to
launch a space-based gravitational wave observatory named LISA (Laser
Interferometer Space Antenna), consisting of three free-flying spacecraft linked
by laser interferometry. Figure 6.18 shows the expected sensitivity of forthcoming
gravitational wave detectors. LIGO has already given useful upper limits to the
gravitational waves from a nearby gamma-ray burst.
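Because detectors measure strain amplitude (falling as 1/r) rather than flux (falling as 1/r²), an improvement in amplitude sensitivity by a factor s extends the range by s and the surveyed volume, and hence the expected event rate for a uniformly distributed source population, by s³. A one-line illustration:

```python
def rate_gain(sensitivity_improvement):
    """Event-rate gain from an amplitude-sensitivity improvement.

    Range grows linearly with amplitude sensitivity (h ~ 1/r), so the
    accessible volume, and hence the event rate for a uniform source
    population, grows as the cube.
    """
    return sensitivity_improvement ** 3

# Doubling amplitude sensitivity gives 8x the expected events;
# a tenfold improvement gives 1000x.
print(rate_gain(2), rate_gain(10))
```

This cubic payoff is why modest interferometer upgrades can transform a null result into a rich event catalogue.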

Figure 6.18 The predicted sensitivity of Advanced LIGO and LISA, compared to
the expected gravitational wave amplitudes of various astrophysical populations:
coalescence of massive black holes, resolved and unresolved Galactic binaries,
NS-NS and BH-BH coalescence, and SN core collapse. The amplitudes are
expressed as fractional changes in lengths, so are dimensionless; the frequency
axis spans 10^−4 to 10^4 Hz. NS refers to neutron stars, BH to black holes,
SN to supernovae.

The merger of two black holes would generate copious gravitational waves.
Within our Galaxy, many merger events of black holes and neutron stars should be
detectable (see Figure 6.18). At cosmological distances, one could detect only the
mergers of supermassive black holes. Could this happen? At the time of writing,
at least one credible candidate for a binary supermassive black hole has been
found in a quasar (Figure 6.19). But supermassive black hole mergers may be
much more common than this single example suggests. If quasars and starbursts
are triggered by mergers, then galaxy–galaxy merging is common in the history of
the Universe. Merging galaxies with pre-existing supermassive black holes will
have their supermassive black holes forming a binary system within a million
years of the merger, according to numerical simulations. The expectation is of a
few tens of supermassive black hole merger events per year detected with LISA.
The gravitational waves from inspiralling black holes are also sufficiently
well-understood that they could be treated as standard candles, and they are
sometimes referred to as ‘standard sirens’. The physical simplicity of such a
system, completely determined in practice by two masses and two spins, is very

attractive compared to other standard candles. Luminosity distances can be
determined to around 4% accuracy with LISA. If one could independently
measure the redshift, one would have an ingenious method to map the geometry
of the Universe, determine the evolution of dark energy, and so on. However, the
angular resolution of LISA will be only ∼0.2–0.5°, making this a challenging
(but not necessarily impossible) experiment. More spectacularly, a merger of
black holes at z = 10 could be a route to directly measuring the acceleration or
deceleration of the expansion of the Universe at z = 10.

Figure 6.19 The quasar SDSS J1536+0441. The spectrum has evidence of three
redshifts. At z = 0.3889 are broad hydrogen Balmer emission lines (Hα and Hβ)
and narrow emission lines ([O II], [O III], [Ne III], [Ne V]). At z = 0.3727 there
are further broad Balmer lines and Fe II emission, but no narrow lines. There are
also absorption lines at an intermediate redshift of z = 0.387 83 from the host
galaxy. This has been interpreted as a binary black hole in the quasar’s broad line
region.
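The ‘standard siren’ idea works because the leading-order strain from an inspiralling binary depends only on the chirp mass M_c = (m₁m₂)^{3/5}(m₁ + m₂)^{−1/5}, the gravitational wave frequency, and the luminosity distance: h ≈ (4/d_L)(GM_c/c²)^{5/3}(πf/c)^{2/3}. Since M_c and f are measured from the waveform itself, d_L follows from the amplitude. An order-of-magnitude sketch (angle-averaged geometric factors of order unity are omitted):

```python
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg
GPC = 3.086e25       # 1 Gpc in metres
PI = 3.141592653589793

def chirp_mass(m1, m2):
    """Chirp mass in kg, from component masses in kg."""
    return (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2

def strain(m1_msun, m2_msun, f_hz, d_gpc):
    """Leading-order inspiral strain amplitude (order-unity factors dropped)."""
    mc = chirp_mass(m1_msun * M_SUN, m2_msun * M_SUN)
    return (4 / (d_gpc * GPC)) * (G * mc / C**2) ** (5/3) * (PI * f_hz / C) ** (2/3)

# Two 1e6 Msun black holes at 1 Gpc, radiating at 1 mHz: h ~ 1e-17,
# comfortably above the LISA curve in Figure 6.18.
print(f"{strain(1e6, 1e6, 1e-3, 1.0):.1e}")
```

Inverting this relation for d_L, given the measured strain, chirp mass and frequency, is the essence of the siren distance measurement.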

Summary of Chapter 6
1. Non-rotating uncharged black holes are described by the Schwarzschild
metric, and their rotating counterparts by the Kerr metric.
2. The accretion efficiency of a black hole can be calculated by finding the
energy released from dropping from infinity to the radius of the smallest
stable circular orbit. For a Schwarzschild metric this is 6% of the rest mass
at infinity, while for a maximally-rotating Kerr metric it can reach 42%. This is the most efficient
conversion process known from mass-energy to luminosity, making black
holes prime candidates for powering the central engines of quasars.
3. The present-day contribution to Ωm from black holes can be estimated from
the source counts of quasars, using measurements of the average redshift as
a function of apparent quasar magnitude, combined with assumptions of the
bolometric correction and the accretion efficiency. This constraint is
independent of the Hubble parameter H0 and the cosmological density
parameters.
4. A similar constraint can be made using the hard X-ray background. The
background has a harder spectral shape (i.e. more output at higher energies)

than unobscured quasars or starburst galaxies, implying a population of
X-ray-obscured quasars. For the most part these can be matched to
(optically) type 2 active galaxies but there are exceptions, e.g. optically
type 1 active galaxies that are heavily X-ray-obscured, or optically type 2
active galaxies that have low obscuration to hard X-rays.
5. The present-day number density of black holes in active galaxies is at least
two orders of magnitude less than the above estimates, implying that most
black holes are dormant, residing in many otherwise normal local galaxies.
6. The centres of all galaxies appear to host supermassive black holes, detected
using the kinematics of stars and/or gas or megamasers, by resolving the
sphere of influence of the black hole (in which the effects of the black hole’s
gravity dominate the effects of the galaxy’s velocity dispersion). This region
is about five orders of magnitude larger than the Schwarzschild radius.
7. The masses of black holes in quasars can also be determined through
reverberation mapping.
8. The mass of the central supermassive black hole appears to correlate
strongly with the velocity dispersion in the surrounding bulge (for spiral
galaxies) or surrounding galaxy (for ellipticals), suggesting a close link
between the formation of the black hole and its surrounding galaxy, perhaps
through quasar feedback (injection of ionizing flux and/or kinetic energy
into the surrounding interstellar medium, affecting both star formation and
the amount of infalling gas to the black hole). Various attempts have been
made to find evolution in this relationship.
9. There is an additional population of ultraluminous X-ray sources in many
local galaxies outside their centres.
10. The inspiralling of black holes and neutron stars releases gravitational
waves. These have been inferred in binary pulsars (confirming the
predictions of general relativity). Gravitational waves from the merger of
black holes may be detectable with the next generation of gravitational wave
observatories.

Further reading
• John Michell’s 1767 paper ‘An inquiry into the probable parallax and
magnitude of the fixed stars’ is at
[Link] It includes a derivation of
integral source counts.
• John Michell’s 1784 paper is at
[Link]
• For more on black holes at this level, see Lambourne, R., 2010, Relativity,
Gravitation and Cosmology, Cambridge University Press.
• At the time of writing, some audio renderings of gravitational waves from
inspiralling black holes can be found at
[Link]

• Chongchitnan, S. and Efstathiou, G., 2006, ‘Prospects for direct detection of
primordial gravitational waves’, Physical Review D, 73, 3511; available at
[Link]
• One classic graduate-level book on general relativity is Misner, C.W., Thorne,
K.S. and Wheeler, J.A., 1970, Gravitation, W.H. Freeman.
• For more on the principle of least action (at an accessible level), see Chapter 19
of Feynman, R.P., Leighton, R.B. and Sands, M., 1964, The Feynman Lectures
on Physics, Vol. II, Addison-Wesley.
• For the history of black hole science, see Ferrarese, L. and Ford, H.C.,
astro-ph/0411247.
• For more on supermassive black holes (and difficulties if trying to avoid
making them), see Begelman, M.C., Blandford, R.D. and Rees, M.J., 1984,
‘The theory of extragalactic radio sources’, Reviews of Modern Physics, 56,
255–351; for more on alternatives to supermassive black holes, see also
Maoz, E., 1998, Astrophysical Journal Letters, 494, 181.
• Brandt, W.N. and Hasinger, G., 2005, ‘Deep extragalactic X-ray surveys’,
Annual Review of Astronomy and Astrophysics, 43, 827.
• Comastri, A., 2004, ‘Compton thick AGN: the dark side of the X-ray
background’, astro-ph/0403693.
• Elvis, M., 2006, ‘Quasar structure and cosmological feedback’,
astro-ph/0606100.
• Colpi, M. and Dotti, M., 2009, ‘Massive binary black holes in the cosmic
landscape’, Invited Review to appear in Advanced Science Letters, Special
Issue on Computational Astrophysics, edited by Lucio Mayer; available at
arXiv:0906.4339.
• For more details on accretion discs and relativistic beaming, see Kolb, U.,
2010, Extreme Environment Astrophysics, Cambridge University Press.
• For Eddington’s 1922 Royal Society article, see
[Link]

Chapter 7 Gravitational lensing
Do not Bodies act upon Light at a distance, and by their action bend its
Rays; and is not this action (caeteris paribus) strongest at the least distance?
Isaac Newton, Opticks

Introduction
Some of the most beautiful images in cosmology are found in gravitational
lensing. In these, we see the direct effect that matter has on the curvature of
spacetime around it. Most astronomy can investigate only luminous matter, but
this is one of the very few opportunities to infer much more. Gravitational lensing
effects are created only by the intervening matter distribution, regardless of
whether it’s luminous or dark, or in equilibrium or not. Lensing can’t distinguish
between these different sorts of intervening matter, but the positive side of this is
that we don’t miss anything.

7.1 Gravitational lens deflection


Einstein’s theory of general relativity predicts that all mass-energy generates
curvature in its surrounding spacetime. The deflection of starlight close to the Sun
(e.g. Figure 7.1) was a brilliant confirmation of Einstein’s theory (see the New
York Times report in Figure 7.2).
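That 1919 measurement can be reproduced with one line of arithmetic: general relativity predicts a deflection α = 4GM/(c²b) for a light ray passing a mass M at impact parameter b, twice the Newtonian estimate derived later in this section. For a ray grazing the solar limb:

```python
import math

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8            # speed of light, m/s
M_SUN = 1.989e30       # solar mass, kg
R_SUN = 6.96e8         # solar radius, m

# GR deflection for a light ray grazing the solar limb
alpha_rad = 4 * G * M_SUN / (C**2 * R_SUN)
alpha_arcsec = math.degrees(alpha_rad) * 3600
print(f"{alpha_arcsec:.2f} arcsec")   # the famous 1.75 arcsec
```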

source

deflector

observer

Figure 7.1 Schematic geometry of gravitational lensing. There can be multiple
lines of sight to the background source because of the foreground deflector.
However, gravitational lensing has been found where there are chance alignments
of background galaxies with foreground galaxies or clusters of galaxies.
Figure 7.3 shows the redshift z = 0.175 galaxy cluster Abell 2218, in which
Figure 7.2 Headlines on higher-redshift galaxies have been distorted into arcs by the curved spacetime
page 17 of the New York Times, around this Abell cluster. We’ll show that gravitational lensing conserves surface
10 November 1919. brightness (i.e. the flux per square degree on the sky), so stretching an image of a
background galaxy also magnifies the flux.


Figure 7.3 The galaxy cluster Abell 2218 observed with the Hubble Space
Telescope (HST) with the WFPC2 instrument. Note the background galaxies
distorted into arcs.

This magnification can also make galaxies appear to have extraordinary
luminosities. The Infrared Astronomical Satellite (IRAS) made a surprising
detection of a z = 2.286 galaxy, IRAS FSC 10214+4724, that appeared to
have a tremendous bolometric luminosity of 3 × 1014 L) (see, for example,
Figure 5.8 in Chapter 5). This led to a great deal of theoretical speculation about
the formation of galaxies, but it later transpired that the galaxy was gravitationally
lensed by a z = 0.9 interloper (see Figures 7.4 and 7.5). Nevertheless,
IRAS FSC 10214+4724 is still one of the most luminous galaxies in the
observable Universe, even correcting for lensing. The hunt is on for others like it.
Only a handful have been found so far.
How can we tell if an object is a multiply-imaged gravitational lens, rather than
some chance pairing of neighbouring objects or an arrangement of galaxies that is
curved by chance? In general one looks for a morphology consistent with lensing,
a redshift of the candidate background object, a candidate lens and a lens redshift
estimate that’s a lot lower than the background redshift (e.g. Figure 7.5). The
spectra (and variability where one can measure it) should be consistent between
candidate multiple images — or at least, any inconsistencies should be small
enough to be attributable to differential magnification or microlensing (of which
more later). Often, however, only some of these criteria can be met, because of
lack of data or difficulty in obtaining data.
Another possible test relies on lensing being purely geometrical. Light travels on
geodesics, regardless of the light’s wavelength. Therefore lensing must be
wavelength-independent or achromatic. Different images of the background
source must therefore have the same colours, i.e. the same spectra. If the light

Chapter 7 Gravitational lensing

from the background source is partially obscured by dust in the lensing galaxy, it
could give the appearance of different colours, but this achromaticity test can
sometimes be done at radio wavelengths where dust extinction has no measurable
effect. If the source has some variation in colour and has different magnifications
in different parts, then the multiple images could have different colours. One must
then carefully model the lens system to see if any observed achromaticity could be
due to differential magnification (e.g. Figure 7.4).


Figure 7.4 HST image of the hyperluminous galaxy IRAS FSC 10214+4724, taken at a wavelength of around
800 nm (just to the red of the visible range). The IRAS galaxy is the arc to the left, gravitationally lensed by the
foreground galaxy (marked as 2). There is a second image (‘counterimage’) of the IRAS galaxy, marked as 5.
Objects 2 and 5 have their central pixels boosted artificially in this image for clarity. The contours are HST data at
around 400 nm, which surprisingly failed to detect the counterimage; the slight shift in the 400 nm and 800 nm
images suggests some colour gradient in the IRAS galaxy and hence differential magnification.
How much does an object deflect light by gravitational lensing? We can make a
Newtonian prediction of the gravitational lens deflection angle by a mass M, by
treating an incoming photon as being a particle with initial velocity c, as shown in
Figure 7.6. In this Newtonian model, the deflection angle φ in radians will be
φ ≈ tan φ = vy/c, where vy is the y-axis velocity acquired by the photon as it
passes the Sun. We neglect any x-axis change since the imparted velocity will be
≪ c.
In Newtonian gravity, the photon will move with acceleration a = GM/r2 in the
direction towards the Sun. The vertical (y-axis) acceleration in Figure 7.6 will just
be ay = a cos θ = (GM/r2 ) cos θ. We can shortcut some tedious algebra
by using Kepler’s second law (i.e. the conservation of angular momentum):
r2 dθ/dt = constant. We’ll need the value of that constant, and another trick
helps: Kepler’s laws apply even if the mass M is limitingly small or even zero.
Therefore the constant must be bc (where b, known as the impact parameter, is
shown in Figure 7.6), because that would be the value of r2 dθ/dt at the point of
closest approach to the mass if the photon were not deflected.
Now imagine a short time interval dt. The change in y-axis velocity in that
time will be dvy = ay (t) dt, because ay = dvy /dt. But we can rearrange
r2 dθ/dt = bc to get dt = (r2 /bc) dθ. Putting this together, we find

0.4

0.3
flux/arbitrary units

0.2
Figure 7.5 Summed spectrum
of the two nearest foreground
objects that dominate the
0.1
gravitational lensing of the
redshift z = 2.286 galaxy
IRAS FSC 10214+4724. The
0
discontinuity is the 4000 Å
break (Section 4.4), redshifted to
5500 6000 6500 7000 7500 8000
about z = 0.9. This spectrum is
λ/Å very different to that of the
IRAS galaxy (Figure 5.8).

y
M
θ Figure 7.6 The
gravitational lensing
b deflection of a photon by a
ed path mass M . Radial distances r
deflect
are measured outwards from
undeflected path the mass M . The distance b
is sometimes known as the
x
impact parameter.
$$\mathrm{d}v_y = a_y(t)\,\mathrm{d}t = \frac{GM}{r^2}\cos\theta\,\mathrm{d}t = \frac{GM}{r^2}\,\frac{r^2}{bc}\cos\theta\,\mathrm{d}\theta = \frac{GM}{bc}\cos\theta\,\mathrm{d}\theta.$$
Integrating this from θ = −π/2 to +π/2, we find that
$$v_y = \frac{2GM}{bc},$$
so
$$\phi_{\rm Newtonian} \approx \frac{v_y}{c} = \frac{2GM}{bc^2}. \quad (7.1)$$
In the weak-field limit, the full general relativistic treatment turns out to be
exactly a factor of two greater:
$$\phi = \frac{4GM}{bc^2}. \quad (7.2)$$
Why exactly a factor of two? This is difficult to answer. As you saw in Chapter 6,
there is a similar conservation of angular momentum r2 dθ/dλ = constant, where

λ is a parameter measured along the path of the photon. (For a massive particle
we could use dλ = dτ , where τ is the proper time, but photons have ds = 0 so
dτ = 0.) Converting λ to coordinate time t involves a factor also involving
GM/c2 (due to the spacetime curvature), which leads ultimately to the larger
deflection angle. We’ll return to this in Section 7.5.
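As a quick numerical check of Equations 7.1 and 7.2 (this sketch is our illustration, not part of the original text; the solar constants are standard values), a light ray grazing the Sun's limb gives the classic deflection angles:

```python
import math

G = 6.674e-11        # gravitational constant / m^3 kg^-1 s^-2
c = 2.998e8          # speed of light / m s^-1
M_sun = 1.989e30     # solar mass / kg
R_sun = 6.957e8      # solar radius / m: the impact parameter b for a grazing ray
RAD_TO_ARCSEC = 180.0 / math.pi * 3600.0

def deflection_newtonian(M, b):
    """Equation 7.1: phi = 2GM / (b c^2), in radians."""
    return 2.0 * G * M / (b * c**2)

def deflection_gr(M, b):
    """Equation 7.2: phi = 4GM / (b c^2), in radians."""
    return 4.0 * G * M / (b * c**2)

print(deflection_newtonian(M_sun, R_sun) * RAD_TO_ARCSEC)  # ~0.88 arcsec
print(deflection_gr(M_sun, R_sun) * RAD_TO_ARCSEC)         # ~1.75 arcsec
```

The general relativistic value of about 1.75 arcseconds is the one famously confirmed by the 1919 eclipse expeditions (Figure 7.2).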

7.2 The lens equation


Gravitational lensing has a beguilingly simple geometry, shown in Figure 7.7. The
photons spend most of their time travelling between the background source and
the lens, or between lens and observer, but they spend very little of their time
being deflected. We can therefore use the ‘thin lens approximation’ and treat the
change of direction as instantaneous. The angles are all assumed to be small, so
we can write, for example, θ ≈ tan θ = ξ/DL.

Figure 7.7 The geometry of gravitational lensing of a source S by a lens L, with the angles that are discussed in the text labelled. The apparent positions of the source seen from the Earth are S1 and S2. (Light rays for S2 are not shown, for clarity.) The impact parameter is given the symbol ξ. DLS is the distance to the source as seen from the lens. Note that the distances DL, DS and DLS are all angular diameter distances, so DS ≠ DLS + DL.

But beware of a subtlety that traps the unwary. The vertical distances in Figure 7.7
are angular diameter distances (Chapter 1). For example, DS is the angular
diameter distance from the source to the observer, while DL is the angular
diameter distance from the lens to the observer. But DLS is the angular diameter
distance to the source as seen from the lens, so DS does not necessarily equal
DLS + DL!
● Do any cosmological distances add up, so that Earth-to-source equals
Earth-to-lens plus lens-to-source?
❍ Comoving distances add up in exactly this way.
If the lens, image, background source and the Earth are all in the same plane, as in
Figure 7.7, then
β = θ − α. (7.3)
(To show that this is true while avoiding the pitfall that DS ≠ DLS + DL, compare distances
along the top of Figure 7.7.) But what if they aren’t in the same plane? This could

happen if the lens is not symmetrical, for example. In this case we can treat the
angles as vectors on the sky, so
β = θ − α(θ). (7.4)

This is known as the lens equation and is the fundamental equation of


cosmological gravitational lensing. Note that we’ve written α as a function of θ,
which is also true in the scalar case. (A subtlety in Equation 7.3 is that α can be
negative, i.e. it’s not the modulus of α.)

Exercise 7.1 Derive a flat space expression for DLS involving the comoving
distances rL and rS (the comoving distances to the lens and source, respectively),
and the lens and source redshifts zL and zS .
Exercise 7.2 Write down a proof of Equation 7.4 by working with vectors on
the source plane, keeping in mind that DS ≠ DLS + DL. ■
So far we’ve not used any information on the lens mass distribution, or on how
much deflection that mass causes. Let’s see what happens for a point mass M .
Adapting Equation 7.2, a point mass M will cause a deflection of
$$\hat{\alpha} = \frac{4GM}{c^2\xi}. \quad (7.5)$$
(By symmetry, the light rays are all confined to a plane in this case, so we don’t
need to use vectors.) This deflection is related to the observed shift α by
$$\alpha = \frac{D_{\rm LS}}{D_{\rm S}}\,\hat{\alpha} \quad (7.6)$$
using Figure 7.7, so the lens will cause a visible deflection of
$$\alpha = \frac{D_{\rm LS}}{D_{\rm S}}\,\frac{4GM}{c^2\xi}. \quad (7.7)$$
We can rewrite the (scalar) lens equation as
$$\beta = \theta - \alpha = \theta - \frac{D_{\rm LS}}{D_{\rm S}}\,\frac{4GM}{c^2\xi},$$
and using θ = ξ/DL (Figure 7.7) we reach
$$\beta = \theta - \frac{D_{\rm LS}}{D_{\rm L}D_{\rm S}}\,\frac{4GM}{c^2\theta}. \quad (7.8)$$

Exercise 7.3 What if the background object is exactly behind the lens, so
β = 0? What will this look like? (Give this some thought before looking up the
answer!) ■
This angular size is often known as the Einstein radius and given the symbol θE:
$$\theta_{\rm E} = \sqrt{\frac{4GM}{c^2}\,\frac{D_{\rm LS}}{D_{\rm L}D_{\rm S}}}. \quad (7.9)$$
It depends only on the source redshift zS , the lens redshift zL and the lens
mass M . (Note that θE isn’t just a property of the lens, because it also depends on
the distance to the background source.) It’s an important quantity in gravitational
lensing in general.

When the source position β is around θE or less, the magnifications are typically
strong. Conversely, if β ≫ θE, then there is typically very little magnification.
We’ll show in Section 7.6 how θE can also be a boundary between having
multiple images and having only one image. Also, multiple images tend to have
separations of roughly 2θE , as we’ll show. Figure S7.1 from Exercise 7.3 is an
example of an Einstein ring.
Substituting in the numerical values and assuming a point mass, we obtain an
equation that's useful for cosmological lensing:
$$\frac{\theta_{\rm E}}{\text{arcseconds}} = \left(\frac{M}{10^{11.09}\,M_\odot}\right)^{1/2}\left(\frac{D_{\rm L}D_{\rm S}/D_{\rm LS}}{\text{Gpc}}\right)^{-1/2}. \quad (7.10)$$
Typically, galaxy–galaxy lensing gives Einstein radii of the order of an arcsecond,
while lensing by a galaxy cluster typically has θE about ten times bigger. At the
opposite size scale, gravitational microlensing (which we shall meet later in this
chapter) can be characterized with
$$\frac{\theta_{\rm E}}{\text{milliarcseconds}} = \left(\frac{M}{1.23\,M_\odot}\right)^{1/2}\left(\frac{D_{\rm L}D_{\rm S}/D_{\rm LS}}{10\ \text{kpc}}\right)^{-1/2}. \quad (7.11)$$
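The normalization of Equation 7.10 can be verified directly against Equation 7.9. In the sketch below (function names are ours; the constants are standard values), setting M = 10^11.09 M⊙ and DL DS/DLS = 1 Gpc should return very nearly one arcsecond:

```python
import math

G = 6.674e-11      # m^3 kg^-1 s^-2
c = 2.998e8        # m s^-1
M_SUN = 1.989e30   # kg
GPC = 3.086e25     # m
RAD_TO_ARCSEC = 180.0 / math.pi * 3600.0

def einstein_radius_rad(M_kg, D_eff_m):
    """Equation 7.9, with D_eff = D_L D_S / D_LS: theta_E = sqrt(4GM / (c^2 D_eff))."""
    return math.sqrt(4.0 * G * M_kg / (c**2 * D_eff_m))

# Normalization check of Equation 7.10: M = 10^11.09 solar masses and
# D_L D_S / D_LS = 1 Gpc should give very nearly one arcsecond.
theta_arcsec = einstein_radius_rad(10**11.09 * M_SUN, 1.0 * GPC) * RAD_TO_ARCSEC
print(theta_arcsec)  # ~1.00
```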
For our point mass lens, we can write Equation 7.8 as
$$\beta = \theta - \frac{\theta_{\rm E}^2}{\theta}. \quad (7.12)$$
This quadratic equation has the solution
$$\theta = \frac{1}{2}\left(\beta \pm \sqrt{\beta^2 + 4\theta_{\rm E}^2}\right), \quad (7.13)$$
giving two possible values for θ.

Exercise 7.4 Show that one value of θ in Equation 7.13 is always negative. Is
this a physical solution? If it is, then what does it correspond to? If it isn’t, then
why does it occur in this equation?
Exercise 7.5 In general, is there a unique image position θ for any given
source position β? Also, is the reverse true — is there a unique source position β
for every image position θ? ■
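Equation 7.13 is easy to verify numerically: substituting each root back into Equation 7.12 must recover the source position β. A minimal sketch, working in units of θE (function names are ours):

```python
import math

def image_positions(beta, theta_E=1.0):
    """Equation 7.13: the two image positions of a point-mass lens."""
    root = math.sqrt(beta**2 + 4.0 * theta_E**2)
    return 0.5 * (beta + root), 0.5 * (beta - root)

def lens_equation(theta, theta_E=1.0):
    """Equation 7.12: beta = theta - theta_E^2 / theta."""
    return theta - theta_E**2 / theta

beta = 0.7
theta_plus, theta_minus = image_positions(beta)
print(theta_plus, theta_minus)  # one image on each side of the lens
```

For β = 0 the two roots sit at ±θE, as expected for a source directly behind the lens.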

7.3 Magnification
Gravitational lensing magnifies not only the sizes of distant galaxies, but also their
fluxes. This is no coincidence: it turns out that surface brightness (flux per unit
area on the sky) is conserved in gravitational lensing. We’ve outlined a proof
briefly in the box below, but this is only in case you’re unsatisfied with having
surface brightness conservation unproven. We won’t use the proof later in the
book.

Why does lensing conserve surface brightness?


The key idea is the phase space density of photons. Phase space is an
imagined six-dimensional space that describes both spatial position and


momentum. Each photon has a position (x, y, z) and a momentum


(px , py , pz ), and we lump these together and treat any photon’s state as being
a point (x, y, z, px , py , pz ) in a six-dimensional space. Now, if we apply
a Lorentz transformation along the x-axis, the x-position gets Lorentz
contracted by a factor of γ, while the px -momentum is increased by the
same factor. Therefore the phase space density of photons is constant, i.e.
the number of photons per unit phase space volume is constant. This is
sometimes called Liouville’s theorem.
Next, we imagine that we have a telescope pointed along the z-axis. We put
a filter in the optics so that it receives photons only within an energy range
E → E + δE. Our telescope detector receives N photons from a patch of
sky with solid angle area ΔΩ (in, say, steradians or square degrees), and our
detector itself has an area A (in, say, square cm or square metres). Figure 7.8
shows this schematically.

[Figure: the photons received in a time δt occupy a 3-space volume Vs = A δt and a 3-momentum volume Vp = |p|² Δ|p| ΔΩ.]

Figure 7.8 The space volume and momentum space volume of photons
hitting the detector in a time δt. To make the figure clearer, we’ve flipped the
momentum diagram and shown the detector as an emitter instead of a
receiver.

In a short time δt the detector receives the photons from a volume Vs = A δt.
Those photons have energy E → E + δE, and since E = pc for photons,
their z-axis momenta are E/c → E/c + δE/c. Their momentum space
volume is Vp = (1/c³) E² δE ΔΩ (see Figure 7.8). Putting this together with
the spatial volume, we find that our N photons have phase space density
$$\rho_{\rm phase} = \frac{N}{V_{\rm s}V_{\rm p}} = \frac{Nc^3}{A\,\delta t\,E^2\,\delta E\,\Delta\Omega} = \frac{Nc^3}{h^3 A\,\delta t\,\nu^2\,\delta\nu\,\Delta\Omega} = \text{constant}, \quad (7.14)$$
where for the second step we used E = hν, with h being Planck's constant
and ν the frequency, and the last step is just stating Liouville's theorem.
The surface brightness is the amount of energy per unit area, per unit solid
angle, per unit frequency, per unit time:
$$I_\nu = \frac{Nh\nu}{A\,\delta t\,\delta\nu\,\Delta\Omega}. \quad (7.15)$$
Combining Equations 7.14 and 7.15, and using the photon phase space
density conservation, shows that Iν /ν 3 has to be constant. But the photons
have not gained or lost any energy by moving past the gravitational lens
(notwithstanding any Sunyaev–Zel’dovich effect), so the frequencies ν of
the photons are the same. Therefore the surface brightness Iν is the same.

So the flux of an object with a uniform surface brightness Iν and an area Ω on the
sky is Sν = Iν × Ω. Lensing increases the area to Ωlensed = µΩ (where µ is the
magnification factor), and Iν is the same, so the lensed flux is
Sν,lensed = Iν Ωlensed = Iν µΩ = µSν .
But hang on — doesn’t surface brightness conservation violate energy
conservation? We’ve conserved surface brightness and made the image bigger, so
where have the extra photons come from? In fact, it’s still consistent with energy
conservation. Part of the answer is that photons are being redirected, so in some
directions the background source could be demagnified. Another part of the
answer is that you must take account of the spatial curvature around the lens: the
photons from the background source are now being spread over slightly less than
4π steradians. There is still the same number of photons, but they’re being
distributed over slightly less space.
The magnification factor of an image is therefore equal to the factor increase
of the image's area on the sky. If the lens is circularly symmetric, then the
magnification is given by
$$\mu = \frac{\theta}{\beta}\,\frac{\mathrm{d}\theta}{\mathrm{d}\beta}, \quad (7.16)$$
where θ and β are as given in Figure 7.7.

Exercise 7.6 Show by differentiating Equation 7.12 (or otherwise) that lensing
by a point mass (a special case of circular symmetry) gives rise to a magnification
$$\mu = \left[1 - \left(\frac{\theta_{\rm E}}{\theta}\right)^4\right]^{-1}, \quad (7.17)$$
where, as we’ve seen, θ has two possible values for any source position β.
Exercise 7.7 If an image is within the Einstein radius, i.e. θ < θE , then the
magnification in Equation 7.17 is negative. Is this a physical solution? If it is,
what does this correspond to? If it isn’t, why does this occur in this equation?
(Hint: Why would µ in Equation 7.16 be negative?) ■
We can write the total magnification caused by a point mass as µ = |µ1| + |µ2|,
where µ1 and µ2 are the magnifications of each of the two images. After a little
algebra, it turns out that the total magnification of a point mass is
$$\mu = |\mu_1| + |\mu_2| = \frac{2 + (\beta/\theta_{\rm E})^2}{(\beta/\theta_{\rm E})\sqrt{(\beta/\theta_{\rm E})^2 + 4}}.$$

This has the remarkable property that it is always larger than 1, for any β or θE !
Again, doesn’t this violate energy conservation? Again, it doesn’t. Putting a point
mass lens into the Universe couldn’t change the number of photons that the
background source put out, but it would change the volume over which they are
distributed, because the point mass has a spatial curvature around it. Just like with
our discussion of the surface brightness conservation above, the same number of
photons is being distributed over slightly less than 4π steradians because of this
curvature, so if we compare a universe without the lens (more volume) to one with
the lens (less volume), it’s possible for the magnification always to be > 1.
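The claim that the total point-mass magnification always exceeds 1 can be tested by combining the image positions of Equation 7.13 with the per-image magnifications of Equation 7.17, and comparing with the closed form above. A sketch in units of θE (our own function names):

```python
import math

def point_mass_magnifications(beta, theta_E=1.0):
    """Per-image magnifications, combining Equations 7.13 and 7.17."""
    root = math.sqrt(beta**2 + 4.0 * theta_E**2)
    thetas = (0.5 * (beta + root), 0.5 * (beta - root))
    return [1.0 / (1.0 - (theta_E / t)**4) for t in thetas]

def total_magnification(beta, theta_E=1.0):
    """The closed-form total magnification of a point-mass lens."""
    u = beta / theta_E
    return (u**2 + 2.0) / (u * math.sqrt(u**2 + 4.0))

mu1, mu2 = point_mass_magnifications(1.0)
print(abs(mu1) + abs(mu2), total_magnification(1.0))  # both ~1.342
```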
Gravitational lens magnification has a curious effect on the source counts of
extragalactic objects. We can imagine putting a population of lenses between
ourselves and some extragalactic background objects. These lenses will give
each extragalactic background object a random magnification |µ|, which has
a probability distribution Pr(|µ|). Lenses are generally quite sparse on the
sky, so Pr(|µ|) will have a sharp peak close to |µ| = 1. (We'll ignore any
redshift-dependence of this probability for the purposes of demonstration.)
The magnification |µ| can be less than 1, in general, so some objects could be
demagnified, while some are boosted in flux.
The underlying magnification probability is Pr(|µ|), but the observed
magnification histogram could look very different. Imagine surveying the sky
for background objects with an observed flux of S0, and suppose that these
background objects have power-law source counts around S0, i.e. dN/dS ∝ S⁻ᵅ,
with α being some constant. There will be a few objects brighter than S0 that are
demagnified, so appear to have flux S0. However, there will be many more objects
fainter than S0, as shown in Figure 7.9, some of which have a high µ so appear to
have flux S0. The net effect is that high magnifications will be over-represented,
compared to what you'd expect from the shape of Pr(|µ|). The steeper the source
counts, i.e. the higher the value of α, the more high-magnification objects you'd
find. This is known as magnification bias and may be an important new way of
finding gravitational lenses, as we'll see in Section 7.11.

Figure 7.9 The magnification bias effect, illustrated as a plot of log10(dN/dS) against log10(S). At any flux S0, there are many objects fainter than S0, some of which will be magnified to the flux S0. There are far fewer objects brighter than S0, of which again some will be demagnified to S0. The asymmetry between the brighter and fainter populations changes the distribution of magnifications for objects with a fixed observed flux of S0. The steeper the source count slope, the more the observed magnification distribution is skewed towards higher magnifications.

If the lens does not have circular symmetry, the magnification calculation is a little
more complicated. The mapping from source position β = (βx, βy) to image
position θ = (θx, θy) is in general done with a matrix A: a small change in β,
dβ = (dβx, dβy), relates to dθ via dβ = A dθ, where
$$A = \frac{\partial\beta}{\partial\theta} = \begin{pmatrix} \partial\beta_x/\partial\theta_x & \partial\beta_x/\partial\theta_y \\ \partial\beta_y/\partial\theta_x & \partial\beta_y/\partial\theta_y \end{pmatrix}. \quad (7.18)$$
(Equation 7.16 is a special case of Equation 7.18 for circular symmetry.)
To calculate the magnification, we want to know how the background source area
(proportional to dβ²) relates to the observed image area (proportional to dθ²).
This comes out as
$$\frac{\mathrm{d}\theta^2}{\mathrm{d}\beta^2} = \frac{1}{\det A},$$
where det A means the determinant of the matrix A, sometimes written using
modulus signs:
$$\det A = \det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc. \quad (7.19)$$

For this reason the matrix A is sometimes called the inverse magnification
tensor. (If you need a reminder about what a tensor is, see the box below.) The
magnification tensor is M = A−1 , so 1/det A = det M .

What is a tensor?
Suppose that you have two springs connected to wires as shown in
Figure 7.10a. Both springs have the same spring constant k. What is the
force on the object of mass M in this figure? By Hooke’s law, the force from
each spring is proportional to the displacement, so we have
F = (Fx , Fy ) = (−kx, −ky) = −k(x, y) = −kr, (7.20)
where r is the displacement vector.

Figure 7.10 (a) A mass M pulled by two springs. The springs have
frictionless rings that slide along bars that follow the x- and y-axes. The
displacement vector r is also shown. The resulting force vector F is aligned
with r (though in the opposite direction). (b) Now the mass M is pulled by
one spring along the y-axis direction but two along the x-axis direction. The
resulting force vector F is no longer aligned with the displacement vector r.
Now let’s put a second spring on the x-axis, as shown in Figure 7.10b.
What’s the force now? The force F is in a different direction to the
displacement r, and we can’t pull out the factor of k as we did in
Equation 7.20. But we could write it as a matrix:
$$F = (F_x, F_y) = (-2kx, -ky) = -\begin{pmatrix} 2k & 0 \\ 0 & k \end{pmatrix}(x, y) = -\begin{pmatrix} 2k & 0 \\ 0 & k \end{pmatrix} r = -Kr,$$
where K could be called, say, the ‘spring tensor’ by analogy to the spring
constant. So we can think of this tensor as a matrix that operates on the
displacement vector r to give us the force vector F .
This is nearly sufficient to define this type of tensor, but not quite. Not every
matrix can be a tensor, because a tensor must obey certain transformation
rules. You may not have been aware of this, but the definition of a vector


includes the fact that it obeys the right transformation laws. In Galilean
relativity, any spatial three-vector must by definition obey the Galilean
transformation, so its length and direction are observer-independent. If they
aren’t, it’s not a vector. Similarly, in special relativity, a four-vector must by
definition obey the Lorentz transformation and have a Lorentz-invariant
‘length’ (such as the interval Δs in the case of the position four-vector). The
definition of a tensor is that it must obey similar transformation laws.
This takes us beyond the scope of this book, but it’s one of the key ideas
underpinning the beautiful theory of general relativity.
We’ve discussed only two-dimensional tensors here, but there can be
higher-order ones too, e.g. cubical arrays or hypercubes of numbers.

7.4 The singular isothermal sphere model


Obviously a point mass isn’t a good model for a galaxy lens. Can we come up
with something more realistic? Many galaxies are observed to have fairly flat
rotation curves, i.e. the one-dimensional velocity dispersion σv is independent of
or only weakly dependent on radius r from the centre over much of the radius.
One approach is to imagine that stars or other clumps of matter are like particles
in a gas. This ‘gas’ is imagined to obey an ideal gas law, p = ρkT /m, where ρ is
the density and m the typical mass of the star or clump. The temperature T is
related to the one-dimensional velocity dispersion of the stars or clumps σv by
mσv² = kT.
To solve this using the ideal gas law, we need to relate the density ρ and the
pressure p. Imagine a shell of thickness dr. It must have volume 4πr2 dr
and mass dM = 4πr2 ρ dr. The gravitational force on this shell must be
dF = −GM (r) dM/r2 , where M (r) is the mass enclosed by the radius r, and
the minus sign accounts for the direction. The pressure exerted by the shell will be
this force divided by the area, which is dF/(4πr2 ). Alternatively, we could think
of this as the pressure drop dp from going from r to r + dr:
$$\mathrm{d}p = \frac{\mathrm{d}F}{4\pi r^2} = \frac{-GM(r)\,\mathrm{d}M}{4\pi r^2\,r^2} = \frac{-GM(r)\,4\pi r^2\rho\,\mathrm{d}r}{4\pi r^2\,r^2} = \frac{-GM(r)}{r^2}\,\rho\,\mathrm{d}r,$$
so
$$\frac{1}{\rho}\,\frac{\mathrm{d}p}{\mathrm{d}r} = \frac{-GM(r)}{r^2}.$$
The solution of these equations turns out to be
$$\rho(r) = \frac{\sigma_v^2}{2\pi G}\,\frac{1}{r^2}. \quad (7.21)$$
In other words, ρ(r) ∝ r⁻², so M(r) must be ∝ r (because
M(r) = ∫ ρ(r) 4πr² dr = constant × ∫ dr). Therefore the circular velocity
of a star or clump in this galaxy would satisfy v²/r = GM(r)/r², i.e.
v² = GM(r)/r = constant = 2σv². Projecting along the line of sight, the
observed surface mass density Σ (we shall spare you the algebra) comes out as
$$\Sigma(\xi) = \frac{\sigma_v^2}{2G}\,\frac{1}{\xi}. \quad (7.22)$$

(The distance ξ was shown in Figure 7.7.) This mass distribution is known as the
singular isothermal sphere. You’ve seen already why this spherically-symmetric
distribution is isothermal. It’s called ‘singular’ because the mass density and
surface density tend to infinity as r and ξ respectively tend to zero. There are
various modifications that can be made to the model to avoid this singularity. The
total mass enclosed within a projected distance ξ is just
$$M(\xi) = \int_0^{\xi} \Sigma(\xi')\,2\pi\xi'\,\mathrm{d}\xi' = \frac{\pi\sigma_v^2}{G}\,\xi. \quad (7.23)$$
● Can the singular isothermal sphere model be extended to infinity?
❍ M (r) ∝ r, so the mass would tend to infinity. Therefore in practice this
model has to be truncated at some radius (typically > θE ) for it to be physical.
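Equation 7.23 can be confirmed by integrating the surface density of Equation 7.22 numerically; the sketch below uses an example velocity dispersion and radius of our own choosing. (Since Σ(ξ′) 2πξ′ is constant, the numerical integral also makes the M(ξ) ∝ ξ behaviour explicit.)

```python
import math

G = 6.674e-11    # m^3 kg^-1 s^-2
KPC = 3.086e19   # m

def sigma_surface(xi, sigma_v):
    """Equation 7.22: Sigma(xi) = (sigma_v^2 / 2G) (1 / xi)."""
    return sigma_v**2 / (2.0 * G * xi)

def mass_within(xi_max, sigma_v, n=10000):
    """Midpoint-rule integral of Sigma(xi') 2 pi xi' d xi' from 0 to xi_max."""
    h = xi_max / n
    return sum(sigma_surface((i + 0.5) * h, sigma_v) * 2.0 * math.pi * (i + 0.5) * h * h
               for i in range(n))

sigma_v = 200e3     # 200 km/s, an example velocity dispersion
xi = 10.0 * KPC     # an example projected radius
m_numeric = mass_within(xi, sigma_v)
m_analytic = math.pi * sigma_v**2 * xi / G   # Equation 7.23
print(m_numeric / m_analytic)                # ~1.0
print(m_analytic / 1.989e30)                 # ~2.9e11 solar masses
```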
What about gravitational lensing by a singular isothermal sphere? By Birkhoff's
theorem (Chapters 4 and 6), the deflection by any spherically-symmetric mass
distribution will depend only on the mass within the angular distance ξ, i.e. M(ξ):
$$\hat{\alpha} = \frac{4GM(\xi)}{c^2\xi}, \quad (7.24)$$
which comes out as
$$\hat{\alpha} = 4\pi\,\frac{\sigma_v^2}{c^2} \approx 1.4''\left(\frac{\sigma_v}{220\ {\rm km\,s^{-1}}}\right)^2$$
(compare Equation 7.5). Similarly, the Einstein radius is
$$\theta_{\rm E} = \sqrt{\frac{4GM(\theta_{\rm E})}{c^2}\,\frac{D_{\rm LS}}{D_{\rm L}D_{\rm S}}} \quad (7.25)$$
(compare Equation 7.9), so
$$\theta_{\rm E}^2 = \frac{4GM(\theta_{\rm E})}{c^2}\,\frac{D_{\rm LS}}{D_{\rm L}D_{\rm S}} = \frac{4GM(\theta_{\rm E})}{c^2}\,\frac{D_{\rm LS}}{D_{\rm S}}\,\frac{\theta_{\rm E}}{\xi} = \frac{4G}{c^2}\,\frac{\pi\sigma_v^2\xi}{G}\,\frac{D_{\rm LS}}{D_{\rm S}}\,\frac{\theta_{\rm E}}{\xi},$$
thus
$$\theta_{\rm E} = \frac{4\pi\sigma_v^2}{c^2}\,\frac{D_{\rm LS}}{D_{\rm S}}. \quad (7.26)$$
We can use the scalar lens equation because this lens is circularly symmetric:
β = θ − α (Eqn 7.3)
(see Figure 7.7). Remember that α can be positive or negative; for the singular
isothermal sphere the deflection has the constant magnitude θE (Equation 7.26),
whichever side of the lens the light passes. If β = 0, so that the source is directly
behind the lens, the images form a ring at θ = ±θE. The lens equation
is therefore
β = θ ± θE. (7.27)
If β > θE, this gives only one possible solution, θ = β + θE. However, if β < θE,
there is also a negative solution for θ, i.e. on the other side:
θ = β ± θE. (7.28)



Figure 7.11 Graphical representation of the gravitational lens solution for (a) a singular isothermal sphere in
Equation 7.27, (b) an isothermal sphere with a smoothed-out density profile in the core, sometimes called a
‘softened isothermal sphere’.

If we write θ± for the two images, then the magnifications from Equation 7.16
come out as
$$\mu_\pm = \frac{\theta_\pm}{\beta} = 1 \pm \frac{\theta_{\rm E}}{\beta} = \left(1 \mp \frac{\theta_{\rm E}}{\theta_\pm}\right)^{-1}. \quad (7.29)$$
Strictly speaking, there would be a third image at θ = 0, because a single photon
shot straight through the middle could not be deviated (by symmetry). However,
this can only come from a zero-sized point in the background source, and the flux
of this central image comes out at zero. There could be a non-zero central image
if the central density cusp in the singular isothermal sphere mass distribution is
smoothed out somehow, which must be the case if the density profile is physical.
An example is shown in Figure 7.11b. This faint central image then appears
even if β ≠ 0. The magnification of this image is typically |µ| < 1, i.e. it's
demagnified. In general, more complicated density profiles also have faint central
images that depend on the central mass distribution. One of the aims of the new
eMERLIN array of radio telescopes in the UK (see Figure 7.12) is to detect faint
central images in order to determine the density profiles at the centres of galaxies.

Exercise 7.8 Suppose that the lens is an infinite sheet of matter with a constant
surface density Σ. Show that the deflection angle α is given by
$$\alpha(\theta) = \frac{4\pi G\Sigma}{c^2}\,\frac{D_{\rm L}D_{\rm LS}}{D_{\rm S}}\,\theta.$$
Next suppose that Σ takes the critical value
$$\Sigma_{\rm cr} = \frac{c^2}{4\pi G}\,\frac{D_{\rm S}}{D_{\rm L}D_{\rm LS}}. \quad (7.30)$$
What will happen? Do gravitational lenses in general focus light? ■

Figure 7.12 The Lovell telescope at Jodrell Bank, near Manchester in England. This telescope is part of the eMERLIN array of radio telescopes.
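We leave Exercise 7.8 itself to the reader, but the size of the critical surface density in Equation 7.30 is easily evaluated; the angular diameter distances below are illustrative values of our own, not from the text:

```python
import math

G = 6.674e-11   # m^3 kg^-1 s^-2
c = 2.998e8     # m s^-1
GPC = 3.086e25  # m

def sigma_critical(d_l, d_s, d_ls):
    """Equation 7.30: Sigma_cr = (c^2 / 4 pi G) (D_S / (D_L D_LS)), in kg m^-2."""
    return c**2 * d_s / (4.0 * math.pi * G * d_l * d_ls)

# Illustrative angular diameter distances for a cosmological lens system.
sigma_cr = sigma_critical(1.0 * GPC, 2.0 * GPC, 1.2 * GPC)
print(sigma_cr)  # a few kilograms per square metre
```

A surface density of only a few kilograms per square metre, held over galaxy scales, is enough for strong cosmological lensing.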

7.5 Time delays and the Hubble parameter


Gravitational lenses also give us a beautiful geometric way of finding the Hubble
parameter. To see how, we’ll need to return to the deflection of light by a single

mass, which curiously turned out to be exactly a factor of two more than the
Newtonian prediction. We’ll shed a little more light on that here.
In Section 7.1 we found the Newtonian deflection as
$$\phi_{\rm Newtonian} = \frac{v_y}{c} = \int_{\theta=-\pi/2}^{+\pi/2} \frac{GM}{bc^2}\cos\theta\,\mathrm{d}\theta = \frac{1}{c}\int_{t=-\infty}^{\infty} \frac{GM}{r^2}\cos\theta\,\mathrm{d}t.$$
Now, (GM/r²) cos θ is the gradient of the gravitational potential, Φ = −GM/r,
in the y-axis direction in Figure 7.6. We can write this as ∇⊥Φ, where ⊥ refers to
differentiation being made along a direction perpendicular to the direction of
motion of the particle. (Again, we're treating this as effectively the same thing as
the y-axis direction, because the change in direction is small.) The deflection is
therefore
$$\phi_{\rm Newtonian} = \frac{1}{c}\int_{-\infty}^{\infty} \nabla_\perp\Phi\,\mathrm{d}t. \quad (7.31)$$
(Φ is negative, but that sign is absorbed into the definition of ∇⊥.) If we take
'perpendicular' to mean at right angles to the direction of motion, rather than
strictly parallel to the y-axis, then this equation is the correct general Newtonian
expression without approximations. Making the approximation that dt = dx/c,
we get
$$\phi_{\rm Newtonian} = \frac{1}{c^2}\int_{-\infty}^{\infty} \nabla_\perp\Phi\,\mathrm{d}x. \quad (7.32)$$
We could also choose to think of the light encountering an effective refractive
index n, which varies with position and so veers the light ray off course. The
analogous situation for a glass prism is shown schematically in Figure 7.13. Here
we have
$$\phi_{\rm Newtonian} = -\int \nabla_\perp n\,\mathrm{d}x,$$
so we can identify ∇⊥n = (1/c²) ∇⊥Φ in the Newtonian case.


Figure 7.13 An incoming light wavefront with speed c meets a prism with refractive index n and is deflected. The effective speed of light within the prism is c/n. The dashed lines mark lines of constant light-travel time.

The general relativistic equivalent can be found by making a weak-field
approximation to the Schwarzschild metric (Chapter 6), using the approximation
$$\left(1 - \frac{2GM}{c^2 r}\right)^{-1} = \left(1 + \frac{2\Phi}{c^2}\right)^{-1} \approx 1 - \frac{2\Phi}{c^2}$$
to give
$$\mathrm{d}s^2 = \left(1 + \frac{2\Phi}{c^2}\right)c^2\,\mathrm{d}t^2 - \left(1 - \frac{2\Phi}{c^2}\right)\mathrm{d}r^2 - r^2(\mathrm{d}\theta^2 + \sin^2\theta\,\mathrm{d}\phi^2). \quad (7.33)$$
A light ray has ds = 0, and a radial light ray will also have dθ = dφ = 0. In this
situation we have
$$\left(\frac{\mathrm{d}r}{\mathrm{d}t}\right)^2 = c^2\,\frac{1 + 2\Phi/c^2}{1 - 2\Phi/c^2}. \quad (7.34)$$
So the effective (radial) speed of light is
$$c\sqrt{\frac{1 + 2\Phi/c^2}{1 - 2\Phi/c^2}},$$

i.e. a bit less than c (remember that Φ is negative). We could again think of the
lens as having an effective refractive index
$$n = \sqrt{\frac{1 - 2\Phi/c^2}{1 + 2\Phi/c^2}} \approx 1 - \frac{2\Phi}{c^2}$$
(using the first terms in a Taylor series expansion). This time, however,
∇⊥n = (2/c²) ∇⊥Φ, explaining the extra factor of two back in Section 7.1.
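For a sense of scale, the sketch below (standard solar constants; the example is ours, not the book's) evaluates the effective index n ≈ 1 − 2Φ/c² at the solar surface. The deviation from unity is only a few parts per million:

```python
G = 6.674e-11     # m^3 kg^-1 s^-2
c = 2.998e8       # m s^-1
M_sun = 1.989e30  # kg
R_sun = 6.957e8   # m

def refractive_index(r, M):
    """Effective index n = 1 - 2 Phi / c^2, with Phi = -GM/r (weak-field limit)."""
    return 1.0 - 2.0 * (-G * M / r) / c**2

n = refractive_index(R_sun, M_sun)
print(n - 1.0)  # ~4.2e-6: a few parts per million
```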
A photon takes time (1/c) dℓ to travel a distance dℓ in empty flat space. If there is a
refractive index n, the time spent is (n/c) dℓ. Therefore putting a gravitational
lens in between the source and the observer will induce a total time delay of
$$\Delta t = \int_{\rm source}^{\rm observer} \frac{1}{c}\,\mathrm{d}\ell - \int_{\rm source}^{\rm observer} \frac{n}{c}\,\mathrm{d}\ell = \int_{\rm source}^{\rm observer} \frac{1-n}{c}\,\mathrm{d}\ell = \int_{\rm source}^{\rm observer} \frac{2\Phi}{c^3}\,\mathrm{d}\ell, \quad (7.35)$$
where the integrations are done over the light path from the source to the observer.
This is known as the Shapiro delay, after its discoverer. Two different images of
a background source would have two different path lengths and experience
different potentials, so in general we should expect there to be a relative time
delay between different images of a background source.
This leads to an ingenious method of finding the Hubble parameter H0. Most of
the lensing equations that we've derived up to now have been dimensionless. For
example, angles are dimensionless, and DLS/DS is dimensionless. Therefore
there's no way to use the lens configuration or arrangement of images to
determine the absolute size scale of the lens system (see, for example,
Figure 7.14). However, the Shapiro delay is proportional to the path length
from the source to the observer. Cosmological distances are proportional
to c/H0 (see Chapter 1), so the time delay between two images will be
Δt ∝ (1/H0) × a number that depends on the lens mass model. So if we can
find a mass model of the lens that reproduces the lens geometry (e.g. image
configurations, lens redshift and source redshift), we can predict the value of
H0 Δt; then by measuring Δt we can infer the Hubble parameter!

Figure 7.14 Schematic view of how the geometry of a gravitational lens depends on the
Hubble parameter H0. The lens is marked as L, while the source and observer are S and O,
respectively. It's not possible to tell from the positions of images alone what the absolute size
scale of the system is, but the time delay between two different images can give an absolute
scale and hence H0.
This has been done in several lenses, such as the quasar QSO 0957+561. The
main uncertainty in this experiment is the mass model. (This uncertainty is much
larger than the effect that varying ΩΛ or Ωm would have on the lens geometry.)
Also, the time delay itself can sometimes be hard to discern from the data. A
recent compilation of time delays from 10 different gravitational lens systems
found an average Hubble parameter of H0 = 72 (+8, −11) km s⁻¹ Mpc⁻¹
(Saha, P. et al., 2006, Astrophysical Journal Letters, 650, L15).
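To make the logic concrete, here is a minimal numerical sketch. The model-predicted product H0 Δt and the measured delay below are invented round numbers for illustration, not results for any real lens:

```python
# Illustrative only: the model prediction and measured delay are made up.
SEC_PER_YEAR = 3.156e7
KM_PER_MPC = 3.086e19

h0_times_dt = 8.6e-11              # lens-model prediction of H0*dt (dimensionless,
                                   # with H0 in 1/s and dt in s); assumed value
dt_measured = 1.2 * SEC_PER_YEAR   # measured delay between images: 1.2 yr

h0_per_sec = h0_times_dt / dt_measured   # H0 in 1/s
h0 = h0_per_sec * KM_PER_MPC             # convert to km/s/Mpc
print(f"H0 = {h0:.1f} km/s/Mpc")         # ~70 km/s/Mpc for these numbers
```

The point of the sketch is that the model supplies the dimensionless product, the monitoring campaign supplies Δt, and the ratio fixes the absolute scale.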

7.6 Caustics and multiple images


There are many varied and beautiful patterns in gravitational lensing. To
understand them, we’ll use Fermat’s principle, and that’s most easily done if we
slightly reformulate and simplify the equations that we’ve found so far.

Chapter 7 Gravitational lensing

It follows from Section 7.5 that the deflection from a gravitational lens is

φ = (2/c²) ∫ ∇⊥Φ dx,    (7.36)

where the integral runs along the line of sight from −∞ to +∞, and Φ is the
Newtonian potential. This deflection is also the angle α̂ in
Figure 7.7. The observed deflection α will therefore be

α = (2/c²) (DLS/DS) ∫ ∇⊥Φ dx,    (7.37)

where we have switched to the more general vector notation. The lens equation is
therefore

β = θ − α = θ − (2/c²) (DLS/DS) ∫ ∇⊥Φ dx.    (7.38)
We could rewrite this in a simpler-looking form as

β = θ − ∇θ ψ    (7.39)

if we can find a suitable new function ψ. Here ∇θ means derivatives with respect
to θ, i.e.

∇θ = (∂/∂θx, ∂/∂θy).    (7.40)

(Note that in Equation 7.40 we're not equating two numbers or variables, but rather
two operators. This is a subtle but radical change in the use of the = sign.)

The simplest choice of ψ that works is

ψ(θ) = (DLS/(DL DS)) (2/c²) ∫ Φ dx.    (7.41)

This is sometimes called the scaled projected Newtonian potential. It's related to
the deflection angle α through

∇θ ψ = α.    (7.42)
We can then rewrite the lens equation as

0 = θ − β − ∇θ ψ = ∇θ [½(θ − β)² − ψ].    (7.43)

To see what the term in square brackets means, here is the corresponding equation
for the time delay:

Δt(θ) = [(1 + zL)/c] (DL DS/DLS) [½(θ − β)² − ψ] = Δtgeom + Δtgrav,    (7.44)
where zL is the redshift of the lens. We won’t prove this directly (it would take us
too far off-topic); instead, we’ll point out some general features. The two terms in
the square brackets correspond to a gravitational Shapiro time delay (Δtgrav )
involving the projected potential ψ, and a geometrical time delay (Δtgeom )
involving the angular offset between β and θ. The geometrical term is caused by
the fact that the light ray is simply travelling further in getting around the lens.
The factor of (1 + zL ) is necessary because a time delay of Δt as the light passes
the lens will be time dilated by an additional factor of (1 + zL ) by the time it’s
received on the Earth.
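Equation 7.44 is easy to evaluate for order-of-magnitude purposes. A sketch with assumed round numbers (a lens at zL = 0.5, angular diameter distances of order a Gpc, and an image offset θ − β of 1 arcsecond; the ψ term is dropped, so this is the geometric part only):

```python
import math

C = 3.0e8      # speed of light, m/s
MPC = 3.086e22 # metres per Mpc

# Assumed round numbers for a typical galaxy-scale lens:
z_lens = 0.5
d_l, d_s, d_ls = 1000 * MPC, 2000 * MPC, 1500 * MPC  # angular diameter distances
theta_offset = (1.0 / 3600) * math.pi / 180          # 1 arcsec in radians

# Geometric part of Equation 7.44 (the psi term is neglected here)
dt_geom = (1 + z_lens) / C * (d_l * d_s / d_ls) * 0.5 * theta_offset**2
print(f"geometric delay ~ {dt_geom / 86400:.0f} days")
```

For these numbers the delay comes out at a few weeks, which is why monitoring campaigns for lensed quasars typically span months to years.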
Together, Equations 7.43 and 7.44 imply that ∇θ Δt(θ) = 0. This means that we
find images at stationary points in the time delay. This is a cosmological version
of Fermat’s principle. We’ll see in Section 7.9 that this projected potential ψ can
also be related to the projected mass density Σ.

The time delay is sometimes called the time delay surface since it varies in
general with both θx and θy on the sky. Images will form at the stationary
points of this surface (minima, maxima, saddle points and points of inflection).
However, if the lens is circularly symmetric, we need to consider only one axis.
Figure 7.15 shows how the two components of the time delay vary with position
for a particular circularly-symmetric lens. Note the images at the three stationary
points in the time delay. If we move the position of the background source, the
geometric time delay component moves (see Figure 7.16), which changes the
shape of the total time delay.

Figure 7.15 The geometric (tgeom) and gravitational (tgrav) time delays, and their total
(ttotal), for a particular circularly-symmetric lens, plotted against angular position θ. The
position of the source is marked as β, while the gravitational component peaks at the
centre of the lens (marked with a dotted line). There are three images marked as black dots
that occur at stationary points in the total time delay curve.

Figure 7.16 The variation of the total time delay and the positions of the images, as the
position of the background source is changed. The lens is closely aligned with the background
source in the top panel, offset in the central panel, and offset by more in the bottom panel.
Note how the leftmost image merges with the central image, and the combined image disappears.
Notice in Figure 7.16 how the image in the left-hand minimum point merges with
the image at the maximum, then they vanish. Images can only be created and
destroyed in pairs, because creating a new minimum means that we must also
create a new maximum. Therefore, provided that the lens is non-singular, there
must always be an odd number of images (sometimes called the odd-number
theorem). This is also true in the general non-circularly-symmetric case.
Another nice feature of these time delay curves is that the time delay between two
images is the vertical distance between them in these plots. In Figure 7.15, for
example, the image furthest from the lens will vary first. This is often the case in
cosmological lens configurations.
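The behaviour in Figures 7.15 and 7.16 can be reproduced numerically: sample the total delay ½(θ − β)² − ψ(θ) on a grid and find sign changes in its derivative. A sketch assuming an illustrative softened-isothermal-style potential ψ(θ) = √(θ² + θc²), in units where the Einstein radius is 1 (this particular ψ is my own demonstration choice, not a model from the text):

```python
import numpy as np

def time_delay(theta, beta, theta_c=0.1):
    # Dimensionless delay: geometric term minus an assumed softened
    # isothermal projected potential psi = sqrt(theta^2 + theta_c^2).
    return 0.5 * (theta - beta) ** 2 - np.sqrt(theta ** 2 + theta_c ** 2)

def image_positions(beta, theta_c=0.1):
    # Images sit at stationary points of the delay curve, located here
    # as sign changes of the numerical derivative d(delay)/d(theta).
    theta = np.linspace(-4.0, 4.0, 40001)
    dt = np.gradient(time_delay(theta, beta, theta_c), theta)
    flips = np.where(np.sign(dt[:-1]) != np.sign(dt[1:]))[0]
    return theta[flips]

print(len(image_positions(0.2)))  # well-aligned source: 3 images
print(len(image_positions(3.0)))  # poorly-aligned source: 1 image
```

Moving β away from zero reproduces the image merging and disappearance described above: the count drops from three to one as the source moves off-axis.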

Exercise 7.9 Classify each of the images in Figure 7.15 as a maximum, a
minimum or a saddle point. (Hint: Don't forget the axis coming out of the paper.)

Exercise 7.10 Suppose that you have a softened isothermal sphere potential,
like the one in Figure 7.11b, and you gradually let the potential in the centre get
deeper, so it looks more and more like the singular isothermal sphere model in
Figure 7.11a. What happens to the time delay of an image seen right through the
centre? And where does the image go when the lens potential becomes exactly a
singular isothermal sphere? ■
The images that form at maxima, minima and saddle points are each quite
different in character. How can we find whether images are minima or maxima?
In one-dimensional calculus, a function y(x) with a stationary point at x = x0 has
dy(x0 )/dx = 0. This point is a minimum if d2 y/dx2 > 0 there, a maximum if
d2 y/dx2 < 0, and a point of inflection if d2 y/dx2 = 0. The two-dimensional
equivalent is to consider the matrix

T = [ d²t/dθx dθx   d²t/dθx dθy ]
    [ d²t/dθy dθx   d²t/dθy dθy ].    (7.45)
The criteria are more complicated than in the one-dimensional case. They rely on
the determinant and the trace of the matrix. We defined the determinant of a 2 × 2
matrix in Equation 7.19, while the trace of a 2 × 2 matrix is defined as

tr A = tr [ a  b ] = a + d.    (7.46)
          [ c  d ]
The criteria are given in Table 7.1.
We have already met something like the T matrix in a different form. If we
differentiate Equation 7.44 twice, we find that

T ∝ [ 1  0 ] − [ d²ψ/dθx dθx   d²ψ/dθx dθy ]
    [ 0  1 ]   [ d²ψ/dθy dθx   d²ψ/dθy dθy ].    (7.47)
Back in Section 7.3 we met the inverse magnification tensor, which we defined
as A = ∂β/∂θ (Equation 7.18). If we use the lens equation to expand this
(Equation 7.4, β = θ − α), we find that

A = [ ∂βx/∂θx   ∂βx/∂θy ]
    [ ∂βy/∂θx   ∂βy/∂θy ]

  = [ ∂(θx − αx)/∂θx   ∂(θx − αx)/∂θy ]
    [ ∂(θy − αy)/∂θx   ∂(θy − αy)/∂θy ]

  = [ 1  0 ] − [ ∂αx/∂θx   ∂αx/∂θy ]
    [ 0  1 ]   [ ∂αy/∂θx   ∂αy/∂θy ]

  = [ 1  0 ] − [ d²ψ/dθx dθx   d²ψ/dθx dθy ]
    [ 0  1 ]   [ d²ψ/dθy dθx   d²ψ/dθy dθy ],    (7.48)
where we’ve used α = ∇θ ψ in the last step. Therefore the matrix T is just
proportional to the inverse magnification tensor A.
One consequence of T ∝ A = M −1 is that we can immediately say what the
magnifications of the different types of images are, because µ = 1/det A. These
magnifications are listed in Table 7.1. The saddle point images also have the
curious property of having negative parity, i.e. being mirror-reversed.
Another consequence of T ∝ A = M −1 is that the curvature of the time delay
surface is proportional to inverse magnification, so if the surface is more curved,
the image is less magnified.

Table 7.1 The types and properties of gravitational lens images, and how to
identify them from the matrix A or T. (We show in the text that T is proportional
to A.)

t(θ) shape       Local minimum   Saddle point   Local maximum
Determinant      det A > 0       det A < 0      det A > 0
Trace            tr A > 0        Anything       tr A < 0
Magnification    µ > 0           µ < 0          µ > 0
Parity           +               −              +
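Table 7.1 translates directly into a small classifier; a sketch (the function name is my own):

```python
def classify_image(A):
    """Classify a lens image from its 2x2 inverse magnification matrix A
    (equivalently T, since T is proportional to A), following Table 7.1."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    trace = A[0][0] + A[1][1]
    if det < 0:
        return "saddle point"  # negative parity, mu < 0
    return "minimum" if trace > 0 else "maximum"

# An unlensed patch of sky has A = I: det = 1 > 0, tr = 2 > 0
print(classify_image([[1.0, 0.0], [0.0, 1.0]]))  # minimum
```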

● When two images merge, what happens to the magnification?


❍ The curvature would have to be low in that region (see, for example,
Figure 7.16), so the magnification would be high.
We therefore expect that images that are close to each other on the sky would tend
to have high magnifications.
The positions on the sky where images merge are known as critical lines.
The corresponding background source positions are known as caustics. In
gravitational lensing we tend to refer to the image plane (which is what we see)
and the source plane (which is what’s going in the background plane of the
source). Figure 7.17 shows an example of a source being moved around in the
source plane, and the resulting effects in the image plane.

Figure 7.17 The predicted effect of moving a circular background object
through the lens caustics (left figures) caused by a simulated elliptical galaxy lens.
The images merge at the corresponding critical curves (right figures). The outer
caustics and critical curves mark the boundary between one image and three
images. The inner caustics and critical curves mark the boundary between three
and five images.


7.7 Other lens models


The singular isothermal sphere model is not the only lens model in wide
circulation; we’ll describe some alternatives briefly here. Though we’ll only use
them briefly in this book, they are widely used in the lensing community so this is
terminology with which you should be familiar.
The Navarro–Frenk–White model¹⁴ is based on predictions from N-body
simulations of dark matter haloes (Chapter 4). A generalization of this model is

ρ(r) = ρ0 / [ (r/r0)^α (1 + r/r0)^(3−α) ],    (7.49)

where ρ0 and r0 are constants, and α = 1 for the Navarro–Frenk–White profile.


The best expression for describing galaxy and cluster haloes is still a matter of
debate. For example, other groups¹⁵ have found that α = 1.5 provides a better fit
to their own N-body simulations. If dark matter is self-interacting, this would
produce a shallower density profile and reduce the density of the central cusp
(Chapter 4).
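As a quick numerical illustration of Equation 7.49 (ρ0 = r0 = 1 is an arbitrary choice of units here), the inner slope steepens from r⁻¹ to r⁻¹·⁵ as α goes from 1 to 1.5, while both profiles fall off as r⁻³ far outside r0:

```python
def rho(r, alpha=1.0, rho0=1.0, r0=1.0):
    """Generalized NFW density profile of Equation 7.49 (arbitrary units)."""
    x = r / r0
    return rho0 / (x ** alpha * (1.0 + x) ** (3.0 - alpha))

# Deep inside r0 the density scales roughly as r^-alpha, so halving the
# radius multiplies the density by about 2^alpha:
print(rho(0.001, alpha=1.0) / rho(0.002, alpha=1.0))  # ~2
print(rho(0.001, alpha=1.5) / rho(0.002, alpha=1.5))  # ~2.83
```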
Elliptical galaxies can also be modelled by a generalization of the isothermal
sphere:

Σ(θ1, θ2) = Σ0 / √[θc² + (1 − ε)θ1² + (1 + ε)θ2²],    (7.50)

where Σ0 is a constant, θ1 and θ2 are angular positions along the major and minor
axes, ε is the ellipticity, and θc is a core radius. Setting θc = 0 and ε = 0 reduces
this to the singular isothermal sphere. The effect of setting θc ≠ 0 is to smooth out
the density spike in the centre. Alternatively, the Blandford and Kochanek
elliptical density profile¹⁶ is often used. For their 'isothermal' lens this is

ψ(θ1, θ2) = 4π (DLS/DS) (σv²/c²) [θc² + (1 − ε)θ1² + (1 + ε)θ2²]^(1/2),    (7.51)

where σv is the isothermal velocity dispersion. (Non-isothermal lenses
have the term in square brackets raised to a positive power less than 1/2,
and have the normalization expressed as a different constant.) Unlike the
Navarro–Frenk–White profile and its variants, this functional form is motivated by
simplicity of calculation for gravitational lensing, though when ε is small it turns
out that it nevertheless is a reasonable approximation (for most uses) to the
isothermal ellipsoid. The demonstration of caustics in Figure 7.17 was made
using a Blandford and Kochanek profile with θE = 1″, θc = 0.05″ and ε = 0.2.
Some amount of external shear is always present in gravitational lensing, so one
cannot rely on circular symmetry in modelling gravitational lenses. In individual
lens models this is sometimes characterized as an additional potential of

ψ(θ1, θ2) = (κ/2)(θ1² + θ2²) + (γ/2)(θ1² − θ2²);

the convergence κ and shear γ will be described in more detail in Section 7.9.
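Differentiating this additional potential gives the extra deflection it contributes, ∇θψ = ((κ + γ)θ1, (κ − γ)θ2). A sketch that checks the analytic gradient against a finite-difference one (the κ and γ values are arbitrary illustrative choices):

```python
def psi(t1, t2, kappa=0.1, gamma=0.05):
    """External convergence + shear potential."""
    return 0.5 * kappa * (t1**2 + t2**2) + 0.5 * gamma * (t1**2 - t2**2)

def deflection(t1, t2, kappa=0.1, gamma=0.05):
    """Analytic gradient of psi: ((kappa+gamma)*t1, (kappa-gamma)*t2)."""
    return ((kappa + gamma) * t1, (kappa - gamma) * t2)

# Finite-difference check at an arbitrary point
h, t1, t2 = 1e-6, 0.7, -0.3
num = ((psi(t1 + h, t2) - psi(t1 - h, t2)) / (2 * h),
       (psi(t1, t2 + h) - psi(t1, t2 - h)) / (2 * h))
print(deflection(t1, t2), num)  # the two pairs agree closely
```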

14. Navarro, J.F., Frenk, C.S. and White, S.D., 1996, Astrophysical Journal, 462, 563.
15. Moore, B. et al., 1999, Monthly Notices of the Royal Astronomical Society, 310, 1147.
16. Blandford, R.D. and Kochanek, C.S., 1987, Astrophysical Journal, 321, 658.

7.8 Microlensing
In 1936 Einstein published a short note (Einstein, A., 1936, Science, 84, 506)
about the gravitational amplification that would occur if two stars happen to
appear very close in projection on the sky, which has since been called
microlensing, for reasons that will become clear. He wrote: 'there is no great
chance of observing this phenomenon, even if dazzling by the light of the much
nearer star . . . is disregarded.' He published this paper after being encouraged to
investigate the effect by an amateur named Rudi Mandl (though unknown to both,
Eddington and Chwolson had each published little-known papers on related
effects). Einstein also wrote a private note to the journal editor saying: 'Let me
also thank you for your cooperation with the little publication, which Mister
Mandl squeezed out of me. It is of little value, but it makes the poor guy happy.'
(For more on this story, see Renn, J., Sauer, T. and Stachel, J., 1997, Science,
275, 5297.) Einstein reckoned without the tremendous advances in optical
imaging technology that have happened in the past few decades.
We can get a rough idea of the probability of one star gravitationally lensing
another from the Einstein radius. We found this for star–star lensing in
Equation 7.11, with the result that it would be typically measured in
milliarcseconds (10⁻³ of an arcsecond, which itself is 1/3600th of a degree). The
number of stars per unit area on the sky varies, with higher densities closer to the
Galactic plane. In crowded fields (for example, towards the Galactic bulge), it
turns out that we'd expect of the order of one faint foreground star per square
arcsecond. It may be too faint to detect on its own, but it might nevertheless be a
potential lens. The probability of this foreground star lensing a background one
would be of the order of θE² ρ, where ρ is the number of potential lenses per unit
area on the sky, which comes out around 10⁻⁶. So, to detect this type of lensing,
one would need to monitor millions of stars simultaneously. (A more careful
calculation takes into account the fact that lenses close to the source or close to
the Earth have smaller θE than ones more centrally placed; see, for example,
Griest et al., 1991, Astrophysical Journal Letters, 372, L79.)
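That order-of-magnitude estimate in a couple of lines (the 1 mas Einstein radius and the one-lens-per-square-arcsecond density are the rough figures quoted above):

```python
theta_e = 1e-3  # Einstein radius in arcsec (roughly, for star-star lensing)
rho = 1.0       # potential lenses per square arcsec (crowded field)

prob = theta_e**2 * rho  # dimensionless lensing probability (optical depth)
print(f"lensing probability ~ {prob:.0e}")  # ~ 1e-06
```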
In Einstein's time, wide-field optical astronomy could be done only with
photographic plates. Wide-field CCD arrays have now made microlensing
searches possible. Figure 7.18 shows one of the first discoveries of gravitational
microlensing, made with a long-term monitoring campaign of the Large
Magellanic Cloud. As the foreground lens passes in front of the background star,
the background star is gravitationally lensed and magnified. Note the similar
profiles in the red and blue filters: achromaticity is an important test that it is
gravitational lensing, and not some unknown type of variable star.
The original aim of microlensing searches was to detect clumps of dark matter,
which were given the acronym MACHOs (massive compact halo objects). These
clumps could be black holes, clumps of non-baryonic elementary particles, or
dark baryonic matter such as planetary-sized objects or cometary nuclei such as
are found in the Oort cloud of our Solar System. The team that made the early
detection in Figure 7.18 also named their survey 'The MACHO Project'. For this
experiment one wants to avoid star–star lensing, so surveys for MACHOs have
been done outside the Galactic plane, e.g. towards the Large Magellanic Cloud.
Microlensing events are rarer outside the plane of the Galaxy. The initial results
suggested a large population of ∼0.5 M☉ lenses in the Galactic halo, but with
larger surveys the current best limit is that < 8% of the dark matter halo of the
Galaxy is made up of compact objects (see, for example, Tisserand et al., 2007,
Astronomy and Astrophysics, 469, 387).

Figure 7.18 The light curve of one of the first observations of gravitational
microlensing events, also showing the best fit to the data. The amplifications in
the blue and red filters, Ablue and Ared, and their ratio Ared/Ablue are plotted
against days from 2 Jan 1992. The best-fit maximum magnification and timescale
are quoted in the figure (Amax = 6.86 and t̂ = 33.9 days for the fit shown). Note
that the amplification is achromatic, as expected for lensing.

● Could all of the dark matter in the Universe be clumps of baryonic matter,
like free-floating Jupiters?
❍ No, because this would violate the Big Bang nucleosynthesis constraint on Ωb
(Chapter 2).
To describe gravitational microlensing, one ideally takes into account the finite
source size and limb darkening (stars not being uniformly bright circles), but a
good approximation is a point mass lens magnification (Equation 7.17). The
distances in this case are not cosmological, so we can just use Euclidean distances
in which DLS does equal DS − DL . Despite the fact that the lens is moving across
our line of sight to a background star, mathematical descriptions of microlensing
are simplest from the lens’s point of view, in which the lens is stationary but the
background source is moving. This is shown schematically in Figure 7.19.


Pythagoras's theorem gives us the lens–source distance as a function of time:

β/θE = √[ (b/θE)² + (v/θE)² (t − t0)² ],    (7.52)

where b is the impact parameter on the sky, t0 is the time when the source
appears closest to the lens, and v is an angular speed (measured, for example, in
microarcseconds per day).

Figure 7.19 Schematic view of a microlensing event, seen from the point of view of the
lens. The background source passes through the Einstein radius of the lens θE, and
its closest approach is an angular separation of b. The source's angular speed is v. In
practice it's the lens that moves across our line of sight to the background source, but it's
sometimes easier to visualize from the lens's point of view. As this is the source plane,
the position in this figure corresponds to the angle β in Figure 7.7.

Plugging this expression for β/θE into Equation 7.17
gives us a rather messy expression for the magnification as a function of time. We
won't write this out in full, but it's worth noticing what it depends on and what it
doesn't. If we have some microlensing data like those in Figure 7.18, we would
use the expression for the magnification as a function of time, and we would vary
b/θE, v/θE and t0 to find the best fit to the data. The time t0 just gives us the
time of closest approach, which is the time when the curve peaks. The overall
normalization of the light curve will depend on b/θE, while the width of the curve
depends on v/θE. We can therefore find the parameters b/θE and v/θE, but not b
or v on their own. There's no way of using a point mass lens microlensing light
curve on its own (e.g. Figure 7.18) to find θE. Therefore we can't use the light
curves to find out how far away the lenses are, how fast they're moving, or how
massive they are.

One solution to this problem is to assume that the lens has a typical transverse
velocity within the galaxy of around 200 km s⁻¹, which is typical of stars in
the Galaxy. With a large enough sample of microlensing events, plus some
assumptions about the spatial distribution of lenses, one could infer a lens mass
distribution. This statistical method was the original approach of the MACHO
project team.
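The 'messy expression' is painless to evaluate numerically. A sketch, assuming the standard point-mass magnification A = (u² + 2)/(u√(u² + 4)) for a source at dimensionless separation u = β/θE (the Equation 7.17 result), with illustrative parameters that are not a fit to any real event:

```python
import math

def magnification(u):
    """Point-mass lens magnification for a source at separation u = beta/theta_E."""
    return (u * u + 2.0) / (u * math.sqrt(u * u + 4.0))

def light_curve(t, b_over_te, v_over_te, t0):
    """Magnification vs time, combining Equations 7.52 and 7.17."""
    u = math.sqrt(b_over_te**2 + (v_over_te * (t - t0))**2)
    return magnification(u)

# Illustrative parameters only:
b_over_te, v_over_te, t0 = 0.3, 0.05, 430.0  # v in Einstein radii per day
for t in (400.0, 430.0, 460.0):
    print(t, round(light_curve(t, b_over_te, v_over_te, t0), 2))
```

Note that only the dimensionless combinations b/θE and v/θE appear: exactly as the text says, the curve shape alone cannot deliver θE.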
Another approach is to use parallax. If a microlensing event takes long enough,
there could be a measurable change in the lens geometry caused by the Earth's
motion around the Sun. These events are rare, because most microlensing events
last tens of days, not hundreds. (To calculate a typical duration of a microlensing
event, you would assume a lens distance DL, use it to convert the Einstein radius
to a physical distance ξE = θE DL, then estimate how long it would take a star to
cross it at the typical Galactocentric speed.) It turns out that the parallax supplies
enough extra information to derive θE and find the lens mass and distance (see,
for example, Alcock et al., 1995, Astrophysical Journal, 454, 125).

Alternatively, the background object could be a binary; indeed, about half of the
stars in the Galaxy are in binary systems. This superimposes a slight periodic
variation on the light curve signal, which depends on the stars' orbital distance in
units of θE. If the orbital parameters of the binary can be determined by other
means, θE can be inferred. This is, in some sense, an inverse to the parallax effect
on microlensing. It has sometimes been called Xallarap (Griest, K. and Hu, W.,
1992, Astrophysical Journal, 397, 362).
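The parenthetical recipe for a typical event duration can be followed through with assumed round numbers (a 1 M☉ lens at DL = 4 kpc, a bulge source at DS = 8 kpc, and a 200 km s⁻¹ transverse speed; all values are illustrative):

```python
import math

G = 6.674e-11    # m^3 kg^-1 s^-2
C = 3.0e8        # m/s
M_SUN = 1.989e30 # kg
KPC = 3.086e19   # m

m_lens = 1.0 * M_SUN
d_l, d_s = 4.0 * KPC, 8.0 * KPC
d_ls = d_s - d_l  # Euclidean distances suffice for Galactic lensing

# Einstein radius (angle), then its physical size at the lens distance
theta_e = math.sqrt(4 * G * m_lens / C**2 * d_ls / (d_l * d_s))
theta_mas = theta_e * 206265 * 1e3   # radians -> milliarcseconds
xi_e = theta_e * d_l                 # metres

v = 200e3                            # transverse speed, m/s
t_cross = xi_e / v / 86400           # Einstein radius crossing time, days
print(f"theta_E ~ {theta_mas:.1f} mas, crossing time ~ {t_cross:.0f} days")
```

These numbers give an Einstein radius of about a milliarcsecond and a crossing time of a few tens of days, consistent with the scales quoted in the text.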
Many collaborations have sought to find microlenses, including the MACHO
project mentioned above, EROS (Expérience pour la Recherche d’Objets
Sombres), OGLE (Optical Gravitational Lens Experiment) and MOA
(Microlensing Observations in Astrophysics). The consortia typically arrange for
rapid worldwide follow-ups of newly discovered microlensing events, often
using robotic telescopes such as the Faulkes telescopes. Much of the current
interest in microlensing is in planet discovery; a planet around a lens can create
sudden characteristic changes in the light curves, if the source passes through
the appropriate caustic from the planet. Microlensing is complementary to

other methods of exoplanet discovery; 2005 saw the microlensing discovery of


a 5.5 Earth mass planet. The anomalous data point in Figure 7.18 near the
maximum magnification may be the result of a binary companion to the lens. This
is a very exciting area of research, but it takes us beyond the cosmological theme
of this book. There are some suggestions for further reading on this at the end of
this chapter.
One final cosmological example of microlensing is worth mentioning. When
quasars are gravitationally lensed by foreground galaxies, the small angular sizes
of the regions emitting the quasar’s continuum and broad emission lines make the
quasar susceptible to microlensing by stars in the lensing galaxy. This could in
principle give clues about the internal structure of quasars, and attempts have been
made to pursue this, though in practice it is limited by the relative rarity of lensed
quasars and the microlensing timescales of (in some cases) multiple years.

7.9 Cosmic shear


Gravitational lensing can also trace the large-scale structure of the Universe,
through patterns of weak lensing. This directly measures the large-scale matter
distribution of the cosmic web and probes the matter power spectrum in the linear
regime (Chapter 4). This makes it an excellent test of hierarchical CDM structure
formation models. The method appears to be so promising that it could become a
powerful route to estimating the equation of state of dark energy.
In this case, the lensing of individual galaxies is weak, so there are not likely to
be multiple images, but one can still detect the effect statistically from the
tendency of galaxy ellipticities to align, as shown in Figure 7.20. Galaxies are not
themselves round, so alignments will not be perfect, but their intrinsic orientations
will be random compared to the foreground structure, so the effects of intrinsic
ellipticities should average out to zero.

Figure 7.20 Exaggerated view of weak lensing by the cosmic large-scale structure of matter. The shear
component of the gravitational magnification will tend to be aligned with the nearby large-scale structure (red), so
measured galaxy ellipticities (blue) on average will trace the foreground large-scale matter distribution.


To calculate how much ellipticity is induced by a gravitational lens, we'll show
that we can rewrite the inverse magnification tensor as

A = (1 − κ) [ 1  0 ] − γ [ cos 2φ    sin 2φ ]
            [ 0  1 ]     [ sin 2φ   −cos 2φ ],    (7.53)
where the κ in the first term is called the convergence, the γ in the second is
called the shear, and φ measures the orientation angle of the shear. Figure 7.21
illustrates the different effects of these two terms: convergence is isotropic, so
the images are just rescaled by a factor, while shear stretches the shapes in a
particular direction. The weak lensing by large-scale structure is often called
cosmic shear. Figure 7.22 shows a schematic simulated image without shear and
another with shear at the level typical for lensing by large-scale structure, though
in a fixed direction uniformly over the field. These are subtle effects!

Figure 7.21 Demonstration of the different effects of convergence and shear in
gravitational lensing: a source is shown lensed by convergence alone, and by
convergence plus shear. The arrow to the right is the direction of the shear.

Figure 7.22 Demonstration of an image without shear (left) and with a constant shear
applied across the whole image (right). The polarization is 0.1 in the right-hand image
(in the image distortion sense rather than the polarized light sense; see the definition
of image polarization later in this section).
To get to Equation 7.53, we'll first write A as

A = A − ½(tr A)·I + ½(tr A)·I,    (7.54)

where tr A means the trace of the matrix A (Equation 7.46), and I is the identity
matrix:

I = [ 1  0 ]
    [ 0  1 ].    (7.55)
To avoid a big mess of partial differentials, we'll write ∂²ψ/∂x² as ψ11,
∂²ψ/∂x∂y as ψ12, and so on.
With this notation, the trace of A is just

tr A = 1 − ψ11 + 1 − ψ22 = 2 − (ψ11 + ψ22),

so the term ½(tr A)·I comes out as

½(tr A)·I = [ 1 − ½(ψ11 + ψ22)          0         ]
            [         0          1 − ½(ψ11 + ψ22) ].    (7.56)


The convergence κ is defined via

1 − κ = ½(tr A) = 1 − ½(ψ11 + ψ22),    (7.57)

so with a bit of algebra you can see that the ½(tr A)·I term in Equation 7.54
gives rise to the convergence term in Equation 7.53. It also turns out that
κ = Σ/Σcr, where Σ is the surface mass density and Σcr is the critical surface
mass density (see Exercise 7.8).
The next trick is to write

γ1 = ½(ψ11 − ψ22) = γ(θ) cos(2φ(θ)),    (7.58)
γ2 = ψ12 = ψ21 = γ(θ) sin(2φ(θ)).    (7.59)
We won’t prove here that such a substitution is always possible, but we
have three free parameters in the inverse magnification matrix (because
∂ 2 ψ/∂θx ∂θy = ∂ 2 ψ/∂θy ∂θx ) and three proposed new parameters κ, γ and φ.
Therefore we might expect that some substitution of this form should be possible,
and we can use it to define γ and φ. Plugging these definitions in gets us to
Equation 7.53.
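A numerical sketch of this decomposition (the ψij values are arbitrary illustrative numbers): build A = I − [ψij], extract κ, γ1 and γ2 as defined above, and confirm that Equation 7.53 reassembles the same matrix.

```python
import math

# Arbitrary illustrative second derivatives of psi (psi12 = psi21 by symmetry)
psi11, psi22, psi12 = 0.30, 0.10, 0.08

# Inverse magnification tensor A = I - [psi_ij] (Equation 7.48)
A = [[1 - psi11, -psi12], [-psi12, 1 - psi22]]

# Convergence and shear components (Equations 7.57-7.59)
kappa = 0.5 * (psi11 + psi22)
g1, g2 = 0.5 * (psi11 - psi22), psi12
gamma = math.hypot(g1, g2)
phi = 0.5 * math.atan2(g2, g1)

# Reassemble A from Equation 7.53 and compare
A53 = [[1 - kappa - gamma * math.cos(2 * phi), -gamma * math.sin(2 * phi)],
       [-gamma * math.sin(2 * phi), 1 - kappa + gamma * math.cos(2 * phi)]]
print(A, A53)  # the two matrices agree
```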
In practice, measuring cosmic shear is a difficult experiment that needs a stable
and well-characterized point spread function. (An unresolved object in an
image takes the shape of the point spread function.) This can be difficult from
ground-based astronomy. Also, we’ve assumed that galaxies don’t have
intrinsic alignments: galaxies will tend to align themselves with their local
large-scale structure. For example, in the vicinity of the lens, this could produce
apparent shears perpendicular to the gravitational lensing shear. The current
thinking is that intrinsic alignments don’t cause a fatal problem for cosmic
shear detection measurements, because galaxies that are close on the sky are
mostly well-separated in redshift. The shear measurements γ for each galaxy
can be estimated from their ellipticities, taking into account the point spread
function shape. A circular galaxy would appear as an ellipse with a major axis of
a = (1 − κ − γ)−1 and a minor axis of b = (1 − κ + γ)−1 . (In the weak lensing
regime, it’s usually reasonable to assume that the magnification is constant across
the image of the lensed galaxy.) Sometimes the ellipticity is expressed as a
complex number

ε = [(a² − b²)/(a² + b²)] e^(2iφ) = ε1 + iε2.
Somewhat confusingly, this complex ellipticity is sometimes referred to as image
polarization, even though it has nothing to do with polarized light. Similarly,
the shear is sometimes expressed as a complex number γ1 + iγ2 . One way of
estimating the strength of cosmic shear is to separate the galaxies into foreground
and background populations on the basis of, for example, photometric redshifts or
even just apparent magnitudes, then measure the tangential component of the
shear for every foreground+background pair of galaxies, then finally plot the
average tangential shear as a function of the pair separation. An example is shown
in Figure 7.23. It’s tempting to think of this as the tangential shear caused by the
foreground galaxies, but because these foreground galaxies cluster, there will also
be lensing contributions from neighbours. The theoretical predictions of this
signal are complicated because one must estimate all the weak deflections
experienced by the light ray in its passage through the intervening inhomogeneous

Universe. A good test of whether there are systematic errors lurking in the data
analysis is to rotate the background galaxies by 45◦ . If the measured tangential
shear is due to gravitational lensing, then this 45◦ -rotated signal should be
consistent with zero. This is indeed what’s seen in Figure 7.23.
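The stretch produced by a given convergence and shear follows directly from the axis formulae above; a sketch with arbitrary illustrative values of κ and γ:

```python
kappa, gamma = 0.02, 0.01  # illustrative weak-lensing values

# A circular source appears as an ellipse with these axes:
a = 1.0 / (1.0 - kappa - gamma)  # major axis
b = 1.0 / (1.0 - kappa + gamma)  # minor axis

# Ellipticity in the image-distortion ("polarization") sense
eps = (a**2 - b**2) / (a**2 + b**2)
print(f"a = {a:.4f}, b = {b:.4f}, ellipticity = {eps:.3f}")
```

In the weak limit the ellipticity is approximately 2γ, which is why per-cent-level shears must be dug out of ensembles of galaxies whose intrinsic ellipticities are an order of magnitude larger.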

Figure 7.23 Tangential (top, ⟨γT⟩) and rotated (bottom, ⟨γX⟩) shear components as a
function of angular separation of foreground and background galaxies, from 0.1 to
100 arcmin. Galaxies were treated as lenses or background sources on the basis of
apparent R-band (660 nm) magnitude.
If there is redshift information available, then it’s possible to trace the evolving
structure of the cosmic web. This has been achieved by the COSMOS
(Cosmological Evolution Survey) project using a wide-area HST survey.
Figure 7.24 shows the three-dimensional recovered dark matter distribution
as a function of position on the sky (longitude and latitude known as right
ascension and declination) and as a function of redshift. Figure 7.25 compares the
distribution of total mass (dominated by dark matter) to the distributions of the
stellar mass of galaxies, of the numbers of optically-selected galaxies, and of the
X-ray-luminous gas. The X-ray emission is proportional to the square of the
electron density n2e , and since cosmological plasma is overall electrically neutral,
the X-ray flux will tend to highlight the higher-density regions of the baryon
distribution. (Gravitational lensing sensitivity, however, is linearly proportional to
mass.) There is a z = 0.73 galaxy cluster in Figure 7.25 at coordinates 149°55′
and 2°31′, around which the weak lensing finds filamentary dark matter structures.


Figure 7.24 Dark matter distribution inferred from weak lensing in the COSMOS
survey, plotted against right ascension, declination and redshift. The redshift axis is
compressed; the survey geometry is really an elongated cone. Regions are marked as
opaque where the density is greater than 1.4 × 10¹³ M☉, with a circle of radius 700 kpc
on the sky and a redshift interval of Δz = 0.05. The darkness of the faint greyscale
background traces the full density distribution.

The evolution of large-scale structure is a (known) function of the cosmological


parameters, including the dark energy equation of state parameter w. Cosmic
shear could therefore be a route to constraining cosmological parameters.
However, while the signal from cosmic shear itself is much larger than that of
intrinsic galaxy alignments, the intrinsic alignments are much larger than the
effect of changing w by (say) 1%. It turns out that the redshift-dependence of
intrinsic alignments can be used to distinguish the intrinsic alignments from
cosmic shear, and this is a subject of much ongoing research.
The prospects for improving the cosmic shear detections are excellent over the
next ten years or so. The Pan-STARRS survey (Panoramic Survey Telescope And
Rapid Response System) plans to use four dedicated 1.8 m optical telescopes in
Hawaii to survey all the sky visible from that site (about three-quarters of the
whole sky) in six filters to a typical optical magnitude of R = 26 (5σ detection
limit for a point source). The primary goal is to detect Solar System moving
objects, but the project has great potential for the detection of cosmic shear. The
start of operations with its first telescope is imminent at the time of writing.
Meanwhile, the Dark Energy Survey (DES) plans to use 30% of the time on the
4 m Blanco telescope at the Cerro Tololo Inter-American Observatory (CTIO) to conduct an optical
survey of 5000 deg2 of the sky in four filters. One of its key objectives is the
detection of weak lensing using photometric redshifts to trace the evolution
of w(z). The Large Synoptic Survey Telescope (LSST) has similar science
objectives and sky area to Pan-STARRS but with a much larger (and hence more
sensitive) 8.4 m telescope.

7.9 Cosmic shear

Figure 7.25 The total dark matter distribution in COSMOS, projected onto the
sky (right ascension/degrees against declination/degrees). Panel (a) has the
dark matter marked as contours, while panels (b), (c) and (d) mark the dark
matter distribution in greyscale. Also shown are various independent tracers
of the baryonic matter distribution: the stellar mass (blue) as traced by
near-infrared photometry of galaxies, the density of galaxies (yellow) as
traced by optical galaxy counts, and the hot gas (red) as traced by X-ray
imaging of the field after removal of X-ray point sources.

At the time of writing, LSST survey operations are planned to begin around 2016.
Figure 7.26 shows the projected sensitivity for the dark energy equation of
state parameters for the LSST, assuming that the systematics from intrinsic
alignments and point spread function variations can be well-characterized.
All these future and imminent surveys also seek to measure baryon wiggles,
high-redshift supernovae and the evolution of galaxy clustering (Chapter 3).
Besides intrinsic alignments, the main difficulty with ground-based optical
measurements of weak lensing is the characterization and stability of the point
spread function. There are two quite different solutions that other forthcoming
cosmic shear experiments will (or may) use. One solution is to move the telescope
above the Earth’s turbulent atmosphere. At the time of writing there are two major
space missions proposed to do this: the European Space Agency EUCLID
mission, and the NASA Joint Dark Energy Mission (JDEM). Both missions are
ambitious wide-field optical/near-infrared imaging and spectroscopy surveys
using a ∼1.2–1.5 m space telescope.
It has been proposed that the missions should merge and form a joint ESA/NASA
project. The other option is to use radio telescopes, because the angular resolution
of radio interferometry is not subject to the seeing limitations of ground-based
optical astronomy. Getting enough galaxies over a large enough sky area is
challenging for the current generation of radio telescopes, but the Square

Figure 7.26 Projected dark energy equation of state constraints, in the
w0–w1 plane, for the Large Synoptic Survey Telescope. As well as the weak
lensing (WL) constraints, there are also constraints from baryonic acoustic
oscillations (BAOs), supernovae (SN), galaxy cluster number counts, the
combination WL + BAO, and all combined (Chapters 3 and 4).

Kilometre Array (SKA; see also Chapter 8) will revolutionize this field. The SKA
should be completed around the year 2020, though early science observations
with a subset of the array will happen in the preceding few years. These future
projects aimed at measuring cosmic shear may also be useful for finding new
strong gravitational lenses (Section 7.11).

7.10 Galaxy cluster lenses


Gravitational lens magnification is one of the best ways of finding the most distant
objects in the Universe. Cosmological lenses are rare on the sky, but one place
where gravitational lensing can be reliably expected to happen is the cores of the
most massive galaxy clusters. We saw an example early on in this chapter in
Figure 7.3. This galaxy cluster, Abell 2218, has one of the best-constrained mass
models of any galaxy cluster. To make such a model, one first finds the lensed
arcs and multiple images. A common procedure is then to approximate the cluster
with a smooth model (e.g. isothermal or Navarro–Frenk–White, Sections 7.4
and 7.7), then add in the additional mass from the galaxies, assuming a constant
mass-to-light ratio. The parameters of the model are iterated until a good fit to the
pattern of arcs and multiple images is found.
The mass model of Abell 2218 has two main condensations, suggesting that the
cluster is the product of an ongoing merger. The cluster has been studied very
comprehensively at many wavelengths. Figure 7.27 shows some of the deep
images of the core of Abell 2218. Many of these are the deepest images ever taken


of the sky at that wavelength. Outside the core of the cluster the magnification
factors are modest, but within the core the magnification factors of individual
background galaxies vary typically from around 2 to 10, so these images are in
addition up to 10 times deeper than can be achieved in unlensed parts of the sky.

Figure 7.27 Images of


Abell 2218 taken with various
space telescopes. Top left:
I-band HST image. Top
right: 15 µm image with the
AKARI space telescope.
Bottom left: 24 µm image
from the Spitzer Space
Telescope. Bottom right:
250 µm image from the
Herschel Space Observatory.

Exercise 7.11 Suppose that at a particular redshift the background galaxies


have a luminosity function dΦ/dL ∝ L−α . For which values of α would lens
magnification increase the number of these background sources? Don’t forget that
increasing the angular size of a distant region also decreases the comoving volume
that’s sampled, for a fixed observed area on the sky. ■
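One way to build intuition for this exercise is a short Monte Carlo sketch (not from the book; the slope α, magnification µ and flux limit below are assumed values). Lensing brightens each source by a factor µ, but the same patch of observed sky then samples µ times less source-plane area, diluting the counts by 1/µ:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 2.5                # luminosity function slope, dPhi/dL ∝ L^-alpha (assumed)
mu = 3.0                   # lens magnification (assumed)
L_min, L_lim = 1.0, 10.0   # faint-end cutoff and survey detection limit (assumed)

# Draw luminosities from dPhi/dL ∝ L^-alpha above L_min (inverse-CDF sampling).
u = rng.random(2_000_000)
L = L_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))

n_unlensed = np.sum(L > L_lim)
# Magnification brightens each source by mu but dilutes the counts by 1/mu.
n_lensed = np.sum(mu * L > L_lim) / mu

print(n_lensed / n_unlensed)   # > 1 only for sufficiently steep slopes
print(mu ** (alpha - 2.0))     # analytic expectation for a pure power law
```

Re-running with α above and below 2 shows which of the two competing effects wins.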
In blank-field galaxy surveys (i.e. mapping of blank areas of sky — see
Chapter 4), the blending and overlapping of objects limits the depth at which
objects can be found. This confusion limit (see Chapter 4) can be circumvented
using gravitational lensing by targeting foreground galaxy clusters instead of
blank fields.
Galaxy cluster lenses have been used to find ultra-high-redshift galaxy candidates.
(At the time of writing this means z > 6, but this changes!) These are rare on the
sky but the magnification assists. The candidates tend to be selected on the basis
of photometric redshifts which at these redshifts are dominated by the Lyman
break. An important test of the proposed redshift is the position(s) of multiple
images. Another important test is optical spectroscopy, looking for emission lines
at the estimated redshift. It’s not obvious that emission lines will be visible if
the galaxy is very dusty (see, for example, Chapters 4 and 5), but even many
submm-selected galaxies have Lyman α lines, so it is certainly worth trying. Two
emission lines are needed to confirm a redshift (Chapter 4), but if an emission line
is seen at the expected position of Lyman α, it may be taken as confirmation of the
redshift. Nevertheless, a claimed z ≈ 10 gravitationally-lensed galaxy with an
apparent Lyman α emission line in cluster Abell 1835 later turned out to be a red
galaxy at a much lower redshift; the emission line appeared to have been an
artefact caused by bad pixels in the original data. The danger of these false
positive claims is quite high because low-redshift faint red galaxies are far

more common on the sky than ultra-high-redshift galaxies. The existence of


dark matter has very wide though not universal acceptance within the research
community. Some baulk at the prospect of a component of matter for which there
have been no direct observations from particle physics, but which has been
inferred in astronomy to dominate the matter density of the Universe. The
proposed alternative is to modify Newtonian gravitation to explain galaxy rotation
curves. This theory, known as ‘modified Newtonian dynamics’ or MOND, can
be given a relativistic context in an alternative to general relativity known as
‘Tensor–Vector–Scalar’ theory or TeVeS. It can be unwise to make predictions of
future discoveries (many expected Λ = 0 — see Chapter 1); nevertheless, since
this is a minority position within the community, we shall not dwell on this theory
in this book. One key prediction of MOND/TeVeS is that the gravitational lensing
deflection should follow the visible baryonic matter in galaxy clusters. The Bullet
cluster turned out to be an excellent place to test this prediction. Like Abell 2218,
the cluster is being seen in the process of a merger, though the Bullet cluster is at
an earlier stage. The cluster baryonic mass is traced by the X-ray luminous gas, so
the mass inferred from gravitational lensing should match the distribution of
X-ray-emitting gas. However, while the self-interaction of dark matter is very
weak or non-existent, a pocket of gas can interact very strongly with another
through pressure, shocks, and so on. When two galaxy clusters collide, we’d
therefore expect the two ‘clouds’ of dark matter to fall towards each other, pass
through each other and out the other side, oscillate and eventually settle through
tidal forces. Meanwhile, the two ‘clouds’ of gas would interact strongly and settle
into the centre more quickly. The Bullet cluster has a clear separation of the
masses inferred from X-ray gas and from gravitational lensing (Figure 7.28). This
is particularly challenging for MOND/TeVeS if it is to have no dissipationless
dark matter component.

Figure 7.28 Optical image of


the Bullet cluster with X-ray
image from the Chandra X-ray
Observatory superimposed in
pink and the mass inferred
from gravitational lensing
superimposed in blue. The
baryonic matter in galaxy
clusters is traced by the
X-ray-luminous gas. Note the
clear separation of the baryonic
matter (X-rays) from the total
matter, implying a spatially
separate dark matter component.
This is challenging to models
that seek to avoid the existence
of dark matter by modifying the
gravitational force law.


7.11 Finding gravitational lenses


The first gravitational lens discovered was the double quasar QSO 0957+561
(Figure 7.29). (The numbers refer to the right ascension and declination
coordinates on the sky, α = 09h 57m and δ = +56.1◦ .) It was found entirely
serendipitously in a survey of optical candidates of radio sources aimed at
discovering new radio-loud active galaxies.
It turns out that only a small fraction of quasars (radio-loud or otherwise) are
gravitationally lensed. Exhaustive efforts were made to follow up radio sources
with the UK’s MERLIN array to make high-resolution images to find new lenses,
by the JVAS (Jodrell/VLA Astrometric Survey) project and later the CLASS
(Cosmic Lens All-Sky Survey) project. A total of 16 503 radio sources were
surveyed by MERLIN, of which only 22 were found to be strong gravitational
lenses.
Meanwhile, other gravitational lenses were being discovered serendipitously. The
Einstein Cross, also known more prosaically as QSO 2237+030, was discovered in
a spectroscopic redshift survey of nearby galaxies. Figure 7.30 shows an optical
image of this lens system. Such systems have become known as ‘quadruple
lenses’ or ‘quad lenses’, though by the odd-number theorem (Section 7.6) there
must be a fifth demagnified image near the centre.

Figure 7.29 1.6 µm (H-band) and 0.55 µm (V-band) images of the
gravitationally-lensed quasar QSO 0957+561. Both images are shown in false
colours. The lensing galaxy is much redder than the quasar images, i.e. it is
relatively brighter in H than in V. Images from the CASTLES (CfA–Arizona
Space Telescope LEns Survey) survey of gravitational lenses.

Figure 7.30 Deconvolved image of the Einstein Cross. There are four images
of a background quasar at redshift z = 1.695, surrounding the core of a
foreground galaxy at redshift z = 0.0394. The fifth feature at the centre is
the core of the foreground galaxy, rather than a fifth image.
The steeper the slope of the number counts or luminosity function, the
larger the magnification bias (Section 7.3). For this reason, the search for
gravitationally-lensed quasars focused on high-luminosity quasars. One of the
first to be discovered in this way was the Cloverleaf lens, so named because it’s
another quad lens. There is an ongoing search for lenses among the ∼100 000
spectroscopically-confirmed quasars in the SLOAN survey (Chapter 3), known as
the SLOAN Quasar Lens Survey (SQLS): candidate lenses are selected in the
low-resolution (∼1.3″ seeing) SLOAN imaging on the basis of morphology and
colour, then candidates are followed up at higher angular resolution with other
telescopes. At the time of writing, this project has uncovered 32 new lensed
quasars.


The SLOAN survey has also been the source of another large catalogue of
lenses. The SLOAN Lens ACS survey (SLACS) has found 131 galaxy–galaxy
lenses by searching the SLOAN spectra for an absorption-dominated redshift
combined with nebular emission lines (e.g. [O II] 372.7 nm or [O III] 500.7 nm) at
another, higher, redshift in the same spectrum. These lens candidates were
followed up with high-resolution imaging from the HST Advanced Camera for
Surveys (ACS). Figure 7.31 shows some of the beautiful lens systems from this
survey. The mass profile implied by these lenses is approximately isothermal
(Section 7.4), but on average the mass profile is not the same as the light profile.
The mass profile does not seem to have evolved since z = 1. Most of these
lenses are elliptical galaxies, because ellipticals tend to be massive galaxies, so
their cross section for lensing is higher. Typical Einstein radii are about an
arcsecond, with lens masses roughly in the range 10¹⁰–10¹² M☉. The SLACS
lenses also follow a fundamental plane (Chapter 3) that is consistent with the local
fundamental plane once luminosity evolution is accounted for.

Figure 7.31 Sixty


gravitational lenses from the
SLACS survey from the HST
I-band (814 nm) ACS imaging.
Although the HST imaging is
intrinsically monochrome, the
colours of the foreground lenses
in these images have been
set using the SLOAN g–r
colours, while the background
lensed galaxies have been
artificially enhanced in blue. In
each pair, the left-hand panel
is the observed data, while
the right-hand panel is the
mathematical model used to
describe that lens.

Submm-wave surveys also have steep number counts (Chapter 5), particularly at
bright fluxes, so bright submm-wave galaxies should be more prone to
magnification bias (Section 7.3). At the time of writing, there are two forthcoming
surveys that may find many new lenses: the SCUBA-2 All-Sky Survey (SASSy)
and the Herschel ATLAS key project (Astrophysical Terahertz Large Area
Survey). Both projects aim to scan the sky quickly to a shallow sensitivity in
order to find the rare bright objects that may be lensed. Nearby galaxies that
make up the Euclidean slope of the counts will probably be easily excludable
by their cross-identifications with obvious nearby galaxies in optical surveys

such as SLOAN. Similarly, radio-loud active galaxies will have obvious


cross-identifications in radio surveys. As a result, the gravitational lens selection
efficiency is expected to be around 100%, a far cry from the 22/16 503 lenses
found in the early MERLIN surveys.
Another approach is to visually examine galaxies by hand. In an extraordinary
solo effort, Neal Jackson from Jodrell Bank checked all 285 000 individual HST
galaxy images from the COSMOS survey (see Section 7.9 and Chapter 4) that are
brighter than an I-band magnitude of I = 25, finding two definite new strong
lenses, one probable strong lens and a list of over a hundred candidates. Some
progress has been made in using a computer to find gravitational lenses in
high-resolution images automatically, but for the time being at least, human
beings still perform better than software, so Neal Jackson's effort cannot easily be
superseded. (See, for example, Marshall et al., 2009, Astrophysical Journal, 694, 924.)
Summary of Chapter 7
1. Gravitational lensing conserves surface brightness. Images that are
magnified in area are therefore also magnified by the same factor in flux.
2. Gravitational lensing is achromatic, though the magnification can vary
across an extended background object.
3. Gravitational lensing does not focus light rays (except in the case of a mass
sheet with the critical mass density). In general a gravitational lens will have
every variety of aberration in geometrical optics, except chromatic
aberration (because gravitational lensing is achromatic).
4. The gravitational lens deflection by a point mass is exactly twice the
Newtonian prediction:
φ = 4GM/(bc²). (Eqn 7.2)
5. The magnification of a point mass is always larger than 1. This does not
violate energy conservation.
6. A point exactly behind a point mass is gravitationally lensed into a ring, the
radius of which is known as the Einstein radius:
θE = √[(4GM/c²) × DLS/(DL DS)]. (Eqn 7.9)
This radius is useful for many lens models.
7. We define the source plane as the plane of the background object, and the
image plane as the plane of the lens, i.e. as it appears to the observer.
8. Non-singular lenses have an odd number of images, including a central
(demagnified) image whose flux is dependent on the density profile of
matter in the core of the lensing galaxy.
9. As a source moves in the source plane through a caustic, a new pair of
images is created on the corresponding critical line in the image plane.
10. Magnification bias affects the source counts of background objects and
modifies the observed distribution of magnifications.
11. Microlensing of stars can be used to search for dark matter in the halo of the
Galaxy and a few nearby galaxies.

12. Microlensing of background quasars by stars in a lensing galaxy causes long


timescale drifts (lasting in some cases several years) in the fluxes of
individual images.
13. Gravitational lensing by the cores of galaxy clusters has been used to find
high-redshift galaxies at many wavelengths, in some cases deeper than the
blank-field confusion limit. Although the fluxes are magnified, the
comoving volume sampled decreases.
14. Gravitational lensing by galaxy clusters can also be used to map the
distribution of dark matter in the clusters.
15. Weak lensing by the large-scale structure of the Universe is sensitive to the
matter power spectrum in the linear regime. If the systematic errors can be
well-characterized, the prospects are excellent for using this to constrain the
dark energy equation of state.
16. The inverse magnification tensor can be used to calculate magnifications,
parities, convergences and shears of images.
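The point-mass results in items 4 and 6 are easy to evaluate numerically. A minimal sketch (the galaxy lens mass and distances below are assumed, purely illustrative values; in a proper cosmological calculation the D's are angular diameter distances, and DLS is not simply DS − DL):

```python
import math

G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8         # speed of light, m s^-1
M_sun = 1.989e30    # solar mass, kg
Mpc = 3.086e22      # metres per megaparsec

def deflection(M, b):
    """Point-mass deflection angle, phi = 4GM/(b c^2) (Eqn 7.2), in radians."""
    return 4.0 * G * M / (b * c**2)

def einstein_radius(M, D_L, D_S, D_LS):
    """theta_E = sqrt((4GM/c^2) * D_LS/(D_L * D_S)) (Eqn 7.9), in radians."""
    return math.sqrt(4.0 * G * M / c**2 * D_LS / (D_L * D_S))

rad_to_arcsec = 180.0 / math.pi * 3600.0

# Light grazing the Sun (b = solar radius): Eddington's classic deflection.
print(deflection(M_sun, 6.96e8) * rad_to_arcsec)   # ≈ 1.75 arcsec

# An assumed 3e11 M_sun galaxy lens halfway to a background source gives
# roughly an arcsecond, comparable to the SLACS Einstein radii in the text.
theta_E = einstein_radius(3e11 * M_sun, 1000 * Mpc, 2000 * Mpc, 1000 * Mpc)
print(theta_E * rad_to_arcsec)
```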
Further reading
• Adobe Photoshop currently has a downloadable gravitational lensing plug-in.
• More detail on the Schwarzschild solution and the gravitational lensing in
Eddington’s solar eclipse experiment that confirmed Einstein’s theory of
general relativity can be found in Lambourne, R., 2010, Theoretical
Cosmology, Cambridge University Press.
• For more information on lensing theory, see Schneider, P., Ehlers, J. and Falco,
E.E., 1992, Gravitational Lenses, Springer.
• Alternatively, see Narayan, R. and Bartelmann, M., 1995, ‘Lectures on
gravitational lensing’, in Dekel, A. and Ostriker, J.P. (eds) Formation of
Structure in the Universe, Proceedings of the 1995 Jerusalem Winter School,
Cambridge University Press (available at astro-ph/9606001). We have followed
many of the same formalisms and arguments in this chapter.
• For more recent and lengthy reviews of observational lensing, see Meylan, G.,
Jetzer, P. and North, P. (eds) Gravitational Lensing: Strong, Weak and Micro,
Saas-Fee Advanced Course 33, Springer. The four lectures are ‘Introduction to
gravitational lensing and cosmology’ by Peter Schneider, ‘Strong gravitational
lensing’ by Christopher S. Kochanek, ‘Weak gravitational lensing’ by Peter
Schneider (available at astro-ph/0407232), and ‘Gravitational microlensing’ by
Joachim Wambsganss.
• More information on exoplanet discovery can be found in Haswell, C.A., 2010,
Transiting Exoplanets, Cambridge University Press.
• Mellier, Y., 1999, ‘Probing the Universe with weak lensing’, Annual Review of
Astronomy and Astrophysics, 37, 127.
• Brainerd, T.G., ‘Constraint on field galaxy haloes from weak lensing and
satellite dynamics’, invited review in Allen, R.E., Nanopoulos, D.V. and Pope,
C.N. (eds) The New Cosmology, AIP Conference Proceedings vol. 743,
available at astro-ph/0411244.
• Albrecht, A. et al., 2006, Report of the Dark Energy Task Force, available at
astro-ph/0609591.

Chapter 8 The intervening Universe
Birth: the first and direst of all disasters.
Ambrose Bierce

Introduction
After the CMB, what made the first light in the Universe — early stars, or black
hole accretion? When were these first things created? In this final chapter, we
shall explore what we know of the very earliest objects in the Universe. Much of
this evidence comes from absorption lines, which we shall meet first. These
absorbers also usefully track the cosmic consumption of gas in star formation, and
give us a wonderful method of counting the total number of baryons in the
observable Universe.

8.1 The Lyman α forest


One of the possible answers to Olbers’ paradox in Chapter 1 was that the Universe
is not transparent. As we’ve seen with the Hubble Deep Fields, the Universe turns
out to be surprisingly transparent at optical wavelengths, so this is not the solution
(and in any case, the energy absorbed by dust is still re-radiated in the far-infrared
as thermal radiation). The answer to Olbers’ profound paradox involves a
combination of the finite age of the Universe and cosmological (1 + z)⁴ surface
brightness dimming.
However, there are wavelength ranges where the Universe is not transparent, at
least along certain lines of sight. Figure 8.1 shows a spectrum of the high-redshift
quasar QSO 1422+23. The broad peak is the Lyman α emission line, caused by
electrons jumping from the n = 2 to n = 1 quantized energy states in hydrogen
(Figure 8.2).

Figure 8.1 The Lyman α forest in the spectrum of a quasar: flux (arbitrary
units) against observed wavelength λ/Å. Marked are the quasar's Lyman α and
Lyman β emission lines and a Lyman limit system.

● Why is the Lyman α line broad in quasars?


❍ The hydrogen gas is moving quickly close to the central supermassive black
hole. The large velocity dispersion gives rise to large Doppler shifts, which in
turn give rise to the large emission line width.

Figure 8.2 Hydrogen energy levels, from E1 = −13.6 eV up through E2, E3 and
E4 towards the continuum at 0 eV. The energy of an energy level is given by
En = −13.6 eV/n². The potential energy from the nucleus is shown as a black
curve.
On the left-hand side of the Lyman α line, i.e. at shorter wavelengths, the
spectrum seems much noisier. This is not noise; it is the Lyman α absorption lines
from neutral hydrogen clouds between us and the quasar. The absorbing atoms
each have an electron in the n = 1 energy level that is promoted to n = 2
using an absorbed photon's energy. (Note that n = 1 → 2 is absorption of a
photon and promotion of the electron, while n = 2 → 1 is emission of a photon
and demotion of the electron.) Since the clouds are at lower redshift than
the quasar, their Lyman α absorption is less redshifted, so appears at shorter
wavelengths. Figure 8.3 shows this schematically. (Note that ‘Lyman α’ is
sometimes abbreviated as Ly α.)

Figure 8.3 Schematic representation of the Lyman α forest. Light from a
quasar passes cloud 1 and then cloud 2 on its way to Earth; the panels show
the spectrum Fλ (from UV to IR) picking up a new, less-redshifted, Lyman α
absorption line at each cloud.

These Lyman α clouds, collectively called the Lyman α forest, are another of
the few ways that astronomers can view the non-luminous Universe. From a
knowledge of the cross section of the absorption (see the box in Chapter 3),
which one can measure in a laboratory, one can calculate the projected number of
hydrogen atoms along this line of sight, per unit area. This is known as the
column density of the absorption, NH I .
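For a uniform cloud, the column density is simply the number density times the path length through the cloud, NH I = nL. A one-line sketch with assumed, purely illustrative values:

```python
# Column density N = ∫ n dl along the line of sight; for a uniform cloud this
# reduces to (number density) × (path length). Values are assumed, for illustration.
Mpc = 3.086e22            # metres per megaparsec
n_H = 1.0e4               # neutral hydrogen number density in m^-3 (assumed)
path_length = 0.1 * Mpc   # 100 kpc of cloud along the sightline (assumed)

N_HI = n_H * path_length  # column density in m^-2
print(f"N_HI = {N_HI:.2e} m^-2")   # about 3.1e25 m^-2
```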

What are these absorbers? Are they intervening galaxies, for example? It turns
out that they are not — or at least, there is no one-to-one correlation between
intervening Lyman α absorbers and galaxies that appear to be close in projection
on the sky. Low-redshift absorbers are more likely close to gas-rich local galaxies;
nonetheless, galaxy haloes cannot account for all Lyman α clouds.
Instead, it seems that these absorbers are clumps of intergalactic material that
(for the most part) have not yet condensed to form galaxies. They would be
undetectable, were it not for the fact that they absorb light from background
quasars. The distribution of these primordial clumps is not subject to most of
the complicated physics that determines the distribution of galaxies, such as
non-linear gravitational collapse and feedback. The Lyman α forest can therefore
be used as a tracer of the underlying matter distribution. This is very useful for
testing cosmological models, as we shall see in the next section.
It may surprise you to read that Lyman α clouds exist even in the present-day
Universe. It was once imagined that galaxy formation was something that
happened only early in the history of the Universe, but more recently galaxy
formation has been seen as an ongoing process. There are even nearby galaxies
that seem to have formed all their stars very recently, such as the galaxy I Zw 18
(Figure 8.4). We imagine that a pre-existing puddle of neutral hydrogen has been
disturbed or interacted with in some way that has triggered the formation of stars
within it. It’s not clear what the triggers were for I Zw 18, however.

Figure 8.4 The dwarf galaxy I Zw 18 (read as ‘one Zwicky eighteen’).

Exercise 8.1 When a Lyman α photon is absorbed by a Lyman α forest cloud,


the hydrogen atom is left in an excited state. This state isn’t stable, and energy is
eventually re-emitted as another Lyman α photon. Why doesn’t this fill out the
absorption line?
Exercise 8.2 Suppose that you have a spherical cloud of neutral hydrogen
with a radius of 1 Mpc and a density of one hydrogen atom per cubic cm.
Calculate the column density as seen through the centre of the cloud. ■
Lyman α absorption (n = 1 → 2) at 121.6 nm in the rest frame of the hydrogen
atom isn’t the only absorption in these clouds. A sufficiently energetic photon
could be absorbed using the n = 1 → 3 transition, or n = 1 → 4, and so on.

● What is the rest-frame wavelength of the n = 1 → 3 absorption?


❍ The wavelength of the n = 1 → 2 transition is 121.6 nm, which corresponds
to an energy difference of (−13.6/2² − (−13.6/1²)) eV, or 10.2 eV. The
n = 1 → 3 transition is similarly (−13.6/3² − (−13.6/1²)) eV ≈ 12.09 eV.
Since E = hν = hc/λ (where h is Planck's constant, c is the speed of light,
and λ is the wavelength),
E(n = 1 → 3) / E(n = 1 → 2) = λ(n = 1 → 2) / λ(n = 1 → 3),
so λ(n = 1 → 3) = 121.6 nm × 10.2/12.09 ≈ 102.6 nm.
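The same arithmetic works for any hydrogen transition. A short sketch using En = −13.6 eV/n² and hc ≈ 1239.84 eV nm:

```python
E0 = 13.6       # eV, hydrogen ground-state binding energy (Figure 8.2)
hc = 1239.84    # eV nm, product of Planck's constant and the speed of light

def wavelength_nm(n_low, n_high):
    """Photon wavelength for the n_low -> n_high transition, in nm."""
    delta_E = E0 * (1.0 / n_low**2 - 1.0 / n_high**2)
    return hc / delta_E

print(wavelength_nm(1, 2))   # Lyman alpha, about 121.6 nm
print(wavelength_nm(1, 3))   # Lyman beta, about 102.6 nm
print(wavelength_nm(2, 3))   # H alpha (Balmer series), about 656 nm
print(hc / E0)               # Lyman limit, about 91.2 nm
```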
If the photons are sufficiently energetic, they may ionize the hydrogen atoms
entirely. This can happen only at energies > 13.6 eV, or rest-frame wavelengths
below 91.2 nm. In the spectrum in Figure 8.1, the quasar’s Lyman β emission
line (n = 3 → 1) is marked. At observed wavelengths shorter than those for
this emission line, the spectrum will contain absorption from both intervening
Lyman α and Lyman β lines. At wavelengths below 91.2 nm × (1 + zQSO ), where
zQSO is the quasar redshift, the atoms along the line of sight could be ionized
completely, and the quasar spectrum is noticeably suppressed. This is known as
Lyman limit absorption.
Just because a photon can excite a Lyman β transition, doesn’t mean that it must.
The cross section for Lyman β absorption turns out to be a factor of about five
smaller than that for Lyman α absorption. Similarly, the higher transitions
of n = 1 → 4, n = 1 → 5, and so on are increasingly less probable. Only
intervening neutral hydrogen clouds with sufficiently high column densities
cause Lyman limit absorption (NH I > 10²¹ m⁻² or so — see Exercise 8.4). This
is partly why the suppression in the quasar spectrum in Figure 8.1 doesn’t start
immediately below 91.2 nm × (1 + zQSO ) — the light from a distant quasar
doesn’t necessarily immediately encounter a sufficiently opaque absorber.

Exercise 8.3 The Lyman series of hydrogen absorption lines is n = 1 → 2


(Lyman α), n = 1 → 3 (Lyman β), n = 1 → 4 (Lyman γ), and so on. There is
also a Balmer series of hydrogen absorption lines: n = 2 → 3 is known as Hα,
n = 2 → 4 is Hβ, n = 2 → 5 is Hγ, and so on. But why is there no Hα forest in
the spectra of quasars? ■
When are clouds thick enough to be opaque? To answer this, and quantify what
we mean by ‘opaque’, we can write the fraction of radiation passing through the
cloud as e−τ, where τ is known as the optical depth. This fraction varies from 1
down to 0 as τ grows, and we can regard τ > 1 as opaque (i.e. fewer than
e−1 ≈ 37% of the photons pass through).
The optical depth of a Lyman α cloud to ionizing photons is
τ = NH I [∫ (σJν/(hν)) dν] / [∫ (Jν/(hν)) dν], (8.1)
where Jν is the spectrum of the ionizing background, and σ is the cross section
for hydrogen ionization. The factor of hν converts from energy flux Jν to photon
flux Jν /(hν). The cross section is σ = 7.88 × 10⁻²² m² at the Lyman limit, and
scales as ν −3 at higher frequencies.

Exercise 8.4 Assuming that Jν ∝ ν^−α, show that the optical depth is τ > 1
when NH I > 1.3 ((α + 3)/α) × 10²¹ m⁻². This is known as self-shielding. ■
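One way to check the result of Exercise 8.4 is to evaluate the frequency average in Equation 8.1 numerically, using σ ∝ ν⁻³ above the Lyman limit and an assumed power-law ionizing background Jν ∝ ν^−α:

```python
import numpy as np

sigma_L = 7.88e-22   # m^2: hydrogen ionization cross section at the Lyman limit
alpha = 1.0          # assumed slope of the ionizing background, J_nu ∝ nu^-alpha

def trapezoid(y, x):
    """Trapezoidal integration on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Frequencies in units of the Lyman-limit frequency; the photon flux is
# J_nu/(h nu) ∝ nu^-(alpha+1), and the cross section scales as nu^-3.
nu = np.geomspace(1.0, 1.0e4, 200_001)
photon_flux = nu ** (-(alpha + 1.0))
sigma_eff = trapezoid(sigma_L * nu**-3 * photon_flux, nu) / trapezoid(photon_flux, nu)

N_tau1 = 1.0 / sigma_eff                   # column density giving tau = 1
print(N_tau1)
print(((alpha + 3.0) / alpha) / sigma_L)   # analytic result from Exercise 8.4
```

With α = 1 both lines give a self-shielding threshold of about 5 × 10²¹ m⁻², consistent with 1.3 ((α + 3)/α) × 10²¹ m⁻².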

For reference, quasar spectra have α around 0.5–1 in the ultraviolet, while
galaxies have redder spectra. Taking account of intervening absorption, a
Lyman α cloud might experience an ambient light spectrum of α ≈ 2, depending
on redshift.

8.2 Comparison with cosmological simulations


The density fluctuations δρ/ρ in the Lyman α forest are not too far from the linear
regime, so the Lyman α forest is not affected by virialization or dissipation, nor by
the complicated biasing physics that affects galaxy clustering. Having said that,
the Lyman α forest is sensitive to the thermal state of the intergalactic medium
(by affecting absorption line widths), so there have been many attempts to
numerically simulate the evolving large-scale structure of the Lyman α forest
and its ionization history using N -body codes with hydrodynamic inputs and
approximations. These COBE-normalized CDM simulations reproduce both the
shape and the amplitude of the spatial power spectrum (due to clustering) of the
Lyman α forest (Figure 4.1). It’s remarkable that Inflation+CDM is more or less
enough to reproduce the spatial clustering of the Lyman α forest, since the latter
covers a very different part of the Universe’s history, and is based on very
different physical phenomena. The clustering of the Lyman α forest can be
combined with the CMB and galaxy clustering to constrain the power spectrum
from ∼1 Mpc scales right up to the horizon scale.

8.3 Ωb and the cosmic deuterium abundance


We’ve already seen in Chapter 2 how the conditions in the early Universe were
hot and dense enough for nuclear reactions, and that the hot Big Bang model
makes very specific predictions for the resulting abundances of elements. The
nuclear reaction rates don’t depend linearly on the density of baryons: for
example, a hypothetical universe with double the baryon density Ωb of our
Universe would produce much more than double the lithium (see Figure 2.3).
Ωb,0 h2 can be determined from the Doppler peaks in the CMB, as we’ve seen, but
its measurement is not independent of other parameters that need to be determined
from the same data, such as the primordial spectral index ns or the Thomson
scattering optical depth to reionization. If we can measure the elemental and
isotopic abundances in primordial (or nearly primordial) gas, we can derive an
independent measurement of the baryon density of the Universe.
Deuterium, also called heavy hydrogen, is the reaction product that depends most
sensitively on Ωb,0 h2 . This isotope of hydrogen has a proton and a neutron as its
nucleus. This subtly changes the electronic energy levels. (In the Bohr model, the
electron’s energy levels change because the increased mass of the nucleus subtly
alters the atom’s centre of mass). If we have a strong Lyman α absorber, we might
hope to detect the corresponding deuterium absorption lines, slightly shifted in
wavelength from the hydrogen absorption. From this, one could calculate the
deuterium abundance relative to hydrogen, often written as [D/H], and then derive
Ωb,0 h2 independently of the CMB.
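The size of the deuterium isotope shift follows from the reduced-mass dependence of the hydrogenic energy levels. A minimal sketch, treating the deuteron as two proton masses (accurate to a few parts in 10³, an approximation not taken from the text):

```python
ME, MP = 9.109e-31, 1.6726e-27   # electron and proton masses / kg
MD = 2 * MP                       # deuteron mass (neutron ~ proton assumed)
C = 2.998e5                       # speed of light / km s^-1

# Hydrogenic energies scale with the reduced mass mu = me*M/(me + M),
# so lambda_D/lambda_H = mu_H/mu_D; express the shift as a velocity.
mu_h = ME * MP / (ME + MP)
mu_d = ME * MD / (ME + MD)
v_shift = C * (mu_h / mu_d - 1)
print(v_shift)  # ~ -82 km/s: deuterium lines sit blueward of hydrogen
```

This ≈ 82 km s⁻¹ offset is why the deuterium lines appear slightly blueward of the corresponding hydrogen absorption in velocity space.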

Chapter 8 The intervening Universe

This is a difficult experiment that requires a confirmed low-metallicity Lyman α
absorber. Figure 8.5 shows a Lyman α absorber in the quasar Q 0913+072, which
is strong enough to be almost completely opaque in the centre.

Figure 8.5 Damped Lyman α system in the quasar QSO 0913+072. The
spectrum has been divided by the expected quasar flux, so a flux of 1.0 means no
absorption. The x-axis is the observed wavelength λ/Å. The red line shows the
best-fit Voigt profile (see Section 8.5).

This absorber was chosen to have very few associated absorption lines from
heavier elements. Higher Lyman transitions have lower optical depth, and in
Figure 8.6 the companion deuterium Lyman lines can be seen, slightly blueward
of the hydrogen absorption. The [D/H] abundance depends on the hydrogen
column density determined from the Lyman α profile in Figure 8.5, which is
difficult to fit given the presence of other intervening Lyman α absorbers. (This
is the principal source of systematic uncertainties.) The deuterium abundance is
log10 [D/H] = −4.56 ± 0.04, which when combined with other similar
measurements gives Ωb,0 h² = 0.0213 ± 0.0010. Later in this chapter we shall see
how little of this baryonic content of the Universe is stars and planets, and how
much is still in its primordial state.

Figure 8.6 Hydrogen and deuterium absorption (panels Ly 6 to Ly 10) in the
quasar QSO 0913+072. The spectrum has been divided by the expected quasar
flux, so a normalized flux of 1.0 means no absorption. In this notation, Lyman α
is Ly 1, Lyman β is Ly 2, and so on. The x-axis units are km s⁻¹ relative to the
quasar hydrogen Lyman series absorption. The red lines mark the expected
positions of the hydrogen and deuterium absorption. Ly 11 is off to the left of the
bottom panel.

Figure 8.7 compares the WMAP cosmological parameter constraints with the
constraint from the [D/H] abundance. This figure shows that combining the
WMAP data with the measured baryon density requires that ns < 1. Several lines
of evidence now appear to disfavour an ns = 1 scale-invariant spectral index. In
inflationary models, ns depends on the shape of the inflation potential. Is this
measurement a hint of the new physics of the inflation potential? Inflation models
with ns ≠ 1 also predict a gravitational wave background that might eventually be
detectable directly in future gravitational wave observatories, or whose effects
may be measurable in the polarized CMB with the recently launched Planck space
telescope or other later CMB missions. This will be a critical consistency test for
inflation.
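To see how the ±0.04 dex error on [D/H] maps onto Ωb,0 h², one can use an approximate power-law sensitivity D/H ∝ (Ωb h²)^−1.6, an assumed scaling anchored to the values quoted above rather than the full nucleosynthesis network calculation:

```python
# Assumed BBN power-law scaling D/H ~ (Omega_b h^2)^(-1.6), anchored to the
# measurement quoted in the text; a rough sketch, not an exact BBN result.
DH_REF, OBH2_REF = 10 ** -4.56, 0.0213

def obh2_from_dh(log10_dh):
    """Invert the assumed power law to get Omega_b h^2 from log10[D/H]."""
    return OBH2_REF * (10 ** log10_dh / DH_REF) ** (-1 / 1.6)

# Propagate the +/-0.04 dex measurement error through the power law:
sigma = 0.5 * (obh2_from_dh(-4.56 - 0.04) - obh2_from_dh(-4.56 + 0.04))
print(sigma)  # ~ 0.0012, comparable to the quoted +/-0.0010
```

The single-absorber error is slightly larger than the quoted ±0.0010, as expected, since the quoted value combines several similar measurements.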
8.4 The column density distribution
We’ve seen that the Universe is awash with primordial hydrogen that follows the
filaments and clumps of the underlying matter distribution. But is most of this
hydrogen still lurking in wispy filaments, or is it already in galaxy-sized clumps
waiting to be turned into galaxies?


Figure 8.7 Constraints on the baryon density of the Universe, Ωb,0 h², and on the
primordial spectral index of scalar density perturbations ns . The points sample the
allowed distribution from the WMAP data, coloured according to the Hubble
parameter H0 in units of km s⁻¹ Mpc⁻¹. The shaded regions are the 1σ and 2σ
bounds on Ωb,0 h² based on the deuterium abundance. (1σ means that there is a
≈ 68% chance of the true value lying in that range; 2σ corresponds to 95%.) The
curves are the 1σ and 2σ constraints from combining the WMAP measurements
with the deuterium abundance.
In Chapter 4, we used the luminosity function φ(L) of galaxies to find which
galaxies contribute most of the luminosity in the Universe: they were around the
peak of the L φ(L) distribution. In Chapter 5, we also used the source counts
dN/dS to find which galaxies dominate the extragalactic background light: they
were around the peak of the S dN/dS distribution. We shall use a similar
trick with the column density distribution to find out where most of the neutral
hydrogen is in the Universe.
The numbers of Lyman α clouds change strongly with cosmic time. Figure 8.8
shows the spectrum of a quasar at low redshift. Comparison with Figure 8.1
shows that the low-redshift quasar clearly has far fewer Lyman α absorbers than
the high-redshift quasar. It’s tempting to suppose that this is exactly the emptying
out of the voids, and filling up of the overdensities, that the cosmological
simulations predict. However, that supposes that we’re sampling the same
comoving volume in the two spectra. For example, could the 1250–1350 Å
observed wavelength range in the low-redshift quasar just be sampling much less
volume than the 4600–4700 Å observed wavelength range in the high-redshift
quasar? This might explain why there are fewer Lyman α lines at low redshift.
To find out, we shall calculate the number of absorbers that a photon would
encounter along its travel from the quasar to us.
Unfortunately there’s an annoying collision of notation: NH I is conventionally
used to mean column density, while N is conventionally used to mean numbers in
source counts dN/dS. The column density distribution (i.e. the number of
absorbers per unit column density) would then be dN/dNH I . To avoid this
clumsy notation, it’s conventional to use N to mean the number of absorbers.


Figure 8.8 The neighbourhood of the Lyman α emission line of the z = 0.158
quasar 3C 273 (flux in 10⁻¹⁶ J m⁻² s⁻¹ Å⁻¹ against rest-frame wavelength
λrest /Å). Compare the lack of Lyman α forest lines with the high-redshift quasar
in Figure 8.1.

The number of absorbers that a photon might encounter will be proportional to the
path length that it travels, dY, and proportional to the density of absorbers ρ, and
to the average geometrical cross section A of any single cloud. (Don’t confuse
this with the absorption cross section σ of a single atom.) Since ρ = nco (1 + z)³,
where nco is the comoving density of the absorbers, and dY = c dt, we can write
the number of absorbers encountered by a photon in a cosmic time interval
t → t + dt as dN = nco (1 + z)³ Ac dt. It’ll be useful to have the number of
absorbers per unit column density per unit redshift, which we can write as

d²N = nco (NH I , z) A(NH I , z) (1 + z)³ c |dt/dz| dNH I dz. (8.2)

We’ve written d²N as a double differential, which it is, but be warned that some
texts use dN when referring to d²N .

Exercise 8.5 Suppose that the absorbers have a constant comoving
space density, and constant proper sizes and cross sections. Show
that the number of absorbers per unit X would be constant, where
dX/dz = (1 + z)² H0 /H(z). ■
The quantity X(z) in Exercise 8.5 is sometimes known as the absorption
distance.
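The absorption distance of Exercise 8.5 is straightforward to evaluate numerically. A sketch assuming a flat universe with Ωm = 0.3, ΩΛ = 0.7 (illustrative values; the function name is ours, not from the text):

```python
import numpy as np

def absorption_distance(z, om=0.3, ol=0.7, n=200_001):
    """X(z) = int_0^z (1 + z')^2 H0/H(z') dz' for a flat universe,
    with H(z)/H0 = sqrt(om (1+z)^3 + ol)."""
    zp = np.linspace(0.0, z, n)
    integrand = (1 + zp) ** 2 / np.sqrt(om * (1 + zp) ** 3 + ol)
    return np.trapz(integrand, zp)

# Sanity check: for Einstein-de Sitter (om = 1, ol = 0) the closed form is
# X(z) = (2/3) [(1 + z)^(3/2) - 1].
print(absorption_distance(3.0, om=1.0, ol=0.0))  # ~ 4.67
print(absorption_distance(3.0))                  # ~ 7.7 for the assumed flat values
```

A redshift interval near z = 3 thus contributes far more absorption distance than the same interval near z = 0, which is part of why high-redshift quasar spectra show so many more forest lines.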
It turns out that the number of lines N per unit absorption distance still evolves
strongly:

dN /dX = ∫ (d²N /(dX dNH I )) dNH I ∝ (1 + z)^(2.47±0.18) (8.3)

at 1.5 < z < 4 and column densities > 10¹⁸ m⁻². So the increase in the number
of lines at high redshifts is not only about the increase in comoving volume
sampled.
Figure 8.9 shows the column density distribution of Lyman α absorbers at
z > 1.5. We’ve used d²N /dX dNH I to mean the number of absorbers per unit
absorption distance X, per unit column density NH I . It’s also conventional to use
the symbol f to mean d²N /dX dNH I . From Equation 8.2, we have that

f (NH I , X) = d²N /(dX dNH I ) = nco A c/H0 , (8.4)

where nco and A are both functions of NH I and redshift. In practice, in a redshift
interval Δz corresponding to an absorption distance ΔX(z), we would count
the number of absorbers ΔN in a column density range ΔNH I , to estimate
f (NH I , z) ≈ (1/ΔX) ΔN /ΔNH I .

Figure 8.9 The column density distribution of Lyman α clouds, plotted as
log10 f (NH I , X) against log10 NH I with column densities measured in cm⁻².
Also shown is the best-fit single power-law model; the data, however, show a
clear steepening at the high column density end, and clearly depart from the
simple single power-law model.

The red line in Figure 8.9 is f ∝ NH I^(−1.3). The total column density in an
absorption distance ΔX, assuming that power-law model, is just

NH I,total ΔX = ΔX ∫ NH I f (NH I ) dNH I = ΔX ∫ NH I × NH I^(−1.3) dNH I ∝ NH I^(0.7), (8.5)

which diverges as NH I tends to infinity. There is evidence that the power-law
index of the column density distribution steepens at the highest measured column
densities, as it must for NH I,total to remain finite. Therefore most of the mass of
neutral hydrogen in the Universe at 1.5 < z < 5 was concentrated in the higher
column density absorbers. We shall find out more about these in the next section.
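The dominance of the high column density end can be quantified directly from the power law. A sketch (the integration limits and the cut-off are illustrative; the real distribution steepens before the upper limit):

```python
def mass_fraction_top_decade(beta=-1.3, n_min=1e18, n_max=1e25):
    """Fraction of int N f(N) dN (with f ~ N^beta) contributed by the
    top decade [n_max/10, n_max] of column density."""
    p = beta + 2  # multiplying by N and integrating raises the exponent to beta+2
    total = (n_max ** p - n_min ** p) / p
    top = (n_max ** p - (n_max / 10) ** p) / p
    return top / total

print(mass_fraction_top_decade())  # ~ 0.8: the top decade holds most of the mass
```

For a slope of −1.3, each decade of NH I contributes 10^0.7 ≈ 5 times more mass than the decade below it, so wherever the distribution is eventually cut off, the highest columns dominate.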

8.5 Damped Lyman α systems


The strongest Lyman α absorbers may have played a key role in the birth of
galaxies. The objects with the biggest neutral hydrogen column densities locally
are spiral discs. Could the strongest high-redshift Lyman α absorbers be their
primordial precursors? A primordial galaxy that’s gravitationally bound, but
which is being seen before it has formed stars, might look very much like a deep
Lyman α absorber.
The incidence of strong absorbers evolves intriguingly. Figure 8.10 shows the line
density dN /dX for absorbers with NH I > 2 × 10²⁴ m⁻². There is no evidence
for evolution from z = 2 to the present, but the incidence of strong absorbers
dropped by a factor of 2 from z = 4 to z = 2. Could this reflect primordial neutral
hydrogen clumps merging, which would reduce nco with time in Equation 8.2? Or
could this represent the consumption of gas by star formation, which would
reduce A with time? Simulations favour the latter option, but it’s frustratingly
impossible to tell from these data. We can only make inferences about the product
nco × A.


Figure 8.10 The evolution of the line density lDLA (X) of damped Lyman α
clouds, plotted against redshift z.
The highest column density absorbers, such as the one in Figure 8.5, are almost
completely opaque in their centres. The shape of the absorption profile can be
predicted from quantum mechanical considerations to follow a Lorentzian profile

L(λ) ∝ α/((λ − λ0 )² + α²),

where λ0 is the line centre, and α is some constant. (Hendrik Antoon Lorentz,
who derived this theoretical profile for absorption and emission lines, is also
famous today for his contributions to special relativity.)

Why are absorption lines not infinitely narrow? The cross section for absorption
near — but not equal to — the transition frequency ν = E(1 → 2)/h is not zero.
Heisenberg’s uncertainty principle makes the required energy E uncertain by
ΔE t1/2 ≈ h/(2π), where t1/2 is the half-life of the excited state. The frequency
width Δν would therefore give h Δν = h/(2πt1/2 ). Since Δλ ≈ Δν × |dλ/dν|,
and ν = c/λ where c is the speed of light, the wavelength width is

Δλ = (λ²/(2πc)) × (1/t1/2 ). (8.6)

In general, if the final state is not stable and has its own half-life tfinal , the
wavelength width is

Δλ = (λ²/(2πc)) × (1/tinitial + 1/tfinal ), (8.7)

where tinitial is the half-life of the initial state. (This holds whether the transition is
emission or absorption.)
This gives us a width, but not the detailed shape of the Lorentzian profile.
Deriving the Lorentzian profile is beyond the scope of this text, but Lorentz
himself derived it prior to the invention (or discovery) of quantum mechanics,
using a classical argument: he imagined atoms as oscillators, which are being
forced by the external electromagnetic field, and which lose energy via radiation
(which in quantum mechanics is spontaneous emission). This last effect appears
as a damping term in the equation of motion, and is responsible for the α term in
the Lorentzian profile expression above. Expressions similar to the Lorentzian
profile occur in physical systems with damped, forced harmonic oscillation.
There are also larger-scale reasons why the absorption is not infinitely narrow.
The gas will have some motion (e.g. turbulence or coherent rotation), which will
262
8.5 Damped Lyman α systems

cause Doppler shifts. There may also be Doppler shifts from the atoms’ thermal
motion, known as thermal broadening. (A third possibility, relevant in stars but
not at the expected densities of Lyman α systems, is pressure broadening: the
presence of nearby atoms affects the photon emission of any particular atom.)
The Doppler broadening in Lyman α clouds is generally treated as a Gaussian
distribution, because a Gaussian form occurs in the expected Maxwell–Boltzmann
thermal velocity distribution: Pr(v) ∝ e^(−mv²/(2kT )).
To find the total effect on the absorption profile, we convolve this Gaussian
distribution with the Lorentzian profile. The resulting curve is known as the Voigt
profile. Figure 8.5 shows the best-fit Voigt profile for this damped Lyman α
system. By analogy with the classical case, the shape of the wings of the profile
depends on the damping term in the oscillator equation of motion. Since the
centres of the profiles in these absorbers are essentially black (i.e. essentially
completely opaque), these damped wings dominate the profile shape, which is
why these Lyman α absorbers are known as ‘damped’. This happens typically at
column densities > 10²⁴ m⁻² or so.
The depth of the absorption can also be expressed as equivalent width, illustrated
in Figure 8.11. This is defined by imagining another absorption line, which
removes the same energy but is completely opaque and has a rectangular shape.
The width of this line (W in Figure 8.11) is the equivalent width. Note that this is
just a measure of the intensity of the absorption, and has nothing to do with
velocity widths. Mathematically the equivalent width is

W = ∫ ((C(λ) − S(λ))/C(λ)) dλ, (8.8)

integrated over all wavelengths, where C(λ) is the continuum level without the
absorption, and S(λ) is the observed spectrum with the absorption. In terms of
optical depth, equivalent width can be written as

W = ∫ (1 − e^(−τ(λ)) ) dλ. (8.9)

How can we use the observed equivalent widths to derive the column densities?

Figure 8.11 The equivalent width, marked as W , is the width of the box that
has an area (hatched) the same as the area of the absorption line (in yellow),
plotted as flux against wavelength.

Figure 8.12 shows the ‘curve of growth’ for damped Lyman α absorption,
meaning a curve of how the equivalent width depends on the optical depth. The
optical depth τ is related to the column density:

τ (λ) = σ(λ) NH I , (8.10)

where σ is the cross section for absorption. The equivalent width increases
linearly with optical depth: W ∝ τ for small τ . This regime corresponds to
overdensities of δρ/ρ ≈ 0–15, corresponding to the linear or mildly non-linear
regime of cosmological structure formation. Once τ is around unity, the absorber
is essentially black, and there is little change to the equivalent width with
increasing optical depth until the column density is high enough for the damping
wings to start affecting the equivalent width. Once τ > 10⁵ or so, the equivalent
width increases as the square root of τ .
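These regimes can be reproduced with a toy evaluation of Equation 8.9 for a purely Gaussian (Doppler) optical depth profile. The Doppler parameter b = 20 km s⁻¹ and the wavelength window are illustrative assumptions, and damping wings are omitted, so only the linear and flat (saturated) parts of the curve of growth appear:

```python
import numpy as np

def equivalent_width(tau0, b=20.0, lam0=1215.67):
    """W = int (1 - exp(-tau)) dlambda for a Gaussian optical depth profile
    of central depth tau0; b in km/s, wavelengths in Angstroms."""
    lam = np.linspace(lam0 - 5.0, lam0 + 5.0, 200_001)
    dl = lam0 * b / 3.0e5                 # Doppler width in Angstroms
    tau = tau0 * np.exp(-(((lam - lam0) / dl) ** 2))
    return np.trapz(1.0 - np.exp(-tau), lam)

w1, w2 = equivalent_width(0.01), equivalent_width(0.02)
print(w2 / w1)   # ~ 2: linear regime, W proportional to tau0
w3, w4 = equivalent_width(1e3), equivalent_width(1e4)
print(w4 / w3)   # ~ 1.15: saturated regime, W grows only logarithmically
```

Doubling τ0 doubles W when the line is weak, but a factor of ten in τ0 changes W by only ~15% once the core is black, exactly the flat part of Figure 8.12.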

Figure 8.12 The variation of equivalent width W/Å with optical depth at the
line core, τ0 . The regimes in which W grows linearly, logarithmically and as the
square root of τ0 are marked; the insets show example line profiles in each
regime.
In Exercise 8.4 we found that Lyman α absorbers above the threshold for Lyman
limit absorption are self-shielded, i.e. they are dense enough that ionizing
radiation has an optical depth > 1 and does not penetrate the cloud. This implies
that the gas in these higher column density absorbers must be mostly neutral,
particularly in damped Lyman α systems. This cannot be said of the Lyman α
forest in general, as we shall see later in this chapter.
In the nearby Universe, the objects with the biggest neutral hydrogen column
densities are spiral discs, and this continues to about z ≈ 1.6. Could the
higher-redshift damped Lyman α systems be the primordial progenitors of these
spiral discs? Even if they are, how can we separate the faint light of these
primordial galaxies from the glare of the background quasar?
One approach is to take high-resolution spectra to try to detect narrow emission
lines from star formation in the galaxy causing the damped Lyman α system.

Astronomers have looked for Hα, redshifted into the infrared, or Lyman α in the
centre of the damped absorption trough. Similarly, one can take an image with a
narrow filter, chosen to cover the dark central region of the damped absorption
trough (see Figure 8.13); such an image may also detect faint, narrow Lyman α
emission from star formation. Only three damped Lyman α systems at z > 1.6
have any star formation detected so far using emission lines, though several have
upper limits. It seems that we are seeing a key stage in the assembly of galaxies,
before they are luminous.

Figure 8.13 The left panel shows the transmission of a narrow-band filter (right-hand axis), compared to the
spectrum of a damped Lyman α system in the quasar PKS 0528−250. Light that passes through this filter should
have little or no contribution from the background quasar. The right image (65″ × 65″ in size) is taken through this
narrow-band filter. The position of the quasar is marked as a red cross. Nearby, there is a galaxy that is ostensibly
responsible for the damped Lyman α absorption.

Another possibility is to use other absorption lines in the quasar spectrum. If the
interstellar medium of the galaxy causing the damped Lyman α absorption has
been enriched by star formation, there should be metal absorption lines in the
quasar spectrum, and these have been detected in many systems. Also, the C II∗
133.57 nm absorption line has been argued to correlate well with the [C II]
158 µm emission line, which in turn is an indirect star formation rate indicator.
From this it’s possible to estimate the star formation rate in projection, in units of
M⊙ per year per kpc², in damped Lyman α systems. However, it turns out that the
mean metal content of damped Lyman α systems is about 10 times lower than
expected from their inferred cosmic star formation history! Could rapid star
formation use up the neutral hydrogen, so damped systems don’t stay damped and
others take over? At these low star formation rates, the timescales are too slow for
this to work. Could the metals be ejected from supernova-driven winds? This
would disagree with observations of the metallicity of the intergalactic medium.
It’s not clear what the solution is, but some approaches that have the neutral gas
spatially distinct in the absorbing galaxies from their active star forming regions

may be consistent with the data. Whatever the solution, it’s clear that damped
Lyman α absorption gives us a unique window into otherwise invisible aspects of
galaxy formation.

Exercise 8.6 There is some evidence that the highest column density damped
Lyman α systems are more common in quasars with bright apparent magnitudes.
What could cause this?
Exercise 8.7 How could one use observations of the background quasars to
investigate the dust content of damped Lyman α systems? ■

8.6 The proximity effect


Lyman limit systems and damped Lyman α systems may be self-shielded, but
ionizing photons will penetrate lower column density systems. The Lyman α
absorption comes only from neutral hydrogen — how much hydrogen is being
missed by absorption line surveys? A great deal, as it turns out.
The equilibrium ionized fraction (conventionally, if bizarrely, given the symbol x)
is set by the balance between recombination and photoionization: the number of
atoms being ionized must equal the number recombining. The latter is xnH βne ,
where β = β(T ) is the temperature-dependent recombination rate, nH is the total
hydrogen density, xnH is the number of hydrogen ions (i.e. protons) available, and
ne is the number of electrons. The number of atoms being ionized is given
by the number of neutral atoms, i.e. (1 − x)nH , times the cross section for
absorption σ, times the flux density of ionizing photons. These last two terms are
frequency-dependent, so we integrate over frequency and find
xnH βne = (1 − x)nH ∫ (4πIν /(hν)) σ(ν) dν, (8.11)

integrated over ν > νion , where Iν is the ionizing background and hνion = 13.6 eV
is the minimum energy needed to ionize hydrogen. The factor of hν converts from
energy flux Iν to photon flux Iν /(hν). Cancelling the nH term gives

xβne = (1 − x) ∫ (4πIν /(hν)) σ(ν) dν. (8.12)

It turns out that the temperature dependence of β is not important, since the
gas tends to be at kT ≪ 13.6 eV. Therefore the ionization fraction is mainly
determined by the balance between the numbers of ionizing photons and free
electrons. It’s useful to define the dimensionless ionization parameter U to be
the number of photons per electron:

U = nγ /ne = (1/ne ) ∫ (4πIν /(chν)) dν. (8.13)

The hydrogen ionization then comes out as about

x/(1 − x) = U/10^(−5.2). (8.14)
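Equation 8.14 can be turned into numbers. This sketch uses illustrative values that are assumptions rather than quantities given at this point in the text: a photon density of 63 m⁻³ (corresponding to J−21 = 1 with α = 1, quoted later in the chapter) and an electron density of order the mean baryon density at z ≈ 2.5.

```python
def neutral_fraction(U):
    """Invert x/(1 - x) = U/10^-5.2 (Equation 8.14) and return the
    neutral fraction 1 - x."""
    r = U / 10 ** -5.2
    return 1.0 / (1.0 + r)

# Illustrative (assumed) numbers: n_gamma = 63 m^-3, n_e ~ 8 m^-3
nf = neutral_fraction(63 / 8)
print(nf)  # ~ 8e-7: the low column density forest is almost fully ionized
```

Even a modest metagalactic background thus keeps typical forest gas ionized to about one part in 10⁶, which is why the Lyman α forest traces only the neutral ‘tip of the iceberg’.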
So the amount of ionized (and invisible) hydrogen depends on the ambient
ionizing light that the Lyman α clouds experience. But how can we find out what
ambient light they experience?

The vital clue has come from the proximity effect in quasar spectra: as we
approach the redshift of any quasar, the numbers of Lyman α clouds in that
quasar’s spectrum decreases. This is caused by the ionizing radiation from the
quasar, which can be estimated independently from extrapolating the quasar
spectrum. When the quasar’s ionization equals that from the ambient background,
the number of Lyman α clouds dN /dX will be half the number that there are
elsewhere (e.g. along other lines of sight to other quasars, far from a quasar).
At z ≈ 2.5 the background turns out to be around 10⁻²⁴ J m⁻² s⁻¹ Hz⁻¹ sr⁻¹. As
with the Hubble parameter, this is sometimes expressed as a dimensionless
quantity: Iν = J−21 × 10⁻²¹ (νion /ν)^α erg cm⁻² s⁻¹ Hz⁻¹ sr⁻¹, where α is the
slope of the spectrum. (Note: 1 erg cm⁻² = 10⁻³ J m⁻².) In other words, J−21 is
the background at νion in units of 10⁻²¹ erg cm⁻² s⁻¹ Hz⁻¹ sr⁻¹. If α = 1, then
J−21 = 1 corresponds to a proper photon density of 63 photons m⁻³.
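The quoted 63 photons m⁻³ follows directly from integrating the power-law background over frequency; a quick check with rounded constants:

```python
import numpy as np

H_PLANCK = 6.626e-34   # J s
C = 2.998e8            # m s^-1

def photon_density(J21=1.0, alpha=1.0):
    """n_gamma = int 4*pi*I_nu/(c h nu) dnu over nu > nu_ion, for
    I_nu = J21 x 1e-24 (nu_ion/nu)^alpha J m^-2 s^-1 Hz^-1 sr^-1.
    The frequency integral evaluates to 1/alpha, independent of nu_ion."""
    return 4 * np.pi * J21 * 1e-24 / (C * H_PLANCK * alpha)

print(photon_density())  # ~ 63 photons m^-3
```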
The value of J−21 comes curiously close (within a factor of a few) to the total
ionizing background estimated from integrating the quasar luminosity function.
Do star-forming galaxies provide the rest of this ionizing background? The
similarity of the quasar contribution to the total would then just be a coincidence.
Or are there errors or inaccuracies in the calculations, and quasars provide it all?
The jury is still out. In case ‘coincidence’ is read as pejorative, remember that
there are other coincidences in astronomy (indeed, there must be): for example,
the similar angular sizes of the Sun and the Moon are a coincidence that makes
total solar eclipses possible. In any case, this ‘coincidence’ may reflect some
underlying physical connection between quasar activity and star formation,
already hinted at in the Magorrian relation.
The ionizing background is a fairly constant J−21 ≈ 1 at 1.6 < z < 4, but there is
a very quick decline in the ionizing background at z < 1.6. At z ≈ 0.5, J−21 is
only 6 × 10⁻³, as the epochs of cosmic quasar activity and star formation draw to
a close. At the earliest cosmic epochs, the ability of primordial galaxies to ionize
their environments will depend on the escape fraction of ionizing photons from
these galaxies, of which we shall hear more later in this chapter.
Finally, a creative way to constrain the lifetimes of quasars and test the isotropy of
their emission is the transverse proximity effect: if you have two quasars that
have different redshifts but appear close on the sky, then you can use the Lyman α
forest in the spectrum of the more distant quasar to measure the ionization effect
of the nearer quasar. If quasars are found in rich environments on average, this
will complicate the interpretation, since the richer environment might compensate
for the loss of Lyman α clouds from ionization. (A similar bias may be present in See Goncalves, Steidel and
the proximity effect measurements of J−21 .) Pettini, 2008, Astrophysical
Journal, 676, 816.

8.7 ΩH I , the neutral hydrogen density parameter


We’ve seen that it’s frustratingly impossible to constrain the comoving number
density of absorbers nco separately from their absorption cross section A, along
any one line of sight. However, it turns out to be possible to estimate the
comoving density of neutral hydrogen.
The total neutral hydrogen mass of any single absorber must be µmH NH I A,
where µ is the mean molecular mass, and mH is the mass of a hydrogen atom. We


include the µ term to account for the contribution of helium to the neutral gas
mass. The comoving neutral hydrogen matter density must therefore be
ρH I (z) = µmH ∫ nco NH I A(NH I , z) dNH I
         = (H0 µmH /c) ∫ NH I f (NH I , z) dNH I , (8.15)

using Equation 8.4. It’s conventional to measure ρH I in units of the critical
(matter) density, i.e.

ΩH I = 8πG ρH I /(3H²) (8.16)

(compare Equation 1.15). In practice, the total H I is estimated over an absorption
distance ΔX by summing the column densities in the interval ΔX:

∫ NH I f (NH I , z) dNH I ≈ (1/ΔX) Σi NH I,i .
Here, NH I,i refers to the column density of the ith absorber.
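Equations 8.15 and 8.16 combine into a simple estimator. A sketch with illustrative numbers: h = 0.7, the helium correction µ = 1.3, and the toy absorber sample are all assumptions, not values from the text.

```python
import numpy as np

G = 6.674e-11              # m^3 kg^-1 s^-2
M_H = 1.6726e-27           # kg, hydrogen atom mass
C = 2.998e8                # m s^-1
H0 = 70e3 / 3.086e22       # s^-1, assuming h = 0.7
MU = 1.3                   # assumed helium correction to the neutral gas mass

def omega_hi(columns_m2, delta_X):
    """Estimate Omega_HI from summed column densities (Eqs 8.15-8.16)."""
    integral = np.sum(columns_m2) / delta_X      # int N f dN ~ sum(N_i)/dX
    rho_hi = H0 * MU * M_H / C * integral
    rho_crit = 3 * H0 ** 2 / (8 * np.pi * G)
    return rho_hi / rho_crit

# Toy sample: ten damped systems of N_HI = 3e24 m^-2 over delta_X = 30
print(omega_hi(np.full(10, 3e24), 30.0))  # of order 1e-3, the scale of Fig. 8.14
```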
Figure 8.14 shows the evolution in ΩH I measured by quasar absorption lines.
At the time of writing, the picture is somewhat confusing. Over the redshift
interval 2 < z < 6, there seems to be significant evolution, consistent with the
consumption of gas by star formation. The data point at z = 0 is consistent with
this broad trend. However, at 0.16 < z < 2 there are marginally discrepant data
points. It’s not yet clear what the causes of the discrepancy are, or whether this
represents a genuine effect.

lookback time/Gyr
0 4 6 8 9 10 11 12
1.5

1.0
ΩH I /10−3

0.5

Figure 8.14 The redshift


0.0 evolution of ΩH I . Data from
0 1 2 3 4 5 different sources are shown in
redshift, z
different colours.

It was originally thought that the high-redshift ΩH I matched the current comoving
stellar mass density, often written as Ω∗ . The decline in ΩH I could then be
attributed to the consumption of gas by star formation in galactic discs. This
interpretation, in which damped Lyman α systems (which dominate ΩH I ) do not
interact much with their environment, is sometimes called the ‘closed box’

model. Curiously, closed box models consistently overestimated the number of
low-metallicity stars in the solar neighbourhood, known as the G-dwarf problem.
However, this interpretation of the evolution in ΩH I rested on the assumption
of an Ωm = 1, Λ = 0 universe. The advent of the concordance cosmology
(Chapter 2) changed this neat picture. It seems now that the present-day Ω∗ is
significantly larger than ΩH I at any epoch (Figure 8.14). Hand in hand with these
observational changes, numerical simulations changed the theoretical picture. It’s
now thought that the neutral gas used up by star formation in damped Lyman α
systems can be replenished by accretion of hydrogen from the intergalactic
medium. This would naturally explain why the observed ΩH I (which is dominated
by damped Lyman α clouds) is at all times less than the present-day Ω∗ . In
hierarchical structure formation, matter overdensities often accrete neighbouring
clumps (Chapter 4), or merge with larger neighbours. Damped Lyman α systems
are seen as dynamic neutral gas reservoirs, in this picture.
Even before the influential Madau diagram (Chapters 4 and 5), Pei and Fall
presciently modelled the consumption of gas in damped Lyman α systems and
broadly correctly predicted the shape of the cosmic star formation history (Pei,
Y.C. and Fall, S.M., 1995, Astrophysical Journal, 454, 69).

8.8 How big are Lyman α clouds?


As we’ve seen, it’s frustratingly impossible to tell from one Lyman α forest what
the sizes are of the Lyman α clouds. However, there are important clues from
quasar pairs and gravitationally-lensed quasars. If Lyman α clouds are large
enough, they will intersect more than one quasar line of sight, so we can constrain
the sizes of Lyman α clouds by comparing the Lyman α forests of nearby quasars
or lensed images of a quasar. Lensed quasars have strikingly similar Lyman α
forests (e.g. Figure 8.15), which turn out to give stringent lower limits to Lyman α
cloud sizes of > 25 h−1 kpc.

Figure 8.15 The Lyman α forests in two images of the gravitationally lensed
quasar UM 673, plotted as count rate (counts s⁻¹) against observed wavelength
λ/Å. The upper spectrum has been scaled by 7.9 and offset by 0.5, and the lower
spectrum offset by 0.1. The bottom panel shows the difference of the spectra.

Quasar pairs, on the other hand, are more widely separated and probe separations
of 1–2 h−1 Mpc. The line-of-sight comparisons are much less striking. From
statistical cross-correlations, there do appear to be some coherent structures on
Mpc scales, but it’s less clear that one is taking two lines of sight through a single,
coherent object — one might be just tracing the same large-scale structure.


These size constraints are already enough to constrain the physics of Lyman α
clouds. One early suggestion was that the Lyman α clouds are neutral clumps in
pressure equilibrium with a tenuous ionized medium, but the predicted range of
sizes of 0.03–30 kpc in this model is inconsistent with these size observations.
However, as we’ve seen, the sizes are consistent with cosmological simulations in
which the Lyman α forest is the neutral ‘tip of the iceberg’ of the predominantly
ionized hydrogen gas, which follows the bottom-up gravitational collapse of
matter perturbations.

8.9 Reionization and the Gunn–Peterson test


After the epoch of recombination at z ≈ 1000, about the time of the CMB, the
Universe entered what’s sometimes called its dark ages. The first luminous objects
in the Universe formed long after the epoch of recombination, and reionized the
Universe. We’ve already seen in Chapter 2 that the free electrons generated in
reionization can Thomson scatter the microwave background photons. To a
good approximation, this reduces the temperature fluctuations on all scales, i.e.
δT /T → (δT /T ) exp(−τT ). The effect of this can therefore be seen in the Cℓ
anisotropies of the microwave background. The optical depth to Thomson
scattering (which we write here as τT to distinguish it from other optical depths in
this chapter) is one of the free parameters in the CMB fits, and the reionization
epoch has a 95% confidence constraint from the CMB of zreion = 11 ± 3.5,
assuming a single instantaneous reionization redshift. Can we do better, and find
when and how the first objects lit up the Universe?
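To get a feel for the size of this suppression, here is a short numerical sketch. The value τT = 0.09 is an assumed illustrative optical depth of the order implied by the CMB fits, not a value quoted in this chapter.

```python
import math

# Thomson-scattering suppression of CMB temperature fluctuations:
# dT/T -> (dT/T) * exp(-tau_T) on scales inside the horizon at reionization.
tau_T = 0.09  # assumed illustrative optical depth to reionization

suppression = math.exp(-tau_T)
print(f"fluctuation amplitude suppressed to {suppression:.3f} of its value")
# The power spectrum scales as the square of the temperature fluctuation:
print(f"C_l suppressed to {suppression**2:.3f} of its value")
```

Even a ten per cent optical depth therefore damps the power spectrum by nearly twenty per cent, which is why τT is well constrained by the CMB fits.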
As the first luminous objects formed, they ionized their immediate surroundings,
then more distant regions. These ionized bubbles or regions, known as Strömgren
spheres, eventually overlapped, and the Universe was almost entirely reionized,
apart from the self-shielded clumps. A simulation of this is shown in Figure 8.16.

Figure 8.16 Numerical simulation of reionization in a (2h−1 )3 comoving Mpc3 volume by Nick Gnedin. The
brown opaque fog symbolizes neutral hydrogen. Glowing blue gas is dense ionized hydrogen, and less dense
ionized hydrogen is rendered as transparent. Yellow dots represent galaxies. The redshifts shown are z = 12.1,
10.4, 9.1, 8.1, 7.3, 6.6, 6.3, 6.0.


It’s easy to show that most of the Universe at 1.6 < z < 4 is, on average, ionized.
The present-day density of the Universe is ρ0 = 1.8789 × 10−26 Ω0 h2 kg m−3
(Chapter 1). Putting in the nucleosynthesis value of Ωb,0 h2 ≈ 0.015, and
remembering that density scales as (1 + z)3 , we find the baryon density of the
Universe to be ρb (z) ≈ 2.8(1 + z)3 × 10−28 kg m−3 . About 75% is hydrogen, as
we’ve seen, and the mass of a proton is 1.67 × 10−27 kg, so there are on average
about 0.13(1 + z)3 hydrogen ions or atoms per cubic metre. The average free
electron density is therefore ne = 0.13(1 + z)3 x per cubic metre, where x is the
average hydrogen ionization fraction. We’ve already seen that the estimated
J−21 ≈ 1 at z ≈ 3 implies about nγ = 63 photons per cubic metre, so the
ionization parameter is U = nγ /ne = 500x−1 (1 + z)−3 . Using Equation 8.14 we
can find a quadratic equation for x:

x^2 = (10^7.9 /(1 + z)^3 ) (1 − x),

for which the only positive solution is (1 − x) ≈ 10^−8 (1 + z)^3 , i.e. x ≈ 1.
Therefore the z ≈ 3 Universe should on average be highly transparent to
Lyman α, and it’s only because of density inhomogeneities that any Lyman α
absorbers can be seen. If we assume that a Lyman α cloud is 25 kpc in size
(Section 8.8), the neutral hydrogen density must be around ρH I ≈ NH I /25 kpc,
which comes out as ρH I ≈ (NH I /10^19 m−2 ) × 0.013 atoms per cubic metre. This
is much lower than the total hydrogen density of the Universe from primordial
nucleosynthesis (see above), so again we see that most of the hydrogen must be
ionized.
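As a numerical check on this solution, here is a minimal sketch, taking z = 3 and solving the quadratic x² = 10^7.9 (1 − x)/(1 + z)³ for its positive root.

```python
import math

# Average hydrogen ionization fraction x at redshift z, from the quadratic
# x^2 = 10^7.9 * (1 - x) / (1 + z)^3 derived in the text.
z = 3.0
a = 10**7.9 / (1 + z)**3             # rewrites the quadratic as x^2 + a*x - a = 0
x = (-a + math.sqrt(a*a + 4*a)) / 2  # the positive root
one_minus_x = 1 - x                  # the average neutral fraction

print(f"ionized fraction x = {x:.10f}")
print(f"neutral fraction 1 - x = {one_minus_x:.2e}")  # tiny: almost fully ionized
```

The neutral fraction comes out at a few times 10⁻⁷, in line with the order-of-magnitude estimate (1 − x) ≈ 10⁻⁸ (1 + z)³ quoted above.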
What’s more, an ionizing flux of J−21 = 1 is clearly enough to ionize the
Universe at any redshift for which we are likely to observe an object. However, if
we see the highest-redshift Universe becoming opaque on average to Lyman α
photons, then J−21 must have dropped sharply, and the Universe will be
predominantly neutral. This was first proposed by Gunn and Peterson in 1965 and
is now known as the Gunn–Peterson test. The transition between opaque and
transparent would then be probing the epoch of reionization in which the first
Strömgren spheres expand around the very first luminous objects in the Universe.
We had to wait several decades for the first thrilling hints of reionization in
quasar spectra from a Gunn–Peterson absorption trough at the highest redshifts.
Figure 8.17 shows the spectra of the highest redshift quasars — note the rapidly
decreasing flux in the Lyman α forest at redshifts z > 6. We can convert
this to a Lyman α optical depth, shown in Figure 8.18. Whether this represents a
transition to the epoch when the Strömgren spheres were just beginning to overlap
is still a matter of debate. The Lyman α opacity is sensitive to the presence of rare
voids in the intergalactic medium, so measurements of the Gunn–Peterson
trough are sensitive to assumptions about the distribution of gas. Known quasars
are also biased tracers of the underlying matter distribution, and the quasars
or starbursts responsible for reionization may well also have been strongly
biased, so reionization is likely to have been inhomogeneous. Nevertheless, these
high-redshift quasars are the first to give useful constraints on reionization
simulations.


[Figure: spectra of nineteen SDSS quasars at redshifts 5.74 ≤ z ≲ 6.4, each
labelled by name and redshift, plotted against observed wavelength over
7000–9800 Å.]

Figure 8.17 High-redshift quasars in the Sloan Digital Sky Survey (SDSS).
Note the increasing Gunn–Peterson opacity at the highest redshifts.

[Figure: effective optical depth τeff (0–6) versus absorber redshift zabs (3–6).]

Figure 8.18 The opacity of the Universe to Lyman α photons. Data on different
quasars are plotted in different colours, with lower limits shown as arrows.

Was the first light in the Universe from star formation, or from black hole
accretion in quasars? At redshifts z > 3 the comoving number density of the most
luminous quasars drops quickly (Chapter 4). The slope of the quasar luminosity
function tells us whether fainter quasars could be important. It turns out that the
slope at z > 4 is shallower than at low redshift, which implies that quasars did not
contribute the majority of J−21 during the tail-end of reionization at z ≈ 6.
Another possibility for the origin of the first light is star-forming galaxies, since
young massive O and B stars are prodigious emitters of ionizing radiation. But
what fraction of this ionizing radiation escapes star-forming galaxies? It’s
difficult to measure Lyman continuum photons from high-redshift galaxies
because of the presence of intervening Lyman α absorbers and Lyman limit
systems; measurements of escape fractions so far range from fesc < 0.1 to
fesc > 0.5. However, even assuming fesc = 1, the luminosity function of z > 6
optically-selected galaxies suggests, as for quasars, that they are insufficient to
reionize the Universe.
Perhaps a new population of objects — such as intermediate mass accreting black
holes — reionized the Universe, but this mini-quasar population could easily
exceed the unresolved soft X-ray background. Perhaps the luminosity function of
z > 5 star-forming galaxies steepens at luminosities much fainter than probed so
far, invoking a new population of dwarf star-forming galaxies. At the time of
writing, the objects that reionized the Universe remain a mystery.
We can’t yet rule out more than one reionization epoch. Figure 8.19 shows
the constraints on the neutral fraction (1 − x) as a function of redshift. Two
reionization epochs might happen if there is an initial flurry of star formation that
creates predominantly massive stars because of the low metallicity (known as
population III stars), but subsequent stars (population II) are less massive, so less
able to ionize their surroundings. The intergalactic medium in this model would
then recombine until enough stars have formed to ionize it again. (Radiation
pressure limits the maximum luminosity of stars, but the primordial gas from
which population III stars formed lacked metal absorption lines, reducing the
radiation pressure on the gas.)
8.10 The Lyman α forest of He II
The epoch of hydrogen reionization is tantalizingly just beyond our grasp, but
25% of the baryons in the Universe are in helium, and helium reionization is
already within our grasp. Helium is harder to ionize than hydrogen: to ionize
He II, 54.4 eV are

needed, as opposed to 13.6 eV with hydrogen. (He I is neutral helium, while He II
refers to He+ ions, i.e. helium with one electron missing; similarly, H I means
neutral hydrogen, and the Strömgren sphere of ionized hydrogen is an H II
region.) This means that a much harder radiation field (i.e. more photons at higher
energies) is needed for complete helium reionization than for hydrogen. Also,
helium reionization will have happened later than hydrogen reionization, because
these high-energy ambient photons only became available later.

[Figure: neutral fraction (10−5 to 1) versus redshift z (5–20), showing
constraints labelled ‘Ly α galaxy’, ‘GP trough’, ‘Strömgren sphere’,
‘Gunn–Peterson’ and ‘WMAP’, and model curves labelled ‘early’, ‘late’ and
‘double’.]

Figure 8.19 Experimental constraints on the reionization history of the Universe. The lines are two models that
are consistent with the Gunn–Peterson data, and one that is not (but which still is marginally consistent with
WMAP). The Strömgren sphere point is a constraint on the sizes of ionized regions around high-redshift quasars.

Exercise 8.8 Calculate the rest-frame wavelengths of the hydrogen and
helium Lyman limits, and identify the redshifted hydrogen Lyman limit in
Figure 8.20. ■
The helium Lyman α forest and the Gunn–Peterson test could tell us about the
spectrum of objects that are keeping the Universe ionized. However, the helium
lines are a factor of four shorter in wavelength (because the energy levels are
∝ Z 2 , where Z is the atomic number), so the rest wavelength of He II Lyman α is
121.6/4 nm = 30.4 nm. The He II Lyman α forest will therefore all lie below the
hydrogen Lyman limit (rest wavelength 91.2 nm).
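The hydrogenic Z² scaling used here can be checked numerically; a simple sketch, ignoring reduced-mass corrections:

```python
# Hydrogenic scaling: energy levels scale as Z^2, wavelengths as 1/Z^2.
Z_HeII = 2  # He II is a one-electron (hydrogenic) ion with nuclear charge 2

E_ion_H = 13.6         # eV, hydrogen ionization energy
lyman_alpha_H = 121.6  # nm, hydrogen Lyman alpha rest wavelength

E_ion_HeII = E_ion_H * Z_HeII**2              # 54.4 eV, as quoted in the text
lyman_alpha_HeII = lyman_alpha_H / Z_HeII**2  # ~30.4 nm, i.e. ~304 Angstroms

print(f"He II ionization energy: {E_ion_HeII:.1f} eV")
print(f"He II Lyman alpha rest wavelength: {lyman_alpha_HeII:.1f} nm")
```

The same scaling gives the helium Lyman limit a factor of four below the hydrogen value, which is the basis of Exercise 8.8.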
This doesn’t mean that He II Lyman α lines are unobservable, but rather means
that they need a quasar that happens to have not much total hydrogen column
density. Figure 8.20 shows the simulated typical opacity in the spectrum of a
quasar at redshift z = 3.2. Below the H I Lyman limit, the spectrum is heavily
attenuated, but at shorter wavelengths the attenuation reduces (because the cross
section for hydrogen ionization varies as ν −3 at frequencies above the Lyman
limit), and there is a possibility of the He II Lyman α lines being observable in
some quasars.

[Figure: transmission (0–1) versus observed wavelength λ/Å (1000–5000) for a
zQSO = 3.2 quasar, with the position of He II Lyman α (λ = 304 Å) marked.]

Figure 8.20 The expected average transmission to Lyman continuum photons in the spectrum of a z = 3.2
quasar. The dashed lines show the ±1σ range expected in the opacity. Also shown is the location of the He II
Lyman α line at z = 3.2. For some quasars, we might expect enough transparency to be able to detect this line.
Figure 8.21 shows the He II Gunn–Peterson trough in the quasar HE 2347-4342.
The He II opacity is strikingly different to the H I opacity, at 4 times longer
wavelengths. Some regions lacking H I Lyman α lines are also relatively
transparent to He II, but for the most part the spectrum is opaque to He II. (There
is a region with high He II opacity but no obvious H I Lyman α absorbers, possibly
caused by thermal broadening or instrumental noise, or possibly related to
variations in the hardness of the ionizing radiation.)
Taking all available observations, He II reionization is measured to have happened
at a redshift of z = 2.8 ± 0.2. Despite the decline in quasar comoving number
density at z > 2, quasars are more than enough to reionize He II. There is some
tentative evidence, from comparing the H I and He II opacities, that the spectrum
of ionizing radiation is softer at high redshifts, i.e. a smaller proportion of
high-energy photons, consistent with star-forming galaxies providing a bigger
proportion.

8.11 The first light in the Universe and gamma-ray blazars
Will it ever be possible to directly detect the objects that reionized the hydrogen in
the Universe? If they are (for example) dwarf galaxies, then they are far beyond
the reach of current telescopes, but in the future, the James Webb Space Telescope
(JWST) may be able to detect them.
At the time of writing, the JWST’s launch is several years away, but the integrated
light from the reionization population may already have been detected. This light,
heavily redshifted, should appear as an all-sky background light in the near-

[Figure: two panels of normalized flux (0–1) versus λ/Å (1150–1200), with the
corresponding redshift scale (2.775–2.950) along the top.]

Figure 8.21 Signatures of He II reionization in the quasar HE 2347-4342. The top panel shows the optical
spectrum, normalized to the quasar spectrum (so a flux of 1.0 is no absorption). The lower panel shows the
ultraviolet spectrum. The wavelengths in the top panel have been divided by approximately four, to match the
wavelengths of the H I and He II Lyman α forests. The thin dotted, roughly horizontal line is the 1σ uncertainty in
the ultraviolet measurements. The thick vertical dotted line marks the expected position of He II Lyman α (no
emission line is detected), and data redward (i.e. rightward) of the dashed vertical line are affected by absorbers
within the quasar itself or its host galaxy. The quasar redshift is z = 2.885, and redward of the dashed line the
quasar is known to have absorption lines associated with the quasar itself, i.e. zabs ≈ zQSO .
infrared. There have been several claims of detections of this cosmic near-infrared
background, independently from the Infrared Telescope in Space (IRTS) and
the Diffuse Infrared Background Experiment (DIRBE) on the COBE CMB
mission. This would be a ground-breaking discovery, but this faint background
(∼10–50 nW m−2 sr−1 at wavelengths of 1–4 µm) is around a hundred times
fainter than the reflected sunlight from the zodiacal dust in our own Solar System.
This is a very delicate experiment that requires careful control of the systematic
uncertainties, and opinion is still divided as to whether genuinely cosmic infrared
background signals have been detected. Another approach is to take the DIRBE
maps, subtract the infrared fluxes of known stars and galaxies, and/or mask them
out, then look for the clustering of the residuals (analogously to the CMB). This is
independent of the absolute cosmic infrared background level. The clustering
measurements in the cosmic near-infrared have been argued to be consistent with
reionization population predictions, but opinion is again divided because this is
again an experiment that needs careful treatment of systematic uncertainties.
Nevertheless, the potential reward of discovering the reionization population
makes this a hot topic in current cosmology.
A completely independent approach to constraining the cosmic near-infrared
background comes from gamma-ray observations of quasars. If the Universe
is filled with a homogeneous cosmic background of near-infrared photons,
they should interact with the gamma rays through the pair production reaction
(γ + γ → e− + e+ , the inverse reaction of electron–positron annihilation, where
one γ is a gamma-ray photon and the other γ is a near-infrared photon), which
results in a measurable gamma-ray opacity. This opacity has not been seen, which
places important limits on the cosmic near-infrared background.
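The energies involved can be estimated with a rough threshold calculation. This assumes a head-on collision and ignores order-unity angular factors, and the 1 µm wavelength is an illustrative choice for a near-infrared background photon.

```python
# Threshold for pair production, gamma + gamma -> e- + e+, on a
# near-infrared photon: for a head-on collision, E_gamma * E_IR >= (m_e c^2)^2.
h_c = 1239.8      # eV nm, the value of hc in convenient units
m_e_c2 = 0.511e6  # eV, electron rest energy

lam_IR_nm = 1000.0          # a 1 micron near-infrared background photon
E_IR = h_c / lam_IR_nm      # ~1.2 eV
E_gamma = m_e_c2**2 / E_IR  # threshold gamma-ray energy

print(f"near-infrared photon energy: {E_IR:.2f} eV")
print(f"threshold gamma-ray energy: ~{E_gamma/1e9:.0f} GeV")
```

The threshold lands at a few hundred GeV, which is why TeV observations of blazars are sensitive to the cosmic near-infrared background.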


8.12 The Square Kilometre Array


A powerful new telescope, or rather array of telescopes, will eventually join the
race to find the epoch of reionization. The Square Kilometre Array (SKA) will be
a radio-wave observatory that will have a combined effective collecting area
(factoring in telescope efficiencies) more than 30 times bigger than the largest
telescope ever built. Although full SKA science operations are still some way off
(not before 2020) at the time of writing, the science promises to be revolutionary
when it comes.
Cosmological neutral hydrogen doesn’t just absorb photons at the Lyman α
wavelength (n = 1 → 2). The energy of an n = 1 electron is slightly different,
depending on whether its spin is parallel to the nucleus spin, or antiparallel
(Figure 8.22). The energy difference is very small in comparison to the Lyman α
transition: ΔE = 5.87 × 10−6 eV, compared to ΔE = 10.2 eV for Lyman α. This
corresponds to a photon wavelength of λ = hc/ΔE ≈ 21 cm for this n = 1 spin-flip
transition. This lies at radio wavelengths and is accessible to radio telescopes
even when redshifted to z > 10 (though conditions in the Earth’s ionosphere limit
performance at the longest wavelengths). The 21 cm transition during reionization
could be seen in absorption or emission, depending on the thermal history of the
gas. In only a few hours’ exposure time, the SKA is expected to detect many tens
or even hundreds of thousands of galaxies in H I out to z ∼ 1.5, from which
redshifts can immediately be calculated.

Figure 8.22 The hydrogen spin-flip transition. The electron changes its spin
from parallel to antiparallel relative to the nucleus, which releases an energy of
ΔE = 5.87 × 10−6 eV as a photon with wavelength 21 cm.

There is also expected to be a 21 cm forest, analogous to the Lyman α forest, and
a 21 cm Gunn–Peterson test. Unlike Lyman α, however, the 21 cm line doesn’t
saturate when the neutral fraction is large. A simulated SKA radio spectrum of a
z = 10 radio-loud quasar is shown in Figure 8.23. We have yet to find any
suitable z ≈ 10 background radio source, but it is not unreasonable to expect the
SKA to be able to find one.
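The quoted wavelength follows directly from λ = hc/ΔE; a quick numerical check, using the constants tabulated in Appendix A:

```python
# Wavelength and frequency of the hydrogen spin-flip transition.
h = 6.626e-34    # J s, Planck's constant
c = 2.998e8      # m/s, speed of light
eV = 1.602e-19   # J per electronvolt

delta_E = 5.87e-6 * eV  # J, hyperfine splitting of the hydrogen ground state
lam = h * c / delta_E   # wavelength in metres
nu = c / lam            # rest frequency in Hz

print(f"wavelength = {lam*100:.1f} cm")  # ~21 cm
print(f"frequency  = {nu/1e6:.0f} MHz")  # ~1420 MHz; observed at nu/(1+z)
```

The rest frequency of about 1420 MHz is redshifted to around 130 MHz at z ≈ 10, matching the frequency axis of the simulated spectrum in Figure 8.23.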

[Figure: flux density/mJy (18–19) versus observed frequency/MHz (128–131).]

Figure 8.23 A numerical simulation of the 21 cm forest by the Square
Kilometre Array, by Chris Carilli.

The SKA also promises to revolutionize many of the topics discussed in this
book: the SKA team aim to measure the dark energy equation of state (from
cosmic shear and baryon wiggles), test whether dark energy clusters (using the
Integrated Sachs–Wolfe effect), measure the power spectrum of primordial

density fluctuations, and measure the Hubble parameter to 1% accuracy (from
observations of extragalactic water masers), and the SKA could even work as a
gravitational wave detector by comparing the timings of millisecond pulsars.

8.13 The CODEX experiment


We shall end this book with the story of another extraordinary and ambitious
experiment. Obtaining the results will take some decades, but the astonishing goal
is to detect the real-time expansion of the Universe. Broadly, the aim is to watch
distant objects over decades, and see their redshifts slowly change because of the
expansion of the Universe.
This is an extremely small effect. The quantity dz/dtobserved has dimensions of
1/time, i.e. the same as the Hubble parameter H0 . Broadly speaking, we would
expect dz/dtobserved ∝ H0 , with the constant of proportionality depending on the
cosmological parameters, and with some dependence on redshift. (Note: do not
confuse this with Equation 1.34, which links redshift z with lookback time.)
A Hubble parameter of H0 = 72 km s−1 Mpc−1 is equivalent to
H0 = 2.3 × 10−18 s−1 . If the constant of proportionality mentioned above is of
order 1, then in ten years we would expect a change in redshift of only ∼10−9 .
Measuring this minute change would require extremely accurate redshifts. Even
Doppler shifts within the objects become significant: to first order, v/c ≈ zDoppler ,
where c is the speed of light, so a change of just δv = 0.3 m s−1 would result in
a redshift change of δz ≈ 10−9 . If you compare this with typical velocity
dispersions of galaxies, ∼230 km s−1 , or the velocity widths of narrow emission
lines of active galaxies, ∼1000 km s−1 , you will appreciate the enormous
difficulty of obtaining such a measurement.
We might hope that the constant of proportionality is much bigger than 1, but alas
it is not. If we differentiate 1 + z = Robs /Rem (writing ‘obs’ for observed and
‘em’ for emitted), we find that
dz/dtobs = (1/Rem ) dRobs /dtobs − (Robs /Rem^2 ) dRem /dtobs ,

using the quotient rule for differentiation. The chain rule then gives

dz/dtobs = (1/Rem ) dRobs /dtobs − (Robs /Rem^2 )(dRem /dtem )(dtem /dtobs ).
Now, we know that the Hubble parameter is H = (1/R)dR/dt in general, so the
value now must be H0 = (1/Robs ) dRobs /dtobs . Meanwhile, at the time of
redshift z, the Hubble parameter was H(z) = (1/Rem ) dRem /dtem . We can use
these values to simplify the terms in the equations above:
(1/Rem ) dRobs /dtobs = H0 Robs /Rem

and

(Robs /Rem^2 ) dRem /dtem = H(z) Robs /Rem ,

which gives

dz/dtobs = H0 (Robs /Rem ) − H(z) (Robs /Rem )(dtem /dtobs ).

But we know that Robs /Rem = dtobs /dtem = (1 + z), so this just becomes
dz/dtobs = (1 + z)H0 − H(z).
Without being too disingenuous we could write this as
ż = dz/dtobs = H0 [(1 + z) − H(z)/H0 ], (8.17)
where H(z)/H0 is the factor by which the Hubble parameter has changed.
The rate of change of redshift, ż, is plotted in Figure 8.24. So the constant of
proportionality is generally a bit smaller than 1, at least in the concordance
cosmology.
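Equation 8.17 is straightforward to evaluate; here is a sketch for a flat concordance-like model, where the values Ωm,0 = 0.3 and ΩΛ,0 = 0.7 are assumed for illustration.

```python
import math

# Real-time redshift drift, zdot = H0 * [(1 + z) - H(z)/H0],
# for a flat Lambda-CDM model with assumed concordance-like parameters.
H0_s = 2.3e-18  # s^-1, corresponding to H0 = 72 km/s/Mpc
omega_m, omega_lam = 0.3, 0.7
year = 3.156e7  # seconds

def zdot(z):
    E = math.sqrt(omega_m * (1 + z)**3 + omega_lam)  # H(z)/H0 for this model
    return H0_s * ((1 + z) - E)                      # s^-1

z = 3.0
dz_10yr = zdot(z) * 10 * year
dv = 3.0e8 * dz_10yr / (1 + z)  # equivalent velocity shift of a feature, m/s

print(f"zdot at z = 3: {zdot(z)*year:.2e} per year")
print(f"redshift change over 10 years: {dz_10yr:.1e}")
print(f"equivalent velocity shift: {dv*100:.1f} cm/s")
```

Over a decade the drift at z = 3 amounts to a velocity shift of only a few centimetres per second, which sets the wavelength-calibration stability that CODEX must achieve.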

[Figure: ż × 10^10 /yr (0 to −3) versus redshift z (2–8), with curves for
ΩΛ,0 = 0.0, 0.4, 0.6, 0.7, 0.8 and 0.9, and reference velocity changes
ΔV = 0.5, 1.0 and 2.0 cm s−1 marked.]

Figure 8.24 The real-time rate of change of redshift for various cosmological
models. Some reference velocity changes are shown as thin black dashed lines.
So the prospects for measuring the real-time expansion of the Universe seem to
look grim. However, one approach that might work is the Cosmical Dynamics
Experiment, or CODEX. The objective is to take very-high-resolution spectra of
the Lyman α forest with an extremely careful and stable wavelength calibration.
By cross-correlating the spectrum with a second spectrum at least 10 years later,
the shifts in redshifts should be detectable. Figure 8.25 shows simulated spectra
separated in time; note that it’s only by averaging the shifts of many Lyman α
absorbers that the expansion is detectable. This averaging also washes out any
peculiar acceleration in individual objects. CODEX is currently a proposed
experiment for the proposed European Extremely Large Telescope, and is still
many years from taking its first data. Another approach could be to use the SKA
to monitor the H I 21 cm forest in a z > 10 radio-loud active galaxy, if we can find
one that is bright enough (see Figure 8.23). Either way, it is possible that within
our lifetimes we shall have detected the real-time expansion of the Universe.

[Figure: two panels of transmission versus λ/Å, the upper covering 5800–6100 Å
and the lower zooming in on 5860–5890 Å.]

Figure 8.25 The change in redshift in the Lyman α forest expected in five
million years. The shift in ten years will be much smaller, and detectable only
statistically.

Summary of Chapter 8
1. The Lyman α forest of absorption lines blueward of the Lyman α emission
line in quasars and galaxies is caused by intervening, lower-redshift
Lyman α absorbers.
2. Most of the electrons in hydrogen atoms in the Lyman α clouds are in the
ground state, implying no Hα absorption.
3. If sufficiently dense, the clouds will be self-shielding, i.e. will have an
optical depth > 1 to Lyman α photons.
4. The large-scale structure of the Lyman α forest traces the power spectrum of
baryonic matter on scales that are close to the linear regime for the growth of
perturbations.
5. The largest contribution to ΩH I comes from damped Lyman α systems. The
term ‘damped’ refers to the damping wings of the Lorentzian absorption
profile.
6. The column density and optical depth of an absorption line can be related to
the equivalent width via the curve of growth.
7. The comoving number density of absorbers along a line of sight is measured
with the use of absorption distance.
8. Ionizing photons near a quasar reduce the comoving number density of
Lyman α clouds. This is known as the proximity effect.


9. Gravitationally-lensed quasars and quasar pairs offer two adjacent lines of
sight that can be used to place constraints on the sizes of Lyman α clouds.
The cloud sizes cannot be determined from a single line of sight; only the
product nco × A (where nco is the comoving density and A is the area) can
be found from a single line of sight.
10. After recombination, the Universe was neutral (i.e. not ionized) and therefore
opaque to Lyman α photons. The first luminous objects later reionized the
Universe. The epoch of reionization is (at the time of writing) uncertain, but
the constraints from the Gunn–Peterson test (the opacity of the Universe
derived from Lyman α absorption) place it at redshifts z > 6.
11. He II reionization occurs later than hydrogen reionization, at a redshift of
z = 2.8 ± 0.2. The He II Lyman α lines occur at about a factor of four
shorter wavelength than hydrogen, making them observable along only those
few lines of sight without Lyman limit systems over a sufficiently wide
redshift range.
12. Other constraints on the reionization population come from the gamma-ray
opacity of the Universe and the cosmic near-infrared background. The
Square Kilometre Array aims to map the reionization of the Universe using
the 21 cm spin-flip transition of neutral hydrogen.
13. There are proposals to monitor the real-time expansion of the Universe using
the Lyman α forest or the 21 cm forest.

Further reading
• Fan, X., 2006, ‘Observational constraints on cosmic reionization’, Annual
Review of Astronomy and Astrophysics, 44, 415.
• Loeb, A. and Barkana, R., 2001, ‘The reionization of the Universe by the first
stars and quasars’, Annual Review of Astronomy and Astrophysics, 39, 19.
• Rauch, M., 1998, ‘The Lyman alpha forest in the spectra of QSOs’, Annual
Review of Astronomy and Astrophysics, 36, 267.
• Wolfe, A.M., Gawiser, E. and Prochaska, J.X., 2005, ‘Damped Ly α systems’,
Annual Review of Astronomy and Astrophysics, 43, 861.
• Cen, R., 2003, ‘The Universe was reionized twice’, Astrophysical Journal,
591, 12.
• Faucher-Giguère, C.A., Lidz, A., Hernquist, L. and Zaldarriaga, M., 2008,
‘Evolution of the intergalactic opacity: implications for the ionizing
background, cosmic star formation, and quasar activity’, Astrophysical Journal,
688, 85.
• Hauser, M.G. and Dwek, E., 2001, ‘The cosmic infrared background:
measurements and implications’, Annual Review of Astronomy and
Astrophysics, 39, 249.
• Rybicki, G.B. and Lightman, A.P., 1979, Radiative Processes in Astrophysics,
Wiley.
• More on the Square Kilometre Array can be found at its website, currently
[Link]

Epilogue
How wonderful it would be to become wise.
Genesis 3, 6
Where will the next big changes in thinking in cosmology come from? Many of
the previous big changes have come from unexpected observational discoveries,
which makes it difficult to foresee the next leaps. As we’ve seen, the population
of high-redshift submm-luminous galaxies seemed unremarkable to optical
telescopes, yet were found to be convulsed in violent star formation by
submm-wave imaging. This led in part to the new model of galaxy downsizing.
As I write this, the submm-wave Herschel Space Observatory will be launched in
six days, and the submm-wave SCUBA-2 camera will shortly be commissioned at
the James Clerk Maxwell Telescope in Hawaii. Both have tremendous scope for
new discoveries. Cosmology has also seen a change from small teams and lone
scientists, to large international consortia using many different astronomical
facilities and techniques. Despite that, it’s still possible for individual scientists to
make a mark, whether on their own or as part of a small or large team. As a result
of these large-scale international efforts and developments in survey technology at
all wavelengths, we are in a very data-rich phase of astronomy.
So what’s next? Large CCD arrays are just making time-domain optical
astronomy possible. Projects such as PAN-STARRS and Gaia will repeatedly
survey large areas of sky. These will almost certainly uncover many new
gravitational microlens events and many nearby supernovae. Gamma-ray
monitoring of the sky led to the completely unexpected discovery of gamma-ray
bursts, which themselves have optical transients, so what else lies in wait to be
discovered in time-domain optical astronomy? Perhaps the new generation of
radio telescopes such as LOFAR and the SKA, or the HST’s successor the JWST,
or the next generation of ≈50 m-diameter optical/near-infrared telescopes, will
detect unexpected reionization populations that generated the first light in the
Universe after the Big Bang. Perhaps the new gravitational wave observatories
LIGO and LISA will detect inspiralling black holes and confront us with
irreconcilable inconsistencies with general relativity. Perhaps the anisotropies
in the CMB will eventually be found inconsistent with inflation, or the LHC
could fail to find the Higgs boson, either of which would force big changes in
fundamental physics. Perhaps the signatures of dark matter particles will be found
in terrestrial direct detection experiments or at the LHC, or their annihilation
signatures will be inferred from cosmic rays, which will tell us what dominates
most of the matter content of the Universe. We know very little indeed about the
dark sector in general, whether dark matter or dark energy. We assume, perhaps
blithely, that dark matter only responds to gravity, but perhaps it has its own
intricate suite of dark physics. Perhaps the delicate measurements of cosmic
shear or baryon wiggles or the expansion of the Universe will constrain the
phenomenological parameters of dark energy, and give some insight on the
physical causes of what dominates the current expansion of the Universe, or
even challenge our assumptions of the size scales at which the Universe is
homogeneous. There has surely never been a more exciting time in observational
cosmology.

Appendix A
Table A.1 Common SI unit conversions and derived units.
Quantity Unit Conversion
speed m s−1
acceleration m s−2
angular speed rad s−1
angular acceleration rad s−2
linear momentum kg m s−1
angular momentum kg m2 s−1
force newton (N) 1 N = 1 kg m s−2
energy joule (J) 1 J = 1 N m = 1 kg m2 s−2
power watt (W) 1 W = 1 J s−1 = 1 kg m2 s−3
pressure pascal (Pa) 1 Pa = 1 N m−2 = 1 kg m−1 s−2
frequency hertz (Hz) 1 Hz = 1 s−1
charge coulomb (C) 1C = 1A s
potential difference volt (V) 1 V = 1 J C−1 = 1 kg m2 s−3 A−1
electric field N C−1 1 N C−1 = 1 V m−1 = 1 kg m s−3 A−1
magnetic field tesla (T) 1 T = 1 N s m−1 C−1 = 1 kg s−2 A−1

Table A.2 Other unit conversions.


wavelength mass–energy equivalence
1 nanometre (nm) = 10 Å = 10−9 m 1 kg = 8.99 × 1016 J/c2 (c in m s−1 )
1 ångstrom = 0.1 nm = 10−10 m 1 kg = 5.61 × 1035 eV/c2 (c in m s−1 )

angular measure distance


1◦ = 60 arcmin = 3600 arcsec 1 astronomical unit (AU) = 1.496 × 1011 m
1◦ = 0.017 45 radian 1 light-year (ly) = 9.461 × 1015 m = 0.307 pc
1 radian = 57.30◦ 1 parsec (pc) = 3.086 × 1016 m = 3.26 ly

temperature energy
absolute zero: 0 K = −273.15 ◦ C 1 eV = 1.602 × 10−19 J
0 ◦ C = 273.15 K 1 J = 6.242 × 1018 eV

spectral flux density cross-sectional area


1 jansky (Jy) = 10−26 W m−2 Hz−1 1 barn = 10−28 m2
1 W m−2 Hz−1 = 1026 Jy 1 m2 = 1028 barn

cgs units pressure


1 erg = 10−7 J 1 bar = 105 Pa
1 dyne = 10−5 N 1 Pa = 10−5 bar
1 gauss = 10−4 T 1 atmosphere = 1.013 25 bar
1 emu = 10 C 1 atmosphere = 1.013 25 × 105 Pa


Table A.3 Constants.


Name of constant Symbol SI value
Fundamental constants
gravitational constant G 6.673 × 10−11 N m2 kg−2
Boltzmann’s constant k 1.381 × 10−23 J K−1
speed of light in vacuum c 2.998 × 108 m s−1
Planck’s constant h 6.626 × 10−34 J s
reduced Planck constant ℏ = h/2π 1.055 × 10−34 J s
fine structure constant α = e²/(4πε₀ℏc) 1/137.0
Stefan–Boltzmann constant σ 5.671 × 10−8 J m−2 K−4 s−1
Thomson cross section σT 6.652 × 10−29 m2
permittivity of free space ε0 8.854 × 10−12 C2 N−1 m−2
permeability of free space µ0 4π × 10−7 T m A−1

Particle constants
charge of proton e 1.602 × 10−19 C
charge of electron −e −1.602 × 10−19 C
electron rest mass me 9.109 × 10−31 kg
= 0.511 MeV/c2
proton rest mass mp 1.673 × 10−27 kg
= 938.3 MeV/c2
neutron rest mass mn 1.675 × 10−27 kg
= 939.6 MeV/c2
atomic mass unit u 1.661 × 10−27 kg

Astronomical constants
mass of the Sun M⊙ 1.99 × 1030 kg
radius of the Sun R⊙ 6.96 × 108 m
luminosity of the Sun L⊙ 3.83 × 1026 W
mass of the Earth M⊕ 5.97 × 1024 kg
radius of the Earth R⊕ 6.37 × 106 m
mass of Jupiter MJ 1.90 × 1027 kg
radius of Jupiter RJ 7.15 × 107 m
astronomical unit AU 1.496 × 1011 m
light-year ly 9.461 × 1015 m
parsec pc 3.086 × 1016 m
Hubble parameter H0 (70.4 ± 1.5) km s−1 Mpc−1
(2.28 ± 0.05) × 10−18 s−1
age of Universe t0 (13.73 ± 0.15) × 109 years
current critical density ρc,0 (9.30 ± 0.40) × 10−27 kg m−3
current dark energy density ΩΛ,0 (73.2 ± 1.8)%
current matter density Ωm,0 (26.8 ± 1.8)%
current baryonic matter density Ωb,0 (4.4 ± 0.2)%
current non-baryonic matter density Ωc,0 (22.3 ± 0.9)%
current curvature density Ωk,0 (−1.4 ± 1.7)%
current deceleration q0 −0.595 ± 0.025

Appendix B

Introduction
In this appendix, we shall take you on a very quick revision of special relativity.
You will need the Lorentz transformation, time dilation and Lorentz contraction in
this book, as well as to be able to use Einstein’s mass–energy equivalence.
Proving E = mc2 will take us into discussions of four-vectors in this appendix,
though four-vectors are not needed in themselves for this book. Where algebraic
steps have been left out for brevity, enough information should be given for you to
fill them in, should you wish to.
There isn’t space to describe Einstein’s ingenious thought experiments that led
him to this theory, nor the many astonishing relativistic paradoxes. For these and
more, consult a specialist text, such as Lambourne's Relativity, Gravitation and
Cosmology, published by Cambridge University Press.

B.1 Principles
The principles of special relativity are:
• There is no universal standard of rest.
• The speed of light (c) is invariant.

B.2 Feynman light clock

Consider a light clock (two mirrors with a light ray bouncing between them),
shown in Figure B.1. The clock is moving with velocity v. One can use
Pythagoras’s theorem to show that
δt1 = γ δt0, (B.1)
where
γ = 1/√(1 − v²/c²), (B.2)
δt1 is the time between reflections in the moving frame, and δt0 is the time in the
stationary frame. This can be remembered as 'moving clocks run slowly'.
Sometimes the notation β = v/c is used.

Figure B.1 A Feynman light clock, made of two mirrors at a fixed distance L0,
between which a light pulse bounces. In a stationary clock, bounces occur at
intervals of δt0 = L0/c. In a moving clock, the intervals δt1 between bounces are
longer, because the light travels a distance √(L0² + (v δt1)²) > L0. Setting this
distance equal to c δt1 and rearranging, one obtains Equation B.1.
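Equations B.1 and B.2 lend themselves to a quick numerical check. A short Python sketch (the function names are our own, not from the text):

```python
import math

def gamma(v, c=1.0):
    """Lorentz factor (Equation B.2); requires |v| < c."""
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)

def moving_clock_interval(dt0, v, c=1.0):
    """Interval between bounces of a moving light clock (Equation B.1)."""
    return gamma(v, c) * dt0

# At v = 0.6c the Lorentz factor is exactly 1.25,
# so a 1 s tick of the stationary clock stretches to 1.25 s.
```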
B.3 Lorentz contraction and simultaneity

In Figure B.2, the light pulses leave the corner simultaneously. The impacts at the
top and side mirrors are simultaneous in the stationary frame, but not in the
moving frame because the outward journey is longer than the return journey for
the light ray moving parallel to the direction of motion. In general, there is no
universal standard of simultaneity in special relativity.


It can be shown that the length in the stationary frame L0 (measured along the
direction of motion) and the length in the moving frame L are related by
L = L0/γ. (B.3)
Note that this is contraction, not dilation: 'moving rulers are short'. There is no
contraction perpendicular to the motion. To prove this, imagine two circular hoops
with the same rest-frame radius, both aligned to be perpendicular to the x-axis.
One hoop moves along the x-axis. If one hoop passed inside the other, then there
would be a preferred standard of rest, in contradiction with the first principle.

Figure B.2 A modified Feynman light clock, made of two sets of mirrors, both at a
fixed rest-frame distance L0, between which light pulses bounce (shown as dashed
lines).

B.4 Lorentz transformation

The transformation of ct, x, y, z coordinates from one reference frame to another
(which we denote as primed and unprimed coordinates) can be expressed as
(ct′, x′, y′, z′)ᵀ = Λ (ct, x, y, z)ᵀ, (B.4)
where Λ is a 4 × 4 matrix (not to be confused with the cosmological constant).
We assume that the origins of the coordinate systems coincide. For x-axis motion
with velocity v,
Λ = ( γ        (−v/c)γ  0  0 )
    ( (−v/c)γ  γ        0  0 )   (B.5)
    ( 0        0        1  0 )
    ( 0        0        0  1 )
This can be proved elegantly using only symmetries.
First, y′ = y and z′ = z, because there is no Lorentz contraction perpendicular to
the motion. Suppose that
(ct′)     (ct)   (A B)(ct)
(x′ ) = Λ (x ) = (C D)(x ).
(We neglect the y- and z-components for reasons of space.) Consider light rays in
the positive and negative x-directions. From the principles of special relativity, the
line x = ct must transform to x′ = ct′, i.e.
(A B)(1)     (1)
(C D)(1) = a (1),
where a is a non-zero constant, which implies that A + B = C + D. Similarly,
x = −ct must transform to x′ = −ct′, i.e.
(A B)( 1)     ( 1)
(C D)(−1) = b (−1),
where b ≠ 0 is another constant, implying that A − B = −(C − D). Together,
these imply that A = D and B = C, i.e. Λ is a symmetric matrix:
Λ = (A B)
    (B A).

Clearly, if x′ = 0, then x = vt, so
(ct′)   (A B)(ct)
(0  ) = (B A)(x )
gives 0 = Bct + Ax, and so B = (−v/c)A. Hence
Λ = (A        (−v/c)A)
    ((−v/c)A  A      ).
Finally, we must have
(ct)       (ct′)
(x ) = Λ⁻¹ (x′ ),
where Λ⁻¹ is the inverse matrix to Λ. By symmetry, we must also have
Λ⁻¹ = (A        (+v/c)A)
      ((+v/c)A  A      ).
The equation ΛΛ⁻¹ = I (where I is the identity matrix) can be solved to show
that A = γ, as required.
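The symmetry argument can also be verified numerically. A short Python sketch using numpy (the function name `lorentz_boost` is our own), checking that reversing v inverts the boost and that light rays map to light rays:

```python
import numpy as np

def lorentz_boost(beta):
    """4x4 boost matrix Lambda for x-axis motion, beta = v/c (Equation B.5)."""
    g = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = g
    L[0, 1] = L[1, 0] = -beta * g
    return L

beta = 0.6
L = lorentz_boost(beta)
Linv = lorentz_boost(-beta)          # reversing v gives the inverse boost
assert np.allclose(L @ Linv, np.eye(4))

# A light ray x = ct maps to x' = ct': the vector (1, 1, 0, 0)
# stays proportional to itself under the boost.
ray = np.array([1.0, 1.0, 0.0, 0.0])
out = L @ ray
assert np.allclose(out / out[0], ray)
```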

B.5 Invariants
Using the Lorentz transformation, one can show that the interval δs is invariant
(i.e. the same in all reference frames) under Lorentz transformations, where
(δs)² = c²(δt)² − (δx)² − (δy)² − (δz)².
Note that δs = c δτ, where τ is the proper time. Note also that δs = 0 for photons.
We can write this as
(δs)² = Σ_{α,β} η_{αβ} δx^α δx^β,
where x^α does not mean 'x to the power of α', but rather in this context refers to
the components of the four-vector (ct, x, y, z). The convention is for this to count
from zero, i.e. x⁰ = ct, x¹ = x, x² = y, x³ = z. One can only apologize for the
obvious inadequacies of this very common notation. It is usually clear from the
context whether superscripts refer to components, or mean 'to the power of'.
The matrix
η = (1  0   0   0)
    (0 −1   0   0)
    (0  0  −1   0)
    (0  0   0  −1)
is called the metric tensor. Real intervals are known as time-like, and imaginary
intervals as space-like (see Figure B.3). (Note that some textbooks use
diag(−1, 1, 1, 1), resulting in the opposite convention for δs for space-like and
time-like intervals.) This metric is sometimes known as Minkowski spacetime.

Figure B.3 Special relativistic lightcone diagram. The origin and point A have a
time-like separation for all observers, while the origin and point B have a
space-like separation for all observers.

In general, if A^α and B^α are the components of four-vectors, then
Σ_{α,β} η_{αβ} A^α B^β

is invariant. We define the scalar product of four-vectors as


A · B = Σ_{α,β} η_{αβ} A^α B^β,

so s² = x · x is the invariant 'length' (squared) of the position four-vector
x = (ct, x, y, z).
Other four-vectors Lorentz transform in the same way as the position four-vector.
(This is, in fact, the definition of a four-vector.) In the following sections we shall
introduce some useful four-vectors and some useful invariant lengths.

B.6 Position four-vector


As mentioned above, the position four-vector is (ct, x, y, z), or xα with
α = 0, 1, 2, 3 in component notation. The invariant length (squared) is x · x = s2 .

B.7 Velocity and acceleration four-vectors


We differentiate the position four-vector xα with respect to proper time τ (an
invariant scalar) to obtain another four-vector, the velocity four-vector, the
components of which are:
U^α = dx^α/dτ (except for photons).
The invariant length (squared) of any velocity four-vector can be shown to be
U · U = c2 .
If one differentiates again with respect to τ , one can show that U · a = 0, where
a is the four-acceleration, the components of which are:
a^α = dU^α/dτ.
Hence four-acceleration is ‘orthogonal’ to four-velocity.

B.8 Relationship between four- and three-velocities


We consider x-axis motion only, for simplicity. Writing u for the three-velocity,
the components of the four-velocity are
U⁰ = c dt/dτ = c/√(1 − u²/c²) = γ(u) c,
U¹ = dx/dτ = (dx/dt)(dt/dτ) = γ(u) u,
U² = dy/dτ = 0,
U³ = dz/dτ = 0.


In general, if ui = dxi /dt is the three-velocity (where i = 1, 2, 3), then


U α = γ × (c, ui )
is the four-velocity, where γ is a function of |u|.
The Lorentz transformation of the velocity four-vector can be used to derive the
addition law for three-velocities:
u+v
utotal = . (B.6)
1 + uv/c2
This also applies if one of the velocities is c. For example, the headlights on a car
moving on a road at speed u can send photons out at v = c in the car’s reference
frame. Nevertheless, according to a stationary observer beside the road, the
photons’ speed is still c, not v + c. (Try substituting v = c into Equation B.6.)
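Equation B.6 is straightforward to check numerically. A minimal Python sketch (the function name is our own):

```python
def add_velocities(u, v, c=3.0e8):
    """Relativistic addition of collinear three-velocities (Equation B.6)."""
    return (u + v) / (1.0 + u * v / c**2)

c = 3.0e8
# Headlights on a car at u = 0.5c still emit light travelling at c
# for the roadside observer:
assert add_velocities(0.5 * c, c, c) == c
# Two 0.5c velocities combine to 0.8c, not c:
assert abs(add_velocities(0.5 * c, 0.5 * c, c) - 0.8 * c) < 1.0
```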

B.9 Momentum four-vector


This is defined for a massive particle as P α = mU α , where m is the rest mass.
Therefore
P^α = m dx^α/dτ,
so
P α = mγ × (c, ui ).

B.10 Force four-vector


Non-relativistically, force is the rate of change of momentum, so we define the
four-force vector components as
F^α = dP^α/dτ = m d/dτ(dx^α/dτ) = m a^α.
From this one can show that
F^α = (dP⁰/dτ, γf^i),
where f^i is the relativistic force three-vector:
f^i = (1/γ) d/dτ(m dx^i/dτ) = d(γmu^i)/dt.

B.11 E = mc²
To reach Einstein’s famous equation, we start from U · a = 0:
0 = Σ_{α,β} η_{αβ} U^α m a^β = Σ_{α,β} η_{αβ} U^α F^β
  = U⁰ dP⁰/dτ − U^i γf^i = c (dt/dτ) d(γmc)/dτ − (dx^i/dτ) γf^i.

Using the chain rule, we get


0 = c (dt/dτ)(dt/dτ) d(γmc)/dt − (dt/dτ)(dx^i/dt) γf^i.
But dt/dτ = γ, so
0 = cγ² d(γmc)/dt − γ² (dx^i/dt) f^i,
thus
f^i dx^i/dt = d(γmc²)/dt.
In non-relativistic physics, we have
f^i dx^i/dt = dE/dt,
where E is the energy. Therefore we choose to identify energy with
E = γmc2 + a constant, and we can assume that the constant is zero
without loss of generality. One can show using a Taylor series that
E(u) = mc² + ½mu² + higher-order terms, starting at order mu⁴/c².
The second term here is the non-relativistic kinetic energy, ½mu². For a stationary
object, u = 0, so it has energy E = mc2 . Rest mass therefore has an equivalent
energy. Also, P = (E/c, mγui ). By considering the invariant length of the
momentum four-vector, one can show that
E 2 = p2 c2 + m2 c4 , (B.7)
where p is the magnitude of the relativistic three-momentum, p^i = mγu^i.
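Equation B.7 and the low-speed limit of E = γmc² can both be checked numerically. A short Python sketch (function names are our own), working in units with c = 1:

```python
import math

def energy(m, u, c=1.0):
    """Total energy E = gamma m c^2 of a particle of rest mass m and speed u."""
    g = 1.0 / math.sqrt(1.0 - (u / c) ** 2)
    return g * m * c**2

def momentum(m, u, c=1.0):
    """Relativistic three-momentum magnitude p = gamma m u."""
    g = 1.0 / math.sqrt(1.0 - (u / c) ** 2)
    return g * m * u

m, u = 2.0, 0.6
E, p = energy(m, u), momentum(m, u)
# Invariant mass relation (Equation B.7), with c = 1:
assert abs(E**2 - p**2 - m**2) < 1e-12
# Low-speed limit: E ~ mc^2 + mu^2/2
u_small = 1e-4
assert abs(energy(m, u_small) - (m + 0.5 * m * u_small**2)) < 1e-12
```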

B.12 Photons
Photons have zero rest mass, but nevertheless carry momentum and energy
consistent with Equation B.7: E = pc. The four-velocity is not defined for a
photon, but the four-momentum is (E/c, px, py, pz), where px is the
x-component of the relativistic three-momentum, and so on. The invariant
interval δs between any two points on a light ray is always zero.
One curious and little-known aspect of special relativity is that it implies Planck’s
famous formula E = hν, but doesn’t give a value for h. If we consider a photon
with energy E moving along the x-axis, and Lorentz transform to the frame of
an observer also moving along the x-axis with speed v, one can show that
E′/E = √((c + v)/(c − v)). This is the relativistic Doppler shift (but don't
confuse it with cosmological redshift in Chapter 1).
Alternatively, we could consider a monochromatic plane wave. We can define a
wave four-vector as k = (ω/c, kx , ky , kz ), where the three-vector (kx , ky , kz )
points along the direction of the wave, and ω = 2πν, where ν is the frequency. (To
see why k must be a four-vector, note that the phase φ must be an invariant scalar,
and that k · x = φ.) The wavelength is λ = 2π/√(kx² + ky² + kz²). A Lorentz
transformation of the wave four-vector of a wave moving along the x-axis leads
eventually to ν′/ν = √((c + v)/(c − v)). Therefore E′/E = ν′/ν, or E ∝ ν.
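The Doppler formula can be illustrated numerically: two successive boosts by v compose into a single boost by the relativistically added velocity 2v/(1 + v²/c²). A small Python sketch (the function name is ours), with c = 1:

```python
import math

def doppler_factor(v, c=1.0):
    """E'/E = nu'/nu = sqrt((c + v)/(c - v)) for motion along the photon's axis."""
    return math.sqrt((c + v) / (c - v))

# Composing two boosts by v is the same as one boost by the
# relativistically added velocity 2v/(1 + v^2) (c = 1 here):
v = 0.3
combined = 2 * v / (1 + v**2)
assert abs(doppler_factor(v) ** 2 - doppler_factor(combined)) < 1e-12
```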

Solutions to exercises
Exercise 1.1 When we calculated that the sky is as bright as the Sun, we
assumed that the line of sight stopped on the star, i.e. stars are opaque. When we
calculated the brightness of the sky for an S^(−5/2) power law, we integrated down to
zero flux, which (for any particular type of star) means integrating to r = ∞.
So the lines of sight don’t stop on stars in the latter case; stars are treated as
transparent.
Exercise 1.2 We use E = γm0 c2 , where E is the energy, m0 is the
rest mass, γ = (1 − v 2 /c2 )−1/2 , and c is the speed of light. We have that
10²⁰ eV = γc² × 938.28 MeV/c² ≈ γ × 10⁹ eV. The quoted accuracy of the
energy does not justify carrying more than just the first significant figure on the
proton’s rest mass. The γ factor is then just γ = 1020 /109 = 1011 . The cosmic
ray is moving at very close to the speed of light, so it would take about 100 000
years for the proton to cross the Galaxy in the Galaxy’s rest frame. But moving
clocks run slow, so it would take 100 000/γ years in the proton’s rest frame, i.e.
105 /1011 years, or 10−6 years, or about 30 seconds!
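The arithmetic of this solution can be reproduced in a few lines of Python (the seconds-per-year conversion factor is our own addition):

```python
# Numbers from the exercise: a 1e20 eV proton with rest energy ~1e9 eV.
E_eV = 1e20
rest_eV = 1e9
gamma = E_eV / rest_eV                     # = 1e11
crossing_galaxy_yr = 1e5                   # ~100,000 yr in the Galaxy's rest frame
proper_time_yr = crossing_galaxy_yr / gamma
proper_time_s = proper_time_yr * 3.156e7   # seconds per year
# proper_time_s comes out near 30 seconds, as in the solution.
```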
Exercise 1.3 First we differentiate Equation 1.7 with respect to time t to get
2ṘR̈ = (8πG/3)(ρ̇R² + 2ρRṘ) + (2Λc²/3)RṘ, (S1.1)
where we write ρ = ρm + ρr for brevity and the ‘dot’ notation is used to indicate
differentiation with respect to time, i.e. Ṙ = dR/dt and R̈ = d2 R/dt2 . The
conservation of matter energy gives
d/dt(ρc²R³) = ρ̇c²R³ + 3ρc²R²Ṙ
            = −p d(R³)/dt
            = −3pR²Ṙ,
so
ρ̇c2 R3 = −3pR2 Ṙ − 3ρc2 R2 Ṙ.
Equation S1.1 has a term ρ̇R2 , so we rearrange the above to find
ρ̇R² = −3pRṘ/c² − 3ρRṘ = −RṘ(3p/c² + 3ρ).
Substituting this into Equation S1.1 gives
2ṘR̈ = (8πG/3)[2ρRṘ − RṘ(3p/c² + 3ρ)] + (2Λc²/3)RṘ
     = (8πGRṘ/3)(2ρ − 3p/c² − 3ρ) + (2Λc²/3)RṘ
     = (−8πGRṘ/3)(ρ + 3p/c²) + (2Λc²/3)RṘ
     = −8πG(ρ + 3p/c²)(RṘ/3) + (2Λc²/3)RṘ.

Dividing this by 2Ṙ gives


C D
3p R Λc2 R
R̈ = −4πG ρ + 2 +
c 3 3
C D
3p R Λc2 R
= −4πG ρm + ρr + 2 + ,
c 3 3
as required.
Exercise 1.4 If Λ = 0, then ΩΛ is always zero (Equation 1.17). From
Equation 1.33, we therefore have that (H/H0 )2 = (1 + z)3 when Ωm = 1 and
Λ = 0. Now, Equation 1.28 tells us that
H = −(1/(1 + z)) dz/dt,
so
(1/H0²) (1/(1 + z)²) (dz/dt)² = (1 + z)³,
which we may write more simply as dz/dt ∝ (1 + z)5/2 , or dt/dz ∝ (1 + z)−5/2 .
Integrating this with respect to z, we get t ∝ (1 + z)−3/2 . But 1 + z = R0 /R, so
t ∝ R3/2 , or
R = αt2/3 , (S1.2)
where α is some constant. In particular, at the current time t = t0 we have
2/3
R0 = αt0 , (S1.3)
and dividing Equation S1.2 by Equation S1.3 gives R/R0 = (t/t0 )2/3 .
Exercise 1.5 We can rearrange Equation 1.35 to read
R = R0 (t/t0)^(2/3). (S1.4)
From Equation 1.12, we have that H = (1/R) dR/dt. Differentiating
Equation S1.4, we get
dR/dt = (2/3) (R0/t0^(2/3)) t^(−1/3).
At a time t = t0 , this is just
dR/dt |_{t=t0} = (2/3) R0/t0.
Therefore the Hubble parameter at a time t = t0 in this model Universe is
H0 = (1/R0) dR/dt |_{t=t0} = (1/R0) (2/3) (R0/t0) = 2/(3t0),
or t0 = 2/(3H0 ) as required. Putting in H0 = 72 ± 3 km s−1 Mpc−1 , we find
t0 = 9.1 ± 0.4 Gyr.
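As a numerical check of t0 = 2/(3H0), a Python sketch (the unit-conversion factors are our own):

```python
# Matter-dominated flat universe: t0 = 2/(3 H0), with H0 as in the exercise.
H0_km_s_Mpc = 72.0
Mpc_m = 3.086e22                       # metres per megaparsec
H0_SI = H0_km_s_Mpc * 1e3 / Mpc_m      # s^-1
t0_s = 2.0 / (3.0 * H0_SI)
t0_Gyr = t0_s / 3.156e16               # seconds per Gyr
# t0_Gyr comes out close to the quoted 9.1 Gyr.
```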
Exercise 1.6 The angular diameter in degrees will be inversely proportional
to dA (Equation 1.47), so the angular area (e.g. in square degrees) will vary as

θ² ∝ d_A⁻². The flux will be inversely proportional to d_L² (Equation 1.49), i.e.
S ∝ d_L⁻². The surface brightness will therefore vary as S/θ² ∝ d_A²/d_L². But
d_L = (1 + z)² d_A (Equation 1.50), so surface brightness must vary as (1 + z)⁻⁴.
Exercise 1.7 In Section 1.5 we are given that H0 = 72 ± 3 km s−1 Mpc−1
and ΩΛ,0 = 0.742 ± 0.030. One parsec is 3.09 × 1016 m, so in SI
units, H0 = (2.3 ± 0.1) × 10−18 s−1 . Equation 1.17 relates these two
quantities to Λ: ΩΛ,0 = Λc2 /(3H02 ), so Λ = 3ΩΛ,0 H02 /c2 . Putting in the
numbers, we get Λ = (1.3 ± 0.2) × 10⁻⁵² m⁻². The horizon size will be
√(3/Λ) = (1.5 ± 0.1) × 10²⁶ m, or 4900 ± 300 Mpc. This cosmological event
horizon will be exceedingly distant; for comparison, the current radius of the
observable Universe in Section 1.9 is about 3.53c/H0 = 14 900 Mpc.
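The numbers in this solution can be reproduced as follows (a Python sketch; the variable names are ours):

```python
import math

# Values from the exercise: H0 = 72 km/s/Mpc, Omega_Lambda,0 = 0.742.
c = 2.998e8                               # m s^-1
Mpc_m = 3.086e22                          # metres per megaparsec
H0 = 72.0e3 / Mpc_m                       # s^-1
Lambda_ = 3.0 * 0.742 * H0**2 / c**2      # m^-2, from Omega_Lambda = Lambda c^2/(3 H0^2)
horizon_m = math.sqrt(3.0 / Lambda_)      # event-horizon scale sqrt(3/Lambda)
horizon_Mpc = horizon_m / Mpc_m           # comes out near 4900 Mpc
```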
Exercise 2.1 The 13.6 eV photon does ionize another atom. However, the
process of recombination needn’t result in the emission of just one photon.
Sometimes the electron will bind first in a high energy state (releasing one photon
with an energy < 13.6 eV), then release the remaining energy in stages as the
electron drops down the energy levels of the hydrogen atom. Each of these stages
will involve the release of a photon, but none of these photons will have enough
energy on its own to ionize hydrogen atoms.
Exercise 2.2 We are given that T = 2.725 ± 0.001 K, so the energy density
must be ρr,0 c2 = 4σT 4 /c = 4 × 5.67 × 10−8 × 2.7254 /(3.00 × 108 ) joules per
cubic metre, i.e. ρr,0 c2 = 4.17 × 10−14 J m−3 , or mass-equivalent density of
ρr,0 = 4.64 × 10−31 kg m−3 . Applying Equation 1.16, and remembering that
H0 = 100h km s−1 Mpc−1 = 3.24 × 10−18 h s−1 , we find that
Ωr,0 = 8πG ρr,0/(3H0²) = 2.47h⁻² × 10⁻⁵.
So
Ωr,0 h² ≈ 2.5 × 10⁻⁵,
as required.
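The chain of numbers in this solution can be scripted directly (a Python sketch with the same constants as the solution):

```python
import math

# CMB radiation density parameter, using the numbers quoted in the solution.
sigma = 5.67e-8        # Stefan-Boltzmann constant, W m^-2 K^-4
c = 3.00e8             # speed of light, m s^-1
G = 6.673e-11          # gravitational constant, N m^2 kg^-2
T = 2.725              # CMB temperature, K
u_rad = 4.0 * sigma * T**4 / c                 # radiation energy density, J m^-3
rho_r0 = u_rad / c**2                          # mass-equivalent density, kg m^-3
H0_per_h = 3.24e-18                            # H0 in s^-1 for h = 1
rho_crit_per_h2 = 3.0 * H0_per_h**2 / (8.0 * math.pi * G)
omega_r0_h2 = rho_r0 / rho_crit_per_h2         # comes out near 2.5e-5
```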
Exercise 2.3 The matter energy density scales as R−3 , while the
photon/neutrino energy density scales as R−4 . Therefore from Equations 1.15
and 1.16, Ωr /Ωm = (1 + z) Ωr,0 /Ωm,0 . From Exercise 2.2 and the text following
it, we have that Ωr,0 h² ≈ 4.2 × 10⁻⁵ (TCMB,0/2.725 K)⁴. The epoch of
matter–radiation equality must by definition satisfy Ωr /Ωm = 1, so
1 + zeq = Ωm,0/Ωr,0
        = Ωm,0 (h²/(4.2 × 10⁻⁵)) (TCMB,0/2.725 K)⁻⁴
        ≈ 23 800 Ωm,0 h² (TCMB,0/2.725 K)⁻⁴,
as required. Using Ωm,0 = 0.268, h = 0.704 and TCMB,0 = 2.725 K gives
zeq ≈ 3160.
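The final step is a one-line calculation; a quick Python check with the quoted parameters:

```python
# Redshift of matter-radiation equality for the quoted parameters.
omega_m0, h, T_cmb = 0.268, 0.704, 2.725
one_plus_zeq = 23800.0 * omega_m0 * h**2 * (T_cmb / 2.725) ** (-4)
zeq = one_plus_zeq - 1.0
# zeq comes out close to the quoted ~3160.
```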
Exercise 2.4 The analysis is the same up to Equation 1.30, where ρ this time
is ρr . However, instead of ρ = ρ0 × R03 /R3 , we must also take into account the


fact that photons lose energy from redshifting, so ρr = ρ0 × R04 /R4 . With Λ set to
zero, the equivalent of Equation 1.32 comes out as
(H/H0)² = (1 + z)² [1 − Ωr,0 + Ωr,0 (1 + z)²],
and inserting Ωr,0 = 1 and using H 2 = (1 + z)−2 (dz/dt)2 , we find that
(dz/dt)² = H0² (1 + z)⁶,
so
dz/dt = d(1 + z)/dt = H0 (1 + z)³.
Now the dimensionless scale factor a is related to redshift via a = 1/(1 + z), so
we could write this as
d(a⁻¹)/dt = H0 a⁻³,
thus
−a⁻² da/dt = H0 a⁻³,
hence
a da ∝ dt.
Integrating this gives a2 ∝ t, or a ∝ t1/2 as required.
Exercise 2.5 ℏ is measured in J s. A joule has dimensions of energy (like
½mv²) so it has dimensions ML²T⁻², where we write M for the dimension of
mass, L for length, and T for time. (Note that numerical constants are ignored
in dimensional analysis.) Therefore we can write the dimensions of ℏ as
[ℏ] = ML²T⁻¹. Similarly, the dimensions of c are [c] = LT⁻¹. To find the
dimensions of G, we can start with the familiar equation F = GM m/r2 , and note
that force is mass times acceleration, so ma = GM m/r2 or G = ar2 /M , so the
dimensions of G are [G] = LT−2 L2 /M = M−1 L3 T−2 . Now let’s suppose that the
Planck time is given by a formula of the form ℏ^x c^y G^z, where the constants x, y
and z are to be determined. The result must have the dimensions of time, so
T = (ML²T⁻¹)^x (LT⁻¹)^y (M⁻¹L³T⁻²)^z.
Multiplying this out and rearranging gives
T = M^(x−z) L^(2x+y+3z) T^(−x−y−2z).
The left-hand side has no mass M, so x − z must equal zero, i.e. x = z. The
left-hand side also has no length L, so 2x + y + 3z = 0. The left-hand side has
exactly one power of T, so −x − y − 2z = 1. We have three simultaneous
equations for three unknowns. Substituting in x = z into the other two equations
gives 5x + y = 0 and −3x − y = 1. Therefore y = −1 − 3x = −5x, or
x = 1/2. Since x = z, we have z = 1/2. Finally, any of the equations involving y
imply that y = −5/2. Therefore the characteristic time must be of the form
ℏ^x c^y G^z = ℏ^(1/2) c^(−5/2) G^(1/2) = √(ℏG/c⁵), as required.
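The three simultaneous equations can equally be solved as a linear system; a Python sketch using numpy (the matrix encoding is ours):

```python
import numpy as np

# Solve for x, y, z in t_Pl = hbar^x c^y G^z:
#   M: x - z = 0;  L: 2x + y + 3z = 0;  T: -x - y - 2z = 1.
A = np.array([[1.0, 0.0, -1.0],
              [2.0, 1.0, 3.0],
              [-1.0, -1.0, -2.0]])
b = np.array([0.0, 0.0, 1.0])
x, y, z = np.linalg.solve(A, b)
assert np.allclose([x, y, z], [0.5, -2.5, 0.5])

# Numerical Planck time, sqrt(hbar G / c^5):
hbar, G, c = 1.055e-34, 6.673e-11, 2.998e8
t_planck = hbar**x * c**y * G**z   # ~5.4e-44 s
```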


Exercise 2.6 We have already that (1/R) d2 R/dt2 = α(α − 1)t−2 . Since t is
positive and α > 1, the right-hand side must be positive. Therefore the left-hand
side must also be positive. Since R is also positive, d2 R/dt2 > 0.
Exercise 2.7 We start with
3H φ̇ = −V′ (Eqn 2.23)
and then use

H² = 8πV/(3m²Pl). (Eqn 2.24)
Now the H dt term in the integral in the question can also be expressed as
H dt = H (dt/dφ) dφ = (H/φ̇) dφ.
Next we use Equation 2.23 to get
H dt = H dφ/(−V′/(3H)) = −3H² dφ/V′.
Finally, using Equation 2.24 this comes out as
H dt = (−8π/m²Pl) (V/V′) dφ,
so we reach the required integral:
N = (−8π/m²Pl) ∫_{φ2}^{φ1} (V/V′) dφ.
For the next part, we set V′ ≈ V/φ and φ1 = 0 (as advised in the question) to
write this as
N = (−8π/m²Pl) ∫_{φ2}^{0} φ dφ.
Evaluating this integral gives
N = (4π/m²Pl) φ2² = (2√π φ2/mPl)².
Thus to have N > 60 we need φ2 > mPl √60/(2√π), or in other words,
φ2 > 2.2mPl .
Exercise 2.8 No, not immediately. At first the CMB will appear very uniform,
as you receive light from only your immediate neighbourhood. As time progresses
you will receive light from larger and more distant parts of the Universe. You’ll
only be able to see the structures with wavelength λ once light has had time to
travel the distance λ, i.e. after a time δt = λ/c, where c is the speed of light. The
size of the largest acoustic peak is set by the sound horizon after inflation. Once
light has had time to travel this distance, all the acoustics will start to become
visible. Also, the acoustic peaks will have a different angular size on the sky,
because the surface of last scattering was closer. Finally, the CMB wouldn’t have
peaked at microwave wavelengths then, so perhaps we shouldn’t call it the CMB
then!

Exercise 2.9 We found in Section 2.7 that the particle horizon radius at
recombination was 2c/H = 0.46 Mpc. The sound speed is cs = c/√3, so the
sound horizon will be 2cs/H = (2c/H) × (cs/c) = 0.46/√3 Mpc = 0.27 Mpc.
Exercise 2.10 Dark matter clumps through gravitation, while dark energy
appears to be smoothly distributed through space. Dark matter is also essentially
pressureless, with Ωm dominated by the rest mass of the dark matter particles,
while dark energy has a strong negative pressure. Dark matter makes up about
20% of the total energy density of the Universe, and at recombination made up
about 70%. Dark energy, meanwhile, was negligible at recombination and yet
dominates the present-day energy density of the Universe. (One hopes that it
will soon be possible to add that the dark matter particle has been directly
detected, though that is not yet true at the time of writing; certainly, the proposed
particle physics mechanisms for generating dark matter and dark energy are very
different.)
Exercise 2.11 One parsec is about 3.09 × 1016 m, so
H0 = 72 × 10³/(10⁶ × 3.09 × 10¹⁶) ≈ 2.33 × 10⁻¹⁸ s⁻¹. In Chapter 1 we saw
that ΩΛ,0 = Λc2 /(3H02 ), so Λ = 3ΩΛ,0 H02 /c2 . Putting in the numbers gives
Λ = 1.3 × 10−52 m−2 .
Exercise 3.1 The luminosity contributed by a shell of radius r → r + dr will
be I(r) times the area of the shell, 2πr dr. Summing these shells, the total
luminosity will be L = ∫₀^∞ I(r) 2πr dr. Let's define L0 to be the luminosity with
I0 = r0 = 1, i.e.
L0 = ∫₀^∞ f(r) 2πr dr.
Now let’s calculate the luminosity in the more general case:
L = ∫₀^∞ I0 f(r/r0) 2πr dr
  = I0 r0² ∫₀^∞ f(r/r0) 2π (r/r0) d(r/r0).
But this integral has the same form as the integral defining L0 , which also
integrates from 0 to ∞, so L = I0 r02 L0 .
Exercise 3.2 A shell of thickness dr and radius r will have mass
dM = 4πr2 ρ dr. The gravitational potential energy of this shell will be
dEGR = −G M(< r) dM/r, (S3.1)
where M (< r) is the mass enclosed within a radius r, i.e.
M(< r) = (4/3)πr³ρ,
and the mass of the shell is
dM = 4πr2 ρ dr.
Substituting this into Equation S3.1 gives
dEGR = −G (4/3)πr³ρ dM/r = −G (4/3)πr²ρ × 4πr²ρ dr,

so
dEGR = −3G ((4/3)πr²ρ)² dr.
Integrating this from radius 0 to radius R gives
EGR = ∫₀^R −3G ((4/3)πr²ρ)² dr = −3G ((4/3)πρ)² R⁵/5
    = (−3G/(5R)) ((4/3)πR³ρ)²
    = −3GM²/(5R),
where M = (4/3)πR³ρ is the total mass of the sphere.
Exercise 3.3 The kinetic energy will be EK = (3/2)NkT, where N is the number
of gas particles. Virial equilibrium is 2EK = −EGR, i.e.
3NkT = (3/5) GM²/R.
The requirement for gravitational collapse is therefore
3NkT < (3/5) GM²/R.
To reach Equation 3.7, we need to eliminate N and R. To a good approximation,
at recombination we can assume that the gas particle masses are the proton
mass mp, so the number of particles must be N = M/mp. We can also use
M = (4/3)πρR³ to eliminate R, since R = (3M/(4πρ))^(1/3). Inserting these
substitutions gives
3 (M/mp) kT < (3/5) GM² (4πρ/(3M))^(1/3),
which when rearranged in terms of M gives the required equation.
The current temperature of the CMB is about 2.7 K, and the redshift of
recombination is about z = 1000, so the photon temperature at recombination
must be T = 2.7(1 + z) ≈ 3000 K. Matter and radiation will just have been in
thermal equilibrium, so this will have been the matter temperature too. The
baryonic density will be proportional to (1 + z)3 , and using Equation 1.26 and
ρb = Ωb ρcrit (Equation 1.22), we have that the baryonic density at z = 1000 will
be
ρb = ρb,0 (1 + z)3
= ρcrit × Ωb,0 (1 + z)3
= 1.8789 × 10−26 × Ωb,0 h2 (1 + z)3 kg m−3
= 1.8789 × 10−26 × 2.273 × 10−2 × (1 + 1000)3 kg m−3
≈ 4.3 × 10⁻¹⁹ kg m⁻³.
Putting in the numbers gives
M > [5 × (1.381 × 10⁻²³ J K⁻¹) × 3000 K / ((6.673 × 10⁻¹¹ N m² kg⁻²) × (1.673 × 10⁻²⁷ kg))]^(3/2)
    × [3/(4π × 4.3 × 10⁻¹⁹ kg m⁻³)]^(1/2)
  ≈ 2 × 10³⁶ kg ≈ 10⁶ M⊙, as required.
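Putting in the numbers can be scripted; a Python sketch with the same constants as the solution:

```python
import math

# Jeans-style collapse mass at recombination, with the exercise's numbers.
k_B = 1.381e-23     # Boltzmann's constant, J K^-1
G = 6.673e-11       # gravitational constant, N m^2 kg^-2
m_p = 1.673e-27     # proton mass, kg
T = 3000.0          # temperature at recombination, K
rho_b = 4.3e-19     # baryonic density at z = 1000, kg m^-3
M = (5.0 * k_B * T / (G * m_p)) ** 1.5 * (3.0 / (4.0 * math.pi * rho_b)) ** 0.5
M_solar = M / 1.99e30
# M comes out near 2e36 kg, i.e. ~1e6 solar masses.
```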

Exercise 3.4 For a flat universe, the comoving distance is the same as the
proper motion distance (Equation 1.56). This isn’t true in general (watch out!) but
it’s true in a flat universe. The proper motion distance is related to the angular
diameter distance dA by Equation 1.50, which gives dA = dcomoving /(1 + z). The
definition of angular diameter distance in Equation 1.47 gives us a relationship
between the size of an object as it was at the time of redshift z and the angular
size as it appears today. The proper size of the BAO wiggles is just the comoving
size divided by (1 + z), i.e. LBAO /(1 + z). The angular diameter distance to
redshift z is therefore dA = (LBAO /(1 + z))/θBAO . The comoving distance to
redshift z must therefore be dcomoving = dA × (1 + z) = LBAO /θBAO , as required.
Exercise 3.5 Here the trick is to use Equation 1.43. It follows from that
relation that a small comoving interval along the redshift axis must equal
δdcomoving = c δz/H(z). Setting this comoving interval to LBAO gives us
LBAO = c δz/H(z), so H(z) = c δz/LBAO , as required.
Exercise 3.6 No. The amplitude of the fluctuations could depend on the bias,
but the scale length itself is bias-independent.
Exercise 4.1 First, we need to get Equation 1.7 into a form where the only
time-dependent parameter is R. The density ρ is time-dependent and varies as
ρ = ρ0 (R0/R)³ (where subscript 0 indicates present-day values), so we have
(dR/dt)² = (8πG/3) ρ0 (R0/R)³ R² − c² = (8πGρ0R0³/3) R⁻¹ − c²
(where we’ve used k = +1). If we set dR/dt = 0 and solve, we find that
Rmax = 8πGρ0R0³/(3c²). Therefore
(dR/dt)² = c² (Rmax/R) − c².
Using the chain rule we have that
(dR/dθ)² = (dR/dt)² (dt/dθ)² = (dR/dt)² (R/c)²
and so
(dR/dθ)² = (R/c)² [c² (Rmax/R) − c²] = Rmax R − R²,
as required.
We’re asked to verify that Equation 4.2 works rather than proving it, so all we
have to do is substitute it in. Differentiating Equation 4.2 with respect to θ gives
dR/dθ = (Rmax/2) sin θ,
so
(dR/dθ)² = (Rmax²/4) sin²θ = (Rmax²/4)(1 − cos²θ).


Meanwhile,
Rmax R − R² = (Rmax²/2)(1 − cos θ) − (Rmax²/4)(1 − cos θ)²
            = (Rmax²/4)(2 − 2 cos θ) − (Rmax²/4)(1 + cos²θ − 2 cos θ)
            = (Rmax²/4)(2 − 2 cos θ − 1 − cos²θ + 2 cos θ)
            = (Rmax²/4)(1 − cos²θ),
which equals (dR/dθ)² as above.
which equals (dR/dθ)2 as above.
Finally, we just need to differentiate Equation 4.3, which gives
dt/dθ = (Rmax/(2c))(1 − cos θ) = R/c,
as required.
Therefore Equations 4.2 and 4.3 are a solution.
Exercise 4.2 To show this, we’ll first get things in terms of H. It’s a flat
matter-dominated universe, so Ωm = 1 = 8πGρm /(3H 2 ), thus 4πGρm = 3H 2 /2.
We also know that H(t) = ȧ/a. Substituting this into Equation 4.9, we have
δ̈ + 2H(t) δ̇ = 3H 2 (t) δ/2.
Next we use H(t) = 2/(3t) to reformulate this in terms of a differential equation
involving just δ and time:
δ̈ + (4/(3t)) δ̇ = (3/2)(2/(3t))² δ = (2/(3t²)) δ.
Next, let’s try power law solutions δ = btc where b and c are constants. Then
δ̇ = bctc−1 and δ̈ = bc(c − 1)tc−2 . Substituting in, we find
bc(c − 1)t^(c−2) + (4/(3t))bct^(c−1) = (2/(3t²))bt^c.
Collecting the terms together, we find that
bc(c − 1)t^(c−2) + (4/3)bct^(c−2) = (2/3)bt^(c−2),
and dividing through by btc−2 gives
c(c − 1) + (4/3)c = 2/3.
The solution to this quadratic equation is c = 2/3 or c = −1. The −1 solution is
known as the decaying mode, and is not physically relevant in this universe (it
decays more rapidly than the growing mode grows and is quickly negligible). The
2/3 power law time-dependence (which we found ultimately from linearized fluid
dynamic equations) is identical to Equation 4.8, which is why the latter is known
as the linear theory.
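The quadratic can be checked numerically; a short numpy sketch:

```python
import numpy as np

# c(c - 1) + (4/3)c = 2/3 rearranges to c^2 + (1/3)c - 2/3 = 0.
roots = np.sort(np.roots([1.0, 1.0 / 3.0, -2.0 / 3.0]).real)
# Decaying mode c = -1 and growing mode c = 2/3:
assert np.allclose(roots, [-1.0, 2.0 / 3.0])
```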
Exercise 4.3 The redder colour will be the one with the larger V-band
to B-band flux ratio SV /SB . The fluxes are related to the magnitudes by


V = −2.5 log10 SV + cV and B = −2.5 log10 SB + cB , where cV and cB are


constants (not necessarily identical). Therefore
(B–V) = −2.5 log10 SB + cB + 2.5 log10 SV − cV
= −2.5(log10 SB − log10 SV ) + (cB − cV )
= −2.5 log10 (SB /SV ) + (cB − cV )
= 2.5 log10 (SV /SB ) + (cB − cV ),
which gives
2.5 log10 (SV /SB ) = (B–V) − (cB − cV )
so
log10 (SV /SB ) = (B–V)/2.5 − (cB − cV )/2.5
thus
(SV /SB ) = 10(B–V)/2.5−(cB −cV )/2.5
= 10(B–V)/2.5 × 10−(cB −cV )/2.5
= 10(B–V)/2.5 × constant.
Therefore the larger the value of (B–V), the larger the value of SV /SB . Therefore
(B–V) = 1 is redder than (B–V) = 0.
Exercise 4.4 We haven’t specified the geometry yet, so let’s keep things
simple. Let’s take the dust and stars to be in a cylinder facing us, with
cross-sectional area A. Let’s set the length of the cylinder to be h, and measure
distances along this length with the variable x. An infinitesimal layer would have
thickness dx and volume A dx. The bigger the volume, the more stars it will
contain, so let’s set the luminosity of the shell to be dL = ρA dx, where ρ is a
constant (the luminosity density). By the time the light emerges from the end of
cylinder, it will have been extinguished by a factor of eτ (x) , where τ (x) is
the optical depth at a distance x into the cylinder. This optical depth must be
proportional to x, because each increment δx will suppress the light by the same
factor, which we could write as eδτ , so let’s write that as τ = kx. We could,
for example, write the optical depth from one end of the cloud to the other as
τ_total = kh. The light that emerges from the shell at x → x + dx will therefore be
dLout = dL × e^(−τ(x)) = ρA dx × e^(−kx). If we integrate that from x = 0 to x = h,
we get
Lout = ∫_{x=0}^{h} ρA e^(−kx) dx = (ρA/k)(1 − e^(−kh)).
Some quick checks: note that k has dimensions of one over length (because
τ = kx and τ is dimensionless), so A/k has dimensions of volume, and so ρA/k
is luminosity density times volume, which is a luminosity. Note also that kh is
dimensionless.
Now, what would happen if there were no dust? The luminosity would just be
L_no dust = ρAh. The dust has therefore reduced the output luminosity by a factor
L_out/L_no dust = (ρA/k)(1 − e^(−kh))/(ρAh) = (1/(kh))(1 − e^(−kh)).
This ratio is independent of the geometrical cross section A and of the luminosity
density ρ. If the cloud is deep enough, then the term in brackets is ≈ 1, so we just
have L_out/L_no dust = 1/(kh). We can now write this for Hα light:
L_out(Hα)/L_no dust(Hα) = 1/(k_Hα h).
For Hβ, we have that τ_Hβ ≈ 1.45 τ_Hα, so k_Hβ = 1.45 k_Hα, thus
L_out(Hβ)/L_no dust(Hβ) = 1/(k_Hβ h) = 1/(1.45 k_Hα h) = (1/1.45) × L_out(Hα)/L_no dust(Hα).
Therefore
L_out(Hα)/L_out(Hβ) = 1.45 × L_no dust(Hα)/L_no dust(Hβ). (S4.1)
This is independent of h, so we've now removed all dependence on the geometry.
So even if kh is enormous and L_out ≪ L_no dust, the luminosity ratio of Hα and Hβ
is only ever 1.45 times the ratio that you get with no dust, when enough dust is
evenly mixed with the gas emitting the emission lines.
Now suppose that you wrongly assumed that it's a simple dust screen with an
optical depth of τ_Hα for Hα and τ_Hβ = 1.45 τ_Hα for Hβ. Your luminosities
would be
L_out(Hα) = L_no dust(Hα) × e^(−τ_Hα),
L_out(Hβ) = L_no dust(Hβ) × e^(−1.45 τ_Hα),
so the luminosity ratio would be
L_out(Hα)/L_out(Hβ) = [L_no dust(Hα)/L_no dust(Hβ)] × e^(0.45 τ_Hα). (S4.2)
Comparing this to Equation S4.1, we have 1.45 = e^(0.45 τ_Hα), or
τ_Hα = ln(1.45)/0.45 ≈ 0.83. Since τ_Hα ≈ 0.7 A_V, we have A_V ≈ 1.2. So, if you
have an optically-thick cloud in which the dust is well-mixed with the gas, but you
wrongly assumed a foreground dust screen, you'd infer a V-band extinction of just
1.2 magnitudes, regardless of what the real extinction τ_total is from one end of the
cloud to the other.
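The saturation of the Balmer ratio can be verified numerically (a sketch; the variable names and the choice of τ_Hα = 50 for an optically-thick slab are ours):

```python
import math

def slab_escape_fraction(tau_total):
    # L_out / L_no_dust for emitters uniformly mixed with dust of total optical
    # depth tau_total along the line of sight: (1 - exp(-tau)) / tau.
    return (1.0 - math.exp(-tau_total)) / tau_total

tau_halpha = 50.0  # a deeply optically-thick slab (hypothetical value)
balmer_boost = slab_escape_fraction(tau_halpha) / slab_escape_fraction(1.45 * tau_halpha)
# balmer_boost saturates at 1.45 however large tau_halpha becomes.

# Misinterpreting that saturated ratio as a foreground screen:
tau_screen = math.log(1.45) / 0.45   # from 1.45 = exp(0.45 * tau_screen)
A_V = tau_screen / 0.7               # using tau_Halpha ~ 0.7 A_V
```

However large tau_halpha is made, balmer_boost stays pinned near 1.45, which is the point of the exercise.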
Exercise 4.5 Astronomical absolute magnitudes are defined as
m = −2.5 log10 L + constant, so
dm = −2.5 d(log10 L) = −2.5 d(ln L)/ln 10 = (−2.5/ln 10)(1/L) dL. (S4.3)
Therefore
dN/dm = −(ln 10/2.5) L dN/dL. (S4.4)
The − sign just indicates that the magnitude increment dm is in the opposite
sense to the luminosity increment dL, and is usually neglected.
Exercise 4.6 The variance of a probability distribution p(x) is the mean of the
squares minus the square of the mean, i.e.
Var(x) = ∫₀¹ x² p(x) dx − [∫₀¹ x p(x) dx]².
Now, our probability distribution is uniform, so p(x) = 1 for all x from 0 to 1,
hence this is just
Var(x) = ∫₀¹ x² dx − [∫₀¹ x dx]²
       = [x³/3]₀¹ − ([x²/2]₀¹)²
       = 1/3 − 1/4 = 1/12,

as required. The standard deviation is the square root of the variance, so the
standard deviation of the uniform distribution is 1/√12. The central limit theorem
states that if you have N measurements, each with an uncertainty σ (i.e. taken
from the same distribution with standard deviation σ), then the standard deviation
of the mean average of these measurements is σ/√N. Now, if our null hypothesis
holds, then V/Vmax is uniformly distributed, so each measurement of V/Vmax is
taken from a distribution with standard deviation 1/√12. Therefore the standard
deviation of the average of N measurements of V/Vmax must be 1/√(12N), as
required.
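A quick Monte Carlo confirms the 1/√(12N) scatter (a sketch; the sample sizes and the random seed are arbitrary choices of ours):

```python
import math
import random

random.seed(42)
N = 10_000      # measurements per sample
trials = 200    # independent samples

means = []
for _ in range(trials):
    # Under the null hypothesis V/Vmax is uniform on [0, 1].
    sample_mean = sum(random.random() for _ in range(N)) / N
    means.append(sample_mean)

mean_of_means = sum(means) / trials
scatter = math.sqrt(sum((m - mean_of_means) ** 2 for m in means) / trials)
expected = 1.0 / math.sqrt(12 * N)  # predicted standard deviation of the mean
```

The measured scatter of the 200 sample means should agree with 1/√(12N) to within the Monte Carlo noise.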
Exercise 4.7 Yes, provided that the selection function has been correctly stated.
Exercise 4.8 No, not necessarily. Suppose that you had a volume-limited
sample with Vmax = V (zmax ) for all galaxies. Now suppose that half the galaxies
exist at exactly z = 0, half are at z = zmax , and there are none in between.
Clearly, the numbers of galaxies are evolving very strongly and discontinuously,
but ⟨V/Vmax⟩ = 1/2.
Exercise 4.9 The amount of light emitted per unit volume will be given by the
number density of galaxies multiplied by their luminosity, i.e. L × φ(L). At
luminosities far below the break, φ(L) ∝ L^(−α), so L φ(L) ∝ L^(1−α). Since we're
given that the faint-end slope α satisfies α < 1, this must be increasing with
luminosity. At the bright end we have that φ ∝ exp(−L/L∗ ), which tends to zero
faster than 1/L, so L φ(L) (which is proportional to L exp(−L/L∗ )) must also
tend to zero. We’d expect one turning point — but where? We can differentiate
L φ(L), set the result equal to zero and rearrange. This gives
d(Lφ)/dL = φ*[−e^(−L/L*)(L/L*)^(1−α) + (1 − α)e^(−L/L*)(L/L*)^(−α)] = 0.
Dividing by φ* e^(−L/L*) gives (L/L*)^(1−α) = (1 − α)(L/L*)^(−α). Further dividing
by (L/L*)^(−α) gives L/L* = 1 − α, or L = (1 − α)L*. The galaxies that dominate
the cosmic luminosity density are therefore those with luminosities of (1 − α)L∗ .
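The location of the turning point can be verified numerically (a sketch; α = 0.7 is an arbitrary choice of ours, and φ* and L* scale out):

```python
import math

alpha = 0.7  # an arbitrary faint-end slope with alpha < 1

def luminosity_density(x):
    # x = L/L*; L phi(L) is proportional to (L/L*)**(1 - alpha) * exp(-L/L*).
    return x ** (1.0 - alpha) * math.exp(-x)

# Locate the maximum numerically on a fine grid of L/L*.
grid = [i * 1e-4 for i in range(1, 100_000)]
x_peak = max(grid, key=luminosity_density)
# Analytic prediction: the peak sits at L/L* = 1 - alpha.
```

The grid maximum lands at L/L* = 1 − α to within the grid spacing, matching the analytic result.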
Exercise 4.10 PDE is vertical translations, while PLE is horizontal translations.
Exercise 4.11 Active galaxies can be seen to much higher redshifts than the
elliptical galaxies used in the Tolman test in Chapter 3, and as the predicted
redshift-dependence of surface brightness is strong, i.e. (1 + z)⁴, it might appear
that the radio lobes of radiogalaxies have a strong advantage. The attraction of the
Tolman test is that the (1 + z)⁴ surface brightness prediction is independent of
the cosmological parameters. In order to apply it, we need a population of
objects whose luminosity per unit area (in, for example, square parsecs) is
constant. In this case, rearranging the relation in the question gives us
L/r² ∝ Q^(7/6) r^(−4/3) ρ^(7/12). We might hope to find active galaxies with the same
Q on average if we match other properties of the central engine (e.g. optical
emission lines and continuum) on average. We might also be able to calibrate out
any variations in density through other observations as indicated in the question,
but we’re still left with a surface brightness that depends on the linear size of the
system. Without additionally having a standard rod as a comparison, we can’t
apply the Tolman test as it stands.
Exercise 4.12 There are 60 × 60 = 3600 arcseconds in a degree, so
there are 3600² ≈ 1.30 × 10⁷ square arcseconds in a square degree.
Therefore the number of random 5σ noise spikes in one square degree would
be (1.30 × 10⁷)/(3.5 × 10⁶) ≈ 3.7. So we'd expect one 5σ noise spike in
1/3.7 square degrees, or about 0.27 square degrees. In practice, noise spikes can
occur more frequently than this for a variety of reasons (including instrumental
effects).
Exercise 4.13 Suppose that your camera or detector covers an area A on the
sky. Let’s say that you invest all your time in a pencil-beam survey, and it reaches
a flux S. The number counts are Euclidean, so N(>S) = kS^(−1.5), where k is
some constant. Therefore the number of galaxies seen in the pencil-beam survey is
n_pencil = A × N(>S) = AkS^(−1.5).
Now suppose that instead of doing a pencil-beam survey, you spread your
integration time over m fields of view, each of which has area A. The total area
that you cover is m × A, but the images would be shallower by a factor of √m, so
the total number of galaxies in the wide-field survey would be
n_wide = mAk(√m S)^(−1.5) = mAk m^(−0.75) S^(−1.5) = m^(0.25) AkS^(−1.5).
Comparing this to n_pencil, we see that n_wide = m^(0.25) n_pencil, so the wide-field
survey finds more galaxies by a factor of m^(0.25).
A similar calculation shows that if the source counts are steeper than
N(>S) ∝ S^(−2), then the pencil-beam survey would see more. However, only
rarely are source counts that steep (we’ll see an example in Chapter 5). In the vast
majority of cases, wide-field surveys find more objects in a given observing time
than pencil-beam surveys. In practice, though, there’s often a limit to how wide
you can make a survey, because the time spent simply moving the telescope or
reading out the detector becomes significant (we’ve neglected both effects here).
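The trade-off can be sketched in a few lines (the function name and the choice m = 16 are ours, in arbitrary flux and area units):

```python
def n_sources(area, flux_limit, k=1.0, slope=1.5):
    # Euclidean integral counts: N(>S) = k * S**(-slope) per unit area.
    return area * k * flux_limit ** (-slope)

A, S, m = 1.0, 1.0, 16  # arbitrary units; m fields of view (hypothetical values)
n_pencil = n_sources(A, S)
n_wide = n_sources(m * A, m ** 0.5 * S)  # m times the area, sqrt(m) shallower
gain = n_wide / n_pencil                 # the m**0.25 advantage
```

With m = 16 the wide-field survey finds 16^0.25 = 2 times as many sources in the same observing time.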
Exercise 5.1 Iν dν is the background intensity in an interval ν → ν + dν.
The background per decade is the background in a logarithmic interval,
ν → ν + d log10 ν. Let’s write this as B d log10 ν. If we can set this
equal to something times dν, then that something must be Iν . Now,
d log10 ν = (d ln ν)/ ln(10), so B d log10 ν = (B/ ln(10)) d ln ν. But
d ln ν = (1/ν) dν, so
B d log10 ν = [1/(ν ln(10))] B dν.
Therefore
Iν = B/(ν ln(10)),
so B = ln(10) νIν. Therefore the background intensity per decade of frequency is
proportional to νIν . Looking at Figure 5.1, we see that the far-infrared bump has a
similar height and area to the optical/near-infrared bump, each over roughly
the same logarithmic frequency interval of about Δ log10 ν = 1.5. Therefore
there’s about the same energy output in the far-infrared bump as in the
optical/near-infrared bump.
Exercise 5.2 This will be the one in which Sν dN/d ln Sν is a maximum, and
since d ln Sν = Sν^(−1) dSν, we can also express this as Sν² dN/dSν. This is similar
(though not quite identical) to Figure 5.2.
Exercise 5.3 The angular resolution in radians is
1.22λ/D = 1.22 × 500 × 10⁻⁶/3.5 = 1.7429 × 10⁻⁴ (we'll carry some
extra significant figures until the end of the calculation). In degrees this
is 1.7429 × 10⁻⁴ × 360°/(2π) = 0.0099858°. In arcseconds this is
0.0099858 × 3600 = 35.95″, or 36.0″ to the accuracy of the initial numbers.
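The same arithmetic as a sketch in Python:

```python
import math

wavelength = 500e-6  # 500 microns, in metres
diameter = 3.5       # mirror diameter in metres

theta_rad = 1.22 * wavelength / diameter               # diffraction limit
theta_arcsec = theta_rad * (180.0 / math.pi) * 3600.0  # radians -> arcseconds
```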
Exercise 5.4 (a) The fractional range would be 0.09/0.15 = 0.6 or 60%,
which we could also quote as a possible variation of a factor of 1/0.6 = 1.7.
(b) The variation in β changes the extrapolation from the 800 µm quoted to the
rest frame, which is 850/(1 + z) µm = 850/4 µm = 212.5 µm. The wavelength
dependence is λ^(−β), so
kd(800 µm)/kd(212.5 µm) = (800/212.5)^(−β) = 3.765^(−β),
i.e. 0.0705–0.2656 when β = 1–2, or a further variation of a factor of 3.8. The
total variation so far is 1.7 × 3.8 ≈ 6.5.
(c) Using the black body spectrum given in Equation 2.2
and putting in the numbers for a wavelength of 212.5 µm (i.e.
ν = c/212.5 µm = 2.998 × 10⁸ m s⁻¹/(212.5 × 10⁻⁶ m) = 1.411 × 10¹² Hz) and
temperatures of T₁ = 20 K and T₂ = 40 K, we find that
B(1.411 THz, 40 K)/B(1.411 THz, 20 K) = [exp(hν/(kT₁)) − 1]/[exp(hν/(kT₂)) − 1]
= [exp(6.626 × 10⁻³⁴ J s × 1.411 × 10¹² Hz/(1.381 × 10⁻²³ J K⁻¹ × 20 K)) − 1]
  /[exp(6.626 × 10⁻³⁴ J s × 1.411 × 10¹² Hz/(1.381 × 10⁻²³ J K⁻¹ × 40 K)) − 1]
= 6.435.
The range of allowed temperatures therefore gives an additional fractional range
of 6.4, so the total fractional range is 1.7 × 3.8 × 6.4 ≈ 41, i.e. we cannot even
quote a dust mass to within an order of magnitude!
However, if we measure fluxes at more wavelengths, we might be able to reduce
these uncertainties by constraining the value of β on the Rayleigh–Jeans tail, and
determining the temperature from the wavelength λmax of the location of the peak
of the spectral energy distribution. This is quantified with the Wien displacement
law, which can be expressed in astrophysically-useful quantities as
λmax/(100 µm) = 1.45 × (20 K/T).
There is, however, still the issue that galaxies do not have single temperatures.
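The three uncertainty factors above can be multiplied out in a few lines (a sketch; the variable names are ours):

```python
import math

h = 6.626e-34            # J s
kB = 1.381e-23           # J K^-1
nu = 2.998e8 / 212.5e-6  # rest-frame frequency in Hz

factor_a = 0.15 / 0.09  # (a) spread in the opacity normalization k_d
factor_b = (800.0 / 212.5) ** 2 / (800.0 / 212.5)  # (b) beta = 2 versus beta = 1
factor_c = (math.exp(h * nu / (kB * 20.0)) - 1.0) \
         / (math.exp(h * nu / (kB * 40.0)) - 1.0)  # (c) T = 20 K versus 40 K

total = factor_a * factor_b * factor_c  # roughly a factor of 40
```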
Exercise 5.5 Suppose that there were no background. In some fixed
observing time, suppose that we collect N photons from a distant object. Using
Poisson statistics, the variance on this number will also be N, so the standard
deviation (i.e. the noise) will be √N. The signal-to-noise ratio will therefore
be N/√N = √N. Now suppose that there's a strong background, so
we observe N + Nback photons, with Nback ≫ N. The noise on this will be
√(N + Nback) ≈ √Nback ≫ √N. What we want is N and not N + Nback, so we
have to observe an additional blank bit of sky to estimate Nback. This can be done
if we have a small object in our camera, so we can use blank bits of the image, but
if our detector has only one or a small number of pixels, we have to spend extra
time observing blank sky. However, even neglecting the uncertainty on our Nback
estimate, we still have a signal-to-noise ratio of N/√Nback, which is much less
than the N/√N that we'd have in the case of no background. So once Nback ≥ N
we enter the background-limited regime where good signal-to-noise is harder to
get. In the case of the SCUBA camera, the faintest objects are ≈ 10⁵–10⁶ times
fainter than the sky background. Worse, the background varies on timescales of
less than a second, so observing techniques at submm wavelengths are often
geared towards making the best background subtraction.
Exercise 5.6 See Figure S5.1.

[Figure: luminosity/(L⊙ sr⁻¹) plotted against redshift z, with tracks for Arp 220 and M82.]

Figure S5.1 This is the same as Figure 5.15, but with the approximate location
of one possible flux limit marked as a thick black line.

Exercise 6.1 Suppose that we wanted to separate a human being into protons
and electrons, then hold them one metre apart. For a 60 kg mass, the force
required would be F = (ne)²/(4πε₀r²), where r = 1 m, n = 60 kg/m_p and
ε₀ is the permittivity of free space. This comes out as a gigantic
F ≈ 3 × 10²⁹ kg m s⁻². The luminosity of the Sun is L⊙ = 3.83 × 10²⁶ W, so the
momentum flux from the Sun is L⊙/c = 1.28 × 10¹⁸ kg m s⁻². If we could
employ all the momentum flux from all the ≈ 10¹¹ stars in the Galaxy in keeping
the positive and negative parts separate, it would be just sufficient to maintain a
1 m separation for just 60 kg. The potential barrier for separating the charged
components of a plasma accreting around a black hole is clearly insuperable for
radiation pressure.
Exercise 6.2 Putting the numbers into Equation 6.6 gives
LE = 4π × (6.67 × 10⁻¹¹ N m² kg⁻²) × (3.00 × 10⁸ m s⁻¹) × (1.99 × 10³⁰ kg) × (1.67 × 10⁻²⁷ kg)/(6.65 × 10⁻²⁹ m²)
   = 1.26 × 10³¹ W.
The luminosity of the Sun is 3.83 × 1026 W, which is far below the Eddington
limit.
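The same calculation as a sketch in Python (variable names are ours):

```python
import math

G = 6.67e-11        # N m^2 kg^-2
c = 3.00e8          # m s^-1
M = 1.99e30         # kg, one solar mass
m_p = 1.67e-27      # kg, proton mass
sigma_T = 6.65e-29  # m^2, Thomson cross-section

L_E = 4.0 * math.pi * G * c * M * m_p / sigma_T  # Eddington luminosity
L_sun = 3.83e26                                  # W
ratio = L_sun / L_E  # the Sun sits far below its Eddington limit
```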
Exercise 6.3 Assuming that the mass of a 100 W light bulb is (say) about 50 g,
we get an Eddington limit of just 0.2 W. Clearly, a light bulb radiates at much
more than the Eddington limit. Light bulbs don’t blow themselves apart because
they are not gravitationally bound.
Exercise 6.4 To obtain Equation 6.26 we start with Equation 1.53, then use
Equation 1.41. It immediately follows that
dV = 4πd_A²(1 + z)³ × c dz/[(1 + z)H(z)] = 4πd_A²(1 + z)² × c dz/H(z).
(We ignore the − sign, which just refers to the directions in which the
infinitesimal increments are measured.) Next, putting in the relationship between
angular diameter and luminosity distance, d_L = (1 + z)² d_A (Equation 1.50), gives
dV = [4πd_L²/(1 + z)⁴] × (1 + z)² c dz/H(z) = 4πd_L² c dz/[(1 + z)² H(z)].
Dividing by dz and multiplying by H₀/H₀ gives
dV/dz = 4πc d_L²/[(1 + z)² H(z)] = (c/H₀) × 4πd_L²/[(1 + z)² H(z)/H₀],
as required.
We can rearrange this as
4πd_L²/(dV/dz) = (1 + z)² H(z)/c.
Finally, we use Equation 1.28: |dz/dt| = (1 + z) H(z) (again we’ll not worry
about the sign). Therefore
[4πd_L²/(dV/dz)] dt = (1/c)(1 + z) dz,
which is Equation 6.24, as required.
Exercise 6.5 The angular radius θ will satisfy θ ≈ tan θ = r_h/D,
where D = 10 Mpc and r_h is given by Equation 6.29:
r_h = 10 × (10⁸/10⁸) × (220/200)⁻² pc = 8.3 pc. Plugging in the numbers, we
have θ ≈ r_h/D = 8.3 pc/10 Mpc = 8.3 × 10⁻⁷ radians. In arcseconds this is
θ = 8.3 × 10⁻⁷ × (360°/(2π)) × 60 × 60 = 0.17″ (or double that for the diameter).
This is clearly smaller than the seeing limit of ground-based telescopes.
Exercise 6.6 The e-folding timescale for Eddington-limited black hole growth
is t_e-fold = 4 × 10⁸ × η/(1 − η) yr. There have been 3 × 10⁹ yr/t_e-fold e-foldings
since the start of the Universe, or 7.5 × (1 − η)/η e-foldings. To reach 10⁶ M⊙,
one needs ln(10⁶/10¹) = 11.5 e-foldings. If η = 0.42, there is only time for
10.4 e-foldings. In order to grow a black hole large enough, it must be spinning
more slowly and therefore have a lower accretion efficiency.
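A sketch of the bookkeeping (the helper name and the rearrangement for the limiting efficiency are ours):

```python
import math

def efoldings_available(eta, t_universe_yr=3e9):
    # e-folding time for Eddington-limited growth: 4e8 * eta / (1 - eta) years.
    return t_universe_yr / (4e8 * eta / (1.0 - eta))

needed = math.log(1e6 / 10.0)          # grow from 10 to 1e6 solar masses
available = efoldings_available(0.42)  # high-spin, high-efficiency case
# Efficiency at which growth is just possible: 7.5*(1 - eta)/eta = needed.
eta_max = 7.5 / (needed + 7.5)
```

With η = 0.42 only about 10.4 e-foldings fit into 3 Gyr, short of the 11.5 required, so the efficiency must be below eta_max.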
Exercise 7.1 Comoving distances add, so rS = rL + rLS . Therefore
rLS = rS − rL . In flat space, angular diameter distance is simply comoving
distance divided by (1 + z) (Chapter 1), but in this case we need the redshift of
the background source as seen from the lens. We could write this factor as
(1 + zLS ). This is the factor by which the Universe expanded between the source
redshift and the lens redshift, i.e. RL /RS , where R is the scale factor. But
R_L/R_S = (R_L/R₀)/(R_S/R₀) = (R₀/R_S)/(R₀/R_L)
(where the subscript 0 refers to the present day), so (1 + zLS ) = (1 + zS )/(1 + zL ).
Therefore our final expression for the angular diameter distance DLS is
D_LS = (r_S − r_L) × (1 + z_L)/(1 + z_S).

Exercise 7.2 First, matching distances along the top of Figure 7.7 shows that
θD_S = βD_S + α̂D_LS. But α = α̂D_LS/D_S, so θD_S = βD_S + αD_S. Dividing
out the scalar D_S gives θ = β + α, which we can rearrange to β = θ − α, as
required.
Exercise 7.3 We set β = 0 in Equation 7.8. We can rearrange this to show that
θ = √[(4GM/c²) × D_LS/(D_L D_S)].
But what would this look like? The background object is exactly behind the lens
and it’s deflected by an angle θ. Is it deflected to the left or right or up or down?
In fact, there is nothing to give the deflection any particular direction, so the
background source is lensed into a ring. These are very rare, but an example is
shown in Figure S7.1.
Exercise 7.4 β² + 4θ_E² is always positive, but the square root of it can be
positive or negative. √(β² + 4θ_E²) > β unless θ_E = 0, so the negative root must
always give a negative θ. This is indeed a physical solution and represents an
angle measured in the opposite direction: as shown in Figure 7.7, the image is on
the other side of the lens. Note that one image is at θ > θ_E and the other is at
θ < θ_E, unless θ = θ_E and the system is an Einstein ring.
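The two image positions follow from solving the quadratic form of the point-lens equation (a sketch; the units of θ_E and the source position β = 0.5 are arbitrary choices of ours):

```python
import math

def image_positions(beta, theta_E):
    # beta = theta - theta_E**2/theta  <=>  theta**2 - beta*theta - theta_E**2 = 0,
    # a quadratic with one positive and one negative root.
    root = math.sqrt(beta ** 2 + 4.0 * theta_E ** 2)
    return (beta + root) / 2.0, (beta - root) / 2.0

theta_plus, theta_minus = image_positions(beta=0.5, theta_E=1.0)
# theta_plus lies outside the Einstein radius; theta_minus is negative,
# i.e. an image on the other side of the lens, inside theta_E in magnitude.
```

The product of the two roots is −θ_E², so one image always lies outside the Einstein radius and the other inside.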

Figure S7.1 The gravitational lens 0038+4133 (an Einstein ring) from the
COSMOS survey, taken by the HST. The image is 15″ by 15″.

Exercise 7.5 From the previous exercise, a source can have multiple images, so
there is not necessarily a unique image position θ for a given source position β.
In mathematical terms, we would speak of the mapping β → θ as being
one-to-many. However, each image position θ does map in a one-to-one way onto
a source position β, i.e. each image position can correspond to only one position
in the background source. To see why, consider Equation 7.4. The function α(θ)
must be a single-valued function, i.e. any particular input θ can give only one
possible output α. Therefore there can be only one value of β for a given input θ.
Exercise 7.6 We’re asked to differentiate Equation 7.12, which gives
dβ/dθ = 1 + (θE2 /θ2 ). This gives us one of the fractions in Equation 7.16. The
magnification is therefore
C D−1 C D−1 C D−1
θ dθ θ θE2 θE2 θE2
= 1+ 2 =θ θ− 1+ 2
β dβ β θ θ θ
C 2
D−1 C 2
D−1 C D−1
θE θE θE θE2
2 θE4
= 1− 2 1+ 2 = 1+ 2 − 2 − 4
θ θ θ θ θ
C D −1
θ4
= 1 − E4 ,
θ
as required.
Exercise 7.7 A negative magnification means that the image is mirror-reversed.
For example, a positive change dβ would have a corresponding dθ in the opposite
direction, so dθ is negative. Therefore dθ/dβ is negative in Equation 7.16.
Exercise 7.8 We start from Equation 7.24. The mass enclosed is Σπξ² and we
set ξ = D_L θ:
α̂ = 4GM(ξ)/(c²ξ) = (4G/(c²ξ)) × Σπξ² = (4G/c²) × Σπ × D_L θ.
Now,
α = (D_LS/D_S) α̂, (Eqn 7.6)
so
α = (4πGΣ/c²) × (D_L D_LS/D_S) × θ,
as required.
If we then set Σ = Σcr , we find that α(θ) = θ for any θ, so β = 0. This means
that the gravitational lens is acting as a perfect focusing lens! However, this is a
very special case — gravitational lenses in general do not focus light. As ‘lenses’
in the optical sense, they have all forms of aberration, except of course chromatic
aberration since gravitational lensing is strictly achromatic.
Exercise 7.9 From left to right, they are a saddle point, a maximum and a
minimum.
Exercise 7.10 The time delay of the image at the centre increases. In a diagram
like Figure 7.15, the central panel showing the gravitational time delay would be
acquiring a sharper and higher point in the centre. When the lens potential
becomes a singular isothermal sphere, the time delay becomes infinite, so the
image disappears. Photons would take an infinite amount of time to climb out of
the infinitely-deep potential well, and (by symmetry) spend another infinite
amount of time falling in beforehand. But a more thoughtful answer is that this
deep potential well would form a black hole. Right from Equation 7.1, we’ve been
assuming a weak-field limit, so a better answer is that these simple assumptions
break down as the potential becomes more extreme.
Exercise 7.11 The background objects have the same redshift, so we
could think of the luminosity function as differential source counts, thus
dN/dS ∝ S^(−α). Therefore the number of objects per unit area brighter than a
flux S₀ will be N(>S₀) ∝ S₀^(1−α), which we could write as
N(>S₀) = kS₀^(1−α).
If the background galaxies are gravitationally magnified by a factor of µ, the
intrinsic fluxes will be Sintrinsic = S/µ, while the comoving volume sampled will
be smaller by a factor of 1/µ. Therefore the number of galaxies brighter than an
observed flux S0 will be
N_lensed(>S₀) = (k/µ)(S₀/µ)^(1−α) = kµ⁻¹S₀^(1−α)µ^(α−1) = kS₀^(1−α)µ^(α−2) = N(>S₀) µ^(α−2).
Therefore for a magnification of µ (where µ > 1), the lensing changes the number
of background galaxies per unit area by a factor of µα−2 . For this factor to be
bigger than 1 we need
µ^(α−2) > 1,
so log(µ^(α−2)) > log(1) = 0,
thus (α − 2) log(µ) > 0.
We already know that log(µ) > 0 (because µ > 1), so this can happen only if
α > 2. For example, if the source counts have a Euclidean slope (α = 2.5), then
lensing would increase the number of objects. The effect of sampling a smaller
volume due to lensing, and so finding fewer objects than the flux magnification
on its own would suggest, is known as the Broadhurst effect. (See Broadhurst,
T.J., Taylor, A.N. and Peacock, J.A., 1995, Astrophysical Journal, 438, 49.)
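The magnification-bias factor as a short sketch (the function name and example values µ = 4, α = 2.5 and α = 1 are ours):

```python
def count_change_factor(mu, alpha):
    # Sources above a fixed observed flux: fluxes are boosted by mu but the
    # solid angle (and hence the volume sampled) is diluted by 1/mu.
    return mu ** (alpha - 2.0)

steep = count_change_factor(mu=4.0, alpha=2.5)    # Euclidean slope: net gain
shallow = count_change_factor(mu=4.0, alpha=1.0)  # shallow slope: net loss
```

Counts steeper than α = 2 gain sources under magnification; shallower counts lose them, which is the Broadhurst effect discussed above.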
Exercise 8.1 There’s no guarantee that the re-emitted photon comes out in the
same direction — in fact, it probably won’t. A corollary is that any Lyman α
cloud should glow faintly in Lyman α light in all directions from these re-emitted
photons, even if the cloud is not intercepting our line of sight to a quasar (because
there will always be some line of sight that does). This re-emission is in general
too faint to detect. However, Lyman α emission can sometimes be seen if there are
internal ionizing sources (e.g. star formation) within damped Lyman α systems,
which you will meet later in the chapter.
Exercise 8.2 The column density through the centre will be the same as that
seen through a cubical cloud with a side 2 Mpc, facing the observer (because the
absorption doesn’t depend on the distribution of material that the light doesn’t
pass through). One Mpc is about 3 × 10²⁴ cm, so a density of 1 cm⁻³ can be
written as (3 × 10²⁴)³ Mpc⁻³ = 2.7 × 10⁷³ Mpc⁻³. The total number of neutral
hydrogen atoms in the cube must be 2.7 × 10⁷³ Mpc⁻³ × 8 Mpc³ = 21.6 × 10⁷³,
which is spread over a projected area of 2 × 2 Mpc² = 36 × 10⁴⁸ cm². Therefore
the column density must be 21.6 × 10⁷³/(36 × 10⁴⁸) cm⁻² ≈ 6 × 10²⁴ cm⁻².
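The arithmetic can be cross-checked in a few lines (a sketch; the density of 1 cm⁻³ is the value implied by the worked numbers):

```python
CM_PER_MPC = 3e24  # the approximation used in the worked answer

n_cm3 = 1.0                        # cloud density in atoms cm^-3 (implied value)
column = n_cm3 * 2.0 * CM_PER_MPC  # straight through: density x path length

# Cross-check via the total atom count over the projected area:
atoms = (n_cm3 * CM_PER_MPC ** 3) * 8.0   # 2.7e73 Mpc^-3 times 8 Mpc^3
area_cm2 = (2.0 * CM_PER_MPC) ** 2        # 36e48 cm^2
column_check = atoms / area_cm2
```

Both routes give the same 6 × 10²⁴ cm⁻², confirming that only the material along the line of sight matters.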
Exercise 8.3 In order for a hydrogen atom to absorb an Hα photon, the
photon must have the right energy, and there must be an atom with an electron in
the n = 2 energy level ready to absorb the photon. This energy level is at
E = −13.6/n² eV = −13.6/4 eV = −3.4 eV. In order to be in such a state, the
atom must have absorbed a photon of energy (−3.4 eV) − (−13.6 eV) = 10.2 eV.
Photons of this energy require a black body temperature of order
T ≈ E/k = (10.2 eV × 1.602 × 10⁻¹⁹ J eV⁻¹)/(1.381 × 10⁻²³ J K⁻¹) = 120 000 K.
This is hotter than the surface of an O star, and is much hotter than the typical
temperatures in the intergalactic medium. Lyman α clouds are too cold to have
many atoms with electrons already excited to the n = 2 level, so the clouds have
almost no Hα absorption.
Exercise 8.4 We can write σ(ν) = σ₀(ν/ν_limit)⁻³, where
σ₀ = 7.88 × 10⁻²² m², and ν_limit is the frequency of the Lyman limit. Writing
Jν = kν^(−α) and plugging the terms in, we find
τ = N_HI × ∫(σJν/(hν)) dν / ∫(Jν/(hν)) dν   (both integrals from ν_limit to ∞)
  = N_HI σ₀ × ∫(ν/ν_limit)⁻³ kν^(−α−1) dν / ∫kν^(−α−1) dν
  = N_HI σ₀ ν_limit³ × ∫ν^(−α−4) dν / ∫ν^(−α−1) dν
  = N_HI σ₀ ν_limit³ × [ν_limit^(−α−3)/(α + 3)] × [α/ν_limit^(−α)]
  = N_HI σ₀ α/(α + 3),
where we first cancelled the h terms, then cancelled the k terms. Setting τ > 1,
we find N_HI > 1.3 ((α + 3)/α) × 10²¹ m⁻², as required.

Exercise 8.5 Equation 1.28 relates dz/dt to H(z). Taking the modulus
and reciprocal of that equation gives (1 + z) |dt/dz| = 1/H(z).
A population with constant proper sizes has constant A in Equation 8.2,
and a constant comoving density is constant nco in the same equation.
Therefore d²N ∝ (1 + z)³|dt/dz| ∝ (1 + z)²/H(z). If we write
dX/dz = (1 + z)² H₀/H(z), then
d²N = n_co A × (1 + z)² c × [1/H(z)] dN_HI dz
gives
d²N = (c/H₀) n_co A dX dN_HI,
which is constant.
Exercise 8.6 Gravitational lensing of the background quasar by the damped
Lyman α system could cause such an effect. The strength of this effect, and the
biases that it creates on the measured cosmic evolution of neutral gas, are still the
subject of debate. However, it turns out that this is probably only a 10–20% effect
on ΩH I at z > 2.
Exercise 8.7 Dust in the damped Lyman α systems should redden the quasar
spectra, so one might compare the optical spectral indices or B–V colours of
quasars with and without damped Lyman α absorbers. However, if damped
systems are very dusty, they may induce so much reddening that the quasars drop
out of the parent sample, so bright quasar catalogues would be biased to detecting
low-reddening systems. Statistical analyses suggest that this latter effect does not
dominate, but direct results on quasar reddening are currently conflicting.
Exercise 8.8 The energy of the hydrogen Lyman limit is E = 13.6 eV,
i.e. E = 13.6 × 1.602 × 10⁻¹⁹ J = 2.179 × 10⁻¹⁸ J. This corresponds
to a frequency of ν = E/h, where h is Planck's constant, which comes
out as ν = 3.289 × 10¹⁵ Hz. The wavelength of this light is λ = c/ν,
where c is the speed of light, which comes out as λ = 9.116 × 10⁻⁸ m, or
91.2 nm (i.e. 912 Å) to three significant figures. For the helium Lyman limit,
λ_He = λ × 13.6/54.4 = 22.8 nm.
The redshifted hydrogen Lyman limit in Figure 8.20 is at a wavelength of
912 × (1 + z) Å = 912 × (1 + 3.2) Å = 3830 Å.
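The same numbers as a sketch in Python:

```python
h = 6.626e-34   # J s
c = 2.998e8     # m s^-1
eV = 1.602e-19  # J

lam_H_nm = h * c / (13.6 * eV) * 1e9  # hydrogen Lyman limit, nm
lam_He_nm = lam_H_nm * 13.6 / 54.4    # helium Lyman limit, nm
lam_obs_A = 912.0 * (1.0 + 3.2)       # hydrogen limit at z = 3.2, in angstroms
```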

Acknowledgements
Grateful acknowledgement is made to the following sources:
Figures
Cover image courtesy of the Spitzer Space Telescope,
© NASA/JPL-Caltech/STScI/CXC/UofA/ESA/AURA/JHU;
Figure 1.9: supernova data taken from Blondin, S. et al. (2008) The Astrophysical
Journal, 682, 724; Figure 1.10 top left: [Link] Christian Buil;
Figure 1.10 top right: European Southern Observatory (ESO); Figure 1.10
bottom left: Stanford, S. A. et al. (2000) ‘The first sample of ultraluminous
infrared galaxies at high redshift’, The Astrophysical Journal Supplement Series,
131, 185, The American Astronomical Society; Figure 1.10 bottom right: van
Dokkum, P. G. et al. (2005) ‘Gemini near-infrared spectrograph observations of a
red star-forming galaxy at z = 2.225: evidence of shock ionization due to a
galactic wind’, The Astrophysical Journal, 622, L13, The American Astronomical
Society; Figure 1.11: Carroll, S. M. (2004), ‘Why is the Universe accelerating?’,
Freedman, W. L. ed. Measuring and Modelling the Universe, Carnegie
Observatories Astrophysics Series, 2, Carnegie Observatories; Figure 1.13:
NASA and the Hubble Heritage Team (STScI/AURA); Figure 1.16: Springel, V.
et al. (2005) ‘Simulations of the formation, evolution and clustering of galaxies
and quasars’, Nature, 435, 629 ; Figures 1.18 & 1.19: adapted from Carroll,
S. M., Press, W. H. and Turner, E. L. (1992) ‘The Cosmological Constant’,
Annual Review of Astronomy & Astrophysics, 30, 499, © Annual Reviews Inc.;
Figure 1.20: Adapted from Knop R. A. et al. (2003), ‘New Constraints on ΩM ,
ΩΛ and w from an independent set of 11 high-redshift supernovae observed with
the Hubble Space Telescope’, The Astrophysical Journal, 598, 102, 9The c
American Astronomical Society;
Figures 2.1, 2.2 & 2.9: NASA/WMAP Science Team; Figure 2.3: adapted from
Coc, A. (2009) ‘Big-bang nucleosynthesis: a probe of the early Universe’,
Nuclear Instruments & Methods in Physics Research A, 611, 224, Elsevier
Science BV; Figure 2.4: adapted from a figure by Professor Edward L. Wright,
UCLA; Figures 2.7 & 2.8: Peacock, J. A. (1999) Cosmological Physics,
Cambridge University Press; Figure 2.10: University of Hawaii; Figure 2.11:
Granett, B. R. et al. (2008) ‘An imprint of super-structures on the microwave
background due to the Integrated Sachs–Wolfe effect’, The Astrophysical Journal
Letters, 683, L99, Institute of Physics Publishing; Figure 2.12: adapted from
Dunkley, J. et al. (2009) ‘Five year Wilkinson Microwave Anisotropy Probe
(WMAP) observations: likelihoods and parameters from the WMAP data',
Astrophysical Journal Supplement Series, 180, 306, Institute of Physics
Publishing; Figures 2.13 & 2.15: Hu, W. and Dodelson, S. (2002) ‘Cosmic
microwave background anisotropies’, Annual Reviews of Astronomy &
Astrophysics, 40, 171, Annual Reviews; Figures 2.14 & 3.7: adapted from figures
by Edward L. Wright, UCLA and based on data from Kowalski, M. et al. (2009)
The Astrophysical Journal Supplement Series, 686, 749; Figure 2.16: adapted
from Larson, D. et al. (2010) ‘Seven year Wilkinson Microwave Anisotropy
Probe (WMAP) observations: power spectra and WMAP-derived parameters',
Astrophysical Journal Supplement Series (in press, arXiv:1001.4635), Institute of
Physics Publishing; Figures 2.17 & 2.18: adapted from Komatsu, E. et al. (2009)

'Five year Wilkinson Microwave Anisotropy Probe (WMAP) Observations:
cosmological interpretation', Astrophysical Journal Supplement Series, 180, 330,
Institute of Physics Publishing;
Figure 3.1a: Justin Yaros and Andy Schlei/Flynn Haase/NOAO/AURA/NSF;
Figure 3.1b: adapted from Begeman, K. G., Broeils, A. H. and Sanders, R.
H. (1991) ‘Extended rotation curves of spiral galaxies: dark haloes and
modified dynamics’, Monthly Notices of the Royal Astronomical Society, 249,
523; Figure 3.2: ESO Online Digital Sky Survey [Link]/dss/dss.;
Figure 3.4: adapted from a figure of Professor Edward L. Wright, UCLA;
Figure 3.5: adapted from Dressler, A. (1980) ‘Galaxy morphology in rich clusters:
implications for the formation and evolution of galaxies’, The Astrophysical
Journal, 236, 351, American Astronomical Society; Figure 3.6: adapted
from Ciardullo, R. (2004) ‘The Planetary Nebula Luminosity Function’, A
contribution to the ESO International Workshop on Planetary Nebulae beyond
the Milky Way, Garching (Germany), May 19–21, 2004; Figure 3.8: Chris
Schur, [Link]; Figure 3.9: [Link];
Figure 3.10: NASA/Jason Ware; Figure 3.11: Günter Kerschhuber, Gahberg
Observatory; Figures 3.12 & 3.13: Richard Powell, [Link];
Figure 3.14: adapted from de Lapparent, V. et al. (1986) ‘A slice of the
Universe’, The Astrophysical Journal, 302, 1, The American Astronomical
Society; Figures 3.15 & 3.18: The 2dF Galaxy Redshift Survey team
([Link] Figure 3.16: adapted from Peacock, J. A. et
al. (2001) ‘A measurement of the cosmological mass density from clustering in
the 2dF Galaxy Redshift Survey’, Nature, 410, 169, Nature Publishing Group;
Figure 3.17: adapted from Peacock, J. A. and Dodds, S. J. (1994), ‘Reconstructing
the linear power spectrum of cosmological mass fluctuations’, Monthly Notices of
the Royal Astronomical Society, 267, 1020, The Royal Astronomical Society;
Figure 3.19: adapted from Percival, W. J. et al. (2007) ‘Measuring the Baryon
Acoustic Oscillation Scale using the Sloan Digital Sky Survey and 2df Galaxy
Redshift Survey’, Monthly Notices of the Royal Astronomical Society, 381, 1053,
The Royal Astronomical Society;
Figure 4.1: adapted from Tegmark, M. and Zaldarriaga, M. (2002) ‘Separating the
Early Universe from the Late Universe: cosmological parameter estimation
beyond the black box’, Physical Review D, 66(10), 103508, The American
Physical Society; Figure 4.2: adapted from Lacey, C. and Cole, S. (1993) ‘Merger
rates in hierarchical models of galaxy formation’, Monthly Notices of the Royal
Astronomical Society, 262, 627, The Royal Astronomical Society; Figure 4.5:
Moore, B. et al. (1999) ‘Dark matter substructure within galactic halos’, The
Astrophysical Journal, 524, L19, American Astronomical Society; Figure 4.6:
adapted from Rocca-Volmerange, B. and Guiderdoni, B. (1988) ‘An atlas of
synthetic spectra of galaxies’, Astronomy & Astrophysics Supplement Series, 75,
93, European Southern Observatory; Figure 4.7: Dr Henner Busemann, School of
Earth, Atmospheric and Environmental Sciences (SEAES), The University of
Manchester; Figure 4.8: adapted from Gordon, K. D. et al. (2003) ‘A quantitative
comparison of the Small Magellanic Cloud, Large Magellanic Cloud, and Milky
Way ultraviolet to near-infrared extinction curves’, The Astrophysical Journal,
594, 279, The American Astronomical Society; Figure 4.9: Brammer, G. B. et al.
(2008) ‘EAZY: A fast, public photometric redshift code’, The Astrophysical
Journal, 686, 1503, The American Astronomical Society; Figure 4.10: adapted

from Dey, A. et al. (1998) ‘A galaxy at z = 5.34’, The Astrophysical Journal, 498,
L93, The American Astronomical Society; Figure 4.11: adapted from Bell, E. F.
et al. (2003) ‘The optical and near-infrared properties of galaxies. I. luminosity
and stellar mass functions’, The Astrophysical Journal Supplement Series, 149,
289, The American Astronomical Society; Figure 4.12: NRAO; Figure 4.13:
Sloan Digital Sky Survey; Figure 4.14: adapted from Yates, M. G. and Garden, R.
P. (1989) ‘Near-simultaneous optical and infrared spectrophotometry of active
galaxies’, Monthly Notices of the Royal Astronomical Society, 241, 167, The
Royal Astronomical Society; Figure 4.16: adapted from Figure 2.3 of Peterson, B.
M. (1997) An Introduction to Active Galactic Nuclei, Cambridge University Press;
Figure 4.17: adapted from Richards, G. T. et al. (2006), ‘The Sloan Digital
Sky Survey Quasar Survey: quasar luminosity function from data release 3’,
The Astronomical Journal, 131, 2766, The American Astronomical Society;
Figure 4.18 left: A. Fujii; Figure 4.18 right: R. Williams (STScI), the Hubble
Deep Field Team and NASA; Figure 4.19: Robert Williams and the Hubble Deep
Field Team (STScI) and NASA; Figure 4.20: NASA/ESA, CXC, JPL-Caltech,
STScI, NAOJ, J. E. Geach (Univ. Durham) et al.; Figure 4.21: adapted from
Gabasch, A. et al. (2004) ‘The evolution of the luminosity functions in the
FORS deep field from low to high redshift’, Astronomy & Astrophysics, 421, 41,
ESO; Figure 4.22: NASA, ESA, S. Beckwith (STScI) and the HUDF Team;
Figures 4.23 & 4.24: adapted from Bouwens, R. J. et al. (2009) ‘Constraints on
the first galaxies: z ∼ 10 Galaxy Candidates from HST WFC3/IR’, Submitted to
Nature (arXiv:0912.4263); Figure 4.25: adapted from Cohen, J. G. et al. (1996)
‘Redshift clustering in the Hubble Deep Field’, The Astrophysical Journal, 471, 5,
The American Astronomical Society; Figure 4.26: adapted from Bouwens, R. J.
et al. (2004) ‘Galaxy size evolution at high redshift and surface brightness
selection effects: constraints from the Hubble Ultra Deep Field’, The Astrophysical
Journal, 611, 1, The American Astronomical Society; Figure 4.27: adapted from
van Dokkum, P. G., Kriek, M. and Franx, M. (2009) ‘A high stellar velocity
dispersion for a compact massive galaxy at redshift z = 2.186’, Nature, 460, 717,
Macmillan Publishers Limited; Figure 4.28: NASA Jet Propulsion Laboratory
(NASA-JPL); Figure 4.29: H. Ferguson, M. Dickinson, R. Williams, STScI and
NASA; Figure 4.30: adapted from Bell, E. F. et al. (2004) ‘Nearly 5000 distant
early-type galaxies in COMBO-17: a red sequence and its evolution since z ∼ 1’,
The Astrophysical Journal, 608, 752, The American Astronomical Society;
Figure 5.1: adapted from Hauser, M. G. and Dwek, E. (2001) ‘The Cosmic
Infrared Background: Measurements and Implications’, Annual Review of
Astronomy & Astrophysics, 39, 249, Annual Reviews Inc; Figure 5.2: adapted
from Hopwood, R. H. et al. (2010) ‘Ultra deep AKARI observations of Abell 2218:
resolving the 15 μm extragalactic background light’, Astrophysical Journal Letters,
716, 45; Figure 5.3: [Link]; Figure 5.4: adapted from
Blain, A. W. et al. (2002) ‘Submillimeter galaxies’, Physics Reports, 369,
111, Elsevier Science B.V.; Figure 5.5: adapted from Hughes D. H. et al.
(1998) ‘High-redshift star formation in the Hubble Deep Field revealed by
a submillimetre-wavelength survey’, Nature, 394, 241; Figure 5.6: BLAST
Collaboration; Figure 5.7: ESA and SPIRE Consortium; Figure 5.8: adapted from
Serjeant, S. et al. (1998) ‘A spectroscopic study of IRAS F10214+4724’, Monthly
Notices of the Royal Astronomical Society, 298, 321, Royal Astronomical Society;
Figure 5.9: adapted from Surace, J. A. et al. (1998) ‘HST/WFPC2 Observations


of warm ultraluminous infrared galaxies’, Astrophysical Journal, 492, 116, The


American Astronomical Society; Figure 5.10 top: Brad Whitmore (STScI) and
NASA; Figure 5.10 bottom: NASA/JPL-Caltech/Z. Wang (Harvard-Smithsonian
CfA); Visible: M. Rushing/NOAO; Figure 5.11: Courtesy of JAXA; Figures 5.12
& 5.13: adapted from Condon, J. J. (1992) ‘Radio emission from normal
galaxies’, Annual Review of Astronomy & Astrophysics, 30, 575, Annual Reviews
Inc; Figure 5.14: adapted from Dole, H. et al. (2006) ‘The cosmic infrared
background resolved by Spitzer’, Astronomy & Astrophysics, 451, 417, EDP
Sciences; Figures 5.15 & S5.1: adapted from Griffin, M. et al. (2007) ‘The
Herschel-SPIRE instrument and its capabilities for extragalactic astronomy’,
Advances in Space Research, 40, 612, © COSPAR, published by Elsevier Ltd;
Figures 5.16, 5.17 & 5.18: Pérez-González, P. G. et al. (2008) ‘The Stellar Mass
Assembly of Galaxies from z = 0 to z = 4’, The Astrophysical Journal, 675, 234,
The American Astronomical Society; Figure 5.19: adapted from Le Floc’h, E. et
al. (2005) ‘Infrared Luminosity Functions from the Chandra Deep Field-South’,
The Astrophysical Journal, 632, 169, The American Astronomical Society;
Figure 5.20: adapted from Di Matteo, T. et al. (2005) ‘Energy input from quasars
regulates the growth and activity of black holes and their host galaxies’, Nature,
433, 604, Nature Publishing Group; Figure 5.21: McNamara, B. R. et al. (2000)
‘Chandra X-ray observations of the Hydra A cluster: an interaction between the
radio source and the X-ray emitting gas’, Astrophysical Journal, 534, L135, The
American Astronomical Society; Figures 5.22 & 5.23: Fabian, A. C. et al.
(2003) ‘A very deep Chandra observation of the Perseus cluster: shocks and
ripples’, Monthly Notices of the Royal Astronomical Society, 344, L43, The Royal
Astronomical Society;
Figure 6.3: Science Photo Library; Figure 6.4: Misner, C. W., Thorne, K. S.
and Wheeler, J. A. (1973) Gravitation, W. H. Freeman & Co Ltd; Figure 6.5:
adapted from Kormendy, J. (1988) ‘Evidence for a supermassive black hole
in the nucleus of M31’, The Astrophysical Journal, 325, 128, American
Astronomical Society; Figure 6.6: adapted from Miyoshi, M. et al. (1995)
‘Evidence for a black hole from high rotation velocities in a sub-parsec region of
NGC 4258’, Nature, 373, 127, Nature Publishing Group; Figure 6.7: adapted
from Schödel, R. et al. (2003) ‘Stellar dynamics in the central arcsecond of our
galaxy’, The Astrophysical Journal, 596, 1015, The American Astronomical
Society; Figures 6.8 & 6.9: adapted from Peterson, B. M. (2001) ‘Variability of
active galactic nuclei’, Aretxaga, I., Knuth, D. and Mujica, R. eds. Advanced
Lectures on the Starburst-AGN Connection, World Scientific; Figure 6.10:
Ferrarese, L. (2002) ‘Black Hole Demographics’, Proceedings of the 2nd
KIAS Astrophysics Workshop held in Seoul, Korea (Sep 3–7 2001), Lee, C. H.
ed. World Scientific; Figure 6.11: J. Schmitt et al. ROSAT Mission, MPE,
ESA; Figure 6.13: Brandt, W. N. and Hasinger, G. (2005) ‘Deep Extragalactic
X-ray Surveys’, Annual Review of Astronomy & Astrophysics, 43, 827, Annual
Reviews; Figure 6.14: X-ray: NASA/CXC/U. of Michigan/J. Liu et al.; Optical:
NOAO/AURA/NSF/T. Boroson; Figure 6.15: adapted from Alexander, D. M. et
al. (2008) ‘Weighing the black holes in z ≈ 2 submillimeter-emitting galaxies
hosting active galactic nuclei’, The Astronomical Journal, 135, 1968, The
American Astronomical Society; Figure 6.16: adapted from Kauffmann, G. and
Heckman, T. M. (2009) ‘Feast and famine: regulation of black hole growth in low
redshift galaxies’, Monthly Notices of the Royal Astronomical Society, 397, 135,


The Royal Astronomical Society; Figure 6.17: Weisberg, J. M. and Taylor, J. H.


(2005) ‘The relativistic binary pulsar B1913+16: thirty years of observations
and analysis’, Rasio, F. A. and Stairs, I. H. (eds) Binary Radio Pulsars, ASP
Conference Series, 328, 25, Astronomical Society of the Pacific; Figure 6.18:
Caltech; Figure 6.19: adapted from Boroson, T. A. and Lauer, T. R. (2009) ‘A
candidate sub-parsec supermassive binary black hole system’, Nature, 458, 53,
Nature Publishing Group;
Figure 7.3: NASA, Andrew Fruchter and the ERO Team [Sylvia Baggett (STScI),
Richard Hook (ST-ECF), Zoltan Levay (STScI)] (STScI); Figure 7.4: adapted
from Nguyen, H. T. et al. (1999) ‘Hubble Space Telescope imaging polarimetry of
the gravitational lens FSC 10214+4724’, The Astronomical Journal, 117, 671,
The American Astronomical Society; Figure 7.5: adapted from Serjeant, S. et al.
(1998) ‘A spectroscopic study of IRAS F10214+4724’, Monthly Notices of
The Royal Astronomical Society, 298, 321, The Royal Astronomical Society;
Figure 7.12: Dr A. Holloway, University of Manchester; Figure 7.17: Burke, B. et
al. (1993) Sub-Arcsecond Radio Astronomy, Davis, R. J. and Booth, R. S. eds.
Cambridge University Press; Figure 7.18: adapted from Alcock, C. et al. (1993)
‘Possible gravitational microlensing of a star in the Large Magellanic Cloud’,
Nature, 365, 621, Nature Publishing Group; Figure 7.20: Stephane Colombi,
International Astronomical Union; Figure 7.22: adapted from Blandford, R. D. et
al. (1991) ‘The distortion of distant galaxy images by large scale structure’,
Monthly Notices of the Royal Astronomical Society, 251, 600, The Royal
Astronomical Society; Figure 7.23: adapted from Hoekstra, H. et al. (2004)
‘Properties of galaxy dark matter halos from weak lensing’, The Astrophysical
Journal, 606, 67, The American Astronomical Society; Figures 7.24 & 7.25:
adapted from Massey, R. et al. (2007) ‘Dark matter maps cosmic scaffolding’,
Nature, 445, 286, Nature Publishing Group; Figure 7.26: Large Synoptic Survey
Telescope Corporation; Figure 7.27: top right panel adapted from Hopwood et
al. (2010) Astrophysical Journal Letters, 716, 45; bottom left panel adapted
from Egami et al., paper in preparation; Figure 7.28: X-ray: NASA/CXC/CfA/
M. Markevitch et al. Lensing Map: NASA/STScI; ESO WFI; Magellan/U.
Arizona/D. Clowe et al. Optical: NASA/STScI; Magellan/U. Arizona/D. Clowe et
al.; Figure 7.29: CASTLES (CfA-Arizona Space Telescope Lens Survey);
Figure 7.30: NASA Johnson Space Center Collection; Figure 7.31: A. Bolton
(UH IfA) for SLACS and NASA/ESA; Figure S7.1: NASA, ESA, C. Faure
(Zentrum für Astronomie, University of Heidelberg) and J. P. Kneib (Laboratoire
d’Astrophysique de Marseille);
Figure 8.1: Rauch, M. (1998) ‘The Lyman alpha forest in the spectra of
quasistellar objects’, Annual Review of Astronomy & Astrophysics, 36, 267,
Annual Reviews; Figure 8.4: NASA, ESA, Y. Izotov (Main Astronomical
Observatory, Kyiv, UA) and T. Thuan (University of Virginia); Figures 8.5, 8.6 &
8.7: Pettini, M. et al. (2008) ‘Deuterium abundance in the most metal-poor
damped Lyman alpha system’, Monthly Notices of the Royal Astronomical
Society, 391, 1499, The Royal Astronomical Society; Figure 8.8 adapted from
Kriss, G. A. et al. (1999) ‘The Ultraviolet Peak of the Energy Distribution in
3C 273: Evidence for an Accretion Disk and Hot Corona around a Massive
Black Hole’, The Astrophysical Journal, 527, 683, The American Astronomical
Society; Figures 8.9 & 8.14: adapted from Noterdaeme, P. et al. (2009)
‘Evolution of the cosmological mass density of neutral gas from Sloan

Digital Sky Survey II – data release 7’, Astronomy & Astrophysics, 505, 1087,
European Southern Observatory; Figure 8.10: Prochaska, J. X. et al. (2005)
‘The SDSS damped Ly alpha survey: data release 3’, Astrophysical Journal,
635, 123, The American Astronomical Society; Figure 8.12: Reynolds, S.
C. (2007) ‘Quasar Absorbers and the InterGalactic Medium’, taken from a
pedagogical Seminar at the Royal Observatory, Edinburgh, 8 March 2007,
[Link]/ifa/postgrad/pedagogy/2007− [Link]; Figure 8.13: Möller,
P. and Warren, S. J. (1993) ‘Emission from a damped Ly alpha absorber at
z = 2.81’, Astronomy & Astrophysics, 270, 43, European Southern Observatory;
Figure 8.15: Smette, A. et al. (1992) ‘A spectroscopic study of UM 673 A & B:
on the size of the Lyman-alpha clouds’, Astrophysical Journal, 389, 39, The
American Astronomical Society; Figure 8.16: Nick Gnedin, Department of
Astronomy & Astrophysics, The University of Chicago; Figures 8.17 & 8.19:
Fan, X. et al. (2006) ‘Observational constraints on cosmic reionization’, Annual
Review of Astronomy & Astrophysics, 44, 415, © 2006 by Annual Reviews;
Figure 8.18: Becker, G. D. et al. (2007) ‘The evolution of optical depth in the Ly
alpha forest: evidence against reionization at z ∼ 6’, The Astrophysical Journal, 662,
72, The American Astronomical Society; Figure 8.20: adapted from Möller, P.
and Jakobsen, P. (1990) ‘The Lyman continuum opacity at high redshifts: through
the Lyman forest and beyond the Lyman valley’, Astronomy & Astrophysics, 228,
299, European Southern Observatory; Figure 8.21: Smette, A. et al. (2002)
‘Hubble Space Telescope Space Telescope Imaging Spectrograph Observations of the
He II Gunn–Peterson effect toward HE 2347-4342’, Astrophysical Journal,
564, 542, The American Astronomical Society; Figure 8.23: Carilli, C. L. et
al. (2002) ‘H I 21 centimeter absorption beyond the epoch of reionization’,
The Astrophysical Journal, 577, 22, The American Astronomical Society;
Figures 8.24 & 8.25: Cristiani, S. et al. (2007) ‘The CODEX-ESPRESSO
experiment: cosmic dynamics, fundamental physics, planets and much more . . .’,
Il Nuovo Cimento, 122B, 1165, Società Italiana di Fisica.
Every effort has been made to contact copyright holders. If any have been
inadvertently overlooked the publishers will be pleased to make the necessary
arrangements at the first opportunity.

Index
Items that appear in the Glossary have page numbers in bold type. Ordinary
index items have page numbers in Roman type.
21 cm forest, 279
21 cm Gunn–Peterson test, 277
21 cm transition, 277
2dF galaxy redshift survey, 110
2dF quasar survey, 116
3C273, 28, 139, 140
Abell cluster catalogue, 100
Abell 1835 galaxy, 247
Abell 2218 galaxy cluster, 216, 246
absorption distance, 260
acceleration four-vector, 288
accretion efficiency, 188
accretion luminosity, 186
achromatic, 217
acoustic peaks, 72, 76
action, 188
active galactic nuclei, 141
active galaxies, 155, 209
ADAFs, 187
adaptive optics, 196
adiabatic, 65
adiabatic expansion, 44, 56
adiabatic perturbations, 66
advection, 187
advection-dominated accretion flows, 187
age of the Universe, 25, 26
AGN, 170, 180, 205
AKARI space telescope, 170, 172
ALMA, 165
Andromeda galaxy, 109, 154, 196
angular correlation function, 114, 126
angular diameter distance, 32, 34, 99, 105, 220
annihilation, 47
Antennae galaxies, 168
anthropic principle, 78
anti-hierarchical, 178
apparent recession velocity, 22, 32
Arp 220 galaxy, 174
ASCA, 205
associated Legendre polynomials, 67
astronomical filters, 97
ATIC, 93
axion, 93
B stars, 132, 154, 169
Baldwin–Phillips–Terlevich diagram, 142
Balmer decrement, 132
Balmer line, 132, 169
Balmer series, 256
baryogenesis, 46
baryon asymmetry, 46
baryon density, 24, 47, 72
baryon drag, 73
baryon number, 46
baryon wiggles, 116, 245, 277
baryonic acoustic oscillations, 116, 246
baryosynthesis, 46
B-band filter, 96
BCGs, 178
beam, 149
BeppoSAX, 205
bias parameter, 115
Big Bang nucleosynthesis, 46, 238
big rip, 87
binary pulsar, 211
binary supermassive black holes, 212
Birkhoff’s theorem, 123, 183, 228
black body, 45
black body radiation, 42
black body spectrum, 40
black hole accretion, 273
black hole mass density, 194
black holes, 37, 38, 55, 115, 140, 141, 155, 159, 170, 172, 183, 237, 273
Blandford and Kochanek elliptical density profile, 236
BLAST, 164
blazars, 141
blue cloud, 154, 156
blurring, 126
B-mode, 79
bolometric correction, 194
bolometric luminosity, 33
Boltzmann distribution, 48
bottom-up structure formation, 122
break luminosity, 135
bremsstrahlung, 99
brightest cluster galaxy, 100, 178
broad line region, 194
brown dwarfs, 38, 92
Bullet cluster, 248
Butcher–Oemler effect, 104, 176
BzK galaxies, 146, 166
calculus of variations, 188
Calzetti extinction law, 148
Canis Major dwarf galaxy, 108
Cartesian coordinates, 16
causality, 15, 53
caustics, 231, 235
CDM, 121, 240, 257
central engine, 141
central limit theorem, 138
Cepheid variables, 105, 106
chain galaxies, 144, 154
Chandra, 205, 248
Chandra Deep Field North, 173, 206
Chandra Deep Field South, 165, 173, 206
Chandra space telescope, 205
chronology protection conjecture, 185
cirrus confusion noise, 170
cirrus dust, 170
CLASS, 249
cloud-in-cloud problem, 128
Cloverleaf lens, 249
CMB, 17, 19, 40, 160, 253, 257, 258, 270
CMB photons, 100
CMB power spectrum, 68, 211
CMB spectrum, 100
CO emission, 98
CO molecules, 98
COBE satellite, 42, 45, 68, 257, 276
CODEX, 278, 279
cold dark matter, 121
colour–density relation, 155
colour–magnitude diagram, 154
column density, 132, 204, 254, 258–261, 264
Coma, 109

COMBO-17 survey, 135, 154
comoving coordinates, 29, 31
comoving distance, 29, 30, 31, 34, 100, 220
comoving volume, 34
comoving volume derivative, 35
complete sample, 137
complex numbers, 61
Compton scattering, 101, 204
Compton y parameter, 102
Compton-thick active galaxies, 204, 206
concordance cosmology, 78
confusion limit, 149, 151
conservation of angular momentum, 218
conservation of energy, 19
convergence, 241
convolution, 126, 127
cooling flow problem, 105
coordinates
  comoving, 29, 31
  proper, 29, 31
correlation function, 113
cosmic censorship hypothesis, 185
cosmic microwave background, 17, 19, 40, 160, 253, 257, 258, 270
cosmic near-infrared background, 276
cosmic rest frame, 17
cosmic shear, 240, 241, 277
cosmic star formation history, 146, 148, 148, 162, 174, 175, 178
cosmic time, 17
cosmic variance, 68
cosmic X-ray background, 203
Cosmical Dynamics Experiment, 279
cosmological constant, 19, 27, 28, 36, 83, 84, 86, 107
cosmological event horizon, 37
Cosmological Evolution Survey, 243
cosmological redshift, 20
cosmological time dilation, 20, 33
COSMOS, 151, 243–245, 251
critical density, 23, 41, 47
critical lines, 235
cross section, 102, 254
CTIO, 244
Curie temperature, 54
curvature, 16, 18
curve of growth, 264
cycloid equations, 123
cycloid solution, 123
Cygnus A, 139
damped Lyman α systems, 261
dark energy, 84, 88
dark energy density, 84
dark matter, 24, 84, 92, 99, 121, 128, 236
dark matter density, 72
dark matter haloes, 115
dark sector, 84
Darwin mission, 172
de Sitter spacetime, 36
de Sitter universe, 87
de Vaucouleurs law, 98
deceleration parameter, 23
density fluctuations, 257
density parameters, 23, 23, 27, 28
density perturbations, 55, 63, 81
DES, 244
deuterium, 51, 257
deuterium abundance, 51, 257
deuteron, 49
differential magnification, 218
differential number counts, 120
differential source counts, 13, 120
diffraction limit, 165, 171
diffusion damping, 73
Digitized Sky Survey, 151
dimensionless frequency, 102
dimensionless power spectrum, 64
  of galaxies, 113
dimensionless scale factor, 20, 124, 126
dipole, 70
DIRBE, 276
Distant Red Galaxies, 166
Dn–σ relation, 99
DOGs, 166
Doppler peaks, 72
dormant quasars, 194
double quasars, 249
downsizing, 178
DRGs, 166
dry merger, 157
DSS, 151
dust, 56, 129, 131, 148, 159, 168, 175, 204, 218
  tori, 172
Dust Obscured Galaxies, 166
duty cycle, 207
early Integrated Sachs–Wolfe effect, 74
early-type galaxy, 94
Eddington limit, 187, 193
Eddington luminosity, 186
Eddington ratios, 209, 210
Eddington timescale, 187
Einstein Cross, 249
Einstein radius, 221, 228, 237
Einstein ring, 222
Einstein tensor, 83
Einstein’s field equations, 19, 183
Einstein–de Sitter model, 27, 124
ekpyrotic Universe, 81
elliptical galaxies, 98, 102, 144, 152
eMERLIN, 173, 229
E-mode, 79
energy conservation, 44, 224, 225
energy density, 43, 44, 59, 86
energy–momentum tensor, 44, 83
entropy per baryon, 47, 47
equation of state, 56, 59, 83, 84
equivalent width, 263
ERO, 146
EROS, 239
escape fraction, 273
EUCLID, 245
Euclidean-normalized differential source counts, 160
Euclidean source count, 120, 160
Euclidean space, 12
Euler–Lagrange equation, 190, 192
European Extremely Large Telescope, 279
event, 14
event horizon, 37, 53, 184
e-VLA, 173
exoplanet, 240
expansion of the Universe, 278, 279
extinction, 131, 148
extragalactic background, 159, 162
Extremely Red Objects, 146, 166
extrinsic curvature, 70
Faber–Jackson relation, 98, 207
failed dwarf galaxies, 129
faint blue galaxies problem, 120
false vacuum, 57

Fanaroff–Riley (FR) type I, 142
Fanaroff–Riley (FR) type II, 142
Faulkes telescopes, 239
feedback, 178, 179
Fermat’s principle, 14, 232
Fermi gamma-ray telescope, 94
Feynman light clock, 285
fine structure constant, 21
fine-tuning problem, 27
fingers of God, 112, 125
first acoustic peak, 72
first light, 253, 270, 275
flatness problem, 27, 53, 56
fluctuation analysis, 151
flux-limited sample, 137, 138
force four-vector, 289
force three-vector, 289
4000 Å break, 133
Fourier expansion, 113
Fourier series, 60
Fourier transform, 60, 61, 114
four-momentum, 290
four-vector, 227, 285, 288
FR-I radiogalaxies, 142
FR-II radiogalaxies, 142
fractional overdensity, 62, 124
frame dragging, 184
free-streamed, 73
freeze-out, 48
Friedmann equations, 19, 123
Friedmann–Robertson–Walker metric, 17
frozen out, 48
fundamental plane, 98
Galactic Centre, 198
Galaxy (the Milky Way), 98, 108, 109, 129, 135, 187, 198
galaxy classification, 94
galaxy clusters, 74, 99, 129, 206
galaxy formation, 177
galaxy harassment, 102
galaxy luminosity function, 146
galaxy morphologies, 94
Galaxy Zoo, 94
galaxy–galaxy mergers, 143, 152
Galilean transformation, 227
Gaussian random field, 65
G-dwarf problem, 269
general relativity, 184
GEO, 212
geodesic, 14, 17, 188, 191, 217
giant elliptical galaxies, 178
globular cluster luminosity function, 106
globular clusters, 26, 27, 104, 207
GOODS-N, 173, 206
GOODS-S, 173
grand unified theories, 46, 54
gravitational lensing, 81, 92, 167, 216
gravitational potential, 230
gravitational redshift, 44
gravitational wave background, 55, 80, 81, 211
gravitational waves, 55, 210
Great Wall, 110
green valley, 155
grey body emissivity index, 168
Gunn–Peterson test, 270, 271
  21 cm, 277
Gunn–Peterson trough, 275
half-light radius, 98
halo occupation distribution, 128
hard X-ray background, 205
harder X-ray spectra, 203
Harrison–Zel’dovich spectrum, 63
Hawking radiation, 55, 210
HDM, 121
HE 2347-4342, 276
helium-3 abundance, 52
helium-4 abundance, 52
Helmholtz–Hodge theorem, 80
Herschel ATLAS key project, 250
Herschel Space Observatory, 164, 171, 173
Hertzsprung–Russell diagram, 154
hierarchical formation, 122
hierarchical galaxy formation, 177
hierarchical structure formation model, 103
Higgs boson, 57, 83
Higgs field, 57, 83, 88
high-mass X-ray binary, 172
high-redshift supernovae, 107
High-Redshift Supernova Search Team, 107
H II region, 22
HLIRGs, 167, 167
HMXBs, 172
horizon problem, 53, 66
Hot Big Bang, 45
hot dark matter, 121
HST, 144, 168, 243, 251
  ACS, 250
Hubble constant, 23, 36
Hubble Deep Field, 144
Hubble Deep Field North, 144, 151, 172, 206
Hubble Deep Field South, 144, 151, 206
Hubble Deep Fields, 253
Hubble diagram, 107, 144
Hubble distance, 35
Hubble flow, 23
Hubble parameter, 22, 25, 101, 106, 229, 231, 278, 279
Hubble Space Telescope, 144, 168, 243, 251
Hubble tuning fork, 94
Hubble Ultra Deep Field, 146, 152, 206
Hubble volume, 31, 66
Hydra A, 180
hydrogen column density, 258
hydrostatic equilibrium, 99
hyper-luminous infrared galaxies, 167
I-band filter, 95
image plane, 235
image polarization, 242
IMF, 170, 176, 178
impact parameter, 218, 239
inflation, 18, 28, 46, 54, 88, 257
inflation potential, 57, 60, 67, 258
inflaton, 57
inflaton field, 57, 83, 84, 88
Infrared Astronomy Satellite, 166
Infrared Space Observatory, 167
initial mass function, 128, 129
integral number counts, 120
integral source counts, 120
Integrated Sachs–Wolfe effect, 74, 277
interferometry, 171
intermediate mass black holes, 207
International X-ray Observatory, 205
interplanetary dust grains, 131
interstellar dust grains, 129
interval, 287
intrinsic curvature, 70

invariant, 14, 287
inverse magnification tensor, 234, 241
ionization, 41
ionization parameter, 266
ionizing background, J−21, 266, 267
ionizing radiation, 267
IRAS, 166
IRAS FSC 10214+4724 galaxy, 166, 206, 217–219
iron Kα line, 205
IRTS, 276
ISO, 167
isotropic, 17, 19
IXO, 205
I Zw 18 galaxy, 255
James Clerk Maxwell Telescope, 162
James Webb Space Telescope, 172
jansky, 160
JDEM, 245
Jeans mass, 103
jet-induced star formation, 180
jets, 141, 180, 199
JVAS, 249
JWST, 172, 275
Kaiser effect, 112
Kα line, 198
K-band filter, 96
K-correction, 34, 162, 175, 204
Kelvin–Helmholtz instability, 179
Kepler’s second law, 218
kernel, 127
Kerr metric, 184
kinetic mode, 180
kinetic S–Z, 101
Lagrangian, 188
Large Magellanic Cloud, 108, 237
large-scale structure, 19, 29, 40, 46, 74, 81, 108, 121, 126, 152, 240, 242, 257
laser guide star, 196
laser interferometry, 212
late-time Integrated Sachs–Wolfe effect, 74
late-type galaxy, 94
Legendre polynomial functions, 67
lens equation, 221, 232
lepton asymmetry, 46
light echo, 106
light element abundances, 50
lightcone, 16
light-travel distance, 29
LIGO, 212
limb darkening, 238
Limber’s equation, 115
linear regime, 63
linear theory, 124
LINERs, 142
Liouville’s theorem, 223
LIRGs, 166
LISA, 212
lithium abundance, 52
Local Group, 109–111
local supercluster, 109
Lockman hole, 206
lookback time, 26, 30
Lorentz contraction, 285
Lorentz transformation, 17, 70, 223, 227, 285, 286
Lorentzian profile, 262
low surface brightness galaxies, 95, 97
LSB galaxies, 95, 97
LSST, 244, 246
luminosity density, 161
luminosity distance, 32, 105
luminosity function, 135, 148
luminous infrared galaxies, 166
Lyman α absorption line, 254
Lyman α blobs, 146
Lyman α cloud, 255, 259, 263, 269, 271
Lyman α emission line, 253
Lyman α forest, 253, 254, 269, 271, 273
Lyman β transition, 256
Lyman break galaxies, 133, 146
Lyman limit, 133, 274
Lyman limit absorption, 256
Lyman series, 133, 256
M31, 196
M33, 209
M57, 22
M81 Group, 109
M82, 163, 174
M83 Group, 109
M87, 197
M106, 197
MACHOs, 92, 237, 239
Madau diagram, 148, 149, 156, 208, 269
Madau–Lilly diagram, 148
Maffei 1 Group, 109
Magellanic Clouds, 108
magnetic monopole, 54
magnification, 224
magnification bias, 225, 225
magnification tensor, 226
Magorrian relation, 201, 202, 267
Malmquist bias, 174, 174
masers, 197
mass–energy equivalence, 285
mass function, 127
mass–metallicity relation, 157
massive compact halo objects, 237
mass-to-light ratio, 96
matter density, 19, 24, 47
matter overdensity, 126
matter power spectrum, 64
matter–antimatter asymmetry, 59
megamasers, 197
merger rate, 143
MERLIN, 173, 249, 251
Mészáros effect, 121
metallicity, 96
metric, 14
metric coefficients, 14
metric tensor, 14, 287
MGC-6-30-15 galaxy, 198
microlensing, 237
millisecond pulsars, 278
Minkowski spacetime, 287
mixed dark matter, 121
MOA, 239
modified Newtonian dynamics, 248
momentum conservation, 44
momentum four-vector, 289, 290
MOND, 248
monolithic collapse model, 103, 153, 177
monopole problem, 54, 55
Moon, 203
morphological K-correction, 154
morphology–density relation, 102
M-theory, 21
multiple images, 231
multiverse, 79
natural units, 58


Navarro–Frenk–White model, 236, 246
N-body simulations, 125
negative pressure, 56, 84
neural nets, 94
neutralino, 93
neutrino background, 121
neutrinos, 49
neutron–proton ratio, 48, 51
neutron stars, 38
Newtonian gravity, 184
Newtonian potential, 232
NGC 205, 209
NGC 4258, 197
NGC 5548, 199
Noether’s theorem, 191
null interval, 15
number counts, 12
O stars, 132, 154, 169
obscured AGN, 205
observable Universe, 31
OGLE, 239
Olbers’ paradox, 11, 12, 34, 40, 144, 253
optical depth, 76, 256
orthogonality, 68
PAHs, 172
PAMELA satellite, 93
Pan-STARRS, 244
parallax, 239
parameter degeneracies, 78
particle horizon, 53
passive stellar evolution, 103, 129
passively-evolving galaxies, 154
P(D) analysis, 151
peculiar velocities, 20
pencil-beam survey, 151, 173, 205
Penrose–Hawking singularity theorems, 184
period–luminosity relationship, 106
Perseus galaxy cluster, 180
phantom energy, 86
phase space, 222
phase space density, 152, 223
photoionization, 266
photometric and spectroscopic redshifts, 132
photometric redshifts, 133
photon–baryon gas, 72
photon density, 72
photon pressure, 185
photons, 290
Planck epoch, 54
Planck length, 54
Planck mass, 54
Planck mission, 69, 81, 258
Planck scale, 66, 67, 88
Planck time, 54
planetary nebula, 22
planetary nebula luminosity function, 106
point spread function, 242
Poisson statistics, 106, 150
Poisson’s equation, 64
polarization, 79
polycyclic aromatic hydrocarbons, 172
population III stars, 273
population synthesis, 129, 148
position four-vector, 288
power law, 13, 34
power law index, 13
power spectrum, 63
precision cosmology, 40, 72
Press–Schechter model, 128, 135
pressure, 19, 86
pressure broadening, 263
primeval atom, 18
primordial element abundances, 50
primordial fireball, 49
primordial neutrino background, 49
primordial nucleosynthesis, 37, 40, 47, 48
principle of least action, 188, 191
principles of special relativity, 285
proper area, 34
proper coordinates, 29, 29, 31
proper distance, 31, 53
proper motion distance, 32, 34, 100
proper time, 184
proper volume, 34
proton decay, 38
proximity effect, 266, 267
pure density evolution, 143
pure luminosity evolution, 143
QSO, 140
QSO 0913+072, 257, 258
QSO 0957+561, 231, 249
QSO 1422+23, 253
QSO 2237+030, 249
quantized energies, 20
quantized wavelengths, 20
quantum gravity, 54
quasar luminosity function, 267, 273
quasars, 115, 140, 178, 199, 208, 231, 240, 259, 265, 271, 274
  double, 249
quasi-stellar object, 140
quintessence, 86
q0, 23
radiation energy density, 42
radiative mode, 180
radio–far-infrared correlation, 170
radio jet, 144
radio lobe, 142, 144
radio-loud galaxies, 142
radio-loud unification model, 208
radio-quiet galaxies, 142
radiogalaxies, 139, 142, 208
Rayleigh–Jeans regime, 41, 162
Rayleigh–Jeans tail, 100
Rayleigh–Taylor instability, 179
R-band filter, 95
recombination, 40, 79, 82, 266, 270
red nuggets, 152
red sequence, 154, 156
reddening, 130, 146
redshift, 20
redshift cut-off, 143
redshift space distortion, 112
Rees–Sciama effect, 74
refractive index, 231
reheating, 59
reionization, 76, 79, 82, 257, 270, 273
relativistic beaming, 199
relativistic Doppler shift, 290
relativistic three-momentum, 290
rest mass, 290
reverberation mapping, 199, 199, 200, 208
Robertson–Walker metric, 17
ROSAT, 203, 205
rotation curve, 92, 96
RR Lyrae stars, 105
Sachs–Wolfe effect, 74
Sachs–Wolfe plateau, 69
Salpeter timescale, 187


SASSy, 250
scalar field, 57, 85, 88
scale factor, 16, 16, 19, 21, 31, 53, 87
scale height, 97
scale length, 97
scale-invariance, 63
scale-invariant potential fluctuations, 64
scale-invariant spectrum, 63, 69
Schechter function, 135, 146
Schmidt law, 97
Schwarzschild metric, 183, 192, 230
Schwarzschild radius, 123, 183, 195
SCUBA, 162
Sculptor Group, 109
SDSS, 201
second acoustic peak, 73
selection effects, 138
selection function, 136
self-shielding, 256
semi-analytic model, 128, 179
semi-analytic modelling, 162
Sérsic profile, 98
Seyfert 1 galaxies, 141, 187, 199
Seyfert 2 galaxies, 141
Sgr A*, 187
Shapiro delay, 231
shear, 241
σ0, 23
Silk damping, 73
singular isothermal sphere, 228, 229, 234, 246
SKA, 245, 246, 277, 279
SLACS, 250
Sloan Digital Sky Survey, 110, 133, 201
SLOAN survey, 249
slow-roll approximation, 60
Small Magellanic Cloud, 108, 197
SMGs, 164, 165, 176, 178, 209
smoothing, 126
SN 1987A, 106
softened isothermal sphere, 234
Solar System, 18
sound horizon, 72
source count model, 162
source counts, 12, 150
source plane, 235
space-like interval, 15
spacetime curvature, 220
spacetime interval, 14, 15
sparse sampling, 137
spatial flatness, 56
spatially homogeneous, 17
special relativity, 14, 285
spectral energy distribution, 162
spectral indices, 81
spherical coordinates, 16
spherical harmonics, 67
SPICA, 171, 172
spiral galaxies, 96, 102, 152
spiral galaxy collisions, 102, 152
spiral–spiral mergers, 102, 152
Spitzer Space Telescope, 165, 167
SQLS, 249
stacking analysis, 75
standard candle, 105, 212
standard rod, 73, 105
standard siren, 212
star formation, 103, 155, 179, 273
star formation history, 169
star formation rate, 96, 97, 175, 265
starburst, 103
starburst galaxies, 179, 206
star-forming galaxies, 154, 172, 176
Steady State models, 45
Stefan–Boltzmann law, 43
Strömgren spheres, 270
strong anthropic principle, 79
submillimetre galaxies, 163
Sunyaev–Zel’dovich effect, 101
super-Eddington accretion, 187
supermassive black hole, 140, 142, 155, 195, 202
Supernova Cosmology Project, 107
supernovae, 20, 21, 78, 170, 179
supersymmetry, 21
surface brightness, 11, 34, 95, 216, 222, 224, 253
surface brightness fluctuation, 106
surface mass density, 227
surface of last scattering, 40
synchrotron radiation, 170
S–Z effect, 101
tensor, 44, 226
Tensor–Vector–Scalar theory, 248
TeVeS, 248
thermal broadening, 263
thermal radiation, 169
thermal radiation from dust, 159
thermalization, 45
thermal S–Z, 101
thick disc, 97
thin lens approximation, 220
Thomson scattering, 40, 76, 102, 123, 186, 204, 257, 270
  cross section, 186
three-vector, 227
three-velocity, 288
time delay surface, 233
time dilation, 285
time-like interval, 15
tip of the red giant branch, 105
tired light universe, 20
Tolman surface brightness test, 95, 144
topology, 69, 70
torus, 141, 206
transfer function, 126
transverse proximity effect, 267
true vacuum, 58
Tully–Fisher relation, 96, 106, 149
turn-around time, 123
21 cm forest, 277, 279
21 cm Gunn–Peterson test, 277
21 cm transition, 277
Two Micron All-Sky Survey, 108
type 1 active galaxies, 194, 207
type 1 AGN, 141
type 2 active galaxies, 207
type 2 AGN, 141, 203
type Ia supernovae, 106, 245
U-band dropouts, 133
UDF, 206
ULIRGs, 166, 167, 178, 209
Ultra-Deep Field, 173
ultra-deep survey, 144
ultraluminous infrared galaxies, 166
ultraluminous X-ray sources, 207
uncertainty principle, 262
unified model, 141
unobscured AGN, 205
unsharp masking, 180
vector field, 57
velocity dispersion, 153, 208
velocity four-vector, 288
velocity width, 96
Very Large Array, 173


Very Long Baseline Array, 197
Virgo cluster, 22, 109–111
virial equilibrium, 125
virial theorem, 99, 104
virialized assemblage, 99
VLA, 173
VLBA, 197
voids, 74
Voigt profile, 263
volume-limited sample, 135
warm dark matter, 121
water maser, 106
wave four-vector, 290
wave number, 61, 62
wave number vector, 63
weak anthropic principle, 79
white dwarfs, 38, 106
white holes, 185
wide-field survey, 151
Wien regime, 41
WIMP, 93
WMAP measurements, 75, 78
WMAP satellite, 24, 42, 68, 81, 258
wrap-around scales, 70
wrap-around topology, 70
X-ray background, 203, 273
X-ray spectral paradox, 203
Xallarap, 239
XBONGs, 205
XMM-Newton, 173, 205


Common questions


The luminosity function provides insights into galaxy evolution by describing the distribution of galaxy luminosities per unit comoving volume. It is fundamental for deriving the number density of galaxies and how that population evolves over time. Through techniques such as the 1/Vmax estimator and measurements of the evolving rest-frame ultraviolet luminosity density, researchers can track how galaxy populations change. Peaks in the redshift histograms from surveys such as those of the Hubble Deep Fields reflect large-scale structures, further elucidating evolutionary patterns.
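The 1/Vmax estimator mentioned above weights each detected galaxy by the inverse of the maximum comoving volume within which it would still pass the survey's flux limit. A minimal sketch follows; the function name, argument layout, and the toy Euclidean volume used in the check are illustrative, not from the text.

```python
import numpy as np

def lf_one_over_vmax(lums, zmaxs, volume_of_z, lum_bin_edges, sky_fraction=1.0):
    """Binned luminosity function from the 1/Vmax estimator.

    lums        : luminosities of the detected galaxies
    zmaxs       : per galaxy, the largest redshift at which it would
                  still exceed the survey flux limit
    volume_of_z : callable returning the comoving volume out to redshift z
    """
    vmax = sky_fraction * np.array([volume_of_z(z) for z in zmaxs])
    phi = np.zeros(len(lum_bin_edges) - 1)
    bins = np.digitize(lums, lum_bin_edges) - 1
    for i, weight in zip(bins, 1.0 / vmax):
        if 0 <= i < len(phi):            # ignore objects outside the binning
            phi[i] += weight
    # normalize by bin width in dex: number per volume per dex of luminosity
    return phi / np.diff(np.log10(lum_bin_edges))

# Toy check with a Euclidean volume element V ∝ z^3 and unit volumes:
phi = lf_one_over_vmax([1.0, 1.0, 10.0], [1.0, 1.0, 1.0],
                       lambda z: z**3, [0.5, 5.0, 50.0])
# phi → [2.0, 1.0]
```

Each galaxy here has Vmax = 1, so the estimator reduces to counting objects per bin divided by the one-dex bin width.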

Redshift enters the understanding of matter–radiation equality by relating the scale factor of the universe to the densities of the dominant energy components over cosmic time. The redshift of matter–radiation equality is given by 1 + z_eq = 23,800 Ω_m,0 h^2 (T_CMB,0 / 2.725 K)^−4, where Ω_m,0 is the present-day matter density parameter, h is the dimensionless Hubble parameter (H0 in units of 100 km s^−1 Mpc^−1), and T_CMB,0 is the current CMB temperature. This expression reflects how the energy densities of matter and radiation dilute at different rates with expansion, which sets the epoch at which the two were equal.
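The formula above can be evaluated directly. A minimal sketch; the input value Ω_m,0 h² = 0.14 is an illustrative Planck-like number, not a result quoted in the text.

```python
def z_eq(omega_m0_h2, t_cmb0=2.725):
    """Redshift of matter-radiation equality, using the constant quoted above:
    1 + z_eq = 23,800 * (Omega_m,0 h^2) * (T_CMB,0 / 2.725 K)^-4."""
    return 23_800 * omega_m0_h2 * (t_cmb0 / 2.725) ** (-4) - 1

# With an illustrative Omega_m,0 h^2 = 0.14:
# z_eq(0.14) → 3331.0
```

The steep T^−4 dependence arises because the radiation density scales as T^4 while the matter density is fixed today.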

The Magorrian relation is significant because it reveals a strong correlation between supermassive black hole masses and the luminosity of their host galaxy's bulge. This implies a deep connection between the growth of black holes and galaxy formation. The tightness of the correlation suggests that the processes governing star formation and black hole accretion are linked, possibly through feedback mechanisms that regulate star formation and gas inflow.

Hubble's Law underpins inferences about the universe's early state by establishing a relationship between the distances of galaxies and their redshifts, demonstrating the universe's expansion. The law, expressed as v = H0 d, where v is the recession velocity, d is the distance and H0 is the Hubble constant, implies that the early universe was more compact and dense, supporting the Big Bang theory and the expansion of space over time.
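As a worked example of v = H0 d, a minimal sketch; the value H0 = 70 km s^−1 Mpc^−1 is illustrative only, not a value the text commits to.

```python
def recession_velocity(distance_mpc, h0_kms_per_mpc=70.0):
    """Hubble's law v = H0 * d, with H0 in km/s/Mpc and d in Mpc.
    H0 = 70 km/s/Mpc is an illustrative choice."""
    return h0_kms_per_mpc * distance_mpc

# A galaxy 100 Mpc away recedes at 70 * 100 = 7000 km/s.
```

Running the expansion backwards from this linear relation is what implies an earlier, denser state.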

The correlations between supermassive black holes and host galaxy properties, like the Magorrian relation and the M_BH–σ relation, imply a co-evolutionary history that informs galaxy evolution theories. These correlations suggest feedback mechanisms whereby black hole growth regulates star formation in galaxies, affecting their development. The tightness of these correlations indicates that galaxy and black hole growth are interlinked processes, challenging traditional hierarchical models that treat them as distinct events.
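The M_BH–σ relation is a power law in the stellar velocity dispersion. A minimal sketch of its scaling behaviour; the normalization (~1.5×10^8 solar masses at σ = 200 km/s) and slope (~4) are indicative order-of-magnitude values commonly quoted in the literature, not fits from this text.

```python
def mbh_from_sigma(sigma_kms, norm_msun=1.5e8, slope=4.0):
    """Illustrative M_BH-sigma power law:
    M_BH = norm * (sigma / 200 km/s)^slope.
    norm and slope are indicative values, not results from the text."""
    return norm_msun * (sigma_kms / 200.0) ** slope

# Halving sigma lowers the predicted black hole mass by 2^4 = 16:
# mbh_from_sigma(100.0) → 9.375e6 (solar masses)
```

The steep slope is why even modest changes in bulge velocity dispersion correspond to large changes in inferred black hole mass.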

The cosmic microwave background's uniformity presents the horizon problem, challenging our understanding of early universe conditions. Its temperature is uniform across angular scales so large that, in standard Big Bang cosmology, the corresponding regions could never have been in causal contact, so their matching temperatures are left unexplained. The problem implies the need for inflationary theory or another mechanism to explain how causally disconnected regions achieved thermal equilibrium.

Astronomers face challenges such as the faintness of high-redshift galaxies and redshift confusion when identifying these objects. They overcome these challenges using photometric redshifts, which infer redshifts from observed broadband colours. Additionally, surveys use the Lyman break technique, exploiting the sharp drop in galaxy flux blueward of the Lyman break at the relevant redshifts, to efficiently select high-redshift candidates. These methods mitigate issues related to cosmic dust, telescope sensitivity and the evolving luminosity function.

Magnetic monopoles pose a problem in cosmology because Grand Unified Theories (GUTs) predict their copious formation in the early universe when the GUT symmetry breaks. Given the small horizon size at that time, their expected abundance should have left them dominating the present-day universe's energy density. Observationally, however, magnetic monopoles are not detected, which contradicts these predictions and points to a gap in our understanding of the universe's earliest conditions.

Extremely Red Objects (EROs) are crucial for understanding galaxy formation and evolution because of their distinctive optical and infrared properties, which suggest high mass and a possible association with more massive dark matter haloes. EROs include both dusty star-forming galaxies and old stellar populations. Their strong clustering indicates significant structures and informs us about the complex processes of star formation and galaxy assembly at high redshifts, offering insights into the chronological sequence of galaxy formation.

Lyman α clouds are important because they trace the universe's underlying matter distribution. They are less affected by the complex physical processes that shape galaxy formation, such as non-linear gravitational collapse and feedback, so they give a cleaner view of large-scale structure. By studying these clouds, researchers can test cosmological models and probe the distribution of baryonic matter.
