Genome sequencing
Size of human genome
23 pairs of chromosomes
3.1 billion bp
If the code were written out in NYC phone
books and stacked up, the pile would reach
the top of the Washington Monument.
Human Genome Project
Began as an academic effort
Initially involved 5 research centers in
US and England.
Soon joined by Celera, a spin-off
company.
Some surprises
Initial estimate 100,000 to 150,000
genes but found to be 35,000 to
50,000. (C. elegans ~19,000 genes)
Fraction of genome that codes for protein
originally estimated as 5% but found to
be 1.5%.
Some completely sequenced
genomes
Mycoplasma genitalium
578,000 bp, 400 genes
Haemophilus influenzae
1,830,138 bp, 1738 genes
E. coli
4,639,221 bp, 4377 genes
S. cerevisiae
12 x 10^6 bp, 5885 genes
More genomes
C. elegans
95.5 x 10^6 bp, 19,820 genes
D. melanogaster
1.8 x 10^8 bp, 13,601 genes
A. thaliana
1.17 x 10^8 bp, 25,498 genes
More genomes
M. musculus
3 x 10^9 bp, ~30,000 genes
H. sapiens
3.3 x 10^9 bp, 30,000-50,000 genes
O. sativa
4.3 x 10^8 bp, 30,000-63,000 genes
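A quick way to compare the genomes listed above is gene density (genes per million bp). The sketch below uses only the sizes and gene counts from the slides for the organisms with a single gene count; the function name genes_per_mb is just an illustrative choice.

```python
# Genome size (bp) and gene count, copied from the slides above.
genomes = {
    "M. genitalium": (578_000, 400),
    "H. influenzae": (1_830_138, 1738),
    "E. coli": (4_639_221, 4377),
    "S. cerevisiae": (12e6, 5885),
    "C. elegans": (95.5e6, 19_820),
    "D. melanogaster": (1.8e8, 13_601),
    "A. thaliana": (1.17e8, 25_498),
}


def genes_per_mb(size_bp, n_genes):
    """Return gene density in genes per million base pairs."""
    return n_genes / (size_bp / 1e6)


density = {name: genes_per_mb(*values) for name, values in genomes.items()}

# Print from most to least gene-dense.
for name, d in sorted(density.items(), key=lambda kv: -kv[1]):
    print(f"{name:16s} {d:8.1f} genes/Mb")
```

The small bacterial genomes come out far more gene-dense than the multicellular ones, which fits the surprise noted earlier that only a small fraction of a large genome codes for protein.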
The beginning
Human genome project initially
discussed at a UC-Santa Cruz meeting
in 1985.
What were the concerns?
What will it do to biology?
How will we pay for it?
Is this really science?
Why bother to sequence it all?
all vs. just the genes (skim sequencing)
Dept. of Energy
Initially funded project in 1987.
$5.3 million
Study radiation-induced mutations,
repair, and effects on humans.
NIH
Joined in 1988.
James Watson leader
3% of research budget devoted to
examining the ethical, legal, and social
implications of gene research (ELSI)
Other genomes
Parallel sequencing of E. coli, S.
cerevisiae, C. elegans, D. melanogaster ,
and M. musculus
Why
Work out the technology and methods
Watson’s vision
Sequence it all not just genes.
Use genetic maps and markers to help
assemble the pieces.
Academic players
Wash U
Baylor
Whitehead
Wellcome Trust
Joint Genome Institute—DOE Center
$1 to 10 cents a finished bp
automated processing of cloned DNA
automated DNA sequencing
computer system to support sequence
data
algorithms to assess sequence fidelity,
assemble sequences, and “find” genes.
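To see why the per-base cost mattered, here is a rough calculation. It assumes the "$1 to 10 cents" line means the cost per finished bp falling from $1.00 to $0.10, and it uses the 3.1 billion bp genome size from the first slide; project_cost is a made-up helper name.

```python
GENOME_BP = 3.1e9  # human genome size quoted on the first slide


def project_cost(cost_per_bp, genome_bp=GENOME_BP):
    """Total cost in dollars of one finished pass over the genome."""
    return cost_per_bp * genome_bp


for cost in (1.00, 0.10):
    total = project_cost(cost)
    print(f"${cost:.2f} per bp -> ${total / 1e9:.2f} billion")
```

This ignores the redundancy needed to finish a sequence, so it is a lower bound for illustration only.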
Maps
Thomas Hunt Morgan (early 1900s)—
low resolution phenotypic markers
1970s restriction maps
1980s RFLPs
1989 Maynard Olson, Leroy Hood,
Charles Cantor, and David Botstein:
the sequence itself is a marker (sequence-tagged site, STS)
PCR
Polymerase Chain Reaction
[Link]
Techniques
Amplifying
Making copies of DNA
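The power of PCR comes from doubling: each cycle can copy every template present. A minimal sketch of that arithmetic follows. The efficiency parameter (fraction of templates copied per cycle) is an assumption added for illustration and is not from the slides.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Estimate copy number after a number of PCR cycles.

    efficiency=1.0 is ideal doubling; lower values model cycles in which
    only that fraction of templates is copied (an illustrative assumption).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


for n in (10, 20, 30):
    print(f"{n} cycles: {pcr_copies(1, n):,.0f} copies from one template")
```

With ideal doubling, 30 cycles turn a single molecule into about a billion copies, which is why cloned material no longer had to be shipped between labs.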
The PCR revolution
1985
Kary Mullis, Cetus Corporation
No need to send clones back and forth
Allowed automated DNA sequencing
No need for a large clone repository for
all human genes
Unrestricted access to genes via public
sequence databases.
Kary Mullis talks about PCR
[Link]
Techniques
Amplifying
Interviews
Making DNA copies
Naming PCR
Sequencing-the old way
Maxam and Gilbert or Sanger methods
[Link]
Techniques
Sorting and Amplifying
Early DNA sequencing
[Link]
Techniques
Sorting and Amplifying
Interviews
Dideoxy method of sequencing
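The logic of the dideoxy (Sanger) method can be sketched in a few lines: synthesis stops wherever a terminator is incorporated, giving one fragment per position, and sorting the fragments by length reads out the sequence. This is a simplified model, not a lab protocol. It assumes the template is written 3' to 5' so the new strand reads 5' to 3' left to right, and the function names are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_fragments(template):
    """Return (length, terminal_base) for every chain-terminated fragment.

    The template is assumed to be written 3'->5', so the newly made
    strand is its base-by-base complement read left to right.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    return [(i + 1, new_strand[i]) for i in range(len(new_strand))]


def read_gel(fragments):
    """Sort fragments by length (shortest runs fastest) and read the bases."""
    return "".join(base for _, base in sorted(fragments))


fragments = sanger_fragments("TACGGATC")
random.shuffle(fragments)  # fragments are mixed together in the reaction tube
print(read_gel(fragments))  # the complement of the template
```

The gel (or capillary) does the sorting step physically; the label on each fragment's last base does the reading.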
Automated Sequencing
Automation made possible by new dye
chemistry developed by Leroy Hood and
Lloyd Smith at Cal. Inst. Tech. in 1986.
[Link]
Techniques
Sorting and Amplifying
Cycle Sequencing
Inside the automated sequencer
Collaboration with ABI produced the first
automated sequencer.
Laser detection of each bp.
[Link]
Techniques
Sorting and Amplifying
Interviews
Making sequencing automated
Inside an automated sequencer
Sequencing
Detecting all 4 nucleotides in one lane
quadrupled the output from a single
sequencing gel.
DuPont dye terminators: a distinct dye
attached to each of the four terminating
nucleotides, so all four could be used in
the same sequencing reaction.
Capillaries eliminated the need to cast gels.
Sequencing the Genome: an
Overview
Show [Link] file containing
movie about sequencing the human
genome.
Two approaches to sequence the
genome
Hierarchical Shotgun clone libraries
Use map to pick pieces of genome in order,
break them, sequence and reassemble.
(Watson)
Whole genome shotgun
Break up genomic DNA randomly,
sequence several genome equivalents, and
reassemble. (Venter)
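"Several genome equivalents" can be made concrete with a coverage estimate. The sketch below uses the standard Poisson approximation (often associated with Lander and Waterman), which is not mentioned in the slides and is brought in here as an assumption: with average coverage c, the chance that a given base is missed is about e^-c.

```python
import math


def coverage(n_reads, read_len, genome_len):
    """Average number of times each base is sequenced."""
    return n_reads * read_len / genome_len


def fraction_uncovered(c):
    """Poisson estimate of the fraction of bases hit by no read at coverage c."""
    return math.exp(-c)


for c in (1, 3, 5, 10):
    print(f"{c:2d}x coverage: about {fraction_uncovered(c):.4%} of bases missed")
```

Random sampling leaves gaps at low coverage, which is why a shotgun project sequences the genome several times over before reassembling.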
Hierarchical Shotgun Clone
Libraries
Top-down strategy
Ordered library of clones based on large
scale maps.
Subclone larger inserts into sequencing
vector.
Reassemble sequence.
Based on order.
ESTs
Expressed sequence tags
Reverse transcribe mRNA and
sequence.
Venter used nonspecific primer to
randomly amplify 150-400 bp fragments
of genes.
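As a rough illustration of the EST idea, the sketch below turns an mRNA string into first-strand cDNA (the reverse complement, in DNA letters) and then takes a short stretch of it as the "tag". The function names and the example sequence are invented for illustration. Real ESTs are single-pass reads of a few hundred bp.

```python
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(mrna):
    """Return first-strand cDNA (5'->3') for an mRNA written 5'->3'."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(mrna.upper()))


def make_est(cdna, start, length):
    """Take a short stretch of the cDNA as the expressed sequence tag."""
    return cdna[start:start + length]


mrna = "AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG"  # made-up example
cdna = reverse_transcribe(mrna)
print(cdna)
print(make_est(cdna, 0, 12))
```

Because the tag comes from mRNA, it marks a gene that is actually expressed, even when nothing is known about what the gene does.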
Patent controversy
NIH announced it would seek a patent
on Venter’s ESTs.
Very controversial since functionally
unknown.
More appropriate to private company.
Watson said it was “sheer lunacy” and
resigned due to conflict with Bernadine
Healy, the NIH director.
More patent
Many biotech companies arose at the
time to mine ESTs and applied for
patents on the genes for diagnostics
and pharmaceuticals.
NIH withdrew patent application.
ESTs must be novel to be patented.
ESTs must be useful to be patented.
The result
No patents granted thus far on genes
without known function.
Whole genome shotgun
Break the genome into a bunch of pieces
often by mechanical shearing.
Sequence pieces and reassemble.
Weber (Marshfield Medical Research
Foundation) and Myers (U of AZ) proposed
method to speed sequencing.
1998: Venter leaves TIGR to head Celera and
promises to sequence the human genome in 3
years for $300 million.
Accelerated the public project.
Whole genome method was tested by
sequencing 120 Mbp of Drosophila
genome.
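The "break, sequence, reassemble" idea can be shown with a toy example. The sketch below samples random reads from a short string and merges reads by greedy longest overlap. Greedy merging is a teaching simplification chosen here, not the assembler Celera used, and it breaks down on repeats. All names and the toy genome are hypothetical.

```python
import random


def shear(genome, read_len, n_reads, rng):
    """Sample random fixed-length reads, a stand-in for mechanical shearing."""
    last_start = len(genome) - read_len
    return [genome[s:s + read_len]
            for s in (rng.randrange(last_start + 1) for _ in range(n_reads))]


def overlap(a, b, min_len):
    """Longest suffix of a that is a prefix of b, or 0 if below min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(reads):
    """Remove duplicates and reads that lie wholly inside another read."""
    unique = list(dict.fromkeys(reads))
    return [r for r in unique if not any(r != o and r in o for o in unique)]


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = drop_contained(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining pieces are separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        rest = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads = drop_contained(rest + [merged])
    return reads


toy_genome = "ACGTTGCAATGCCTAGGATC"
reads = shear(toy_genome, 10, 12, random.Random(0))
print(greedy_assemble(reads))  # one contig if the reads happen to cover everything
```

With too few reads, the result is several contigs rather than one, which is the gap problem that extra coverage and, in the hierarchical approach, a physical map are meant to solve.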