“Every dollar we invested to map the human genome returned $140 to our economy — every dollar. Today our scientists are mapping the human brain to unlock the answers to Alzheimer’s. They’re developing drugs to regenerate damaged organs, devising new materials to make batteries 10 times more powerful. Now is not the time to gut these job-creating investments in science and innovation.”—Barack Obama, “Project Seeks to Build Map of Human Brain,” 2013
Homo sapiens chromosome 15 genomic contig, GRCh37.p10 Primary Assembly
NCBI Reference Sequence: NT_037852.6
Typical Sunday Morning(?):
I wake up asking myself whether it would be possible to modify the ByteWriter program to visualize DNA.
After breakfast, a couple of cups of coffee, and a few hours, the answer is: It is.
After looking through a database containing the human genome and visualizing a portion of the 15th chromosome, partially responsible for eye and skin color, I noticed some repeating patterns.
This is what that first segment sounds like.
The source file contains only 4 characters (A, C, G, & T) in raw UTF-8 text. Each of those characters is a single byte, so the natural conversion is to treat every byte as one sample of an 8-bit mono audio stream.
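For anyone who wants to try the same idea, here is a minimal sketch of that conversion using Python's standard `wave` module. It is not the ByteWriter program itself. The function name `dna_text_to_wav` and the 8000 Hz sample rate are assumptions made for illustration. Each base's byte value (A = 65, C = 67, G = 71, T = 84) becomes one unsigned 8-bit sample.

```python
import io
import wave


def dna_text_to_wav(seq: str, sample_rate: int = 8000) -> bytes:
    """Return the bytes of a WAV file in which each base is one 8-bit sample.

    Hypothetical helper for illustration; the sample rate is an assumed default.
    """
    # Keep only the four bases; newlines and any other characters are dropped.
    samples = bytes(ord(base) for base in seq.upper() if base in "ACGT")

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(1)           # 1 byte per sample = 8-bit unsigned PCM
        wav.setframerate(sample_rate)
        wav.writeframes(samples)
    return buf.getvalue()
```

Writing the result to disk is one more line, for example `open("segment.wav", "wb").write(dna_text_to_wav(text))`, where `text` is the sequence read from the source file and `segment.wav` is a placeholder name.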
I can’t believe how much this resembles raw machine code.
“We share 56% of our genes with yeasts, more than half with fruit flies, 90% with mice, and 98% with Chimpanzees. It is not genetics that makes us human.”—Results of the Human Genome Project.
The Human Genome Project
An international scientific research project designed to study and identify all of the genes in the human genome, to determine the base-pair sequences in human DNA, and to store this information in computer databases. The Human Genome Project began in the United States in 1990 and was completed in 2003.
Me and science
I went to a tiny liberal arts school where we studied the history of (Western) math and science. The things I know about science are dated, often wrong (we read Galen and Ptolemy before we read Harvey and Copernicus, for example), and are what I assume to be laughably well-known and simple to anybody who studies actual science. Further, I wasn’t even that good at it (Huygens and Maxwell are stupid hard, and I have also always been terrible at science) so what I got out of my “History of Science” classes was very minimal.
tl;dr, I am the last person you would go to if you wanted an expert on science.
And so I was really surprised when I read this article on Eric Schadt about, among other things, the failure of the Human Genome Project.
It was so triumphant that we believed in it (and still believe in it) even when it has gone a long way toward bankrupting the pharmaceutical industry with drugs like the painkiller Vioxx and the diabetes medication Avandia — drugs that hit their molecular targets but also cause catastrophic side effects by hitting other unforeseen targets as well — or drugs that never come close to making it to market at all. We still believe in it even when nearly ten years after the mapping of the genome, it has radically increased the cost of drug development while delivering next to nothing in return.
Basically the idea I got from it is that a lot of people - not just laymen but some of the scientists working on the project - believed that each gene had a discrete purpose, that there were “good” genes and “bad” cancer-causing genes, that these genes didn’t interact with each other in any way, and therefore fucking with the “bad” genes wouldn’t have any harmful side effects.
They’re wrong. And the idea of discrete purposes of genes as opposed to “the milieu,” as we called it in class, is even to me dated and simplistic and silly. That is some Mendelian middle school bullshit right there. (And I want to be clear I have the utmost respect for Mendel and his work). But I cannot imagine that somehow, at my tiny backwards college, we learned something that we have managed to keep secret from the tens of thousands of actual scientists in the country. Especially because I don’t even know if I should be talking about genes, or chromosomes, or DNA.
Anyway, I’m constantly amazed at things like this. Because even though I’m terrible at science and I hate doing science, I kind of love it. Occam’s Razor is all well and good but it seems like for every magically-simplifying theory (“have you ever considered a heliocentric model?”) there’s another that shows that things are actually more crazy and complicated and awesome than we thought (quantum mechanics!). And if that isn’t what excites you about doing science I don’t really understand why you’re in charge of the Human Genome Project.
Vague Notes Time!
Feel free to scroll past on your dash :V
Genome = DNA info of individual or species
Humans: 3 billion base pairs
Can sequence 1000 bp fragments
First discussed in early-mid 80’s
1986-7 federal effort started
US Dept of Energy
1987-8 National Institutes of Health gets involved
Each chromosome sequenced individually
International cooperation from Japan, UK, etc.
Public access database established in 1991
1995 first genomes (bacteria) were announced
Slow progress in mid 90’s
Relating fragments via tedious manual procedures
1998 Craig Venter starts Celera Genomics Corporation
For profit (Celera) vs non-profit (governmental) approaches
June 2000 “working draft” sequence
May 2006 final chromosome sequence published
Total cost about $2 billion
High quality data
Estimated errors 1/100,000
Humans have surprisingly few genes (20,000-25,000)
Mouse - ~20,000 genes
Fruit fly - 13,600 genes
Roundworm - 19,100 genes
Total genome ~4.7 billion bp (70% known)
Mammoths, elephants differ 0.6%
Differences: 1.2% single-nucleotide changes, 2.7% duplications, 35 million single-base differences
99% of mouse genes are also found in humans, but their locations have shifted over the course of evolution
The C-value paradox
Some “extra DNA” does not code for protein
Still has important functions
Centromeres: 200 bp sequence repeated 1000s of times (70% AT)
Telomeres: TTAGGG in many vertebrates, TTTAGGG in plants
However, does not account for all non-coding DNA
Minisatellites (aka VNTRs or variable number of tandem repeats)
10-100 bp sequences repeated over 1000-5000 bp
~200,000 in human genome
Not transcribed or translated, unknown function
Origin: errors in replication
Human Alu sequence (~300 bp)
Repeated 500,000 times, makes up 5% of genome
Interspersed repeats in genome: >1,000,000 copies
Retroviral RT often present in cells
Can act on cell RNA
Eukaryotes have lots of nontranslated DNA
Human genome as a whole: Transposons = 45%, Introns = 24%, Large duplications = 5%, Simple sequence repeats = 3%, Other untranslated = 21%, Exons = 1-2%
Microbiome: 500 bacterial species
100 trillion cells vs. few trillion human cells
Not all genes are expressed at the same time
~21,000 genes, 10-15,000 mRNAs
PCR: important tool for sequencing
Another way to multiply particular sequence
DNA polymerase rather than cellular replication
Kary Mullis, 1983
Existing DNA strand as template
Mullis developed way to copy part of genome
DNA polymerase adds nucleotides
Denaturation separates the strands so artificial primers can bind
High temperatures —> can’t use ordinary DNA polymerase
30 cycles: 1 molecule —> ~1 billion
30 cycles: 3 hours
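The "1 molecule to ~1 billion" figure is just repeated doubling. A quick sketch, assuming the idealized case where every cycle doubles every molecule (real reactions are less efficient); `ideal_pcr_copies` is a made-up helper name:

```python
def ideal_pcr_copies(cycles: int, start_molecules: int = 1) -> int:
    """Copies after `cycles` rounds, assuming perfect doubling each cycle."""
    return start_molecules * 2 ** cycles


copies = ideal_pcr_copies(30)
print(f"{copies:,}")  # 1,073,741,824, i.e. roughly 1 billion

# 30 cycles in 3 hours works out to 6 minutes per cycle.
minutes_per_cycle = 3 * 60 / 30
print(minutes_per_cycle)  # 6.0
```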
Similarity between primates long recognized
Prosimians (lemurs, tarsiers, etc.)
New World monkeys
Old World monkeys
Apes (gorillas, etc.)
Humans are especially similar to apes
Tyson’s 1699 anatomical study of a chimp
Linnaeus’s 1735 classification of humans with primates
Humans = Homo sapiens
Orangutans and chimpanzees also Homo
Now, humans = Homo, chimps = Pan, orangutans = Pongo, gorillas = Gorilla
Macroevolution; the missing link
Java Man: Dutch physician Eugene Dubois; 1891, Indonesia, Java
Flat, very thick skullcap, brain size about 940 cc
Age about 700,000 years
First named Pithecanthropus erectus (“ape-man”), now Homo erectus
Raymond Dart, Australian anatomy professor
“Taung child”, South Africa, 1924
Several fossil species, about 4.2-1.4 million years old
Like apes in some features, like humans in some features, intermediate in some features
Brain: 430-440 cc (other species <500 cc)
Climate change: less forest, more grassland, food dispersed in patches
More energy-efficient locomotion
Less chance of thermal stress
Homo, same genus as humans
Increase in brain size from 650 cc to 1400 cc
Earliest tools, 2.6 million years ago
Stone hammers, scrapers
First outside Africa
Persisted until ~18,000 years ago
Homo sapiens - 195,000 years ago
Macroevolution: a new type of organism should not show up suddenly
Transition between types
Scientific method and human evolution
Induction —> hypothesis
Deductive logic —> prediction
Based on human-ape anatomical similarity
Fossil record should contain intermediates
Humans and apes should be similar genetically
Reconstructing evolutionary relationships from DNA
PCR to amplify same gene in different species
Reconstruct ancestral pattern (ATCG)
Usually study another species related to these 3
Humans: hearing, speech, immune system, smell genes evolved rapidly
Chimps: muscle and skeleton genes evolved rapidly
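The comparison step in these notes can be sketched roughly as follows: take the same gene from several species, line the sequences up, and count the positions where they differ. The sequences and species labels below are invented toy data, not real genes, and the sketch assumes the sequences are already aligned to the same length. Real analyses use proper alignment and tree-building software.

```python
from itertools import combinations


def count_differences(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


# Made-up 10-base sequences, purely for illustration.
toy_sequences = {
    "species_a": "ATCGGATCCA",
    "species_b": "ATCGGATCTA",
    "species_c": "ATTGGCTCTA",
}

# Compare every pair; fewer differences suggests a more recent common ancestor.
for (name1, seq1), (name2, seq2) in combinations(toy_sequences.items(), 2):
    print(name1, name2, count_differences(seq1, seq2))
```

In this toy data, species_a and species_b differ at 1 position, while species_c differs from them at 3 and 2 positions, so a and b would be grouped together first.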