bioinformatics


Top 5 misconceptions about evolution: A guide to demystify the foundation of modern biology.

Version 1.0

Here is an infographic to help inform citizens. In my experience, most people who misunderstand evolution are actually misinformed about what science is and how it operates. That said, here are five of the biggest barriers one faces when explaining evolution; I have faced these myself, and they are documented in the literature.

I hope you can build on my work and improve communication between scientists and the public.

Want to do more? If you want to donate to the cause of science education, I suggest the National Center for Science Education http://ncse.com, your local university, or an equivalent organization. Volunteering at schools and inviting scientists into classrooms are two ways to encourage an informed society. Attend hearings if school boards start questioning evolution’s role in the public curriculum. Raise a storm if anyone tries to ban science. Plus, it never hurts to reblog a well-made evolution post.

Thank you, followers, for all your support!
Love, 
molecularlifesciences.tumblr.com

[Version 2.0 now available!]

CRISPR could protect genetic intellectual property by self-destructing synthetic sequences:

“A lot of efforts have been made around creating [biological] kill switches. We’re building on that, so that [a bacterium] wouldn’t just kill itself, but delete its synthetic DNA before doing that.” It’s like the biological version of hitting CTRL-Z.

I wonder whether this would also work with DNA storage.

Molecular model of a ribosome (Wellcome Images)

Molecular model of a bacterial ribosome showing the RNA and protein components in the form of ribbon models. In the large (50S) subunit the 23S RNA is shown in cyan, the 5S RNA in green and the associated proteins in purple. In the small (30S) subunit the 16S RNA is shown in yellow and the proteins in orange. The three solid elements in the centre of the ribosome, coloured green, red and reddish brown are the transfer RNAs (tRNAs) in the A, P and E sites respectively. The anticodon loops of the tRNAs are buried in a cleft in the small subunit where they interact with mRNA. The other ends of the tRNA, which carry the peptide and amino acid, are buried in the peptidyl transferase centre of the large subunit, where peptide bond formation occurs.

A hierarchical ontology of genes, cellular components and processes derived from large genomic datasets.

Toward a New Model of the Cell
Everything You Always Wanted to Know About Genes

Turning vast amounts of genomic data into meaningful information about the cell is the great challenge of bioinformatics, with major implications for human biology and medicine. Researchers at the University of California, San Diego School of Medicine and colleagues have proposed a new method that creates a computational model of the cell from large networks of gene and protein interactions, discovering how genes and proteins connect to form higher-level cellular machinery.

The findings are published in the December 16 advance online publication of Nature Biotechnology.

“Our method creates ontology, or a specification of all the major players in the cell and the relationships between them,” said first author Janusz Dutkowski, PhD, postdoctoral researcher in the UC San Diego Department of Medicine. It uses knowledge about how genes and proteins interact with each other and automatically organizes this information to form a comprehensive catalog of gene functions, cellular components, and processes.

“What’s new about our ontology is that it is created automatically from large datasets. In this way, we see not only what is already known, but also potentially new biological components and processes – the bases for new hypotheses,” said Dutkowski.

Originally devised by philosophers attempting to explain the nature of existence, ontologies are now broadly used to encapsulate everything known about a subject in a hierarchy of terms and relationships. Intelligent information systems, such as iPhone’s Siri, are built on ontologies to enable reasoning about the real world. Ontologies are also used by scientists to structure knowledge about subjects like taxonomy, anatomy and development, bioactive compounds, disease and clinical diagnosis.

A Gene Ontology (GO) exists as well, constructed over the last decade through a joint effort of hundreds of scientists. It is considered the gold standard for understanding cell structure and gene function, containing 34,765 terms and 64,635 hierarchical relations annotating genes from more than 80 species.
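To make "a hierarchy of terms and relationships" concrete, here is a minimal sketch of an ontology as a directed acyclic graph in Python. The terms, the gene names (`geneA`, `geneB`) and the single generic "parent" relation are all hypothetical simplifications chosen for illustration; this is neither the Gene Ontology's actual content nor the method proposed by the UC San Diego team.

```python
# Toy ontology: each term maps to its direct parent terms.
# A term may have several parents, so this is a DAG rather than a tree.
# All terms and links here are illustrative assumptions.
PARENTS = {
    "cell": [],
    "organelle": ["cell"],
    "ribosome": ["organelle"],
    "mitochondrion": ["organelle"],
    "large ribosomal subunit": ["ribosome"],
    "small ribosomal subunit": ["ribosome"],
    "mitochondrial ribosome": ["ribosome", "mitochondrion"],
}

# Hypothetical gene annotations: each gene is attached to specific terms.
ANNOTATIONS = {
    "geneA": {"large ribosomal subunit"},
    "geneB": {"mitochondrial ribosome"},
}


def ancestors(term):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(PARENTS[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def genes_for_term(term):
    """Genes annotated to the term itself or to any of its descendants."""
    return {
        gene
        for gene, terms in ANNOTATIONS.items()
        if any(t == term or term in ancestors(t) for t in terms)
    }


print(sorted(genes_for_term("ribosome")))       # both genes roll up to "ribosome"
print(sorted(genes_for_term("mitochondrion")))  # only geneB
```

The point of the hierarchy is the roll-up: a gene annotated to a specific term is implicitly annotated to every broader term above it, which is what lets a query at any level of detail return a sensible gene set.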

“GO is very influential in biology and bioinformatics, but it is also incomplete and hard to update based on new data,” said senior author Trey Ideker, PhD, chief of the Division of Genetics in the School of Medicine and professor of bioengineering in UC San Diego’s Jacobs School of Engineering.

“This is expert knowledge based upon the work of many people over many, many years,” said Ideker, who is also principal investigator of the National Resource for Network Biology, based at UC San Diego. “A fundamental problem is consistency. People do things in different ways, and that impacts what findings are incorporated into GO and how they relate to other findings. The approach we have proposed is a more objective way to determine what’s known and uncover what’s new.”



Music of the Spheres - Two researchers have recorded music onto DNA molecules

We know that it’s possible to store and preserve data in DNA and living cells. Now Music of the Spheres, a collaboration between visual artist Charlotte Jarvis and British scientist Dr Nick Goldman, aims to store a piece of music from the Kreutzer Quartet as digital information in synthetic DNA molecules.

Music of the Spheres is a cross-disciplinary art project inspired by the possibilities of the new bioinformatics technology developed by Dr. Nick Goldman. Visual artist Charlotte Jarvis commissioned music from the Kreutzer Quartet, the recording of which has been encoded into DNA. The DNA was then suspended in soap solution and will be used by Charlotte to create performances and installations filled with bubbles. The ‘recording’ will fill the air, pop on visitors’ skin and literally bathe the audience in music.

They are looking for £5,000 on Kickstarter to achieve their goal. Potential backer rewards include a bottle of DNA-infused bubble solution and paintings made from music-encoded DNA being blown onto paper.

It’s a wonderful project, showing the potential of bioinformatics, DNA engineering and DNA storage. But if you want a look at the dark side, check out DNA Fog for marking criminals.

[Music of the Spheres - Kickstarter] [Music of the Spheres Project] [via factmag]


704TB in DNA: A New Method of Information Storage?

One of the first things you learn in biology is that the nucleic acids, particularly DNA, store the biological blueprint inside an organism’s cells. Scientists have been experimenting with making little circuits and factories out of DNA, but it’s evident now that we’ve been missing out.

Professor George Church and his team at Harvard University have encoded and copied the professor’s new book entirely into DNA. They stuffed 96 bits into each DNA strand by treating each of the bases (A, T, C, and G) as though they were binary values. The genetic sequence was then synthesised by a microfluidic chip that matched up that sequence with its position in a relevant data set, even when all the DNA strands were out of order.
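The idea of treating bases as binary values can be sketched in a few lines of Python. This is a toy illustration, not the team's actual pipeline: the mapping (0 becomes A or C, 1 becomes G or T), the zero-padding, and keeping each strand's position as a plain integer next to the sequence are all assumptions made here for clarity. Only the 96-bit block size comes from the description above.

```python
import random

# Assumed mapping (not stated in the post): a 0 bit is written as A or C,
# a 1 bit as G or T. Having two choices per bit lets an encoder avoid
# awkward sequences; here the choice is simply random.
ZERO_BASES = "AC"
ONE_BASES = "GT"
BLOCK_BITS = 96  # payload bits per strand, as quoted in the post


def bytes_to_bits(data: bytes) -> str:
    return "".join(f"{byte:08b}" for byte in data)


def bits_to_bytes(bits: str) -> bytes:
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def encode(data: bytes, seed: int = 0) -> list[tuple[int, str]]:
    """Split data into 96-bit blocks and return (index, strand) pairs."""
    rng = random.Random(seed)
    bits = bytes_to_bits(data)
    bits += "0" * (-len(bits) % BLOCK_BITS)  # pad the last block with zeros
    strands = []
    for start in range(0, len(bits), BLOCK_BITS):
        block = bits[start:start + BLOCK_BITS]
        strand = "".join(
            rng.choice(ONE_BASES if bit == "1" else ZERO_BASES) for bit in block
        )
        strands.append((start // BLOCK_BITS, strand))
    return strands


def decode(strands: list[tuple[int, str]], n_bytes: int) -> bytes:
    """Reassemble the data from strands given in any order."""
    bits = "".join(
        "".join("1" if base in ONE_BASES else "0" for base in strand)
        for _, strand in sorted(strands)  # the index restores the order
    )
    return bits_to_bytes(bits[:n_bytes * 8])


message = b"DNA stores data"
strands = encode(message)
random.shuffle(strands)  # strands in a tube have no inherent order
assert decode(strands, len(message)) == message
```

The stored index is what makes it possible to recover the data even when the strands are out of order; in a real system that address would itself have to be written in bases on each strand.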

“My flash drive works just fine,” you say? Well, apparently DNA works better. Microscopic DNA can store a gigantic amount of information: 704TB of data fits into a cubic millimeter, or more than you’d get out of a few hundred hard drives. Of course, there are caveats; the processing time is currently too slow for time-sensitive content, and cells with living DNA would destroy the strands too quickly to make them viable for anything more than transfers. All the same, with DNA’s density and lifespan of eons, Professor Church has certainly opened a few interesting doors in biotechnology and broadened technology’s view of information storage.  

The article was originally published in the journal Science. Harvard University’s website has an article that makes for lighter reading. The video is the property of Harvard University.

Wide range of differences, mostly unseen, among humans

No two human beings are the same. Although we all possess the same genes, our genetic code varies in many places. And since genes provide the blueprint for all proteins, these variants usually result in numerous differences in protein function. But what impact does this diversity have? Bioinformatics researchers at Rutgers University and the Technische Universitaet Muenchen (TUM) have investigated how protein function is affected by changes at the DNA level. Their findings bring new clarity to the wide range of variants, many of which disturb protein function but have no discernible health effect, and highlight especially the role of rare variants in differentiating individuals from their neighbors.


You can code, too!

Recently, I’ve had several conversations with biologists who know they should probably learn to code, but the whole concept is so daunting and they’re too busy with their research anyway. I’ve tried to convince them to pursue programming, not just because it’ll make their research analyses faster and better, but because it’s an incredibly powerful skill that anyone can learn.

I see programming and engineering as just other methods of creating things that never were, like writing or art or music or dance. Before Grace Hopper, there were no compilers for computers, and everything had to be written in machine- and architecture-specific instructions. Before Donald Knuth (also see his advice for young people video), typesetting math equations was a huge pain, and now we have TeX, LaTeX and derivatives to painlessly create beautiful equations. Before Linus Torvalds and Richard Stallman, creators of GNU/Linux (a Unix-like operating system, as is the one underlying Mac OS), there was no unifying operating system that could be modified and changed for each user’s needs. Before Microsoft and Apple Computer, the concept of the personal computer didn’t even exist, and it was thought to be just a fad that wouldn’t hold.

The reason I got into programming wasn’t to make games or to hack into things. I started because I was working in a biology lab on protein binding microarrays, and I ran a graduate student’s code, which turned an image of a glass slide, polka-dotted wherever proteins had bound to double-stranded DNA, into the DNA letters bound by the protein. Then, I took my first programming class in my sophomore year of college, and I only took it because it was required for Biological Engineering. It was in Python. I thought it was pretty cool, but the moment where I truly became a computational biologist was my second research experience, where I coded up Hidden Markov Models in Python. The lab I was working in was a C-language lab, so I decided to learn C and “translate” my program from Python to C. It was awesome. As much as malloc and free were annoying (though valgrind helped!), it was so satisfying to see my program run 100x faster in C, and to know that I had learned this difficult language. Keep in mind that I had been coding for less than a year when I attempted this project.

When I did my MS in bioinformatics at UC-Santa Cruz, I met Paola. She had studied biochemistry at UCLA and came to UCSC for the program’s biomolecular side, since it technically was biomolecular engineering and bioinformatics. The first class we took for the program was a bioinformatics algorithms course, and this was the first time she had coded in her life. And she did awesome. She fell in love with coding as I had, and left gel electrophoresis behind to take discrete math courses to boost her CS knowledge. She now works as an engineer at SurveyMonkey, getting hired after just over a year of coding.

Another friend of mine, Neha, studied mechanical engineering in undergrad and went into consulting, but hated it. She applied to Hacker School twice but didn’t get in. She took Udacity classes on programming and applied again, this time getting in. Through three months of intensive programming and software development at Hacker School, Neha learned more than many people learn in years. She quickly found a job as a software engineer at Rent the Runway and adores it.

I’m incredibly proud of Paola and Neha for reinventing themselves. The task of learning programming is not impossible; it just takes time and grit. And you can do it, too.

Next time you sit down to analyze some differential expression or calculate some statistics, force yourself to do it in Python, specifically pandas. Feel free to tweet at me with questions! My handle is @olgabot

EDIT:

I forgot to mention that one of my mentors, Sean Eddy, studied molecular biology for his PhD, and only after he earned his PhD did he get into coding. And he’s made significant contributions to the field of computational biology through the sequence homology comparison program HMMER and through RNA structure prediction with Infernal. If you can do it after a PhD, it’s truly never too late!

On the flip side, at both UC-Santa Cruz and UC-San Diego there are lifelong software engineers who have spent decades coding up programs, and have decided that they want to apply their programming skills to something else. So they go do a PhD in bioinformatics! Either way, from biology to coding, or from coding to biology, you end up in bioinformatics! Clearly the best field :)

Resources for learning to code

  • Codecademy (like a “code-academy”… get it??): Codecademy is an awesome resource for getting started with a language. I used it to learn JavaScript!
  • Rosalind.info: Named after Rosalind Franklin, whose X-ray crystallography experiments helped Watson and Crick discover that DNA is a double helix, Rosalind.info is an awesome resource for bioinformatics-specific problems. You start with translating DNA to RNA to Protein, then eventually you build your own genome assembler!
  • try.github.io: Cool way to start learning code versioning (keeping backups of your code), which is ultra-important so you don’t have to try to remember what you did six months ago.
  • Coursera: An online learning platform with tons of classes, including the awesome Bioinformatics Algorithms class (https://www.coursera.org/course/bioinformatics) that I helped design. I’ve also taken the Probabilistic Graphical Models and Machine Learning courses. PGMs was good; ML was a little watered down for my math major self, but it’s a good intro overall.
  • Udacity: Another online learning platform. They have tons of courses on Computer Science, Web Development, Design of Computer Programs (taught by Peter Norvig!!!), and many, many, more.
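To give a taste of the first Rosalind-style problems mentioned above (DNA to RNA to protein), here is a minimal Python sketch. It assumes the input is the coding strand, so transcription is just swapping T for U, and it uses the standard genetic code; the function names are my own, not anything Rosalind prescribes.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Turn a coding-strand DNA string into RNA by replacing T with U."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Translate RNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCATGGCGCCCAGAACTGAGATCAATAGTACCCGTATTAACGGGTGA"
print(translate(transcribe(dna)))  # prints "MAMAPRTEINSTRING"
```

Once this works, a natural next step is to load the sequences into pandas and start computing things like GC content per sequence.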

A computer analysis of gene expression in normal and cancer cells. Each line represents one of 22,500 different genes, which have been analyzed for changes occurring in their expression patterns when normal and cancer cells are treated with a demethylating agent. DNA methylation is a process associated with regulating genes. Red represents high levels of gene expression and blue represents low levels.  Image courtesy of Wellcome Trust

One Hundred Patients, One Thousand Experts, One Million Hopes: UC San Diego Moores Cancer Center Launches “My Answer to Cancer”

Today, UC San Diego Moores Cancer Center launched a personalized cancer treatment program called “My Answer to Cancer.” A team of oncologists, bioinformaticians, pathologists and geneticists has pledged to sequence and analyze the DNA of 1,000 patients with metastatic disease. When the project concludes, researchers and clinicians hope to have created an enormous and unprecedented database of cancer/DNA data linked to clinical outcomes. That knowledge can then be used to offer novel therapies targeted for specific tumor defects.

Scott Lippman, MD, director of Moores Cancer Center, is unabashedly optimistic that efforts like “My Answer to Cancer” will change the paradigm of cancer treatment:

“‘My Answer to Cancer’ is a necessary first step to getting to personalized medicine.  We ultimately want to be able to do molecular sequencing on everybody, but you have to start somewhere. This study helps us work out the many aspects and challenges, such as getting critical information from the lab to the doctor and how the doctor can effectively use it.  We need to learn how to do this quickly, without waiting weeks or months between lab results and treatment.

We’re experiencing a sea-change. This is not your father’s cancer treatment.  It’s not enough now to just know what’s going on in a cancer cell. We need to know what’s happening in the tumor microenvironment, in other cells and tissues, in how the patient’s immune system is functioning and responding and in respect to other host factors such as inherited changes and commensal bacteria. These things have major impacts upon the behavior of the cancer.

Scientists and doctors at UC San Diego and Moores Cancer Center are identifying molecular defects and targets that we can therapeutically attack in dozens of different ways, with antibodies, for example, or small molecules. The science is moving quickly. Look at all of the drugs that have been developed and approved for cancer in the last few years. Compare this to the situation 20 years ago. Back then, to treat a cancer patient, doctors would look at the few drugs available and consider various doses or combinations for optimal effect.

I think we can find new cures for some cancers and get to a point with others where we can convert cancer from a life-threatening disease to a chronic condition.”

I once had a particularly thorny scientific problem that I kept hacking away at (if you must know, the in-memory layout of compressed DNA structures for fast search retrieval) and it almost killed me. I thought about it as I commuted to work by car and it absorbed all my attention as I drove around Cambridge – through traffic lights, tailbacks and roundabouts. So much so that when I started to see a solution, my brain shut out more and more of the world around me. I was totally startled when someone tapped on my car window and asked if I was OK. I had frozen in front of a roundabout and apparently hadn’t moved my car for a good three minutes. I hadn’t even heard the beeping behind me. I apologised profusely and drove on and have since then stuck to a rule of not thinking too hard about things while operating heavy machinery. I take the train a lot more.
Ewan Birney @ewanbirney
Computational biologist and joint associate director of the European Bioinformatics Institute (EMBL-EBI Hinxton), Cambridge

The Guardian: Scientists and their emotions