Recently, I’ve had several conversations with biologists who know they should probably learn to code, but the whole concept is so daunting and they’re too busy with their research anyway. I’ve tried to convince them to pursue programming, not just because it’ll make their research analyses faster and better, but because it’s an incredibly powerful skill that anyone can learn.
I see programming and engineering as just other methods of creating things that never were, like writing or art or music or dance. Before Grace Hopper, there were no compilers for computers, and everything had to be written in machine- and architecture-specific instructions. Before Donald Knuth (also see his advice for young people video), typesetting math equations was a huge pain, and now we have TeX, LaTeX and derivatives to painlessly create beautiful equations. Before Linus Torvalds and Richard Stallman, creators of GNU/Linux (a Unix-like operating system; Mac OS is built on a related Unix foundation), there was no unifying operating system that could be modified and changed for each user’s needs. Before Microsoft and Apple Computer, the concept of the personal computer didn’t even exist, and it was thought to be just a fad that wouldn’t hold.
The reason I got into programming wasn’t to make games or to hack into things. I started because I was working in a biology lab on protein binding microarrays, and I ran a graduate student’s code which turned a glass slide, polka-dotted with the spots where proteins had bound to double-stranded DNA, into the sequence of DNA letters bound by each protein. Then, I took my first programming class in my sophomore year of college, and I only took it because it was required for Biological Engineering. It was in Python. I thought it was pretty cool, but the moment where I truly became a computational biologist was my second research experience, where I coded up Hidden Markov Models in Python. The lab I was working in was a C-language lab, so I decided to learn C and “translate” my program from Python to C. It was awesome. As much as memory-management calls like free were annoying (though valgrind helped!), it was so satisfying to see my program run 100x faster in C, and to know that I had learned this difficult language. Keep in mind that I had been coding for less than a year when I attempted this project.
When I did my MS in bioinformatics at UC-Santa Cruz, I met Paola. She had studied biochemistry at UCLA and came to UCSC for the program’s biomolecular side, since it technically was biomolecular engineering and bioinformatics. The first class we took for the program was a bioinformatics algorithms course, and this was the first time she had coded in her life. And she did awesome. She fell in love with coding as I had, and left gel electrophoresis behind to take discrete math courses to boost her CS knowledge. She now works as an engineer at SurveyMonkey, getting hired after just over a year of coding.
Another friend of mine, Neha, studied mechanical engineering in undergrad and went into consulting, but hated it. She applied to Hacker School twice but didn’t get in. She took Udacity classes on programming and applied again, this time getting in. Through three months of intensive programming and software development at Hacker School, Neha learned more than many people learn in years. She quickly found a job as a software engineer at Rent the Runway and adores it.
I’m incredibly proud of Paola and Neha for reinventing themselves. Learning to program is not impossible; it just takes time and grit. And you can do it, too.
Next time you sit down to analyze some differential expression or calculate some statistics, force yourself to do it in Python, specifically pandas. Feel free to tweet at me with questions! My handle is @olgabot
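
To make that concrete, here is a minimal sketch of what a first pandas analysis might look like. The gene names, sample names, and expression values are all made up for illustration, and a mean-based log2 fold change is only a crude first look, not a real differential expression test (which would also need a statistical model and replicates).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: expression values for three made-up genes
# measured in two control samples and two treated samples.
expression = pd.DataFrame(
    {
        "control_1": [10.0, 200.0, 50.0],
        "control_2": [12.0, 180.0, 54.0],
        "treated_1": [40.0, 185.0, 12.0],
        "treated_2": [48.0, 195.0, 14.0],
    },
    index=["geneA", "geneB", "geneC"],
)

# Average each gene across the samples in each condition (row-wise mean).
control_mean = expression[["control_1", "control_2"]].mean(axis=1)
treated_mean = expression[["treated_1", "treated_2"]].mean(axis=1)

# log2 fold change: positive means higher in treated, negative means lower.
log2_fold_change = np.log2(treated_mean / control_mean)

# Collect everything into one table and sort from most down to most up.
summary = pd.DataFrame(
    {
        "control_mean": control_mean,
        "treated_mean": treated_mean,
        "log2_fold_change": log2_fold_change,
    }
).sort_values("log2_fold_change")

print(summary)
```

Once your real data is in a DataFrame like `expression`, the same few lines work whether you have three genes or thirty thousand.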
I forgot to mention that one of my mentors, Sean Eddy, studied molecular biology for his PhD, and only after he earned his PhD did he get into coding. And he’s made significant contributions to the field of computational biology through the sequence homology comparison program, HMMER, and RNA structure prediction through Infernal. If you can do it after a PhD, it’s truly never too late!
On the flip side, at both UC-Santa Cruz and UC-San Diego there are lifelong software engineers who have spent decades coding up programs, and have decided that they want to apply their programming skills to something else. So they go do a PhD in bioinformatics! Either way, from biology to coding, or from coding to biology, you end up in bioinformatics! Clearly the best field :)
Resources for learning to code
- Rosalind.info: Named after Rosalind Franklin, whose X-ray crystallography experiments helped Watson and Crick discover that DNA is a double helix, Rosalind.info is an awesome resource for bioinformatics-specific problems. You start by transcribing DNA to RNA and translating RNA to protein, then eventually you build your own genome assembler!
- try.github.io: Cool way to start learning code versioning (keeping backups of your code), which is ultra-important so you don’t have to try to remember what you did six months ago.
- Coursera: An online learning platform with tons of classes, including the awesome [Bioinformatics Algorithms](https://www.coursera.org/course/bioinformatics) class that I helped design. I’ve also taken the Probabilistic Graphical Models and Machine Learning courses. PGMs was good. ML was a little watered down for my math major self, but it’s a good intro overall.
- Udacity: Another online learning platform. They have tons of courses on Computer Science, Web Development, Design of Computer Programs (taught by Peter Norvig!!!), and many, many more.
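
To give a taste of the kind of starter problem Rosalind.info opens with (DNA to RNA to protein), here is a minimal sketch in plain Python. The function names `transcribe` and `translate` are my own choices for illustration, not Rosalind’s, and the sketch assumes the standard genetic code and a clean, in-frame sequence.

```python
from itertools import product

# Build the standard genetic code: the 64 codons in U, C, A, G order,
# paired with their one-letter amino acids ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Turn a DNA coding strand into RNA by swapping T for U."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Read an RNA string three letters at a time until a stop codon."""
    protein = []
    # Stop before any trailing one- or two-letter fragment.
    for start in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = transcribe("ATGGCCTAA")
print(rna)
print(translate(rna))
```

That is the whole central dogma in about thirty lines, which is roughly where a beginner can expect to be after the first few problems.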