Last May, I started my internship at Tumblr on the Core Scala team, which writes and maintains the backend Scala services at Tumblr. These backend services do a lot of the high-throughput work involved with communicating between the frontend of the website and the immense amount of data behind it. Tumblr has almost 200 million blogs and over 83 billion posts, with tons of data associated with each of them, so you can imagine that any service that has to deal with retrieving and formatting this information would need to be extremely robust and highly scalable. Before coming to Tumblr, I’d had some experience doing software engineering, but never on a website, nor anything even remotely close to this scale. I had taken courses that taught the concepts behind distributed systems, but applying those ideas to real-world services was an entirely new opportunity.
The first thing I worked on at Tumblr was a service that took in requests for unread post counts, unread inbox counts, and toast notifications, and returned the desired data as properly filtered and formatted JSON. These requests come in at slightly over 7,000 requests/second at normal peaks in traffic. Originally, this functionality was accomplished by a PHP file (poll.php, named that because the dashboard continually polls for this data) which utilized 36 machines in our datacenter to serve the requests. My project, eventually dubbed the “Pollscala” service since it was written in Scala, started out as a training exercise for me to learn many of the different technologies that come together in building a successful service. Eventually, Pollscala evolved from being just a training exercise into being a production service. Each instance of the new service can serve over 5,000 requests/second, and even assuming I wanted to keep a few more servers than necessary as backup (which we did, of course), I was able to decommission most of the old machines and shut them down. While the old service was near its threshold handling the peak level of requests, the new Scala service can theoretically handle somewhere around 25,000 requests/second. My work at Tumblr is helping to give the site room to grow with its users.
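For a rough feel of what those numbers mean, here is a back-of-the-envelope sketch in Scala. The three figures are the rounded ones quoted above; the object and function names are made up for illustration, and the instance counts it prints are my own arithmetic, not a description of how the service is actually deployed.

```scala
object CapacityEstimate {
  // Rounded figures from the post; the real values are "slightly over" / "over" these.
  val peakRps = 7000         // peak request rate
  val perInstanceRps = 5000  // throughput of one Pollscala instance
  val theoreticalRps = 25000 // stated theoretical ceiling of the new service

  // Smallest number of instances whose combined throughput covers a load
  // (integer ceiling division).
  def instancesNeeded(loadRps: Int, perInstance: Int): Int =
    (loadRps + perInstance - 1) / perInstance

  // How many times over the peak load a given capacity could absorb.
  def headroom(capacityRps: Int, loadRps: Int): Double =
    capacityRps.toDouble / loadRps

  def main(args: Array[String]): Unit = {
    println(instancesNeeded(peakRps, perInstanceRps))        // 2
    println(instancesNeeded(theoreticalRps, perInstanceRps)) // 5
    println(headroom(theoreticalRps, peakRps))
  }
}
```

In other words, two instances would be the bare minimum to cover peak traffic, and a ceiling of around 25,000 requests/second is what about five such instances would give you, which is roughly three and a half times the current peak.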
As my work on Pollscala progressed, I learned about Scala, about using tools such as Redis and Memcached, about using JSON, and about building RESTful web services, and I had the joy of exploring the complex codebase and ecosystem that keeps Tumblr running smoothly (most of the time!) for the gigantic set of users who love to browse Tumblr all day and all night. I took a brief trip into the land of Scala JSON libraries and built a tool to benchmark their performance on different shapes of JSON objects. I was even able to take a day-long journey out to the great state of New Jersey (my birthplace and childhood home) to see the Tumblr datacenter, a place that is an impressive feat of engineering and could survive the apocalypse and keep serving data.
(A brief aside on Scala: before coming to Tumblr, I didn’t know anything about Scala, besides being able to recognize that it was a programming language and not a mythical creature from Arthurian legend or something. As the tale goes, Scala was designed by a cool German dude named Martin Odersky who grew so frustrated when working on generics for Java that he decided to just build another, better language on the JVM. Scala is the beautiful love child of Java and Standard ML, and it has the best parts of both its parents. Scala has a strong static type system, is object oriented, and supports all of the cool shit we love from functional programming languages. And, if you feel like you’re missing that one awesome Java library you used in every project, don’t worry: any Java library can be used in Scala to your heart’s desire. Learning and playing with Scala was one of the highlights of my summer at Tumblr.)
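To make that aside a little more concrete, here is a tiny sketch that touches each of those features: static types with pattern matching, a plain old class, higher-order functions, and a Java library class used directly. Every name in it (`ScalaTour`, `Notification`, `RequestCounter`, and so on) is invented for this example and has nothing to do with the real Pollscala code.

```scala
import java.util.concurrent.atomic.AtomicLong // a plain Java class, used as-is

object ScalaTour {
  // Static types + pattern matching: because the trait is sealed,
  // the compiler can warn when a match misses a case.
  sealed trait Notification { def count: Int }
  final case class UnreadPosts(count: Int) extends Notification
  final case class UnreadInbox(count: Int) extends Notification

  def describe(n: Notification): String = n match {
    case UnreadPosts(c) => s"$c unread posts"
    case UnreadInbox(c) => s"$c unread inbox messages"
  }

  // Object oriented + Java interop: an ordinary class wrapping AtomicLong.
  class RequestCounter {
    private val served = new AtomicLong(0L)
    def record(): Long = served.incrementAndGet()
    def total: Long = served.get()
  }

  // Functional: filter and map, passing functions around as values.
  def summarize(ns: List[Notification]): List[String] =
    ns.filter(_.count > 0).map(describe)

  def main(args: Array[String]): Unit = {
    val counter = new RequestCounter
    val lines = summarize(List(UnreadPosts(12), UnreadInbox(0), UnreadInbox(4)))
    lines.foreach { line => counter.record(); println(line) }
    println(s"lines printed: ${counter.total}")
  }
}
```

Running `ScalaTour` prints the two non-empty notifications followed by `lines printed: 2`; the empty inbox count is filtered out before anything is formatted.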
The final project that I worked on was a build and deploy tool for Scala services at Tumblr. I contributed to the backend RESTful API portion of the tool (which we wrote in Scala), which controlled actions such as getting things out of and putting things into the database, as well as kicking off builds and deploys and tracking the progress of these processes through statuses and logs. I quickly learned the vital importance of having a good tool for doing builds, deploys, and service control (starts, restarts, stops) on our services.
Working at Tumblr has been educational, interesting, and above all, phenomenally fun. Coming in, I was daunted by all the technologies I would have to master in order to successfully contribute to the company, but I now walk away from this summer comfortable with an incredible new skill set. I’m grateful that my team put their faith in me and gave me the opportunity to grow as an engineer, and I’m grateful to have worked in such a high-energy and exciting office. Being a part of Tumblr, and becoming friends with some downright hilarious and chill people here, has been absolutely delightful. Fuck Yeah.