Delayed Processing Roundup in Rails

While working on a Rails 3 application running on Heroku, I needed to process certain operations after a delay: wait, say, 3 minutes, then revisit the task. If certain criteria aren’t yet met, the task goes back in the queue for another 3-minute delay.
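
To make that requirement concrete, here is a minimal sketch of the "check, and if not ready, wait another 3 minutes" loop. It uses a toy in-memory queue rather than any of the gems discussed below, and every name in it (TinyQueue, CheckTask, DELAY_SECONDS) is made up for illustration.

```ruby
# Minimal in-memory stand-in for a job queue with a run_at timestamp.
# All names here (TinyQueue, CheckTask, DELAY_SECONDS) are hypothetical.
DELAY_SECONDS = 3 * 60

class TinyQueue
  Entry = Struct.new(:job, :run_at)

  def initialize
    @entries = []
  end

  # Schedule a job to become runnable at run_at.
  def enqueue(job, run_at)
    @entries << Entry.new(job, run_at)
  end

  # Run every job whose run_at has passed; return how many ran.
  def work_off(now)
    ready, @entries = @entries.partition { |e| e.run_at <= now }
    ready.each { |e| e.job.perform(self, now) }
    ready.size
  end

  def size
    @entries.size
  end
end

class CheckTask
  attr_reader :attempts

  # ready_check is any callable that returns true once the criteria are met.
  def initialize(ready_check)
    @ready_check = ready_check
    @attempts = 0
  end

  def perform(queue, now)
    @attempts += 1
    return if @ready_check.call(@attempts)

    # Criteria not met yet: put the task back in for another delay.
    queue.enqueue(self, now + DELAY_SECONDS)
  end
end

queue = TinyQueue.new
task  = CheckTask.new(->(attempt) { attempt >= 3 }) # pretend it is ready on the 3rd look
start = Time.at(0)
queue.enqueue(task, start + DELAY_SECONDS)

# Simulate a worker waking up once a minute for 10 minutes.
(0..10).each { |minute| queue.work_off(start + minute * 60) }
puts "attempts: #{task.attempts}, still queued: #{queue.size}"
# prints "attempts: 3, still queued: 0"
```

The important part is that something has to call work_off when the delay expires. Which process does that, and what it costs to keep it around, is what the rest of this post is about.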

I sought recommendations from my Ruby User Group and got some good suggestions. This post will explore some of those along with others I found. I’ll try to identify pros and cons of each as related to my project and needs. This isn’t an exhaustive analysis, but I hope this can be of some value to others.

First, I’ll disclose my requirements and motivations because they affect my bias and final decision. My preferred solution:

  • Doesn’t require a dedicated worker process. On Heroku, a dedicated worker runs on its own “Dyno”, and each Dyno that I’m running costs me money. I haven’t paid for my development hosting yet, and I’d rather not start now. :)
  • Can scale up as demand/usage scales.

DelayedJob

DelayedJob is a mature project with lots of real-world usage. It was extracted from a real system (Shopify) and has been used by many others.

From the project’s site:

Database based asynchronously priority queue system — Extracted from Shopify

Delayed_job (or DJ) encapsulates the common pattern of asynchronously executing longer tasks in the background.

It is a direct extraction from Shopify where the job table is responsible for a multitude of core tasks. Amongst those tasks are:

  • sending massive newsletters
  • image resizing
  • http downloads
  • updating smart collections
  • updating solr, our search server, after product changes
  • batch imports
  • spam checks

Pros

  • Mature, proven
  • Suitable for lots of purposes
  • Pure Ruby solution
  • Supported on Heroku
  • Can scale up to more worker processes

Cons

  • Requires a dedicated, separate worker process that runs continuously
  • Each worker checks the database every 5 seconds (even when there are no jobs, which burns processor time)
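
To put a rough number on that second con, here is a back-of-the-envelope sketch. It assumes only the 5-second interval mentioned above; the constant and method names are mine.

```ruby
# Back-of-the-envelope polling cost, assuming the 5-second interval above.
POLL_INTERVAL_SECONDS = 5
SECONDS_PER_DAY = 24 * 60 * 60

# Number of "any jobs for me?" queries per day for a given worker count.
def polls_per_day(workers, interval = POLL_INTERVAL_SECONDS)
  workers * (SECONDS_PER_DAY / interval)
end

puts polls_per_day(1) # => 17280
puts polls_per_day(3) # => 51840
```

For an application that only has a handful of jobs a day, nearly all of those queries come back empty.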

Heroku Delayed Job Autoscale

The "Heroku Delayed Job Autoscale" gem doesn’t have a catchy name, but it is at least descriptive.

The github project says:

Simply put, this gem saves you money on Heroku by only running the workers when you need them. When a new job is enqueued, this gem will fire up a new worker instance if none are running. When the job finishes, it’ll shut it down.

Save me money?! That sounds pretty good to me! That’s one of my primary goals with this project.

At the time of writing, the GitHub project hasn’t been updated in about 4 months and has 13 watchers and 3 forks. Hmm. Pretty quiet.

Is there anything better, newer, or more widely used? I’m looking for sustainability and more maturity.

HireFire

HireFire is a gem very much like the “Heroku Delayed Job Autoscale” gem. As of this writing, the GitHub project had been updated within the past week and has 273 watchers and 12 forks. A good start for “community”. A big factor for me was seeing that the project has accepted a number of pull requests. I hate submitting changes and hearing nothing… ever. A responsive project is a better project.

How do they describe the project?

HireFire automatically “hires” and “fires” (aka “scales”) Delayed Job and Resque workers on Heroku. When there are no queue jobs, HireFire will fire (shut down) all workers. If there are queued jobs, then it’ll hire (spin up) workers. The amount of workers that get hired depends on the amount of queued jobs (the ratio can be configured by you). HireFire is great for both high, mid and low traffic applications. It can save you a lot of money by only hiring workers when there are pending jobs, and then firing them again once all the jobs have been processed. It’s also capable to dramatically reducing processing time by automatically hiring more workers when the queue size increases.

Sounds good so far. The low-traffic example seems to perfectly fit my needs.

Low traffic example: say we have a small application that doesn’t process for more than 2 hours in the background a month. Meanwhile, your worker is basically just idle the rest of the 718 hours in that month. Keeping that idle worker running costs $36/month ($0.05/hour). But, for the resources you’re actually making use of (2 hours a month), you should be paying $0.10/month, not $36/month. This is what HireFire is for.
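
The arithmetic in that example is easy to check. A quick sketch, using only the figures quoted above ($0.05 per worker-hour and a 720-hour month); the constant and method names are mine:

```ruby
# Figures come from the quoted example: $0.05 per worker-hour, a 720-hour month.
HOURLY_RATE_CENTS = 5
HOURS_IN_MONTH    = 720 # 30 days * 24 hours

# Cost in dollars of running one worker for the given number of hours.
def worker_cost(hours)
  hours * HOURLY_RATE_CENTS / 100.0
end

always_on = worker_cost(HOURS_IN_MONTH)
on_demand = worker_cost(2)
puts format("always on: $%.2f, on demand: $%.2f, saved: $%.2f",
            always_on, on_demand, always_on - on_demand)
# prints "always on: $36.00, on demand: $0.10, saved: $35.90"
```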

It scales up smoothly for higher demand too? Excellent! I was also impressed that it works with Resque/Redis, as those projects interest me as a possible future scaling path. The configuration for hiring more workers under higher demand is elegant: worker counts can be defined declaratively in an initializer, or with lambdas that allow for computation.
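
To illustrate the difference between the two styles, here is a rough sketch of how a jobs-to-workers ratio could be resolved. This is not HireFire’s code or its configuration API. The table shapes, the workers_for method and the max_workers cap are all hypothetical, chosen just to show the idea of "declarative thresholds" versus "lambdas".

```ruby
# Hypothetical ratio tables; the names and shapes are illustrative only.
DECLARATIVE_RATIO = [
  { jobs: 1,  workers: 1 },
  { jobs: 15, workers: 2 },
  { jobs: 35, workers: 3 }
].freeze

COMPUTED_RATIO = [
  { when: ->(jobs) { jobs < 15 }, workers: 1 },
  { when: ->(jobs) { jobs < 35 }, workers: 2 },
  { when: ->(_jobs) { true },     workers: 3 }
].freeze

# Pick a worker count for the current queue size.
def workers_for(queued_jobs, ratio, max_workers: 3)
  return 0 if queued_jobs.zero? # empty queue: fire everyone

  rule =
    if ratio.first.key?(:when)
      # Computed style: first rule whose lambda accepts the queue size.
      ratio.find { |r| r[:when].call(queued_jobs) }
    else
      # Declarative style: highest threshold the queue size has reached.
      ratio.select { |r| queued_jobs >= r[:jobs] }.max_by { |r| r[:jobs] }
    end

  [rule ? rule[:workers] : 1, max_workers].min
end

[0, 1, 20, 100].each do |n|
  puts "#{n} jobs -> #{workers_for(n, DECLARATIVE_RATIO)} / #{workers_for(n, COMPUTED_RATIO)} workers"
end
```

Both tables give the same answers here (0, 1, 2 and 3 workers). The lambda form simply leaves room for rules that a fixed threshold can’t express.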

The problem with HireFire in my scenario is work scheduled for some point in the future. From the HireFire GitHub page:

  • Question: With Delayed Job you can set the :run_at to a time in the future.

    • Answer: Unfortunately since we cannot spawn a monitoring process on the Heroku platform, HireFire will not hire workers until a job gets queued. This means that if you set the :run_at time a few minutes in the future, and these few minutes pass, the job will not be processed until a new job gets queued which triggers the chain of events. (Best to avoid using run_at with Delayed Job when using HireFire unless you have a mid-high traffic web application in which cause HireFire gets triggered enough times)
  • Question: If a job is set to run at a time in the future, will workers remain hired to wait for this job to be “processable”?

    • Answer: No, because if you enqueue a job to run 3 hours from the time it was enqueued, you might have workers doing nothing the coming 3 hours. Best to avoid scheduling jobs to be processed in the future.
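
Those two answers are easier to see in a toy model. The sketch below is not HireFire’s implementation. ToyScaler is a hypothetical class that only mirrors the behavior described above: a worker is hired when something is enqueued, handles whatever is runnable at that moment, and is then fired.

```ruby
# Toy model of "hire on enqueue, fire when nothing is runnable".
# Everything here is hypothetical; it only mirrors the behavior quoted above.
class ToyScaler
  attr_reader :processed

  def initialize
    @pending = []   # [name, run_at] pairs
    @processed = []
  end

  # Enqueueing is the only event that hires a worker.
  def enqueue(name, run_at, now)
    @pending << [name, run_at]
    run_worker(now)
  end

  def pending
    @pending.map(&:first)
  end

  private

  # The hired worker handles whatever is runnable right now, then is fired.
  def run_worker(now)
    ready, @pending = @pending.partition { |_, run_at| run_at <= now }
    @processed.concat(ready.map(&:first))
  end
end

scaler = ToyScaler.new
scaler.enqueue("recheck", 180, 0)         # at t=0, scheduled for t=180
puts scaler.pending.inspect               # ["recheck"]: hired, nothing runnable, fired

# t=180 passes with no traffic: nothing wakes a worker, so the job just sits there.

scaler.enqueue("welcome_email", 900, 900) # the next enqueue happens at t=900
puts scaler.processed.inspect             # ["recheck", "welcome_email"]
```

In this run the "recheck" job was due at 3 minutes but didn’t run until the 15-minute mark, when an unrelated job happened to be enqueued.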

HireFire coupled with DelayedJob seems to resolve the cons I had with DelayedJob. But HireFire can’t handle my goal of processing a task 3 minutes from now. Since HireFire is my best solution so far, I may have to rethink my requirements. Hmm.

Cron

Another option for background or scheduled work is to use cron. Some action might trigger the launching of a new cron job to perform some work. Cron is very reliable and easy to use. The jobs it runs do whatever your script says, so you have full control of execution.

With Ruby, the whenever gem makes it easier to create the jobs. Railscast #164 covers cron in Ruby as well. It is simple to use and can be refactored out when needed.

On Heroku, cron is available as an addon. As of this writing, it is free for a once-a-day job or $3 for an hourly job. This is too infrequent for my needs. For a self-hosted solution, this may be a better fit.

Cron on Heroku doesn’t support my needs; it isn’t intended for this purpose.

Resque

From the gem’s README page:

Resque is a Redis-backed Ruby library for creating background jobs, placing those jobs on multiple queues, and processing them later.

Resque is heavily inspired by DelayedJob (which rocks) and comprises three parts:

  1. A Ruby library for creating, querying, and processing jobs
  2. A Rake task for starting a worker which processes jobs
  3. A Sinatra app for monitoring queues, jobs, and workers.

Resque workers can be distributed between multiple machines, support priorities, are resilient to memory bloat / “leaks,” are optimized for REE (but work on MRI and JRuby), tell you what they’re doing, and expect failure.

Resque queues are persistent; support constant time, atomic push and pop (thanks to Redis); provide visibility into their contents; and store jobs as simple JSON packages.

Resque requires Redis (a key-value store). Redis works as a Heroku Addon as well. I really like how it doesn’t store data in my database but uses JSON-formatted data in Redis. That’s pretty cool. The Resque README already put together a little list of Resque vs. DelayedJob for us, so I’ll include that here…

Resque vs DelayedJob

How does Resque compare to DelayedJob, and why would you choose one over the other?

  • Resque supports multiple queues
  • DelayedJob supports finer grained priorities
  • Resque workers are resilient to memory leaks / bloat
  • DelayedJob workers are extremely simple and easy to modify
  • Resque requires Redis
  • DelayedJob requires ActiveRecord
  • Resque can only place JSONable Ruby objects on a queue as arguments
  • DelayedJob can place any Ruby object on its queue as arguments
  • Resque includes a Sinatra app for monitoring what’s going on
  • DelayedJob can be queried from within your Rails app if you want to add an interface

If you’re doing Rails development, you already have a database and ActiveRecord. DelayedJob is super easy to setup and works great. GitHub used it for many months to process almost 200 million jobs.

Choose Resque if:

  • You need multiple queues
  • You don’t care / dislike numeric priorities
  • You don’t need to persist every Ruby object ever
  • You have potentially huge queues
  • You want to see what’s going on
  • You expect a lot of failure / chaos
  • You can setup Redis
  • You’re not running short on RAM

Choose DelayedJob if:

  • You like numeric priorities
  • You’re not doing a gigantic amount of jobs each day
  • Your queue stays small and nimble
  • There is not a lot failure / chaos
  • You want to easily throw anything on the queue
  • You don’t want to setup Redis

In no way is Resque a “better” DelayedJob, so make sure you pick the tool that’s best for your app.
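
The “JSONable” point in that comparison is worth a quick illustration. The sketch below uses only Ruby’s standard json library. The payload shape (a class name plus an args array) is modeled on how I understand the Resque README to describe its jobs, and the encode_job and decode_job helpers are hypothetical names of mine, not part of Resque.

```ruby
require "json"

# Serialize a job the way a JSON-backed queue would: class name plus arguments.
def encode_job(class_name, *args)
  JSON.generate("class" => class_name, "args" => args)
end

def decode_job(payload)
  JSON.parse(payload)
end

payload = encode_job("CheckTask", 42, :pending)
puts payload                            # {"class":"CheckTask","args":[42,"pending"]}
puts decode_job(payload)["args"].inspect # [42, "pending"]
```

Notice that the :pending symbol comes back as the string "pending". Anything richer than numbers, strings, arrays and hashes has to be passed as an id or rebuilt by the job itself, which is the trade-off against throwing any Ruby object on the queue.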

I’m not already using Redis on my Heroku account. If I were using Redis, this would be a more compelling solution. I expect my jobs to be few and quickly processed. The most important aspect for me is the scheduled delay. However, when scaling my project later, this could be a good solution to re-evaluate.

DelayedJob with HireFire fits better for now.

Girl Friday

Girl_friday is a gem for asynchronous processing that works differently than the other options I covered. This blog post announces the gem and covers usage and explanation.

From the github README:

Unlike delayed_job, girl_friday runs inside of the same process as your Rails application, alleviating the need to execute multiple external application instances. It also integrates with 3rd party error tracking services and provides hooks for callbacks and monitoring.

Girl_friday runs in the same process space but uses threads to run independently of the main application (or request threads). It appears to be a newer project (i.e., not as mature).

From the gem page:

We recommend using JRuby 1.6+ or Rubinius 2.0+ with girl_friday. Both are excellent options for executing Ruby these days.

gem install girl_friday

girl_friday does not support Ruby 1.8 (MRI) because of its poor threading support. Ruby 1.9 will work reasonably well if you use gems that release the GIL for network I/O (mysql2 is a good example of this, do not use the original mysql gem).

From the wiki page:

Job Persistence

By default, girl_friday just persists jobs to memory but I’m guessing you probably don’t want to lose any queued work if you restart your app server instances. To prevent this, girl_friday supports job persistence to a Redis server.

I really like the idea of running separate threads in the same application instance. It seems like it would be lighter weight by not running a separate application environment in another process.
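
To show what I mean by “separate threads in the same application instance”, here is a minimal sketch built on Ruby’s standard Queue and Thread classes. It is not girl_friday’s API or implementation. InProcessQueue is a made-up class that only demonstrates the general pattern.

```ruby
require "thread" # Queue lives here on older Rubies; harmless on newer ones

# A tiny in-process work queue built on Ruby's standard Queue and Thread.
# InProcessQueue is a hypothetical name, not girl_friday's API.
class InProcessQueue
  def initialize(size: 2, &processor)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop blocks until work arrives; :stop ends the thread.
        while (msg = @jobs.pop) != :stop
          processor.call(msg)
        end
      end
    end
  end

  def push(msg)
    @jobs << msg
  end

  # Let queued work drain, then stop the worker threads.
  def shutdown
    @workers.size.times { @jobs << :stop }
    @workers.each(&:join)
  end
end

results = Queue.new
queue = InProcessQueue.new(size: 2) { |n| results << n * n }
[1, 2, 3, 4].each { |n| queue.push(n) } # returns immediately, like a request thread would
queue.shutdown
puts results.size # => 4
```

Like the default described in the wiki quote above, this sketch keeps its jobs only in memory, so anything still queued when the process restarts is lost.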

However, this has the same Redis issues for me as does Resque. Girl_friday also seems better suited to just doing background jobs, not so much scheduled delayed jobs (my need).

Girl_friday is an interesting project whose architecture requirements don’t match my project as well as some others. I’ll be curious to see how it develops in the future.

Conclusion

The best fit for my needs was HireFire with DelayedJob. I chose this solution because:

  1. DelayedJob is really easy to use
  2. DelayedJob is mature and battle tested
  3. HireFire solves the “cons” I had with DelayedJob
  4. HireFire saves money for my Heroku project in low-use mode
  5. HireFire scales up elegantly to allow for additional workers to be started at peak times (if I want)
  6. HireFire works with Resque as well, so I have a migration and scaling path available

There is a remaining gap where HireFire, by its nature, can’t process a job 3 minutes from now. I may have to change my design/requirements.

I hope this information helps you determine the best solution for your needs and your project.

UPDATED: 5/11/2011 - Added information about HireFire not being able to use delayed_job’s :run_at feature.