Attaching jobs in Gearman

I've used Gearman on and off in the past, but for a new project I've decided to explore some features I rarely made use of before. Most notably, the unique ID that clients can submit with each job.

Let me just clarify some behaviors for non-background jobs:

  • the worker will execute the job until it finishes even if the client that submitted it dies;
  • if the client dies before the job is passed on to a worker, it will be removed and will never execute.

For background jobs, the behavior is different: a client submits a background job, the gearmand daemon queues this (on stable storage if you use the persistent queues of the latest versions of the C implementation), and sends it to a worker as soon as one is available.
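As a quick illustration with the gearman command line client (the resize function and the payload are made up for this example, and I'm assuming your client has the -b switch for background jobs):

gearman -b -f resize -u photo-123 /path/to/photo.jpg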

This is basic stuff, I'm just making sure we are all on the same page regarding Gearman behavior.

Back to the unique parameter. You can define a unique key for your jobs. These keys are unique per function, so you can use the same key with different functions and they will be two different jobs.
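For example (hypothetical function names), these two submissions share the key alpha but target different functions, so gearmand keeps them as two independent jobs:

gearman -f resize -u alpha -s
gearman -f encode -u alpha -s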

The fun part with unique is that you can have multiple clients listening to status and completion events of the same job if you share the key between them. Say that you start a job with the key alpha. If another client submits a job with the same key and (this is important) the job is still running, this second client will attach itself to the same job.

To test this I wrote a small shell script worker (save it as slow.sh and make it executable with chmod 755 slow.sh):

#!/bin/sh

# print the countdown to STDERR, so it shows up on the worker terminal
# and is not sent back as part of the job result
echo "Starting a slow worker..." >&2
for i in 5 4 3 2 1 ; do
  echo $i >&2
  sleep 1
done

# whatever goes to STDOUT is returned to the client as the result;
# $$ is the worker process ID
echo done $$

This worker does nothing except print a countdown to STDERR, and then send the client a done string with the worker's process ID. But it takes 5 seconds to run, which gives us some time to switch between windows.

After you start your gearmand server (I used gearmand -vv just to get some debug output), you can start a worker process like this:

gearman -w -f slow ./slow.sh

This worker registers the slow function and will execute slow.sh once per job.

Now open two new terminal windows and type on each one:

gearman -f slow -u alpha -s

This will submit a non-background job for the slow function, with the unique key alpha. The -s means that we won't be sending any data.

You'll see one execution of the worker, and then both clients will output something like this:

done 57998

You can experiment: attach the second client only halfway through the worker run, or stop the worker, start both clients, and then start the worker. The result is the same: both clients receive the output of the same job.
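For example, the "clients first, worker later" variation looks like this (three terminal windows; the worker PID in the output will differ):

# terminals 1 and 2: attach before any worker exists; both will block
gearman -f slow -u alpha -s

# terminal 3: only now start the worker; the queued job runs once
gearman -w -f slow ./slow.sh

# when the countdown ends, terminals 1 and 2 both print the single result:
done <worker-pid>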

This is a very very cool feature, and can be used easily for slow processing inside a web request.

First, if a web request requires a slow processing phase, we store all the relevant data and submit a background job to do the processing with some random key. The user receives a "Processing" page in their browser that includes this key.

A small JavaScript program using AJAX connects to our long-polling server and submits a non-background job to the same function, using the key. This second client request will now wait until the processing is done, and can even receive status updates from the worker and send them back to the browser.
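Here is a rough sketch of those two halves using just the gearman command line client (the key generation is only illustrative, a real setup would use a Gearman client library inside the web application and the long-polling server, and I'm assuming the client accepts the workload as a command line argument):

# web request side: generate a random key and queue the background job
KEY=$(head -c 8 /dev/urandom | od -An -tx1 | tr -d ' \n')
gearman -b -f slow -u "$KEY" "all the relevant data"
# ... render the "Processing" page with $KEY embedded in it ...

# long-polling server side: attach with the same key and block until done
gearman -f slow -u "$KEY" -s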

This attach-to-job-using-key is very reliable. I've tested several combinations: with and without workers running, strange orders of job submission, adding multiple clients to the same job, and all of them work as expected.

The only real problem with this is that attaching clients need to use the same API that is used to submit a job. Instead of a dedicated API, ATTACH_JOB for example, you use the same SUBMIT_JOB API that you use for new jobs. This works fine until your second client attaches with a key for a job that has already ended: the gearmand server will fail to find it, and will dutifully create a new job.

My current workaround for this is to send a specific payload on the attach requests, to signal the worker that this is an attach request and not a new job. The worker, if it detects this signal, just ends the processing. For example, if your jobs require payload data, you can use an empty data field as the flag.
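As a sketch, here is what slow.sh could look like with that flag (this assumes the gearman client in worker mode feeds the job workload to the command on STDIN, and that real jobs always carry a non-empty payload):

#!/bin/sh

# the job workload arrives on STDIN
payload=$(cat)

# an empty payload is our "this is just an attach" flag: the original job
# is already gone, so there is nothing to do
if [ -z "$payload" ]; then
  exit 0
fi

# ... the real, slow processing of $payload would go here ...
echo done $$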

This is not optimal, of course; you would be waking up workers just to have them return immediately, but it works.

If you have multiple gearmand servers, you need to make sure that the clients that will be attaching to a job use the same server. A solution would be to pass the server identification (the IP, or even better a hash of the IP) along with the key.
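A possible sketch, assuming your gearman client lets you pick the job server with -h (the token format here is made up):

# token handed to the browser: the server that got the job, plus the key
TOKEN="$SERVER:$KEY"

# the attaching client splits the token and talks to that same server
SERVER=${TOKEN%%:*}
KEY=${TOKEN#*:}
gearman -h "$SERVER" -f slow -u "$KEY" -s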

The traditional way of doing this is to have the workers store the completed result in a database somewhere. The solution presented here does not replace that; it only provides a very lightweight completion notification for background jobs. It beats polling the database to see if the background job has completed.