Dreaming in sync

Monday, 7 January 2013

If you ever get me talking about applications and synchronisation of data, you might notice that I'm very passionate about sync.

You'd be right. My favourite topic in college was distributed operating systems. Today, when I design a system, I always envision it as a set of cooperating processes, working together and in parallel for a common goal.

Over the past 15 years, I've kept refining a set of rules describing what I think are the ideal features any application that does sync should have.

I've narrowed them down to three.

1. Close your laptop and go

If you see a "sync now" button, they blew it.

To paraphrase Vince Lombardi:

Sync is not a sometime thing; it's an all the time thing. You don't sync once in a while, you don't do things right once in a while, you do them right all the time. Sync is a habit.

If you share application state with your co-workers (say for example a project management app, with issues, goals, notes, design documents, whatever), when you are sitting in your office, your own version of the application is running on your laptop. It has a copy of the state. Maybe not all of the state, maybe just the part that relates to you, but for now let's assume that it has all of the state.

When you close the laptop lid and move to another place, the information on your local copy should be the up-to-date version of the shared state up to the moment you lost network connectivity.

There should not be any "oh I'll just wait a bit and sync before I disconnect" thought crossing your mind. Sync should not be part of your thinking process. It should just be the normal world view, it should be subconscious.

One of the best outcomes of this always-in-sync subliminal state? Given that you'll always be seeing and manipulating the latest version of something, the opportunity for conflicts based on misinformation is rare.

2. Cherish thy conflicts

Which brings us to the big bad wolf of synchronisation: conflicts. This is the primary fear most people who think about implementing a sync solution share.

What I'm here to tell you is that conflicts are your friends. They are the tether that binds you to humanity, because they only come up when someone has a different world view from your own.

Having someone who disagrees with you is glorious! It gives you the opportunity to get out of your bubble and interact. And you get to choose how to do so. Maybe you just IM or mail him, or maybe you call him, or even share a few minutes of his time in a hall near a whiteboard.

Conflicts are nature's way of telling you that you need to get out a bit and talk to someone.

3. Be a packrat, collect it all

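The most important lesson I got from git was not the graph of objects on which it is built, but the fact that its object model never stores diffs. Every commit points to a complete version of the new state (packfiles may delta-compress objects on disk, but that is a storage optimisation you never see).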

The main advantage of having the entire state blob for each version is simple: you can always improve your conflict resolution, or your diff algorithm, or any part of the UX of both, because you have the raw data available. If you store diffs, you are stuck with whatever diff algorithm was state of the art at that moment, and going back to any version means replaying all the diffs since the last full version.

You might think that this is a huge waste of space, and it could be if you have big state blobs in which only a small percentage changes between versions, but I posit that this is not the most common case. In fact, big state blobs should ring warning bells in your head, and you should start breaking them up into smaller concepts.

No, you want small state blobs, connected together. They are faster to sync individually, and they provide a smaller surface for conflict resolution when those blissful events are cast upon you.
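To make the packrat idea concrete, here is a minimal sketch in Python, assuming an in-memory store and JSON-serialisable state. The names SnapshotStore, put, get and history are invented for this illustration; they are not git's API or anyone else's.

```python
import hashlib
import json


class SnapshotStore:
    """Keeps every version of every small blob as a complete snapshot."""

    def __init__(self):
        self.objects = {}   # SHA-1 hex digest -> full serialised state
        self.versions = {}  # blob id -> list of digests, oldest first

    def put(self, blob_id, state):
        # Serialise deterministically so equal states get the same digest.
        data = json.dumps(state, sort_keys=True).encode("utf-8")
        digest = hashlib.sha1(data).hexdigest()
        self.objects[digest] = data
        self.versions.setdefault(blob_id, []).append(digest)
        return digest

    def get(self, digest):
        # No diffs to replay: any version is a single lookup away.
        return json.loads(self.objects[digest])

    def history(self, blob_id):
        return [self.get(d) for d in self.versions.get(blob_id, [])]


store = SnapshotStore()
v1 = store.put("issue-42", {"title": "Sync is slow", "status": "open"})
v2 = store.put("issue-42", {"title": "Sync is slow", "status": "closed"})
```

Because both versions are kept whole, a better diff or merge UI can be written years later and pointed at store.history("issue-42") without any migration.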

And with small state blobs, when state changes, you should always store the complete new version.

Let's be about it

So you are hyped now, you want your next application to have sync. What is your next step?

The Dropbox generation

The sync experience of Dropbox is comparable to the automobile experience of a Model T. Or the sexual pleasure of masturbation. It gets the job done, barely, but you end up thinking that there should be more to life than this.

There should, and there is.

Dropbox is a fine product, I depend on it daily for a lot of things, but most of them are files.

To sync files between devices and across several operating systems, there aren't many other solutions out there with the same proven track record. So in recent times, a lot of applications that deal with files have added built-in Dropbox support as their solution for the sync problem. And it works fine, for files, and single user scenarios.

But when you start talking about application state, with multiple users, if you plan on using Dropbox, you'd better start modelling your data store as a series of independent files. And forget about data protection of any kind.

But it can be done. For the single user case, it is more than enough.

The best example I know (and use, and recommend by the way) is 1Password. A couple of versions back, they switched from a single file to a file bundle as their storage system. You can read all about the Agile keychain design, good stuff in there.

What they did was break a large state blob into several smaller ones.

But the Dropbox API lacks any way to be notified in real-time of changes others may have made to your files. Sure, the desktop client uses a private Dropbox API to be notified of new stuff, but that is not available to you.

So, if you plan on using Dropbox as the sync service for your app, remember that the always-in-sync scenario requires the Dropbox desktop client, and you have to monitor the filesystem yourself (fsevents on Mac OS X, inotify on Linux, kqueue on FreeBSD...) to detect changes.
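As a rough illustration of what "monitor the filesystem yourself" involves, here is a portable polling sketch in Python. It is a deliberately naive stand-in, not a wrapper around fsevents, inotify or kqueue, and the function names snapshot and changes are invented for this example.

```python
import os


def snapshot(root):
    """Map each file path under root to its (mtime_ns, size)."""
    seen = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # the file vanished between listing and stat
            seen[path] = (st.st_mtime_ns, st.st_size)
    return seen


def changes(before, after):
    """Compare two snapshots: what was added, removed or modified."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    modified = sorted(
        path for path in before.keys() & after.keys()
        if before[path] != after[path]
    )
    return added, removed, modified
```

Call snapshot on a timer and feed two consecutive results to changes. The native notification APIs tell you the same thing sooner and without rescanning the whole tree, which is why they are worth the extra work in a real app.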

But, but, but git is magical!

Yes it is. But only for some things. Files, source code, text articles.

If you plan on basing your sync solution on git, that's fine. It is a viable option now that libgit2 is stable.

Just remember that source code conflicts are easier to solve because they happen on something that has structure, something that can be validated, at the syntax level by your language compiler or interpreter, and at the semantic level by your test suite. You do have one of those, right?

There is no possible test suite that covers your application data semantics. You can have some business rules that must be checked before accepting changes to application state, but they will never be comprehensive.

Also, git lacks any way to notify you in real-time of new commits on remote branches, so you have to roll your own too.

Natives need not apply

You might think that all of this is my subtle way of pushing you from your friendly web into the evil, seductive embrace of native apps, but no.

Yes, I do believe that the vast majority of your app logic should be executed on the client side. It's not only my distributed systems engineer wet dream, but also your ecological duty not to waste all those processor cycles sitting on your lap or nicely wrapped in your hand.

I also believe that no matter how much bandwidth you have, how blazing fast your servers are, you can't beat the latency of a local app working on local data.

And besides, even if you have always-hopefully-on internet connectivity, isn't it better to hide all those wishful thoughts about "when the user presses this button I will have network access, my server will be up, and everything is peachy" in as small a part of your application as you can manage? If your entire app uses local data, and then, in the background, those changes are sent to the remote peers, the user experience is much faster.
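A minimal sketch of that shape in Python, assuming a key/value state and a send callable that raises OSError while the network is away. LocalFirstStore, send and retry_delay are names invented for this illustration.

```python
import queue
import threading
import time


class LocalFirstStore:
    """Writes hit local state at once; a background thread ships them out."""

    def __init__(self, send, retry_delay=1.0):
        self.state = {}
        self.outbox = queue.Queue()
        self._send = send              # hypothetical: pushes one change to the peers
        self._retry_delay = retry_delay
        threading.Thread(target=self._drain, daemon=True).start()

    def set(self, key, value):
        self.state[key] = value        # the UI sees this immediately
        self.outbox.put((key, value))  # the network catches up later

    def _drain(self):
        while True:
            change = self.outbox.get()
            while True:
                try:
                    self._send(change)
                    break
                except OSError:
                    # Offline or server down: wait and retry the same change,
                    # so changes still leave in the order they were made.
                    time.sleep(self._retry_delay)
            self.outbox.task_done()
```

All the wishful thinking about connectivity now lives in _drain and nowhere else. A real app would also persist the outbox to disk so pending changes survive a restart, which this sketch does not do.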

So keep your mad Web skills, they are your best bet in this brave new continent you're about to explore. Just think about bringing along a simple HTTPS server, sitting right there on your local device, holding a copy of your app, and your state.

Interactive sync

There is a crop of products that focus on real-time collaboration. Software like SubEthaEdit and, more recently, the ex-Google Wave.

They are very specific niches of the whole synchronisation topic, and I don't think they are a good solution for most apps. A bit of overkill.

Action!

Finally, my brain dump from having done some synchronisation work.

Divide and conquer. Start with a simple problem. Don't make sync a feature in a future version, plan it from the start. It is very hard to bolt sync on later, believe me.

Cherish immutable content, or single-owner content, like comments. No conflicts there. A post has a set of comments, this set is conflict free.
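Why is that set conflict free? A small Python sketch, assuming each comment has a unique id and never changes after it is written. merge_comments is a name made up for this example.

```python
def merge_comments(mine, theirs):
    """Union two replicas of a post's comments, keyed by comment id."""
    merged = dict(mine)
    for comment_id, comment in theirs.items():
        # Comments are immutable, so the same id always carries the same
        # content and there is nothing to reconcile.
        merged.setdefault(comment_id, comment)
    return merged


mine = {"c1": "First!", "c2": "Nice post."}
theirs = {"c2": "Nice post.", "c3": "What about clocks?"}
merged = merge_comments(mine, theirs)
```

The merge is a plain set union, so it gives the same result whichever replica goes first and however many times it is repeated.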

Embrace that conflicts are a human problem, not a technical one, and instead of wasting your time dreading them, focus on the best UX you can provide to make the work of two or more people involved easier to do: how to pick the new world view?

Start with a central meeting place, where all changes flow through. It is easier to do at first, just make sure this central place never accepts conflicted information. All conflicts must be solved on the client side before any changes are sent.
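One simple way to enforce that, sketched in Python under the assumption that the meeting place keeps a single ordered log: a change is accepted only if the client built it on top of the current head, much like a git remote refusing a non-fast-forward push. Hub, push, pull and ConflictError are names invented here.

```python
class ConflictError(Exception):
    """Raised when a change was made against a stale version."""


class Hub:
    """Central meeting place: accepts a change only if it builds on the head."""

    def __init__(self):
        self.log = []  # accepted changes, in order; version == len(self.log)

    def push(self, base_version, change):
        if base_version != len(self.log):
            # The client has not seen everything yet. It must pull, sort out
            # any conflict locally, and push again.
            raise ConflictError(
                f"based on version {base_version}, head is {len(self.log)}"
            )
        self.log.append(change)
        return len(self.log)

    def pull(self, since_version):
        return self.log[since_version:]
```

This is stricter than it needs to be, since it rejects every stale push and not only the ones that truly conflict, but it keeps the hub trivial and pushes all the thinking to the client.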

Start with "rolling sync", the simplest approach to merges: if you have been offline for a bit, first undo all your offline changes, apply all remote changes done since your last sync, and then roll your changes over that to catch any conflicts. If you know git, think rebase, not merge.
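Those three steps can be sketched in a few lines of Python, assuming state is a flat dictionary and every change is a (key, value) pair. rolling_sync is a hypothetical name, and deletions are left out to keep it short.

```python
def rolling_sync(base, local_changes, remote_changes):
    """Replay local changes on top of remote ones, rebase style.

    base is the state at the last sync; each change list holds (key, value)
    pairs in the order they were made.
    """
    state = dict(base)                     # 1. undo: back to the last sync
    for key, value in remote_changes:      # 2. apply what the peers did
        state[key] = value
    remote_keys = {key for key, _ in remote_changes}

    conflicts = []
    for key, value in local_changes:       # 3. roll our changes over that
        if key in remote_keys and state[key] != value:
            # Both sides touched this key and disagree: keep the remote
            # value for now and let a human pick the new world view.
            conflicts.append((key, state[key], value))
        else:
            state[key] = value
    return state, conflicts


base = {"title": "Draft", "owner": "ana"}
local = [("title", "Dreaming in sync"), ("status", "done")]
remote = [("title", "Sync dreams"), ("owner", "rui")]
state, conflicts = rolling_sync(base, local, remote)
```

Here state ends up with the remote title and owner plus the local status, and conflicts holds the single disputed title, ready to be shown to the user before anything is sent to the central place.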

Use UUIDs or the SHA-1 of the content as identifiers: there is no clear cut rule on which one to pick, it really depends on the situation.
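Both are one-liners with the Python standard library. The sketch below shows one possible reading of the trade-off, which is an assumption and not a rule: a random UUID for things that keep their identity while their content changes, a content hash for immutable blobs. The function names are invented.

```python
import hashlib
import uuid


def new_entity_id():
    """Random UUID: can be minted offline, stays put while content changes."""
    return str(uuid.uuid4())


def content_id(data: bytes):
    """SHA-1 of the bytes: identical content gets the same id on every device."""
    return hashlib.sha1(data).hexdigest()
```

Neither needs a round trip to a server to hand out the next id, which is what makes them usable while you are offline.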

Remember the truth about clocks: everybody has one, and each one keeps his own perfect time. If only they could agree on what that perfect time is...

Don't ask permission to collect all of what you can from your peers, just do it. Later, your user can always check why something was changed, and who did it.

It's a brave new world

Sync is not just a feature anymore, it must be something that is just there, unquestioned, subliminal.

Do your part, and start designing your apps with a "Sync first!" mentality.