Building simplicidade.org: notes, projects, and occasional rants

Android, XMPP and EXI

Last week there was some chatter about Google's decision to rename the XMPP API in Android to the GTalk API, and to drop the XML-based wire protocol in favor of a binary one.

The reason that the Android team gave for this was: XML is too verbose.

First, let me state that I agree the API should be renamed to GTalk API if it's no longer a compatible XMPP API. This keeps the name free for a future, real XMPP API, so in this limited sense the rename makes total sense.

Second, "XML is too verbose" seems to me a quick and dirty excuse, and I would prefer to hear a real reason. A couple come to mind:

  • XMPP is too heavy, in terms of CPU and/or battery consumption, for the levels we are targeting (we have XML parsing, optional zlib compression, and TLS encryption);
  • XMPP is too complex and we only need a couple of small things;
  • We want to tie this to our own Google Talk service.

I don't know enough about mobile platforms to say whether CPU and/or battery usage is an issue, but I wouldn't be surprised if it were. On the other hand, if the new binary protocol doesn't offer some sort of encryption, it won't really be an option, right?

But the most interesting discussion this change will trigger is about the XMPP wire-level protocol.

The upper layers of XMPP should be XML. Our specs are written in XML and I see only advantages in doing so. But this does not mean that the wire protocol must remain XML. We must understand that XML is just one of the possible representations of the XMPP protocol. It's the one we chose for our documentation because it's the one we humans find easiest to read and understand. But that's it. In the end, machines will read, parse, and interpret the XMPP protocol, and the wire representation should be the most efficient one for the machine. Having an XML console is nice, and it will always be there, even if we change the wire representation.
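To make the point about representations concrete, here is a toy sketch (assuming Python 3.9+) that serializes a namespace-free stanza into a compact binary form using a hypothetical shared name dictionary, then decodes it back to the identical XML. It is not EXI or any proposed XMPP encoding; `NAMES`, `encode`, and `decode` are made up for illustration, and the sketch ignores namespaces and mixed content.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical dictionary of element/attribute names shared by both ends.
# Real binary encodings such as EXI work very differently.
NAMES = ["message", "body", "to", "from", "type", "id", "presence", "show", "status"]
CODE = {name: i for i, name in enumerate(NAMES)}


def _put_str(out: bytearray, s: str) -> None:
    # Length-prefixed (2-byte, big-endian) UTF-8 string.
    data = s.encode("utf-8")
    out += struct.pack(">H", len(data)) + data


def _get_str(buf: bytes, pos: int) -> tuple[str, int]:
    (length,) = struct.unpack_from(">H", buf, pos)
    start = pos + 2
    return buf[start:start + length].decode("utf-8"), start + length


def encode(elem: ET.Element, out: bytearray) -> None:
    """Append a compact binary form of `elem` (no namespaces, no mixed content)."""
    out.append(CODE[elem.tag])        # 1 byte: element name code
    out.append(len(elem.attrib))      # 1 byte: attribute count
    for key, value in elem.attrib.items():
        out.append(CODE[key])         # 1 byte: attribute name code
        _put_str(out, value)
    _put_str(out, elem.text or "")    # element text ("" if none)
    out.append(len(elem))             # 1 byte: child count
    for child in elem:
        encode(child, out)


def decode(buf: bytes, pos: int = 0) -> tuple[ET.Element, int]:
    """Inverse of encode(); returns the element and the next read position."""
    elem = ET.Element(NAMES[buf[pos]])
    n_attrs = buf[pos + 1]
    pos += 2
    for _ in range(n_attrs):
        key = NAMES[buf[pos]]
        value, pos = _get_str(buf, pos + 1)
        elem.set(key, value)
    text, pos = _get_str(buf, pos)
    elem.text = text or None
    n_children = buf[pos]
    pos += 1
    for _ in range(n_children):
        child, pos = decode(buf, pos)
        elem.append(child)
    return elem, pos


stanza = '<message to="juliet@example.com" type="chat"><body>Hi!</body></message>'
wire = bytearray()
encode(ET.fromstring(stanza), wire)
restored, _ = decode(bytes(wire))
print(len(stanza), len(wire), ET.tostring(restored, encoding="unicode") == stanza)
```

Most of the savings here come from replacing repeated element and attribute names with one-byte codes agreed on in advance. Because the XML text can always be regenerated from the binary form, an XML console remains possible.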

There are alternatives to a human-readable, UTF-8 encoding of XML, and one of the most likely to succeed is Efficient XML Interchange (EXI). As Robert points out, it remains to be proven that EXI (or any other binary encoding applied to XMPP) is more efficient than the current XML-over-zlib-over-TLS.
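Robert's question can be explored with a quick, non-authoritative experiment. The sketch below (assuming Python 3.9+ and only the standard `zlib` module) compresses a series of hypothetical presence stanzas on a single long-lived zlib stream, flushing after each stanza so the receiver could decode it right away, and reports how many compressed bytes each stanza costs. It leaves TLS out entirely, and repeating an identical stanza is a best case for zlib, so treat it as an illustration rather than a benchmark.

```python
import zlib

# A sample presence stanza with hypothetical addresses, repeated as if the
# same kind of stanza kept flowing over one long-lived stream.
PRESENCE = (
    "<presence from='romeo@example.net/orchard' to='juliet@example.com'>"
    "<show>away</show><status>In the orchard</status></presence>"
)


def per_stanza_compressed_sizes(stanzas: list[str]) -> list[int]:
    """Compress all stanzas with ONE zlib stream, sync-flushing after each one
    so it can be decoded immediately; return the bytes emitted per stanza."""
    comp = zlib.compressobj()
    sizes = []
    for s in stanzas:
        chunk = comp.compress(s.encode("utf-8")) + comp.flush(zlib.Z_SYNC_FLUSH)
        sizes.append(len(chunk))
    return sizes


sizes = per_stanza_compressed_sizes([PRESENCE] * 5)
print(len(PRESENCE), sizes)
```

Because the compression window persists across the whole stream, later stanzas that resemble earlier ones typically shrink to a handful of bytes. A binary encoding has to beat that, not raw XML.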

But, doing justice to the X in XMPP, we can extend XMPP to support these alternative encodings using an existing standard, XEP-0138 (Stream Compression). There is already a proposal for an EXI encoding for XMPP built on that same XEP, so experimenting with different encodings can be done within the standard, and that is good.
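The XEP-0138 negotiation is small enough to sketch. Per XEP-0138, the server lists its supported methods in a `<compression/>` element inside `<stream:features/>`, and the client requests one with `<compress/>`. The sketch below (assuming Python 3.9+, standard library only) parses the offered methods, picks the first one in the client's preference order, and builds the request. The method name `exi` is a placeholder for illustration, not a registered method name, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

FEATURES_NS = "http://jabber.org/features/compress"  # advertised in <stream:features>
PROTOCOL_NS = "http://jabber.org/protocol/compress"  # used by the <compress/> request


def offered_methods(features_xml: str) -> list[str]:
    """Return the compression methods a server advertises, in its order."""
    features = ET.fromstring(features_xml)
    compression = features.find(f"{{{FEATURES_NS}}}compression")
    if compression is None:
        return []
    return [m.text for m in compression.findall(f"{{{FEATURES_NS}}}method")]


def choose_method(offered: list[str], preferred: list[str]) -> Optional[str]:
    """Pick the first method in the client's preference order that is offered."""
    for method in preferred:
        if method in offered:
            return method
    return None


def compress_request(method: str) -> str:
    """Build the client's <compress/> element asking for `method`."""
    req = ET.Element(f"{{{PROTOCOL_NS}}}compress")
    ET.SubElement(req, f"{{{PROTOCOL_NS}}}method").text = method
    return ET.tostring(req, encoding="unicode", default_namespace=PROTOCOL_NS)


# Hypothetical server features offering an experimental "exi" method plus zlib.
FEATURES = (
    "<stream:features xmlns:stream='http://etherx.jabber.org/streams'>"
    "<compression xmlns='http://jabber.org/features/compress'>"
    "<method>exi</method><method>zlib</method>"
    "</compression></stream:features>"
)
offered = offered_methods(FEATURES)  # ['exi', 'zlib']
method = choose_method(offered, ["exi", "zlib"])
print(compress_request(method))
```

Per XEP-0138, if the server accepts it answers with `<compressed/>` and both sides restart the stream over the new encoding; a client that gets a failure simply carries on with the stream as it was.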


So we don't have to use XML as the wire representation to remain XMPP compatible. The next question is: are the problems with XMPP on mobile platforms related to the XML representation, or to the protocol itself?

Both Robert and the people participating in the discussion on the standards mailing list (several threads, but this is probably the most interesting one, especially after the change of subject) seem to think that the main issue is the very verbose nature of the protocol, especially with <presence> stanzas.
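Part of the presence problem is multiplication: a single status change is delivered as a separate stanza to every contact who should see it. The sketch below (a hypothetical helper, assuming Python 3.9+) builds those per-contact stanzas for an assumed roster of 200 contacts and totals their size. Real servers only deliver to subscribed, available contacts and often attach extra payload, so the numbers are illustrative only.

```python
from xml.sax.saxutils import escape, quoteattr


def presence_fanout(sender: str, contacts: list[str], show: str, status: str) -> list[str]:
    """Hypothetical helper: one presence change by `sender` becomes one
    outbound stanza per contact, each addressed to that contact."""
    body = f"<show>{escape(show)}</show><status>{escape(status)}</status>"
    return [
        f"<presence from={quoteattr(sender)} to={quoteattr(c)}>{body}</presence>"
        for c in contacts
    ]


contacts = [f"friend{i}@example.org" for i in range(200)]  # hypothetical roster
stanzas = presence_fanout("romeo@example.net/orchard", contacts, "away", "In the orchard")
total = sum(len(s.encode("utf-8")) for s in stanzas)
print(len(stanzas), total)
```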

Also mentioned is the fact that XMPP is usually implemented over a TCP connection, and we don't have a good, quick, and efficient way to recover a session after the TCP connection drops.
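One general way to recover a session after a dropped TCP connection is to number outbound stanzas and keep the unacknowledged ones around for replay. The sketch below (Python) shows only that bookkeeping; `ResumableSender` and its methods are hypothetical, and this is not the wire format of any particular XEP.

```python
from collections import deque


class ResumableSender:
    """Hypothetical sketch: number every outbound stanza and keep it until the
    peer acknowledges it, so that after a dropped connection only the
    unacknowledged tail has to be resent."""

    def __init__(self) -> None:
        self.sent = 0                  # count of stanzas sent so far
        self.unacked: deque = deque()  # (sequence number, stanza) pairs

    def send(self, stanza: str) -> int:
        self.sent += 1
        self.unacked.append((self.sent, stanza))
        return self.sent

    def ack(self, handled: int) -> None:
        """Peer confirms it handled every stanza up to and including `handled`."""
        while self.unacked and self.unacked[0][0] <= handled:
            self.unacked.popleft()

    def resume(self, handled: int) -> list[str]:
        """On reconnect, the peer reports how far it got; return what to resend."""
        self.ack(handled)
        return [stanza for _, stanza in self.unacked]


s = ResumableSender()
for i in range(1, 6):
    s.send(f"<message id='m{i}'/>")
s.ack(2)  # periodic acknowledgement before the drop
# ...the TCP connection drops; on reconnect the peer says it handled 3 stanzas
print(s.resume(3))
```

The cost is memory for the unacknowledged queue; how often the peer acknowledges determines how large that queue can grow.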

So maybe XMPP is not very effective in the mobile world right now. But one thing is clear to me: a lot of people in the XMPP community are knowledgeable about, and even experts in, that field, so I'm sure we will see further specs to optimize XMPP usage in mobile environments.


But back to the XMPP wire representation. I really hope we get something more efficient than the current XML representation, and I don't say this because of the mobile world. I want this on the server side.

More and more, servers are splitting themselves into standalone components: a frontend connection manager, a couple of core XMPP routers, and a set of external components, each dealing with specific features like file transfer, multi-user chat, or publish/subscribe.
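The component architecture boils down to a router that forwards each stanza to whichever process owns the destination domain. The toy sketch below (Python) routes on the domain part of the `to` JID; the component domains and the `Router` class are hypothetical, and the handlers just return a label where a real component would process the stanza.

```python
from typing import Callable, Dict


def domain_of(jid: str) -> str:
    """Domain part of a JID of the form [node@]domain[/resource]."""
    bare = jid.split("/", 1)[0]    # drop the resource first (it may contain '@')
    return bare.split("@", 1)[-1]  # drop the node, if any


class Router:
    """Toy core router: hands each stanza to the component registered for the
    destination domain, falling back to a default handler."""

    def __init__(self, default: Callable[[str], str]) -> None:
        self.default = default
        self.components: Dict[str, Callable[[str], str]] = {}

    def register(self, domain: str, handler: Callable[[str], str]) -> None:
        self.components[domain] = handler

    def route(self, to_jid: str, stanza: str) -> str:
        handler = self.components.get(domain_of(to_jid), self.default)
        return handler(stanza)


router = Router(default=lambda st: "sessions")
router.register("conference.example.org", lambda st: "muc")
router.register("pubsub.example.org", lambda st: "pubsub")
print(router.route("room@conference.example.org/romeo", "<message/>"))  # muc
```

When components run as separate processes, each hand-off crosses a process or network boundary, so the stanza is serialized and parsed again at every hop.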

And these components must talk to each other and process a lot of messages. (As a side note, I once tested a commercial XMPP server doing 500k messages per second.) Even now, I see servers doing tens of thousands of messages per second.

So, given this traffic volume, keeping the per-message overhead as low as possible is a worthy goal, and if that means dropping the XML representation and adopting a new binary one, I, for one, will welcome the change, and the new binary overlords.
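A back-of-the-envelope calculation shows why per-message overhead matters at these volumes. Apart from the 500k messages per second mentioned above, the inputs below are assumptions chosen only for illustration (100 bytes saved per message, 3 internal hops per message), and the function name is hypothetical.

```python
def bytes_saved_per_second(messages_per_second: int,
                           bytes_saved_per_message: int,
                           hops: int) -> int:
    """Bandwidth saved when each message crosses `hops` internal links and a
    more compact encoding saves `bytes_saved_per_message` on each crossing."""
    return messages_per_second * bytes_saved_per_message * hops


# 500k msg/s is the figure from the text; 100 bytes and 3 hops are assumptions.
saved = bytes_saved_per_second(500_000, 100, 3)
print(saved)  # 150000000, i.e. 150 MB/s
```

Bandwidth is only part of the picture: the CPU cost of parsing each message at every hop is likely to matter at least as much on a busy server, and this arithmetic does not capture it.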