Monday, July 28, 2008

The Programmable Web is getting more complex

OSCON is a conference with various amorphous tracks, most of which relate to the "trends" of tomorrow. Some of the people I met were complaining about Java's lack of presence at the conference. Well, as Tim Bray says, the cool kids are not using Java these days. Personally, I was a bit disappointed by the Scala community's lack of presence at OSCON. In fact, except in one of Tim Bray's keynotes, Scala, I guess, never figured on the radar of this year's conference. There were tutorials and BOFs on actor models, where Scala again got only a passing reference. But I digress ..

Web Architecture is getting more complicated!

There is no doubt that we are looking at more and more options for structuring services (and data) delivered over the Web. While we are all huge fans of RESTful APIs delivering the goods at optimal payloads for many of today's Web 2.0 sites, things may start changing with the scale of data that we are witnessing today as a result of the frivolous cross-pollination of data streams. Floods of messages, tweets, Flickr uploads and the continuous exchange of other social objects over the Web have started to challenge the polling model that REST based architectures rely on for some use cases.

In the session titled Beyond REST? Building Data Services with XMPP PubSub, Rabble and Kellan gave a great presentation on how FriendFeed polled Flickr 2.9 million times in one day to check for updates for 45 thousand users, of whom only 6.7 thousand were logged in at any one time. With Flickr publishing feeds as the only way to advertise presence, FriendFeed had no alternative but to "poll". Kellan summed this up as "Polling sucks!". The alternative is asynchronous message passing with shared nothing architectures and non-blocking event loop based processing. Kellan and Rabble then went on to present how the pub-sub model over XMPP/Jabber can be hijacked to implement Web scale message passing and thereby get rid of the polling nightmare. When we are talking XMPP, we are talking XML payloads, and this makes perfect sense when you are operating with undefined (or loosely defined) endpoints, high latency and low bandwidth systems. Again a good mix for all the social mashups out there ..
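To put those numbers in perspective, here is a minimal Python sketch. The first part only does arithmetic on the figures quoted in the talk. The second part is a toy, in-process comparison of polling against pub-sub; the `Hub` class, the user names and the poll counts are made up for illustration and have nothing to do with how XMPP, FriendFeed or Flickr actually work.

```python
from collections import defaultdict

# Figures quoted in the talk.
POLLS_PER_DAY = 2_900_000
USERS_TRACKED = 45_000
USERS_LOGGED_IN = 6_700

polls_per_user = POLLS_PER_DAY / USERS_TRACKED     # roughly 64 polls per user per day
minutes_between_polls = 24 * 60 / polls_per_user   # roughly one poll every 22 minutes
share_logged_in = USERS_LOGGED_IN / USERS_TRACKED  # roughly 15% of tracked users


class Hub:
    """Hypothetical in-memory pub-sub hub (illustration only, not XMPP)."""

    def __init__(self):
        self.subscribers = defaultdict(list)
        self.deliveries = 0

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, item):
        # Work happens only when something is actually published.
        for callback in self.subscribers[topic]:
            callback(item)
            self.deliveries += 1


# Toy scenario (assumed numbers): 5 users, each polled 10 times a day,
# but only 2 of them upload a photo that day.
users = ["ann", "bob", "cat", "dan", "eve"]
uploads = {"ann": "photo-1", "dan": "photo-2"}

# Polling: one request per user per poll, whether or not anything changed.
poll_requests = len(users) * 10

# Pub-sub: the consumer subscribes once per user and is notified on upload.
hub = Hub()
received = []
for user in users:
    hub.subscribe(user, received.append)
for user, photo in uploads.items():
    hub.publish(user, photo)

print(f"polls per user per day: {polls_per_user:.1f}")
print(f"minutes between polls:  {minutes_between_polls:.1f}")
print(f"share logged in:        {share_logged_in:.1%}")
print(f"toy polling requests:   {poll_requests}")
print(f"toy pub-sub deliveries: {hub.deliveries}")
```

In the toy scenario the polling side makes 50 requests to discover 2 updates, while the pub-sub side does exactly 2 deliveries. Real systems have subscription and connection costs that this sketch ignores.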

Take a hard look at your use case ..

While polling sucks for the FriendFeed-Flickr use case that Rabble and Kellan discussed, polling has its legitimate and rational use cases too. RSS and Atom feeds are ubiquitous today, while very few clients support XMPP. After all, when I have a huge number of clients interested in my feed, it makes perfect sense to go for the polling model for all my clients. Google Reader has been doing this day in and day out.

In another related presentation, Open Source XMPP for Cloud Services, Matt Tucker of Jive Software also discussed the virtues of the XMPP based message passing paradigm for handling complex cloud services. While comparing alternatives like SOAP, he talked about the performance optimizations that XMPP based services offer through long lived persistent connections, as opposed to the overhead of establishing encryption and repeated authentication in polling based systems. Well, I am not sure what impact 1 million persistent connections will have on a system - I would love to get some ground realities from people who are actually using it. Matt demonstrated how easily an XMPP based system can be implemented using the open source Openfire XMPP server and a host of client APIs available for a multitude of languages.

But that's really just one part of the story, one that, at best, adds another dimension to our thought process when designing services for the Web. You need to define what scalability and performance optimizations mean for you. What Google can achieve through Bigtable and MapReduce makes no sense for Twitter. Dare Obasanjo has a great piece on this subject of ubiquitous scalability.

Whether it's 100% RESTful or not is purely driven by requirements ..

Architecture changes when you have defined end points, low latency and high bandwidth networks to play with. You can afford to enforce constraints on your end points and use binary payloads like Protocol Buffers or Thrift, which need a specific subset of programming languages and runtimes. Google and Facebook have been using them with great success. When you need to process lots of data and perform data intensive computations, it makes sense to use a binary wire format instead of XML.
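A rough feel for the payload difference can be had with a small Python sketch. Note that this uses the standard library's `struct` module as a stand-in for a binary wire format; it is not Protocol Buffers or Thrift, and the record layout (`user_id`, `photo_id`, `timestamp`) and the sample values are assumptions invented for the example.

```python
import struct
import xml.etree.ElementTree as ET

# Assumed fixed layout, network byte order, no padding:
# user_id (uint32), photo_id (uint64), timestamp (uint32).
RECORD_FORMAT = "!IQI"


def encode_xml(user_id, photo_id, timestamp):
    """Encode the record as a small self-describing XML document."""
    root = ET.Element("update")
    ET.SubElement(root, "user_id").text = str(user_id)
    ET.SubElement(root, "photo_id").text = str(photo_id)
    ET.SubElement(root, "timestamp").text = str(timestamp)
    return ET.tostring(root)


def encode_binary(user_id, photo_id, timestamp):
    """Encode the record as fixed-width binary; both ends must agree on the layout."""
    return struct.pack(RECORD_FORMAT, user_id, photo_id, timestamp)


def decode_binary(payload):
    """Decode a payload produced by encode_binary back into a tuple."""
    return struct.unpack(RECORD_FORMAT, payload)


# Made-up sample record.
record = (45000, 2709403846, 1217203200)

xml_payload = encode_xml(*record)
binary_payload = encode_binary(*record)

print("XML bytes:   ", len(xml_payload))
print("binary bytes:", len(binary_payload))
print("round trip ok:", decode_binary(binary_payload) == record)
```

The binary form is compact precisely because the field names and types live in the code at both end points rather than in the message, which is the "constraints on your end points" trade-off described above.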

So the options start multiplying: polling based RESTful Web services, message oriented XMPP services, or the more classical distributed computing model of RPC through binary serialization. OSCON 2008 had quite a few sessions that discussed many of these cloud computing infrastructure components - the key message was, of course, that one size doesn't fit all. And the Web architecture is getting more complicated by the day.
