PubSubHubbub: a 1987 idea with HTTP/XML and Peer-to-Peer Sprinkled on It
If you read tech blogs like ReadWriteWeb or TechCrunch, you have probably heard of PubSubHubbub, a distributed publishing method recently announced by folks at Google.
Tech bloggers are going crazy about it and have written thousands of posts without really knowing what it is or who will benefit from it. It is one of the buzzwords of the moment and nobody wants to miss out on it.
But anybody who studied Computer Science in college will probably remember the publish-subscribe model from some of the introductory classes. When simple poll models, in which whoever is interested in new data constantly asks for it, are too expensive or do not scale, everybody switches to a push model, in which whoever is interested in the data (the subscriber) lets the creator (the publisher) know and then receives updates whenever there is something new. This was invented in 1987.
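To make the push model concrete, here is a minimal in-process sketch in Python. The `Publisher` class and its `subscribe`/`publish` methods are names made up for this illustration, not part of any library: subscribers register a callback once, and the publisher calls them only when there is something new, so nobody has to keep asking.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Publisher:
    """Hypothetical publisher that pushes updates to registered subscribers."""

    def __init__(self) -> None:
        # topic -> list of subscriber callbacks
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        # The subscriber tells the publisher once that it is interested.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, item: str) -> int:
        # The publisher pushes the new item to everyone who registered.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(item)
        return len(callbacks)


received: List[str] = []
blog = Publisher()
blog.subscribe("posts", received.append)
notified = blog.publish("posts", "A new post")
ignored = blog.publish("comments", "Nobody subscribed to this topic")
```

The subscriber never polls: the only work done is at subscription time and when an update actually exists.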
Add some HTTP/XML, sprinkle some ideas from peer-to-peer systems, and more than 20 years later you have PubSubHubbub.
Seriously, that is the idea. The publisher picks some hubs and lets them know that it will be publishing something. When it has updates, it pings them (with an HTTP POST) to let them know. Once alerted, the hubs go fetch the full content and distribute it to all the subscribers who previously registered with the hub for that particular feed.
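The flow just described can be sketched in a few lines of Python. This is a toy, in-memory model under loud assumptions: the `Hub` class, its method names, and the `blog.example` URL are all invented here, the HTTP POST ping is replaced by a plain method call, and fetching the feed is replaced by an injected `fetch` function so the example runs without a network.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List


class Hub:
    """Hypothetical hub: fetches a feed when pinged and fans it out."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        # `fetch` stands in for "go download the feed over HTTP".
        self._fetch = fetch
        # feed URL -> delivery callbacks of the registered subscribers
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, feed_url: str, deliver: Callable[[str], None]) -> None:
        # A subscriber registers with the hub for one particular feed.
        self._subscribers[feed_url].append(deliver)

    def ping(self, feed_url: str) -> int:
        # The publisher's ping only says "something changed";
        # the hub then fetches the full content itself...
        content = self._fetch(feed_url)
        # ...and distributes it to every subscriber of that feed.
        deliveries = self._subscribers.get(feed_url, [])
        for deliver in deliveries:
            deliver(content)
        return len(deliveries)


# Stand-in for the publisher's site: feed URL -> current feed content.
FEEDS: Dict[str, str] = {"http://blog.example/feed": "<entry>new post</entry>"}

hub = Hub(fetch=lambda url: FEEDS[url])
inbox_a: List[str] = []
inbox_b: List[str] = []
hub.subscribe("http://blog.example/feed", inbox_a.append)
hub.subscribe("http://blog.example/feed", inbox_b.append)

delivered = hub.ping("http://blog.example/feed")
```

Note where the load lands: the publisher's site is fetched once per ping, while the hub does one delivery per subscriber.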
Yes, it will make your little blog scale better since people will crush the Hub and not your site (but isn’t it on WordPress/Blogger servers anyway, so why do you care?), but how many blogs/publishers out there have this kind of problem? And if they are important enough to receive that much traffic, shouldn’t they actually think about it as a business?
Finally, why isn’t anybody talking about the Hubs? The system is “simple” for publishers and subscribers, but who designs, runs, and maintains the Hubs? If a Hub goes down, all its subscribers lose the updates (yes, they can go to another Hub), so those systems need to be redundant and scalable (they have to download the content and distribute it).
The only reason one can have to create and maintain a public/free Hub (as Google has done) is to get hold of the data. Instead of crawling millions of blogs (publishers) to check if they have been updated, you wait for them to let you know. At the same time, you will know who (the subscribers) is interested in what, and as Google has shown in the past years, that is pretty handy information.

