How can we build a real-time web?
There is no longer any doubt that a huge part of the internet is going to switch from an “asynchronous”, “pulled” mechanism to a “synchronous”, “push” and “real-time” paradigm.
This will happen sooner or later but, as Twitter (for example) proves, this move will probably not be driven by “the big guys”. Why? Because they have spent so much building a “scalable” pulled architecture that it would be too painful for them to build something real-time. And, to be fair, I guess that most of the third-party developers who build apps on existing APIs are not ready to switch to real-time either.
I believe that change (as usual) will come from the small ones, and I want Notifixious to be one of the leading services in this trend.
One of the key objectives of Notifixious is to deliver updates as soon as possible once new content is available. For this, we built our “superfeeder”, which is in charge of detecting new stories as fast as possible.
Pushed Content:
Technically, the content is pushed to us, so notification happens within seconds of publication. Our goal is to have every online service use this. So far, this is the case for:
- Identica, with its XMPP feed
- Seesmic, with its XMPP Pubsub feed
- Six Apart blogs (TypePad, Vox and LiveJournal, even though LiveJournal is no longer part of Six Apart) with their AtomStream.
Smarter Pull:
The goal is to perform a single pull of the information, at the moment the content becomes available. It’s not perfect, but it’s easier to implement for most existing web services. The detection time is a matter of minutes (below 5 most of the time).
- Simple Update Protocol (SUP), popularized by FriendFeed and used by them, Disqus, and a few others. Any blogger can also use their public SUP directory, which makes it easy to integrate.
- Ping servers’ public activity. Google Blog Search is the only one that gives the feeds (as far as we know), so we’re using it. The problem is that it’s not very reliable (some updates are missed), but there is virtually nothing to implement for content producers, since several blog platforms and web services have built-in ping functionality.
- Notifixious’ own ping: you can send us your feed’s URL when you have updated it, so that we check it.
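As a rough illustration of the “single pull” idea behind SUP, here is a minimal sketch that takes an already-parsed SUP document and returns only the feeds worth fetching. It assumes the SUP JSON exposes an `updates` list of `[sup_id, update_id]` pairs, as far as we understand the format. The `feeds_to_pull` function, the `known` and `last_seen` mappings and the example URLs are hypothetical names made up for this example, not Notifixious code.

```python
import json


def feeds_to_pull(sup_document, known, last_seen):
    """Return the feed URLs whose SUP update ID changed since the last check.

    sup_document: parsed SUP JSON (a dict with an "updates" list of pairs).
    known: hypothetical mapping of SUP-ID -> feed URL that we monitor.
    last_seen: mapping of SUP-ID -> last update ID processed (updated in place).
    """
    to_pull = []
    for sup_id, update_id in sup_document.get("updates", []):
        if sup_id not in known:
            continue  # an update for a feed we do not monitor
        if last_seen.get(sup_id) == update_id:
            continue  # this update was already handled
        last_seen[sup_id] = update_id
        to_pull.append(known[sup_id])
    return to_pull


# Example payload with made-up values.
raw = '{"updates": [["abc123", "u1"], ["zzz999", "u7"], ["def456", "u2"]]}'
known = {
    "abc123": "http://example.com/a.xml",
    "def456": "http://example.com/b.xml",
}
last_seen = {"def456": "u2"}
print(feeds_to_pull(json.loads(raw), known, last_seen))
# prints ['http://example.com/a.xml']
```

Only one feed is fetched here: the second entry is not monitored and the third was already seen, which is what makes this approach cheaper than polling every feed.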
“Traditional” pull:
To do it in the most efficient way, we have set up an architecture with queues and workers. A feed is pulled at most every 32 minutes (unless its content is pushed). To determine the frequency, we use the number of subscribers: it’s better to satisfy a lot of consumers than a lot of producers ;). Of course, while fetching the feed, we use ETags and If-Modified-Since headers, and we even keep track of its “content” with a hash-key system. Once fetched, and if it’s new, the feed is sent to a parser. We use Ruby to parse feeds (with the rfeedparser library). It’s probably not the best language for this, but we spend more time “downloading” the feeds than parsing them, so we’d rather optimize the fetching! Of course, all this architecture is “evented” for maximum efficiency.
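The fetching step described above can be sketched roughly as follows. This is only an illustration in Python using the standard library, not our actual Ruby workers: `conditional_headers`, `is_new_content`, `fetch_if_changed` and the `state` dictionary are hypothetical names, and SHA-1 is just one possible choice for the content hash.

```python
import hashlib
import urllib.error
import urllib.request


def conditional_headers(state):
    """Build conditional GET headers from what we stored on the last fetch."""
    headers = {}
    if state.get("etag"):
        headers["If-None-Match"] = state["etag"]
    if state.get("last_modified"):
        headers["If-Modified-Since"] = state["last_modified"]
    return headers


def is_new_content(body, state):
    """Compare a hash of the body with the stored one; remember the new hash."""
    digest = hashlib.sha1(body).hexdigest()
    if digest == state.get("content_hash"):
        return False
    state["content_hash"] = digest
    return True


def fetch_if_changed(url, state):
    """Return the feed body if it changed since the last fetch, else None."""
    request = urllib.request.Request(url, headers=conditional_headers(state))
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            body = response.read()
            state["etag"] = response.headers.get("ETag")
            state["last_modified"] = response.headers.get("Last-Modified")
    except urllib.error.HTTPError as error:
        if error.code == 304:
            return None  # the server says nothing changed
        raise
    # Some servers ignore conditional headers, so the hash is a second check.
    return body if is_new_content(body, state) else None
```

A worker would call `fetch_if_changed(url, state)` with the per-feed `state` it keeps between runs, and hand the body to the parser only when the result is not `None`.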
Please note, however, that if the feeds themselves are not updated in real time, or if they sit behind cache systems (such as FeedBurner, which adds its own latency), you might see greater delays before we actually detect the story.
Once we’ve done this, we’ve done barely half of the job. The other part is all about “sharing” what we detect. Nobody should have to build the same kind of architecture again (at least not for the content we already monitor!), so we’re now building the tools to share this. We will use two things: SUP (the half-goodness approach) and XMPP (the full-goodness approach). But I’ll talk about it later, when it’s available!
