This is a response to Danny Sullivan’s recent post on SearchEngineLand.
Danny is a great reporter, and one of the top reporters in Search since the beginning (or at least what I know as the beginning). Danny posted an article today on his definition of Realtime search, and since I disagree, I felt like I needed to share my thoughts.
I love Twitter. I use it all the time. For me it has replaced blogging, iGoogle, and oftentimes email and IM. I use Twitter search to track what people are saying about OneRiot, events and other business/ventures that I’m involved with. It’s great. Actually it’s more than great. It’s invaluable, and yes, I support them getting the Nobel Peace Prize for their impact during the Iran Election protests.
But getting a firehose of tweets is not the answer to realtime search. Realtime search is finding the right answer to your question based on what’s available right now, about the subject you care about right now. Realtime search is finding the ‘Right Answer, Right Now’.
In the case of the Iran Elections, Twitter told you that protests were happening, but sorting through the tweets would require that you be superhuman (I believe there were about 6,000 tweets a second at the peak). You could not get a ‘right answer, right now’ because you were drowning in the firehose.
The Michael Jackson death was announced on TMZ.com. Yes, it was shared on Twitter, but on the day of his death, tweets about the TMZ article were a tiny fraction of all the tweets on MJ. Finding the ‘right answer, right now’ was impossible by sifting through the firehose.
And of course there is the spam. From fake posts on Jeff Goldblum’s death (although the resulting appearance on the Colbert Report was well worth it), to tweeted links on the Iran Election that actually sent you to Viagra ads, Twitter is far too easy for people to spam. Combine the spam with the firehose, and you start getting a vicious cycle of more spam => more tweets => more drowning in the firehose.
The solution to the ‘right answer, right now’ requires a filter. Sounds easy, but filters in search are actually very hard to do well. Talk to Microsoft, Yahoo and Ask about what it takes to catch up with Google on search. I’m dramatically oversimplifying what Google does, but at the end of the day Google’s filters are better than everyone else’s when it comes to finding the right answer in the static web.
The winner in the realtime web will have the best filters, that work at the speed of realtime, to get you the ‘Right Answer, Right Now’.
OneRiot’s approach has been to focus on the content, and to order that content using PulseRank (PageRank for the realtime web). We were the first search engine to have the TMZ article on Michael Jackson’s death as the top result for a query on MJ. We were the first search engine to find the text of Mousavi’s speech during the Iran Protests. Were we behind the tweets? Probably, by a few seconds, but if you searched for Michael Jackson on OneRiot, the TMZ article was at the top of the list within a minute of it being posted. I suspect it would have taken your average searcher more than 60 seconds to sift through the tweets coming in at 500+/minute.
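To make the idea of a filter that works at the speed of realtime a little more concrete, here is a minimal sketch in Python. It is not PulseRank, whose details are not described here. It simply assumes that each share of a URL counts as a vote and that a vote loses half its weight after a fixed interval, so fresh, widely shared content rises to the top. The function name `rank_urls`, the half-life value, and the example URLs are all hypothetical.

```python
from collections import defaultdict

# Assumption for illustration only: a share loses half its weight every 10 minutes.
HALF_LIFE_SECONDS = 600.0


def rank_urls(shares, now, half_life=HALF_LIFE_SECONDS):
    """Rank URLs by a time-decayed share count.

    shares: iterable of (url, timestamp_seconds) pairs, one per share event.
    now: current time in seconds, on the same clock as the timestamps.
    Returns a list of (url, score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for url, ts in shares:
        age = max(0.0, now - ts)  # treat shares stamped in the future as brand new
        scores[url] += 0.5 ** (age / half_life)  # exponential decay per share
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    now = 10_000.0
    shares = [
        # Three shares within the last minute.
        ("http://example.com/breaking-story", now - 30),
        ("http://example.com/breaking-story", now - 45),
        ("http://example.com/breaking-story", now - 60),
        # Four shares from about two hours ago.
        ("http://example.com/old-story", now - 7200),
        ("http://example.com/old-story", now - 7300),
        ("http://example.com/old-story", now - 7400),
        ("http://example.com/old-story", now - 7500),
    ]
    for url, score in rank_urls(shares, now):
        print(f"{score:.3f}  {url}")
```

In this toy data the older story has more shares in total, but the decay pushes the story shared in the last minute to the top. A real filter would also have to weigh who is sharing and screen out spam, which this sketch ignores.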
Finding the ‘Right Answer, Right Now’ also includes bringing in data from sources other than Twitter/micro-blogging. Digg is an amazing source of realtime data. While Digg is a smaller data source, when Digg has the content, it’s good content. OneRiot’s panel of users share what’s important to them as they surf the web. We get about 10 million URLs/day from our panel, arriving in realtime (more than double what we get from Twitter). That is an incredible resource for us that tells us what’s important right now on the web. Flickr, YouTube, Facebook and any other resource where users share what’s important to them make up the realtime web. If you ignore them you will not get the right answer, period.
Bottom line: Realtime search is about finding the Right Answer, Right Now. Users don’t care how you do it, just get it done.

These are fair points in critiquing the Twitter firehose now.
But Twitter also realizes that. That’s why they’re working on their reputational algorithm to rank users and their content, and taking steps now to verify accounts and cut down on spam. If those kinds of steps are executed well, it should definitely give them quite a leg up in real-time search.
Thanks, Kimbal. I guess this part:
“Realtime search is finding the ‘Right Answer, Right Now’.”
has nothing to do with real time search. I mean, that’s true for any search engine. Before we had Twitter, before we had Google, we had general purpose search engines that were used by people who wanted (and received) the right answer, right now.
I agree. Searching through the Iran tweets could be mind numbing. There’s plenty of spam, as I addressed. And yet despite the huge flaw, people already are ferreting out good info from microblog posts. I hope the area improves, and it’s one place where real time search players are focusing.
The other primary area is where OneRiot seems to focus: can you tap into the real time sharing activity at Twitter and other places to ferret out good articles that people are reading? That’s still compelling. And like I also said, some will still consider that to be real time search as well. I think I also said that I’m mixed on whether Digg data is real time or not. Some of the sharing does happen in real time. But the publishing of the information itself didn’t.
That doesn’t mean it’s not useful. I guess I just feel like it would be easier to explain what some of the sharing search sites are about if they had a different name than “real time,” such as “Sharing Search” or “Popularity Search.” But like I also said, I don’t expect that will happen. So I’ll just try to focus on what they do and the usefulness that they provide — they are useful, whatever we want to call them.
Upon reading both articles, I’m not sure that they’re saying anything all that different from each other. Sullivan’s focus on access to the firehose seems to be mainly as a way of introducing the “right answer, right now” problem by way of Google, who have already proven themselves in search-algorithm land, rather than a statement that users themselves want to parse all this data.
OneRiot’s method of elevating pagerank for content that is shared across multiple services in greater numbers is smart, and would undoubtedly benefit from fewer API limitations, a la access to the full Twitter firehose of content, so I hope that access doesn’t get limited to just the key players as suggested in Sullivan’s article. Twitter has always benefited from having a broad developer base.
Vaguely related note: OneRiot differentiates itself by providing some data to back up pagerank, which is beneficial in establishing legitimacy with a new audience. I wonder, though, if an indication of whether or not a search engine has “made it” would be people trying to game the system (in which case the Google black box is probably a pretty helpful model).