TITLE OF PAPER: Technorati Hacks
URL OF PRESENTATION: _URL_of_powerpoint_presentation_
PRESENTED BY: David L. Sifry
REPRESENTING: CEO and founder of Technorati
CONFERENCE: O'Reilly Emerging Technology Conference
DATE: February 10, 2004
LOCATION: San Diego, CA
--------------------------------------------------------------------------
REAL-TIME NOTES / ANNOTATIONS OF THE PAPER:
{If you've contributed, add your name, e-mail & URL at the bottom}

Infoporn:
11k new weblogs tracked per day (started at about 2k/day)
In March of last year, 4-5 thousand created a day
In September, hit the 1 million mark, seeing 7-9k created/day
Now over 1.6 million sources tracked (1 every 7.8 seconds)
So, how much of this is really blogs, which is testing, and which is flash in the pan?
Churn: they're counted dead after 3 months; 65% are still alive
100k people updating each day (points at Scoble, that prodigious bastard)
Median time from post to live index: 7 minutes

It's a conversation search engine, partly because it's very different from Google.

Example: Home page for ETCon. The link cosmos for this page has four pages, with the most recent post shown from 34 minutes ago.

A story: They released keyword search on Monday (feedback please, Dave's now a response time maniac), which sped up the ability to do really big queries.
Shows the cosmos for Amazon.com. Shows posts from 19 minutes ago and 11 pages of results (which took a couple of seconds).
Mentions how good it is that Amazon has permanent deep links to products, creating a PID for people to pass around.
But how is a normal person going to understand this? Acknowledges the user experience is pretty geekish.
"But maybe I can do a hack"..... What if they aggregated Amazon APIs and Technorati APIs and showed you...
A page with the top products discussed in the last 24 hours:
http://www.technorati.com/cosmos/products.html
This took him two or three hours of hacking, which is a result of both Technorati's and Amazon's publishing of open web APIs.
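
[Sketch, not from the talk: roughly what the aggregation behind a "top products in the last 24 hours" page could look like. This is not Dave's code and calls neither company's API. It assumes the link data has already been fetched as (posted_at, blog_url, target_url) tuples, a hypothetical shape, and that product IDs show up in Amazon links as a 10-character code after "/ASIN/" or "/dp/". All function names and sample values are made up for illustration.]

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumption: the product ID is a 10-character code after "/ASIN/" or "/dp/".
# Other Amazon link shapes are simply ignored by this sketch.
ASIN_RE = re.compile(r"amazon\.com/.*?(?:ASIN|dp)/([A-Z0-9]{10})(?:[/?]|$)")


def asin_from_url(url):
    """Return the product ID embedded in an Amazon link, or None."""
    match = ASIN_RE.search(url)
    return match.group(1) if match else None


def top_products(links, now, window=timedelta(hours=24), limit=10):
    """Rank product IDs by how many distinct weblogs linked to them recently.

    `links` is an iterable of (posted_at, blog_url, target_url) tuples.
    A weblog that links to the same product several times counts once.
    """
    blogs_per_product = {}
    for posted_at, blog_url, target_url in links:
        if not (now - window <= posted_at <= now):
            continue  # outside the time window
        asin = asin_from_url(target_url)
        if asin is not None:
            blogs_per_product.setdefault(asin, set()).add(blog_url)
    counts = Counter({asin: len(blogs) for asin, blogs in blogs_per_product.items()})
    return counts.most_common(limit)


if __name__ == "__main__":
    now = datetime(2004, 2, 10, 12, 0)
    sample = [  # made-up weblogs and product IDs
        (now - timedelta(hours=1), "http://a.example/", "http://www.amazon.com/exec/obidos/ASIN/0123456789/"),
        (now - timedelta(hours=2), "http://b.example/", "http://www.amazon.com/dp/0123456789"),
        (now - timedelta(hours=5), "http://c.example/", "http://www.amazon.com/dp/B000000001"),
    ]
    for asin, blog_count in top_products(sample, now):
        print(asin, blog_count)
```

[Counting distinct weblogs rather than raw links is a design choice of this sketch, so that one prolific poster cannot push a product up the list alone; the talk did not say how the real page ranks products.]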
You can choose the cosmos link for the PID URL and see what everyone is saying about a product.

What Technorati can do is provide a sort of archeological dig of references on the net by time, which is different from newspapers, where larger papers can "scoop" stories which have been previously published by smaller papers. Who found Salam Pax?

"Everybody talks about the power law. Fuck it, I've got the data."
All the power law says is that when it's easy to publish, you'll have a relatively small number of things (compared to the entire space of options) which are linked to by a lot of people. The important part is not the top 100 (bfd) but what happens in the top 100k, where there are five inbound links, which is significant because it means that there is still a community for people.
Here's what's interesting. If this was broadcast and only the big guys mattered, the graph would look the same. But what we're seeing is that the aggregate number of links in the lower portion of the graph greatly outnumbers the links into the top 100. There are lots more little clusters than big clusters.

Technorati as platform:
XML APIs for all functionality, free for non-commercial use
REST based architecture
Developer site: http://developers.technorati.com/

Other hacks:
Joi Ito: IM/SMS notifications of new links
Movable Type plugins
Threading on weblog readers (newsmonster, newsgator, blosxom)
High priority indexer: pinger http://www.technorati.com/ping.html
(If your weblog service doesn't ping Technorati you can bookmark it and it will automatically send out its spider.)

Application directions:
Open reviews (RVW format): e.g. link reviews to mapping services
Subscribe to a set of keyword and Cosmos filters
Provide discovery and filtering of subscription lists (input an OPML file and they'll reorder it by update dates) - attention.xml
Vote links (tags saying vote = -1|0|1 for whether the link is good or bad)
Geographic search - the problem with geourl is that the metadata is user entered, so it isn't reliable or common (~11k blogs)

Technorati wants to be a mirror to the community, not a standards defining co or something requiring active user participation (e.g. by adding tags)
"More metadata is good." "My mom is creating metadata."

Breaking news in the blogosphere. After 9/11 Dave didn't have enough time to track all of the news. Then he realized: "Geez, we're tracking millions of users, and maybe they have more time than I do.... In aggregate." *laughing*
"Who is linking to a story and what are the most authoritative bloggers saying about a story?"
"Using the bloggers as editors.. Or even better, as filters"

Q: What's the difference between "breaking news" and "current events"?
A: Breaking news is in chronological order (like a blog) while current events are ranked by the interest shown on blogs.

Suggestion: we could send this to the journalists
"This is like heroin for writers. Writers are serious about heroin"

Conclusions: Where to go from here? What do you want?
Not being said (yet): Dave's hiring. SF Area. Non telecommuting.
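
[Sketch, not from the talk: the "breaking news" vs "current events" answer in the Q&A above comes down to two sort orders over the same set of stories. The code below assumes "interest shown on blogs" can be approximated by an inbound link count; the talk did not say how interest is measured, and the Story record and its field names are hypothetical.]

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Story:
    url: str
    first_seen: datetime   # when the first weblog linked to it
    inbound_links: int     # assumed stand-in for "interest shown on blogs"


def breaking_news(stories):
    """Chronological order, newest first, like a blog."""
    return sorted(stories, key=lambda s: s.first_seen, reverse=True)


def current_events(stories):
    """Most linked first; ties go to the newer story."""
    return sorted(stories, key=lambda s: (s.inbound_links, s.first_seen), reverse=True)


if __name__ == "__main__":
    sample = [  # made-up stories
        Story("http://news.example/a", datetime(2004, 2, 10, 9, 0), 40),
        Story("http://news.example/b", datetime(2004, 2, 10, 11, 0), 3),
        Story("http://news.example/c", datetime(2004, 2, 10, 10, 0), 12),
    ]
    print([s.url for s in breaking_news(sample)])
    print([s.url for s in current_events(sample)])
```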
--------------------------------------------------------------------------
REFERENCES: {as documents / sites are referenced add them below}
http://www.technorati.com/
http://www.technorati.com/cosmos/products.html
http://www.technorati.com/bloglinks.html
http://www.technorati.com/cosmos/breakingnews.html

[I'LL BE POSTING THIS ONLINE HERE http://trevor.typepad.com/blog/2004/02/oreilly_emergin.html ]
[SO PLEASE MAKE YOUR EMAIL OF THE FORM user (at) domain dot com]
--------------------------------------------------------------------------
CONTRIBUTORS: {add your name, e-mail address and URL below}
Trevor F. Smith, trevorolio (at) mac dot com, http://trevor.smith.name/
--------------------------------------------------------------------------
EMAIL BOUNCEBACK:
[TFS: I'd rather you check the site for updates: http://trevor.typepad.com/blog/2004/02/oreilly_emergin.html but I'll email if that doesn't work for you ]
shrub (at) mac dot com, phil (at) gyford.com
--------------------------------------------------------------------------
NOTES ON / KEY TO THIS TEMPLATE:
A headline (like a field in a database) will be CAPITALISED
This differentiates from the text that follows
A variable that you can change will be surrounded by _underscores_
Spaces in variables are also replaced with under_scores
This allows people to select the whole variable with a simple double-click
A tool-tip is lower case and surrounded by {curly brackets / parentheses}
These supply helpful contextual information.
--------------------------------------------------------------------------
Copyright shared between all the participants unless otherwise stated...