Is the Semantic Web (Web 3.0) Dead On Arrival?

Business, Search · February 7, 2007 at 10:33 pm

Warning, kind of long…

I saw Tim Berners-Lee evangelizing the Semantic Web back in ~ 1998 while I was at MIT. I love the concepts behind the semantic web, and I fully appreciate the power of what the semantic web might some day look like, but I think we’re nearing the 18th year of his evangelical crusade.

A quick primer on the Semantic Web: it's all about standardized, formatted data. Calendars, places, names, research data, etc. would all be published in standardized formats that could be mashed up, shared and used throughout the web. As always, Wikipedia has a good article on the semantic web, and much information can be found on the W3C standards page and the microformats page. There is also a great write-up of the semantic web at ReadWriteWeb.

RSS and Microformats are the best-known variants on the semantic web concept. RSS continues to be a success, but Microformats have languished.

Why did RSS succeed?

It’s all about value thresholds and defaults.

  • Value thresholds for users: It only takes 2 RSS feeds and an RSS reader for a user to derive value from RSS. So, if 2 sites a user visits have RSS feeds, they'll benefit from using a single RSS reader. The value threshold for changing user behavior is very low: only a few sites need to publish the format for users to see value and begin to change their behavior.
  • Defaults: There are really 3 dominant blog platforms, and they’ve been dominant for a long time. They made the creation of RSS feeds a default very early. Writers had to do nothing to enable RSS feeds - they were automatically generated. So, as blogging took hold, default settings created the supply of RSS feeds that were consumed by visitors.
  • Value thresholds for publishers: Once a critical mass of visitors seeking RSS formed, mainstream sites found huge value in publishing an RSS feed. It connected them to their users offsite and drove visitors from other mashed up sites using their feeds. But, it wasn’t until the critical mass of users formed that mainstream sites adopted RSS.

Challenges of Microformats

OK, let’s look at microformats and the challenges they face. De facto data standards for calendar events, contact information and addresses (courtesy of the Microsoft Office monopoly and the USPS) have been around forever, but have not received anywhere near the adoption of RSS online. Is the hcard really any different from the vcard?
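To make that question concrete, here's a small sketch (the contact details are invented) showing the same contact encoded as a vCard text block and as an hCard, with a minimal stdlib parser pulling the identical fields out of each. The fn and tel class names come from the hCard spec; everything else is made up for illustration:

```python
# Toy comparison: one contact as a vCard (plain text) and as an hCard
# (HTML carrying the same semantics via class names). Data is invented.
from html.parser import HTMLParser

VCARD = """BEGIN:VCARD
FN:Joe Smith
TEL:555-0100
END:VCARD"""

HCARD = """<div class="vcard">
  <span class="fn">Joe Smith</span>
  <span class="tel">555-0100</span>
</div>"""

class HCardParser(HTMLParser):
    """Collects the text inside elements whose class is an hCard property."""
    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if cls in ("fn", "tel"):
            self._current = cls

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] = data.strip()
            self._current = None

parser = HCardParser()
parser.feed(HCARD)

# The vCard yields the same fields with near-zero parsing effort.
vcard_fields = dict(
    line.split(":", 1) for line in VCARD.splitlines()
    if ":" in line and not line.startswith(("BEGIN", "END"))
)

print(parser.fields)   # {'fn': 'Joe Smith', 'tel': '555-0100'}
print(vcard_fields)    # {'FN': 'Joe Smith', 'TEL': '555-0100'}
```

Both formats carry the same machine-readable facts; the hcard's only real novelty is that it lives inside a web page.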

  • Value thresholds for users: Users don’t get behavior-changing benefit from 1 or 2 sites that use hcalendars, hcards or hreviews. Sure, an hcard might make it easier for a user to add a contact to their address book, but it won’t fundamentally change the way the user interacts online. Any developers who want to create utilities for users will need to support unformatted data as well, unless there are tons of sites publishing to the standard. There will be no community of users demanding the implementation of microformats.
  • Default settings - too much work: Microformats are all about providing structure. In order for publishers to adopt microformats, they’ll need to do extra work. They can’t write a standard blog entry that’s a review; it will have to be marked up differently. Same thing if they’re announcing an event or providing their contact info.
  • Value thresholds for publishers: Unless tons of users (or tons of other websites) are using the microformats, publishers have little incentive to publish to microformats (or any other sort of pre-defined standard). For example, we implemented hreview microformats at Judy’s Book, and they have had absolutely no impact on our business. We’ve gotten great distribution of our reviews, but microformats haven’t played a role in that. At all.
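To show the "extra work" concretely, here's a toy sketch contrasting a review written as an ordinary blog paragraph with the same review annotated per the hReview draft. The hreview, item, fn, rating and description class names come from that draft; the review text and the trivial regex extractor are invented for illustration:

```python
# The same review as a plain blog paragraph and as hReview-annotated HTML.
# Class names follow the hReview draft; the prose is invented.
import re

PLAIN = "<p>Joe's Auto Body was great, easily 4 stars. Fixed my bumper fast.</p>"

HREVIEW = """<div class="hreview">
  <span class="item"><span class="fn">Joe's Auto Body</span></span>
  <span class="rating">4</span>
  <p class="description">Fixed my bumper fast.</p>
</div>"""

def rating_from_hreview(html):
    """Trivial to extract once the publisher has done the markup work."""
    m = re.search(r'class="rating">(\d)', html)
    return int(m.group(1)) if m else None

print(rating_from_hreview(HREVIEW))  # 4
print(rating_from_hreview(PLAIN))    # None - nothing to anchor on
```

The structure makes the machine's job trivial, but every bit of it is work the author must do up front, which is exactly the adoption problem.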

A critical mass of users consuming microformats isn’t likely to form. Users aren’t going to change their behavior because www.joesautobody.com uses hreviews, hcards or hcalendars. Firefox 3 is looking at ways to let users work with microformats within the browser experience, but even the use cases addressed in this post are really edge cases that hit a user once a week at best.

Publishers aren’t going to do the extra work without any direct benefit to themselves (happier users, more users, etc.).


So, is the Semantic Web DOA?

I don’t think so. I just think it will unfold very differently than conventional wisdom expects. Publishers won’t adopt the standards. Aggregators will build an audience, define (maybe adopt) the standards and then drive adoption to smaller publishers hoping to access the audience of the aggregators.


Vertical Search, Google Base & Intelligent Agents


Method 1: A big player defines a standard. Google, Yahoo and Ask have incorporated Judy’s Book’s reviews into their local products. Google and Yahoo defined their own formats, and since we wanted our reviews there, we published to those standards. We’ve received substantial traffic from those relationships, but they had nothing to do with microformats, the semantic web or other industry standards. If the major players define a format and allow any site to submit their content, small sites will scramble to get their content into that format.

Hellooo Google Base. If Google starts sending search traffic to websites through Google Base content, we’ll see mass adoption of Google’s defined formats.

Combine this standard with APIs to access the content, and mashups will spring up everywhere. And, more sites will submit in the format.

Method 2: Intelligent Agents standardize the information.
Ask actually approached the problem differently: they crawled our web pages, determined our page structure and then extracted the reviews. This intelligent-agent approach is far more powerful than the Google Base approach, and I believe it is more likely to drive the creation of a semantic web.

Aggregators & ‘Intelligent Agents’ (Vertical Search) are already solving the hard problem of dealing with disparate data formats: Kayak for flights, CalendarData / Trumba for events, Yodlee for financial information, Trulia for real estate, and Dapper for data.

These services have directly addressed the ‘user and publisher’ value thresholds. They attempt to aggregate ALL of the information on a particular topic, without publishers having to do anything different. Finding reviews, contact info and calendar events out in the wild is a hard but not intractable problem.
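A toy sketch of what such an agent does, under invented assumptions: the page layout, business names and regexes below are all made up, and real vertical-search crawlers are vastly more sophisticated, but the core idea (infer structure from the page instead of asking publishers to change anything) is the same:

```python
# Toy "intelligent agent": infer review structure from an unannotated page.
# The page layout and the extraction heuristics are invented for illustration.
import re

PAGE = """
<h2>Joe's Auto Body</h2>
<p>Rating: 4 out of 5 stars</p>
<p>Fast, friendly, and they fixed my bumper for less.</p>
<h2>Main Street Garage</h2>
<p>Rating: 2 out of 5 stars</p>
<p>Slow service and an unexpected surcharge.</p>
"""

def extract_reviews(html):
    """Pair each business name with the rating that follows it.

    A real agent would first learn the site's template; here we hard-code
    the heuristic that names live in <h2> tags and ratings in 'x out of 5'.
    """
    names = re.findall(r"<h2>(.*?)</h2>", html)
    ratings = [int(r) for r in re.findall(r"Rating: (\d) out of 5", html)]
    return list(zip(names, ratings))

print(extract_reviews(PAGE))
```

The 80/20 nature of the approach is visible even here: any page whose template breaks the heuristic simply yields nothing, and the agent moves on.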

Not all of the information that these Agents find out in the wild will be deciphered. They’ll take an 80/20 approach and won’t get everything. But, as those aggregators gain traction and start sending traffic to other sites, publishers will begin ‘review optimizing’ or ‘calendar optimizing’. They’ll optimize to whatever standards the aggregation sites require, much as happens with search engine optimizers today. And, it is only a matter of time before the Intelligent Agents open up via API so that mashups can form the world over.

Voila, you have a semantic web. Maybe the standards of the semantic web will be those defined today, but more likely they’ll be whatever Google says they will be.
  • acewasabi

    excellent analysis. i’d add that MyYahoo was also an early adopter of RSS and that, alongside the momentum created by auto-published RSS feeds from blog sites, also contributed to RSS becoming a major standard.

    good comments also on Base vs intelligent agents.

  • acewasabi

    ps you have mastered the art of writing good headlines ;)

  • Great point about MyYahoo – they were the first RSS reader for the masses (and probably still are the way the vast majority of the population interacts with RSS).

  • Alex Hopmann also has an interesting post on the history of RSS. He worked at Microsoft back when RSS was being developed and it is interesting to see their perspective on its development.

  • Pingback: AlexHopmann.com » Technology - Semantic Web and Microformats

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. | Dave Naffziger’s Blog | Dave & Iva Naffziger