Future of Web Apps Day 2 Afternoon Session 1

October 4, 2007

Next is John Aizen and Eran Shir from Dapper to talk about transforming the existing web into the semantic web.

Most have generally come to agree that the semantic web vision has failed. Why? Because considerable effort is required to ‘semantify’ the web, and almost no one has spent the money.

RSS and APIs are starting to follow this vision. Mashmaker, BlueVision and Facebook are all doing SemWebby stuff.

What has really been happening to support SW?

First off, the concept of feeds. Second, the concept that less is more: REST wins over SOAP, microformats win over RDF/OWL.

Dapper aims to create a semantically meaningful layer over the top of the web. It lets users choose pages on the web and then separate form from content by assigning content to semantically named fields, which are then published as XML or other formats.
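To make the "separate form from content" idea concrete, here is a minimal Python sketch. It is not Dapper's implementation, just an illustration under stated assumptions: the mapping FIELD_MAP, the class names in it, the sample page and the function page_to_xml are all hypothetical. It maps presentational class names on a page to semantically named fields and publishes the result as XML.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical mapping from presentational class names on a page
# to semantically named fields (the mapping a user would define).
FIELD_MAP = {"hd-title": "title", "ing": "ingredient"}


class FieldExtractor(HTMLParser):
    """Collects the text of elements whose class appears in FIELD_MAP."""

    def __init__(self):
        super().__init__()
        self.current = None  # field name of the element being read, if any
        self.records = []    # (field, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        # Look up the element's class; unmapped elements yield None.
        self.current = FIELD_MAP.get(dict(attrs).get("class"))

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.records.append((self.current, data.strip()))


def page_to_xml(html, root_name="recipe"):
    """Extract the mapped fields from html and serialise them as XML."""
    extractor = FieldExtractor()
    extractor.feed(html)
    root = ET.Element(root_name)
    for field, text in extractor.records:
        ET.SubElement(root, field).text = text
    return ET.tostring(root, encoding="unicode")


# Made-up sample page for illustration.
SAMPLE = (
    '<div><h1 class="hd-title">Brownies</h1>'
    '<ul><li class="ing">chocolate</li><li class="ing">butter</li></ul></div>'
)

print(page_to_xml(SAMPLE))
```

The sketch only handles flat elements (a mapped element with nested child tags would lose part of its text), which is one small hint at why this kind of extraction is harder than it looks.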

Semantically based advertising is one interesting area of development. Search is also a bit of a holy grail.

The Dapper guys are now doing a demo of a search engine based on Dapper data. It’s pretty impressive, allowing users to query data semantically, for example to look for recipes where ingredient=chocolate or venue=London.
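As a rough illustration of what a field=value query means once content sits in named fields, here is a small Python sketch. The records, the field names and the query function are invented for the example; the demo's actual data model and query syntax were not shown in enough detail to reproduce.

```python
# Made-up records, shaped like the output of a field-extraction step.
RECORDS = [
    {"type": "recipe", "title": "Brownies", "ingredient": ["chocolate", "butter"]},
    {"type": "recipe", "title": "Scones", "ingredient": ["flour", "butter"]},
    {"type": "event", "title": "FOWA", "venue": ["London"]},
]


def query(records, **criteria):
    """Return the records for which every field=value criterion matches."""

    def matches(record):
        for field, wanted in criteria.items():
            values = record.get(field, [])
            if isinstance(values, str):
                values = [values]  # treat single-valued fields like lists
            if wanted.lower() not in (v.lower() for v in values):
                return False
        return True

    return [r for r in records if matches(r)]


print([r["title"] for r in query(RECORDS, ingredient="chocolate")])
print([r["title"] for r in query(RECORDS, venue="London")])
```

The point is that the query runs against named fields rather than page text, so ingredient=chocolate only matches records where chocolate is actually an ingredient.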

Questions...

Someone pointed out that this is based on screen scraping and asked how fragile that makes it. Dapper apparently has the means to deal with this, looking for inaccuracies in the data and responding accordingly.
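How Dapper actually detects inaccuracies wasn't explained, so the following Python sketch is purely an assumption about one simple approach: flag an extraction as suspect when too many records are missing fields they should have, which is what tends to happen when a scraped page changes its layout. The function names and the 0.5 threshold are hypothetical.

```python
def missing_fields(record, required_fields):
    """Return the required fields that are absent or empty in record."""
    return [f for f in required_fields if not record.get(f)]


def is_suspect(records, required_fields, max_bad_ratio=0.5):
    """Guess whether a scrape has broken.

    An empty result, or more than max_bad_ratio of records with
    missing fields, is treated as a sign that the page layout changed.
    The threshold is an arbitrary choice for illustration.
    """
    if not records:
        return True
    bad = sum(1 for r in records if missing_fields(r, required_fields))
    return bad / len(records) > max_bad_ratio


healthy = [{"title": "Brownies", "ingredient": ["chocolate"]}]
broken = [{"title": "", "ingredient": []}, {"title": "", "ingredient": ["flour"]}]

print(is_suspect(healthy, ["title", "ingredient"]))
print(is_suspect(broken, ["title", "ingredient"]))
```

A check like this can only notice missing data; it cannot tell when a field silently starts holding the wrong content, which is the harder half of the fragility question.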

It has always struck me that the biggest challenge for Dapper (or in fact anything SWebby) is explaining what is essentially a very complex topic. There is no two-minute elevator pitch for the Semantic Web.

7 comments
Mike

Elliot, irritatingly you put that far better than I've managed to in about 3 rounds of comments :-) - couldn't agree more... It looks from the link to your website that you're involved in a similar space? I'll go do some research.

Elliot

IMHO, neither a "top down" nor a "bottom up" approach to achieving the Semantic Web will be entirely successful. It's likely that the most successful SW approaches will leverage both techniques: relying on ontologies and other machine-parseable resources when they're available, but able to fall back to more "dirty" approaches such as screen-scraping when necessary.

Mike

I absolutely take on board that the SW isn't going to appear with the click of a mouse. I also think, however, that to dismiss these kinds of technologies is also foolish. Sure, from an academic perspective, the top-down approach isn't going to solve everything. But the web is a reasonable (!) size now and the quantity of content already on it without any semantic markup or any thought about how to make this markup work is - to understate hugely - considerable.

I've been involved in a number of workshops where we discussed that really we should refer to the Semantic Web (capital letters = the TBL vision, everything machine-connected, all working in perfect harmony) and also the semantic web (small letters = microformats, small pockets of connectedness, screen scraping, low level feeds). I carry this over into the Dapper approach. It ain't perfect, but it's useful - and can work as of today, *alongside* the more "pure" bottom-up approaches.

As per the bottom-up/top-down argument...how about the enormous takeup of RSS (top-down) as opposed to RDF (bottom-up)? Simpler technologies get adopted, harder ("better") ones often fail. REST has huge takeup while SOAP has little. RSS has huge takeup, RDF has little. DC has some takeup, "better" (but more complex) metadata schemas have little.

There's another very very good argument for giving these simpler approaches some time of day. Name me ANY Semantic Web application? Apart from a few very academic demos I can't say I've seen any... While this is the case, widescale adoption just won't happen, ever. And until adoption happens, semantic markup isn't going to happen either. Dapper et al at the very least adds functionality to the web *today* which will make the sw (or SW) better understood in a practical and not just academic sense.

Tom Morris

James is absolutely right about this. Yesterday's talk was yet more misaligning of the goals and aims of the Semantic Web, and complete ignorance of what Semantic Web developers are doing every day. The transition to the Semantic Web was never going to be a quick one - just as the transition from the web of FONT tags to the web of CSS wasn't a quick one, nor a painless one. Pure marketing. It's like standing up and saying "ovens are dead, we have microwaves!". Prediction: we are going to see a lot of companies and people stand up and say "hey look! The Semantic Web is dead and we have created the New Semantic Web and it's So Much Better because you don't have to worry about all that complex RDF stuff!" And they will all be wrong. You can't just make complexity disappear in a puff of smoke.

James

One centralized database of screen-scraped information about various resources on the Web - regardless of any level of semantics involved - will not create a Web of data, or the Semantic Web.

The options for getting the metadata out there may seem limited now, but people are still figuring that part out. For instance, one possible method of getting the metadata out there is to do what we have always done - don't involve the user in the technical details of what he's doing and make sure that it benefits him in some direct, quick, and obvious manner. We could also install plug-ins in our favorite blogging software and other CMSs that will do the heavy lifting for the user. A Wordpress plug-in that exports FOAF information about your relationships does more for the Semantic Web than Dapper will, unless Dapper plans on using the standards set forth by the W3C to expose this information publicly to the rest of the Web. Silos will not find a place in the Semantic Web, and an API to extract information from a silo probably isn't going to do much either.

I see a lot of people (and companies) quickly scrambling to put together these hackish (screen-scraping is the epitome of hackish) ideas of how they can bring about something *like* the Semantic Web, but they are falling short and look silly trying. The Semantic Web is the natural next-step in Web evolution. The funny thing about these companies is that you can tell they just fell short of "getting it." So, they try to build something like the Semantic Web and claim marvelous things (usually the exact same things the Semantic Web offers, only on even shakier grounds).

>>Bottom-up doesn’t work, and never will. Top-down is therefore surely the only way to try and do this.

Bottom-up is the only method that has ever worked for the Web. Do you know of any top-down services that have helped create the Web as it is today (proprietary or otherwise)? I'd like an example if you have one.

The top-down approach to creating the Semantic Web is nearly as important as the bottom-up approach. Top-down systems will allow us to extract CURRENTLY EXISTING information from the Web and bring it into the Semantic Web, while bottom-up ensures that all NEW information makes it into the Semantic Web in the appropriate form.

Mike

James, you're almost definitely right, but it would be good if you could elaborate a bit - why do you think scraping can't "do" Semantic Web...? I think it's generally recognised that any alternative ("hey, publish your site in RDF!") just isn't going to happen. No-one (probably not even the guys at Dapper) would claim that screen-scraping is anything other than a pretty nasty hack, but the options are severely limited. Bottom-up doesn't work, and never will. Top-down is therefore surely the only way to try and do this.

James

A replacement for the Semantic Web cannot be done by a proprietary screen scraping service.