Linked Data: my challenge

What with Gordon Brown’s recent (just an hour or so ago) announcement of lots of digital goodness at the “Building Britain’s Digital Future” event, the focus sharpens once again on Linked Data.

I’ve been sitting on the sidelines sniping gently at Linked Data since it apparently replaced the Semantic Web as The Next Big Thing. I remained cynical about the SW all the way through, and as of right now I remain cynical about Linked Data as well.

This might seem odd from someone obsessed with – and a clear advocate of – the opening up of data. I’ve blogged about, talked about and written papers about what I’ve come to call MRD (Machine Readable Data). I’ve gone so far as to believe that if it doesn’t have an API, it doesn’t – or shouldn’t – exist.

So what is my problem with Linked Data? Surely what Linked Data offers is the holy grail of MRD? Shouldn’t I be embracing it as everyone else appears to be?

Yes. I probably should.

But…Linked Data runs headlong into one of the things I also blog about all the time here, and the thing I believe in probably more than anything else: simplicity.

If there is one thing I think we should all have learned from RSS, simple APIs, YQL, Yahoo Pipes, Google Docs, etc., it is this: for a technology to gain traction it has to be not only accessible, but simple and usable, too.

Here’s how I see Linked Data as of right now:

1. It is completely entrenched in a community who are deeply technically focused. They’re nice people, but I’ve had a good bunch of conversations and never once has anyone been able to articulate for me the why or the how of Linked Data, and why it is better than focusing on simple MRD approaches, and in that lack of understanding we have a problem. I’m not the sharpest tool, but I’m not stupid either, and I’ve been trying to understand for a fair amount of time…

2. There are very few (read: almost zero) compelling use-cases for Linked Data. And I don’t mean the TBL “hey, imagine if you could do X” scenario, I mean real use-cases. Things that people have actually built. And no, Twine doesn’t cut it.

3. The entry cost is high – deeply arcane and overly technical, whilst the value remains low. Find me something you can do with Linked Data that you can’t do with an API. If the value was way higher, the cost wouldn’t matter so much. But right now, what do you get if you publish Linked Data? And what do you get if you consume it?

Now, I’m deeply aware that I don’t actually know much about Linked Data. But I’m also aware that if someone like me – with my background and interests – doesn’t know much about Linked Data, there is a massive problem somewhere in the chain.

I genuinely want to understand Linked Data. I want to be a Linked Data advocate in the same way I’m an API/MRD advocate. So here is my challenge, and it is genuinely an open one. I need you, dear reader, to show me:

1. Why I should publish Linked Data. The “why” means I want to understand the value returned by the investment of time required, and by this I mean compelling, possibly visual and certainly useful examples

2. How I should do this, and easily. If you need to use the word “ontology” or “triple” or make me understand the deepest horrors of RDF, consider your approach a failed approach

3. Some compelling use-cases which demonstrate that this is better than a simple API/feed based approach

There you go – the challenge is on. Arcane technical types need not apply.

Are synapses intelligent?

It’s hard not to be fascinated by the emerging and developing conversations around museums and the Semantic Web. Museums, apart from anything else, have lots of stuff, and a constant problem finding ways of intelligently presenting and cross-linking that stuff. Search is ok if you know what you’re looking for but browse as an alternative is usually a terribly pedestrian experience, failing to match the serendipity and excitement you get in a physical exhibition or gallery.

During the Museums and the Web conference, there was a tangible thread of conversation and thought around the API’d museum, better ways of doing search, and varied opinions about openness and commerce, but always there was the endless tinnitus of the semantic web never far away from people’s consciousnesses.

As well as the ongoing conversation, there were some planned moments as well, among them a workshop run by Eric Miller (ex-W3C sem web guru), Ross Parry’s presentation and discussion of the “Cultural Semantic Web” AHRC-funded think tank and the coolness of Open Calais being applied to museum collections data by Seb Chan at the Powerhouse (article on ReadWrite Web here – nice one Seb!).

During the week I also spent some time hanging out with George Oates and Aaron Straup Cope from Flickr, and it’s really from their experiences that some thoughts started to emerge which I’ve been massaging to the surface ever since.

Over a bunch of drinks, George told me a couple of fairly mind-blowing statistics about the quantity of data on Flickr: more than 2 billion images, with more being uploaded at a rate of over 3 million a day…

What comes with these uploads is data – huge, vast, obscene quantities of data – tags, users, comments, links. And that vat of information has a value which is hugely amplified because of the sheer volume of stuff.

To take an example: at the individual tag level, the flaws of misspellings and inaccuracies are annoying and troublesome, but at a meta level these inaccuracies are ironed out; flattened by sheer mass: a kind of bell-curve peak of correctness. At the same time, inferences can be drawn from the connections and proximity of tags. If the word “cat” appears consistently – in millions and millions of data items – next to the word “kitten” then the system can start to make some assumptions about the related meaning of those words. Out of the apparent chaos of the folksonomy – the lack of formal vocabulary, the anti-taxonomy – comes a higher-level order. Seb put it the other way round by talking about the “shanty towns” of museum data: “examine order and you see chaos”.
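To make the “cat” and “kitten” idea a bit more concrete, here is a minimal sketch of counting which tags turn up together. It is emphatically not how Flickr does it (I have no idea how they do it); the function names, the `min_count` threshold and the sample photos are all made up for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(tagged_items):
    """Count how often each unordered pair of tags appears on the same item."""
    pair_counts = Counter()
    for tags in tagged_items:
        # Normalise case and drop duplicate tags within a single item
        unique = sorted({t.strip().lower() for t in tags if t.strip()})
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def related_tags(tag, pair_counts, min_count=2):
    """Return tags seen alongside `tag` at least `min_count` times, most frequent first."""
    tag = tag.lower()
    related = Counter()
    for (a, b), n in pair_counts.items():
        if n < min_count:
            continue  # rare pairings (often misspellings) are ignored
        if a == tag:
            related[b] = n
        elif b == tag:
            related[a] = n
    return [t for t, _ in related.most_common()]

# Made-up sample data, for illustration only
photos = [
    ["cat", "kitten", "cute"],
    ["Cat", "kitten"],
    ["cat", "kitten", "garden"],
    ["cat", "kiten"],      # a misspelling that only appears once
    ["dog", "garden"],
]

counts = cooccurrence_counts(photos)
print(related_tags("cat", counts))
```

With five photos this is a toy, but the shape of the thing is the point: the one-off misspelling falls below the threshold while the pairing that keeps recurring survives, and that effect only gets stronger as the pile of data grows.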

The total “value” of the data, in other words, really is way, way greater than the sum of the parts.

This is massively, almost inconceivably powerful. I talked with Aaron about how this might one day be released as a Flickr API: a way of querying the “clusters” in order to get further meaning from phrases or words submitted. He remained understandably tight-lipped about the future of Flickr, but conceptually this is an important idea, and leads the thinking in some interesting directions.

On the web, ideas like the wisdom of crowds or massively distributed systems are hardly new. We really is better than me.

I got thinking about how this can all be applied to the Semantic Web. It increasingly strikes me that the distributed nature of the machine processable, API-accessible web carries many similar hallmarks. Each of those distributed systems – the Yahoo! Content Analysis API, the Google postcode lookup, Open Calais – is essentially a dumb system. But hook them together; start to patch the entire thing into a distributed framework, and things take on an entirely different complexion.

I’ve supped many beers with many people over “The Semantic Web”. Some have been hardcore RDF types – with whom I usually lose track at about paragraph three of our conversation, but stumble blindly on in true “just be confident, hopefully no-one will notice you don’t know what you’re talking about” style. Others have been more “like me” – in favour of the lightweight, top-down, “easy” approach. Many people I’ve talked to have simply not been given (or able to give) any good examples of what or why – and the enduring (by now slightly stinky, embarrassing and altogether fishy) albatross around the neck of anything SW is that no-one seems to be doing it in ways that anyone ~even vaguely normal~ can understand.

Here’s what I’m starting to gnaw at: maybe it’s here. Maybe if it quacks like a duck, walks like a duck (as per the recent Becta report by Emma Tonkin at UKOLN) then it really is a duck. Maybe the machine-processable web that we see in mashups, APIs, RSS, microformats – the so-called “lightweight” stuff that I’m forever writing about – maybe that’s all we need. Like the widely accepted notion of scale and we-ness in the social and tagged web, perhaps these dumb synapses when put together are enough to give us the collective intelligence – the Semantic Web – that we have talked and written about for so long.

Here’s a wonderful quote from Emma’s paper to finish:

“By ‘semantic’, Berners-Lee means nothing more than ‘machine processable’. The choice of nomenclature is a primary cause of confusion on both sides of the debate. It is unfortunate that the effort was not named ‘the machineprocessable web’ instead.”

Semanticism. Semanticness. Semanticitivity.

Ever found yourself struggling to answer the question “but what is the Semantic Web? Can’t you give me an example…”?

When I was talking at a UKSG seminar recently, one of the delegates asked one of the presenters exactly this – how the Semantic Web might work in practice. The response was slightly woolly – arguably like pretty much everything to do with the entire notion of “semanticness” 😉

Now, thanks to the video below from True Knowledge (thanks Simon!) it all makes a little more sense.

[Embedded video from True Knowledge, hosted on blip.tv]

Microformats: added to my TWTOD

I spend a lot of my time talking and living tech. Actually, I spend *most* of my time talking and living tech…and yet I’m not in any way a hardcore techy: a fact that often amuses or bemuses those people around me. I don’t really understand the detail behind TCP/IP, I only know the basic principles behind RDF, I’m not sure I really care too much about Linux and I am fearful for my life when reading even the first few lines of anything OpenSourcey.

This makes me quite dangerous. I geddit enough to say things like “wouldn’t it be cool if…” but not enough to understand the (what I see as) mindblowingly dull history there seems to be behind every single standard, technology, approach. This doesn’t in any way make me immune to tech-lust, you understand, it’s just that I like to understand what a technology does for users before getting too much into the tech itself.

Anyway. This post was supposed to be about something else and I’ve gone off on a rant again…

Ah yes. I’ve been thinking about and playing with microformats again. The more I play, the more I think they’re immensely cool, and the less I listen to the noise of the Hardcore Semantic Webbers, who claim (for reasons I don’t really understand) that microformats (and in fact anything top-down) are in some way “cheating”.

Microformats tick everything in my TWTOD (“technologies worth the time of day”) checklist:

1. Does it do something useful or add value to the user experience?
2. Can it be implemented quickly?
3. Is it easy to understand?
4. Is it easy (no, REALLY) for a user to…USE?
5. Do people other than geeks get value out of it?

Here’s the elevator pitch about what microformats are and why they are cool: You add agreed class names to the HTML tags around various bits of your content. Non-human users of your site (computers, probably..) will then know that the thing you’ve marked as a “telephone number” is actually a telephone number and not another 11-digit thingy, or that a particular segment of text is referring to an address or an event. Human users of your site don’t see anything different, unless they’ve got something like the Operator plugin, in which case they get the option to do interesting things.
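For the non-human side of that pitch, here is a rough sketch of how a program might pull the marked-up bits back out of a page. It assumes the hCard microformat’s class names (`vcard`, `fn`, `tel`, `locality`); the `ClassTextExtractor` and `extract` names and the sample contact are made up, and a real parser such as Operator will be doing a good deal more than this.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack
VOID_ELEMENTS = {"br", "img", "hr", "input", "meta", "link"}

class ClassTextExtractor(HTMLParser):
    """Collect the text inside any element carrying one of the wanted class names."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.stack = []   # one entry per open element: a wanted class name, or None
        self.found = {}   # class name -> list of text strings

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        classes = (dict(attrs).get("class") or "").split()
        match = next((c for c in classes if c in self.wanted), None)
        self.stack.append(match)
        if match:
            self.found.setdefault(match, []).append("")

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text belongs to every wanted element that is currently open
        for name in self.stack:
            if name:
                self.found[name][-1] += data

def extract(html, wanted):
    """Return {class name: [text, ...]} for the wanted class names found in `html`."""
    parser = ClassTextExtractor(wanted)
    parser.feed(html)
    parser.close()
    return {name: [text.strip() for text in texts]
            for name, texts in parser.found.items()}

# Made-up contact details, marked up with hCard class names
SAMPLE = """
<div class="vcard">
  <span class="fn">Jane Example</span>
  <span class="tel">01234 567890</span>
  <span class="adr"><span class="locality">Bath</span></span>
</div>
"""

print(extract(SAMPLE, ["fn", "tel", "locality"]))
```

The human reader just sees a name, a number and a town. The machine now knows which is which, and nothing about the page had to change except a few class attributes.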

If you’re anything like me you’ll love an example from the user perspective, so here it is. I’ve already installed the Operator plugin, by the way (a downside in the TWTOD equation, but a minor one, particularly if browsers start to become more microformat-centric)

1. I browse to a particular event on a microformat supporting site – Upcoming.org in this example

[Image: xmas]

2. My Operator plugin lights up telling me there are microformats on the page (a “normal” user wouldn’t notice a thing..)

[Image: operator]

Note that here they’ve embedded lots of stuff: clicking on the “addresses” dropdown gives me the option of linking straight through to a Google or Yahoo map of the event location, clicking “contacts” lets me add to my Outlook contact list.

3. In this case though, I choose “events” and then “Export Event” to add the event to my Outlook calendar:

[Image: appointment]
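I haven’t looked inside Operator, but an export like the one in step 3 presumably boils down to turning the marked-up event fields into a calendar file that Outlook can open. A minimal sketch, assuming the fields have already been pulled out of the page and that the target is the iCalendar format; the function names, the UID and the event details are all made up, and long-line folding is left out for brevity.

```python
from datetime import datetime, timezone

def escape_text(value):
    """Escape characters that are special in iCalendar TEXT values."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))

def to_utc_stamp(moment):
    """Format a timezone-aware datetime as an iCalendar UTC timestamp."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")

def event_to_ics(summary, location, start, end, uid, stamp):
    """Build a single-event iCalendar document as a string."""
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//microformat-export-sketch//EN",
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{to_utc_stamp(stamp)}",
        f"DTSTART:{to_utc_stamp(start)}",
        f"DTEND:{to_utc_stamp(end)}",
        f"SUMMARY:{escape_text(summary)}",
        f"LOCATION:{escape_text(location)}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar lines are separated by CRLF
    return "\r\n".join(lines) + "\r\n"

# Made-up event details, for illustration only
ics = event_to_ics(
    summary="Christmas party, with mince pies",
    location="Bath",
    start=datetime(2007, 12, 20, 19, 0, tzinfo=timezone.utc),
    end=datetime(2007, 12, 20, 23, 0, tzinfo=timezone.utc),
    uid="example-event-1@example.invalid",
    stamp=datetime(2007, 12, 1, 12, 0, tzinfo=timezone.utc),
)
print(ics)
```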

Now how cool is that? Semantic Webby functionality with minimal effort from the site owner. And real gains in functionality for the end user.

Any reason not to do this? None that I can think of…