Linked Data: my challenge

What with Gordon Brown’s recent (just an hour or so ago) announcement of lots of digital goodness at the “Building Britain’s Digital Future” event, the focus sharpens once again on Linked Data.

I’ve been sitting on the sidelines sniping gently at Linked Data since it apparently replaced the Semantic Web as The Next Big Thing. I remained cynical about the SW all the way through, and as of right now I remain cynical about Linked Data as well.

This might seem odd from someone obsessed with – and a clear advocate of – opening up data. I’ve blogged about, talked about and written papers about what I’ve come to call MRD (Machine Readable Data). I’ve gone so far as to believe that if it doesn’t have an API, it doesn’t – or shouldn’t – exist.

So what is my problem with Linked Data? Surely what Linked Data offers is the holy grail of MRD? Shouldn’t I be embracing it as everyone else appears to be?

Yes. I probably should.

But…Linked Data runs headlong into one of the things I also blog about all the time here, and the thing I believe in probably more than anything else: simplicity.

If there is one thing I think we should all have learned from RSS, simple APIs, YQL, Yahoo Pipes, Google Docs, etc., it is this: for a technology to gain traction it has to be not only accessible, but simple and usable, too.
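To make "simple" concrete, here is a minimal sketch of reading an RSS feed using nothing but Python's standard library. The feed is made up and inlined so the example runs on its own; the `rss_items` helper and the example.org URLs are my own inventions for illustration, not anything from a real service.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed, inlined so the example runs without a network call.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example museum blog</title>
    <item>
      <title>New portraits online</title>
      <link>http://example.org/posts/1</link>
    </item>
    <item>
      <title>Opening hours this summer</title>
      <link>http://example.org/posts/2</link>
    </item>
  </channel>
</rss>"""


def rss_items(feed_xml):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]


for title, link in rss_items(SAMPLE_FEED):
    print(title, "->", link)
```

That is the whole thing: parse, loop, use. Whatever you think of RSS, this is the bar any competing approach has to clear.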

Here’s how I see Linked Data as of right now:

1. It is completely entrenched in a community who are deeply technically focused. They’re nice people, but I’ve had a good bunch of conversations and never once has anyone been able to articulate for me the why or the how of Linked Data, and why it is better than focusing on simple MRD approaches, and in that lack of understanding we have a problem. I’m not the sharpest tool, but I’m not stupid either, and I’ve been trying to understand for a fair amount of time…

2. There are very few (read: almost zero) compelling use-cases for Linked Data. And I don’t mean the TBL “hey, imagine if you could do X” scenario, I mean real use-cases. Things that people have actually built. And no, Twine doesn’t cut it.

3. The entry cost is high – deeply arcane and overly technical – whilst the value remains low. Find me something you can do with Linked Data that you can’t do with an API. If the value was way higher, the cost wouldn’t matter so much. But right now, what do you get if you publish Linked Data? And what do you get if you consume it?

Now, I’m deeply aware that I don’t actually know much about Linked Data. But I’m also aware that for someone like me – with my background and interests – to not know much about Linked Data, there is a massive problem somewhere in the chain.

I genuinely want to understand Linked Data. I want to be a Linked Data advocate in the same way I’m an API/MRD advocate. So here is my challenge, and it is genuinely an open one. I need you, dear reader, to show me:

1. Why I should publish Linked Data. The “why” means I want to understand the value returned by the investment of time required, and by this I mean compelling, possibly visual and certainly useful examples

2. How I should do this, and easily. If you need to use the word “ontology” or “triple” or make me understand the deepest horrors of RDF, consider your approach a failed approach

3. Some compelling use-cases which demonstrate that this is better than a simple API/feed based approach

There you go – the challenge is on. Arcane technical types need not apply.

The whole NPG / Wikimedia thing

There’s acres and acres of stuff to read and write about the whole National Portrait Gallery legal action threat against Wikimedia contributor Dcoetzee and his addition to the Wikimedia collection. I’m not going to try and add to the noise too much but it would seem apposite to at least comment given my current thread of presentations and posts is all about freedom, openness and MRD.

As always (just like the argument currently brewing about Free), there are two possible dangers in any debate like this. First, we go into too much detail and lose the view of the house because we’re examining the bricks too closely. Second, we polarise the debate.

I’m good at polarising, being a bear of simple brain – particularly when it comes to copyright. Simply, I don’t think it works in many cases, and I think this particular example holds – on many levels – great reasons as to why not. Cross-country, cross-domain, cross-sector, hidden images, non-hidden images, etc etc. This level of complexity doesn’t hold well with users, and they will abuse it, either knowingly or unknowingly.

Having said that, there are clearly two sides to this particular debate, and actually I think both sides are being pretty reasonable. NPG have offered medium-sized pictures; Wikimedia has been on the case for some years seeking access to these (arguably) public domain images. The discussion over the detail in this particular case will ramble on; the legal threat will be sorted out of court; everyone will ultimately go away at least semi-happy.

The bigger picture is the more important question, and it is this: why are cultural institutions putting collections (images) online? I ask this as an open question, as un-loaded as it can be (given you probably know where I’m coming from on this).

The possible answers are these (none is mutually exclusive, by the way):

  • to sell them / variations of them, such as prints, etc
  • to increase exposure to them
  • to increase exposure to the holding institution
  • to increase ticket sales / physical visits to the holding institution

So with these in mind, I think the important questions in this particular debate are not about the devil detail of cross-country copyright or whether Dcoetzee “should” have done what he did. I think they are:

  • does the exposure on Wikimedia increase exposure? (Answer: yes)
  • does exposure of hi-res pictures stop people from buying them? (Answer: unknown, but possibly not)
  • does the exposure of the images improve the standing of the institution (as being a place that “has a great collection”)? (Answer: yes)
  • does the exposure of the images increase click-through to the NPG website (and hence, assuming at least some kind of connection between traffic and physical visits, physical visits too)? (Answer: unknown – I’m about to submit a FOI request to see if we can find out, but probably yes)
  • does the threat of legal action make NPG look good? (Answer: not really)

There are some great questions here, which I’ve been asking our sector to answer for a while. Where is value in a networked age? How does virtual equate to physical? Does exposure increase or decrease physical sales? (Go ask Anderson or Gladwell this one…)

Just as a closing thought, I wonder if the NPG will be chasing Yahoo! for this YQL query or Google Images for this one? I suspect not.

Pushing MRD out from under the geek rock

The week before last (30th June – 1st July 2009), I was at the JISC Digital Content Conference having been asked to take part in one of their parallel sessions.

I thought I’d use the session to talk about something I’m increasingly interested in – the shifting of the message about machine readable data (think APIs, RSS, OpenSearch, Microformats, Linked Data, etc) from the world of geek to the world of non-geek.

My slides are here:

[slideshare id=1714963&doc=dontthinkwebsitesthinkdatafinal-090713100859-phpapp02]

Here’s where I’m at: I think that MRD (that’s Machine Readable Data – I couldn’t seem to find a better term…) is probably about as important as it gets. It underpins an entire approach to content which is flexible, powerful and open. It embodies notions of freely moving data, and it encourages innovation and visualisation. It is also not nearly as hard as it appears – or doesn’t have to be.
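As a rough illustration of "not nearly as hard as it appears", here is a minimal sketch of turning a couple of collection records into JSON that another program could consume. The records, the field names and the `records_as_json` helper are all invented for the example; a real collection system would have its own fields and would serve this over the web rather than printing it.

```python
import json

# Hypothetical collection records; the field names are invented for illustration.
RECORDS = [
    {"id": 1, "title": "Portrait of a lady", "year": 1785},
    {"id": 2, "title": "Self-portrait", "year": 1902},
]


def records_as_json(records, after_year=None):
    """Serialise records to JSON, optionally keeping only those made after a given year."""
    if after_year is not None:
        records = [r for r in records if r["year"] > after_year]
    return json.dumps({"count": len(records), "records": records}, indent=2)


print(records_as_json(RECORDS, after_year=1800))
```

The point is not the code itself but how little of it there is: the same records that feed a web page can be handed to a machine with a few extra lines.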

In the world of the geek (that’s a world I dip into long enough to see the potential before heading back out here into the sun), the proponents of MRD are many and passionate. Find me a Web2.0 application without an API (or one “on the development road-map”) and I’ll find you a pretty unusual company.

These people don’t need preaching at. They’re there, lined up, building apps for Twitter (to the tune of 10x the traffic which visits twitter.com), developing a huge array of services and visualisations, graphs, maps, inputs and outputs.

The problem isn’t the geeks. The problem is that MRD needs to move beyond the realm of the geek and into the realm of the content owner, the budget holder, the strategist, for these technologies to become truly embedded. We need to have copyright holders and funders lined up at the start of the project, prepared for the fact that our content will be delivered through multiple access routes, across unspecified timespans and to unknown devices. We need our specifications to be focused on re-purposing, not on single-point delivery. We need solution providers delivering software with web APIs built in. We need to be prepared for a world in which no-one visits our websites any more, instead picking, choosing and mixing our content from externally syndicated channels.

In short, we now need the relevant people evangelising about the MRD approach.

Geeks have done this well so far, but now they need help. Try searching on “ROI for API’s” (or any combination thereof) and you’ll find almost nothing: very little evidence outlining how much APIs cost to implement, what cost savings you are likely to see from them, or how they reduce content development time, and few guidelines on how to deal with syndicated content copyright issues.

Partly, this knowledge gap is because many of the technologies we’re talking about are still quite young. But a lot of the problem is about the communication of technology, the divided worlds that Nick Poole (Collections Trust) speaks about. This was the core of my presentation: ten reasons why MRD is important, from the perspective of a non-geek (links go to relevant slides and examples in the slide deck):

  1. Content is still king
  2. Re-use is not just good, it’s essential
  3. “Wouldn’t it be great if…”: Life is easier when everyone can get at your data
  4. Content development is cheaper
  5. Things get more visual
  6. Take content to users, not users to content (“If you build it, they probably won’t come”)
  7. It doesn’t have to be hard
  8. You can’t hide your content
  9. We really is bigger and better than me
  10. Traffic

All of this is just a starter for ten. Bigger, better and more informed people than me probably have another hundred reasons why MRD is a good idea. I think this knowledge may be there – we just need to surface and collect it so that more (of the right) people can benefit from these approaches.