April 17, 2008
It’s hard not to be fascinated by the emerging and developing conversations around museums and the Semantic Web. Museums, apart from anything else, have lots of stuff, and a constant problem finding ways of intelligently presenting and cross-linking that stuff. Search is OK if you know what you’re looking for, but browse as an alternative is usually a terribly pedestrian experience, failing to match the serendipity and excitement you get in a physical exhibition or gallery.
During the Museums and the Web conference, there was a tangible thread of conversation and thought around the API’d museum, better ways of doing search, and varied opinions about openness and commerce, but always there was the endless tinnitus of the semantic web never far away from people’s consciousnesses.
As well as the ongoing conversation, there were some planned moments too, among them a workshop run by Eric Miller (ex-W3C sem web guru), Ross Parry’s presentation and discussion of the “Cultural Semantic Web” AHRC-funded think tank, and the coolness of Open Calais being applied to museum collections data by Seb Chan at the Powerhouse (article on ReadWriteWeb here – nice one Seb!).
During the week I also spent some time hanging out with George Oates and Aaron Straup Cope from Flickr, and it’s really from their experiences that some thoughts started to emerge which I’ve been massaging to the surface ever since.
Over a bunch of drinks, George told me a couple of fairly mind-blowing statistics about the quantity of data on Flickr: more than 2 billion images, with new ones being uploaded at a rate of more than 3 million a day…
What comes with these uploads is data – huge, vast, obscene quantities of data – tags, users, comments, links. And that vat of information has a value which is hugely amplified because of the sheer volume of stuff.
To take an example: at the individual tag level, the flaws of misspellings and inaccuracies are annoying and troublesome, but at a meta level these inaccuracies are ironed out; flattened by sheer mass: a kind of bell-curve peak of correctness. At the same time, inferences can be drawn from the connections and proximity of tags. If the word “cat” appears consistently – in millions and millions of data items – next to the word “kitten” then the system can start to make some assumptions about the related meaning of those words. Out of the apparent chaos of the folksonomy – the lack of formal vocabulary, the anti-taxonomy – comes a higher-level order. Seb put it the other way round by talking about the “shanty towns” of museum data: “examine order and you see chaos”.
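To make the “cat” and “kitten” idea concrete, here is a minimal sketch of tag co-occurrence counting. It is purely illustrative and is not how Flickr actually does it: the function names `cooccurrence_counts` and `related_tags` and the tiny `photos` list are all made up for the example, and a real system would work over millions of items and normalise for how common each tag is.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(photos):
    """Count how often each unordered pair of tags appears on the same photo."""
    pair_counts = Counter()
    for tags in photos:
        # Normalise case and drop duplicate tags within a single photo.
        unique = sorted({t.strip().lower() for t in tags if t.strip()})
        # Sorting first means each pair is always stored as (smaller, larger).
        pair_counts.update(combinations(unique, 2))
    return pair_counts


def related_tags(tag, pair_counts, top_n=3):
    """Return the tags that most often share a photo with `tag`."""
    tag = tag.strip().lower()
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == tag:
            scores[b] += n
        elif b == tag:
            scores[a] += n
    return [t for t, _ in scores.most_common(top_n)]


# Hypothetical sample data: each inner list is the tag set of one photo.
photos = [
    ["cat", "kitten", "cute"],
    ["cat", "kitten"],
    ["Cat", "kitten", "garden"],
    ["cat", "kittin"],  # a misspelling, outvoted by the correct form
    ["dog", "puppy"],
    ["dog", "garden"],
]

counts = cooccurrence_counts(photos)
closest_to_cat = related_tags("cat", counts, top_n=1)
```

Even in this toy data the misspelt “kittin” shows up only once next to “cat” while “kitten” shows up three times, so the strongest neighbour of “cat” is the correctly spelt tag. That is the ironing-out effect in miniature.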
The total “value” of the data, in other words, really is way, way greater than the sum of the parts.
This is massively, almost inconceivably powerful. I talked with Aaron about how this might one day be released as a Flickr API: a way of querying the “clusters” in order to get further meaning from phrases or words submitted. He remained understandably tight-lipped about the future of Flickr, but conceptually this is an important idea, and leads the thinking in some interesting directions.
I got thinking about how this can all be applied to the Semantic Web. It increasingly strikes me that the distributed nature of the machine processable, API-accessible web carries many similar hallmarks. Each of those distributed systems – the Yahoo! Content Analysis API, the Google postcode lookup, Open Calais – is essentially a dumb system. But hook them together, start to patch the entire thing into a distributed framework, and things take on an entirely different complexion.
I’ve supped many beers with many people over “The Semantic Web”. Some have been hardcore RDF types – with whom I usually lose track at about paragraph three of our conversation, but stumble blindly on in true “just be confident, hopefully no-one will notice you don’t know what you’re talking about” style. Others have been more “like me” – in favour of the lightweight, top-down, “easy” approach. Many people I’ve talked to have simply not been given (or able to give) any good examples of what or why – and the enduring (by now slightly stinky, embarrassing and altogether fishy) albatross around the neck of anything SW is that no-one seems to be doing it in ways that anyone ~even vaguely normal~ can understand.
Here’s what I’m starting to gnaw at: maybe it’s here. Maybe if it quacks like a duck, walks like a duck (as per the recent Becta report by Emma Tonkin at UKOLN) then it really is a duck. Maybe the machine-processable web that we see in mashups, APIs, RSS, microformats – the so-called “lightweight” stuff that I’m forever writing about – maybe that’s all we need. Like the widely accepted notion of scale and we-ness in the social and tagged web, perhaps these dumb synapses when put together are enough to give us the collective intelligence – the Semantic Web – that we have talked and written about for so long.
Here’s a wonderful quote from Emma’s paper to finish:
“By ‘semantic’, Berners-Lee means nothing more than ‘machine processable’. The choice of nomenclature is a primary cause of confusion on both sides of the debate. It is unfortunate that the effort was not named ‘the machine-processable web’ instead.”