Are synapses intelligent?

April 17, 2008

It’s hard not to be fascinated by the emerging and developing conversations around museums and the Semantic Web. Museums, apart from anything else, have lots of stuff, and a constant problem finding ways of intelligently presenting and cross-linking that stuff. Search is ok if you know what you’re looking for but browse as an alternative is usually a terribly pedestrian experience, failing to match the serendipity and excitement you get in a physical exhibition or gallery.

During the Museums and the Web conference, there was a tangible thread of conversation and thought around the API’d museum, better ways of doing search, and varied opinions about openness and commerce, but always there was the endless tinnitus of the semantic web never far away from people’s consciousnesses.

As well as the ongoing conversation, there were some planned moments as well, among them a workshop run by Eric Miller (ex-W3C sem web guru), Ross Parry’s presentation and discussion of the “Cultural Semantic Web” AHRC-funded think tank and the coolness of Open Calais being applied to museum collections data by Seb Chan at the Powerhouse (article on ReadWrite Web here – nice one Seb!).

During the week I also spent some time hanging out with George Oates and Aaron Straup Cope from Flickr, and it’s really from their experiences that some thoughts started to emerge which I’ve been massaging to the surface ever since.

Over a bunch of drinks, George told me a couple of fairly mind-blowing statistics about the quantity of data on Flickr: more than 2 billion images which are being uploaded at a rate of more than 3 million a day….

What comes with these uploads is data – huge, vast, obscene quantities of data – tags, users, comments, links. And that vat of information has a value which is hugely amplified because of the sheer volume of stuff.

To take an example: at the individual tag level, the flaws of misspellings and inaccuracies are annoying and troublesome, but at a meta level these inaccuracies are ironed out; flattened by sheer mass: a kind of bell-curve peak of correctness. At the same time, inferences can be drawn from the connections and proximity of tags. If the word “cat” appears consistently – in millions and millions of data items – next to the word “kitten” then the system can start to make some assumptions about the related meaning of those words. Out of the apparent chaos of the folksonomy – the lack of formal vocabulary, the anti-taxonomy – comes a higher-level order. Seb put it the other way round by talking about the “shanty towns” of museum data: “examine order and you see chaos”.
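To make the “cat” and “kitten” point concrete, here is a minimal sketch of counting which tags turn up together. It is not how Flickr does it (I have no idea how they do it); the function names and the sample photos are made up purely for illustration. The idea is only that a misspelt tag barely registers once the counts pile up.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tagged_items):
    """Count how often each unordered pair of tags appears on the same item."""
    pair_counts = Counter()
    for tags in tagged_items:
        # Normalise case and whitespace, and drop duplicate tags within one item.
        unique = sorted({t.strip().lower() for t in tags if t.strip()})
        pair_counts.update(combinations(unique, 2))
    return pair_counts


def related_tags(tag, pair_counts, top_n=3):
    """Return the tags seen most often alongside `tag`, most frequent first."""
    tag = tag.strip().lower()
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == tag:
            scores[b] += n
        elif b == tag:
            scores[a] += n
    return scores.most_common(top_n)


# Hypothetical sample data: each inner list is the tag set of one photo.
photos = [
    ["cat", "kitten", "cute"],
    ["cat", "kitten"],
    ["Cat", "kitten", "sofa"],
    ["cat", "kittne"],  # a misspelling, swamped by the correct spelling
    ["dog", "puppy"],
    ["cat", "sofa"],
]

counts = cooccurrence_counts(photos)
print(related_tags("cat", counts, top_n=2))  # [('kitten', 3), ('sofa', 2)]
```

With six photos the misspelt “kittne” still shows up with a count of one; the argument above is that with millions of items it would be flattened into irrelevance.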

The total “value” of the data, in other words, really is way, way greater than the sum of the parts.

This is massively, almost inconceivably powerful. I talked with Aaron about how this might one day be released as a Flickr API: a way of querying the “clusters” in order to get further meaning from phrases or words submitted. He remained understandably tight-lipped about the future of Flickr, but conceptually this is an important idea, and leads the thinking in some interesting directions.

On the web, ideas like the wisdom of crowds or massively distributed systems are hardly new. We really is better than me.

I got thinking about how this can all be applied to the Semantic Web. It increasingly strikes me that the distributed nature of the machine-processable, API-accessible web carries many similar hallmarks. Each of those distributed systems – the Yahoo! Content Analysis API, the Google postcode lookup, Open Calais – is essentially a dumb system. But hook them together; start to patch the entire thing into a distributed framework, and things take on an entirely different complexion.

I’ve supped many beers with many people over “The Semantic Web”. Some have been hardcore RDF types – with whom I usually lose track at about paragraph three of our conversation, but stumble blindly on in true “just be confident, hopefully no-one will notice you don’t know what you’re talking about” style. Others have been more “like me” – in favour of the lightweight, top-down, “easy” approach. Many people I’ve talked to have simply not been given (or able to give) any good examples of what or why – and the enduring (by now slightly stinky, embarrassing and altogether fishy) albatross around the neck of anything SW is that no-one seems to be doing it in ways that anyone ~even vaguely normal~ can understand.

Here’s what I’m starting to gnaw at: maybe it’s here. Maybe if it quacks like a duck, walks like a duck (as per the recent Becta report by Emma Tonkin at UKOLN) then it really is a duck. Maybe the machine-processable web that we see in mashups, APIs, RSS, microformats – the so-called “lightweight” stuff that I’m forever writing about – maybe that’s all we need. Like the widely accepted notion of scale and we-ness in the social and tagged web, perhaps these dumb synapses when put together are enough to give us the collective intelligence – the Semantic Web – that we have talked and written about for so long.

Here’s a wonderful quote from Emma’s paper to finish:

“By ‘semantic’, Berners-Lee means nothing more than ‘machine processable’. The choice of nomenclature is a primary cause of confusion on both sides of the debate. It is unfortunate that the effort was not named ‘the machineprocessable web’ instead.”

8 comments
miaridge

Hi Owen, when you say, "I suspect the problem with structure is that in order for it to be meaningful we have to all agree on the same structure", is that something you think or an attitude you encounter? Can you expand on this a bit more? RSS or microformats do seem like a solution that's "good enough" for the purposes of our projects in the meantime, rather than waiting for a top-down project to have meetings for five years and agree on the ideal museum metadata structure. The artwork microformat (http://microformats.org/wiki/work-of-art) is one lightweight solution; it wouldn't work for all museum objects but it's a start. I'm wondering if the idea of the 'perfect structure' that can deal with all the peculiarities of all possible museum objects is a bit like the idea of the 'ontology of everything' that put people off putting out their local ontologies as they are and finding ways to link them to other ontologies as they're used. cheers, Mia (hi Mikes!)

Mike

Mike L - "semimantic"! - genius! You should be in marketing :-) I haven't looked into hList but am Googling as I speak... re. synapses - damn, I had a suspicion I hadn't researched enough... Mike

Mike L

Mike, it is finally getting exciting to watch. I could never understand the negativity of many people toward the idea of a data web, but as you imply it was partially the 'heavyweight' theoretical approach that scared people off. Standards and a bit of clever programming can achieve a lot. The 'semimantic' approach - do some structure now where we can - is working and bringing some of the sw concepts into the mainstream. Yahoo / LinkedIn's announcements are a hugely significant step. The newish 'hList' microformat is possibly the start of something the cultural sector could use with its core content. I'd say it's not a duck yet - I think we're still in the pre-Cambrian here, but it has a backbone like a duck's and it's effectively able to waddle. BTW single synapses can hold memory, and be a part of many - Eric Kandel got a Nobel for this recently - they're the mechanism not the information - don't underestimate them...

Owen Stephens

I think we agree on quite a lot of this. The unfortunate (from my PoV) part of Google's attitude (which I don't blame them for) is that Google is able to define 'commonly supported'. So, they introduce something like 'Google Sitemap', and loads of people start using it. They aren't a neutral party. An announcement from Google that they were going to explicitly use hCard etc. along the lines of the recent Yahoo announcement would change people's attitude to it - suddenly everyone would want their data available in that format. I suspect the problem with structure is that in order for it to be meaningful we have to all agree on the same structure - since this is the hard bit, it's much easier to use things that have got some agreement round them (like RSS or Microformats) than it is to hammer out something new. I'm all for this, and would love to see us adopting these as standards on the web for MLAs - but let's think of them as a basecamp rather than the summit.

Mike

Hi Owen, thanks for the insight and thoughts - interesting, and I agree very much about the "more structure is better" concept. When I was over working with Google in Paris I asked them about metadata and their line was "We love metadata. We just assume it won't be there". And that for me is an excellent way of approaching this stuff. It all comes down to the barriers that are put up by perfection. I've blogged LOADS about this and it's the same thing again and again: perfection and structure are great but they have a tendency (particularly in institutions like universities, museums and libraries, where accuracy is - arguably - the backbone of those institutions) to become a blocker rather than an enabler. The point you make about structured data becoming unstructured on the web is a great one, and I hadn't really focussed on this. I guess again it comes down to finding the best ways of presenting data in a format which is readable, as structured as it can be, but also *commonly supported*. Which is why a JPEG is always going to be a better option than RAW, even though it's not - strictly - "better"... Cheers Mike

Owen Stephens

Brian Kelly also blogged the report by Emma at http://ukwebfocus.wordpress.com/2008/04/12/reflecting-on-openness-and-the-semantic-web/ and there are some follow up comments on some of this. My personal view is that the more structure the better - but I'm not extreme about it. I think the thing is that many institutions like libraries, museums and archives store data in a structured way, but when they expose it on the web, they often do so in a completely unstructured way - this is clearly wasting an opportunity. I'd also argue that once you have decided to output data in a structured format, you may as well aim for the most meaningful structure, and at the moment it looks like RDF is a good way of achieving this. I'm no expert on RDF, and like you, I often start to feel lost when talking to others (more expert than me) about it. In the world of libraries, the same is probably true of MARC (MAchine Readable Cataloguing), but because it is seen as part of the professional body of knowledge we all go along with it, even if only a minority really understand it. Once you've got a system that outputs your data in MARC, you don't have to worry about the detail - someone has done the hard work for you. I think the same is true with RDF - I don't need to really understand it to be honest, I just need a system that will output my data as RDF. The frustration is that as institutions we hold a whole load of structured data and when it comes to the web suddenly forget all about the structure, as if it doesn't matter. The extra cost of outputting in one structured format over another is not going to be substantial, but takes some initial investment from system developers - and the only way this will happen is if we, as the people investing in systems, make it part of what we expect from such a system.
At the moment libraries seem to be in the situation they were in 10-15 years ago when the web came along - many catalogues were online, but all through telnet interfaces - it took several years to get to the point where an html web interface was 'expected'. We now need to take the next step - where 'web friendly' structured data is expected, and I feel we may as well aim for the 'best' structure (although I'm happy to fall short - I'm not an idealist) - I suppose it's a bit like deciding on picture formats - nothing wrong with a jpeg, but if you really want others to be able to manipulate your pictures and do something new and interesting with them, RAW is better.

Trackbacks

  1. […] slowly about time that we got API access to this new world knowledge? The visual-semantic networks of meaning that emerge from the assigning of Flickr tags are a similar achievement, too. Which […]

  2. […] in “machine-useful” form is a topic about which you’ll have noticed I’m pretty passionate. It’s a hard call, though (and one I’m working on with a number of other museum […]

  3. […] days I came across two fascinating scientists: Eric Kandel and Volker Sommer. Thinking about how this can all be applied to the Semantic Web leads to hundreds of […]