Science Online 2010: Shakespeare wasn’t a semantic web guy

Posted on 18th January 2010 by ideonexus in Social Networking Scientists

This post is part of my coverage of the Science Online 2010 conference.

One of the headaches we have come to accept with the anarchic REST Architecture of the World Wide Web is that a link we post today to an image, web page, or other online resource may go dead a month from now. This can happen for a variety of reasons: the host goes down, a company decides to charge for the content, or the resource moves to a different domain. As a result, looking at old blog posts, we see broken images, “content no longer available” messages for embedded videos, and “page cannot be found” messages in response to our clicks.

What is an inconvenience for web content authors is a much more serious issue for researchers who publish online. What happens to a paper that links to a bioinformatics dataset hosted at another server that goes dead? An anarchic architecture is fantastic for online freedom of expression, but it’s a serious flaw when trying to ensure academic integrity.

Super-mega-kudos to Dr. Jonathan Rees for giving a talk on a highly technical and obscure problem in online research and citations. Fewer than a dozen people attended and fewer than half a dozen were able to sit all the way through, but the talk concerned a fascinating problem in computer science that affects everyone who attended the conference.

Factory Pattern Provides a Level of Indirection
Credit: Michael Duell

The solution Dr. Rees advocates involves the computer science principle of introducing a level of indirection to provide a buffer between software components that are subject to change. In the case of linking to online resources, we are seeking a level of indirection between the source and the referrer, so that the reference list is insulated from a link changing or from many different links existing for the same resource.

The Shared Names Project seeks to provide this level of indirection with a database of links. Each entry gives a citation a single, stable link whose target can be updated if the WWW link to the referenced resource changes. For instance, instead of linking to “http://www.data.gov” directly, researchers would create a “http://sharedname.org/unique_key” link that redirects to data.gov, and use that link in their papers and posts. This way, if data.gov becomes “http://www.info.gov”, only the sharedname unique_key pointer would need to change.
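To make the idea concrete, here is a minimal sketch of that kind of lookup table in Python. It is only an illustration of the indirection, not how the Shared Names Project is actually built: the class name, method names, and in-memory dictionary are all my own assumptions, and the URLs are the ones from the example above.

```python
class SharedNameRegistry:
    """Hypothetical in-memory stand-in for a database of stable links."""

    def __init__(self, base_url="http://sharedname.org/"):
        self.base_url = base_url
        self._targets = {}  # unique key -> current location of the resource

    def register(self, key, target_url):
        """Store the current target and return the stable link to cite."""
        self._targets[key] = target_url
        return self.base_url + key

    def update(self, key, new_target_url):
        """Repoint an existing key; links already published stay unchanged."""
        if key not in self._targets:
            raise KeyError(f"unknown shared name: {key}")
        self._targets[key] = new_target_url

    def resolve(self, shared_url):
        """Return the location a stable link currently redirects to."""
        if not shared_url.startswith(self.base_url):
            raise ValueError(f"not a shared name link: {shared_url}")
        key = shared_url[len(self.base_url):]
        return self._targets[key]


registry = SharedNameRegistry()

# The link that goes into the paper never changes.
citation = registry.register("unique_key", "http://www.data.gov")
before = registry.resolve(citation)

# The resource moves; only the registry entry is edited.
registry.update("unique_key", "http://www.info.gov")
after = registry.resolve(citation)

print(citation, before, after)
```

The paper keeps citing the same link the whole time. The catch is that someone still has to call the update step whenever a resource moves.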

The problem with this level of indirection, despite affording us some stability, is that we are implementing what, to my mind, is a maintenance nightmare. In a programming environment, I try to have total control over all aspects of my program. As Dr. Rees himself asks, whose responsibility is it to manage the database of links? Publishers cannot be responsible for notifying every URL-redirecting service, like TinyUrl, that their link has changed, and a community of people maintaining the database of links will introduce redundancy and conflicts.

As much as I appreciate Dr. Rees bringing attention to what is an important problem in computer science, I can’t see SharedNames as a practical solution.

Few people realize that the WWW was just one of many possible strategies for linking everything together. Ted Nelson envisioned an internet where there was only one instance of every object online, and we would all use the same link to it, allowing content providers to control the use of their objects and bringing additional stability to the internet. Unfortunately, because we didn’t go with Ted Nelson’s vision of what a URI should be, if we want to maintain the integrity of our references online, we have no choice but to make a redundant copy whenever possible and assume responsibility for maintaining it ourselves. Dr. Rees did mention that data cannot be copyrighted under United States law. This means that researchers in bioinformatics do have the option of downloading the data and posting it to a server with more stability, assuming the storage size is manageable and the resources are available.
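As a rough sketch of what taking responsibility for a redundant copy could look like, the Python below saves a dataset under its SHA-256 checksum and falls back to that copy when the original source fails. None of this came from the talk: the function names, the checksum-as-filename convention, and the placeholder bytes are my own assumptions, and the "dead link" is simulated with a function that raises an error rather than a real network call.

```python
import hashlib
import tempfile
from pathlib import Path


def archive_copy(data: bytes, archive_dir: Path) -> Path:
    """Save data in a file named after its SHA-256 digest (hypothetical convention)."""
    digest = hashlib.sha256(data).hexdigest()
    archive_dir.mkdir(parents=True, exist_ok=True)
    path = archive_dir / digest
    path.write_bytes(data)
    return path


def load_with_fallback(fetch_original, archived_path: Path) -> bytes:
    """Try the original source first; use the verified local copy if it fails."""
    try:
        return fetch_original()
    except OSError:
        # urllib's URLError is a subclass of OSError, so a real fetch
        # function built on urllib would land here on a dead link too.
        data = archived_path.read_bytes()
        if hashlib.sha256(data).hexdigest() != archived_path.name:
            raise ValueError("archived copy does not match its checksum")
        return data


def dead_link() -> bytes:
    """Simulates the original host having gone away."""
    raise OSError("host is down")


with tempfile.TemporaryDirectory() as tmp:
    dataset = b"placeholder dataset contents"  # stand-in bytes, not real data
    saved = archive_copy(dataset, Path(tmp))
    recovered = load_with_fallback(dead_link, saved)
    print(recovered == dataset)
```

Naming the file after its checksum means the copy can be checked for corruption years later without any extra bookkeeping, which matters if the archive outlives the person who made it.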


Additional:

See the wiki for this session, which has links to additional resources.

You can see a PDF of my raw notes from this session here.

3 Comments »

  1. It is indeed a maddening thing. When my 1993-2008 webpage finally died, pretty much every link to my own content that I had ever posted died — and there wasn’t a damn thing I could do about it. :/

    Now, I could register my own domain name, but then I’m spending money, and the same thing will still happen when I die.

    Comment by ClintJCL — January 25, 2010 @ 11:27 am

  2. Yeah… The whole death thing has been bothering me in that respect too. I usually pay my webhost to keep two years of hosting lined up, but after that, except for archive.org, I’m gone from the interwebs.

    Comment by ideonexus — January 25, 2010 @ 8:31 pm

  3. I think it would be nice if Flickr kept photostreams at a Pro level after people died.

    But then we’d have all these people faking deaths to get free flickr pro. Argh.

    I have a future-dated post set to my 100th birthday, 1/13/2074, thanking WordPress for keeping my blog alive for so long… If they don’t, it will never post, and they won’t get thanked. So it evaluates properly no matter what happens :D

    Comment by ClintJCL — January 25, 2010 @ 9:26 pm

