Thursday, August 16, 2007

How to make the web safe for (g)Monkeys?

Thanks to the superhuman efforts of my collaborator Eddie, ED survived a nasty scare today.

As foreshadowed in a recent blog post, Connotea added a new *feature today that rendered both of ED's (1000+ line) GreaseMonkey scripts totally useless. The problems, now fixed, were caused by the addition of JavaScript elements that now connect Connotea to ads and links from . This brings up the question,

  • what might websites like Connotea that seem to be designed to be mashed up do to improve the stability of the external programs that enhance them?

    In earlier writing, I suggested that some of the solution to this problem might be a focus first on the provision of data in a standardized, presentation-agnostic form (e.g. RDF) and/or via API access. Connotea provides RDF via a nice RESTful API, so I couldn't really ask for more... but there really is more to share than just the data. User-scripts can and do benefit from the presentation information on the websites that they process. So I wonder to myself, "wouldn't it be cool if the presentation elements of web pages could be separated from the data they display so that mash monkeys could use both as they see fit?" Best of both worlds - sharable, mergeable, mashable data chunks and presentation chunks!

    Of course ;) those bright folks at places like MIT have already been working on this sort of thing for years... So far, they've come up with Fresnel, a standardized display vocabulary for RDF, and have used it inside several semantic web browsers. When I first started thinking about this, I naively thought we were going to see general purpose semantic web browsers that could consume and somehow beautifully display pure RDF. This is of course totally impossible - there is simply way, way, way more than one way to present a piece of data. Semantic Web Browsers won't do all the work - we'll still need good designers. What we need to do, I think, is work out the kinks of binding RDF to flexible, standardized presentation elements like those in Fresnel. Only with the two separate, symbiotic standards in place, one for the data and one for the presentation of that data, will it be possible to generate a successful, general purpose browser for the semantic web.
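
    As a rough illustration of that separation, here is a minimal Python sketch. It is not Fresnel and not Connotea's API: the triples, the "lens" dictionaries, and the `render` function are all hypothetical names invented for this example. The only point is that the same data chunk can be displayed in more than one way when the presentation rules live in their own, swappable structure.

```python
# Data chunk: RDF-like (subject, predicate, object) triples.
# All identifiers and values here are made up for illustration.
TRIPLES = [
    ("ex:paper1", "dc:title", "A Hypothetical Paper"),
    ("ex:paper1", "dc:creator", "A. Author"),
    ("ex:paper1", "ex:tag", "semantic-web"),
    ("ex:paper1", "ex:tag", "mashups"),
]

# Presentation chunks: hypothetical "lenses", loosely in the spirit of a
# display vocabulary. Each one says which properties to show, in what
# order, with which labels, and how to join the pieces.
COMPACT_LENS = {"show": ["dc:title"], "labels": {}, "separator": " | "}
FULL_LENS = {
    "show": ["dc:title", "dc:creator", "ex:tag"],
    "labels": {"dc:title": "Title", "dc:creator": "Author", "ex:tag": "Tags"},
    "separator": "\n",
}


def values_for(triples, subject, predicate):
    """Return every object attached to subject via predicate, in order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def render(triples, subject, lens):
    """Render one subject as text, following the rules in the given lens."""
    parts = []
    for predicate in lens["show"]:
        values = values_for(triples, subject, predicate)
        if not values:
            continue  # the lens asks for a property the data does not have
        text = ", ".join(values)
        label = lens["labels"].get(predicate)
        parts.append(f"{label}: {text}" if label else text)
    return lens["separator"].join(parts)


if __name__ == "__main__":
    print(render(TRIPLES, "ex:paper1", COMPACT_LENS))
    print("---")
    print(render(TRIPLES, "ex:paper1", FULL_LENS))
```

    A user-script in this world would pick up the data and whichever lens it liked (or bring its own), rather than scraping both out of one tangled HTML page.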

    * p.s. it bums me out a little that I've put so much work into a project that, if successful, will end up benefiting a for-profit company that I don't work for and that now collaborates closely with a company that wants to "provide the Internet economy with greater control, precision, and profitability of content monetization"...gross. If only CiteULike had produced an RDF-serving API when I was first looking at these things...


    Ian Mulvany said...

    Hi Benjamin,

    First, I'd like to say that I was really excited when I saw that you posted the Entity Describer on connotea a few days ago. I've been looking for the time to play around with it; it looks like a very powerful extension to connotea.

    Absolutely, if I had known that the Proximic addition was going to cause problems, then I would have looked for a way to work around those problems. I'm planning on some major changes to the layout of information on the site towards the end of the year, and until reading your post I would not have been thinking about the impact that these changes might make on the ecosystem around connotea. Now that I know, I'll try to tread a bit more carefully, with more consultation, next time.

    About the monetization of connotea via the link with Proximic. It's clear from your post that you are not too keen on a move like this. Well it was my decision so I'm happy to take some time to justify it.

    Part of my job description is to think about ways to monetize connotea. It costs money to run and develop. We are committed to keeping it free and open. The conclusion has to be to at least try to do things like advertising in a smart way on the site. By the way I'm totally open to other suggestions!

    Given that I'm going to try this out, there are two reasons for me to go with Proximic. One is that they are a cousin company to Nature, and so I have a good working relationship with them. I can get them to change things faster than I might with another company. The other, more important reason is that they let me fill the index that matches are made against with content that I want. My initial plan is to have half of these content matching buckets filled with non-ad related content. I am interested in feedback because if the service provides no value to users then I'll pull it. It has to add value.

    It's early days yet.

    The general point you raise in your post does bear serious consideration though. We can certainly start thinking about how we might change the presentation of the data to make mashups more malleable.

    Benjamin Good said...

    Ian, I'm happy to see that you found your way to this post. I'm planning to make an ED announcement on the Connotea discussion list, but am waiting for more stability on our end (moving to the new machine today hopefully). Regarding your points:
    I-> "it looks like a very powerful extension to connotea"
    cool, I really hope so!
    I-> "I'll try to tread a bit more carefully, with more consultation, next time."
    That would be appreciated. Maybe you already do this and we hadn't noticed it, but do external developers have access to the dev site? Seems the obvious early-warning solution.
    As I talked about a bit above, I don't think that the gMonkey pattern of external hackers monkeying around with your html and javascript elements is really ever going to produce a long-term, stable product. You will always want to make changes to that presentation layer and these are always going to cause problems for programs that take that layer as their input. This is something that bioinformatics has wrestled with for a long time as the early databases served all of their data directly through their websites (see notorious screen-scraping quote in here). Like everywhere else, bioinformatics has made a lot of progress in providing more computationally stable/useful access to data; but, almost nothing has been done about the presentation aspects. It seems like this area would be a natural thing for Nature Web to take a lead position in.
    I-> ".. ways to monetize connotea"
    I absolutely don't blame you; it has to be done, and Proximic does look like a company that could help. I've been surprised at how poorly the Google AdWords have been working on Connotea - pretty much all I see are ads for "Nature's tea". You have to admit though that that quote from the Proximic website is not exactly heartwarming for your primarily academic, open-access, underpaid, free-loving scientific audience... I'm sure their investors like it, but I'd rather pretend those motivations weren't at the root of everything I see on the web. I mention CiteULike because they are more like me I suppose - closer to my current independent, academic tribe. The fact is, however, that I really think Connotea is a better product (sorry Richard..) and that is why I will continue to use it for my own references and play with it for my research.