
Thursday, June 25, 2009
Sell bgood on twitter?
Should I sell my Twitter username 'bgood' to the Boston-based burger company of the same name (bgood)?
Posted by Benjamin Good at 1:19 PM
Friday, June 5, 2009
publisher removal
In my last post, I mentioned offhand that I could not access a PDF (about an ontology for autonomic license management) without paying a $29 fee to Springer. Though the post was not a direct request for help getting around this paywall, I have now received the PDF from 5 different people, several of whom I have never met before.
Posted by Benjamin Good at 9:36 AM
Labels: academic publishing, bioinformatics, micro-payment, open-access, springer
Tuesday, June 2, 2009
licenses and linked data
One of the major downsides of working outside of academia is that I now have to pay much more attention to licenses. No longer can I just grab whatever data I like, do something fun with it, try to publish what I did, and move on. Now I need to know - very specifically - what I am allowed to do with what, so that I can reduce the possibility of being sued and so that I can put up appropriate "powered by bla bla" messages. Since I am not fond of reading legal agreements, this is a drag.
"The license agreement can be seen as the knowledge source for a license management system. As such, it may be referenced by the system each time a new process is initiated. To facilitate access, a machine readable representation of the license agreement is highly desirable, but at the same time we do not want to sacrifice too much readability of such agreements by human beings. Creating an ontology as a formal knowledge representation of licensing not only meets the representation requirements, but also offers improvements to knowledge reusability owing to the inherent sharing nature of such representations. Furthermore, the XML-based ontology languages such as OWL (Web Ontology Language) can be user friendly for the non-developers who are often those responsible for implementing and managing such license agreements. This paper shows our use of ontology to represent the license agreement in a development prototype. The ultimate goal is to build ontology for the license management domain that will facilitate autonomic knowledge management. Knowledge based on such ontology can then be shared and utilized by many types of license management system."
What do you think? Is it worth it to pay Springer $29 for the 1592 KB in that paper?
Posted by Benjamin Good at 12:39 PM
Labels: copyright, gene ontology, law, licenses, linked data, provenance
Monday, May 11, 2009
CWA at the YMCA
Somewhere high in the air between New York and Minneapolis, my first stop on my way home, I feel compelled to explain a few things to myself. Why on Earth have I just spent the last several nights living in the YMCA in Flushing, New York? Why did I decide to go on my first self-funded professional excursion at a time when I have no income and very little savings? What did I hope to get and what did the trip deliver?
The inspiration for this minor adventure was the inaugural meeting of the Concept Web Alliance (CWA) at the New York Hall of Science. The mission statement of the CWA (written partly at this meeting) is as follows:
“To enable an open, collaborative environment to jointly address the challenges associated with high volume scholarly and professional data production, storage, interoperability, and analyses for knowledge discovery”
The idea is to form an alliance of like-minded researchers and science publishers interested in sharing knowledge in a computationally accessible fashion (i.e. not plain text and such that information from multiple sources can easily be integrated and interacted with). The basic building block envisioned for these efforts is the ‘triple’ – a Concept-Relation-Concept structure. (The word ‘triple’ and the interesting new verb ‘triplification’ - meaning to convert some non-triple-structure like text into a set of triples - were almost certainly the most commonly uttered words in the presentations at the meeting.)
For those familiar with semantic Web standards such as RDF (a generic triple-based language for representing and sharing information) and OWL (a set of languages for representing knowledge in the form of ontologies), it is perhaps most interesting to consider what is not present in a CWA triple and what was not discussed at all in the public portions of the meeting. The following words never came up: ‘description logic’, ‘axiom’, ‘class’, ‘reality’.
The intended materialization of the Concept Web vision - at the moment - thus seems to be an open collection of informal (non logic-based) concept representations, identified by URIs retrievable on the Web, that can be linked together to form semantic networks. This graph-structure could be queried (e.g. using SPARQL) for the ‘facts’ that it would contain where each such fact would be linked to extensive information about where it came from (who (or what algorithm) suggested it, when, and with what confidence). Interestingly, this is very similar in its flexibility, lack of built-in reasoning, and its strong notion of provenance tracking to the Freebase model.
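To make the idea a little more concrete, here is a rough Python sketch of a triple that carries its own provenance, plus a simple pattern match standing in for a real query language like SPARQL. Everything in it (the Triple class, the match function, the example.org URIs, the source names and confidence values) is made up for illustration; it is not the CWA's design, just one way such a structure might look.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Triple:
    """One Concept-Relation-Concept statement plus its provenance."""
    subject: str       # URI of the first concept
    relation: str      # URI of the relation
    obj: str           # URI of the second concept
    source: str        # who (or what algorithm) suggested it
    asserted_on: str   # when it was suggested, as an ISO date string
    confidence: float  # how strongly the source believes it, 0.0 to 1.0


def match(triples: Iterable[Triple],
          subject: Optional[str] = None,
          relation: Optional[str] = None,
          obj: Optional[str] = None,
          min_confidence: float = 0.0) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for t in triples:
        if subject is not None and t.subject != subject:
            continue
        if relation is not None and t.relation != relation:
            continue
        if obj is not None and t.obj != obj:
            continue
        if t.confidence < min_confidence:
            continue
        yield t


# Hypothetical example data; the URIs are placeholders, not real identifiers.
EX = "http://example.org/concept/"
FACTS = [
    Triple(EX + "geneA", EX + "associated_with", EX + "diseaseX",
           source="text-mining-run-1", asserted_on="2009-05-08",
           confidence=0.4),
    Triple(EX + "geneA", EX + "associated_with", EX + "diseaseY",
           source="curator-1", asserted_on="2009-05-09",
           confidence=0.9),
]

if __name__ == "__main__":
    # Ask for everything asserted about geneA with reasonable confidence.
    for fact in match(FACTS, subject=EX + "geneA", min_confidence=0.5):
        print(fact.obj, "according to", fact.source)
```

Note that conflicting or low-confidence statements are simply kept alongside the others; deciding what to believe is left to whoever is asking, which is the "filter" half of publish-then-filter.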
While some of you who like to work with reasoners and OWL or who think that it is better to talk about ‘universals’ and ‘particulars’ than it is to talk about ‘concepts’ may find this lack of formality a little disappointing, I am growing more and more enthusiastic about it because it fits the publish-then-filter nature of the Web perfectly. We see again and again that once information is out there on the Web, its value increases tremendously. (In fact, many very smart people seem to think that when there is enough text and other unstructured data online, that is all we will really need to solve most of our information problems.) By providing a very low barrier for entry and then focusing computer science efforts on handling the noise, complexity, and the conflicts that will inevitably arise (the filter part), I think this triple-publishing approach has great potential to push research in a productive direction. In particular, I think it will push people to spend more time working on other, more flexible modes of inference that don’t just die when a logical conflict is detected - all that squishy probability stuff that the semantic Web has managed to ignore for so long and that happens to be the stuff that makes almost all interesting AI-like technology work now. Furthermore, the triple-focus absolutely does not stop groups that participate in the CWA from making use of approaches grounded in formal logics in their own development.
While there may or may not be good reasons to take one philosophical stance over another when creating knowledge bases, the fact will always remain that there will be conflicts of opinion about this. When dealing with a small group, it may be possible to convince or force acceptance of a particular world view, but it is not IMHO going to be possible to enforce something as arguable as the philosophy of the representation of the nature of being on the scale of the Web. By focusing on the smallest possible units, the triples, and leaving the more precise formalizations and the philosophy out of the vision as much as possible, the CWA might make it possible for a diverse, interoperable ecology of knowledge bases to emerge and co-exist. Ideally, those who wish to make use of, for example, description logic reasoning should be able to benefit from the pool of URIs in the Concept Web, if for nothing other than the many multi-lingual labels and textual definitions that will be associated with each of them.
It is still very early days for the CWA – probably far too early to speculate much about the consequences of its basic technological approach, as even this approach is still very much up for debate. Still, I’m not sure exactly how to express this, but the meeting smelled good. There were enough capable, powerful, enthusiastic people together in that room, with enough of a shared vision, that I think something of that vision is very likely to come to life.
So, was it worth it? I think that it was in the end. I got an early look at something that might provide solutions to many of the problems that I’ve spent the past several years of my life thinking about (the social construction of a biosemantic web). I got to reconnect with old friends and make some new ones. I had a chance to see New York for the first time (the scale of which blew my mind). And, last but not least, it just might be the last such academic event I get to take part in. Depending on the choices I make and the dictations of the wheels of fate I may be in the process of losing the privilege of working in the ivory tower. If it was indeed my goodbye to the community of scholars, it was a good one.
So yes, it was a worthwhile trip and it remains an exciting time to be thinking about the Web - concept or otherwise. (and the YMCA wasn’t really so bad in the end ;).
You can follow - and perhaps influence - the evolution of the Concept Web on their blog.
Posted by Benjamin Good at 10:19 AM
Labels: bioinformatics, concept web, CWA, New York, semantic web, social web, travel
Friday, May 8, 2009
CWA live webcast today
In case you want to follow the Concept Web Alliance meeting happening right now, it is being webcast.
Posted by Benjamin Good at 8:03 AM
Friday, April 24, 2009
The big apple
After much hemming and hawing, I've decided to make my own way to New York for the inaugural meeting of the Concept Web Alliance. It will be my first conference outside the $helpful$ cocoon of academia and my first visit to the home of the Statue of Liberty. I'll try to write more about what the CWA is after I find out more at the meeting.
Posted by Benjamin Good at 9:36 AM
Labels: concept web, CWA, New York, travel
Thursday, April 23, 2009
bad day programming
You know you've had a bad day programming when, at the end of the day, you find yourself manually editing the text of a file that looks like this...
- On a Mac, Python's urllib2 module guesses your proxy configuration by looking at a file called com.apple.internetconfig.plist (in /Users/you/Library/Preferences/).
- If you are having a problem creating connections with urllib2, try having a look at what proxies it thinks exist, like this:
>>> import urllib2
>>> urllib2.getproxies()
- If it comes back with something unexpected like 'http://evil.proxy.bug:8080' then you will need to get rid of that offending string in the com.apple.internetconfig.plist file. The file will open and look a little prettier in the plist editor; however the editor has no search function and I couldn't find the string - which I knew to be there - anywhere. In the end I just opened it in Emacs, found it, zapped it, and prayed that I hadn't hurt anything in the process. Sometimes that works...
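If you would rather not hand-edit the plist at all, another option is to tell Python to ignore whatever proxies it detects. Here is a minimal sketch of that idea. Note the assumptions: it is written against Python 3's urllib.request (the successor to urllib2) rather than the urllib2 module I was fighting with, and the two helper function names are just made up for this example.

```python
import urllib.request  # Python 3 home of the old urllib2 functions


def detected_proxies():
    """Return the proxy mapping that urllib would use by default."""
    return urllib.request.getproxies()


def opener_without_proxies():
    """Build an opener that ignores any auto-detected proxy settings."""
    # An explicit empty dict tells ProxyHandler to use no proxies at all,
    # instead of falling back to the system or environment configuration.
    no_proxy = urllib.request.ProxyHandler({})
    return urllib.request.build_opener(no_proxy)


if __name__ == "__main__":
    # Show what was detected, e.g. a stale entry left in a config file.
    print(detected_proxies())
    opener = opener_without_proxies()
    # opener.open("http://example.com/") would now try a direct connection.
```

This only sidesteps the bad setting inside your own program; anything else that reads the same configuration file will still see the offending entry.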
Posted by Benjamin Good at 7:27 AM
Labels: appengine, pain, programming, python, urllib2
Wednesday, April 15, 2009
Dissertation now online
The full text of my dissertation, "Strategies for amassing, characterizing, and applying third-party metadata in bioinformatics", is now available via UBC's information repository. It is "manuscript-based" so each of the chapters except the introduction and the conclusion can be read and understood independently. (So there is really no reason for anyone ever to try to read the whole thing in its entirety.)
Bioinformatics resources on the Web are proliferating rapidly. For biomedical researchers, the vital data they contain is often difficult to locate and to integrate. The semantic Web initiative is an emerging collection of standards for sharing and integrating distributed information resources via the World Wide Web. In particular, these standards define languages for the provision of the metadata that facilitates both discovery and integration of distributed resources. This metadata takes the form of ontologies used to annotate information resources on the Web. Bioinformatics researchers are now considering how to apply these standards to enable a new generation of applications that will provide more effective ways to make use of increasingly diverse and distributed biological information. While the basic standards appear ready, the path to achieving the potential they entail is muddy. How are we to create all of the needed ontologies? How are we to use them to annotate increasingly large bodies of information? How are we to judge the quality of these ontologies and these proliferating annotations? As new metadata generating systems emerge on the Web, how are we to compare these to previous systems? The research conducted for this dissertation seeks new answers to these questions. Specifically, it investigates strategies for amassing, characterizing, and applying metadata (the substance of the semantic Web) in the context of bioinformatics. The strategies for amassing metadata orient around the design of systems that motivate and guide the actions of many individual, third-party contributors in the formation of collective metadata resources. The strategies for characterizing metadata focus on the derivation of fully automated protocols for evaluating and comparing ontologies and related metadata structures. New applications demonstrate how distributed information sources can be dynamically integrated to facilitate both information visualization and analysis. 
Brought together, these different lines of research converge towards the genesis of systems that will allow the biomedical research community to both create and maintain a semantic Web for the life sciences and to make use of the new capabilities for knowledge sharing and discovery that it will enable.
About the title: I ended up using the generic "metadata" rather than something more specific because I needed a way to concisely capture things that range from Del.icio.us tags to classes in the Foundational Model of Anatomy. "Metadata" seemed to do the job, but it remains a little vague and therefore unsatisfying. Similarly, "third-party metadata" is broader than necessary. I don't touch institutionally generated third-party metadata - just what I guess you would probably call "socially generated metadata".
Posted by Benjamin Good at 2:07 PM
Labels: bioinformatics, dissertation, metadata, phd, semantic web, ubc
Tuesday, April 14, 2009
shiny new pen
Another page of conclusions, about 17 corrected typos, and about 7 more forms later, I have reached the penultimate experience of academic student life: a free pen from the university! After that comes the funny hat and then that's all she wrote.
Posted by Benjamin Good at 3:11 PM
Monday, April 6, 2009
So long 23rd grade!
After all the stress and the worry of the last few months, I finally made it to the other side. Now, I suppose it's OK to call me Dr. Feel Good if you insist ;)
- "What is bioinformatics?" - Francis Ouellette
- "What emerging technologies do you think might have the biggest impact on the future of your research?" - Wyeth Wasserman
Thanks one more time to everyone who has helped and encouraged me along the way - you all rock!
Posted by Benjamin Good at 9:16 AM
Labels: academentia, academia, defense, phd