As I've recently learned, UIMA stands for 'Unstructured Information Management Architecture'. UIMA emerged from the bowels of IBM Research and is now a full-fledged, open source Apache software project. From the Apache description: "Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at."
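To make the "ingest plain text and identify entities" idea a bit more concrete, here is a toy sketch in Python of the simplest possible approach: a dictionary lookup that records each hit as a standoff annotation (character offsets into the original text plus a label). This is loosely modeled on the way UIMA annotations carry begin/end offsets, but it is NOT UIMA's API and not how the NCBO annotator works internally. The `Annotation` class, the `annotate` function and the tiny dictionary are all hypothetical, made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical standoff annotation: [begin, end) offsets into the text plus a label."""
    begin: int
    end: int
    label: str
    covered_text: str


def annotate(text: str, dictionary: dict[str, str]) -> list[Annotation]:
    """Toy annotator: find whole-word, case-insensitive matches of each dictionary term."""
    annotations = []
    for term, label in dictionary.items():
        # \b keeps a term like "IBM" from matching inside a longer token such as "IBMX"
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append(
                Annotation(match.start(), match.end(), label, match.group(0))
            )
    # Return hits in document order (by begin offset, then end offset)
    return sorted(annotations, key=lambda a: (a.begin, a.end))


if __name__ == "__main__":
    # Hypothetical dictionary using the entity types from the Apache description
    toy_dictionary = {
        "IBM Research": "ORGANIZATION",
        "Apache": "ORGANIZATION",
        "Colorado": "PLACE",
    }
    sample = "UIMA emerged from IBM Research and is now an Apache project."
    for a in annotate(sample, toy_dictionary):
        print(a.begin, a.end, a.label, a.covered_text)
```

Running it prints `18 30 ORGANIZATION IBM Research` and `45 51 ORGANIZATION Apache`. Keeping offsets instead of rewriting the text is the design choice that lets several independent annotators (people, places, ontology terms) layer their results over the same unchanged document.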
In conclusion, it feels like this project (bionlp-uima) is basically one step away from being a powerful, useful tool that the bioinformatics community could really benefit from. That step is to do the beauty work: to make an application for people rather than just a code collection for hackers. The project reminds me a lot of my all-time favorite open source software project, WEKA (the Waikato Environment for Knowledge Analysis). WEKA contains implementations of a large collection of machine learning algorithms along with tools for experimenting with them. The key difference is that WEKA has a stable click-and-run user interface that provides access to those tools (though you can still access, change, and learn from the large Java stack that runs it). If the BioNLP code were wrapped up in such a framework, I suspect it would get many more users, and I would certainly be much happier ;).
Wednesday, September 22, 2010
Trials and Tribulations with the UIMA wrapper for the NCBO annotator
I came across UIMA via this article about "A UIMA wrapper for the NCBO annotator" by Christopher Roeder and friends from Colorado and Stanford. For those who are unfamiliar, the annotator is a fairly new web service for identifying terms from biomedical ontologies in text. (Here is a nice little interface you can use to see what it can do.) As I'm always looking for ways to avoid reinventing the wheel, I was hoping I could use this wrapper on top of this well-established framework to quickly build a nice client for processing some Gene Wiki-related text. It turned out that, aside from my hopes for a quick solution... this is pretty much what I found.
Your results may vary, but here are my key early experiences with the UIMA wrapper for the NCBO annotator:
(Note: this is my first foray into this kind of unstructured information extraction. If you know of better ways to do it, please do let me know!)
Posted by Benjamin Good at 3:31 PM
Labels: annotator, information-extraction, natural-language-processing, ncbo, nlp, uima, wrapper