Showing posts with label academic publishing. Show all posts

Wednesday, October 1, 2014

Conference proceedings are citable, stop double-dipping

Over the last several years there has been a trend among conferences such as the Bio-Ontologies Special Interest Group meeting at ISMB and Semantic Web Applications and Tools for Life Sciences (SWAT4LS) to invite article submissions from people who want to present at the conference and then to invite presenters to expand their article into an "official" publication in an associated journal.  Since the PDFs of the articles submitted to the conference usually end up online, as they should, first reviewers and later readers are often confronted with two versions of essentially the same article - typically with the same title, author list, and often the same abstract.  This causes problems for reviewers because this kind of overlap with prior work (even from the same authors) would typically be grounds for rejection - yet because of this bizarre arrangement with the conference, reviewers are supposed to treat the original article as if it were a pre-print of the second, despite the fact that it is a citable entity on the Web and is never referred to as a preprint anywhere.

Conference organizers, please stop this madness.  Here are three models that would be better.

  1. Following the International Biocuration Conference model, invite submissions directly to the partner journal first and then choose presenters from the successful submissions and independently submitted abstracts.  No confusion.  One good, citable paper.  Probably higher quality conference submissions.
  2. Put the articles submitted to the conference in a pre-print server such as arXiv or BioRxiv and continue with the concept of an expanded article in a journal.
  3. Do what the computer science community does and recognize contributions to conference proceedings as citable articles and do away with the attempt to get an 'official' journal publication in addition to the conference citation.

Tuesday, January 10, 2012

Semantic Publishing workshop at ESWC in Greece


I'm helping to organize this exciting event; please consider submitting a manuscript and/or attending.  From the official call for papers:
http://sepublica.mywikipaper.org/SePublica2012 an ESWC2012 Workshop.  May 27-31, Heraklion, Greece.
At Sepublica we want to explore the future of scholarly communication and scientific publishing. As we are going through a transition between print media and Web media, Sepublica aims to provide researchers with a venue in which this future can be shaped. Consider research publications: Data sets and code are essential elements of data intensive research, but these are absent when the research is recorded and preserved by way of a scholarly journal article. Or consider news reports: Governments increasingly make public sector information available on the Web, and reporters use it, but news reports very rarely contain fine-grained links to such data sources.  At Sepublica we will discuss and present new ways of publishing, sharing, linking, and analyzing such scientific resources as well as reasoning over the data to discover new links  and scientific insights. 
Workshop Format 
We are planning to have a full day workshop with two main sessions. During the first part of the workshop accepted papers will be presented; the second part of the workshop will address by means of focus groups two main questions, namely “what do we want the future of scholarly communication to be?”  and “how could data be preserved and delivered in an interactive manner over scholarly communications?”. These focus groups will be followed by a panel discussion. As an outcome of these activities we will have a communique that will be the editorial for the workshop proceedings.

Dates 
* workshop papers submission deadline: Feb 29
* workshop papers acceptance notification: April 1
* workshop papers camera ready: April 15 
Submission
 https://www.easychair.org/conferences/?conf=sepublica2012

Issues to be addressed
  • Representation:
    • Formal representations of scientific data; ontologies for scientific information
    • What ontologies do we need for representing structural elements in a document?
    • How can we capture the semantics of rhetorical structures in scholarly communication, and of  hypotheses and scientific evidence?
    • Integration of quantitative and qualitative scientific information
    • How could RDF(a) and ontologies be used to represent the knowledge encoded in scientific documents and in general-interest media publications?
    • Connecting scientific publications with underlying research data sets
  • Technological Foundations:
    • Ontology-based visualization of scientific data
    • Provenance, quality, privacy and trust of scientific information
    • Linked Data for dissemination and archiving of research results, for collaboration and research networks, and for research assessment
    • How could we realize a paper with an API?  How could we have a paper as a database, as a knowledge base?
    • How is the paper an interface, gateway, to the web of data? How could such an interface be delivered in a contextual manner?
  • Applications and Use Cases:
    • Case studies on linked science, i.e., astronomy, biology, environmental and socio-economic impacts of global warming, statistics, environmental monitoring, cultural heritage, etc.
    • Barriers to the acceptance of linked science solutions and strategies to address these
    • Legal, ethical and economic aspects of Linked Data in science

Friday, May 6, 2011

Integrating the Gene Wiki with traditional publishing?

While we (Andrew Su and I) like to talk about the successes of the Gene Wiki - articles like the one for Reelin that represent arguably the best consolidated body of text associated with the gene - there remain some rather glaring holes in its content.  A couple months ago I had a look for under-developed articles linked to genes with extensive numbers of publications.  With a small bit of hacking I uncovered a list of 2,553 genes that were linked directly to more than 20 PubMed citations (using NCBI's gene2pubmed) but had less than 100 words of text in their Gene Wiki article. (Up to the previous period, this post contained 105 words.)  From this list I found 151 genes with more than 100 PubMed citations and less than 100 words of wiki text.
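The "small bit of hacking" above amounts to a join and a filter. Here is a minimal sketch of that kind of analysis, not the script that was actually run. It assumes the gene-to-citation links have already been read into (gene, PubMed ID) pairs and the Gene Wiki article text into a dictionary keyed by gene; the function name, the parameter names, and the toy gene identifiers are all made up for illustration.

```python
from collections import defaultdict


def find_underdeveloped(gene_pubmed_pairs, wiki_text_by_gene,
                        min_citations=20, max_words=100):
    """Return {gene_id: (n_citations, n_words)} for genes linked to more than
    `min_citations` distinct citations whose wiki text has fewer than
    `max_words` words. Genes with no wiki text count as zero words."""
    pmids_by_gene = defaultdict(set)
    for gene_id, pmid in gene_pubmed_pairs:
        pmids_by_gene[gene_id].add(pmid)  # a set ignores duplicate links

    result = {}
    for gene_id, pmids in pmids_by_gene.items():
        n_words = len(wiki_text_by_gene.get(gene_id, "").split())
        if len(pmids) > min_citations and n_words < max_words:
            result[gene_id] = (len(pmids), n_words)
    return result


# Toy data with made-up gene identifiers (not real NCBI Gene IDs).
pairs = ([("GENE_A", pmid) for pmid in range(154)]
         + [("GENE_B", pmid) for pmid in range(30)]
         + [("GENE_C", pmid) for pmid in range(5)])
wiki = {
    "GENE_A": "A two sentence stub. Nothing more here.",
    "GENE_B": " ".join(["word"] * 500),
}

underdeveloped = find_underdeveloped(pairs, wiki)
for gene_id, (n_citations, n_words) in sorted(underdeveloped.items()):
    print(gene_id, n_citations, n_words)
```

Raising `min_citations` to 100 gives the stricter cut used for the shorter list. Counting words by splitting on whitespace is crude, and a real run would need to strip wiki markup first.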

An example is the PIN1 gene.  When the analysis was run, this gene was linked to 154 citations in PubMed yet had only 2 sentences in the Gene Wiki.  So...  how do we fill in these gaps?  This is, of course, the fundamental question associated with wikis or any other attempt to harness community intelligence and there is no easy answer.  One model that we are very interested in was pioneered by Alex Bateman and colleagues at the journal RNA Biology.  When hopeful authors submit an article about a new RNA family to the journal, it is a condition of publication that they contribute an article to Wikipedia about that family.   Aside from being a generally good thing to do as far as sharing knowledge with the world, these articles are subsequently used to manage the annotations for RNA families in the Rfam database (e.g. snoZ107_R87).  After a few years of operation, the Rfam team published an article that, among other things, celebrated the success of the Wikipedia connection.  So, how might we expand upon this model to tackle the challenges facing the Gene Wiki?

The beauty of this approach is that it does not rely on any changes to the incentive system currently operational in science.  Scientists need to publish in peer-reviewed journals.  Rather than complaining about the inefficiency of this outdated process and suggesting social changes with no obvious way to achieve them, let's see what we can do to make the system work for us as it stands.  Let's create a way for scientists to obtain real publications in real journals and have Gene Wiki article content generated as a natural part of the process.   Here is one idea.

A Gene Wiki Meta-Journal Special Edition
In this model we would work with a number of smallish, topic-focused journals to requisition short review articles about the molecular function and phenotypic relevance of individual human genes.  (By phenotypic relevance I mean the connection between the gene and something that non-scientists might care about such as a role in a disease or a connection to some human attribute such as height, hair color or athletic performance.)  These review articles would be published in journals appropriately matched to the key phenotypes associated with the gene.  For example, we might imagine requesting a review article in the Journal of Investigative Dermatology about the gene Filaggrin because the most important variations in this gene have been shown to relate to skin conditions such as eczema.  Each of these phenotypically targeted gene review articles would be linked from and would link back to a central article that described the meta-journal concept - ideally in a journal with an audience broad enough to span each of the more niche-specific journals that participated in the experiment.  Following the RNA Biology model, a condition of publication would be to update or create the relevant Gene Wiki article with content from the submitted review article.  
While logistically challenging, this approach appeals to me because it continues with the theme of tapping into the 'Long Tail'.  If we can distribute the labor out among a larger number of journals we ought to be able to connect with a larger number of individual contributors.  In addition, it might be appealing to the editors and contributors to more niche-specific journals to participate in a project with broad visibility.   As an alternative, we might consider attempting to organize a gene-focused special edition similar to the annual Nucleic Acids Research database edition in one gene-focused journal (e.g. Genome Biology), but it seems unlikely that this approach would have the same potential breadth of impact.  Also, following a phenotypic rather than molecular orientation aligns well with Wikipedia's notability criterion and hence might help to generate article content that would meet with less resistance from current Wikipedia editors.
If you have any thoughts on this idea (or if you have better ideas!) I would love to hear from you.

Tuesday, February 1, 2011

cash for semantic publishing research

Following from my last post, here is an update from the SePublica workshop:


SUBMISSION DEADLINE February 28
ELSEVIER BEST SEMANTIC PAPER AWARD
The Best Paper Award is presented to the author(s) deemed to have written the paper covering the most innovative and feasible proposal concerning semantic publishing in the workshop. All submissions to the SePuBlica workshop will be considered, and a panel of experts will rate the papers according to originality of the idea, feasibility and  presentation. The Best Paper award is sponsored by Elsevier as an incentive for researchers working on defining the next generation of scientific publishing concepts.  The Best Paper Award will be handed out at the end of the SePuBlica workshop.
As a cash prize, the Best Paper Award will receive: US$ 750
The runner-up will be awarded a prize of US$ 250.

Wednesday, January 12, 2011

CfP Semantic Publishing

Minoan rhyton from Crete!
I'm on the program committee for this conference workshop so clearly you should submit something (and it's in Crete!).  See the call for papers below:

-----------------------------------------------------------------------------------------------------------------

1st International Workshop on Semantic Publication (SePublica 2011)
http://sepublica.mywikipaper.org
at the 8th Extended Semantic Web Conference (ESWC 2011)
http://www.eswc2011.org
May 29th or 30th, Hersonissos, Crete, Greece
Keynote by Steve Pettifer, Manchester University, UK.
“Utopia Documents and The Semantic Biochemical Journal experiment”

SUBMISSION DEADLINE February 28

The MISSION of the SePublica workshop is to bring together researchers
and practitioners dealing with different aspects of Semantic
Technologies in the Publishing Industry. How is the Semantic Web
impacting the publishing industry? How is our experience of
publications changing because of Semantic Web technologies being
applied to the publishing industry?

The CHALLENGE of the Semantic Web is to allow the Web to move from a
dissemination platform to an interactive platform for networked
information. The Semantic Web promises to “fundamentally change our
experience of the Web”.

In spite of improvements in the distribution, accessibility and
retrieval of information, little has changed in the publishing
industry so far. The Web has succeeded as a dissemination platform for
scientific and non-scientific papers, news, and communication in
general; however, most of that information remains locked up in
discrete documents, which are poorly interconnected to one another and
to the Web.

The connectivity tissues provided by RDF technology and the Social Web
have barely made an impact on scientific communication nor on ebook
publishing, neither on the format of publications, nor on repositories
and digital libraries. The worst problem is in accessing and reusing
the computable data which the literature represents and describes.

• Consider research publications: Data sets and code are essential
elements of data intensive research, but these are absent when the
research is recorded and preserved in perpetuity by way of a scholarly
journal article.
• Or consider news reports: Governments increasingly make public
sector information available on the Web, and reporters use it, but
news reports very rarely contain fine-grained links to such data
sources.

QUESTIONS AND TOPICS OF INTEREST

• What does a network of truly interconnected papers look like?
How could interoperability across documents be enabled?
• How could concept-centric social networks emerge?
• Are blogs and wikis new means for scholarly communication?
• What lessons can be learned from humanities and social science publishers
(i.e. going beyond scientific publishing towards scholarly publishing)?
• How could we move beyond the PDF?
How can we embed and link semantics in EPUB and other e-book formats?
• How are digital libraries related to semantic e-science?
What is the relationship between a paper and its digital library?
• How could we realize a paper with an API?
How could we have a paper as a database, as a knowledge base?
• How is the paper an interface, gateway, to the web of data?
How could such an interface be delivered in a contextual manner?
• How could RDF(a) and ontologies be used to represent the knowledge encoded
in scientific documents and in general-interest media publications?
• What ontologies do we need for representing structural elements in a
document?
• How can we capture the semantics of rhetorical structures in
scholarly communication, and of  hypotheses and scientific evidence?

AUDIENCE

• researchers from diverse backgrounds such as argumentative
structures, scholarly communication, multi-modality in publications,
digital libraries, semantics in publications, and ontology
engineers.
• practitioners active in the publishing industry, repositories of
experimental information and document standards.

IMPORTANT DATES

Paper/Demo Submission Deadline: February 28, 23:59 Hawaii Time
Acceptance Notification: April 1
Camera Ready Version: April 15
SePublica Workshop: May 29 or May 30 (to be announced)

SUBMISSION AND PROCEEDINGS

Research papers are limited to 12 pages and position papers to 5
pages. For system descriptions, a 5 page paper should be
submitted. All papers and system descriptions should be formatted
according to the LNCS format

http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0

We encourage the submission of semantic documents. LaTeX documents in
the LNCS format can, e.g., be annotated using SALT
(http://salt.semanticauthoring.org) or sTeX
(http://trac.kwarc.info/sTeX/). We also invite submissions in
XHTML+RDFa or in the format of YOUR semantic publishing tool.
However, to ensure a fair review procedure, authors must additionally
export them to PDF.  For submissions that are not in the LNCS PDF
format, 400 words count as one page. Submissions that exceed the page
limit will be rejected without review.

Depending on the number and quality of submissions, authors might
be invited to present their papers during a poster session.

Please submit your paper via EasyChair at
http://www.easychair.org/conferences/?conf=sepublica2011

The author list does not need to be anonymized, as we do not have a
double-blind review process in place.

Submissions will be peer reviewed by three independent
reviewers. Accepted papers have to be presented at the workshop
(requires registering for the ESWC conference and the workshop) and
will be included in the workshop proceedings that are published online
at CEUR-WS.

PROGRAM COMMITTEE

• Robert Stevens, Manchester University, UK
• Benjamin Good, GNF, USA
• Michael Kohlhase, Jacobs University, Germany
• Oscar Corcho, Politecnica de Madrid, Spain
• Steve Pettifer, Manchester University, UK
• Jodi Schneider, DERI, NUI Galway, Ireland
• Sebastian Kruk, knowledgehives.com, Poland
• Henrik Eriksson,  Linköping University, Sweden
• Dagobert Soergel, University of Maryland, USA
• Tim Clark, Harvard Medical School, USA
• Paolo Ciccarese, Harvard Medical School, USA

ORGANIZING COMMITTEE

• Alexander García Castro, University of Bremen, Germany
• Christoph Lange, Jacobs University Bremen, Germany
• Anita de Waard, Elsevier, USA/Netherlands
• Evan Sandhaus, New York Times, USA

QUESTIONS? → sepublica@googlegroups.com

Tuesday, December 7, 2010

Digital Science launches

Digital Science, a "sibling" of Nature Publishing under the parentage of Macmillan Publishers, looks like it could be an interesting new company in the scientific publishing domain.  Interesting because it looks like they are really setting themselves up as a software company, and perhaps more interesting because they seem to be doing so primarily through partnerships.  Seems like it could be a grand opportunity if you are trying to get the ball rolling with a scientific software company.

Sunday, September 27, 2009

Semantic media retrieval service? please?

Here is an application of semantic Web technologies that I would like to have. Please make it for me so that I don't have to. When I finish writing this post,
  1. I would like to press a button that said "Enhance?".
  2. When I pressed the button, the application would read through the text and identify terms, phrases, or other conceptual nuggets that it 'understood'.
  3. These concept nuggets would then be used to find stock / open access images (and videos, etc.)
  4. Where a likely candidate set of images was identified, they would be displayed such that I could quickly choose which, if any, I liked
  5. When I agreed to keep one, it would be embedded in a reasonable location in the text and I would very rapidly go on with my life, but with the added joy of having authored a much more entertaining piece of online personal history.
This thought crept into my mind after reading through Joey de Villa's post about joining Microsoft which is shot full with entertaining media enhancements to the text - which likely took a non-insignificant amount of time for him or his team of personal assistants to put together.

Pictures are indeed worth many words, but how many $$$'s? Perhaps you might even be able to make money with such an app by using it to sneakily sell professional photos and other content.

While you are at it, could you please provide the same text-to-media service in a non-embedded application so that when I needed a clever portrayal of a concept like 'failure', 'success', or 'mass collaboration' for a presentation I could quickly look one up. I might even be willing to buy it if the content was good and the price was reasonable -> in a world where I could almost certainly find what I needed by spending a little more of my own valuable time looking for it.

Friday, June 5, 2009

publisher removal

In my last post, I mentioned offhand that I could not access a PDF (about an ontology for autonomic license management) without paying a $29 fee to Springer.  Though the post was not a direct request for help getting around this paywall, I have now received the PDF from 5 different people - several of whom I have never met before.


Clearly, the (micro) community that read that post believe that research articles should be shared in an open-access fashion and that it is both wrong for publishers to charge access fees and right to sabotage the publishers via peer-to-peer exchange of such articles.

I'm wondering if this micro-community (that is you) would feel any differently if the fees paid for such articles went directly to the researchers that produced them rather than to an apparently irrelevant publisher?

?

(Also, I wonder about the Radiohead style "tip jar" approach.  This would allow you to read the article first and then contribute a payment afterwards if you felt that the research in the article was worthy of supporting.)

Monday, February 9, 2009

non-anonymous peer review

I spent this afternoon acting as a voluntarily non-anonymous peer reviewer - it's scary.  I ended up advocating rejection of the article I was reading and I have to say that Vince Smith (see end of linked post) was absolutely right that the act of signing your review "keeps you in check".  Knowing from the outset that your words are going to be linked to your name can really change what you have to say - it certainly makes you think about it for a while longer.  It is scary though - I hope that I managed to convey enough of my reasoning and suggestions for ways to improve the article that the authors don't despise me and attempt to ruin my life...  I also hope that the editors of the journal manage to acquire at least one additional reviewer for this manuscript - safety in numbers! Or perhaps the editors will strip my name from my comments?  Time will tell I guess.


Friday, January 23, 2009

getting attention

I sent the first complete draft of my dissertation to my committee last week.  Yay!  

At my last meeting with them they promised a quick response if I got it to them that day - even saying that they would reserve time specifically for reading it.  Unsurprisingly, most have not responded as they had promised - in fact most have not said anything.  Boo!  This means that either I will submit the thesis without having the benefit of their comments or I will miss my window for graduating this spring - double boo.

As many of you are no doubt aware, this is not an atypical situation.  Whether it is a 200 page thesis document, a 2 page report for the local hospital, or a technical article being prepared for publication, it can be very difficult to get essential feedback.  As Michael Nielsen has succinctly put it, the critical resource in science today is the attention of scientists.  My committee isn't ignoring me out of spite; they are just unbelievably overworked people with limited amounts of attention to mete out and I happen to be lower on their personal totem poles than their grant applications, their manuscripts, their other students, their husbands, their wives, (and probably their dogs and cats).  

The question this brings up is, when it's critical that you receive some attention from a scientist or two, how do you go about getting it?  Let's make this more specific and ask, when you need someone to review one of your papers (prior to, or rather than, a journal), how do you go about acquiring that needed attention?  

The only way that I have addressed this problem is by asking friends and family for help.  This works well (depending on your friends and family) up to a certain extent, but has some significant shortcomings:
  1. they like you and don't want to make you feel bad - which may influence their assessment of your work
  2. they may reside within the same information cocoon that you do - which means they may have little additional knowledge to contribute 
  3. they eventually get tired of helping you out because they have their own problems to deal with
What other options are there? I suggest two that both revolve around markets.  The first market exists and the second is yet to be created.

Market #1
In discussing whether it would be worth my time to take a free scientific writing course offered at my institute, a professor who had taken the course really encouraged me to take it.  When I asked what she got out of it, she said that the most important thing that she learned was the value of working with a professional editor.  She now pays an editor to review every research paper and every grant that she submits.  I found that a little strange.  The most valuable product of a class is to learn that you need to pay someone to help you do what the class was trying to teach you?  Weird.  I would have dropped it there, but in another conversation with a very talented and well respected author, the same advice appeared.  To write at a professional level, getting professional help appears to be a vital component.

Now, as a student or a post-doc making, let's just say, not a lot of money, this advice is about as valuable as another suggestion that I love to hear from friends that actually have savings and real jobs (or rich parents), "oh, you should really try to buy a house, it's such an important investment and now is such a great time to buy".  Great, thanks.  As soon as my scholarship check comes in I'll head out to the real estate agent...   Lacking funds to actually pay money to an editor, what could I possibly provide in exchange for some scientific attention?

Market #2
Well, according to some definitions, you might actually call me a scientist.  In fact, several journals have successfully taken my scientific attention from me (without any form of compensation) and handed it out to other scientists in the form of peer reviews.  Maybe I could claim greater control over this process?  Maybe there is a way to generate a market within which I could pay for attention when I needed it with my attention at other times.  I'm not referring to the perhaps more exciting 'collaboration markets' that Dr. Nielsen discusses, at least not yet, I'm simply referring to a market for the direct exchange of literary review in scientific contexts.  

Here is the essence of the deal: I will give my attention to reading and commenting on your paper in exchange for your attention on mine.  

Here are some of the additional complexities that might make an implementation of this idea interesting:
  1. the chance to accumulate 'reviewer points' so that the system could go beyond barter and towards a more complete kind of market
  2. the opportunity for anonymity for authors and reviewers to ensure that you can always say what you think you should say
  3. the opportunity for the lack of anonymity - for reviewers to be acknowledged in future iterations of the work that they review
  4. the chance for participants in the system to establish levels of trust - some reviews really are more valuable than others and this should be recognized
I think something like this is vital.  It opens up a wide range of new opportunities for improving the way science works.  Papers could be 'published' within this system and gradually accumulate findability-enhancing credibility (and improvements).  Such assessments of credibility could be used to form a continuous rating scale for 'publications' that would replace the unnecessarily binary nature of journal-based publishing without losing the filtering effect touted by its proponents.
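To make the 'reviewer points' and trust ideas a little more concrete, here is a rough sketch of the bookkeeping such a market might need. Everything in it is an assumption made for illustration: the `ReviewMarket` class, the one-point price of a review, the single starting point, and the 1-to-5 rating that authors give to the reviews they receive. It leaves out anonymity and acknowledgement entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReviewMarket:
    """Toy ledger: authors pay points for reviews and rate what they get."""
    points: Dict[str, int] = field(default_factory=dict)
    ratings: Dict[str, List[int]] = field(default_factory=dict)

    def join(self, person: str, starting_points: int = 1) -> None:
        # Assumed rule: newcomers get one point so they can request a review.
        self.points.setdefault(person, starting_points)
        self.ratings.setdefault(person, [])

    def record_review(self, reviewer: str, author: str,
                      rating: int, cost: int = 1) -> None:
        """Move `cost` points from author to reviewer and store the rating."""
        if reviewer == author:
            raise ValueError("cannot review your own paper")
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        if self.points[author] < cost:
            raise ValueError("author does not have enough points")
        self.points[author] -= cost
        self.points[reviewer] += cost
        self.ratings[reviewer].append(rating)

    def trust(self, reviewer: str) -> float:
        """Mean rating of the reviewer's past reviews (0.0 if none yet)."""
        received = self.ratings[reviewer]
        return sum(received) / len(received) if received else 0.0


market = ReviewMarket()
market.join("alice")
market.join("bob")
market.record_review(reviewer="bob", author="alice", rating=4)
market.record_review(reviewer="alice", author="bob", rating=5)
print(market.points, market.trust("bob"), market.trust("alice"))
```

Because points only move between participants, attention spent reviewing is exactly what buys attention later. A real system would need to decide how points enter the market and how much a trusted reviewer's time should cost.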

Such a system might improve on preprint archives like Nature Precedings by both providing a direct incentive for scientists to contribute comments on papers (most papers are never commented on at all at the moment) and providing a very direct approach to the filtering problem.  

If anyone is interested in creating something along these lines, let me know.  I hope that I will be needing a job soon.

p.s.  Thanks to Mikele Pasin for thoughts we shared on the 'Paper Demolisher' at KCAP 2007 that are directly related to this post.

Saturday, November 1, 2008

dullhunk PloS Bio Article

Duncan Hull and company have just published a thorough (210 references..) review of the current state of scientific digital libraries. Anyone interested in the changing face of publishing or of the Web in general would likely find the article interesting. Out of the many ideas discussed, these two caught my attention:

"As we move in biology from a focus on hypothesis-driven to data-driven science, it is increasingly recognized that databases, software models, and instrumentation are the scientific output, rather than the conventional and more discursive descriptions of experiments and their results."
and
"We suggest that the main obstacles to warmer libraries are primarily social rather than technical in nature. Identity, trust, and privacy are all potential stumbling blocks to better libraries in the future."
To the first quote, I will say simply, hear hear!  The idea that the units with which scientific progress is published and thus measured should correspond more directly to discrete, integratable chunks of knowledge and to sharable processes for knowledge generation rather than (often unparsable) stories is one whose time has clearly come. 

The second one stuck out because it is so similar to something I played a part in writing a while back.  In another review article (with a paltry 93 references) we suggested that ".. the primary hindrances to the creation of the SWLS [Semantic Web for Life Sciences] may be social rather than technological in nature ..".  In a sense, the SWLS that we were thinking about way back then could be said to subsume the digital libraries of Dr. Hull et al.'s article.  But... looking more closely at the first quote above, I realize that relation isn't subsumption, but equivalency.  Though they are writing specifically about 'libraries', they clearly consider databases, software, etc. as parts of the new incarnations of libraries and thus are writing about exactly the same thing that we were writing about.   

It's interesting that we came to similar conclusions to some extent: both articles suggest that the main challenges in moving science forward on the Web are in handling problems that have people at their center, not technology.  

Sunday, September 7, 2008

fear and loathing in academentia

OK, I'm mad and probably shouldn't write the following today. Oh well.

Now, here is why I am mad. I've had a paper rejected by the Journal of Biomedical Informatics on the basis of one review. It took two months to get this review. The review does not seem fair and certainly does not provide useful guidance about how to improve the quality of the science described in the paper. Here is JBI's response in its totality, with some embedded reactions from me in red.

Ms. No.: JBI-08-163
Title: OntoLoki: an automatic, instance-based method for the evaluation of biological ontologies on the semantic Web
Corresponding Author: Dr. Mark Denis Wilkinson
Authors: Benjamin M Good; Gavin Ha; Chi Kin Ho;

Dear Dr. Wilkinson,
I requested that my advisor be the corresponding author on the paper because I would be traveling right after the submission and am hoping to relocate soon.
Experts in the field have now reviewed your paper, referenced above. Based on their comments, we regret to inform you that we are unable to accept your manuscript for publication in the Journal of Biomedical Informatics.

We have attached the reviewers' comments below to help you to understand the basis for our decision. We hope that their thoughtful comments will help you in future submissions to the JBI and in your future studies.

Sincerely,

Janine Burch
Journal of Biomedical Informatics, Editorial Office

Elsevier
525 B Street, Suite 1900
San Diego, CA 92101-4495
USA
Phone: (619) 699-6392
Fax: (619) 699-6211
E-mail: jbi@elsevier.com

Reviewers' comments:

Reviewer #2:
It's a little odd that we only got to see Reviewer 2's comments. I don't know if anyone else reviewed it or not.
Good and his colleagues present OntoLoki, a very interesting approach for data-driven ontology evaluation. The novel idea is that the quality of ontologies can be measured automatically although ontologies without or with very few formal restrictions on class membership are used. For poly-hierarchically organized classes suitable datasets with positive examples - i.e. instances with properties - as well as negative examples are composed. Machine learning algorithms are used to determine empirically those rules (patterns of properties) that allow predicting class membership reliably. With other words: The ideal situation is to find instances of classes like "Cat" with properties like "furry" allowing their consistent assignment to the class "Cat" and discrimination to neighbour classes like "Bird".
Yep, that pretty much sums up the general idea. So far, so fair.
There are a lot of inherent challenges with this approach that are addressed by the authors, e.g.
- the dependence on the context (chapter 1) and on the way of determining instances and their properties (chapter 1.2.1)
- the problem of sufficient number of instances for every class for estimating a class predictor (chapter 1.2.4)

These are common problems when using empirical approaches. However, the reviewer has doubts about the suitability of the OntoLoki approach for evaluating ontologies. The authors themselves admit that especially the results of the Cellular Component experiment are suboptimal, see chapter 3.2.1 (only 17% is evaluated) and chapter 3.2.3 (results are not overwhelmingly illuminating).
OK. At this point the reviewer has pointed out that we correctly identified challenges with empirical approaches to ontology evaluation and discussed them with respect to our approach in the paper. Both of these challenges, context-sensitivity and data dependency, are fundamental to any methodology that is based on the use of data to help answer a question. Keep in mind that the main point of the paper is to describe and evaluate a method. To do so, we explain it and then test it out in a variety of different scenarios (different ontologies and different datasets). In some cases it is successful and in others it is not. By describing the results from all of these experiments we faithfully represent the realities of applying the method.

The reviewer criticizes the method by pointing to our own admissions regarding problems encountered with the dataset assembled for the evaluation of the cellular component branch of the Gene Ontology, without actually saying anything about the method itself. Perhaps criticism could fairly be directed at our data collection methods for that particular ontology. However, the point was not to evaluate that ontology; it was to evaluate the proposed method. That 17% number resulted because we didn't collect enough instances to evaluate the other classes. If we had collected more data, the number would have been higher, but that is completely irrelevant to the utility of the method and our evaluation of it. In fact, by including data like that, we much more accurately present both the positive and negative aspects of the method. Perhaps next time we should simply obscure any negatives to avoid such criticism.

The MAIN PROBLEM the reviewer has with this approach:
OntoLoki tries to solve a structural classification problem empirically that originates in poor defined ontologies. Instead of (suboptimally) trying to determine the consistency of ontologies with no formally defined restrictions on class membership the ontologies should be enriched by such formal definitions on class membership, see http://bioinformatics.oxfordjournals.org/cgi/content/abstract/22/14/e530.
Now, here is where this starts to get ridiculous. "the ontologies should be enriched by such formal definitions". Well, we couldn't agree more! That is one of the main reasons we did this! OntoLoki provides a starting point for doing just that!

As the reviewer seemed to understand in the summary of the paper above, the method is intended to be applied to ontologies (or whatever you want to call class polyhierarchies used in classification situations) that aren't necessarily formally defined. The rules that are learned could be used to suggest possibilities for formal class restrictions that are based on the data the classes are already associated with.

As a matter of fact, the OntoLoki method can actually be used on formally defined ontologies to identify candidate expansions of other definitions. We recognize the importance of these definitions and the reasoning they allow for; that is why the reference so generously provided above was one of the main citations in the paper! In fact, the ontology described in that paper was used as a benchmark of quality for other ontologies - thus providing us with a means to evaluate our method.

In the introduction Jeremy Rogers and later on Barry Smith are referenced as proponents of ontology evaluation. However, these and other researchers in the field of biomedical ontology are mostly concerned with the quality of explicit formal definitions and the structure of ontologies, see http://ontology.buffalo.edu/evaulation.html.
What exactly does the "however" mean here? Indeed, both of these scholars are involved in ontology evaluation and I would say are, in fact, proponents of the idea. Why the contrasting "however"? The quality of formal definitions and the structure of ontologies (which can of course result directly through inference applied to those formal definitions) are certainly aspects of relevance to the domain of ontology evaluation.

The structure of ontologies must certainly have something to do with the inferred or asserted class hierarchies they produce. The OntoLoki method is designed for evaluating these hierarchies. So... why this statement? Perhaps you could argue that the method is not useful in achieving the task, but it doesn't make any sense to say that the task is irrelevant, as seems to be implied here.

The very idea of ontologies is to have explicit criteria for deciding class membership of instances opposed to ambiguous language terms denoting those classes. If there are artefacts with no formally defined restrictions they should not be called ontology.
Alright, now we've got to the essence of this so-called "review". The reviewer doesn't believe that the things the method was built to evaluate should be called ontologies. So they don't believe the Gene Ontology is an ontology and they don't believe that most of the ontologies in the OBO Foundry are ontologies. OK, fine. Perhaps the reviewer should have suggested that we change the title and use a different word to describe whatever it is these things are. The complaint has absolutely nothing to do with the manuscript! The maddening thing is that we have been (sometimes very lonely) proponents of the expanded use of axiomatized, property-based definitions in biological ontologies for years and are still very much of this view. To be criticized for the community's fairly slow uptake of these methods makes my head feel like it's going to explode.
However, the machine learning methods are very interesting for supporting different purposes in the context of REAL ontologies WITH formal restrictions on class membership", see chapter "Making use of OntoLoki" in the discussion section. The whole paper should be rewritten oriented to those other supporting purposes in the context of developing, using and evaluating ontologies.
Well thanks. It seems that some of the applications of the method (and the software we developed) are "very interesting" but only in the context of "REAL ONTOLOGIES". As it turns out, the method and implemented code could be applied directly to REAL ONTOLOGIES without alteration. (Note that the capitalization is from the reviewer).
This paper, submitted as a paper in the Biomedical Informatics Journal, is a copy of a Technical report, see http://bioinfo.icapture.ubc.ca/bgood/OntoLoki_14.pdf.
That this is even mentioned as a presumed negative is outrageous. The report (which does in fact contain the same content as the submission) is not a peer-reviewed publication, it is simply a very informal pre-print. Posting it is perfectly in accordance with Elsevier's rules when it comes to pre-prints, rules that it is clear the reviewer is not aware of.
It is far to long and should conform to the editorial guidelines of the journal.
First, I actually agree that it is probably a bit too long. We discussed this at some length before deciding to submit the full version and, in the end, decided that the length was warranted in this case in order to present the argument and experiments in full. We could shorten it, and likely will when we resubmit to a different (open-access) journal, but that was actually one of the reasons we chose JBI - they explicitly state that there is no "arbitrary limit on the length of individual articles". The submission was well within the editorial guidelines of the journal - guidelines which the reviewer, again, was clearly not familiar with.

Ok, my rant is over now, the red has drained out of my face and I can no longer hear my heart beating in my ears, so I will switch back out of the red to conclude.

So, Reviewer #2, who are you?

One of the more impressive people I met at SciFoo told me that he has been signing his reviews for years to "keep himself in check". If reviewers had to sign their reviews it seems that perhaps they might be forced to do a better job. Good quality reviews (either arguing for reject or accept) would provide another form of publication - another way for scientists to get credit for the work that they do. Are you up to it? Sign your next review.

Thursday, July 10, 2008

OntoLoki lives!


(Update June 8, 2012.  This paper was not accepted at first submission (see sad story), but can be accessed as Chapter 4 in my dissertation.  I've had enough interest in the concepts it contains that it is probably worth resubmitting it somewhere, someday, somehow...)


As noted in previous posts labeled with the tag OntoLoki, I've been working off and on for a few years now (yikes) on a program for automatic ontology evaluation.  Now, we are getting ready to submit our first paper on the subject and would like to open things up for comments.  I labeled it as a technical report in hopes of starting a tradition of such things in our laboratory.  It seems like a good way to keep the locals on track and have another chance for reviews before things go out into the scary world of official peer review.  I suppose I could drop this into Nature Precedings again, but I'm tempted to wait until it's gone through more revision cycles before I do so, as that is likely to form a more permanent record than I am really ready to commit to.


Here is the longish abstract to whet your appetite.  I was thinking of blogging the rest of the document as distinct posts for each section - thoughts on that?  
As always I really appreciate any time you spend here and any ideas that you choose to share.

Abstract
Background: The delineation of clear, logical definitions for each class in an ontology and the consistent application of these definitions to the assignment of instances to classes are important criteria for ontology evaluation. If ontologies are specified with formal, property-based restrictions on class membership, then such consistency can be checked automatically using existing technology. If no such logical restrictions are applied however, as is the case with many current biological ontologies, there are currently no automated methods for measuring the semantic consistency of instance assignment on an ontology-wide scale, nor for inferring the patterns of properties that might define a particular class.

Objective: The aim of this study is to identify, implement, and test a new method for automatic, data-driven ontology evaluation that is suitable for the evaluation of ontologies with no formally defined restrictions on class membership. The method should quantify the consistency of instance classification within such an ontology based on patterns of properties found to be associated with the instances of particular classes.

Design: We constructed a program that takes as its input an OWL/RDF knowledge base containing an ontology, instances associated with each of the classes in the ontology, and properties of those instances. For each class, it outputs: 1) a rule for determining class membership based on the properties of the instances and 2) a quantitative score for the class that reflects the ability of the identified rule to correctly predict class membership for the instances in the knowledge base. To test the proposed method, we constructed a series of knowledge bases that varied from perfectly consistent through to completely random and evaluated each one using the implementation. In addition to this artificial control study, two other well-known biological ontologies were evaluated using public data to provide indications of the behavior of the system in realistic contexts.

Results: In the first experiment, the method produced direct quantitative assessments of the different versions of the knowledge bases that correlated directly with the known level of consistency of instance assignment for each knowledge base. The evaluations of the other ontologies indicated that the method was successful at detecting relevant patterns associated with the instances of classes in real biological ontologies based on publicly available data.

Conclusion: The results indicate that the suggested method can be used to conduct objective, automatic, data-driven evaluations of biological ontologies without formal class definitions in regards to the property-based consistency of instance-assignment. This inductive method complements existing, purely deductive approaches to automatic consistency checking, offering not just the potential to help in the ontology engineering process but also in the knowledge discovery process.
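
To make the Design paragraph a little more concrete, here is a toy sketch of the idea in Python. It is not the OntoLoki code: the knowledge base is a hand-made list rather than OWL/RDF, the "rule" is restricted to a single property, and the score is plain accuracy on the same instances used to pick the rule. All of those are simplifying assumptions for illustration, as are the names `best_rule`, `CONSISTENT` and `NOISY`.

```python
# Toy illustration only: learn a one-property membership rule per class
# and score how well it predicts the asserted class assignments.
# Each instance is (set of properties, set of asserted classes).

CONSISTENT = [
    ({"furry", "four_legs"}, {"Cat"}),
    ({"furry"}, {"Cat"}),
    ({"feathered", "two_legs"}, {"Bird"}),
    ({"feathered"}, {"Bird"}),
]

# Same data, but the last instance has been assigned to the wrong class.
NOISY = [
    ({"furry", "four_legs"}, {"Cat"}),
    ({"furry"}, {"Cat"}),
    ({"feathered", "two_legs"}, {"Bird"}),
    ({"feathered"}, {"Cat"}),
]


def best_rule(instances, target_class):
    """Return (property, accuracy) for the rule 'has property => member'."""
    properties = set()
    for props, _ in instances:
        properties |= props

    best_prop, best_score = None, 0.0
    for prop in sorted(properties):  # sorted for a deterministic result
        # A prediction is correct when "has the property" agrees with
        # "is asserted to be a member of the class".
        correct = sum(
            (prop in props) == (target_class in classes)
            for props, classes in instances
        )
        score = correct / len(instances)
        if score > best_score:
            best_prop, best_score = prop, score
    return best_prop, best_score


if __name__ == "__main__":
    print(best_rule(CONSISTENT, "Cat"))
    print(best_rule(NOISY, "Cat"))
```

The point of the two datasets is the one made in the Results paragraph: when instance assignment is consistent the best rule predicts membership well, and when an instance is misassigned the score for that class drops.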

Monday, June 2, 2008

academic publishing innovation contest

For those that are interested in improving upon current approaches to academic publishing, you may be interested in this contest put on by Elsevier.
"The Elsevier Grand Challenge: Knowledge Enhancement in the Life Sciences is a contest created to improve the way scientific information is communicated and used. The contest invites members of the scientific community to describe and prototype a tool to improve the interpretation and identification of meaning in (online) journals and text databases relating to the life sciences..."
It's a chance to win $35,000 and, potentially, to have a real impact.

Monday, February 11, 2008

another Scientific American article on the Semantic Web

 

A follow-up to the much-cited Scientific American article on the Semantic Web has recently come out.  I would love to say more about it, but I can't find a free copy at the moment.

Links related to the new article now:
If anyone has a copy, perhaps you could bravely post it somewhere and comment a link to it?

It's both funny and, I think, instructive that the number one hit in Google Scholar when searching for "semantic web" is the first (2001) article in this now two-part series, but that, out of the 36 versions of that article that are listed, none of them link to Scientific American in any way!  Seeing that the article has been cited more than 5,000 times, it seems that S.A. could probably have made quite a bit of money one way or another if they had decided to host a free public version from the outset and remained the main provider of that information.



Thursday, December 6, 2007

official reviews of E.D.

After more than three months, I've just received notification that the E.D. manuscript has been rejected for publication in the semantic mashup edition of JBI. I provide the reviews below and pose the question to you, the ether - what should I do now? Should I carry out some user studies and resubmit? Should I make amendments to the text as suggested by the second reviewer and resubmit? Should I send it to a different journal? Should I give up on it and finish other pending projects?

?


Dear Mr. Good,

Experts in the field have now reviewed your paper, referenced above. Based on their comments and the number of submissions, we regret to inform you that we are unable to accept your manuscript for publication in the Special Issue "Semantic BioMed Mashup" of the Journal of Biomedical Informatics. One of the major concerns is that more work (e.g., a better use case) is needed for increasing the substance of the paper. You may consider revising the paper according the reviewers' comments and re-submit it to a regular JBI issue in the future.


We have attached the reviewers' comments below to help you to understand the basis for our decision. We hope that their thoughtful comments will help you in future submissions to the JBI and in your future studies.

Sincerely,
JBI Editorial Office

Reviewers' comments:

Reviewer #1: This paper could be a good workshop paper but it is not suitable for journal publication. The paper describes a prototype interface for a system that could potentially be useful, but provides no evaluation whatsoever. It doesn't even say if there are any users of the system. As the authors rightly note, there are many open questions, and even a simple user evaluation would have taken some steps in addressing those questions. Otherwise, it looks like an ad hoc exercise.

For instance, do the users use the suggested tags correctly, or is not being able to see the context for definition makes them select wrong terms? What happens if there are several tags from different vocabularies? Is the extra selection step too cumbersome and users won't bother? How is the agreement between users? How about the agreement with manually generated tags such as MeSH headings?

Without at least some evaluation, I don't think the paper can be a journal paper.

HOwever, the research goal is worthwhile and the approach interesting, so I would strongly encourage the authors to pursue it!

Additional comments: you compare the number of MeSH tags and Connotea annotations and suggest that the difference on the number of tags per item is somehow an indication of quality. I have a hard time understanding how the number of tags corresponds to teh *quality* of those tags. All this shows are the difference in scale.

I think the section motivating adding controlled vocabularies to social tagging systems (:Linking taggers and their tags...) is too one-sided. No potential drawbacks are discussed. What if users don't understand the tags from these controlled vocabularies and use them incorrectly? Would it be worse than not using them at all? Do users need to understand the vocabularies? Will non-rpofessional users know the vocabularies enough to use them without any special training? All this discussion must be present in the paper.

You say that you couldn't use the NCI Thesaurus and WordNet because they are too big for "Semantic Web technologies" This is not true. Many Semantic Web tools can process these easily; so it is a limitation of your technology.


Reviewer #2: This paper presents an application, the Entity Describer (ED), for
generating and storing controlled semantic annotations on biomedical
resources, as an extension of the Connotea social tagging system.
The authors briefly review semantic annotation (i.e., professional
indexing) in biomedicine and social tagging of Web resources, before
comparing the two. While professional annotation results in more
complete, standard and accurate sets of annotations, it is also not
sustainable due to its cost. The authors argue that the quality of
annotation through social tagging would improve if the taggers used
standard terminologies rather than homegrown tags. In order to explore
this hypothesis, they combined an existing social tagging system,
Connotea, with some controlled terminologies, including MeSH and GO. The
application is built -- using Semantic Web technologies -- as a mashup
of Connotea, terminologies and a database of annotations. The ED
modifies the Connotea interface to help users select terms from
controlled vocabularies and stores these annotations in a database,
while maintaining the usual features of Connotea. A prototype of this
application has been developed. The authors propose this annotation
model as an alternative to professional indexing and automatic
indexing. Future work includes making additional terminologies available
and applying ED to other tagging systems than Connotea.


This paper on enriching social tagging with controlled terminologies
through Semantic Web technologies is undoubtedly relevant to this
special issue. The paper is interesting and clearly written, easily
accessible to a readership that would not be familiar with the Semantic
Web. The references are appropriate.
This reviewer has essentially minor reservations about this manuscript
regarding the overall organization, statement of objectives, and the
discussion. These points could be addressed easily. The only major
reservation is the absence of proper discussion of the limitations of
this work.

Overall organization
The paper is composed of ten sections. Although logically flowing, this
succession of small sections might be distracting to the reader as it
fails to reflect the overarching organizational structure of the
paper. I would recommend grouping the first 3 sections under
Introduction/background and the next 4 under Materials and
Methods. Discussion and future work could be grouped.

Statement of objectives.
Again, it does not become clear until section 4 what the objectives of
this work are. I would recommend adding a short introduction to present
the issues of professional indexing and social tagging and stating that
the application presented proposes to reconcile them.

Discussion and future work.
In the future work section, rather than a litany of issues, it would be
useful to regroup the issues around terminology-related and
system-related issues. The issue of extension to other terminological
systems is presented in a somewhat naive manner. For example, it does
not look like the authors have fully appreciated the issues in making
the UMLS available through this system (e.g., size, lack of explicit
subclass relations, intellectual property restrictions, etc.)

Insufficient discussion of the limitations of this work.
The discussion is extremely short. The limitations of this work are not
clearly mentioned (lack of an evaluation [or even a metric for an
evaluation], scalability issues, etc.) Another limitation of this
approach is that you never make the point that professional indexing
relies not only on a controlled terminology, but also on a set of
indexing rules, used to further control the use of the indexing
terms. Finally, in your OWLization of MeSH, you briefly mention
converting broader/narrower links into subClassOf properties, without
raising any issues. What about Liver subClassOf Abdomen? I understand
this shortcut helps you meet the requirement that OWL be used in the
framework of this mashup. This is nonetheless highly inappropriate and
deserves being addressed in the discussion. The short paragraph about
using SKOS instead of OWL should be expanded and moved to the
discussion. The work of Guus Schreiber's group on representing MeSH in
SKOS should be acknowledged. http://thesauri.cs.vu.nl/eswc06/

Technical comment about MeSH.
Figure 7 shows hippocampus in MeSH identified by
"A08.186.211.577.405". Using tree numbers instead of the unique ID
(D006624) is bad practice. In this example, it so happens that
hippocampus occurs in only one hierarchy and has therefore only one tree
number. Most MeSH descriptors, however, have several tree numbers.
A side effect of this practice is that URI based on tree numbers would
result in multiple, non-reconcilable identifiers for the same MeSH
descriptor, leading to seemingly distinct annotations for the same
descriptor.


Minor comments
- Introduction: The sentence "Examples of semantic annotation
... UniProt [1-3]." would fit better between the first two sentences,
that at the end of the first paragraph.
- Introduction, 2nd paragraph: Arguably, the semantic annotation of
MEDLINE citations with MeSH describes *topics* more than it described
*entities*.
- p. 3: "The act of adding a resource to a social tagging collection" Do
you mean "The act of adding a tag to a resource"? The tagging *event*
is a process and cannot be composed of entities such as a tagger,
etc. please rephrase.
- p. 8: "formal training in classification" Do you mean "formal training
in *annotation* (or indexing)"? Classifying resources is a different
issue.
- p. 8: "main subject descriptors". The "official" MeSH terminology
refers to "main headings" (or "descriptors"), with the "major
descriptors" (marked by an asterisk) denoting the main topics in the
article. It is probably safer to stick to this terminology. In this
case, 12.7 must be the average number of descriptors, not major
descriptors.
- p. 9: Since your goal is to compare dispersion of the number of
descriptors in MEDLINE and Connotea, where the means are different,
the coefficient of variation should be used instead of (or in
addition to, for the purpose of the comparison) the raw standard
deviation values. For details, see:
http://en.wikipedia.org/wiki/Coefficient_of_variation
- p. 10: This section should introduce the notion of "controlled
vocabularies" or "controlled terminologies".
- p. 10: Arguably, what will decrease is not so much the quality of the
annotations as it is their *homogeneity*.
- p. 11: Please some background on GreaseMonkey.
- p. 12: It is unclear why the term "Hip" in MeSH (D006615)is not
retrieved as part of the list of terms suggested for the entry "hip".
- p. 15, first line: "if a non-annotation property and was used..."
Remove and.
- p. 15, later: "it would render the knowledge base OWL-Full". you
probably mean: "it would require OWL-Full for the representation of
the KB"
- p. 17, bullet 1: It is unclear what is the justification for
suggesting the addition of these particular terminologies.
- p. 18, bullet 3: tree-like interfaces would be extremely inconvenient
to render biomedical terminal terminologies with a high degree of
multiple inheritance.
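
Reviewer #2's point about the coefficient of variation (p. 9 comment above) is easy to see with a small example. The sketch below uses made-up tag counts, not the numbers from the manuscript, and the population standard deviation; both are assumptions chosen only to show why the raw standard deviation can mislead when the means differ.

```python
from statistics import mean, pstdev

# Hypothetical numbers of descriptors/tags per item (NOT data from the paper).
medline_counts = [10, 12, 14, 16]   # larger mean
connotea_counts = [1, 2, 3, 6]      # smaller mean


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (unitless dispersion)."""
    return pstdev(values) / mean(values)


if __name__ == "__main__":
    for name, counts in [("MEDLINE", medline_counts), ("Connotea", connotea_counts)]:
        print(name, "sd =", round(pstdev(counts), 3),
              "cv =", round(coefficient_of_variation(counts), 3))
```

In this toy data the first list has the larger raw standard deviation but the smaller coefficient of variation, which is exactly the situation where comparing raw standard deviations would give the wrong impression.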

Wednesday, October 17, 2007

Where is the API?

Yes.. I am procrastinating. I should be sleeping, working on the OntoLoki automatic ontology evaluation system, or preparing for our meeting with the SWAN team tomorrow morning; but instead, I am perusing Project Prospect and thinking about what a journal should look like. This is largely because of my disappointment in reading this nascent blog post in which Ian Mulvaney (a person who I think I respect and leader of a project I obviously find fascinating) suggests that enforcing the application of naming standards for chemical entities at the time of publication would a) be too hard for authors, b) not provide much benefit, c) that it would be better to let this be a voluntary step - all of which I absolutely disagree with.

This, and comments on the post, led me to Project Prospect - which seems to be the first real effort by a publisher to take the idea of semantic enhancements of online manuscripts seriously.

Project Prospect provides semantic annotation (e.g. labeling GO terms etc. in manuscripts) and uses this to provide some enhanced navigation patterns and some additional information (e.g. definitions) for any of the annotations. Doing a pretty nice job at this was apparently enough to win them the 2007 ALPSP/Charlesworth Award for Publishing Innovation. While this is certainly a nice addition and a step in what I think is the right direction, it 1) is overwhelmingly similar to the much older and much more flexible Conceptual Open Hypermedia ServicE (COHSE) from the University of Manchester and 2) does not seem to provide any capacity for semantic integration of the manuscripts in the collection.

Is this really the best we can do?

What I would like to see is a journal with an API. An API that would let me ask it questions like "what genes are present in articles published in this journal that are annotated with go:0005576 or any of its children, contain the word 'vaccine', and describe those genes as being upregulated". Right now, we can approach this sort of question with text-mining, but, with extensions to work like that done to enable the hypermedia browsing described above (which fundamentally depends on solid entity identification and annotation within the document), this question (which spans multiple manuscripts) could be answered with a relatively straightforward query.
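
To give a rough feel for what "a relatively straightforward query" might look like, here is a minimal Python sketch over an in-memory stand-in for such a journal. Everything in it is hypothetical: no journal exposes these classes or functions, the child term IDs (`EX:child_1`, `EX:child_2`), gene names and article texts are placeholders, and a real system would more likely sit behind a standard query language than a Python function.

```python
from dataclasses import dataclass

# Hypothetical fragment of a term hierarchy: child term -> parent term.
# Only GO:0005576 comes from the question above; the children are placeholders.
GO_PARENT = {
    "EX:child_1": "GO:0005576",
    "EX:child_2": "EX:child_1",
}


@dataclass
class GeneMention:
    gene: str
    regulation: str  # "up", "down" or "none", as annotated in the article


@dataclass
class Article:
    doi: str
    text: str
    go_terms: set
    gene_mentions: list


def descendants_or_self(term, parent_of):
    """Return the term plus every term that has it as an ancestor."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in parent_of.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def upregulated_genes(articles, go_term, word, parent_of):
    """Genes marked as upregulated in articles that are annotated with
    go_term (or a descendant) and whose text contains the given word."""
    wanted = descendants_or_self(go_term, parent_of)
    genes = set()
    for article in articles:
        if article.go_terms & wanted and word.lower() in article.text.lower():
            genes.update(m.gene for m in article.gene_mentions
                         if m.regulation == "up")
    return sorted(genes)


# Placeholder content standing in for annotated manuscripts.
ARTICLES = [
    Article("doi:placeholder-1", "Results of a vaccine trial.", {"EX:child_2"},
            [GeneMention("GENE_A", "up"), GeneMention("GENE_B", "down")]),
    Article("doi:placeholder-2", "No relevant keyword here.", {"GO:0005576"},
            [GeneMention("GENE_C", "up")]),
    Article("doi:placeholder-3", "Vaccine response study.", {"EX:unrelated"},
            [GeneMention("GENE_D", "up")]),
]

if __name__ == "__main__":
    print(upregulated_genes(ARTICLES, "GO:0005576", "vaccine", GO_PARENT))
```

The interesting part is not the loop but the precondition: the query is only this simple because the entities (GO terms, genes, regulation direction) are assumed to have been identified and annotated at publication time.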

It's time for journals to step up and stop wasting talented researchers' time writing text mining algorithms. Let's build a journal with a proper API, one with standards-compliant methods for both writing content to it and querying the content inside it programmatically. Such a journal would not only improve human navigation and understanding of its independent textual documents, but would also enable entirely new modes of interaction with the integrated knowledge spanning all of its semantic content.

Tuesday, September 25, 2007

Nature Precedings versus the blog(b)!


OK, it's been a few weeks since I posted my latest piece of work on my blog (Sept. 4, 2007) and on Nature Precedings (Sept. 7, 2007). I think that is enough time to give a little summary of my experiences with both.


         number of comments    number of ambiguous votes    number of potential job offers
Blog     10                    0                            1
N. P.    0                     4                            0


Based on these metrics, the blog post is clearly the winner; however, the Precedings version does have a few plusses not listed. It does appear to be a tiny bit more professional looking, there is a consistent versioning system, they offer a standardized way to cite the draft, the voting system has some potential, and it is frankly cool to get yourself onto their home page in any way possible.

In my humble opinion, N.P. could be improved by:
  1. enabling both positive and negative votes

  2. improving their submission system such that content not suitable for inclusion in a PDF, such as large images, movies, and so on, could easily be added

  3. notifying the authors of submitted manuscripts when comments or votes are added to their manuscripts

So, why did I get comments on my blog and not the N.P. post?
I think it is mostly because of the personal, social nature of the blog as a medium. Many of the comments (though not all) came from people that I think are signed up to a feed for my blog, the majority of whom are personal friends (again, not all). These are the people that are most likely a) to be interested in what I have to say and b) to take the time to provide a useful response. If a post is interesting enough, these same people will tell their friends about it and they will tell their friends about it and all of a sudden it will have reached the right segment of the Internet population before Google Karma has even had a chance to act.

Nature is a broad-spectrum journal, and Nature Precedings is even broader. If the Precedings idea is to take hold and thus reach a large enough participating audience to become interesting, I suggest that they need to figure out how to better accommodate the social side of this equation (and don't think they aren't working on that...).