In the life sciences domain, the total number of items described by social tagging systems is currently tiny compared with the number of resources described by institutions. To illustrate, the MEDLINE bibliographic database contains over 16 million references, while, as of November 9, 2008, Citeulike, the largest of the academic social tagging services, contained references to only 203,314 documents with known PubMed identifiers.
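To put "tiny" into numbers, here is a quick check using only the two figures quoted above. Because MEDLINE holds over 16 million references, dividing by exactly 16 million gives an upper bound on Citeulike's coverage:

```python
# Rough upper bound on Citeulike's coverage of MEDLINE, using the figures above.
medline_refs = 16_000_000      # "over 16 million" references, so a lower bound
citeulike_pmids = 203_314      # Citeulike documents with known PubMed IDs (Nov 9, 2008)

coverage = citeulike_pmids / medline_refs
print(f"Citeulike covers at most {coverage:.1%} of MEDLINE")
# prints "Citeulike covers at most 1.3% of MEDLINE"
```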
Monday, November 17, 2008
Though it is important to see where things stand today, the interesting aspect of these new systems right now is their potential for growth. Given the large numbers of contributors (and very large numbers of potential contributors), it seems possible that their coverage might eventually meet or surpass that of resource-constrained institutional mechanisms. In 2007, the NLM reported that it indexed 670,943 citations for the MEDLINE database, which equates, on average, to about 56,000 citations per month. To estimate whether social tagging services might someday reach the same level of throughput as the NLM indexing service, we compared the monthly growth rates of MEDLINE and of Citeulike (for PubMed citations) over the last several years and used this data to make some predictions about future trends. Here is what we came up with.
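The monthly MEDLINE figure is simply the 2007 annual total spread evenly over twelve months:

```python
# Average monthly MEDLINE indexing rate implied by the 2007 annual total.
indexed_2007 = 670_943
per_month = indexed_2007 / 12

print(round(per_month))      # prints 55912
print(round(per_month, -3))  # prints 56000.0, the "about 56,000" quoted above
```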
The figure plots the number of distinct PubMed citations described each month by Citeulike users and by NLM indexers and, using exponential smoothing, extrapolates the observed trends several years into the future. Based on the data obtained so far, the number of biomedical resources described per month by Citeulike users is increasing more rapidly than the number indexed per month for MEDLINE. If current trends continue, Citeulike coverage would catch up with MEDLINE around 2014, at which point both systems would be describing approximately 70,000 biomedical citations per month. As the rapidly widening confidence intervals illustrate, there is insufficient data to provide strong evidence for the precise point of intersection, or even that Citeulike will continue to grow. Still, it seems plausible that Citeulike and other scientifically oriented social tagging services will continue to expand their coverage of the life sciences domain faster than institutional systems, and will eventually catch up to the point where every document indexed by a professional is also tagged for personal use by a scientist (or 10).
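For readers who want to try this kind of extrapolation themselves, here is a minimal sketch. The post does not say which variant of exponential smoothing or which parameters were used, so this assumes Holt's linear (double) exponential smoothing, which carries a level and a trend forward. The helper names holt_forecast and first_crossing, the smoothing weights alpha and beta, and the toy monthly series are all hypothetical and are NOT the data behind the figure. The sketch produces point forecasts only; it does not compute the confidence intervals shown in the figure.

```python
from typing import List, Optional, Sequence


def holt_forecast(series: Sequence[float], alpha: float, beta: float,
                  horizon: int) -> List[float]:
    """Holt's linear exponential smoothing (an assumed choice of method).

    alpha weights new observations into the level, beta weights new level
    changes into the trend. Returns forecasts for 1..horizon steps ahead.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]  # simple initial trend estimate
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


def first_crossing(a: Sequence[float], b: Sequence[float]) -> Optional[int]:
    """Index of the first forecast step where series a meets or exceeds b."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x >= y:
            return i
    return None  # no crossing within the forecast horizon


# Hypothetical toy monthly counts, for illustration only.
citeulike = [1000, 1300, 1600, 1900, 2200]
medline = [50000, 50100, 50200, 50300, 50400]

cu_fc = holt_forecast(citeulike, alpha=0.5, beta=0.5, horizon=300)
ml_fc = holt_forecast(medline, alpha=0.5, beta=0.5, horizon=300)
i = first_crossing(cu_fc, ml_fc)
if i is not None:
    print(f"toy series cross {i + 1} months ahead at {cu_fc[i]:.0f}/month")
# prints "toy series cross 241 months ahead at 74500/month"
```

With perfectly linear toy data the method simply continues each straight line. On real, noisy monthly counts the choice of alpha and beta matters a great deal, which is one reason the crossing point is so uncertain.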
So, what are we going to do with all of that data?