Information Retrieval:  A Health & Biomedical Perspective

Information Retrieval:  A Health & Biomedical Perspective (Second Edition)

William Hersh, M.D.

Springer-Verlag, 2003

Update to Chapter 4 - Content

(4/24/06) The second edition of the book describes the massive changes in the information world that occurred after the first edition. Well, now there have been some pretty significant changes just a few years after the publication of the second edition. These include:
A great source for updates on all content produced by the NLM is the NLM Technical Bulletin, available at:
http://www.nlm.nih.gov/pubs/techbull/tb.html

(4/20/03) The Severe Acute Respiratory Syndrome (SARS) outbreak demonstrated not only how mainstream IR technologies have become, but also how quickly information can be obtained and disseminated. The Centers for Disease Control quickly established a Web site that provides comprehensive information to researchers, clinicians, and patients (http://www.cdc.gov/ncidod/sars/). Furthermore, the Internet enabled scientists to collaborate in unprecedented ways, with the prime example being the rapidity with which the genome of the newly discovered coronavirus causing the illness was sequenced (http://www.cdc.gov/ncidod/sars/sequence.htm).

4.1 Classification of health information

4.2 Bibliographic

4.2.1 Literature reference databases

4.2.1.1 MEDLINE

(4/22/07) For the most recent statistics on MEDLINE, you can visit the Citation Counts (http://www.nlm.nih.gov/bsd/medline_cit_counts_yr_pub.html) and Key Indicators (http://www.nlm.nih.gov/bsd/bsd_key.html) pages. The on-line version (MEDLINE, aka, MEDLARS On-Line) recently turned 35 years old (Anonymous, 2006). MEDLINE contains over 16 million records and covers over 5,000 journals. Over 600,000 new records are now added annually. Nearly 89% of the citations are published in English, although 29 other languages are represented. About 76% of the records have English abstracts, including some non-English articles. The database is updated weekly. An updated fact sheet about MEDLINE is available at:
http://www.nlm.nih.gov/pubs/factsheets/medline.html

Many people confuse MEDLINE with PubMed. It used to be simple to say that the latter was the search software and interface to the former. However, as noted in the fact sheet outlining the differences between the two (http://www.nlm.nih.gov/pubs/factsheets/dif_med_pub.html), PubMed also includes additional records not in MEDLINE, including:
NLM has also established a "baseline repository" for records in PubMed as well as other tools (http://mbr.nlm.nih.gov/). These represent static views of the data from a given point in time.  For example, the baseline of the MEDLINE/PubMed database for 2007 (created in November, 2006) contains over 16 million records.

The NLM has developed a fact sheet concerning notices in MEDLINE records (http://www.nlm.nih.gov/pubs/factsheets/errata.html):
Some older aspects of MEDLINE have been retired. One of these is the paper version of the venerable Index Medicus (Anonymous, 2004). For many older users of the biomedical literature, these huge paper catalogs used to be the entry point into the biomedical literature. Now, of course, access to MEDLINE is virtually ubiquitous through PubMed, which is freely available on the Web anywhere. Another feature of MEDLINE recently retired is the old MEDLINE UI (Unique Identifier), which has been replaced by the PMID as the only unique identifier for MEDLINE and OLDMEDLINE citations (Tybaert and Rosov, 2004).

Despite new changes in MEDLINE, the OLDMEDLINE database, representing citations from before the official 1966 "start date" of MEDLINE, continues to grow (Demsey et al., 2003). OLDMEDLINE now has over 1.7 million references dating back to 1950. A fact sheet about it is available (http://www.nlm.nih.gov/databases/databases_oldmedline.html).

NLM has varied in how it has treated "corporate" (i.e., named group) authors over the years. Prior to 2000, they were listed at the end of the Title (TI) field. In 2000, however, a new field was established, Corporate Name (CN), with the name also displayed at the end of the Author (AU) list. Starting in 2006, the position of the corporate name in the AU list was changed to match where it appeared in the actual journal (Knecht, 2006).

The table of MEDLINE subsets in the book is somewhat out of date. Here is a list of the current subsets:

Subset                                    Contains citations about or includes
AIDS                                      AIDS and HIV
Bioethics                                 Bioethics
Cancer                                    Cancer
Complementary and alternative medicine    Complementary and alternative medicine
Core clinical journals                    Several hundred clinical journals
Dental journals                           Dentistry
History of medicine                       History of medicine
MEDLINE                                   MEDLINE citations only
Nursing journals                          Nursing
Old MEDLINE                               MEDLINE citations prior to 1966
PubMed Central                            Articles in PubMed Central
Space life sciences                       Space life sciences
Toxicology                                Toxicology

Anonymous (2004). Index Medicus to Cease as Print Publication. NLM Technical Bulletin. May-June 2004. e2. http://www.nlm.nih.gov/pubs/techbull/mj04/mj04_im.html.
Anonymous (2006). MEDLINE Turns 35! NLM Technical Bulletin. September-October, 2006. http://www.nlm.nih.gov/pubs/techbull/so06/so06_med_35.html.
Demsey, A., Nahin, A., et al. (2003). OLDMEDLINE Citations Join PubMed. NLM Technical Bulletin. September-October, 2003. e2. http://www.nlm.nih.gov/pubs/techbull/so03/so03_oldmedline.html.
Tybaert, S. and Rosov, J. (2004). MEDLINE Data Changes - 2004. NLM Technical Bulletin. Bethesda, MD, National Library of Medicine: 335:e6. http://www.nlm.nih.gov/pubs/techbull/nd03/nd03_med_data_changes.html.

(5/6/03) For an overview of the history of MEDLINE, see:
Zipser, J. (1998). MEDLINE to PubMed and Beyond. National Library of Medicine. http://www.nlm.nih.gov/bsd/historypresentation.html.

4.2.1.2 Other NLM Bibliographic Resources

(4/17/05) The LocatorPLUS system has been made accessible under the NLM's Entrez system as the NLM Catalog (Jacobs, 2004). This allows improved searching functionality and integration with all of the other resources in Entrez.

Jacobs, A. (2004). New Entrez database: NLM Catalog. NLM Technical Bulletin. September-October, 2004. e2. http://www.nlm.nih.gov/pubs/techbull/so04/so04_entrez_cat.html.

4.2.1.3 Non-NLM Bibliographic Databases

(4/22/07) A database of peer-reviewed journal literature for the complementary and alternative medicine field is the Manual Alternative and Natural Therapy Index System (MANTIS, http://www.healthindex.com/). MANTIS indexes over 1,000 journals and has over 280,000 records in its database. Some full-text articles are available as well.

Another new bibliographic database is Scopus (http://www.scopus.com/), a product of Elsevier that includes 29 million records covering 15,000 journals from 4,000 publishers, including 5,300 health science journals (Burnham, 2006). Scopus also includes links to the full text of articles as well as cited and citing documents. The database also contains patents and scientific Web pages.

For medical educators, the Association of American Medical Colleges (AAMC) has developed MedEdPORTAL (http://www.aamc.org/mededportal), a database of peer-reviewed medical education resources. Each record in the database contains metadata about the resource, such as its educational objectives and document type. For computer programmers, a new bibliographic resource is Krugle (http://www.krugle.com/), which is a bibliographic database of open source computer code as well as information about computer code.

In the bioinformatics/text mining community, a new bibliographic database is Biomedical Literature (and text) Mining Publications (BLIMP, http://blimp.cs.queensu.ca/).

For consumers, a new Web catalog is Healthline (http://www.healthline.com/), which limits its database to Web pages vetted by an expert editorial board and allows users to rate pages and write reviews.

But probably the biggest "buzz" in on-line access to scientific literature comes from Google Scholar (http://scholar.google.com/), which contains links to full-text scientific articles on the Web, even those that are protected by passwords (for subscribers) (Banks, 2005; Henderson, 2005). As will be noted in Chapters 5 and 6, the interface to Google Scholar is similar to that of Google, with searching by words in articles and sorting of results by number of Web links to the article. The latter gives a fascinating ranking of the output of searching on me.  Google Scholar has inspired other approaches, such as Microsoft's Windows Academic Live (http://academic.live.com/), which is currently limited to about 4,300 computer science, electrical engineering, and physics journals.

Banks, M. (2005). The excitement of Google Scholar, the worry of Google Print. Biomedical Digital Libraries, 2: 2. http://www.bio-diglib.com/content/2/1/2.
Burnham, J. (2006). Scopus database: a review. Biomedical Digital Libraries, 3: 1. http://www.bio-diglib.com/content/3/1/1.
Henderson, G. (2005). Google Scholar: a source for clinicians? Canadian Medical Association Journal, 172: 1549-1550.

(4/19/04) A number of bibliographic resources are valuable especially for the biomedical informatics field. One of these was mentioned in the book in the Preface and Chapter 10, but really should be mentioned here. This is CiteSeer (also at one point called ResearchIndex, http://citeseer.ist.psu.edu/), which maintains a database of computer science-oriented (including biomedical informatics) scientific literature. Each record contains bibliographic data, links to the full text (if available), and links to other papers that it cites as well as those that cite it.

(4/18/04)  Other bibliographic databases for computer science include:

4.2.2 Web Catalogs

(4/22/07) OMNI is now called Intute, which itself is a Web catalog maintained by universities in the United Kingdom. The medical portion of the site is at:
http://www.intute.ac.uk/healthandlifesciences/medicine/

(4/17/05) Not really a Web catalog per se, but a growing bibliographic-type resource on the Web is RSS, which is claimed to stand for either Really Simple Syndication or Rich Site Summary (Pilgrim, 2002; Hammersley, 2003; King, 2004). RSS "feeds" provide short summaries, typically of news or other recent postings on Web sites. Many news sites, such as CNN (www.cnn.com), BBC (www.bbc.co.uk), and USA Today (www.usatoday.com) provide them. Users receive RSS feeds through an RSS aggregator that can typically be configured for the site(s) desired and to filter based on content. (An RSS aggregator is built into the new Firefox Web browser from Mozilla.org.)

There are unfortunately a number of different versions of RSS, although each has the fundamental fields and most aggregators can handle all of the different versions. The various versions can be grouped into two categories.  One category (version 1.0) builds on the Resource Description Framework (RDF) and aims to allow rich metadata, while the other category (version 2.0) uses plain XML and aims to keep things very simple. The fundamental fields of RSS include:
Here is an example of XML code from an RSS item from the BBC:
<title>Google maps give fresh perspective</title>
<link>http://news.bbc.co.uk/go/rss/-/2/hi/technology/4448807.stm</link>
<description>Search engine Google offers users the chance to see satellite photos of many locations in North America.</description>
RSS is not limited to news feeds. In fact, there are a growing number of innovative uses for it in scientific fields (Hammond et al., 2004). Certainly it can be used for newly published scientific papers as an information notification application, similar to the electronic table of contents most journals already offer. This is already being done by the Nature Publishing Group and Highwire Press. Nature also circulates its job advertisements via RSS. More recently, NLM has made MEDLINE records available via RSS (Canese, 2005).
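
As a rough sketch of what an aggregator does with the title, link, and description fields of an item like the BBC example, the following Python uses only the standard library. The enclosing <item> element, the parse_item and matches function names, and the keyword filter are assumptions made for illustration; real aggregators handle many more fields and both families of RSS versions.

```python
import xml.etree.ElementTree as ET

# One RSS 2.0-style item modeled on the BBC example.
# Assumption: the three fields sit inside an <item> wrapper element.
ITEM_XML = """
<item>
  <title>Google maps give fresh perspective</title>
  <link>http://news.bbc.co.uk/go/rss/-/2/hi/technology/4448807.stm</link>
  <description>Search engine Google offers users the chance to see satellite photos of many locations in North America.</description>
</item>
"""


def parse_item(xml_text):
    """Return the title, link, and description of one RSS item as a dict."""
    item = ET.fromstring(xml_text.strip())
    return {
        field: (item.findtext(field) or "").strip()
        for field in ("title", "link", "description")
    }


def matches(item, keyword):
    """Hypothetical content filter: case-insensitive keyword match
    against the item's title and description."""
    text = (item["title"] + " " + item["description"]).lower()
    return keyword.lower() in text


entry = parse_item(ITEM_XML)
print(entry["title"])
print(entry["link"])
print(matches(entry, "satellite"))
```

The filter here is deliberately simple; it stands in for the content-based filtering that aggregators can be configured to perform.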

Canese, K. (2005). RSS Feeds Available from PubMed. NLM Technical Bulletin. http://165.112.6.70/pubs/techbull/mj05/mj05_rss.html.
Hammersley, B. (2003). Content Syndication with RSS. Sebastopol, CA. O'Reilly & Associates.
Hammond, T., Hannay, T., et al. (2004). The role of RSS in science publishing: syndication and annotation on the Web. D-Lib Magazine, 10(12). http://www.dlib.org/dlib/december04/hammond/12hammond.html.
King, A. (2004). Introduction to RSS. Westport, CT, WebReference.com. http://www.webreference.com/authoring/languages/xml/rss/intro/.
Pilgrim, M. (2002). What is RSS? XML.com. http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html.

(4/19/04) Another Web catalog that is limited exclusively to "high-quality," i.e., evidence-based resources, is the Translating Research into Practice (TRIP) database (http://www.tripdatabase.com). The TRIP database allows searching over the titles and/or full text of over 70 on-line resources, from full-text journals (e.g., British Medical Journal) to electronic textbooks (e.g., eMedicine) to EBM databases (e.g., Bandolier). There is a basic free version and a more enhanced commercial version.

Two Web catalogs mentioned in the book have become defunct since publication. The original Medical Matrix no longer exists, although a commercial product by the same name offers prescription claims processing. CliniWeb (http://www.ohsu.edu/cliniweb/), my own pride and joy from the early days of the Web, has also been retired.

4.2.3 Specialized registries

4.3 Full-Text

4.3.1 Periodicals

(4/22/07) PubMed Central (PMC) has a more direct URL since the book was published: http://pubmedcentral.gov/. PMC continues to grow, albeit slowly. It contains articles from over 320 journals as of April, 2007. PMC also contains author-submitted manuscripts reporting research funded by NIH grants, based on a policy that encourages grantees to submit the final version of the manuscript sent to the journal after peer review but before typesetting (http://pubmedcentral.gov/about/authorms.html, http://publicaccess.nih.gov/). These are submitted by authors using the NIH Manuscript Submission System (NIHMS).

PMC content can be accessed either by searching or browsing from the PMC site or via linkages from MEDLINE records displayed in PubMed. The rules for journals joining PMC continue to evolve as well, with the latest instructions at:
http://pubmedcentral.gov/about/pubinfo.html

One counter-response to PMC has been for journals adhering to the Washington DC Principles for Free Access to Science to make articles from their archives (6-12 months old) freely available on their Web site. This has also increased the amount of "free" science available on the Web.

NLM and PMC have attempted to bring more standardization to electronic journal publishing with a new Archiving and Interchange Document Type Definition (DTD) (http://dtd.nlm.nih.gov/). This provides a standard way to format content for NLM databases in XML. A related Journal Publishing DTD is optimized for authoring and initial XML tagging of journal material. Likewise, a PubMed Journal Article DTD has been created for the submission of citations and abstracts for MEDLINE/PubMed and a Book DTD has been developed for the NCBI Bookshelf.

Another effort of PMC is to scan back issues of the included journals (http://www.pubmedcentral.gov/about/scanning.html). The scanned pages for each article are combined into a single PDF file. The text has optical character recognition (OCR) applied for searching, although OCR errors are not corrected.

(4/24/06) The full text of the biomedical informatics literature is increasingly available for free. AMIA has made its proceedings from 1997 to 2003 available (there was no AMIA conference in 2004 due to AMIA hosting MEDINFO 2004, and subsequent proceedings have not been made available) at:
http://www.amia.org/pubs/proceedings/symposia/start.html
In addition, the journals JAMIA, JMLA, and BMC Medical Informatics and Decision Making are available in PMC. As noted in the book, an important source of information about bioinformatics databases and systems comes from the annual database issue of Nucleic Acids Research. The publisher of this journal, Oxford Journals, has made this issue freely available under an open access model. In fact, in 2005, the entire journal adopted an open access model. The most recent database issues can be accessed at the following URLs:
(4/18/04) The coalescence of the Elsevier publishing empire has allowed the company to merge the content from the 1,800+ scientific journals it publishes into a single database (and search system) called Science Direct (http://www.sciencedirect.com).

4.3.2  Textbooks

(4/24/07) Another large and growing collection of on-line textbooks is the NCBI Bookshelf (http://www.ncbi.nih.gov/entrez/query.fcgi?db=Books). Part of the NCBI Entrez system, this resource provides access to the full text of several dozen commercially published textbooks. Many of the books cover topics in cellular and molecular biology. Some of these books are also formatted for handheld devices and can be downloaded for loading on to them (http://www.ncbi.nlm.nih.gov/entrez/query/Books.live/Help/mobile.html).

One book within this collection is particularly relevant to health and biomedical IR:  The NCBI Handbook. This book has 23 chapters on a variety of topics relevant to NCBI databases:
Related books that help searchers of NCBI resources include NCBI Help Manual and NCBI Short Courses.

Another book produced by NCBI and available in this collection is Genes and Disease, which provides a description of the roles genes play in a variety of human diseases. Also fashioned into a book for this collection is the Health Services/Technology Assessment Text (HSTAT) database, which contains all of the evidence reports, technology assessments, and practice guidelines of the Agency for Healthcare Research & Quality (AHRQ, http://www.ahrq.gov).

A growing number of other textbooks are being converted to PDA format. The largest vendors of PDA-based medical textbooks include Skyscape (www.skyscape.com) and Unbound Medicine (www.unboundmedicine.com).

One type of book that is natural for electronic format is the drug compendium. Two of note are:

4.3.3 Web sites

(4/22/07) Kamel Boulos describes the new world of wikis, blogs, and podcasts for use in medical education.

One of these types of Web resources that has grown substantially since publication of the second edition of the book is the wiki, or free encyclopedia. Wikis allow any individual in a community to write or edit an entry. This allows massive distributed and collaborative work to be done. For example, the prototype wiki, Wikipedia (http://en.wikipedia.org/wiki/Main_Page), has over 7 million entries, with over 1.7 million in English (http://s23.org/wikistats/wikipedias_html.php?sort=good_desc and http://en.wikipedia.org/wiki/Special:Statistics). However, the distributed approach is a double-edged sword, with no guarantee of authority or accuracy for any topic (Terdiman, 2005), leading one author to describe it as a "faith-based encyclopedia" (McHenry, 2004). On the other hand, as noted in the Chapter 2 update, Wikipedia may be more accurate and comprehensive than the venerable Encyclopedia Britannica (Giles, 2005), although the publisher of the latter takes exception to that claim (Anonymous, 2006).

At least two wikis are devoted to general medical topics:
Anonymous (2006). Fatally Flawed - Refuting the recent study on encyclopedic accuracy by the journal Nature. Chicago, IL, Encyclopedia Britannica. http://corporate.britannica.com/britannica_nature_response.pdf.
Giles, J. (2005). Internet encyclopaedias go head to head. Nature, 438: 900-901. http://www.nature.com/nature/journal/v438/n7070/full/438900a.html.
McHenry, R. (2004). The Faith-Based Encyclopedia. Tech Central Station. November 15, 2004. http://www.techcentralstation.com/111504A.html.
Terdiman, D. (2005). Wikipedia Faces Growing Pains. Wired News. January 10, 2005. http://www.wired.com/news/print/0,1294,66210,00.html.

(4/22/07) Almost all of the URLs listed for clinical practice guidelines in the book have changed since publication:
Other collections of clinical practice guidelines include:
Other URLs in this section have changed as well:
(4/22/07) A growing type of Web content is the weblog or blog. A blog is essentially a running commentary on a topic maintained by a person or community. While probably less widespread for biomedical topics, blogs are extremely popular in the political realm. They are also popular in virtual communities with an interest in a diversity of topics.

One of the interesting effects from blogs early on was their impact on the Google PageRank searching algorithm. When words were repeatedly linked to a specific Web site, they could cause that Web site to rise up in Google's search rankings. A well-known example of this was the search "miserable failure," which those opposed to the policies of George W. Bush were able to associate with links to his biography. (Bush's biography ranked at the top of Google output for the search miserable failure.) Some call this activity "Google bombing." This aspect of Google's behavior has not been without controversy, e.g., Google's placing of an anti-Semitic Web site near the top of its rankings when the word "jew" is entered (see http://www.google.com/explanation.html). Google has finally put an end to "Google bombing" by changing its algorithm (Sullivan, 2007).
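
The effect of repeated link text can be illustrated with a toy ranking in Python. This is only a sketch under strong assumptions: it counts how many links use a phrase as their anchor text and ranks targets by that count. It is not Google's algorithm, and the function name and example.org URLs are made up for illustration.

```python
from collections import Counter


def rank_by_anchor_text(links, query):
    """Rank target URLs by how many inbound links use `query` as anchor text.

    `links` is an iterable of (anchor_text, target_url) pairs.
    Toy illustration only; real search engines combine many more signals.
    """
    wanted = query.strip().lower()
    counts = Counter(
        target for anchor, target in links
        if anchor.strip().lower() == wanted
    )
    # most_common() returns targets ordered from most to fewest matching links
    return [url for url, _ in counts.most_common()]


# Hypothetical link data: many pages use the same phrase to point at one target.
links = [
    ("miserable failure", "http://example.org/biography"),
    ("Miserable Failure", "http://example.org/biography"),
    ("miserable failure", "http://example.org/biography"),
    ("miserable failure", "http://example.org/other-page"),
    ("home page", "http://example.org/other-page"),
]

ranking = rank_by_anchor_text(links, "miserable failure")
print(ranking)
```

In this toy model, a coordinated group of linkers can push any page to the top for a phrase simply by outnumbering other links that use it.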

Sullivan, D. (2007). Google Kills Bush's Miserable Failure Search & Other Google Bombs. SearchEngineLand. January 25, 2007. http://searchengineland.com/070125-230048.php.

(4/22/07) A number of commercial collections of patient-based information, available to health care organizations by license for use on their internal Web sites, have become available:
There are also growing numbers of free consumer-oriented resources available as well, including:

4.4 Databases/Collections

4.4.1 Images

(4/26/06) As with other content, the number of image databases continues to grow. A Web page with links to many medical image sites is available at:
http://www.library.uthscsa.edu/internet/ImageDatabases.cfm

Some updates of URLs from the book:
Another collection of pathology images is the Pathology Education Instructional Resource (PEIR, http://www.peir.net/). The Health Education Assets Library (HEAL, http://www.healcentral.org/) is a project aiming to create a national repository of free, Web-based multimedia teaching materials in the health sciences. Associated with each image is a standard metadata record based on the Dublin Core Metadata Initiative (DCMI, http://www.dublincore.org/), which is described in Chapter 5. The Digital Anatomist Project (http://sig.biostr.washington.edu/projects/da/) models anatomical structures and the knowledge associated with them (Brinkley and Rosse, 1997). Its indexing approach is briefly described in the update for Chapter 5.

A commercial image encyclopedia has been published by Current Medicine, Images.MD (http://www.images.md/). Another image collection has been assembled for image retrieval research. The CasImage collection (http://www.casimage.com/) was developed at University Hospitals of Geneva and consists of anonymized textual case reports each linked to one or more anonymized images associated with the case (Rosset et al., 2004). A large majority of the case reports are in French, but about 20% are in English. The operational system that collected the images has also been described (Rosset et al., 2002).

A fascinating site for images is Flickr (http://www.flickr.com/), which lets individuals upload their pictures and allows anyone to annotate them. This site was recently acquired by Yahoo.

Brinkley, J. and Rosse, C. (1997). The Digital Anatomist distributed framework and its applications to knowledge-based medical imaging. Journal of the American Medical Informatics Association, 4: 165-183.
Rosset, A., Müller, H., et al. (2004). Casimage project: a digital teaching files authoring environment. Journal of Thoracic Imaging, 19: 103-108.
Rosset, A., Ratib, O., et al. (2002). Integration of a multimedia teaching and reference database in a PACS environment. Radiographics, 22: 1567-1577.

4.4.2 Genomics

(4/24/06) The LocusLink database of NLM has been superseded by Entrez Gene. While the latter maintains all of the records of the former (including Gene References into Function or GeneRIFs), it covers a much larger number of organisms and is integrated within the Entrez searching system. MEDLINE records that contain information about a gene in Entrez Gene now allow linkage to it through the "Link Out" function. The NLM's approach to gene indexing was recently described by Ward (2005).

GenBank and its related sequence databases (EMBL and DDBJ) have surpassed 100 million sequences (http://www.ncbi.nlm.nih.gov/Genbank/). For up-to-date statistics on GenBank, see:
http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html

Another new genomics resource from NLM is the Genetics Home Reference (Mitchell et al., 2004). The system draws on publicly available resources, most of which are written for professionals, but presents them with additional material to provide a view more understandable to the lay public.

Mitchell, J., Fun, J., et al. (2004). Design of Genetics Home Reference: a new NLM consumer health resource. Journal of the American Medical Informatics Association, 11: 439-447.
Ward, J. (2005). Gene Indexing and Entrez Gene. NLM Technical Bulletin. e6. http://165.112.6.70/pubs/techbull/ma05/ma05_gene.html.

(4/24/07) A resource of growing importance in genomics is the model organism database, where all information (e.g., gene nomenclature, nucleotide and protein sequences, literature references, and other data) is brought together into a unified resource. The major model organism databases were described by Bahls et al. (2003).  An accompanying article described the challenges of building and maintaining such databases (Perkel, 2003). The five most-developed model organism databases include:
Naturally, the development of all these model organism databases has led to the development of a tool to facilitate their construction, the Generic Model Organism Database Construction Kit (http://www.gmod.org/) (Stein et al., 2002).

A growing effort is under way among these databases and other resources to annotate the function of genes and proteins in biology. Most of the annotation is done using the Gene Ontology, which is described in the next chapter. One resource that attempts to bring together the names, annotations, and linkages to data sets for genome-scale analysis is SOURCE (http://source.stanford.edu), developed at Stanford University (Diehn et al., 2003).  Another attempt focused on the human genome is the GDB Human Genome Database (http://www.gdb.org/).

An original aggregation in the molecular biology domain was the Transparent Access to Multiple Bioinformatics Information Sources (TAMBIS) system (Goble et al., 2001), which is no longer available. A newer aggregation, however, is the Online Bioinformatics Resources Collection (OBRC, http://www.hsls.pitt.edu/guides/genetics/obrc) from the University of Pittsburgh.

Bahls, C., Weitzman, J., et al. (2003). Biology's models. The Scientist. June 2, 2003. 5. http://www.the-scientist.com/yr2003/jun/feature_030602.html.
Diehn, M., Sherlock, G., et al. (2003). SOURCE: a unified genomic resource of functional annotations, ontologies, and gene expression data. Nucleic Acids Research, 31: 219-223.
Goble, C., Stevens, R., et al. (2001). Transparent access to multiple bioinformatics information sources. IBM Systems Journal, 40: 532-552. http://www.cs.man.ac.uk/~stevensr/papers/goble01.pdf.
Perkel, J. (2003). Feeding the info junkies. The Scientist. June 2, 2003. 39. http://www.the-scientist.com/yr2003/jun/feature14_030602.html.
Stein, L., Mungall, C., et al. (2002). The generic genome browser: a building block for a model organism system database. Genome Research, 12: 1599-1610.

4.4.3 Citations

(4/24/07) A growing number of bibliographic databases provide citation information, such as:
Bakkalbasi et al. (2006) recently compared Google Scholar, Scopus, and Web of Science (Science Citation Index), finding that none of them provided complete coverage of all citations.
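
One simple way to think about this kind of comparison is to treat each database's list of citing documents for an article as a set, and to measure each set against the union of all of them. The Python below is a minimal sketch of that idea, not the method used by Bakkalbasi et al.; the coverage function and the document identifiers are invented for illustration.

```python
def coverage(sources):
    """Return each source's share of the union of citing documents.

    `sources` maps a database name to the set of citing-document IDs it found.
    A share of 1.0 would mean the source found every citation found anywhere.
    """
    union = set().union(*sources.values())
    if not union:
        return {name: 0.0 for name in sources}
    return {name: len(found) / len(union) for name, found in sources.items()}


# Made-up citing-document IDs for a single article (illustration only).
found = {
    "Google Scholar": {"a", "b", "c", "e"},
    "Scopus": {"a", "b", "d"},
    "Web of Science": {"a", "c", "d"},
}

shares = coverage(found)
for name, share in shares.items():
    print(name, share)
```

With data like these, each source misses something another source finds, which is the pattern the comparison reported.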

Bakkalbasi, N., Bauer, K., et al. (2006). Three options for citation tracking: Google Scholar, Scopus and Web of Science. Biomedical Digital Libraries, 3: 7. http://www.bio-diglib.com/content/3/1/7.

4.4.4 Evidence-based medicine databases

(4/24/07) The proliferation of EBM databases continues, with a variety of formats used. This section provides an update on existing resources as well as description of some new ones.

While some EBM purists argue that Up to Date (http://www.uptodate.com/) is not completely evidence-based, e.g., not all statements are tagged with levels of evidence or support from studies of the highest quality evidence, the resource is comprehensive and very popular among clinicians as well as those in training. Up to Date has about 4,500 topic reviews in adult and pediatric medicine which are updated continually. Each topic has an outline that allows easy navigation. One of those outline headings is "Recommendations," which quickly gives the specific clinical recommendations for diagnosis and/or treatment of the problem. Topics are linked to both the MEDLINE references of articles cited as well as a drug compendium for specific prescribing information. Up to Date also provides a "What's New" area for each clinical topic, describing the latest clinical news in a given field. The system has also been enhanced with links to the Lexi-Comp drug reference, PubMed MEDLINE references, and patient education information.

(Historical note:  I programmed the first version of Up to Date! This was before it was called Up to Date and when it was considerably less developed than it is now. The founder of Up to Date, Dr. Burton Rose of Brigham & Women's Hospital, sought an informatics fellow to help develop his idea of a resource that would provide simple yet authoritative information to physicians. Due to my interest in IR, I took on the project, developing the first version in Apple Hypercard. Of course, I finished my fellowship and moved on, and another fellow, Dr. Joseph Rush, took on the project and now is the senior programmer for Up to Date. Drs. Rose and Rush have built a substantial enterprise from those humble beginnings!)

Another resource growing in size and comprehensiveness is PIER: The Physicians' Information and Education Resource (http://pier.acponline.org/) from the American College of Physicians (ACP, http://www.acponline.org/), the specialty society for internal medicine. PIER is designed to be the comprehensive information resource for practitioners of adult primary care medicine.

At this time, PIER is only available to members of the ACP. PIER is organized into modules that are categorized under six topic types:
As of now, the largest category of modules is Diseases, with over 500 developed. The content for each disease is organized under the following headings:
Modules also include references, patient information, additional references, and a PDF file of the entire module for printing. A handheld version is also available (http://pier.acponline.org/pierpdajump.html) and the underlying system is constructed in a modular way to allow access via other applications, such as electronic health records. PIER has also been licensed, not only by some conventional publishers but also by some electronic health record vendors for context-aware linkage from the medical record. (This process and the concept of "infobuttons" are presented in the discussion of digital libraries later.) PIER has also completed the full circle back to paper, setting the foundation for a series in Annals of Internal Medicine devoted to providing evidence-based overviews of diseases, such as diabetes mellitus (Laine, 2007).

Every single guidance statement and recommendation in PIER is given a strength of recommendation rating to help the clinician assess their usefulness. These evidence ratings come from the procedure used in another ACP publication, ACP Journal Club (http://www.acpjc.org/shared/purpose_and_procedure.htm#criteria). The strength of recommendation is rated from A-C based on the following criteria:
  A. The preponderance of data supporting this statement is derived from level 1 studies, which meet all of the evidence criteria for that study type
  B. The preponderance of data supporting this statement is derived from level 2 studies, which meet at least one of the evidence criteria for that study type
  C. The preponderance of data supporting this statement is derived from level 3 studies, which meet none of the evidence criteria for that study type or are derived from expert opinion, commentary or consensus
The evidence criteria vary for the study type (e.g., randomized controlled trials for therapeutic or preventive interventions). References drawn from the medical literature are also given a level of evidence rating:
  1. Studies that meet all of the evidence criteria for that study type
  2. Studies that meet at least one of the evidence criteria for that study type
  3. Studies that meet none of the evidence criteria for that study type or are derived from expert opinion, commentary or consensus
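The two rating scales above can be illustrated with a short sketch. This is not PIER's actual software; the function names are hypothetical, and two points are assumptions made only for illustration: that "preponderance" means the most frequent level among the supporting studies, and that a tie is resolved toward the weaker rating. Expert opinion, commentary, or consensus would be entered as a study meeting none of the criteria.

```python
from collections import Counter

# Assumed mapping: the level that supplies the preponderance of data
# determines the strength-of-recommendation letter.
LEVEL_TO_STRENGTH = {1: "A", 2: "B", 3: "C"}


def evidence_level(criteria_met: int, criteria_total: int) -> int:
    """Return the level of evidence (1-3) for a single study.

    criteria_total is the number of evidence criteria defined for the
    study type; criteria_met is how many of them the study satisfies.
    """
    if criteria_total < 1 or not 0 <= criteria_met <= criteria_total:
        raise ValueError("criteria counts are out of range")
    if criteria_met == criteria_total:
        return 1  # meets all of the evidence criteria
    if criteria_met >= 1:
        return 2  # meets at least one of the evidence criteria
    return 3      # meets none (also used for expert opinion or consensus)


def strength_of_recommendation(levels: list[int]) -> str:
    """Return A, B, or C from the levels of the supporting studies."""
    if not levels:
        raise ValueError("at least one supporting study is required")
    if any(level not in LEVEL_TO_STRENGTH for level in levels):
        raise ValueError("levels must be 1, 2, or 3")
    counts = Counter(levels)
    top = max(counts.values())
    # Assumption: on a tie, choose the weaker (higher-numbered) level.
    preponderant = max(lvl for lvl, n in counts.items() if n == top)
    return LEVEL_TO_STRENGTH[preponderant]


# Hypothetical statement supported by three studies, each judged
# against four criteria for its study type.
studies = [(4, 4), (4, 4), (2, 4)]
levels = [evidence_level(met, total) for met, total in studies]
print(levels, strength_of_recommendation(levels))
```

A real implementation would need the actual evidence criteria for each study type, which are described on the ACP Journal Club page cited above.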
Another widely distributed and comprehensive resource is Clinical Evidence (http://www.clinicalevidence.com/). Billed as an "evidence formulary," Clinical Evidence classifies each intervention for a given medical condition into the following categories:
An additional comprehensive collection of EBM content consists of POEMS ("patient-oriented evidence that matters"), which are short evidence-based synopses whose topics are selected based on the following criteria:
The main component of InfoPOEMS (http://www.infopoems.com/) is InfoRetriever, a resource that includes a variety of evidence-based content and tools, including:
A less comprehensive EBM resource is Evidence-Based On Call (http://www.eboncall.co.uk/), which provides evidence-based summaries of 38 "on-call" medical conditions.

Some EBM collections take newer approaches but are still in the development stage and thus less comprehensive. Designed for clinicians at the University of Washington, PrimeAnswers (http://www.primeanswers.org/) aims to provide the "best evidence at the point of care." The system includes easy access to the other EBM resources, some of which are commercial and thus password-protected. It also features its own new content, consisting of evidence-based summaries of about 20 common clinical conditions, with linkage to the appropriate evidence. The Family Practice Inquiries Network (http://www.fpin.org/) is a project led by leading Departments of Family Medicine in the United States. It features several resources:
Laine, C., Goldmann, D., et al. (2007). In the clinic. Annals of Internal Medicine, 146: 70.

(4/19/04)  The Best Evidence product described in the book is no longer available, although the component publications that made it up, ACP Journal Club and Evidence-Based Medicine, are still available.

4.4.5 Other databases

(4/24/07) Beginning as a database of clinical trials sponsored by NIH, ClinicalTrials.gov has taken on a new role with the requirement for registration of clinical trials. After problems were uncovered with post-inception protocol changes in clinical trials, the International Committee of Medical Journal Editors (ICMJE) adopted a policy of requiring registration at the inception of a study (DeAngelis, 2005). Under this policy, clinical trials must be registered in ClinicalTrials.gov (Zarin, 2005) or other comparable databases (Haug, 2005) before they begin in order to be published later.

ClinicalTrials.gov does not contain results of clinical trials, although many advocate that it or other comparable resources provide them (Korn, 2006). Not only could readers get more details about the results of such trials, but those who carry out systematic reviews would have easier and better access to data. Indeed, Derry et al. (2001) have noted that articles on clinical trials in the medical literature are usually inadequate for reporting adverse events discovered in those trials. Some advocate even larger availability of raw data from clinical trials, although others have expressed caution that not only peer review, but also patient privacy protection, could be compromised (Fisher, 2006). One large-scale approach currently advocated is the Global Trial Bank, promoted by the American Medical Informatics Association (Sim and Detmer, 2005). A recent report commissioned by the NLM focused on clinical trials reporting and databases for the purpose of improving the efficiency of systematic reviews (Carson, 2007). This report provided the following table of other databases of clinical trials beyond ClinicalTrials.gov.

Database | Internet Address | Sponsor

Pharmaceutical industry sponsored
ClinicalStudyResults.org | http://www.clinicalstudyresults.org/home/ | PhRMA
AstraZeneca | http://www.astrazenecaclinicaltrials.com/ | AstraZeneca
Bayer Healthcare | http://www.bayerhealthcare.com/index.php?id=224&L=2 | Bayer Healthcare
Boehringer Ingelheim | http://trials.boehringer-ingelheim.com/Trial_Results/index.jsp | Boehringer Ingelheim
Bristol-Myers Squibb | http://ctr.bms.com/ctd/ResultProductAction.do?type=all | Bristol-Myers Squibb
Eli Lilly | http://lillytrials.com/results/results.html | Eli Lilly
Forest | http://www.forestclinicaltrials.com/CTR/CTRController/CTRWelcome | Forest
Glaxo SmithKline | http://ctr.gsk.co.uk/welcome.asp | Glaxo SmithKline
Novartis | http://www.novartisclinicaltrials.com/clinicaltrialrepository/public/main.jsp | Novartis
Organon | http://www.organon.com/clinical_trials/Clinical_Trial_Results/index.asp | Organon
Roche | http://www.roche-trials.com/ | Roche
Sanofi-Aventis | http://www.sanofi-aventis.us/live/us/en/layout.jsp?scat=E7C27A86-08F4-4798-8241-710051CE000A#p4 | Sanofi-Aventis

Government sponsored
Drugs@fda | http://www.accessdata.fda.gov/scripts/cder/drugsatfda/ | FDA
European Medicines Agency | http://www.emea.eu.int/index/indexh1.htm | EMA
National Cancer Institute Clinical Trials | http://www.cancer.gov/clinicaltrials/results/ | National Cancer Institute
ReFeR (Research Findings Registry), Department of Health research findings directory | http://www.refer.nhs.uk/ViewWebPage.asp?Page=Home | UK Department of Health

Other funding
RCT Bank (Global Trial Bank Project) | http://rctbank.ucsf.edu/Presenter/ (also http://www.globaltrialbank.org) | NLM, AMIA

Carson, S., Cohen, A., et al. (2007). Making Clinical Trial Results Databases Useful for Systematic Reviews. Bethesda, MD, Lister Hill National Center for Biomedical Communications, National Library of Medicine.
DeAngelis, C., Drazen, J., et al. (2005). Is this clinical trial fully registered? A statement from the International Committee of Medical Journal Editors. Journal of the American Medical Association, 293: 2927-2929.
Derry, S., Loke, Y., et al. (2001). Incomplete evidence: the inadequacy of databases in tracing published adverse drug reactions in clinical trials. BMC Medical Research Methodology, 1: 7. http://www.biomedcentral.com/1471-2288/1/7.
Fisher, C. (2006). Clinical trials results databases:  unanswered questions. Science, 311: 180-181.
Korn, D. and Ehringhaus, S. (2006). Principles for strengthening the integrity of clinical research. PLoS Clinical Trials, 1: e1. http://clinicaltrials.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pctr.0010001.
Haug, C., Gotzsche, P., et al. (2005). Registries and registration of clinical trials. New England Journal of Medicine, 353: 2811-2812.
Sim, I. and Detmer, D. (2005). Beyond trial registration: a global trial bank for clinical trial reporting. PLoS Medicine, 2(11): e365. http://medicine.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pmed.0020365.
Zarin, D., Tse, T., et al. (2005). Trial Registration at ClinicalTrials.gov between May and October 2005. New England Journal of Medicine, 353: 2779-2787.

(4/17/05) With the growing concern about bioterrorism and hazardous material incidents more generally, the NLM has created the Wireless System for Emergency Responders (WISER, wiser.nlm.nih.gov). This system is available for both handheld devices (Palm and Pocket PC) and PCs (Windows and Web versions). Data for the system come from a variety of sources, such as the NLM Hazardous Substances Data Bank (HSDB), the Department of Transportation Emergency Response Guidebook, and the POISINDEX system from Micromedex. The output from searching is presented in a different order depending on whether the user is a first responder, hazardous materials (HAZMAT) specialist, or emergency medical system (EMS) specialist.

(4/26/06) A variety of interesting new databases have appeared that do not fall under the rubric of textual databases but are integrated with them:
Bednarz, A. and Dubie, D. (2006). Desktop search tools seen raising red flags. Network World. April 17, 2006. http://www.networkworld.com/news/2006/041706-desktop-security.html.
Dumais, S., Cutrell, E., et al. (2003). Stuff I've seen:  a system for personal information retrieval and re-use. Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Toronto, CA. ACM Press. 72-79. http://research.microsoft.com/~sdumais/SISCore-SIGIR2003-Final.pdf.
Sayers, E. (2005). PubChem:  An Entrez Database of Small Molecules. NLM Technical Bulletin. e2. http://165.112.6.70/pubs/techbull/jf05/jf05_pubchem.html.

(4/19/03) The NIH CRISP database was the subject of a recent newspaper story in which scientists expressed worry that some of the topics of their research could come under increased scrutiny due to their politically controversial nature (Goode, 2003).

Goode, E. (2003). Certain Words Can Trip Up AIDS Grants, Scientists Say. New York Times. http://www.nytimes.com/2003/04/18/national/18GRAN.html.

(4/19/03) An interesting resource outside the domain of health and biomedicine is the Software Engineering Body of Knowledge (SWEBOK, http://www.swebok.org/). The goal of this resource is to map all of the knowledge of the field of software engineering (Bourque et al., 1999). The paper by Bourque et al. summarized the challenges in creating such a resource. For example, where does one draw the line between the discipline of software engineering and related ones, such as computer science, cognitive science, management science, and systems engineering? Likewise, what should be the depth of the material presented? The project chose to adopt the approach of including "generally accepted" knowledge, which applies to most situations most of the time and has widespread consensus about its value and effectiveness. This type of knowledge was distinguished from "advanced and research" knowledge, which was not yet mature, and "specialized" knowledge, which was not yet generally applicable.

The knowledge in SWEBOK is organized under a hierarchical breakdown of topics. Each topic includes:
Bloom, B. and Krathwohl, D. (1984). Taxonomy of Educational Objectives. New York. Addison-Wesley.
Bourque, P., Dupuis, R., et al. (1999). The guide to the Software Engineering Body of Knowledge. IEEE Software, 16(6): 35-44. http://www.lrgl.uqam.ca/publications/pdf/463.pdf.
Vincenti, W. (1990). What Engineers Know and How They Know It: Analytical Studies from Aeronautical History. Baltimore. The Johns Hopkins University Press.

(4/24/07) There are some "body of knowledge" projects in biomedical informatics-related areas. The most developed of these is the Health Information Management (HIM) Body of Knowledge managed by the American Health Information Management Association (AHIMA, http://library.ahima.org/). It includes:
Another effort in medical informatics is the Health Informatics Body of Knowledge (HIBOK, http://www.ehrweb.org/ehrweb/implementation/pages/hibok.htm). At this point, however, it is more of a concept than a reality.

4.5 Aggregations

(4/22/07) A very comprehensive collection of content that has been made available to all clinicians in the United Kingdom is the National Library for Health (http://www.library.nhs.uk/) from the British National Health Service. A variety of free and commercial resources are available, with the latter available only to those with a password. The commercial resources include Clinical Evidence, the full text of over 800 journals, the Cochrane Library, and a variety of bibliographic databases.

From a more resource-poor country comes another comprehensive portal, INFOMED: The Cuban National Health Care Telecommunications Network and Portal (http://www.sld.cu/; Séror, 2006).

A new consumer-oriented aggregation is Revolution Health (http://www.revolutionhealth.com/). It features:
Séror, A. (2006). A case analysis of INFOMED: The Cuban National Health Care Telecommunications Network and Portal. Journal of Medical Internet Research, 8(1): e1. http://www.jmir.org/2006/1/e1/.

(4/20/03) All of the Web sites of the National Cancer Institute (http://www.nci.nih.gov/) are now organized under a single URL (http://cancer.gov/), which includes CancerNet, PDQ, and more.

(4/18/04) Some MEDLINEplus content oriented to the elderly has been repackaged into the NIH Senior Health Web site (http://nihseniorhealth.gov/). Some innovative additional features of this site for elderly people with poor vision and/or low reading ability include the capability to enlarge the font size of the text, increase the contrast by using a black background with white or yellow text, and have the content delivered in spoken format.

(4/18/04) The distinction between aggregations and other resources continues to blur. For example, all of McGraw-Hill's textbooks, including the venerable Harrison's, are now available in a single product called Access Medicine (http://www.accessmedicine.com/). There are increasing linkages across textbooks as well as links to updates, continuing medical education (CME) self-assessments, and other Web resources.

The market for aggregations of clinical content continues to grow. A number of commercial products (beyond those mentioned above or in the book) have emerged either de novo or from the aggregation of previous standalone systems:
Not surprisingly, most of the above clinical references are available in formats for Personal Digital Assistants (PDAs).

Of course, a big challenge that remains with all of these wonderful resources is that they are only aggregated among themselves and not with other resources, perhaps with the exception of linkages to MEDLINE references in PubMed. As a result, one cannot "mix and match" his or her favorite clinical resources into a unified digital library. Chapter 10 discusses digital libraries and what might be done to make this possible from a technical standpoint. Not surprisingly, the real barriers are economic, i.e., publishers do not want to link a user to the resources of a competitor.

(4/24/06) A recently available aggregation of drug-related data comes from the DrugBank site:
http://redpoll.pharmacy.ualberta.ca/drugbank/

Last updated - April 24, 2007