Information Retrieval: A Health & Biomedical Perspective (Second Edition)

William Hersh, M.D.

Springer-Verlag, 2003

Update to Chapter 7 - Evaluation

7.1 Usage frequency

7.1.1 Directly measured usage

(5/19/07) A number of new studies have assessed how often information retrieval (IR) systems are used in clinical settings. The largest body of work comes from Australia. One study analyzed QuickClinical, a system that includes access to Pubmed and several drug databases (Magrabi et al., 2005). A total of 227 general practitioners were given access to the system, 193 of whom made use of it. Those who used the system carried out about 8.7 searches per month. Female physicians were more likely to use the system than male physicians (10 vs. 4 searches per month). Over 80% of usage occurred in exam rooms during normal working hours. Another study looked at a larger population of clinicians who were given access to the Clinical Information Access Program (CIAP, http://www.ciap.health.nsw.gov.au/), a Web site providing access to a wide range of bibliographic and other resource databases (Westbrook et al., 2004). All 55,000 clinicians in the state of New South Wales were provided access, and usage data were analyzed to determine frequency of use. Overall, there were about 48.5 search sessions per 100 clinicians per month. Over 75% of clinicians reported using the system.

Other studies have focused on handheld devices, also called personal digital assistants (PDAs). Garritty and El Emam (2006) performed a systematic review of surveys of physician PDA use and concluded that anywhere from 45% to 85% of physicians use them. The most frequent use appears to be for administrative and organizational tasks, with less use for patient care. Use is somewhat higher among younger physicians and those who are hospital-based. One specific study assessed PDA usage in an intensive care unit (Lapinsky et al., 2004). Of 17 physicians studied, 10 used their PDAs regularly, averaging 32.8 uses per month, though medical applications were accessed only 9.0 times per month.

Another analysis focused on a query log of Pubmed rather than on a population of individual users (Herskovic et al., 2007). The researchers were given a single day's log from around October 2005. They identified "individuals" by Internet Protocol (IP) address and excluded all users with more than 50 queries during the day, reasoning that these were likely automated ("bot") queries. The remaining data contained about 2.7 million queries posed by 624,514 users. The mean number of queries per user was 4.21, while the median was 2. The three most commonly used words were the Pubmed tags [author], [au], and [pmid], followed in frequency by the words cancer, cell, review, and 2005.
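The filtering and summary statistics described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the log format (a list of (IP address, query) pairs) and the function name `summarize_log` are hypothetical. A mean well above the median, as in the study (4.21 vs. 2), indicates a skewed distribution in which a minority of users issue many queries.

```python
from collections import Counter
from statistics import mean, median

# Users with more than this many queries in the day are treated as "bots",
# following the cutoff described in the text.
BOT_THRESHOLD = 50


def summarize_log(log):
    """Summarize a day's query log given as (ip_address, query_text) pairs.

    Returns (total queries, number of users, mean and median queries per user),
    computed after excluding suspected bots. Raises StatisticsError if no users remain.
    """
    per_user = Counter(ip for ip, _query in log)  # queries per IP address
    counts = [n for n in per_user.values() if n <= BOT_THRESHOLD]
    return sum(counts), len(counts), mean(counts), median(counts)


# Toy example: two ordinary users and one address issuing 60 queries.
log = [("10.0.0.1", "breast cancer"), ("10.0.0.1", "smith j [au]"),
       ("10.0.0.2", "12345678 [pmid]")] + [("10.0.0.9", "x")] * 60
print(summarize_log(log))  # (3, 2, 1.5, 1.5)
```

Grouping by IP address is only an approximation of "users": shared machines merge several people, and one person may appear under several addresses.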

A more focused analysis was carried out on 2,272 randomly selected queries. These queries were classified as "informational" (74.4%) or "navigational" (22.1%), the latter appearing to seek specific articles. The number of articles retrieved by these queries varied widely (from 1 to 4.8 million), with a mean of 14,050 and a median of 68. Only 11.2% of queries used Boolean operators, nearly all of them AND. Another 10.6% of queries contained Boolean words (and, or, not) in lower case and were possibly attempting to use them as operators, although, as noted in Chapter 6, Pubmed requires such operators to be in upper case.
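A crude way to reproduce the distinction between upper-case operators and lower-case Boolean words is a pair of regular expressions. This sketch is an assumption, not Herskovic et al.'s classification procedure, and the function name `boolean_usage` is hypothetical. It will also flag ordinary English uses of "not" or "or" as attempted operators, so it overestimates the second category.

```python
import re

# Per the text, PubMed treats AND, OR, and NOT as operators only in upper case.
UPPER_CASE_OPERATOR = re.compile(r"\b(?:AND|OR|NOT)\b")
ANY_CASE_BOOLEAN_WORD = re.compile(r"\b(?:and|or|not)\b", re.IGNORECASE)


def boolean_usage(query: str) -> str:
    """Crudely classify how a query uses Boolean words (a heuristic, not the study's rules)."""
    if UPPER_CASE_OPERATOR.search(query):
        return "operator"    # would be interpreted as a Boolean operator
    if ANY_CASE_BOOLEAN_WORD.search(query):
        return "lowercase"   # lower or mixed case: possibly intended as an operator
    return "none"


for q in ["asthma AND children", "asthma and children", "android apps"]:
    print(q, "->", boolean_usage(q))
```

The word-boundary anchors (`\b`) keep words such as "android" or "NOTE" from being counted as Boolean words.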

Garritty, C. and El Emam, K. (2006). Who’s using PDAs? Estimates of PDA use by health care providers: a systematic review of surveys. Journal of Medical Internet Research, 8: e7. http://www.jmir.org/2006/2/e7/.
Herskovic, J., Tanaka, L., et al. (2007). A day in the life of PubMed: analysis of a typical day's query log. Journal of the American Medical Informatics Association, 14: 212-220.
Lapinsky, S., Wax, R., et al. (2004). Prospective evaluation of an internet-linked handheld computer critical care knowledge access system. Critical Care, 8: R414-R421.
Magrabi, F., Coiera, E., et al. (2005). General practitioners' use of online evidence during consultations. International Journal of Medical Informatics, 74: 1-12.
Westbrook, J., Gosling, A., et al. (2004). Do clinicians use online evidence to support patient care? A study of 55,000 clinicians. Journal of the American Medical Informatics Association, 11: 113-120.

(5/10/03) Recent studies continue to show that physicians are infrequent users of computers for seeking knowledge-based information. Arroll et al. (2002) observed 50 New Zealand family physicians in their offices for a half day. A total of 122 questions were asked, for an average of 2.3 questions per half day. The most common sources used to answer questions were books (39%), colleagues (20%), and other paper sources (16%). Computerized sources were used for only 6% of answers, even though 78% of the physicians had computers on their desks and used them for clinical notes (48%) and Internet access (30%). The reasons the physicians gave for infrequent use of computers to answer questions included the perceived longer time it would take, their familiarity with their own paper-based collections, and their uncertainty over how up to date the computerized resources were.

Arroll, B., Pandit, S., et al. (2002). Use of information sources among New Zealand family physicians with high access to computers. Journal of Family Practice, 51: 8. http://www.jfponline.com/content/2002/08/jfp_0802_0706a.asp.

7.1.2 Reported usage

(5/19/07) A survey of 294 New Zealand family practitioners from 2001 found that about half of physicians at that time reported using the Internet for seeking clinical information (Cullen, 2002). Usage was higher among those who were younger and male, but there were no differences in usage by practice type or location.

Two studies have used surveys to assess usage of the UpToDate medical database product. Maviglia et al. (2002) found that among the 27% of people who responded to an online survey, 64% used the product at least three times per month, and the average use was 14 times per month. Similarly, Meadows et al. (2003) found that about one-quarter of users in a trial subscription reported using the product daily and another half reported using it weekly.

The most recent "cyberchondriacs" survey from Harris Interactive (2006) finds that an estimated 136 million Americans, 80% of all Internet users, have looked online for health information. This figure is corroborated by recent surveys from the Pew Internet & American Life Project (Fox, 2006). Other recent Pew surveys have found that 87% of all Americans on the Internet use it to seek science information, with 20% relying on it as their primary source of news about science (Horrigan, 2006). About 36% of Americans report using Wikipedia (Rainie and Tancer, 2007).

Anonymous (2006). Number of "Cyberchondriacs" - Adults Who Have Ever Gone Online for Health Information - Increases to an Estimated 136 Million Nationwide. Rochester, NY, Harris Interactive. http://www.harrisinteractive.com/harris_poll/index.asp?PID=686.
Cullen, R. (2002). In search of evidence: family practitioners' use of the Internet for clinical information. Journal of the Medical Library Association, 90: 370-379.
Fox, S. (2006). Online Health Search 2006. Washington, DC, Pew Internet & American Life Project. http://www.pewinternet.org/pdfs/PIP_Online_Health_2006.pdf.
Horrigan, J. (2006). The Internet as a Resource for News and Information about Science. Washington, DC, Pew Internet & American Life Project. http://www.pewinternet.org/pdfs/PIP_Exploratorium_Science.pdf.
Maviglia, S., Martin, M., et al. (2002). Usage of UpToDate at an academic medical center (abstract). Journal of General Internal Medicine, 17(Supp1): 204.
Rainie, L. and Tancer, B. (2007). Wikipedia users. Washington, DC, Pew Internet & American Life Project. http://www.pewinternet.org/pdfs/PIP_Wikipedia07.pdf.

(5/10/03) A survey of Internet users published in 2002 noted that 73 million Americans used the Internet to seek health information (Fox, 2002). About 81% of these searchers started at a general search engine (e.g., Google, Yahoo, MSN) while 15% began at a health-specific site (e.g., WebMD). While 45% of searchers started at the top of the output list and worked their way down the listed sites, 39% read the list more carefully and only selected sites that seemed relevant and 12% clicked on a site because they recognized the sponsor or name.

In the same year, Taylor (2002) found that about 80% of all adults who were online had ever used the Web to look for health care information. About 18% said they did so "often," 35% "sometimes," and 27% "hardly ever." This 80% of all those online amounted to 110 million users nationwide, compared with 54 million in 1998, 69 million in 1999, and 97 million in 2001. On average, those who had ever looked for health care information online did so three times per month.

Fox, S. (2002). Search Engines: A Pew Internet Project Data Memo. Pew Internet & American Life Project. http://www.pewinternet.org/reports/toc.asp?Report=64.
Taylor, H. (2002). Cyberchondriacs Update. Harris Interactive. http://www.harrisinteractive.com/harris_poll/index.asp?PID=299.

(5/10/04) Two studies of reported usage have focused on the synthesized clinical resource, UpToDate (http://www.uptodate.com). One study of clinician usage of UpToDate at an academic medical center found that two-thirds of users returning an email survey about its use described themselves as regular users, defined as using it at least three times per month (Maviglia et al., 2002). The average reported usage per user was actually around 14 times per month.

A second study randomized four physician practices to access or no access to UpToDate (Blackman et al., 2002). Each physician was interviewed after each patient seen for a total of 678 patient visits over five weeks. The average number of questions per visit was about the same, with 0.18 for intervention physicians and 0.21 for control physicians. Similar to previous studies assessing information resource usage, control physicians answered questions most commonly with textbooks (10.7%), computer-based literature searching (6.4%), information handbooks (2.9%), colleagues (2.9%), and medical Web sites (2.9%). The intervention physicians, however, used UpToDate for 50% of questions, followed by textbooks (13.8%), computer-based literature searching (12.5%), and colleagues (6.7%).

Both of these studies were limited by small sample sizes and reporting only as abstracts. The study of Maviglia et al. was an email survey with a 27% response rate.

Blackman, D., Cifu, A., et al. (2002). Can an electronic database help busy physicians answer clinical questions? (abstract). Journal of General Internal Medicine, 17(Supp1): 220.
Maviglia, S., Martin, M., et al. (2002). Usage of UpToDate at an academic medical center (abstract). Journal of General Internal Medicine, 17(Supp1): 204.

7.2 Types of usage

(5/19/07) The study by Arroll et al. (2002) above roughly confirmed the data presented in Section 7.2 of the book: the most common types of questions asked were about treatment (39%), diagnosis (33%), administration (19%), monitoring (4%), prevention (2%), and general review (2%). Another study, of faculty physicians, found somewhat different proportions (Schwartz et al., 2003), with therapy (50%), prognosis (14%), epidemiology (13%), and prevention/screening (11%) questions most common.

The recent study by Fox cited above found that consumers searching the Web for health information most often were looking for information about a specific disease or medical problem (63%), a certain medical procedure or treatment (47%), or diet or nutrition information (44%).

Schwartz, K., Northrup, J., et al. (2003). Use of on-line evidence-based resources at the point of care. Family Medicine, 35: 251-256.

7.3 User satisfaction

(5/19/07) Continuing a finding reported in the book, nearly all studies that evaluate general search tools or specific products find high user satisfaction. Most of the major surveys of consumers, e.g., those by Pew and Harris Interactive, likewise find general satisfaction with the information found.

7.4 Searching quality

7.4.1 System-oriented performance evaluations

(5/10/04) O'Rourke et al. (1999) report improvements in precision without impact on recall for mediated searches that used a form prompting the user for the evidence-based medicine "anatomy" of the question, i.e., the PICO classification of the patient or condition, intervention, comparison, and outcome.
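As an illustration of how a PICO form might feed a search, the following sketch turns the four fields into a Boolean query by AND-ing the non-empty elements. The class name, field names, and the rule of joining everything with AND are hypothetical; O'Rourke et al. describe a prompting form for mediated searches, not this code.

```python
from dataclasses import dataclass


@dataclass
class PicoQuestion:
    """Hypothetical form fields for a PICO-structured clinical question."""
    patient: str            # patient or condition
    intervention: str
    comparison: str = ""    # optional in this sketch
    outcome: str = ""       # optional in this sketch

    def to_query(self) -> str:
        """AND together the non-empty elements, each wrapped in parentheses."""
        elements = [self.patient, self.intervention, self.comparison, self.outcome]
        return " AND ".join(f"({e.strip()})" for e in elements if e.strip())


question = PicoQuestion("acute otitis media", "amoxicillin", outcome="recurrence")
print(question.to_query())  # (acute otitis media) AND (amoxicillin) AND (recurrence)
```

A fuller version might let the searcher enter synonyms for each element and combine them with OR inside the parentheses before AND-ing the elements together.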

O'Rourke, A., Booth, A., et al. (1999). Another fine MeSH: clinical medicine meets information science. Journal of Information Science, 25: 275-281.

7.4.1.1 Bibliographic system performance

7.4.1.2 Full-text system performance

(5/20/07) Koonce et al. (2004) compared several evidence-based resources for their ability to answer two types of clinical questions: 40 complex questions generated during clinical rounds and 40 general care management questions. Rather than comparing the resources against each other, they used all of them to identify the best answer. They found that 20% of the complex clinical questions and 47.5% of the general care management questions were completely answered, while 40% and 22.5%, respectively, were partially answered. The remainder were unanswered.

Trumble et al. (2006) carried out another comparison of "point of care" evidence-based knowledge tools. They looked at the major market segment leaders in this area, assessing them by the quality of their evidence as well as other factors deemed important by an expert panel, such as breadth of information, depth of information, searchability, links to Pubmed, and availability of PDA versions. Each product was then ranked by quality of evidence, by the factors deemed most important, and by an overall score. The clear leader in all categories was the ACP PIER product, followed by Clinical Evidence and DynaMed.

Koonce, T., Giuse, N., et al. (2004). Evidence-based databases versus primary medical literature: an in-house investigation on their optimal use. Journal of the Medical Library Association, 92: 407-411.
Trumble, J., Anderson, M., et al. (2007). A Systematic Evaluation of Evidence Based Medicine Tools for Point-of-Care. Houston, TX, Texas Health Science Libraries Consortium. http://ils.mdacc.tmc.edu/THSLC_SCC2006_EBM.zip.

(5/12/03) Alper et al. (2001) assessed a variety of free and commercial databases for answering the clinical questions of primary care family physicians. Because the study focused on resources for family medicine, such venerable internal medicine resources as the Merck Manual and UpToDate were excluded from the analysis. Twenty questions were selected for searching from a database of over 1,200 that had been captured by observing clinical practice. The selected questions covered a broad array of both topics and question types. Two physicians did the searching and found that the top resources for answering questions were as follows:

Resource       URL                      Mean % of questions answered   Mean time to answer questions
Stat!-Ref      www.statref.com          70%                            4.0
MDConsult      www.mdconsult.com        60%                            3.9
DynaMed        www.dynamicmedical.com   60%                            2.4
MAXX           (no longer available)    55%                            4.0
MDChoice.com   www.mdchoice.com         50%                            3.3

The study also found that four combinations of two databases could answer more than 80% of questions, with Stat!-Ref and MDConsult together answering 85%. Two combinations of three databases (Stat!-Ref, MDConsult, and either DynaMed or MAXX) could answer 90% of questions, while some combinations of four databases answered 95%. This study confirmed what many physicians have known for decades: no single secondary literature resource answers all questions, so a variety must be available to meet information needs comprehensively.
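The combination analysis amounts to computing the union of questions answered by each set of resources. The sketch below uses made-up toy data for three hypothetical resources A, B, and C (the study's per-question results are not reproduced here); the function names are also hypothetical. It shows why pairing complementary resources raises coverage more than adding a resource that overlaps heavily with one already chosen.

```python
from itertools import combinations


def coverage(answered_by, combo, n_questions):
    """Fraction of all questions answered by at least one resource in `combo`."""
    answered = set().union(*(answered_by[r] for r in combo))
    return len(answered) / n_questions


def rank_combinations(answered_by, size, n_questions):
    """Score every combination of `size` resources, highest coverage first."""
    scored = [(coverage(answered_by, combo, n_questions), combo)
              for combo in combinations(sorted(answered_by), size)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy data: question IDs 0-9 answered by three hypothetical resources.
answered_by = {"A": {0, 1, 2, 3, 4, 5, 6}, "B": {0, 1, 2, 7, 8, 9}, "C": {3, 4, 5}}
for score, combo in rank_combinations(answered_by, 2, 10):
    print(combo, f"{score:.0%}")
```

With the toy data, A alone answers 70% of the ten questions and A plus B answers all of them, while A plus C still answers only 70% because every question C answers is already answered by A.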

Alper, B., Stevermer, J., et al. (2001). Answering family physicians’ clinical questions using electronic medical databases. Journal of Family Practice, 50: 960-965.

7.4.1.3 Web searching system performance

7.4.2 User-oriented performance evaluations

7.4.2.1 Assessing users by search critique

7.4.2.2 Assessing users with relevance-based measures

7.4.2.3 Assessing users with task-oriented measures

(5/23/07) Several new task-oriented studies have appeared in the literature. Sintchenko et al. (2004) presented eight simulated cases to infectious disease and intensive care physicians to compare the efficiency and effectiveness of three different knowledge resources: antibiotic guidelines, laboratory reports, and laboratory reports augmented with clinical decision support. Efficiency was measured as the time taken to reach a decision, while effectiveness was measured as the agreement of recommendations with a panel of experts. Another measure was the clinical impact score, the product of the usage rate of the given resource and the agreement with the expert panel.

Intensive care physicians (80-93%) were more likely than infectious disease physicians (31-56%) to use any source of knowledge support. The following table shows results of each intervention for both groups combined. These results indicate the best agreement and the most impact with laboratory reports augmented with clinical decision support.

Intervention                              Intervention used   Agreement with experts   Confident or highly confident   Mean time (seconds)   Impact score
None (control)                            NA                  65%                      68%                             113                   NA
Guidelines                                39%                 67%                      75%                             202                   0.26
Laboratory report                         58%                 78%                      78%                             123                   0.45
Laboratory report plus decision support   60%                 97%                      73%                             245                   0.58
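The clinical impact score is defined above as the product of a resource's usage rate and its agreement with the expert panel. Multiplying the "Intervention used" and "Agreement with experts" columns of the Sintchenko et al. results reproduces the reported impact scores, which is a quick check on that reading of the definition. The dictionary and function names are illustrative.

```python
# Usage rate and agreement with experts for each intervention, taken from the
# Sintchenko et al. (2004) results; the control arm has no usage rate, so no score.
RESULTS = {
    "Guidelines": (0.39, 0.67),
    "Laboratory report": (0.58, 0.78),
    "Laboratory report plus decision support": (0.60, 0.97),
}


def impact_score(usage_rate: float, agreement: float) -> float:
    """Clinical impact score as described in the text: usage rate times agreement."""
    return usage_rate * agreement


scores = {name: round(impact_score(*pair), 2) for name, pair in RESULTS.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")  # 0.26, 0.45, 0.58
```

Because the score multiplies the two rates, a highly accurate resource that clinicians rarely consult can still have little impact.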

Westbrook et al. (2005) used clinical scenarios with 44 physicians and 31 clinical nurse consultants (CNCs) to assess an online evidence retrieval system, with methods similar to the studies of Hersh et al. described in the book. In their largest study, Hersh et al. (2002) compared medical and nurse practitioner students answering questions by searching MEDLINE. Westbrook et al., on the other hand, assessed practicing physicians and nurse consultants using a suite of full-text evidence-based resources in addition to MEDLINE. They found that physicians started with a higher pre-searching rate of correct answers on the clinical tasks (37% vs. 18%), but that the retrieval system brought both groups up to the same level (50%). They also found that confidence was more likely to be higher for correct than for incorrect answers, although over half of those whose answers were persistently incorrect (before and after searching) were nonetheless confident in them. In addition, those who initially answered a scenario incorrectly were just as confident in their post-searching answer whether it was correct or incorrect. Both the Hersh et al. and the Westbrook et al. studies demonstrate that retrieval systems, and the confidence they engender, are far from perfect.

McKibbon and Fridsma (2006) used the same questions as Hersh et al. and obtained somewhat similar results. In this study, practicing clinicians were given the questions and allowed to search any of their "usual" resources. The addition of the search system did not improve their answers: 39.1% of questions were answered correctly before searching and 42.1% after. Users went from incorrect to correct answers with searching about as often as they went from correct to incorrect. The researchers found great variation in the ability of different resources to answer questions, with Google/Web and the Cochrane database more likely to lead to correct answers and Pubmed, UpToDate, and InfoPOEMS more likely to lead to incorrect answers.
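A before-and-after design like this one can be summarized by tallying each question's (before, after) correctness. The net change in the proportion correct equals the number of incorrect-to-correct switches minus the number of correct-to-incorrect switches, divided by the number of questions, so when the two kinds of switch occur about equally often, searching produces little net gain. The function name is hypothetical and the data in the example are invented.

```python
from collections import Counter


def summarize_transitions(pairs):
    """Summarize (correct_before, correct_after) booleans, one pair per question.

    Returns (proportion correct before, proportion correct after,
    incorrect-to-correct count, correct-to-incorrect count).
    """
    n = len(pairs)
    tally = Counter(pairs)
    before = sum(b for b, _ in pairs) / n
    after = sum(a for _, a in pairs) / n
    return before, after, tally[(False, True)], tally[(True, False)]


# Invented example: one improvement and one reversal cancel out.
pairs = [(True, True), (False, True), (True, False), (False, False)]
print(summarize_transitions(pairs))  # (0.5, 0.5, 1, 1)
```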

Hersh, W., Crabtree, M., et al. (2002). Factors associated with success for searching MEDLINE and applying evidence to answer clinical questions. Journal of the American Medical Informatics Association, 9: 283-293.
Leroy, G., Xu, J., et al. (2007). An end user evaluation of query formulation and results review tools in three medical meta-search engines. International Journal of Medical Informatics: Epub ahead of print.
McKibbon, K. and Fridsma, D. (2006). Effectiveness of clinician-selected electronic information resources for answering primary care physicians' information needs. Journal of the American Medical Informatics Association, 13: 653-659.
Sintchenko, V., Coiera, E., et al. (2004). Comparative impact of guidelines, clinical data, and decision support on prescribing decisions: an interactive web experiment with simulated cases. Journal of the American Medical Informatics Association, 11: 71-77.
Westbrook, J., Gosling, A., et al. (2005). The impact of an online evidence system on confidence in decision making in a controlled setting. Medical Decision Making, 25: 178-185.

7.5 Factors associated with success or failure

7.5.1 Predictors of success

(5/22/07) Magrabi et al. (2007) have looked at the factors associated with clinicians' use of IR systems. In a survey of 227 Australian general practitioners with access to the QuickClinical system described earlier, they found that few factors were associated with usage; age, level of clinical training, experience, and hours worked were not. They did find, however, that female clinicians were slightly more likely to search than male clinicians. Not surprisingly, those who believed the system improved care were more likely to use it.

As noted above, McKibbon and Fridsma found that some resources (e.g., Google/Web and the Cochrane database) were associated with successful answering whereas others (e.g., Pubmed, UpToDate, and InfoPOEMS) were associated with unsuccessful answering.

Magrabi, F., Westbrook, J., et al. (2007). What factors are associated with the integration of evidence retrieval technology into routine general practice settings? International Journal of Medical Informatics: Epub ahead of print.

7.5.2 Analysis of failure

(5/10/03) Sievert et al. (2001) looked at how lexical variants of terms affected search results for epistaxis as well as three eye conditions: pink eye, conjunctivitis, and color blindness. They found that "bloody nose" did not map to "epistaxis" in MeSH, leading to very poor MEDLINE search results when the former was used. They also noted that in consumer-oriented Web resources (which do not use MeSH), slight variations in the wording of the search (e.g., "bloody nose," "nose bleed," and "nosebleed") led to substantial differences in both the number of pages retrieved and the number of relevant pages. They expressed particular concern for consumers, who are less knowledgeable about medical language than clinicians. Gault et al. (2002) analyzed the MeSH mapping of several common MEDLINE systems and found substantial variation in how effectively they mapped user input to MeSH.
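The failure Sievert et al. describe can be illustrated with a toy lookup from user terms to a controlled heading. The entry-term dictionary and function names below are hypothetical, and the table is far smaller than the real MeSH vocabulary. The point is that normalizing case, hyphens, and spacing catches some variants (e.g., "Nose-bleed"), but a synonym simply absent from the table, such as "bloody nose," still maps to nothing.

```python
import re

# Hypothetical entry-term table mapping user wording to one controlled heading.
ENTRY_TERMS = {
    "epistaxis": "Epistaxis",
    "nosebleed": "Epistaxis",
    "nose bleed": "Epistaxis",
}


def normalize(term: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse runs of whitespace."""
    term = re.sub(r"[^a-z0-9]+", " ", term.lower())
    return " ".join(term.split())


def map_to_heading(term: str):
    """Return the controlled heading for a user term, or None if no entry term matches."""
    return ENTRY_TERMS.get(normalize(term))


for user_term in ["Nose-bleed", "NOSEBLEED", "bloody nose"]:
    print(user_term, "->", map_to_heading(user_term))
```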

Gault, L., Shultz, M., et al. (2002). Variations in Medical Subject Headings (MeSH) mapping: from the natural language of patron terms to the controlled vocabulary of mapped lists. Journal of the Medical Library Association, 90: 173-180. http://pubmedcentral.gov/articlerender.fcgi?artid=100762.
Sievert, M., Patrick, T., et al. (2001). Need a bloody nose be a nosebleed? or, lexical variants causing surprising results. Bulletin of the Medical Library Association, 89: 68-71. http://pubmedcentral.gov/articlerender.fcgi?artid=31706.

(5/11/04) McCray and Tse (2003) assessed search failures (i.e., queries yielding no retrievals) in the NLM's consumer-oriented resources, MEDLINEplus and ClinicalTrials.gov. About 77% of the MEDLINEplus queries and 88% of the ClinicalTrials.gov queries were "in scope." Over two-thirds of these in-scope queries were error-free but simply retrieved no matches. The most common problems were the same in both databases: misspelled words (16% in MEDLINEplus and 27% in ClinicalTrials.gov), use of non-alphanumeric characters (14% and 21%, respectively), and inappropriate search operators (14% and 15%, respectively). Another interesting finding was the minimal use of "consumer" terms, e.g., "nose bleed" and "tube tied," which appeared in less than 0.4% of queries.
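A rough, rule-based version of this kind of failure analysis is sketched below. The categories follow the paragraph above, but the rules, their order, the function name, and the tiny lexicon used to flag possible misspellings are assumptions, not McCray and Tse's method. A real analysis would need a full vocabulary, since any term missing from the lexicon is flagged as a possible misspelling.

```python
import re

# Tiny hypothetical lexicon; a real analysis would need a full medical vocabulary.
KNOWN_WORDS = {"asthma", "diabetes", "breast", "cancer", "clinical", "trial"}
# Treating these upper-case words as inappropriate operators is an assumption.
OPERATOR_WORDS = {"AND", "OR", "NOT"}


def classify_failure(query: str) -> str:
    """Assign one illustrative category to a query that retrieved nothing."""
    tokens = query.split()
    if any(token in OPERATOR_WORDS for token in tokens):
        return "search operator"
    if re.search(r"[^A-Za-z0-9\s]", query):
        return "non-alphanumeric character"
    if any(token.lower() not in KNOWN_WORDS for token in tokens):
        return "possible misspelling"
    return "error-free, no match"


for q in ["asthma AND diabetes", "breast-cancer", "diabetis", "breast cancer trial"]:
    print(f"{q!r}: {classify_failure(q)}")
```

Checking the rules in a fixed order assigns exactly one category per query; a study could instead allow a query to fall into several categories at once.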

McCray, A. and Tse, T. (2003). Understanding search failures in consumer health information systems. Proceedings of the AMIA 2003 Annual Symposium, Washington, DC. Hanley & Belfus. 430-434.

7.6 Assessment of impact

(5/20/07) Pluye and colleagues have studied the impact of IR and other informatics applications on physicians. They began by developing a taxonomy of system impact based on an organizational case study, grouping six types of impact into broader categories according to whether the impact was positive or negative (Pluye and Grad, 2004).

Next they performed a systematic review that gathered studies assessing the impact of IR systems on physicians and classified them according to which of these impacts they showed (Pluye et al., 2005). The 26 studies that met their inclusion criteria showed impact in each of the positive categories, with an estimated one-third of searches having a positive impact. Many searches, however, showed no impact, and a few showed negative impact. Further work compared the impact of IR systems with that of decision support systems, noting that the former were more likely to lead to learning and recall while the latter were associated with practice improvement (Grad et al., 2005).

Westbrook et al. (2005) performed a critical incident technique study and found, similar to Lindberg over a decade earlier, that IR systems could lead to tangible positive benefits in patient care. Semi-structured interviews with 29 clinicians generated 85 episodes in which the system provided a tangible benefit, one quarter of which led to better provision of clinical care. The researchers also used a process of "journey mapping" to trace the "journey" clinicians could take from their initial experiences with systems to using them as key knowledge tools. In another study, these same researchers surveyed users from among the 55,000 clinicians with access to their system, finding that 41% reported direct experience of a benefit (Westbrook et al., 2004).

Grad, R., Pluye, P., et al. (2005). Assessing the impact of clinical information-retrieval technology in a family practice residency. Journal of Evaluation in Clinical Practice, 11: 576-586.
Pluye, P. and Grad, R. (2004). How information retrieval technology may impact on physician practice: an organizational case study in family medicine. Journal of Evaluation in Clinical Practice, 10: 413-430.
Pluye, P., Grad, R., et al. (2005). Impact of clinical information-retrieval technology on physicians: a literature review of quantitative, qualitative and mixed methods studies. International Journal of Medical Informatics, 74: 745-768.
Westbrook, J., Gosling, A., et al. (2004). Do clinicians use online evidence to support patient care? A study of 55,000 clinicians. Journal of the American Medical Informatics Association, 11: 113-120.
Westbrook, J., Coiera, E., et al. (2005). Measuring the impact of online evidence retrieval systems using critical incidents and journey mapping. Studies in Health Technology and Informatics, 116: 533-538.

7.7 What has been learned about IR systems?

(5/20/07) The newer studies described in this update give us more insight into IR systems, but they still show that their impact on day-to-day clinical care, while real, is in many ways modest, in the sense that they are used infrequently. Given the role that search engines play in our lives, however, their importance cannot be denied. Considerable research remains to be done on how to make them better.

Last updated - May 22, 2007