Information Retrieval: A Health & Biomedical Perspective (Second Edition)

William Hersh, M.D.

Springer-Verlag, 2003

Update to Chapter 10 - Augmenting Systems for Users

10.1 Content

10.1.1 Early innovations

10.1.2 Linkage

10.1.2.1 Linkage to the EMR

(5/15/07) Several studies have now evaluated the approach of linking to medical knowledge from the context of the electronic health record (EHR). Each system uses different approaches and evaluation techniques, but a clear pattern emerges: the frequency of system usage is comparable to the usage of IR systems generally, as presented in Chapter 7. Research to date shows that these systems are used infrequently but are valued by users.

One study at Vanderbilt University Medical Center (Rosenbloom et al., 2005) provided access to contextual information during patient order entry and review of laboratory results. The user was presented with a list of knowledge resources to which they could link. A randomized controlled trial compared access to the linked information with access to the same information through non-contextual links. The contextually linked information was utilized more frequently, although it was only accessed about twice a month (once every 16 days).

Another study assessed a medication infobutton application, KnowledgeLink, that was implemented and evaluated by Maviglia et al. (2006) within the Partners Healthcare System EHR system. This infobutton worked by providing a "look up" button where drug names appear in the EHR application, which provided a link to a Web-based information resource with the drug name as the query. The information resources opened in a new browser window so that the user could easily return to the place they left off in the EHR by closing the window. The authors performed a study of KnowledgeLink, assessing its use and impact when linked to two different information resources, Micromedex and SkolarMD. Users were randomized by practice location to have KnowledgeLink link to one or the other reference.

Similar to the previous study, KnowledgeLink was used about twice per month by clinicians, representing 1.2% of all patient encounters. The median session time for usage was short (21 seconds), but users felt their questions were answered 84% of the time and they altered patient care decisions 15% of the time. Although user satisfaction was quite positive, suggestions for improvements included allowing refinement of the query and the ability to select other target resources. The group assigned to Micromedex as the knowledge resource was more likely to use KnowledgeLink than the one assigned to SkolarMD. Primary care physicians and nurse practitioners used the system more frequently than specialists.
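
The look-up mechanism described for KnowledgeLink, a button that sends the drug name to a Web-based resource as the query, can be illustrated with a minimal sketch. The base URL and the parameter name "q" are assumptions made for illustration; they are not the actual Micromedex or SkolarMD interfaces, which are not described here.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint; NOT a real Micromedex or SkolarMD address.
RESOURCE_SEARCH_URL = "https://drug-reference.example.org/search"


def build_lookup_url(drug_name: str, base_url: str = RESOURCE_SEARCH_URL) -> str:
    """Return a link that passes the drug name to the resource as its query."""
    # urlencode escapes spaces and special characters in the drug name.
    query = urlencode({"q": drug_name.strip()})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # The EHR would attach this link to a "look up" button beside the drug name
    # and open it in a new browser window.
    print(build_lookup_url("warfarin sodium"))
```

Because the link carries only the drug name, the user cannot refine the query or choose another resource, which is consistent with the improvements users requested.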

A third evaluation study was performed by Cimino et al. (2006), assessing the infobutton system available at Columbia University Medical Center. The study assessed user satisfaction as well as usage. Specific usage rates were not presented due to the changing nature of the system over the study period, but at its peak, usage was about once per month. The most common scenario for use of the system was during laboratory results look-up. Users were generally satisfied with the system and believed the information was contextually appropriate nearly all of the time.

Additional work by Cimino and colleagues beyond that described in the book includes the development of an "Infobutton Manager" which keeps track of the various information resources, generic questions that can be asked of them, and contexts in which those questions and resources might be used. The specific context of the patient is derived from the electronic health record (EHR) or clinical information system (CIS), e.g., demographic information, diagnoses, test results, and so forth. The system then creates specific infobuttons that provide linkage to available resources with queries to find knowledge-based information appropriate to that context. The framework for this work was described by Cimino et al. (2002).
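
The idea of an Infobutton Manager can be sketched as a small lookup structure: resources, generic questions that can be asked of them, and the contexts in which each question applies. This is only an illustrative sketch and not the Columbia implementation; the context names, question wording, and URLs below are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple
from urllib.parse import quote_plus


@dataclass(frozen=True)
class Resource:
    name: str
    url_template: str  # "{term}" is replaced by the concept of interest


@dataclass(frozen=True)
class GenericQuestion:
    text: str             # generic question with a "{term}" placeholder
    contexts: frozenset   # CIS contexts in which the question is offered
    resource: Resource    # resource expected to answer the question


# Hypothetical resources and questions, for illustration only.
DRUG_REF = Resource("Drug reference", "https://drugs.example.org/search?q={term}")
LAB_REF = Resource("Lab manual", "https://labs.example.org/lookup?test={term}")

QUESTIONS = [
    GenericQuestion("What is the usual dose of {term}?", frozenset({"order_entry"}), DRUG_REF),
    GenericQuestion("What does an abnormal {term} mean?", frozenset({"lab_results"}), LAB_REF),
]


def make_infobuttons(context: str, term: str) -> List[Tuple[str, str]]:
    """Return (question, link) pairs for the questions that apply in this context."""
    return [
        (q.text.format(term=term), q.resource.url_template.format(term=quote_plus(term)))
        for q in QUESTIONS
        if context in q.contexts
    ]


if __name__ == "__main__":
    for question, link in make_infobuttons("lab_results", "serum potassium"):
        print(question, "->", link)
```

Keeping questions and resources in one table means a new resource can be added without changing the clinical application that displays the buttons.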

Part of the work has focused on monitoring system logs from the clinical information system (Chen and Cimino, 2003). Results of logging use of infobuttons in different parts of the CIS demonstrate that their use is context-specific, i.e., the frequency of consulting different information resources varies based on the type of information in the CIS being viewed (Cimino et al., 2003).

Another aspect of this work has focused on identifying and categorizing information needs in an observational setting. Allen et al. (2003) identified six categories of information needs with 11 discernible patterns of questions:
In related work, Currie et al. (2003) developed a classification of information need events for observing use of the CIS:
In this study, four different types of users (attending physician, housestaff, medical student, and nurse) were observed during actual practice in three different clinical settings. The results demonstrated that information need events were frequent (averaging 11 per hour and not varying substantially across clinician types or locations) and more likely to be of the background type (76%) and explicitly expressed (85%).

Another outcome of this research has been the actual development of the "Infobutton Manager" (Cimino and Li, 2003). This tool matches a group of context parameters to information needs and then matches those needs to actual resources.  The context parameters include:
A form to enter the above data is available (although it cannot be used outside the institution) at http://www.dmi.columbia.edu/homepages/ciminoj/howtoUseInfomanage.html. This system has also been implemented on wireless devices (Lei et al., 2003).

An additional challenge with infobuttons is that, at this time, most of them work by hard-coding communications between the EHR and the information resource. To address this problem, the HL7 standards organization has begun work on a standard API between (a) EHR systems and infobutton managers and (b) infobutton managers and information resources. The idea is that by developing a standard interface between these entities, EHR and information resource vendors will not have to provide customized solutions every time this functionality is implemented. The standard is currently evolving in draft format, and a version from 2005 is publicly available (Del Fiol et al., 2005). There is a growing amount of content available from publishers to be linked into EHRs, including from Elsevier (http://www.clinicaldecisionsupport.com/demo.html), Thomson (http://www.micromedex.com/products/hcs/), and Touchworks (http://www.touchworksemr.com/_htm/Mod_PL.asp).

Allen, M., Currie, L., et al. (2003). The classification of clinicians' information needs while using a clinical information system. Proceedings of the AMIA 2003 Annual Symposium, Washington, DC. Hanley & Belfus. 26-30.
Chen, E. and Cimino, J. (2003). Automated discovery of patient-specific clinician information needs using clinical information system log files. Proceedings of the AMIA 2003 Annual Symposium, Washington, DC. Hanley & Belfus. 145-149.
Cimino, J. (2006). Use, usability, usefulness, and impact of an infobutton manager. Proceedings of the AMIA 2006 Annual Symposium, Washington, DC. American Medical Informatics Association. 151-155.
Cimino, J. and Li, J. (2003). Sharing infobuttons to resolve clinicians' information needs. Proceedings of the AMIA 2003 Annual Symposium, Washington, DC. Hanley & Belfus. 815.
Cimino, J., Li, J., et al. (2002). Theoretical, empirical and practical approaches to resolving the unmet information needs of clinical information system users. Proceedings of the 2002 AMIA Annual Symposium, San Antonio, TX. Hanley & Belfus. 170-174.
Cimino, J., Li, J., et al. (2003). Use of online resources while using a clinical information system. Proceedings of the AMIA 2003 Annual Symposium, Washington, DC. Hanley & Belfus. 175-179.
Currie, L., Graham, M., et al. (2003). Clinical information needs in context: an observational study of clinicians while using a clinical information system. Proceedings of the AMIA 2003 Annual Symposium, Washington, DC. Hanley & Belfus. 190-194.
Del Fiol, G., Rocha, R., et al. (2005). HL7 Infobutton Standard API Proposal. Ann Arbor, MI, Health Level Seven. http://cslxinfmtcs.csmc.edu/hl7/arden/2005-05-AMS/HL7-Infobutton-API-2-10-05.doc.
Lei, J., Chen, E., et al. (2003). Development of infobuttons in a wireless environment. Proceedings of the AMIA 2003 Annual Symposium, Washington, DC. Hanley & Belfus. 906.
Maviglia, S., Yoon, C., et al. (2006). KnowledgeLink: impact of context-sensitive information retrieval on clinicians' information needs. Journal of the American Medical Informatics Association, 13: 67-73.
Rosenbloom, S., Geissbuhler, A., et al. (2005). Effect of CPOE user interface design on user-initiated access to educational and patient information during clinical care. Journal of the American Medical Informatics Association, 12: 458-473.

10.1.2.2 Linkage to human knowledge

(5/30/04) Another form of linking to human knowledge is, of course, the reference librarian, described earlier in the book. With the growing amount of on-line information, the marketplace for others (i.e., non-librarians) to fulfill this role has grown. Janes et al. (2001) assessed 20 commercial and noncommercial "expert services" sites that answered information needs for a fee. Three types of questions were developed:
The response rate for all questions by different services was highly variable, with an overall average of 70%. The rate of correctness for questions with verifiable answers was likewise highly variable, but averaged 69% overall. The subjects with the highest rate of verifiable correctness were Shakespeare (100%) and education (90%). Health questions only had a 50% correct rate.

Another question-answering service gaining a high profile is Google Answers (http://answers.google.com/answers/). This service uses an "eBay-like" approach, where a user enters some information about the question and a price of how much money he or she is willing to pay (Kenney et al., 2003). Google maintains a group of "Researchers" who are "experts at finding information" (no further information is given). Entering a higher price will likely result in more detailed research or a quicker answer, according to the site instructions (http://answers.google.com/answers/help.html). The site also allows other users to search over questions that have been entered. Users of the service are discouraged from entering personal information about themselves, requesting private information about others, seeking assistance in conducting illegal activities, seeking help on school examinations or homework, or trying to sell or advertise products. A preliminary analysis by Kenney et al. compared the Google Answers service with reference librarians from Cornell University with 24 questions, with the blinded assessment of answers finding a trend towards better answers with the university librarians. These authors also note a concern expressed by professional librarians about the eBay-like approach of the information seeking, in particular with the non-establishment of a relationship between the patron and librarian.

Also, in recognition that librarian-patron interactions are more likely to be asynchronous, a standard has been developed for question processing transactions. The standard is encoded in XML and handles metadata describing the questions, answers, people, and institutions of the interaction. The standard is currently in the draft-standard-for-trial-use phase, and the activities of the committee developing it are documented on a Web page, http://www.niso.org/committees/committee_az.html, where there is also a functional model (Anonymous, 2002) and a series of use cases (Anonymous, 2003) for the standard.

Another approach to on-line clinical consultation has been the second-opinion service offered by Partners Healthcare, a health system composed of hospitals affiliated with Harvard Medical School (Massachusetts General, Brigham and Women's, and several community hospitals). For a fee, a patient and his or her physician can obtain an Internet-based consultation. A review of the first 79 consultations found that while only a small number resulted in changed diagnoses (4%), a substantial number (90%) resulted in changes in treatment (Kedar et al., 2003).

Anonymous (2002). Question/Answer Transaction Protocol Functional Model. Bethesda, MD, National Information Standards Organization. http://www.loc.gov/standards/netref/funcmodel-wd1.pdf.
Anonymous (2003). Question/Answer Transaction Protocol Use Cases. Bethesda, MD, National Information Standards Organization. http://www.loc.gov/standards/netref/usecases-second-working-draft.html.
Janes, J., Hill, C., et al. (2001). Ask-an-expert services analysis. Journal of the American Society for Information Science and Technology, 52: 1106-1121.
Kedar, I., Ternullo, J., et al. (2003). Internet based consultations to transfer knowledge for patients requiring specialised care: retrospective case review. British Medical Journal, 326: 696-699.
Kenney, A., McGovern, N., et al. (2003). Google meets eBay - what academic librarians can learn from alternative information providers. D-Lib Magazine, 9(6). http://www.dlib.org/dlib/june03/kenney/06kenney.html.

(5/28/06) The Professional's Information Link (PiL) service described in the book is now defunct, mainly due to low usage and excess cost. However, a similar system that has attained greater longevity was developed at Duke University Medical Center: a database of clinical questions, each with ratings of the resources retrieved to answer it (Crowley et al., 2003). In this system, residents enter clinical questions and rate the relevance of the information retrieved, with their ratings vetted by chief residents and attending physicians. Others can later search the resource.

Crowley, S., Owens, T., et al. (2003). A Web-based compendium of clinical questions and medical evidence to educate internal medicine residents. Academic Medicine, 78: 270-274.

10.2 Indexing

(5/30/04) A variety of issues are presented from the vantage of Tim Berners-Lee (2004), originator of the Web as well as promoter of the Semantic Web, on a page maintained at the World Wide Web Consortium.

Berners-Lee, T. (2004). Design Issues - Architectural and Philosophical Points. World Wide Web Consortium. http://www.w3.org/DesignIssues/Overview.html. Accessed: May 30, 2004.

(5/30/04) The Library of Congress has developed an "action plan" for ensuring that there is proper bibliographic control of Web resources and education of indexers and catalogers in the new century.

Anonymous (2004). Bibliographic Control of Web Resources: A Library of Congress Action Plan. Washington, DC, Library of Congress. http://lcweb.loc.gov/catdir/bibcontrol/actionplan.html.

10.2.1 Early approaches

10.2.2 NLM Indexing Initiative

(5/25/05) The NLM developed the system from this project into the Medical Text Indexer (MTI) and made it available to all indexers of its databases (McCray and Aronson, 2002). One change from the approach described in the book (summarized in Figure 10.4) is that the name of the final step has been changed from "Clustering" to "Postprocessing," with further rules and heuristics applied. Some of the rules include the assignment of the check tags "Female" and "Pregnancy" when diseases of pregnancy occur, such as "Preeclampsia." Similarly, the word "pediatrics" generates the check tag "Child" while "hamsters" generates "Animal." Further processing removes very common MeSH terms such as "Test" and "Disease." In addition, there is filtering of terms with inappropriate semantic types, e.g., a chemical heading when no other terms are chemical in nature.
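
The kind of rule applied in the Postprocessing step can be illustrated with a small sketch. It uses only the example rules mentioned above and is not the actual MTI rule base; the function name and the way triggers are matched (case-insensitive comparison against candidate terms and words in the text) are assumptions. The semantic-type filter is omitted.

```python
# Trigger -> check tags to add, taken from the examples in the text.
CHECK_TAG_RULES = {
    "preeclampsia": ["Female", "Pregnancy"],
    "pediatrics": ["Child"],
    "hamsters": ["Animal"],
}

# Very common headings that are removed from the recommendations.
TOO_GENERAL = {"Test", "Disease"}


def postprocess(candidate_terms, text_words):
    """Drop overly general headings, then add check tags for any triggers found."""
    # A trigger may be a candidate heading or a word in the article text.
    triggers = {t.lower() for t in candidate_terms} | {w.lower() for w in text_words}
    kept = [t for t in candidate_terms if t not in TOO_GENERAL]
    for trigger, tags in CHECK_TAG_RULES.items():
        if trigger in triggers:
            for tag in tags:
                if tag not in kept:  # avoid duplicate check tags
                    kept.append(tag)
    return kept


if __name__ == "__main__":
    print(postprocess(["Preeclampsia", "Disease", "Hypertension"], ["pediatrics"]))
```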

One evaluation of MTI with 10 indexers found that the system generally provided partial but not complete coverage of all the MeSH terms the indexers might want to employ for a given article (McCray and Aronson, 2002). The system was therefore viewed as a tool to aid indexing in a semi-automated fashion, though it was also being evaluated for deployment where manual indexing is not intended to be employed at all, such as with meeting abstracts in certain fields (e.g., HIV/AIDS, health services research, and space life sciences).

A more recent evaluation of MTI was described by Aronson et al. (2004). By this time, the system was in full operational use and available to all indexers. The system operated at a rate of about 530 MEDLINE records per hour (3,700 per week, a pace able to keep up with the 500,000+ records generated annually). Indexers consulted the system an average of about 379 times per day. Aronson et al. estimated MTI was used for about 20% of all articles indexed for MEDLINE. They also noted it was used in an automated fashion to index AIDS/HIV, health services research, and space life sciences abstracts in the NLM Gateway.

The system obtains terms from the MetaMap concept recognition system and from terms assigned to articles that are similar (using the PubMed Related Articles function). These two processes generate candidate terms that are then subject to three levels of filtering:
  1. Base - obtaining terms from MetaMap and the PubMed Related Articles function, with certain boosting and substitution of some headings.
  2. Medium - use of additional rules to provide more specificity of headings.
  3. Strict - restricting output to only terms that are recommended by both MetaMap and the PubMed Related Articles function.
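
The difference between the Base and Strict levels can be shown with a simplified sketch. It assumes that Base amounts to pooling the candidates from both sources and ignores the boosting, substitution, and Medium-level rules, whose details are not given here; the function names are hypothetical.

```python
def base_candidates(metamap_terms, related_terms):
    """Pool candidates from both sources, keeping first-seen order (an assumption)."""
    seen, merged = set(), []
    for term in list(metamap_terms) + list(related_terms):
        if term not in seen:
            seen.add(term)
            merged.append(term)
    return merged


def strict_candidates(metamap_terms, related_terms):
    """Keep only terms recommended by both MetaMap and Related Articles."""
    related = set(related_terms)
    return [t for t in base_candidates(metamap_terms, []) if t in related]


if __name__ == "__main__":
    metamap = ["Asthma", "Child"]
    related = ["Child", "Steroids"]
    print(base_candidates(metamap, related))    # pooled candidates
    print(strict_candidates(metamap, related))  # terms both sources agree on
```

Requiring agreement between the two sources shortens the list, which would be expected to trade recall for precision.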
The first part of the evaluation assessed the recall, precision, and F2-measure (weighting recall twice as important as precision) for up to 25 recommendations from MTI for 273 articles. The respective values for all terms were 0.55, 0.29, and 0.46. When MTI output was restricted to main MeSH headings, recall improved to 0.81 while precision decreased to 0.11. For an average article, 7.7 of the MTI recommendations were chosen for indexing, of which 3.0 were main headings. The second part of the study identified some means for improving the interface of the system actually used by the indexers, such as showing where in the article the term appeared and displaying check tags first (presumably since they were most likely to be correct). Other user suggestions included avoiding terms from the Introduction or Background sections of articles, removing general terms when more specific ones were also chosen, and implementing indexing policy rules within the system.
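
The F2-measure reported in the evaluation can be checked with the standard F-beta formula, F = (1 + beta^2) * P * R / (beta^2 * P + R), with beta = 2. The sketch below applies it to the rounded precision and recall for all terms; the result is about 0.47, close to the reported 0.46, and the small gap is presumably due to rounding of the published inputs.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Weighted harmonic mean of precision and recall; beta=2 weights recall twice as heavily."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Rounded values for all terms as reported: precision 0.29, recall 0.55.
    print(round(f_beta(0.29, 0.55), 2))
```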

A subsequent study focused more specifically on which portions of full-text articles were most beneficial for providing terms to indexers (Gay et al., 2005). A test collection of 500 articles, segmented by section, was used. While title and abstract sections provided a strong baseline performance, incremental benefit was found for using terms from table and figure captions as well as sections labelled as results, results and discussion, conclusions, and no header.

Aronson, A., Mork, J., et al. (2004). The NLM Indexing Initiative’s Medical Text Indexer. MEDINFO 2004 - Proceedings of the Eleventh World Congress on Medical Informatics, San Francisco, CA. IOS Press. 268-272. http://ii.nlm.nih.gov/resources/aronson-medinfo04.wheader.pdf.
Gay, C., Kayaalp, M., et al. (2005). Semi-automatic indexing of full text biomedical articles. Proceedings of the AMIA 2005 Annual Symposium, Washington, DC. Hanley & Belfus. http://ii.nlm.nih.gov/resources/amia05.fulltext.w.footer.pdf.
McCray, A. and Aronson, A. (2002). Automated and Semi-Automated Indexing. National Library of Medicine. http://ii.nlm.nih.gov/resources/MTI_091102.pdf.

10.2.3 Information Sources Map

10.2.4 Semantic Web

(5/28/06) A number of books about the Semantic Web continue to appear (Fensel et al., 2002; Daconta et al., 2003; Passin, 2004). The last of these is the most up to date and is relatively easy to read. The author provides a model, adapted from the vision of Berners-Lee, of a layered approach:
A key foundation of the Semantic Web is ontologies, which are formal explicit descriptions of concepts in a domain (sometimes called classes), properties of each concept describing features and attributes of the concept (also called slots or roles), and restrictions on those features and attributes (sometimes called facets or role restrictions) (Noy and McGuinness, 2001). A class can have subclasses that represent more specific concepts. An ontology along with a set of individual instances of classes constitutes a knowledge base. There is a continual evolution of alphabet soup names for various ontology schemas, particularly with regard to the Semantic Web, although the Web Ontology Language (OWL) appears to have carried the day (Anonymous, 2004). A couple of perspectives on ontologies with a biomedical orientation have been published by Grutter and Eikemeier (2002) and Rector (2004).
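
The elements of an ontology described above (classes, subclasses, properties, restrictions, and instances forming a knowledge base) can be made concrete with a toy sketch. It is not OWL and does not follow any particular ontology language; the class and property names are hypothetical examples.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OntologyClass:
    name: str
    parent: Optional["OntologyClass"] = None
    # Property (slot) name -> allowed values (a simple facet); None means unrestricted.
    properties: Dict[str, Optional[set]] = field(default_factory=dict)

    def all_properties(self) -> Dict[str, Optional[set]]:
        """Properties defined on this class plus those inherited from superclasses."""
        inherited = self.parent.all_properties() if self.parent else {}
        return {**inherited, **self.properties}

    def is_a(self, other: "OntologyClass") -> bool:
        """True if this class is `other` or one of its subclasses."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


@dataclass
class Instance:
    cls: OntologyClass
    values: Dict[str, str]

    def violations(self) -> List[str]:
        """Names of properties whose values break a restriction."""
        problems = []
        for prop, allowed in self.cls.all_properties().items():
            value = self.values.get(prop)
            if value is not None and allowed is not None and value not in allowed:
                problems.append(prop)
        return problems


# Hypothetical example classes.
disease = OntologyClass("Disease", properties={"body_system": None})
infection = OntologyClass(
    "InfectiousDisease",
    parent=disease,
    properties={"agent_type": {"bacterium", "virus", "fungus", "parasite"}},
)

# The ontology plus a set of instances constitutes a knowledge base.
knowledge_base = [Instance(infection, {"agent_type": "virus", "body_system": "respiratory"})]

if __name__ == "__main__":
    print(infection.is_a(disease), knowledge_base[0].violations())
```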

While a number of Semantic Web applications have demonstrated proof of the concept, the long-run future of the technology is not clear. Among the challenges are too much knowledge, i.e., systems will not be able to cope with all the "facts" out there, and knowledge "pollution," i.e., systems will not be able to know whether facts are true or to handle them in a context different from their intended use (Passin, 2004).

The HealthCyberMap project still exists but has not made much progress. The most recent enhancement was the use of terminology from the UMLS Metathesaurus (Kamel Boulos et al., 2002). In addition, the book incorrectly stated that the "location" enhancement to the Dublin Core used in HealthCyberMap referred to the location of the publisher when in fact it referred to the geographic location of the disease.

Anonymous (2004). World Wide Web Consortium Issues RDF and OWL Recommendations. Cambridge, MA, World Wide Web Consortium. http://www.w3.org/2004/01/sws-pressrelease.
Daconta, M., Obrst, L., et al. (2003). The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management. Indianapolis, IN. Wiley Publishing.
Fensel, D., Wahlster, W., et al., eds. (2002). Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. Cambridge, MA. MIT Press.
Grutter, R. and Eikemeier, C. (2002). Development of a Simple Ontology Definition Language (SOntoDL) and its application to a medical information service on the World Wide Web. Proceedings of SWWS'01, The First Semantic Web Working Symposium, Stanford, CA. http://www.semanticweb.org/SWWS/program/full/paper47.pdf.
Horrocks, I., Patel-Schneider, P., et al. (2003). From SHIQ and RDF to OWL: the making of a Web Ontology Language. Journal of Web Semantics, 1: 7-26.
Kamel Boulos, M., Roudsari, A., et al. (2002). Towards a semantic medical Web: HealthCyberMap's tool for building an RDF metadata base of health information resources based on the qualified Dublin Core Metadata set. Medical Science Monitor, 8: MT124-MT136. http://www.medscimonit.com/pub/vol_8/no_7/2615.pdf.
Noy, N. and McGuinness, D. (2001). Ontology Development 101: A Guide to Creating Your First Ontology. Stanford University Knowledge Systems Laboratory. http://www.ksl.stanford.edu/people/dlm/papers/ontology101/ontology101-noy-mcguinness.html.
Passin, T. (2004). Explorer's Guide to the Semantic Web. Greenwich, CT. Manning Publications.
Rector, A. (2004). Defaults, context, and knowledge: alternatives for OWL-indexed knowledge bases. Pacific Symposium on Biocomputing 2004, Kona, Hawaii. World Scientific. 226-237. http://helix-web.stanford.edu/psb04/rector.pdf.

10.3 Retrieval

(5/28/06) Another well-known writer in the HCI field, Nielsen, has compiled a now-annual list of the "top ten mistakes" made in the creation of Web pages. The first version was published in 1996, and was followed up in 1999, 2002, 2003, and 2005. He also maintains a list of "all time" top mistakes, which include:
  1. Bad search (how can an IR person disagree?!?)
  2. PDF files for on-line reading
  3. Not changing color of visited links
  4. Non-scannable text
  5. Fixed font size
  6. Page titles with low search engine visibility
  7. Anything that looks like an advertisement
  8. Violating design conventions
  9. Opening new browser windows
  10. Not answering users' questions, e.g., an e-commerce site not showing a price
A page of links to Nielsen's prolific but enjoyable writing is also available at http://www.useit.com/alertbox/.

10.3.1 Starting points

10.3.2 Query Formulation

(5/30/04) Muramatsu and Pratt (2001) lament that commercial Web search engines provide little feedback to the user on how queries are transformed. They developed "transparent queries" to give users better feedback on how systems modify queries. The four modifications included:
A user study found that users understood better the query transformations that had been made of their searches, although they still had significant misunderstandings of what these transformations really did.

McKiernan (2003) recently published a paper describing a variety of novel interfaces to electronic journals. Another such interface has been described by Wiesman et al. (2004), who developed a concept browsing interface that allows the user to select, view definitions, and traverse semantic linkages to other concepts. These concepts are ultimately linked to documents that the user can view.

McKiernan, G. (2003). New age navigation: innovative information interfaces for electronic journals. The Serials Librarian, 45: 87-123. http://www.public.iastate.edu/~gerrymck/NewAge.pdf.
Muramatsu, J. and Pratt, W. (2001). Transparent queries: investigating users' mental models of search engines. Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, LA. ACM Press. 217-224.
Wiesman, F., van den Herik, H., et al. (2004). Information retrieval by metabrowsing. Journal of the American Society for Information Science and Technology, 55: 565-578.

10.3.3 Providing context

(5/23/03) Another graphical approach to visualization of conceptual relationships is Treemaps, developed by Shneiderman and colleagues at the Human-Computer Interaction Laboratory (HCIL) of the University of Maryland. Treemaps allow visualization of hierarchical structures and show attributes of leaf nodes by size and color coding. They enable users to compare sizes of nodes and of sub-trees, and are touted to be especially helpful in spotting unusual patterns. The project Web site is at:
http://www.cs.umd.edu/hcil/treemap-history/
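
How a Treemap turns a hierarchy into nested rectangles can be sketched with the slice-and-dice layout, one well-known treemap layout; whether the HCIL tools use this or another layout is not stated here. Each leaf receives a rectangle whose area is proportional to its size, and the split direction alternates at each level. The tree representation is an assumption made for the example, and color coding is omitted.

```python
def subtree_size(node):
    """Leaves are (label, size); internal nodes are (label, [children])."""
    _, payload = node
    if isinstance(payload, list):
        return sum(subtree_size(child) for child in payload)
    return payload


def slice_and_dice(node, x, y, w, h, vertical=True, out=None):
    """Map each leaf label to a rectangle (x, y, w, h) with area proportional to its size."""
    if out is None:
        out = {}
    label, payload = node
    if not isinstance(payload, list):
        out[label] = (x, y, w, h)
        return out
    total = subtree_size(node)  # assumes positive sizes
    offset = 0.0
    for child in payload:
        share = subtree_size(child) / total
        if vertical:
            # Divide the width among the children, then alternate direction.
            slice_and_dice(child, x + offset * w, y, share * w, h, False, out)
        else:
            # Divide the height among the children, then alternate direction.
            slice_and_dice(child, x, y + offset * h, w, share * h, True, out)
        offset += share
    return out


if __name__ == "__main__":
    tree = ("root", [("a", 2), ("group", [("b", 1), ("c", 1)])])
    for leaf, rect in slice_and_dice(tree, 0, 0, 4, 2).items():
        print(leaf, rect)
```

In the example, leaf "a" occupies half of the 4 x 2 area and "b" and "c" a quarter each, so relative sizes of nodes and sub-trees can be compared at a glance.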

(5/23/03) The book implies that the Topic Map system is a product of Highwire Press. While Highwire Press uses Topic Maps (see Figure 10.8 in the book), the Topic Map project is actually led by Topicmaps.Org, an independent consortium interested in developing the applicability of Topic Maps (http://www.topicmaps.org/). This group has developed an XML grammar for interchanging Web-based Topic Maps, called XML Topic Maps (XTM) Version 1.0, written by the Topicmaps.Org Authoring Group. A recent book describes all aspects of XML Topic Maps (Park and Hunting, 2003).

Park, J. and Hunting, S. (2003). XML Topic Maps - Creating and Using Topic Maps for the Web. Boston. Addison-Wesley.

10.3.4 Assisting relevance feedback and query expansion

10.4 Devices

10.4.1 Handheld devices

(5/28/06) There are few critical evaluations of handheld devices in the health care setting, with most reported research instead focusing on development of applications. Carroll et al. (2002) reviewed the lessons learned from the use of PDAs by clinicians in a neonatal intensive care unit. The main application used was an EMR, with the researchers hypothesizing that the system would reduce the number of times information needed to be transcribed, leading to a reduction in documentation errors. Their evaluation identified a number of limitations of these devices in the clinical setting. (Even though this evaluation focused on patient-specific and not knowledge-based information, the results can be viewed in the context of how an application of the latter type of information might fare in this setting.)

Hardware limitations included the screen size being too small, text entry taking too long, and users fearing loss of their work due to instability of the system. Software limitations included the difficulty of altering the structure of the application once it was deployed, the tools used not being suited to large-scale data manipulation, the asynchronous nature of "hot-syncing," and the limitation of data entry to the PC or the PDA but not both. User issues that arose included what to do with data on pen and paper that could not easily be transcribed into the system and how to handle situations where users devised their own workarounds to the limitations in the system. The researchers concluded that essential features for success in the future should include easier linking to enterprise systems, synchronous but secure data transfer, easier data entry (perhaps using voice recognition), and increased screen size.

More studies have assessed the usage of handheld devices in medical settings. Garritty and El Emam (2006) found that most surveys reported usage by 45-85% of all physicians studied, with a greater amount of usage for administrative and organizational tasks.

Carroll, A., Saluja, S., et al. (2002). The implementation of a Personal Digital Assistant (PDA) based patient record and charting system: lessons learned. Proceedings of the 2002 AMIA Annual Symposium, San Antonio, TX. Hanley & Belfus. 111-115.
Garritty, C. and El Emam, K. (2006). Who’s using PDAs? Estimates of PDA use by health care providers: a systematic review of surveys. Journal of Medical Internet Research, 8: e7. http://www.jmir.org/2006/2/e7/.

10.4.2 Tablet devices

(5/28/06) The tablet device market received a boost from the release of Microsoft Windows XP Tablet PC Edition:
http://www.microsoft.com/windowsxp/tabletpc/default.asp

A recent white paper by a company that produces tablet-based health care applications gives a rosy vision of their use.

Anonymous (2006). Tablet PCs in Health Care. Austin, TX, Motion Computing. http://www.motioncomputing.com/resources/WhitePaper_HealthCare.pdf.

10.5 Digital Libraries

10.5.1 Overview of libraries

(5/14/07) In an eloquent lecture, Weise (2004) reminds us that a library is still a physical place, and that there is virtue to that place. She describes the value of the library as a place, its mission as a place, and the importance of professionalism in librarianship. Roush (2005) has talked about the "infinite library" to which we are evolving.

Roush, W. (2005). The infinite library. Technology Review. May, 2005. http://www.technologyreview.com/articles/05/05/issue/feature_library.asp.
Weise, F. (2004). Being there: library as place. Journal of the Medical Library Association, 92: 6-13.

10.5.2 Definitions and functions of DLs

(5/14/07) A large digital library (DL) project in Europe, DELOS (http://www.delos.info/), has published a "manifesto" on digital libraries (Candela et al., 2007). It presents a model of DLs with a three-tier framework containing the DL, the DL system, and the DL management system. Surrounding the framework is the DL "universe," which addresses six core concepts: content, users, architecture, policy, quality, and functionality. This universe also features four roles of "actors": end-users, designers, system administrators, and application developers.

Another important report was published by the U.S. National Commission on Libraries and Information Science (NCLIS) (Anonymous, 2006) and addressed information policy issues in the face of "mass digitization" of information. The report identified nine areas with potential impact on information policy:
  1. Copyright - How should it be handled in digitization projects?
  2. Quality - What is the quality of optical character recognition (OCR), content, and authentication?
  3. Libraries - What are their roles and priorities for the digital age?
  4. Ownership and preservation - Who will assume long-term ownership of books, journals and other media as well as preserve the public record?
  5. Standardization and interoperability - How can systems and their content communicate with each other?
  6. Publishers - What are the roles of publishers in this era?
  7. Business models - What business models are needed and what will be the impact of the open access movement?
  8. Information literacy - What should be done about information illiteracy?
  9. Assessment - What assessment is being undertaken? How will we know if content and systems are meeting people's needs?

Anonymous (2006). Mass Digitization: Implications for Information Policy. Washington, DC, U.S. National Commission on Libraries and Information Science. http://www.nclis.gov/digitization/MassDigitizationSymposium-Report.pdf.
Candela, L., Castelli, D., et al. (2007). Setting the foundations of digital libraries - the DELOS Manifesto. D-Lib Magazine, 13(3/4). http://www.dlib.org/dlib/march07/castelli/03castelli.html.

(5/30/04) Another model of digital libraries (DLs) is the 5S model of Goncalves et al. (2004). The authors hypothesize that DLs can be modeled, or explained, according to five elements: streams, structures, spaces, scenarios, and societies. These authors also describe a taxonomy that defines the facets of a digital library based on these five elements.

Goncalves, M., Fox, E., et al. (2004). Streams, structures, spaces, scenarios, societies (5S): a formal model for digital libraries. ACM Transactions on Information Systems, 22: 270-312.

(5/25/05) Although the informationist concept continues to generate a fair amount of discussion (and publications!), the concept has yet to see widespread adoption. The original publications cited in the book led to a conference held at the NLM to continue discussion on whether and how such a professional should be developed and function in the clinical setting. The conference resulted in a series of recommendations (Shipman et al., 2002; Plutchak, 2002). Shortly thereafter, the NLM announced funding to train informationists (http://nnlm.gov/scr/scnn/nov-dec03/fellowship.htm). Although there has been little research assessing the efficacy of the informationist approach (e.g., in terms of improved clinical care or reduced information-seeking time by clinicians), several new models have emerged. The most mature of these models is the clinical informationist, with the informationist helping to optimize the use of not only evidence-based information but also informatics tools at the point of care (Giuse et al., 2005). Another approach has been to adapt the model to the biomedical research environment, leading to the clinical bioinformationist model, focusing on molecular biology, genetic analysis, biotechnology, research literature, and databases (Lyon et al., 2004). Florance et al. (2002) describe the challenges of integrating information specialists into various biomedical settings.

Florance, V., Giuse, N., et al. (2002). Information in context: integrating information specialists into practice settings. Journal of the Medical Library Association, 90: 49-58.
Giuse, N., Koonce, T., et al. (2005). Evolution of a mature clinical informationist model. Journal of the American Medical Informatics Association, 12: 249-255.
Lyon, J., Giuse, N., et al. (2004). A model for training the new bioinformationist. Journal of the Medical Library Association, 92: 188-195.
Plutchak, T. (2002). The informationist - two years later. Journal of the Medical Library Association, 90: 367-369.
Shipman, J., Cunningham, D., et al. (2002). The informationist conference: report. Journal of the Medical Library Association, 90: 458-464.

10.5.3 Access to content

(5/25/03) An increasing amount of content and metadata for knowledge-based information is formatted in XML. A book by Ahmed et al. (2001) is devoted to the use of XML for metadata, with extensive coverage of DCMI and RDF. Another important aspect of XML for IR applications is Extensible Stylesheet Language (XSL) Transformations (XSLT), which allows content to be reformatted into many forms, including HTML and PDF.
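
To make the idea of XML-formatted metadata concrete, the following is a minimal sketch in Python that reads a small Dublin Core record using only the standard library. The record and its wrapper element `record` are invented for illustration; only the Dublin Core element namespace is standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the core Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record, made up for this example.
SAMPLE_RECORD = """\
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Information Retrieval: A Health and Biomedical Perspective</dc:title>
  <dc:creator>Hersh, W.</dc:creator>
  <dc:date>2003</dc:date>
  <dc:subject>Information retrieval</dc:subject>
  <dc:subject>Medical informatics</dc:subject>
</record>
"""

def dublin_core_fields(xml_text):
    """Return a dict mapping each DC element name to a list of its values."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"
    fields = {}
    for element in root:
        # ElementTree writes qualified tags as "{namespace}localname".
        if element.tag.startswith(prefix):
            name = element.tag[len(prefix):]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields

metadata = dublin_core_fields(SAMPLE_RECORD)
print(metadata["title"][0])
print(metadata["subject"])
```

Repeatable elements such as `dc:subject` are kept as lists, since Dublin Core allows any element to occur more than once.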

Ahmed, K., Ayers, D., et al. (2001). Professional XML Meta Data. Birmingham, UK. Wrox Press.

(5/25/03) Due to the growing use of XML, it is important to follow the development of XML Query, an emerging standard for querying XML content (Chamberlin, 2002). The XML Query standard has been specified:
http://www.w3.org/XML/Query
In addition, further work has been carried out to ensure that comprehensive full-text searching capabilities are provided, as described in the following reports:
http://www.w3.org/TR/xmlquery-full-text-requirements/
http://www.w3.org/TR/xmlquery-full-text-use-cases/

Chamberlin, D. (2002). XQuery: an XML query language. IBM Systems Journal, 41: 597-615. http://www.research.ibm.com/journal/sj/414/chamberlin.pdf.

(5/24/03) An effort to develop standards for interoperability in the medical community, not limited to knowledge-based applications, is the MedBiquitous Consortium (http://www.medbiq.org/). This consortium of medical specialty societies, universities, and publishers aims to develop Web Services-based standards that will facilitate interoperability of applications devoted to knowledge-based information, educational applications, and maintenance of certification in medical specialties. The current focus of work by MedBiquitous is on metadata for educational applications through enhancement of the Shareable Content Object Reference Model (SCORM, http://xml.coverpages.org/scorm.html), a set of 64 metadata elements that emanate from an expansion of the Dublin Core Metadata Initiative. Links to guides about SCORM development can be found at:
http://www.lsal.cmu.edu/lsal/expertise/projects/developersguide/

10.5.3.1 Access to individual items

(5/30/03) With the growing amount of digital scientific data on the Internet, there is also growing concern over how to make this data accessible and to preserve it. One workshop developed a series of recommendations that addressed the methods, costs, and terminology of archiving such data (Anonymous, 2003).

Anonymous (2003). The Selection, Appraisal and Retention of Digital Scientific Data - ERPANET/CODATA Workshop Final Report. Lisbon, Portugal, Biblioteca Nacional. http://www.erpanet.org/www/products/lisbon/LisbonReportFinal.pdf.

(5/15/07) The book lists a number of approaches to persistent identifiers of digital objects, but in the academic publishing arena, the Digital Object Identifier (DOI) standard has carried the day. Most publishers have adopted the DOI. An overview of the DOI system has been published (Anonymous, 2004), and a further summary is available from CrossRef (http://www.crossref.org/01company/15doi_info.html). Any DOI can be resolved by appending the DOI to the following URL: http://dx.doi.org/. For example, a recent paper that I authored on the informatics profession has the DOI 10.1197/jamia.M1912. Entering the URL http://dx.doi.org/10.1197/jamia.M1912 into a browser will link one to the page on the journal site where the article is published. In the long run, the DOI could be the "complete" citation of another document, although in biomedicine, the PubMed ID (PMID) also vies for the title of universal content identifier.
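
The resolution rule just described is simple enough to sketch in a few lines of Python. The function name `doi_to_url` is hypothetical, and the validity check is only a rough guard (every DOI begins with "10." and separates prefix from suffix with a slash), not a full syntax validation.

```python
from urllib.parse import quote

# Resolver prefix given in the text above.
DOI_RESOLVER = "http://dx.doi.org/"

def doi_to_url(doi):
    """Build the resolver URL for a DOI such as '10.1197/jamia.M1912'."""
    doi = doi.strip()
    # Rough sanity check only: "10." prefix and a "/" before the suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError("not a DOI: " + repr(doi))
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return DOI_RESOLVER + quote(doi, safe="/")

print(doi_to_url("10.1197/jamia.M1912"))
# prints http://dx.doi.org/10.1197/jamia.M1912
```

Because the publisher keeps the DOI record up to date, a link built this way should keep working even if the article moves to a different address on the journal site.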

An outgrowth of the standardization on the DOI is the CrossRef project, which aims to create an infrastructure for linking citations across publishers (http://www.crossref.org/, http://www.crossref.org/01company/16fastfacts.html). Publishers who are members of CrossRef can ensure that the DOIs for the content items they publish will resolve to a valid URL. They can also be assured that outbound links to other content adhering to the CrossRef standard will resolve to a valid URL. These resolutions will be maintained even if the actual URL of the content changes. CrossRef works hand-in-hand with OpenURL (Apps and MacIntyre, 2006), a standard for transporting metadata and identifiers within URLs. A resolver can interpret the transported information in the user's context, which matters when an object exists in more than one place but the user is allowed access to only some of them. For example, a library may not subscribe directly to a journal, but it may subscribe to an aggregation service that does. The library's resolver could then direct the OpenURL to the appropriate URL to access the object.
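
As a rough illustration of how metadata travels inside an OpenURL, the sketch below encodes a journal article citation as OpenURL 1.0 key/encoded-value pairs. The resolver address `resolver.example.edu` is a placeholder rather than a real service, and the helper name `build_openurl` is hypothetical; the citation is the Giuse et al. (2005) paper cited earlier on this page.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical link resolver run by a library; not a real service.
RESOLVER_BASE = "http://resolver.example.edu/openurl"

def build_openurl(article, resolver_base=RESOLVER_BASE):
    """Encode article metadata as OpenURL 1.0 key/encoded-value pairs."""
    params = [
        ("url_ver", "Z39.88-2004"),                       # OpenURL 1.0 version tag
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),  # journal metadata format
        ("rft.genre", "article"),
    ]
    for key in ("atitle", "jtitle", "volume", "spage", "date"):
        if key in article:
            params.append(("rft." + key, article[key]))
    if "doi" in article:
        # An identifier, when known, travels alongside the descriptive metadata.
        params.append(("rft_id", "info:doi/" + article["doi"]))
    return resolver_base + "?" + urlencode(params)

GIUSE_2005 = {
    "atitle": "Evolution of a mature clinical informationist model",
    "jtitle": "Journal of the American Medical Informatics Association",
    "volume": "12",
    "spage": "249",
    "date": "2005",
}

link = build_openurl(GIUSE_2005)
print(link)
# Decoding the query string recovers the original metadata.
print(parse_qs(urlsplit(link).query)["rft.atitle"])
```

The same citation metadata can be sent to any institution's resolver by changing only the base URL, which is what lets each library route the request to the copy its users are entitled to reach.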

A related development is that many publishers have agreed to open their proprietary content (i.e., that in the "invisible Web") for indexing by the Web crawlers in the Google Scholar system (http://scholar.google.com/) (Quint, 2004; Sullivan, 2004). This makes the content searchable via Google, although protected content still requires subscription or other means of paid access. In PageRank-like fashion, the content is ranked by the number of citations to it. (It is interesting to see which of my own papers are cited the most! Not what I would expect!) Citations in Google Scholar have been compared with those in the Science Citation Index and with URL citations in the general Google search engine, with coverage and overlap varying by discipline (Kousha and Thelwall, 2007).

Google is also undertaking other activities with libraries and other producers of scholarly work. Most prominent in the news lately has been its plan to digitize vast stores of books and other documents in a number of prominent university and other public libraries, including Oxford University, Harvard University, Stanford University, the University of Michigan, and the New York Public Library (Roush, 2005). There are a number of challenges whose solutions are not clear, including how users will best interact with the content, how it will all be digitized, and how copyright issues will be resolved. Stay tuned on this one. In a related project, 17 universities using the DSpace digital library system (http://www.dspace.org) from the Massachusetts Institute of Technology will allow their archives, such as papers, technical reports, and other communications of their faculty, to be indexed by Google and made searchable as well (Young, 2004).

Now that the data comprising scientific investigation is more prevalent on the Web, there is a need for standards for its citation. Altman and King (2007) explore the various issues in this type of citation.

Altman, M. and King, G. (2007). A proposed standard for the scholarly citation of quantitative data. D-Lib Magazine, 13(3/4). http://www.dlib.org/dlib/march07/altman/03altman.html.
Anonymous (2004). CrossRef Launches Pilot Program of CrossRef Search, Powered By Google. Lynnfield, MA, Publishers International Linking Association. http://www.crossref.org/01company/pr/press20040428.html.
Anonymous (2004). Introductory Overview - The Digital Object Identifier System, International DOI Foundation. http://dx.doi.org/10.1000/203.
Apps, A. and MacIntyre, R. (2006). Why OpenURL? D-Lib Magazine, 12(5). http://www.dlib.org/dlib/may06/apps/05apps.html.
Kousha, K. and Thelwall, M. (2007). Google Scholar citations and Google Web/URL citations: a multi-discipline exploratory analysis. Journal of the American Society for Information Science & Technology, 58: 1055-1065.
Quint, B. (2004). Google Scholar Focuses on Research-Quality Content. Information Today. November 22, 2004. http://www.infotoday.com/newsbreaks/nb041122-1.shtml.
Roush, W. (2005). The Infinite Library. Technology Review. May, 2005. http://www.technologyreview.com/articles/05/05/issue/feature_library.asp.
Sullivan, D. (2004). Google Scholar Offers Access To Academic Information. Search Engine Watch. November 18, 2004. http://searchenginewatch.com/searchday/article.php/3437471.
Young, J. (2004). Google Teams Up With 17 Colleges to Test Searches of Scholarly Materials. The Chronicle of Higher Education. http://chronicle.com/free/2004/04/2004040901n.htm.

10.5.3.2 Access to collections

(5/15/07) Despite having lost momentum from its non-use by Internet search engines, the Z39.50 effort has not been disbanded. The project is currently being run by the Library of Congress (http://www.loc.gov/z3950/agency/). The project has released specifications for Search/Retrieve for the Web (SRW, also called the Search/Retrieve Webservice) and a Common Query Language (CQL). SRW has recently been enhanced to interoperate with OAI (see below) (Sanderson et al., 2005).
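
SRW itself is a SOAP-based Web service, but its URL-based companion (SRU, the "U" in "SRW/U") carries the same CQL query in an ordinary query string, which makes it easy to sketch. The endpoint `sru.example.org` and the helper names below are hypothetical, and the sketch assumes the server supports the `dc` context set for CQL indexes.

```python
from urllib.parse import urlencode

# Hypothetical SRU endpoint; substitute a real server's base URL.
SRU_BASE = "http://sru.example.org/catalog"

def cql_clause(index, term):
    """Return one CQL search clause, quoting and escaping the term."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return '{} = "{}"'.format(index, escaped)

def sru_search_url(clauses, max_records=10, base=SRU_BASE):
    """Join CQL clauses with 'and' and wrap them in a searchRetrieve request."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": " and ".join(clauses),
        "maximumRecords": str(max_records),
    }
    return base + "?" + urlencode(params)

query_url = sru_search_url([
    cql_clause("dc.title", "digital libraries"),
    cql_clause("dc.creator", "fox"),
])
print(query_url)
```

The CQL string here reads `dc.title = "digital libraries" and dc.creator = "fox"` before URL encoding, which shows the intent of CQL: a query that stays readable to people while remaining unambiguous to the server.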

Sanderson, R., Young, J., et al. (2005). SRW/U with OAI. D-Lib Magazine, 11(2). http://www.dlib.org/dlib/february05/sanderson/02sanderson.html.

10.5.3.3 Access to metadata

(5/15/07) A growing concern about metadata is its quality, which has been addressed by Bruce and Hillmann (2004) as well as Beall (2006). A good primer on metadata for science digital libraries is at http://metamanagement.comm.nsdlib.org/outline.html.

Beall, J. (2006). Metadata and data quality problems in the digital library. Journal of Digital Information, 6(3): 355. http://jodi.tamu.edu/Articles/v06/i03/Beall/Beall.pdf.
Bruce, T. and Hillmann, D. (2004). The continuum of metadata quality: defining, expressing, exploiting, 238-256, in Hillmann, D. and Westbrooks, E., eds. Metadata in Practice. Chicago, IL. American Library Association.

(5/25/03) The lack of interoperability among knowledge-based resources on the Web means that current content is usually maintained in "silos" that dictate usage on their terms, e.g., their search engines, display format, etc. Linkage across resources from different publishers, or across applications (e.g., EMR to IR systems) is difficult and non-standardized.

(5/25/05) The process of harvesting metadata in OAI is called the OAI Protocol for Metadata Harvesting (OAI-PMH) (Van de Sompel et al., 2004). A number of open-source tools have been developed for the OAI-PMH, many of which can be accessed from:
http://www.openarchives.org/tools/tools.html
Biomed Central is very involved in promoting OAI for access to the metadata of its content:
http://www.biomedcentral.com/info/libraries/oai
PubMed Central (PMC) also provides OAI access to its content, using the PMC DTD:
http://www.pubmedcentral.gov/about/oai.html
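
A harvesting exchange under OAI-PMH can be sketched as follows: the harvester issues a ListRecords request and reads identifiers, metadata, and an optional resumption token from the XML reply. The repository address and the response fragment below are hand-written for illustration and are not output from BioMed Central or PMC; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date   # harvest only records changed since this date
    if set_spec:
        params["set"] = set_spec     # restrict to one set within the repository
    return base_url + "?" + urlencode(params)

# Hand-written response fragment for illustration, not real repository output.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1234</identifier>
        <datestamp>2005-02-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example harvested record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-0001</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind("oai:ListRecords/oai:record", NAMESPACES):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NAMESPACES)
        title = record.findtext(".//dc:title", namespaces=NAMESPACES)
        records.append((identifier, title))
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NAMESPACES)
    return records, (token or None)

request_url = list_records_url("http://repository.example.org/oai", from_date="2005-01-01")
records, token = parse_list_records(SAMPLE_RESPONSE)
print(request_url)
print(records, token)
```

When a resumption token is returned, the harvester repeats the ListRecords request with that token in place of the other arguments until the repository sends no further token.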

Brogan (2003) has summarized the growing number of digital library aggregation services. She notes that most of them relied on OAI-PMH. McKiernan (2003, 2003, 2004) has summarized many of the service providers who employ OAI.

One continuing challenge for metadata systems is the proliferation of different ones with different formats. Godby et al. (2004) have proposed a repository of crosswalks allowing translation among metadata elements.
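
In its simplest form, a crosswalk is a lookup table from the fields of one scheme to the elements of another. The sketch below uses a deliberately small, simplified mapping from a few MARC 21 fields to Dublin Core elements; it is an assumption made for illustration and is not taken from the repository proposed by Godby et al. The sample record values are likewise made up.

```python
# Illustrative crosswalk from a few MARC 21 fields to Dublin Core elements.
# The selection is a simplified assumption for this example.
MARC_TO_DC = {
    "100": "creator",    # main entry, personal name
    "245": "title",      # title statement
    "260": "publisher",  # publication information
    "650": "subject",    # topical subject heading
}

def crosswalk(record, mapping):
    """Translate {source_field: [values]} into {target_element: [values]}.

    Fields with no entry in the mapping are returned separately so that
    nothing is silently dropped.
    """
    translated, unmapped = {}, {}
    for field, values in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped[field] = list(values)
        else:
            translated.setdefault(target, []).extend(values)
    return translated, unmapped

# Made-up source record keyed by MARC field tag.
sample = {
    "245": ["Metadata in Practice"],
    "260": ["American Library Association"],
    "650": ["Metadata", "Digital libraries"],
    "300": ["illustrative physical description"],
}

dc_record, leftovers = crosswalk(sample, MARC_TO_DC)
print(dc_record)
print(leftovers)
```

Returning the unmapped fields makes the loss of information visible, which is one of the practical difficulties a shared repository of crosswalks is meant to address.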

Brogan, M. (2003). A Survey of Digital Library Aggregation Services. Washington, DC, The Digital Library Federation, Council on Library and Information Resources. http://www.diglib.org/pubs/brogan/.
Godby, C., Young, J., et al. (2004). A repository of metadata crosswalks. D-Lib Magazine, 10(12). http://www.dlib.org/dlib/december04/godby/12godby.html.
McKiernan, G. (2003). Open Archives Initiative service providers. Part I: science and technology. Library Hi Tech News, 20(9): 30-38. http://www.public.iastate.edu/~gerrymck/OAI-SP-I.pdf.
McKiernan, G. (2003). Open Archives Initiative service providers. Part II: social sciences and humanities. Library Hi Tech News, 20(10): 24-31. http://www.public.iastate.edu/~gerrymck/OAI-SP-II.pdf.
McKiernan, G. (2004). Open Archives Initiative service providers. Part III: general. Library Hi Tech News, 21(1): 38-46. http://www.public.iastate.edu/~gerrymck/OAI-SP-III.pdf.
van de Sompel, H., Nelson, M., et al. (2004). Resource harvesting within the OAI-PMH framework. D-Lib Magazine, 10(12). http://www.dlib.org/dlib/december04/vandesompel/12vandesompel.html.

(5/30/04) There is emerging consensus that not only identifiers of digital content but also metadata should remain persistent. A number of large-scale producers of metadata, including NLM, recently agreed to promote the use of Uniform Resource Identifiers (URIs) for this task (Baker and Dekker, 2003). A Web site has been developed to create a registry of such identifiers, http://info-uri.info/. A core concept behind their approach is that these URIs be non-referenceable, meaning that they do not point to actual locations on the Internet or Web, but instead represent a persistent namespace for metadata elements. This is described in more detail in the "About" page on their Web site (http://info-uri.info/registry/docs/misc/faq.html).
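
Because such URIs name things rather than locate them, software treats them as structured labels instead of addresses to fetch. A minimal sketch, assuming the general form `info:<namespace>/<identifier>` used by the registry and a hypothetical helper name `parse_info_uri`:

```python
def parse_info_uri(uri):
    """Split an 'info' URI into (namespace, identifier)."""
    scheme, sep, rest = uri.partition(":")
    if sep != ":" or scheme.lower() != "info":
        raise ValueError("not an info URI: " + repr(uri))
    # Only the first "/" separates namespace from identifier; the
    # identifier itself may contain further slashes.
    namespace, slash, identifier = rest.partition("/")
    if slash != "/" or not namespace or not identifier:
        raise ValueError("expected info:<namespace>/<identifier>: " + repr(uri))
    return namespace.lower(), identifier

print(parse_info_uri("info:doi/10.1197/jamia.M1912"))
# prints ('doi', '10.1197/jamia.M1912')
```

Nothing in the function contacts the network, which reflects the point of the scheme: the namespace tells an application which registry's rules apply, and the application decides for itself whether and how to look the identifier up.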

Baker, T. and Dekker, M. (2003). Identifying metadata elements with URIs. D-Lib Magazine, 9(7/8). http://www.dlib.org/dlib/july03/baker/07baker.html.

10.5.4 Intellectual property

(5/15/07) The area of protecting on-line intellectual property (IP) now is most commonly called digital rights management (DRM) (Rosenblatt et al., 2002). There are a number of ongoing open and proprietary efforts in this field, which are mired in political and economic struggles among commercial content producers (e.g., the Recording Industry Association of America, Microsoft Corp., etc.). There has been considerable effort focused on developing DRM standards in the more open research and education communities (Martin et al., 2002), which are philosophically more akin to the health care environment than, say, users of products from the entertainment industry. Apple Computer has successfully launched its music sales business that charges 99 cents per song and provides nearly unlimited usage rights (http://www.apple.com/itunes/). While the system has been successful for Apple financially, it is not clear how much impact it has had in convincing music listeners not to download illegal copies of songs. The DRM issue remains a thorny one, not only for protecting IP but also allowing fair use and respecting the privacy rights of users (Tyrväinen, 2005). Bailey (2006) argues that strong copyright and DRM in the face of poor "network neutrality" are a recipe for "digital dystopia."

It is certainly understandable that publishers wish to protect their IP. The question is how to provide them the tools to protect that property while expanding the market for their content, which may in turn allow them to lower the unit price of access. A particular challenge is how to serve single users or those in small groups. While those at academic and other large medical centers often have direct access to resources based on their Internet Protocol addresses, practitioners who do not reside at such centers usually do not. Even clinicians at large centers want to access resources that their institutions do not provide and are inconvenienced by the usual authentication schemes.

A comprehensive framework for an inventory of digital rights comes from Rosenblatt et al. (2002), who define categories of rights and user actions within them.

As noted above, the approach of Martin et al. (2002) may work best for users in health and biomedical settings. Their solution aims at research and educational resources, where intellectual property protection is important, but must be balanced by open and easy use. They describe their approach as "federated," in that administration of access controls is shared between the origin site and the resource provider. Their approach builds on open standards, such as the Shibboleth initiative from Internet2 (http://shibboleth.internet2.edu/) that keeps track of, among other things, access rights of individuals and resources. Shibboleth in turn takes advantage of OpenSAML (http://www.opensaml.org/), an open-source implementation of the Security Assertion Markup Language (SAML), which carries the assertions about users and their rights that make a single sign-on to such resources possible. A guiding vision for DRM efforts should be a mechanism whereby individuals can gain access to resources with a single sign-on to all resources for which they and their institution have access rights. In addition, the DRM framework should allow easy and rapid access to resources for which they do not have subscription-style access. For example, if a user wants access to a systematic review from an online journal to which he or she does not subscribe, there should be a single dialog box asking if he or she would like to pay a certain amount from his or her on-line digital wallet and then get instant access after the one click required to accept making the payment. Coyle (2005) has described a metadata approach for copyright status.

Another approach to IP protection has been the Creative Commons License (http://creativecommons.org/). This approach is based on the premise that some people do not necessarily want full copyright protection (which is the default under law) to apply to their works, but instead desire to attach certain restrictions to its use. In essence, the Creative Commons License allows a creator of IP to retain some rights short of completely releasing the content into the public domain. The Creative Commons licensing process begins by completing a form on their Web site (http://creativecommons.org/license/). The creator chooses from among four options for the license, as shown in the table below.

Option - Condition
Attribution - Others may copy, distribute, and display the copyrighted work, and derivatives of it, but must give credit.
Noncommercial - Others may copy, distribute, and display the copyrighted work, but only for noncommercial purposes.
No Derivative Works - Others may copy, distribute, and display only unmodified versions of the copyrighted work.
Share Alike - Others can distribute derivative works only under a license identical to the one governing the original work.

For example, a person creating IP who desired to allow others to use his or her work unmodified and for non-commercial purposes would select a license that included the first three options. Someone giving permission for the work to be modified, but not used commercially or relicensed under other terms, would choose a license with the Noncommercial and Share Alike options. In addition to these basic four options, there are some additional special ones, such as allowing royalty-free uses in developing nations while retaining full copyright in the developed world or allowing specified amounts of sampling. Once the appropriate license is chosen, the Creative Commons Web site generates three types of data:
  1. Commons Deed - license in simple and plain language with appropriate icons
  2. Legal Code - Legal language designed to stand up in court.
  3. Digital Code - Metadata to be included on Web sites and pages that enables search engines and other applications to know terms of use.
The process also generates a logo with the Creative Commons logo and the statement, "Some Rights Reserved." The details of the licenses are described on the Creative Commons Web site (http://creativecommons.org/about/licenses/) and there are even comic-book style pages that give examples of their spectrum of rights and how they are used.
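
The way the four options combine into a single license can be sketched in a few lines. The short codes (by, nc, nd, sa) are the abbreviations Creative Commons uses for the options; the function name `license_code` is hypothetical, and the sketch treats Attribution as selectable, following the table above. It also enforces the one combination that cannot occur: Share Alike governs derivative works, so it cannot be paired with No Derivative Works.

```python
# Short codes Creative Commons uses for the four options.
OPTION_CODES = {
    "Attribution": "by",
    "Noncommercial": "nc",
    "No Derivative Works": "nd",
    "Share Alike": "sa",
}
CODE_ORDER = ["by", "nc", "nd", "sa"]

def license_code(options):
    """Return a license code such as 'by-nc-nd' for the chosen options."""
    chosen = set(options)
    unknown = chosen - set(OPTION_CODES)
    if unknown:
        raise ValueError("unknown option(s): " + ", ".join(sorted(unknown)))
    # Share Alike applies to derivatives, so it cannot be combined with
    # an option that forbids derivatives altogether.
    if {"No Derivative Works", "Share Alike"} <= chosen:
        raise ValueError("No Derivative Works and Share Alike are mutually exclusive")
    codes = {OPTION_CODES[name] for name in chosen}
    return "-".join(code for code in CODE_ORDER if code in codes)

print(license_code(["Attribution", "Noncommercial", "No Derivative Works"]))
# prints by-nc-nd
print(license_code(["Attribution", "Noncommercial", "Share Alike"]))
# prints by-nc-sa
```

A code of this kind is the sort of compact, machine-readable label that the Digital Code metadata can carry so that search engines and other applications can filter content by its terms of use.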

Another project of the Creative Commons is the Science Commons (http://sciencecommons.org/), which aims to bring a comparable approach to the world of scientific data and publications. The Creative Commons Web site also has a search engine (http://creativecommons.org/find/) that allows searching over materials based on the options above (e.g., a search to find images that may be used for non-commercial purposes and may be modified).

Bailey, C. (2006). Strong copyright + DRM + weak net neutrality = digital dystopia? Information Technology and Libraries, 25: 116-127. http://www.digital-scholarship.com/cwb/DigitalDystopia.pdf.
Coyle, K. (2005). Descriptive metadata for copyright status. First Monday, 10(10). http://www.firstmonday.org/issues/issue10_10/coyle/.
Martin, M., Kuhlman, D., et al. (2002). Federated digital rights management: a proposed DRM solution for research and education. D-Lib Magazine, 8(7). http://www.dlib.org/dlib/july02/martin/07martin.html.
Rosenblatt, B., Trippe, B., et al. (2002). Digital Rights Management - Business and Technology. New York. M&T Books.
Tyrväinen, P. (2005). Concepts and a design for fair use and privacy in DRM. D-Lib Magazine, 11(2). http://www.dlib.org/dlib/february05/tyrvainen/02tyrvainen.html.

10.5.5 Preservation

(5/15/07) Rusbridge (2006) has reviewed some of the issues with regard to digital preservation. A data dictionary for preservation metadata has recently been released (Anonymous, 2005). Kenney et al. (2006) recently surveyed the archiving approaches of 12 e-journals, which of course have to pay more attention to digital preservation since they do not produce paper copies.

Anonymous (2005). Data Dictionary for Preservation Metadata - Final Report of the PREMIS Working Group. Dublin, OH, Online Computer Library Center, Inc. http://www.oclc.org/research/projects/pmwg/premis-final.pdf.
Kenney, A., Entlich, R., et al. (2006). E-Journal Archiving Metes and Bounds: A Survey of the Landscape. Washington, DC, Council on Library and Information Resources. http://www.clir.org/PUBS/reports/pub138/pub138.pdf.
Rusbridge, C. (2006). Excuse me ... some digital preservation fallacies? Ariadne, 46. http://www.ariadne.ac.uk/issue46/rusbridge/.

(5/25/05) A Web site has been developed to describe government work in digital preservation generally (http://www.digitalpreservation.gov/) and to provide up-to-date information about the National Digital Information Infrastructure and Preservation Program (NDIIPP).

The NLM has also addressed the issue of permanence levels for its archives through its Permanence Working Group. This group has focused on three characteristics of Web documents: identifier validity, resource availability, and content invariance. It has developed a rating system based on these characteristics and distilled it into four permanence levels (Byrnes, 2005).

Byrnes, M. (2005). Permanence Levels and the Archives for NLM's Permanent Web Documents. NLM Technical Bulletin. March-April, 2005. e4. http://www.nlm.nih.gov/pubs/techbull/ma05/ma05_archive.html.

10.6 Future Directions

Last updated - May 15, 2007