Information Retrieval:  A Health & Biomedical Perspective

Information Retrieval:  A Health & Biomedical Perspective (Second Edition)

William Hersh, M.D.

Springer-Verlag, 2003

Book Errata

This page contains known errors in the book.  Please email the author with any new ones that you discover.  The list on this page is divided into substantive errata and typos.

Substantive Errata

Page 31 - In formula (4), impact factor, the word to should be two in both the numerator and denominator.

Page 99 - In the first full paragraph, the text correctly states that the E measure varies inversely with recall and precision, but the final statement of the paragraph in parentheses is incorrect, i.e., higher levels of recall and precision will push E towards 0 whereas lower levels will push it toward 1.  Also, the variable a indicates the relative value of precision, not recall.
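
As a rough illustration of the corrected behavior, the sketch below assumes van Rijsbergen's usual form of the E measure, E = 1 - 1 / (a/P + (1 - a)/R), where P is precision, R is recall, and a weights precision.  The book's own notation may differ, and the function name e_measure is hypothetical.

```python
def e_measure(precision: float, recall: float, a: float = 0.5) -> float:
    """Assumed form: E = 1 - 1 / (a/P + (1 - a)/R), with a weighting precision."""
    if precision <= 0 or recall <= 0:
        return 1.0  # nothing useful retrieved: treat as the worst case
    return 1.0 - 1.0 / (a / precision + (1.0 - a) / recall)


high = e_measure(0.9, 0.9)  # high recall and precision push E toward 0
low = e_measure(0.1, 0.1)   # low recall and precision push E toward 1
precision_only = e_measure(0.8, 0.2, a=1.0)  # a = 1 ignores recall entirely
```

With a = 1 the recall term drops out and E reduces to 1 - P, which is consistent with a expressing the relative value of precision.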

Pages 136-137 - The correct name for the NLM collection of cross-sectional images of the human body is the Visible Human Project.  It is incorrectly stated in the second paragraph of section 4.4.1 as well as Table 4.5.

Page 161 - In Table 5.8,  for the data element M##, the word bacl should be back.  However, the name of this element has changed and is now MED####, where #### is a particular year.  It should also be noted that "back files" are not really used in MEDLINE any more, i.e., the PubMed interface to MEDLINE provides access to the entire database dating back to 1966.

Page 189 - Formula (2) is incorrect.  The correct formula for cosine normalization is:
[Image: Chapter 6, Formula 2]
(Note that the calculations that use this formula are correct, i.e., only the formula itself is incorrect.)
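
The corrected formula is an image on the original page and is not reproduced in this text.  As a hedged sketch only, the common textbook form of cosine normalization divides each term weight in a document by the square root of the sum of the squared weights in that document.  Whether this matches the book's notation is an assumption, and the function name and sample weights are hypothetical.

```python
import math


def cosine_normalize(weights: list[float]) -> list[float]:
    """Divide each weight by the Euclidean length of the weight vector."""
    length = math.sqrt(sum(w * w for w in weights))
    if length == 0:
        return [0.0 for _ in weights]  # an all-zero vector stays all zero
    return [w / length for w in weights]


normalized = cosine_normalize([3.0, 4.0])  # hypothetical term weights
```

After normalization the squared weights sum to 1, so long and short documents are scored on the same scale.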

Page 190 - In Table 6.3, the weight of term B in document 5 should be 2.38 (not -2.38, i.e., no negative sign in the number).  In the first paragraph, the calculation of the weight of term D after Rocchio weighting should be 0 + (1.4 / 2) – 0 = 0.70.  In the second paragraph, the new weight of document 2 should be 4.75 + 0.98 = 5.73 and the new weight of document 5 should be 1.97 + 2.38 – 0.81 = 3.54.

Page 193 - In the first paragraph, line 4, the word characters should be words.  That is, ADJn usually requires words to be within n words of each other.

Page 217 - In the third paragraph, the text-word search hypertension AND beta AND blocker of the documents in Appendix 1 should only retrieve documents 5 and 8, not documents 5, 7, and 8.  In the fourth paragraph, there is a natural language search, drug treatments of hypertension.  Tables 6.6 and 6.7, and the text describing them, have a number of errors.
Table 6.6 should appear as follows:
treatment - 1.4 *        hypertension - 1.22 *     drug - 1.52 *
(doc 3) 1.0 = 1.4        (doc 1) 2.0 = 2.44        (doc 5) 2.0 = 3.04
(doc 5) 1.0 = 1.4        (doc 2) 1.0 = 1.22        (doc 6) 1.0 = 1.52
(doc 8) 1.0 = 1.4        (doc 3) 2.0 = 2.44        (doc 7) 1.0 = 1.52
(doc 10) 1.0 = 1.4       (doc 4) 2.0 = 2.44
                         (doc 5) 1.0 = 1.22
                         (doc 8) 1.0 = 1.22
Likewise, Table 6.7 should appear as follows:
Document     Score
1            2.44
2            1.22
3            1.4 + 2.0 = 3.4
4            2.44
5            1.4 + 1.22 + 3.04 = 5.66
6            1.52
7            1.52
8            1.4 + 1.52 = 2.92
9            0
10           1.4
Document 5 still ends up ranking the highest, and the discussion in the final paragraph of the chapter is still correct.
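
As a cross-check, the sketch below rebuilds the document scores by multiplying each term frequency in Table 6.6 by the term's IDF and summing per document.  The variable names are hypothetical, and the data are transcribed from Table 6.6 as listed above.

```python
from collections import defaultdict

# IDF values and per-document term frequencies transcribed from Table 6.6.
idf = {"treatment": 1.4, "hypertension": 1.22, "drug": 1.52}
tf = {
    "treatment": {3: 1.0, 5: 1.0, 8: 1.0, 10: 1.0},
    "hypertension": {1: 2.0, 2: 1.0, 3: 2.0, 4: 2.0, 5: 1.0, 8: 1.0},
    "drug": {5: 2.0, 6: 1.0, 7: 1.0},
}

# Score of a document = sum of TF*IDF over the query terms it contains.
scores = defaultdict(float)
for term, postings in tf.items():
    for doc, freq in postings.items():
        scores[doc] += freq * idf[term]

# Rank documents from highest to lowest score.
ranking = sorted(((round(s, 2), d) for d, s in scores.items()), reverse=True)
top_score, top_doc = ranking[0]
```

Summed this way, documents 3 and 8 come to 3.84 and 2.62 rather than the 3.4 and 2.92 listed in Table 6.7, so those two rows may merit rechecking.  Document 5 ranks highest either way, with a score of 5.66.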

Pages 270-271 - In Table 8.3, there are errors for two of the three query terms in the TF*IDF calculations, which emanate from Appendix 3.  The TF*IDF for DRUG in Document 5 should be 3.04, while the TF*IDF for TREATMENT in Document 5 should be 1.4.  This then makes formula 1, which calculates the cosine between the query and Document 5, incorrect.  The correct calculation is:
[Image: Chapter 8, Formula 1]
(This also makes some of the numbers in Table 8.4 incorrect, the correction of which is left as an exercise for the reader.)

Pages 271-272 - There are also errors in Formulas 2 and 3.  In both, each vector term in the denominator should be squared.  The correct formulas are:
[Image: Chapter 8, Formula 2]
[Image: Chapter 8, Formula 3]
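
The corrected Formulas 2 and 3 are images on the original page.  As a sketch of the point about squaring, the standard cosine between a query vector and a document vector divides their dot product by the product of their Euclidean lengths, so every term in the denominator is squared before summing.  The function name and sample vectors are hypothetical, and the book's Formulas 2 and 3 may be written differently.

```python
import math


def cosine(query: list[float], doc: list[float]) -> float:
    """Dot product divided by the product of the two Euclidean lengths."""
    dot = sum(q * d for q, d in zip(query, doc))
    q_len = math.sqrt(sum(q * q for q in query))  # each query term squared
    d_len = math.sqrt(sum(d * d for d in doc))    # each document term squared
    if q_len == 0 or d_len == 0:
        return 0.0  # cosine is undefined for a zero vector; report no match
    return dot / (q_len * d_len)


same_direction = cosine([1.0, 2.0], [2.0, 4.0])  # parallel vectors
orthogonal = cosine([1.0, 0.0], [0.0, 3.0])      # no terms in common
```

Leaving out the squares in the denominator would let the result exceed 1 for parallel vectors, which is one quick way to spot the error.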

Page 281 - Formula (11), the formula for pair weight in relevance feedback, is incorrect.  The quantity after the minus sign should be correlation in nonrelevant documents, i.e.,
[Image: Chapter 8, Formula 11]

Pages 443-447 - In Appendix 3, the IDF for the word treatment in documents 3, 5, and 8 should be 1.4, not 1.52.  It is correctly listed for document 10.

Typos

Page x (Preface) - In the fourth paragraph, line 4, the word disease should be distance.

Page 12 - In the last paragraph of Section 1.4.2, the company Vertity should be spelled Verity (the URL is correct).

Pages 19-20 - In the first full paragraph on page 19, the first three sentences repeat themselves identically in the second three sentences of the paragraph.  In the second of the repeated sentences, the word on should be of.

Page 22 - In the fourth bullet of definitions of information from Webster's dictionary, the word in should be deleted.

Page 42 - In Table 2.3, under the Results heading in the line on Participant Flow, the word eviations should be deviations.

Page 45 - In the first full paragraph, line 10, the words report their should be deleted.

Page 45 - In the second full paragraph, line 8, the word rates should be rated.  In the following sentence, it should be stated that there were actually three (not one) items in the readers' top 10 topics that appeared in the experts' top 10.

Page 51 - In the third line of the final paragraph, the word strudtured should be structured.

Page 56 - In the second full paragraph, line 1, the word names should be named.

Page 68 - In the top paragraph, line 9, the word medial should be medical.

Page 68 - In the last paragraph, line 11, the phrase the antibiotics should be that antibiotics.

Page 69 - In the top paragraph, line 3, the end of the sentence should read, will play a role in the solution.

Page 70 - In Table 2.6, the sixth question should have the word text replaced with test.

Page 73 - In the second full paragraph, line 6, the word to should appear in the phrase find a computer use, i.e., find a computer to use.

Page 78 - In the second line just below equation 11, the word report should be replaced by reported.

Page 85 - In the third full paragraph, line 2, the word or should be nor.

Page 87 - In section 3.1.2, second line, the word other should be others.

Page 98 - In section 3.3.1.2, third paragraph, the first it (before should) should be deleted.

Page 102 - In section 3.3.1.4, second paragraph, last line, is should be deleted.

Page 107 - In the last paragraph, second-to-last line, the word previous should be inserted between in and ones.

Page 109 - In the second full paragraph, second-to-last line, the word when should be deleted.

Page 113 - In section 3.5, second paragraph, line 4, the word to should be two.  Also in that section, the second-to-last paragraph refers to Table 3.5 but should refer to Table 3.6.

Page 126 - In the first full paragraph, line 6, the word in should be is.

Page 127 - In the last paragraph, line 9, the word tern should be term.

Page 131 - In the first full paragraph, line 6, the word of should be to.  In section 4.3.2, first paragraph, line 6, the word becuase should be became.

Page 136 - In section 4.4, line 2, the word makes should be inserted between which and it.

Page 148 - In the second full paragraph, the first line under formula (1), the word th should be the, and the word the should appear between i and is.

Page 152 - The legend of Figure 5.2 states that the heading Hypertension is denoted by a heavy box.  However, this figure did not reproduce properly, and the box with C14.907.489 Hypertension inside is what should be denoted.

Page 156 - In the last paragraph of section 5.3.2, line 7, the word Crptococcus is misspelled and should be Cryptococcus.  It should also be in monospaced Courier font and non-italicized.

Page 161 - In Table 5.8, element STT, the word varient should be variant.  In element ST, the word unreviweed should be unreviewed.

Page 167 - In Table 5.10, the word Museum has an inappropriate close bracket, ].

Page 171 - At the bottom of Table 5.11, the source for Open Directory should be dmoz.org.

Page 180 - The title of section 5.6 should be Indexing Images.

Page 182 - In the first full paragraph, the last word of the paragraph, fixed, should be field.

Page 184 - In the first full paragraph, line 5, the word formulate should be formulates.

Page 185 - In the second paragraph, line 9, futility point should be defined as the number of documents beyond which a searcher will not continue to look at the results.

Page 186 - In Figure 6.1 at the very top, the word commom should be common.

Page 194 - In the second line, the word explore should be explode.  In the second full paragraph, line 5, the word home should be hone.

Page 198 - In the first paragraph, line 4, the MeSH term listed should be Bites and Stings not Strings.

Page 209 - In the second full paragraph, line 4, the word in should appear, i.e., between specific terms in the classification hierarchy.

Page 209 - In the first line of the last paragraph, the word gaining should be gaming.

Page 214 - In the first paragraph, line 3, the word generate should be generated.

Page 215 - In Section 6.5, line 4, the word a should appear, i.e., suited for a user.

Page 218 - There should be no = at the end of the chapter.

Page 220 - In Section 7.1.1, line 3, the word reasonable should be reasonably.

Page 224 - In Section 7.1.2, line 7, the word on should be an.

Page 226 - In Figure 7.2, the word Diagnasis should be Diagnosis.

Page 235 - In the second paragraph of section 7.4.2.1, the word transactions should be singular.

Page 236 - In the first full paragraph, line 5, the word rates should be rated.

Page 237 - In the legend of Figure 7.3, the citation should be McKibbon et al., 1990 (see bibliography for full citation).

Page 253 - In the first full paragraph, line 6, the word greed should be agreed.

In Chapter 8, there are several instances where TF*IDF is incorrectly replaced with TF*IOF:
Page 272 - first paragraph of section 8.2.2
Page 304 - first paragraph of section 8.4.2.3
Page 306 - second paragraph
Page 307 - first and second paragraphs
Page 309 - third paragraph

Page 273 - In Section 8.2.2.2, line 5, the word been should precede the word adapted.

Page 273 - Equation 7 is slightly misformatted.  Although not incorrect, it is potentially misleading.  In the second addend, the 1.5 multiplies the whole quantity, as follows:
[Image: Chapter 8, Formula 7]

Page 276 - In the first full paragraph, line 7, the word in should precede the phrase free text.

Page 277 - In Section 8.2.5, line 8, the word in should be deleted.

Page 278 - In the second full paragraph, line 1, the word in should be is.  In line 5, the second instance of the word a should be deleted.

Page 281 - In the first paragraph after the numbered list, line 2 should end with the word be.

Page 284 - In the second paragraph, the word at the start of the second sentence should be They, not The.

Page 285 - In the second paragraph, last line, the referral to Figure 5.5 should be to Table 5.15.

Page 289 - In the second paragraph of Section 8.3.2.2, line 4, the word studies should appear after (circa 1996-1999).

Page 292 - In Section 8.3.2.3, line 9, the word the should appear after the first of and the word of should appear group.

Page 295 - In the first full paragraph, line 5, the word other should be another.  In the second paragraph, line 7, the word phrases should be phrased.  In the fourth paragraph, line 2, the word a should precede 5-year.

Page 296 - In Table 8.6, item 12, the word goes should be goals.

Page 298 - In the first numbered list, item 2, the word is should follow information.

Page 301 - In the first full paragraph, line 4, the word searches should be searchers.

Page 309 - In Section 8.5, line 9, the word studies should be studied.

Page 340 - In the second paragraph, line 6, the word occur should be occurs.

Page 358 - In the first line, the word interventive should be intervention.

Page 364 - In the second full paragraph, line 4, the word other should be others.

Page 366 - In the third paragraph, line 1, the word automated should be automate.

Page 377 - The second paragraph should begin, Typical of a lexical-statistical IR system….

Page 383 - In section 10.4, line 1, the word device should be devices.  In addition, the headings of Table 10.3 are incorrect.  The headings With small screen and With large screen should be switched.

Page 384 - In the first paragraph of Section 10.4.1, line 11, 67% should not have an open parenthesis in front of it.

Page 388 - In the second paragraph, line 3, the word attaining should be attain.  In the last paragraph, line 5, the word a should be inserted between of and book.

Page 389 - In the first paragraph, line 2, the word area should actually be are a.

Page 391 - In the numbered list in the second full paragraph, item 3, the word located should be locate.

Page 392 - In paragraph 3, line 4, the word been should be inserted between adhered and to.

Page 405 - In the third full paragraph, the word the should precede Canonical Phrase….

Page 431 - In Document 1 of Appendix 1, line 3, the word it should be inserted between though and varies.  Since this is a stop word, the data in Appendices 2-3 are not affected.

Pages 443-447 - In Appendix 3, the heading for the third column should be TF*IDF, not IDF*TF, for usage consistent with the rest of the book.

Last updated - June 6, 2004