Book Errata
This page contains known errors in the book.
Please email the author with any new ones that you discover. The
list on this page is divided into substantive errata and typos.
Substantive Errata
Page 31 - In formula (4), impact factor, the
word to should be two in both the numerator
and denominator.
Page 99 - In the first full paragraph, the text correctly states
that the E measure varies inversely with recall and precision,
but the final statement of the paragraph, in parentheses, is incorrect.
The correct statement is that higher levels of recall and precision push
E toward 0, whereas lower levels push it toward 1. Also, the variable a
indicates the relative value of precision, not recall.
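The direction of the E measure can be checked numerically. The sketch below assumes the usual van Rijsbergen form, E = 1 - 1 / (a/P + (1 - a)/R), which is not reproduced on this page; the function name and the sample values are illustrative only.

```python
def e_measure(precision: float, recall: float, a: float = 0.5) -> float:
    """Assumed form: E = 1 - 1 / (a/P + (1 - a)/R); a weights precision."""
    if precision <= 0 or recall <= 0:
        return 1.0  # nothing useful retrieved: worst possible E
    return 1.0 - 1.0 / (a / precision + (1.0 - a) / recall)


high = e_measure(0.9, 0.9)  # high recall and precision: E near 0
low = e_measure(0.1, 0.1)   # low recall and precision: E near 1
precision_only = e_measure(0.8, 0.2, a=1.0)  # a = 1: E depends on precision alone

print(round(high, 2), round(low, 2), round(precision_only, 2))
```

With a set to 1 the recall term drops out entirely, which is consistent with a expressing the relative value of precision.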
Pages 136-137 - The correct name for the
NLM collection of cross-sectional images of the human body
is the Visible Human Project. It is incorrectly
stated in the second paragraph of section 4.4.1 as well as Table
4.5.
Page 161 - In Table 5.8, for the data element
M##, the word bacl should be back. However,
the name of this element has changed and is now MED####, where ####
is a particular year. It should also be noted that "back files"
are not really used in MEDLINE any more, i.e., the PubMed interface
to MEDLINE provides access to the entire database dating back to
1966.
Page 189 - Formula (2) is incorrect. The correct
formula for cosine normalization is:
(Note that the calculations which use this formula are correct,
i.e., only the formula itself is incorrect.)
Page 190 - In Table 6.3, the weight of term B in
document 5 should be 2.38 (not -2.38, i.e., no negative
sign in the number). In the first paragraph, the calculation
of the weight of term D after Rocchio weighting should be 0 + (1.4 / 2)
– 0 = 0.70. In the second paragraph, the new weight of document
2 should be 4.75 + 0.98 = 5.73 and the new weight of document
5 should be 1.97 + 2.38 – 0.81 = 3.54.
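The corrected arithmetic for page 190 can be reproduced with a short sketch. It assumes a Rocchio update with all coefficients equal to 1 (original weight, plus the mean weight in relevant documents, minus the mean weight in nonrelevant documents), which matches the shape of the term D calculation. The function name and argument layout are hypothetical.

```python
def rocchio_weight(original: float,
                   relevant_sum: float, n_relevant: int,
                   nonrelevant_sum: float = 0.0, n_nonrelevant: int = 0) -> float:
    """original + mean(relevant) - mean(nonrelevant), coefficients assumed to be 1."""
    rel = relevant_sum / n_relevant if n_relevant else 0.0
    nonrel = nonrelevant_sum / n_nonrelevant if n_nonrelevant else 0.0
    return original + rel - nonrel


# Term D: 0 + (1.4 / 2) - 0
term_d = round(rocchio_weight(0.0, 1.4, 2), 2)

# New document weights, summed from the corrected contributions
doc2 = round(4.75 + 0.98, 2)
doc5 = round(1.97 + 2.38 - 0.81, 2)

print(term_d, doc2, doc5)
```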
Page 193 - In the first paragraph, line 4, the word characters should
be words. That is, ADJn usually requires words to be within
n words of each other.
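The difference between counting words and counting characters can be seen in a small sketch of a proximity test. It assumes whitespace tokenization and ignores word order; real systems differ on both points, and the function name is hypothetical.

```python
def within_n_words(text: str, term_a: str, term_b: str, n: int) -> bool:
    """True if term_a and term_b occur within n word positions of each other."""
    words = text.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(0 < abs(i - j) <= n for i in pos_a for j in pos_b)


sample = "beta adrenergic blocker therapy"
print(within_n_words(sample, "beta", "blocker", 2))  # two word positions apart
print(within_n_words(sample, "beta", "blocker", 1))  # not adjacent
```

In the sample, beta and blocker are two words apart but sixteen characters apart, so a character-based reading of ADJ2 would wrongly reject the match.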
Page 217 - In the third paragraph, the text-word search hypertension
AND beta AND blocker of the documents in Appendix 1 should only
retrieve documents 5 and 8, not documents 5, 7, and 8.
In the fourth paragraph, there is a natural language search,
drug treatments of hypertension. Tables 6.6 and 6.7, and the
text describing them, have a number of errors:
- The term treatment occurs in document 10. As
a result, document 10 should appear in the first column of Table 6.6
under treatment. Furthermore, since there are four
documents containing that word, the IDF for treatment should be
1.40, as is correctly shown in Appendix 2. (There
are, however, errors in Appendix 3; see pages 443-447 below.) Also
as a result of treatment occurring in document 10, Table 6.7
should show some weight for document 10.
- All of the TF values for documents where terms occur twice,
i.e., documents 1, 3, and 4 for hypertension and document 5 for
drug , should be 2.0 instead of 1.30.
Thus, Table 6.6 should appear as follows:
treatment - 1.4 *        hypertension - 1.22 *     drug - 1.52 *
(doc 3) 1.0 = 1.4        (doc 1) 2.0 = 2.44        (doc 5) 2.0 = 3.04
(doc 5) 1.0 = 1.4        (doc 2) 1.0 = 1.22        (doc 6) 1.0 = 1.52
(doc 8) 1.0 = 1.4        (doc 3) 2.0 = 2.44        (doc 7) 1.0 = 1.52
(doc 10) 1.0 = 1.4       (doc 4) 2.0 = 2.44
                         (doc 5) 1.0 = 1.22
                         (doc 8) 1.0 = 1.22
Likewise, Table 6.7 should appear as follows:
Document   Score
1          2.44
2          1.22
3          1.4 + 2.44 = 3.84
4          2.44
5          1.4 + 1.22 + 3.04 = 5.66
6          1.52
7          1.52
8          1.4 + 1.22 = 2.62
9          0
10         1.4
Document 5 still ends up ranking the highest, and the discussion
in the final paragraph of the chapter is still correct.
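The corrections for page 217 can be traced end to end with a short sketch. It assumes the IDF form log10(N / df) + 1 with N = 10 documents, which reproduces the 1.40, 1.22, and 1.52 values quoted above, and it uses the TF values listed in Table 6.6. The variable names are illustrative.

```python
import math

N_DOCS = 10  # assumption: the ten documents of Appendix 1


def idf(doc_freq: int, n_docs: int = N_DOCS) -> float:
    # Assumed IDF form: log10(N / df) + 1
    return math.log10(n_docs / doc_freq) + 1


# TF per document for each query term, as listed in Table 6.6
tf = {
    "treatment": {3: 1.0, 5: 1.0, 8: 1.0, 10: 1.0},
    "hypertension": {1: 2.0, 2: 1.0, 3: 2.0, 4: 2.0, 5: 1.0, 8: 1.0},
    "drug": {5: 2.0, 6: 1.0, 7: 1.0},
}

# IDF depends only on how many documents contain the term
idf_by_term = {term: round(idf(len(docs)), 2) for term, docs in tf.items()}

# A document's score is the sum of TF*IDF over the query terms it contains
scores = {doc: 0.0 for doc in range(1, N_DOCS + 1)}
for term, docs in tf.items():
    for doc, freq in docs.items():
        scores[doc] += freq * idf_by_term[term]

ranking = sorted(scores, key=scores.get, reverse=True)
print(idf_by_term)
print(ranking[0], round(scores[ranking[0]], 2))
```

Document 5 comes out on top because it is the only document containing all three query terms.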
Pages 270-271 - In Table 8.3, there are errors for two of the
three query terms in the TF*IDF calculations which emanate from Appendix
3. The TF*IDF for DRUG in Document 5 should be 3.04, while the
TF*IDF for TREATMENT in Document 5 should be 1.4. This in turn makes
formula 1, which calculates the cosine between the query and Document
5, incorrect. The correct calculation is:
(This also makes some of the numbers in Table 8.4 incorrect;
correcting them is left as an exercise for the reader.)
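The book's own calculation is not reproduced on this page. Purely as a generic illustration, the sketch below computes the standard cosine measure, in which each vector's terms are squared and summed under a square root in the denominator. The vectors are hypothetical: they cover only the three query terms, use unit query weights, and omit the other terms a full document vector would contain, so the result is not the book's figure.

```python
import math


def cosine(query, doc):
    """Standard cosine: dot product over the product of the vector lengths."""
    dot = sum(q * d for q, d in zip(query, doc))
    norm_q = math.sqrt(sum(q * q for q in query))  # each term squared
    norm_d = math.sqrt(sum(d * d for d in doc))    # each term squared
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


# Hypothetical vectors over (treatment, hypertension, drug) only
query_vec = [1.0, 1.0, 1.0]
doc5_vec = [1.4, 1.22, 3.04]  # corrected TF*IDF values for Document 5

similarity = cosine(query_vec, doc5_vec)
print(round(similarity, 3))
```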
Pages 271-272 - There are also errors in Formulas 2 and 3. In both,
each term of the vector in the denominator should be squared. The
correct formulas are:
Page 281 - Formula (11), the formula for pair weight in relevance
feedback, is incorrect. The quantity after the minus sign should
be, correlation in nonrelevant documents, i.e.,
Pages 443-447 - In Appendix 3, the IDF for the word treatment
in documents 3, 5, and 8 should be 1.4, not 1.52.
It is correctly listed for document 10.
Typos
Page x (Preface) - In the fourth paragraph, line 4, the word disease
should be distance.
Page 12 - In the last paragraph of Section 1.4.2, the company Vertity
should be spelled Verity (the URL is correct).
Pages 19-20 - In the first full paragraph on page 19, the first
three sentences repeat themselves identically in the second three
sentences of the paragraph. In the second of the repeated sentences,
the word on should be of.
Page 22 - In the fourth bullet of definitions
of information from Webster's dictionary, the word in
should be deleted.
Page 42 - In Table 2.3, under the Results heading
in the line on Participant Flow, the word eviations should
be deviations.
Page 45 - In the first full paragraph, line 10,
the words report their should be deleted.
Page 45 - In the second full paragraph, line 8,
the word rates should be rated. In the
following sentence, it should be stated that there were actually
three (not one) items in the readers' top 10 topics that appeared
in the experts' top 10.
Page 51 - In the third line of the final paragraph,
the word strudtured should be structured.
Page 56 - In the second full paragraph, line 1,
the word names should be named.
Page 68 - In the top paragraph, line 9, the word
medial should be medical.
Page 68 - In the last paragraph, line 11, the phrase
the antibiotics should be that antibiotics.
Page 69 - In the top paragraph, line 3, the end
of the sentence should read, will play a role in the solution.
Page 70 - In Table 2.6, the sixth question should
have the word text replaced with test.
Page 73 - In the second full paragraph, line 6,
the word to should appear in the phrase find a computer
use, i.e., find a computer to use.
Page 78 - In the second line just below equation
11, the word report should be replaced by reported.
Page 85 - In the third full paragraph, line 2, the
word or should be nor.
Page 87 - In section 3.1.2, second line, the word
other should be others.
Page 98 - In section 3.3.1.2, third paragraph, the
first it (before should) should be deleted.
Page 102 - In section 3.3.1.4, second paragraph,
last line, is should be deleted.
Page 107 - In the last paragraph, second-to-last
line, the word previous should be inserted between
in and ones.
Page 109 - In the second full paragraph, second-to-last
line, the word when should be deleted.
Page 113 - In section 3.5, second paragraph, line
4, the word to should be two. Also in
that section, the second-to-last paragraph refers to Table 3.5
but should refer to Table 3.6.
Page 126 - In the first full paragraph, line 6, the
word in should be is.
Page 127 - In the last paragraph, line 9, the word
tern should be term.
Page 131 - In the first full paragraph, line 6, the
word of should be to. In section 4.3.2,
first paragraph, line 6, the word becuase should be became.
Page 136 - In section 4.4, line 2, the word makes
should be inserted between which and it.
Page 148 - In the second full paragraph, the first line under formula (1),
the word th should be the, and the word the should appear
between i and is.
Page 152 - The legend of Figure 5.2 states that the
heading Hypertension is denoted by a heavy box. However,
this figure did not reproduce properly, and the box with C14.907.489
Hypertension inside is what should be denoted.
Page 156 - In the last paragraph of section 5.3.2,
line 7, the word Crptococcus is misspelled and should be
Cryptococcus. It should also be in monospaced
Courier font and non-italicized.
Page 161 - In Table 5.8, element STT, the word
varient should be variant. In element ST,
the word unreviweed should be unreviewed.
Page 167 - In Table 5.10, the word Museum has
an inappropriate close bracket, ].
Page 171 - At the bottom of Table 5.11, the source
for Open Directory should be dmoz.org.
Page 180 - The title of section 5.6 should be Indexing Images.
Page 182 - In the first full paragraph, the last word of the paragraph
fixed should be field.
Page 184 - In the first full paragraph, line 5, the word
formulate should be formulates.
Page 185 - In the second paragraph, line 9, futility point
should be defined as the number of documents beyond which a searcher
will not continue to look at the results.
Page 186 - In Figure 6.1 at the very top, the word commom should
be common.
Page 194 - In the second line, the word explore should be
explode. In the second full paragraph, line 5, the word
home should be hone.
Page 198 - In the first paragraph, line 4, the MeSH term listed
should be Bites and Stings not Strings.
Page 209 - In the second full paragraph, line 4, the word
in should appear, i.e., between specific terms in the
classification hierarchy.
Page 209 - In the first line of the last paragraph, the word
gaining should be gaming.
Page 214 - In the first paragraph, line 3, the word generate
should be generated.
Page 215 - In Section 6.5, line 4, the word a should
appear, i.e., suited for a user.
Page 218 - There should be no = at the end of the chapter.
Page 220 - In Section 7.1.1, line 3, the word reasonable
should be reasonably.
Page 224 - In Section 7.1.2, line 7, the word on should
be an.
Page 226 - In Figure 7.2, the word Diagnasis should be Diagnosis.
Page 235 - In the second paragraph of section 7.4.2.1, the word
transactions should be singular.
Page 236 - In the first full paragraph, line 5, the word
rates should be rated.
Page 237 - In the legend of Figure 7.3, the
citation should be McKibbon et al., 1990 (see bibliography
for full citation).
Page 253 - In the first full paragraph, line 6, the word
greed should be agreed.
In Chapter 8, there are several instances
where TF*IDF is incorrectly replaced with
TF*IOF:
Page 272 - first paragraph of section 8.2.2
Page 304 - first paragraph of section 8.4.2.3
Page 306 - second paragraph
Page 307 - first and second paragraphs
Page 309 - third paragraph
Page 273 - In Section 8.2.2.2, line 5, the word been
should precede the word adapted.
Page 273 - Equation 7 is slightly misformatted. Although
not incorrect, it could be misleading. In the second
addend, the 1.5 multiplies the whole quantity, as follows:
Page 276 - In the first full paragraph, line 7,
the word in should precede the phrase free text.
Page 277 - In Section 8.2.5, line 8, the word in should
be deleted.
Page 278 - In the second full paragraph, line 1, the word
in should be is. In line 5, the second instance
of the word a should be deleted.
Page 281 - In the first paragraph after the numbered list, line
2 should end with the word be.
Page 284 - In the second paragraph, the word at the start
of the second sentence should be They, not The.
Page 285 - In the second paragraph, last line, the referral to Figure
5.5 should be to Table 5.15.
Page 289 - In the second paragraph of Section 8.3.2.2, line
4, the word studies should appear after (circa 1996-1999).
Page 292 - In Section 8.3.2.3, line 9, the word the should
appear after the first of and the word of should appear
group .
Page 295 - In the first full paragraph, line 5, the word
other should be another. In the second paragraph,
line 7, the word phrases should be phrased. In the
fourth paragraph, line 2, the word a should precede 5-year.
Page 296 - In Table 8.6, item 12, the word goes should
be goals.
Page 298 - In the first numbered list, item 2, the word is
should follow information.
Page 301 - In the first full paragraph, line 4, the word
searches should be searchers.
Page 309 - In Section 8.5, line 9, the word studies should
be studied.
Page 340 - In the second paragraph, line 6, the word occur
should be occurs.
Page 358 - In the first line, the word interventive should be
intervention.
Page 364 - In the second full paragraph, line 4, the word other
should be others.
Page 366 - In the third paragraph, line 1, the word automated
should be automate.
Page 377 - The second paragraph should begin, Typical of
a lexical-statistical IR system….
Page 383 - In section 10.4, line 1, the word device should be
devices. In addition, the headings of Table 10.3 are incorrect.
The headings With small screen and With large screen
should be switched.
Page 384 - In the first paragraph of Section 10.4.1, line 11,
67% should not have an open parenthesis in front of it.
Page 388 - In the second paragraph, line 3, the word attaining
should be attain. In the last paragraph, line 5, the word
a should be inserted between of and book.
Page 389 - In the first paragraph, line 2, the word area should
actually be are a.
Page 391 - In the numbered list in the second full paragraph, item 3,
the word located should be locate.
Page 392 - In paragraph 3, line 4, the word been should be inserted
between adhered and to.
Page 405 - In the third full paragraph, the
word the should precede Canonical Phrase…
Page 431 - In Document 1 of Appendix 1, line 3, the
word it should be inserted between though and
varies. Since this is a stop word, the data in Appendices
2-3 are not affected.
Pages 443-447 - In Appendix 3, the heading for the third column
should be TF*IDF, not IDF*TF, for usage consistent
with the rest of the book.
Last updated - June 6, 2004