(5/26/05) The lack of updates to this chapter should not be taken to
mean that nothing has been happening in information extraction
lately. The main reason I have not updated this page is that I do not
cover this chapter in my introductory IR course, BMI 514, at OHSU.
Probably the major area of activity is text mining in the
bioinformatics domain, including TREC-like initiatives such as
Biocreative. I was lucky enough to co-author (with Aaron Cohen) a
recent overview of the field (Cohen and Hersh, 2005).
Cohen, A. and Hersh, W. (2005). A survey of current work in biomedical
text mining. Briefings in Bioinformatics, 6: 57-71.
11.1 Patient-specific information
11.1.1 Challenges in processing the clinical narrative
11.1.2 Approaches to extraction from the clinical narrative
11.1.2.1 Early approaches
11.1.2.2 MedLEE
11.1.2.3 SymText
11.1.2.4 Other active systems for processing the clinical narrative
11.1.3 Classification of the clinical narrative
11.1.3.1 Controlled clinical vocabularies and their limitations
11.1.3.2 Requirements for clinical vocabularies
11.1.3.3 Multi-axial vocabulary efforts
11.1.3.4 Vocabulary-based classification of the clinical narrative
11.1.4 Alternatives to natural language input of medical data
11.1.5 Future directions for clinical data capture and analysis
11.2 Knowledge-based information
(5/23/03) A standardized corpus for IE experiments in genomics that is
attracting increasing use is the GENIA corpus. The Web site for this
project is at: http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/