Abstract
One of the primary challenges in creating digital libraries is enhancing the value of paper-based publications by providing digital access to the materials. Simple full-text searching is only a first step in this process. Better functionality can be gained by exploiting the natural structure within the text. This paper describes the digital conversion and integration of encyclopedic publications, glossaries, and thesauri. The Biological Information Browsing (http://www.biobrowser.org) team developed text-processing tools and an information retrieval and visualization environment that provide greater functionality for these traditionally paper-based publications. The process includes automatic text segmentation and structuring, automated XML markup, structure-based indexing, and automatic thesaurus extraction for query expansion and on-line definitions. Very few other information systems provide complete services for publishing, indexing, XML querying, and document retrieval.
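As a rough illustration of the pipeline the abstract names (segmentation, XML markup, structure-based indexing), the following is a minimal sketch only. It is not the team's tooling: the input format (one `term: definition` entry per line), the element names `glossary`, `entry`, `term`, and `definition`, and the helper functions are all hypothetical choices made for this example.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical glossary text: one "term: definition" entry per line.
SAMPLE = """\
petal: One of the modified leaves forming the corolla of a flower.
sepal: One of the leaf-like parts forming the calyx. See also petal.
"""


def segment(text):
    """Split raw text into (term, definition) pairs, one per line with a colon."""
    entries = []
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip lines that do not look like an entry
        term, definition = line.split(":", 1)
        entries.append((term.strip(), definition.strip()))
    return entries


def to_xml(entries):
    """Wrap each segmented entry in assumed <entry>/<term>/<definition> markup."""
    root = ET.Element("glossary")
    for term, definition in entries:
        entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, "term").text = term
        ET.SubElement(entry, "definition").text = definition
    return root


def build_index(root):
    """Map (element name, lowercased word) to the set of entry terms containing it."""
    index = defaultdict(set)
    for entry in root.iter("entry"):
        term = entry.findtext("term")
        for child in entry:
            for word in re.findall(r"[a-z]+", (child.text or "").lower()):
                index[(child.tag, word)].add(term)
    return index


root = to_xml(segment(SAMPLE))
index = build_index(root)
print(ET.tostring(root, encoding="unicode"))
# Entries whose definition (not headword) mentions "petal":
print(sorted(index[("definition", "petal")]))
```

Keying the index on the element name as well as the word is what makes it structure-based in this sketch: a query can distinguish a headword match from a mention inside a definition, which plain full-text search cannot.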
Original language | English (US) |
---|---|
Pages | 377 |
Number of pages | 1 |
State | Published - 2002 |
Event | Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries - Portland, OR, United States; Duration: Jul 14 2002 → Jul 18 2002 |
Other
Other | Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries |
---|---|
Country/Territory | United States |
City | Portland, OR |
Period | 7/14/02 → 7/18/02 |
Keywords
- Electronic publishing
- Indexing
- Information retrieval
- Structured text
- XML
ASJC Scopus subject areas
- Software
- Information Systems
- Computer Science Applications
- Library and Information Sciences