ERIC Number: ED661266
Record Type: Non-Journal
Publication Date: 2024
Pages: 176
Abstractor: As Provided
ISBN: 979-8-3840-7091-7
ISSN: N/A
EISSN: N/A
Available Date: N/A
Unlocking Hierarchical Structural Semantics in HathiTrust Digital Library Historical Medical Book Indexes for Information Retrieval: Two Preliminary Investigations
Huapu Liu
ProQuest LLC, Ph.D. Dissertation, The University of Alabama
This two-part dissertation re-examines the role of book indexes in information retrieval research on full-text digital book collections in digital libraries. Early research on information retrieval and book indexes (along with other parts of books) dates to the 2000s, when the Google Books corpus was first released to the digital libraries research community. The current technical environment, however, presents new opportunities to examine book indexes in digital library research, including new theoretical roles for the book index as a human-constructed source of conceptual and semantic relationships. The first part of this dissertation is an empirical, systems-oriented evaluation of book indexes deployed in a specific information retrieval role: it explores the book index as a novel source of "local" conceptual knowledge for use as a thesaurus during the query expansion phase of an information retrieval process. The performance of expanded queries based on the book index was compared with that of the original queries and of expanded queries generated through the Local Context Analysis and Word Embedding techniques, as measured by a series of standardized retrieval effectiveness metrics. Overall, the evaluation results are promising: they indicate benefits of book index-based query expansion in multiple respects and demonstrate the potential of the information embedded in book indexes to enhance information retrieval. The second part of this dissertation investigates book indexes in a more speculative mode, asking: How could book indexes be integrated into, and advance, the contemporary information retrieval work addressed in Part 1, as well as further the fields of semantic graph and ontology engineering?
This inquiry leads directly to the problem tackled in this part of the dissertation: how to recognize and recover the hierarchical typographical layout of book indexes that is lost in standard OCR during the digitization process. Part 2 describes and evaluates a novel clustering-based computational workflow designed to automatically recognize this lost hierarchical typographical layout, the key first step toward future research possibilities in various contemporary information retrieval research areas related to digital libraries. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A
Author Affiliations: N/A