ERIC Number: ED670180
Record Type: Non-Journal
Publication Date: 2021
Pages: 426
Abstractor: As Provided
ISBN: 979-8-4604-3529-6
ISSN: N/A
EISSN: N/A
Available Date: N/A
Improving Collection Understanding for Web Archives with Storytelling: Shining Light into Dark and Stormy Archives
Shawn M. Jones
ProQuest LLC, Ph.D. Dissertation, Old Dominion University
Collections are the tools that people use to make sense of an ever-increasing number of archived web pages. As collections themselves grow, we need tools to make sense of them. Tools that work on the general web, like search engines, are not a good fit for these collections because search engines do not currently represent multiple versions of the same document well. Web archive collections are vast, some containing hundreds of thousands of documents. Thousands of collections exist, many of which cover the same topic. Few collections include standardized metadata. Too many documents from too many collections with insufficient metadata make collection understanding an expensive proposition. This dissertation establishes a five-process model to assist with web archive collection understanding. This model aims to produce a social media story: a visualization with which most web users are familiar. Each social media story contains surrogates, which are summaries of individual documents. These surrogates, when presented together, summarize the topic of the story. When produced by our storytelling model, they summarize the topic of a web archive collection. We develop and test a framework of primitives for selecting the exemplars that best represent a collection. We establish that algorithms built from these primitives select exemplars that are otherwise undiscoverable using conventional search engine methods. We generate story metadata to improve the information scent of a story so that users can understand it better. After an analysis showing that existing platforms perform poorly for web archives and a user study establishing the best surrogate type, we generate document metadata for the exemplars with machine learning. We then visualize the story and document metadata together and distribute the result to satisfy the information needs of multiple personas who benefit from our model. Our tools serve as a reference implementation of our "Dark and Stormy Archives" storytelling model.
"Hypercane" selects exemplars and generates story metadata. "MementoEmbed" generates document metadata. "Raintale" visualizes and distributes the story based on the story metadata and the document metadata of these exemplars. By providing immediate understanding, our stories save users the time and effort of reading thousands of documents and, most importantly, help them understand web archive collections. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
Descriptors: Archives, Web Sites, Story Telling, Social Media, Metadata, Algorithms, Electronic Publishing, Usability, User Needs (Information), Users (Information), Artificial Intelligence, Documentation
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Authoring Institution: N/A
Grant or Contract Numbers: RE7018000518; LG7115007715
Author Affiliations: N/A