Showing all 8 results
Peer reviewed
PDF on ERIC
Cai, Zhiqiang; Li, Haiying; Hu, Xiangen; Graesser, Art – Grantee Submission, 2016
This paper provides an alternative way of document representation by treating topic probabilities as a vector representation for words and representing a document as a combination of the word vectors. A comparison on summary data shows that this representation is more effective in document classification. [This paper was published in:…
Descriptors: Probability, Natural Language Processing, Models, Automation
Peer reviewed
Shaw, W. M., Jr. – Information Processing and Management, 1990
Investigates the presence of clustering structure in a document collection and the influence of the presence of clustering structure on the success of cluster-based retrieval. Term-weight and similarity thresholds are discussed, empirical and statistical significance are considered, and indexing exhaustivity for document representation is…
Descriptors: Cluster Grouping, Documentation, Indexing, Information Retrieval
Murray, Daniel McClure – 1972
A retrieval system is considered in which document descriptions are stored and accessed in groups called clusters. All items in a cluster meet common similarity criteria and are represented by a composite entity called a profile. In large collections, profiles themselves are clustered and additional levels of profiles are generated. This entire…
Descriptors: Cluster Grouping, Doctoral Dissertations, Documentation, Information Retrieval
Peer reviewed
Rasmussen, Edie M.; And Others – Information Processing and Management, 1991
This issue contains nine articles that provide an overview of trends and research in parallel information retrieval. Topics discussed include network design for text searching; the Connection Machine System; PThomas, an adaptive information retrieval system on the Connection Machine; algorithms for document clustering; and system architecture for…
Descriptors: Algorithms, Cluster Grouping, Computer Networks, Computer System Design
Peer reviewed
Gordon, Michael D. – Journal of the American Society for Information Science, 1991
Discussion of clustering of documents and queries in information retrieval systems focuses on the use of a genetic algorithm to adapt subject descriptions so that documents become more effective in matching relevant queries. Various types of clustering are explained, and simulation experiments used to test the genetic algorithm are described. (27…
Descriptors: Algorithms, Cluster Grouping, Documentation, Information Retrieval
PDF pending restoration
White, Lee J.; And Others – 1975
The major advantage of sequential classification, a technique for automatically classifying documents into previously selected categories, is that the entire document need not be processed before it is classified. This method assumes the availability of a priori categories, a selection of keywords representative of these categories, and the a…
Descriptors: Algorithms, Automatic Indexing, Bayesian Statistics, Classification
Peer reviewed
Shaw, W. M., Jr. – Information Processing and Management, 1993
Describes a study conducted on the cystic fibrosis (CF) database, a subset of MEDLINE, that investigated clustering structure and the effectiveness of cluster-based retrieval as a function of the exhaustivity of the uncontrolled subject descriptions. Results are compared to calculations for controlled descriptions based on Medical Subject Headings…
Descriptors: Bibliographic Records, Cluster Analysis, Cluster Grouping, Comparative Analysis
Kar, B. Gautam; White, Lee J. – 1975
The feasibility of using a distance measure, called the Bayesian distance, for automatic sequential document classification was studied. Results indicate that, by observing the variation of this distance measure as keywords are extracted sequentially from a document, the occurrence of noisy keywords may be detected. This property of the distance…
Descriptors: Algorithms, Automatic Indexing, Bayesian Statistics, Classification