Showing all 9 results
Peer reviewed
Huynh, Huynh; Saunders, Joseph C. – Journal of Educational Measurement, 1980
Single administration (beta-binomial) estimates for the raw agreement index p and the corrected-for-chance kappa index in mastery testing are compared with those based on two test administrations in terms of estimation bias and sampling variability. Bias is about 2.5 percent for p and 10 percent for kappa. (Author/RL)
Descriptors: Comparative Analysis, Error of Measurement, Mastery Tests, Mathematical Models
Subkoviak, Michael J.; Harris, Deborah J. – 1984
This study examined three statistical methods for selecting items for mastery tests. One is the pretest-posttest method due to Cox and Vargas (1966); it is computationally simple, but has a number of serious limitations. The second is a latent trait method recommended by van der Linden (1981); it is computationally complex, but has a number of…
Descriptors: Comparative Analysis, Elementary Secondary Education, Item Analysis, Latent Trait Theory
Frick, Theodore W. – 1991
Expert systems can be used to aid decisionmaking. A computerized adaptive test is one kind of expert system, although not commonly recognized as such. A new approach, termed EXSPRT, was devised that combines expert systems reasoning and sequential probability ratio test stopping rules. Two versions of EXSPRT were developed, one with random…
Descriptors: Adaptive Testing, Comparative Analysis, Computer Assisted Testing, Expert Systems
Mills, Craig N.; Melican, Gerald J. – 1987
The study compares three methods for establishing cut-off scores that effect a compromise between absolute cut-offs based on item difficulty and relative cut-offs based on expected passing rates. Each method coordinates these two types of information differently. The Beuk method obtains judges' estimates of an absolute cut-off and an expected…
Descriptors: Academic Standards, Certification, Comparative Analysis, Cutting Scores
Hambleton, Ronald K.; And Others – 1987
The study compared two promising item response theory (IRT) item-selection methods, optimal and content-optimal, with two non-IRT item selection methods, random and classical, for use in fixed-length certification exams. The four methods were used to construct 20-item exams from a pool of approximately 250 items taken from a 1985 certification…
Descriptors: Comparative Analysis, Content Validity, Cutting Scores, Difficulty Level
Klein, Thomas W. – 1990
Characteristics that distinguish criterion-referenced tests from their norm-referenced counterparts are discussed, including: the purposes that they are designed to serve; the characteristics of the types of items that they contain; and the manner in which they are developed. More specifically, the distinguishing characteristics include: reference…
Descriptors: Comparative Analysis, Criterion Referenced Tests, Differences, Educational Assessment
Phillips, Gary W. – 1982
This paper presents an introduction to the use of latent trait models for the estimation of domain scores. It was shown that these models provided an advantage over classical test theory and binomial error models in that unbiased estimates of true domain scores could be obtained even when items were not randomly selected from a universe of items.…
Descriptors: Comparative Analysis, Criterion Referenced Tests, Estimation (Mathematics), Goodness of Fit
Sarvela, Paul D. – 1986
Four discrimination indices were compared, using score distributions which were normal, bimodal, and negatively skewed. The score distributions were systematically varied to represent the common circumstances of a military training situation using criterion-referenced mastery tests. Three 20-item tests were administered to 110 simulated subjects.…
Descriptors: Comparative Analysis, Criterion Referenced Tests, Item Analysis, Mastery Tests
Beard, Jacob G.; Pettie, Allan L. – 1979
Test results from the Florida Educational Assessment of third and fifth grade communications and mathematics skills were used to compare linear and Rasch equating results. The samples consisted of over 5,000 cases for each grade and content area. The tests contained some items common to both the 1976 and 1977 test forms, but no fewer than 20…
Descriptors: Basic Skills, Communication Skills, Comparative Analysis, Difficulty Level