Serlin, Ronald C.; Kaiser, Henry F. – Educational and Psychological Measurement, 1978
When multiple-choice tests are scored in the usual manner, giving each correct answer one point, information concerning response patterns is lost. A method for utilizing this information is suggested. An example is presented and compared with two conventional methods of scoring. (Author/JKS)
Descriptors: Correlation, Factor Analysis, Item Analysis, Multiple Choice Tests
Livingston, Samuel A. – 1981
The standard error of measurement (SEM) is a measure of the inconsistency in the scores of a particular group of test-takers. It is largest for test-takers whose scores fall near 50 percent correct and smaller for those with nearly perfect scores. On tests used to make pass/fail decisions, the test-takers' scores tend to cluster in the range…
Descriptors: Error of Measurement, Estimation (Mathematics), Mathematical Formulas, Pass Fail Grading
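The pattern Livingston describes follows from the binomial error model; a minimal sketch of Lord's conditional SEM for a number-correct score (the 100-item test length here is an illustrative assumption, not taken from the paper):

```python
import math

def conditional_sem(x, n):
    """Lord's binomial-error estimate of the standard error of
    measurement for a number-correct score x on an n-item test:
    sqrt(x * (n - x) / (n - 1)). Largest when x is near n/2,
    shrinking toward zero as x approaches 0 or n."""
    return math.sqrt(x * (n - x) / (n - 1))

n = 100
mid_score_sem = conditional_sem(50, n)      # near 50% correct: SEM peaks
near_perfect_sem = conditional_sem(95, n)   # nearly perfect: SEM is smaller
```

This matches the abstract's observation: precision is worst in the middle of the score range, which is exactly where pass/fail cut scores usually sit.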

Millman, Jason – Review of Educational Research, 1973
Procedures for establishing standards and determining the number of items needed in criterion referenced measures were reviewed. Discussion of setting a passing score was organized around: performance of others, item content, educational consequences, psychological and financial costs, and error due to guessing and item sampling. (Author)
Descriptors: Criterion Referenced Tests, Educational Research, Literature Reviews, Measurement Techniques
Huynh, Huynh; Saunders, Joseph C., III – 1979
The Bayesian approach to setting passing scores, as proposed by Swaminathan, Hambleton, and Algina, is compared with the empirical Bayes approach to the same problem that is derived from Huynh's decision-theoretic framework. Comparisons are based on simulated data which follow an approximate beta-binomial distribution and on real test results from…
Descriptors: Bayesian Statistics, Cutting Scores, Grade 3, Mastery Tests
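Simulated data following an approximate beta-binomial distribution, as used in the comparison above, can be generated with a short sketch; the distribution parameters and test length below are illustrative assumptions, not those of the study:

```python
import random

def simulate_beta_binomial(n_items, alpha, beta_param, n_examinees, seed=0):
    """Draw number-correct scores from a beta-binomial model: each
    examinee's true proportion-correct is sampled from a
    Beta(alpha, beta) distribution, and the observed score is then
    binomial given that proportion."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_examinees):
        p = rng.betavariate(alpha, beta_param)
        scores.append(sum(rng.random() < p for _ in range(n_items)))
    return scores

# Hypothetical parameters: 30 items, ability distribution Beta(8, 4)
scores = simulate_beta_binomial(n_items=30, alpha=8, beta_param=4,
                                n_examinees=500)
```

The beta-binomial is the standard tractable model in this literature because the beta prior on true score is conjugate to the binomial error model.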
van der Linden, Wim J. – Evaluation in Education: International Progress, 1982
In mastery testing, a linear relationship between an optimal passing score and test length is presented with a new optimization criterion. The usual indifference zone approach, a binomial error model, decision errors, and corrections for guessing are discussed. Related results in sequential testing and the latent class approach are included. (CM)
Descriptors: Cutting Scores, Educational Testing, Mastery Tests, Mathematical Models
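The correction for guessing mentioned in the abstract is conventionally the formula score, which subtracts the credit expected from random guessing among the distractors; a minimal sketch (the example numbers are hypothetical):

```python
def formula_score(rights, wrongs, choices):
    """Classical correction-for-guessing (formula score):
    rights - wrongs / (choices - 1). Assumes every wrong answer
    was a blind random guess among `choices` options, so the
    expected number of lucky guesses is removed."""
    return rights - wrongs / (choices - 1)

# Hypothetical examinee: 40 right, 10 wrong on 4-choice items.
corrected = formula_score(40, 10, 4)
uncorrected = 40
```

Omitted items are not penalized under this rule, which is why formula-scoring instructions tell examinees to skip items they would answer at random.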
Multiple Choice and True/False Tests: Reliability Measures and Some Implications of Negative Marking
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2004
The standard error of measurement usefully provides confidence limits for scores in a given test, but is it possible to quantify the reliability of a test with just a single number that allows comparison of tests of different format? Reliability coefficients do not do this, being dependent on the spread of examinee attainment. Better in this…
Descriptors: Multiple Choice Tests, Error of Measurement, Test Reliability, Test Items
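Burton's point that reliability coefficients depend on the spread of examinee attainment can be illustrated with the standard classical-test-theory relation SEM = SD·√(1 − r); the group standard deviations below are hypothetical:

```python
import math

def sem_from_reliability(sd, reliability):
    """Standard error of measurement implied by a reliability
    coefficient and the observed-score standard deviation."""
    return sd * math.sqrt(1 - reliability)

def reliability_from_sem(sd, sem):
    """Invert the relation: the reliability coefficient a fixed
    SEM implies for a group with the given score spread."""
    return 1 - (sem / sd) ** 2

# Identical per-score precision (SEM = 3 points) in two groups that
# differ only in spread: the homogeneous group looks less "reliable".
wide_group = reliability_from_sem(sd=10, sem=3)   # heterogeneous group
narrow_group = reliability_from_sem(sd=5, sem=3)  # homogeneous group
```

The same instrument thus yields different reliability coefficients in different populations, which is why the abstract argues coefficients alone cannot compare tests of different format.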
Saunders, Joseph C.; Huynh, Huynh – 1980
In most reliability studies, the precision of a reliability estimate varies inversely with the number of examinees (sample size). Thus, to achieve a given level of accuracy, some minimum sample size is required. An approximation for this minimum size may be made if some reasonable assumptions regarding the mean and standard deviation of the test…
Descriptors: Cutting Scores, Difficulty Level, Error of Measurement, Mastery Tests
Lenel, Julia C.; Gilmer, Jerry S. – 1986
In some testing programs an early item analysis is performed before final scoring in order to validate the intended keys. As a result, some items which are flawed and do not discriminate well may be keyed so as to give credit to examinees no matter which answer was chosen. This is referred to as all-keying. This research examined how varying the…
Descriptors: Equated Scores, Item Analysis, Latent Trait Theory, Licensing Examinations (Professions)
Maurelli, Vincent A.; Weiss, David J. – 1981
A Monte Carlo simulation was conducted to assess the effects, in an adaptive testing strategy for test batteries, of varying subtest order, subtest termination criterion, and variable versus fixed entry on the psychometric properties of an existent achievement test battery. Comparisons were made among conventionally administered tests and adaptive…
Descriptors: Achievement Tests, Adaptive Testing, Computer Assisted Testing, Latent Trait Theory
Bejar, Isaac I. – 1985
The Test of English as a Foreign Language (TOEFL) was used in this study, which attempted to develop a new methodology for assessing the speededness of right-scored tests. Traditional procedures of assessing speededness have assumed that the test is scored under formula-scoring instructions; this approach is not always appropriate. In this study,…
Descriptors: College Entrance Examinations, English (Second Language), Estimation (Mathematics), Evaluation Methods
Hisama, Kay K.; And Others – 1977
The optimal test length, using predictive validity as a criterion, depends on two major conditions: the appropriate item-difficulty rather than the total number of items, and the method used in scoring the test. These conclusions were reached when responses to a 100-item multi-level test of reading comprehension from 136 non-native speakers of…
Descriptors: College Students, Difficulty Level, English (Second Language), Foreign Students