Showing all 10 results
Peer reviewed
Baldwin, Peter; Clauser, Brian E. – Journal of Educational Measurement, 2022
While score comparability across test forms typically relies on common (or randomly equivalent) examinees or items, innovations in item formats, test delivery, and efforts to extend the range of score interpretation may require a special data collection before examinees or items can be used in this way--or may be incompatible with common examinee…
Descriptors: Scoring, Testing, Test Items, Test Format
Peer reviewed
Guo, Wenjing; Wind, Stefanie A. – Journal of Educational Measurement, 2021
The use of mixed-format tests made up of multiple-choice (MC) items and constructed response (CR) items is popular in large-scale testing programs, including the National Assessment of Educational Progress (NAEP) and many district- and state-level assessments in the United States. Rater effects, or raters' scoring tendencies that result in…
Descriptors: Test Format, Multiple Choice Tests, Scoring, Test Items
Peer reviewed
Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – Journal of Educational Measurement, 2010
In this study we examined variations of the nonequivalent groups equating design for tests containing both multiple-choice (MC) and constructed-response (CR) items to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, this study investigated the use of…
Descriptors: Measures (Individuals), Scoring, Equated Scores, Test Bias
Peer reviewed
Ward, William C.; And Others – Journal of Educational Measurement, 1980
Free response and machine-scorable versions of a test called Formulating Hypotheses were compared with respect to construct validity. Results indicate that the different forms involve different cognitive processes and measure different qualities. (Author/JKS)
Descriptors: Cognitive Processes, Cognitive Tests, Higher Education, Personality Traits
Peer reviewed
Mills, Craig N. – Journal of Educational Measurement, 1983
This study compares the results obtained using the Angoff, borderline group, and contrasting groups methods of determining performance standards. Congruent results were obtained from the Angoff and contrasting groups methods for several test forms. Borderline group standards were not similar to standards obtained with other methods. (Author/PN)
Descriptors: Comparative Analysis, Criterion Referenced Tests, Cutting Scores, Standard Setting (Scoring)
Peer reviewed
Hughes, David C.; And Others – Journal of Educational Measurement, 1980
The effect of context on the scoring of essays was examined by arranging that the scoring of the criterion essay would be preceded either by five superior essays or by five inferior essays. The contrast in essay quality had the hypothesized effect. Other effects were not significant. (CTM)
Descriptors: Essay Tests, High Schools, Holistic Evaluation, Scoring
Peer reviewed
Wilcox, Rand R.; Wilcox, Karen Thompson – Journal of Educational Measurement, 1988
Use of latent class models to examine strategies that examinees (92 college students) use for a specific task is illustrated, via a multiple-choice test of spatial ability. Under an answer-until-correct scoring procedure, models representing an improvement over simplistic random guessing are proposed. (SLD)
Descriptors: College Students, Decision Making, Guessing (Tests), Multiple Choice Tests
Peer reviewed
Norcini, John J. – Journal of Educational Measurement, 1987
Answer keys for physician and teacher licensing examinations were studied. The impact of variability on total errors of measurement was examined for answer keys constructed using the aggregate method. Results indicated that, in some cases, scorers contributed to a sizable reduction in measurement error. (Author/GDC)
Descriptors: Adults, Answer Keys, Error of Measurement, Evaluators
Peer reviewed
Braun, Henry I.; And Others – Journal of Educational Measurement, 1990
The accuracy with which expert systems (ESs) score a new non-multiple-choice free-response test item was investigated, using 734 high school students who were administered an advanced-placement computer science examination. ESs produced scores for 82 percent to 95 percent of the responses and displayed high agreement with a human reader on the…
Descriptors: Advanced Placement, Computer Assisted Testing, Computer Science, Constructed Response
Peer reviewed
Bridgeman, Brent – Journal of Educational Measurement, 1992
Examinees in a regular administration of the quantitative portion of the Graduate Record Examination responded to particular items in a machine-scannable multiple-choice format. Volunteers (n=364) used a computer to answer open-ended counterparts of these items. Scores for both formats demonstrated similar correlational patterns. (SLD)
Descriptors: Answer Sheets, College Entrance Examinations, College Students, Comparative Testing