Baldwin, Peter; Clauser, Brian E. – Journal of Educational Measurement, 2022
While score comparability across test forms typically relies on common (or randomly equivalent) examinees or items, innovations in item formats, test delivery, and efforts to extend the range of score interpretation may require a special data collection before examinees or items can be used in this way--or may be incompatible with common examinee…
Descriptors: Scoring, Testing, Test Items, Test Format

Guo, Wenjing; Wind, Stefanie A. – Journal of Educational Measurement, 2021
The use of mixed-format tests made up of multiple-choice (MC) items and constructed response (CR) items is popular in large-scale testing programs, including the National Assessment of Educational Progress (NAEP) and many district- and state-level assessments in the United States. Rater effects, or raters' scoring tendencies that result in…
Descriptors: Test Format, Multiple Choice Tests, Scoring, Test Items
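
Rater severity is one such effect: a simple way to see it is each rater's average deviation from the across-rater mean score of the responses they rated. A minimal sketch in Python, using invented ratings (the rater labels and scores are hypothetical, not from the study):

```python
# Minimal sketch of a rater-severity index: a rater's mean deviation from
# the per-response average score. All ratings below are invented.
from collections import defaultdict

# (response_id, rater, score) triples -- hypothetical data
ratings = [
    (1, "A", 3), (1, "B", 4),
    (2, "A", 2), (2, "B", 3),
    (3, "A", 4), (3, "B", 4),
]

# Average score each response received across raters
by_response = defaultdict(list)
for resp, rater, score in ratings:
    by_response[resp].append(score)
resp_mean = {r: sum(s) / len(s) for r, s in by_response.items()}

# Severity index = mean (own score - response mean); negative => harsher
deviations = defaultdict(list)
for resp, rater, score in ratings:
    deviations[rater].append(score - resp_mean[resp])

for rater, devs in sorted(deviations.items()):
    print(f"rater {rater}: severity index {sum(devs) / len(devs):+.2f}")
```
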
Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – Journal of Educational Measurement, 2010
In this study we examined variations of the nonequivalent groups equating design for tests containing both multiple-choice (MC) and constructed-response (CR) items to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, this study investigated the use of…
Descriptors: Measures (Individuals), Scoring, Equated Scores, Test Bias
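
The nonequivalent groups design links the two forms through a common anchor test taken by both groups. A minimal sketch of one variant, chained linear equating, with invented score vectors (the study compares several design variants beyond this basic chain):

```python
# Minimal sketch of chained linear equating under the nonequivalent groups
# with anchor test (NEAT) design. All score data below are invented.
import statistics

def linear_link(x_mean, x_sd, y_mean, y_sd):
    """Return f(x) mapping X's scale to Y's scale by matching mean and SD."""
    slope = y_sd / x_sd
    return lambda x: y_mean + slope * (x - x_mean)

# Group 1 takes new form X plus anchor A; group 2 takes reference form Y
# plus the same anchor A.
group1_X = [52, 61, 58, 70, 66]
group1_A = [20, 24, 22, 28, 26]
group2_Y = [55, 63, 60, 72, 68]
group2_A = [19, 25, 21, 29, 27]

mean, sd = statistics.mean, statistics.stdev

# Chain: X -> A (through group 1), then A -> Y (through group 2)
x_to_a = linear_link(mean(group1_X), sd(group1_X), mean(group1_A), sd(group1_A))
a_to_y = linear_link(mean(group2_A), sd(group2_A), mean(group2_Y), sd(group2_Y))

for x in (55, 60, 65):
    print(f"X score {x} -> equated Y score {a_to_y(x_to_a(x)):.1f}")
```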

Ward, William C.; And Others – Journal of Educational Measurement, 1980
Free-response and machine-scorable versions of a test called Formulating Hypotheses were compared with respect to construct validity. Results indicate that the different forms involve different cognitive processes and measure different qualities. (Author/JKS)
Descriptors: Cognitive Processes, Cognitive Tests, Higher Education, Personality Traits

Mills, Craig N. – Journal of Educational Measurement, 1983
This study compares the results obtained using the Angoff, borderline group, and contrasting groups methods of determining performance standards. Congruent results were obtained from the Angoff and contrasting groups methods for several test forms. Borderline group standards were not similar to standards obtained with other methods. (Author/PN)
Descriptors: Comparative Analysis, Criterion Referenced Tests, Cutting Scores, Standard Setting (Scoring)
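
The three methods can be illustrated with toy data. A minimal sketch, assuming invented judge ratings and examinee scores (none of the values come from the study):

```python
import statistics

# --- Angoff: judges estimate, per item, the probability that a minimally
# competent examinee answers correctly; the cut score is each judge's sum,
# averaged over judges. Probabilities below are invented.
judge_ratings = [
    [0.6, 0.7, 0.5, 0.8],   # judge 1, items 1-4
    [0.5, 0.8, 0.6, 0.7],   # judge 2
]
angoff_cut = statistics.mean(sum(j) for j in judge_ratings)

# --- Borderline group: median test score of examinees judged "borderline".
borderline_scores = [2, 3, 2, 3, 3]
borderline_cut = statistics.median(borderline_scores)

# --- Contrasting groups: pick the score that best separates examinees
# judged competent from those judged not competent.
competent = [3, 4, 3, 4]
not_competent = [1, 2, 2, 3]

def misclassified(cut):
    # competent examinees below the cut + not-competent at/above it
    return (sum(s < cut for s in competent)
            + sum(s >= cut for s in not_competent))

contrasting_cut = min(range(0, 5), key=misclassified)

print(f"Angoff cut:             {angoff_cut:.2f}")
print(f"Borderline-group cut:   {borderline_cut}")
print(f"Contrasting-groups cut: {contrasting_cut}")
```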

Hughes, David C.; And Others – Journal of Educational Measurement, 1980
The effect of context on the scoring of essays was examined by arranging for the scoring of the criterion essay to be preceded by either five superior or five inferior essays. The contrast in essay quality had the hypothesized effect. Other effects were not significant. (CTM)
Descriptors: Essay Tests, High Schools, Holistic Evaluation, Scoring

Wilcox, Rand R.; Wilcox, Karen Thompson – Journal of Educational Measurement, 1988
The use of latent class models to examine the strategies that examinees (92 college students) use for a specific task is illustrated via a multiple-choice test of spatial ability. Under an answer-until-correct scoring procedure, models representing an improvement over simplistic random guessing are proposed. (SLD)
Descriptors: College Students, Decision Making, Guessing (Tests), Multiple Choice Tests
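
Under answer-until-correct scoring, an examinee keeps responding to an item until the keyed option is found, and the item score decreases with the number of attempts. A minimal sketch using one common linear partial-credit rule (the rule and the data are illustrative assumptions, not the paper's model):

```python
# Minimal sketch of answer-until-correct (AUC) scoring: full credit on the
# first attempt, linearly less credit for each additional attempt.

def auc_item_score(attempts: int, n_options: int = 4) -> float:
    """Score an item answered correctly on the given attempt (1-based)."""
    if not 1 <= attempts <= n_options:
        raise ValueError("attempts must be between 1 and n_options")
    # Linear partial credit: 1 on first attempt, 0 after exhausting options
    return (n_options - attempts) / (n_options - 1)

# Attempts-to-correct for each item of a hypothetical 5-item test
attempts_per_item = [1, 2, 1, 3, 4]
total = sum(auc_item_score(a) for a in attempts_per_item)
print(f"per-item scores: {[round(auc_item_score(a), 2) for a in attempts_per_item]}")
print(f"total AUC score: {total:.2f}")
```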

Norcini, John J. – Journal of Educational Measurement, 1987
Answer keys for physician and teacher licensing examinations were studied. The impact of variability on total errors of measurement was examined for answer keys constructed using the aggregate method. Results indicated that, in some cases, scorers contributed to a sizable reduction in measurement error. (Author/GDC)
Descriptors: Adults, Answer Keys, Error of Measurement, Evaluators
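
One reading of the aggregate method is that the scoring key is assembled from several experts' individual keys. A minimal sketch, assuming a simple per-item plurality vote (the expert keys are invented, and plurality voting is a simplification of the method studied):

```python
# Minimal sketch of building an answer key by aggregating several experts'
# keys with a per-item plurality vote. All keys below are invented.
from collections import Counter

expert_keys = [
    "ABCAD",   # expert 1's key for a 5-item test
    "ABCBD",   # expert 2
    "ABDAD",   # expert 3
]

aggregate_key = "".join(
    Counter(keys).most_common(1)[0][0]
    for keys in zip(*expert_keys)
)
print(f"aggregate key: {aggregate_key}")   # -> ABCAD

# Score a hypothetical response string against the aggregate key
response = "ABCAD"
score = sum(r == k for r, k in zip(response, aggregate_key))
print(f"score: {score}/{len(aggregate_key)}")
```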

Braun, Henry I.; And Others – Journal of Educational Measurement, 1990
The accuracy with which expert systems (ESs) score a new nonmultiple-choice free-response test item was investigated, using 734 high school students who were administered an advanced-placement computer science examination. ESs produced scores for 82 percent to 95 percent of the responses and displayed high agreement with a human reader on the…
Descriptors: Advanced Placement, Computer Assisted Testing, Computer Science, Constructed Response
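
Machine-human agreement of this kind is typically summarized with an exact agreement rate and a chance-corrected index such as Cohen's kappa. A minimal sketch over invented paired scores (the data and the choice of kappa are assumptions, not taken from the study):

```python
# Minimal sketch of machine-human scoring agreement: exact agreement and
# Cohen's kappa. The paired scores below are invented.
from collections import Counter

human   = [3, 2, 4, 3, 1, 2, 4, 3, 2, 3]
machine = [3, 2, 4, 2, 1, 2, 4, 3, 3, 3]

n = len(human)
observed = sum(h == m for h, m in zip(human, machine)) / n

# Chance agreement from the two marginal score distributions
h_counts, m_counts = Counter(human), Counter(machine)
expected = sum(h_counts[s] * m_counts[s] for s in set(human) | set(machine)) / n**2

kappa = (observed - expected) / (1 - expected)
print(f"exact agreement: {observed:.2f}")
print(f"Cohen's kappa:   {kappa:.2f}")
```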

Bridgeman, Brent – Journal of Educational Measurement, 1992
Examinees in a regular administration of the quantitative portion of the Graduate Record Examination responded to particular items in a machine-scannable multiple-choice format. Volunteers (n=364) used a computer to answer open-ended counterparts of these items. Scores for both formats demonstrated similar correlational patterns. (SLD)
Descriptors: Answer Sheets, College Entrance Examinations, College Students, Comparative Testing
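
A correlational comparison of this kind checks whether the two formats relate to each other, and to an external criterion, in similar ways. A minimal sketch with invented score vectors (all data and the criterion variable are hypothetical):

```python
# Minimal sketch of comparing correlational patterns across item formats.
# All score vectors below are invented.
import statistics

def pearson(x, y):
    """Population Pearson correlation of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (statistics.pstdev(x) * statistics.pstdev(y) * len(x))

mc_scores   = [12, 15, 9, 18, 14, 11, 16, 13]   # multiple-choice format
open_scores = [10, 14, 8, 17, 13, 10, 15, 12]   # open-ended counterparts
criterion   = [500, 580, 450, 640, 560, 480, 600, 540]

print(f"MC vs open-ended:  r = {pearson(mc_scores, open_scores):.2f}")
print(f"MC vs criterion:   r = {pearson(mc_scores, criterion):.2f}")
print(f"open vs criterion: r = {pearson(open_scores, criterion):.2f}")
```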