Ferrara, Steve; Qunbar, Saed – Journal of Educational Measurement, 2022
In this article, we argue that automated scoring engines should be transparent and construct-relevant--that is, as much as is currently feasible. Many current automated scoring engines cannot achieve high degrees of scoring accuracy without allowing in some features that may not be easily explained and understood and may not be obviously and…
Descriptors: Artificial Intelligence, Scoring, Essays, Automation
Shermis, Mark D. – Journal of Educational Measurement, 2022
One of the challenges of discussing validity arguments for machine scoring of essays centers on the absence of a commonly held definition and theory of good writing. At best, the algorithms attempt to measure select attributes of writing and calibrate them against human ratings with the goal of accurate prediction of scores for new essays.…
Descriptors: Scoring, Essays, Validity, Writing Evaluation
Dorsey, David W.; Michaels, Hillary R. – Journal of Educational Measurement, 2022
We have dramatically advanced our ability to create rich, complex, and effective assessments across a range of uses through technology advancement. Artificial Intelligence (AI) enabled assessments represent one such area of advancement--one that has captured our collective interest and imagination. Scientists and practitioners within the domains…
Descriptors: Validity, Ethics, Artificial Intelligence, Evaluation Methods
Lane, Suzanne – Journal of Educational Measurement, 2019
Rater-mediated assessments require the evaluation of the accuracy and consistency of the inferences made by the raters to ensure the validity of score interpretations and uses. Modeling rater response processes allows for a better understanding of how raters map their representations of the examinee performance to their representation of the…
Descriptors: Responses, Accuracy, Validity, Interrater Reliability
Hopster-den Otter, Dorien; Wools, Saskia; Eggen, Theo J. H. M.; Veldkamp, Bernard P. – Journal of Educational Measurement, 2019
In educational practice, test results are used for several purposes. However, validity research is especially focused on the validity of summative assessment. This article aimed to provide a general framework for validating formative assessment. The authors applied the argument-based approach to validation to the context of formative assessment.…
Descriptors: Formative Evaluation, Test Validity, Scores, Inferences
Brennan, Robert L. – Journal of Educational Measurement, 2013
Kane's paper "Validating the Interpretations and Uses of Test Scores" is the most complete and clearest discussion yet available of the argument-based approach to validation. At its most basic level, validation as formulated by Kane is fundamentally a simply-stated two-step enterprise: (1) specify the claims inherent in a particular interpretation…
Descriptors: Validity, Test Interpretation, Test Use, Scores

Speedie, Stuart M.; And Others – Journal of Educational Measurement, 1971
Suggests that the correction procedure proposed by Clark and Mirels (Journal of Educational Measurement, 1970, 7, 83-86) is at best a partial correction. The merit of other procedures should be considered. (GS)
Descriptors: Creativity, Divergent Thinking, Intelligence, Responses
Incremental Reliability and Validity of Multiple-Choice Tests with an Answer-Until-Correct Procedure

Hanna, Gerald S. – Journal of Educational Measurement, 1975
An alternative to the conventional right-wrong scoring method used on multiple-choice tests was presented. In the experiment, the examinee continued to respond to a multiple-choice item until feedback signified a correct answer. Findings showed that experimental scores were more reliable but less valid than inferred conventional scores.…
Descriptors: Feedback, Higher Education, Multiple Choice Tests, Scoring

Ackerman, Terry A. – Journal of Educational Measurement, 1992
The difference between item bias and item impact and the way they relate to item validity are discussed from a multidimensional item response theory perspective. The Mantel-Haenszel procedure and the Simultaneous Item Bias strategy are used in a Monte Carlo study to illustrate detection of item bias. (SLD)
Descriptors: Causal Models, Computer Simulation, Construct Validity, Equations (Mathematics)

Hartnett, Rodney T. – Journal of Educational Measurement, 1971
Alternative scoring methods yield essentially the same information, including scale intercorrelations and validity. Reasons for preferring the traditional psychometric scoring technique are offered. (Author/AG)
Descriptors: College Environment, Comparative Analysis, Correlation, Item Analysis

Bridgeman, Brent; Morgan, Rick; Wang, Ming-mei – Journal of Educational Measurement, 1997
Test results of 915 high school students taking a history examination with a choice of topics show that students were generally able to pick the topic on which they could get the highest score. Implications for fair scoring when topic choice is allowed are discussed. (SLD)
Descriptors: Essay Tests, High School Students, History, Performance Factors

Williamson, David M.; Bejar, Isaac I.; Hone, Anne S. – Journal of Educational Measurement, 1999
Contrasts "mental models" used by automated scoring for the simulation division of the computerized Architect Registration Examination with those used by experienced human graders for 3,613 candidate solutions. Discusses differences in the models used and the potential of automated scoring to enhance the validity evidence of scores. (SLD)
Descriptors: Architects, Comparative Analysis, Computer Assisted Testing, Judges

Lennon, Roger T. – Journal of Educational Measurement, 1975
Reviews the 1974 Standards, an update serving as a guide to test making and publishing, and to the training of persons for these endeavors. (DEP)
Descriptors: Educational Testing, Psychological Testing, Scoring, Standards

Ward, William C.; And Others – Journal of Educational Measurement, 1980
Free response and machine-scorable versions of a test called Formulating Hypotheses were compared with respect to construct validity. Results indicate that the different forms involve different cognitive processes and measure different qualities. (Author/JKS)
Descriptors: Cognitive Processes, Cognitive Tests, Higher Education, Personality Traits

Prediger, Dale; Hanson, Gary – Journal of Educational Measurement, 1977
Raw-score reports of vocational interest, personality traits and other psychological constructs are coming into common use. Using college seniors' scores on the American College Test Interest Inventory, criterion-related validity of standard scores based on same-sex and combined-sex norms was equal to or greater than that of raw scores.…
Descriptors: Higher Education, Interest Inventories, Majors (Students), Norms