Ferrara, Steve; Qunbar, Saed – Journal of Educational Measurement, 2022
In this article, we argue that automated scoring engines should be transparent and construct-relevant--that is, as much as is currently feasible. Many current automated scoring engines cannot achieve high degrees of scoring accuracy without allowing in some features that may not be easily explained and understood and may not be obviously and…
Descriptors: Artificial Intelligence, Scoring, Essays, Automation
Shermis, Mark D. – Journal of Educational Measurement, 2022
One of the challenges of discussing validity arguments for machine scoring of essays centers on the absence of a commonly held definition and theory of good writing. At best, the algorithms attempt to measure select attributes of writing and calibrate them against human ratings with the goal of accurate prediction of scores for new essays.…
Descriptors: Scoring, Essays, Validity, Writing Evaluation
Dorsey, David W.; Michaels, Hillary R. – Journal of Educational Measurement, 2022
We have dramatically advanced our ability to create rich, complex, and effective assessments across a range of uses through technology advancement. Artificial Intelligence (AI) enabled assessments represent one such area of advancement--one that has captured our collective interest and imagination. Scientists and practitioners within the domains…
Descriptors: Validity, Ethics, Artificial Intelligence, Evaluation Methods
Lane, Suzanne – Journal of Educational Measurement, 2019
Rater-mediated assessments require the evaluation of the accuracy and consistency of the inferences made by the raters to ensure the validity of score interpretations and uses. Modeling rater response processes allows for a better understanding of how raters map their representations of the examinee performance to their representation of the…
Descriptors: Responses, Accuracy, Validity, Interrater Reliability
Hopster-den Otter, Dorien; Wools, Saskia; Eggen, Theo J. H. M.; Veldkamp, Bernard P. – Journal of Educational Measurement, 2019
In educational practice, test results are used for several purposes. However, validity research is especially focused on the validity of summative assessment. This article aimed to provide a general framework for validating formative assessment. The authors applied the argument-based approach to validation to the context of formative assessment.…
Descriptors: Formative Evaluation, Test Validity, Scores, Inferences
Brennan, Robert L. – Journal of Educational Measurement, 2013
Kane's paper "Validating the Interpretations and Uses of Test Scores" is the most complete and clearest discussion yet available of the argument-based approach to validation. At its most basic level, validation as formulated by Kane is fundamentally a simply-stated two-step enterprise: (1) specify the claims inherent in a particular interpretation…
Descriptors: Validity, Test Interpretation, Test Use, Scores

Speedie, Stuart M.; And Others – Journal of Educational Measurement, 1971
Suggests that the correction procedure proposed by Clark and Mirels (Journal of Educational Measurement, 1970, 7, 83-86) is at best a partial correction. The merit of other procedures should be considered. (GS)
Descriptors: Creativity, Divergent Thinking, Intelligence, Responses
Incremental Reliability and Validity of Multiple-Choice Tests with an Answer-Until-Correct Procedure

Hanna, Gerald S. – Journal of Educational Measurement, 1975
An alternative to the conventional right-wrong scoring method used on multiple-choice tests was presented. In the experiment, the examinee continued to respond to a multiple-choice item until feedback signified a correct answer. Findings showed that experimental scores were more reliable but less valid than inferred conventional scores.…
Descriptors: Feedback, Higher Education, Multiple Choice Tests, Scoring

Ackerman, Terry A. – Journal of Educational Measurement, 1992
The difference between item bias and item impact and the way they relate to item validity are discussed from a multidimensional item response theory perspective. The Mantel-Haenszel procedure and the Simultaneous Item Bias strategy are used in a Monte Carlo study to illustrate detection of item bias. (SLD)
Descriptors: Causal Models, Computer Simulation, Construct Validity, Equations (Mathematics)

Hartnett, Rodney T. – Journal of Educational Measurement, 1971
Alternative scoring methods yield essentially the same information, including scale intercorrelations and validity. Reasons for preferring the traditional psychometric scoring technique are offered. (Author/AG)
Descriptors: College Environment, Comparative Analysis, Correlation, Item Analysis

Bridgeman, Brent; Morgan, Rick; Wang, Ming-mei – Journal of Educational Measurement, 1997
Test results of 915 high school students taking a history examination with a choice of topics show that students were generally able to pick the topic on which they could get the highest score. Implications for fair scoring when topic choice is allowed are discussed. (SLD)
Descriptors: Essay Tests, High School Students, History, Performance Factors

Williamson, David M.; Bejar, Isaac I.; Hone, Anne S. – Journal of Educational Measurement, 1999
Contrasts "mental models" used by automated scoring for the simulation division of the computerized Architect Registration Examination with those used by experienced human graders for 3,613 candidate solutions. Discusses differences in the models used and the potential of automated scoring to enhance the validity evidence of scores. (SLD)
Descriptors: Architects, Comparative Analysis, Computer Assisted Testing, Judges

Lennon, Roger T. – Journal of Educational Measurement, 1975
Reviews the 1974 Standards, an update serving as a guide to test making and publishing, and to the training of persons for these endeavors. (DEP)
Descriptors: Educational Testing, Psychological Testing, Scoring, Standards

Ward, William C.; And Others – Journal of Educational Measurement, 1980
Free response and machine-scorable versions of a test called Formulating Hypotheses were compared with respect to construct validity. Results indicate that the different forms involve different cognitive processes and measure different qualities. (Author/JKS)
Descriptors: Cognitive Processes, Cognitive Tests, Higher Education, Personality Traits

Prediger, Dale; Hanson, Gary – Journal of Educational Measurement, 1977
Raw-score reports of vocational interest, personality traits and other psychological constructs are coming into common use. Using college seniors' scores on the American College Test Interest Inventory, criterion-related validity of standard scores based on same-sex and combined-sex norms was equal to or greater than that of raw scores.…
Descriptors: Higher Education, Interest Inventories, Majors (Students), Norms