ERIC - Search Results

Showing all 15 results

Instruction-Tuned Large-Language Models for Quality Control in Automatic Item Generation: A Feasibility Study

Peer reviewed

Gorgun, Guher; Bulut, Okan – Educational Measurement: Issues and Practice, 2025

Automatic item generation may supply many items instantly and efficiently to assessment and learning environments. Yet the evaluation of item quality remains a bottleneck for deploying generated items in learning and assessment settings. In this study, we investigated the utility of using large-language models, specifically Llama 3-8B, for…

Descriptors: Artificial Intelligence, Quality Control, Technology Uses in Education, Automation

Supporting the Interpretive Validity of Student-Level Claims in Science Assessment with Tiered Claim Structures

Peer reviewed

Student, Sanford R.; Gong, Brian – Educational Measurement: Issues and Practice, 2022

We address two persistent challenges in large-scale assessments of the Next Generation Science Standards: (a) the validity of score interpretations that target the standards broadly and (b) how to structure claims for assessments of this complex domain. The NGSS pose a particular challenge for specifying claims about students that evidence from…

Descriptors: Science Tests, Test Validity, Test Items, Test Construction

Disrupted Data: Using Longitudinal Assessment Systems to Monitor Test Score Quality

Peer reviewed

An, Lily Shiao; Ho, Andrew Dean; Davis, Laurie Laughlin – Educational Measurement: Issues and Practice, 2022

Technical documentation for educational tests focuses primarily on properties of individual scores at single points in time. Reliability, standard errors of measurement, item parameter estimates, fit statistics, and linking constants are standard technical features that external stakeholders use to evaluate items and individual scale scores.…

Descriptors: Documentation, Scores, Evaluation Methods, Longitudinal Studies

The Effect of Drag-and-Drop Item Features on Test-Taker Performance and Response Strategies

Peer reviewed

Arslan, Burcu; Jiang, Yang; Keehner, Madeleine; Gong, Tao; Katz, Irvin R.; Yan, Fred – Educational Measurement: Issues and Practice, 2020

Computer-based educational assessments often include items that involve drag-and-drop responses. There are different ways that drag-and-drop items can be laid out and different choices that test developers can make when designing these items. Currently, these decisions are based on experts' professional judgments and design constraints, rather…

Descriptors: Test Items, Computer Assisted Testing, Test Format, Decision Making

Rapid-Guessing Behavior: Its Identification, Interpretation, and Implications

Peer reviewed

Wise, Steven L. – Educational Measurement: Issues and Practice, 2017

The rise of computer-based testing has brought with it the capability to measure more aspects of a test event than simply the answers selected or constructed by the test taker. One behavior that has drawn much research interest is the time test takers spend responding to individual multiple-choice items. In particular, very short response…

Descriptors: Guessing (Tests), Multiple Choice Tests, Test Items, Reaction Time

A Process for Reviewing and Evaluating Generated Test Items

Peer reviewed

Gierl, Mark J.; Lai, Hollis – Educational Measurement: Issues and Practice, 2016

Testing organizations need large numbers of high-quality items due to the proliferation of alternative test administration methods and modern test designs. But the current demand for items far exceeds the supply. Test items, as they are currently written, evoke a process that is both time-consuming and expensive because each item is written,…

Descriptors: Test Items, Test Construction, Psychometrics, Models

Universal Design and Multimethod Approaches to Item Review

Peer reviewed

Johnstone, Christopher J.; Thompson, Sandra J.; Bottsford-Miller, Nicole A.; Thurlow, Martha L. – Educational Measurement: Issues and Practice, 2008

Test items undergo multiple iterations of review before states and vendors deem them acceptable to be placed in a live statewide assessment. This article reviews three approaches that can add validity evidence to states' item review processes. The first process is a structured sensitivity review process that focuses on universal design…

Descriptors: Test Items, Disabilities, Test Construction, Testing Programs

Validity Issues in Test Speededness

Peer reviewed

Lu, Ying; Sireci, Stephen G. – Educational Measurement: Issues and Practice, 2007

Speededness refers to the situation where the time limits on a standardized test do not allow substantial numbers of examinees to fully consider all test items. When tests are not intended to measure speed of responding, speededness introduces a severe threat to the validity of interpretations based on test scores. In this article, we describe…

Descriptors: Test Items, Timed Tests, Standardized Tests, Test Validity

The Golden Rule Settlement: A Minority Perspective.

Peer reviewed

Bond, Lloyd – Educational Measurement: Issues and Practice, 1987

This article suggests that mechanical application of Golden Rule-like procedures is inappropriate. The fundamental idea embodied in them, namely, that of taking issues of equity into account in test construction, may reasonably be done without doing violence to test validity. (JAZ)

Descriptors: Court Litigation, Item Analysis, Minority Groups, Standards

Using Standardized Tests for Assessing Local Learning Objectives.

Peer reviewed

Wilson, Sandra Meachan; Hiscox, Michael D. – Educational Measurement: Issues and Practice, 1984

This article presents a model that local school districts can use to reanalyze standardized test results and obtain a more valid assessment of local learning objectives. The model can be used to identify strengths/weaknesses of existing programs as well as individual students. (EGS)

Descriptors: Educational Objectives, Item Analysis, Models, School Districts

Test-Wiseness for Teachers and Students.

Peer reviewed

Carter, Kathy – Educational Measurement: Issues and Practice, 1986

This article discusses the validity issue in teacher-made tests. Seventh-grade students' comments about their responses to a test designed to illustrate faulty items suggest students are quite proficient in using secondary clues to figure out correct answers. Teacher comments suggest teachers are unaware they provide such clues. (Author/JAZ)

Descriptors: Cues, Grade 7, Item Analysis, Junior High Schools

Implications of the Golden Rule Settlement for Test Construction.

Peer reviewed

Linn, Robert L.; Drasgow, Fritz – Educational Measurement: Issues and Practice, 1987

This article discusses the application of the Golden Rule procedure to items of the Scholastic Aptitude Test. Using item response theory, the analyses indicate that the Golden Rule procedures are ineffective in detecting biased items and may undermine the reliability and validity of tests. (Author/JAZ)

Descriptors: College Entrance Examinations, Difficulty Level, Item Analysis, Latent Trait Theory

The Multiple True-False Item Format: A Status Review.

Peer reviewed

Frisbie, David A. – Educational Measurement: Issues and Practice, 1992

Literature related to the multiple true-false (MTF) item format is reviewed. Each answer cluster of an MTF item may have several true items, and the correctness of each is judged independently. MTF tests appear efficient and reliable, although they are a bit harder than multiple choice items for examinees. (SLD)

Descriptors: Achievement Tests, Difficulty Level, Literature Reviews, Multiple Choice Tests

Valid Normative Information from Customized Achievement Tests.

Peer reviewed

Yen, Wendy M.; And Others – Educational Measurement: Issues and Practice, 1987

This paper discusses how to maintain the integrity of national normative information for achievement tests when the test that is administered has been customized to satisfy local needs and is not a test that has been nationally normed. Alternative procedures for item selection and calibration are examined. (Author/LMO)

Descriptors: Achievement Tests, Elementary Secondary Education, Goodness of Fit, Item Analysis

Customizing a Norm-Referenced Achievement Test to Achieve Curricular Validity: A Case Study.

Peer reviewed

Jolly, S. Jean; Gramenz, Gary W. – Educational Measurement: Issues and Practice, 1984

A norm-referenced achievement test, in combination with supplementary items, can be used to produce norm-referenced data as well as objective-referenced data. The experiences of the Palm Beach County (Florida) school district in developing and using such a test are described. (EGS)

Descriptors: Achievement Tests, Criterion Referenced Tests, Elementary Secondary Education, Item Analysis
