Fu, Yanyan – Educational Measurement: Issues and Practice, 2024
The template-based automated item-generation (TAIG) approach, which involves template creation, item generation, item selection, field-testing, and evaluation, has more steps than the traditional item development method. Consequently, there is more margin for error in this process, and any template errors can cascade to the generated items.…
Descriptors: Error Correction, Automation, Test Items, Test Construction
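As a minimal illustration of the TAIG idea in the Fu (2024) entry above, the sketch below generates item instances from a hypothetical arithmetic template: every instance inherits whatever the template specifies, so a single error in the template's keying rule would propagate to all generated items. The template, parameter values, and field names are illustrative assumptions, not the study's materials.

```python
# Minimal sketch (not the study's actual system): a hypothetical arithmetic
# item template whose parameters are varied to generate item instances.
import itertools

TEMPLATE = "A train travels {speed} km/h for {hours} hours. How far does it go?"

def generate_items(speeds, hours_list):
    items = []
    for speed, hours in itertools.product(speeds, hours_list):
        items.append({
            "stem": TEMPLATE.format(speed=speed, hours=hours),
            "key": speed * hours,   # correct keying rule
            # A template-level bug here (e.g., speed + hours) would be
            # inherited by every generated item below.
        })
    return items

if __name__ == "__main__":
    for item in generate_items(speeds=[60, 80], hours_list=[2, 3]):
        print(item["stem"], "->", item["key"])
```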
Fu, Yanyan; Choe, Edison M.; Lim, Hwanggyu; Choi, Jaehwa – Educational Measurement: Issues and Practice, 2022
This case study applied the "weak theory" of Automatic Item Generation (AIG) to generate isomorphic item instances (i.e., unique but psychometrically equivalent items) for a large-scale assessment. Three representative instances were selected from each item template (i.e., model) and pilot-tested. In addition, a new analytical framework,…
Descriptors: Test Items, Measurement, Psychometrics, Test Construction
Zhang, Susu; Li, Anqi; Wang, Shiyu – Educational Measurement: Issues and Practice, 2023
In computer-based tests allowing revision and reviews, examinees' sequence of visits and answer changes to questions can be recorded. The variable-length revision log data introduce new complexities to the collected data but, at the same time, provide additional information on examinees' test-taking behavior, which can inform test development and…
Descriptors: Computer Assisted Testing, Test Construction, Test Wiseness, Test Items
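One plausible way to picture the variable-length revision log data described in the Zhang, Li, and Wang (2023) entry is as an ordered sequence of visit events per examinee. The sketch below uses an assumed (examinee, item, answer) encoding and tallies answer changes; it is illustrative only and does not reflect the authors' actual data format.

```python
# Illustrative encoding of revision log data: each record is one visit to an
# item, in chronological order. Field names are assumptions for this sketch.
from collections import defaultdict

revision_log = [
    # (examinee_id, item_id, answer_selected) in visit order
    ("E1", "Q1", "A"), ("E1", "Q2", "C"), ("E1", "Q1", "B"),  # E1 revises Q1
    ("E2", "Q1", "A"), ("E2", "Q2", "D"),                     # E2 never revises
]

def count_answer_changes(log):
    """Count how often each examinee changes a previously given answer."""
    last_answer = {}                  # (examinee, item) -> last answer seen
    changes = defaultdict(int)
    for examinee, item, answer in log:
        key = (examinee, item)
        if key in last_answer and last_answer[key] != answer:
            changes[examinee] += 1
        last_answer[key] = answer
    return dict(changes)

print(count_answer_changes(revision_log))   # {'E1': 1}
```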
Gorgun, Guher; Bulut, Okan – Educational Measurement: Issues and Practice, 2025
Automatic item generation may supply many items instantly and efficiently to assessment and learning environments. Yet the evaluation of item quality remains a bottleneck for deploying generated items in learning and assessment settings. In this study, we investigated the utility of using large language models, specifically Llama 3-8B, for…
Descriptors: Artificial Intelligence, Quality Control, Technology Uses in Education, Automation
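The Gorgun and Bulut (2025) entry uses a large language model to screen generated items. A minimal sketch of that kind of screening step, assuming a locally available instruction-tuned Llama 3 8B checkpoint served through Hugging Face transformers, is shown below; the prompt wording and 1-to-5 rubric are illustrative assumptions rather than the study's protocol.

```python
# Sketch of LLM-based item screening; the rubric and prompt are assumptions.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # assumed checkpoint name
)

def rate_item(stem: str, options: list[str], key: str) -> str:
    prompt = (
        "You are reviewing a multiple-choice test item.\n"
        f"Stem: {stem}\nOptions: {', '.join(options)}\nKey: {key}\n"
        "Rate the item's quality from 1 (unusable) to 5 (field-test ready) "
        "and briefly justify the rating."
    )
    result = generator(prompt, max_new_tokens=120, do_sample=False,
                       return_full_text=False)
    return result[0]["generated_text"]

print(rate_item("2 + 2 = ?", ["3", "4", "5", "6"], "4"))
```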
Student, Sanford R.; Gong, Brian – Educational Measurement: Issues and Practice, 2022
We address two persistent challenges in large-scale assessments of the Next Generation Science Standards: (a) the validity of score interpretations that target the standards broadly and (b) how to structure claims for assessments of this complex domain. The NGSS pose a particular challenge for specifying claims about students that evidence from…
Descriptors: Science Tests, Test Validity, Test Items, Test Construction
Lewis, Daniel; Cook, Robert – Educational Measurement: Issues and Practice, 2020
In this paper we assert that the practice of principled assessment design renders traditional standard-setting methodology redundant at best and contradictory at worst. We describe the rationale for, and methodological details of, Embedded Standard Setting (ESS; previously Engineered Cut Scores; Lewis, 2016), an approach to establishing performance…
Descriptors: Standard Setting, Evaluation, Cutting Scores, Performance Based Assessment
Traynor, A.; Merzdorf, H. E. – Educational Measurement: Issues and Practice, 2018
During the development of large-scale curricular achievement tests, recruited panels of independent subject-matter experts use systematic judgmental methods--often collectively labeled "alignment" methods--to rate the correspondence between a given test's items and the objective statements in a particular curricular standards document.…
Descriptors: Achievement Tests, Expertise, Alignment (Education), Test Items
Arslan, Burcu; Jiang, Yang; Keehner, Madeleine; Gong, Tao; Katz, Irvin R.; Yan, Fred – Educational Measurement: Issues and Practice, 2020
Computer-based educational assessments often include items that involve drag-and-drop responses. There are different ways that drag-and-drop items can be laid out and different choices that test developers can make when designing these items. Currently, these decisions are based on experts' professional judgments and design constraints, rather…
Descriptors: Test Items, Computer Assisted Testing, Test Format, Decision Making
Yoo, Hanwook; Hambleton, Ronald K. – Educational Measurement: Issues and Practice, 2019
Item analysis is an integral part of operational test development and is typically conducted within two popular statistical frameworks: classical test theory (CTT) and item response theory (IRT). In this digital ITEMS module, Hanwook Yoo and Ronald K. Hambleton provide an accessible overview of operational item analysis approaches within these…
Descriptors: Item Analysis, Item Response Theory, Guidelines, Test Construction
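To make the two frameworks named in the Yoo and Hambleton (2019) module concrete, the sketch below computes classical item difficulty and point-biserial discrimination for a toy response matrix and evaluates a two-parameter logistic (2PL) item response function. The response data are fabricated for illustration.

```python
# Illustrative only: classical (CTT) item statistics and a 2PL IRT curve.
import numpy as np

# Rows = examinees, columns = items; 1 = correct, 0 = incorrect.
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
])

total_scores = responses.sum(axis=1)

# CTT: item difficulty (proportion correct) and point-biserial discrimination
# (correlation between the item score and the total score).
difficulty = responses.mean(axis=0)
discrimination = np.array([
    np.corrcoef(responses[:, j], total_scores)[0, 1]
    for j in range(responses.shape[1])
])

# IRT: two-parameter logistic item response function P(correct | theta).
def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

print("difficulty:", np.round(difficulty, 2))
print("discrimination:", np.round(discrimination, 2))
print("P(theta=0 | a=1.2, b=-0.5) =", round(p_2pl(0.0, 1.2, -0.5), 3))
```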
Rafatbakhsh, Elaheh; Ahmadi, Alireza; Moloodi, Amirsaeid; Mehrpour, Saeed – Educational Measurement: Issues and Practice, 2021
Test development is a crucial, yet difficult and time-consuming part of any educational system, and the task often falls entirely on teachers. Automatic item generation systems have recently drawn attention as they can reduce this burden and make test development more convenient. Such systems have been developed to generate items for vocabulary,…
Descriptors: Test Construction, Test Items, Computer Assisted Testing, Multiple Choice Tests
Sinharay, Sandip – Educational Measurement: Issues and Practice, 2018
The choice of anchor tests is crucial in applications of the nonequivalent groups with anchor test design of equating. Sinharay and Holland (2006, 2007) suggested "miditests," which are anchor tests that are content-representative and have the same mean item difficulty as the total test but have a smaller spread of item difficulties.…
Descriptors: Test Content, Difficulty Level, Test Items, Test Construction
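The "miditest" idea in the Sinharay (2018) entry can be sketched with a toy selection rule: pick anchor items whose difficulties cluster around the full test's mean, so the anchor matches the test's mean difficulty but has a smaller spread. The rule below is a deliberate simplification that ignores the content-representativeness requirement discussed in the article.

```python
# Toy miditest selection: anchor items closest to the test's mean difficulty.
import numpy as np

rng = np.random.default_rng(0)
difficulties = rng.uniform(0.2, 0.9, size=40)   # proportion-correct values

def select_miditest(diffs, n_anchor=8):
    mean_diff = diffs.mean()
    closest = np.argsort(np.abs(diffs - mean_diff))[:n_anchor]
    return np.sort(closest)

anchor = select_miditest(difficulties)
print("test mean/SD:  ", difficulties.mean().round(3), difficulties.std().round(3))
print("anchor mean/SD:", difficulties[anchor].mean().round(3),
      difficulties[anchor].std().round(3))
```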
Moon, Jung Aa; Keehner, Madeleine; Katz, Irvin R. – Educational Measurement: Issues and Practice, 2019
The current study investigated how item formats and their inherent affordances influence test-takers' cognition under uncertainty. Adult participants solved content-equivalent math items in multiple-selection multiple-choice and four alternative grid formats. The results indicated that participants' affirmative response tendency (i.e., judge the…
Descriptors: Affordances, Test Items, Test Format, Test Wiseness
Embretson, Susan E. – Educational Measurement: Issues and Practice, 2016
Examinees' thinking processes have become an increasingly important concern in testing. The response processes aspect is a major component of validity, and contemporary tests increasingly involve specifications about the cognitive complexity of examinees' response processes. Yet, empirical research findings on examinees' cognitive processes are…
Descriptors: Testing, Cognitive Processes, Test Construction, Test Items
Davenport, Ernest C.; Davison, Mark L.; Liou, Pey-Yan; Love, Quintin U. – Educational Measurement: Issues and Practice, 2016
The main points of Sijtsma and Green and Yang in Educational Measurement: Issues and Practice (34, 4) are that reliability, internal consistency, and unidimensionality are distinct and that Cronbach's alpha may be problematic. Neither of these assertions is at odds with Davenport, Davison, Liou, and Love in the same issue. However, many authors…
Descriptors: Educational Assessment, Reliability, Validity, Test Construction
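Since the Davenport et al. (2016) exchange turns on what Cronbach's alpha does and does not measure, a worked sketch of the coefficient may help: alpha is computed purely from item and total-score variances, which is why a high value signals internal consistency but is not direct evidence of unidimensionality. The score matrix below is fabricated for illustration.

```python
# Worked sketch: Cronbach's alpha = k/(k-1) * (1 - sum(item var) / total var).
import numpy as np

scores = np.array([
    [2, 3, 3, 4],
    [1, 2, 2, 3],
    [4, 4, 5, 5],
    [2, 2, 3, 3],
    [3, 4, 4, 5],
], dtype=float)                      # rows = examinees, columns = items

def cronbach_alpha(x):
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

print(round(cronbach_alpha(scores), 3))
```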
Gierl, Mark J.; Lai, Hollis – Educational Measurement: Issues and Practice, 2016
Testing organizations need large numbers of high-quality items due to the proliferation of alternative test administration methods and modern test designs. But the current demand for items far exceeds the supply. Test items, as they are currently written, evoke a process that is both time-consuming and expensive because each item is written,…
Descriptors: Test Items, Test Construction, Psychometrics, Models