ERIC - Search Results

Publication Date: In 2025
Source: Educational Measurement:…

Publication Type: Journal Articles (7); Reports - Research (7)

Education Level: Secondary Education (2); Elementary Education (1); Junior High Schools (1); Middle Schools (1)

Location: United States

Assessments and Surveys: Program for International…

Showing all 7 results

Instruction-Tuned Large-Language Models for Quality Control in Automatic Item Generation: A Feasibility Study

Peer reviewed

Guher Gorgun; Okan Bulut – Educational Measurement: Issues and Practice, 2025

Automatic item generation can supply many items instantly and efficiently to assessment and learning environments. Yet the evaluation of item quality remains a bottleneck for deploying generated items in learning and assessment settings. In this study, we investigated the utility of using large language models, specifically Llama 3-8B, for…

Descriptors: Artificial Intelligence, Quality Control, Technology Uses in Education, Automation

Measurement Invariance for Multilingual Learners Using Item Response and Response Time in PISA 2018

Peer reviewed

Jung Yeon Park; Sean Joo; Zikun Li; Hyejin Yoon – Educational Measurement: Issues and Practice, 2025

This study examines potential assessment bias based on students' primary language status in PISA 2018. Specifically, multilingual (MLs) and nonmultilingual (non-MLs) students in the United States are compared with regard to their response time as well as scored responses across three cognitive domains (reading, mathematics, and science).…

Descriptors: Achievement Tests, Secondary School Students, International Assessment, Test Bias

Growth across Grades and Common Item Grade Alignment in Vertical Scaling Using the Rasch Model

Peer reviewed

Sanford R. Student; Derek C. Briggs; Laurie Davis – Educational Measurement: Issues and Practice, 2025

Vertical scales are frequently developed using common item nonequivalent group linking. In this design, one can use upper-grade, lower-grade, or mixed-grade common items to estimate the linking constants that underlie the absolute measurement of growth. Using the Rasch model and a dataset from Curriculum Associates' i-Ready Diagnostic in math in…

Descriptors: Elementary School Mathematics, Elementary School Students, Middle School Mathematics, Middle School Students

Generalizability Theory Approach to Analyzing Automated-Item Generated Test Forms

Peer reviewed

Stella Y. Kim; Sungyeun Kim – Educational Measurement: Issues and Practice, 2025

This study presents several multivariate generalizability theory designs for analyzing test forms built with automatic item generation (AIG). The study used real data to illustrate the analysis procedure and discuss practical considerations. We collected the data from two groups of students, each group receiving a different form generated by AIG. A…

Descriptors: Generalizability Theory, Automation, Test Items, Students

Investigating Approaches to Controlling Item Position Effects in Computerized Adaptive Tests

Peer reviewed

Ye Ma; Deborah J. Harris – Educational Measurement: Issues and Practice, 2025

Item position effect (IPE) refers to situations where an item performs differently when it is administered in different positions on a test. Most previous research has investigated IPE under linear testing, and IPE under adaptive testing has received little attention. In addition, the existence of IPE might violate Item…

Descriptors: Computer Assisted Testing, Adaptive Testing, Item Response Theory, Test Items

Applications and Modeling of Keystroke Logs in Writing Assessments

Peer reviewed

Mo Zhang; Paul Deane; Andrew Hoang; Hongwen Guo; Chen Li – Educational Measurement: Issues and Practice, 2025

In this paper, we describe two empirical studies that demonstrate the application and modeling of keystroke logs in writing assessments. We illustrate two different approaches to modeling differences in writing processes: analysis of mean differences in handcrafted, theory-driven features and use of large language models to identify stable personal…

Descriptors: Writing Tests, Computer Assisted Testing, Keyboarding (Data Entry), Writing Processes

Demystifying Adequate Growth Percentiles

Peer reviewed

Katherine E. Castellano; Daniel F. McCaffrey; Joseph A. Martineau – Educational Measurement: Issues and Practice, 2025

Growth-to-standard models evaluate student growth against the growth needed to reach a future standard or target of interest, such as proficiency. A common growth-to-standard model involves comparing the popular Student Growth Percentile (SGP) to Adequate Growth Percentiles (AGPs). AGPs follow from an involved process based on fitting a series of…

Descriptors: Student Evaluation, Growth Models, Student Educational Objectives, Educational Indicators