Yanyan Fu – Educational Measurement: Issues and Practice, 2024
The template-based automated item-generation (TAIG) approach that involves template creation, item generation, item selection, field-testing, and evaluation has more steps than the traditional item development method. Consequently, there is more margin for error in this process, and any template errors can cascade to the generated items.…
Descriptors: Error Correction, Automation, Test Items, Test Construction
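To make the template step concrete, here is a minimal sketch of template-based item generation. The arithmetic template, variable names, and selection scheme are illustrative assumptions, not the article's actual pipeline; note how any flaw in the template text or answer key would cascade to every generated item.

```python
# Minimal TAIG-style sketch (illustrative; not the article's pipeline).
import itertools
import random

# A single template: any error here cascades to all generated items.
TEMPLATE = "A train travels {speed} km/h for {hours} hours. How far does it go?"

def generate_items(speeds, hours_list, n_items, seed=0):
    """Instantiate the template over a grid of variable values."""
    rng = random.Random(seed)
    grid = list(itertools.product(speeds, hours_list))
    rng.shuffle(grid)
    items = []
    for speed, hours in grid[:n_items]:
        items.append({
            "stem": TEMPLATE.format(speed=speed, hours=hours),
            "key": speed * hours,  # answer derived from the template variables
        })
    return items

for item in generate_items(speeds=[40, 60, 80], hours_list=[2, 3], n_items=4):
    print(item["stem"], "->", item["key"])
```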
Ye Ma; Deborah J. Harris – Educational Measurement: Issues and Practice, 2025
Item position effect (IPE) refers to situations where an item performs differently when it is administered in different positions on a test. Most previous research has focused on investigating IPE under linear testing, and IPE under adaptive testing remains understudied. In addition, the existence of IPE might violate Item…
Descriptors: Computer Assisted Testing, Adaptive Testing, Item Response Theory, Test Items
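One simple way to picture an IPE, sketched below under a Rasch model: item difficulty drifts with serial position (a fatigue-style effect). The linear drift term `delta` is an assumption chosen for illustration, not a model from the article.

```python
# Rasch model with a position-dependent difficulty shift (illustrative).
import numpy as np

def p_correct(theta, b, position, delta=0.02):
    """P(correct) when effective difficulty grows by `delta` per position."""
    b_eff = b + delta * position  # later positions look harder
    return 1.0 / (1.0 + np.exp(-(theta - b_eff)))

theta, b = 0.5, 0.0
print(p_correct(theta, b, position=1))   # item administered early
print(p_correct(theta, b, position=40))  # same item late: lower probability
```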
Belzak, William C. M. – Educational Measurement: Issues and Practice, 2023
Test developers and psychometricians have historically examined measurement bias and differential item functioning (DIF) across a single categorical variable (e.g., gender), independently of other variables (e.g., race, age, etc.). This is problematic when more complex forms of measurement bias may adversely affect test responses and, ultimately,…
Descriptors: Test Bias, High Stakes Tests, Artificial Intelligence, Test Items
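A common screening device for DIF across combinations of background variables is logistic regression with interaction terms; the simulated data and variable names below are illustrative assumptions, not the article's method or data.

```python
# Logistic-regression DIF screen with an interaction term (illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
total = rng.normal(size=n)              # matching variable (e.g., total score)
gender = rng.integers(0, 2, size=n)
age = rng.integers(0, 2, size=n)
# Simulate intersectional DIF: only the gender=1 AND age=1 group is disadvantaged.
logit = 1.2 * total - 0.8 * (gender * age)
y = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

X = np.column_stack([total, gender, age, gender * age])
model = LogisticRegression().fit(X, y)
print(model.coef_)  # a sizable interaction coefficient flags intersectional DIF
```

A single-variable DIF analysis of the same data could miss this effect, since neither gender nor age alone carries the bias.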
Zopluoglu, Cengiz; Kasli, Murat; Toton, Sarah L. – Educational Measurement: Issues and Practice, 2021
Response time information has recently attracted significant attention in the literature as it may provide meaningful information about item preknowledge. The methods that use response time information to identify examinees with potential item preknowledge make an implicit assumption that the examinees with item preknowledge differ in their…
Descriptors: Reaction Time, Cheating, Test Items
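A minimal version of the underlying idea, with all thresholds and data simulated as illustrative assumptions: standardize log response times within items, then flag examinees with an unusually high share of very fast responses.

```python
# Flag unusually fast responders on suspected items (illustrative sketch).
import numpy as np

rng = np.random.default_rng(1)
log_rt = rng.normal(loc=3.5, scale=0.5, size=(200, 30))  # examinees x items
log_rt[:5, :10] -= 1.5  # plant 5 examinees answering 10 leaked items very fast

z = (log_rt - log_rt.mean(axis=0)) / log_rt.std(axis=0)  # standardize per item
fast_share = (z < -2).mean(axis=1)    # share of very fast responses per examinee
print(np.where(fast_share > 0.2)[0])  # likely recovers examinees 0-4
```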
Xiangyi Liao; Daniel M Bolt – Educational Measurement: Issues and Practice, 2024
Traditional approaches to the modeling of multiple-choice item response data (e.g., 3PL, 4PL models) emphasize slips and guesses as random events. In this paper, an item response model is presented that characterizes both disjunctively interacting guessing and conjunctively interacting slipping processes as proficiency-related phenomena. We show…
Descriptors: Item Response Theory, Test Items, Error Correction, Guessing (Tests)
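For reference, the 4PL model named in the abstract treats guessing and slipping through fixed asymptotes: a lower asymptote c_j (guessing) and an upper asymptote d_j (slipping), with the 3PL as the special case d_j = 1:

```latex
P(X_{ij} = 1 \mid \theta_i) \;=\; c_j + (d_j - c_j)\,
\frac{1}{1 + \exp\{-a_j(\theta_i - b_j)\}}
```

The paper's contribution, by contrast, is to let these guessing and slipping processes depend on proficiency rather than remain random events.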
Fu, Yanyan; Choe, Edison M.; Lim, Hwanggyu; Choi, Jaehwa – Educational Measurement: Issues and Practice, 2022
This case study applied the "weak theory" of Automatic Item Generation (AIG) to generate isomorphic item instances (i.e., unique but psychometrically equivalent items) for a large-scale assessment. Three representative instances were selected from each item template (i.e., model) and pilot-tested. In addition, a new analytical framework,…
Descriptors: Test Items, Measurement, Psychometrics, Test Construction
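A crude check of whether pilot-tested instances from one template behave as isomorphic (psychometrically equivalent) items might compare their observed proportions correct; the tolerance below is an illustrative choice, not the study's analytical framework.

```python
# Rough equivalence check for instances of one template (illustrative).
import numpy as np

def instances_equivalent(scores_by_instance, tol=0.05):
    """scores_by_instance: one 0/1 response vector per pilot-tested instance."""
    p_values = [float(np.mean(s)) for s in scores_by_instance]
    return max(p_values) - min(p_values) <= tol, p_values

rng = np.random.default_rng(2)
pilot = [rng.integers(0, 2, 300) for _ in range(3)]  # three instances, n=300 each
print(instances_equivalent(pilot))
```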
Berenbon, Rebecca F.; McHugh, Bridget C. – Educational Measurement: Issues and Practice, 2023
To assemble a high-quality test, psychometricians rely on subject matter experts (SMEs) to write high-quality items. However, SMEs are not typically given the opportunity to provide input on which content standards are most suitable for multiple-choice questions (MCQs). In the present study, we explored the relationship between perceived MCQ…
Descriptors: Test Items, Multiple Choice Tests, Standards, Difficulty Level
Pan, Yiqin; Livne, Oren; Wollack, James A.; Sinharay, Sandip – Educational Measurement: Issues and Practice, 2023
In computerized adaptive testing, overexposure of items in the bank is a serious problem and might result in item compromise. We develop an item selection algorithm that utilizes the entire bank well and reduces the overexposure of items. The algorithm is based on collaborative filtering and selects an item in two stages. In the first stage, a set…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Algorithms
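In the collaborative-filtering spirit the abstract describes, a two-stage selection might look like the sketch below; the neighbor count, data layout, and tie-breaking by exposure are assumptions, not the published algorithm.

```python
# Two-stage, collaborative-filtering-style item selection (illustrative).
import numpy as np

def select_item(current_resp, past_resp, past_items, exposure, k=10):
    # Stage 1: find past examinees most similar on commonly answered items.
    mask = ~np.isnan(current_resp)
    sims = (past_resp[:, mask] == current_resp[mask]).sum(axis=1)
    neighbors = np.argsort(sims)[-k:]
    # Stage 2: among items those neighbors saw, pick the least exposed unseen one.
    candidates = np.unique(past_items[neighbors].ravel())
    unseen = [int(i) for i in candidates if np.isnan(current_resp[int(i)])]
    return min(unseen, key=lambda i: exposure[i])

rng = np.random.default_rng(3)
bank = 50
past_resp = rng.integers(0, 2, size=(100, bank)).astype(float)
past_items = np.tile(np.arange(20), (100, 1))  # items each past examinee saw
exposure = rng.integers(0, 30, size=bank)      # administration counts so far
current = np.full(bank, np.nan)
current[:5] = past_resp[0, :5]                 # current examinee answered 5 items
print(select_item(current, past_resp, past_items, exposure))
```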
Almehrizi, Rashid S. – Educational Measurement: Issues and Practice, 2022
Coefficient alpha persists as the most common reliability coefficient reported in research. The assumptions for its use are, however, not well understood. The current paper challenges the commonly used expressions of coefficient alpha and argues that while these expressions are correct when estimating reliability for summed scores,…
Descriptors: Reliability, Scores, Scaling, Statistical Analysis
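For reference, the commonly used expression in question, for a summed score X composed of k item scores Y_i:

```latex
\alpha \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right),
\qquad X = \sum_{i=1}^{k} Y_i
```

As the abstract notes, this expression is correct for summed scores; the paper challenges its use beyond that case.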
Zhang, Susu; Li, Anqi; Wang, Shiyu – Educational Measurement: Issues and Practice, 2023
In computer-based tests allowing revision and reviews, examinees' sequence of visits and answer changes to questions can be recorded. The variable-length revision log data introduce new complexities to the collected data but, at the same time, provide additional information on examinees' test-taking behavior, which can inform test development and…
Descriptors: Computer Assisted Testing, Test Construction, Test Wiseness, Test Items
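To make the data structure concrete, here is a minimal sketch of variable-length revision logs and one feature that can be mined from them; the event format and feature are illustrative assumptions, not the article's scheme.

```python
# Variable-length revision log: a sequence of (item, answer) visit events.
from collections import defaultdict

log = [("Q1", "A"), ("Q2", "C"), ("Q1", "B"), ("Q3", "D"), ("Q1", "B")]

def answer_changes(events):
    """Count revisits that changed the recorded answer, per item."""
    last, changes = {}, defaultdict(int)
    for item, answer in events:
        if item in last and last[item] != answer:
            changes[item] += 1
        last[item] = answer
    return dict(changes)

print(answer_changes(log))  # {'Q1': 1}: A -> B counts; the later B -> B does not
```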
Hwanggyu Lim; Kyung T. Han – Educational Measurement: Issues and Practice, 2024
Computerized adaptive testing (CAT) has gained deserved popularity in the administration of educational and professional assessments, but continues to face test security challenges. To ensure sustained quality assurance and testing integrity, it is imperative to establish and maintain multiple stable item pools that are consistent in terms of…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Item Banks
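One simple way to carve an item bank into multiple pools with matched difficulty distributions is a serpentine (snake) assignment, sketched below; this is a generic simplification, not the authors' procedure, and it balances only difficulty rather than all psychometric properties.

```python
# Serpentine split of an item bank into parallel pools (illustrative).
import numpy as np

def snake_split(difficulties, n_pools=3):
    order = np.argsort(difficulties)
    pools = [[] for _ in range(n_pools)]
    for rank, idx in enumerate(order):
        block, pos = divmod(rank, n_pools)
        pool = pos if block % 2 == 0 else n_pools - 1 - pos  # reverse each block
        pools[pool].append(float(difficulties[idx]))
    return pools

rng = np.random.default_rng(4)
pools = snake_split(rng.normal(size=300))
print([round(float(np.mean(p)), 3) for p in pools])  # near-identical pool means
```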
Pan, Yiqin; Wollack, James A. – Educational Measurement: Issues and Practice, 2023
Pan and Wollack (PW) proposed a machine learning method to detect compromised items. We extend the work of PW to an approach that detects compromised items and examinees with item preknowledge simultaneously, drawing on ideas from ensemble learning to relax several limitations of the PW method. The suggested approach also provides a confidence score,…
Descriptors: Artificial Intelligence, Prior Learning, Item Analysis, Test Content
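The ensemble idea can be pictured as follows: many weak detectors vote, and the confidence score for an examinee is the share of detectors that flag them. The detectors below are random stand-ins; the actual PW extensions are not reproduced.

```python
# Ensemble voting with a per-examinee confidence score (illustrative).
import numpy as np

def ensemble_confidence(flag_matrix):
    """flag_matrix: detectors x examinees boolean flags."""
    return flag_matrix.mean(axis=0)  # share of detectors flagging each examinee

rng = np.random.default_rng(5)
flags = rng.random((25, 100)) < 0.1  # 25 detectors, 100 examinees, noise flags
flags[:, 7] = rng.random(25) < 0.8   # examinee 7 is flagged by most detectors
conf = ensemble_confidence(flags)
print(conf.argmax(), conf.max())     # examinee 7, with a high confidence score
```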
Sinharay, Sandip – Educational Measurement: Issues and Practice, 2022
Administrative problems such as computer malfunction and power outage occasionally lead to missing item scores, and hence to incomplete data, on credentialing tests such as the United States Medical Licensing Examination. Feinberg compared four approaches for reporting pass-fail decisions to the examinees with incomplete data on credentialing…
Descriptors: Testing Problems, High Stakes Tests, Credentials, Test Items
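Two generic scoring rules for incomplete records are sketched below to show why the choice matters near the cut score; these are illustrative options, not necessarily among the four approaches Feinberg compared.

```python
# Pass-fail under missing item scores: two simple rules (illustrative).
import numpy as np

def pass_fail(responses, cut=0.7):
    strict = np.nan_to_num(responses, nan=0.0).mean() >= cut  # missing = wrong
    observed = responses[~np.isnan(responses)]
    prorated = observed.mean() >= cut                         # prorate observed
    return strict, prorated

resp = np.array([1, 1, 0, 1, np.nan, np.nan, 1, 1, 0, 1], dtype=float)
print(pass_fail(resp))  # (False, True): the two rules disagree for this record
```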
Li, Dongmei; Kapoor, Shalini – Educational Measurement: Issues and Practice, 2022
Population invariance is a desirable property of test equating which might not hold when significant changes occur in the test population, such as those brought about by the COVID-19 pandemic. This research aims to investigate whether equating functions are reasonably invariant when the test population is impacted by the pandemic. Based on…
Descriptors: Test Items, Equated Scores, COVID-19, Pandemics
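A standard way to quantify population invariance (one common index, not necessarily the exact criterion used in this study) is the root mean square difference between subgroup equating functions e_g and the total-group function e, weighted by subgroup proportions w_g:

```latex
\mathrm{RMSD}(x) \;=\; \sqrt{\sum_{g} w_g \,\bigl[e_g(x) - e(x)\bigr]^{2}}
```

Small RMSD values across the score range x indicate that the equating function is reasonably invariant across subpopulations.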
Kim, Sooyeon; Walker, Michael E. – Educational Measurement: Issues and Practice, 2022
Test equating requires collecting data to link the scores from different forms of a test. Problems arise when equating samples are not equivalent and the test forms to be linked share no common items by which to measure or adjust for the group nonequivalence. Using data from five operational test forms, we created five pairs of research forms for…
Descriptors: Ability, Tests, Equated Scores, Testing Problems