Publication Date
In 2025 | 0
Since 2024 | 0
Since 2021 (last 5 years) | 3
Since 2016 (last 10 years) | 18
Since 2006 (last 20 years) | 72
Descriptor
Simulation | 92
Test Bias | 92
Test Items | 92
Item Response Theory | 47
Evaluation Methods | 20
Models | 18
Statistical Analysis | 18
Comparative Analysis | 16
Item Analysis | 15
Error of Measurement | 14
Scores | 14
Author
Wang, Wen-Chung | 8
Penfield, Randall D. | 6
Rutkowski, Leslie | 4
Oshima, T. C. | 3
Rutkowski, David | 3
Shih, Ching-Lin | 3
Su, Ya-Hui | 3
Weiss, David J. | 3
Beretvas, S. Natasha | 2
Bolt, Daniel M. | 2
Gierl, Mark J. | 2
Publication Type
Journal Articles | 72
Reports - Research | 56
Reports - Evaluative | 24
Dissertations/Theses -… | 7
Speeches/Meeting Papers | 5
Reports - Descriptive | 3
Books | 1
Collected Works - General | 1
Numerical/Quantitative Data | 1
Assessments and Surveys
Program for International… | 4
Trends in International… | 3
Florida Comprehensive… | 1
Graduate Record Examinations | 1
National Assessment of… | 1
Wechsler Adult Intelligence… | 1
Gorney, Kylie; Wollack, James A.; Sinharay, Sandip; Eckerly, Carol – Journal of Educational and Behavioral Statistics, 2023
Any time examinees have had access to items and/or answers prior to taking a test, the fairness of the test and validity of test score interpretations are threatened. Therefore, there is a high demand for procedures to detect both compromised items (CI) and examinees with preknowledge (EWP). In this article, we develop a procedure that uses item…
Descriptors: Scores, Test Validity, Test Items, Prior Learning
Gu, Zhengguo; Emons, Wilco H. M.; Sijtsma, Klaas – Journal of Educational and Behavioral Statistics, 2021
Clinical, medical, and health psychologists use difference scores obtained from pretest–posttest designs employing the same test to assess intraindividual change possibly caused by an intervention addressing, for example, anxiety, depression, eating disorders, or addiction. Reliability of difference scores is important for interpreting observed…
Descriptors: Test Reliability, Scores, Pretests Posttests, Computation
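As context for the reliability question raised above, here is a minimal sketch of the classical test theory formula for the reliability of a difference score D = Y - X, assuming the pretest and posttest reliabilities and their observed correlation are known; the numbers are purely illustrative, and this is the generic formula, not the specific procedure examined in the article.

```python
# Classical test theory reliability of a difference score D = Y - X:
# rho_DD' = (sx^2*rxx + sy^2*ryy - 2*rxy*sx*sy) / (sx^2 + sy^2 - 2*rxy*sx*sy)

def difference_score_reliability(sx, sy, rxx, ryy, rxy):
    """sx, sy: observed SDs of pretest X and posttest Y;
    rxx, ryy: their reliabilities; rxy: observed pre-post correlation."""
    numerator = sx**2 * rxx + sy**2 * ryy - 2 * rxy * sx * sy
    denominator = sx**2 + sy**2 - 2 * rxy * sx * sy
    return numerator / denominator

# Illustrative values: two equally reliable tests (0.85) correlating 0.70.
print(difference_score_reliability(sx=10, sy=10, rxx=0.85, ryy=0.85, rxy=0.70))
# ~0.50: a difference score can be far less reliable than either test alone.
```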
Ip, Edward H.; Strachan, Tyler; Fu, Yanyan; Lay, Alexandra; Willse, John T.; Chen, Shyh-Huei; Rutkowski, Leslie; Ackerman, Terry – Journal of Educational Measurement, 2019
Test items must often be broad in scope to be ecologically valid. It is therefore almost inevitable that secondary dimensions are introduced into a test during test development. A cognitive test may require one or more abilities besides the primary ability to correctly respond to an item, in which case a unidimensional test score overestimates the…
Descriptors: Test Items, Test Bias, Test Construction, Scores
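As a hedged illustration of the general idea (not the authors' specific model): in a compensatory two-dimensional IRT model an item can load on the primary ability and on a secondary nuisance dimension, so a unidimensional analysis folds both sources of variance into one score. All parameter values below are arbitrary.

```python
import numpy as np

def p_correct(theta1, theta2, a1, a2, d):
    """Compensatory two-dimensional logistic item response function:
    P(X = 1) = 1 / (1 + exp(-(a1*theta1 + a2*theta2 + d)))."""
    return 1.0 / (1.0 + np.exp(-(a1 * theta1 + a2 * theta2 + d)))

# An item that mostly measures theta1 but also requires some theta2.
print(p_correct(theta1=0.5, theta2=-1.0, a1=1.2, a2=0.4, d=0.0))

# Simulated responses for correlated abilities.
rng = np.random.default_rng(0)
thetas = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=1000)
responses = rng.binomial(1, p_correct(thetas[:, 0], thetas[:, 1], 1.2, 0.4, 0.0))
```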
Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020
Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…
Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling
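A toy parameter-recovery sketch in the spirit of such small-sample simulations (an assumed setup, not the study's actual design): simulate Rasch responses for 25 examinees and compare crude difficulty estimates, here centered logits of the item p-values, with the generating values. A real calibration would use conditional or marginal maximum likelihood instead.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, n_persons = 20, 25            # deliberately small calibration sample
b_true = rng.normal(0, 1, n_items)     # generating item difficulties
theta = rng.normal(0, 1, n_persons)    # person abilities

# Rasch model: P(X = 1) = 1 / (1 + exp(-(theta - b)))
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b_true[None, :])))
x = rng.binomial(1, p)

# Crude difficulty estimate: centered logit of each item's proportion correct.
pvals = x.mean(axis=0).clip(0.02, 0.98)
b_hat = np.log((1 - pvals) / pvals)
b_hat -= b_hat.mean()

print("recovery correlation:", np.corrcoef(b_true, b_hat)[0, 1])
print("RMSE:", np.sqrt(np.mean(((b_true - b_true.mean()) - b_hat) ** 2)))
```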
Svetina, Dubravka; Liaw, Yuan-Ling; Rutkowski, Leslie; Rutkowski, David – Journal of Educational Measurement, 2019
This study investigates the effect of several design and administration choices on item exposure and person/item parameter recovery under a multistage test (MST) design. In a simulation study, we examine whether number-correct (NC) or item response theory (IRT) methods are differentially effective at routing students to the correct next stage(s)…
Descriptors: Measurement, Item Analysis, Test Construction, Item Response Theory
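A small sketch contrasting the two routing ideas for a two-stage multistage test, using hypothetical item parameters and cutoffs rather than those from the study: route on the number-correct score from the routing module, or on a provisional maximum-likelihood ability estimate computed from the same responses.

```python
import numpy as np

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def route_number_correct(responses, nc_cut):
    """Number-correct (NC) routing: harder module if the raw score meets the cut."""
    return "hard" if responses.sum() >= nc_cut else "easy"

def route_irt(responses, a, b, grid=np.linspace(-4, 4, 161), theta_cut=0.0):
    """IRT routing: grid-search ML theta, then compare to a theta cutoff."""
    p = p_2pl(grid[:, None], a[None, :], b[None, :])
    loglik = (responses * np.log(p) + (1 - responses) * np.log(1 - p)).sum(axis=1)
    return "hard" if grid[np.argmax(loglik)] >= theta_cut else "easy"

# Hypothetical 8-item routing module.
a = np.full(8, 1.0)
b = np.linspace(-1.5, 1.5, 8)
responses = np.array([1, 1, 1, 0, 1, 0, 0, 0])
print(route_number_correct(responses, nc_cut=5), route_irt(responses, a, b))
```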
Mousavi, Amin; Cui, Ying – Education Sciences, 2020
Important decisions regarding accountability and the placement of students in performance categories are often made on the basis of test scores; it is therefore important to evaluate the validity of the inferences derived from test results. One of the threats to the validity of such inferences is aberrant responding. Several…
Descriptors: Student Evaluation, Educational Testing, Psychological Testing, Item Response Theory
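One widely used index for flagging aberrant response vectors is the standardized log-likelihood person-fit statistic l_z, sketched below for dichotomous items with known model probabilities; this is a generic illustration with made-up values, not necessarily one of the methods compared in the article.

```python
import numpy as np

def lz_statistic(responses, p):
    """Standardized log-likelihood person-fit statistic (l_z).
    responses: 0/1 vector for one examinee; p: model-implied P(correct) per item."""
    l0 = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    e_l0 = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    v_l0 = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (l0 - e_l0) / np.sqrt(v_l0)

# Hypothetical item probabilities at one examinee's ability estimate.
p = np.array([0.9, 0.85, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])
typical = np.array([1, 1, 1, 1, 0, 1, 0, 0])    # roughly consistent with p
aberrant = np.array([0, 0, 0, 1, 0, 1, 1, 1])   # misses easy items, hits hard ones
print(lz_statistic(typical, p), lz_statistic(aberrant, p))
# Large negative l_z values signal misfitting (aberrant) response patterns.
```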
Fager, Meghan L. – ProQuest LLC, 2019
Recent research in multidimensional item response theory has introduced within-item interaction effects between latent dimensions in the prediction of item responses. The objective of this study was to extend this research to bifactor models to include an interaction effect between the general and specific latent variables measured by an item.…
Descriptors: Test Items, Item Response Theory, Factor Analysis, Simulation
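Assuming a logistic bifactor item response function with a product term between the general and specific factors, one plausible parameterization of such an interaction effect (the dissertation's exact specification may differ) looks like this:

```python
import numpy as np

def p_bifactor_interaction(theta_g, theta_s, a_g, a_s, a_gs, d):
    """Bifactor 2PL-type item with a general-by-specific interaction term:
    logit P(X = 1) = a_g*theta_g + a_s*theta_s + a_gs*(theta_g*theta_s) + d."""
    z = a_g * theta_g + a_s * theta_s + a_gs * theta_g * theta_s + d
    return 1.0 / (1.0 + np.exp(-z))

# With a_gs = 0 this reduces to an ordinary bifactor item; a nonzero a_gs
# lets the specific factor's effect depend on the level of the general factor.
print(p_bifactor_interaction(1.0, 0.5, a_g=1.2, a_s=0.8, a_gs=0.0, d=0.0))
print(p_bifactor_interaction(1.0, 0.5, a_g=1.2, a_s=0.8, a_gs=0.5, d=0.0))
```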
Esen, Ayse – ProQuest LLC, 2017
Detecting differential item functioning (DIF) is an early and critical step in investigating possible bias between groups (e.g., males vs. females). Many early DIF studies focused only on two-group comparisons. However, there are many cases where more than two groups exist: cross-cultural studies are administered in many countries, and any…
Descriptors: Test Bias, Cross Cultural Studies, Ethnicity, Error Patterns
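One common way to extend DIF detection beyond two groups, sketched here as a generic approach rather than the method examined in the dissertation, is logistic-regression DIF with group as a multi-category factor: fit a model with the matching score only, fit a second model adding group and score-by-group terms, and compare them with a likelihood-ratio test. The function below is a hypothetical helper built on statsmodels.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def multigroup_lr_dif(item, total, group):
    """item: 0/1 responses; total: matching (rest) score; group: integer codes 0..G-1
    with 0 as the reference group. Returns the LR chi-square and p-value testing
    uniform plus nonuniform DIF across all groups at once."""
    g_dummies = np.eye(group.max() + 1)[group][:, 1:]        # drop reference column
    x_reduced = sm.add_constant(total)
    x_full = sm.add_constant(np.column_stack(
        [total, g_dummies, g_dummies * total[:, None]]))
    m0 = sm.Logit(item, x_reduced).fit(disp=0)
    m1 = sm.Logit(item, x_full).fit(disp=0)
    lr = 2 * (m1.llf - m0.llf)
    df = x_full.shape[1] - x_reduced.shape[1]
    return lr, chi2.sf(lr, df)
```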
Paul J. Walter; Edward Nuhfer; Crisel Suarez – Numeracy, 2021
We introduce an approach for making a quantitative comparison of the item response curves (IRCs) of any two populations on a multiple-choice test instrument. In this study, we employ simulated and actual data. We apply our approach to a dataset of 12,187 participants on the 25-item Science Literacy Concept Inventory (SLCI), which includes ample…
Descriptors: Item Analysis, Multiple Choice Tests, Simulation, Data Analysis
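A rough illustration of comparing empirical item response curves for two populations, using a generic binned approach and made-up data (the article's quantitative comparison may differ): estimate each group's proportion correct on an item within total-score bins and summarize the gap between the curves.

```python
import numpy as np

def empirical_irc(item_scores, total_scores, bins):
    """Proportion correct on one item within each total-score bin."""
    which = np.digitize(total_scores, bins)
    return np.array([item_scores[which == k].mean() if np.any(which == k) else np.nan
                     for k in range(1, len(bins))])

rng = np.random.default_rng(2)
bins = np.array([0, 5, 10, 15, 20, 26])   # bins for a 25-item test (upper edge exclusive)

# Hypothetical data: the item is relatively harder for population B.
total_a = rng.integers(0, 26, 500); item_a = rng.binomial(1, total_a / 25)
total_b = rng.integers(0, 26, 500); item_b = rng.binomial(1, 0.8 * total_b / 25)

irc_a = empirical_irc(item_a, total_a, bins)
irc_b = empirical_irc(item_b, total_b, bins)
print("mean |IRC difference| across bins:", np.nanmean(np.abs(irc_a - irc_b)))
```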
Rutkowski, David; Rutkowski, Leslie; Liaw, Yuan-Ling – Educational Measurement: Issues and Practice, 2018
Participation in international large-scale assessments has grown over time, with the largest, the Programme for International Student Assessment (PISA), including more than 70 education systems that are economically and educationally diverse. To help accommodate large achievement differences among participants, in 2009 PISA offered…
Descriptors: Educational Assessment, Foreign Countries, Achievement Tests, Secondary School Students
Matlock, Ki Lynn; Turner, Ronna – Educational and Psychological Measurement, 2016
When multiple test forms are constructed, the number of items and the total test difficulty are often made equivalent across forms. However, not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed with an equal number of items and equal average item difficulty overall…
Descriptors: Item Response Theory, Computation, Test Items, Difficulty Level
Wyse, Adam E. – Educational Measurement: Issues and Practice, 2017
This article illustrates five different methods for estimating Angoff cut scores using item response theory (IRT) models. These include maximum likelihood (ML), expected a priori (EAP), modal a priori (MAP), and weighted maximum likelihood (WML) estimators, as well as the most commonly used approach based on translating ratings through the test…
Descriptors: Cutting Scores, Item Response Theory, Bayesian Statistics, Maximum Likelihood Statistics
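A small sketch of the approach the abstract describes as most common, translating Angoff ratings through the test characteristic curve (TCC), with assumed 2PL item parameters and made-up panel ratings: sum the ratings to get an expected raw cut score, then invert the TCC by bisection to find the corresponding theta cut.

```python
import numpy as np

def tcc(theta, a, b):
    """Test characteristic curve: expected raw score at theta under a 2PL model."""
    return np.sum(1.0 / (1.0 + np.exp(-a * (theta - b))))

def theta_cut_from_ratings(ratings, a, b, lo=-5.0, hi=5.0, tol=1e-6):
    """Bisection on the (monotone) TCC so that TCC(theta) equals the Angoff sum."""
    target = np.sum(ratings)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid, a, b) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical 10-item test and mean Angoff ratings from a standard-setting panel.
a = np.full(10, 1.0)
b = np.linspace(-2, 2, 10)
ratings = np.array([0.9, 0.85, 0.8, 0.75, 0.7, 0.6, 0.55, 0.5, 0.45, 0.4])
print("raw cut:", ratings.sum(),
      "theta cut:", round(theta_cut_from_ratings(ratings, a, b), 3))
```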
Magis, David; De Boeck, Paul – Educational and Psychological Measurement, 2014
It is known that sum score-based methods for the identification of differential item functioning (DIF), such as the Mantel-Haenszel (MH) approach, can be affected by Type I error inflation in the absence of any DIF effect. This may happen when the items differ in discrimination and when there is item impact. On the other hand, outlier DIF methods…
Descriptors: Test Bias, Statistical Analysis, Test Items, Simulation
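For reference, a compact sketch of the sum-score-based Mantel-Haenszel DIF statistic the abstract refers to, using the standard formulas (data arrays would be supplied by the user): stratify examinees by total score and pool the 2x2 tables of group membership by item response.

```python
import numpy as np

def mantel_haenszel_dif(item, total, focal):
    """MH common odds ratio, ETS delta, and chi-square for one item.
    item: 0/1 responses; total: matching scores; focal: 1 = focal group member."""
    num = den = a_sum = e_sum = v_sum = 0.0
    for k in np.unique(total):
        s = total == k
        A = np.sum(s & (focal == 0) & (item == 1))   # reference group, correct
        B = np.sum(s & (focal == 0) & (item == 0))   # reference group, incorrect
        C = np.sum(s & (focal == 1) & (item == 1))   # focal group, correct
        D = np.sum(s & (focal == 1) & (item == 0))   # focal group, incorrect
        T = A + B + C + D
        if T < 2 or (A + C) == 0 or (B + D) == 0:    # skip degenerate strata
            continue
        num += A * D / T
        den += B * C / T
        a_sum += A
        e_sum += (A + B) * (A + C) / T
        v_sum += (A + B) * (C + D) * (A + C) * (B + D) / (T**2 * (T - 1))
    alpha_mh = num / den                        # common odds ratio
    delta_mh = -2.35 * np.log(alpha_mh)         # ETS delta scale
    chi2_mh = (abs(a_sum - e_sum) - 0.5) ** 2 / v_sum   # continuity-corrected
    return alpha_mh, delta_mh, chi2_mh
```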
Hidalgo, Ma Dolores; Benítez, Isabel; Padilla, Jose-Luis; Gómez-Benito, Juana – Sociological Methods & Research, 2017
The growing use of scales in survey questionnaires warrants addressing how polytomous differential item functioning (DIF) affects observed scale score comparisons. The aim of this study is to investigate the impact of DIF on the Type I error and effect size of the independent samples t-test on the observed total scale scores. A…
Descriptors: Test Items, Test Bias, Item Response Theory, Surveys
Kopf, Julia; Zeileis, Achim; Strobl, Carolin – Educational and Psychological Measurement, 2015
Differential item functioning (DIF) indicates the violation of the invariance assumption, for instance, in models based on item response theory (IRT). For item-wise DIF analysis using IRT, a common metric for the item parameters of the groups that are to be compared (e.g., for the reference and the focal group) is necessary. In the Rasch model,…
Descriptors: Test Items, Equated Scores, Test Bias, Item Response Theory
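A minimal sketch of the common-metric issue, shown as a generic mean-linking illustration with hypothetical estimates rather than the anchor-selection strategies evaluated in the article: item difficulties estimated separately in the reference and focal groups become comparable only after linking them through a set of anchor items assumed to be DIF-free.

```python
import numpy as np

def dif_after_anchoring(b_ref, b_foc, anchor_idx):
    """b_ref, b_foc: Rasch item difficulties calibrated separately in each group.
    anchor_idx: indices of items assumed DIF-free (the anchor set).
    Returns per-item difficulty differences after mean-linking on the anchor."""
    shift = np.mean(b_foc[anchor_idx]) - np.mean(b_ref[anchor_idx])
    return (b_foc - shift) - b_ref

# Hypothetical separate calibrations for six items; the last item is
# relatively harder for the focal group.
b_ref = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 0.2])
b_foc = np.array([-0.7, -0.2, 0.3, 0.8, 1.3, 1.1])
print(dif_after_anchoring(b_ref, b_foc, anchor_idx=np.arange(5)))
# Differences near zero suggest no DIF; the last item stands out once
# the two metrics share a common scale.
```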